Category Archives: admin

Getting OpenVPN to Add DNS Servers

I couldn’t get my OpenVPN client to add a DNS server that I knew the VPN server was telling it about. It turns out that on Xubuntu 16.04 (and probably all flavors of Ubuntu), you need to supply additional arguments to make the client handle the “dhcp-option” information it receives. More specifically, you have to use the --up and --down options to point to an Ubuntu-supplied script that runs when the VPN connection goes up and down.

sudo openvpn --config office.ovpn --script-security 2 --up /etc/openvpn/update-resolv-conf --down /etc/openvpn/update-resolv-conf
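
If you’d rather not type those extra flags every time, the same settings can live in the .ovpn file itself. A minimal sketch, appended once to the config used above (adjust the script path if your distribution keeps it elsewhere):

cat >> office.ovpn <<'EOF'
script-security 2
up /etc/openvpn/update-resolv-conf
down /etc/openvpn/update-resolv-conf
EOF

sudo openvpn --config office.ovpn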

Adventures in Docker

After you’ve taken the time to puzzle through what exactly it is, Docker is nothing short of life-changing.

In a nutshell, Docker lets you run programs in separate “containers”, isolating the dependencies each one requires. This is similar to virtualization solutions like VMware and VirtualBox, but Docker is a much more fine-grained, customizable tool that operates at a different level.

It took me a week of experimentation to develop a firm grasp of the Docker concepts and how all its pieces work together in practice. I’m now using it at work for development, and I hope to be setting up a configuration for staging soon.

This is a short write-up of what I’ve learned so far.

The Concepts

At first glance, almost everyone (including me) mistakes Docker for another virtualization technology. It’s not. Virtualization lets you run a machine within a machine. A Docker container is more subtle: it segments, or isolates, part of your existing Linux operating system.

A container consists of a separate memory space and filesystem (created from something called an image). A container actually uses the same kernel as your “host” system. Fancy Linux kernel features (namespaces and cgroups) make all of this possible; there is no hardware virtualization going on.

You start using Docker by creating an image or using an existing one made by someone else. An image is a filesystem snapshot. You can build images in an automated fashion using a Dockerfile, which allows you to “script” the running of commands to do things like install software and copy files around.
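
As a sketch of what that looks like (the base image, packages, and commands here are illustrative, not taken from my actual project), a Dockerfile for a small Rails-style app might be:

cat > Dockerfile <<'EOF'
# start from an existing base image published on Docker Hub
FROM ruby:2.2

# install OS-level dependencies into the image
RUN apt-get update && apt-get install -y nodejs && rm -rf /var/lib/apt/lists/*

# copy the application code in and install its gems at build time
COPY . /app
WORKDIR /app
RUN bundle install

# the command a container will run by default
CMD ["bundle", "exec", "rails", "server", "-b", "0.0.0.0"]
EOF

docker build -t myapp .    # run the Dockerfile's steps and save the result as an image called myapp
docker images              # the new image shows up alongside anything you've pulled or built before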

When you launch a container, Docker makes a “copy” of an image (well, not really, but we’ll pretend for now) and uses that to start a process. The fact that the filesystem is a “copy” is important: if you launch 2 containers using the same image, and the processes in each modify files, those changes happen in each CONTAINER’s filesystem; the image doesn’t change. Processes inside containers only see what’s in the container, so they are isolated from one another. This allows complete dependency separation, at the level of processes.
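
You can see this with a couple of throwaway containers (the stock ubuntu image is just an example here):

docker run -d --name first ubuntu:14.04 sleep 600    # two containers started from the same image
docker run -d --name second ubuntu:14.04 sleep 600
docker exec first touch /tmp/hello                   # change a file inside the first container
docker exec second ls /tmp                           # nothing there: the second container and the image are untouched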

You can do a lot with containers. You can run multiple processes in them (though this is discouraged). You can start another process in an already running container, so it can interact with the already running process. After a container has stopped (by halting the process running in it), you can start it back up again. Again, any file changes made are in the container’s filesystem; the image remains unchanged.
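
The day-to-day commands for all of this are short (continuing the hypothetical names from above):

docker exec -it first /bin/bash    # open an interactive shell alongside the process already running in the container
docker stop first                  # halt the container's process
docker start first                 # bring it back up; files written earlier are still in the container
docker ps -a                       # list containers, both running and stopped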

Issues

There are three issues I’ve personally encountered so far in using Docker:

1) Persistent Storage

Containers are meant to be transient. In a well designed setup, you should, theoretically, be able to spin up new containers and discard old ones all the time, and not lose any data. This means that your persistent storage has to live somewhere else, in one of two places: a special “data container” or a directory mounted from the host filesystem.

Data containers were too complicated and weird, and I couldn’t get them to work the way I expected, so I mounted directories instead. This has the nice side effect that, as Dockerized processes change files, you can see those changes immediately from the host without having to do anything special to access them. I’m not sure, however, what “best practices” are concerning storage.
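
The mount is just a flag on docker run. A sketch (the paths and the myapp image are placeholders):

# mount the current directory into the container at /app;
# changes made on either side are visible to the other immediately
docker run -d --name web -v "$(pwd)":/app myapp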

2) Multi-Container Applications

Many modern applications consist of several processes or require other applications. For example, my project at work consists of a Rails web app, a delayed_job worker process, an Apache Solr instance, and a MySQL database.

Since Docker strongly recommends a one-process-per-container configuration, you need a way to coordinate a set of running processes and make sure they can communicate with one another. Docker Compose does this, allowing you to easily specify whether containers should be able to open connections to each other’s network ports, share filesystems, etc.
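
Here is a minimal sketch of a docker-compose.yml for a setup like mine (service names, images, ports, and paths are all illustrative, and a real project’s file has more in it):

cat > docker-compose.yml <<'EOF'
web:
  build: .
  ports:
    - "3000:3000"
  volumes:
    - .:/app
  links:
    - db
db:
  image: mysql:5.6
  environment:
    MYSQL_ROOT_PASSWORD: secret
EOF

docker-compose up -d    # build what needs building and start both containers
docker-compose ps       # show the containers Compose is managing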

Currently, Docker Compose is not yet considered “production ready.” While it addresses the need to orchestrate processes, there is also the problem of monitoring and restarting processes as needed. It’s not clear to me yet what the best tool is for doing this (it may even be a completely orthogonal concern to what Docker does).

3) Running Commands

Sometimes you need to use the Rails CLI to do things like run database migrations or a rake task. Running commands takes a bit of extra effort, since they need to happen in a container. A slight complication is whether to run the command in the existing Rails container or to start another one entirely for that new process. It’s a bit of typing, which is annoying.
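
For example (the names below assume a Compose setup like the sketch earlier; substitute your real container and service names):

# run the migration inside the container that is already running the app
docker exec -it web bundle exec rake db:migrate

# or spin up a fresh, throwaway container from the same image just for this one command
docker-compose run --rm web bundle exec rake db:migrate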

The Payoffs

How is Docker life changing? There are several scenarios I encounter ALL THE TIME that Docker helps tremendously with.

* On the same machine, you can run several web applications, which each require different versions of, say, Ruby, Rails, MySQL and Apache, without dealing with software conflicts and incompatibilities.

* Related to the previous point, Docker lets you more easily experiment with different software without polluting your base system with a lot of crap you might never use again.

* There is MUCH less overhead and wasted memory than with virtualization. If you allocate 1GB of RAM to a virtual machine but only use 512MB, the other half goes to waste. Docker containers use only as much memory as the processes themselves take up, plus a small bit of overhead. And since Docker uses a union filesystem to “copy” (really, overlay) images into container filesystems, the actual disk space used isn’t as much as you might think.

* Since Docker containers are entirely self-contained, they can be deployed in development, staging, and production environments with almost NO differences among them, thereby minimizing the problems that usually arise.

For me, a lot of the benefits boil down to this: virtualization is amazing, but in practice, I don’t use it that much because it’s too heavyweight and too coarse for my needs. A VirtualBox VM is a whole other machine that I need to think about. By working at the level of Linux processes, Docker is exactly the right kind of tool for managing application dependencies.

A cautionary note: there’s a lot of buzz right now around containers, including efforts at defining vendor-neutral standards, such as appc. Although Docker releases have been rapid and it is already seeing a lot of adoption, it feels bleeding edge to me. It’s exciting but in a few years, it’s entirely possible that another container solution might surpass it to become the de facto standard. The playing field is just too new, which means Docker comes with some risk. But it’s well worth exploring at this early stage, even if only to get a taste of the new ideas shaping systems configuration and deployment that are definitely here to stay.

Installing all the Software from an Old System

I recently got a new laptop (more on this in another post, perhaps). After I installed Debian Linux from scratch and copied my home directory over to it, I needed to install all the software I used on my old machine. Here’s one way to do this.

On your old system, create a list of your installed packages:

dpkg -l > old_packages
cat old_packages | awk '{print $2}' | sort > old_packages_list

Copy the file old_packages_list to your new system. Then, on your new system, run:

dpkg -l > new_packages
cat new_packages | awk '{print $2}' | sort > new_packages_list
comm -2 -3 old_packages_list new_packages_list > missing

The file missing now contains a list of packages that existed on your old system that don’t exist on your new one.

If you don’t work with many library packages directly, you can filter them out by reverse grepping for ones that begin with “lib”. (These are installed automatically as dependencies for other things; you don’t usually need to worry about them unless you need them for development.)

cat missing | egrep -v "^lib" > missing_apps_only

Going through this list, I could quickly identify all the software I recognized and wanted to install, and skip packages no longer useful to me (programs to burn CDs and DVDs, for example, since my new machine doesn’t have a drive) or applications I only played with once or twice. And I didn’t install things I didn’t recognize—I can always install them later.
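
Once the list is pruned down to just what you want, one command installs the lot (review the file first; a stray package name will abort the whole run):

sudo apt-get install $(cat missing_apps_only)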

This gets your new computer up and running in short order.

Where To Find Info When Packages Break in Debian Testing

The chromium package in Debian testing broke a few days ago. After I ran “apt-get update” and “apt-get upgrade”, chromium disappeared from my Xfce menu, and the executable was gone from my system. Nothing like that has ever happened to me before. Odd!

When I tried to re-install it by running “apt-get install chromium”, I got the following error:

The following packages have unmet dependencies:
chromium : Depends: libudev0 (>= 146) but it is not installable
E: Unable to correct problems, you have held broken packages.

Indeed, there is no package called libudev0 (there is, however, a libudev1, which I already had installed). Mysterious.

Being fairly new to Debian testing, I was at a loss as to what to do. After some googling, I discovered some information that’s useful to users trying to troubleshoot broken packages.

I already knew that Debian has a searchable package database on their website. If you search for ‘chromium’ in the testing distribution, you’ll get to a page for it.

What I’d never noticed before were the links on the right-hand side. Every package apparently has its own mailing list archive and QA page.

The QA page isn’t the easiest thing in the world to make sense of. I couldn’t find a simple listing of bugs in reverse chronological order, which would let me quickly see the newest bugs filed. The closest thing is the list of all open bugs. There is also a dashboard page that is vaguely reverse chronological, though it may be sorted by priority; it’s not clear.

In any event, this was good enough. I could see the bug for the error message I was getting. Turns out an update had mistakenly built the package for stable, which is why the unmet dependency was coming up.

It’s yet to be fixed, but at least now I know exactly what the problem is.

Upgrading to Ubuntu 12.04

I finally took the plunge and upgraded to Ubuntu 12.04 from 11.10. It was surprisingly painless, taking about an hour and a half on a Core i5 laptop.

In many years of using various flavors of Linux (Slackware, Red Hat, Debian, now Ubuntu), this is the first time I’ve used an automated upgrade program! In the past, I always made a backup of my data, did a fresh install, and restored the data. I never trusted upgrades to get everything exactly right. The hassle of a new installation always seemed like less trouble to me than having to figure out any strange system quirks resulting from a slightly imperfect upgrade process.

The only hair-raising moment was when the upgrade program seemed to stall during the “Installing the Upgrades” stage, with the progress bar showing “Configuring debconf.” If you’re not noticing any CPU or disk activity for a while, click on the Terminal toggle to show what dpkg is doing. In my case, it was prompting for input there instead of popping up a graphical window, because gtk was temporarily hosed at that point during the upgrade.

But it seems to have done the job, and everything’s working well. Nice job, Canonical. I’m still undecided about Ubuntu past 12.04 because of the serious data privacy issues I’ve mentioned before. It’s a shame, really–despite the disappointing direction that Ubuntu is taking, it’s done so much right to improve desktop Linux.

SBCL 1.0.55 x64 needs GLIBC 2.14

The 64-bit build installed on my Ubuntu 11.10 (Oneiric) laptop, but when I tried to run it, I got this message:

sbcl: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.14' not found (required by sbcl)

Hmmm. Let’s see..

# ls -al /lib/x86_64-linux-gnu/libc.so.6
lrwxrwxrwx 1 root root 12 2012-03-06 21:17 /lib/x86_64-linux-gnu/libc.so.6 -> libc-2.13.so

Argh.

This has already been reported in the SBCL bug database.

My workaround was to grab 1.0.54 from the old file releases section on SBCL’s SourceForge site. That worked fine. So you’ll need to downgrade if you want to use the 64-bit version.

This came up when I ran into problems installing cl-libxml2 using quicklisp. The x86 version of SBCL throws an error when using cffi to load the Ubuntu-packaged libxml2 library, because it expects a 32-bit version of the library but finds a 64-bit one instead (I neglected to copy the error message, but it’s something about finding an unexpected/wrong class ELFCLASS64).

So I suspect anyone with an x64 system who uses cffi to access libraries in this manner will run into the same problem with SBCL 1.0.55.

On a side note, I can’t imagine installing and managing Common Lisp libraries without quicklisp. It took a few seconds to do what would probably take a beginner like me hours to figure out. Awesome tools like that make a huge difference to newcomers who want to get up and running with CL without a lot of pain.

* * *

UPDATE: This is fixed in 1.0.56.

Installing DBI on Leopard’s perl 5.8.8

I needed to get my perl installation updated to do some development locally. As usual, perl was a pain in the ass. Long story short: Install Xcode 3.0, copy the “reentr.inc” file from the 5.8.8 source distribution, and DBI should install.

Below is the long-winded log of my woes, offered in the hopes it might help someone.

First, DBI 1.604 wouldn’t install via cpan, so I tried installing it by hand. But I just got the same error when running “make”:

No rule to make target `/System/Library/Perl/5.8.8/darwin-thread-multi-2level/CORE/config.h', needed by `Makefile'.

I found a blog post, “Leopard Perl 5.8.8 installation throws errors when compiling (makefile)”, mentioning the exact message; it recommended copying in the CORE directory from 5.8.6 in place of the one referenced above. I tried that, but then I got this error instead:

DBI.xs: In function ‘dbi_profile’:
DBI.xs:2398: warning: implicit declaration of function ‘GvSVn’
DBI.xs:2398: error: invalid lvalue in assignment
DBI.xs: In function ‘dbi_profile’:
DBI.xs:2398: warning: implicit declaration of function ‘GvSVn’
DBI.xs:2398: error: invalid lvalue in assignment
DBI.xs: In function ‘XS_DBI_dispatch’:
DBI.xs:2970: warning: assignment makes pointer from integer without a cast
DBI.xs:2972: error: invalid lvalue in assignment
DBI.xs:2985: error: invalid type argument of ‘->’
DBI.xs:2989: error: invalid lvalue in assignment
DBI.xs:3293: warning: unused variable ‘Perl___notused’
DBI.xs: In function ‘XS_DBI_dispatch’:
DBI.xs:2970: warning: assignment makes pointer from integer without a cast
DBI.xs:2972: error: invalid lvalue in assignment
DBI.xs:2985: error: invalid type argument of ‘->’
DBI.xs:2989: error: invalid lvalue in assignment

Googling these error messages turned up surprisingly little. On another blog, in a post titled “Mac OS 10.5: Leopard” that mentioned difficulties with DBI, commenters suggested various solutions, but none of them worked for me. The blog author got an older version of DBI to install, but I couldn’t get that to work either.

I discovered that Xcode 3.0 (developer tools for Leopard) contains the 5.8.8 files that belong in that CORE directory. This seemed like a better option than copying the probably outdated 5.8.6 files. You can get the gigantic Xcode disk image, a whopping 1.1 gigabytes, from the Apple Developer Connection site. Registration is required through the “Member” link, and once you’re in, go to Downloads and search for Xcode.

Before installing Xcode, I cleared out the hosed CORE directory I’d been mucking with. “DeveloperTools.pkg” is what contains the perl headers, so you can probably get away with just installing that (double-click it), instead of the entire XcodeTools.pkg. It did the trick: the compiler was now finding the “GvSVn” symbol it couldn’t before. But now I got this message during make:

In file included from DBIXS.h:19,
from Perl.xs:6:
/System/Library/Perl/5.8.8/darwin-thread-multi-2level/CORE/perl.h:3993:22: error: reentr.inc: No such file or directory
In file included from DBIXS.h:19,
from Perl.xs:6:
/System/Library/Perl/5.8.8/darwin-thread-multi-2level/CORE/perl.h:3993:22: error: reentr.inc: No such file or directory
lipo: can't open input file: /var/tmp//ccQ8vbDU.out (No such file or directory)
make: *** [Perl.o] Error 1

In desperation, I downloaded the perl 5.8.8 source distribution tarball, and simply copied reentr.inc into the CORE directory. Voila! Make went to completion and I could install the module. From there, I went back into cpan to install DBD::mysql without any problems (you need mysql installed in the default location, /usr/local/mysql, of course).
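
For reference, the fix boils down to this (assuming the stock perl-5.8.8 tarball, unpacked in the current directory):

tar xzf perl-5.8.8.tar.gz
sudo cp perl-5.8.8/reentr.inc /System/Library/Perl/5.8.8/darwin-thread-multi-2level/CORE/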

The Inexact Science of Optimizing MySQL

The past week or so, I’ve been playing database administrator: monitoring load, examining the log for slow queries, tweaking parameters, rewriting queries, adding indexes, and repeating the cycle over and over again. It’s a tedious, time-consuming, and exhausting process.

Tuning is tricky because inefficiencies aren’t always immediately apparent, and there may be uncontrollable factors affecting performance in VPS environments. Here are some things I’ve learned or had to refresh myself about with respect to tuning MySQL.

Don’t assume anything about why queries are slow. Under high load, a lot of queries can pop up in the slow query log just because the database is working hard. It doesn’t necessarily mean every query needs optimization. Try to look for patterns, and expect to spend time reviewing queries that are already optimized.

Look for strange queries. Don’t recognize one? Well, maybe it shouldn’t be there. I found an expensive query for a website feature that had been obsolete for months; it was still being run even though the data wasn’t being used or displayed anywhere.

EXPLAIN is your friend. Interpreting the output makes my head pound sometimes, but it’s absolutely necessary. Invest the time and effort to understand it.

Multi-column indexes are a necessary evil. Most tables I was examining had only single-column indexes, causing complex WHERE clauses and/or ORDER BY to do costly “Using temporary; Using filesort” operations. Adding a few multi-column indexes helped a great deal. In general, I dislike multi-column indexes because they require maintenance: if you modify a complex query, you might have to add or change the index, or it will become slow as molasses. But unfortunately, that’s the tradeoff for performance.
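
A sketch of the pattern (the table, columns, and database name are made up): a query that filters on one column and sorts on another usually wants a single index covering both, in that order.

mysql -e "EXPLAIN SELECT * FROM articles WHERE status = 'published' ORDER BY published_at DESC\G" mydb
# if the Extra column shows "Using filesort", add a composite index on (filter column, sort column):
mysql -e "ALTER TABLE articles ADD INDEX idx_status_published (status, published_at)" mydb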

The “same” query can behave differently with different datasets. A query can sometimes use an index, and sometimes use a filesort, depending, for example, on constants in the WHERE clause. Be sure to use EXPLAIN with different values.

Give the query optimizer hints. Disk access is slow, especially on the VPS, so I wanted to avoid filesorts at all costs. Using STRAIGHT_JOIN forced MySQL to process an indexed table first; otherwise, it believed that a filesort would be faster in some cases. It might be, sometimes–but if you think you know better, use the hint.
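
The hint goes right after SELECT and forces MySQL to join the tables in the order you wrote them (hypothetical tables again):

mysql -e "SELECT STRAIGHT_JOIN a.title, u.name FROM articles a JOIN users u ON u.id = a.user_id WHERE a.status = 'published'" mydb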

Disk access on a VPS can be wonky. It depends a lot on what else is happening on the hardware node. This relates somewhat to the previous point: all other things being equal, MySQL could be right about a filesort being faster than an index. But it’s usually not the case in an environment where disk I/O is very costly.

Reduce the number of joins wherever possible. It doesn’t seem like a few indexed joins into “lookup” tables (basically key-value kinds of things) would affect performance that much, but amazingly, they can. Really. Break queries up into smaller, simpler ones if at all possible.

Run ANALYZE TABLE and OPTIMIZE TABLE. The manual says you don’t typically need to run these commands, but I found frequently updated tables benefit from this maintenance, even if they have only a few thousand records.
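
Both are easy to script or drop into cron (the database and table names are placeholders):

mysql -e "ANALYZE TABLE comments; OPTIMIZE TABLE comments;" mydb
# or sweep every table in the database at once:
mysqlcheck --optimize mydb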

Work iteratively. Sometimes performance problems are localized to one or two spots, but more often, inefficiencies are spread out over a lot of queries; they may become noticeable only after several passes, as server conditions fluctuate. Discovering these problem areas takes a lot of patience.

Up and Running

This domain is being served from a slicehost.com VPS account, which I’m sharing with my friend Danny. We spent a few hours installing a typical LAMP environment from Debian packages. It all went smoothly; we’re pretty much set up. Eventually, we’ll get around to putting actual content up on our respective domains, but we expect traffic and load to remain very light. So we spent most of our time dealing with the main resource limit on our “box”: memory.

There’s lots of information available on various optimizing strategies for Apache and MySQL, but it basically boiled down to 2 things for us. In Apache, we lowered the number of start/max/min servers, and loaded only modules we needed. In MySQL, we decreased the sizes of various buffers and queues, and also disabled InnoDB support. The latter alone changed the initial size of MySQL from 20 megs of resident memory to 9 megs! I’m sure we’ll continue to tweak later on, but memory usage seems fairly reasonable right now for what we’re running.
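
For the curious, the MySQL side amounted to a handful of settings along these lines (the numbers are illustrative rather than a recipe, and the conf.d include path assumes Debian’s MySQL packaging of that era):

sudo tee /etc/mysql/conf.d/lowmem.cnf <<'EOF'
[mysqld]
# we only use MyISAM tables, so drop InnoDB entirely
skip-innodb
# shrink buffers and connection limits for a small VPS
key_buffer       = 8M
query_cache_size = 8M
max_connections  = 30
EOF
sudo /etc/init.d/mysql restart

The Apache side was mostly lowering the prefork StartServers/MinSpareServers/MaxSpareServers/MaxClients values and commenting out LoadModule lines we didn’t need.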

It’s funny to reflect on how far open source software has come in general. I remember talking nine years ago with some corporate guy in New York about a piece of open source he promptly dismissed as “not ready for the enterprise” (that phrase always makes me laugh. Whose enterprise, exactly? Mine?). Back then, it was common to hear disparaging stuff like that all the time.

And now–is there anything that’s NOT enterprise ready?! Maybe the suits are happy, but I’m not sure everyone else has benefitted. As the software has grown in scalability, performance, and features, the requirements have also increased drastically. And that means the bar is higher for even a fairly standard LAMP environment, for example. Default installs of Apache, MySQL and PHP can be real memory hogs.

I’ve been discovering a few interesting alternative projects with lighter feature sets. lighttpd and thttpd are both high-performance, small-footprint web servers. And a lot of folks seem to be looking to SQLite if they don’t need the fancier features of a complex RDBMS. Maybe I will experiment with these at some point.