On Magic

Kids, I hate to break it to you, but there is no such thing as magic.

The cool whizzy stuff on your screen that impresses you: that’s the result of work. The button that was broken yesterday, that now works correctly today: also the product of work. The screen that was discussed in a meeting last week that suddenly appeared today on the development server: yup, work. When you look for a feature in the web application and it isn’t there, there’s this thing that can create it and put it there: it’s called work.

Someday we’ll all get over the mystifying aura of technology. Someday people will learn to recognize that programmers are not magicians, just workers, and that the work they do involves mundane, non-magical tasks, like wrestling with code libraries and frameworks to get them to do what they want, reorganizing files to make sure stuff exists in sensible places, and figuring out what to do when changing one piece affects three other pieces in unexpected ways.

And this means, someday, people will understand that, like any other kind of work, software development takes resources (namely, time!), not a magic wand. And no amount of “ambition” (read: wishful thinking) can really change that basic equation. You can pretend magic exists, but that doesn’t make it so. You aren’t fooling anyone. You just look childish.

When software development is recognized as work, there can be clarity about what is possible with a given set of resources. Then tasks can be sanely identified, specified, prioritized, coordinated, scheduled, executed, completed.

And then some really cool things can happen. Not magical things, but really cool things. Great things, even. The kind of great things that result from understanding, dedication, and hard work.

Adventures in Docker

After you’ve taken the time to puzzle through what exactly it is, Docker is nothing short of life-changing.

In a nutshell, Docker lets you run programs in separate “containers”, isolating the dependencies each one requires. This is similar to virtualization solutions like VMware and VirtualBox, but Docker is a much more fine-grained, customizable tool that operates at a different level.

It took me a week of experimentation to develop a firm grasp of the Docker concepts and how all its pieces work together in practice. I’m now using it at work for development, and I hope to be setting up a configuration for staging soon.

This is a short write-up of what I’ve learned so far.

The Concepts

At first glance, almost everyone (including me) mistakes Docker for just another virtualization technology. It’s not. Virtualization lets you run a machine within a machine. A Docker container is more subtle: it segments or isolates part of your existing Linux operating system.

A container consists of a separate memory space and filesystem (taken from something called an image). A container actually uses the same kernel as your “host” system. Some fancy Linux kernel technologies allow all of this to happen; there is no hardware virtualization going on.
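
One quick way to convince yourself of this (a minimal sketch; the image name is just an example) is to compare the kernel version on the host with the one reported inside a container:

# on the host
uname -r

# inside a container started from an Ubuntu image: it reports the same kernel
docker run --rm ubuntu:14.04 uname -r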

You start using Docker by creating an image or using an existing one made by someone else. An image is a filesystem snapshot. You can build images in an automated fashion using a Dockerfile, which allows you to “script” the running of commands to do things like install software and copy files around.
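
As a rough sketch (the contents here are illustrative, not a real project), a Dockerfile and the build step might look like this:

# write a minimal Dockerfile: start from a base image, install software, copy files in
cat > Dockerfile <<'EOF'
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y ruby
COPY . /app
CMD ["ruby", "/app/server.rb"]
EOF

# build an image from it and tag it with a name
docker build -t myapp .

# list local images; "myapp" should now show up
docker images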

When you launch a container, Docker makes a “copy” of an image (well, not really, but we’ll pretend for now) and uses that to start a process. The fact that the filesystem is a “copy” is important: if you launch 2 containers using the same image, and the processes in each modify files, those changes happen in each CONTAINER’s filesystem; the image doesn’t change. Processes inside containers only see what’s in the container, so they are isolated from one another. This allows complete dependency separation, at the level of processes.
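
For example (a sketch, reusing the hypothetical “myapp” image from above), you can start two containers from the same image and change files in one without touching the other, or the image:

# start two containers from the same image; each gets its own writable filesystem
# (running a long sleep just to keep them alive for the demo)
docker run -d --name app1 myapp sleep 600
docker run -d --name app2 myapp sleep 600

# create a file inside the first container only
docker exec app1 touch /tmp/only-in-app1

# the file shows up in app1's filesystem but not in app2's, and the image is unchanged
docker exec app1 ls /tmp
docker exec app2 ls /tmp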

You can do a lot with containers. You can run multiple processes in them (though this is discouraged). You can start another process in an already running container, so it can interact with the already running process. After a container has stopped (by halting the process running in it), you can start it back up again. Again, any file changes made are in the container’s filesystem; the image remains unchanged.
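
Concretely (again just a sketch using the hypothetical “app1” container), poking into a running container and restarting a stopped one looks something like:

# open an interactive shell alongside the process already running in the container
docker exec -it app1 bash

# stop the container (halting its process), then start it back up again;
# files it wrote earlier are still there, because they live in the container's filesystem
docker stop app1
docker start app1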

Issues

There are three issues I’ve personally encountered so far in using Docker:

1) Persistent Storage

Containers are meant to be transient. In a well-designed setup, you should, theoretically, be able to spin up new containers and discard old ones all the time without losing any data. This means that your persistent storage has to live somewhere else, in one of two places: a special “data container” or a directory mounted from the host filesystem.

Data containers were too complicated and weird, and I couldn’t get them to work the way I expected, so I mounted directories instead. This has the nice side effect that, as Dockerized processes change files, you can see those changes immediately from the host without having to do anything special to access them. I’m not sure, however, what “best practices” are concerning storage.
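
Mounting a host directory is just a flag on docker run. A sketch, with illustrative paths and names:

# mount the host's ./data directory at /var/lib/mysql inside the container,
# so the database files survive even if this container is thrown away
docker run -d --name db \
  -e MYSQL_ROOT_PASSWORD=secret \
  -v $(pwd)/data:/var/lib/mysql \
  mysql:5.6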

2) Multi-Container Applications

Many modern applications consist of several processes or require other applications. For example, my project at work consists of a Rails web app, a delayed_job worker process, an Apache Solr instance, and a MySQL database.

Since Docker strongly recommends a one-process-per-container configuration, you need a way to coordinate a set of running processes and make sure they can communicate with one another. Docker Compose does this, allowing you to easily specify whether containers should be able to open connections to each other’s network ports, share filesystems, etc.
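
As a rough sketch of what this looks like for a setup like mine (service names, images, paths, and ports are all illustrative):

# docker-compose.yml: one service per process, with links between them
cat > docker-compose.yml <<'EOF'
web:
  build: .
  links:
    - db
    - solr
  volumes:
    - .:/app
  ports:
    - "3000:3000"
db:
  image: mysql:5.6
  environment:
    MYSQL_ROOT_PASSWORD: secret
solr:
  image: solr
EOF

# build and start everything in the background, then check what's running
docker-compose up -d
docker-compose ps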

Currently, Docker Compose is not yet considered “production ready.” While it addresses the need to orchestrate processes, there is also the problem of monitoring and restarting processes as needed. It’s not clear to me yet what the best tool is for doing this (it may even be a completely orthogonal concern to what Docker does).

3) Running Commands

Sometimes you need to use the Rails CLI to do things like run database migrations or a rake task. Running commands takes a bit of extra effort, since they need to happen in a container. A slight complication is whether to run the command in the existing Rails container or to start another one entirely for that new process. It’s a bit of typing, which is annoying.
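
In practice that means prefixing everything with a docker or docker-compose invocation. A sketch, assuming a Compose setup like the one above (the container name is illustrative):

# run a one-off command in a fresh container based on the web service, removed when done
docker-compose run --rm web bundle exec rake db:migrate

# or run it inside the web container that's already up
docker exec -it myproject_web_1 bundle exec rake db:migrate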

The Payoffs

How is Docker life-changing? There are several scenarios I encounter ALL THE TIME that Docker helps tremendously with.

* On the same machine, you can run several web applications, which each require different versions of, say, Ruby, Rails, MySQL and Apache, without dealing with software conflicts and incompatibilities.

* Related to the previous point, Docker lets you more easily experiment with different software without polluting your base system with a lot of crap you might never use again.

* There is MUCH less overhead and less wasted memory usage than with virtualization. If you allocate 1GB of RAM to a virtual machine but only use 512MB, the other half goes to waste. Docker containers only use as much memory as the processes themselves take up, plus a small bit of overhead. Since Docker uses unionfs to “copy” (really, overlay) images to container filesystems, the actual disk space used isn’t as much as you might think.

* Since Docker containers are entirely self-contained, they can be deployed in development, staging, and production environments with almost NO differences among them, thereby minimizing the problems that usually arise.

For me, a lot of the benefits boil down to this: virtualization is amazing, but in practice, I don’t use it that much because it’s too heavyweight and too coarse for my needs. A VirtualBox VM is a whole other machine that I need to think about. By working at the level of Linux processes, Docker is exactly the right kind of tool for managing application dependencies.

A cautionary note: there’s a lot of buzz right now around containers, including efforts at defining vendor-neutral standards, such as appc. Although Docker releases have been rapid and it is already seeing a lot of adoption, it still feels bleeding-edge to me. It’s exciting, but in a few years it’s entirely possible that another container solution will surpass it to become the de facto standard. The playing field is just too new, which means Docker comes with some risk. But it’s well worth exploring at this early stage, even if only to get a taste of the new ideas shaping systems configuration and deployment, ideas that are definitely here to stay.

Software Old and New

Waaaay back in middle school, I used WordPerfect 5.1 to type up book reports and other homework assignments. This was on a Tandy 1000, an early IBM-compatible home computer. Having never used a PC before, much less word processing software, it took me some time to learn. WordPerfect came with a plastic template you laid above the keyboard’s F-keys, which told you what the various key combinations did. In my ignorance, I hit Enter twice at the end of every line of text to get double line spacing, which, of course, made editing and revising a nightmare. My uncle, a computer whiz, laughed when he saw this and taught me how to set the line spacing the right way. It amazed me that the computer could reflow the text automatically.

A lesson I learned from this was that the manual that came with the 3.5″ disks was pretty darn useful.

Back then, in the late 80s and early 90s, software was a specialized tool or instrument. I was fortunate to have a computer at home. Not everyone did. To use it proficiently, you had to do some learning. This was expected. It wasn’t WordPerfect’s fault that I didn’t even know line spacing existed as a feature. Like learning any powerful tool, it required some time and effort to develop the skills.

There’s been a drastic paradigm shift over the last 25 years. Software has become ubiquitous. It’s no longer just the programs you run on your home or work computer. It’s on our phones and tablets. It’s what web applications are made of. It’s in cars, ATMs, information kiosks, and home appliances. Commercial software rarely comes with user manuals anymore. My smartphone came with a single sheet of paper showing how to turn it on. When there are Android updates, I don’t get a book that explains the additional gestures it now recognizes, what the new icons mean, or how the menus have been restructured. I’m expected to just poke around the new interface until I can do what I’m trying to do. When you visit a website you haven’t been to before, you are similarly expected to already know how to navigate it. This is possible because there are common conventions around software features and interface design, so that, when using a new piece of software, you are not starting completely from scratch.

The consequence of this radical shift is that if you can’t immediately use a new piece of software, there are 2 possible explanations: 1) you are lacking a general “digital literacy” which most people are understood to have (as opposed to specialized knowledge), or 2) the software is crappy.

We take pity on digital illiterates, but we have no sympathy or patience for crappy software. “Why does it take me 3 clicks to get to X? Why doesn’t this application do Y? Why doesn’t the icon resemble this, instead of that?” These complaints are commonplace. Increasingly, it doesn’t seem to matter what the software actually does or what the level of its inherent complexity might be. The pace of technological change and the pressures of high-tech business have made it important for users to be able to use software immediately, and to be satisfied enough that they don’t run off to a competitor’s product. Our intolerance is a direct result of this frenzied climate, which has taken user-friendliness to the extreme of trying to be all things to all people (or at least, as many things to as many people as possible).

The problem is that there is a lot of variability in user preferences, opinions, and needs. The more that software tries to accommodate a wide variety of these concerns, the less useful it becomes as a tool. I think you see this especially in many mobile apps and websites. They DO very little, but they go out of their way to make it easy to do it. This focus on ease is deceptive. It leads to a false sense of empowerment. We are surrounded by software everywhere that appears to enable us to do all sorts of things, but we actually don’t understand enough to know how to operate things skillfully. We just click and swipe, click and swipe, and get frustrated when magic doesn’t happen.

Using technology as a tool can save significant work and allow us to do things not possible before. But that doesn’t necessarily imply that it is or should be easy. It’s a subtle but important difference. Knowing how to fly an airplane enables you to traverse thousands of miles in a few hours, but that doesn’t mean operating one is easy, or that it should be. One should be trained to be a skilled pilot, so that she can make the machine do all the complex things it needs to do, in a variety of situations. One shouldn’t expect a cockpit that lets anyone marginally fly a plane. Because how far is that going to get you, really?

Variable assignment quirks in Ruby

# irb
irb(main):001:0> puts x
NameError: undefined local variable or method `x' for main:Object
	from (irb):1
	from /usr/bin/irb:11:in `<main>'

Okay. I expected that.

irb(main):002:0> x = x + 1
NoMethodError: undefined method `+' for nil:NilClass
	from (irb):2
	from /usr/bin/irb:11:in `<main>'

Whoa. Did NOT expect that! Wait, so does that mean…?

irb(main):003:0> z = z
=> nil

Hmm. Okay. Well, time to call it a day.

Managing Dependencies in Python vs Ruby

Ruby’s Bundler tool is amazing.

With Python projects, the standard way of doing things is to set up a virtualenv and use pip to install packages from PyPI specified in a requirements.txt file. This way, each project’s dependencies are kept separate, installed in their own directories in an isolated, sandboxed environment.
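
The usual incantation looks something like this (a sketch; requirements.txt is whatever list of packages your project declares):

# create an isolated environment for this project and activate it
virtualenv venv
source venv/bin/activate

# install the project's dependencies from PyPI into the virtualenv
pip install -r requirements.txt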

This works pretty well. But sometimes, when I am debugging a third party package, I want to be able to get the source code from git and use it instead of the package from PyPI, so I can make changes, troubleshoot, experiment, etc. This is a pain in the butt. You have to remove the installed package and either 1) install manually (and repeatedly, as you work) from your cloned repository, or 2) add the repository directory to your Python library path somehow. Then you have to undo these changes to go back to using the PyPI package. Either way, it’s clunky and annoying.

Ruby’s Bundler tool has a very different approach to dependencies. It, too, downloads appropriate versions of gems (which is what packages are called), which are listed in a Gemfile. But unlike pip, it can store multiple versions of a gem, and it even lets you specify that a gem lives in a GitHub or local repository; moreover, it makes the right packages available each time you run your program! That is, each time you use the “bundle exec” wrapper to run Rails or anything else, it sets up a custom set of directories for Ruby’s library path that point ONLY to the versions you want, ignoring the others.

I did this today when trying to pin down the source of some deprecation warnings I was seeing after some gem upgrades. My Gemfile had these lines in it:

gem 'sunspot_rails', '~> 2.1.0'
gem 'sunspot_solr', '~> 2.1.0'

I cloned the sunspot repository containing those gems. Then I ran:

# bundle config local.sunspot_rails ~/sunspot
# bundle config local.sunspot_solr ~/sunspot

And changed the Gemfile lines:

gem 'sunspot_rails', :github => 'sunspot/sunspot_rails', :branch => 'master'
gem 'sunspot_solr', :github => 'sunspot/sunspot_solr', :branch => 'master'

Finally, I ran “bundler update”. That’s it! I could make changes to my cloned repository, restart Rails, and see the changes immediately.

When I was done messing around, I changed my Gemfile back, ran “bundler update” again, and I was back to using my original gems.

Being able to work so easily with third party code allowed me to quickly figure out where the deprecated calls were being made and file an issue with the sunspot project.