Managing Dependencies in Python vs Ruby

Ruby’s Bundler tool is amazing.

With Python projects, the standard way of doing things is to set up a virtualenv and use pip to install the packages from PyPI that are listed in a requirements.txt file. This way, each project’s dependencies are kept separate, installed in their own directories in an isolated, sandboxed environment.

This works pretty well. But sometimes, when I am debugging a third-party package, I want to be able to get the source code from git and use it instead of the package from PyPI, so I can make changes, troubleshoot, experiment, etc. This is a pain in the butt. You have to remove the installed package and either 1) install manually (and repeatedly, as you work) from your cloned repository, or 2) add the repository directory to your Python library path somehow. Then you have to undo these changes to go back to using the PyPI package. Either way, it’s clunky and annoying.

Ruby’s Bundler tool takes a very different approach to dependencies. It, too, downloads the appropriate versions of gems (which is what packages are called) listed in a Gemfile. But unlike pip, it can store multiple versions of a gem, and even lets you specify that a gem lives in a GitHub or local repository; moreover, it makes the right packages available each time you run your program! That is, each time you use the “bundle exec” wrapper to run Rails or anything else, it sets up a custom set of directories for Ruby’s library path that point ONLY to the versions you want, ignoring the others.

I did this today when trying to pin down the source of some deprecation warnings I was seeing after some gem upgrades. My Gemfile had these lines in it:

gem 'sunspot_rails', '~> 2.1.0'
gem 'sunspot_solr', '~> 2.1.0'

I cloned the sunspot repository containing those gems. Then I ran:

# bundle config local.sunspot_rails ~/sunspot
# bundle config local.sunspot_solr ~/sunspot

And changed the Gemfile lines:

gem 'sunspot_rails', :github => 'sunspot/sunspot_rails', :branch => 'master'
gem 'sunspot_solr', :github => 'sunspot/sunspot_solr', :branch => 'master'

Finally, I ran “bundle update”. That’s it! I could make changes to my cloned repository, restart Rails, and see the changes immediately.

When I was done messing around, I changed my Gemfile back, ran “bundle update” again, and I was back to using my original gems.

Being able to work so easily with third-party code allowed me to quickly figure out where the deprecated calls were being made and file an issue with the sunspot project.
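
As an aside on the '~> 2.1.0' constraints in the Gemfile lines above: here is a minimal sketch of what that operator accepts, using RubyGems' own Gem::Requirement and Gem::Version classes. This is not Bundler's actual resolver, and the list of installed versions and the pick_version helper are made up for illustration.

```ruby
require 'rubygems'

# Hypothetical set of versions of one gem sitting on disk side by side.
INSTALLED = %w[2.0.0 2.1.0 2.1.1 2.2.0].map { |v| Gem::Version.new(v) }

# Pick the newest installed version that satisfies a Gemfile-style constraint.
# Returns nil when nothing matches.
def pick_version(constraint, installed = INSTALLED)
  requirement = Gem::Requirement.new(constraint)
  installed.select { |version| requirement.satisfied_by?(version) }.max
end

puts pick_version('~> 2.1.0')  # 2.1.1 (allows 2.1.x, but not 2.2.0)
puts pick_version('~> 2.0')    # 2.2.0 (allows any 2.x)
```

The point is only that several versions can coexist and the constraint decides which one gets used; the real tool also has to reconcile the constraints of every gem at once.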

Installing all the Software from an Old System

I recently got a new laptop (more on this in another post, perhaps). After I installed Debian Linux from scratch and copied my home directory over to it, I needed to install all the software I used on my old machine. Here’s one way to do this.

On your old system, create a list of your installed packages:

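dpkg -l > old_packages
# keep only packages whose status is "installed" (second character "i");
# this also skips dpkg's header lines
awk '/^.i/ {print $2}' old_packages | sort > old_packages_list

Copy the file old_packages_list to your new system. Then, on your new system, run:

dpkg -l > new_packages
awk '/^.i/ {print $2}' new_packages | sort > new_packages_list
# keep only the lines that appear in the old list but not the new one
comm -2 -3 old_packages_list new_packages_list > missing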

The file missing now contains a list of packages that existed on your old system that don’t exist on your new one.

If you don’t work with many library packages directly, you can filter them out by reverse grepping for ones that begin with “lib”. (These are installed automatically as dependencies for other things; you don’t usually need to worry about them unless you need them for development.)

grep -v '^lib' missing > missing_apps_only
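
If the shell pipeline feels opaque, the same two steps can be sketched in Ruby. This is only an illustration: the package names are invented, and in real use the two arrays would be read from old_packages_list and new_packages_list.

```ruby
# Hypothetical package names standing in for the two lists.
OLD_PACKAGES = %w[git libssl3 vim xfburn]
NEW_PACKAGES = %w[git libssl3]

# Same idea as `comm -2 -3 old new`: names that appear only in the old list.
def missing_packages(old_list, new_list)
  old_list - new_list
end

# Same idea as filtering out lines that start with "lib".
def apps_only(packages)
  packages.reject { |name| name.start_with?('lib') }
end

missing = missing_packages(OLD_PACKAGES, NEW_PACKAGES)
puts apps_only(missing)
```

Unlike comm, the array difference does not care whether the lists are sorted.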

Going through this list, I could quickly identify all the software I recognized and wanted to install, and skip packages no longer useful to me (programs to burn CDs and DVDs, for example, since my new machine doesn’t have a drive) or applications I only played with once or twice. And I didn’t install things I didn’t recognize—I can always install them later.

This gets your new computer up and running in short order.

Demystifying Ruby Functions

(NOTE: Edited slightly on 6/6/2015 for better clarity about the behavior of Object in different versions of Ruby. This post mostly applies to version 1.9.)

Something that confused me about Ruby at first was why tutorials, articles, and books kept referring to Ruby “methods” when talking about functions. For example:

def hello
 puts "hi there"
end

Since hello is not being defined as part of a class or module, it stands on its own. Any reasonable programmer would call this a function, not a method. Initially, I took this as some odd quirk of the Ruby community. They include some pretty strange folks, after all.

Turns out there’s a good reason for this terminology. It IS a method.

As Pat Shaughnessy explains in this article, when you execute def hello at the top level, it gets added as an instance method to the Object class. You can verify this yourself:

def hello
 puts "hi there"
end
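# this will print 'true'; the second argument tells respond_to? to include
# private methods, which matters on Ruby versions where hello is private
puts Object.respond_to?(:hello, true)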

When you call hello, the method is invoked on a receiver (i.e. self) that is an object instance called “main”, whose class is Object.

def hello
  puts "hi there"
  puts "self is #{self}, its class is #{self.class}"
end
# this prints 'self is main, its class is Object'
hello

This explains why defining methods at the top level appears to make them globally accessible. What actually happens is that, when you call hello from any class or module, Ruby will look for it up the class hierarchy until it finds the method in Object.
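
Here is a small sketch of that lookup in action. The Greeter class is made up for the example, and hello returns a string instead of printing so the result is easy to check.

```ruby
# Defined at the top level, so it ends up on Object.
def hello
  "hi there"
end

# A hypothetical class that never defines hello itself.
class Greeter
  def greet
    hello  # not found in Greeter, so Ruby keeps looking and finds it in Object
  end
end

puts Greeter.new.greet                   # hi there
puts Greeter.ancestors.include?(Object)  # true
```

Nothing was imported or passed into Greeter; the method is reachable simply because every Greeter instance is also an Object.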

This also explains something else not immediately obvious to the newbie Rubyist: things such as puts are not builtins or special cases, but just regular methods in the Kernel module, which is mixed into Object. Whether you call puts at the top level or from inside a class or module, Ruby will look up the ancestor chain and find it in Kernel.

Kernel’s members include things that might surprise you, like require!
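
You can ask Ruby where these methods live. A quick sketch, assuming a plain Ruby interpreter with nothing else patched in; owner_of is a hypothetical helper I made up for this example:

```ruby
# Report which class or module a method available on the current self comes from.
def owner_of(name)
  method(name).owner
end

puts Kernel.class             # Module
puts Object.include?(Kernel)  # true
puts owner_of(:puts)          # Kernel
puts owner_of(:require)       # Kernel
```

So Kernel is a module mixed into Object rather than a superclass, and puts and require are found through it like any other method.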

These simple mechanisms, namely the implicit “main”, the Object class, and the Kernel module, are stunningly elegant design ideas (and, according to Shaughnessy, apparently adapted from Smalltalk). Together they allow Ruby to provide the convenient calling syntax of functions, while remaining object-oriented down to its very core.

(Compare this to Python, which does support “real” functions and differentiates them from methods. The distinction is less important in practice, as Python treats them both as “callables”, i.e. arbitrary executable things.)

One small note: Shaughnessy writes that “All Ruby functions are actually private methods of Object.” In his example, he prints Object.private_instance_methods and finds his newly defined function there. This is true in Ruby 2.1, but not in 1.9.3, where functions are public:

def hello
 puts "hi there"
end
# In Ruby 1.9.3, this will print 'false' and then 'true'
puts Object.private_instance_methods.member? :hello
puts Object.methods.member? :hello

I am just now beginning to realize how much Ruby is truly an object-oriented language through and through. Parts of the language design might seem ad hoc and magical, which is indeed part of Ruby’s beauty and allure, but some very consistent and elegant mechanisms under the hood are what make them work.

(I’ve eagerly ordered Shaughnessy’s book, Ruby Under a Microscope, which looks terrific.)

A VIAF Reconciliation Service for OpenRefine

OpenRefine is a wonderful tool my coworkers have been using to clean data for my project at work. Our workflow has been nice and simple: they take a CSV dump from a database, transform the data in OpenRefine, and export it as CSV. I write scripts to detect the changes and update the database with the new data.
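
To give a rough idea of the “detect the changes” step, here is a minimal sketch in Ruby. It is not the script I actually use: the column names, the sample rows, and the rows_by_id and detect_changes helpers are all invented for illustration, and it assumes every row carries a stable id column.

```ruby
require 'csv'

# Hypothetical database dump (before) and OpenRefine export (after).
ORIGINAL_CSV = <<~EOS
  id,name
  1,"Smith, J."
  2,Acme Corp
EOS

CLEANED_CSV = <<~EOS
  id,name
  1,"Smith, John"
  2,Acme Corp
EOS

# Index rows by id so the two files can be compared record by record.
def rows_by_id(csv_text)
  CSV.parse(csv_text, headers: true).each_with_object({}) do |row, index|
    index[row['id']] = row.to_h
  end
end

# Returns { id => { column => [old_value, new_value] } } for changed cells only.
def detect_changes(original_text, cleaned_text)
  original = rows_by_id(original_text)
  rows_by_id(cleaned_text).each_with_object({}) do |(id, cleaned_row), changes|
    original_row = original.fetch(id, {})
    diff = cleaned_row.reject { |column, value| original_row[column] == value }
    next if diff.empty?
    changes[id] = diff.to_h { |column, value| [column, [original_row[column], value]] }
  end
end

p detect_changes(ORIGINAL_CSV, CLEANED_CSV)
```

Each entry in the result maps directly onto an update for one record, which keeps the database side of the workflow simple.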

We have a need, in the next few months, to reconcile the names of various individuals and organizations with standard “universal” identifiers for them in the Virtual International Authority File. The tricky part is that any given name in our system might have several candidates in VIAF, so it can’t be a fully automated process. A human being needs to look at them and make a decision. OpenRefine allows you to do this reconciliation, and also provides an interface that lets you choose among candidates.

Communicating with VIAF is not built in, though. Roderic D. M. Page wrote a VIAF reconciliation service, and it’s publicly accessible at the address listed on the linked page (the PHP source code is available here). It works very nicely.

I wanted to write my own version for two reasons: 1) I needed it to support the different name types in VIAF, and 2) I wanted to host it myself, in case I needed to make large numbers of queries, so as not to be an obnoxious burden on Page’s server.

The project is called refine_viaf and the source code is available at https://github.com/codeforkjeff/refine_viaf.

For those who just want to use it without hosting their own installation, I’ve also made the service publicly accessible at http://refine.codefork.com, where there are instructions on how to configure OpenRefine to use it.

Arriving Late to the Party

I started playing with Ruby this past week, at the suggestion of a coworker I respect who thinks highly of the language, and of Rails. I’ve found Programming Ruby 1.9 & 2.0 to be an excellent way to get started.

Now, it’s only been a week, but so far? I really love it.

This surprised me.

Some first thoughts and impressions, subject to change:

1) Ruby is VERY expressive. I’m astounded by what you can do in a few short lines of code. I wasn’t initially thrilled about some of its uses of character symbols, obviously borrowed from Perl, but these are actually pretty judicious and not as bad as I expected.

2) There’s a commonplace idea that Python and Ruby are redundant: if you know one, there’s not a lot of reason to learn the other. I can see this, as they do have many overlapping features and roles, but their philosophies could not be more different. So from an experiential standpoint rather than a business one, they’re both worth learning.

3) Things that Ruby does better than Python: the object orientation is stronger (access controls, single inheritance with a powerful mixin mechanism, most[?] operators are methods); blocks are way more powerful than lambdas; regexes are native constructs; Ruby’s symbols are a nice feature taken from Lisp.

4) I hate that ‘require’ autoimports things into your namespace, like Perl does. This is where I much prefer Python’s ‘import’ statement. For the love of God, DO NOT TOUCH MY NAMESPACE unless I say so. (EDIT: this is actually incorrect! Files that get required typically define modules, which are constants in a global namespace, which is quite different from autoimporting names into the current namespace, like Perl.)

5) The fact that strings are mutable is one of those things that filled me with horror, but I’m not sure it actually matters that much in practice.

6) If Why’s Poignant Guide to Ruby doesn’t inspire you to learn and use Ruby, you may not be human.
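
A few of the points above (operators as methods, blocks, native regexes, mutable strings) are easier to see in code. This is just a small sketch; the helper names are made up for the example.

```ruby
# Operators are ordinary method calls: 1 + 2 is the same as 1.+(2).
puts 1 + 2 == 1.+(2)  # true

# Blocks: chunks of code handed to a method, here to select and then transform.
def doubled_evens(numbers)
  numbers.select { |n| n.even? }.map { |n| n * 2 }
end

# Regexes are literals built into the syntax; String#[] accepts one directly.
def year_in(text)
  text[/\d{4}/]
end

# Strings are mutable: both variables refer to the same object, so appending
# through one name is visible through the other.
def mutation_demo
  original = String.new("chunky")
  same_object = original
  same_object << " bacon"
  original
end

p doubled_evens([1, 2, 3, 4])       # [4, 8]
puts year_in("edited on 6/6/2015")  # 2015
puts mutation_demo                  # chunky bacon
```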

As a learning exercise, I rewrote a short Python script that watches RSS feeds for certain keywords. It’s available on GitHub. Even as a newcomer (albeit one with Perl and Python experience), writing the code was intuitive and painless. In fact, it was a strangely pleasing experience. The code could be improved, I’m sure, but arriving at a decent, clean first pass wasn’t hard to do at all.

This feels like a good time to look at Ruby. The initial frenzy and excitement over Ruby, and Rails, in the mid-to-late 2000s has died down considerably, displaced by the current wave of languages focusing on concurrency. Yet its performance, once a huge blight, has been quietly improving from 1.8 to 1.9 to 2.0, and the ecosystem (gems, rbenv, rake) seems very mature.

Now that it’s no longer sexy, it can focus on doing what it was designed for in the first place: making programmers happier to be writing code. Languages these days are emphasizing features and power, but how many actually tout your happiness as their primary reason for existence?