Installing all the Software from an Old System

I recently got a new laptop (more on this in another post, perhaps). After I installed Debian Linux from scratch and copied my home directory over to it, I needed to install all the software I used on my old machine. Here’s one way to do this.

On your old system, create a list of your installed packages:

dpkg -l > old_packages
# keep only installed packages ('ii' lines), which also skips the dpkg -l header
awk '/^ii/ {print $2}' old_packages | sort > old_packages_list

Copy the file old_packages_list to your new system. Then, on your new system, run:

dpkg -l > new_packages
# keep only installed packages ('ii' lines), which also skips the dpkg -l header
awk '/^ii/ {print $2}' new_packages | sort > new_packages_list
comm -2 -3 old_packages_list new_packages_list > missing

The file missing now contains a list of packages that existed on your old system that don’t exist on your new one.

If you don’t work with many library packages directly, you can filter them out with an inverted grep that drops names beginning with “lib”. (These are installed automatically as dependencies for other things; you don’t usually need to worry about them unless you need them for development.)

grep -v '^lib' missing > missing_apps_only

Going through this list, I could quickly identify all the software I recognized and wanted to install, and skip packages no longer useful to me (programs to burn CDs and DVDs, for example, since my new machine doesn’t have a drive) or applications I only played with once or twice. And I didn’t install things I didn’t recognize—I can always install them later.

This gets your new computer up and running in short order.

Demystifying Ruby Functions

(NOTE: Edited slightly on 6/6/2015 for better clarity about the behavior of Object in different versions of Ruby. This post mostly applies to version 1.9.)

Something that confused me about Ruby at first was why tutorials, articles, and books kept referring to Ruby “methods” when talking about functions. For example:

def hello
  puts "hi there"
end

Since hello is not being defined as part of a class or module, it stands on its own. Any reasonable programmer would call this a function, not a method. Initially, I took this as some odd quirk of the Ruby community. They include some pretty strange folks, after all.

Turns out there’s a good reason for this terminology. It IS a method.

As Pat Shaughnessy explains in this article, when you execute def hello at the top level, it gets added as an instance method to the Object class. You can verify this yourself:

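def hello
  puts "hi there"
end
# this will print 'true'; passing true as the second argument makes
# respond_to? count private methods too, which newer Rubies need
puts Object.respond_to?(:hello, true)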

When you call hello, the method is invoked on a receiver (i.e. self) that is an object instance called “main”, whose class is Object.

def hello
  puts "hi there"
  puts "self is #{self}, its class is #{self.class}"
end
# this prints 'self is main, its class is Object'
hello

This explains why defining methods at the top level appears to make them globally accessible. What actually happens is that, when you call hello from any class or module, Ruby will look for it up the class hierarchy until it finds the method in Object.
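Here is a minimal sketch of that lookup. The names greeting and Robot are hypothetical, made up for illustration, and are not from Shaughnessy’s article:

```ruby
# Hypothetical names (greeting, Robot) chosen for illustration only.
def greeting
  "hi there"
end

class Robot
  def speak
    # No explicit receiver: Ruby searches Robot, then its ancestors,
    # and finds greeting on Object.
    greeting
  end
end

puts Robot.new.speak                   # prints 'hi there'
puts Robot.ancestors.include?(Object)  # prints 'true'
```

Note that the call inside speak has no explicit receiver, which is why it works whether the top-level method ends up public or private.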

This also explains something else not immediately obvious to the newbie rubyist: things such as puts are not builtins or special cases, but just regular methods in the Kernel module, which is mixed into Object and so sits in its ancestor chain. Whether you call puts at the top level or from inside a class or module, Ruby will look up the class hierarchy and find it in Kernel.

Kernel’s members include things that might surprise you, like require!
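You can poke at this yourself. The sketch below uses only standard reflection calls (ancestors, method, owner). I would expect the last line to report Kernel on a stock install, but that is an assumption: tools that patch require could change the answer.

```ruby
# Kernel is a module mixed into Object, so it shows up among Object's ancestors.
puts Kernel.class                       # prints 'Module'
puts Object.ancestors.include?(Kernel)  # prints 'true'

# method(...) looks a method up on self (main here);
# owner reports the class or module where it is defined.
puts method(:puts).owner                # prints 'Kernel'
puts method(:require).owner             # assumed to be Kernel on a stock install
```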

These simple mechanisms (the implicit “main”, the Object class, and the Kernel module) are stunningly elegant design ideas, adapted from Smalltalk according to Shaughnessy. They allow Ruby to provide the convenient calling syntax of functions, while remaining object-oriented down to its very core.

(Compare this to Python, which does support “real” functions and differentiates them from methods. The distinction is less important in practice, as Python treats them both as “callables”, i.e. arbitrary executable things.)

One small note: Shaughnessy writes that “All Ruby functions are actually private methods of Object.” In his example, he prints Object.private_instance_methods and finds his newly defined function there. This is true in Ruby 2.1, but not in 1.9.3, where functions are public:

def hello
  puts "hi there"
end
# In Ruby 1.9.3, this will print 'false' and then 'true'
puts Object.private_instance_methods.member? :hello
puts Object.methods.member? :hello
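Since the answer can differ between Ruby versions and environments, here is a small sketch that just reports what your own interpreter does instead of assuming either result. The visibility variable is my own name for illustration:

```ruby
def hello
  "hi there"
end

# Ask Object directly which visibility the new method received.
visibility =
  if Object.private_method_defined?(:hello)
    :private
  elsif Object.public_method_defined?(:hello)
    :public
  else
    :unknown
  end

puts "Ruby #{RUBY_VERSION}: hello is a #{visibility} instance method of Object"
```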

I am just now beginning to realize how much Ruby is truly an object oriented language through and through. Parts of the language design might seem ad hoc and magical, which is indeed part of Ruby’s beauty and allure, but some very consistent and elegant mechanisms under the hood are what make them work.

(I’ve eagerly ordered Shaughnessy’s book, Ruby Under a Microscope, which looks terrific.)

A VIAF Reconciliation Service for OpenRefine

OpenRefine is a wonderful tool my coworkers have been using to clean data for my project at work. Our workflow has been nice and simple: they take a CSV dump from a database, transform the data in OpenRefine, and export it as CSV. I write scripts to detect the changes and update the database with the new data.

Over the next few months, we need to reconcile the names of various individuals and organizations with standard “universal” identifiers for them in the Virtual International Authority File (VIAF). The tricky part is that any given name in our system might have several candidates in VIAF, so it can’t be a fully automated process. A human being needs to look at them and make a decision. OpenRefine allows you to do this reconciliation, and also provides an interface that lets you choose among candidates.

Communicating with VIAF is not built in, though. Roderic D. M. Page wrote a VIAF reconciliation service, and it’s publicly accessible at the address listed on the linked page (the PHP source code is available here). It works very nicely.

I wanted to write my own version for two reasons: 1) I needed it to support the different name types in VIAF, and 2) I wanted to host it myself, in case I needed to make large numbers of queries, so as not to be an obnoxious burden on Page’s server.

The project is called refine_viaf and the source code is available at https://github.com/codeforkjeff/refine_viaf.

For those who just want to use it without hosting their own installation, I’ve also made the service publicly accessible at http://refine.codefork.com, where there are instructions on how to configure OpenRefine to use it.

Arriving Late to the Party

I started playing with Ruby this past week, at the suggestion of a coworker I respect who thinks highly of the language, and of Rails. I’ve found Programming Ruby 1.9 & 2.0 to be an excellent way to get started.

Now, it’s only been a week, but so far? I really love it.

This surprised me.

Some first thoughts and impressions, subject to change:

1) Ruby is VERY expressive. I’m astounded by what you can do in a few short lines of code. I wasn’t initially thrilled about some of its uses of character symbols, obviously borrowed from Perl, but these are actually pretty judicious and not as bad as I expected.

2) There’s a commonplace idea that Python and Ruby are redundant: if you know one, there’s not a lot of reason to learn the other. I can see this, as they do have many overlapping features and roles, but their philosophies could not be more different. So from an experiential standpoint rather than a business one, they’re both worth learning.

3) Things that Ruby does better than Python: the object orientation is stronger (access controls, single inheritance with a powerful mixin mechanism, most[?] operators are methods); blocks are way more powerful than lambdas; regexes are native constructs; Ruby’s symbols are a nice feature taken from Lisp.

4) I hate that ‘require’ autoimports things into your namespace, like Perl does. This is where I much prefer Python’s ‘import’ statement. For the love of God, DO NOT TOUCH MY NAMESPACE unless I say so. (EDIT: this is actually incorrect! Files that get required typically define modules, which are constants in a global namespace, which is quite different from autoimporting names into the current namespace, like Perl.)

5) The fact that strings are mutable is one of those things that filled me with horror, but I’m not sure it actually matters that much in practice.

6) If Why’s Poignant Guide to Ruby doesn’t inspire you to learn and use Ruby, you may not be human.
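To make point 3 a bit more concrete, here is a rough sketch of a few of those features. The values are made up for illustration:

```ruby
# Operators are ordinary methods, so 1 + 2 can be written as a method call.
sum = 1.+(2)

# A block is passed to map and run once per element.
squares = [1, 2, 3].map { |n| n * n }

# Regex literals are part of the syntax; =~ returns the index of the match.
index = "chunky bacon" =~ /bacon/

# The same symbol literal always refers to the same object.
same_symbol = :bacon.equal?(:bacon)

puts sum              # prints '3'
puts squares.inspect  # prints '[1, 4, 9]'
puts index            # prints '7'
puts same_symbol      # prints 'true'
```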

As a learning exercise, I rewrote a short Python script that watches RSS feeds for certain keywords. It’s available on GitHub. Even as a newcomer (albeit one with Perl and Python experience), writing the code was intuitive and painless. In fact, it was a strangely pleasing experience. The code could be improved, I’m sure, but arriving at a decent, clean first pass wasn’t hard to do at all.

This feels like a good time to look at Ruby. The initial frenzy and excitement over Ruby, and Rails, in the mid-to-late 2000s has died down considerably, displaced by the current wave of languages focusing on concurrency. Yet its performance, once a huge blight, has been quietly improving from 1.8 to 1.9 to 2.0, and the ecosystem (gems, rbenv, rake) seems very mature.

Now that it’s no longer sexy, it can focus on doing what it was designed for in the first place: making programmers happier to be writing code. Languages these days are emphasizing features and power, but how many actually tout your happiness as their primary reason for existence?

Buffer Menu Sorting in Emacs 24

Upgrading to Emacs 24 a few months ago was mostly seamless. I only encountered one issue that was difficult to fix and has been lingering until today, when I finally got annoyed enough to devote some time to researching and fixing it.

I like having the Buffer Menu configured to always be sorted by filename. In Emacs 23, you could simply set the Buffer-menu-sort-column variable and you were done. That variable isn’t recognized in Emacs 24, despite claims to the contrary on various web sources.

If you look at the code in buff-menu.el, it was rewritten in 24 to use tabulated-list for its display mechanics. There is a Buffer-menu-sort function which is an alias to tabulated-list-sort. Once you are in the *Buffer List* buffer, you can call Buffer-menu-sort interactively using a prefix argument to specify the column: type ‘M-5 M-x Buffer-menu-sort’ to sort by filename. This works very nicely. Doing it a second time reverses the sort order.

So the challenge is to automate this, calling Buffer-menu-sort only once, automatically at the time a *Buffer List* buffer is first created. The mode will keep its entries sorted going forward.

I tried doing this by advising the list-buffers function, and having a buffer-local variable keep track of whether I had already called Buffer-menu-sort once before. This didn’t work: something seemed to be resetting my buffer-local variable, but I couldn’t figure out what.

The solution I ended up with is below. The advice function checks a variable called jc-buffer-menu which stores the current Buffer Menu buffer, and calls Buffer-menu-sort as needed. Put the code in your .emacs file, and you’re done.

Whew!