Category Archives: ruby

True Empowerment

I fixed a bug in the blacklight-marc gem recently. It involved this line of Ruby code:

vals << (v == 'AD') ? 'Atlas' : 'Map'

Contrary to what it looks like, this line adds a boolean value to the vals array. The << operation returns true, so the entire line of code always evaluates to ‘Atlas’. Then nothing happens with that string.

Obviously, this isn’t what was intended. The problem is that << has higher precedence than the if-else operators. So here’s the fix:

vals << (v == 'AD' ? 'Atlas' : 'Map')

This code path wasn’t being taken all the time, and it also didn’t raise any exceptions: the calling code uses the result as an array of strings, so the booleans get automatically converted to “true” and “false” strings. I just happened to notice those weird values where they didn’t make sense, and thought to dig into it.

Let’s be honest: this is the kind of mistake anyone could easily make. I’m 100% certain I’ve done something similar. In fact, I innocently asked some co-workers what the original line of code did, and of course, they interpreted it incorrectly. It’s a tricky little bug.

I thought to post about this because it’s a perfect example of how, in a loosely typed, dynamic language like Ruby, you’re really on your own.

Dynamic languages can often feel “empowering” because they place trust in the programmer. It’s your responsibility not to write code that does anything really crazy or stupid. But there are a lot of these “gotcha” cases, where you’re writing code that’s quite reasonable, and you simply made a mistake that the language lets you get away with, because it’s interpeted differently from what you intended. It’s valid code. And you won’t figure it out until much later, when it shows up as a symptom elsewhere.

By contrast, with Java or Scala, you wouldn’t be able to do this. The compiler would check the types, and meaningfully, say, “Sorry buddy, it doesn’t make sense to me to add a boolean to a List of Strings,” and you’d immediately notice the problem with operator precedence. And you’d fix it.

Your program would never even be able to run with that error in it. Which is some awfully nice work that the language is doing for you there. That feels like true empowerment to me.

Final note: you could argue that good test coverage would catch this. That’s true, but we all know the difficulties of achieving thorough test coverage under deadlines. And this example is particularly annoying to get thorough coverage for, because the line of code is one case of many different cases of values for the variable ‘v’.

One-liners

I stupidly created some directories with a colon in the filename, which confuses some programs. I wanted to change these colons to underscores. Shell scripting makes my head hurt, so I turned to do it in Ruby instead…


Not a bad one-liner. For all I complain about dynamic languages, they can sure be handy.

What would this look like in Scala (which also has a handy REPL)? Almost a one-liner, if you don’t count the import:

Scala will never be a popular scripting tool, obviously, but it’s cool that you can achieve a Ruby-like level of compactness with it.

Lumen: A port of Blacklight to Scala and the Play framework

I’ve done some work the past two years using Blacklight, a great discovery interface for Solr with a lot of library catalog features. It’s quality software with years of work invested in it by some very smart people.

Over time, “the Ruby way” of doing things, as well as “the Rails way,” has bugged me more and more. Things like the use of naming conventions for hooks, passing arrays and hashes around in lieu of actual data structures, the varying use of hashes and OpenStructs, the ability to monkey patch, the difficulty of looking at a method and not being able to tell what its arguments are or can be, the need to go digging around in the source code of a gem to figure out how certain APIs are dynamically created because those methods can’t get automatically documented by tools like rubydoc or yard. These things often make life easier when you’re writing new code and trying to do it quickly, but they create nightmares when you try to refactor stuff or upgrade gem dependencies.

The last few months, I’ve been slowly porting Blacklight to Scala and the Play framework. I’m calling this new project Lumen.

Scala is a powerful statically typed, compiled language that permits you to mix object-oriented and functional paradigms, and it allows you to take advantage of the enormous ecosystem of existing Java libraries. It also incorporates some of the innovations of the last two decades of dynamic languages that make programmers happy. I think it’s a language that privileges software quality over rapid development.

So far, Lumen has been a hobby project to learn Scala, so I’ve approached it in a less disciplined fashion than I otherwise might. This means there are lots of TODOs scattered throughout, and style/design inconsistencies as I’ve learned better ways to do things but haven’t always gone back to change things everywhere. That’s life when you’re noodling around in your spare time. Lumen is largely an experiment right now, but I hope it will eventually grow into a full-featured, production-quality piece of software. We’ll see.

You can check out a demo here: http://lumen-demo.codefork.com.

The Myth of Artisanal Programming

Paul Chiusano, the author of the excellent Functional Programming in Scala from Manning (one of the few tech publishers I buy from; worth every penny), recently wrote a blog post titled, “The advantages of static typing, simply stated”.

Lately all I seem to do is rant to people about this exact topic. Paul’s post is way more succinct than anything I can write, so go over there and read it.

While he takes pains to give a balanced treatment of static vs dynamic type systems, it seems much more cut and dry to me. Dynamic languages are easier and faster for development when you’re getting started on a project, and it’s great if that project never gets very big. But they scale very poorly, for all the reasons he describes. Recently, I had the daunting task of reading almost ~10k lines of Perl code (pretty good Perl, in my opinion). It was hard to make sense of and figure out how to modify and extend, whereas the MUCH larger Java codebase (over 100k lines, if I recall) that I worked with years ago felt very manageable.

My own history as a programmer matches Paul’s very closely. I started with Java, which was annoying but not a bad language by any means. Then Python came along and seemed like a liberation from Java’s rigidity and verbosity. But Python, Ruby and others are showing their weaknesses, and it’s no mystery why people are turning to the newer generation of statically typed languages like Scala, Haskell, Go, etc.

People who haven’t been around as long don’t necessarily have this perspective.

In retrospect, it’s interesting to me how we programmers “got sold” on dynamic languages, from a cultural perspective. You might recall that a big selling point was using simple text editors rather than IDEs, and there was this sense that writing code this way made you closer to the software somehow. Java was corporate, while Python was hand-crafted. There was a vague implicit notion of “artisanal” programming in these circles.

The upshot, of course, is that every time you read a chunk of code or call a function or method, your brain has to do a lot of the work that a statically typed language would be able to enforce and verify for you. But in a dynamic language, you won’t know what happens until the code runs. In large measure, the quality of software hinges on how much you can tell, a priori, about code before it runs at all. In a dynamic world, anything can happen, and often does.

This is a nightmare, pure and simple. Much of the strong focus on writing automated tests is to basically make up for the lack of static typing.

True artisanship lies in design: namely, thinking hard about the data structures and code organization you’re committing to. It’s not about being able to take liberties that can result in things that make no sense to the machine and that can cause errors at runtime that could have been caught beforehand.

Data Streams in Ruby

Recently I wrote up some notes on how to do data processing using streams (lazy enumerators) in Ruby. Doing so served two purposes: 1) to help clarify my own thinking about better ways to write code for common data-munging tasks, 2) to pass along to co-workers in the hopes of establishing some informal best practices and initiating some conversations.

I decided to post my notes on github. Take a peek if this sort of thing interests you.

Variable assignment quirks in Ruby

# irb
irb(main):001:0> puts x
NameError: undefined local variable or method 'x' for main:Object
	from (irb):1
	from /usr/bin/irb:11:in '<main>'

Okay. I expected that.

irb(main):002:0> x = x + 1
NoMethodError: undefined method '+' for nil:NilClass
	from (irb):2
	from /usr/bin/irb:11:in '<main>'

Whoa. Did NOT expect that! Wait, so does that mean…?

irb(main):003:0> z = z
=> nil

Hmm. Okay. Well, time to call it a day.

Managing Dependencies in Python vs Ruby

Ruby's Bundler tool is amazing.

Ruby’s Bundler tool is amazing.

With Python projects, the standard way of doing things is to set up a virtualenv and use pip to install packages from PyPI specified in a requirements.txt file. This way, each of your project’s dependencies are kept separate, installed in their own directories in isolated sandboxed environments.

This works pretty well. But sometimes, when I am debugging a third party package, I want to be able to get the source code from git and use it instead of the package from PyPI, so I can make changes, troubleshoot, experiment, etc. This is a pain in the butt. You have to remove the installed package and either 1) install manually (and repeatedly, as you work) from your cloned repository, or 2) add the repository directory to your Python library path somehow. Then you have to undo these changes to go back to using the PyPI package. Either way, it’s clunky and annoying.

Ruby’s bundler tool has a very different approach to dependencies. It, too, downloads appropriate versions of gems (which is what packages are called), which are listed in a Gemfile. But unlike pip, it can store multiple versions of a gem, and even let you specify that a gem lives in a github or local repository; moreover, it makes the right packages available each time you run your program! That is, each time you run the “bundle exec” wrapper to run Rails or anything else, it sets up a custom set of directories for Ruby’s library path that point ONLY to the versions you want, ignoring the others.

I did this today when trying to pin down the source of some deprecation warnings I was seeing after some gem upgrades. My Gemfile had these lines in it:

gem 'sunspot_rails', '~> 2.1.0'
gem 'sunspot_solr', '~> 2.1.0'

I cloned the sunspot repository containing those gems. Then I ran:

# bundle config local.sunspot_rails ~/sunspot
# bundle config local.sunspot_solr ~/sunspot

And changed the Gemfile lines:

gem 'sunspot_rails', :github => 'sunspot/sunspot_rails', :branch => 'master'
gem 'sunspot_solr', :github => 'sunspot/sunspot_solr', :branch => 'master'

Finally, I ran “bundler update”. That’s it! I could make changes to my cloned repository, restart Rails, and see the changes immediately.

When I was done messing around, I changed my Gemfile back, ran “bundler update” again, and I was back to using my original gems.

Being able to work so easily with third party code allowed me to quickly figure out where the deprecated calls were being made and file an issue with the sunspot project.

Demystifying Ruby functions

(NOTE: Edited slightly on 6/6/2015 for better clarity about the behavior of Object in different versions of Ruby. This post mostly applies to version 1.9.)

Something that confused me about Ruby at first was why tutorials, articles, and books kept referring to Ruby “methods” when talking about functions. For example:

def hello
 puts "hi there"
end

Since hello is not being defined as part of a class or module, it stands on its own. Any reasonable programmer would call this a function, not a method. Initially, I took this as some odd quirk of the Ruby community. They include some pretty strange folks, after all.

Turns out there’s a good reason for this terminology. It IS a method.

As Pat Shaughnessy explains in this article, when you execute def hello at the top level, it gets added as an instance method to the Object class. You can verify this yourself:

def hello
 puts "hi there"
end
# this will print 'true'
puts Object.respond_to? :hello

When you call hello, the method is invoked on a receiver (ie. self) that is an object instance called “main”, whose class is Object.

def hello
  puts "hi there"
  puts "self is #{self}, its class is #{self.class}"
end
# this prints 'self is main, its class is Object'
hello

This explains why defining methods at the top level appears to make them globally accessible. What actually happens is that, when you call hello from any class or module, Ruby will look for it up the class hierarchy until it finds the method in Object.

This also explains something else not immediately obvious to the newbie rubyist: things such as puts are not builtins or special cases, but just regular methods in the Kernel class, which is the parent of Object. Whether you call puts at the top level or from inside a class or module, Ruby will look up the class hierarchy and find it in Kernel.

Kernel‘s members include things that might surprise you, like require!

These simple mechanisms—the implicit “main”, the Object class, and the Kernel module—are stunningly elegant design ideas (and, apparently, according to Shaughnessy, ideas adapted from Smalltalk). It allows Ruby to provide the convenient calling syntax of functions, while remaining object-oriented down to its very core.

(Compare this to Python, which does support “real” functions and differentiates them from methods. The distinction is less important in practice, as Python treats them both as “callables”–ie. arbitrary executable things.)

One small note: Shaughnessy writes that “All Ruby functions are actually private methods of Object.” In his example, he prints Object.private_instance_methods and finds his newly defined function there. This is true in Ruby 2.1, but not in 1.9.3, where functions are public:

def hello
 puts "hi there"
end
# In Ruby 1.9.3, this will print 'false' and then 'true'
puts Object.private_instance_methods.member? :hello
puts Object.methods.member? :hello

I am just now beginning to realize how much Ruby is truly an object oriented language through and through. Parts of the language design might seem ad hoc and magical, which is indeed part of Ruby’s beauty and allure, but some very consistent and elegant mechanisms under the hood are what make them work.

(I’ve eagerly ordered Shaughnessy’s book, Ruby Under a Microscope, which looks terrific.)

Arriving Late to the Party

chunky_bacon

I started playing with Ruby this past week, at the suggestion of a coworker I respect who thinks highly of the language, and of Rails. I’ve found Programming Ruby 1.9 & 2.0 to be an excellent way to get started.

Now, it’s only been a week, but so far? I really love it.

This surprised me.

Some first thoughts and impressions, subject to change:

1) Ruby is VERY expressive. I’m astounded by what you can do in a few short lines of code. I wasn’t initially thrilled about some of its uses of character symbols, obviously borrowed from Perl, but these are actually pretty judicious and not as bad as I expected.

2) There’s a commonplace idea that Python and Ruby are redundant: if you know one, there’s not a lot of reason to learn the other. I can see this, as they do have many overlapping features and roles, but their philosophies could not be more different. So from an experiential standpoint rather than a business one, they’re both worth learning.

3) Things that Ruby does better than Python: the object orientation is stronger (access controls, single inheritance with a powerful mixin mechanism, most[?] operators are methods); blocks are way more powerful than lambdas; regexes are native constructs; Ruby’s symbols are a nice feature taken from Lisp.

4) I hate that ‘require’ autoimports things into your namespace, like Perl does. This is where I much prefer Python’s ‘import’ statement. For the love of God, DO NOT TOUCH MY NAMESPACE unless I say so. (EDIT: this is actually incorrect! Files that get required typically define modules, which are constants in a global namespace, which is quite different from autoimporting names into the current namespace, like Perl.)

5) The fact that strings are mutable is one of those things that filled me with horror, but I’m not sure it actually matters that much in practice.

6) If Why’s Poignant Guide to Ruby doesn’t inspire you to learn and use Ruby, you may not be human.

As a learning exercise, I rewrote a short Python script that watches RSS feeds for certain keywords. It’s available on github. Even as a newcomer (albeit one with Perl and Python experience), writing the code was intuitive and painless. In fact, it was a strangely pleasing experience. The code could be improved, I’m sure, but arriving at a decent, clean first pass wasn’t hard to do at all.

This feels like a good time to look at Ruby. The initial frenzy and excitement over Ruby, and Rails, in the mid-to-late 2000s has died down considerably, displaced by the current wave of languages focusing on concurrency. Yet its performance, once a huge blight, has been quietly improving from 1.8 to 1.9 to 2.0, and the ecosystem (gems, rbenv, rake) seems very mature.

Now that it’s no longer sexy, it can focus on doing what it was designed for in the first place: making programmers happier to be writing code. Languages these days are emphasizing features and power, but how many actually tout your happiness as its primary reason for existence?