Data Streams in Ruby

Recently I wrote up some notes on how to do data processing using streams (lazy enumerators) in Ruby. Doing so served two purposes: 1) to help clarify my own thinking about better ways to write code for common data-munging tasks, and 2) to give me something to pass along to co-workers in the hopes of establishing some informal best practices and initiating some conversations.

I decided to post my notes on GitHub. Take a peek if this sort of thing interests you.
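
For a flavor of the idea, here is a minimal sketch of a lazy pipeline. It is not taken from the notes themselves; the method name first_even_squares and the endless range of integers standing in for real input are made up for illustration.

```ruby
# Hypothetical helper: returns the squares of the first `count` even numbers.
# The source range is infinite, so this only terminates because `lazy`
# makes select and map pull one element at a time, on demand.
def first_even_squares(count)
  (1..Float::INFINITY).lazy
    .select { |n| n.even? }
    .map { |n| n * n }
    .first(count)
end

p first_even_squares(3)  # prints [4, 16, 36]
```

Without the call to lazy, select would try to walk the entire infinite range before map ever ran.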

Variable assignment quirks in Ruby

# irb
irb(main):001:0> puts x
NameError: undefined local variable or method 'x' for main:Object
	from (irb):1
	from /usr/bin/irb:11:in '<main>'

Okay. I expected that.

irb(main):002:0> x = x + 1
NoMethodError: undefined method '+' for nil:NilClass
	from (irb):2
	from /usr/bin/irb:11:in '<main>'

Whoa. Did NOT expect that! Wait, so does that mean…?

irb(main):003:0> z = z
=> nil

Hmm. Okay. Well, time to call it a day.
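
My best understanding of what is going on: Ruby’s parser creates a local variable the moment it sees the left-hand side of an assignment, so by the time the right-hand side is evaluated the name already exists and holds nil. A small sketch of the same behavior outside irb (the method names are just for illustration):

```ruby
def self_assignment
  value = value   # the right-hand side reads the brand-new local, which is nil
  value
end

def increment_unassigned
  count = count + 1   # evaluates nil + 1
rescue NoMethodError => error
  error.class
end

p self_assignment       # prints nil
p increment_unassigned  # prints NoMethodError
```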

Managing Dependencies in Python vs Ruby

Ruby’s Bundler tool is amazing.

With Python projects, the standard way of doing things is to set up a virtualenv and use pip to install packages from PyPI specified in a requirements.txt file. This way, each project’s dependencies are kept separate, installed in their own directory in an isolated, sandboxed environment.

This works pretty well. But sometimes, when I am debugging a third party package, I want to be able to get the source code from git and use it instead of the package from PyPI, so I can make changes, troubleshoot, experiment, etc. This is a pain in the butt. You have to remove the installed package and either 1) install manually (and repeatedly, as you work) from your cloned repository, or 2) add the repository directory to your Python library path somehow. Then you have to undo these changes to go back to using the PyPI package. Either way, it’s clunky and annoying.

Ruby’s Bundler tool has a very different approach to dependencies. It, too, downloads appropriate versions of gems (which is what Ruby packages are called), which are listed in a Gemfile. But unlike pip, it can store multiple versions of a gem, and even let you specify that a gem lives in a GitHub or local repository; moreover, it makes the right packages available each time you run your program! That is, each time you use the “bundle exec” wrapper to run Rails or anything else, it sets up a custom set of directories for Ruby’s library path that point ONLY to the versions you want, ignoring the others.

I did this today when trying to pin down the source of some deprecation warnings I was seeing after some gem upgrades. My Gemfile had these lines in it:

gem 'sunspot_rails', '~> 2.1.0'
gem 'sunspot_solr', '~> 2.1.0'

I cloned the sunspot repository containing those gems. Then I ran:

# bundle config local.sunspot_rails ~/sunspot
# bundle config local.sunspot_solr ~/sunspot

And changed the Gemfile lines:

gem 'sunspot_rails', :github => 'sunspot/sunspot_rails', :branch => 'master'
gem 'sunspot_solr', :github => 'sunspot/sunspot_solr', :branch => 'master'

Finally, I ran “bundle update”. That’s it! I could make changes to my cloned repository, restart Rails, and see the changes immediately.

When I was done messing around, I changed my Gemfile back, ran “bundle update” again, and I was back to using my original gems.

Being able to work so easily with third party code allowed me to quickly figure out where the deprecated calls were being made and file an issue with the sunspot project.
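
If you want to double-check which copy of a gem Bundler has put on the load path, one option is to look at $LOAD_PATH from inside the bundled process. A rough sketch follows; load_path_entries_for is a helper name I made up, and it simply matches on directory names.

```ruby
# Hypothetical helper: lists the load path directories that mention a gem.
# Pass a different array as the second argument to try it without Bundler.
def load_path_entries_for(gem_name, load_path = $LOAD_PATH)
  load_path.map(&:to_s).select { |dir| dir.include?(gem_name) }
end

puts load_path_entries_for("sunspot")
```

Run from a Rails console or under “bundle exec”, the output should point into the cloned repository while the local override is active, and into the installed gem directory otherwise.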

Demystifying Ruby functions

(NOTE: Edited slightly on 6/6/2015 for better clarity about the behavior of Object in different versions of Ruby. This post mostly applies to version 1.9.)

Something that confused me about Ruby at first was why tutorials, articles, and books kept referring to Ruby “methods” when talking about functions. For example:

def hello
 puts "hi there"
end

Since hello is not being defined as part of a class or module, it stands on its own. Any reasonable programmer would call this a function, not a method. Initially, I took this as some odd quirk of the Ruby community. They include some pretty strange folks, after all.

Turns out there’s a good reason for this terminology. It IS a method.

As Pat Shaughnessy explains in this article, when you execute def hello at the top level, it gets added as an instance method to the Object class. You can verify this yourself:

def hello
 puts "hi there"
end
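# this will print 'true'; passing true as the second argument makes
# respond_to? include private methods, which top-level methods are in Ruby 2.x
puts Object.respond_to?(:hello, true)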

When you call hello, the method is invoked on a receiver (i.e. self) that is an object instance called “main”, whose class is Object.

def hello
  puts "hi there"
  puts "self is #{self}, its class is #{self.class}"
end
# this prints 'self is main, its class is Object'
hello

This explains why defining methods at the top level appears to make them globally accessible. What actually happens is that, when you call hello from any class or module, Ruby will look for it up the class hierarchy until it finds the method in Object.
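
A small sketch of that lookup, run as a script. Greeter is a made-up class that defines no hello of its own, so the call inside it can only succeed by reaching the method stored in Object.

```ruby
# A top-level method: Ruby stores it as an instance method of Object.
def hello
  "hi there"
end

# Greeter is a hypothetical class used only for illustration.
class Greeter
  def greet
    hello  # not defined in Greeter; found further up, in Object
  end
end

puts Greeter.new.greet                  # prints hi there
puts Greeter.new.method(:hello).owner   # prints Object
```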

This also explains something else not immediately obvious to the newbie rubyist: things such as puts are not builtins or special cases, but just regular methods in the Kernel module, which is mixed into Object. Whether you call puts at the top level or from inside a class or module, Ruby will look up the ancestor chain and find it in Kernel.

Kernel’s members include things that might surprise you, like require!
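
You can poke at this yourself. The sketch below gathers a few facts about Kernel into a hash; kernel_facts is just a name I picked for the example.

```ruby
# Hypothetical helper: collects a few observations about Kernel.
def kernel_facts
  {
    kernel_is_a: Kernel.class,                                  # Module, not Class
    mixed_into_object: Object.include?(Kernel),                 # true
    puts_owner: method(:puts).owner,                            # Kernel
    require_in_kernel: Kernel.private_method_defined?(:require) # true
  }
end

kernel_facts.each { |name, value| puts "#{name}: #{value}" }
```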

These simple mechanisms (the implicit “main”, the Object class, and the Kernel module) are stunningly elegant design ideas and, apparently, according to Shaughnessy, ideas adapted from Smalltalk. They allow Ruby to provide the convenient calling syntax of functions, while remaining object-oriented down to its very core.

(Compare this to Python, which does support “real” functions and differentiates them from methods. The distinction is less important in practice, as Python treats them both as “callables”, i.e. arbitrary executable things.)

One small note: Shaughnessy writes that “All Ruby functions are actually private methods of Object.” In his example, he prints Object.private_instance_methods and finds his newly defined function there. This is true in Ruby 2.1, but not in 1.9.3, where functions are public:

def hello
 puts "hi there"
end
# In Ruby 1.9.3, this will print 'false' and then 'true'
puts Object.private_instance_methods.member? :hello
puts Object.methods.member? :hello
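
Since the answer varies between versions, a small check like the following reports what your own interpreter does. The helper name visibility_of is made up for this sketch.

```ruby
def hello
  "hi there"
end

# Hypothetical helper: reports how Object exposes the named instance method.
def visibility_of(name)
  if Object.private_method_defined?(name)
    :private
  elsif Object.protected_method_defined?(name)
    :protected
  elsif Object.public_method_defined?(name)
    :public
  else
    :undefined
  end
end

puts visibility_of(:hello)
```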

I am just now beginning to realize how much Ruby is truly an object oriented language through and through. Parts of the language design might seem ad hoc and magical, which is indeed part of Ruby’s beauty and allure, but some very consistent and elegant mechanisms under the hood are what make them work.

(I’ve eagerly ordered Shaughnessy’s book, Ruby Under a Microscope, which looks terrific.)

Arriving Late to the Party

I started playing with Ruby this past week, at the suggestion of a coworker I respect who thinks highly of the language, and of Rails. I’ve found Programming Ruby 1.9 & 2.0 to be an excellent way to get started.

Now, it’s only been a week, but so far? I really love it.

This surprised me.

Some first thoughts and impressions, subject to change:

1) Ruby is VERY expressive. I’m astounded by what you can do in a few short lines of code. I wasn’t initially thrilled about some of its uses of character symbols, obviously borrowed from Perl, but these are actually pretty judicious and not as bad as I expected.

2) There’s a commonplace idea that Python and Ruby are redundant: if you know one, there’s not a lot of reason to learn the other. I can see this, as they do have many overlapping features and roles, but their philosophies could not be more different. So from an experiential standpoint rather than a business one, they’re both worth learning.

3) Things that Ruby does better than Python: the object orientation is stronger (access controls, single inheritance with a powerful mixin mechanism, most[?] operators are methods); blocks are way more powerful than lambdas; regexes are native constructs; Ruby’s symbols are a nice feature taken from Lisp.

4) I hate that ‘require’ autoimports things into your namespace, like Perl does. This is where I much prefer Python’s ‘import’ statement. For the love of God, DO NOT TOUCH MY NAMESPACE unless I say so. (EDIT: this is actually incorrect! Files that get required typically define modules, which are constants in a global namespace, which is quite different from autoimporting names into the current namespace, like Perl.)

5) The fact that strings are mutable is one of those things that filled me with horror, but I’m not sure it actually matters that much in practice.

6) If Why’s Poignant Guide to Ruby doesn’t inspire you to learn and use Ruby, you may not be human.
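
To make items 3 and 5 a little more concrete, here is a short sketch. The method names and sample strings are invented for illustration.

```ruby
# Operators are ordinary method calls: 1 + 2 is shorthand for 1.+(2).
def add_with_explicit_call(a, b)
  a.+(b)
end

# Regex literals and blocks working together: keep the words matching a pattern.
def words_matching(pattern, text)
  text.split.select { |word| word =~ pattern }
end

# Strings are mutable: << changes the receiver in place.
def append_in_place(base, suffix)
  base << suffix
end

p add_with_explicit_call(1, 2)                       # prints 3
p words_matching(/\Aru/i, "Ruby rules over Python")  # prints ["Ruby", "rules"]

name = "chunky".dup
append_in_place(name, "_bacon")
p name                                               # prints "chunky_bacon"
```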

As a learning exercise, I rewrote a short Python script that watches RSS feeds for certain keywords. It’s available on GitHub. Even as a newcomer (albeit one with Perl and Python experience), writing the code was intuitive and painless. In fact, it was a strangely pleasing experience. The code could be improved, I’m sure, but arriving at a decent, clean first pass wasn’t hard to do at all.

This feels like a good time to look at Ruby. The initial frenzy and excitement over Ruby, and Rails, in the mid-to-late 2000s has died down considerably, displaced by the current wave of languages focusing on concurrency. Yet its performance, once a huge blight, has been quietly improving from 1.8 to 1.9 to 2.0, and the ecosystem (gems, rbenv, rake) seems very mature.

Now that it’s no longer sexy, it can focus on doing what it was designed for in the first place: making programmers happier to be writing code. Languages these days are emphasizing features and power, but how many actually tout your happiness as their primary reason for existence?