Generator Expressions

I discovered Python generator expressions this week. They’ve been around since v2.4, way back in 2004, and yet, somehow, they managed to escape my notice until now. I don’t think I’ve ever seen them used in any codebase I’ve worked on.

Generator expressions are almost the same as list comprehension syntax, except instead of square brackets, you use parentheses. And instead of returning a list, the comprehension returns a generator object.

This is quite a nice bit of syntactic sugar that helps to keep things more succinct than writing iterators and generator functions. But it’s not as succinct as, say, the chained method syntax you find in Scala or Ruby, which I think is vastly superior in terms of clarity. Several years ago, I wrote up some notes about Ruby streams in a github repo for my co-workers because it was a valuable technique in some data processing scripts we were working on. (Since then, I haven’t done much Ruby.)

Here’s a bit of Python code that duplicates the simple Ruby example in the linked repo above, with the identical output. Again, it’s not nearly as nice to look at as chained method calls, but still better than having to write separate generator functions or classes.

nums = list(range(1,6))

add10 = lambda i: print(f"adding 10 to {i}") or i + 10
filter_even = lambda i: print(f"filtering {i} on evenness") or i % 2 == 0
output = lambda i: print(f"in each: {i}")

# list comprehensions

r1 = [ add10(x) for x in nums ]
r2 = [ x for x in r1 if filter_even(x) ]
r3 = [ output(x) for x in r2 ]

# generator expressions

r1 = ( add10(x) for x in nums )
r2 = ( x for x in r1 if filter_even(x) )
r3 = ( output(x) for x in r2 )
list(r3)

True Empowerment

I fixed a bug in the blacklight-marc gem recently. It involved this line of Ruby code:

vals << (v == 'AD') ? 'Atlas' : 'Map'

Contrary to what it looks like, this line adds a boolean value to the vals array. The << operation returns true, so the entire line of code always evaluates to ‘Atlas’. Then nothing happens with that string.

Obviously, this isn’t what was intended. The problem is that << has higher precedence than the if-else operators. So here’s the fix:

vals << (v == 'AD' ? 'Atlas' : 'Map')

This code path wasn’t being taken all the time, and it also didn’t raise any exceptions: the calling code uses the result as an array of strings, so the booleans get automatically converted to “true” and “false” strings. I just happened to notice those weird values where they didn’t make sense, and thought to dig into it.

Let’s be honest: this is the kind of mistake anyone could easily make. I’m 100% certain I’ve done something similar. In fact, I innocently asked some co-workers what the original line of code did, and of course, they interpreted it incorrectly. It’s a tricky little bug.

I thought to post about this because it’s a perfect example of how, in a loosely typed, dynamic language like Ruby, you’re really on your own.

Dynamic languages can often feel “empowering” because they place trust in the programmer. It’s your responsibility not to write code that does anything really crazy or stupid. But there are a lot of these “gotcha” cases, where you’re writing code that’s quite reasonable, and you simply made a mistake that the language lets you get away with, because it’s interpeted differently from what you intended. It’s valid code. And you won’t figure it out until much later, when it shows up as a symptom elsewhere.

By contrast, with Java or Scala, you wouldn’t be able to do this. The compiler would check the types, and meaningfully, say, “Sorry buddy, it doesn’t make sense to me to add a boolean to a List of Strings,” and you’d immediately notice the problem with operator precedence. And you’d fix it.

Your program would never even be able to run with that error in it. Which is some awfully nice work that the language is doing for you there. That feels like true empowerment to me.

Final note: you could argue that good test coverage would catch this. That’s true, but we all know the difficulties of achieving thorough test coverage under deadlines. And this example is particularly annoying to get thorough coverage for, because the line of code is one case of many different cases of values for the variable ‘v’.

One-liners

I stupidly created some directories with a colon in the filename, which confuses some programs. I wanted to change these colons to underscores. Shell scripting makes my head hurt, so I turned to do it in Ruby instead…


Not a bad one-liner. For all I complain about dynamic languages, they can sure be handy.

What would this look like in Scala (which also has a handy REPL)? Almost a one-liner, if you don’t count the import:

Scala will never be a popular scripting tool, obviously, but it’s cool that you can achieve a Ruby-like level of compactness with it.

Lumen: A port of Blacklight to Scala and the Play framework

I’ve done some work the past two years using Blacklight, a great discovery interface for Solr with a lot of library catalog features. It’s quality software with years of work invested in it by some very smart people.

Over time, “the Ruby way” of doing things, as well as “the Rails way,” has bugged me more and more. Things like the use of naming conventions for hooks, passing arrays and hashes around in lieu of actual data structures, the varying use of hashes and OpenStructs, the ability to monkey patch, the difficulty of looking at a method and not being able to tell what its arguments are or can be, the need to go digging around in the source code of a gem to figure out how certain APIs are dynamically created because those methods can’t get automatically documented by tools like rubydoc or yard. These things often make life easier when you’re writing new code and trying to do it quickly, but they create nightmares when you try to refactor stuff or upgrade gem dependencies.

The last few months, I’ve been slowly porting Blacklight to Scala and the Play framework. I’m calling this new project Lumen.

Scala is a powerful statically typed, compiled language that permits you to mix object-oriented and functional paradigms, and it allows you to take advantage of the enormous ecosystem of existing Java libraries. It also incorporates some of the innovations of the last two decades of dynamic languages that make programmers happy. I think it’s a language that privileges software quality over rapid development.

So far, Lumen has been a hobby project to learn Scala, so I’ve approached it in a less disciplined fashion than I otherwise might. This means there are lots of TODOs scattered throughout, and style/design inconsistencies as I’ve learned better ways to do things but haven’t always gone back to change things everywhere. That’s life when you’re noodling around in your spare time. Lumen is largely an experiment right now, but I hope it will eventually grow into a full-featured, production-quality piece of software. We’ll see.

You can check out a demo here: http://lumen-demo.codefork.com.

The Myth of Artisanal Programming

Paul Chiusano, the author of the excellent Functional Programming in Scala from Manning (one of the few tech publishers I buy from; worth every penny), recently wrote a blog post titled, “The advantages of static typing, simply stated”.

Lately all I seem to do is rant to people about this exact topic. Paul’s post is way more succinct than anything I can write, so go over there and read it.

While he takes pains to give a balanced treatment of static vs dynamic type systems, it seems much more cut and dry to me. Dynamic languages are easier and faster for development when you’re getting started on a project, and it’s great if that project never gets very big. But they scale very poorly, for all the reasons he describes. Recently, I had the daunting task of reading almost ~10k lines of Perl code (pretty good Perl, in my opinion). It was hard to make sense of and figure out how to modify and extend, whereas the MUCH larger Java codebase (over 100k lines, if I recall) that I worked with years ago felt very manageable.

My own history as a programmer matches Paul’s very closely. I started with Java, which was annoying but not a bad language by any means. Then Python came along and seemed like a liberation from Java’s rigidity and verbosity. But Python, Ruby and others are showing their weaknesses, and it’s no mystery why people are turning to the newer generation of statically typed languages like Scala, Haskell, Go, etc.

People who haven’t been around as long don’t necessarily have this perspective.

In retrospect, it’s interesting to me how we programmers “got sold” on dynamic languages, from a cultural perspective. You might recall that a big selling point was using simple text editors rather than IDEs, and there was this sense that writing code this way made you closer to the software somehow. Java was corporate, while Python was hand-crafted. There was a vague implicit notion of “artisanal” programming in these circles.

The upshot, of course, is that every time you read a chunk of code or call a function or method, your brain has to do a lot of the work that a statically typed language would be able to enforce and verify for you. But in a dynamic language, you won’t know what happens until the code runs. In large measure, the quality of software hinges on how much you can tell, a priori, about code before it runs at all. In a dynamic world, anything can happen, and often does.

This is a nightmare, pure and simple. Much of the strong focus on writing automated tests is to basically make up for the lack of static typing.

True artisanship lies in design: namely, thinking hard about the data structures and code organization you’re committing to. It’s not about being able to take liberties that can result in things that make no sense to the machine and that can cause errors at runtime that could have been caught beforehand.