Author Archives: jeff

True Empowerment

I fixed a bug in the blacklight-marc gem recently. It involved this line of Ruby code:

vals << (v == 'AD') ? 'Atlas' : 'Map'

Contrary to what it looks like, this line adds a boolean value to the vals array. The << operation returns the array itself, which is always truthy, so the ternary always evaluates to ‘Atlas’. Then nothing happens with that string.

Obviously, this isn’t what was intended. The problem is that << has higher precedence than the ternary conditional operator (? :). So here’s the fix:

vals << (v == 'AD' ? 'Atlas' : 'Map')

This code path wasn’t being taken all the time, and it also didn’t raise any exceptions: the calling code uses the result as an array of strings, so the booleans get automatically converted to “true” and “false” strings. I just happened to notice those weird values where they didn’t make sense, and thought to dig into it.
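
To see the difference concretely, here is a small sketch you can paste into irb. The method names `buggy_label` and `fixed_label` are made up for this illustration; only the two lines of code inside them come from the gem.

```ruby
# Buggy form: << binds tighter than ?:, so Ruby parses the body as
#   (vals << (v == 'AD')) ? 'Atlas' : 'Map'
def buggy_label(vals, v)
  vals << (v == 'AD') ? 'Atlas' : 'Map'
end

# Fixed form: the whole ternary is the argument to <<.
def fixed_label(vals, v)
  vals << (v == 'AD' ? 'Atlas' : 'Map')
end

buggy = []
buggy_result = buggy_label(buggy, 'AD')
buggy_label(buggy, 'XX')

fixed = []
fixed_label(fixed, 'AD')
fixed_label(fixed, 'XX')

p buggy               # booleans ended up in the array
p buggy.map(&:to_s)   # what code expecting strings would see
p buggy_result        # the string that gets thrown away
p fixed               # the intended labels
```

The buggy array holds `true` and `false` rather than labels, while the ternary's result is simply discarded by the original line.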

Let’s be honest: this is the kind of mistake anyone could easily make. I’m 100% certain I’ve done something similar. In fact, I innocently asked some co-workers what the original line of code did, and of course, they interpreted it incorrectly. It’s a tricky little bug.

I thought to post about this because it’s a perfect example of how, in a dynamically typed language like Ruby, you’re really on your own.

Dynamic languages can often feel “empowering” because they place trust in the programmer. It’s your responsibility not to write code that does anything really crazy or stupid. But there are a lot of these “gotcha” cases, where you’re writing code that’s quite reasonable, and you simply made a mistake that the language lets you get away with, because it’s interpreted differently from what you intended. It’s valid code. And you won’t figure it out until much later, when it shows up as a symptom elsewhere.

By contrast, with Java or Scala, you wouldn’t be able to do this. The compiler would check the types and say, in effect, “Sorry buddy, it doesn’t make sense to me to add a boolean to a List of Strings,” and you’d immediately notice the problem with operator precedence. And you’d fix it.

Your program would never even be able to run with that error in it. Which is some awfully nice work that the language is doing for you there. That feels like true empowerment to me.

Final note: you could argue that good test coverage would catch this. That’s true, but we all know the difficulties of achieving thorough test coverage under deadlines. And this example is particularly annoying to get thorough coverage for, because the line of code handles just one of many possible values of the variable ‘v’.

One-liners

I stupidly created some directories with a colon in the filename, which confuses some programs. I wanted to change these colons to underscores. Shell scripting makes my head hurt, so I turned to Ruby instead…
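
A minimal sketch of the idea, not necessarily the exact line I typed: the method name `rename_colon_dirs` is made up for this illustration, and `Dir.children` assumes Ruby 2.5 or later. The body is effectively a single chained expression.

```ruby
# Rename every directory directly under `root` whose name contains a colon,
# replacing each colon with an underscore. Regular files are left alone.
def rename_colon_dirs(root = '.')
  Dir.children(root)
     .select { |name| name.include?(':') && File.directory?(File.join(root, name)) }
     .each { |name| File.rename(File.join(root, name), File.join(root, name.tr(':', '_'))) }
end
```

Calling `rename_colon_dirs` with no argument works on the current directory, so try it on a scratch directory first.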


Not a bad one-liner. For all I complain about dynamic languages, they can sure be handy.

What would this look like in Scala (which also has a handy REPL)? Almost a one-liner, if you don’t count the import:

Scala will never be a popular scripting tool, obviously, but it’s cool that you can achieve a Ruby-like level of compactness with it.

Lumen: A port of Blacklight to Scala and the Play framework

I’ve done some work the past two years using Blacklight, a great discovery interface for Solr with a lot of library catalog features. It’s quality software with years of work invested in it by some very smart people.

Over time, “the Ruby way” of doing things, as well as “the Rails way,” has bugged me more and more. Things like the use of naming conventions for hooks, passing arrays and hashes around in lieu of actual data structures, the varying use of hashes and OpenStructs, the ability to monkey patch, the difficulty of looking at a method and telling what its arguments are or can be, the need to go digging around in the source code of a gem to figure out how certain APIs are dynamically created because those methods can’t get automatically documented by tools like rubydoc or yard. These things often make life easier when you’re writing new code and trying to do it quickly, but they create nightmares when you try to refactor stuff or upgrade gem dependencies.
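
To make the hashes-versus-data-structures complaint concrete, here is a small sketch. The `Book` record and its fields are hypothetical, not taken from Blacklight, and `keyword_init` assumes Ruby 2.5 or later.

```ruby
# A hash used as an ad hoc record: a misspelled key silently yields nil.
book_hash = { title: 'Moby-Dick', author: 'Herman Melville' }
hash_typo = book_hash[:auther]

# A Struct gives the record a name and a fixed set of fields,
# so the same typo fails loudly where it happens.
Book = Struct.new(:title, :author, keyword_init: true)
book = Book.new(title: 'Moby-Dick', author: 'Herman Melville')

struct_typo =
  begin
    book.auther
  rescue NoMethodError => e
    e.class
  end

p hash_typo    # the hash lookup quietly produced nil
p struct_typo  # the Struct raised NoMethodError instead
```

Even the Struct only complains at runtime, of course; it narrows the gap but doesn’t close it the way a compiler would.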

The last few months, I’ve been slowly porting Blacklight to Scala and the Play framework. I’m calling this new project Lumen.

Scala is a powerful statically typed, compiled language that permits you to mix object-oriented and functional paradigms, and it allows you to take advantage of the enormous ecosystem of existing Java libraries. It also incorporates some of the innovations of the last two decades of dynamic languages that make programmers happy. I think it’s a language that privileges software quality over rapid development.

So far, Lumen has been a hobby project to learn Scala, so I’ve approached it in a less disciplined fashion than I otherwise might. This means there are lots of TODOs scattered throughout, and style/design inconsistencies as I’ve learned better ways to do things but haven’t always gone back to change things everywhere. That’s life when you’re noodling around in your spare time. Lumen is largely an experiment right now, but I hope it will eventually grow into a full-featured, production-quality piece of software. We’ll see.

You can check out a demo here: http://lumen-demo.codefork.com.

Announcing conciliator

I’ve just created a GitHub repository for conciliator, a growing collection of OpenRefine reconciliation services, as well as a Java framework for creating them.

conciliator is a major refactoring of my refine_viaf project and supersedes it. This new project cleanly separates the VIAF-specific parts and the more “boilerplate” pieces needed for any OpenRefine reconciliation service. The result is a framework that allows you to easily write new reconciliation services. My intent here is to make some existing code way more flexible, so that it might be useful to more users and have a longer lifespan.

http://refine.codefork.com has already been running conciliator for a week now; if you’ve been using it, you don’t need to make any changes in OpenRefine.

Currently, conciliator out-of-the-box can query VIAF exactly like refine_viaf does, down to the same URLs. Additionally, conciliator can now query ORCID names. This was a somewhat arbitrary choice; I’ve been doing some ORCID integration at work so it was convenient for me to implement a data source for it as a proof of concept.

With VIAF and ORCID, conciliator acts as an intermediate or “bridge” service, but it would be possible to use conciliator to query other types of data sources as well: files, SQL databases, etc. Right now, you’d have to write your own code to read and parse files, open database connections, etc. But in the future, I hope to add support for these options to make them easier to implement.

For details on how to write your own service in Java using conciliator, see the README.

Are there data sources you’d like to see available as a reconciliation service? Leave a comment to this post. No promises, but I’ll at least consider all requests. And if you write your own service for a data source, please consider submitting your code as a pull request so that others can use it too!

The Myth of Artisanal Programming

Paul Chiusano, the author of the excellent Functional Programming in Scala from Manning (one of the few tech publishers I buy from; worth every penny), recently wrote a blog post titled, “The advantages of static typing, simply stated”.

Lately all I seem to do is rant to people about this exact topic. Paul’s post is way more succinct than anything I can write, so go over there and read it.

While he takes pains to give a balanced treatment of static vs dynamic type systems, it seems much more cut and dried to me. Dynamic languages are easier and faster for development when you’re getting started on a project, and it’s great if that project never gets very big. But they scale very poorly, for all the reasons he describes. Recently, I had the daunting task of reading almost 10k lines of Perl code (pretty good Perl, in my opinion). It was hard to make sense of and figure out how to modify and extend, whereas the MUCH larger Java codebase (over 100k lines, if I recall) that I worked with years ago felt very manageable.

My own history as a programmer matches Paul’s very closely. I started with Java, which was annoying but not a bad language by any means. Then Python came along and seemed like a liberation from Java’s rigidity and verbosity. But Python, Ruby and others are showing their weaknesses, and it’s no mystery why people are turning to the newer generation of statically typed languages like Scala, Haskell, Go, etc.

People who haven’t been around as long don’t necessarily have this perspective.

In retrospect, it’s interesting to me how we programmers “got sold” on dynamic languages, from a cultural perspective. You might recall that a big selling point was using simple text editors rather than IDEs, and there was this sense that writing code this way made you closer to the software somehow. Java was corporate, while Python was hand-crafted. There was a vague implicit notion of “artisanal” programming in these circles.

The upshot, of course, is that every time you read a chunk of code or call a function or method, your brain has to do a lot of the work that a statically typed language would be able to enforce and verify for you. But in a dynamic language, you won’t know what happens until the code runs. In large measure, the quality of software hinges on how much you can tell, a priori, about code before it runs at all. In a dynamic world, anything can happen, and often does.

This is a nightmare, pure and simple. Much of the strong focus on writing automated tests is to basically make up for the lack of static typing.

True artisanship lies in design: namely, thinking hard about the data structures and code organization you’re committing to. It’s not about being able to take liberties that can result in things that make no sense to the machine and that can cause errors at runtime that could have been caught beforehand.

Data Streams in Ruby

Recently I wrote up some notes on how to do data processing using streams (lazy enumerators) in Ruby. Doing so served two purposes: 1) to help clarify my own thinking about better ways to write code for common data-munging tasks, 2) to pass along to co-workers in the hopes of establishing some informal best practices and initiating some conversations.

I decided to post my notes on GitHub. Take a peek if this sort of thing interests you.
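
As a rough taste of the idea (a sketch written for this post, not an excerpt from the notes), here is a lazy pipeline over lines of text. The sample data and field names are made up; in practice the input would more likely be a File opened on something large.

```ruby
require 'stringio'

# Hypothetical input standing in for a large data file.
input = StringIO.new("id,name\n1,alice\n\n2,bob\n3,carol\n")

# Each stage is lazy: nothing is read until a result is requested,
# and reading can stop as soon as enough rows have been produced.
rows = input.each_line.lazy
  .map(&:chomp)
  .reject(&:empty?)                       # skip blank lines
  .drop(1)                                # skip the header row
  .map { |line| line.split(',') }
  .map { |id, name| { id: Integer(id), name: name } }

first_two = rows.first(2)
p first_two
```

The point of `lazy` is that the stages are chained per element instead of building a full intermediate array at every step.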

Getting OpenVPN to Add DNS Servers

I couldn’t get my OpenVPN client to add a DNS server that I knew the VPN server was telling it about. It turns out that on Xubuntu 16.04 (and all flavors of Ubuntu, probably), you need to supply additional arguments to make it handle the “dhcp-option” information it receives. More specifically, you have to use the --up and --down options to point to an Ubuntu-supplied script that needs to run when the VPN connection goes up and down.

sudo openvpn --config office.ovpn --script-security 2 --up /etc/openvpn/update-resolv-conf --down /etc/openvpn/update-resolv-conf

Upgrading the Touchpad on a Thinkpad x240

This is a stock photo of a Thinkpad x240, stolen from the interwebs:

[Image: x240_stock]

This is my own x240, which I bought back in January 2015.

[Image: x240_trackpad]

If you know about Thinkpads, you probably noticed the difference right away. The x240 (and other models that year) suffered from an incredibly crappy buttonless touchpad. It’s so bad that it’s barely usable. Clicking is ridiculously inaccurate: there’s so much travel that the mouse pointer moves during a click, and there are no buttons to use instead. There were so many complaints that Lenovo replaced it with a better one in the next year’s lineup.

This weekend I finally got around to upgrading it with a touchpad replacement part for the x250. It cost $32 on eBay. This modification is popular, so you can find info about it scattered around in forums and such. I followed the instructions on this page, How to change an x240 trackpad, as it’s one of the clearest ones out there. I couldn’t find much info about the author, whose name appears only as “Michael” on that blog.

Some notes and tips from my experience:

1) Michael’s picture shows a set of wires connected to the touchpad along its side, but mine didn’t have them.

2) The touchpad sits in a well, held in place with adhesive tape, so to remove it, you just pry it off. The problem is that it’s hard to reach “under” the entire touchpad assembly, which is sort of like a sandwich with layers. I ended up partially prying off the top layer before I could get to the bottom and pry the whole thing from the case. Needless to say, this bent the touchpad.

I couldn’t figure out a way to avoid effectively destroying the old touchpad. But since it was so crappy, it was also somewhat satisfying.

3) Detaching and re-attaching the small ribbon cable from/to the underside of the touchpad is VERY tricky. The end of the ribbon is held in place to the connector on the touchpad by a thin black “latch” sitting just behind it. You CANNOT just yank the ribbon out. (This took me a while to figure out!) Lift the black latch, and the ribbon will slide from the connector easily. When connecting it to the new touchpad, tuck the ribbon end securely into the connector, then flip the latch down to lock it in place.

4) At first, the new touchpad wasn’t being recognized by the machine. It worked after I re-seated the ribbon in the connector and also reset the BIOS (as shown in this video: stick a paper clip end into the tiny hole beside the battery and press for 20 seconds). I should have tried those things separately, but got a bit too excited. So you may or may not have to do a BIOS reset.

So far so good. The new touchpad is definitely a big improvement. There’s much less click travel using the pad, and it feels snappier. I really like having the buttons.

The only quirk is that the surface of the touchpad now sits just a hair higher than the palm rest. It’s probably slightly more likely for my hand to accidentally brush it while typing, as compared to the original touchpad, but only time will tell for sure.

It feels like a totally different computer.

Annoyances in Xubuntu 16.04 LTS

This week, I installed Xubuntu on a new work computer. I’d previously sworn off Ubuntu, but I admit, I’m crawling back now… the reality is that Ubuntu has smoothed out many of the rough edges that I’m simply not willing to deal with at work. Sigh.

Even as generally polished as Xubuntu is, I did encounter a few hiccups.

1) To adjust settings for the screen locking software, light-locker, I needed to make sure the light-locker-settings package was installed. Nothing happened when I selected “Light Locker Settings” from the whisker menu, though, because it was crashing. I ran “light-locker-settings” via a terminal, and saw some Python error messages.

Python was trying to import a module from python-gobject, which wasn’t installed and, for some reason, wasn’t declared as a dependency of light-locker-settings.

After I installed python-gobject, that error went away, but I got another one about a missing function. To fix it, you have to manually patch two lines in a Python file, as described in this bug report. [NOTE: This has been fixed as of 7/20/2016, in version 1.5.0-0ubuntu1.1 of light-locker-settings]

2) Another light-locker quirk: the mouse pointer becomes invisible when I lock the screen by hitting Ctrl-Alt-Del and then unlock it. To make it visible again, hit Ctrl-Alt-F1 to switch to a text console and then Ctrl-Alt-F7 to return to Xfce.

3) The “Greybird” theme is notorious for making it VERY difficult to resize windows by dragging the handles that appear when you mouse-over the window edges and bottom corners. The pointer has to be EXACTLY on an edge or corner; it won’t display the resize handle if you’re slightly off.

For reasons I don’t understand, the devs seem intent on not changing this. But enough users have complained that the Xubuntu blog even has a post about alternative ways to resize windows. The disregard for user experience here is simply mind-blowing.

I’ve grudgingly started using the Alt and right-click drag combo to resize windows.

Addendum:

4) Intermittent DNS problems: hostnames on our internal domain weren’t always resolving. This seems like a common problem on Ubuntu caused by dnsmasq. The solution is to disable it by commenting out the line “dns=dnsmasq” in /etc/NetworkManager/NetworkManager.conf and rebooting.

“Proxy mode” added in refine_viaf 1.4

A refine_viaf user recently commented that she would like to get Library of Congress IDs for the name candidates in OpenRefine, instead of VIAF IDs.

It would be ideal if the name IDs for LC and other sources could be additional fields in the JSON data returned from refine_viaf, which you could then extract using some GREL code. Unfortunately, OpenRefine doesn’t allow you to access additional fields on name candidate objects.

So I’ve created a separate “proxy mode” that returns IDs used by source institutions themselves, rather than the VIAF IDs. To use proxy mode, add a reconciliation service in OpenRefine using this URL format instead of the usual URL:

http://refine.codefork.com/reconcile/viafproxy/LC

One quirk is that OpenRefine will create broken hyperlinks for a few sources (at the moment, these are BNC, BNF, DBC, and NUKAT). This is due to the fact that the IDs in these URLs don’t match the name record IDs, which is a requirement for the hyperlinking mechanism to work properly.
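
If you’re setting up several of these, the URL pattern is easy to script. A small sketch: the helper names `proxy_url` and `links_work?` are made up for this illustration, and the list of broken-link sources is just the four named above, which may change.

```ruby
PROXY_BASE = 'http://refine.codefork.com/reconcile/viafproxy'.freeze

# Sources whose IDs currently produce broken hyperlinks in OpenRefine.
BROKEN_LINK_SOURCES = %w[BNC BNF DBC NUKAT].freeze

# Build the reconciliation service URL for a given source code, e.g. 'LC'.
def proxy_url(source)
  "#{PROXY_BASE}/#{source}"
end

# True if OpenRefine's generated hyperlinks should work for this source.
def links_work?(source)
  !BROKEN_LINK_SOURCES.include?(source)
end

puts proxy_url('LC')
```

Paste the resulting URL into OpenRefine when adding a reconciliation service.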

In short, you can now use refine_viaf to reconcile “directly” against the name authority records of VIAF’s source institutions, which should be useful to many people.