Data as Coping Mechanism

Life under this pandemic has been hard. Oddly, one of the things that’s helped me deal is to play around with the coronavirus data. The numbers in the U.S. are horrifying, of course, but they’ve also been soothing at a technical level, maybe because working with the data is somewhat different from the work I do for my job. It’s also been neat to do hands-on validation of reporting in the media and various claims made about trends. I’ve been showing my results to friends who seem to find them insightful.

The repository is here. There are links in the README to the charts and visualizations.

Some technical reflections on this little hobby project:

I used Python and SQLite. Apache Spark seemed like overkill and I haven’t spent enough time with Spark to be able to troubleshoot intermediate pipeline steps as easily as I can in a plain SQL database. SQLite is fantastic for doing ETL or ELT. I can’t recommend it enough for non-“big data” scenarios. It’s fast (if you use local disk access), has enough SQL features for most ELT/ETL work, and is well-suited for use by a single user. It’s also good if the end goal is to produce data that will ultimately get imported into another system, say, a full-fledged RDBMS data warehouse that serves multiple users.

Currently, with just over 5 months of county- and state-level data, it takes ~2 minutes to to load all the raw data, transform it into dimensional tables that calculate various measures, and create data files used by the web pages that display tables and charts. The SQLite file is 850 MB which includes a lot of stage tables. This is on my laptop with a i5-7300U processor. Not too bad.

I created a Makefile to handle dependencies in the data pipeline, so that it only re-runs parts as needed. It’s currently not as fine-grained as it could be. For example, any change in the JHU CSSE data files will reload ALL the files into the database, but that portion of the code takes only maybe 10s total anyway. Similarly, all the dimensional models are created in a single process and could be split out. I’m happy with how it turned out overall with the qualification that writing and maintaining the Makefile is a bit of a pain. I might try using Apache Airflow instead at some point.

Storing data files in a git repo feels gross. But I did this so the chart and map web pages served through GitHub Pages could load the static data files. It’s a simple and free hosting solution.

In general, I like how simple this setup turned out to be and how easy it’s been to add new measures or tweak existing ones.

EDIT: In November 2020, I switched to using BigQuery.

Getting OpenVPN to Add DNS Servers

I couldn’t get my OpenVPN client to add a DNS server that I know the VPN server was telling it about. It turns out that on Xubuntu 16.04 (and all flavors of Ubuntu, probably), you need to supply additional arguments to make it handle the “dhcp-option” information it receives. More specifically, you have to use the –up and –down options to point to an Ubuntu-supplied script that needs to run when the VPN connection goes up and down.

sudo openvpn --config office.ovpn --script-security 2 --up /etc/openvpn/update-resolv-conf --down /etc/openvpn/update-resolv-conf

Annoyances in Xubuntu 16.04 LTS

This week, I installed Xubuntu on a new work computer. I’d previously sworn off Ubuntu, but I admit, I’m crawling back now… the reality is that Ubuntu has smoothed out many of the rough edges that I’m simply not willing to deal with at work. Sigh.

Even as generally polished as Xubuntu is, I did encounter a few hiccups.

1) To adjust settings for the screen locking software, light-locker, I needed to make sure the light-locker-settings package was installed. Nothing happened when I selected “Light Locker Settings” from the whisker menu, though, because it was crashing. I ran “light-locker-settings” via a terminal, and saw some python error messages.

Python was trying to import a module from python-gobject, which wasn’t installed and wasn’t a prerequisite for light-locker-settings for some reason.

After that error went away, I got another one about a missing function. To fix it, you have to manually patch two lines in a python file, as described in this bug report. [NOTE: This has been fixed as of 7/20/2016, in version 1.5.0-0ubuntu1.1 of light-locker-settings]

2) Another light-locker quirk: the mouse pointer becomes invisible when I lock the screen by hitting Ctrl-Alt-Del and then unlock it. To make it visible again, hit Ctrl-Alt-F1 to switch to a text console and then Ctrl-Alt-F7 to return to Xfce.

3) The “Greybird” theme is notorious for making it VERY difficult to resize windows by dragging the handles that appear when you mouse-over the window edges and bottom corners. The pointer has to be EXACTLY on an edge or corner; it won’t display the resize handle if you’re slightly off.

For reasons I don’t understand, the devs seem intent on not changing this. But enough users have complained that the Xubuntu blog even has a post about alternative ways to resize windows. The disregard for user experience here is simply mind-blowing.

I’ve grudgingly started using the Alt and right-click drag combo to resize windows.

Addendum:

4) Intermittent DNS problems: hostnames on our internal domain weren’t always resolving. This seems like a common problem on Ubuntu caused by dnsmasq. The solution is to disable it by commenting out the line “dns=dnsmasq” in /etc/NetworkManager/NetworkManager.conf and rebooting.

The Linux Desktop bleeding edge

I’m having some trouble running Firefox 46, which was released in late April. I’ve had to roll back to 45.0.2 for now. No big deal, but my woes are pretty indicative of the complexities of the Linux desktop, so I thought it’d be interesting to write a little about it.

I run the “testing” distribution of Debian. Its stability lies somewhere between the “unstable” (things are largely untested) and “stable” (release quality) dists, so it’s pretty good, although the occasional hiccup is to be expected. My desktop environment is Xfce.

Firefox 46 contained a big change: the official binary releases are compiled using gtk3 instead of gtk2. I like using the official releases because the debian firefox package sometimes takes a little while to catch up to the latest version. For those who are unfamiliar, gtk is the graphics library for rendering the user interface, including all the widgets and their look-and-feel.

gtk3, in turn, has its own versions. In late March, gtk3 in Debian testing was updated to 3.20, which apparently contained some major changes from 3.19.

The problem is that Firefox 46 seems to work fine with pre-3.20 versions of gtk3; with 3.20, however, scrollbars, radio buttons and other widgets are rendered incorrectly or are even missing entirely. One of the bug reports can be found here.

You can apparently work around this issue if you use certain gtk3 themes. Not being a theme guru, I’m not sure exactly why; I was only able to determine this by experimenting with different themes and seeing how the Firefox rendering changed.

Okay, fine: I’d been using the default Xfce theme, which I quite like, but I’m willing to change it to make Firefox work. But I still encountered 2 problems with this workaround: 1) 3.20 is so new that many gtk3 themes included in Debian testing are broken because they haven’t been updated yet to be compatible. While I could get the scrollbars and radio buttons to work with some of these themes, there were often spacing issues around certain widgets, making UIs unusable or extremely annoying. 2) I need to find a theme that supports BOTH gtk2 and gtk3, since Xfce uses gtk2, otherwise I’ll end up with inconsistent look-and-feel across applications. Not all themes support both.

People complain about the state of the Linux desktop all the time, but the fact is, there are many moving parts that comprise a desktop environment. It’s a complex web of dependencies. Sometimes this means certain software packages have to be locked in to previous versions. Sometimes the newest version of a library can’t go into a distribution because it would break too many things that use it. Being able to run the latest and greatest versions of everything is a LOT harder than one might imagine.

In this case, I’m sure there will be a fix in Firefox and/or updates to the gtk3 themes soon enough.

Installing all the Software from an Old System

I recently got a new laptop (more on this in another post, perhaps). After I installed Debian Linux from scratch and copied my home directory over to it, I needed to install all the software I used on my old machine. Here’s one way to do this.

On your old system, create a list of your installed packages:

dpkg -l > old_packages
cat old_packages | awk '{print $2}' | sort > old_packages_list

Copy the file old_packages_list to your new system. Then, on your new system, run:

dpkg -l > new_packages
cat new_packages | awk '{print $2}' | sort > new_packages_list
comm -2 -3 old_packages_list new_packages_list > missing

The file missing now contains a list of packages that existed on your old system that don’t exist on your new one.

If you don’t work with many library packages directly, you can filter them out by reverse grepping for ones that begin with “lib”. (These are installed automatically as dependencies for other things; you don’t usually need to worry about them unless you need them for development.)

cat missing | egrep -v "^lib" > missing_apps_only

Going through this list, I could quickly identify all the software I recognized and wanted to install, and skip packages no longer useful to me (programs to burn CDs and DVDs, for example, since my new machine doesn’t have a drive) or applications I only played with once or twice. And I didn’t install things I didn’t recognize—I can always install them later.

This gets your new computer up and running in short order.