The Myth of Artisanal Programming

Paul Chiusano, the author of the excellent Functional Programming in Scala from Manning (one of the few tech publishers I buy from; worth every penny), recently wrote a blog post titled, “The advantages of static typing, simply stated”.

Lately all I seem to do is rant to people about this exact topic. Paul’s post is way more succinct than anything I can write, so go over there and read it.

While he takes pains to give a balanced treatment of static vs dynamic type systems, it seems much more cut and dried to me. Dynamic languages are easier and faster for development when you’re getting started on a project, and that’s great if the project never gets very big. But they scale very poorly, for all the reasons he describes. Recently, I had the daunting task of reading almost 10k lines of Perl code (pretty good Perl, in my opinion). It was hard to make sense of, and hard to figure out how to modify and extend, whereas the MUCH larger Java codebase (over 100k lines, if I recall) that I worked with years ago felt very manageable.

My own history as a programmer matches Paul’s very closely. I started with Java, which was annoying but not a bad language by any means. Then Python came along and seemed like a liberation from Java’s rigidity and verbosity. But Python, Ruby and others are showing their weaknesses, and it’s no mystery why people are turning to the newer generation of statically typed languages like Scala, Haskell, Go, etc.

People who haven’t been around as long don’t necessarily have this perspective.

In retrospect, it’s interesting to me how we programmers “got sold” on dynamic languages, from a cultural perspective. You might recall that a big selling point was using simple text editors rather than IDEs, and there was this sense that writing code this way made you closer to the software somehow. Java was corporate, while Python was hand-crafted. There was a vague implicit notion of “artisanal” programming in these circles.

The upshot, of course, is that every time you read a chunk of code or call a function or method, your brain has to do a lot of the work that a statically typed language would enforce and verify for you. In a dynamic language, you won’t know what happens until the code runs. In large measure, the quality of software hinges on how much you can tell, a priori, about code before it runs at all. In a dynamic world, anything can happen, and often does.
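As a rough illustration of what “enforce and verify for you” can look like, here is a minimal Scala sketch. The names UserId, User, findUser and greeting are hypothetical, invented only for this example; they don’t come from Paul’s post or from any real codebase.

```scala
object TypedLookup {
  // Hypothetical domain types, introduced only for this illustration.
  final case class UserId(value: Int)
  final case class User(id: UserId, name: String)

  private val users: Map[UserId, User] =
    Map(UserId(1) -> User(UserId(1), "Ada"))

  // The signature states that a lookup may fail, so callers must handle None.
  def findUser(id: UserId): Option[User] = users.get(id)

  def greeting(id: UserId): String =
    findUser(id) match {
      case Some(user) => s"Hello, ${user.name}"
      case None       => "Unknown user"
    }

  // Both of these are rejected by the compiler, before anything runs:
  //   findUser("1")             // a String is not a UserId
  //   findUser(UserId(1)).name  // an Option[User] has no `name` member
}
```

In a dynamic language, the two commented-out mistakes would typically surface only when that particular line executes, which is the gap that a lot of unit tests end up covering.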

This is a nightmare, pure and simple. Much of the strong focus on writing automated tests is basically there to make up for the lack of static typing.

True artisanship lies in design: namely, thinking hard about the data structures and code organization you’re committing to. It’s not about being able to take liberties that can result in things that make no sense to the machine and that can cause errors at runtime that could have been caught beforehand.

Algorithms II: An Aside in Week 2 on Learning to Code

I finished Part I a while ago (yay!) and am currently in week 2 of Part II.

Tangential thoughts: a particularly challenging aspect of studying algorithms is proof of solvability and correctness. How can you tell whether a problem can be solved by computation at all? If you devise a new method for computing something, how do you know that it really works in every case? Mathematical reasoning can allow you to definitively prove that something does what you intended it to do. This is especially important when empirical testing would struggle, or even be unable, to cover all possible cases.

Sedgewick usually glosses over the proofs in his lectures, since they’re not the core focus of the course. Some of these proofs are pretty hard to grasp even at a general level of description.

This aspect of algorithms dovetails with my excursion into functional programming in that both are deeply mathematical. They both indicate a view of computing as a branch of formal mathematics. Edsger Dijkstra was a strong proponent of this approach to computer science. I don’t claim to understand what this means in a very deep way, but I found the following example in Dijkstra’s essay, “On the Cruelty of Really Teaching Computer Science”, extremely helpful in starting to grasp this principle:

Consider the plane figure Q, defined as the 8 by 8 square from which, at two opposite corners, two 1 by 1 squares have been removed. The area of Q is 62, which equals the combined area of 31 dominos of 1 by 2. The theorem is that the figure Q cannot be covered by 31 such dominos.

Another way of stating the theorem is that if you start with squared paper and begin covering this by placing each next domino on two new adjacent squares, no placement of 31 dominos will yield the figure Q.

So, a possible way of proving the theorem is by generating all possible placements of dominos and verifying for each placement that it does not yield the figure Q: a tremendously laborious job.

The simple argument, however, is as follows. Color the squares of the squared paper as on a chess board. Each domino, covering two adjacent squares, covers 1 white and 1 black square, and, hence, each placement covers as many white squares as it covers black squares. In the figure Q, however, the number of white squares and the number of black squares differ by 2—opposite corners lying on the same diagonal—and, hence, no placement of dominos yields figure Q.

Not only is the above simple argument many orders of magnitude shorter than the exhaustive investigation of the possible placements of 31 dominos, it is also essentially more powerful for it covers the generalization of Q by replacing the original 8 by 8 square with any rectangle with sides of even length. The number of such rectangles being infinite, the former method of exhaustive exploration is essentially inadequate for proving our generalized theorem.

And this concludes my example. It has been presented because it illustrates, in a nutshell, the power of down-to-earth mathematics; needless to say, refusal to exploit this power of down-to-earth mathematics amounts to intellectual and technological suicide. The moral of the story is: deal with all elements of a set by ignoring them and working with the set’s definition.
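The counting step in the coloring argument above is easy to check for yourself. The following is only a sketch of that one step, not a substitute for the proof; the object and method names are my own, and it assumes a square board whose rows and columns are numbered from 0, with a square called “dark” when row + col is even.

```scala
object MutilatedBoard {
  // Squares of an n-by-n board with two opposite corners removed.
  def squares(n: Int): Seq[(Int, Int)] =
    for {
      row <- 0 until n
      col <- 0 until n
      if (row, col) != (0, 0) && (row, col) != (n - 1, n - 1)
    } yield (row, col)

  // Chessboard coloring: a square is "dark" when row + col is even.
  // Returns (number of dark squares, number of light squares).
  def colorCounts(n: Int): (Int, Int) = {
    val (dark, light) = squares(n).partition { case (r, c) => (r + c) % 2 == 0 }
    (dark.size, light.size)
  }
}
```

For the 8 by 8 case, MutilatedBoard.colorCounts(8) gives (30, 32): both removed corners are dark, so the colors differ by 2. Since every domino covers one square of each color, 31 dominos would need 31 of each. The same difference of 2 shows up for any even n, which is the generalization Dijkstra points to.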

The bombshell here is that learning to code shouldn’t be treated as a matter of what he calls its “operational semantics.” It’s a mistake to focus on what code does or how it behaves in its execution. Instead, you should think about code as a purely formal system:

… A programming language, with its formal syntax and with the proof rules that define its semantics, is a formal system for which program execution provides only a model. It is well-known that formal systems should be dealt with in their own right and not in terms of a specific model. And, again, the corollary is that we should reason about programs without even mentioning their possible “behaviors.”

This isn’t academic. When people talk about the ability to “reason about code,” I think this is what they mean. It’s a skill that can be hard to pin down exactly, but you can recognize it right away in the programmers who have it. They can envision the challenges in designing a piece of software without being at a computer or writing any code; they can predict the consequences that a given change has for complex systems; and they can often troubleshoot bugs effectively by asking the right questions rather than rooting around in code. This is the holy grail of programming.

Needless to say, it’s a life-long pursuit.

Functional Languages

Functional programming is a hot topic these days, one that I’ve become increasingly interested in over the last two years. One marker of its rising popularity is the publication of books on how to do FP in mainstream languages such as Python and JavaScript.

While it seems worthwhile to adopt FP concepts and get their benefits wherever you can, these approaches inevitably run up against the fact that you need certain language features in order for FP to work smoothly in practice. In other words, there is functional programming as a paradigm, and there are functional languages, which provide crucial features that make the paradigm actually work effectively in the real world.

I’ve been gradually understanding this better while learning Scala. If, like myself, you’re a newcomer to functional languages, here are three prominent features that illustrate what I mean by things that a full-fledged functional language provides.

1) Lists: This data structure, implemented as a singly linked list, is a staple of functional languages. And since you use it everywhere, it’s important to understand its performance characteristics: appending to a list takes time proportional to n (where n is the size of the list), whereas prepending takes constant time; retrieving the element at index i means walking past i nodes in the list; etc.

Coming from other languages, it might seem weird that the linked list (rather than the fixed-length array) plays such a prominent role, but the reason is that it’s great for iterative recursion: you process the “head” or first element of the list, and recursively call your function with the “tail” or the remaining elements in the list. A linked list makes this head/tail pattern very fast, because it avoids having to make a copy of the list on each recursive call: retrieving the list’s tail only requires advancing a memory pointer.

More broadly, because data is immutable in pure FP, a language needs persistent data structures so that when you manipulate those data structures, you avoid making unnecessary copies in memory. This isn’t a problem in the imperative paradigm, where mutating data is the norm.
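Here is a minimal sketch of the head/tail pattern and of the prepend/append difference, using Scala’s built-in List. The object name ListBasics and the sample values are made up for this example.

```scala
object ListBasics {
  // Head/tail recursion: handle the first element, then recurse on the rest.
  def sum(xs: List[Int]): Int = xs match {
    case Nil          => 0
    case head :: tail => head + sum(tail)
  }

  val base: List[Int] = List(2, 3)

  // Prepending allocates one new cell that points at the existing list: O(1).
  val prepended: List[Int] = 1 :: base

  // Appending has to rebuild every cell in front of the new element: O(n).
  val appended: List[Int] = base :+ 4
}
```

Note that prepended.tail is the very same object as base, with nothing copied. That sharing is what “persistent” means here, and it is only safe because the list is immutable.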

2) Tail call optimization: Scala and other functional languages implement tail call optimization: if the last thing a function does is return the result of a recursive call to itself, that call does not grow the call stack. Since the result of the very last call will be the result of the first call in the recursive chain, the language implementation can safely reuse the current stack frame instead of pushing a new one.

This is a crucial optimization. A problem with deep recursion is that you run the risk of overflowing the call stack. Without this language feature, deeply recursive code will crash.
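To make the difference concrete, here is a small sketch with two ways of summing the numbers 1 to n. The function names are my own. The @tailrec annotation comes from the Scala standard library and asks the compiler to reject the function if the recursive call is not in tail position.

```scala
import scala.annotation.tailrec

object TailCalls {
  // Sums 1..n. The recursive call is the last thing `loop` does,
  // so the compiler can turn it into a plain loop.
  def sumTo(n: Int): Long = {
    @tailrec
    def loop(remaining: Int, acc: Long): Long =
      if (remaining <= 0) acc
      else loop(remaining - 1, acc + remaining)
    loop(n, 0L)
  }

  // Not a tail call: the addition happens after the recursive call returns,
  // so every level needs its own stack frame.
  // Adding @tailrec to this function would be a compile error.
  def sumToNaive(n: Int): Long =
    if (n <= 0) 0L else n + sumToNaive(n - 1)
}
```

sumTo runs in constant stack space even for a million iterations, whereas sumToNaive risks a StackOverflowError once n gets large enough. One caveat: as far as I know, the Scala compiler only optimizes direct self-recursion like this, not tail calls between different functions.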

3) Lazy evaluation: the ability to defer evaluating an expression until its value is actually needed allows you to do some very powerful things, such as create infinite data structures and invent custom control structures. Without a built-in mechanism for lazy evaluation, you have to jump through hoops to accomplish the same thing.
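Both uses of laziness mentioned above can be sketched in a few lines. This assumes Scala 2.13 or later, where the standard library provides LazyList (earlier versions used Stream). The names naturals, firstEvenSquares and unless are made up for this example.

```scala
object Laziness {
  // An infinite sequence of natural numbers; cells are computed on demand.
  val naturals: LazyList[Int] = LazyList.from(0)

  // Only as many elements as are needed to produce four results get evaluated.
  val firstEvenSquares: List[Int] =
    naturals.filter(_ % 2 == 0).map(n => n * n).take(4).toList

  // A custom control structure: `body` is a by-name parameter,
  // so it is evaluated only if `condition` is false.
  def unless(condition: Boolean)(body: => Unit): Unit =
    if (!condition) body
}
```

Because body is passed by name, a call such as Laziness.unless(list.isEmpty) { println(list.head) } reads like a built-in statement, and the block is never run when the condition holds.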

If you look through Functional Python Programming (which is a very good book), it addresses the fact that Python lacks the above features, and offers some options for simulating them or working around them somehow. To newcomers, such discussions and tricks can be a little confusing, because it’s hard to grasp why the lack of these features is a shortcoming in the first place.

This is why I think it’s best for people new to FP to approach learning Scala as a functional language. When I initially tried to learn it through resources and books that aimed to cover the language and API comprehensively, it was simply bewildering. But learning Scala through the lens of FP concepts and the features needed to support those concepts makes it much easier to understand why many of the seemingly odd constructs and idioms exist.

(As a footnote, this is exactly the approach taken by Martin Odersky’s Coursera course, “Functional Programming Principles in Scala”. I’m nearly done going through the lectures, and I think it’s a much better resource than any book I’ve perused.)