Algorithms II: An Aside in Week 2 on Learning to Code

I finished Part I a while ago (yay!) and am currently in week 2 of Part II.

Tangential thoughts: a particularly challenging aspect of studying algorithms is proving solvability and correctness. How can you tell whether a problem can be solved computationally at all? If you devise a new method for computing something, how do you know that it really works in every case? Mathematical reasoning lets you prove definitively that something does what you intended it to do, which matters especially when empirical verification is difficult or even impossible because you can’t cover all possible cases.

Sedgewick usually glosses over the proofs in his lectures, since they’re not the core focus of the course. Even at a general level of description, some of these proofs are pretty hard to grasp.

This aspect of algorithms dovetails with my excursion into functional programming in that both are deeply mathematical. Both reflect a view of computing as a branch of formal mathematics. Edsger Dijkstra was a strong proponent of this approach to computer science. I don’t claim to understand what this means in any deep way, but I found the following example from Dijkstra’s essay, “On the Cruelty of Really Teaching Computer Science”, extremely helpful in starting to grasp the principle:

Consider the plane figure Q, defined as the 8 by 8 square from which, at two opposite corners, two 1 by 1 squares have been removed. The area of Q is 62, which equals the combined area of 31 dominos of 1 by 2. The theorem is that the figure Q cannot be covered by 31 such dominos.

Another way of stating the theorem is that if you start with squared paper and begin covering this by placing each next domino on two new adjacent squares, no placement of 31 dominos will yield the figure Q.

So, a possible way of proving the theorem is by generating all possible placements of dominos and verifying for each placement that it does not yield the figure Q: a tremendously laborious job.

The simple argument, however, is as follows. Color the squares of the squared paper as on a chess board. Each domino, covering two adjacent squares, covers 1 white and 1 black square, and, hence, each placement covers as many white squares as it covers black squares. In the figure Q, however, the number of white squares and the number of black squares differ by 2—opposite corners lying on the same diagonal—and, hence, no placement of dominos yields figure Q.

Not only is the above simple argument many orders of magnitude shorter than the exhaustive investigation of the possible placements of 31 dominos, it is also essentially more powerful for it covers the generalization of Q by replacing the original 8 by 8 square with any rectangle with sides of even length. The number of such rectangles being infinite, the former method of exhaustive exploration is essentially inadequate for proving our generalized theorem.

And this concludes my example. It has been presented because it illustrates, in a nutshell, the power of down-to-earth mathematics; needless to say, refusal to exploit this power of down-to-earth mathematics amounts to intellectual and technological suicide. The moral of the story is: deal with all elements of a set by ignoring them and working with the set’s definition.
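Dijkstra’s whole point, of course, is that no computation is needed here. Still, to make the counting argument concrete, here is a quick Scala sketch (my illustration, not Dijkstra’s) that tallies the two colors of Q’s squares:

```scala
// Color each square of the 8 by 8 board by (row + col) % 2, drop the
// two opposite corners (0,0) and (7,7), and count each color.
val colors = for {
  row <- 0 until 8
  col <- 0 until 8
  if !((row == 0 && col == 0) || (row == 7 && col == 7))
} yield (row + col) % 2

val (black, white) = colors.partition(_ == 0)
println((black.size, white.size)) // (30, 32): the counts differ by 2
```

The removed corners lie on the same diagonal and therefore share a color, which is why one count comes up two short.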

The bombshell here is that learning to code shouldn’t be treated as a matter of what Dijkstra calls a program’s “operational semantics.” It’s a mistake to focus on what code does, on how it behaves when executed. Instead, you should think about code as a purely formal system:

… A programming language, with its formal syntax and with the proof rules that define its semantics, is a formal system for which program execution provides only a model. It is well-known that formal systems should be dealt with in their own right and not in terms of a specific model. And, again, the corollary is that we should reason about programs without even mentioning their possible “behaviors.”

This isn’t academic. When people talk about the ability to “reason about code,” I think this is what they mean. It’s a skill that can be hard to pin down exactly, but you recognize it right away in the programmers who have it. They can envision the challenges in designing a piece of software without sitting at a computer or writing any code; they can predict the consequences a given change will have for a complex system; and they can often troubleshoot bugs effectively by asking the right questions rather than rooting around in the code. This is the holy grail of programming.

Needless to say, it’s a life-long pursuit.

Functional Languages

Functional programming is a hot topic these days, one that I’ve become increasingly interested in over the last two years. One marker of its rising popularity is the publication of books on how to do FP in mainstream languages such as Python and JavaScript.

While it seems worthwhile to adopt FP concepts and get their benefits wherever you can, these approaches inevitably run up against the fact that you need certain language features in order for FP to work smoothly in practice. In other words, there is functional programming as a paradigm, and there are functional languages, which provide crucial features that make the paradigm actually work effectively in the real world.

I’ve been gradually coming to understand this while learning Scala. If, like me, you’re a newcomer to functional languages, here are three prominent features that illustrate what a full-fledged functional language provides.

1) Lists: This data structure, implemented as a linked list, is a staple of functional languages. And since you use it everywhere, it’s important to understand its performance characteristics: appending to a list takes linear time (proportional to the size of the list), whereas prepending takes constant time; retrieving the element at index i takes i steps through the nodes of the list; and so on.

If you’re coming from other languages, it might seem weird that the linked list (rather than the fixed-length array) plays such a prominent role, but the reason is that it’s a great fit for recursive iteration: you process the “head,” or first element, of the list, and recursively call your function with the “tail,” the remaining elements. A linked list makes this head/tail pattern very fast, because it avoids copying the list on each recursive call: retrieving the list’s tail only requires following a pointer, as in the sketch below.
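Here’s the pattern in a minimal Scala sketch that sums a list:

```scala
// Sum a list by processing the head and recursing on the tail.
// Nil matches the empty list; `head :: tail` destructures a
// non-empty one into its first element and the rest.
def sum(xs: List[Int]): Int = xs match {
  case Nil          => 0
  case head :: tail => head + sum(tail) // taking the tail is O(1)
}

sum(List(1, 2, 3, 4)) // 10
```

Note that this version still pushes a new stack frame for every element, which is where the second feature below comes in.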

More broadly, because data is immutable in pure FP, a language needs persistent data structures: structures that, when “modified,” share as much memory as possible with the old version rather than copying it. This isn’t a problem in the imperative paradigm, where mutating data in place is the norm.
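A rough illustration of the sharing that makes Scala’s List persistent:

```scala
val xs = List(2, 3)
val ys = 1 :: xs // List(1, 2, 3)
val zs = 0 :: xs // List(0, 2, 3)
// Neither prepend copies xs: ys and zs each allocate one new node
// that points at the existing xs. Sharing like this is safe only
// because the nodes of xs can never be mutated.
```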

2) Tail call optimization: Scala and other functional languages implement tail call optimization: if the recursive call a function makes to itself is the very last thing it does, the compiler doesn’t grow the call stack for each iteration. Since the result of that final call is the result of the entire recursive chain, there’s no pending work to return to, and the recursion can safely be compiled down to a loop.

This is a crucial optimization. A problem with deep recursion is that you run the risk of overflowing the call stack. Without this language feature, deeply recursive code will crash.
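To see it in action, here’s a tail-recursive rewrite of the earlier sum sketch, using an accumulator so that the recursive call is in tail position:

```scala
import scala.annotation.tailrec

def sum(xs: List[Int]): Long = {
  // The recursive call is the last thing loop does, so the compiler
  // rewrites it as a loop. @tailrec makes this a guarantee: the code
  // fails to compile if the call isn't actually in tail position.
  @tailrec
  def loop(rest: List[Int], acc: Long): Long = rest match {
    case Nil          => acc
    case head :: tail => loop(tail, acc + head)
  }
  loop(xs, 0L)
}

sum(List.range(0, 1000000)) // 499999500000: constant stack space
```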

3) Lazy evaluation: the ability to defer evaluating an expression until its value is actually needed allows you to do some very powerful things, such as create infinite data structures and invent your own control structures. A built-in mechanism for lazy evaluation means you don’t have to jump through hoops to accomplish these things.
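Both tricks are easy to sketch in Scala (using the LazyList that ships with Scala 2.13; in older versions the equivalent is Stream). The `unless` below is a toy of my own, not a standard library method:

```scala
// An infinite data structure: elements are computed only on demand,
// so defining all the naturals is fine if you take finitely many.
val naturals: LazyList[Int] = LazyList.from(0)
naturals.filter(_ % 2 == 0).take(5).toList // List(0, 2, 4, 6, 8)

// A custom control structure: the by-name parameter `body` (note the
// `=> A` type) is evaluated only if the condition is false.
def unless[A](cond: Boolean)(body: => A): Option[A] =
  if (!cond) Some(body) else None

unless(2 > 3) { println("body ran"); 42 } // Some(42), prints once
```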

If you look through Functional Python Programming (which is a very good book), it acknowledges that Python lacks the features above and offers options for simulating them or working around them. To newcomers, such workarounds can be a little confusing, because it’s hard to grasp why the lack of these features is a shortcoming in the first place.

This is why I think it’s best for people new to FP to approach Scala first and foremost as a functional language. When I initially tried to learn it through books and resources that aimed to cover the language and its API comprehensively, I was simply bewildered. But learning Scala through the lens of FP concepts, and of the features needed to support those concepts, makes it much easier to understand why many of its seemingly odd constructs and idioms exist.

(As a footnote, this is exactly the approach taken by Martin Odersky’s Coursera course, “Functional Programming Principles in Scala”. I’m nearly done going through the lectures, and I think it’s a much better resource than any book I’ve perused.)