The Lifespan of Software

Rumors of Chandler’s Death Are Greatly Exaggerated. So says the renowned Phillip J. Eby.

In light of all the damning media scrutiny paid to Chandler in recent years, Phillip makes an excellent point: the project funded work on a bunch of important open source Python libraries. I didn’t realize this, and it drastically changed my regard for the OSAF’s work. If this aspect of the project got mentioned more, I think Chandler would get a lot more respect. Even if Chandler 1.0 never sees the light of day, it’s already made major contributions to the Python community.

Proprietary software has a definite lifespan: once a company has stopped developing and supporting it, that’s the end. For the company, value is localized and non-transferable in the closed source code base. The business model of selling software depends on this. Once the company kills off the product, the value more or less disappears. You can still use it, of course, but it will decrease in value as similar, hopefully better products appear on the market.

The value of open source software, on the other hand, isn’t limited to its immediate use. Even if an application is no longer actively used and maintained, the code can spark ideas, be used to fork a new project, serve as a lesson in design, etc. Its value can be perpetually renewed by virtue of the fact that it circulates in different ways. If it’s large enough, like Chandler or Zope, it can spawn mini-projects, components, and libraries for reuse.

Years ago, I wrote a Java version of a Napster server, just for fun. It was called jnerve, and I released the code as open source. I tried to get people to host it and use it, but OpenNap, the C implementation, was naturally faster, more efficient, and more mature. jnerve seemed like a dead end, so I stopped working on it. Some of its architectural bits were cool and interesting to write, but I regarded the project as a failure.

Months later at a conference, I got a demo CD of some new peer-to-peer file sharing software. (“P2P” was all the rage then.) When I ran it, I was astounded to see a copyright message with my name on it. They had used my code as the basis for their commercial product! The code was able to live on in a different form. I’m not sure it was actually legal, given that jnerve was GPL, but I didn’t care enough to pursue the matter.

Maintainability Pitfalls in PHP

Tim Bray makes this prediction about PHP for 2008:

PHP will remain popular but its growth will slow, as people get nervous about its maintainability and security stories.

I share Tim’s love/hate relationship with PHP. It’s definitely a powerful and easy language. But,

… speaking as an actual computer programmer, I really dislike PHP. I find it ugly and un-modular and there’s something about it that encourages people to write horrible code. We’re talking serious maintainability pain.

I’m seeing this right now in some code I’ve recently taken over. The previous programmer was quite skilled and did a great job, but it’s clear there are some areas he had to write quickly and hack together. The flip side of PHP’s ease of use is that sloppiness accumulates very quickly when you’re doing things in a hurry. To some extent, that’s an unavoidable aspect of a growing codebase. But there are also specific things about PHP itself that foster disorganization and unmaintainability:

* The lack of namespaces. This makes it hard to quickly locate a function or class definition. Classes can be used as namespaces, but that’s a hack, and leads to ugly un-OOPish uses of classes. PHP could really benefit from packages or modules.

* While PHP5 has vastly improved its object functionality, it often feels like the developer culture remains mired in a function-oriented paradigm. PHP’s relative ease of use and wide availability on commodity webhosting has produced a huge pool of developers whose skills are pretty wide-ranging. The low end of that tends towards hacky, function-oriented code that simply “gets the job done.” I’d like to see more thoughtful discussion on PHP sites and forums about object design and philosophy, about when to use functions and classes, and about how to mix them up harmoniously.

* Having a library of thousands of built-in functions in a global namespace with little rhyme or reason to their naming doesn’t exactly provide a great model of maintainability.

* extract() should die. Die, die, die.

* There’s not much agreement about OOP performance: some insist that heavy usage of some OOP features slows PHP down a lot, so you should avoid them whenever possible. That’s not only plain dumb, it also leads to deliberately confusing and half-assed uses of OOP in the name of better performance.

Maintainability is a matter of discipline, since you can write sloppy code in any language. That aside, PHP does make it extra hard to keep things orderly. I think CakePHP is a step in the right direction, though if you’re going to use a strict MVC architecture, you might as well dump PHP and just go with Ruby on Rails or Python.

Amateur thoughts and ambitions

One of the better things I’ve stumbled across this past year is Larry Lessig’s talk, How creativity is being strangled by the law.

The piece makes his usual argument that copyright law stifles innovation in the age of new media. Most striking to me, though, was the part where he uses the phrase “amateur culture.” He explains, “…I don’t mean amateurish culture, I mean culture where people produce for the love of what they’re doing and not for the money.” He uses the term to describe the activity of “kids” (?) creating their own remixes from existing media.

I can remember another amateur culture that’s now largely disappeared. Back in my teens, modem-based bulletin board systems (BBSes) fostered a rich “read-write” culture for amateur programmers. Most of us did not work in technology; after all, the commercial Internet hadn’t been born yet, so the computing industry was much smaller and more obscure. A career as a programmer seemed like a mysterious and rarefied thing to me back then. The coders you met on BBSes were often people who simply liked to do programming in their spare time.

These systems allowed us to circulate public domain source code for fun games and useful applications written in BASIC, Pascal, C, even assembler. We hacked on existing code to get it to do what we wanted, trying to figure out ways to push the limits of our little 8086 processors and 640K of RAM. We mingled regardless of our level of knowledge, beginners and experts alike. We had friendly user meetings in diners in Brooklyn and Manhattan (I lived in NY at the time), where we chatted about home-grown upgrades and discussed how to link up to the nation-wide discussion networks that existed then.

It was amateur culture at its best: lots of exchange, circulation, and cooperation happened all the time. But it was definitely not amateurish. Many were extremely capable and knowledgeable coders.

Today, there are still people who code just because they enjoy it, but the amateur culture and its community hardly exist anymore. Beginners on web forums are more interested in what they need to know to land a job than in coding itself. Even open source projects tend to be dominated by career professionals; read any public mailing list and you’ll see how unhelpful they often are to amateurs who want to get involved. One reason I like Python is that the project makes a genuine effort to connect to the sensibilities of amateurs. But even its forums are littered with snarky individuals.

All of this is largely due, I think, to the ideology of professionalism, which convinces us that having a stable career is the pinnacle of achievement. It damagingly equates amateurs with dilettantes. That’s why one of the first things we ask in this country when meeting a stranger is, “So what do you do?” By which we really mean, “Tell me what you do for a living so I can know who you are and whether you’re worth talking to.”

In 2008, I resolve to be more wary of this ideology and its negative effects. I want to embrace being an amateur in the various things that I do. I want to think less about careers and focus more on how to best spend my time doing what’s important to me. And I want to find more amateurs to hang out with as well.

Pimping Python’s property()

A basic tenet of object oriented coding is encapsulation: expose the interface but hide the implementation.

Seductress that it is, Python makes it very tempting to violate this principle. Since all members of an object are public, it’s easy to access and modify them from outside. For example:

class Person:
    def __init__(self):
        self.age = None
jeff = Person()
jeff.age = 5 # maturity

Compare this with Java, where you can make a member private, and use “getters” and “setters”:

class Person {
    private int age;
    public int getAge() {
        return this.age;
    }
    public void setAge(int age) {
        this.age = age;
    }
    public static void main(String[] args) {
        Person jeff = new Person();
        jeff.setAge(5);
    }
}

The Java version has better encapsulation. If you decide later to change “age” to, say, a float, a string(!), or if you move it into another internally linked object, it’s no big deal. Just change the body of the getAge() and setAge() methods to do the typecasting or make the extra call, and you won’t need to change any other code. The implementation changes; the interface stays the same.

Of course, you could write getters and setters in Python, too, but you can’t enforce their use (data members are public, remember?). A better way is to do some magic using __getattr__ and __setattr__, although that could get messy, especially if you have a lot of names to intercept.
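To make the "messy" part concrete, here is a minimal sketch of the __getattr__/__setattr__ approach. The backing field name _age and the int() conversion are assumptions added for illustration; they are not part of the original example.

```python
class Person(object):
    """Intercepts access to 'age' by hand; the real value lives in '_age'."""

    def __init__(self):
        # Bypass our own __setattr__ when creating the backing field.
        object.__setattr__(self, "_age", None)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so 'age' must never be
        # stored as a real attribute on the instance.
        if name == "age":
            return self._age
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Called on every assignment, so each intercepted name needs
        # its own branch here.
        if name == "age":
            # Illustrative typecast (an assumption, not in the original).
            object.__setattr__(self, "_age", int(value))
        else:
            object.__setattr__(self, name, value)


jeff = Person()
jeff.age = "5"  # the hook converts this to an int before storing it
```

Every additional name you want to intercept means another branch in both methods, which is where it starts to get unwieldy.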

This has been bothering me for a while. Then I recently discovered the built-in function property(), which lets you rewrite the example above as follows:

class Person(object):
    def __init__(self):
        self._age = None
    def get_age(self):
        return self._age
    def set_age(self, value):
        self._age = value
    def del_age(self):
        del self._age
    age = property(get_age, set_age, del_age)
jeff = Person()
jeff.age = 5

Sweeeet. From the outside, nothing seems different: it still appears that you’re getting and setting age directly, just as before. But property() calls get_age and set_age transparently, giving you a layer that effectively provides encapsulation, even though you can’t tell from the outside. After all, that’s the whole point of encapsulation: you shouldn’t be able to tell from the outside.

In fact, this is actually better than how Java does it. A first version of Person might very well have age as a member which its users directly modify: this is intuitive and effortless, both conceptually and syntactically. Only later might you need to bring property() into the picture, as the object’s internals change or grow more complex. And that’s how it should be: more lines of code and more complexity only when you need it.
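As a rough sketch of that "only later" step, suppose a later version of Person needs to reject negative ages. The validation rule is a made-up requirement for illustration; the point is that the calling code does not change.

```python
class Person(object):
    def __init__(self):
        self._age = None

    def get_age(self):
        return self._age

    def set_age(self, value):
        # Hypothetical rule added in a later version of the class.
        if value is not None and value < 0:
            raise ValueError("age cannot be negative")
        self._age = value

    age = property(get_age, set_age)


jeff = Person()
jeff.age = 5  # same calling code as the very first version

rejected = False
try:
    jeff.age = -1  # now goes through set_age and is refused
except ValueError:
    rejected = True
```

Code that was written against the plain attribute keeps working, and only the class itself had to grow.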

(Of course, you still can’t hide data members, but this still goes a long way towards better, though not complete, encapsulation.)

I can’t remember ever seeing property() mentioned anywhere in tutorials or introductory docs (it’s in the reference for built-in functions). That’s especially odd considering that one of Python’s strengths is its powerful object features. It would make sense to talk about this in the context of encapsulation. Plus I wonder how widespread the use of property() is in real-world code.

Stuff like this is why I love Python.

Update: Some good comments over at reddit.

Mashup: Google Maps + Amazon Reviews

Saturday morning I woke up with a rather odd idea: to create a mashup of Google Maps and Amazon book reviews. I was mildly surprised to discover it hadn’t been done yet. Here’s the result of spending a chunk of the weekend writing it in Python (11/15 Update: should now work in Safari, Firefox 2, and IE6):

http://mapreviews.codefork.com

It didn’t take long to write, since I’d recently been investigating the Google Maps API for a friend, and I’d also been doing bits of Amazon integration for a project. The biggest pain was dealing with the rate limit on Amazon’s ECS API, which allows one call per second per IP address: the service returns error codes if you exceed the limit. So the application is pretty slow, especially if others are also using it.
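One way to stay under a limit like that is to space out calls on the client side. The sketch below is not the code the site actually uses; the Throttle class and its parameters are hypothetical names, and it assumes a single process making all the calls.

```python
import time


class Throttle(object):
    """Spaces calls at least min_interval seconds apart (hypothetical helper)."""

    def __init__(self, min_interval=1.0, clock=time.time, sleep=time.sleep):
        self.min_interval = min_interval
        # clock and sleep are injectable so the class can be tested
        # without actually waiting.
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling wait() right before each request to the service would keep one process under the limit. It would not help when several users hit the application at once from the same server IP, which is part of why caching popular lookups matters.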

But if you’re patient, it’s fun to watch the map markers gradually pop up. It’s also fun to look up a book with strong political leanings, and see how the ratings are distributed geographically. For example, you can look at The Communist Manifesto by Marx, and Bill O’Reilly’s Culture Warrior (*shudder*). Data for both are cached, so they should load very fast.