Monthly Archives: October 2007

In praise of feedparser

I discovered an issue this morning in the excellent feedparser module.

feedparser (aka Universal Feed Parser) has a reputation in the Python community for being an incredible piece of code. With good reason: it understands a mind-boggling array of feed formats and versions, and it’s been put through its paces with a suite of 3262 unit tests. Mark Pilgrim’s terrific work has saved me (and many others, no doubt) months of toil and sweat.

The issue has to do with encountering multiple “title” tags defined in different XML namespaces. A dc:title or media:title tag, if it is encountered anywhere after a regular RSS title tag, will overwrite it. Several other people have actually already documented this in the bug tracking system.
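To make the collision concrete, here’s a toy illustration using only the standard library. It is not feedparser’s code, and the function names (local_name, naive_fields, namespace_aware_fields) are made up for this sketch. It just shows how keying elements by their local name lets a later dc:title clobber a plain title, while keying by the fully qualified tag keeps both.

```python
import xml.etree.ElementTree as ET

# A made-up feed item with a plain RSS title followed by a Dublin Core title.
ITEM_XML = """
<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Plain RSS title</title>
  <dc:title>Dublin Core title</dc:title>
</item>
"""

def local_name(tag):
    # ElementTree spells namespaced tags as "{uri}name"; drop the "{uri}" part.
    return tag.rsplit("}", 1)[-1]

def naive_fields(item):
    # Keyed by local name only, so a later dc:title replaces the earlier title.
    fields = {}
    for child in item:
        fields[local_name(child.tag)] = child.text
    return fields

def namespace_aware_fields(item):
    # Keyed by the full qualified tag, so both titles survive side by side.
    return {child.tag: child.text for child in item}

item = ET.fromstring(ITEM_XML)
naive = naive_fields(item)
aware = namespace_aware_fields(item)
```

In the naive version the plain RSS title is gone by the time parsing finishes, which is the symptom described above.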

It’s unclear how much work is actively being done with feedparser these days, so I reluctantly dived into the module to try to see what was going on. A little over an hour later, with almost no pain, I found myself with a patch that passed the test suite. I’d love to say it was due to my incredible coding prowess, but really, the code is just amazingly clean and easy to understand. That’s how fixing bugs should be.

The patch is here if anyone wants it.

Musings on User Interfaces

It’s interesting to read how different people are reacting to the user interface changes (“improvements?”) in Leopard, which was released today. It’s probably premature to say how opinion will ultimately shake out. But I’ve started to collect some noteworthy opinions.

Also concerning user interfaces: Mozilla Prism has been announced. It’s a project that extends the existing browser platform of established web standards so that web applications can run as desktop applications. The description is light on implementation details, and several comments on the announcement already express skepticism.

Ian Bicking is clearly excited about it (I actually found the announcement through his truly wonderful blog). He sees Prism as embracing the essence of the web model. The key, I think, is whether the extensions will be rich enough: will there be ways to access local storage, control security, and access devices? Without those changes, I’m not sure the desktop app “frame” around the web application buys all that much. Some underlying functionality needs to accompany the desktop interface in order to be really transformative.

Is “race” a valid scientific construct?

James Watson, of Watson and Crick fame, resigned this morning (Race row DNA scientist quits lab). No doubt it was due to the controversy over his incredibly offensive comments about people of African descent:

He was quoted as saying he was “inherently gloomy about the prospect of Africa” because “all our social policies are based on the fact that their intelligence is the same as ours – whereas all the testing says not really”.

He later issued a pretty feeble apology; actually more of a statement that he’d been misunderstood. On the subject of whether Africans were genetically inferior, he retracted his earlier statement: “there is no scientific basis for such a belief.”

The most interesting discussions so far about this incident revolve around the question: Is “race” a valid scientific construct for genetic research, or is it merely a social construct? It’s a deceptive question whose answer probably isn’t either/or: for example, I think it’s been shown that certain populations correlated with “race” are more genetically prone to having particular medical conditions.

I’m a skeptic when it comes to science. For me, the deeper question is: when does race get used in research and when doesn’t it? For what purposes? Science rarely, if ever, takes place in a vacuum of objectivity. Research gets funded, often in order to support human decisions about something. Note that Watson’s original comments reference “social policies.” He also mentioned what employers tend to think about the intelligence of people of African descent.

This is what’s truly scary about genetics in this day and age. It definitely can be a tool for helping people. But I suspect the greater likelihood is that it’ll be a tool for deciding who’s smart/able, who’s more deserving of opportunity, even who’s more deserving of a chance to live (health insurance companies love this stuff!).

My new web project: DebateWire

A big reason I read blogs is that there are great writers out there who raise questions I hadn’t considered before. Often they make me look at things I care about in a new light.

It’d be cool, I thought, to be able to solidify questions of debate as a way to organize blog entries. It’s more specific than categories or tags, and if someone sees a provocative question that really makes them think, they could chime in with their own opinion or argument. I’m particularly interested in politics, but it would work for issues of debate on any controversial topic.

For the past few weeks, I’ve been sort of obsessed with that idea. The project is called DebateWire, and it lets you do what I described above. It’s got some rough edges, but I decided it’s ready for a first release. Please take a look, and spread the word if you like it.

My next post is an example of how to use it.

The Uses of Function Attributes

A Python function is an object of type “function.” One of the things you can do, as a result, is set attributes on the function object, like so:

def say_hello():
	print("hi!")

say_hello.x = 1  # the attribute is stored on the function object itself

(You can also do that on a function that’s an object method.)
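There’s a wrinkle with methods worth spelling out. Here’s a small sketch (the Greeter class is made up for illustration): the attribute has to be set on the underlying function, not on the bound method you get from an instance, although reading it through the bound method works fine.

```python
class Greeter:
    def say_hello(self):
        return "hi!"

    # Inside the class body, say_hello is still a plain function object.
    say_hello.x = 1

g = Greeter()

# Reading through the bound method falls through to the function's attribute.
value_via_instance = g.say_hello.x

# Setting through the bound method is refused; bound methods are thin wrappers.
try:
    g.say_hello.x = 2
    set_via_bound_method = True
except AttributeError:
    set_via_bound_method = False

# Going to the wrapped function explicitly works.
g.say_hello.__func__.x = 2
value_after_update = g.say_hello.x
```

Since the attribute lives on the function, every instance of the class sees the same value.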

This is pretty odd. In some ways, this feature interestingly blurs the distinction among classes, objects, and functions, since all of them can have their own attributes. It strikes me that this illustrates one of the key philosophical differences between Python and a language like Java, where object-oriented principles are more rigidly enforced. In Java, a class is a class, an object is an object, and a method is a method. We shall not even speak of functions!

For a while, I understood this language feature, but not its real utility, which is this: you don’t have to write entire classes for small pieces of functionality.

CherryPy is a good example. It uses the attribute name “exposed” to indicate whether a method should be accessible to the URL-to-object mapping mechanism. Combined with some other clever design, this allows an HTTP request handler to be an ordinary method in a tree of objects that corresponds to the web application’s URL scheme. By contrast, in Java, you have to subclass HttpServlet for each handler, and then manually map those Servlets to URLs. Ugh.
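To show the idea without pulling in the framework, here’s a toy dispatcher. It is not CherryPy’s implementation; the dispatch function, the Root and Blog classes, and the (status, body) return shape are all assumptions made up for this sketch. Only the “exposed” attribute name comes from CherryPy.

```python
class Blog:
    def recent(self):
        return "latest posts"
    recent.exposed = True

class Root:
    def __init__(self):
        self.blog = Blog()  # mounts /blog/... under the root

    def index(self):
        return "home page"
    index.exposed = True

    def secret(self):
        # No exposed attribute, so the dispatcher refuses to serve it.
        return "not for the web"

def dispatch(root, path):
    """Walk path segments as attribute names; call the target only if exposed."""
    node = root
    for segment in [s for s in path.split("/") if s]:
        node = getattr(node, segment, None)
        if node is None:
            return 404, "Not Found"
    if node is root:
        # An empty path maps to the root's index handler, if there is one.
        node = getattr(root, "index", None)
    if not callable(node) or not getattr(node, "exposed", False):
        return 404, "Not Found"
    return 200, node()

root = Root()
```

With this in place, dispatch(root, "/blog/recent") reaches the handler, while dispatch(root, "/secret") is turned away even though the method exists.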

Another use is that decorators can manipulate the attributes of functions they modify. If the decorator’s behavior should change based on state, or it wants to track statistics or debugging information, it can use the underlying function’s attributes for storage. (Some examples of this are strewn throughout the PythonDecoratorLibrary page at the Python Wiki.) Again, this avoids extra class definitions and objects, which would typically be necessary for separately storing that state information.
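A minimal sketch of that idea, assuming a made-up decorator called count_calls: it keeps a call counter as an attribute on the function it returns, so no separate class or object is needed to hold the statistic. (I store the counter on the wrapper rather than the wrapped function, since the wrapper is what callers actually hold on to.)

```python
import functools

def count_calls(func):
    """Decorator that tracks how many times the function has been called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1  # state lives on the function object
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def add(a, b):
    return a + b

first = add(1, 2)
second = add(3, 4)
```

After those two calls, add.calls holds the running total, and anyone with a reference to add can read or reset it.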

The last use I can think of right now is that you can easily and quickly create singletons. Because classes are meant for instantiation into objects, it can be inelegant to enforce a singleton pattern. With function attributes, you can define a function in a module, set some defaults as attributes, and let callers change them. Because a module is only loaded once, there is a single set of those values application-wide.
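Here’s a rough sketch of what I mean, with a made-up log function whose settings and history hang off the function itself. Any code that imports the module sees the same function object and therefore the same attributes.

```python
def log(message):
    """Format a message using settings stored on the function itself."""
    line = log.prefix + message
    log.history.append(line)
    return line

# Defaults, set once when the module is loaded.
log.prefix = "[app] "
log.history = []

first_line = log("started")

# A caller anywhere in the application can change the shared setting.
log.prefix = "[debug] "
second_line = log("details")
```

There’s no class to instantiate and nothing to guard against a second instance; the function is the single shared object.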

(Note that you CAN create multiple instances of a function, as this interesting example shows. But the singleton idea is still sound in the context of module functions.)

The Right Metaphor

If you haven’t caught Kyle Wilson’s recent piece, “Software is Hard,” I highly recommend it. The essay moves elegantly from book review, to musings on knowing when the code is “done,” to issues of measuring quality, to the ever-present problems of lateness and going over budget, to the potential inadequacy of “engineering” as the metaphor for writing software.

It’s the last topic that’s the most fascinating to me. Kyle points out that new software is written only in response to new problems (otherwise, you’d just use existing software). As such, new code ventures into the unknown, where you can, at best, only guess at the challenges you’ll encounter. We always try our best to assess what we’ll face, but by their very nature, these are imperfect assessments. As Kyle puts it, “The only way to avoid that is to have your design go all the way down to specifying individual lines of code, in which case you aren’t designing at all, you’re just programming.”

Which is not to say, of course, we should simply give up engineering. Without some sort of plan for design and advance assessment, we’d be utterly lost. Businesses couldn’t function and programmers couldn’t make a living. For better or worse, the smooth functioning of our society is founded on the arrogance of making accurate predictions, not just about business and software, but about everything from politics and law, to human behavior and psychology, to weather. Such hubris…

No surprise, then, that even real-world traditional engineering often fails to be predictable. Kyle mentions the Oakland Bay Bridge as a project that’s hugely over time and budget. Just yesterday, Boeing announced its much-anticipated Dreamliner would be six months late.

So maybe software engineering IS the right term after all.