Small Victories

It always feels very rewarding when my coding-related problems have happy outcomes. A few small but cool things that have happened recently:

A while ago, I discovered an issue with the way the cherrypy web server resolves “localhost” to an IPv6 address on some operating systems. The cherrypy folks actually listened to my suggestion and made a helpful change.

It looks like one person has already benefited from the tiny patch I created for a feedparser issue last week.

When I couldn’t get SQLAlchemy’s “pool_recycle” option to properly close and re-open inactive connections, the estimable Michael Bayer took a minute out of his busy life to explain what I had missed, for which I was extremely grateful. (One of these days, I need to write a post about how truly amazing SQLAlchemy is.)

Open source projects create the opportunities for good things to happen. Most of the time. (Or maybe it’s just the python projects.)

In praise of feedparser

I discovered an issue this morning in the excellent feedparser module.

feedparser (aka Universal Feed Parser) has a reputation in the python community for being an incredible piece of code. With good reason: it understands a mind-boggling array of feed formats and versions, and it’s been put through its paces with a suite of 3262 unit tests. Mark Pilgrim’s terrific work has saved me (and many others, no doubt) months of toil and sweat.

The issue has to do with encountering multiple “title” tags defined in different XML namespaces. A dc:title or media:title tag, if it is encountered anywhere after a regular RSS title tag, will overwrite it. Several other people have actually already documented this in the bug tracking system.
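To make that concrete, here is a made-up item of the kind involved (it is not taken from the bug reports). The sketch uses the standard library's ElementTree rather than feedparser, purely to show that the two tags are distinct elements once namespaces are taken into account; it says nothing about feedparser's internals.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample item: a plain RSS title followed by a dc:title.
DC_NS = "http://purl.org/dc/elements/1.1/"
SAMPLE_ITEM = """\
<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Plain RSS title</title>
  <dc:title>Dublin Core title</dc:title>
</item>
"""


def titles_by_namespace(xml_text):
    """Return {qualified tag name: text} for every title-like child."""
    item = ET.fromstring(xml_text)
    return {child.tag: child.text
            for child in item
            if child.tag.endswith("title")}


titles = titles_by_namespace(SAMPLE_ITEM)
for tag, text in sorted(titles.items()):
    print(tag, "=>", text)
```

A namespace-aware parser sees "title" and "{http://purl.org/dc/elements/1.1/}title" as different names, so neither has to clobber the other.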

It’s unclear how much work is actively being done with feedparser these days, so I reluctantly dived into the module to try to see what was going on. A little over an hour later, with almost no pain, I found myself with a patch that passed the test suite. I’d love to say it was due to my incredible coding prowess, but really, the code is just amazingly clean and easy to understand. That’s how fixing bugs should be.

The patch is here if anyone wants it.

The Uses of Function Attributes

A Python function is an object of type “function.” One of the things you can do, as a result, is set attributes on the function object, like so:

def say_hello():
    print("hi!")

say_hello.x = 1

(You can also do that on a function that’s an object method.)
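One wrinkle with methods, as far as I can tell: the attribute lives on the underlying function, so it has to be set there (for instance inside the class body) rather than through a bound method. A small sketch, with a made-up Greeter class:

```python
class Greeter(object):
    def greet(self):
        return "hi!"
    # Inside the class body, greet is still a plain function.
    greet.x = 1


g = Greeter()

# Reading through the bound method falls through to the function.
value_seen_through_instance = g.greet.x

# Assigning through the bound method is refused.
try:
    g.greet.x = 2
    assignment_allowed = True
except AttributeError:
    assignment_allowed = False
```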

This is pretty odd. In some ways, this feature interestingly blurs the distinction among classes, objects, and functions, since all of them can have their own attributes. It strikes me that this illustrates one of the key philosophical differences between Python and a language like Java, where object-oriented principles are more rigidly enforced. In Java, a class is a class, an object is an object, and a method is a method. We shall not even speak of functions!

For a while, I understood this language feature, but not its real utility, which is this: you don’t have to write entire classes for small pieces of functionality.

cherrypy is a good example. It uses the attribute name “exposed” to indicate whether a method should be accessible to the URL-to-object mapping mechanism. Combined with some other clever design, this allows an HTTP request handler to be an ordinary method in a tree structure that corresponds to the web application’s URL scheme. By contrast, in Java, you have to subclass Servlet for each handler, and then manually map those Servlets to URLs. Ugh.
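Here is a rough sketch of the idea in plain Python. It is not cherrypy's actual dispatcher: the dispatch() function and the Root class are made up for illustration, and only the “exposed” attribute name comes from cherrypy.

```python
class Root(object):
    def index(self):
        return "home page"
    index.exposed = True      # mark this method as reachable from a URL

    def secret(self):
        return "internal only"
    # no "exposed" attribute, so it should never be served


def dispatch(root, path):
    """Hypothetical dispatcher: map '/name' to root.name if it is exposed."""
    name = path.strip("/") or "index"
    handler = getattr(root, name, None)
    if handler is None or not getattr(handler, "exposed", False):
        return "404 Not Found"
    return handler()


root = Root()
print(dispatch(root, "/"))
print(dispatch(root, "/secret"))
```

The handler is just a method with a flag on it; no extra class or mapping file is involved.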

Another use is that decorators can manipulate the attributes of functions they modify. If the decorator’s behavior should change based on state, or it wants to track statistics or debugging information, it can use the underlying function’s attributes for storage. (Some examples of this are strewn throughout the PythonDecoratorLibrary page at the Python Wiki.) Again, this avoids extra class definitions and objects, which would typically be necessary for separately storing that state information.
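A minimal sketch of the statistics case, assuming a made-up count_calls decorator. Here the counter is kept on the wrapper function (the one callers actually see) rather than on the original, which is one of a few reasonable places to put it.

```python
import functools


def count_calls(func):
    """Hypothetical decorator that counts calls using a function attribute."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0         # state lives on the function object itself
    return wrapper


@count_calls
def add(a, b):
    return a + b


add(1, 2)
add(3, 4)
print(add.calls)
```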

The last use I can think of right now is that you can easily and quickly create singletons. Because classes are meant for instantiation into objects, it can be inelegant to enforce a singleton pattern. With function attributes, you can define a function in a module, set some defaults on it, and allow callers to manipulate them; since a module is only loaded once, there is a single set of values application-wide.

(Note that you CAN create multiple instances of a function, as this interesting example shows. But the singleton idea is still sound in the context of module functions.)
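Something like the following is what I have in mind. The log() function and its attributes are invented for the example; imagine them living in a module that the rest of the application imports.

```python
def log(message):
    """Hypothetical module-level logger configured via its own attributes."""
    line = "%s%s" % (log.prefix, message)
    log.history.append(line)
    return line


# Defaults, set once when the module is first imported.
log.prefix = "[app] "
log.history = []

# Any caller, anywhere in the application, can adjust the shared settings.
log.prefix = "[debug] "
first = log("starting up")
print(first)
```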

Decorators, CherryPy Tools, and Other Python Adventures

In my free time I’ve been working on my own interesting side project using CherryPy. This is my first major foray into Python: I’ve admired it for a long time, but haven’t used it much except for the occasional small script. So it’s pretty awesome to be really digging in. And I’m finding the more I learn about Python, the more I love it.

CherryPy, like Python, is extremely easy to start developing with, but it also has a ton of mind-blowing stuff available when you’re ready to do more. One of these more advanced features is what they call “Tools,” which (among other things!) let you write callbacks into various points of the HTTP request-response cycle. The documentation explains tools in detail, but a good practical example is here. I’ll condense it to relevant bits:

import cherrypy

def noBodyProcess():
    """Sets cherrypy.request.process_request_body = False, giving
    us direct control of the file upload destination. By default
    cherrypy loads it to memory, we are directing it to disk."""
    cherrypy.request.process_request_body = False

# Register the function as a Tool that fires at the 'before_request_body' hook.
cherrypy.tools.noBodyProcess = cherrypy.Tool('before_request_body', noBodyProcess)

class fileUpload:
    """fileUpload cherrypy application"""

    # [bunch of code cut out]

    @cherrypy.expose
    @cherrypy.tools.noBodyProcess()
    def upload(self, theFile=None):
        """upload action"""
        # [more code ... ]

The example shows how to set cherrypy.request.process_request_body to False, at the “before_request_body” hook; this overrides the default behavior, allowing you to deal directly with the request body contents.

The nice thing is you don’t need to understand a whole lot about the Tools architecture to make them work, although some things puzzled me initially (more below). Since I really wanted to know why and how the above did what it did, I spent some time poking around. Some things I discovered:

1) Decorators (the lines with the @ symbol) are executed once, when the class definition is executed. It’s a bit of shortcut syntax for modifying method definitions. I was confused about this for a while, thinking that decorators are just simple wrappers, called each time the function is. Nope! (A decorator can return a wrapper that runs on every call, but the decorator itself only runs at definition time.)

2) The Tool decorator above modifies an attribute called “_cp_config” of the upload() callable. (Not only do objects have attributes, but functions do too in Python; in fact, functions are actually objects! Wacky.) This is how CherryPy stores info about the Tools that should apply to specific handlers.

3) When Request.run() executes, it looks at the relevant Tools, and calls into them as appropriate. In this example, the specific Tool created says noBodyProcess() should be executed at the “before_request_body” point in the request cycle. So it does.

4) cherrypy.request is a strange thing. I was wondering why it’s accessed everywhere directly, as opposed to being passed as request instances into the handler (as it is, say, in Java Servlets). Doesn’t that mean every thread is handling the same request?! Nope. Turns out cherrypy.request is able to store per-thread data, even though the name is accessed globally. (See the threading.local class.)
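The first three points can be mimicked in a few lines of plain Python. To be clear, this is my own toy imitation and not CherryPy's code: tool(), run(), and the “_config” attribute are all made-up names, and the real machinery is considerably more involved.

```python
events = []


def tool(hook_name, callback):
    """Hypothetical stand-in for a Tool: its decorator only tags the handler."""
    def decorator(func):
        events.append("decorating " + func.__name__)
        config = getattr(func, "_config", {})   # made-up attribute name
        config[hook_name] = callback
        func._config = config
        return func           # the same function comes back, not a wrapper
    return decorator


def no_body_process():
    events.append("hook ran")


class FileUpload(object):
    @tool("before_request_body", no_body_process)
    def upload(self):
        events.append("handler ran")
        return "ok"


events.append("class defined")


def run(handler):
    """Hypothetical request cycle: fire the tagged hook, then the handler."""
    hook = getattr(handler, "_config", {}).get("before_request_body")
    if hook is not None:
        hook()
    return handler()


result = run(FileUpload().upload)
print(events)
```

The decorator's work is done before the class even finishes being defined; the hook only runs later, when something goes looking for the attribute.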

The convenience in CherryPy comes at the cost of some transparency and intuitiveness: not a high cost, mind you, but a cost nevertheless. Don’t get me wrong, I think CherryPy is pretty excellent. Still… it really tripped me up that Request.run() examines the handler’s attributes for Tool callbacks, instead of storing that information separately (there may well be good reasons for doing it the way it’s done). The fact that cherrypy.request is thread-local also prompted a “Huh?!?!” at first.
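For what it's worth, the thread-local trick is easy to see in isolation with the standard library's threading.local. This sketch has nothing CherryPy-specific in it; the request name, the handle() function, and the paths are just stand-ins.

```python
import threading

request = threading.local()   # one global name, separate contents per thread
barrier = threading.Barrier(2)
seen = {}


def handle(path):
    request.path = path
    barrier.wait()            # make sure both threads have written first
    # Each thread still sees only the value it stored itself.
    seen[threading.current_thread().name] = request.path


threads = [
    threading.Thread(target=handle, args=("/a",), name="worker-a"),
    threading.Thread(target=handle, args=("/b",), name="worker-b"),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never set request.path, so it has no such attribute.
main_thread_has_path = hasattr(request, "path")
print(seen)
```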