September 2007 – codefork.com

A Lesson in Software Development

Slashdot reported on an opinion piece by David Sivers entitled, “7 reasons I switched back to PHP after 2 years on Rails.” It’s sparked heated discussion about languages, but if you read the article carefully, it’s not really anti-Ruby or pro-PHP, though David does imply serious shortcomings to Rails. He writes, “at every step, it seemed our needs clashed with Rails’ preferences.”

The problem with the short piece is that he’s not very specific about what these “preferences” are. The main criticisms seem to be that Rails is too complicated, too slow, and doesn’t allow direct SQL. Complexity and performance are often the cost of using a large web framework; these don’t seem like Rails problems, but issues you’d have to deal with when using any big framework (in fact, there’s an apples-and-oranges dimension here: Ruby on Rails is a full web stack and architecture, whereas PHP is a scripting language with built-in access to tons of different libraries. But we’ll put that aside for now.). As for direct SQL, a lot of folks have pointed out that David is just plain mistaken about not being able to do what he wanted.

The real reasons he switched back actually have little to do with PHP per se: read the piece carefully and you’ll see he’s much more comfortable thinking in terms of libraries than frameworks. Also, he needed to integrate tightly with an existing codebase. Those two things are the real reasons why switching back to PHP worked for him. They don’t have anything to do with either the strengths or failures of Ruby on Rails itself.

David doesn’t quite do justice to his main point, which is actually about software development, not language features: don’t expect a much touted language or tool to work magic for you. Of course, this should apply as much to PHP as to Rails! I’m not trying to defend Rails, which I know zilch about. There’s a bigger lesson here: the strengths and weaknesses of languages have less to do with success than the overall environment, the functional and integration requirements, and the coders’ facility with a toolset.

The Virtues of Simplicity

In this age of bloated software, it’s hard to find solutions that do what you need without a ton of unnecessary complexity. Software that’s more complex than the problem at hand makes it costly to learn, maintain, and troubleshoot. For companies and individuals with limited resources, that’s a real challenge. Glenn (my current client) and I wanted a solution to replace some slow performing CGIs but mod_perl was too much. Plus the idea of embedding a perl interpreter in Apache seemed scary.

I looked for perl web application server options that could be proxied through Apache. That architecture would give us the performance benefits of code preloaded into memory, persistent database connections, and precompiled templates. The only thing I found was OpenInteract, but its module list is quite large, and the project hasn’t been updated in a while. I didn’t want to have to dig around a ton of foreign code if I needed to squash a bug. So I decided to write my own.

The result is a neat little thing I call “perlserver,” a heavily modified version of this piece of public domain code for a preforking HTTP server. I deliberately kept it simple but gave it all the needed features: URL-to-handler mapping, safeguards against memory leaks, maximum configurability. It’s only 450 lines of code and is tightly integrated with the LWP modules. To convert the existing set of perl CGIs, the code was simply wrapped in packages that conform to a simple API understood by perlserver. Existing URLs were proxied to perlserver by clever RewriteRules in Apache.

Before, a CGI that built a complex page typically took 900 – 1500ms. Now, the same page served from perlserver by proxy takes around 300 – 400ms.

I’m thinking about releasing the code as open source. It’s not fancy, but that’s why it’s great: it’s a good option for sites that want better perl performance without the tedious complexity of other existing solutions.

Decorators, CherryPy Tools, and Other Python Adventures

In my free time I’ve been working on my own interesting side project using CherryPy. This is my first major foray into Python: I’ve admired it for a long time, but haven’t used it much except for the occasional small script. So it’s pretty awesome to be really digging in. And I’m finding the more I learn about Python, the more I love it.

CherryPy, like Python, is extremely easy to start developing with, but it also has a ton of mind-blowing stuff available when you’re ready to do more. One of these more advanced features is what they call “Tools,” which (among other things!) let you write callbacks into various points of the HTTP request-response cycle. The documentation explains tools in detail, but a good practical example is here. I’ll condense it to relevant bits:

def noBodyProcess():
    """Sets cherrypy.request.process_request_body = False, giving
    us direct control of the file upload destination. By default
    cherrypy loads it to memory, we are directing it to disk."""
    cherrypy.request.process_request_body = False

cherrypy.tools.noBodyProcess = cherrypy.Tool('before_request_body', noBodyProcess)

class fileUpload:
    """fileUpload cherrypy application"""

    """ [bunch of code cut out] """    

    @cherrypy.expose
    @cherrypy.tools.noBodyProcess()
    def upload(self, theFile=None):
        """upload action
        """ [more code ... ] """

The example shows how to set cherrypy.request.process_request_body to False, at the “before_request_body” hook; this overrides the default behavior, allowing you to deal directly with the request body contents.

The nice thing is you don’t need to understand a whole lot about the Tools architecture to make them work, although some things puzzled me initially (more below). Since I really wanted to know why and how the above did what it did, I spent some time poking around. Some things I discovered:

1) Decorators (the lines with the @ symbol) are executed when the class definition is executed. It’s a bit of shortcut syntax for modifying method definitions. I was confused about this for a while, thinking that decorators are just simple wrappers, called each time the function is. Nope!

2) The Tool decorator above modifies an attribute called “_cp_config” of the index() callable. (Not only do objects have attributes, but functions do too in Python–in fact, functions are actually objects! Wacky.) This is how CherryPy stores info about the Tools that should apply to specific handlers.

3) When Request.run() executes, it looks at the relevant Tools, and calls into them as appropriate. In this example, the specific Tool created says noBodyProcess() should be executed at the “process_request_body” point in the request cycle. So it does.

4) cherrypy.request is a strange thing. I was wondering why it’s accessed everywhere directly, as opposed to being passed as request instances into the handler (as it is, say, in Java Servlets). Doesn’t that mean every thread is handling the same request?! Nope. Turns out cherrypy.request is able to store per-thread data, even though the name is accessed globally. (See the threading.local class.)

The convenience in CherryPy comes at the cost of some transparency and intuitiveness: not a high cost, mind you, but a cost nevertheless. Don’t get me wrong, I think CherryPy is pretty excellent. Still… it really tripped me up that Request.run() examines the handler’s attributes for Tool callbacks, instead of storing that information separately (there may well be good reasons for doing it the way it’s done). The fact that cherrypy.request is thread-local also prompted a “Huh?!?!” at first.

Just claiming Technorati, nothing to see here…

My Technorati Profile.