Category Archives: python

Amateur thoughts and ambitions

One of the better things I’ve stumbled across this past year is Larry Lessig’s talk, How creativity is being strangled by the law.

The piece makes his usual argument that copyright law stifles innovation in the age of new media. Most striking to me, though, was the part where he uses the phrase “amateur culture.” He explains, “…I don’t mean amateurish culture, I mean culture where people produce for the love of what they’re doing and not for the money.” He uses the term to describe the activity of “kids” (?) creating their own remixes from existing media.

I can remember another amateur culture that’s now largely disappeared. Back in my teens, modem-based bulletin board systems (BBSes) fostered a rich “read-write” culture for amateur programmers. Most of us did not work in technology; after all, the commercial Internet hadn’t been born yet, so the computing industry was much smaller and more obscure. A career as a programmer seemed like a mysterious and rarefied thing to me back then. The coders you met on BBSes were often people who simply liked to do programming in their spare time.

These systems allowed us to circulate public domain source code for fun games and useful applications written in BASIC, Pascal, C, even assembler. We hacked on existing code to get it to do what we wanted, trying to figure out ways to push the limits of our little 8086 processors and 640K of RAM. We mingled regardless of our level of knowledge, beginners and experts alike. We had friendly user meetings in diners in Brooklyn and Manhattan (I lived in NY at the time), where we chatted about home-grown upgrades and discussed how to link up to the nation-wide discussion networks that existed then.

It was amateur culture at its best: lots of exchange, circulation, and cooperation happened all the time. But it was definitely not amateurish. Many were extremely capable and knowledgeable coders.

Today, there are still people who code just because they enjoy it, but the amateur culture and its community hardly exist anymore. Beginners on web forums are more interested in what they need to know in order to land a job, rather than in coding itself. Even open source projects tend to be dominated by career professionals; read any public mailing list and you’ll see how unhelpful they often are to amateurs who want to get involved. One reason I like python is that the project makes a genuine effort to connect to the sensibilities of amateurs. But even its forums are littered with snarky individuals.

All of this is largely due, I think, to the ideology of professionalism, which convinces us that having a stable career is the pinnacle of achievement. It damagingly equates amateurs with dilettantes. That’s why one of the first things we ask in this country when meeting a stranger is, “So what do you do?” By which we really mean, “Tell me what you do for a living so I can know who you are and whether you’re worth talking to.”

In 2008, I resolve to be more wary of this ideology and its negative effects. I want to embrace being an amateur in the various things that I do. I want to think less about careers and focus more on how to best spend my time doing what’s important to me. And I want to find more amateurs to hang out with as well.

Pimping Python’s property()

A basic tenet of object oriented coding is encapsulation: expose the interface but hide the implementation.

Seductress that it is, Python makes it very tempting to violate this principle. Since all members of an object are public, it’s easy to access and modify them from outside. For example:

class Person:
    def __init__(self):
        self.age = None
jeff = Person()
jeff.age = 5 # maturity

Compare this with Java, where you can make a member private, and use “getters” and “setters”:

class Person {
    private int age;
    public int getAge() {
        return this.age;
    }
    public void setAge(int age) {
        this.age = age;
    }
    public static void main(String[] args) {
        Person jeff = new Person();
        jeff.setAge(5);
    }
}

The Java version has better encapsulation. If you decide later to change “age” to, say, a float, a string(!), or if you move it into another internally linked object, it’s no big deal. Just change the body of the getAge() and setAge() methods to do the typecasting or make the extra call, and you won’t need to change any other code. The implementation changes; the interface stays the same.

Of course, you could write getters and setters in Python, too, but you can’t enforce their use (data members are public, remember?). A better way is to do some magic using __getattr__ and __setattr__, although that could get messy, especially if you have a lot of names to intercept.

This has been bothering me for a while. Then I recently discovered the built-in function property(), which lets you rewrite the example above as follows:

class Person(object):
    def __init__(self):
        self._age = None
    def get_age(self):
        return self._age
    def set_age(self, value):
        self._age = value
    def del_age(self):
        del self._age
    age = property(get_age, set_age, del_age)
jeff = Person()
jeff.age = 5

Sweeeet. From the outside, nothing seems different: it still appears that you’re getting and setting age directly, just as before. But property() calls get_age and set_age transparently, giving you a layer that effectively provides encapsulation, even though you can’t tell from the outside. After all, that’s the whole point of encapsulation: you shouldn’t be able to tell from the outside.

In fact, this is actually better than how Java does it. A first version of Person might very well have age as a member which its users directly modify: this is intuitive and effortless, both conceptually and syntactically. Only later might you need to bring property() into the picture, as the object’s internals change or grow more complex. And that’s how it should be: more lines of code and more complexity only when you need it.

(Of course, you still can’t hide data members, but this still goes a long way towards better, though not complete, encapsulation.)

I can’t remember ever seeing property() mentioned anywhere in tutorials or introductory docs (it’s in the reference for built-in functions) That’s especially odd considering that one of Python’s strengths is its powerful object features. It would make sense to talk about this in the context of encapsulation. Plus I wonder how widespread the use of property() is in real world code.

Stuff like this is why I love Python.

Update: Some good comments over at reddit.

Mashup: Google Maps + Amazon Reviews

Saturday morning I woke up with a rather odd idea: to create a mashup of google maps and Amazon book reviews. I was mildly surprised to discover it hadn’t been done yet. Here’s the result of spending a chunk of the weekend writing it in python (11/15 Update: should now work in Safari, Firefox2, and IE6):

http://mapreviews.codefork.com

It didn’t take long to write, since I’d recently been investigating the google maps API for a friend, and I’d also been doing bits of Amazon integration for a project. The biggest pain was dealing with the 1 call per second per IP address rate limiting mechanism of Amazon’s ECS API: the service returns error codes if you exceed the limit. So the application is pretty slow, especially if others are also using it.

But if you’re patient, it’s fun to watch the map markers gradually pop up. It’s also fun to look up a book with strong political leanings, and see how the ratings are distributed geographically. For example, you can look at The Communist Manifesto by Marx, and Bill O’Reilly’s Culture Warrior (*shudder*). Data for both are cached, so they should load very fast.

Small Victories

It always feels very rewarding when my coding-related problems have happy outcomes. A few small but cool things that have happened recently:

A while ago, I discovered an issue with the way the cherrypy web server resolves “localhost” to an IP6 address on some operating systems. The cherrypy folks actually listened to my suggestion and made a helpful change.

It looks like one person has already benefited from the tiny patch I created for a feedparser issue last week.

When I couldn’t get SQLAlchemy’s “pool_recycle” option to properly close and re-open inactive connections, the estimable Michael Bayer took a minute out of his busy life to explain what I had missed, for which I was extremely grateful. (One of these days, I need to write about a post about how truly amazing SQLAlchemy is.)

Open source projects create the opportunities for good things to happen. Most of the time. (Or maybe it’s just the python projects.)

In praise of feedparser

I discovered an issue this morning in the excellent feedparser module.

feedparser (aka Universal Feed Parser) has a reputation in the python community for being an incredible piece of code. With good reason: it understands a mind-boggling array of feed formats and versions, and it’s been put through the paces with a suite of 3262 unit tests. Mark Pilgrim’s terrific work has saved me (and many others, no doubt) months of toil and sweat.

The issue has to do with encountering multiple “title” tags defined in different XML namespaces. A dc:title or media:title tag, if it is encountered anywhere after a regular RSS title tag, will overwrite it. Several other people have actually already documented this in the bug tracking system.

It’s unclear how much work is actively being done with feedparser these days, so I reluctantly dived into the module to try to see what was going on. A little over an hour later, with almost no pain, I found myself with a patch that passed the test suite. I’d love to say it was due to my incredible coding prowess, but really, the code is just amazingly clean and easy to understand. That’s how fixing bugs should be.

The patch is here if anyone wants it.

The Uses of Function Attributes

A Python function is an object of type “function.” One of the things you can do, as a result, is set attributes on the function object, like so:

def say_hello():
	print "hi!"
say_hello.x = 1

(You can also do that on a function that’s an object method.)

This is pretty odd. In some ways, this feature interestingly blurs the distinction among classes, objects, and functions, since all of them can have their own attributes. It strikes me that this illustrates one of the key philosophical differences between Python and a language like Java, where object-oriented principles are more rigidly enforced, In Java, a class is a class, an object is an object, and a method is a method. We shall not even speak of functions!

For a while, I understood this language feature, but not its real utility, which is this: you don’t have to write entire classes for small pieces of functionality.

cherrypy is a good example. It uses the attribute name “exposed” to indicate whether a method should be accessible to the URL-to-object mapping mechanism. Combined with some other clever design, this allows an HTTP request handler to be an ordinary method in a tree structure that corresponds to the web application’s URL scheme. By contrast, in Java, you have to subclass Servlet for each handler, and then manually map those Servlets to URLs. Ugh.

Another use is that decorators can manipulate the attributes of functions they modify. If the decorator’s behavior should change based on state, or it wants to track statistics or debugging information, it can use the underlying function’s attributes for storage. (Some examples of this are strewn throughout the PythonDecoratorLibrary page at the Python Wiki.) Again, this avoids extra class definitions and objects, which would typically be necessary for separately storing that state information.

The last use I can think of right now is that you can easily and quickly create singletons. Because classes are meant for instantiation into objects, it can be inelegant to enforce a singleton pattern. With function attributes, you can define a function in a module, set some defaults, and allow callers to manipulate them, which will remain a single set application-wide.

(Note that you CAN create multiple instances of a function, as this interesting example shows. But the singleton idea is still sound in the context of module functions.)

Decorators, CherryPy Tools, and Other Python Adventures

In my free time I’ve been working on my own interesting side project using CherryPy. This is my first major foray into Python: I’ve admired it for a long time, but haven’t used it much except for the occasional small script. So it’s pretty awesome to be really digging in. And I’m finding the more I learn about Python, the more I love it.

CherryPy, like Python, is extremely easy to start developing with, but it also has a ton of mind-blowing stuff available when you’re ready to do more. One of these more advanced features is what they call “Tools,” which (among other things!) let you write callbacks into various points of the HTTP request-response cycle. The documentation explains tools in detail, but a good practical example is here. I’ll condense it to relevant bits:

def noBodyProcess():
    """Sets cherrypy.request.process_request_body = False, giving
    us direct control of the file upload destination. By default
    cherrypy loads it to memory, we are directing it to disk."""
    cherrypy.request.process_request_body = False

cherrypy.tools.noBodyProcess = cherrypy.Tool('before_request_body', noBodyProcess)

class fileUpload:
    """fileUpload cherrypy application"""

    """ [bunch of code cut out] """    

    @cherrypy.expose
    @cherrypy.tools.noBodyProcess()
    def upload(self, theFile=None):
        """upload action
        """ [more code ... ] """

The example shows how to set cherrypy.request.process_request_body to False, at the “before_request_body” hook; this overrides the default behavior, allowing you to deal directly with the request body contents.

The nice thing is you don’t need to understand a whole lot about the Tools architecture to make them work, although some things puzzled me initially (more below). Since I really wanted to know why and how the above did what it did, I spent some time poking around. Some things I discovered:

1) Decorators (the lines with the @ symbol) are executed when the class definition is executed. It’s a bit of shortcut syntax for modifying method definitions. I was confused about this for a while, thinking that decorators are just simple wrappers, called each time the function is. Nope!

2) The Tool decorator above modifies an attribute called “_cp_config” of the index() callable. (Not only do objects have attributes, but functions do too in Python–in fact, functions are actually objects! Wacky.) This is how CherryPy stores info about the Tools that should apply to specific handlers.

3) When Request.run() executes, it looks at the relevant Tools, and calls into them as appropriate. In this example, the specific Tool created says noBodyProcess() should be executed at the “process_request_body” point in the request cycle. So it does.

4) cherrypy.request is a strange thing. I was wondering why it’s accessed everywhere directly, as opposed to being passed as request instances into the handler (as it is, say, in Java Servlets). Doesn’t that mean every thread is handling the same request?! Nope. Turns out cherrypy.request is able to store per-thread data, even though the name is accessed globally. (See the threading.local class.)

The convenience in CherryPy comes at the cost of some transparency and intuitiveness: not a high cost, mind you, but a cost nevertheless. Don’t get me wrong, I think CherryPy is pretty excellent. Still… it really tripped me up that Request.run() examines the handler’s attributes for Tool callbacks, instead of storing that information separately (there may well be good reasons for doing it the way it’s done). The fact that cherrypy.request is thread-local also prompted a “Huh?!?!” at first.