Archive for the 'software' Category

Lessons Learned

Thursday, April 17th, 2008

After about 5 months, I’ve decided that it’s time to move on to another gig. I’ve learned a few things, and I’m posting them here in the hopes that the lessons might be helpful to other programmers and techies-at-large.

Working in a small business as the sole do-it-all technology person has its unique challenges. It can be very fulfilling to be the sole expert and “enabler,” if that turns you on. But the flip side is that management might not really understand or care that much about their technology. Is there a reasonable budget for what they’re trying to accomplish? Do they understand, at a high level, your projects and how they contribute to the mission? Are technology projects considered a burdensome mystery or something valuable and embraced by the company? Question the reasons why there’s only one tech guy/girl and whether that seems right.

Another thing to assess is whether you can deal with taking over the existing codebase. I’ve taken over other code before with success, retaining what was good and doing clean up as necessary. At this past gig, things looked reasonably tidy at a first glance, but as time progressed, I realized a ton of abstractions weren’t in place, and those that did exist didn’t make sense. Some refactoring might have been interesting to do, but this endeavor wasn’t valued when I proposed it as a project.

Lastly, I think it’s important to be wary of promises about the future. Even with the best of intentions, things change quickly at small businesses. The projects I was initially excited about got perpetually deferred for various reasons, and I found myself preoccupied with doing maintenance code fixes, making cosmetic tweaks, performing server administration, and providing support for third-party software (which I really don’t like to do). The company needed these things done, so I did them with as much cheer as I could muster, hoping we’d eventually get to a place where some solid new development could occur (and I could sneak in some refactoring). That’s what floats my boat. But it became clear to me that wasn’t going to happen anytime soon.

So that’s that. It’s a shame it didn’t work out, especially since I actually liked everyone I worked with. At least it’s an amicable departure, and I hope to be involved in hiring a replacement who might be a better fit for their current needs than I am.

The new gig? Java. Been catching up on it, since it’s been a few years. Oh, it feels so nice to have package namespaces, real data types, full-featured APIs, and real object-orientedness again. Like coming home.

“It Works”

Sunday, March 23rd, 2008

This blog post, “The Worst Thing You Can Say About Software Is That It Works,” written by one Kenny Tilton, is pretty hilarious. This is the most beautiful thing I’ve read in a while:

if a pile of code does not work it is not software, we’ll talk about its merit when it works, OK? Therefore to say software works is to say nothing. Therefore anything substantive one can say about software is better than to say it works.

Reading this triggered flashbacks and PTSD. I’d mentioned to a manager recently that I wanted some time to do some badly needed refactoring. My explanation of why was met with a pause, then, “Let me get this straight. You want time to take something that already works, reorganize it, possibly break things, and we wouldn’t have anything new to even show for it?”

That last part was wrong–the value added comes from maintainability and extensibility, but I couldn’t get him to really grasp those ideas. He’s not a technology person. For all he knew, maybe this was an elaborate ruse on my part to be left undisturbed while I surfed porn at my desk for a few weeks.

I work in a very small shop with all non-technology people, so this sort of thing happens a lot. It’s frustrating. It’s sort of nice to know I’m not alone in encountering this mindset. But man… if even the fellow programmer in Kenny’s story doesn’t get it, I’m not sure there’s much hope for the rest of the world.

EAcceleratorCacheFunction = Cache_Lite_Function + EAccelerator

Tuesday, March 4th, 2008

It’s pretty much all in the title. In a nutshell, EAcceleratorCacheFunction is a “memoizing” cache class for PHP that uses shared memory for storage. It is mostly compatible with Cache_Lite and Cache_Lite_Function.

Just like Cache_Lite_Function, it supports per-cache-object lifetime values, instead of specifying the lifetime of an item at the time you store it. This lets you dynamically change the lifetime of the cache. For example, if system load goes up and you don’t mind serving slightly older content instead of regenerating it:

$load = sys_getloadavg();
// use 5 min avg (ignore momentary spikes)
if($load[1] >= 6) {
    $lifetime = 900; # 15 min
} elseif($load[1] >= 3) {
    $lifetime = 600; # 10 min
} else {
    $lifetime = 300; # 5 min
}
$cache = new EAcceleratorCacheFunction(array('lifeTime' => $lifetime));
$cache->call('make_page');

I wrote EAcceleratorCacheFunction as a drop-in replacement for Cache_Lite_Function. On a virtual private server, doing cache reads/writes from memory instead of disk has made a noticeable difference in performance; it helps tremendously that the database has to contend with less disk I/O.

Two Styles of Caching (PHP’s Cache_Lite vs memcached)

Thursday, February 7th, 2008

Since the recent slashdotting of our website (we held up okay, but there’s always room for improvement), I’ve been investigating the possibility of moving from Cache_Lite (actually, Cache_Lite_Function) to memcached in our PHP code.

Much discussion comparing these solutions focuses on raw performance in benchmarks. In the real world, though, not all things outside the benchmark are equal. On a VPS, disk I/O times are notorious for being highly variable. This makes memcached all the more attractive. Yes, memory is faster than disk in almost every environment, but also, avoiding disk access conserves a precious resource so fewer processes must block for it.

A public mailing list post by one Brian Moon points this out exactly:

If you rolled your own caching system on the local filesystem, benchmarks would show that it is faster. However, what you do not see in benchmarks is what happens to your FS under load. Your kernel has to use a lot of resources to do all that file IO. […]

So, enter memcached. It scales much better than a file based cache. Sure, its slower. I have even seen some tests where its slower than the database. But, tests are not the real world. In the real world, memcached does a great job.

Okay, great. memcached is better when you take into account overall resources. But there’s a very useful Cache_Lite_Function feature that memcached doesn’t seem to have.

When you initialize a Cache_Lite_Function object, you set a “lifeTime” parameter, then use the call() method to wrap your regular function calls. If the output of the function hasn’t been cached within that time period, the call gets made and its results replace the cached entry, with a new timestamp.

The cool thing about it is that you can create different cache objects pointing to the same directory store without a problem. Pages can increase and decrease the lifetime of the cache dynamically as load changes, so you can serve slightly older data from cache if necessary, keeping the site responsive while saving database queries. On a site where content changes relatively infrequently, this is a great feature to have: serve it fresh when load is low, serve from cache when load is high.

memcached, on the other hand, requires that you specify an expiration time at the time you place data in the cache. A retrieval call doesn’t let you specify a time period, so you can’t do the above. If data has expired, it’s expired.

It’d be interesting to hack Cache_Lite_Function to use memcached as its store, so you could get the best of both worlds. It would involve storing things in memcached with no expiration, tacking on a timestamp in the data, and doing the checking manually. But it might work.
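Here is a minimal sketch of that idea, not a working memcached integration. ArrayStore is a hypothetical in-memory stand-in for a memcached client whose entries never expire, and TimestampedFunctionCache is a hypothetical wrapper, not part of Cache_Lite. Its call() takes the arguments as an array and an optional $now, which is an assumption made to keep the example testable and differs from the real Cache_Lite_Function signature.

```php
<?php
// Hypothetical stand-in for a memcached client: entries are stored
// with no expiration of their own.
class ArrayStore {
    private $items = array();

    public function get($key) {
        return array_key_exists($key, $this->items) ? $this->items[$key] : null;
    }

    public function set($key, $value) {
        $this->items[$key] = $value;
    }
}

// Hypothetical wrapper: the lifetime belongs to the cache object,
// so freshness is checked manually at read time.
class TimestampedFunctionCache {
    private $store;
    private $lifeTime;

    public function __construct($store, $lifeTime) {
        $this->store = $store;
        $this->lifeTime = $lifeTime;
    }

    // $function must be a function name (closures cannot be serialized).
    // $now can be passed in for testing; it defaults to the current time.
    public function call($function, array $args = array(), $now = null) {
        $now = ($now === null) ? time() : $now;
        $key = md5(serialize(array($function, $args)));

        $entry = $this->store->get($key);
        if ($entry !== null && ($now - $entry['stored_at']) < $this->lifeTime) {
            return $entry['value']; // still fresh for this object's lifetime
        }

        // Missing or too old: regenerate and store with a new timestamp.
        $value = call_user_func_array($function, $args);
        $this->store->set($key, array('stored_at' => $now, 'value' => $value));
        return $value;
    }
}
```

Two cache objects with different lifetimes can share one store: the object with the longer lifetime keeps serving an entry that the object with the shorter lifetime would already regenerate.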

There’s no such thing as a content management system

Monday, February 4th, 2008

During a meeting at work today, someone remarked, “No one I know seems happy with their content management system.”

Somehow, that’s unsurprising. The problem, I think, is that there’s really no such thing as a content management system. Think about how absurd that term is. It’s a system (it’s organized and has structure) that manages (performs operations) on content (er, stuff). Well then… what piece of software isn’t a CMS?!

When people talk about a CMS, they really mean publishing software. The website I maintain was written specifically for managing news articles. It does its job reasonably well, despite needing some cleanup and refactoring. What’s devious about the term “CMS” is that people start to expect all sorts of things from it. After all, it manages content, right? So why can’t it easily integrate with other sites, offer social networking features, do fancy AJAX tricks, and make dinner, with CPU cycles to spare?

The fact is, no software can do it all. There’s sometimes the wishful thinking that if we were using a pre-packaged CMS instead of a custom solution, we’d be better off. That’s just not true. A pre-packaged CMS can be a good option for simple needs, but customization is often a huge headache. The end result is that you’d have been better off writing something custom tailored to begin with. The most flexible (and therefore “best”) pre-packaged CMSes are often not ready-to-run software, but actually well-designed frameworks (like Zope) that require coding for the specific content you want to handle.

So why is no one happy with what they have? I suspect it’s because they didn’t give enough thought to what they wanted, or their expectations were too high, or both.

There’s nothing magical about a CMS. It follows the same rules as any other kind of software: the requirements for what it does should be clear, and the proper code abstractions should be in place. It’s like any other project: it should support a set of features, but also be able to change and grow easily. And you can only achieve those goals with proper planning and good code design. Not confusing lingo like “content management system.”

The Lifespan of Software

Friday, January 25th, 2008

Rumors of Chandler’s Death Are Greatly Exaggerated. So says the renowned Phillip J. Eby.

In light of all the damning media scrutiny paid to Chandler in recent years, Phillip makes an excellent point: the project funded work on a bunch of important open source Python libraries. I didn’t realize this, and it drastically changed my regard for the OSAF’s work. If this aspect of the project got mentioned more, I think Chandler would get a lot more respect. Even if Chandler 1.0 never sees the light of day, it’s already made major contributions to the Python community.

Proprietary software has a definite lifespan: once a company has stopped developing and supporting it, that’s the end. For the company, value is localized and non-transferable in the closed source code base. The business model of selling software depends on this. Once the company kills off the product, the value more or less disappears. You can still use it, of course, but it will decrease in value as similar, hopefully better products appear on the market.

The value of open source software, on the other hand, isn’t limited to its immediate use. Even if an application is no longer actively used and maintained, the code can spark ideas, be used to fork a new project, serve as a lesson in design, etc. Its value can be perpetually renewed by virtue of the fact that it circulates in different ways. If it’s large enough, like Chandler or Zope, it can spawn mini-projects, components, and libraries for reuse.

Years ago, I wrote a Java version of a Napster server. Just for fun. It was called jnerve, and I released the code as open source. I tried to get people to host it and use it, but OpenNap, the C implementation, was naturally faster, more efficient, and more mature. jnerve seemed like a dead end, so I stopped working on it. There were some cool architectural bits to it that were interesting to write, but I regarded the project as a failure.

Months later at a conference, I got a demo CD of some new peer-to-peer file sharing software. (“P2P” was all the rage then.) When I ran it, I was astounded to see a copyright message with my name on it. They had used my code as the basis for their commercial product! The code was able to live on in a different form. I’m not sure it was actually legal, given that jnerve was GPL, but I didn’t care enough to pursue the matter.

Caching is a Workaround, not a Solution

Friday, January 18th, 2008

Like every website that deals with traffic spikes, the one I’m working on these days does a lot of caching. This past week I’ve spent a lot of time reviewing the caching code as well as tuning the database, to get the site working efficiently on a newly upgraded virtual private server.

The following occurred to me: as wonderful and necessary as caching is, it’s fundamentally a workaround. The core problem is having insufficient resources. Given enough CPU and memory, you wouldn’t ever need to cache. It’s when those resources are insufficient for a particular traffic load that caching becomes immensely helpful. That’s why it’s a workaround: it practically addresses the problem, but it doesn’t really solve it. And it’s not a perfect solution: simple caching mechanisms usually introduce a lag time in the currency of content.

Why does this matter? Because caching shouldn’t substitute for efficient code. That is, uncached operations should still make the best possible use of resources. Otherwise, caching turns into a panacea, luring you into a false sense of security about how well the guts of the application really perform. Ideally, caching should always be added as an afterthought on top of already well-abstracted code.
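As a rough illustration of caching layered on top of code that already stands on its own, here is a small sketch. Both build_headline_list() and with_cache() are hypothetical names invented for this example, and the cache is a simple per-request memoizer with no expiration, which is an assumption made to keep it short.

```php
<?php
// Hypothetical uncached operation: it does its job efficiently
// and knows nothing about caching.
function build_headline_list(array $articles, $limit) {
    $titles = array();
    foreach (array_slice($articles, 0, $limit) as $article) {
        $titles[] = $article['title'];
    }
    return $titles;
}

// Hypothetical wrapper that adds caching afterwards, without touching
// the wrapped function. Results are kept for the life of the closure.
function with_cache($fn) {
    $cache = array();
    return function () use ($fn, &$cache) {
        $args = func_get_args();
        $key = md5(serialize($args));
        if (!array_key_exists($key, $cache)) {
            $cache[$key] = call_user_func_array($fn, $args);
        }
        return $cache[$key];
    };
}
```

Calling with_cache('build_headline_list') returns a cached version of the function, while the original stays usable, and measurable, on its own.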

Maintainability Pitfalls in PHP

Tuesday, January 8th, 2008

Tim Bray makes this prediction about PHP for 2008:

PHP will remain popular but its growth will slow, as people get nervous about its maintainability and security stories.

I share Tim’s love/hate relationship with PHP. It’s definitely a powerful and easy language. But,

… speaking as an actual computer programmer, I really dislike PHP. I find it ugly and un-modular and there’s something about it that encourages people to write horrible code. We’re talking serious maintainability pain.

I’m seeing this right now in some code I’ve recently taken over. The previous programmer was quite skilled and did a great job, but it’s clear there are some areas he had to write quickly and hack together. The flip side of PHP’s ease of use is that sloppiness accumulates very quickly when you’re doing things in a hurry. To some extent, that’s an unavoidable aspect of a growing codebase. But there are also specific things about PHP itself that foster disorganization and unmaintainability:

* The lack of namespaces. This makes it hard to quickly locate a function or class definition. Classes can be used as namespaces, but that’s a hack, and leads to ugly un-OOPish uses of classes. PHP could really benefit from packages or modules.

* While PHP5 has vastly improved its object functionality, it often feels like the developer culture remains mired in a function-oriented paradigm. PHP’s relative ease of use and wide availability on commodity webhosting has produced a huge pool of developers whose skills are pretty wide-ranging. The low end of that tends towards hacky, function-oriented code that simply “gets the job done.” I’d like to see more thoughtful discussion on PHP sites and forums about object design and philosophy, about when to use functions and classes, and about how to mix them up harmoniously.

* Having a library of thousands of built-in functions in a global namespace with little rhyme or reason to their naming doesn’t exactly provide a great model of maintainability.

* extract() should die. Die, die, die.

* There’s not much agreement about OOP performance: some insist that heavy usage of some OOP features slows PHP down a lot, so you should avoid them whenever possible. That is not only plain dumb, it also leads to deliberately confusing and half-assed uses of OOP in the name of better performance.
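The extract() complaint in the list above doesn’t come with a reason. One commonly cited hazard, offered here as an assumption about what makes it painful, is that extract() overwrites existing local variables by default (EXTR_OVERWRITE), so it’s hard to tell where a variable’s value came from. The function names below are hypothetical.

```php
<?php
// With extract(), any key in $input can silently replace a local variable.
function greeting_with_extract(array $input) {
    $is_admin = false;
    $name = 'anonymous';
    extract($input); // default EXTR_OVERWRITE: may clobber $is_admin and $name
    return ($is_admin ? 'admin' : 'guest') . ':' . $name;
}

// Reading only the keys you expect keeps every variable's origin visible.
function greeting_explicit(array $input) {
    $is_admin = false;
    $name = isset($input['name']) ? $input['name'] : 'anonymous';
    return ($is_admin ? 'admin' : 'guest') . ':' . $name;
}
```

Given an input array that carries an unexpected 'is_admin' key, the first function changes its behavior and the second does not.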

Maintainability is a matter of discipline, since you can write sloppy code in any language. That aside, PHP does make it extra hard to keep things orderly. I think CakePHP is a step in the right direction, though if you’re going to use a strict MVC architecture, you might as well dump PHP and just go with Ruby on Rails or Python.

Amateur thoughts and ambitions

Monday, December 31st, 2007

One of the better things I’ve stumbled across this past year is Larry Lessig’s talk, How creativity is being strangled by the law.

The piece makes his usual argument that copyright law stifles innovation in the age of new media. Most striking to me, though, was the part where he uses the phrase “amateur culture.” He explains, “…I don’t mean amateurish culture, I mean culture where people produce for the love of what they’re doing and not for the money.” He uses the term to describe the activity of “kids” (?) creating their own remixes from existing media.

I can remember another amateur culture that’s now largely disappeared. Back in my teens, modem-based bulletin board systems (BBSes) fostered a rich “read-write” culture for amateur programmers. Most of us did not work in technology; after all, the commercial Internet hadn’t been born yet, so the computing industry was much smaller and more obscure. A career as a programmer seemed like a mysterious and rarefied thing to me back then. The coders you met on BBSes were often people who simply liked to do programming in their spare time.

These systems allowed us to circulate public domain source code for fun games and useful applications written in BASIC, Pascal, C, even assembler. We hacked on existing code to get it to do what we wanted, trying to figure out ways to push the limits of our little 8086 processors and 640K of RAM. We mingled regardless of our level of knowledge, beginners and experts alike. We had friendly user meetings in diners in Brooklyn and Manhattan (I lived in NY at the time), where we chatted about home-grown upgrades and discussed how to link up to the nation-wide discussion networks that existed then.

It was amateur culture at its best: lots of exchange, circulation, and cooperation happened all the time. But it was definitely not amateurish. Many were extremely capable and knowledgeable coders.

Today, there are still people who code just because they enjoy it, but the amateur culture and its community hardly exist anymore. Beginners on web forums are more interested in what they need to know in order to land a job than in coding itself. Even open source projects tend to be dominated by career professionals; read any public mailing list and you’ll see how unhelpful they often are to amateurs who want to get involved. One reason I like Python is that the project makes a genuine effort to connect to the sensibilities of amateurs. But even its forums are littered with snarky individuals.

All of this is largely due, I think, to the ideology of professionalism, which convinces us that having a stable career is the pinnacle of achievement. It damagingly equates amateurs with dilettantes. That’s why one of the first things we ask in this country when meeting a stranger is, “So what do you do?” By which we really mean, “Tell me what you do for a living so I can know who you are and whether you’re worth talking to.”

In 2008, I resolve to be more wary of this ideology and its negative effects. I want to embrace being an amateur in the various things that I do. I want to think less about careers and focus more on how to best spend my time doing what’s important to me. And I want to find more amateurs to hang out with as well.

Software is an Art

Saturday, December 22nd, 2007

Today a blogger named Damon Poole wrote a short post titled, “Designing Software is the same as Predicting the Future.” It resonates with my post from a while back on whether “software engineering” is the right metaphor for writing code.

The essential problem of coding is to deal with the unknown as best you can. Software is made to solve a problem, but the more unique the problem, the more difficult it is to draw upon existing knowledge to create good solutions. Unknowns force you to make guesses. Educated guesses, hopefully, but guesses nonetheless.

This is why I’m in the camp of those who believe that creating software is an art. It’s an endeavor that wrestles with the unknown. This artistry is highest when you find yourself asking, “How do I do X?” and there don’t seem to be any pre-packaged answers you can look up in a textbook or simply google.

Paradoxically, once the software is written and refined, the unknowns are removed from the picture. Art largely disappears once the pure functionalism of operational software emerges. I think this is why many good programmers have short attention spans, get bored, and tend to jump from project to project. They crave the excitement and gratification of facing the unknown. But this is always ultimately ephemeral.