EAcceleratorCacheFunction = Cache_Lite_Function + EAccelerator

It’s pretty much all in the title. In a nutshell, EAcceleratorCacheFunction is a “memoizing” cache class for PHP that uses shared memory for storage. It is mostly compatible with Cache_Lite and Cache_Lite_Function.

Just like Cache_Lite_Function, it supports per-cache-object lifetime values, instead of specifying the lifetime of an item at the time you store it. This lets you dynamically change the lifetime of the cache. For example, if system load goes up and you don’t mind serving slightly older content instead of regenerating it:

$load = sys_getloadavg();
// use 5 min avg (ignore momentary spikes)
if($load[1] >= 6) {
    $lifetime = 900; # 15 min
} elseif($load[1] >= 3) {
    $lifetime = 600; # 10 min
} else {
    $lifetime = 300; # 5 min
}
$cache = new EAcceleratorCacheFunction(array('lifeTime' => $lifetime));
$cache->call('make_page');

I wrote EAcceleratorCacheFunction as a drop-in replacement for Cache_Lite_Function. On a virtual private server, doing cache reads/writes from memory instead of disk has made a noticeable difference in performance; it helps tremendously that the database has to contend with less disk I/O.

Two Styles of Caching (PHP’s Cache_Lite vs memcached)

Since the recent slashdotting of our website (we held up okay, but there’s always room for improvement), I’ve been investigating the possibility of moving from Cache_Lite (actually, Cache_Lite_Function) to memcached in our PHP code.

Much discussion comparing these solutions focuses on raw performance in benchmarks. In the real world, though, not all things outside the benchmark are equal. On a VPS, disk I/O times are notorious for being highly variable. This makes memcached all the more attractive. Yes, memory is faster than disk in almost every environment. But avoiding disk access also conserves a precious resource, so fewer processes have to block waiting for it.

A public mailing list post by one Brian Moon points this out exactly:

If you rolled your own caching system on the local filesystem, benchmarks would show that it is faster. However, what you do not see in benchmarks is what happens to your FS under load. Your kernel has to use a lot of resources to do all that file IO. […]

So, enter memcached. It scales much better than a file based cache. Sure, its slower. I have even seen some tests where its slower than the database. But, tests are not the real world. In the real world, memcached does a great job.

Okay, great. memcached is better when you take into account overall resources. But there’s a very useful Cache_Lite_Function feature that memcached doesn’t seem to have.

When you initialize a Cache_Lite_Function object, you set a “lifeTime” parameter, then use the call() method to wrap your regular function calls. If the output of the function hasn’t been cached within that time period, the call gets made and its result replaces the cached copy, with a new timestamp.

The cool thing about it is that you can create different cache objects pointing to the same directory store without a problem. Pages can increase and decrease the lifetime of the cache dynamically as load changes, so you can serve slightly older data from cache if necessary, keeping the site responsive while saving database queries. On a site where content changes relatively infrequently, this is a great feature to have: serve it fresh when load is low, serve from cache when load is high.
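To make the read-time lifetime idea concrete, here is a minimal sketch. It is not Cache_Lite_Function’s actual code: the class TinyFileFunctionCache, the helper demo_make_page() and the counter class are all made up for illustration. It assumes freshness is judged from the cache file’s modification time, and it only handles functions passed by name.

```php
<?php
// Hypothetical sketch, NOT Cache_Lite's real implementation: a file-backed
// memoizer where freshness is decided at read time from the file's mtime.
class TinyFileFunctionCache
{
    private $dir;
    private $lifeTime;

    public function __construct($dir, $lifeTime)
    {
        $this->dir = $dir;
        $this->lifeTime = $lifeTime; // seconds, chosen per cache object
    }

    // Call $fn (a function name) with $args, reusing any cached result
    // that is younger than this object's lifeTime.
    public function call($fn, array $args = array())
    {
        $file = $this->dir . '/cache_' . md5(serialize(array($fn, $args)));
        clearstatcache(true, $file); // avoid a stale mtime from PHP's stat cache
        if (is_file($file) && (time() - filemtime($file)) < $this->lifeTime) {
            return unserialize(file_get_contents($file));
        }
        $result = call_user_func_array($fn, $args);
        file_put_contents($file, serialize($result), LOCK_EX);
        return $result;
    }
}

// Made-up "expensive" function that counts how often it really runs.
class DemoCounter
{
    public static $calls = 0;
}

function demo_make_page()
{
    DemoCounter::$calls++;
    return 'page #' . DemoCounter::$calls;
}

// Two cache objects share one directory but disagree about freshness.
$dir = sys_get_temp_dir() . '/tiny_cache_' . uniqid();
mkdir($dir);
$relaxed = new TinyFileFunctionCache($dir, 900); // accepts 15-minute-old data
$strict  = new TinyFileFunctionCache($dir, 0);   // treats everything as stale

$first  = $relaxed->call('demo_make_page'); // miss: runs the function
$second = $relaxed->call('demo_make_page'); // hit: served from the file
$third  = $strict->call('demo_make_page');  // same file, judged stale: reruns
```

Nothing about the lifetime is written into the store, so any page can pick its own lifeTime on each request without invalidating what other pages have cached.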

memcached, on the other hand, requires that you specify an expiration time at the time you place data in the cache. A retrieval call doesn’t let you specify a time period, so you can’t do the above. If data has expired, it’s expired.

It’d be interesting to hack Cache_Lite_Function to use memcached as its store, so you could get the best of both worlds. It would involve storing things in memcached with no expiration, tacking on a timestamp in the data, and doing the checking manually. But it might work.
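Here is a rough sketch of what that hack might look like. Everything in it is an assumption for illustration: FakeMemcached is an in-memory stand-in that only mimics the shape of the PHP Memcached client’s get() and set() methods (so the sketch runs without a server), and ReadTimeExpiryCache, demo_render() and the injectable clock are made-up names. One caveat a real version would have to live with: memcached can still evict “no expiration” items when it runs short of memory.

```php
<?php
// Stand-in for a memcached client so the sketch runs without a server.
// It mimics only the shape of Memcached::get() and Memcached::set().
class FakeMemcached
{
    private $data = array();

    public function get($key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }

    public function set($key, $value, $expiration = 0)
    {
        $this->data[$key] = $value; // expiration ignored: 0 means "never"
        return true;
    }
}

// Hypothetical wrapper: stores a timestamp next to the value and checks
// it against this object's lifeTime when reading.
class ReadTimeExpiryCache
{
    private $store;
    private $lifeTime;
    private $clock;

    public function __construct($store, $lifeTime, $clock = 'time')
    {
        $this->store = $store;
        $this->lifeTime = $lifeTime;
        $this->clock = $clock; // callable returning "now" in seconds
    }

    public function call($fn, array $args = array())
    {
        $key = 'fn_' . md5(serialize(array($fn, $args)));
        $now = call_user_func($this->clock);
        $entry = $this->store->get($key);
        if (is_array($entry) && ($now - $entry['stored_at']) < $this->lifeTime) {
            return $entry['value'];
        }
        $value = call_user_func_array($fn, $args);
        // Store with no expiration; freshness is judged on read instead.
        $this->store->set($key, array('stored_at' => $now, 'value' => $value), 0);
        return $value;
    }
}

// Made-up page renderer that counts real invocations.
class RenderCounter
{
    public static $calls = 0;
}

function demo_render($page)
{
    RenderCounter::$calls++;
    return $page . ' v' . RenderCounter::$calls;
}

$now = 1000; // fake clock so the example is deterministic
$clock = function () use (&$now) { return $now; };

$store    = new FakeMemcached();
$lowLoad  = new ReadTimeExpiryCache($store, 300, $clock); // 5 min
$highLoad = new ReadTimeExpiryCache($store, 900, $clock); // 15 min

$a = $lowLoad->call('demo_render', array('home'));  // miss: rendered
$now += 600;                                        // ten minutes pass
$b = $highLoad->call('demo_render', array('home')); // fresh enough at 900s
$c = $lowLoad->call('demo_render', array('home'));  // stale at 300s: rerendered
```

The trade-off is that every read pays for unpacking the timestamp, and stale entries stay in memory until they are overwritten or evicted.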

Maintainability Pitfalls in PHP

Tim Bray makes this prediction about PHP for 2008:

PHP will remain popular but its growth will slow, as people get nervous about its maintainability and security stories.

I share Tim’s love/hate relationship with PHP. It’s definitely a powerful and easy language. But,

… speaking as an actual computer programmer, I really dislike PHP. I find it ugly and un-modular and there’s something about it that encourages people to write horrible code. We’re talking serious maintainability pain.

I’m seeing this right now in some code I’ve recently taken over. The previous programmer was quite skilled and did a great job, but it’s clear there are some areas he had to write quickly and hack together. The flip side of PHP’s ease of use is that sloppiness accumulates very quickly when you’re doing things in a hurry. To some extent, that’s an unavoidable aspect of a growing codebase. But there are also specific things about PHP itself that foster disorganization and unmaintainability:

* The lack of namespaces. This makes it hard to quickly locate a function or class definition. Classes can be used as namespaces, but that’s a hack, and leads to ugly un-OOPish uses of classes. PHP could really benefit from packages or modules.

* While PHP5 has vastly improved its object functionality, it often feels like the developer culture remains mired in a function-oriented paradigm. PHP’s relative ease of use and wide availability on commodity webhosting has produced a huge pool of developers whose skills are pretty wide-ranging. The low end of that tends towards hacky, function-oriented code that simply “gets the job done.” I’d like to see more thoughtful discussion on PHP sites and forums about object design and philosophy, about when to use functions and classes, and about how to mix them up harmoniously.

* Having a library of thousands of built-in functions in a global namespace with little rhyme or reason to their naming doesn’t exactly provide a great model of maintainability.

* extract() should die. Die, die, die.

* There’s not much agreement about OOP performance: some insist that heavy usage of some OOP features slows PHP down a lot, so you should avoid them whenever possible. That’s not only plain dumb, it also leads to deliberately confusing and half-assed uses of OOP in the name of better performance.
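To put the extract() complaint from the list above in concrete terms, here is a small made-up example (the function names and the input array are hypothetical). With its default flag, EXTR_OVERWRITE, extract() silently replaces any local variable whose name matches a key in the array, so you can’t tell what a function’s variables hold just by reading it.

```php
<?php
// $input stands in for request data such as $_GET or $_POST.
function greet_with_extract(array $input)
{
    $greeting = 'Hello';
    extract($input); // default EXTR_OVERWRITE can replace $greeting
    return $greeting . ', ' . $name;
}

// Same job, but every variable's origin is visible in the code.
function greet_explicit(array $input)
{
    $greeting = 'Hello';
    $name = isset($input['name']) ? $input['name'] : 'guest';
    return $greeting . ', ' . $name;
}

$input = array('name' => 'Ann', 'greeting' => 'Get lost');

$clobbered = greet_with_extract($input); // "Get lost, Ann"
$safe      = greet_explicit($input);     // "Hello, Ann"
```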

Maintainability is a matter of discipline, since you can write sloppy code in any language. That aside, PHP does make it extra hard to keep things orderly. I think CakePHP is a step in the right direction, though if you’re going to use a strict MVC architecture, you might as well dump PHP and just go with Ruby on Rails or Python.