Archive for the 'admin' Category

The Inexact Science of Optimizing MySQL

Monday, January 21st, 2008

The past week or so, I’ve been playing database administrator: monitoring load, examining the log for slow queries, tweaking parameters, rewriting queries, adding indexes, and repeating the cycle over and over again. It’s a tedious, time-consuming, and exhausting process.

Tuning is tricky because inefficiencies aren’t always immediately apparent, and there may be uncontrollable factors affecting performance in VPS environments. Here are some things I’ve learned or had to refresh myself about with respect to tuning MySQL.

Don’t assume anything about why queries are slow. Under high load, a lot of queries can pop up in the slow query log just because the database is working hard. It doesn’t necessarily mean every query needs optimization. Try to look for patterns, and expect to spend time reviewing queries that are already optimized.

Look for strange queries. Don’t recognize it? Well, maybe it shouldn’t be there. I found an expensive query for a website feature that was obsolete months ago; it was being made even though the data wasn’t being used or displayed anywhere.

EXPLAIN is your friend. Interpreting the output makes my head pound sometimes, but it’s absolutely necessary. Invest the time and effort to understand it.

Multi-column indexes are a necessary evil. Most tables I was examining had only single-column indexes, causing complex WHERE clauses and/or ORDER BY to do costly “Using temporary; Using filesort” operations. Adding a few multi-column indexes helped a great deal. In general, I dislike multi-column indexes because they require maintenance: if you modify a complex query, you might have to add or change the index, or it will become slow as molasses. But unfortunately, that’s the tradeoff for performance.

The “same” query can behave differently with different datasets. A query can sometimes use an index, and sometimes use a filesort, depending, for example, on constants in the WHERE clause. Be sure to use EXPLAIN with different values.

Give the query optimizer hints. Disk access is slow, especially on the VPS, so I wanted to avoid filesorts at all costs. Using STRAIGHT_JOIN forced MySQL to process an indexed table first; otherwise, it believed that a filesort would be faster in some cases. It might be, sometimes–but if you think you know better, use the hint.

Disk access on a VPS can be wonky. It depends a lot on what else is happening on the hardware node. This relates somewhat to the previous point: all other things being equal, MySQL could be right about a filesort being faster than an index. But it’s usually not the case in an environment where disk I/O is very costly.

Reduce the number of joins wherever possible. It doesn’t seem like a few indexed joins into “lookup” tables (basically key-value types of things) would affect performance that much, but amazingly, it can. Really. Break up queries into smaller, simpler ones if at all possible.

Run ANALYZE TABLE and OPTIMIZE TABLE. The manual says you don’t typically need to run these commands, but I found frequently updated tables benefit from this maintenance, even if they have only a few thousand records.

Work iteratively. Somtimes performance problems are localized to one or two spots, but more often, inefficiencies are spread out over a lot of queries; they may become noticeable only after several passes, as server conditions fluctuate. Discovering these problem areas takes a lot of patience.

Up and Running

Sunday, August 19th, 2007

This domain is being served from a slicehost.com VPS account, which I’m sharing with my friend Danny. We spent a few hours installing a typical LAMP environment from Debian packages. It all went smoothly; we’re pretty much set up. Eventually, we’ll get around to putting actual content up on our respective domains, but we expect traffic and load to remain very light. So we spent most of our time dealing with the main resource limit on our “box”: memory.

There’s lots of information available on various optimizing strategies for Apache and MySQL, but it basically boiled down to 2 things for us. In Apache, we lowered the number of start/max/min servers, and loaded only modules we needed. In MySQL, we decreased the sizes of various buffers and queues, and also disabled InnoDB support. The latter alone changed the initial size of MySQL from 20 megs of resident memory to 9 megs! I’m sure we’ll continue to tweak later on, but memory usage seems fairly reasonable right now for what we’re running.

It’s funny to reflect on how far open source software has come in general. I remember talking nine years ago with some corporate guy in New York about a piece of open source he promptly dismissed as “not ready for the enterprise” (that phrase always makes me laugh. Whose enterprise, exactly? Mine?). Back then, it was common to hear disparaging stuff like that all the time.

And now–is there anything that’s NOT enterprise ready?! Maybe the suits are happy, but I’m not sure everyone else has benefitted. As the software has grown in scalability, performance, and features, the requirements have also increased drastically. And that means the bar is higher for even a fairly standard LAMP environment, for example. Default installs of Apache, MySQL and PHP can be real memory hogs.

I’ve been discovering a few interesting alternative projects with lighter feature sets. lighttpd and thttpd are both high-performance, small-foot web servers. And a lot of folks seem to be looking to SQLite if they don’t need the fancier features of a complex RDBMS. Maybe I will experiment with these at some point.