The Inexact Science of Optimizing MySQL

The past week or so, I’ve been playing database administrator: monitoring load, examining the log for slow queries, tweaking parameters, rewriting queries, adding indexes, and repeating the cycle over and over again. It’s a tedious, time-consuming, and exhausting process.

Tuning is tricky because inefficiencies aren’t always immediately apparent, and there may be uncontrollable factors affecting performance in VPS environments. Here are some things I’ve learned or had to refresh myself about with respect to tuning MySQL.

Don’t assume anything about why queries are slow. Under high load, a lot of queries can pop up in the slow query log just because the database is working hard. It doesn’t necessarily mean every query needs optimization. Try to look for patterns, and expect to spend time reviewing queries that are already optimized.

Look for strange queries. Don’t recognize it? Well, maybe it shouldn’t be there. I found an expensive query for a website feature that was obsolete months ago; it was being made even though the data wasn’t being used or displayed anywhere.

EXPLAIN is your friend. Interpreting the output makes my head pound sometimes, but it’s absolutely necessary. Invest the time and effort to understand it.

Multi-column indexes are a necessary evil. Most tables I was examining had only single-column indexes, causing complex WHERE clauses and/or ORDER BY to do costly “Using temporary; Using filesort” operations. Adding a few multi-column indexes helped a great deal. In general, I dislike multi-column indexes because they require maintenance: if you modify a complex query, you might have to add or change the index, or it will become slow as molasses. But unfortunately, that’s the tradeoff for performance.

The “same” query can behave differently with different datasets. A query can sometimes use an index, and sometimes use a filesort, depending, for example, on constants in the WHERE clause. Be sure to use EXPLAIN with different values.

Give the query optimizer hints. Disk access is slow, especially on the VPS, so I wanted to avoid filesorts at all costs. Using STRAIGHT_JOIN forced MySQL to process an indexed table first; otherwise, it believed that a filesort would be faster in some cases. It might be, sometimes–but if you think you know better, use the hint.

Disk access on a VPS can be wonky. It depends a lot on what else is happening on the hardware node. This relates somewhat to the previous point: all other things being equal, MySQL could be right about a filesort being faster than an index. But it’s usually not the case in an environment where disk I/O is very costly.

Reduce the number of joins wherever possible. It doesn’t seem like a few indexed joins into “lookup” tables (basically key-value types of things) would affect performance that much, but amazingly, it can. Really. Break up queries into smaller, simpler ones if at all possible.

Run ANALYZE TABLE and OPTIMIZE TABLE. The manual says you don’t typically need to run these commands, but I found frequently updated tables benefit from this maintenance, even if they have only a few thousand records.

Work iteratively. Somtimes performance problems are localized to one or two spots, but more often, inefficiencies are spread out over a lot of queries; they may become noticeable only after several passes, as server conditions fluctuate. Discovering these problem areas takes a lot of patience.

Leave a Reply

Your email address will not be published. Required fields are marked *