Quick and Dirty MySQL Performance Troubleshooting

Sooner or later, you’re going to get that phone call (or email, SMS, instant message, tweet, or whatever): The database is slow! Fix ASAP! And you’ll be expected to very quickly figure out what’s slowing things down–often with very little context, background, or sense of what may have changed recently at the application layer.

It’s just a server. How hard could it be?

Well, as we all know “that depends.” It depends on a lot of things that you likely have little time to investigate in the middle of a crisis. So where should you focus your time and attention?

This article assumes that you have little to no good monitoring on the server and virtually no idea of what “typical” usage looks like either. Sadly, that’s a real-world situation for a lot of MySQL administrators–especially those in smaller organizations who are expected to wear half a dozen different hats on any given day.

In other words, let’s go back to basics and talk about high-level performance troubleshooting–not fancy high end hardware or new releases of software that you haven’t had a chance to try yet.

Hardware Bottlenecks

The first thing I tend to do is check for hardware bottlenecks. That means logging into the box and running a small list of commands to get a quick idea of what’s happening. Mainly I’m looking for stress points. Which resources are most constrained right now? CPU? Memory? Disk I/O? Something else?

There are three main utilities I’ll turn to in a situation like this:

  • top
  • vmstat
  • iostat

First I’m going to use top to see if anything is hogging CPU on the machine. If there are non-mysql processes using a substantial percentage of the CPU cores, I’m going to want to have a look at what that is and see about limiting its use or moving it to a dedicated server. If I see mysqld using up a lot of CPU, I know it’s working hard and will have to drill into what’s happening inside of MySQL (maybe some poorly written queries). If nothing is apparently chewing up the CPU time, I know that the problem is likely elsewhere.
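
If I want a snapshot of that I can save off and compare against later, top’s batch mode works for a quick, non-interactive capture (these are standard procps flags; adjust the head count to taste):

$ top -b -n 1 | head -n 20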

Next I’ll run vmstat over a fairly short interval–typically 5 or 10 seconds.

$ vmstat 5

I’ll generally run this for at least two or three minutes to get a sense of what the CPU and memory use are like. I’m also watching to see how much time the CPU is stalled waiting for I/O requests. Doing this for several minutes will make the occasional spikes really stand out and also allow for more time to catch those cron jobs that fire up every few minutes.
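
One way to do that without babysitting the terminal is to hand vmstat a sample count as well, for example three minutes’ worth of 5-second samples. The columns I watch most closely are us, sy, and wa under cpu, plus si and so under swap:

$ vmstat 5 36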

The last thing I’ll check before poking at MySQL itself is iostat. Just as with vmstat, I’m going to run it with a short interval (5 or 10 seconds) and do so for several minutes. I’ll likely filter the output so that I only see the output for the most active disk or array (the one where all of MySQL’s data lives).

$ iostat -x 5 | grep sdb

I’m looking closely at the %util column to get a qualitative feel for how “busy” the I/O subsystem is, and at the per-second read and write figures (r/s and w/s here, or tps, Blk_read/s, and Blk_wrtn/s if you run iostat without -x) to see how much data is actually moving every second.
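
If the data really does live on sdb, as the grep above assumes, you can also name the device directly and drop the -x to get the plain per-device throughput columns:

$ iostat -d sdb 5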

Once I have a basic feel for what the system is doing, I’ll start digging into MySQL itself a bit.

Probing MySQL

Looking inside MySQL is a sort of two-layer problem for me. First I want a high-level picture of what it seems to be doing and then I want to dig into the storage engine(s) doing all the I/O work, since I/O is a very common bottleneck.

For the highest level view, I want to see a number of things quickly:

  • how many queries per second the server is handling
  • how many clients are connected (and active)
  • whether many “slow” queries are being executed
  • what, if any, unusual errors are being logged

The first few items can be answered by looking at the results of a few SHOW PROCESSLIST commands along with some SHOW GLOBAL STATUS output. Or, better yet, by using a tool that is able to summarize and present that data in a more friendly and efficient manner. innotop and mytop both do that. (I wrote mytop but fully admit that innotop is more feature rich and more frequently maintained. But either will handle the basics.) Sorting the running queries by execution time is often revealing.
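
For reference, here’s a rough sketch of doing the same thing with nothing but the stock command-line tools (assuming the mysql and mysqladmin clients can log in with your usual credentials):

$ mysqladmin status                                # one-line summary, including "Queries per second avg"
$ mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_%'"   # Threads_connected and Threads_running
$ mysql -e "SHOW GLOBAL STATUS LIKE 'Questions'"   # sample twice, a known interval apart, for queries/sec
$ mysql -e "SHOW FULL PROCESSLIST"                 # what's running right now, and for how long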

To find out about slow queries I’m going to hope that the slow query log is enabled and the server has a sane long_query_time. But even the default of 10 seconds is helpful in truly bad situations.
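
Checking takes only a moment (this assumes MySQL 5.1 or later; older servers call the variable log_slow_queries):

$ mysql -e "SHOW GLOBAL VARIABLES LIKE 'slow_query_log%'"   # is it on, and where does it write?
$ mysql -e "SHOW GLOBAL VARIABLES LIKE 'long_query_time'"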

I’ll also want to glance through MySQL’s error log to make sure nothing bad-looking has started to appear.
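
The log_error variable will tell you where that file lives; the path below is just a placeholder, so substitute whatever the server reports:

$ mysql -e "SHOW GLOBAL VARIABLES LIKE 'log_error'"
$ tail -n 100 /var/lib/mysql/hostname.err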

With that out of the way, I’ll generally dive into the storage engine statistics. And nowadays that almost always means asking InnoDB to cough up some information with SHOW ENGINE INNODB STATUS. While there’s a wealth of information to process in that output, I’m mainly interested in a few high-level stats at the end.
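
If you’d rather grab that from the shell than from an interactive session, a one-liner like this works (again assuming your usual credentials):

$ mysql -e "SHOW ENGINE INNODB STATUS\G"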

----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 23904405026; in additional pool allocated 37084160
Buffer pool size 1310720
Free buffers 1
Database pages 1274443
Modified db pages 770518
Pending reads 1
Pending writes: LRU 0, flush list 1, single page 0
Pages read 733227814, created 65128628, written 1679994934
98.63 reads/s, 3.09 creates/s, 227.34 writes/s
Buffer pool hit rate 999 / 1000

I’m usually keen to see that the buffer pool hit rate is good. The closer to 1000 / 1000 things get, the happier I am. I also want to see how many pages are being read and written per second. If either of those seems high, it should correspond to a high volume of I/O seen earlier in iostat.
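
If you want to compute the hit rate yourself, or watch it move over a few samples, the same information is exposed as status counters. Innodb_buffer_pool_reads (requests that had to go to disk) divided by Innodb_buffer_pool_read_requests should stay very small:

$ mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'"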

The Culprit?

In nine out of ten cases, the culprit is obvious by now–or was obvious halfway through this exercise. The reality is that, most of the time, the changes responsible for a sudden decline in performance are very easy to spot. The trick is this: you have to be looking in order to see them!

Silly as that sounds, it’s been my experience that most problems go undetected until someone complains–especially in smaller IT groups where there isn’t someone regularly looking after the MySQL server that runs just fine 99.9% of the time. And since it fails so infrequently, nobody bothers to set up decent monitoring or performance logging infrastructure to detect problems before end users are impacted.

Sound familiar?

Just a Start…

The tasks I’ve presented here are just the beginning. Once you’ve got a good sense of what’s happening on a MySQL server, you can really start to dig in and think about how to improve the situation. Do you add or change hardware? Modify the application? Adjust some of MySQL or InnoDB’s run-time or start-time parameters?
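
Some of those adjustments don’t even require a restart. On MySQL 5.1 and later, for example, you can enable the slow query log and tighten long_query_time on the fly (a sketch only; both statements need the SUPER privilege):

$ mysql -e "SET GLOBAL slow_query_log = 1"
$ mysql -e "SET GLOBAL long_query_time = 1"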
