On disk performance and MySQL replication

If you’re not using replication, the only thing you have to worry about is MyISAM and InnoDB performance. There are a lot of things you can easily do to get more (or less!) performance out of either of them. So, you get your system tuned and handling 1500 updates per second, easy as pie. You’ve got a single IDE, SATA, or SCSI disk, or maybe two using software RAID.

Then, due to natural progression, needed scalability, or catastrophic failure, you realize that you should probably be using replication, so that you can have more than one of these systems working at once.

The first step in setting up replication is normally to turn on binary logging (by enabling log-bin) on the future master. This ensures that every change to the database is logged so that it can be sent to your future slaves and executed there. Here, though, you find your first obstacle.
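Enabling it is a one-line change in my.cnf (the server-id value and log file prefix below are examples, not requirements):

```ini
[mysqld]
# Unique ID for this server within the replication topology
server-id = 1
# Log every change to the database in a binary log;
# "mysql-bin" is just an example file name prefix
log-bin   = mysql-bin
```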

You enable binary logging, and all of a sudden, performance goes to hell. Your server was running 1500 updates per second, but now it’s running only 150 per second. You freak out. You disable binary logging, and things return to normal. What’s happening?

In order to replicate to slaves, and to maintain consistency, MySQL’s binary logging mechanism writes the binary log “event” for a transaction as part of the commit process. Before a transaction can return “OK” for a COMMIT command (or an implicit commit), the transaction must have been logged to disk in the binary log. In order to ensure that the OS doesn’t cheat and cache the write, MySQL calls fsync on the binary log as part of this commit.

As long as the system is also doing some reads, a few other writes, etc., you can assume that any time you write data and force it to disk with fsync, it will require a disk seek in order to be written. Disk seeks on most disks take 4-8ms each, so a single disk can only do about 125-250 of them per second, on average.
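The math behind those numbers is simple enough to sketch (the 4ms and 8ms figures are the rough seek times from above):

```python
def max_seeks_per_second(seek_time_ms):
    """How many seeks a disk can do per second, if every
    write costs one full seek."""
    return 1000.0 / seek_time_ms

# An 8ms-seek disk tops out around 125 fsyncs/second;
# a 4ms-seek disk around 250.
print(max_seeks_per_second(8))  # 125.0
print(max_seeks_per_second(4))  # 250.0
```

Notice how closely 125-250 seeks per second brackets the ~150 updates per second observed above.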

Are things starting to come together now? Enabling binary logging drops your update rate to ~150 per second because each of those updates will need to be written to the binary log, and flushed to disk, and you’re limited by the number of seeks that the disk can physically do in a given second. The only way to get more updates per second through the binary logging mechanism is to do fewer disk seeks. There are a few ways to accomplish this:

  • Unsafe syncing — You can use the sync_binlog=n option, with n > 1, so that fsync is called only once every n transactions rather than for every single one. This is the easiest option, but it means that if your server crashes (power failure, kernel panic, etc.), you will potentially lose up to n committed transactions. That’s generally bad.
  • RAID with battery-backed write cache (BBWC) — This option, though the most expensive, is the most common. The BBWC is some amount of RAM, usually 128MB or so, on the RAID card itself, that is used to cache writes. The RAID card will acknowledge a write once it has been saved in the RAM, but before it has been written to the disk.
  • Separate disk — Put the binary log on its own disk(s)1 using Linux’s ext3 filesystem with a sizable journal, mounted with the options data=journal,commit=10.
  • Ramdisk journal — Put the binary log on a shared disk, but on its own volume, using Linux’s ext3 filesystem, with the journal on a battery-backed hardware ramdisk device. I’ll be writing some more blog entries about this quite soon. :)
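For reference, the first and third options boil down to a couple of configuration lines. A sketch (the value of n, the device name, and the mount point here are examples, not recommendations):

```ini
# my.cnf -- trade durability for speed: fsync the binary log
# only once every 100 transactions instead of at every commit
[mysqld]
sync_binlog = 100

# /etc/fstab -- a dedicated ext3 volume for the binary log,
# with journaled data committed every 10 seconds
/dev/sdb1  /var/log/mysql-bin  ext3  data=journal,commit=10  0 0
```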

I hope you’ve learned something. Feel free to leave me some feedback!

1 Yes, its own disk. If anything else touches the disk, ever, this strategy fails.

On IPs, hostnames, and MySQL

This is the first official post in my new category: MySQL Tips. Feel free to subscribe to the category-specific RSS feed, if you prefer.

First, for a bit of background information…

In MySQL, access is always granted based on the combination of username and hostname (or IP address, in some cases), and password—for example, jcole@foo.example.bar could be a valid username and hostname combination. As far as I know, this is a historical thing—MySQL authentication is based on username and hostname because it has always been that way.

In order to verify the “network credentials” of the connecting client, MySQL uses a “double reverse-DNS lookup” on the IP of the incoming connection. In short, MySQL first finds out the hostname using the IP address of the incoming connection via a reverse-DNS lookup using the gethostbyaddr() function or similar. It then resolves that hostname to its IP addresses via a forward-DNS lookup using the gethostbyname() function or similar. Finally it verifies that the original connecting IP is one of the IPs returned.
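The logic can be sketched in a few lines of Python. This is an illustration of the check, not MySQL’s actual implementation; the resolver arguments are there so the sketch can be exercised without a live DNS server:

```python
import socket

def verify_client(ip,
                  reverse_lookup=lambda ip: socket.gethostbyaddr(ip)[0],
                  forward_lookup=lambda host: socket.gethostbyname_ex(host)[2]):
    """Double reverse-DNS check: IP -> hostname -> list of IPs.
    The client is accepted only if the original IP appears in
    the forward-lookup results for its reverse-looked-up name."""
    hostname = reverse_lookup(ip)         # reverse-DNS: IP to hostname
    addresses = forward_lookup(hostname)  # forward-DNS: hostname to IPs
    return ip in addresses
```

With the default resolvers this calls out to real DNS; with stub resolvers you can see both the match and mismatch cases without any network at all.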

This can be a problem because reverse-DNS and forward-DNS lookups are fairly expensive, and they block while waiting on a reply from a DNS server. There are timeouts involved, but they are fairly long (30 seconds, I believe). To avoid this long overhead, MySQL keeps an internal hostname cache.

There are a few problems with the current implementation:

  • The hostname cache is statically sized at 128 entries1. If the cache gets full, the older entries will be purged. If you have more than 128 distinct client machines connecting to a single MySQL server, the cache becomes completely ineffective.
  • If a DNS server cannot be reached to resolve an IP address or hostname, the connection cannot be accepted, and in fact in some circumstances MySQL will cache the failure and refuse further connections from the same IP.
  • If the forward-DNS mapping and reverse-DNS mapping don’t match up, MySQL will refuse to accept the connection.
  • There is no way to see the contents of the hostname cache2. The only available SQL command dealing with it is FLUSH HOSTS, which just clears the entire cache.
  • If a particular client machine fails to connect max_connect_errors times, that client (or rather, that IP address) will be refused further connections until FLUSH HOSTS is run.
  • If libwrap (TCP wrappers) support is enabled in MySQL, the reverse-DNS lookup can end up being done while MySQL is in its single-threaded accept() loop, where a delay in resolution blocks all other clients from connecting.

The whole situation means a few bad things for running busy production MySQL servers:

  • MySQL’s reliability is only as good as your DNS infrastructure. If DNS is down/flaky, so is MySQL.
  • If your network is flaky, it can mean that you quickly reach the default max_connect_errors of 10 and further connections from that client machine will be refused.

I would recommend a few things:

  1. Never ever allow connections to a MySQL server from the outside world. This opens up a huge potential for DNS-based DoS attacks on MySQL.
  2. Don’t enable libwrap support in MySQL, as this generally only makes things worse.
  3. Enable skip_name_resolve in your my.cnf. This will disable all hostname resolutions, period. All GRANTs must be based on IP address.
  4. Set max_connect_errors to something very high, say 99999999. This will keep your servers from sporadically blocking client machines because of network or client flakiness.
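Recommendations 3 and 4 together amount to just two lines of my.cnf:

```ini
[mysqld]
# Skip all DNS lookups; grants must use IP addresses
skip_name_resolve
# Don't block flaky clients after a handful of failed connects
max_connect_errors = 99999999
```

With skip_name_resolve enabled, every GRANT must name an IP address (for example, 'jcole'@'10.0.0.5' rather than 'jcole'@'foo.example.bar' — both are example account names), and FLUSH HOSTS remains available to unblock any hosts that were refused before the change.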

Good luck, and I hope the first entry of MySQL Tips is useful!

1, 2 I have a patch for MySQL 4.1 which adds a SHOW HOST CACHE command and a configurable global variable, host_cache_size.