If you’re not using replication, the only thing you have to worry about is MyISAM and InnoDB performance. There are a lot of things you can easily do to get more (or less!) performance out of either of them. So, you get your system tuned and handling 1500 updates per second, easy as pie. You’ve got a single IDE, SATA, or SCSI disk, or maybe two using software RAID.
Then, due to natural progression, needed scalability, or catastrophic failure, you realize that you should probably be using replication, so that you can have more than one of these systems working at once.
The first step in setting up replication is normally to turn on binary logging (by enabling log-bin) on the future master. This ensures that every change to the database is logged so that it can be sent to your future slaves and executed there. Here, though, you find your first obstacle.
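In case you’ve never set it up before, that’s just a couple of lines in my.cnf; roughly something like this (the log path and server-id value here are examples, not gospel):

```ini
# my.cnf on the future master; path and id are just examples
[mysqld]
server-id = 1                      # must be unique across master and slaves
log-bin   = /var/log/mysql/binlog  # base name for the binary log files
```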
You enable binary logging, and all of a sudden, performance goes to hell. Your server was doing 1500 updates per second, but now it’s managing only 150. You freak out. You disable binary logging, and things return to normal. What’s happening?
In order to replicate to slaves, and to maintain consistency, MySQL’s binary logging mechanism writes the binary log “event” for a transaction as part of the commit process. Before a transaction can return “OK” for a COMMIT command (or an implicit commit), the transaction must have been logged to disk in the binary log. In order to ensure that the OS doesn’t cheat and cache the write, MySQL calls fsync on the binary log as part of this commit.
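To make that concrete: by the time COMMIT returns “OK” below, the transaction’s event has been written and fsynced to the binary log. The table here is made up, but the SHOW statements are real MySQL and let you watch the log grow:

```sql
-- Made-up table, purely for illustration
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 42;
COMMIT;  -- returns only after the event is safely in the binary log

SHOW MASTER STATUS;          -- current binary log file and position
SHOW BINLOG EVENTS LIMIT 5;  -- peek at the first few logged events
```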
As long as the system is also doing some reads, a few other writes, etc., you can assume that any time you write data and force it to disk with fsync, it will require a disk seek in order to be written. Disk seeks on most disks take anywhere from 4-8ms each, so a single disk can only do anywhere from 125-250 of them per second, on average.
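If you want a rough feel for this limit on your own hardware, here’s a crude test using GNU dd’s oflag=dsync flag, which forces every block to be synced to disk before the next one is written (the file path is arbitrary; just keep it on the disk you care about):

```sh
# 1000 small synchronized writes; each one must actually reach the disk.
dd if=/dev/zero of=/var/lib/mysql/synctest bs=4096 count=1000 oflag=dsync
rm /var/lib/mysql/synctest
# On a single plain disk, expect at most a few hundred writes per second;
# behind a battery-backed write cache it finishes almost instantly.
```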
Are things starting to come together now? Enabling binary logging drops your update rate to ~150 per second because each of those updates will need to be written to the binary log, and flushed to disk, and you’re limited by the number of seeks that the disk can physically do in a given second. The only way to get more updates per second through the binary logging mechanism is to do fewer disk seeks. There are a few ways to accomplish this:
- Unsafe syncing — You can use the sync_binlog=N option, with N greater than 1 (or 0 to leave flushing entirely up to the OS), so that MySQL fsyncs the binary log only once every N commits instead of for every single transaction. There’s a my.cnf sketch after this list. This is the easiest option, but it means that if your server crashes (power failure, kernel panic, etc.), up to the last N committed transactions can be missing from the binary log. That’s generally bad.
- RAID with battery-backed write cache (BBWC) — This option, though the most expensive, is the most common. The BBWC is some amount of RAM, usually 128MB or so, on the RAID card itself, that is used to cache writes. The RAID card acknowledges a write as soon as it has been saved in that RAM, before it has been written to disk. The battery keeps the cached writes alive through a power failure, so they can still be flushed to disk afterwards, which is what makes this safe.
- Separate disk — Put the binary log on its own disk(s)¹ using Linux’s ext3 filesystem with a sizable journal, mounted with the options data=journal,commit=10. With data=journal, file data goes into the journal first, as sequential writes, so an fsync doesn’t require a seek; the data is copied out to its final location later, in batches. There’s a sketch of this setup after the list, too.
- Ramdisk journal — Put the binary log on a shared disk, but on its own volume, using Linux’s ext3 filesystem, with the journal on a battery-backed hardware ramdisk device (sketched below). I’ll be writing some more blog entries about this quite soon. :)
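To make the first option concrete, unsafe syncing is one line in my.cnf; N=100 here is just an example value:

```ini
[mysqld]
# fsync the binary log once per 100 commits instead of every commit.
# A crash can lose up to the last 100 committed transactions from the log.
sync_binlog = 100
```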
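The separate-disk option boils down to something like the following; /dev/sdb1 and the mount point are assumptions, so substitute your real dedicated disk:

```sh
# Format the dedicated disk as ext3 with a large (400MB) journal...
mke2fs -j -J size=400 /dev/sdb1
# ...and mount it with data journaling, committing the journal every 10s.
mount -o data=journal,commit=10 /dev/sdb1 /var/log/mysql-bin
```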
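And a sketch of the ramdisk-journal variant: ext3 can keep its journal on a separate block device, so you format the ramdisk as a journal device and point the data filesystem at it (the device names are assumptions; details in those future entries):

```sh
# /dev/ramdisk stands in for the battery-backed hardware ramdisk device.
mke2fs -O journal_dev /dev/ramdisk          # format it as an external journal
mke2fs -j -J device=/dev/ramdisk /dev/sdb1  # ext3 volume using that journal
mount -o data=journal,commit=10 /dev/sdb1 /var/log/mysql-bin
```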
I hope you’ve learned something. Feel free to leave me some feedback!
¹ Yes, its own disk. If anything else touches the disk, ever, this strategy fails, because the head gets pulled away from the journal and you’re back to paying for seeks.