Madness continues: SHOW STATUS

Last month I blogged about the silly changes to the default for the SHOW STATUS command in MySQL 5.0 in Stop the madness: SHOW STATUS, and I filed MySQL Bug #19093: SHOW STATUS defaults to SESSION.

Well, it looks like Monty has spoken and this will not be fixed. That’s unfortunate, but oh well, I tried.

You can read the full text of Monty’s reply in the bug report itself, but his reasons for supporting the current behaviour boil down to:

  • It’s similar to what SHOW VARIABLES does. — Yes, this is true, but neither commands’ output makes sense, really. If you do SHOW [SESSION] STATUS, you get a mix of session-scoped and global-scoped results back, and there’s no way to tell which is which. Same goes for SHOW [SESSION] VARIABLES. Ugh.
  • MySQL 5.0 is considered “stable” now, so it’s bad to change it now. — I agree on this part, it’s bad form to change it now, but at the same time, very few people are really using 5.0, so this is in fact a good time to change it, before people find out that it’s broken.
  • MySQL’s documentation has long suggested that people use FLUSH STATUS and SHOW STATUS to find out how query was executed, which kind of works now. — Yes, the manual has suggested this, and a few people have made limited use of it, but come on, it’s impossible to use this in production, and it’s only even of limited value on a test server.

I suggested a compromise: To add a Scope column to the output of the SHOW STATUS and SHOW VARIABLES commands, which would indicate the scope of the value seen in the output. What do you think?

MySQL 5.1: Partitioning getting closer

Back in March, I wrote about Partitioning in MySQL 5.1 not being usable, and since a few releases have gone out, and a few months have passed, I am re-visiting the issues I saw back then.

MySQL still doesn’t allow you to partition using a native DATE or DATETIME column. This is a major problem in my opinion, as it’s probably one of the most common uses for partitioning, but is only seen as a “feature request” by MySQL. Hopefully that will get fixed sometime in the near future.

I filed a couple more bugs today, the most serious of which was the inability to repair “crashed” partitions:

mysql> alter table ontime repair partition p_199609;
ERROR 145 (HY000): Table './flightstats/ontime#P#p_199609' is marked as
crashed and should be repaired

Here are the bugs I filed:

Partitioning still has quite a way to go before it’s really usable in my opinion, but it’s certainly improving!

On disk performance and MySQL replication

If you’re not using replication, the only thing you have to worry about is MyISAM and InnoDB performance. There are a lot of things you can easily do to get more (or less!) performance out of either of them. So, you get your system tuned and handling 1500 updates per second, easy as pie. You’ve got a single IDE, SATA, or SCSI disk, or maybe two using software RAID.

Then, due to natural progression, needed scalability, or catastrophic failure, you realize that you should probably be using replication, so that you can have more than one of these systems working at once.

The first step in setting up replication is normally to turn on binary logging (by enabling log-bin) on the future master. This ensures that every change to the database is logged so that it can be sent to your future slaves and executed there. Here, though, you find your first obstacle.

You enable binary logging, and all of the sudden, performance goes to hell. Your server was running 1500 updates per second, but now it’s running only 150 per second. You freak out. You disable binary logging, and things return to normal. What’s happening?

In order to replicate to slaves, and to maintain consistency, MySQL’s binary logging mechanism writes the binary log “event” for a transaction as part of the commit process. Before a transaction can return “OK” for a COMMIT command (or an implicit commit), the transaction must have been logged to disk in the binary log. In order to ensure that the OS doesn’t cheat and cache the write, MySQL calls fsync on the binary log as part of this commit.

As long as the system is also doing some reads, a few other writes, etc., you can assume that any time you write data and force it to disk with fsync, it will require a disk seek in order to be written. Disk seeks on most disks take anywhere from 4-8ms each, so a single disk can only do anywhere from 125-250 of them per second, on average.

Are things starting to come together now? Enabling binary logging drops your update rate to ~150 per second because each of those updates will need to be written to the binary log, and flushed to disk, and you’re limited by the number of seeks that the disk can physically do in a given second. The only way to get more updates per second through the binary logging mechanism is to do fewer disk seeks. There are a few ways to accomplish this:

  • Unsafe syncing — You can use the sync_binlog=n option, where n > 1 to not call fsync for every single transaction. This is the easiest option, but it means that if your server crashes (power failure, kernel panic, etc.), you will potentially lose n number of committed transactions. That’s generally bad.
  • RAID with battery-backed write cache (BBWC) — This option, though the most expensive, is the most common. The BBWC is some amount of RAM, usually 128MB or so, on the RAID card itself, that is used to cache writes. The RAID card will acknowledge a write once it has been saved in the RAM, but before it has been written to the disk.
  • Separate disk — Put the binary log on its own disk(s)1 using Linux’s ext3 filesystem with a sizable journal, mounted with the options data=journal,commit=10.
  • Ramdisk journal — Put the binary log on a shared disk, but on its own volume, using Linux’s ext3 filesystem, with the journal on a battery-backed hardware ramdisk device. I’ll be writing some more blog entries about this quite soon. :)

I hope you’ve learned something. Feel free to leave me some feedback!

1 Yes, it’s own disk. If anything else touches the disk, ever, this strategy fails.

Computing distance in miles

At the MySQL Users Conference, I mentioned that I had a stored function for computing distance in miles/kilometers/etc. from latitude and longitude. Some people have asked me for it, so here you go.

First, a general function for computing great circle distance from the radius of the great circle, and latitude/longitude for each point:

DROP FUNCTION IF EXISTS gc_dist;
CREATE FUNCTION gc_dist (
  radius DOUBLE,
  A_lat DOUBLE,
  A_lon DOUBLE,
  B_lat DOUBLE,
  B_lon DOUBLE
) RETURNS DOUBLE
RETURN (2 * radius * ATAN2(SQRT(@x := (POW(SIN((RADIANS(A_lat)-RADIANS(B_lat)) /
 2), 2) + COS(RADIANS(B_lat)) * COS(RADIANS(A_lat)) * POW(SIN((RADIANS(A_lon) - 
RADIANS(B_lon)) / 2), 2))), SQRT(1 - @x)));

Next, we can use gc_dist() to compute the distance in miles—on Earth—by passing in the radius of the Earth in meters:

DROP FUNCTION IF EXISTS earth_distance_miles;
CREATE FUNCTION earth_distance_miles (
  A_lat DOUBLE,
  A_lon DOUBLE,
  B_lat DOUBLE,
  B_lon DOUBLE
) RETURNS DOUBLE 
RETURN gc_dist(3963.1676, A_lat, A_lon, B_lat, B_lon);

You can also easily write another stored function to do the same using a POINT data type by using the X() and Y() functions within the stored function to extract the latitude and longitude. I’ll leave that as an exercise for the reader.

MySQL Users Conference 2006

Well, the conference is now over! I had a great time, and met a lot of awesome people. It was awesome, I truly enjoyed meeting many of you, and will definitely see you next year.

I’ve uploaded my photos and presentations for your enjoyment:

See you all next year!