Computing distance in miles

At the MySQL Users Conference, I mentioned that I had a stored function for computing distance in miles/kilometers/etc. from latitude and longitude. Some people have asked me for it, so here you go.

First, a general function for computing great circle distance from the radius of the great circle, and latitude/longitude for each point:

DROP FUNCTION IF EXISTS gc_dist;
CREATE FUNCTION gc_dist (
  radius DOUBLE,
  A_lat DOUBLE,
  A_lon DOUBLE,
  B_lat DOUBLE,
  B_lon DOUBLE
) RETURNS DOUBLE
RETURN (2 * radius * ATAN2(SQRT(@x := (POW(SIN((RADIANS(A_lat)-RADIANS(B_lat)) /
 2), 2) + COS(RADIANS(B_lat)) * COS(RADIANS(A_lat)) * POW(SIN((RADIANS(A_lon) - 
RADIANS(B_lon)) / 2), 2))), SQRT(1 - @x)));

Next, we can use gc_dist() to compute the distance in miles—on Earth—by passing in the radius of the Earth in meters:

DROP FUNCTION IF EXISTS earth_distance_miles;
CREATE FUNCTION earth_distance_miles (
  A_lat DOUBLE,
  A_lon DOUBLE,
  B_lat DOUBLE,
  B_lon DOUBLE
) RETURNS DOUBLE 
RETURN gc_dist(3963.1676, A_lat, A_lon, B_lat, B_lon);

You can also easily write another stored function to do the same using a POINT data type by using the X() and Y() functions within the stored function to extract the latitude and longitude. I’ll leave that as an exercise for the reader.

MySQL Users Conference 2006

Well, the conference is now over! I had a great time, and met a lot of awesome people. It was awesome, I truly enjoyed meeting many of you, and will definitely see you next year.

I’ve uploaded my photos and presentations for your enjoyment:

See you all next year!

On IPs, hostnames, and MySQL

This is the first official post in my new category: MySQL Tips. Feel free to subscribe to the category-specific RSS feed, if you prefer.

First, for a bit of background information…

In MySQL, access is always granted based on the combination of username and hostname (or IP address, in some cases), and password—for example, jcole@foo.example.bar could be a valid username and hostname combination. As far as I know, this is a historical thing—MySQL authentication is based on username and hostname because it has always been that way.

In order to verify the “network credentials” of the connecting client, MySQL uses a “double reverse-DNS lookup” on the IP of the incoming connection. In short, MySQL first finds out the hostname using the IP address of the incoming connection via a reverse-DNS lookup using the gethostbyaddr() function or similar. It then resolves that hostname to its IP addresses via a forward-DNS lookup using the gethostbyname() function or similar. Finally it verifies that the original connecting IP is one of the IPs returned.

This can be a problem because reverse-DNS and forward-DNS lookups are fairly expensive, and block waiting on a reply from a DNS server. There are timeouts involved, but they are fairly long, I believe 30 seconds. In order to avoid this long overhead, MySQL has a hostname cache internally.

There are a few problems with the current implementation:

  • The hostname cache is statically sized at 128 entries1. If the cache gets full, the older entries will be purged. If you have more than 128 distinct client machines connecting to a single MySQL server, the cache becomes completely ineffective.
  • If a DNS server cannot be reached to resolve an IP address or hostname, the connection cannot be accepted, and in fact in some circumstances MySQL will cache the failure and refuse further connections from the same IP.
  • If the forward-DNS mapping and reverse-DNS mapping don’t match up, MySQL will refuse to accept the connection.
  • There is no way to see the contents of the hostname cache2. The only available SQL command dealing with it is FLUSH HOSTS which just clears the entire cache.
  • If a particular client machine fails connecting for max_connect_errors, that client (or rather, that IP address) will be refused further connections unless FLUSH HOSTS is run.
  • If libwrap (tcp wrappers) support is enabled in MySQL, that can often cause the reverse-DNS lookup to be done while MySQL is in single-threaded accept(), where a delay in resolution will mean blocking all other clients from connecting.

The whole situation means a few bad things for running busy production MySQL servers:

  • MySQL’s reliability is only as good as your DNS infrastructure. If DNS is down/flaky, so is MySQL.
  • If your network is flaky, it can mean that you quickly reach the default max_connect_errors of 10 and further connections from that client machine will be refused.

I would recommend a few things:

  1. Never ever allow connections to a MySQL server from the outside world. This opens up a huge potential for DNS-based DoS attacks on MySQL.
  2. Don’t enable libwrap support in MySQL, as this generally only makes things worse.
  3. Enable skip_name_resolve in your my.cnf. This will disable all hostname resolutions, period. All GRANTs must be based on IP address.
  4. Set max_connect_errors to something very high, say 99999999. This will avoid your servers getting sporadically blocked because of network or client flakiness.

Good luck, and I hope the first entry of MySQL Tips is useful!

1, 2 I have a patch for MySQL 4.1, which adds a SHOW HOST CACHE command, and a configurable global variable host_cache_size.

Plugin-based backup for MySQL

I’ve been working on a new project to fulfill a specific need: consistent, fast, cheap, and flexible backups for MySQL, for all storage engines1. To that end I’m creating a tool called dbsnapper—a plugin-based backup tool. The tool itself is very basic and handles a few jobs: getting configuration information from the user, running through a “run sheet” of different configurable tasks, and reporting status and errors to the user.

The tasks then—the actual backup steps—are fully configurable, via plugins. In fact, the whole process isn’t even MySQL specific, and can potentially be used for PostgreSQL2 and other database as well. Remember the requirements for backups (above):

  • Consistent—We need to do some locking inside MySQL to make sure that the backups are consistent, for both MyISAM and InnoDB tables. This generally means the FLUSH TABLES WITH READ LOCK command.
  • Fast—There are two ways to get fast, but they both involve snapshotting: either inside the database, or on the volume level. The best way to get a backup quickly is by using Linux’s LVM, the Logical Volume Manager, to take a snapshot of the whole filesystem. Using mysqldump for backups fails miserably on this point.
  • Cheap—Well, backups should be free, and open source. Sorry ibbackup, sorry commercial utilities, become open source and we’ll talk.
  • Flexible—Everyone wants to do something slightly different with their backups, and in order for them to use one common tool, that tool needs to be very flexible. Most backup tools for MySQL are completely inflexible (other than the destination of the backup files). People often have slightly different requirements, why not try to make a single tool work?

It’s possible to meet all of the above requirements right now, but you would likely have to write your own backup script. When writing that script, you would likely do the minimum to make it work in your environment. Why should everyone write their own? My project3, dbsnapper is designed from the start to handle backups in a flexible and configurable way—to allow the user to decide what tools and processes to use, but to do it for them.

Keep an eye out, I’ll blog again once I’ve published the code!

1 Yes, yes, I know about the blue-sky internal online backup plans. Need I mention that online backup was originally planned for 4.0? Then it was moved to 4.1, where it would definitely get done… then to 5.0, a major new version, surely it will get done then. Now it’s likely not going to make it in 5.1, and slotted for 5.2, as far as I know. In the end, even if it does get done, that still doesn’t help people who want to backup their 4.0 or 4.1 installations, which is very common.

2 If someone is interested in working on the plugins for PostgreSQL, let me know, and I’ll give you a nudge once the plugin API is stable!

3 It need not be only “my” project. Anyone interested in helping?