On Sun’s acquisition of MySQL AB

If you follow the MySQL world at all, or you just have your eyes open, you have probably noticed that an agreement has been reached for Sun to acquire MySQL AB for about one billion dollars. Quite a few people have asked for my thoughts on the matter, so I will provide them publicly here for all. Overall, I see this as a mostly good thing.

I think that Sun has a very good chance of leading MySQL better than MySQL. At the same time, it’s always disconcerting to see a project managed within a very large company. Having been through the large company picture once already, I know how wrongly things can go when too many people (especially management types) are involved in a project.

At the same time, though, I’ve always liked Sun, and have high hopes for Sun’s management of the MySQL project and the people. There are a number of things I would like to see Sun do with MySQL following the acquisition:

Fix the MySQL Enterprise development model

The MySQL Enterprise (and by proxy MySQL Community) development model has been broken for a long time now. Too long. Take a cue from Sun itself and from RedHat and fix it right. I have a lot of ideas as to how the development model should work, and although our efforts have been time-constrained, we’ve made some effort to actually implement those ideas in DorsalSource.

Fix the product

There are a lot of areas where MySQL has been lacking for a long time, and the power users have been either crying in their beer (most users), or doing the work themselves (us, Google). I have shared some of these ideas with various people over the years, but here are some of the areas/ideas we have:

  • Replication works fairly well (usually), but its model is completely broken and deficient to go forward with it with all the new features of MySQL. It lacks any real solution for multiple masters, synchronous or semi-synchronous replication, safety (checksums, binlog index and master info sync problems), and conflict resolution or automatic detection (transaction ids).
  • Fix the internal memory allocation model so that it’s possible to constrain the memory usage of MySQL. The current situations sucks.
  • Remove some of the outdated cruft littered all over MySQL: MERGE tables
  • Clean up the logging (general, slow, [future] custom logs) code to be completely configurable and sane.
  • Fix auto_increment. We have suffered with it too long. The storage engine isn’t the place to generate sequences, and InnoDB and the replication model suffer greatly for it.
  • Fix the optimizer so that it makes more sane choices and can be more easily extended.
  • This is partly a product problem and partly a people problem, but stop creating 100 different experimental storage engines, and pushing them as truth. Yes, archive, federated, blackhole, I’m looking at you.

Get some muzzles on the sales and marketing team

As Proven Scaling well knows, MySQL basically sells itself. It would be great if we didn’t have to hear from any more customers that MySQL’s sales team has or is trying to screw them over. No consulting without support? Stupid rule. Insane point of view on licensing? Get rid of it. Fluff? Don’t do it.


Sun should buy Innobase Oy back from the clutches of Oracle, pull InnoDB into MySQL proper, and relicense the full set of code under LGPL, BSD, or another similar license. Personally, I think the GPL is fine for MySQL, but the MySQL sales team has done so much damage to people’s ideas about the GPL—when they do and do not need licenses for MySQL—that it’s hard to continue under the GPL now.


I’m hopeful. I’m hesitant to shout out in glee. If anyone has any questions or comments regarding the above ideas, comments, or thoughts: let me know! I’d be happy to discuss any of them.

On efficiently geo-referencing IPs with MaxMind GeoIP and MySQL GIS

Geo-referencing IPs is, in a nutshell, converting an IP address, perhaps from an incoming web visitor, a log file, a data file, or some other place, into the name of some entity owning that IP address. There are a lot of reasons you may want to geo-reference IP addresses to country, city, etc., such as in simple ad targeting systems, geographic load balancing, web analytics, and many more applications.

This is a very common task, but I have never actually seen it done efficiently in MySQL in the wild. There is a lot of questionable advice on forums, blogs, and other sites out there on this topic. After working with a Proven Scaling customer, I recently did some thinking and some performance testing on this problem, so I thought I would publish some hard data and advice for everyone.

Unfortunately, R-tree (spatial) indexes have not been added to InnoDB yet, so the tricks in this entry only work efficiently with MyISAM tables (although they should work with InnoDB, they will perform poorly). This is actually OK for the most part, as the geo-referencing functionality most people need doesn’t really need transactional support, and since the data tables are basically read-only (monthly replacements are published), the likelyhood of corruption in MyISAM due to any server failures isn’t very high.

The data provided by MaxMind

MaxMind is a great company that produces several geo-referencing databases. They release both a commercial (for-pay, but affordable) product called GeoIP, and a free version of the same databases, called GeoLite. The most popular of their databases that I’ve seen used is GeoLite Country. This allows you look up nearly any IP and find out which country (hopefully) its user resides in. The free GeoLite versions are normally good enough, at about 98% accurate, but the for-pay GeoIP versions in theory are more accurate. In this article I will refer to both GeoIP and GeoLite as “GeoIP” for simplicity.

GeoIP Country is available as a CSV file containing the following fields:

  • ip from, ip to (text) — The start and end IP addresses as text in dotted-quad human readable format, e.g. “”. This is a handy way for a human to read an IP address, but a very inefficient way for a computer to store and handle IP addresses.
  • ip from, ip to (integer) — The same start and end IP addresses as 32-bit integers1, e.g. 50331648.
  • country code — The 2-letter ISO country code for the country to which this IP address has been assigned, or in some cases other strings, such as “A2” meaning “Satellite Provider”.
  • country name — The full country name of the same. This is redundant with the country code if you have a lookup table of country codes (including MaxMind’s non-ISO codes), or if you make one from the GeoIP data.

A simple way to search for an IP

Once the data has been loaded into MySQL (which will be explained in depth later), there will be a have a table with a range (a lower and upper bound), and some metadata about that range. For example, one row from the GeoIP data (without the redundant columns) looks like:

ip_from ip_to country_code
50331648 68257567 US

The natural thing that would come to mind (and in fact the solution offered by MaxMind themselves2) is BETWEEN. A simple query to search for the IP would be:

SELECT country_code
FROM ip_country

Unfortunately, while simple and natural, this construct is extremely inefficient, and can’t effectively use indexes (although it can use them, it isn’t efficient). The reason for this is that it’s an open-ended range, and it is impossible to close the range by adding anything to the query. In fact I haven’t been able to meaningfully improve on the performance at all.

A much better solution

While it probably isn’t the first thing that would come to mind, MySQL’s GIS support is actually perfect for this task. Geo-referencing an IP address to a country boils down to “find which range or ranges this item belongs to”, and this can be done quite efficiently using spatial R-tree indexes in MySQL’s GIS implementation.

The way this works is that each IP range of (ip_from, ip_to) is represented as a rectangular polygon from (ip_from, -1) to (ip_to, +1) as illustrated here:

In SQL/GIS terms, each IP range is represented by a 5-point rectangular POLYGON like this one, representing the IP range of –

  50331648 -1,
  68257567 -1,
  68257567  1,
  50331648  1,
  50331648 -1

The search IP address can be represented as a point of (ip, 0), and that point with have a relationship with at least one of the polygons (provided it’s a valid IP and part of the GeoIP database) as illustrated here:

It is then possible to search these polygons for a specific point representing an IP address using the GIS spatial relationship function MBRCONTAINS and POINT3 to search for “which polygon contains this point” like this:

SELECT country_code
FROM ip_country

Pretty cool huh? I will show how to load the data and get started, then take look at how it performs in the real world, and compare the raw numbers between the two methods.

Loading the data and preparing for work

First, a table must be created to hold the data. A POLYGON field will be used to store the IP range. Technically, at this point the ip_from and ip_to fields are unnecessary, but given the complexity of extracting the IPs from the POLYGON field using MySQL functions, they will be kept anyway. This schema can be used to hold the data4:

CREATE TABLE ip_country
  id           INT UNSIGNED  NOT NULL auto_increment,
  ip_poly      POLYGON       NOT NULL,
  ip_from      INT UNSIGNED  NOT NULL,
  ip_to        INT UNSIGNED  NOT NULL,
  country_code CHAR(2)       NOT NULL,
  SPATIAL INDEX (ip_poly)

After the table has been created, the GeoIP data must be loaded into it from the CSV file, GeoIPCountryWhois.csv, downloaded from MaxMind. The LOAD DATA command can be used to do this like so:

INTO TABLE ip_country
  @ip_from_string, @ip_to_string,
  @ip_from, @ip_to,
  @country_code, @country_string
  id      := NULL,
  ip_from := @ip_from,
  ip_to   := @ip_to,
    /* clockwise, 4 points and back to 0 */
    POINT(@ip_from, -1), /* 0, top left */
    POINT(@ip_to,   -1), /* 1, top right */
    POINT(@ip_to,    1), /* 2, bottom right */
    POINT(@ip_from,  1), /* 3, bottom left */
    POINT(@ip_from, -1)  /* 0, back to start */
  country_code := @country_code

During the load process, the ip_from_string, ip_to_string, and country_string fields are thrown away, as they are redundant. A few GIS functions are used to build the POLYGON for ip_poly from the ip_from and ip_to fields on-the-fly. On my test machine it takes about 5 seconds to load the 96,641 rows in this month’s CSV file.

At this point the data is loaded, and everything is ready to go to use the above SQL query to search for IPs. Try a few out to see if they seem to make sense!

Performance: The test setup

In order to really test things, a bigger load testing framework will be needed, as well as a few machines to generate load. In my tests, the machine being tested, kamet, is a Dell PowerEdge 2950 with Dual Dual Core Xeon 5050 @ 3.00Ghz, and 4GB RAM. We have four test clients, makalu{0-3}, which are Apple Mac Mini with 1.66Ghz Intel CPUs and 512MB RAM. The machines are all connected with a Netgear JGS524NA 24-port GigE switch. For the purposes of this test, the disk configuration is not important. On the software side, the server is running CentOS 4.5 with kernel 2.6.9-55.0.2.ELsmp. The Grinder 3.0b32 is used as a load generation tool with a custom Jython script and Connector/J 5.1.5 to connect to MySQL 5.0.45.

There are a few interesting metrics that I tested for:

  • The latency and queries per second with a single client repeatedly querying.
  • Does the number of queries handled increase as the number of clients increases?
  • Is latency and overall performance adversely affected by many clients?

The test consisted of an IP search using the two different methods, and varying the number of clients between 1 and 16 in the following configurations:

Clients Machines Threads
1 1 1
2 1 2
4 1 4
8 2 4
16 4 4

Each test finds the country code for a random dotted-quad format IP address passed in as a string.

How does it perform? How does it compare?

There are a few metrics for determining the performance of these searches. If you tried the BETWEEN version of this query, you may have noticed that, in terms of human time, it doesn’t take very long anyway: I pretty consistently got 1 row in set (0.00 sec). But don’t let that fool you.

It’s clear that GIS wins hands down.

First, a look at raw performance in terms of queries per second.

Using BETWEEN, we max out at 264q/s with 16 clients:

Using MBRCONTAINS, we max out at 17600q/s with 16 clients, and it appears that it’s the test clients that are maxed out, not the server:

Next, a look at latency of the individual responses.

Using BETWEEN, we start out with a single client at 15.5ms per request, which is not very good, but still imperceptible to a human. But with 16 clients, the latency has jumped to 60ms, which is longer than many web shops allocate to completely construct a response. As the number of test clients increases, the latency gets much worse, because the query is so dependent on CPU:

Using MBRCONTAINS, we start out with a single client at 0.333ms per request, and even with 16 clients, we are well under 1ms at 0.743ms:


Definitely consider using MySQL GIS whenever you need to search for a point within a set of ranges. Performance is fantastic, and it’s relatively easy to use. Even if you are an all-InnoDB shop, as most of our customers are (and we would recommend), it may very well be worth it to use MyISAM specifically for this purpose.

Update 1: Another way to do it, and a look at performance

Andy Skelton and Nikolay Bachiyski left a comment below suggesting another way this could be done:

SELECT country_code 
FROM ip_country
WHERE ip_to >= INET_ATON('%s') 

This version of the query doesn’t act exactly the same as the other two — if your search IP is not part of any range, it will return the next highest range. You will have to check whether ip_from is <= your IP within your own code. It may be possible to do this in MySQL directly, but I haven’t found a way that doesn’t kill the performance.

Andy’s version actually performs quite well — slightly faster and more scalable than MBRCONTAINS. I added two new performance testing configurations to better show the differences between the two:

Clients Machines Threads
32 4 8
64 4 16

Here’s a performance comparison of MBRCONTAINS vs. Andy’s Method:

Latency (ms) — Lower is better:

Queries per second — Higher is better:

Once I get some more time to dig into this, I will look at why exactly BETWEEN is so slow. I’ve also run into an interesting possible bug in MySQL: If you add a LIMIT 1 to the BETWEEN version of the query, performance goes completely to hell. Huh?

Thanks for the feedback, Andy and Nikolay.


1 MySQL provides the INET_ATON() and INET_NTOA() functions for converting back and forth between dotted-quad strings (CHAR(15)) and 32-bit integers (INT UNSIGNED). You can also use the equivalent functions, if they exist, if your favorite programming language so that you can just feed an integer to MySQL. I haven’t tested the (positive) performance implications of doing that.

2 Although, strangely they offer a different solution specifically for MySQL using <= and >= operators instead of BETWEEN. I don’t find that that difference has any effect on MySQL. Maybe it was for a really old version of MySQL that didn’t have BETWEEN?

3 Pet peeve: Why does MySQL require you to pass the output of its own POLYGON, LINESTRING, POINT, etc., functions through GEOMFROMWKB in order to use them? It makes life suck that little bit more than necessary.

4 Note that if you’re looking to play around with the BETWEEN version of things, you will want to add some indexes on ip_from and ip_to. I would recommend INDEX (ip_from, ip_to) and INDEX (ip_to, ip_from) as those two seemed to perform the best that I could find (given its poor efficiency to start with).

Yahoo! Mash: Officially Lame

A week ago I got a useless invite for Yahoo! Mash — useless since the service wasn’t open even by invitation yet, and required a Yahoo! Backyard (employee only) login. The service has now officially launched, so I gave my invite a try again, and it worked! It’s invitation-only, so forgive me if you can’t follow along.

There seems to be a damn good reason it is invitation-only… it is totally lame. It is nowhere near ready for the public, embarrassingly so. It’s kind of amusing actually. Lots of things are broken, several things want you to type in e.g. size of pictures in pixels. The lameness comes from many things, not the least of which is that it comes with a “Mash Pet” which is kind of a Tamagotchi composed of pictures of a whiteboard smiley face.

The service overall is like a mix between MySpace and Facebook, and overall pretty lame. Actually, the only thing that it seems to have that is innovative (and a feature people have been wanting for YEARS) is a “this is fugly” link to turn off the custom styling of a profile. Yawn.

Citibank misunderstands mobile, sucks

So we’re sitting at our gate in the terminal in LAX, and Citibank has purchased some likely very expensive advertising space above our heads, for “CITI MOBILESM“. Purportedly, this would be a version of Citibank’s website optimized for the mobile browser. The ad has the url citi.com/citimobile which, one would assume, given the mobile target audience (and, my first thought was “ugh, that’s an unnecessarily long URL to type on a mobile”), and the big picture of a phone, should be visited on your phone.

How wrong I am. I tried it on my phone, and I got a very large Citibank-standard page (which would likely only render at all on Symbian S60 or iPhone) telling me that I am using an unsupported browser, and helpfully recommends IE, Firefox, Netscape, Safari, and AOL. Awesome experience so far, Citibank!

I try it on my laptop, and the reason for that page becomes clear. citi.com/citimobile is not supposed to be used from your phone… apparently you’re supposed to go there on a PC and sign up to receive the application as an SMS containing a link to download the application. How annoying. Whatever, let’s give it a go.

After going through a fairly simple signup process (none of which actually would have required a PC browser), I am sent an SMS message and I download and install the Java application.

The application itself is quite simple, and in fact I suspect based on its behaviour that it doesn’t have any intelligence of its own… it seems to download its menus and such on first run. So, basically, a poorly designed browser.

When you run the application, you’re given a few menu choices:

  • Account Info — Get your account balances, recent activity, etc.
  • Payments — Schedule bill payments.
  • Transfers — Make transfers between your accounts.
  • Citi Locations — Find Citibank branch and ATM locations.
  • Service — Customer service and account management stuff.

Choosing any of these options except for “Citi Locations” will ask you to log in. The first time you use it, the login process is a bit different… it asks for your phone number using a menu which took me a few minutes to figure out, since it completely deviated both from my phone’s interface, and any interface I’ve ever seen. For future logins, it uses the same kludgy interface to ask for your “telephone access code” aka password.

After pressing the middle key (usually “select”-ish) on my phone a few times, and expecting the usual “numeric entry” screen to come up, I finally figured out that you actually have to type on this screen, while the entry you’re typing into is highlighted. On my phone, that means I have to hold down the function key to type numbers.

Folks, phones have interface standards and especially Java has standard interfaces for a reason—so that your users won’t be confused, and your application will look and feel like all of the other applications on their phone.

To make matters worse, both the Account Info “recent activity” and Citi Locations search are next to useless. The recent activity gives you basically no information, not even the name of the vendor/company, very similar to the information that an ATM will give you as a printed receipt of activity.

Overall, a pretty disappointing experience with Citi Mobile!

Scaling out AND up, a compromise

You might have noticed that there’s been quite a (mostly civil, I think) debate about RAID and scaling going on recently:

I’d like to address some of the—in my opinion—misconceptions about “scaling out” that I’ve seen many times recently, and provide some of my experience and opinions.

It’s all about compromise.

Human time is expensive. Having operations, engineering, etc. deal with tasks (such as re-imaging a machine) when fixing a problem that could have been a 30-second disk swap is inefficient use of human resources. Don’t cut corners where it doesn’t make sense. This calls back to Brian’s comments about the real cost of your failed $200 part.

Scaling out doesn’t mean using crappy hardware. I think people take the “scale out” model (that they’ve often only read about from outdated conference presentations) to quite an extreme. They think scaling out means using desktop-class, bad hardware, and just buying a ton of them. That model doesn’t work, and it’s hell to maintain in the long term.

Compromise. One of the key points in the scale-out model: size the physical hardware reasonably to achieve the best compromise between scaling out and scaling UP. This is the main reason that I assert RAID is not going anywhere… it is often simply the best and cheapest way to achieve the performance and reliability that you need in each physical machine in order to make the scale out model work.

Use commodity hardware. You often hear the term “commodity hardware” in reference to scale out. While crappy hardware is also commodity, what this means is that instead of getting stuck on the low-end $40k machine, with thoughts of upgrading to the $250k machine, and maybe later the $1M machine, you use data partitioning and any number of let’s say $5k machines. That doesn’t mean a $1k single-disk crappy machine as said above. What does it mean for the machine to be “commodity”? It means that the components are standardized, common, and the price is set by the market, not by a single corporation. Use commodity machines configured with a good balance of price vs. performance.

Use data partitioning (sharding). I haven’t talked much about this in my previous posts, because it’s sort of a given. My participation in the HiveDB project and my recent talks on “Scaling and High Availability Architectures” at the MySQL Conference and Expo should say enough about my feelings on this subject. Nonetheless I’ll repeat a few points from my talk: data partitioning is the only game in town, cache everything, and use MySQL replication for high availability and redundancy.

Nonetheless, RAID is cheap. I’ve said it several times already, just to be sure you heard me correctly: RAID is a cheap and efficient way to gain both performance and reliability out of your commodity hardware. For most systems, engineering time, operations time, etc., is going to be a lot more expensive to get the same sort of reliability out of a non-RAID partitioned system versus a RAID partitioned system. Yes, other components will fail, but in a sufficiently large data-centric system with server class hardware, disks will fail 10:1 or more over anything else.

That is all, carry on.

Update: Sebastian Wallberg has translated this entry to German. Thanks Sebastian!