Going to MySQL Camp II, Brooklyn, NY

In case you live in the dark ages (that is, before RSS) and haven’t heard, MySQL Camp II is next week at Polytechnic University in Brooklyn, NY. Sign up and head over there, slackers!

I will be there to talk about Proven Scaling, HiveDB, DorsalSource, and much more! Send me a note if you’d like to meet up or talk about something specific. I will also have ample Proven Scaling bottle openers (photo thanks to Colin Charles) to be distributed!

On serving two markets and mistakes

Zack Urlocker wrote an article today on InfoWorld titled Serving Two Markets where he comments on Matt Asay’s The open-source community’s double standard on MySQL (which is a piece of work itself) and says:

Part of the issue is that often discussions about the business of open source is seen as a “zero sum game” between community users and paying customers, meaning that in order for one group to benefit, the other group must lose. To me this polarizes the discussion in an unhealthy way.

I have to admit, I haven’t seen it that way at all. And I don’t see why anyone would. When RedHat split into Fedora and RHEL, I evaluated the design of the split and said “OK”. It made sense to me. These days, I use Fedora quite a bit for personal projects, desktop machines, toying around, and testing the newest things. On the business side, I often recommend RHEL¹ to customers because it “just works” and has proven itself to be quite stable. RedHat produced not one, but two products I found to be useful for different purposes.

With RedHat, there was absolutely some discontent initially, largely from the casual users who didn’t understand the RHEL issues anyway, whining because something was being “taken away” from them. Nonetheless, the split has been by all accounts highly successful. Nothing was actually taken away from the community; if anything, I think Fedora is much stronger and more promising now than RedHat ever was. At the same time, they’ve also managed to appease the commercial side of things, with RHEL being a fantastic and very stable server OS.

When MySQL originally discussed the split into Community and Enterprise with me, I told them all of my concerns, which apparently were in large part identical to the concerns of several other key MySQL players that they asked. Nonetheless, they went ahead with things exactly as they had discussed, with none of our concerns addressed or even acknowledged. One of the key concerns we all had was actually not that they were taking anything away from Community. Rather, it was that the release structure between Community and Enterprise made no sense for Enterprise. That is, we were (and still are) concerned that MySQL is spinning its wheels and not creating a useful product that we would buy or recommend that our customers buy. And I, at least, told them as much.

Jump forward to the most recent announcements. Once again we got an early look at the changes, and once again, we voiced our concerns. This time it basically amounted to “Is taking away the Enterprise source supposed to convince people to buy Enterprise?” Their answer was “Yep”. Our only response could be “Uh, good luck with that.” Once again, our concerns mostly centered around whether the Enterprise product made sense, and once again we said that it didn’t. We told them flat out that a single person mirroring the code would nullify all the “force people to buy” effects of their removal of the source, while nullifying none of the good will they lose by hiding it.

The issues around Community this time around were basically moot because nothing had really changed. We get a similar number of actual Community builds as before, and new stuff gets pushed into a far future version, when basically nothing new was being accepted anyway.

I find it ironic that Zack ends his article with:

At this stage, I think we’re all exploring different approaches to building open source businesses and communities. But the good news is, if we make mistakes along the way, folks will tell us.

Yes, we’ll certainly tell you when you make mistakes. That doesn’t mean you’ll listen or try to correct them. Horses and water and all that.

1 More accurately, I generally recommend RHEL if you’re interested in the support part of the equation, and CentOS if you’re not. Quite a lot of our customers choose RHEL, if for nothing else than to give back to RedHat for creating a great product. Let’s not talk about up2date for now. ;)

MySQL Community split officially a failure

A few days ago, I got the opportunity to hear about some upcoming changes in MySQL Community and MySQL Enterprise. I’ve been waiting for an official announcement before commenting on the changes, and Kaj has finally posted the official announcement on his blog in Refining MySQL Community Server.

In summary, the changes are:

  • Community gets no new features in any version once that version becomes GA — This effectively means that the difference between the content of Community and Enterprise approaches nil, since the addition of “Community Enhancements” was the major selling point for MySQL Community; In addition, it means that as of today, any new actual features go into MySQL 5.2, aka never-never land
  • Some changes in the policy as to frequency of Community builds, which amounts to no change in the status quo — MySQL has changed its promise for Community releases from 2 per year to 4 per year, despite the fact that we have had 4 releases already in the first half of 2007
  • MySQL will start to hide their Enterprise source releases from the public — A reaction to several Linux distributions using Enterprise releases for their bundled packages, Dorsal Source building binaries of Enterprise, and other issues; this doesn’t really solve any problems, however, as those who need or want the files will still get them all the same

So, what are my thoughts on the matter?

The “MySQL Community” concept has failed

As Kaj admits on IRC after a bit of prodding:

<kaj> JeremyC: Our past 10 months since Oct has been a struggle to make that work, and we’ve failed.

Only 10 months ago in October 2006, MySQL rocked the world a bit with Kaj’s announcement of the split between Community and Enterprise. For MySQL Community, Kaj promised:

  • early access to MySQL features under development — this hasn’t happened, and I don’t see how it could have, as Community was intended to be released infrequently
  • that MySQL AB will listen to their input — nothing has changed in this regard
  • timely corrections to bug fixes they report — nothing has changed in this regard
  • help with enhancing MySQL for their particular needs — nothing has changed in this regard
  • channels to communicate with the rest of community for getting assistance — some nice changes here with the establishment of the FreeNode #mysql-dev IRC channel and the appointment of Chad Miller as community liaison
  • an easier process for having contributions accepted in MySQL — very little has changed in this regard
  • commitment to Open Source — including free, unrestricted availability of source code — uh, ok, kind of assumed

Has the above happened? No, not really. Other than a reduction in frequency for the Community tree, nothing has changed compared to how things used to be.

The fundamental idea behind the Community and Enterprise split is a reasonable one. It’s a model that has worked very well for RedHat with their Fedora / RHEL split (in fact I often recommend RHEL to our customers, because it has worked so well¹ for most of them), and I think given the right implementation this model could work nicely for MySQL as well.

MySQL fundamentally misunderstands their community

Generally speaking, any contributions to the server will be to address specific problems, mostly in larger systems. That means that any possible contributions against the server are needed because of either bugs or deficiencies in MySQL that are already affecting production systems. Very few, if any, of the features we’re writing are just because it would be fun. That means we need those features in a version of MySQL we can actually use.

The promise with MySQL Community was that those contributions, small fixes, etc., would be accepted so that we could get on with using them in our production systems if we’re willing to use the Community releases. Eventually, after the changes are vetted and proven stable, they would possibly be pushed into Enterprise. This didn’t really work at all… since the releases of Community are so infrequent, very little vetting happens, and there is no real feedback loop with the users, due to the delay in seeing actual fixes implemented.

The split was confusing from the start

The version numbering scheme makes very little sense, even once you understand it. Case in point: since profiling was added in 5.0.37, does 5.0.44 have it? No? Huh?

Why try to keep the version numbers the same while fundamentally changing the release structure and content of each half of the split? This has confused users beyond anything else. In addition, the documentation has suffered tremendously from the change as well.

Back in my discussions with Jay et al. before the actual split, I correctly predicted serious quality control issues in Enterprise given the more frequent release cycle compared to Community. Back in May, I pointed out a perfect example of this in Breakdown in MySQL Enterprise process: a bug fix was applied to Enterprise which had received no testing at all in Community or anywhere else, and later had to be reverted. The new changes to Community and Enterprise do absolutely nothing to address these concerns.

What are we doing about it?

Dorsal Source — Community MySQL Builds

As you know, Proven Scaling has sponsored and worked with Solid to bring you Dorsal Source, which in its current incarnation is just scratching the surface of what we hope to make available. Dorsal Source has been providing, and will continue to provide, the source packages for Enterprise, as well as community-built binaries of the even-numbered Enterprise releases. In fact, we have just posted source and binaries for MySQL 5.0.46.

If you’re handy with PHP, MySQL, XML, and/or Drupal, and you’re passionate about MySQL or the MySQL community, and interested in helping Solid and Proven Scaling develop Dorsal Source, let me know. We’d love to have you on board.

Announcing a free and open mirror for the community

Proven Scaling is announcing, effective immediately, a new initiative to address the needs of our customers and the rest of the MySQL community: mirror.provenscaling.com/mysql, where we will provide a few unique (and, we hope, useful) things:

We will provide standard rsync access for anyone else who wants to mirror this content… just send me a note.

Commitment to continuing development of MySQL 5.0

Proven Scaling has developed quite a few patches against MySQL 5.0, and we will continue to provide useful patches and do our development against the version of MySQL that our customers use… which means MySQL 5.0 for some time to come.

As Dorsal Source matures, you will see a whole slew of new features associated with patch management—keep an eye out for that.

1 Ugh, other than the fact up2date in RHEL4 sucks. Long live yum.

Consulting in Europe or Japan in September?

I am planning (possibly along with Eric) to go to the MySQL Developer’s Meeting in Heidelberg, Germany during the week of September 17th. If you’re somewhere in Europe (in Germany, even better) and are interested in on-site consulting from Proven Scaling in mid-September, let me know!

I may also be attending the MySQL Users Conference in Tokyo September 11-12, so if you’re in Japan (or near) and want MySQL consulting on-site in early or mid-September, also do let me know!

My top 5 wishes for MySQL

Jay Pipes, Stewart Smith, Marten Mickos, Frank Mash, and Kevin Burton have all gotten into it, and Marten suggested that I should write my top five. I’m usually not into lists, but this sounds interesting, so here goes!

My top 5 wishes for MySQL are:

1. Restore some of my sanity

OK, well, this actually has several sub-wishes…

a. Global and session INFORMATION_SCHEMA

This is just starting to become interesting, but I’ve told MySQL several times that it’s a mistake to mix session-scoped and global-scoped things together in INFORMATION_SCHEMA. I can only hope they will listen.

b. Namespace-based configuration

A long time ago I started writing a proposal for this, but really anything would be better than today’s jumbled mess. This would also allow plugins and storage engines to bring in not a random smattering of variables, but an entire namespace.

c. Better memory management

I’ve also started writing a proposal for this. Right now nothing really constrains memory within MySQL, and there is no effective way to be sure that you both have enough memory configured in various variables for your needs, and that you don’t run out of memory (or start swapping, or crash on 32-bit systems, etc).
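To make the problem concrete, here is a rough sketch of the kind of arithmetic an administrator ends up doing by hand. The variable names are real MySQL settings, but the values are made up and the formula is only a common rule of thumb (global buffers plus per-connection buffers multiplied by the connection limit), not something the server computes or enforces.

```python
# Rough worst-case memory estimate for a MySQL server (illustrative only).
# Assumption: the settings below are example values, and the formula is a
# rule of thumb rather than a limit the server actually enforces.

MIB = 1024 * 1024

global_buffers = {
    "key_buffer_size": 256 * MIB,
    "innodb_buffer_pool_size": 2048 * MIB,
    "query_cache_size": 64 * MIB,
}

per_connection_buffers = {
    "sort_buffer_size": 2 * MIB,
    "read_buffer_size": 1 * MIB,
    "read_rnd_buffer_size": 1 * MIB,
    "join_buffer_size": 1 * MIB,
}

max_connections = 500


def worst_case_bytes(global_bufs, per_conn_bufs, connections):
    """Sum global buffers plus every per-connection buffer at full fan-out."""
    return sum(global_bufs.values()) + connections * sum(per_conn_bufs.values())


total = worst_case_bytes(global_buffers, per_connection_buffers, max_connections)
print(f"estimated worst case: {total / MIB:.0f} MiB")
```

With these example values the estimate already exceeds what a 32-bit process can address, and even this figure is not a true ceiling, since some buffers may be allocated more than once per query.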

d. Stop writing ugly hacks

We now have FEDERATED, BLACKHOLE, and soon a whole boatload of new stuff that’s quite hacky, and although these are useful in some situations, the general public should stay away from them. Yet, I continually see them come up in all different situations where people think they are a good idea. FEDERATED should have been implemented like Oracle’s DATABASE LINK feature, which is much more user-friendly, safer, and just generally better. BLACKHOLE was created to solve a stupid deficiency in replication, to allow relay slaves to not need a complete copy of the data. Why not just allow replication to pass on logs raw, or write a log proxy?

e. Fix subquery optimization

Subqueries have been available in the same broken state for over 4 years now. Why are subqueries in an IN (...) clause still optimized in an incredibly stupid and slow way, such as to make them useless? We have customers tripping over this all the time. MySQL can check off “subqueries” on the to-do list, since they do in fact work. The SQL standard doesn’t say anything about not sucking.
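As a toy illustration of why this hurts, the sketch below models `WHERE id IN (SELECT ...)` in plain Python. It assumes the commonly described behavior in which the subquery is re-run for each outer row, and compares that with evaluating the subquery once. The function names and data are hypothetical, and this is a simplification rather than a description of the actual optimizer.

```python
# Toy model of: SELECT ... FROM outer WHERE outer.id IN (SELECT ref_id FROM inner)
# Assumption: the slow plan re-scans the inner rows for every outer row,
# while the fast plan evaluates the subquery once into a lookup set.

def in_subquery_per_row(outer_ids, inner_ref_ids):
    """Re-evaluate the subquery per outer row; count inner rows examined."""
    examined = 0
    matches = []
    for oid in outer_ids:
        found = False
        for ref in inner_ref_ids:
            examined += 1
            if ref == oid:
                found = True
                break
        if found:
            matches.append(oid)
    return matches, examined


def in_subquery_materialized(outer_ids, inner_ref_ids):
    """Evaluate the subquery once, then probe the result per outer row."""
    lookup = set(inner_ref_ids)      # single pass over the inner rows
    examined = len(inner_ref_ids)
    matches = [oid for oid in outer_ids if oid in lookup]
    return matches, examined


outer_ids = list(range(1000))
inner_ref_ids = [990, 995, 999]

slow_matches, slow_examined = in_subquery_per_row(outer_ids, inner_ref_ids)
fast_matches, fast_examined = in_subquery_materialized(outer_ids, inner_ref_ids)
print(slow_examined, fast_examined)
```

Both functions return the same rows; the difference is only in how much work is repeated, which grows with the size of the outer table in the per-row version.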

2. Parallel query

Parallel execution of a single query is really a requirement for larger reporting/data warehouse systems. MySQL just doesn’t make good use of having lots of disks available in its current state. Parallel execution of single queries could solve this (for reads) to a large extent.

3. Less dependency on latency in writing

MySQL (especially InnoDB, and assuming failsafes¹ ON) is really dependent on the latency of the underlying IO system to get reasonable performance on writes. MySQL really ought to batch syncs to disk together better, but this is complex because of the storage engine model and replication using different logs than the storage engines. This means two things practically:

  • You must have a battery-backed write cache to make any decent use of your disks for writes whatsoever
  • Once a system fills its battery-backed write cache with random writes, performance degrades much more than it should
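A back-of-the-envelope model shows why sync latency dominates. The sketch below assumes a single serialized stream of syncs and uses made-up latency figures (8 ms for a bare disk, 0.2 ms for a battery-backed write cache); the function name is hypothetical.

```python
# Simple model of commit throughput when every commit waits for a sync.
# Assumption: syncs are serialized and the latencies below are illustrative.

def max_commits_per_second(sync_latency_ms, commits_per_sync=1):
    """Upper bound on commits/s for one serialized stream of syncs."""
    syncs_per_second = 1000.0 / sync_latency_ms
    return syncs_per_second * commits_per_sync


# One sync per commit on a bare disk (assumed 8 ms per sync).
bare_disk = max_commits_per_second(8.0)

# Same disk, but 20 commits share each sync (batched syncs).
batched = max_commits_per_second(8.0, commits_per_sync=20)

# Battery-backed write cache acknowledging each sync in an assumed 0.2 ms.
write_cache = max_commits_per_second(0.2)

print(bare_disk, batched, write_cache)
```

Under these assumptions, either lowering the sync latency or sharing each sync among many commits raises the ceiling by the same kind of factor, which is why better batching would reduce the dependence on special hardware.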

4. Better GIS support

MySQL is being left behind by PostGIS and Oracle Spatial and missing a large segment of the market because the GIS support is so terrible. Nobody can figure out how to get data in and out, which I’ve tried to address (as a hobby project) with libmygis. But there are still too many inherent limitations in the GIS support to make it really useful for serious projects:

  • Spatial indexes only exist in MyISAM, so you cannot use spatial indexes and transactions
  • Currently only 2-dimensional types are supported, while many users need N-dimensional support (or at least 3 dimensions)
  • Lack of non-MBR-based spatial relationship functions means things like CONTAINS() are really lame
  • Spatial types store every coordinate as a DOUBLE, so every 2-dimensional point costs 16 bytes, while many systems do not need that much accuracy for most things; a choice of lower-resolution types would be nice
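The 16-byte figure is simply two 8-byte doubles. The sketch below packs one point three ways to compare the coordinate payload only, ignoring any type headers; the float and fixed-point encodings are hypothetical alternatives for illustration, not types MySQL offers.

```python
import struct

# A 2-D point as two 8-byte doubles: the 16 bytes mentioned above.
point_double = struct.pack("<dd", -73.9867, 40.6943)

# Hypothetical lower-resolution alternative: two 4-byte floats.
point_float = struct.pack("<ff", -73.9867, 40.6943)

# Hypothetical fixed-point alternative: degrees * 1e6 as 32-bit integers.
point_fixed = struct.pack(
    "<ii", round(-73.9867 * 1_000_000), round(40.6943 * 1_000_000)
)

sizes = {
    "double": len(point_double),
    "float": len(point_float),
    "fixed": len(point_fixed),
}
print(sizes)
```

Either alternative halves the per-point storage, at the cost of precision that many applications would not miss.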

5. Better binary logs and log management

This is a big nebulous topic, but MySQL’s binary log format sucks, the tools (mysqlbinlog and the SQL commands in MySQL) to deal with them suck, and the replication protocol sucks. Yes, it all works. Yes, I tell all of our customers to use it (and they should!). However, overall there could be a lot to gain by fixing things up. I would like to see:

  • The log format should have each entry checksummed to catch corruption and transmission errors
  • The logs need internal indexes over the entries, in order to be able to scan forwards and backwards, and quickly find a given entry
  • Each entry needs a unique transaction ID instead of basing everything on log positions and filenames
  • A proper C library needs to exist to access the logs, so that new tools can be written and existing ones extended
  • Log archival tools, and later log-serving tools for the archived logs, need to be written (but they would be much easier given the above library)
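To give a feel for the first three wishes, here is a minimal sketch of a log with per-entry checksums, transaction IDs, and an index from transaction ID to byte offset. The entry layout and all function names are hypothetical, invented for this illustration; this is not MySQL’s binary log format.

```python
import struct
import zlib

# Hypothetical entry layout (NOT MySQL's binary log format):
#   8-byte transaction ID | 4-byte payload length | payload | 4-byte CRC32
HEADER = struct.Struct("<QI")
TRAILER = struct.Struct("<I")


def encode_entry(txn_id, payload):
    """Serialize one entry and append a CRC32 over header + payload."""
    body = HEADER.pack(txn_id, len(payload)) + payload
    return body + TRAILER.pack(zlib.crc32(body) & 0xFFFFFFFF)


def decode_entry(buf, offset=0):
    """Return (txn_id, payload, next_offset); ValueError on checksum mismatch."""
    txn_id, length = HEADER.unpack_from(buf, offset)
    end = offset + HEADER.size + length
    body = buf[offset:end]
    (stored_crc,) = TRAILER.unpack_from(buf, end)
    if zlib.crc32(body) & 0xFFFFFFFF != stored_crc:
        raise ValueError(f"checksum mismatch in entry at offset {offset}")
    return txn_id, buf[end - length:end], end + TRAILER.size


def build_index(buf):
    """Map transaction ID -> byte offset so an entry can be found directly."""
    index, offset = {}, 0
    while offset < len(buf):
        txn_id, _, next_offset = decode_entry(buf, offset)
        index[txn_id] = offset
        offset = next_offset
    return index


log = encode_entry(101, b"INSERT ...") + encode_entry(102, b"UPDATE ...")
index = build_index(log)
print(index)
```

With an index like this, a tool could jump straight to a given transaction with `decode_entry(log, index[102])` instead of scanning from a filename and position, and a flipped bit anywhere in an entry would be reported rather than silently applied.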

1 By failsafes, I mean innodb_flush_log_at_trx_commit=1 and sync_binlog=1.

UPDATE: Ronald Bradford, Alan Kasindorf, Jim Winstead, and Jonathon Coombes posted their 5 wishes, and Antony Curtis suggests that it’s not useful to wish.

UPDATE: Antony Curtis finally gave in, and Paul McCullagh, Peter Zaitsev, and Konstantin Osipov also got in on it.