Skipping the MySQL Conference and Expo 2009

I have been silent on the topic of this year’s MySQL conference, and really, silent in nearly all ways anyway. Today, as the MySQL Users Conference Conference and Expo 2009 starts up, some people will be wondering where I am, so I ought to at least answer that: I am skipping the conference this year.

As my wife can tell you, this is not a decision I’ve taken lightly, and I’ve gone back and forth on it for months and then down to the final days leading up to now. I’ve decided after much internal and external debate just to skip it all this year, including the side conferences and other stuff. Why? The reasons are basically:

  • It’s my opinion that the “state of the art” with MySQL has not changed, so I am not really missing much technical content to further myself. There’s a lot of interesting stuff going on with Percona and Google’s work, but I largely follow that through blogs and mailing lists and personal contacts already.
  • I strongly disagree with the speaker selection process used in this and the previous couple of years’ conferences. I feel that they actively discourage any speakers who refuse to stay on script—those willing to tell users the reality of the situation. I can’t and/or won’t do that, so I didn’t submit anything this year—my initial effort, months ago, to disengage from the conference. I watched from the sidelines this year and listened to everyone else’s complaints about not being selected. I don’t care to hear about the process, the details, the counter-arguments, etc., but really, any selection process which ends up leaving out Percona, for any reason, is just broken. I know they got some consolation slots in the end, but that doesn’t change my opinion.
  • The previous years’ conferences, while they have been fun in various ways, have been declining in actual technical content and increasing in politics, marketing hype, PR stuff, and whatnot. This does not have value to me (see above).
  • I am tired of explaining MySQL Enterprise vs. MySQL Community, InnoDB/Oracle vs. MySQL Inc., where the patches have gone, why the foreign key implementation in MySQL sucks, why the triggers implementation sucks, the problems with replication, etc., ad infinitum. Nothing has changed, it’s all basically as broken as it was last year, the year before that, etc.

OK, many people know the above about me already. So why not go to the Percona Performance Conference or hang out and chat with people, or…

  • The most important reason: I am making a concerted effort to stay out of the politics of the whole conference this year. While I feel that I add nominally value to the conference by analyzing and reporting on the happenings, and asking uncomfortable questions, I just can’t do it this year. I need a break.

I debated potentially taking this entire week as vacation and really disappearing for the entire week, disconnected, but I really value the friendship of all of my friends that come out to California from the far reaches of the Earth for the conference. I will be in the Bay Area (as usual every week), and I would very much enjoy having a politics-free drama-free dinner if you’re up for it. Contact me!

Administrative Note: I have disabled comments on this entry, as I don’t want the debate and politics to shift over here. I apologize in advance for my anti-blog-like behavior in that regard. I welcome any and all notes and comments at jeremy@jcole.us.

MySQL Community split officially a failure

A few days ago, I got the opportunity to hear about some upcoming changes in MySQL Community and MySQL Enterprise. I’ve been waiting for an official announcement before commenting on the changes, and Kaj has finally posted the official announcement on his blog in Refining MySQL Community Server.

In summary, the changes are:

  • Community gets no new features in any version once that version becomes GA — This effectively means that the difference between the content of Community and Enterprise approaches nil, since the addition of “Community Enhancements” was the major selling point for MySQL Community; In addition, it means that as of today, any new actual features go into MySQL 5.2, aka never-never land
  • Some changes in the policy as to frequency of Community builds, which amounts to no change in the status quo — MySQL has changed its promise for Community releases from 2 per year to 4 per year, despite the fact that we have had 4 releases already in the first half of 2007
  • MySQL will start to hide their Enterprise source releases from the public — A reaction to several Linux distributions using Enterprise releases for their bundled packages, Dorsal Source building binaries of Enterprise, and other issues; this doesn’t really solve any problems, however, as those who need or want the files will still get them all the same

So, what are my thoughts on the matter?

The “MySQL Community” concept has failed

As Kaj admits on IRC after a bit of prodding:

<kaj> JeremyC: Our past 10 months since Oct has been a struggle to make that work, and we’ve failed.

Only 10 months ago in October 2006, MySQL rocked the world a bit with Kaj’s announcement of the split between Community and Enterprise. For MySQL Community, Kaj promised:

  • early access to MySQL features under development — this hasn’t happened, and I don’t see how it could have, as Community was intended to be released infrequently
  • that MySQL AB will listen to their input — nothing has changed in this regard
  • timely corrections to bug fixes they report — nothing has changed in this regard
  • help with enhancing MySQL for their particular needs — nothing has changed in this regard
  • channels to communicate with the rest of community for getting assistance — some nice changes here with the establishment of the FreeNode #mysql-dev IRC channel and the appointment of Chad Miller as community liaison
  • an easier process for having contributions accepted in MySQL — very little has changed in this regard
  • commitment to Open Source — including free, unrestricted availability of source code — uh, ok, kind of assumed

Has the above happened? No, not really. Other than a reduction in frequency for the Community tree, nothing has changed compared to how things used to be.

The fundamental idea behind the Community and Enterprise split is a reasonable one. It’s a model that has worked very well for RedHat with their Fedora / RHEL split (in fact I often recommend RHEL to our customers, because it has worked so well1 for most of them), and I think given the right implementation this model could work nicely for MySQL as well.

MySQL fundamentally misunderstands their community

Generally speaking, any contributions to the server will be to address specific problems, mostly in larger systems. That means that any possible contributions against the server are needed because either of bugs or deficiencies in MySQL that are already affecting production systems. Very few, if any, of the features we’re writing are just because it would be fun. That means we need those features in a version of MySQL we can actually use.

The promise with MySQL Community was that those contributions, small fixes, etc., would be accepted so that we could get on with using them in our production systems if we’re willing to use the Community releases. Eventually, after the changes are vetted and proven stable, they would possibly be pushed into Enterprise. This didn’t really work at all… since the releases of Community are so infrequent, very little vetting happens, and there is no real feedback loop with the users, due to the delay in seeing actual fixes implemented.

The split was confusing from the start

The version numbering scheme makes very little sense, even once you understand it. Case in point, since profiling was added in 5.0.37, does 5.0.44 have it? No? Huh?

Why try to keep the version numbers the same while fundamentally changing the release structure and content of each half of the split? This has confused users beyond anything else. In addition, the documentation has suffered tremendously from the change as well.

Back in my discussions before the actual split with Jay et al, I correctly predicted serious quality control issues in Enterprise given the more frequent release cycle compared to Community. Back in May, I pointed out a perfect example of this in Breakdown in MySQL Enterprise process; A bug fix was applied to Enterprise which had received no testing at all in Community or anywhere, and later had to be reverted. The new changes to Community and Enterprise do absolutely nothing to address these concerns.

What are we doing about it?

Dorsal Source — Community MySQL Builds

As you know, Proven Scaling has sponsored and worked with Solid to bring you Dorsal Source, which in its current incarnation is just scratching the surface of what we hope to make available. Dorsal Source has been and will continue to provide the source packages for Enterprise, as well as community-built binaries of the even-numbered Enterprise releases. In fact, we have just posted source and binaries for MySQL 5.0.46.

If you’re handy with PHP, MySQL, XML, and/or Drupal, and you’re passionate about MySQL or the MySQL community, and interested in helping Solid and Proven Scaling develop Dorsal Source, let me know. We’d love to have you on board.

Announcing a free and open mirror for the community

Proven Scaling immediately announces a new initiative to address the needs of our customers and the rest of the MySQL community: mirror.provenscaling.com/mysql, where we will provide a few unique—and we hope useful—things:

We will provide standard rsync access for anyone else who wants to mirror this content… just send me a note.

Commitment to continuing development of MySQL 5.0

Proven Scaling has developed quite a few patches against MySQL 5.0, and we will continue to provide useful patches and do our development against the version of MySQL that our customers use… which means MySQL 5.0 for some time to come.

As Dorsal Source matures, you will see a whole slew of new features associated with patch management—keep an eye out for that.

1 Ugh, other than the fact up2date in RHEL4 sucks. Long live yum.

Breakdown in MySQL Enterprise process

In the past few days, MySQL Community 5.0.41 was released. While reading through the changelog, I noticed the following entry:

The patches for Bug #19370 and Bug #21789 were reverted.

Upon looking at Bug #21789, I noted that it was originally committed in MySQL Enterprise 5.0.32, released December 20th, 2006. The next community release which would have contained the patch is MySQL Community 5.0.33, released January 9th, 2007. This means that not only was the patch not vetted by the community, but there was a full 20 days between the enterprise release with the patch, and the next community release which contained it. According to MySQL’s release process, it could have been a full 5 months, given the right timing…

The patches were rolled back in MySQL Enterprise 5.0.40, released April 17th, 2007. Yes, the patch was committed without much vetting, and then had to be rolled back, 118 days later, in the “enterprise” version of MySQL. Why?

Back when MySQL first polled me about the community/enterprise split, I told them that this would happen. The reason it happened, of course, is that MySQL willingly shut down its only avenue for vetting these sorts of patches. They made a similar split to RedHat Enterprise Linux (RHEL) vs. Fedora Core Linux, but for some reason broke the process at the same time: they produce releases of community much less often than enterprise. That means that nobody in the community is testing the features that they stick in enterprise. They just get pushed out with no public vetting.

The way that RHEL and Fedora work is that all the shiny new stuff is pushed into Fedora first. After it has been deemed that the Fedora process, plus plenty of internal vetting, has been successful, those patches or new versions are merged into RHEL either for the next patchset, or the next full release. This, of course, means that Fedora is always ahead of RHEL. That’s exactly the idea. RedHat is betting that enterprise users (whatever that really means, these days) want a stable slowly-moving release that is “guaranteed” to work, and easy to keep up with.

On the flip side, Fedora is great for users who want the latest and greatest all the time—primarily desktop users and developers—people who are willing to work through the quirks and contribute a bit back in the way of feedback. People that like to run yum update a couple times a week. What do they get in return? A (usually) good product that is completely free.

Why did MySQL reverse the process and make it (in my opinion) useless? I suspect their sales team thinks it would look bad if the community users “get more” than the enterprise ones. But, take a look at the MySQL releases themselves, discounting any other “features”—which are debatable—that you receive with MySQL Enterprise. Why would I pay to get a release with the same unvetted, broken, may-be-rolled-back patches as everyone else gets? Why would I suggest that our customers pay?

DorsalSource: MySQL Community Build Site Launched

Back in late 2006, MySQL AB decided to split (or “fork” for the more common open source term) their source code and release structure into two parts: “Community” and “Enterprise”. This has caused quite a lot of stirring in the MySQL market, and a lot of confusion about what exactly the difference is and how the release structure works. It’s actually not easy to really explain the new structure, and I won’t try here. The key point for the purposes of this discussion, is that MySQL is effectively no longer providing builds (binaries) of their community releases, and they don’t provide enterprise builds at all to the public. They are providing source releases of community, but fairly infrequently.

This is a big problem, because it means that there is no realistic way for the average developer or MySQL user to get regular builds (and ones that quickly address bugs) without paying MySQL for a support contract. Most of Proven Scaling‘s customers do not have support contracts with MySQL AB, and have been quite unhappy about the change, and personally I’ve lost a reasonable way to get new features pushed out to the public without excessive delays (up to 6 months to the next community release, and years for enterprise).

Back when MySQL originally polled me on this issue and told me of their plans, I told them that they would just force the community to repair the “damage” in the ecosystem, by providing the builds themselves. I even warned that the one doing the repairing could possibly be myself and Proven Scaling…

So, to get to the point…

This week, at the MySQL Conference and Expo, Solid Information Technology and Proven Scaling have announced a collaborative project to address the needs of the community for frequent releases with interesting features, bug fixes, and new patches: DorsalSource. Immediately, we have begun providing over 40 binaries of MySQL community and enterprise forks on Linux, Mac OS X, and Windows. We plan to add many additional platforms and variations of the builds, in addition to many more features to build a real developer community around MySQL.

We will continue the development of DorsalSource, adding many great new features—we have a ton of ideas, it’s just a matter of implementing them and getting them out there for users to use. If you have any questions, comments, or ideas for how to improve the site, or really anything at all, please feel free to contact me directly or leave a comment here.

Now Available: Profiling in MySQL

Back around November 2005, I started working on query profiling in MySQL via the SHOW PROFILE and SHOW PROFILES commands. It’s been an interesting ride, but profiling support is finally available in public releases of MySQL starting with MySQL Community 5.0.37!

I had a few thoughts on the process and the feature that I’d like to share:

It’s been rough — Although everyone at MySQL who had seen the patch had wildly positive feedback about it, it took almost a year and a half to get things committed. Chad Miller took up my cause (back in December?) with the profiling patch as well as many others, and things actually started making progress. Thanks Chad!

Things were changed — In order to accept the feature, MySQL wanted a few things changed, which Chad handled. An interface using INFORMATION_SCHEMA was added, which I don’t entirely agree with, and the times and statistics returned were changed to absolute instead of cumulative. More on this below.

Absolute times are misleading — With SHOW PROFILE you will see rows like this:

| query end            | 0.00028300 | 

Does that mean it took 0.283ms to end the query? Not necessarily. The only way SHOW PROFILE knows when to cut off the timer is when the status next changes. Since the status messages were only meant to be informational, and in fact many of them were never meant to be seen in the first place, the status is not always changed in logical places in order to collect accurate timestamps this way.

My original patch only used cumulative numbers—they don’t imply any given amount of time spent in a particular place, just the total time or statistics collected at the moment the status was changed. I may submit a patch to once again reveal this information with e.g. SHOW CUMULATIVE PROFILE, as it seems very unlikely that the powers that be will allow it to be changed now.

Status messages need some updating — The last phase of the profiling patch that has yet to be done is to go through all of the status messages, cleaning them up where appropriate, and adding new messages to display more useful profiles. Perhaps I will have time to work on this soon.

Let me know how you like profiling and if you manage to make use of it!

Progress in MySQL Process List

Today I had a sort of short epiphany regarding getting progress of running statements in MySQL. MySQL already keeps a running count of rows touched in most multi-row statements (called thd->row_count1), so I figured there must be a way to make use of it. It was trivial to expose row_count through SHOW PROCESSLIST. After that, it was fairly obvious that another variable could be added: row_count_expected. For certain statements (currently only ALTER TABLE) it is easy to estimate how many rows will be touched, so that number can be used to calculate a Progress_percent column.

The Progress_rows number indicates progress within a given step of executing the query. For instance, if you run a SELECT with a GROUP BY that can’t be done using an index, you will see two cycles of Progress_rows: once with a State of “Copying to tmp table” and once with “Sending data”.

I implemented this all in a small patch to MySQL 5.0 (and backported to MySQL 4.1) which produces the following output from SHOW FULL PROCESSLIST:

mysql> show full processlist G
*************************** 1. row ***************************
              Id: 1
            User: jcole
            Host: localhost
              db: test
         Command: Query
            Time: 3
           State: copy to tmp table
            Info: alter table sclot__a type=myisam
   Progress_rows: 44141
Progress_percent: 76.09

This was really way, way too easy. Hopefully it can be one with MySQL Community soon.

1 Note that currently thd->row_count is a 32-bit unsigned integer, so it will wrap at about 4.2 billion rows. Someone should really think about fixing this. :)

Making connections more manageable

For the past few weeks off and on, as part of Proven Scaling‘s project to improve the MySQL server, I’ve been helping Joel Seligstein to really dig into the MySQL source code and add some features, in preparation for a much bigger feature coming up (more on that at a future date). He has now finished three smaller projects that have been on Proven Scaling’s and my own to-do list for quite some time: SET CONNECTION STATUS status, KILL connection_id WITH QUERY query_id, and SHOW ... FOR CONNECTION connection_id.1

SET CONNECTION STATUS

This patch adds a new SET CONNECTION STATUS status command, which allows each session to set a status which will be shown in a new Status column in SHOW FULL PROCESSLIST.

This allows a administrators to gain a bit more insight into complex multi-tier architectures, by having essentially a comment for each database connection. In SET CONNECTION STATUS, the status argument may be a complex expression, so CONCAT() and other string manipulations may be used. The current connection status may be retrieved with CONNECTION_STATUS(). Some things which I could imagine putting in the connection status are:

  • in pool — See at a glance which connections are idle in the connection pool, and which are checked out.
  • GET /foo.php — Easily see what request each connection is responsible for.
  • apache pid 672 — Allow you to easily correlate activity on a given server with activity on MySQL, without having to use netstat and friends to track things down.

Of course, there are many more creative people than me to figure out things to do with this useful feature!

KILL thread_id WITH QUERY query_id

In order to make this command useful, the query_id first had to be exposed in SHOW FULL PROCESSLIST. When WITH QUERY query_id is specified in a KILL command, KILL checks that the connection is still executing the query_id you’ve specified (and locks to ensure that it does not start a new query) before killing it. This solves a well-known race condition between SHOW PROCESSLIST and KILL, where the connection may have moved on to a potentially dangerous query to kill, such as a non-transactional UPDATE.

SHOW ... FOR CONNECTION connection_id

This patch extends the SHOW VARIABLES and SHOW STATUS commands with a FOR CONNECTION connection_id clause, which allows a user with either the same credentials as the connection, or with the SUPER privilege to view the connection’s status and variables.

All of these features will be great for debugging production systems, where it can be difficult or impossible to get any insight into what is happening at any given moment in time.

Thanks, Joel, for the hard work!

1 All three patches are against 5.0.26.