New Storage Engines: A welcome change

There’s been a lot of buzz lately about new storage engines (Solid’s SolidDB and Jim Starkey’s Falcon) being developed for MySQL. Quite a few people have asked me what I think about them, and if it’s really a seamless process to switch storage engines. Everybody still has Oracle’s acquisition of Innobase Oy fresh on their minds, so nobody is really terribly surprised by the recent announcements. As for my opinion on the matter, well, it’ll take some discussion. I was quoted in ComputerWorld‘s article MySQL to encourage partners to build data storage engines:

Jeremy Cole, who oversees about 8,000 installations of the open-source database at Yahoo Inc., said the Sunnyvale, Calif.-based Web firm uses MyISAM for applications mostly requiring the reading of data—and InnoDB when many users may be writing data simultaneously.

Cole called InnoDB “great,” but also said it is “somewhat poorly integrated” with MySQL, lacking several common features such as full-text search and online configuration changes, while poorly supporting “referential integrity,” which keeps the relationships between data tables consistent.

Furthermore, the only way to do “reasonably fast online hot backups” with InnoDB is a closed-source tool called ibbackup, which is now owned by Oracle.

“If a new storage engine offered InnoDB’s current feature set without the above problems, and was stable, I would switch in a heartbeat,” Cole said. However, he doesn’t expect any of the unnannounced storage engines “to really be ready for use for another year or so.”

I wanted to follow up and provide a bit more depth and context, and some of the technical details that were not completely appropriate for ComputerWorld’s audience:

InnoDB has been great in that it has row-level locking, supports multi-versioning and isolation. That solves quite a few problems for transactional heavy-write applications. However, it’s somewhat poorly integrated into MySQL, and has some problems of its own. To name a few:

  • InnoDB’s only option for reasonably fast online hot backup is a tool called ibbackup, which is closed-source, was previously owned and sold by Innobase Oy (for about $1400 per server) and is now owned by Oracle. I haven’t heard anything about what Oracle intends to do with ibbackup.
  • InnoDB doesn’t allow any online configuration changes, as the rest of MySQL (and MyISAM) does. (Technically: It doesn’t support using the SET command to change its configuration on-the-fly.)
  • InnoDB doesn’t support full-text search—the ability to search for words within text documents. This is a feature used by many web applications. Effectively, users must choose between no transactions, table level locking, and full-text search (MyISAM), or transactions, row-level locking, and no full-text search (InnoDB). It’s sometimes a very painful choice.
  • Innobase Oy / InnoDB made a very “cowboy” effort to support foreign keys a.k.a. “referential integrity”—instead of working with MySQL to support it as a in-built feature of MySQL itself, they basically duck-taped it onto InnoDB itself. This has caused a lot of headaches for many people, including but not limited to completely useless error messages and mysterious failures.
  • InnoDB’s tablespace management leaves a lot to be desired. There are no online tablespace management commands (CREATE TABLESPACE, DROP TABLESPACE, ALTER TABLESPACE). You have two options:

    • All data for all tables and databases is stored in a single common set of files, with no online management, no ability to shrink the tablespace, the only way to add space is to set one of the files (and only one!) to auto-extend, and no ability to move data between the tablespace files; or
    • Each table’s data is stored in a single .ibd file—this file will be as large as the size of the table, so if you have 500GB of data in one table, you have a single unmanageable 500GB file. This file can never be shrunk, it can only grow.

Switching storage engines in MySQL is actually almost as easy as they claim. However, I don’t expect any of these new storage engines to really be ready for use for another year or so. There are a lot of integration issues to be had when pulling in a new storage engine. InnoDB had quite a lot of bugs in the first year due to this as well.

Will I use them? Absolutely.

Stop the madness: SHOW STATUS

Are you a MySQL user? Have you tried 5.0? Did you notice that SHOW STATUS was giving you strange results?

Here’s the skinny: The SHOW STATUS command, which has been in MySQL, well forever, has had a long-standing feature request; to have the ability to report its metrics per-session. That functionality was added in MySQL 5.0, in the form of two new syntaxes: SHOW SESSION STATUS and SHOW GLOBAL STATUS to give the per-session and server-wide statistics, respectively. It’s great, and I’m happy to see it.

The problem comes in because of the default behaviour that was chosen for the basic SHOW STATUS with no SESSION or GLOBAL keyword—that is, the command that every MySQL DBA has been using for years—now defaults to per-session statistics.

Every tool, program, monitoring script, performance graph, etc., that uses SHOW STATUS (which is pretty much all of them) is broken in 5.0. Why? Well, for no reason, in my opinion. There is absolutely no advantage to defaulting to per-session statistics instead of the old standard of global statistics.

If you’re upset about this, add your comment to MySQL Bug #19093 and fight the good fight with me.

Oops, BLOB/TEXT overrides tmp_table_size

First, a bit of background information… MySQL has a variable called tmp_table_size which is used whenever a temporary table is needed for many types of operations, but it’s mainly used for ORDER BY and GROUP BY. The tmp_table_size variable (in correlation with the max_heap_table_size variable) decide at what point to convert an in-memory (that is, HEAP storage engine) temporary table to an on-disk (that is, MyISAM) one.

So, if the result of your query will end up greater than the lesser of tmp_table_size or max_heap_table_size, once it reaches that point, it will be automatically converted to a MyISAM table, on-disk (in tmpdir, incidentally).

Here’s a scenario: You have a SELECT query, nothing particularly special. It isn’t any more complex than another query on the same tables, but it performs a bit worse. You put some load on it, and it performs miserably, even putting the server in pain in the process. You check what’s happening… hmm, lots of blocks being written? Why would there be any blocks written, this is a SELECT!

Does the SELECT have a BLOB or TEXT column in it? Are you using a GROUP BY or ORDER BY that can’t use the same index that MySQL is choosing? (Check EXPLAIN for Using temporary or Using filesort.)

In that case, MySQL will need to create a temporary table to do the sort. Since your query contains a BLOB or TEXT column, the temporary table will have to be created as a MyISAM table, as the HEAP storage engine doesn’t support BLOB or TEXT (or even VARCHAR, for that matter, which is changed to CHAR automatically).

This will mean a lot of extra writes to the disk, for no good reason. What can you do about it?

Short term, for 4.1+: Abuse a subquery in the FROM to read the primary key of the table with the BLOB/TEXT and do the sorting that is necessary. In the outer query, select the full row, with BLOB/TEXT and return it, but don’t use ORDER BY.

Long term: Support for BLOB and TEXT need to be added to the HEAP storage engine.

What if you need to ORDER BY or GROUP BY the BLOB or TEXT column? Well, you’re screwed. Think about redesigning your database. :) (Hint: Add a column containing the first up to 255 bytes of the column as a VARCHAR and sort on that instead…)

Good luck!

MySQL 4.1 and 5.0: Prepared Statement Leaks

As it turns out, the memory leaks which I initially blamed on 5.0 also affect 4.1. It’s not stored procedures leaking memory as I initially thought, but server-side prepared statements. The problem was finally worked out by setting useServerPrepStmts=false in the JDBC code, which “fakes” prepared statements on the client side, without any other code changes.

Read MySQL Bug #18300 for more information!

Tips: Hacking MySQL Server

Hacking the latest storage engine? Working on a fancy new SHOW command? Hunting for a bug in mysqld? This is for you!

Once you’ve got MySQL compiled, how do you test your mysqld binary? You’ll need to get it started. Stop! Don’t make install it!

  1. Set up your sandbox
    $ mkdir ~/t
    $ vi ~/t/my.cnf
    

    Add the following to ~/t/my.cnf (modify to taste):

    [mysqld]
    basedir=.
    datadir=/Users/jcole/t 
    port=3307
    socket=/Users/jcole/t/mysql.sock
    skip-grant
    skip-innodb
    skip-bdb
    
    [client]
    port=3307
    socket=/Users/jcole/t/mysql.sock
    
  2. Build your source
  3. Set up a symlink for share/mysql pointing to the share directory.
    $ mkdir share && ln -s ../sql/share share/mysql
    

    This, combined with the basedir=. in my.cnf means that MySQL will be able to find its error message and character set files, so that the release you’re building doesn’t have to match the release installed on the system.

  4. Start mysqld
    $ sql/mysqld --defaults-file=~/t/my.cnf
    
  5. Start a mysql client (in another window)
    $ client/mysql --defaults-file=~/t/my.cnf
    
  6. Shutdown the mysqld (if it didn’t crash :) )
    $ client/mysqladmin --defaults-file=~/t/my.cnf shutdown
    

That’s it! Much easier, safer, and faster than using make install, and you don’t need to remember all the options to specify to start and test your newly built mysqld! If you manage to break your sandbox, just delete everything except my.cnf and you are back to a clean slate.