Jay Pipes, Stewart Smith, Marten Mickos, Frank Mash, and Kevin Burton have all gotten into it, and Marten suggested that I should write my top five. I’m usually not into lists, but this sounds interesting, so here goes!
My top 5 wishes for MySQL are:
1. Restore some of my sanity
OK, well, this actually has several sub-wishes…
a. Global and session INFORMATION_SCHEMA
This is just starting to become interesting, but I’ve told MySQL several times that it’s a mistake to mix session-scoped and global-scoped things together in INFORMATION_SCHEMA. I can only hope they will listen.
b. Namespace-based configuration
A long time ago I started writing a proposal for this, but really anything would be better than today’s jumbled mess. This would also allow plugins and storage engines to bring in not a random smattering of variables, but an entire namespace.
c. Better memory management
I’ve also started writing a proposal for this. Right now nothing really constrains memory within MySQL, and there is no effective way to be sure that you both have enough memory configured in various variables for your needs, and that you don’t run out of memory (or start swapping, or crash on 32-bit systems, etc).
d. Stop writing ugly hacks
We now have FEDERATED, BLACKHOLE, and soon a whole boat load of new stuff that’s quite hacky, and although useful in some situations, the general public should stay away from them. Yet, I continually see them come up in all different situations where people think they are a good idea. FEDERATED should have been implemented as Oracle’s DATABASE LINK feature, which is much more user-friendly, safer, and just generally better. BLACKHOLE was created to solve a stupid deficiency in replication, to allow relay slaves to not need a complete copy of the data. Why not just allow replication to pass on logs raw, or write a log proxy?
e. Fix subquery optimization
Subqueries have been available in the same broken state for over 4 years now. Why are subqueries in an IN (...) clause still optimized in an incredibly stupid and slow way, such as to make them useless? We have customers tripping over this all the time. MySQL can check off “subqueries” on the to-do list, since they do in fact work. The SQL standard doesn’t say anything about not sucking.
2. Parallel query
Parallel execution of a single query is really a requirement for larger reporting/data warehouse systems. MySQL just doesn’t make good use of having lots of disks available in its current state. Parallel execution of single queries could solve this (for reads) to a large extent.
3. Less dependency on latency in writing
MySQL (especially InnoDB, and assuming failsafes1 ON) is really dependent on the latency of the underlying IO system to get reasonable performance on writes. MySQL really ought to batch syncs to disk together better, but this is complex because of the storage engine model and replication using different logs than the storage engines. This means two things practically:
- You must have a battery-backed write cache to make any decent use of your disks for writes whatsoever
- Once a system fills its battery-backed write cache with random writes, performance degrades much more than it should
4. Better GIS support
MySQL is being left behind by PostGIS and Oracle Spatial and missing a large segment of the market because the GIS support is so terrible. Nobody can figure out how to get data in and out, which I’ve tried to address (as a hobby project) with libmygis. But there are still too many inherent limitations in the GIS support to make it really useful for serious projects:
- Spatial indexes only exist in MyISAM, so you cannot use spatial indexes and transactions
- Currently only 2-dimensional types are supported, while many users needs N-dimension support (but at least 3)
- Lack of non-MBR-based spatial relationship functions means things like CONTAINS() are really lame
- Spatial types store everything as a 2-dimensional DOUBLE, so every point costs 16 bytes, while many systems do not need that much accuracy for most things — a choice of lower resolution types would be nice
5. Better binary logs and log management
This is a big nebulous topic, but MySQL’s binary log format sucks, the tools (mysqlbinlog and the SQL commands in MySQL) to deal with them suck, and the replication protocol sucks. Yes, it all works. Yes, I tell all of our customers to use it (and they should!). However, overall there could be a lot to gain by fixing things up. I would like to see:
- The log format should have each entry checksummed to catch corruption and transmission errors
- The logs need internal indexes over the entries, in order to be able to scan forwards and backwards, and quickly find a given entry
- Each entry needs a unique transaction ID instead of basing everything on log positions and filenames
- A proper C library needs to exist to access the logs, so that new tools can be written and existing ones extended
- Log archival, and later log-serving tools for the archived logs need to be written (but they would be much easier given the above library)
1 By failsafes, I mean innodb_flush_log_at_trx_commit=1 and sync_binlog=1.
UPDATE: Ronald Bradford, Alan Kasindorf, Jim Winstead, and Jonathon Coombes posted their 5 wishes, and Antony Curtis suggests that it’s not useful to wish.