Reconsidering access paths for index ordering… a dangerous optimization… and a fix!

MySQL has had an interesting optimization for years now1, which has popped up from time to time: in certain circumstances, it may choose to use an index that is index-wise less efficient, but provides the resulting rows in order, to avoid a filesort of the result.

What does this typically look like in production? A query that seems simple and easy takes much longer than it should, sometimes. (Perhaps in production, the query gets killed by pt-kill or exceeds the max_execution_time provided.) The query could be very simple indeed:

SELECT ... WHERE `other_id` = 555 ORDER BY `id` ASC LIMIT 1

There’s an index on other_id, and running the query with an appropriate USE INDEX, the query is fast. Even weirder, changing the query to use LIMIT 10 causes it to run lightning-fast! If anything, the LIMIT 1 should be faster… so, what gives?

Looking at the EXPLAIN, you may notice that the LIMIT 1 version is using access type index on the PRIMARY key (in this case id), whereas the LIMIT 10 version is using ref on a secondary key. Access type index means a full-index scan… on the primary key… which is the clustered key… which is a full table scan.

The optimization is hoping that the LIMIT n with a small enough limit will allow execution to be completed early, without scanning many rows, once the LIMIT is satisfied. This hope is often misplaced: there is no guarantee that there will be any matching rows in the first m rows of the table when ordered by the unwisely-chosen index. Hope is not a strategy.

Although the underlying issue had been reported several times already, under various circumstances, since there were so many individual bugs reported, I filed a new bug with a summary of the situation… and a patch: MySQL Bug 97001.

On MySQL Bug 97001, I proposed a solution (and provided a patch, submitted to Oracle under the NDA) introducing a new optimizer_switch flag named reconsider_index_for_order (defaulting to on to duplicate the current behavior). Although this optimization might benefit some queries, it’s too dependent on the actual data in the involved tables, so it’s not a good candidate for a general optimizer feature in my opinion. Maybe the default could eventually be off allowing users to opt into this optimization when they want it but providing compatible default behavior.

1 The underlying optimization actually appears to be have roots in some very old code, most importantly probably one specific commit in 5.1.

One thought on “Reconsidering access paths for index ordering… a dangerous optimization… and a fix!

What do you think?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s