Reconsidering access paths for index ordering… a dangerous optimization… and a fix!

MySQL has had an interesting optimization for years now1, which has popped up from time to time: in certain circumstances, it may choose to use an index that is index-wise less efficient, but provides the resulting rows in order, to avoid a filesort of the result.

What does this typically look like in production? A query that seems simple and easy takes much longer than it should, sometimes. (Perhaps in production, the query gets killed by pt-kill or exceeds the max_execution_time provided.) The query could be very simple indeed:

SELECT ... WHERE `other_id` = 555 ORDER BY `id` ASC LIMIT 1

There’s an index on other_id, and running the query with an appropriate USE INDEX, the query is fast. Even weirder, changing the query to use LIMIT 10 causes it to run lightning-fast! If anything, the LIMIT 1 should be faster… so, what gives?

Looking at the EXPLAIN, you may notice that the LIMIT 1 version is using access type index on the PRIMARY key (in this case id), whereas the LIMIT 10 version is using ref on a secondary key. Access type index means a full-index scan… on the primary key… which is the clustered key… which is a full table scan.

The optimization is hoping that the LIMIT n with a small enough limit will allow execution to be completed early, without scanning many rows, once the LIMIT is satisfied. This hope is often misplaced: there is no guarantee that there will be any matching rows in the first m rows of the table when ordered by the unwisely-chosen index. Hope is not a strategy.

Although the underlying issue had been reported several times already, under various circumstances, since there were so many individual bugs reported, I filed a new bug with a summary of the situation… and a patch: MySQL Bug 97001.

On MySQL Bug 97001, I proposed a solution (and provided a patch, submitted to Oracle under the NDA) introducing a new optimizer_switch flag named reconsider_index_for_order (defaulting to on to duplicate the current behavior). Although this optimization might benefit some queries, it’s too dependent on the actual data in the involved tables, so it’s not a good candidate for a general optimizer feature in my opinion. Maybe the default could eventually be off allowing users to opt into this optimization when they want it but providing compatible default behavior.

1 The underlying optimization actually appears to be have roots in some very old code, most importantly probably one specific commit in 5.1.

3 thoughts on “Reconsidering access paths for index ordering… a dangerous optimization… and a fix!

  1. Row scanned equals to 1, Is the query is optimally tuned ? – Mydbops

What do you think?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s