MySQL has had an interesting optimization for years now[1], which has popped up from time to time: in certain circumstances, it may choose to use an index that is index-wise less efficient, but provides the resulting rows in order, to avoid a filesort of the result.
What does this typically look like in production? A query that seems simple and easy sometimes takes much longer than it should. (Perhaps in production, the query gets killed by pt-kill or exceeds the max_execution_time provided.) The query could be very simple indeed:
SELECT ... WHERE `other_id` = 555 ORDER BY `id` ASC LIMIT 1
There’s an index on other_id, and if the query is run with an appropriate USE INDEX hint, it is fast. Even weirder, changing the query to use LIMIT 10 causes it to run lightning-fast! If anything, the LIMIT 1 should be faster… so, what gives?
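To sketch what that hint looks like, here is the same query with an explicit index hint (the table name `child` and index name `idx_other_id` are assumed for illustration):

```sql
-- Hint the optimizer toward the secondary index on other_id.
-- `child` and `idx_other_id` are hypothetical names.
SELECT *
FROM `child` USE INDEX (`idx_other_id`)
WHERE `other_id` = 555
ORDER BY `id` ASC
LIMIT 1;
```

FORCE INDEX works similarly, and is stricter about ruling out other access paths.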
Looking at the EXPLAIN, you may notice that the LIMIT 1 version is using access type index on the PRIMARY key (in this case id), whereas the LIMIT 10 version is using ref on a secondary key. Access type index means a full-index scan… on the primary key… which in InnoDB is the clustered key… which makes it a full table scan.
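Side by side, the difference might look something like this (illustrative EXPLAIN output; all names and row estimates are hypothetical):

```sql
-- LIMIT 1: the optimizer picks the clustered PRIMARY key to avoid a filesort.
-- EXPLAIN SELECT ... WHERE other_id = 555 ORDER BY id ASC LIMIT 1;
--   type: index   key: PRIMARY        (scans the table in id order)
--
-- LIMIT 10: the optimizer uses the secondary index as expected.
-- EXPLAIN SELECT ... WHERE other_id = 555 ORDER BY id ASC LIMIT 10;
--   type: ref     key: idx_other_id   (reads only the matching rows)
```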
The optimization hopes that a LIMIT n with a small enough limit will allow execution to complete early, without scanning many rows, once the LIMIT is satisfied. This hope is often misplaced: there is no guarantee that there will be any matching rows in the first m rows of the table when ordered by the unwisely-chosen index. Hope is not a strategy.
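A worst-case sketch makes the failure mode concrete (schema and data are hypothetical):

```sql
-- Hypothetical schema for illustration.
CREATE TABLE `t` (
  `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `other_id` INT NOT NULL,
  KEY `idx_other_id` (`other_id`)
) ENGINE=InnoDB;

-- Imagine millions of rows with other_id <> 555 inserted first, and only
-- a handful with other_id = 555 inserted last (so they have the highest ids).
-- Then SELECT ... WHERE other_id = 555 ORDER BY id ASC LIMIT 1 must scan
-- nearly the entire table via the clustered key before the LIMIT is satisfied,
-- while a lookup via idx_other_id would touch only the matching rows.
```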
The underlying issue had already been reported several times, under various circumstances; since there were so many individual bugs, I filed a new bug with a summary of the situation… and a patch: MySQL Bug 97001.
On MySQL Bug 97001, I proposed a solution (and provided a patch, submitted to Oracle under the NDA) introducing a new optimizer_switch flag named reconsider_index_for_order (defaulting to on to duplicate the current behavior). Although this optimization might benefit some queries, it’s too dependent on the actual data in the involved tables, so in my opinion it’s not a good candidate for a general optimizer feature. Maybe the default could eventually be changed to off, allowing users to opt into this optimization when they want it, while the on default keeps behavior compatible in the meantime.
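Under the proposed patch, disabling the behavior would presumably look like toggling any other optimizer_switch flag (hypothetical: this flag exists only in the patch attached to bug 97001, not in stock MySQL):

```sql
-- Hypothetical flag from the proposed patch, not present in stock MySQL.
SET SESSION optimizer_switch = 'reconsider_index_for_order=off';
```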
[1] The underlying optimization actually appears to have roots in some very old code, most importantly probably one specific commit in 5.1.