MySQL has had an interesting optimization for years now1, which has popped up from time to time: in certain circumstances, it may choose to use an index that is index-wise less efficient, but provides the resulting rows in order, to avoid a filesort of the result.
What does this typically look like in production? A query that seems simple and easy takes much longer than it should, sometimes. (Perhaps in production, the query gets killed by
pt-kill or exceeds the
max_execution_time provided.) The query could be very simple indeed:
SELECT ... WHERE `other_id` = 555 ORDER BY `id` ASC LIMIT 1
There’s an index on
other_id, and running the query with an appropriate
USE INDEX, the query is fast. Even weirder, changing the query to use
LIMIT 10 causes it to run lightning-fast! If anything, the
LIMIT 1 should be faster… so, what gives?
Looking at the
EXPLAIN, you may notice that the
LIMIT 1 version is using access type
index on the
PRIMARY key (in this case
id), whereas the
LIMIT 10 version is using
ref on a secondary key. Access type
index means a full-index scan… on the primary key… which is the clustered key… which is a full table scan.
The optimization is hoping that the
LIMIT n with a small enough limit will allow execution to be completed early, without scanning many rows, once the
LIMIT is satisfied. This hope is often misplaced: there is no guarantee that there will be any matching rows in the first
m rows of the table when ordered by the unwisely-chosen index. Hope is not a strategy.
Although the underlying issue had been reported several times already, under various circumstances, since there were so many individual bugs reported, I filed a new bug with a summary of the situation… and a patch: MySQL Bug 97001.
On MySQL Bug 97001, I proposed a solution (and provided a patch, submitted to Oracle under the NDA) introducing a new
optimizer_switch flag named
reconsider_index_for_order (defaulting to
on to duplicate the current behavior). Although this optimization might benefit some queries, it’s too dependent on the actual data in the involved tables, so it’s not a good candidate for a general optimizer feature in my opinion. Maybe the default could eventually be
off allowing users to opt into this optimization when they want it but providing compatible default behavior.
1 The underlying optimization actually appears to be have roots in some very old code, most importantly probably one specific commit in 5.1.
Totally agree. I’ve seen that many times as well.
Row scanned equals to 1, Is the query is optimally tuned ? – Mydbops
thanks for your share, this helps me :)