MySQL has had an interesting optimization for years now[1], which has popped up from time to time: in certain circumstances, it may choose to use an index that is index-wise less efficient, but provides the resulting rows in order, to avoid a filesort of the result.
What does this typically look like in production? A query that seems simple and easy sometimes takes much longer than it should. (Perhaps in production, the query gets killed by `pt-kill` or exceeds the `max_execution_time` provided.) The query could be very simple indeed:
SELECT ... WHERE `other_id` = 555 ORDER BY `id` ASC LIMIT 1
There’s an index on `other_id`, and with an appropriate `USE INDEX` hint, the query is fast. Even weirder, changing the query to use `LIMIT 10` causes it to run lightning-fast! If anything, the `LIMIT 1` should be faster… so, what gives?
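To make the difference concrete, here is a small Python sketch (entirely synthetic data, not the author’s schema) that counts how many rows each access path must examine for a query like `SELECT ... WHERE other_id = 555 ORDER BY id ASC LIMIT 1`, in the unlucky case where the matching rows sit late in `id` order:

```python
N = 100_000                          # total rows in the simulated table
MATCH_IDS = range(99_000, 99_010)    # ids whose other_id = 555, all late in id order

# Simulated table: id -> other_id (everything else gets a non-matching value)
table = {i: (555 if i in MATCH_IDS else 0) for i in range(1, N + 1)}

def pk_scan_limit_1():
    """The optimization's chosen plan: walk the PRIMARY key in id order,
    filter on other_id, and stop as soon as LIMIT 1 is satisfied."""
    examined = 0
    for row_id in sorted(table):     # clustered-index order == full table scan order
        examined += 1
        if table[row_id] == 555:
            return examined          # first match found; LIMIT 1 satisfied
    return examined

def secondary_index_limit_1():
    """The other plan: use the other_id index to fetch only matching rows,
    then sort that small set to satisfy ORDER BY id (the filesort)."""
    matches = sorted(i for i, v in table.items() if v == 555)
    return len(matches)              # rows examined ~ number of matching rows

print(pk_scan_limit_1())             # → 99000 rows examined before the first hit
print(secondary_index_limit_1())     # → 10 rows examined, plus a tiny sort
```

The simulation oversimplifies real cost accounting, but it captures the shape of the problem: the “ordered” plan reads almost the whole table, while the filtering index touches only the matching rows.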
Looking at the `EXPLAIN`, you may notice that the `LIMIT 1` version is using access type `index` on the `PRIMARY` key (in this case `id`), whereas the `LIMIT 10` version is using `ref` on a secondary key. Access type `index` means a full-index scan… on the primary key… which is the clustered key… which is a full table scan.
The optimization is hoping that a small enough `LIMIT n` will allow execution to be completed early, without scanning many rows, once the `LIMIT` is satisfied. This hope is often misplaced: there is no guarantee that there will be any matching rows in the first m rows of the table when ordered by the unwisely-chosen index. Hope is not a strategy.
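The optimizer’s bet can be put in back-of-the-envelope numbers (the figures below are made up for illustration): if matching rows were spread uniformly through the table, a PK-order scan really would stop early on average — but nothing enforces that assumption.

```python
total_rows = 1_000_000   # hypothetical table size
matches = 100            # hypothetical number of rows with other_id = 555
limit = 1

# Under a uniform-distribution assumption, roughly one row in
# (total_rows / matches) qualifies, so the scan is expected to stop
# after about this many rows -- which looks cheap, so the plan wins:
expected_scanned = total_rows // matches * limit
print(expected_scanned)          # → 10000

# But if every matching row happens to sit at the very end of id order,
# the scan reads nearly the entire table before LIMIT is satisfied
# (and with zero matching rows, it reads all of it for nothing):
worst_case_scanned = total_rows - matches + 1
print(worst_case_scanned)        # → 999901
```

The gap between the expected and worst-case row counts is exactly the gap between “the query is fast” and “the query gets killed by `pt-kill`.”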
Although the underlying issue had been reported several times already, under various circumstances, the reports were scattered across many individual bugs, so I filed a new bug summarizing the situation… with a patch: MySQL Bug 97001.
On MySQL Bug 97001, I proposed a solution (and provided a patch, submitted to Oracle under the NDA) introducing a new `optimizer_switch` flag named `reconsider_index_for_order` (defaulting to `on` to duplicate the current behavior). Although this optimization might benefit some queries, it’s too dependent on the actual data in the involved tables to be a good candidate for a general optimizer feature, in my opinion. Maybe the default could eventually be `off`, allowing users to opt into this optimization when they want it while providing compatible default behavior.
[1] The underlying optimization actually appears to have roots in some very old code, most notably one specific commit in 5.1.