This afternoon, Arjen Lentz and I were discussing InnoDB’s behavior without a declared PRIMARY KEY, and the topic felt interesting enough and undocumented enough to warrant its own short post.
Background on InnoDB clustered keys
In The physical structure of InnoDB index pages I described how “Everything is an index in InnoDB”. This means that InnoDB must always have a “cluster key” for each table, which is normally the PRIMARY KEY. The manual has this to say in Clustered and Secondary Indexes:
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
I had previously assumed this meant that an invisible column would be used along with the same sequence generation code that is used to implement auto_increment (which itself has some scalability issues). However the reality is that they are completely different implementations.
Implementation of implicit Row IDs
How this is actually implemented is, as the manual says, if a table is declared with no PRIMARY KEY and no non-nullable UNIQUE KEY, InnoDB will automatically add a 6-byte (48-bit) integer column called ROW_ID to the table, and cluster the data based on that column. The column won’t be accessible to any queries nor usable for anything internally such as row-based replication.
What the manual doesn’t mention is that all tables using such ROW_ID columns share the same global sequence counter (the manual says “increases monotonically” and doesn’t clarify), which is part of the data dictionary. The maximum used value for all row IDs (well, technically the next ID to be used) is stored in the system tablespace (e.g. ibdata1) in page 7 (type SYS), within the data dictionary header (field DICT_HDR_ROW_ID).
This global sequence counter is protected by dict_sys->mutex, even for incrementing (as opposed to using atomic increment). The implementation is in include/dict0boot.ic (many blank lines deleted):
38 UNIV_INLINE 39 row_id_t 40 dict_sys_get_new_row_id(void) 41 /*=========================*/ 42 { 43 row_id_t id; 44 45 mutex_enter(&(dict_sys->mutex)); 47 id = dict_sys->row_id; 49 if (0 == (id % DICT_HDR_ROW_ID_WRITE_MARGIN)) { 51 dict_hdr_flush_row_id(); 52 } 54 dict_sys->row_id++; 56 mutex_exit(&(dict_sys->mutex)); 57 58 return(id); 59 }
(You may also notice that this code lacks any protection for overflowing the 48 bits allotted to row IDs. That is unnecessarily sloppy coding, but even at a continuous 1 million inserts per second [which is probably a bit optimistic ;)] it would take about 9 years to exhaust the ID space. I guess that’s okay.)
Ensuring non-conflicting IDs are generated
The counter is flushed to disk every 256th ID generated (the define DICT_HDR_ROW_ID_WRITE_MARGIN above), by modifying the value in the SYS data dictionary page, which is logged to the transaction log. On startup, InnoDB will increase the DICT_HDR_ROW_ID stored on disk by at least 256, and at most 511. This ensures that any IDs generated will have been less than the new starting value, and thus there will not be any conflicts.
Performance and contention implications
Given how much other code within InnoDB is protected by dict_sys->mutex I think it’s fair to say any tables with an implicit clustered key (ROW_ID) could expect to experience random insert stalls during operations like dropping (unrelated) tables. Parallel insertion into multiple tables with implicit keys could be performance-constrained, as it will be serialized on both the shared mutex and cache contention for the shared counter variable. Additionally, every 256th value generated will cause a log write (and flush) for the SYS page modification, regardless of whether the transaction has committed yet (or ever will).
you’re awesome
Very interesting. I too assumed that generation was via a per-table sequence. Is there a reason you can think of that a sequence could not be maintained per-table?
I ask, because it is very common to have partitioned tables without primary or unique keys.
No, I don’t know of any reason it couldn’t be changed to use a per-table sequence. I would think it would be fairly easy actually.
InnoDB without PRIMARY KEY | Open Query blog
Why we could experience random insert during operations like dropping (unrelated) tables? The ROW_ID is increases monotonically, so is there any situation that the next ROW_ID less than current max ROW_ID?
Because a mutex is held while increasing the key instead of using atomic increment. The mutex has to be held by DDL operations and can be held for a long time (relatively) during table drop operations. So a drop of table X could result in a mutex wait during insert on table Y.
Nice, good to know the internal things of cluster index, Thank Jeremy C
Does some form of auto_increment_increment come into play when using Galera (PXC, etc)?
That’s a scoop, all those people i told better use no primary key rather than uuid . My fault alway trust what you have checked yourself. Thanks Jeremy
Thanks Jeremy,
Your blogs on Innodb Internal helped me to understand a lot.
Hi Jeremy,
Thanks for the insight.
I’ve tried to simulate this by insert on innodb tables without primary key, unfortunately I couldn’t find the mutex contention from “show mutex status”. Have you done any test on it?
Thanks again.
There’s a similar, more recent post with some benchmarks corroborating your suspicion on the MySQL Performance Blog.
http://www.mysqlperformanceblog.com/2013/10/18/innodb-scalability-issues-tables-without-primary-keys/
Is a composite primary key the same or does it have to be an auto_increment?
Which versions of MySQL are affected by scalability issues when using MySQL-defined hidden ROW_ID column as PK? - BlogoSfera
SQL优化之索引优化(1): 索引的结构与优缺点 – 胤子哥
Impact on Insert Performance — Primary Keys in MySQL - Coiner Blog
Primary Key in InnoDB: how they work and best practices - Federico Razzoli
Why MySQL tables need a primary key - Federico Razzoli
浅析InnoDB索引结构 – iMySQL | 老叶茶馆
Why Tables need a Primary Key in MariaDB and MySQL - Vettabase
MySQL: GIPK (InnoDB and Primary Keys) | Die wunderbare Welt von Isotopp
MySQL – 10 Performance Tips | siddharth's space