How does InnoDB behave without a Primary Key?

This afternoon, Arjen Lentz and I were discussing InnoDB’s behavior without a declared PRIMARY KEY, and the topic felt interesting enough and undocumented enough to warrant its own short post.

Background on InnoDB clustered keys

In The physical structure of InnoDB index pages I described how “Everything is an index in InnoDB”. This means that InnoDB must always have a “cluster key” for each table, which is normally the PRIMARY KEY. The manual has this to say in Clustered and Secondary Indexes:

If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.

I had previously assumed this meant that an invisible column would be used along with the same sequence generation code that is used to implement auto_increment (which itself has some scalability issues). However the reality is that they are completely different implementations.

Implementation of implicit Row IDs

How this is actually implemented is, as the manual says, if a table is declared with no PRIMARY KEY and no non-nullable UNIQUE KEY, InnoDB will automatically add a 6-byte (48-bit) integer column called ROW_ID to the table, and cluster the data based on that column. The column won’t be accessible to any queries nor usable for anything internally such as row-based replication.

What the manual doesn’t mention is that all tables using such ROW_ID columns share the same global sequence counter (the manual says “increases monotonically” and doesn’t clarify), which is part of the data dictionary. The maximum used value for all row IDs (well, technically the next ID to be used) is stored in the system tablespace (e.g. ibdata1) in page 7 (type SYS), within the data dictionary header (field DICT_HDR_ROW_ID).

This global sequence counter is protected by dict_sys->mutex, even for incrementing (as opposed to using atomic increment). The implementation is in include/dict0boot.ic (many blank lines deleted):

    38  UNIV_INLINE
    39  row_id_t
    40  dict_sys_get_new_row_id(void)
    41  /*=========================*/
    42  {
    43          row_id_t        id;
    44  
    45          mutex_enter(&(dict_sys->mutex));
    47          id = dict_sys->row_id;
    49          if (0 == (id % DICT_HDR_ROW_ID_WRITE_MARGIN)) {
    51                  dict_hdr_flush_row_id();
    52          }
    54          dict_sys->row_id++;
    56          mutex_exit(&(dict_sys->mutex));
    57  
    58          return(id);
    59  }

(You may also notice that this code lacks any protection for overflowing the 48 bits allotted to row IDs. That is unnecessarily sloppy coding, but even at a continuous 1 million inserts per second [which is probably a bit optimistic ;)] it would take about 9 years to exhaust the ID space. I guess that’s okay.)

Ensuring non-conflicting IDs are generated

The counter is flushed to disk every 256th ID generated (the define DICT_HDR_ROW_ID_WRITE_MARGIN above), by modifying the value in the SYS data dictionary page, which is logged to the transaction log. On startup, InnoDB will increase the DICT_HDR_ROW_ID stored on disk by at least 256, and at most 511. This ensures that any IDs generated will have been less than the new starting value, and thus there will not be any conflicts.

Performance and contention implications

Given how much other code within InnoDB is protected by dict_sys->mutex I think it’s fair to say any tables with an implicit clustered key (ROW_ID) could expect to experience random insert stalls during operations like dropping (unrelated) tables. Parallel insertion into multiple tables with implicit keys could be performance-constrained, as it will be serialized on both the shared mutex and cache contention for the shared counter variable. Additionally, every 256th value generated will cause a log write (and flush) for the SYS page modification, regardless of whether the transaction has committed yet (or ever will).

23 thoughts on “How does InnoDB behave without a Primary Key?”

you’re awesome

Very interesting. I too assumed that generation was via a per-table sequence. Is there a reason you can think of that a sequence could not be maintained per-table?

I ask, because it is very common to have partitioned tables without primary or unique keys.

Jeremy Cole says:

May 2, 2013 at 23:28

No, I don’t know of any reason it couldn’t be changed to use a per-table sequence. I would think it would be fairly easy actually.

Reply

InnoDB without PRIMARY KEY | Open Query blog

Why we could experience random insert during operations like dropping (unrelated) tables? The ROW_ID is increases monotonically, so is there any situation that the next ROW_ID less than current max ROW_ID?

Justin Swanhart says:

May 11, 2013 at 23:56

Because a mutex is held while increasing the key instead of using atomic increment. The mutex has to be held by DDL operations and can be held for a long time (relatively) during table drop operations. So a drop of table X could result in a mutex wait during insert on table Y.

Reply

Nice, good to know the internal things of cluster index, Thank Jeremy C

Does some form of auto_increment_increment come into play when using Galera (PXC, etc)?

That’s a scoop, all those people i told better use no primary key rather than uuid . My fault alway trust what you have checked yourself. Thanks Jeremy

Thanks Jeremy,

Your blogs on Innodb Internal helped me to understand a lot.

Hi Jeremy,

Thanks for the insight.
I’ve tried to simulate this by insert on innodb tables without primary key, unfortunately I couldn’t find the mutex contention from “show mutex status”. Have you done any test on it?

Thanks again.