[This post refers to innodb_ruby version 0.9.11 as of October 2, 2014.]
Many DBAs know that building indexes in “random” order (or really any order that greatly differs from ordered by key) can be much less efficient. However, it’s often hard to really understand why that is. With the “-illustrate” visualization modes available in innodb_ruby, it’s possible to quite easily visualize the structure of indexes. The space-lsn-age-illustrate mode to innodb_space allows visualization of all pages in a space file by “LSN age”, generating something like a heatmap of the space file based on how recently each page was modified.
(Note that a small Ruby script generate_data_simple.rb was used to generate the test tables used below. You’ll also want to have a reasonable monospace font supporting Unicode block characters correctly, for which I’d heartily recommend Adobe’s Source Code Pro.)
Building an index by insertion in key order
Inserting rows into an index in key order is much more efficient primarily for two reasons:
- Pages can be filled completely, and the database (with sufficient optimizations present) can detect the “bulk loading” behavior and efficiently split pages by creating new, empty pages, rather than splitting pages in half.
- Only the “edge” of the index is being written to; once a page has filled it will not be revisited. This can make the buffer pool, which caches database pages, much more effective.
Here’s an example of a table with only a PRIMARY KEY clustered index, built by inserting rows in key order:
$ innodb_space -s ibdata1 -T test/t space-lsn-age-illustrate
As you can see in the above image, the index pages are written to in nearly perfect order as they are allocated from the beginning of the file towards the end.
The first extent (the first line of the image) is allocated as a fragment extent, and contains single pages allocated for different purposes. You can clearly see the tablespace bookkeeping pages, FSP_HDR at page 0 and INODE at page 2, immediately followed by the root index page at page 3, all very recently modified. Following that are 32 individually allocated “fragment” pages which were allocated first in the index before it switched to allocating full extents. Then the index pages progress perfectly until the end of the used space.
Also note the LSN age histogram (printed in white above the colored legend at the bottom) shows all buckets equally filled.
Building an index by insertion in random order
Looking at an index built by insertion in key order was reassuring. What if it is built in random order instead? The result is perhaps a bit surprising:
$ innodb_space -s ibdata1 -T test/t_shuffle space-lsn-age-illustrate
Because the rows were inserted in completely random order, every page has an equal chance for insertion. This means in practice that every page is very recently modified, and this is clearly visible by the entire heatmap being purple. What this also means is that the entire table must be continually present in the buffer pool, and if it doesn’t fit, performance will suffer greatly. This is the main reason that performance of insertion in random order is terrible.
Additionally, you can see a few misbehaviors of InnoDB: Note the extents starting at page 1088, 1152, and 1216. Why do you think they look like that? Let me know your ideas in the comments.
Building a primary and secondary index in primary key order
What if you have multiple indexes? Looking at a very simple and typical case, inserting data in key order by the primary key, while a secondary index exists:
$ innodb_space -s ibdata1 -T test/t_index_before space-lsn-age-illustrate
Notice that this looks like the previous cases perfectly interleaved together, because it is exactly that. Since the primary key and secondary index contain completely different data, insertion is ordered by the primary key, but completely mis-ordered by the secondary index, resulting in the secondary index being built inefficiently.
Adding a secondary index to an existing table
The obvious answer then to the previous problem is to add the index after the data has been loaded, and this does in fact result in the expected outcome:
$ innodb_space -s ibdata1 -T test/t_index_after space-lsn-age-illustrate
When the index is built on the existing table (via ALTER TABLE ... ADD INDEX), it is built by scanning and sorting the data before insertion into the index, resulting in an optimal (and very fast) index build.
Visualizing page fill rate
While the above illustrations show how recently each page was modified during the index build, it’s also possible to use space-extents-illustrate to visualize how full each page is. The key-ordered index looks like this:
$ innodb_space -s ibdata1 -T test/t space-extents-illustrate
Compared to the random-ordered index:
$ innodb_space -s ibdata1 -T test/t_shuffle space-extents-illustrate
The random-ordered insertion caused pages to be split more frequently, and in some cases to be severely under-filled, causing a very poor page fill rate on average.
Those of you who are particularly observant will have noticed that the index illustrations in the first pictures above showed that the ordered-insertion index is significantly smaller than the random one. You can see here that the random-insertion index is 41% larger at 1043 pages compared to just 737 pages for the ordered-insertion index. Additionally, 206 more pages are left unused, making the actual disk space usage 57% larger.
Great work, great articel. I decieded I show my bosses how bad if we are using random strings to primary key… :(
But I got a error, could be this because we are using multiple ibdata? I mean ibdata1 and ibdata2.
/usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/lib/innodb/page.rb:44:in `initialize’: Page can’t be initialized from nil space or buffer (space: #, buffer: ) (RuntimeError)
from /usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/lib/innodb/page.rb:24:in `new’
from /usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/lib/innodb/page.rb:24:in `parse’
from /usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/lib/innodb/space.rb:203:in `page’
from /usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/lib/innodb/index.rb:31:in `page’
from /usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/lib/innodb/index.rb:82:in `min_page_at_level’
from /usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/lib/innodb/index.rb:176:in `each_page_at_level’
from /usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/lib/innodb/index.rb:185:in `each_record’
from /usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/lib/innodb/data_dictionary.rb:325:in `each_record_from_data_dictionary_index’
from /usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/lib/innodb/data_dictionary.rb:338:in `each_table’
from /usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/lib/innodb/data_dictionary.rb:387:in `each’
from /usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/lib/innodb/data_dictionary.rb:387:in `select’
from /usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/lib/innodb/data_dictionary.rb:387:in `object_by_field’
from /usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/lib/innodb/data_dictionary.rb:404:in `table_by_name’
from /usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/lib/innodb/system.rb:86:in `space_by_table_name’
from /usr/local/rvm/gems/ruby-1.9.3-p547/gems/innodb_ruby-0.9.11/bin/innodb_space:1357:in `’
from /usr/local/rvm/gems/ruby-1.9.3-p547/bin/innodb_space:23:in `load’
from /usr/local/rvm/gems/ruby-1.9.3-p547/bin/innodb_space:23:in `’
from /usr/local/rvm/gems/ruby-1.9.3-p547/bin/ruby_executable_hooks:15:in `eval’
from /usr/local/rvm/gems/ruby-1.9.3-p547/bin/ruby_executable_hooks:15:in `’
Tibi: Currently multiple-file tablespaces (ibdata1, ibdata2) are not supported. :(
Thanks Jeremy, I will keep my eye on this project I hope it will be supported sometimes… :)
Tibi: Actually, I’ve just added support for multi-file system tablespaces to version 0.9.12 (just released). You’ll need to either pass -s ibdata1,ibdata2 or pass -s /path/to/datadir. Give it a try!
I tried it, it looks good. Thank you very much! :)
Regarding the stray pages 1088, 1152, and 1216, I can only make wild guesses because I’m not familiar with the code, but I wonder if it has something to do with page splits. That is, there may be one case in the code when InnoDB splits a page because it becomes over-filled, such that InnoDB assumes it needs to allocate a whole new extent, and then puts some of the data in the first page of that extent. But then the next time the same type of page split happens, it again allocates a new extent and uses it, not remembering that there’s plenty of space in the previous extent. Just a guess.
InnoDB 對 Primary Key 的選擇 | Gea-Suan Lin's BLOG
do you enable fast index creation when you add a secondary index to an existing table? How does it affect IO?
InnoDB表聚集索引层高什么时候发生变化 – iMySQL | 老叶茶馆
MySQL – 10 Performance Tips | siddharth's space
Question About Large InnoDB Table with Deleted Rows