Exploring InnoDB page management with innodb_ruby

[This post refers to innodb_ruby version 0.8.8 as of February 3, 2014.]

In On learning InnoDB: A journey to the core I introduced a new library and command-line tool in the innodb_ruby project. Later on in A quick introduction to innodb_ruby I walked through installation and a few quick demos of the innodb_space command-line tool.

In my last post, Page management in InnoDB space files, I described InnoDB’s extent, file segment, and free space management structures. Now I will provide a few demonstrations of using innodb_space to examine those structures in real tables.

A minimal, empty table

I created an empty table (the schema doesn’t matter) to illustrate the “minimal” state of InnoDB’s page management structures. The space-page-type-regions mode will summarize the type of all contiguous regions of the same page type:

$ innodb_space -f test/e.ibd space-page-type-regions
start       end         count       type                
0           0           1           FSP_HDR             
1           1           1           IBUF_BITMAP         
2           2           1           INODE               
3           3           1           INDEX               
4           5           2           FREE (ALLOCATED)    

The table has allocated the standard pages for IBD space files: FSP_HDR, IBUF_BITMAP, INODE, and an INDEX page for the root of the (empty) index. There are also two FREE (ALLOCATED) pages which are unused.

The space-lists mode can be used to summarize the extent descriptor and inode lists in the space:

$ innodb_space -f test/e.ibd space-lists
name                length      f_page      f_offset    l_page      l_offset    
free                0           0           0           0           0           
free_frag           1           0           158         0           158         
full_frag           0           0           0           0           0           
full_inodes         0           0           0           0           0           
free_inodes         1           2           38          2           38          

Only the free_frag extent descriptor list has any entries, and only a single extent is in it. The free_inodes list has the one INODE page seen above in it.

The contents of the free_frag list can be examined with the space-list-iterate mode, which will print a graphic illustrating the usage of pages within all extents in an extent list (“#” means the page is used, “.” means the page is free):

$ innodb_space -f test/e.ibd -L free_frag space-list-iterate
start_page  page_used_bitmap                                                
0           ####............................................................

The file segments in all indexes in the space can be summarized with the space-indexes mode:

$ innodb_space -f test/e.ibd space-indexes
id          root        fseg        used        allocated   fill_factor 
16          3           internal    1           1           100.00%     
16          3           leaf        0           0           0.00%       

Only the internal file segment has any pages allocated, and it only has a single page allocated. The index-fseg-internal-lists mode will summarize the extent lists in the internal file segment:

$ innodb_space -f test/e.ibd -p 3 index-fseg-internal-lists
name                length      f_page      f_offset    l_page      l_offset    
free                0           0           0           0           0           
not_full            0           0           0           0           0           
full                0           0           0           0           0           

All three lists are empty, because this empty table has not allocated any full extents. So where did the 1 page that is used go? It’s a “fragment” page, and those can be listed with the index-fseg-internal-frag-pages mode:

$ innodb_space -f test/e.ibd -p 3 index-fseg-internal-frag-pages
page        index   level   data    free    records 
3           16      0       0       16252   0       

That’s the minimal state of things — mostly empty bookkeeping structures, and a single INDEX page. Let’s take a look at a table with some real data in it.

A table with one million rows

In A quick introduction to innodb_ruby, I created a table with 1 million rows in it. We’ll use the same table in the examples here.

There are a total of 2,165 pages, with the majority of them type INDEX as a typical table would be:

$ innodb_space -f test/t.ibd space-page-type-regions
start       end         count       type                
3           37          35          INDEX               
38          63          26          FREE (ALLOCATED)    
64          2188        2125        INDEX               
2189        2239        51          FREE (ALLOCATED)    
2240        2240        1           INDEX               
2241        2303        63          FREE (ALLOCATED)    
2304        2304        1           INDEX               
2305        2367        63          FREE (ALLOCATED)    
2368        2368        1           INDEX               
2369        2431        63          FREE (ALLOCATED)    
2432        2432        1           INDEX               
2433        2495        63          FREE (ALLOCATED)    
2496        2496        1           INDEX               
2497        2687        191         FREE (ALLOCATED)    

Note that there are a few gaps of ALLOCATED (free) pages in between blocks of INDEX pages. InnoDB does not guarantee that it uses free pages sequentially, and many optimizations around bulk data loading will cause pages to be used out of order. (More on page splitting and these optimizations in a future post.)

Looking at the space’s lists, there are actually a few extents in free, as well as the usual one extent in free_frag:

$ innodb_space -f test/t.ibd space-lists
name                length      f_page      f_offset    l_page      l_offset    
free                2           0           1758        0           1798        
free_frag           1           0           158         0           158         
full_frag           0           0           0           0           0           
full_inodes         0           0           0           0           0           
free_inodes         1           2           38          2           38          

The pages in the free extent descriptor list are all free, as expected:

$ innodb_space -f test/t.ibd -L free space-list-iterate
2560        ................................................................
2624        ................................................................

The free_frag extent descriptor list shows a number of “fragment” pages are used as well:

$ innodb_space -f test/t.ibd -L free_frag space-list-iterate
start_page  page_used_bitmap                                                
0           ######################################..........................

The index file segments show the bulk of the used pages are allocated to the leaf file segment, also as expected (there are only 3 non-leaf internal pages to manage the 2,137 leaf pages in the B+tree):

$ innodb_space -f test/t.ibd space-indexes
id          root        fseg        used        allocated   fill_factor 
15          3           internal    3           3           100.00%     
15          3           leaf        2162        2528        85.52%      

You can also see that the leaf index file segment has more pages allocated than it’s actually using, showing an 85.52% fill factor. This is due to InnoDB’s “segment fill factor” which is fixed at 87.5% in stock MySQL, but is now configurable in Twitter MySQL thanks to a heads up from Facebook filing MySQL Bug 64673.

Since the internal index file segment has only three pages, as expected the file segment lists are all empty:

$ innodb_space -f test/t.ibd -p 3 index-fseg-internal-lists
name                length      f_page      f_offset    l_page      l_offset    
free                0           0           0           0           0           
not_full            0           0           0           0           0           
full                0           0           0           0           0           

The three used pages are allocated as fragment pages:

$ innodb_space -f test/t.ibd -p 3 index-fseg-internal-frag-pages
page        index   level   data    free    records 
3           15      2       26      16226   2       
36          15      1       14521   1401    1117    
37          15      1       13585   2341    1045    

The leaf index file segment lists are pretty busy, with 32 full extents and 6 not full extents:

$ innodb_space -f test/t.ibd -p 3 index-fseg-leaf-lists
name                length      f_page      f_offset    l_page      l_offset    
free                0           0           0           0           0           
not_full            6           0           1518        0           1718        
full                33          0           198         0           1478        

In addition the leaf index file segment has allocated all 32 fragment pages possible (before any of the full extents above):

$ innodb_space -f test/t.ibd -p 3 index-fseg-leaf-frag-pages
page        index   level   data    free    records 
4           15      0       9812    6286    446     
5           15      0       15158   860     689     
6           15      0       10912   5170    496     
7           15      0       10670   5412    485     
8           15      0       12980   3066    590     
9           15      0       11264   4808    512     
10          15      0       4488    11690   204     
11          15      0       9680    6418    440     
12          15      0       9306    6800    423     
13          15      0       9658    6434    439     
14          15      0       10032   6062    456     
15          15      0       9988    6108    454     
16          15      0       9570    6530    435     
17          15      0       9130    6978    415     
18          15      0       8844    7266    402     
19          15      0       11770   4300    535     
20          15      0       9020    7092    410     
21          15      0       8646    7462    393     
22          15      0       9746    6354    443     
23          15      0       11066   5014    503     
24          15      0       8910    7204    405     
25          15      0       11748   4322    534     
26          15      0       10978   5094    499     
27          15      0       11132   4940    506     
28          15      0       9350    6750    425     
29          15      0       13508   2526    614     
30          15      0       14938   1082    679     
31          15      0       14520   1506    660     
32          15      0       9086    7016    413     
33          15      0       9724    6368    442     
34          15      0       10978   5102    499     
35          15      0       9504    6592    432     

The full extents, as expected, are all full:

$ innodb_space -f test/t.ibd -p 3 -L full index-fseg-leaf-list-iterate
start_page  page_used_bitmap                                                
64          ################################################################
128         ################################################################
192         ################################################################
256         ################################################################
320         ################################################################
384         ################################################################
448         ################################################################
512         ################################################################
576         ################################################################
640         ################################################################
704         ################################################################
768         ################################################################
832         ################################################################
896         ################################################################
960         ################################################################
1024        ################################################################
1088        ################################################################
1152        ################################################################
1216        ################################################################
1280        ################################################################
1344        ################################################################
1408        ################################################################
1472        ################################################################
1536        ################################################################
1600        ################################################################
1664        ################################################################
1728        ################################################################
1792        ################################################################
1856        ################################################################
1920        ################################################################
1984        ################################################################
2048        ################################################################
2112        ################################################################

The not_full extents are all partially filled, as expected:

$ innodb_space -f test/t.ibd -p 3 -L not_full index-fseg-leaf-list-iterate
start_page  page_used_bitmap                                                
2176        #############...................................................
2240        #...............................................................
2304        #...............................................................
2368        #...............................................................
2432        #...............................................................
2496        #...............................................................

You can see the artifacts of InnoDB’s page split optimizations here: It has taken the first page out of an extent several times (due to page number “hinting” which is dubious at best) in an effort to lay out data in sequential order on disk. A deeper examination of this behavior will come in the future.

What do you think?