by Jeremy Cole
Note: This document is a work in progress. If you have some ideas, comments, or suggestions, please feel free to contact me.
Warning: This idea is only half-baked, and may be impossible or improbable. Please feel free to shoot it full of holes.
MySQL’s memory usage generally comes from the following places:
- per-server dynamic allocations — variables such as key_buffer_size and innodb_buffer_pool_size are allocated dynamically, as-needed, which makes it difficult for a server to establish a baseline memory usage
- per-thread dynamic allocations — many different variables affect memory allocated per-thread, dynamically, as-needed; this is a good thing as it allows sharing of limited resources, but a bad thing because it’s very difficult to determine a reasonable upper bounds
- many “tiny” allocations with no real accountability
The formula to calculate MySQL’s memory usage is already fairly complex, but just to get the basics, the formula is similar to the following:
thread_memory is the maximum amount of memory needed per thread:
thread_memory := thread_stack_size + net_buffer_length + max_allowed_packet + read_buffer_size + join_buffer_size + tmp_table_size + myisam_sort_buffer_size
max_memory is the total maximum amount of memory the server could allocate:
max_memory := thread_memory * max_connections + query_cache_size + innodb_buffer_pool_size + innodb_additional_mem_pool_size + innodb_log_buffer_size + key_buffer_size
MySQL has a fairly robust memory allocator based on “memory roots” which are freed on thread shutdown with a single call, thus eliminating the source of many memory leaks in similarly complex projects without a similar allocator.
The advice given in the MySQL manual and in many other contexts is that one should configure a server so that max_memory as defined above does not exceed the maximum amount allocable by the server (usually about 2GB). In reality, though, that can be next to impossible without setting many per-thread variables to unreasonably low values.
This leads to many MySQL DBAs “fudging” a bit, setting the per-thread variables to values that are reasonable in the context of the work they are doing, but unreasonable, in the sense that they are unsustainably high if all threads were to actually need them simultaneously. Once any given connection allocates more than its fair share of memory, the system may crash with an “out of memory” error, possibly corrupting data.
In addition, it’s close to impossible to really calculate the “normal” memory usage of a real server, because many of the variables involved in the calculation are rarely, if ever, actually used. It takes a lot of guesswork, and the normal memory usage of the server depends greatly on the workload and features used.
It’s impossible to currently find out how much memory is actually being used by many parts of MySQL, because of the fact that the per-thread memory buffers may or may not be allocated for any given thread. If a server seems to be using more memory than it should (well above your estimates) it’s impossible to find out which per-thread buffers (or features) are allocating the memory.
My proposed solution
My proposal comes in a few parts:
- always fully allocate all global memory buffers and global pools
- allocate per-thread memory from a well-defined “pool”
- assign all “tiny” allocations (e.g. calls to new) to a “generic” pool
- addition of several new SHOW commands to show actual memory allocations
Global memory buffers
Fully allocating global memory buffers solves a large part of the problem, because it means that the server’s memory usage should not be expected to increase significantly over time. Currently MySQL lazily allocates memory, under the assumption that it should try not to use more memory than necessary. This is actually harmful because it means that you can easily configure MySQL to use a lot more memory than it can actually allocate.
It may take months to actually encounter any problems, and you’ll be at a loss as to what could be wrong, since it’s worked fine for months beforehand.
The idea behind the memory pool concept can be summed up in two words: accountability and constrainability. Every memory allocation should be accounted for somewhere, so that leaks can be easily traced, and so that it becomes trivial to track down which features are using your memory. Every memory allocation should also be accounted for so that per-feature and per-pool maximums can be maintained, to constrain total memory usage and try to keep MySQL from crashing because of out of memory errors.
I propose the following semantics for the “pool” system:
- all allocations within the server must assign the allocation to a certain named pool
- in the case that a pool is full, or not enough memory can be allocated from the pool, two options are possible: have the system wait until more memory is available (with a timeout); have the system allocate as much memory as possible and return quickly (if the request for allocation included a flag e.g. ALLOW_SHRINK)
- maximum limits per-variable are maintained accross threads, e.g. a global pool for max_allowed_packet that maintains the maximum amount of memory allocated for all sessions use of max_allowed_packet
- maximum limits per-session totalling all variables, e.g. the DBA could specify that a session may consume up to 24MB memory in total, for all pools
- global pools are fully allocated, as described above, under “Global memory buffers”
I would expect to create a configuration like the following:
pool.session.session_size = 16M pool.global.max_allowed_packet = 64M pool.global.sort_buffer_size = 10M pool.global.generic = 2M
This would mean that each session is limited to 16M total memory usage. The total amount of memory used by all threads’ max_allowed_packet settings would be 64M, and the total memory used by all threads’ sort_buffer_size would be 10M, and generic (unnamed) allocations could use a total of 2M.
This would mean that calculating absolute maximum memory usage is now only a matter of totalling the pool.global values up, it’s no longer necessary to multiply by max_connections.
Additional SHOW commands
In order to allow the memory pool concept to be really useful to the user, a number of new SHOW commands would need to be added to the server.
SHOW GLOBAL MEMORY USAGE
The SHOW GLOBAL MEMORY USAGE command would display a list of named pools, and the total memory allocated to them by all clients and all features.
SHOW SESSION MEMORY USAGE
The SHOW SESSION MEMORY USAGE command would display a list of named pools, and the memory allocated to them by the current client.
With the above new features added, MySQL should allocate all memory which it will need up-front, during startup. If you’ve configured it to use too much memory, it will fail immediately — this is actually a good thing, following the principle of “fail fast”. After the server initially allocates its maximum memory usage, the rest of the allocations to specific threads can be done internally.