How to find an errant MySQL client

A common story: You’ve got some connection, either it’s busy running something it shouldn’t be, it’s in Sleep but holding some important lock, or you just don’t know why it’s connected to your database server in the first place. You see it in your SHOW PROCESSLIST like so:

mysql> show processlist\G
*************************** 1. row ***************************
     Id: 5979887
   User: root
   Host: localhost:55997
     db: NULL
Command: Sleep
   Time: 475
  State: 
   Info: NULL

How do you find that client, especially if it’s on another host? MySQL gives you all the information you need above: localhost:55997. Of course localhost is the host or IP address, and 55997 is the source port of the socket: the (usually randomly assigned) port number on the far end of the connection, from the MySQL server’s perspective. You can turn that number into something useful (the PID and user) by running the following command on the host that made the connection:

# lsof -nPi :55997
COMMAND  PID  USER   FD   TYPE    DEVICE SIZE NODE NAME
mysqld  5026 mysql 1654u  IPv4 303329996       TCP 127.0.0.1:3306->127.0.0.1:55997 (ESTABLISHED)
mysql   9146 jcole    3u  IPv4 303329995       TCP 127.0.0.1:55997->127.0.0.1:3306 (ESTABLISHED)

(Note that here you can see both sides of the socket, since I’m running these commands on localhost. Disregard the first entry, as it’s the half of the connection owned by the MySQL server.)

You can then find out the full command-line of the process with ps:

# ps -fp 9146
UID        PID  PPID  C STIME TTY          TIME CMD
jcole     9146  8740  0 12:53 pts/3    00:00:00 mysql -h 127.0.0.1 -u root -p

And see what it’s doing with strace (waiting on a read of stdin in this dumb test):

# strace -p 9146
Process 9146 attached - interrupt to quit
read(0,  <unfinished ...>
Process 9146 detached
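If the client is on another host, the same commands work there, whether you log in or run them over ssh. A minimal sketch (the hostname app01 is made up for illustration, and the second command assumes a single process owns the port):

# ssh app01 'lsof -nPi :55997'
# ssh app01 'ps -fp $(lsof -ti :55997)'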

I hope that’s helpful! It’s a pretty common debugging trick for me.

The MySQL “swap insanity” problem and the effects of the NUMA architecture

Update: Also read A brief update on NUMA and MySQL.

The “swap insanity” problem, in brief

When running MySQL on a large system (e.g., 64GB RAM and dual quad-core CPUs) with a large InnoDB buffer pool (e.g., 48GB), over time, Linux decides to swap out potentially large amounts of memory, despite appearing [1] to be under no real memory pressure. Monitoring reveals that at no time is the system in actual need of more memory than it has available; and memory isn’t leaking: mysqld’s RSS is normal and stable.

Normally a tiny bit of swap usage could be OK (we’re really concerned about activity—swaps in and out), but in many cases, “real” useful memory is being swapped: primarily parts of InnoDB’s buffer pool. When it’s needed once again, a big performance hit is taken to swap it back in, causing random delays in random queries. This can cause overall unpredictable performance on production systems, and often once swapping starts, the system may enter a performance death-spiral.
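A quick way to watch the activity itself is vmstat, whose si and so columns show pages swapped in and out per second; sustained nonzero values there, while free still reports available memory, is exactly this symptom:

# vmstat 5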

While not every system and not every workload experiences this problem, it’s common enough to be well known, and for those who know it well it can be a major headache.

The history of “swap insanity”

Over the past two to four years, there has been an off-and-on discussion about Linux swapping and MySQL, often titled “swap insanity” (I think coined by Kevin Burton). I have followed it closely, but I haven’t contributed much because I didn’t have anything new to add. The major contributors to the discussion over the past years have been:

  • Kevin Burton — Discussion of swappiness and MySQL on Linux.
  • Kevin Burton — Proposed O_DIRECT as a solution (doesn’t work) and discussed memlock (may help, but not a full solution).
  • Peter Zaitsev — Discussed swappiness, memlock, and fielded a lot of discussion in the comments.
  • Don MacAskill — Proposed an innovative (albeit hacky) solution using swap on ramdisk, and a lot more interesting discussion in the comments.
  • Dathan Pattishall — Describes how Linux behavior can be even worse with swap disabled, and proposes using swapoff to clear it, but no real solution.
  • Rik van Riel on the LKML — A few answers and proposal of the Split-LRU patch.
  • Kevin Burton — Discussion of Linux Split-LRU patch with some success.
  • Mark Callaghan — Discussion of vmstat and monitoring things, and a recap of a few possible solutions.
  • Kevin Burton — More discussion that Linux Split-LRU is essential.
  • Kevin Burton — Choosing the middle road by enabling swap, but with a small amount of space, and giving up the battle.
  • Peter Zaitsev — More discussion about why swapping is bad, but no solution.

Despite all the discussion, not much has changed. There are some hacky solutions to get MySQL to stop swapping, but nothing definitive. I’ve known about these solutions and hacks for a while now, but the core question was never really settled: why does this happen? That has never sat well with me. I was recently tasked with trying to sort this mess out once and for all, so I’ve now done quite a bit of research and testing related to the problem. I’ve learned a lot, and decided a big blog entry might be the best way to share it. Enjoy.

A lot of discussion (and some work) went into adding the relatively new swappiness tunable a few years ago, and I think that may have solved some of the original problems; but at around the same time, the basic architecture of these machines changed to NUMA, which I think introduced some new problems with the very same symptoms, masking the original fix.
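For reference, the swappiness tunable itself is easy to inspect and change (0 being the value usually suggested to discourage swapping of application memory); this is one of the hacks discussed above, not the fix this article arrives at:

# cat /proc/sys/vm/swappiness
# echo 0 > /proc/sys/vm/swappiness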

Contrasting the SMP/UMA and NUMA architectures

The SMP/UMA architecture


The SMP, or UMA architecture, simplified

When the PC world first got multiple processors, they were all arranged with equal access to all of the memory in the system. This is called Symmetric Multi-processing (SMP), or sometimes Uniform Memory Architecture (UMA, especially in contrast to NUMA). In the past few years this architecture has been largely phased out between physical socketed processors, but is still alive and well today within a single processor with multiple cores: all cores have equal access to the memory bank.

The NUMA architecture


The NUMA architecture, simplified

The new architecture for multiple processors, starting with AMD’s Opteron and Intel’s Nehalem [2] processors (we’ll call these “modern PC CPUs”), is a Non-Uniform Memory Access (NUMA) architecture, or more correctly Cache-Coherent NUMA (ccNUMA). In this architecture, each processor has a “local” bank of memory, to which it has much closer (lower latency) access. The whole system may still operate as one unit, and all memory is basically accessible from everywhere, but at a potentially higher latency and lower performance.

Fundamentally, some memory locations (“local” ones) are faster, that is, cost less to access, than other locations (“remote” ones attached to other processors). For a more detailed discussion of NUMA implementation and its support in Linux, see Ulrich Drepper’s article on LWN.net.

How Linux handles a NUMA system

Linux automatically understands when it’s running on a NUMA architecture system and does a few things:

  1. Enumerates the hardware to understand the physical layout.
  2. Divides the processors (not cores) into “nodes”. With modern PC processors, this means one node per physical processor, regardless of the number of cores present.
  3. Attaches each memory module in the system to the node for the processor it is local to.
  4. Collects cost information about inter-node communication (“distance” between nodes).

You can see how Linux enumerated your system’s NUMA layout using the numactl --hardware command:

# numactl --hardware
available: 2 nodes (0-1)
node 0 size: 32276 MB
node 0 free: 26856 MB
node 1 size: 32320 MB
node 1 free: 26897 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 

This tells you a few important things:

  • The number of nodes, and their node numbers — In this case there are two nodes numbered “0” and “1”.
  • The amount of memory available within each node — This machine has 64GB of memory total, and two physical (quad core) CPUs, so it has 32GB in each node. Note that the sizes aren’t exactly half of 64GB, and aren’t exactly equal, due to some memory being stolen from each node for whatever internal purposes the kernel has in mind.
  • The “distance” between nodes — This is a representation of the cost of accessing memory located in (for example) Node 0 from Node 1. In this case, Linux claims a distance of “10” for local memory and “21” for non-local memory.

How NUMA changes things for Linux

Technically, as long as everything runs just fine, there’s no reason that being UMA or NUMA should change how things work at the OS level. However, to get the best possible performance (and indeed, in some cases with extreme performance differences for non-local NUMA access, any acceptable performance at all), some additional work has to be done to deal directly with the internals of NUMA. Linux does the following things, which might be unexpected if you think of CPUs and memory as black boxes:

  • Each process and thread inherits, from its parent, a NUMA policy. The inherited policy can be modified on a per-thread basis, and it defines the CPUs and even individual cores the process is allowed to be scheduled on, where it should be allocated memory from, and how strict to be about those two decisions.
  • Each thread is initially allocated a “preferred” node to run on. The thread can be run elsewhere (if policy allows), but the scheduler attempts to ensure that it is always run on the preferred node.
  • Memory allocated for the process is allocated on a particular node, by default “current”, which means the same node as the thread is preferred to run on. On UMA/SMP architectures all memory was treated equally, and had the same cost, but now the system has to think a bit about where it comes from, because accessing non-local memory has implications on performance and may cause cache coherency delays.
  • Memory allocations made on one node will not be moved to another node, regardless of system needs. Once memory is allocated on a node, it will stay there.

The NUMA policy of any process can be changed, with far-reaching effects, very simply by using numactl as a wrapper for the program. With a bit of additional work, it can be fine-tuned in detail by linking in libnuma and writing some code yourself to manage the policy. Some interesting things that can be done simply with the numactl wrapper (a few example invocations follow the list) are:

  • Allocate memory with a particular policy:
    • locally on the “current” node — using --localalloc, and also the default mode
    • preferably on a particular node, but elsewhere if necessary — using --preferred=node
    • always on a particular node or set of nodes — using --membind=nodes
    • interleaved, that is, spread evenly round-robin across all or a set of nodes — using --interleave=all or --interleave=nodes
  • Run the program on a particular node or set of nodes, in this case that means physical CPUs (--cpunodebind=nodes) or on a particular core or set of cores (--physcpubind=cpus).
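For example, a few illustrative invocations (here command stands in for whatever program you want to run under the given policy):

# numactl --interleave=all command
# numactl --cpunodebind=0 --membind=0 command
# numactl --preferred=1 command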

What NUMA means for MySQL and InnoDB

InnoDB, and really, nearly all database servers (such as Oracle), present an atypical workload (from the point of view of the majority of installations) to Linux: a single large multi-threaded process which consumes nearly all of the system’s memory and should be expected to consume as much of the rest of the system resources as possible.

In a NUMA-based system, where the memory is divided into multiple nodes, how the system should handle this is not necessarily straightforward. The default behavior of the system is to allocate memory in the same node as a thread is scheduled to run on, and this works well for small amounts of memory, but when you want to allocate more than half of the system memory it’s no longer physically possible to even do it in a single NUMA node: In a two-node system, only 50% of the memory is in each node. Additionally, since many different queries will be running at the same time, on both processors, neither individual processor necessarily has preferential access to any particular part of memory needed by a particular query.

It turns out that this seems to matter in one very important way. Using /proc/pid/numa_maps we can see all of the allocations made by mysqld, and some interesting information about them. If you look for a really big number in the anon=size, you can pretty easily find the buffer pool (which will consume more than 51GB of memory for the 48GB that it has been configured to use) [line-wrapped for clarity]:

2aaaaad3e000 default anon=13240527 dirty=13223315 
  swapcache=3440324 active=13202235 N0=7865429 N1=5375098

The fields being shown here are:

  • 2aaaaad3e000 — The virtual address of the memory region. Ignore this other than the fact that it’s a unique ID for this piece of memory.
  • default — The NUMA policy in use for this region.
  • anon=number — The number of anonymous pages mapped.
  • dirty=number — The number of pages that are dirty because they have been modified. Generally memory allocated only within a single process is always going to be used, and thus dirty, but if a process forks it may have many copy-on-write pages mapped that are not dirty.
  • swapcache=number — The number of pages swapped out but unmodified since they were swapped out, and thus they are ready to be freed if needed, but are still in memory at the moment.
  • active=number — The number of pages on the “active list”; if this field is shown, some memory is inactive (anon minus active) which means it may be paged out by the swapper soon.
  • N0=number and N1=number — The number of pages allocated on Node 0 and Node 1, respectively.

The entire numa_maps can be quickly summarized by a simple script, numa-maps-summary.pl, which I wrote while analyzing this problem:

N0        :      7983584 ( 30.45 GB)
N1        :      5440464 ( 20.75 GB)
active    :     13406601 ( 51.14 GB)
anon      :     13422697 ( 51.20 GB)
dirty     :     13407242 ( 51.14 GB)
mapmax    :          977 (  0.00 GB)
mapped    :         1377 (  0.01 GB)
swapcache :      3619780 ( 13.81 GB)
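(The script itself isn’t reproduced here, but roughly the same totals can be had with a bit of awk. This is only a sketch, not the real numa-maps-summary.pl: it blindly sums every numeric key=value field and assumes 4 KiB pages.)

awk '{
  # fields 2..NF are either a policy name or key=value pairs;
  # sum only the numeric values (this skips things like file= paths)
  for (i = 2; i <= NF; i++) {
    split($i, kv, "=")
    if (kv[2] ~ /^[0-9]+$/) sum[kv[1]] += kv[2]
  }
}
END {
  # convert page counts to GB, assuming 4 KiB pages
  for (k in sum)
    printf "%-10s: %12d (%6.2f GB)\n", k, sum[k], sum[k] * 4096 / 2^30
}' /proc/$(pidof mysqld)/numa_maps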

A couple of interesting and somewhat unexpected things pop out to me from this summary:

  1. The sheer imbalance in how much memory is allocated in Node 0 versus Node 1. This is actually absolutely normal per the default policy. Using the default NUMA policy, memory was preferentially allocated in Node 0, but Node 1 was used as a last resort.
  2. The sheer amount of memory allocated in Node 0. This is absolutely critical — Node 0 is out of free memory! It only contains about 32GB of memory in total, and it has allocated a single large chunk of more than 30GB to InnoDB’s buffer pool. A few other smaller allocations to other processes finish it off, and suddenly it has no memory free, and isn’t even caching anything.

The memory allocated by MySQL looks something like this:


Allocating memory severely imbalanced, preferring Node 0

Due to Node 0 being completely exhausted of free memory, even though the system has plenty of free memory overall (over 10GB has been used for caches), it is entirely on Node 1. If any process scheduled on Node 0 needs local memory for anything, it will cause some of the already-allocated memory to be swapped out in order to free up some Node 0 pages. Even though there is free memory on Node 1, the Linux kernel in many circumstances (which admittedly I don’t totally understand [3]) prefers to page out Node 0 memory rather than free some of the cache on Node 1 and use that memory. Of course the paging is far more expensive than non-local memory access ever would be.
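You can see this per-node exhaustion directly, either from numactl --hardware as shown earlier or from the per-node meminfo files:

# numactl --hardware | grep free
# grep MemFree /sys/devices/system/node/node*/meminfo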

A small change, to big effect

An easy solution to this is to interleave the allocated memory. It is possible to do this using numactl as described above:

# numactl --interleave all command

We can use this with MySQL by making a one-line change to mysqld_safe, adding the following line (after cmd="$NOHUP_NICENESS"), which prefixes the command to start mysqld with a call to numactl:

cmd="/usr/bin/numactl --interleave all $cmd"

Now, when MySQL needs memory it will allocate it interleaved across all nodes, effectively balancing the amount of memory allocated in each node. This will leave some free memory in each node, allowing the Linux kernel to cache data on both nodes, thus allowing memory to be easily freed on either node just by freeing caches (as it’s supposed to work) rather than paging.

Performance regression testing was done comparing the two scenarios (default local-plus-spillover allocation versus interleaved allocation) using the DBT2 benchmark, and found that performance in the nominal case is identical. This is expected. The breakthrough is this: in all cases where swap use could previously be triggered in a repeatable fashion, the system no longer swaps!

You can now see from the numa_maps that all allocated memory has been spread evenly across Node 0 and Node 1:

2aaaaad3e000 interleave=0-1 anon=13359067 dirty=13359067 
  N0=6679535 N1=6679532

And the summary looks like this:

N0        :      6814756 ( 26.00 GB)
N1        :      6816444 ( 26.00 GB)
anon      :     13629853 ( 51.99 GB)
dirty     :     13629853 ( 51.99 GB)
mapmax    :          296 (  0.00 GB)
mapped    :         1384 (  0.01 GB)

In graphical terms, the allocation of all memory within mysqld has been made in a balanced way:


Allocating memory balanced (interleaved) across nodes

An aside on zone_reclaim_mode

The zone_reclaim_mode tunable in /proc/sys/vm can be used to fine-tune memory reclamation policies in a NUMA system. Based on some clarifications from the linux-mm mailing list, it doesn’t seem to help in this case.
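For completeness, checking and changing it is trivial (a value of 0 tells the kernel to prefer allocating from other nodes over reclaiming local pages):

# cat /proc/sys/vm/zone_reclaim_mode
# echo 0 > /proc/sys/vm/zone_reclaim_mode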

An even better solution?

It occurred to me (and was backed up by the linux-mm mailing list) that there is probably further room for optimization, although I haven’t done any testing so far. Interleaving all allocations is a pretty big hammer, and while it does solve this problem, I wonder if an even better solution would be to intelligently manage the fact that this is a NUMA architecture, using the libnuma library. Some thoughts that come to mind are:

  • Spread the buffer pool across all nodes intelligently in large chunks, or by index, rather than round-robin per page.
  • Keep the allocation policy for normal query threads to “local” so their memory isn’t interleaved across both nodes. I think interleaved allocation could cause slightly worse performance for some queries which would use a substantial amount of local memory (such as for large queries, temporary tables, or sorts), but I haven’t tested this.
  • Managing I/O in and out to/from the buffer pool using threads that will only be scheduled on the same node that the memory they will use is allocated on (this is a rather complex optimization).
  • Re-schedule simpler query threads (many PK lookups, etc.) on nodes with local access to the data they need. Move them actively when necessary, rather than keeping them on the same node. (I don’t know if the cost of the switch makes up for this, but it could be trivial if the buffer pool were organized by index onto separate nodes.)

I have no idea if any of the above would really show practical benefits in a real-world system, but I’d love to hear any comments or ideas.

Update 1: Changed the link for “Rik van Riel on the LKML — A few answers and proposal of the Split-LRU patch.” to be a bit closer to my intention. The old link points to the message that started the thread, the new link points to the index of the messages in the thread.

Update 2: Added a link above provided by Kevin Closson about Oracle on NUMA systems.

Update 3: I should have included a warning about numa_maps. Simon Mudd notes correctly that reading the /proc/pid/numa_maps file stalls the process in question while the kernel generates the data it provides. It should be used carefully against mysqld on production systems, as connection and query stalls will occur while it is being read. Do not poll it frequently for monitoring.

– – –

[1] Using free shows some memory free and lots of cache in use, and totalling up the resident set sizes from ps or top shows that the running processes don’t need more memory than is available.

[2] An article in Dr. Dobb’s Journal titled A Deeper Look Inside Intel QuickPath Interconnect gives pretty good high-level coverage. Intel published a paper entitled Performance Analysis Guide for Intel® Core™ i7 Processor and Intel® Xeon™ 5500 processors which is quite good for understanding the internals of NUMA and QPI on Intel’s Nehalem series of processors.

[3] I started a thread on the linux-mm mailing list related to MySQL on NUMA, and there are two other related threads, on zone_reclaim_mode and on swapping.

Indirection of SHOW commands between MySQL 4.1 and 5.1

Today, while helping someone find some of the code that implements SHOW VARIABLES, I was reminded of how insanely complex it all got with the information_schema implementation and the changes to the SHOW commands that came with it. Specifically, I was trying to find exactly how the SHOW VARIABLES output gets populated — what the source data is.

By way of example, I present the code that implements it in 5.1.51 (the new way) and in 4.1.24 (the old way). I, at least, have a much easier time sorting out the old way. Both start from the same basic entry point in the parser, the show: rule within sql/sql_yacc.yy.

SHOW VARIABLES in MySQL 5.1.51

sql/sql_yacc.yy

The show token just does some initialization.

10062 show:
10063           SHOW
10064           {
10065             LEX *lex=Lex;
10066             lex->wild=0;
10067             lex->lock_option= TL_READ;
10068             mysql_init_select(lex);
10069             lex->current_select->parsing_place= SELECT_LIST;
10070             bzero((char*) &lex->create_info,sizeof(lex->create_info));
10071           }
10072           show_param
10073           {}
10074         ;

The command-specific work is done in show_param, but prepare_schema_table essentially just sets up a temporary table.

10076 show_param:
   ...
10273         | opt_var_type  VARIABLES wild_and_where
10274           {
10275             LEX *lex= Lex;
10276             lex->sql_command= SQLCOM_SHOW_VARIABLES;
10277             lex->option_type= $1;
10278             if (prepare_schema_table(YYTHD, lex, 0, SCH_VARIABLES))
10279               MYSQL_YYABORT;
10280           }

sql/sql_parse.cc

Another hoop…

1754 int prepare_schema_table(THD *thd, LEX *lex, Table_ident *table_ident,
1755                          enum enum_schema_tables schema_table_idx)
1756 {
   ...
1849   if (make_schema_select(thd, select_lex, schema_table_idx))
1850   {
1851     DBUG_RETURN(1);
1852   }

sql/sql_show.cc

After a few more hoops, sel->add_table_to_list(…) adds the new temporary table to the select list.

6076 int make_schema_select(THD *thd, SELECT_LEX *sel,
6077                        enum enum_schema_tables schema_table_idx)
6078 {
6079   ST_SCHEMA_TABLE *schema_table= get_schema_table(schema_table_idx);
6080   LEX_STRING db, table;
6081   DBUG_ENTER("make_schema_select");
6082   DBUG_PRINT("enter", ("mysql_schema_select: %s", schema_table->table_name));
   ...
6087   thd->make_lex_string(&db, INFORMATION_SCHEMA_NAME.str,
6088                        INFORMATION_SCHEMA_NAME.length, 0);
6089   thd->make_lex_string(&table, schema_table->table_name,
6090                        strlen(schema_table->table_name), 0);
6091   if (schema_table->old_format(thd, schema_table) ||
6092       !sel->add_table_to_list(thd, new Table_ident(thd, db, table, 0),
6093                               0, 0, TL_READ))
6094   {
6095     DBUG_RETURN(1);
6096   }
6097   DBUG_RETURN(0);
6098 }

At this point an internal query has basically been built to do:

SELECT * FROM INFORMATION_SCHEMA.VARIABLES

So now that table needs to be populated, but you’ll just have to know where to look to find the rest of the code…

sql/sql_show.cc

The schema_tables array lists all of the information_schema tables and how they are each populated.

6805 ST_SCHEMA_TABLE schema_tables[]=
6806 {
   ...
6879   {"VARIABLES", variables_fields_info, create_schema_table, fill_variables,
6880    make_old_format, 0, 0, -1, 1, 0},
   ...
6884   {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
6885 };

The schema_tables array says fill_variables can take care of it…

5441 int fill_variables(THD *thd, TABLE_LIST *tables, COND *cond)
5442 {
   ...
5458   res= show_status_array(thd, wild, enumerate_sys_vars(thd, sorted_vars),
5459                          option_type, NULL, "", tables->table, upper_case_names, cond);
   ...
5461   DBUG_RETURN(res);
5462 }

And then show_status_array just knows what to do with a SHOW_VAR *; the real work happens in enumerate_sys_vars.

sql/set_var.cc

And finally the SHOW_VAR *result can be populated from the in-memory structures, in this case the system_variable_hash.

3346 SHOW_VAR* enumerate_sys_vars(THD *thd, bool sorted)
3347 {
3348   int count= system_variable_hash.records, i;
3349   int size= sizeof(SHOW_VAR) * (count + 1);
3350   SHOW_VAR *result= (SHOW_VAR*) thd->alloc(size);
3351 
3352   if (result)
3353   {
3354     SHOW_VAR *show= result;
3355 
3356     for (i= 0; i < count; i++)
3357     {
3358       sys_var *var= (sys_var*) hash_element(&system_variable_hash, i);
3359       show->name= var->name;
3360       show->value= (char*) var;
3361       show->type= SHOW_SYS;
3362       show++;
3363     }
3364 
3365     /* sort into order */
3366     if (sorted)
3367       my_qsort(result, count, sizeof(SHOW_VAR),
3368                (qsort_cmp) show_cmp);
3369 
3370     /* make last element empty */
3371     bzero(show, sizeof(SHOW_VAR));
3372   }
3373   return result;
3374 }

SHOW VARIABLES in MySQL 4.1.24

sql/sql_yacc.yy

The same basic entry point.

4422 show:   SHOW
4423         {
4424           LEX *lex=Lex;
4425           lex->wild=0;
4426           bzero((char*) &lex->create_info,sizeof(lex->create_info));
4427         }
4428         show_param
4429         {}
4430         ;

The SQLCOM_SHOW_VARIABLES is saved for later use, denoting what kind of SHOW this query is.

4432 show_param:
   ...
4531         | opt_var_type VARIABLES wild
4532           {
4533             THD *thd= YYTHD;
4534             thd->lex->sql_command= SQLCOM_SHOW_VARIABLES;
4535             thd->lex->option_type= (enum_var_type) $1;
4536           }

sql/sql_parse.cc

In mysql_execute_command a mega switch statement decides what function will handle this query.

1992 void
1993 mysql_execute_command(THD *thd)
1994 {
   ...
3260   case SQLCOM_SHOW_VARIABLES:
3261     res= mysqld_show(thd, (lex->wild ? lex->wild->ptr() : NullS),
3262                      init_vars, lex->option_type,
3263                      &LOCK_global_system_variables);
3264     break;

sql/sql_show.cc

And the magic happens here, in a single function.

1815 int mysqld_show(THD *thd, const char *wild, show_var_st *variables,
1816                 enum enum_var_type value_type,
1817                 pthread_mutex_t *mutex)
1818 {
1819   char buff[1024];
1820   List<Item> field_list;
1821   Protocol *protocol= thd->protocol;
1822   LEX_STRING null_lex_str;
1823   DBUG_ENTER("mysqld_show");
1824 
1825   field_list.push_back(new Item_empty_string("Variable_name",30));
1826   field_list.push_back(new Item_empty_string("Value",256));
1827   if (protocol->send_fields(&field_list,1))
1828     DBUG_RETURN(1);
1829   null_lex_str.str= 0;
1830   null_lex_str.length= 0;
1831 
1832   pthread_mutex_lock(mutex);
1833   for (; variables->name; variables++)
1834   {
   ...
2128   }
2129   pthread_mutex_unlock(mutex);
2130   send_eof(thd);
2131   DBUG_RETURN(0);
2132 
2133  err:
2134   pthread_mutex_unlock(mutex);
2135   DBUG_RETURN(1);
2136 }

Much simpler, in my opinion.

Want to host the MySQL mirror/archive?

For a few years now, Proven Scaling has been hosting the MySQL mirror/archive at mirror.provenscaling.com, which mainly provides older MySQL releases (removed from MySQL’s own mirrors long ago) and MySQL Enterprise releases from the last few years.

Since Proven Scaling has been winding down (and we’ve started paying the bills personally), it doesn’t really make sense for us to maintain as large a colocation footprint as we have. We’re looking to shift things around, and the mirror is something that’s fairly painful for us to host with a small footprint. We’re hoping another MySQL community member with an already-large datacenter/bandwidth footprint will want to pick it up, where it won’t affect their bottom line so much.

I think there’s still some value in having these files be available, so I’m reluctant to just shut it down (but if no one is interested, that may happen).

Some basic stats about the mirror (for July, 2010):

  • Data size: ~305G
  • Hits: 232,242
  • Transfer/mo: ~703G (this is pretty variable, and was ~1.23TB in May)
  • Transfer/day: ~22G avg, ~71G max

I’d be happy to provide file listings and Webalizer stats to interested parties. I will also note that a lot of the data transfer seems to be junk traffic from Asia, although I’m not sure what can be done to minimize this. Ideally we would arrange to rsync the full data set to a server somewhere (or provide ssh access for you to do the same), and you’d maintain the mirror as necessary going forward. We’re happy also to point mirror.provenscaling.com at the new home.

If you’re interested, email me at jeremy@jcole.us.

Speaking at OSCON 2010

I am speaking in a few days at OSCON 2010 in Portland, Oregon. My talk, “MySQL Bottleneck Hunters – How Schooner Increased MySQL Performance by 8x”, is about how Schooner optimized MySQL to run on modern hardware with flash memory. Come on by if you’re at OSCON!

Here’s the abstract:

MySQL Bottleneck Hunters – How Schooner Increased MySQL Performance by 8x — Thursday, July 22, 2010 at 11:30am

MySQL users have an insatiable need for speed, capacity, and availability, all at a reasonable cost. This session will provide a technical overview of the approach that Schooner engineering took to optimize MySQL Enterprise and InnoDB with flash memory, multi-core processors, and DRAM to achieve an 8x improvement in performance relative to existing systems.

This session will provide a crash course in the performance tuning tools and techniques that Schooner used to radically improve MySQL performance on commodity hardware. It will also offer an overview of the results that were achieved, and compare these results to other commonly deployed MySQL server architectures, including stock MySQL on SSDs and Fusion-io, using the industry-standard DBT2 benchmark.