Progress in MySQL Process List

Today I had a sort of short epiphany regarding getting progress of running statements in MySQL. MySQL already keeps a running count of rows touched in most multi-row statements (called thd->row_count1), so I figured there must be a way to make use of it. It was trivial to expose row_count through SHOW PROCESSLIST. After that, it was fairly obvious that another variable could be added: row_count_expected. For certain statements (currently only ALTER TABLE) it is easy to estimate how many rows will be touched, so that number can be used to calculate a Progress_percent column.

The Progress_rows number indicates progress within a given step of executing the query. For instance, if you run a SELECT with a GROUP BY that can’t be done using an index, you will see two cycles of Progress_rows: once with a State of “Copying to tmp table” and once with “Sending data”.

I implemented this all in a small patch to MySQL 5.0 (and backported to MySQL 4.1) which produces the following output from SHOW FULL PROCESSLIST:

mysql> show full processlist G
*************************** 1. row ***************************
              Id: 1
            User: jcole
            Host: localhost
              db: test
         Command: Query
            Time: 3
           State: copy to tmp table
            Info: alter table sclot__a type=myisam
   Progress_rows: 44141
Progress_percent: 76.09

This was really way, way too easy. Hopefully it can be one with MySQL Community soon.

1 Note that currently thd->row_count is a 32-bit unsigned integer, so it will wrap at about 4.2 billion rows. Someone should really think about fixing this. :)

Making connections more manageable

For the past few weeks off and on, as part of Proven Scaling‘s project to improve the MySQL server, I’ve been helping Joel Seligstein to really dig into the MySQL source code and add some features, in preparation for a much bigger feature coming up (more on that at a future date). He has now finished three smaller projects that have been on Proven Scaling’s and my own to-do list for quite some time: SET CONNECTION STATUS status, KILL connection_id WITH QUERY query_id, and SHOW ... FOR CONNECTION connection_id.1

SET CONNECTION STATUS

This patch adds a new SET CONNECTION STATUS status command, which allows each session to set a status which will be shown in a new Status column in SHOW FULL PROCESSLIST.

This allows a administrators to gain a bit more insight into complex multi-tier architectures, by having essentially a comment for each database connection. In SET CONNECTION STATUS, the status argument may be a complex expression, so CONCAT() and other string manipulations may be used. The current connection status may be retrieved with CONNECTION_STATUS(). Some things which I could imagine putting in the connection status are:

  • in pool — See at a glance which connections are idle in the connection pool, and which are checked out.
  • GET /foo.php — Easily see what request each connection is responsible for.
  • apache pid 672 — Allow you to easily correlate activity on a given server with activity on MySQL, without having to use netstat and friends to track things down.

Of course, there are many more creative people than me to figure out things to do with this useful feature!

KILL thread_id WITH QUERY query_id

In order to make this command useful, the query_id first had to be exposed in SHOW FULL PROCESSLIST. When WITH QUERY query_id is specified in a KILL command, KILL checks that the connection is still executing the query_id you’ve specified (and locks to ensure that it does not start a new query) before killing it. This solves a well-known race condition between SHOW PROCESSLIST and KILL, where the connection may have moved on to a potentially dangerous query to kill, such as a non-transactional UPDATE.

SHOW ... FOR CONNECTION connection_id

This patch extends the SHOW VARIABLES and SHOW STATUS commands with a FOR CONNECTION connection_id clause, which allows a user with either the same credentials as the connection, or with the SUPER privilege to view the connection’s status and variables.

All of these features will be great for debugging production systems, where it can be difficult or impossible to get any insight into what is happening at any given moment in time.

Thanks, Joel, for the hard work!

1 All three patches are against 5.0.26.

On Triggers, Stored Procedures, and Call Stacks

If you’re a frequent reader, you might have noticed that I’m on a roll contributing MySQL patches by now… there are many more to come. This is part of the reason that I founded Proven Scaling — to be able to spend my time solving interesting problems and making MySQL better. So what about triggers, stored procedures, and call stacks?

I’ve written a patch that implements three new functions in MySQL 5.01:

  • CALLER(level) — Returns the SQL statement in the call stack at level, where 0 is the level containing the call to CALLER() itself (which is nearly useless), and 1 and above are any stored procedure or trigger calls that got us here.
  • CALLER_DEPTH() — Returns the current depth of the call stack, not counting 0. CALLER(CALLER_DEPTH()) will always return the top-most level (i.e. the user-generated command).
  • CALLER_WS(separator, level) — Returns the entire call stack starting at level, with each level separated by separator.

What are these good for? Quite a bit! For a few examples:

  • Debugging — From any place in your stored procedure code, you can now find out how exactly you got there, no matter how complex the code is. In the future, this could be augmented by allowing you to see the values of the parameters at each step as well.
  • Auditing — This is much more interesting for most people, and while the current implementation isn’t perfect2, it’s getting a lot closer to what people need for auditing purposes. Using triggers and CALLER(), USER(), and NOW() you can get most of the auditing information you may need.

Now, for some examples of the output from these new functions:

Just a test of CALLER() and CALLER_DEPTH():

mysql> select caller(0);
+------------------+
| caller(0)        |
+------------------+
| select caller(0) | 
+------------------+
1 row in set (0.00 sec)

mysql> select caller_depth();
+----------------+
| caller_depth() |
+----------------+
|              0 | 
+----------------+
1 row in set (0.00 sec)

In order to get anything interesting, we need to generate some depth of the call stack:

delimiter //
create procedure stack() begin select caller_ws(' <called by> ', 1); end //
create procedure ts1() begin call stack(); end //
create procedure ts2() begin call ts1(); end //
create procedure ts3() begin call ts2(); end //

Now we can try it out for real:

mysql> call ts2()//
+------------------------------------------------------------+
| caller_ws(' <called by> ', 1)                              |
+------------------------------------------------------------+
| call stack() <called by> call ts1() <called by> call ts2() | 
+------------------------------------------------------------+
1 row in set (0.00 sec)

As an example of what you can do inside stored procedures:

DROP PROCEDURE IF EXISTS test.dump_stack //
CREATE DEFINER='root'@'localhost' PROCEDURE test.dump_stack ()
BEGIN
  DECLARE i INT DEFAULT 1;
  DECLARE caller_depth INT DEFAULT 0;
  SET caller_depth = CALLER_DEPTH();

  CREATE TEMPORARY TABLE test_stack_t (
    depth INT NOT NULL,
    query TEXT, 
    PRIMARY KEY (depth)
  );
  WHILE i <= caller_depth
  DO
    INSERT INTO test_stack_t (depth, query) VALUES (i, caller(i+1));
    SET i = i + 1;
  END WHILE;
  SELECT depth, query FROM test_stack_t ORDER BY depth DESC;
  DROP TEMPORARY TABLE test_stack_t;
END
//

We can then redefine stack() to call dump_stack() instead, so that we can use the same tsX() procedures above:

drop procedure if exists stack //
create procedure stack() begin call dump_stack(); end //

And the same test:

mysql> call ts2()//
+-------+--------------+
| depth | query        |
+-------+--------------+
|     3 | call ts2()   | 
|     2 | call ts1()   | 
|     1 | call stack() | 
+-------+--------------+
3 rows in set (0.00 sec)

With a little additional magic, we can use CALLER() for auditing2:

DROP TABLE IF EXISTS test.audit //
CREATE TABLE test.audit (
  id INT NOT NULL auto_increment,
  ts DATETIME,
  user CHAR(64), 
  call_stack TEXT, 
  PRIMARY KEY (id), 
  INDEX (ts), 
  INDEX (user)
)
//

DROP PROCEDURE IF EXISTS test.audit //
CREATE DEFINER='root'@'localhost' PROCEDURE test.audit ()
BEGIN
  INSERT INTO test.audit (ts, user, call_stack)
  VALUES (now(), user(), caller_ws(' <called by> ', 2));
END
//

DROP TABLE IF EXISTS test.test //
CREATE TABLE test.test (
  c CHAR(50)
)
//

CREATE DEFINER='root'@'localhost' TRIGGER test_test_b_i
BEFORE INSERT ON test.test 
FOR EACH ROW CALL test.audit()
//

CREATE DEFINER='root'@'localhost' PROCEDURE addtest(in in_c char(50))
BEGIN
  INSERT INTO test.test (c) VALUES (in_c);
END
//

And, let’s try it out:

mysql> INSERT INTO test.test (c) VALUES ('jeremy') //
Query OK, 1 row affected (0.01 sec)

mysql> SELECT * FROM test.audit //
+----+---------------------+---------------+---------------------------------------------+
| id | ts                  | user          | call_stack                                  |
+----+---------------------+---------------+---------------------------------------------+
|  4 | 2006-10-01 18:07:39 | root@dhcp-100 | INSERT INTO test.test (c) VALUES ('jeremy') |
+----+---------------------+---------------+---------------------------------------------+
1 row in set (0.00 sec)

mysql> CALL addtest('monty') //
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM test.audit //
+----+---------------------+---------------+---------------------------------------------------------------------------+
| id | ts                  | user          | call_stack                                                                |
+----+---------------------+---------------+---------------------------------------------------------------------------+
|  4 | 2006-10-01 18:07:39 | root@dhcp-100 | INSERT INTO test.test (c) VALUES ('jeremy')                               |
|  5 | 2006-10-01 18:11:26 | root@dhcp-100 | INSERT INTO test.test (c) VALUES (in_c) <called by> CALL addtest('monty') |
+----+---------------------+---------------+---------------------------------------------------------------------------+
2 rows in set (0.01 sec)

Let me know what you think!

1 Specifically mysql-5.0.25-nightly-20060823.

2 Currently, triggers are missing three important features to make them really useful for auditing: a way to find the call stack (which this patch addresses), the ability to have multiple triggers per type per table (currently MySQL only allows one trigger per table per type [before update, after insert, etc.]), and per-statement triggers (currently MySQL only has per-row triggers).

Followup: On IPs, hostnames, and MySQL

Back in April, I wrote On IPs, hostnames, and MySQL, which described the (sometimes surprising) ways in which MySQL deals with IP addresses, hostnames, and privileges, as well as some basics about the host cache itself. In a footnote to that post, I mentioned a patch I had written against MySQL 4.1 to give some more visibility into the host cache.

Over the past two days, I have worked on porting that patch to MySQL 5.01, 2, and making some fairly large improvements to it. The patch implements a few things:

  1. Configurable Size — Without the patch, the size of the cache is fixed at 128 entries, and can only be changed by changing a #define and recompiling. You may now tune the size of the host cache using SET GLOBAL host_cache_size=X.
  2. Metrics — You may now see how well the cache is performing using SHOW GLOBAL STATUS LIKE 'Host%'. (See example below.)
  3. Display — You may now see the complete contents of the cache using SHOW HOST CACHE. (See example below.)

The above new features should give MySQL DBAs much better insight into what’s happening in their MySQL server, in an area where there has historically been very little information and a lot of misunderstandings.

Now, for the examples:

SHOW GLOBAL STATUS LIKE 'Host%'

mysql> show status like 'host%';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| Host_cache_free    | 127   | 
| Host_cache_hits    | 2     | 
| Host_cache_inserts | 1     | 
| Host_cache_misses  | 1     | 
| Host_cache_prunes  | 0     | 
| Host_cache_used    | 1     | 
+--------------------+-------+
6 rows in set (0.01 sec)

SHOW HOST CACHE

mysql> show host cache;
+-----------+----------+--------+------+
| IP        | Hostname | Errors | Hits |
+-----------+----------+--------+------+
| 10.0.0.22 | hiriko   |      0 |    2 | 
+-----------+----------+--------+------+
1 row in set (0.00 sec)

Enjoy!

1 Hopefully this will be accepted into MySQL 5.0. Brian?

2 The patch is against mysql-5.0.25-nightly-20060823 specifically.