Progress in MySQL Process List

Today I had a small epiphany about how to get the progress of running statements in MySQL. MySQL already keeps a running count of rows touched in most multi-row statements (called thd->row_count [1]), so I figured there must be a way to make use of it. It was trivial to expose row_count through SHOW PROCESSLIST as a Progress_rows column. After that, it was fairly obvious that another variable could be added: row_count_expected. For certain statements (currently only ALTER TABLE) it is easy to estimate how many rows will be touched, so that number can be used to calculate a Progress_percent column.
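The percentage is simple arithmetic. Here is a minimal sketch in Python, assuming Progress_percent is just row_count divided by row_count_expected. The function name and the handling of a missing estimate are illustrative and are not taken from the patch:

```python
from typing import Optional


def progress_percent(row_count: int, row_count_expected: int) -> Optional[float]:
    """Return progress as a percentage, or None when there is no estimate.

    Assumption: statements without a row estimate carry an expected
    count of zero, so no percentage can be shown for them.
    """
    if row_count_expected <= 0:
        return None
    # The estimate may be off, so the result is deliberately not capped at 100.
    return round(100.0 * row_count / row_count_expected, 2)
```

Because the expected count is only an estimate, a real implementation would have to decide what to display once the running count passes it.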

The Progress_rows number indicates progress within a given step of executing the query. For instance, if you run a SELECT with a GROUP BY that can’t be done using an index, you will see two cycles of Progress_rows: once with a State of “Copying to tmp table” and once with “Sending data”.

I implemented this all in a small patch to MySQL 5.0 (and backported to MySQL 4.1) which produces the following output from SHOW FULL PROCESSLIST:

mysql> show full processlist \G
*************************** 1. row ***************************
              Id: 1
            User: jcole
            Host: localhost
              db: test
         Command: Query
            Time: 3
           State: copy to tmp table
            Info: alter table sclot__a type=myisam
   Progress_rows: 44141
Progress_percent: 76.09

This was really way, way too easy. Hopefully it can be one with MySQL Community soon.

[1] Note that currently thd->row_count is a 32-bit unsigned integer, so it will wrap at about 4.3 billion rows. Someone should really think about fixing this. :)
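To make the wraparound concrete, here is a small Python sketch that imitates a 32-bit unsigned counter with modular arithmetic. The helper name is made up for illustration. The exact wrap point is 2^32 = 4,294,967,296:

```python
UINT32_MODULUS = 2 ** 32  # number of distinct values in a 32-bit unsigned integer


def increment_uint32(counter: int) -> int:
    """Add one to a counter stored as a 32-bit unsigned integer."""
    return (counter + 1) % UINT32_MODULUS


# The largest representable value wraps back to zero on the next row.
largest = UINT32_MODULUS - 1
wrapped = increment_uint32(largest)
```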

Corrupted relay logs?

I just opened MySQL Bug #26123, to attempt to find out how many people are seeing this possible replication bug. A few Proven Scaling customers have seen the same bug, and I haven’t been able to reproduce it, so I opened a bug as a feeler. It appears to have something to do with using BLOB or TEXT fields in replication.

Are you seeing slaves stop with corrupted relay logs? Does restarting replication using CHANGE MASTER and the Exec_Master_Log_Pos from the stopped slave [1] work just fine? Do the master’s binary logs look perfectly OK? Leave a comment on the bug.

[1] This effectively forces the slave to re-download the exact same log events that it currently has in its relay logs. Since the corruption appears to happen either in the master’s slave thread, or the slave’s replication IO thread, this gets things going again.
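For anyone who wants to script that workaround, here is a rough Python sketch that builds the statements from a SHOW SLAVE STATUS row. It assumes the row is available as a dictionary keyed by column name and pairs Exec_Master_Log_Pos with Relay_Master_Log_File. The function name is hypothetical, and nothing here talks to a server:

```python
def restart_statements(slave_status: dict) -> list:
    """Build the statements that point a stopped slave back at the last
    executed master position, discarding its current relay logs."""
    log_file = str(slave_status["Relay_Master_Log_File"])
    log_pos = int(slave_status["Exec_Master_Log_Pos"])
    # Escape backslashes and single quotes so the file name is a safe literal.
    escaped = log_file.replace("\\", "\\\\").replace("'", "\\'")
    return [
        "STOP SLAVE",
        f"CHANGE MASTER TO MASTER_LOG_FILE='{escaped}', MASTER_LOG_POS={log_pos}",
        "START SLAVE",
    ]
```

Check the generated statements by hand before running them, since a wrong position would skip or replay events.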

MySQL Meetup Silicon Valley now at Google

For the past year I’ve been running the MySQL Meetup Silicon Valley, and it’s been fun. We normally either have open discussion, or a scheduled topic. I often present something. Starting with the February 12 Meetup we will be meeting at Google HQ in Mountain View, CA. (Thanks, Google!)

Feel free to come on down or up to Mountain View and hang out with us on the second Monday of each month! Wanna speak at one of the Meetups? Let me know!

Projection support in libmygis 0.7

I’ve just recently released a new version of libmygis, a library for dealing with various GIS formats. Its main purpose is importing ESRI Shapefile data into MySQL’s GIS, but it is useful for much more.

There are many new small features in libmygis 0.7, but the biggest new feature is projection support (and automatic re-projection) via the PROJ.4 cartographic library. With support for projections and the ability to read Shapefile PRJ files, libmygis is getting much closer to having full support for the Shapefile format. This means you can easily import Shapefiles in any projection into MySQL and deal with them in pure lat/lon, which is what you’ll need in order to interface with outside tools such as the Google Maps API.

Making connections more manageable

For the past few weeks, off and on, as part of Proven Scaling’s project to improve the MySQL server, I’ve been helping Joel Seligstein to really dig into the MySQL source code and add some features, in preparation for a much bigger feature coming up (more on that at a future date). He has now finished three smaller projects that have been on Proven Scaling’s and my own to-do list for quite some time: SET CONNECTION STATUS status, KILL connection_id WITH QUERY query_id, and SHOW ... FOR CONNECTION connection_id. [1]

SET CONNECTION STATUS

This patch adds a new SET CONNECTION STATUS status command, which allows each session to set a status which will be shown in a new Status column in SHOW FULL PROCESSLIST.

This allows administrators to gain a bit more insight into complex multi-tier architectures, by having essentially a comment for each database connection. In SET CONNECTION STATUS, the status argument may be a complex expression, so CONCAT() and other string manipulations may be used. The current connection status may be retrieved with CONNECTION_STATUS(). Some things which I could imagine putting in the connection status are:

  • in pool — See at a glance which connections are idle in the connection pool, and which are checked out.
  • GET /foo.php — Easily see what request each connection is responsible for.
  • apache pid 672 — Allow you to easily correlate activity on a given server with activity on MySQL, without having to use netstat and friends to track things down.

Of course, there are many more creative people than me to figure out things to do with this useful feature!
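As an illustration of how an application tier might use this, here is a small Python sketch that builds the statement for a given status string. The helper names are hypothetical, the statement only works on a server carrying this patch, and passing the status as a quoted string literal is an assumption about what the patch accepts:

```python
def quote_sql_string(value: str) -> str:
    """Quote a value as a MySQL string literal (backslash-style escaping)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def set_connection_status(status: str) -> str:
    """Build a SET CONNECTION STATUS statement for the patched server."""
    return "SET CONNECTION STATUS " + quote_sql_string(status)


# A connection pool could tag connections as they are checked in and out.
on_checkin = set_connection_status("in pool")
on_checkout = set_connection_status("GET /foo.php")
```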

KILL thread_id WITH QUERY query_id

In order to make this command useful, the query_id first had to be exposed in SHOW FULL PROCESSLIST. When WITH QUERY query_id is specified in a KILL command, KILL checks that the connection is still executing the query_id you’ve specified (and locks to ensure that it does not start a new query) before killing it. This solves a well-known race condition between SHOW PROCESSLIST and KILL, where the connection may have moved on to a potentially dangerous query to kill, such as a non-transactional UPDATE.
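The check-then-kill logic can be modelled in a few lines of Python. This is only a toy simulation of the idea under the assumption that one lock guards both starting a query and killing it. The class and function names are invented, and the real patch works on server internals:

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Connection:
    query_id: int
    killed: bool = False
    lock: threading.Lock = field(default_factory=threading.Lock)

    def start_query(self, query_id: int) -> None:
        # A new query cannot begin while a conditional kill holds the lock.
        with self.lock:
            self.query_id = query_id


def kill_with_query(conn: Connection, expected_query_id: int) -> bool:
    """Kill the connection only if it is still running the expected query."""
    with conn.lock:
        if conn.query_id != expected_query_id:
            return False  # the connection has moved on, so leave it alone
        conn.killed = True
        return True
```

An unconditional kill has no such check, which is exactly where the race comes from.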

SHOW ... FOR CONNECTION connection_id

This patch extends the SHOW VARIABLES and SHOW STATUS commands with a FOR CONNECTION connection_id clause, which allows a user with either the same credentials as the connection or the SUPER privilege to view the connection’s status and variables.
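The access rule reads naturally as a small predicate. Here is a Python sketch, assuming "same credentials" means the same account name. That is a simplification, since the patch may compare more than the user name, and the function name is hypothetical:

```python
def may_show_for_connection(viewer_user: str, viewer_has_super: bool,
                            target_user: str) -> bool:
    """Decide whether a viewer may inspect another connection's
    status and variables under the rule described above."""
    # SUPER sees everything; otherwise the accounts must match.
    return viewer_has_super or viewer_user == target_user
```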

All of these features will be great for debugging production systems, where it can be difficult or impossible to get any insight into what is happening at any given moment in time.

Thanks, Joel, for the hard work!

[1] All three patches are against 5.0.26.