Idea: A “system” localization for MySQL

Currently, the English error messages are embedded in the result files of the MySQL test suite. This means that you can’t really update the English messages without breaking a bunch of tests. I’m not sure if there’s a standard way to fix this, but it occurs to me that it would be quite easy to add a “system” localization that prints a language-neutral version of each error, so that the English (or any other) translation could be updated without breaking any tests.

For example, the following query, which references a nonexistent column, gives this message in English:

mysql> select foo;
ERROR 1054 (42S22): Unknown column 'foo' in 'field list'

This is based on the following definition in the errmsg-utf8.txt file:

        eng "Unknown column '%-.192s' in '%-.192s'"
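In this format string, %-.192s is a printf-style string conversion: the - flag requests left-justification (which has no visible effect without a field width), and .192 is a precision that caps the substituted value at 192 characters. Python’s % operator happens to accept the same specifier, so the rough sketch below uses it as a stand-in for the server’s own formatting code (an approximation only, not how the server does it internally) to reproduce the message above and show the truncation:

```python
# Rough approximation of how the eng template for ER_BAD_FIELD_ERROR is filled in.
# Python's %-formatting stands in for the server's C formatting code here.
ENG_TEMPLATE = "Unknown column '%-.192s' in '%-.192s'"


def format_eng(template: str, *args: str) -> str:
    """Substitute args, in order, into a printf-style template."""
    return template % args


message = format_eng(ENG_TEMPLATE, "foo", "field list")
print(message)  # Unknown column 'foo' in 'field list'

# The .192 precision truncates longer values to 192 characters.
long_name = "x" * 300
truncated = format_eng(ENG_TEMPLATE, long_name, "field list")
print(len(truncated) - len("Unknown column '' in 'field list'"))  # 192
```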

In a test case this might be codified as:

--echo # Test that ER_BAD_FIELD_ERROR works.
--error ER_BAD_FIELD_ERROR
SELECT foo;

The --error command tells mysqltest to accept the error as expected; however, the text of the English error message still ends up in the test result file:

# Test that ER_BAD_FIELD_ERROR works.
SELECT foo;
ERROR 42S22: Unknown column 'foo' in 'field list'

Changing the text of the message, even in a trivial way such as fixing a typo, will cause the test to fail with a mismatch on the error message string, since result files are compared as plain text when the tests run:

main.test_message                        [ fail ]
        Test ended at 2013-04-08 17:29:30

CURRENT_TEST: main.test_message
--- mysql-test/r/test_message.result	2013-04-09 03:26:27.516721785 +0300
+++ mysql-test/r/test_message.reject	2013-04-09 03:29:30.360718783 +0300
@@ -1,3 +1,3 @@
 # Test that ER_BAD_FIELD_ERROR works.
 SELECT foo;
-ERROR 42S22: Unknown column 'foo' in 'field list'
+ERROR 42S22: Unknown column 'foo' found in 'field list'

mysqltest: Result length mismatch

A sys “language” could easily be added, however:

        sys "ER_BAD_FIELD_ERROR({%-.192s}, {%-.192s})"
        eng "Unknown column '%-.192s' in '%-.192s'"

Ideally, these could of course be auto-generated from the context already present: the error name and the format specifiers in the existing messages. When running with this localization, the same error would result in:

mysql> select foo;
ERROR 1054 (42S22): ER_BAD_FIELD_ERROR({foo}, {field list})

This would preserve the language neutrality of the tests and allow the English messages to be tweaked for better readability without breaking the world.
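Since the conversion specifiers in each eng string already describe the arguments, generating the sys line looks mechanical. Here is a minimal Python sketch, assuming the sys format is exactly the NAME({spec}, …) shape shown above and that a simple regular expression covers the specifiers in use. The helper names are hypothetical and not part of MySQL’s build tooling:

```python
import re

# Matches a literal "%%" or a printf-style conversion such as %s, %d, %lu or %-.192s.
# Assumption: this covers common specifiers, not every one the server's formatter accepts.
SPEC_RE = re.compile(r"%%|%[-+ #0]*\d*(?:\.\d+)?[hlL]?[sdiuxXc]")


def conversion_specs(fmt: str) -> list[str]:
    """Return the conversion specifiers in fmt, in order, ignoring '%%'."""
    return [m.group(0) for m in SPEC_RE.finditer(fmt) if m.group(0) != "%%"]


def make_sys_template(error_name: str, eng_fmt: str) -> str:
    """Build a language-neutral template: NAME({spec}, {spec}, ...)."""
    args = ", ".join("{" + spec + "}" for spec in conversion_specs(eng_fmt))
    return f"{error_name}({args})"


sys_fmt = make_sys_template("ER_BAD_FIELD_ERROR", "Unknown column '%-.192s' in '%-.192s'")
print(sys_fmt)                          # ER_BAD_FIELD_ERROR({%-.192s}, {%-.192s})
print(sys_fmt % ("foo", "field list"))  # ER_BAD_FIELD_ERROR({foo}, {field list})
```

Keeping each original specifier inside the braces, as the sys line above does, means the substituted values are still truncated and converted the same way as in the English message.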

This would of course require one massive commit to fix the tests when switching the test suite to run under the new “sys” language…

What do you think? How do other systems (especially databases) handle this?

8 thoughts on “Idea: A “system” localization for MySQL”

  1. Tests don’t really need to be language neutral, and I’m afraid we would lose valuable test coverage — after all, most users will likely be getting English messages.

    Another approach is to change mysqltest to suppress acceptable variations in error messages.

    Anyway, I think it’s fine to “break” test cases when updating messages, it provides a nice barrier against frivolous changes to error messages.

    PS: the same “problem” applies to thread states and other similar states/messages; quite a few tests rely on these for synchronization.

    • I don’t think you lose actual coverage, unless the test is supposed to test that the exact string in the English error message is output. IMHO, the test should check that the correct error *code* is generated and that the values substituted into the message are the correct values. A synthetic, language-neutral string tests that just as well, while allowing the English string to be updated much more easily.

      • You lose coverage for the formatting process of English messages. The server formats messages using local buffers and whatnot, and this process might be susceptible to problems such as overflow, truncation, improper encoding, etc., and these are related to the message format (which is in a particular language). Although not all languages are tested, these code paths are reasonably well tested for our main language (English).

        Nonetheless, we could still allow messages to be updated more easily. Since the problem is that mysqltest displays error messages, we could fix mysqltest to eliminate any ordinary characters and preserve only the output generated by conversion specifiers. This shouldn’t be difficult, as mysqltest can easily compare a resulting error message with its unformatted form.

  2. Neat idea, but more on the workaround side. I agree with Davi that you’ll lose coverage this way. What are you trying to solve with this?

  3. I don’t get the “lose coverage” part. Right now tests can only be run with a certain language setting … and with the proposed change they still can only be run with a certain language setting … it would be “SYS” instead of “english” then, but still all but one language in errmsg.txt wouldn’t have coverage anyway …?

  4. Neat idea, though I don’t think error messages are changed so often that this is a big issue. A problem with this approach is that the results become less readable for humans. When a test fails due to an *unexpected* statement failure, I think you’d prefer to see the English error message rather than this somewhat cryptic “SYS” message.

  5. I’m wondering if your current investigations into localization are prompted by someone unhappy with the current state of it, or if it is just curiosity.
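The alternative raised in the first comment thread, having mysqltest compare each error message with its unformatted form and keep only the output of the conversion specifiers, can be sketched in Python. This is an illustration only: mysqltest is not written in Python, and the function names are hypothetical. The format string is turned into a regular expression in which literal text must match exactly and each specifier becomes a capture group:

```python
from __future__ import annotations

import re

# printf-style conversions such as %s, %lu or %-.192s; "%%" is a literal percent sign.
SPEC_RE = re.compile(r"%%|%[-+ #0]*\d*(?:\.\d+)?[hlL]?[sdiuxXc]")


def format_to_regex(fmt: str) -> re.Pattern[str]:
    """Turn a printf-style format into a regex with one group per conversion."""
    parts = []
    pos = 0
    for m in SPEC_RE.finditer(fmt):
        parts.append(re.escape(fmt[pos:m.start()]))  # literal text, matched exactly
        parts.append("%" if m.group(0) == "%%" else "(.*?)")
        pos = m.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile("".join(parts), re.DOTALL)


def substituted_values(fmt: str, message: str) -> list[str] | None:
    """Return only the values that were substituted into fmt, or None if no match."""
    m = format_to_regex(fmt).fullmatch(message)
    return list(m.groups()) if m else None


old_fmt = "Unknown column '%-.192s' in '%-.192s'"
new_fmt = "Unknown column '%-.192s' found in '%-.192s'"
print(substituted_values(old_fmt, "Unknown column 'foo' in 'field list'"))
print(substituted_values(new_fmt, "Unknown column 'foo' found in 'field list'"))
# Both print ['foo', 'field list'], so a result file built from them is unaffected.
```

One limitation of this approach: if a substituted value itself contains the literal text that follows its specifier, the match is ambiguous, and the non-greedy groups will pick the shortest possible value for the first one.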
