A constructive start for post-election 2016

[I have copied this content from my original post on Facebook, and have backdated it appropriately to maintain context.]

I don’t normally say much publicly about donating anything, and until lately about politics or social causes, either. I only really occasionally mention my support of loans through Kiva (mainly because it’s a long-running passion of mine). Today is a bit different, though; today, in honor of our democratic process, I thought a number of organizations could use a bit of extra help for the coming months and years.

So, as a constructive start, I’ve donated $1,000 each to the following organizations:

  • Planned Parenthood
  • Center for Reproductive Rights
  • Reproductive Health Access Project
  • RAINN
  • ACLU Nationwide
  • NAACP
  • Southern Poverty Law Center
  • The Young Center for Immigrant Children’s Rights
  • Transgender Law Center
  • Gender Diversity

If you, too, are looking upon the election outcome and wondering how the hell we got to this point of intolerance, overt racism, sexism, misogyny, homophobia, xenophobia, and hate, then I encourage you to find a few organizations that you can support, too, in whatever way you can.

I’m certainly not going to ask anyone to give all your money to the DNC (ugh), or run out into the streets with torches. I certainly don’t believe that a president Trump and Republican majority across the government will instantly be the end of the world, or even that half of the rhetoric upon which Trump’s entire disgusting campaign was based is even serious.

However, I do believe this is a clear signal from a large number of my countrymen that they support divisiveness over inclusion, fear over understanding, spectacle over reason, and hatred over love. And I can’t say I am surprised, but it leaves me dumbfounded and heartbroken. Not for Hillary Clinton’s loss–I could have easily supported any other serious candidate on either side without much regret–but for the loss of the dignity of our country. And, not even the loss, but the recognition of that loss, or the realization that that dignity was a farce all along.

This is not what *I* stand for, it’s not what I want my children to grow up with, and certainly not what I want my grandchildren to face. We can do a hell of a lot better. Why don’t we even bother trying? Why can’t we get past the color of a person’s skin, or the gender of a person as an authority figure, or the religion practiced by another person? How do we have so little compassion for others? Why do we continually act in a self-serving way and abuse others? I struggle to accept that the human race is just incorrigibly terrible, but I don’t see anything materially changing, either. I want more than anything for America to be a positive example to the world, but we just keep proving that we don’t have the courage for it.

While I fully respect the democratic process that allows for a country to be cast in whatever way its constituents decide, I am sure I don’t have the stomach to sit idly by while it is cast in *this* way. The fact that it has even gotten this far gone makes me feel physically ill, and doubly so because I know or am related to so many of the people who have made it so.

So, I won’t stomach it anymore. If you’ve ever claimed “Obama is a muslim” or “but he’s black”; if you’ve called someone a “nigger” or “sand nigger” or “towel head” or “wetback”; if you’ve championed from your sofa to restrict the rights of women or called Hillary a “bitch”, or made any reference to her inability to serve due to being a woman; if you’ve called someone a “fag” or “dyke” or “tranny”; seriously: fuck you.

Yes, I know that includes quite a number of people closely related to me. Sorry, but I am not sorry. This has to stop.

Look inside yourself, ask yourself: Does this make you a better person? Why do you do it? Do you truly believe that you’re just better than these people? Can you defend your ideas and ideals? Can you defend your choices? Can you defend your sense of decency?

Thanks, Oracle, for fixing the stupid and dangerous SET GLOBAL sql_log_bin!

As of MySQL 5.5.41, released on November 28 (last week), Oracle has fixed MySQL Bug 67433, which I filed on October 31, 2012 and wrote about 4 months ago in Stupid and dangerous: SET GLOBAL sql_log_bin. You can see the fix in Launchpad revision 4718.

The MySQL 5.5.41 release notes mention:

Replication: The global scope for the sql_log_bin system variable has been deprecated, and this variable can now be set with session scope only. The statement SET GLOBAL SQL_LOG_BIN now produces an error. It remains possible for now to read the global value of sql_log_bin, but you should act to remove from your applications any dependencies on reading this value, as the ability to do so will be removed in a future MySQL release. (Bug #67433, Bug #15868071)

Thanks for addressing this and making the world’s data (and DBAs’ jobs) a little safer, Oracle! Special thanks to Morgan Tocker (MySQL Community Manager at Oracle) who cared about it enough to keep pushing it through to a reasonable fix/resolution.

Nine years later, Citibank’s fraud department is still self-defeating

Back in September 2005, I wrote Actually, No. about Citibank’s self-defeating fraud department, and the follow-up two months later Reply from Citibank which was just a form letter noting that they had received my complaint.

In that situation, their fraud department would call from an unverifiable number, potentially leaving a voicemail for you, asking you to call them back at an arbitrary unverifiable phone number, where they immediately ask for your account details to verify a transaction. As far as I know they still continue that asinine behavior.

Today, I got a new one: Citibank called to verify a transaction (presumably a large-ish payment I made to a new payee using online bill pay), and the caller asked me to verify receipt of an SMS message (“for security reasons”) by reading him the numeric code it contains, in order to verify they are in fact speaking to me. I told him that I received the message, but I was not comfortable giving him the code, as I could not verify who he is. I explained that he could just as easily be trying to e.g. gain control of my account and needing the verification code to complete a password reset or similar (social engineering me). I told him I would call the number printed on the back of my card, and asked him how to get back to him when doing so. He said he would leave a note on my account.

Of course, when I called back and authenticated to the agent, the agent tried to internally transfer me to the number left in the note (the fraud department), but instead got a message saying the department was closed (open only until 6pm Eastern time) despite someone from that department calling me literally 1 minute prior. It took the customer service agent tracking down the specific employee who left the note, and messaging him, to get me transferred to his personal extension, since going through the department extension wouldn’t work. In the end, it was in fact Citibank’s fraud department calling me, we verified the transaction they wanted to verify, and everything is cool on that end.

However, the bigger issue is that Citibank’s “fraud department” procedures are asinine and extremely self-defeating. I guess it’s time for another complaint to the executive office, and probably another form-letter reply.

Visualizing the impact of ordered vs. random index insertion in InnoDB

[This post refers to innodb_ruby version 0.9.11 as of October 2, 2014.]

Many DBAs know that building indexes in “random” order (or really any order that greatly differs from ordered by key) can be much less efficient. However, it’s often hard to really understand why that is. With the “-illustrate” visualization modes available in innodb_ruby, it’s possible to quite easily visualize the structure of indexes. The space-lsn-age-illustrate mode to innodb_space allows visualization of all pages in a space file by “LSN age”, generating something like a heatmap of the space file based on how recently each page was modified.

(Note that a small Ruby script generate_data_simple.rb was used to generate the test tables used below. You’ll also want to have a reasonable monospace font supporting Unicode block characters correctly, for which I’d heartily recommend Adobe’s Source Code Pro.)

Building an index by insertion in key order

Inserting rows into an index in key order is much more efficient primarily for two reasons:

  1. Pages can be filled completely, and the database (with sufficient optimizations present) can detect the “bulk loading” behavior and efficiently split pages by creating new, empty pages, rather than splitting pages in half.
  2. Only the “edge” of the index is being written to; once a page has filled it will not be revisited. This can make the buffer pool, which caches database pages, much more effective.

Here’s an example of a table with only a PRIMARY KEY clustered index, built by inserting rows in key order:

$ innodb_space -s ibdata1 -T test/t space-lsn-age-illustrate

As you can see in the above image, the index pages are written to in nearly perfect order as they are allocated from the beginning of the file towards the end.

The first extent (the first line of the image) is allocated as a fragment extent, and contains single pages allocated for different purposes. You can clearly see the tablespace bookkeeping pages, FSP_HDR at page 0 and INODE at page 2, immediately followed by the root index page at page 3, all very recently modified. Following that are 32 individually allocated “fragment” pages which were allocated first in the index before it switched to allocating full extents. Then the index pages progress perfectly until the end of the used space.

Also note the LSN age histogram (printed in white above the colored legend at the bottom) shows all buckets equally filled.

Building an index by insertion in random order

Looking at an index built by insertion in key order was reassuring. What if it is built in random order instead? The result is perhaps a bit surprising:

$ innodb_space -s ibdata1 -T test/t_shuffle space-lsn-age-illustrate

Because the rows were inserted in completely random order, every page has an equal chance for insertion. This means in practice that every page is very recently modified, and this is clearly visible by the entire heatmap being purple. What this also means is that the entire table must be continually present in the buffer pool, and if it doesn’t fit, performance will suffer greatly. This is the main reason that performance of insertion in random order is terrible.

Additionally, you can see a few misbehaviors of InnoDB: Note the extents starting at page 1088, 1152, and 1216. Why do you think they look like that? Let me know your ideas in the comments.

Building a primary and secondary index in primary key order

What if you have multiple indexes? Looking at a very simple and typical case, inserting data in key order by the primary key, while a secondary index exists:

$ innodb_space -s ibdata1 -T test/t_index_before space-lsn-age-illustrate

Notice that this looks like the previous cases perfectly interleaved together, because it is exactly that. Since the primary key and secondary index contain completely different data, insertion is ordered by the primary key, but completely mis-ordered by the secondary index, resulting in the secondary index being built inefficiently.

Adding a secondary index to an existing table

The obvious answer then to the previous problem is to add the index after the data has been loaded, and this does in fact result in the expected outcome:

$ innodb_space -s ibdata1 -T test/t_index_after space-lsn-age-illustrate

When the index is built on the existing table (via ALTER TABLE ... ADD INDEX), it is built by scanning and sorting the data before insertion into the index, resulting in an optimal (and very fast) index build.

Visualizing page fill rate

While the above illustrations show how recently each page was modified during the index build, it’s also possible to use space-extents-illustrate to visualize how full each page is. The key-ordered index looks like this:

$ innodb_space -s ibdata1 -T test/t space-extents-illustrate

Compared to the random-ordered index:

$ innodb_space -s ibdata1 -T test/t_shuffle space-extents-illustrate

The random-ordered insertion caused pages to be split more frequently, and in some cases to be severely under-filled, causing a very poor page fill rate on average.

Those of you who are particularly observant will have noticed that the index illustrations in the first pictures above showed that the ordered-insertion index is significantly smaller than the random one. You can see here that the random-insertion index is 41% larger at 1043 pages compared to just 737 pages for the ordered-insertion index. Additionally, 206 more pages are left unused, making the actual disk space usage 57% larger.

Stupid and dangerous: SET GLOBAL sql_log_bin

It’s been almost 4.5 years since, during some code refactoring, it was decided (or accidentally changed?) that sql_log_bin should become a GLOBAL variable as well as a SESSION one. Almost 2 years ago, during MySQL 5.5 upgrades at Twitter, I filed MySQL Bug 67433 describing in detail how stupid and dangerous this change was, and asking for a reversal.

Nothing has been changed or reversed, so SET GLOBAL sql_log_bin continues to be allowed. I continue to directly see, and hear about damage caused by incorrect use of SET GLOBAL sql_log_bin. So, let me describe just how stupid and dangerous it really is.

What really happens when you run SET GLOBAL sql_log_bin?

If you have a master-slave replication configuration, your master will have binary logging enabled (log_bin) and be recording each transaction to its local binary logs, which the slaves read (more or less in real time) in order to replicate. This log must be complete for replication to be useful and trustworthy. If it is incomplete, serious data loss may occur.

It’s been possible for a long time to use SET [SESSION] sql_log_bin=0 within a connection (with a SUPER user) to temporarily disable binary logging of a session. This can be quite useful to be able to run e.g. ALTER TABLE commands without having them immediately replicated to the slaves.

Most people would guess that running the very similar SET GLOBAL sql_log_bin=0 would disable binary logging for all sessions (you know, GLOBAL?) immediately. However, that’s not quite the case.

As background, there are essentially three different scopes of variables in MySQL server:

  • Global-only: The variable only exists, and only makes sense, in a global scope. An example is innodb_buffer_pool_size — there is only one buffer pool, so it makes no sense in any session context. These variables may also be read-only, so that they can’t be changed at runtime.
  • Global and session: The global variable defines the session default, and is copied to the session variable on connect. A user may change the variable within their session if they want, and there may be restrictions on doing so. If the global variable is changed, it will only affect new sessions; existing sessions will keep the variable they copied at session start. A typical example of this is max_allowed_packet.
  • Session-only: There is no global variable associated with the session variable. A user may change the variable only within their session.

The sql_log_bin variable was previously Session-only and with the commit mentioned above, it was made Global and session.

There are a few implications of this:

  • When a session is initiated, it copies its sql_log_bin value from the global scope to its session scope copy.
  • If the DBA issues SET GLOBAL sql_log_bin=0, it only affects new sessions, any already-connected sessions will continue logging. This makes it pretty useless for managing a failover.
  • Changes to the global variable using SET GLOBAL do not even affect the currently connected session; issuing SET GLOBAL sql_log_bin=0 within a session does not prevent the following commands in that session from being binary logged. This is actually often the first sign that something didn’t work as the DBA expected—they suddenly find all slaves to be executing something they thought they had prevented from being replicated.
  • If a connection is made while sql_log_bin=0 and the DBA then issues SET GLOBAL sql_log_bin=1 the sessions which started while it was 0 will continue to not log anything to the binary log for as long as they are allowed to stay connected. This also makes it pretty useless for managing a failover.
  • If the DBA accidentally issues SET GLOBAL sql_log_bin=0 briefly, even for milliseconds, the entire master-slave replication topology may be irreparably damaged and if this is done without immediately noticing it, permanent and severe data loss may occur.

Is anyone putting it to good use?

The only comment that MySQL Bug 67433 has received from Oracle was from Sveta Smirnova, saying:

Regarding to “revert” I don’t agree: with global access it is possible to make a server master online, without bringing it down.

If you Google search for “set global sql_log_bin” you can find plenty of examples of people talking about it in various languages. I couldn’t find a single example of a script or person using or describing its use correctly, and I looked through pages and pages of examples.

So here’s a Hall of Shame instead:

Also notably, the MySQL manual text for sql_log_bin also still does not adequately describe or warn about its behavior.

Your move, Oracle

Do the right thing, and revert this change. We even provided a patch.

Update: Morgan Tocker, MySQL Community Manager at Oracle is requesting feedback on the future of SET GLOBAL sql_log_bin. Go and make your opinion known!