I just opened MySQL Bug #26123, to attempt to find out how many people are seeing this possible replication bug. A few Proven Scaling customers have seen the same bug, and I haven’t been able to reproduce it, so I opened a bug as a feeler. It appears to have something to do with using BLOB or TEXT fields in replication.
Are you seeing slaves stop with corrupted relay logs? Does restarting replication using CHANGE MASTER and the Exec_Master_Log_Pos from the stopped slave1 work just fine? Do the master’s binary logs look perfectly OK? Leave a comment on the bug.
1 This effectively forces the slave to re-download the exact same log events that it currently has in its relay logs. Since the corruption appears to happen either in the master’s slave thread, or the slave’s replication IO thread, this gets things going again.