Special Characters in MySQL Replication

MySQL uses command-based replication. That means that it sends the actual insert, update, or delete command down to the replicated database, instead of sending the data itself. Most database systems ship the actual transaction log, which ensures that the databases stay in sync. But MySQL's choice to use command-based replication causes a few problems.

One of the most insidious problems has to do with characters in text fields. In order to put special characters in a text field, you have to escape them. You can't just put a apostrophe in a text field, because the text itself is set off by apostrophes, or single-quotes. So you have to put a backslash in front of it. That tells the database to take the apostrophe as a literal.

This works fine for the master database. And in fact, JDBC and other database drivers will do this for you when you use prepared statements. However, this fails with MySQL's command-based replication.

In order to replicate the data, MySQL turns the data back into an insert, update, or delete command. In so doing, it fails to properly escape the text. That means that text containing an apostrophe that you properly escaped and inserted into the master will break replication. When the improperly escaped command is sent to the replicated database, it fails, and backs up all commands that come later.

A colegue suggested that we double-escape the text. That is, he preceeded each apostrophe with three slashes, not one. The first two become one slash, and the third escapes the quote. Sure enough, the replicated command was properly escaped. Unfortunately, this led to inconsistent data. The data on the master contained a slash-quote while the data on the replicated database had just a quote. Not a valid solution.

The only solution is to avoid apostrophes and slashes in the data altogether. They must be converted into different characters or dropped from the text. Typical solutions will be to convert apostrophes to ticks and backslashes to forward slashes.

This is not just a problem with text fields. Binary data is treated as text when converted into a command for replication. If that binary happens to contain an apostrophe, then it will have the same problem. The best solution here is to base 64 encode the binary prior to inserting it, then decode it on the way out. The base 64 alphabet contains no special characters, and is therefore safe for replication.

Leave a Reply

You must be logged in to post a comment.