When setting up MySQL replication, there are some things to remember. Although the setup is quite easy if you thoroughly read the documentation on MySQL's developer site, you might still hit some issues.
We have quite a large-scale replication setup (MySQL 4.1.12) with several hundred slaves. Today we saw a very strange situation: all of the slaves stopped replicating and claimed that a statement had been partially executed on the master side. The exact message was:
Query partially completed on the master (error on master: 1053) and was aborted. There is a chance that your master is inconsistent at this point. If you are sure that your master is ok, run this query manually on the slave and then restart the slave with SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;
Error code 1053 means roughly "server shutdown".
We checked the master and could not find anything unusual. The server had not been shut down at all and nothing seemed wrong with the master replication settings either.
In the end we found out what had caused the problem: someone had tried to create a dump of the master server using mysqldump --master-data. This implies a FLUSH TABLES statement. Because that statement took too long, it was aborted using the MySQL KILL command. However, because the statement had already been replicated to the slaves and was now aborted, the slaves took it for "partially executed" (which is usually something bad). The error code you can see on a killed client is 1053. So the slaves decided that something serious had happened and stopped the communication with their master.
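With several hundred slaves, it can be handy to recognize this situation automatically instead of reading each message by hand. The following is only a rough sketch in Python: it assumes the slave's message is already available as a string (for example copied from the Last_Error column of SHOW SLAVE STATUS or from the slave's error log), and the function names are made up for illustration. It merely pulls the master-side error code out of the text and flags 1053. It does not decide whether skipping the statement is safe.

```python
import re

# Matches the "error on master: NNNN" fragment of the slave's message.
MASTER_ERROR_RE = re.compile(r"error on master:\s*(\d+)")

# 1053 is the code reported for the killed statement in the incident above.
ER_SERVER_SHUTDOWN = 1053


def master_error_code(last_error):
    """Return the master-side error code embedded in the message, or None."""
    match = MASTER_ERROR_RE.search(last_error)
    return int(match.group(1)) if match else None


def looks_like_killed_statement(last_error):
    """Heuristic only: True if the slave stopped on master error 1053."""
    return master_error_code(last_error) == ER_SERVER_SHUTDOWN


# Message text taken from the incident described in this post (shortened).
message = (
    "Query partially completed on the master (error on master: 1053) "
    "and was aborted. There is a chance that your master is inconsistent "
    "at this point."
)

print(master_error_code(message))
print(looks_like_killed_statement(message))
```

A match only tells you that the master reported 1053 for the statement. You still have to check the master yourself before following the advice in the message and restarting the slave with SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;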
I will have to look more closely into the documentation, but I will probably have to file a bug report, as replication should not suffer from this special case.