"Streaming" MySQL slave deployment
In August 2006 I made a post about the MySQL blackhole engine and its possible use for replication scenarios.
Back then the primary focus was to reduce the amount of data transferred between the master and possibly many slaves during normal operation. This speeds up processing on the slaves, because they have considerably less data to first store in their relay logs and then immediately discard again.
What still takes a lot of time is the initial setup of a slave system. This is because a potentially large SQL dump file has to be executed on what may be relatively low-end systems (e.g. Celeron CPUs and single IDE hard drives). While this can be made to work reliably with a set of scripts that automate downloading a zipped SQL file from a local server and running it against the client's databases, it is still a lengthy process. One setup I know of needs somewhere in the vicinity of 30 minutes to set up a slave, so I was looking for a way to speed it up.
The solution is based on the fact that you can quite easily copy MySQL's data files (including InnoDB table space files) from one machine to another. So my idea was to first create a ready-to-go set of slave data files and transfer them to multiple slaves. Usually those files will be bigger than the SQL file used previously, but no parsing and processing is required on the target system. This makes the whole thing mostly network and linear disk I/O bound.
For the impatient: I was able to get the setup time down to around 3 minutes on the same hardware. Read on if you want to know how.
The basic idea behind this concept is to relieve the individual slaves from importing SQL files themselves. As their hardware is rather slow, this can be very time consuming. So why not have a single "power-slave" - one with more powerful hardware - handle the process of importing, and then provide any number of identical slaves with its data files? This reduces the burden on the other slaves to just unpacking.
While this does not really save anything in terms of bytes that need to be written to each slave's disk, the access pattern is much more sensible. The following table compares the disk transfers for a "regular" SQL import from local disk and the proposed alternative for a 60MB gzipped SQL file which will lead to approximately 2GB of InnoDB table space files:
| Regular Import | Prepared data files |
|---|---|
| linear write: 60MB download to local disk | download of 60MB, directly streamed into 2GB of linearly written data files |
| linear write: 2GB initial creation of the InnoDB data files | n/a |
| linear read: 60MB SQL.gz, interleaved with random writes of 2GB to the data files | n/a |
| 4120MB total bytes read/written randomly | 2GB linear write |
Importing an SQL file from the local hard disk means there are constant seeks between the current position in the SQL text file and the server's data files. Moreover, as the database schema may define lots of indexes, there is even more random disk write activity when executing simple `INSERT`s.
In contrast, unpacking ready-made InnoDB table spaces (or MyISAM table files, for that matter) is basically just linear writing. Even simple IDE drives can handle that pretty fast these days.
How to...
As the process of creating a suitable SQL dump is rather simple, I will start from there. These are the steps taken in sequence:

1. Set up a fresh temporary MySQL daemon with a configuration similar to the clients'.
2. Dump the data from the master with the `--master-data` option and feed it into the temporary server.
3. Shut down the temporary server and compress its data.
4. Transfer the archive to the slaves and unpack it.
5. Adapt the slaves' config files and make some minor adjustments.
6. Run the slaves and let them connect to and catch up with the master.
I will go into a little more detail on the points above now.
Temporary Daemon
While we do not need all of the slaves to process the SQL dump individually, of course it still needs to be executed at least once. So what we do is set up a MySQL daemon with a stripped down configuration that is close to the actual slaves'. Close in this case means all parameters that affect the storage files must be matched to the slaves, because the files must be usable there later.
Anything concerning buffer sizes etc. can be different, of course.
Every time you want to prepare a new slave installation "image", the temporary daemon should be started with an empty data directory. This is particularly important for InnoDB data files, as those are a little more sensitive to being moved around. While it is not strictly necessary to delete the table space and transaction log files every time, I prefer to do so because it allows for better compression rates later if you do not need all the available space.
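To illustrate, here is a rough sketch of how such a temporary daemon could be brought up. All paths, the port and the option file name are made-up placeholders, not part of any particular setup:

```bash
# Hypothetical example: wipe the temporary data directory, recreate the
# system tables and start a throw-away mysqld on a spare port, using an
# option file whose storage-related settings match the slaves.
rm -rf /tmp/slave-image/data
mkdir -p /tmp/slave-image/data
mysql_install_db --datadir=/tmp/slave-image/data
mysqld_safe --defaults-file=/tmp/slave-image/my-temp.cnf \
  --datadir=/tmp/slave-image/data --port=3307 \
  --socket=/tmp/slave-image/mysql.sock &
```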
When using InnoDB (as I usually do) there are two strategies for configuring the size of the table space:
1. Create the files with a size that is definitely sufficient to hold all the data you import from the master, plus a little reserve. This approach requires you to know fairly precisely how much data the files will have to hold, so as not to oversize them too much. That amount may (but need not) include data you know will be added later through replication.
2. Create the files close to the size that will be needed, maybe a little more, to avoid the need for autoextension. In this case you should nevertheless specify the last data file as autoextending; otherwise the process of importing the SQL data might fill the table space prematurely, which is especially undesirable in an automated process.
I usually use the second approach. The table space I define is normally enough, but sometimes - due to fluctuations in the live data - it needs to be autoextended somewhat. I recommend configuring InnoDB to add larger chunks to the autoextending file when needed (the default is 8MB). As extending the files involves some overhead, I normally use 50MB or 100MB; the bigger this value, the less often InnoDB will have to extend the file. See the manual section on InnoDB configuration for more info on this.
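For illustration, the storage-related part of the temporary daemon's option file (the hypothetical my-temp.cnf from the sketch above) might contain something like the following. The sizes are only examples; innodb_autoextend_increment is the setting that controls the extension chunk size mentioned above:

```
# Hypothetical excerpt: the settings that affect the storage files must
# match the slaves exactly, everything else may differ.
[mysqld]
innodb_data_file_path       = ibdata1:1024M;ibdata2:1024M:autoextend
innodb_autoextend_increment = 50
innodb_log_file_size        = 64M
innodb_log_files_in_group   = 2
```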
Dump master data
Once you have the temporary daemon running, use the `mysqldump` tool with the `--master-data` and `--single-transaction` options to create a dump of the database(s) you need to be replicated. In order to save time and disk space you may find it useful to pipe the output directly through the `mysql` command line client and feed it into the target server. As I usually filter out tables I do not need on the slave, the dump file would not be good for backup purposes anyway.
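Such a pipe might look roughly like the sketch below. The host names, credentials, database name and ignored table are placeholders, and the temporary daemon is assumed to listen on port 3307 as in the earlier example:

```bash
# Sketch: dump from the master and feed it straight into the temporary
# daemon; --master-data records the master's binlog coordinates in the dump.
mysqldump --host=the.master.server --user=dump_user --password=secret \
    --master-data --single-transaction \
    --ignore-table=mydb.cache_table \
    --databases mydb \
  | mysql --host=127.0.0.1 --port=3307 --user=root
```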
Shutdown & Compress
Now shut down the temporary server and compress its data directory. Depending on how you want to configure permissions you may include or exclude the "mysql" schema. I usually have the temporary server set up with as few privileges as possible and do not move the "mysql" schema along.
My approach on the slaves rules out the ZIP format. Because I do not want to first download and then uncompress the archive on the slaves (to save disk I/O), a ZIP archive is unsuitable: it stores the list of included files at its end, so piping it through a decompression program on the fly will not work. Instead I use a gzipped tarball. This allows the slaves to download the data stream and pipe it through `gunzip` before directing it to disk.
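A sketch of this step, again with placeholder paths (here the archive is simply dropped into a directory served by a local web server):

```bash
# Stop the temporary daemon, then pack its data directory into a gzipped
# tarball; the "mysql" schema is excluded here, as described above.
mysqladmin --host=127.0.0.1 --port=3307 --user=root shutdown
cd /tmp/slave-image/data
tar czf /var/www/html/mysql_data.tgz --exclude='./mysql' .
```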
Transfer to a slave and uncompress
This is where it becomes interesting. Up to now we have merely set up a mysql daemon and imported a dump into it. Tarring up the data dir was not very special either.
On the slave I suggest `curl` as a download tool. I have not tried `wget`, but that might work as well. However, it is important that the tool you choose be able to write the downloaded file directly to standard output; otherwise it could not be piped through the `tar` command. With `curl` that is quite simple, because it is the default behavior. It also handles files larger than 2GB, which `wget` had some problems with, if I remember correctly. The command line should look similar to this:
```bash
curl http://the.server/mysql_data.tgz | ( cd /the/target/datadir ; tar xzf - )
```
Curl will download the file and pipe it to `tar` to decompress. Because `tar` cannot unpack to anywhere but the current directory, you may either first switch to the designated target directory and then download, or - as I do - invoke tar in a sub-shell that first switches there. This has the advantage of not having to jump around the filesystem in the main script handling slave setup.
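If curl is not available, something like the following might work with wget, since `-O -` makes it write the download to standard output. As mentioned above, I have not tried this myself:

```bash
# Untested alternative: wget -O - sends the download to standard output.
wget -q -O - http://the.server/mysql_data.tgz | ( cd /the/target/datadir ; tar xzf - )
```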
You will find that on a local area network downloading and unpacking will be considerably faster than having MySQL first create the empty data files and then import the SQL for the reasons stated above.
Adjust slave configuration
When the data files have reached their destination on the slave, you may need to adjust the slave settings a little. This especially depends on whether you copied fixed size data files - in which case you can prepare the config file in advance - or used the autoextend option on the last table space file. As the latter case applies to me, I wrote a little script that takes a template `my.cnf` file and replaces some placeholders via `sed`. One of those is the size of the last InnoDB data file from the archive. It will become a fixed size file on the slave; another file will then be added at the first slave start.
E.g. if the archive from the temporary daemon contains two files (`ibdata1:1024M;ibdata2:1024M:autoextend`), I use a template line like `ibdata1:1024M;ibdata2:#IBD2SIZE#M;ibdata3:512M:autoextend` and replace the `#IBD2SIZE#` part with the actual size of the file in the archive. This could be done even more dynamically by building the whole line via script, so you do not have to know in advance how many files of which size you will get from the master.
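A rough sketch of how building the whole line could look. The `#IBDATA_LINE#` placeholder is made up for this example, and the loop assumes there are fewer than ten ibdata files so the shell glob returns them in the right order:

```bash
# Build the innodb_data_file_path value from the ibdata* files that came
# out of the archive, then append a fresh autoextending file.
cd /the/target/datadir
line=""
count=0
for f in ibdata*; do
  size=$( wc -c "$f" | awk '{ print $1 / 1024 / 1024 }' )
  line="${line}${line:+;}${f}:${size}M"
  count=$(( count + 1 ))
done
line="${line};ibdata$(( count + 1 )):512M:autoextend"
# Assumes the template was already copied to /etc/my.cnf and contains a
# line like "innodb_data_file_path = #IBDATA_LINE#".
sed -i -e "s/#IBDATA_LINE#/${line}/" /etc/my.cnf
```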
Be careful not to use the `du` command to determine the file size to be used in the config file. Because `du` shows the amount of space actually occupied on disk, its output will usually differ from the real file size. This is what I use:
```bash
cp my.cnf.tmpl /etc/my.cnf
ibd2size=$( wc -c ibdata2 | awk '{ print $1 / 1024 / 1024 }' )
sed -i -e "s/#IBD2SIZE#/${ibd2size}/" /etc/my.cnf
```
First the template gets copied to the place where the daemon startup scripts will expect it (be sure the slave server is not running).
Second, `wc -c` tells you the number of characters/bytes the `ibdata2` file really takes. `awk` then just divides this number down to megabytes, which is what the config file needs. Finally the placeholder is substituted with the number just stored in the `ibd2size` variable.
See 14.2.7. Adding and Removing InnoDB Data and Log Files in the manual for more details on this process.
One more thing needs to be addressed only if one of the machines runs on Windows and the others on some flavor of Unix. Taking a look at the `relay-log.index` and `relay-log.info` files reveals that when created on a Windows machine they will contain backslashes. This prevents a Unix mysqld from initializing replication properly. So if you deal with such a situation you will need to correct those, e.g. using `sed` again:
```bash
sed -i -e 's/\\/\//g' relay-log.in*
```
If master and slaves are running on the same platform this is not required.
Once you have done all that, have a final look at the file permissions, especially on Unix systems. Make sure the daemon will be able to access the data files.
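On a typical Linux system that boils down to something like this, assuming the daemon runs as the mysql user (which may differ on your setup):

```bash
# Hand the unpacked files to the user the MySQL daemon runs as.
chown -R mysql:mysql /the/target/datadir
```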
You can now start the slave daemon. In case you configured it appropriately it should come up and await connections.
Connect to master
One last thing that needs to be done is to read the master's current binlog file name and position from the `master.info` file. This is needed because, once the slave server has been started, you will need to provide the correct credentials for the replication user. Moreover you must tell the slave which master host to connect to. Unfortunately, when issuing a `CHANGE MASTER TO` statement on the slave that defines a master hostname, all information about previous master binlogs and the position up to which they have already been applied is discarded. From the manual:
> If you specify MASTER_HOST or MASTER_PORT, the slave assumes that the master server is different from before (even if you specify a host or port value that is the same as the current value.) In this case, the old values for the master binary log name and position are considered no longer applicable, so if you do not specify MASTER_LOG_FILE and MASTER_LOG_POS in the statement, MASTER_LOG_FILE='' and MASTER_LOG_POS=4 are silently appended to it.
(see manual chapter 13.6.2.1. CHANGE MASTER TO Syntax)
So you will need to tell the slave again where to begin replicating.
One possible solution is to read the contents of the `master.info` file that was brought along with the data files into an array and inject the values into the statement:
```bash
arr=( $(cat master.info) )
mysql -e "CHANGE MASTER TO master_host='the.master.server', master_user='replication_user', master_password='the_password', master_log_file='${arr[2]}', master_log_pos=${arr[3]}"
```
The format of the `master.info` file is described in the MySQL manual.
Final checks
This should be it. As soon as you issue a `START SLAVE` statement, the slave should connect to the master and begin to catch up with whatever has happened since the point in time when the dump was taken. In case `SHOW SLAVE STATUS` tells you that replication is not running, have a look at the daemon's error log to get a hint about what is going on.
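Scripted, the final step and a quick sanity check could look like this:

```bash
# Start replication and check the replication threads, lag and last error.
mysql -e "START SLAVE"
mysql -e "SHOW SLAVE STATUS\G" | egrep 'Slave_(IO|SQL)_Running|Seconds_Behind_Master|Last_Error'
```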
Comments
example to unzip in foo directory:
tar -zxvf emacs-21.3.tar.gz -C foo