Tuesday, March 04, 2008

InnoDB Primary Key and Secondary Index Locking Strangeness

Recently we noticed a strange locking problem within our application. While the previous release worked flawlessly the current version produced a number of Lock Wait Timeouts.

The code itself did not change in the area we experienced the problems, however our persistence framework had been updated to a new release in conjunction with our app.

One notable difference was the schema generation strategy (the schema is generated based on an UML class model). In the previous version primary key columns were always created alphabetically. Because that is not necessarily a desirable thing it always created a secondary index, too, that contained the same columns in a user-specified order. In case you really wanted the alphabetical order you ended up with two identical indexes. Especially for InnoDB this is a very inefficient thing to do. (See my Index Analyzer tool for more info on why.)

Often we ended up with something like this:

CREATE TABLE `parent` (
    `A` char(3) NOT NULL default '',
    `B` char(2) NOT NULL default '',
    `C` decimal(12,2) default NULL,
    PRIMARY KEY (A, B),
    KEY `IDX_P` (B, A)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

In this table the Primary Key and IDX_P are both referring to the same columns, just in a different order.

With the new persistence framework release those secondary indexes are not generated anymore. Instead, the primary key order is used as defined in the object model.

Although this is generally an improvement we appreciate a lot, it turns out this did have a negative side-effect of accidentally introducing a different locking behavior we would not have expected:

create database if not exists testcase;
use testcase;
drop table if exists child1;
drop table if exists child2;
drop table if exists parent;

CREATE TABLE `parent` (
    `A` char(3) NOT NULL default '',
    `B` char(2) NOT NULL default '',
    `C` decimal(12,2) default NULL,
    PRIMARY KEY (A, B),
    KEY `IDX_P` (B, A)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `child1` (
    `X_parent_A` char(3) default NULL,
    `X_parent_B` char(2) default NULL,
    `id` bigint(20) NOT NULL default '0',
    PRIMARY KEY (`id`),
    CONSTRAINT `child1_parent` 
       FOREIGN KEY (X_parent_A, X_parent_B) 
       REFERENCES parent (A, B) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `child2` (
    `X_parent_A` char(3) default NULL,
    `X_parent_B` char(2) default NULL,
    `id` bigint(20) NOT NULL default '0',
    PRIMARY KEY (`id`),
    CONSTRAINT `child2_parent` 
       FOREIGN KEY (X_parent_B, X_parent_A) 
       REFERENCES parent (B, A) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO parent (C,A,B) values (11.95, '111' , '11'); 

This sets up three tables and populates the "parent" table with a single row.

Now we go ahead and try some work on those tables in two separate transactions:

Transaction 1

Note that this transaction has not been committed yet. It makes a change to the single row we have in that table. However it does not change the primary key value!

Now in a second transaction try something with a child table:

lockwait2To me this does not really seem like a difficult task for the database to perform: Insert a new child row, referencing the parent table. Even though there is a concurrent update on that parent row in progress, it does not intend to change the primary key value which means the referential integrity is going to alright, no matter what transaction gets committed first.

In our app this is a very common pattern and we would definitely have noticed if that was something MySQL/InnoDB had problems with.

This is the relevant portion of the InnoDB status output on locks, when the UPDATE statement above has been issued in the first transaction:

inno_lock1

As you can see, there is a lock on the primary key of that single row we are updating.

When we try to insert into the "child1" table, this is what we get regarding locks:

inno_lock2

This notoriously hard to read output (see Xaprb's Maatkit) says that the INSERT INTO "child1" is waiting for a record lock on the master table to be granted. Because this lock is being held by the update transaction which has not been committed yet, the INSERT will eventually time out and report the error we have seen.

So far, this might not be obviously strange. But look at this now ("transaction 3"):

trans3

Compare transactions 2 and 3 carefully. You will see that they are identical in terms of the values. The only difference is that the former inserts into "child1" and the latter into "child2".

Here are the relevant parts from the DDL of tables "child1" and "child2":

CREATE TABLE `child1` ( ...
    CONSTRAINT `child1_parent` ...
        REFERENCES parent (A, B) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `child2` ( ...
    CONSTRAINT `child2_parent` ...
       REFERENCES parent (B, A) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

As you can see the foreign key constraint defined in "child1" references the primary key columns in the parent table. "child2" uses the same columns, but in the order they are contained in the secondary "IDX_P" index.

According to the manual you need not reference the primary key in foreign key definitions, but any index that begins with the referenced columns in the right order will do. But obviously there is an important difference: Modifications to all child tables will block in case a referenced record's PK has been locked, even though its value is not being changed, whereas using a secondary index will happily allow you to run both statements concurrently without conflicts.

For completeness' sake we made some more tests, e. g. with a secondary index that's defined in the same order as the primary key. In that case you will get the locking in both transactions, because then InnoDB always uses the Primary Key and hence locks updates in both child tables. And of course you will get a locking situation on the child tables when you are going to modify the primary key value in the parent table.

Currently MySQL/InnoDB are looking into this, because from what I hear from their support staff, they find it curious as well. I'll follow up on this one for sure :)

1 comment:

david said...

Thanks for sharing this! Very interesting! I'm asking myself if MySQL is considering this as a bug or a feature :-)?