Thursday, March 27, 2008

Follow-up: Strange InnoDB Locking Behavior

On March 4th I wrote about an observation we made concerning InnoDB locking behavior in combination with foreign key constraints.

Since then we have been in contact with MySQL support and they in turn with the InnoDB developers. One of the conclusion I can already draw from the whole thing is that another level of indirection (InnoDB being a part of Oracle, i. e. an external resource to MySQL, too) does not tend to speed things up...

But be that as it is, here is a more thorough explanation we got about how the more aggressive locking comes to pass in case a foreign key constraint references a parent table's primary key. As the original response from the support staff spans several emails and includes some internal stuff I will paraphrase to protect the innocent (i. e. me ;-)).

First - just to remind myself - some notes on how the foreign key checks work. For any update you do to a child table, the DBMS needs to make sure that you cannot concurrently remove the parent table's row the FK refers to. Otherwise you could end up with a request to roll back the transaction that was intended to change a child row and be left without a suitable parent value. To do so, locks are placed on the index records being referenced by a child table's foreign key.

In the example transaction 1 locks a row in the parent table to update it. It does so by locking the primary key. It uses an X (exclusive) lock on the particular value of the PK for one row in this table. An exclusive lock blocks access to the entire row, because the primary key is a clustered index in InnoDB. That means the key itself is part of the record - in contrast to non-clustered indexes where a copy of index columns' values is placed in a location outside the actual records. So regardless of whether or not the key values themselves are changed, the record is locked completely.

Again: Because of InnoDB's structure all primary key locks are X locks, as the key itself is part of the row. In contrast to that an S (shared) lock only locks entries in a secondary index, which are stored outside the actual records.

Now, transaction 2 intends to update a record in a dependent table. It has to take a lock on the foreign key there whose reference key is the primary key in the parent table. However the exact row the child record refers to in the parent table has already been locked by transaction 1, hence operation 2 has to wait.

Transaction 3 needs to change a column in a row belonging to a child table, too. This column has a foreign key to the parent table, but not one whose parent is a primary key. It instead references a secondary key of the parent table. That means transaction 3 does not have to wait, because it does not require X lock on the entire row in the parent table. It only requires an S lock on the secondary key. As no other transaction has taken such a lock on that index entry, transaction 3 can proceed.

While all this explains perfectly why we experience a locking situation in one case, but not in the other, it still leaves the question open on how to go on from there.

Basically the one conclusion that can be drawn here already is this:

Either you get undesirable locks when foreign keys reference their parent tables' primary keys, our you have to define superfluous indexes that cost disk space and slow down write operations. In some cases you cannot even choose, because the optimizer might still decide to go for the primary key, even though you defined a secondary one, depending on the specific conditions for any given statement.

The latter part of that is especially true if you go for the - usually suggested - short primary keys in InnoDB tables, e. g. single numeric key columns. While those are generally a good idea especially for InnoDB they work against you in the scenario at hand.

Right now, the only suggestion we got from the MySQL folks is to rather go for less indexes, because the cost they introduce is usually significant in contrast to the locking problems. This is hard to quantify however and may depend heavily on your application's access patterns. Making matters still worse is InnoDB's current (5.0) limitation that it will lock all rows that are examined while executing any particular DML statement. Only in 5.1 only rows that get actually changed will be locked, alleviating the situation to some degree.

There might still be something coming here, however right now I do not think there is a big chance of this being changed any time soon. The one thing I will ask for however is to improve the documentation on foreign keys to have people know about this stuff in advance.


Sheeri K. Cabral said...

From the original post, I thought the foreign key on the *non-primary* index was a problem.....

Also, I would imagine it would be much faster to insert a foreign key value that references the primary key on another table, than to reference a non-unique key on another table...

So I'm confused....can you give a 1-2 paragraph summary of the problem and the outcome? I don't see how you can use fewer indexes, unless of course you mean "reference the primary key instead of the other key".

david said...

...go for less indexes, because the cost they introduce is usually significant in contrast to the locking problems....Making matters still worse is InnoDB's current (5.0) limitation that it will lock all rows that are examined while executing any particular DML statement

The less indexes you have the more rows need to be examinated and thus more rows are being locked? Right? So this solution by MySQL is not really an option, is it?

Daniel Schneller said...

The problem came about when we _removed_ the formerly present but logically superfluous secondary indexes (all their columns were also covered by the primary key).
As long as those had been in place, we did not see locking problems, because they got locked with S-locks, but not the PK/record with Xs.
Once we dropped them and only kept the PK (retargeting any foreign keys to that, of course) the locking situations arose for the reasons told.

You are right, this is not really an optimal advice. However I omitted the part that suggested changing our DML statements to that only the bare minimum of rows gets touched per transaction. So right now we have to decide which route to go:

1) change the application and try to prevent the locks as far as possible (unlikely, very expensive)

2) Keep up the logically unneeded indexes to prevent the newly encountered locking problems, but also live with the performance (space, time) penalty (more likely, event though not great either)

In our special case I do not think we would end up with more rows being scanned, because the non-PKs we dropped were fully covered by the primary keys columns, but locking those is bad enough already...

From what I hear, 5.1 will only lock rows that are going to be actually changed, but swapping out the database engine underneath a production app is nothing for the faint of heart (i. e. currently an absolute no-go).

Dan said...

Thanks for your post, Daniel. I was trying to debug a similar locking situation involving foreign keys.

To help see what's going on under the covers, I'm using Innotop, which is proving to be invaluable.

You say that a parent row (referenced by a child's foreign key) is locked on UPDATE, but in my testing, I've found that this only happens on INSERT, according to Innotop.

Could you verify this? Thanks -Dan