Sunday, January 14, 2007

Conditional INSERT with MySQL

Last week I needed to write an update SQL script that would insert some records into a database. However as the script was to be incorporated into a software update that was going to be deployed to all our customers, it needed to check some condition first and see, whether the insert was required at all.

Even though MySQL provides some non-standard SQL enhancement there is no INSERT IF EXISTS kind of statement.

I managed to do it using some temp-tables and a combination of INSERT IGNORE ... SELECT FROM and UPDATE ... WHERE statements.

The following example demonstrates how to insert a new row of data in the table named real_table. The data must only be inserted into this table, if another record in a table named condition_table exists. No change of real_table must occur, if there is no record matching the condition.

Moreover (because I needed this) the example takes take of modifying a possibly present record in real_table that already has the same primary key value as the one going to be inserted (PKvalue).

Now, how are we going to do this. First thing we need is a temporary table that resembles the structure of the real_table. It is going to keep the new row(s) for a moment. MySQL makes it simple to create a new temp table with the CREATE TABLE ... LIKE syntax:

-- create temp table with same structure as the real table
CREATE TEMPORARY TABLE temp_tbl LIKE real_table;

-- insert the new record into the temp table only
INSERT INTO temp_tbl (columnA, columnB, columnC       , columnD       , columnE, columnPK)
              VALUES (      1,       0, 'sampleValueA', 'sampleValueB', 9      , 'PKvalue');

So why is this necessary? Well, as I said before, the INSERT command does not allow any sort of condition. However the SELECT command - and along with that the INSERT ... SELECT - does. Now we can use it to read data from the temp table and along with it do a query on the condition_table. Because we do not know in advance whether the key is already present, the IGNORE option on INSERTwill be used. It will not abort with an error if there is a duplicate key.

As you can see, the WHERE condition does not have anything to do with the data we are going to insert. Instead it just checks if there is a "trigger record" in some other table. If such a record does not exist, the WHERE EXISTS clause will return false and the INSERT will not be done.

-- now try to insert the data into the real table.
-- take into account the external condition that must be met for this 
-- insert to happen in the first place.
-- use "insert ignore" if there may be a chance of the PK being already present
-- and this is not an error
INSERT IGNORE INTO real_table (columnA, columnB, columnC, columnD, columnE, columnPK)
 SELECT columnA, columnB, columnC, columnD, columnE, columnPK FROM temp_tbl
        WHERE EXISTS (SELECT 1 FROM condition_table c 
                              WHERE c.column = 'mustbepresentforinsert');

Now we are at a point where we can know for sure that the primary key value PKvalue exists in real_table. However because it may have been there before and the INSERT IGNORE might just have suppressed the error, we need to make sure, the rest of the record's fields are correct. To do so, we will do an UPDATE statement that is quite similar to the INSERT one:

-- if you used ignore above, the insert might not have done anything
-- because the PK was already present.
-- in this this update is needed to make sure you end up with the
-- new values you wanted to insert.

UPDATE real_table
    SET columnA=1,
        columnB=0,
        columnC='sampleValueA',
        columnD='sampleValueB',
        columnE=9
    WHERE  columnPK='PKvalue'
      AND EXISTS (SELECT 1 FROM condition_table c 
                          WHERE c.column = 'mustbepresentforinsert');

Again there is the same EXISTS clause to make sure we also check the trigger record. However this time it is accompanied by a WHERE columnPK='PKvalue' to make sure we only update the record we (maybe) just inserted.

This is it. Now the data in real_table has been either inserted or updated to the desired values, but even if the condition was met. Finally we should drop the temporary table again:

DROP TABLE temp_tbl;

Just to make sure: you should surround the INSERT and UPDATE statements with a new transaction if you use a storage engine that supports it. The CREATE|DROP TABLEs are DDL statements and are not covered by the transaction mechanism anyway, however they might influence it (e. g. through implicit commits or something like that), so they should be left out.

4 comments:

Kuli said...

Hi Daniel!

Why do you create that temporary table? It would be easier to simply select constant values instead of column fields.

Simply modify your INSERT statement that way:

INSERT IGNORE INTO real_table (columnA, columnB, columnC, columnD, columnE, columnPK)
SELECT 1, 4711, 'Value of C', .... FROM mysql.user
WHERE EXISTS (SELECT 1 FROM condition_table c
WHERE c.column = 'mustbepresentforinsert') LIMIT 1;

Alas there needs to be some table from where you select the constant values, otherwise you can't add a WHERE condition afterwards. Simply take some table which contains at least one row and, for performance reasons, not too many; mysql.user should be a good choice. The LIMIT 1 at the end lets MySQL insert only one row (instead of one per mysql.user contents); a DISTINCT modifier in the constant SELECT statement would do the same.

-Kuli

Daniel Schneller said...

Hi!
Thanks for the suggestion. However I like the temp-table approach a little better, because it allows the insertion of multiple rows if needed. Moreover it can live without any assumptions about records being present in other, completely unrelated tables.
Nevertheless, interesting technique :)

Daniel

Adrian said...

You don't need the other unrelated table either, you just need a single row rowset. You can use a subquery for this. I just needed something similar, the general idea is:

SELECT 'my condional insert value' FROM (SELECT 1) AS AlwaysOneRowSubquery WHERE EXISTS (...);

Did the trick for me.
Adrian

Anonymous said...

Hello,

I found your post highly sophisticated and somewhat difficult to follow in its entirety. Still, great job and very elaborate.
I solved the same problem with this approach:

I wanted to update a table with the values in some other table (external DB). The original table (hereafter Otable) had more entries than the mysql table that I needed to update (hereafter Ntable)
In the Ntable I turned a field that I wanted to remain unchanged into a 'unique' index field. This field actually had the same values as the primary key of the Otable and hence was surely unique.
Then I passed this SQL command:
INSERT IGNORE INTO Ntable VALUES ('', FK ,'OTHER', 'OTHER').
'' is the PK autonumber assigned by the mysql which I did not want to mess up so I bypass it.
FK is the PK of Otable which is passed as a variable with PHP -resulting from an earlier query to the external DB.
OTHER are other fields from Otable that needed to be updated if and only if the FK was not duplicate.

What happens is this: due to the MySQL DBMS restriction enforcing that a unique value cannot be updated and due to the IGNORE statement, this SQL command actually resulted into inserting ONLY the values that where non existent, without returning an error if duplicate entries were discovered, but they are rather surpassed without stopping the insert query. This is it.
The power of simplicity is sometimes overwhelming, as I once again realized!
I hope this helps others avoid a marathon search and test procedure like the one I went through to come to that.

Dimitris Georgiou
Athens-Greece