Saturday, May 12, 2007

MySQL: Add primary key to table with duplicates

Maybe this is obvious, but I post it anyway, just to remind myself should I need it again.

Recently I had to change a table that I had not completely thought through when I first created it. The structure was so simple, I did not think I could do anything wrong with it:

CREATE TABLE `parent` (
  `par_id` bigint(20) NOT NULL,
  `somevalue` varchar(20) default NULL,
  PRIMARY KEY  (`par_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `child` (
  `x_parid` bigint(20) default NULL,
  `value` bigint(10) default NULL,
  KEY `fk_parid` (`x_parid`),
  CONSTRAINT `child_ibfk_1` FOREIGN KEY (`x_parid`) REFERENCES `parent` (`par_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

There is a 1:0..* relationship between parent and child. Some sample data:

mysql> select * from parent;
+--------+--------------+
| par_id | somevalue    |
+--------+--------------+
|      1 | Parent No. 1 | 
|      2 | Parent No. 2 | 
|      3 | Parent No. 3 | 
+--------+--------------+
3 rows in set (0.00 sec)

mysql> select * from child;
+---------+-------+
| x_parid | value |
+---------+-------+
|       1 |    10 | 
|       1 |    12 | 
|       1 |    15 | 
|       2 |    25 | 
|       2 |    26 | 
|       2 |    26 | 
|       3 |    31 | 
|       3 |    31 | 
|       3 |    31 | 
+---------+-------+
9 rows in set (0.00 sec)

Clearly it is possible (and intended) that there can be several children with references to the same parent, even if the have equal values. Inserting data using my application was no problem. However when I tried to read back a parent including its children, I got an error from the persistence framework (proprietary 3rd party) that there was a probable inconsistency in the data, because there were several identical children. Turns out, the thing needs a primary key on every table to manage it in memory.

So how do you add a primary key to the children table when it already has data? First attempt:

mysql> ALTER TABLE child
    ->   ADD COLUMN child_id BIGINT(20) NOT NULL FIRST,
    ->   ADD PRIMARY KEY(child_id);
ERROR 1062 (23000): Duplicate entry '0' for key 1

Of course, adding a column fills it with either NULL or with the data types default value, in this case 0. Declaring it as a primary key then ought to fail, because all 9 rows got the default value.

So we need a some initial distinct values for the column (in the application the persistence layer will provide generated keys once there is primary key column). For this the auto increment feature comes in handy:

mysql> ALTER TABLE child
    ->   ADD COLUMN child_id BIGINT(20) AUTO_INCREMENT NOT NULL FIRST,
    ->   ADD PRIMARY KEY(child_id);
Query OK, 9 rows affected (0.06 sec)
Records: 9  Duplicates: 0  Warnings: 0

mysql> select * from child;
+----------+---------+-------+
| child_id | x_parid | value |
+----------+---------+-------+
|        1 |       1 |    10 | 
|        2 |       1 |    12 | 
|        3 |       1 |    15 | 
|        4 |       2 |    25 | 
|        5 |       2 |    26 | 
|        6 |       2 |    26 | 
|        7 |       3 |    31 | 
|        8 |       3 |    31 | 
|        9 |       3 |    31 | 
+----------+---------+-------+
9 rows in set (0.00 sec)

Because we do not need the auto incrementing anymore, we can remove it again:

mysql> ALTER TABLE child MODIFY COLUMN child_id BIGINT(20)  NOT NULL;
Query OK, 9 rows affected (0.08 sec)
Records: 9  Duplicates: 0  Warnings: 0

Unfortunately this cannot be combined into a single ALTER TABLE statement, because the parser seems to first check if the statement's parts are valid, which is not the case for the last MODIFY COLUMN segment as the child_id column does not exist at that time.

18 comments:

Tudor said...

Hello,

Good post. Helped me quite a bit.


With respect,
Tudor

Scott Donnelly said...

Many thanks, you helped me out a treat.

Scotty

ChrisF said...

Thanks, that helped me a lot. I was having problems with Devart's MySQL entity framework support and that's exactly what I needed.

Anonymous said...

Hi Daniel,

I have a slightly different problem but it falls under the primary key issue: my table perfume has three columns (including perfume_id as primary key) and is currently empty. I am trying to insert values into the column perfume_brand but I am getting an error message that says duplicate entry for 0 for key primary. How am I suppose to populate this table so that the perfume brand values get assigned a primary key? thanks

Daniel Schneller said...

Make sure your primary key column is defined as "Auto Increment". Otherwise it is up to you to provide unique key values. For the first record the default value 0 will be inserted if you do not tell the database anything else. But for the second record this default value will not be valid as it is already taken.

PĂȘle-MĂȘle said...

Thanks . Got it, I recreated a table and set the ID as auto_increment

Anti-Dan said...

Worst advice ever, still cannot change the primary key for the table due to duplicates, by following your advice.

Daniel Schneller said...

<irony>Jeez, Blogger must have swallowed the useful part of your comment...</irony>
Maybe if you had presented a little more detail one could be of help, but hey, chance gone. Won't feed the troll any more.

randy said...

I've used this page 4 times so far :-) Thank you, and please leave the page up for the next 4 times I use it!

Manjunath.S said...

hi,
plz tell me how clear the mysql promt

like
mysql> ;;;;;;;;;;;,,,,,,,,,,,,,
mysql> ;;;;;;;;;;;,,,,,,,,,,,,,
mysql> ;;;;;;;;;;;,,
mysql> how to we clear the prompt
how we get new promt without closing.
like
mysql>

Diederick said...

Awesome. Thanks!

Anonymous said...

Thanks, it helps me !

Anonymous said...

no shit you cant use a primary key if you have duplicate id numbers, anti dan....

wouldnt that kind of defeat the purpose of a primary key?

Allison C said...

Thanks. This helped me.

Anonymous said...

its really really really really really really really really helpful

Anonymous said...

Thanks a lot Daniel. Your post helped me a lot!

Nicolas

Aza D. Oberman said...

I'm getting the dreaded "Duplicate entry for key 'PRIMARY'" when initially populating a freshly created table using MySQL under DBI:

CREATE TABLE `adminFiles` (
`Identifier` varchar(35) NOT NULL,
`admin_email` varchar(40) DEFAULT NULL,
`admin_phone` char(15) DEFAULT NULL,
`Password` varchar(75) DEFAULT NULL COMMENT 'use RickRack.cgi',
`Identification` varchar(40) DEFAULT NULL,
`region` char(20) NOT NULL,
`temp_entry` int(1) DEFAULT NULL,
PRIMARY KEY (`Identifier`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

After preparing "INSERT INTO adminFiles (Identifier, admin_email, admin_phone, Password, Identification, region, temp_entry) VALUES (?, ?, ?, ?, ?, ?, ?)" my very first attempt to add the following generated the "Duplicate entry for key 'PRIMARY'" error:

sth->execute('Ubermiller', 'admin@gon.com', '2022335734', 'c5c36t', 'Admin', 'DCnd', '0');

Deleting the entire database and creating a new one doesn't help.

This PRIMARY key is not an integer that can be autoincremented -- the "fix" most favored here.

It seems far fetched but could the ORDINAL_POSITION of the columns be in play here?

Guru said...

Thanks