Thursday, May 9, 2013

Remove Duplicate Rows from a Table in MySQL

You have a table with duplicate rows – somehow a unique index didn’t get created and a bug has added duplicate records to your table. :(

Let us create a scenario:

Create the table:

CREATE TABLE `t1` (`yearcol` varchar(255) default NULL,
                   `firstname` varchar(255) NOT NULL);

Let's add some data now:

insert into t1 values ('2010', 'ABC');

insert into t1 values ('2010', 'ABC');

insert into t1 values ('2010', 'XYZ');

insert into t1 values ('2010', 'PQR');

On doing a simple select query we get the result as:
The highlighted columns shows that there is duplicacy in the table.
select * from t1;

The result generated is:

yearcol firstname
2010 ABC
2010 ABC
2010 XYZ
2010 PQR


Now to remove the duplicates and fix the indexing issue. We will use ALTER TABLE which will help me add the UNIQUE INDEX along with IGNORE  keyword, which will help me fix the duplicates.

The official documentation of MySQL says:
  • IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.         

Now let's apply the UNIQUE INDEX using ALTER TABLE command:
ALTER IGNORE TABLE t1 ADD UNIQUE INDEX dup_idx (year_col, firstname);

We get the output something like:
mysql> ALTER IGNORE TABLE t1 ADD UNIQUE INDEX dup_idx (yearcol, firstname);
Query OK, 4 rows affected (0.14 sec)
Records: 4  Duplicates: 1  Warnings: 0

On doing a simple select query we get the result as:
The highlighted columns shows that the duplicate column is no longer in tabler.
select * from t1;

The result generated is:

yearcol firstname
2010 ABC
2010 XYZ
2010 PQR

And we have achieved the removal of duplicates from the table and also fixed the UNIQUE INDEX in the table to save ourselves from any further future accidents like these.  :)

NOTE:
For some versions of MySQL, this solution does not work. This is basically due to engine being used:

In order to fix the same (Thanks to Kethan for the update):
ALTER TABLE t1 ENGINE MyISAM;
ALTER IGNORE TABLE t1 ADD UNIQUE INDEX dup_idx (yearcol, firstname);
ALTER TABLE t1 ENGINE InnoDB;

9 comments:

  1. I am getting the following error

    0 16:48:58 select * from t1 ALTER Error Code: 1064. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ALTER' at line 3

    ReplyDelete
    Replies
    1. Hi Kethan,
      I think you are missing the delimiter here.

      Try:
      select * from t1;

      And for altering use: (If you are following the sample)
      ALTER IGNORE TABLE t1 ADD UNIQUE INDEX dup_idx (year_col, firstname);

      Delete
    2. Nope I found the answer after some googling :)

      This was due to a bug in Innodb for some versions of mysql. The workaround is to convert the engine and then back to original.

      ALTER TABLE t1 ENGINE MyISAM;
      ALTER IGNORE TABLE t1 ADD UNIQUE INDEX dupidx (field);
      ALTER TABLE t1 ENGINE InnoDB;

      Delete
    3. Yes, this is a bug with MySQL. Glad you got the answer.

      Delete
    4. Thanks for this tutorial :)

      Delete
    5. Thanks for sharing the error scenario, I have updated the blog accordingly.

      Delete
  2. This comment has been removed by the author.

    ReplyDelete