MySQL- Remove Duplicate Rows
I’ve been working with meduim sized tables (2.1million rows) in MySQL
lately. One particular table had a lot of duplicate rows, which I
needed to filter out. I’m quickly going to demonstrate how I did it. I’m
sure there are many ways of doing this, but this method proved to be
the easiest for me.
STEP 1: Copy table structure
I used the following code to duplicate the table structure of `postcodes` (if you use PhpMyAdmin, you can use the shortcuts):
That’s it. All the rows with a unique postcode will get inserted into the table “postcodes2″.
The key is in the SELECT query, where the postcode field is grouped together. That effectively puts all the duplicate postcodes together, generating a result of rows with unique postcodes.
This example is slightly different, look at the data:
The previous example simply filtered the data by duplicate postcodes. As you can see, the highlighted row has different longitude and latitude values. So if you want to filter the data by unique rows, you need the following query:
Again, the key is the SELECT query. This time the query groups all the fields together (using the CONCAT function), to return rows with completely unique values.
Example 1: Removing rows with a specific duplicate field:
As you can see, the field “postcode” has duplicated rows.STEP 1: Copy table structure
I used the following code to duplicate the table structure of `postcodes` (if you use PhpMyAdmin, you can use the shortcuts):
1 2 3 4 5 | CREATE TABLE `DB_NAME`.`postcodes2` ( `postcodenospace` varchar( 10 ) NOT NULL , `longitude` varchar( 15 ) NOT NULL , `latitude` varchar( 15 ) NOT NULL ) |
STEP 2: run query:
1 2 3 4 5 6 7 8 9 10 | INSERT INTO postcodes2 ( postcode, longitude, latitude ) SELECT postcode,longitude,latitude FROM `postcodes` GROUP BY postcode |
The result
The key is in the SELECT query, where the postcode field is grouped together. That effectively puts all the duplicate postcodes together, generating a result of rows with unique postcodes.
Example 2: Removing duplicate rows:
STEP 1: Copy table structure
Use the code above to copy the table structure.This example is slightly different, look at the data:
The previous example simply filtered the data by duplicate postcodes. As you can see, the highlighted row has different longitude and latitude values. So if you want to filter the data by unique rows, you need the following query:
STEP 2: Run the query
1 2 3 4 5 6 7 8 9 10 | INSERT INTO postcodes2 ( postcode, longitude, latitude ) SELECT postcode,longitude,latitude FROM `postcodes` GROUP BY concat(postcode,longitude,latitude) |
The result
Again, the key is the SELECT query. This time the query groups all the fields together (using the CONCAT function), to return rows with completely unique values.
0 comments:
Post a Comment