How to Remove Duplicate Records Except a Single Record in MySQL?

Last Updated : 09 Jul, 2024

In MySQL, remove duplicate records while preserving one representative entry using a DELETE statement with a self-join. Match records based on criteria like identical values in a column and delete duplicates based on conditions, such as retaining the record with the lowest ID, ensuring data cleanliness, and reducing redundancy in the database.

Managing data efficiently is crucial in any database system, and dealing with duplicate records is a common challenge. In this article, we will explore a practical approach to remove duplicate records except a single record in MySQL. This process is essential for maintaining data integrity and ensuring that your database contains unique and relevant information.

Eliminating Duplicate Records Except One in MySQL

When working with databases, it's not uncommon to encounter situations where duplicate records need to be cleaned up. The process involves identifying redundant data based on specific criteria and keeping only one instance of each unique record. In MySQL, this can be achieved using a combination of the DELETE statement and self-joins.

Syntax:

The main concept revolves around using a self-join to compare records within the same table and then selectively delete duplicate entries. Here's the basic syntax:

DELETE t1 FROM table_name t1
JOIN table_name t2 ON t1.column_name = t2.column_name AND t1.primary_key > t2.primary_key;

DELETE: Specifies the action to be taken, which is the removal of records.
table_name: The name of the table containing duplicate records.
t1 and t2: Aliases for the same table, enabling a self-join.
column_name: The column used to identify duplicate records.
primary_key: The primary key column of the table, ensuring we keep one instance.

Example: Eliminating Duplicate Records Keeping Lowest ID in MySQL

Duplicate records in a MySQL table can lead to data inconsistencies. This example demonstrates how to remove duplicate records from the 'employee' table, preserving only the records with the lowest ID for each unique name.

Consider a table named employee with columns id, name, and salary. We'll insert some sample data, including duplicate records.

-- Table creation and data insertion
CREATE TABLE employee (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    salary INT
);

INSERT INTO employee (id, name, salary) VALUES
    (1, 'John', 50000),
    (2, 'Alice', 60000),
    (3, 'Bob', 55000),
    (4, 'Alice', 60000),
    (5, 'Charlie', 70000),
    (6, 'John', 50000);

Now, let's use the DELETE statement to remove duplicate records, keeping only the one with the lowest id.

-- Remove duplicate records, keeping the one with the lowest id
DELETE e1 FROM employee e1
JOIN employee e2 ON e1.name = e2.name AND e1.id > e2.id;

Output:

Before deleting the duplicate record:

This shows the initial state of the employee table with duplicate records. Notably, there are entries for 'Alice' and 'John' with the same salary.

AFTER deleting the duplicate record:

Explanation:

This depicts the result after executing the DELETE statement. Duplicate records, such as the second 'Alice' and the second 'John,' have been removed, leaving only one instance of each unique record based on the lowest id.

To put it simply, becoming adept at the task of removing duplicate records, leaving only one unique entry in MySQL, is a valuable skill in keeping databases organized and efficient. The combination of self-joins with the DELETE statement allows you to confidently get rid of redundancy, promoting data accuracy and ensuring optimal performance for your database. Success in this process hinges on being adaptable to different table structures and carefully considering the criteria for removal. Ultimately, a well-maintained database forms the bedrock for robust applications, making these techniques indispensable for anyone engaged in database management or development.

Conclusion

The DELETE statement in MySQL can be used with a self-joined query to get rid of duplicate records while keeping a single representative record. The query checks records for matches in a column and deletes duplicate records based on conditions such as keeping the record with the least ID. This helps to keep data clean, reduce redundancy, and improve database performance, especially when dealing with repetitive entries.