How to Remove Duplicate Records Except a Single Record in MySQL?
Last Updated :
09 Jul, 2024
In MySQL, remove duplicate records while preserving one representative entry using a DELETE statement with a self-join. Match records based on criteria like identical values in a column and delete duplicates based on conditions, such as retaining the record with the lowest ID, ensuring data cleanliness, and reducing redundancy in the database.
Managing data efficiently is crucial in any database system, and dealing with duplicate records is a common challenge. In this article, we will explore a practical approach to remove duplicate records except a single record in MySQL. This process is essential for maintaining data integrity and ensuring that your database contains unique and relevant information.
Eliminating Duplicate Records Except One in MySQL
When working with databases, it's not uncommon to encounter situations where duplicate records need to be cleaned up. The process involves identifying redundant data based on specific criteria and keeping only one instance of each unique record. In MySQL, this can be achieved using a combination of the DELETE statement and self-joins.
Syntax:
The main concept revolves around using a self-join to compare records within the same table and then selectively delete duplicate entries. Here's the basic syntax:
DELETE t1 FROM table_name t1
JOIN table_name t2 ON t1.column_name = t2.column_name AND t1.primary_key > t2.primary_key;
- DELETE: Specifies the action to be taken, which is the removal of records.
- table_name: The name of the table containing duplicate records.
- t1 and t2: Aliases for the same table, enabling a self-join.
- column_name: The column used to identify duplicate records.
- primary_key: The primary key column of the table, ensuring we keep one instance.
Example: Eliminating Duplicate Records Keeping Lowest ID in MySQL
Duplicate records in a MySQL table can lead to data inconsistencies. This example demonstrates how to remove duplicate records from the 'employee' table, preserving only the records with the lowest ID for each unique name.
Consider a table named employee with columns id, name, and salary. We'll insert some sample data, including duplicate records.
-- Table creation and data insertion
CREATE TABLE employee (
id INT PRIMARY KEY,
name VARCHAR(50),
salary INT
);
INSERT INTO employee (id, name, salary) VALUES
(1, 'John', 50000),
(2, 'Alice', 60000),
(3, 'Bob', 55000),
(4, 'Alice', 60000),
(5, 'Charlie', 70000),
(6, 'John', 50000);
Now, let's use the DELETE statement to remove duplicate records, keeping only the one with the lowest id.
-- Remove duplicate records, keeping the one with the lowest id
DELETE e1 FROM employee e1
JOIN employee e2 ON e1.name = e2.name AND e1.id > e2.id;
Output:
Before deleting the duplicate record:
Before deleting the duplicate record.This shows the initial state of the employee table with duplicate records. Notably, there are entries for 'Alice' and 'John' with the same salary.
AFTER deleting the duplicate record:
After deleting the duplicate record.Explanation:
This depicts the result after executing the DELETE statement. Duplicate records, such as the second 'Alice' and the second 'John,' have been removed, leaving only one instance of each unique record based on the lowest id.
To put it simply, becoming adept at the task of removing duplicate records, leaving only one unique entry in MySQL, is a valuable skill in keeping databases organized and efficient. The combination of self-joins with the DELETE statement allows you to confidently get rid of redundancy, promoting data accuracy and ensuring optimal performance for your database. Success in this process hinges on being adaptable to different table structures and carefully considering the criteria for removal. Ultimately, a well-maintained database forms the bedrock for robust applications, making these techniques indispensable for anyone engaged in database management or development.
Conclusion
The DELETE statement in MySQL can be used with a self-joined query to get rid of duplicate records while keeping a single representative record. The query checks records for matches in a column and deletes duplicate records based on conditions such as keeping the record with the least ID. This helps to keep data clean, reduce redundancy, and improve database performance, especially when dealing with repetitive entries.
Similar Reads
How to Remove All Duplicate Rows Except One in SQLite?
SQLite is a lightweight and open-source relational database management system (RDBMS). SQLite does not require any server to process since it is a serverless architecture that can run operations and queries without any server. In this article, we will understand how to remove duplicate rows except o
5 min read
How to Remove All Duplicate Rows Except One in PostgreSQL?
When working with databases, we often have duplicate rows in the table. These duplicates can arise due to various reasons such as erroneous data entry or data migration processes. Removing duplicates while retaining one instance of each is a frequent requirement in database maintenance tasks. So to
5 min read
How to Remove Duplicates in LibreOffice?
Removing duplicates in LibreOffice is a useful way to clean up your data and make sure you have accurate and unique information in your spreadsheets. Duplicates can cause confusion and errors in data analysis and reporting, so it's important to know how to remove them efficiently. LibreOffice Calc,
5 min read
How to Find Duplicate Records in SQL?
To find duplicate records in SQL, we can use the GROUP BY and HAVING clauses. The GROUP BY clause allows us to group values in a column, and the COUNT function in the HAVING clause shows the count of the values in a group. Using the HAVING clause with a condition of COUNT(*) > 1, we can identify
3 min read
How to Find and Remove Duplicates in Excel
Inacurate data always ruin your data and that the reason removing duplicates in Excel because it is one of the reason for duplicate data in Excel. In this guide, we aer going to explain you the best ways to find and remove duplicates in Excel. How to Find and Remove Duplicates in ExcelIn the section
9 min read
How to Remove Duplicates From Array Using VBA in Excel?
Excel VBA code to remove duplicates from a given range of cells. In the below data set we have given a list of 15 numbers in âColumn Aâ range A1:A15. Need to remove duplicates and place unique numbers in column B. Sample Data: Cells A1:A15 Sample Data Final Output: VBA Code to remove duplicates and
2 min read
How to Fetch Duplicate Rows in a Table?
Identifying duplicate rows in a database table is a common requirement, especially when dealing with large datasets. Duplicates can arise due to data entry errors, system migrations, or batch processing issues. In this article, we will explain efficient SQL techniques to identify and retrieve duplic
3 min read
How to Delete Duplicate Rows in MySQL?
Duplicate rows are a common problem in MySQL databases. Duplicate rows can cause problems with data accuracy and integrity. They can also make it difficult to query and analyze data.Ways to Delete Duplicate Rows in MySQLThere are a few different ways to delete duplicate rows from tables in MySQL:Usi
3 min read
How to Remove Duplicates in Google Sheets
Google Sheets as a part of Google Workspace, is one of the popular cloud-based spreadsheet applications widely used for data management and analysis. It allows users to create and edit data on spreadsheets and enables us to share spreadsheets online which can be accessible from any device with inter
5 min read
How to Remove duplicate elements from array in JavaScript ?
In this article, we will discuss the methods to remove duplicate elements from a Javascript array. There are various methods to remove duplicates in the array. These are the following ways: Table of Content Using filter() MethodUsing set() MethodUsing forEach() MethodUsing reduce() MethodUsing index
4 min read