SQL | Remove Duplicates without Distinct
Last Updated: 11 Jul, 2025
In SQL, removing duplicate records is a common task. The DISTINCT keyword is the usual tool, but it can be costly on large datasets, because the query engine has to sort or hash the rows in order to compare them. DISTINCT also works only on query output: it cannot delete duplicates from a table or choose which copy of a duplicate to keep.
In this article, we’ll explain various alternatives to remove duplicates in SQL, including ROW_NUMBER(), self-joins, GROUP BY, DISTINCT ON, and EXCEPT. Each method is explained with an example.
Why Remove Duplicates in SQL?
Duplicate records can lead to incorrect data analysis and reporting, and can increase storage requirements. Removing them improves data integrity and makes database operations more efficient. Fortunately, there are several ways to remove duplicates in SQL without using DISTINCT, and some of them also let us choose which copy to keep.
1. Remove Duplicates Using ROW_NUMBER()
The ROW_NUMBER() function assigns a sequential number, starting at 1, to each row within a partition of a result set. If we partition by the columns that define a duplicate, every row numbered above 1 is a duplicate that can be removed.
Example
Let’s assume we have a table named Employees, and we want to remove duplicate rows based on the EmployeeName, EmployeeAddress, and EmployeeSex columns.
WITH CTE AS (
    SELECT EmployeeID, EmployeeName, EmployeeAddress, EmployeeSex,
           ROW_NUMBER() OVER (
               PARTITION BY EmployeeName, EmployeeAddress, EmployeeSex
               ORDER BY EmployeeID
           ) AS RowNum
    FROM Employees
)
DELETE FROM CTE WHERE RowNum > 1;
Explanation:
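- The ROW_NUMBER() function numbers the rows within each partition, so rows that share the same EmployeeName, EmployeeAddress, and EmployeeSex are numbered 1, 2, 3, and so on in EmployeeID order.
- The CTE (Common Table Expression) exposes that number as RowNum, and the DELETE removes every row with RowNum > 1, keeping only the first occurrence in each group.
- Deleting through a CTE like this is SQL Server syntax. Other databases typically need the ROW_NUMBER() query wrapped in a subquery that the DELETE filters on.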
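The same idea can be tried outside SQL Server. The following is a minimal sketch, assuming Python's built-in sqlite3 module with an SQLite version that supports window functions (3.25 or later); the sample rows are invented for illustration. Because SQLite cannot delete through a CTE, the ROW_NUMBER() query is wrapped in a subquery instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT, "
    "EmployeeAddress TEXT, EmployeeSex TEXT)"
)
# Invented sample data: rows 2 and 4 repeat row 1.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "Pune", "F"),
        (2, "Asha", "Pune", "F"),
        (3, "Ravi", "Delhi", "M"),
        (4, "Asha", "Pune", "F"),
        (5, "Ravi", "Mumbai", "M"),  # different address, so not a duplicate
    ],
)

# Number the rows inside each group of identical values, then delete every
# row whose number is greater than 1 (the lowest EmployeeID is kept).
conn.execute(
    """
    DELETE FROM Employees
    WHERE EmployeeID IN (
        SELECT EmployeeID
        FROM (
            SELECT EmployeeID,
                   ROW_NUMBER() OVER (
                       PARTITION BY EmployeeName, EmployeeAddress, EmployeeSex
                       ORDER BY EmployeeID
                   ) AS RowNum
            FROM Employees
        ) AS numbered
        WHERE RowNum > 1
    )
    """
)

remaining_ids = [
    row[0]
    for row in conn.execute("SELECT EmployeeID FROM Employees ORDER BY EmployeeID")
]
print(remaining_ids)  # [1, 3, 5]
```

Changing the ORDER BY inside the OVER clause (for example to EmployeeID DESC) would keep the newest copy instead of the oldest.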
2. Remove Duplicates Using a Self-Join
A self-join involves joining a table to itself to identify and remove duplicates based on specific criteria. This method is ideal for comparing columns within the same table.
Example
Let's consider the Employees table again. We’ll use a self-join to remove duplicate entries where EmployeeName and EmployeeAddress are the same.
DELETE A
FROM Employees A
JOIN Employees B
    ON A.EmployeeName = B.EmployeeName
   AND A.EmployeeAddress = B.EmployeeAddress
WHERE A.EmployeeID > B.EmployeeID;
Explanation:
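- The self-join compares records within the same table (aliased as A and B).
- The condition A.EmployeeID > B.EmployeeID matches every row that has a copy with a lower EmployeeID, so those rows are deleted and the row with the lowest EmployeeID in each group is kept.
- The DELETE ... FROM ... JOIN form shown here is accepted by MySQL and SQL Server. Other databases need a different form, such as PostgreSQL's DELETE ... USING or a correlated subquery.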
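For engines that do not accept a JOIN inside DELETE, the self-comparison can be written as a correlated EXISTS subquery. This is a minimal sketch, assuming Python's built-in sqlite3 module and invented sample rows; it is one possible equivalent, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT, EmployeeAddress TEXT)"
)
# Invented sample data: rows 3 and 5 repeat row 1, row 4 repeats row 2.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        (1, "Asha", "Pune"),
        (2, "Ravi", "Delhi"),
        (3, "Asha", "Pune"),
        (4, "Ravi", "Delhi"),
        (5, "Asha", "Pune"),
        (6, "Meera", "Goa"),
    ],
)

# Delete a row when another row (alias B) has the same name and address
# but a lower EmployeeID. The lowest ID in each group has no such match,
# so it survives.
conn.execute(
    """
    DELETE FROM Employees
    WHERE EXISTS (
        SELECT 1
        FROM Employees AS B
        WHERE B.EmployeeName = Employees.EmployeeName
          AND B.EmployeeAddress = Employees.EmployeeAddress
          AND B.EmployeeID < Employees.EmployeeID
    )
    """
)

remaining_ids = [
    row[0]
    for row in conn.execute("SELECT EmployeeID FROM Employees ORDER BY EmployeeID")
]
print(remaining_ids)  # [1, 2, 6]
```

Note that the = comparison never matches NULL values, so rows with a NULL name or address are not treated as duplicates of each other by this query or by the self-join.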
3. Remove Duplicates Using GROUP BY
The GROUP BY clause removes duplicates from a query result by collapsing rows with identical values in the selected columns into a single row. Combined with aggregate functions such as MIN() or MAX(), it can also pick a specific record (like the first or last entry) from each group.
Example
To remove duplicates based on FirstName, LastName, and MobileNo, we can group by these columns, which returns each combination once.
SELECT FirstName, LastName, MobileNo
FROM Customers
GROUP BY FirstName, LastName, MobileNo;
Explanation:
- The GROUP BY clause groups records with the same FirstName, LastName, and MobileNo values.
- This returns only unique combinations of these columns in the query result. The rows stored in the Customers table are not changed.
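To illustrate both uses of GROUP BY, here is a minimal sketch using Python's built-in sqlite3 module. The CustomerID column and the sample rows are assumptions added for the example; they are not part of the query above. The first query returns each unique combination, and the second uses MIN() and COUNT() to report which row is the first entry in each group and how many copies exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CustomerID is a hypothetical key column added for this illustration.
conn.execute(
    "CREATE TABLE Customers ("
    "CustomerID INTEGER PRIMARY KEY, FirstName TEXT, "
    "LastName TEXT, MobileNo TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [
        (1, "Anil", "Kumar", "9000000001"),
        (2, "Anil", "Kumar", "9000000001"),  # duplicate of row 1
        (3, "Sara", "Khan", "9000000002"),
        (4, "Anil", "Kumar", "9000000003"),  # same name, different number
        (5, "Sara", "Khan", "9000000002"),   # duplicate of row 3
    ],
)

# One row per unique (FirstName, LastName, MobileNo) combination.
unique_rows = conn.execute(
    """
    SELECT FirstName, LastName, MobileNo
    FROM Customers
    GROUP BY FirstName, LastName, MobileNo
    ORDER BY FirstName, LastName, MobileNo
    """
).fetchall()

# The aggregate MIN() picks the first entry of each group; COUNT(*) shows
# how many copies the group contained.
first_entries = conn.execute(
    """
    SELECT MIN(CustomerID) AS FirstID, COUNT(*) AS Copies
    FROM Customers
    GROUP BY FirstName, LastName, MobileNo
    ORDER BY FirstID
    """
).fetchall()

print(unique_rows)
print(first_entries)  # [(1, 2), (3, 2), (4, 1)]
```

Replacing MIN() with MAX() would select the last entry of each group instead.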
4. Remove Duplicates Using DISTINCT ON (PostgreSQL)
For PostgreSQL users, the DISTINCT ON clause is a powerful way to remove duplicates based on specific columns while retaining additional data from the same rows.
Example
SELECT DISTINCT ON (EmployeeName) EmployeeName, EmployeeAddress
FROM Employees
ORDER BY EmployeeName, EmployeeID;
Explanation:
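- The DISTINCT ON clause keeps the first row for each EmployeeName and drops the others from the result.
- The ORDER BY clause decides which row counts as first. It must start with the DISTINCT ON columns, and here the EmployeeID that follows makes the row with the lowest EmployeeID the one retained.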
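DISTINCT ON is specific to PostgreSQL. On other engines, the same "first row per EmployeeName" result can be approximated with ROW_NUMBER(). The sketch below assumes Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) and invented sample rows; it emulates the behaviour rather than running DISTINCT ON itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT, EmployeeAddress TEXT)"
)
# Invented sample data: Asha appears twice with different addresses.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        (1, "Asha", "Pune"),
        (2, "Ravi", "Delhi"),
        (3, "Asha", "Mumbai"),
        (4, "Ravi", "Delhi"),
        (5, "Meera", "Goa"),
    ],
)

# PARTITION BY plays the role of the DISTINCT ON column, and the ORDER BY
# inside OVER decides which row of each name is numbered 1 and kept.
first_per_name = conn.execute(
    """
    SELECT EmployeeName, EmployeeAddress
    FROM (
        SELECT EmployeeName, EmployeeAddress,
               ROW_NUMBER() OVER (
                   PARTITION BY EmployeeName
                   ORDER BY EmployeeID
               ) AS RowNum
        FROM Employees
    ) AS ranked
    WHERE RowNum = 1
    ORDER BY EmployeeName
    """
).fetchall()

print(first_per_name)  # [('Asha', 'Pune'), ('Meera', 'Goa'), ('Ravi', 'Delhi')]
```

Asha's Pune row is kept because it has the lower EmployeeID, which matches what the ORDER BY in the DISTINCT ON query would select.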
5. Use of EXCEPT to Remove Duplicates
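The EXCEPT operator returns the rows from the first query that are not present in the second query. Because EXCEPT is a set operator, it also removes duplicate rows from its result, so subtracting an empty result set returns each distinct row once.
Example
SELECT EmployeeName, EmployeeAddress FROM Employees
EXCEPT
SELECT EmployeeName, EmployeeAddress FROM Employees WHERE 1 = 0;
Explanation:
- The first query returns every EmployeeName and EmployeeAddress pair, including duplicates. The second query returns no rows, because its condition is never true.
- Since nothing is subtracted, EXCEPT returns all pairs from the first query, with duplicates collapsed into one row each.
- Subtracting SELECT DISTINCT * FROM Employees from SELECT * FROM Employees would not work: every row of the first query also appears in the second, so the result is empty rather than a list of duplicates.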
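How EXCEPT treats duplicate rows is easy to check on a small table. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an invented two-column Employees table without a key, so that whole rows can repeat. It runs two queries: a table minus its own distinct rows, and a table minus an empty set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key here, so entire rows can be exact duplicates.
conn.execute("CREATE TABLE Employees (EmployeeName TEXT, EmployeeAddress TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [
        ("Asha", "Pune"),
        ("Asha", "Pune"),
        ("Ravi", "Delhi"),
        ("Ravi", "Delhi"),
        ("Meera", "Goa"),
    ],
)

# Every row on the left also exists on the right, so nothing is left over.
self_difference = conn.execute(
    "SELECT * FROM Employees EXCEPT SELECT DISTINCT * FROM Employees"
).fetchall()

# Subtracting an empty set removes nothing, but EXCEPT still returns a set,
# so each distinct row appears exactly once.
deduplicated = sorted(
    conn.execute(
        """
        SELECT EmployeeName, EmployeeAddress FROM Employees
        EXCEPT
        SELECT EmployeeName, EmployeeAddress FROM Employees WHERE 1 = 0
        """
    ).fetchall()
)

print(self_difference)  # []
print(deduplicated)     # [('Asha', 'Pune'), ('Meera', 'Goa'), ('Ravi', 'Delhi')]
```

The first query comes back empty, while the second returns three rows from the five stored, which shows that the deduplication comes from EXCEPT's set semantics and not from the subtraction.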
Conclusion
There are several ways to remove duplicates in SQL without using the DISTINCT keyword. ROW_NUMBER() and self-joins can delete duplicate rows from a table, while GROUP BY, DISTINCT ON, and EXCEPT return a duplicate-free result set. Whether any of them is faster than DISTINCT depends on the database engine, the available indexes, and the data, so it is worth comparing query plans before relying on one of them for large datasets.