Find Duplicates in MS SQL Server
Last Updated :
30 Aug, 2024
Finding duplicate values in a database is a common task when managing data integrity. In SQL, several methods can be employed to identify and handle duplicate entries.
In this article, We will explore two effective techniques for locating duplicates using SQL queries: the GROUP BY
clause and the ROW_NUMBER
()
function.
Find Duplicate Values Using the GROUP BY Clause
The GROUP BY
clause is a straightforward way to identify duplicates by grouping rows based on specified columns and using the HAVING
clause to filter groups with more than one occurrence.
This method is useful for detecting repeated combinations of column values.
Syntax:
SELECT col1, col2, ...COUNT(*)
FROM table_name
GROUP BY col1, col2, ...
HAVING COUNT(*) > 1;
In this method, we group the column values, and values that have a count greater than 1 are the duplicate values in that column.
Example of Finding duplicate values using the GROUP BY clause
Let us create a table named Geek that contains three columns: ID, A, and B.
CREATE TABLE Geek (
ID INT IDENTITY(1, 1),
A INT,
B INT,
PRIMARY KEY (ID)
);
Let us add some values to the table Geek
INSERT INTO Geek (A, B)
VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 1),
(1, 2),
(1, 3),
(2, 1),
(2, 2);
Let's write an query to find the duplicate rows using GROUP BY clause in the Geek table :
SELECT
A,
B,
COUNT(*) AS num
FROM
Geek
GROUP BY
A,
B
HAVING
COUNT(*) > 1;
Output:
Explanation: The query identifies combinations of columns `A` and `B` in the `Geek` table that appear more than once. It groups the rows by `A` and `B`, counts the occurrences of each combination, and uses the `HAVING` clause to filter out groups with a count of 1 or less. The result is a list of duplicate combinations along with their counts.
To find the full row details for each duplicate row, JOIN the output of the above query with the Geek table using CTE :
WITH CTE AS (
SELECT A, B, COUNT(*) AS num
FROM Geek
GROUP BY A, B
HAVING COUNT(*) > 1
)
SELECT Geek.ID, Geek.A, Geek.B
FROM Geek
JOIN CTE ON CTE.A = Geek.A AND CTE.B = Geek.B
ORDER BY Geek.A, Geek.B;
Output:
Explanation: The query uses a Common Table Expression (CTE) to identify the combinations of columns `A` and `B` in the `Geek` table that occur more than once. It groups by `A` and `B`, and the `HAVING` clause filters to include only those combinations with a count greater than 1. The outer query then joins this CTE with the original `Geek` table to retrieve all rows that have these duplicate combinations, and it orders the results by `A` and `B`.
Find Duplicate Values Using ROW_NUMBER() Function
To find the duplicate values using the ROW_NUMBER() function, follow the given syntax.
Syntax :
WITH cte AS (
SELECT
col,
ROW_NUMBER() OVER (PARTITION BY col ORDER BY col) AS row_num
FROM
table_name
)
SELECT * FROM cte WHERE row_num > 1;
MS SQL Server query to find the duplicate rows using ROW_NUMBER() function in the Geek table :
Query:
WITH CTE AS (
SELECT A, B,
ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY A, B)
AS rownum FROM Geek
)
SELECT * FROM CTE WHERE rownum > 1;
Output:
Explanation: The query uses a Common Table Expression (CTE) to assign a unique row number (`rownum`) to each row in the `Geek` table based on the partition of columns `A` and `B`, ordered by `A` and `B`. The outer query then selects rows where `rownum` is greater than 1, effectively retrieving duplicate rows based on the combination of columns `A` and `B`.
Conclusion
Identifying and handling duplicate data is crucial for maintaining data quality in databases. The GROUP BY
clause and the ROW_NUMBER()
function offer powerful techniques for finding duplicates, each with its own advantages. The GROUP BY
method is efficient for detecting repeated combinations, while ROW_NUMBER()
provides a detailed approach to pinpoint specific duplicates.
Similar Reads
Create Database in MS SQL Server Databases in Microsoft SQL Server are crucial for managing data, categorized into system databases, which are auto-created and user databases, created by users. In this article, We will learn about the basics of system and user databases along with methods for creating and managing them using T-SQL
5 min read
How to Find Duplicate Rows in PL/SQL Finding duplicate rows is a widespread requirement when dealing with database analysis tasks. Duplicate rows often create problems in analyzing tasks. Detecting them is very important. PL/SQL is a procedural extension for SQL. We can write custom scripts with the help of PL/SQL and thus identifying
5 min read
List All Databases in SQL Server In SQL Server, databases are crucial for storing and managing data efficiently. Whether we are managing a large enterprise system or a small application, understanding how to list all the databases on our SQL Server is essential. In this article, we will write SQL queries that help us to retrieve al
3 min read
COUNT() Function in SQL Server The COUNT() function in SQL Server is a fundamental aggregate function used to determine the number of rows that match a specific condition. Counting rows provides valuable insights into data sets such as the total number of records, distinct values, or records meeting certain criteria.In this artic
3 min read
Introduction of MS SQL Server Data is a collection of facts and figures and we have humungous data available to the users via the internet and other sources. To manipulate the data, Structured Query Language (SQL) in short has been introduced years ago. There are different versions of SQL available in the market provided by diff
2 min read