Find Duplicates in MS SQL Server
Last Updated: 30 Aug, 2024
Finding duplicate values in a database is a common task when managing data integrity. In SQL, several methods can be employed to identify and handle duplicate entries.
In this article, we will explore two effective techniques for locating duplicates using SQL queries: the GROUP BY clause and the ROW_NUMBER() function.
Find Duplicate Values Using the GROUP BY Clause
The GROUP BY clause is a straightforward way to identify duplicates: it groups rows on the specified columns, and the HAVING clause then keeps only the groups that contain more than one row.
This method is useful for detecting repeated combinations of column values.
Syntax:
SELECT col1, col2, ..., COUNT(*)
FROM table_name
GROUP BY col1, col2, ...
HAVING COUNT(*) > 1;
In this method, rows are grouped by the listed columns; any group with a count greater than 1 is a combination of values that appears more than once, that is, a duplicate.
Example of Finding Duplicate Values Using the GROUP BY Clause
Let us create a table named Geek that contains three columns: ID, A, and B.
CREATE TABLE Geek (
ID INT IDENTITY(1, 1),
A INT,
B INT,
PRIMARY KEY (ID)
);
Let us add some values to the Geek table:
INSERT INTO Geek (A, B)
VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 1),
(1, 2),
(1, 3),
(2, 1),
(2, 2);
Let's write a query to find the duplicate rows in the Geek table using the GROUP BY clause:
SELECT
A,
B,
COUNT(*) AS num
FROM
Geek
GROUP BY
A,
B
HAVING
COUNT(*) > 1;
Output (row order may vary because the query has no ORDER BY):
A | B | num
1 | 2 | 2
1 | 3 | 2
2 | 1 | 2
Explanation: The query identifies combinations of columns `A` and `B` in the `Geek` table that appear more than once. It groups the rows by `A` and `B`, counts the occurrences of each combination, and uses the `HAVING` clause to filter out groups with a count of 1 or less. The result is a list of duplicate combinations along with their counts.
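As a possible follow-up, the same grouped query can be wrapped in a derived table to summarize how much duplication exists. This is a minimal sketch and not part of the original walkthrough; the alias d and the column names duplicate_groups and surplus_rows are chosen here purely for illustration.

```sql
-- Summarize the duplicates found by the GROUP BY query.
-- d, duplicate_groups and surplus_rows are illustrative names.
SELECT
    COUNT(*)     AS duplicate_groups,  -- combinations that occur more than once
    SUM(num - 1) AS surplus_rows       -- rows beyond the first copy in each group
FROM (
    SELECT A, B, COUNT(*) AS num
    FROM Geek
    GROUP BY A, B
    HAVING COUNT(*) > 1
) AS d;
```

With the sample data inserted earlier, three combinations are duplicated and each appears twice, so both columns come out as 3. If the table holds no duplicates, duplicate_groups is 0 and surplus_rows is NULL, because SUM over an empty set returns NULL.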
To retrieve the full row details for each duplicate, join the output of the GROUP BY query with the Geek table using a CTE:
WITH CTE AS (
SELECT A, B, COUNT(*) AS num
FROM Geek
GROUP BY A, B
HAVING COUNT(*) > 1
)
SELECT Geek.ID, Geek.A, Geek.B
FROM Geek
JOIN CTE ON CTE.A = Geek.A AND CTE.B = Geek.B
ORDER BY Geek.A, Geek.B;
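Output (assuming the IDENTITY values 1 to 8 were assigned in the order the rows are listed in the INSERT; the order of IDs within each A, B pair may vary):
ID | A | B
2 | 1 | 2
5 | 1 | 2
3 | 1 | 3
6 | 1 | 3
4 | 2 | 1
7 | 2 | 1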
Explanation: The query uses a Common Table Expression (CTE) to identify the combinations of columns `A` and `B` in the `Geek` table that occur more than once. It groups by `A` and `B`, and the `HAVING` clause filters to include only those combinations with a count greater than 1. The outer query then joins this CTE with the original `Geek` table to retrieve all rows that have these duplicate combinations, and it orders the results by `A` and `B`.
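An alternative that avoids the join, sketched here as one option rather than as the article's method, is a windowed COUNT(*). COUNT(*) OVER (PARTITION BY A, B) attaches the size of each group to every row in it, so the full rows can be filtered directly. The alias counted and the column name num are illustrative.

```sql
-- Return every row whose (A, B) combination occurs more than once,
-- using a window aggregate instead of GROUP BY plus JOIN.
SELECT ID, A, B
FROM (
    SELECT ID, A, B,
           COUNT(*) OVER (PARTITION BY A, B) AS num  -- size of the row's group
    FROM Geek
) AS counted
WHERE num > 1
ORDER BY A, B, ID;
```

A window function cannot be referenced in the WHERE clause of the same SELECT, which is why the count is computed in a derived table first and filtered in the outer query.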
Find Duplicate Values Using ROW_NUMBER() Function
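ROW_NUMBER() numbers the rows within each partition starting from 1, so when the partition is defined by the columns being checked, every row numbered 2 or higher is a repeat of an earlier row.
Syntax: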
WITH cte AS (
SELECT
col,
ROW_NUMBER() OVER (PARTITION BY col ORDER BY col) AS row_num
FROM
table_name
)
SELECT * FROM cte WHERE row_num > 1;
MS SQL Server query to find the duplicate rows in the Geek table using the ROW_NUMBER() function:
Query:
WITH CTE AS (
SELECT A, B,
ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY A, B) AS rownum
FROM Geek
)
SELECT * FROM CTE WHERE rownum > 1;
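Output (row order may vary because the outer query has no ORDER BY):
A | B | rownum
1 | 2 | 2
1 | 3 | 2
2 | 1 | 2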
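Explanation: The query uses a Common Table Expression (CTE) to number the rows of the `Geek` table within each partition of columns `A` and `B`; the numbering restarts at 1 for every distinct combination. Because the `ORDER BY` inside `OVER` uses the same columns as the partition, all rows in a partition tie, so which copy receives number 1 is arbitrary. The outer query selects rows where `rownum` is greater than 1, which returns the extra copies of each duplicated combination (every occurrence after the first), not the first occurrence itself.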
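If the goal is to act on the duplicates rather than only list them, it helps to make the numbering deterministic. The sketch below is an extension, not the article's own query: it orders each partition by ID so that the row with the lowest ID is treated as the original, previews the extra copies, and then deletes them through the CTE, which SQL Server permits when the CTE reads from a single table. Treating the lowest ID as the row to keep is an assumption made for this example.

```sql
-- Preview: list the extra copies, keeping the lowest ID in each (A, B) group.
WITH CTE AS (
    SELECT ID, A, B,
           ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY ID) AS rownum
    FROM Geek
)
SELECT ID, A, B
FROM CTE
WHERE rownum > 1;

-- Removal: delete the same rows through the CTE.
-- The transaction is rolled back so this sketch leaves the table unchanged;
-- switch ROLLBACK to COMMIT only after checking the remaining rows.
BEGIN TRANSACTION;

WITH CTE AS (
    SELECT ID, A, B,
           ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY ID) AS rownum
    FROM Geek
)
DELETE FROM CTE
WHERE rownum > 1;

-- Rows that would remain after the delete.
SELECT ID, A, B
FROM Geek
ORDER BY ID;

ROLLBACK TRANSACTION;
```

Ordering by a unique column such as ID removes the ties inside each partition, so the same row is kept on every run.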
Conclusion
Identifying and handling duplicate data is crucial for maintaining data quality in databases. The GROUP BY clause and the ROW_NUMBER() function both find duplicates, each with its own advantages. The GROUP BY method is a concise way to list repeated combinations together with their counts, while ROW_NUMBER() numbers the individual rows, which makes it possible to pinpoint the specific extra copies.