Removing Duplicate Rows (Based on Values from Multiple Columns) From SQL Table
Last Updated :
28 Nov, 2021
In SQL, some rows contain duplicate entries in multiple columns(>1). For deleting such rows, we need to use the DELETE keyword along with self-joining the table with itself. The same is illustrated below. For this article, we will be using the Microsoft SQL Server as our database.
Step 1: Create a Database. For this use the below command to create a database named GeeksForGeeks.
Query:
CREATE DATABASE GeeksForGeeks
Output:

Step 2: Use the GeeksForGeeks database. For this use the below command.
Query:
USE GeeksForGeeks
Output:

Step 3: Create a table RESULT inside the database GeeksForGeeks. This table has 5 columns namely STUDENT_ID, PHYSICS_MARKS, CHEMISTRY_MARKS, MATHS_MARKS, and TOTAL_MARKS containing the id of the student, his/her marks in physics, chemistry, and mathematics, and finally his/her total marks.
Query:
CREATE TABLE RESULT(
STUDENT_ID INT,
PHYSICS_MARKS INT,
CHEMISTRY_MARKS INT,
MATHS_MARKS INT,
TOTAL_MARKS INT);
Output:

Step 4: Describe the structure of the table RESULT.
Query:
EXEC SP_COLUMNS RESULT;
Output:

Step 5: Insert 10 rows into the RESULT table.
Query:
INSERT INTO RESULT VALUES(1,90,88,75,253);
INSERT INTO RESULT VALUES(2,99,88,75,262);
INSERT INTO RESULT VALUES(3,96,88,75,256);
INSERT INTO RESULT VALUES(4,97,87,76,260);
INSERT INTO RESULT VALUES(5,91,86,77,254);
INSERT INTO RESULT VALUES(6,92,85,78,255);
INSERT INTO RESULT VALUES(7,93,84,79,256);
INSERT INTO RESULT VALUES(8,80,83,87,250);
INSERT INTO RESULT VALUES(9,80,82,88,250);
INSERT INTO RESULT VALUES(10,80,81,89,250);
Output:

Step 6: Display all the rows of the RESULT table.
Query:
SELECT * FROM RESULT;
Output:

Step 7: Delete rows from the table RESULT which have duplicate entries in the columns CHEMISTRY_MARKS and MATHS_MARKS. To achieve this, we use the DELETE function by self joining(use JOIN function on 2 aliases of the table i.e. R1 and R2) the table with itself and comparing the entries of the columns CHEMISTRY_MARKS and MATHS_MARKS for different entries of the column STUDENT_ID because ID is unique for each student.
Syntax:
DELETE T1 FROM TABLE_NAME T1
JOIN TABLE_NAME T2
ON T1.COLUMN_NAME2 = T2.COLUMN_NAME2 AND
T1.COLUMN_NAME3 = T2.COLUMN_NAME3 AND .......
AND T2.COLUMN_NAME1 < T1.COLUMN_NAME1;
Query:
DELETE R1 FROM RESULT R1
JOIN RESULT R2
ON R1.CHEMISTRY_MARKS = R2.CHEMISTRY_MARKS
AND R1.MATHS_MARKS = R2.MATHS_MARKS
AND R2.STUDENT_ID < R1.STUDENT_ID;
Output:

Step 8: Display all the rows of the updated RESULT table.
Query:
SELECT * FROM RESULT;
Note: No row has duplicate entries in the columns CHEMISTRY_MARKS and MATHS_MARKS.
Output:

Step 9: Delete rows from the table RESULT which have duplicate entries in the columns TOTAL_MARKS and PHYSICS_MARKS.
Query:
DELETE R1 FROM RESULT R1
JOIN RESULT R2
ON R1.TOTAL_MARKS = R2.TOTAL_MARKS AND R1.PHYSICS_MARKS = R2.PHYSICS_MARKS
AND R2.STUDENT_ID < R1.STUDENT_ID;
Output:

Step 10: Display all the rows of the updated RESULT table.
Query:
SELECT * FROM RESULT;
Note – No row has duplicate entries in the columns TOTAL_MARKS and PHYSICS_MARKS.
Output:

Similar Reads
How to Remove Duplicate Values Based on Only One Column of the Table in SQL?
In SQL, managing duplicate records is a crucial task for maintaining data integrity. Often, we may encounter scenarios where there are duplicate rows based on a single column, and it becomes necessary to remove duplicates to ensure accuracy and efficiency. In this article, we will explain how to del
4 min read
How to Find Duplicates Values Across Multiple Columns in SQL?
In SQL, identifying duplicate entries across multiple columns is crucial for ensuring data integrity and quality. Whether we're working with large datasets or trying to clean up existing data, we can efficiently find duplicates using GROUP BY and COUNT() functions. In this article, we'll focus on ho
3 min read
How to remove rows from a Numpy array based on multiple conditions ?
In this article, we will learn how to remove rows from a NumPy array based on multiple conditions. For doing our task, we will need some inbuilt methods provided by the NumPy module which are as follows: np.delete(ndarray, index, axis): Delete items of rows or columns from the NumPy array based on g
3 min read
Selecting Multiple Columns Based On Condition in SQL
SQL (Structured Query Language) is used to manage and query databases. One common requirement when querying data is selecting multiple columns based on specific conditions. Understanding how to use SQL for this purpose can enhance your ability to retrieve relevant data efficiently. In this article,
4 min read
Drop One or Multiple Columns From PySpark DataFrame
In this article, we will discuss how to drop columns in the Pyspark dataframe. In pyspark the drop() function can be used to remove values/columns from the dataframe. Syntax: dataframe_name.na.drop(how=âany/allâ,thresh=threshold_value,subset=[âcolumn_name_1â³,âcolumn_name_2â]) how â This takes either
3 min read
How to Remove All Duplicate Rows Except One in SQLite?
SQLite is a lightweight and open-source relational database management system (RDBMS). SQLite does not require any server to process since it is a serverless architecture that can run operations and queries without any server. In this article, we will understand how to remove duplicate rows except o
5 min read
How to Compare Rows and Columns in the Same Table in SQL
In SQL, comparing rows and columns in the same table is a common task for data analysis, identifying relationships, and calculating differences. By using techniques like self-joins and conditional queries, we can extract meaningful insights from a single table. This enables efficient querying, allow
4 min read
How to Remove rows based on count of a specific value in R?
Data cleaning is an essential step in data analysis, and removing rows based on specific criteria is a common task. One such criterion is the count of a specific value in a column. This article will guide you through the process of removing rows from a data frame in R based on the count of a specifi
5 min read
How to Query Two Tables For Duplicate Values in SQL?
When working with relational databases, it's common to identify duplicate values across multiple tables. SQL provides efficient ways to query such data using different techniques. These methods help streamline data analysis and ensure data consistency. In this article, we demonstrate how to query tw
3 min read
How to Remove Old Row and Column Items from the Pivot Table in Excel?
Pivot table is one of the most efficient tools in excel for data analysis. If you are using pivot tables frequently, then you will find even after deleting the old data from the data source, it remains in the filter drop-down of the pivot table. We will learn, how to remove the old row and column it
4 min read