How to Remove All Duplicate Rows Except One in SQLite?
Last Updated: 07 Feb, 2024
SQLite is a lightweight, open-source relational database management system (RDBMS). It has a serverless architecture: the database engine runs inside the application and reads and writes an ordinary disk file, so no separate server process is needed to run queries. In this article, we will see how to remove duplicate rows from a table while keeping exactly one copy of each, and walk through several techniques for doing so.
Introduction to Duplicate Rows in SQLite
Duplicate rows refer to entries within a database table that share identical values across all columns or a subset of columns. These duplicates can arise due to various reasons, such as data entry errors, data migration issues, or database design problems.
Setting Up Environment
To follow along, let's create a table called employees and insert some sample data that contains duplicates.
Query:
CREATE TABLE employees
(
id INTEGER PRIMARY KEY,
name TEXT,
department TEXT,
salary REAL
);
Let's insert some data into the employees table. The rows for John Doe and Jane Smith are each inserted twice.
Query:
INSERT INTO employees (name, department, salary) VALUES
('John Doe', 'Engineering', 50000),
('Jane Smith', 'Marketing', 45000),
('John Doe', 'Engineering', 50000),
('Alice Johnson', 'Finance', 60000),
('Bob Brown', 'Engineering', 55000),
('Jane Smith', 'Marketing', 45000);
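To try the examples outside the sqlite3 shell, the same setup can be run from Python's standard-library sqlite3 module. This is a minimal sketch that assumes an in-memory database (":memory:") so nothing is written to disk; the helper scalar() is a hypothetical convenience, not part of the article. It also illustrates the definition of duplicates given earlier: because id differs on every row, no two rows match across all columns, while only four distinct combinations exist across name, department and salary.

```python
import sqlite3

# Assumption: an in-memory database, so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary REAL
);
INSERT INTO employees (name, department, salary) VALUES
    ('John Doe', 'Engineering', 50000),
    ('Jane Smith', 'Marketing', 45000),
    ('John Doe', 'Engineering', 50000),
    ('Alice Johnson', 'Finance', 60000),
    ('Bob Brown', 'Engineering', 55000),
    ('Jane Smith', 'Marketing', 45000);
""")

def scalar(sql):
    """Hypothetical helper: run a query that returns one value and return it."""
    return conn.execute(sql).fetchone()[0]

total_rows = scalar("SELECT COUNT(*) FROM employees")
# id is unique per row, so DISTINCT over all columns keeps every row.
distinct_all_columns = scalar("SELECT COUNT(*) FROM (SELECT DISTINCT * FROM employees)")
# Ignoring id, the two extra copies collapse into their originals.
distinct_data_columns = scalar(
    "SELECT COUNT(*) FROM (SELECT DISTINCT name, department, salary FROM employees)"
)

print(total_rows, distinct_all_columns, distinct_data_columns)  # 6 6 4
```

The gap between the last two numbers is why every method below identifies duplicates by the data columns rather than by the whole row.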
Identifying Duplicate Rows
Before removing anything, it is important to identify which rows are duplicates. This can be done with SQLite's GROUP BY and HAVING clauses together with an aggregate function such as COUNT(). Let's run a basic query to find duplicates in the employees table.
Query:
SELECT name, department, COUNT(*)
FROM employees
GROUP BY name, department
HAVING COUNT(*) > 1;
Output:
The query returns two groups: John Doe (Engineering) and Jane Smith (Marketing), each with a count of 2.
Explanation: The query groups the rows of the employees table by name and department, counts the rows in each group, and uses the HAVING clause to keep only the groups whose count is greater than one, that is, the combinations that occur more than once.
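The identification query can be checked from Python in the same way. This sketch assumes the standard-library sqlite3 module and an in-memory copy of the sample table. It adds ORDER BY name, which the query above does not have, only so that the printed result comes out in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary REAL
);
INSERT INTO employees (name, department, salary) VALUES
    ('John Doe', 'Engineering', 50000),
    ('Jane Smith', 'Marketing', 45000),
    ('John Doe', 'Engineering', 50000),
    ('Alice Johnson', 'Finance', 60000),
    ('Bob Brown', 'Engineering', 55000),
    ('Jane Smith', 'Marketing', 45000);
""")

duplicates = conn.execute("""
    SELECT name, department, COUNT(*)
    FROM employees
    GROUP BY name, department
    HAVING COUNT(*) > 1   -- keep only combinations that occur more than once
    ORDER BY name         -- added here for a stable output order
""").fetchall()

for name, department, count in duplicates:
    print(f"{name} ({department}): {count} rows")
# Jane Smith (Marketing): 2 rows
# John Doe (Engineering): 2 rows
```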
Ways to Remove Duplicate Rows
Once the duplicate records have been identified, the next step is to keep only one instance of each duplicated row and remove the others. SQLite offers several ways to do this, including ROWID, temporary tables, subqueries, and Common Table Expressions (CTEs). Let's look at each of them with an example.
Method 1: Using ROWID
ROWID is a special column that exists in every ordinary SQLite table (any table not declared WITHOUT ROWID). SQLite automatically assigns each row a unique integer ROWID, and if the table has a column declared INTEGER PRIMARY KEY, as id is in the employees table, that column is an alias for ROWID. Because ROWID uniquely identifies each row even when all other columns are identical, we can group the rows by the columns that define a duplicate, keep the row with the smallest ROWID in each group, and delete the rest.
Query:
DELETE FROM employees
WHERE ROWID NOT IN
(
SELECT MIN(ROWID)
FROM employees
GROUP BY name, department
);
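Output:
The rows with id 3 and 6 are deleted; the rows with id 1, 2, 4 and 5 remain.
Explanation: The subquery returns the smallest ROWID for each combination of name and department. The DELETE statement removes every row whose ROWID is not in that list, so exactly one instance of each combination, the one with the smallest ROWID, remains.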
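The ROWID method can be verified the same way. In this sketch, which again assumes Python's sqlite3 module and an in-memory table, the first query checks that id and ROWID hold the same value on every row, because id is declared INTEGER PRIMARY KEY. The DELETE runs inside `with conn:`, which commits if the statement succeeds and rolls back if it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary REAL
);
INSERT INTO employees (name, department, salary) VALUES
    ('John Doe', 'Engineering', 50000),
    ('Jane Smith', 'Marketing', 45000),
    ('John Doe', 'Engineering', 50000),
    ('Alice Johnson', 'Finance', 60000),
    ('Bob Brown', 'Engineering', 55000),
    ('Jane Smith', 'Marketing', 45000);
""")

# id is declared INTEGER PRIMARY KEY, so it is an alias for ROWID.
matching = conn.execute("SELECT COUNT(*) FROM employees WHERE ROWID = id").fetchone()[0]

with conn:  # commit on success, roll back on error
    conn.execute("""
        DELETE FROM employees
        WHERE ROWID NOT IN (
            SELECT MIN(ROWID)      -- the smallest ROWID in each group survives
            FROM employees
            GROUP BY name, department
        )
    """)

remaining = conn.execute("SELECT id, name FROM employees ORDER BY id").fetchall()
print(matching)   # 6
print(remaining)
# [(1, 'John Doe'), (2, 'Jane Smith'), (4, 'Alice Johnson'), (5, 'Bob Brown')]
```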
Method 2: Using Temporary Table
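Another approach is to copy one instance of each row into a temporary table, empty the original table, and copy the rows back. Because the id column holds a different value on every row, SELECT DISTINCT * would treat all rows as distinct and remove nothing. The temporary table therefore has to be built from the columns that define a duplicate. Taking MIN(id) in each group keeps the original id of one copy (the smallest) and keeps the column order identical to employees, so INSERT ... SELECT * still lines up.
Query:
BEGIN;
CREATE TEMP TABLE temp_employees AS
SELECT MIN(id) AS id, name, department, salary
FROM employees
GROUP BY name, department, salary;
DELETE FROM employees;
INSERT INTO employees SELECT * FROM temp_employees;
DROP TABLE temp_employees;
COMMIT;
Output:
The employees table is left with the rows whose id is 1, 2, 4 and 5.
Explanation: The temporary table holds one row per unique combination of name, department and salary. After the original table is emptied and refilled from it, only one instance of each unique row remains. Running the statements in a single transaction means that if a step fails, the work can be rolled back instead of leaving the employees table empty.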
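A common pitfall with the temporary-table approach is that SELECT DISTINCT * compares every column, including id. The sketch below assumes Python's sqlite3 module and an in-memory table. It first shows that DISTINCT * keeps all six rows. It then builds the temporary table from one row per name, department and salary combination, keeping the smallest id, and refills employees inside a single transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary REAL
);
INSERT INTO employees (name, department, salary) VALUES
    ('John Doe', 'Engineering', 50000),
    ('Jane Smith', 'Marketing', 45000),
    ('John Doe', 'Engineering', 50000),
    ('Alice Johnson', 'Finance', 60000),
    ('Bob Brown', 'Engineering', 55000),
    ('Jane Smith', 'Marketing', 45000);
""")

# Pitfall: id differs on every row, so DISTINCT * still sees six rows.
naive_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM employees)"
).fetchone()[0]

# Temp table keyed on the data columns; MIN(id) keeps one original id per group
# and the column order (id, name, department, salary) matches employees.
conn.executescript("""
BEGIN;
CREATE TEMP TABLE temp_employees AS
    SELECT MIN(id) AS id, name, department, salary
    FROM employees
    GROUP BY name, department, salary;
DELETE FROM employees;
INSERT INTO employees SELECT * FROM temp_employees;
DROP TABLE temp_employees;
COMMIT;
""")

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM employees ORDER BY id")]
print(naive_count, remaining_ids)  # 6 [1, 2, 4, 5]
```

Because the original ids are kept rather than reassigned, any other table that refers to employees.id still points at a surviving row.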
Method 3: Using Subquery
A subquery, also known as a nested or inner query, is a query written inside another query. It lets us use the result of one query as the input to another, which makes more complex operations possible.
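With a correlated subquery, we can find the smallest id among the rows that share a row's name, department and salary. The row is then deleted if its own id is larger than that value.
Query:
DELETE FROM employees
WHERE id >
(
SELECT MIN(e2.id)
FROM employees AS e2
WHERE e2.name IS employees.name
AND e2.department IS employees.department
AND e2.salary IS employees.salary
);
Output:
The rows with id 3 and 6 are deleted; the rows with id 1, 2, 4 and 5 remain.
Explanation: For each row of employees, the subquery looks up the smallest id in the group of rows with the same name, department and salary. Only rows whose id is greater than that minimum (the extra copies) are deleted, so one instance of each unique row remains. The IS operator is used instead of = so that rows with NULL in the same column are still treated as duplicates.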
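The sketch below applies a correlated-subquery delete to the sample data and checks which rows survive. It assumes Python's sqlite3 module and an in-memory table. The alias e2 names the inner copy of employees, so that employees.name inside the subquery refers to the row currently being checked by the outer DELETE. IS is used instead of = so that NULLs in the same column compare as equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary REAL
);
INSERT INTO employees (name, department, salary) VALUES
    ('John Doe', 'Engineering', 50000),
    ('Jane Smith', 'Marketing', 45000),
    ('John Doe', 'Engineering', 50000),
    ('Alice Johnson', 'Finance', 60000),
    ('Bob Brown', 'Engineering', 55000),
    ('Jane Smith', 'Marketing', 45000);
""")

with conn:  # commit on success, roll back on error
    conn.execute("""
        DELETE FROM employees
        WHERE id > (
            -- smallest id among rows with the same data values as this row
            SELECT MIN(e2.id)
            FROM employees AS e2
            WHERE e2.name IS employees.name
              AND e2.department IS employees.department
              AND e2.salary IS employees.salary
        )
    """)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM employees ORDER BY id")]
print(remaining_ids)  # [1, 2, 4, 5]
```

The row with the minimum id in each group is never deleted, so the result does not depend on the order in which SQLite visits the rows.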
Method 4: Using Common Table Expressions (CTEs)
Common Table Expressions (CTEs) define temporary, named result sets that can be referenced within a single statement, much like a subquery. They make complex queries that need several levels of nesting easier to read and maintain. Combined with the ROW_NUMBER() window function, a CTE can number the rows within each group of duplicates so that every row after the first can be deleted. Window functions require SQLite 3.25.0 or later.
Query:
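WITH duplicates AS
(
SELECT id, ROW_NUMBER() OVER (PARTITION BY name, department, salary ORDER BY id) AS row_num
FROM employees
)
DELETE FROM employees
WHERE id IN
(
SELECT id
FROM duplicates
WHERE row_num > 1
);
Output:
The rows with id 3 and 6 are deleted; the rows with id 1, 2, 4 and 5 remain.
Explanation: The CTE numbers the rows within each group of identical name, department and salary values, ordered by id, so the first copy in each group gets row_num 1. The DELETE statement then removes the rows whose row_num is greater than 1, keeping exactly one instance of each unique row. The CTE selects id so that the delete targets individual rows; matching on name, department and salary instead would also delete the copy that should be kept.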
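Here is the CTE approach run from Python, as a minimal sketch assuming the sqlite3 module and an in-memory table. It first checks the linked SQLite version, because ROW_NUMBER() needs 3.25.0 or later. Python's Cursor.rowcount is documented as -1 for statements that start with a CTE, so the sketch checks the surviving ids instead of the reported row count.

```python
import sqlite3

# Window functions such as ROW_NUMBER() need SQLite 3.25.0 or later.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("SQLite 3.25.0+ required, found " + sqlite3.sqlite_version)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary REAL
);
INSERT INTO employees (name, department, salary) VALUES
    ('John Doe', 'Engineering', 50000),
    ('Jane Smith', 'Marketing', 45000),
    ('John Doe', 'Engineering', 50000),
    ('Alice Johnson', 'Finance', 60000),
    ('Bob Brown', 'Engineering', 55000),
    ('Jane Smith', 'Marketing', 45000);
""")

with conn:
    conn.execute("""
        WITH duplicates AS (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY name, department, salary
                       ORDER BY id          -- lowest id in each group gets 1
                   ) AS row_num
            FROM employees
        )
        DELETE FROM employees
        WHERE id IN (SELECT id FROM duplicates WHERE row_num > 1)
    """)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM employees ORDER BY id")]
print(remaining_ids)  # [1, 2, 4, 5]
```

Without ORDER BY inside OVER (...), SQLite may number the rows of a group in any order, so which copy survives would not be guaranteed.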
Conclusion
We have seen how to identify duplicate rows with GROUP BY and HAVING, and how to delete all but one copy of each using ROWID, a temporary table, a subquery, or a CTE. Whichever method you choose, define duplicates by the columns that matter (not by a unique key such as id) and decide which copy to keep. Removing duplicate rows helps maintain data integrity and avoids storing and processing redundant data.