How to Remove Duplicates by using $unionWith in MongoDB?
Last Updated: 07 May, 2024
Duplicate documents in a MongoDB collection can lead to inefficiencies and inconsistencies in data management. MongoDB's aggregation framework provides stages that help us deal with such issues effectively.
In this article, we'll explore how to remove duplicates by combining the $unionWith aggregation stage with $group. We'll cover the concept, the syntax, and practical examples that demonstrate its usage.
Understanding $unionWith
- The $unionWith aggregation stage in MongoDB combines the documents of the current pipeline with the documents of another collection (optionally processed by its own pipeline) into a single stream of documents.
- It behaves like SQL UNION ALL: $unionWith itself does not remove duplicates. Removing them requires a later stage such as $group. This makes $unionWith useful for deduplicating data that is spread across several collections.
Syntax of $unionWith:
The syntax of $unionWith is straightforward. Here's how it looks:
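{
  $unionWith: {
    coll: "<collection_name>",
    pipeline: [ <stage1>, <stage2>, ... ]
  }
}
- $unionWith: The aggregation stage (available since MongoDB 4.4) that appends documents from another collection to the documents already in the pipeline.
- coll: The name of the collection to union documents with. It must be in the same database.
- pipeline (optional): Stages that run on the documents of coll before they are combined. If only coll is needed, the shorthand { $unionWith: "<collection_name>" } can be used.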
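To make these semantics concrete, here is a small in-memory model written in plain JavaScript, so no MongoDB connection is needed. It is only an illustration under simplifying assumptions. Documents are plain objects, and the optional pipeline is modeled as a list of functions that each take and return an array. The helper name unionWith is hypothetical and is not a MongoDB API.

```javascript
// Hypothetical in-memory model of $unionWith; this is NOT a MongoDB API.
// Documents are plain objects and each pipeline "stage" is modeled as a
// function that takes an array of documents and returns a new array.
function unionWith(currentDocs, otherCollectionDocs, pipeline = []) {
  // Run the optional sub-pipeline against the other collection only,
  // mirroring how the `pipeline` field applies to `coll`.
  const processed = pipeline.reduce((docs, stage) => stage(docs), otherCollectionDocs);
  // Append the results. Nothing is merged or deduplicated (UNION ALL).
  // This model puts the current documents first; in a real query, use $sort
  // if you need a particular order.
  return [...currentDocs, ...processed];
}

// A small subset of the sample data used later in this article.
const users = [
  { name: "Alice", age: 30 },
  { name: "Bob", age: 35 },
  { name: "Alice", age: 60 },
];
const collection2 = [
  { name: "Alice", age: 65 },
  { name: "Bob", age: 70 },
];

// Like { $unionWith: { coll: "collection2" } }: all 5 documents survive.
const combined = unionWith(users, collection2);

// Like { $unionWith: { coll: "collection2", pipeline: [ { $match: { age: { $gt: 66 } } } ] } }:
// the filter touches only collection2's documents.
const filtered = unionWith(users, collection2, [
  (docs) => docs.filter((d) => d.age > 66),
]);

console.log(combined.length); // 5
console.log(filtered.map((d) => d.name).join(", ")); // Alice, Bob, Alice, Bob
```

The plain union keeps both copies of Alice from users, which is why a separate grouping stage is needed to remove duplicates.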
Example of Removing Duplicates with $unionWith
To understand how to remove duplicates with $unionWith, we need some documents to work with. We will use two collections, users and collection2, which contain the documents shown below.
users:
[
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d12"),
    "name": "Alice",
    "email": "[email protected]",
    "age": 30
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d13"),
    "name": "Bob",
    "email": "[email protected]",
    "age": 35
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d14"),
    "name": "Charlie",
    "email": "[email protected]",
    "age": 40
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d15"),
    "name": "David",
    "email": "[email protected]",
    "age": 45
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d16"),
    "name": "Eve",
    "email": "[email protected]",
    "age": 50
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d17"),
    "name": "Frank",
    "email": "[email protected]",
    "age": 55
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d18"),
    "name": "Alice",
    "email": "[email protected]",
    "age": 60
  }
]
collection2:
[
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d19"),
    "name": "Alice",
    "email": "[email protected]",
    "age": 65
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d20"),
    "name": "Bob",
    "email": "[email protected]",
    "age": 70
  }
]
Example 1: Remove duplicates based on the "name" field
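db.users.aggregate([
  // Append all documents from collection2 (duplicates are kept at this point)
  { $unionWith: { coll: "collection2" } },
  // Give the stream a defined order so that $first is predictable
  { $sort: { _id: 1 } },
  // Create one group per distinct name and keep the first whole document
  {
    $group: {
      _id: "$name",
      doc: { $first: "$$ROOT" }
    }
  },
  // Promote the kept document back to the top level
  { $replaceRoot: { newRoot: "$doc" } },
  // $group does not guarantee output order, so sort the final result
  { $sort: { _id: 1 } }
])
Output:
[
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d12"),
    "name": "Alice",
    "email": "[email protected]",
    "age": 30
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d13"),
    "name": "Bob",
    "email": "[email protected]",
    "age": 35
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d14"),
    "name": "Charlie",
    "email": "[email protected]",
    "age": 40
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d15"),
    "name": "David",
    "email": "[email protected]",
    "age": 45
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d16"),
    "name": "Eve",
    "email": "[email protected]",
    "age": 50
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d17"),
    "name": "Frank",
    "email": "[email protected]",
    "age": 55
  }
]
Explanation: This pipeline appends the documents of collection2 to those of the users collection and sorts the combined stream by _id. The $group stage then creates one group per value of the "name" field, and $first: "$$ROOT" stores the first document of each group. As a result, the Alice document with age 30 and the Bob document with age 35 are kept, while the later Alice and Bob documents are dropped. Finally, $replaceRoot promotes each stored document back to the top level, and the last $sort orders the output. The result contains one document per name; neither collection is modified.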
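The following plain-JavaScript sketch mimics what a union-then-group pipeline computes, so the effect of $first can be checked without a database. It is a simplified model, not how MongoDB executes the stages. The _id values are shortened to plain strings (the last two hex digits of the ObjectIds above), the email fields are omitted, and the combined documents are explicitly sorted by _id so that "first" is well defined. The function name dedupeByField is hypothetical.

```javascript
// Simplified copy of the sample data: _id shortened to the last two hex digits.
const users = [
  { _id: "12", name: "Alice", age: 30 },
  { _id: "13", name: "Bob", age: 35 },
  { _id: "14", name: "Charlie", age: 40 },
  { _id: "15", name: "David", age: 45 },
  { _id: "16", name: "Eve", age: 50 },
  { _id: "17", name: "Frank", age: 55 },
  { _id: "18", name: "Alice", age: 60 },
];
const collection2 = [
  { _id: "19", name: "Alice", age: 65 },
  { _id: "20", name: "Bob", age: 70 },
];

// Hypothetical helper mirroring $unionWith + $sort + $group/$first + $replaceRoot.
function dedupeByField(docsA, docsB, field) {
  // $unionWith: append the second collection (duplicates kept).
  const combined = [...docsA, ...docsB];
  // $sort: { _id: 1 } so that "first" has a well-defined meaning.
  combined.sort((a, b) => a._id.localeCompare(b._id));
  // $group with $first: "$$ROOT": remember only the first document per key.
  const groups = new Map();
  for (const doc of combined) {
    if (!groups.has(doc[field])) groups.set(doc[field], doc);
  }
  // $replaceRoot: the stored documents become the output documents.
  return [...groups.values()];
}

const result = dedupeByField(users, collection2, "name");
console.log(result.map((d) => `${d.name}:${d.age}`).join(", "));
// Alice:30, Bob:35, Charlie:40, David:45, Eve:50, Frank:55
```

A Map is used because it keeps insertion order, so the surviving documents come out in _id order, just like the sorted output above.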
Example 2: Remove duplicates based on the "email" field
To remove duplicates based on the "email" field instead, change the _id expression of the $group stage to "$email":
db.users.aggregate([
  // Append all documents from collection2 (duplicates are kept at this point)
  { $unionWith: { coll: "collection2" } },
  // Give the stream a defined order so that $first is predictable
  { $sort: { _id: 1 } },
  // Create one group per distinct email and keep the first whole document
  {
    $group: {
      _id: "$email",
      doc: { $first: "$$ROOT" }
    }
  },
  // Promote the kept document back to the top level
  { $replaceRoot: { newRoot: "$doc" } },
  // $group does not guarantee output order, so sort the final result
  { $sort: { _id: 1 } }
])
Output:
[
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d12"),
    "name": "Alice",
    "email": "[email protected]",
    "age": 30
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d13"),
    "name": "Bob",
    "email": "[email protected]",
    "age": 35
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d14"),
    "name": "Charlie",
    "email": "[email protected]",
    "age": 40
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d15"),
    "name": "David",
    "email": "[email protected]",
    "age": 45
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d16"),
    "name": "Eve",
    "email": "[email protected]",
    "age": 50
  },
  {
    "_id": ObjectId("60f3727c81c1b4e14f252d17"),
    "name": "Frank",
    "email": "[email protected]",
    "age": 55
  }
]
Explanation: This pipeline works the same way as Example 1 but groups by the "email" field. Documents are therefore treated as duplicates when their email addresses match, regardless of their names. $first keeps the first document for each email, and $replaceRoot promotes it back to the top level. With this sample data, the result happens to be the same six documents as in Example 1.
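The choice of grouping key decides what counts as a duplicate. The plain-JavaScript sketch below compares three keys: name, email, and a compound key equivalent to _id: { name: "$name", email: "$email" } in $group. It is a model, not MongoDB code. The email addresses and the "Robert" document are invented for illustration and are not the article's data. The helper dedupeBy is hypothetical, and it sorts by _id so that "first" is well defined.

```javascript
// Hypothetical data: the email addresses and the "Robert" document are invented.
const users = [
  { _id: "12", name: "Alice", email: "alice@example.com", age: 30 },
  { _id: "13", name: "Bob", email: "bob@example.com", age: 35 },
  { _id: "18", name: "Alice", email: "alice@example.com", age: 60 },
];
const collection2 = [
  { _id: "19", name: "Alice", email: "alice.work@example.com", age: 65 },
  { _id: "20", name: "Robert", email: "bob@example.com", age: 70 },
];

// Hypothetical helper: keep the first document (in _id order) for each key.
// keyOf plays the role of the _id expression in $group.
function dedupeBy(docs, keyOf) {
  const sorted = [...docs].sort((a, b) => a._id.localeCompare(b._id));
  const firstByKey = new Map();
  for (const doc of sorted) {
    const key = keyOf(doc);
    if (!firstByKey.has(key)) firstByKey.set(key, doc);
  }
  return [...firstByKey.values()];
}

const combined = [...users, ...collection2]; // models $unionWith

const byName = dedupeBy(combined, (d) => d.name); // like _id: "$name"
const byEmail = dedupeBy(combined, (d) => d.email); // like _id: "$email"
// Like _id: { name: "$name", email: "$email" }; JSON.stringify turns the pair
// into a string so it can be used as a Map key.
const byNameAndEmail = dedupeBy(combined, (d) => JSON.stringify([d.name, d.email]));

const ids = (docs) => docs.map((d) => d._id).join(",");
console.log(ids(byName)); // 12,13,20
console.log(ids(byEmail)); // 12,13,19
console.log(ids(byNameAndEmail)); // 12,13,19,20
```

Grouping by name merges the two different Alices, grouping by email merges Bob and Robert, and the compound key keeps both. Pick the key that matches what "the same record" means in your data.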
Conclusion
Overall, we explored how to remove duplicates across MongoDB collections by combining the $unionWith aggregation stage with $group. $unionWith merges the documents of both collections into one stream and keeps every copy, like SQL UNION ALL. $group with $first then keeps a single document per value of the chosen field, such as "name" or "email". Keep in mind that the pipeline only returns deduplicated results; the source collections are not changed. To store the result, end the pipeline with an $out or $merge stage. As you continue to work with MongoDB, mastering aggregation pipelines and their stages will prove invaluable for various data processing tasks.