The MongoDB aggregation pipeline is a framework for data processing in which documents pass through a sequence of transformations such as filtering, grouping, and reshaping.
In this article, we will learn about several aggregation pipeline stages in MongoDB with the help of examples.
Aggregation Pipeline Stages in MongoDB
The aggregation pipeline in MongoDB is a framework designed for data aggregation, inspired by data processing pipelines. Documents enter a multi-stage pipeline that can transform them in various ways, including filtering, grouping, reshaping, and modifying.
Each stage performs an operation on the documents that pass through it and hands its results to the next stage. MongoDB provides many such stages, called aggregation pipeline stages. The aggregate() method takes the stages as an array and runs them in sequence, and the output of the final stage is returned as the result.
Syntax:
db.<collection_name>.aggregate(pipeline, options)
or
db.<collection_name>.aggregate( [ { <stage1> }, {<stage2>},... ] )
- A pipeline is an array of aggregation stages. If the pipeline is empty, aggregate() returns all the documents from the collection.
- Stages always run in sequence: stage1 executes first and passes its result to stage2, and so on.
- The options argument is optional and is available only when you specify the pipeline as an array.
- MongoDB may optimize a pipeline before running it, for example by reordering or merging stages, which can improve performance.
- When an expression refers to a field's value, prefix the field path with a dollar sign ($), for example "$skills"; variables take two dollar signs ($$). Plain string values are written without a dollar sign.
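The sequencing rule in the list above can be pictured with a small plain-JavaScript sketch. It does not talk to MongoDB: `runPipeline`, the two stand-in stage functions, and the sample documents are hypothetical names chosen only for illustration.

```javascript
// Hypothetical in-memory documents standing in for a collection.
const docs = [
  { name: "Asha", active: true },
  { name: "Ben", active: false },
  { name: "Chen", active: true },
];

// Each stand-in stage takes an array of documents and returns a new array.
const keepActive = (input) => input.filter((doc) => doc.active);
const keepNameOnly = (input) => input.map((doc) => ({ name: doc.name }));

// Run the stages in order: the output of one stage is the input of the next.
function runPipeline(input, stages) {
  return stages.reduce((current, stage) => stage(current), input);
}

const names = runPipeline(docs, [keepActive, keepNameOnly]);
console.log(names); // [ { name: 'Asha' }, { name: 'Chen' } ]

// With no stages, every document passes through unchanged.
console.log(runPipeline(docs, []).length); // 3
```

Swapping the two stand-in stages would change the result, because `keepNameOnly` drops the `active` field that `keepActive` relies on. Stage order matters in the same way in a real pipeline.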
Aggregation Stages
1. $addFields Stage
$addFields stage is used to add new fields to the existing documents. It returns documents that contain all the existing fields and newly added fields.
Syntax :
db.<collection_name>.aggregate(
[
{
$addFields: { <newField1>: <expression>, <newField2>: <expression> ... }
}
]);
Example:
Let's suppose we have to add a new field `isActive` with the value `true` to every document in the `employee_details` collection using the MongoDB aggregation pipeline.
Query:
db.employee_details.aggregate([
{
$addFields: {"isActive" : true }
}
])
Output:
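The shape of the result can be illustrated with a plain-JavaScript sketch that imitates the stage on two hypothetical documents. The `addFields` helper and the sample data are assumptions made for this illustration; the sketch handles constant values only and is not how MongoDB implements the stage.

```javascript
// Hypothetical sample documents; the real employee_details data is not shown here.
const employees = [
  { _id: 1, name: { firstName: "Asha", lastName: "Rao" } },
  { _id: 2, name: { firstName: "Ben", lastName: "Lee" } },
];

// Imitates { $addFields: { ... } } for constant values only:
// every existing field is kept and the new fields are merged in.
function addFields(docs, newFields) {
  return docs.map((doc) => ({ ...doc, ...newFields }));
}

const withFlag = addFields(employees, { isActive: true });
console.log(withFlag);
// Each document keeps _id and name and gains isActive: true.
```

The input array is left untouched, which mirrors the fact that the stage changes the pipeline output rather than the stored documents.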
2. $set Stage
The $set stage adds new fields to existing documents and returns documents that contain all the existing fields plus the newly added ones. It is an alias for the $addFields stage. If the name of a new field is the same as an existing field (including _id), $set overwrites the existing value of that field with the new value.
Note: The $set stage was introduced in MongoDB 4.2.
Syntax:
db.<collection_name>.aggregate(
[
{
$set: { <newField>: <expression>, ... }
}
]);
Example:
Let's suppose we have to add a new field `company_name` with the value `GFG` to every document in the `employee_details` collection using the MongoDB aggregation pipeline.
Query:
db.employee_details.aggregate([
{
$set: {"company_name" : "GFG"}
}
])
Output:
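The overwrite behaviour described for $set can be illustrated with a plain-JavaScript sketch. The `setFields` helper and the sample documents (one of which already carries a `company_name`) are hypothetical, and the sketch covers constant values only.

```javascript
// Hypothetical documents; the second already has a company_name value.
const employees = [
  { _id: 1, firstName: "Asha" },
  { _id: 2, firstName: "Ben", company_name: "OldCo" },
];

// Imitates { $set: { company_name: "GFG" } } for constant values:
// a field that already exists is overwritten, a missing one is added.
function setFields(docs, fields) {
  return docs.map((doc) => ({ ...doc, ...fields }));
}

const updated = setFields(employees, { company_name: "GFG" });
console.log(updated.map((doc) => doc.company_name)); // [ 'GFG', 'GFG' ]
```

In MongoDB the stored documents stay as they were unless a later stage such as $merge or $out writes the pipeline results back.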
3. $unset Stage
The $unset stage removes (excludes) fields from documents. To remove a field from an embedded document, use dot notation. The $unset stage was introduced in MongoDB 4.2.
Syntax:
- To remove a single field, the $unset stage takes a string that specifies the field to remove.
db.<collection_name>.aggregate(
[
{
$unset: "<field>"
}
]);
- To remove multiple fields, the $unset stage takes an array of field names to remove.
db.<collection_name>.aggregate(
[
{
$unset: [ "<field1>", "<field2>", ... ]
}
]);
Example:
Let's suppose we have to remove the nested field `address.phone.type` from every document in the `employee_details` collection using the MongoDB aggregation pipeline.
Query:
db.employee_details.aggregate([
{
$unset: "address.phone.type"
}
]);
Output:
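The effect of removing a nested field through dot notation can be illustrated with a plain-JavaScript sketch. The `unsetPath` helper and the sample documents are hypothetical; the sketch walks plain embedded objects only and does not try to reproduce how MongoDB treats arrays along the path.

```javascript
// Hypothetical documents; only the first has the nested phone object.
const employees = [
  { _id: 1, address: { city: "Pune", phone: { type: "mobile", number: "555-0100" } } },
  { _id: 2, address: { city: "Delhi" } },
];

// Imitates { $unset: "a.b.c" } on plain embedded objects.
function unsetPath(doc, path) {
  const copy = JSON.parse(JSON.stringify(doc)); // deep copy so the input is not mutated
  const keys = path.split(".");
  let node = copy;
  // Walk down to the parent of the field that should be removed.
  for (const key of keys.slice(0, -1)) {
    if (node === null || typeof node !== "object" || !(key in node)) return copy;
    node = node[key];
  }
  if (node !== null && typeof node === "object") delete node[keys[keys.length - 1]];
  return copy;
}

const cleaned = employees.map((doc) => unsetPath(doc, "address.phone.type"));
console.log(JSON.stringify(cleaned[0])); // {"_id":1,"address":{"city":"Pune","phone":{"number":"555-0100"}}}
console.log(JSON.stringify(cleaned[1])); // {"_id":2,"address":{"city":"Delhi"}}
```

Only the innermost `type` field disappears; sibling fields such as `number` and documents without the path are left as they are.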
4. $project Stage
The $project stage is used to include fields, exclude the _id field, add new fields, and replace existing field values in the output documents. The following table describes each specification.
Specification | Description |
<field>: <1 or true> | Include the field in the output document. |
_id: <0 or false> | Exclude the _id field from the output document. |
<field>: <expression> | Add a new field or replace the value of an existing field. |
<field>: <0 or false> | Exclude the field from the output document. If you exclude a field other than _id, you cannot use any other specification form in the same $project stage. |
Syntax:
db.<collection_name>.aggregate(
[
{
$project: { <specification>}
}
])
Example:
Let's suppose we have to retrieve only the `firstName` and `lastName` fields from the `name` object and the `salary` field for every document in the `employee_details` collection, excluding the `_id` field, using the MongoDB aggregation pipeline.
Query:
db.employee_details.aggregate(
[
{
$project:{"name.firstName":1, "name.lastName":1, "salary":1, "_id":0}
}
]);
Output:
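The shape of an inclusion projection can be illustrated with a plain-JavaScript sketch. The `projectInclude` helper and the sample document are hypothetical; the sketch supports only inclusion of dotted paths in plain embedded objects plus the option to drop `_id`, not the full $project specification.

```javascript
// Hypothetical document with more fields than the projection asks for.
const employees = [
  {
    _id: 1,
    name: { firstName: "Asha", middleName: "K", lastName: "Rao" },
    salary: 50000,
    department: "IT",
  },
];

// Imitates an inclusion-only $project such as
// { "name.firstName": 1, "name.lastName": 1, salary: 1, _id: 0 }.
function projectInclude(doc, paths, keepId = true) {
  const out = {};
  if (keepId && "_id" in doc) out._id = doc._id;
  for (const path of paths) {
    const keys = path.split(".");
    // Look up the value at the dotted path; skip the path if it is missing.
    let value = doc;
    let found = true;
    for (const key of keys) {
      if (value !== null && typeof value === "object" && key in value) {
        value = value[key];
      } else {
        found = false;
        break;
      }
    }
    if (!found) continue;
    // Rebuild the same nesting in the output document.
    let target = out;
    for (const key of keys.slice(0, -1)) {
      if (!(key in target)) target[key] = {};
      target = target[key];
    }
    target[keys[keys.length - 1]] = value;
  }
  return out;
}

const projected = employees.map((doc) =>
  projectInclude(doc, ["name.firstName", "name.lastName", "salary"], false)
);
console.log(JSON.stringify(projected[0]));
// {"name":{"firstName":"Asha","lastName":"Rao"},"salary":50000}
```

Fields that are not listed, such as `name.middleName` and `department`, do not appear in the output, and `_id` is dropped because it was excluded explicitly.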
5. $count Stage
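$count returns the number of documents that reach this stage. When it is the first stage in the pipeline, this is the number of documents in the collection.
Note: To count the documents in a collection you can also use db.<collection_name>.countDocuments() (the older db.<collection_name>.count() is deprecated). These methods return only a number, so you cannot specify an output field name.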
Syntax:
db.<collection_name>.aggregate(
[
{
$count: <string>
}
]);
<string> is the name of the output field that holds the count as its value.
Example:
Let's suppose we have to count the total number of documents in the `employee_details` collection and return the result as `employees_count` using the MongoDB aggregation pipeline.
Query:
db.employee_details.aggregate(
[
{ $count: "employees_count" }
]);
Output:
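The shape of the result can be illustrated with a plain-JavaScript sketch. The `count` helper and the three sample documents are hypothetical; the real collection size is not known here.

```javascript
// Hypothetical collection of three documents.
const employees = [{ _id: 1 }, { _id: 2 }, { _id: 3 }];

// Imitates { $count: "<string>" }: a single output document whose one field
// holds the number of input documents. Empty input produces no document.
function count(docs, fieldName) {
  return docs.length === 0 ? [] : [{ [fieldName]: docs.length }];
}

console.log(count(employees, "employees_count")); // [ { employees_count: 3 } ]
console.log(count([], "employees_count")); // []
```

The second call shows a detail that is easy to miss: when no documents reach the stage, the result is empty rather than a document with a count of 0.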
6. $unwind Stage
The $unwind stage deconstructs an array field from the input documents and outputs one document per array element. If the field path is not an array, but is also not missing, null, or an empty array, $unwind treats the value as a single-element array, and the includeArrayIndex value for that field is null.
Note: The $unwind stage does not output documents whose field is missing, null, or an empty array unless you set preserveNullAndEmptyArrays: true.
Syntax:
db.<collection_name>.aggregate(
[
{
$unwind:
{
path: <field path>,
includeArrayIndex: <string>,
preserveNullAndEmptyArrays: <boolean>
}
}
]);
- path specifies the field that you want to unwind.
- includeArrayIndex is the name of a new field that holds the array index of each element. An index is produced only for array fields; it does not apply to fields that are not an array, are null, or are empty.
- preserveNullAndEmptyArrays: if true, $unwind also outputs documents whose field is missing, null, or an empty array. The default is false.
Example:
Let's suppose we have to skip the first 3 documents and then unwind the `skills` array field in each remaining document in the `employee_details` collection using the MongoDB aggregation pipeline.
Query:
db.employee_details.aggregate(
[
{$skip : 3},
{ $unwind: "$skills" }
]
);
Output:
The above operation skips the first 3 documents and then breaks down each remaining document on the basis of its skills array field, producing one output document per array element.
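The unwinding rules can be illustrated with a plain-JavaScript sketch. The `unwind` helper and the four sample documents are hypothetical; the sketch handles a top-level field only, leaves out includeArrayIndex, and its handling of preserved empty arrays (the field is omitted) should be read as an assumption.

```javascript
// Hypothetical documents covering the cases described above.
const employees = [
  { _id: 1, skills: ["JavaScript", "MongoDB"] }, // normal array
  { _id: 2, skills: [] },                        // empty array
  { _id: 3 },                                    // field missing
  { _id: 4, skills: "SQL" },                     // non-array value
];

// Imitates { $unwind: { path: "$<field>", preserveNullAndEmptyArrays: <boolean> } }.
function unwind(docs, field, preserveNullAndEmptyArrays = false) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    const isEmptyArray = Array.isArray(value) && value.length === 0;
    if (value === undefined || value === null || isEmptyArray) {
      if (preserveNullAndEmptyArrays) {
        const kept = { ...doc };
        if (isEmptyArray) delete kept[field]; // assumption: empty array field is omitted
        out.push(kept);
      }
      continue; // by default such documents are dropped
    }
    // A non-array value is treated as a single-element array.
    const elements = Array.isArray(value) ? value : [value];
    for (const element of elements) out.push({ ...doc, [field]: element });
  }
  return out;
}

const unwound = unwind(employees, "skills");
console.log(unwound.map((doc) => [doc._id, doc.skills]));
// [ [ 1, 'JavaScript' ], [ 1, 'MongoDB' ], [ 4, 'SQL' ] ]
console.log(unwind(employees, "skills", true).length); // 5
```

With the default setting, documents 2 and 3 vanish from the output, which is worth keeping in mind before counting or grouping the unwound results.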
7. $limit Stage
$limit stage is used to limit the number of documents passed to the next stage in the pipeline.
Syntax:
db.<collection_name>.aggregate(
[
{
$limit: <positive 64-bit integer>
}
]);
Example:
Let's suppose we have to limit the results to the first 2 documents and then count these documents, returning the result as `employees_count`, using the MongoDB aggregation pipeline.
Query:
db.employee_details.aggregate(
[
{$limit : 2},
{$count : "employees_count" }
]);
Output:
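The result of limiting and then counting can be illustrated with a plain-JavaScript sketch. The `limit` and `count` helpers and the four sample documents are hypothetical stand-ins for the two stages.

```javascript
// Hypothetical collection of four documents.
const employees = [{ _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 4 }];

// Imitates { $limit: n }: pass on at most n documents.
const limit = (docs, n) => docs.slice(0, n);

// Imitates { $count: "<string>" } (no output document for empty input).
const count = (docs, fieldName) =>
  docs.length === 0 ? [] : [{ [fieldName]: docs.length }];

const limited = count(limit(employees, 2), "employees_count");
console.log(limited); // [ { employees_count: 2 } ]

// If fewer documents exist than the limit, all of them are passed on.
const small = count(limit(employees.slice(0, 1), 2), "employees_count");
console.log(small); // [ { employees_count: 1 } ]
```

Because $limit runs first, the count can never exceed the limit, whatever the size of the collection.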
8. $sort Stage
$sort stage is used to sort the documents. If sorting is on multiple fields, the sort order is evaluated from left to right.
For example, documents are first sorted by <field1> and then documents with the same <field1> values are further sorted by <field2>.
Note: You can sort a maximum of 32 keys.
Syntax:
db.<collection_name>.aggregate(
[
{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }
]);
<field> is the document field to sort by, and <sort order> determines whether the sort is ascending or descending. <sort order> takes the following values:
Value | Order |
1 | Ascending |
-1 | Descending |
Example of $sort in Ascending Order
Let's suppose we have to sort the documents in the `employee_details` collection in ascending order based on the `firstName` field in the `name` object using the MongoDB aggregation pipeline.
Query:
db.employee_details.aggregate(
[
{$sort : {"name.firstName" : 1}}
]);
Output:
Example of $sort in Descending Order
Let's suppose we have to sort the documents in the `employee_details` collection in descending order based on the `firstName` field in the `name` object using the MongoDB aggregation pipeline.
Query:
db.employee_details.aggregate(
[
{$sort : {"name.firstName" : -1}}
]);
Output:
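The left-to-right evaluation of multiple sort keys can be illustrated with a plain-JavaScript sketch. The `sortDocs` helper, the `department` and `salary` fields, and the sample documents are hypothetical; the sketch compares top-level values of the same type only and does not reproduce MongoDB's ordering across different BSON types.

```javascript
// Hypothetical documents with two sortable top-level fields.
const employees = [
  { firstName: "Asha", department: "IT", salary: 50000 },
  { firstName: "Ben", department: "HR", salary: 40000 },
  { firstName: "Chen", department: "IT", salary: 70000 },
];

// Imitates { $sort: { <field1>: <order>, <field2>: <order> } } with 1 or -1 as order.
function sortDocs(docs, spec) {
  const keys = Object.entries(spec); // keeps the left-to-right order of the spec
  return [...docs].sort((a, b) => {
    for (const [field, order] of keys) {
      if (a[field] < b[field]) return -order;
      if (a[field] > b[field]) return order;
      // Equal on this key: fall through to the next key.
    }
    return 0;
  });
}

const sorted = sortDocs(employees, { department: 1, salary: -1 });
console.log(sorted.map((doc) => doc.firstName)); // [ 'Ben', 'Chen', 'Asha' ]
```

Documents are ordered by `department` ascending first; `salary` descending only decides the order of the two documents that share the department "IT".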
9. $sortByCount Stage
The $sortByCount stage groups the documents by the value of an expression (typically a field path), counts the number of documents in each group, and sorts the groups by count in descending order. Each output document has two fields: an _id field containing the distinct grouping value and a count field containing the number of documents in that group.
Note: The $sortByCount stage is equivalent to the following $group + $sort sequence: { $group: { _id: <expression>, count: { $sum: 1 } } }, { $sort: { count: -1 } }
Syntax:
db.<collection_name>.aggregate(
[
{$sortByCount : "<field path>"}
]);
Example:
Let's suppose we have to list and count the occurrences of each skill across all employees in the `employee_details` collection, after unwinding the `skills` array field, using the MongoDB aggregation pipeline.
Query:
db.employee_details.aggregate(
[
{$unwind : "$skills"},
{$sortByCount : "$skills"}
]);
Output:
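The group, count, and sort steps can be illustrated with a plain-JavaScript sketch. The `sortByCount` helper and the sample documents are hypothetical, `flatMap` stands in for the preceding $unwind, and the sample is chosen so that no two skills share the same count.

```javascript
// Hypothetical documents, each with a skills array.
const employees = [
  { _id: 1, skills: ["JavaScript", "MongoDB"] },
  { _id: 2, skills: ["MongoDB"] },
  { _id: 3, skills: ["MongoDB", "JavaScript", "Python"] },
];

// Imitates $sortByCount on a list of already unwound values:
// group equal values, count each group, sort by count descending.
function sortByCount(values) {
  const counts = new Map();
  for (const value of values) counts.set(value, (counts.get(value) || 0) + 1);
  return [...counts.entries()]
    .map(([value, count]) => ({ _id: value, count }))
    .sort((a, b) => b.count - a.count);
}

const allSkills = employees.flatMap((doc) => doc.skills); // stands in for $unwind
const ranking = sortByCount(allSkills);
console.log(ranking);
// [
//   { _id: 'MongoDB', count: 3 },
//   { _id: 'JavaScript', count: 2 },
//   { _id: 'Python', count: 1 }
// ]
```

Each output entry has the same two fields described above: `_id` for the grouping value and `count` for the size of the group.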
10. $skip Stage
The $skip stage skips the specified number of documents and passes the remaining documents to the next stage in the pipeline.
Syntax:
db.<collection_name>.aggregate(
[
{ $skip: <positive 64-bit integer> }
]);
Example:
Let's suppose we have to skip the first 3 documents in the `employee_details` collection using the MongoDB aggregation pipeline.
Query:
db.employee_details.aggregate(
[
{$skip : 3}
]);
Output:
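The result of skipping, and the way it interacts with $limit, can be illustrated with a plain-JavaScript sketch. The `skip` and `limit` helpers and the five sample documents are hypothetical stand-ins for the stages.

```javascript
// Hypothetical collection of five documents.
const employees = [1, 2, 3, 4, 5].map((n) => ({ _id: n }));

// Imitates { $skip: n }: drop the first n documents, pass on the rest.
const skip = (docs, n) => docs.slice(n);

// Imitates { $limit: n }: pass on at most n documents.
const limit = (docs, n) => docs.slice(0, n);

const remaining = skip(employees, 3);
console.log(remaining); // [ { _id: 4 }, { _id: 5 } ]

// Stage order changes the result when $skip and $limit are combined.
const skipThenLimit = limit(skip(employees, 3), 1);
const limitThenSkip = skip(limit(employees, 1), 3);
console.log(skipThenLimit); // [ { _id: 4 } ]
console.log(limitThenSkip); // []
```

In MongoDB itself, which documents count as "the first 3" depends on the order in which they reach the stage, so $skip is usually placed after a $sort.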
Conclusion
Overall, the aggregation pipeline in MongoDB offers a flexible way to manipulate and extract data by chaining processing stages. We have seen several stages that are frequently used in MongoDB aggregation operations.