Adding and Removing Shards in MongoDB
Last Updated: 27 Feb, 2025
MongoDB, as a highly scalable NoSQL database, is designed to handle large amounts of data. One of its powerful features for scaling is sharding. Sharding allows MongoDB to distribute data across multiple machines, improving performance and capacity.
In this article, we will look at how to add and remove shards in MongoDB, covering the steps, considerations, and best practices for managing a sharded cluster effectively.
What is Sharding in MongoDB?
Sharding in MongoDB is the process of distributing data across multiple servers, or "shards," to enable horizontal scaling. This approach is especially useful when the data size exceeds the capacity of a single server. In a sharded cluster, data is divided based on a shard key, which is used to distribute the data across different shards. The cluster also includes two other components:
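- Config servers: Store the metadata and configuration settings for the sharded cluster, including which ranges of data (chunks) live on which shard. Since MongoDB 3.4, config servers must be deployed as a replica set.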
- Mongos routers: Act as the interface between client applications and the sharded cluster, routing client requests to the appropriate shard.
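To make the router's role concrete, here is a minimal sketch of range-based routing. It assumes a collection sharded on a single numeric field and a hypothetical chunk table of the kind mongos reads from the config servers; the shard names and the findShardForKey function are illustrative only. A real mongos also handles compound keys, MinKey/MaxKey bounds, and metadata refreshes.

```javascript
// Hypothetical routing table: each chunk covers the half-open range [min, max)
// of the shard key. -Infinity / Infinity stand in for MongoDB's MinKey / MaxKey.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
];

// Return the shard that owns the chunk containing `key`.
function findShardForKey(chunks, key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error(`No chunk covers key ${key}`);
  return chunk.shard;
}

console.log(findShardForKey(chunks, 42));   // shardA
console.log(findShardForKey(chunks, 1000)); // shardB
console.log(findShardForKey(chunks, 9999)); // shardC
```

Queries that include the shard key can be targeted to one shard this way; queries that omit it have to be broadcast to every shard.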
Adding a Shard to a MongoDB Cluster
Adding a shard to a MongoDB cluster is a straightforward operation, but it must be done with careful planning to ensure the continued stability and performance of the system. Here are the general steps for adding a new shard to a MongoDB cluster.
Step 1: Prepare the New Shard
Before adding the shard to the cluster, make sure it is properly configured. In MongoDB 3.6 and later, every shard must be deployed as a replica set; a standalone mongod instance cannot be added as a shard. Replica sets also provide the redundancy and high availability that production deployments need.
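For the replica set, ensure that:
- Each member mongod is started with the --shardsvr option (or sharding.clusterRole: shardsvr in its configuration file).
- The replica set is initialized and configured.
- The mongod instances are running and reachable from the mongos routers and config servers.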
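As an illustration, the configuration document for a new shard replica set might be built as below. The set name shardRS matches the example used later in this article, while the three host names are hypothetical. In mongosh, connected to one of those members (each started with the --shardsvr and --replSet options), you would pass the resulting object to rs.initiate(); the code here only builds and prints the document, so it also runs in plain Node.js.

```javascript
// Build a replica set configuration document suitable for rs.initiate().
// The set name and host names are hypothetical placeholders.
function buildReplicaSetConfig(setName, hosts) {
  return {
    _id: setName, // must match the --replSet name each mongod was started with
    members: hosts.map((host, i) => ({ _id: i, host })),
  };
}

const config = buildReplicaSetConfig("shardRS", [
  "shard-host-1:27017",
  "shard-host-2:27017",
  "shard-host-3:27017",
]);
console.log(JSON.stringify(config, null, 2));
// In mongosh, connected to one member: rs.initiate(config)
```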
Step 2: Connect to a mongos Router
Shard management commands are run through a mongos router, not directly on a config server. The config servers store the cluster metadata, and mongos updates that metadata on your behalf when you add or remove a shard.
You can connect to a mongos using the MongoDB Shell (mongosh, which replaces the legacy mongo shell):
mongosh --host <mongos_host> --port <port>
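Because sh.addShard() has to be run through a mongos, it can help to confirm what you are connected to before changing the cluster. A mongos includes msg: "isdbgrid" in its response to the hello command, while a mongod does not. The sketch below checks a hello response object; in mongosh you would pass it db.hello(), and here it is fed hypothetical, trimmed-down sample responses so it runs in Node.js.

```javascript
// Returns true if a hello (or legacy isMaster) response came from a mongos router.
// mongos sets msg: "isdbgrid" in this response; mongod does not.
function isMongos(helloResponse) {
  return helloResponse != null && helloResponse.msg === "isdbgrid";
}

// Hypothetical sample responses, trimmed to a few fields.
const fromMongos = { isWritablePrimary: true, msg: "isdbgrid", ok: 1 };
const fromMongod = { isWritablePrimary: true, setName: "shardRS", ok: 1 };

console.log(isMongos(fromMongos)); // true
console.log(isMongos(fromMongod)); // false
// In mongosh: isMongos(db.hello())
```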
Step 3: Add the Shard to the Cluster
Once connected to mongos, use the sh.addShard() command to add the new shard to the cluster. Specify the shard as a replica set: give the replica set's name, a slash, and the address of at least one of its members. mongos discovers the remaining members from the replica set itself.
For example, to add a replica set named shardRS with a member at hostname:27017, run the following command:
sh.addShard("shardRS/hostname:27017")
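When the shard replica set has several members, listing more than one in the seed string is harmless and can give mongos an alternative if one host is unreachable. A small helper that produces the "<setName>/<host:port>,<host:port>" format accepted by sh.addShard() might look like this; the buildShardSeedList name and the second host are hypothetical.

```javascript
// Build the "<replSetName>/<host:port>,<host:port>,..." string for sh.addShard().
function buildShardSeedList(setName, hosts) {
  if (!setName) throw new Error("replica set name is required");
  if (hosts.length === 0) throw new Error("at least one host is required");
  return `${setName}/${hosts.join(",")}`;
}

const seed = buildShardSeedList("shardRS", ["hostname:27017", "hostname2:27017"]);
console.log(seed); // shardRS/hostname:27017,hostname2:27017
// In mongosh, connected to mongos: sh.addShard(seed)
```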
MongoDB verifies that the shard is reachable and correctly configured, then adds it to the cluster. From then on, the balancer can migrate chunks of sharded collections to the new shard.
Step 4: Verify the Addition of the Shard
To ensure that the shard has been successfully added, use the sh.status() command to view the status of the sharded cluster. This will show the current shards and their statuses.
sh.status()
You should see the new shard listed among the existing shards.
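For a scripted check instead of reading sh.status() by eye, the listShards admin command returns the registered shards as documents with _id and host fields (for a replica-set shard added without an explicit name, the _id is typically the replica set name). The sketch below searches such a result; the sample result and the findShard name are hypothetical, and in mongosh you would pass db.adminCommand({ listShards: 1 }) instead.

```javascript
// Return the shard document with the given name from a listShards result, or null.
function findShard(listShardsResult, shardName) {
  const shards = (listShardsResult && listShardsResult.shards) || [];
  return shards.find((s) => s._id === shardName) || null;
}

// Hypothetical listShards output, trimmed to the relevant fields.
const result = {
  shards: [
    { _id: "shard0", host: "shard0/oldhost:27017", state: 1 },
    { _id: "shardRS", host: "shardRS/hostname:27017", state: 1 },
  ],
  ok: 1,
};

console.log(findShard(result, "shardRS") !== null); // true
console.log(findShard(result, "shardX"));           // null
```

The same check can confirm a removal later: once a shard has been fully removed, it should no longer be found.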
Step 5: Balancing Data
After a shard is added, the balancer begins migrating chunks of sharded collections to it, so that each shard ends up holding a roughly even share of each sharded collection's data. Unsharded collections stay on their database's primary shard and are not moved by the balancer. Depending on the amount of data in the cluster, balancing can take considerable time.
You can check whether the balancer is enabled, and whether a balancing round is currently running, with the following commands:
sh.getBalancerState()
sh.isBalancerRunning()
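The same information is also available as a document from the balancerStatus admin command, which reports a mode field ("full" when the balancer is enabled, "off" when it is disabled) and an inBalancerRound flag. The hypothetical describeBalancer helper below turns such a document into a short summary; the sample inputs are illustrative, and in mongosh you would pass db.adminCommand({ balancerStatus: 1 }).

```javascript
// Summarize a balancerStatus command result in one line.
function describeBalancer(status) {
  if (status.mode === "off") return "balancer disabled";
  const state = status.inBalancerRound ? "balancing round in progress" : "idle between rounds";
  return `balancer enabled (mode: ${status.mode}), ${state}`;
}

// Hypothetical sample results.
console.log(describeBalancer({ mode: "full", inBalancerRound: true, numBalancerRounds: 12, ok: 1 }));
// balancer enabled (mode: full), balancing round in progress
console.log(describeBalancer({ mode: "off", inBalancerRound: false, numBalancerRounds: 12, ok: 1 }));
// balancer disabled
```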
Step 6: Monitor Cluster Health
Once the shard has been added and data is balanced, monitor the cluster health. Ensure that there are no issues with connectivity or data distribution, and check if the new shard is functioning properly.
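One simple way to watch the distribution converge is to compare each shard's share of a collection's data. The sketch below assumes you have already collected per-shard data sizes (hypothetical numbers here, for example read from db.collection.getShardDistribution() or from a monitoring tool) in any consistent unit. It is only a monitoring aid and does not reproduce the balancer's own migration thresholds.

```javascript
// Given per-shard data sizes (same unit for all shards), report each shard's
// percentage of the total and the spread between the largest and smallest shard.
function summarizeDistribution(sizesByShard) {
  const entries = Object.entries(sizesByShard);
  if (entries.length === 0) return { total: 0, shares: {}, spread: 0 };
  const total = entries.reduce((sum, [, size]) => sum + size, 0);
  const shares = Object.fromEntries(
    entries.map(([shard, size]) => [shard, total === 0 ? 0 : (100 * size) / total])
  );
  const sizes = entries.map(([, size]) => size);
  return { total, shares, spread: Math.max(...sizes) - Math.min(...sizes) };
}

// Hypothetical sizes in GB, with shardRS still catching up after being added.
const summary = summarizeDistribution({ shardA: 400, shardB: 350, shardRS: 250 });
console.log(summary.shares); // { shardA: 40, shardB: 35, shardRS: 25 }
console.log(summary.spread); // 150
```

A newly added shard starts near 0% and its share should rise as the balancer migrates chunks to it.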
Removing a Shard from a MongoDB Cluster
Removing a shard from a MongoDB cluster can be necessary for scaling down or decommissioning servers. The process is relatively straightforward, but care must be taken to avoid data loss. The cluster stays online during removal, and the balancer must be enabled, because it is the balancer that moves the departing shard's chunks to the remaining shards.
Step 1: Check Capacity and Enable the Balancer
Before removing a shard, make sure the remaining shards have enough capacity (storage, memory, and I/O) to absorb its data. MongoDB moves all of the departing shard's chunks to the other shards, and if they lack capacity, the migration can cause performance problems.
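A rough pre-check can be written as simple arithmetic. The canAbsorb helper below is hypothetical: it assumes you know how much data the departing shard holds and how much free storage each remaining shard has (all in the same unit), and it keeps a configurable percentage of each shard's free space in reserve. It ignores memory and I/O, so treat it as a first sanity check, not a capacity plan.

```javascript
// Can the remaining shards absorb the departing shard's data while keeping
// `headroomPercent` of their free space in reserve? All figures are hypothetical
// and must use the same unit (for example GB).
function canAbsorb(departingDataSize, freeSpaceByShard, headroomPercent = 20) {
  const usable = Object.values(freeSpaceByShard).reduce(
    (sum, free) => sum + (free * (100 - headroomPercent)) / 100,
    0
  );
  return { usable, ok: usable >= departingDataSize };
}

console.log(canAbsorb(300, { shardA: 250, shardB: 200 })); // { usable: 360, ok: true }
console.log(canAbsorb(400, { shardA: 250, shardB: 200 })); // { usable: 360, ok: false }
```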
You can review the current data distribution and balancer state with the following command:
sh.status()
If the balancer is disabled, enable it, because shard removal relies on it to migrate chunks:
sh.startBalancer()
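The check-then-enable step can be scripted. In the sketch below, ensureBalancerEnabled is a hypothetical helper that receives an object providing getBalancerState() and startBalancer(), as the mongosh sh object does; a stub stands in for sh so the logic also runs in Node.js.

```javascript
// Enable the balancer if it is off, since draining a shard depends on it.
// `shell` must provide getBalancerState() and startBalancer(), like mongosh's `sh`.
function ensureBalancerEnabled(shell) {
  if (shell.getBalancerState()) return "already enabled";
  shell.startBalancer();
  return "enabled";
}

// Hypothetical stub standing in for mongosh's `sh` object.
let enabled = false;
const stubSh = {
  getBalancerState: () => enabled,
  startBalancer: () => { enabled = true; },
};

console.log(ensureBalancerEnabled(stubSh)); // enabled
console.log(ensureBalancerEnabled(stubSh)); // already enabled
// In mongosh, connected to mongos: ensureBalancerEnabled(sh)
```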
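Step 2: Start Draining the Shard
When you are ready to remove a shard, run the removeShard command from a mongos, using the shard's name as shown by sh.status():
db.adminCommand({ removeShard: "<shard_name>" })
The shard is not removed immediately. The first call starts draining (the response reports state: "started"), and the balancer then migrates the shard's chunks to the remaining shards, so no data is lost. The cluster remains available while the shard drains. If the shard is the primary shard for any databases, the response lists them in dbsToMove, and each of those databases must be given a new primary shard with the movePrimary command before the removal can finish.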
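The response of removeShard tells you what remains to be done. The sketch below is a hypothetical helper that reads the state and dbsToMove fields and, for each database listed, builds the corresponding movePrimary command document. The target shard and the sample response are illustrative assumptions; check the movePrimary documentation for when it is safe to run in your MongoDB version.

```javascript
// Decide the next step from a removeShard command result.
// `targetShard` is the shard that should become primary for any databases listed
// in dbsToMove (a hypothetical choice; pick one with enough capacity).
function nextRemovalStep(result, targetShard) {
  if (!result.ok) throw new Error(result.errmsg || "removeShard failed");
  if (result.state === "completed") return { done: true, commands: [] };
  // Databases whose primary shard is the draining shard must be moved explicitly.
  const dbsToMove = result.dbsToMove || [];
  const commands = dbsToMove.map((dbName) => ({ movePrimary: dbName, to: targetShard }));
  return { done: false, commands };
}

// Hypothetical, trimmed result of the first removeShard call.
const started = { state: "started", shard: "shardRS", dbsToMove: ["inventory"], ok: 1 };
console.log(nextRemovalStep(started, "shardA").commands);
// [ { movePrimary: 'inventory', to: 'shardA' } ]
// In mongosh, each command would be run as db.adminCommand(cmd).
```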
Step 3: Wait for Data Migration
MongoDB migrates data from the shard being removed to the other shards in the cluster. Depending on the amount of data and the cluster's load, this can take a long time. To check progress, run the same removeShard command again: while draining continues, it reports state: "ongoing" together with a remaining field showing how many chunks and databases are left to move.
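Re-running removeShard until it reports "completed" is a simple polling loop. In the sketch below, waitForDrain is a hypothetical helper: the command runner and the pause function are injected, so in mongosh you might pass () => db.adminCommand({ removeShard: "shardRS" }) and the shell's sleep function, while here a stub simulates the sequence of states.

```javascript
// Poll removeShard until draining completes or maxPolls is reached.
// runRemoveShard re-issues the command; pause(ms) waits between polls.
function waitForDrain(runRemoveShard, pause, intervalMs = 60000, maxPolls = 1000) {
  for (let i = 1; i <= maxPolls; i++) {
    const result = runRemoveShard();
    if (!result.ok) throw new Error(result.errmsg || "removeShard failed");
    if (result.state === "completed") return { completed: true, polls: i };
    pause(intervalMs);
  }
  return { completed: false, polls: maxPolls };
}

// Hypothetical stub: reports "started", then "ongoing" twice, then "completed".
const states = ["started", "ongoing", "ongoing", "completed"];
let call = 0;
const stubRemoveShard = () => ({ state: states[Math.min(call++, states.length - 1)], ok: 1 });

console.log(waitForDrain(stubRemoveShard, () => {}, 0)); // { completed: true, polls: 4 }
```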
Step 4: Confirm the Shard Removal
When draining is complete, running removeShard once more reports state: "completed", and the shard is removed from the cluster. Use the sh.status() command again to verify that the shard no longer appears in the list of shards:
sh.status()
Step 5: Shut Down the Shard Server
Once the shard has been removed and data migration is complete, you can safely shut down the shard server. Ensure that the shard server is properly decommissioned and removed from your infrastructure.
Best Practices for Adding and Removing Shards
Although adding and removing shards is straightforward, several best practices help these operations go smoothly:
- Plan for Scaling: Choose shard keys carefully when you shard a collection, since adding shards only helps if data and traffic can be spread across them. A poorly chosen shard key (for example, one with low cardinality or monotonically increasing values) can create hotspots, where one shard receives a disproportionate share of writes or data and limits performance.
- Monitor Performance: Always monitor the performance of the cluster during and after adding/removing shards. Use MongoDB’s built-in monitoring tools or third-party tools to track key metrics such as query response time, disk usage, and network traffic.
- Use Replica Sets: Deploy every shard as a replica set (required in MongoDB 3.6 and later). If a shard's primary fails, a secondary is automatically elected primary, minimizing downtime.
- Balance the Load: The balancer evens out the amount of data per shard when shards are added or removed, but an even data distribution does not guarantee even traffic. Check per-shard query and write load as well to catch bottlenecks.
- Test Changes in a Staging Environment: Before adding or removing shards in a production environment, test the changes in a staging environment. This allows you to identify potential issues without affecting the live cluster.
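The hotspot problem with monotonically increasing shard keys under ranged sharding can be seen in a small simulation. The chunk ranges and shard names below are hypothetical, and the model is simplified: every new key larger than the current maximum falls into the last chunk, so a single shard takes all new inserts. MongoDB's hashed sharding is the usual remedy for such keys.

```javascript
// Hypothetical ranged chunks; -Infinity / Infinity stand in for MinKey / MaxKey.
const ranges = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 2000, shard: "shardB" },
  { min: 2000, max: Infinity, shard: "shardC" },
];

function shardFor(key) {
  return ranges.find((r) => key >= r.min && key < r.max).shard;
}

// Count how many inserts each shard receives for a list of shard key values.
function countInserts(keys) {
  const counts = { shardA: 0, shardB: 0, shardC: 0 };
  for (const key of keys) counts[shardFor(key)]++;
  return counts;
}

// Ever-increasing keys (such as a counter) all land beyond the last boundary.
const increasing = Array.from({ length: 300 }, (_, i) => 5000 + i);
console.log(countInserts(increasing)); // { shardA: 0, shardB: 0, shardC: 300 }

// Keys spread across the whole key space reach every shard.
const spread = Array.from({ length: 300 }, (_, i) => i * 10);
console.log(countInserts(spread)); // { shardA: 100, shardB: 100, shardC: 100 }
```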
Conclusion
Sharding is a critical feature for scaling MongoDB horizontally, and adding or removing shards is an essential operation for managing large, distributed systems. By following the proper steps and best practices outlined in this guide, database administrators can ensure smooth scaling operations while maintaining the stability and performance of their MongoDB clusters. Proper planning and monitoring are key to successful shard management, ensuring that MongoDB can handle growing amounts of data with ease.