Recognizing a gap in straightforward MongoDB benchmarking tools, particularly ones that do not demand complex compilation, configuration, or intricate setup, I developed MongoDB Workload Generator. The aim was to give MongoDB users an accessible way to generate realistic data and simulate application workloads on both sharded and non-sharded clusters with minimal effort.
This tool supports standard CRUD operations and offers the flexibility to define custom queries, enabling tailored workload simulations. Optimized for performance with multi-core utilization and configurable threading, it is well-suited for stress testing. Best of all, since configuration requires only basic connection details, MongoDB Workload Generator prioritizes ease of use without compromising functionality, letting users start benchmarking their MongoDB deployments with very little effort. Check out the MongoDB Workload Generator documentation for guidance on its features, usage, and advanced options.
Configuration for MongoDB benchmarking
MongoDB Workload Generator is composed of four files:
- mongodbCreds.py
- mongodbLoadQueries.py
- app.py
- mongodbWorkload.py
The only required configuration is in mongodbCreds.py, where you define the connection details for your MongoDB cluster. The file is self-explanatory and includes examples to help guide you through the setup process. mongodbCreds.py is designed to be easily extendable, allowing users to include additional parameters as needed. It provides a simplified format for specifying custom settings. Once configured, the application automatically compiles these parameters into the MongoDB URI used for the workload.
Additionally, when running against a sharded environment, multiple mongos routers can be specified in the configuration. When provided, the tool automatically distributes the workload across them, giving you built-in load balancing. Custom queries can optionally be defined in mongodbLoadQueries.py. When adding new queries, make sure they target collections and fields generated by the tool to prevent runtime errors.
Default behavior
MongoDB Workload Generator creates the database and an empty collection for you; there is no need to set anything up beforehand. By default, the workload runs for 60 seconds using four threads and one CPU core. It automatically creates a MongoDB database named airlines with a collection called flights_1, populates it with documents similar to the sample shown below, and creates the necessary indexes. Note: Sharding is not enabled by default.
[direct: mongos] airlines> db.flights_1.findOne()
{
  _id: ObjectId('6806aafca42639c2279450a4'),
  flight_id: 6766976,
  flight_name: 'Flight_kopgt',
  departure: 'Crawfordside',
  arrival: 'Lake Tony',
  gate: 'K10',
  timestamp: ISODate('2025-04-21T16:30:52.512Z'),
  duration_minutes: 523,
  seats_available: 80,
  passengers: [
    { passenger_id: 1, name: 'Kenneth Gallagher', seat_number: '26F' },
    { passenger_id: 2, name: 'Lisa Moran', seat_number: '23B' },
    { passenger_id: 3, name: 'Mark Crosby', seat_number: '1E' },
    { passenger_id: 4, name: 'Mark Roy', seat_number: '22E' },
    { passenger_id: 5, name: 'Mark Nichols', seat_number: '12E' },
    { passenger_id: 6, name: 'Melissa Rivas', seat_number: '27A' },
    { passenger_id: 7, name: 'Andrew Bishop', seat_number: '17A' },
    { passenger_id: 8, name: 'Susan Williams', seat_number: '7F' },
    { passenger_id: 9, name: 'Shannon Cameron', seat_number: '13B' },
    { passenger_id: 10, name: 'Jean Rodriguez', seat_number: '12D' }
  ],
  equipment: {
    plane_type: 'Embraer E190',
    total_seats: 90,
    amenities: [ 'WiFi', 'TV', 'Power outlets' ]
  },
  flight_code: 'FLT-148'
}
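If you want to confirm what the default run created, a couple of standard mongosh one-liners are enough to inspect the generated data. These are not part of the tool itself, and you should replace <mongos-host> with your own router address:

# Count the documents the default workload left in airlines.flights_1
mongosh "mongodb://<mongos-host>:27017" --quiet --eval "db.getSiblingDB('airlines').flights_1.countDocuments()"

# Print one generated document, similar to the sample above
mongosh "mongodb://<mongos-host>:27017" --quiet --eval "printjson(db.getSiblingDB('airlines').flights_1.findOne())"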
Keep in mind, however, that databases behave differently when they are completely empty than when they already contain data. Performance and behavior observed during an initial workload against an empty database may not accurately reflect real-world usage until some initial data has been loaded.
As explained in our documentation, MongoDB Workload Generator supports various configuration options, including the ability to adjust the ratio of each CRUD operation. This comes in handy if you want to treat the initial run as a "loading" phase. For example, executing the first workload with the --insert_ratio 100 flag makes the tool perform only insert operations, effectively serving as the loading stage that populates the collections with initial data; you can then proceed with the default ratios explained below.
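As a rough sketch (the --insert_ratio and --runtime flags are the ones referenced in this post; the durations are only illustrative), a loading phase followed by a regular run could look like this:

# Loading phase: 100% inserts to seed the collections before benchmarking
./mongodbWorkload.py --insert_ratio 100 --runtime 300s

# Benchmark run with the default operation mix against the now-populated collections
./mongodbWorkload.py --runtime 60s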
To simulate a realistic application workload, the tool executes a predefined mix of database operations. This distribution is designed to reflect common usage patterns observed in many production environments. By default, 60% of the queries are FIND operations, representing the read-heavy access patterns typical of real-world applications where users frequently retrieve data. UPDATE operations make up 20% of the workload, simulating scenarios where existing records are modified, such as status changes, data corrections, or user interactions. INSERT operations account for 10% of the queries, reflecting the creation of new records (new flights, in this case). Finally, 10% of the operations are DELETE queries, representing data cleanup or removal tasks that are less frequent but still necessary in most systems.
This default distribution provides a balanced and meaningful baseline for performance testing and stress simulations. However, users are encouraged to adjust these ratios to better align with their specific use cases and workload characteristics.
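For example, a write-heavier profile could be approximated by shifting the ratios. Keep in mind that only --insert_ratio appears in this post; the other flag names below are assumptions that simply mirror its naming pattern, so check the documentation or the tool's --help output for the exact options:

# Hypothetical write-heavy mix: only --insert_ratio is confirmed above;
# --update_ratio and --delete_ratio are assumed to follow the same naming convention
./mongodbWorkload.py --insert_ratio 30 --update_ratio 40 --delete_ratio 10 --runtime 60s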
During execution, the tool generates a real-time report every five seconds, showing the average number of queries executed across all utilized CPU cores and a detailed breakdown by operation type. A final summary report is produced at the end of the benchmark, providing statistics on overall workload performance and collection activity.
MongoDB Benchmarking
The MongoDB cluster used for this benchmark has been configured as shown below:
- MongoDB version: Percona Server for MongoDB (6.0.20-17)
- Shards: 2
- Nodes per shard: 3 (1 arbiter in each shard)
- Config nodes: 3
- Mongos routers: 2
Each node in the cluster is an EC2 t2.medium instance (2 vCPUs and 4 GB of memory).
Now that you’re familiar with the environment and MongoDB Workload Generator, you can begin by running a sample workload to observe its behavior in action.
To enhance visibility into the workload's performance and impact, we recommend using Percona Monitoring and Management (PMM). PMM provides powerful visualizations and metrics that let you monitor how different configuration parameters affect your MongoDB environment in real time. This is especially useful for identifying performance bottlenecks, understanding query behavior, and validating changes to both your workload configuration and the underlying system (OS tuning, hardware, etc.).
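If the cluster nodes are not already being monitored, registering each mongod or mongos instance with an existing PMM server is typically a single command per instance. The example below is only a sketch: it assumes the PMM client is already installed and connected to a PMM server, the credentials, service name, and port are placeholders, and flag details can vary between PMM versions (check pmm-admin add mongodb --help):

# Register a MongoDB instance with PMM (placeholder credentials and service name;
# assumes pmm-agent is already registered against a PMM server)
pmm-admin add mongodb --username=pmm_mongodb --password=<password> --service-name=shard1-node1 --host=127.0.0.1 --port=27018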
Workload #1: Default settings
Our first workload will be executed without specifying any parameters. In this case, MongoDB Workload Generator will run with its default settings, offering a simple way to get started and observe its output and behavior. The default configuration is not optimized, as it purposely includes a randomized mix of optimized and ineffective queries, which may result in variable execution times and throughput.
Let’s see the output:
./mongodbWorkload.py
2025-04-23 14:05:35 - INFO - Configuring Workload
2025-04-23 14:05:35 - INFO - Collection flights_1 created
2025-04-23 14:05:36 - INFO - Indexes created
2025-04-23 14:05:36 - INFO - Duration: 60 seconds
CPUs: 1
Threads: (Per CPU: 4 | Total: 4)
Collections: 1
Configure Sharding: False
Insert batch size: 10
Optimized workload: False
Workload ratio: SELECTS: 60% | INSERTS: 10% | UPDATES: 20% | DELETES: 10%
Report frequency: 5 seconds
Report logfile: None
===============================================================================
Workload Started
===============================================================================
2025-04-23 14:05:42 - INFO - AVG Operations last 5s (1 CPUs): 36.80 (SELECTS: 22.00, INSERTS: 4.00, UPDATES: 6.00, DELETES: 4.80)
2025-04-23 14:05:47 - INFO - AVG Operations last 5s (1 CPUs): 39.40 (SELECTS: 25.20, INSERTS: 4.20, UPDATES: 7.20, DELETES: 2.80)
2025-04-23 14:05:52 - INFO - AVG Operations last 5s (1 CPUs): 40.20 (SELECTS: 25.80, INSERTS: 4.40, UPDATES: 6.40, DELETES: 3.60)
2025-04-23 14:05:57 - INFO - AVG Operations last 5s (1 CPUs): 38.80 (SELECTS: 24.80, INSERTS: 4.00, UPDATES: 5.80, DELETES: 4.20)
.... removed for brevity .....
===============================================================================
Workload Finished
===============================================================================
2025-04-23 14:06:43 - INFO -
===============================================================================
| Collection Stats |
===============================================================================
| Name       | Sharded | Size    | Documents |
===============================================================================
| flights_1  | False   | 5.74 MB | 2139      |
===============================================================================
2025-04-23 14:06:43 - INFO -
===============================================================================
Workload Stats (All CPUs Combined)
===============================================================================
Workload Runtime: 1.13 minutes
CPUs Used: 1
Total Operations: 2363 (SELECT: 1426, INSERT: 236, UPDATE: 459, DELETE: 242)
AVG QPS: 34.95 (SELECTS: 21.09, INSERTS: 3.49, UPDATES: 6.79, DELETES: 3.58)
Documents Inserted: 2360, Matching Documents Selected: 828, Documents Updated: 421, Documents Deleted: 221
===============================================================================
Additional workloads: Custom settings
The subsequent workloads showcase the tool's ability to generate substantial load and its adaptability through varied configurations: higher CPU and thread counts, sharding enabled, and three collections instead of one. I will increase the workload incrementally and observe the effects; for conciseness, the tool's output (illustrated above) is omitted. The first series focuses on progressively scaling CPU utilization:
for x in 2 4 6 8 10 12 ; do ./mongodbWorkload.py --cpu ${x} --threads 20 --shard --collections 3 --optimize --runtime 30s; sleep 30; done
In the next series, I increased the thread count along with the number of CPUs:
for x in 2 4 6 8 10 12 ; do ./mongodbWorkload.py --cpu ${x} --threads $((10*$x)) --shard --collections 3 --optimize --runtime 30s; sleep 30; done
Observing workload variations
After running a range of distinct workloads, the differences between them become clear, demonstrating how the MongoDB Workload Generator can be used to test a variety of real-world scenarios effectively, tailored to your particular needs.
As you review the PMM graphs below, look closely for patterns in query throughput, CPU utilization, operation counts, and I/O activity. Can you pinpoint when each workload was executed and how its characteristics influenced the system? These visual differences not only demonstrate the tool's flexibility but also underscore the importance of precise, metrics-driven benchmarking, enabling more informed and effective decision-making.
The initial graphs depict the baseline workload using default settings. As illustrated below, the operation count increased when the first workload was started; however, as anticipated, this was a small-scale workload resulting in minimal system impact and consequently low CPU utilization:
The subsequent graphs represent the system’s response to the additional diverse workloads executed. They clearly delineate the impact of each incremental configuration adjustment, providing insights into the performance characteristics under varying stress levels:
The final graph shows the even workload distribution between the available mongos routers:
MongoDB Benchmarking made easy
MongoDB Workload Generator is more than just a testing utility; it’s a practical benchmarking framework designed specifically for MongoDB environments. Whether you’re optimizing performance, stress testing under load, or comparing configuration impacts, this tool provides a controlled and repeatable way to simulate realistic database activity.
Combined with Percona Monitoring and Management (PMM), it becomes a powerful solution for gaining insights into system behavior, identifying inefficiencies, and making informed optimization decisions.