05 Chapter Performance MongoDB
PERFORMANCE MONGODB
Table of Contents
˗ Memory size
˗ Schema Design
˗ Indexes
˗ CRUD Optimization
˗ Performance on Clusters
11/22/2023
Memory size
Von Neumann Architecture
RAM/Memory
˗ Memory is a quintessential resource.
˗ The availability of RAM and the fall in its production costs contributed to the development of database architectures.
˗ RAM is 25 times faster than common SSDs, which makes the shift from disk-oriented to RAM-oriented designs a strong incentive to build databases around memory usage.
˗ MongoDB has storage engines that either depend heavily on RAM or run entirely in memory for their data management operations.
˗ Ensure your working set fits in RAM. Memory is used by:
  Aggregation
  Index Traversing
  Write Operations
  Query Engine
  Connections
˗ The need to size the working set properly holds true whether you run MongoDB on Atlas or manage MongoDB yourself.
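As a rough illustration of "the working set fits in RAM", the sketch below compares an estimated working set (indexes plus frequently accessed data) with the default WiredTiger internal cache size, which is the larger of 50% of (RAM - 1 GiB) and 256 MiB. The function names and the example sizes are hypothetical, and the check is a simplification: the filesystem cache, connections, and in-memory sorts also consume RAM.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def wiredtiger_default_cache_bytes(total_ram_bytes):
    """Default WiredTiger internal cache: max(50% of (RAM - 1 GiB), 256 MiB)."""
    return max((total_ram_bytes - GIB) // 2, 256 * MIB)


def working_set_fits(total_ram_bytes, index_bytes, hot_data_bytes):
    """Hypothetical helper: True if indexes + hot data fit in the default cache."""
    working_set = index_bytes + hot_data_bytes
    return working_set <= wiredtiger_default_cache_bytes(total_ram_bytes)


# Example with made-up sizes: a 16 GiB server, 2 GiB of indexes, 4 GiB of hot data.
fits = working_set_fits(16 * GIB, 2 * GIB, 4 * GIB)
print("working set fits in cache:", fits)
```

If the check fails, the usual options are adding RAM, trimming unused indexes, or reducing the amount of data touched by frequent queries.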
Schema Design
Production
New Requirement
Strategy of Modelling
Data Model
Design Pattern
Indexes
˗ In the database world, indexes play a vital role in performance, and MongoDB is no exception.
Indexing strategies
Key Considerations
˗ Building an index in the foreground takes an exclusive lock, which blocks other operations on the collection until the build finishes.
˗ Building an index in the background avoids the locking bottleneck but produces an index that is less efficient to traverse.
˗ Encourage developers to write covered queries. A covered query is satisfied entirely by an index, so zero documents need to be examined, which makes the query run much faster. Every field in the query filter and in the projection must be part of the index.
˗ Use an index to sort the results and avoid a blocking sort.
˗ Remove duplicate and unused indexes; doing so also improves disk throughput and memory usage.
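The covered-query rule above can be sketched as a simple check. This is a simplified model, not MongoDB's query planner: the function name is hypothetical, only inclusion projections are handled, and multikey indexes and embedded documents are ignored. It also accounts for `_id`, which is returned by default unless the projection excludes it.

```python
def is_covered(index_keys, filter_fields, projection):
    """Hypothetical helper: can this query be answered from the index alone?

    index_keys:    field names in the index, e.g. ["status", "name"]
    filter_fields: field names used in the query filter
    projection:    dict in find() style, e.g. {"_id": 0, "name": 1}
    """
    included = {field for field, flag in projection.items() if flag and field != "_id"}
    if not included:
        # No inclusion projection: whole documents are returned, so not covered.
        return False
    if projection.get("_id", 1):
        # _id is returned unless explicitly excluded, so it must be indexed too.
        included.add("_id")
    needed = set(filter_fields) | included
    return needed <= set(index_keys)


# Index on {status: 1, name: 1}; query filters on status and returns only name.
print(is_covered(["status", "name"], ["status"], {"_id": 0, "name": 1}))
# Same query without excluding _id: the documents must be fetched.
print(is_covered(["status", "name"], ["status"], {"name": 1}))
```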
Performance Considerations in Distributed Systems
Sharded Cluster
Replica Cluster
Replication
˗ Maintain multiple copies of your data
˗ Provides redundancy and increases data availability
With multiple copies of data on different database servers, it provides a
level of fault tolerance against the loss of a single database server.
˗ In some cases, it can provide increased read capacity as
clients can send read operations to different servers.
Maintaining copies of data in different data centers can increase data
locality and availability for distributed applications.
You can also maintain additional copies for dedicated purposes, such
as disaster recovery, reporting, or backup.
Replica Set
˗ A replica set is a group of mongod instances that maintain the same data set.
˗ It contains several data-bearing nodes and optionally one arbiter node. Of the data-bearing nodes, one and only one member is deemed the primary node, while the other nodes are deemed secondary nodes.
˗ Although clients cannot write data to secondaries, they can read data from secondary members.
˗ Automatic Failover
The replica set failover mechanism is based on voting. When the primary becomes unavailable, an eligible secondary is elected as the new primary of the replica set.
An election succeeds only with a majority of votes, so a replica set should have an odd number of voting members.
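The arithmetic behind the odd-number recommendation can be sketched as follows. The function names are hypothetical; the point is that an election needs a strict majority of voting members, so adding a fourth member to a three-member set raises the required majority without raising the number of failures the set can survive.

```python
def majority(voting_members):
    """Votes needed to elect a primary: a strict majority of voting members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """How many voting members can be lost while an election can still succeed."""
    return voting_members - majority(voting_members)


for n in (3, 4, 5):
    print(n, "members: majority", majority(n), "fault tolerance", fault_tolerance(n))
```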
Sharded Cluster
Sharding
˗ Sharding is a method for distributing data across multiple
machines. MongoDB uses sharding to support deployments
with very large data sets and high throughput operations.
˗ System growth can be addressed by vertical scaling (a more powerful single server) or horizontal scaling (more servers sharing the data and the load).
˗ MongoDB supports horizontal scaling through sharding.
Latency
Scatter Gather
˗ Send the query to every shard in the cluster and gather the matching results.
Routed Queries
˗ Send the query only to the shards that contain the data relevant to the client query, which is possible when the query includes the shard key.
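The difference between the two query types can be illustrated with a toy router. This is a simplified model, not how mongos is implemented: the shard names, the `user_id` shard key, and the chunk boundaries are all hypothetical, and only equality matches on the shard key are handled.

```python
from bisect import bisect_right

# Hypothetical range-based chunks on the shard key "user_id":
#   shard0: user_id < 100, shard1: 100 <= user_id < 200, shard2: user_id >= 200
BOUNDARIES = [100, 200]
SHARDS = ["shard0", "shard1", "shard2"]


def target_shards(query, shard_key="user_id"):
    """Return the shards a query must be sent to in this toy model."""
    if shard_key in query:
        # Routed query: the shard key value pinpoints a single shard.
        return [SHARDS[bisect_right(BOUNDARIES, query[shard_key])]]
    # Scatter-gather: without the shard key, every shard must be asked.
    return list(SHARDS)


print(target_shards({"user_id": 150}))   # routed to one shard
print(target_shards({"name": "alice"}))  # sent to all shards
```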
Sorting
˗ Sorting in a Sharded Cluster involves a few hurdles
Sorting Merge
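One way to picture the merge step is the sketch below: each shard returns its own results already sorted, and the merging node combines those sorted streams instead of re-sorting everything. The function name and sample documents are hypothetical; `heapq.merge` from the Python standard library stands in for the merge performed in the cluster.

```python
import heapq
from itertools import islice


def merge_sorted_shard_results(shard_results, key, limit=None):
    """Merge per-shard result lists that are each already sorted by `key`."""
    merged = heapq.merge(*shard_results, key=key)
    # islice with limit=None yields everything; otherwise stop after `limit` docs.
    return list(islice(merged, limit))


# Made-up, locally sorted results from two shards.
shard_a = [{"score": 1}, {"score": 4}]
shard_b = [{"score": 2}, {"score": 3}]

top3 = merge_sorted_shard_results([shard_a, shard_b], key=lambda d: d["score"], limit=3)
print(top3)
```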
Recap
˗ Considerations before sharding
˗ Latency
˗ Scatter-gather and routed queries
˗ Sorting, limit & skip
Read Preference
˗ By default, clients read from the primary; however, clients can
specify a read preference to send read operations to
secondaries.
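A read preference can be set through the `readPreference` option of the MongoDB connection string. The sketch below only builds such a string; the host names and the replica set name `rs0` are hypothetical, and the helper function is an illustration rather than part of any driver.

```python
from urllib.parse import urlencode

# The five read preference modes defined by MongoDB.
VALID_MODES = {"primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest"}


def build_uri(hosts, replica_set, read_preference="primary"):
    """Hypothetical helper: build a connection string with a read preference."""
    if read_preference not in VALID_MODES:
        raise ValueError("unknown read preference mode: " + read_preference)
    options = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return "mongodb://" + ",".join(hosts) + "/?" + options


uri = build_uri(
    ["db1.example.net:27017", "db2.example.net:27017"],  # made-up hosts
    "rs0",
    "secondaryPreferred",
)
print(uri)
```

Reads from secondaries may return data that lags behind the primary, so this setting suits workloads that tolerate slightly stale results.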
Analytics Queries
Recap
˗ Read preferences associated with performance
˗ When it’s a good idea
Analytics queries
Local reads
˗ When it’s a bad idea
Disclaimer
˗ Specific analytics secondary nodes
˗ Reporting on delayed consistency data
˗ Text Search
Replica Set
Aggregation Pipeline on a Sharded Cluster
˗ How it works
˗ Where operations are completed
˗ Optimization
˗ $out
˗ $facet
˗ $lookup
˗ $graphLookup
Aggregation Optimizations
Recap
˗ How it works
˗ Where operations are completed
˗ Optimizations