
PERFORMANCE MONGODB

Nguyen Thi Hanh

Table of Contents
˗ Memory size
˗ Schema Design
˗ Indexes
˗ CRUD Optimization
˗ Performance on Clusters


Memory size

Memory size

˗ MongoDB is a high-performance database.
˗ But to operate correctly while supporting your applications, it requires
adequate hardware provisioning.


Memory size
Von Neumann Architecture

Memory size
RAM/Memory
˗ Memory is a quintessential resource.
˗ The increasing availability of RAM and the fall in its production costs
have contributed to the evolution of database architectures.


Memory size
RAM/Memory
˗ RAM is roughly 25 times faster than common SSDs, which makes the
shift from disk-oriented to RAM-oriented designs a strong, appealing
factor for databases to be built around memory usage.
˗ MongoDB has storage engines that are either heavily dependent on
RAM or that run entirely in memory for their data management
operations.

Memory size
RAM/Memory
˗ Ensure your working set fits in RAM:
 Aggregation
 Index Traversing
 Write Operations
 Query Engine
 Connections
˗ Properly sizing the working set holds true whether you run
MongoDB on Atlas or manage MongoDB yourself.
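As a rough, illustrative check (not an exact sizing formula), you can compare the configured WiredTiger cache with the data and index sizes reported by the server in mongosh:

// Compare the configured WiredTiger cache with index and data sizes (values in bytes)
const status = db.serverStatus()
const cacheBytes = status.wiredTiger.cache["maximum bytes configured"]
const stats = db.stats()
print("WiredTiger cache (MB):", cacheBytes / 1024 / 1024)
print("Index size (MB):", stats.indexSize / 1024 / 1024)
print("Data size (MB):", stats.dataSize / 1024 / 1024)
// If the indexes plus the frequently accessed documents do not fit in the cache,
// expect page evictions and extra disk I/O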


Schema Design

Modeling Approach MongoDB
(Diagram: Develop Application and Queries → Define Data Model → Production → New Requirement, and back to Develop Application and Queries)


Strategy of Modelling

Data Model


Data Model Type

Choose Embedded VS Reference


Design Pattern

Key Consideration (Recap too)


˗ Understand your application’s query patterns, design your data
model, and select the appropriate indexes.
˗ MongoDB having a flexible schema does not mean you can ignore
schema design.
˗ Prioritize embedding, unless there is an unavoidable reason not to.
˗ Don’t be afraid of application-level joins: if the index is built
correctly and the returned results are limited by projection,
application-level joins will not be much more expensive than joins
in relational databases.
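For example, an application-level join in mongosh might look like the following sketch (the orders and customers collections and their fields are illustrative):

// Fetch an order, then its customer; the second lookup uses the default _id index
const order = db.orders.findOne({ _id: 123 }, { customerId: 1, total: 1 })
const customer = db.customers.findOne(
  { _id: order.customerId },       // exact-match lookup on the indexed _id
  { name: 1, email: 1 }            // projection limits the returned fields
)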


Key Consideration (Recap too)


˗ Arrays should not grow without bound.
˗ When an array grows unboundedly, index performance on the
array degrades.
˗ Avoid $lookup when it can be avoided.
˗ Avoid a huge number of collections.
˗ Avoid the default _id field where possible: 12 bytes is relatively large
and carries some computational cost.
˗ Optimize key names: every document stores its own schema, so each
document stores its key names, which consumes extra space.

Indexes


Indexes
˗ In the database world, indexes play a vital role in performance, and
MongoDB is no exception.

Indexing strategies

˗ Use the ESR (Equality, Sort, Range) Rule
˗ Create Indexes to Support Your Queries
˗ Use Indexes to Sort Query
˗ Ensure Indexes Fit in RAM
˗ Create Queries that Ensure Selectivity


Follow ESR Rule in Compound Indexes


Equality
 "Equality" refers to an exact match on a single value.
 Example: db.cars.find( { model: "Cordoba" } )
db.cars.find( { model: { $eq: "Cordoba" } } )

 Place fields that require exact matches first in your index.


 An index may have multiple keys for queries with exact matches. The
index keys for equality matches can appear in any order. However, to
satisfy an equality match with the index, all of the index keys for exact
matches must come before any other index fields.
 Exact matches should be selective. To reduce the number of index
keys scanned, ensure equality tests eliminate at least 90% of possible
document matches.

Follow ESR Rule in Compound Indexes


Sort
 "Sort" determines the order for results. Sort follows equality matches
because the equality matches reduce the number of documents that
need to be sorted.
 An index can support sort operations when the query fields are a
subset of the index keys. Sort operations on a subset of the index keys
are only supported if the query includes equality conditions for all of the
prefix keys that precede the sort keys.
 Example: the following queries the cars collection; the output is sorted by model:
db.cars.find( { manufacturer: "GM" } ).sort( { model: 1 } )

 To improve query performance, create an index on the manufacturer
and model fields:
db.cars.createIndex( { manufacturer: 1, model: 1 } )


Follow ESR Rule in Compound Indexes


Sort - Blocking sort
 A blocking sort indicates that MongoDB must consume and process all
input documents to the sort before returning results. Blocking sorts do
not block concurrent operations on the collection or database.
 If MongoDB cannot use an index or indexes to obtain the sort order,
MongoDB must perform a blocking sort operation on the data.
 With allowDiskUse(), MongoDB can use temporary files on disk to store
data exceeding the 100 megabyte system memory limit while processing
a blocking sort operation.
 Sort operations that use an index often have better performance than
blocking sorts.
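One way to check whether a sort uses an index is to inspect the query plan: a SORT stage in the winning plan indicates an in-memory (blocking) sort. A sketch, using the cars collection from the previous slide:

// Without a supporting index the winning plan contains a SORT stage (blocking sort)
db.cars.find( { manufacturer: "GM" } ).sort( { model: 1 } ).explain("executionStats")
// After db.cars.createIndex( { manufacturer: 1, model: 1 } ) the plan uses an
// IXSCAN and the SORT stage disappears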

Follow ESR Rule in Compound Indexes


Range
 "Range" filters scan fields. The scan doesn't require an exact match,
which means range filters are loosely bound to index keys. To improve
query efficiency, make the range bounds as tight as possible and use
equality matches to limit the number of documents that must be
scanned.
 Range filters resemble the following:
db.cars.find( { price: { $gte: 15000} } )
db.cars.find( { age: { $lt: 10 } } )
db.cars.find( { priorAccidents: { $ne: null } } )


Follow ESR Rule in Compound Indexes


˗ For compound indexes, this rule of thumb is helpful in deciding
the order of fields in the index:
 First, add those fields against which Equality queries are run.
 The next fields to be indexed should reflect the Sort order of the query.
 The last fields represent the Range of data to be accessed.

˗ If we put the equality keys first, we limit the amount of data we have
to look at.
˗ Avoid blocking/in-memory sorting.
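As an illustration of ESR ordering (the price field is added here only for the example), a query with an equality match, a sort, and a range filter is best supported by an index whose keys follow the same order:

// Query: Equality on manufacturer, Sort on model, Range on price
db.cars.find( { manufacturer: "GM", price: { $gte: 15000 } } ).sort( { model: 1 } )
// ESR-ordered compound index: Equality first, then Sort, then Range
db.cars.createIndex( { manufacturer: 1, model: 1, price: 1 } )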

Follow ESR Rule in Compound Indexes


Follow ESR Rule in Compound Indexes

B-Tree & Prefix Compression: Query Performance & Disk Usage
˗ In B-Tree indexes, low-cardinality values can actually harm
performance.
˗ For low-cardinality values, prefer a partial index.


B-Tree & Prefix Compression: Query Performance & Disk Usage
Cardinality
 Cardinality is the number of unique elements present in a set.
The lower the cardinality, the more duplicated elements there are.
 So if a set has 5 elements made of Boolean values, then the cardinality
of the set is going to be two. So, all sets made of Booleans will have a
max cardinality of two and a min cardinality of one.

B-Tree & Prefix Compression: Query Performance & Disk Usage
How cardinality impacts indexing
 If a Boolean field is indexed, there is not much the index will improve in
terms of performance.
• With Booleans and a 50/50 split, the index lets you skip 50% of the documents,
but the remaining 50% is still a sequential scan.
• If there is an 80/20 split between true and false, then the index is pretty much
useless when querying for the true part, because you still have to do a sequential
scan of 80% of the documents (but queries looking for false will benefit from
the index).
• This applies to any field with low cardinality: if a field is an enum of five values
with thousands of documents in each category, a similar effect can be observed.
 Indexes must be built carefully in conditions like these. One more side
effect of having an index on such fields is that it impacts writes as well.


B-Tree & Prefix Compression: Query Performance & Disk Usage
Partial Index
 Partial indexes only index the documents in a collection that meet a
specified filter expression. By indexing a subset of the documents in a
collection, partial indexes have lower storage requirements and
reduced performance costs for index creation and maintenance.
 For example, the following operation creates a compound index that
indexes only the documents with a rating field greater than 5.
db.restaurants.createIndex(
{ cuisine: 1, name: 1 },
{ partialFilterExpression: { rating: { $gt: 5 } } }
)
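Note that a query can only use this partial index when its filter implies the partialFilterExpression; for example:

// Can use the partial index: rating >= 8 is a subset of rating > 5
db.restaurants.find( { cuisine: "Italian", rating: { $gte: 8 } } )
// Cannot use the partial index: this query may match documents with rating <= 5
db.restaurants.find( { cuisine: "Italian" } )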

Use Covered Queries When Possible


Covered query
 A covered query is a query that can be satisfied entirely using an index
and does not have to examine any documents. An index covers a
query when all of the following apply:
• all the fields in the query are part of an index, and
• all the fields returned in the results are in the same index.
• no fields in the query are equal to null (i.e. {"field" : null} or {"field" : {$eq : null}} ).
 For example, a collection inventory has the following index on the type
and item fields: db.inventory.createIndex( { type: 1, item: 1 } )
 This index will cover the following operation which queries on the type
and item fields and returns only the item field:
db.inventory.find( { type: "food", item:/^c/ },{ item: 1, _id: 0 })


Use Covered Queries When Possible


Covered query
 For the specified index to cover the query, the projection document
must explicitly specify _id: 0 to exclude the _id field from the result
since the index does not include the _id field.
 For example, consider a collection userdata with documents of the
following form:
{ _id: 1, user: { login: "tester" } }
 The collection has the following index:
{ "user.login": 1 }
 The { "user.login": 1 } index will cover the query below:
db.userdata.find( { "user.login": "tester" }, { "user.login": 1, _id: 0 } )
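Whether a query is actually covered can be verified with explain: a covered query reports totalDocsExamined: 0 in its executionStats. A sketch:

// A covered query reads only index keys, so executionStats.totalDocsExamined is 0
db.userdata.find(
  { "user.login": "tester" },
  { "user.login": 1, _id: 0 }
).explain("executionStats")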

Key Consideration
˗ Index creation in the foreground takes a collection-level lock.
˗ Index creation in the background avoids the locking bottleneck, but
the resulting index can be less efficient to traverse.
˗ Developers are encouraged to write covered queries: queries that can
be satisfied entirely by an index, so zero documents need to be
inspected, which makes the query run a lot faster. All the projected
keys need to be indexed.
˗ Use indexes to sort results and avoid blocking sorts.
˗ Remove duplicate and unused indexes; this also improves disk
throughput and memory usage (see the $indexStats sketch below).
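To find candidates for removal, the $indexStats aggregation stage reports how often each index has been used since the server started; a sketch on a hypothetical orders collection:

// Indexes whose accesses.ops stays at 0 over a long uptime are likely unused
db.orders.aggregate([
  { $indexStats: {} },
  { $project: { name: 1, "accesses.ops": 1, "accesses.since": 1 } }
])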


Performance Considerations in Distributed Systems

What is a Distributed System in MongoDB?


For a high-availability solution:
 Replica Cluster
 Sharded Cluster


Replica Cluster
Replication

Replica Cluster
Replication
˗ Maintain multiple copies of your data
˗ Provides redundancy and increases data availability
 With multiple copies of data on different database servers, it provides a
level of fault tolerance against the loss of a single database server.
˗ In some cases, it can provide increased read capacity as
clients can send read operations to different servers.
 Maintaining copies of data in different data centers can increase data
locality and availability for distributed applications.
 You can also maintain additional copies for dedicated purposes, such
as disaster recovery, reporting, or backup.


Replica Cluster
Replica Set
- is a group of mongod instances that maintain the same data set.
- contains several data-bearing nodes and optionally one arbiter node.
Of the data-bearing nodes, one and only one member is deemed the
primary node, while the other nodes are deemed secondary nodes.
- Although clients cannot write data to secondaries, clients can read
data from secondary members.
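A minimal sketch of initiating a three-member replica set in mongosh (host names and the replica set name are illustrative):

// Run once against one of the mongod instances started with --replSet rs0
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017" },
    { _id: 1, host: "mongo2:27017" },
    { _id: 2, host: "mongo3:27017" }
  ]
})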

Replica Cluster
˗ Automatic Failover
 The replica set failover mechanism is based on voting. A secondary
node will be elected as the new primary node of the replica set.
 For elections to reliably reach a majority, a replica set should have
an odd number of voting members.


Sharded Cluster
Sharding
˗ Sharding is a method for distributing data across multiple
machines. MongoDB uses sharding to support deployments
with very large data sets and high throughput operations.
˗ System growth: vertical and horizontal scaling.
 vertical scaling
 horizontal scaling
˗ MongoDB supports horizontal scaling through sharding.
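A minimal sketch of enabling sharding for a collection (the shop database, orders collection, and customerId shard key are illustrative):

// Run against a mongos router
sh.enableSharding("shop")
sh.shardCollection("shop.orders", { customerId: "hashed" })  // hashed shard key for even distribution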

Sharded Cluster
˗ Sharded Cluster


Considerations Before Sharding


˗ Sharding is a horizontal scaling solution
˗ Have we reached the limits of our vertical scaling?
˗ You need to understand how your data grows and how your
data is accessed
˗ MongoDB uses the shard key to distribute the collection's
documents across shards.
˗ The shard key consists of a field or multiple fields in the
documents.
˗ It’s important to get a good shard key

Working with Distributed Systems


˗ Consider latency
˗ Data is spread across different nodes
˗ Read implications
˗ Write implications


Latency

Read in Distributed Systems


˗ Two types of reads:
 Scatter Gather
 Routed Queries
˗ When:
 If we are not using the shard key, we will be performing scatter-gather
queries.
 If we are using the shard key, we will be performing routed queries.
˗ Routed queries and scatter-gather queries have two different
performance profiles.


Scatter Gather
˗ Ping all nodes of our shard cluster for the information
corresponding to a given query

Routed Queries
˗ Pinpoint exactly which shards contain the information relevant
for our client query.
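Assuming the orders collection from the earlier sketch, sharded on customerId, the two read types look like this:

// Routed query: the filter contains the shard key, so mongos targets only the owning shard(s)
db.orders.find( { customerId: 42, status: "open" } )
// Scatter-gather query: no shard key in the filter, so mongos must ask every shard
db.orders.find( { status: "open" } )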


Sorting
˗ Sorting in a Sharded Cluster involves a few hurdles

Sorting Merge


Limit and Skip

Limit and Skip Merge


Recap
˗ Consideration before sharding
˗ Latency
˗ Scatter-gather and routed queries
˗ Sorting, limit & skip

Reading from Secondaries


Read preference.
˗ By default, clients read from the primary; however, clients can
specify a read preference to send read operations to
secondaries.
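For example, a read preference can be set per operation in mongosh or in the driver connection string (a sketch, not a recommendation to read from secondaries by default):

// Per-query read preference in mongosh
db.orders.find( { status: "open" } ).readPref("secondaryPreferred")
// Or in the connection string used by the application driver:
// mongodb://host1,host2,host3/shop?replicaSet=rs0&readPreference=secondaryPreferred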

Reading from Secondaries

˗ When Reading from a Secondary is a Good Idea
 Analytics queries
 Local reads


Analytics Queries

Reading from Secondaries

˗ When Reading from a Secondary is a Bad Idea


 Providing extra capacity for reads

Providing extra capacity for reads


Recaps
˗ Read preferences associated with performance
˗ When it’s a good idea
 Analytics queries
 Local reads
˗ When it’s a bad idea

Replica Set Nodes with Differing Indexes


Disclaimer
˗ Specific analytics secondary nodes
˗ Reporting on delayed consistency data
˗ Text Search

Secondary Node Considerations

˗ Prevent such a secondary from becoming primary:
 Priority = 0
 Hidden Node
 Delayed Secondary
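A sketch of reconfiguring a replica set so that one member is a hidden, priority-0 node dedicated to analytics (the member index is illustrative; for a delayed secondary, secondaryDelaySecs applies in MongoDB 5.0+):

// Make the third member non-electable and invisible to normal client reads
cfg = rs.conf()
cfg.members[2].priority = 0
cfg.members[2].hidden = true
// For a delayed secondary, additionally: cfg.members[2].secondaryDelaySecs = 3600
rs.reconfig(cfg)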


Replica Set


Aggregation Pipeline on a
Sharded Cluster

˗ How it works
˗ Where operations are completed
˗ Optimization



˗ $out
˗ $facet
˗ $lookup
˗ $graphLookup
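As an illustration (collection names are hypothetical), a pipeline using one of these stages on a sharded cluster cannot be fully parallelized: early stages can run on each shard, but the part of the pipeline containing such a stage is merged and run on a single node.

db.orders.aggregate([
  { $match: { status: "open" } },          // can be pushed down and run on each shard
  { $lookup: {                             // forces the remainder of the pipeline to run on one node
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
  } }
])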


Aggregation Optimizations

Aggregation Optimizations


Recaps
˗ How it works
˗ Where operations are completed
˗ Optimizations
