TopDev - High Performance and Scalability Database Design - V2.1

This document provides an overview of approaches for designing databases for high performance and scalability. It discusses picking the right tools based on demand factors like growth rate and data freshness needs. Key approaches covered include caching data for high read loads, separating operational and reporting databases, and testing systems to benchmark and improve performance. The document also provides a case study on scaling an e-commerce system through techniques like database partitioning, replication, and optimizing queries.


High Performance and Scalability

Database Design

Nguyễn Sơn Tùng
Former Head of Technology @ Tiki.vn
Former Technical Manager @ Clip.vn
facebook.com/tungns

TopDev Event
22/08/2017
Ho Chi Minh City
Agenda
I. PART 1 – Overview and Approaches
II. PART 2 – Performance and Scaling
III. PART 3 – Case Study

Extra Materials (https://goo.gl/nF95Nf)
○ Design documents
○ Samples

Scopes
What it WILL include:
○ Scalability
○ Performance
○ Approaches, Best Practices

What it will NOT include:
○ Maintenance
○ Fault Tolerance


PART 1
# Overview and Approaches
Why does the Database matter?

Data-Intensive Applications face:

○ Complexity
○ Quantity growth
○ Business-changing velocity
Data Intensive - Complexity

A DB Diagram of Magento 2.1.3


Data Intensive – Quantity & Changes

Code changes
Data size growth of one of our applications, year-over-year

If we don't prepare well:
> Service Unavailable > Traffic Drop

If we don't prepare well:
Panic!!!

“What doesn’t kill you makes you stronger”


Yes! It’s all serious
… when it comes to $$$
How should we deal with this?
What affects DB Performance?

Quite a lot!!!
Approaches
#1 Pick the right tools for your demand

o No one-size-fits-all

o Some pitfalls:

• Too complex: takes intensive time to build, maintain, or change

• Too sloppy: too many mistakes, not ready to scale


Approaches
#1 Pick the right tools for your demand

o Answer some questions:

• How fast will it grow? (gradually vs exponentially)

• What does the traffic pattern look like?

Example: Flash sales

Invest early in high performance
and high scalability
Approaches
#1 Pick the right tools for your demand

o Answer some questions:

• How fresh (real-time) must each piece of data be?

0s ——— 5 min ——— 24h

(anything that tolerates some staleness is cacheable)
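The freshness spectrum above maps naturally onto per-key cache TTLs. Here is a minimal sketch in Python (the `TTLCache` class and the key strings are illustrative, not from the slides): real-time data gets no TTL and always falls through to the database, while 5-minute and 24-hour data may be served stale.

```python
import time

class TTLCache:
    """Minimal per-key TTL cache mirroring the 0s / 5 min / 24h spectrum."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        # ttl_seconds == 0 means "must be real-time": never cache it.
        if ttl_seconds > 0:
            self._store[key] = (value, time.time() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None  # miss: the caller falls back to the real database
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]  # expired: force a rebuild on this read
            return None
        return value

cache = TTLCache()
cache.set("product:123:detail", {"price": 19.99}, ttl_seconds=300)  # ~5 min fresh
cache.set("homepage:banner", "<div>sale</div>", ttl_seconds=86400)  # daily
cache.set("stock:item:1", 4, ttl_seconds=0)  # real-time: always read the DB
```

In production the dict would be Redis or Memcached, but the decision of "how stale may this key be?" stays exactly the business question this slide asks.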
Approaches
#1 Pick the right tools for your demand

o Answer some questions:

• Is there any locking model?

Transactional DB
Read locking

Discount 30%:
Virtual item → unlimited purchases
Physical item → 1 item only

… facing 1000 purchasing attempts!!!
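The flash-sale race above is usually settled by letting the database arbitrate. A sketch using Python's built-in sqlite3 as a stand-in for a transactional DB (table and function names are ours): a conditional UPDATE decrements stock atomically, so 1000 attempts on 1 physical item sell exactly one, with no read-then-write window to oversell through.

```python
import sqlite3

# Hypothetical one-item flash sale: 1000 buyers race for 1 physical item.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item_id INTEGER PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES (1, 1)")  # one physical item
conn.commit()

def try_purchase(conn, item_id):
    # Decrement only while stock remains; rowcount tells us who won.
    cur = conn.execute(
        "UPDATE stock SET qty = qty - 1 WHERE item_id = ? AND qty > 0",
        (item_id,),
    )
    conn.commit()
    return cur.rowcount == 1

results = [try_purchase(conn, 1) for _ in range(1000)]  # 1000 attempts
sold = sum(results)  # exactly 1 succeeds; stock never goes negative
```

A virtual item (unlimited purchases) needs none of this locking, which is exactly why the question "is there a locking model?" shapes the tool choice.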
Approaches
#2. Precision (trust) is more important than scaling (readiness)

… again, $$$
Approaches
#3. Testing yourself

 Simulate large data


Approaches
#3. Testing yourself

 Benchmark

o Server: sysbench

o Database: mysqlslap

o Application: ab, siege, loader.io,…
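Alongside the dedicated tools above, a homemade timing loop gives a first throughput number before reaching for sysbench or mysqlslap. A rough Python sketch (sqlite3 stands in for the server; the `benchmark` helper is our own, not part of any tool named here):

```python
import sqlite3
import time

# Homemade micro-benchmark in the spirit of mysqlslap: run one query N
# times against seeded data and report throughput.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("user%d" % i,) for i in range(10_000)])
conn.commit()

def benchmark(conn, query, params, iterations=1_000):
    start = time.perf_counter()
    for _ in range(iterations):
        conn.execute(query, params).fetchall()
    elapsed = time.perf_counter() - start
    return iterations / elapsed  # queries per second

qps = benchmark(conn, "SELECT name FROM users WHERE id = ?", (42,))
```

This measures only single-connection latency; the listed tools add concurrency, warm-up, and realistic load shapes, which is why they remain the real instruments.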


Approaches
#4. Improving

 Stay monitored: We can’t improve anything without measurement

NewRelic Database Monitor


Sample, EXPLAIN &
Suggestions
Approaches
#4. Improving

 Stay alerted

Daily reports

Top slow queries


Approaches
#4. Improving

 Top-down approach: User Traffic → DB Processing → Reporting

 Rule of thumb: 80/20

o >80% of traffic hits the cache, needs no slow
I/O DB transaction processing, and requires <20%
of the resource cost

o Statistics, aggregates, and analytical tasks are
<20% of traffic but cost >80% of the resources
Conclusion: Scaling Approaches
1. There’s no One-Size-Fits-All

2. Understand your business

3. Attack Top -> Down, 80/20

4. Measure -> Improve


PART 2
# Performance and Scaling
A quick look: Typical Data Architecture

Analytical system

source: Designing Data-Intensive Applications – O'Reilly


Scaling Principles
 [Important] Speeding-up with Caching/Indexed Data

o High read performance

 [Important] Separating Operational DB vs Reporting DB

o Different complexity

 Other:

o Separating Read/Write: Different I/O

o Speeding-up with Pre-calculated Data: E.g: Statistics data

o Avoiding monoliths (everything in one DB): Hard to scale


Scale with Data Caching
Most efficient for high-traffic
applications, landing pages
Full-page cache (Varnish)

Partial (template) cache

Data cache

Query cache

Pre-calculated data

Real data

A Traffic Spike
Data Cache and Indexed Data
 Cache:

o Key => {Value}

o Super high read performance

o Lacks filter/query abilities

o Engines: Redis, Memcached

 Indexed Data:

o Indexed = not the Source of Truth

o High read performance

o without losing filter/query abilities

o Engine: MongoDB


Data Cache and Indexed Data (cont)
 Cache and indexed-data refreshing

Passive: Data changed → clear() on the target page or data → buildCache() on the next read

Active: Worker (e.g. every 5 min) → build() the target page or data
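The passive and active refresh flows in the diagram can be sketched in a few lines of Python (the class and method names such as `on_data_changed` are ours; `clear()`, `build()` and `buildCache()` mirror the diagram's labels):

```python
class PassiveCache:
    """Passive refresh: a write clears the entry; the next read rebuilds."""

    def __init__(self, build_fn):
        self._build_fn = build_fn  # the expensive rebuild, e.g. a big query
        self._value = None

    def on_data_changed(self):
        self._value = None  # clear(): the next reader pays the rebuild cost

    def get(self):
        if self._value is None:
            self._value = self._build_fn()  # buildCache() on demand
        return self._value

class ActiveCache:
    """Active refresh: a worker rebuilds on a schedule; reads never miss."""

    def __init__(self, build_fn):
        self._build_fn = build_fn
        self._value = build_fn()

    def worker_tick(self):
        # Called by a scheduled worker (e.g. every 5 minutes): build().
        self._value = self._build_fn()

    def get(self):
        return self._value

builds = {"n": 0}
def build():
    builds["n"] += 1
    return "page-v%d" % builds["n"]

passive = PassiveCache(build)
passive.get()              # first read builds "page-v1"
passive.on_data_changed()  # a write invalidates
fresh = passive.get()      # the next read rebuilds -> "page-v2"

active = ActiveCache(build)  # builds eagerly -> "page-v3"
active.worker_tick()         # scheduled rebuild -> "page-v4"
```

The trade-off: passive never rebuilds unread data but makes one unlucky reader slow; active keeps reads uniformly fast at the cost of rebuilding on schedule even when nothing changed.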
Scale Reporting (1)
 Simple approach
Scale Reporting (2)
 Data-warehouse approach
Example: Analytical Service, BI Tool

Analytics Service consumed by Excel · Holistics BI tool


Application Usage
Common pitfalls
 Slow queries (legendary)

 Indexing problems

 Lock/Deadlock

 Putting queries inside a loop

 Retrieving too much data (bandwidth issue)


Slow queries
 Detection: logging

 Investigation:

o EXPLAIN, EXPLAIN ANALYZE (PostgreSQL)

o Profiling

 Powerful tools:

o DB Monitoring (e.g: NewRelic)

o percona pt-query-digest
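Detection by logging can be prototyped as a thin wrapper that times every statement, in the spirit of MySQL's slow query log that pt-query-digest later summarizes. A Python sketch with sqlite3 as the stand-in database (the `SlowQueryLog` class is our invention):

```python
import sqlite3
import time

class SlowQueryLog:
    """Wraps a connection; records any statement slower than a threshold."""

    def __init__(self, conn, threshold_seconds=0.01):
        self._conn = conn
        self.threshold = threshold_seconds
        self.entries = []  # (elapsed_seconds, sql) pairs to investigate later

    def execute(self, sql, params=()):
        start = time.perf_counter()
        rows = self._conn.execute(sql, params).fetchall()
        elapsed = time.perf_counter() - start
        if elapsed >= self.threshold:
            self.entries.append((elapsed, sql))  # candidate for EXPLAIN
        return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, msg TEXT)")
conn.executemany("INSERT INTO logs (msg) VALUES (?)",
                 [("x" * 100,) for _ in range(20_000)])

db = SlowQueryLog(conn, threshold_seconds=0.0)  # log everything for the demo
rows = db.execute("SELECT COUNT(*) FROM logs WHERE msg LIKE ?", ("%x%",))
```

Each captured entry is exactly what you would feed into EXPLAIN next; real servers do this capture for you via the slow query log.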
Indexing
 The key to performance

 Too many, too few

o When will we do indexing? (or Should we index everything?)

o Check data cardinality

 Fields to be indexed:

o Sorting, Searching, Grouping, Joining

 Types of Index: B-Tree, Hash
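The before/after effect of an index shows up directly in the query plan. Here is a sketch using SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN (schema and index names are illustrative); it also hints at the cardinality check above, since a column with one distinct value would gain nothing from a B-Tree index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, country TEXT)")
conn.executemany("INSERT INTO users (email, country) VALUES (?, ?)",
                 [("u%d@example.com" % i, "VN") for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows end with a human-readable detail column.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

query = "SELECT * FROM users WHERE email = 'u7@example.com'"
before = plan(query)  # mentions a full-table SCAN
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)   # mentions a SEARCH USING INDEX

# `country` here has cardinality 1 (every row is "VN"): indexing it would
# not narrow anything, which is the point of checking cardinality first.
```

Sorting, searching, grouping, and joining columns are the candidates precisely because they are the ones that appear in these plans.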


Lock and DeadLock
○ Locks and Deadlocks: When will it happen?

○ How to avoid?
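One standard answer to "how to avoid?" is a global lock ordering: if every transaction acquires its locks in the same order, the circular wait that defines a deadlock cannot form. A Python sketch with in-process locks standing in for row locks (all names are ours):

```python
import threading

# Deadlock recipe: two transactions lock rows A and B in opposite orders.
# Avoidance rule sketched here: always acquire in one global (sorted) order.
locks = {"row_a": threading.Lock(), "row_b": threading.Lock()}

def run_transaction(first, second, action):
    keys = sorted((first, second))  # the same order for every thread
    for key in keys:
        locks[key].acquire()
    try:
        action()
    finally:
        for key in reversed(keys):
            locks[key].release()

result = []
t1 = threading.Thread(target=run_transaction,
                      args=("row_a", "row_b", lambda: result.append(1)))
t2 = threading.Thread(target=run_transaction,
                      args=("row_b", "row_a", lambda: result.append(2)))
t1.start(); t2.start()
t1.join(); t2.join()
# Without the sorted order, t1 (a then b) and t2 (b then a) could each
# hold one lock and wait forever for the other.
```

The same discipline applies to SQL: update rows in a consistent key order, keep transactions short, and deadlocks become rare instead of routine.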
Pre-calculation
On-demand calculation vs Pre-calculation

o Similar to Cache (timeout) vs Indexed Data (permanent)

o Example: Calculating Cohort data (Retention)
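Pre-calculation can be as simple as a summary table maintained at write time. A sqlite3 sketch (the tables `orders` and `daily_revenue` and the helper are hypothetical): the dashboard reads one pre-calculated row instead of re-aggregating, and the two stay consistent because every write updates both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, day TEXT, amount REAL)")
conn.execute("CREATE TABLE daily_revenue (day TEXT PRIMARY KEY, total REAL)")

def record_order(day, amount):
    conn.execute("INSERT INTO orders (day, amount) VALUES (?, ?)",
                 (day, amount))
    # Keep the pre-calculated total in step with every write (manual upsert).
    updated = conn.execute(
        "UPDATE daily_revenue SET total = total + ? WHERE day = ?",
        (amount, day)).rowcount
    if updated == 0:
        conn.execute("INSERT INTO daily_revenue (day, total) VALUES (?, ?)",
                     (day, amount))

record_order("2017-08-22", 10.0)
record_order("2017-08-22", 5.0)
record_order("2017-08-23", 7.5)

# Dashboard read: one row, no aggregation at request time.
precalc = conn.execute(
    "SELECT total FROM daily_revenue WHERE day = '2017-08-22'").fetchone()[0]
# The on-demand equivalent it replaces:
on_demand = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE day = '2017-08-22'").fetchone()[0]
```

Unlike a timed cache, this summary never expires; like indexed data, it is permanent but not the source of truth, which is the comparison the slide draws.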


DB Normalization
 Normalization: To do or not to do

o Data duplication avoids JOINs

o Beware of updating duplicated data

 Foreign keys

o Increase the chance of table locks

o Beware of cascading deletes and updates


Triggers and Events
 Be aware:

o Hidden logic

o Hard to monitor

 Triggers increase the chance of table locks


Scaling Infrastructure
Scale Vertically
 DB Partitioning

o Some constraints (MySQL):

• A PRIMARY KEY must include all columns used in the
table's partitioning expression

• All parts of a PRIMARY KEY must be NOT NULL

o Partitioned by: Date/Time, ID

 Configuration tuning (my.cnf)

 More RAM, SSD, …
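To make the date-partitioning idea concrete, here is a hand-rolled sketch in Python: SQLite has no native partitioning, so one table per month plus a routing function imitates MySQL's PARTITION BY RANGE on a date column. All names are ours, and the f-string table names are tolerable only because they come from our own router, never from user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def partition_for(date_str):
    # "YYYY-MM-DD" -> "orders_YYYY_MM": the routing (partitioning) function.
    return "orders_" + date_str[:7].replace("-", "_")

def insert_order(date_str, amount):
    table = partition_for(date_str)  # trusted, generated name only
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (day TEXT, amount REAL)")
    conn.execute(f"INSERT INTO {table} (day, amount) VALUES (?, ?)",
                 (date_str, amount))

insert_order("2017-07-30", 9.0)
insert_order("2017-08-22", 12.0)

# A one-month query touches only that month's table: the manual analogue
# of partition pruning. Dropping an old month is a cheap DROP TABLE.
august = conn.execute("SELECT SUM(amount) FROM orders_2017_08").fetchone()[0]
```

Native partitioning gives you the same pruning transparently, which is why the PRIMARY KEY constraints above exist: the server must be able to route every keyed lookup to one partition.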


Partitioning
Scale Horizontally
Replication

 Master-Slave: MySQL Replication

Replication model · Separating Read/Write from the Application
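Separating reads from writes on the application side can be sketched as a tiny router: writes go to the master, SELECTs to a replica. Two sqlite3 connections stand in for the two servers, and a manual copy stands in for binlog replication (the `Router` class is illustrative; a real setup must also handle replication lag).

```python
import sqlite3

master = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")
for db in (master, replica):
    db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")

class Router:
    """Crude read/write splitting: SELECTs to a replica, the rest to master."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas

    def execute(self, sql, params=()):
        is_read = sql.lstrip().upper().startswith("SELECT")
        target = self.replicas[0] if is_read else self.master
        return target.execute(sql, params)

db = Router(master, [replica])
db.execute("INSERT INTO products (name) VALUES (?)", ("phone",))
# Stand-in for asynchronous replication: the row arrives on the replica.
replica.execute("INSERT INTO products (name) VALUES (?)", ("phone",))
rows = db.execute("SELECT name FROM products").fetchall()
```

Because replication is asynchronous, a read issued immediately after a write may miss it on the replica; read-your-own-writes traffic is often pinned to the master for that reason.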
Scale Horizontally
 Replication setup
Downtime
Scale Horizontally
Replication

 Multi-Master: Galera Cluster, Percona ExtraDB Cluster

Working model: commit on all nodes, or commit nothing


source: http://galeracluster.com/
PART 3
# Case-study
Before we begin
 Conventions

 Tools

o Modelling (MySQL Workbench)

o Faking Data (fake2db)

o Testing and analyzing queries (EXPLAIN, ANALYZE, PROFILE)


Case Study: E-commerce System
Main Flow

Listing/Landing pages Product Detail Order


Data Architecture
DB Diagram
DB Diagram - EAV
Data Update Flow
Thank you
 QnA

 Sharing your own best practices
