TopDev - High Performance and Scalability Database Design - V2.1

This document provides an overview of approaches for designing databases for high performance and scalability. It discusses picking the right tools based on demand factors like growth rate and data freshness needs. Key approaches covered include caching data for high read loads, separating operational and reporting databases, and testing systems to benchmark and improve performance. The document also provides a case study on scaling an e-commerce system through techniques like database partitioning, replication, and optimizing queries.


High Performance and Scalability

Database Design

Nguyễn Sơn Tùng
Former Head of Technology @ Tiki.vn
Former Technical Manager @ Clip.vn
facebook.com/tungns

TopDev Event
22/08/2017
Ho Chi Minh City
Agenda
I. PART 1 – Overview and Approaches
II. PART 2 – Performance and Scaling
III. PART 3 – Case Study

Extra Materials (https://goo.gl/nF95Nf)
○ Design documents
○ Samples

Scopes
What it WILL include:
○ Scalability
○ Performance
○ Approaches, Best Practices

What it will NOT include:
○ Maintenance
○ Fault Tolerance


PART 1
# Overview and Approaches
Why does the Database matter?

Data-Intensive Applications face:

○ Complexity
○ Quantity growth
○ Business-changing velocity
Data Intensive - Complexity

A DB Diagram of Magento 2.1.3


Data Intensive – Quantity & Changes

Code changes
Data size growth of one of our applications, year-over-year

If we don't prepare well:
> Service Unavailable > Traffic Drop

If we don't prepare well:
Panic!!!

“What doesn’t kill you makes you stronger”


Yes! It’s all serious
… when it comes to $$$
How should we deal with this?
What affects DB Performance?

Quite a lot!!!
Approaches
#1 Pick the right tools for your demand

o No one-size-fits-all

o Some pitfalls:

• Too complex: takes intensive time to build, maintain, or change

• Too sloppy: too many mistakes, not ready to scale


Approaches
#1 Pick the right tools for your demand

o Answer some questions:

• How fast will it grow? (gradually vs exponentially)

• What does the traffic pattern look like?

Example: Flash sales

Invest early in high performance
and high scalability
Approaches
#1 Pick the right tools for your demand

o Answer some questions:

• How fresh (real-time) must each piece of data be?

0s ——— 5 min ——— 24h

(anything that tolerates some staleness is cacheable)
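The freshness spectrum above maps naturally onto per-key cache TTLs. Here is a minimal sketch in Python (the `TTLCache` class and the key strings are illustrative, not from the slides): real-time data gets no TTL and always falls through to the database, while 5-minute and 24-hour data may be served stale.

```python
import time

class TTLCache:
    """Minimal per-key TTL cache mirroring the 0s / 5 min / 24h spectrum."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        # ttl_seconds == 0 means "must be real-time": never cache it.
        if ttl_seconds > 0:
            self._store[key] = (value, time.time() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None  # miss: the caller falls back to the real database
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]  # expired: force a rebuild on this read
            return None
        return value

cache = TTLCache()
cache.set("product:123:detail", {"price": 19.99}, ttl_seconds=300)  # ~5 min fresh
cache.set("homepage:banner", "<div>sale</div>", ttl_seconds=86400)  # daily
cache.set("stock:item:1", 4, ttl_seconds=0)  # real-time: always read the DB
```

In production the dict would be Redis or Memcached, but the decision of "how stale may this key be?" stays exactly the business question this slide asks.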
Approaches
#1 Pick the right tools for your demand

o Answer some questions:

• Is there any locking model?

Transactional DB
Read locking

Discount 30%:
Virtual item → unlimited purchases
Physical item → 1 item only

… facing 1000 purchasing attempts!!!
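The flash-sale race above is usually settled by letting the database arbitrate. A sketch using Python's built-in sqlite3 as a stand-in for a transactional DB (table and function names are ours): a conditional UPDATE decrements stock atomically, so 1000 attempts on 1 physical item sell exactly one, with no read-then-write window to oversell through.

```python
import sqlite3

# Hypothetical one-item flash sale: 1000 buyers race for 1 physical item.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item_id INTEGER PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES (1, 1)")  # one physical item
conn.commit()

def try_purchase(conn, item_id):
    # Decrement only while stock remains; rowcount tells us who won.
    cur = conn.execute(
        "UPDATE stock SET qty = qty - 1 WHERE item_id = ? AND qty > 0",
        (item_id,),
    )
    conn.commit()
    return cur.rowcount == 1

results = [try_purchase(conn, 1) for _ in range(1000)]  # 1000 attempts
sold = sum(results)  # exactly 1 succeeds; stock never goes negative
```

A virtual item (unlimited purchases) needs none of this locking, which is exactly why the question "is there a locking model?" shapes the tool choice.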
Approaches
#2. Precision (trust) is more important than scaling (readiness)

… again, $$$
Approaches
#3. Testing yourself

 Simulate large data


Approaches
#3. Testing yourself

 Benchmark

o Server: sysbench

o Database: mysqlslap

o Application: ab, siege, loader.io,…
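Alongside the dedicated tools above, a homemade timing loop gives a first throughput number before reaching for sysbench or mysqlslap. A rough Python sketch (sqlite3 stands in for the server; the `benchmark` helper is our own, not part of any tool named here):

```python
import sqlite3
import time

# Homemade micro-benchmark in the spirit of mysqlslap: run one query N
# times against seeded data and report throughput.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("user%d" % i,) for i in range(10_000)])
conn.commit()

def benchmark(conn, query, params, iterations=1_000):
    start = time.perf_counter()
    for _ in range(iterations):
        conn.execute(query, params).fetchall()
    elapsed = time.perf_counter() - start
    return iterations / elapsed  # queries per second

qps = benchmark(conn, "SELECT name FROM users WHERE id = ?", (42,))
```

This measures only single-connection latency; the listed tools add concurrency, warm-up, and realistic load shapes, which is why they remain the real instruments.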


Approaches
#4. Improving

 Stay monitored: We can’t improve anything without measurement

NewRelic Database Monitor


Sample, EXPLAIN &
Suggestions
Approaches
#4. Improving

 Stay alerted

Daily reports

Top slow queries


Approaches
#4. Improving

 Top-down approach: User Traffic → DB Processing → Reporting

 Rule of thumb: 80/20

o >80% of traffic hits the cache, needs no slow
I/O DB transaction processing, and requires <20%
of the resource cost

o Statistics, aggregates, and analytical tasks are
<20% of traffic but cost >80% of the resources
Conclusion: Scaling Approaches
1. There’s no One-Size-Fits-All

2. Understand your business

3. Attack Top -> Down, 80/20

4. Measure -> Improve


PART 2
# Performance and Scaling
A quick look: Typical Data Architecture

Analytical system

source: Designing Data-Intensive Applications – O'Reilly


Scaling Principles
 [Important] Speeding-up with Caching/Indexed Data

o High read performance

 [Important] Separating Operational DB vs Reporting DB

o Different complexity

 Other:

o Separating Read/Write: Different I/O

o Speeding-up with Pre-calculated Data: E.g: Statistics data

o Avoiding monoliths (everything in one DB): Hard to scale


Scale with Data Caching
Most efficient for high-traffic
applications, landing pages
Full-page cache (Varnish)

Partial (template) cache

Data cache

Query cache

Pre-calculated data

Real data

A Traffic Spike
Data Cache and Indexed Data
 Cache:

o Key => {Value}

o Super high read performance

o Lacks filter/query abilities

o Engines: Redis, Memcached

 Indexed Data:

o Indexed = not the Source of Truth

o High read performance

o without losing filter/query abilities

o Engine: MongoDB


Data Cache and Indexed Data (cont)
 Cache and indexed-data refreshing

Passive: Data changed → clear() on the target page or data → buildCache() on the next read

Active: Worker (e.g. every 5 min) → build() the target page or data
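The passive and active refresh flows in the diagram can be sketched in a few lines of Python (the class and method names such as `on_data_changed` are ours; `clear()`, `build()` and `buildCache()` mirror the diagram's labels):

```python
class PassiveCache:
    """Passive refresh: a write clears the entry; the next read rebuilds."""

    def __init__(self, build_fn):
        self._build_fn = build_fn  # the expensive rebuild, e.g. a big query
        self._value = None

    def on_data_changed(self):
        self._value = None  # clear(): the next reader pays the rebuild cost

    def get(self):
        if self._value is None:
            self._value = self._build_fn()  # buildCache() on demand
        return self._value

class ActiveCache:
    """Active refresh: a worker rebuilds on a schedule; reads never miss."""

    def __init__(self, build_fn):
        self._build_fn = build_fn
        self._value = build_fn()

    def worker_tick(self):
        # Called by a scheduled worker (e.g. every 5 minutes): build().
        self._value = self._build_fn()

    def get(self):
        return self._value

builds = {"n": 0}
def build():
    builds["n"] += 1
    return "page-v%d" % builds["n"]

passive = PassiveCache(build)
passive.get()              # first read builds "page-v1"
passive.on_data_changed()  # a write invalidates
fresh = passive.get()      # the next read rebuilds -> "page-v2"

active = ActiveCache(build)  # builds eagerly -> "page-v3"
active.worker_tick()         # scheduled rebuild -> "page-v4"
```

The trade-off: passive never rebuilds unread data but makes one unlucky reader slow; active keeps reads uniformly fast at the cost of rebuilding on schedule even when nothing changed.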
Scale Reporting (1)
 Simple approach
Scale Reporting (2)
 Data-warehouse approach
Example: Analytical Service, BI Tool

Analytics Service consumed by Excel · Holistics BI tool


Application Usage
Common pitfalls
 Slow queries (legendary)

 Indexing problems

 Lock/Deadlock

 Putting queries inside a loop

 Retrieving too much data (bandwidth issue)


Slow queries
 Detection: logging

 Investigation:

o EXPLAIN, EXPLAIN ANALYZE (PostgreSQL)

o Profiling

 Powerful tools:

o DB Monitoring (e.g: NewRelic)

o percona pt-query-digest
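Detection by logging can be prototyped as a thin wrapper that times every statement, in the spirit of MySQL's slow query log that pt-query-digest later summarizes. A Python sketch with sqlite3 as the stand-in database (the `SlowQueryLog` class is our invention):

```python
import sqlite3
import time

class SlowQueryLog:
    """Wraps a connection; records any statement slower than a threshold."""

    def __init__(self, conn, threshold_seconds=0.01):
        self._conn = conn
        self.threshold = threshold_seconds
        self.entries = []  # (elapsed_seconds, sql) pairs to investigate later

    def execute(self, sql, params=()):
        start = time.perf_counter()
        rows = self._conn.execute(sql, params).fetchall()
        elapsed = time.perf_counter() - start
        if elapsed >= self.threshold:
            self.entries.append((elapsed, sql))  # candidate for EXPLAIN
        return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, msg TEXT)")
conn.executemany("INSERT INTO logs (msg) VALUES (?)",
                 [("x" * 100,) for _ in range(20_000)])

db = SlowQueryLog(conn, threshold_seconds=0.0)  # log everything for the demo
rows = db.execute("SELECT COUNT(*) FROM logs WHERE msg LIKE ?", ("%x%",))
```

Each captured entry is exactly what you would feed into EXPLAIN next; real servers do this capture for you via the slow query log.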
Indexing
 The key to performance

 Too many, too few

o When will we do indexing? (or Should we index everything?)

o Check data cardinality

 Fields to be indexed:

o Sorting, Searching, Grouping, Joining

 Types of Index: B-Tree, Hash
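The before/after effect of an index shows up directly in the query plan. Here is a sketch using SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN (schema and index names are illustrative); it also hints at the cardinality check above, since a column with one distinct value would gain nothing from a B-Tree index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, country TEXT)")
conn.executemany("INSERT INTO users (email, country) VALUES (?, ?)",
                 [("u%d@example.com" % i, "VN") for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows end with a human-readable detail column.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

query = "SELECT * FROM users WHERE email = 'u7@example.com'"
before = plan(query)  # mentions a full-table SCAN
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)   # mentions a SEARCH USING INDEX

# `country` here has cardinality 1 (every row is "VN"): indexing it would
# not narrow anything, which is the point of checking cardinality first.
```

Sorting, searching, grouping, and joining columns are the candidates precisely because they are the ones that appear in these plans.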


Lock and DeadLock
○ Locks and Deadlocks: When will it happen?

○ How to avoid?
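One standard answer to "how to avoid?" is a global lock ordering: if every transaction acquires its locks in the same order, the circular wait that defines a deadlock cannot form. A Python sketch with in-process locks standing in for row locks (all names are ours):

```python
import threading

# Deadlock recipe: two transactions lock rows A and B in opposite orders.
# Avoidance rule sketched here: always acquire in one global (sorted) order.
locks = {"row_a": threading.Lock(), "row_b": threading.Lock()}

def run_transaction(first, second, action):
    keys = sorted((first, second))  # the same order for every thread
    for key in keys:
        locks[key].acquire()
    try:
        action()
    finally:
        for key in reversed(keys):
            locks[key].release()

result = []
t1 = threading.Thread(target=run_transaction,
                      args=("row_a", "row_b", lambda: result.append(1)))
t2 = threading.Thread(target=run_transaction,
                      args=("row_b", "row_a", lambda: result.append(2)))
t1.start(); t2.start()
t1.join(); t2.join()
# Without the sorted order, t1 (a then b) and t2 (b then a) could each
# hold one lock and wait forever for the other.
```

The same discipline applies to SQL: update rows in a consistent key order, keep transactions short, and deadlocks become rare instead of routine.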
Pre-calculation
On-demand calculation vs Pre-calculation

o Similar to Cache (timeout) vs Indexed Data (permanent)

o Example: Calculating Cohort data (Retention)
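Pre-calculation can be as simple as a summary table maintained at write time. A sqlite3 sketch (the tables `orders` and `daily_revenue` and the helper are hypothetical): the dashboard reads one pre-calculated row instead of re-aggregating, and the two stay consistent because every write updates both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, day TEXT, amount REAL)")
conn.execute("CREATE TABLE daily_revenue (day TEXT PRIMARY KEY, total REAL)")

def record_order(day, amount):
    conn.execute("INSERT INTO orders (day, amount) VALUES (?, ?)",
                 (day, amount))
    # Keep the pre-calculated total in step with every write (manual upsert).
    updated = conn.execute(
        "UPDATE daily_revenue SET total = total + ? WHERE day = ?",
        (amount, day)).rowcount
    if updated == 0:
        conn.execute("INSERT INTO daily_revenue (day, total) VALUES (?, ?)",
                     (day, amount))

record_order("2017-08-22", 10.0)
record_order("2017-08-22", 5.0)
record_order("2017-08-23", 7.5)

# Dashboard read: one row, no aggregation at request time.
precalc = conn.execute(
    "SELECT total FROM daily_revenue WHERE day = '2017-08-22'").fetchone()[0]
# The on-demand equivalent it replaces:
on_demand = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE day = '2017-08-22'").fetchone()[0]
```

Unlike a timed cache, this summary never expires; like indexed data, it is permanent but not the source of truth, which is the comparison the slide draws.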


DB Normalization
 Normalization: To do or not to do

o Data duplication avoids JOINs

o Beware of updating duplicated data

 Foreign keys

o Increase the chance of table locks

o Beware of cascading deletes and updates


Triggers and Events
 Be aware:

o Hidden logic

o Hard to monitor

 Triggers increase the chance of table locks


Scaling Infrastructure
Scale Vertically
 DB Partitioning

o Some constraints (MySQL):

• A PRIMARY KEY must include all columns used in the
table's partitioning expression

• All parts of a PRIMARY KEY must be NOT NULL

o Partitioned by: Date/Time, ID

 Configuration tuning (my.cnf)

 More RAM, SSD, …
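To make the date-partitioning idea concrete, here is a hand-rolled sketch in Python: SQLite has no native partitioning, so one table per month plus a routing function imitates MySQL's PARTITION BY RANGE on a date column. All names are ours, and the f-string table names are tolerable only because they come from our own router, never from user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def partition_for(date_str):
    # "YYYY-MM-DD" -> "orders_YYYY_MM": the routing (partitioning) function.
    return "orders_" + date_str[:7].replace("-", "_")

def insert_order(date_str, amount):
    table = partition_for(date_str)  # trusted, generated name only
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (day TEXT, amount REAL)")
    conn.execute(f"INSERT INTO {table} (day, amount) VALUES (?, ?)",
                 (date_str, amount))

insert_order("2017-07-30", 9.0)
insert_order("2017-08-22", 12.0)

# A one-month query touches only that month's table: the manual analogue
# of partition pruning. Dropping an old month is a cheap DROP TABLE.
august = conn.execute("SELECT SUM(amount) FROM orders_2017_08").fetchone()[0]
```

Native partitioning gives you the same pruning transparently, which is why the PRIMARY KEY constraints above exist: the server must be able to route every keyed lookup to one partition.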


Partitioning
Scale Horizontally
Replication

 Master-Slave: MySQL Replication

Replication model · Separating Read/Write from the Application
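Separating reads from writes on the application side can be sketched as a tiny router: writes go to the master, SELECTs to a replica. Two sqlite3 connections stand in for the two servers, and a manual copy stands in for binlog replication (the `Router` class is illustrative; a real setup must also handle replication lag).

```python
import sqlite3

master = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")
for db in (master, replica):
    db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")

class Router:
    """Crude read/write splitting: SELECTs to a replica, the rest to master."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas

    def execute(self, sql, params=()):
        is_read = sql.lstrip().upper().startswith("SELECT")
        target = self.replicas[0] if is_read else self.master
        return target.execute(sql, params)

db = Router(master, [replica])
db.execute("INSERT INTO products (name) VALUES (?)", ("phone",))
# Stand-in for asynchronous replication: the row arrives on the replica.
replica.execute("INSERT INTO products (name) VALUES (?)", ("phone",))
rows = db.execute("SELECT name FROM products").fetchall()
```

Because replication is asynchronous, a read issued immediately after a write may miss it on the replica; read-your-own-writes traffic is often pinned to the master for that reason.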
Scale Horizontally
 Replication setup
Downtime
Scale Horizontally
Replication

 Multi-Master: Galera Cluster, Percona ExtraDB Cluster

Working model: commit on all nodes, or commit nothing


source: http://galeracluster.com/
PART 3
# Case-study
Before we begin
 Conventions

 Tools

o Modelling (MySQL Workbench)

o Faking Data (fake2db)

o Testing and analyzing queries (EXPLAIN, ANALYZE, PROFILE)


Case Study: E-commerce System
Main Flow

Listing/Landing pages Product Detail Order


Data Architecture
DB Diagram
DB Diagram - EAV
Data Update Flow
Thank you
 QnA

 Sharing your own best practices
