Scaling Memcache at Facebook - Slides

The document summarizes Facebook's approach to scaling their Memcache infrastructure to support over 1 billion requests per second. Key aspects include: 1) Using Memcache as a front end to databases to handle the heavy read load. 2) Partitioning data and servers into multiple Memcache clusters to improve read throughput and allow independent scaling. 3) Synchronizing data between Memcache clusters and databases by tailing database commit logs to invalidate cached entries. 4) Distributing Memcache clusters across data centers for availability and low latency access from different geographic regions.


Scaling Memcache

at Facebook

Presenter: Rajesh Nishtala ([email protected])


Co-authors: Hans Fugal, Steven Grimm, Marc
Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy,
Mike Paleczny, Daniel Peek, Paul Saab, David Stafford,
Tony Tung, Venkateshwaran Venkataramani
Infrastructure Requirements
for Facebook
1.  Near real-time communication
2.  Aggregate content on-the-fly from multiple sources
3.  Be able to access and update very popular shared content
4.  Scale to process millions of user requests per second
Design Requirements
Support a very heavy read load
•  Over 1 billion reads / second
•  Insulate backend services from high read rates
Geographically Distributed
Support a constantly evolving product
•  System must be flexible enough to support a variety of use cases
•  Support rapid deployment of new features
Persistence handled outside the system
•  Support mechanisms to refill after updates
memcached
•  Basic building block for a distributed key-value store for Facebook
•  Trillions of items
•  Billions of requests / second
•  Network attached in-memory hash table
•  Supports LRU based eviction
Roadmap

1.  Single front-end cluster
•  Read heavy workload
•  Wide fanout
•  Handling failures

2.  Multiple front-end clusters
•  Controlling data replication
•  Data consistency

3.  Multiple Regions
•  Data consistency

[Diagram: two geo regions, each a front-end cluster (web servers + memcache) over a storage cluster; storage replication flows from the master storage cluster to the replica]
Pre-memcache
Just a few databases are enough to support the load
•  Data sharded across the databases

[Diagram: web servers querying a set of sharded databases directly]
Why Separate Cache?
High fanout and multiple rounds of data fetching

[Diagram: data dependency DAG for a small request]
Scaling memcache in 4 easy steps
10s of servers & millions of operations per second

0  No memcache servers
1  A few memcache servers
2  Many memcache servers in one cluster
3  Many memcache servers in multiple clusters
4  Geographically distributed clusters


Need more read capacity

•  Two orders of magnitude more reads than writes
•  Solution: Deploy a few memcache hosts to handle the read capacity
•  How do we store data?
•  Demand-filled look-aside cache
•  Common case is data is available in the cache

[Diagram: web server 1. Get(key) to memcache, 2. Miss(key), 3. DB lookup, 4. Set(key) back into memcache]
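The four-step look-aside read path can be sketched in a few lines of Python. This is an illustrative sketch only: `Cache` and `Database` are toy in-memory stand-ins, not Facebook's actual client or storage tiers.

```python
class Cache:
    """Toy in-memory stand-in for a memcache client (illustrative only)."""
    def __init__(self):
        self.data = {}
    def get(self, key):
        return self.data.get(key)
    def set(self, key, value):
        self.data[key] = value

class Database:
    """Toy stand-in for the sharded database tier."""
    def __init__(self, rows):
        self.rows = rows
    def lookup(self, key):
        return self.rows[key]

def lookaside_get(mc, db, key):
    """Demand-filled look-aside cache: check memcache first,
    fall back to the database on a miss, then fill the cache."""
    value = mc.get(key)          # 1. Get(key)
    if value is None:            # 2. Miss(key)
        value = db.lookup(key)   # 3. DB lookup
        mc.set(key, value)       # 4. Set(key) so later reads hit
    return value
```

The first read of a key pays the database round trip; every later read is served from the cache until the entry is evicted or invalidated.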
Handling updates

•  Memcache needs to be invalidated after DB write
•  Prefer deletes to sets
    •  Idempotent update
    •  Demand filled
•  Up to web application to specify which keys to invalidate after database update

[Diagram: web server 1. Database update, 2. Delete from memcache]
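The matching write path is a sketch of the "prefer deletes to sets" rule; `Cache` and `Database` are hypothetical in-memory stand-ins, not real client code.

```python
class Cache:
    """Toy in-memory stand-in for a memcache client (illustrative only)."""
    def __init__(self):
        self.data = {}
    def set(self, key, value):
        self.data[key] = value
    def delete(self, key):
        self.data.pop(key, None)   # deleting an absent key is a no-op

class Database:
    """Toy stand-in for the sharded database tier."""
    def __init__(self):
        self.rows = {}
    def update(self, key, value):
        self.rows[key] = value

def update_and_invalidate(db, mc, key, value):
    """Update the database first, then delete (not set) the cached
    entry: deletes are idempotent, and the look-aside read path
    refills the cache on demand."""
    db.update(key, value)   # 1. Database update
    mc.delete(key)          # 2. Delete from memcache
```

Because deletes are idempotent, replaying an invalidation is always safe, which matters later when invalidations are driven from the commit log.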
Problems with look-aside caching
Stale Sets

•  Extend memcache protocol with leases
•  Return and attach a lease-id with every miss
•  Lease-id is invalidated inside server on a delete
•  Disallow set if the lease-id is invalid at the server

[Diagram: web server A does 1. Read(A) and misses; the database row is 2. Updated to (B); web server B does 3. Read(B) and 4. Set(B); web server A's delayed 5. Set(A) then overwrites the fresh value, leaving MC & DB inconsistent]
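The lease mechanism can be sketched as server-side state: every miss hands out a lease-id, a delete invalidates it, and a set is rejected unless its lease is still valid. This is a sketch of the idea, not the real memcached protocol extension.

```python
import itertools

class LeasingCache:
    """Toy memcache server with leases (illustrative sketch only)."""
    def __init__(self):
        self.data = {}
        self.leases = {}                   # key -> currently valid lease-id
        self._ids = itertools.count(1)

    def get(self, key):
        """Return (value, None) on a hit, (None, lease_id) on a miss."""
        if key in self.data:
            return self.data[key], None
        lease = next(self._ids)            # attach a lease-id to every miss
        self.leases[key] = lease
        return None, lease

    def delete(self, key):
        self.data.pop(key, None)
        self.leases.pop(key, None)         # delete invalidates outstanding lease

    def set(self, key, value, lease):
        if self.leases.get(key) != lease:  # disallow set on a stale lease
            return False
        del self.leases[key]
        self.data[key] = value
        return True
```

Replaying the race from the slide: server A's delayed set carries a lease-id that was invalidated by the delete, so the stale write is refused.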
Problems with look-aside caching
Thundering Herds

•  Memcache server arbitrates access to database
•  Small extension to leases
•  Clients given a choice of using a slightly stale value or waiting

[Diagram: many web servers request the same missing key at once; the memcache server lets only the lease holder go to the database]
Scaling memcache in 4 easy steps
100s of servers & 10s of millions of operations per second

0  No memcache servers
1  A few memcache servers
2  Many memcache servers in one cluster
3  Many memcache servers in multiple clusters
4  Geographically distributed clusters
Need even more read capacity

•  Items are distributed across memcache servers by using consistent hashing on the key
•  Individual items are rarely accessed very frequently, so over-replication doesn't make sense
•  All web servers talk to all memcache servers
•  Accessing 100s of memcache servers to process a user request is common

[Diagram: every web server connected to every memcache server in the cluster]
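Consistent hashing is what lets a cluster grow without remapping most keys. A minimal sketch, assuming MD5 as the hash function and virtual nodes for load smoothing (real memcache clients differ in both details):

```python
import bisect
import hashlib

def _hash(s):
    """Stable 64-bit hash (illustrative choice, not what memcache uses)."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Toy consistent-hash ring mapping keys to memcache servers.
    Adding or removing a server only remaps a small fraction of keys."""
    def __init__(self, servers, replicas=100):
        self.ring = sorted(
            (_hash(f"{srv}#{i}"), srv)
            for srv in servers
            for i in range(replicas)       # virtual nodes smooth the load
        )
        self.points = [h for h, _ in self.ring]

    def server_for(self, key):
        """Walk clockwise from the key's hash to the next server point."""
        idx = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]
```

Compared with `hash(key) % num_servers`, growing the ring from N to N+1 servers moves only about 1/(N+1) of the keys instead of nearly all of them.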
Incast congestion

•  Many simultaneous responses overwhelm shared networking resources
•  Solution: Limit the number of outstanding requests with a sliding window
•  Larger windows result in more congestion
•  Smaller windows result in more round trips to the network

[Diagram: a web server issues Get key1 … Get keyN to many memcache servers at once; the 5-10kB value responses all arrive together and packets are dropped]
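The window idea can be sketched with a semaphore that bounds requests in flight; `fetch` is a hypothetical stand-in for a real memcache get, and the threading model here is illustrative rather than how the actual client is built.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def fetch_with_window(keys, fetch, window=8):
    """Fetch many keys while keeping at most `window` requests in
    flight, so responses don't all arrive at once and overwhelm the
    shared link (a sketch of sliding-window flow control)."""
    gate = threading.Semaphore(window)   # counts outstanding requests

    def one(key):
        with gate:                       # blocks while the window is full
            return key, fetch(key)

    with ThreadPoolExecutor(max_workers=window * 2) as pool:
        return dict(pool.map(one, keys))
```

Tuning `window` is exactly the trade-off on the slide: a larger window packs more responses into the same instant (congestion), a smaller one serializes the fetches into more round trips.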
Scaling memcache in 4 easy steps
1000s of servers & 100s of millions of operations per second

0  No memcache servers
1  A few memcache servers
2  Many memcache servers in one cluster
3  Many memcache servers in multiple clusters
4  Geographically distributed clusters

Multiple clusters

•  All-to-all limits horizontal scaling
•  Multiple memcache clusters front one DB installation
•  Have to keep the caches consistent
•  Have to manage over-replication of data

[Diagram: two front-end clusters (web servers + memcache) sharing one storage cluster (master)]
Databases invalidate caches

•  Cached data must be invalidated after database updates
•  Solution: Tail the MySQL commit log and issue deletes based on transactions that have been committed
•  Allows caches to be resynchronized in the event of a problem

[Diagram: McSqueal on each MySQL storage server tails the commit log and sends deletes to the memcache servers in every front-end cluster]
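The commit-log tailing idea reduces to: for each committed transaction, extract the keys it invalidates and issue deletes. A minimal sketch; the `invalidate:` log annotation and `extract_keys` rule are hypothetical, not MySQL's binlog format or McSqueal's actual parser.

```python
import re

# Hypothetical log convention: committed entries name the cache keys
# they invalidate, e.g. "txn42 invalidate:user:1 invalidate:user:2".
DELETE_TAG = re.compile(r"invalidate:(\S+)")

def extract_keys(txn_entry):
    """Pull the memcache keys a committed transaction invalidates."""
    return DELETE_TAG.findall(txn_entry)

def tail_commit_log(entries, mc_delete):
    """Replay committed entries, issuing a delete per affected key.
    Re-running the tail after a failure is safe because deletes
    are idempotent."""
    for entry in entries:
        for key in extract_keys(entry):
            mc_delete(key)
```

Driving invalidations from the committed log, rather than from web servers, is also what allows a cluster's caches to be resynchronized after a problem: replay the log from a known-good point.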
Invalidation pipeline
Too many packets

•  Aggregating deletes reduces packet rate by 18x
•  Makes configuration management easier
•  Each stage buffers deletes in case downstream component is down

[Diagram: McSqueal on each DB sends batched deletes to memcache routers, which fan them out to the memcache servers in each front-end cluster]
Scaling memcache in 4 easy steps
1000s of servers & > 1 billion operations per second

0 No memcache servers

1 A few memcache servers

2 Many memcache servers in one cluster

3 Many memcache servers in multiple clusters

4 Geographically distributed clusters


Geographically distributed clusters

Writes in non-master region
Database update directly in master
•  Race between DB replication and subsequent DB read

[Diagram: 1. A web server in the replica region writes to the master DB; 2. Delete from memcache; 3. A later read misses and goes to the replica DB before MySQL replication has caught up; 4. A potentially stale value is set into memcache. Race!]
Remote markers
Set a special flag that indicates whether a race is likely

Read miss path:
    if marker set
        read from master DB
    else
        read from replica DB

[Diagram: 1. Set remote marker; 2. Write to master; 3. Delete from memcache; 4. MySQL replication to the replica; 5. Delete remote marker]
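The marker protocol can be sketched directly from the numbered steps. Plain dicts stand in for memcache and the master/replica databases, and the `marker:` key prefix is a hypothetical convention:

```python
def write_in_non_master(mc, master_db, key, value):
    """Write path from a non-master region (sketch)."""
    mc["marker:" + key] = 1          # 1. Set remote marker
    master_db[key] = value           # 2. Write to master
    mc.pop(key, None)                # 3. Delete from memcache
    # 4. MySQL replication later copies the row to the replica,
    # 5. after which the remote marker is deleted.

def read_miss(mc, master_db, replica_db, key):
    """Read-miss path: if the marker is set, replication may not have
    caught up, so read from the master instead of the local replica."""
    if mc.get("marker:" + key):
        return master_db[key]
    return replica_db[key]
```

The marker trades a cross-region read (expensive but correct) for the risk of caching a stale value, and only for keys that were recently written from this region.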
Putting it all together

1.  Single front-end cluster
•  Read heavy workload
•  Wide fanout
•  Handling failures

2.  Multiple front-end clusters
•  Controlling data replication
•  Data consistency

3.  Multiple Regions
•  Data consistency

[Diagram: two geo regions, each a front-end cluster (web servers + memcache) over a storage cluster; storage replication flows from the master storage cluster to the replica]
Lessons Learned
•  Push complexity into the client whenever possible
•  Operational efficiency is as important as performance
•  Separating cache and persistent store allows them to be scaled independently
Thanks! Questions?
http://www.facebook.com/careers
