YouTube Architecture

YouTube grew rapidly, serving over 100 million video views per day with just a handful of employees. They used open source tools like Apache, Python, and MySQL on Linux servers. As bottlenecks arose, they addressed them through database sharding, caching, moving popular content to CDNs, and moving thumbnail storage to Google's BigTable. Their strategy emphasized simplicity, prioritizing video delivery, and constantly addressing new bottlenecks through iteration.

Uploaded by

SOWMYA S

YouTube Architecture

Source: http://highscalability.com/youtube-architecture

March 12, 2008

YouTube grew incredibly fast, to over 100 million video views per day, with only a handful
of people responsible for scaling the site. How did they manage to deliver all that video to all
those users? And how have they evolved since being acquired by Google?

Information Sources
1. Google Video

Platform
1. Apache
2. Python
3. Linux (SuSE)
4. MySQL
5. psyco, a dynamic python->C compiler
6. lighttpd for video instead of Apache

What's Inside?
The Stats

1. Supports the delivery of over 100 million videos per day.

2. Founded 2/2005
3. 3/2006 30 million video views/day
4. 7/2006 100 million video views/day
5. 2 sysadmins, 2 scalability software architects
6. 2 feature developers, 2 network engineers, 1 DBA

Recipe for handling rapid growth

while (true)
{
    identify_and_fix_bottlenecks();
    drink();
    sleep();
    notice_new_bottleneck();
}

This loop runs many times a day.

Web Servers

1. NetScaler is used for load balancing and caching static content.
2. Run Apache with mod_fastcgi.
3. Requests are routed for handling by a Python application server.
4. The application server talks to various databases and other information sources to get all
the data and formats the HTML page.
5. Can usually scale the web tier by adding more machines.
6. The Python web code is usually NOT the bottleneck; it spends most of its time
blocked on RPCs.
7. Python allows rapid, flexible development and deployment. This is critical given the
competition they face.
8. Page service times are usually less than 100 ms.
9. Use psyco, a dynamic python->C compiler that uses a JIT compiler approach to
optimize inner loops.
10. For CPU-intensive activities like encryption, they use C extensions.
11. Some pre-generated cached HTML for expensive-to-render blocks.
12. Row-level caching in the database.
13. Fully formed Python objects are cached.
14. Some data are calculated and sent to each application so the values are cached in local
memory. This is an underused strategy. The fastest cache is in your application server
and it doesn't take much time to send precalculated data to all your servers. Just have
an agent that watches for changes, precalculates, and sends.
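
The agent described in item 14 might look roughly like the sketch below. It is only an illustration under stated assumptions: the class names `LocalCache` and `PrecalcAgent`, the `top_two` calculation, and the in-process "send" (a plain method call standing in for a network push) are all hypothetical and do not come from the source.

```python
import threading


class LocalCache:
    """In-memory cache living inside one application server (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def put(self, key, value):
        with self._lock:
            self._values[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._values.get(key, default)


class PrecalcAgent:
    """Watches an input for changes, precalculates a value, and sends it
    to every application server's local cache."""

    def __init__(self, key, compute, caches):
        self._key = key
        self._compute = compute
        self._caches = list(caches)
        self._last_input = None  # nothing seen yet

    def poll(self, current_input):
        # Skip the work if the watched input has not changed.
        if self._last_input is not None and current_input == self._last_input:
            return False
        self._last_input = dict(current_input)  # keep a copy, not a reference
        value = self._compute(current_input)
        for cache in self._caches:  # stand-in for a push over the network
            cache.put(self._key, value)
        return True


def top_two(view_counts):
    """Example precalculation: the two most viewed video ids."""
    return sorted(view_counts, key=view_counts.get, reverse=True)[:2]


# Two hypothetical application servers, each with its own local cache.
server_caches = [LocalCache(), LocalCache()]
agent = PrecalcAgent("top_videos", top_two, server_caches)
agent.poll({"a": 10, "b": 30, "c": 20})
print(server_caches[0].get("top_videos"))
```

Request handlers then read `top_videos` from local memory and never make a remote call for it; the cost is that every server holds a copy that is only as fresh as the agent's last push.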

Video Serving

• Costs include bandwidth, hardware, and power consumption.
• Each video is hosted by a mini-cluster. Each video is served by more than one machine.
• Using a cluster means:
- More disks serving content, which means more speed.
- Headroom. If a machine goes down others can take over.
- There are online backups.
• Servers use the lighttpd web server for video:
- Apache had too much overhead.
- Uses epoll to wait on multiple fds.
- Switched from a single-process to a multi-process configuration to handle more connections.
• Most popular content is moved to a CDN (content delivery network):
- CDNs replicate content in multiple places. There's a better chance of content being closer to
the user, with fewer hops, and content will run over a more friendly network.
- CDN machines mostly serve out of memory because the content is so popular there's little
thrashing of content into and out of memory.
• Less popular content (1-20 views per day) uses YouTube servers in various colo sites.
- There's a long tail effect. A video may have a few plays, but lots of videos are being played.
Random disk blocks are being accessed.
- Caching doesn't do a lot of good in this scenario, so spending money on more cache may
not make sense. This is a very interesting point. If you have a long tail product, caching won't
always be your performance savior.
- Tune the RAID controller and pay attention to other lower-level issues to help.
- Tune memory on each machine so there's not too much and not too little.
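
The split between CDN and colo serving can be pictured with a small routing function. This is a sketch, not YouTube's actual logic: the threshold value, the function name `pick_origin`, the site names, and the use of a CRC32 hash to spread long-tail videos across colo sites are all assumptions made for illustration.

```python
import zlib

# Hypothetical cutoff; the source only says that the most popular content
# goes to the CDN and that 1-20 views per day counts as less popular.
CDN_THRESHOLD_VIEWS_PER_DAY = 1000


def pick_origin(video_id, views_per_day, colo_sites):
    """Return "cdn" for popular videos, otherwise one of the colo sites."""
    if views_per_day >= CDN_THRESHOLD_VIEWS_PER_DAY:
        return "cdn"
    # Long-tail video: any colo will do, so spread the load with a stable hash.
    index = zlib.crc32(video_id.encode("utf-8")) % len(colo_sites)
    return colo_sites[index]


sites = ["colo-1", "colo-2", "colo-3"]  # hypothetical site names
print(pick_origin("popular-clip", 250000, sites))
print(pick_origin("family-video", 3, sites))
```

A stable hash means a given long-tail video is always served from the same place, which keeps whatever small cache benefit exists on one machine instead of spreading it thinly.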

Serving Video Key Points

1. Keep it simple and cheap.
2. Keep a simple network path. Not too many devices between content and users.
Routers, switches, and other appliances may not be able to keep up with so much
load.
3. Use commodity hardware. The more expensive the hardware gets, the more expensive
everything else gets too (support contracts). You are also less likely to find help on the
net.
4. Use simple common tools. They use mostly tools built into Linux and layer on top of
those.
5. Handle random seeks well (SATA, tweaks).

Serving Thumbnails

• Surprisingly difficult to do efficiently.
• There are about 4 thumbnails for each video, so there are a lot more thumbnails than videos.
• Thumbnails are hosted on just a few machines.
• Saw problems associated with serving a lot of small objects:
- Lots of disk seeks and problems with inode caches and page caches at the OS level.
- Ran into the per-directory file limit, Ext3 in particular. Moved to a more hierarchical structure.
Recent improvements in the 2.6 kernel may improve Ext3 large directory handling up to 100
times, yet storing lots of files in a file system is still not a good idea.
- A high number of requests/sec, as web pages can display 60 thumbnails per page.
- Under such high loads Apache performed badly.
- Used squid (reverse proxy) in front of Apache. This worked for a while, but as load
increased performance eventually decreased. Went from 300 requests/second to 20.
- Tried lighttpd, but in single-threaded mode it stalled. Ran into problems with
multi-process mode because each process would keep a separate cache.
- With so many images, setting up a new machine took over 24 hours.
- Rebooting a machine took 6-10 hours for the cache to warm up enough to not go to disk.
• To solve all their problems they started using Google's BigTable, a distributed data store:
- Avoids the small file problem because it clumps files together.
- Fast, fault tolerant. Assumes it's working on an unreliable network.
- Lower latency because it uses a distributed multilevel cache. This cache works across
different colocation sites.
- For more information on BigTable take a look at Google Architecture, GoogleTalk
Architecture, and BigTable.
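
The intermediate step of moving to "a more hierarchical structure" is commonly done by deriving nested directory names from a hash of the object id, so no single directory grows too large. The sketch below shows that general technique; the source does not say how YouTube laid out its directories, and the function name, the file naming scheme, and the two-level, two-character layout are assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def thumbnail_path(root, video_id, index, levels=2, width=2):
    """Build a nested path such as <root>/ab/cd/<video_id>_<index>.jpg.

    The directory names come from a hash of the video id, so files spread
    evenly and each directory stays well below any per-directory limit.
    """
    digest = hashlib.sha256(video_id.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, f"{video_id}_{index}.jpg")


# Hypothetical root directory and video id.
path = thumbnail_path("/thumbs", "abc123", 0)
print(path)
```

With two levels of two hex characters there are 65,536 leaf directories, so even hundreds of millions of thumbnails leave only a few thousand files in each. This eases the directory limit but not the disk seek and cache problems, which is why the small-object store was still needed.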

Databases

1. The Early Years

- Used MySQL to store metadata like users, tags, and descriptions.
- Served data off a monolithic RAID 10 volume with 10 disks.
- Living off credit cards, so they leased hardware. When they needed more hardware
to handle load it took a few days to order and get delivered.
- They went through a common evolution: a single server, then a single master with
multiple read slaves, then a partitioned database, and finally a sharding
approach.
- Suffered from replica lag. The master is multi-threaded and runs on a large machine
so it can handle a lot of work. Slaves are single threaded and usually run on lesser
machines, and replication is asynchronous, so the slaves can lag significantly behind
the master.
- Updates cause cache misses, which go to disk, where slow I/O causes slow
replication.
- Using a replicating architecture you need to spend a lot of money for incremental
bits of write performance.
- One of their solutions was to prioritize traffic by splitting the data into two clusters: a
video watch pool and a general cluster. The idea is that people want to watch video, so
that function should get the most resources. The social networking features of
YouTube are less important, so they can be routed to a less capable cluster.
2. The Later Years
- Went to database partitioning.
- Split into shards with users assigned to different shards.
- Spreads writes and reads.
- Much better cache locality, which means less I/O.
- Resulted in a 30% hardware reduction.
- Reduced replica lag to 0.
- Can now scale the database almost arbitrarily.
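
Assigning users to shards can be sketched as a small router that maps a user id to one database. The source does not describe the actual assignment scheme, so the modulo rule, the `ShardRouter` name, and the shard names below are assumptions chosen for brevity.

```python
class ShardRouter:
    """Maps each user to exactly one shard (hypothetical modulo scheme)."""

    def __init__(self, shard_names):
        if not shard_names:
            raise ValueError("at least one shard is required")
        self._shards = list(shard_names)

    def shard_for(self, user_id):
        # All reads and writes for one user land on the same shard,
        # which is what gives the better cache locality.
        return self._shards[user_id % len(self._shards)]


router = ShardRouter(["db0", "db1", "db2"])  # hypothetical shard names
print(router.shard_for(7))  # 7 % 3 == 1, so this prints db1
```

A modulo rule is the simplest option but forces most users to move when a shard is added; a lookup table from user to shard is a common alternative when shards need to be rebalanced.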

Data Center Strategy

1. Used managed hosting providers at first. Living off credit cards, so it was the only way.
2. Managed hosting can't scale with you. You can't control hardware or make favorable
networking agreements.
3. So they went to a colocation arrangement. Now they can customize everything and
negotiate their own contracts.
4. Use 5 or 6 data centers plus the CDN.
5. Videos come out of any data center. Not closest match or anything. If a video is
popular enough it will move into the CDN.
6. Video is bandwidth dependent, not really latency dependent. It can come from any colo.
7. For images latency matters, especially when you have 60 images on a page.
8. Images are replicated to different data centers using BigTable. Code
looks at different metrics to know who is closest.
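
Choosing the closest replica for images might be sketched as below. The source only says the code "looks at different metrics"; using a single measured latency per site, the function name `closest_site`, and the site codes are assumptions for illustration.

```python
def closest_site(latency_ms_by_site):
    """Return the reachable site with the lowest measured latency.

    A latency of None marks a site that could not be measured.
    """
    reachable = {
        site: ms for site, ms in latency_ms_by_site.items() if ms is not None
    }
    if not reachable:
        raise ValueError("no reachable data center")
    return min(reachable, key=reachable.get)


# Hypothetical measurements from one client's point of view.
measurements = {"sjc": 40.0, "iad": 12.5, "ams": None}
print(closest_site(measurements))  # prints iad
```

Because images are replicated to every data center, any site can answer, so the choice can be made purely on proximity. Video, being bandwidth bound, does not need this step.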

Lessons Learned
1. Stall for time. Creative and risky tricks can help you cope in the short term while you
work out longer term solutions.
2. Prioritize. Know what's essential to your service and prioritize your resources and
efforts around those priorities.
3. Pick your battles. Don't be afraid to outsource some essential services. YouTube
uses a CDN to distribute their most popular content. Creating their own network
would have taken too long and cost too much. You may have similar opportunities in
your system. Take a look at Software as a Service for more ideas.
4. Keep it simple! Simplicity allows you to rearchitect more quickly so you can respond
to problems. It's true that nobody really knows what simplicity is, but if you aren't
afraid to make changes then that's a good sign simplicity is happening.
5. Shard. Sharding helps to isolate and constrain storage, CPU, memory, and I/O. It's not
just about getting more write performance.
6. Constant iteration on bottlenecks:
- Software: DB, caching
- OS: disk I/O
- Hardware: memory, RAID
7. You succeed as a team. Have a good cross discipline team that understands the
whole system and what's underneath the system. People who can set up printers,
machines, install networks, and so on. With a good team all things are possible.

*****************************************************************************

Exercise activities & answers: (Time 30 min)

1. Identify the key quality attributes & ASRs (architecturally significant requirements)
o Performance
- Support 100 million video views per day
- High volume of thumbnails – approx. 4 per video and
o Availability
2. Identify the key tactics used to satisfy the ASRs
- Load balancing
- Caching static content
- Row-level caching in the database
- CDN
- Not too many devices between content & users (routers, switches, etc.)
- BigTable for handling high-volume queries (thumbnails)
- Sharding
3. Describe the problem of Serving Thumbnails and how it was resolved
- High volume of thumbnails – 4 per video
- High volume of requests for thumbnails – because there are 60 thumbnails on a web
page
- Lots of disk seeks
- Ran into the limit on the number of files per directory

Solution
Used Google's BigTable – a distributed data store
- It clumps files together
- Fast & fault tolerant
- Distributed multi-level cache across colocation sites
