The Google File System

The Google File System (GFS) is a scalable distributed file system designed by Google to handle large files across many servers. GFS uses a single master server to manage metadata and chunkservers to store file data, which is split into large 64 MB chunks replicated across multiple machines. The design treats component failures as the norm, optimizes for appends to large files, and provides fault tolerance through replication and fast recovery.

Uploaded by

Himanshu Patel

The Google File System

Reporter: You-Wei Zhang

Abstract
We have designed and implemented
the Google File System, a scalable
distributed file system for large
distributed data-intensive
applications.

Outline

Introduction - Design Point
Design Overview
System Interactions
Master Operation
Fault Tolerance
Conclusion

Introduction - Design Point

Traditional Issues
Performance
Scalability
Reliability
Availability

Different Points in GFS

Component failures are the norm rather than the
exception
Files are huge; multi-GB files are common
Most files are mutated by appending new data rather
than overwriting existing data

2. Design Overview

Interface
Organized hierarchically in directories
and identified by path names
Usual operations
create, delete, open, close, read, and write

Moreover
Snapshot: makes a copy of a file or a directory tree
Record append: allows multiple clients to append data
to the same file concurrently

2. Design Overview

Architecture (1)

Single master
Multiple chunkservers
Multiple clients

2. Design Overview

Architecture (2)

Chunk
Files are divided into fixed-size chunks
Each chunk is identified by a 64-bit chunk handle
Chunkservers store chunks on local disks as Linux files
Chunks are replicated for reliability (3 replicas by default)

Master
Maintains all file system metadata
Namespace, access control information, file-to-chunk mapping, chunk locations

Manages chunk leases, garbage collection, and chunk migration

Periodically communicates with chunkservers (HeartBeat
messages)

2. Design Overview

Single Master
Simple read procedure

2. Design Overview

Chunk Size
64MB
Much larger than typical file system block sizes

Advantages of a large chunk size

Reduces interaction between client and master
A client can perform many operations on a given
chunk
Reduces network overhead by keeping a persistent TCP
connection to the chunkserver

Reduces the size of the metadata stored on the master

The metadata can reside in memory
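
The effect of the 64 MB chunk size on metadata volume can be checked with a little arithmetic. The following is a minimal sketch, not GFS code: the function names and the 4 KB comparison block size are assumptions chosen for illustration.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as stated on the slide


def chunk_index(byte_offset: int) -> int:
    """Index of the chunk that contains byte_offset within a file."""
    return byte_offset // CHUNK_SIZE


def chunks_touched(byte_offset: int, length: int) -> range:
    """Chunk indices covered by `length` bytes starting at `byte_offset`."""
    if length <= 0:
        return range(0)
    first = chunk_index(byte_offset)
    last = chunk_index(byte_offset + length - 1)
    return range(first, last + 1)


def units_needed(file_size: int, unit_size: int = CHUNK_SIZE) -> int:
    """Number of fixed-size units needed to hold file_size bytes (ceiling)."""
    return -(-file_size // unit_size)


if __name__ == "__main__":
    ten_gib = 10 * 1024 ** 3
    # Entries the master must track with 64 MB chunks vs. hypothetical 4 KB blocks.
    print(units_needed(ten_gib), units_needed(ten_gib, 4 * 1024))
```

For a 10 GiB file this gives 160 chunk entries, against 2,621,440 entries at a 4 KB block size, which is why the metadata can plausibly stay in memory.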

2. Design Overview

Metadata (1)
The master stores three major types of metadata
File and chunk namespaces

Mapping from files to chunks

Locations of each chunk's replicas

In-memory data structures

Metadata is stored in the master's memory
Periodically scanning the entire state is easy and
efficient

2. Design Overview

Metadata (2)
Chunk locations
The master does not keep a persistent record of chunk locations
Instead, it polls chunkservers at startup and periodically
thereafter (HeartBeat messages)
Because chunkservers fail and restart, a persistent record of chunk
locations would be hard to keep in sync

Operation log
The master maintains a historical record of critical metadata changes
Namespace and file-to-chunk mapping
For reliability and consistency, the operation log is replicated on
multiple remote machines
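
The idea of rebuilding chunk locations from chunkserver reports instead of persisting them can be sketched as follows. This is an illustrative model only: the class name, method names, and the shape of a report (a server name plus the chunk handles it holds) are assumptions, not the GFS wire format.

```python
class ChunkLocationTable:
    """In-memory chunk locations, rebuilt purely from chunkserver reports."""

    def __init__(self):
        # server name -> set of chunk handles it last reported
        self._by_server = {}

    def heartbeat(self, server, handles):
        """Replace whatever was known about `server` with its latest report."""
        self._by_server[server] = set(handles)

    def server_lost(self, server):
        """Forget a chunkserver that stopped answering."""
        self._by_server.pop(server, None)

    def locations(self, handle):
        """Servers currently known to hold a replica of `handle`."""
        return sorted(s for s, hs in self._by_server.items() if handle in hs)


if __name__ == "__main__":
    table = ChunkLocationTable()
    table.heartbeat("cs1", [1, 2])
    table.heartbeat("cs2", [2])
    print(table.locations(2))
    table.server_lost("cs1")
    print(table.locations(2))
```

Because the table is derived entirely from reports, a restarted master only needs to wait for the next round of reports; nothing about locations has to be recovered from disk.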

3. System Interactions

Leases and Mutation Order

Leases maintain a consistent mutation order across replicas
The master grants a chunk lease to one of the replicas -> the primary
The primary picks a serial order for all mutations
Other replicas follow the primary's order
Leases minimize management overhead at the master
Pipelining is used to fully utilize network bandwidth
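
The way a primary imposes one mutation order on all replicas can be sketched as below. It is a simplified model under stated assumptions: the `Replica` and `Primary` classes and their methods are hypothetical, and networking, lease expiry, and failures are left out.

```python
class Replica:
    """Applies mutations strictly in serial-number order."""

    def __init__(self):
        self.applied = []    # mutations applied so far, in order
        self._pending = {}   # serial number -> mutation that arrived early
        self._next = 0       # next serial number this replica may apply

    def receive(self, serial, mutation):
        self._pending[serial] = mutation
        # Apply every mutation that is now contiguous with what was applied.
        while self._next in self._pending:
            self.applied.append(self._pending.pop(self._next))
            self._next += 1


class Primary(Replica):
    """The replica holding the lease; it alone assigns serial numbers."""

    def __init__(self):
        super().__init__()
        self._serial = 0

    def assign(self, mutation):
        serial = self._serial
        self._serial += 1
        self.receive(serial, mutation)  # the primary applies locally first
        return serial


if __name__ == "__main__":
    primary, secondary = Primary(), Replica()
    a = primary.assign("write A")
    b = primary.assign("write B")
    secondary.receive(b, "write B")  # arrives out of order, so it is buffered
    secondary.receive(a, "write A")
    print(secondary.applied == primary.applied)
```

Since only the lease holder hands out serial numbers, the master does not take part in individual mutations.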

3. System Interactions

Atomic Record Appends

GFS provides an atomic append operation called record append
Record append is heavily used
Without it, clients would need complicated and expensive
synchronization
The primary checks whether the append would exceed the max chunk
size
If so, the primary pads the chunk to the max chunk size
Secondaries do the same
The primary replies to the client that the operation
should be retried on the next chunk
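
The primary's pad-or-append decision can be written down as a small function. This is a sketch under assumptions: the function name, the status strings, and the tuple it returns are made up for illustration, and only the decision logic from the slide is modeled.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # max chunk size, 64 MB


def record_append(chunk_used, record_len, chunk_size=CHUNK_SIZE):
    """Decide what the primary does with an append request.

    chunk_used is the number of bytes already in the current chunk.
    Returns (status, offset, new_chunk_used).
    """
    if record_len > chunk_size:
        raise ValueError("record does not fit in a single chunk")
    if chunk_used + record_len > chunk_size:
        # Pad the chunk to its max size; the client retries on the next chunk.
        return ("retry_next_chunk", None, chunk_size)
    # The record fits: it lands at the current end of the chunk.
    return ("ok", chunk_used, chunk_used + record_len)


if __name__ == "__main__":
    print(record_append(0, 30, chunk_size=100))
    print(record_append(90, 30, chunk_size=100))
```

The primary, not the client, chooses the offset, so concurrent appenders do not need to coordinate with each other.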

3. System Interactions

Snapshot
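Makes a copy of a file or a directory
tree
The master revokes outstanding leases on the chunks of that file
The master duplicates the metadata; the copy points to the same chunks
On the first write to a chunk after the snapshot
operation
Each chunkserver holding a replica creates a new chunk
Data is copied locally, not over the network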
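
The copy-on-write behaviour behind snapshot can be sketched with reference counts on chunk handles. This is an illustrative model, not GFS code: the `Master` class here, its fields, and the integer handles are assumptions, and lease revocation and the actual data copy on chunkservers are not modeled.

```python
class Master:
    """Toy master that tracks only file -> chunk handles and reference counts."""

    def __init__(self):
        self.files = {}        # path -> list of chunk handles
        self.refcount = {}     # chunk handle -> number of files pointing at it
        self._next_handle = 0

    def _new_handle(self):
        handle = self._next_handle
        self._next_handle += 1
        self.refcount[handle] = 1
        return handle

    def create(self, path, n_chunks):
        self.files[path] = [self._new_handle() for _ in range(n_chunks)]

    def snapshot(self, src, dst):
        """Duplicate metadata only; both files now share the same chunks."""
        self.files[dst] = list(self.files[src])
        for handle in self.files[src]:
            self.refcount[handle] += 1

    def chunk_for_write(self, path, index):
        """Return the handle to write to, splitting a shared chunk first."""
        handle = self.files[path][index]
        if self.refcount[handle] > 1:
            # Shared with a snapshot: this file gets its own new chunk.
            self.refcount[handle] -= 1
            handle = self._new_handle()
            self.files[path][index] = handle
        return handle


if __name__ == "__main__":
    m = Master()
    m.create("/a", 2)
    m.snapshot("/a", "/b")
    m.chunk_for_write("/a", 0)
    print(m.files)
```

The snapshot itself touches only metadata; chunk data is duplicated lazily, one chunk at a time, when a write arrives.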

4. Master Operation

Namespace Management and Locking

The GFS master maintains a table that
maps full pathnames to metadata
Each node in the namespace has an
associated read-write lock
Concurrent operations are
properly serialized by this locking
mechanism
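
One way to picture the per-node read-write locks is to list which locks an operation on a path would take: read locks on every ancestor directory and a read or write lock on the full pathname. The sketch below makes that scheme explicit; the function names and the tuple representation are hypothetical.

```python
def locks_needed(path, write):
    """Locks for an operation on `path`: read on ancestors, read/write on the leaf."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        raise ValueError("path must name a file or directory below the root")
    prefixes = ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    locks = [(prefix, "read") for prefix in prefixes[:-1]]
    locks.append((prefixes[-1], "write" if write else "read"))
    return locks


def conflicts(a, b):
    """Two lock sets conflict if they share a node and either side writes it."""
    held = dict(a)
    return any(p in held and "write" in (mode, held[p]) for p, mode in b)


if __name__ == "__main__":
    snapshot_src = locks_needed("/home/user", write=True)
    create_file = locks_needed("/home/user/foo", write=True)
    print(conflicts(snapshot_src, create_file))
```

Two creations in the same directory only take read locks on that directory, so they can proceed concurrently, while an operation that write-locks the directory serializes against both.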

4. Master Operation

Replica Placement
GFS places replicas across different
racks for reliability and availability
Reads can exploit the aggregate
bandwidth of multiple racks, but write
traffic has to flow through multiple
racks
-> a tradeoff

4. Master Operation

Creation, Re-replication, Rebalancing

Create
Equalize disk space utilization
Limit recent creations on each chunkserver
Spread replicas across racks

Re-replication
Happens when the number of replicas falls below the goal, for example
when a chunkserver becomes unavailable

Rebalancing
Periodically move replicas for better disk space
usage and load balancing
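
Two of the creation criteria above, low disk utilization and spreading across racks, can be combined in a simple greedy selection. This is only a sketch of one possible policy: the function name, the tuple layout, and the two-pass strategy are assumptions, and the limit on recent creations is not modeled.

```python
def place_replicas(servers, n=3):
    """Pick n chunkservers for a new chunk.

    servers is a list of (name, rack, disk_utilization) tuples.
    Prefers the least-utilized server on each rack, then fills any
    remaining slots by utilization alone.
    """
    ranked = sorted(servers, key=lambda s: s[2])
    chosen, racks = [], set()
    for name, rack, _ in ranked:      # first pass: at most one server per rack
        if len(chosen) < n and rack not in racks:
            chosen.append(name)
            racks.add(rack)
    for name, _, _ in ranked:         # second pass: not enough racks, fill up
        if len(chosen) < n and name not in chosen:
            chosen.append(name)
    return chosen


if __name__ == "__main__":
    servers = [("a", "r1", 0.2), ("b", "r1", 0.1), ("c", "r2", 0.5), ("d", "r3", 0.9)]
    print(place_replicas(servers))
```

In the example, server "a" is emptier than "c" and "d" but shares a rack with "b", so rack diversity wins over disk utilization.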

4. Master Operation

Garbage Collection
On deletion, the master just logs the deletion and renames the file to a
hidden name that includes the deletion timestamp
During the master's regular scan, hidden files are removed only if
they have existed for more than 3 days (for example)
Until then, the file can still be read under the new name and
undeleted by renaming it back to the original
name
The master periodically identifies orphaned chunks and erases
them
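
The lazy deletion scheme can be sketched as a rename plus a periodic scan. This is a toy model under assumptions: the `.deleted.<timestamp>` naming, the `Namespace` class, and its methods are invented for illustration, and the 3-day grace period is the example value from the slide.

```python
GRACE_SECONDS = 3 * 24 * 3600  # the "3 days (for example)" from the slide
MARKER = ".deleted."           # hypothetical hidden-name convention


class Namespace:
    def __init__(self):
        self.files = {}  # name -> list of chunk handles

    def delete(self, path, now):
        """Rename to a hidden name carrying the deletion time; keep the chunks."""
        hidden = f"{path}{MARKER}{int(now)}"
        self.files[hidden] = self.files.pop(path)
        return hidden

    def undelete(self, hidden):
        """Rename a hidden file back to its original name."""
        original = hidden.rsplit(MARKER, 1)[0]
        self.files[original] = self.files.pop(hidden)
        return original

    def scan(self, now):
        """Regular scan: drop hidden files older than the grace period."""
        for name in list(self.files):
            if MARKER in name:
                deleted_at = int(name.rsplit(MARKER, 1)[1])
                if now - deleted_at > GRACE_SECONDS:
                    del self.files[name]

    def orphaned(self, chunks_on_servers):
        """Chunks that chunkservers hold but no file references any more."""
        live = {h for handles in self.files.values() for h in handles}
        return set(chunks_on_servers) - live


if __name__ == "__main__":
    ns = Namespace()
    ns.files["/logs/a"] = [1, 2]
    hidden = ns.delete("/logs/a", now=1000)
    ns.scan(now=1000 + GRACE_SECONDS + 1)
    print(ns.orphaned([1, 2, 3]))
```

Storage is reclaimed in the background, and an accidental delete stays reversible for the whole grace period.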

4. Master Operation

Stale Replica Detection

A chunkserver misses mutations to a
chunk while it is down
The master maintains a chunk version number
to distinguish stale replicas from up-to-date
ones
The version number is increased whenever the master
grants a new lease on the chunk
The master periodically removes stale replicas
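
The version-number check can be sketched as follows. This is a minimal model under assumptions: the `ChunkVersions` class, its dictionaries, and the server names are hypothetical, and a lease grant is reduced to bumping the version on the master and on every replica that is reachable at that moment.

```python
class ChunkVersions:
    def __init__(self):
        self.master_version = {}   # chunk handle -> latest version
        self.replica_version = {}  # (server, chunk handle) -> version held

    def grant_lease(self, handle, live_servers):
        """Bump the version and record it on every reachable replica."""
        version = self.master_version.get(handle, 0) + 1
        self.master_version[handle] = version
        for server in live_servers:
            self.replica_version[(server, handle)] = version
        return version

    def stale_replicas(self, handle):
        """Servers whose copy of `handle` is behind the master's version."""
        latest = self.master_version.get(handle, 0)
        return sorted(
            server
            for (server, h), version in self.replica_version.items()
            if h == handle and version < latest
        )


if __name__ == "__main__":
    cv = ChunkVersions()
    cv.grant_lease(7, ["cs1", "cs2", "cs3"])
    cv.grant_lease(7, ["cs1", "cs2"])  # cs3 is down and misses the bump
    print(cv.stale_replicas(7))
```

A replica that was down during a lease grant comes back with an old version number, which is enough for the master to recognise it as stale.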

5. Fault Tolerance
Fast Recovery
The master and chunkservers are designed to restore their state
and restart in seconds

Chunk Replication
Each chunk is replicated on multiple chunkservers on
different racks
Users can change the replication factor to match their
reliability needs

Master Replication
Operation log
Historical record of critical metadata changes

Operation log is replicated on multiple machines

6. Conclusion
GFS is a distributed file system that supports large-scale data processing
workloads on commodity hardware
GFS occupies a different point in the design space
Component failures as the norm
Optimized for huge files
GFS provides fault tolerance
Replicating data
Fast and automatic recovery
Chunk replication
GFS has a simple, centralized master that does not become a bottleneck
GFS is a successful file system
An important tool that enables Google to continue to innovate on its
ideas
