The Google File System
Abstract
We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications.
Outline
2. Design Overview
Interface
Files are organized hierarchically in directories and identified by path names
Usual operations
create, delete, open, close, read, and write
Additional operations
Snapshot: copies a file or a directory tree
Record append: lets multiple clients append data to the same file concurrently
Architecture (1)
Single master
Multiple chunkservers
Multiple clients
Architecture (2)
Chunk
Master
Maintains all file system metadata
Namespace, access control information, file-to-chunk mapping, and chunk locations
Single Master
Simple read procedure
Chunk Size
64MB
Much larger than typical file system block sizes
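Because the chunk size is fixed, a client can turn a byte offset within a file into a chunk index on its own, before asking the master where that chunk lives. The following is a minimal sketch of that arithmetic, assuming 64MB means 64 * 2^20 bytes; the function names `locate` and `chunks_spanned` are hypothetical and not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64MB, assuming binary megabytes


def locate(offset):
    """Map a byte offset in a file to (chunk index, offset inside that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


def chunks_spanned(offset, length):
    """Chunk indices touched by a read of `length` bytes starting at `offset`."""
    if length <= 0:
        return range(0)
    first, _ = locate(offset)
    last, _ = locate(offset + length - 1)
    return range(first, last + 1)
```

A read that crosses a chunk boundary touches two chunks, so the client would need location information for both.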
Metadata (1)
The master stores three major types of metadata
File and chunk namespaces
Mapping from files to chunks
Locations of each chunk's replicas
Metadata (2)
Chunk locations
The master does not keep a persistent record of chunk locations
Instead, it simply polls chunkservers at startup and periodically thereafter (heartbeat messages)
Because chunkservers fail and restart, a persistent record of chunk locations would be hard to keep accurate
Operation log
The master maintains a historical record of critical metadata changes
Namespace and mapping
For reliability and consistency, the operation log is replicated on multiple remote machines
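The chunk-location scheme above can be illustrated with a small in-memory table that is rebuilt purely from what chunkservers report. This is a sketch under stated assumptions, not the actual master implementation: the class `Master`, its method names, and the string server ids are all hypothetical.

```python
from collections import defaultdict


class Master:
    """Toy master that keeps chunk locations only in memory."""

    def __init__(self):
        # chunk handle -> set of chunkserver ids currently reporting that chunk
        self.locations = defaultdict(set)

    def heartbeat(self, server_id, chunk_handles):
        """Replace whatever the master believed about `server_id` with its report."""
        self.forget(server_id)
        for handle in chunk_handles:
            self.locations[handle].add(server_id)

    def forget(self, server_id):
        """Drop a chunkserver, e.g. after it stops answering heartbeats."""
        for handle in list(self.locations):
            self.locations[handle].discard(server_id)
            if not self.locations[handle]:
                del self.locations[handle]
```

Since each report overwrites the previous one, a chunkserver that loses a disk or restarts with fewer chunks corrects the master's view on its next heartbeat, with nothing persistent to reconcile.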
3. System Interactions
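Snapshot
Makes a copy of a file or a directory tree
The master revokes outstanding leases on the chunks of that file
The master duplicates the metadata, so the copy points to the same chunks
On the first write to a chunk after the snapshot operation
Every chunkserver holding a replica creates a new chunk
Data can be copied locally rather than over the network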
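The snapshot steps above amount to copy-on-write at chunk granularity. The sketch below models only the master-side bookkeeping, assuming a per-chunk reference count is used to detect shared chunks; the class `SnapshotNamespace`, its methods, and the integer handles are hypothetical, and the actual chunk copy on the chunkservers is represented only by allocating a new handle.

```python
import itertools


class SnapshotNamespace:
    """Toy master metadata: file name -> list of chunk handles, plus ref counts."""

    def __init__(self):
        self.files = {}
        self.refcount = {}
        self._next = itertools.count(1)

    def create(self, name, num_chunks):
        handles = [next(self._next) for _ in range(num_chunks)]
        self.files[name] = handles
        for h in handles:
            self.refcount[h] = 1

    def snapshot(self, src, dst):
        """Duplicate metadata only; both files now share the same chunks."""
        self.files[dst] = list(self.files[src])
        for h in self.files[dst]:
            self.refcount[h] += 1

    def prepare_write(self, name, index):
        """Return the handle to write to, copying the chunk first if it is shared."""
        old = self.files[name][index]
        if self.refcount[old] == 1:
            return old
        new = next(self._next)  # stands in for the local copy on each chunkserver
        self.refcount[old] -= 1
        self.refcount[new] = 1
        self.files[name][index] = new
        return new
```

The snapshot itself touches no chunk data; the cost of copying is deferred to the first write and paid only for the chunks that are actually modified.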
4. Master Operation
Replica Placement
GFS places replicas across different racks for reliability and availability
Reads can exploit the aggregate bandwidth of multiple racks, but write traffic has to flow through multiple racks
-> a tradeoff is needed
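One simple way to spread replicas across racks is to pick one chunkserver per rack before reusing any rack. The sketch below illustrates only that idea; it is not the placement policy GFS actually uses, which also weighs factors such as disk utilization. The function `place_replicas`, the rack ids, and the default of three replicas are assumptions made for the example.

```python
def place_replicas(servers, n=3):
    """Pick `n` chunkservers, spreading across racks before reusing a rack.

    `servers` maps a chunkserver id to its rack id.
    """
    by_rack = {}
    for server, rack in sorted(servers.items()):
        by_rack.setdefault(rack, []).append(server)

    chosen = []
    # Round-robin over racks: take at most one server per rack per pass.
    while len(chosen) < n and any(by_rack.values()):
        for rack in sorted(by_rack):
            if by_rack[rack] and len(chosen) < n:
                chosen.append(by_rack[rack].pop(0))
    return chosen
```

With replicas on distinct racks, losing a whole rack leaves the chunk readable, at the cost of every write crossing rack boundaries.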
Creation, Re-replication, Rebalancing
Creation
Re-replication
Happens when a chunkserver becomes unavailable
Rebalancing
The master periodically rebalances replicas for better disk space usage and load balancing
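Re-replication can be pictured as a scan for chunks whose live replica count has dropped below a goal. The sketch below assumes a goal of three replicas and orders repairs by how many replicas are missing; the function names and the data layout are hypothetical.

```python
def under_replicated(locations, live_servers, goal=3):
    """Return {chunk handle: missing replica count} for chunks below `goal`.

    `locations` maps a chunk handle to the set of chunkservers holding it;
    replicas on servers not in `live_servers` do not count.
    """
    missing = {}
    for handle, holders in locations.items():
        live = len(holders & live_servers)
        if live < goal:
            missing[handle] = goal - live
    return missing


def repair_order(missing):
    """Chunks furthest from the goal first (ties broken by handle)."""
    return sorted(missing, key=lambda h: (-missing[h], h))
```

For example, if chunkserver `cs1` goes down, every chunk it held shows up in `under_replicated`, and chunks that were already short of the goal are repaired first.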
Garbage Collection
The master just logs the deletion and renames the file to a hidden name that includes the deletion timestamp
During the master's regular scan, a hidden file is not removed if its timestamp falls within the last 3 days (for example)
Until it is removed, the file can still be read under the new name and undeleted by renaming it back to the original name
The master periodically checks for orphaned chunks and erases them
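The lazy deletion scheme above can be sketched as a rename followed by a periodic scan. This is an illustration under stated assumptions: the hidden-name format (`.deleted.<timestamp>.<path>`), the class `LazyNamespace`, and passing the current time in as a plain number of seconds are all hypothetical choices, and the 3-day grace period is the example value from the text.

```python
GRACE_SECONDS = 3 * 24 * 3600  # three days, the example interval from the text
HIDDEN_PREFIX = ".deleted."    # hypothetical naming scheme


class LazyNamespace:
    def __init__(self):
        self.files = {}  # path -> list of chunk handles

    def delete(self, path, now):
        """Rename to a hidden name that records the deletion time."""
        hidden = f"{HIDDEN_PREFIX}{int(now)}.{path}"
        self.files[hidden] = self.files.pop(path)
        return hidden

    def undelete(self, hidden):
        """Rename a hidden file back to its original path."""
        _, path = hidden[len(HIDDEN_PREFIX):].split(".", 1)
        self.files[path] = self.files.pop(hidden)
        return path

    def scan(self, now):
        """Remove hidden files older than the grace period; return their chunks."""
        orphaned = []
        for name in list(self.files):
            if not name.startswith(HIDDEN_PREFIX):
                continue
            stamp = int(name[len(HIDDEN_PREFIX):].split(".", 1)[0])
            if now - stamp > GRACE_SECONDS:
                orphaned.extend(self.files.pop(name))
        return orphaned
```

The chunk handles returned by `scan` no longer belong to any file; these are the orphaned chunks that a later pass can erase from the chunkservers.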
5. Fault Tolerance
Fast Recovery
The master and chunkservers are designed to restore their state and restart in seconds
Chunk Replication
Each chunk is replicated on multiple chunkservers on different racks
Users can change the replication factor to match their reliability needs
Master Replication
The operation log, the historical record of critical metadata changes, is replicated on multiple machines
6. Conclusion
GFS is a distributed file system that supports large-scale data processing workloads on commodity hardware
GFS explores different points in the design space
Component failures are treated as the norm
Optimized for huge files
GFS provides fault tolerance
Replicating data
Fast and automatic recovery
Chunk replication
GFS has a simple, centralized master that does not become a bottleneck
GFS is a successful file system
An important tool that enables Google to continue to innovate on its ideas