0% found this document useful (0 votes)
79 views

Google File System

Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
79 views

Google File System

Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 48

Google file System

Google file System


Google file system (GFS)
Google File System, a scalable distributed file system for
large distributed data-intensive applications.
Google File System (GFS) to meet the rapidly growing
demands of Google’s data processing needs.
GFS shares many of the same goals as other distributed file
systems such as performance, scalability, reliability, and
availability.
GFS provides a familiar file system interface.
Files are organized hierarchically in directories and
identified by pathnames.
Support the usual operations to create, delete, open, close,
GFS
Small as well as multi-GB files are common.
Each file typically contains many application objects
such as web documents.
GFS provides an atomic append operation called
record append. In a traditional write, the client specifies
the offset at which data is to be written.
Concurrent writes to the same region are not
serializable.
GFS has snapshot and record append operations.
GFS (snapshot and record append)
The snapshot operation makes a copy of a file or a
directory.
Record append allows multiple clients to append
data to the same file concurrently while
guaranteeing the atomicity of each individual
client’s append.
It is useful for implementing multi-way merge
results.
GFS consist of two kinds of reads: large streaming
reads and small random reads.
In large streaming reads, individual operations
Common Goals of GFS
and most Distributed File Systems
Performance
Reliability
Scalability
Availability
Other GFS Concepts
Component failures are the norm rather than
the exception.
File System consists of hundreds or even thousands of
storage machines built from inexpensive commodity
parts.

Files are Huge. Multi-GB Files are common.


Each file typically contains many application objects
such as web documents.
Append, Append, Append.
Most files are mutated by appending new data rather
than overwriting
Other GFS Concepts
Why assume hardware failure is the norm?
It is cheaper to assume common failure on poor
hardware and account for it, rather than invest in
expensive hardware and still experience occasional
failure.
The amount of layers in a distributed system (network,
disk, memory, physical connections, power, OS,
application) mean failure on any could contribute to
data corruption.
GFS Interface
GFS – familiar file system interface
Files organized hierarchically in directories,
path names
Create, delete, open, close, read, write
operations
Snapshot and record append (allows multiple
clients to append simultaneously - atomic)
GFS Architecture
GFS Architecture
GFS Architecture
Chunk
GFS Architecture
A GFS cluster consists of a single master and multiple
chunk-servers and is accessed by multiple clients, Each
of these is typically a commodity Linux machine.

It is easy to run both a chunk-server and a client on the


same machine.

As long as machine resources permit, it is possible to


run flaky application code is acceptable.
GFS Architecture
Files are divided into fixed-size chunks.
Each chunk is identified by an immutable and
globally unique 64 bit chunk assigned by the
master at the time of chunk creation.
Chunk-servers store chunks on local disks as
Linux files, each chunk is replicated on multiple
chunk-servers.
The master maintains all file system metadata.
This includes the namespace, access control
GFS is fault tolerance?
Consistency
Consistency
Write control and data flow
Replica placement
Replica placement
Garbage Collection

You might also like