
Google File System (GFS)

Google File System is essentially distributed file storage. Any given GFS cluster can contain hundreds or thousands of commodity servers, and the cluster provides an interface through which any number of clients can read or write files.

Conceptually, it is a file system distributed over hundreds or thousands of servers.

Design Considerations

This architecture is built on commodity hardware. Instead of buying expensive specialized hardware, Google chose off-the-shelf commodity servers because they are cheap, and because with a large number of such servers they could scale horizontally, provided the right software layer was created on top of them.

The first design consideration that follows from using commodity hardware is that commodity hardware fails all the time: there can be disk failures, network failures, server crashes, OS bugs, and human errors. All of this means that in a cluster of thousands of servers, at any point in time one or more servers will be down. Google File System therefore has to be designed so that it still performs reliably, in a fault-tolerant manner, in the face of these constant failures.

The second design consideration is that Google File System is optimized to store and read large files. A typical file in GFS ranges from 100 MB to multiple GBs.

The third important design consideration is that GFS is optimized for two kinds of file operations:
1) Writes to any file are generally append-only; there are hardly any random writes to a file.
2) Reads are mostly sequential.
For example, a crawler could keep appending all the crawled HTML documents to a single file, and a batch processing system could then read that entire large file and create a search index out of it.
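A minimal sketch of this append-then-scan workload, using a plain local file and a hypothetical length-prefixed record format in place of real GFS client calls:

```python
import struct

DATA_FILE = "crawl_pages.rec"  # hypothetical file standing in for a GFS file

def append_record(path: str, payload: bytes) -> None:
    """Append-only write: records are only ever added at the end, never rewritten."""
    with open(path, "ab") as f:
        f.write(struct.pack(">I", len(payload)) + payload)

def read_all_records(path: str):
    """Sequential read: scan the whole file front to back, record by record."""
    with open(path, "rb") as f:
        while header := f.read(4):
            (length,) = struct.unpack(">I", header)
            yield f.read(length)

# A crawler appends pages; a batch job later streams all of them in order.
append_record(DATA_FILE, b"<html>page 1</html>")
append_record(DATA_FILE, b"<html>page 2</html>")
for page in read_all_records(DATA_FILE):
    print(len(page), "bytes")
```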

A typical file ranges from 100 MB to multiple GBs, and a single file is not stored on a single server. It is subdivided into multiple chunks, each 64 MB in size, and these chunks are spread across multiple machines, the thousands of commodity servers in the cluster. These machines are also called chunkservers. A server does not store an entire file, but chunks of particular files, and each chunk is identified by a globally unique 64-bit ID.
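As a small illustration of this chunk layout, here is how a byte offset maps to a chunk index under the fixed 64 MB chunk size (the helper names are illustrative, not part of any GFS API):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the fixed GFS chunk size

def chunk_index(byte_offset: int) -> int:
    """Which chunk of a file a given byte offset falls into."""
    return byte_offset // CHUNK_SIZE

def chunk_count(file_size: int) -> int:
    """How many chunks a file of the given size occupies."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE

print(chunk_count(200 * 1024 * 1024))  # a 200 MB file spans 4 chunks
print(chunk_index(150_000_000))        # byte 150,000,000 lies in chunk 2
```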
For example, a file with 4 chunks may have each of those chunks residing on 4 different servers. Since these chunks are stored on commodity hardware, which can go down at any time, storing only one copy of the file makes it highly likely that you lose the file.
Google File System therefore ensures that each chunk of a file has at least three replicas across 3 different servers, so even if one server goes down, there are two other replicas to work with. The default replica count is 3, and it is configurable by the client writing the file: while creating the file, or even afterwards, the client can specify that the file should have at least 5 replicas.

It is difficult for a client application to identify which chunkserver holds which chunk, so instead of storing this information within the chunkservers or within client applications, there is a separate component called the GFS master. The master server holds all the metadata: the names of all files in that particular cluster, how many chunks each file has, the IDs of those chunks, and the servers on which those chunks reside. In addition, it holds access control details of which client is allowed to access which file. In effect, the GFS master keeps a table mapping each file name to its chunk IDs (the 64-bit IDs), and each chunk to the locations of all its replicas (3 by default).
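A minimal sketch of the master's in-memory metadata tables as just described (the data classes, field names, and sample values are assumptions for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class ChunkInfo:
    handle: int                                         # globally unique 64-bit chunk ID
    replicas: list[str] = field(default_factory=list)   # chunkserver addresses

@dataclass
class FileMeta:
    chunks: list[ChunkInfo] = field(default_factory=list)   # ordered by chunk index
    allowed_clients: set[str] = field(default_factory=set)  # access control

# The master's core table: file name -> per-chunk metadata.
namespace: dict[str, FileMeta] = {
    "/crawl/pages-00001": FileMeta(
        chunks=[
            ChunkInfo(0x1A2B3C4D5E6F7081, ["cs-01:7070", "cs-17:7070", "cs-42:7070"]),
            ChunkInfo(0x1A2B3C4D5E6F7082, ["cs-03:7070", "cs-17:7070", "cs-55:7070"]),
        ],
        allowed_clients={"crawler", "indexer"},
    ),
}

def lookup(file_name: str, chunk_index: int) -> ChunkInfo:
    """What the master returns to a client: a chunk handle plus replica locations."""
    return namespace[file_name].chunks[chunk_index]
```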
Whenever a client wants to read a file, it asks the master for the file by name, and if it is in the middle of reading the file it can also send a chunk index, e.g. "I want to read the 5th chunk of this file". The GFS master then looks up its table and returns the ID of that chunk along with the IP addresses of all the chunkservers holding its replicas. Once the client has those addresses, it uses the chunk handle (that ID) to go directly to a chunkserver and read the data from it. It is important to note that the GFS client does not read data from the GFS master; it only reads metadata about the file. The actual reading and writing of file data happens directly between the client and the GFS chunkservers. To further reduce load on the master, the client also caches the chunk handle and the chunk locations.
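A sketch of that read path, with the client-side metadata cache included (the lookup and read functions are placeholders standing in for the real GFS RPCs):

```python
from typing import NamedTuple

class ChunkLocation(NamedTuple):
    handle: int          # the globally unique 64-bit chunk ID
    replicas: list[str]  # chunkserver addresses holding the replicas

metadata_cache: dict[tuple[str, int], ChunkLocation] = {}

def master_lookup(file_name: str, index: int) -> ChunkLocation:
    """Placeholder for the metadata request to the GFS master."""
    return ChunkLocation(0x1A2B3C4D5E6F7081 + index,
                         ["cs-01:7070", "cs-17:7070", "cs-42:7070"])

def chunkserver_read(address: str, handle: int) -> bytes:
    """Placeholder for the data read that goes directly to a chunkserver."""
    return f"<data of chunk {handle:x} from {address}>".encode()

def read_chunk(file_name: str, index: int) -> bytes:
    key = (file_name, index)
    if key not in metadata_cache:
        metadata_cache[key] = master_lookup(file_name, index)  # one master round trip, then cached
    loc = metadata_cache[key]
    return chunkserver_read(loc.replicas[0], loc.handle)  # data never flows through the master

print(read_chunk("/crawl/pages-00001", 4))  # "read the 5th chunk of this file"
```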
If a client wants to write a file, it asks the master to create the file, and the master hands out the locations of chunkservers that are relatively free: if there are a few chunkservers that are only 10% or 20% full, it gives those higher priority and returns three such chunkservers on which the client can write its data. Out of those 3 servers, the client finds the closest one and pushes all the data to it; for example, if replica A is the closest, the client sends all the data to replica A, and that chunkserver in turn passes the data on to the next replica, and so on.
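A sketch of both ideas: the master preferring the emptiest chunkservers for a new chunk, and the client pipelining data through the replica chain (the utilization figures and function names are illustrative):

```python
# Chunkserver disk utilization as seen by the master (illustrative numbers).
utilization = {"cs-01": 0.10, "cs-02": 0.85, "cs-03": 0.20, "cs-04": 0.60, "cs-05": 0.15}

def pick_replica_servers(count: int = 3) -> list[str]:
    """Master side: prefer the least-full chunkservers for a new chunk."""
    return sorted(utilization, key=utilization.get)[:count]

def push_data(data: bytes, chain: list[str]) -> None:
    """Client side: send data to the closest replica, which forwards it down the chain."""
    head, *rest = chain
    print(f"client -> {head}: {len(data)} bytes")
    for src, dst in zip(chain, rest):
        print(f"{src} -> {dst}: forwarded")

servers = pick_replica_servers()       # e.g. ['cs-01', 'cs-05', 'cs-03']
push_data(b"new chunk data", servers)  # pipelined along the replica chain
```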

When the master hands out those 3 chunkservers, it designates one of them as the primary replica. The data pushed by the client is not written to disk immediately; it is held in an LRU buffer cache on each chunkserver. Once all the data is in the caches, the client sends a request to the primary replica saying it can now commit the data to disk. The primary replica coordinates the commit with all the other replicas, and once it gets their confirmations it sends a confirmation back to the client.
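A sketch of that two-step write, separating the data push into the buffer cache from the commit ordered through the primary (the class and method names are assumptions):

```python
class Chunkserver:
    def __init__(self, name):
        self.name = name
        self.buffer = None   # stand-in for the LRU buffer cache
        self.disk = []       # committed chunk data

    def receive_data(self, data):
        """Step 1: the pushed data sits in cache, not yet on disk."""
        self.buffer = data

    def commit(self):
        """Step 2: flush the buffered data onto disk."""
        self.disk.append(self.buffer)
        self.buffer = None
        return True

def write_chunk(data, primary, secondaries):
    # Data push: client -> every replica's cache (pipelined in real GFS).
    for server in [primary, *secondaries]:
        server.receive_data(data)
    # Commit: the client asks the primary, which coordinates the secondaries.
    ok = primary.commit() and all(s.commit() for s in secondaries)
    return ok  # the confirmation sent back to the client

replicas = [Chunkserver("cs-01"), Chunkserver("cs-05"), Chunkserver("cs-03")]
print(write_chunk(b"appended record", replicas[0], replicas[1:]))  # True
```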

It is important for chunkservers to send regular heartbeat messages to the master, so that the master knows each chunkserver is still alive. If a chunkserver dies, its heartbeat messages stop arriving, and the replica count of the chunks it held is reduced, which the master can then restore by re-replicating those chunks elsewhere.
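A sketch of heartbeat-based failure detection on the master side (the timeout value and names are illustrative assumptions):

```python
import time

HEARTBEAT_TIMEOUT = 60.0  # seconds without a heartbeat before a server is presumed dead

last_heartbeat: dict[str, float] = {}  # chunkserver -> time of its last heartbeat

def on_heartbeat(server: str) -> None:
    """Called whenever a chunkserver's heartbeat message arrives."""
    last_heartbeat[server] = time.monotonic()

def dead_servers() -> list[str]:
    """Servers whose heartbeats have stopped; their chunks now have fewer live replicas."""
    now = time.monotonic()
    return [s for s, t in last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT]

# The master would periodically call dead_servers() and re-replicate any chunk
# whose live replica count fell below its target (3 by default).
```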
There is only one master server, one master component, for the entire cluster. Even with hundreds of clients accessing files, writing, reading, or creating them across hundreds or a few thousand chunkservers, all these operations can still be handled by the single master.
All operations that occur on a file are recorded in an append-only log called the operation log. Each file operation is stored in the operation log with a corresponding timestamp and the details of the user who performed it. This log is very important, so it is written directly to disk. If the master crashes, it can read the operation log and rebuild the entire file system namespace, along with the chunk IDs, to get back to the earlier state of the system. The operation log can become very long, which is why a background thread keeps checkpointing it behind the scenes.

E.g.: after every 100 operations, it compresses them into a compact checkpoint, so that when the master comes back up it doesn't have to replay all the log lines again; it just starts from the checkpoint and replays only the log entries after that, as in the sketch below.
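A minimal sketch of append-only logging with periodic checkpoints and crash recovery (the record format, file names, and the 100-operation interval from the example are assumptions):

```python
import json, os, time

LOG = "oplog.jsonl"           # append-only operation log, written straight to disk
CHECKPOINT = "checkpoint.json"
CHECKPOINT_EVERY = 100        # from the example above: checkpoint every 100 operations

state = {}                    # the namespace being rebuilt: file name -> chunk IDs
ops_since_checkpoint = 0

def apply_op(op, file_name):
    if op == "create":
        state[file_name] = []
    elif op == "delete":
        state.pop(file_name, None)

def log_op(op, file_name, user):
    """Append the operation (with timestamp and user) to disk, then apply it."""
    global ops_since_checkpoint
    with open(LOG, "a") as f:
        f.write(json.dumps({"ts": time.time(), "op": op,
                            "file": file_name, "user": user}) + "\n")
        f.flush()
        os.fsync(f.fileno())  # the log is critical, so force it onto disk
    apply_op(op, file_name)
    ops_since_checkpoint += 1
    if ops_since_checkpoint >= CHECKPOINT_EVERY:
        checkpoint()

def checkpoint():
    """Compress the log so far into a snapshot, noting how much log it covers."""
    global ops_since_checkpoint
    with open(CHECKPOINT, "w") as f:
        json.dump({"state": state, "log_offset": os.path.getsize(LOG)}, f)
    ops_since_checkpoint = 0

def recover():
    """On master restart: load the checkpoint, replay only the log entries after it."""
    global state
    offset = 0
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            snap = json.load(f)
        state, offset = snap["state"], snap["log_offset"]
    if os.path.exists(LOG):
        with open(LOG, "rb") as f:
            f.seek(offset)  # skip everything the checkpoint already covers
            for line in f:
                entry = json.loads(line)
                apply_op(entry["op"], entry["file"])
```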
Since the master is a single component that all clients talk to, it becomes a single point of failure: if the master goes down, from the clients' point of view the entire GFS cluster has gone down. In practice this is not a big issue, because clients cache all the metadata for the files they need to read, and there is also another component, a shadow master, which mirrors the operations of the master and constantly runs behind the scenes, so that if the master goes down the shadow master can take over and perform the master's duties.
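A sketch of how a client might ride out a master outage using its metadata cache and the shadow master (the failover order and function names are assumptions for illustration):

```python
metadata_cache = {}  # (file name, chunk index) -> chunk location, as in the read sketch

class MasterUnavailable(Exception):
    """Raised when the metadata request to the master fails."""

def primary_master_lookup(file_name, index):
    raise MasterUnavailable  # simulate the master being down

def shadow_master_lookup(file_name, index):
    # The shadow master mirrors the master's metadata and can answer lookups.
    return (0x1A2B3C4D5E6F7081 + index, ["cs-01:7070", "cs-17:7070", "cs-42:7070"])

def get_chunk_location(file_name, index):
    """Cache first, then the master, then the shadow master as a fallback."""
    key = (file_name, index)
    if key not in metadata_cache:
        try:
            metadata_cache[key] = primary_master_lookup(file_name, index)
        except MasterUnavailable:
            metadata_cache[key] = shadow_master_lookup(file_name, index)
    return metadata_cache[key]

print(get_chunk_location("/crawl/pages-00001", 0))  # served despite the master outage
```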

This is a basic overview of Google File System. GFS is used to store huge amounts of data, which would be difficult to achieve with vertical scaling, and because files are distributed across machines, it also enables batch processing that is much faster than on a traditional single server. That batch processing system is called MapReduce.
