Hadoop Distributed
File System (HDFS)
Goals of HDFS
• Very Large Distributed File System
• 10K nodes, 100 million files, 10PB
• Assumes Commodity Hardware
• Files are replicated to handle hardware failure
• Detect failures and recover from them
• Optimized for Batch Processing
• Data locations exposed so that computations can move to where data
resides
• Provides very high aggregate bandwidth
Distributed File System
• Single Namespace for the entire cluster
• Data Coherency
• Write-once-read-many (WORM) access model
• Client can only append to existing files
• Files are broken up into blocks
• Typically 64MB block size
• Each block replicated on multiple DataNodes
• Intelligent Client
• Client can find the location of blocks
• Client accesses data directly from DataNode
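As a concrete illustration of the "intelligent client" point, here is a minimal sketch using Hadoop's Java FileSystem API: the client asks the NameNode for the block locations of a file and can then read the data directly from those DataNodes. The path reuses /foodir/myfile.txt from the command examples later; the rest of the program is illustrative boilerplate, not code from the slides.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationExample {
    public static void main(String[] args) throws Exception {
        // Configuration picks up core-site.xml / hdfs-site.xml from the classpath.
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/foodir/myfile.txt");

        // Ask the NameNode which DataNodes hold each block of the file;
        // the data itself is then streamed directly from those DataNodes.
        FileStatus status = fs.getFileStatus(file);
        for (BlockLocation block :
                fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset "  + block.getOffset()
                             + " length " + block.getLength()
                             + " hosts "  + String.join(",", block.getHosts()));
        }
    }
}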
HDFS Architecture
• Master / Slave Architecture
• A single NameNode
• Many DataNodes
• Internally a file is split into
one or more blocks and
these blocks are stored in a
set of DataNodes.
Functions of a NameNode
• Manages the File System Namespace
• Maps a file name to a set of blocks
• Maps a block to the DataNodes where it resides
• Cluster Configuration Management
• Replication Engine for Blocks
NameNode Metadata
• Metadata in Memory
• The entire metadata is kept in main memory
• No demand paging of metadata
• Types of Metadata
• List of files
• List of blocks for each file
• List of DataNodes for each block
• File attributes, e.g. creation time, replication factor
• A Transaction Log
• Records file creations, file deletions, etc.
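A conceptual sketch (in Java, not the actual NameNode implementation) of the in-memory structures this slide lists: file-to-block and block-to-DataNode maps, file attributes, and a transaction log that records namespace mutations.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FileAttributes {
    long creationTime;        // set when the file is created
    short replicationFactor;  // e.g. 3
}

class NameNodeMetadata {
    // File name -> ordered list of block IDs making up the file.
    final Map<String, List<Long>> fileToBlocks = new HashMap<>();
    // Block ID -> DataNodes currently holding a replica.
    // Not logged: rebuilt from DataNode block reports after a restart.
    final Map<Long, List<String>> blockToDataNodes = new HashMap<>();
    // File name -> attributes such as creation time and replication factor.
    final Map<String, FileAttributes> fileAttributes = new HashMap<>();

    // Namespace mutations are also appended to the transaction (edit) log on
    // disk so the namespace survives a NameNode restart.
    void logTransaction(String record) { /* append to the edit log; omitted */ }

    void createFile(String name, short replication) {
        FileAttributes attrs = new FileAttributes();
        attrs.creationTime = System.currentTimeMillis();
        attrs.replicationFactor = replication;
        fileAttributes.put(name, attrs);
        fileToBlocks.put(name, new ArrayList<>());
        logTransaction("CREATE " + name);
    }
}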
DataNode
• A Block Server
• Stores data in the local file system (e.g. ext3)
• Stores metadata of a block (e.g. CRC)
• Serves data and metadata to Clients
• Block Report
• Periodically sends a report of all existing blocks to the
NameNode
• Facilitates Pipelining of Data
• Forwards data to other specified DataNodes
Block Placement
• Current Strategy
• One replica on the local node
• Second replica on a remote rack
• Third replica on the same remote rack
• Additional replicas are randomly placed
• Clients read from the nearest replicas
• Would like to make this policy pluggable
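A simplified sketch of the placement strategy described above. Node, rack handling and the helper names are illustrative stand-ins, not Hadoop's BlockPlacementPolicy classes, and the sketch assumes the cluster has enough nodes and racks to satisfy each rule.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

class Node {
    final String name;
    final String rack;
    Node(String name, String rack) { this.name = name; this.rack = rack; }
}

class SimplePlacementPolicy {
    private final Random random = new Random();

    List<Node> chooseTargets(Node writer, List<Node> cluster, int replication) {
        List<Node> targets = new ArrayList<>();
        targets.add(writer);                                                    // 1st replica: local node
        if (replication >= 2) {
            Node remote = pick(cluster, n -> !n.rack.equals(writer.rack), targets); // 2nd: a remote rack
            targets.add(remote);
            if (replication >= 3) {
                targets.add(pick(cluster, n -> n.rack.equals(remote.rack), targets)); // 3rd: same remote rack
            }
        }
        while (targets.size() < replication) {                                  // additional replicas: random
            targets.add(pick(cluster, n -> true, targets));
        }
        return targets;
    }

    private Node pick(List<Node> cluster, Predicate<Node> ok, List<Node> exclude) {
        List<Node> candidates = new ArrayList<>();
        for (Node n : cluster) {
            if (ok.test(n) && !exclude.contains(n)) {
                candidates.add(n);
            }
        }
        return candidates.get(random.nextInt(candidates.size()));
    }
}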
Heartbeats
• DataNodes send a heartbeat to the NameNode
• Once every 3 seconds
• NameNode uses heartbeats to detect DataNode failure.
• A network partition can cause a subset of DataNodes to
lose connectivity with NameNode.
• The NameNode marks DataNodes without recent
Heartbeats as dead and does not forward any new IO
requests to them.
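A minimal sketch of this failure-detection idea: record the time of each heartbeat and treat a DataNode as dead once no heartbeat has arrived for an expiry interval. The 3-second interval comes from the slide; the 10-minute expiry is an illustrative assumption, not a value from the slides.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class HeartbeatMonitor {
    static final long HEARTBEAT_INTERVAL_MS = 3_000;   // DataNodes send one every 3 seconds
    static final long EXPIRY_MS = 10 * 60 * 1_000;     // illustrative dead-node timeout

    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    // Called whenever a heartbeat arrives from a DataNode.
    void onHeartbeat(String dataNodeId) {
        lastHeartbeat.put(dataNodeId, System.currentTimeMillis());
    }

    // True if the NameNode should mark the node dead and stop sending it new IO.
    boolean isDead(String dataNodeId) {
        Long last = lastHeartbeat.get(dataNodeId);
        return last == null || System.currentTimeMillis() - last > EXPIRY_MS;
    }
}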
Replication Engine
• NameNode detects DataNode failures
• Chooses new DataNodes for new replicas
• Balances disk usage
• Balances communication traffic to DataNodes
Data Correctness
• Use Checksums to validate data
• Use CRC32
• File Creation
• The client computes checksum per 512 bytes
• DataNode stores the checksum
• File Access
• The client retrieves the data and checksum from
DataNode
• If Validation fails, the Client tries other replicas
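A small sketch of the checksum scheme described above: one CRC32 value per 512-byte chunk, computed on write and verified on read. It uses java.util.zip.CRC32 and is not the actual HDFS checksum code.

import java.util.zip.CRC32;

class ChecksumExample {
    static final int BYTES_PER_CHECKSUM = 512;

    // Compute one CRC32 value per 512-byte chunk of the data.
    static long[] computeChecksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // On read, recompute and compare; a mismatch means this replica is bad
    // and the client should try another replica.
    static boolean verify(byte[] data, long[] expected) {
        long[] actual = computeChecksums(data);
        if (actual.length != expected.length) return false;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) return false;
        }
        return true;
    }
}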
NameNode Failure
• A single point of failure
• Transaction Log stored in multiple directories
• A directory on the local file system
• A directory on a remote file system (NFS/CIFS)
• Need to develop a real High Availability (HA) solution
Data Pipelining
• Client retrieves a list of DataNodes on which to place
replicas of a block.
• Client writes a block to the first DataNode.
• The first DataNode forwards the data to the next node
in the Pipeline.
• When all replicas are written, the Client moves on to
write the next block in the file.
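A conceptual, in-memory illustration of the pipeline: the client writes the block to the first DataNode only, and each DataNode stores the data and forwards it to the next node in the list. Class and method names are invented for the sketch.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DataNodeSim {
    final String name;
    final Map<Long, byte[]> localBlocks = new HashMap<>(); // stands in for local disk
    DataNodeSim(String name) { this.name = name; }

    // Store the block locally, then forward it down the rest of the pipeline.
    void writeBlock(long blockId, byte[] data, List<DataNodeSim> downstream) {
        localBlocks.put(blockId, data);
        System.out.println(name + " stored block " + blockId);
        if (!downstream.isEmpty()) {
            downstream.get(0).writeBlock(blockId, data,
                    downstream.subList(1, downstream.size()));
        }
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        DataNodeSim d1 = new DataNodeSim("dn1");
        DataNodeSim d2 = new DataNodeSim("dn2");
        DataNodeSim d3 = new DataNodeSim("dn3");
        // The client only talks to dn1; dn1 forwards to dn2, dn2 forwards to dn3.
        d1.writeBlock(42L, "block contents".getBytes(), List.of(d2, d3));
    }
}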
Rebalancer
• Goal: % disk full on DataNodes should be similar
• Usually run when new DataNodes are added
• Cluster is online when Rebalancer is active
• Rebalancer is throttled to avoid network congestion
• Command line tool
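• Typical invocation (not shown in the original slides, and assumed here for Hadoop 1.x): hadoop balancer [-threshold <percent>], where the threshold bounds how far a DataNode's disk utilization may deviate from the cluster average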
Secondary NameNode
• Copies FsImage and Transaction Log from NameNode to a
temporary directory
• Merges FsImage and Transaction Log into a new FsImage in a
temporary directory
• Uploads new FsImage to the NameNode
• Transaction Log on NameNode is purged
• FsImage is a file stored on the NameNode's local filesystem that contains the
complete directory structure (namespace) of HDFS and the mapping from each
file to its blocks. Block-to-DataNode locations are not persisted in the
FsImage; the NameNode rebuilds them from DataNode block reports.
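A conceptual sketch of that checkpoint cycle; the file names and helper methods are illustrative, not the actual SecondaryNameNode code.

import java.nio.file.Files;
import java.nio.file.Path;

class CheckpointSketch {
    // 1. Copy the current FsImage and transaction (edit) log from the NameNode.
    // 2. Replay the logged mutations over the image to build a new FsImage.
    // 3. Upload the new FsImage to the NameNode, which can then purge its log.
    void checkpoint(Path fsImage, Path editLog, Path tmpDir) throws Exception {
        Path imageCopy = Files.copy(fsImage, tmpDir.resolve("fsimage.ckpt"));
        Path logCopy   = Files.copy(editLog, tmpDir.resolve("edits.ckpt"));
        Path newImage  = merge(imageCopy, logCopy, tmpDir.resolve("fsimage.new"));
        uploadToNameNode(newImage);
    }

    private Path merge(Path image, Path edits, Path out) {
        // Replaying each logged mutation (create, delete, ...) over the loaded
        // namespace would happen here; omitted in this sketch.
        return out;
    }

    private void uploadToNameNode(Path newImage) {
        // In Hadoop this upload happens over HTTP; omitted in this sketch.
    }
}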
User Interface
• Commands for HDFS Users:
• hadoop dfs -mkdir /foodir
• hadoop dfs -cat /foodir/myfile.txt
• hadoop dfs -rm /foodir/myfile.txt
• Commands for HDFS Administrator
• hadoop dfsadmin -report
• hadoop dfsadmin -decommission datanodename
• Web Interface
• http://host:port/dfshealth.jsp
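The user commands above also have programmatic equivalents in the Java FileSystem API; a minimal sketch using the same /foodir paths (the API calls are real, the surrounding program is illustrative).

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DfsShellEquivalents {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        fs.mkdirs(new Path("/foodir"));                   // hadoop dfs -mkdir /foodir

        // hadoop dfs -cat /foodir/myfile.txt
        try (FSDataInputStream in = fs.open(new Path("/foodir/myfile.txt"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }

        fs.delete(new Path("/foodir/myfile.txt"), false); // hadoop dfs -rm /foodir/myfile.txt
    }
}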
Google File System
(GFS)
Motivation
• More than 15,000 commodity-class PCs.
• Multiple clusters distributed worldwide.
• Thousands of queries served per second.
• One query reads hundreds of MB of data.
• One query consumes tens of billions of CPU cycles.
• Google stores dozens of copies of the entire Web.
Conclusion: Need a large, distributed, highly fault-tolerant file system
Introduction
• Google File System (GFS) is a scalable distributed file system for large, distributed, data-intensive
applications.
• It is designed and implemented to meet the rapidly growing demands of Google’s data processing
needs.
• GFS shares many of the same goals as previous distributed file systems such as performance,
scalability, reliability, and availability.
• It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high
aggregate performance to a large number of clients.
Operations Support
• GFS supports the usual operations to create, delete, open, close, read, and write files.
• Moreover, GFS has snapshot and record append operations.
• Snapshot creates a copy of a file or a directory tree at low cost.
• Record append allows multiple clients to append data to the same file concurrently while guaranteeing
the atomicity of each individual client’s append.
• Record append is useful for implementing multi-way merge results and producer-consumer queues that
many clients can append to simultaneously without additional locking.
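GFS itself is not publicly available, so the client interface below is purely hypothetical (including the /logs/merged path); it exists only to illustrate the record-append semantics described above, where GFS rather than the client chooses the offset, so concurrent producers need no locking of their own.

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

interface HypotheticalGfsClient {
    // Appends one record atomically; GFS chooses the offset and returns it.
    long recordAppend(String file, byte[] record);
}

class ProducerConsumerSketch {
    // Many producers append to the same (hypothetical) file concurrently
    // without any locking; atomicity of each append is guaranteed by GFS.
    static void runProducers(HypotheticalGfsClient gfs, List<byte[]> records) {
        List<CompletableFuture<Long>> appends = records.stream()
                .map(rec -> CompletableFuture.supplyAsync(
                        () -> gfs.recordAppend("/logs/merged", rec)))
                .collect(Collectors.toList());
        appends.forEach(f -> System.out.println("record written at offset " + f.join()));
    }
}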
Components of GFS
• Master (NameNode)
• Manages metadata (namespace)
• Not involved in data transfer
• Controls allocation, placement, replication
• Chunkserver (DataNode)
• Stores chunks of data
• No knowledge of GFS file system structure
• Built on the local Linux file system
GFS Architecture
• A GFS cluster consists of a single master and multiple chunkservers and is accessed by multiple clients, as shown in Figure 1.
Figure 1: GFS Architecture
GFS Architecture (Contd.)
• GFS uses a large chunk size: 64MB (so 1GB = 1024MB = 16 chunks).
• Each chunk is stored as a plain Linux file, which is lazily extended up to 64MB.
• The master maintains less than 64 bytes of metadata for each 64MB chunk.
• A large chunk makes it likely that clients perform many reads and writes on a given chunk.
• This reduces network overhead by keeping a persistent connection to the chunkserver.
• See also MapReduce, Bigtable.
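To put the 64-byte figure in perspective (illustrative arithmetic, not from the slides): 1PB of data split into 64MB chunks is 2^24, roughly 16.8 million, chunks; at under 64 bytes of metadata per chunk, the master needs on the order of 1GB of memory for all chunk metadata.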
Chunkserver:
Files are divided into fixed-size chunks (64MB).
Each chunk is identified by an immutable and globally unique 64 bit chunk handle assigned by
the master at the time of chunk creation.
Chunkservers store chunks on local disks as Linux files and read or write chunk data specified by
a chunk handle and byte range.
For reliability, each chunk is replicated on multiple chunkservers. (Default 3 replicas).
GFS Client:
GFS client code linked into each application implements the file system API and communicates
with the master and chunkservers to read or write data on behalf of the application.
GFS Architecture (Contd.)
System Metadata
The master stores three major types of metadata: the file and chunk namespaces, the mapping
from files to chunks, and the locations of each chunk’s replicas.
All metadata is kept in the master’s memory.
The first two types (namespaces and file-to-chunk mapping) are also kept persistent by logging
mutations to an operation log stored on the master’s local disk and replicated on remote machines.
Using a log allows us to update the master state simply, reliably, and without risking inconsistencies in
the event of a master crash. The master does not store chunk location information persistently.
Instead, it asks each chunkserver about its chunks at master startup and whenever a chunkserver
joins the cluster.
Write Control and Data Flow
Steps shown in the figure: 1) Who has the lease? 2) Lease info 3) Data push 4) Commit 5) Serialized commit 6) Commit ACK 7) Success
Note: A lease is a temporary grant from the master that makes one replica the primary for a chunk; it expires and must be renewed.
Step 1. The client asks the master which chunkserver holds the current lease for the chunk and the locations of the other
replicas. If no one has a lease, the master grants one to a replica it chooses (not shown).
Step 2. The master replies with the identity of the primary and the locations of the other (secondary) replicas. The client
caches this data for future mutations. It needs to contact the master again only when the primary becomes unreachable
or replies that it no longer holds a lease.
Step 3. The client pushes the data to all the replicas. A client can do so in any order. Each chunkserver will store the data
in an internal LRU buffer cache until the data is used or aged out. By decoupling the data flow from the control flow, we
can improve performance by scheduling the expensive data flow based on the network topology regardless of which
chunkserver is the primary.
Step 4. Once all the replicas have acknowledged receiving the data, the client sends a write request to the primary. The
request identifies the data pushed earlier to all of the replicas. The primary assigns consecutive serial numbers to all the
mutations it receives, possibly from multiple clients, which provides the necessary serialization. It applies the mutation to
its own local state in serial number order.
Step 5. The primary forwards the write request to all secondary replicas. Each secondary replica applies
mutations in the same serial number order assigned by the primary.
Step 6. The secondaries all reply to the primary indicating that they have completed the operation.
Step 7. The primary replies to the client. Any errors encountered at any of the replicas are reported to
the client. In case of errors, the write may have succeeded at the primary and an arbitrary subset of the
secondary replicas. (If it had failed at the primary, it would not have been assigned a serial number and
forwarded.) The client request is considered to have failed, and the modified region is left in an
inconsistent state. Our client code handles such errors by retrying the failed mutation. It will make a few
attempts at steps (3) through (7) before falling back to a retry from the beginning of the write.
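A conceptual sketch of the client side of steps 1 through 7; the Master, Replica, and Primary interfaces are hypothetical (GFS's RPC layer is not public) and exist only to mirror the control flow above. For simplicity the client pushes data to each replica directly rather than along a chain.

import java.util.List;

interface Master {
    LeaseInfo getLease(long chunkHandle);                 // steps 1-2: lease holder and replica locations
}

interface Replica {
    void pushData(long chunkHandle, byte[] data);         // step 3: data flow, buffered at the chunkserver
}

interface Primary extends Replica {
    boolean commitWrite(long chunkHandle, byte[] data);   // steps 4-7: control flow through the primary
}

class LeaseInfo {
    Primary primary;
    List<Replica> secondaries;
}

class GfsWriteSketch {
    static boolean write(Master master, long chunkHandle, byte[] data) {
        LeaseInfo lease = master.getLease(chunkHandle);   // steps 1-2 (the client caches this)
        lease.primary.pushData(chunkHandle, data);        // step 3: push data to every replica
        for (Replica r : lease.secondaries) {
            r.pushData(chunkHandle, data);
        }
        // Step 4: ask the primary to commit. The primary assigns serial numbers,
        // forwards the write to the secondaries (step 5), collects their ACKs
        // (step 6), and replies to the client (step 7). On error the client
        // retries steps 3-7 before restarting the whole write.
        return lease.primary.commitWrite(chunkHandle, data);
    }
}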
Master Operation
• The master executes all namespace operations.
• In addition, it manages chunk replicas throughout the system: It makes placement decisions, creates
new chunks and hence replicas, and coordinates various system-wide activities to keep chunks fully
replicated, to balance load across all the chunkservers, and to reclaim unused storage.
• Namespace Management and Locking
• Replica Placement
• Creation, Re-replication, Rebalancing
• Garbage Collection
• Stale Replica Detection
Conclusion
• The Google File System demonstrates the qualities essential for supporting large-scale data processing workloads
on commodity hardware.
• GFS provides fault tolerance by constant monitoring, replicating crucial data, and fast and automatic recovery.
• Chunk replication allows us to tolerate chunkserver failures.
• GFS has successfully met our storage needs and is widely used within Google as the storage platform for research
and development as well as production data processing.
GFS vs HDFS