The Hadoop Distributed File System
Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler
Yahoo!
Sunnyvale, California USA
{Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com
Presenter: Alex Hu
HDFS
Introduction
Architecture
File I/O Operations and Replica Management
Practice at YAHOO!
Future Work
Critiques and Discussion
Introduction and Related Work
What is Hadoop?
– Provides a distributed file system and a framework for the
analysis and transformation of very large data sets
– Uses the MapReduce paradigm
Introduction (cont.)
What is Hadoop Distributed File System (HDFS) ?
– File system component of Hadoop
– Store metadata on a dedicated server NameNode
– Store application data on other servers DataNode
– TCP-based protocols
– Replication for reliability
– Replication also multiplies the available data transfer bandwidth and
creates more opportunities to place computation near the data
Architecture
NameNode
DataNodes
HDFS Client
Image and Journal
CheckpointNode
BackupNode
Upgrades and File System Snapshots
Architecture Overview
NameNode – one per cluster
Maintains the HDFS namespace, a hierarchy of
files and directories represented by inodes
Maintains the mapping of file blocks to DataNodes
– Read: the client asks the NameNode for block locations
– Write: the client asks the NameNode to nominate DataNodes
Image and Journal
Checkpoint: a native file stores the persistent record of
the image (block locations are not included)
DataNodes
Two files represent a block replica on a DataNode
– The data itself – the file length matches the actual block length
– The block's metadata: checksums and the generation stamp
Handshake when connecting to the NameNode
– Verify the namespace ID and software version
– A new DataNode receives the namespace ID when it joins the cluster
Register with the NameNode
– A storage ID is assigned on first registration and never changes
– The storage ID is a unique internal identifier for the DataNode
DataNodes (cont.) - control
Block report: identifies the block replicas held by a DataNode
– Contains the block ID, the generation stamp, and the length of each replica
– Sent on registration and then once per hour
Heartbeats: messages confirming that the DataNode is available
– Default interval is three seconds
– A DataNode is considered “dead” if no heartbeat is received for 10 minutes
– Carry information used for space allocation and load balancing
● Storage capacity
● Fraction of storage in use
● Number of data transfers currently in progress
– The NameNode replies with instructions to the DataNode
– Heartbeats are kept frequent; the NameNode scales to processing
thousands of heartbeats per second
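For illustration, a minimal sketch of what a heartbeat might carry and how the NameNode could piggyback commands on its reply. The record and enum types below are hypothetical, not the actual Hadoop DataNodeProtocol classes.

```java
import java.util.List;

// Hypothetical heartbeat exchange (illustrative types, not Hadoop's own).
public class HeartbeatSketch {
    // What a DataNode reports every few seconds.
    record Heartbeat(String storageId,
                     long capacityBytes,       // total storage capacity
                     long usedBytes,           // fraction in use = usedBytes / capacityBytes
                     int transfersInProgress) {}

    // The NameNode's reply can carry zero or more instructions.
    enum CommandType { REPLICATE_BLOCK, REMOVE_REPLICA, REREGISTER, SHUT_DOWN }
    record Command(CommandType type, long blockId, List<String> targetDataNodes) {}

    // NameNode side: use the reported numbers for space allocation and load balancing.
    static List<Command> handleHeartbeat(Heartbeat hb) {
        double usedFraction = (double) hb.usedBytes() / hb.capacityBytes();
        System.out.printf("%s: %.1f%% full, %d transfers in progress%n",
                hb.storageId(), usedFraction * 100, hb.transfersInProgress());
        return List.of();   // no work for this DataNode right now
    }

    public static void main(String[] args) {
        Heartbeat hb = new Heartbeat("DS-1234", 4L << 40, 1L << 40, 2);
        System.out.println(handleHeartbeat(hb));
    }
}
```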
HDFS Client
A code library that exports the HDFS interface to applications
Read a file
– Ask the NameNode for the list of DataNodes that host replicas of the file's blocks
– Contact a DataNode directly and request the transfer
Write a file
– Ask the NameNode to choose DataNodes to host replicas of the first block of the file
– Organize a node-to-node pipeline and send the data
– Repeat for each subsequent block
Delete a file and create/delete directories
Various APIs
– Expose block locations so tasks can be scheduled to where the data are located
– Set the replication factor (number of replicas)
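For concreteness, a minimal read-path sketch against Hadoop's public FileSystem API. It assumes the Hadoop client library is on the classpath and a reachable cluster; the NameNode URI and file path are made up.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical NameNode address; normally taken from core-site.xml.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
        Path file = new Path("/data/example.txt");              // made-up path
        FileStatus status = fs.getFileStatus(file);

        // Ask the NameNode which DataNodes host each block, e.g. so a framework
        // can schedule tasks close to the data.
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset " + loc.getOffset() + " -> "
                    + String.join(",", loc.getHosts()));
        }

        // The actual read contacts the chosen DataNodes directly;
        // the NameNode only serves metadata.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[4096];
            int n = in.read(buf);
            System.out.println("read " + n + " bytes");
        }
    }
}
```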
HDFS Client (cont.)
Image and Journal
Image: the file system metadata describing the organization of the namespace
– Its persistent record is called a checkpoint
– A checkpoint is never changed in place; it can only be replaced as a whole
Journal: a write-ahead log of changes to the image
– Flushed and synced before each change is committed to the client
Both are stored in multiple storage directories to prevent loss
– The NameNode shuts down if no directory is available
Bottleneck: many handler threads wait on flush-and-sync
– Solution: batch transactions so one flush-and-sync commits many of them
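A minimal sketch of the batching idea (often called group commit), written against a plain FileChannel rather than the actual NameNode journal code. Threads append under a write lock, but only the first thread to reach the sync step pays for the flush-and-sync; threads whose transaction is already covered return immediately.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative batched journal: many writers, one flush-and-sync per batch.
public class BatchedJournal {
    private final FileChannel channel;
    private final Object writeLock = new Object();
    private final Object syncLock = new Object();
    private volatile long lastWrittenTxId = 0;   // highest transaction appended
    private volatile long lastSyncedTxId = 0;    // highest transaction made durable

    public BatchedJournal(Path journalFile) throws IOException {
        channel = FileChannel.open(journalFile, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Called by many handler threads; returns once the record is durable.
    public void logAndSync(byte[] record) throws IOException {
        long myTxId;
        synchronized (writeLock) {               // appending is cheap
            channel.write(ByteBuffer.wrap(record));
            myTxId = ++lastWrittenTxId;
        }
        synchronized (syncLock) {                // flush-and-sync is batched
            if (myTxId <= lastSyncedTxId) {
                return;                          // an earlier sync already covered us
            }
            long upTo = lastWrittenTxId;
            channel.force(false);                // one sync commits the whole batch
            lastSyncedTxId = upTo;
        }
    }
}
```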
CheckpointNode
The CheckpointNode runs the same NameNode software in a checkpointing role
Runs on a different host from the NameNode
Periodically creates a new checkpoint
– Downloads the current checkpoint and journal from the NameNode
– Merges them locally
– Creates the new checkpoint and returns it to the NameNode
– The NameNode can then truncate the tail of the journal
Challenge: a large journal makes a NameNode restart slow
– Solution: create a daily checkpoint
BackupNode
A recent feature
Similar to the CheckpointNode
Maintains an in-memory, up-to-date image of the namespace
– Can create a checkpoint without downloading anything from the NameNode
Accepts and persists the journal stream from the NameNode (acts as a journal store)
Can be viewed as a read-only NameNode
– Holds all metadata information except block locations
– Cannot perform modifications of the namespace
Upgrades and File System Snapshots
Minimize potential damage to the data during software upgrades
Only one snapshot can exist at a time
NameNode
– Merges the current checkpoint and journal in memory
– Writes the new checkpoint and an empty journal to a new location
– Instructs DataNodes to create a local snapshot
DataNode
– Creates a copy of the storage directory
– Hard links the existing block files into it, so data is shared rather than copied
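A sketch of the hard-link trick in isolation, using plain java.nio.file rather than the actual DataNode code. The snapshot directory shares the block files' data with the current directory, so creating it is cheap and deleting a file from one directory does not destroy the shared data.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative snapshot of a block storage directory: recreate the directory,
// but hard link the (large) block files instead of copying them.
public class HardLinkSnapshot {
    public static void snapshot(Path current, Path previous) throws IOException {
        Files.createDirectories(previous);
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(current)) {
            for (Path blockFile : entries) {
                if (Files.isRegularFile(blockFile)) {
                    // Both names now refer to the same data on disk.
                    Files.createLink(previous.resolve(blockFile.getFileName()), blockFile);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout; real DataNodes use "current" and "previous" directories.
        snapshot(Path.of("storage/current"), Path.of("storage/previous"));
    }
}
```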
Upgrades and File System Snapshots – Rollback
The NameNode recovers the checkpoint saved when the snapshot was created
DataNodes restore the renamed directories and delete block replicas created
after the snapshot was made
The layout version is stored on both the NameNode and DataNodes
– Identifies the data representation formats
– Prevents inconsistent formats after an upgrade
Snapshot creation is a cluster-wide effort
– Protects against data loss during upgrades
File I/O Operations and Replica
Management
File Read and Write
Block Placement and Replication management
Other features
File Read and Write
Checksums
– Read by the HDFS client to detect any corruption
– DataNodes store checksums in a metadata file separate from the data
– Shipped to the client when it performs an HDFS read
– Clients verify the checksums against the received data (a verification sketch
follows below)
Choose the closest replica to read from
A read can fail because
– The DataNode is unavailable
– The DataNode no longer hosts a replica of the block
– The replica is corrupted
Read while writing: ask one of the replicas for the latest length of the block
still being written
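A minimal sketch of the client-side verification, using CRC32 over fixed-size chunks (HDFS checksums 512-byte chunks by default). The helper below is illustrative, not the actual Hadoop checksum code.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

// Verify received block data against the per-chunk checksums shipped with it.
public class ChecksumVerify {
    static final int BYTES_PER_CHECKSUM = 512;   // HDFS default chunk size

    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            CRC32 crc = new CRC32();
            int from = i * BYTES_PER_CHECKSUM;
            crc.update(data, from, Math.min(BYTES_PER_CHECKSUM, data.length - from));
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // A mismatch means the replica (or the transfer) is corrupted; the client
    // should notify the NameNode and read from a different replica.
    static boolean verify(byte[] received, long[] shipped) {
        return Arrays.equals(checksums(received), shipped);
    }

    public static void main(String[] args) {
        byte[] data = "some block data".getBytes();
        long[] shipped = checksums(data);        // as stored by the DataNode
        data[0] ^= 1;                            // simulate corruption in transit
        System.out.println(verify(data, shipped) ? "ok" : "corrupted, try another replica");
    }
}
```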
File Read and Write (cont.)
New data can only be appended to a file
Single-writer, multiple-reader model
Lease
– The client that opens a file for writing is granted a lease
– Renewed by heartbeats and revoked when the file is closed
– Soft limit and hard limit on the lease duration
– A writer's lease does not prevent other clients from reading the file
Optimized for sequential reads and writes
– Other access patterns are served by systems built on top
● Scribe: provides real-time data streaming into HDFS
● HBase: provides random, real-time access to large tables
Add Block and hflush
When the current block fills, the client asks the NameNode to allocate a new
block (with a unique block ID) and a new pipeline of DataNodes
Write operations stream data to the pipeline in packets
– A new change is not guaranteed to be visible to readers until the file is closed
– hflush: the client pushes the current packet immediately and waits for the
pipeline's acknowledgement, making all data written so far visible to readers
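A small usage sketch of hflush with Hadoop's public API. It assumes the Hadoop client library and a reachable cluster; the NameNode URI and path are made up. After hflush returns, the bytes written so far are visible to new readers even though the file is still open.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class HflushSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
        Path file = new Path("/logs/events.log");                // made-up path

        try (FSDataOutputStream out = fs.create(file)) {
            out.write("event 1\n".getBytes(StandardCharsets.UTF_8));
            // Without hflush, this data may still sit in the client buffer or
            // pipeline and is not guaranteed to be visible to readers yet.
            out.hflush();
            // From here on, a concurrent reader asking for the latest block
            // length will see "event 1".
            out.write("event 2\n".getBytes(StandardCharsets.UTF_8));
        }   // close() makes everything visible and releases the lease
    }
}
```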
Block Placement
Not practical to connect all nodes of a large cluster in a flat topology
Nodes are spread across multiple racks
– Inter-rack communication has to go through one or more switches
– Inter-rack bandwidth is lower than intra-rack bandwidth
– Shorter network distance implies greater available bandwidth
The NameNode decides which rack a DataNode belongs to
– Resolved by an administrator-configured script
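A sketch of how network distance can be derived from rack paths (illustrative; Hadoop's actual NetworkTopology class is more general). A node's topology address looks like /rack/host, and the fewer path components two nodes share, the more switches traffic must cross and the less bandwidth is available.

```java
// Distance between two nodes given their topology paths, e.g. "/rack1/node3".
// Shorter distance implies greater available bandwidth:
// same node = 0, same rack = 2, different racks = 4.
public class RackDistance {
    static int distance(String a, String b) {
        String[] pa = a.split("/"), pb = b.split("/");
        int common = 0;
        while (common < pa.length && common < pb.length && pa[common].equals(pb[common])) {
            common++;
        }
        return (pa.length - common) + (pb.length - common);
    }

    public static void main(String[] args) {
        System.out.println(distance("/rack1/node3", "/rack1/node3")); // 0: same node
        System.out.println(distance("/rack1/node3", "/rack1/node7")); // 2: same rack
        System.out.println(distance("/rack1/node3", "/rack2/node1")); // 4: different racks
    }
}
```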
Replica Placement Policy
Improves data reliability, availability, and network bandwidth utilization
Minimizes the write cost
– The default policy reduces inter-rack and inter-node write traffic
Rule 1: No DataNode contains more than one replica of any block
Rule 2: No rack contains more than two replicas of the same block, provided
there are sufficient racks in the cluster
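A simplified sketch of the default placement heuristic for three replicas, not Hadoop's actual BlockPlacementPolicyDefault: the first replica goes on the writer's node, the second and third on two different nodes of a single other rack, which satisfies both rules above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the default HDFS placement heuristic for 3 replicas.
public class PlacementSketch {
    record Node(String host, String rack) {}

    static List<Node> chooseTargets(Node writer, List<Node> cluster) {
        List<Node> targets = new ArrayList<>();
        targets.add(writer);                       // 1st replica: the writer's node

        // 2nd and 3rd replicas: two different nodes on one remote rack, so no
        // node holds two replicas and no rack holds more than two.
        for (Node n : cluster) {
            if (targets.size() == 3) break;
            boolean alreadyUsedNode = targets.stream().anyMatch(t -> t.host().equals(n.host()));
            boolean remoteRack = !n.rack().equals(writer.rack());
            boolean matchesSecondRack =
                    targets.size() < 2 || n.rack().equals(targets.get(1).rack());
            if (!alreadyUsedNode && remoteRack && matchesSecondRack) {
                targets.add(n);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Node writer = new Node("n1", "r1");
        List<Node> cluster = List.of(writer, new Node("n2", "r1"),
                new Node("n3", "r2"), new Node("n4", "r2"), new Node("n5", "r3"));
        // Chooses n1 on rack r1, then n3 and n4 on rack r2.
        System.out.println(chooseTargets(writer, cluster));
    }
}
```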
Replication management
Under- and over-replication are detected by the NameNode
Under-replicated
– Blocks are placed in a priority queue; a block with only one replica has the
highest priority (sketch below)
– New replicas are placed following the replica placement policy
Over-replicated
– The NameNode removes a replica, preferring the DataNode with the least
available disk space
– Without reducing the number of racks that host replicas
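A minimal sketch of prioritizing the re-replication queue (illustrative, not Hadoop's UnderReplicatedBlocks class): blocks with only one remaining replica come out first.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Blocks needing re-replication, ordered so the most at-risk are handled first.
public class ReplicationQueueSketch {
    record UnderReplicated(long blockId, int liveReplicas, int targetReplicas) {}

    public static void main(String[] args) {
        PriorityQueue<UnderReplicated> queue = new PriorityQueue<>(
                Comparator.comparingInt(UnderReplicated::liveReplicas));

        queue.add(new UnderReplicated(101, 2, 3));   // mildly under-replicated
        queue.add(new UnderReplicated(102, 1, 3));   // highest priority: one replica left
        queue.add(new UnderReplicated(103, 2, 3));

        while (!queue.isEmpty()) {
            UnderReplicated b = queue.poll();
            // Here the NameNode would pick new targets with the placement policy
            // and instruct a DataNode to copy the block.
            System.out.println("re-replicate block " + b.blockId()
                    + " (" + b.liveReplicas() + "/" + b.targetReplicas() + " replicas)");
        }
    }
}
```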
Other features
Balancer
– Balances disk space usage across DataNodes
– The bandwidth consumed by rebalancing can be limited
Block Scanner
– Periodically verifies the checksums of stored replicas
– A corrupted replica is not deleted immediately; a good replica is created first
Decommissioning
– Controlled by include and exclude lists, which are re-evaluated when changed
– A decommissioning DataNode is removed only after all blocks on it are
replicated elsewhere
Inter-Cluster Data Copy
– DistCp: a MapReduce job that copies data in parallel
Practice At Yahoo!
3500 nodes and 9.8PB of storage available
Durability of Data
– Uncorrelated node failures
● Chance of losing a block during one year: < 0.5%
● Chance of a node failing in a given month: 0.8%
– Correlated node failures
● Failure of a rack or core switch
● Loss of electrical power
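As a rough back-of-the-envelope illustration using the two figures above (my own arithmetic, not a number from the paper): 3,500 nodes times a 0.8% per-node monthly failure probability gives roughly 28 expected uncorrelated node failures per month, about one per day, which is why prompt re-replication matters.

```java
// Back-of-envelope expected failure rate from the figures quoted above.
public class FailureRate {
    public static void main(String[] args) {
        int nodes = 3500;
        double monthlyFailureProbability = 0.008;   // 0.8% per node per month
        double perMonth = nodes * monthlyFailureProbability;
        System.out.printf("~%.0f node failures per month (~%.1f per day)%n",
                perMonth, perMonth / 30);
    }
}
```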
Caring for the commons
– Permissions: modeled on UNIX
– Quotas on the total space consumed within a directory tree
Benchmarks
DFSIO benchmark
– DFSIO Read: 66 MB/s per node
– DFSIO Write: 40 MB/s per node
Production (busy) cluster
– Read: 1.02 MB/s per node
– Write: 1.09 MB/s per node
Sort benchmark
Future Work
Automated failover solution
– Using ZooKeeper
Scalability
– Multiple namespaces sharing the physical block storage
– Advantages
● Isolates namespaces
● Improves overall availability
● Generalizes the block storage abstraction
– Drawback
● Cost of managing multiple namespaces
– Job-centric namespaces rather than cluster-centric ones
Critiques and Discussion
Pros
– Architecture: the NameNode/DataNode split, together with features that detect
corrupted replicas, balance disk space usage, and keep replicas consistent.
– HDFS is easy to use: users do not have to worry about individual servers; it can
be used much like a local file system while providing the usual operations.
– The benchmarks are convincing: they use real production data on a large number
of nodes with substantial storage, covering several kinds of experiments.
Cons
– Fault tolerance is not very sophisticated. All the recovery mechanisms presented
assume the NameNode stays alive; the paper offers no complete solution for a
NameNode failure.
– Scalability, especially replying to heartbeats with instructions: the paper does
not properly measure NameNode performance when too many messages arrive at once.
– No test of correlated failures is provided, so we cannot tell how HDFS performs
after a correlated failure occurs.
Thank you very much