Hadoop Distributed File System
Ishu Gupta - 20114041
Introduction
• A subproject of Apache Hadoop
• Runs on commodity hardware
• Fault tolerant through block replication
• Provides high-throughput access to application data
Architecture
NameNode
• Master server that manages the metadata and namespace of the HDFS cluster.
• Stores information about the file system hierarchy, file metadata (such as permissions, modification
times), and the mapping of data blocks to DataNodes.
• The NameNode is a critical component; its failure can make the entire HDFS
cluster unavailable.
• HDFS employs mechanisms such as the Secondary NameNode and, in modern setups, HDFS High
Availability (HA) to ensure continuous availability and fault tolerance.
DataNode
• Worker nodes in the HDFS cluster responsible for storing actual data blocks.
• Store and manage file data as blocks on their local disks. Each block is typically 128 megabytes
in size, although this can be configured per file (see the create-time sketch after this list).
• Replicate data blocks as per the replication factor specified for the file (usually three replicas by
default).
• Replication ensures fault tolerance; if a DataNode or a block becomes unavailable due to a failure,
HDFS can still retrieve the data from other available replicas on different DataNodes.
• Send periodic heartbeats to the NameNode to confirm that they are operational.
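A minimal sketch of setting the replication factor and block size at create time through the standard org.apache.hadoop.fs.FileSystem client API; the path /user/demo/data.bin and the values shown are illustrative assumptions, not cluster defaults.

// Sketch: create a file with an explicit replication factor and block size.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateWithBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();    // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/data.bin"); // hypothetical path
        short replication = 3;                       // replicas per block (3 is the usual default)
        long blockSize = 128L * 1024 * 1024;         // 128 MB, set per file here

        try (FSDataOutputStream out = fs.create(
                file, /* overwrite */ true, /* bufferSize */ 4096,
                replication, blockSize)) {
            out.writeUTF("hello HDFS");
        }
    }
}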
Client
• HDFS clients interact with the Hadoop Distributed File System for file storage and retrieval.
• Perform operations like creating, reading, updating, and deleting files and directories in the HDFS
namespace.
• Communicate with the NameNode to obtain metadata information about files and directories.
• Query the NameNode to discover the locations of data blocks, allowing for efficient data access (sketched after this list).
• Read data by contacting the NameNode for block locations and then fetching the data from the
nearest DataNode.
• During write operations, clients request the NameNode to choose suitable DataNodes to host block
replicas, and data is written in a pipelined manner for optimal throughput.
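As a sketch of the metadata path, the snippet below asks the NameNode for a file's block locations through the public client API; the path is hypothetical and the file is assumed to already exist.

// Sketch: list the DataNodes holding each block of a file.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/user/demo/data.bin"));

        // One BlockLocation per block; each lists the DataNodes with a replica.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset()
                    + " length=" + b.getLength()
                    + " hosts=" + String.join(",", b.getHosts()));
        }
    }
}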
Interaction of NameNode, DataNode, and Client
Image and Journal
• Image - the in-memory representation of the file system namespace and metadata held in the
NameNode's memory.
• Includes information about files, directories, permissions, modification times, and the mapping of
files to blocks. (Block-to-DataNode locations are not persisted; the NameNode rebuilds them from
DataNode block reports.)
• Journal - modification log that records changes made to the image.
• Captures metadata modifications such as file creations, deletions, and updates.
• Enhances durability and provides a way to replay changes to the namespace in case of failures.
CheckpointNode
• CheckpointNode periodically combines the existing checkpoint and journal to create a new
checkpoint and an empty journal
• Runs on a different host from the NameNode
• Downloads the current checkpoint and journal files from the NameNode, merges them locally, and
returns the new checkpoint back to the NameNode.
BackupNode
• In-memory, up-to-date image of the file system namespace that is always synchronized with the state
of the NameNode
• Accepts the journal stream of namespace transactions from the active NameNode, saves them to its
own storage directories, and applies these transactions to its own namespace image in memory.
• BackupNode can create a checkpoint without downloading checkpoint and journal files from the
active NameNode
• Acts as a read-only NameNode.
Upgrades
• HDFS supports rolling upgrades.
• During a rolling upgrade, individual components such as NameNodes and DataNodes can be
upgraded one at a time while the cluster continues to operate with the remaining nodes, ensuring
minimal downtime.
• Upgrading HDFS involves ensuring compatibility between the new software version and the existing
components in the cluster.
Point in Time Recovery
• Snapshots are read-only copies of the file system at a specific moment, preserving the file and
directory structure as it existed when the snapshot was taken.
• Snapshots serve as valuable tools for point-in-time recovery and backup strategies. Administrators
can revert the file system to a specific snapshot in case of accidental data deletion or corruption.
• By periodically creating snapshots, organizations can establish a backup strategy that allows them to
recover data to a known and reliable state, enhancing data protection and disaster recovery
capabilities.
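A minimal sketch of taking a snapshot from the Java client API, assuming an administrator has already marked the directory snapshottable (e.g. with hdfs dfsadmin -allowSnapshot); the directory and snapshot name are illustrative.

// Sketch: create a read-only, point-in-time snapshot of a directory.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/user/demo"); // must already be snapshottable

        Path snapshot = fs.createSnapshot(dir, "before-cleanup");
        System.out.println("Snapshot created at " + snapshot);

        // Files remain readable under the hidden .snapshot path, e.g.
        // /user/demo/.snapshot/before-cleanup/data.bin, and can be copied
        // back to recover from accidental deletion.
    }
}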
FILE I/O OPERATIONS AND REPLICA MANAGEMENT
File I/O Operations
READ -
• HDFS allows clients to read data from files stored in the system. Clients first contact the NameNode
to obtain the locations of data blocks.
• Data blocks are read from the nearest DataNodes, facilitating efficient and parallelized reading of
large files. HDFS supports streaming reads, enabling sequential access to data.
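A minimal read sketch under these assumptions: opening a file consults the NameNode for block locations, and the returned stream then pulls bytes from nearby DataNodes; the path is hypothetical.

// Sketch: streaming read of a text file from HDFS.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ReadExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataInputStream in = fs.open(new Path("/user/demo/input.txt"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}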
WRITE -
• When writing data to HDFS, clients request the NameNode to nominate a set of DataNodes to host
block replicas. Clients then write data in a pipeline fashion to these selected DataNodes.
• HDFS provides fault tolerance by replicating data blocks across multiple DataNodes. The client
ensures that multiple copies of the data are stored for reliability.
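A corresponding write sketch: create() asks the NameNode to allocate blocks and nominate DataNodes, and the client streams data through the replication pipeline; closing the stream waits for the pipeline acknowledgements. The path is again hypothetical.

// Sketch: write a file through the HDFS replication pipeline.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class WriteExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataOutputStream out =
                 fs.create(new Path("/user/demo/output.txt"), /* overwrite */ true)) {
            out.write("written through the HDFS pipeline\n"
                    .getBytes(StandardCharsets.UTF_8));
        } // close() flushes the last packet and waits for pipeline acks
    }
}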
Append Operations:
• HDFS supports append operations, allowing clients to append data to existing files. Clients can
append data to the end of a file without altering its existing content.
• Appending data is achieved by adding new blocks to the file. Clients interact with the NameNode to
locate suitable DataNodes for appending new data blocks.
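A minimal append sketch, assuming the target file already exists and the cluster permits appends (the default in current HDFS releases); the path is illustrative.

// Sketch: append a record to the end of an existing file.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class AppendExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataOutputStream out = fs.append(new Path("/user/demo/log.txt"))) {
            out.write("new record\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}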
File Deletion and Renaming:
• Clients can delete files and directories from HDFS by sending requests to the NameNode. Deletion
involves removing the metadata associated with the file and releasing the corresponding data blocks.
• HDFS also supports file renaming, which involves changing the path of a file or directory. Renaming
operations are atomic and do not involve physically moving data blocks.
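A sketch of both operations through the client API; the paths are hypothetical. rename() changes only the namespace entry on the NameNode, and delete(path, true) removes a directory tree recursively.

// Sketch: delete and rename are metadata operations handled by the NameNode.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteRenameExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Atomic rename: only the path in the namespace changes; no blocks move.
        boolean renamed = fs.rename(new Path("/user/demo/old.txt"),
                                    new Path("/user/demo/new.txt"));

        // Delete a directory tree; 'true' enables recursive deletion.
        boolean deleted = fs.delete(new Path("/user/demo/tmp"), true);

        System.out.println("renamed=" + renamed + " deleted=" + deleted);
    }
}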
Replica Management
• HDFS is rack-aware: with the default placement policy, the first replica is stored on the writer's
node, and the second and third replicas on two different nodes in a different rack, so a single rack
failure cannot destroy every replica of a block.
• If a DataNode or block becomes unavailable due to hardware failure or other issues, the system can
continue functioning by retrieving the data from the remaining replicas.
Thank You!
Ishu Gupta - 20114041