Hdfs Cartoon
Uploaded by
srishasticvijayakumar
THE CAST
HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

CLIENT: People sit in front of me and ask me to read and write data.
NAMENODE: There is only ONE of me (and I coordinate everything around here).
DATANODES: We store data. There are MANY of us, sometimes even thousands!

WRITING DATA IN HDFS CLUSTER

REQUEST FROM USER
Let's start with writing some data.
User: Mr. Client, please write 200 MB data for me.
Client: Aren't you forgetting something?
User: Ah yes, please: (a) divide the data in 128 MB blocks, (b) copy each block to three places.

BLOCK AND REPLICATION
Client: A good client always knows these two things:
BLOCKSIZE: a large file is divided in blocks (usually 64 or 128 MB).
REPLICATION FACTOR: each block is stored in multiple locations (usually 3).

DIVIDE FILE INTO BLOCKS
Client: First, I divide the file into blocks. Let's work on the first block first.

ASK NAMENODE
Client: Mr. Namenode, please help me write a 128 MB block with replication of 3.

NAMENODE ASSIGNS DATANODES
Namenode: Replication 3... I need to find 3 datanodes. (How do I do that? I will tell you some other time.)
Namenode: Here you go, buddy. Addresses of three datanodes. I have also sorted them in increasing distance from you: Datanode 1, Datanode 2, Datanode 3.

CLIENT STARTS WRITING DATA
Client: I send my data (and the list) to the first datanode only.
Datanode: I store the data in hard drive, and WHILE I am receiving data, I forward the same data to the next datanode.

© Maneesh Varshney.
[email protected]
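The write steps above (divide the file into blocks, then push each block through a chain of datanodes) can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `split_into_blocks`, `ToyDatanode`, and `write_block` are hypothetical names invented for illustration and are not part of any Hadoop API, and the toy datanode forwards a block after storing it, whereas the comic's datanodes forward while they are still receiving.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return the size in bytes of each block for a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    # Every block is full-sized except possibly the last one.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


class ToyDatanode:
    """Hypothetical in-memory stand-in for a datanode."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> data

    def receive(self, block_id, data, downstream):
        # Store the data locally, then forward it to the next datanode
        # in the list (simplification: the real pipeline streams).
        self.blocks[block_id] = data
        if downstream:
            downstream[0].receive(block_id, data, downstream[1:])


def write_block(block_id, data, pipeline):
    """Client side: send the data and the rest of the list to the first datanode only."""
    pipeline[0].receive(block_id, data, pipeline[1:])


# The user's 200 MB file becomes one 128 MB block and one 72 MB block.
sizes = split_into_blocks(200 * MB)
print([size // MB for size in sizes])  # [128, 72]

# One block travels down a pipeline of three datanodes.
datanodes = [ToyDatanode("Datanode 1"), ToyDatanode("Datanode 2"), ToyDatanode("Datanode 3")]
write_block("block-1", b"some data", datanodes)
print([dn.name for dn in datanodes if "block-1" in dn.blocks])
```

The client touches only the first datanode; replication is a side effect of each datanode passing the block along, which is why the comic calls it a pipeline.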
REPLICATION PIPELINE
TA..DA...

INFORM NAMENODE WHEN DONE
Datanode: Once all data (for this block) is written to hard disk, I send DONE to the namenode.
Namenode: Block successfully stored and replicated in HDFS.
Client: When I am done with a block, I repeat the same steps for the remaining blocks.

WHEN ALL BLOCKS ARE WRITTEN
Client: All blocks written, please close file.
Namenode: NOW I store all meta information in persistent storage (hard disks).

RECAP
We stored data via the Replication Pipeline.

READING DATA IN HDFS CLUSTER

REQUEST FROM USER
We have written data in HDFS. Check! What about reading them? Let's ask the client again.
User: Mr. Client, please read this file.

CONTACT NAMENODE FIRST
Client: Please give me info on this file.
Namenode: Here is (a) the list of all blocks for this file, and (b) the list of datanodes for each block (sorted by distance from the client):
Block 1: at DN x1, y1, z1
Block 2: at DN x2, y2, z2
Block 3: at DN x3, y3, z3
and so on.
Client: So I download each block in turn.

DOWNLOAD DATA
Client: I download data from the nearest datanode (the first in the list).
Client: Please give me block n.
Datanode: DATA for block n.
Umm. Question: what happens when the datanode is dead, or does not have the data, or the data is corrupted?
Actually, HDFS can very elegantly handle these faults and more, as we will see next.

FAULT TOLERANCE IN HDFS. PART I: TYPES OF FAULTS AND THEIR DETECTION

There are typically three kinds of faults.

FAULT I: NODE FAILURE
The first is NODE FAILURE. ("Goodbye, cruel world.")

FAULT II: COMMUNICATION FAILURE
Second is COMMUNICATION FAILURE (cannot send and receive data).

FAULT III: DATA CORRUPTION
Third is DATA CORRUPTION. Data can be corrupted while it is sent over the network, or while it is stored on hard disks.

DETECTION #1: NODE FAILURES
DETECTING DATANODE FAILURE
NOTE: If the Namenode is dead, the entire cluster is dead!
The Namenode is the SINGLE POINT OF FAILURE. Instead, let's focus on how datanode failures are detected.
Datanodes: We send a HEARTBEAT message every 3 seconds. This is our way of saying we are alive.
Namenode: If I don't get a message in 10 minutes, the datanode is dead to me. (It may be ALIVE and there was only a network failure, but the namenode treats both as the same.)

DETECTION #2: NETWORK FAILURES
Whenever data is sent, an ACK is replied by the receiver.
If the ACK is not received (after several retries), the sender assumes that the host is dead, or the network has failed.

DETECTION #3: CORRUPTED DATA
A checksum is sent along with the transmitted data: [data] [checksum].

DETECTING CORRUPTED HARD DRIVES
Datanode: When I store data on hard disks, I also store the checksum.
Datanode: Periodically, I send a block report to the namenode: the list of all the blocks I have.
Datanode: Before sending the block report, I check if the checksums are OK. I don't send info for blocks that are corrupted.
Datanode: I have four blocks.
Namenode: [...] blocks... so one is corrupted.

RECAP: HEARTBEAT MESSAGES AND BLOCK REPORTS
Datanodes: We send heartbeats every 3 seconds to say we are alive. We send block reports, and we skip blocks that are corrupted (which is how the namenode will know which blocks are lost).

FAULT TOLERANCE IN HDFS. PART II: HANDLING READING AND WRITING FAILURES

HANDLING WRITE FAILURES
Client: One thing I should have said earlier: I write the block in smaller data units (usually 64 KB) called packets.
Client: Moreover, each datanode replies back with an ACK for each packet to confirm that it got it. So, if I don't get ACKs from some datanode, I know it is dead.
Client: Remember the replication pipeline? Here's the adjusted pipeline. Note that the block will be under-replicated, but the namenode will take care of it.

HANDLING READ FAILURES
Client: Remember, when I asked for the location of a block, the namenode gave me the locations of all its datanodes: DN 1, DN 2, DN 3.
Client: If one datanode is dead, I read from the others in the list.

FAULT TOLERANCE IN HDFS.
PART III: HANDLING DATANODE FAILURES

Namenode: First, I must tell you about the two tables I keep.
List of Blocks:
Block 1 - stored at DN1, DN2, DN3
Block 2 - stored at DN1, DN4, DN5
List of Datanodes
Namenode: I continuously update these two tables. If I find that a block on a datanode is corrupted, I update the first table (by removing the bad DN from the block's list). And if I find that a datanode has died, I update both tables.

UNDER REPLICATED BLOCKS
Namenode: I scan the first list (list of blocks) periodically, and see if there are blocks that are not replicated properly.
Namenode: For all under-replicated blocks, I ask other datanodes to copy them from datanodes that have the replica. Like so:
Namenode: Could you copy the block from that datanode?
Datanode: Hey, I need to copy a block from you.
Datanode: Here you go.
Umm, one more question: if every datanode holding a replica of a block fails, is the block lost?
That's correct. HDFS cannot guarantee that at least one replica will always survive. But it tries its best by smartly selecting replica locations, as we will see next.

REPLICA PLACEMENT STRATEGY
Namenode: Remember I promised to tell you how I select datanode locations for storing the replicas of a block?

RACKS AND DATANODES
The cluster is divided into RACKS (Rack 1, Rack 2, Rack 3). Each rack has multiple datanodes.

SELECTING FIRST REPLICA LOCATION
If the writer is a member of the cluster, it is selected as the first replica. Otherwise some random datanode is selected.

NEXT TWO REPLICA LOCATIONS
Pick a different rack than the first replica's. Select two different datanodes on that rack for the next two replicas.

SUBSEQUENT REPLICA LOCATIONS
Pick any random datanode, if it satisfies these two conditions:
- only one replica per datanode
- at most two replicas per rack
Sometimes those two conditions cannot be satisfied, in which case they are .. ahem .. ignored (convenient, eh?).
Also, HDFS allows you to use your own placement algorithm. So if you know a better algorithm, don't be shy now.

Namenode: I do a lot of other things as well...
Read more about me at websites and books. Or, best of all, install and run HDFS.

IN OUR NEXT COMICS
Read about map reduce.

THE END
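The heartbeat rule from Part I and the two tables from Part III can be sketched together as a small Python model. This is an illustration under stated assumptions, not how the real namenode is implemented: `ToyNamenode` and its methods are hypothetical names, time is passed in as plain seconds, and only the 3 second heartbeat interval, the 10 minute timeout, and the replication factor of 3 come from the comic.

```python
HEARTBEAT_INTERVAL = 3   # seconds between heartbeats, as stated in the comic
DEAD_AFTER = 10 * 60     # seconds of silence before a datanode is treated as dead


class ToyNamenode:
    """Hypothetical model of the namenode's two tables plus heartbeat tracking."""

    def __init__(self, replication=3):
        self.replication = replication
        self.block_locations = {}  # list of blocks: block id -> set of datanodes
        self.datanode_blocks = {}  # list of datanodes: datanode -> set of block ids
        self.last_heartbeat = {}   # datanode -> time of the last heartbeat

    def heartbeat(self, datanode, now):
        self.last_heartbeat[datanode] = now
        self.datanode_blocks.setdefault(datanode, set())

    def block_report(self, datanode, healthy_blocks):
        # The datanode reports only blocks whose checksums are OK, so any
        # block missing from the report is removed from its location list.
        old = self.datanode_blocks.get(datanode, set())
        new = set(healthy_blocks)
        for block in old - new:
            self.block_locations.get(block, set()).discard(datanode)
        for block in new:
            self.block_locations.setdefault(block, set()).add(datanode)
        self.datanode_blocks[datanode] = new

    def expire_dead(self, now):
        """Drop silent datanodes from both tables and return their names."""
        dead = sorted(dn for dn, seen in self.last_heartbeat.items()
                      if now - seen > DEAD_AFTER)
        for dn in dead:
            for block in self.datanode_blocks.pop(dn, set()):
                self.block_locations[block].discard(dn)
            del self.last_heartbeat[dn]
        return dead

    def under_replicated(self):
        """Blocks with fewer live replicas than the replication factor."""
        return sorted(block for block, dns in self.block_locations.items()
                      if len(dns) < self.replication)


namenode = ToyNamenode()
for dn in ("DN1", "DN2", "DN3"):
    namenode.heartbeat(dn, now=0)
    namenode.block_report(dn, ["Block 1"])

# DN3 goes silent; the other two keep sending heartbeats.
namenode.heartbeat("DN1", now=600)
namenode.heartbeat("DN2", now=600)
print(namenode.expire_dead(now=601))  # ['DN3']
print(namenode.under_replicated())    # ['Block 1']
```

The under-replicated list is what the namenode would then act on by asking another datanode to copy the block from a surviving replica.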