Notes Big Data Day2 2024

The document outlines the functionality of Hadoop's NameNode and DataNode, including metadata management through Fsimage and Editfiles, as well as the process of block reporting and handling DataNode failures. It also details user-oriented HDFS commands, the use of WebHDFS for thin clients, and the steps to build and run a Hadoop MapReduce application. Additionally, it describes the MapReduce framework, including the roles of Mapper and Reducer classes, and provides an example of a word count application.


NameNode

Fsimage: stores a snapshot of the file system metadata.


Edit files: store the change log, i.e. the modifications made to the metadata since the last Fsimage checkpoint.

Block reporting: every DataNode has to send a block report (the list of block replicas it holds) to the
NameNode. At startup the block report is sent to the NameNode within about 3 minutes, and thereafter
the report is refreshed every 6 hours. If the number of blocks is large, the report is split across
multiple heartbeats.
DataNode failure:
Heartbeat information tells the NameNode whether a DataNode is dead or alive. A dead DataNode is never
used for placing replicas. Since the configured number of replicas (3 by default) must be maintained,
if any DataNode fails the NameNode creates new replicas on DataNodes that are alive. If a DataNode does
not send heartbeat information to the NameNode for 30 sec (default interval), it is declared dead. A dead
DataNode thus forces the NameNode to re-replicate its data on other DataNodes.

User Oriented commands


HDFS

hdfs dfs -command [args] // dfs is the HDFS shell

-put            put a file from local to HDFS
-get            get a file from HDFS to local
-copyFromLocal  alias of -put
-copyToLocal    alias of -get
-mv             move a file within HDFS

hdfs dfs -mkdir mydata // create a folder in the HDFS home directory


hdfs dfs -put file.txt mydata/

HDFS file permissions:


r, w, x, same as in Linux

HDFS home directories:


hdfs dfs -ls /user

Only the owning user can read or write data in his home directory.
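
These shell commands have programmatic equivalents in the HDFS Java API (the API used in the HDFS_API lab below). A minimal sketch, assuming the cluster's Hadoop configuration files are on the classpath and a local file.txt exists (the class name HdfsPutExample and the paths are only illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up the cluster settings (core-site.xml etc.) from the classpath
        FileSystem fs = FileSystem.get(conf);

        fs.mkdirs(new Path("mydata"));                                   // hdfs dfs -mkdir mydata
        fs.copyFromLocalFile(new Path("file.txt"), new Path("mydata/")); // hdfs dfs -put file.txt mydata/

        for (FileStatus status : fs.listStatus(new Path("/user"))) {     // hdfs dfs -ls /user
            System.out.println(status.getPath());
        }
        fs.close();
    }
}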
Day_3

WebHDFS

The curl command is used to perform WebHDFS operations.


WebHDFS is used by thin clients on which Hadoop is not installed.
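For example, a file can be read over plain HTTP with curl (the host name, the port 50070 and the path /user/cloudera/file.txt below are placeholders; the WebHDFS port differs between Hadoop versions, and a &user.name= parameter may be needed on unsecured clusters):

curl -L "http://namenode-host:50070/webhdfs/v1/user/cloudera/file.txt?op=OPEN"

The -L option follows the redirect from the NameNode to the DataNode that actually serves the file.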
Options for data input:
Flume
Sqoop
Hadoop
How to build a Hadoop MapReduce application?
1) Use an IDE (e.g. Eclipse) to create a Java project.
2) Before importing the source code into the above project, check its package statements.
3) Create a package in the above Java project.
4) Import the source code into the above package(s).
5) If you get compilation errors, try to find the causes.
6) If the errors are caused by missing dependencies, add the dependencies to the
classpath of the project.
7) Compile the Java project.
8) Create a library/jar out of the project.
9) Verify the jar.
10) Run the jar on the Hadoop cluster.

1) Open the IDE, click File→New→Java Project→give a name→select "Use default
JRE" (currently JDK 1.7, the 3rd option)→click Next→Finish.
2) To find the workspace path on the terminal: select projectname→right
click→Properties→Location: /home/cloudera/workspace/projectname
3) Right click on src→New→Package→give the name hdfs→Finish.
4) Check the file system with ls -lh /home/cloudera/workspace/projectname/src; you will see a new
folder created with the package name.
5) Run a command to copy the source file:
cp ~/hdp/pigandhive/labs/lab1.2/HDFS_API/InputCounties.java
/home/cloudera/workspace/projectname/src/hdfs/
6) Go back to Eclipse. Initially you will not see any changes in Eclipse, so to reflect the
changes right click→Refresh (F5 key); now you will see the .java file
added in the package.
7) Right click on projectname→Build Path→Configure Build Path→click on the Libraries
tab→Add External JARs→/usr/lib/hadoop/client/ (select all jars)→click
OK→click OK again.
8) Go to the terminal and verify whether the .class file is in the bin folder.
9) Create the jar (see the example commands just after this list).
10) yarn jar jarfile_name(path) fully_qualified_class_name inputfile result
(As the jar is not an executable jar, we need to provide the fully qualified class name.)
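
As an illustration of steps 9 and 10 only, the commands might look like this (the jar name projectname.jar, the class hdfs.InputCounties, the input file and the result directory are placeholders following the lab above):

cd /home/cloudera/workspace/projectname
jar -cvf projectname.jar -C bin .
yarn jar projectname.jar hdfs.InputCounties inputfile result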

MapReduce:
MapReduce is an ETL framework.
A file ingested into HDFS is
broken down into blocks.
When a MapReduce job runs, map tasks process the input of the job, with a map task
assigned to each input split (block).
Each map task processes its data in parallel
and produces output in key-value format.
Shuffle and sort: data with the same key is combined and sent to a reducer,
so the output of shuffle and sort becomes the reducer's input; the reduce phase generates the final output.
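For example, with word count as the job: if a map task reads the line "to be or not to be", it emits (to,1), (be,1), (or,1), (not,1), (to,1), (be,1); shuffle and sort groups these by key into (be,[1,1]), (not,[1]), (or,[1]), (to,[1,1]); and the reducer sums each list to produce the final output be 2, not 1, or 1, to 2.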

Example: word_count
1) Load constitution.txt into the staging area.
2) Run the command:
yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount
constitution.txt wordcount_op
(The wordcount class is already present in the examples jar, so there is no need to create it; just run the above command.)
3) hdfs dfs -ls -R to check that the output folder wordcount_op was created.
4) hdfs dfs -cat wordcount_op/part-r-00000 to see the reducer output.

The Mapper is a Java class:


public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // need to override the map method of the Mapper class
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // emit (word, 1) for every word in the input line
        for (String word : value.toString().split("\\s+")) {
            if (!word.isEmpty()) context.write(new Text(word), new IntWritable(1));
        }
    }
}

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // need to override the reduce method of the Reducer class
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // sum the counts for this word and emit (word, total)
        int sum = 0;
        for (IntWritable count : values) sum += count.get();
        context.write(key, new IntWritable(sum));
    }
}

WordCountJob class
As a developer you do not instantiate the Mapper or Reducer classes yourself; that responsibility is taken by the
Job class. The Job is responsible for instantiating the Mapper and
Reducer.

Job job = Job.getInstance(getConf(), "WordCountJob");

This line tells the Job class to gather the configuration of the cluster (including the NameNode) for the job.
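
A minimal sketch of the full driver class, assuming the WordCountMapper and WordCountReducer shown above sit in the same package and that the input and output paths are passed on the command line (everything not shown in the notes above is only illustrative):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // gather the cluster configuration and give the job a name
        Job job = Job.getInstance(getConf(), "WordCountJob");
        job.setJarByClass(WordCountJob.class);

        // tell the job which Mapper and Reducer classes to instantiate
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // key/value types written by the reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // input file and output directory come from the command line
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCountJob(), args));
    }
}

It would then be packaged into a jar and run as in step 10 above, e.g. yarn jar projectname.jar hdfs.WordCountJob constitution.txt wordcount_op (the jar name and package are placeholders).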
