namenode
Fsimage: takes care of the metadata.
Edit files: take care of the change logs (edits made to the metadata).
Block reporting:
Every data node has to send a block mapping report to the name node. At start-up the block report is sent to the name node within 3 minutes; after that the report is updated every 6 hours. If the number of blocks is large, the report is split across multiple heartbeats.
Data node failure:
Heartbeat information tells the name node whether a data node is dead or alive. A dead data node is never used for placing replicas. Since a minimum of 3 replicas must exist, if any data node fails the name node creates a new replica on a data node that is still alive. If a data node does not send heartbeat information to the name node within the timeout interval (it is marked stale after 30 seconds by default, and declared dead after a longer timeout), the name node treats it as dead. A dead data node forces the name node to replicate its data on other data nodes.
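To see which data nodes are currently live or dead, the admin report can be used (a hedged example; it assumes HDFS admin privileges on the cluster):
hdfs dfsadmin -report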
User-oriented commands
HDFS
hdfs dfs -command [args]   // dfs is the HDFS shell
-put            put a file from local to HDFS
-get            get a file from HDFS to local
-copyFromLocal  alias of -put
-copyToLocal    alias of -get
-mv             move a file within HDFS
hdfs dfs -mkdir mydata    // create a folder in the HDFS home directory
hdfs dfs -put file.txt mydata/
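To copy the file back from HDFS to the local file system (an illustrative example reusing the file put above):
hdfs dfs -get mydata/file.txt .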
HDFS file permissions:
r, w, x same as Linux.
HDFS home directories:
hdfs dfs -ls /user
Only the user can write or read data in his home directory.
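Permissions can be changed with the usual chmod-style modes, for example (an illustrative command; the path assumes the mydata folder created above):
hdfs dfs -chmod 700 mydata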
Day_3
WebHDFS
The curl command is used to perform WebHDFS operations.
WebHDFS is used by thin clients where Hadoop is not installed.
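A minimal sketch of a WebHDFS call with curl (the host name, port 50070 and file path are illustrative; the port depends on the Hadoop version and configuration):
curl -i -L "http://namenode:50070/webhdfs/v1/user/cloudera/file.txt?op=OPEN"
The -L option follows the redirect from the name node to the data node that actually serves the file.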
Options for data input:
Flume
Sqoop
Hadoop
How to build a Hadoop MapReduce application?
1) Use an IDE to create a Java project (Eclipse).
2) Before importing the source code into the above project, search it for package statements.
3) Create a package in the above Java project.
4) Import the source code into the above package(s).
5) If you get compilation errors, try to find the causes.
6) If the errors are because of dependencies, add the dependencies to the
classpath of the project.
7) Compile the Java project.
8) Create a library/jar out of the project.
9) Verify the jar.
10) Run the jar on the Hadoop cluster.
1) Open the IDE. Click on File→New→Java Project→give a name→select "Use default
JRE" (currently JDK 1.7, the 3rd option)→click Next→Finish.
2) To find the workspace on the terminal: select the project name→right
click→Properties→Location: /home/cloudera/workspace/projectname
3) Right-click on src→New→Package→give the name hdfs→Finish.
4) Check the file system: ls -lh /home/cloudera/workspace/projectname/src and you will see a new
folder created with the package name.
5) Run the command to copy the file:
cp ~/hdp/pigandhive/labs/lab1.2/HDFS_API/InputCounties.java
/home/cloudera/workspace/projectname/src/hdfs/
6) Go back to Eclipse. Initially you will not see any changes in Eclipse, so to reflect the
changes right-click→Refresh (F5 key); now you will see the .java file
added in the package.
7) Right-click the project name→Build Path→Configure Build Path→click the Libraries
tab→Add External JARs→/usr/lib/hadoop/client/→select all jars→click
OK→again click OK.
8) Go to the terminal and verify that the .class file is there in the bin folder.
9) Create the jar (see the example after this list).
10) yarn jar jarfile_name(path) fullyqualifiedclass_name inputfile result
(As the jar is not an executable jar, we need to provide the fully qualified class name.)
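A sketch of how steps 8-10 typically look on the command line (the jar name, class name and input/output paths are illustrative and assume the hdfs package and the WordCount classes used later in these notes):
ls bin/hdfs/                                   # verify the .class files are there
jar cvf wordcount.jar -C bin .                 # create the jar from the compiled classes
jar tf wordcount.jar                           # verify the jar contents
yarn jar wordcount.jar hdfs.WordCountJob constitution.txt wordcount_op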
MapReduce:
MapReduce is an ETL framework.
A file ingested into HDFS is broken down into blocks.
When a MapReduce job runs, map tasks process the input of the job, with a map task
assigned to each block of input.
The data is processed in parallel by each map task.
Map tasks produce data in key/value format.
Shuffle-sort phase: data with the same key is combined and sent to the reducer.
So the output of shuffle and sort is sent to the reducer as input; the reducer generates the final output.
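As an illustration of this flow (a made-up two-line input, just to show the key/value shapes):
Input lines:        "the cat"  and  "the dog"
Map output:         (the,1) (cat,1) (the,1) (dog,1)
After shuffle/sort: (cat,[1]) (dog,[1]) (the,[1,1])
Reduce output:      (cat,1) (dog,1) (the,2)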
Ex: word_count
1) Load constitution.txt into the staging area.
2) Run the command:
yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount
constitution.txt wordcount_op
(The wordcount class is already present in the examples jar, so there is no need to create it; just run the above command.)
3) hdfs dfs -ls -R to check that the output folder wordcount_op was created.
4) hdfs dfs -cat wordcount_op/part-r-00000 to see the reducer output.
Mapper is a Java class:
// imports needed: java.io.IOException, org.apache.hadoop.io.*, org.apache.hadoop.mapreduce.Mapper, org.apache.hadoop.mapreduce.Reducer
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
    // need to override the map method of the Mapper class
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException { /* emit (word, 1) for each word */ }
}
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
    // need to override the reduce method of the Reducer class
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException { /* sum the counts for each word */ }
}
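A fuller sketch of what these two classes typically contain for word count (a minimal illustration assuming whitespace tokenization and the hdfs package created earlier; it is not the exact lab solution):

// WordCountMapper.java
package hdfs;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // split the input line on whitespace and emit (word, 1) for each word
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// WordCountReducer.java (a separate file in the same package)
package hdfs;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // sum all the 1s that shuffle/sort grouped under this word
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum));
    }
}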
WordCountJob class
As a developer you cannot instantiate the Mapper or Reducer class yourself; that responsibility is
taken by the Job class. Job is responsible for instantiating the Mapper and
Reducer.
Job job = Job.getInstance(getConf(), "WordCountJob");
This line tells the Job class to gather the cluster configuration (including how to reach the name node).
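A minimal sketch of the full WordCountJob driver (assuming the Tool/Configured pattern, which is where getConf() comes from, and the WordCountMapper/WordCountReducer classes above; the class and path names are illustrative):

// WordCountJob.java
package hdfs;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Job gathers the cluster configuration and instantiates the Mapper/Reducer
        Job job = Job.getInstance(getConf(), "WordCountJob");
        job.setJarByClass(WordCountJob.class);

        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // key/value types emitted by the mapper and the reducer
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // input file and output folder come from the command line, e.g.
        // yarn jar wordcount.jar hdfs.WordCountJob constitution.txt wordcount_op
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCountJob(), args));
    }
}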