Hadoop
Installation on Ubuntu 16.04 LTS
Update
fdp17@fdp17-Veriton-M200-H81:~$ sudo apt-get update
Install JDK
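Hadoop requires a working JDK. A minimal sketch, assuming the OpenJDK 8 build that the later JAVA_HOME path (java-8-openjdk-amd64) points to:
$ sudo apt-get install openjdk-8-jdk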
Check Version
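The installed Java version can then be confirmed with:
$ java -version
$ javac -version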
Install SSH
fdp17@fdp17-Veriton-M200-H81:~$ sudo apt-get install ssh
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
ncurses-term openssh-client openssh-server openssh-sftp-server ssh-import-id
Suggested packages:
ssh-askpass libpam-ssh keychain monkeysphere rssh molly-guard
The following NEW packages will be installed:
ncurses-term openssh-server openssh-sftp-server ssh ssh-import-id
The following packages will be upgraded:
openssh-client
1 upgraded, 5 newly installed, 0 to remove and 178 not upgraded.
Need to get 1,230 kB of archives.
After this operation, 5,244 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 openssh-client amd64 1:7.2p2-
4ubuntu2.2 [587 kB]
Get:2 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 openssh-sftp-server amd64 1:7.2p2-
4ubuntu2.2 [38.7 kB]
Get:3 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 openssh-server amd64 1:7.2p2-
4ubuntu2.2 [338 kB]
Get:4 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 ssh all 1:7.2p2-4ubuntu2.2 [7,076 B]
Get:5 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 ncurses-term all 6.0+20160213-1ubuntu1 [249
kB]
Get:6 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 ssh-import-id all 5.5-0ubuntu1 [10.2 kB]
Fetched 1,230 kB in 2s (583 kB/s)
Preconfiguring packages ...
(Reading database ... 188613 files and directories currently installed.)
Preparing to unpack .../openssh-client_1%3a7.2p2-4ubuntu2.2_amd64.deb ...
Unpacking openssh-client (1:7.2p2-4ubuntu2.2) over (1:7.2p2-4ubuntu2.1) ...
Selecting previously unselected package openssh-sftp-server.
Preparing to unpack .../openssh-sftp-server_1%3a7.2p2-4ubuntu2.2_amd64.deb ...
Unpacking openssh-sftp-server (1:7.2p2-4ubuntu2.2) ...
Selecting previously unselected package openssh-server.
Preparing to unpack .../openssh-server_1%3a7.2p2-4ubuntu2.2_amd64.deb ...
Unpacking openssh-server (1:7.2p2-4ubuntu2.2) ...
Selecting previously unselected package ssh.
Preparing to unpack .../ssh_1%3a7.2p2-4ubuntu2.2_all.deb ...
Unpacking ssh (1:7.2p2-4ubuntu2.2) ...
Selecting previously unselected package ncurses-term.
Preparing to unpack .../ncurses-term_6.0+20160213-1ubuntu1_all.deb ...
Unpacking ncurses-term (6.0+20160213-1ubuntu1) ...
Selecting previously unselected package ssh-import-id.
Preparing to unpack .../ssh-import-id_5.5-0ubuntu1_all.deb ...
Unpacking ssh-import-id (5.5-0ubuntu1) ...
Processing triggers for man-db (2.7.5-1) ...
Processing triggers for ufw (0.35-0ubuntu2) ...
Processing triggers for systemd (229-4ubuntu16) ...
Processing triggers for ureadahead (0.100.0-19) ...
ureadahead will be reprofiled on next reboot
Setting up openssh-client (1:7.2p2-4ubuntu2.2) ...
Setting up openssh-sftp-server (1:7.2p2-4ubuntu2.2) ...
Setting up openssh-server (1:7.2p2-4ubuntu2.2) ...
Creating SSH2 RSA key; this may take some time ...
2048 SHA256:ENIl49vMNmyHFQMWhQ+7wfyERkQOA6XUx3TpTVzBkgk root@fdp17-Veriton-M200-
H81 (RSA)
Creating SSH2 DSA key; this may take some time ...
1024 SHA256:m8uM/6fhMPV7Ac0+4ROrlQcR36TA5tbT07/OKd7Sv3o root@fdp17-Veriton-M200-H81
(DSA)
Creating SSH2 ECDSA key; this may take some time ...
256 SHA256:x+7TNccRUWPACHLzqvB8dfQ99i7/QzGY8lkE2G1bDHM root@fdp17-Veriton-M200-H81
(ECDSA)
Creating SSH2 ED25519 key; this may take some time ...
256 SHA256:SYNVzUtPB8yy3U01cxQ7OfKZ6Wi7i5hcEpzdXEx6K5Q root@fdp17-Veriton-M200-H81
(ED25519)
Setting up ssh (1:7.2p2-4ubuntu2.2) ...
Setting up ncurses-term (6.0+20160213-1ubuntu1) ...
Setting up ssh-import-id (5.5-0ubuntu1) ...
Processing triggers for systemd (229-4ubuntu16) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for ufw (0.35-0ubuntu2) ...
fdp17@fdp17-Veriton-M200-H81:~$ su hduser
Password:
hduser@fdp17-Veriton-M200-H81:/home/fdp17$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:/xOGOuWDb/rGI1l07EQq8b2siNTQcTmpfDyYNLAPeKU hduser@fdp17-Veriton-M200-H81
The key's randomart image is:
+---[RSA 2048]----+
| ... o |
| . += = . |
| . E+ @ * |
| ..oB B = |
| oS+ B . |
| . ..+ * |
| . . X.o . |
| . B O.. |
| .Ooo.. |
+----[SHA256]-----+
Key Transfer
hduser@fdp17-Veriton-M200-H81:/home/fdp17$ cat /home/hduser/.ssh/id_rsa.pub >>
/home/hduser/.ssh/authorized_keys
The second command adds the newly created key to the list of authorized keys so that Hadoop can
use ssh without prompting for a password.
We can check if ssh works:
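A minimal check, assuming the sshd service installed above is running, is to log in to the local machine as hduser; the Ubuntu login banner below should then appear without a password prompt:
$ ssh localhost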
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Check Java
hduser@fdp17-Veriton-M200-H81:~$ update-alternatives --config java
There is only one alternative in link group java (providing /usr/bin/java): /usr/lib/jvm/java-8-openjdk-
amd64/jre/bin/java
Nothing to configure
2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
We need to set JAVA_HOME by modifying the hadoop-env.sh file.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Adding the above statement to the hadoop-env.sh file ensures that the value of the JAVA_HOME
variable is available to Hadoop whenever it starts up.
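If the JDK path on a given machine is not known, one common way to derive it is to resolve the java binary and strip the trailing jre/bin/java component (a sketch; the exact path depends on the installed JDK):
$ readlink -f /usr/bin/java | sed 's:/jre/bin/java::'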
3. /usr/local/hadoop/etc/hadoop/core-site.xml:
The /usr/local/hadoop/etc/hadoop/core-site.xml file contains configuration properties that Hadoop
uses when starting up.
This file can be used to override the default settings that Hadoop starts with.
hduser@laptop:~$ sudo mkdir -p /app/hadoop/tmp
hduser@laptop:~$ sudo chown hduser:hadoop /app/hadoop/tmp
Open the file and enter the following in between the <configuration></configuration> tag:
hduser@laptop:~$ vi /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
4. /usr/local/hadoop/etc/hadoop/mapred-site.xml
By default, the /usr/local/hadoop/etc/hadoop/ folder contains
/usr/local/hadoop/etc/hadoop/mapred-site.xml.template
file which has to be renamed/copied with the name mapred-site.xml:
hduser@laptop:~$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
/usr/local/hadoop/etc/hadoop/mapred-site.xml
The mapred-site.xml file is used to specify which framework is being used for MapReduce.
We need to enter the following content in between the <configuration></configuration> tag:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml
The /usr/local/hadoop/etc/hadoop/hdfs-site.xml file needs to be configured for each host in the
cluster that is being used.
It is used to specify the directories which will be used as the namenode and the datanode on that
host.
Before editing this file, we need to create two directories which will contain the namenode and the
datanode for this Hadoop installation.
This can be done using the following commands:
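A minimal sketch, reusing the hduser:hadoop ownership applied to /app/hadoop/tmp earlier and the paths configured in hdfs-site.xml below:
hduser@laptop:~$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
hduser@laptop:~$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
hduser@laptop:~$ sudo chown -R hduser:hadoop /usr/local/hadoop_store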
Open the file and enter the following content in between the <configuration></configuration> tag:
hduser@laptop:~$ nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
</configuration>
Format the New Hadoop Filesystem
Now, the Hadoop file system needs to be formatted so that we can start to use it. The format
command should be issued with write permission, since it creates the current directory
under the /usr/local/hadoop_store/hdfs/namenode folder:
hduser@laptop:~$ hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
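As the warning suggests, the equivalent non-deprecated form of the command is:
$ hdfs namenode -format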
Starting Hadoop
Now it is time to start the newly configured single-node cluster using the scripts in /usr/local/hadoop/sbin:
k@laptop:/usr/local/hadoop/sbin$ ls
distribute-exclude.sh start-all.cmd stop-balancer.sh
hadoop-daemon.sh start-all.sh stop-dfs.cmd
hadoop-daemons.sh start-balancer.sh stop-dfs.sh
hdfs-config.cmd start-dfs.cmd stop-secure-dns.sh
hdfs-config.sh start-dfs.sh stop-yarn.cmd
httpfs.sh start-secure-dns.sh stop-yarn.sh
kms.sh start-yarn.cmd yarn-daemon.sh
mr-jobhistory-daemon.sh start-yarn.sh yarn-daemons.sh
refresh-namenodes.sh stop-all.cmd
slaves.sh stop-all.sh
hduser@laptop:/usr/local/hadoop/sbin$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
15/04/18 16:43:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-
laptop.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-laptop.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-
secondarynamenode-laptop.out
15/04/18 16:43:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-
laptop.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-
laptop.out
We can check if it's really up and running:
hduser@laptop:/usr/local/hadoop/sbin$ jps
9026 NodeManager
7348 NameNode
9766 Jps
8887 ResourceManager
7507 DataNode
The output means that we now have a functional single-node instance of Hadoop running on our machine.
Another way to check is using netstat:
hduser@laptop:~$ netstat -plten | grep java
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 1001 1843372 10605/java
tcp 0 0 127.0.0.1:54310 0.0.0.0:* LISTEN 1001 1841277 10447/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1001 1841130 10895/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1001 1840196 10447/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1001 1841320 10605/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 1001 1841646 10605/java
tcp6 0 0 :::8040 :::* LISTEN 1001 1845543 11383/java
tcp6 0 0 :::8042 :::* LISTEN 1001 1845551 11383/java
tcp6 0 0 :::8088 :::* LISTEN 1001 1842110 11252/java
tcp6 0 0 :::49630 :::* LISTEN 1001 1845534 11383/java
tcp6 0 0 :::8030 :::* LISTEN 1001 1842036 11252/java
tcp6 0 0 :::8031 :::* LISTEN 1001 1842005 11252/java
tcp6 0 0 :::8032 :::* LISTEN 1001 1842100 11252/java
tcp6 0 0 :::8033 :::* LISTEN 1001 1842162 11252/java
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/
ClusterSetup.html#Hadoop_Startup
Stopping Hadoop
$ pwd
/usr/local/hadoop/sbin
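A minimal sketch using the scripts listed in the sbin directory above (stop-all.sh also works, but like start-all.sh it is deprecated in favour of the split scripts):
$ stop-dfs.sh
$ stop-yarn.sh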
To see the options for a particular example, add the example name to the examples-jar command (a sample invocation is sketched after the list). The following is a list of the available examples:
aggregatewordcount: An Aggregate-based map/reduce program that counts the words in the input
files.
aggregatewordhist: An Aggregate-based map/reduce program that computes the histogram of the
words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute the exact digits of pi.
dbcount: An example job that counts the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute the exact bits of pi.
grep: A map/reduce program that counts the matches to a regex in the input.
join: A job that effects a join over sorted, equally partitioned data sets.
sort: A map/reduce program that sorts the data written by the random writer.
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of
the words in the input files.
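A sample invocation of one of these examples, assuming the usual Hadoop 2.x layout under /usr/local/hadoop (the jar's version suffix varies by release, and input_dir/output_dir are placeholder HDFS paths):
$ cd /usr/local/hadoop
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount input_dir output_dir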
FINAL STEP 7: WordCount: Create a file called WordCount.java.
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class WordCount {

  // Mapper: tokenizes each input line and emits (word, 1) for every token.
  public static class Map extends MapReduceBase implements
      Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class Reduce extends MapReduceBase implements
      Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    // Configure the job using the old mapred API.
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    // args[0] = input path in HDFS, args[1] = output path in HDFS.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}
Compile:
$ javac WordCount.java -cp $(hadoop classpath)
The hadoop classpath command supplies the compiler with all the paths it needs to compile
correctly, and you should see the resulting WordCount*.class files appear in the directory.
Create Jar File:
jar cf wc.jar WordCount*.class
Create HDFS Directory
• /usr/local/Cellar/hadoop/input - input directory in HDFS
• /usr/local/Cellar/hadoop/output - output directory in HDFS
hdfs dfs -mkdir -p /usr/local/Cellar/hadoop/input
hdfs dfs -mkdir -p /usr/local/Cellar/hadoop/output
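If desired, the newly created directories can be confirmed with a listing:
hdfs dfs -ls /usr/local/Cellar/hadoop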
Create text file locally and move to HDFS directory
DOREENs-MacBook-Air:Cellar doreenrobin$ nano file01.txt
DOREENs-MacBook-Air:Cellar doreenrobin$ pwd
/usr/local/Cellar
DOREENs-MacBook-Air:Cellar doreenrobin$ ls
file01.txt
hadoop
DOREENs-MacBook-Air:Cellar doreenrobin$ hadoop fs -put /usr/local/Cellar/file01.txt /usr/local/Cellar/hadoop/input
17/04/03 14:32:22 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Running the MapReduce program
DOREENs-MacBook-Air:2.7.3 doreenrobin$ bin/hadoop jar wc.jar WordCount /usr/local/Cellar/hadoop/input /usr/local/Cellar/hadoop/output/file03
17/04/03 14:40:24 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
17/04/03 14:40:25 INFO Configuration.deprecation: session.id is deprecated. Instead,
use dfs.metrics.session-id
17/04/03 14:40:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
17/04/03 14:40:25 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with
processName=JobTracker, sessionId= - already initialized
17/04/03 14:40:25 WARN mapreduce.JobResourceUploader: Hadoop command-line
option parsing not performed. Implement the Tool interface and execute your
application with ToolRunner to remedy this.
17/04/03 14:40:26 INFO mapred.FileInputFormat: Total input paths to process : 2
17/04/03 14:40:26 INFO mapreduce.JobSubmitter: number of splits:2
17/04/03 14:40:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_local428776688_0001
17/04/03 14:40:26 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/04/03 14:40:26 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/04/03 14:40:26 INFO mapreduce.Job: Running job: job_local428776688_0001
17/04/03 14:40:26 INFO mapred.LocalJobRunner: OutputCommitter is
org.apache.hadoop.mapred.FileOutputCommitter
17/04/03 14:40:26 INFO output.FileOutputCommitter: File Output Committer
Algorithm version is 1
17/04/03 14:40:26 INFO mapred.LocalJobRunner: Waiting for map tasks
17/04/03 14:40:26 INFO mapred.LocalJobRunner: Starting task:
attempt_local428776688_0001_m_000000_0
17/04/03 14:40:26 INFO output.FileOutputCommitter: File Output Committer
Algorithm version is 1
17/04/03 14:40:26 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree
currently is supported only on Linux.
17/04/03 14:40:26 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
17/04/03 14:40:26 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/
usr/local/Cellar/hadoop/input/file02.txt:0+29
17/04/03 14:40:27 INFO mapred.MapTask: numReduceTasks: 1
17/04/03 14:40:27 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/04/03 14:40:27 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/04/03 14:40:27 INFO mapred.MapTask: soft limit at 83886080
17/04/03 14:40:27 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/04/03 14:40:27 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/04/03 14:40:27 INFO mapred.MapTask: Map output collector class =
org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/04/03 14:40:27 INFO mapred.LocalJobRunner:
17/04/03 14:40:27 INFO mapred.MapTask: Starting flush of map output
17/04/03 14:40:27 INFO mapred.MapTask: Spilling map output
17/04/03 14:40:27 INFO mapred.MapTask: bufstart = 0; bufend = 44; bufvoid =
104857600
17/04/03 14:40:27 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend =
26214384(104857536); length = 13/6553600
17/04/03 14:40:27 INFO mapred.MapTask: Finished spill 0
17/04/03 14:40:27 INFO mapred.Task:
Task:attempt_local428776688_0001_m_000000_0 is done. And is in the process of
committing
17/04/03 14:40:27 INFO mapred.LocalJobRunner: hdfs://localhost:9000/usr/local/
Cellar/hadoop/input/file02.txt:0+29
17/04/03 14:40:27 INFO mapred.Task: Task
'attempt_local428776688_0001_m_000000_0' done.
17/04/03 14:40:27 INFO mapred.LocalJobRunner: Finishing task:
attempt_local428776688_0001_m_000000_0
17/04/03 14:40:27 INFO mapred.LocalJobRunner: Starting task:
attempt_local428776688_0001_m_000001_0
17/04/03 14:40:27 INFO output.FileOutputCommitter: File Output Committer
Algorithm version is 1
17/04/03 14:40:27 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree
currently is supported only on Linux.
17/04/03 14:40:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
17/04/03 14:40:27 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/
usr/local/Cellar/hadoop/input/file01.txt:0+22
17/04/03 14:40:27 INFO mapred.MapTask: numReduceTasks: 1
17/04/03 14:40:27 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/04/03 14:40:27 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/04/03 14:40:27 INFO mapred.MapTask: soft limit at 83886080
17/04/03 14:40:27 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/04/03 14:40:27 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/04/03 14:40:27 INFO mapred.MapTask: Map output collector class =
org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/04/03 14:40:27 INFO mapred.LocalJobRunner:
17/04/03 14:40:27 INFO mapred.MapTask: Starting flush of map output
17/04/03 14:40:27 INFO mapred.MapTask: Spilling map output
17/04/03 14:40:27 INFO mapred.MapTask: bufstart = 0; bufend = 38; bufvoid =
104857600
17/04/03 14:40:27 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend =
26214384(104857536); length = 13/6553600
17/04/03 14:40:27 INFO mapred.MapTask: Finished spill 0
17/04/03 14:40:27 INFO mapred.Task:
Task:attempt_local428776688_0001_m_000001_0 is done. And is in the process of
committing
17/04/03 14:40:27 INFO mapred.LocalJobRunner: hdfs://localhost:9000/usr/local/
Cellar/hadoop/input/file01.txt:0+22
17/04/03 14:40:27 INFO mapred.Task: Task
'attempt_local428776688_0001_m_000001_0' done.
17/04/03 14:40:27 INFO mapred.LocalJobRunner: Finishing task:
attempt_local428776688_0001_m_000001_0
17/04/03 14:40:27 INFO mapred.LocalJobRunner: map task executor complete.
17/04/03 14:40:27 INFO mapred.LocalJobRunner: Waiting for reduce tasks
17/04/03 14:40:27 INFO mapred.LocalJobRunner: Starting task:
attempt_local428776688_0001_r_000000_0
17/04/03 14:40:27 INFO output.FileOutputCommitter: File Output Committer
Algorithm version is 1
17/04/03 14:40:27 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree
currently is supported only on Linux.
17/04/03 14:40:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
17/04/03 14:40:27 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin:
org.apache.hadoop.mapreduce.task.reduce.Shuffle@7e5ccf1d
17/04/03 14:40:27 INFO reduce.MergeManagerImpl: MergerManager:
memoryLimit=334338464, maxSingleShuffleLimit=83584616,
mergeThreshold=220663392, ioSortFactor=10,
memToMemMergeOutputsThreshold=10
17/04/03 14:40:27 INFO reduce.EventFetcher:
attempt_local428776688_0001_r_000000_0 Thread started: EventFetcher for
fetching Map Completion Events
17/04/03 14:40:27 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output
of map attempt_local428776688_0001_m_000000_0 decomp: 41 len: 45 to
MEMORY
17/04/03 14:40:27 INFO reduce.InMemoryMapOutput: Read 41 bytes from map-
output for attempt_local428776688_0001_m_000000_0
17/04/03 14:40:27 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-
output of size: 41, inMemoryMapOutputs.size() -> 1, commitMemory -> 0,
usedMemory ->41
17/04/03 14:40:27 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output
of map attempt_local428776688_0001_m_000001_0 decomp: 36 len: 40 to
MEMORY
17/04/03 14:40:27 INFO reduce.InMemoryMapOutput: Read 36 bytes from map-
output for attempt_local428776688_0001_m_000001_0
17/04/03 14:40:27 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-
output of size: 36, inMemoryMapOutputs.size() -> 2, commitMemory -> 41,
usedMemory ->77
17/04/03 14:40:27 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
17/04/03 14:40:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
17/04/03 14:40:27 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-
memory map-outputs and 0 on-disk map-outputs
17/04/03 14:40:27 INFO mapred.Merger: Merging 2 sorted segments
17/04/03 14:40:27 INFO mapred.Merger: Down to the last merge-pass, with 2
segments left of total size: 61 bytes
17/04/03 14:40:27 INFO reduce.MergeManagerImpl: Merged 2 segments, 77 bytes
to disk to satisfy reduce memory limit
17/04/03 14:40:27 INFO reduce.MergeManagerImpl: Merging 1 files, 79 bytes from
disk
17/04/03 14:40:27 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes
from memory into reduce
17/04/03 14:40:27 INFO mapred.Merger: Merging 1 sorted segments
17/04/03 14:40:27 INFO mapred.Merger: Down to the last merge-pass, with 1
segments left of total size: 69 bytes
17/04/03 14:40:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
17/04/03 14:40:27 INFO mapreduce.Job: Job job_local428776688_0001 running in
uber mode : false
17/04/03 14:40:27 INFO mapreduce.Job: map 100% reduce 0%
17/04/03 14:40:27 INFO mapred.Task:
Task:attempt_local428776688_0001_r_000000_0 is done. And is in the process of
committing
17/04/03 14:40:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
17/04/03 14:40:27 INFO mapred.Task: Task
attempt_local428776688_0001_r_000000_0 is allowed to commit now
17/04/03 14:40:27 INFO
output.FileOutputCommitter: Saved output of task
'attempt_local428776688_0001_r_000000_0' to hdfs://localhost:9000/usr/local/
Cellar/hadoop/output/file03/_temporary/0/task_local428776688_0001_r_000000
17/04/03 14:40:27 INFO mapred.LocalJobRunner: reduce > reduce
17/04/03 14:40:27 INFO mapred.Task: Task
'attempt_local428776688_0001_r_000000_0' done.
17/04/03 14:40:27 INFO mapred.LocalJobRunner: Finishing task:
attempt_local428776688_0001_r_000000_0
17/04/03 14:40:27 INFO mapred.LocalJobRunner: reduce task executor complete.
17/04/03 14:40:28 INFO mapreduce.Job: map 100% reduce 100%
17/04/03 14:40:28 INFO mapreduce.Job: Job job_local428776688_0001 completed
successfully
17/04/03 14:40:28 INFO mapreduce.Job: Counters: 35
Map-Reduce Framework
Spilled Records=12
Shuffled Maps =2
Failed Shuffles=0
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=51
File Output Format Counters
Bytes Written=41
Web URL
http://localhost:50070/explorer.html#/usr/local/Cellar/hadoop/output/file03
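The same result can be read from the command line; with the old mapred API the reducer output normally lands in a part-00000 file (the file name here is an assumption, not part of the transcript):
hdfs dfs -cat /usr/local/Cellar/hadoop/output/file03/part-00000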
UBUNTU: COPYING FILES TO HADOOP
The following command is used to create an input directory in HDFS.
$HADOOP_HOME/bin/hadoop fs -mkdir input_dir
Step 5
The following command is used to copy the input file named sample.txt into the input directory of HDFS.
$HADOOP_HOME/bin/hadoop fs -put /home/hadoop/sample.txt input_dir
Step 6
The following command is used to verify the files in the input directory.
$HADOOP_HOME/bin/hadoop fs -ls input_dir/
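To confirm the upload, the file contents can also be printed directly from HDFS (a quick check using the same placeholder paths):
$HADOOP_HOME/bin/hadoop fs -cat input_dir/sample.txt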