
Hadoop on Ubuntu 16.04 LTS

Update
fdp17@fdp17-Veriton-M200-H81:~$ sudo apt-get update

Install JDK

fdp17@fdp17-Veriton-M200-H81:~$ sudo apt-get install default-jdk

Check Version

fdp17@fdp17-Veriton-M200-H81:~$ java -version


openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

Creating Hadoop Group and hduser

k@laptop:~$ sudo addgroup hadoop

Adding group `hadoop' (GID 1002) ...
Done.

fdp17@fdp17-Veriton-M200-H81:~$ sudo adduser --ingroup hadoop hduser


Adding user `hduser' ...
Adding new user `hduser' (1001) with group `hadoop' ...
Creating home directory `/home/hduser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hduser
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
 Home Phone []:
 Other []:
 Is the information correct? [Y/n]
Add hduser to sudo
sudo usermod -a -G sudo hduser


Install SSH

fdp17@fdp17-Veriton-M200-H81:~$ sudo apt-get install ssh
 Reading package lists... Done
 Building dependency tree
Reading state information... Done
The following additional packages will be installed:
ncurses-term openssh-client openssh-server openssh-sftp-server ssh-import-id
Suggested packages:
ssh-askpass libpam-ssh keychain monkeysphere rssh molly-guard
The following NEW packages will be installed:
ncurses-term openssh-server openssh-sftp-server ssh ssh-import-id
The following packages will be upgraded:
openssh-client
1 upgraded, 5 newly installed, 0 to remove and 178 not upgraded.
Need to get 1,230 kB of archives.
After this operation, 5,244 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 openssh-client amd64 1:7.2p2-
4ubuntu2.2 [587 kB]
Get:2 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 openssh-sftp-server amd64 1:7.2p2-
4ubuntu2.2 [38.7 kB]
Get:3 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 openssh-server amd64 1:7.2p2-
4ubuntu2.2 [338 kB]
Get:4 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 ssh all 1:7.2p2-4ubuntu2.2 [7,076 B]
Get:5 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 ncurses-term all 6.0+20160213-1ubuntu1 [249
kB]
Get:6 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 ssh-import-id all 5.5-0ubuntu1 [10.2 kB]
Fetched 1,230 kB in 2s (583 kB/s)
Preconfiguring packages ...
(Reading database ... 188613 files and directories currently installed.)
Preparing to unpack .../openssh-client_1%3a7.2p2-4ubuntu2.2_amd64.deb ...
Unpacking openssh-client (1:7.2p2-4ubuntu2.2) over (1:7.2p2-4ubuntu2.1) ...
Selecting previously unselected package openssh-sftp-server.
Preparing to unpack .../openssh-sftp-server_1%3a7.2p2-4ubuntu2.2_amd64.deb ...
Unpacking openssh-sftp-server (1:7.2p2-4ubuntu2.2) ...
Selecting previously unselected package openssh-server.
Preparing to unpack .../openssh-server_1%3a7.2p2-4ubuntu2.2_amd64.deb ...
Unpacking openssh-server (1:7.2p2-4ubuntu2.2) ...
Selecting previously unselected package ssh.
Preparing to unpack .../ssh_1%3a7.2p2-4ubuntu2.2_all.deb ...
Unpacking ssh (1:7.2p2-4ubuntu2.2) ...
Selecting previously unselected package ncurses-term.
Preparing to unpack .../ncurses-term_6.0+20160213-1ubuntu1_all.deb ...
Unpacking ncurses-term (6.0+20160213-1ubuntu1) ...
Selecting previously unselected package ssh-import-id.
Preparing to unpack .../ssh-import-id_5.5-0ubuntu1_all.deb ...
Unpacking ssh-import-id (5.5-0ubuntu1) ...
Processing triggers for man-db (2.7.5-1) ...
Processing triggers for ufw (0.35-0ubuntu2) ...
Processing triggers for systemd (229-4ubuntu16) ...
Processing triggers for ureadahead (0.100.0-19) ...
ureadahead will be reprofiled on next reboot
Setting up openssh-client (1:7.2p2-4ubuntu2.2) ...
Setting up openssh-sftp-server (1:7.2p2-4ubuntu2.2) ...
Setting up openssh-server (1:7.2p2-4ubuntu2.2) ...
Creating SSH2 RSA key; this may take some time ...
2048 SHA256:ENIl49vMNmyHFQMWhQ+7wfyERkQOA6XUx3TpTVzBkgk root@fdp17-Veriton-M200-
H81 (RSA)
Creating SSH2 DSA key; this may take some time ...
1024 SHA256:m8uM/6fhMPV7Ac0+4ROrlQcR36TA5tbT07/OKd7Sv3o root@fdp17-Veriton-M200-H81
(DSA)
Creating SSH2 ECDSA key; this may take some time ...
256 SHA256:x+7TNccRUWPACHLzqvB8dfQ99i7/QzGY8lkE2G1bDHM root@fdp17-Veriton-M200-H81
(ECDSA)
Creating SSH2 ED25519 key; this may take some time ...
256 SHA256:SYNVzUtPB8yy3U01cxQ7OfKZ6Wi7i5hcEpzdXEx6K5Q root@fdp17-Veriton-M200-H81
(ED25519)
Setting up ssh (1:7.2p2-4ubuntu2.2) ...
Setting up ncurses-term (6.0+20160213-1ubuntu1) ...
Setting up ssh-import-id (5.5-0ubuntu1) ...
Processing triggers for systemd (229-4ubuntu16) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for ufw (0.35-0ubuntu2) ...

CHECK ssh and sshd


fdp17@fdp17-Veriton-M200-H81:~$ which ssh
/usr/bin/ssh
fdp17@fdp17-Veriton-M200-H81:~$ which sshd
/usr/sbin/sshd

Switch user to hduser and generate Key

fdp17@fdp17-Veriton-M200-H81:~$ su hduser
Password:
hduser@fdp17-Veriton-M200-H81:/home/fdp17$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:/xOGOuWDb/rGI1l07EQq8b2siNTQcTmpfDyYNLAPeKU hduser@fdp17-Veriton-M200-H81
The key's randomart image is:
+---[RSA 2048]----+
| ... o |
| . += = . |
| . E+ @ * |
| ..oB B = |
| oS+ B . |
| . ..+ * |
| . . X.o . |
| . B O.. |
| .Ooo.. |
+----[SHA256]-----+

Key Transfer
hduser@fdp17-Veriton-M200-H81:/home/fdp17$ cat /home/hduser/.ssh/id_rsa.pub >>
/home/hduser/.ssh/authorized_keys

The second command adds the newly created key to the list of authorized keys so that Hadoop can
use ssh without prompting for a password.
We can check if ssh works:

hduser@fdp17-Veriton-M200-H81:/home/fdp17$ ssh localhost


The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:x+7TNccRUWPACHLzqvB8dfQ99i7/QzGY8lkE2G1bDHM.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.8.0-36-generic x86_64)

* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage

180 packages can be updated.


116 updates are security updates.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by


applicable law.

Download Hadoop 2.7.3


fdp17@fdp17-Veriton-M200-H81:~$ wget
http://redrockdigimark.com/apachemirror/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
Extract

fdp17@fdp17-Veriton-M200-H81:~$ tar xvzf hadoop-2.7.3.tar.gz

Add hduser to sudo

fdp17@fdp17-Veriton-M200-H81:~$ sudo adduser hduser sudo


Adding user `hduser' to group `sudo' ...
Adding user hduser to group sudo
Done.

Move files to /usr/local/hadoop


hduser@fdp17-Veriton-M200-H81:~/hadoop-2.7.3$ ls
bin include libexec NOTICE.txt sbin
etc lib LICENSE.txt README.txt share
hduser@fdp17-Veriton-M200-H81:~/hadoop-2.7.3$ sudo mv * /usr/local/hadoop/
Grant Privileges
hduser@fdp17-Veriton-M200-H81:~$ sudo chown -R hduser:hadoop /usr/local/hadoop/

Check Java
hduser@fdp17-Veriton-M200-H81:~$ update-alternatives --config java

There is only one alternative in link group java (providing /usr/bin/java): /usr/lib/jvm/java-8-openjdk-
amd64/jre/bin/java
Nothing to configure

Now we can append the following to the end of ~/.bashrc:


hduser@laptop:~$ nano ~/.bashrc
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END

FOR VERSION HADOOP 3.0


#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export YARN_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_PREFIX/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
#HADOOP VARIABLES END

hduser@laptop:~$ source ~/.bashrc


Note that JAVA_HOME should be set to the path just before '.../bin/':
hduser@ubuntu-VirtualBox:~$ javac -version
javac 1.7.0_75

hduser@ubuntu-VirtualBox:~$ which javac


/usr/bin/javac

hduser@ubuntu-VirtualBox:~$ readlink -f /usr/bin/javac


/usr/lib/jvm/java-7-openjdk-amd64/bin/javac

2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
We need to set JAVA_HOME by modifying the hadoop-env.sh file.

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Adding the above statement to the hadoop-env.sh file ensures that the value of the JAVA_HOME
variable is available to Hadoop whenever it starts up.
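One way to make this edit from the shell rather than an editor (a sketch; it reuses the readlink trick shown above and simply appends the export line, relying on hduser already owning /usr/local/hadoop after the earlier chown):

$ JH=$(readlink -f /usr/bin/javac | sed 's:/bin/javac::')   # e.g. /usr/lib/jvm/java-8-openjdk-amd64
$ echo "export JAVA_HOME=$JH" >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh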

FOR VERSION HADOOP 3.1 and JAVA 10 on UBUNTU 18

In hadoop-env.sh, also include
export HADOOP_OPTS="--add-modules java.activation"
(a few lines after the case statement that sets HADOOP_OPTS)

3. /usr/local/hadoop/etc/hadoop/core-site.xml:
The /usr/local/hadoop/etc/hadoop/core-site.xml file contains configuration properties that Hadoop
uses when starting up.
This file can be used to override the default settings that Hadoop starts with.
hduser@laptop:~$ sudo mkdir -p /app/hadoop/tmp
hduser@laptop:~$ sudo chown hduser:hadoop /app/hadoop/tmp
Open the file and enter the following in between the <configuration></configuration> tag:
hduser@laptop:~$ vi /usr/local/hadoop/etc/hadoop/core-site.xml

<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
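A quick way to confirm that Hadoop picks up the new value (a sketch; it assumes the Hadoop bin directory added to .bashrc above is already on the PATH):

$ hdfs getconf -confKey fs.default.name
# should print hdfs://localhost:54310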

4. /usr/local/hadoop/etc/hadoop/mapred-site.xml
By default, the /usr/local/hadoop/etc/hadoop/ folder contains
/usr/local/hadoop/etc/hadoop/mapred-site.xml.template
file which has to be renamed/copied with the name mapred-site.xml:

hduser@laptop:~$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
/usr/local/hadoop/etc/hadoop/mapred-site.xml

The mapred-site.xml file is used to specify which framework is being used for MapReduce.
We need to enter the following content in between the <configuration></configuration> tag:

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>

5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml
The /usr/local/hadoop/etc/hadoop/hdfs-site.xml file needs to be configured for each host in the
cluster that is being used.
It is used to specify the directories which will be used as the namenode and the datanode on that
host.
Before editing this file, we need to create two directories which will contain the namenode and the
datanode for this Hadoop installation.
This can be done using the following commands:

hduser@laptop:~$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode


hduser@laptop:~$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
hduser@laptop:~$ sudo chown -R hduser:hadoop /usr/local/hadoop_store

Open the file and enter the following content in between the <configuration></configuration> tag:
hduser@laptop:~$ nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
</configuration>
Format the New Hadoop Filesystem
Now, the Hadoop file system needs to be formatted so that we can start using it. The format
command should be issued with write permission since it creates a current directory
under the /usr/local/hadoop_store/hdfs/namenode folder:
hduser@laptop:~$ hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

15/04/18 14:43:03 INFO namenode.NameNode: STARTUP_MSG:


/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = laptop/192.168.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath = /usr/local/hadoop/etc/hadoop
...
STARTUP_MSG: java = 1.7.0_65
************************************************************/
15/04/18 14:43:03 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP,
INT]
15/04/18 14:43:03 INFO namenode.NameNode: createNameNode [-format]
15/04/18 14:43:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
Formatting using clusterid: CID-e2f515ac-33da-45bc-8466-5b1100a2bf7f
15/04/18 14:43:09 INFO namenode.FSNamesystem: No KeyProvider found.
15/04/18 14:43:09 INFO namenode.FSNamesystem: fsLock is fair:true
15/04/18 14:43:10 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/04/18 14:43:10 INFO blockmanagement.DatanodeManager:
dfs.namenode.datanode.registration.ip-hostname-check=true
15/04/18 14:43:10 INFO blockmanagement.BlockManager:
dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/04/18 14:43:10 INFO blockmanagement.BlockManager: The block deletion will start around
2015 Apr 18 14:43:10
15/04/18 14:43:10 INFO util.GSet: Computing capacity for map BlocksMap
15/04/18 14:43:10 INFO util.GSet: VM type = 64-bit
15/04/18 14:43:10 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
15/04/18 14:43:10 INFO util.GSet: capacity = 2^21 = 2097152 entries
15/04/18 14:43:10 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/04/18 14:43:10 INFO blockmanagement.BlockManager: defaultReplication =1
15/04/18 14:43:10 INFO blockmanagement.BlockManager: maxReplication = 512
15/04/18 14:43:10 INFO blockmanagement.BlockManager: minReplication =1
15/04/18 14:43:10 INFO blockmanagement.BlockManager: maxReplicationStreams =2
15/04/18 14:43:10 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
15/04/18 14:43:10 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/04/18 14:43:10 INFO blockmanagement.BlockManager: encryptDataTransfer = false
15/04/18 14:43:10 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
15/04/18 14:43:10 INFO namenode.FSNamesystem: fsOwner = hduser (auth:SIMPLE)
15/04/18 14:43:10 INFO namenode.FSNamesystem: supergroup = supergroup
15/04/18 14:43:10 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/04/18 14:43:10 INFO namenode.FSNamesystem: HA Enabled: false
15/04/18 14:43:10 INFO namenode.FSNamesystem: Append Enabled: true
15/04/18 14:43:11 INFO util.GSet: Computing capacity for map INodeMap
15/04/18 14:43:11 INFO util.GSet: VM type = 64-bit
15/04/18 14:43:11 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
15/04/18 14:43:11 INFO util.GSet: capacity = 2^20 = 1048576 entries
15/04/18 14:43:11 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/04/18 14:43:11 INFO util.GSet: Computing capacity for map cachedBlocks
15/04/18 14:43:11 INFO util.GSet: VM type = 64-bit
15/04/18 14:43:11 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
15/04/18 14:43:11 INFO util.GSet: capacity = 2^18 = 262144 entries
15/04/18 14:43:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct =
0.9990000128746033
15/04/18 14:43:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/04/18 14:43:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
15/04/18 14:43:11 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/04/18 14:43:11 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and
retry cache entry expiry time is 600000 millis
15/04/18 14:43:11 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/04/18 14:43:11 INFO util.GSet: VM type = 64-bit
15/04/18 14:43:11 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
15/04/18 14:43:11 INFO util.GSet: capacity = 2^15 = 32768 entries
15/04/18 14:43:11 INFO namenode.NNConf: ACLs enabled? false
15/04/18 14:43:11 INFO namenode.NNConf: XAttrs enabled? true
15/04/18 14:43:11 INFO namenode.NNConf: Maximum size of an xattr: 16384
15/04/18 14:43:12 INFO namenode.FSImage: Allocated new BlockPoolId: BP-130729900-
192.168.1.1-1429393391595
15/04/18 14:43:12 INFO common.Storage: Storage directory
/usr/local/hadoop_store/hdfs/namenode has been successfully formatted.
15/04/18 14:43:12 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with
txid >= 0
15/04/18 14:43:12 INFO util.ExitUtil: Exiting with status 0
15/04/18 14:43:12 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at laptop/192.168.1.1
************************************************************/
Note that the hadoop namenode -format command should be executed only once, before we start
using Hadoop.
If it is executed again after Hadoop has been used, it will destroy all the data on the
Hadoop file system.
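As the DEPRECATED notice in the output above suggests, the same format can also be run through the hdfs front end (a sketch, equivalent for this single-node setup):

$ hdfs namenode -format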
Starting Hadoop
Now it's time to start the newly installed single-node cluster.
We can use start-all.sh, or start-dfs.sh and start-yarn.sh separately.
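Since start-all.sh is deprecated (as the transcript below notes), the equivalent is to run the two scripts one after the other; a minimal sketch, run as hduser:

$ start-dfs.sh
$ start-yarn.sh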
k@laptop:~$ cd /usr/local/hadoop/sbin

k@laptop:/usr/local/hadoop/sbin$ ls
distribute-exclude.sh start-all.cmd stop-balancer.sh
hadoop-daemon.sh start-all.sh stop-dfs.cmd
hadoop-daemons.sh start-balancer.sh stop-dfs.sh
hdfs-config.cmd start-dfs.cmd stop-secure-dns.sh
hdfs-config.sh start-dfs.sh stop-yarn.cmd
httpfs.sh start-secure-dns.sh stop-yarn.sh
kms.sh start-yarn.cmd yarn-daemon.sh
mr-jobhistory-daemon.sh start-yarn.sh yarn-daemons.sh
refresh-namenodes.sh stop-all.cmd
slaves.sh stop-all.sh

k@laptop:/usr/local/hadoop/sbin$ sudo su hduser

hduser@laptop:/usr/local/hadoop/sbin$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
15/04/18 16:43:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-
laptop.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-laptop.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-
secondarynamenode-laptop.out
15/04/18 16:43:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-
laptop.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-
laptop.out
We can check if it's really up and running:
hduser@laptop:/usr/local/hadoop/sbin$ jps
9026 NodeManager
7348 NameNode
9766 Jps
8887 ResourceManager
7507 DataNode
The output means that we now have a functional instance of Hadoop running on the machine.
Another way to check is using netstat:
hduser@laptop:~$ netstat -plten | grep java
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 1001 1843372 10605/java
tcp 0 0 127.0.0.1:54310 0.0.0.0:* LISTEN 1001 1841277 10447/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1001 1841130 10895/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1001 1840196 10447/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1001 1841320 10605/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 1001 1841646 10605/java
tcp6 0 0 :::8040 :::* LISTEN 1001 1845543 11383/java
tcp6 0 0 :::8042 :::* LISTEN 1001 1845551 11383/java
tcp6 0 0 :::8088 :::* LISTEN 1001 1842110 11252/java
tcp6 0 0 :::49630 :::* LISTEN 1001 1845534 11383/java
tcp6 0 0 :::8030 :::* LISTEN 1001 1842036 11252/java
tcp6 0 0 :::8031 :::* LISTEN 1001 1842005 11252/java
tcp6 0 0 :::8032 :::* LISTEN 1001 1842100 11252/java
tcp6 0 0 :::8033 :::* LISTEN 1001 1842162 11252/java

WEB INTERFACE for HADOOP

Version 2.x (including the 2.7.3 installed here): http://localhost:50070

Version 3.0: http://localhost:9870

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html#Hadoop_Startup
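To confirm the web UI is answering without opening a browser, one quick probe (a sketch; wget is already available, since it was used to download Hadoop):

$ wget -S --spider http://localhost:50070 2>&1 | grep "HTTP/"
# expect something like: HTTP/1.1 200 OK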

Stopping Hadoop
$ pwd
/usr/local/hadoop/sbin
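The corresponding stop scripts live in the same sbin directory as the start scripts listed earlier; a minimal sketch:

$ stop-dfs.sh
$ stop-yarn.sh

(stop-all.sh also works, but like start-all.sh it is flagged as deprecated.)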

Running an inbuilt mapreduce example


hduser@fdp17-Veriton-M200-H81:~$ hadoop jar
/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 5

To see the usage options for a particular example, add the example name to this command with no further arguments (see the sketch after this list). The following jobs are included in the examples JAR file:

aggregatewordcount: An Aggregate-based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate-based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute the exact digits of pi.
dbcount: An example job that counts the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute the exact bits of pi.
grep: A map/reduce program that counts the matches to a regex in the input.
join: A job that effects a join over sorted, equally partitioned data sets.
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10 GB of random textual data per node.
randomwriter: A map/reduce program that writes 10 GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A Sudoku solver.
teragen: Generate data for the terasort.
terasort: Run the terasort.
teravalidate: Check the results of the terasort.
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
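For example, running the wordcount example with no arguments prints its usage (a sketch; the jar path matches the pi run above):

$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount
# prints a usage line such as: Usage: wordcount <in> [<in>...] <out>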

Giving permission to the folder to execute the Java program


sudo chmod -R 777 wordcount/

WordCount Java Program

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

public static class Map extends MapReduceBase implements


Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value,
OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
output.collect(word, one);
}
}
}

public static class Reduce extends MapReduceBase implements


Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
output.collect(key, new IntWritable(sum));
}
}

public static void main(String[] args) throws Exception {


JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
}

FINAL
STEP 7: Wordcount: Create a file called WordCount.java.
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class WordCount {
public static class Map extends MapReduceBase implements
Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value,
OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
output.collect(word, one);
}
}
}
public static class Reduce extends MapReduceBase implements
Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
output.collect(key, new IntWritable(sum));
}
}

public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
}
Compile:
$ javac WordCount.java -cp $(hadoop classpath)
The hadoop classpath command provides the compiler with all the paths it needs to compile correctly, and you should see a resulting WordCount.class appear in the directory.
Create Jar File:
$ jar cf wc.jar WordCount*.class
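To confirm the class files made it into the jar (an optional check):

$ jar tf wc.jar
# should list WordCount.class, WordCount$Map.class and WordCount$Reduce.class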
Create HDFS Directory


/usr/local/Cellar/hadoop/input - input directory in HDFS
/usr/local/Cellar/hadoop/output - output directory in HDFS
hdfs dfs -mkdir -p /usr/local/Cellar/hadoop/input
hdfs dfs -mkdir -p /usr/local/Cellar/hadoop/output
Create text file locally and move to HDFS directory
DOREENs-MacBook-Air:Cellar doreenrobin$ nano file01.txt
DOREENs-MacBook-Air:Cellar doreenrobin$ pwd
/usr/local/Cellar
DOREENs-MacBook-Air:Cellar doreenrobin$ ls
file01.txt
hadoop
DOREENs-MacBook-Air:Cellar doreenrobin$ hadoop fs -put /usr/local/Cellar/file01.txt /usr/local/Cellar/hadoop/input
17/04/03 14:32:22 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Running the Mapreduce program
DOREENs-MacBook-Air:2.7.3 doreenrobin$ bin/hadoop jar wc.jar WordCount /usr/local/Cellar/hadoop/input /usr/local/Cellar/hadoop/output/file03
17/04/03 14:40:24 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
17/04/03 14:40:25 INFO Configuration.deprecation: session.id is deprecated. Instead,
use dfs.metrics.session-id
17/04/03 14:40:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
17/04/03 14:40:25 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with
processName=JobTracker, sessionId= - already initialized
17/04/03 14:40:25 WARN mapreduce.JobResourceUploader: Hadoop command-line
option parsing not performed. Implement the Tool interface and execute your
application with ToolRunner to remedy this.
17/04/03 14:40:26 INFO mapred.FileInputFormat: Total input paths to process : 2
17/04/03 14:40:26 INFO mapreduce.JobSubmitter: number of splits:2
17/04/03 14:40:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_local428776688_0001
17/04/03 14:40:26 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/04/03 14:40:26 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/04/03 14:40:26 INFO mapreduce.Job: Running job: job_local428776688_0001
17/04/03 14:40:26 INFO mapred.LocalJobRunner: OutputCommitter is
org.apache.hadoop.mapred.FileOutputCommitter
17/04/03 14:40:26 INFO output.FileOutputCommitter: File Output Committer
Algorithm version is 1
17/04/03 14:40:26 INFO mapred.LocalJobRunner: Waiting for map tasks
17/04/03 14:40:26 INFO mapred.LocalJobRunner: Starting task:
attempt_local428776688_0001_m_000000_0
17/04/03 14:40:26 INFO output.FileOutputCommitter: File Output Committer
Algorithm version is 1
17/04/03 14:40:26 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree
currently is supported only on Linux.
17/04/03 14:40:26 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
17/04/03 14:40:26 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/
usr/local/Cellar/hadoop/input/file02.txt:0+29
17/04/03 14:40:27 INFO mapred.MapTask: numReduceTasks: 1
17/04/03 14:40:27 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/04/03 14:40:27 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/04/03 14:40:27 INFO mapred.MapTask: soft limit at 83886080
17/04/03 14:40:27 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/04/03 14:40:27 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/04/03 14:40:27 INFO mapred.MapTask: Map output collector class =
org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/04/03 14:40:27 INFO mapred.LocalJobRunner:
17/04/03 14:40:27 INFO mapred.MapTask: Starting flush of map output
17/04/03 14:40:27 INFO mapred.MapTask: Spilling map output
17/04/03 14:40:27 INFO mapred.MapTask: bufstart = 0; bufend = 44; bufvoid =
104857600
17/04/03 14:40:27 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend =
26214384(104857536); length = 13/6553600
17/04/03 14:40:27 INFO mapred.MapTask: Finished spill 0
17/04/03 14:40:27 INFO mapred.Task:
Task:attempt_local428776688_0001_m_000000_0 is done. And is in the process of
committing
17/04/03 14:40:27 INFO mapred.LocalJobRunner: hdfs://localhost:9000/usr/local/
Cellar/hadoop/input/file02.txt:0+29
17/04/03 14:40:27 INFO mapred.Task: Task
'attempt_local428776688_0001_m_000000_0' done.
17/04/03 14:40:27 INFO mapred.LocalJobRunner: Finishing task:
attempt_local428776688_0001_m_000000_0
17/04/03 14:40:27 INFO mapred.LocalJobRunner: Starting task:
attempt_local428776688_0001_m_000001_0
17/04/03 14:40:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/04/03 14:40:27 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
17/04/03 14:40:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
17/04/03 14:40:27 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/
usr/local/Cellar/hadoop/input/file01.txt:0+22
17/04/03 14:40:27 INFO mapred.MapTask: numReduceTasks: 1
17/04/03 14:40:27 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/04/03 14:40:27 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/04/03 14:40:27 INFO mapred.MapTask: soft limit at 83886080
17/04/03 14:40:27 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/04/03 14:40:27 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/04/03 14:40:27 INFO mapred.MapTask: Map output collector class =
org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/04/03 14:40:27 INFO mapred.LocalJobRunner:
17/04/03 14:40:27 INFO mapred.MapTask: Starting flush of map output
17/04/03 14:40:27 INFO mapred.MapTask: Spilling map output
17/04/03 14:40:27 INFO mapred.MapTask: bufstart = 0; bufend = 38; bufvoid =
104857600
17/04/03 14:40:27 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend =
26214384(104857536); length = 13/6553600
17/04/03 14:40:27 INFO mapred.MapTask: Finished spill 0
17/04/03 14:40:27 INFO mapred.Task:
Task:attempt_local428776688_0001_m_000001_0 is done. And is in the process of
committing
17/04/03 14:40:27 INFO mapred.LocalJobRunner: hdfs://localhost:9000/usr/local/
Cellar/hadoop/input/file01.txt:0+22
17/04/03 14:40:27 INFO mapred.Task: Task
'attempt_local428776688_0001_m_000001_0' done.
17/04/03 14:40:27 INFO mapred.LocalJobRunner: Finishing task:
attempt_local428776688_0001_m_000001_0
17/04/03 14:40:27 INFO mapred.LocalJobRunner: map task executor complete.
17/04/03 14:40:27 INFO mapred.LocalJobRunner: Waiting for reduce tasks
17/04/03 14:40:27 INFO mapred.LocalJobRunner: Starting task:
attempt_local428776688_0001_r_000000_0
17/04/03 14:40:27 INFO output.FileOutputCommitter: File Output Committer
Algorithm version is 1
17/04/03 14:40:27 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree
currently is supported only on Linux.
17/04/03 14:40:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
17/04/03 14:40:27 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin:
org.apache.hadoop.mapreduce.task.reduce.Shuffle@7e5ccf1d
17/04/03 14:40:27 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
17/04/03 14:40:27 INFO reduce.EventFetcher: attempt_local428776688_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
17/04/03 14:40:27 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output
of map attempt_local428776688_0001_m_000000_0 decomp: 41 len: 45 to
MEMORY
17/04/03 14:40:27 INFO reduce.InMemoryMapOutput: Read 41 bytes from map-
output for attempt_local428776688_0001_m_000000_0
17/04/03 14:40:27 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-
output of size: 41, inMemoryMapOutputs.size() -> 1, commitMemory -> 0,
usedMemory ->41
17/04/03 14:40:27 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output
of map attempt_local428776688_0001_m_000001_0 decomp: 36 len: 40 to
MEMORY
17/04/03 14:40:27 INFO reduce.InMemoryMapOutput: Read 36 bytes from map-
output for attempt_local428776688_0001_m_000001_0
17/04/03 14:40:27 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-
output of size: 36, inMemoryMapOutputs.size() -> 2, commitMemory -> 41,
usedMemory ->77
17/04/03 14:40:27 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
17/04/03 14:40:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
17/04/03 14:40:27 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-
memory map-outputs and 0 on-disk map-outputs
17/04/03 14:40:27 INFO mapred.Merger: Merging 2 sorted segments
17/04/03 14:40:27 INFO mapred.Merger: Down to the last merge-pass, with 2
segments left of total size: 61 bytes
17/04/03 14:40:27 INFO reduce.MergeManagerImpl: Merged 2 segments, 77 bytes
to disk to satisfy reduce memory limit
17/04/03 14:40:27 INFO reduce.MergeManagerImpl: Merging 1 files, 79 bytes from
disk
17/04/03 14:40:27 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes
from memory into reduce
17/04/03 14:40:27 INFO mapred.Merger: Merging 1 sorted segments
17/04/03 14:40:27 INFO mapred.Merger: Down to the last merge-pass, with 1
segments left of total size: 69 bytes
17/04/03 14:40:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
17/04/03 14:40:27 INFO mapreduce.Job: Job job_local428776688_0001 running in
uber mode : false
17/04/03 14:40:27 INFO mapreduce.Job: map 100% reduce 0%
17/04/03 14:40:27 INFO mapred.Task:
Task:attempt_local428776688_0001_r_000000_0 is done. And is in the process of
committing
17/04/03 14:40:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
17/04/03 14:40:27 INFO mapred.Task: Task attempt_local428776688_0001_r_000000_0 is allowed to commit now
17/04/03 14:40:27 INFO output.FileOutputCommitter: Saved output of task 'attempt_local428776688_0001_r_000000_0' to hdfs://localhost:9000/usr/local/Cellar/hadoop/output/file03/_temporary/0/task_local428776688_0001_r_000000
17/04/03 14:40:27 INFO mapred.LocalJobRunner: reduce > reduce
17/04/03 14:40:27 INFO mapred.Task: Task
'attempt_local428776688_0001_r_000000_0' done.
17/04/03 14:40:27 INFO mapred.LocalJobRunner: Finishing task:
attempt_local428776688_0001_r_000000_0
17/04/03 14:40:27 INFO mapred.LocalJobRunner: reduce task executor complete.
17/04/03 14:40:28 INFO mapreduce.Job: map 100% reduce 100%
17/04/03 14:40:28 INFO mapreduce.Job: Job job_local428776688_0001 completed
successfully
17/04/03 14:40:28 INFO mapreduce.Job: Counters: 35

File System Counters

FILE: Number of bytes read=10367

FILE: Number of bytes written=906415

FILE: Number of read operations=0


FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=131

HDFS: Number of bytes written=41

HDFS: Number of read operations=22

HDFS: Number of large read operations=0

HDFS: Number of write operations=5

Map-Reduce Framework

Map input records=3

Map output records=8

Map output bytes=82

Map output materialized bytes=85

Input split bytes=228

Combine input records=8


Combine output records=6

Reduce input groups=5

Reduce shuffle bytes=85

Reduce input records=6

Reduce output records=5

Spilled Records=12

Shuffled Maps =2

Failed Shuffles=0

Merged Map outputs=2

GC time elapsed (ms)=0

Total committed heap usage (bytes)=957874176

Shuffle Errors

BAD_ID=0

CONNECTION=0
IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0
File Input Format Counters

Bytes Read=51
File Output Format Counters

Bytes Written=41
////////////////////
WebUrl
http://localhost:50070/explorer.html#/usr/local/Cellar/hadoop/output/file03

UBUNTU

hduser@doreen-B250M-D3H:~/wordcount$ sudo nano WordCount.java

[sudo] password for hduser:


hduser@doreen-B250M-D3H:~/wordcount$ javac WordCount.java -cp $(hadoop classpath)

hduser@doreen-B250M-D3H:~/wordcount$ jar cf wc.jar WordCount*.class

FILES TO HADOOP
The following command is used to create an input directory in HDFS.
$HADOOP_HOME/bin/hadoop fs -mkdir input_dir
Step 5
The following command is used to copy the input file named sample.txt into the input directory of HDFS.
$HADOOP_HOME/bin/hadoop fs -put /home/hadoop/sample.txt input_dir

Step 6
The following command is used to verify the files in the input directory.
$HADOOP_HOME/bin/hadoop fs -ls input_dir/
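The same three steps, adapted to the paths actually used in the run below (a sketch; it assumes sample.txt is in the current directory and the HDFS daemons are running):

$ hdfs dfs -mkdir -p /user/hduser/input
$ hdfs dfs -put sample.txt /user/hduser/input
$ hdfs dfs -ls /user/hduser/input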

hduser@doreen-B250M-D3H:~/wordcount$ hadoop jar wc.jar WordCount /user/hduser/input /user/hduser/output1
17/07/08 12:26:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform...
using builtin-java classes where applicable
17/07/08 12:26:16 INFO Configuration.deprecation: session.id is deprecated. Instead, use
dfs.metrics.session-id
17/07/08 12:26:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker,
sessionId=
17/07/08 12:26:16 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker,
sessionId= - already initialized
17/07/08 12:26:16 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not
performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/07/08 12:26:16 INFO mapred.FileInputFormat: Total input paths to process : 1
17/07/08 12:26:16 INFO mapreduce.JobSubmitter: number of splits:1
17/07/08 12:26:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local170299371_0001
17/07/08 12:26:16 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/07/08 12:26:16 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/07/08 12:26:16 INFO mapreduce.Job: Running job: job_local170299371_0001
17/07/08 12:26:16 INFO mapred.LocalJobRunner: OutputCommitter is
org.apache.hadoop.mapred.FileOutputCommitter
17/07/08 12:26:16 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/08 12:26:16 INFO mapred.LocalJobRunner: Waiting for map tasks
17/07/08 12:26:16 INFO mapred.LocalJobRunner: Starting task:
attempt_local170299371_0001_m_000000_0
17/07/08 12:26:16 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/08 12:26:16 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/07/08 12:26:16 INFO mapred.MapTask: Processing split:
hdfs://localhost:54310/user/hduser/input/sample.txt:0+95
17/07/08 12:26:16 INFO mapred.MapTask: numReduceTasks: 1
17/07/08 12:26:16 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/07/08 12:26:16 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/07/08 12:26:16 INFO mapred.MapTask: soft limit at 83886080
17/07/08 12:26:16 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/07/08 12:26:16 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/07/08 12:26:16 INFO mapred.MapTask: Map output collector class =
org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/07/08 12:26:16 INFO mapred.LocalJobRunner:
17/07/08 12:26:16 INFO mapred.MapTask: Starting flush of map output
17/07/08 12:26:16 INFO mapred.MapTask: Spilling map output
17/07/08 12:26:16 INFO mapred.MapTask: bufstart = 0; bufend = 147; bufvoid = 104857600
17/07/08 12:26:16 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend =
26214348(104857392); length = 49/6553600
17/07/08 12:26:16 INFO mapred.MapTask: Finished spill 0
17/07/08 12:26:16 INFO mapred.Task: Task:attempt_local170299371_0001_m_000000_0 is done. And is in
the process of committing
17/07/08 12:26:16 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/input/sample.txt:0+95
17/07/08 12:26:16 INFO mapred.Task: Task 'attempt_local170299371_0001_m_000000_0' done.
17/07/08 12:26:16 INFO mapred.LocalJobRunner: Finishing task:
attempt_local170299371_0001_m_000000_0
17/07/08 12:26:16 INFO mapred.LocalJobRunner: map task executor complete.
17/07/08 12:26:16 INFO mapred.LocalJobRunner: Waiting for reduce tasks
17/07/08 12:26:16 INFO mapred.LocalJobRunner: Starting task:
attempt_local170299371_0001_r_000000_0
17/07/08 12:26:16 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/08 12:26:16 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/07/08 12:26:16 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin:
org.apache.hadoop.mapreduce.task.reduce.Shuffle@4b61154f
17/07/08 12:26:16 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334338464,
maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10,
memToMemMergeOutputsThreshold=10
17/07/08 12:26:16 INFO reduce.EventFetcher: attempt_local170299371_0001_r_000000_0 Thread started:
EventFetcher for fetching Map Completion Events
17/07/08 12:26:16 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map
attempt_local170299371_0001_m_000000_0 decomp: 108 len: 112 to MEMORY
17/07/08 12:26:16 INFO reduce.InMemoryMapOutput: Read 108 bytes from map-output for
attempt_local170299371_0001_m_000000_0
17/07/08 12:26:16 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 108,
inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->108
17/07/08 12:26:16 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
17/07/08 12:26:16 INFO mapred.LocalJobRunner: 1 / 1 copied.
17/07/08 12:26:16 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and
0 on-disk map-outputs
17/07/08 12:26:16 INFO mapred.Merger: Merging 1 sorted segments
17/07/08 12:26:16 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 98
bytes
17/07/08 12:26:16 INFO reduce.MergeManagerImpl: Merged 1 segments, 108 bytes to disk to satisfy reduce
memory limit
17/07/08 12:26:16 INFO reduce.MergeManagerImpl: Merging 1 files, 112 bytes from disk
17/07/08 12:26:16 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
17/07/08 12:26:16 INFO mapred.Merger: Merging 1 sorted segments
17/07/08 12:26:16 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 98
bytes
17/07/08 12:26:16 INFO mapred.LocalJobRunner: 1 / 1 copied.
17/07/08 12:26:16 INFO mapred.Task: Task:attempt_local170299371_0001_r_000000_0 is done. And is in
the process of committing
17/07/08 12:26:16 INFO mapred.LocalJobRunner: 1 / 1 copied.
17/07/08 12:26:16 INFO mapred.Task: Task attempt_local170299371_0001_r_000000_0 is allowed to
commit now
17/07/08 12:26:16 INFO output.FileOutputCommitter: Saved output of task
'attempt_local170299371_0001_r_000000_0' to
hdfs://localhost:54310/user/hduser/output1/_temporary/0/task_local170299371_0001_r_000000
17/07/08 12:26:16 INFO mapred.LocalJobRunner: reduce > reduce
17/07/08 12:26:16 INFO mapred.Task: Task 'attempt_local170299371_0001_r_000000_0' done.
17/07/08 12:26:16 INFO mapred.LocalJobRunner: Finishing task:
attempt_local170299371_0001_r_000000_0
17/07/08 12:26:16 INFO mapred.LocalJobRunner: reduce task executor complete.
17/07/08 12:26:17 INFO mapreduce.Job: Job job_local170299371_0001 running in uber mode : false
17/07/08 12:26:17 INFO mapreduce.Job: map 100% reduce 100%
17/07/08 12:26:17 INFO mapreduce.Job: Job job_local170299371_0001 completed successfully
17/07/08 12:26:17 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=6412
FILE: Number of bytes written=568778
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=190
HDFS: Number of bytes written=74
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=1
Map output records=13
Map output bytes=147
Map output materialized bytes=112
Input split bytes=103
Combine input records=13
Combine output records=8
Reduce input groups=8
Reduce shuffle bytes=112
Reduce input records=8
Reduce output records=8
Spilled Records=16
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=708837376
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=95
File Output Format Counters
Bytes Written=74
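Once the job completes, the word counts can be read back from HDFS (a sketch; part-00000 assumes the single reducer used by this job):

$ hdfs dfs -cat /user/hduser/output1/part-00000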
