How To Install Hadoop On Ubuntu 18
The base Apache Hadoop framework is composed of the following modules:
Hadoop Common
Hadoop Distributed File System (HDFS)
YARN
MapReduce
This article explains how to install Hadoop Version 2 on Ubuntu 18.04. We will
install HDFS (Namenode and Datanode), YARN, and MapReduce on a single-node
cluster in Pseudo-Distributed Mode, which is a distributed simulation on a single
machine. Each Hadoop daemon, such as HDFS, YARN, and MapReduce, will run as a
separate/individual Java process.
First, add a dedicated user for the Hadoop environment. The exact command was
lost in extraction; a typical invocation is:
# useradd -m -d /home/hadoop -s /bin/bash hadoop
Next, change into /opt and extract the Oracle JDK archive there (assuming
jdk-8u192-linux-x64.tar.gz has already been downloaded from Oracle):
# cd /opt
# tar -xzvf jdk-8u192-linux-x64.tar.gz
To set JDK 1.8 Update 192 as the default JVM, we will use the following
commands:
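The update-alternatives invocations themselves did not survive extraction; a
typical pair, assuming the JDK was extracted to /opt/jdk1.8.0_192, would be:
# update-alternatives --install /usr/bin/java java /opt/jdk1.8.0_192/bin/java 100
# update-alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_192/bin/javac 100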
After the installation, to verify that Java has been configured successfully,
run the following commands:
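The verification commands are missing from the extracted text; the usual checks
are:
$ java -version
$ javac -version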
Install the OpenSSH server and OpenSSH client with the following command:
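The command itself did not survive extraction; on Ubuntu 18.04 it would be:
# apt install openssh-server openssh-client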
Generate public and private key pairs with the following command. The terminal
will prompt for a file name; press ENTER and proceed. After that, copy the
public key from id_rsa.pub to authorized_keys.
$ ssh-keygen -t rsa
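Copying the public key can be done as follows, assuming the default key
location:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys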
Verify the passwordless SSH setup by logging into localhost; it should not
prompt for a password:
$ ssh localhost
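The download of Hadoop itself is also missing from the extracted text. Since
the environment variables below assume HADOOP_HOME=/home/hadoop/hadoop-2.8.5, a
plausible sequence, run as the hadoop user from its home directory, is:
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
$ tar -xzvf hadoop-2.8.5.tar.gz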
Edit the .bashrc of the hadoop user and set the following Hadoop environment
variables:
export HADOOP_HOME=/home/hadoop/hadoop-2.8.5
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Source the .bashrc in the current login session:
$ source ~/.bashrc
Edit the hadoop-env.sh file, which is in etc/hadoop inside the Hadoop
installation directory, make the following changes, and check whether you want
to change any other configuration.
export JAVA_HOME=/opt/jdk1.8.0_192
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/hadoop-2.8.5/etc/hadoop"}
Edit core-site.xml with vim or any other editor. The file is under etc/hadoop
inside the Hadoop home directory. Add the following entries:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadooptmpdata</value>
  </property>
</configuration>
Create the directory referenced by hadoop.tmp.dir inside the hadoop user's home
directory:
$ mkdir hadooptmpdata
Create the directories that HDFS will use for the Namenode and Datanode data,
again from within /home/hadoop:
$ mkdir -p hdfs/namenode
$ mkdir -p hdfs/datanode
Edit hdfs-site.xml, which lives in the same directory, to set the replication
factor along with the Namenode and Datanode directories:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hdfs/datanode</value>
  </property>
</configuration>
To configure MapReduce, first create mapred-site.xml from the provided template
(from within $HADOOP_HOME/etc/hadoop):
$ cp mapred-site.xml.template mapred-site.xml
Add the following configuration to the mapred-site.xml file, which tells Hadoop
to use YARN as the MapReduce framework:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Finally, edit yarn-site.xml in the same directory and add the following entry:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
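Before HDFS can be used for the first time, the Namenode must be formatted.
This step is absent from the extracted text; the standard command, run as the
hadoop user, is:
$ hdfs namenode -format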
Once the Namenode has been formatted, start HDFS using the start-dfs.sh script.
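Because $HADOOP_HOME/sbin was added to the PATH in .bashrc above, the script
can be invoked directly:
$ start-dfs.sh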
To start the YARN services, execute the YARN startup script, i.e. start-yarn.sh.
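Again, the script is on the PATH:
$ start-yarn.sh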
To verify that all the Hadoop services/daemons have started successfully, you
can use the jps command:
$ /opt/jdk1.8.0_192/bin/jps
20035 SecondaryNameNode
19782 DataNode
21671 Jps
20343 NodeManager
19625 NameNode
20187 ResourceManager
Now we can check the current Hadoop version using the command below:
$ hadoop version
or
$ hdfs version
The YARN Resource Manager (RM) web interface, which in Hadoop 2 listens on port
8088 by default (http://localhost:8088), will display all running jobs on the
current Hadoop cluster.
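As a final sanity check (a hedged example, not part of the extracted text), you
can exercise HDFS from the command line, e.g. by creating a directory and
listing the filesystem root:
$ hdfs dfs -mkdir /testdir
$ hdfs dfs -ls /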
Conclusion
The world is changing the way it operates, and Big Data is playing a major role
in this phase. Hadoop is a framework that makes our life easy while working on
large sets of data. There are improvements on all fronts. The future is
exciting.