
How to install Hadoop on Ubuntu 18.04 Bionic Beaver Linux


Contents

1. Software Requirements and Conventions Used
2. Other Versions of this Tutorial
3. Add users for Hadoop Environment
4. Install and configure the Oracle JDK
5. Configure passwordless SSH
6. Install Hadoop and configure related xml files
   6.1. Setting up the environment variables
   6.2. Configuration Changes in core-site.xml file
   6.3. Configuration Changes in hdfs-site.xml file
   6.4. Configuration Changes in mapred-site.xml file
   6.5. Configuration Changes in yarn-site.xml file
7. Starting the Hadoop Cluster
8. HDFS Command Line Interface
9. Access the Namenode and YARN from Browser
10. Conclusion

Apache Hadoop is an open source framework for both distributed storage and distributed processing of big data on clusters of commodity hardware. Hadoop stores data in the Hadoop Distributed File System (HDFS), and this data is processed using MapReduce. YARN provides an API for requesting and allocating resources in the Hadoop cluster.

The Apache Hadoop framework is composed of the following modules:

- Hadoop Common
- Hadoop Distributed File System (HDFS)
- YARN
- MapReduce

This article explains how to install Hadoop Version 2 on Ubuntu 18.04. We will install HDFS (Namenode and Datanode), YARN and MapReduce on a single-node cluster in Pseudo Distributed Mode, which is a distributed simulation on a single machine. Each Hadoop daemon (hdfs, yarn, mapreduce and so on) will run as a separate, individual Java process.

In this tutorial you will learn:


- How to add users for Hadoop Environment
- How to install and configure the Oracle JDK
- How to configure passwordless SSH
- How to install Hadoop and configure the related XML files
- How to start the Hadoop Cluster
- How to access the NameNode and ResourceManager Web UI

Namenode Web User Interface.

Software Requirements and Conventions Used

Software Requirements and Linux Command Line Conventions

Category      Requirements, Conventions or Software Version Used

System        Ubuntu 18.04

Software      Hadoop 2.8.5, Oracle JDK 1.8

Other         Privileged access to your Linux system as root or via the sudo command.

Conventions   # - requires given linux commands to be executed with root privileges,
              either directly as a root user or by use of the sudo command
              $ - requires given linux commands to be executed as a regular
              non-privileged user

Other Versions of this Tutorial


Ubuntu 20.04 (Focal Fossa)

Add users for Hadoop Environment


Create the new user and group for the Hadoop environment using the command below (the rest of this tutorial assumes the user is named hadoop):

# adduser hadoop

Add New User for Hadoop.
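
The remaining steps run as this new user, so switch to it before continuing:

# su - hadoop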


Install and configure the Oracle JDK
Download and extract the Java archive under the /opt directory.

# cd /opt

# tar -xzvf jdk-8u192-linux-x64.tar.gz

or

$ tar -xzvf jdk-8u192-linux-x64.tar.gz -C /opt

To set JDK 1.8 Update 192 as the default JVM we will use the following commands:

# update-alternatives --install /usr/bin/java java /opt/jdk1.8.0_192/bin/java 100

# update-alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_192/bin/javac 100

After installation, verify that Java has been configured successfully by running the following commands:

# update-alternatives --display java

# update-alternatives --display javac

OracleJDK Installation & Configuration.
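
As a further sanity check, print the active Java version; for this installation it should report 1.8.0_192:

$ java -version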


Configure passwordless SSH

Install the OpenSSH server and client with the command:

$ sudo apt-get install openssh-server openssh-client

Generate public and private key pairs with the following command. The terminal will prompt for a file name; press ENTER and proceed. After that, copy the public key from id_rsa.pub to authorized_keys.

$ ssh-keygen -t rsa

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Passwordless SSH Configuration.

Verify the passwordless ssh configuration with the command:

$ ssh localhost
Passwordless SSH Check.
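
If ssh still prompts for a password, the usual cause is loose permissions on the key files; tightening them is a safe fix:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys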

Install Hadoop and configure related xml files


Download and extract Hadoop 2.8.5 from the official Apache website.
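
If you do not have the archive yet, it can be fetched from the Apache release archive; the URL below follows the standard Apache mirror layout:

$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz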

# tar -xzvf hadoop-2.8.5.tar.gz

Setting up the environment variables

Edit the .bashrc for the Hadoop user and set the following Hadoop environment variables:

export HADOOP_HOME=/home/hadoop/hadoop-2.8.5
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Source the .bashrc in the current login session.

$ source ~/.bashrc
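
To confirm that the variables are active in the current shell:

$ echo $HADOOP_HOME
/home/hadoop/hadoop-2.8.5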

Edit the hadoop-env.sh file, which is in /etc/hadoop inside the Hadoop installation directory, and make the following changes (review whether you want to change any other configuration there as well).

export JAVA_HOME=/opt/jdk1.8.0_192

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/hadoop-2.8.5/etc/hadoop"}

Changes in hadoop-env.sh File.

Configuration Changes in core-site.xml file

Edit core-site.xml with vim or any editor of your choice. The file is under /etc/hadoop inside the hadoop home directory; add the following entries.

<configuration>

<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>

<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadooptmpdata</value>
</property>

</configuration>

In addition, create the hadooptmpdata directory referenced above under the hadoop user's home folder.

$ mkdir hadooptmpdata

Configuration For core-site.xml File.

Configuration Changes in hdfs-site.xml file

SUBSCRIBE TO NEWSLETTER
Subscribe to Linux Career  NEWSLETTER  and receive latest Linux news, jobs, career
advice and tutorials.

Edit hdfs-site.xml, which is present in the same location, i.e. /etc/hadoop inside the hadoop installation directory, and create the Namenode/Datanode directories under the hadoop user's home directory.

$ mkdir -p hdfs/namenode

$ mkdir -p hdfs/datanode

<configuration>

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hdfs/namenode</value>
</property>

<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hdfs/datanode</value>
</property>

</configuration>

Configuration For hdfs-site.xml File.

Configuration Changes in mapred-site.xml file

Copy mapred-site.xml from mapred-site.xml.template using the cp command, then edit the mapred-site.xml placed in /etc/hadoop under the hadoop installation directory with the following changes.

$ cp mapred-site.xml.template mapred-site.xml

Creating the new mapred-site.xml File.


<configuration>

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

</configuration>
Configuration For mapred-site.xml File.

Configuration Changes in yarn-site.xml file

Edit yarn-site.xml with the following entries.

<configuration>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

</configuration>

Configuration For yarn-site.xml File.

Starting the Hadoop Cluster


Format the Namenode before using it for the first time. As the hadoop user, run the command below to format the Namenode.

$ hdfs namenode -format


Format the Namenode.


Once the Namenode has been formatted, start HDFS using the start-dfs.sh script (on your PATH via $HADOOP_HOME/sbin):
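
$ start-dfs.sh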

Starting the DFS Startup Script to start HDFS.

To start the YARN services, execute the YARN startup script, i.e. start-yarn.sh:
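
$ start-yarn.sh
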
Starting the YARN Startup Script to start YARN.

To verify that all the Hadoop services/daemons started successfully, you can use the jps command:

$ /opt/jdk1.8.0_192/bin/jps

20035 SecondaryNameNode
19782 DataNode
21671 Jps
20343 NodeManager
19625 NameNode
20187 ResourceManager

Hadoop Daemons Output from JPS Command.

To check the current Hadoop version, you can use either of the commands below:

$ hadoop version

or

$ hdfs version

Check Hadoop Version.

HDFS Command Line Interface


To access HDFS and create some directories on top of the DFS, you can use the HDFS CLI.

$ hdfs dfs -mkdir /test

$ hdfs dfs -mkdir /hadooponubuntu

$ hdfs dfs -ls /

HDFS Directory Creation using HDFS CLI.
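
The same CLI also moves data in and out of HDFS. As a quick illustration (the local file chosen here is arbitrary), copy a file into one of the new directories and read it back:

$ hdfs dfs -put /etc/hosts /test
$ hdfs dfs -cat /test/hosts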

Access the Namenode and YARN from Browser


You can access both the NameNode and YARN Resource Manager web UIs via any browser, such as Google Chrome or Mozilla Firefox.

Namenode Web UI -  http://<hadoop cluster hostname/IP address>:50070
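
For this single-node installation, that is simply http://localhost:50070.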

Namenode Web User Interface.


HDFS Details from Namenode Web User Interface.


HDFS Directory Browsing via Namenode Web User Interface.

The YARN Resource Manager (RM) web interface displays all running jobs on the current Hadoop cluster.

Resource Manager Web UI -  http://<hadoop cluster hostname/IP address>:8088
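
On this single-node setup, that is http://localhost:8088.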


Resource Manager Web User Interface.

Conclusion
The world is changing the way it operates, and Big Data is playing a major role in this phase. Hadoop is a framework that makes our lives easier while working on large sets of data. There are improvements on all fronts. The future is exciting.
