
EXPERIMENT NO – 1

1. Install Apache Hadoop

AIM: Installation of Single Node Hadoop Cluster on Ubuntu 20.04.4

PROCEDURE:

Prerequisites:

1. Install OpenJDK on Ubuntu.
2. Install OpenSSH on Ubuntu.
3. Create Hadoop User.

Step 1: Installing Java on Ubuntu.

The Hadoop framework is written in Java, and its services require a compatible Java Runtime Environment
(JRE) and Java Development Kit (JDK). Use the following command to update your system before initiating
a new installation:

 sudo apt update

The OpenJDK 8 package in Ubuntu contains both the runtime environment and development kit.

Type the following command in your terminal to install OpenJDK 8:

 sudo apt install openjdk-8-jdk

The OpenJDK or Oracle Java version can affect how elements of a Hadoop ecosystem interact.

Step 2: Find Version of Java Installed

Once the installation process is complete, verify the current Java version:

 java -version; javac -version

Step 3: Find the Java path

Type the following commands in your terminal:

 sudo update-alternatives --config java

 sudo update-alternatives --config javac

Step 4: Install OpenSSH on Ubuntu

Install the OpenSSH server and client using the following command:

 sudo apt install openssh-server openssh-client


In the example below, the output confirms that the latest version is already installed.

Step 5: Create Hadoop User

The adduser command is used to create a new Hadoop user:

 sudo adduser hdoop

The username in the above command is hdoop. You can use any username and password. Switch to
the newly created user and enter the corresponding password:

 su - hdoop

Step 6: Verify SSH Installation

Run the following commands to verify whether SSH is installed:

 which ssh

Result: /usr/bin/ssh

 which sshd

Result: /usr/bin/sshd

Step 7:

Hadoop uses SSH (to access its nodes), which would normally require the user to enter a password.
However, this requirement can be eliminated by creating and setting up SSH keys using the following
commands. If asked for a filename, just leave it blank and press the Enter key to continue.

 su hdoop

The following command generates an SSH key pair and defines the location where it will be stored:

 ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa


The system proceeds to generate and save the SSH key pair.

The following command adds the newly created key to the list of authorized keys so that Hadoop can use
ssh without prompting for a password.

 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
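
As an optional hardening step, you can restrict the permissions on the authorized_keys file so that only the
hdoop user can read and write it; some SSH configurations reject keys when the file permissions are too open:

 chmod 0600 ~/.ssh/authorized_keys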

The new user is now able to SSH without needing to enter a password every time. Verify everything is set
up correctly by using the hdoop user to SSH to localhost:

 ssh localhost

The Hadoop user is now able to establish an SSH connection to the localhost.

Download and Install Hadoop on Ubuntu

Note: Modify the commands below based on your Hadoop version.

Step 8: Visit the official Apache Hadoop download page and select the version of Hadoop you want to
install. Here, use the binary download for Hadoop version 3.2.1.

Select your preferred option, and you will get a mirror link that allows you to download the Hadoop tar
package.

Step 9: Use the provided mirror link and download the Hadoop package with the wget command:

 wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz

Step 10:

Once the download is complete, extract the files to initiate the Hadoop installation by using the following
command:

 tar xzf hadoop-3.2.1.tar.gz

Step 11:

To move the extracted Hadoop folder to /usr/local/hadoop, use the following command:

 sudo mv hadoop-3.2.1 /usr/local/hadoop/


Step 12: Set read/write permission

 sudo chown -R hdoop:hadoop /usr/local/hadoop

Setup Configuration Files


Hadoop excels when deployed in a fully distributed mode on a large cluster of networked
servers. However, if you are new to Hadoop and want to explore basic commands or test
applications, you can configure Hadoop on a single node.

This setup, also called pseudo-distributed mode, allows each Hadoop daemon to run as a single
Java process. A Hadoop environment is configured by editing a set of configuration files:

 bashrc
 hadoop-env.sh
 core-site.xml
 hdfs-site.xml
 mapred-site.xml
 yarn-site.xml

Step 13: Configure Hadoop Environment Variables (bashrc)

Before editing the .bashrc file in hdoop's home directory, we need to find the path where Java has
been installed (see Step 3) so that the JAVA_HOME environment variable can be set correctly.

 sudo gedit ~/.bashrc

Use the above command to open the file, then define the Hadoop environment variables by adding the
following content to the end of the file:

#Hadoop Related Options


export HADOOP_HOME=/home/hdoop/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Once you add the variables, save and exit the .bashrc file.


Step 14: To apply the changes to the current running environment use the following command:

 source ~/.bashrc
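
To confirm that the variables are now active in the current shell, you can print one of them; for example, the
following command should output the Hadoop installation path defined above:

 echo $HADOOP_HOME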

Step 15: Edit hadoop-env.sh File

The hadoop-env.sh file serves as a master file to configure YARN, HDFS, MapReduce, and Hadoop-related
project settings.

When setting up a single node Hadoop cluster, you need to define which Java implementation is to be
utilized. Use the previously created $HADOOP_HOME variable to access the hadoop-env.sh file:

 sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Uncomment the $JAVA_HOME variable (i.e., remove the # sign) and add the full path to
the OpenJDK installation on your system by adding the following line:

 export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

The path needs to match the location of the Java installation on your system.
If you need help to locate the correct Java path, run the following command in your terminal window:

 which javac

The resulting output provides the path to the Java binary directory.

Use the provided path to find the OpenJDK directory with the following command:

 readlink -f /usr/bin/javac

The section of the path just before the /bin/javac directory needs to be assigned to
the $JAVA_HOME variable.
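
For example, on a system where the command returns the path shown below (your path may differ), everything
before /bin/javac is the value to assign:

 readlink -f /usr/bin/javac

Result: /usr/lib/jvm/java-8-openjdk-amd64/bin/javac

 export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64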

Step 16: Edit core-site.xml File

The core-site.xml file defines HDFS and Hadoop core properties.

To set up Hadoop in a pseudo-distributed mode, you need to specify the URL for your NameNode, and the
temporary directory Hadoop uses for the map and reduce process.

Open the core-site.xml file in a text editor:

 sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml

Add the following configuration to override the default values for the temporary directory and add your
HDFS URL to replace the default local file system setting:

<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdoop/tmpdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>

This example uses values specific to the local system. You should use values that
match your system's requirements. The data needs to be consistent throughout the
configuration process.
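
If the temporary directory specified above does not already exist, create it as the hdoop user before starting
Hadoop; the path below matches the example value used in this configuration:

 mkdir -p /home/hdoop/tmpdata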
Step 17: Edit hdfs-site.xml File

The properties in the hdfs-site.xml file govern the location for storing node metadata, fsimage file, and edit
log file. Configure the file by defining the NameNode and DataNode storage directories.

Additionally, the default dfs.replication value of 3 needs to be changed to 1 to match the single node
setup.

Use the following command to open the hdfs-site.xml file for editing:

 sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Add the following configuration to the file and, if needed, adjust the NameNode and
DataNode directories to your custom locations:

<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hdoop/dfsdata/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hdoop/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

If necessary, create the specific directories you defined for the dfs.name.dir and dfs.data.dir values.
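
For the example values above, the directories can be created as the hdoop user with a single command; adjust
the paths if you chose different locations:

 mkdir -p /home/hdoop/dfsdata/namenode /home/hdoop/dfsdata/datanode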


Step 18: Edit mapred-site.xml File

Use the following command to access the mapred-site.xml file and define MapReduce values:

 sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

Add the following configuration to change the default MapReduce framework name value to yarn:

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Step 19: Edit yarn-site.xml File

The yarn-site.xml file is used to define settings relevant to YARN. It contains configurations for the Node
Manager, Resource Manager, Containers, and Application Master.

Open the yarn-site.xml file in a text editor:

 sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

Append the following configuration to the file:

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
Step 20: Format HDFS NameNode

It is important to format the NameNode before starting Hadoop services for the first time:

 hdfs namenode -format

The shutdown notification signifies the end of the NameNode format process.

Step 21: Starting Hadoop

Navigate to the hadoop-3.2.1/sbin directory and execute the following command to
start the NameNode and DataNode:

 start-dfs.sh

The system takes a few moments to initiate the necessary nodes.
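
Since yarn-site.xml was configured above, you will typically also want to start the YARN daemons
(ResourceManager and NodeManager); in a standard Hadoop 3.2.1 installation this is done with the companion
script in the same sbin directory:

 start-yarn.sh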


Step 22:

To check the processes running in our Hadoop cluster, we use the jps command. jps
stands for Java Virtual Machine Process Status Tool.

After running the jps command, the following daemons should be running.
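
On a typical single node setup, assuming both the HDFS and YARN start scripts were run, the listing includes
NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and the Jps process itself (the process
IDs will differ on each system):

 jps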

Note: Your Hadoop installation is successful only if the above daemons start.

Step 23: Access Hadoop from Browser

Use your preferred browser and navigate to your localhost URL or IP. The default port
number 9870 gives you access to the Hadoop NameNode:

 http://localhost:9870

The NameNode user interface provides a comprehensive overview of the entire cluster.
The default port 9864 is used to access individual DataNodes directly from your
browser:

 http://localhost:9864
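
If the YARN daemons were started, the ResourceManager web interface is also typically available on its
default port 8088:

 http://localhost:8088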

Result: Hence, the installation of a single node Hadoop cluster on Ubuntu 20.04.4 is successfully completed.
