Experiment No - 1
PROCEDURE:
Prerequisites:
The Hadoop framework is written in Java, and its services require a compatible Java Runtime Environment
(JRE) and Java Development Kit (JDK). Use the following command to update your system before initiating
a new installation:
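sudo apt update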
The OpenJDK 8 package in Ubuntu contains both the runtime environment and development kit.
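For example, it can be installed with:
sudo apt install openjdk-8-jdk -y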
The OpenJDK or Oracle Java version can affect how elements of a Hadoop ecosystem interact.
Install the OpenSSH server and client using the following command:
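sudo apt install openssh-server openssh-client -y
Then create the dedicated Hadoop user referred to in the next step (the adduser command below uses hdoop only as an example name):
sudo adduser hdoop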
The username in the above command is hdoop; you can use any username and password. Switch to
the newly created user and enter the corresponding password:
su - hdoop
Verify that SSH is installed by running the following commands:
which ssh
Result: /usr/bin/ssh
which sshd
Result: /usr/bin/sshd
Step 7:
Hadoop uses SSH to access its nodes, which would normally require the user to enter a password.
However, this requirement can be eliminated by creating and setting up SSH keys using the following
commands. If asked for a filename, just leave it blank and press the Enter key to continue.
su - hdoop
The following command generates an SSH key pair and defines the location where it is to be stored:
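ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
(This example generates an RSA key with an empty passphrase and stores it in ~/.ssh/id_rsa; adjust the options if a different key type or location is preferred.)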
The following command adds the newly created key to the list of authorized keys so that Hadoop can use
ssh without prompting for a password.
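cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
(The key filename assumes the ssh-keygen example above; the chmod step sets the restrictive permissions that SSH expects on the authorized_keys file.)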
The new user is now able to SSH without needing to enter a password every time. Verify everything is set
up correctly by using the hdoop user to SSH to localhost:
ssh localhost
The Hadoop user is now able to establish an SSH connection to the localhost.
Step 8: Visit the official Apache Hadoop page, and select the version of Hadoop you want to implement.
Here, use the binary download for Hadoop version 3.2.1.
Select your preferred option, and you will get a mirror link that allows you to download the Hadoop tar
package.
Step 9: Use the provided mirror link and download the Hadoop package with the wget command:
wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
Step 10:
Once the download is complete, extract the files to initiate the Hadoop installation by using the following
command:
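tar xzf hadoop-3.2.1.tar.gz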
Step 11:
To move the extracted Hadoop folder to your preferred installation location, use the following command:
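mv hadoop-3.2.1 /home/hdoop
(The destination /home/hdoop is only an example; it should match the HADOOP_HOME path configured in .bashrc later.)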
This setup, also called pseudo-distributed mode, allows each Hadoop daemon to run as a single
Java process. A Hadoop environment is configured by editing a set of configuration files:
.bashrc
hadoop-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
Before editing the .bashrc file in hdoop's home directory, we need to find the path where Java has
been installed in order to set the JAVA_HOME environment variable, as described in Step 3.
Edit the .bashrc file (for example, with nano ~/.bashrc) and define the Hadoop environment variables by adding the following content to the end of the file.
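The entries below are a typical example, assuming Hadoop 3.2.1 was extracted under /home/hdoop; adjust HADOOP_HOME to match your actual location:
export HADOOP_HOME=/home/hdoop/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Save the file, then apply the changes to the current shell environment: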
source ~/.bashrc
When setting up a single node Hadoop cluster, you need to define which Java implementation is to be
utilized. Use the previously created $HADOOP_HOME variable to access the hadoop-env.sh file:
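nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
(nano is used here only as an example editor.) Uncomment the JAVA_HOME line and set it to the full path of your OpenJDK installation, for example: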
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
The path needs to match the location of the Java installation on your system.
If you need help to locate the correct Java path, run the following command in your terminal window:
which javac
The resulting output provides the path to the Java binary directory.
Use the provided path to find the OpenJDK directory with the following command:
readlink -f /usr/bin/javac
To set up Hadoop in a pseudo-distributed mode, you need to specify the URL for your NameNode, and the
temporary directory Hadoop uses for the map and reduce process.
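Open the core-site.xml file for editing, for example:
nano $HADOOP_HOME/etc/hadoop/core-site.xml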
Add the following configuration to override the default values for the temporary directory and add your
HDFS URL to replace the default local file system setting:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdoop/tmpdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
This example uses values specific to the local system. You should use values that
match your system's requirements. The data needs to be consistent throughout the
configuration process.
Step 17: Edit hdfs-site.xml File
The properties in the hdfs-site.xml file govern the location for storing node metadata, fsimage file, and edit
log file. Configure the file by defining the NameNode and DataNode storage directories.
Additionally, the default dfs.replication value of 3 needs to be changed to 1 to match the single node
setup.
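Open the hdfs-site.xml file for editing, for example:
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml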
Add the following configuration to the file and, if needed, adjust the NameNode and
DataNode directories to your custom locations:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hdoop/dfsdata/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hdoop/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Step 18: Edit mapred-site.xml File
Add the following configuration to change the default MapReduce framework name value to yarn:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Step 19: Edit yarn-site.xml File
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
Step 20: Format HDFS NameNode
It is important to format the NameNode before starting Hadoop services for the first time:
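hdfs namenode -format
(This assumes the PATH entries added in .bashrc; alternatively, run the command from the $HADOOP_HOME/bin directory.)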
The shutdown notification signifies the end of the NameNode format process.
start-dfs.sh
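The start-dfs.sh script (located in $HADOOP_HOME/sbin, which the .bashrc PATH entry makes directly available) starts the NameNode, DataNode, and SecondaryNameNode daemons. Start the YARN ResourceManager and NodeManager with the companion script:
start-yarn.sh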
To check the processes running in our Hadoop cluster, we use the jps command. jps
stands for Java Virtual Machine Process Status Tool.
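jps
On a working single-node cluster the output typically lists the NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and Jps processes (the process IDs differ from system to system).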
Note: Your Hadoop installation is successful only if the above daemons have started.
Use your preferred browser and navigate to your localhost URL or IP. The default port
number 9870 gives you access to the Hadoop NameNode:
http://localhost:9870
The NameNode user interface provides a comprehensive overview of the entire cluster.
The default port 9864 is used to access individual DataNodes directly from your
browser:
http://localhost:9864
Result: The installation of a single-node Hadoop cluster on Ubuntu 20.04.4 was completed successfully.