Hadoop Installation On Linux

This document provides steps to install and configure OpenJDK, OpenSSH, and Hadoop on Ubuntu. It includes enabling passwordless SSH access, downloading and extracting Hadoop files, modifying configuration files for core Hadoop components like HDFS and YARN, and starting the Hadoop cluster and services.


1.Install OpenJDK on Ubuntu


sudo apt update
sudo apt install openjdk-8-jdk -y
java -version; javac -version

which javac
Copy the path it prints (typically /usr/bin/javac).

readlink -f /usr/bin/javac
Copy the part of the output before /bin/javac; this is your JAVA_HOME.
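
On a default Ubuntu OpenJDK 8 install the output usually looks like the following (shown only as an illustration; it matches the JAVA_HOME used in the next step):

/usr/lib/jvm/java-8-openjdk-amd64/bin/javac

so the part before /bin/javac, /usr/lib/jvm/java-8-openjdk-amd64, is your JAVA_HOME.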

nano ~/.bashrc

#Add JAVA_HOME path to .bashrc


export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
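
After saving, a quick sanity check (not part of the original steps) is to reload the file and confirm the variable resolves:

source ~/.bashrc
echo $JAVA_HOME                  # should print /usr/lib/jvm/java-8-openjdk-amd64
$JAVA_HOME/bin/java -version     # should report OpenJDK 1.8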

2.Install OpenSSH on Ubuntu


sudo apt install openssh-server openssh-client -y

3.Enable Passwordless SSH for Hadoop User


ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

4.Use the cat command to append the public key to authorized_keys in the ~/.ssh directory:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

5.Restrict the permissions of the authorized_keys file with the chmod command:
chmod 0600 ~/.ssh/authorized_keys

6.SSH to localhost
ssh localhost
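
The first connection asks you to confirm the host key; after that it should log in without prompting for a password. An optional non-interactive check (my addition, not in the original steps):

exit                                      # leave the test session opened above
ssh -o BatchMode=yes localhost 'echo OK'  # prints OK only if passwordless login works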

7.Download and Install Hadoop on Ubuntu


wget <mirror link>
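
Replace <mirror link> with the URL of the hadoop-3.3.0 tarball from an official Apache mirror. For example (URL given only as an illustration; any mirror listed on the Apache download page works):

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz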

8.Extract the files


tar xzf hadoop-3.3.0.tar.gz
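
The archive unpacks into a hadoop-3.3.0 directory. If you extracted it in your home directory, the path matches the HADOOP_HOME used below (here assumed to be /home/amit):

ls ~/hadoop-3.3.0        # expect bin/, etc/, sbin/ and share/ among the contents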

9.Modify the following files (bashrc is in your home directory; the rest are in $HADOOP_HOME/etc/hadoop):

bashrc
hadoop-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml

##.bashrc

Add the following lines at the end of the file:

#Hadoop Related Options


export HADOOP_HOME=/home/amit/hadoop-3.3.0
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Save it.

Run the following command in the terminal to make the new environment variables visible:
source ~/.bashrc
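
An optional check that the PATH change took effect:

hadoop version           # should report Hadoop 3.3.0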

##hadoop-env.sh

Uncomment the JAVA_HOME line by removing the leading # and set it to:


export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
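
Assuming the HADOOP_HOME exported above, the file can be opened and the change verified like this (optional sketch, not part of the original steps):

nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
grep "^export JAVA_HOME" $HADOOP_HOME/etc/hadoop/hadoop-env.sh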

##core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
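
Note: fs.default.name still works but is the deprecated alias; recent Hadoop releases usually use the current property name, which is equivalent here:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>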

Create directories for the NameNode and the DataNode and add their locations in hdfs-site.xml (see the example commands below).
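
For example, using the same paths that appear in hdfs-site.xml below:

mkdir -p /home/amit/hadoop-3.3.0/data/namenode
mkdir -p /home/amit/hadoop-3.3.0/data/datanode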

##hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/amit/hadoop-3.3.0/data/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/amit/hadoop-3.3.0/data/datanode</value>
</property>

##mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

##yarn-site.xml

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>127.0.0.1</value>
</property>
<property>
  <name>yarn.acl.enable</name>
  <value>0</value>
</property>
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>

10.Format HDFS NameNode


hdfs namenode -format

11.Start Hadoop Cluster


start-dfs.sh          # sbin is on PATH (added to .bashrc above), so no ./ prefix is needed
start-yarn.sh
jps
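
If the daemons started correctly, jps should list roughly the following (the PIDs are placeholders; only the daemon names matter):

4321 NameNode
4452 DataNode
4683 SecondaryNameNode
4912 ResourceManager
5107 NodeManager
5250 Jps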

12.Access Hadoop UI from Browser

Hadoop NameNode UI: http://localhost:9870

YARN Resource Manager UI: http://localhost:8088
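
You can also confirm from the terminal that both web UIs respond (optional check; expect a status code in the 200 or 302 range):

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088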
