
How to Install Hadoop in Linux?

Last Updated : 30 Jun, 2025

Hadoop is a framework written in Java for running applications on large clusters of commodity hardware. Its distributed file system, HDFS, is modelled on the Google File System. Hadoop needs Java to run, so we first install Java on our Ubuntu system.

Step 1: Check for Java

Open your terminal and first check whether Java is already installed on your system with the command:

java -version
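
If Java is already present, the output will look roughly like the following (the exact version and build strings on your machine will differ):

openjdk version "11.0.x"
OpenJDK Runtime Environment (build ...)
OpenJDK 64-Bit Server VM (build ..., mixed mode, sharing)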

If Java is not installed, follow the steps below:

1: Update your System

Run the following two commands to update the package lists and upgrade the installed packages:

sudo apt-get update

sudo apt-get upgrade

updating Linux system

2: Install the Default JDK

Now install the default JDK using the following command:

sudo apt-get install default-jdk

It will ask for confirmation (Y/n); press Y.

installing jdk for Hadoop

3: Check for Java again

Now verify that Java is installed using the command:

java -version

checking for java installation

Step 2: Creating a User

Once Java is installed, we create a dedicated user for Hadoop. This is not strictly necessary, but it is good practice to run the Hadoop installation under its own user. Use the following commands:

sudo addgroup hadoop

adding a user for Hadoop - 1

sudo adduser --ingroup hadoop hadoopusr

adding a user for Hadoop - 2

Step 3: Set Password

After running the above two commands, you have created a dedicated user named 'hadoopusr'. It will now ask for a new UNIX password, so choose one you can remember (the characters you type are not echoed on screen). It will then ask for information such as Full Name; keep pressing Enter to accept the defaults, then press Y to confirm that the information is correct.

adding user information for Hadoop Installation User

Step 4: Add User to the sudo Group

Now use the following command:

sudo adduser hadoopusr sudo

This command adds 'hadoopusr' to the 'sudo' group, so that it can run commands with superuser privileges.

making Hadoop user to superuser in Linux

Step 5: Install OpenSSH Server

Hadoop uses SSH (Secure Shell), so we also need to install the OpenSSH server:

sudo apt-get install openssh-server

installing ssh key
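
If you want to confirm that the SSH service is running before moving on, you can check it with systemd (on Ubuntu the service is usually named ssh):

sudo systemctl status ssh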

Step 6: Switch to User

Now it's time to switch to the new user, hadoopusr. Use the command below and enter the password you set above:

su - hadoopusr

switching to Hadoop user

Step 7: Generate SSH Key

Now generate an SSH key. Hadoop requires SSH access to manage its nodes, whether remote or local, so for our single-node setup we configure passwordless SSH access to localhost.

ssh-keygen -t rsa -P ""

After running this command, simply press Enter to accept the default file location.

generating ssh key for Hadoop user

Step 8: Authorize the SSH Key

Now add the machine's public key to its own authorized_keys file, so that it can be reached over SSH without a password:

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

add the public key of the computer to the authorized key file in Hadoop installation
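
If SSH still prompts for a password in the next step, a common cause is over-permissive permissions on the .ssh directory; tightening them as shown below usually fixes it:

chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys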

Step 9: Test SSH to localhost

Now test SSH to the local host with the command below. Type yes to continue, enter your password if asked, and then type exit to close the session.

ssh localhost

testing ssh localhost - 1

testing ssh localhost - 2

Now you have completed the basic requirement for Hadoop installation.

Step 10: Download Package

Now download the package that you are going to install. Download hadoop-2.9.0.tar.gz from the Hadoop-2.9.0 release page by clicking the file shown in the image below.

downloading hadoop
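
If you prefer working in the terminal, the same tarball can also be fetched with wget from the Apache release archive (the URL below assumes the standard archive layout for the 2.9.0 release):

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.9.0/hadoop-2.9.0.tar.gz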

Step 11: Extract the Package

Once you have downloaded hadoop-2.9.0.tar.gz, place the tar file in your preferred location and extract it with the command below. In my case I moved it to the Documents folder.

extracting downloaded Hadoop File - 1

Now extract the file with the following command, entering your hadoopusr password when prompted. If you have forgotten the password, don't worry: you can switch back to your main user and reset it.

sudo tar xvzf hadoop-2.9.0.tar.gz

extracting downloaded Hadoop File - 2

Step 12: Move the Hadoop Folder

Now move the extracted folder to /usr/local/hadoop with the command below (the tarball extracts to a folder named hadoop-2.9.0, so rename it to hadoop first, or adjust the command accordingly):

sudo mv hadoop /usr/local/hadoop

Step 13: Change Ownership

Now give hadoopusr ownership of the Hadoop directory:

sudo chown -R hadoopusr /usr/local/hadoop

changing ownership in Hadoop Installation
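
You can verify the new ownership with, for example:

ls -ld /usr/local/hadoop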

Step 14: Configure ~/.bashrc

This is the most important part of the installation: configuring Hadoop's files. First we configure the ~/.bashrc file. Open it with the command below:

sudo gedit ~/.bashrc

configuring ./bashrc in Hadoop Installation

When the ~/.bashrc file opens, copy the lines below to the end of the file (change the Java version to match the one on your PC; it might be java-8-openjdk-amd64, for example).

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

configuring ./bashrc in Hadoop Installation

Then reload the file so the new environment variables take effect in your current shell:

source ~/.bashrc

checking the configuring of ./bashrc in Hadoop Installation
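
To confirm the variables are in place, you can print one of them and ask Hadoop for its version; assuming the paths above are correct, both should succeed:

echo $HADOOP_HOME
hadoop version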

Step 15: Check the Java Version

Before configuring more files, confirm exactly which Java version is installed. Go to /usr/lib/jvm and list its contents with:

ls

The JDK directory name listed there is the version you need. In my case it is

java-11-openjdk-amd64

checking java version
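
An optional cross-check for the JAVA_HOME path is to resolve the real location of the java binary and drop the trailing /bin/java from the result:

readlink -f $(which java)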

Step 16: Configure hadoop-env.sh

Now we will configure hadoop-env.sh. Open the file using the command below:

sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

configuring hadoop-env.sh file

Once the file opens, add the export line below and make sure to comment out the existing export JAVA_HOME line:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

configuring hadoop-env.sh file

Don't forget to save.
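
A quick way to double-check the change without reopening the editor is:

grep JAVA_HOME /usr/local/hadoop/etc/hadoop/hadoop-env.sh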

Step 17: Configure core-site.xml

Now we will configure core-site.xml. Open the file using the command below:

sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml

configure the core-site.xml

Once the file opens, copy the text below inside the <configuration> tag:

[code]
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
[/code]

See the below image for better understanding:

configure the core-site.xml

Step 18: Configure hdfs-site.xml

Now we will configure hdfs-site.xml. Open the file using the command below:

sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
configuring the hdfs-site.xml file

Once the file opens, copy the text below inside the <configuration> tag:

[code]
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>
[/code]

See the below image for better understanding:

 configuring the hdfs-site.xml file

Step 19: Configure yarn-site.xml

Now we will configure yarn-site.xml, which controls how YARN runs applications in the Hadoop environment. Open the file using the command below:

sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
yarn-site.xml file configuration

Once the file opens, copy the text below inside the <configuration> tag:

[code]
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
[/code]

See the below image for better understanding:

yarn-site.xml file configuration

Step 20: Configure mapred-site.xml

The last file to configure is mapred-site.xml. Hadoop only ships a template for it, mapred-site.xml.template, located in /usr/local/hadoop/etc/hadoop/, so copy the template to mapred-site.xml (copying and renaming it in a single step) with the following command:

sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
mapred-site.xml file configuration

Once the file has been copied, open it using the following command:

sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
mapred-site.xml file configuration

Then place the content below inside its <configuration> tag:

[code]
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
[/code]

See the below image for better understanding:

mapred-site.xml file configuration

Step 21: Create the NameNode and DataNode Directories

Now all the files are configured, so it is time to prepare the first run. Hadoop's NameNode and DataNode need local directories to store their data, and these must match the paths we set for dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml. Create them with the commands below:

sudo mkdir -p /usr/local/hadoop_tmp
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode

Now give hadoopusr ownership of these directories:

sudo chown -R hadoopusr /usr/local/hadoop_tmp

Running Hadoop

1. First, format the NameNode. Run the command below only the first time you start the cluster; if you run it again later, all your HDFS metadata will be erased.

hdfs namenode -format
formatting namenode in Hadoop

2. Now start the DFS, i.e. the Distributed File System:

start-dfs.sh
starting DFS in Hadoop

3. The last thing you need to start is YARN:

start-yarn.sh
starting yarn in Hadoop

4. Now use the following command:

jps

You should now see NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager and Jps listed, which means you have successfully installed Hadoop.

using jps command
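
On a healthy single-node setup the jps output looks roughly like this (the process IDs on the left will differ on your machine):

12081 NameNode
12237 DataNode
12442 SecondaryNameNode
12615 ResourceManager
12752 NodeManager
13069 Jps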

5. You have successfully installed Hadoop on your system. To check your cluster information, open localhost:50070 in your browser. The interface will look like this:

Hadoop Interface in Browser
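
When you are done working with the cluster, the daemons can be stopped in reverse order with the matching scripts from Hadoop's sbin directory:

stop-yarn.sh
stop-dfs.sh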
