Hadoop is a framework written in Java for running applications on large clusters of commodity hardware. Its file system, HDFS, is modelled on the Google File System. Hadoop requires Java, so before installing Hadoop we first install Java on our Ubuntu system.
Step 1: Open your terminal and check whether your system already has Java installed with the command:
java -version
Step 2: Now update your system. Below are the two commands to do so:
sudo apt-get update
sudo apt-get upgrade
Step 3: Now install the default JDK using the following command:
sudo apt-get install default-jdk
When it asks for confirmation (Y/n), press Y.
Step 4: Now check whether Java installed correctly using the command:
java -version
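If the installation succeeded, you should see output along these lines (the version and build strings below are only illustrative and will differ from machine to machine):
[code]
openjdk version "11.0.21" 2023-10-17
OpenJDK Runtime Environment (build 11.0.21+9-post-Ubuntu-0ubuntu1)
OpenJDK 64-Bit Server VM (build 11.0.21+9-post-Ubuntu-0ubuntu1, mixed mode, sharing)
[/code]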
Step 5: Once Java is installed, we create a dedicated user for Hadoop. This is not strictly necessary, but it is good practice to run Hadoop under its own user. Use the following commands:
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoopusr
Step 6: After running the above two commands, you have successfully created a dedicated user named hadoopusr. It will now ask for a new UNIX password; choose one at your convenience (note that the terminal does not echo what you type, so remember your password). It will then ask for information such as Full Name; keep pressing Enter to accept the defaults, then press Y to confirm the information.
Step 7: Now use the following command:
sudo adduser hadoopusr sudo
This command adds 'hadoopusr' to the 'sudo' group, so that the user can also run commands as a superuser.
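To verify the memberships, you can list the groups of the new user; the output should include both hadoop and sudo:
groups hadoopusr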
Step 8: We also need to install SSH (Secure Shell), i.e. the OpenSSH server:
sudo apt-get install openssh-server
Step 9: Now switch to the new user, hadoopusr, with the command below, entering the password you chose above:
su - hadoopusr
Step 10: Now generate an SSH key, because Hadoop requires SSH access to manage its nodes, whether remote or local. For our single-node setup we configure SSH so that we have access to localhost:
ssh-keygen -t rsa -P ""
After this command, simply press Enter when prompted for the file location to accept the default.
Step 11: Now append the machine's public key to its own authorized_keys file, so that the key generated above grants SSH access:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
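SSH is strict about the permissions on this file. If the login in the next step is refused, tightening the permissions usually fixes it (this is standard OpenSSH behaviour, not anything Hadoop-specific):
chmod 0600 $HOME/.ssh/authorized_keys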
Step 12: Now test SSH access to localhost with the command below. Type yes to continue and enter your password if asked, then type exit to return:
ssh localhost

You have now completed the basic requirements for the Hadoop installation.
Step 13: Now download the package you are going to install, hadoop-2.9.0.tar.gz, from the Apache Hadoop release archive.
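If you prefer the terminal to the browser, one way to fetch the same release is with wget from the Apache archive (the URL below assumes the hadoop-2.9.0 release; adjust it if you pick a different version):
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.9.0/hadoop-2.9.0.tar.gz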
Step 14: Once you have downloaded hadoop-2.9.0.tar.gz, place the tar file in your preferred location. In my case I moved it to the /Documents folder.

Now we extract the file with the command below, entering your hadoopusr password when prompted. If you don't know the password, don't worry: you can simply switch back to your own user and reset it.
sudo tar xvzf hadoop-2.9.0.tar.gz
Step 15: Now move the extracted folder to /usr/local/hadoop with the command below (make sure the name of your extracted folder is hadoop):
sudo mv hadoop /usr/local/hadoop
Step 16: Now change the ownership of the Hadoop directory to the new user:
sudo chown -R hadoopusr /usr/local/hadoop
Step 17: This is the most important step: we are now going to configure several files. First we configure the ~/.bashrc file. To open it, type the command below:
sudo gedit ~/.bashrc
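gedit needs a graphical desktop; if you are on a headless server, any terminal editor works just as well, for example:
nano ~/.bashrc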

The ~/.bashrc file will open; copy the lines below into it (change the Java version according to the Java version on your PC, e.g. it might be java-8-openjdk-amd64):
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

Then reload the file so that the changes take effect in your current shell:
source ~/.bashrc
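As a quick sanity check, you can echo one of the new variables and ask Hadoop for its version; if both respond, the environment is set up correctly (hadoop version works here only because $HADOOP_HOME/bin is now on your PATH):
echo $HADOOP_HOME
hadoop version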
Step 18: Before configuring more files, confirm which Java version you have installed. Go to the /usr/lib/jvm directory and run the ls command to list its contents; the directory name is your Java version. In my case it is java-11-openjdk-amd64.
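If you would rather not eyeball the directory listing, the command below prints the path of the JVM that the java command actually resolves to:
readlink -f /usr/bin/java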
Step 19: Now we will configure hadoop-env.sh. Open the file using the command below:
sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Once the file is open, add the export line below, and make sure to comment out the existing export line that sets JAVA_HOME:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

Don't forget to save the file.
Step 20: Now we will configure core-site.xml. Open the file using the command below:
sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml

Once the file opens, copy the text below inside its <configuration> tag:
[code]
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
[/code]
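A side note: fs.default.name is the older name of this property; recent Hadoop releases call it fs.defaultFS, and either spelling is accepted in the 2.x line. If you prefer the newer name, the equivalent entry is:
[code]
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
[/code]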
Step 21: Now we will configure hdfs-site.xml. Open the file using the command below:
sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Once the file opens, copy the text below inside its <configuration> tag. dfs.replication is set to 1 because this is a single-node cluster, and the namenode and datanode paths must match the directories we create in Step 24:
[code]
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_space/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_space/hdfs/datanode</value>
</property>
[/code]
Step 22: Now we will configure yarn-site.xml, which is responsible for running jobs in the Hadoop environment. Open the file using the command below:
sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml

Once the file opens, copy the text below inside its <configuration> tag:
[code]
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
[/code]
Step 23: The last file to configure is mapred-site.xml. Hadoop only ships a template for it, mapred-site.xml.template, located in /usr/local/hadoop/etc/hadoop/, so we copy the template to the same location under the new name in a single command:
sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

Once the file has been copied and renamed, open it using the following command:
sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml

Then place the content below inside its <configuration> tag:
[code]
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
[/code]
Step 24: We have now successfully configured all the files, so it is time to check the installation. As the Hadoop architecture has a namenode and a datanode, we create a directory, hadoop_space, containing an hdfs directory with namenode and datanode subdirectories, matching the paths set in hdfs-site.xml. The commands are below:
sudo mkdir -p /usr/local/hadoop_space
sudo mkdir -p /usr/local/hadoop_space/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_space/hdfs/datanode
Now we need to give hadoopusr ownership of these directories; the command is below:
sudo chown -R hadoopusr /usr/local/hadoop_space
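You can confirm the ownership change with a recursive long listing; each directory should now show hadoopusr as its owner:
ls -lR /usr/local/hadoop_space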
Running Hadoop
1. First, format the namenode. Run the command below only the first time you start the cluster; if you run it again, all your HDFS metadata will be erased:
hdfs namenode -format
2. Now start the DFS, i.e. the Distributed File System:
start-dfs.sh
3. The last thing you need to start is YARN:
start-yarn.sh
4. Now list the running Java processes with the following command:
jps
You should see NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and Jps, which means you have successfully installed Hadoop.
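For reference, the output looks something like the sample below; the process IDs in the left column will of course differ on your machine:
[code]
9857 NameNode
10023 DataNode
10241 SecondaryNameNode
10506 ResourceManager
10642 NodeManager
11011 Jps
[/code]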
5. You have successfully installed Hadoop on your system. To check your cluster information, open localhost:50070 in your browser; this is the NameNode web interface.
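When you are done working, you can shut the cluster down cleanly with the companion stop scripts, in the reverse order of startup:
stop-yarn.sh
stop-dfs.sh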