Hadoop Multinode Cluster Installation

The document describes the steps to install Apache Hadoop 2.7.1 in a multinode cluster configuration with 1 name node and 3 data nodes on Ubuntu 12.04. It involves setting up the environment, configuring hostnames and IPs, installing Java and Hadoop, configuring configuration files, enabling passwordless SSH login, formatting the name node, and starting and stopping the Hadoop daemons.


Multinode Cluster Installation Mode

Apache Hadoop v2.7.1


Linux Operating System (Ubuntu 12.04)

Environment Setup:
No. of Nodes = 4 (1 Namenode, 3 Datanodes)
Hostnames:
Namenode – namenode
Datanodes – datanode1, datanode2, datanode3

Installation and Configuration:


In “namenode”
1. Create a new user “multinode” for this installation procedure.
~$ sudo adduser multinode

2. Edit the “/etc/hosts” file providing the IP addresses of the cluster nodes.
~$ sudo vim /etc/hosts
namenode-ip-address namenode
datanode1-ip-address datanode1
datanode2-ip-address datanode2
datanode3-ip-address datanode3

Comment out the line containing “localhost”.


After making the above-mentioned changes, save and close the file.
Note: Also make sure all four nodes are reachable from one another over the network.
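
For illustration only (these addresses are hypothetical), the finished file might look like:
192.168.1.100 namenode
192.168.1.101 datanode1
192.168.1.102 datanode2
192.168.1.103 datanode3

A quick ping from the namenode confirms each datanode is reachable:
~$ ping -c 1 datanode1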

3. Switch to the newly created user account


~$ su - multinode

4. Download the Apache Hadoop 2.7.1 tarball distribution.

5. Download the Java 7 (JDK 1.7) tarball. Check the machine architecture, 32-bit (i386,
i586, i686) or 64-bit (x86_64), before downloading.
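
For reference, the Hadoop 2.7.1 tarball should still be available from the Apache archive (the JDK tarball has to be fetched manually from Oracle's download page after accepting the license; the file name used below matches the one extracted in step 6):
~$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz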

6. Assuming the downloaded tarballs are in the home directory of the user, extract them:

~$ tar -xvf hadoop-2.7.1.tar.gz


~$ tar -xvf jdk-7u79-linux-x86_64.gz

7. After extracting, set up the environment variables in ~/.bashrc file


~$ vi .bashrc
export JAVA_HOME=/home/multinode/jdk1.7.0_79
export HADOOP_PREFIX=/home/multinode/hadoop-2.7.1
export HADOOP_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
After appending these lines, save and close the file.

8. For these variables to be set for the current shell, source the file.
~$ source ~/.bashrc
Check whether the changes have been applied properly
~$ echo $JAVA_HOME
~$ hadoop version
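If everything is in place, the first command should print the JDK path set above (/home/multinode/jdk1.7.0_79) and "hadoop version" should report a banner beginning with "Hadoop 2.7.1".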

9. Next, edit the Hadoop configuration files.


~$ cd $HADOOP_CONF_DIR

~hadoop-2.7.1/etc/hadoop$ vi hadoop-env.sh
export JAVA_HOME=/home/multinode/jdk1.7.0_79

~hadoop-2.7.1/etc/hadoop$ vi core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:8020</value>
  </property>
</configuration>

~hadoop-2.7.1/etc/hadoop$ vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/multinode/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/multinode/data</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>namenode:50070</value>
  </property>
</configuration>
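
The name and data directories referenced above are initialized by the namenode format in step 11 and by the daemons themselves. Optionally, they can be created up front so that ownership or permission problems surface early (a convenience step, not part of the original procedure):
~$ mkdir -p /home/multinode/name /home/multinode/data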

~hadoop-2.7.1/etc/hadoop$ vi yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>namenode</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

~hadoop-2.7.1/etc/hadoop$ cp mapred-site.xml.template mapred-site.xml

~hadoop-2.7.1/etc/hadoop$ vi mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Note: These configurations assume that the NameNode, ResourceManager and
JobHistoryServer daemons all run on the “namenode” host.

~hadoop-2.7.1/etc/hadoop$ vi slaves
datanode1
datanode2
datanode3

In “datanodes”:
Repeat the steps from 1 to 9 on all datanodes.
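
Instead of repeating the manual edits, one possible shortcut (an assumption, not part of the original steps) is to copy the already-configured directories and the .bashrc from the namenode once the “multinode” user exists on each datanode:
~$ scp -r ~/hadoop-2.7.1 ~/jdk1.7.0_79 ~/.bashrc multinode@datanode1:~/
Repeat the copy for datanode2 and datanode3 (this will still prompt for a password, since step 10 has not been done yet).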

10. To enable passwordless login from the namenode to all datanodes through SSH
In “namenode”:
~$ ssh-keygen
~$ ssh-copy-id -i ~/.ssh/id_rsa.pub namenode
~$ ssh-copy-id -i ~/.ssh/id_rsa.pub datanode1
~$ ssh-copy-id -i ~/.ssh/id_rsa.pub datanode2
~$ ssh-copy-id -i ~/.ssh/id_rsa.pub datanode3

This avoids password prompts when the start scripts launch the daemons.
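
A quick check that the key-based login works is to run a remote command against each datanode; it should print the hostname without asking for a password:
~$ ssh datanode1 hostname
~$ ssh datanode2 hostname
~$ ssh datanode3 hostname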

11. Format the namenode before starting the daemons.


~$ hdfs namenode -format

This formats the dfs.namenode.name.dir location and creates the files and folders
required by the namenode.

Note: Steps 10 and 11 are one-time procedures.

12. Start the cluster


~$ start-dfs.sh
~$ start-yarn.sh
~$ mr-jobhistory-daemon.sh start historyserver
Alternatively, to start all the daemons
~$ start-all.sh
~$ mr-jobhistory-daemon.sh start historyserver

13. To check the running daemons, use jps (Java process status)
~$ jps
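With the configuration above, jps on the namenode would typically list NameNode, SecondaryNameNode, ResourceManager and JobHistoryServer (plus Jps itself), while each datanode would list DataNode and NodeManager; the process IDs shown will differ from cluster to cluster.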

14. To stop the cluster


~$ stop-yarn.sh
~$ stop-dfs.sh
~$ mr-jobhistory-daemon.sh stop historyserver
To stop all the daemons in one go
~$ stop-all.sh
~$ mr-jobhistory-daemon.sh stop historyserver

Note: To start or stop daemons individually:


~$ hadoop-daemon.sh <start | stop> <namenode | datanode>
~$ yarn-daemon.sh <start | stop> <resourcemanager | nodemanager>

To stop or start all datanodes


~$ hadoop-daemons.sh <start | stop> datanode

To stop or start all nodemanagers


~$ yarn-daemons.sh <start | stop> nodemanager
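
Once the daemons are up, the web interfaces offer another quick health check (the NameNode port is the one configured in hdfs-site.xml; the other two are the YARN and JobHistory defaults):
http://namenode:50070 - NameNode / HDFS status
http://namenode:8088 - ResourceManager / YARN applications
http://namenode:19888 - MapReduce JobHistoryServer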
