Single Node Hadoop Cluster
Hadoop installation
Prerequisites
Java 7 (Sun/Oracle JDK or OpenJDK)
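If Java is not installed yet, one way to get a compatible JDK on Ubuntu is sketched below (an assumption; Oracle's JDK 7 works as well, and the JAVA_HOME path used later in this guide matches the OpenJDK package):
$ sudo apt-get update
$ sudo apt-get install openjdk-7-jdk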
After installation, make a quick check whether the JDK is correctly set up:
$ java -version
Next, create a dedicated user and group for Hadoop; the commands below add the user hduser and the group hadoop to your local machine.
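A typical Debian/Ubuntu sketch (the user and group names match those used in the rest of this guide):
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser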
Configuring SSH
$ su hduser
The Hadoop control scripts rely on SSH to perform cluster-wide operations. For example, there is
a script for stopping and starting all the daemons in the cluster. To work seamlessly, SSH needs
to be set up to allow password-less login for the hadoop user from the machines in the cluster. The
simplest way to achieve this is to generate a public/private key pair and share it across
the cluster.
Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine.
For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for
the hduser user we created earlier.
We have to generate an SSH key for the hduser user.
$ ssh-keygen -t rsa -P ""
-P "" here indicates an empty passphrase.
You have to enable SSH access to your local machine with this newly created key, which is done
by appending the public key to hduser's authorized_keys file, as sketched below; you can then test password-less login.
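A minimal sketch, assuming the key was written to the default ~/.ssh/id_rsa location:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Now test the connection: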
$ ssh localhost
Now, let's move Hadoop 2.7.2 to a directory of our choice; we will choose /usr/local/hadoop.
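A typical sequence, assuming the hadoop-2.7.2.tar.gz archive has already been downloaded to the current directory (archive name and target path are assumptions matching this guide):
$ tar xzf hadoop-2.7.2.tar.gz
$ sudo mv hadoop-2.7.2 /usr/local/hadoop
$ sudo chown -R hduser:hadoop /usr/local/hadoop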
The following files will have to be modified to complete the Hadoop setup:
1. ~/.bashrc
2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
3. /usr/local/hadoop/etc/hadoop/core-site.xml
4. /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml
6. /usr/local/hadoop/etc/hadoop/yarn-site.xml
1. ~/.bashrc
Now let's edit the .bashrc file and append the Hadoop environment variables to the end of the file; afterwards, reload it.
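A typical set of entries (a sketch; HADOOP_HOME and the Java path are assumptions matching the locations used in this guide):
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"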
$ source ~/.bashrc
2. hadoop-env.sh
Now let's set the Java path that Hadoop will use. In hadoop-env.sh, change the line
export JAVA_HOME=${JAVA_HOME}
to point at the installed JDK, for example:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
3. core-site.xml
The /usr/local/hadoop/etc/hadoop/core-site.xml file contains configuration properties that Hadoop uses when starting
up.
This file can be used to override the default settings that Hadoop starts with. Add the following property between the <configuration> and </configuration> tags:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
4. mapred-site.xml
The mapred-site.xml file is used to specify which framework is being used for MapReduce. By default only the template mapred-site.xml.template exists, so first copy it:
$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
Then add the following property between the <configuration> tags:
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
5. hdfs-site.xml
The /usr/local/hadoop/etc/hadoop/hdfs-site.xml file needs to be configured for each host in the cluster on which it is being
used. It specifies the directories that will be used for the namenode and the datanode on that host.
Add the following properties between the <configuration> tags:
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
6. yarn-site.xml
The /usr/local/hadoop/etc/hadoop/yarn-site.xml file configures YARN. Add the following properties between the <configuration> tags so that the NodeManager runs the MapReduce shuffle service:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Now, let's create the folders that HDFS will use for the namenode and datanode data, and give hduser ownership of them.
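A sketch matching the paths configured in hdfs-site.xml above:
$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
$ sudo chown -R hduser:hadoop /usr/local/hadoop_store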
The hadoop namenode -format command should be executed once, before we start using Hadoop for the first time.
If this command is executed again after Hadoop has been used, it will destroy all the data on the Hadoop file
system.
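Run the format command as hduser:
$ hadoop namenode -format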
Starting Hadoop
$ start-dfs.sh
$ start-yarn.sh
$ jps
jps verifies whether the cluster has been set up properly. After starting HDFS and YARN you should see the NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager processes listed (along with Jps itself).
Now, to check whether Hadoop is installed and running correctly, follow these steps:
1. Open your browser.
2. Enter localhost:8088 in the address bar; this opens the web UI of the YARN ResourceManager.