
Hadoop Installation and

Map-Reduce Programming

by
A K Chakravarthy
Assistant Professor
Department of Information Technology



Hadoop can be installed in your systems in three different modes:

Local Standalone mode
Pseudo distributed mode
Fully distributed mode



Local Standalone mode:

This is the default mode. In this mode, all the components of Hadoop, such as NameNode, DataNode, JobTracker and TaskTracker, run in a single Java process.



Pseudo-distributed mode:

In this mode, a separate JVM is spawned for each of the Hadoop components and they communicate across network sockets, effectively giving a fully functioning minicluster on a single host.



Fully Distributed mode:

In this mode, Hadoop is spread across multiple machines, some of which will be general-purpose workers and others will be dedicated hosts for components such as NameNode and JobTracker.



Environment Setup for Hadoop

Hadoop is supported on the GNU/Linux platform and its flavours (such as Ubuntu). Therefore, we have to install a Linux operating system for setting up the Hadoop environment.

In case you have an OS other than Linux, you can install VirtualBox and run Linux inside it.



• Before installation of VirtualBox, install the Microsoft Visual C++ Redistributable

• Download Ubuntu (ubuntu-16.04.7-desktop-amd64)

• Install Ubuntu using VirtualBox



I. Local Standalone mode



Step-1: Creating a User in Ubuntu:
At the beginning, it is recommended to create a separate user for Hadoop, to isolate the Hadoop file system from the Unix file system.

In addition to this, since we will need to prepare a cluster later, first create a group and then a user in that group.



Follow the steps given below to create a group and a user in that group:

$ clear
$ sudo addgroup aec_viper_group
$ sudo adduser --ingroup aec_viper_group aec_viper_user

The password is ‘aec’ in both cases.



$ sudo gedit /etc/sudoers

Add the following line (after %sudo ALL=(ALL:ALL) ALL):

%aec_viper_group ALL=(ALL:ALL) ALL



Change the user from the existing user to aec_viper_user:

LOGOUT of the present user

LOGIN with the newly created user
“aec_viper_user”

After this, the terminal prompt should start with
aec_viper_user@....... (and pwd should show /home/aec_viper_user)



Check whether Java is installed or not:
$ java -version

If not:
$ sudo apt-get install default-jre
$ sudo apt-get install default-jdk
(An Internet connection is a must.)



Step-2: SSH Setup and Key Generation in Ubuntu:

SSH setup is required to carry out different operations on a cluster, such as starting and stopping daemons and distributed shell operations. To authenticate different users of Hadoop, a public/private key pair is provided for the Hadoop user and shared with the other users.



$ ssh localhost
After typing this you should be able to connect to localhost and see output like the screenshot on the slide.
Otherwise, use the following commands:
$ sudo apt-get install openssh-server
$ sudo apt-get install vsftpd
(An Internet connection is a must.)

$ ssh localhost



The following commands are used to:
• generate a key pair using SSH,
• copy the public key from id_rsa.pub to authorized_keys, and
• give the owner read and write permissions on the authorized_keys file.

$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
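
As a quick check (not in the original slides), ssh localhost should now log in without prompting for a password:

$ ssh localhost    # should no longer ask for a password
$ exit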



Step-3: Installing Java
All of you, type this command:
$ uname -i

This reports the machine architecture: you will get either x86_64, i686, or something else. It decides which Java path to use below (Java itself is already installed on your computer).



Then open a text editor (use the search in Ubuntu) and create a file abc.txt.

Those who got x86_64, type these two lines into abc.txt:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin

Those who got other than x86_64, type these two lines into abc.txt:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export PATH=$PATH:$JAVA_HOME/bin



Step-4: Download Hadoop

As Hadoop has already been downloaded on one computer in this lab, you need to copy that download onto your computer. Use the following commands to do this.

$ pwd    (you should be in the home directory of the Hadoop user)

$ wget ftp://172.168.10.168/Downloads/hadoop-2.7.2.tar.gz --user=lenova --password=nitw

By doing this, the downloaded Hadoop software, which is in tar form, will be copied into the home directory of the Hadoop user.



Step-5: Untar the Hadoop archive

To untar (like unzipping) the Hadoop archive in the home directory, use the following commands.

$ tar zxf hadoop-2.7.2.tar.gz

$ ls -lrt



Step-6: Setting the Hadoop Path

Just like the Java path, we now set the Hadoop path. Open the file abc.txt, in which you have already typed two lines, and add the following two lines:

export HADOOP_HOME=/home/aec_viper_user/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin



Step-7: Updating the bashrc file
To make the environment changes permanent, i.e. to have the Java path and the Hadoop path work correctly in all terminals, we have to save the changes to the bashrc file of the current user.
bashrc is a hidden file, so to open it we refer to it as .bashrc.
Use the following commands to do this.
$ nano .bashrc
Once .bashrc is open:
1. go to the end of the file,
2. paste the 4 lines from abc.txt into this .bashrc file,
3. save the .bashrc file.
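
If you prefer not to paste the lines by hand, the same result can be obtained by appending the scratch file directly; a minimal alternative, assuming abc.txt contains exactly the four export lines:

$ cat abc.txt >> ~/.bashrc    # append the Java and Hadoop export lines to .bashrc
$ tail -n 4 ~/.bashrc         # confirm the four lines are now at the end of the file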



Step-8: Now use the command

$ source ~/.bashrc

(This command is to refresh the terminal with updated bashrc)



Step-9: Verify that everything has been done properly, i.e. check that the path settings are reflected.

$ echo $JAVA_HOME

$ echo $HADOOP_HOME

$ echo $PATH    (It will show both the Java path and the Hadoop path)



Step-10: Now check whether Hadoop is working or not (just like java -version):
$ hadoop version

Now Hadoop in Local Standalone mode is ready.
Now we run a program on Hadoop from the examples already provided.

Let's have an input directory into which we push a few files; our requirement is to count the total number of words in those files. To calculate the total number of words, we do not need to write our own MapReduce program, because the provided .jar file contains an implementation of word count. You can try other examples using the same .jar file; just issue the following commands to check the MapReduce programs supported by the hadoop-mapreduce-examples-2.7.2.jar file.



Step-11: Copy the examples jar into the working directory.

$ cp $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar .
(The trailing dot is a must; it means “copy into the present working directory”.)

This command copies the examples jar into the present working directory, i.e. the home directory of the Hadoop user.

$ ls -lrt



Step-12: Now we will execute the wordcount program.

For this, we need an input directory from which the input will be taken by the program, and the output will be written to another directory.

So first create a directory named input_dir and copy some text files into this directory.

$ mkdir input_dir
$ cp $HADOOP_HOME/*.txt input_dir
$ cd input_dir
$ ls -lrt
$ cd ..    // to come back to the home directory
Step-13: Now, use the following command to execute the program:

$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount input_dir output_dir

Here, jar is the keyword (just like in java), hadoop-mapreduce-examples-2.7.2.jar is the jar file name in the current directory, wordcount is the program to be run, input_dir is the input directory and output_dir is the output directory.



Step-14: To see the output

$ cd output_dir
$ ls -lrt
$ cat part-r-00000
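
The part-r-00000 file lists each word and its count, separated by a tab. Purely for illustration (the actual words and counts depend on the .txt files copied into input_dir), the output looks something like:

Apache    14
Hadoop    23
License   7
...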



Assignment-1



II. Pseudo-distributed mode



Step-1: Setting Up Hadoop
You can set the Hadoop environment variables by appending the following lines to .bashrc and then saving it:

export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=$PATH:$HADOOP_HOME/sbin



Step-2: Now use the command

$ source ~/.bashrc

(This command is to refresh the terminal with updated bashrc)



Step-3: Hadoop Configuration

You can find all the Hadoop configuration files in the location “$HADOOP_HOME/etc/hadoop”. It is required to make changes in those configuration files according to your Hadoop infrastructure.

$ cd $HADOOP_HOME/etc/hadoop
$ pwd



In order to develop Hadoop programs in Java, you have to reset the Java environment variable in the hadoop-env.sh file by replacing the JAVA_HOME value with the location of Java on your system.

$ nano hadoop-env.sh

Write this statement at the end of the file.

Those who got x86_64 (uname -i), type:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

Those who got other than x86_64, type:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386



You also need to configure the following files:
core-site.xml
hdfs-site.xml
yarn-site.xml
mapred-site.xml



Step-4: Configuring core-site.xml

The core-site.xml file contains information such as the port number used for the Hadoop instance, the memory allocated for the file system, the memory limit for storing data, and the size of the read/write buffers. Open core-site.xml and add the following properties between the <configuration> and </configuration> tags.

$ nano core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>



Step-5: Configuring hdfs-site.xml:

$ nano hdfs-site.xml

The hdfs-site.xml file contains information such as the replication value, the namenode path and the datanode path on your local file system, i.e. the place where you want to store the Hadoop infrastructure.

Let us assume the following data:
dfs.replication (data replication value) = 1
namenode path = /home/aec_viper_user/hadoopinfra/hdfs/namenode
datanode path = /home/aec_viper_user/hadoopinfra/hdfs/datanode

(Here aec_viper_user is the user name; hadoopinfra/hdfs/namenode and hadoopinfra/hdfs/datanode are the directories created for the HDFS file system.)
<configuration>

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.name.dir</name>

<value>file:///home/aec_viper_user/hadoopinfra/hdfs/namenode</value>
</property>

<property>
<name>dfs.data.dir</name>

<value>file:///home/aec_viper_user/hadoopinfra/hdfs/datanode</value>
</property>

</configuration>



Step-6: Configuring yarn-site.xml

$ nano yarn-site.xml

This file is used to configure YARN for Hadoop. Open the yarn-site.xml file and add the following properties between the <configuration> and </configuration> tags in this file.

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>



Step-7: Configuring mapred-site.xml

This file is used to specify which MapReduce framework we are using. By default, Hadoop contains a template of this file named mapred-site.xml.template. First of all, copy mapred-site.xml.template to mapred-site.xml using the following command.

$ cp mapred-site.xml.template mapred-site.xml

Then open mapred-site.xml and add the following properties between the <configuration> and </configuration> tags.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Step-8: Verifying Hadoop Installation

Go to the home directory and format the NameNode.

$ cd

$ hdfs namenode -format



Step-9: Starting Hadoop dfs and yarn

Go to the home directory.

$ start-dfs.sh

$ start-yarn.sh



Step-10: Verifying the Hadoop daemons

$ jps
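
If the daemons have started correctly, jps typically lists entries like the following (the process IDs will differ on your machine):

2481 NameNode
2613 DataNode
2798 SecondaryNameNode
2964 ResourceManager
3090 NodeManager
3371 Jps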



Before proceeding further, you need to make sure that Hadoop is working fine. Just issue the following command:
$ hadoop version
If everything is fine with your setup, you should see the Hadoop version information printed.

This means your Hadoop pseudo-distributed mode setup is working. (Recall that, by default, Hadoop is configured to run in non-distributed mode on a single machine; the configuration above is what switches it to pseudo-distributed mode.)
Word-count Program Execution in the HDFS Environment

When you installed Hadoop in Standalone mode, the data used to run the program came from the local file system.

But now, in Pseudo-distributed mode, we will see how to put data into HDFS and get data out of HDFS, so that we get the feel of working with Hadoop storage.



$ hdfs dfs -mkdir hdfs://localhost:9000/acetinput
$ hdfs dfs -ls hdfs://localhost:9000/

$ hdfs dfs -put /../..file1.txt hdfs://localhost:9000/acetinput
$ hdfs dfs -put /../..file2.txt hdfs://localhost:9000/acetinput
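
To confirm that the files actually landed in HDFS, a quick check (not in the original slides):

$ hdfs dfs -ls hdfs://localhost:9000/acetinput    # should list file1.txt and file2.txt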



Step-13: Now, use the following command to execute the program:

$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount hdfs://localhost:9000/acetinput hdfs://localhost:9000/acetoutput

As before, jar is the keyword (just like in java), hadoop-mapreduce-examples-2.7.2.jar is the jar file name in the current directory, wordcount is the program to be run, hdfs://localhost:9000/acetinput is the input directory and hdfs://localhost:9000/acetoutput is the output directory.



Step-14: To see the output

$ hdfs dfs -cat hdfs://localhost:9000/acetoutput/part-r-00000



III. Fully-Distributed Mode



Prerequisites

Configuring the Pseudo-distributed mode of Hadoop on each machine.
(Here we have used six systems, each with Hadoop installed in Pseudo-distributed mode.)

Stop all the processes running in all the six systems by using the command:

$ stop-all.sh    (in all the six systems)



ALL THE SLAVE NODES MUST
REMAIN IDLE UNLESS
SPECIFIED!



Networking

Update /etc/hosts on all machines. Put the aliases against the IP addresses of all the machines. Here we are creating a cluster of 6 machines: one is the master, one is the secondary master, and the other 4 are slaves.

But I will be showing this on one system; afterwards, from this same system, I will access all the remaining 5 systems and update the data, i.e. the secondarymaster and the slave nodes need not do anything for the time being.
$ sudo gedit /etc/hosts
Add the following lines at the end of this file (for a six-node cluster):

(IP Address)      (hostname)   (alias)
192.168.192.104   selab104     master
192.168.192.105   selab105     secondarymaster
192.168.192.101   selab101     slave1
192.168.192.102   selab102     slave2
192.168.192.103   selab103     slave3
192.168.192.106   selab106     slave4

Note: To know the hostname of your computer: $ hostname
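
Once /etc/hosts has been updated, a quick way to confirm that the aliases resolve is to ping each one (an illustrative check, assuming the six entries above):

$ ping -c 1 master
$ ping -c 1 secondarymaster
$ ping -c 1 slave1      # repeat for slave2, slave3 and slave4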



$ sudo gedit /etc/hosts
Add the following lines at the end of this file (for a six-node cluster), and comment out the line carrying the system's own hostname, i.e. put a '#' symbol before it.
Note: To know the hostname of your computer: $ hostname

Fully Distributed Mode (/etc/hosts):
1. Comment out the host name of the system, i.e. put a '#' symbol before it:
127.0.0.1 localhost
# 127.0.1.1 selab114    (112/113/115/116/118)

2. In addition to the existing contents, add these lines at the end:
(IP Address)      (hostname)   (alias)
192.168.192.114   selab114     master
192.168.192.112   selab112     secondarymaster
192.168.192.113   selab113     slave1
192.168.192.115   selab115     slave2
192.168.192.116   selab116     slave3
192.168.192.118   selab118     slave4


Process to connect to the other 5 systems
SSH Access

The ‘nitw_viper_user’ user on the master must be able to connect:

1. To its own user account on the master: $ ssh master in this context.

2. To the ‘nitw_viper_user’ user account on the slaves via a password-less SSH login.

Add the master's public SSH key to each node using the following commands:

$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub nitw_viper_user@secondarymaster
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub nitw_viper_user@slave1
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub nitw_viper_user@slave2
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub nitw_viper_user@slave3
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub nitw_viper_user@slave4
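
To verify that the password-less login works to every node, a small loop such as the following can be used (a sketch, assuming the aliases defined in /etc/hosts):

$ for host in master secondarymaster slave1 slave2 slave3 slave4; do
      ssh nitw_viper_user@$host hostname    # each hostname should print without a password prompt
  done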



Process to connect to the other 5 systems…

Connect with user nitw_viper_user from the master to the user account nitw_viper_user on all the other nodes, in 6 different (Ubuntu) terminals:

1. From master to master:          nitw_viper_user@selab104:~$ ssh master
2. From master to secondarymaster: nitw_viper_user@selab104:~$ ssh secondarymaster
   which results in nitw_viper_user@selab105:~$
3. From master to slave1:          nitw_viper_user@selab104:~$ ssh slave1
   which results in nitw_viper_user@selab101:~$
4. From master to slave2:          nitw_viper_user@selab104:~$ ssh slave2
   which results in nitw_viper_user@selab102:~$
5. From master to slave3:          nitw_viper_user@selab104:~$ ssh slave3
   which results in nitw_viper_user@selab103:~$
6. From master to slave4:          nitw_viper_user@selab104:~$ ssh slave4
   which results in nitw_viper_user@selab106:~$


Screenshot to show all the nodes
accessed from the Master node



Now update the /etc/hosts of the secondarymaster and the slave nodes (directly from the master node).



Now we need to inform Hadoop of the master name:

Create a file named masters (this file does not exist as of now) in the /home/hadoop-2.7.2/etc/hadoop directory, and add the name of the secondarymaster in the masters file.

(This has to be done on all the systems: master + secondarymaster + slaves, but here I will be accessing all the systems from my own system and doing it.)

nitw_viper_user@selab104:~$ cd /home/hadoop-2.7.2/etc/hadoop
nitw_viper_user@selab104:~$ sudo gedit masters

Add the following line:
secondarymaster
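
Because the master already has password-less SSH to every node, the masters file can also be written on the remaining five systems from the master itself; a possible shortcut (a sketch, assuming the same Hadoop path on every node; sudo may prompt for each node's password):

$ for host in secondarymaster slave1 slave2 slave3 slave4; do
      ssh -t $host "echo secondarymaster | sudo tee /home/hadoop-2.7.2/etc/hadoop/masters"
  done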
Now, we have to do the same process on all the remaining 5 systems (I will be doing it without touching those 5 systems, i.e. through ‘ssh’, already connected):

nitw_viper_user@selab105:~$

nitw_viper_user@selab101:~$    nitw_viper_user@selab102:~$

nitw_viper_user@selab103:~$    nitw_viper_user@selab106:~$



Now we need to inform Hadoop of the slave names

Edit the file named slaves (this file already exists) in the /home/hadoop-2.7.2/etc/hadoop directory, and add the names of the slaves in the slaves file.

(This has to be done on all the systems: master + secondarymaster + slaves, but here I will be accessing all the systems from my own system and doing it.)



nitw_viper_user@selab104:~$ cd /home/hadoop-2.7.2/etc/hadoop
nitw_viper_user@selab104:~$ sudo gedit slaves
Add the following lines
slave1
slave2
slave3
slave4



Now, we have to do the same process on all the remaining 5 systems (I will be doing it without touching those 5 systems, i.e. through ‘ssh’, already connected):

nitw_viper_user@selab105:~$

nitw_viper_user@selab101:~$    nitw_viper_user@selab102:~$

nitw_viper_user@selab103:~$    nitw_viper_user@selab106:~$



Now, edit the ‘core-site.xml’ (all machines)

nitw_viper_user@selab104:~/home/hadoop-2.7.2/etc/hadoop$ sudo gedit core-site.xml

nitw_viper_user@selab105(101,102,103,106):~/home/hadoop-2.7.2/etc/hadoop$ sudo gedit core-site.xml

Change the fs.default.name parameter (in conf/core-site.xml), which specifies the NameNode (the HDFS master) host and port.

/home/hadoop-2.7.2/etc/hadoop/core-site.xml (ALL machines, i.e. master as well as slaves)

Pseudo Mode (core-site.xml):
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

Fully Distributed Mode (core-site.xml):
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>



Now, edit the ‘hdfs-site.xml’ (all machines)

nitw_viper_user@selab104:~/home/hadoop-2.7.2/etc/hadoop$ sudo gedit hdfs-site.xml

nitw_viper_user@selab105(101,102,103,106):~/home/hadoop-2.7.2/etc/hadoop$ sudo gedit hdfs-site.xml

Change the dfs.replication parameter (in conf/hdfs-site.xml), which specifies the default block replication. We have 4 data nodes (the four slaves) available, so we set dfs.replication to 3. (You can have any number, but 3 is optimal.)

conf/hdfs-site.xml (ALL machines)

Pseudo Mode (hdfs-site.xml):
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/nitw_viper_user/hadoopinfra/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/nitw_viper_user/hadoopinfra/hdfs/datanode</value>
</property>

Fully Distributed Mode (hdfs-site.xml):
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/nitw_viper_user/hadoopinfra/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/nitw_viper_user/hadoopinfra/hdfs/datanode</value>
</property>
<property>
<name>dfs.secondary.http-address</name>
<value>selab123:50090</value>
<description>hostname:portnumber (the hostname of the secondarymaster)</description>
</property>






Now, edit the ‘yarn-site.xml’ (all machines)

nitw_viper_user@selab104:~/home/hadoop-2.7.2/etc/hadoop$ sudo gedit yarn-site.xml

nitw_viper_user@selab105(101,102,103,106):~/home/hadoop-2.7.2/etc/hadoop$ sudo gedit yarn-site.xml

Besides the mapreduce_shuffle auxiliary service already set in pseudo mode, the fully distributed configuration points the NodeManagers at the ResourceManager running on the master (its address, scheduler address and resource-tracker address), as shown below.
conf/yarn-site.xml (ALL machines)

Pseudo Mode (yarn-site.xml):
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

Fully Distributed Mode (yarn-site.xml):
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>



Now, edit the ‘mapred-site.xml’ (all machines)

nitw_viper_user@selab104:~/home/hadoop-2.7.2/etc/hadoop$ sudo gedit mapred-site.xml

nitw_viper_user@selab105(101,102,103,106):~/home/hadoop-2.7.2/etc/hadoop$ sudo gedit mapred-site.xml

The MapReduce framework stays set to yarn as in pseudo mode; in addition, the fully distributed configuration limits the number of map and reduce tasks per node and sets the child JVM heap sizes, as shown below.
Pseudo Mode (mapred-site.xml):
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

Fully Distributed Mode (mapred-site.xml):
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>6</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>6</value>
</property>
<property>
<name>mapred.map.child.java.opts</name>
<value>-Xmx512m</value>
</property>
<property>
<name>mapred.reduce.child.java.opts</name>
<value>-Xmx512m</value>
</property>
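
Instead of editing core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml by hand on every machine, the finished files can also be copied from the master to the other nodes; a sketch, assuming the Hadoop user can write to the configuration directory on every node:

$ cd /home/hadoop-2.7.2/etc/hadoop
$ for host in secondarymaster slave1 slave2 slave3 slave4; do
      scp core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml $host:/home/hadoop-2.7.2/etc/hadoop/
  done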



Formatting the HDFS filesystem via the NameNode
$ cd
$ hdfs namenode -format



Starting the multi-node cluster

The cluster is started by running the following commands on the master:

nitw_viper_user@selab104:~$ cd /home/hadoop-2.7.2

nitw_viper_user@selab104:/home/hadoop-2.7.2$ sbin/start-all.sh

To check the daemons that are running, run jps on the master and on the slaves.



HDFS Part (Storage):
• The NameNode daemon is started on the master,
• the SecondaryNameNode is started on the secondarymaster, and
• DataNode daemons are started on all the slaves.

YARN Part (Processing):
• The ResourceManager is started on the master, and
• NodeManager daemons are started on all the slaves.
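
With the cluster up, jps on each class of node typically shows something like the following (process IDs will differ):

nitw_viper_user@selab104:~$ jps     # master
4321 NameNode
4587 ResourceManager
4876 Jps

nitw_viper_user@selab105:~$ jps     # secondarymaster
3210 SecondaryNameNode
3388 Jps

nitw_viper_user@selab101:~$ jps     # slave1 (similarly on slave2, slave3, slave4)
2987 DataNode
3104 NodeManager
3255 Jps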



$ hdfs dfs -mkdir hdfs://master:9000/acetinput1
$ hdfs dfs -ls hdfs://master:9000/

$ hdfs dfs -put /../..file1.txt hdfs://master:9000/acetinput1
$ hdfs dfs -put /../..file2.txt hdfs://master:9000/acetinput1
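
The same word-count example can then be run against these HDFS paths (assuming the examples jar copied earlier is still in the home directory; the output directory name acetoutput1 is only an illustration):

$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount hdfs://master:9000/acetinput1 hdfs://master:9000/acetoutput1
$ hdfs dfs -cat hdfs://master:9000/acetoutput1/part-r-00000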



Map-Reduce Programming

by
Dr. U.S.N. Raju
Asst. Professor, Dept. of CS&E,
N.I.T. Warangal



OpenCV Installation

by
Dr. U.S.N. Raju
Asst. Professor, Dept. of CS&E,
N.I.T. Warangal



Copy the following code into a file named opencv.sh:

version="$(wget -q -O - http://sourceforge.net/projects/opencvlibrary/files/opencv-unix | egrep -m1 -o '\"[0-9](\.[0-9]+)+' | cut -c2-)"
echo "Installing OpenCV" $version
mkdir OpenCV
cd OpenCV
echo "Removing any pre-installed ffmpeg and x264"
sudo apt-get -qq remove ffmpeg x264 libx264-dev
echo "Installing Dependencies"
sudo apt-get -qq install libopencv-dev build-essential checkinstall cmake pkg-config yasm libjpeg-dev libjasper-dev libavcodec-dev libavformat-dev libswscale-dev libdc1394-22-dev libxine-dev libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev libv4l-dev python-dev python-numpy libtbb-dev libqt4-dev libgtk2.0-dev libfaac-dev libmp3lame-dev libopencore-amrnb-dev libopencore-amrwb-dev
echo "Downloading OpenCV" $version
wget -O OpenCV-$version.zip http://sourceforge.net/projects/opencvlibrary/files/opencv-unix/$version/opencv-"$version".zip/download
echo "Installing OpenCV" $version
unzip OpenCV-$version.zip
cd opencv-$version
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D WITH_V4L=ON -D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D BUILD_EXAMPLES=ON -D WITH_QT=ON -D WITH_OPENGL=ON ..
make -j2
sudo checkinstall
sudo sh -c 'echo "/usr/local/lib" > /etc/ld.so.conf.d/opencv.conf'
sudo ldconfig
echo "OpenCV" $version "ready to be used"



Then change the permissions of the file with
$ chmod 777 opencv.sh
Then execute the script file: $ ./opencv.sh
Now open the terminal and type $ python
Then, at the Python prompt, enter:
>>> import cv2
If this gives an error, as shown below,



perform the steps as shown in the image below.
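
Once the import succeeds, one quick way to confirm which OpenCV version was installed (a minimal check, not from the original slides):

$ python -c "import cv2; print(cv2.__version__)"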



WebHDFS REST API

We use the REST API of HDFS, known as WebHDFS, for handling images and videos in Hadoop. To enable it, add the following property to the hdfs-site.xml file:

Configuring hdfs-site.xml

<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>



Creating a Directory in HDFS

To create a directory named “input” in HDFS, use the following command after starting Hadoop:

$ hdfs dfs -mkdir /user/nitw_cvhd_user2/input

Copying an image into HDFS

To copy an image named “input_image.jpg” into the “input” directory in HDFS, use the following command:

$ hdfs dfs -copyFromLocal <image_path> /user/nitw_cvhd_user2/input/input_image.jpg

Opening an image in HDFS using WebHDFS

An image named “input_image.jpg” which is in the directory named “input” in HDFS can now be opened using the following URL:

http://localhost:50070/webhdfs/v1/user/nitw_cvhd_user2/input/input_image.jpg?op=OPEN
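
The same OPEN operation can also be exercised from the command line with curl; -L is needed because WebHDFS first redirects the client to a DataNode (illustrative, assuming the default NameNode HTTP port 50070):

$ curl -L "http://localhost:50070/webhdfs/v1/user/nitw_cvhd_user2/input/input_image.jpg?op=OPEN" -o input_image.jpg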
Creating 8 Bitmap images for an image in HDFS

The file named bitmap_hdfs.py has code for calculating the 8 bitmap images of a given image, where the input image is taken from HDFS and the output images are written to the local file system.

The following image shows the contents of the input directory.



Creating 8 Bitmap images for an image in HDFS

The following images show the contents of the output directory before and after putting the output images into HDFS.



Thank you …

