
MBL21304L – Big Data with Hadoop

Record Work

Register Number : RA2152007010018

Name of the Student : Raghuram Krishna B S

Semester/Year : III Semester/II Year

Department : MBA

Specialization : Business Analytics


SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
S.R.M. NAGAR, KATTANKULATHUR -603 203

BONAFIDE CERTIFICATE

Register No. RA2152007010018

Certified to be the bonafide record of work done by Raghuram Krishna B S of second
year, MBA Degree course, in the practical course MBL21304L – Big Data with Hadoop at
SRM Institute of Science and Technology, Kattankulathur, during the academic year
2022-2023 (Odd Semester).

Signature of the Faculty Signature of the Dean/CoM

Submitted for the University Examination held on ____________ at SRM Institute of Science
and Technology, Kattankulathur.

Examiner-1 Examiner-2
List of Experiments

Ex. No   Date         Name of the Exercise                              Page No
1        20.07.2022   Installation of Hadoop with Ubuntu                1
2        20.07.2022   Basic commands to work with Ubuntu                5
3        02.08.2022   Basic HDFS commands                               8
4        10.08.2022   HDFS shell commands – file folder commands        11
5        24.08.2022   HDFS admin commands                               16
6        15.09.2022   Map reduce word count                             18
7        23.09.2022   Map reduce max temperature                        22
8        20.10.2022   Mongo DB commands                                 28
9        25.10.2022   Pig Latin commands                                32
10       27.10.2022   Map reduce matrix multiplication                  40
Exercise No: 1
INSTALLATION OF HADOOP WITH UBUNTU
Date: 20.07.2022

AIM
To install Hadoop on Ubuntu in the system

PROCEDURE

Step 1: Install the Oracle VM VirtualBox 6.1.26 setup.
Step 2: Start the installation procedure: open a terminal in Linux and check the installed
version.
Step 3: Create a Hadoop group and a new user.
Step 4: Install and run the SSH server.
Step 5: SSH key generation. Create an SSH key and add it to the authorized keys.
Step 6: Hadoop installation as hduser1. Download and unpack Apache Hadoop.
Step 7: Hadoop configuration.
Step 8: Format the file system.
Step 9: Start Hadoop.
Step 10: Check if everything is running (a sketch of the commands for these steps is given below).
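
A minimal sketch of the commands behind these steps, assuming the user hduser1, the group
hadoop, and a Hadoop 3.3.1 archive unpacked into a folder named hadoop (the names and the
version are only illustrative):

    java -version                                            # Step 2: check the installed Java version
    sudo addgroup hadoop                                      # Step 3: create the Hadoop group
    sudo adduser --ingroup hadoop hduser1                     # Step 3: create the new user
    sudo apt-get install openssh-server                       # Step 4: install the SSH server
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa                  # Step 5: generate an SSH key
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys           # Step 5: add it to the authorized keys
    tar -xzf hadoop-3.3.1.tar.gz && mv hadoop-3.3.1 hadoop    # Step 6: unpack Apache Hadoop
    cd hadoop/bin && ./hadoop namenode -format                # Step 8: format the file system
    cd ../sbin && ./start-dfs.sh && ./start-yarn.sh           # Step 9: start HDFS and YARN
    jps                                                       # Step 10: check that everything is running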

OUTPUT

RESULT
The installation of Hadoop on Ubuntu has been performed

Exercise No: 2
BASIC COMMANDS TO WORK WITH UBUNTU
Date: 20.07.2022

AIM
To work with basic commands in Ubuntu

PROCEDURE

Step 1: Give the following commands to work with some basic functions (example invocations are shown after this list).
Step 2: pwd - This command refers to the present working directory in which you are
operating
Step 3: dir - This command is used to print all the available directories in the present
working directory
Step 4: ls – This command is used to list all the directories and files inside the
present working directory.
Step 5: cd - This command is used to change the current directory in the terminal
Step 6: touch -This command is used to create a new file
Step 7: mkdir - This command will make a directory in pwd
Step 8: rmdir - This command will remove the directory
Step 9: ping - Use the ping command to check connectivity to a server
Step 10: hostname - Displays the hostname
Step 11: uname – Use this command to get the release number, version of Linux and
much more
Step 12: hadoop - Running hadoop with no arguments displays its usage information
Step 13: hadoop version - Displays the installed Hadoop version
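
A short example session for these commands, assuming a test directory named demo (the name
is only illustrative):

    pwd                      # show the present working directory
    mkdir demo               # create a directory
    touch demo/file1.txt     # create an empty file inside it
    ls demo                  # list its contents
    rm demo/file1.txt        # remove the file so the directory is empty
    rmdir demo               # remove the now-empty directory
    ping -c 2 localhost      # check connectivity with two packets
    hostname                 # display the hostname
    uname -a                 # kernel name, release and version
    hadoop version           # installed Hadoop version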

OUTPUT

RESULT

Basic commands in Ubuntu have been performed

Exercise No: 3
BASIC HDFS COMMANDS
Date: 02.08.2022

AIM
To work with basic HDFS commands

PROCEDURE

Step 1: To format the NameNode in Hadoop, give the following commands
cd hadoop
cd bin
./hadoop namenode -format
Step 2: After navigating into the bin folder of Hadoop, the NameNode formatting is done
Step 3: To create the cluster and start the nodes, change to the sbin folder and run the
following commands.
cd ..
cd sbin
start-dfs.sh
Step 4: To start the resource manager and check the running nodes, input the following
commands
start-yarn.sh
jps
Step 5: To check the status of the running nodes, run the below command.
jps
Step 6: To create a directory, input the following command
hdfs dfs -mkdir -p /user/

OUTPUT

RESULT

Basic HDFS commands have been performed and a directory has been created

Exercise No: 4
HDFS SHELL COMMANDS – FILE FOLDER COMMANDS
Date: 10.08.2022

AIM
To perform HDFS shell commands

PROCEDURE
Step 1: To copy a file from local FS to HDFS, run the following code
hdfs dfs -copyFromLocal /home/hduser1/ls /user/input
Step 2: To move a file from local FS to HDFS, run the following code
hdfs dfs -moveFromLocal /home/hduser1/This /user/input
Step 3: To copy a file from HDFS FS to local FS,
hdfs dfs -copyToLocal /user/input/This /home/hduser1/Desktop/
Step 4: To display the content of a file,
hdfs dfs -cat /user/input/This
Step 5: To list the directory,
hdfs dfs -ls -R /
Step 6: To count the number of directories (including default root directory) and files,
hdfs dfs -count /
Step 7: To list the disk usage/size for each directory,
hdfs dfs -du -h /
Step 8: To display the last KB of a file,
hdfs dfs -tail /user/input/ls
Step 9: To test if path, file and directory exists,
hdfs dfs -test -e /user/input/ls
hdfs dfs -test -f /user/input/ls
hdfs dfs -test -d /user/input/ls
echo $?
Step 10: To create a new file with 0 bytes,
hdfs dfs -touchz /user/new1
Step 11: To remove all the files and folders,
hdfs dfs -rm -R -skipTrash /user/*
hdfs dfs -rm -R -skipTrash /user/

OUTPUT

RESULT

File and folder HDFS commands have been performed and verified

Exercise No: 5
HDFS ADMIN COMMANDS
Date: 24.08.2022

AIM
To perform HDFS admin commands

PROCEDURE

Step 1: du - displays the disk usage of each HDFS directory and file (the -h option prints human-readable sizes)
hdfs dfs -du -h /
Step 2: checksum - returns the checksum information of a file; run the following code
hdfs dfs -checksum /user/file1.txt
Step 3: chown - changes the owner (and optionally the group) of a file
hdfs dfs -chown hduser1:usr /user/file1.txt
Step 4: appendToFile - appends the content of a local file to a file in HDFS
hdfs dfs -appendToFile /home/hduser1/Desktop/This /user/lyric2.txt
Step 5: expunge- deleting trash in hdfs
hdfs dfs -expunge
Step 6: chgrp - changes the group of a file
hdfs dfs -chgrp usr2 /user/hduser1/lyric2
Step 7: chmod - changes the user permissions of a file
hdfs dfs -chmod -R <mode> /lyric

OUTPUT

RESULT

HDFS admin commands have been performed

Exercise No: 6
MAP REDUCE WORD COUNT
Date: 15.09.2022

AIM
To perform map reduce word count

PROCEDURE

Step 1: To clear the existing name node and data node data under the local tmp directory,
change to the bin folder and format the NameNode with the following commands.
cd /home/hadoop/bin
./hadoop namenode -format
Step 2: To create the cluster and start the nodes, change to the sbin folder and run the
following commands.
cd /home/hadoop/sbin
./start-dfs.sh
Step 3: To start the resource manager and check the running nodes,
./start-yarn.sh
Step 4: To check the status of the running nodes, run the below command.
jps
Step 5: Class path describes the locations of the available class files to the Java
Compiler.
export HADOOP_CLASSPATH=$(hadoop classpath)
Step 6: To create a directory,
hdfs dfs -mkdir /wordcount1
Step 7: To put a file or folder,
hdfs dfs -put /home/hduser1/wordcount/input /wordcount1/input
Step 8: The program for the Mapper and Reducer is written in Java and saved in a .java
file (a minimal sketch is given after this procedure).
Step 9: To compile the Java file,
javac -classpath ${HADOOP_CLASSPATH} -d <folder path> <java file path>
Step 10: Class files are created in the designated folder.

Step 11: To generate the jar file,
jar -cvf <jar file name> <folder path where the class files are saved>
Step 12: The output files are put into a single jar file.
Step 13: To run the jar file,
hadoop jar <jar file path> <class name> <input file path> <output file path>
Step 14: To see the output,
hdfs dfs -cat <output file path>
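
The following is a minimal sketch of the Mapper and Reducer mentioned in Step 8. It follows
the standard Hadoop word count pattern; the class name, package-free layout and local folder
names used below are assumptions and should be adjusted to match your own files.

    // WordCount.java
    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Mapper: emit (word, 1) for every token in each input line
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Reducer: sum the counts received for each word
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // input file path
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // output file path
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

With the placeholders of Steps 9 to 14 filled in (the local folder and jar names below are
assumptions), the job can be compiled and run, for example, as:

    javac -classpath ${HADOOP_CLASSPATH} -d /home/hduser1/wordcount/classes WordCount.java
    jar -cvf wordcount.jar -C /home/hduser1/wordcount/classes .
    hadoop jar wordcount.jar WordCount /wordcount1/input /wordcount1/output
    hdfs dfs -cat /wordcount1/output/part-r-00000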

OUTPUT

RESULT

The MapReduce word count has been performed

Exercise No: 7
MAP REDUCE MAX TEMPERATURE
Date: 23.09.2022

AIM
To perform map reduce max temperature

PROCEDURE

Step 1: To clear the existing name node and data node data under the local tmp directory,
change to the bin folder and format the NameNode with the following commands.
cd /home/hadoop/bin
./hadoop namenode -format
Step 2: To create the cluster and start the nodes, change to the sbin folder and run the
following commands.
cd /home/hadoop/sbin
./start-dfs.sh
Step 3: To start the resource manager and check the running nodes,
./start-yarn.sh
Step 4: To check the status of the running nodes, run the below command.
jps
Step 5: Class path describes the locations of the available class files to the Java
Compiler.
export HADOOP_CLASSPATH=$(hadoop classpath)
Step 6: To create a directory,
hdfs dfs -mkdir /MaxTemp
Step 7: To create an input file,
cat > <input file>
Step 8: To put a file or folder,
hdfs dfs -put /home/hduser1/MaxTemp/input/input.txt /MaxTemp/input
Step 9: The program for the Mapper and Reducer is written in Java and saved in a .java
file (a minimal sketch is given after this procedure).
Step 10: To compile the Java file,
javac -classpath ${HADOOP_CLASSPATH} -d <folder path> <java file path>
Step 11: Class files are created in the designated folder.
Step 12: To generate the jar file,
jar -cvf <jar file name> <folder path where the class files are saved>
Step 13: The output files are put into a single jar file.
Step 14: To run the jar file,
hadoop jar <jar file path> <class name> <input file path> <output file path>
Step 15: To see the output,
hdfs dfs -cat <output file path>
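
A minimal sketch of the Mapper and Reducer mentioned in Step 9, assuming each line of the
input file holds a year and a temperature separated by whitespace (for example "1950 31");
the class name and the parsing are assumptions and should be adjusted to the actual input
format used in the record.

    // MaxTemperature.java
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MaxTemperature {

        // Mapper: emit (year, temperature) for each record
        public static class TempMapper extends Mapper<Object, Text, Text, IntWritable> {
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] parts = value.toString().trim().split("\\s+");
                if (parts.length >= 2) {
                    context.write(new Text(parts[0]), new IntWritable(Integer.parseInt(parts[1])));
                }
            }
        }

        // Reducer: keep the maximum temperature seen for each year
        public static class MaxReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int max = Integer.MIN_VALUE;
                for (IntWritable val : values) {
                    max = Math.max(max, val.get());
                }
                context.write(key, new IntWritable(max));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "max temperature");
            job.setJarByClass(MaxTemperature.class);
            job.setMapperClass(TempMapper.class);
            job.setReducerClass(MaxReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /MaxTemp/input
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /MaxTemp/output
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }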

OUTPUT

RESULT

The MapReduce max temperature job has been performed

Exercise No: 8
MONGO DB COMMANDS
Date: 20.10.2022

AIM
To perform Mongo DB commands

PROCEDURE

Step 1: Open MongoDB in the command prompt.
Step 2: Make a database named mycustomers.
Step 3: Create a collection named customers. INSERT: Use the insert command to insert data into the collection customers.
Step 4: Insert multiple values into the collection.
Step 5: UPDATE: This command is used to update a value in the collection.
Step 6: REMOVE: Removes a value from the collection (example commands are shown after this list).
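
A short sketch of the commands behind these steps in the mongo shell; the document fields
and values are only illustrative.

    use mycustomers
    db.createCollection("customers")
    db.customers.insert({ name: "Ravi", city: "Chennai" })
    db.customers.insert([
        { name: "Priya", city: "Madurai" },
        { name: "Arun", city: "Salem" }
    ])
    db.customers.find()
    db.customers.update({ name: "Ravi" }, { $set: { city: "Coimbatore" } })
    db.customers.remove({ name: "Arun" })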

OUTPUT

RESULT

MongoDB commands have been performed

Exercise No: 9
PIG LATIN COMMANDS
Date: 25.10.2022

AIM
To perform pig latin commands

PROCEDURE
Step 1: Start the Hadoop cluster
Step 2: Start the Yarn Resource manager
Step 3: Create a folder called pigdata in the hdfs
Step 4: The pigdata folder will be created in the hdfs view it in the browser
Step 5: Put the files in the hdfs
Step 6: View the files in the browser
Step 7: Display and view one of the files using the cat command in the terminal
Step 8: Upload all the txt files into HDFS using the put command and view them in the
browser
Step 9: Open a new terminal and start the pig program using
pig -x local
Step 10: The Grunt shell will be opened, from where we can execute all the Pig Latin
commands.
Step 11: Create a new relation called student and load the file from HDFS (a sample LOAD statement is shown after this procedure)
Step 12: Load the student details file from HDFS into a relation called student_details
Step 13: Use the dump student command to see the file contents
Step 14: Use the describe command to view the schema and data types of the relation
Step 15: Use the explain student command to see the logical, physical and MapReduce execution plans of the relation
Step 16: Use the illustrate command to see the data types along with a sample row of the relation
Step 17: Group based on one of the columns
grp_rel=group student_details by city;
dump grp_rel;
Step 18: grp_rel1=group student_details by (city,age);
Step 19: COGROUP: used when there are two or more relations
cogrp_rel = cogroup student_details by age, employee by age;
Step 20: Join:

32
Inner join
customer_orders = JOIN customers by id, orders by order_id;
dump customer_orders;
Step 21: Outer join
outer_left = JOIN customers by id LEFT OUTER, orders by customer_id;
Step 22: dump outer_left;
Union: UNION combines two relations with all their columns
new_rel = UNION relation1, relation2;
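
A sample LOAD statement for Steps 11 to 13; the file name, delimiter and schema are
assumptions, and in local mode (pig -x local) the path refers to the local file system
rather than HDFS.

    student_details = LOAD '/pigdata/student_details.txt' USING PigStorage(',')
            AS (id:int, name:chararray, age:int, city:chararray);
    dump student_details;
    describe student_details;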

OUTPUT

RESULT
Pig Latin commands have been executed

Exercise No: 10
MAP REDUCE MATRIX MULTIPLICATION
Date: 27.10.2022

AIM
To perform map reduce matrix multiplication

PROCEDURE

Step 1: To clear the existing name node and data node data under the local tmp directory,
change to the bin folder and format the NameNode with the following commands.
cd /home/hadoop/bin
./hadoop namenode -format
Step 2: To create the cluster and start the nodes, change to the sbin folder and run the
following commands.
cd /home/hadoop/sbin
./start-dfs.sh
Step 3: To start the resource manager and check the running nodes,
./start-yarn.sh
Step 4: To check the status of the running nodes, run the below command.
jps
Step 5: Class path describes the locations of the available class files to the Java
Compiler.
export HADOOP_CLASSPATH=$(hadoop classpath)
Step 6: To create a directory,
hdfs dfs -mkdir /Matrix
hdfs dfs -mkdir /Matrix/input
Step 7: To put a file or folder,
hdfs dfs -put /home/hduser1/matrix/ufiles/M.txt /Matrix/input
hdfs dfs -put /home/hduser1/matrix/ufiles/N.txt /Matrix/input
Step 8: The program for the Mapper and Reducer is written in Java and saved in a .java
file (a minimal sketch is given after this procedure).
Step 9: To compile the Java file,
javac -classpath ${HADOOP_CLASSPATH} -d <folder path> <java file path>
Step 10: Class files are created in the designated folder.
Step 11: To generate the jar file,
jar -cvf <jar file name> <folder path where the class files are saved>
Step 12: The output files are put into a single jar file.
Step 13: To run the jar file,
hadoop jar <jar file path> <class name> <input file path> <output file path>
Step 14: To see the output,
hdfs dfs -cat <output file path>
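
A minimal sketch of the Mapper and Reducer mentioned in Step 8, assuming each line of M.txt
and N.txt has the form "M,i,j,value" or "N,j,k,value" and that the matrix dimensions are
passed through the job configuration; the class name, input format and dimensions are
assumptions and should be adjusted to the actual files.

    // MatrixMultiply.java
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MatrixMultiply {

        // Mapper: replicate each element of M across the columns of the result,
        // and each element of N across the rows of the result, keyed by the result cell (i,k).
        public static class MatrixMapper extends Mapper<Object, Text, Text, Text> {
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                Configuration conf = context.getConfiguration();
                int mRows = conf.getInt("m.rows", 2);   // number of rows of M
                int nCols = conf.getInt("n.cols", 2);   // number of columns of N
                String[] t = value.toString().split(",");
                if (t[0].equals("M")) {                 // element M[i][j] = t[3]
                    for (int k = 0; k < nCols; k++) {
                        context.write(new Text(t[1] + "," + k), new Text("M," + t[2] + "," + t[3]));
                    }
                } else {                                // element N[j][k] = t[3]
                    for (int i = 0; i < mRows; i++) {
                        context.write(new Text(i + "," + t[2]), new Text("N," + t[1] + "," + t[3]));
                    }
                }
            }
        }

        // Reducer: for each result cell (i,k), multiply matching M[i][j] and N[j][k] and sum over j.
        public static class MatrixReducer extends Reducer<Text, Text, Text, Text> {
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                Map<Integer, Double> mRow = new HashMap<>();
                Map<Integer, Double> nCol = new HashMap<>();
                for (Text val : values) {
                    String[] t = val.toString().split(",");
                    if (t[0].equals("M")) {
                        mRow.put(Integer.parseInt(t[1]), Double.parseDouble(t[2]));
                    } else {
                        nCol.put(Integer.parseInt(t[1]), Double.parseDouble(t[2]));
                    }
                }
                double sum = 0;
                for (Map.Entry<Integer, Double> e : mRow.entrySet()) {
                    sum += e.getValue() * nCol.getOrDefault(e.getKey(), 0.0);
                }
                context.write(key, new Text(Double.toString(sum)));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.setInt("m.rows", 2);   // assumed dimensions of the sample matrices
            conf.setInt("n.cols", 2);
            Job job = Job.getInstance(conf, "matrix multiplication");
            job.setJarByClass(MatrixMultiply.class);
            job.setMapperClass(MatrixMapper.class);
            job.setReducerClass(MatrixReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /Matrix/input
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /Matrix/output
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }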

OUTPUT

RESULT
The MapReduce matrix multiplication has been performed

