Hadoop Lab
2016
Table of Contents
Hadoop Lab
Data Node Calculation
Linux Commands
HDFS Commands
Data Node Calculation
You have a Hadoop cluster with replication factor = 3 and block size = 64 MB. You need to store 100 TB of data, and each DataNode has 10 TB of disk space available.
In this case, the number of DataNodes required to store the data would be:
Total amount of data * Replication factor / Disk space available on each DataNode
= 100 * 3 / 10
= 30 DataNodes
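As a minimal shell sketch of the same arithmetic (the 10 TB per-node disk figure is the assumption stated above):

DATA_TB=100          # total data to store, in TB
REPLICATION=3        # HDFS replication factor
DISK_PER_NODE_TB=10  # usable disk per DataNode, in TB
echo $(( DATA_TB * REPLICATION / DISK_PER_NODE_TB ))   # prints 30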
Now, let's assume you need to process this 100 TB of data using MapReduce.
Reading 100 TB of data at a speed of 100 MB/s using only 1 node would take:
Total data / Read-write speed
= 100 * 1024 * 1024 MB / 100 MB/s
= 1048576 seconds
= 1048576 / 3600 hours ≈ 291.27 hours
So, with 30 DataNodes reading in parallel, you would be able to finish this MapReduce job in:
291.27 / 30 ≈ 9.70 hours
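The same timing arithmetic as a shell sketch, assuming the bc calculator is available for the fractional steps:

SECONDS_SINGLE=$(( 100 * 1024 * 1024 / 100 ))    # 100 TB in MB at 100 MB/s = 1048576 s
echo "scale=2; $SECONDS_SINGLE / 3600" | bc      # 291.27 hours on 1 node
echo "scale=2; $SECONDS_SINGLE / 3600 / 30" | bc # 9.70 hours across 30 nodes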
1. Problem Statement
How many such DataNodes would you need to read 100 TB of data in 5 minutes in your Hadoop cluster?
2. Problem Solution
2.1 Number of DataNodes required to read the data in 5 minutes
A single DataNode takes 1048576 seconds (17476.27 minutes) to read the 100 TB, so the number of DataNodes needed is:
Time taken by 1 DataNode to read the 100 TB of data / Total time given to finish the read
= (1048576 / 60) / 5
≈ 3495.25 DataNodes
So, rounding up, you would need 3496 such DataNodes to read the 100 TB of data in 5 minutes.
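As a shell sketch (bash arithmetic is integer-only, so bc, assumed available, handles the fractions):

echo "scale=4; (1048576 / 60) / 5" | bc   # 3495.2533 DataNodes
echo $(( (1048576 + 299) / 300 ))         # ceiling division: 3496 whole nodes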
HDFS Commands
LS command:
Lists the files and directories at the given HDFS path.
Command: hadoop fs -ls /
MKDIR command:
Creates a directory in HDFS.
Syntax: hadoop fs -mkdir /directory_name
E.g.: hadoop fs -mkdir /Bigdata
DU command:
Displays the sizes of files and directories; with -s, displays an aggregate summary of file lengths.
Syntax: hadoop fs -du -s /path/to/file_in_hdfs
Command: hadoop fs -du -s /Bigdata/test
Note: Here test is a file that exists in HDFS in the directory Bigdata.
TOUCHZ command:
Creates a file of size 0 bytes in HDFS.
Syntax: hadoop fs -touchz /directory/filename
E.g.: hadoop fs -touchz /Bigdata/sample
Note: Here we are creating a file named "sample" in the HDFS directory Bigdata with a file size of 0 bytes.
CAT command:
Copies source paths to stdout.
Syntax: hadoop fs -cat /path/to/file_in_hdfs
Command: hadoop fs -cat /Bigdata/test
Note: Here test is a file that exists in HDFS in the directory Bigdata.
TEXT command:
Takes a source file and outputs the file in text format; for a plain text file the output is the same as cat, but text can also decode compressed and sequence files.
Syntax: hadoop fs -text /path/to/file_in_hdfs
Command: hadoop fs -text /Bigdata/test
Note: Here test is a file that exists in HDFS in the directory Bigdata.
copyFromLocal command:
Copies a file from the local file system to HDFS.
Syntax: hadoop fs -copyFromLocal <localsrc> URI
E.g.: hadoop fs -copyFromLocal /home/Bigdata/Desktop/test /Bigdata
Note: Here test is the file present in the local directory /home/Bigdata/Desktop.
copyToLocal command:
Copies a file from HDFS to the local file system.
Syntax: hadoop fs -copyToLocal URI <localdst>
Command: hadoop fs -copyToLocal /Bigdata/test /home/Bigdata
Note: Here test is a file present in the Bigdata directory of HDFS.
PUT command:
Copies single or multiple sources from the local file system to the destination file system.
Syntax: hadoop fs -put <localsrc> ... <dst>
Command: hadoop fs -put /home/Bigdata/Desktop/test /user
Note: copyFromLocal is similar to the put command, except that the source is restricted to a local file reference.
GET command:
Copies files from HDFS to the local file system.
Syntax: hadoop fs -get [-ignorecrc] [-crc] <src> <localdst>
E.g.: hadoop fs -get /user/test /home/Bigdata
Note: copyToLocal is similar to the get command, except that the destination is restricted to a local file reference.
COUNT command:
Counts the number of directories, files, and bytes under the paths that match the specified file pattern.
Command: hadoop fs -count /user
RM command:
Removes a file from HDFS.
Syntax: hadoop fs -rm /path/to/file_in_hdfs
Command: hadoop fs -rm /Bigdata/test
RMR command:
Removes a directory and its contents from HDFS recursively. (In newer Hadoop releases, rmr is deprecated in favor of rm -r.)
Syntax: hadoop fs -rmr /path/to/directory_in_hdfs
Command: hadoop fs -rmr /Bigdata/
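To tie these commands together, here is a hypothetical end-to-end session; the paths /Bigdata and /home/Bigdata are illustrative, not required names:

echo "hello hdfs" > /home/Bigdata/test                 # create a small local file
hadoop fs -mkdir /Bigdata                              # create an HDFS directory
hadoop fs -put /home/Bigdata/test /Bigdata             # upload the file
hadoop fs -ls /Bigdata                                 # confirm it is there
hadoop fs -cat /Bigdata/test                           # print its contents
hadoop fs -get /Bigdata/test /home/Bigdata/test.copy   # download a copy
hadoop fs -rm /Bigdata/test                            # delete the file
hadoop fs -rmr /Bigdata                                # delete the directory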