Week2 - Assignment Solutions
Assignment Solution
Week2: MapReduce - Distributed Computing
Framework
TRENDYTECH 9108179578
Total Marks 30
Solution:
or
Qu 2)
Create a runnable jar file named wordcount.jar from the above code
and execute that jar on the given input files, the output should be
generated in a directory called mapred_output inside home
directory of hdfs.
Share the snapshot of the command used to run the jar file in
terminal.
Solution:
Share the snapshot of the command to run the jar file in terminal.
Qu 3) What change will you make in the above code if we do not want
any aggregation at the end? Just mention the change in the code.
Solution:
job.setNumReduceTasks(0);
Qu 4) Write the code so that the words Hadoop and Elephant, where present
in the input files, go to one reducer and all remaining words go to the
second reducer.
Create a runnable jar file wc_part.jar, execute the jar file, and share the
snapshot of the command.
Solution:
CustomPartitioner.java
Main.java
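The partitioning rule asked for in Qu 4 can be sketched in plain Java as below. This is only an illustration of the logic, not the code from the solution snapshots: the class name WordPartitionSketch is hypothetical, the method is a stand-in for Hadoop's Partitioner.getPartition(key, value, numPartitions), and ignoring case when matching the two words is an assumption.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class WordPartitionSketch {
    // Words that must all land on the first reducer (partition 0).
    private static final Set<String> SPECIAL_WORDS = Set.of("hadoop", "elephant");

    // Plain-Java stand-in for the partitioner's getPartition method.
    // Assumption: matching ignores case.
    static int getPartition(String word, int numReduceTasks) {
        if (numReduceTasks < 2) {
            return 0; // with a single reducer everything goes to partition 0
        }
        return SPECIAL_WORDS.contains(word.toLowerCase(Locale.ROOT)) ? 0 : 1;
    }

    public static void main(String[] args) {
        for (String word : List.of("Hadoop", "Elephant", "is", "yellow")) {
            System.out.println(word + " -> reducer " + getPartition(word, 2));
        }
    }
}
```

In a real job the same condition would sit inside a class extending Partitioner, registered in the driver with job.setPartitionerClass(...) together with job.setNumReduceTasks(2).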
Create a runnable jar file wc_part.jar, execute the jar file, and then
place the output in the wc_part_out directory inside the home directory of hdfs.
Share the snapshot of the command and also the snapshot of the output.
Output Snapshot:
Note: You can comment out the previous code of Question 4 and write the
logic there itself.
Solution:
Code: CustomPartitioner.java
Qu 6) (2 Marks)
For the above program, what will happen if you use 3 reducers and the
partitioner class has the condition below?
if key length is less than 4 then return 0
else return 1
Please explain.
Solution:
The partitioner class has only two return values, 0 and 1, but the main
function sets 3 reducers. The third reducer therefore never receives a
key-value pair to work on, so one reducer is under-utilised.
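The effect can be seen with a small plain-Java simulation. This is a sketch only: the class name IdleReducerSketch and the sample keys are made up for illustration, and the buckets stand in for the three reducers.

```java
import java.util.ArrayList;
import java.util.List;

final class IdleReducerSketch {
    // Mirrors the condition from the question: short keys to 0, the rest to 1.
    static int getPartition(String key) {
        return key.length() < 4 ? 0 : 1;
    }

    // Groups keys by the partition number they are assigned to.
    static List<List<String>> distribute(List<String> keys, int numReduceTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(getPartition(key)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<List<String>> buckets = distribute(List.of("big", "data", "is", "elephant"), 3);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("reducer " + i + " -> " + buckets.get(i));
        }
    }
}
```

With these sample keys, the third bucket stays empty because no key is ever mapped to partition 2.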
Qu 7) What will happen if you use 2 reducers and the partitioner class has
the condition below? (2 Marks)
if key length is less than 4 then return 0
if key length >= 4 and < 6 then return 1
else return 2
Please explain.
A) Do you expect correct output if you run this code without a combiner,
and why? Please explain. (2 Marks + 2 Marks)
B) Do you expect correct output if you run this code with a combiner
class, and why? Please explain. (2 Marks + 2 Marks)
C) In the above problem, how will you make sure that the output is correct
along with the right optimization? What changes will you make? (2 Marks + 2
Marks)
Solution:
B) No, the output would not be correct in this case. The combiner performs
local aggregation, so the key-value pairs leaving each mapper are no longer
(<key>,1) but (<key>,some_no), where some_no is the number of occurrences of
that key in that mapper. After shuffle and sort, the reducer receives
something like (<key>,{3,5,8,2,7,..}) as input, and the correct output is the
sum of the integers in the list, not the length of the list.
C) I would use a combiner to take care of local aggregation, and in the
reducer code I would use count+=value instead of count+=1;
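The difference between the two reducer styles can be checked with a short plain-Java sketch. The class name CombinerCountSketch is hypothetical, and the list of partial counts is the example list from answer B, standing in for what the reducer receives after the combiner has run.

```java
import java.util.List;

final class CombinerCountSketch {
    // Reducer logic that assumes every incoming value is 1 (count += 1).
    static int countValues(List<Integer> values) {
        int count = 0;
        for (int ignored : values) {
            count += 1;
        }
        return count;
    }

    // Reducer logic that adds up the incoming values (count += value).
    static int sumValues(List<Integer> values) {
        int count = 0;
        for (int value : values) {
            count += value;
        }
        return count;
    }

    public static void main(String[] args) {
        // Partial counts for one key, as produced by combiners on five mappers.
        List<Integer> partialCounts = List.of(3, 5, 8, 2, 7);
        System.out.println("count += 1     -> " + countValues(partialCounts)); // 5
        System.out.println("count += value -> " + sumValues(partialCounts));   // 25
    }
}
```

Summing also gives the right answer when no combiner runs, because every value is then 1, so count+=value is the safer choice in both cases.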
Qu 9) (2 Marks)
A) What can be a use case where a reducer is not required? Please
explain one such use case.
Solution: When we are only filtering data, for example removing all numbers
and punctuation from a file.
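A minimal sketch of such a filter in plain Java is shown below. The class name FilterMapperSketch and the method clean are hypothetical; in a real map-only job this logic would run inside the mapper's map method with job.setNumReduceTasks(0) set in the driver.

```java
final class FilterMapperSketch {
    // Removes ASCII punctuation and digits from one input line.
    static String clean(String line) {
        return line.replaceAll("[\\p{Punct}\\d]", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("Hadoop2, the elephant!"));
    }
}
```

Each line is cleaned independently of every other line, which is why no aggregation step, and therefore no reducer, is needed.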
(2 Marks)
C) Will shuffle and sort come into play when there is no reducer? Please
explain why. (2 Marks)
Solution:
No, there won't be any shuffle & sort when there are no reducers. Shuffle &
sort is only required to feed data to the reducer as a key paired with a
list of values. Since there is no reducer, the mapper's output is the final
output.
In Java:
If you want a value in a specific range, multiply the returned value by the
size of the range. For example, to get a random number between 0 and 20,
multiply the result by 20.
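The scaling can be sketched as follows, assuming the snippet referred to above is Java's Math.random(), which returns a double from 0.0 (inclusive) to 1.0 (exclusive). The class name RandomRangeSketch and the method randomBelow are hypothetical.

```java
final class RandomRangeSketch {
    // Scales Math.random() from [0.0, 1.0) to [0.0, range).
    static double randomBelow(double range) {
        return Math.random() * range;
    }

    public static void main(String[] args) {
        double value = randomBelow(20);
        System.out.println("random value in [0, 20): " + value);
        // Casting to int truncates, giving a whole number from 0 to 19.
        System.out.println("as a whole number: " + (int) value);
    }
}
```

Note that the upper bound itself is never returned, so a cast to int after multiplying by 20 yields 0 to 19, not 0 to 20.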
In the word count problem, consider that you are using 2 reducers and the
custom partitioning logic is written as below:
What is the behaviour of the above code? Please explain what you expect
and why. Do you suggest any changes to the above code?
Solution:
Changes
***********************************************************