T05 MapReduce
Hadoop Daemons
◼ HDFS Daemons
  ◼ NameNode (NN)
  ◼ Secondary NameNode (SNN)
  ◼ DataNode (DN)
◼ MapReduce Daemons
  ◼ JobTracker (JT)
  ◼ TaskTracker (TT)
Hadoop Architecture Description
Advantages of MapReduce
◼ Parallel Processing
MapReduce Process Flow
Hadoop Processing Framework
MapReduce Example: Word Count Program
What is MapReduce
◼ Users specify a map function that processes a key/value pair to generate a set of
intermediate key/value pairs, and a reduce function that merges all intermediate
values associated with the same intermediate key.
◼ Programs written in this functional style are automatically parallelized and executed
on a large cluster of commodity machines.
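The two user-supplied functions can be illustrated with the word count problem. The following is a minimal single-process sketch, not Hadoop code: `map_fn`, `reduce_fn`, `run_mapreduce`, and the sample documents are names and data assumed here for illustration only.

```python
from collections import defaultdict

def map_fn(key, value):
    """Map: (document name, text) -> list of intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for word in value.split()]

def reduce_fn(key, values):
    """Reduce: merge all values that share one intermediate key."""
    return key, sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy driver (an assumption, not the Hadoop API): map, group by key, reduce."""
    groups = defaultdict(list)
    for k1, v1 in inputs:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)          # group values by intermediate key
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))

# Hypothetical sample input: (document name, contents)
docs = [
    ("doc1", "deer bear river"),
    ("doc2", "car car river"),
    ("doc3", "deer car bear"),
]
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)
```

Only `map_fn` and `reduce_fn` contain problem-specific logic; the grouping step in the driver is what a real framework does on the user's behalf.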
MapReduce Runtime System
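◼ The runtime system takes care of partitioning the input data, scheduling the
program's execution across a set of machines, handling machine failures, and
managing inter-machine communication.
◼ This allows programmers without any experience with parallel and distributed
systems to easily utilize the resources of a large distributed system.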
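As a rough illustration of this division of labour, the sketch below keeps all parallelism inside a small `run_job` function, while the user code stays sequential. It is an analogy under stated assumptions: a thread pool on one machine stands in for the cluster, and `run_job`, `count_words`, and `total` are hypothetical names, not part of Hadoop.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def count_words(split):
    """User code: plain sequential logic for one input split."""
    return [(word, 1) for line in split for word in line.split()]

def total(word, ones):
    """User code: merge the values collected for one key."""
    return word, sum(ones)

def run_job(splits, map_fn, reduce_fn, workers=4):
    """Stand-in runtime (assumption): schedules tasks, groups keys, collects results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        map_outputs = list(pool.map(map_fn, splits))    # map tasks run concurrently
    groups = defaultdict(list)
    for output in map_outputs:                          # shuffle: group by key
        for key, value in output:
            groups[key].append(value)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda kv: reduce_fn(*kv), groups.items())
        return dict(results)                            # reduce tasks run concurrently

# Hypothetical input: two splits, each a list of lines
splits = [["a b", "a"], ["b c"]]
result = run_job(splits, count_words, total)
print(result)
```

Neither `count_words` nor `total` mentions threads, scheduling, or failures, which is the property the slide describes.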
MapReduce Programming Components
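◼ TT: RecordReader reads the input split and emits (k1, v1)
◼ Mapper turns each (k1, v1) into intermediate pairs (k2, v2)
◼ Intermediate values are grouped by key into (k3, <v3>)
◼ Reducer turns each (k3, <v3>) into output pairs (k4, v4)
◼ TT: RecordWriter writes each (k4, v4) to the output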
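The chain of key/value types above can be traced with a small sketch. It assumes a line-oriented reader that emits (byte offset, line) and a writer that emits tab-separated lines; the function names and the "year temperature" input are hypothetical and chosen only for illustration.

```python
from itertools import groupby
from operator import itemgetter

def record_reader(text):
    """Emit (k1, v1) = (byte offset of the line, line contents)."""
    offset = 0
    for line in text.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line.encode("utf-8"))

def mapper(k1, v1):
    """Emit (k2, v2) = (year, temperature); the offset k1 is ignored."""
    year, temp = v1.split()
    yield year, int(temp)

def shuffle(pairs):
    """Sort by key and group values: (k2, v2) pairs -> (k3, <v3>)."""
    ordered = sorted(pairs, key=itemgetter(0))
    for k3, group in groupby(ordered, key=itemgetter(0)):
        yield k3, [value for _, value in group]

def reducer(k3, values):
    """Emit (k4, v4) = (year, maximum temperature)."""
    yield k3, max(values)

def record_writer(pairs):
    """Format each (k4, v4) as a tab-separated output line."""
    return "".join(f"{k4}\t{v4}\n" for k4, v4 in pairs)

def run(text):
    mapped = [kv for k1, v1 in record_reader(text) for kv in mapper(k1, v1)]
    reduced = [kv for k3, vs in shuffle(mapped) for kv in reducer(k3, vs)]
    return record_writer(reduced)

# Hypothetical input: one "year temperature" record per line
text = "1950 22\n1949 11\n1950 -1\n1949 31\n"
output = run(text)
print(output, end="")
```

Note that the key type changes along the way: k1 is a position in the file, while k2, k3, and k4 are all the year.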
Combine
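A combiner pre-aggregates each map task's output locally, so fewer records have to cross the network to the reducers. The sketch below illustrates the idea for word count, assuming an associative and commutative operation (addition); `map_task`, `combine`, `reduce_all`, and the sample splits are hypothetical names and data.

```python
from collections import Counter

def map_task(lines):
    """One map task: emit (word, 1) for every word in its split."""
    return [(word, 1) for line in lines for word in line.split()]

def combine(pairs):
    """Local aggregation on the map side: sum counts per word."""
    local = Counter()
    for word, n in pairs:
        local[word] += n
    return list(local.items())

def reduce_all(map_outputs):
    """Global aggregation on the reduce side."""
    totals = Counter()
    for output in map_outputs:
        for word, n in output:
            totals[word] += n
    return dict(totals)

# Hypothetical input: two splits handled by two map tasks
splits = [["car car car river"], ["car deer car"]]
raw = [map_task(s) for s in splits]
combined = [combine(o) for o in raw]

records_without = sum(len(o) for o in raw)        # records shuffled without a combiner
records_with = sum(len(o) for o in combined)      # records shuffled with a combiner
print(records_without, records_with)
```

The final result is the same either way; only the number of intermediate records differs. This holds because addition can be applied in any grouping, which is not true of every reduce function (a plain average, for example).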
Partition
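The partitioner decides which reducer receives each intermediate key, and it must send every occurrence of a key to the same reducer. Below is a minimal sketch of hash partitioning. It uses CRC32 as a stand-in hash, which is an assumption made so the result is stable across Python processes; `partition` and `assign` are hypothetical names.

```python
import zlib

def partition(key, num_reducers):
    """Map a key to a reducer index in [0, num_reducers).

    Hadoop's default HashPartitioner computes
    (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    CRC32 is used here only as a deterministic substitute.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def assign(pairs, num_reducers):
    """Distribute intermediate (key, value) pairs into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets

# Hypothetical intermediate pairs from several map tasks
pairs = [("car", 1), ("deer", 1), ("car", 1), ("river", 1), ("car", 1)]
buckets = assign(pairs, num_reducers=3)
for index, bucket in enumerate(buckets):
    print(index, bucket)
```

Because the reducer index depends only on the key, all three `("car", 1)` pairs land in the same bucket regardless of which map task produced them.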
Fault Tolerance
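MapReduce handles worker failures mainly by re-executing the failed task, which is safe because map and reduce tasks are deterministic functions of their input. The sketch below simulates this with a task that fails on its first attempt. `run_with_retries`, `flaky_word_count`, and the attempt limit of 4 are illustrative assumptions, not Hadoop's scheduler code.

```python
def run_with_retries(task, split, max_attempts=4):
    """Re-run a task on failure; return (result, attempts used)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(split, attempt), attempt
        except RuntimeError:
            continue                      # reschedule the same task on the same input
    raise RuntimeError(f"task failed after {max_attempts} attempts")

def flaky_word_count(split, attempt):
    """Hypothetical task: fails once to simulate a crashed worker."""
    if attempt == 1:
        raise RuntimeError("simulated worker failure")
    return len(split.split())

def always_failing(split, attempt):
    """Hypothetical task that never succeeds."""
    raise RuntimeError("simulated permanent failure")

result, attempts = run_with_retries(flaky_word_count, "deer bear river")
print(result, attempts)
```

The retried task receives exactly the same input split, so the output is the same as if the first attempt had succeeded.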
Locality
Limitations of Hadoop
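◼ Small files: HDFS is designed for a modest number of large files. The NameNode
keeps metadata for every file and block in memory, so a very large number of small
files overloads it.
◼ Slow processing speed: MapReduce writes intermediate results to disk and moves
them across the network between the map and reduce phases. Hadoop's large code
base also makes it harder to maintain and debug.
◼ Latency: the Mapper converts the input into key/value pairs and the Reducer then
processes those pairs again, and each conversion step takes time.
◼ Security: Hadoop lacks encryption by default at the storage and network levels. It
supports Kerberos authentication, which is hard to manage. Complex applications
are difficult to secure, so their data can be at risk.
◼ No real-time data processing: MapReduce is batch-only.
◼ Not easy to use and program: MapReduce offers little high-level abstraction, so
developers hand-code each operation as map and reduce functions.
◼ Inefficient for iterative processing: in a chain of stages where the output of each
stage is the input to the next, every stage writes its output to disk and the next
stage reads it back.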
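The small-files limitation can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes a commonly quoted rule of thumb of roughly 150 bytes of NameNode memory per file or block object and a 128 MB block size; both constants and the function name are assumptions for illustration, and real memory use varies by version and configuration.

```python
import math

BYTES_PER_OBJECT = 150           # assumed rule of thumb per file or block object
BLOCK_SIZE = 128 * 1024**2       # assumed HDFS block size (128 MB)

def namenode_bytes(num_files, file_size):
    """Estimate NameNode memory: one object per file plus one per block."""
    blocks_per_file = max(1, math.ceil(file_size / BLOCK_SIZE))
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

TOTAL = 1024**4                  # the same 1 TiB of data, stored two ways

small = namenode_bytes(TOTAL // 1024**2, 1024**2)   # 1 MiB files
large = namenode_bytes(TOTAL // 1024**3, 1024**3)   # 1 GiB files

print(f"1 MiB files: {small / 1024**2:.1f} MiB of NameNode memory")
print(f"1 GiB files: {large / 1024**2:.1f} MiB of NameNode memory")
print(f"ratio: {small / large:.0f}x")
```

Under these assumptions the same volume of data costs the NameNode over two hundred times more memory when it is stored as 1 MiB files rather than 1 GiB files.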
Acknowledgment
◼ Many of the figures in these slides are taken from Simplilearn and Edureka.
Thank You