3.1.How Map Reduce Works & 3.2 Anatomy
Developing a MapReduce Application – How MapReduce Works – Anatomy of a MapReduce Job Run – Failures – Job Scheduling – Shuffle and Sort – Task Execution – MapReduce Types and Formats – MapReduce Features – Hadoop Environment. YARN – Failures in Classic MapReduce and YARN – Job Scheduling – Shuffle and Sort – Task Execution – MapReduce Types – Input Formats – Output Formats.
MapReduce is a component of the Apache Hadoop ecosystem, a framework for processing massive datasets.
Other components of Apache Hadoop include the Hadoop Distributed File System (HDFS), YARN, and Apache Pig.
The MapReduce component processes massive data using distributed, parallel algorithms in the Hadoop
ecosystem. This programming model is applied in social platforms and e-commerce to analyze the huge volumes of data
collected from online users.
This article provides an understanding of MapReduce in Hadoop. It will enable readers to gain insight into how vast
volumes of data are processed and how MapReduce is used in real-life applications.
MapReduce is a Hadoop framework used for writing applications that can process vast amounts of data on large
clusters. It can also be called a programming model for processing large datasets across clusters of computers.
It allows data to be stored in a distributed form and simplifies large-scale computing over enormous volumes
of data.
There are two primary tasks in MapReduce: map and reduce. We perform the former before the latter. In the
map job, the input dataset is split into chunks, and map tasks process these chunks in parallel. The outputs of
the map tasks serve as inputs to the reduce tasks. Reducers process this intermediate data from the maps into
smaller sets of tuples, producing the final output of the framework.
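The two tasks can be sketched in a few lines of plain Python. This is an illustrative simulation of the model, not Hadoop's actual API; the sample chunks are made up for the classic word-count example:

```python
from collections import defaultdict

def map_task(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word, 1) for word in chunk.split()]

def shuffle(mapped_pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_task(key, values):
    """Reduce: collapse the list of values for one key into a total."""
    return (key, sum(values))

chunks = ["deer bear river", "car car river", "deer car bear"]
intermediate = [pair for chunk in chunks for pair in map_task(chunk)]
result = dict(reduce_task(k, v) for k, v in shuffle(intermediate).items())
print(result)  # {'deer': 2, 'bear': 2, 'river': 2, 'car': 3}
```

In the real framework the chunks would be input splits on HDFS, and the map and reduce calls would run on different machines; the data flow, however, is exactly this.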
The MapReduce framework handles the scheduling and monitoring of tasks and re-executes failed tasks. It can
be used easily, even by programmers with little expertise in distributed processing. MapReduce applications can
be written using various languages and tools such as Java, Hive, Pig, Scala, and Python.
MapReduce architecture
Task: This is a piece of the actual work that needs to be executed or processed. A MapReduce job comprises many
small tasks that need to be executed.
Job Tracker: This tracker plays the role of scheduling jobs and tracking all jobs assigned to the task tracker.
Task Tracker: This tracker plays the role of tracking tasks and reporting the status of tasks to the job tracker.
Input data: This is the data processed in the mapping phase.
Output data: This is the result of mapping and reducing.
Client: This is a program or Application Programming Interface (API) that submits jobs to MapReduce. MapReduce
can accept jobs from many clients.
Hadoop MapReduce Master: This plays the role of dividing jobs into job-parts.
Job-parts: These are sub-jobs that result from the division of the main job.
In the MapReduce architecture, clients submit jobs to the MapReduce Master. This master will then sub-divide the
job into equal sub-parts. The job-parts will be used for the two main tasks in MapReduce:
1. Mapping and
2. Reducing.
The developer will write logic that satisfies the requirements of the organization or company. The input data will be
split and mapped.
The intermediate data will then be sorted and merged. The reducer will process the resulting output and
generate a final output that is stored in HDFS.
Components of Task Tracker (TT): It consists of map tasks and reduce tasks. Task trackers report the status
of each assigned task to the job tracker. The following diagram summarizes how the job tracker and task trackers work.
Hadoop MapReduce jobs are divided into a set of map tasks and reduce tasks that run in a distributed fashion on a
cluster of computers. Each task works on a small subset of the data it has been assigned so that the load is spread
across the cluster.
The input to a MapReduce job is a set of files in the data store that are spread out over HDFS. In Hadoop, these
files are split with an input format, which defines how to separate the files into input splits. You can think of an
input split as a byte-oriented view of a chunk of the files to be loaded by a map task.
The map task generally performs loading, parsing, transformation and filtering operations, whereas the reduce
task is responsible for grouping and aggregating the data produced by map tasks to generate final output. This is
the way a wide range of problems can be solved with such a straightforward paradigm, from simple numerical
aggregation to complex join operations and cartesian products.
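This division of labour can be sketched in plain Python rather than Hadoop code. The record layout and city names below are invented for illustration: the map side parses and filters raw lines, and the reduce side aggregates each group to a maximum per key:

```python
from collections import defaultdict

# Hypothetical raw input lines: "date,city,temperature".
records = [
    "2024-01-01,Chennai,31", "2024-01-01,Delhi,-1",
    "2024-01-02,Chennai,33", "bad-record",
]

def map_parse_filter(line):
    """Map side: parse the raw line and filter out malformed records."""
    parts = line.split(",")
    if len(parts) != 3:
        return []                   # filtering: drop bad input
    _, city, temp = parts
    return [(city, int(temp))]      # transformation: project to (city, temp)

# Grouping (what the framework's shuffle would do).
grouped = defaultdict(list)
for line in records:
    for key, value in map_parse_filter(line):
        grouped[key].append(value)

# Reduce side: aggregate each group to a single maximum.
maxima = {city: max(temps) for city, temps in grouped.items()}
print(maxima)  # {'Chennai': 33, 'Delhi': -1}
```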
1. Mapping Phase
● This is the first phase of the program.
● There are two steps in this phase:
● Splitting
● Mapping.
A dataset is split into equal units called chunks (input splits) in the splitting step. Hadoop's TextInputFormat
provides a RecordReader that transforms each input split into key-value pairs. The key-value pairs are then used
as inputs in the mapping step.
This is the only data format that a mapper can read or understand. The mapping step contains a coding logic that
is applied to these data blocks. In this step, the mapper processes the key-value pairs and produces an output of the
same form (key-value pairs).
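A rough simulation of how a RecordReader using TextInputFormat feeds a mapper, in plain Python: byte offsets serve as keys and lines as values, and the mapper emits key-value pairs of its own. The split text is invented:

```python
def text_input_format(split):
    """RecordReader sketch: turn a split into (byte offset, line) pairs."""
    offset, pairs = 0, []
    for line in split.splitlines(keepends=True):
        pairs.append((offset, line.rstrip("\n")))
        offset += len(line)
    return pairs

def mapper(offset, line):
    """Mapper: consumes one (key, value) pair, emits (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]

split = "Deer Bear\nRiver\n"
records = text_input_format(split)
print(records)                 # [(0, 'Deer Bear'), (10, 'River')]
output = [pair for k, v in records for pair in mapper(k, v)]
print(output)                  # [('deer', 1), ('bear', 1), ('river', 1)]
```

Note that the mapper ignores the byte-offset key here, which is typical for text processing; other jobs may use the key.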
2. Shuffling phase
This is the second phase that takes place after the completion of the Mapping phase. It consists of two main steps:
1. Sorting
2. Merging.
● Sorting step: The key-value pairs are sorted using the keys.
● Merging step: Values that share the same key are combined into one group.
The shuffling phase groups together the different values produced for the same key. The output of this phase
will again be keys and values, just like in the Mapping phase.
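The two steps can be illustrated in plain Python (a simulation of what the framework does between map and reduce, not framework code; the mapped pairs are invented):

```python
from itertools import groupby

mapped = [("river", 1), ("deer", 1), ("river", 1), ("deer", 1), ("bear", 1)]

# Sorting step: order the intermediate pairs by key.
sorted_pairs = sorted(mapped, key=lambda kv: kv[0])

# Merging step: combine values that share a key into one group.
merged = {key: [v for _, v in group]
          for key, group in groupby(sorted_pairs, key=lambda kv: kv[0])}
print(merged)  # {'bear': [1], 'deer': [1, 1], 'river': [1, 1]}
```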
3. Reducer phase
In the reducer phase, the output of the shuffling phase is used as the input. The reducer processes this input
further, aggregating the intermediate values into a smaller set of values that summarizes the entire dataset.
The output from this phase is stored in HDFS.
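Continuing the simulation, a reducer that summarizes each key's grouped values (the shuffled input below is hypothetical):

```python
# Output of the shuffling phase: each key with its grouped values.
shuffled = {"bear": [1], "deer": [1, 1], "river": [1, 1]}

def reducer(key, values):
    """Reduce each key's list of intermediate values to one summary value."""
    return key, sum(values)

final = dict(reducer(k, v) for k, v in shuffled.items())
print(final)  # {'bear': 1, 'deer': 2, 'river': 2}
```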
Example of a MapReduce with the three main phases. Splitting is often included in the mapping stage.
Combiner phase
This is an optional phase that is used for optimizing the MapReduce process. It reduces the map outputs
at the node level: duplicate outputs from the map tasks are combined into a single output before they are sent
across the network. The combiner phase speeds up the Shuffling phase and improves job performance.
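A sketch of the local pre-aggregation a combiner performs (plain Python; the map output is invented):

```python
from collections import Counter

# Raw output of one map task, before it leaves the node.
map_output = [("car", 1), ("car", 1), ("river", 1), ("car", 1)]

def combiner(pairs):
    """Combiner: pre-aggregate duplicate keys locally, shrinking shuffle traffic."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

combined = combiner(map_output)
print(combined)  # [('car', 3), ('river', 1)] -- 2 pairs shipped instead of 4
```

Because the reducer still sums whatever it receives, applying the combiner changes only the volume of intermediate data, not the final result; this is why combiners must be associative and commutative operations such as sum or max.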
Output Format:
The output format translates the final key-value pairs from the reduce function and writes them out to a file via
a record writer. By default, it separates the key and value with a tab character and separates records with a
newline character. We will discuss how to write your own customized output format in a future article.
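The default behaviour described above (tab between key and value, newline between records) can be mimicked in a few lines of Python. This is a sketch of the record-writer behaviour, not Hadoop's TextOutputFormat itself:

```python
import io

def write_text_output(pairs, stream):
    """Mimic the default text output: key<TAB>value, one record per line."""
    for key, value in pairs:
        stream.write(f"{key}\t{value}\n")

buf = io.StringIO()  # stands in for the HDFS output file
write_text_output([("bear", 2), ("car", 3)], buf)
print(repr(buf.getvalue()))  # 'bear\t2\ncar\t3\n'
```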
The following diagram shows how all four phases of MapReduce are applied.
E-commerce: E-commerce companies such as Walmart, eBay, and Amazon use MapReduce to analyze buying
behavior. MapReduce provides meaningful information that is used as the basis for developing product
recommendations. Some of the information used includes site records, e-commerce catalogs, purchase history, and
interaction logs.
Social networks: The MapReduce programming tool can evaluate certain information on social media platforms
such as Facebook, Twitter, and LinkedIn. It can evaluate important information such as who liked your status and
who viewed your profile.
Entertainment: Netflix uses MapReduce to analyze the clicks and logs of online customers. This information helps
the company suggest movies based on customers’ interests and behavior.
Conclusion: MapReduce is a crucial processing component of the Hadoop framework. It's a fast, scalable, and
cost-effective framework that can help data analysts and developers process huge volumes of data.
This programming model is a suitable tool for analyzing usage patterns on websites and e-commerce platforms.
Companies providing online services can utilize this framework to improve their marketing strategies.
1. Explain architecture and components of Map Reduce Framework with its phases by an example process.(20)
2. Explain How Job Tracker(JT) and Task Trackers(TT) work in Map Reduce Framework?(20)
3. Explain Shuffling and Sorting and the types of I/O Formats in MapReduce process.(20)
4. Explain Task Execution in Hadoop YARN with its workflow architecture.(20)
5. Explain job scheduling types in YARN Framework.(20)