BDA-Chapter-3
Map Reduce
• MapReduce is the data-processing model of Hadoop.
• The MapReduce programming model is designed to process huge
volumes of data in parallel by dividing the work into a set of
independent tasks.
Map Reduce Layer
• MapReduce is a patented software framework introduced by Google to support
distributed computing on large datasets across clusters of computers.
• It is essentially a programming model that runs in the Hadoop background,
providing simplicity, scalability, fault recovery, and speed, along with
straightforward solutions for data processing.
• The MapReduce framework can process tremendous amounts of data in
parallel on large clusters of computational nodes.
• MapReduce is a programming model that lets you process your data across an
entire cluster. It consists of Mappers and Reducers, the scripts or functions
you write when building a MapReduce program.
• Mappers transform your data in parallel across the computing cluster
in a highly efficient manner.
• Reducers are responsible for aggregating your data together.
• Mappers and Reducers put together can be used to solve complex problems
(see the signature sketch after this list).
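In the terms of the original MapReduce model, a Mapper and a Reducer are simply two functions with the shapes below, where k and v denote key and value types; this is the standard contract, not code from these slides:

map:    (k1, v1)       → list(k2, v2)
reduce: (k2, list(v2)) → list(k3, v3)

The framework guarantees that every intermediate value emitted under the same key k2 is delivered to a single reduce call.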
MapReduce Phases
• There are mainly three phases in MapReduce:
• the map phase,
• the shuffle phase,
• the reduce phase.
Map Phase
• In a census analogy, the phase wherein individuals count the population of
their assigned cities is called the map phase.
• There are some terms related to this phase.
• Mapper: The individual (census taker) involved in the actual
calculation is called a mapper.
• Input split: The part of the city each census taker is assigned is
known as the input split.
• Key-value pair: The output from each mapper is a key-value pair; for
example, the key is a city X and the value is its count, say, 5. (A mapper
sketch follows.)
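As a concrete sketch of this phase, the census mapper could look as follows in Hadoop's Java mapreduce API. The class name CensusMapper and the input layout (one resident's city name per record) are illustrative assumptions, not taken from the slides:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper for the census analogy: each input record is assumed
// to hold one resident's city; the mapper emits (city, 1) for every record
// in its input split.
public class CensusMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text city = new Text();

    @Override
    protected void map(LongWritable offset, Text record, Context context)
            throws IOException, InterruptedException {
        city.set(record.toString().trim());
        context.write(city, ONE); // the key-value pair, e.g. (X, 1)
    }
}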
Shuffle Phase
• The phase in which the values from different mappers are copied or
transferred to reducers is known as the shuffle phase.
• The shuffle phase comes in between the map phase and the reduce phase,
as illustrated by the sketch below.
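To make the grouping concrete, here is a toy, plain-Java illustration of what the shuffle accomplishes. It is not Hadoop's implementation, only the grouping semantics: values from different mappers are collected under their common key before any reducer runs.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    public static void main(String[] args) {
        // Pretend these pairs were emitted by two different mappers.
        List<Map.Entry<String, Integer>> mapperOutput = List.of(
                Map.entry("X", 5), Map.entry("Y", 3),  // mapper 1
                Map.entry("X", 2), Map.entry("Y", 4)); // mapper 2

        // The shuffle groups every value under its key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapperOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        System.out.println(grouped); // {X=[5, 2], Y=[3, 4]}
    }
}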
Reduce Phase
• By this point, the large dataset has been broken down into input splits,
and the map tasks have processed each of them.
• This is when the reduce phase comes into play. Similar to the map
phase, the reduce phase processes each key separately.
• Reducers: The individuals who work at the headquarters are known as
the reducers, because they reduce or consolidate the outputs from many
different mappers.
• After the reducer has finished its task, a results file is generated
and stored in HDFS; HDFS then replicates these results. (A reducer
sketch follows.)
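Continuing the illustrative sketch from the map phase, the matching reducer could look as follows; the class name CensusReducer is an assumption:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative reducer for the census analogy: consolidates the counts that
// the shuffle grouped under each city and writes (city, total) to the output.
public class CensusReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text city, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) {
            sum += c.get(); // add up every mapper's partial count
        }
        total.set(sum);
        context.write(city, total); // result record, later stored in HDFS
    }
}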
Working of MapReduce
• Hadoop divides the job into tasks. There are two types of tasks,
corresponding to the phases above:
• Map tasks (splitting and mapping)
• Reduce tasks (shuffling and reducing)
• The complete execution process (of both Map and Reduce tasks) is
controlled by two types of entities:
• JobTracker: acts like a master (responsible for the complete execution of
the submitted job)
• Multiple TaskTrackers: act like slaves, each performing part of the job
• For every job submitted for execution in the system, there is one JobTracker,
which resides on the NameNode, and multiple TaskTrackers, which reside
on DataNodes.
• A job is divided into multiple tasks, which are then run on multiple data
nodes in a cluster.
• It is the responsibility of the JobTracker to coordinate this activity by
scheduling tasks to run on different data nodes.
• Execution of each individual task is then looked after by a TaskTracker,
which resides on every data node executing part of the job.
• The TaskTracker's responsibility is to send progress reports to the
JobTracker.
• In addition, the TaskTracker periodically sends a 'heartbeat' signal to the
JobTracker to notify it of the current state of the system.
• Thus the JobTracker keeps track of the overall progress of each job. In the
event of a task failure, the JobTracker can reschedule the task on a different
TaskTracker. (A driver sketch showing how such a job is submitted follows.)
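As a minimal sketch of how such a job reaches the cluster, the driver below wires the illustrative CensusMapper and CensusReducer from the earlier sketches into a Job and submits it; in classic Hadoop 1.x this submission is what the JobTracker then splits into tasks and schedules:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative driver: describes the job, then submits it to the cluster.
public class CensusJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "city population count");
        job.setJarByClass(CensusJob.class);
        job.setMapperClass(CensusMapper.class);   // sketch from the map phase
        job.setReducerClass(CensusReducer.class); // sketch from the reduce phase
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // results in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);       // submit and wait
    }
}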
Example
• The architecture consists of four phases of execution, namely
splitting, mapping, shuffling, and reducing.
• Consider the input:
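The slides' original input is not shown here, so as an illustrative stand-in, suppose the input is the three lines below and the job is a word count; the four phases then unfold as follows:

Input:     Deer Bear River / Car Car River / Deer Car Bear
Splitting: each line becomes one split, handled by its own map task
Mapping:   (Deer,1) (Bear,1) (River,1); (Car,1) (Car,1) (River,1); (Deer,1) (Car,1) (Bear,1)
Shuffling: Bear → [1,1], Car → [1,1,1], Deer → [1,1], River → [1,1]
Reducing:  (Bear,2), (Car,3), (Deer,2), (River,2)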
Features of MapReduce
• The important features of MapReduce are as follows:
• Abstracts developers from the complexity of distributed programming
• In-built redundancy and fault tolerance
• The MapReduce programming model is language independent
• Automatic parallelization and distribution
• Enables the local processing of data
• Manages all inter-process communication
• Manages the distributed servers running the various tasks in parallel
• Manages all communications and data transfers between the various parts of the
system
• Provides redundancy and failure handling for the overall management of the whole
process