http://www.ksi.mff.cuni.cz/~svoboda/courses/2015-2-MIE-PDB/
Lecture 10:
MapReduce, Hadoop
26. 4. 2016
Shared memory
Tasks share a common address space
Tasks interact by reading and writing from/to this space
Asynchronously
Data parallelization
Data are partitioned across tasks
Tasks execute a sequence of independent operations
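A toy illustration of this model (plain Java parallel streams, not Hadoop; the array contents are made up): each element is processed by the same independent operation, and the runtime handles the partitioning.

import java.util.stream.IntStream;

public class DataParallelToy {
  public static void main(String[] args) {
    int[] data = {4, 8, 15, 16, 23, 42};  // arbitrary example values
    // The runtime partitions the array across worker threads;
    // each task applies the same independent operation (squaring).
    long sumOfSquares = IntStream.of(data)
        .parallel()
        .mapToLong(x -> (long) x * x)
        .sum();
    System.out.println(sumOfSquares);
  }
}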
MapReduce Framework
Divide-and-conquer paradigm
Map breaks down a problem into sub-problems
Processes input data to generate a set of intermediate key/value pairs
Reduce receives and combines the sub-solutions to solve the problem
Processes intermediate values associated with the same intermediate key
Many real world tasks can be expressed this way
Programmer focuses on map/reduce code
Framework takes care of data partitioning, scheduling execution across machines, handling machine failures, managing inter-machine communication, …
MapReduce
A Bit More Formally
Map
Input: a key/value pair
Output: a set of intermediate key/value pairs
Usually different domain
(k1,v1) → list(k2,v2)
Reduce
Input: an intermediate key and a set of values for that key
Output: a possibly smaller set of values
The same domain
(k2,list(v2)) → (k2,possibly smaller list(v2))
MapReduce
Example: Word Frequency
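A minimal sketch of the two word-frequency functions, written against the standard Hadoop MapReduce Java API (class names are illustrative; both classes are shown together for brevity):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: (k1, v1) = (line offset, line text) -> list(k2, v2) = list((word, 1))
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    for (String token : value.toString().split("\\s+")) {
      if (!token.isEmpty()) {
        word.set(token);
        context.write(word, ONE);  // emit (word, 1) for each occurrence
      }
    }
  }
}

// Reduce: (k2, list(v2)) = (word, [1, 1, ...]) -> (word, total count)
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    context.write(key, new IntWritable(sum));  // total frequency of the word
  }
}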
MapReduce
Fault Tolerance
The implementation described in the original paper aborts the whole computation if the single master fails
Clients can check for this condition and retry the MapReduce operation if they desire
MapReduce
Stragglers
Straggler = a machine that takes an unusually long time to complete one of the map/reduce tasks in the computation
Example: a machine with a bad disk
Solution:
When a MapReduce operation is close to completion, the master schedules backup executions of the remaining in-progress tasks
A task is marked as completed whenever either the primary or the backup execution completes
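Hadoop implements this idea as speculative execution. A minimal sketch of enabling it when configuring a job (property names as in Hadoop 2.x; verify them against your version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpeculativeJobSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Allow the framework to launch backup ("speculative") attempts
    // for unusually slow map and reduce tasks.
    conf.setBoolean("mapreduce.map.speculative", true);
    conf.setBoolean("mapreduce.reduce.speculative", true);
    Job job = Job.getInstance(conf, "word-frequency");  // job name is arbitrary
    // ... the rest of the job setup would follow here
  }
}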
MapReduce
Task Granularity
The Map phase is divided into M pieces, the Reduce phase into R pieces
Ideally both much larger than the number of worker machines
How to set them?
Master makes O(M + R) scheduling decisions
Master keeps O(M * R) status information in memory
For each Map/Reduce task: state (idle/in-progress/completed)
For each non-idle task: identity of worker machine
For each completed Map task: locations and sizes of the R intermediate file regions
R is often constrained by users
The output of each Reduce task ends up in a separate output file
Practical recommendation (Google):
Choose M so that each individual task is roughly 16 – 64 MB of input data
Make R a small multiple of the number of worker machines we expect to use
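A worked example (input size and cluster size assumed for illustration): with 200 GB of input at 64 MB per map task, M ≈ 200 GB / 64 MB ≈ 3,200; with 100 worker machines, R might then be set to a few hundred. For comparison, the original MapReduce paper reports computations with M = 200,000 and R = 5,000 on 2,000 worker machines.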
MapReduce Criticism
David DeWitt and Michael Stonebraker – 2008
1. MapReduce is a step backwards in database access, which should be based on:
Schema describing data structure
Separating schema from the application
Advanced query languages
2. MapReduce is a poor implementation
Instead of indexes it uses brute force
3. MapReduce is not novel (the ideas are more than 20 years old and have since been superseded)
4. MapReduce is missing features common in DBMSs
Indexes, transactions, integrity constraints, views, …
5. MapReduce is incompatible with applications implemented over DBMSs
Data mining, business intelligence, …
Apache Hadoop
Open-source software framework
Runs applications on large clusters of commodity hardware
Multi-terabyte data sets
Thousands of nodes
Implements MapReduce
Derived from Google's MapReduce and Google File System (GFS)
Neither of which is open-source
http://hadoop.apache.org/
Apache Hadoop
Modules
Hadoop Common
Common utilities
Support for other Hadoop modules
Hadoop Distributed File System (HDFS)
Distributed file system
High-throughput access to application data
Hadoop YARN
Framework for job scheduling and cluster resource management
Hadoop MapReduce
YARN-based system for parallel processing of large data sets
HDFS (Hadoop Distributed File System)
Basic Features
Fault-tolerant
Highly scalable
HDFS
Data Characteristics
Assumes:
Streaming data access
Batch processing rather than interactive user access
Large data sets and files
Write-once / read-many
A file, once created, written, and closed, does not need to be changed
Or at least not often
This assumption simplifies data coherency
Optimal applications for this model: MapReduce, web crawlers, …
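A minimal sketch of the write-once / read-many pattern using the HDFS Java API (the file path is made up for illustration):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteOnce {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();        // picks up core-site.xml
    FileSystem fs = FileSystem.get(conf);            // HDFS if so configured
    Path file = new Path("/data/crawl/part-00000");  // hypothetical path

    // Write once: create the file, stream the data, close it ...
    try (FSDataOutputStream out = fs.create(file)) {
      out.writeBytes("example record\n");
    }

    // ... then read it many times with streaming access.
    try (BufferedReader in =
        new BufferedReader(new InputStreamReader(fs.open(file)))) {
      System.out.println(in.readLine());
    }
  }
}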
HDFS
Fault Tolerance
MapReduce
JobTracker
Like a scheduler:
1. A client application is sent to the JobTracker
2. It “talks” to the NameNode (= HDFS master) and locates the TaskTracker (Hadoop client) near the data
3. It moves the work to the chosen TaskTracker node
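From the client's perspective, this exchange starts with an ordinary job submission. A sketch of such a driver, reusing the word-count classes sketched earlier (input and output paths come from the command line):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word-frequency");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCountMapper.class);
    job.setReducerClass(WordCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Submission hands the job over to the framework,
    // which schedules the map and reduce tasks near the data.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}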
MapReduce
TaskTracker (Client)
Accepts tasks from JobTracker
Map, Reduce, Combine, …
Input, output paths
Has a number of slots for the tasks
Execution slots available on the machine (or machines on the same rack)
Spawns a separate JVM for execution of a task
Indicates the number of available slots through the heartbeat message to the JobTracker
A failed task is re-executed by the JobTracker
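The number of slots is part of the cluster configuration; for instance (Hadoop 1.x property names, given here as an assumption to be checked against your version), mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml set how many map and reduce tasks a TaskTracker may run concurrently.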