Introduction To MapReduce

MapReduce is a programming model for processing large datasets across clusters of computers. It splits the data, distributes the pieces across nodes, and processes them in parallel using user-defined map and reduce functions. The results are then aggregated and written out. It offers a high degree of parallelism and fault tolerance, and it is transparent to programmers, who do not need to deal with the details of the underlying distributed system. Common examples are distributed grep, word count, and sorting of large datasets.

Introduction to Google MapReduce

WING Group Meeting, 13 Oct 2006
Hendra Setiawan

What is MapReduce?
A programming model (and its associated implementation)
For processing large data sets
Exploits a large set of commodity computers
Executes processes in a distributed manner
Offers a high degree of transparency
In other words: simple, and maybe suitable for your tasks!

Distributed Grep
[Diagram: very big data is split into chunks; grep runs on each chunk in parallel and produces matches; cat concatenates them into "All matches".]

Distributed Word Count
[Diagram: very big data is split into chunks; count runs on each chunk in parallel; merge combines the per-chunk counts into the merged count.]

Map Reduce
[Diagram: Very big data -> MAP -> Partitioning Function -> REDUCE -> Result]

Map:
Accepts an input key/value pair
Emits intermediate key/value pairs

Reduce:
Accepts an intermediate key/value* pair (a key together with all of its values)
Emits an output key/value pair
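
The map and reduce contracts above can be sketched as a small single-process model. This is only an illustration of the data flow, not Google's or Hadoop's implementation: the class `MiniMapReduce`, the interfaces `MapFn` and `ReduceFn`, and the `run` method are hypothetical names, and keys and values are fixed to `String` and `Integer` to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical single-process model of the map -> group -> reduce data flow.
class MiniMapReduce {

    /** Map step: one input key/value pair in, any number of intermediate pairs out. */
    interface MapFn {
        void map(String key, String value, BiConsumer<String, Integer> emit);
    }

    /** Reduce step: one intermediate key plus all of its values in, one output value out. */
    interface ReduceFn {
        int reduce(String key, List<Integer> values);
    }

    static Map<String, Integer> run(Map<String, String> input, MapFn mapFn, ReduceFn reduceFn) {
        // Map phase: collect every emitted pair, grouping values by intermediate key.
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, String> e : input.entrySet()) {
            mapFn.map(e.getKey(), e.getValue(),
                    (k, v) -> groups.computeIfAbsent(k, unused -> new ArrayList<>()).add(v));
        }
        // Reduce phase: call reduce once per distinct intermediate key.
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            output.put(g.getKey(), reduceFn.reduce(g.getKey(), g.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, String> input = new TreeMap<>();
        input.put("doc1", "a b a");
        input.put("doc2", "b c");
        Map<String, Integer> counts = run(
                input,
                (key, value, emit) -> {
                    for (String w : value.split("\\s+")) {
                        emit.accept(w, 1);
                    }
                },
                (key, values) -> values.stream().mapToInt(Integer::intValue).sum());
        System.out.println(counts); // prints {a=2, b=2, c=1}
    }
}
```

A real implementation runs the map calls on many machines and routes each intermediate key to one of R reduce tasks; the sketch collapses all of that into two loops.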

Partitioning Function
Default: hash(key) mod R
Guarantees:
Relatively well-balanced partitions
Ordering guarantee within each partition
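
A minimal sketch of the default rule hash(key) mod R, assuming string keys and Java's built-in `hashCode()` as the hash function. The class and method names are hypothetical, and `Math.floorMod` is a choice made here so that negative hash codes still map into the range 0..R-1; it is not taken from the slides.

```java
// Hypothetical illustration of the default partitioning rule: hash(key) mod R.
class HashPartitionSketch {

    /** Returns the reduce partition (0..r-1) that a key is routed to. */
    static int partition(String key, int r) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), r);
    }

    public static void main(String[] args) {
        int r = 4; // number of reduce tasks
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> partition " + partition(key, r));
        }
    }
}
```

Because the partition depends only on the key, every value for a given key reaches the same reduce task, which is what lets reduce see a key together with all of its values.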

Distributed Sort
Map:
emit(key,value)

Reduce (with R=1):
emit(key,value)
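
Both functions are identities here; the sorting comes from the ordering guarantee within a partition, and R=1 means there is a single partition holding everything. The sketch below models this under stated assumptions: a `TreeMap` stands in for the one sorted partition, and the class `SortSketch` and its `sort` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of distributed sort with identity map/reduce and R = 1.
class SortSketch {

    /** Each record is a {key, value} array; the result is "key<TAB>value" lines in key order. */
    static List<String> sort(List<String[]> records) {
        // Map: emit(key, value) unchanged. The TreeMap plays the role of the single
        // partition, which hands keys to reduce in sorted order.
        Map<String, List<String>> partition = new TreeMap<>();
        for (String[] kv : records) {
            partition.computeIfAbsent(kv[0], unused -> new ArrayList<>()).add(kv[1]);
        }
        // Reduce: emit(key, value) unchanged for every value of every key.
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : partition.entrySet()) {
            for (String value : e.getValue()) {
                output.add(e.getKey() + "\t" + value);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[] {"pear", "3"},
                new String[] {"apple", "1"},
                new String[] {"fig", "2"});
        for (String line : sort(records)) {
            System.out.println(line);
        }
    }
}
```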

MapReduce
Distributed Grep
Map:
if match(value,pattern) emit(value,1)

Reduce:
emit(key,sum(value*))
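
The grep pseudocode can be tried out in a single process as follows. This is a sketch under assumptions, not the deck's implementation: the class `GrepSketch` and its `grep` method are hypothetical, `match` is taken to mean a regular-expression search, and the reduce step is folded into the same loop with `Map.merge`, so the result maps each matching line to the number of times it occurs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical single-process version of the distributed grep pseudocode.
class GrepSketch {

    static Map<String, Integer> grep(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, Integer> matches = new TreeMap<>();
        for (String line : lines) {
            // Map: if match(value, pattern) emit(value, 1)
            if (pattern.matcher(line).find()) {
                // Reduce: emit(key, sum(value*)), done incrementally via merge().
                matches.merge(line, 1, Integer::sum);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("error: disk", "ok", "error: disk", "error: net");
        System.out.println(grep(lines, "error")); // prints {error: disk=2, error: net=1}
    }
}
```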

Distributed Word Count

Map:
for all w in value do emit(w,1)

Reduce:
emit(key,sum(value*))

MapReduce Transparencies
Plus Google Distributed File System:
Parallelization
Fault-tolerance
Locality optimization
Load balancing

Suitable for your task if
You have a cluster
You are working with a large dataset
You are working with independent data (or data assumed to be independent)
The task can be cast into map and reduce

MapReduce outside Google

Hadoop (Java)
Emulates MapReduce and GFS

The architecture of Hadoop MapReduce and DFS is master/slave

            Master       Slave
MapReduce   jobtracker   tasktracker
DFS         namenode     datanode

Example Word Count (1)

Map
public static class MapClass extends MapReduceBase implements Mapper {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  // Emits (word, 1) for every whitespace-separated token in the input line.
  public void map(WritableComparable key, Writable value,
                  OutputCollector output, Reporter reporter) throws IOException {
    String line = ((Text)value).toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one);
    }
  }
}

Example Word Count (2)

Reduce
public static class Reduce extends MapReduceBase implements Reducer {
  // Sums all counts collected for one word and emits (word, total).
  public void reduce(WritableComparable key, Iterator values,
                     OutputCollector output, Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += ((IntWritable) values.next()).get();
    }
    output.collect(key, new IntWritable(sum));
  }
}

Example Word Count (3)

Main
public static void main(String[] args) throws IOException {
  //checking goes here
  JobConf conf = new JobConf();

  // Types of the output key/value pairs
  conf.setOutputKeyClass(Text.class);
  conf.setOutputValueClass(IntWritable.class);

  // The reducer doubles as a combiner, since summing partial counts is safe
  conf.setMapperClass(MapClass.class);
  conf.setCombinerClass(Reduce.class);
  conf.setReducerClass(Reduce.class);

  // Input and output locations come from the command line
  conf.setInputPath(new Path(args[0]));
  conf.setOutputPath(new Path(args[1]));

  JobClient.runJob(conf);
}

One time setup

Set hadoop-site.xml and slaves
Initiate the namenode
Run Hadoop MapReduce and DFS
Upload your data to DFS
Run your process
Download your data from DFS

Summary
A simple programming model for processing large datasets on a large cluster of computers
Fun to use: focus on the problem, and let the library deal with the messy details

References
Original paper (http://labs.google.com/papers/mapreduce.html)
On Wikipedia (http://en.wikipedia.org/wiki/MapReduce)
Hadoop MapReduce in Java (http://lucene.apache.org/hadoop/)
Starfish - MapReduce in Ruby (http://rufy.com/starfish/)
