By Christian Mechem and Geoff Crowley
MapReduce was invented by engineers at Google to respond to the massive amount of data
they were collecting from the web.
Distributing this data across many computers and parallelizing computations on it
demanded significant effort from programmers.
To solve the problem, they created a new programming model based on the functional
programming paradigm.
In functional programming, computations never modify their input data, and the order in which
independent operations run does not affect the result. MapReduce applies these concepts to
achieve fault tolerance and parallelism.
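The two functional properties above can be illustrated with a minimal Python sketch. It is not taken from the original MapReduce implementation; the names `readings`, `square`, and `map_reduce` are hypothetical, and the chunks merely stand in for data held on separate nodes.

```python
from functools import reduce
from operator import add

# Input is an immutable tuple; no step below modifies it.
readings = (3, 1, 4, 1, 5, 9, 2, 6)

def square(x):
    """Pure function: the output depends only on the argument."""
    return x * x

def map_reduce(chunk):
    """Map each element, then fold the mapped values with an associative operator."""
    return reduce(add, map(square, chunk), 0)

# Split the input into independent chunks, as if each lived on a different node.
chunks = [readings[0:3], readings[3:6], readings[6:8]]
partials = [map_reduce(c) for c in chunks]

# Combining the partial results in either order gives the same total.
total_forward = reduce(add, partials, 0)
total_reversed = reduce(add, reversed(partials), 0)

# Processing the whole input at once also gives the same total.
whole = map_reduce(readings)
```

Because the input is never changed, a failed chunk could simply be recomputed from the same data, which is the property fault tolerance relies on. Because addition is associative and commutative, the partial results can be combined in any order, which is what makes parallel execution safe.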
Immutable and Redundant Data
• The MapReduce model works with data and computations that are
independent of each other.
• This is the simplest form of parallelism, because MapReduce
computations can be performed locally on each node with no communication
required between nodes (other than with the master server).
• Massive amounts of data are processed in parallel on the nodes of a cluster, each
running the same MapReduce function.
• Final key/value pairs are written to GFS (the Google File System), where the
aggregate final results are stored.
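The flow described in the list above can be sketched as a small word count job in Python. This is an illustrative sketch under stated assumptions, not Google's implementation: the function names (`map_words`, `shuffle`, `reduce_counts`, `run_job`) are hypothetical, a thread pool stands in for the cluster nodes, and a dictionary stands in for the output files on the distributed file system. The grouping step between map and reduce is included because each key must reach a single reduce call.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_words(split):
    """Map task: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(mapped_outputs):
    """Group intermediate values by key so each key reaches exactly one reduce call."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce task: collapse all values for one key into a final key/value pair."""
    return key, sum(values)

def run_job(splits, workers=3):
    # Each split is mapped independently; the thread pool stands in for cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, splits))
    groups = shuffle(mapped)
    # A dict stands in for the output files on the distributed file system.
    return dict(reduce_counts(k, v) for k, v in sorted(groups.items()))

splits = ["the quick brown fox", "the lazy dog", "the fox jumps"]
result = run_job(splits)
```

Every map task runs the same function on its own split and shares no state with the others, so the splits can be processed in any order or on any worker without changing `result`.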
MapReduce uses the Principle of Locality