Matrix Multiplication of Big Data Using MapReduce
Mais Haj Qasem, Alaa Abu Sarhan, Raneem Qaddoura, and Basel A. Mahafzah
Computer Science Department
The University of Jordan
Amman, Jordan
Abstract— One popular application for big data is matrix multiplication, which has been solved using many approaches. Recently, researchers have applied MapReduce as a new approach to solve this problem. In this paper, we provide a review of matrix multiplication of big data using MapReduce. This review covers the techniques for solving matrix multiplication with MapReduce, along with the time complexity and the number of mappers needed for each technique. Moreover, this review reports the number of articles published between 2010 and 2016 for each of the following three topics: Big Data, Matrix Multiplication, and MapReduce.

Keywords — Big Data; MapReduce; Matrix Multiplication

I. INTRODUCTION

Big data analysis is the process of examining and evaluating large and varied data sets. Large amounts of data are continually generated every day and are useful for many applications, including social networks, news feeds, business and marketing, and cloud services. In order to extract useful information from these large data sets, machine learning algorithms or algebraic analysis are applied.

Matrix multiplication, a fundamental operation in linear algebra, has many real-life applications. It becomes a hard problem when the matrices are huge and considered "big data". Recently, researchers have found many applications for matrices due to the extensive use of personal computers, which has increased the use of matrices in a wide variety of fields, such as economics, engineering, statistics, and other sciences [1].

A massive amount of computation is involved in matrix multiplication, especially when the matrices are considered big data; this has encouraged researchers to investigate the computational problems thoroughly in order to enhance the efficiency of the implemented algorithms for matrix multiplication. Hence, over the years, several parallel and distributed systems for the matrix multiplication problem have been proposed to reduce the communication and computation time [2-4].

Parallel and distributed algorithms over various interconnection networks, such as the Chained-Cubic Tree (CCT), Hyper Hexa-Cell (HHC), Optical CCT, and OTIS HHC [5-8], divide large problems into smaller sub-problems and assign each of them to a different processor; then all processors run in parallel. This makes the problem more feasible in terms of time. MapReduce is among these computing systems [9].

MapReduce is a parallel approach that consists of two sequential tasks, Map and Reduce. These tasks are implemented with several subtasks. There are many applications using MapReduce, such as MapReduce with K-means for remote-sensing image clustering [10], MapReduce with decision trees for classification [10], and MapReduce with expectation maximization for text filtering [11]. MapReduce has also been used in real-time systems [12] and for job scheduling [13].

The rest of the paper is organized as follows. Section 2 presents big data analysis and research trends and gives a brief summary of matrix multiplication and MapReduce. Section 3 reviews work related to using MapReduce to solve the matrix multiplication problem and compares the related works. Section 4 summarizes and concludes the paper.

II. BIG DATA ANALYSIS AND RESEARCH TRENDS

Big data is a hot research topic with many applications in which complex and huge data sets must be analyzed. The number of articles published on this topic is shown in Table I and illustrated in Fig. 1. Research on this topic keeps growing, as big data is used almost everywhere these days: in news articles, professional magazines, and social media such as tweets, YouTube videos, and blog discussions. Google Scholar was used to extract the number of articles published each year, using the query string "Big Data" as an exact search term. As the figure shows, the number of published articles increased significantly from 2010 to 2015, with an insignificant decrease in 2016.

TABLE I. BIG DATA PUBLISHED ARTICLES

Year    No. of Articles
2010    2,520
2011    4,000
2012    11,200
2013    27,900
2014    47,500
2015    67,300
2016    56,800
TABLE II. MATRIX MULTIPLICATION PUBLISHED ARTICLES

Year    No. of Articles
2011    7,720
2012    9,250
2013    9,460
2014    9,670
2015    9,850
2016    10,600
TABLE III. MAPREDUCE PUBLISHED ARTICLES

Year    No. of Articles
2010    3,380
2011    4,990
2012    8,040
2013    10,700
2014    13,300
2015    15,300
2016    15,100

MapReduce is a parallel framework for big data, which involves two jobs when applied to matrix multiplication:

• First job: the reduce task is inactive, while the map task simply reads the input file and creates the pairs of elements to be multiplied.
• Second job: the map task performs the multiplication independently for each pair of elements, while the reduce task combines the results for each output element.

The operations in each job of MapReduce are presented in Table IV; the first job is responsible for reading the input elements from the input file, and the second job does the multiplication and combination. This schema is called the element-to-element technique, since each mapper performs an element-by-element multiplication; the technique is shown in Fig. 6.
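As a concrete illustration, the two jobs of the element-to-element technique can be sketched in Python. This is only a minimal in-memory simulation of the job structure described above, not a real MapReduce cluster program; the function and variable names are illustrative and do not come from the reviewed papers.

```python
from collections import defaultdict

def job1_map(A, B):
    # Job 1: the reduce task is inactive; the map task only reads the
    # input (here, two in-memory matrices) and creates one pair of
    # elements per multiplication needed for each output cell (i, j).
    n, m, q = len(A), len(A[0]), len(B[0])
    pairs = []
    for i in range(n):
        for j in range(q):
            for k in range(m):
                pairs.append(((i, j), (A[i][k], B[k][j])))
    return pairs

def job2_map(pairs):
    # Job 2 map: multiply each pair of elements independently.
    return [(key, a * b) for key, (a, b) in pairs]

def job2_reduce(products):
    # Job 2 reduce: combine (sum) the partial products per output cell.
    acc = defaultdict(int)
    for key, p in products:
        acc[key] += p
    return acc

def matmul_mapreduce(A, B):
    # Chain the two jobs and assemble the result matrix.
    cells = job2_reduce(job2_map(job1_map(A, B)))
    n, q = len(A), len(B[0])
    return [[cells[(i, j)] for j in range(q)] for i in range(n)]
```

Note that every element pair emitted by the first job crosses the framework boundary as a separate record; this per-element communication is the overhead that the blocking techniques discussed next aim to reduce.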
Fig. 6. Element by element schema matrix multiplication.

The research in this topic is also increasing, as shown in Table III and illustrated in Fig. 5. Google Scholar was used to extract the number of articles published each year, using the query string "MapReduce" as an exact search term. As the figure shows, the number of published articles increased from 2010 to 2016.

In order to reduce the overall computation and overcome the disadvantage of the element-to-element technique, a blocking technique has been used. Sun and Rishe [21] proposed a blocking technique, a MapReduce matrix-factorization
IT-DREPS Conference, Amman, Jordan Dec 6-8, 2017
technique, in order to enhance the efficiency of matrix multiplication.

TABLE VII. COLUMN BY ROW OPERATION

          Job 1: Map             Job 2: Map                                   Job 2: Reduce
Input     input files            ‹a(i,k), b(k,j)›, …                          ‹key, [a(i,1)*b(1,j)] … [a(i,m)*b(m,j)]›
Output    ‹a(i,k), b(k,j)›, …    ‹key, [a(i,1)*b(1,j)] … [a(i,m)*b(m,j)]›     ‹key, [a(i,1)*b(1,j)] + ⋯ + [a(i,m)*b(m,j)]›

n: number of rows of the first matrix; m: number of columns of the first matrix (= number of rows of the second matrix); q: number of columns of the second matrix.

In this technique, two jobs are also used to complete the multiplication process. The technique decomposes the first matrix into row vectors and the second matrix into column vectors, as shown in Fig. 7. Using this technique, the communication overhead and memory utilization are decreased, while the computation performed by each map task is increased. The operations in each job of MapReduce are presented in Table V.
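The row-vector/column-vector decomposition described above can be sketched as follows. As with the earlier sketch, this is a single-process simulation under the assumption that each mapper receives one whole row of the first matrix and one whole column of the second; the names are hypothetical, not taken from the reviewed work.

```python
def blocked_matmul(A, B):
    # The first matrix is decomposed into row vectors and the second
    # into column vectors; each "mapper" receives one row and one
    # column and computes a full dot product, emitting one value per
    # output cell instead of m separate element pairs -- less
    # communication, more computation per map, as the review notes.
    n, m, q = len(A), len(A[0]), len(B[0])
    mapped = []
    for i in range(n):                         # row vector A[i]
        for j in range(q):
            col = [B[k][j] for k in range(m)]  # column vector of B
            mapped.append(((i, j), sum(a * b for a, b in zip(A[i], col))))
    # The reduce step only collects the finished cells into the result.
    cells = dict(mapped)
    return [[cells[(i, j)] for j in range(q)] for i in range(n)]
```

Compared with the element-to-element sketch, the map output here contains n*q records instead of n*m*q, which is where the reduced communication overhead comes from.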