Lecture Notes in Computer Science 8302: Editorial Board
Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Germany
Madhu Sudan
Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbruecken, Germany
Vasudha Bhatnagar, Srinath Srinivasa (Eds.)
Volume Editors
Vasudha Bhatnagar
South Asian University
Department of Computer Science
Akbar Bhavan, Chanakyapuri
New Delhi, India
E-mail: [email protected]
Srinath Srinivasa
International Institute of Information Technology
Bangalore, India
E-mail: [email protected]
CR Subject Classification (1998): H.3, H.2.8, H.2, I.2, H.4, I.5, F.2, G.2, H.5
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection
with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and
executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication
or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location,
in its current version, and permission for use must always be obtained from Springer. Permissions for use
may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution
under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication,
neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or
omissions that may be made. The publisher makes no warranty, express or implied, with respect to the
material contained herein.
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The contemporary digital world comprises text, images, videos, and multiple forms
of semi-structured data that are interlinked and interrelated in complex
networks. Pervasive in both commercial and scientific domains, these data present
innumerable opportunities for discovering patterns, accompanied by challenges
of matching magnitude. The data deluge has fueled the creativity of data-curious
researchers and has led to the rapid emergence of new technologies in data
analytics. The major impetus has come from the variety, variability, and veracity
of data in addition to their ever-growing volume (the four V’s of Big Data).
The second edition of the International Conference on Big Data Analytics
(BDA 2013) was held during December 16–18, 2013, in Mysore, India. It brought
together researchers, practitioners, and policy makers to share their work and
experiences in developing methods and algorithms for big data analytics.
The conference attracted 49 submissions in all, of which 45 were submitted for
the research track and four for the industry track. All submitted papers were
subjected to a plagiarism check before review. Each paper was reviewed by at least
three reviewers and the review comments were communicated to the authors.
The review process resulted in the acceptance of nine regular papers and one
short paper for the research track. One industry paper was accepted, leading to
an overall acceptance rate of 22%. This volume includes the accepted papers,
tutorials, and invited papers presented during the conference.
The section “Mining Social Media Data” comprises four papers. The tutorial
article by Mehta and Subramaniam focuses on methods for entity analytics
and integration using large Twitter data sets. They review the state of the art, and
present new ideas on handling common research problems such as event detection
from social media, summarization, location inference and fusing external data
sources with social data. Sureka and Agrawal address the problem of detecting
copyright infringement of music videos on YouTube. They propose an algorithm
that mines both video and uploader meta-data, and uses a rule-based classifier
to predict the category (original or copyright-violated) of the video. Jain et
al. investigate user behavior based on the temporal dimension of tweets and
relate it to the evolution of topics on the Twitter online social network (OSN).
Using a novel metric called “tweet strength,” they identify topics along with
the users driving their evolution. Bhargava et al. study authorship attribution
of tweets for forensic purposes. The proposed method extracts stylometric
information from the collected data set and uses classification algorithms to
predict authors.
The section “Perspectives on Big Data Analytics,” which comprises three
papers, opens with a tutorial paper by Lakshminarayan. The paper presents
some fundamental methods of dimensionality reduction and elaborates on the
main algorithms. The author also points to next-generation methods that seek to
identify structures within high-dimensional data that are not captured by
second-order statistics. The invited paper by Mondal discusses the role of crowd-driven data
collection in big data analytics and opportunities presented by such collections.
Kiran addresses the intermittence problem in large transaction databases. The
paper introduces quasi-periodic-frequent patterns, which provide useful
information and are immune to the intermittence problem.
The section “Graph Analytics” consists of papers related to mining of large
graphs. Das and Chakravarthy present a survey of graph algorithms and identify
the challenges of adapting or extending algorithms for the analysis of large graphs
using the Map-Reduce programming model. Tripathy et al. study the characteristics
of complex networks in the game of cricket, where dyadic relationships exist
among a group of players. Properties such as average degree, average strength,
and average clustering coefficient are found to be directly related to the
performance of the teams. Parveen and Nair propose techniques for effective and
efficient visualization of small-world networks in a similarity space. They also
present an algorithm for the visual assessment of cluster tendency that yields an
efficient hierarchical graphical representation of large networks.
The section “Practice of Big Data Analytics” consists of three papers describ-
ing practical applications. Elisabeth et al. present a tourist recommender system
using GPS data collected from rental tourist cars. Misra et al. present a case
study demonstrating the performance advantage of Hadoop-based ETL tools
over traditional ETL tools. Lakshminarayan and Baron investigate and report on
the application of big data analytics in the manufacturing of integrated circuits.
We gratefully acknowledge the support extended by the University of Delhi
and the University of Aizu. We are grateful to the MYRA School of Business,
Mysore, for organizing the conference and for its hospitality. Thanks are
also due to our sponsors: eBay and IBM India Research Lab. We also thank
all the Program Committee members and external reviewers for their time and
diligent reviews. We thank Ramesh Agrawal and his team for performing the
plagiarism check on submissions. The Organization Committee and student
volunteers of BDA 2013 deserve special mention for their support. Special thanks
go to the members of the Steering Committee. Finally, thanks to EasyChair for
making our task of generating this volume smooth and simple.
Steering Committee
R.K. Arora IIT Delhi, Delhi, India
Subhash Bhalla University of Aizu, Japan
Sharma Chakravarthy University of Texas at Arlington, USA
Rattan Datta Indian Meteorological Department, Delhi, India
S.K. Gupta IIT Delhi, India (Chair)
H.V. Jagadish University of Michigan, USA
D. Janakiram IIT Madras, India
N. Vijayaditya Government of India
Executive Committee
General Chair
D. Janakiram IIT Madras, India
Program Co-chairs
Srinath Srinivasa IIIT Bangalore, India
Vasudha Bhatnagar University of Delhi, India
Organizing Chair
Shalini Urs ISiM, Mysore, India
Publicity Chair
Vikram Goyal IIIT, Delhi, India
Proceedings Chairs
Subhash Bhalla University of Aizu, Japan
Naveen Kumar University of Delhi, India
Industry Chair
Vijay Srinivas Agneeswaran Impetus Labs, India
Tutorials Chair
Jaideep Srivastava University of Minnesota, USA
Program Committee
Vijay Srinivas Agneeswaran Impetus Labs, Bangalore, India
Ramesh Agrawal Jawaharlal Nehru University, New Delhi, India
Avishek Anand Max Planck Institute, Germany
Amitabha Bagchi Indian Institute of Technology, Delhi, India
Srikanta Bedathur Indraprastha Institute of Information Technology (IIIT), Delhi, India
Subhash Bhalla University of Aizu, Japan
Raj Bhatnagar University of Cincinnati, USA
Arnab Bhattacharya Indian Institute of Technology, Kanpur, India
Indrajit Bhattacharya IBM Research, India
Gao Cong Nanyang Technological University, Singapore
Prasad Deshpande IBM Research, India
Lipika Dey TCS Innovation Labs, Delhi, India
Dejing Dou University of Oregon, USA
Haimonti Dutta Columbia University, USA
Shady Elbassuoni American University, Beirut, Lebanon
Rajeev Gupta IBM Research, India
Sharanjit Kaur University of Delhi, India
Akhil Kumar Penn State University, USA
Naveen Kumar University of Delhi, India
Choudur Lakshminarayan Hewlett-Packard Laboratories, USA
Ulf Leser Institut für Informatik, Humboldt-Universität zu Berlin, Germany
Ravi Madipadaga Carl Zeiss, India
Sameep Mehta IBM Research, India
Mukesh Mohania IBM Research, India
Yasuhiko Morimoto Hiroshima University, Japan
Joydeb Mukherjee Impetus Labs, India
Saikat Mukherjee Siemens, India
Mandar Mutalikdesai Siemens, India
Felix Naumann Hasso-Plattner-Institut, Potsdam, Germany
Hariprasad Nellitheertha Intel, India
Anjaneyulu Pasala Infosys Labs, India
Additional Reviewers
Adhikari, Animesh
Agarwal, Manoj
Burgoon, Erin
Correa, Denzil
Gupta, Shikha
Jog, Chinmay
Kulkarni, Sumant
Lal, Sangeeta
P, Deepak
Prateek, Satya
Puri, Charu
Rachakonda, Aditya
Ranu, Sayan
Ravindra, Padmashree
Sreevalsan-Nair, Jaya
Telang, Aditya
Table of Contents
Graph Analytics
Challenges and Approaches for Large Graph Analysis Using Map/Reduce Paradigm . . . 116
Soumyava Das and Sharma Chakravarthy
Complex Network Characteristics and Team Performance in the Game of Cricket . . . 133
Rudra M. Tripathy, Amitabha Bagchi, and Mona Jain
Visualization of Small World Networks Using Similarity Matrices . . . 151
Saima Parveen and Jaya Sreevalsan-Nair