C. S. R. Prabhu · Aneesh Sreevallabh Chivukula · Aditya Mogadala · Rohit Ghosh · L. M. Jenila Livingston

Big Data Analytics: Systems, Algorithms, Applications
C. S. R. Prabhu
National Informatics Centre
New Delhi, Delhi, India

Aneesh Sreevallabh Chivukula
Advanced Analytics Institute
University of Technology, Sydney
Ultimo, NSW, Australia

Aditya Mogadala
Saarland University
Saarbrücken, Saarland, Germany

Rohit Ghosh
Qure.ai
Goregaon East, Mumbai, Maharashtra, India

L. M. Jenila Livingston
School of Computing Science and Engineering
Vellore Institute of Technology
Chennai, Tamil Nadu, India

ISBN 978-981-15-0093-0
ISBN 978-981-15-0094-7 (eBook)

https://doi.org/10.1007/978-981-15-0094-7
© Springer Nature Singapore Pte Ltd. 2019
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Foreword

The Big Data phenomenon has emerged globally as the next wave of technology, one that will have a major influence on, and contribute to, a better quality of life in all its aspects. The advent of the Internet of Things (IoT) and its associated Fog Computing paradigm only accentuates and amplifies the Big Data phenomenon.
This book by C. S. R. Prabhu and his co-authors comes at the right time. It fills a timely need for a comprehensive text covering all dimensions of Big Data Analytics: systems, algorithms, applications and case studies, along with emerging research horizons. In each of these dimensions, the book presents a comprehensive picture to the reader in a lucid and appealing manner. It can be used effectively by undergraduate and postgraduate students in IT, computer science and management disciplines, as well as by research scholars in these areas. It will also help IT professionals and practitioners who need to learn and understand the subject of Big Data Analytics.
I wish this book every success with the global student community as well as with professionals.

Dr. Rajkumar Buyya

Redmond Barry Distinguished
Professor, Director, Cloud Computing
and Distributed Systems (CLOUDS)
Lab, School of Computing and
Information Systems, The University
of Melbourne, Melbourne, Australia

Preface

The present-day Information Age has produced an overwhelming deluge of digital data, popularly known as Big Data, arriving from diverse and often unstructured sources such as online transactions, mobile phones, social networks and emails. In addition, with the advent of Internet of Things (IoT) devices and sensors, the volume of data flowing into the Big Data scenario has multiplied manyfold. This Internet-scale computing has also created the need to analyze and make sense of the accompanying data deluge, so that intelligent decisions and real-time actions can be taken on the basis of real-time analytics techniques.
The Big Data phenomenon has been impacting all sectors of business and industry, giving rise to a new information ecosystem. The term ‘Big Data’ refers not only to the massive volume and variety of the data itself, but also to the set of technologies surrounding it for capturing, storing, retrieving, managing, processing and analyzing that data. The purpose is to solve complex problems in life and in society by unlocking the value in the data more economically. In this book, we provide a comprehensive survey of the origin, nature, scope, structure, composition and ecosystem of big data, with reference to technologies such as Hadoop, Spark and R and their applications. Other essential big data concepts are also covered, including NoSQL databases for storage, machine learning paradigms for computing and analytics models connecting the algorithms. This book also surveys emerging research trends in large-scale pattern recognition, programming processes for data mining and ubiquitous computing, and application domains for commercial products and services. Further, this book describes in detail the applications of Big Data Analytics in the technological domains of the Internet of Things (IoT), Fog Computing and Social Semantic Web mining, and then in the business domains of banking and finance, insurance and capital markets, before turning to the security and privacy issues associated with Big Data Analytics. At the end of each chapter, review questions are provided to test comprehension of the chapter contents.
This book also describes the data engineering and data mining life cycles involved in machine learning paradigms for unstructured and structured data. The relevant developments in big data stacks are discussed with a focus on open-source technologies. We also discuss the algorithms and models used in data mining tasks such as search, filtering, association, clustering, classification, regression, forecasting, optimization, validation and visualization. These techniques are applicable to various categories of content generated in data streams, sequences, graphs and multimedia in transactional, in-memory and analytic databases. Big Data Analytics techniques comprising descriptive and predictive analytics are covered, with an emphasis on feature engineering and model fitting. For feature engineering, we cover feature construction, selection and extraction, along with preprocessing and post-processing techniques. For model fitting, we discuss model evaluation techniques such as statistical significance tests, cross-validation curves, learning curves, sufficient statistics and sensitivity analyses. Finally, we present the latest developments and innovations in generative and discriminative learning for large-scale pattern recognition. These techniques comprise incremental and online learning for linear/nonlinear and convex/multi-objective optimization models, feature learning or deep learning, evolutionary learning for scalability, and optimization meta-heuristics.
Machine learning algorithms for big data cover broad areas of learning such as supervised, unsupervised, semi-supervised and reinforcement learning. In particular, the supervised learning subsection details several classification and regression techniques used to classify and forecast, while the unsupervised learning techniques cover clustering approaches based on linear algebra fundamentals. Similarly, the semi-supervised methods presented in the chapter cover approaches that help scale to big data by learning from largely unannotated information. We also present reinforcement learning approaches that aim to perform collective learning and support distributed scenarios.
An additional unique feature of this book is a set of about 15 real-life case studies drawn from the above-mentioned application domains. The case studies briefly describe experiences of deploying and applying Big Data Analytics techniques in diverse contexts in private and public sector enterprises. They span product companies such as Google, Facebook and Microsoft, data science platforms such as Kaggle, and application domains such as the power utility sector (Opower) and banking and finance (Deutsche Bank). They help readers understand how analytical techniques have been deployed successfully to maximize a company's functional effectiveness, business diversity and customer relationship management, in addition to improving its financial returns. All these companies handle real-life Big Data ecosystems in their respective businesses to achieve tangible results and benefits. For example, Google not only profits from the big data generated by its huge user base, with billions of web searches and emails, by offering customized advertising services, but also offers other companies cloud platforms on which to store and analyze big datasets. Google has also developed an IoT sensor-based autonomous car that uses real-time analytics for driverless navigation. Facebook, the largest social network in the world, has deployed big data techniques for personalized search and advertising. LinkedIn likewise deploys big data techniques for effective service delivery.
Microsoft also aspires to enter the big data business by offering Big Data Analytics services to business enterprises through its Azure cloud services. Nokia applies Big Data Analytics to the huge buyer and subscriber base of its mobile phones, including the mobility of those buyers and subscribers. Opower, a company providing energy-efficiency software to power utilities, has applied Big Data Analytics techniques to utility customer data to achieve substantial power savings. Deutsche Bank has deployed big data techniques to achieve substantial savings and better customer relationship management (CRM). Delta Air Lines improved its revenues and CRM by deploying Big Data Analytics techniques. Traffic management in a Chinese city was also improved successfully by adopting big data methods.
Thus, this book provides a comprehensive survey of techniques and technologies in Big Data Analytics. It can serve as a basic textbook introducing niche technologies to undergraduate and postgraduate computer science students. It can also serve as a reference for professionals interested in pursuing leadership-level careers in data and decision sciences, by focusing on problem-solving concepts and on solutions for competitive intelligence. To the best of our knowledge, although big data applications are discussed in a plethora of books, no textbook covers a similar mix of technical topics. For further clarification, we provide references to white papers and research papers on specific topics.

C. S. R. Prabhu, New Delhi, India
Aneesh Sreevallabh Chivukula, Ultimo, Australia
Aditya Mogadala, Saarbrücken, Germany
Rohit Ghosh, Mumbai, India
L. M. Jenila Livingston, Chennai, India
Acknowledgements

The authors humbly acknowledge the contributions of the following individuals toward the successful completion of this book:
Mr. P. V. N. Balaram Murthy, Ms. J. Jyothi, Mr. B. Rajgopal, Dr. G. Rekha, Dr. V. G. Prasuna, Dr. P. S. Geetha and Dr. J. V. Srinivasa Murthy, all of KMIT, Hyderabad; Dr. Charles Savage of Munich, Germany; Ms. Rachna Sehgal of New Delhi; Dr. P. Radhakrishna of NIT, Warangal; Mr. Madhu Reddy of Hyderabad; Mr. Rajesh Thomas of New Delhi; and Mr. S. Balakrishna of Pondicherry, for their support and assistance at the various stages of developing the manuscript of this book.
The authors thank the managements of the following institutions for supporting the authors:
1. KMIT, Hyderabad
2. KL University, Guntur
3. VIT, Chennai
4. Advanced Analytics Institute, University of Technology, Sydney (475), Sydney, Australia

About This Book

Big Data Analytics is an Internet-scale, commercial, high-performance parallel computing paradigm for data analytics.
This book is a comprehensive textbook covering the multifarious dimensions and perspectives of Big Data Analytics: the platforms, systems, algorithms and applications, including case studies.
This book presents data-derived technologies, systems and algorithmics in the areas of machine learning, as applied to Big Data Analytics.
Through case studies, this book briefly covers the analytical techniques useful for processing data-driven workflows in various industries such as health care, travel and transportation, manufacturing, energy, utilities, telecom, banking and insurance, in addition to the IT sector itself.
As discussed in various chapters, the Big Data-driven computational systems described in this book have found applications in industry areas such as IoT, social networks, banking and financial services, insurance, capital markets, bioinformatics, advertising and recommender systems. Future research directions are also indicated.
This book will be useful for both undergraduate and graduate courses in computer science in the area of Big Data Analytics.

Contents

1 Big Data Analytics
C. S. R. Prabhu
1.1 Introduction
1.2 What Is Big Data?
1.3 Disruptive Change and Paradigm Shift in the Business Meaning of Big Data
1.4 Hadoop
1.5 Silos
1.5.1 Big Bang of Big Data
1.5.2 Possibilities
1.5.3 Future
1.5.4 Parallel Processing for Problem Solving
1.5.5 Why Hadoop?
1.5.6 Hadoop and HDFS
1.5.7 Hadoop Versions 1.0 and 2.0
1.5.8 Hadoop 2.0
1.6 HDFS Overview
1.6.1 MapReduce Framework
1.6.2 Job Tracker and Task Tracker
1.6.3 YARN
1.7 Hadoop Ecosystem
1.7.1 Cloud-Based Hadoop Solutions
1.7.2 Spark and Data Stream Processing
1.8 Decision Making and Data Analysis in the Context of Big Data Environment
1.8.1 Present-Day Data Analytics Techniques
1.9 Machine Learning Algorithms
1.10 Evolutionary Computing (EC)
1.11 Conclusion
1.12 Review Questions
References and Bibliography
2 Intelligent Systems
Aneesh Sreevallabh Chivukula
2.1 Introduction
2.1.1 Open-Source Data Science
2.1.2 Machine Intelligence and Computational Intelligence
2.1.3 Data Engineering and Data Sciences
2.2 Big Data Computing
2.2.1 Distributed Systems and Database Systems
2.2.2 Data Stream Systems and Stream Mining
2.2.3 Ubiquitous Computing Infrastructures
2.3 Conclusion
2.4 Review Questions
References
3 Analytics Models for Data Science
L. M. Jenila Livingston
3.1 Introduction
3.2 Data Models
3.2.1 Data Products
3.2.2 Data Munging
3.2.3 Descriptive Analytics
3.2.4 Predictive Analytics
3.2.5 Data Science
3.2.6 Network Science
3.3 Computing Models
3.3.1 Data Structures for Big Data
3.3.2 Feature Engineering for Structured Data
3.3.3 Computational Algorithm
3.3.4 Programming Models
3.3.5 Parallel Programming
3.3.6 Functional Programming
3.3.7 Distributed Programming
3.4 Conclusion
3.5 Review Questions
References
4 Big Data Tools—Hadoop Ecosystem, Spark and NoSQL Databases
C. S. R. Prabhu
4.1 Introduction
4.1.1 Hadoop Ecosystem
4.1.2 HDFS Commands
4.2 MapReduce
4.3 Pig
4.4 Flume
4.5 Sqoop
4.6 Mahout, The Machine Learning Platform from Apache
4.7 GANGLIA, The Monitoring Tool
4.8 Kafka, The Stream Processing Platform
4.9 Spark
4.10 NoSQL Databases
4.11 Conclusion
References
5 Predictive Modeling for Unstructured Data
Aditya Mogadala
5.1 Introduction
5.2 Applications of Predictive Modeling
5.2.1 Natural Language Processing
5.2.2 Computer Vision
5.2.3 Information Retrieval
5.2.4 Speech Recognition
5.3 Feature Engineering
5.3.1 Feature Extraction and Weighing
5.3.2 Feature Selection
5.4 Pattern Mining for Predictive Modeling
5.4.1 Probabilistic Graphical Models
5.4.2 Deep Learning
5.4.3 Convolutional Neural Networks (CNN)
5.4.4 Recurrent Neural Networks (RNNs)
5.4.5 Deep Boltzmann Machines (DBM)
5.4.6 Autoencoders
5.5 Conclusion
5.6 Review Questions
References
6 Machine Learning Algorithms for Big Data
Aditya Mogadala
6.1 Introduction
6.2 Generative Versus Discriminative Algorithms
6.3 Supervised Learning for Big Data
6.3.1 Decision Trees
6.3.2 Logistic Regression
6.3.3 Regression and Forecasting
6.3.4 Supervised Neural Networks
6.3.5 Support Vector Machines
6.4 Unsupervised Learning for Big Data
6.4.1 Spectral Clustering
6.4.2 Principal Component Analysis (PCA)
6.4.3 Latent Dirichlet Allocation (LDA)
6.4.4 Matrix Factorization
6.4.5 Manifold Learning
6.5 Semi-supervised Learning for Big Data
6.5.1 Co-training
6.5.2 Label Propagation
6.5.3 Multiview Learning
6.6 Reinforcement Learning Basics for Big Data
6.6.1 Markov Decision Process
6.6.2 Planning
6.6.3 Reinforcement Learning in Practice
6.7 Online Learning for Big Data
6.8 Conclusion
6.9 Review Questions
References
7 Social Semantic Web Mining and Big Data Analytics
C. S. R. Prabhu
7.1 Introduction
7.2 What Is Semantic Web?
7.3 Knowledge Representation Techniques and Platforms in Semantic Web
7.4 Web Ontology Language (OWL)
7.5 Object Knowledge Model (OKM)
7.6 Architecture of Semantic Web and the Semantic Web Road Map
7.7 Social Semantic Web Mining
7.8 Conceptual Networks and Folksonomies or Folk Taxonomies of Concepts/Subconcepts
7.9 SNA and ABM
7.10 e-Social Science
7.11 Opinion Mining and Sentiment Analysis
7.12 Semantic Wikis
7.13 Research Issues and Challenges for Future
7.14 Review Questions
References
8 Internet of Things (IOT) and Big Data Analytics
C. S. R. Prabhu
8.1 Introduction
8.2 Smart Cities and IOT
8.3 Stages of IOT and Stakeholders
8.3.1 Stages of IOT
8.3.2 Stakeholders
8.3.3 Practical Downscaling
8.4 Analytics
8.4.1 Analytics from the Edge to Cloud
8.4.2 Security and Privacy Issues and Challenges in Internet of Things (IOT)
8.5 Access
8.6 Cost Reduction
8.7 Opportunities and Business Model
8.8 Content and Semantics
8.9 Data-Based Business Models Coming Out of IOT
8.10 Future of IOT
8.10.1 Technology Drivers
8.10.2 Future Possibilities
8.10.3 Challenges and Concerns
8.11 Big Data Analytics and IOT
8.11.1 Infrastructure for Integration of Big Data with IOT
8.12 Fog Computing
8.12.1 Fog Data Analytics
8.12.2 Fog Security and Privacy
8.13 Research Trends
8.14 Conclusion
8.15 Review Questions
References
9 Big Data Analytics for Financial Services and Banking
C. S. R. Prabhu
9.1 Introduction
9.2 Customer Insights and Marketing Analysis
9.3 Sentiment Analysis for Consolidating Customer Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
9.4 Predictive Analytics for Capitalizing on Customer Insights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
9.5 Model Building . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
9.6 Fraud Detection and Risk Management . . . . . . . . . . . . . . . . . 252
9.7 Integration of Big Data Analytics into Operations . . . . . . . . . . 253
9.8 How Banks Can Benefit from Big Data Analytics? . . . . . . . . . 253
9.9 Best Practices of Data Analytics in Banking for Crises Redressal and Management . . . . . . . . . . . . . . . . . . . . . . . . . . 253
9.10 Bottlenecks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
9.11 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
9.12 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
10 Big Data Analytics Techniques in Capital Market Use Cases . . . . . 257
C. S. R. Prabhu
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
10.2 Capital Market Use Cases of Big Data Technologies . . . . . . . . 258
10.2.1 Algorithmic Trading . . . . . . . . . . . . . . . . . . . . . . . . . 258
10.2.2 Investors’ Faster Access to Securities . . . . . . . . . . . . . 259
10.3 Prediction Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
10.3.1 Stock Market Prediction . . . . . . . . . . . . . . . . . . . . . . 259
10.3.2 Efficient Market Hypothesis (EMH) . . . . . . . . . . . . . . 260
10.3.3 Random Walk Theory (RWT) . . . . . . . . . . . . . . . . . . 260
10.3.4 Trading Philosophies . . . . . . . . . . . . . . . . . . . . . . . . 260
10.3.5 Simulation Techniques . . . . . . . . . . . . . . . . . . . . . . . 261
10.4 Research Experiments to Determine Threshold Time for Determining Predictability . . . . . . . . . . . . . . . . . . . . . . . . 261
10.5 Experimental Analysis Using Bag of Words and Support Vector Machine (SVM) Application to News Articles . . . . . . . 262
10.6 Textual Representation and Analysis of News Articles . . . . . . 262
10.7 Named Entities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
10.8 Object Knowledge Model (OKM) . . . . . . . . . . . . . . . . . . . . . 263
10.9 Application of Machine Learning Algorithms . . . . . . . . . . . . . 263
10.10 Sources of Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
10.11 Summary and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . 264
10.12 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
10.13 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
11 Big Data Analytics for Insurance . . . . . . . . . . . . . . . . . . . . . . . . . . 267
C. S. R. Prabhu
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
11.2 The Insurance Business Scenario . . . . . . . . . . . . . . . . . . . . . . 268
11.3 Big Data Deployment in Insurance . . . . . . . . . . . . . . . . . . . . . 268
11.4 Insurance Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
11.5 Customer Needs Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
11.6 Other Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
11.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
11.8 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
12 Big Data Analytics in Advertising . . . . . . . . . . . . . . . . . . . . . . . . . . 271
C. S. R. Prabhu
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
12.2 What Role Can Big Data Analytics Play in Advertising? . . . . . 272
12.3 BOTs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
12.4 Predictive Analytics in Advertising . . . . . . . . . . . . . . . . . . . . 272
12.5 Big Data for Big Ideas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
12.6 Innovation in Big Data—Netflix . . . . . . . . . . . . . . . . . . . . . . 273
12.7 Future Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
12.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
12.9 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
13 Big Data Analytics in Bio-informatics . . . . . . . . . . . . . . . . . . . . . . . 275
C. S. R. Prabhu
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
13.2 Characteristics of Problems in Bio-informatics . . . . . . . . . . . . 276
13.3 Cloud Computing in Bio-informatics . . . . . . . . . . . . . . . . . . . 276
13.4 Types of Data in Bio-informatics . . . . . . . . . . . . . . . . . . . . . . 276
13.5 Big Data Analytics and Bio-informatics . . . . . . . . . . . . . . . . . 279
13.6 Open Problems in Big Data Analytics in Bio-informatics . . . . 279
13.7 Big Data Tools for Bio-informatics . . . . . . . . . . . . . . . . . . . . 282
13.8 Analysis on the Readiness of Machine Learning Techniques for Bio-informatics Application . . . . . . . . . . . . . . 282
13.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
13.10 Questions and Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
14 Big Data Analytics and Recommender Systems . . . . . . . . . . . . . . . 287
Rohit Ghosh
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
14.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
14.3 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
14.3.1 Basic Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
14.3.2 Content-Based Recommender Systems . . . . . . . . . . . . 291
14.3.3 Unsupervised Approaches . . . . . . . . . . . . . . . . . . . . . 291
14.3.4 Supervised Approaches . . . . . . . . . . . . . . . . . . . . . . . 291
14.3.5 Collaborative Filtering . . . . . . . . . . . . . . . . . . . . . . . 292
14.4 Evaluation of Recommenders . . . . . . . . . . . . . . . . . . . . . . . . . 294
14.5 Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
14.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
14.7 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
15 Security in Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
C. S. R. Prabhu
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
15.2 Ills of Social Networking—Identity Theft . . . . . . . . . . . . . . . . 302
15.3 Organizational Big Data Security . . . . . . . . . . . . . . . . . . . . . . 302
15.4 Security in Hadoop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
15.5 Issues and Challenges in Big Data Security . . . . . . . . . . . . . . 303
15.6 Encryption for Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
15.7 Secure MapReduce and Log Management . . . . . . . . . . . . . . . 304
15.8 Access Control, Differential Privacy and Third-Party Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
15.9 Real-Time Access Control . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
15.10 Security Best Practices for Non-relational or NoSQL Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
15.11 Challenges, Issues and New Approaches Endpoint Input, Validation and Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
15.12 Research Overview and New Approaches for Security Issues in Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
15.13 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
15.14 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
16 Privacy and Big Data Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
C. S. R. Prabhu
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
16.2 Privacy Protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
16.3 Enterprise Big Data Privacy Policy and COBIT 5 . . . . . . . . . . 312
16.4 Assurance and Governance . . . . . . . . . . . . . . . . . . . . . . . . . . 313
16.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
16.6 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
17 Emerging Research Trends and New Horizons . . . . . . . . . . . . . . . . 317
Aneesh Sreevallabh Chivukula
17.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
17.2 Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
17.3 Data Streams, Dynamic Network Analysis and Adversarial Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
17.4 Algorithms for Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
17.5 Dynamic Data Streams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
17.6 Dynamic Network Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 319
17.7 Outlier Detection in Time-Evolving Networks . . . . . . . . . . . . 319
17.8 Research Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
17.9 Literature Review of Research in Dynamic Networks . . . . . . . 320
17.10 Dynamic Network Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 320
17.11 Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
17.12 Validation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
17.13 Change Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
17.14 Labeled Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
17.15 Event Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
17.16 Evolutionary Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
17.17 Block Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
17.18 Surveys on Dynamic Networks . . . . . . . . . . . . . . . . . . . . . . . 326
17.19 Adversarial Learning—Secure Machine Learning . . . . . . . . . . 328
17.20 Conclusion and Future Emerging Direction . . . . . . . . . . . . . . . 329
17.21 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330

Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333

Appendices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
About the Authors

Dr. C. S. R. Prabhu has held prestigious positions with the Government of India and
various institutions. He retired as Director General of the National Informatics
Centre (NIC), Ministry of Electronics and Information Technology, Government of
India, New Delhi, and has worked with Tata Consultancy Services (TCS), CMC,
TES and TELCO (now Tata Motors). He has also served as faculty for programs of the
APO (Asian Productivity Organization). He has taught and researched at the
University of Central Florida, Orlando, USA, and had a brief stint as a
consultant to NASA. He was Chairman of the Computer Society of India (CSI),
Hyderabad Chapter. He is presently an Advisor (Honorary) at KL
University, Vijayawada, Andhra Pradesh, and Director of Research and
Innovation at Keshav Memorial Institute of Technology (KMIT), Hyderabad.
He received his Master’s degree in Electrical Engineering with specialization in
Computer Science from the Indian Institute of Technology, Bombay. He has guided
many Master’s and doctoral students in research areas such as Big Data.

Dr. Aneesh Sreevallabh Chivukula is currently a Research Scholar at the
Advanced Analytics Institute, University of Technology Sydney (UTS), Australia.
Previously, he chiefly worked in computational data science-driven product
development at Indian startup companies and research labs. He received his M.S.
degree from the International Institute of Information Technology (IIIT),
Hyderabad. His research interests include machine learning, data mining, pattern
recognition, big data analytics and cloud computing.

Dr. Aditya Mogadala is a postdoc in Language Science and Technology at
Saarland University. His research concentrates on the general area of
deep/representation learning applied to the integration of external real-world/
common-sense knowledge (e.g., vision and knowledge graphs) into natural language
sequence generation models. Before his postdoc, he was a PhD student and
Research Associate at the Karlsruhe Institute of Technology, Germany. He holds
B.Tech. and M.S. degrees from IIIT Hyderabad, and has worked as a Software
Engineer at IBM India Software Labs.

Mr. Rohit Ghosh currently works at Qure, Mumbai. He previously served as a
Data Scientist at ListUp and at Data Science Labs. He holds a B.Tech. from
IIT Bombay; his work involves R&D in computer vision, deep learning,
reinforcement learning (mostly related to trading strategies) and cryptocurrencies.

Dr. L. M. Jenila Livingston is an Associate Professor with the CSE Dept at VIT,
Chennai. Her teaching foci and research interests include artificial intelligence, soft
computing, and analytics.
Chapter 1
Big Data Analytics

1.1 Introduction

The latest disruptive trends and developments in the digital age comprise social
networking, mobility, analytics and cloud, popularly known as SMAC. The year 2016 saw
Big Data technologies being leveraged to power business intelligence applications.
What do 2020 and the years beyond hold in store?
Big Data for governance and for competitive advantage is going to get a big
push in 2020 and beyond, and enterprises will have to balance the tug of war between
governance and data value. Enterprises will put to use the enormous data, or Big Data,
they already hold about their customers, employees, partners and other stakeholders,
deploying it for both regulatory use cases and non-regulatory use cases of value to
business management and business development. Regulatory use cases require
governance, data quality and lineage, so that a regulatory body can analyze the data
and track it back to its source through all of its transformations. Non-regulatory
uses of data, on the other hand, include 360° customer monitoring and customer
services, where high cardinality, real-time processing and a mix of structured,
semi-structured and unstructured data produce more effective results.
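To make the idea of lineage concrete, the following is a minimal sketch in Python. It assumes a hypothetical TrackedData wrapper (not part of any particular governance product) that records where a data set came from and the name of every transformation applied to it, so that a final figure can be traced back to its source. The file name branch_ledger.csv and the step names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TrackedData:
    """A value plus its source and the ordered names of the steps applied to it."""
    value: object
    source: str
    lineage: List[str] = field(default_factory=list)

    def apply(self, step_name: str, fn: Callable[[object], object]) -> "TrackedData":
        # Return a new object so that earlier versions remain available for audit.
        return TrackedData(fn(self.value), self.source, self.lineage + [step_name])

    def trace(self) -> str:
        # Walk from the source through every recorded transformation.
        return " -> ".join([self.source] + self.lineage)


def is_number(text: str) -> bool:
    # Simplistic check: accepts only non-negative decimals such as "120.50".
    return text.replace(".", "", 1).isdigit()


# Hypothetical example: raw ledger amounts read from a source file.
raw = TrackedData([" 120.50", "99.5 ", "n/a"], source="branch_ledger.csv")
cleaned = raw.apply("strip_whitespace", lambda xs: [x.strip() for x in xs])
numeric = cleaned.apply("drop_non_numeric", lambda xs: [float(x) for x in xs if is_number(x)])
total = numeric.apply("sum", sum)

print(total.value)    # 220.0
print(total.trace())  # branch_ledger.csv -> strip_whitespace -> drop_non_numeric -> sum
```

Because each step returns a new object instead of overwriting the old one, an auditor can inspect any intermediate version as well as the final result.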
It is expected that in 2020 businesses will shift to a data-driven approach. All
businesses today require analytical and operational capabilities to serve customers,
process claims and interface with IOT devices such as sensors in real time, at a
personalized level, for each individual customer. For example, an e-commerce site can
provide individual recommendations after checking prices in real time. Similarly,
health monitoring for providing medical advice through telemedicine can be made
operational using IOT devices that monitor all of an individual’s vital health parameters.
Health insurance companies can process valid claims and stop paying fraudulent
claims by combining analytics techniques with their operational systems. Media
companies can deliver personalized content through set-top boxes. The list of such
use cases is endless. Delivering such use cases essentially requires an agile platform
that provides both analytical results and operational efficiency, making office
operations more relevant and accurate, backed by analytical reasoning. In fact, in
2020 and beyond, business organizations will go beyond just asking questions and take
great strides toward achieving both initial and long-term business value.

C. S. R. Prabhu et al., Big Data Analytics: Systems, Algorithms, Applications,
https://doi.org/10.1007/978-981-15-0094-7_1
Agility, both in data and in software, will become the business differentiator in
2020 and beyond. Instead of just maintaining large data lakes, repositories, databases
or data warehouses, enterprises will leverage data agility: the ability to understand
data in context and take intelligent decisions on business actions based on
data analytics and forecasting.
Agile processing models will enable the same instance of data to support
batch analytics, interactive analytics, global messaging, database models and all
other manifestations of data, all in full synchronization. More agile data analytics
models can be deployed when a single instance of data supports a broader set of
tools. The end outcome will be an agile development and application platform that
supports a very broad spectrum of processing and analytical models.
Block chain is a major thrust area in financial services in 2020, as it provides
a disruptive way to store and process transactions. Block chain runs on a global
network of distributed computer systems, which anyone can view and examine.
Transactions are stored in blocks such that each block refers to the previous block,
all of them time-stamped and stored in a form that hackers cannot alter unnoticed,
since the world has a complete view of all transactions in a block chain. Block chain
will speed up financial transactions significantly while providing security
and transparency to individual customers. For enterprises, block chain will result in
savings and efficiency. Block chain can also be implemented in a Big Data environment.
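The linking of blocks described above can be sketched in a few lines of Python. This is a minimal, single-machine illustration under stated assumptions: each block is a plain dictionary, linked to its predecessor by a SHA-256 hash of the predecessor's contents (computed with the standard hashlib module). It deliberately leaves out the distributed network, consensus protocol and digital signatures that a real block chain relies on.

```python
import hashlib
import json
import time
from typing import List


def block_hash(block: dict) -> str:
    """SHA-256 digest of a canonical JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(batches: List[List[str]]) -> List[dict]:
    """Create one time-stamped block per batch, each referring to the previous block."""
    chain: List[dict] = []
    prev = "0" * 64  # placeholder predecessor for the first (genesis) block
    for transactions in batches:
        block = {"timestamp": time.time(), "transactions": transactions, "prev_hash": prev}
        chain.append(block)
        prev = block_hash(block)
    return chain


def is_valid(chain: List[dict]) -> bool:
    """Each block must carry the hash of the block before it."""
    return all(
        later["prev_hash"] == block_hash(earlier)
        for earlier, later in zip(chain, chain[1:])
    )


chain = build_chain([["A pays B 10"], ["B pays C 4"], ["C pays A 1"]])
print(is_valid(chain))  # True

# Tampering with an earlier block breaks the link stored in the block after it.
chain[0]["transactions"][0] = "A pays B 1000"
print(is_valid(chain))  # False
```

In this sketch only blocks that already have a successor are protected: altering the newest block goes undetected until another block is added after it, which is one reason real systems also depend on the wider network holding copies of the chain.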
In 2020, microservices will be offered in a big way, leveraging Big Data
Analytics and machine learning on huge amounts of historical data to better
understand the context of newly arriving streaming data. Smart IOT devices
will collaborate and analyze each other’s data, using machine learning algorithms to
arrive at peer-to-peer decisions in real time.
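The text does not name a specific technique for using history to interpret streaming data. As one simple illustration, swapped in here for clarity, the Python sketch below summarizes a historical baseline by its mean and standard deviation and flags each newly arriving reading whose z-score exceeds an assumed threshold of 3. The function name flag_readings and the sensor values are hypothetical.

```python
import statistics
from typing import Iterable, Iterator, List, Tuple


def flag_readings(history: List[float], stream: Iterable[float],
                  threshold: float = 3.0) -> Iterator[Tuple[float, bool]]:
    """Yield (reading, is_unusual) for each new reading, judged against history."""
    mean = statistics.fmean(history)
    spread = statistics.stdev(history)  # needs at least two historical values
    if spread == 0:
        raise ValueError("historical data has no variation to compare against")
    for reading in stream:
        z_score = (reading - mean) / spread
        yield reading, abs(z_score) > threshold


# Hypothetical temperature history from one IOT sensor, then two new readings.
history = [20.0, 21.0, 19.5, 20.5, 20.0, 19.0, 21.5]
for reading, unusual in flag_readings(history, [20.3, 35.0]):
    print(reading, "unusual" if unusual else "normal")
```

A production service would typically refresh the baseline as new history accumulates rather than computing it once, and might use a more robust model than a single mean and standard deviation.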
There will also be a shift from post-event and real-time analytics to pre-event
analytics and action, based on real-time data from the immediate past.
Connected data applications will be ubiquitous. In 2020, modern data applications
will be highly portable, containerized and connected, quickly replacing vertically
integrated monolithic software technologies.
Productization of data will be the order of the day in 2020 and beyond. Data will
be a product, a commodity, to buy or to sell, resulting in new business models for
monetization of data.

1.2 What Is Big Data?

Supercomputing at Internet scale is popularly known as Big Data. Technologies such
as distributed computing, parallel processing, cluster computing and distributed file
systems have been integrated to take the new avatar of Big Data and data science.
Commercial supercomputing, now known as Big Data, originated at companies such
as Google, Facebook and Yahoo, which operate at Internet scale and needed to
process the data of ever-increasing numbers of users: data of very large volume,
wide variety, high velocity and questionable veracity, yet of great value.
The traditional techniques of handling and processing data were found to be
completely inadequate for the task, so new approaches and a new paradigm were
required. Building on existing technologies, the very companies that needed it evolved
the new framework of Big Data architecture. Thus was born the Internet-scale
commercial supercomputing paradigm, or Big Data.

1.3 Disruptive Change and Paradigm Shift in the Business Meaning of Big Data

This paradigm shift brought disruptive changes to organizations and vendors across
the globe, and to large social networks spanning the whole planet, in all walks of
life, with the Internet of Things (IOT) contributing in a big way to Big Data. Big Data
is not merely a trendy new fashion in computing: it is sure to transform the way
computing is performed, and it is so disruptive that its impact will be sustained
for many generations to come.
Big Data is the commercial equivalent of HPC or supercomputing (for scientific
computing), with a difference: scientific supercomputing or HPC is computation
intensive, with scientific calculations as the main focus, whereas Big Data
mainly processes very large data sets, mostly to find previously unknown patterns
of behavior in the data.
Today, Internet-scale commercial companies such as Amazon, eBay and Flipkart
use commercial supercomputing to solve their Internet-scale business problems,
even though commercial supercomputing can be harnessed for many more tasks than
simple commercial transactions, such as fraud detection, analyzing bounced checks or
tracking Facebook friends! While scientific supercomputing activity has come down
and commercial supercomputing activity has gone up, the two are reaching a state
of equilibrium. Big Data will play an important role in ‘decarbonizing’ the global
economy and will also help work toward the Sustainable Development Goals.
Industry 4.0, Agriculture or Farming 4.0, Services 4.0, Finance 4.0 and beyond
are the expected outcomes of applying IOT and Big Data Analytics techniques
together to the existing versions of these sectors (industry, agriculture or farming,
services and finance), weaving many sectors of the economy together into
one new order, World 4.0. Beyond this, the governments of China and Japan aim to
achieve World 5.0 by deploying IOT and Big Data in a big way, a situation in which
they may become ‘big brothers,’ too powerful in tracking everything and aiming to
control everything! That is where we need a scenario of Humans 8.0, who have human
values or Dharma, so as to be independent and yet have a sustainable way of life.
We shall now see how Big Data technologies based on Hadoop and Spark can
practically handle the massive amounts of data pouring in today.

1.4 Hadoop

Hadoop was the first commercial supercomputing software platform that works at
scale and is also affordable at scale. Hadoop is based on exploiting parallelism and
was originally developed at Yahoo to solve specific problems. It was soon realized to
have large-scale applicability to problems faced by Internet-scale companies
such as Facebook or Google. Originally, Yahoo used Hadoop to track all user
navigation clicks in the web search process and harness them for advertisers. This
meant processing millions of clickstream records on tens of thousands of servers
across the globe, on an Internet-scale database that was economical enough to build
and operate. No existing solution was found capable of handling this problem. Hence,
Yahoo built the entire ecosystem from scratch to handle this requirement effectively.
Thus was born Hadoop [1]. Like Linux, Hadoop is open source. Just as Linux
runs on clusters of servers, clusters of HPC servers or clouds, Hadoop
has created a Big Data ecosystem of new products, vendors, new startups and
disruptive possibilities. Although Hadoop originated in the open-source domain,
today even Microsoft’s operating systems support it.
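The parallelism that Hadoop exploits is usually expressed in the MapReduce style: a map step that runs independently on each split of the input, a shuffle that groups the intermediate results by key, and a reduce step that combines each group. The plain-Python sketch below counts clicks per page in a clickstream. It does not use Hadoop's own API; the tab-separated "user_id, url" record format is a hypothetical assumption, and the splits are processed one after another rather than on separate nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_clicks(split: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (url, 1) for every click record 'user_id<TAB>url'."""
    for line in split:
        _user, url = line.rstrip("\n").split("\t", 1)
        yield url, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce phase: sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def click_counts(splits: List[List[str]]) -> Dict[str, int]:
    # On a cluster each split would be mapped on a different node;
    # here the splits are simply processed one after another.
    mapped = (pair for split in splits for pair in map_clicks(split))
    return reduce_counts(shuffle(mapped))


splits = [["u1\t/home", "u2\t/ads"], ["u1\t/ads", "u3\t/home", "u3\t/ads"]]
print(click_counts(splits))  # {'/home': 2, '/ads': 3}
```

Because each split is mapped independently and each key is reduced independently, both phases can be spread across many machines, which is the property that lets this style of processing scale to clickstreams of the size described above.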

1.5 Silos

Traditionally, IT organizations partition expertise and responsibilities, which
constrains collaboration between and among the groups so created. At supercomputing
scale, small errors arising this way may result in huge losses of time and money.
A 1% error on, say, 300 terabytes amounts to 3 million megabytes. Fixing such bugs is
an extremely expensive exercise.
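The arithmetic behind this figure, assuming decimal (SI) units in which 1 TB = 10^6 MB, is:

```latex
0.01 \times 300\ \text{TB} = 3\ \text{TB} = 3 \times 10^{6}\ \text{MB}
```

With binary units (1 TiB = 2^20 MiB), the same 1% would be 3 × 2^20 = 3,145,728 MiB, so the order of magnitude is unchanged.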
In scientific supercomputing, small teams managed the entire environment well.
It follows that a small team with a working knowledge of the entire platform works
best. Silos become impediments in all circumstances, in both scientific and
commercial supercomputing environments. Internet-scale computing can and will work
only when it is treated as a single platform (not as silos of different functions).
A small team with complete working knowledge of the entire platform is essential.
Historically, however, since the 1980s, customers and the user community were forced
to look at computing as silos, with different vendors for hardware, operating system,
database and development platform. This led to silo-based computing. In Big Data and
Hadoop, this is replaced with a single platform, or a single system image and a single
ecosystem for all commercial supercomputing activities.
Supercomputers are Single Platforms
Originally, mainframes were single platforms. Subsequently, silos of products from
a variety of vendors came in. Now, with Big Data, we are again arriving at a
single-platform approach.
proprietors dwelling around on the skirts of the great domain and
hiring parcels of land within it. It may be right to regard the inquilini
as coloni transplanted from abroad and made residents on the
estate. But until such conclusions are more surely established it is
safer to refrain from building upon them. The general effect of this
document is to give us outlines of a system of imperial ‘peculiars,’
that is of domains on which order and security, necessary for the
successful working and continuous cultivation, were not left to the
operation of the ordinary law, but guaranteed in each case by what
we may call an imperial by-law.
(2) The inscription of Souk el Khmis[1398] deals with circumstances
between 180 and 183 A.D. The rescript of Commodus, and the appeal
to which it was the answer, are recorded in it. The imperial estate to
which it refers is called saltus Burunitanus. A single conductor
appears to have been the lessee of the whole estate, and it was
against his unlawful exactions that the coloni appealed. Through the
connivance of the responsible procurator (corruptly obtained, the
coloni hint,) this tyrant had compelled them to pay larger shares of
produce than were rightly due, and also to render services of men
and beasts beyond the amount fixed by statute. This abuse had
existed on the estate for some time, but the proceedings of the
present conductor had made it past all bearing. Evidently there had
been some resistance, but official favour had enabled him to employ
military force in suppressing it. Violence had been freely used: some
persons had been arrested and imprisoned or otherwise maltreated;
others had been severely beaten, among them even Roman citizens.
Hence the appeal. It is to be noted that the appellants in no way
dispute their liability to pay shares of produce (partes agrarias) or to
render labour-services at the usual seasons of pressure (operarum
praebitionem iugorumve). They refer to a clause in a lex Hadriana,
regulating these dues. It is against the exaction of more than this
statute allows that they venture to protest. They judiciously point
out to the emperor that such doings are injurious to the financial
interest[1399] of his treasury (in perniciem rationum tuarum), that is,
they will end by ruining the estate as a source of steady revenue.
The officials of the central department in Rome were evidently of the
same opinion, for the rescript of Commodus[1400] plainly ordered his
procuratores to follow closely the rules and policy applicable to the
domains, permitting no exactions in transgression of the standing
regulations (contra perpetuam formam). In short, he reaffirmed the
statute of Hadrian.
In this document also we hear nothing of tenants’ arrears or of
money-rents. Naturally enough, for the coloni are partiarii whose
rent is a share of produce. In connexion with such tenants the
difficulty[1401] of reliqua does not easily arise. They are labouring
peasants, who describe themselves as homines rustici tenues
manuum nostrarum operis victum tolerantes. Of course they are
posing as injured innocents. Perhaps they were: at any rate the
great officials in Rome would look kindly on humble peasants who
only asked protection in order to go on unmolested, producing the
food which it was their duty to produce,—food, by the by, of the
need of which the Roman mob was a standing reminder. Of vilici or
ordinary slaves this document says nothing, for it had no need to do
so; but the right to operae at certain seasons implies slave labour on
the head-tenant’s own farm, probably attached to the chief villa or
palatium. In a notable phrase at the end of their appeal the coloni
speak of themselves[1402] as ‘your peasants, home-bred slaves and
foster-children of your domains’ (rustici tui vernulae et alumni
saltuum tuorum). Surely this implies, not only that they are coloni
Caesaris, standing in a direct relation to the emperor whose
protection[1403] they implore against the conductores agrorum
fiscalium; but also that their connexion with the estate is an old-
established one, passing from fathers to sons, a hereditary tie which
they have at present no wish to see broken.
In this case the circumstances that led to the setting-up of the
inscription are clear enough. Evidently the appeal represented a
great effort, both in the way of organizing concerted action on the
part of the peasant farmers, and in overcoming the hindrances to its
presentation which would be created by the interested ingenuity of
those whose acts were thereby called in question. The imperial
officials in the Provinces were often secretly in league with those in
authority at Rome, and to have procured an imperial rescript in
favour of the appellants was a great triumph, perhaps a rare one.
The forma perpetua containing the regulations governing the estate
was, we learn, already posted up on a bronze tablet. It had been
disregarded: and now it was an obvious precaution to record that
the emperor had ordered those regulations to be observed in future.
How long the effect of this rescript lasted we are left to guess.
Officials changed, and reaffirmation of principles could not guarantee
permanent reform of practice. Still, the policy of the central bureau,
when not warped by corrupt influence, was consistent and clear. To
keep these imperial ‘peculiars’ on such a footing as to insure steady
returns was an undoubted need: and, after the extreme strain on
the resources of the empire imposed by the calamitous times of
Marcus, it was in the reign of Commodus a greater need than ever.
(3) The Gazr Mezuâr inscription[1404], very fragmentary and in
some points variously interpreted, belongs to the same period (181
A.D.). A few details seem sufficiently certain to be of use here. The
estate in question is imperial property, apparently one of the
domanial units revealed to us by these African documents. It seems
to record another case of appeal against unlawful exaction of
operae, probably by a conductor or conductores. It also was
successful. But it is notable that the lawful amount of operae to be
rendered by coloni on this estate was just double of that fixed in the
other cases—four at each of the seasons of pressure, twelve in all.
We can only infer that the task-scale varied on various estates for
reasons unknown to us. One fragment, if a probable
restoration[1405] is to be accepted, conveys the impression of a
despairing threat on the part of the appellants. It suggests that on
failure of redress they may be driven to return to their homes where
they can make their abode in freedom. On the face of it, this is an
assertion of freedom of movement, a valuable piece of evidence, if it
can be trusted. We may safely go so far as to note that it is at least
not inconsistent with other indications pointing to the same
conclusion. We may even remark that the suggestion of going home
in search of freedom agrees better with the notion that these coloni
were African natives than with the supposition of their Italian origin.
The Roman citizens on the Burunitan estate will not support the
latter view, for they are mentioned as exceptional. Seeck (rightly, I
think,) urges that Italy was in sore need of men and had none to
spare for populous Africa. I would add that the emigration of Italians
to the Provinces as working farmers seems to require more proof
than has yet been produced. As officials, as traders, as financiers
and petty usurers, as exploiters of other men’s labour, they
abounded in the subject countries; but, so far as I can learn, not as
labourers. Many of them no doubt held landed estates, for instance
in the southern parts of Spain and Gaul. But when we meet with
loose general expressions[1406] such as ‘The Roman is dwelling in
every land that he has conquered,’ we must not let them tempt us
into overestimating the number of Italian settlers taking an active
part in the operations of provincial agriculture.
(4) The inscription of Ain Ouassel[1407] belongs to the end of the
reign of Severus. The text is much broken, but information of no
small importance can be gathered from what remains. Severus was
himself a native of Africa, and may have taken a personal interest in
the subject of this ordinance. In point of form the document chiefly
consists of a quoted communication (sermo) from the emperor’s
procuratores[1408], one of whom, a freedman, saw to its publication
in an inscription on an ara legis divi Hadriani. A copy of the lex
Hadriana, or at least the relevant clauses thereof, was included. The
matter on which the emperor’s decision is announced was the
question of the right to occupy and cultivate rough lands (rudes agri)
[1409], which are defined as lands either simply waste or such as the
conductores have neglected to cultivate for at least ten years
preceding. These lands are included in no less than five different
saltus mentioned by proper names, and the scope of the ordinance
is wider than in the cases referred to above. It appears that, while it
may have contained some modifications or extensions of the
provisions of the lex Hadriana, its main bearing was to reaffirm and
apply the privileges granted by that statute. It is not rash to infer
that we have here evidence of a set of regulations for all or many of
the African domains, forming a part of Hadrian’s great work of
reorganization.
If the remaining words of this inscription are rightly interpreted, as
I think they are, it seems that the policy of encouraging the
cultivation of waste and derelict lands was at this time being revived
by the government. We have seen it at work in Trajan’s time,
promoted by guarantee of privileges and temporary exemption from
burdens. But the persons then encouraged to undertake the work of
reclamation were to all appearance only the coloni at the time
resident on the estate. In the case of these five saltus, the offer
seems to be made more widely, at least so far as the remaining text
may justify such conclusions. It reads like an attempt to attract
enterprising squatters of any kind from any quarter. They are offered
not merely undisturbed occupation and a heritable tenure of some
sort, but actual possessio. Now this right, which fills a whole
important chapter in Roman law, was one protected by special legal
remedies, and even on an imperial domain can hardly have been a
matter of indifference. It was quite distinct from mere possessio
naturalis[1410], which was all that the ordinary colonus enjoyed on
his own behalf. This new-type squatter is allowed the same privilege
of so many years of grace, free of rent, at the outset of his
enterprise, that we have noted above. The details are somewhat
different. For olives the free term is ten years: for fruit trees (poma,
here mentioned without reference to vines) it is seven years. It is
expressly provided that the divisio, which implies the partiary system
of tenancy, shall apply only to such poma as are actually
brought[1411] to market. This suggests that in the past attempts to
levy the quota as a proportional share of the gross crop, without
regard to the needs of the grower’s own household, had been found
to discourage reclamation. It has been pointed out that the effect of
the new policy would be to create a sort of perpetual leasehold,
similar to that known by the Greek term emphyteusis, which is found
fully established in the later empire. But the land was not all under
fruit-crops. The disposal of corn crops is regulated in a singular
clause thus. ‘Any shares of dry[1412] crops that shall be due are,
during the first five years of occupation, to be delivered to the head-
tenant within whose holding[1413] the land occupied is situate. After
the lapse of that time they are to go to the account (of the
Treasury[1414]).’ Why is the conductor to receive these partes
aridae? It is reasonably suggested that the intention was to obviate
initial obstruction on the part of the big lessee, and thus to give the
reclamation-project a fair start.
For we have no right to assume that the parcels of land thrown
open to occupation had hitherto been included[1415] in no tenancy.
The whole import of the document shews that they often belonged
to this or that area held by one or other of the big lessees. That
there was at least one conductor to each of the five saltus seems
certain. That there was only one to each, is perhaps probable, but
hardly to be gathered from the text. Now, so long as the conductor
regularly paid his fixed rent (canon) and accounted for the taxes
(tributa) due from the estate, why should the imperial authority step
in to take pieces of land (and that the poorest land) out of his direct
control? The answer to this is that the Roman law[1416] recognized
the right of a private landlord to require of his tenants that they
should not ‘let down’ the land leased to them: and proof of
neglected cultivation might operate to bar a tenant’s claim for
abatement of rent. What was the right of an ordinary landlord was
not likely to be waived by an emperor: though his domains might be
administered in fact by a special set of fiscal regulations, he claimed
a right analogous to that recognized by the ordinary law, and none
could challenge its exercise. A big lessee might often find that parts
of his holding could not be cultivated at a profit under existing
conditions. Slave labour was careless and inefficient; it was in these
times also costly, so costly that it only paid to employ it on generous
soils. The task-work of coloni did not amount to much, and it was no
doubt rendered grudgingly. He was tempted to economize in
slaves[1417] and to employ his reduced staff on the best land only.
We need not suppose that he got an abatement of his fixed rent
from the fiscal authorities: he was most unlikely to attract their
attention by making such a claim. He had made his bargain with
eyes presumably open. That he had agreed to the canon assures us
that it must have been low enough to leave him a comfortable
margin for profit. We may be fairly sure that he sat quiet and did
what seemed to pay him best.
In the remaining text of this statute there is no reference to
operae due from the new squatters, and nothing is said of coloni.
This does not seem to be due to injury of the stone. The persons for
whose benefit the statute is enacted are apparently a new or newly
recognized element[1418] in the population of these domains, not
coloni. But the rights offered to them are expressly referred to as
rights granted by the statute of Hadrian. If so, then the lex Hadriana
contemplated the establishment of a new peasant class, not coloni,
and the present statute was merely a revival of Hadrian’s scheme.
The men are eventually to pay shares of crops, and Schulten’s[1419]
view, that they are on the way to become coloni, is possible, if not
probable. When he remarks that they might find the position of
coloni a doubtful boon, we need not challenge his opinion.
(5) The inscription of Ain el Djemala[1420], a later discovery (1906)
is of special importance as belonging to the same neighbourhood as
the preceding one. It is a document of Hadrian’s time. It refers to
the same group of estates as the above, and deals with the same
matter, the right to cultivate waste or derelict parcels of land. Indeed
the connexion of the two inscriptions is so close that the parts
preserved of each can be safely used to fill gaps in the text of the
other. In a few points this inscription, the earlier in date, supplies
further detail. The most notable is that another estate, a saltus or
fundus Neronianus, is mentioned in it, and not in the later one. Thus
it would seem that it referred to six estates, a curious coincidence,
when we recall the six great African landlords made away with by
Nero. Another little addition is that waste lands are defined as
marshy or wooded. Also that the land is spoken of as fit for growing
olives, vines and corn-crops, which supplements a mutilated portion
of the Ain Ouassel stone. But in one point the difference between
the two is on the face of it difficult to reconcile. In addressing the
imperial procuratores the applicants base their request on the lex
Manciana, the benefit of which they seek to enjoy[1421] as used on
the neighbouring saltus Neronianus. Here the broken text is thought
to have contained a reference to the enhanced prosperity of that
estate owing to the concession. In any case we may fairly conclude
that the lex Manciana was well known in the district, and its
regulations regarded by the farmers as favourable to their interests.
But the reply to their petition does not refer to it as the immediate
basis of the decision given. The communication (sermo) of Hadrian’s
procurators is cited as the ground of the leave granted for cultivation
of waste lands. Yet the broken sentence at the end of the inscription
seems at least to shew that the rules of the lex Manciana were still
recognized as a standard, confirmed and perhaps incorporated, or
referred to by name, in the lex Hadriana itself. It is ingeniously
suggested that the farmers rest their case on the Manciana because
the Hadriana was as yet unknown to them; while the reply refers to
Hadrian’s statute as authority. Whether the saltus or fundus
Neronianus, on which the Mancian regulations were in force, is
another estate-unit similar to the five named both here and in the
later inscription, is a point on which I have some doubts, too little
connected with my subject for discussion here. The general scope of
the concession granted by Hadrian is the same as the later one of
Severus.
If Hadrian issued a statute or statutes regulating the terms of
occupancy on the African domains, and some attempts to evade it
were met by its reaffirmation under Commodus, it is quite natural
that neglect or evasion of it in some other respects should be met by
reaffirmation under Severus. This consideration will account for the
identity of the concessions granted in these two inscriptions. And it
agrees perfectly with the evidence of later legislation in the
Theodosian code. The normal course of events is, legislation to
protect the poorer classes of cultivators, then evasion of the law by
the selfish rich, then reenactment of evaded laws, generally with
increased penalties. That under the administrative system of the
domains much the same phenomena should occur, is only what we
might expect.

XLVIII. DISCUSSION OF THE ABOVE INSCRIPTIONS.
In reviewing the state of things revealed to us by these
inscriptions we must carefully bear in mind that they relate solely to
the Province Africa. Conditions there were in many ways exceptional.
When Rome took over this territory after the destruction of Carthage
in 146 B.C., it was probably a country divided for the most part into
great estates worked on the Carthaginian system by slave labour.
Gradually the land came more and more into the hands of Roman
capitalists, to whose opulence Horace refers. Pliny tells us that in
Nero’s time six[1422] great landlords possessed half the entire area
of the Province, when that emperor found a pretext for putting them
to death and confiscating their estates. Henceforth the ruling
emperor was the predominating landlord[1423] in a Province of
immense importance, in particular as a chief granary of Rome. We
are not to suppose that any change in the system of large units was
ever contemplated. Punic traditions, probably based on experience,
favoured the system; though the Punic language, still spoken, seems
to have been chiefly confined to the seaboard districts. What the
change of lordship effected was not only to the financial advantage
of the imperial treasury: it also put an end to the creation of what
were a sort of little principalities that might some day cause serious
trouble. At this point we are tempted to wonder whether the great
landlords, before the sweeping measure of Nero, had taken any
steps towards introducing a new organization in the management of
their estates. Trajan’s statute refers to a lex Manciana and adopts a
number of its regulations. These regulations clearly contemplate a
system of head-tenants and sub-tenants, of whom the latter seem to
be actual working farmers living of the labour of their own hands, as
those who some 65 years later described themselves in appealing to
Commodus. The former have stewards in charge of the cultivation of
the ‘manor farms’ attached to the principal farmsteads, and evidently
employ gangs of slaves: but at special seasons have a right to a
limited amount[1424] of task-labour from the free sub-tenants of the
small farms. That these labour-conditions were devised to meet a
difficulty in procuring enough slaves to carry on the cultivation of the
whole big estate, is an inference hardly to be resisted. That we find
it on more than one estate indicates that for the time it was serving
its purpose. But, in admitting that it probably began under the rule
of great private landlords, we must not lose sight of the fact that it
was liable to grievous abuse, and that even the regulations of
Hadrian did not remove the necessity of pitiful appeals for redress.
An important characteristic of these estates was that they were
outside the municipal[1425] system. Each of the so-called civitates
had its own charter or statute (lex) conforming more or less closely
to a common[1426] model, under which the municipal authorities
could regulate the management of lands within its territory. But
these great estates were independent[1427] of such local
jurisdictions. And this independence would seem to date from the
times of private ownership, before the conversion of many of them
into imperial domains. Mommsen thought that this separate
treatment of them as ‘peculiars’ began in Italy under the Republic,
and was due to the influence of the landowning aristocracy, who
were bent upon admitting no such concurrent authority on their
latifundia. This may have been so, and the extension of large-scale
possessions to the Provinces may have carried the system abroad. At
all events there it was, and it suited the convenience of a grasping
emperor: he had only to get rid of the present possessor and carry
on the administration of the domain as before: his agents stepped
into the place of those employed by the late landlord, and only slight
modification of the current regulations would be required. He issued
a statute for management of ‘crown-property’ as he would for a
municipality. It was in effect a local law, and it does not appear that
the common law administered by the ordinary courts could override
it. The imperial procurator was practically the magistrate charged
with its administration in addition to his financial duties, for
government and extraction of revenue were really two sides of the
same function. Obviously the interests of the emperor, of his agent,
of the head-tenants, and of the peasant cultivators, were not the
same. But the peasant, who wanted to pay as little as possible, and
the emperor who wanted to receive steady returns—as large as
possible, but above all things steady—had a common interest in
preventing unlawful exactions, by which a stable income was
imperilled and the prosperity of the cultivator impaired. On the other
hand the procurator and the conductor could only make illicit profits
through combining to rob the emperor by squeezing his coloni. How
to accomplish this was no doubt a matter of delicate calculation.
How much oppression would the coloni stand without resorting to
the troublesome and risky process of an appeal? We only hear of
one or two appeals made with success. Of those that were made
and rejected or foiled by various arts, and of those abandoned in
despair at an early stage, we get no record. Yet that such cases did
occur, perhaps not seldom, we may be reasonably sure.
It is well to remember that Columella, in whose treatise letting of
farms to tenants first appears, not as an occasional expedient but as
part of a reasoned scheme of estate-management, makes provision
for a procurator[1428] as well as a vilicus. One duty of the former is
to keep an eye on the latter. In the management of great estates an
atmosphere of mistrust is perhaps to some extent unavoidable. In
an agricultural system based on slave labour, this mistrust begins at
the very bottom of the structure and reaches to the very top, as is
shewn by all experience ancient and modern. Industry in slaves,
diligence and honesty in agents and stewards, are not to be relied
on when these subordinates have no share in the profit derived from
the practice of such virtues. And mistrust of slaves and freedmen did
not imply a simple trust in free tenants. Columella only advises[1429]
letting to tenants in circumstances that make it impracticable to
cultivate profitably by a slave-staff under a steward. The plan is a
sort of last resort, and it can only work well if the tenants stay on
continuously. Therefore care should be taken to make the position of
the coloni permanently attractive. This advice is primarily designed
for Italy, but its principles are of general application, and no doubt
justified by experience. Their extension to latifundia abroad, coupled
with a falling-off in the supply of slaves, led to similar results: great
estates might still be in part worked by slave labour under stewards,
but letting parcels to small tenants became a more and more vital
feature of the system. But to deal directly from a distance with a
number of such peasant farmers would be a troublesome business.
We need not wonder that it became customary to let large blocks of
land, even whole latifundia, to big lessees, speculative men who
undertook the subletting and rent-collecting of part of their holdings,
while they could work the central manor-farm by slave labour on
their own account, and generally exploit the situation for their own
profit. Thus, as once the latifundium had absorbed little properties,
so now its subdivision was generating little tenancies, with chief-
tenants as a sort of middlemen between the dominus and the coloni.
To protect the colonus, the powers of the conductor[1430] had to be
strictly limited: to ease the labour-problem and retain the conductor,
a certain amount of task-work had to be required of the colonus.
And this last condition was ominous of the coming serfdom.
If the economic situation and the convenience of non-resident
landlords operated to produce a widespread system of letting to
small tenants, it was naturally an object to levy the rents in such a
form as would best secure a safe and regular return. To exact a fixed
money-rent would mean that the peasant must spend time in
marketing his produce in order to procure the necessary cash, and
thereby lessen the time spent in actual farm-labour. In bad years he
would look for an abatement of his rent, nor would it be easy to
satisfy him: here was material for disputes and discontent. Such
difficulties were known in Italy and elsewhere, and jurists
recognized[1431] an advantage of the ‘partiary’ system in this
connexion. An abatement of rent due in a particular year need not
imply that the landlord lost the amount of abatement for good and
all. If the next year produced a ‘bumper’ crop, the landlord was
entitled to claim restitution of last year’s abatement in addition to
the yearly rent. This too, it seems, in the case of a tenant sitting at a
fixed money-rent. But the partiarius colonus is on another footing:
he shares gain and loss with the dominus, with whom he is a quasi-
partner[1432]. It was surely considerations of this kind that led to the
adoption of the share-rent system on these great African estates. By
fixing the proportion on a moderate scale, the peasant was fairly
certain to be able to pay his rent, and he would not be harassed
with money transactions dependent on the fluctuations in the price
of corn. Under such conditions he was more likely to be contented
and to stay on where he was, and that this should be so was
precisely what the landlord desired. On the other hand the big
conductor might pay rent either in coin or kind. He was a speculator,
doubtless well able to take care of his own interests: probably the
normal case was that he agreed to a fixed cash payment, and only
took the lease on terms that left him a good prospect of making it a
remunerative venture. But on this point there is need of further
evidence.
When the emperor took over an estate of this kind, such an
existing organization would be admirably fitted to continue under the
fiscal administration. Apparently this is just what happened. One
small but important improvement would be automatically produced
by the change. The coloni would now become coloni Caesaris[1433]
and whatever protection against exactions of conductores they may
have enjoyed under the sway of their former lords was henceforth
not less likely to be granted and much more certain of effect. To the
fiscal officials any course of action tending to encourage permanent
tenancies and steady returns would on the face of it be welcome: for
it was likely to save them trouble, if not to bring them credit. The
only influence liable to incline them in another direction was
corruption in some form or other, leading them to connive at
misdeeds of the local agents secretly in league with the head-lessees
on the spot. That cases of such connivance occurred in the period
from Trajan to Severus is not to be doubted. During the following
period of confusion they probably became frequent. But it was not
until Diocletian introduced a more elaborate imperial system, and
increased imperial burdens to defray its greater cost, that the evil
reached its height. Then the corruption of officials tainted all
departments, and was the canker ever gnawing at the vital forces of
the empire. But that this deadly corruption was a sudden growth out
of an existing purity is not to be imagined. All this is merely an
illustration of that oldest of political truisms, that to keep practice
conformable to principle is supremely difficult. The only power that
seems to be of any effect in checking the decay of departmental
virtue is the power of public opinion. Now a real public opinion
cannot be said to have existed in the Roman Empire; and, had it
existed, there was no organ through which it could be expressed.
And the Head of the State, let him be ever so devoted to the
common weal, was too overburdened with manifold responsibilities
to be able to give personal attention to each complaint and prescribe
an equitable remedy.
How far we are entitled to trace a movement of policy by the
contents of these African inscriptions is doubtful. They are too few,
and too much alike. Perhaps we may venture to detect a real step
onward in the latest of them. The renewal of the encouragement of
squatter-settlers[1434] on derelict lands does surely point to a
growing consciousness that the food-question was becoming a more
and more serious one. Perhaps it may be taken to suggest that the
system of leasing the African domains to big conductores had lately
been found failing in efficiency. But it is rash to infer much from a
single case: and the African Severus may have followed an
exceptional policy in his native province. It is when we look back
from the times of the later Empire, with its frantic legislation to bind
coloni to the soil, and to enforce the cultivation of every patch of
arable ground, that we are tempted to detect in every record
symptoms of the coming constraint. As yet the central government
had not laid its cramping and sterilizing hand on every part of its
vast dominions. Moreover the demands on African productivity had
not yet reached their extreme limit. There was as yet no
Constantinople, and Egypt still shared with Africa the function of
supplying food to Rome. Thus it is probably reasonable to believe
that the condition of the working tenant-farmers was in this age a
tolerable[1435] one. If those on the great domains were bit by bit
bound to their holdings, it was probably with their own consent, so
far at least that, seeing no better alternative, they became stationary
and more or less dependent peasants. In other parts of Africa, for
instance near Carthage, we hear of wealthy landowners employing
bodies of slaves. Some of these men may well have been Italians: at
least they took a leading part later in the rising against Maximin and
the elevation of Gordian.
In connexion with the evidence of this group of inscriptions it may
be not out of place to say a few words on the view set forth by
Heisterbergk, that the origin of the later serf-colonate was Provincial,
not Italian. He argues[1436] that what ruined small-scale farming in
Italy was above all things the exemption of Italian land from
taxation. Landlords were not constrained by the yearly exaction of
dues to make the best economic use of their estates. Vain land-pride
and carelessness were not checked: mismanagement and waste had
free course, and small cultivation declined. The fall in free rustic
population was both effect and cause. In the younger Pliny’s time
good tenants were already hard to find, but great landlords owned
parks and mansions everywhere. In the Provinces nearly all the land
was subject to imperial taxation in kind or in money, and owners
could not afford to let it lie idle. The practical control of vast estates
was not possible from a distance. The direction of agriculture,
especially of extensive farming (corn etc) from a fixed centre was
little less difficult. There was therefore strong inducement to
delegate the business of cultivation to tenants, and to let the
difference in amount between their rents and the yearly imperial
dues represent the landlord’s profit. Thus the spread of latifundia
swallowed up small holdings in the Provinces as in Italy; but it
converted small owners into small tenants, and did not merge the
holdings into large slave-gang plantations or throw them into
pasture. The plan of leasing a large estate as a whole to a big head-
tenant, or establishing him in the central ‘manor farm,’ was quite
consistent with the general design, and this theory accounts for the
presence of a population of free coloni, whom later legislation might
and did bind fast to the soil.
This argument has both ingenuity and force, but we can only
assent to it with considerable reservations. Letting to free coloni was
a practice long used in Italy, and in the first century ad was evidently
becoming more common. It was but natural that it should appear in
the Provinces. Still, taken by itself, there is no obvious reason why it
should develope into serfdom. With the admitted scarcity and rising
value of labour, why was it that the freeman did not improve his
position in relation to his lord, indeed to capitalists in general? I
think the presence of the big lessee, the conductor, an employer of
slave labour, had not a little to do with it. Labour as such was
despised. The requirement of task-work to supplement that of slaves
on the ‘manor farm’ was not likely to make labour more esteemed.
Yet to get his little holding the colonus had to put up with this
condition. It may be significant that we hear nothing of coloni
working for wages in spare time. Was it likely that they would do so?
Then, when the conductor came to be employed as collector of rents
and other dues on the estate, his opportunities of illicit exaction
gave him more and more power over them; and, combined with
their reluctance to migrate and sacrifice the fruits of past labour,
reduced them[1437] more and more to a state of de facto
dependence. At the worst they would be semi-servile in fact, though
free in law; at the best they would have this outlook, without any
apparent alternative to escape their fate. This, I imagine, was the
unhappy situation that was afterwards recognized by law.
I must not omit to point out that I have said practically nothing on
the subject[1438] of municipal lands and their administration by the
authorities of the several res publicae or civitates. Of the importance
of this matter I am well aware, more particularly in connexion with
the development of emphyteusis under the perpetual leases granted
by the municipalities. In a general history of the imperial economics
this topic would surely claim a significant place. But it seems to have
little or no bearing on the labour conditions with which I am
primarily concerned, while it would add greatly to the bulk of a
treatise already too long. So too the incidence of taxation, and the
effects of degradation[1439] of the currency, influences that both
played a sinister part in imperial economics, belong properly to a
larger theme. Even the writers on land-surveying etc, the
agrimensores or gromatici, only touch my subject here and there
when it is necessary to speak of tenures, which cannot be ignored in
relation to labour-questions. All these matters are thoroughly and
suggestively treated in Seeck’s great history of the Decline and Fall
of the ancient world. Another topic left out of discussion is the
practical difference, if any, between the terms[1440] fundus and
saltus in the imperial domains. I can find no satisfactory materials
for defining it, and it does not appear to bear any relation to the
labour-question. The meaning of the term inquilinus is a more
important matter. If we are to accept Seeck’s ingenious
conclusions[1441], it follows that this term, regularly used by the
jurists of a house-tenant (urban) as opposed to colonus a tenant of
land (rustic), in the course of the second century began to put on a
new meaning. Marcus settled large numbers of barbarians on Roman
soil. These ‘indwellers’ were labelled as inquilini, a word implying
that they were imported aliens, distinct from the proper residents.
An analogous distinction existed in municipalities between
unprivileged ‘indwellers’ (incolae) and real municipes. Now a jurist’s
opinion[1442] in the first half of the third century speaks of inquilini
as attached (adhaerent) to landed estates, and only capable of being
bequeathed to a legatee by inclusion in the landed estate: and it
refers to a rescript of Marcus and Commodus dealing with a point of
detail connected with this rule of law. Thus the inquilinate seems to
have been a new condition implying attachment to the soil, long
before the colonate acquired a similar character. For the very few
passages, in which the fixed and dependent nature of the colonate is
apparently recognized before the time of Constantine, are with some
reason suspected of having been tampered with by the compilers of
the Digest, or are susceptible of a different interpretation. It is clear
that this intricate question cannot be fully discussed here. If these
rustic inquilini were in their origin barbarian settlers, perhaps two
conclusions regarding them may be reasonable. First, they seem to
be distinct from slaves, the personal property of individual owners.
For the evidence, so far as it goes, makes them attached[1443] to the
land, and only transferable therewith. Secondly, they are surely
labourers, tilling with their own hands the holdings assigned to
them. If this view of them be sound, we may see in them the
beginnings of a serf class. But it does not follow that the later
colonate was a direct growth from this beginning. We have noted
above several other causes contributing to that growth; in particular
the state of de facto fixity combined with increasing dependence, in
which the free colonus was gradually losing his freedom. Whether
the later colonate will ever receive satisfactory explanation in the
form of a simple and convincing theory, I cannot tell: at present it
seems best to admit candidly that, among the various influences
tending to produce the known result, I do not see my way[1444] to
distinguish one as supremely important, and to ignore the effect of
others. The opinion[1445] of de Coulanges, that the origin of the later
colonate is mainly to be sought in the gradual effect of custom (local
custom), eventually recognized (not created) by law, is perhaps the
soundest attempt at a brief expression of the truth.

XLIX. THE JURISTS OF THE DIGEST.

For the position of the colonus in Roman Law during the period
known as that of the ‘classic’ Jurists we naturally find our chief
source of evidence in the Digest. And it is not surprising that here
and there we find passages bearing on labour-questions more or
less directly. But in using this evidence it is most necessary to keep
in mind the nature and scope of this great compilation. First, it is not
a collection of laws. Actual laws were placed in the Codex, based on
previous Codes such as the Theodosian (439 ad), after a careful
process of sifting and editing, with additions to complete the work.
This great task was performed by Justinian’s commissioners in 14
months or less. The Justinian Code was confirmed and published in
529 ad, and finally in a revised form rather more than five years
later. Secondly, the Digest is a collection of opinions of lawyers
whose competence and authority had been officially recognized, and
whose responsa carried weight in the Roman courts. From early
times interpretation had been found indispensable in the
administration of the law; and in the course of centuries, both by
opinions on cases and by formal treatises, there had grown up such
a mass of written jurisprudence as no man could master. These
writings were specially copious in the ‘classic’ period (say from
Hadrian to Alexander, 117-235). Actual laws are sometimes cited in
the form of imperial decisions, finally settling some disputed point.
But the normal product of discussion is the opinion of this or that
eminent jurist as to what is sound law in a particular question. The
different opinions of different authorities are often quoted side by
side. If this were all, we might congratulate ourselves on having
simply a collection of authentic extracts from named authors,
conveying their views in their own words. And no doubt many of the
extracts are of this character.
But the position is not in fact so simple as this. Tribonian and his
fellow-commissioners were set to work at the end of the year 530.
Their task was completed and the Digesta published with imperial
confirmation at the end of 533. Now the juristic literature in
existence, of which the Digest was to be an epitome superseding its
own sources, was of such prodigious bulk that three years cannot
have been sufficient for the work. To read, abstract, classify, and so
far as possible to harmonize, this mass of complicated material, was
a duty surely needing a much longer time for its satisfactory
performance. Moreover, as this official Corpus of jurisprudence was
designed for reference and citation as an authority in the courts, it
had to be[1446] brought up to date. That this necessity greatly
increased the commissioners’ burden is obvious: nor less so, that it
was a duty peculiarly difficult to discharge in haste, and liable, if
hurried, to result in obscurities, inconsistencies and oversights. That
much of the Digest has suffered from overhaste in its production is
now generally admitted. Its evidence is therefore to be used with
caution. But on the subject of coloni the main points of interest are
attested by witnesses of high authority, such as Ulpian, in cited
passages not reasonably suspected of interpolation. And it is not
necessary to follow up a host of details. We have only to reconstruct
from the law-sources the characteristic features of agriculture and
rustic tenancy as it existed before the time of Diocletian; and these
features are on the whole significant and clear. Fortunately we are
not entirely dependent on collection and comparison of scattered
references from all parts of the great compilation. One title (xix 2
locati conducti)[1447] furnishes us with a quantity of relevant matter
classified under one head by the editors themselves.
First and foremost it stands out quite clear that the colonus is a
free man, who enters into a legal contract as lessee with lessor, and
that landlord and tenant are equally bound by the terms of the
lease. If any clause requires interpretation owing to special
circumstances having arisen, the jurist endeavours to lay down the
principles by which the court should be guided to an equitable
decision. For instance, any fact by which the productiveness of a
farm and therewith the solvency of the tenant are impaired may lead
to a dispute. Care is therefore taken to relieve the tenant of
responsibility for damage inflicted by irresistible force (natural or
human)[1448] or due to the landlord’s fault. But defects of climate
and soil[1449] give no claim to relief, since he is presumed to have
taken the farm with his eyes open: nor does the failure of worn-out
fruit trees, which tenants were regularly bound by their covenant to
replace. The chief rights of the landlord[1450] are the proper
cultivation of the farm and regular payment of the rent. In these the
law duly protects him. The tenant is bound not to let down the land
by neglect, or to defraud[1451] the landlord by misappropriating what
does not belong to him: rent is secured normally by sureties
(fideiussores)[1452] found by the tenant at the time of leasing, or
sometimes by the fact that all property of his on the farm is
expressly pledged[1453] to the lessor on this account. Thus it is the
aim of the law to guard the presumably poorer and humbler party
against hard treatment, while it protects the man of property against
fraud. In other words, it aims at strict enforcement of the
terms[1454] of lease, while inclined to construe genuinely doubtful
points or mistakes in favour[1455] of the party bound. That landlord
and tenant, even in cases of fixed money rent, have a certain
community[1456] of interest, seems recognized in the fact that some
legal remedies against third persons (for malicious damage etc)
could in some cases be employed[1457] by either landlord or tenant.
In short, the latter is a thoroughly free and responsible person.
That a tenant should be protected against disturbance[1458] was a
matter of course. During the term of his lease he has a right to
make his lawful profit on the farm: the landlord is not only bound to
allow him full enjoyment (frui licere), but to prevent molestation by a
third party over whom he has control. Indeed the tenant farmer has
in some relations a more positive protection than the landlord
himself. Thus a person who has right of usus over an estate may in
certain circumstances refuse[1459] to admit the dominus; but not the
colonus or his staff of slaves employed in the farm-work. Change of
ownership can perhaps never be a matter of indifference to the
sitting tenant of a farm. But it is the lawyer’s aim to see that the
passing of the property shall not impair the tenant’s rights under his
current lease. A lease sometimes contained clauses fixing the terms
(such as a money forfeit)[1460] on which the contract might be
broken; in fact a cross-guarantee between the parties, securing the
tenant against damage by premature ejectment and the landlord
against damage by the tenant’s premature quitting. The jurists often
appeal to local custom as a means of equitable decision on disputed
points. But one customary principle seems to be recognized[1461] as
of general validity, the rule of reconductio. If, on expiration of a
lease, the tenant holds on and the landlord allows him to remain, it
is regarded as a renewal of the contract by bare agreement (nudo
consensu). No set form of lease is necessary; but this tacit contract
holds good only from year to year. Another fact significant as to the
position of the colonus is that he is assumed to have the right to
sublet[1462] the farm: questions that would in that case arise are
dealt with as matters of course. I suppose that a lease might be so
drawn as to bar any such right, but that in practice it was always or
generally admitted. Again, it is a sign of his genuinely independent
position in the eye of the law that his own oath, if required of him,
may be accepted[1463] as a counter-active plea (exceptio
iurisiurandi) in his own defence, when sued by his landlord for
damage done on the farm.
On the economic side we have first to remark that the colonus is
represented as normally a man of small means. It is true that in the
Digest conductor and colonus are not clearly[1464] distinguished, as
we find them in the African inscriptions and in the later law. For the
former is simply the counterpart of locator, properly connoting the
relation between the contracting parties: colonus expresses the fact
that the cultivation (colere) of land belonging to another devolves
upon him by virtue of the contract. Every colonus is a conductor, but
not every conductor a colonus. Now custom, recognized by the
lawyers, provided a means of supplying the small man’s need of
capital. To set him up in a farm, the landlord equipped him with a
certain stock (instrumentum). This he took over at a valuation, not
paying ready money for it, but accepting liability[1465] to account for
the value at the end of his tenancy. The stock or plant included[1466]
implements and animals (oxen, slaves, etc), and a miscellaneous
array of things, of course varying with the nature of the farm and
local custom. To this nucleus he had inevitably to add
belongings[1467] of his own, which were likely to increase with time
if the farm prospered in his hands. His rent[1468] might be either a
fixed yearly payment in cash or produce, or a proportionate share of
produce varying from year to year. The money-rent[1469] seems to
have been the usual plan, and it was in connexion therewith that
claims for abatement generally arose. The impression left by the
frequent references to reliqua in the Digest, and the experiences of
the younger Pliny, is that tenant-farmers in Italy were habitually
behind with their rents and claiming[1470] remissio. This is probably
true of the period (say) 100-250 ad, with which we are here
concerned. It was probably a time of great difficulty for both
landlords and tenants, at least outside the range of suburban
market-gardening. Signs are not lacking that want of sufficient
capital[1471] cramped the vigour of agriculture directly and indirectly.
Improvements might so raise the standard of cultivation on an
estate as to leave an awkward problem for the owner. Its upkeep on
its present level might need a large capital; tenants of means were
not easy to find, and subdivision into smaller holdings would not in
all circumstances provide a satisfactory solution. Moreover, if the
man of means was not unlikely to act independently, in defiance of
the landlord, the small man was more likely to take opportunities of
misappropriating things to which he was not entitled.
All these difficulties, and others, suggest no great prosperity in
Italian agriculture of the period. That on certain soils farming did not
pay, was as well known[1472] to the jurists as to other writers. And
one great cause of agricultural decline appears in their incidental
remarks as clearly as in literature. It was the devotion of much of
the best land in the best situations to the unproductive parks and
pleasure-grounds of the rich. This can hardly be laid to the account
of the still favoured financial position of Italy as compared with the
Provinces, for we find the same state of things existing late in the
fourth century, when Italy had long been provincialized and taxed
accordingly. It was fashion, and fashion of long standing, that
caused this evil. And this cause was itself an effect of the conditions
of investment. The syndicates for exploiting provincial dues had
gone with the Republic. State contracts and industrial enterprises
were not enough to employ all the available capital. The ownership
of land, now that politics were not a school of ambition, was more
than ever the chief source of social importance. A man who could
afford to own vast unremunerative estates was a great personage.
We may add that such estates, being unremunerative, were less
likely to attract the fatal attention of bad emperors, while good
rulers deliberately encouraged rich men to invest fortunes in them as
being an evidence of loyalty to the government. The uneconomic
rural conditions thus created are plainly referred to in the staid
remarks of the jurists. We read of estates owned for pleasure
(voluptaria praedia)[1473]: of cases where it may be doubted[1474]
whether the fundus does not rather belong to the villa than the villa
to the fundus: and the use of the word praetorium[1475] (= great
mansion, palace, ‘Court’) for the lord’s headquarters on his demesne
becomes almost official in the mouth of lawyers. Meanwhile great
estates abroad could be, and were, profitable to their owners, who
drew rent from tenants and were normally non-resident. Yet
praetoria were sometimes found even in the Provinces.
In connexion with this topic it is natural to consider the questions
of upkeep and improvements. The former is simple. As the tenant
has the disposal of the crops raised and gathered (fructus), he is
bound[1476] to till the soil, to keep up the stock of plants, and to see
that the drainage of the farm is in working order. Further detail is
unnecessary, as his liability must be gauged by the state of the farm
when he took it over. Improvements look to the future. From the
lawyers we get only the legal point of view, which is of some interest
as proving that the subject was of sufficient importance not to be
overlooked. Now it seems certain that a conductor or colonus had a
right of action to recover[1477] from the dominus not only
compensation for unexhausted improvements, but his whole outlay
on them, if shewn to have been beneficial. Or his claim might rest
on the fact that the project had been approved[1478] by the landlord.
But it might happen that a work beneficial to the particular estate
was detrimental to a neighbouring one. In such a case, against
whom—landlord or tenant—had the owner of that estate a legal
remedy? It was held that, if the tenant had carried out the work in
question[1479] without his landlord’s knowledge, he alone was liable.
If, as some held, the landlord was bound to provide a particular
remedy, he could recover the amount paid under this head from his
tenant. To insure the owner against loss from the acts of his lessee
was evidently an object of the first importance, and this is in
harmony with the Roman lawyers’ intense respect for rights of
property. The general impression left on the reader of their
utterances on this subject is that a landlord, after providing a
considerable instrumentum, had done all that could reasonably be
expected from him. Improvements, the desirability of which was
usually discovered through the tenant’s experience, were normally
regarded as the tenant’s business: it was only necessary to prevent
the landlord from arbitrarily confiscating what the tenant had done
to improve his property. Obviously such ‘improvements’ were likely
to occasion disputes as to the value of the work done: but it was the
custom of the countryside to refer technical questions of this kind to
the arbitration of an impartial umpire (vir bonus), no doubt a
neighbour familiar with local circumstances. On the whole, it does
not appear that the law treated the colonus badly under this head,
and the difficulty of securing good tenants may be supposed to have
guaranteed him against unfair administration.
A great many more details illustrating the position of coloni as
they appear in the Digest could be added here, but I think the above
will be found ample for my purpose. The next topic to be dealt with
is that of labour, so far as the references of the lawyers give us any
information. First it is to be noted that the two systems[1480] of
estate-management, that of cultivation for landlord’s account by his
actor or vilicus, and that of letting to tenant farmers, were existing
side by side. The latter plan was to all appearance more commonly
followed than it would seem to have been in the time of Columella,
but the former was still working. A confident opinion as to the
comparative frequency[1481] of the two systems is hardly to be
formed on Digest evidence: for in rustic matters the interest of
lawyers was almost solely concerned with the relations of landlord
and tenant. What an owner did with his own property on his own
account was almost entirely his own business. There are signs that a
certain change in the traditional nomenclature represents a real
change of function in the case of landlords’ managers. The term
actor is superseding[1482] vilicus, but the vilicus still remains. He
would seem to be now more of a mere farm-bailiff, charged with the
cultivation of some part or parts of an estate that are not let to
tenants. It may even be that he is left with a free hand and only
required to pay a fixed[1483] yearly return. If so, this arrangement is
not easily to be distinguished from the case of a slave colonus or
quasi colonus[1484] occupying a farm. The financial and general
supervision of the estate is in the hands of the actor[1485], who
collects all dues, including rents of coloni, and is held to full
account[1486] for all these receipts as well as for the contents of the
store-rooms. He is a slave, but a valuable and trusted man: it is
significant that the manumission[1487] of actores is not seldom
mentioned. Evidently the qualities looked for in such an agent were
observed to develope most readily under a prospect of freedom. But,
so long as he remained actor of an estate, he could be regarded as
part of it: in a bequest the testator could include him as a part[1488],
and often did so: and indeed his peculiar knowledge of local detail
must often have been an important element in its value. To employ
such a person in the management of an estate, with powerful
inducements to good conduct, may have solved many a difficult
problem. We may perhaps guess that it made the employment of a
qualified legal agent (procurator) less often necessary, at least if the
actor contrived to avoid friction with his master’s free tenants.
Whether an estate was farmed for the owner by his manager, or
let to tenants, or partly on one system partly on the other, it is clear
that slave-labour is assumed as the normal basis of working. For the
colonus takes over slaves supplied by the dominus as an item of the
instrumentum. And there was nothing to prevent him from adding
slaves of his own, if he could afford it and thought it worth his while
to employ a larger staff. Whether such additions were often or ever
made, we must not expect the lawyers to tell us; but we do now and
then hear[1489] of a slave who is the tenant’s own. Such a slave
might as part of the tenant’s goods be pledged to the landlord as
security for his rent, but he would not be a part of the estate of
which the landlord could dispose by sale or bequest. In such a case
the slaves might be regarded[1490] as accessories of the fundus, if it
were so agreed. This raised questions as to the degree of connexion
that should be treated as qualifying a slave to be considered an
appurtenance of a farm. The answer was in effect that he must be a
member of the regular staff. Mere temporary employment on the
place did not so attach him, mere temporary absence on duty
elsewhere did not detach him. A further question was whether all
slaves in any sort of employment on the place were included, or only
such as were actually engaged in farm work proper, cultivation of the
soil, not those employed in various subsidiary[1491] industries. These
questions the jurists discussed fully, but we cannot follow them here,
as their legal importance is chiefly in connexion with property and
can hardly have affected seriously the position of tenants. But it is
interesting to observe that the lawyers were feeling the necessity of
attempting some practical classification. The distinction[1492]
between urbana and rustica mancipia was old enough as a loose
conversational or literary one. But, when rights of inheritance or
legacy of such valuable property were involved, it became important
to define (if possible) the essential characteristics of a ‘rustic’ slave.
That the condition of the rustic slave was improving, and generally
far better than it had been on the latifundia of Republican days,
seems indicated by the jurists’ speaking of a slave as colonus or
quasi colonus without any suggestion of strangeness in the relation.
We may assume that only slaves of exceptional capacity and merit
would be placed in a position of economic (if not legal) equality with
free tenants. Still the growth of such a custom can hardly have been
without some effect on the condition of rustic slaves in general. It
was not new in the second century: it is referred to by a jurist[1493]
of the Augustan age. The increasing difficulty of getting either good