ML-ToC

The document is a detailed table of contents for a comprehensive book on machine learning, covering various topics such as data understanding, learning theories, regression analysis, decision trees, and neural networks. It outlines the structure of the book, including chapters on supervised, unsupervised, and reinforcement learning, as well as advanced topics like clustering algorithms and ensemble learning. Each section is organized to guide readers through foundational concepts to more complex applications in machine learning.


Detailed Contents

Preface iv
Acknowledgements vii
QR Code Content Details viii

1. Introduction to Machine Learning 1
1.1 Need for Machine Learning 1
1.2 Machine Learning Explained 3
1.3 Machine Learning in Relation to Other Fields 5
    1.3.1 Machine Learning and Artificial Intelligence 5
    1.3.2 Machine Learning, Data Science, Data Mining, and Data Analytics 5
    1.3.3 Machine Learning and Statistics 6
1.4 Types of Machine Learning 7
    1.4.1 Supervised Learning 8
    1.4.2 Unsupervised Learning 11
    1.4.3 Semi-supervised Learning 12
    1.4.4 Reinforcement Learning 12
1.5 Challenges of Machine Learning 13
1.6 Machine Learning Process 14
1.7 Machine Learning Applications 15

2. Understanding Data 22
2.1 What is Data? 22
    2.1.1 Types of Data 24
    2.1.2 Data Storage and Representation 25
2.2 Big Data Analytics and Types of Analytics 26
2.3 Big Data Analysis Framework 27
    2.3.1 Data Collection 29
    2.3.2 Data Preprocessing 30
2.4 Descriptive Statistics 34
2.5 Univariate Data Analysis and Visualization 36
    2.5.1 Data Visualization 36
    2.5.2 Central Tendency 38
    2.5.3 Dispersion 40
    2.5.4 Shape 41
    2.5.5 Special Univariate Plots 43
2.6 Bivariate Data and Multivariate Data 44
    2.6.1 Bivariate Statistics 46
2.7 Multivariate Statistics 47
2.8 Essential Mathematics for Multivariate Data 49
    2.8.1 Linear Systems and Gaussian Elimination for Multivariate Data 49
    2.8.2 Matrix Decompositions 50
    2.8.3 Machine Learning and Importance of Probability and Statistics 52
2.9 Overview of Hypothesis 57
    2.9.1 Comparing Learning Methods 59
2.10 Feature Engineering and Dimensionality Reduction Techniques 62
    2.10.1 Stepwise Forward Selection 63
    2.10.2 Stepwise Backward Elimination 63
    2.10.3 Principal Component Analysis 63
    2.10.4 Linear Discriminant Analysis 67
    2.10.5 Singular Value Decomposition 68

3. Basics of Learning Theory 77
3.1 Introduction to Learning and its Types 78
3.2 Introduction to Computation Learning Theory 80
3.3 Design of a Learning System 81
3.4 Introduction to Concept Learning 82
    3.4.1 Representation of a Hypothesis 83
    3.4.2 Hypothesis Space 85
    3.4.3 Heuristic Space Search 85
    3.4.4 Generalization and Specialization 86
    3.4.5 Hypothesis Space Search by Find-S Algorithm 88
    3.4.6 Version Spaces 90
3.5 Induction Biases 94
    3.5.1 Bias and Variance 95
    3.5.2 Bias vs Variance Tradeoff 96
    3.5.3 Best Fit in Machine Learning 97
3.6 Modelling in Machine Learning 97
    3.6.1 Model Selection and Model Evaluation 98
    3.6.2 Re-sampling Methods 99
3.7 Learning Frameworks 104
    3.7.1 PAC Framework 104
    3.7.2 Estimating Hypothesis Accuracy 106
    3.7.3 Hoeffding's Inequality 106
    3.7.4 Vapnik–Chervonenkis Dimension 107

4. Similarity-based Learning 115
4.1 Introduction to Similarity or Instance-based Learning 116
    4.1.1 Differences Between Instance- and Model-based Learning 116
4.2 Nearest-Neighbor Learning 117
4.3 Weighted K-Nearest-Neighbor Algorithm 120
4.4 Nearest Centroid Classifier 123
4.5 Locally Weighted Regression (LWR) 124

5. Regression Analysis 130
5.1 Introduction to Regression 130
5.2 Introduction to Linearity, Correlation, and Causation 131
5.3 Introduction to Linear Regression 134
5.4 Validation of Regression Methods 138
5.5 Multiple Linear Regression 141
5.6 Polynomial Regression 142
5.7 Logistic Regression 144
5.8 Ridge, Lasso, and Elastic Net Regression 147
    5.8.1 Ridge Regularization 148
    5.8.2 LASSO 149
    5.8.3 Elastic Net 149

6. Decision Tree Learning 155
6.1 Introduction to Decision Tree Learning Model 155
    6.1.1 Structure of a Decision Tree 156
    6.1.2 Fundamentals of Entropy 159
6.2 Decision Tree Induction Algorithms 161
    6.2.1 ID3 Tree Construction 161
    6.2.2 C4.5 Construction 167
    6.2.3 Classification and Regression Trees Construction 175
    6.2.4 Regression Trees 185
6.3 Validating and Pruning of Decision Trees 190

7. Rule-based Learning 196
7.1 Introduction 196
7.2 Sequential Covering Algorithm 198
    7.2.1 PRISM 198
7.3 First Order Rule Learning 206
    7.3.1 FOIL (First Order Inductive Learner Algorithm) 208
7.4 Induction as Inverted Deduction 215
7.5 Inverting Resolution 215
    7.5.1 Resolution Operator (Propositional Form) 215
    7.5.2 Inverse Resolution Operator (Propositional Form) 216
    7.5.3 First Order Resolution 216
    7.5.4 Inverting First Order Resolution 216
7.6 Analytical Learning or Explanation Based Learning (EBL) 217
    7.6.1 Perfect Domain Theories 218
7.7 Active Learning 221
    7.7.1 Active Learning Mechanisms 222
    7.7.2 Query Strategies/Selection Strategies 223
7.8 Association Rule Mining 225

8. Bayesian Learning 234
8.1 Introduction to Probability-based Learning 234
8.2 Fundamentals of Bayes Theorem 235
8.3 Classification Using Bayes Model 235
    8.3.1 Naïve Bayes Algorithm 237
    8.3.2 Brute Force Bayes Algorithm 243
    8.3.3 Bayes Optimal Classifier 243
    8.3.4 Gibbs Algorithm 244
8.4 Naïve Bayes Algorithm for Continuous Attributes 244
8.5 Other Popular Types of Naive Bayes Classifiers 247

9. Probabilistic Graphical Models 253
9.1 Introduction 253
9.2 Bayesian Belief Network 254
    9.2.1 Constructing BBN 254
    9.2.2 Bayesian Inferences 256
9.3 Markov Chain 261
    9.3.1 Markov Model 261
    9.3.2 Hidden Markov Model 263
9.4 Problems Solved with HMM 264
    9.4.1 Evaluation Problem 265
    9.4.2 Computing Likelihood Probability 267
    9.4.3 Decoding Problem 269
    9.4.4 Baum-Welch Algorithm 272

10. Artificial Neural Networks 279
10.1 Introduction 280
10.2 Biological Neurons 280
10.3 Artificial Neurons 281
    10.3.1 Simple Model of an Artificial Neuron 281
    10.3.2 Artificial Neural Network Structure 282
    10.3.3 Activation Functions 282
10.4 Perceptron and Learning Theory 284
    10.4.1 XOR Problem 287
    10.4.2 Delta Learning Rule and Gradient Descent 288
10.5 Types of Artificial Neural Networks 288
    10.5.1 Feed Forward Neural Network 289
    10.5.2 Fully Connected Neural Network 289
    10.5.3 Multi-Layer Perceptron (MLP) 289
    10.5.4 Feedback Neural Network 290
10.6 Learning in a Multi-Layer Perceptron 290
10.7 Radial Basis Function Neural Network 297
10.8 Self-Organizing Feature Map 301
10.9 Popular Applications of Artificial Neural Networks 306
10.10 Advantages and Disadvantages of ANN 306
10.11 Challenges of Artificial Neural Networks 307

11. Support Vector Machines 312
11.1 Introduction to Support Vector Machines 312
11.2 Optimal Hyperplane 314
11.3 Functional and Geometric Margin 316
11.4 Hard Margin SVM as an Optimization Problem 319
    11.4.1 Lagrangian Optimization Problem 320
11.5 Soft Margin Support Vector Machines 323
11.6 Introduction to Kernels and Non-Linear SVM 326
11.7 Kernel-based Non-Linear Classifier 330
11.8 Support Vector Regression 331
    11.8.1 Relevance Vector Machines 333

12. Ensemble Learning 339
12.1 Introduction 339
    12.1.1 Ensembling Techniques 341
12.2 Parallel Ensemble Models 341
    12.2.1 Voting 341
    12.2.2 Bootstrap Resampling 341
    12.2.3 Bagging 342
    12.2.4 Random Forest 342
12.3 Incremental Ensemble Models 346
    12.3.1 Stacking 347
    12.3.2 Cascading 347
12.4 Sequential Ensemble Models 347
    12.4.1 AdaBoost 347

13. Clustering Algorithms 361
13.1 Introduction to Clustering Approaches 361
13.2 Proximity Measures 364
13.3 Hierarchical Clustering Algorithms 368
    13.3.1 Single Linkage or MIN Algorithm 368
    13.3.2 Complete Linkage or MAX or Clique 371
    13.3.3 Average Linkage 371
    13.3.4 Mean-Shift Clustering Algorithm 372
13.4 Partitional Clustering Algorithm 373
13.5 Density-based Methods 376
13.6 Grid-based Approach 377
13.7 Probability Model-based Methods 379
    13.7.1 Fuzzy Clustering 379
    13.7.2 Expectation-Maximization (EM) Algorithm 380
13.8 Cluster Evaluation Methods 382

14. Reinforcement Learning 389
14.1 Overview of Reinforcement Learning 389
14.2 Scope of Reinforcement Learning 390
14.3 Reinforcement Learning As Machine Learning 392
14.4 Components of Reinforcement Learning 393
14.5 Markov Decision Process 396
14.6 Multi-Arm Bandit Problem and Reinforcement Problem Types 398
14.7 Model-based Learning (Passive Learning) 402
14.8 Model Free Methods 406
    14.8.1 Monte-Carlo Methods 407
    14.8.2 Temporal Difference Learning 408
14.9 Q-Learning 409
14.10 SARSA Learning 410

15. Genetic Algorithms 417
15.1 Overview of Genetic Algorithms 417
15.2 Optimization Problems and Search Spaces 419
15.3 General Structure of a Genetic Algorithm 420
15.4 Genetic Algorithm Components 422
    15.4.1 Encoding Methods 422
    15.4.2 Population Initialization 424
    15.4.3 Fitness Functions 424
    15.4.4 Selection Methods 425
    15.4.5 Crossover Methods 428
    15.4.6 Mutation Methods 429
15.5 Case Studies in Genetic Algorithms 430
    15.5.1 Maximization of a Function 430
    15.5.2 Genetic Algorithm Classifier 433
15.6 Evolutionary Computing 433
    15.6.1 Simulated Annealing 433
    15.6.2 Genetic Programming 434

16. Deep Learning 439
16.1 Introduction to Deep Neural Networks 439
16.2 Introduction to Loss Functions and Optimization 440
16.3 Regularization Methods 442
16.4 Convolutional Neural Networks 444
16.5 Transfer Learning 451
16.6 Applications of Deep Learning 451
    16.6.1 Robotic Control 451
    16.6.2 Linear Systems and Non-linear Dynamics 452
    16.6.3 Data Mining 452
    16.6.4 Autonomous Navigation 453
    16.6.5 Bioinformatics 453
    16.6.6 Speech Recognition 453
    16.6.7 Text Analysis 454
16.7 Recurrent Neural Networks 454
16.8 LSTM and GRU 457

Bibliography 463
Index 472
About the Authors 480
Related Titles 481

Scan for 'Appendix 1 - Python Language Fundamentals'

Scan for 'Appendix 2 - Python Packages'

Scan for 'Appendix 3 - Lab Manual with 25 Exercises'



Chapter 1

Introduction to Machine Learning
“Computers are able to see, hear and learn. Welcome to the future.”
— Dave Waters

Machine Learning (ML) is a promising and flourishing field. It can enable the top management of an organization to extract knowledge from the data stored in the organization's various archives and use it to facilitate decision making. Such decisions help organizations design new products, improve business processes, and develop effective decision support systems.

Learning Objectives
• Explore the basics of machine learning
• Introduce types of machine learning
• Provide an overview of machine learning tasks
• State the components of the machine learning algorithm
• Explore the machine learning process
• Survey some machine learning applications

1.1 NEED FOR MACHINE LEARNING


Business organizations use huge amounts of data in their daily activities. Earlier, the full potential of this data was not utilized for two reasons. First, the data was scattered across different archive systems, and organizations were not able to integrate these sources fully. Second, there was a lack of awareness about software tools that could unearth the useful information hidden in the data. Not anymore! Business organizations have now started to use the latest technology, machine learning, for this purpose.
Machine learning has become so popular for three reasons:
1. High volume of available data to manage: Big companies such as Facebook, Twitter, and YouTube generate huge amounts of data that grow at a phenomenal rate. It is estimated that this data approximately doubles every year.


2. Reduced cost of storage: The cost of storage has come down and hardware costs have also dropped. It is therefore easier now to capture, process, store, distribute, and transmit digital information.
3. Availability of complex algorithms: Many powerful algorithms are now available for machine learning, especially with the advent of deep learning.
With its popularity and ready adoption by business organizations, machine learning has become a dominant technology trend. Before starting the machine learning journey, let us establish these terms: data, information, knowledge, intelligence, and wisdom. A knowledge pyramid is shown in Figure 1.1.

[Figure 1.1: The Knowledge Pyramid — from bottom to top: Data (mostly available as raw facts and symbols), Information (processed data), Knowledge (condensed information), Intelligence (applied knowledge), and Wisdom.]


What is data? All facts are data. Data can be numbers or text that can be processed by a computer. Today, organizations accumulate vast and growing amounts of data from sources such as flat files, databases, and data warehouses, in different storage formats.
Processed data is called information. This includes patterns, associations, and relationships among data. For example, sales data can be analyzed to extract information such as which product is the fastest selling. Condensed information is called knowledge. For example, the historical patterns and future trends obtained from the above sales data can be called knowledge. Unless knowledge is extracted, data is of no use. Similarly, knowledge is not useful unless it is put into action. Intelligence is knowledge applied to actions; an actionable form of knowledge is called intelligence. Computer systems have been successful up to this stage. The ultimate objective of the knowledge pyramid is wisdom, the maturity of mind that is, so far, exhibited only by humans.
Here comes the need for machine learning. The objective of machine learning is to process this archival data so that organizations can take better decisions: to design new products, improve business processes, and develop effective decision support systems.


1.2 MACHINE LEARNING EXPLAINED


Machine learning is an important sub-branch of Artificial Intelligence (AI). A frequently quoted definition of machine learning is by Arthur Samuel, one of the pioneers of Artificial Intelligence. He stated that "Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed."
The key to this definition is that the system should learn by itself without explicit programming. How is that possible? It is widely known that to perform a computation, one needs to write programs that teach the computer how to do that computation.
In conventional programming, after understanding the problem, a detailed design of the program, such as a flowchart or an algorithm, is created and converted into a program using a suitable programming language. This approach is difficult for many real-world problems such as puzzles, games, and complex image recognition applications. Initially, artificial intelligence aimed to understand such problems and develop general-purpose rules manually. These rules were then formulated into logic and implemented in programs to create intelligent systems. This idea of developing intelligent systems by using logic and reasoning, converting an expert's knowledge into a set of rules and programs, is called an expert system. An expert system like MYCIN was designed for medical diagnosis by converting the expert knowledge of many doctors into a system. (The name MYCIN comes from the fact that most antibiotic names end with 'mycin'.) However, this approach did not progress much, as the programs lacked real intelligence.
The above approach was impractical in many domains because the programs still depended on human expertise and hence did not truly exhibit intelligence. The momentum then shifted to machine learning in the form of data-driven systems. The focus of AI moved to developing intelligent systems using a data-driven approach, where data is used as input to build intelligent models. These models can then be used to make predictions on new inputs. Thus, the aim of machine learning is to learn a model, or a set of rules, from the given dataset automatically so that it can predict unknown data correctly.
Just as humans take decisions based on experience, computers build models based on patterns extracted from the input data and then use these models for prediction and decision making. For computers, the learnt model is equivalent to human experience. This is shown in Figure 1.2.

[Figure 1.2: (a) A learning system for humans: experience leads to decisions. (b) A learning system for machine learning: data from a database is fed to a learning program, which produces a model.]
Often, the quality of data determines the quality of experience and, therefore, the quality of the learning system. In statistical learning, the relationship between the input x and the output y is modeled as a function of the form y = f(x). Here, f is the learning function that maps the input x to the output y. Learning the function f is the crucial aspect of forming a model in statistical learning. In machine learning, this is simply called the mapping of input to output.
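As a minimal illustration (not taken from this book), the following Python sketch learns a simple linear f from sample (x, y) pairs; the data values and the assumption that f is a straight line are made up purely for demonstration.

```python
import numpy as np

# Toy training data: hypothetical (x, y) pairs assumed for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])

# Learn f as a straight line y = w*x + b by least squares.
w, b = np.polyfit(x, y, deg=1)

def f(x_new):
    """The learnt model: maps an input x to a predicted output y."""
    return w * x_new + b

print(f(6.0))  # prediction for an unseen input
```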
The learning program summarizes the raw data in a model. Formally stated, a model is an explicit description of patterns within the data, in the form of:
1. A mathematical equation
2. Relational diagrams like trees or graphs
3. Logical if/else rules, or
4. Groupings called clusters
In summary, a model can be a formula, a procedure, or a representation that can generate decisions from data. The difference between a pattern and a model is that the former is local and applicable only to certain attributes, while the latter is global and fits the entire dataset. For example, a model can be used to examine whether a given email is spam or not. The point is that the model is generated automatically from the given data.
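For instance, a model expressed as logical if/else rules (the third form above) might look like the sketch below. The rules and thresholds are illustrative assumptions, not the book's; in practice such rules would be induced from labelled data rather than written by hand.

```python
def is_spam(email_text: str) -> bool:
    """A toy rule-based model for spam detection (illustrative rules only)."""
    text = email_text.lower()
    # Rule 1: suspicious promotional phrases
    if "lottery" in text or "free prize" in text:
        return True
    # Rule 2: excessive use of exclamation marks
    if text.count("!") > 5:
        return True
    return False

print(is_spam("Congratulations! You have won a free prize!!!"))  # True
```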
Tom Mitchell, another pioneer of AI, defines machine learning as follows: "A computer program is said to learn from experience E, with respect to task T and some performance measure P, if its performance on T, measured by P, improves with experience E." The important components of this definition are the experience E, the task T, and the performance measure P.
For example, the task T could be detecting an object in an image. The machine can gain knowledge of the object using a training dataset of thousands of images; this is the experience E. The focus is to use this experience E for the task T of object detection. The ability of the system to detect the object is measured by performance measures such as precision and recall. Based on these performance measures, course correction can be done to improve the performance of the system.
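As a small sketch of the performance measure P mentioned above, the snippet below computes precision and recall from hypothetical predicted and true labels (1 = object present, 0 = absent); the label values are assumed for illustration only.

```python
y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # hypothetical ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # hypothetical model predictions

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # of the predicted objects, how many were real
recall = tp / (tp + fn)     # of the real objects, how many were found
print(precision, recall)    # 0.75 0.75
```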
Models of computer systems are equivalent to human experience. Experience is based on data. Humans gain experience by various means: they gain knowledge by rote learning, they observe others and imitate them, they gain much knowledge from teachers and books, and they learn many things by trial and error. Once this knowledge is gained, when a new problem is encountered, humans search for similar past situations, formulate heuristics, and use them for prediction.
In systems, however, experience is gathered by the following steps (a minimal code sketch of this workflow appears after the list):
1. Collection of data.
2. Once data is gathered, abstract concepts are formed out of that data. Abstraction is used to generate concepts. This is equivalent to the human idea of objects; for example, we have some idea of what an elephant looks like.
3. Generalization converts the abstraction into an actionable form of intelligence. It can be viewed as an ordering of all possible concepts. Generalization therefore involves ranking of concepts, inferencing from them, and the formation of heuristics, an actionable aspect of intelligence. Heuristics are educated guesses for all tasks. For example, if one runs on encountering danger, that reaction is the result of experience, or heuristic formation. In machines, it happens the same way.
4. Heuristics normally work! But occasionally they may fail too. This is not the fault of heuristics, as they are just 'rules of thumb'. The course correction is done by taking evaluation measures. Evaluation checks the soundness of the models and carries out course correction, if necessary, to generate better formulations.
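A minimal sketch of how these four steps might look in a typical workflow, assuming scikit-learn is available; the dataset and the choice of a decision tree are illustrative assumptions, not prescribed by the book.

```python
# Assumes scikit-learn is installed; dataset and model choice are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Step 1: Collection of data (here, a bundled sample dataset).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Steps 2 and 3: Abstraction and generalization - the learning algorithm
# summarizes the training data into a model that can act on unseen inputs.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 4: Evaluation and course correction - measure performance and,
# if it is poor, revisit the data, features, or model.
print(accuracy_score(y_test, model.predict(X_test)))
```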


1.3 MACHINE LEARNING IN RELATION TO OTHER FIELDS


Machine learning primarily uses the concepts of Artificial Intelligence, Data Science, and Statistics. It is the result of ideas combined from these diverse fields.

1.3.1 Machine Learning and Artificial Intelligence


Machine learning is an important branch of AI, which is a much broader subject. The aim of AI is to develop intelligent agents. An agent can be a robot, a human, or any other autonomous system.
Initially, the idea of AI was ambitious, that is, to develop intelligent systems like human beings. The focus was on logic and logical inference. The field has seen many ups and downs, and the down periods were called AI winters.
The resurgence in AI happened due to the development of data-driven systems, whose aim is to find the relations and regularities present in data. Machine learning is the sub-branch of AI whose aim is to extract such patterns for prediction. It is a broad field that includes learning from examples as well as other areas such as reinforcement learning. The relationship of AI and machine learning is shown in Figure 1.3. The learnt model can take an unknown instance and generate results.
[Figure 1.3: Relationship of AI with machine learning — deep learning is a subset of machine learning, which in turn is a subset of artificial intelligence.]


Deep learning is a sub-branch of machine learning. In deep learning, models are constructed using neural network technology. Neural networks are based on models of the human neuron. Many neurons form a network, connected through activation functions that trigger further neurons to perform tasks.
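As a minimal sketch of that idea (an assumed example, not the book's code), a single artificial neuron computes a weighted sum of its inputs and passes it through an activation function:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs followed by an activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid activation

print(neuron([0.5, 0.2], [0.8, -0.4], 0.1))  # output lies in (0, 1)
```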

1.3.2 Machine Learning, Data Science, Data Mining, and Data Analytics
Data science is an 'umbrella' term that encompasses many fields. Machine learning starts with data; therefore, data science and machine learning are closely interlinked. Machine learning is a branch of data science. Data science deals with the gathering of data for analysis. It is a broad field that includes:
