Machine Learning
Madan Mohan Malaviya University of Technology, Gorakhpur
By
Sushil Kumar Saroj
Assistant Professor
Email: [email protected]
Syllabus
Unit-I
What is Learning?
“Learning denotes changes in a system that ... enable a system to do the same
task … more efficiently the next time.” - Herbert Simon
Why Machine Learning?
• No human experts
• industrial/manufacturing control
• mass spectrometer analysis, drug design, astronomic discovery
• Black-box human expertise
• face/handwriting/speech recognition
• driving a car, flying a plane
• Rapidly changing phenomena
• credit scoring, financial modeling
• diagnosis, fraud detection
• Need for customization/personalization
• personalized news reader
• movie/book recommendation
Related Fields
• data mining
• control theory
• statistics
• decision theory
• databases
• psychological models
• evolutionary models
• neuroscience
Components of Learning
A learning problem can be broken into the following components: an unknown target function f: X → Y, a set of training examples generated by it, a hypothesis set H, and a learning algorithm that selects a final hypothesis g ∈ H approximating f.
Learning Models
• Geometric Models
• Probabilistic Models
• Logical Models
Learning Models
Geometric Models
In geometric models, features can be described as points in two dimensions (an x- and a y-axis) or in a three-dimensional space (x, y, and z). Even when features are not intrinsically geometric, they can be modelled in a geometric manner (for example, temperature as a function of time can be plotted on two axes). In geometric models, there are two ways to impose similarity: by using geometric concepts such as lines and planes to segment the instance space (as in linear classifiers), or by using the distance between instances as the measure of similarity (as in nearest-neighbour methods, sketched below).
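As a concrete illustration of the distance-based flavour, here is a minimal sketch (with made-up toy data) of a 1-nearest-neighbour classifier that treats instances as points and uses Euclidean distance as the similarity measure:

import math

# Toy training data: each instance is a point (x, y) with a class label.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"),
         ((5.0, 5.0), "B"), ((6.0, 4.5), "B")]

def euclidean(p, q):
    # Straight-line distance between two points in feature space.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_neighbour(x):
    # Predict the label of the closest training point (1-NN).
    return min(train, key=lambda pair: euclidean(pair[0], x))[1]

print(nearest_neighbour((2.0, 2.0)))  # "A" -- closest to the A cluster
print(nearest_neighbour((5.5, 5.0)))  # "B"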
Learning Models
Probabilistic Models
Probabilistic models see features and target variables as random variables. The process of modelling represents and manipulates the level of uncertainty with respect to these variables. There are two types of probabilistic models: predictive and generative.
• Predictive probabilistic models use a conditional probability distribution P(Y | X) from which Y can be predicted given X
• Generative models estimate the joint distribution P(X, Y). Once we know the joint distribution, we can derive any conditional or marginal distribution involving the same variables, as the sketch below illustrates
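As a minimal sketch of the generative idea (with a made-up joint distribution over two binary variables), the conditional P(Y | X) can be derived from the joint P(X, Y) by marginalising and normalising:

# Hypothetical joint distribution P(X, Y) over binary X and Y.
joint = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.20, (1, 1): 0.40,
}

def conditional_y_given_x(x):
    # P(Y=y | X=x) = P(X=x, Y=y) / P(X=x), with P(X=x) the marginal.
    p_x = sum(p for (xv, _), p in joint.items() if xv == x)
    return {y: joint[(x, y)] / p_x for y in (0, 1)}

print(conditional_y_given_x(1))  # {0: 0.333..., 1: 0.666...}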
Learning Models
Logical Models
Logical models use a logical expression to divide the instance space into segments and hence construct grouping models. A logical expression is an expression that returns a Boolean value, i.e., a True or False outcome. Grouping the data with such expressions divides it into segments that are homogeneous with respect to the problem we are trying to solve.
There are two types of logical models: tree models and rule models.
• Rule models consist of a collection of implications, or IF-THEN rules: the 'if-part' defines a segment of the instance space and the 'then-part' defines the behaviour of the model for that segment (see the sketch below)
• Tree models can be seen as a particular type of rule model in which the if-parts of the rules are organised in a tree structure. Both tree models and rule models use the same approach to supervised learning
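A minimal sketch (with a hypothetical credit-scoring rule set) of how a rule model's IF-THEN structure maps segments of the instance space to predictions:

# Hypothetical rule model: an ordered list of IF-THEN rules. Each
# if-part carves out a segment of the instance space; the then-part
# is the model's behaviour for that segment.
rules = [
    (lambda inst: inst["income"] > 50000 and inst["debt"] < 10000, "approve"),
    (lambda inst: inst["income"] > 50000, "review"),
]

def rule_predict(inst, default="reject"):
    for condition, outcome in rules:
        if condition(inst):        # the first matching rule fires
            return outcome
    return default                 # no rule matched

print(rule_predict({"income": 60000, "debt": 5000}))   # approve
print(rule_predict({"income": 60000, "debt": 20000}))  # review
print(rule_predict({"income": 30000, "debt": 0}))      # reject

A tree model would organise the same if-parts as nested branches (income first, then debt) rather than as a flat ordered list.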
Learning Models
Grouping Models
Grouping models divide the instance space into a finite number of segments, determined at training time, and fit a simple local model within each segment; tree models are a typical example.
Learning Models
Grading Models
Grading models, in contrast, form a single global model over the whole instance space and can make arbitrarily fine distinctions between instances; linear classifiers and support vector machines are typical examples.
Types of Learning
• Supervised learning
• Unsupervised learning
• Reinforcement learning
Types of Learning
Supervised learning
In this type of learning, the dataset on which the machine is trained consists of labelled data, i.e., it contains both the input parameters and the required output.
Supervised machine learning algorithms can be broadly divided into two types: classification and regression.
• Classification: a supervised learning problem that involves predicting a class label
• Regression: a supervised learning problem that involves predicting a numerical label
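A minimal sketch of both settings (assuming scikit-learn is installed; the tiny datasets are made up):

from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: predict a class label from features.
X_cls = [[0.5], [1.0], [3.0], [3.5]]
y_cls = [0, 0, 1, 1]
clf = LogisticRegression().fit(X_cls, y_cls)
print(clf.predict([[2.8]]))   # -> [1], a class label

# Regression: predict a numerical label from features.
X_reg = [[1.0], [2.0], [3.0]]
y_reg = [2.0, 4.0, 6.0]       # generated by y = 2x
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[4.0]]))   # -> [8.], a numeric value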
Types of Learning
Unsupervised learning
Unlike supervised learning algorithms, which deal with labelled training data, unsupervised machine learning algorithms train on unlabelled data. Data is clustered into groups on the basis of similarities between the variables.
• Clustering: Unsupervised learning problem that involves finding groups
in data
• Density estimation: Unsupervised learning problem that involves
summarizing the distribution of data
• Visualization: Unsupervised learning problem that involves creating plots
of data
• Projection: Unsupervised learning problem that involves creating lower-
dimensional representations of data
Examples: k-means clustering (sketched below), neural networks
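A minimal k-means sketch (assuming scikit-learn is installed; the points are made up): the algorithm groups unlabelled points purely by similarity.

from sklearn.cluster import KMeans

X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # one blob near (1, 1)
     [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]]   # another blob near (5, 5)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # e.g. [0 0 0 1 1 1] -- two groups recovered
print(km.cluster_centers_)  # centroids near (1, 1) and (5, 5)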
Types of Learning
Reinforcement learning
Reinforcement learning is a type of machine learning in which the machine must determine the ideal behaviour within a specific context in order to maximize its reward. It works on the reward-and-punishment principle: for every decision the machine takes, it is either rewarded or punished, and it thereby learns whether the decision was correct. This is how the machine learns to take the correct decisions to maximize its reward in the long run.
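A minimal sketch of the reward-and-punishment principle (a made-up two-armed bandit, not a full reinforcement-learning algorithm): the agent tries actions, receives a reward or not, and updates value estimates that steer it toward the better action.

import random

random.seed(0)
true_reward_prob = {"left": 0.3, "right": 0.8}  # hidden from the agent
value = {"left": 0.0, "right": 0.0}             # learned value estimates
counts = {"left": 0, "right": 0}

for step in range(1000):
    # Explore occasionally; otherwise exploit the current best estimate.
    if random.random() < 0.1:
        action = random.choice(["left", "right"])
    else:
        action = max(value, key=value.get)
    reward = 1 if random.random() < true_reward_prob[action] else 0
    counts[action] += 1
    # Incremental average: the estimate moves toward observed rewards.
    value[action] += (reward - value[action]) / counts[action]

print(value)  # estimates roughly approach 0.3 and 0.8; "right" wins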
Error Measures
Error measures are a tool in ML that quantify the question "how wrong was our estimation?". An error measure is a function that compares the output of a learned hypothesis with the output of the real target function; in practice, this means comparing the prediction of our model with the real value in the data. An error measure is expressed as E(h, f), where h ∈ H is a hypothesis and f is the target function. E is almost always defined pointwise: a pointwise error measure e() computes the error at individual points, e(h(x), f(x)), and these are aggregated over the data.
Examples:
Squared error: e(h(x), f(x)) = (h(x) − f(x))²
Binary error: e(h(x), f(x)) = ⟦h(x) ≠ f(x)⟧ (1 for a wrong classification, 0 otherwise)
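The two pointwise measures translate directly into code (a minimal sketch with made-up predictions):

def squared_error(h_x, f_x):
    # e(h(x), f(x)) = (h(x) - f(x))^2, for numeric targets.
    return (h_x - f_x) ** 2

def binary_error(h_x, f_x):
    # e(h(x), f(x)) = [[h(x) != f(x)]]: 1 on a misclassification, else 0.
    return 1 if h_x != f_x else 0

# Averaging the pointwise errors over a dataset gives the overall E(h, f).
preds, targets = [1, 0, 1, 1], [1, 1, 1, 0]
print(sum(binary_error(p, t) for p, t in zip(preds, targets)) / len(preds))  # 0.5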
• Training set: here you have the complete training dataset; you can extract features and train a model to fit the data
• Validation set: this is crucial for choosing the right parameters for your estimator. We can divide the training set into a train set and a validation set; based on the validation results, the model can be tuned (for instance, by changing parameters or classifiers)
• Testing set: once the model has been obtained on the training set, it is used to make predictions here
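A minimal splitting sketch (assuming scikit-learn is installed; the 60/20/20 proportions are an illustrative choice):

from sklearn.model_selection import train_test_split

X = list(range(100))      # stand-in features
y = [v % 2 for v in X]    # stand-in labels

# Split off 20% as the test set, then 25% of the rest as validation.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20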
Theory of Generalization
Over-Training
Preventing Over-training
Generalization Bound
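A standard form of such a bound is the VC generalization bound: with probability at least 1 − δ over a training set of N i.i.d. examples, every hypothesis h ∈ H satisfies

E_out(h) ≤ E_in(h) + sqrt((8/N) · ln(4 · m_H(2N) / δ))

where E_in(h) is the in-sample (training) error, E_out(h) is the out-of-sample error, and m_H is the growth function of the hypothesis set H. The bound ties generalization to the complexity of H: the richer the hypothesis set, the larger the gap between training and out-of-sample error can be.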
Overfitting
Bias
Bias is the difference between the predicted value and the expected value. Mathematically, let the input variables be X and the target variable be Y. We map the relationship between the two using a function f. Therefore,
Y = f(X) + e
Here e is the error term, which is normally distributed. The aim of our model f'(X) is to predict values as close to f(X) as possible. The bias of the model is:
Bias[f'(X)] = E[f'(X) − f(X)]
As explained above, when the model over-generalizes, i.e., when there is a high bias error, the result is a very simplistic model that does not capture the variations in the data well. Since such a model does not learn the training data well, this is called underfitting.
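A minimal numeric illustration of this definition (with a made-up target f(x) = x² and a deliberately simplistic model that always predicts 0):

import random

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(100000)]
f = lambda x: x * x        # true target function
f_hat = lambda x: 0.0      # very simplistic (high-bias) model

# Bias[f'(X)] = E[f'(X) - f(X)], estimated by averaging over samples.
bias = sum(f_hat(x) - f(x) for x in xs) / len(xs)
print(bias)  # close to -1/3: the model systematically undershoots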
Variance
Contrary to bias, variance occurs when the model takes the fluctuations in the data, i.e., the noise, into account as well. So what happens when our model has high variance? The model treats the noise as something to learn from: it learns too much from the training data, so much so that, when confronted with new (testing) data, it is unable to predict accurately. Mathematically, the variance error in the model is:
Variance[f'(X)] = E[f'(X)²] − E[f'(X)]²
Since, in the case of high variance, the model learns too much from the training data, this is called overfitting.
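A minimal simulation of this effect (made-up data from y = 2x + noise): training the same model class on many independently drawn training sets and watching its prediction at a fixed point fluctuate.

import random

random.seed(0)

def sample_training_set(n=20):
    # Noisy samples from the underlying relationship y = 2x + e.
    xs = [random.uniform(0, 1) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 0.5) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    # Ordinary least-squares slope and intercept.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Prediction at x = 0.5 across 500 independent training sets.
preds = []
for _ in range(500):
    xs, ys = sample_training_set()
    slope, intercept = fit_line(xs, ys)
    preds.append(slope * 0.5 + intercept)

mean = sum(preds) / len(preds)
variance = sum(p * p for p in preds) / len(preds) - mean ** 2  # E[X^2] - E[X]^2
print(mean, variance)  # mean near the true value 1.0; variance > 0 from noise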
Learning curves
• Learning curves are plots that show changes in learning performance over time, i.e., as a function of experience
• Learning curves of model performance on the train and validation datasets can be used to diagnose an underfit, overfit, or well-fit model
• Learning curves of model performance can also be used to diagnose whether the train or validation datasets are relatively representative of the problem domain
• Generally, a learning curve is a plot that shows time or experience on the x-axis and learning or improvement on the y-axis
Learning curves are also effective tools for monitoring the performance of workers exposed to a new task: they provide a mathematical representation of the learning process that takes place as the task is repeated.
Learning curves
• Train Learning Curve: Learning curve calculated from the training
dataset that gives an idea of how well the model is learning
• Validation Learning Curve: Learning curve calculated from a hold-out
validation dataset that gives an idea of how well the model is
generalizing
• Optimization Learning Curves: Learning curves calculated on the
metric by which the parameters of the model are being optimized, e.g.
loss
• Performance Learning Curves: Learning curves calculated on the
metric by which the model will be evaluated and selected, e.g. accuracy
• There are three common dynamics that you are likely to observe in
learning curves; they are:
• Underfit
• Overfit
• Good Fit
Learning curves
• Overfitting refers to a model that has learned the training dataset too
well, including the statistical noise or random fluctuations in the training
dataset.
• A plot of learning curves shows overfitting if:
• The plot of training loss continues to decrease with experience
• The plot of validation loss decreases to a point and begins increasing again
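A minimal sketch of reading this signature off a pair of curves (the loss values are made up to mimic the dynamics above):

# Made-up loss curves: training keeps improving, validation turns back up.
train_loss = [0.90, 0.60, 0.40, 0.28, 0.20, 0.15, 0.11, 0.08]
val_loss   = [0.95, 0.70, 0.50, 0.42, 0.40, 0.43, 0.48, 0.55]

best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
train_still_improving = train_loss[-1] < train_loss[best_epoch]

if best_epoch < len(val_loss) - 1 and train_still_improving:
    print("Overfitting: validation loss bottomed out at epoch", best_epoch,
          "while training loss kept decreasing.")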
Learning curves
• A good fit is the goal of the learning algorithm and exists between an
overfit and underfit model
• A good fit is identified by a training and validation loss that decreases to
a point of stability with a minimal gap between the two final loss values
• A plot of learning curves shows a good fit if:
• The plot of training loss decreases to a point of stability
• The plot of validation loss decreases to a point of stability and has a small gap with
the training loss