Machine Learning-2
6th Semester B.Tech. (CSE)
Course Code: BTCS-T-PC-022
Nayan Ranjan Paul
Department of CSE
Silicon Institute of Technology
Syllabus

Module I (Topic: Book/Chapter)
– Overview of supervised learning: T1/Ch. 2
– K-nearest neighbour: T1/Ch. 2.3.2, R2/Ch. 8.2
– Multiple linear regression: T1/Ch. 3.2.3
– Shrinkage methods (Ridge regression, Lasso regression): T1/Ch. 3.4
– Logistic regression: T1/Ch. 4.3
– Linear Discriminant Analysis: T1/Ch. 4.4
– Feature selection: T1/Ch. 5.3

Module II

Books
● T1: T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning – Data Mining, Inference and Prediction, 2nd Edition, Springer
● The goal of any supervised machine learning algorithm is to best estimate the mapping function (f) for the output variable (Y) given the input data (X).
● The mapping function is often called the target function because it is the function that a given supervised machine learning algorithm aims to approximate.
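As a minimal sketch of this idea (the data, model choice, and variable names here are illustrative, not from the slides), a learner estimates f from (X, Y) pairs and then applies the estimate to unseen inputs:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))            # input data (X)
y = 3.0 * X[:, 0] + rng.normal(0, 1, size=100)   # output (Y) = f(X) + noise

model = LinearRegression().fit(X, y)             # estimate the mapping f: X -> Y
print(model.predict([[5.0]]))                    # predict for an unseen input
```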
Overview
● The learned model is then used to predict on unseen test data.
● However, if the learned model is not accurate, it can make prediction errors.
● In machine learning, these errors will always be present, as there is always a slight difference between the model's predictions and the actual values.
● The main aim of ML analysts is to reduce these errors in order to get more accurate results.
Errors in Machine Learning
● In machine learning, an error is a measure of how accurately an algorithm can make predictions for a previously unseen dataset.
● On the basis of these errors, the machine learning model that performs best on the particular dataset is selected.
● There are mainly two types of errors in machine learning:
– Irreducible Error
– Reducible Error
Errors in Machine Learning
● Irreducible error - It cannot be reduced regardless of which algorithm is used. It is the error introduced by factors such as unknown variables that influence the mapping of the input variables to the output variable.
● Reducible Error - These errors can be reduced to improve the model accuracy. Such errors can be further classified into two categories:
– Bias
– Variance
Bias
● In general, a machine learning model analyses the data, finds patterns in it, and makes predictions.
● While training, the model learns these patterns in the dataset and applies them to test data for prediction.
● While making predictions, a difference occurs between the values predicted by the model and the actual/expected values; this difference is known as bias error, or error due to bias.
Bias
● Bias can be defined as the inability of a machine learning algorithm such as Linear Regression to capture the true relationship between the data points.
● Bias reflects the simplifying assumptions made by a model to make the target function easier to learn.
● A model with high bias pays very little attention to the training data and oversimplifies the model. It always leads to high error on training and test data.
Bias
● A model has either:
– Low Bias: A low-bias model makes fewer assumptions about the form of the target function.
– High Bias: A high-bias model makes more assumptions and becomes unable to capture the important features of the dataset. A high-bias model also cannot perform well on new data.
Bias
● High bias (an overly simple model) can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
● Examples of low-bias machine learning algorithms: Decision Trees, k-Nearest Neighbors and SVM.
● Examples of high-bias machine learning algorithms: Linear Regression, Linear Discriminant Analysis and Logistic Regression.
Bias
● Ways to reduce High Bias
– High bias mainly occurs due to an overly simple model. Below are some ways to reduce it:
● Increase the input features, as the model is underfitted.
● Decrease the regularization term.
● Use more complex models, such as including some polynomial features (see the sketch below).
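A minimal sketch of the last point (the dataset and degree choice are assumptions for illustration): a straight line cannot fit a quadratic target, but adding polynomial features removes that bias.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=200)   # nonlinear target

line = LinearRegression().fit(X, y)               # high bias: a straight line
curve = make_pipeline(PolynomialFeatures(degree=2),
                      LinearRegression()).fit(X, y)
print(line.score(X, y), curve.score(X, y))        # R^2 jumps once bias drops
```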
Variance
● Variance is the amount by which the prediction would change if different training data were used.
● Variance is an error arising from sensitivity to small fluctuations in the training set.
● Variance is the variability of the model's prediction for a given data point, i.e. a value which tells us the spread of our predictions.
Variance
● Ideally, a model should not vary too much from one training dataset to another, which means the algorithm should be good at understanding the hidden mapping between input and output variables.
● Variance is either:
– Low variance: there is a small variation in the prediction of the target function with changes in the training dataset.
– High variance: there is a large variation in the prediction of the target function with changes in the training dataset.
Variance
● A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn't seen before. As a result, such models perform very well on training data but have high error rates on test data.
● Generally, nonlinear machine learning algorithms that have a lot of flexibility have high variance. For example, decision trees have high variance, which is even higher if the trees are not pruned before use.
● Examples of low-variance machine learning algorithms: Linear Regression, Linear Discriminant Analysis and Logistic Regression.
● Examples of high-variance machine learning algorithms: Decision Trees, k-Nearest Neighbors and Support Vector Machines.
Variance
● With high variance, the model learns too much from the dataset, which leads to overfitting of the model.
● A model with high variance has the following problems:
– A high variance model leads to overfitting.
– It increases model complexity.
● Ways to Reduce High Variance (see the sketch below):
– Reduce the input features or number of parameters, as the model is overfitted.
– Do not use an overly complex model.
– Increase the training data.
– Increase the regularization term.
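A sketch of the last remedy (the data shape and alpha values are assumptions for illustration): increasing the regularization term in Ridge regression shrinks coefficients and reduces sensitivity to the particular training sample.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))                     # few samples, many features
y = X[:, 0] + rng.normal(0, 0.5, size=50)

for alpha in (0.01, 1.0, 100.0):                  # larger alpha = more shrinkage
    score = cross_val_score(Ridge(alpha=alpha), X, y, cv=5).mean()
    print(alpha, round(score, 3))                 # CV R^2 vs. regularization strength
```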
Underfitting vs Overfitting
● Underfitting
– In supervised learning, underfitting happens when a model is unable to capture the underlying pattern of the data.
– It happens when we have too little data to build an accurate model, or when we try to fit a linear model to nonlinear data.
– Such models are too simple to capture the complex patterns in the data, e.g. linear and logistic regression.
Underfitting vs Overfitting
● Overfitting
– Overfitting means that the error on the training data is very low, but the error on new instances is high.
– In supervised learning, overfitting happens when our model captures the noise along with the underlying pattern in the data.
– Such models are very complex, like decision trees, and are prone to overfitting.
Underfitting vs Overfitting (Linear Regression)
[Figure: three regression fits of increasing complexity; only the annotations below were recoverable.]
● Underfitting condition: training error is high, test error is high; high bias.
● Just right condition: training error is low, test error is moderate; low bias, low variance.
● Overfitting condition: training error is lowest, test error is high; high variance.
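A minimal sketch reproducing the three regimes above (the dataset and the specific degrees 1/4/15 are assumptions for illustration): as the polynomial degree grows, training error keeps falling while test error first falls and then rises.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.2, size=60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):                         # underfit / just right / overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # training error
          mean_squared_error(y_te, model.predict(X_te)))   # test error
```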
Underfitting vs Overfitting (Classification)
● If the model is very simple with fewer parameters, it may have low variance and high bias.
● Whereas, if the model has a large number of parameters, it will have high variance and low bias.
● So it is necessary to strike a balance between bias and variance errors; this balance between the bias error and the variance error is known as the Bias-Variance trade-off.
● A model with an optimal balance of bias and variance neither overfits nor underfits.
Bias variance trade off
● For accurate predictions, an algorithm needs both low variance and low bias. But this is not fully achievable, because bias and variance are related to each other.
● The Bias-Variance trade-off is a central issue in supervised learning.
● Ideally, we need a model that accurately captures the regularities in the training data and simultaneously generalizes well to unseen data.
Bias variance trade off
● Unfortunately, doing both simultaneously is not possible.
● A high-variance algorithm may perform well on training data, but it may overfit to noisy data.
● Whereas a high-bias algorithm generates an overly simple model that may not even capture important regularities in the data.
● So we need to find a sweet spot between bias and variance to build an optimal model: the Bias-Variance trade-off is about finding this balance between bias and variance errors.
Model Assessment and Selection
[The bodies of the following slides (figures and equations) did not survive extraction; only their titles remain.]
Types of Errors
Variance
Bias
Expected Prediction Error
Mean Squared Error (MSE)
Cross-Validation
Types of Cross Validation
Bootstrap Method
The Bootstrap: An Example
The Bootstrap
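Since the cross-validation slide bodies were lost, here is a minimal sketch of the core idea (the data and model choices are assumptions for illustration): in k-fold cross-validation, each fold serves once as validation data while the remaining folds train the model, and the fold errors are averaged into an estimate of test error.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, size=100)

errors = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True,
                                random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    errors.append(mean_squared_error(y[val_idx],
                                     model.predict(X[val_idx])))
print(np.mean(errors))                            # CV estimate of test MSE
```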
Q1. If we obtain a bootstrap sample from a set of n observations, what is the probability that the first bootstrap observation is not the jth observation from the original sample?
Ans - The probability that the first bootstrap observation is not the jth observation from the original sample is 1 - 1/n. This is because there are n observations, and we are equally likely to pick each of them when taking our first bootstrap observation.
Q2. What is the probability that the jth observation is not in the bootstrap sample (of size n)?
Ans - Each of the n draws is made with replacement, so each draw misses observation j with probability (n - 1)/n. The number of (ordered) bootstrap samples that never contain observation j is (n - 1)^n out of n^n in total. Therefore, the probability that an observation is not in the bootstrap sample is ((n - 1)/n)^n = (1 - 1/n)^n.
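A quick simulation (a sketch; the sample sizes and trial count are assumptions) confirms the answer to Q2: since (1 - 1/n)^n approaches 1/e as n grows, a given observation is absent from a bootstrap sample roughly 36.8% of the time.

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 2_000
for n in (10, 100, 10_000):
    # Draw n indices with replacement; check whether index 0
    # (standing in for the jth observation) is missed entirely.
    misses = sum(0 not in rng.integers(0, n, size=n) for _ in range(trials))
    print(n, (1 - 1 / n) ** n, misses / trials)   # theory vs. simulation
```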
Model Selection Methods
[Slide bodies not recovered; only the title remains.]
Q1. Define AIC and BIC. How will you use these to choose the best model? What type of model do AIC and BIC choose as the data set size approaches infinity?
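For working the question, the standard definitions (not recovered from the slides; here L-hat is the maximized likelihood, k the number of parameters, and n the sample size) are:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

Lower values are better in both cases. Because BIC's penalty k ln n grows with n, BIC tends to select the true, more parsimonious model as n approaches infinity (assuming the true model is among the candidates), whereas AIC may retain extra parameters.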
Classification Performance Measures
[Slide bodies not recovered; only the titles remain.]
Confusion Matrix / Contingency Table based Measures
Accuracy and Recall
F-measure
Measuring performance of a Multiclass Classifier: Example
Measuring performance of a Binary Classifier
Binary Classification: Positive and Negative Class
Binary Classification: Assessment Measures
Measuring performance of a Binary Classifier: Example
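As a quick reference for the practice questions below, a minimal sketch of the confusion-matrix measures computed from raw counts (the helper functions are illustrative, not from the slides):

```python
def precision(tp, fp):
    return tp / (tp + fp)       # fraction of predicted positives that are correct

def recall(tp, fn):
    return tp / (tp + fn)       # fraction of actual positives that are found

def f_measure(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)  # harmonic mean of precision and recall

# Q1 below: 8 predicted dogs, 5 correct (tp=5, fp=3), 12 dogs total (fn=7).
print(precision(5, 3), recall(5, 7))   # 0.625 0.4166...
```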
Practice questions
● Q1. Suppose a computer program for recognizing dogs in photos identifies eight dogs in a picture containing 12 dogs and some cats. Of the eight dogs identified, five are actually dogs while the rest are cats. Compute the precision and recall of the computer program.
● Q2. Let there be 10 balls (6 white and 4 red) in a box, and let it be required to pick out the red balls. Suppose we pick up 7 balls as red balls, of which only 2 are actually red. What are the values of precision and recall in picking red balls?
Practice questions
● Q3. Suppose 10,000 patients get tested for flu, out of which 9,000 are actually healthy and 1,000 are actually sick. For the sick people the test was positive for 620 and negative for 380. For the healthy people, the same test was positive for 180 and negative for 8,820. Construct a confusion matrix and compute the accuracy, precision and recall.
● Q4. A binary classifier was evaluated using a set of 1,000 test examples in which 50% of all examples are negative. It was found that the classifier has 60% sensitivity and 70% accuracy. Write the confusion matrix. Using the confusion matrix, compute the classifier's precision, F-measure, and specificity.
Practice questions
● Q5. A database contains 80 records on a particular topic. A search was conducted on that topic and 60 records were retrieved. Out of the 60 records retrieved, 45 were relevant. Calculate the precision and recall scores for the search.
● Sol:
Using the following designations:
A = the number of relevant records retrieved,
B = the number of relevant records not retrieved, and
C = the number of irrelevant records retrieved.
In this example,
A = 45, B = 35 (80 - 45) and C = 15 (60 - 45).
Recall = A / (A + B) = 45/80 = 56.25%
Precision = A / (A + C) = 45/60 = 75%
ROC Analysis
[Slide bodies (ROC plots and algorithm steps) not recovered; only the titles remain.]
ROC
Random Classifier
ROC/AUC Algorithm
ROC Analysis: Example
ROC Plot and AUC: Trapezoid Region
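Since the ROC slides lost their figures, here is a hedged sketch of the computation the last title refers to (the function and test data are illustrative; it assumes distinct scores, so ties would need grouping): sweep the score threshold from high to low, record the (FPR, TPR) points, and integrate the curve with the trapezoid rule.

```python
import numpy as np

def roc_auc(y_true, scores):
    order = np.argsort(-np.asarray(scores))       # rank by descending score
    y = np.asarray(y_true)[order]
    tpr = np.cumsum(y) / y.sum()                  # true positive rate per threshold
    fpr = np.cumsum(1 - y) / (1 - y).sum()        # false positive rate per threshold
    tpr = np.concatenate(([0.0], tpr))            # start the curve at (0, 0)
    fpr = np.concatenate(([0.0], fpr))
    return np.trapz(tpr, fpr)                     # area via the trapezoid rule

y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(y_true, scores))                    # 8 of 9 pos/neg pairs ranked correctly -> 0.889
```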