
Machine Learning

6th Semester B.Tech. (CSE)
Course Code: BTCS-T-PC-022
Nayan Ranjan Paul
Department of CSE
Silicon Institute of Technology
Syllabus
Module I
– Overview of supervised learning: T1/Ch. 2
– K-nearest neighbour: T1/Ch. 2.3.2, R2/Ch. 8.2
– Multiple linear regression: T1/Ch. 3.2.3
– Shrinkage methods (Ridge regression, Lasso regression): T1/Ch. 3.4
– Logistic regression: T1/Ch. 4.3
– Linear Discriminant Analysis: T1/Ch. 4.4
– Feature selection: T1/Ch. 5.3

Module II
– Bias, Variance, and model complexity: T1/Ch. 7.2
– Bias-variance trade off: T1/Ch. 7.3
– Bayesian approach and BIC: T1/Ch. 7.7
– Cross-validation: T1/Ch. 7.10
– Bootstrap methods: T1/Ch. 7.11
– Performance of Classification algorithms (Confusion Matrix, Precision, Recall and ROC Curve): R7

Module III
– Generative model for discrete data
– Bayesian concept learning: R2/Ch. 6
– Naive Bayes classifier: T1/Ch. 6.6.3, R2/Ch. 6
– SVM for classification: T1/Ch. 12.3.1, R3/Ch. 1.5, T2/Ch. 6
– Reproducing Kernels: R5, R6, T2/Ch. 6
– SVM for regression: T1/Ch. 12.3.6, R3/Ch. 1.6, T2/Ch. 6
– Regression and classification trees: T1/Ch. 9.2.2, 9.2.3
– Random forest: T1/Ch. 15

Module IV
– Clustering (K-means, spectral clustering): T1/Ch. 13.2.1
– Feature Extraction (Principal Component Analysis (PCA)): R1/Ch. 10.2
– Kernel based PCA: T1/Ch. 14.5.4
– Independent Component Analysis (ICA): R4/Ch. 12.6, CS229
– Non-negative matrix factorization: T1/Ch. 14.6
– Mixture of Gaussians: R4/Ch. 11.2.1, CS229
– Expectation Maximization (EM) algorithm: R4/Ch. 11.4, CS229

Module V
– Boosting methods (exponential loss and AdaBoost): T1/Ch. 10.4
– Numerical Optimization via gradient boosting: T1/Ch. 10.10
– Introduction to Reinforcement Learning: T3/Ch. 18.1
– Elements of Reinforcement Learning: T3/Ch. 18.3
– Single State Case (K-Armed Bandit): T3/Ch. 18.2
– Model-Based Learning (Value Iteration, Policy Iteration): T3/Ch. 18.4

Books
– T1: T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd Edition, Springer, 2009.
– T2: S. Haykin, Neural Networks and Learning Machines, 3rd Edition, Pearson Education, 2009.
– T3: E. Alpaydin, Introduction to Machine Learning, 2nd Edition, Prentice Hall of India, 2010.
– R1: G. James, D. Witten, T. Hastie and R. Tibshirani, An Introduction to Statistical Learning with Applications in R, 2nd Edition, Springer, 2013.
– R2: T. M. Mitchell, Machine Learning, 1st Edition, McGraw-Hill Education, 2013.
– R3: B. Scholkopf and A. J. Smola, Learning with Kernels, MIT Press, 2002.
– R4: K. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
– R5: N. Aronszajn, "Theory of Reproducing Kernels", Transactions of the American Mathematical Society, 68 (1950): 337–404.
– R6: S. Saitoh, Theory of Reproducing Kernels and its Applications, Longman Scientific & Technical, 1988.
– R7: https://www.kdnuggets.com/2020/01/guide-precision-recall-confusion-matrix.html
Module - II
Overview

In supervised machine learning, an algorithm learns a model from training data.


The goal of any supervised machine learning algorithm is to best estimate the
mapping function (f) for the output variable (Y) given the input data (X).


The mapping function is often called the target function because it is the function
that a given supervised machine learning algorithm aims to approximate.
Overview

The learned model is then used to make predictions on unseen test data.


However, if the learned model is not accurate, it makes prediction errors.


In machine learning, these errors are always present to some degree, as there is always a slight
difference between the model's predictions and the actual values.


The main aim of ML analysts is to reduce these errors in order to get more
accurate results.
Errors in Machine Learning

In machine learning, error measures how closely an algorithm's predictions match the true
values on previously unseen data.


On the basis of these errors, we select the machine learning model that performs
best on the particular dataset.


There are mainly two types of errors in machine learning, which are:
– Irreducible Error
– Reducible Error
Errors in Machine Learning

Irreducible error - It cannot be reduced regardless of which algorithm is used. It is
the error introduced by factors such as unknown variables that influence the
mapping of the input variables to the output variable.


Reducible Error - These errors can be reduced to improve the model accuracy.
Such errors can further be classified into two categories.
– Bias
– Variance
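
For squared-error loss, these pieces combine in the standard decomposition of the expected prediction error at a point x0 (a sketch in LaTeX notation, assuming Y = f(X) + \varepsilon with Var(\varepsilon) = \sigma^2; this anticipates the "Expected Prediction Error" slides later in this module):

E[(Y - \hat{f}(x_0))^2] = [E\,\hat{f}(x_0) - f(x_0)]^2 + E[(\hat{f}(x_0) - E\,\hat{f}(x_0))^2] + \sigma^2
                        = \text{Bias}^2(\hat{f}(x_0)) + \text{Var}(\hat{f}(x_0)) + \sigma^2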
Bias

In general, a machine learning model analyses the data, finds patterns in it and
makes predictions.


While training, the model learns these patterns in the dataset and applies them to
test data for prediction.


While making predictions, a difference occurs between the values predicted by the
model and the actual/expected values; this difference is known as bias error, or
error due to bias.
Bias

It can be defined as an inability of machine learning algorithms such as Linear
Regression to capture the true relationship between the data points.


Bias reflects the simplifying assumptions made by a model to make the target function
easier to learn.


A model with high bias pays very little attention to the training data and
oversimplifies the target function. This leads to high error on both training and test data.
Bias

A model has either:
– Low Bias: A low bias model will make fewer assumptions about the form of
the target function.

– High Bias: A model with a high bias makes more assumptions, and the model
becomes unable to capture the important features of our dataset. A high bias
model also cannot perform well on new data.
Bias

High bias (an overly simple model) can cause an algorithm to miss the relevant relations
between features and target outputs (underfitting).


Examples of low-bias machine learning algorithms : Decision Trees, k-Nearest
Neighbors and SVM.


Examples of high-bias machine learning algorithms : Linear Regression, Linear
Discriminant Analysis and Logistic Regression.
Bias

Ways to reduce High Bias
– High bias mainly occurs when the model is too simple. Some ways to reduce high bias (see the sketch after this list):

– Increase the number of input features, since the model is underfitting.

– Decrease the regularization term.

– Use more complex models, for example by including some polynomial features.
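
As a concrete illustration of the last point, a minimal sketch (assuming NumPy and scikit-learn are available; the synthetic data and names are only for illustration) that lowers the bias of an underfit linear model by adding polynomial features:

```python
# Sketch: reducing high bias by adding polynomial features (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)   # nonlinear ground truth

X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

# Plain linear model: too simple for sin(x), so it underfits (high bias).
simple = LinearRegression().fit(X_train, y_train)

# Same linear model on degree-5 polynomial features: more flexible, lower bias.
flexible = make_pipeline(PolynomialFeatures(degree=5), LinearRegression())
flexible.fit(X_train, y_train)

for name, model in [("linear", simple), ("poly-5", flexible)]:
    print(name,
          "train MSE:", round(mean_squared_error(y_train, model.predict(X_train)), 3),
          "test MSE:", round(mean_squared_error(y_test, model.predict(X_test)), 3))
```

Both training and test error drop for the more flexible model, which is exactly the signature of reduced bias on an underfit problem.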
Variance

Variance is the amount by which the prediction would change if a different training
dataset were used.


The variance is an error from sensitivity to small fluctuations in the training set.


Variance is the variability of the model's prediction for a given data point, i.e. a value
which tells us the spread of our predictions.
Variance

Ideally, a model should not vary too much from one training dataset to another,
which means the algorithm should be good at capturing the hidden mapping
between input and output variables.


Variance is either:
– Low variance means there is a small variation in the prediction of the target
function with changes in the training data set.

– High variance shows a large variation in the prediction of the target function
with changes in the training dataset.
Variance

A model with high variance pays a lot of attention to the training data and does not
generalize to data it has not seen before. As a result, such models
perform very well on training data but have high error rates on test data.


Generally, nonlinear machine learning algorithms that have a lot of flexibility have
high variance. For example, decision trees have high variance, which is even
higher if the trees are not pruned before use.


Examples of low-variance machine learning algorithms : Linear Regression,
Linear Discriminant Analysis and Logistic Regression.


Examples of high-variance machine learning algorithms include: Decision Trees,
k-Nearest Neighbors and Support Vector Machines.
Variance

With high variance, the model learns too much from the training dataset, which leads to
overfitting of the model.

A model with high variance has the following problems:
– A high variance model leads to overfitting.
– Increased model complexity.

Ways to Reduce High Variance (a short sketch follows this list):
– Reduce the number of input features or parameters, since the model is overfitted.
– Do not use an overly complex model.
– Increase the training data.
– Increase the regularization term.
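
To illustrate the regularization point, a minimal sketch (again assuming scikit-learn; the synthetic data is hypothetical) showing how increasing the ridge penalty reduces the test error of an overly flexible model trained on little data:

```python
# Sketch: reducing high variance by increasing the regularization strength (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(60, 1))                 # deliberately small training set
y = np.sin(3 * X).ravel() + rng.normal(scale=0.3, size=60)
X_test = rng.uniform(-1, 1, size=(200, 1))
y_test = np.sin(3 * X_test).ravel() + rng.normal(scale=0.3, size=200)

# A degree-15 polynomial is very flexible; with little data it overfits (high variance).
for alpha in [0.001, 0.1, 10.0]:                     # larger alpha = stronger regularization
    model = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=alpha))
    model.fit(X, y)
    print(f"alpha={alpha}",
          "train MSE:", round(mean_squared_error(y, model.predict(X)), 3),
          "test MSE:", round(mean_squared_error(y_test, model.predict(X_test)), 3))
```

Increasing the amount of training data (for example 600 points instead of 60) would have a similar variance-reducing effect.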
Underfitting vs Overfitting

Underfitting
– In supervised learning, underfitting happens when a model is unable to capture
the underlying pattern of the data.

– These models usually have high bias and low variance.

– It happens when we have too little data to build an accurate model,
or when we try to fit a linear model to nonlinear data.

– Such models, for example linear and logistic regression, are too simple to capture
the complex patterns in the data.
Underfitting vs Overfitting

Overfitting
– Overfitting means that the error on the training data is very low, but the error on new
instances is high.

– In supervised learning, overfitting happens when our model captures the noise
along with the underlying pattern in the data.

– It happens when we train our model too much on a noisy dataset.

– These models have low bias and high variance.

– Such models are very complex, for example decision trees, which are prone to
overfitting.
Underfitting vs Overfitting (Linear Regression)

Underfit model:
– Training error is high
– Test error is high
– Underfitting condition
– High bias

Just right model:
– Training error is low
– Test error is moderate
– Just right condition
– Low bias
– Low variance

Overfit model:
– Training error is lowest
– Test error is high
– Overfitting condition
– High variance
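
The same three regimes can be reproduced numerically; a small NumPy-only sketch (synthetic data, with polynomial degrees 1, 3 and 15 chosen only for illustration) that prints training versus test error for an underfit, a reasonable, and an overfit polynomial:

```python
# Sketch: underfit vs just-right vs overfit polynomial regression (NumPy only).
import numpy as np

rng = np.random.default_rng(42)
x_train = rng.uniform(-1, 1, 30)
x_test = rng.uniform(-1, 1, 200)
true_f = lambda x: np.sin(3 * x)                    # unknown target function
y_train = true_f(x_train) + rng.normal(scale=0.2, size=x_train.size)
y_test = true_f(x_test) + rng.normal(scale=0.2, size=x_test.size)

for degree in (1, 3, 15):                           # underfit, just right, overfit
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares polynomial fit
    mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree:2d}: train MSE = {mse(x_train, y_train):.3f}, "
          f"test MSE = {mse(x_test, y_test):.3f}")
```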
Underfitting vs Overfitting (Classification)

Model 1:
– Training error: 1%
– Test error: 20%
– Low bias
– High variance

Model 2:
– Training error: 25%
– Test error: 26%
– High bias
– High variance

Model 3:
– Training error: < 10%
– Test error: 10%
– Low bias
– Low variance
Bias vs Variance
Error vs Model Complexity
Bias variance trade off

While building the machine learning model, it is really important to take care of
bias and variance in order to avoid overfitting and underfitting in the model.


If the model is very simple with fewer parameters, it may have low variance and
high bias.


Whereas, if the model has a large number of parameters, it will have high variance
and low bias.


So we need to strike a balance between bias and variance errors, and this
balance between the bias error and the variance error is known as the Bias-Variance
trade-off.


An optimal balance of bias and variance would never overfit or underfit the model.
Bias variance trade off

For accurate predictions, an algorithm needs both low variance and low
bias. But achieving both is difficult because bias and variance are related to each other:

– If we decrease the variance, it will increase the bias.


– If we decrease the bias, it will increase the variance.


Bias-Variance trade-off is a central issue in supervised learning.


Ideally, we need a model that accurately captures the regularities in training data
and simultaneously generalizes well with the unseen dataset.
Bias variance trade off

Unfortunately, it is usually not possible to do both perfectly at the same time.


A high variance algorithm may perform well on the training data, but it may
overfit noisy data.


A high bias algorithm, on the other hand, generates an overly simple model that may not even
capture important regularities in the data.


So, we need to find a sweet spot between bias and variance to make an optimal
model.


So the Bias-Variance trade-off is about finding the sweet spot to make a balance
between bias and variance errors.
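
One practical way to locate this sweet spot is to score a range of model complexities with cross-validation and keep the one with the lowest validation error; a minimal sketch (assuming scikit-learn, with hypothetical synthetic data):

```python
# Sketch: using cross-validation to find the bias-variance sweet spot (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(7)
X = rng.uniform(-1, 1, size=(100, 1))
y = np.sin(3 * X).ravel() + rng.normal(scale=0.2, size=100)

scores = {}
for degree in range(1, 12):                                  # model complexity knob
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    scores[degree] = mse

best = min(scores, key=scores.get)
print("cross-validated MSE by degree:", {d: round(m, 3) for d, m in scores.items()})
print("sweet spot (lowest CV error): degree", best)
```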
Model Assessment and Selection
Types of Errors
Variance
Bias
Expected Prediction Error
Mean Squared Error (MSE)
Cross-Validation
Types of Cross Validation
Bootstrap Method
The Bootstrap: An Example
The Bootstrap
Q1. If we obtain a bootstrap sample from a set of n observations, what is the probability
that the first bootstrap observation is not the jth observation from the original sample?

Ans - The probability that the first bootstrap observation is not the jth observation from the
original sample is 1 - 1/n. This is because there are n observations, and we are equally likely to
pick each observation when taking our first bootstrap observation.

Q2. What is the probability that the jth observation is not in the bootstrap sample at all?

Ans - Each of the n bootstrap draws is made with replacement and misses the jth observation
with probability 1 - 1/n. Therefore, the probability that the jth observation does not appear
anywhere in the bootstrap sample is (1 - 1/n)^n.
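
Since (1 - 1/n)^n approaches 1/e ≈ 0.368 as n grows, the result is easy to check by simulation; a small NumPy sketch (the sample size n = 100 is arbitrary):

```python
# Sketch: probability that observation j is absent from a bootstrap sample.
import numpy as np

n = 100                      # hypothetical sample size
trials = 10_000
rng = np.random.default_rng(0)

# Draw `trials` bootstrap samples of size n (sampling indices with replacement).
samples = rng.integers(0, n, size=(trials, n))
left_out = np.mean(~np.any(samples == 0, axis=1))   # fraction of samples missing index j = 0

print("empirical P(j not in sample):", left_out)
print("theoretical (1 - 1/n)^n:     ", (1 - 1 / n) ** n)
print("limit 1/e:                   ", np.exp(-1.0))
```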
Model Selection Methods
Model Selection Methods
Model Selection Methods
Q1.- Define AIC and BIC. How will you use these to choose the best model? What type of
model does AIC and BIC choose when data set size approaches infinity?
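
For reference, the usual definitions (a sketch in standard notation, with \hat{L} the maximized likelihood, d the number of parameters, and N the number of training observations; this is general background, not a worked answer from the slides):

AIC = -2 log \hat{L} + 2d
BIC = -2 log \hat{L} + d log N

Lower values are preferred in both cases. Because the BIC penalty grows with N, BIC tends to select the true model as N approaches infinity (it is consistent), whereas AIC tends to favour larger models.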
Classification Performance Measures
Confusion Matrix / Contingency Table based Measure
Accuracy and Recall
F-measure
Measuring performance of Multiclass Classifier: Example
Measuring performance of Binary Classifier
Binary Classification: Positive and Negative Class
Binary Classification: Assessment Measures
Measuring performance of Binary Classifier: Example
Practice questions

Q1. Suppose a computer program for recognizing dogs in photos identifies eight
dogs in a picture containing 12 dogs and some cats. Of the eight dogs identified,
five are actually dogs while the rest are cats. Compute the precision and recall of
the computer program.


Q2. Let there be 10 balls (6 white and 4 red) in a box, and let it be required to
pick out the red balls. Suppose we pick up 7 balls as red balls, of
which only 2 are actually red. What are the values of precision and recall in
picking red balls?
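
A quick numerical check of Q1 (a sketch in plain Python; the counts TP = 5, FP = 3, FN = 7 follow directly from the question):

```python
# Sketch: precision and recall for practice question Q1 (dogs in photos).
tp = 5   # dogs correctly identified as dogs
fp = 3   # cats incorrectly identified as dogs (8 identified - 5 correct)
fn = 7   # dogs missed (12 dogs - 5 found)

precision = tp / (tp + fp)   # 5 / 8
recall = tp / (tp + fn)      # 5 / 12
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
# Q2 works the same way with tp = 2, fp = 5, fn = 2.
```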
Practice questions

Q3. Suppose 10,000 patients get tested for flu, out of which 9,000 are actually
healthy and 1,000 are actually sick. For the sick people, the test was positive for 620
and negative for 380. For the healthy people, the same test was positive for 180 and
negative for 8,820. Construct a confusion matrix and compute the accuracy,
precision and recall.


Q4. A binary classifier was evaluated using a set of 1,000 test examples in which
50% of all examples are negative. It was found that the classifier has 60%
sensitivity and 70% accuracy. Write the confusion matrix. Using the confusion
matrix compute the classifier’s precision, F-measure, and specificity.
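
As an illustration, a small plain-Python sketch for Q3 that lays out the confusion-matrix counts given in the question and computes the requested measures:

```python
# Sketch: confusion matrix and metrics for practice question Q3 (flu test).
tp, fn = 620, 380      # sick patients: test positive / negative
fp, tn = 180, 8820     # healthy patients: test positive / negative

total = tp + fn + fp + tn                  # 10,000 patients
accuracy = (tp + tn) / total               # (620 + 8820) / 10000
precision = tp / (tp + fp)                 # 620 / 800
recall = tp / (tp + fn)                    # 620 / 1000

print(f"accuracy = {accuracy:.3f}, precision = {precision:.3f}, recall = {recall:.3f}")
```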
Practice questions

Q5. A database contains 80 records on a particular topic. A search was conducted on
that topic and 60 records were retrieved. Out of the 60 records retrieved, 45 were
relevant. Calculate the precision and recall scores for the search.

Sol:
Using the designations above:
A = The number of relevant records retrieved,
B = The number of relevant records not retrieved, and
C = The number of irrelevant records retrieved.
In this example
A = 45, B = 35 (80 - 45) and C = 15 (60 - 45).
Recall = (45 / (45 + 35)) * 100% = (45/80) * 100% = 56.25%
Precision = (45 / (45 + 15)) * 100% = (45/60) * 100% = 75%
Practice questions
ROC Analysis
ROC
Random Classifier
ROC/AUC Algorithm
ROC Analysis: Example
ROC Analysis
ROC Plot and AUC: Trapezoid Region
