Week8_Lecture_1_ML_SPR25

The document discusses various subset selection methods in machine learning, focusing on improving prediction accuracy and model interpretability. It outlines techniques such as Best Subset Selection, Forward Selection, and Backward Selection, along with their computational limitations and effectiveness in model selection. Additionally, it highlights the importance of metrics like Cp, AIC, and BIC for optimal model selection and emphasizes the use of validation methods for estimating test error.

Machine Learning and Deep Learning with R
Instructor: Babu Adhimoolam
Learning objectives: Subset Selection Methods

Why additional linear methods?

• To improve prediction accuracy on the test dataset: when the number of observations (n) is not much larger than the number of predictors (p), the least squares estimates have high variance. By constraining or shrinking the coefficients associated with the predictors, we can substantially reduce the variance and hence the test error.

• To improve model interpretability: including variables that are not associated with the response only adds complexity to the model. By setting the coefficients of variables that do not contribute to the response to zero, we obtain more interpretable models.
Extensions of linear methods

Subset Selection Methods
• Best Subset Selection
• Forward Stepwise Selection
• Backward Stepwise Selection

Shrinkage Methods
• Ridge Regression
• Lasso Regression

Dimensionality Reduction Methods
• Principal Components Regression
• Partial Least Squares
The Best Subset Selection Method

• We fit a least squares regression for each possible combination of the p predictors.

• The total number of possible models with p predictors is 2^p.

• We start with the null model (M0) containing no predictors, and then compute Mk for each value of k:

for k = 1, 2, …, p:

- fit all models that contain exactly k predictors.

- choose the best of these models (lowest RSS or highest R²) and call it Mk.

• We finally choose the best model from the list of available models M0, …, Mp.
Application of Best Subset Selection to the Credit data set

Response – Balance
Predictors – Income, Limit, Rating, Cards, Age, Education, Own, Student, Married and Region

(Figure: results for the models M1 to Mp; red line – the best model within each subset size.)
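
As an illustration, a minimal R sketch of best subset selection on these data, assuming the leaps and ISLR2 packages are available (regsubsets and the Credit data set come from those packages):

# Best subset selection on the Credit data (sketch; assumes leaps and ISLR2 are installed)
library(leaps)
library(ISLR2)

# Fit the best subset of each size; Region is dummy-coded into two columns,
# so nvmax = 11 lets every coded predictor enter
best_fit <- regsubsets(Balance ~ ., data = Credit, nvmax = 11)
best_summary <- summary(best_fit)

# RSS and R^2 of the best model of each size (the red line in the slide's figure)
best_summary$rss
best_summary$rsq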


Limitations of best subset selection

Best subset selection suffers heavily from computational limitations as p grows. Recall that the total number of possible models with p predictors is 2^p:

If p is 10, 2^10 ≈ 1,000 models to evaluate.
If p is 20, 2^20 ≈ 1,000,000 models to evaluate.
p > 40 is computationally infeasible!

In addition, the large model space allows overfitting: models that fit the training data well may not generalize with high accuracy to the test data.
Forward Stepwise Selection

• Forward stepwise selection is a computationally feasible and efficient alternative to best subset selection, as it considers far fewer models than the 2^p of best subset selection.

• We begin with a model with no predictors (M0) and then add predictors to the model, one at a time, until all predictors are included.

for k = 0, 1, …, p-1:

- consider all (p − k) models that augment the predictors in Mk with one additional predictor.

- choose the best among these (p − k) models (lowest RSS or highest R²) and call it Mk+1.

• We finally choose the best model among the list of available models M0, …, Mp.
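
A corresponding R sketch, again assuming the leaps and ISLR2 packages; regsubsets switches to the stepwise algorithm via its method argument:

# Forward stepwise selection on the Credit data (sketch)
library(leaps)
library(ISLR2)

fwd_fit <- regsubsets(Balance ~ ., data = Credit, nvmax = 11, method = "forward")
summary(fwd_fit)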
Computational feasibility of forward stepwise selection

Unlike best subset selection, which involves fitting 2^p models (with p predictors), forward stepwise selection fits only

1 + p(p + 1)/2

models. So, if p = 20, best subset selection must fit approximately 1,048,576 models, whereas forward stepwise selection fits only 211 models.
Forward stepwise selection does not always find the best model

Note that the best four-variable models differ between best subset selection and forward stepwise selection.
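
Assuming the best_fit and fwd_fit objects from the earlier sketches, this discrepancy can be checked directly by comparing the coefficients of the best four-variable models:

# Best four-variable model from each method
coef(best_fit, id = 4)  # best subset selection
coef(fwd_fit, id = 4)   # forward stepwise selection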
Backward Stepwise Selection

• Unlike best subset selection and forward stepwise selection, here we start with the full model (Mp) containing all the predictors.

• We then iteratively remove the least useful predictor, one at a time.

for k = p, p-1, …, 1:

- consider all k models that contain all but one of the predictors in Mk, for a total of k − 1 predictors each.

- choose the best among these k models (lowest RSS or highest R²) and call it Mk−1.

• We then choose the single best model out of M0, …, Mp.

• Backward stepwise selection is computationally similar to forward stepwise selection.

• Unlike forward stepwise selection, it requires n > p (so that the full model can be fit by least squares).
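
A minimal R sketch of backward stepwise selection, under the same package assumptions; the Credit data satisfy n > p, so the full model can be fit:

# Backward stepwise selection on the Credit data (sketch)
library(leaps)
library(ISLR2)

bwd_fit <- regsubsets(Balance ~ ., data = Credit, nvmax = 11, method = "backward")
summary(bwd_fit)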


Choosing the optimal model among M0, …, Mp

R² and RSS are not good metrics for selecting among these models, because the model containing all the predictors will always have the highest R² and the lowest RSS.

Indirect methods for estimating the test error adjust the training error to account for the bias due to overfitting.

Direct methods estimate the test error directly using validation or cross-validation.
Indirect methods for adjusting training error rates

• Cp
• Akaike Information Criterion (AIC)
• Bayesian Information Criterion (BIC)
• Adjusted R²
Cp

For a fitted least squares model with d predictors, the Cp estimate of the test MSE is

Cp = (RSS + 2·d·σ̂²) / n

where σ̂² is an estimate of the variance of the error term.

Cp adds a penalty proportional to the number of predictors in the model: a model with more predictors incurs a larger penalty.

Cp tends to take small values for models with low test error, so we select the model with the lowest Cp (the best model!).
Akaike Information Criterion (AIC)

AIC is defined for a large class of models fit by maximum likelihood. For a least squares model with d predictors it is given, up to irrelevant constants, by

AIC = (RSS + 2·d·σ̂²) / n

AIC and Cp therefore measure the same thing for least squares models and are proportional to each other; as with Cp, smaller values indicate better models.
Bayesian Information Criterion (BIC)

Like Cp and AIC, BIC takes small values for models with low test error. Up to irrelevant constants, it is given by

BIC = (RSS + log(n)·d·σ̂²) / n

Because log(n) > 2 whenever n > 7, BIC places a heavier penalty than Cp or AIC on models with many variables.
Adjusted R²

For a model with d predictors, adjusted R² = 1 − [RSS/(n − d − 1)] / [TSS/(n − 1)], so maximizing adjusted R² is equivalent to minimizing RSS/(n − d − 1).

Adding noise variables decreases RSS only slightly while increasing d, so RSS/(n − d − 1) increases and adjusted R² decreases.

Unlike the regular R², adjusted R² therefore accounts for nuisance variables: the model with the largest adjusted R² is preferred.
Optimal model selection in the Credit data set

(Figure: model selection criteria plotted for the models M1, …, Mp fit to the Credit data.)

Low values of Cp, AIC and BIC and high values of adjusted R² reveal the models with the lowest estimated test error.
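
A short sketch of how these criteria can be read off a fitted regsubsets object in R, assuming the best_fit object from the earlier best-subset sketch (summary.regsubsets exposes cp, bic and adjr2; AIC is proportional to Cp for least squares, so it selects the same size):

# Model size preferred by each criterion (uses best_fit from the earlier sketch)
best_summary <- summary(best_fit)

which.min(best_summary$cp)     # lowest Cp (and, up to scaling, lowest AIC)
which.min(best_summary$bic)    # lowest BIC
which.max(best_summary$adjr2)  # highest adjusted R^2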
Choosing the optimal model with validation and cross-validation

Validation and cross-validation methods estimate the test error directly and are generally preferred over the indirect adjustment methods described above.
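
As an illustration, a minimal validation-set sketch in R, under the same leaps/ISLR2 assumptions (the split, seed and object names below are hypothetical choices, not from the lecture):

# Choose the model size by validation-set error (sketch)
library(leaps)
library(ISLR2)

set.seed(1)
train <- sample(c(TRUE, FALSE), nrow(Credit), replace = TRUE)

# Best subset selection on the training half only
fit_train <- regsubsets(Balance ~ ., data = Credit[train, ], nvmax = 11)
test_mat  <- model.matrix(Balance ~ ., data = Credit[!train, ])

# Validation MSE for the best model of each size
val_errors <- sapply(1:11, function(k) {
  coefs <- coef(fit_train, id = k)
  preds <- test_mat[, names(coefs)] %*% coefs
  mean((Credit$Balance[!train] - preds)^2)
})
which.min(val_errors)  # model size with the lowest validation error

The same idea extends to k-fold cross-validation by repeating this computation over the folds and averaging the errors.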
