Lecture 1

Resampling Methods

Manoj Kumar

Machine Learning
YouTube

October 30, 2024



Goal: Learn y = wᵀ f(x) + b, where f(·) is a polynomial basis function.

     x        y
  0.0000   1.4544
  0.1579   2.1039
  0.3158   1.6518
  0.4737   1.5701
  0.6316   2.1284
Expanding each x with a degree-2 polynomial basis:

     y        x        x²
  1.4544   0.0000   0.0000
  2.1039   0.1579   0.0249
  1.6518   0.3158   0.0998
  1.5701   0.4737   0.2244
  2.1284   0.6316   0.3989
With a degree-3 basis:

     y        x        x²       x³
  1.4544   0.0000   0.0000   0.0000
  2.1039   0.1579   0.0249   0.0039
  1.6518   0.3158   0.0998   0.0315
  1.5701   0.4737   0.2244   0.1063
  2.1284   0.6316   0.3989   0.2519

Does a higher polynomial degree mean a better model?
With a degree-9 basis:

     y        x        x²      ···      x⁹
  1.4544   0.0000   0.0000    ···    0.0000
  2.1039   0.1579   0.0249    ···    0.0000
  1.6518   0.3158   0.0998    ···    0.0003
  1.5701   0.4737   0.2244    ···    0.0013
  2.1284   0.6316   0.3989    ···    0.0101
With a degree-19 basis:

     y        x        x²      ···      x¹⁹
  1.4544   0.0000   0.0000    ···    0.0000
  2.1039   0.1579   0.0249    ···    0.0000
  1.6518   0.3158   0.0998    ···    0.0000
  1.5701   0.4737   0.2244    ···    0.0001
  2.1284   0.6316   0.3989    ···    0.0004
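A minimal numpy sketch (not from the slides) of building this design matrix and fitting w by least squares; the five (x, y) pairs are the ones tabulated above, and the choice of degree is free:

```python
import numpy as np

# The five (x, y) pairs from the table above.
x = np.array([0.0000, 0.1579, 0.3158, 0.4737, 0.6316])
y = np.array([1.4544, 2.1039, 1.6518, 1.5701, 2.1284])

def polynomial_design_matrix(x, degree):
    """Columns [1, x, x², ..., x^degree]; the leading column of ones plays the role of b."""
    return np.vander(x, degree + 1, increasing=True)

degree = 3
F = polynomial_design_matrix(x, degree)
w, *_ = np.linalg.lstsq(F, y, rcond=None)   # least-squares fit of (b, w)
print(F.round(4))   # columns after the leading 1 match the expanded-feature table
print(w)
```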
Given a dataset, begin by splitting it into a TRAIN set and a TEST set.

Model assessment: use TEST to assess the accuracy of the model you output.
Never ever train or choose parameters based on the test data.
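A minimal sketch of that initial split in plain numpy; the 80/20 ratio and the fixed seed are illustrative assumptions, not from the slides:

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the indices once, carve off test_fraction of them as TEST."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# X_train, y_train, X_test, y_test = train_test_split(X, y)
# TEST is touched only once, at the very end, to report the final model's error.
```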


Validation set approach

Goal: Estimate the test error for a supervised learning method.

Strategy (sketched in code below):
Split the data into two parts.
Train the method on the first part.
Compute the error on the second part.
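A minimal numpy sketch of this strategy; the 50/50 split, the polynomial model, and the squared-error loss are illustrative assumptions:

```python
import numpy as np

def validation_estimate(x, y, degree, val_fraction=0.5, seed=0):
    """Split once, fit a polynomial of the given degree on one part, report MSE on the other."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_val = int(round(val_fraction * len(y)))
    val_idx, train_idx = idx[:n_val], idx[n_val:]

    # Train the method on the first part.
    F_train = np.vander(x[train_idx], degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(F_train, y[train_idx], rcond=None)

    # Compute the error on the second part.
    F_val = np.vander(x[val_idx], degree + 1, increasing=True)
    return float(np.mean((y[val_idx] - F_val @ w) ** 2))
```

Rerunning with a different seed gives a different split and, often, a noticeably different estimate, which is the drawback noted next.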



Drawbacks of the validation set approach:
Estimates can vary considerably with different train-validation splits.
Only a subset of the points is used to evaluate the model.


Leave one out cross-validation
For every i = 1, . . . , n:
▶ Train the model on every point except i,
▶ Compute the test error on the held-out point.
Average the test errors.




CV(n) = (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

Here ŷᵢ is the prediction for the i-th sample, computed without using the i-th sample.
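A deliberately naive numpy sketch of CV(n) for the regression case, refitting the model n times; the polynomial model is an illustrative stand-in for whatever method is being assessed:

```python
import numpy as np

def loocv_mse(x, y, degree):
    """Leave-one-out CV: fit on all points except i, test on point i, average."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        F_train = np.vander(x[keep], degree + 1, increasing=True)
        w, *_ = np.linalg.lstsq(F_train, y[keep], rcond=None)
        y_hat_i = np.vander(x[i:i + 1], degree + 1, increasing=True) @ w
        errors[i] = (y[i] - y_hat_i[0]) ** 2
    return float(errors.mean())   # CV(n)
```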




For a classification problem, the error on each held-out point is the 0/1 indicator instead of the squared error:

CV(n) = (1/n) Σᵢ₌₁ⁿ 1(yᵢ ≠ ŷᵢ)
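A short sketch of the classification version, assuming a hypothetical fit_without_i(X, y, i) helper that returns a predict function trained on all points except i:

```python
def loocv_error_rate(X, y, fit_without_i):
    """Average 0/1 loss over the n held-out predictions."""
    n = len(y)
    mistakes = 0
    for i in range(n):
        predict = fit_without_i(X, y, i)        # model trained without point i
        mistakes += int(predict(X[i]) != y[i])  # 1(y_i != y_hat_i)
    return mistakes / n
```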



Computing CV(n) can be computationally expensive, since it involves fitting the model n times.
A single linear regression fit takes O(d³ + nd²) time.



k-fold cross-validation
Split the data into k subsets or folds.
For every i = 1, . . . , k:
▶ Train the model on every fold except the i-th fold,
▶ Compute the test error on the i-th fold.
Average the test errors (a minimal sketch follows).
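A minimal numpy sketch of k-fold CV; k, the model plug-ins, and the shuffling seed are free choices:

```python
import numpy as np

def kfold_cv_error(X, y, fit, error, k=5, seed=0):
    """fit(X, y) -> predict function; error(y_true, y_pred) -> scalar."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)   # k roughly equal folds
    fold_errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train_idx], y[train_idx])
        fold_errors.append(error(y[val_idx], model(X[val_idx])))
    return float(np.mean(fold_errors))
```

With k = n this reduces to leave one out cross-validation.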



5-Fold Cross Validation Demo
Given k folds, to determine the quality of a particular hyperparameter, e.g., alpha = 0.1 (sketched in code below):
Pick a fold, which we'll call the validation fold. Train the model on all but this fold.
Compute the error on the validation fold.
Repeat the steps above for all k possible choices of validation fold.
The quality of the hyperparameter is the average of the k validation fold errors.
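A hedged sketch of this procedure for picking a ridge penalty alpha; the ridge model, the candidate grid, and the 5 folds are illustrative assumptions, not something the slides prescribe:

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution (illustrative model choice)."""
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)
    return lambda X_new: X_new @ w

def cv_quality(X, y, alpha, k=5, seed=0):
    """Average validation MSE over the k folds for one value of alpha."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errs = []
    for i in range(k):
        val = folds[i]                                         # the validation fold
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        predict = ridge_fit(X[train], y[train], alpha)         # train on all but this fold
        errs.append(np.mean((y[val] - predict(X[val])) ** 2))  # error on the validation fold
    return float(np.mean(errs))

# best_alpha = min([0.01, 0.1, 1.0, 10.0], key=lambda a: cv_quality(X, y, a))
```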



Picking K

Typical choices of k are 5, 10, and N, where N is the amount of data.


k = N is also known as "leave one out cross-validation," and will typically give you the best results.
▶ In this approach, each validation set is only one point.
▶ Every point gets a chance to be used as the validation set.
k = N is also very expensive, requiring you to fit a huge number of models.

Ultimately, the choice of k is a tradeoff between the quality of the error estimate and computation time.



LOOCV vs. k-fold cross-validation

k-fold CV depends on the chosen split.

In k-fold CV, we train the model on less data than is available. This introduces bias into the estimate of the test error.
In LOOCV, the n training sets highly resemble each other. This increases the variance of the test error estimate.
