RO47002 - Lecture 2C - Hyperparameters and Cross-Validation

The document discusses hyperparameters and cross-validation in machine learning. It defines hyperparameters and describes challenges in selecting their values, such as overfitting training data. It recommends using cross-validation and validation data sets to select models and evaluate performance, rather than relying on training or test accuracy alone. Random search is presented as more efficient than grid search for hyperparameter optimization.


1

Hyperparameters
and cross-validation
Course: RO47002
Lecturer: Julian Kooij
2

Hyperparameters
Hyperparameters are parameters (modelling choices)
which are not optimized in the training phase:
• Parameters for feature extraction, e.g. the number of features, scaling
• The type/parameter space of the model, e.g. linear vs. polynomial fit
• The choice of model, e.g. decision tree vs. SVM
• Parameters that affect the used loss, e.g. weight terms, ℓ1 vs. ℓ2
• Parameters of the optimization procedure, e.g. the learning rate
• …
3

Hyperparameters
How to determine hyperparameter values?
1. Keep them fixed, hope you have good values
– Probably not optimal. Requires good intuition of the problem, experience, or literature research
2. Optimize manually by “tweaking”
– Part of model exploration, but not systematic, ad-hoc
3. Additional hyperparameter optimization
– Systematic, but still requires human input on what to try. Curse of combinatorics:
e.g. 5 hyperparameters with 10 candidate values each already give 10^5 = 100,000 combinations

Fundamental challenge in ML:


• How can we compare modelling choices and draw correct conclusions?
4

(Wrong) idea 1: Use training performance


Q: Should I use a 1-Nearest Neighbour or a 3-NN classifier?
• Let’s test on the training data: what is the classification error?

A: No, 1-NN will have no errors on the training data, by definition
[Figure: 1-NN decision regions]
• With infinite training data, all test cases would be known
→ that would be a perfect classifier!
• But we will definitely get unseen test samples
• The model might have captured noise in the data

Do not assume that good performance on training data
is indicative of good future performance.

Don’t get obsessed with 100% training accuracy → trivial to achieve!
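
As a quick illustration (not from the slides), a minimal sketch with scikit-learn showing that 1-NN trivially reaches 100% training accuracy while its held-out accuracy is lower; the synthetic dataset and its parameters are illustrative assumptions:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data with some label noise, so memorizing the training set does not generalize
X, y = make_classification(n_samples=200, n_features=5, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

knn1 = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print('1-NN train accuracy:', knn1.score(X_train, y_train))  # 1.0: each point is its own nearest neighbour
print('1-NN test accuracy: ', knn1.score(X_test, y_test))    # typically noticeably lower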


5

(Wrong) idea 2: Use test performance


Q: OK, so should I evaluate my candidate models on the test data,
and pick the best performing one?

A: No, be careful!
• The test data is then still used to optimize your hypothesis function
• It is not “unseen” anymore and must be considered training data
• Performance on this data is probably over-optimistic
compared to your actual application!
• You can no longer compare fairly with reported test results
6

Validation split
• Deliberately set aside part of the training data as “validation” data (see the sketch after this list)
– E.g. keep 80% for training, use 20% for validation
– Train all candidate models on the reduced training data
– Make model selection choices based on performance on the validation split
– Afterwards, the final model can be trained on all training data again

• Don’t touch your test data until you present your results
– Must accept that your model’s performance on the test data is suboptimal, but fair
– E.g. public benchmark servers keep the true labels of the test data hidden.
Participants cannot compute test performance and optimize on it!

• Best case: the validation data is representative of the test data
– Trade-off: the larger the validation split, the less data is left for training …
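
A minimal sketch of this procedure, assuming scikit-learn and an illustrative selection of k for a k-NN classifier (the dataset, candidate values, and variable names are hypothetical):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_train_full, y_train_full = make_classification(n_samples=500, random_state=0)

# 80% train / 20% validation split of the training data (the test data stays untouched)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.2, random_state=0)

# Select the hyperparameter k based on validation performance
best_k = max([1, 3, 5], key=lambda k:
             KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_val, y_val))

# Afterwards, retrain the selected model on all training data
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train_full, y_train_full)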

Problem: results may be affected by how data is split (e.g. where outliers go)
7

n-fold Cross-validation
• Divide the training data into n splits, called “folds”
• Perform n experiments
– Each time, validate on a different fold
– Train on the remaining folds

Pros
• All training data is used for both optimization and evaluation
• Generates statistics on performance (mean, std. dev.)
Cons
• n times more work than a single validation split
Image: https://scikit-learn.org/stable/modules/cross_validation.html
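
A minimal sketch of 5-fold cross-validation using scikit-learn’s cross_val_score; the classifier and synthetic dataset are illustrative assumptions, not from the slides:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, random_state=0)

# One accuracy score per fold: train on 4 folds, validate on the remaining one
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print('accuracy per fold:', scores)
print('mean = %.3f, std = %.3f' % (scores.mean(), scores.std()))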
8

Hyperparameter search
Strategies to optimize hyperparameter combinations
• For discrete values, exhaustively try all combinations → only works with few options and few hyperparameters
• For continuous values, sample candidate settings, e.g. via grid search or random search:
Grid search:

best_err = float('inf')
best_params = None
for p1 in [0.0, 0.5, 1.0]:
    for p2 in [0.0, 0.5, 1.0]:
        err = run_crossval(p1, p2)   # run_crossval: assumed to return the cross-validation error
        if err < best_err:
            best_err = err
            best_params = (p1, p2)
print('best params:', best_params)

Random search (each of the 9 trials samples both hyperparameters independently):

import numpy
best_err = float('inf')
best_params = None
for trial in range(9):
    p1, p2 = numpy.random.rand(2)    # draw a fresh random value for each hyperparameter
    err = run_crossval(p1, p2)
    if err < best_err:
        best_err = err
        best_params = (p1, p2)
print('best params:', best_params)
9

Hyperparameter search
[Figure: grid search, 9 trials, optimum missed — random search, 9 trials, optimum found!]

• In case of unimportant hyperparameters, grid search wastes trials

• Random search is more efficient and has a better chance of finding the optimum!
J. Bergstra, and Y. Bengio. "Random search for hyper-parameter optimization." JMLR 13.1 (2012): 281-305.
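
In practice, random search and cross-validation are often combined; a rough sketch using scikit-learn’s RandomizedSearchCV with an illustrative SVM and example search distributions (all of these choices are assumptions, not from the lecture):

from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# 9 random trials, each evaluated with 5-fold cross-validation
search = RandomizedSearchCV(
    SVC(),
    param_distributions={'C': loguniform(1e-2, 1e2), 'gamma': loguniform(1e-3, 1e1)},
    n_iter=9, cv=5, random_state=0)
search.fit(X, y)
print('best params:', search.best_params_, 'best CV score:', search.best_score_)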
10

Conclusions
• Hyperparameter optimization is non-trivial
• Selecting hyperparameters and models requires
separate validation data
• Don’t trick yourself with train or test performance
• When possible, use cross-validation
• Random search is preferred over grid search,
especially when some hyperparameters are correlated or
irrelevant
