RO47002 - Lecture 2C - Hyperparameters and Cross-Validation
Hyperparameters
and cross-validation
Course: RO47002
Lecturer: Julian Kooij
Hyperparameters
Hyperparameters are parameters (modelling choices)
which are not optimized in the training phase:
• Parameters for feature extraction (e.g. number of features, scaling)
• Type / parameter space of the model (e.g. linear vs. polynomial fit)
• The choice of model itself (e.g. decision tree vs. SVM)
• Parameters that affect the loss that is used (e.g. weight terms, ℓ1 vs. ℓ2)
• Parameters of the optimization procedure (e.g. the learning rate)
• …
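To make the distinction concrete, here is a minimal sketch (not from the slides) contrasting hyperparameters, which are chosen before training, with the parameters a model learns from data; it assumes scikit-learn's SVC and some toy data:

import numpy as np
from sklearn.svm import SVC

# Toy data: 100 samples with 2 features and a simple label rule
X = np.random.rand(100, 2)
y = (X[:, 0] > 0.5).astype(int)

# Hyperparameters: chosen by us before training, not optimized on the data
clf = SVC(kernel='rbf', C=1.0, gamma='scale')

# Model parameters (support vectors, dual coefficients): fitted to the data
clf.fit(X, y)
print(clf.support_vectors_.shape, clf.dual_coef_.shape)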
Hyperparameters
How to determine hyperparameter values?
1. Keep them fixed and hope you have good values
– Probably not optimal. Requires good intuition about the problem, experience, or
literature research
2. Optimize manually ("tweaking")
– Part of model exploration, but not systematic; ad-hoc
3. Run an additional hyperparameter optimization
– Systematic, but still requires human input on what to try. Combinatorial explosion of options
A: No, 1-NN will have no errors on the training data, by definition
[Figure: 1-NN decision regions]
• With infinite training data, all test cases are known
→ this would be a perfect classifier!
• But we will definitely get unseen test samples
• The model might have captured noise in the data
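As an illustration (a sketch not taken from the slides, assuming scikit-learn's KNeighborsClassifier and random toy data), a 1-NN classifier always scores perfectly on its own training data, even when the labels are pure noise:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training set with completely random labels
X_train = np.random.rand(50, 2)
y_train = np.random.randint(0, 2, size=50)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# Each training sample is its own nearest neighbor, so accuracy is 1.0
print('training accuracy:', knn.score(X_train, y_train))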
A: No, be careful!
• Test data is still used to optimize your hypothesis function
• It is not “unseen” anymore and must be considered training data
• Performance on this data is probably over-optimistic
compared to your actual application!
• You can no longer compare with reported test results
Validation split
• Deliberately set apart part of the training data as “validation” data
– E.g. keep 80% for training, use 20% for validation
– Train all models on reduced training data
– Make model selection choices on performance of validation split
– Afterwards, you can train the final model on all training data again
• Don’t touch your test data, until you present your results
– Must accept that your model’s performance on test data is suboptimal, but fair
– E.g. public benchmark servers keep true labels of test data hidden.
Participants cannot compute test performance and optimize on it!
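A minimal sketch of such an 80/20 split, assuming scikit-learn's train_test_split and an SVM as the model under consideration (toy data only as a stand-in for the real training set):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy stand-in for the full training set
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 80% for training, 20% for validation
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                  random_state=0)

model = SVC(C=1.0)
model.fit(X_train, y_train)            # train on the reduced training data
print('validation accuracy:', model.score(X_val, y_val))  # use for model selection
# After selecting the model/hyperparameters, retrain on all of X, y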
Problem: results may be affected by how data is split (e.g. where outliers go)
n-fold Cross-validation
• Divide training data
into n splits, called “folds”
• Perform n experiments
– Each time, validate on a different fold
– Train on the remaining folds
Pros
• All training data used
for optimization and evaluation
• Generates statistics
on performance (mean, std.dev)
Cons
• n times more work than a single validation split
Image: https://scikit-learn.org/stable/modules/cross_validation.html
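A minimal sketch of 5-fold cross-validation using scikit-learn's cross_val_score (the toy data and SVM are chosen here only for illustration):

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 5 experiments: each fold serves as the validation set once
scores = cross_val_score(SVC(C=1.0), X, y, cv=5)
print('accuracy: %.3f +/- %.3f' % (scores.mean(), scores.std()))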
Hyperparameter search
Strategies to optimize hyperparameter combinations
• For discrete values, exhaustively try all → only works with few options, few hyperparameters
• For continuous values:

Grid search:

best_err = float('inf')   # +inf, so the first evaluated combination is always accepted
best_params = None
for p1 in [0, 0.5, 1.0]:
    for p2 in [0, 0.5, 1.0]:
        err = run_crossval(p1, p2)   # cross-validated error for this combination
        if err < best_err:
            best_err = err
            best_params = (p1, p2)
print('best params:', best_params)

Random search:

import numpy

best_err = float('inf')
best_params = None
for _ in range(9):                  # same budget: 9 trials
    p1, p2 = numpy.random.rand(2)   # sample each hyperparameter independently per trial
    err = run_crossval(p1, p2)
    if err < best_err:
        best_err = err
        best_params = (p1, p2)
print('best params:', best_params)
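In practice, scikit-learn already combines both strategies with cross-validation in GridSearchCV and RandomizedSearchCV; a minimal sketch (toy data, SVM, and parameter ranges chosen here only as an example):

from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Grid search: exhaustively tries all 3 x 3 = 9 combinations
grid = GridSearchCV(SVC(), {'C': [0.1, 1.0, 10.0],
                            'gamma': [0.01, 0.1, 1.0]}, cv=5)
grid.fit(X, y)
print('grid search best:', grid.best_params_)

# Random search: 9 independently sampled combinations
rand = RandomizedSearchCV(SVC(), {'C': uniform(0.1, 10.0),
                                  'gamma': uniform(0.01, 1.0)},
                          n_iter=9, cv=5, random_state=0)
rand.fit(X, y)
print('random search best:', rand.best_params_)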
Hyperparameter search
[Figure: grid search vs. random search, 9 trials each: grid search misses the optimum, random search finds it]
Conclusions
• Hyperparameter optimization is non-trivial
• Selecting hyperparameters and models requires
separate validation data
• Don’t trick yourself with train or test performance
• When possible, use cross-validation
• Random search is preferred over grid search,
especially when some hyperparameters are correlated or
irrelevant