
Training Neural Networks
Training Schedule
Manually specify the learning rate for the entire training process

• Manually set the learning rate every n epochs
• How?
– Trial and error (the hard way)
– Some experience (only generalizes to some degree)

Consider: #epochs, training set size, network size, etc. (a minimal schedule sketch follows below)
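A minimal sketch of such a step schedule; the helper name step_lr, the drop factor, and the step size are illustrative assumptions, not from the slides:

def step_lr(base_lr, epoch, step_size=30, gamma=0.1):
    """Drop the learning rate by a factor of gamma every step_size epochs."""
    return base_lr * (gamma ** (epoch // step_size))

# Example: base_lr = 0.1, drop by 10x every 30 epochs
for epoch in [0, 29, 30, 59, 60]:
    print(epoch, step_lr(0.1, epoch))   # ~0.1, 0.1, 0.01, 0.01, 0.001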

Basic Recipe for Training

Learning
• Learning means generalization to an unknown dataset
– I.e., train on a known dataset → test with the optimized parameters on an unknown dataset

• Basically, we hope that the parameters optimized on the train set will give similar results on different data (i.e., the test data)
Learning
• Training set (‘train’):
– Use for training your neural network
• Validation set (‘val’) – often via cross-validation:
– Hyperparameter optimization
– Check generalization progress
• Test set (‘test’):
– Only for the very end
– NEVER TOUCH DURING DEVELOPMENT OR TRAINING
Learning
• Typical splits
– Train (60%), Val (20%), Test (20%)
– Train (80%), Val (10%), Test (10%)
– Train (98%), Val (1%), Test (1%)
– Train (80%), Cross-Val, Test (20%)

• During training:
– The train error is the average minibatch error
– Typically evaluate on a subset of the validation set every n iterations
(a minimal split sketch follows below)
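A minimal sketch of such a split, assuming NumPy arrays and a hypothetical split_dataset helper; the default fractions match the 60/20/20 example above:

import numpy as np

def split_dataset(data, labels, val_frac=0.2, test_frac=0.2, seed=0):
    """data, labels: NumPy arrays of equal length. Returns train/val/test splits."""
    n = len(data)
    idx = np.random.RandomState(seed).permutation(n)   # shuffle before splitting
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test_idx = idx[:n_test]                  # test set: only for the very end
    val_idx = idx[n_test:n_test + n_val]     # validation set: hyperparameter tuning
    train_idx = idx[n_test + n_val:]         # the rest is for training
    return (data[train_idx], labels[train_idx],
            data[val_idx], labels[val_idx],
            data[test_idx], labels[test_idx])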
Cross validation

Basic Recipe for Machine Learning

Over- and Underfitting

[Figure: underfitted vs. appropriate vs. overfitted model fits]
Source: Deep Learning by Adam Gibson, Josh Patterson, O’Reilly Media Inc., 2017
Over- and Underfitting

Source: https://srdas.github.io/DLBook/ImprovingModelGeneralization.html

Learning Curves
• Training graphs: accuracy and loss over the course of training

Learning Curves
[Figure: loss curves for the validation (‘val’) and test sets over training]

Overfitting Curves
[Figure: overfitting – the validation/test loss starts to increase during training]

Source: https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
Other Curves
[Figure: (left) underfitting – loss still decreasing; (right) validation set is easier than the training set]
Source: https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
To Summarize
• Underfitting
– Training and validation losses decrease even at the end of
training
• Overfitting
– Training loss decreases and validation loss increases
• Ideal Training
– Small gap between training and validation loss, and both go down at the same rate (stable, without fluctuations)
(a rough diagnostic sketch follows below)
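As a rough illustration, the rules above can be turned into a small check on recorded per-epoch losses; the function name and the tolerance are illustrative assumptions, not from the slides:

def diagnose(train_loss, val_loss, tol=0.05):
    """train_loss, val_loss: per-epoch losses (at least two epochs each)."""
    still_decreasing = train_loss[-1] < train_loss[-2] and val_loss[-1] < val_loss[-2]
    val_rising = val_loss[-1] > min(val_loss) + tol       # val loss turned upward
    gap = val_loss[-1] - train_loss[-1]                   # train/val gap at the end
    if val_rising and train_loss[-1] <= min(train_loss):
        return "overfitting: training loss down, validation loss up"
    if still_decreasing:
        return "underfitting: both losses still decreasing -> train longer"
    if gap < tol:
        return "looks fine: small train/val gap, both curves flat"
    return "inspect the curves manually"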
To Summarize
• Bad Signs
– Training error not going down
– Validation error not going down
– Performance on validation better than on training set
– Tests on train set different than during training
• Bad Practice
– Training set contains test data
– Debug algorithm on test data
(Never touch the test set during development or training)
Hyperparameters
• Network architecture (e.g., number of layers, number of weights)
• Number of iterations
• Learning rate(s) (i.e., solver parameters, decay, etc.)
• Regularization
• Batch size
• …
• Overall:
learning setup + optimization = hyperparameters
Hyperparameter Tuning
• Methods:
– Manual search:
• experience-based
– Grid search (structured, for ‘real’ applications):
• Define ranges for all parameter spaces and select points
• Usually pseudo-uniformly distributed
→ Iterate over all possible configurations
– Random search:
• Like grid search, but one picks points at random in the predefined ranges
– Auto-ML search:
• Bayesian framework; gradient descent on gradient descent; typically complex

[Figures: grid search vs. random search, sampling points over two hyperparameters (First Parameter × Second Parameter)]
(a minimal search sketch follows below)
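As an illustration of the two sampling strategies (not from the slides): train_and_eval is a hypothetical evaluation function, and the value ranges are examples only.

import itertools, random

lr_range = [1e-1, 1e-2, 1e-3, 1e-4]   # learning rates to try
wd_range = [0.0, 1e-5, 1e-4]          # weight decays to try

# Grid search: iterate over all possible configurations
grid_trials = list(itertools.product(lr_range, wd_range))

# Random search: pick points at random within the same predefined ranges
rng = random.Random(0)
random_trials = [(10 ** rng.uniform(-4, -1),                        # log-uniform learning rate
                  rng.choice([0.0, 10 ** rng.uniform(-6, -3)]))     # weight decay (or none)
                 for _ in range(len(grid_trials))]

# results = [(cfg, train_and_eval(*cfg)) for cfg in grid_trials + random_trials]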
How to Start
• ONE
• FEW
• MANY
Find a Good Learning Rate

[Figure: loss vs. training time for different learning rates; annotated with “Karpathy’s constant”]
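A minimal sketch of one way to pick a starting learning rate; make_model and train_short are hypothetical callables supplied by the user, and 3e-4 (often quoted as “Karpathy’s constant” for Adam) is included as a common default:

candidate_lrs = [1e-2, 3e-3, 1e-3, 3e-4, 1e-4]

def pick_lr(candidate_lrs, make_model, train_short):
    """Try each learning rate on a fresh model for a short run; keep the lowest loss."""
    best_lr, best_loss = None, float("inf")
    for lr in candidate_lrs:
        model = make_model()             # fresh model per trial (no warm starts)
        loss = train_short(model, lr)    # e.g. one epoch; returns the final training loss
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr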
Coarse Grid Search
• Choose a few values of learning rate and weight decay around what worked in the previous step
• Train a few models for a few epochs
• Good weight decay values to try: 1e-4, 1e-5, 0
Grid Search
• Not scalable when there are many hyperparameters

[Figure: grid search sampling points over two hyperparameters (First Parameter × Second Parameter)]
Refine Grid
• Pick the best models found with the coarse grid
• Refine the grid search around these models
• Train them for longer (10–20 epochs) without learning rate decay
• Study the loss curves ← the most important debugging tool!
(a refinement sketch follows below)
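A small sketch of building a refined grid around the best coarse-grid point; refine_around and the refinement factor are illustrative assumptions:

def refine_around(best_lr, best_wd, factor=3.0):
    """Return a tighter (lr, weight decay) grid centred on the best coarse-grid point."""
    lrs = [best_lr / factor, best_lr, best_lr * factor]
    wds = [best_wd / factor, best_wd, best_wd * factor] if best_wd > 0 else [0.0, 1e-5]
    return [(lr, wd) for lr in lrs for wd in wds]

# e.g. refine_around(3e-4, 1e-4) -> 9 nearby configs to train for 10-20 epochs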
Network Architecture
• Frequent mistake: “Let’s use this super big network, train for two weeks and we see where we stand.”

• Instead: start with a simple network.

• Get debug cycles down
– Ideally, minutes
Debugging
• Use train/validation/test curves
– Evaluation needs to be consistent
– Numbers need to be comparable

• Only make one change at a time
– “I’ve added 5 more layers and doubled the training set size, and now I also trained 5 days longer. Now it’s better, but why?”
Weight Initialization
Small Random Numbers
• Gaussian with zero mean and standard deviation 0.01
• Let’s see what happens:
– Network with 10 layers of 500 neurons each
– Tanh as the activation function
– Input: unit Gaussian data
(a simulation sketch follows below)
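What happens can be reproduced with a short simulation; this is a sketch assuming NumPy and the setup listed above (10 tanh layers of 500 units, weights drawn with zero mean and standard deviation 0.01, unit-Gaussian inputs):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 500))            # unit Gaussian input data

h = x
for layer in range(10):
    W = 0.01 * rng.standard_normal((500, 500))  # zero mean, standard deviation 0.01
    h = np.tanh(h @ W)                          # tanh activations
    print(f"layer {layer}: mean={h.mean():+.4f}  std={h.std():.4f}")
# The activation standard deviation collapses toward zero with depth,
# so gradients vanish in the deeper layers.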
