CNNs PyTorch
IALAB UC
Training Tricks
ReLUs
Dropout
Batch Normalization
PyTorch - in depth!
1. Dropout: A Simple Way to Prevent Neural Networks from Overfitting, http://jmlr.org/papers/v15/srivastava14a.html
Training Tricks: Dropout
2. DropBlock: A regularization method for convolutional networks, https://arxiv.org/abs/1810.12890
3. Regularization of Neural Networks using DropConnect, http://yann.lecun.com/exdb/publis/pdf/wan-icml-13.pdf
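A minimal PyTorch sketch of the dropout idea (layer sizes, batch size, and p = 0.5 are assumed for illustration, not taken from the slides):

import torch
import torch.nn as nn

# Small MLP with a dropout layer between the hidden and output layers.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes 50% of the activations during training
    nn.Linear(256, 10),
)

x = torch.randn(32, 784)

model.train()            # dropout active: surviving activations are rescaled by 1/(1-p)
out_train = model(x)

model.eval()             # dropout disabled: the layer acts as the identity at inference
out_eval = model(x)

Because nn.Dropout uses inverted dropout (rescaling by 1/(1-p) at training time), no extra rescaling is needed when the model is switched to eval mode.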
Training Tricks: Batch Normalization
4. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, https://arxiv.org/abs/1502.03167
Batch Normalization (BN), Ioffe and Szegedy, 2015
\hat{x}^{(k)} = \frac{\bar{x}^{(k)} - \mathrm{E}[\bar{x}^{(k)}]}{\sqrt{\mathrm{Var}[\bar{x}^{(k)}]}} \qquad (1)
The expectation and variance are computed over the mini-batch for each dimension k, so each dimension is normalized independently.
Therefore, BN normalizes each scalar feature independently, aiming to make it unit Gaussian, i.e., each dimension has zero mean and unit variance.
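A minimal sketch of Eq. (1) in PyTorch (batch size, feature dimension, and eps are assumed for illustration): the per-dimension mean and variance over the mini-batch reproduce what nn.BatchNorm1d computes in training mode, since its learnable scale and shift are initialized to 1 and 0.

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 8)                    # mini-batch of 32 examples, 8 dimensions k

# Manual normalization from Eq. (1): mean/variance per dimension over the batch.
mean = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)        # BN uses the biased (population) variance
x_hat = (x - mean) / torch.sqrt(var + 1e-5)

# The same computation through PyTorch's layer (gamma = 1, beta = 0 at init).
bn = nn.BatchNorm1d(8, eps=1e-5)
bn.train()
out = bn(x)

print(torch.allclose(out, x_hat, atol=1e-5))   # True: identical up to numerical tolerance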