Batch Normalization

Motivation
Batch normalization was originally developed to address the problem of “internal covariate shift”.
The randomness of the initial weight values and of the batch selection can create situations that
are unfavorable for training. This may become worse in a deep model, because small changes in the
shallower hidden layers are amplified as they propagate through the network, resulting in a
significant shift in the deeper hidden layers.
It was found experimentally that batch normalization accelerates learning, but the current view is
that this acceleration is not due to a reduction of the internal covariate shift; rather, batch
normalization “smooths” the function being optimized.

The idea
The idea is to “normalize” the input to a layer by re-centering and re-scaling. Let B be a batch of
training data, containing the m examples x1 . . . xm . (xi is taken as a scalar. For multidimensional
data, batch normalization is applied separately in each dimension.) Then batch normalization
replaces each xi with yi defined as follows:

yi = γx̂i + β,

where:

µB = (1/m) Σi xi ,      vB = (1/m) Σi (xi − µB)² ,      x̂i = (xi − µB) / √(vB + ε)

The parameters γ, β are learned during the backpropagation optimization. ε is a small positive
value that guards against numerical instability that may occur if vB is too small.
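
As an illustration, here is a minimal NumPy sketch of this transform for one batch of scalar
inputs. The function name batch_norm, the sample batch, and the choice ε = 1e-5 are only for this
example; in a real network γ and β would be trainable parameters updated by backpropagation.

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x is a batch of m scalar inputs, shape (m,)
    mu_b = x.mean()                           # batch mean µB
    v_b = x.var()                             # batch variance vB
    x_hat = (x - mu_b) / np.sqrt(v_b + eps)   # re-centered, re-scaled x̂i
    return gamma * x_hat + beta               # yi = γ x̂i + β

# Example batch of m = 4 scalars; with γ = 1, β = 0 the output has
# (approximately) zero mean and unit variance.
x = np.array([1.0, 2.0, 4.0, 5.0])
print(batch_norm(x, gamma=1.0, beta=0.0))

For multidimensional inputs the same computation would be applied separately in each dimension,
e.g. by taking the mean and variance along the batch axis (axis=0).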
