
BATCH NORMALIZATION

Training deep neural networks is a difficult task that involves several problems to tackle. Despite their huge potential, deep networks can be slow to train and prone to overfitting. Batch normalization is one of the methods used to address these recurring problems in deep learning models.

Normalization

"Normalization" is a broad category of methods that seek to make different samples seen by a
machine learning model more similar to each other, which helps the model learn and
generalize well to new data. The most common form of data normalization is centering the
data on 0 by subtracting the mean from the data, and giving it a unit standard deviation by
dividing the data by its standard deviation. In effect, this makes the assumption that the data
follows a normal (or Gaussian) distribution, and makes sure that this distribution is centered
and scaled to unit variance.

Normalization is a pre-processing technique used to standardize data.

In other words, it brings data from different sources into the same range. Not normalizing the data before training can cause problems in the network, making it drastically harder to train and slower to learn.
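
As a brief illustration, here is a minimal NumPy sketch of this kind of standardization (the array contents are made up for the example):

import numpy as np

# Hypothetical raw features on very different scales (e.g. age in years, height in metres).
X = np.array([[25.0, 1.60],
              [40.0, 1.75],
              [55.0, 1.90]])

# Centre each feature on 0 and scale it to unit standard deviation.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_norm.mean(axis=0), X_norm.std(axis=0))  # approximately 0 and 1 per feature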

Batch Normalization

Batch normalization is a layer added between the layers of a neural network: it continuously takes the output of the previous layer and normalizes it before sending it to the next layer. This has the effect of stabilizing the neural network. Batch normalization is also used to maintain the distribution of the data as it moves through the network.

It is a process that makes neural networks faster and more stable by adding extra layers to a deep neural network. Each new layer performs standardizing and normalizing operations on the input it receives from the previous layer.

The idea is to normalise the inputs of each layer in such a way that they have a mean output activation of zero and a standard deviation of one.
A typical neural network is trained on a collected set of input data called a batch. Similarly, the normalizing process in batch normalization takes place over batches, not over a single input.
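
As a sketch of how this looks in practice (assuming the TensorFlow/Keras API; the layer sizes and input shape here are arbitrary), a batch normalization layer can simply be inserted between the existing layers:

import tensorflow as tf
from tensorflow.keras import layers, models

# Each BatchNormalization layer normalizes the previous layer's output,
# batch by batch, before it reaches the next layer.
model = models.Sequential([
    layers.Dense(64, input_shape=(10,)),
    layers.BatchNormalization(),
    layers.Activation("sigmoid"),
    layers.Dense(64),
    layers.BatchNormalization(),
    layers.Activation("sigmoid"),
    layers.Dense(1, activation="sigmoid"),
])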

Consider a deep neural network as shown below.

Initially, the inputs X1, X2, X3, X4 are in normalized form, as they come from the pre-processing stage. When the input passes through the first layer, it is transformed: a sigmoid function is applied to the dot product of the input X and the weight matrix W.
A similar transformation takes place at the second layer and continues up to the last layer L, as shown in the following image.
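
In symbols (the layer superscripts and the use of σ for the sigmoid function are notation introduced here for clarity):

h^{(1)} = \sigma\left(W^{(1)} X\right), \qquad h^{(l)} = \sigma\left(W^{(l)} h^{(l-1)}\right) \quad \text{for } l = 2, \dots, L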

Although our input X was normalized, over time the outputs will no longer be on the same scale. As the data passes through multiple layers of the neural network and the L activation functions are applied, this leads to an internal covariate shift in the data.

Batch Normalization steps

It is a two-step process. First, the input is normalized, and then rescaling and offsetting are performed.

Normalization of the Input

Normalization is the process of transforming the data so that it has a mean of zero and a standard deviation of one. In this step we take the batch input from layer h and first calculate the mean of this hidden activation. Here, m is the number of neurons at layer h.
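
In equation form (with h_i denoting the individual hidden activations and μ their mean, a notation introduced here):

\mu = \frac{1}{m} \sum_{i=1}^{m} h_i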

The next step is to calculate the standard deviation of the hidden activations.
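
Using the same notation, with σ denoting the standard deviation of the hidden activations:

\sigma = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left(h_i - \mu\right)^2}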

Next, normalize the hidden activations using these values. For this, subtract the mean from each input and divide the result by the sum of the standard deviation and the smoothing term (ε).

The smoothing term (ε) ensures numerical stability within the operation by preventing division by zero.
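
Written as a formula, following the description above:

h_i^{\text{norm}} = \frac{h_i - \mu}{\sigma + \epsilon}

(The original batch normalization paper places ε under a square root, dividing by \sqrt{\sigma^2 + \epsilon}; either way, ε serves the same numerical safeguard.)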

Rescaling and Offsetting

In the final operation, the re-scaling and offsetting of the input take place. Here, two components of the batch normalization algorithm come into the picture: γ (gamma) and β (beta). These parameters are used for re-scaling (γ) and shifting (β) the vector containing the values from the previous operations.

These two are learnable parameters: during training, the neural network finds the optimal values of γ and β, which enables the accurate normalization of each batch.
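
In equation form, the final output for each normalized activation is:

y_i = \gamma \, h_i^{\text{norm}} + \beta

Putting the two steps together, here is a minimal NumPy sketch (the function name and shapes are illustrative, not part of any particular framework):

import numpy as np

def batch_norm_forward(h, gamma, beta, eps=1e-5):
    # h: batch of hidden activations, shape (batch_size, num_features)
    mu = h.mean(axis=0)                 # step 1: mean of the hidden activations
    sigma = h.std(axis=0)               #         standard deviation
    h_norm = (h - mu) / (sigma + eps)   #         normalize; eps prevents division by zero
    return gamma * h_norm + beta        # step 2: re-scale (gamma) and offset (beta)

# Usage example with made-up values.
h = 5.0 + 3.0 * np.random.randn(32, 4)
gamma, beta = np.ones(4), np.zeros(4)
y = batch_norm_forward(h, gamma, beta)
print(y.mean(axis=0), y.std(axis=0))    # close to 0 and 1 per feature
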
Benefits of Batch Normalization

The intention behind batch normalisation is to optimise network training. This approach leads to faster learning, since normalization ensures there is no activation value that is too high or too low, and it allows each layer to learn independently of the others.

Normalizing inputs reduces the "dropout" rate, that is, the data lost between processing layers, which significantly improves accuracy throughout the network.

It has been shown to have several benefits:

1. Networks train faster — Each training iteration will be slower because of the extra normalisation calculations during the forward pass and the additional parameters to train during back propagation. However, the network should converge much more quickly, so training should be faster overall.
2. Allows higher learning rates — Gradient descent usually requires small learning rates for the
network to converge. As networks get deeper, gradients get smaller during back propagation,
and so require even more iterations. Using batch normalisation allows much higher learning
rates, increasing the speed at which networks train.
3. Makes weights easier to initialise — Weight initialisation can be difficult, especially when
creating deeper networks. Batch normalisation helps reduce the sensitivity to the initial
starting weights.
4. Makes more activation functions viable — Some activation functions don’t work well in
certain situations. Sigmoids lose their gradient quickly, which means they can’t be used in
deep networks, and ReLUs often die out during training (stop learning completely), so we
must be careful about the range of values fed into them. But as batch normalisation regulates
the values going into each activation function, nonlinearities that don’t work well in deep
networks tend to become viable again.
5. Simplifies the creation of deeper networks — The previous 4 points make it easier to build
and faster to train deeper neural networks, and deeper networks generally produce better
results.
6. Provides some regularisation — Batch normalisation adds a little noise to the network, and in some cases (e.g., Inception modules) it has been shown to work as well as dropout. Batch normalisation acts as a bit of extra regularisation, allowing you to reduce some of the dropout you might otherwise add to a network.
7. Handles internal covariate shift — It solves the problem of internal covariate shift. It ensures that the input to every layer is distributed around the same mean and standard deviation.
8. Smoothens the loss function — Batch normalization smooths the loss function, which in turn makes the model parameters easier to optimize and improves the training speed of the model.
