Regularization
By Swetha V, Research Scholar
Regularization
• Regularization is a set of techniques that prevent overfitting in neural
networks and thus improve the accuracy of a deep learning model when it
faces completely new data from the problem domain (a short sketch of the
idea follows the outline below).
• Recap: Overfitting
• What is Regularization?
• L2 Regularization
• L1 Regularization
• Why do L1 and L2 Regularizations work?
• Dropout
• Take-Home-Message
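As a preview of the L1/L2 items listed above, here is a minimal sketch of how a regularization term is added to a training loss. It assumes a mean-squared-error data term and an illustrative penalty strength lam = 0.01; none of these names or values come from the slides.

import numpy as np

def mse_loss(y_true, y_pred):
    # Plain data-fitting term (mean squared error).
    return np.mean((y_true - y_pred) ** 2)

def l2_penalty(weights, lam=0.01):
    # L2 regularization term: lam times the sum of squared weights.
    return lam * np.sum(weights ** 2)

def regularized_loss(y_true, y_pred, weights, lam=0.01):
    # Total loss = data term + penalty that discourages large weights.
    return mse_loss(y_true, y_pred) + l2_penalty(weights, lam)

Because the penalty grows with the magnitude of the weights, minimizing the total loss pushes the network toward smaller weights and hence toward a less complex fit.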
1. Recap: Overfitting
• One of the most important aspects when training neural networks is
avoiding overfitting.
Quick recap:
Overfitting refers to the phenomenon where a neural network models
the training data very well but fails when it sees new data from the same
problem domain.
Overfitting is caused by noise in the training data that the neural
network picks up during training and learns as an underlying concept of
the data.
• This learned noise, however, is unique to each training set. As soon as the
model sees new data from the same problem domain that does not contain this
noise, the performance of the neural network gets much worse.
• Why does the neural network pick up that noise in the first place?
The reason is that the complexity of the network is too high. A fit of a
neural network with higher complexity is shown in the right-hand panel of
Graph 1 below; a small numerical illustration follows.
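To make this concrete, the following sketch fits two polynomial models to the same noisy samples of a simple true function. The data generator, the degree choices, and the use of NumPy are illustrative assumptions; the slides do not specify an experiment.

import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple "true" function.
x = np.linspace(-1, 1, 20)
y_true = np.sin(np.pi * x)
y_noisy = y_true + rng.normal(scale=0.3, size=x.shape)

# Low-complexity fit (degree 3) vs. high-complexity fit (degree 15).
simple_coef = np.polyfit(x, y_noisy, deg=3)
complex_coef = np.polyfit(x, y_noisy, deg=15)

# The high-complexity model tracks the noisy training points much more closely ...
print("train MSE, degree 3 :", np.mean((y_noisy - np.polyval(simple_coef, x)) ** 2))
print("train MSE, degree 15:", np.mean((y_noisy - np.polyval(complex_coef, x)) ** 2))

# ... but new samples generated from the true function lie far from its wiggly fit.
x_new = np.linspace(-1, 1, 200)
y_new = np.sin(np.pi * x_new)
print("new-data MSE, degree 3 :", np.mean((y_new - np.polyval(simple_coef, x_new)) ** 2))
print("new-data MSE, degree 15:", np.mean((y_new - np.polyval(complex_coef, x_new)) ** 2))

The degree-15 model is guaranteed to reach a training error no higher than the degree-3 model (it is a nested least-squares fit), yet its error against fresh samples of the true function is typically far larger, which is exactly the overfitting behaviour described above.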
Bias
• Represents the extent to which the average prediction over all data sets
differs from the desired regression function
Variance
• Represents the extent to which the model is sensitive to the particular
choice of data set
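These two quantities are tied together by the standard bias-variance decomposition of the expected squared error, which is not spelled out on the slide but makes the two definitions precise:

\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}

Here f is the true regression function, \hat{f} is the model fitted on a randomly drawn data set, the expectations are taken over those data sets (and the noise in y), and \sigma^2 is the noise variance.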
Graph 1. Left: a lower-complexity model with a good fit. Right: a
higher-complexity model with high variance (overfitting).
• The model with a higher complexity is able to pick up and learn
patterns (noise) in the data that are just caused by some random
fluctuation or error.
• The network would be able to model each data sample of the
distribution one-by-one, while not recognizing the true function that
describes the distribution.
• New samples generated from the true function would lie far from the fit of
the model; we also say that the model has high variance (the sketch after
this list estimates bias and variance empirically).
• On the other hand, the lower-complexity network in the left panel of
Graph 1 models the distribution much better, because it does not try to
model every data point individually.
• In practice, overfitting causes the neural network model to perform very
well during training, but its performance degrades sharply at inference
time, when it is faced with brand-new data.
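To connect the variance discussion above to something measurable, the sketch below refits the same two polynomial models from the earlier example on many freshly sampled data sets and estimates bias and variance on a test grid. The setup (data generator, degrees, 200 repetitions) is again an illustrative assumption rather than an experiment from the slides.

import numpy as np

rng = np.random.default_rng(1)
x_test = np.linspace(-1, 1, 100)
f_true = np.sin(np.pi * x_test)          # the true regression function on a test grid

def fit_predict(degree, n_train=20, noise=0.3):
    # Sample one noisy data set, fit a polynomial, and predict on the test grid.
    x = np.linspace(-1, 1, n_train)
    y = np.sin(np.pi * x) + rng.normal(scale=noise, size=n_train)
    return np.polyval(np.polyfit(x, y, deg=degree), x_test)

for degree in (3, 15):
    # Predictions of the same model class trained on 200 different data sets.
    preds = np.array([fit_predict(degree) for _ in range(200)])
    avg_pred = preds.mean(axis=0)
    bias_sq = np.mean((avg_pred - f_true) ** 2)   # average prediction vs. true function
    variance = np.mean(preds.var(axis=0))         # sensitivity to the choice of data set
    print(f"degree {degree:2d}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")

The low-complexity model typically shows the larger bias and the smaller variance, while the high-complexity model shows the opposite, matching the definitions of bias and variance given above.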