UNIT-II Regularization in Deep Learning
What is Overfitting?
• When a model trains on sample data for an excessively long time or
becomes very complicated, it may begin to learn "noise," or
unimportant information, from the dataset.
• When it memorizes this noise, the model becomes "overfitted" and cannot
generalize successfully to new data.
• A model won't be able to carry out the classification or prediction
tasks that it was designed for if it can't generalize successfully to new
data.
What is Regularization?
• When a neural network faces entirely new data, regularization acts as
a guiding principle to prevent it from becoming too focused on just
the training data.
• By slightly altering the learning process, regularization encourages the
model to generalize better, and as a result it performs better on unseen
data.
Why Regularization?
• Through regularization, large coefficients receive a "penalty", which
ultimately reduces the variance of the model; in deep learning
specifically, it is the weight matrices of the nodes that are penalized.
• With regularization, a better-optimized and more accurate model is
obtained.
How does Regularization work?
• When modeling the data, a low bias and high variance scenario is
referred to as overfitting.
• To handle this, regularization techniques trade more bias for less
variance.
• Effective regularization is one that strikes the optimal balance
between bias and variance.
• Additionally, regularization ranks candidate models from the least to
the most overfit and adds penalties to the more complicated ones.
• Regularization rests on the assumption that smaller weights lead to
simpler models and therefore help prevent overfitting, as the sketch
below illustrates.
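The following is a minimal sketch, not taken from the original notes, of how a weight penalty is added to the training loss. The model, the data batch, and the strength `lam` are all assumed placeholder values, and the penalty shown here is a simple sum of squared weights.

```python
import torch
import torch.nn as nn

# Hypothetical tiny model and random batch, just to illustrate the idea.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
x = torch.randn(32, 10)
y = torch.randn(32, 1)

lam = 0.01  # regularization strength (assumed value)

# Ordinary task loss: fitting the data alone risks low bias but high variance.
data_loss = criterion(model(x), y)

# Penalty term: sum of squared weights, so larger weights cost more and the
# optimizer is nudged toward simpler (smaller-weight) models.
penalty = sum((p ** 2).sum() for p in model.parameters())

total_loss = data_loss + lam * penalty
total_loss.backward()  # gradients now balance data fit against model simplicity
```

Trading a little bias for less variance is exactly what the extra term does: the larger `lam` is, the stronger the pull toward small weights and a simpler model.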
Techniques of Regularization
L1 Regularization (Lasso Regression)
L2 Regularization (Ridge Regression)
Early stopping
Dropout Regularization
Data Augmentation
Batch Normalization
L1 Regularization
• L1 regularization adds the absolute values of weights to the loss
function as a penalty.
• This encourages some weights to shrink to exactly zero, effectively
eliminating those parameters from the model.
• This is particularly useful for feature selection, as it helps the model
focus on only the most important inputs while ignoring irrelevant
ones.
• Mathematical representation for L1 regularization:
  Loss = Original loss + λ Σ |wᵢ|
  where λ controls the strength of the penalty and wᵢ are the model weights.
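Below is a minimal sketch of L1 regularization applied during training; the layer sizes, the random data, and the penalty weight `lam` are assumed for illustration and are not from the original notes.

```python
import torch
import torch.nn as nn

# Hypothetical model with 20 input features and a random batch for illustration.
model = nn.Linear(20, 1)
criterion = nn.MSELoss()
x = torch.randn(64, 20)
y = torch.randn(64, 1)

lam = 0.05  # L1 strength (assumed value)

data_loss = criterion(model(x), y)

# L1 penalty: sum of absolute weight values. Its gradient pushes small weights
# all the way to zero, effectively removing irrelevant input features.
l1_penalty = sum(p.abs().sum() for p in model.parameters())

loss = data_loss + lam * l1_penalty
loss.backward()
```

After training with this penalty, weights attached to uninformative inputs tend to end up exactly at zero, which is the feature-selection effect described above.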