
Bias and Variance in Machine Learning

• Bias: Bias refers to the error due to overly simplistic assumptions in the learning
algorithm. These assumptions make the model easier to comprehend and learn, but
they may fail to capture the underlying complexities of the data. Bias is the error due
to the model's inability to represent the true relationship between input and output
accurately. When a model performs poorly on both the training and the testing data,
it has high bias because the model is too simple, indicating underfitting.
• Variance: Variance, on the other hand, is the error due to the model's sensitivity to
fluctuations in the training data. It is the variability of the model's predictions across
different samples of the training data. High variance occurs when a model learns the
training data's noise and random fluctuations rather than the underlying pattern. As a
result, the model performs well on the training data but poorly on the testing data,
indicating overfitting.
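To make the two error sources concrete, here is a minimal sketch (assuming NumPy and scikit-learn, which the text itself does not name; the sine data is synthetic) that fits a degree-1 and a degree-15 polynomial to the same noisy data. The straight line underfits (high bias); the degree-15 polynomial chases the noise (high variance), scoring well on training data and worse on held-out data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # true pattern + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 15):  # degree 1: high bias; degree 15: high variance
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```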

Figure: Bias and Variance


Underfitting in Machine Learning
A statistical model or a machine learning algorithm is said to underfit when the model
is too simple to capture the complexities of the data. Underfitting reflects the model's
inability to learn the training data effectively, resulting in poor performance on both the
training and the testing data. In simple terms, an underfit model's predictions are
inaccurate, especially when applied to new, unseen examples. It mainly happens when we
use a very simple model with overly simplified assumptions. To address underfitting, we
need to use more complex models with enhanced feature representation and less
regularization.
Note: An underfitting model has high bias and low variance.
Reasons for Underfitting
1. The model is too simple, so it may not be capable of representing the complexities
of the data.
2. The input features used to train the model are not adequate representations of the
underlying factors influencing the target variable.
3. The training dataset is not large enough.
4. Excessive regularization is used to prevent overfitting, which constrains the model
from fitting the data well.
5. Features are not scaled (see the scaling sketch after this list).
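As a small illustration of reason 5, here is a minimal sketch, assuming scikit-learn's StandardScaler and made-up numbers: without scaling, the second feature's large range would dominate scale-sensitive learners.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on wildly different scales (values are made up for illustration).
X = np.array([[1.0, 2000.0],
              [2.0, 3000.0],
              [3.0, 1000.0]])

# Standardize each column to mean 0 and standard deviation 1.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled)
```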
Techniques to Reduce Underfitting
1. Increase model complexity.
2. Increase the number of features by performing feature engineering (see the sketch
after this list).
3. Remove noise from the data.
4. Increase the number of epochs or the duration of training to get better results.
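The sketch below illustrates technique 2, again assuming NumPy and scikit-learn with synthetic data: a plain linear model underfits a quadratic relationship, and adding an engineered squared feature removes the underfitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=100)
y = x**2 + rng.normal(scale=0.1, size=100)  # target depends on x squared

# Raw feature only: a straight line cannot capture the curvature (low R^2).
underfit = LinearRegression().fit(x.reshape(-1, 1), y)
print("raw feature R^2:       ", round(underfit.score(x.reshape(-1, 1), y), 3))

# Engineered feature x^2 added: the same linear model now fits well (R^2 near 1).
X_eng = np.column_stack([x, x**2])
better = LinearRegression().fit(X_eng, y)
print("engineered feature R^2:", round(better.score(X_eng, y), 3))
```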
Overfitting in Machine Learning
A statistical model is said to be overfitted when it fails to make accurate predictions on
testing data. When a model trains on noisy data for too long, it starts learning from the
noise and inaccurate entries in the data set, which shows up as high variance when the
model is evaluated on test data. The model then fails to generalize correctly because it
has absorbed too many details and too much noise. Overfitting is often caused by
non-parametric and non-linear methods, because these types of machine learning
algorithms have more freedom in building the model from the dataset and can therefore
build unrealistic models. A solution to avoid overfitting is to use a linear algorithm if the
data is linear, or to constrain parameters such as the maximal depth when using decision
trees (see the sketch below).
In a nutshell, overfitting is a problem where a machine learning algorithm's performance
on the training data differs markedly from its performance on unseen data.
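To illustrate the decision-tree remedy mentioned above, here is a minimal sketch (scikit-learn assumed; the data is synthetic): an unconstrained tree memorizes the noisy training set, while capping max_depth narrows the gap between training and testing scores.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):  # None: grow until leaves are pure (overfits); 3: constrained
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"max_depth={depth}: train R^2={tree.score(X_train, y_train):.3f}, "
          f"test R^2={tree.score(X_test, y_test):.3f}")
```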
Reasons for Overfitting
1. High variance and low bias.
2. The model is too complex.
3. The training dataset is too small.
Techniques to Reduce Overfitting
1. Increase the amount of training data.
2. Reduce model complexity.
3. Early stopping during the training phase (monitor the loss over the training period
and stop training as soon as the loss begins to increase).
4. Ridge regularization and Lasso regularization (see the sketch after this list).
5. Use dropout for neural networks to tackle overfitting.
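The sketch below illustrates technique 4, assuming scikit-learn; the alpha values are illustrative, not tuned. Ridge (an L2 penalty) and Lasso (an L1 penalty) shrink the coefficients of an over-flexible polynomial model, typically lowering its test error relative to the unpenalized fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(100, 1))
y = np.sin(3 * X).ravel() + rng.normal(scale=0.3, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, reg in [("plain", LinearRegression()),
                  ("ridge", Ridge(alpha=1.0)),                     # L2 penalty
                  ("lasso", Lasso(alpha=0.01, max_iter=50_000))]:  # L1 penalty
    model = make_pipeline(PolynomialFeatures(degree=12), reg)
    model.fit(X_train, y_train)
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name:5s} test MSE={test_mse:.3f}")
```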

Figure: Underfitting and Overfitting


Good Fit in a Statistical Model
Ideally, a model that makes predictions with zero error is said to have a good fit on the
data. This situation is achievable at a spot between overfitting and underfitting. To
understand it, we have to look at the performance of our model over time while it is
learning from the training dataset.
As training progresses, the model keeps learning, and its error on the training and testing
data keeps decreasing. If it learns for too long, the model becomes more prone to
overfitting due to the presence of noise and less useful details, and its performance on
unseen data decreases. To get a good fit, we stop at a point just before the testing error
starts increasing. At this point, the model is said to perform well on the training dataset
as well as on our unseen testing dataset (see the early-stopping sketch below).
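Here is a minimal early-stopping sketch of that idea, assuming scikit-learn's SGDRegressor; the patience value and synthetic data are illustrative. Validation error is tracked after every pass over the training data, and training halts once it stops improving for several consecutive epochs.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(400, 1))
y = np.sin(3 * X).ravel() + rng.normal(scale=0.2, size=400)
X_poly = PolynomialFeatures(degree=10).fit_transform(X)
X_train, X_val, y_train, y_val = train_test_split(X_poly, y, random_state=0)

model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
best_val, patience, bad_epochs = np.inf, 5, 0
for epoch in range(200):
    model.partial_fit(X_train, y_train)  # one pass over the data = one "epoch" here
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    if val_mse < best_val:
        best_val, bad_epochs = val_mse, 0  # still improving: reset the counter
    else:
        bad_epochs += 1                    # validation error did not improve
    if bad_epochs >= patience:
        print(f"stopping early at epoch {epoch}")
        break
print(f"best validation MSE: {best_val:.4f}")
```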
