
Bias and Variance

Machine learning is a branch of Artificial Intelligence that allows machines to perform data analysis and make predictions. However, if a machine learning model is not accurate, it can make prediction errors, and these prediction errors are usually known as bias and variance. In machine learning, such errors will always be present, because there is always a slight difference between the model's predictions and the actual values. The main aim of ML/data science analysts is to reduce these errors in order to get more accurate results. Errors can be divided into two types:

o Reducible errors: These errors can be reduced to improve the model's accuracy. They can be further classified into bias and variance.

o Irreducible errors: These errors will always be present in the model regardless of which algorithm is used. They are caused by unknown variables whose effect on the output cannot be reduced.

What is Bias?
In general, a machine learning model analyses the data, finds patterns in it, and makes predictions. While training, the model learns these patterns in the dataset and applies them to the test data for prediction. While making predictions, a difference occurs between the values predicted by the model and the actual/expected values, and this difference is known as bias error, or error due to bias. Bias can be defined as the inability of a machine learning algorithm such as Linear Regression to capture the true relationship between the data points. Every algorithm begins with some amount of bias, because bias arises from the assumptions the model makes to keep the target function simple to learn.

A model has either:

o Low Bias: A low-bias model makes fewer assumptions about the form of the target function.
o High Bias: A model with high bias makes more assumptions, and as a result it becomes unable to capture the important features of the dataset. A high-bias model also cannot perform well on new data.

Some examples of machine learning algorithms with low bias are Decision Trees, k-Nearest Neighbours and Support Vector Machines. In contrast, algorithms with high bias include Linear Regression, Linear Discriminant Analysis and Logistic Regression.
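
To make the distinction concrete, here is a minimal sketch, assuming NumPy and scikit-learn are available, that fits a high-bias Linear Regression and a low-bias Decision Tree to the same non-linear data; the sine-shaped target and sample sizes are illustrative choices, not from the original text:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Synthetic non-linear data: y = sin(x) plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

linear = LinearRegression().fit(X, y)     # assumes a straight line: high bias
tree = DecisionTreeRegressor().fit(X, y)  # few assumptions about the shape: low bias

print("linear train MSE:", mean_squared_error(y, linear.predict(X)))
print("tree   train MSE:", mean_squared_error(y, tree.predict(X)))
```

The linear model cannot follow the sine curve even on the data it was trained on, which is exactly the inability to capture the true relationship described above.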

Ways to reduce high bias in Machine Learning:

Use a more complex model: One of the main reasons for high bias is an overly simplified model that cannot capture the complexity of the data. In such cases, we can make our model more complex by increasing the number of hidden layers in a deep neural network, or we can use a more complex model such as Polynomial Regression for non-linear datasets, a CNN for image processing, or an RNN for sequence learning (see the sketch after this list).

Increase the number of features: Adding more features to the training dataset will increase the complexity of the model and improve its ability to capture the underlying patterns in the data.

Reduce regularization of the model: Regularization techniques such as L1 or L2 regularization help prevent overfitting and improve the generalization ability of the model, but if the model has high bias, reducing the strength of the regularization, or removing it altogether, can improve performance.

Increase the size of the training data: Increasing the size of the training data can help reduce bias by providing the model with more examples to learn from.
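
As a sketch of the first remedy above, the snippet below (again assuming scikit-learn, with an illustrative quadratic target) shows how adding polynomial features lets an otherwise too-simple linear model capture a non-linear relationship and thereby reduces bias:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic quadratic data that a plain straight line cannot fit.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.2, size=200)

for degree in (1, 2):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
    print(f"degree {degree} train MSE:", mean_squared_error(y, model.predict(X)))
```

Degree 1 underfits (high bias); degree 2 matches the shape of the data, and the error drops sharply.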

What is Variance?
Variance is the measure of spread in data from its mean position. In machine learning, variance is the amount by which the performance of a predictive model changes when it is trained on different subsets of the training data. More specifically, variance describes how sensitive the model is to the particular subset of training data it sees, i.e. how much its learned function changes when it is fitted on a new subset of the training dataset.

Low variance: Low variance means that the model is less sensitive to changes in the training data and produces consistent estimates of the target function across different subsets of data from the same distribution. Combined with high bias, this is the case of underfitting, where the model fails to perform well on both training and test data.
High variance: High variance means that the model is very sensitive to changes in the training data, which can result in significant changes in the estimate of the target function when it is trained on different subsets of data from the same distribution. This is the case of overfitting, where the model performs well on the training data but poorly on new, unseen test data: it fits the training data so closely that it fails to generalize.
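
The following sketch, assuming scikit-learn and a synthetic dataset chosen for the example, makes this sensitivity measurable: the same model class is retrained on many random subsets of the data, and the spread of its predictions at one fixed point is taken as an indicator of variance:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)

x_query = np.array([[1.0]])  # a fixed point at which we inspect the predictions
preds = []
for _ in range(50):
    idx = rng.choice(len(X), size=150, replace=False)  # a different training subset each time
    model = DecisionTreeRegressor().fit(X[idx], y[idx])
    preds.append(model.predict(x_query)[0])

print("std of predictions across subsets:", np.std(preds))
```

A large standard deviation means the model's estimate moves substantially with the training subset, which is precisely the high-variance behaviour described above.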
Ways to reduce variance in Machine Learning:
Cross-validation: By splitting the data into training and testing sets multiple times, cross-
validation can help identify if a model is overfitting or underfitting and can be used to tune
hyperparameters to reduce variance.
Feature selection: Choosing only the relevant features decreases the model's complexity and can reduce the variance error.
Regularization: We can use L1 or L2 regularization to reduce variance in machine learning models.
Ensemble methods: Ensemble methods combine multiple models to improve generalization performance. Bagging, boosting, and stacking are common ensemble methods that can help reduce variance (bagging is demonstrated in the sketch after this list).
Simplifying the model: Reducing the complexity of the model, such as decreasing the number of parameters or layers in a neural network, can also help reduce variance.
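
As an illustration of the ensemble remedy, here is a hedged sketch (scikit-learn assumed, synthetic data chosen for the example) comparing a single unpruned decision tree with a bagged ensemble of such trees; averaging over bootstrap-trained trees typically lowers variance without adding much bias:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)

single_tree = DecisionTreeRegressor()
bagged = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100)

for name, model in [("single tree", single_tree), ("bagged trees", bagged)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(name, "cross-validated MSE:", -scores.mean())
```

This also shows cross-validation in action: the same cv=5 split is what lets us compare the two models on held-out data rather than on a lucky training fit.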
Different Combinations of Bias-Variance
There are four possible combinations of bias and variance.

1. Low-Bias, Low-Variance: The combination of low bias and low variance is the ideal machine learning model. However, it is not practically achievable.
2. Low-Bias, High-Variance: With low bias and high variance, model predictions are inconsistent but accurate on average. This case occurs when the model learns with a large number of parameters, which leads to overfitting.
3. High-Bias, Low-Variance: With high bias and low variance, predictions are consistent but inaccurate on average. This case occurs when a model does not learn well from the training dataset or uses too few parameters. It leads to underfitting problems in the model.
4. High-Bias, High-Variance: With high bias and high variance, predictions are both inconsistent and inaccurate on average.
Bias Variance Tradeoff
If the algorithm is too simple (a hypothesis with a linear equation), it may be in a high-bias, low-variance condition and thus be error-prone. If the algorithm is too complex (a hypothesis with a high-degree equation), it may be in a high-variance, low-bias condition, in which case it will not perform well on new entries. There is a sweet spot between these two conditions, known as the Bias-Variance Trade-off. This tradeoff in complexity is why there is a tradeoff between bias and variance: an algorithm can't be more complex and less complex at the same time. On a graph of error against model complexity, the ideal point is where the total error, the sum of squared bias, variance, and irreducible error, is at its minimum.
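
A short sketch of the trade-off itself, under the same scikit-learn and synthetic-data assumptions as the earlier examples: sweeping the polynomial degree, training error keeps falling with complexity, while test error falls and then rises again, bottoming out at the trade-off point:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 3, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}  train MSE: {train_mse:.3f}  test MSE: {test_mse:.3f}")
```

Low degrees underfit (high bias), very high degrees overfit (high variance), and the lowest test error sits somewhere in between.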
