Ensemble Learning: Bagging, Boosting and Stacking
Machine learning is great! But there’s one thing that makes it even better: ensemble learning.
Bagging, boosting and stacking are the three most popular ensemble learning techniques, and each of them improves predictive accuracy. Each technique is used for a different purpose, with the choice between them depending on the problem at hand.
Individual machine learning models often struggle on their own. In other words, they tend to have low prediction accuracy. To mitigate this, we combine several models into one that performs better.
The individual models that we combine are known as weak learners. We call them weak learners because they either have a high bias or a high variance. Because they either have high bias or high variance, weak learners cannot learn the underlying patterns of the data effectively:
● A high-bias model results from not learning the data well enough. Because it misses the underlying relationships, its predictions are consistently off.
● A high-variance model results from learning the data too well. It varies with each data point, so it cannot predict new, unseen points accurately.
Both high-bias and high-variance models thus cannot generalize properly, so their predictions cannot be relied on by themselves.
As we know from the bias-variance trade-off, an underfit model has high
bias and low variance, whereas an overfit model has high variance and low
bias. In either case, there is no balance between bias and variance. For the model to generalize well, there needs to be a balance between the two.
Ensemble learning will aim to reduce the bias if we have a weak model with
high bias and low variance. Ensemble learning will aim to reduce the variance
if we have a weak model with high variance and low bias. This way, the
resulting model will be much more balanced, with low bias and low variance.
Thus, the resulting model will be known as a strong learner. This model will be
more generalized than the weak learners. It will thus be able to make accurate
predictions.
Ensemble learning improves a model’s performance in mainly three ways: bagging is used to reduce the variance of weak learners, boosting is used to reduce the bias of weak learners, and stacking is used to improve the overall accuracy of the predictions.
Bagging
We use bagging to combine weak learners that have high variance. The aim is to produce a model with lower variance than the individual weak models. These weak learners are homogenous, meaning they are of the same type. Bagging, also known as bootstrap aggregating, consists of two steps: bootstrapping and aggregating.
Bootstrapping
Bootstrapping involves resampling subsets of data with replacement from an initial dataset. In other words, subsets of data are taken from the initial dataset. These bootstrapped subsets are then used to train the individual weak learners; because the sampling is done with replacement, a given data point may appear in a subset more than once, as in the sketch below.
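To make bootstrapping concrete, here is a minimal Python sketch (NumPy and the toy dataset are my own assumptions, not part of the original article) that draws several subsets with replacement from an initial dataset:

import numpy as np

rng = np.random.default_rng(seed=42)

# Toy "initial dataset": 10 samples with 2 features each (illustrative only).
X = rng.normal(size=(10, 2))
y = rng.integers(0, 2, size=10)

n_subsets = 3          # number of bootstrapped subsets
subset_size = len(X)   # each subset is as large as the original dataset

bootstraps = []
for _ in range(n_subsets):
    # Sampling with replacement: the same data point can appear more than once.
    idx = rng.integers(0, len(X), size=subset_size)
    bootstraps.append((X[idx], y[idx]))

for i, (Xb, yb) in enumerate(bootstraps):
    print(f"bootstrap {i}: {len(Xb)} samples, first label {yb[0]}")

Each bootstrapped subset is later used to train one weak learner.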
Aggregating
The individual weak learners are trained independently from each other. Each learner then makes its own predictions, and these predictions are aggregated at the end into a single overall prediction. For classification, aggregation is typically done by max voting, which takes the mode of the predictions (the most occurring prediction). It is called voting because, as in election voting, the premise is that ‘the majority rules’. Each prediction counts as a single ‘vote’, and the most occurring ‘vote’ is chosen as the representative for the combined model. For regression problems, the predictions are usually averaged instead.
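As a small sketch of the two aggregation rules, here is max voting for classification and simple averaging for regression (the prediction arrays below are made up purely for illustration; NumPy is assumed):

import numpy as np

# Hypothetical class predictions from three weak learners for five samples.
votes = np.array([
    [0, 1, 1, 0, 1],   # learner 1
    [0, 1, 0, 0, 1],   # learner 2
    [1, 1, 1, 0, 0],   # learner 3
])

# Max voting: for each sample, the most frequent class ("the majority rules") wins.
majority = np.array([np.bincount(col).argmax() for col in votes.T])
print("max-voting result:", majority)

# Averaging, typically used for regression: hypothetical numeric predictions.
reg_preds = np.array([
    [2.0, 3.1, 4.0],
    [1.8, 2.9, 4.2],
    [2.2, 3.0, 3.8],
])
print("averaged result:", reg_preds.mean(axis=0))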
Steps of Bagging
The steps of bagging are as follows:
● We start with an initial training dataset.
● We create several subsets of data from the training set, taking a subset of N sample points from the initial dataset for each subset. Each subset is taken with replacement. This means that a specific data point can be sampled more than once.
● For each subset, we train a weak learner independently of the others, and each trained learner makes its own prediction.
● The predictions are aggregated into a single prediction. For this, either max voting (for classification) or averaging (for regression) is used. A sketch of the whole procedure follows this list.
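Putting the steps together, the following is a minimal bagging sketch using scikit-learn’s BaggingClassifier (the library choice, dataset, and hyperparameters are assumptions for illustration, not part of the original article):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Toy classification dataset standing in for the initial dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# BaggingClassifier trains each weak learner (decision trees by default)
# on a bootstrap sample and aggregates their predictions by voting.
bagging = BaggingClassifier(
    n_estimators=10,   # number of bootstrapped subsets / weak learners
    max_samples=1.0,   # each subset is as large as the training set
    bootstrap=True,    # sample with replacement
    random_state=0,
)
bagging.fit(X_train, y_train)
print("bagging test accuracy:", bagging.score(X_test, y_test))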
Boosting
We use boosting to combine weak learners that have high bias. The aim is to produce a model with a lower bias than that of the individual models. Like in bagging, the weak learners are homogeneous.
Boosting trains the weak learners sequentially, with each learner trying to correct the errors of the one before it. A sample of data is first taken from the initial dataset. This sample is used to train the first model, and the model makes its prediction. The samples can either be correctly or incorrectly predicted. The samples that are wrongly predicted are reused for training the next model. In this way, subsequent models can improve on the errors of the previous ones.
Unlike bagging, which aggregates the predictions only at the end, boosting aggregates the results at each step. They are aggregated using weighted averaging.
Weighted averaging assigns each model a weight according to its predictive power. In other words, it gives more weight to the model with the highest predictive power, because that learner’s predictions are considered the most reliable.
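As a tiny illustration of weighted averaging (the predictions and weights below are made up), a learner with greater predictive power contributes more to the combined prediction:

import numpy as np

# Hypothetical predictions from three learners for the same three samples.
preds = np.array([
    [0.60, 0.20, 0.90],   # learner 1
    [0.55, 0.35, 0.80],   # learner 2
    [0.70, 0.10, 0.95],   # learner 3
])

# Hypothetical weights reflecting each learner's predictive power.
weights = np.array([0.2, 0.3, 0.5])

# The weighted average pulls the combined prediction toward the strongest learner.
combined = np.average(preds, axis=0, weights=weights)
print("combined prediction:", combined)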
Steps of Boosting
● We take a subset of the training data and use it to train the first weak learner.
● We test the trained weak learner using the training data. As a result of the testing, some data points will be incorrectly predicted.
● Each data point with the wrong prediction is sent into the second subset of data, and this subset is updated.
● Using this updated subset, we train and test the second weak learner.
● We continue with the following subset until the total number of subsets is reached.
● We now have the total prediction. The overall prediction has already been aggregated at each step, so there is no need to calculate it separately. A sketch using an off-the-shelf boosting implementation follows this list.
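Here is a minimal boosting sketch using scikit-learn’s AdaBoostClassifier, which re-weights wrongly predicted points for the next learner and combines the learners with a weighted vote (the library choice, dataset, and settings are assumptions for illustration):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Toy dataset standing in for the initial training data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost trains weak learners (decision stumps by default) one after another,
# giving more weight to the data points the previous learners got wrong,
# and combines their outputs with a weighted vote.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)
boosting.fit(X_train, y_train)
print("boosting test accuracy:", boosting.score(X_test, y_test))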
Stacking
We use stacking to improve the overall prediction accuracy of strong learners. Unlike bagging and boosting, stacking combines heterogeneous models and adds a meta-model on top of them. The individual models are trained on the initial dataset; these models make predictions and form a single new dataset using those predictions. This new dataset is used to train the metamodel, which makes the final prediction. The individual models themselves are often bagged or boosted models.
Steps of Stacking
● We use the initial training data to train the individual models.
● The predictions of these models are combined to form a new dataset.
● The new dataset is used to train the meta-model.
● Using the results of the meta-model, we make the final prediction; the meta-model thereby combines the results of the individual models. A sketch of this procedure follows below.
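Here is a minimal stacking sketch using scikit-learn’s StackingClassifier (the base models, meta-model, and dataset are assumptions chosen for illustration):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy dataset standing in for the initial training data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Heterogeneous base models: their predictions form the new dataset
# that the meta-model (final_estimator) is trained on.
stacking = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # the meta-model
)
stacking.fit(X_train, y_train)
print("stacking test accuracy:", stacking.score(X_test, y_test))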
Bagging vs. Boosting vs. Stacking
If you are looking to reduce overfitting or variance, you use bagging. If you are looking to reduce underfitting or bias, you use boosting. If you are looking to increase overall predictive accuracy, you use stacking.
Bagging and boosting both work with homogeneous weak learners. Stacking works with heterogeneous learners. All three of these methods can work with either classification or regression problems.
One disadvantage of boosting is that it is itself prone to overfitting, so it is not advisable to use boosting for reducing variance: boosting will do a worse job at this than bagging. Conversely, bagging is of little help when the goal is to reduce bias or underfitting. This is because bagging is more prone to bias and does not reduce it.
Stacked models tend to have higher prediction accuracy than bagged or boosted models, but because they combine several such models, they have the disadvantage of needing much more time and computational power. If you are looking for faster results, it’s advisable not to use stacking; if accuracy is the priority, the extra cost can be worthwhile.
Conclusion
One of the first uses of ensemble methods was the bagging technique, developed to overcome the instability of individual decision trees. The best-known example of the bagging technique is the random forest algorithm. The random forest combines many decision trees; bagging is employed to form a random forest, and the resulting random forest has a much lower variance than any single tree.
The success of bagging led to the development of other ensemble techniques such as boosting, stacking, and many others. Today, these developments are used in applications such as autonomous vehicles, medical diagnosis, and many others. These systems are crucial because they have the ability to impact human lives and business revenues: an inaccurate model can cost money and, in critical applications, even endanger human lives.
Bagging, boosting and stacking are important for ensuring the accuracy of machine learning models and for keeping us from relying on inaccurate models. Below are some of the key takeaways from the article: