Ensemble Final
Agenda
• Ensemble Learning
• Boosting
• Gradient Boosting and XGBoost
• Overfitting/Underfitting
• How to address Overfitting/Underfitting
Ensemble Learning
• Performance
– No single classifier is perfect
– Classifiers are complementary
• Examples that are misclassified by one classifier may be classified correctly by another classifier
• Potential Improvements
– Exploit this complementary property by combining classifiers
An Example
[Figure: a decision tree for predicting Height > 55" — root node Male?; if yes, split on Age > 9 (yes → 1, no → 0); if no, split on Age > 10 (yes → 1, no → 0). A small decision-tree sketch on this table follows.]
Name   Age  Male?  Height > 55"
Alice  14   0      1
Bob    10   1      1
Carol  13   0      1
Dave   8    1      0
Erin   11   0      0
Frank  9    1      1
Gena   8    0      0
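Below is a minimal sketch, assuming scikit-learn is installed; the feature encoding and the depth limit are illustrative choices, so the learned splits may differ slightly from the figure.

# Toy dataset from the slide: features [Age, Male?], label Height > 55"
from sklearn.tree import DecisionTreeClassifier

X = [[14, 0], [10, 1], [13, 0], [8, 1], [11, 0], [9, 1], [8, 0]]   # Alice .. Gena
y = [1, 1, 1, 0, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)
print(tree.predict([[14, 0]]))   # prediction for a 14-year-old female (Alice)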
Ensembles of Classifiers
Combine multiple classifiers to improve performance
Ensembles of Classifiers
– Two ways to combine the classification results from different classifiers into the final output (a voting sketch follows below)
• Unweighted voting
• Weighted voting
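A minimal sketch of both schemes, assuming scikit-learn and a synthetic dataset; the base models and the weights are illustrative choices, not part of the slide.

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
base = [("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(max_depth=3))]

# Unweighted voting: every classifier gets exactly one vote.
unweighted = VotingClassifier(estimators=base, voting="hard")

# Weighted voting: classifiers judged more reliable get a larger say.
weighted = VotingClassifier(estimators=base, voting="hard", weights=[2, 1, 1])

for clf in (unweighted, weighted):
    print(clf.fit(X, y).score(X, y))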
Example: Weather Forecast
[Figure: several individual forecasts compared with reality over a series of days; no single forecaster is always right, but combining their predictions gives a better overall forecast — see the majority-vote sketch below.]
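A minimal NumPy sketch of the idea, with made-up forecaster predictions (the 0/1 values are illustrative, not the slide's data).

import numpy as np

# Rows = days 1..5, columns = three forecasters; 1 = "rain", 0 = "no rain".
preds = np.array([[1, 1, 0],
                  [1, 0, 1],
                  [0, 1, 1],
                  [0, 0, 1],
                  [1, 0, 0]])

# Unweighted majority vote: predict "rain" when more than half the forecasters agree.
combined = (preds.sum(axis=1) > preds.shape[1] / 2).astype(int)
print(combined)   # combined forecast for each day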
Types of Ensemble Methods
The three most popular methods for combining the predictions from different models are bagging, boosting, and voting.
[Figure: the classifier h learned from the original Person / Age / Male? / Height > 55" table (Alice, Bob, Carol, Dave, Erin, Frank, Gena) is applied to new, unseen people; its generalization error is the expected loss E[f(h(x), y)] between the prediction h(x) and the true label y.]
Boosting
• The term ‘boosting’ refers to a family of algorithms that convert weak learners into strong learners.
• A weak learner is a classifier that is only slightly correlated with the actual classification, while a strong learner is a classifier that is well correlated with the actual classification.
• To find a weak rule, we apply a base learning (ML) algorithm with a different distribution in each round. Each time the base learning algorithm is applied, it generates a new weak prediction rule. After many iterations, the boosting algorithm combines these weak rules into a single strong prediction rule, as sketched below.
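As an illustration, here is a minimal sketch using AdaBoost, one standard boosting algorithm; scikit-learn and the synthetic dataset are assumptions, not part of the slide.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# A single weak learner: a depth-1 decision stump.
stump = DecisionTreeClassifier(max_depth=1)
print("single stump:", stump.fit(X, y).score(X, y))

# Boosting combines many such weak rules into one strong prediction rule
# (AdaBoost's default base learner is a depth-1 stump).
boosted = AdaBoostClassifier(n_estimators=100, random_state=0)
print("boosted:", boosted.fit(X, y).score(X, y))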
Boosting
Choosing a different distribution for each round
• We take the records that are misclassified by the first base learner (BL1) and pass them to a second base learner, BL2; likewise, the records misclassified by BL2 are used to train BL3, and so on (a simplified re-weighting sketch follows).
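A simplified, hedged sketch of the re-weighting idea (an AdaBoost-style update rather than the slide's exact procedure; scikit-learn, the synthetic data, and the doubling factor are assumptions).

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
w = np.full(len(y), 1.0 / len(y))          # start from a uniform distribution

for round_no in range(3):
    # Train the next base learner (BL1, BL2, BL3, ...) on the current distribution.
    bl = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    wrong = bl.predict(X) != y
    print(f"round {round_no}: weighted error = {w[wrong].sum():.3f}")
    # Up-weight the misclassified records, then renormalize, so the next
    # base learner concentrates on the examples this one got wrong.
    w[wrong] *= 2.0
    w /= w.sum()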
Disadvantages of Bagging
• It may result in high bias if the base models are not chosen properly, and thus may lead to underfitting.
• Since multiple models must be trained, it is computationally expensive and may not be suitable for some use cases (a minimal bagging sketch follows below).
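For reference, a minimal bagging sketch, assuming scikit-learn and a synthetic dataset; by default each base model is a decision tree fit on its own bootstrap resample of the training data.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, random_state=0)

bag = BaggingClassifier(
    n_estimators=10,     # ten base models, each fit on a bootstrap resample
    bootstrap=True,      # sample the training set with replacement
    random_state=0,
)                        # the default base learner is a decision tree
print(bag.fit(X, y).score(X, y))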
Advantages of Boosting
• It is one of the most successful techniques for solving two-class classification problems.
• It handles missing data well.
Disadvantages of Boosting
• Boosting is hard to implement in real time due to the increased complexity of the algorithm.
• The high flexibility of these techniques results in a large number of parameters that have a direct effect on the behaviour of the model.
Types of Boosting Algorithms (both sketched below)
• Gradient Tree Boosting
• XGBoost
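A minimal sketch of both variants, assuming scikit-learn and the third-party xgboost package are installed; the hyperparameters and the synthetic dataset are illustrative choices.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, random_state=0)

gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
xgb = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)

for model in (gbt, xgb):
    print(type(model).__name__, model.fit(X, y).score(X, y))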
• Q.1 What do you mean by Bagging?