
Random Forest

LECTURER:
Humera Farooq, Ph.D.
Computer Sciences Department,
Bahria University (Karachi Campus)
Outline
 RF as an ensemble method
 Overfitting
 Some experimental issues
Ensemble methods
 A single decision tree often does not perform well on its own
 But it is very fast to train
 What if we learn multiple trees?
 We need to make sure they do not all just learn the same thing
Ensemble methods
 Ensemble learning helps to improve machine learning results by combining several models.
 This approach generally yields better predictive performance than any single model.
 The basic idea is to learn a set of classifiers (experts) and to let them vote.
 Bagging and Boosting are two types of ensemble learning.
 Both decrease the variance of a single estimate by combining several estimates from different models, so the result may be a model with higher stability.
 Advantage: improvement in predictive accuracy.
 Disadvantage: it is difficult to understand an ensemble of classifiers.
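The voting idea can be made concrete with a short sketch. This is a minimal illustration only, assuming scikit-learn is available and using a synthetic dataset; the particular base classifiers chosen here are arbitrary and not part of the lecture.

# Minimal voting-ensemble sketch (scikit-learn assumed; synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three different "experts"; each votes on the class of every test point.
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("logreg", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
    ],
    voting="hard",  # majority vote over the predicted class labels
)
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))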
Ensemble methods
• Statistical Problem –
The Statistical Problem arises when the hypothesis space is too large for the amount of
available data. Hence, there are many hypotheses with the same accuracy on the data and the
learning algorithm chooses only one of them! There is a risk that the accuracy of the chosen
hypothesis is low on unseen data!
• Computational Problem –
The Computational Problem arises when the learning algorithm cannot guarantee finding the
best hypothesis.
• Representational Problem –
The Representational Problem arises when the hypothesis space does not contain any good
approximation of the target class(es).

• The main challenge is not to obtain highly accurate base models, but rather to obtain base
models which make different kinds of errors. For example, if ensembles are used for
classification, high accuracies can be accomplished if different base models misclassify
different training examples, even if the base classifier accuracy is low.
Bagging
 Bagging (bootstrap aggregating) is a method that reduces variance, usually applied to
decision tree methods.
 It is a homogeneous weak learners’ model in which the learners are trained independently of
each other, in parallel; it is an instance of the model-averaging approach.
 It is designed to improve the stability and accuracy of machine learning algorithms used in
statistical classification and regression.
 If we split the data in different random ways, decision trees give different results: high
variance. Bagging decreases this variance and helps to avoid overfitting.
 If we had multiple realizations of the data (or multiple samples), we could compute the
prediction on each one and take the average, exploiting the fact that averaging several noisy
estimates produces a less uncertain result.
 The Random Forest model uses bagging with decision trees, which have high variance, as the
base models, and additionally makes a random feature selection to grow each tree. Several such
random trees make a Random Forest.
Bagging
• Step 1: Multiple subsets are created from the original data set, each with the same number of
tuples, selecting observations with replacement.
• Step 2: A base model is created on each of these subsets.
• Step 3: Each model is learned in parallel with each training set and independent of
each other.
• Step 4: The final predictions are determined by combining the predictions from all
the models.
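These four steps map directly onto a few lines of code. The sketch below is an illustration only, assuming NumPy and scikit-learn and a synthetic binary-classification dataset; the number of bootstrap samples B = 25 is an arbitrary choice.

# Bagging sketch: bootstrap subsets, independent trees, majority-vote aggregation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
B = 25                        # number of bootstrap subsets / base models
n = len(X_train)
models = []
for b in range(B):
    idx = rng.integers(0, n, size=n)          # Step 1: sample with replacement, same size
    tree = DecisionTreeClassifier(random_state=b)
    tree.fit(X_train[idx], y_train[idx])      # Steps 2-3: one independent base model per subset
    models.append(tree)

# Step 4: combine the B predictions by majority vote (binary labels 0/1)
all_preds = np.array([m.predict(X_test) for m in models])
y_pred = (all_preds.mean(axis=0) > 0.5).astype(int)
print("bagged accuracy:", np.mean(y_pred == y_test))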
Variable Importance Measures
 Bagging results in improved accuracy over prediction
using a single tree
 Unfortunately, the resulting model is difficult to interpret:
bagging improves prediction accuracy at the expense of
interpretability.
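The slide does not prescribe a specific importance measure. As a hedged illustration, the sketch below uses the impurity-based importances that scikit-learn exposes on fitted tree ensembles through the feature_importances_ attribute, on a synthetic dataset.

# Variable-importance sketch (mean decrease in impurity; scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=3, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for i, importance in enumerate(rf.feature_importances_):
    print(f"feature {i}: importance {importance:.3f}")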
Bagging - issues
Each tree is identically distributed (i.d.):
 the expectation of the average of B such trees is the same as the expectation of any one of them
 the bias of bagged trees is the same as that of the individual trees
 the trees are i.d. and not i.i.d. (independently and identically distributed)
Possible remedies:
 We can penalize the splitting (like in pruning) with a penalty term that depends on the
number of times a predictor is selected at a given length
 We can restrict how many times a predictor can be used
 We can only allow a certain number of predictors
Bagging - issues
Remember that we want i.i.d. trees, so that the bias stays the
same while the variance is reduced.
What if we consider only a subset of the predictors at
each split?
We will still get correlated trees unless …
… we randomly select the subset!
Random Forests
As in bagging, we build a number of decision trees on
bootstrapped training samples. Each time a split in a tree
is considered, a random sample of m predictors is
chosen as split candidates from the full set of p
predictors.

Note that if m = p, then this is simply bagging.
Random Forest
 Random forests (RF) are a combination of tree predictors such that each tree
depends on the values of a random vector sampled independently and with the same
distribution for all trees in the forest.

 The generalization error of a forest of tree classifiers depends on the strength of the
individual trees in the forest and the correlation between them.

 Using a random selection of features to split each node yields error rates that
compare favorably to Adaboost, and are more robust with respect to noise.

 Instead of relying on one decision tree, the random forest takes the prediction from
each tree and, based on the majority vote of these predictions, produces the final
output.
 A greater number of trees in the forest generally leads to higher accuracy and helps
prevent overfitting.

Random Forests Algorithm
For b = 1 to B:
(a) Draw a bootstrap sample Z∗ of size N from the training data.
(b) Grow a random-forest tree to the bootstrapped data, by recursively
repeating the following steps for each terminal node of the tree, until the minimum
node size n_min is reached.
i. Select m variables at random from the p variables.
ii. Pick the best variable/split-point among the m.
iii. Split the node into two daughter nodes.
Output the ensemble of trees.

To make a prediction at a new point x:
 For regression: average the predictions of the trees
 For classification: take the majority vote
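The algorithm above maps closely onto scikit-learn's RandomForestClassifier. The sketch below is illustrative only, assuming scikit-learn and a synthetic dataset; B corresponds roughly to n_estimators, m to max_features, and n_min to min_samples_leaf.

# Random-forest sketch (scikit-learn assumed; synthetic classification data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,      # B: number of bootstrapped trees
    max_features="sqrt",   # m: variables sampled as split candidates at each node
    min_samples_leaf=1,    # n_min: minimum node size
    bootstrap=True,        # draw Z* with replacement from the training data
    random_state=0,
)
rf.fit(X_train, y_train)
print("test accuracy:", rf.score(X_test, y_test))  # prediction = majority vote over trees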
Random Forests Tuning
The inventors make the following recommendations:
In the case of a classification problem, the final output is obtained by majority vote over
the trees. The default value for m is √p and the minimum node size is one.
In the case of a regression problem, the final output is the mean of all the trees' outputs.
The default value for m is p/3 and the minimum node size is five.

In practice the best values for these parameters will depend on the problem, and they
should be treated as tuning parameters.
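As a hedged illustration of these recommendations, the snippet below expresses them as scikit-learn constructors (only the constructors are shown); the exact parameter mapping is an assumption, since the lecture does not tie itself to a particular library.

# Recommended starting points expressed as scikit-learn constructors (assumed mapping).
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: m = sqrt(p), minimum node size 1; prediction by majority vote.
clf = RandomForestClassifier(max_features="sqrt", min_samples_leaf=1, random_state=0)

# Regression: m = p/3, minimum node size 5; prediction by averaging.
reg = RandomForestRegressor(max_features=1/3, min_samples_leaf=5, random_state=0)

# m = p: every split sees all predictors, so the forest reduces to plain bagging.
bagged = RandomForestClassifier(max_features=None, random_state=0)

In practice these settings would still be tuned, for example by cross-validation, as noted above.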
Random Forest Classifier
[Diagram: training data with N examples and M features]
 Create bootstrap samples from the training data
 Construct a decision tree from each bootstrap sample
 At each node, choose the split feature only among a random subset of m < M features
 Aggregation: take the majority vote of the trees' predictions
Example
 4,718 genes measured on tissue samples from 349 patients.
 Each gene has a different expression level in each sample.
 Each patient sample has a qualitative label with 15
different levels: either normal or 1 of 14 different types of
cancer.
 We can use random forests to predict cancer type based on the
500 genes that have the largest variance in the training set.
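A hedged sketch of this workflow is shown below. The real gene-expression data set is not included here, so random placeholder values of the same shape are used purely to illustrate the steps (variance filtering, then a random forest); the reported accuracy is meaningless on such data.

# Workflow sketch: keep the 500 highest-variance genes, then fit a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(349, 4718))     # placeholder for 349 patients x 4,718 genes
y = rng.integers(0, 15, size=349)    # 15 labels: normal or 1 of 14 cancer types

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Select the 500 genes with the largest variance, using the training set only.
top500 = np.argsort(X_train.var(axis=0))[-500:]

rf = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=0)
rf.fit(X_train[:, top500], y_train)
print("test accuracy:", rf.score(X_test[:, top500], y_test))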
Can RF overfit?
Random forests “cannot overfit” the data with respect to the
number of trees.

Increasing the number of trees B does not increase the
flexibility of the model.
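A small experiment, sketched below on synthetic noisy data with scikit-learn (the dataset and the values of B are arbitrary choices), illustrates the point: as B grows, the test accuracy stabilizes rather than degrading.

# Sketch: test accuracy as a function of the number of trees B.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for B in (1, 10, 50, 200, 1000):
    rf = RandomForestClassifier(n_estimators=B, random_state=0).fit(X_train, y_train)
    print(f"B = {B:4d}   test accuracy = {rf.score(X_test, y_test):.3f}")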
Boosting
Boosting is a general approach that can be applied to many statistical learning
methods for regression or classification.
It is also a homogeneous weak learners’ model, but it works differently from bagging: the
learners are trained sequentially and adaptively, each one trying to improve on the
predictions of the previous ones.

Bagging: generate multiple trees from bootstrapped data and average the trees.
Bagging results in i.d. trees, not i.i.d. trees.

RF produces i.i.d. (or at least more independent) trees by randomly selecting a subset of
predictors at each split.
Boosting
Boosting works very differently:
1. Boosting does not involve bootstrap sampling
2. Trees are grown sequentially: each tree is grown using information from previously
grown trees
3. Like bagging, boosting involves combining a large number of decision trees, f_1, …, f_B
Boosting
1. Initialize the dataset and assign an equal weight to each data point.

2. Provide this as input to the model and identify the wrongly classified data points.

3. Increase the weights of the wrongly classified data points and decrease the weights of the
correctly classified data points, then normalize the weights of all data points.

4. if (got required results)


   Goto step 5
else
   Goto step 2

5. End
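This reweighting loop is essentially the AdaBoost procedure. As a minimal sketch, assuming scikit-learn and a synthetic dataset, the same idea can be run with AdaBoostClassifier, whose default weak learner is a depth-1 decision tree (a decision stump).

# Boosting sketch: sequential reweighting via AdaBoost (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each round fits a weak learner, up-weights misclassified points, and repeats.
boost = AdaBoostClassifier(n_estimators=200, learning_rate=0.5, random_state=0)
boost.fit(X_train, y_train)
print("test accuracy:", boost.score(X_test, y_test))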
References
Data Science, Harvard University
https://www.javatpoint.com/machine-learning-random-forest-algorithm
https://www.geeksforgeeks.org/bagging-vs-boosting-in-machine-learning/
