Decision Tree & Random Forest
Continuous variable decision tree: A tree whose target (response) variable is continuous is known as a continuous variable decision tree.
The accuracy of a tree is heavily affected by the split point chosen at each decision node. Decision trees use
different criteria to decide how to split a decision node into two or more sub-nodes. The resulting
sub-nodes must increase the homogeneity of the data points, also known as the purity of the nodes,
with respect to the target variable. Candidate splits are tested on all available variables, and
the split that yields the purest sub-nodes is selected.
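As a rough sketch of this search (in Python), the snippet below tries every threshold of a single numeric feature and keeps the split whose sub-nodes are purest; the function names, the toy data, and the use of Gini impurity (introduced in the next section) are illustrative assumptions rather than part of the original text.

import numpy as np

def gini(labels):
    # Gini impurity of a set of class labels (defined formally in the next section).
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(feature, labels):
    # Try every observed value of one numeric feature as a threshold and keep
    # the threshold whose two sub-nodes have the lowest weighted impurity.
    best_t, best_impurity = None, float("inf")
    for t in np.unique(feature):
        left, right = labels[feature <= t], labels[feature > t]
        if len(left) == 0 or len(right) == 0:
            continue
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_t, best_impurity = t, weighted
    return best_t, best_impurity

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])   # toy feature values
y = np.array([0, 0, 0, 1, 1, 1])                  # toy class labels
print(best_split(x, y))   # threshold 3.0 gives perfectly pure sub-nodes (impurity 0.0)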
Measures of Impurity: Decision trees recursively split on features so as to maximize the purity of the
target variable in the resulting nodes. The algorithm is designed to optimize each split so that purity is maximized.
Impurity can be measured in several ways, such as Gini impurity, entropy, and information
gain.
Gini Impurity - The Gini index is a measure of how often a randomly chosen element from the
set would be incorrectly labelled. Mathematically, the impurity of a set can be expressed as:
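In the form commonly used for classification trees: Gini impurity = 1 - Σ pᵢ², where pᵢ is the proportion of samples in the set that belong to class i. A perfectly pure node has a Gini impurity of 0, while an even 50/50 split between two classes has a Gini impurity of 0.5.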
Entropy - Entropy is the uncertainty in our dataset, or a measure of disorder. For a node it is calculated as E = -Σ pᵢ log₂(pᵢ), where pᵢ is again the proportion of samples belonging to class i.
In a decision tree, the output is mostly “yes” or “no”.
Suppose a feature has 8 “yes” and 4 “no” instances initially; after the first split the left node gets
5 “yes” and 2 “no” whereas the right node gets 3 “yes” and 2 “no”.
We see here that the split is not pure. Why? Because we can still see some negative
classes in both nodes. In order to build a decision tree, we need to calculate the
impurity of each split, and when the purity is 100% we make that node a leaf node.
To check the impurity of the left and right nodes, we take the help of the entropy
formula and plug in the class proportions of each node.
We can clearly see that the left node has lower entropy, i.e. more purity, than the right node,
since a larger share of its samples are “yes” and it is easier to decide
here.
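As a quick check on these numbers, here is a small Python sketch that computes the entropy of the two child nodes from the example above; the function name and layout are illustrative only.

import math

def entropy(counts):
    # Entropy of a node given its class counts, e.g. [number of "yes", number of "no"].
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([5, 2]))   # left node (5 "yes", 2 "no"): about 0.86
print(entropy([3, 2]))   # right node (3 "yes", 2 "no"): about 0.97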
Always remember that the higher the Entropy, the lower will be the purity and the
higher will be the impurity.
To decide which feature to split on, we bring in a new metric called “information gain”, which tells us how much the
parent entropy has decreased after splitting on some feature.
Information Gain
Information gain measures the reduction in uncertainty given some feature, and it is also the
deciding factor for which attribute should be selected as a decision node or root node.
It is simply the entropy of the full dataset minus the (weighted) entropy of the dataset given some feature.
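Written out for a split that produces several child nodes: Information Gain = E(Parent) - Σ (n_child / n_parent) × E(child), where n_child is the number of samples reaching a child node and n_parent is the number of samples in the parent. The weighted sum over the children is the entropy of the dataset given the feature, written E(Parent|feature) below.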
Suppose our entire population has a total of 30 instances, and the task is to predict whether
a person will go to the gym or not. Let's say 16 people go to the gym and 14 people don't.
We have two features with which to predict whether a person will go to the gym.
Let’s see how our decision tree will be made using these 2 features. We’ll use information
gain to decide which feature should be the root node and which feature should be placed
after the split.
Once we have the values of E(Parent) and E(Parent|Energy), computed from the counts in each branch, the information gain is their difference, E(Parent) - E(Parent|Energy).
Our parent entropy was about 0.99, and from the resulting information gain we can
say that the entropy of the dataset will decrease by 0.37 if we make “Energy” our root
node.
Similarly, we will do this with the other feature “Motivation” and calculate its information
gain.
With E(Parent) and E(Parent|Motivation), the information gain is computed in the same way, as E(Parent) - E(Parent|Motivation).
We now see that the “Energy” feature gives a larger reduction in entropy (0.37) than the
“Motivation” feature. Hence we select the feature with the highest information
gain and split the node based on that feature.
In this example, “Energy” will be our root node, and we do the same for the sub-nodes. Here we
can see that when the energy is “high” the entropy is low, so we can say a person will
almost certainly go to the gym if they have high energy. But what if the energy is low? We then
split that node again, this time on the other feature, “Motivation”.
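To make the calculation concrete, here is a small Python sketch of the entropy and information-gain computations. The branch counts used for the “Energy” split are assumptions (the figure with the actual counts is not reproduced here), chosen only so that the gain comes out close to the 0.37 quoted above.

import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    # Information gain = E(parent) - weighted average entropy of the child nodes.
    n = sum(parent_counts)
    weighted_children = sum(
        (sum(child) / n) * entropy(child) for child in child_counts_list
    )
    return entropy(parent_counts) - weighted_children

print(entropy([16, 14]))   # parent node: about 0.997, i.e. the ~0.99 mentioned above

# Assumed branch counts for the "Energy" split: [yes, no] when energy is high / low.
print(information_gain([16, 14], [[12, 1], [4, 13]]))   # about 0.38, close to 0.37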
There are many ways to tackle the problem of overfitting through hyperparameter tuning. We can set the
maximum depth of our decision tree using the max_depth parameter. The larger the value
of max_depth, the more complex the tree will be. The training error will of course decrease
as we increase max_depth, but when the test data comes into the picture we will get
very bad accuracy. Hence you need a value that neither overfits nor underfits the data,
and for this you can use GridSearchCV.
Another way is to set the minimum number of samples for each split. It is denoted
by min_samples_split. Here we specify the minimum number of samples required to perform a split.
For example, we can require a minimum of 10 samples to reach a decision. That means that if a node
has fewer than 10 samples, then using this parameter we stop the further splitting of this
node and make it a leaf node.
max_features – it helps us decide what number of features to consider when looking for the
best split.
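As a sketch of how these three parameters might be tuned together with GridSearchCV, assuming scikit-learn and a generic built-in classification dataset (the grid values below are arbitrary examples, not recommendations):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)   # any classification dataset works here

param_grid = {
    "max_depth": [3, 5, 7, 10],           # deeper trees are more complex
    "min_samples_split": [2, 10, 20],     # minimum samples required to split a node
    "max_features": [None, "sqrt", 0.5],  # how many features to consider per split
}

search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)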
Pruning
It is another method that can help us avoid overfitting. It helps in improving the performance
of the tree by cutting the nodes or sub-nodes which are not significant. It removes the
branches which have very low importance.
(i) Pre-pruning – we can stop growing the tree earlier, which means we can
prune/remove/cut a node if it has low importance while growing the tree.
(ii) Post-pruning – once our tree is built to its depth, we can start pruning the nodes based on
their significance.
A large, complex tree generalizes poorly to new sample data, whereas a very small tree fails
to capture the information in the training data.
Pruning may be defined as shortening the branches of the tree: the size of the tree is reduced
by turning some branch nodes into leaf nodes and removing the leaf nodes under the
original branch.
Pruning is very useful for decision trees because a tree may fit the training data very well but
perform poorly on test or new data. By removing branches we reduce the complexity of the tree,
which helps reduce the overfitting of the tree.
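As one concrete way to post-prune, scikit-learn exposes cost-complexity pruning through the ccp_alpha parameter of DecisionTreeClassifier. The sketch below is an assumed example of picking an alpha on held-out data; the dataset and the split are placeholders.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# Candidate alpha values for cost-complexity pruning, derived from the training data.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# A larger ccp_alpha prunes more aggressively; keep the alpha that scores best on held-out data.
best_alpha = max(
    path.ccp_alphas,
    key=lambda a: DecisionTreeClassifier(random_state=0, ccp_alpha=a)
    .fit(X_train, y_train)
    .score(X_valid, y_valid),
)
print("chosen ccp_alpha:", best_alpha)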
RANDOM FOREST
Random forest is a Supervised Machine Learning Algorithm that is used
widely in Classification and Regression problems. It builds decision trees on
different samples and takes their majority vote for classification and average
in case of regression.
One of the most important features of the random forest algorithm is that
it can handle data sets containing continuous variables, as in the case of
regression, as well as categorical variables, as in the case of classification.
Before understanding the working of the random forest, we must look into the
ensemble technique. Ensemble simply means combining multiple models; a
collection of models is used to make predictions rather than an individual model.
Random forest works on the bagging principle, so let's dive into bagging.
Bagging
Bagging, also known as bootstrap aggregation, is the ensemble technique
used by random forest. Bagging chooses random samples from the data
set: each model is trained on a sample (a bootstrap sample) drawn from the
original data with replacement, which is known as row sampling.
This step of row sampling with replacement is called bootstrapping. Each
model is then trained independently and generates its own result. The final output is
based on majority voting after combining the results of all the models. This step,
which involves combining all the results and generating an output based on
majority voting, is known as aggregation.
Now let's look at an example by breaking it down with the help of the
following figure. Here the bootstrap samples (Bootstrap sample 01, Bootstrap sample 02,
and Bootstrap sample 03) are taken from the actual data with replacement, which means
each sample is very likely to contain repeated rows rather than only unique ones.
The models (Model 01, Model 02, and Model 03) obtained from these bootstrap samples
are trained independently, and each model generates a result as shown. Since the happy
emoji has the majority when compared to the sad emoji, the final output obtained by
majority voting is the happy emoji.
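To make the bootstrap-and-vote idea concrete, here is a minimal from-scratch Python sketch. The dataset, the number of models, and the choice of decision trees as base learners are assumptions for illustration, and this is plain bagging rather than a full random forest (there is no column sampling).

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

# Bootstrap: draw row samples with replacement and train one model per sample.
models = []
for _ in range(25):
    rows = rng.integers(0, len(X), size=len(X))   # sampled row indices may repeat
    models.append(DecisionTreeClassifier().fit(X[rows], y[rows]))

# Aggregation: combine the individual predictions by majority vote.
votes = np.stack([m.predict(X) for m in models])        # shape: (n_models, n_samples)
majority = (votes.mean(axis=0) >= 0.5).astype(int)      # works for 0/1 class labels
print("training accuracy of the bagged ensemble:", (majority == y).mean())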
In a random forest, n random records are taken (with replacement) from a data set having
k records, an individual decision tree is built for each sample, each tree produces its
own output, and the final output is obtained by majority voting for classification or by
averaging for regression.
Train-test split - In a random forest we don't have to segregate separate data
for train and test, as there will always be roughly one-third of the rows (the
out-of-bag samples) that are not seen by any given decision tree.
Thus, random forests are much more successful than single decision trees only if
the individual trees are diverse and individually reasonably accurate.
Important Hyperparameters
Hyperparameters are used in random forests to either enhance the
performance and predictive power of models or to make the model faster.
n_jobs – it tells the engine how many processors it is allowed to use. If the
value is 1, it can use only one processor, but if the value is -1 there is no
limit.
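Putting this together, here is a minimal scikit-learn random forest sketch; the parameter values and the dataset are assumptions for illustration. Setting oob_score=True uses the out-of-bag rows discussed above as a built-in validation check.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=200,     # number of trees in the forest
    max_features="sqrt",  # features considered at each split (column sampling)
    n_jobs=-1,            # use all available processors
    oob_score=True,       # evaluate on the out-of-bag samples
    random_state=0,
)
forest.fit(X, y)
print("out-of-bag accuracy:", forest.oob_score_)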
Advantages
1. Each decision tree is built independently of the others, so the algorithm
parallelizes well.
2. It maintains diversity, since not all attributes are considered while making
each decision tree (though this is not true in all cases).
3. We don't have to segregate data into train and test, as roughly one-third of
the rows (the out-of-bag samples) are never seen by the decision tree built from
a given bootstrap sample.
Disadvantages
The main trade-offs are speed and interpretability: a forest is slower to train
and to use for prediction than a single decision tree, and an ensemble of many
trees is harder to interpret than one tree.
Summary
Now we can conclude that random forest is one of the best-performing techniques,
widely used in various industries for its efficiency. It can handle binary,
continuous, and categorical data.
Random forest is a great choice if you want to build a model quickly and
efficiently, as one of the best things about the random forest is that it can
handle missing values.
Overall, random forest is a fast, simple, flexible, and robust model with some
limitations.