Decision Tree
1. ID3: Ross Quinlan is credited with the development of ID3, which is shorthand for “Iterative
Dichotomiser 3.” This algorithm leverages entropy and information gain as metrics to evaluate
candidate splits. Quinlan published his research on this algorithm in 1986.
2. C4.5: This algorithm is a later iteration of ID3, also developed by Quinlan. It can use
information gain or gain ratio to evaluate split points within decision trees.
3. CART: The term CART is an abbreviation for “classification and regression trees” and was
introduced by Leo Breiman. This algorithm typically utilizes Gini impurity to identify the ideal
attribute to split on. Gini impurity measures how often a randomly chosen sample would be
misclassified if it were labeled according to the class distribution of the node. When evaluating
splits with Gini impurity, a lower value is better.
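A minimal sketch of how these split criteria appear in practice, assuming scikit-learn (not referenced in the text above): its DecisionTreeClassifier implements an optimized CART-style algorithm, and the criterion parameter selects between Gini impurity and entropy-based splitting.

```python
# Sketch: same tree learner, two split criteria (Gini vs. entropy).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# CART-style tree using Gini impurity (the default criterion)
gini_tree = DecisionTreeClassifier(criterion="gini", max_depth=3).fit(X, y)

# Same tree learner, but scoring splits with entropy / information gain
entropy_tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y)

print(gini_tree.score(X, y), entropy_tree.score(X, y))
```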
Decision Tree Assumptions
• Several assumptions are made when building decision trees to produce effective
models. These assumptions guide the tree’s construction and impact its
performance. Here are some common assumptions and considerations:
1. Binary Splits
➢Decision trees typically make binary splits, meaning each node divides the data
into two subsets based on a single feature or condition. This assumes that each
decision can be represented as a binary choice.
2. Recursive Partitioning
➢Decision trees use a recursive partitioning process, where each node is divided
into child nodes, and this process continues until a stopping criterion is met. This
assumes that data can be effectively subdivided into smaller, more manageable
subsets.
3. Feature Independence
➢These trees often assume that the features used for splitting nodes are independent.
In practice, feature independence may not hold, but decision trees can still perform
well even when features are correlated.
4. Homogeneity
➢Decision trees aim to create homogeneous subgroups in each node, meaning that the
samples within a node are as similar as possible with respect to the target variable.
This assumption helps in achieving clear decision boundaries.
5. Top-Down Greedy Approach
➢They are constructed using a top-down, greedy approach, where each split is
chosen to maximize information gain or minimize impurity at the current node.
This may not always result in the globally optimal tree.
❖A leaf node in a decision tree is the terminal node at the bottom of the tree, where
no further splits are made. Leaf nodes represent the final output or prediction of
the decision tree. Once a data point reaches a leaf node, a decision or prediction is
made based on the majority class (for classification) or the average value (for
regression) of the data points that reach that leaf.
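As a minimal sketch of how a leaf’s prediction is formed (the labels and values below are hypothetical, not from the text):

```python
from collections import Counter
from statistics import mean

# Hypothetical samples that ended up in one leaf node
leaf_labels = ["yes", "yes", "no", "yes"]   # classification leaf
leaf_targets = [12.0, 15.5, 14.0]           # regression leaf

# Classification: predict the majority class among samples at the leaf
prediction_cls = Counter(leaf_labels).most_common(1)[0][0]   # "yes"

# Regression: predict the average target value among samples at the leaf
prediction_reg = mean(leaf_targets)                          # ≈ 13.83
```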
❖To check mathematically whether a split is pure, we use entropy or Gini impurity.
Information gain helps us determine which features should be selected for splitting.
Entropy
➢Entropy is a concept borrowed from information theory and is commonly used as
a measure of uncertainty or disorder in a set of data. In the context of decision
trees, entropy is often employed as a criterion to decide how to split data points at
each node, aiming to create subsets that are more homogeneous with respect to the
target variable.
➢Entropy is a measure of uncertainty or disorder. A low entropy indicates a
more ordered or homogeneous set, while a high entropy signifies greater
disorder or diversity.
➢In the context of a decision tree, the goal is to reduce entropy by selecting features
and split points that result in more ordered subsets.
➢For binary classification, entropy values range from 0 to 1. The minimum entropy (0)
occurs when all instances belong to a single class, making the set perfectly ordered. The
maximum entropy (1 for two classes, and log₂(k) for k classes in general) occurs when
instances are evenly distributed across all classes, creating a state of maximum disorder.
➢Entropy values can fall between 0 and 1. If all samples in a dataset S belong to
one class, entropy equals zero. If half of the samples belong to one class and the
other half to another, entropy is at its highest value of 1. To select the best feature
to split on and find the optimal decision tree, the attribute whose split produces the
smallest (weighted) entropy should be used, which is equivalent to choosing the
largest information gain.
➢At each node of a decision tree, the algorithm evaluates the entropy for each
feature and split point. The feature and split point that result in the largest
reduction in entropy are chosen for the split. The reduction in entropy is often
referred to as Information Gain and is calculated as the difference between the
entropy before and after the split.
Entropy and Gini impurity formulas
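The formulas referred to here are the standard definitions, where pᵢ is the proportion of samples belonging to class i and k is the number of classes:

```latex
\mathrm{Entropy}(S) = -\sum_{i=1}^{k} p_i \log_2 p_i
\qquad
\mathrm{Gini}(S) = 1 - \sum_{i=1}^{k} p_i^{2}
```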
The minimum value of entropy is 0. The maximum value depends on the number of classes:
➢ For 2 classes (binary classification): maximum entropy is 1.
➢ For 3 classes: maximum entropy is log₂(3) ≈ 1.585.
➢ For 4 classes: maximum entropy is log₂(4) = 2, and so on.
➢ G = 0 indicates a perfectly pure node (all elements belong to the same class).
➢ G = 0.5 indicates maximum impurity for binary classification (elements are evenly split between the two classes); in general, the maximum Gini impurity is 1 − 1/k for k classes.
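A minimal Python sketch (the label lists are hypothetical, not data from the text) that reproduces the boundary values listed above:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    h = 0.0
    for count in Counter(labels).values():
        p = count / n
        h -= p * math.log2(p)
    return h

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

print(entropy(["yes"] * 4))        # 0.0   -> pure node
print(entropy(["yes", "no"] * 2))  # 1.0   -> balanced binary classes
print(entropy(["a", "b", "c"]))    # 1.585 -> log2(3), three balanced classes
print(gini(["yes"] * 4))           # 0.0   -> pure node
print(gini(["yes", "no"] * 2))     # 0.5   -> balanced binary classes
```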
Gini impurity
➢Gini impurity is a measure of the impurity or disorder in a set of elements,
commonly used in decision tree algorithms, especially for classification tasks. It
quantifies the likelihood of misclassification of a randomly chosen element in the
dataset.
➢A lower Gini impurity suggests a more homogeneous set of elements within the
node, making it an attractive split in a decision tree. Decision tree algorithms aim
to minimize the Gini impurity at each node, selecting the feature and split point
that results in the lowest impurity.
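As a sketch of how a candidate split is scored (the class counts below are hypothetical, not from the text): the split whose child nodes have the lowest weighted Gini impurity is preferred.

```python
def gini_from_counts(counts):
    """Gini impurity of a node given its per-class sample counts."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def weighted_gini(children):
    """Weighted average Gini impurity over child nodes (per-class counts each)."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * gini_from_counts(c) for c in children)

# Two hypothetical candidate splits of a node holding 10 "yes" / 10 "no":
split_a = [(9, 1), (1, 9)]   # nearly pure children
split_b = [(6, 4), (4, 6)]   # mixed children

print(weighted_gini(split_a))  # 0.18 -> lower impurity, preferred split
print(weighted_gini(split_b))  # 0.48
```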
❑ To check the impurity of feature 2 and feature 3, we take the help of the
entropy formula.
For feature 3,
➢We can clearly see from the tree itself that the left node has lower entropy (more
purity) than the right node, since the left node has a greater number of “yes” samples,
making the decision easier there.
➢Always remember that the higher the entropy, the lower the purity and the higher
the impurity.
➢As mentioned earlier, the goal is to decrease the uncertainty or impurity in the
dataset. Entropy only tells us the impurity of a particular node; it does not tell us
whether the entropy has decreased relative to the parent node.
➢For this, we bring in a new metric called “Information gain”, which tells us how
much the parent entropy has decreased after splitting on some feature.
Information Gain
➢Information gain represents the difference in entropy before and after a split on a
given attribute. The attribute with the highest information gain will produce the
best split as it’s doing the best job at classifying the training data according to its
target classification. Information gain is usually represented with the following
formula,
where;
➢If the dataset is huge, we should choose Gini impurity, as its calculation is much simpler than entropy (it avoids computing logarithms).
Example 2: Imagine that we have the following arbitrary dataset:
➢For this dataset, the entropy is 0.94. This can be calculated by finding the
proportion of days where “Play Tennis” is “Yes”, which is 9/14, and the proportion
of days where “Play Tennis” is “No”, which is 5/14. Then, these values can be
plugged into the entropy formula above.
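Plugging these proportions into the entropy formula gives:

```latex
\mathrm{Entropy}(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940
```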
➢We can then compute the information gain for each of the attributes individually.
For example, the information gain for the attribute “Humidity” would be the
following:
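The dataset table is not reproduced here, so the counts below are an assumption based on the classic Play Tennis example (Humidity = High: 3 “Yes” / 4 “No”; Humidity = Normal: 6 “Yes” / 1 “No”), which is consistent with the 9/14 and 5/14 totals above:

```python
import math

def entropy(pos, neg):
    """Entropy of a node containing `pos` positive and `neg` negative samples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result

parent = entropy(9, 5)                       # ≈ 0.940 (whole dataset)
high, normal = entropy(3, 4), entropy(6, 1)  # ≈ 0.985 and ≈ 0.592 (assumed counts)
gain = parent - (7 / 14) * high - (7 / 14) * normal
print(round(gain, 3))                        # ≈ 0.151
```

Under these assumed counts, the information gain for “Humidity” works out to roughly 0.151.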
Advantages of Decision Trees
• Easy to Understand: They are simple to visualize and interpret, making them easy
to understand even for non-experts.
• Handles Both Numerical and Categorical Data: They can work with both types of
data without needing much preprocessing.
• No Need for Data Scaling: These trees do not require normalization or scaling of
data.
• Automated Feature Selection: They automatically identify the most important
features for decision-making.
• Handles Non-Linear Relationships: They can capture non-linear patterns in the
data effectively.
Disadvantages of Decision Trees
• Overfitting Risk: Decision trees can easily overfit the training data, especially if they
are too deep.
• Unstable with Small Changes: Small changes in data can lead to completely
different trees.
• Biased with Imbalanced Data: They tend to be biased if one class dominates the
dataset.
• Limited to Axis-Parallel Splits: They struggle with diagonal or complex decision
boundaries.
• Can Become Complex: Large trees can become hard to interpret and may lose
their simplicity.