ID3 Decision Tree Explanation
The ID3 (Iterative Dichotomiser 3) algorithm is a popular decision tree algorithm used for
classification tasks. It was developed by Ross Quinlan in 1986 and is one of the simplest
algorithms for creating decision trees.
Key Concepts:
1. Decision Tree: A decision tree is a flowchart-like structure where each internal node
represents a decision based on the value of a feature (attribute), and each leaf node
represents the classification outcome (or decision).
2. ID3: The ID3 algorithm is used to build a decision tree by selecting the best feature to
split the data at each step, aiming to reduce uncertainty in the data. It uses information
gain as a criterion to determine which attribute to split on at each node.
3. Entropy and Information Gain: The algorithm begins by considering all the data at the
root node. Information Gain (IG) measures how much uncertainty is reduced when we
split the data on an attribute. It is based on Entropy, a measure of disorder or
impurity in the dataset.
Entropy is defined as:

H(D) = -\sum_{i=1}^{k} p_i \log_2(p_i)

Where:
p_i is the proportion of instances in D that belong to class i, and k is the number of classes.

The Information Gain of splitting D on an attribute A is:

IG(D, A) = H(D) - \sum_{v=1}^{V} \frac{|D_v|}{|D|} H(D_v)

Where:
A is the attribute.
D_v is the subset of D where attribute A has value v, and the sum runs over the V distinct values of A.
4. Splitting the Data: The attribute with the highest Information Gain is chosen for
splitting. The dataset is split into subsets, one for each unique value of the chosen
attribute. For each subset, the algorithm recursively repeats the process: calculate
entropy, determine the best attribute, and split the data further (a short code sketch
after this list illustrates these steps).
5. Stopping Criteria:
All instances in a subset belong to the same class (i.e., no further splitting is
needed).
There are no more attributes to split on (all attributes have been used).
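To make these steps concrete, here is a minimal sketch of ID3 in Python. It assumes the dataset is a list of dicts and the target column is passed by name; the helper names entropy, information_gain, and id3 are purely illustrative.

```python
import math
from collections import Counter

def entropy(rows, target):
    """H(D) = -sum(p_i * log2(p_i)) over the class proportions in rows."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attribute, target):
    """IG(D, A): dataset entropy minus the weighted entropy of the subsets created by A."""
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attributes, target):
    """Return a leaf label, or a nested dict {attribute: {value: subtree}}."""
    classes = [row[target] for row in rows]
    if len(set(classes)) == 1:      # stopping criterion: all instances share one class
        return classes[0]
    if not attributes:              # stopping criterion: no attributes left -> majority class
        return Counter(classes).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best], target)
    return tree
```

On a dataset stored this way, a call such as id3(rows, ["Weather", "Temperature"], "PlayTennis") returns either a class label or a nested dict whose top-level key is the chosen root attribute.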
Example (Simple):
Let's say we have a dataset with two attributes, Weather and Temperature, and a target
label PlayTennis (whether the person plays tennis or not). We want to predict whether
someone will play tennis based on weather conditions and temperature.
Dataset:
Weather   Temperature   PlayTennis
Sunny     Hot           No
Rainy     Cool          No
Step-by-Step Process:
1. Calculate the Entropy of the Entire Dataset: The dataset contains 4 "Yes" and 2 "No"
instances, so:

H(PlayTennis) = -\frac{4}{6}\log_2\left(\frac{4}{6}\right) - \frac{2}{6}\log_2\left(\frac{2}{6}\right) \approx 0.918
2. Calculate Information Gain for Each Attribute: For each attribute (Weather,
Temperature), calculate how much uncertainty is reduced if we split the data based on
that attribute. For example, for the Weather attribute:
If the weather is Rainy, the outcomes are mixed, so we calculate the entropy of each
subset and compute the weighted average across all Weather values.
Calculate the information gain for Weather and Temperature, and choose the attribute
with the highest information gain (see the short code sketch after this list).
3. Create the Root Node and Branches: If, say, the attribute Weather has the highest
information gain, we make Weather the root of the tree. We create branches for Sunny,
Overcast, and Rainy.
4. Repeat the Process for Each Branch: For each branch, the process repeats to decide
whether to split further, based on the best attribute at that node.
5. Construct the Decision Tree: Continue recursively until the tree is fully constructed.
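As a small illustration of steps 1 to 3 above, once the entropies are known the choice of root is just an argmax over information gain. The numbers below are the ones worked out in the detailed example later in this explanation, and the variable names are purely illustrative.

```python
# Values taken from the worked example below.
h_dataset = 0.918                                      # entropy of the full dataset
weighted = {"Weather": 0.667, "Temperature": 0.667}    # weighted entropy after each split

# Information gain for each candidate attribute: IG = H(D) - weighted entropy.
gains = {attr: h_dataset - h for attr, h in weighted.items()}

# ID3 picks the attribute with the highest gain; on a tie, max() keeps the first
# attribute it encounters, so Weather becomes the root here.
root = max(gains, key=gains.get)
print(root, gains)   # Weather, with a gain of about 0.25 for both attributes
```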
Disadvantages:
Prone to overfitting, especially if the tree is too deep.
Tends to favor attributes with more categories, which can lead to biased splits.
Conclusion:
The ID3 algorithm is a foundational decision tree algorithm that uses entropy and
information gain to create simple, interpretable classification models. Although it works well
for small datasets, it may need enhancements (like pruning) to handle larger, more complex
data effectively.
Let's go through a detailed example of the ID3 algorithm using a simple dataset and
visualize it with a decision tree diagram.
Example Dataset:
We are trying to predict whether a person will play tennis (PlayTennis) based on two
features: Weather and Temperature.

Weather   Temperature   PlayTennis
Sunny     Hot           No
Rainy     Cool          No
The entropy of a dataset D is calculated as:

H(D) = -\sum_{i=1}^{k} p_i \log_2(p_i)

For our dataset, which has 4 "Yes" and 2 "No" instances:

H(PlayTennis) = -\left(\frac{4}{6}\log_2\left(\frac{4}{6}\right) + \frac{2}{6}\log_2\left(\frac{2}{6}\right)\right) \approx 0.918
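For readers who want to verify the arithmetic, here is a quick check in Python (the variable name is just illustrative):

```python
import math

# H(PlayTennis) for 4 "Yes" and 2 "No" out of 6 instances.
h_playtennis = -((4 / 6) * math.log2(4 / 6) + (2 / 6) * math.log2(2 / 6))
print(round(h_playtennis, 3))  # 0.918
```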
Next, we compute the weighted entropy of each attribute split.
For Weather:
The possible values for Weather are Sunny, Overcast, and Rainy.
We will calculate the entropy for each subset of data created by these values.
Subset for Sunny (1 "Yes", 1 "No"):

H(Sunny) = -\left(\frac{1}{2}\log_2\left(\frac{1}{2}\right) + \frac{1}{2}\log_2\left(\frac{1}{2}\right)\right) = 1

Subset for Overcast:
Outcome: 2 "Yes", so the subset is pure and H(Overcast) = 0.
Subset for Rainy (1 "Yes", 1 "No"):
H(Rainy) = 1
The weighted entropy after splitting on Weather is:

H(Weather) = \frac{2}{6} H(Sunny) + \frac{2}{6} H(Overcast) + \frac{2}{6} H(Rainy)

H(Weather) = \frac{2}{6} \times 1 + \frac{2}{6} \times 0 + \frac{2}{6} \times 1 \approx 0.667
For Temperature:
The possible values for Temperature are Hot, Mild, and Cool.
We calculate the entropy for each subset of data created by these values.
Subset for Hot (1 "Yes", 1 "No"):
H(Hot) = 1

Subset for Mild:
Outcomes: 2 "Yes", so the subset is pure and H(Mild) = 0.
Subset for Cool (1 "Yes", 1 "No"):
H(Cool) = 1
The weighted entropy after splitting on Temperature is:

H(Temperature) = \frac{2}{6} H(Hot) + \frac{2}{6} H(Mild) + \frac{2}{6} H(Cool)

H(Temperature) = \frac{2}{6} \times 1 + \frac{2}{6} \times 0 + \frac{2}{6} \times 1 \approx 0.667
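As a quick check of these numbers, the short Python snippet below recomputes the dataset entropy, both weighted entropies, and the resulting information gains from the subset class counts listed above; the dictionary layout is just an illustrative way to hold those counts.

```python
import math

def entropy(counts):
    """Entropy of a class distribution given as a sequence of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

# Class counts per subset, read off the calculations above: (Yes, No).
weather = {"Sunny": (1, 1), "Overcast": (2, 0), "Rainy": (1, 1)}
temperature = {"Hot": (1, 1), "Mild": (2, 0), "Cool": (1, 1)}

def weighted_entropy(subsets):
    """Average of subset entropies, weighted by subset size."""
    total = sum(sum(c) for c in subsets.values())
    return sum((sum(c) / total) * entropy(c) for c in subsets.values())

h_dataset = entropy([4, 2])  # ≈ 0.918
for name, subsets in (("Weather", weather), ("Temperature", temperature)):
    h_split = weighted_entropy(subsets)
    print(name, round(h_split, 3), round(h_dataset - h_split, 3))
# Both splits give weighted entropy ≈ 0.667 and information gain ≈ 0.25
```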
Weather and Temperature give the same information gain here, so either could be chosen;
the example uses Weather as the root and creates a branch for each of its values:

Sunny → PlayTennis = No (since both instances under Sunny have this outcome)
Overcast → PlayTennis = Yes (since both instances under Overcast have this outcome)
For Rainy, we split further on Temperature:
Mild → PlayTennis = Yes
Cool → PlayTennis = No
                Weather
              /    |    \
        Sunny   Overcast   Rainy
          |        |       /   \
         No       Yes    Mild  Cool
                           |     |
                          Yes    No
Conclusion:
This decision tree can now be used to predict whether a person will play tennis based on
their weather conditions and temperature. For example:
If the weather is Rainy and the temperature is Mild, the decision is Yes (play tennis), but
if the temperature is Cool, the decision is No (do not play tennis).
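To close the loop, here is a minimal sketch of how such a tree can be applied to new instances. The nested-dict layout mirrors the diagram above (and the representation used in the earlier id3 sketch); the classify helper is an illustrative name, not part of any standard library.

```python
# The tree from the diagram above: inner nodes are {attribute: {value: subtree}},
# leaves are plain class labels.
tree = {
    "Weather": {
        "Sunny": "No",
        "Overcast": "Yes",
        "Rainy": {"Temperature": {"Mild": "Yes", "Cool": "No"}},
    }
}

def classify(node, instance):
    """Follow the branches that match the instance until a leaf label is reached."""
    while isinstance(node, dict):
        attribute = next(iter(node))            # attribute tested at this node
        node = node[attribute][instance[attribute]]
    return node

print(classify(tree, {"Weather": "Rainy", "Temperature": "Mild"}))  # Yes
print(classify(tree, {"Weather": "Rainy", "Temperature": "Cool"}))  # No
```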
This process, using information gain and entropy, helps in creating an efficient and
interpretable decision tree for classification tasks.