
ChatGPT - ID3 Decision Tree Explanation

The ID3 (Iterative Dichotomiser 3) algorithm is a popular decision tree algorithm used for
classification tasks. It was developed by Ross Quinlan in 1986 and is one of the simplest
algorithms for creating decision trees.

Key Concepts:
1. Decision Tree: A decision tree is a flowchart-like structure where each internal node
represents a decision based on the value of a feature (attribute), and each leaf node
represents the classification outcome (or decision).

2. ID3: The ID3 algorithm is used to build a decision tree by selecting the best feature to
split the data at each step, aiming to reduce uncertainty in the data. It uses information
gain as a criterion to determine which attribute to split on at each node.

Steps in ID3 Algorithm:


1. Start at the root:

The algorithm begins by considering all the data at the root node.

2. Calculate Information Gain:

Information Gain (IG) is used to measure how much uncertainty is reduced when
we split the data on an attribute. It is based on Entropy, a measure of disorder or
impurity in the dataset.

The formula for Entropy H(D) of a dataset D is:

H(D) = -\sum_{i=1}^{k} p_i \log_2(p_i)

Where:

p_i is the probability of class i in the dataset D.

k is the number of classes.

Information Gain is calculated as:

IG(D, A) = H(D) - \sum_{v=1}^{V} \frac{|D_v|}{|D|} H(D_v)

Where:

A is the attribute.

D_v is the subset of D where attribute A has value v.

|D| is the size of dataset D.

V is the number of possible values for the attribute A.

The attribute with the highest Information Gain is chosen for splitting the data (a short Python sketch after this list illustrates how these quantities are computed).

3. Split the Data:

The dataset is split into subsets based on the selected attribute. Each subset is
formed by the unique values of the chosen attribute.

4. Repeat the Process:

For each subset created in the previous step, the algorithm recursively repeats the
process: calculate entropy, determine the best attribute, and split the data further.

5. Stopping Criteria:

The algorithm stops when:

All instances in a subset belong to the same class (i.e., no further splitting is
needed).

There are no more attributes to split on (all attributes have been used).

The dataset is empty.
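
To make these steps concrete, here is a minimal Python sketch of the procedure described above (not part of the original explanation): it computes entropy and information gain exactly as defined in step 2 and applies the stopping criteria from step 5 while building the tree recursively. The helper names (entropy, information_gain, id3) and the list-of-dictionaries data layout are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(rows, target):
    """H(D) = -sum(p_i * log2(p_i)) over the classes of the target attribute."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attribute, target):
    """IG(D, A) = H(D) - sum(|D_v| / |D| * H(D_v)) over the values v of A."""
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attributes, target):
    """Recursively build a tree: a class label (leaf) or a nested dict."""
    if not rows:
        return None                                   # stopping criterion: empty dataset
    classes = [row[target] for row in rows]
    if len(set(classes)) == 1:
        return classes[0]                             # stopping criterion: pure subset
    if not attributes:
        return Counter(classes).most_common(1)[0][0]  # no attributes left: majority class
    # Choose the attribute with the highest information gain (ties broken by order).
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree
```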

Example (Simple):
Let's say we have a dataset with two attributes, Weather and Temperature, and a target attribute PlayTennis (whether the person plays tennis or not). We want to predict whether someone will play tennis based on the weather conditions and the temperature.

Dataset:

Weather Temperature PlayTennis

Sunny Hot No

Sunny Mild Yes

Overcast Hot Yes

Overcast Cool Yes

Rainy Mild Yes

Rainy Cool No

Step-by-Step Process:

1. Calculate the Entropy of the Dataset:


We start by calculating the entropy of the target attribute PlayTennis . The classes are
"Yes" and "No". Based on the frequencies:

4 "Yes" (play tennis)

2 "No" (don’t play tennis)

The entropy for this dataset is:

H(PlayTennis) = -\frac{4}{6}\log_2\left(\frac{4}{6}\right) - \frac{2}{6}\log_2\left(\frac{2}{6}\right)

2. Calculate Information Gain for Each Attribute: For each attribute (Weather,
Temperature), calculate how much uncertainty is reduced if we split the data based on
that attribute. For example, for the Weather attribute:

If the weather is Overcast, the outcome is always "Yes", so that subset is pure (entropy 0).

If the weather is Sunny or Rainy, the outcomes are mixed (one "Yes" and one "No"), so we
calculate the entropy for each of these subsets and compute the weighted average.

Calculate the information gain for Weather and Temperature, and choose the attribute
with the highest information gain.

3. Create the Root Node and Branches: If, say, the attribute Weather has the highest
information gain, we make Weather the root of the tree. We create branches for Sunny,
Overcast, and Rainy.

4. Repeat the Process for Each Branch: For each branch, the process repeats to decide
whether to split further, based on the best attribute at that node.

5. Construct the Decision Tree: Continue recursively until the tree is fully constructed.
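
Assuming the entropy/information_gain/id3 helpers sketched earlier, the dataset from this example can be encoded and handed to id3 directly. Both attributes turn out to have the same information gain here, so the tie is broken by attribute order and Weather ends up at the root, matching the manual derivation worked out below.

```python
# Hypothetical encoding of the example dataset as a list of dictionaries.
data = [
    {"Weather": "Sunny",    "Temperature": "Hot",  "PlayTennis": "No"},
    {"Weather": "Sunny",    "Temperature": "Mild", "PlayTennis": "Yes"},
    {"Weather": "Overcast", "Temperature": "Hot",  "PlayTennis": "Yes"},
    {"Weather": "Overcast", "Temperature": "Cool", "PlayTennis": "Yes"},
    {"Weather": "Rainy",    "Temperature": "Mild", "PlayTennis": "Yes"},
    {"Weather": "Rainy",    "Temperature": "Cool", "PlayTennis": "No"},
]

tree = id3(data, ["Weather", "Temperature"], "PlayTennis")
print(tree)
# Expected shape (key order may vary):
# {'Weather': {'Sunny': {'Temperature': {'Hot': 'No', 'Mild': 'Yes'}},
#              'Overcast': 'Yes',
#              'Rainy': {'Temperature': {'Mild': 'Yes', 'Cool': 'No'}}}}
```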

Advantages and Disadvantages of ID3:


Advantages:

Simple to understand and easy to implement.

Works well with categorical data.

The tree structure is easy to interpret.

Disadvantages:

Prone to overfitting, especially if the tree is too deep.

Does not handle continuous attributes well (requires discretization).

Tends to favor attributes with more categories, which can lead to biased splits.

Conclusion:
The ID3 algorithm is a foundational decision tree algorithm that uses entropy and
information gain to create simple, interpretable classification models. Although it's powerful
for small datasets, it may need enhancements (like pruning) to handle larger, more complex
data effectively.

Let's go through a detailed example of the ID3 algorithm using a simple dataset and
visualize it with a decision tree diagram.

Example Dataset:
We are trying to predict whether a person will play tennis ( PlayTennis ) based on two
features:

1. Weather (Sunny, Overcast, Rainy)

2. Temperature (Hot, Mild, Cool)

Here’s the dataset:

Weather Temperature PlayTennis

Sunny Hot No

Sunny Mild Yes

Overcast Hot Yes

Overcast Cool Yes

Rainy Mild Yes

Rainy Cool No

Step 1: Calculate Entropy for the Entire Dataset


We first calculate the entropy of the target variable PlayTennis.

There are 6 instances in total:

4 instances where the person plays tennis (Yes)

2 instances where the person does not play tennis (No)

The entropy of a dataset D is calculated as:

H(D) = -\sum_{i=1}^{k} p_i \log_2(p_i)

For PlayTennis, the entropy is:

H(PlayTennis) = -\left(\frac{4}{6}\log_2\left(\frac{4}{6}\right) + \frac{2}{6}\log_2\left(\frac{2}{6}\right)\right)

Let's compute:

H(PlayTennis) = -(0.6667 \log_2(0.6667) + 0.3333 \log_2(0.3333))

H(PlayTennis) \approx 0.918
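
As a quick numerical check (a throwaway snippet, not part of the original walkthrough), the same value can be computed directly:

```python
import math

p_yes, p_no = 4 / 6, 2 / 6
h_playtennis = -(p_yes * math.log2(p_yes) + p_no * math.log2(p_no))
print(round(h_playtennis, 3))  # 0.918
```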

Step 2: Calculate Information Gain for Each Attribute


Now, we calculate the Information Gain for each attribute: Weather and Temperature. The
Information Gain (IG) is the reduction in entropy when the data is split on a particular
attribute.

For Weather:

The possible values for Weather are Sunny, Overcast, and Rainy.

We will calculate the entropy for each subset of data created by these values.

Subset for Sunny:

Instances: Sunny, Hot, No and Sunny, Mild, Yes

Outcomes: 1 "No" and 1 "Yes"

Entropy of Sunny subset:

H(Sunny) = -\left(\frac{1}{2}\log_2\left(\frac{1}{2}\right) + \frac{1}{2}\log_2\left(\frac{1}{2}\right)\right) = 1

Subset for Overcast:

Instances: Overcast, Hot, Yes and Overcast, Cool, Yes

Outcome: 2 "Yes"

Entropy of Overcast subset:

H(Overcast) = 0 (since all are "Yes")

Subset for Rainy:

Instances: Rainy, Mild, Yes and Rainy, Cool, No

Outcomes: 1 "Yes" and 1 "No"

Entropy of Rainy subset:

H(Rainy) = 1

Now we calculate the Weighted Entropy for the Weather attribute:

H(Weather) = \frac{2}{6}H(Sunny) + \frac{2}{6}H(Overcast) + \frac{2}{6}H(Rainy)

H(Weather) = \frac{2}{6} \times 1 + \frac{2}{6} \times 0 + \frac{2}{6} \times 1 = 0.667

Now, we calculate the Information Gain for Weather:

IG(Weather) = H(PlayTennis) - H(Weather)

IG(Weather) = 0.918 - 0.667 = 0.251

For Temperature:

The possible values for Temperature are Hot, Mild, and Cool.

We calculate the entropy for each subset of data created by these values.

Subset for Hot:

Instances: Sunny, Hot, No and Overcast, Hot, Yes

Outcomes: 1 "No" and 1 "Yes"

Entropy of Hot subset:

H(Hot) = 1

Subset for Mild:

Instances: Sunny, Mild, Yes and Rainy, Mild, Yes

Outcomes: 2 "Yes"

Entropy of Mild subset:

H(Mild) = 0 (since all are "Yes")

Subset for Cool:

Instances: Overcast, Cool, Yes and Rainy, Cool, No

Outcomes: 1 "Yes" and 1 "No"

Entropy of Cool subset:

H(Cool) = 1

Now we calculate the Weighted Entropy for the Temperature attribute:

H(Temperature) = \frac{2}{6}H(Hot) + \frac{2}{6}H(Mild) + \frac{2}{6}H(Cool)

H(Temperature) = \frac{2}{6} \times 1 + \frac{2}{6} \times 0 + \frac{2}{6} \times 1 = 0.667

Now, we calculate the Information Gain for Temperature:

IG(Temperature) = H(PlayTennis) - H(Temperature)

IG(Temperature) = 0.918 - 0.667 = 0.251
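
The same arithmetic can be verified with a few lines of Python; this illustrative snippet simply mirrors the rounded values used above:

```python
h_playtennis = 0.918                                # entropy of the full dataset (Step 1, rounded)
h_weather = (2/6) * 1 + (2/6) * 0 + (2/6) * 1       # weighted entropy over Sunny, Overcast, Rainy
h_temperature = (2/6) * 1 + (2/6) * 0 + (2/6) * 1   # weighted entropy over Hot, Mild, Cool

print(round(h_playtennis - h_weather, 3))       # IG(Weather)     -> 0.251
print(round(h_playtennis - h_temperature, 3))   # IG(Temperature) -> 0.251
```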

Step 3: Choose the Best Attribute to Split


Since the Information Gain for both Weather and Temperature is the same (0.251), we can
choose either attribute to split on. For simplicity, let’s choose Weather.

Step 4: Create the Decision Tree


Now, we can build the decision tree. Starting with Weather as the root, we split into the
branches:

Sunny → The outcomes are mixed (one "No", one "Yes"), so we split further based on
Temperature.

Overcast → PlayTennis = Yes (since both instances under Overcast have this outcome)

Rainy → The outcomes are also mixed, so we split further based on Temperature.

For Sunny:

Hot → PlayTennis = No

Mild → PlayTennis = Yes

For Rainy:

Mild → PlayTennis = Yes

Cool → PlayTennis = No

Final Decision Tree Diagram:


                     Weather
                  /     |      \
             Sunny   Overcast   Rainy
               |        |         |
         Temperature   Yes   Temperature
            /    \              /    \
          Hot    Mild         Mild   Cool
           |      |            |      |
           No    Yes          Yes     No
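
One compact way to use the final tree in code is to store it as nested dictionaries and walk it for each new instance. This is only an illustrative sketch; the dictionary layout and the predict helper are assumptions, not part of the original explanation.

```python
# The final tree: Overcast is a pure leaf, Sunny and Rainy are refined by Temperature.
decision_tree = {
    "Weather": {
        "Sunny":    {"Temperature": {"Hot": "No",  "Mild": "Yes"}},
        "Overcast": "Yes",
        "Rainy":    {"Temperature": {"Mild": "Yes", "Cool": "No"}},
    }
}

def predict(tree, instance):
    """Follow the branch matching the instance at each node until a class label is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))               # attribute tested at this node
        tree = tree[attribute][instance[attribute]]
    return tree

print(predict(decision_tree, {"Weather": "Overcast", "Temperature": "Hot"}))  # Yes
print(predict(decision_tree, {"Weather": "Rainy", "Temperature": "Cool"}))    # No
```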

Conclusion:
This decision tree can now be used to predict whether a person will play tennis based on the weather conditions and the temperature. For example:

If the weather is Sunny and the temperature is Hot, the decision is No (do not play tennis), but if the temperature is Mild, the decision is Yes (play tennis).

If the weather is Overcast, the decision is Yes (play tennis).

If the weather is Rainy and the temperature is Mild, the decision is Yes (play tennis), but
if the temperature is Cool, the decision is No (do not play tennis).

This process, using information gain and entropy, helps in creating an efficient and
interpretable decision tree for classification tasks.
