Decision Trees
A Decision Tree is a machine learning model represented as a flowchart-like structure, where each
internal node tests a feature (attribute), each branch corresponds to an outcome of that test, and
each leaf node holds a final decision. The tree is structured in a way that helps decision-making
based on the features (attributes) of the data: the path from the root to a leaf node defines a
decision rule that predicts the class or value for a given set of features.
How a Decision Tree Works:
1. Start at the Root Node: The root node represents the entire dataset. We begin by selecting a
feature (attribute) that best splits the data into different classes or outcomes. This split is
determined by specific criteria like Gini Impurity, Information Gain, or Variance Reduction.
2. Split Data Based on Features: At each internal node, the dataset is split based on the feature
that provides the best separation between classes or predicts the target value the best.
3. Continue Splitting: This process continues recursively at each internal node until we reach the
leaf nodes. These leaf nodes hold the final decision (class label for classification or value for
regression).
4. Make a Prediction: For new, unseen data, the prediction is made by following the tree structure
from the root to a leaf node, applying the decisions (tests) along the way.
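As a minimal illustration of this root-to-leaf procedure, the following sketch (assuming scikit-learn is available) trains a tree and predicts for new samples; the iris dataset here simply stands in for any dataset of features and class labels.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion="entropy" chooses splits by Information Gain;
# criterion="gini" (the default) would use Gini Impurity instead.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)  # splits recursively until leaves are pure (or a limit is hit)

# Prediction: each new sample is routed from the root to a leaf
# by applying the learned feature tests along the way.
print(clf.predict(X[:2]))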
Decision Rules:
Definition: A decision rule is a simple "if-then" condition derived from the decision tree.
Example: Consider a decision tree for classifying whether someone will buy a product based on
their age and income:
o If Age ≤ 30 and Income > 50,000, then "Buy Product" (Class 1).
o If Age > 30 and Income ≤ 50,000, then "Don't Buy Product" (Class 0).
These rules are extracted from the paths leading to the leaf nodes in the decision tree.
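These if-then rules can be read as ordinary conditional logic. Below is a hedged sketch that hard-codes the two example rules above as a plain function; the thresholds come from the example, not from any fitted model, and the remaining combinations of Age and Income are left undefined because the example does not cover them.

def will_buy(age, income):
    # Rule 1: If Age <= 30 and Income > 50,000 -> "Buy Product" (Class 1)
    if age <= 30 and income > 50_000:
        return 1
    # Rule 2: If Age > 30 and Income <= 50,000 -> "Don't Buy Product" (Class 0)
    if age > 30 and income <= 50_000:
        return 0
    return None  # paths not covered by the two example rules

print(will_buy(25, 60_000))  # -> 1 ("Buy Product")
print(will_buy(40, 40_000))  # -> 0 ("Don't Buy Product")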
Let’s consider a small example to illustrate how a decision tree works for classification:
Problem:
Classify whether a person will play tennis based on the weather conditions (Outlook, Temperature,
Humidity, Wind).
Attributes: Outlook (Sunny, Overcast, Rain), Temperature (Hot, Mild, Cool), Humidity (High,
Low), Wind (Weak, Strong)
Target/Label: PlayTennis (Yes, No)
Dataset: a small table of daily observations with the columns Outlook, Temperature, Humidity,
Wind and the label PlayTennis.
1. Step 1: Select the Root Node: The root node is selected based on the feature that best splits
the data, using a criterion such as Information Gain. After calculating the Information Gain
for each attribute, we find that Outlook is the best feature to split on, as it has the highest
Information Gain (a small sketch of this calculation appears after these steps).
2. Step 2: Split Data: The tree branches into three based on the possible values of Outlook (Sunny,
Overcast, Rain).
3. Step 3: Continue Splitting: Now, for each of these branches, we further split based on the next
best feature (say, Humidity or Wind).
o For Sunny, the tree might split based on Humidity: If Humidity = High, predict "No"
(Leaf node), otherwise "Yes".
o For Rain, the tree might split based on Wind: If Wind = Weak, predict "Yes" (Leaf node),
otherwise "No".
4. Step 4: Reach Leaf Nodes: The decision tree will keep splitting until it reaches leaf nodes with a
predicted label.
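To make Step 1 concrete, here is a hedged sketch of how the Information Gain of a candidate split could be computed. The PlayTennis labels below are made up for illustration and are not the full dataset; the point is only the "entropy before minus weighted entropy after" calculation. The full tree for this example is drawn just below.

from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    total = len(parent_labels)
    weighted = sum((len(g) / total) * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted

# Hypothetical PlayTennis labels grouped by Outlook value (illustrative only).
sunny    = ["No", "No", "Yes"]
overcast = ["Yes", "Yes"]
rain     = ["Yes", "No", "Yes"]
parent   = sunny + overcast + rain

print(information_gain(parent, [sunny, overcast, rain]))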
                 Outlook
               /    |    \
          Sunny  Overcast  Rain
           /         |        \
      Humidity      Yes       Wind
       /    \                /    \
     High   Low           Weak   Strong
      |      |              |      |
      No    Yes            Yes     No
Advantages:
Easy to Interpret: The model is visual and intuitive, making it easy to explain to non-experts.
Minimal Data Preparation: Little data preprocessing is required (e.g., no need for
normalization or scaling).
Disadvantages:
Overfitting: Decision trees can easily overfit to training data, especially with deep trees.
Instability: Small changes in the data can result in a completely different tree.
Bias toward Dominant Classes: Decision trees can be biased if the dataset is imbalanced.
Pruning:
Pruning is the process of reducing the size of a decision tree to prevent overfitting and improve
generalization.
Pre-Pruning: Stop the tree's growth early based on conditions like maximum depth or
minimum data at a node.
Post-Pruning: Grow the entire tree and then remove branches that do not improve
performance on a validation set.
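As a hedged sketch of both styles using scikit-learn (the parameter values are illustrative, not recommendations): pre-pruning is expressed through growth limits such as max_depth and min_samples_leaf, while post-pruning can be done with cost-complexity pruning, keeping the pruned tree that scores best on a held-out validation split.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early with depth / minimum-sample limits.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                    random_state=0).fit(X_train, y_train)

# Post-pruning: compute the cost-complexity pruning path of a fully grown
# tree, then keep the pruned tree that scores best on the validation split.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)
post_pruned = max(
    (DecisionTreeClassifier(ccp_alpha=a, random_state=0).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_val, y_val),
)

print("pre-pruned :", pre_pruned.score(X_val, y_val))
print("post-pruned:", post_pruned.score(X_val, y_val))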
5. Decision Rules
Decision Rules are IF-THEN conditions derived from Decision Trees. For example, a rule extracted
from the tree above might look like: IF Outlook = Sunny AND Humidity = High THEN PlayTennis = No.
These rules provide a straightforward way to represent the tree’s logic, offering interpretability
and flexibility in practical applications.
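For instance, scikit-learn's export_text turns a fitted tree into exactly this kind of readable rule listing, one test per line along each root-to-leaf path; the sketch below uses the iris dataset purely as a stand-in.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Prints the tree as indented if-then rules ("|--- feature <= threshold" lines).
print(export_text(clf, feature_names=list(iris.feature_names)))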
6. Limitations of Decision Trees and Rules
1. Overfitting:
o Decision Trees can grow excessively, capturing noise in the training data.
o Pruning helps mitigate this but may lead to underfitting if over-pruned.
2. Bias Towards Dominant Features:
o Trees can favor features with many levels (e.g., ID numbers) or numeric features
with high variance.
3. Instability:
o Small changes in the training data can lead to entirely different tree structures.
4. Performance on Complex Relationships:
o Decision Trees struggle with datasets where features interact in complex, non-linear ways.
5. Scalability:
o For very large datasets, tree construction can become computationally expensive.
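As a hedged sketch of the overfitting limitation (point 1), using scikit-learn and the breast-cancer dataset purely as an example: an unrestricted tree typically scores perfectly on its own training split, and the gap to its test score is the overfitting being described; capping the depth is one simple mitigation.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)            # unrestricted depth
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print("deep    train/test accuracy:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("shallow train/test accuracy:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))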