
1. Decision Trees

A Decision Tree is a machine learning model that is represented as a flowchart-like structure, where:

 Internal nodes represent a decision or test on an attribute.
 Branches represent the outcome of the decision or test.
 Leaf nodes represent the result or output, such as a class label (in classification) or a value (in
regression).

The tree is structured in a way that helps decision-making based on the features (attributes) of the data.
The flow from the root to the leaf nodes provides a decision rule that helps predict the class or value for
a given set of features.

How Decision Trees Work:

1. Start at the Root Node: The root node represents the entire dataset. We begin by selecting a
feature (attribute) that best splits the data into different classes or outcomes. This split is
determined by specific criteria like Gini Impurity, Information Gain, or Variance Reduction.
2. Split Data Based on Features: At each internal node, the dataset is split based on the feature
that provides the best separation between classes or predicts the target value the best.
3. Continue Splitting: This process continues recursively at each internal node until we reach the
leaf nodes. These leaf nodes hold the final decision (class label for classification or value for
regression).
4. Make a Prediction: For new, unseen data, the prediction is made by following the tree structure
from the root to a leaf node, applying the decisions (tests) along the way, as in the sketch below.
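
A minimal sketch of these four steps using scikit-learn's DecisionTreeClassifier (the iris dataset and
the chosen parameters are illustrative assumptions, not part of the original example):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset (the choice of dataset is only for illustration).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Steps 1-3: fitting the tree selects the best split at each node recursively
# (criterion="entropy" uses Information Gain; "gini" uses Gini Impurity).
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)

# Step 4: prediction follows the tests from the root down to a leaf node.
print(clf.predict(X_test[:5]))
print(clf.score(X_test, y_test))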

Decision Rules:

 Definition: A decision rule is a simple "if-then" condition derived from the decision tree.
 Example: Consider a decision tree for classifying whether someone will buy a product based on
their age and income:
o If Age ≤ 30 and Income > 50,000, then "Buy Product" (Class 1).
o If Age > 30 and Income ≤ 50,000, then "Don't Buy Product" (Class 0).

These rules are extracted from the paths leading to the leaf nodes in the decision tree.
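
For illustration, the two rules above can be written directly as an if-then function (a sketch; the
function name and the fallback for combinations not covered by the two rules are our own choices):

def buy_product(age, income):
    """Decision rules read off two leaf paths of the example tree."""
    if age <= 30 and income > 50_000:
        return 1      # "Buy Product"
    if age > 30 and income <= 50_000:
        return 0      # "Don't Buy Product"
    return None       # combinations not covered by the two rules above

print(buy_product(25, 60_000))  # 1
print(buy_product(40, 30_000))  # 0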

Example of Decision Tree for Classification:

Let’s consider a small example to illustrate how a decision tree works for classification:

Problem:

Classify whether a person will play tennis based on the weather conditions (Outlook, Temperature,
Humidity, Wind).
 Attributes: Outlook (Sunny, Overcast, Rain), Temperature (Hot, Mild, Cool), Humidity (High,
Low), Wind (Weak, Strong)
 Target/Label: PlayTennis (Yes, No)

Dataset:
Outlook   Temperature  Humidity  Wind    PlayTennis
Sunny     Hot          High      Weak    No
Sunny     Hot          High      Strong  No
Overcast  Hot          High      Weak    Yes
Rain      Mild         High      Weak    Yes
Rain      Cool         Low       Weak    Yes
Rain      Cool         Low       Strong  No
Overcast  Cool         Low       Strong  Yes
Sunny     Mild         High      Weak    No
Sunny     Cool         Low       Weak    Yes
Rain      Mild         Low       Weak    Yes

Building the Decision Tree:

1. Step 1: Select the Root Node: The root node is selected based on the best feature to split the
data. In this case, we would use a criterion like Information Gain to decide the best feature.

After calculating the Information Gain, we might find that Outlook is the best feature to split the
data, as it has the highest Information Gain (a worked calculation follows this list).

2. Step 2: Split Data: The tree branches into three based on the possible values of Outlook (Sunny,
Overcast, Rain).
3. Step 3: Continue Splitting: Now, for each of these branches, we further split based on the next
best feature (say, Humidity or Wind).
o For Sunny, the tree might split based on Humidity: If Humidity = High, predict "No"
(Leaf node), otherwise "Yes".
o For Rain, the tree might split based on Wind: If Wind = Weak, predict "Yes" (Leaf node),
otherwise "No".
4. Step 4: Reach Leaf Nodes: The decision tree will keep splitting until it reaches leaf nodes with a
predicted label.
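
A small worked sketch of the Step 1 Information Gain calculation on the 10-row table above (the
helper functions are hand-rolled for illustration, not taken from a library):

from collections import Counter
from math import log2

# The 10 rows of the table as (Outlook, Temperature, Humidity, Wind, PlayTennis).
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Low", "Weak", "Yes"),
    ("Rain", "Cool", "Low", "Strong", "No"),
    ("Overcast", "Cool", "Low", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Low", "Weak", "Yes"),
    ("Rain", "Mild", "Low", "Weak", "Yes"),
]
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, attr_index):
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [r[-1] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

for i, name in enumerate(attributes):
    print(name, round(information_gain(rows, i), 3))
# Outlook scores highest (about 0.32 on this table), so it becomes the root.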

Decision Tree Diagram:

Below is a simplified decision tree for the above example.

                Outlook
              /    |    \
         Sunny  Overcast  Rain
           |       |        |
       Humidity   Yes      Wind
        /    \            /    \
     High    Low       Weak   Strong
      |       |          |      |
      No     Yes        Yes     No

Explanation of the Tree:

1. Root Node: The first decision is based on Outlook.
o If Outlook is Overcast, predict Yes (PlayTennis).
o If Outlook is Sunny, we move to the next test: Humidity.
 If Humidity is High, predict No.
 If Humidity is Low, predict Yes.
o If Outlook is Rain, the next test is Wind.
 If Wind is Weak, predict Yes.
 If Wind is Strong, predict No.
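
For illustration, the same tree can be encoded as a nested dictionary and walked recursively; this is
only a sketch of the traversal, with the data structure and names chosen by us:

# The example tree as nested dicts: inner nodes name the attribute to test,
# leaves are plain strings holding the predicted label.
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Low": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }
}

def predict(node, example):
    """Follow the tests from the root until a leaf (a plain string) is reached."""
    if isinstance(node, str):
        return node
    attribute, branches = next(iter(node.items()))
    return predict(branches[example[attribute]], example)

print(predict(tree, {"Outlook": "Sunny", "Humidity": "Low"}))  # Yes
print(predict(tree, {"Outlook": "Rain", "Wind": "Strong"}))    # No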

Advantages:

 Easy to Interpret: The model is visual and intuitive, making it easy to explain to non-experts.
 Little Data Preparation Needed: Minimal preprocessing is required (e.g., no need for
normalization or scaling).

Disadvantages:

 Overfitting: Decision trees can easily overfit to training data, especially with deep trees.
 Instability: Small changes in the data can result in a completely different tree.
 Bias toward Dominant Classes: Decision trees can be biased if the dataset is imbalanced.

2. Generating Decision Trees

To construct a Decision Tree:

1. Choose the Best Attribute:
o Use measures like Information Gain or the Gini Index to identify the attribute that
splits the data most effectively (a small Gini sketch follows this list).
2. Recursively Split Data:
o Apply the splitting process to each subset until the stopping criteria are met.
3. Assign Labels or Predictions:
o At the leaf nodes, assign the majority class label (for classification) or the average
value (for regression).
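
As a complement to the Information Gain sketch above, the Gini Index criterion can be illustrated
with a few lines of Python (the helper name and the sample labels are ours):

from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    counts = Counter(labels)
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

# A pure node has impurity 0; a 50/50 split has the maximum impurity of 0.5.
print(gini(["Yes", "Yes", "Yes"]))       # 0.0
print(gini(["Yes", "No", "Yes", "No"]))  # 0.5
print(gini(["Yes", "No", "No", "No"]))   # 0.375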

3. Pruning Decision Trees

Pruning is the process of reducing the size of a decision tree to prevent overfitting and improve
generalization.

 Pre-Pruning: Stop the tree's growth early based on conditions like maximum depth or
minimum data at a node.
 Post-Pruning: Grow the entire tree and then remove branches that do not improve
performance on a validation set (see the sketch below).
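
Both styles can be sketched with scikit-learn, which exposes pre-pruning through parameters such as
max_depth and min_samples_leaf and implements post-pruning as cost-complexity pruning via
ccp_alpha (the parameter values below are arbitrary examples, not recommendations):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: stop growth early with a depth cap and a minimum leaf size.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
pre_pruned.fit(X, y)

# Post-pruning: grow the full tree, then prune it back with cost-complexity
# pruning (a larger ccp_alpha removes more branches).
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0)
post_pruned.fit(X, y)

print(pre_pruned.get_depth(), post_pruned.get_depth())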

5. Decision Rules

Decision Rules are IF-THEN conditions derived from Decision Trees. For example, a rule might
look like:

 IF age > 30 AND income > 50K THEN approve loan.

These rules provide a straightforward way to represent the tree’s logic, offering interpretability
and flexibility in practical applications.
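
For example, scikit-learn's export_text prints a fitted tree as exactly this kind of IF-THEN text
(the dataset and depth limit below are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# Each printed path from the root to a leaf corresponds to one IF-THEN decision rule.
print(export_text(clf, feature_names=list(data.feature_names)))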
6. Limitations of Decision Trees and Rules

1. Overfitting:
o Decision Trees can grow excessively, capturing noise in the training data.
o Pruning helps mitigate this but may lead to underfitting if over-pruned.
2. Bias Towards Dominant Features:
o Trees can favor features with many levels (e.g., ID numbers) or numeric features
with high variance.
3. Instability:
o Small changes in the training data can lead to entirely different tree structures.
4. Performance on Complex Relationships:
o Decision Trees struggle with datasets where features interact in complex,
non-linear ways.
5. Scalability:
o For very large datasets, tree construction can become computationally expensive.
