
Decision Tree

What is a Decision Tree?
A Decision Tree is a supervised machine learning algorithm used for both classification and regression
tasks. It models decisions and their possible consequences as a tree-like structure, where:

● Nodes represent features (attributes).
● Edges represent decision rules.
● Leaves represent outcomes (class labels or continuous values).

Decision trees mimic human decision-making processes, making them intuitive and easy to interpret.

Example:

Imagine you're deciding whether to play tennis based on weather conditions. A decision tree can help by splitting decisions based on
factors like Outlook, Temperature, Humidity, and Wind.
How Do Decision Trees Work?
Building a decision tree involves recursively splitting the dataset into subsets based on feature values. The goal is to create
homogeneous subsets where the target variable is consistent.

Steps to Build a Decision Tree:

1. Select the Best Feature to Split On:
○ Choose the feature that best separates the data into target classes.
2. Create a Decision Node:
○ The selected feature becomes a node, and branches are created for each possible value.
3. Repeat Recursively:
○ For each subset of data, repeat the process to create sub-nodes.
4. Stop When:
○ All samples belong to the same class.
○ No remaining features to split.
○ A predefined depth is reached.
5. Assign Leaf Nodes:
○ Assign the most common class (for classification) or the average value (for regression) to the leaf nodes.
Visualization:
Outlook
├── Sunny
│   └── Humidity
│       ├── High → No
│       └── Normal → Yes
├── Overcast → Yes
└── Rain
    └── Wind
        ├── Weak → Yes
        └── Strong → No
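
As a minimal sketch of these steps in practice, the code below fits a small tree with scikit-learn on a made-up play-tennis sample (the library, the one-hot encoding step, and the data are illustrative assumptions, not part of the original example):

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny, made-up play-tennis sample for illustration only.
data = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast"],
    "Humidity": ["High", "Normal", "High", "High", "Normal", "Normal"],
    "Wind":     ["Weak", "Strong", "Weak", "Weak", "Strong", "Strong"],
    "Play":     ["No", "Yes", "Yes", "Yes", "No", "Yes"],
})

# scikit-learn trees need numeric inputs, so one-hot encode the categorical features.
X = pd.get_dummies(data[["Outlook", "Humidity", "Wind"]])
y = data["Play"]

clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X, y)

# Print the learned decision rules in text form.
print(export_text(clf, feature_names=list(X.columns)))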
Types of Decision Trees
Decision trees can be categorized based on their output and the algorithms used to construct them.

Classification Trees

● Purpose: Predict categorical class labels.
● Example: Determining whether an email is spam or not.
● Leaf Nodes: Represent class labels.

Regression Trees

● Purpose: Predict continuous numerical values.
● Example: Estimating house prices based on features like size, location, and number of rooms.
● Leaf Nodes: Represent numerical values (e.g., average house price in that node).
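
A minimal regression-tree sketch (scikit-learn assumed; the house sizes, room counts, and prices below are invented purely for illustration):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Features: [size_sqft, num_rooms]; target: price in thousands (made-up values).
X = np.array([[800, 2], [1200, 3], [1500, 3], [2000, 4], [2500, 5]])
y = np.array([150, 220, 260, 330, 410])

reg = DecisionTreeRegressor(max_depth=2, random_state=0)
reg.fit(X, y)

# Each leaf predicts the average target value of the training samples it contains.
print(reg.predict([[1700, 4]]))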

Other Variants

● Multi-output Trees: Handle multiple target variables.
● Probabilistic Trees: Assign probabilities to class labels.
● Ensemble Trees: Combine multiple trees (e.g., Random Forests, Gradient Boosted Trees).
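
Since ensemble trees such as Random Forests come up again later, here is a brief sketch (scikit-learn and its bundled iris dataset assumed) of combining many trees into one model:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# 100 trees, each trained on a bootstrap sample with random feature subsets,
# vote together on the final prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.score(X, y))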
Key Metrics for Building Decision Trees
Choosing the right feature and split point is crucial for building an effective decision tree. Various metrics evaluate the quality of splits.
Entropy
Entropy measures the impurity (disorder) of a set of samples S: Entropy(S) = -Σ p_i log2(p_i), where p_i is the proportion of samples belonging to class i. Entropy is 0 when the set is pure and largest when the classes are evenly mixed. Information Gain, used by ID3, is the reduction in entropy achieved by a split: Gain(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) · Entropy(S_v). Gini Impurity, used by CART, is a related measure: Gini(S) = 1 - Σ p_i².
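
To make these metrics concrete, here is a small, self-contained sketch (plain Python with NumPy, assumed purely for illustration) that computes entropy, Gini impurity, and the information gain of a candidate split:

import numpy as np

def entropy(labels):
    # Entropy(S) = -sum(p_i * log2(p_i)) over the classes present in `labels`.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Gini(S) = 1 - sum(p_i^2).
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent_labels, child_label_groups):
    # Reduction in entropy from splitting `parent_labels` into the given child groups.
    n = len(parent_labels)
    weighted_child_entropy = sum(
        len(child) / n * entropy(child) for child in child_label_groups
    )
    return entropy(parent_labels) - weighted_child_entropy

# Example: splitting six play-tennis labels on Outlook (Sunny / Overcast / Rain).
parent = ["No", "No", "Yes", "Yes", "Yes", "No"]
children = [["No", "No"], ["Yes", "Yes"], ["Yes", "No"]]
print(entropy(parent), gini(parent), information_gain(parent, children))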
Algorithms for Building Decision Trees
ID3 (Iterative Dichotomiser 3)

● Introduced By: Ross Quinlan.
● Use Case: Classification tasks.
● Splitting Criterion: Information Gain (based on entropy); see the sketch after this list.
● Characteristics:
○ Handles categorical data.
○ Prone to overfitting.
○ Does not handle continuous attributes directly.
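
A sketch of ID3's core step, choosing the categorical attribute with the highest information gain (the helper names and the tiny dataset are illustrative assumptions):

from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, attributes, target):
    # Return the attribute whose split yields the largest reduction in entropy.
    parent = [r[target] for r in rows]
    def gain(attr):
        remainder = 0.0
        for value in {r[attr] for r in rows}:
            subset = [r[target] for r in rows if r[attr] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(parent) - remainder
    return max(attributes, key=gain)

rows = [
    {"Outlook": "Sunny", "Wind": "Weak", "Play": "No"},
    {"Outlook": "Sunny", "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Wind": "Weak", "Play": "Yes"},
    {"Outlook": "Rain", "Wind": "Weak", "Play": "Yes"},
    {"Outlook": "Rain", "Wind": "Strong", "Play": "No"},
]
print(best_attribute(rows, ["Outlook", "Wind"], "Play"))  # -> "Outlook"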
C4.5 and C5.0
C4.5:

● Introduced By: Ross Quinlan.
● Improvements Over ID3:
○ Uses Gain Ratio instead of Information Gain (see the sketch after this list).
○ Handles both categorical and continuous data.
○ Prunes trees to avoid overfitting.
○ Handles missing values.
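
A rough sketch of the Gain Ratio idea: information gain is normalised by the "split information" of the attribute, which penalises attributes that fragment the data into many small subsets (function names and numbers below are illustrative):

from math import log2

def split_information(child_sizes):
    # SplitInfo = -sum((|S_v| / |S|) * log2(|S_v| / |S|)) over the subsets of the split.
    n = sum(child_sizes)
    return -sum((s / n) * log2(s / n) for s in child_sizes if s > 0)

def gain_ratio(information_gain, child_sizes):
    si = split_information(child_sizes)
    return information_gain / si if si > 0 else 0.0

# E.g. a split into subsets of sizes 2, 1 and 3 with an information gain of 0.45:
print(gain_ratio(0.45, [2, 1, 3]))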
C4.5 and C5.0
C5.0:

● Improvements Over C4.5:
○ Faster and more memory-efficient.
○ Supports boosting.
○ Generates smaller trees.
CART (Classification and Regression Trees)
Introduced By: Breiman et al.

Use Case: Both classification and regression.

Splitting Criterion:

● Classification: Gini Impurity.
● Regression: Variance Reduction.

Characteristics:

● Produces binary trees (each node has two children).
● Handles both categorical and continuous data.
● Basis for ensemble methods like Random Forests.
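
scikit-learn's decision trees are a CART-style implementation, so they make a convenient illustration (the dataset generators and parameters below are arbitrary choices for the sketch):

from sklearn.datasets import make_classification, make_regression
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: Gini impurity as the splitting criterion.
X_c, y_c = make_classification(n_samples=200, n_features=5, random_state=0)
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0).fit(X_c, y_c)

# Regression: squared error, i.e. variance reduction, as the splitting criterion.
X_r, y_r = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3, random_state=0).fit(X_r, y_r)

# Both trees are binary: every internal node has exactly two children.
print(clf.get_depth(), reg.get_depth())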
Advantages and Disadvantages
Advantages

1. Interpretability:
○ Easy to visualize and understand.
○ Decisions can be traced back through the tree (see the sketch after this list).
2. No Need for Feature Scaling:
○ Works with both numerical and categorical data.
3. Handles Non-linear Relationships:
○ Can capture complex interactions between features.
4. Feature Selection:
○ Automatically selects the most significant features for splits.
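
As a small illustration of interpretability and built-in feature selection, a fitted tree can be drawn and its feature importances inspected (scikit-learn, matplotlib, and the bundled iris dataset are assumed here):

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# Which features the tree actually used, and how much each contributed to its splits.
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.2f}")

# Draw the tree so each decision can be traced from root to leaf.
plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()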
Advantages and Disadvantages
Disadvantages

1. Overfitting:
○ Trees can become overly complex, capturing noise in the data.
○ Pruning and setting depth limits can mitigate this (see the sketch after this list).
2. Instability:
○ Small changes in data can lead to different trees.
3. Bias Towards Features with More Levels:
○ Features with many unique values may dominate splits (mitigated by metrics like Gain Ratio).
4. Performance:
○ Can be less accurate compared to ensemble methods like Random Forests or Gradient Boosted Trees.
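
To illustrate the overfitting mitigation mentioned in point 1, the sketch below compares an unconstrained tree with one limited by depth and cost-complexity pruning (the dataset and parameter values are arbitrary illustrative choices):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree tends to fit the training data (including its noise) perfectly.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Limiting depth and applying cost-complexity pruning (ccp_alpha) trades some
# training accuracy for a simpler tree that usually generalises better.
pruned = DecisionTreeClassifier(max_depth=4, ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

print("full   - train:", full.score(X_train, y_train), "test:", full.score(X_test, y_test))
print("pruned - train:", pruned.score(X_train, y_train), "test:", pruned.score(X_test, y_test))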
