
Understanding the Decision Trees

CMP4294: Introduction to Artificial Intelligence

Dr Mariam Adedoyin-Olowe
[email protected]
Outline
• Recap on Classification Techniques
• Overview of Decision Trees
• How Decision Trees Work
• Decision Tree Components
• Advantages of Decision Trees
• Common Use Cases
• Decision Tree Example
• Conclusion
Classification
• Allocates a new data record to one of several previously
  defined groups or classes
• We know that X and Y belong together; find other items that
  belong in the same group
Example of a Classification Task
• Let’s assume you’re assessing data on individual customers’
  financial backgrounds and purchase histories

• You could classify them as “low,” “medium,” or “high” credit
  risks

• You could then use these classifications to learn even more
  about those customers and decide which ones to offer credit
  facilities to without endangering the prospects of the
  business.
…More Examples of Classification Tasks

Task                          Attribute set                     Class label
Categorising email messages   Features extracted from email     Spam or non-spam
                              message header and content
Categorising exam grades      Scores extracted from exam        Fail or pass
                              results
Decision Tree
• A Decision Tree is a decision support tool that applies
  a tree-like model of decisions and their possible
  consequences/event outcomes – should we play
  football today?

• E.g., classify countries based on climate, or classify
  cars based on gas mileage.
A Decision Tree
Overview of Decision Trees
Decision trees are a popular machine learning
algorithm used for both classification and regression
tasks.

Visual representation:
A tree-like model that makes decisions based on input
features.
How Decision Trees Work
• Decision-making process: Sequentially split the data based
on features to create a tree structure.

• Nodes: Represent decision points based on specific features.

• Edges: Connect nodes, indicating the possible outcomes.


Description of Decision Rules or Trees

• Intuitive appeal for users

• Presentation Forms
– “if, then” statements (decision rules)
– graphically - decision trees
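The two presentation forms are equivalent: each root-to-leaf path in a tree is one "if, then" rule. A minimal sketch of the rule form, using a hypothetical "should we play football?" example (the conditions and outcomes here are illustrative, not taken from the slides):

```python
# A small decision procedure written as "if, then" decision rules.
# Outlook values and the wind condition are assumed for illustration.
def play_football(outlook, windy):
    """Return 'play' or 'no play' for one day's conditions."""
    if outlook == "rain":
        return "no play"          # rule 1: rainy day -> don't play
    if outlook == "sunny" and windy:
        return "no play"          # rule 2: sunny but windy -> don't play
    return "play"                 # default rule: otherwise play

print(play_football("sunny", windy=False))  # play
```

Drawn graphically, the same rules form a tree whose root tests `outlook` and whose leaves carry the play/no-play outcomes.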
Decision Tree Components
• Works like a flow chart
• Looks like an upside-down tree
• Root Node: The starting point of
the tree.
• Decision Nodes: Nodes that split
the data based on a certain
feature.
• Leaf Nodes: Terminal nodes that
represent the final decision or
outcome.
• Branches: Connect nodes and
represent the decision path.
How DT Works

Training Data:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

• Predict whether the loan applicant is cheating or not
• Hard to guess directly
• Try to understand which factors influence the decision
  (cheat/not cheat)
• Divide and conquer
• Split into subsets:
  • Are they all pure? (all yes or all no)
  • If yes: stop
  • If not: repeat
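The divide-and-conquer loop above can be sketched in a few lines: partition the records on an attribute, then check each subset for purity. The record fields below are a small illustrative subset of the training data, not a full implementation:

```python
# Sketch of the split-then-check-purity step described above.
def is_pure(records):
    """True if every record carries the same class label."""
    labels = {r["cheat"] for r in records}
    return len(labels) <= 1

def split_on(records, attribute):
    """Partition records into subsets by the attribute's value."""
    subsets = {}
    for r in records:
        subsets.setdefault(r[attribute], []).append(r)
    return subsets

data = [
    {"refund": "yes", "cheat": "no"},
    {"refund": "no",  "cheat": "yes"},
    {"refund": "no",  "cheat": "no"},
]
subsets = split_on(data, "refund")
print({v: is_pure(rs) for v, rs in subsets.items()})  # {'yes': True, 'no': False}
```

The `refund = yes` subset is pure (all "no"), so that branch stops; the `refund = no` subset is mixed, so it is split again on another attribute.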
How DT Works
• A decision tree builds a classifier in the
  form of a tree structure.
• It breaks down a dataset into smaller
  subsets while, at the same time, an
  associated decision tree is incrementally
  developed.
• The final result is a tree in which:
  • an internal node denotes a test on an
    attribute
  • a branch represents an outcome of
    the test
  • leaf nodes represent class labels or
    a class distribution
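This structure maps directly onto a simple data type: an internal node stores the attribute it tests and one child per outcome, while a leaf stores a class label. A minimal sketch (the field names are illustrative, not from a particular library):

```python
# Minimal tree node matching the description above: internal nodes test
# an attribute, branches are labelled with outcomes, leaves hold a class.
class Node:
    def __init__(self, attribute=None, label=None):
        self.attribute = attribute   # attribute tested at an internal node
        self.label = label           # class label if this node is a leaf
        self.children = {}           # maps outcome value -> child Node

    def is_leaf(self):
        return self.label is not None

# A tiny two-level tree: the root tests "refund";
# the "yes" branch leads straight to a leaf labelled "no".
root = Node(attribute="refund")
root.children["yes"] = Node(label="no")
print(root.children["yes"].is_leaf())  # True
```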
Root Node, Internal Node and Leaf Node

Branch
Decision Tree Classification Task

Apply Model to Test Data

Test record: Refund = No, Marital Status = Married,
Taxable Income = 80K, Cheat = ?

The fitted tree:
• Root: test Refund
  – Yes → leaf NO
  – No → test MarSt
• MarSt
  – Married → leaf NO
  – Single, Divorced → test TaxInc
• TaxInc
  – < 80K → leaf NO
  – ≥ 80K → leaf YES

Start from the root of the tree and, at each node, follow the
branch that matches the test record: Refund = No leads to the
MarSt node, and Marital Status = Married leads to the leaf NO.

Assign Cheat to “No”.

Introduction to Data Mining, 2nd Edition

Decision Tree Classification Task

Training Set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Induction: a tree induction algorithm learns a model (the
decision tree) from the training set.

Test Set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Deduction: the learned model is then applied to the test set to
predict the missing class labels.

Introduction to Data Mining, 2nd Edition
Tree Induction
• Goal: find a tree that has low classification error on the training data
  (training error)
• Finding the best decision tree (lowest training error) is NP-hard

• Many algorithms:
  • ID3 – called as ID3(Examples, Target_Attribute, Attributes)
  • https://www.youtube.com/watch?app=desktop&v=K-oGwFoCGU0
  • SLIQ, SPRINT
  • Hunt’s Algorithm (one of the earliest)
  • CART
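CART is the algorithm behind scikit-learn's `DecisionTreeClassifier`, so one of these induction algorithms can be tried in a few lines. A minimal sketch, assuming scikit-learn is installed; the features are a numeric encoding of the Refund/Taxable Income data from the earlier slide:

```python
from sklearn.tree import DecisionTreeClassifier

# Features: [refund (1=yes, 0=no), taxable income in K].
# Labels: the Cheat column from the training-data slide.
X = [[1, 125], [0, 100], [0, 70], [1, 120], [0, 95],
     [0, 60], [1, 220], [0, 85], [0, 75], [0, 90]]
y = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[0, 80]]))  # predicted Cheat label for the test record
```

With no depth limit the induced tree fits this small, separable training set perfectly; the exact splits (and hence the prediction for the 80K record) depend on the thresholds CART chooses.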
ID3 Algorithm
1. If all examples are of the same class, create a leaf node labelled
by the class
2. If examples in the training set are of different classes,
determine which attribute should be selected as the root of
the current tree
3. Partition the input examples into subsets according to the
values of the selected root attribute
4. Construct a decision tree recursively for each subset
5. Connect the roots for the subtrees to the root of the whole
tree via labelled links
Based on Prof Mohamed Gaber slides
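The five steps above can be sketched directly; step 2's attribute selection is done here with information gain (entropy), ID3's usual criterion. The weather records are illustrative, not the slides' data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def id3(examples, attributes, target):
    labels = [e[target] for e in examples]
    # Step 1: all examples share a class -> leaf labelled by that class.
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:                       # no attributes left: majority leaf
        return Counter(labels).most_common(1)[0][0]
    # Step 2: select the attribute with the highest information gain.
    def gain(attr):
        rem = sum(
            len(sub := [e for e in examples if e[attr] == v]) / len(examples)
            * entropy([e[target] for e in sub])
            for v in {e[attr] for e in examples}
        )
        return entropy(labels) - rem
    best = max(attributes, key=gain)
    # Steps 3-5: partition on the chosen attribute, recurse on each subset,
    # and connect each subtree to the root via a link labelled by the value.
    tree = {}
    for v in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == v]
        tree[v] = id3(subset, [a for a in attributes if a != best], target)
    return {best: tree}

data = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain",  "windy": "no",  "play": "no"},
    {"outlook": "rain",  "windy": "yes", "play": "no"},
]
print(id3(data, ["outlook", "windy"], "play"))
```

The result is a nested dict: the root tests `outlook`, the rainy branch is immediately a pure "no" leaf, and the sunny branch recurses on `windy`.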
Advantages of Decision Trees

• Easy to understand and interpret.
• Requires little data preprocessing.
• Handles both numerical and categorical data.
• Non-parametric: no assumptions about the underlying
  data distribution.
Common Use Cases

• Classification tasks: identifying categories or classes.
• Regression tasks: predicting numeric values.
• Decision support systems.
• Risk assessment and management.
Summary

• A Decision Tree is a tree-like technique used for both
  classification and regression tasks.
• In a Decision Tree, the nodes represent decision points based
  on specific features, while the edges connect nodes,
  indicating the possible outcomes.
• It handles both numerical and categorical data.
