Decision Trees
Gerhard Held
Product Manager Analytical Applications
Outline
1. The SAS Data Mining Solution
2. Introduction to Decision Trees
3. Assessment Measures
4. Overview of DataSplits Tool within SAS Data Mining Solution
Business Problem
Introduction to Decision Trees
Searching for Segments
What is a Decision Tree?
Common Applications
Tree Methodologies
All Observations
4,000 Resp - 29%
10,000 NonResp - 71%
Split on AGE
TEENS <20
3,000 Resp - 60%
2,000 NonResp - 40%
ADULTS >=20
1,000 Resp - 11%
8,000 NonResp - 89%
Search for a predictor with a large difference in response rate over its categories.
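The search described here can be sketched in a few lines of Python. This is only an illustration of the idea, not the split criterion of any particular tool: the AGE counts are taken from the tree slide, while the GENDER predictor and its counts are invented for contrast, and the "spread" score (largest minus smallest response rate) is a simplifying assumption. Tree software typically uses a statistical criterion instead.

```python
# (responders, non-responders) per category of each candidate predictor.
# AGE counts come from the slide; the GENDER counts are hypothetical.
candidates = {
    "AGE": {"TEENS <20": (3000, 2000), "ADULTS >=20": (1000, 8000)},
    "GENDER (hypothetical)": {"F": (2100, 4900), "M": (1900, 5100)},
}


def response_rates(categories):
    """Return the response rate of every category of one predictor."""
    return {name: resp / (resp + non) for name, (resp, non) in categories.items()}


def rate_spread(categories):
    """Largest minus smallest response rate across the categories."""
    rates = response_rates(categories).values()
    return max(rates) - min(rates)


# Pick the predictor whose categories differ most in response rate.
best_predictor = max(candidates, key=lambda name: rate_spread(candidates[name]))

for name, categories in candidates.items():
    print(name, response_rates(categories), round(rate_spread(categories), 3))
print("Split on:", best_predictor)
```

With these counts AGE separates 60% from about 11%, so it is chosen over the hypothetical predictor, whose categories differ by only a few points.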
[Figure: bar charts of the share of "No Reply" versus "Bought" (axis 0% to 100%), split by "Young: No or Yes?" and by "UK: No or Yes?"]
All Observations
4,000 Resp - 29%
10,000 NonResp - 71%
Split on AGE
TEENS <20
3,000 Resp - 60%
2,000 NonResp - 40%
ADULTS >=20
1,000 Resp - 11%
8,000 NonResp - 89%
TEENS <20
ADULTS >=20
-1-
Split on Country
UK only
Continent
0 Resp - 0%
1,000 NonResp - 100%
-2-
-3-
Startups
Foreign Economical Cars
Junk Food
Segment 2
ages: 26-35
income: 100K
DINKS
Sports Cars
Fashionable Restaurants
New Applications:
Variable Selection
Include
AGE
COUNTRY
AGE
Exclude
COUNTRY
FINANCIAL ASSETS
INCOME
Variable Creation
Include
AGE
STATE
AGE
Create
COUNTRY
TEENS
ADULTS
ADULTS not UK
ADULTS in UK
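As a rough illustration of variable creation, the leaves of the AGE / COUNTRY tree can be turned into a single new categorical variable. The sketch below assumes a record has an `age` and a `country` field and that "UK" is the country code used; the function name `segment` and the sample records are hypothetical.

```python
def segment(age, country):
    """Map one record to a leaf label of the AGE / COUNTRY tree."""
    if age < 20:
        return "TEENS"
    if country == "UK":
        return "ADULTS in UK"
    return "ADULTS not UK"


# Hypothetical records, only to show the derived variable being attached.
records = [
    {"age": 17, "country": "UK"},
    {"age": 34, "country": "UK"},
    {"age": 52, "country": "FR"},
]
for record in records:
    record["segment"] = segment(record["age"], record["country"])

print([record["segment"] for record in records])
```

The new `segment` column can then be used as an input to another model in place of the raw AGE and COUNTRY inputs.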
Tree Methodologies
The CART Family
CART, Salford Systems, S-Plus (all 1980s)
[Figure: example binary tree with yes/no splits "is x(3)<3.2?" and "is x(1)<7.6?" and leaves class 1, class 2, class 3]
Model Assessment
Classification
Sensitivity
Lift
Profit and Loss Equations
Model Assessment
Alternatives
Proportion correctly predicted (classified)
Proportion responding to Promotion
Expected Profit
Classification
                      PREDICTED
OBSERVED     Default     Paid    Total
Default           10       10       20
Paid               0       40       40
Total             10       50       60
( 10 + 40) / 60 = 50 / 60
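The classification rate above is the diagonal of the observed-by-predicted matrix divided by the grand total. A minimal sketch, assuming the empty Observed Paid / Predicted Default cell is 0 (which the row and column totals imply); the dictionary layout and the name `accuracy` are illustrative choices.

```python
# Rows: observed class, columns: predicted class (counts from the slide).
confusion = {
    "Default": {"Default": 10, "Paid": 10},
    "Paid": {"Default": 0, "Paid": 40},
}


def accuracy(matrix):
    """Proportion correctly classified: diagonal sum over grand total."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


print(accuracy(confusion))  # (10 + 40) / 60
```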
Sensitivity
                      PREDICTED
OBSERVED     Default     Paid    Total
Default           10       10       20
Paid               0       40       40
Total             10       50       60
Sensitivity: 10 / 20
Sensitivity and Classification
                      PREDICTED
OBSERVED     Default     Paid    Total
Default           20        0       20
Paid              20       20       40
Total             40       20       60
Sensitivity: 20 / 20
( 20 + 20) / 60 = 40 / 60
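The two matrices can be compared side by side to see the effect described here: predicting "Default" more often raises sensitivity from 10/20 to 20/20 while the classification rate falls from 50/60 to 40/60. The sketch below assumes blank cells are 0 (as the totals imply); `first_model` and `second_model` are just labels for the two slides.

```python
# Rows: observed class, columns: predicted class.
first_model = {
    "Default": {"Default": 10, "Paid": 10},
    "Paid": {"Default": 0, "Paid": 40},
}
second_model = {
    "Default": {"Default": 20, "Paid": 0},
    "Paid": {"Default": 20, "Paid": 20},
}


def accuracy(matrix):
    """Diagonal sum over grand total."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


def sensitivity(matrix, event):
    """Share of observed events (one row) that were predicted as the event."""
    row = matrix[event]
    return row[event] / sum(row.values())


for name, matrix in [("first", first_model), ("second", second_model)]:
    print(name, sensitivity(matrix, "Default"), accuracy(matrix))
```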
Specificity
                      PREDICTED
OBSERVED     Default     Paid    Total
Default           20        0       20
Paid              20       20       40
Total             40       20       60
Specificity = 20 / 40
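Specificity is the counterpart of sensitivity for the non-event row: of the cases observed as "Paid", how many were also predicted as "Paid". A small sketch under the same assumption that blank cells are 0; the function name and dictionary layout are illustrative.

```python
# Rows: observed class, columns: predicted class (counts from the slide).
confusion = {
    "Default": {"Default": 20, "Paid": 0},
    "Paid": {"Default": 20, "Paid": 20},
}


def specificity(matrix, event):
    """Share of observed non-events that were not predicted as the event."""
    non_event_rows = [row for label, row in matrix.items() if label != event]
    correct_rejections = sum(
        count
        for row in non_event_rows
        for predicted, count in row.items()
        if predicted != event
    )
    total = sum(sum(row.values()) for row in non_event_rows)
    return correct_rejections / total


print(specificity(confusion, "Default"))  # 20 / 40
```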
The Alternative of Sensitivity
Lift
                      PREDICTED
OBSERVED     Respond       No    Total
Respond           20        0       20
No                20       20       40
Total             40       20       60
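For the lift matrix, the quantity of interest is read down the "Respond" column: the share of predicted responders who actually respond. The sketch below also divides that share by the overall response rate, which is the conventional way to express lift as a ratio; that ratio is an addition here and is not stated on the slide. Blank cells are again assumed to be 0.

```python
# Rows: observed class, columns: predicted class (counts from the slide).
confusion = {
    "Respond": {"Respond": 20, "No": 0},
    "No": {"Respond": 20, "No": 20},
}


def predicted_event_rate(matrix, event):
    """Share of cases predicted as the event that were observed as the event."""
    predicted_total = sum(row[event] for row in matrix.values())
    return matrix[event][event] / predicted_total


def base_rate(matrix, event):
    """Share of all cases that were observed as the event."""
    total = sum(sum(row.values()) for row in matrix.values())
    return sum(matrix[event].values()) / total


hit_rate = predicted_event_rate(confusion, "Respond")  # 20 / 40
overall = base_rate(confusion, "Respond")              # 20 / 60
lift = hit_rate / overall
print(hit_rate, overall, lift)
```

Here half of the predicted responders respond, against one third of the whole population, so the selected group responds about 1.5 times as often as a random one.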
Alternative Objectives
Classification: Proportion correct (Diagonal)
Sensitivity: Proportion correct of an observed Event (Row)
Lift: Proportion correct of a predicted Event (Column)
Trade Offs
Classification: none
Sensitivity: Specificity
Lift: Size of Subset (Node)
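The trade-off between lift and the size of the selected subset can be illustrated with the two AGE nodes from the earlier tree slide. This is a sketch under the assumption that nodes are selected in order of decreasing response rate and that lift is the response rate of the selection divided by the overall response rate; neither rule is stated in the slides.

```python
# (responders, non-responders) per node, taken from the AGE split.
nodes = {"TEENS <20": (3000, 2000), "ADULTS >=20": (1000, 8000)}

total_resp = sum(resp for resp, _ in nodes.values())
total_n = sum(resp + non for resp, non in nodes.values())
overall_rate = total_resp / total_n

# Add nodes from best to worst response rate and track cumulative lift.
ranked = sorted(
    nodes.items(),
    key=lambda item: item[1][0] / (item[1][0] + item[1][1]),
    reverse=True,
)
table = []
cum_resp = cum_n = 0
for name, (resp, non) in ranked:
    cum_resp += resp
    cum_n += resp + non
    share_selected = cum_n / total_n
    cumulative_lift = (cum_resp / cum_n) / overall_rate
    table.append((name, share_selected, cumulative_lift))

for row in table:
    print(row)
```

Selecting only the teens node covers about 36% of the observations at a lift of 2.1; extending the selection to everyone brings the lift back to 1.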
Icon
PROC DATASPLIT
Method Calls (non-display AF Object Class)
Methodology
Nominal and Interval Targets and Inputs
Multiway Splits on all Input Types
Up to 32000 Values for a nominal Input
Allows User defined Split Criteria
General Profit and Loss Functions
Suited for Rare Events
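To illustrate why a profit and loss function matters for rare events, the sketch below compares a majority-class decision with a profit-based decision for the two AGE nodes from the earlier tree slide. It is not the DataSplits implementation: the revenue and cost figures are hypothetical, and the simple rule "contact if expected profit per contact is positive" is an assumption made for the example.

```python
REVENUE_PER_RESPONSE = 30.0  # hypothetical value of one response
COST_PER_CONTACT = 2.0       # hypothetical cost of contacting one person

# (responders, non-responders) per node, taken from the AGE split.
nodes = {"TEENS <20": (3000, 2000), "ADULTS >=20": (1000, 8000)}


def expected_profit_per_contact(resp, non):
    """Expected revenue minus the contact cost for one person in the node."""
    response_rate = resp / (resp + non)
    return response_rate * REVENUE_PER_RESPONSE - COST_PER_CONTACT


decisions = {}
for name, (resp, non) in nodes.items():
    profit = expected_profit_per_contact(resp, non)
    decisions[name] = {
        "majority_class": "Resp" if resp > non else "NonResp",
        "expected_profit": profit,
        "contact": profit > 0,
    }

for name, decision in decisions.items():
    print(name, decision)
```

Under these assumed figures the adults node is labelled "NonResp" by majority vote, yet contacting it still has a positive expected profit, which is the kind of case a pure classification criterion would miss.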
Overview of DataSplits Tool - Advanced Options
Overview of DataSplits Tool - Assessment
Segmentation Using Decision Trees
Thank you for your Attention!
Questions?