Lecture Notes for Chapter 4: Rule-Based Classifier
Introduction to Data Mining, 2nd Edition
Rule-Based Classifier
Rule: (Condition) → y
– where
Condition is a conjunction of attribute tests
y is the class label
– LHS: rule antecedent or condition
– RHS: rule consequent
– Examples of classification rules:
(Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
(Taxable Income < 50K) ∧ (Refund=Yes) → Evade=No
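A minimal sketch of how such a rule could be represented in code, assuming a record is a plain dictionary of attribute values. The `Rule` class, its `covers` method, and the two sample records are hypothetical names and data introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A condition is (attribute name, test applied to the attribute value).
Condition = Tuple[str, Callable[[object], bool]]

@dataclass
class Rule:
    conditions: List[Condition]   # antecedent (LHS): conjunction of tests
    label: str                    # consequent (RHS): class label

    def covers(self, record: Dict[str, object]) -> bool:
        """True when the record satisfies every test in the antecedent."""
        return all(attr in record and test(record[attr])
                   for attr, test in self.conditions)

# (Blood Type=Warm) AND (Lay Eggs=Yes) -> Birds
bird_rule = Rule([("Blood Type", lambda v: v == "Warm"),
                  ("Lay Eggs", lambda v: v == "Yes")], "Birds")

# (Taxable Income < 50K) AND (Refund=Yes) -> Evade=No
evade_rule = Rule([("Taxable Income", lambda v: v < 50_000),
                   ("Refund", lambda v: v == "Yes")], "Evade=No")

# Hypothetical records, invented only to exercise the two rules.
hawk_like = {"Blood Type": "Warm", "Lay Eggs": "Yes"}
taxpayer = {"Taxable Income": 80_000, "Refund": "Yes"}

print(bird_rule.covers(hawk_like))   # True: both tests hold
print(evade_rule.covers(taxpayer))   # False: income is not below 50K
```

A rule covers a record only when all of its tests hold, which is what makes the antecedent a conjunction.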
02/14/2018 Introduction to Data Mining, 2nd Edition 3/28/21 12:17 PM 2
Rule-based Classifier (Example)
Name            Blood Type  Give Birth  Can Fly  Live in Water  Class
human           warm        yes         no       no             mammals
python          cold        no          no       no             reptiles
salmon          cold        no          no       yes            fishes
whale           warm        yes         no       yes            mammals
frog            cold        no          no       sometimes      amphibians
komodo          cold        no          no       no             reptiles
bat             warm        yes         yes      no             mammals
pigeon          warm        no          yes      no             birds
cat             warm        yes         no       no             mammals
leopard shark   cold        yes         no       yes            fishes
turtle          cold        no          no       sometimes      reptiles
penguin         warm        no          no       sometimes      birds
porcupine       warm        yes         no       no             mammals
eel             cold        no          no       yes            fishes
salamander      cold        no          no       sometimes      amphibians
gila monster    cold        no          no       no             reptiles
platypus        warm        no          no       no             mammals
owl             warm        no          yes      no             birds
dolphin         warm        yes         no       yes            mammals
eagle           warm        no          yes      no             birds
Name            Blood Type  Give Birth  Can Fly  Live in Water  Class
hawk            warm        no          yes      no             ?
grizzly bear    warm        yes         no       no             ?
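– Coverage of a rule: fraction of records that satisfy the antecedent of the rule
– Accuracy of a rule: fraction of records that satisfy the antecedent that also satisfy the consequent of the rule
Example: (Status=Single) → No
Coverage = 40%, Accuracy = 50%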
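Coverage and accuracy can be computed by simple counting. The sketch below does this for the rule (Give Birth=yes) → mammals on the 20-record vertebrate table above, since the records behind the 40% / 50% figures are not reproduced in these notes. The function name `coverage_and_accuracy` is an assumption made for illustration.

```python
# (name, give_birth, class) columns copied from the vertebrate table above.
RECORDS = [
    ("human", "yes", "mammals"), ("python", "no", "reptiles"),
    ("salmon", "no", "fishes"), ("whale", "yes", "mammals"),
    ("frog", "no", "amphibians"), ("komodo", "no", "reptiles"),
    ("bat", "yes", "mammals"), ("pigeon", "no", "birds"),
    ("cat", "yes", "mammals"), ("leopard shark", "yes", "fishes"),
    ("turtle", "no", "reptiles"), ("penguin", "no", "birds"),
    ("porcupine", "yes", "mammals"), ("eel", "no", "fishes"),
    ("salamander", "no", "amphibians"), ("gila monster", "no", "reptiles"),
    ("platypus", "no", "mammals"), ("owl", "no", "birds"),
    ("dolphin", "yes", "mammals"), ("eagle", "no", "birds"),
]

def coverage_and_accuracy(records, antecedent, label):
    """Coverage: share of records matching the antecedent.
    Accuracy: share of the matching records whose class equals `label`."""
    covered = [r for r in records if antecedent(r)]
    coverage = len(covered) / len(records)
    correct = sum(1 for r in covered if r[2] == label)
    accuracy = correct / len(covered) if covered else 0.0
    return coverage, accuracy

# Rule: (Give Birth=yes) -> mammals
cov, acc = coverage_and_accuracy(RECORDS, lambda r: r[1] == "yes", "mammals")
print(f"coverage={cov:.2f} accuracy={acc:.3f}")  # coverage=0.35 accuracy=0.857
```

Seven of the 20 records give birth, and six of those seven are mammals (the leopard shark is the exception).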
Name            Blood Type  Give Birth  Can Fly  Live in Water  Class
lemur           warm        yes         no       no             ?
turtle          cold        no          no       sometimes      ?
dogfish shark   cold        yes         no       yes            ?
Exhaustive rules
– Classifier has exhaustive coverage if it
accounts for every possible combination of
attribute values
– Each record is covered by at least one rule
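Exhaustive coverage can be checked by enumerating every combination of attribute values and looking for combinations that no rule covers. This is a sketch under stated assumptions: the attribute domains are those seen in the vertebrate table (Blood Type is left out because the sample rules do not test it), and the rule set is borrowed from the C4.5rules set that appears later in these notes, with its default rule left out on purpose. The helper names are hypothetical.

```python
from itertools import product

# Attribute domains as they appear in the vertebrate table.
DOMAINS = {
    "Give Birth": ["yes", "no"],
    "Can Fly": ["yes", "no"],
    "Live in Water": ["yes", "no", "sometimes"],
}

# Each rule is (antecedent as attribute -> value, class label).
RULES = [
    ({"Give Birth": "no", "Can Fly": "yes"}, "birds"),
    ({"Give Birth": "no", "Live in Water": "yes"}, "fishes"),
    ({"Give Birth": "yes"}, "mammals"),
    ({"Give Birth": "no", "Can Fly": "no", "Live in Water": "no"}, "reptiles"),
]

def covers(antecedent, record):
    """An empty antecedent covers every record (all() of nothing is True)."""
    return all(record[a] == v for a, v in antecedent.items())

def uncovered_combinations(rules, domains):
    """Return every attribute-value combination that triggers no rule."""
    names = list(domains)
    missing = []
    for values in product(*(domains[n] for n in names)):
        record = dict(zip(names, values))
        if not any(covers(antecedent, record) for antecedent, _ in rules):
            missing.append(record)
    return missing

missing = uncovered_combinations(RULES, DOMAINS)
print(missing)  # one gap: Give Birth=no, Can Fly=no, Live in Water=sometimes

# Adding a default rule with an empty antecedent closes the gap.
with_default = RULES + [({}, "amphibians")]
print(uncovered_combinations(with_default, DOMAINS))  # []
```

The four rules alone are not exhaustive; the default rule is what guarantees that each record is covered by at least one rule.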
Characteristics of Rule Sets: Strategy 2
Name            Blood Type  Give Birth  Can Fly  Live in Water  Class
turtle          cold        no          no       sometimes      ?
Rule Ordering Schemes
Rule-based ordering
– Individual rules are ranked based on their quality
Class-based ordering
– Rules that belong to the same class appear together
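The two ordering schemes can give different answers for a record that triggers more than one rule. The sketch below assumes the first matching rule in the ordered list decides the class. The four rules, their quality scores, and the class order are made-up placeholders, not values from these notes.

```python
# (antecedent, class, quality); the quality scores are made-up placeholders.
RULES = [
    ({"Give Birth": "no", "Can Fly": "yes"}, "birds", 0.90),
    ({"Give Birth": "yes"}, "mammals", 0.85),
    ({"Give Birth": "no", "Can Fly": "no"}, "reptiles", 0.60),
    ({"Live in Water": "sometimes"}, "amphibians", 0.50),
]

# Hypothetical class order used for class-based ordering.
CLASS_ORDER = ["amphibians", "birds", "mammals", "reptiles"]

def covers(antecedent, record):
    return all(record.get(a) == v for a, v in antecedent.items())

def rule_based_order(rules):
    """Rank individual rules by quality, best first."""
    return sorted(rules, key=lambda r: r[2], reverse=True)

def class_based_order(rules, class_order):
    """Keep rules of the same class together, following class_order."""
    return sorted(rules, key=lambda r: class_order.index(r[1]))

def classify(ordered_rules, record, default=None):
    """Return the class of the first rule the record triggers."""
    for antecedent, label, _ in ordered_rules:
        if covers(antecedent, record):
            return label
    return default

turtle = {"Blood Type": "cold", "Give Birth": "no",
          "Can Fly": "no", "Live in Water": "sometimes"}

print(classify(rule_based_order(RULES), turtle))                # reptiles
print(classify(class_based_order(RULES, CLASS_ORDER), turtle))  # amphibians
```

The turtle triggers both the reptiles rule and the amphibians rule, so the prediction depends entirely on which of the two comes first in the list.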
Direct Method:
Extract rules directly from data
Examples: RIPPER, CN2, Holte’s 1R
Indirect Method:
Extract rules from other classification models (e.g.
decision trees, neural networks, etc).
Examples: C4.5rules
Why do we need to eliminate instances?
– Otherwise, the next rule is identical to previous rule
Why do we remove positive instances?
– Ensure that the next rule is different
Why do we remove negative instances?
– Prevent underestimating accuracy of rule
– Compare rules R2 and R3 in the diagram
[Figure: instances of class = + and class = - with the regions covered by rules R1, R2 and R3]
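Instance elimination is the step that drives a sequential covering loop: learn one rule, drop the records it covers, repeat. The following is a minimal sketch, not the algorithm of any particular system. It assumes plain rule accuracy as the quality measure for growing a rule, and it uses eight rows of the vertebrate table as training data. The function names are hypothetical.

```python
ATTRS = ["Blood Type", "Give Birth", "Can Fly", "Live in Water", "class"]
ROWS = [  # human, python, salmon, whale, frog, bat, pigeon, platypus
    ("warm", "yes", "no", "no", "mammals"), ("cold", "no", "no", "no", "reptiles"),
    ("cold", "no", "no", "yes", "fishes"), ("warm", "yes", "no", "yes", "mammals"),
    ("cold", "no", "no", "sometimes", "amphibians"), ("warm", "yes", "yes", "no", "mammals"),
    ("warm", "no", "yes", "no", "birds"), ("warm", "no", "no", "no", "mammals"),
]
RECORDS = [dict(zip(ATTRS, row)) for row in ROWS]

def covers(rule, rec):
    return all(rec[a] == v for a, v in rule.items())

def learn_one_rule(records, target):
    """Greedily add attribute=value tests while negatives are still covered."""
    rule, covered = {}, list(records)
    while any(r["class"] != target for r in covered):
        best = None
        for attr in covered[0]:
            if attr == "class" or attr in rule:
                continue
            for value in sorted({r[attr] for r in covered}):
                subset = [r for r in covered if r[attr] == value]
                pos = sum(r["class"] == target for r in subset)
                score = (pos / len(subset), pos)  # accuracy, then positives
                if pos and (best is None or score > best[0]):
                    best = (score, attr, value, subset)
        if best is None or len(best[3]) == len(covered):
            break  # no remaining test narrows the rule any further
        _, attr, value, covered = best
        rule[attr] = value
    return rule

def sequential_covering(records, target):
    remaining, rules = list(records), []
    while any(r["class"] == target for r in remaining):
        rule = learn_one_rule(remaining, target)
        if not rule:
            break  # nothing separates the remaining positives
        rules.append(rule)
        # Instance elimination: drop every covered record, positive or negative.
        remaining = [r for r in remaining if not covers(rule, r)]
    return rules

rules = sequential_covering(RECORDS, "mammals")
print(rules)
# [{'Give Birth': 'yes'}, {'Blood Type': 'warm', 'Can Fly': 'no'}]
```

Without the elimination step, the second call to `learn_one_rule` would see the same data and return (Give Birth=yes) again, which is the point made in the first bullet above.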
[Figure: rule growing]
(a) General-to-specific: start from the empty rule {} (Yes: 3, No: 4) and consider adding one conjunct:
Conjunct          Yes  No
Refund=No         3    4
Status=Single     2    1
Status=Divorced   1    0
Status=Married    0    3
...
Income > 80K      3    1
(b) Specific-to-general: the rules
Refund=No, Status=Single, Income=85K (Class=Yes)
Refund=No, Status=Single, Income=90K (Class=Yes)
are generalized to
Refund=No, Status=Single (Class=Yes)
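One way to compare the candidate conjuncts in the figure above is FOIL's information gain, the measure RIPPER uses when growing a rule. The sketch below uses the usual definition, gain = p1 × (log2(p1/(p1+n1)) − log2(p0/(p0+n0))), where p and n are the positive and negative counts covered before (0) and after (1) adding the conjunct. The pairing of counts with conjuncts follows the reading order of the flattened figure and is an assumption, as is the convention of returning 0 when no positives remain.

```python
from math import log2

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for extending a rule that covers p0 positives
    and n0 negatives into one that covers p1 positives and n1 negatives."""
    if p1 == 0:
        return 0.0  # convention of this sketch: no positives left, no gain
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# (Yes, No) counts as read from the figure; the empty rule {} has Yes: 3, No: 4.
P0, N0 = 3, 4
CANDIDATES = {
    "Refund=No": (3, 4),
    "Status=Single": (2, 1),
    "Status=Divorced": (1, 0),
    "Status=Married": (0, 3),
    "Income>80K": (3, 1),
}

gains = {c: foil_gain(P0, N0, p, n) for c, (p, n) in CANDIDATES.items()}
best = max(gains, key=gains.get)

for conjunct, gain in gains.items():
    print(f"{conjunct:16s} {gain:.3f}")
print("best:", best)  # best: Income>80K
```

Refund=No scores zero because it covers exactly the same records as the empty rule, while Income>80K keeps all three positives and drops three of the four negatives.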
Growing a rule:
– Start from empty rule
– Add conjuncts as long as they improve FOIL’s
information gain
– Stop when rule no longer covers negative examples
– Prune the rule immediately using incremental reduced
error pruning
– Measure for pruning: v = (p-n)/(p+n)
p: number of positive examples covered by the rule in
the validation set
n: number of negative examples covered by the rule in
the validation set
– Pruning method: delete any final sequence of
conditions that maximizes v
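The pruning step can be sketched as follows: evaluate v = (p-n)/(p+n) on a validation set for the full rule and for each shorter prefix, then keep the version with the largest v. The validation records here are hypothetical, and two choices are assumptions of this sketch: at least one condition is always kept, and a shorter rule replaces the current one only when v strictly improves.

```python
def covers(conditions, record):
    return all(record[a] == v for a, v in conditions)

def prune_metric(conditions, validation, target):
    """v = (p - n) / (p + n) over the validation records the rule covers."""
    covered = [r for r in validation if covers(conditions, r)]
    p = sum(r["class"] == target for r in covered)
    n = len(covered) - p
    return (p - n) / (p + n) if covered else float("-inf")

def prune_rule(conditions, validation, target):
    """Try deleting each final sequence of conditions; keep the best prefix."""
    best = list(conditions)
    best_v = prune_metric(best, validation, target)
    for k in range(len(conditions) - 1, 0, -1):  # keep at least one condition
        prefix = list(conditions[:k])
        v = prune_metric(prefix, validation, target)
        if v > best_v:
            best, best_v = prefix, v
    return best, best_v

# Hypothetical validation set, invented for this example.
VALIDATION = [
    {"Blood Type": "warm", "Can Fly": "yes", "Live in Water": "no", "class": "birds"},
    {"Blood Type": "warm", "Can Fly": "yes", "Live in Water": "no", "class": "mammals"},
    {"Blood Type": "warm", "Can Fly": "yes", "Live in Water": "sometimes", "class": "birds"},
    {"Blood Type": "warm", "Can Fly": "yes", "Live in Water": "yes", "class": "birds"},
    {"Blood Type": "cold", "Can Fly": "no", "Live in Water": "no", "class": "reptiles"},
]

# Conditions in the order they were added while growing the rule for "birds".
grown = [("Can Fly", "yes"), ("Live in Water", "no"), ("Blood Type", "warm")]

pruned, v = prune_rule(grown, VALIDATION, "birds")
print(pruned, v)  # [('Can Fly', 'yes')] 0.5
```

The full rule covers one bird and one mammal (v = 0), while the single condition Can Fly=yes covers three birds and one mammal (v = 0.5), so the last two conditions are deleted.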
Direct Method: RIPPER
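[Figure: decision tree with root node P (branches No and Yes) and internal nodes Q and R, shown next to the rule set extracted from it]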
Decision tree:
Give Birth?
  Yes: Mammals
  No: Live In Water?
    Yes: Fishes
    Sometimes: Amphibians
    No: Can Fly?
      Yes: Birds
      No: Reptiles

C4.5rules:
(Give Birth=No, Can Fly=Yes) → Birds
(Give Birth=No, Live in Water=Yes) → Fishes
(Give Birth=Yes) → Mammals
(Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles
( ) → Amphibians

RIPPER:
(Live in Water=Yes) → Fishes
(Have Legs=No) → Reptiles
(Give Birth=No, Can Fly=No, Live In Water=No) → Reptiles
(Can Fly=Yes, Give Birth=No) → Birds
() → Mammals
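The indirect method can be sketched by turning each root-to-leaf path of the decision tree above into one rule. The nested-tuple encoding of the tree and the name `tree_to_rules` are assumptions made for illustration. The C4.5rules set listed above is more compact than these raw paths, which suggests a later simplification step; that step is not shown here.

```python
# Tree as read from the figure: (attribute, {branch value: subtree or leaf}).
TREE = ("Give Birth", {
    "yes": "Mammals",
    "no": ("Live in Water", {
        "yes": "Fishes",
        "sometimes": "Amphibians",
        "no": ("Can Fly", {"yes": "Birds", "no": "Reptiles"}),
    }),
})

def tree_to_rules(node, path=()):
    """Collect one (conditions, class) rule per root-to-leaf path."""
    if isinstance(node, str):  # leaf: the path so far is the antecedent
        return [(list(path), node)]
    attr, branches = node
    rules = []
    for value, child in branches.items():
        rules.extend(tree_to_rules(child, path + ((attr, value),)))
    return rules

rules = tree_to_rules(TREE)
for conditions, label in rules:
    lhs = ", ".join(f"{a}={v}" for a, v in conditions)
    print(f"({lhs}) -> {label}")
# (Give Birth=yes) -> Mammals
# (Give Birth=no, Live in Water=yes) -> Fishes
# (Give Birth=no, Live in Water=sometimes) -> Amphibians
# (Give Birth=no, Live in Water=no, Can Fly=yes) -> Birds
# (Give Birth=no, Live in Water=no, Can Fly=no) -> Reptiles
```

Rules extracted this way are mutually exclusive and exhaustive by construction, because every record follows exactly one path through the tree.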
RIPPER:
                      PREDICTED CLASS
                      Amphibians  Fishes  Reptiles  Birds  Mammals
ACTUAL   Amphibians   0           0       0         0      2
CLASS    Fishes       0           3       0         0      0
         Reptiles     0           0       3         0      1
         Birds        0           0       1         2      1
         Mammals      0           2       1         0      4
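Overall accuracy and per-class recall follow directly from the confusion matrix above. A short sketch, with hypothetical function names, assuming rows are actual classes and columns are predicted classes:

```python
CLASSES = ["Amphibians", "Fishes", "Reptiles", "Birds", "Mammals"]

# Rows: actual class, columns: predicted class (RIPPER matrix above).
RIPPER = [
    [0, 0, 0, 0, 2],
    [0, 3, 0, 0, 0],
    [0, 0, 3, 0, 1],
    [0, 0, 1, 2, 1],
    [0, 2, 1, 0, 4],
]

def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def recall_per_class(matrix, classes):
    """For each actual class, the share of its records predicted correctly."""
    return {c: (row[i] / sum(row) if sum(row) else 0.0)
            for i, (c, row) in enumerate(zip(classes, matrix))}

print(accuracy(RIPPER))  # 0.6
for name, value in recall_per_class(RIPPER, CLASSES).items():
    print(f"{name:11s} {value:.2f}")
```

Twelve of the 20 records lie on the diagonal. Both amphibians are predicted as mammals, which is consistent with RIPPER's rule set having no amphibian rule and using Mammals as the default class.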