Classification Basics
[Jiawei Han, Jian Pei, Hanghang Tong. 2022. Data Mining: Concepts and Techniques. 4th Ed. Morgan Kaufmann. ISBN: 0128117605.]
[Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar. 2018. Introduction to Data Mining. 2nd Ed. Pearson. ISBN: 0133128903.]
Contents
1. Decision Tree
2. Naïve Bayesian Classification
3. Rule-Based Classification
4. Evaluate Classifier Performance
5. K-Nearest Neighbors Classification
Introduction – Basic Concepts
1. Decision Tree
Basic Algorithm for Inducing a Decision Tree
Algorithm: Generate_decision_tree. // Generate a decision tree from the training tuples of data partition D.
Input:
• Data partition D, a set of training tuples and their associated class labels
• attribute_list, the set of candidate attributes
• Attribute_selection_method, a procedure to determine the splitting criterion that “best” partitions the data tuples into individual classes. This criterion consists of a splitting_attribute and, possibly, either a split-point or a splitting subset.
Output: A decision tree.
Method:
1. Create a node N
2. if tuples in D are all of the same class C then
3.   return N as a leaf node labeled with the class C
4. if attribute_list is empty then
5.   return N as a leaf node labeled with the majority class in D // majority voting
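The remaining steps of the method (selecting the splitting criterion and recursing on each partition) follow the same recursive scheme. As a companion to the pseudocode, here is a minimal Python sketch, assuming categorical attributes and a caller-supplied attribute-selection function; the function and variable names (induce_tree, select_attribute, the dictionary tree representation) are illustrative, not the textbook's code.

from collections import Counter

def majority_class(D):
    """Most frequent class label among (attribute-dict, label) pairs in D."""
    return Counter(label for _, label in D).most_common(1)[0][0]

def induce_tree(D, attribute_list, select_attribute):
    """D: list of (attribute-dict, class-label) pairs."""
    labels = {label for _, label in D}
    if len(labels) == 1:                      # all tuples in the same class C
        return next(iter(labels))             # leaf labeled with C
    if not attribute_list:                    # no candidate attributes left
        return majority_class(D)              # leaf labeled with majority class
    A = select_attribute(D, attribute_list)   # "best" splitting attribute
    node = {"attribute": A, "branches": {}}
    remaining = [a for a in attribute_list if a != A]
    for value in {x[A] for x, _ in D}:        # one branch per observed value of A
        Dj = [(x, y) for x, y in D if x[A] == value]
        # if a partition were empty, the textbook attaches a majority-class leaf
        node["branches"][value] = (majority_class(D) if not Dj
                                   else induce_tree(Dj, remaining, select_attribute))
    return node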
Three Possibilities for Partitioning Tuples
Attribute Selection Measures - Notations Used
Information Gain (ID3)
Attribute Selection Measures - Notations Used
Example 1: Information Gain (ID3)
pi = |Ci,D| / |D|
Info(D) = –[p1 log2(p1) + p2 log2(p2)]
p1 = 9 / 14, p2 = 5 / 14
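A minimal sketch of this calculation in Python, assuming the textbook's 14-tuple buys_computer data (9 yes, 5 no) and its age partitions of youth (2 yes, 3 no), middle_aged (4 yes, 0 no), and senior (3 yes, 2 no):

from math import log2

def info(counts):
    """Expected information (entropy) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

info_D = info([9, 5])                                   # Info(D) ≈ 0.940 bits
# Partitioning D on age (class counts per partition assumed from the textbook table):
partitions = {"youth": [2, 3], "middle_aged": [4, 0], "senior": [3, 2]}
info_age = sum(sum(c) / 14 * info(c) for c in partitions.values())   # ≈ 0.694 bits
gain_age = info_D - info_age                            # Gain(age) ≈ 0.246 bits
print(round(info_D, 3), round(info_age, 3), round(gain_age, 3))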
Reminder: Information Gain (ID3)
Gain Ratio (C4.5)
Example 2: Gain Ratio (C4.5)
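A minimal sketch of the gain-ratio calculation, assuming the textbook's income attribute with partition sizes 4 (low), 6 (medium), and 4 (high) out of 14 tuples, and Gain(income) ≈ 0.029 bits carried over from the information-gain example:

from math import log2

def split_info(partition_sizes):
    """SplitInfo_A(D): potential information generated by splitting D on A."""
    total = sum(partition_sizes)
    return -sum(s / total * log2(s / total) for s in partition_sizes if s > 0)

split_info_income = split_info([4, 6, 4])            # ≈ 1.557
gain_income = 0.029                                  # assumed from the information-gain example
gain_ratio_income = gain_income / split_info_income  # ≈ 0.019
print(round(split_info_income, 3), round(gain_ratio_income, 3))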
Reminder
Gini index (CART)
Example 3: Gini index (CART)
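A minimal sketch of the Gini calculations for this example, assuming the same 14-tuple buys_computer data and the textbook's binary split on income ∈ {low, medium} (10 tuples: 7 yes, 3 no) versus income ∈ {high} (4 tuples: 2 yes, 2 no); the class counts are assumptions taken from the textbook's table:

def gini(counts):
    """Gini impurity of a class distribution given as counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

gini_D = gini([9, 5])                                          # Gini(D) ≈ 0.459
# Binary split on income ∈ {low, medium} vs. {high}:
gini_split = 10 / 14 * gini([7, 3]) + 4 / 14 * gini([2, 2])    # ≈ 0.443
reduction = gini_D - gini_split                                # impurity reduction ≈ 0.016
print(round(gini_D, 3), round(gini_split, 3), round(reduction, 3))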
Reminder
Other Attribute Selection Measures
Scalability of Decision Tree Induction
AVC-sets for the Tuple Data of Table 8.1
Contents
1. Decision Tree
2. Naïve Bayesian Classification
3. Rule-Based Classification
4. Evaluate Classifier Performance
5. K-Nearest Neighbors Classification
2. Naïve Bayesian Classification
2. Bayes’ Theorem
2. Naïve Bayesian Classification
• Step 2: Compute P(X | Ci)
• To compute P(X | Ci), for i = 1, 2, we compute the following conditional probabilities.
• Reminder: P(X | Ci) = P(x1 | Ci) × P(x2 | Ci) × ... × P(xn | Ci) is computed from the training tuples in D, where xk is the value of attribute Ak for the given tuple X and P(xk | Ci) is estimated as the fraction of the tuples of class Ci in D that have the value xk for Ak.
(e.g., X = (x1 = youth, x2 = medium, x3 = yes, x4 = fair))
2. Naïve Bayesian Classification
• Step 5: Classification
• We have P(X | C1)P(C1) = 0.028 > P(X | C2)P(C2) = 0.007. Thus, NBC predicts buys_computer = yes for the given tuple X (i.e., the unseen X is classified/labeled as C1).
/* X = (age = youth, income = medium, student = yes, credit_rating = fair) */
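A minimal sketch that redoes the whole computation for this X; the class-conditional counts are assumed from the textbook's 14-tuple training set (9 buys_computer = yes, 5 = no), and P(X) is omitted since it is the same constant for both classes:

# Naive Bayes scores for X = (age=youth, income=medium, student=yes, credit_rating=fair)
priors = {"yes": 9 / 14, "no": 5 / 14}
cond = {  # P(x_k | C_i) estimated as count_in_class / class_size (assumed from the textbook)
    "yes": {"age=youth": 2 / 9, "income=medium": 4 / 9,
            "student=yes": 6 / 9, "credit_rating=fair": 6 / 9},
    "no":  {"age=youth": 3 / 5, "income=medium": 2 / 5,
            "student=yes": 1 / 5, "credit_rating=fair": 2 / 5},
}

scores = {}
for c in priors:
    p_x_given_c = 1.0
    for p in cond[c].values():        # naive conditional-independence assumption
        p_x_given_c *= p
    scores[c] = p_x_given_c * priors[c]

print(scores)                         # ≈ {'yes': 0.028, 'no': 0.007} -> predict yes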
Contents
1. Decision Tree
2. Naïve Bayesian Classification
3. Rule-Based Classification
4. Evaluate Classifier Performance
5. K-Nearest Neighbors Classification
3. Rule-Based Classification
Conflict Resolution
Conflict Resolution – Size Ordering
Conflict Resolution – Rule Ordering
Default Rule
Rule Extraction from Decision Tree
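For reference, each root-to-leaf path of the tree becomes one IF-THEN rule. Assuming the buys_computer tree used earlier in the deck (split first on age, then on student for the youth branch and on credit_rating for the senior branch), the extracted rules would look like the following; the exact rules on the original slides may have differed:

R1: IF age = youth AND student = no THEN buys_computer = no
R2: IF age = youth AND student = yes THEN buys_computer = yes
R3: IF age = middle_aged THEN buys_computer = yes
(plus two analogous rules for the senior branch, split on credit_rating)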
Rule Induction: Sequential Covering Algorithm
Method:
1. Rule_set = {} // initial set of rules learned is empty
2. for each class c do
3.   repeat
4.     Rule = Learn_One_Rule(D, Att_vals, c)
5.     remove tuples covered by Rule from D
6.     Rule_set = Rule_set + Rule // add new rule to rule set
7.   until terminating condition
8. endfor
9. return Rule_set
Figure 8.10 Basic sequential covering algorithm.
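A minimal Python sketch of this loop. Learn_One_Rule is left abstract, standing in for the greedy rule-growing procedure; the rule interface (a .covers(tuple) predicate) and the min_coverage terminating condition are illustrative assumptions, not the textbook's code.

def sequential_covering(D, att_vals, classes, learn_one_rule, min_coverage=1):
    rule_set = []                                   # 1. initial rule set is empty
    for c in classes:                               # 2. for each class c
        remaining = list(D)
        while remaining:                            # 3. repeat
            rule = learn_one_rule(remaining, att_vals, c)          # 4.
            covered = [t for t in remaining if rule.covers(t)]
            if len(covered) < min_coverage:         # 7. terminating condition (assumed)
                break
            remaining = [t for t in remaining if not rule.covers(t)]   # 5.
            rule_set.append(rule)                   # 6. add new rule to rule set
    return rule_set                                 # 9.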
Rule Quality Measures/Metrics
accuracy(R1) = 95%
accuracy(R2) = 100%
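The surviving fragment compares two rules by accuracy alone. A minimal sketch of coverage and accuracy as defined in the textbook, where n_covers is the number of tuples covered by R and n_correct the number of covered tuples R classifies correctly; the counts used below are hypothetical, chosen only to show that a 100%-accurate rule covering almost nothing can be worse than a slightly less accurate rule with broad coverage:

def coverage(n_covers, n_total):
    """coverage(R) = n_covers / |D|"""
    return n_covers / n_total

def accuracy(n_correct, n_covers):
    """accuracy(R) = n_correct / n_covers"""
    return n_correct / n_covers

print(accuracy(38, 40), coverage(40, 100))   # R1-like rule: 0.95 accuracy, broad coverage
print(accuracy(2, 2), coverage(2, 100))      # R2-like rule: 1.00 accuracy, covers only 2 tuples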
Rule Quality Measures: FOIL_Gain
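For reference, FOIL_Gain as given in the textbook rewards rules that keep many positive tuples while becoming more accurate; pos and neg are the positive and negative tuples covered by the current rule R, and pos' and neg' those covered by the candidate extension R'. The example counts are hypothetical.

from math import log2

def foil_gain(pos, neg, pos_new, neg_new):
    """FOIL_Gain = pos' * ( log2(pos'/(pos'+neg')) - log2(pos/(pos+neg)) )"""
    return pos_new * (log2(pos_new / (pos_new + neg_new)) - log2(pos / (pos + neg)))

# Hypothetical counts: adding a condition keeps 20 of 30 positives and drops most negatives.
print(round(foil_gain(30, 20, 20, 2), 3))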
Rule Quality Measures: Likelihood Ratio
Rule Quality Measures: Laplace
Contents
1. Decision Tree
2. Naïve Bayesian Classification
3. Rule-Based Classification
4. Evaluate Classifier Performance
5. K-Nearest Neighbors Classification
4. Evaluate Classifier Performance
Summary of Classifier Evaluation Measures
4. Evaluate Classifier Performance
sensitivity = TP / P (8.23)
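A minimal sketch computing the standard confusion-matrix measures from TP, FN, FP, TN counts; the example counts are hypothetical:

def evaluation_measures(TP, FN, FP, TN):
    P, N = TP + FN, FP + TN                  # actual positives and negatives
    return {
        "accuracy":    (TP + TN) / (P + N),
        "error_rate":  (FP + FN) / (P + N),
        "sensitivity": TP / P,               # recall / true positive rate (Eq. 8.23)
        "specificity": TN / N,
        "precision":   TP / (TP + FP),
    }

m = evaluation_measures(TP=90, FN=10, FP=30, TN=70)   # hypothetical counts
m["F1"] = 2 * m["precision"] * m["sensitivity"] / (m["precision"] + m["sensitivity"])
print(m)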
Example 1
Example 2
Contents
1. Decision Tree
2. Naïve Bayesian Classification
3. Rule-Based Classification
4. Evaluate Classifier Performance
5. K-Nearest Neighbors Classification
5. K-Nearest Neighbors Classification
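A minimal k-NN sketch in the spirit of this section, assuming numeric attributes, Euclidean distance, and majority voting among the k nearest training tuples; the toy data values are hypothetical (in practice scikit-learn's KNeighborsClassifier would typically be used):

from collections import Counter
from math import dist                     # Euclidean distance (Python 3.8+)

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training tuples."""
    neighbors = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in neighbors).most_common(1)[0][0]

# Toy two-attribute example with hypothetical values:
train_X = [(1.0, 1.1), (1.2, 0.9), (3.0, 3.2), (3.1, 2.9)]
train_y = ["A", "A", "B", "B"]
print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))   # -> 'A'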
Data Normalization (or Feature Scaling)
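A minimal sketch of the two normalizations usually paired with distance-based classifiers such as k-NN: min-max normalization and z-score standardization. The attribute values below are hypothetical.

def min_max(values, new_min=0.0, new_max=1.0):
    """v' = (v - min) / (max - min) * (new_max - new_min) + new_min"""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """v' = (v - mean) / standard_deviation"""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

incomes = [12_000, 54_000, 73_600, 98_000]   # hypothetical attribute values
print(min_max(incomes))                      # rescaled into [0, 1]
print(z_score(incomes))                      # centered at 0 with unit variance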
5. K-Nearest Neighbors Classification
Exercises
7. You are given a training dataset D shown in the table below for a binary classification problem, where MP = Magazine Promotion, WP = Watch Promotion, LIP = Life Insurance Promotion, CCI = Credit Card Insurance.
The class-labeled training dataset D for credit card customers
References
Extra Slides
Visualize a Decision Tree – Iris Data Set
• Iris data set contains 150 tuples (50 setosa tuples (class value = 0), 50 versicolor tuples (class value = 1), 50 virginica tuples (class value = 2))
Visualize a Decision Tree
# Load data
from sklearn.datasets import load_iris
iris = load_iris()
# extract petal length and width (the last two columns)
X = iris.data[:, 2:]
y = iris.target
Visualize a Decision Tree
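A plausible reconstruction of the training step assumed by the export_graphviz code below; max_depth = 2 and random_state = 42 are assumptions, not necessarily the settings used on the original slide.

# Train a shallow decision tree on the two petal features
from sklearn.tree import DecisionTreeClassifier

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)   # assumed settings
tree_clf.fit(X, y)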
Visualize a Decision Tree
# Visualize: write the fitted tree to a Graphviz .dot file
from sklearn.tree import export_graphviz
export_graphviz(
    tree_clf,
    out_file = "iris_tree.dot",
    feature_names = iris.feature_names[2:],  # use petal length and width only
    class_names = iris.target_names,
    rounded = True,
    filled = True)
Extra Slides
3-fold cross-validation
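A minimal scikit-learn sketch of 3-fold cross-validation, reusing the iris decision tree from the earlier extra slides (the max_depth = 2 setting is an assumption):

# 3-fold cross-validation: each tuple is used for testing exactly once
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
scores = cross_val_score(DecisionTreeClassifier(max_depth=2),
                         iris.data, iris.target, cv=3)   # one accuracy per fold
print(scores, scores.mean())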
Extra Slides
3. Random sampling
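A minimal sketch of random subsampling (repeated holdout), assuming scikit-learn's train_test_split and the same iris classifier; the 5 repetitions and 1/3 test fraction are arbitrary choices for illustration:

# Random subsampling: repeat the holdout split several times and average accuracy
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
accuracies = []
for seed in range(5):                     # 5 random splits (arbitrary)
    X_tr, X_te, y_tr, y_te = train_test_split(iris.data, iris.target,
                                              test_size=1 / 3, random_state=seed)
    clf = DecisionTreeClassifier(max_depth=2).fit(X_tr, y_tr)
    accuracies.append(clf.score(X_te, y_te))
print(sum(accuracies) / len(accuracies))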
Extra Slides
4. Bootstrapping
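A small sketch of drawing a bootstrap sample (|D| tuples sampled from D with replacement); it also shows empirically why roughly 63.2% of the original tuples appear in each sample, which is the basis of the .632 bootstrap described in the textbook:

# Bootstrap sample: draw |D| tuples from D uniformly with replacement
import random

D = list(range(1000))                                  # stand-in for 1000 training tuples
sample = [random.choice(D) for _ in range(len(D))]     # bootstrap training set
out_of_bag = set(D) - set(sample)                      # unused tuples form the test set
print(len(set(sample)) / len(D))                       # ≈ 0.632 on average
print(len(out_of_bag) / len(D))                        # ≈ 0.368 on average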
Extra Slides - Weka