
Class Adv Classification II

The document discusses classification techniques, focusing on K-nearest neighbors (K-NN) and rule-based classifiers. K-NN requires stored records, a distance metric, and a value for k to classify unknown records based on the majority vote of nearest neighbors, while rule-based classifiers utilize 'if...then...' rules for classification. It also addresses challenges such as scaling issues, the curse of dimensionality, and methods for building and simplifying classification rules.

Uploaded by

fakertoolzz


Classification

• K-NN Classifier
• Rule Based Classifier
Nearest-Neighbor Classifiers
• Requires three things:
– The set of stored records
– A distance metric to compute the distance between records
– The value of k, the number of nearest neighbors to retrieve
• To classify an unknown record:
– Compute its distance to the other training records
– Identify the k nearest neighbors
– Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
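A minimal Python sketch of the three ingredients and the classification steps above, assuming numeric records stored as (point, label) pairs, Euclidean distance, and plain majority voting. The names `euclidean` and `knn_classify` and the toy training set are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(train, x, k):
    """Classify x by majority vote among its k nearest stored records.

    train is a list of (point, label) pairs (the stored records).
    """
    # Compute the distance from x to every training record; keep the k closest.
    neighbors = sorted(train, key=lambda rec: euclidean(rec[0], x))[:k]
    # Majority vote over the neighbors' labels. Counter.most_common breaks ties
    # in favor of the label encountered first in the distance-sorted list.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: three "A" points near (1, 1), two "B" points near (5, 5).
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B")]

print(knn_classify(train, (1.1, 1.0), k=3))
print(knn_classify(train, (4.8, 5.1), k=1))
print(knn_classify(train, (4.8, 5.1), k=5))
```

The first two queries return "A" and "B" as expected. With k = 5 the query next to the B points is outvoted by the three A points, which illustrates the "k too large" problem discussed under choosing k.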
Definition of Nearest Neighbor

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a record x]

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
Nearest Neighbor Classification…
• Choosing the value of k:
– If k is too small, the classifier is sensitive to noise points
– If k is too large, the neighborhood may include points from other classes
Nearest Neighbor Classification
• Compute the distance between two points:
– Euclidean distance

d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}

• Determine the class from the nearest-neighbor list:
– Take the majority vote of class labels among the k nearest neighbors
– Or weigh each vote according to distance
• Weight factor: w = 1/d^2
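The two voting options can be compared in a short sketch, assuming the weight factor w = 1/d^2 from the slide. Treating a zero distance (an exact match) as decisive is our own assumption, since 1/d^2 is undefined there. The function name `knn_vote` and the 1-D training points are hypothetical.

```python
import math
from collections import defaultdict

def knn_vote(train, x, k, weighted=False):
    """Classify x from its k nearest (point, label) records.

    weighted=False: one vote per neighbor (majority vote).
    weighted=True:  each neighbor votes with weight w = 1 / d**2.
    """
    neighbors = sorted(train, key=lambda rec: math.dist(rec[0], x))[:k]
    scores = defaultdict(float)
    for point, label in neighbors:
        if not weighted:
            scores[label] += 1.0
            continue
        d = math.dist(point, x)
        if d == 0.0:
            return label  # assumption: an exact match decides the class outright
        scores[label] += 1.0 / d ** 2
    # Highest total score wins; ties go to the label first seen in distance order.
    return max(scores, key=scores.get)

# Hypothetical 1-D data: one "+" point close to the query, two "-" points farther away.
train = [((1.0,), "+"), ((3.0,), "-"), ((3.5,), "-")]
print(knn_vote(train, (0.0,), k=3))                 # majority vote
print(knn_vote(train, (0.0,), k=3, weighted=True))  # distance-weighted vote
```

Here the majority vote returns "-" (two of the three neighbors), while the weighted vote returns "+": the single "+" neighbor at distance 1 carries weight 1, against 1/9 + 1/12.25 ≈ 0.19 for the two "-" neighbors.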
Distance-Weighted Nearest Neighbor Algorithm
• Assign weights to the neighbors based on their distance from the query point
– The weight may, for example, be the inverse square of the distance
• With distance weighting, all training points may be allowed to influence the classification of a particular instance
(a) Classify the data point x = 5.0 according to its 1-, 3-, 5-, and 9-nearest neighbors (using majority vote).
(b) Repeat the previous analysis using the distance-weighted voting approach.
Nearest Neighbor Classification…

• Scaling issues
– Attributes may have to be scaled to prevent distance
measures from being dominated by one of the attributes
– Example:
• height of a person may vary from 1.5 m to 1.8 m
• weight of a person may vary from 90 lb to 300 lb
• income of a person may vary from $10K to $1M
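Min-max scaling is one common way to rescale attributes; the slide does not prescribe a method, so this choice is an assumption. The sketch maps each attribute to [0, 1]. The three people are hypothetical values chosen inside the ranges listed above, and `min_max_scale` is an illustrative name.

```python
import math

def min_max_scale(rows):
    """Rescale every attribute (column) of rows to the range [0, 1].

    A constant column is mapped to 0.0 to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

# Hypothetical people: (height in m, weight in lb, income in $)
people = [(1.5, 90, 10_000), (1.8, 300, 1_000_000), (1.6, 150, 50_000)]
scaled = min_max_scale(people)

# Before scaling, the distance is dominated almost entirely by income.
print(math.dist(people[0], people[2]))
# After scaling, all three attributes contribute on a comparable scale.
print(math.dist(scaled[0], scaled[2]))
```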
Nearest Neighbor Classification…

• Problem with the Euclidean measure:

– High-dimensional data
• curse of dimensionality
– Can produce counter-intuitive results:

111111111110 vs 011111111111: d = 1.4142
100000000000 vs 000000000001: d = 1.4142

The first pair shares ten 1s and the second pair shares none, yet both pairs are the same distance apart.

• Solution: normalize the vectors to unit length
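The bit-vector example can be checked numerically. This sketch assumes the vectors are nonzero (a zero vector cannot be normalized); `to_vector` and `unit_length` are hypothetical helper names.

```python
import math

def to_vector(bits):
    """Turn a string such as '1010' into a list of 0/1 integers."""
    return [int(b) for b in bits]

def unit_length(v):
    """Scale a nonzero vector so that its Euclidean norm is 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

pairs = [("111111111110", "011111111111"), ("100000000000", "000000000001")]
for s, t in pairs:
    p, q = to_vector(s), to_vector(t)
    raw = math.dist(p, q)
    normalized = math.dist(unit_length(p), unit_length(q))
    print(f"{s} vs {t}: raw d = {raw:.4f}, normalized d = {normalized:.4f}")
```

Both raw distances are √2 ≈ 1.4142. After normalization the first pair, whose vectors have norm √11, moves to √(2/11) ≈ 0.4264, while the second pair stays at √2, so the mostly overlapping vectors are now correctly judged to be closer.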
Nearest Neighbor Classification…

• k-NN classifiers are lazy learners
– They do not build models explicitly
– This is unlike eager learners such as decision tree induction and rule-based systems
• Classifying unknown records is relatively expensive
– Naïve algorithm: O(n) distance computations per query, for n training records
– Data structures are needed to retrieve nearest neighbors quickly
• This is the Nearest Neighbor Search problem.
Rule-Based Classifier

• Classify records by using a collection of “if…then…” rules

• Rule: (Condition) → y
– where
• Condition is a conjunction of tests on attribute values
• y is the class label
– LHS: rule antecedent or condition
– RHS: rule consequent
– Examples of classification rules:
• (Blood Type = Warm) ∧ (Lay Eggs = Yes) → Birds
• (Taxable Income < 50K) ∧ (Refund = Yes) → Evade = No
Rule-based Classifier (Example)
Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
human          warm        yes         no       no             mammals
python         cold        no          no       no             reptiles
salmon         cold        no          no       yes            fishes
whale          warm        yes         no       yes            mammals
frog           cold        no          no       sometimes      amphibians
komodo         cold        no          no       no             reptiles
bat            warm        yes         yes      no             mammals
pigeon         warm        no          yes      no             birds
cat            warm        yes         no       no             mammals
leopard shark  cold        yes         no       yes            fishes
turtle         cold        no          no       sometimes      reptiles
penguin        warm        no          no       sometimes      birds
porcupine      warm        yes         no       no             mammals
eel            cold        no          no       yes            fishes
salamander     cold        no          no       sometimes      amphibians
gila monster   cold        no          no       no             reptiles
platypus       warm        no          no       no             mammals
owl            warm        no          yes      no             birds
dolphin        warm        yes         no       yes            mammals
eagle          warm        no          yes      no             birds

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds

• A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds

Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
hawk           warm        no          yes      no             ?
grizzly bear   warm        yes         no       no             ?

The rule R1 covers the hawk => Bird

The rule R3, (Give Birth = yes) ∧ (Blood Type = warm) → Mammals, covers the grizzly bear => Mammal
How Does a Rule-Based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
lemur          warm        yes         no       no             ?
turtle         cold        no          no       sometimes      ?
dogfish shark  cold        yes         no       yes            ?

A lemur triggers rule R3, so it is classified as a mammal

A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules
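The five rules can be encoded as attribute-value tests to check which rules a record triggers. This is a sketch assuming categorical attributes stored in a dict; `RULES`, `covers`, `triggered_rules` and `make_record` are hypothetical names.

```python
# Each rule: (name, antecedent as {attribute: required value}, class label)
RULES = [
    ("R1", {"Give Birth": "no", "Can Fly": "yes"}, "Birds"),
    ("R2", {"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),
    ("R3", {"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),
    ("R4", {"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),
    ("R5", {"Live in Water": "sometimes"}, "Amphibians"),
]

def covers(antecedent, record):
    """A rule covers a record if every attribute test in its antecedent holds."""
    return all(record.get(attr) == value for attr, value in antecedent.items())

def triggered_rules(record):
    """Names of all rules whose antecedent the record satisfies."""
    return [name for name, antecedent, _label in RULES if covers(antecedent, record)]

def make_record(blood, birth, fly, water):
    return {"Blood Type": blood, "Give Birth": birth, "Can Fly": fly, "Live in Water": water}

lemur = make_record("warm", "yes", "no", "no")
turtle = make_record("cold", "no", "no", "sometimes")
dogfish_shark = make_record("cold", "yes", "no", "yes")
for name, rec in [("lemur", lemur), ("turtle", turtle), ("dogfish shark", dogfish_shark)]:
    print(name, triggered_rules(rec))
```

The turtle result (two rules) shows that R1 to R5 are not mutually exclusive, and the dogfish shark result (no rules) shows that they are not exhaustive.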
Rule Coverage and Accuracy
• Coverage of a rule:
– Fraction of records that satisfy the antecedent of the rule
• Accuracy of a rule:
– Fraction of the records satisfying the antecedent that also satisfy the consequent of the rule

Tid  Refund  Marital Status  Taxable Income  Class
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

(Status = Single) → No
Coverage = 40%, Accuracy = 50%

Four of the ten records are Single (Tids 1, 3, 8, 10), so coverage = 4/10 = 40%; two of those four have Class = No, so accuracy = 2/4 = 50%.
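The coverage and accuracy definitions translate directly into a few lines of Python. This sketch stores the table as tuples and passes the antecedent as a predicate; `coverage_and_accuracy` is a hypothetical name, and returning 0.0 accuracy for a rule that covers nothing is an assumption (the quantity is undefined there).

```python
# Training records from the table above:
# (Tid, Refund, Marital Status, Taxable Income in thousands, Class)
RECORDS = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]

def coverage_and_accuracy(records, antecedent, consequent):
    """Return (coverage, accuracy) of the rule antecedent -> consequent.

    coverage = |records satisfying the antecedent| / |records|
    accuracy = |covered records whose class equals the consequent| / |covered records|
    """
    covered = [r for r in records if antecedent(r)]
    correct = [r for r in covered if r[-1] == consequent]
    coverage = len(covered) / len(records)
    accuracy = len(correct) / len(covered) if covered else 0.0  # assumption for empty coverage
    return coverage, accuracy

cov, acc = coverage_and_accuracy(RECORDS, lambda r: r[2] == "Single", "No")
print(f"Coverage = {cov:.0%}, Accuracy = {acc:.0%}")
```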
Characteristics of Rule-Based Classifier
• Mutually exclusive rules
– A classifier contains mutually exclusive rules if the rules are independent of each other
– Every record is covered by at most one rule

• Exhaustive rules
– A classifier has exhaustive coverage if it accounts for every possible combination of attribute values
– Each record is covered by at least one rule
From Decision Trees To Rules

[Figure: decision tree. Refund = Yes → NO; Refund = No → Marital Status. Marital Status = {Married} → NO; Marital Status = {Single, Divorced} → Taxable Income. Taxable Income < 80K → NO; Taxable Income > 80K → YES]

Classification Rules
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes
(Refund=No, Marital Status={Married}) ==> No

Rules are mutually exclusive and exhaustive
The rule set contains as much information as the tree
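Extracting rules from a tree amounts to emitting one rule per root-to-leaf path. The sketch below assumes a hypothetical nested-tuple encoding of the tree in the figure (a leaf is a class label; an internal node is an attribute plus a dict of branch conditions); `tree_to_rules` and `format_rule` are illustrative names, not part of any library.

```python
# Hypothetical encoding of the decision tree in the figure.
# Leaf: class label (str). Internal node: (attribute, {branch condition: subtree}).
TREE = ("Refund", {
    "Yes": "No",
    "No": ("Marital Status", {
        "{Single,Divorced}": ("Taxable Income", {"<80K": "No", ">80K": "Yes"}),
        "{Married}": "No",
    }),
})

def tree_to_rules(node, path=()):
    """Return one (antecedent, label) pair per root-to-leaf path."""
    if isinstance(node, str):  # leaf: the path so far is the rule antecedent
        return [(path, node)]
    attribute, branches = node
    rules = []
    for condition, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, path + ((attribute, condition),)))
    return rules

def format_rule(antecedent, label):
    """Render a rule in the slide's '(A=v, ...) ==> label' notation."""
    tests = ", ".join(
        f"{attr}{cond}" if cond[0] in "<>" else f"{attr}={cond}"
        for attr, cond in antecedent
    )
    return f"({tests}) ==> {label}"

for antecedent, label in tree_to_rules(TREE):
    print(format_rule(antecedent, label))
```

Because every record follows exactly one path through the tree, the extracted rules are mutually exclusive and exhaustive, matching the claim on the slide.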
Rules Can Be Simplified

[Figure: decision tree splitting on Refund (Yes → NO), then Marital Status ({Married} → NO), then Taxable Income (< 80K → NO, > 80K → YES)]

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Initial Rule: (Refund = No) ∧ (Status = Married) → No

Simplified Rule: (Status = Married) → No
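One way to see why the simplification is acceptable is to compare the coverage and accuracy of the two rules on the table. This is a sketch under the same assumptions as a plain tuple encoding of the records; `coverage_and_accuracy` is a hypothetical helper.

```python
# Training records from the table above: (Tid, Refund, Marital Status, Taxable Income in K, Cheat)
RECORDS = [
    (1, "Yes", "Single", 125, "No"), (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"), (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"), (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"), (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"), (10, "No", "Single", 90, "Yes"),
]

def coverage_and_accuracy(records, antecedent, consequent):
    """(fraction of records covered, fraction of covered records with the predicted class)"""
    covered = [r for r in records if antecedent(r)]
    correct = [r for r in covered if r[-1] == consequent]
    accuracy = len(correct) / len(covered) if covered else 0.0  # assumption for empty coverage
    return len(covered) / len(records), accuracy

initial = coverage_and_accuracy(RECORDS, lambda r: r[1] == "No" and r[2] == "Married", "No")
simplified = coverage_and_accuracy(RECORDS, lambda r: r[2] == "Married", "No")
print("initial rule (coverage, accuracy):", initial)
print("simplified rule (coverage, accuracy):", simplified)
```

Dropping the Refund = No test raises coverage from 3/10 to 4/10 (record 4 is now covered too), while accuracy stays at 100% because all four married taxpayers have Cheat = No.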
Effect of Rule Simplification
• Rules are no longer mutually exclusive
– A record may trigger more than one rule
– Solution?
• Ordered rule set
• Unordered rule set – use voting schemes

• Rules are no longer exhaustive

– A record may not trigger any rules
– Solution?
• Use a default class

Ordered Rule Set
• Rules are rank ordered according to their priority
– An ordered rule set is known as a decision list
• When a test record is presented to the classifier
– It is assigned to the class label of the highest ranked rule it has triggered
– If none of the rules fired, it is assigned to the default class

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
turtle         cold        no          no       sometimes      ?
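A decision list can be sketched as a loop that returns the label of the first (highest-ranked) rule that fires. The rule encoding, the name `classify`, and the default label "Unknown" are hypothetical; in practice the default class is chosen by the rule learner.

```python
# Ordered rule set (decision list): rules are tried in rank order R1, R2, ...
DECISION_LIST = [
    ("R1", {"Give Birth": "no", "Can Fly": "yes"}, "Birds"),
    ("R2", {"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),
    ("R3", {"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),
    ("R4", {"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),
    ("R5", {"Live in Water": "sometimes"}, "Amphibians"),
]

def classify(record, rules, default_class):
    """Return the class of the highest-ranked rule the record triggers,
    or default_class if no rule fires."""
    for _name, antecedent, label in rules:
        if all(record.get(attr) == value for attr, value in antecedent.items()):
            return label
    return default_class

turtle = {"Blood Type": "cold", "Give Birth": "no", "Can Fly": "no", "Live in Water": "sometimes"}
dogfish_shark = {"Blood Type": "cold", "Give Birth": "yes", "Can Fly": "no", "Live in Water": "yes"}
print(classify(turtle, DECISION_LIST, default_class="Unknown"))
print(classify(dogfish_shark, DECISION_LIST, default_class="Unknown"))
```

The turtle triggers both R4 and R5, but R4 is ranked higher, so it is labeled Reptiles; the dogfish shark triggers no rule and falls back to the default class.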
Rule Ordering Schemes
• Rule-based ordering
– Individual rules are ranked based on their quality
• Class-based ordering
– Rules that belong to the same class appear together

Rule-based Ordering
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes
(Refund=No, Marital Status={Married}) ==> No

Class-based Ordering
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
(Refund=No, Marital Status={Married}) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes
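Class-based ordering keeps rules of the same class together. One simple way to obtain it from a rule list is a stable sort on the class label, which preserves the original order within each class. The class priority ("No" before "Yes") is an assumption chosen so that the result reproduces the class-based column above; `class_based_ordering` is a hypothetical name.

```python
# Rules in rule-based order: (antecedent text, class label)
RULE_BASED = [
    ("(Refund=Yes)", "No"),
    ("(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K)", "No"),
    ("(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K)", "Yes"),
    ("(Refund=No, Marital Status={Married})", "No"),
]

def class_based_ordering(rules, class_priority):
    """Group rules by class in the given class order.

    Python's sort is stable, so rules of the same class keep their relative order.
    """
    rank = {label: i for i, label in enumerate(class_priority)}
    return sorted(rules, key=lambda rule: rank[rule[1]])

for antecedent, label in class_based_ordering(RULE_BASED, class_priority=["No", "Yes"]):
    print(f"{antecedent} ==> {label}")
```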
Building Classification Rules

• Direct Method:
• Extract rules directly from data
• e.g.: RIPPER, CN2, Holte’s 1R

• Indirect Method:
• Extract rules from other classification models (e.g., decision trees, neural networks)
• e.g.: C4.5rules
Direct Method: Sequential Covering

1. Start from an empty rule

2. Grow a rule using the Learn-One-Rule function
3. Remove training records covered by the rule
4. Repeat Steps 2 and 3 until a stopping criterion is met
Example of Sequential Covering

[Figure: (i) Original Data; (ii) Step 1]

Example of Sequential Covering…

[Figure: (iii) Step 2 and (iv) Step 3, each showing rule R1]
When to Stop Building a Rule
• When the rule is perfect, i.e., accuracy = 1
• When the increase in accuracy falls below a given threshold
• When the training set cannot be split any further
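Sequential covering and the three stopping conditions can be combined in one sketch. The slides do not say how Learn-One-Rule searches, so the greedy general-to-specific search below (add the attribute test with the highest accuracy, preferring larger coverage on ties) is an assumption, as are the names `learn_one_rule`, `sequential_covering` and `min_gain`. It runs on the vertebrate table from the rule-based classifier example, learning rules for the class "mammals".

```python
# Vertebrate training data: (Name, Blood Type, Give Birth, Can Fly, Live in Water, Class)
ROWS = [
    ("human", "warm", "yes", "no", "no", "mammals"),
    ("python", "cold", "no", "no", "no", "reptiles"),
    ("salmon", "cold", "no", "no", "yes", "fishes"),
    ("whale", "warm", "yes", "no", "yes", "mammals"),
    ("frog", "cold", "no", "no", "sometimes", "amphibians"),
    ("komodo", "cold", "no", "no", "no", "reptiles"),
    ("bat", "warm", "yes", "yes", "no", "mammals"),
    ("pigeon", "warm", "no", "yes", "no", "birds"),
    ("cat", "warm", "yes", "no", "no", "mammals"),
    ("leopard shark", "cold", "yes", "no", "yes", "fishes"),
    ("turtle", "cold", "no", "no", "sometimes", "reptiles"),
    ("penguin", "warm", "no", "no", "sometimes", "birds"),
    ("porcupine", "warm", "yes", "no", "no", "mammals"),
    ("eel", "cold", "no", "no", "yes", "fishes"),
    ("salamander", "cold", "no", "no", "sometimes", "amphibians"),
    ("gila monster", "cold", "no", "no", "no", "reptiles"),
    ("platypus", "warm", "no", "no", "no", "mammals"),
    ("owl", "warm", "no", "yes", "no", "birds"),
    ("dolphin", "warm", "yes", "no", "yes", "mammals"),
    ("eagle", "warm", "no", "yes", "no", "birds"),
]
ATTRS = ["Blood Type", "Give Birth", "Can Fly", "Live in Water"]
RECORDS = [dict(zip(["Name", *ATTRS, "Class"], row)) for row in ROWS]

def accuracy(records, target):
    """Fraction of (nonempty) records whose class is the target class."""
    return sum(r["Class"] == target for r in records) / len(records)

def learn_one_rule(records, target, min_gain=0.0):
    """Greedy general-to-specific search starting from an empty rule."""
    rule, covered = {}, records
    while accuracy(covered, target) < 1.0:  # stop: rule is perfect
        best = None
        for attr in ATTRS:
            if attr in rule:
                continue
            for value in sorted({r[attr] for r in covered}):
                subset = [r for r in covered if r[attr] == value]
                score = (accuracy(subset, target), len(subset))  # ties -> larger coverage
                if best is None or score > best[0]:
                    best = (score, attr, value, subset)
        if best is None:  # stop: nothing left to split on
            break
        if best[0][0] - accuracy(covered, target) <= min_gain:  # stop: gain below threshold
            break
        rule[best[1]] = best[2]
        covered = best[3]
    return rule, covered

def sequential_covering(records, target):
    """Grow a rule, remove the records it covers, and repeat."""
    rules, remaining = [], list(records)
    while any(r["Class"] == target for r in remaining):
        rule, covered = learn_one_rule(remaining, target)  # step 2: grow a rule
        if not rule:  # no useful rule could be grown
            break
        rules.append(rule)
        remaining = [r for r in remaining if r not in covered]  # step 3: remove covered
    return rules

for rule in sequential_covering(RECORDS, "mammals"):
    print(" AND ".join(f"({a} = {v})" for a, v in rule.items()), "-> Mammals")
```

On this data the first rule learned is (Give Birth = yes) ∧ (Blood Type = warm) → Mammals, the same as R3. After the six mammals it covers are removed, the second rule singles out the platypus, the only mammal in the table that does not give birth.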
