Classification (K-Nearest Neighbor)

The document provides an overview of the Nearest Neighbor Classifier, detailing its purpose in classification tasks where unseen records are assigned class labels based on training data. It outlines the steps for implementing the k-nearest neighbors (k-NN) algorithm, including calculating distances, selecting neighbors, and determining class labels through majority voting. Additionally, it discusses considerations such as choosing the value of k, scaling attributes, and alternative distance measures.


NEAREST NEIGHBOR CLASSIFIER
CLASSIFICATION: DEFINITION
Given a collection of records (training set)
 Each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
 A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
ILLUSTRATING CLASSIFICATION TASK
Training Set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

(Figure: the training set is fed to a learning algorithm, which learns a model by induction; the model is then applied to the test set by deduction to predict the unknown class labels.)
EXAMPLES OF CLASSIFICATION TASK
Predicting tumor cells as benign or malignant
Classifying credit card transactions as legitimate or fraudulent
Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
Categorizing news stories as finance, weather, entertainment, sports, etc.
NEAREST NEIGHBOR CLASSIFIER
Basic idea:
 If it walks like a duck and quacks like a duck, then it's probably a duck.

(Figure: to classify a test record, compute its distance to the training records, then choose the k "nearest" training records.)
NEAREST-NEIGHBOR CLASSIFIER
Requires three things
 – The set of stored records
 – A distance metric to compute the distance between records
 – The value of k, the number of nearest neighbors to retrieve

To classify an unknown record:
 – Compute its distance to the other training records
 – Identify the k nearest neighbors
 – Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
DEFINITION OF NEAREST NEIGHBOR

(Figure: a test record x and its (a) 1-nearest neighbor, (b) 2-nearest neighbors, (c) 3-nearest neighbors.)

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
NEAREST NEIGHBOR CLASSIFICATION
Compute the distance between two points:
 Euclidean distance

    d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}

Determine the class from the nearest-neighbor list
 Take the majority vote of class labels among the k nearest neighbors
 Weight the vote according to distance
  weight factor, w = 1/d^2
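
As a concrete illustration of the distance-weighted vote, here is a minimal Python sketch (not code from the slides; the function name weighted_knn_predict and the small epsilon used to avoid division by zero are my own choices):

```python
import math
from collections import defaultdict

def weighted_knn_predict(training_points, training_labels, x, k=3):
    """Predict the class of x by a distance-weighted vote over its k nearest neighbors."""
    # compute the Euclidean distance from x to every training point
    distances = sorted(
        (math.dist(p, x), label) for p, label in zip(training_points, training_labels)
    )
    votes = defaultdict(float)
    for d, label in distances[:k]:
        votes[label] += 1.0 / (d ** 2 + 1e-12)   # weight factor w = 1/d^2
    return max(votes, key=votes.get)
```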
K-NEAREST NEIGHBOR (K-NN) ALGORITHM

For every point in the dataset:
    calculate the distance between X and the current point
Sort the distances in increasing order
Take the k items with the lowest distances to X
Find the majority class among these k items
Return the majority class as the prediction for the class of X
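
The pseudocode above maps almost line for line onto Python. A minimal sketch, assuming the training data is a NumPy array of feature rows with a parallel list of class labels (the name classify and the NumPy representation are assumptions, not part of the slides):

```python
import numpy as np
from collections import Counter

def classify(X, dataset, labels, k):
    """Predict the class of point X by majority vote among its k nearest neighbors."""
    # calculate the distance between X and every point in the dataset
    distances = np.sqrt(((dataset - X) ** 2).sum(axis=1))
    # sort the distances in increasing order and take the k items with the lowest distances
    nearest_indices = distances.argsort()[:k]
    # find the majority class among these items and return it as the prediction
    nearest_labels = [labels[i] for i in nearest_indices]
    return Counter(nearest_labels).most_common(1)[0][0]
```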
HOW TO IMPLEMENT K-NN IN PYTHON
1. Handle Data: Open the dataset from CSV and split it into training and test datasets.
2. Similarity: Calculate the distance between two data instances.
3. Neighbors: Locate the k most similar data instances.
4. Response: Get the majority-voted response from a number of neighbors.
5. Accuracy: Summarize the accuracy of predictions.
6. Main: Tie it all together.

https://machinelearningmastery.com/tutorial-to-implement-k-nearest-neighbors-in-python-from-scratch/
HOW TO IMPLEMENT K-NN IN PYTHON
1. Handle Data: Open the dataset from CSV and split it into training and test datasets.
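
The slide refers to code from the linked tutorial; as a stand-in, here is a minimal sketch of loading a CSV file and randomly splitting it into training and test sets (the function name load_dataset and the assumption that the class label is in the last column are mine):

```python
import csv
import random

def load_dataset(filename, split_ratio=0.67):
    """Read a CSV file and randomly split its rows into training and test sets."""
    training_set, test_set = [], []
    with open(filename) as f:
        for row in csv.reader(f):
            if not row:
                continue
            # convert the numeric attributes to float; keep the class label (last column) as a string
            instance = [float(value) for value in row[:-1]] + [row[-1]]
            if random.random() < split_ratio:
                training_set.append(instance)
            else:
                test_set.append(instance)
    return training_set, test_set
```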
HOW TO IMPLEMENT K-NN IN PYTHON
2. Similarity: Calculate the distance between two data instances.
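
A minimal sketch of the distance step (the name euclidean_distance and the length parameter, which lets the trailing class label be ignored, are assumptions):

```python
import math

def euclidean_distance(instance1, instance2, length):
    """Euclidean distance over the first `length` (numeric) attributes of two instances."""
    return math.sqrt(sum((instance1[i] - instance2[i]) ** 2 for i in range(length)))
```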
HOW TO IMPLEMENT K-NN IN PYTHON
3. Neighbors: Locate the k most similar data instances.
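
A minimal sketch of collecting the k nearest training instances, reusing the euclidean_distance helper sketched above (the name get_neighbors is an assumption):

```python
def get_neighbors(training_set, test_instance, k):
    """Return the k training instances closest to test_instance."""
    length = len(test_instance) - 1   # ignore the class label in the last column
    distances = [
        (train_instance, euclidean_distance(test_instance, train_instance, length))
        for train_instance in training_set
    ]
    distances.sort(key=lambda pair: pair[1])
    return [pair[0] for pair in distances[:k]]
```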
HOW TO IMPLEMENT K-NN IN PYTHON
4. Response: Get the majority-voted response from a number of neighbors.
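
A minimal sketch of the majority-vote step (the name get_response is an assumption):

```python
from collections import Counter

def get_response(neighbors):
    """Return the class label that receives the most votes among the neighbors."""
    votes = Counter(neighbor[-1] for neighbor in neighbors)   # class label is the last column
    return votes.most_common(1)[0][0]
```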
HOW TO IMPLEMENT K-NN IN PYTHON
5. Accuracy: Summarize the accuracy of predictions.
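
A minimal sketch of the accuracy step (the name get_accuracy is an assumption):

```python
def get_accuracy(test_set, predictions):
    """Percentage of test instances whose predicted label matches the actual label."""
    correct = sum(
        1 for instance, predicted in zip(test_set, predictions) if instance[-1] == predicted
    )
    return correct / len(test_set) * 100.0
```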
HOW TO IMPLEMENT K-NN IN PYTHON

6. Main: Tie it all together.
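
A minimal sketch tying the previous sketches together; the file name 'iris.data', the split ratio, and k = 3 are assumptions, not values prescribed by the slides:

```python
def main():
    training_set, test_set = load_dataset('iris.data', split_ratio=0.67)   # assumed file name
    k = 3
    predictions = []
    for test_instance in test_set:
        neighbors = get_neighbors(training_set, test_instance, k)
        predictions.append(get_response(neighbors))
    print('Accuracy: {:.2f}%'.format(get_accuracy(test_set, predictions)))

if __name__ == '__main__':
    main()
```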
NEAREST NEIGHBOR CLASSIFICATION…
Choosing the value of k:
 If k is too small, the classifier is sensitive to noise points
 If k is too large, the neighborhood may include points from other classes
NEAREST NEIGHBOR CLASSIFICATION…
Scaling issues
Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes.
Example:
 the height of a person may vary from 1.5 m to 1.8 m
 the weight of a person may vary from 90 lb to 300 lb
 the income of a person may vary from $10K to $1M
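
One common way to put attributes on a comparable scale is min-max normalization to the range [0, 1]. A minimal sketch (the function name and the per-column min/max approach are my own; the slides do not prescribe a particular scaling method):

```python
def min_max_scale(dataset):
    """Rescale each column of a list-of-lists numeric dataset to the range [0, 1]."""
    n_cols = len(dataset[0])
    mins = [min(row[i] for row in dataset) for i in range(n_cols)]
    maxs = [max(row[i] for row in dataset) for i in range(n_cols)]
    return [
        [(row[i] - mins[i]) / (maxs[i] - mins[i]) if maxs[i] > mins[i] else 0.0
         for i in range(n_cols)]
        for row in dataset
    ]

# e.g. rows of [height (m), weight (lb), income ($)]
print(min_max_scale([[1.5, 90, 10_000], [1.65, 200, 500_000], [1.8, 300, 1_000_000]]))
```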
NEAREST NEIGHBOR CLASSIFICATION…

Problem with the Euclidean measure:
 High-dimensional data
  curse of dimensionality
 Can produce counter-intuitive results, e.g.:

    111111111110  vs  011111111111   d = 1.4142
    100000000000  vs  000000000001   d = 1.4142

◆ Solution: Normalize the vectors to unit length
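
A quick check of the example above: before normalization both pairs are the same Euclidean distance apart, but after normalizing to unit length the nearly identical pair comes out much closer than the pair that shares no 1s (a minimal NumPy sketch; the helper name is mine):

```python
import numpy as np

def unit_normalize(v):
    """Scale a vector to unit length (L2 norm 1)."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

a = np.array([1] * 11 + [0])    # 111111111110
b = np.array([0] + [1] * 11)    # 011111111111
c = np.array([1] + [0] * 11)    # 100000000000
d = np.array([0] * 11 + [1])    # 000000000001

print(np.linalg.norm(a - b), np.linalg.norm(c - d))            # both ~1.4142 before normalization
print(np.linalg.norm(unit_normalize(a) - unit_normalize(b)),   # ~0.43 after normalization
      np.linalg.norm(unit_normalize(c) - unit_normalize(d)))   # still ~1.4142
```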
OTHER DISTANCE MEASURES

Hamming Distance: the distance between binary vectors (the number of positions at which they differ)
Manhattan Distance: the distance between real vectors, computed as the sum of their absolute differences; also called City Block Distance
Minkowski Distance: a generalization of the Euclidean and Manhattan distances
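
Minimal sketches of these measures (the function names are mine, not from the slides):

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def manhattan_distance(p, q):
    """Sum of absolute differences between real-valued vectors (city block distance)."""
    return sum(abs(x - y) for x, y in zip(p, q))

def minkowski_distance(p, q, r=2):
    """Generalization of Manhattan (r = 1) and Euclidean (r = 2) distances."""
    return sum(abs(x - y) ** r for x, y in zip(p, q)) ** (1.0 / r)
```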
NEAREST NEIGHBOR CLASSIFICATION…

k-NN classifiers are lazy learners
 They do not build a model explicitly
 This is unlike eager learners such as decision tree induction and rule-based systems
 Classifying unknown records is relatively expensive
REFERENCES
Tan, Steinbach, Kumar, Introduction to Data Mining, Addison-Wesley, 2006.
https://machinelearningmastery.com/k-nearest-neighbors-for-machine-learning/
