Lecture7 KNN

K-Nearest Neighbor (KNN) is a supervised learning algorithm that can be used for both classification and regression tasks. It works by finding the K closest training examples in the feature space and predicting the label based on a majority vote of the labels of the K nearest neighbors. The document provides examples of using a KNN classifier to classify fruits into different categories based on their features like width, height and color score. It describes splitting the dataset into training and test sets, extracting features, finding the nearest neighbors and making predictions on new samples.


K-Nearest Neighbor (KNN) Classifier and Regressor

Liang Liang
Categories of Machine Learning
• Unsupervised Learning
  Clustering: k-means, GMM
  Dimensionality reduction (representation learning): PCA, isomap, etc., to learn a meaningful representation in a lower-dimensional space
  Probability density estimation: GMM, KDE
• Supervised Learning
  to model the relationship between measured features of data and some labels associated with the data
• Reinforcement Learning
  the goal is to develop a model (agent) that improves its performance based on interactions with the environment
Supervised Learning: classification and regression
• A classifier maps an input x (e.g., an image of a handwritten digit) to an output y: a class label.
• A regressor maps an input x (e.g., the feature vector of a house) to an output y: a target value (e.g., the sale price).

A machine learning model maps input x to output y.
Dataset: input-output pairs (x1, y1), (x2, y2), (x3, y3), ..., (xN, yN)
Binary Classification
• Data points are from two classes. A data point only belongs to one class.
• The classifier outputs y, the label of the data point x. For example:
  y = 0: male, y = 1: female
  or
  y = -1: male, y = 1: female
Multiclass Classification
• Data points are from many classes.
• A data point only belongs to one class.
• Example: handwritten digit classification with 10 possible labels: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9). The classifier takes the image of a digit and outputs its label, e.g., y = 0.
• The classifier is a function y = g(x), where y is the class label of the data point x.
Multiclass Classification: one-hot encoding
• Data points are from many classes. A data point only belongs to one class.
• one-hot encoding: the output from a classifier is a vector, length = # of classes. The element at the index of the class label is 1, and all other elements are 0. With 10 classes, y = (y0, y1, ..., y9):
  label = 0 → y = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
  label = 1 → y = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
  label = 2 → y = (0, 0, 1, 0, 0, 0, 0, 0, 0, 0)
  label = 9 → y = (0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
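For concreteness, a minimal numpy sketch of one-hot encoding (the example labels below are made up):

```python
import numpy as np

# One-hot encode integer class labels for a 10-class problem
labels = np.array([0, 1, 2, 9])   # example labels (made up)
one_hot = np.eye(10)[labels]      # each row is a length-10 one-hot vector
print(one_hot)
```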
Output of a classifier could be real numbers
• Example: for the 10 possible labels (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), the classifier may output the soft label (0.8, 0, 0, 0, 0, 0, 0, 0, 0.1, 0.1), which converts to the binary hard label (1, 0, 0, 0, 0, 0, 0, 0, 0, 0), i.e., class label = 0.
• "0.8" is usually interpreted as the "confidence" of the classifier that y0 = 1, i.e., that the sample belongs to class 0.


Multi-Label Classification
• The data points are from many classes.
• A data point may belong to more than one class.
• Example with two labels: y0 = 1 means it is a cat, y0 = 0 means it is not a cat; y1 = 1 means it is cute, y1 = 0 means it is not cute. The classifier outputs y = (y0, y1) = (1, 1): it is a cute cat.
Classifiers
• Many types of classifiers:
KNN classifier (K-Nearest Neighbor)
Naïve Bayes classifier
Decision Tree classifier
Random Forest classifier
SVM classifier (Support Vector Machine)
Neural Network classifier

• Each type of classifier is associated with specific learning algorithms.


A Classification Task

• A simple task: classify a fruit into four classes/categories: {1:apple, 2:mandarin, 3:orange, 4:lemon}
  (note: class 3 contains oranges that are not mandarin oranges)

A sample/instance (a 2D image) is given; we perform classification to obtain its class/fruit label: 1:apple, 2:mandarin, 3:orange, 4:lemon.

A human brain extracts features from the image and recognizes the fruit: "It is an apple!" A machine can do the same in two stages: Model-1 extracts features from the sample, and Model-2 classifies the sample based on those features: "It is an apple!"

It is not easy to develop Model-1 for feature extraction.
It is relatively easy to develop Model-2 for classification, given the features of the sample.

Now, let's develop Model-2 for classification.


Model-2 maps a feature vector to a class label: {1:apple, 2:mandarin, 3:orange, 4:lemon}

The feature vector of a fruit sample: [width, height, color_score]
color_score is a number (0~1) that describes the color.
The Fruit Dataset
The fruit dataset was created by Dr. Iain Murray at the University of Edinburgh. He bought a few dozen oranges, lemons and apples (a bucket of fruits), and recorded their features in a table.

4 classes: {1:apple, 2:mandarin, 3:orange, 4:lemon}

The fruit dataset (a table)

Each row contains the information of a fruit sample/instance:

fruit_label  fruit_name  subtype         mass (g)  width (cm)  height (cm)  color_score
1            apple       granny_smith    192       8.4         7.3          0.55
4            lemon       spanish_belsan  194       7.2         10.3         0.70

In this table: what is input x? what is output y?

http://usapple.org/the-industry/apple-varieties/
In total, there are 59 fruit samples (i.e., 59 rows) in the table.

4 classes: {1:apple, 2:mandarin, 3:orange, 4:lemon}

Select 3 features: width, height, color_score
Split the data (59 samples) into a training set (80%, 47 samples) and a testing set (20%, 12 samples):
• X_train contains the features of the 47 training samples; each row of X_train is the feature vector of a training sample.
• Y_train contains the class/fruit labels of the 47 training samples; each element of Y_train is the class label of a training sample.
• X_test contains the features of the 12 testing samples; each row of X_test is the feature vector of a testing sample.
• Y_test contains the class/fruit labels of the 12 testing samples; each element of Y_test is the class label of a testing sample.
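As a minimal sketch of this step (the file name fruit_data_with_colors.txt is an assumption; adjust it to your copy of the dataset):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the fruit table (file name is an assumption)
fruits = pd.read_table('fruit_data_with_colors.txt')

# Select 3 features: width, height, color_score
X = fruits[['width', 'height', 'color_score']].values
Y = fruits['fruit_label'].values

# Split the 59 samples into a training set (80%, 47) and a testing set (20%, 12)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)
```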
The flowchart of a classification study
• Classification is a subcategory of supervised learning where the goal is to predict the class labels of new samples.
• Model training: split the data set (with class labels) into a training set (80%) and a testing set (20%); a learning algorithm fits the classifier to the training set.
• Model testing: the trained classifier predicts the class labels of the testing set; comparing the predicted class labels with the true class labels gives the classification accuracy.
KNN classifier (K-Nearest Neighbor)
• To use a KNN classifier, the user needs to:
  (1) choose the value of K, and (2) choose a distance measure
[Figure: 7 samples in the training set plotted in feature space, plus a new sample whose class label is unknown.]
Let’s set K=1 and use the L2-based (Euclidean) distance measure.
Task: find the nearest neighbor in the training set (by comparing distances).
The nearest neighbor in the training set is an apple; therefore, the KNN classifier will classify the input as an apple.
[Figure: among the 7 training samples, the new sample's nearest neighbor is an apple, so it is classified as an apple.]


Let’s set K=5 and use the L2-based distance measure.
Task: find the 5 nearest neighbors in the training set.
Among the 5 nearest neighbors in the training set, there are 3 lemons and 2 apples; therefore, based on majority vote, the KNN classifier will classify the input as a lemon.
[Figure: among the 7 training samples, 3 of the new sample's 5 nearest neighbors are lemons, so it is classified as a lemon.]
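To make the majority-vote mechanics concrete, here is a minimal from-scratch sketch (not the lecture's code), assuming X_train and Y_train are numpy arrays:

```python
import numpy as np
from collections import Counter

def knn_predict(x_new, X_train, Y_train, K=5):
    # L2 (Euclidean) distance from x_new to every training sample
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the K nearest neighbors
    nearest = np.argsort(distances)[:K]
    # Majority vote among the neighbors' class labels
    return Counter(Y_train[nearest]).most_common(1)[0][0]
```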
Let's build and train a KNN classifier using sk-learn
• Build a KNN classifier, name it knn, with K=5 and the L2-based distance.
• Train the KNN classifier (fit the model to the data).
Model training is to let knn memorize all of the training samples (features and labels) and build a tree for K-nearest-neighbor search.
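The slide's code itself is not preserved in this text; a sketch using scikit-learn would be:

```python
from sklearn.neighbors import KNeighborsClassifier

# Build a KNN classifier, name it knn: K=5, L2-based (Euclidean) distance
knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)

# Train the KNN classifier (fit the model to the training data)
knn.fit(X_train, Y_train)
```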
Use the trained KNN classifier to classify a sample in the testing set:
• Select a sample in the testing set (we know the true label of this sample).
• Use knn to predict the label of this sample.

Use the trained KNN classifier to classify a new, previously unseen sample that is not in the training set nor in the testing set, given the feature vector of the new sample.
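A sketch of both steps (the new sample's feature values below are made up for illustration):

```python
# Classify a sample from the testing set (index 0 is an arbitrary choice)
x = X_test[0].reshape(1, -1)          # sklearn expects a 2D array
print('true label:     ', Y_test[0])
print('predicted label:', knn.predict(x)[0])

# Classify a new, previously unseen sample: [width, height, color_score]
x_new = [[6.0, 7.0, 0.7]]             # made-up feature values
print('new sample label:', knn.predict(x_new)[0])
```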
Evaluate the Performance of the KNN Classifier (K=5)
• Classification Accuracy = (number of correctly classified samples) / (total number of samples)
• Training Accuracy: accuracy on the training set (80% of the data)
• Testing Accuracy: accuracy on the testing set (20% of the data)
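With scikit-learn this is one call per set (for a classifier, score returns the mean accuracy):

```python
# Accuracy = fraction of correctly classified samples
print('training accuracy:', knn.score(X_train, Y_train))
print('testing accuracy: ', knn.score(X_test, Y_test))
```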


Training Accuracy of a KNN classifier is 100% when K=1.
Use the KNN classifier to predict the label of a sample x that is in the training set: the classifier memorized all data points in the training set, so the nearest neighbor of x is x itself; x and its label are in KNN's memory, and x is always classified correctly.
[Figure: the 7 training samples (apples and lemons) plotted by width and height; the KNN classifier predicts each training sample's label as its own stored label.]
Use a confusion matrix to visualize the classification result on the testing set:
• Example: 2 apples are classified as apples; 2 apples are classified as oranges.
• The diagonal numbers show the correct classifications.
• The off-diagonal numbers show the wrong classifications.
• The sum of the numbers in the matrix is the total number of testing samples.
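A sketch with scikit-learn (rows are true classes, columns are predicted classes):

```python
from sklearn.metrics import confusion_matrix

Y_pred = knn.predict(X_test)
print(confusion_matrix(Y_test, Y_pred))
```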
KNN_classification.ipynb
• Use hand-engineered features to improve classification accuracy
• Feature normalization may improve classification accuracy
Plot the Decision Boundary to Visualize the Classification Result
• A point on the plot represents a sample (a feature vector), which may be in the training set, in the testing set, or unobserved yet.
• Roughly speaking, to get the decision boundary plot (e.g., the boundary between lemon and orange), we use the KNN classifier to predict the class label of every point on the plot.
• In fact, we do not need to check every point: we only need to predict the class labels of the points on a dense grid and interpolate the result.
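A sketch of the grid approach for a classifier trained on two features, e.g., width and height (the grid ranges below are assumptions chosen for fruit sizes in cm):

```python
import numpy as np
import matplotlib.pyplot as plt

# Dense grid over the 2D feature space (assumes knn was trained on 2 features)
xx, yy = np.meshgrid(np.linspace(4, 10, 200), np.linspace(4, 11, 200))
grid = np.c_[xx.ravel(), yy.ravel()]

# Predict the class label of every grid point and draw filled regions
labels = knn.predict(grid).reshape(xx.shape)
plt.contourf(xx, yy, labels, alpha=0.3)
plt.xlabel('width (cm)')
plt.ylabel('height (cm)')
plt.show()
```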
Question: how do we choose the value of K?

Question: what is the weakness of the KNN classifier?


What's the difference between clustering and classification?
• Clustering: the data is unlabeled, x1, x2, x3, ..., xN. A clustering algorithm takes an input x and outputs y, the predicted cluster label.
• Classification: the data consists of input-output pairs (x1, y1), (x2, y2), (x3, y3), ..., (xN, yN). A classifier takes an input x and outputs y, the predicted class label.

KNN can be used for classification and regression
• For classification, the output from a KNN classifier is a discrete value (class label), obtained by majority vote.
• For regression, the output from a KNN regressor is a continuous value (target value): the average target value of the K nearest neighbors is the predicted target value of the input x.
• Example: assume K=3 and the training samples x1, x2, x3 are the (K=3) nearest neighbors of x, with target values y1, y2, y3. Then the predicted target value ŷ of x is (y1 + y2 + y3)/3.
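A sketch using scikit-learn's KNeighborsRegressor (assuming X_train, Y_train hold features and target values for a regression task, such as the housing data below):

```python
from sklearn.neighbors import KNeighborsRegressor

# K=3: the prediction is the average target value of the 3 nearest neighbors
knn_reg = KNeighborsRegressor(n_neighbors=3)
knn_reg.fit(X_train, Y_train)
Y_pred = knn_reg.predict(X_test)
```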
Boston Housing Dataset
The Housing dataset contains information about houses in the different districts of Boston, collected by D. Harrison and D.L. Rubinfeld in 1978. The dataset is a large table that has 506 samples (rows) and 14 columns. Each row contains information/attributes of a region in Boston.
• x: input (13 attributes)
• y: target, MEDV: median value of owner-occupied homes in $1000s


Regression: y = f(x)
• Model training: split the data set (with target values) into a training set (80%) and a testing set (20%); a learning algorithm fits the ML model to the training set.
• Model testing: the trained model predicts the target values of the testing set; comparing the predicted target values with the true target values gives the regression accuracy.
KNN_Regression.ipynb
