Week03 - 1 - KNN

The document discusses nearest neighbors classifiers. It begins with an overview of eager vs lazy classification strategies and distance-based models. It then covers feature spaces, various methods for measuring distance between examples (absolute difference, Euclidean distance, etc.), and data normalization. Finally, it introduces the k-nearest neighbor classifier (kNN) and weighted kNN, noting they can be implemented in scikit-learn.

CCAI 312

Pattern Recognition

Nearest Neighbors Classifiers

Adapted from Dr. Pádraig Cunningham, COMP47750, School of Computer Science, UCD (Dublin)
Overview
• Eager v Lazy Classification Strategies
• Distance-based Models
• Feature Spaces
• Measuring Distance
• Data Normalisation
• Nearest Neighbours
• k-Nearest Neighbour Classifier (kNN)
• Weighted kNN
• kNN in scikit-learn in Python

Reminder: Classification
• Supervised Learning: Algorithm that learns a function from
manually-labelled training examples.
• Classification: Training examples, usually represented by a set
of descriptive features, help decide the class to which a new
unseen query input belongs.
• Binary Classification: Assign one of two possible target class
labels to the new query input.

[Diagram: a query input is assigned to one of the two classes, 'Spam' or 'Non-Spam'.]

• Multiclass Classification: Assign one of M > 2 possible target class
labels to the new query input.
Eager v Lazy Classifiers
• Eager Learning Classification Strategy (model based)
• Classifier builds a full model during an initial training phase, to
use later when new query examples arrive.
• More offline setup work, less work at run-time.
• Generalise before seeing the query example.
• Lazy Learning Classification Strategy (instance based)
• Classifier keeps all the training examples for later use.
• Little work is done offline, wait for new query examples.
• Focus on the local space around the examples.
• Distance-based Models: Many learning algorithms are based on
generalising from training data to unseen data by exploiting the
distances (or similarities) between the two.
Example: Athlete Selection
• Training set of performance ratings for 20 college athletes, where
each athlete is described by 2 continuous features: speed, agility.
• Each athlete has a target class label indicating whether they were
selected for the university athletics team: 'Yes' or 'No'.
Athlete  Speed  Agility  Selected        Athlete  Speed  Agility  Selected
x1       2.50   6.00     No              x11      2.00   2.00     No
x2       3.75   8.00     No              x12      5.00   2.50     No
x3       2.25   5.50     No              x13      8.25   8.50     Yes
x4       3.25   8.25     No              x14      5.75   8.75     Yes
x5       2.75   7.50     No              x15      4.75   6.25     Yes
x6       4.50   5.00     No              x16      5.50   6.75     Yes
x7       3.50   5.25     No              x17      5.25   9.50     Yes
x8       3.00   3.25     No              x18      7.00   4.25     Yes
x9       4.00   4.00     No              x19      7.50   8.00     Yes
x10      4.25   3.75     No              x20      7.25   3.75     Yes

Q. Will a new athlete q (Speed 3.00, Agility 8.00) be selected: 'Yes' or 'No'?
Feature Spaces
• A feature space is a D-dimensional coordinate space used to
represent the input examples for a given problem, with one
coordinate for each descriptive feature.
• Example: Use a feature space to visually position the 20 athletes
in a 2-dimensional coordinate space (i.e. agility versus speed):

[Scatter plot: the training set of 20 examples (athletes), each described by 2
feature values, plotted in the agility-versus-speed feature space.]
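As a rough sketch of how such a feature-space plot could be produced in Python (the data entry and plotting choices below are mine, not from the original notebook):

import matplotlib.pyplot as plt

# Athlete training data from the table above: (speed, agility) and 'Selected' labels
speed    = [2.50, 3.75, 2.25, 3.25, 2.75, 4.50, 3.50, 3.00, 4.00, 4.25,
            2.00, 5.00, 8.25, 5.75, 4.75, 5.50, 5.25, 7.00, 7.50, 7.25]
agility  = [6.00, 8.00, 5.50, 8.25, 7.50, 5.00, 5.25, 3.25, 4.00, 3.75,
            2.00, 2.50, 8.50, 8.75, 6.25, 6.75, 9.50, 4.25, 8.00, 3.75]
selected = ['No'] * 12 + ['Yes'] * 8   # x1-x12 not selected, x13-x20 selected

# Plot each class with its own marker so the two groups are easy to tell apart
for label, marker in [('No', 'o'), ('Yes', '^')]:
    xs = [s for s, c in zip(speed, selected) if c == label]
    ys = [a for a, c in zip(agility, selected) if c == label]
    plt.scatter(xs, ys, marker=marker, label=label)

plt.xlabel('Speed')
plt.ylabel('Agility')
plt.legend(title='Selected')
plt.show()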
Measuring Distance
• Measuring the distance (or similarity) between two examples is
fundamental to many ML algorithms.
• Many measures can be used to calculate distance. There is no
“best” distance measure. The choice is highly problem-dependent.
[Scatter plot annotation: examples x4 and x5 have a low distance (high similarity),
while examples x10 and x13 have a high distance (low similarity).]
Measuring Distance
• Distance function: A suitable function to measure how distant
(or similar) two input examples are from one another in some
D-dimensional feature space.

• Local distance function: Measures the distance between two
examples based on a single feature.
  • e.g. what is the distance between x1 and x2 in terms of Speed?
  • e.g. what is the distance between x1 and x2 in terms of Agility?

Athlete  Speed  Agility
x1       2.50   6.00
x2       3.75   8.00

• Global distance function: Measures the distance between two
examples based on the combination of the local distances
across all features.
  • e.g. what is the distance between x1 and x2 based on both
    Speed and Agility?
Measuring Distance
• Overlap function: Simplest local distance measure. Returns 0 if the
two values for a feature are equal and 1 otherwise. Generally suitable
for categorical data.

Athlete  Gender  Nationality
x1       Female  Irish
x2       Male    Irish
x3       Male    Italian

For feature Gender:          For feature Nationality:
dg(x1,x2) = 1                dn(x1,x2) = 0
dg(x1,x3) = 1                dn(x1,x3) = 1
dg(x2,x3) = 0                dn(x2,x3) = 1

• Hamming distance: Global distance function which is the sum of
the overlap differences across all features - i.e. the number of features
on which two examples disagree.

d(x1,x2) = 1 + 0 = 1    (overlap distance for Gender +
d(x1,x3) = 1 + 1 = 2     overlap distance for Nationality)
d(x2,x3) = 0 + 1 = 1
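A minimal Python sketch of the overlap and Hamming distances just described (the function names are mine, for illustration):

def overlap(a, b):
    """Local overlap distance: 0 if the two feature values are equal, 1 otherwise."""
    return 0 if a == b else 1

def hamming(x, y):
    """Global Hamming distance: number of features on which x and y disagree."""
    return sum(overlap(a, b) for a, b in zip(x, y))

# Examples from the slide, described by (Gender, Nationality)
x1 = ('Female', 'Irish')
x2 = ('Male', 'Irish')
x3 = ('Male', 'Italian')

print(hamming(x1, x2))  # 1
print(hamming(x1, x3))  # 2
print(hamming(x2, x3))  # 1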
Measuring Distance
• Absolute difference: For numeric data, we can calculate the absolute
value of the difference between the values of a feature.

Athlete  Speed  Agility
x1       2.50   6.00
x2       3.75   8.00
x3       2.25   5.50

For feature Speed:                     For feature Agility:
ds(x1,x2) = |2.50-3.75| = 1.25         da(x1,x2) = |6.0-8.0| = 2.0
ds(x1,x3) = |2.50-2.25| = 0.25         da(x1,x3) = |6.0-5.5| = 0.5
ds(x2,x3) = |3.75-2.25| = 1.5          da(x2,x3) = |8.0-5.5| = 2.5

• Again we can compute a global distance between two examples by
summing the local distances over all features.

d(x1,x2) = 1.25 + 2.0 = 3.25    (absolute difference for Speed +
d(x1,x3) = 0.25 + 0.5 = 0.75     absolute difference for Agility)
d(x2,x3) = 1.5 + 2.5 = 4.0

• For ordinal features, calculate the absolute value of the difference
between the two positions in the ordered list of possible values.

e.g. Ordinal feature Dosage: {Low, Medium, High} = {1, 2, 3}
diff(Low,High) = |1-3| = 2
diff(Medium,Low) = |2-1| = 1
diff(High,High) = |3-3| = 0
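A matching Python sketch for the absolute-difference local distance and its summed global distance (helper names are mine):

def abs_diff(a, b):
    """Local distance for a numeric (or ordinal-encoded) feature."""
    return abs(a - b)

def global_abs_distance(x, y):
    """Global distance: sum of the absolute differences over all features."""
    return sum(abs_diff(a, b) for a, b in zip(x, y))

# Examples from the slide, described by (Speed, Agility)
x1, x2, x3 = (2.50, 6.00), (3.75, 8.00), (2.25, 5.50)

print(global_abs_distance(x1, x2))  # 1.25 + 2.0 = 3.25
print(global_abs_distance(x1, x3))  # 0.25 + 0.5 = 0.75
print(global_abs_distance(x2, x3))  # 1.5 + 2.5 = 4.0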
Measuring Distance
• Euclidean distance: Most common measure used to quantify
distance between two examples with numeric features.
• Given by the "straight line" distance between two points in a
Euclidean coordinate space - e.g. a feature space.
• Calculated as the square root of sum of squared differences for
each feature f representing a pair of examples.
• The output is a real value ≥ 0, where a larger value indicates two
examples are more distant (i.e. less similar to one another).

ED(p, q) = sqrt( Σ_{f ∈ F} (p_f - q_f)² )

Input: two examples p and q. For each feature f in the full set of features F,
calculate the square of the difference between the two examples on that feature,
sum these squared differences, and take the square root.
Measuring Distance
• Example: Apply Euclidean distance, where F consists of 2 numeric
features: speed, agility.

[Scatter plot highlighting athletes x4, x5 and x15 in the feature space.]

Athlete  Speed  Agility
x4       3.25   8.25
x5       2.75   7.50
x15      4.75   6.25

ED(x4, x15) = sqrt( (3.25 - 4.75)² + (8.25 - 6.25)² )
            = sqrt( 2.25 + 4.00 ) = sqrt(6.25) = 2.5

ED(x4, x5)  = sqrt( (3.25 - 2.75)² + (8.25 - 7.50)² )
            = sqrt( 0.25 + 0.5625 ) = sqrt(0.8125) ≈ 0.90
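A quick Python check of these two Euclidean distances (a throwaway sketch, names mine):

import math

def euclidean(p, q):
    """ED(p,q): square root of the sum of squared per-feature differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

x4, x5, x15 = (3.25, 8.25), (2.75, 7.50), (4.75, 6.25)

print(euclidean(x4, x15))  # 2.5
print(euclidean(x4, x5))   # ~0.901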
Heterogeneous Distance Functions
• In many datasets, the features associated with examples will
have different types (e.g. continuous, categorical, ordinal etc).
• We can create a global measure from different local distance
functions, using an appropriate function for each feature.
Athlete  Speed  Agility  Gender  Nationality
x1       2.50   6.00     Female  Irish
x2       3.75   8.00     Male    Irish
x3       2.25   5.50     Male    Italian

• Use absolute difference for the continuous features Speed & Agility.
• Use overlap for the categorical features Gender & Nationality.
• The global distance is calculated as the sum of the individual local distances.

d(x1,x2) = 1.25 + 2.0 + 1 + 0 = 4.25
d(x1,x3) = 0.25 + 0.5 + 1 + 1 = 2.75
d(x2,x3) = 1.5 + 2.5 + 0 + 1 = 5.0

Often domain expertise is required to choose an appropriate
distance function for a particular dataset.
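A sketch of such a heterogeneous global distance in Python, combining the two local functions used above (the helper names are mine):

def abs_diff(a, b):
    return abs(a - b)

def overlap(a, b):
    return 0 if a == b else 1

def hetero_distance(x, y, local_fns):
    """Apply one local distance function per feature and sum the results."""
    return sum(fn(a, b) for fn, a, b in zip(local_fns, x, y))

# Features: (Speed, Agility, Gender, Nationality)
x1 = (2.50, 6.00, 'Female', 'Irish')
x2 = (3.75, 8.00, 'Male', 'Irish')
x3 = (2.25, 5.50, 'Male', 'Italian')
local_fns = [abs_diff, abs_diff, overlap, overlap]

print(hetero_distance(x1, x2, local_fns))  # 4.25
print(hetero_distance(x1, x3, local_fns))  # 2.75
print(hetero_distance(x2, x3, local_fns))  # 5.0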
Data Normalisation
• Numeric features often have different ranges, which can skew
certain distance functions.
• So that all features have a similar range, we apply feature
normalisation.

Example  Age
x1       24
x2       19
x3       50
x4       40
x5       23
x6       68
x7       45
x8       33
x9       80
x10      58

• Min-max normalisation: Use the min and max values for a given
feature to rescale it to the range [0,1]:

    z_i = (x_i - min(x)) / (max(x) - min(x))

• Example: Feature Age, with min(x) = 19, max(x) = 80,
so max(x) - min(x) = 61.

Age (non-normalised):  24    19    50    40    23    68    45    33    80    58
Age (normalised):      0.08  0.00  0.51  0.34  0.07  0.80  0.43  0.23  1.00  0.64
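A minimal Python sketch of min-max normalisation that reproduces the Age example (values rounded to two decimal places):

ages = [24, 19, 50, 40, 23, 68, 45, 33, 80, 58]

lo, hi = min(ages), max(ages)                      # 19 and 80, so the range is 61
normalised = [(a - lo) / (hi - lo) for a in ages]

print([round(z, 2) for z in normalised])
# [0.08, 0.0, 0.51, 0.34, 0.07, 0.8, 0.43, 0.23, 1.0, 0.64]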
Nearest Neighbour Classifier
Lazy learning approach: Do not build a model for the data. Identify
most similar previous example(s) from the training set for which a
label has already been assigned, using some distance function.
Nearest neighbour rule (1NN): For a new query input q, find a single
labelled example x closest to q, and assign q the same label as x.

[Feature space plot: query q1 is assigned to the 'No' class and query q2 to the
'Yes' class, each taking the label of its single nearest training example.]
k-Nearest Neighbour Classifier
k-Nearest neighbours (kNN): The NN approach naturally generalises
to the case where we use k nearest neighbours from the training
set to assign a label to a new query input.
Example: For new query inputs, calculate distance to all training
examples. Find k=3 nearest examples (i.e. with smallest distances).

[Feature space plot: the 3 nearest training examples to q1 and the 3 nearest
training examples to q2 are highlighted.]
k-Nearest Neighbour Classifier
Majority voting: The decision on a label for a new query example is
decided based on the “votes” of its k nearest neighbours. The label
for the query is the majority label of its neighbours.
Example: Measure distance from q to all training examples.
Find the k=3 nearest examples, and use their labels as votes.

[Feature space plot: the 3 nearest neighbours of q are x6, x15 and x16.]

Neighbour counts:
• Yes = 2 votes
• No = 1 vote
➡ Majority says Yes!
k-Nearest Neighbour Classifier
Majority voting: The decision on a label for a new query example is
decided based on the “votes” of its k nearest neighbours. The label
for the query is the majority label of its neighbours.
Example: Measure distance from q to all training examples.
Find the k=4 nearest examples, and use their labels as votes.

[Feature space plot: the 4 nearest neighbours of q are x6, x12, x15 and x16.]

In the case that…
• Yes = 2 votes
• No = 2 votes
Can break ties…
‣ At random
‣ Based on the sum of neighbour distances (see the sketch below)
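A small sketch of the second tie-breaking option, preferring the class whose neighbours have the smaller total distance (the function name and the illustrative distances are mine):

from collections import defaultdict

def vote_with_tiebreak(neighbours):
    """neighbours: list of (label, distance) pairs for the k nearest examples.
    Majority vote; ties are broken in favour of the class whose neighbours
    have the smaller summed distance."""
    counts = defaultdict(int)
    dist_sums = defaultdict(float)
    for label, dist in neighbours:
        counts[label] += 1
        dist_sums[label] += dist
    # Most votes first; for equal votes, the smaller total distance wins
    return max(counts, key=lambda lab: (counts[lab], -dist_sums[lab]))

# Illustrative k=4 case with a 2-2 tie: the 'Yes' neighbours are closer overall
print(vote_with_tiebreak([('Yes', 0.9), ('No', 1.1), ('Yes', 1.3), ('No', 1.4)]))  # Yes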
Example: kNN Classification (k=3)
• Training set of 20 athletes - 8 labelled as 'Yes', 12 as 'No'.
• Each athlete described by 2 continuous features: Speed, Agility
Euclidean distance would be an appropriate distance function.
Athlete  Speed  Agility  Selected        Athlete  Speed  Agility  Selected
x1       2.50   6.00     No              x11      2.00   2.00     No
x2       3.75   8.00     No              x12      5.00   2.50     No
x3       2.25   5.50     No              x13      8.25   8.50     Yes
x4       3.25   8.25     No              x14      5.75   8.75     Yes
x5       2.75   7.50     No              x15      4.75   6.25     Yes
x6       4.50   5.00     No              x16      5.50   6.75     Yes
x7       3.50   5.25     No              x17      5.25   9.50     Yes
x8       3.00   3.25     No              x18      7.00   4.25     Yes
x9       4.00   4.00     No              x19      7.50   8.00     Yes
x10      4.25   3.75     No              x20      7.25   3.75     Yes

Will a new input example q (Speed 5.00, Agility 7.50) be classified as 'Yes' or 'No'?
Example: kNN Classification (k=3)
• Measure distance between q and all 20 training examples.
Athlete Speed Agility Selected Distance Athlete Speed Agility Selected Distance
x1 2.50 6.00 No 2.915 x11 2.00 2.00 No 6.265
x2 3.75 8.00 No 1.346 x12 5.00 2.50 No 5.000
x3 2.25 5.50 No 3.400 x13 8.25 8.50 Yes 3.400
x4 3.25 8.25 No 1.904 x14 5.75 8.75 Yes 1.458
x5 2.75 7.50 No 2.250 x15 4.75 6.25 Yes 1.275
x6 4.50 5.00 No 2.550 x16 5.50 6.75 Yes 0.901
x7 3.50 5.25 No 2.704 x17 5.25 9.50 Yes 2.016
x8 3.00 3.25 No 4.697 x18 7.00 4.25 Yes 3.816
x9 4.00 4.00 No 3.640 x19 7.50 8.00 Yes 2.550
x10 4.25 3.75 No 3.824 x20 7.25 3.75 Yes 4.373

q 5.00 7.50 ???

• Rank the training examples and identify the set of 3 examples with the
smallest distances.

Athlete  Speed  Agility  Selected  Distance
x16      5.50   6.75     Yes       0.901
x15      4.75   6.25     Yes       1.275
x2       3.75   8.00     No        1.346

• Yes = 2 votes
• No = 1 vote
➡ Majority says Yes, so assign label Yes to q
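The same k=3 prediction can be reproduced with scikit-learn; a sketch, assuming the athlete table has been entered as arrays in the order shown above:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Athlete training data: [Speed, Agility] and the 'Selected' labels for x1..x20
X = np.array([[2.50, 6.00], [3.75, 8.00], [2.25, 5.50], [3.25, 8.25], [2.75, 7.50],
              [4.50, 5.00], [3.50, 5.25], [3.00, 3.25], [4.00, 4.00], [4.25, 3.75],
              [2.00, 2.00], [5.00, 2.50], [8.25, 8.50], [5.75, 8.75], [4.75, 6.25],
              [5.50, 6.75], [5.25, 9.50], [7.00, 4.25], [7.50, 8.00], [7.25, 3.75]])
y = np.array(['No'] * 12 + ['Yes'] * 8)

knn = KNeighborsClassifier(n_neighbors=3)   # Euclidean distance by default
knn.fit(X, y)

q = np.array([[5.00, 7.50]])
dist, idx = knn.kneighbors(q)               # the 3 smallest distances and their row indices
print(dist.round(3), idx + 1)               # ~[0.901 1.275 1.346] -> athletes x16, x15, x2
print(knn.predict(q))                       # ['Yes'] by majority vote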
Weighted kNN
• Weighted voting: In this approach, some training examples have a
higher weight than others.
• Instead of using a binary vote of 1 for each nearest neighbour,
typically closer neighbours get higher votes when deciding on the
predicted label for a query example.
• Inverse distance-weighted voting: Simplest strategy is to take a
neighbour’s vote to be the inverse of their distance from the query
(i.e. 1/Distance). We then sum over the weights for each class.
d(q, x16) = 0.901   →   weight(x16) = 1 / d(q, x16) = 1 / 0.901 = 1.109

d(q, x2)  = 1.346   →   weight(x2)  = 1 / d(q, x2)  = 1 / 1.346 = 0.743
Example: Weighted kNN (k=3)
• Measure distance between q and all 20 training examples.
Athlete Speed Agility Selected Distance Athlete Speed Agility Selected Distance
x1 2.50 6.00 No 2.915 x11 2.00 2.00 No 6.265
x2 3.75 8.00 No 1.346 x12 5.00 2.50 No 5.000
x3 2.25 5.50 No 3.400 x13 8.25 8.50 Yes 3.400
x4 3.25 8.25 No 1.904 x14 5.75 8.75 Yes 1.458
x5 2.75 7.50 No 2.250 x15 4.75 6.25 Yes 1.275
x6 4.50 5.00 No 2.550 x16 5.50 6.75 Yes 0.901
x7 3.50 5.25 No 2.704 x17 5.25 9.50 Yes 2.016
x8 3.00 3.25 No 4.697 x18 7.00 4.25 Yes 3.816
x9 4.00 4.00 No 3.640 x19 7.50 8.00 Yes 2.550
x10 4.25 3.75 No 3.824 x20 7.25 3.75 Yes 4.373

• Rank the training examples and identify the set of 3 examples with the
smallest distances. Assign weights based on 1/Distance, and sum the
weights for each class.

Athlete  Speed  Agility  Selected  Distance  Weight
x16      5.50   6.75     Yes       0.901     1.109
x15      4.75   6.25     Yes       1.275     0.784
x2       3.75   8.00     No        1.346     0.743

• Weights for Yes = 1.109 + 0.784 = 1.893
• Weights for No  = 0.743
➡ Weighted majority says Yes
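A short Python sketch of the inverse distance-weighted vote for these three neighbours (distances taken from the table above):

# (label, distance to q) for the k=3 nearest neighbours
neighbours = [('Yes', 0.901), ('Yes', 1.275), ('No', 1.346)]

votes = {}
for label, dist in neighbours:
    votes[label] = votes.get(label, 0.0) + 1.0 / dist   # weight = 1/Distance

print({label: round(w, 3) for label, w in votes.items()})
# {'Yes': 1.894, 'No': 0.743}  (the slide sums the rounded weights, giving 1.893)
print(max(votes, key=votes.get))  # Yes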
Parameter Tuning
• A simple 1-NN classifier is easy to implement. But it will be
susceptible to “noise” in the data. A misclassification will occur
every time a single noisy example is retrieved.
• We might decide to vary the neighbourhood size parameter k to
improve the predictive performance of kNN.
• Choosing between different settings of an algorithm is often
referred to as hyperparameter tuning or model selection.

• Using a larger k (e.g. k > 2) can sometimes make the classifier more
robust and overcome this problem.
• But when k is large (k→N) and the classes are unbalanced, we always
predict the majority class.
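One common way to tune k is cross-validation; a minimal sketch using scikit-learn's GridSearchCV on the athlete data (the grid of k values is just an illustration):

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Athlete data as before: [Speed, Agility] and the 'Selected' labels
X = np.array([[2.50, 6.00], [3.75, 8.00], [2.25, 5.50], [3.25, 8.25], [2.75, 7.50],
              [4.50, 5.00], [3.50, 5.25], [3.00, 3.25], [4.00, 4.00], [4.25, 3.75],
              [2.00, 2.00], [5.00, 2.50], [8.25, 8.50], [5.75, 8.75], [4.75, 6.25],
              [5.50, 6.75], [5.25, 9.50], [7.00, 4.25], [7.50, 8.00], [7.25, 3.75]])
y = np.array(['No'] * 12 + ['Yes'] * 8)

# Try odd values of k (odd values avoid ties in a binary problem), 5-fold cross-validation
grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={'n_neighbors': [1, 3, 5, 7, 9]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)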
Problems with kNN
 Can be slow to find nearest neighbours in a high-dimensional space

 Need to store all the training data, so takes a lot of memory

 Need to specify the distance function

 Does not give probabilistic output

 Sensitive to class noise

 Sensitive to scales of attributes

 Distances are less meaningful in high dimensions

k-NN with Scikit Learn

• Examples in Notebook 02-kNN


• Loading a dataset
• Finding nearest neighbours
• Training a k-NN classifier
• Scaling features
• Weighting Instances

Load a dataset into Python
• Load a csv file into a Pandas dataframe in Python
athlete = pd.read_csv('AthleteSelection.csv',index_col = 'Athlete')
athlete.head()

y = athlete.pop('Selected').values
X = athlete.values

• X contains the features
• y contains the targets

         Speed  Agility  Selected
Athlete
x1        2.50     6.00         0
x2        3.75     8.00         0
x3        2.25     5.50         0
x4        3.25     8.25         0
x5        2.75     7.50         0
Train a k-NN classifier

# Set up classifier
forecast_kNN = KNeighborsClassifier(n_neighbors=3)

# Train it
forecast_kNN.fit(X, y)

# Set up query examples
xinput = np.array([[8., 70., 11.],
                   [8, 69, 15]])

# Make predictions
forecast_kNN.predict(xinput)

The query examples here have three features (Temperature, Humidity, Wind_Speed),
matching the weather 'Go-Out' dataset whose first rows are shown below.

   Temperature  Humidity  Wind_Speed  Go-Out
0            6        85          30       0
1           14        90          35       0
2           15        86           8       1
3           21        56          15       1
4           17        67           9       1
Test on the Training Data
• Use training data as test (not a good idea)
• k = 3 so one misclassification
y_dash = forecast_kNN.predict(X)
print(' y:',y)
print('y_dash:',y_dash)

y: [0 0 1 1 1 0 1 0 1 1 1 1 1 0 1 0 0 0]
y_dash: [0 0 1 1 1 0 1 0 1 1 1 1 1 0 1 0 1 0]

confusion = confusion_matrix(y, y_dash)
print("Confusion matrix:\n{}".format(confusion))

Confusion matrix:
[[ 7 1]
[ 0 10]]

What would we expect to happen when k=1? (Try it.)

Normalizing (Scaling) Data
• Normalize data so that all features have the same influence.
• Two popular approaches:
  • Standardisation to N(0,1) (zero mean, unit variance)
  • Min-max scaling, typically to the range (0,1)

# Set up scaler (StandardScaler performs the N(0,1) standardisation)
scaler = preprocessing.StandardScaler().fit(X)

# Scale the data and the query q
X_scaled = scaler.transform(X)
q_scaled = scaler.transform([q])

# Retrain the classifier on the scaled data
forecast_kNN_S = KNeighborsClassifier(n_neighbors=3)  # new classifier instance for the scaled data
forecast_kNN_S.fit(X_scaled, y)

# Find the nearest neighbours of the scaled query
forecast_kNN_S.kneighbors(q_scaled)
Instance weighting
• Give nearer neighbours a bigger weight (vote), based on their distance from the query.
forecast_kNN_SW = KNeighborsClassifier(n_neighbors=3,weights='distance')
forecast_kNN_SW.fit(X_scaled,y)
y_dash = forecast_kNN_SW.predict(X_scaled)
confusion = confusion_matrix(y, y_dash)
print("Confusion matrix:\n{}".format(confusion))
print('\n y:',y)
print('y_dash:',y_dash)

Confusion matrix:
[[ 8 0]
[ 0 10]]

y: [0 0 1 1 1 0 1 0 1 1 1 1 1 0 1 0 0 0]
y_dash: [0 0 1 1 1 0 1 0 1 1 1 1 1 0 1 0 0 0]

Advantages and disadvantages of the KNN algorithm

Advantages

 Easy to implement: Given the algorithm’s simplicity and accuracy, it is
one of the first classifiers that a new data scientist will learn.

 Adapts easily: As new training samples are added, the algorithm
adjusts to account for the new data, since all training data is stored
in memory.

 Few hyperparameters: KNN only requires a value of k and a distance
metric, far fewer hyperparameters than most other machine learning
algorithms require.
Advantages and disadvantages of the KNN algorithm

Disadvantages
 Does not scale well: Since KNN is a lazy algorithm, it takes up more memory
and data storage compared to other classifiers. This can be costly from both a
time and money perspective.

 Curse of dimensionality: The KNN algorithm tends to fall victim to the curse
of dimensionality, meaning that it does not perform well with high-dimensional
data inputs. This is sometimes also referred to as the peaking phenomenon,
where after the algorithm attains the optimal number of features, additional
features increase the number of classification errors, especially when the
sample size is small.

 Prone to overfitting: The value of k also impacts the model’s behaviour. Lower
values of k can overfit the data, whereas higher values of k tend to “smooth
out” the predictions, since the vote is averaged over a greater area, or
neighbourhood. However, if the value of k is too high, the model can underfit
the data.
