L3 KNN
Supervised Learning
Some slides were adapted/taken from various sources, including Prof. Andrew Ng’s Coursera Lectures, Stanford
University, Prof. Kilian Q. Weinberger’s lectures on Machine Learning, Cornell University, Prof. Sudeshna Sarkar’s
Lecture on Machine Learning, IIT Kharagpur, Prof. Bing Liu’s lecture, University of Illinois at Chicago (UIC),
CS231n: Convolutional Neural Networks for Visual Recognition lectures, Stanford University and many more. We
thankfully acknowledge them. Students are requested to use this material for their study only and NOT to distribute it.
Recap
• Training data: D = {(x_1, y_1), ..., (x_n, y_n)}, where x_i ∈ R^d is a feature vector and y_i ∈ C is its label.
Hypothesis Class
Given a test point x, let S_x ⊆ D denote the set of its k nearest neighbors, i.e. |S_x| = k and
∀ (x', y') ∈ D \ S_x :  dist(x, x') ≥ max_{(x'', y'') ∈ S_x} dist(x, x'')
(i.e. every point in D but not in S_x is at least as far away from x as the furthest
point in S_x). We can then define the classifier h(·) as a function returning the most
common label in S_x:
h(x) = mode({y'' : (x'', y'') ∈ S_x})
A binary classification example with k = 3. The green point in the center is the test
sample x. The labels of the 3 neighbors are 2×(+1) and 1×(−1), resulting in a majority
prediction of (+1).
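To make this concrete, here is a minimal sketch of such a classifier in Python (not from the slides; the function name knn_predict and the choice of Euclidean distance are illustrative assumptions):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # Euclidean distances from the test point to every training point
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # indices of the k closest training points (the set S_x)
    nearest = np.argsort(dists)[:k]
    # return the most common label among the neighbors (the mode)
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny example mirroring the figure: 2 neighbors labeled +1 and 1 labeled -1, so predict +1
X_train = np.array([[0.0, 1.0], [1.0, 0.0], [0.9, 0.9], [5.0, 5.0]])
y_train = np.array([+1, +1, -1, -1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=3))  # prints 1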
Distance Function
• The k-nearest neighbor classifier fundamentally relies on a distance
metric. The better that metric reflects label similarity, the better the
classification will be. The most common choice is the Minkowski
distance (see the sketch below this list).
• p = 1: Manhattan distance
• p = 2: Euclidean distance, etc.
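As a sketch (not from the slides; the function name minkowski is an illustrative choice), the Minkowski distance can be written directly as code:

import numpy as np

def minkowski(x, z, p):
    # dist(x, z) = (sum_r |x_r - z_r|^p)^(1/p)
    return np.sum(np.abs(x - z) ** p) ** (1.0 / p)

x, z = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(minkowski(x, z, p=1))  # Manhattan distance: 7.0
print(minkowski(x, z, p=2))  # Euclidean distance: 5.0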
Curse of Dimensionality
• The kNN classifier makes the assumption that similar points share similar labels.
Let ℓ be the edge length of the smallest hyper-cube that contains all k nearest
neighbors of a test point. Then ℓ^d ≈ k/n, i.e. ℓ ≈ (k/n)^(1/d). If n = 1000 and
k = 10, how big is ℓ?

d        ℓ
2        0.1
10       0.63
100      0.955
1000     0.9954
So as d ≫ 0, almost the entire space is needed to find the 10-NN. This breaks down the
k-NN assumption, because the k nearest neighbors are not particularly closer (and therefore
not more similar) to the test point than any other data points in the training set. Why would
the test point share its label with its k nearest neighbors if they are not actually similar to it?
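The table above can be reproduced with a few lines (a sketch, assuming k = 10 and n = 1000 as in the example):

# edge length of the hyper-cube expected to contain the k nearest neighbors:
# ell^d ~ k/n, so ell ~ (k/n)^(1/d)
k, n = 10, 1000
for d in [2, 10, 100, 1000]:
    ell = (k / n) ** (1 / d)
    print(f"d = {d:4d}   ell = {ell:.4f}")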
Figure demonstrating "the curse of dimensionality". The histogram plots show the distributions
of all pairwise distances between randomly distributed points within d-dimensional unit hypercubes.
As the number of dimensions d grows, all distances concentrate within a very small range.
What happens if we increase k?
Curse of Dimensionality
• One might think that one rescue could be to increase the number of training
samples, n, until the nearest neighbors are truly close to the test point. How many
data points would we need such that ℓ becomes truly small?
• For d>100, we would need far more data points than there are electrons in the
universe...
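Why so many? A quick back-of-the-envelope check (a sketch, assuming we want ℓ = 0.1 and keep k = 10; the ~10^80 figure is the commonly cited estimate for the number of electrons in the observable universe):

# from ell ~ (k/n)^(1/d): fixing ell = 0.1 requires n = k / ell^d = k * 10^d points
k, ell = 10, 0.1
for d in [10, 100]:
    n = k / ell ** d
    print(f"d = {d:3d}   points needed ~ {n:.0e}")
# d = 100 already needs ~1e+101 points, far more than the ~1e80 electrons
# commonly estimated to exist in the observable universe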
Distances to hyperplanes
• So the distance between two randomly drawn data points increases drastically with their
dimensionality.
• How about the distance to a hyperplane?
• Consider the following figure. There are two blue points and a red hyperplane.
The left plot shows the scenario in 2d and the right plot in 3d.
This confirms again that pairwise distances grow in high dimensions. On the other hand, the distance
to the red hyperplane remains unchanged as the third dimension is added.
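The effect in the figure is easy to check numerically. The following sketch (not from the slides; the axis-aligned hyperplane x_1 = 0.5 is an illustrative choice) compares average pairwise distances with average distances to a fixed hyperplane as d grows:

import numpy as np

rng = np.random.default_rng(0)
for d in [2, 3, 10, 100, 1000]:
    # pairs of points drawn uniformly from the d-dimensional unit cube
    x = rng.random((1000, d))
    z = rng.random((1000, d))
    pairwise = np.linalg.norm(x - z, axis=1).mean()
    # distance of x to the hyperplane x_1 = 0.5; only the first coordinate matters,
    # because the remaining d-1 axes are orthogonal to the hyperplane's normal
    to_plane = np.abs(x[:, 0] - 0.5).mean()
    print(f"d = {d:4d}   mean pairwise dist = {pairwise:6.2f}   mean dist to hyperplane = {to_plane:.2f}")

For points drawn uniformly from the unit cube, the mean pairwise distance grows roughly like √(d/6), while the mean distance to this hyperplane stays near 0.25 for every d.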
Distances to hyperplanes
• The reason is that the normal of the hyperplane is orthogonal to the new dimension. This
is a crucial observation.
• In d dimensions, d−1 dimensions will be orthogonal to the normal of any given hyperplane.
Movement in those dimensions cannot increase or decrease the distance to the hyperplane;
the points just shift around and remain at the same distance.
• So while pairwise distances between points become very large in high-dimensional spaces,
distances to hyperplanes become comparatively tiny.
• For machine learning algorithms, this is highly relevant. As we will see later on, many
classifiers (e.g. the Perceptron or SVMs) place hyperplanes between concentrations of
different classes.
• One consequence of the curse of dimensionality is that most data points tend to be very
close to these hyperplanes, and it is often possible to perturb an input slightly (and often
imperceptibly) in order to change the classification outcome. This practice has recently
become known as the creation of adversarial samples, whose existence is often falsely
attributed to the complexity of neural networks.
Summary
• As n → ∞, the k-NN classifier becomes provably very accurate, but also very slow.