K-Nearest Neighbours (K-NN) Algorithm

The KNN algorithm classifies new data points based on their distance to labeled examples in a training set. It assumes that similar things exist in close proximity, so it finds the k nearest neighbors of a new point and assigns the most common label of those neighbors. The distance metric, usually Euclidean distance, is used to calculate distances between data points. Choosing the optimal k value, which determines the number of neighbors considered, is important but challenging, as different k values can lead to different classifications of test points.


Classification: K-Nearest Neighbours (K-NN)


Prof. (Dr.) Honey Sharma
KNN Classification
The KNN algorithm assumes that similar things exist in close proximity; in other words, similar things are near each other.
"Birds of a feather flock together."
KNN captures this idea of similarity (sometimes called distance, proximity, or closeness) mathematically, by calculating the distance between points in the feature space.
The algorithm stores the training data and classifies new test data using a distance metric: it finds the k nearest neighbours of the test point, then assigns the class label held by the majority of those neighbours.
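
To make the procedure concrete, here is a minimal from-scratch sketch in Python (the function names, such as `knn_predict`, are illustrative and not part of the slides):

```python
from collections import Counter
import math

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    # Distance from the query point to every labelled training point.
    distances = sorted(
        (euclidean(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    # Majority vote among the class labels of the k nearest neighbours.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]
```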
Consider the following figure. Suppose we have plotted the data points from our training set in a two-dimensional feature space. As shown, we have a total of 6 data points (3 red and 3 blue). Red data points belong to 'class1' and blue data points belong to 'class2'. The yellow data point represents the new point whose class is to be predicted. Intuitively, we would say it belongs to 'class1' (the red points).
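
Running the `knn_predict` sketch above on toy coordinates that mirror the figure (the coordinates are invented for illustration; only the colours and labels come from the slide), the yellow point is indeed assigned 'class1':

```python
# 3 red ('class1') and 3 blue ('class2') training points.
train_points = [(1, 2), (2, 1), (2, 3), (6, 5), (7, 7), (8, 6)]
train_labels = ['class1', 'class1', 'class1', 'class2', 'class2', 'class2']

yellow = (2, 2)  # the new point whose class is to be predicted
print(knn_predict(train_points, train_labels, yellow, k=3))  # -> class1
```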
Distance Metrics
The distance metric is the key hyperparameter through which we measure the distance between the training feature values and a new test input.
Usually we use the Euclidean distance, the most widely used distance measure, to calculate the distance between test samples and training data points.
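
For two points x = (x1, ..., xn) and y = (y1, ..., yn) in an n-dimensional feature space, the Euclidean distance is the standard formula:

$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$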
How to choose a K value?
Selecting the optimal K value, the one that achieves the maximum accuracy of the model, is always challenging for a data scientist.
The K value indicates the number of nearest neighbours considered; for each test point, we must compute its distance to every labelled training point.
Because these distance computations are deferred until prediction time rather than done during training, KNN is called a lazy learning algorithm, and prediction can be computationally expensive.
As you can verify from the image above, if we proceed with K=3 we predict that the test input belongs to class B, whereas with K=7 we predict that it belongs to class A.
This illustrates how strongly the K value affects KNN performance.
● There is no pre-defined statistical method for finding the most favourable value of K.
● A common starting point is to initialize a random K value and start computing.
● A small value of K leads to unstable decision boundaries.
● A larger K value is better for classification, as it smooths the decision boundaries.
● Plot the error rate against K over a defined range, then choose the K value with the minimum error rate (see the sketch below).
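
The last point can be put into practice with cross-validation: score a range of K values and keep the one with the lowest error. A minimal sketch, assuming scikit-learn is available; the Iris dataset is used purely as a stand-in for your own data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Any labelled dataset works; Iris is loaded here only for illustration.
X, y = load_iris(return_X_y=True)

k_values = range(1, 31)
error_rates = []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validated accuracy; error rate = 1 - mean accuracy.
    accuracy = cross_val_score(model, X, y, cv=5).mean()
    error_rates.append(1 - accuracy)

best_k = list(k_values)[int(np.argmin(error_rates))]
print(f"K with minimum error rate: {best_k}")
```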
