K-Nearest Neighbor Classification: Algorithm and Characteristics
Characteristics
o K-Nearest Neighbour (K-NN) is one of the simplest machine learning algorithms, based on the supervised learning technique.
o The K-NN algorithm measures the similarity between a new case/data point and the available cases, and puts the new case into the category most similar to it.
o The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
o The K-NN algorithm can be used for regression as well as classification, but it is mostly used for classification problems.
o K-NN is a non-parametric algorithm, which means it makes no assumptions about the underlying data.
o It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs an action on it only at classification time.
o During the training phase, the KNN algorithm just stores the dataset; when it receives new data, it classifies that data into the category most similar to it.
o Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. For this identification we can use the KNN algorithm, since it works on a similarity measure. Our KNN model will compare the features of the new image with those of the cat and dog images and, based on the most similar features, place it in either the cat or the dog category.
Suppose we have a new data point that we need to assign to one of two categories, A or B. The K-NN algorithm proceeds as follows:
o Firstly, we choose the number of neighbors; here we choose k = 5.
o Next, we calculate the Euclidean distance between the data points. The Euclidean distance is the straight-line distance between two points, familiar from geometry. For two points (x₁, y₁) and (x₂, y₂) it can be calculated as: d = √((x₂ − x₁)² + (y₂ − y₁)²)
o By calculating the Euclidean distance, we obtain the nearest neighbors: three nearest neighbors in category A and two nearest neighbors in category B.
o As three of the five nearest neighbors are from category A, this new data point must belong to category A (see the Python sketch after this list).
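The walk-through above can be written out in a few lines of Python. The following is a minimal sketch with made-up coordinates: the points, the new point (4, 4), and the variable names below are hypothetical, chosen so that three of the five nearest neighbors fall in category A, as in the example.

from collections import Counter
import math

# Hypothetical labelled points for categories A and B
training_data = [((3, 4), 'A'), ((4, 5), 'A'), ((5, 3), 'A'), ((1, 1), 'A'),
                 ((5, 5), 'B'), ((6, 4), 'B'), ((8, 8), 'B')]
new_point = (4, 4)
k = 5

def euclidean(p, q):
    # d = sqrt((x2 - x1)^2 + (y2 - y1)^2)
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# Sort the points by distance to the new point and keep the k closest
neighbors = sorted(training_data, key=lambda item: euclidean(item[0], new_point))[:k]
votes = Counter(label for _, label in neighbors)
print(votes)                        # Counter({'A': 3, 'B': 2})
print(votes.most_common(1)[0][0])   # 'A'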
1. Input:
Training dataset with labeled examples.
New data point (unlabeled) that needs to be classified.
2. Choose K:
Decide the number of neighbors, K.
3. Calculate Distance:
Use a distance metric (commonly Euclidean distance, Manhattan distance,
Minkowski distance, etc.) to measure the distance between the new data point
and each training data point.
4. Find K Nearest Neighbors:
Identify the K training data points that are closest to the new data point based
on the chosen distance metric.
5. Majority Vote (for Classification):
For classification tasks, assign the class label that is most common among
the K nearest neighbors.
6. Output:
The predicted class label for the new data point.
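Taken together, these steps map directly onto a short from-scratch implementation. The sketch below is one possible version rather than a standard library routine; the function and variable names are my own, and it assumes the training data is a list of (features, label) pairs with numeric feature tuples.

from collections import Counter
import math

def knn_classify(training_data, new_point, k=3):
    # Step 3: compute the Euclidean distance to every training point
    distances = [(math.dist(features, new_point), label)
                 for features, label in training_data]
    # Step 4: keep the k nearest neighbors
    nearest = sorted(distances)[:k]
    # Step 5: majority vote among their labels
    votes = Counter(label for _, label in nearest)
    # Step 6: return the predicted class label
    return votes.most_common(1)[0][0]

Note that math.dist requires Python 3.8 or later; on older versions the squared differences can be summed manually, as in the earlier sketch.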
KNN is a simple yet effective algorithm, often used as a baseline model or for quick
prototyping. It's particularly useful when the decision boundaries are not well-defined
or when the data distribution is not known in advance. However, its performance
may suffer in high-dimensional or large-scale datasets.
Sample Data:
Consider the following dataset:
Implementation:
Let's say we want to predict the class of a new data point with features (3, 4). We'll
use Euclidean distance as the distance metric and set K to 3.
Calculate the Euclidean distance between the new point (3, 4) and each point in the training set, sort the points by distance, and take the three closest. Since two out of the three nearest neighbors belong to class A, we predict that the new data point (3, 4) belongs to class A (a worked sketch follows below).
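The original table of sample points is not reproduced here, so the sketch below substitutes a hypothetical six-point dataset chosen to give the same outcome, with two of the three nearest neighbors of (3, 4) in class A.

import math

# Hypothetical training set standing in for the sample table
training_set = [((1, 2), 'A'), ((2, 3), 'A'), ((3, 6), 'A'),
                ((4, 5), 'B'), ((6, 7), 'B'), ((7, 8), 'B')]
new_point = (3, 4)

for features, label in training_set:
    d = math.dist(features, new_point)
    print(f"{features} -> class {label}: distance {d:.2f}")

# The three smallest distances are (2, 3) -> A at 1.41,
# (4, 5) -> B at 1.41, and (3, 6) -> A at 2.00, so the
# majority vote (2 of 3) predicts class A.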
This is a basic example to illustrate the steps of the KNN algorithm. In practice, you
would typically use libraries like scikit-learn in Python to implement KNN and handle
distance calculations efficiently.
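For instance, the same prediction with scikit-learn's KNeighborsClassifier (reusing the hypothetical points from the sketch above) would look roughly like this:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Same hypothetical points as in the previous sketch
X = np.array([[1, 2], [2, 3], [3, 6], [4, 5], [6, 7], [7, 8]])
y = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

model = KNeighborsClassifier(n_neighbors=3)  # K = 3; Euclidean distance by default
model.fit(X, y)
print(model.predict([[3, 4]]))  # ['A']

scikit-learn handles the distance calculations and neighbor search internally, which is why it is the usual choice over a hand-rolled loop on real datasets.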