KNN
Classification Algorithm
• In machine learning and statistics, classification
is a supervised learning approach.
• A computer program learns from the input data
given to it and then uses this learning to classify
new observations.
• The data set may simply be bi-class (for example,
identifying whether a person is male or female,
or whether an email is spam or not spam).
• It may be multi-class too.
Examples of classification
Examples of classification problems are:
• speech recognition
• handwriting recognition
• biometric identification
• document classification
Classification Algorithms
• Linear Classifiers: Logistic Regression, Naive
Bayes Classifier
• k-Nearest Neighbor (kNN)
• Support Vector Machines
• Decision Trees (information gain, ID3)
• Boosted Trees
• Random Forest
• Neural Networks
The k-Nearest-Neighbours algorithm
• Step-1: Select the number K of neighbours.
• Step-2: Calculate the Euclidean distance from the
new data point to each training data point.
• Step-3: Take the K nearest neighbours as per
the calculated Euclidean distance.
• Step-4: Among these K neighbours, count the
number of data points in each category.
• Step-5: Assign the new data point to the
category with the largest number of
neighbours.
• Step-6: Our model is ready.
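The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `knn_classify` and the toy points and labels are assumptions made up for this example, and `math.dist` (Python 3.8+) is used for the Euclidean distance.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, new_point, k):
    """Classify new_point by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the new point to every training point.
    distances = [math.dist(p, new_point) for p in train_points]
    # Step 3: indices of the k training points with the smallest distances.
    nearest = sorted(range(len(train_points)), key=lambda i: distances[i])[:k]
    # Step 4: count how many of those neighbours fall in each category.
    votes = Counter(train_labels[i] for i in nearest)
    # Step 5: the category with the most neighbours wins.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in 2-D.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 8.0)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(points, labels, (2.0, 2.0), k=3))  # prints A
```

Note that nothing is fitted in advance: every prediction scans the whole training set, which is why Step-6 follows directly from choosing K and storing the data.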
K-NN Algorithm
[Figure: K-NN classification illustrated for K=4 and K=3]
• Choosing K: a common rule of thumb is the total
number of categories in the target feature + 1.
• In real-world applications, K is chosen iteratively
by comparing accuracy across candidate values
(elbow curve plot).
Validation Error Curve (Elbow curve plot)
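One way to produce the numbers behind such a curve is to hold out a validation set and measure the misclassification rate for several values of K. The sketch below assumes a tiny hand-made data set (including one deliberately noisy training point) and the hypothetical helper names `knn_predict` and `validation_errors`; real data would replace both lists.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """train is a list of (point, label) pairs; returns the majority label."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def validation_errors(train, validation, k_values):
    """Return {k: fraction of validation points that are misclassified}."""
    errors = {}
    for k in k_values:
        wrong = sum(knn_predict(train, x, k) != y for x, y in validation)
        errors[k] = wrong / len(validation)
    return errors


# Hypothetical data, invented for illustration only.
train = [((1, 1), "A"), ((2, 1), "A"), ((1, 2), "A"), ((2, 2), "A"),
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"), ((7, 7), "B"),
         ((5, 5), "A")]  # a noisy "A" point sitting next to the "B" group
validation = [((1.5, 1.5), "A"), ((3, 3), "A"),
              ((5.2, 5.2), "B"), ((6.5, 6.5), "B")]

errors = validation_errors(train, validation, k_values=[1, 3, 5])
for k, err in errors.items():
    print(f"K={k}: validation error = {err:.2f}")

# Smallest K among those with the lowest validation error.
best_k = min(errors, key=errors.get)
print("chosen K:", best_k)
```

On this toy data K=1 is misled by the noisy point, while K=3 and K=5 are not. Plotting the error against K and looking for where it stops improving is the idea behind the elbow curve.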
Pseudo Code of KNN
• Load the data
• Initialise the value of k
• To get the predicted class, iterate from 1
to the total number of training data points
– Calculate the distance between the test data and each
row of training data.
– Here we use Euclidean distance as our
distance metric since it is the most popular
choice.
– The other metrics that can be used are
Manhattan, Minkowski, Chebyshev, cosine, etc.
– Sort the calculated distances in ascending order
based on distance values
– Get top k rows from the sorted array
– Get the most frequent class of these rows
– Return the predicted class
• Euclidean distance: $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
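The pseudocode can be turned into runnable Python roughly as follows. This is a sketch under assumptions: the names `euclidean_distance`, `manhattan_distance` and `predict`, and the small `rows` list, are illustrative and not part of the original material. The distance function is passed in as a parameter so that another metric, such as Manhattan, can be swapped in.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (an alternative metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def predict(training_rows, test_point, k, distance=euclidean_distance):
    """training_rows: list of (features, label). Returns the predicted label."""
    # Calculate the distance between the test point and each training row.
    scored = [(distance(features, test_point), label)
              for features, label in training_rows]
    # Sort the calculated distances in ascending order.
    scored.sort(key=lambda pair: pair[0])
    # Get the top k rows and return their most frequent class.
    top_k_labels = [label for _, label in scored[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]


# Hypothetical training rows, invented for illustration.
rows = [((0, 0), "x"), ((0, 1), "x"), ((5, 5), "y"), ((6, 5), "y"), ((5, 6), "y")]

print(euclidean_distance((0, 0), (3, 4)))                   # prints 5.0
print(predict(rows, (1, 1), k=3))                           # prints x
print(predict(rows, (1, 1), k=3, distance=manhattan_distance))  # prints x
```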
DATASET
A startup company’s record of product acceptances and rejections
is tabulated in the table given below. Advise the company
whether the new product Prod5, with Attrib1=3 and
Attrib2=7, will be accepted or rejected, using a similarity-based
learning algorithm that considers the 2 nearest neighbours.
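The procedure for this exercise can be sketched as below. The table itself is not reproduced in this text, so the four training rows in the code are invented placeholders, not the actual data, and the printed verdict applies only to those placeholders. The function name `two_nearest_verdict` is also an assumption. Substitute the real Attrib1, Attrib2 and outcome values to solve the stated problem.

```python
import math
from collections import Counter


def two_nearest_verdict(table, new_product, k=2):
    """Return (names of the k nearest products, majority outcome)."""
    # Rank existing products by Euclidean distance to the new product.
    ranked = sorted(table, key=lambda name: math.dist(table[name][0], new_product))
    neighbours = ranked[:k]
    votes = Counter(table[name][1] for name in neighbours)
    # On a tie (possible with k=2), most_common keeps first-seen order,
    # so the outcome of the nearer neighbour is returned.
    return neighbours, votes.most_common(1)[0][0]


# PLACEHOLDER rows (hypothetical): name -> ((Attrib1, Attrib2), outcome).
# These values are made up; they are NOT the table from the exercise.
placeholder_table = {
    "Prod1": ((2, 6), "Accepted"),
    "Prod2": ((4, 8), "Accepted"),
    "Prod3": ((8, 2), "Rejected"),
    "Prod4": ((9, 4), "Rejected"),
}

prod5 = (3, 7)  # Attrib1=3, Attrib2=7, as given in the problem statement
neighbours, verdict = two_nearest_verdict(placeholder_table, prod5, k=2)
print(neighbours, verdict)
```

Because K=2 is even, the two neighbours can disagree. The sketch resolves that by favouring the nearer one, which is only one of several reasonable tie-breaking rules.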