
k-nearest neighbor algorithm using Sklearn – Python

Last Updated : 23 Apr, 2025

K-Nearest Neighbors (KNN) works by identifying the k nearest data points, called neighbors, to a given input and predicting its class or value from them: the majority class of the neighbors for classification, or the average of their values for regression. In this article we will implement it using Python’s scikit-learn library.
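
To make the idea concrete before turning to scikit-learn, here is a minimal from-scratch sketch of the majority-vote rule. It assumes Euclidean distance and a tiny made-up dataset; the function name `knn_predict` and the toy arrays are illustrative only and are not part of any library.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k training points with the smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those k neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two well-separated clusters
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([1.1, 1.0]), k=3))  # query near the first cluster
print(knn_predict(X_toy, y_toy, np.array([5.1, 5.0]), k=3))  # query near the second cluster
```

For regression, the same neighbor search would apply, with the vote replaced by the mean of the neighbors' target values.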

Implementation of KNN: Step-by-Step

Choosing a suitable k-value before building the model is critical, because k controls the trade-off between overfitting and underfitting.

  • A smaller k value makes the model sensitive to noise, leading to overfitting (complex models).
  • A larger k value results in smoother boundaries, reducing model complexity but possibly underfitting.
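
One rough way to see this trade-off is to compare training and test accuracy for a very small, a moderate, and a very large k. The sketch below uses the Iris dataset and the same split settings as the rest of this article; the particular values 1, 7 and 50 are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

train_acc = {}
test_acc = {}
# Arbitrary small, moderate and large k values, chosen only for illustration
for k in (1, 7, 50):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_acc[k] = model.score(X_train, y_train)
    test_acc[k] = model.score(X_test, y_test)
    print(f"k={k:>2}  train={train_acc[k]:.3f}  test={test_acc[k]:.3f}")
```

With k=1 each training point is its own nearest neighbor, so training accuracy is typically at or near its maximum. That reflects memorization of the training set, not necessarily good generalization.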
Python
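# Import necessary modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Load the Iris dataset
irisData = load_iris()

# Create feature and target arrays
X = irisData.data
y = irisData.target

# Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Create the classifier with k = 7 neighbors
knn = KNeighborsClassifier(n_neighbors=7)

# Fit the model to the training data
knn.fit(X_train, y_train)

# Predict the class labels of the test data
print(knn.predict(X_test))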

The example above performs the following steps:

  1. Import the k-nearest neighbors classifier from the scikit-learn package.
  2. Create the feature and target variables.
  3. Split the data into training and test sets.
  4. Create a k-NN model with the chosen number of neighbors.
  5. Fit the model to the training data.
  6. Predict the classes of the test data.
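
The same fitted model can also classify a single new observation. The sketch below is an illustration only: the measurements in `new_flower` are hypothetical values, and `predict_proba` is used to show the fraction of the 7 neighbors that voted for each class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# Hypothetical measurements in cm:
# sepal length, sepal width, petal length, petal width
new_flower = np.array([[5.1, 3.5, 1.4, 0.2]])

# predict returns the class index; target_names maps it to a species name
predicted_index = knn.predict(new_flower)[0]
predicted_name = iris.target_names[predicted_index]

# Share of the k neighbors belonging to each class (uniform weights by default)
probabilities = knn.predict_proba(new_flower)[0]

print(predicted_name)
print(probabilities)
```

Note that the input must be two-dimensional (one row per sample), even when predicting for a single flower.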

We have seen how to use the k-NN algorithm to solve a supervised machine learning problem. But how do we measure the accuracy of the model?

The example below measures the performance of the above model on the test set:

Python
# Import necessary modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Loading data
irisData = load_iris()

# Create feature and target arrays
X = irisData.data
y = irisData.target

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(
             X, y, test_size = 0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=7)

knn.fit(X_train, y_train)

# Calculate the accuracy of the model
print(knn.score(X_test, y_test))
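
For a classifier, `score` returns the mean accuracy, that is, the fraction of test samples predicted correctly. As a sketch of what that number means, the snippet below recomputes it by hand and with `accuracy_score`, and adds a confusion matrix to show which classes get mixed up. The variable names `manual_accuracy`, `library_accuracy` and `cm` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Fraction of test samples whose predicted label matches the true label
manual_accuracy = (y_pred == y_test).mean()
# The same quantity computed by scikit-learn
library_accuracy = accuracy_score(y_test, y_pred)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)

print(manual_accuracy, library_accuracy)
print(cm)
```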

  
So far so good. But how do we decide the right k-value for the dataset?

Familiarity with the data helps narrow down the range of plausible k-values, but to find the best one we need to evaluate the model for every k-value in that range. Refer to the example shown below.
 

Python
# Import necessary modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import numpy as np
import matplotlib.pyplot as plt

irisData = load_iris()

# Create feature and target arrays
X = irisData.data
y = irisData.target

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(
             X, y, test_size = 0.2, random_state=42)

neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

# Loop over K values
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    
    # Compute training and test data accuracy
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)

# Generate plot
plt.plot(neighbors, test_accuracy, label = 'Testing dataset Accuracy')
plt.plot(neighbors, train_accuracy, label = 'Training dataset Accuracy')

plt.legend()
plt.xlabel('n_neighbors')
plt.ylabel('Accuracy')
plt.show()

Output: a line plot of training and testing accuracy against n_neighbors.

In the example above, we plot the accuracy for each k-value so that we can see which k gives the highest testing accuracy.
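
Picking k directly from test-set accuracy can make the final accuracy estimate optimistic, because the test data has then influenced the model choice. One common alternative, sketched below, is to select k with cross-validation on the training set only and to use the test set once at the end. The choice of 5 folds and the names `cv_means` and `best_k` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

neighbors = np.arange(1, 9)
cv_means = np.empty(len(neighbors))

# Mean 5-fold cross-validated accuracy on the training set for each k
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=int(k))
    scores = cross_val_score(knn, X_train, y_train, cv=5)
    cv_means[i] = scores.mean()

# k with the highest mean cross-validated accuracy (first one on ties)
best_k = int(neighbors[np.argmax(cv_means)])

# Refit on the full training set and evaluate once on the held-out test set
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_score = final_model.score(X_test, y_test)

print(best_k, test_score)
```

On a dataset as small as Iris, several k-values often score almost identically, so the selected k should be read as one reasonable choice, not a uniquely correct answer.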


