📊 PYTHON + AI TIP
🧮 How Does the Machine Learn by Proximity – Mathematics and Implementation of KNN (K Nearest Neighbors)


📰 Edition #52 — PYTHON + AI TIP - How Does the Machine Learn by Proximity – Mathematics and Implementation of KNN (K Nearest Neighbors)


🎯 1. OBJECTIVE

Understand how the KNN algorithm classifies new data points by comparing a manual implementation of the formula with the scikit-learn library function, including:

  • Mathematical formula with variable explanations
  • Python function and explanation
  • Machine learning process step-by-step
  • Realistic practical application


🧠 2. CONCEPT

KNN (K Nearest Neighbors) is a supervised learning algorithm based on geometric proximity. Its principle is simple:

  • When receiving a new data point, KNN does not perform model fitting or prior training.
  • It stores the training data and, at prediction time, calculates the distance between the new point and all known points.
  • Then, it selects the k closest points (neighbors) and determines the most frequent class among them to assign to the new point.

📍 Conceptual summary: ➔ KNN classifies by spatial similarity, working like a “memory consultant” that decides based on proximity to past examples.


🗂️ 3. REAL CASE STUDY SCENARIO

Imagine you work in a retail store segmenting customers by:

  • Spending Score (monthly spending)
  • Visit Frequency (visits per month)

Your goal is to classify a new customer to recommend targeted promotions.


📝 4. MATHEMATICAL FORMULA AND VARIABLES

🔢 Euclidean Distance Formula


dist(p₁, p₂) = √((x₁ − x₂)² + (y₁ − y₂)²)

➔ Variable explanations:

  • x₁, y₁: Coordinates of the new customer
  • x₂, y₂: Coordinates of a training customer
  • dist: Euclidean distance between p₁ and p₂

📍 Interpretation: Measures geometric proximity, the foundation of KNN decisions.
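
➔ Quick worked example (a minimal sketch using the same customer coordinates that appear in the script of Section 6: the new customer at (20, 4) and the training customer at (18, 3)):

import numpy as np

p1 = np.array([20, 4])   # new customer: [Spending Score, Visit Frequency]
p2 = np.array([18, 3])   # one training customer
dist = np.sqrt(np.sum((p1 - p2) ** 2))
print(dist)  # sqrt((20-18)² + (4-3)²) = sqrt(5) ≈ 2.236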


🛠️ 5. PYTHON FUNCTION THAT AUTOMATES THE FORMULA

🔧 Library: sklearn.neighbors.KNeighborsClassifier

➔ What it does:

  • Implements KNN efficiently
  • Stores data with .fit()
  • Calculates distances and performs voting with .predict()

➔ Why use it:

  • Avoids manual calculations
  • Scales for large datasets
  • Produces standardized, reliable results
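
➔ Beyond n_neighbors, the classifier exposes a few commonly used options. A minimal sketch (the parameter values below are illustrative, not tuned for this dataset):

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(
    n_neighbors=5,        # k: how many neighbors vote
    weights='distance',   # closer neighbors count more ('uniform' gives equal votes)
    metric='minkowski',   # distance metric; with p=2 this is the Euclidean distance
    p=2
)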


💻 6. COMPLETE PYTHON SCRIPT – MANUAL VS BUILT-IN FUNCTION

# 🧠 KNN: Manual calculation vs sklearn function implementation
import numpy as np
from collections import Counter
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier


# 🔢 Training data (customer segments)
# Each row represents [Spending Score, Visit Frequency]
X_train = np.array([[15, 2], [18, 3], [21, 3], [30, 8], [35, 10]])
y_train = np.array(['Low', 'Low', 'Low', 'High', 'High'])

# 🎯 New customer to classify
x_new = np.array([[20, 4]]) 

# ⚙️ Manual Euclidean distance function
def euclidean_distance(p1, p2):
    # Calculates the geometric distance between two points
    return np.sqrt(np.sum((p1 - p2) ** 2))
 
# 🧠 Manual KNN implementation
def knn_predict(x_new, X_train, y_train, k=3):
    # 1. Calculate distances from x_new to each training point
    distances = [euclidean_distance(x_new[0], x) for x in X_train]
   
    # 2. Sort distances and get indices of k nearest neighbors
    k_indices = np.argsort(distances)[:k]

    # 3. Retrieve the labels of the k nearest neighbors
    k_labels = y_train[k_indices]
   
    # 4. Perform majority voting to predict the class
    return Counter(k_labels).most_common(1)[0][0] 

# 🔍 Prediction using manual KNN
pred_manual = knn_predict(x_new, X_train, y_train, k=3)
print(f"[Manual KNN] Predicted class: {pred_manual}") 

# 📊 Plotting the manual method result
plt.scatter(X_train[:,0], X_train[:,1],
            c=['blue' if label=='Low' else 'red' for label in y_train],
            label='Training Data')
plt.scatter(x_new[:,0], x_new[:,1],
            c='green', marker='*', s=200, label='New Customer')
plt.title('Manual KNN Prediction')
plt.xlabel('Spending Score')
plt.ylabel('Visit Frequency')
plt.legend()
plt.show() 

# 🛠️ Using sklearn KNeighborsClassifier implementation
knn = KNeighborsClassifier(n_neighbors=3) 

# Fitting the model with training data
knn.fit(X_train, y_train)

# Predicting the class of the new customer
pred_sklearn = knn.predict(x_new)
print(f"[Sklearn KNN] Predicted class: {pred_sklearn[0]}")

# 📊 Plotting the sklearn method result
plt.scatter(X_train[:,0], X_train[:,1],
            c=['blue' if label=='Low' else 'red' for label in y_train],
            label='Training Data')
plt.scatter(x_new[:,0], x_new[:,1],
            c='purple', marker='*', s=200, label='New Customer')
plt.title('Sklearn KNN Prediction')
plt.xlabel('Spending Score')
plt.ylabel('Visit Frequency')
plt.legend()
plt.show()        

🧠 7. DETAILED MACHINE LEARNING PROCESS EXPLANATION

🔍 Line-by-line explanation:

➔ 7.1 distances = [euclidean_distance(x_new[0], x) for x in X_train]

Calculates distances from the new point to all training points (magic moment: determines proximity).

➔ 7.2 k_indices = np.argsort(distances)[:k]

Sorts distances and selects indices of k nearest neighbors (defines who votes).

➔ 7.3 k_labels = y_train[k_indices]

Extracts class labels of neighbors (prepares for final decision).

➔ 7.4 Counter(k_labels).most_common(1)[0][0]

Performs majority vote (final decision step).

➔ 7.5 sklearn .fit() and .predict()

Automates the entire process (magic: automated proximity + voting).


🧩 8. MOMENT OF LEARNING – Where the AI Actually “Decides”

The learning in KNN occurs at prediction time, unlike traditional models that perform training to adjust weights and minimize loss functions. In KNN:

  • There is no formal training phase.
  • The model simply stores the training data in memory and uses this historical dataset to classify new points.

🔍 Code lines evidencing the learning moment

 Line 1 – Distance Calculation

 distances = [euclidean_distance(x_new[0], x) for x in X_train]

✔ What it does: Calculates the Euclidean distance from the new customer to each training point.
✔ Why it matters: This is the first decision step. Here, the model “observes” how close each example is to the new data point.
✔ Technical concept: Geometric proximity. KNN depends entirely on the chosen distance metric (e.g., Euclidean, Manhattan) to base its classification.
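
In the manual version, experimenting with a different metric only means swapping the distance function. A minimal sketch of a Manhattan alternative (an illustration, not part of the original script):

import numpy as np

def manhattan_distance(p1, p2):
    # Sum of absolute coordinate differences (L1 norm) instead of the Euclidean L2 norm
    return np.sum(np.abs(p1 - p2))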

 Line 2 – Selecting the k Nearest Neighbors

 k_indices = np.argsort(distances)[:k]

✔ What it does: Sorts distances in ascending order and selects the indices of the k closest neighbors.
✔ Why it matters: Defines who will vote in the final classification.
✔ Technical concept: This step determines the local decision space – the sample is classified based only on its nearest neighbors, not the entire dataset.
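
To make the role of np.argsort concrete, a tiny standalone example with hypothetical distances:

import numpy as np

distances = [4.1, 1.0, 2.2, 7.5, 0.5]   # hypothetical distances to 5 training points
print(np.argsort(distances)[:3])        # -> [4 1 2]: indices of the 3 closest points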

 Line 3 – Extracting Neighbor Classes

 k_labels = y_train[k_indices]

✔ What it does: Retrieves the labels (classes) of the k selected neighbors.
✔ Why it matters: Forms the basis for the majority vote.
✔ Technical concept: The model uses memory from the training examples, reinforcing KNN’s definition as an instance-based learning model.

 Line 4 – Majority Voting (Final Decision)

 return Counter(k_labels).most_common(1)[0][0]

✔ What it does: Counts the frequency of each class among the neighbors and returns the most common one.
✔ Why it matters: This is the “magic moment” of KNN learning, where the actual classification decision is made.
✔ Technical concept: KNN’s learning is not about parameter optimization but about efficiently querying stored data and deciding by similarity.
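
The voting step is easy to verify in isolation. For example, with a hypothetical set of neighbor labels:

from collections import Counter

k_labels = ['Low', 'High', 'Low']              # hypothetical labels of the 3 nearest neighbors
print(Counter(k_labels).most_common(1))        # -> [('Low', 2)]
print(Counter(k_labels).most_common(1)[0][0])  # -> 'Low' (the majority class)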

 🧠 Conceptual Summary

 🔹 KNN learning is considered lazy learning because:

  • It does not train in advance (no model fitting).
  • It decides at prediction time by calculating distances and voting based on stored examples.
  • All of the model’s intelligence is concentrated in these steps: calculating distances, selecting neighbors, and majority voting.


 ✅ Practical Conclusion

 KNN learns by direct comparison, without abstracting functions or creating complex mathematical generalizations.

➔ Its strength lies in its simplicity and geometric clarity, making it excellent for small datasets where spatial relationships between points are clear.


🗂️ 9. TECHNICAL SUMMARY – SCRIPT EXPLANATION TABLE

Line of Code ➔ Function:

  • def euclidean_distance(p1, p2): ➔ Defines distance function (fundamental for proximity logic).
  • distances = [euclidean_distance(x_new[0], x) for x in X_train] ➔ Calculates distances to all points (magic: determines closeness).
  • k_indices = np.argsort(distances)[:k] ➔ Sorts and finds k closest neighbors (defines neighborhood).
  • k_labels = y_train[k_indices] ➔ Extracts class labels of neighbors (prepares for voting).
  • Counter(k_labels).most_common(1)[0][0] ➔ Voting and final decision (magic: predicted class).
  • knn = KNeighborsClassifier(n_neighbors=3) ➔ Initializes sklearn model.
  • knn.fit(X_train, y_train) ➔ Stores training data.
  • knn.predict(x_new) ➔ Runs prediction (magic: automated proximity + voting).


📍 Important observations:

  • 📌 KNN does not learn in advance – it reacts in real time.
  • 🧭 Decision is based purely on closeness.
  • 🧠 Learning is implicit, by spatial reasoning.
  • 🕵️‍♂️ Noisy or imbalanced data may hinder generalization.


📌 10. WHEN TO USE THIS TYPE OF LEARNING?

Ideal when:

  • ✅ You have small or simple datasets
  • ✅ You want minimal preprocessing
  • ✅ Classes are clearly separated by geometry
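
One practical caveat: because KNN decides purely by distance, a feature with a much larger numeric range can dominate the vote. A common safeguard (a minimal sketch, not part of the original script) is to standardize the features before fitting:

from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Pipeline that rescales features to zero mean / unit variance before KNN
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
# model.fit(X_train, y_train); model.predict(x_new)  # same API as the plain classifier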


 🔎 11. VISUAL INTERPRETATION

🟢 1. Graph – Manual KNN Prediction

  • Visualization: Displays training data points in blue (class Low) and red (class High). The new customer is marked with a green star.
  • Interpretation: The new customer at (20,4) was classified as Low. ➔ This result comes from the manual calculation of Euclidean distances, where its 3 nearest neighbors belong to class Low.


  • Learning Insight: Demonstrates correct implementation of manual KNN, reinforcing the geometric concept of proximity-based classification.


🟣 2. Graph – Sklearn KNN Prediction

  • Visualization: Shows the same training data with the new customer represented by a purple star.
  • Interpretation: The sklearn implementation also classified the new customer as Low. ➔ Confirms that the library function produces the same output as the manual method.


  • Learning Insight: Highlights that sklearn automates distance calculations, neighbor selection, and majority voting internally for efficiency in production.


💻 3. VSCode Terminal – Script Execution Results

  • First execution (manual_knn.py): prints “The predicted class for point [3 4] is: A” ➔ Demonstrates an earlier example using different data.
  • Second execution (knn_manual_function.py): prints “[Manual KNN] Predicted class: Low” and “[Sklearn KNN] Predicted class: Low”
  • Conclusion: ✔ Both manual implementation and sklearn function return the same predicted class (Low). ✔ Confirms correctness of the manual implementation and reliability of the sklearn classifier for real-world use.


✅ Overall Summary

  • Manual implementation builds foundational understanding of KNN.
  • Sklearn provides a scalable, production-ready approach with identical logic.


🛠️ 12. PRACTICAL APPLICATIONS

  • Customer segmentation
  • Medical diagnosis
  • Pattern recognition
  • Recommendation systems


💡 13. EXTRA TIP

Test both methods side by side to strengthen your understanding and ensure consistency in your models.
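
For the script in Section 6, that check can be a single line added at the end (assuming the variables pred_manual and pred_sklearn defined there):

# Sanity check: both implementations should predict the same class
assert pred_manual == pred_sklearn[0], "Manual and sklearn predictions differ"
print("Both methods agree on:", pred_manual)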


📅 14. CTA – Follow & Connect

💼 LinkedIn & Newsletters: 👉 https://www.linkedin.com/in/izairton-oliveira-de-vasconcelos-a1916351/ 👉 https://www.linkedin.com/newsletters/scripts-em-python-produtividad-7287106727202742273 👉 https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7319069038595268608

💼 Company Page: 👉 https://www.linkedin.com/company/106356348/

💻 GitHub: 👉 https://github.com/IOVASCON


 

