📊 PYTHON + AI TIP 🧮 How Does the Machine Learn by Proximity – Mathematics and Implementation of KNN (K Nearest Neighbors)
📰 Edition #52 — PYTHON + AI TIP - How Does the Machine Learn by Proximity – Mathematics and Implementation of KNN (K Nearest Neighbors)
🎯 1. OBJECTIVE
Understand how the KNN algorithm classifies new data points by comparing a manual implementation of the distance formula with the built-in Python (scikit-learn) function.
🧠 2. CONCEPT
KNN (K Nearest Neighbors) is a supervised learning algorithm based on geometric proximity. Its principle is simple: a new point receives the class that is most common among its k nearest neighbors in the training data.
📍 Conceptual summary: ➔ KNN classifies by spatial similarity, working like a “memory consultant” that decides based on proximity to past examples.
🗂️ 3. REAL STUDY CASE SCENARIO
Imagine you work in a retail store segmenting customers by two features: Spending Score and Visit Frequency.
Your goal is to classify a new customer to recommend targeted promotions.
📝 4. MATHEMATICAL FORMULA AND VARIABLES
🔢 Euclidean Distance Formula
dist(p₁, p₂) = √((x₁ − x₂)² + (y₁ − y₂)²)
➔ Variable explanations:
p₁ = (x₁, y₁): first point (e.g., an existing customer described by Spending Score and Visit Frequency)
p₂ = (x₂, y₂): second point (e.g., the new customer to classify)
dist(p₁, p₂): the straight-line (geometric) distance between the two points
📍 Interpretation: Measures geometric proximity, the foundation of KNN decisions.
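To make the formula concrete, here is a minimal sketch that applies it to two of the customer points used later in the full script of section 6:
# ✏️ Minimal sketch: Euclidean distance between two customer points
import numpy as np

p1 = np.array([20, 4])   # new customer: [Spending Score, Visit Frequency]
p2 = np.array([21, 3])   # an existing customer from the training data

dist = np.sqrt(np.sum((p1 - p2) ** 2))
print(dist)  # √((20−21)² + (4−3)²) = √2 ≈ 1.41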
🛠️ 5. PYTHON FUNCTION THAT AUTOMATES THE FORMULA
🔧 Library: sklearn.neighbors.KNeighborsClassifier
➔ What it does: implements the complete KNN workflow (distance calculation, neighbor search, and majority voting) behind the fit() and predict() methods.
➔ Why use it: avoids re-implementing the distance and voting logic by hand, relies on optimized neighbor-search routines, and integrates with the rest of the scikit-learn API.
💻 6. COMPLETE PYTHON SCRIPT – MANUAL VS BUILT-IN FUNCTION
# 🧠 KNN: Manual calculation vs sklearn function implementation
import numpy as np
from collections import Counter
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
# 🔢 Training data (customer segments)
# Each row represents [Spending Score, Visit Frequency]
X_train = np.array([[15, 2], [18, 3], [21, 3], [30, 8], [35, 10]])
y_train = np.array(['Low', 'Low', 'Low', 'High', 'High'])
# 🎯 New customer to classify
x_new = np.array([[20, 4]])
# ⚙️ Manual Euclidean distance function
def euclidean_distance(p1, p2):
    # Calculates the geometric distance between two points
    return np.sqrt(np.sum((p1 - p2) ** 2))
# 🧠 Manual KNN implementation
def knn_predict(x_new, X_train, y_train, k=3):
    # 1. Calculate distances from x_new to each training point
    distances = [euclidean_distance(x_new[0], x) for x in X_train]
    # 2. Sort distances and get indices of k nearest neighbors
    k_indices = np.argsort(distances)[:k]
    # 3. Retrieve the labels of the k nearest neighbors
    k_labels = y_train[k_indices]
    # 4. Perform majority voting to predict the class
    return Counter(k_labels).most_common(1)[0][0]
# 🔍 Prediction using manual KNN
pred_manual = knn_predict(x_new, X_train, y_train, k=3)
print(f"[Manual KNN] Predicted class: {pred_manual}")
# 📊 Plotting the manual method result
plt.scatter(X_train[:,0], X_train[:,1],
c=['blue' if label=='Low' else 'red' for label in y_train],
label='Training Data')
plt.scatter(x_new[:,0], x_new[:,1],
c='green', marker='*', s=200, label='New Customer')
plt.title('Manual KNN Prediction')
plt.xlabel('Spending Score')
plt.ylabel('Visit Frequency')
plt.legend()
plt.show()
# 🛠️ Using sklearn KNeighborsClassifier implementation
knn = KNeighborsClassifier(n_neighbors=3)
# Fitting the model with training data
knn.fit(X_train, y_train)
# Predicting the class of the new customer
pred_sklearn = knn.predict(x_new)
print(f"[Sklearn KNN] Predicted class: {pred_sklearn[0]}")
# 📊 Plotting the sklearn method result
plt.scatter(X_train[:,0], X_train[:,1],
c=['blue' if label=='Low' else 'red' for label in y_train],
label='Training Data')
plt.scatter(x_new[:,0], x_new[:,1],
c='purple', marker='*', s=200, label='New Customer')
plt.title('Sklearn KNN Prediction')
plt.xlabel('Spending Score')
plt.ylabel('Visit Frequency')
plt.legend()
plt.show()
🧠 7. DETAILED MACHINE LEARNING PROCESS EXPLANATION
🔍 Line-by-line explanation:
➔ 7.1 distances = [euclidean_distance(x_new[0], x) for x in X_train]
Calculates distances from the new point to all training points (magic moment: determines proximity).
➔ 7.2 k_indices = np.argsort(distances)[:k]
Sorts distances and selects indices of the k nearest neighbors (defines who votes).
➔ 7.3 k_labels = y_train[k_indices]
Extracts the class labels of the neighbors (prepares for the final decision).
➔ 7.4 Counter(k_labels).most_common(1)[0][0]
Performs the majority vote (final decision step).
➔ 7.5 sklearn .fit() and .predict()
Automate the entire process (magic: automated proximity + voting).
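As a side note (a minimal sketch, not part of the original script), sklearn also exposes the neighbor search directly through the kneighbors() method, which lets you verify that the library selects the same three neighbors as the manual np.argsort step:
# 🔎 Sketch: inspecting the neighbors sklearn actually uses
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[15, 2], [18, 3], [21, 3], [30, 8], [35, 10]])
y_train = np.array(['Low', 'Low', 'Low', 'High', 'High'])
x_new = np.array([[20, 4]])

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
distances_skl, indices_skl = knn.kneighbors(x_new)
print(indices_skl)    # indices of the 3 nearest training points
print(distances_skl)  # their Euclidean distances, matching the manual calculation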
🧩 8. MOMENT OF LEARNING – Where the AI Actually “Decides”
The learning in KNN occurs at prediction time, unlike traditional models that perform training to adjust weights and minimize loss functions. In KNN, the “training” phase simply stores the examples, and all of the real work (measuring distances, selecting neighbors, and voting) happens only when a prediction is requested.
🔍 Code lines evidencing the learning moment
Line 1 – Distance Calculation
distances = [euclidean_distance(x_new[0], x) for x in X_train]
✔ What it does: Calculates the Euclidean distance from the new customer to each training point.
✔ Why it matters: This is the first decision step. Here, the model “observes” how close each example is to the new data point.
✔ Technical concept: Geometric proximity. KNN depends entirely on the chosen distance metric (e.g., Euclidean, Manhattan) to make its classification.
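Since the metric choice matters, here is a small hedged sketch comparing Euclidean and Manhattan distances for the same pair of points (the Manhattan variant is shown only for illustration; the article's script uses the default Euclidean metric):
# 📏 Sketch: Euclidean vs Manhattan distance for the same pair of points
import numpy as np

p1 = np.array([20, 4])   # new customer
p2 = np.array([15, 2])   # a 'Low' training customer

euclidean = np.sqrt(np.sum((p1 - p2) ** 2))   # √(25 + 4) ≈ 5.39
manhattan = np.sum(np.abs(p1 - p2))           # |20−15| + |4−2| = 7
print(euclidean, manhattan)
In sklearn the metric can be swapped with KNeighborsClassifier(metric='manhattan') if a different notion of proximity fits the data better.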
Line 2 – Selecting the k Nearest Neighbors
k_indices = np.argsort(distances)[:k]
✔ What it does: Sorts distances in ascending order and selects the indices of the k closest neighbors.
✔ Why it matters: Defines who will vote in the final classification.
✔ Technical concept: This step determines the local decision space: the sample is classified based only on its nearest neighbors, not the entire dataset.
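Using the approximate distances produced by this dataset, a tiny sketch of what np.argsort returns at this step:
# 🔢 Sketch: selecting the k nearest neighbors with argsort
import numpy as np

# Approximate distances from the new customer [20, 4] to the five training points
distances = [5.39, 2.24, 1.41, 10.77, 16.16]

k_indices = np.argsort(distances)[:3]
print(k_indices)  # [2 1 0] -> the three closest customers, all labeled 'Low'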
Line 3 – Extracting Neighbor Classes
k_labels = y_train[k_indices]
✔ What it does: Retrieves the labels (classes) of the k selected neighbors.
✔ Why it matters: Forms the basis for the majority vote.
✔ Technical concept: The model uses memory of the training examples, reinforcing KNN’s definition as an instance-based learning model.
Line 4 – Majority Voting (Final Decision)
return Counter(k_labels).most_common(1)[0][0]
✔ What it does: Counts the frequency of each class among the neighbors and returns the most common one.
✔ Why it matters: This is the “magic moment” of KNN learning, where the actual classification decision is made.
✔ Technical concept: KNN’s learning is not about parameter optimization but about efficiently querying stored data and deciding by similarity.
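A tiny sketch of the vote itself, using the neighbor labels found in this example (all three nearest neighbors are 'Low'):
# 🗳️ Sketch: majority voting with Counter
from collections import Counter

k_labels = ['Low', 'Low', 'Low']            # labels of the 3 nearest neighbors
votes = Counter(k_labels).most_common(1)    # [('Low', 3)]
print(votes[0][0])                          # 'Low' -> the predicted class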
🧠 Conceptual Summary
🔹 KNN learning is considered lazy learning because:
➔ it does not fit parameters or minimize a loss function during training;
➔ it simply stores the training examples in memory;
➔ all real computation (distances, neighbor selection, voting) is deferred to prediction time.
✅ Practical Conclusion
➔ KNN learns by direct comparison, without abstracting functions or creating complex mathematical generalizations.
➔ Its strength lies in its simplicity and geometric clarity, making it excellent for small datasets where spatial relationships between points are clear.
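If you want to check how sensitive the decision is to the choice of k, a quick hedged sketch (reusing the training data from section 6) is to re-run the prediction for a few values of k; remember that k cannot exceed the number of training samples:
# 🔁 Sketch: testing different values of k
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[15, 2], [18, 3], [21, 3], [30, 8], [35, 10]])
y_train = np.array(['Low', 'Low', 'Low', 'High', 'High'])
x_new = np.array([[20, 4]])

for k in (1, 3, 5):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, knn.predict(x_new)[0])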
🗂️ 9. TECHNICAL SUMMARY – SCRIPT EXPLANATION TABLE
Line of Code ➔ Function
euclidean_distance(p1, p2) ➔ Computes the Euclidean distance between two points
np.argsort(distances)[:k] ➔ Sorts distances and selects the indices of the k nearest neighbors
y_train[k_indices] ➔ Retrieves the labels of those neighbors
Counter(k_labels).most_common(1)[0][0] ➔ Performs the majority vote and returns the predicted class
KNeighborsClassifier(n_neighbors=3) ➔ Creates the sklearn model configured with k = 3
knn.fit(X_train, y_train) ➔ Stores the training data (no weight optimization)
knn.predict(x_new) ➔ Runs the distance calculation and voting automatically
📍 Important observations: the manual implementation and the sklearn model apply the same steps (distance, neighbor selection, majority vote), so both return the same class for the new customer.
📌 10. WHEN TO USE THIS TYPE OF LEARNING?
Ideal when:
➔ the dataset is small enough to keep in memory and query quickly;
➔ the spatial relationships between points are clear and meaningful;
➔ you want a simple, interpretable baseline without a separate training/optimization phase.
🔎 11. VISUAL INTERPRETATION
🟢 1. Graph – Manual KNN Prediction
🟣 2. Graph – Sklearn KNN Prediction
💻 3. VSCode Terminal – Script Execution Results
✅ Overall Summary: in both graphs the new customer (star marker) sits right next to the 'Low' cluster, and the terminal output shows both methods agreeing on the same predicted class.
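If you want to reproduce a richer version of these plots, here is a hedged sketch (reusing the data from section 6) that also draws a ring around the three neighbors that actually vote; the ring styling is my own choice, not part of the original script:
# ⭕ Sketch: highlighting the 3 voting neighbors on the scatter plot
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[15, 2], [18, 3], [21, 3], [30, 8], [35, 10]])
y_train = np.array(['Low', 'Low', 'Low', 'High', 'High'])
x_new = np.array([[20, 4]])

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
_, idx = knn.kneighbors(x_new)   # indices of the 3 nearest neighbors

plt.scatter(X_train[:, 0], X_train[:, 1],
            c=['blue' if label == 'Low' else 'red' for label in y_train],
            label='Training Data')
plt.scatter(X_train[idx[0], 0], X_train[idx[0], 1],
            facecolors='none', edgecolors='black', s=300,
            label='3 Nearest Neighbors')
plt.scatter(x_new[:, 0], x_new[:, 1],
            c='green', marker='*', s=200, label='New Customer')
plt.title('KNN Neighborhood View')
plt.xlabel('Spending Score')
plt.ylabel('Visit Frequency')
plt.legend()
plt.show()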
🛠️ 12. PRACTICAL APPLICATIONS
➔ Customer segmentation and targeted promotions (as in this study case)
➔ Recommendation systems based on similar users or items
➔ Simple baselines for pattern and image classification
➔ Anomaly detection based on distance to the nearest known examples
💡 13. EXTRA TIP
Test both methods side by side to strengthen your understanding and ensure consistency in your models.
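One simple way to do that check programmatically (a sketch that assumes pred_manual and pred_sklearn from the section 6 script are still in scope) is an equality assertion:
# ✅ Sketch: confirming the two approaches agree
# Assumes pred_manual and pred_sklearn were produced by the section 6 script
assert pred_manual == pred_sklearn[0], "Manual and sklearn KNN disagree"
print("Both methods agree:", pred_manual)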
📅 14. CTA – Follow & Connect
💼 LinkedIn & Newsletters: 👉 https://www.linkedin.com/in/izairton-oliveira-de-vasconcelos-a1916351/ 👉 https://www.linkedin.com/newsletters/scripts-em-python-produtividad-7287106727202742273 👉 https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7319069038595268608
💼 Company Page: 👉 https://www.linkedin.com/company/106356348/