Recommender Systems using KNN
Last Updated: 11 Jun, 2024
Recommender systems are widely used in various applications, such as e-commerce, entertainment, and social media, to provide personalized recommendations to users. One common approach to building recommender systems is using the K-Nearest Neighbors (KNN) algorithm. This method leverages the similarity between users or items to generate recommendations.
Overview of K-Nearest Neighbors (KNN)
KNN is a simple, non-parametric, instance-based learning algorithm that can be used for classification and regression. In recommender systems, KNN finds the closest neighbors (either users or items) under a similarity metric, and recommendations are then drawn from the preferences of those neighbors.
Types of Recommender Systems
- User-Based Collaborative Filtering: This approach recommends items to a user by finding other users (neighbors) with similar preferences and suggesting items those neighbors liked.
- Item-Based Collaborative Filtering: This approach recommends items based on the similarity between items. It finds items similar to those the user has liked in the past.
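As a rough illustration of the difference between the two approaches, the sketch below assumes a small made-up rating matrix (rows are users, columns are items, 0 means unrated). User-based filtering compares rows, while item-based filtering compares columns, which is the same as comparing the rows of the transposed matrix. The rating values and the cosine_similarity_matrix helper are illustrative assumptions, not part of any library.

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = items, 0 = not rated
ratings = np.array([
    [5, 4, 0],
    [4, 5, 1],
    [0, 1, 5],
    [1, 0, 4],
], dtype=float)

def cosine_similarity_matrix(rows):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    unit = rows / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return unit @ unit.T

# User-based: compare users (rows) with each other
user_similarity = cosine_similarity_matrix(ratings)    # shape (4, 4)
# Item-based: compare items (columns) with each other
item_similarity = cosine_similarity_matrix(ratings.T)  # shape (3, 3)

print(user_similarity.shape, item_similarity.shape)
```

The only change between the two variants here is which axis of the matrix is treated as the set of vectors to compare.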
Steps to Build a KNN-Based Recommender System
1. Data Collection and Preprocessing
Collect data on user interactions with items, such as ratings, purchases, or clicks. Preprocess the data to handle missing values, normalize ratings, and transform the data into a user-item matrix.
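A minimal sketch of the transformation into a user-item matrix is shown here, assuming the raw data arrives as (user, item, rating) records. The user names, item names, and ratings are made up for illustration.

```python
import numpy as np

# Hypothetical interaction log: (user_id, item_id, rating)
interactions = [
    ("alice", "book", 5),
    ("alice", "lamp", 3),
    ("bob", "book", 4),
    ("bob", "desk", 2),
    ("carol", "lamp", 1),
]

# Map raw IDs to consecutive row and column indices
user_ids = sorted({user for user, _, _ in interactions})
item_ids = sorted({item for _, item, _ in interactions})
user_index = {user: row for row, user in enumerate(user_ids)}
item_index = {item: col for col, item in enumerate(item_ids)}

# Fill the user-item matrix; 0 marks a missing rating
matrix = np.zeros((len(user_ids), len(item_ids)))
for user, item, rating in interactions:
    matrix[user_index[user], item_index[item]] = rating

print(matrix)
```

Keeping the user_index and item_index mappings around makes it possible to translate recommended column indices back to the original item IDs later.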
2. Similarity Computation
Calculate the similarity between users or items using a similarity metric. Common metrics include:
- Cosine Similarity: Measures the cosine of the angle between two vectors.
- Pearson Correlation: Measures the linear correlation between two vectors.
- Jaccard Similarity: Measures the similarity between two sets.
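The three metrics can be sketched in a few lines of NumPy. The function names below are chosen for illustration, and the sketch assumes non-zero, non-constant vectors and non-empty sets so that no division by zero occurs.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson_correlation(a, b):
    """Linear correlation: cosine similarity of the mean-centered vectors."""
    return cosine_similarity(a - a.mean(), b - b.mean())

def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

u = np.array([5.0, 3.0, 4.0, 1.0])
v = np.array([4.0, 2.0, 5.0, 1.0])

print(cosine_similarity(u, v))
print(pearson_correlation(u, v))
# 2 shared items out of 4 distinct items gives 0.5
print(jaccard_similarity({"book", "lamp", "desk"}, {"book", "desk", "sofa"}))
```

Cosine and Pearson work on rating vectors, while Jaccard suits binary interactions such as purchases or clicks, where only the set of items matters.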
3. Finding Neighbors
For each user or item, find the K-nearest neighbors based on the computed similarity scores. This involves sorting the similarity scores and selecting the top K neighbors.
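The sorting and selection described above can be sketched as follows. The top_k_neighbors helper and the similarity values are hypothetical; the sketch also drops the target itself, since a user or item is always most similar to itself.

```python
import numpy as np

def top_k_neighbors(similarity_row, target_index, k):
    """Return the indices of the k most similar entries, excluding the target."""
    order = np.argsort(similarity_row)[::-1]   # most similar first
    order = order[order != target_index]       # drop the target itself
    return order[:k]

# Hypothetical similarity scores between user 0 and users 0..4
similarity_to_user_0 = np.array([1.0, 0.2, 0.9, -0.4, 0.6])

neighbors = top_k_neighbors(similarity_to_user_0, target_index=0, k=2)
print(neighbors)  # prints [2 4]
```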
4. Generating Recommendations
Based on the preferences of the K-nearest neighbors, generate recommendations for the user. This can be done by aggregating the ratings or interactions of the neighbors and recommending items with the highest aggregated scores.
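One common way to aggregate, sketched here as an assumption rather than the only option, is a similarity-weighted average that ignores neighbors who have not rated an item. The predict_scores helper and the input values are illustrative, and the sketch assumes positive similarity weights.

```python
import numpy as np

def predict_scores(neighbor_ratings, neighbor_similarities):
    """Similarity-weighted average of neighbor ratings, ignoring unrated (0) entries."""
    rated = neighbor_ratings > 0
    weights = neighbor_similarities[:, None] * rated   # zero weight where unrated
    weight_sum = weights.sum(axis=0)
    weighted_total = (weights * neighbor_ratings).sum(axis=0)
    # Items that no neighbor rated get a score of 0
    return np.divide(weighted_total, weight_sum,
                     out=np.zeros_like(weighted_total), where=weight_sum > 0)

# Two hypothetical neighbors rating three items (0 = not rated)
neighbor_ratings = np.array([
    [5.0, 0.0, 2.0],
    [3.0, 4.0, 0.0],
])
neighbor_similarities = np.array([0.75, 0.25])

scores = predict_scores(neighbor_ratings, neighbor_similarities)
print(scores)  # prints [4.5 4.  2. ]
```

Weighting by similarity lets closer neighbors influence the score more, and skipping unrated entries keeps missing ratings from being read as dislikes.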
Code Implementation of a KNN-Based Recommender System
Step 1: Import Libraries
First, we import the necessary libraries. numpy is used for numerical operations, and NearestNeighbors from scikit-learn is used to find the nearest neighbors based on cosine similarity.
Python
import numpy as np
from sklearn.neighbors import NearestNeighbors
Step 2: Create User-Item Interaction Matrix
We create a user-item interaction matrix. This matrix represents the ratings given by users to items, where rows represent users and columns represent items. A value of 0 indicates that the user has not rated the item.
Python
# Example user-item interaction matrix (ratings from 1 to 5, 0 means no rating)
user_item_matrix = np.array([
[4, 0, 0, 5, 1],
[5, 5, 4, 0, 0],
[0, 0, 0, 2, 4],
[0, 3, 0, 0, 5],
[5, 0, 4, 0, 0]
])
Step 3: Normalize the Matrix
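We normalize the user-item matrix by subtracting each user's mean rating from that user's ratings. This accounts for differences in rating behavior, such as users who rate everything high or everything low. The mean is taken over rated items only, and unrated entries stay at 0 so that missing ratings do not distort the result. This assumes every user has rated at least one item.
Python
# Normalize by subtracting each user's mean rating, computed over rated items only
rated_mask = user_item_matrix > 0
mean_user_rating = (user_item_matrix.sum(axis=1) / rated_mask.sum(axis=1)).reshape(-1, 1)
# Unrated entries stay at 0 instead of becoming negative
normalized_matrix = np.where(rated_mask, user_item_matrix - mean_user_rating, 0.0)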
Step 4: Fit the KNN Model
We fit the KNN model on the normalized user-item matrix. The metric='cosine' parameter makes scikit-learn rank neighbors by cosine distance (1 minus cosine similarity), so a smaller distance means a more similar user.
Python
# Fit the KNN model
knn = NearestNeighbors(metric='cosine', algorithm='brute')
knn.fit(normalized_matrix)
Step 5: Find Nearest Neighbors
For a target user (e.g., user index 0), we find the k nearest neighbors. In this example, we choose n_neighbors=3.
Python
# Find the k nearest neighbors for a target user (e.g., user index 0)
target_user_index = 0
distances, indices = knn.kneighbors(normalized_matrix[target_user_index].reshape(1, -1), n_neighbors=3)
Step 6: Aggregate Ratings from Neighbors
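We aggregate the ratings of the nearest neighbors to predict ratings for the target user. Because the target user is part of the fitted data, it is returned as its own nearest neighbor, so we drop it before averaging. Unrated entries count as 0 in this simple average, which pulls the score down for items that few neighbors have rated.
Python
# Drop the target user, who is returned as their own nearest neighbor
neighbor_indices = indices.flatten()
neighbor_indices = neighbor_indices[neighbor_indices != target_user_index]
# Average the remaining neighbors' ratings (unrated entries count as 0)
neighbors_ratings = user_item_matrix[neighbor_indices]
predicted_ratings = neighbors_ratings.mean(axis=0)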
Step 7: Recommend Items
We identify items that the target user has not rated (i.e., entries with a value of 0). We then recommend the items with the highest predicted ratings.
Python
# Recommend items with the highest predicted ratings that the target user hasn't rated
unrated_items = np.where(user_item_matrix[target_user_index] == 0)[0]
recommended_items = unrated_items[np.argsort(predicted_ratings[unrated_items])[::-1]]
print(f"Recommended items for user {target_user_index}: {recommended_items}")
Output:
Recommended items for user 0: [2 1]
User 0 has not rated items 1 and 2 (zero-based column indices). Item 2 receives the higher predicted rating from the nearest neighbors, so it is listed first.
Advantages and Disadvantages
Advantages:
- Simple to implement and understand.
- Effective for small to medium-sized datasets.
- No separate training phase; predictions are computed directly from the stored instances.
Disadvantages:
- Computationally expensive for large datasets.
- Memory-intensive as it requires storing the entire dataset.
- Performance can degrade with sparse data.
Conclusion
KNN is a powerful yet simple algorithm for building recommender systems. By leveraging the similarity between users or items, it can generate personalized recommendations effectively. However, it is essential to consider the computational and memory limitations when dealing with large datasets.