K-Means for Recommendation System

The document presents a case study on the application of K-Means clustering in recommendation systems, detailing the types of recommendation systems and the K-Means algorithm steps. It discusses how to apply K-Means for a movie recommendation system, including data collection, clustering, and movie recommendations based on user attributes. The study concludes with the advantages and limitations of K-Means clustering, emphasizing its effectiveness in building recommendation systems.

Uploaded by

23p1192

A CASE STUDY ON THE USE OF K-MEANS CLUSTERING IN RECOMMENDATION SYSTEMS

Presented by:
Prajwal S M
Sagar Talagatti
Sanjay Lote
OUTLINE

1. Introduction to Recommendation Systems
2. Types of Recommendation Systems
3. Introduction to K-Means Clustering
4. Steps in K-Means algorithm
5. Methods to determine the value of K
6. Applying K-Means for a Movie Recommendation System
7. Advantages
8. Limitations
9. Conclusions
10. References
1. Introduction to Recommendation Systems

• A recommendation system is an artificial intelligence (AI) algorithm, usually based on machine learning,
that uses Big Data to suggest or recommend additional products to consumers.

• It uses data to help predict, narrow down, and find what people are looking for among an ever-growing
number of options.

• Recommendations can be based on various criteria, including past purchases, search history, demographic
information, and other factors. Recommender systems are useful because they help users discover products
and services they might not have found on their own.

• Because they can predict consumer interests and desires at a highly personalized level,
recommender systems are popular with content and product providers. They can drive consumers to just
about any product or service that interests them, from books to videos to health classes to clothing.
2. Types of Recommendation Systems

• While there are a vast number of recommender algorithms and techniques, most fall into these broad
categories:
1. Collaborative filtering
2. Content filtering
3. Context filtering

Fig. 1: Recommendation System


2. Types of Recommendation Systems (contd…)

1. Collaborative Filtering
• Collaborative filtering algorithms recommend items (this is the
filtering part) based on preference information from many users
(this is the collaborative part).
• This approach relies on similarity in users' preference behavior:
given previous interactions between users and items, recommender
algorithms learn to predict future interactions.
• The idea is that if some people have made similar decisions and
purchases in the past, like a movie choice, then there is a high
probability they will agree on additional future selections.

Fig. 2: Collaborative Filtering
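
As a rough illustration of the idea above, the sketch below scores movies a user has not seen by weighting other users' ratings with cosine similarity. The ratings, user names, and function names are invented for this example; it shows one simple user-based variant, not the method of any particular system.

```python
from math import sqrt

# Hypothetical ratings (user -> {movie: rating}); invented for illustration.
RATINGS = {
    "asha":  {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "bilal": {"Movie A": 5, "Movie B": 5, "Movie D": 4},
    "chen":  {"Movie A": 1, "Movie C": 5, "Movie E": 4},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the movies both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def recommend(user, ratings, top_n=1):
    """Rank movies the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        for movie, rating in other_ratings.items():
            if movie not in ratings[user]:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("asha", RATINGS))
```

Here "asha" and "bilal" rated their shared movies similarly, so the movie that only "bilal" has rated ranks first for "asha".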


2. Types of Recommendation Systems (contd…)

2. Content Filtering
• Uses the attributes or features of an item (this is the content
part) to recommend other items similar to those the user prefers.
• This approach is based on the similarity of item and user features:
given information about a user and the items they have interacted
with (e.g. a user’s age, the category of a restaurant’s cuisine, the
average review for a movie), it models the likelihood of a new
interaction.
• For example, if a content filtering recommender sees you liked
the movies You’ve Got Mail and Sleepless in Seattle, it might
recommend another movie to you with the same genres and/or
cast, such as Joe Versus the Volcano.

Fig. 3: Content Filtering
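
The movie example above can be sketched with a simple overlap measure. The snippet below builds a profile from the features of liked movies and ranks the remaining titles by Jaccard similarity. The feature tags are simplified, the last title is a hypothetical placeholder, and Jaccard similarity is just one convenient choice assumed here.

```python
# Simplified feature tags; the last entry is a hypothetical placeholder title.
MOVIES = {
    "You've Got Mail":          {"romance", "comedy", "Tom Hanks", "Meg Ryan"},
    "Sleepless in Seattle":     {"romance", "comedy", "Tom Hanks", "Meg Ryan"},
    "Joe Versus the Volcano":   {"romance", "comedy", "Tom Hanks", "Meg Ryan"},
    "Hypothetical Action Film": {"action", "thriller"},
}

def jaccard(a, b):
    """Share of features two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_similar(liked, movies, top_n=1):
    """Rank unseen movies by feature overlap with the liked movies."""
    profile = set().union(*(movies[title] for title in liked))
    candidates = {
        title: jaccard(profile, features)
        for title, features in movies.items()
        if title not in liked
    }
    return sorted(candidates, key=candidates.get, reverse=True)[:top_n]

print(recommend_similar(["You've Got Mail", "Sleepless in Seattle"], MOVIES))
```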
2. Types of Recommendation Systems (contd…)

3. Context Filtering
• Includes users’ contextual information in the recommendation
process.
• Netflix spoke at NVIDIA GTC about making better
recommendations by framing a recommendation as a contextual
sequence prediction. This approach uses a sequence of
contextual user actions, plus the current context, to predict the
probability of the next action.
• In the Netflix example, given one sequence for each user—the
country, device, date, and time when they watched a movie—
they trained a model to predict what to watch next.
Fig. 4: Context Filtering
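
To make the idea of predicting the next action from previous actions plus context concrete, here is a deliberately small counting baseline. It is far simpler than the trained model described above: it only looks at the previous title, the device, and a coarse time bucket, and it leaves out country and date. The sessions, titles, and function names are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sessions: user -> ordered (device, hour_of_day, title) events.
SESSIONS = {
    "user_1": [("tv", 20, "Pilot"), ("tv", 21, "Episode 2"), ("tv", 22, "Episode 3")],
    "user_2": [("tv", 19, "Pilot"), ("tv", 20, "Episode 2")],
    "user_3": [("phone", 8, "Pilot"), ("phone", 9, "News Clip")],
}

def time_bucket(hour):
    """Coarse time-of-day context (an assumption made for this sketch)."""
    return "morning" if hour < 12 else "evening"

def fit_counts(sessions):
    """Count which title followed each (previous title, device, time bucket)."""
    counts = defaultdict(Counter)
    for events in sessions.values():
        for (_, _, prev_title), (device, hour, title) in zip(events, events[1:]):
            counts[(prev_title, device, time_bucket(hour))][title] += 1
    return counts

def predict_next(counts, prev_title, device, hour):
    """Return the most frequent next title for this context, or None if unseen."""
    context = (prev_title, device, time_bucket(hour))
    if context not in counts:
        return None
    return counts[context].most_common(1)[0][0]

counts = fit_counts(SESSIONS)
print(predict_next(counts, "Pilot", "tv", 21))
```

The same previous title leads to different predictions on a TV in the evening and on a phone in the morning, which is the point of including context.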
3. Introduction to K-Means Clustering

• Suppose you are given a data set in which each example has a set of features but no labels. Because labels
are an essential ingredient of supervised learning, we cannot run a supervised algorithm on such data.
• One of the most straightforward tasks we can perform on a data set without labels is to find groups of data
points that are similar to one another, which we call clusters.
• K-Means is one of the most popular clustering algorithms. It stores K centroids that it uses to
define clusters. A point belongs to a particular cluster if it is closer to that cluster's centroid
than to any other centroid.
• It partitions the data points into clusters in such a way that:
• Data points in the same cluster have a high degree of similarity.
• Data points in different clusters have a high degree of dissimilarity.

Fig. 5: K-Means illustration


4. Steps in K-Means algorithm

1. Choose the number of clusters K. The value of K can be chosen arbitrarily, based on some observations,
or using a method like Within Cluster Sum of Squares (WCSS).
2. Select K data points as the initial cluster centers. They are usually picked at random; ideally they
should be as far as possible from each other.
3. Calculate the distance between each data point and each cluster center. The distance may be calculated
either with a given distance function or with the Euclidean distance formula.
4. Assign each data point to a cluster. A data point is assigned to the cluster whose center is nearest to
that data point.
5. Re-compute the centers of the newly formed clusters. The center of a cluster is computed by taking the mean
of all the data points contained in that cluster.
6. Repeat Steps 3 to 5 until any of the following stopping criteria is met:
• The centers of the newly formed clusters do not change
• Data points remain in the same clusters
• The maximum number of iterations is reached
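
The steps above can be sketched in plain Python as follows. This is a minimal version under a few assumptions: points are tuples of numbers, the initial centers are K distinct data points drawn at random (no attempt is made to spread them apart), and the function name and parameters are chosen for this example only.

```python
import random
from math import dist

def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of tuples) into k groups.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    # Step 2: pick K distinct data points as the initial centers.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):  # Step 6: cap on the number of iterations
        # Steps 3-4: assign each point to its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # Step 6: assignments no longer change
            break
        labels = new_labels
        # Step 5: recompute each center as the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random initial centers.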
5. Methods to determine the value of K

• The following are some ways to select the value of K:
1. Select a value of K arbitrarily.
2. Plot a scatterplot of the data points and, if you can readily observe some clusters being formed,
choose the value of K based on your observation.
3. Use the Within Cluster Sum of Squares (WCSS) method, also called the Elbow method.
5. Methods to determine the value of K (contd…)

• Let us look into the WCSS method in some detail:
• WCSS stands for Within Cluster Sum of Squares, which measures the total
variation within the clusters. The formula to calculate the value of WCSS (for 3
clusters) is given below, where C1, C2, C3 are the cluster centers:

$$\mathrm{WCSS} = \sum_{P_i \in \mathrm{Cluster}_1} \mathrm{distance}(P_i, C_1)^2 + \sum_{P_i \in \mathrm{Cluster}_2} \mathrm{distance}(P_i, C_2)^2 + \sum_{P_i \in \mathrm{Cluster}_3} \mathrm{distance}(P_i, C_3)^2$$
• To find the optimal number of clusters, the elbow method follows the steps below:
• It runs K-Means clustering on the given dataset for different values of K
(e.g. from 1 to 10).
• For each value of K, it calculates the WCSS value.
• It plots a curve of the calculated WCSS values against the number of clusters K.
• The point where the curve bends sharply, so that the plot looks like an arm bent
at the elbow, is taken as the best value of K.

Fig. 6: WCSS method to calculate K value
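
As a small worked example of the WCSS formula and the elbow idea, the sketch below computes WCSS for a given clustering and then picks the bend from a table of WCSS values. Reading the elbow off a plot is normally done by eye; the second-difference rule used in `elbow_k` is a simple heuristic assumed here for illustration, and the WCSS values in the usage lines are made up.

```python
from math import dist

def wcss(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its cluster center."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def elbow_k(wcss_by_k):
    """Pick the K at which the WCSS curve bends most sharply.

    `wcss_by_k` maps consecutive K values to their WCSS. The bend at K is
    measured as (drop before K) minus (drop after K).
    """
    ks = sorted(wcss_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = (wcss_by_k[prev_k] - wcss_by_k[k]) - (wcss_by_k[k] - wcss_by_k[next_k])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

# Four points on a line, two clusters with centers at x=1 and x=11.
points = [(0, 0), (2, 0), (10, 0), (12, 0)]
print(wcss(points, [0, 0, 1, 1], [(1, 0), (11, 0)]))

# Made-up WCSS values that flatten out after K = 3.
print(elbow_k({1: 100, 2: 60, 3: 12, 4: 10, 5: 9}))
```

Each of the four points lies at distance 1 from its center, so the WCSS is 1 + 1 + 1 + 1 = 4.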
6. Applying K-Means for a Movie Recommendation System

Now let us see how we can apply K-Means to train a Movie Recommendation System:

1. Collect a dataset to train the model. It can be taken from Kaggle. A company like Netflix that is developing
such a system already has data on its users.

2. This data has several attributes such as username, age, subscriptions, watch history, ratings, and reviews.
Not all of it is useful for training the model, so we extract only the attributes that are required.

3. In this case, let's say we are implementing the recommender using content-based filtering. So we make use of
the user’s age, ratings, and the genres of the movies that they watched. As genre is a nominal variable, encode
it numerically and keep this mapping for later use.

4. Next, we run the K-Means algorithm on the dataset obtained in Step 3 for values of K in, say, the range 1-20.
For each value of K, we calculate the WCSS value, then plot the WCSS curve to obtain the optimal value of K.

5. Now we run K-Means with the selected value of K to obtain K clusters.

6. These clusters can be formed based on, say, age-genres or ratings-genres.
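
Step 3 above can be sketched as follows. The records are invented, and two choices here are assumptions rather than something the slides specify: genres are one-hot encoded (plain integer codes would make some genres look numerically "closer" than others under Euclidean distance), and age and rating are scaled to the range 0 to 1 so that neither dominates the distance.

```python
# Hypothetical user records (age, rating given, genre watched); invented data.
RECORDS = [
    {"age": 16, "rating": 4.5, "genre": "animation"},
    {"age": 34, "rating": 3.0, "genre": "drama"},
    {"age": 52, "rating": 4.0, "genre": "thriller"},
]

def build_genre_index(records):
    """Map each genre name to a column index; keep this mapping for later use."""
    return {g: i for i, g in enumerate(sorted({r["genre"] for r in records}))}

def to_feature_vector(record, genre_index, max_age=100.0, max_rating=5.0):
    """Scale age and rating to [0, 1] and one-hot encode the genre.

    max_age and max_rating are assumed upper bounds for this sketch.
    """
    one_hot = [0.0] * len(genre_index)
    one_hot[genre_index[record["genre"]]] = 1.0
    return (record["age"] / max_age, record["rating"] / max_rating, *one_hot)

genre_index = build_genre_index(RECORDS)
vectors = [to_feature_vector(r, genre_index) for r in RECORDS]
print(genre_index)
print(vectors[0])
```

The resulting tuples are the kind of numeric input a K-Means implementation expects in Steps 4 and 5.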


6. Applying K-Means for a Movie Recommendation System (contd…)
7. Once the model is ready, we have clusters of users based on age-genres and ratings-genres.
8. Now suppose a new movie is released. It will have a genre, a minimum age above which it is suitable
to view, and so on.
9. We can use these attributes of the movie to find which cluster it fits into and recommend that movie to
the users in that cluster.
10. We can also do the same for existing movies and then recommend appropriate
movies to the users who are most likely to enjoy them.
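
Steps 8 and 9 amount to a nearest-centroid lookup, sketched below. The centroids and user-to-cluster assignments stand in for the output of an earlier K-Means run and are invented, as are the function names. Using the movie's minimum viewing age as its "age" feature is an assumption made for this example.

```python
from math import dist

# Hypothetical result of a K-Means run on (scaled age, genre one-hot) vectors,
# with genre columns ordered as (animation, thriller). All values are invented.
CENTROIDS = [(0.15, 1.0, 0.0), (0.45, 0.0, 1.0)]
USER_CLUSTER = {"user_1": 0, "user_2": 1, "user_3": 0}

def nearest_cluster(vector, centroids):
    """Index of the centroid closest to `vector` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: dist(vector, centroids[c]))

def users_to_notify(movie_vector, centroids, user_cluster):
    """Users whose cluster is the one the new movie falls into."""
    target = nearest_cluster(movie_vector, centroids)
    return sorted(u for u, c in user_cluster.items() if c == target)

# A new animation title suitable for ages 13 and above (age scaled by 100).
new_movie = (0.13, 1.0, 0.0)
print(users_to_notify(new_movie, CENTROIDS, USER_CLUSTER))
```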
6. Applying K-Means for a Movie Recommendation System (contd…)

Fig. 7: Movie Recommendation System
7. Advantages
1. It is relatively efficient, with time complexity O(nkt), where:
• n = number of instances
• k = number of clusters
• t = number of iterations
2. Relatively simple to implement.
3. Scales to large data sets.
4. Can be generalized, with modifications, to clusters of different shapes and sizes, such as elliptical clusters.
5. Guarantees convergence, although only to a local optimum.
8. Limitations
1. It requires the number of clusters (k) to be specified in advance.
2. It is sensitive to noisy data and outliers, and their presence may lead to a bad model.
3. It is not suitable for identifying clusters with non-convex shapes. A shape is convex if the line segment
between any two of its points stays inside the shape.
4. Its result depends on the initially selected centroids, so it may produce different clusters for the
same dataset when run with different initial centroids, and we need to pick the best result
ourselves.
5. It becomes less effective as the number of dimensions (features) increases.
9. Conclusions
Based on the discussion so far, we can conclude the following:
1. Recommendation Systems have become popular due to their effectiveness.
2. There are three main approaches to developing Recommendation Systems.
3. K-Means is a popular clustering algorithm used in multiple applications.
4. We can build an effective Recommendation System using K-Means clustering.
10. References

1. What is a Recommendation System and types - https://www.nvidia.com/en-us/glossary/recommendation-system/
2. How to apply K-Means to a Movie Recommendation System -
https://www.quora.com/How-can-I-apply-a-k-means-algorithm-in-a-recommendation-system
3. Movie Recommender System Using K-Means Clustering - https://ieeexplore.ieee.org/document/8776969
THANK YOU
