K-Means for Recommendation System

The document presents a case study on the application of K-Means clustering in recommendation systems, detailing the types of recommendation systems and the K-Means algorithm steps. It discusses how to apply K-Means for a movie recommendation system, including data collection, clustering, and movie recommendations based on user attributes. The study concludes with the advantages and limitations of K-Means clustering, emphasizing its effectiveness in building recommendation systems.

Uploaded by 23p1192

A CASE STUDY ON THE USE OF K-MEANS CLUSTERING IN RECOMMENDATION SYSTEMS

Presented by:
Prajwal S M
Sagar Talagatti
Sanjay Lote

OUTLINE

1. Introduction to Recommendation Systems
2. Types of Recommendation Systems
3. Introduction to K-Means Clustering
4. Steps in K-Means algorithm
5. Methods to determine the value of K
6. Applying K-Means for a Movie Recommendation System
7. Advantages
8. Limitations
9. Conclusions
10. References
1. Introduction to Recommendation Systems

• A recommendation system is an artificial intelligence (AI) algorithm, usually associated with machine learning,
that uses large volumes of data to suggest or recommend additional products to consumers.

• It uses data to help predict, narrow down, and find what people are looking for among an exponentially growing
number of options.

• Recommendations can be based on various criteria, including past purchases, search history, demographic
information, and other factors. Recommender systems are highly useful because they help users discover products
and services they might not otherwise have found on their own.

• Because they can predict consumer interests and desires at a highly personalized level, recommender systems
are popular with content and product providers. They can steer consumers toward almost any product or service
that interests them, from books to videos to health classes to clothing.
2. Types of Recommendation Systems

• While there are a vast number of recommender algorithms and techniques, most fall into these broad
categories:
1. Collaborative filtering
2. Content filtering
3. Context filtering

Fig. 1: Recommendation System

2. Types of Recommendation Systems (contd…)

1. Collaborative Filtering
• Collaborative filtering algorithms recommend items (this is the
filtering part) based on preference information from many users
(this is the collaborative part).
• This approach uses the similarity of users' preference behavior:
given previous interactions between users and items, recommender
algorithms learn to predict future interactions.
• The idea is that if some people have made similar decisions and
purchases in the past, like a movie choice, then there is a high
probability they will agree on additional future selections.

Fig. 2: Collaborative Filtering
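The collaborative idea above can be sketched in a few lines. This is a minimal illustration, not the method of any particular system: the ratings table is hypothetical, and cosine similarity between users (computed over the movies both have rated) is an assumed choice of similarity measure, since the slides do not name one. A missing rating is predicted as the similarity-weighted average of the other users' ratings for that movie.

```python
import math

# Hypothetical toy data: user -> {movie title: rating on a 1-5 scale}.
RATINGS = {
    "alice": {"Heat": 5, "Up": 1, "Alien": 4},
    "bob": {"Heat": 4, "Up": 2, "Alien": 5, "Jaws": 4},
    "carol": {"Heat": 1, "Up": 5, "Alien": 2, "Jaws": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity of two users, computed over the movies both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)


def predict_rating(user, movie, ratings):
    """Similarity-weighted average of the ratings other users gave `movie`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or movie not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[movie]
        den += abs(sim)
    return num / den if den else 0.0


print(round(predict_rating("alice", "Jaws", RATINGS), 2))  # 2.97
```

Because all ratings here are positive, raw cosine similarity still gives a user with opposite taste (carol) a sizable weight, which pulls the prediction toward her rating; mean-centering each user's ratings first (a Pearson-style similarity) is a common refinement.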

2. Types of Recommendation Systems (contd…)

2. Content Filtering
• Uses the attributes or features of an item (this is the content
part) to recommend other items similar to the user’s preferences.
• This approach is based on the similarity of item and user features:
given information about a user and the items they have interacted
with (e.g., a user’s age, the category of a restaurant’s cuisine, the
average review for a movie), it models the likelihood of a new
interaction.
• For example, if a content filtering recommender sees that you liked
the movies You’ve Got Mail and Sleepless in Seattle, it might
recommend another movie with the same genres and/or cast,
such as Joe Versus the Volcano.
Fig. 3: Content Filtering
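A minimal sketch of content filtering for the example above, assuming hand-picked genre and cast tags (hypothetical features chosen for illustration) and cosine similarity between a user profile and each movie's tag vector. The profile simply counts how often each tag appears among the movies the user liked.

```python
import math
from collections import Counter

# Hypothetical, hand-picked genre and cast tags for a few movies.
MOVIE_TAGS = {
    "You've Got Mail": {"romance", "comedy", "tom_hanks", "meg_ryan"},
    "Sleepless in Seattle": {"romance", "comedy", "tom_hanks", "meg_ryan"},
    "Joe Versus the Volcano": {"romance", "comedy", "tom_hanks", "meg_ryan"},
    "Alien": {"sci-fi", "horror"},
    "Heat": {"crime", "thriller"},
}


def build_profile(liked):
    """User profile: how often each tag appears among the movies the user liked."""
    profile = Counter()
    for title in liked:
        profile.update(MOVIE_TAGS[title])
    return profile


def cosine(profile, tags):
    """Cosine similarity between the tag-count profile and a movie's 0/1 tag vector."""
    dot = sum(profile[t] for t in tags)
    norm_p = math.sqrt(sum(v * v for v in profile.values()))
    norm_t = math.sqrt(len(tags))
    return dot / (norm_p * norm_t) if norm_p and norm_t else 0.0


def recommend(liked, top_n=1):
    """Rank the movies the user has not seen by similarity to their profile."""
    profile = build_profile(liked)
    unseen = [m for m in MOVIE_TAGS if m not in liked]
    return sorted(unseen, key=lambda m: cosine(profile, MOVIE_TAGS[m]), reverse=True)[:top_n]


print(recommend(["You've Got Mail", "Sleepless in Seattle"]))  # ['Joe Versus the Volcano']
```

A real system would use richer item features (full genre lists, cast, text descriptions) and weight them, but the ranking logic stays the same.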
2. Types of Recommendation Systems (contd…)

3. Context Filtering
• Includes users’ contextual information in the recommendation
process.
• Netflix spoke at NVIDIA GTC about making better
recommendations by framing a recommendation as a contextual
sequence prediction. This approach uses a sequence of
contextual user actions, plus the current context, to predict the
probability of the next action.
• In the Netflix example, given one sequence for each user (the
country, device, date, and time when they watched a movie),
they trained a model to predict what to watch next.
Fig. 4: Context Filtering
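Netflix's actual model is not described here, so the sketch below is only a toy stand-in for the idea of contextual sequence prediction: in hypothetical viewing logs, it counts which title followed the previous title under the current context (here, just the device) and predicts the most frequent one. Real systems replace these counts with a learned sequence model.

```python
from collections import Counter, defaultdict

# Hypothetical viewing logs: per user, an ordered list of (device, title) events.
LOGS = [
    [("tv", "Heat"), ("tv", "Ronin"), ("phone", "Up")],
    [("tv", "Heat"), ("tv", "Ronin")],
    [("phone", "Heat"), ("phone", "Up")],
]


def train(logs):
    """Count next titles conditioned on (current device, previous title)."""
    counts = defaultdict(Counter)
    for sequence in logs:
        for (_, prev_title), (device, next_title) in zip(sequence, sequence[1:]):
            counts[(device, prev_title)][next_title] += 1
    return counts


def predict_next(counts, device, prev_title):
    """Most frequent next title for this context, or None if the context is unseen."""
    options = counts.get((device, prev_title))
    return options.most_common(1)[0][0] if options else None


model = train(LOGS)
print(predict_next(model, "tv", "Heat"))     # Ronin
print(predict_next(model, "phone", "Heat"))  # Up
```

The same previous title leads to different predictions depending on the current device, which is the point of adding context.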
3. Introduction to K-Means Clustering

• Say you are given a dataset where each example has a set of features but no labels. Labels are an essential
ingredient of supervised algorithms, so we cannot run a supervised algorithm on this data.
• One of the most straightforward tasks we can perform on an unlabeled dataset is to find groups of data points
that are similar to one another, which we call clusters.
• K-Means is one of the most popular clustering algorithms. K-Means stores “k” centroids that it uses to
define clusters. A point is considered to be in a particular cluster if it is closer to that cluster's centroid
than to any other centroid.
• It partitions the data points into clusters in such a way that:
• Data points in the same cluster have a high degree of similarity.
• Data points in different clusters have a high degree of dissimilarity.

Fig. 5: K-Means illustration
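The membership rule in the third bullet can be written as a single function. A minimal sketch, assuming Euclidean distance and points represented as tuples of numbers (the centroids are made-up values):

```python
import math


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


centroids = [(0.0, 0.0), (10.0, 10.0)]
print(nearest_centroid((2.0, 1.0), centroids))  # 0
print(nearest_centroid((8.0, 9.0), centroids))  # 1
```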

4. Steps in K-Means algorithm

1. Choose the number of clusters K. The value of K can be chosen randomly, based on some observations, or
by using a method such as Within Cluster Sum of Squares (WCSS).
2. Select K data points as the initial cluster centers, either at random or, preferably, so that they are as far
as possible from each other.
3. Calculate the distance between each data point and each cluster center. The distance may be calculated
either with a given distance function or with the Euclidean distance formula.
4. Assign each data point to the cluster whose center is nearest to it.
5. Recompute the centers of the newly formed clusters. The center of a cluster is the mean of all the data
points in that cluster.
6. Repeat Steps 3 to 5 until any of the following stopping criteria is met:
• The centers of the newly formed clusters do not change
• Data points remain in the same clusters
• The maximum number of iterations is reached
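The six steps above can be sketched as plain Python (this loop is commonly called Lloyd's algorithm). It is an illustrative implementation under stated assumptions, not a reference one: distance is Euclidean, and Step 2 uses a farthest-first heuristic (one random point, then repeatedly the point farthest from the centers chosen so far) as one way to keep the initial centers far apart.

```python
import math
import random


def farthest_first_init(points, k, seed=0):
    """Step 2: one random point, then repeatedly the point farthest from all chosen centers."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centers, labels) for a list of equal-length numeric tuples."""
    centers = farthest_first_init(points, k, seed)
    labels = None
    for _ in range(max_iter):  # Step 6: cap on the number of iterations
        # Steps 3-4: assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:  # Step 6: no point changed cluster
            break
        labels = new_labels
        # Step 5: recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster became empty
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, labels


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, labels = kmeans(data, k=2)
print(labels)
```

Farthest-first initialization is easily pulled toward outliers; the randomized k-means++ scheme, which picks each new center with probability proportional to its squared distance from the nearest existing center, is a widely used alternative.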
5. Methods to determine the value of K

• The following are some ways to select the value of K:

1. Select a value of K at random.
2. Draw a scatter plot of the data points and, if some clusters are readily visible, choose the value of K
based on that observation.
3. Use the Within Cluster Sum of Squares (WCSS) method, also called the Elbow method.
5. Methods to determine the value of K (contd…)

• Let us look into the WCSS method in some detail:

• WCSS stands for Within Cluster Sum of Squares, which measures the total
variation within the clusters. For 3 clusters with centers C1, C2 and C3,
WCSS is calculated as:
• WCSS = $\sum_{P_i \in \text{Cluster}_1} \text{distance}(P_i, C_1)^2 + \sum_{P_i \in \text{Cluster}_2} \text{distance}(P_i, C_2)^2 + \sum_{P_i \in \text{Cluster}_3} \text{distance}(P_i, C_3)^2$
• To find the optimal number of clusters, the elbow method follows these steps:
• It runs K-Means clustering on the dataset for a range of K values
(e.g., 1 to 10).
• For each value of K, it calculates the WCSS value.
• It plots a curve of the calculated WCSS values against the number of
clusters K.
• The point where the curve bends sharply, so that the plot looks like an arm
and this point like its elbow, is taken as the best value of K.
Fig. 6: WCSS method to calculate K value
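The elbow procedure can be sketched as follows on hypothetical two-dimensional data with three obvious groups. The helper reimplements K-Means so the block stands alone (starting farthest-first initialization from the first point so the run is reproducible), and it locates the elbow with one simple heuristic, the largest second difference of the WCSS curve. The slides pick the elbow visually, so this automatic rule is an assumption.

```python
import math


def kmeans_wcss(points, k, max_iter=100):
    """Run K-Means and return the WCSS of the final clustering."""
    # Farthest-first initialization, starting from the first point for reproducibility.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(d) / len(members) for d in zip(*members))
    # WCSS: squared distance of every point to the center of its own cluster.
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def elbow(wcss):
    """Pick K where the curve bends most sharply (largest second difference).

    wcss[i] holds the WCSS for K = i + 1; only interior points can be elbows.
    """
    return max(range(1, len(wcss) - 1),
               key=lambda i: wcss[i - 1] - 2 * wcss[i] + wcss[i + 1]) + 1


# Hypothetical data: three well-separated groups of four points each.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 10), (5, 11), (6, 10), (6, 11)]

curve = [kmeans_wcss(POINTS, k) for k in range(1, 6)]
print([round(w, 1) for w in curve])
print(elbow(curve))  # 3
```

The WCSS drops steeply up to K = 3 and barely changes afterwards, which is exactly the bend the elbow method looks for.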
6. Applying K-Means for a Movie Recommendation System

Now let us see how we can apply K-Means to train a Movie Recommendation System:

1. Collect a dataset to train the model. A public dataset can be taken from Kaggle; a company like Netflix that
is developing its own system already has data about its users.

2. This data has several attributes, such as username, age, subscriptions, watch history, ratings, and reviews.
Not all of it is useful for training the model, so we extract only the attributes that are required.

3. In this case, let's say we are implementing the recommender using content-based filtering. We therefore use
each user's age, their ratings, and the genres of the movies they watched. Since genre is a nominal variable, we
assign numeric values to it instead and keep this mapping for later use.

4. Next, we run the K-Means algorithm on the dataset obtained in Step 3 for, say, each value of K from 1 to 20.
For each run, we calculate the WCSS value and plot the WCSS curve to obtain the optimal value of K.

5. We then run K-Means with the selected value of K to obtain K clusters.

6. These clusters can be formed on, say, age and genre, or ratings and genre.

6. Applying K-Means for a Movie Recommendation System (contd…)
7. Once the model is ready, we have clusters of users based on age and genre, or ratings and genre.
8. Now suppose a new movie is released. It will have a genre, a minimum age at which it is suitable to
view, and so on.
9. We can use these attributes of the movie to find which cluster it fits into and recommend the movie to
the users in that cluster.
10. We can also do the same for existing movies, recommending each one to the users most likely to
enjoy it.
6. Applying K-Means for a Movie Recommendation System (contd…)

Fig. 7: Movie Recommendation System
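The pipeline above can be sketched end to end on a tiny hypothetical user table. Several choices here are assumptions made for illustration rather than the case study's: users are described only by age and favorite genre; genre is one-hot encoded (one numeric column per genre, which avoids implying that some genres are "closer" than others, as single integer codes would under Euclidean distance); age is min-max scaled to [0, 1] so it does not dominate the distance; and K = 2 is fixed instead of being chosen with the elbow method. A new movie is placed in the cluster nearest to its (minimum age, genre) encoding and recommended to that cluster's users.

```python
import math

# Hypothetical user table: (user id, age, favorite genre). A real system would
# derive the genre preference from watch history and ratings.
USERS = [
    ("u1", 18, "action"), ("u2", 21, "action"), ("u3", 24, "action"),
    ("u4", 55, "drama"), ("u5", 60, "drama"), ("u6", 63, "drama"),
]
GENRES = sorted({g for _, _, g in USERS})  # genre -> column mapping kept for later use
AGE_MIN = min(a for _, a, _ in USERS)
AGE_MAX = max(a for _, a, _ in USERS)


def encode(age, genre):
    """Scale age to [0, 1] and one-hot encode the genre."""
    scaled_age = (age - AGE_MIN) / (AGE_MAX - AGE_MIN)
    return (scaled_age, *(1.0 if g == genre else 0.0 for g in GENRES))


def kmeans(points, k, max_iter=100):
    """K-Means with farthest-first initialization starting from the first point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(d) / len(members) for d in zip(*members))
    return centers, labels


def recommend_new_movie(min_age, genre, centers, labels):
    """Place the movie in the nearest user cluster and return that cluster's users."""
    movie = encode(min_age, genre)
    target = min(range(len(centers)), key=lambda j: math.dist(movie, centers[j]))
    return [user for (user, _, _), lab in zip(USERS, labels) if lab == target]


features = [encode(age, genre) for _, age, genre in USERS]
centers, labels = kmeans(features, k=2)
print(recommend_new_movie(50, "drama", centers, labels))  # ['u4', 'u5', 'u6']
```

Step 10 (existing movies) works the same way: encode each movie and look up its nearest cluster.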
7. Advantages
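1. It is relatively efficient, with time complexity O(nkt), where:
• n = number of instances
• k = number of clusters
• t = number of iterations
2. It is relatively simple to implement.
3. It scales to large datasets.
4. It can be generalized to clusters of different shapes and sizes, such as elliptical clusters, although the
basic algorithm favors compact, roughly spherical clusters (see Limitation 3).
5. It is guaranteed to converge, although only to a local optimum that depends on the initial centroids.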
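To give a feel for the O(nkt) bound, here is a purely illustrative calculation; the values of n, k and t are hypothetical and not taken from the case study:

```latex
n \cdot k \cdot t = 10^{6}\ \text{users} \times 20\ \text{clusters} \times 50\ \text{iterations} = 10^{9}\ \text{point-to-centroid distance computations}
```

Each distance itself costs O(d) for d features, so the bound is also often written O(nkdt). The cost grows only linearly in n, which is what makes K-Means practical for large datasets.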
8. Limitations
1. It requires the number of clusters (k) to be specified in advance.
2. It is sensitive to noisy data and outliers, whose presence may lead to a poor model.
3. It is not suitable for identifying clusters with non-convex shapes. A convex shape is one in which the
straight line between any two of its points lies entirely inside the shape.
4. Its result depends on the initially selected centroids, so it may produce different clusters for the
same dataset when run with different initial centroids; we then need to pick the best result ourselves,
for example the run with the lowest WCSS.
5. It becomes less effective as the number of dimensions (features) increases.
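Limitation 2 is easy to see on a single hypothetical cluster: because each center is a mean (Step 5 of the algorithm), one extreme value can drag it far from where most points lie. The ages below, including the data-entry error 250, are made up for illustration.

```python
import statistics

# Hypothetical ages of users in one cluster; 250 is a data-entry error.
ages = [20, 22, 24, 26, 250]
clean = ages[:-1]

print(sum(clean) / len(clean))  # 23.0 -> centroid without the outlier
print(sum(ages) / len(ages))    # 68.4 -> one outlier drags the mean far away
print(statistics.median(ages))  # 24   -> the median barely moves
```

This is why variants such as K-Medians and K-Medoids, which use a median or an actual data point as the cluster center, are often suggested when outliers are a concern.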
9. Conclusions
Based on the discussion so far, we can conclude the following:
1. Recommendation Systems have become popular due to their effectiveness.
2. There are three main approaches to developing Recommendation Systems.
3. K-Means is a popular clustering algorithm used in multiple applications.
4. We can build an effective Recommendation System using K-Means clustering.
10. References

1. What is a Recommendation System and types - https://www.nvidia.com/en-us/glossary/recommendation-system/
2. How to apply K-Means to a Movie Recommendation System - https://www.quora.com/How-can-I-apply-a-k-means-algorithm-in-a-recommendation-system
3. Movie Recommender System Using K-Means Clustering - https://ieeexplore.ieee.org/document/8776969
THANK YOU
