M03 Item-Based CF-V2 (1)

The document discusses item-based collaborative filtering (CF) as a solution for large-scale e-commerce recommendation systems, highlighting its advantages over user-based CF, particularly in terms of scalability. It explains the use of cosine similarity to measure item similarity and provides examples of how to create similarity matrices and predict user ratings. Additionally, it addresses challenges such as data sparsity and cold start problems, along with various model-based approaches to improve recommendation accuracy.

Uploaded by Fa Putra
Copyright © All Rights Reserved
Item-Based Collaborative Filtering
Dr ZK Abdurahman Baizal

Source: Dietmar Jannach et al., 2010, Recommender Systems: An Introduction


Introduction
• Although user-based CF approaches have been applied successfully in
different domains, some serious challenges remain when it comes to
large e-commerce sites.
• With millions of users and millions of catalog items, the need to scan a vast
number of potential neighbors makes it impossible to compute predictions in
real time.
• Large-scale e-commerce sites therefore often implement a different technique,
called item-based recommendation.
Introduction
• Item-to-item collaborative filtering is the technique used by
Amazon.com to recommend books or CDs to its customers.
The main problem with traditional user-based CF is that the algorithm
does not scale well to such large numbers of users and catalog items.
• User-Based Nearest Neighbor Collaborative Filtering:
• Recommendations are based on calculating the similarity between two users.
• Item-Based Nearest Neighbor Collaborative Filtering:
• Recommendations are based on calculating the similarity between two items,
using people's ratings of those two items.
Cosine Similarity
• Cosine similarity is a metric used to measure how similar two
items or documents are, irrespective of their size.
• It measures the cosine of the angle between two vectors projected in a
multi-dimensional space. This allows us to measure the similarity of
documents (or, here, items) of any type.
Cosine Similarity
The cosine of 0° is 1, and it is less than 1 for any other angle.
Two vectors with the same orientation have a cosine similarity of 1, two
vectors at 90° have a similarity of 0, and two diametrically opposed vectors
have a similarity of -1, independent of their magnitude.
Cosine Similarity
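• Since we are computing the cosine of the angle between two vectors, the
output always ranges from -1 to 1, where -1 means the two items are completely
dissimilar (opposed) and 1 means they are completely similar. For non-negative
rating vectors (such as 1 to 5 star ratings), the value lies between 0 and 1.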
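A minimal plain-Python sketch of the measure, cos(a, b) = (a · b) / (‖a‖ ‖b‖), assuming both vectors are equal-length lists of numbers. The name `cosine_similarity` is our own choice, and returning 0.0 for an all-zero vector is an assumption (the cosine is undefined there). The three calls check the orientation cases described above.

```python
import math


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (||a|| * ||b||) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is treated as "no similarity"
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 2], [2, 4]))    # same orientation, different length: ≈ 1.0
print(cosine_similarity([1, 0], [0, 3]))    # 90 degrees apart: 0.0
print(cosine_similarity([1, 2], [-1, -2]))  # diametrically opposed: ≈ -1.0
```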
• We will now see how we can use the Cosine Similarity measure to
determine how similar the movies are.
Example
• Suppose we have movie ratings given by different users.
Example
• Step 1: We write the user-item ratings in matrix form.

• In this matrix, user Amy has already watched and rated Pulp Fiction and
The Godfather but has not watched Forrest Gump.
• We will use the above matrix in our example and build an item-item
similarity matrix with the cosine similarity method to determine how similar
the movies are to each other.
Example
• Step 2: To calculate the similarity between Pulp Fiction (P) and Forrest Gump
(F), we first find all the users who have rated both movies. In our case, Calvin (C),
Robert (R) and Bradley (B) have rated both. We now create two vectors, one per
movie, containing these users' ratings:

Therefore, the cosine similarity between Pulp Fiction and Forrest Gump is:
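The sketch below illustrates Step 2: only users who rated both movies contribute to the two vectors. The rating values are hypothetical placeholders, because the slide's matrix is not reproduced in this text; they are chosen so that, as in the example, Calvin, Robert and Bradley are the co-raters and Amy is not. `item_cosine` is a hypothetical helper name.

```python
import math

# Hypothetical ratings (user -> {movie: rating}); NOT the values from the original slide.
ratings = {
    "Amy":     {"Pulp Fiction": 3, "The Godfather": 5},
    "Calvin":  {"Pulp Fiction": 4, "Forrest Gump": 5, "The Godfather": 4},
    "Robert":  {"Pulp Fiction": 2, "Forrest Gump": 3},
    "Bradley": {"Pulp Fiction": 5, "Forrest Gump": 4, "The Godfather": 3},
}


def item_cosine(ratings, item_a, item_b):
    """Cosine similarity of two items, using only the users who rated both."""
    common = [u for u, r in ratings.items() if item_a in r and item_b in r]
    if not common:
        return 0.0
    vec_a = [ratings[u][item_a] for u in common]  # e.g. P = (r_C, r_R, r_B)
    vec_b = [ratings[u][item_b] for u in common]  # e.g. F = (r_C, r_R, r_B)
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm = math.sqrt(sum(x * x for x in vec_a)) * math.sqrt(sum(y * y for y in vec_b))
    return dot / norm if norm else 0.0


print(item_cosine(ratings, "Pulp Fiction", "Forrest Gump"))
```

Amy is skipped automatically because she has no Forrest Gump rating, which is exactly the "users who rated both" rule from Step 2.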
Example
• Similarly, we can calculate the cosine similarity for every pair of movies,
and our final similarity matrix will be:
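A sketch of building the full item-item matrix by applying the co-rated cosine to every pair of items. Since the movie ratings from the slide are not available here, it uses the Alice rating table from the adjusted cosine example in this deck instead (None marks a missing rating); `cosine_cols` is a hypothetical helper.

```python
import math

items = ["Item1", "Item2", "Item3", "Item4", "Item5"]
R = [  # rows: Alice, User1..User4; None = not rated
    [5, 3, 4, 4, None],
    [3, 1, 2, 3, 3],
    [4, 3, 4, 3, 5],
    [3, 3, 1, 5, 4],
    [1, 5, 5, 2, 1],
]


def cosine_cols(R, a, b):
    """Cosine similarity of item columns a and b, over users who rated both."""
    pairs = [(row[a], row[b]) for row in R if row[a] is not None and row[b] is not None]
    dot = sum(x * y for x, y in pairs)
    norm = math.sqrt(sum(x * x for x, _ in pairs)) * math.sqrt(sum(y * y for _, y in pairs))
    return dot / norm if norm else 0.0


n = len(items)
sim = [[cosine_cols(R, a, b) for b in range(n)] for a in range(n)]  # symmetric n x n matrix
for name, row in zip(items, sim):
    print(f"{name}: " + "  ".join(f"{s:.2f}" for s in row))
```

The matrix is symmetric with ones on the diagonal, so in practice only the upper triangle needs to be computed.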
Example
• Step 3: Now we can predict and fill in a user's ratings for the items they
have not rated yet. To predict Amy's rating for Forrest Gump, we use the
computed similarity matrix together with the movies Amy has already rated.
$$\mathrm{pred}(u,p) = \frac{\sum_{i \in I} r_{u,i} \cdot \mathrm{sim}(i,p)}{\sum_{i \in I} \mathrm{sim}(i,p)}$$

where $I$ is the set of items that have been rated by the active user $u$ and are similar to item $p$.
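The prediction formula can be sketched as a weighted average of the user's own ratings. The similarity values and Amy's ratings below are hypothetical numbers chosen only to show the arithmetic; `predict` is an illustrative name.

```python
def predict(user_ratings, sims_to_p):
    """pred(u, p) = sum_{i in I} r_{u,i} * sim(i, p) / sum_{i in I} sim(i, p).

    user_ratings: {item: rating} of the active user u
    sims_to_p:    {item: sim(item, p)} for items similar to the target item p
    I is the set of items that u has rated AND that have a similarity to p.
    """
    I = [i for i in user_ratings if i in sims_to_p]
    num = sum(user_ratings[i] * sims_to_p[i] for i in I)
    den = sum(sims_to_p[i] for i in I)
    return num / den if den else None  # no usable neighbours -> no prediction


# Hypothetical values for illustration only (not taken from the slides).
amy = {"Pulp Fiction": 3, "The Godfather": 5}
sim_to_forrest_gump = {"Pulp Fiction": 0.8, "The Godfather": 0.6}
print(predict(amy, sim_to_forrest_gump))  # (3*0.8 + 5*0.6) / (0.8 + 0.6) ≈ 3.857
```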
Example

Hence, our final matrix would be:

In practice, the predicted rating is computed from the items that have a high
similarity to the item whose rating is being predicted. In Amy's case, we can
set a similarity threshold that determines which items are included in the
prediction.
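A sketch of the same weighted average restricted to neighbours whose similarity reaches a threshold. The threshold is a free parameter; the item names, ratings and similarities are hypothetical, and `predict_with_threshold` is an illustrative name.

```python
def predict_with_threshold(user_ratings, sims_to_p, threshold):
    """Item-based prediction using only items i with sim(i, p) >= threshold."""
    I = [i for i in user_ratings if i in sims_to_p and sims_to_p[i] >= threshold]
    den = sum(sims_to_p[i] for i in I)
    if not I or den <= 0:
        return None  # no sufficiently similar item: leave the rating unpredicted
    return sum(user_ratings[i] * sims_to_p[i] for i in I) / den


# Hypothetical ratings of one user and similarities of those items to the target item.
ratings = {"Movie A": 4, "Movie B": 2, "Movie C": 5}
sims = {"Movie A": 0.9, "Movie B": 0.2, "Movie C": 0.7}
print(predict_with_threshold(ratings, sims, 0.0))  # all three items contribute
print(predict_with_threshold(ratings, sims, 0.5))  # weakly similar Movie B is dropped
```

Raising the threshold removes weakly related items that would otherwise pull the prediction toward unrelated ratings, at the cost of producing no prediction when no neighbour qualifies.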
Example
Using Adjusted Cosine Similarity
The basic cosine measure does not take the differences in the average rating
behavior of the users into account.

This problem is solved by using the adjusted cosine measure, which subtracts
the user's average rating from each of that user's ratings. As with the Pearson
measure, the adjusted cosine values range from -1 to +1.
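For reference, the adjusted cosine similarity between items a and b is commonly written as follows, where U is the set of users who rated both items and r̄_u is the average rating of user u (a standard formulation, given here because the slide's own formula is not reproduced in this text):

$$\mathrm{sim}(a,b) = \frac{\sum_{u \in U} (r_{u,a} - \bar{r}_u)(r_{u,b} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,a} - \bar{r}_u)^2}\,\sqrt{\sum_{u \in U} (r_{u,b} - \bar{r}_u)^2}}$$

Dropping the $\bar{r}_u$ terms gives back the basic cosine measure.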
Example
        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

Figure: Mean-adjusted ratings matrix

Basic Cosine Similarity

Adjusted Cosine Similarity
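A sketch comparing both measures on the rating table above for the pair Item1/Item5. Taking each user's mean over all of that user's ratings is an assumption about how the slide defines the average; `user_mean` and `item_sim` are illustrative names.

```python
import math

# Rating table from the example above (Alice has not rated Item5).
R = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}


def user_mean(row):
    rated = [r for r in row if r is not None]
    return sum(rated) / len(rated)


def item_sim(R, a, b, adjusted=False):
    """Basic or adjusted cosine between item columns a and b over co-rating users."""
    xs, ys = [], []
    for row in R.values():
        if row[a] is None or row[b] is None:
            continue  # only users who rated both items
        m = user_mean(row) if adjusted else 0.0  # adjusted: subtract the user's mean
        xs.append(row[a] - m)
        ys.append(row[b] - m)
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return dot / norm if norm else 0.0


print(f"basic    sim(Item1, Item5) = {item_sim(R, 0, 4):.2f}")
print(f"adjusted sim(Item1, Item5) = {item_sim(R, 0, 4, adjusted=True):.2f}")
```

On this table the basic cosine is about 0.99 and the adjusted cosine about 0.80: once each user's general rating level is removed, the two items look less alike.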


Implementation in Python
Preprocessing data for item-based filtering
• To make item-based recommendation algorithms applicable to large-scale
e-commerce sites as well, without sacrificing recommendation accuracy, an
approach based on offline precomputation of the data is typically chosen.
• The idea is to construct in advance the item similarity matrix that
describes the pairwise similarity of all catalog items.
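A sketch of the offline/online split, assuming item vectors stored as `{item: {user: rating}}` and a hypothetical neighbourhood size `k`; all function names are illustrative. The expensive pairwise loop runs once offline, while the online prediction only reads the stored neighbour list. The data is the Alice rating table from this deck.

```python
import math
from itertools import combinations


def cosine(va, vb):
    """Cosine similarity of two sparse item vectors {user: rating}, over co-rating users."""
    common = va.keys() & vb.keys()
    dot = sum(va[u] * vb[u] for u in common)
    na = math.sqrt(sum(va[u] ** 2 for u in common))
    nb = math.sqrt(sum(vb[u] ** 2 for u in common))
    return dot / (na * nb) if na and nb else 0.0


def precompute_neighbors(item_vectors, k):
    """OFFLINE: compute each pairwise similarity once and keep the k best per item."""
    neighbors = {item: [] for item in item_vectors}
    for a, b in combinations(item_vectors, 2):  # each unordered pair exactly once
        s = cosine(item_vectors[a], item_vectors[b])
        neighbors[a].append((s, b))
        neighbors[b].append((s, a))
    return {item: sorted(lst, reverse=True)[:k] for item, lst in neighbors.items()}


def predict_online(user_ratings, target, neighbors):
    """ONLINE: read the stored neighbour list of `target`; no similarity is computed here."""
    used = [(s, i) for s, i in neighbors[target] if i in user_ratings and s > 0]
    den = sum(s for s, _ in used)
    return sum(s * user_ratings[i] for s, i in used) / den if den else None


item_vectors = {
    "Item1": {"Alice": 5, "User1": 3, "User2": 4, "User3": 3, "User4": 1},
    "Item2": {"Alice": 3, "User1": 1, "User2": 3, "User3": 3, "User4": 5},
    "Item3": {"Alice": 4, "User1": 2, "User2": 4, "User3": 1, "User4": 5},
    "Item4": {"Alice": 4, "User1": 3, "User2": 3, "User3": 5, "User4": 2},
    "Item5": {"User1": 3, "User2": 5, "User3": 4, "User4": 1},
}
neighbors = precompute_neighbors(item_vectors, k=2)
alice = {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4}
print(neighbors["Item5"])
print(predict_online(alice, "Item5", neighbors))
```

Keeping only the top `k` neighbours bounds both the storage of the precomputed model and the cost of each online prediction.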
Model-Based Approach
• Besides the various preprocessing techniques used in so-called model-based
approaches, another option is to exploit only a certain fraction of
the rating matrix to reduce the computational complexity.
• Basic techniques include subsampling, which can be accomplished by
randomly choosing a subset of the data or by ignoring customer
records that have only a very small set of ratings or that only contain
very popular items.
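A sketch of the subsampling options listed above, assuming ratings stored as `{user: {item: rating}}`. The parameters `fraction`, `min_ratings`, `popular_items` and `seed` are illustrative choices, not values from the slides, and the sample data is hypothetical.

```python
import random


def subsample(ratings, fraction, min_ratings, popular_items=frozenset(), seed=0):
    """Return a random subset of user records.

    ratings:       {user: {item: rating}}
    fraction:      share of the eligible users to keep (0..1)
    min_ratings:   users with fewer ratings than this are ignored
    popular_items: users who rated ONLY items from this set are ignored
    seed:          makes the random choice reproducible
    """
    popular = set(popular_items)
    eligible = sorted(
        user for user, items in ratings.items()
        if len(items) >= min_ratings and not set(items) <= popular
    )
    k = round(fraction * len(eligible))
    chosen = random.Random(seed).sample(eligible, k)
    return {user: ratings[user] for user in chosen}


data = {  # hypothetical rating records
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4},                  # too few ratings
    "u3": {"A": 2, "B": 5},          # only popular items
    "u4": {"B": 1, "C": 2, "D": 5},
    "u5": {"C": 3, "D": 4},
}
print(subsample(data, fraction=0.5, min_ratings=2, popular_items={"A", "B"}))
```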
Data Sparsity Problem and Cold Start Problem
• Cold start problem
• How to recommend new items? What to recommend to new users?
• Straightforward approaches
• Ask/force users to rate a set of items
• Use another method (e.g., content-based, demographic or simply non-
personalized) in the initial phase
• Alternatives
• Use better algorithms (beyond nearest-neighbor approaches)
• Example:
• In nearest-neighbor approaches, the set of sufficiently similar neighbors might be too
small to make good predictions
• Assume "transitivity" of neighborhoods
Data Sparsity Problem and Cold Start Problem

Figure: Ratings database for the spreading activation approach. A 0 in this matrix
should not be interpreted as an explicit (poor) rating, but rather as a missing rating.
Figure: Graphical representation of user–item relationships.
Data Sparsity Problem and Cold Start Problem
• In a standard user-based or item-based CF approach, paths of length 3 will
be considered – that is, Item3 is relevant for User1 because there exists a
three-step path (User1–Item2–User2–Item3) between them.
• Using path length 5, for instance, would allow for the recommendation also
of Item1, as two five-step paths exist that connect User1 and Item1.
• Because the computation of these distant relationships is computationally
expensive, Huang et al. (2004) propose transforming the rating matrix into
a bipartite graph of users and items.
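A sketch that counts the connecting paths described above with matrix products on a 0/1 user-item matrix: entry (u, j) of R·Rᵀ·R counts user-item-user-item walks of three steps, and R·Rᵀ·R·Rᵀ·R counts five-step walks. The matrix is hypothetical, chosen so that the paths named in the text exist, and may differ from the slide's table. This walk counting is only a simple illustration, not the spreading activation algorithm of Huang et al. (2004).

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Hypothetical binary user-item matrix: 1 = rated, 0 = missing (not a poor rating).
users = ["User1", "User2", "User3"]
items = ["Item1", "Item2", "Item3", "Item4"]
R = [
    [0, 1, 0, 1],  # User1
    [0, 1, 1, 1],  # User2
    [1, 0, 1, 0],  # User3
]

Rt = transpose(R)
paths3 = matmul(matmul(R, Rt), R)       # user -> item -> user -> item
paths5 = matmul(matmul(paths3, Rt), R)  # two more steps: -> user -> item

print("3-step paths User1 -> Item3:", paths3[0][2])
print("3-step paths User1 -> Item1:", paths3[0][0])
print("5-step paths User1 -> Item1:", paths5[0][0])
```

With this matrix there are two three-step paths to Item3 (via Item2 and via Item4, both through User2), none to Item1, and two five-step paths to Item1, so Item1 only becomes reachable when longer paths are allowed.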
Data Sparsity Problem and Cold Start Problem
• The quality of the recommendations can be significantly improved
with the proposed technique based on indirect relationships, in
particular when the ratings matrix is sparse.
• For new users, the algorithm leads to measurable performance
increases compared with standard collaborative filtering
techniques.
More model-based approaches
• A plethora of different techniques has been proposed in recent years, e.g.,
• Matrix factorization techniques, statistics
• singular value decomposition, principal component analysis
• Association rule mining
• compare: shopping basket analysis
• Probabilistic models
• clustering models, Bayesian networks, probabilistic Latent Semantic Analysis
• Various other machine learning approaches
• Costs of pre-processing
• Usually not discussed
• Incremental updates possible?
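As one example from the list, association rule mining ("shopping basket analysis") can be sketched for rules between single items: support is the share of baskets containing both items, and confidence of x → y is the share of baskets with x that also contain y. This is a deliberately reduced version; full algorithms such as Apriori handle itemsets of any size. The baskets and thresholds are hypothetical, and `pair_rules` is an illustrative name.

```python
from itertools import combinations


def pair_rules(baskets, min_support, min_confidence):
    """Mine rules {x} -> {y} between single items from a list of baskets (sets)."""
    n = len(baskets)
    item_count, pair_count = {}, {}
    for basket in baskets:
        items = set(basket)
        for i in items:
            item_count[i] = item_count.get(i, 0) + 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):  # a pair yields a rule in each direction
            confidence = c / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


baskets = [  # hypothetical transactions
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
]
for x, y, s, c in pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    print(f"{x} -> {y}  support={s:.2f} confidence={c:.2f}")
```

In a recommender, such a rule can be used to suggest y to a user whose history contains x, which is the same "people who liked this also liked" intuition as item-based CF.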
Exercises
1. Create an Excel file that computes the predicted rating in the Alice case
from the earlier slides (the User-Based Collaborative Filtering material),
using item-based collaborative filtering. Specify the similarity threshold
to be used (entered as an input in the Excel file). Use both cosine similarity
and adjusted cosine similarity, and compare the predictions obtained with
the two similarity formulas.
2. Write a Python program that solves exercise 1.
The input may be a rating matrix of arbitrary dimensions.
Exercises
5. Explain the advantages of item-based collaborative filtering over
user-based collaborative filtering.
6. Explain the advantages of adjusted cosine similarity over basic cosine
similarity.
7. Explain the difference between implicit ratings and explicit ratings.
8. Explain what is meant by the sparsity problem, and what its effects
are.
9. Explain what is meant by the cold start problem, and what its effects
are.
