Recommender System - New
Matrix Factorization
Venkateswara
NIT Warangal
Disclaimer: Some of these slides are taken from “Recommender Systems – An Introduction, Dietmar Jannach, Markus Zanker,
Alexander Felfernig, Gerhard Friedrich, Cambridge University Press” slides.
Agenda
▪ Introduction to Recommender Systems
▪ Paradigms of Recommender Systems
▪ Collaborative Filtering
– Memory based Collaborative Filtering
– Collaborative Filtering via Matrix Factorization
– Geometrical interpretation of matrix factorization
Recommender Systems
▪ Data here, data there, data everywhere.
▪ There is a need to derive meaningful insights that can help
– organizations function better,
– users in their decision-making process.
▪ The amount of data is enormous, so exploring all of it takes a great deal of time.
▪ To avoid information overload, we need assistance in our day-to-day lives in the form of recommendations.
Recommender System
The world has moved from "one size fits all" to personalized, tailor-made solutions.
Book Recommender
[Figure: a book recommender matches a user profile against titles such as Red Mars, Foundation, Jurassic Park, Machine Learning, The Lost World, and The Difference Engine.]
Recommender Systems
▪ Recommendation systems (RS) help to match users with
items
– Ease information overload
– Sales assistance (guidance, advisory, persuasion,…)
▪ RS seen as a function
– Given:
• User model (e.g. ratings, preferences, demographics,
situational context)
• Items (with or without description of item characteristics)
– Find:
• Relevance score. Used for ranking.
Paradigms of Recommender Systems
Pure CF Approaches
▪ Input
– Only a matrix of given user–item ratings
▪ Output types
– A (numerical) prediction indicating to what degree the current user will like or dislike
a certain item
– A top-N list of recommended items
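The second output type can be sketched in code: given predicted relevance scores, a top-N list is just the N highest-scoring unrated items (the scores below are made up for illustration):

```python
# Illustrative only: predicted relevance scores for items the user has not rated.
scores = {"Red Mars": 4.2, "Foundation": 3.1, "Jurassic Park": 4.8, "The Lost World": 2.5}

def top_n(scores, n):
    """Rank items by predicted relevance score and return the N best."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(top_n(scores, 2))  # ['Jurassic Park', 'Red Mars']
```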
Collaborative Filtering
▪ Collaborative Filtering (CF) aims at predicting a user's interest in given items based on his or her preferences so far, and the preference information of other users.
▪ Let $Y = [y_{ij}]$ be an $N \times M$ user/item rating matrix, where $R$ is the number of rating levels and 0 indicates an unknown rating. The goal is to predict the unknown ratings, represented by $y_{ij} = 0$ ("Will user i like movie j?").
[Example users × movies rating matrix:
0 4 0 0 1 0 0
0 0 0 4 0 0 3
2 0 0 0 3 0 5
0 5 0 0 1 3 0
3 0 1 0 5 0 0
1 0 0 3 0 0 4 ]
Collaborative Filtering
Collaborative Prediction
User-based nearest-neighbor collaborative filtering
▪ Example
– A database of ratings of the current user, Alice, and some other users is given:
– Determine whether Alice will like or dislike Item5, which Alice has not yet rated or
seen
User-based nearest-neighbor collaborative filtering (3)
▪ Some first questions
– How do we measure similarity?
– How many neighbors should we consider?
– How do we generate a prediction from the neighbors' ratings?
Measuring user similarity (1)
▪ A popular similarity measure in user-based CF: Pearson correlation
𝑎, 𝑏 : users
𝑟𝑎,𝑝 : rating of user 𝑎 for item 𝑝
𝑃 : set of items, rated both by 𝑎 and 𝑏
– Possible similarity values between −1 and 1
$$\mathrm{sim}(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}$$
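The Pearson measure can be sketched as follows (here the user means are taken over the co-rated set P, one common convention):

```python
import numpy as np

def pearson_sim(ratings_a, ratings_b):
    """Pearson correlation between two users over their co-rated items.

    ratings_a, ratings_b: dicts mapping item -> rating; only items rated
    by both users (the set P) enter the sums.
    """
    common = set(ratings_a) & set(ratings_b)   # the set P of co-rated items
    if len(common) < 2:
        return 0.0                             # not enough overlap to correlate
    ra = np.array([ratings_a[p] for p in common], dtype=float)
    rb = np.array([ratings_b[p] for p in common], dtype=float)
    da, db = ra - ra.mean(), rb - rb.mean()
    denom = np.sqrt((da ** 2).sum()) * np.sqrt((db ** 2).sum())
    return float((da * db).sum() / denom) if denom else 0.0
```

Two users whose ratings move together (even on different scales) get similarity close to 1; opposite rating behavior gives values near −1.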
Pearson correlation
▪ Takes differences in rating behavior into account
[Figure: line chart of the ratings (0–6) given by Alice, User1, and User4 to Item1–Item4.]
Making predictions
▪ A common prediction function:
$$\mathrm{pred}(a,p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a,b)\,(r_{b,p} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a,b)}$$
▪ Calculate whether the neighbors' ratings for the unseen item 𝑝 are higher or lower than their average
▪ Combine the rating differences – use the similarity with 𝑎 as a weight
▪ Add/subtract the neighbors' bias from the active user's average and use this as a
prediction
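The prediction function can be sketched in code (a minimal sketch; the triple format for neighbor data is an assumption for illustration):

```python
def predict(r_a_mean, neighbors):
    """User-based CF prediction for one item.

    r_a_mean : the active user's mean rating
    neighbors: list of (sim, r_bp, r_b_mean) triples -- similarity to the
               active user, the neighbor's rating for item p, and the
               neighbor's mean rating.
    """
    num = sum(sim * (r_bp - r_b_mean) for sim, r_bp, r_b_mean in neighbors)
    # The slide sums raw similarities in the denominator; implementations
    # often use abs(sim) instead to guard against negative weights.
    den = sum(sim for sim, _, _ in neighbors)
    return r_a_mean + num / den if den else r_a_mean
```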
Item-based collaborative filtering
▪ Basic idea:
– Use the similarity between items (and not users) to make predictions
▪ Example:
– Look for items that are similar to Item5
– Take Alice's ratings for these items to predict the rating for Item5
The cosine similarity measure
▪ Produces better results in item-to-item filtering
▪ Ratings are seen as vectors in n-dimensional space
▪ Similarity is calculated based on the angle between the vectors
$$\mathrm{sim}(a,b) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}|\,|\vec{b}|}$$
▪ Adjusted cosine similarity
– take average user ratings into account, transform the original ratings
– 𝑈: set of users who have rated both items 𝑎 and 𝑏
$$\mathrm{sim}(a,b) = \frac{\sum_{u \in U} (r_{u,a} - \bar{r}_u)(r_{u,b} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,a} - \bar{r}_u)^2}\,\sqrt{\sum_{u \in U} (r_{u,b} - \bar{r}_u)^2}}$$
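A sketch of the adjusted cosine similarity on a small rating matrix (0 denotes an unrated item; the users × items layout is assumed):

```python
import numpy as np

def adjusted_cosine(R, a, b):
    """Adjusted cosine similarity between item columns a and b of rating
    matrix R (users x items, 0 = unrated): each user's mean rating is
    subtracted before computing the cosine."""
    rated = R > 0
    both = rated[:, a] & rated[:, b]                 # users who rated both items
    if not both.any():
        return 0.0
    user_means = np.where(rated.any(axis=1),
                          R.sum(axis=1) / np.maximum(rated.sum(axis=1), 1), 0.0)
    da = R[both, a] - user_means[both]
    db = R[both, b] - user_means[both]
    denom = np.linalg.norm(da) * np.linalg.norm(db)
    return float(da @ db / denom) if denom else 0.0
```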
Making predictions
▪ A common prediction function (a weighted sum of the active user's own ratings, weighted by item similarity):
$$\mathrm{pred}(u,p) = \frac{\sum_{i \in \mathrm{ratedItems}(u)} \mathrm{sim}(i,p)\, r_{u,i}}{\sum_{i \in \mathrm{ratedItems}(u)} \mathrm{sim}(i,p)}$$
Data sparsity problems
▪ Cold start problem
– How to recommend new items? What to recommend to new users?
▪ Straightforward approaches
– Ask/force users to rate a set of items
– Use another method (e.g., content-based, demographic or simply non-personalized)
in the initial phase
– Default voting: assign default values to items that only one of the two users to be
compared has rated (Breese et al. 1998)
▪ Alternatives
– Use better algorithms (beyond nearest-neighbor approaches)
– Example:
• In nearest-neighbor approaches, the set of sufficiently similar neighbors might be too
small to make good predictions
• Assume "transitivity" of neighborhoods
Recommender Systems
Model based Collaborative Filtering
Collaborative Prediction Via Matrix Factorization
• CP can be formalized as a matrix completion problem,
completing entries in a partially observed rating matrix Y.
• Given the rating matrix $Y \in \mathbb{R}^{n \times m}$, we want to find two matrices $U \in \mathbb{R}^{n \times k}$ and $V \in \mathbb{R}^{m \times k}$ such that:
$$Y \approx UV^T$$
where k is the number of features.
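A minimal sketch of such a factorization via stochastic gradient descent over the observed entries (the squared-error loss and the hyperparameters are illustrative choices, not prescribed by the slide):

```python
import numpy as np

def factorize(Y, k=2, steps=2000, lr=0.01, lam=0.02, seed=0):
    """Approximate Y (0 = unknown) as U @ V.T with k latent features,
    using SGD on the squared error over observed entries only."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U, V = rng.random((n, k)), rng.random((m, k))
    obs = [(i, j) for i in range(n) for j in range(m) if Y[i, j] != 0]
    for _ in range(steps):
        for i, j in obs:
            err = Y[i, j] - U[i] @ V[j]           # residual on one known rating
            ui = U[i].copy()                      # use old U[i] for V's update
            U[i] += lr * (err * V[j] - lam * U[i])
            V[j] += lr * (err * ui - lam * V[j])
    return U, V
```

After training, `U @ V.T` reproduces the observed ratings closely and fills in the unknown entries.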
Collaborative Prediction via Matrix Factorization
[Worked example: a 7×7 rating matrix Y with entries in {−1, 0, +1} (0 = unknown) is factorized as Y ≈ UVᵀ. Each user U1–U7 has a 2-dimensional factor vector over latent dimensions labelled Drama and Comedy (e.g., U1 = (−0.63, −0.50)), and so does each movie V1–V7 (e.g., V1 = (−0.37, 0.94)); the dense product UVᵀ provides predictions for the unknown entries.]
Collaborative Prediction via Matrix Factorization
Matrix Factorization
▪ With the regularized squared-error objective $J = \| I \odot (Y - UV^T) \|_F^2 + \frac{\lambda}{2}(\|U\|_F^2 + \|V\|_F^2)$, where $I$ is the indicator matrix of observed entries:
$$\frac{\partial J}{\partial V} = -2\,\big(I \odot (Y - UV^T)\big)^T U + \lambda V$$
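A sketch of one full-batch gradient-descent step built from this gradient (assuming the regularized squared-error objective, with I the 0/1 indicator of observed entries):

```python
import numpy as np

def grad_step(Y, U, V, lam=0.1, lr=0.01):
    """One full-batch gradient step for
    J = ||I * (Y - U V^T)||_F^2 + (lam/2)(||U||_F^2 + ||V||_F^2),
    where I masks the observed (non-zero) entries of Y."""
    I = (Y != 0).astype(float)          # indicator of known ratings
    E = I * (Y - U @ V.T)               # masked residual
    gU = -2 * E @ V + lam * U           # dJ/dU
    gV = -2 * E.T @ U + lam * V         # dJ/dV, matching the slide's gradient
    return U - lr * gU, V - lr * gV
```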
Non-Negative Matrix Factorization(NMF)
▪ Instead of introducing a regularization constraint as done in RMF, NMF imposes a non-negativity restriction on the individual elements of the factors U and V.
▪ Given a non-negative matrix $Y \in \mathbb{R}^{n \times m}$, NMF tries to find non-negative matrix factors $U \in \mathbb{R}^{n \times k}$ and $V \in \mathbb{R}^{k \times m}$ such that:
$$Y \approx UV$$
[Figure: rows of U (users) and columns of Vᵀ (movies) as non-negative factor vectors.]
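One standard way to compute such a factorization is the Lee–Seung multiplicative update rule; the sketch below assumes a fully observed non-negative Y and the squared-error objective:

```python
import numpy as np

def nmf(Y, k=2, iters=1000, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for Y ~ U @ V with U, V >= 0
    (squared-error version; assumes all entries of Y are observed)."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = rng.random((n, k)) + eps
    V = rng.random((k, m)) + eps
    for _ in range(iters):
        V *= (U.T @ Y) / (U.T @ U @ V + eps)   # update keeps V non-negative
        U *= (Y @ V.T) / (U @ V @ V.T + eps)   # update keeps U non-negative
    return U, V
```

Because each update multiplies by a non-negative ratio, U and V stay non-negative throughout, which is exactly the constraint NMF imposes.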
Maximum Margin Matrix Factorization
▪ When the rating matrix contains only two levels (±1)
– Rows of V can be viewed as points in k-dimensional space and
– Rows of U as decision hyperplanes in this space separating +1 entries from −1 entries.
▪ When the hinge/smooth hinge loss is used, the hyperplanes act as maximum-margin separators.
[Figure: columns of Vᵀ (movies) as points and rows of U (users) as separating hyperplanes.]
$$J = \lambda\,(\|U\|_F^2 + \|V\|_F^2) + \sum_{(i,j) \in \Omega}\, \sum_{r=1}^{R-1} h\!\left(T_{ij}^{r}\,(\theta_{ir} - U_i V_j^T)\right)$$
where Ω is the set of observed entries, θ_ir are the rating thresholds of user i, T_ij^r = +1 if r ≥ y_ij and −1 otherwise, and h is the (smooth) hinge loss.
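A sketch of evaluating this objective (the threshold convention T_ij^r = +1 if r ≥ y_ij, else −1, follows the usual MMMF formulation; names are illustrative):

```python
import numpy as np

def hinge(z):
    """Hinge loss h(z) = max(0, 1 - z)."""
    return np.maximum(0.0, 1.0 - z)

def mmmf_loss(Y, U, V, theta, lam):
    """MMMF objective for ordinal ratings Y in {1..R} (0 = unobserved).
    theta[i, r] is threshold r+1 of user i (R-1 thresholds per user)."""
    total = lam * ((U ** 2).sum() + (V ** 2).sum())
    X = U @ V.T
    n, R_minus_1 = theta.shape
    for i, j in zip(*np.nonzero(Y)):
        for r in range(R_minus_1):
            T = 1.0 if (r + 1) >= Y[i, j] else -1.0   # which side of threshold
            total += hinge(T * (theta[i, r] - X[i, j]))
    return float(total)
```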
Maximum Margin Matrix Factorization
▪ This extension of the hinge loss function can be interpreted geometrically.
[Figure: the thresholds θ_ir define parallel maximum-margin boundaries along each user's hyperplane direction, separating the item points by rating level.]
Hierarchical Matrix Factorization for Collaborative Filtering
$$\min_{U,V} J(U,V) = \sum_{(i,j) \in \Omega} h\!\left(y_{ij}\, U_i V_j^T\right) + \frac{\lambda}{2}\left(\|U\|_F^2 + \|V\|_F^2\right)$$

[Worked example: the ordinal rating matrix Y (levels 1–5, 0 = unknown) is decomposed into binary ±1 matrices Y1–Y4, one per rating level; each binary matrix is completed with the maximum-margin objective above, and the per-level predictions are aggregated into the final predicted rating matrix Ȳ.]
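The first step of such a hierarchical scheme — splitting an ordinal rating matrix into per-level binary ±1 matrices — can be sketched as follows (an illustrative helper, not the exact decomposition on the slide):

```python
import numpy as np

def binarize_levels(Y, R):
    """Split an ordinal rating matrix Y with values in {0, 1..R}
    (0 = unknown) into R-1 signed binary matrices: at level r,
    ratings > r become +1, ratings <= r become -1, unknowns stay 0."""
    return [np.where(Y == 0, 0, np.where(Y > r, 1, -1)) for r in range(1, R)]
```

Each binary matrix can then be completed independently by a maximum-margin factorization, and the level at which the prediction flips sign recovers the ordinal rating.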
Proximal Matrix Factorization
▪ We observe that there could be several
possible alternative criteria to formulate the
factorization problem of discrete ordinal
rating matrix, other than the maximum
margin criterion.
▪ Taking a cue from the alternative formulations of support vector machines, a novel loss function is derived by considering proximity as an alternative criterion for the matrix factorization framework.
$$\min_{U,V} J(U,V) = \sum_{r=1}^{R} \sum_{\substack{(i,j) \in \Omega \\ y_{ij} = r}} \left(U_i V_j^T - \theta_{ir}^{*}\right)^2 + \sum_{r=1}^{R} \sum_{\substack{(i,j) \in \Omega \\ y_{ij} \neq r}} h\!\left(T_{ijr}\left(U_i V_j^T - \theta_{ir}^{*}\right)\right) + \frac{\lambda}{2}\left(\|U\|_F^2 + \|V\|_F^2\right)$$
Research Topics in Recommender Systems
▪ Application-specific recommender systems.
▪ Geography-based recommender systems.
▪ Context-aware recommender systems.
▪ Privacy and security issues in recommender systems.
▪ Cross-domain recommender systems.
▪ Group recommender systems.
▪ Evaluation in recommender systems.
▪ Justified recommender systems.
▪ Multi-criteria recommender systems.