DM - Lecture 5
RECOMMENDER SYSTEMS
Jesse Davis
Economics:
Traditional Retail vs. The Web
[Figure: sales by retailer type: stores, mixed (e.g., Amazon), online only (e.g., iTunes)]
Editorial
List of favorites
Essential items
Aggregates
Top 10 lists
Most emailed articles
Most recent posts
[Figure: users x items ratings matrix. Known ratings are shown as numbers; the most recent ratings are marked ? as the entries to predict.]
Metrics: Compare with Known Rating
Order of prediction
Learn a model
Genre
Director
Etc.
News articles:
Words, title, author, etc.
User Profiles and
Content-Based Prediction
Advantages
Only need data about one user
More personalized approach
Disadvantages
Must manually construct meaningful features
Never recommends items outside of a
user’s content profile
Hard to build a profile for a new user
Netflix and Recent Advances
Present Recommendation in the
Context of Netflix Challenge
Advantages
Developing a comparable system internally is expensive
Publicity is good
Disadvantages
Privacy concerns (user backlash to data release)
Prize won too quickly or no one wins the prize
Big idea:
Find other users whose ratings are similar to the
current user
Propagate the (dis)likes to the current user
Pictorial Overview
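Known ratings (- = not rated):
        R1  R2  R3  R4  R5  R6
Alice    2   -   3   2   -   1
Bob      2   5   4   -   -   2
Chris    4   3   -   -   -   5
Diana    3   -   2   4   -   5
Active user:
        R1  R2  R3  R4  R5  R6
Eve      2   5   ?   ?   ?   ?
Find the users whose ratings correlate with Eve's, then predict her missing ratings (e.g., R3 = 4)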
General Algorithm
Idea 1: Jaccard similarity: $\mathrm{sim}(S_i, S_j) = \frac{|S_i \cap S_j|}{|S_i \cup S_j|}$
Idea 2: Cosine similarity: $\mathrm{sim}(S_i, S_j) = \frac{S_i \cdot S_j}{\|S_i\| \cdot \|S_j\|}$
Problem: Treats missing ratings as zero
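A minimal Python sketch of both similarity ideas, applied to the Alice and Bob rows of the example table. It assumes ratings are stored as dictionaries keyed by item, that S_i for Jaccard is the set of items a user has rated, and that cosine fills missing ratings with zero (the problem noted above). The function names are illustrative, not from the lecture.

```python
import math

# Ratings copied from the example table; unrated items are simply absent.
alice = {"R1": 2, "R3": 3, "R4": 2, "R6": 1}
bob = {"R1": 2, "R2": 5, "R3": 4, "R6": 2}


def jaccard(a, b):
    """Jaccard similarity of the sets of rated items (rating values are ignored)."""
    items_a, items_b = set(a), set(b)
    union = items_a | items_b
    return len(items_a & items_b) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity where a missing rating counts as zero."""
    items = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in items)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


sim_jaccard = jaccard(alice, bob)
sim_cosine = cosine(alice, bob)
```

Jaccard discards the rating values entirely, and cosine cannot tell an unrated item from a very low rating, which is the motivation for a measure that works on deviations from each user's mean.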
Step 1: Similarity Weighting
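Pearson Correlation
$$W_{ij} = \frac{\sum_k (R_{ik} - \bar{R}_i)(R_{jk} - \bar{R}_j)}{\sqrt{\sum_k (R_{ik} - \bar{R}_i)^2 \, \sum_k (R_{jk} - \bar{R}_j)^2}}$$
$$\bar{R}_i = \frac{1}{m} \sum_{e=1}^{m} R_{ie}$$
(Average rating given by user i)
$R_{ik}$ = User i's rating on item k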
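A minimal Python sketch of the Pearson weight. It assumes the sums over k run over the items both users have rated, and that each user's mean is taken over all items that user rated; the slide text does not state either choice explicitly. The helper names are hypothetical.

```python
import math


def mean_rating(ratings):
    """Average rating given by one user (ratings: dict item -> rating)."""
    return sum(ratings.values()) / len(ratings)


def pearson(ri, rj):
    """Pearson weight W_ij, summing over the items rated by both users."""
    common = set(ri) & set(rj)
    if not common:
        return 0.0  # no overlap: treat the users as uncorrelated
    mean_i, mean_j = mean_rating(ri), mean_rating(rj)
    num = sum((ri[k] - mean_i) * (rj[k] - mean_j) for k in common)
    var_i = sum((ri[k] - mean_i) ** 2 for k in common)
    var_j = sum((rj[k] - mean_j) ** 2 for k in common)
    den = math.sqrt(var_i * var_j)
    return num / den if den else 0.0


# Rows taken from the example table.
eve = {"R1": 2, "R2": 5}
bob = {"R1": 2, "R2": 5, "R3": 4, "R6": 2}
w_eve_bob = pearson(eve, bob)
```

Unlike cosine on zero-filled vectors, unrated items never enter the sums, so a missing rating is not mistaken for a low one.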
Predict a rating
Account for different rating levels by looking at
difference from a user’s average rating
Weight each user’s contribution by similarity
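The prediction step described above can be sketched as follows. The slide's exact formula is not preserved in this text, so the sketch assumes one common form: the active user's mean plus the similarity-weighted average of the other users' deviations from their own means, normalized by the sum of absolute weights. The function name and the weights used in the example are hypothetical.

```python
def predict_cf(active, others, weights, item):
    """Predict the active user's rating for `item`.

    active:  dict item -> rating for the active user
    others:  dict user -> (dict item -> rating)
    weights: dict user -> similarity to the active user (e.g., Pearson)
    """
    mean_active = sum(active.values()) / len(active)
    num = 0.0
    den = 0.0
    for user, ratings in others.items():
        if item not in ratings:
            continue  # this user cannot contribute to this item
        w = weights.get(user, 0.0)
        mean_user = sum(ratings.values()) / len(ratings)
        num += w * (ratings[item] - mean_user)  # deviation from own average
        den += abs(w)
    if den == 0:
        return mean_active  # nobody similar rated the item
    return mean_active + num / den


eve = {"R1": 2, "R2": 5}
others = {
    "Alice": {"R1": 2, "R3": 3, "R4": 2, "R6": 1},
    "Bob": {"R1": 2, "R2": 5, "R3": 4, "R6": 2},
}
# Illustrative weights only; in practice they come from the similarity step.
weights = {"Alice": 0.5, "Bob": 1.0}
estimate_r3 = predict_cf(eve, others, weights, "R3")
```

Working with deviations means a generous rater's 4 and a strict rater's 4 are not treated as the same signal.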
Advantages
Simple and intuitive approach for any item type
No feature construction and selection
Exploits information about other users
Disadvantages
Data is sparse: Hard to find similar users
Cold start: Need enough users in the database
First rater: Can’t recommend unrated items,
e.g., new or unique items
Popularity bias: Favors items that lots of people like
(i.e., bad if you have unique taste)
Performance of Various Methods (RMSE; lower is better)
Global average: 1.1296
Netflix: 0.9514
Basic Collaborative filtering: 0.94
Predict the “Right Thing”
Our task
Predict(user_id, movie_id, ?)
Minimize RMSE of predictions
Two points:
Obvious: Better results if model optimized
towards the given objective
Subtle: To get good results, often have to derive
a new target variable to predict
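For reference, the RMSE objective named above can be sketched in a few lines of Python. The function name and argument layout are assumptions made for illustration.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because errors are squared before averaging, one prediction that is off by two stars costs as much as four predictions that are each off by one.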
What Affects a User’s Rating?
Baseline estimation: $r_{xi} = b_{xi} = \mu + b_x + b_i$
Example: 3.7 + 0.5 + (-0.2) = 4.0
Joe will rate The Sixth Sense 4 stars
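A minimal Python sketch of the baseline estimate. How the lecture fits the biases is not preserved in this text, so the sketch assumes the simplest estimator: mu is the global mean, and each user or item bias is the average deviation of its ratings from mu. All function names are hypothetical.

```python
def fit_baseline(ratings):
    """Estimate mu, user biases, and item biases.

    ratings: list of (user, item, rating) triples.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    by_user, by_item = {}, {}
    for user, item, r in ratings:
        by_user.setdefault(user, []).append(r - mu)
        by_item.setdefault(item, []).append(r - mu)
    b_user = {u: sum(d) / len(d) for u, d in by_user.items()}
    b_item = {i: sum(d) / len(d) for i, d in by_item.items()}
    return mu, b_user, b_item


def predict_baseline(mu, b_user, b_item, user, item):
    """Baseline estimate mu + b_x + b_i; unseen users or items get bias 0."""
    return mu + b_user.get(user, 0.0) + b_item.get(item, 0.0)
```

Plugging the example's numbers into `predict_baseline` (a mean of 3.7 and biases of 0.5 and -0.2) gives 4.0 up to floating-point rounding.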
Three Problems with the
Collaborative Filtering Model
Think Outside the Box
Let’s Think about the Task
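[Figure: movies placed in a two-dimensional latent factor space. Factor 1 runs from serious to light-hearted; one end of Factor 2 is labeled "Action". Movies shown: Braveheart, Dumb and Dumber, Lethal Weapon, Syriana, Ocean’s 11.]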
Latent Factor Model
$R \approx Q \times P^{T}$
R = u x m matrix (users x movies; unknown ratings marked ?)
Q = u x d matrix (users x “topics”)
P = m x d matrix (movies x “topics”)
Key idea: d ≪ m, u
“Topics”: shared hidden structure
(e.g., how much each user likes each genre)
Latent Factor Models
[Figure: a users x items ratings matrix approximated by the product of a user-factor matrix and an item-factor matrix with three factors.]
Worked example for one missing rating:
? = -.5 * -2 + .6 * .3 + .5 * 2.4 = 2.38
Note: the item-factor matrix should be transposed; it is shown untransposed to save space
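The worked example is a dot product of two three-element factor vectors, which a short Python sketch can reproduce. Labeling the first vector as the user's factors and the second as the item's factors is an assumption; the extracted slide text does not make clear which is which.

```python
def predict_rating(user_factors, item_factors):
    """Estimated rating: dot product of two equal-length factor vectors."""
    if len(user_factors) != len(item_factors):
        raise ValueError("factor vectors must have the same length")
    return sum(p * q for p, q in zip(user_factors, item_factors))


user_factors = [-0.5, 0.6, 0.5]   # first operands in the worked example
item_factors = [-2.0, 0.3, 2.4]   # second operands in the worked example
estimate = predict_rating(user_factors, item_factors)
```

Up to floating-point rounding, `estimate` matches the 2.38 computed on the slide.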
Optimization problem
Objective is non-convex
[Figure: the partially observed ratings matrix R is approximated by the product of two factor matrices whose entries (?) are all unknown and must be learned.]
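The slide's exact objective is not preserved in this text. A common choice, assumed here, is the squared error over the known ratings only, written with item factors $q_i$ and user factors $p_x$:

```latex
\min_{P,\,Q} \; \sum_{(x,i)\,:\,r_{xi}\ \text{known}} \left( r_{xi} - q_i \cdot p_x \right)^2
```

The product $q_i \cdot p_x$ is what makes this non-convex when both sets of factors are optimized jointly; with one set held fixed, the problem in the other set is an ordinary least-squares problem.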
Intuition of a Solution
[Figure: ratings matrix R (rows: items) approximated by the product of Q (items x factors) and P^T.]
$\hat{r}_{xi} = q_i \cdot p_x = \sum_f q_{if} \, p_{xf}$
$r_{xi} = \mu + b_x + b_i + q_i \cdot p_x^{T}$
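One way to fit a model of this form is stochastic gradient descent on the squared error over the known ratings. The Python sketch below is an illustration under stated assumptions, not necessarily the lecture's method: the learning rate, the L2 regularization term, the initialization range, and all function names are choices made here.

```python
import random


def train_mf(ratings, n_factors=2, epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit mu, biases, and factors by SGD.

    ratings: list of (user, item, rating) triples.
    Returns (mu, b_user, b_item, p, q).
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    b_user = {u: 0.0 for u in users}
    b_item = {i: 0.0 for i in items}
    # Small random factors so the symmetric all-zero point is avoided.
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            dot = sum(pf * qf for pf, qf in zip(p[u], q[i]))
            err = r - (mu + b_user[u] + b_item[i] + dot)
            b_user[u] += lr * (err - reg * b_user[u])
            b_item[i] += lr * (err - reg * b_item[i])
            for f in range(n_factors):
                pf, qf = p[u][f], q[i][f]  # use the old values for both updates
                p[u][f] += lr * (err * qf - reg * pf)
                q[i][f] += lr * (err * pf - reg * qf)
    return mu, b_user, b_item, p, q


def predict_mf(model, user, item):
    """mu + b_x + b_i + q_i . p_x; unseen users or items fall back to mu."""
    mu, b_user, b_item, p, q = model
    dot = sum(pf * qf for pf, qf in zip(p.get(user, []), q.get(item, [])))
    return mu + b_user.get(user, 0.0) + b_item.get(item, 0.0) + dot
```

Calling `train_mf` on a list of (user, item, rating) triples and then `predict_mf(model, user, item)` gives an estimate for any pair. The regularization term is not part of the formula above; it is included because unregularized factors tend to overfit sparse rating data.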
Know Your Data
Temporal Effect: Early 2004 Sudden
Jump in Average Movie Rating
The More (Models) the Merrier
What Now??