APPM 3310 Final Project
Netflix Recommender Algorithm
Straub, Ahonen, Benalcazar
1 Abstract
This report explains the different techniques used in the Netflix recommender algorithm to
model and predict ratings for movies and TV shows. It turns out that the Singular Value
Decomposition (SVD), and different variations of it, can be used to extract trends and similarities
between users and movies/shows. The SVD has become revolutionary in
collaborative filtering and recommender algorithms. Through Principal Component Analysis (PCA), the
SVD reduces the rank of a matrix and provides the most "important" or crucial latent feature trends of
the matrix, trends that would otherwise be hard to find.
This paper begins by explaining and forming the basic algorithm used to predict all
unknown ratings. We then explain how different factors can be added to this baseline model
to improve the accuracy of each predicted rating, and ultimately how numerical
methods can further reduce the error in the model. We will not go into extreme mathematical detail on
the parameters or the numerical techniques in this paper, but will provide some general
understanding. The focus is the importance and use of the SVD matrix factorization in the
recommender algorithm, for which we provide clear details. This paper is written for
individuals with a basic understanding of linear algebra and its concepts.
2 Introduction
Recommender algorithms are data-filtering systems used widely in today's technological
applications, ranging from Amazon and Netflix to news sites. The goal of the recommender is to
determine which items the user would enjoy, based on the limited amount of information available about the
user. In the long run, these recommenders reduce the amount of searching a user must do to
stumble upon what he or she may like. They transform research into discovery. For example,
Netflix offers almost 10,000 different movies and shows for a user to choose from (closer to
100,000 movies when including their mailing service), which can produce an overwhelming
experience when trying to select the perfect movie [3]. The recommender algorithm sidesteps
all the "fluff" and presents, as accurately as it can, only the items that the user will enjoy.
These algorithms not only make the search experience less hectic, but also provide a
customized and personal viewing experience. Ultimately, the main motivation for these
companies to constantly improve these algorithms is business. Netflix realizes that
it has only a short time frame to catch a viewer's attention before the viewer resorts to
another streaming service or activity; keeping viewers happy is how Netflix
maintains its subscribers. Likewise, Amazon presents items that it knows its users will
enjoy so that consumers spend more money through its website. All in all, the use of
recommender algorithms creates a win-win situation for both consumer and corporation.
To fully explain the process of the algorithm, and the SVD used within it, the math for an example
problem with a 2x3 matrix will be worked through thoroughly. In addition, Matlab code will be provided to
solve a larger, more realistic rating matrix. The data in the matrices will be arbitrary.
3 Preliminaries
In 2006, Netflix launched a competition to improve the accuracy of its system at the time,
CineMatch, by 10%. Netflix provided a data set of over 18 thousand movies, 480 thousand
users, and 100 million ratings from those users. The competitors' algorithms were then tested
against a test set, using the root mean square error (RMSE) as the error metric. One year into
this competition, a team known as BellKor won the first progress prize with their design, giving
an 8.43% improvement [5]. This method used what is known as the Singular Value Decomposition,
abbreviated SVD. To achieve the full 10% improvement, BellKor teamed up with other
competitors and blended all their algorithms, ultimately learning that the more algorithms and
predictors used, the more accurate the "guess" rating becomes. The prize-winning predictor model averaged
over 800 algorithms. These algorithms exploit all the user data they can, utilizing details
like the date of a rating, the time of day, the day of the week, how many ratings a user gave in that time
frame, etc. (people's preferences over movies change over time).
To understand the goal of the Netflix algorithm, it is helpful to visualize a chart: each row
represents a user, and each column represents a movie or TV show item. Refer to Figure 1.
[Figure 1: Rating Matrix. A users-by-movies grid in which each filled cell holds the star rating a user gave an item; blank cells are unrated.]
Every time a user rates content, the corresponding data cell for the user and the rated
media gets filled in. Because viewers typically do not rate media often, this chart contains many
blank cells. The goal of the algorithm is to fill in the whole chart with a calculated value for the rating
each user would give each item. From the data in this chart, the algorithm would then find the
items with the highest guessed ratings and present those items to the user. The process is
as follows. Refer to Figure 2.
[Figure 2: The predictor algorithm. Input data feeds the predictor algorithm, which outputs predicted ratings.]
The input data includes information such as the names of the users, the ratings of all items each user
has rated, the titles of the movies/shows, the dates of the ratings, etc. [7]. The predictor algorithm outputs
the rating that each user would give to every unwatched or unrated item.
4 Notation
r"$ = 𝜇 + 𝑏𝑢 + 𝑏𝑖 (1)
where $\hat{r}_{ui}$ is the predicted rating by user $u$ of movie/show item $i$, $\mu$ is the overall average item
rating, $b_u$ is the user bias, and $b_i$ is the item bias, which accounts for one movie being better or worse than
the average rating [7].
So, for example, to predict Joe's rating of Star Wars: Joe may be more critical than most
and rate 0.2 stars less than the average user, so $b_u = -0.2$. Now, Star Wars may be, on average,
rated 0.4 stars above the overall average $\mu = 3.7$, and hence $b_i = 0.4$. Joe's predicted rating for Star Wars
would then be 3.9 stars [7].
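Plugging these values into Equation (1) makes the arithmetic explicit:

$$\hat{r}_{ui} = \mu + b_u + b_i = 3.7 + (-0.2) + 0.4 = 3.9$$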
Because tastes drift over time, the biases can also be made time-dependent. One common form, following the BellKor solution, extends the baseline to $\hat{r}_{ui} = \mu + b_u(t_{ui}) + b_i(t_{ui})$,
where $t_{ui}$ is the number of days from the first rating date in the data set to the date user $u$
rated item $i$ [7].
To further optimize the model, user preference (feature) vectors $\mathbf{p}_u$ and item feature vectors $\mathbf{q}_i$ are
added. In the two-dimensional vector space of Figure 3, the amount of
"seriousness" or "intensity" of an item lies along the vertical axis and the amount of "chick-flick-ness"
lies along the horizontal axis. Each item can be represented by a feature
vector $\mathbf{q}_i$ and each user by a user feature vector $\mathbf{p}_u$. Namely, consider the
two vectors $\mathbf{p}_u$ and $\mathbf{q}_i$ of the same length, where each component of $\mathbf{q}_i$ is associated
with a specific genre, director, release year, etc. The value of each entry represents
how strongly the item identifies with that description. For the user preference vector $\mathbf{p}_u$, the
components store the user's preference for each respective component of $\mathbf{q}_i$, i.e. how much the user
likes the respective genre, director, release year, etc.
For example, after analyzing Joe's viewing and search history, it turns out that Joe likes
comedies, dramas, and horror films the most. If Netflix wanted to know whether Joe would like the
movie Titanic (1997), the similarity between

$$\mathbf{q}_{Titanic} = \begin{matrix} drama \\ romance \\ \vdots \\ horror \end{matrix}\begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ 0 \end{bmatrix} \qquad \text{and} \qquad \mathbf{p}_{Joe} = \begin{matrix} drama \\ romance \\ \vdots \\ horror \end{matrix}\begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{bmatrix}$$

where the entries $q_k$, $p_k$ are real, nonzero values (the zero in the horror entry indicates that Titanic contains no horror), can be analyzed using these feature vectors.
To compare user $u$ with item $i$, an inner product, in this case the Euclidean dot
product, of these two vectors can be taken. The dot product measures how much two vectors in
Figure 3 point in the same direction, or in other words, gives the similarity between the user's
preferences and the features of the movie [8]. Negative outcomes of the dot product are
considered a bad match. For example, in Figure 3, the user (red vector) matches better
with Die Hard and Dumb and Dumber than with The Notebook and Mean Girls,
based on the angles between the vectors. This dot product is essentially the rating the user would give
to that item. So, the model becomes

$$\hat{r}_{ui} = \mu + b_u + b_i + \mathbf{q}_i^T \mathbf{p}_u \tag{4}$$
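To make Equation (4) concrete, the short Matlab sketch below scores one hypothetical user-item pair; the three-genre feature space, the vector entries, and the bias values are invented for illustration and are not taken from the Netflix data.

%Hypothetical 3-component feature space: [drama; romance; horror]
q_titanic = [0.9; 1.0; 0];    %Titanic: strong drama/romance, no horror (assumed values)
p_joe     = [0.8; -0.5; 0.7]; %Joe: likes drama and horror, dislikes romance (assumed)
mu  = 3.7;                    %overall average rating [stars]
b_u = -0.2;                   %Joe's user bias
b_i = 0.4;                    %Titanic's item bias (assumed)
%Equation (4): baseline plus the user-item interaction term q'*p
r_hat = mu + b_u + b_i + dot(q_titanic, p_joe);
fprintf('Predicted rating: %.2f stars\n', r_hat)  %prints 4.12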
In the BellKor model, the item biases are organized into bins; each bin contains all the
ratings from a specific time interval $t$ [7]. The user biases and preferences are both based on the
deviation of each rating's date from the user's mean rating date. The competing teams
experimented with different functional forms for each time-dependent factor, ultimately arriving
at a satisfactory model. A rough sketch of the binning idea is shown below.
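In this sketch, the bin size, bin count, and bias values are assumptions for illustration, not BellKor's tuned values:

%Map a rating date to a time bin and look up the binned item bias correction.
binSize = 70;                        %days per bin (10 weeks, assumed)
nBins   = 100;                       %number of bins (assumed)
binBias = zeros(nBins, 1);           %per-bin item bias corrections (learned in practice)
b_i     = 0.4;                       %static item bias from the earlier example
t_ui    = 493;                       %days since the first rating in the data set
bin     = floor(t_ui/binSize) + 1;   %1-based index of the bin containing t_ui
b_i_t   = b_i + binBias(bin);        %time-dependent item bias b_i(t)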
In general, the more terms and dimensions given to the variables, the more accurate the
predicted rating $\hat{r}_{ui}$ will be, at the expense of memory and computational limits.
The Netflix Prize-winning algorithm contained millions of parameters. The model in Equation (4) is
much simpler than the one used in the prize-winning algorithm; however, it captures the basic
idea of how the predictor algorithm is formed. The $\mathbf{q}_i$ and $\mathbf{p}_u$ vectors hold the valuable
latent information that can be extracted using the SVD factorization [8], and they allowed the BellKor
team to succeed in the Netflix challenge.
This model is then sent through various numerical techniques to ensure the greatest
amount of accuracy; the most important technique used is a Boltzmann machine. The model's parameters are found by solving a regularized least squares problem over the set of known ratings,

$$\min_{b,\,\mathbf{q},\,\mathbf{p}} \sum_{(u,i)} \left( r_{ui} - \mu - b_u - b_i - \mathbf{q}_i^T \mathbf{p}_u \right)^2 + \lambda \left( b_u^2 + b_i^2 + \lVert \mathbf{q}_i \rVert^2 + \lVert \mathbf{p}_u \rVert^2 \right) \tag{6}$$

where Equation (6) is a Tikhonov regularization technique, and $\lambda$ is a regularizing term that avoids
overfitting the data; its value is found by cross-validation [6,7].
The model is sent through a Boltzmann machine, a type of neural network typically used
to "teach" models from a training data set, which uses stochastic gradient descent, an iterative
optimization method for finding minima [1]. In this case, it finds the minimizer of
the least squares problem in Equation (6). Because Equation (6) is differentiable, stochastic gradient
descent explores the surface of the objective over the parameter space to find a minimum, by stepping
in the direction opposite the gradient. Minimizing the error in each rating will eventually "train"
the model and tune its parameters.
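A minimal Matlab sketch of one stochastic gradient descent sweep over the known ratings is shown below; the per-rating update rules follow from differentiating Equation (6), as described in [8], while the example matrix, feature dimension, learning rate, and regularizer are illustrative assumptions rather than the competition's tuned values.

%Setup (illustrative): small known-rating matrix, zeros mark unknown ratings
R = [5 0 1; 0 4 2; 3 0 0];
[m, n] = size(R);  f = 2;                    %f = number of latent features (assumed)
P = 0.1*randn(f, m);  Q = 0.1*randn(f, n);   %user and item feature vectors
b_u = zeros(m, 1);  b_i = zeros(n, 1);       %user and item biases
mu = mean(R(R > 0));                         %average of the known ratings
gamma = 0.005;  lambda = 0.02;               %learning rate and regularizer (assumed)
[users, items] = find(R > 0);                %indices of the known ratings
for epoch = 1:100                            %repeat the sweep until the error settles
    for k = 1:numel(users)
        u = users(k);  i = items(k);
        e = R(u,i) - (mu + b_u(u) + b_i(i) + Q(:,i)'*P(:,u));  %prediction error
        %step each parameter opposite its gradient, shrunk by the regularizer
        b_u(u) = b_u(u) + gamma*(e - lambda*b_u(u));
        b_i(i) = b_i(i) + gamma*(e - lambda*b_i(i));
        P(:,u) = P(:,u) + gamma*(e*Q(:,i) - lambda*P(:,u));
        Q(:,i) = Q(:,i) + gamma*(e*P(:,u) - lambda*Q(:,i));
    end
end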
Once the algorithm is trained and optimized, the singular value decomposition is used to
extract the latent information from the rating matrix.
The Singular Value Decomposition is a generalization of the spectral factorization, and can be
used on any real, rectangular matrix. Consider the user-item matrix R. Its factorization is as
follows:

$$R = P \Sigma Q^T \tag{7}$$

where $R$ is the factorable $m \times n$ matrix with rank $r$, $P$ is an $m \times r$ matrix with orthonormal
columns ($P^T P = I$) that are the left singular vectors of $R$, $\Sigma$ is an $r \times r$ diagonal matrix with the singular values of
$R$ as its diagonal entries, and $Q^T$ is an $r \times n$ matrix whose orthonormal rows contain the right singular
vectors of $R$ [2].
The main difference between the SVD and the spectral factorization is that the SVD can be
used on any real matrix, not just square matrices. If $R$ is not square, the $m \times n$ matrix can be
converted into a square, symmetric, positive semi-definite matrix $K$. Refer to Equation (8),

$$K = R^T R \tag{8}$$

where the transpose of $R$ is multiplied by $R$ [2]. The singular values of $R$, $\sigma_i = \sqrt{\lambda_i}$, are the
positive square roots of the eigenvalues of $K$, and the corresponding singular vectors of $R$ are
the eigenvectors of $K$. The singular values are placed in $\Sigma$.
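This relationship is easy to verify numerically with Matlab's built-in eig and svd functions; the matrix R below is arbitrary:

R = [2 0 1; 0 3 0];                %arbitrary 2x3 matrix
K = R'*R;                          %square, symmetric, positive semi-definite
lambda = sort(eig(K), 'descend');  %eigenvalues of K, largest first
sigma  = svd(R);                   %singular values of R, largest first
disp([sqrt(lambda(1:2)), sigma])   %the two columns agree: sigma_i = sqrt(lambda_i)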
The columns of $Q$ are the unit eigenvectors $\mathbf{q}_i$ of $K$, each corresponding to a singular
value $\sigma_i$. The columns of $P$ are given by Equation (9):

$$\mathbf{p}_i = \frac{1}{\sigma_i} R \, \mathbf{q}_i \tag{9}$$
After finding the column vectors of the matrices $P$ and $Q$, and finding the singular values
of $R$, the factorization can be assembled in the matrix multiplication form shown in Equation (7).
Geometrically, the singular value decomposition is essentially a series of stretches, reflections, and
rotations. Namely, $Q^T$ and $P$ rotate the matrix $R$, while $\Sigma$ stretches the matrix in the directions of
the singular vectors, by amounts equal to the singular values of $R$.
The SVD is most commonly used for least squares approximation; for determining the rank,
range, and null space of a matrix; and for finding the pseudoinverse of a matrix, which alone has
many applications [2].
In the recommender setting, the SVD links each user to each item through a shared set of latent characteristics used to make a
recommendation. This relationship is referred to as the 'feature' relationship between the user
and item. Comparing the 'features', or characteristics, of users is much more efficient than
comparing users item by item. This approach is known as the previously discussed Latent
Factor model [8].
Because two linearly dependent vectors only count as one toward the rank, the rank of a matrix
can often be decreased, and one of the important features of the SVD is that it makes this rank
reduction possible. This feature of the SVD is known as low-rank approximation, and it can take
any matrix A and truncate it to find the closest matrix B with a selected rank r. Because the rank
of the matrix equals the number of nonzero singular values, the smallest singular values, which correspond to the least
important eigen-directions, are set to zero. So, low-rank approximation can be
used to eliminate the less important eigen-directions [2] and extract just the important ones.
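A short Matlab sketch of the truncation, with an arbitrary matrix A and an assumed target rank r:

A = magic(6);                %arbitrary 6x6 matrix to truncate
r = 2;                       %selected target rank (assumed)
[P, S, Q] = svd(A);
S(r+1:end, r+1:end) = 0;     %zero out the smallest singular values
B = P*S*Q';                  %closest rank-r matrix to A
fprintf('rank(A) = %d, rank(B) = %d\n', rank(A), rank(B))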
8 Example Problem
As an example of singular value decomposition, we begin with a rating matrix, R,
$$R = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \end{bmatrix} \tag{10}$$
which involves two users’ ratings of three movies.
Since our matrix is not square or symmetric, we need to multiply the transpose of the matrix
by the matrix itself to produce a square, symmetric matrix. Refer to Equation (11).
$$K = R^T R = \begin{bmatrix} 2 & 0 \\ 0 & 3 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \end{bmatrix} = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 0 \end{bmatrix} \tag{11}$$
$$\det(K - \lambda I) = \begin{vmatrix} 4-\lambda & 0 & 0 \\ 0 & 9-\lambda & 0 \\ 0 & 0 & -\lambda \end{vmatrix} = 0 \tag{12}$$
Since $(K - \lambda I)$ is a diagonal matrix, the determinant is simply the product of the elements of the
diagonal. Thus, we get our characteristic equation:

$$\det(K - \lambda I) = -\lambda (4 - \lambda)(9 - \lambda) = 0 \tag{13}$$
After solving for the roots of the equation, $\lambda_1 = 9$, $\lambda_2 = 4$, and $\lambda_3 = 0$. Because $\lambda_3$ equals
zero, its associated eigenvector can be ignored. The eigenvalues are listed in descending
order, i.e. $\lambda_1 \geq \lambda_2$. The square roots of the nonzero eigenvalues, $\sigma_i = \sqrt{\lambda_i}$, are the singular values
of $R$, and are the entries of the diagonal matrix $\Sigma$. So, the sigma matrix is as follows:

$$\Sigma = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \tag{14}$$
The eigenvalues will now be used to solve for the corresponding eigenvectors by plugging each
eigenvalue into the homogeneous equation $(K - \lambda I)\mathbf{q} = \mathbf{0}$:

$$\lambda_1 = 9: \quad K - \lambda_1 I = \begin{bmatrix} -5 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -9 \end{bmatrix} \;\Rightarrow\; \mathbf{q}_1 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \tag{15}$$

$$\lambda_2 = 4: \quad K - \lambda_2 I = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & -4 \end{bmatrix} \;\Rightarrow\; \mathbf{q}_2 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \tag{16}$$
Normalizing these vectors will give us unit vectors in the direction of each eigenvector. In this
specific example, $\mathbf{q}_1$ and $\mathbf{q}_2$ are already unit vectors, and they make up the columns of our matrix
$Q$.
$$\mathbf{p}_1 = \frac{R\,\mathbf{q}_1}{\sigma_1} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{17}$$

$$\mathbf{p}_2 = \frac{R\,\mathbf{q}_2}{\sigma_2} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \tag{18}$$

where the $\mathbf{p}_i$ are the column vectors of the matrix $P$, and the $\mathbf{q}_i$ are the unit vectors that
make up the columns of the matrix $Q$.
$$R = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \end{bmatrix} = P \Sigma Q^T = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \tag{19}$$
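The hand computation can be checked with Matlab's built-in svd (singular vectors are unique only up to sign, so columns may differ by a factor of -1):

R = [2 0 0; 0 3 0];
[P, S, Q] = svd(R, 'econ');  %economy-size SVD matches the shapes in Equation (19)
disp(P); disp(S); disp(Q')   %compare with Equation (19), up to column signs
disp(norm(R - P*S*Q'))       %reconstruction error is zero to machine precision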
9 Matlab Code
To solve much larger and more realistic rating matrices, Matlab
(MATLAB_R2016b) can be used. This Matlab code takes a square rating matrix and outputs
predicted ratings for each user and item. The pseudocode for a user-inputted matrix R is as
follows:
9.1 Pseudocode
1. Ask the user to input the rating matrix R
2. Use Matlab's built-in svd function to extract the factorization matrices
3. Take the dot product of each pair of columns of the factorization matrices P
and Q
4. Insert the predicted values into a matrix, with all other elements set to zero
(for readability), to display the predicted ratings
The program takes the inputted matrix R, finds the SVD factorization of R, calculates the
dot product for each column pair $\mathbf{p}_u$ and $\mathbf{q}_i$, and produces a predicted rating matrix R_p. The code
can be found in the Appendix.
9.2 Example
Inputting the following arbitrary 6x6 known-rating matrix R, which contains the ratings of
six movies by six users (all zero-valued entries are unknown ratings),

$$R = \begin{bmatrix} 1 & 5 & 0 & 3 & 0 & 2 \\ 1 & 0 & 3 & 0 & 3 & 2 \\ 5 & 2 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 4 & 5 \\ 1 & 1 & 0 & 1 & 1 & 0 \\ 3 & 0 & 0 & 0 & 4 & 5 \end{bmatrix} \tag{20}$$

produces a predicted rating matrix R_p in which all zero-valued elements correspond to the known ratings.
These predicted ratings do not include any user or item bias, or temporal factors. The predicted
ratings are based on the relationship between each user's preferences and the genre of each movie. For
instance, users one and two (1st and 2nd rows) have given the same ratings to movies one and six
(1st and 6th columns), so we would expect their ratings for the other items to be similar.
If we compare their ratings for item three (3rd column), user one's predicted rating of 3.4641 is
comparable to user two's known rating of 3. These ratings are similar because both users'
preferences are similar, which is what we expected.
10 Conclusion
The Netflix algorithm starts from a baseline model that is developed and improved by adding factors
and parameters to the equation. This equation is then trained on a training data set and further
optimized using numerical techniques. The biggest contribution to the accuracy of the predicted
guess is the use of the Singular Value Decomposition to determine the underlying features of the
users and items. Comparing the feature vectors of each user and item offers another type
of similarity metric, in addition to the Neighborhood model, where users are grouped together
based on their similarity to other users. The main takeaway is how the SVD can extract these
trend directions that otherwise would not be observable. Without it, recommender algorithms
and collaborative filtering would never have reached the level they are at today.
To improve this design, modifications can be made to the algorithm and to the SVD to
find more accurate feature vectors. The more accurate the feature vectors, the more accurate
the dot product will be, and therefore the more accurate the predicted guess. Blending
more than one model will surely improve the algorithm, since averaging more than one prediction
gives a much more accurate prediction. It is apparent that consumer behavior and psychology
play a role in predictions, as we explained with the temporal factors. The power of the SVD in
the Netflix application is that it relies on only limited information about each user, yet it can still
utilize that information in tremendous ways.
Another powerful feature of the SVD is using Principal Component Analysis to take a
matrix with a large rank and find a matrix that holds nearly the same information but with a lower rank.
This minimizes computational effort and can turn an impossible-to-solve matrix into a more
reasonable problem; this is the low-rank approximation discussed earlier. An important application of PCA
is photography: taking a high-pixel photo that uses lots of memory and finding a lower-memory
version that preserves most of the resolution of the original. The possible applications of the SVD are
endless. Whenever latent directional trends must be found in big data, these techniques can
be utilized. For example, the variance and covariance of data sets can be calculated using the
SVD, which can provide a wealth of information, depending on the application.
Because the SVD relates two feature vectors, it could be used by online dating websites to
show the similarity between two people. The dot product between two "people" vectors produces
a value that serves as a metric for that similarity; if the dot product is zero, the
vectors are orthogonal, and so the two people have nothing in common. If the SVD
can be used for compressing images, then it surely could be used to compress data and
remove redundancies in a large data set, or perhaps to find linear trends and predict outcomes
of oscillations under various damping conditions, for example. These are just examples of
how these mathematical techniques can be used to solve problems that can be modeled by math. The
possibilities are endless.
11 Appendix
11.1 Matlab Code

%NOTE: this program does not include biases or any other factors besides q'p.
%Without a large data set and knowledge of data science techniques, the
%biases would all have to be inputted by the user and hard-coded, which seems
%unimportant (each bias is just another term added to the predicted rating).
%This script shows the gist of the predictor algorithm.

%Initialize variables
mu = 3.7; %const overall average rating from all users [stars]
R = input('Enter the known-rating matrix R: '); %ask the user for the rating matrix
R_p = predictRating(R, mu); %compute the predicted ratings
disp(R_p) %display the predicted-rating matrix

function R_p = predictRating(R, mu)
%Check that the entries of R are in the range 0<=r<=5. Replace all zero
%entries of R_p (unknown ratings) with the predicted rating. Set all
%nonzero entries of R_p to zero, so the only nonzero elements in R_p are
%the predicted ratings.
[P, ~, Q] = svd(R); %extract the factorization matrices (R must be square)
R_p = zeros(size(R)); %preallocate the predicted-rating matrix
for i = 1:size(R,1)
    for j = 1:size(R,2)
        if R(i,j) > 5 || R(i,j) < 0 %check that the entries satisfy 0<=r<=5
            error('Error, values must satisfy 0<=r<=5; rerun the program')
        elseif R(i,j) == 0
            %predicted rating: dot(q,p) + mu + biases (biases not included)
            R_p(i,j) = mu + dot(Q(:,j), P(:,i));
        else
            %readability: set all known values to zero so the predicted
            %values are not confused with known ratings
            R_p(i,j) = 0;
        end
    end
end
end
11.2 References
[1] A. Töscher, M. Jahrer, The BigChaos Solution to the Netflix Prize 2008, Commendo Research & Consulting, 2008.
[2] M. Vozalis, K. Margaritis, Applying SVD on Generalized Item-based Filtering, International Journal of Computer Science & Application, Vol. 3, 27-51 (2006).
[3] R. Bell, A. Töscher, M. Jahrer, The BigChaos Solution to the Netflix Grand Prize, Commendo Research & Consulting, 2009.
[4] R. Salakhutdinov, A. Mnih, G. Hinton, Restricted Boltzmann Machines for Collaborative Filtering, University of Toronto, 2007.
[5] R. Bell, Y. Koren, Lessons from the Netflix Prize Challenge, SIGKDD Explorations, Vol. 9, 75-79 (2007).
[6] S. Funk, Netflix Update: Try This at Home, http://sifter.org/~simon/journal/20061211.html (December 11, 2006).
[7] Y. Koren, The BellKor Solution to the Netflix Grand Prize, Yahoo! Research Israel, 2009.
[8] Y. Koren, R. Bell, C. Volinsky, Matrix Factorization Techniques for Recommender Systems, IEEE Computer Society, Vol. 42, 42-49 (2009).