38.10 - Matrix Factorization For Recommender Systems: Netflix Prize Solution

So the Netflix Prize is a very popular machine learning competition. It ended in 2009, that I remember very well, and I think it started a few years before that, around 2006. There is actually a Wikipedia page on the Netflix Prize where you can get all the details. As I told you, I remember that the competition ended in 2009 because I was in graduate school when it ended, and we ended up reading the research on how this problem was solved. So the problem is very interesting. Netflix is a very popular company that today streams video on demand: you can watch movies and things like that. Netflix is available in multiple geographies, including India; of course, it started in the US. I also have a Netflix account in India, so I know it exists in India for sure. So one of the interesting things is the Netflix Prize. Netflix said: we will give you a bunch of user ratings for a bunch of movies. The exact numbers are here: if you go down the page, it shows that they gave roughly 100 million ratings from about 480,000 users on 17,770 movies. This is the data set that was given. Each record contains the user, the movie, the date of rating, and the rating (or grade) itself. So this is the fun part. Given this data, they also gave a performance metric, to be precise, and said that Netflix itself, internally, has some value of this metric for its own algorithm. The metric they wanted to minimize is the root mean squared error. So let's understand what root mean squared error is.
Imagine that for user i and movie j, an algorithm predicts some rating; let me put a hat on it, r̂_ij, to represent that it is predicted. The actual rating is, let's say, r_ij. Root mean squared error works like this: you take r_ij minus r̂_ij, square it, sum over all (i, j) pairs, divide by n, where n is the number of ratings you are averaging over, and then take the square root:

RMSE = sqrt( (1/n) * sum over (i, j) of (r_ij - r̂_ij)^2 )

This is your typical RMSE. We have seen this in the past, when we learned regression. So you have a square root: that is your "root", the averaging is your "mean", this is your "squared", and the difference is the "error". That's how I remember root mean squared error.
So they said that Netflix itself has some root mean squared error; let's call it RMSE_Netflix. And they said: using this data, if any team or individual can build a better algorithm than what Netflix itself had, with an RMSE that is 10% lower than what Netflix has internally, Netflix promised a million dollars as prize money. For those of you who prefer the Indian system, that is ten lakh US dollars. It was a very hard competition; people took multiple years to crack it. This became a very, very active research area, and the winning team consisted of multiple researchers; in fact, several teams combined together at the end to breach the 10% mark. Some of the core team members were researchers from AT&T Research, et cetera. After the end of the whole competition, the winners wrote a very nice article.
One of the winners is Yehuda Koren, who happened to work at Yahoo Research while I was working there, and I happened to see the talk that he gave. This was around 2009, when I had just joined Yahoo Research, or Yahoo Labs, as it is called in India. So when I joined Yahoo Labs in India, Yehuda Koren, who is one of the team members (not the only member) that won this competition, gave a brilliant lecture on how they won. Thankfully, they also wrote a very nice research paper called "Matrix Factorization Techniques for Recommender Systems", explaining how they built these systems. Truth be told, matrix factorization became very popular for recommender systems only after the Netflix Prize. People had tried to use it before, but not very successfully; it is only after the Netflix Prize that people realized matrix factorization is a very, very powerful technique for recommender systems. Before that, people were mostly doing item-item similarity or user-user similarity type of approaches. Matrix factorization as a core idea became extremely popular after the Netflix Prize, and frankly speaking, it is only after the Netflix Prize that I personally learned about matrix factorization and started applying it to new problems. We have provided a reference link to this research paper in this video. It is a brilliant research paper and not very hard to read; I strongly recommend everyone to read the whole paper. We will cover part of it in this video, but it is very, very readable: the English and the terminology are simple, and it is not a dense research paper. It is beautifully written, and I strongly recommend everyone to read it. Okay, so let's go to the problem itself. I will use the notation that is in the research paper so that it is easy for you to follow later.
Let me introduce some terminology: r_ui is the rating given by a user u to an item i, q_i is an item vector, and p_u is a user vector. Here I am just using the notation used in the research paper so it will be easy for you to follow later. Now, if you think about it logically, r_ui is nothing but a_ij from our previous discussions. I am trying to connect the dots between what we already learned and this notation so that it is easy for you to follow.
So what is the problem we are solving? Let's take the first optimization problem. We are trying to find item vectors q and user vectors p such that, summed across all users and items (the actual notation used in the paper is over (u, i) pairs rather than (i, j)), we minimize (r_ui - q_i^T p_u)^2. This part is exactly the same as (a_ij - b_i^T c_j)^2 in our previous notation; it is your squared loss, I am just using the paper's notation. But the paper says that instead of solving just this problem, it is better to add regularization, so they add lambda times (||q_i||^2 + ||p_u||^2). If you look at this from an optimization standpoint, the first term is your squared loss and the second term is your L2 regularization. And why do you need L2 regularization? As usual, to avoid overfitting.
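Side by side, the squared-loss term in the paper's notation and in our earlier notation (a_ij, b_i, c_j are the symbols from the previous lectures, as referenced in this transcript) is the same quantity:

```latex
\left( r_{ui} - q_i^{\top} p_u \right)^2
\quad\equiv\quad
\left( a_{ij} - b_i^{\top} c_j \right)^2
```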
So even though I first wrote the problem without it, we should always add regularization to an optimization problem like this so that we avoid overfitting. The actual problem we are solving, written out clearly, is: find user vectors and item vectors that minimize, summed over all observed (user, item) pairs, the squared difference between the rating given by user u to item i and the dot product of the item vector and the user vector, plus lambda times the regularization term ||q_i||^2 + ||p_u||^2. And we know how to find lambda using cross-validation. Very simple.
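Written as a formula, this is the regularized objective just described (here kappa denotes the set of (u, i) pairs for which a rating is observed):

```latex
\min_{q,\, p} \;\; \sum_{(u,i) \in \kappa} \left( r_{ui} - q_i^{\top} p_u \right)^2
\;+\; \lambda \left( \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right)
```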
To put it simply, the first term is your loss and the second is your L2 regularizer. And how do you solve this problem? Of course, one solution is SGD: given this objective, you can compute the derivatives with respect to the things you have to find, namely the q_i and p_u vectors.
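Here is a minimal sketch of what SGD on this objective could look like in Python/NumPy. It is an illustration under my own naming choices (n_factors, lr, lam, n_epochs are assumed hyperparameters, not values from the lecture), using the per-rating update where e_ui = r_ui - q_i^T p_u and each vector is moved against its gradient of the regularized loss:

```python
import numpy as np

def matrix_factorization_sgd(ratings, n_users, n_items, n_factors=10,
                             lr=0.02, lam=0.1, n_epochs=100, seed=0):
    """SGD on the regularized squared loss above.
    ratings: list of (u, i, r_ui) triples for the observed ratings only."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, n_factors))   # user vectors p_u
    Q = 0.1 * rng.standard_normal((n_items, n_factors))   # item vectors q_i
    for _ in range(n_epochs):
        for u, i, r in ratings:
            e = r - Q[i] @ P[u]                   # error e_ui = r_ui - q_i^T p_u
            # gradient steps on the regularized squared loss
            P[u] += lr * (e * Q[i] - lam * P[u])
            Q[i] += lr * (e * P[u] - lam * Q[i])
    return P, Q

# Tiny usage example with made-up (user, item, rating) triples
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (2, 1, 1)]
P, Q = matrix_factorization_sgd(ratings, n_users=3, n_items=2)
print(Q[1] @ P[0])   # predicted rating of user 0 for item 1
```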
