38.3 - Similarity Based Algorithms - mp4

Ml project

Uploaded by

NAKKA PUNEETH
Copyright
© © All Rights Reserved

Some of the simplest recommender system algorithms that we can design or build are called
similarity based algorithms. There are broadly two types of similarity that we can use. One
is called item-item similarity. The other is called user-user similarity. Very, very simple
ideas. Item-item similarity was popularized by Amazon around 1998. If I'm not wrong, there is
a very popular research paper on it, I think from the early 2000s. It's nothing super fancy,
but it was applied at scale, at a large e-commerce scale, at Amazon in the late 90s.

But let's go into the core idea itself. Let's look at the user-user similarity based
recommender system first. It's a very, very simple idea. Look at it like this: we are given a
big matrix.
We are given this big ratings matrix A, where the rows are user 1, user 2, and so on up to
user i, up to user n. Similarly, the columns are item 1, item 2, and so on up to item j, up
to item m. Now look at the row for user i. Let's call this vector u_i. I'll write it as a
column vector, just for simplicity, although in the matrix it is actually a row. The first
value is the rating that u_i gave to item 1. The second value is the rating that u_i gave to
item 2. Similarly for item 3, and so on. The last value is the rating of u_i on item m. So
this vector u_i can be thought of as a user vector. Of course, this is a very, very sparse
vector, very similar to your bag of words. Remember, bag of words is also a sparse vector,
one that holds counts. The user vector holds the rating of user u_i on each item j, and it
is also very sparse. That's the similarity between user vectors and the bag of words that we
learned in text processing. Now I can define the similarity between a user
u_i and u_j as the cosine similarity between u_i and u_j, which is nothing but u_i transpose
u_j divided by the product of the length of u_i and the length of u_j, or the L2 norms if you
want to write it that way: sim(u_i, u_j) = (u_i^T u_j) / (||u_i|| ||u_j||). Now, given this
matrix A, I can compute a similarity for every pair of users using this vector representation
of a user. Let's call these similarity values s_ij, and I'll put a u as a superscript, s_ij^u,
to symbolize that these are user similarities. Imagine I build a matrix S^u out of these
s_ij^u values. This is nothing but a user similarity matrix. Here we are using cosine
similarity. You can use any similarity metric of your choice, but cosine similarity is more
popular because these are sparse vectors.

So how does this similarity matrix look? Visually speaking, it has user 1, user 2, and so on
up to user n along the rows, and user 1, user 2, and so on up to user n along the columns. So
S^u is an n cross n matrix, where the cell for u_i and u_j represents how similar user i is
to user j. Once you compute it, there are a lot of fun things you can do. Let's take a task.
Let's assume you are given user 10.
Your task is to recommend new items to user 10. What you can do is go to the row for user 10
in the similarity matrix and look for the largest values (other than user 10 itself), because
larger values mean more similar. Suppose that, just by looking at these values, you can
declare that user 1, user 2, and user 7 are the three most similar users to u10. You can
easily get that by looking at the user similarity matrix. And remember, how is this user
similarity matrix built? By using the ratings given by each user. Let's not forget that flow.
This similarity was built using the ratings.
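
As a rough illustration of the idea so far, here is a minimal Python sketch that builds the
user similarity matrix S^u from a ratings matrix and reads off the most similar users. The
function names, the toy ratings, and the choice to store "not rated" as 0 are all assumptions
made for this example, not something the lecture specifies.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length rating vectors (0 = not rated, an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with no ratings is treated as similar to nobody
    return dot / (norm_u * norm_v)

def user_similarity_matrix(A):
    """Build the n x n matrix S^u from the rows (user vectors) of A."""
    n = len(A)
    return [[cosine_similarity(A[i], A[j]) for j in range(n)] for i in range(n)]

def most_similar_users(S, user, k):
    """Indices of the k users with the largest similarity to `user`, excluding the user itself."""
    others = [j for j in range(len(S)) if j != user]
    others.sort(key=lambda j: S[user][j], reverse=True)
    return others[:k]

# Hypothetical toy ratings: 4 users (rows) x 5 items (columns).
A = [
    [5, 4, 0, 0, 1],
    [4, 5, 0, 1, 0],
    [0, 0, 5, 4, 0],
    [5, 0, 0, 0, 2],
]
S_u = user_similarity_matrix(A)
top = most_similar_users(S_u, user=0, k=2)
print(top)
```

With real data the matrix is far too large and sparse for nested Python lists, so a sparse
matrix library would normally take their place. The logic stays the same.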
Each u_i came from the ratings data. So we are using the ratings data itself to say that user
10's ratings are very similar to the ratings of user 1, user 2, and user 7. So now I know
that these three are the users most similar to u10. Now I will say: let's pick the items that
are liked by user 1, user 2, and user 7 and that are not yet watched by u10. There will be
some items that user 1, user 2, and user 7 have liked which are not yet rated or not yet
watched by u10. Pick those items and recommend them to u10. This is how a user-user
similarity based recommendation system works.

The flow is like this. The first step is to build a user vector based on ratings. The second
step is to compute a similarity matrix. The third step is to find the most similar users. The
fourth step is to find the items that are liked by these similar users and not yet watched by
u10. The fifth step is to recommend them. You're done. Very, very simple algorithm, right?
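
The five steps above can be sketched end to end in a few lines of Python. This is only a
sketch under stated assumptions: the name recommend_user_based, the toy ratings, the rule
that "liked" means a rating of 4 or more, and the use of 0 for "not yet watched" are all
choices made for this example. It also computes only the one row of the similarity matrix
that the target user needs, rather than the full matrix.

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend_user_based(A, target, k=2, like_threshold=4):
    # Steps 1-2: each row of A is a user vector; compare the target with every other user.
    sims = {j: cosine(A[target], A[j]) for j in range(len(A)) if j != target}
    # Step 3: keep the k most similar users.
    neighbours = sorted(sims, key=sims.get, reverse=True)[:k]
    # Step 4: collect items those users liked that the target has not rated (0).
    votes = {}
    for j in neighbours:
        for item, rating in enumerate(A[j]):
            if rating >= like_threshold and A[target][item] == 0:
                votes[item] = votes.get(item, 0) + 1
    # Step 5: recommend, items liked by more neighbours first.
    return sorted(votes, key=lambda item: (-votes[item], item))

# Hypothetical toy ratings: 4 users x 5 items, 0 means "not yet watched".
A = [
    [5, 4, 0, 0, 0],  # the target user
    [5, 5, 4, 0, 0],
    [4, 5, 4, 5, 0],
    [0, 0, 0, 1, 5],
]
recs = recommend_user_based(A, target=0, k=2)
print(recs)
```

Ranking candidates by how many neighbours liked them is one simple choice. Weighting each
vote by the neighbour's similarity would be another reasonable one.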
But there is one problem with user-user similarity based recommender systems, and it is a
major one. That problem is that users' preferences change over time, and there is no way in
this similarity based scheme to handle that very well. Take YouTube as an example.
Today, maybe I'm trying to buy a smartwatch or something, so I look at a lot of product
review videos. Tomorrow I may have discovered a new artist, and I may listen to lots of songs
by that new artist. So it gets much, much harder, because my tastes are evolving with time,
and users' preferences change much more frequently over time than other things. We'll see how
this problem can be avoided using item-item similarity.

So one major problem with all user-user similarity based recommender systems is users'
preferences changing over time. If they change too often, it's harder. If they don't change
too often, you can probably build your ratings matrix with respect to time. You can say: I'm
not going to use all the ratings of the user, I'm going to use only the last 90 days, or last
three months, of data. That works if user tastes do not change much within three months. But
again, if you only use the last 90 days of data, the matrix becomes much, much sparser, and
you're not using the historically old data. There are all these problems. So the alternative
approach for
this is called the item-item similarity based recommender system. It's a very, very simple
idea. It's very similar to the user-user similarity approach, except that now I will
represent each item as a vector. And how do I get that vector? Again, from my A matrix. If
you take my A matrix, item i has a vector here, which is its column, and item j also has a
vector representation here. I take these vectors, and now I'll say the similarity between
item i and item j is nothing but the cosine similarity between I_i and I_j. Right?
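
To make the switch from rows to columns concrete, here is a small Python sketch of the item
similarity matrix. The function names and toy ratings are assumptions for illustration, and
unrated cells are again stored as 0.

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def item_vectors(A):
    """Columns of A: one vector per item, holding every user's rating of that item."""
    return [list(column) for column in zip(*A)]

def item_similarity_matrix(A):
    """Build the m x m item similarity matrix from the columns of A."""
    items = item_vectors(A)
    m = len(items)
    return [[cosine(items[i], items[j]) for j in range(m)] for i in range(m)]

# Hypothetical toy ratings: 3 users (rows) x 3 items (columns), 0 means "not rated".
A = [
    [5, 4, 0],
    [4, 5, 0],
    [0, 1, 5],
]
S_i = item_similarity_matrix(A)
for row in S_i:
    print([round(value, 3) for value in row])
```

The only change from the user-user version is the transpose: the same cosine function is
applied to columns instead of rows.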
So here there is one key advantage of the item based approach, and this is a very, very key
aspect: ratings on a given item do not change significantly after the initial period. For
example, let's take a very popular movie like Titanic. When Titanic was released, probably in
the first few days there were lots of ratings. Let's assume the average rating on Titanic is,
let's say, four stars out of five. Now, after the initial period, most people recognize that
Titanic is a brilliant movie, and its rating would not change as significantly. So ratings
for a given product or item do not change significantly over time after the initial period.
In the initial period, there will be a lot of positive comments, there will be negative
comments, there will be pros, cons, all of that. But after a limited period of time, the
ratings more or less stabilize. A quick aside: I often write Amazon in short form as AMZN,
because that's the stock market symbol for Amazon. Anyway, this stability is the reason why
e-commerce companies like Amazon preferred this
approach. Now, once you have the item similarity matrix, it's very, very simple. Imagine you
have a user 10 to whom you want to recommend products. Let's say you already know from
historical data that user 10 likes item 1, item 3, and item 7. Now, to recommend a new
product, you will say: tell me all the products that are similar to item 1. That gives you
one set, the items similar to I_1. Similarly, you'll get another set, the items similar to
item 3, and another set, the items similar to I_7. Now, if there is an item, let's say I_4,
that is present in many of these sets, say in the first and also the second, then the
likelihood that u10 will like I_4 is high. That's because u10 already likes item 1, item 3,
and item 7, and item 4 is similar to both item 1 and item 3. Hence there is a very high
likelihood that u10 will like item 4, so I'll recommend item 4 to that person.

As a rule of
thumb, when you have more users than items, and when item ratings do not change much over
time after the initial period, it is better to use an item-item similarity based recommender
system over a user-user similarity based one. More users than items is what happens for
Amazon: Amazon has hundreds of millions of users and only maybe a few million or a few tens
of millions of items. The same goes for Netflix, or even YouTube for that matter. One reason
is that computing item-item similarity is easier: if you have more users than items, then
S^i, the similarity matrix for items, is smaller and easier to compute than S^u, the
similarity matrix for users. And you know that item ratings do not change much over time, so
it's much more beneficial to compute similarity across items rather than similarity across
users. In such a case, people typically prefer an item-item recommender system over a
user-user recommender system. From what I've seen, most of these companies, Netflix, YouTube,
Amazon, Alibaba, or even eBay, use item-item over user-user in most situations. Not all
situations, of course.
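
The item-item recommendation step described earlier (collect the items similar to each item
the user already likes, then recommend whatever shows up in several of those sets) can be
sketched in Python as follows. Everything specific here is an assumption made for the
example: the function names, the hand-written similarity values, taking the top 2 neighbours
per liked item, and requiring an item to appear in at least 2 sets.

```python
from collections import Counter

def similar_items(S, item, k):
    """The set of k items most similar to `item` according to item similarity matrix S."""
    others = [j for j in range(len(S)) if j != item]
    others.sort(key=lambda j: S[item][j], reverse=True)
    return set(others[:k])

def recommend_item_based(S, liked, k=2, min_sets=2):
    """Recommend items that appear in at least `min_sets` of the liked items' neighbour sets."""
    counts = Counter()
    for item in liked:
        # Items the user already likes are never candidates.
        counts.update(similar_items(S, item, k) - set(liked))
    candidates = [i for i, c in counts.items() if c >= min_sets]
    return sorted(candidates, key=lambda i: (-counts[i], i))

# Hypothetical, hand-written item similarity matrix for 5 items (symmetric, 1.0 on the diagonal).
S = [
    [1.0, 0.3, 0.1, 0.9, 0.1],
    [0.3, 1.0, 0.1, 0.4, 0.1],
    [0.1, 0.1, 1.0, 0.8, 0.6],
    [0.9, 0.4, 0.8, 1.0, 0.2],
    [0.1, 0.1, 0.6, 0.2, 1.0],
]
recs = recommend_item_based(S, liked=[0, 2])
print(recs)
```

Here item 3 is among the nearest neighbours of both liked items, so it is the one that gets
recommended, which mirrors the item 4 example in the lecture.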
