Recommendation System
UNIT-5
Sowmya V
Assistant Professor
BMSCE, Bangalore
Recommender system or Recommendation system
In a content-based system, keywords are used to describe the items, and a user profile is built to
indicate the type of items this user likes.
To create a user profile, the system mostly focuses on two types of information:
1. A model of the user's preference.
2. A history of the user's interaction with the recommender system.
Consider an example of recommending news articles to users. Let’s say we have 100 articles and a
vocabulary of size N. We first compute the tf-idf score for each of the words for every article. Then we
construct 2 vectors:
1. Item vector: This is a vector of length N. It contains 1 for words that have a high tf-idf score in that
article, otherwise 0.
2. User vector: Again a 1xN vector. For every word, we store the probability of the word occurring (i.e.
having a high tf-idf score) in articles that the user has consumed. Note here, that the user vector is
based on the attributes of the item (tf-idf score of words in this case).
Once we have these profiles, we compute similarities between users and items. The items
recommended are the ones that 1) have the highest similarity with the user, or 2) have the highest
similarity with the other items the user has read.
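The item- and user-vector construction above can be sketched as follows. This is a minimal illustration on a hypothetical four-article corpus and five-word vocabulary (all data made up); the threshold for "high tf-idf" is an arbitrary choice.

```python
import numpy as np

# Toy corpus: 4 "articles" over a small vocabulary (hypothetical data).
vocab = ["election", "goal", "match", "policy", "vote"]
articles = [
    "election vote policy",        # politics
    "goal match match",            # sports
    "policy vote election vote",   # politics
    "goal goal match",             # sports
]

# Term-frequency matrix (articles x vocab).
tf = np.array([[doc.split().count(w) for w in vocab] for doc in articles],
              dtype=float)

# idf = log(N / df), where df is the number of articles containing the word.
df = (tf > 0).sum(axis=0)
idf = np.log(len(articles) / df)
tfidf = tf * idf

# Item vector: 1 where the tf-idf score exceeds a threshold, else 0.
threshold = 0.3
item_vectors = (tfidf > threshold).astype(int)

# User vector: probability that each word is "high tf-idf" in the articles
# the user has consumed (here, articles 0 and 2 — the two politics pieces).
consumed = item_vectors[[0, 2]]
user_vector = consumed.mean(axis=0)
print(user_vector)
```

With this toy data the user vector ends up concentrated on the politics words, which is exactly what the similarity step below exploits.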
Two common methods:
1. Cosine Similarity:
To compute the similarity between a user and an item, we simply take the cosine similarity between the
user vector and the item vector. This gives us user-item similarity.
To recommend items most similar to the items the user has already consumed, we compute the cosine
similarity between the articles the user has read and the other articles, and recommend the most similar
ones. This is item-item similarity.
Cosine similarity is best suited to high-dimensional features, especially in information retrieval and
text mining.
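A minimal sketch of the user-item similarity computation, using hypothetical vectors over a five-word vocabulary:

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical user vector and two item vectors (made-up data).
user = np.array([1.0, 0.0, 0.0, 1.0, 1.0])
politics_article = np.array([1, 0, 0, 1, 1])
sports_article = np.array([0, 1, 1, 0, 0])

print(cosine_similarity(user, politics_article))  # high: recommend
print(cosine_similarity(user, sports_article))    # no overlap: skip
```

Item-item similarity is the same computation applied between two item vectors instead of a user vector and an item vector.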
2. Jaccard similarity:
Also known as intersection over union, the formula is:
J(A, B) = |A ∩ B| / |A ∪ B|
where A and B are the sets of words present (value 1) in the two item vectors.
This is used for item-item similarity. We compare item vectors with each other and return the items
that are most similar.
Jaccard similarity is useful only when the vectors contain binary values. If they have rankings or
ratings that can take on multiple values, Jaccard similarity is not applicable.
In addition to the similarity methods, content-based recommendation can be treated as a standard
supervised machine learning problem: predict whether the user will like an item from its features. Here,
regular machine learning algorithms like random forest, XGBoost, etc., come in handy.
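As a sketch of this supervised framing: each consumed item becomes a training example (its tf-idf item vector as features, liked/not-liked as the label). A simple nearest-centroid classifier stands in below for random forest or XGBoost, which would be trained on the same pairs; all data here is hypothetical.

```python
import numpy as np

# Training data: binary tf-idf item vectors and whether the user liked them.
X = np.array([
    [1, 0, 0, 1, 1],   # liked
    [1, 0, 1, 1, 0],   # liked
    [0, 1, 1, 0, 0],   # not liked
    [0, 1, 0, 0, 1],   # not liked
], dtype=float)
y = np.array([1, 1, 0, 0])

# Mean feature vector of each class.
liked_centroid = X[y == 1].mean(axis=0)
other_centroid = X[y == 0].mean(axis=0)

def predict_like(item):
    # Classify a new item by whichever class centroid is closer.
    d_like = np.linalg.norm(item - liked_centroid)
    d_other = np.linalg.norm(item - other_centroid)
    return int(d_like < d_other)

new_article = np.array([1, 0, 0, 1, 0], dtype=float)
print(predict_like(new_article))  # 1: predicted "liked"
```

A tree ensemble would replace `predict_like` with a learned model but consume exactly the same (features, label) pairs.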
Collaborative Filtering
Collaborative filtering is based on the assumption that people who agreed in the past will agree
in the future, and that they will like similar kinds of items as they liked in the past.
The system generates recommendations using only information about rating profiles for
different users or items. By locating peer users or items with a rating history similar to the current
user or item, the system generates recommendations from this neighborhood.
Examples of explicit data collection include the following:
∙ Asking a user to rate an item on a sliding scale.
∙ Asking a user to search.
∙ Asking a user to rank a collection of items from favorite to least favorite.
∙ Presenting two items to a user and asking him/her to choose the better one of them.
∙ Asking a user to create a list of items that he/she likes.
Examples of implicit data collection include the following:
∙ Observing the items that a user views in an online store.
∙ Analyzing item/user viewing times.
∙ Keeping a record of the items that a user purchases online.
∙ Obtaining a list of items that a user has listened to or watched on his/her computer.
∙ Analyzing the user's social network and discovering similar likes and dislikes.
Collaborative filtering approaches often suffer from three problems: cold start, scalability, and
sparsity.
∙ Cold start: For a new user or item, there isn't enough data to make accurate
recommendations.
∙ Scalability: In many of the environments in which these systems make recommendations,
there are millions of users and products. Thus, a large amount of computation power is
often necessary to calculate recommendations.
∙ Sparsity: The number of items sold on major e-commerce sites is extremely large. The most
active users will only have rated a small subset of the overall database. Thus, even the
most popular items have very few ratings.
One of the most famous examples of collaborative filtering is item-to-item collaborative filtering
(people who buy x also buy y), an algorithm popularized by Amazon.com's recommender
system.
Memory based approach
For the memory-based approach, the utility matrix is memorized, and recommendations are made by
comparing the given user against the rest of the utility matrix. Consider an example: if we have m movies
and u users, we want to estimate how much user i likes movie k. First, compute
r̄_i = ( Σ_{j ∈ I_i} r_ij ) / |I_i|, where I_i is the set of movies user i has rated.
This is the mean rating that user i has given to all the movies she/he has rated. Using this, we estimate
his/her rating of movie k as a mean-centred weighted average over the other users a who have rated movie k:
r̂_ik = r̄_i + ( Σ_a sim(i, a) · (r_ak − r̄_a) ) / ( Σ_a |sim(i, a)| )
Similarity between users a and i can be computed using any methods like cosine similarity/Jaccard
similarity/Pearson’s correlation coefficient, etc.
These results are very easy to create and interpret, but once the data becomes too sparse, performance
becomes poor.
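The memory-based prediction described above can be sketched on a tiny hypothetical utility matrix (0 meaning "unrated"); for simplicity, user-user similarity here is plain cosine over the raw rating rows, with unrated entries left as zeros.

```python
import numpy as np

# Utility matrix: rows = users, columns = movies, 0 = unrated (made-up data).
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
], dtype=float)

def mean_rating(row):
    # Mean over the movies this user has actually rated.
    return row[row > 0].mean()

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(R, i, k):
    """Predict user i's rating of movie k via a mean-centred weighted average
    over the other users who have rated movie k."""
    r_bar_i = mean_rating(R[i])
    num, den = 0.0, 0.0
    for a in range(R.shape[0]):
        if a == i or R[a, k] == 0:
            continue  # skip the target user and users who haven't rated k
        sim = cosine(R[i], R[a])
        num += sim * (R[a, k] - mean_rating(R[a]))
        den += abs(sim)
    return r_bar_i if den == 0 else r_bar_i + num / den

print(round(predict(R, 1, 1), 2))  # ≈ 2.06, pulled below user 1's mean of 2.5
```

Swapping `cosine` for Jaccard or Pearson's correlation only changes the `sim` line; the weighted-average structure stays the same.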
Model based approach
One of the more prevalent implementations of the model-based approach is Matrix Factorization. In this,
we create low-dimensional representations of the users and items from the utility matrix.
Thus, our utility matrix decomposes into U and V where U represents the users and V represents the
movies in a low dimensional space. This can be achieved by using matrix decomposition techniques like
SVD or PCA or by learning the 2 embedding matrices using neural networks with the help of some optimizer
like Adam, SGD etc.
For a user i and every movie j, we just need to compute the predicted rating ŷ_ij = U_i · V_j and
recommend the movies with the highest predicted rating. This approach is most useful when we have a ton
of data with high sparsity.
Matrix factorization helps by reducing dimensionality, hence making computation faster. One
disadvantage of this method is that we lose some interpretability, as we do not know what exactly the
elements of the user/item vectors mean.
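The decomposition above can be sketched with a truncated SVD on a small dense toy matrix (hypothetical ratings; a real utility matrix would be sparse, and the embeddings could equally be learned with SGD/Adam):

```python
import numpy as np

# Utility matrix (users x movies), hypothetical toy example with two
# obvious taste groups.
R = np.array([
    [5, 3, 1, 1],
    [4, 3, 1, 1],
    [1, 1, 5, 4],
    [1, 1, 4, 5],
], dtype=float)

# Rank-2 truncated SVD: R ≈ U @ V, with U (users x k) and V (k x movies).
u, s, vt = np.linalg.svd(R, full_matrices=False)
k = 2
U = u[:, :k] * s[:k]   # fold the singular values into the user factors
V = vt[:k, :]

R_hat = U @ V          # reconstructed / predicted ratings
print(np.round(R_hat, 1))

# The predicted rating for user 0 on movie 2 is just a dot product.
print(U[0] @ V[:, 2])
```

By the Eckart-Young theorem, this rank-k product is the best rank-k approximation of R in the Frobenius norm, which is why the two taste groups survive the compression.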