UNIT I: INTRODUCTION
Introduction and basic taxonomy of recommender systems - Traditional and non-personalized
Recommender Systems - Overview of data mining methods for recommender systems-
similarity measures- Dimensionality reduction – Singular Value Decomposition (SVD)
Suggested Activities:
Practical learning – Implement data similarity measures.
External Learning – Singular Value Decomposition (SVD) applications
Suggested Evaluation Methods:
Quiz on recommender systems.
Quiz on Python tools available for implementing recommender systems.
INTRODUCTION:
Recommender systems, also known as recommendation systems or engines, are
a type of software application designed to provide personalized suggestions or
recommendations to users. These systems are widely used in various online platforms
and services to help users discover items or content of interest. Recommender systems
leverage data about users' preferences, behaviors, and interactions to generate accurate
and relevant recommendations.
Ranking and presentation: In the final stage of a typical recommendation pipeline, the candidate items are ranked by their relevance to the user, and the top-ranked items are presented through interfaces such as recommendation lists, personalized emails, or pop-up suggestions.
There are several types of recommender systems, each with its own approach to generating recommendations; this basic taxonomy is covered later in the unit. First, consider some well-known examples of recommender systems in practice:
1. Netflix
Netflix’s recommendation engine suggests movies and TV shows based on a user’s viewing history, ratings, and the preferences of subscribers with similar tastes.
2. Amazon
Amazon’s recommendation engine suggests products based on a user’s purchase
history, search history, and browsing behavior. It makes personalized
recommendations based on the user’s prior purchases, products viewed, and items
added to their shopping cart.
3. Spotify
Spotify’s music recommendation system suggests songs, playlists, and albums
depending on a user’s listening history, liked songs, and search history. It tailors
recommendations based on the user’s listening habits, favorite genres, and favorite
artists.
4. YouTube
YouTube’s recommendation engine suggests videos based on a user’s viewing
history, liked videos, and search history. The algorithm considers factors such as the
user’s favourite channels, the length of time spent watching a video, and other viewing
habits to make personalized recommendations.
5. LinkedIn
LinkedIn’s recommendation engine suggests jobs, connections, and content based on a user’s profile, skills, and career history. To make personalized recommendations, the algorithm takes into account the user’s job title, industry, and location.
6. Zillow
Zillow’s recommendation system suggests real estate properties based on a user’s
search history and preferences. Users can receive personalized recommendations
based on their budget, location, and desired features.
7. Airbnb
Airbnb’s recommendation system suggests accommodations based on a user’s
search history, preferences, and reviews. Personal recommendations are made based
on factors such as the user’s travel history, location, and desired amenities.
8. Uber
Uber’s recommendation system suggests ride options based on a user’s previous
rides and preferred options. When recommending rides, the algorithm considers
factors such as the user’s preferred vehicle type, location, and other preferences.
9. Google Maps
Google Maps’ recommendation system suggests places to visit, eat, and shop based
on a user’s search history and location. Personalized recommendations are generated
based on factors such as the user’s location, time of day, and preferences.
10. Goodreads
Goodreads suggests books based on the titles a user has rated, shelved, and reviewed.

PERSONALIZED RECOMMENDER SYSTEMS
Based on user data such as purchases or ratings, personalized recommenders try to understand and predict what items or content a specific user is likely to be interested in. In that way, every user gets customized recommendations.
Personalized recommender systems can be categorized into several types, each with its own
methods and techniques for providing tailored recommendations.
These include:
Content-based filtering,
Collaborative filtering, and
Hybrid recommenders.
CONTENT-BASED FILTERING
Content-based filtering recommends items whose attributes (such as genre, author, or keywords) match the attributes of items the user has liked before. Let’s assume that Jenny loves sci-fi books and her favorite writer is Walter Jon Williams. If she reads the book Aristoi, then her recommended book will be Angel Station, also a sci-fi book written by Walter Jon Williams.
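A minimal sketch of this idea, assuming scikit-learn is available; the titles and one-line descriptions are made up for illustration. Each book is turned into a TF-IDF vector of its description, and the book most similar to the one Jenny just read is recommended:

```python
# Content-based filtering sketch: TF-IDF over book descriptions,
# cosine similarity between items. Data is illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

books = {
    "Aristoi": "sci-fi space opera by Walter Jon Williams",
    "Angel Station": "sci-fi novel by Walter Jon Williams",
    "Pride and Prejudice": "classic romance novel by Jane Austen",
}

titles = list(books.keys())
tfidf = TfidfVectorizer().fit_transform(books.values())
sim = cosine_similarity(tfidf)  # item-item similarity matrix

# Recommend the book most similar to the one Jenny just read.
read = titles.index("Aristoi")
scores = [(titles[i], sim[read, i]) for i in range(len(titles)) if i != read]
print(max(scores, key=lambda s: s[1]))  # -> ('Angel Station', ...)
```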
Advantages of the content-based approach
Reduced data privacy concerns: Since content-based systems primarily use item
attributes, they may not require as much user data, which can mitigate privacy
concerns associated with collecting and storing user data.
Disadvantages of the content-based approach
The “Filter bubble”: Content filtering can recommend only content similar to the user’s
past preferences. If a user reads a book about a political ideology and books related to that
ideology are recommended to them, they will be in the “bubble of their previous
interests”.
Limited serendipity: Content-based systems may have limited capability to recommend
items that are outside a user’s known preferences.
In the first scenario, roughly 20% of items attract the attention of 70-80% of users, while the remaining 70-80% of items attract the attention of only 20% of users. The recommender’s goal is to introduce users to products they would not encounter at first glance.
In the second scenario, content-based filtering recommends products that fit content-wise yet are very unpopular (i.e., people don’t buy those products for some reason; for example, the book is bad even though it fits thematically).
Over-specialization: If the content-based system relies too heavily on a user’s past
interactions, it can recommend items that are too similar to what the user has already seen
or interacted with, potentially missing opportunities for diversification.
COLLABORATIVE FILTERING
Memory-based recommenders
Memory-based recommenders can be categorized into two main types: user-based and item-based collaborative filtering.
A user-based collaborative filtering recommender system
With the user-based approach, recommendations to the target user are made by
identifying other users who have shown similar behavior or preferences. This
translates to finding users who are most similar to the target user based on their
historical interactions with items. This could be “users who are similar to you also
liked…” type of recommendations.
But if we say that users are similar, what does that mean?
Let’s say that Jenny and Tom both love sci-fi books. This means that, when a new sci-
fi book appears and Jenny buys that book, that same book will be recommended to
Tom, since he also likes sci-fi books.
An item-based collaborative filtering recommender system
The idea is to find items that share similar user interactions and recommend those
items to the target user. This can include “users who liked this item also liked…” type
of recommendations.
To illustrate with an example, let’s assume that John, Robert, and Jenny highly rated
sci-fi books Fahrenheit 451 and The Time Machine, giving them 5 stars. So, when
Tom buys Fahrenheit 451, the system automatically recommends The Time Machine
to him because it has identified it as similar based on other users’ ratings.
How to calculate user-user and item-item similarities?
Unlike the content-based approach, where metadata about users or items is used, in the memory-based collaborative filtering approach we look at the user’s behavior, e.g., whether the user liked or rated an item, or whether the item was liked or rated by a certain user.
For example, suppose the goal is to recommend a new sci-fi book to Robert. Let’s look at the steps in this process:
Create a user-item-rating matrix.
Create a user-user similarity matrix: Cosine similarity is calculated (alternatives:
adjusted cosine similarity, Pearson similarity, Spearman rank correlation) between
every two users. This is how we get a user-user matrix. This matrix is smaller than the
initial user-item-rating matrix.
Look up similar users: In the user-user matrix, we observe users that are most
similar to Robert.
Candidate generation: When we find Robert’s most similar users, we look at all the
books these users read and the ratings they gave them.
Candidate scoring: Depending on the other users’ ratings, books are ranked from the
ones they liked the most, to the ones they liked the least. The results are normalized
on a scale from 0 to 1.
Candidate filtering: We check if Robert has already bought any of these books and
eliminate those he already read.
The item-item similarity calculation is done in an identical way and has all the same
steps as user-user similarity.
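The steps above can be condensed into a short sketch. This assumes NumPy and scikit-learn; the users, books, and ratings are invented for illustration:

```python
# User-based collaborative filtering sketch following the steps above.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

users = ["John", "Robert", "Jenny", "Tom"]
books = ["Fahrenheit 451", "The Time Machine", "Aristoi", "Angel Station"]

# Step 1: user-item rating matrix (0 = not rated).
R = np.array([
    [5, 5, 0, 0],   # John
    [5, 0, 0, 0],   # Robert
    [5, 5, 4, 5],   # Jenny
    [0, 4, 0, 0],   # Tom
])

# Step 2: user-user cosine similarity matrix.
S = cosine_similarity(R)

target = users.index("Robert")
# Step 3: users most similar to Robert (excluding Robert himself).
neighbours = [u for u in np.argsort(S[target])[::-1] if u != target]

# Steps 4-6: score unseen books by similarity-weighted ratings,
# skipping books Robert has already read (candidate filtering).
scores = {}
for b in range(len(books)):
    if R[target, b] == 0:
        num = sum(S[target, u] * R[u, b] for u in neighbours if R[u, b] > 0)
        den = sum(S[target, u] for u in neighbours if R[u, b] > 0)
        if den > 0:
            scores[books[b]] = num / den

print(sorted(scores.items(), key=lambda s: -s[1]))
```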
Model-based recommenders
These systems learn patterns, correlations, and relationships from historical user-item
interaction data to make predictions about a user’s preferences for items they haven’t
interacted with yet.
There are different types of model-based recommenders, such as matrix factorization (including Singular Value Decomposition, SVD) and neural networks.
However, matrix factorization remains the most popular one, so let’s explore it a bit
further.
Matrix factorization
Matrix factorization aims to approximate the user-item interaction matrix by factorizing it into two or more lower-dimensional matrices:
User latent factor matrix (U), which contains information about users and their
relationships with latent factors.
Item latent factor matrix (V), which contains information about items and their
relationships with latent factors.
The rating matrix is a product of two smaller matrices – the item-feature matrix and the user-
feature matrix. The higher the score in the matrix, the better the match between the item and
the user.
The factorization is learned by minimizing the regularized squared error over the known ratings:

min_{U,V} Σ_{(u,i)∈K} ( r_ui − p_uᵀ q_i )² + λ ( ‖p_u‖² + ‖q_i‖² )

where K is the set of (u, i) pairs with known ratings, r_ui is the rating for item i by user u, p_u and q_i are the latent factor vectors for user u and item i (rows of U and V), and λ is a regularization term (used to avoid overfitting).
To minimize the loss function we can apply Stochastic Gradient Descent (SGD) or Alternating Least Squares (ALS). Both methods can be used to incrementally update the model as new ratings come in. SGD is often faster and more accurate on explicit-rating data, while ALS is easier to parallelize and handles implicit feedback well.
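Below is a toy sketch of matrix factorization trained with SGD, assuming only NumPy; the ratings and hyperparameters (number of factors, learning rate, regularization strength, epochs) are illustrative choices, not tuned values:

```python
# Matrix factorization with SGD on the observed entries only.
import numpy as np

R = np.array([        # user-item ratings, 0 = unknown
    [5, 4, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

n_users, n_items = R.shape
k, lr, reg, epochs = 2, 0.01, 0.1, 500
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(n_users, k))   # user latent factors
V = rng.normal(scale=0.1, size=(n_items, k))   # item latent factors

for _ in range(epochs):
    for u, i in zip(*R.nonzero()):             # loop over known ratings
        err = R[u, i] - U[u] @ V[i]            # prediction error
        U[u] += lr * (err * V[i] - reg * U[u]) # gradient step for user u
        V[i] += lr * (err * U[u] - reg * V[i]) # gradient step for item i

print(np.round(U @ V.T, 2))  # reconstructed / predicted rating matrix
```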
Advantages of collaborative filtering
Collaborative filtering needs no item metadata, improves as more interaction data accumulates, and can surface serendipitous recommendations that content-based methods miss. It’s important to note that while collaborative filtering offers these and other advantages, it also has its limitations, including:
User cold start occurs when a new user joins the system without any prior interaction
history. Collaborative filtering relies on historical interactions to make
recommendations, so it can’t provide personalized suggestions to new users who start
with no data.
Item cold start happens when a new item is added, and there’s no user interaction data
for it. Collaborative filtering has difficulty recommending new items since it lacks
information about how users have engaged with these items in the past.
Sensitivity to sparse data: Collaborative filtering depends on having enough user-
item interaction data to provide meaningful recommendations. In situations where
data is sparse and users interact with only a small number of items, collaborative
filtering may struggle to find useful patterns or similarities between users and items.
Potential for popularity bias: Collaborative filtering tends to recommend popular
items more frequently. This can lead to a “rich get richer” phenomenon, where
already popular items receive even more attention, while niche or less-known items
are overlooked.
To address these and other limitations, recommendation systems often use hybrid
approaches that combine collaborative filtering with content-based methods or other
techniques to improve recommendation quality in the long run.
HYBRID RECOMMENDERS
Hybrid recommenders combine two or more techniques, most often content-based and collaborative filtering, so that the strengths of one approach offset the weaknesses of the other. For example, content features can cover the item cold start that pure collaborative filtering cannot handle.

EVALUATING RECOMMENDER SYSTEMS:
Accuracy metrics measure how closely predicted ratings match the ratings users actually gave. Think of Mean Absolute Error (MAE), Root Mean Square Error (RMSE), or Mean Squared Logarithmic Error (MSLE).
Ranking metrics evaluate how well a recommender system ranks items for a user,
especially in top-N recommendation scenarios. Think of hit rate, average reciprocal
hit rate (ARHR), cumulative hit rate, or rating hit rate.
Diversity metrics assess the diversity of recommended items to ensure that
recommendations are not overly focused on a narrow set of items. These include
Intra-List Diversity or Inter-List Diversity.
Novelty metrics evaluate how well a recommender system introduces users to new or
unfamiliar items. Catalog coverage and item popularity belong to this category.
Serendipity metrics assess the system’s ability to recommend unexpected but
interesting items to users – surprise or diversity are looked at in this case.
You can also choose to look at some business metrics such as conversion rate, click-
through rate (CTR), or revenue impact. But, ultimately, the best way to do an online
evaluation of your recommender system is through A/B testing.
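A hand-rolled sketch of the two most common accuracy metrics, MAE and RMSE; the actual and predicted ratings are made-up values:

```python
# MAE and RMSE computed directly from their definitions.
import math

actual = [4, 3, 5, 2, 4]
predicted = [3.8, 2.5, 4.9, 2.4, 3.5]

n = len(actual)
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
print(f"MAE={mae:.3f}, RMSE={rmse:.3f}")
```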
DATA MINING METHODS FOR RECOMMENDER SYSTEMS:
Recommender systems build on standard data mining methods, including the following.
2. Clustering Algorithms:
Overview: Clustering methods group users or items with similar characteristics.
Users or items within the same cluster are likely to share common preferences.
Application: Recommending items popular within a user's cluster, assuming similar
preferences within the group.
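A sketch of grouping users by their rating vectors with k-means, assuming scikit-learn; the ratings and number of clusters are illustrative:

```python
# Cluster users by rating behaviour; same-cluster users are assumed
# to share preferences, so popular items in a cluster can be recommended.
import numpy as np
from sklearn.cluster import KMeans

R = np.array([
    [5, 4, 1, 0],
    [4, 5, 0, 1],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(R)
print(labels)  # users with the same label fall in the same cluster
```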
3. Classification Algorithms:
Overview: Classification models predict user preferences for items based on
historical interactions. These models can be trained to classify items as relevant or
irrelevant to a user.
Application: Providing recommendations by predicting user preferences for items
not yet interacted with.
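A sketch of framing recommendation as classification, assuming scikit-learn; the item features and relevance labels are invented for illustration:

```python
# Classify items as relevant (1) or irrelevant (0) for one user,
# based on simple hand-crafted item features.
from sklearn.linear_model import LogisticRegression

# Features per item: [matches favourite genre, price level, avg rating]
X = [[1, 2, 4.5], [0, 3, 3.0], [1, 1, 4.0], [0, 2, 2.5]]
y = [1, 0, 1, 0]  # 1 = the user interacted positively with the item

clf = LogisticRegression().fit(X, y)
print(clf.predict([[1, 2, 4.2]]))  # predicted relevance of a new item
```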
4. Matrix Factorization:
Overview: Matrix factorization techniques decompose the user-item interaction
matrix into latent factors, capturing hidden patterns and relationships. Singular Value
Decomposition (SVD) and Alternating Least Squares (ALS) are common matrix
factorization methods.
Application: Predicting missing values in the user-item matrix to recommend items
a user might like.
Similarity Measures:
Different data types require different functions to measure the similarity of data points. Differentiating between unary, binary, and quantitative data helps with most problems. Unary data could be the number of likes for a blog post, binary data could be likes and dislikes of a video, and quantitative data could be a rating such as 4/10 stars. The measures below are each suited to particular data types.
1. Cosine Similarity:
Definition: Measures the cosine of the angle between two vectors, representing
users or items, in a multidimensional space.
Cosine similarity is a measure used to determine the similarity between two non-
zero vectors in a vector space. It calculates the cosine of the angle between the
vectors, representing their orientation and similarity.
The cosine similarity of two vectors A and B is:

cos(θ) = (A · B) / (‖A‖ · ‖B‖)

where A · B denotes the dot product of vectors A and B, which is the sum of the element-wise multiplication of their corresponding components.
||A|| represents the Euclidean norm or magnitude of vector A, calculated as the
square root of the sum of the squares of its components.
||B|| represents the Euclidean norm or magnitude of vector B.
The resulting value ranges from -1 to 1, where 1 indicates that the vectors are in the same
direction (i.e., completely similar), -1 indicates they are in opposite directions (i.e.,
completely dissimilar), and 0 indicates they are orthogonal or independent (i.e., no
similarity). It is particularly useful in scenarios where the magnitude of the vectors is not
significant, and the focus is on the direction or relative orientation of the vectors.
Dimensionality Independence: It is not affected by the magnitude or length of vectors. It
solely focuses on the direction or orientation of the vectors. This property makes it
valuable when dealing with high-dimensional data or sparse vectors, where the magnitude
of the vectors may not be as informative as their relative angles or orientations.
Sparse Data: It is particularly effective when working with sparse data, where vectors
have many zero or missing values. In such cases, the non-zero elements play a crucial role
in capturing the meaningful information and similarity between vectors.
Application: In recommender systems, cosine similarity can be used to measure the
similarity between user preferences or item characteristics, aiding in generating
personalised recommendations based on similar user preferences or item profiles.
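A from-scratch sketch of the formula above; the two rating vectors are illustrative:

```python
# Cosine similarity between two rating vectors, matching the formula.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))          # A · B
    norm_a = math.sqrt(sum(x * x for x in a))       # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))       # ||B||
    return dot / (norm_a * norm_b)

jenny = [5, 4, 0, 1]
tom = [4, 5, 0, 2]
print(cosine_similarity(jenny, tom))  # close to 1 -> similar tastes
```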
2. Pearson Correlation Coefficient:
Definition: Measures linear correlation between two variables, providing a
measure of the strength and direction of a linear relationship.
The Pearson correlation coefficient, also known as Pearson’s correlation or simply
correlation coefficient, is a statistical measure that quantifies the linear
relationship between two variables. It measures how closely the data points of the
variables align on a straight line, indicating the strength and direction of the
relationship.
The Pearson correlation coefficient is denoted by the symbol “r” and takes values
between -1 and 1. The coefficient value indicates the following:
r = 1: Perfect positive correlation. The variables have a strong positive linear
relationship, meaning that as one variable increases, the other variable also
increases proportionally.
r = -1: Perfect negative correlation. The variables have a strong negative linear
relationship, meaning that as one variable increases, the other variable decreases
proportionally.
r = 0: No linear correlation. There is no linear relationship between the variables.
They are independent of each other.
Application: Evaluating how well users' preferences align, especially in
scenarios with numerical ratings.
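A from-scratch sketch of Pearson’s r between two users’ rating vectors; the ratings are made up, and the short guard avoids division by zero when a user’s ratings have no variance:

```python
# Pearson correlation coefficient computed from its definition.
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:          # no variance -> correlation undefined
        return 0.0
    return cov / (sx * sy)

print(pearson([4, 5, 2, 1], [5, 5, 1, 2]))  # near +1: similar preferences
```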
3. Jaccard Similarity:
Definition: Measures the intersection over the union of sets, quantifying the
similarity between two sets.
It calculates the size of the intersection of the sets divided by the size of their union:

J(A, B) = |A ∩ B| / |A ∪ B|

The resulting value ranges from 0 to 1, where 0 indicates no similarity and 1 indicates complete similarity.
In other words, to calculate the Jaccard similarity, you need to determine the common
elements between the sets of interest and divide it by the total number of distinct
elements across both sets.
It is useful because it provides a straightforward and intuitive measure to quantify the
similarity between sets. Its simplicity makes it applicable in various domains and
scenarios.
Here are some key reasons for its usefulness:
Set Comparison: It enables the comparison of sets without considering the
specific elements or their ordering. It focuses on the presence or absence of
elements, making it suitable for cases where the structure or attributes of the
elements are not important or would need additional feature engineering,
which would slow down the system.
Scale-Invariant: It remains unaffected by the size of the sets being compared.
It solely relies on the intersection and union of sets, making it a robust
measure even when dealing with sets of different sizes.
Binary Data: It is particularly suitable for binary data, where elements are
either present or absent in the sets. It can be applied to scenarios where the
presence or absence of specific features or attributes is important for
comparison.
Applications
In the context of a recommender system, Jaccard similarity can be used to
identify users with similar item preferences and recommend items that are
highly rated or popular among those similar users. By leveraging Jaccard
similarity, the recommender can enhance the personalisation of
recommendations and help users discover relevant items based on the
preferences of users with similar tastes.
Assessing similarity between sets of items liked or interacted with by users.
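A Jaccard similarity sketch over sets of liked items, as described above; the item sets are illustrative:

```python
# Jaccard similarity: shared items divided by all distinct items.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

jenny = {"Aristoi", "Angel Station", "Fahrenheit 451"}
tom = {"Fahrenheit 451", "The Time Machine"}
print(jaccard(jenny, tom))  # 1 shared item / 4 distinct items = 0.25
```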
4. Euclidean Distance:
Definition: Represents the straight-line distance between two points in a
multidimensional space.
Application: Quantifying the dissimilarity or proximity between user or item
vectors.
5. Manhattan Distance:
Definition: Measures the distance between two points by summing the absolute
differences along each dimension.
Application: Similar to Euclidean distance, but may be less sensitive to outliers.
6. Hamming Distance:
Definition: Measures the number of positions at which corresponding bits differ in
two binary strings.
Application: Suitable for comparing binary user profiles or item representations.
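From-scratch sketches of these three distance measures; the inputs are illustrative:

```python
# Euclidean, Manhattan, and Hamming distances from their definitions.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

print(euclidean([5, 3, 4], [4, 3, 5]))      # straight-line distance
print(manhattan([5, 3, 4], [4, 3, 5]))      # sum of absolute differences
print(hamming([1, 0, 1, 1], [1, 1, 1, 0]))  # differing positions: 2
```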
Choosing the appropriate data mining method and similarity measure depends on
the characteristics of the data, the nature of the recommendation problem, and
computational considerations. Hybrid approaches that combine multiple methods or
measures often yield more robust and accurate recommendations.
DIMENSIONALITY REDUCTION:
Overview:
Dimensionality reduction is a technique used to reduce the number of features
(dimensions) in a dataset while preserving its essential information. In the context of
recommender systems, dimensionality reduction is often applied to user-item
interaction matrices to capture latent factors that represent hidden patterns in the data.
By reducing the dimensionality, the computational complexity decreases, and the
model becomes more efficient.
Methods:
Principal Component Analysis (PCA): PCA is a popular linear dimensionality
reduction method that transforms the original features into a new set of uncorrelated
variables (principal components) while preserving the variance in the data.
Singular Value Decomposition (SVD): SVD is a matrix factorization technique
that decomposes a matrix into three other matrices, capturing latent factors. It is
commonly used in collaborative filtering for recommender systems.
Non-Negative Matrix Factorization (NMF): NMF decomposes a matrix into
two lower-rank matrices with non-negative elements, making it suitable for
scenarios where non-negativity is a meaningful constraint.
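A truncated SVD sketch on a small rating matrix, assuming NumPy; keeping only the top-r singular values gives a low-rank approximation that captures the dominant latent factors:

```python
# Truncated SVD: A ≈ U_r @ diag(s_r) @ Vt_r with r latent factors.
import numpy as np

A = np.array([
    [5, 4, 0, 1],
    [4, 5, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 5, 4],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt
r = 2                                             # number of latent factors
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]       # rank-r approximation
print(np.round(A_r, 2))
```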
In matrix factorization, the latent factors capture the characteristics of the items. The utility matrix A, of shape m × n, expresses the relationships between users and items by mapping each of them into an r-dimensional latent space: vector x_i represents item i and vector y_u represents user u. The rating given by user u to item i is then predicted as:

R_ui = x_iᵀ · y_u

The loss to be minimized is the squared error between the actual rating R_ui and this predicted rating.
Regularization is used to avoid overfitting and to help the model generalize by adding a penalty term.
We also add bias terms to reduce the error between the actual and predicted values, where:
(u, i): a user-item pair
μ: the average rating of all items
b_i: the average rating of item i minus μ
b_u: the average rating given by user u minus μ
With the bias terms, the predicted rating becomes:

r̂_ui = μ + b_i + b_u + x_iᵀ · y_u

and adding the bias and regularization terms to the loss gives:

min Σ_{(u,i)∈K} ( R_ui − r̂_ui )² + λ ( b_i² + b_u² + ‖x_i‖² + ‖y_u‖² )