Unit-1 - Introduction
Introduction
PERSONALIZED RECOMMENDER SYSTEMS
• Personalized recommendation systems
are designed to provide tailored
recommendations to individual users
based on their past behavior,
preferences, and demographic
information.
• These include:
• Content-based filtering,
• Collaborative filtering, and
• Hybrid recommenders.
CONTENT-BASED FILTERING
•Content-based recommender systems use item or user metadata to create specific recommendations. To do this, we look at the user’s purchase history.
•For example, if a user has already read a book by a certain author or bought a product from a certain brand, we assume that they have a preference for that author or brand, and that there is a good chance they will buy a similar product in the future.
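To make this concrete, below is a minimal content-based filtering sketch in Python. All of the data (book titles and metadata strings) is hypothetical, and scikit-learn’s TfidfVectorizer and cosine_similarity are just one convenient way to compare item attributes.

```python
# A minimal content-based filtering sketch (hypothetical book data).
# Items the user liked are compared with the catalog via TF-IDF over
# item metadata; the most similar unseen items are recommended.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical item metadata: "author genre keywords" per book.
catalog = {
    "Book A": "orwell dystopia politics surveillance",
    "Book B": "orwell essays politics",
    "Book C": "austen romance classic",
    "Book D": "huxley dystopia science",
}
liked = ["Book A"]  # the user's purchase/reading history

titles = list(catalog)
tfidf = TfidfVectorizer().fit_transform(catalog.values())
sims = cosine_similarity(tfidf)  # item-item similarity matrix

# Score unseen items by their similarity to the liked items.
scores = {
    t: max(sims[titles.index(l)][i] for l in liked)
    for i, t in enumerate(titles) if t not in liked
}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```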
CONTENT-BASED FILTERING
•Advantages
•Reduced cold-start problem: Content-based recommendations can effectively address the “cold-start” problem, allowing new users or items with limited interaction history to still receive relevant recommendations.
•Transparency: Content-based filtering allows users to understand
why a recommendation is made because it’s based on the content
and attributes of items they’ve previously interacted with.
•Diversity: Considering various attributes, content-based systems
can provide diverse recommendations. For example, in a movie
recommendation system, recommendations can be based on genre,
director, and actors.
•Reduced data privacy concerns: Since content-based systems
primarily use item attributes, they may not require as much user
data, which can mitigate privacy concerns associated with collecting
and storing user data.
CONTENT-BASED FILTERING
•Disadvantages
•The “Filter bubble”: Content filtering can recommend only content similar to
the user’s past preferences. If a user reads a book about a political ideology and
books related to that ideology are recommended to them, they will be in the
“bubble of their previous interests”.
•Limited serendipity: Content-based systems may have limited capability to
recommend items that are outside a user’s known preferences.
•In the first scenario (the “long tail”), roughly 20% of items attract the attention of 70-80% of users, while the remaining 70-80% of items attract the attention of only 20% of users. The recommender’s goal is to surface those products that users would not discover at first glance.
•In the second scenario, content-based filtering recommends products that fit content-wise yet are very unpopular (i.e., people don’t buy those products for some reason; for example, the book is bad even though it fits thematically).
•Over-specialization: If the content-based system relies too heavily on a user’s past interactions, it can recommend items that are too similar to what the user has already seen or interacted with, potentially missing opportunities for diversification.
COLLABORATIVE FILTERING
• Collaborative filtering is a popular
technique used to provide personalized
recommendations to users based on the
behavior and preferences of similar
users.
• There are two main types of collaborative
filtering: memory-based and model-
based.
Memory-based recommenders
• Memory-based recommenders rely on the direct
similarity between users or items to make
recommendations.
• Usually, these systems use raw, historical user
interaction data, such as user-item ratings or purchase
histories, to identify similarities between users or
items and generate personalized recommendations.
• The biggest disadvantage of memory-based
recommenders is that they require a lot of data to be
stored and comparing every item/user with every
item/user is extremely computationally demanding.
• Memory-based recommenders can be categorized into two main types: user-based and item-based collaborative filtering.
A user-based collaborative filtering recommender system
With the user-based approach, recommendations for the target user are made by identifying other users who have shown similar behavior or preferences.
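A minimal sketch of the user-based approach, using a small hypothetical ratings matrix (rows are users, columns are items, 0 means unrated):

```python
# Minimal user-based collaborative filtering on hypothetical ratings.
import numpy as np

R = np.array([
    [5, 4, 0, 1],   # target user (user 0); item 2 is unrated
    [4, 5, 3, 1],
    [1, 0, 5, 4],
    [1, 2, 4, 5],
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target = 0
# Similarity of every user to the target user.
sims = np.array([cosine(R[target], R[u]) for u in range(len(R))])

# Predict the target's rating for item 2 as a similarity-weighted
# average of the ratings given by the other users.
item = 2
others = [u for u in range(len(R)) if u != target and R[u, item] > 0]
pred = sum(sims[u] * R[u, item] for u in others) / sum(sims[u] for u in others)
print(f"predicted rating for item {item}: {pred:.2f}")
```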
An item-based collaborative filtering recommender system
• With the item-based approach, recommendations are made by identifying items similar to those the target user has already interacted with.
• Ratings can also be predicted with a latent-factor model, trained by minimizing the regularized squared-error loss

L = Σ_{(u,i) ∈ K} ( r(u, i) − pᵤ · qᵢ )² + λ ( ‖pᵤ‖² + ‖qᵢ‖² )

• where K is a set of (u, i) pairs, r(u, i) is the rating for item i by user u, pᵤ and qᵢ are the latent-factor vectors of user u and item i, and λ is a regularization term (used to avoid overfitting).
• In order to minimize the loss function we can apply Stochastic Gradient Descent (SGD) or Alternating Least Squares (ALS). Both methods can be used to incrementally update the model as new ratings come in. SGD is usually faster and simpler to implement, while ALS is easier to parallelize.
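As an illustration, here is a minimal SGD sketch for the loss above. The ratings, latent dimension k, learning rate, and λ are all hypothetical choices for demonstration, not prescribed values:

```python
# Minimal matrix factorization trained with SGD on the loss above.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 2
# Observed (user, item, rating) triples - the set K from the loss.
K = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 3, 1.0), (3, 4, 2.0)]

P = rng.normal(scale=0.1, size=(n_users, k))  # user factors p_u
Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors q_i
lr, lam = 0.05, 0.02                          # learning rate, lambda

for epoch in range(200):
    for u, i, r in K:
        err = r - P[u] @ Q[i]                   # prediction error
        P[u] += lr * (err * Q[i] - lam * P[u])  # gradient step on p_u
        Q[i] += lr * (err * P[u] - lam * Q[i])  # gradient step on q_i

print("predicted r(0, 0):", P[0] @ Q[0])  # should approach 5.0
```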
HYBRID RECOMMENDERS
• Hybrid recommendation systems combine multiple
recommendation techniques or approaches to provide more
accurate, diverse, and effective personalized recommendations.
• They are particularly valuable in real-world recommendation
scenarios because they can provide more robust, accurate, and
adaptable recommendations.
• The choice of which hybrid approach to use depends on the
specific requirements and constraints of the recommendation
system and the nature of the available data.
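One common design is a weighted hybrid, which blends the scores of two recommenders. The sketch below assumes hypothetical per-item scores from a content-based model and a collaborative model; the weight alpha would be tuned for the specific system:

```python
# A minimal weighted-hybrid sketch over hypothetical score dicts.
def hybrid_scores(content, collaborative, alpha=0.4):
    """Blend two per-item score dicts into one ranking."""
    return {
        item: alpha * content.get(item, 0.0)
              + (1 - alpha) * collaborative.get(item, 0.0)
        for item in set(content) | set(collaborative)
    }

content_scores = {"A": 0.9, "B": 0.4, "C": 0.7}
collab_scores  = {"A": 0.2, "B": 0.8, "D": 0.6}
ranked = sorted(hybrid_scores(content_scores, collab_scores).items(),
                key=lambda kv: -kv[1])
print(ranked)
```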
HYBRID RECOMMENDERS
Advantages of hybrid recommenders include:
• Improved recommendation quality.
• Enhanced robustness and flexibility.
• Addressing common recommendation limitations.
Disadvantages of hybrid recommenders include:
• Increased complexity and development effort.
• Data and computational demands.
• Tuning and parameter sensitivity.
• While hybrid recommenders offer significant advantages in terms of recommendation quality and versatility, these gains come at a price.
• Careful evaluation is the best way to ensure that the benefits of hybridization outweigh the added complexity and costs.
EVALUATION METRICS FOR RECOMMENDER SYSTEMS
•Accuracy metrics assess the accuracy of the recommendations made by a
system in terms of how well they match the user’s actual preferences or behavior.
Here we have Mean Absolute Error (MAE), Root Mean Square Error (RMSE), or Mean
Squared Logarithmic Error (MSLE).
•Ranking metrics evaluate how well a recommender system ranks items for a
user, especially in top-N recommendation scenarios. Think of hit rate, average
reciprocal hit rate (ARHR), cumulative hit rate, or rating hit rate.
•Diversity metrics assess the diversity of recommended items to ensure that
recommendations are not overly focused on a narrow set of items. These include
Intra-List Diversity or Inter-List Diversity.
•Novelty metrics evaluate how well a recommender system introduces users to
new or unfamiliar items. Catalog coverage and item popularity belong to this
category.
•Serendipity metrics assess the system’s ability to recommend unexpected but
interesting items to users – surprise or diversity are looked at in this case.
You can also choose to look at some business metrics such as conversion
rate, click-through rate (CTR), or revenue impact. But, ultimately, the
best way to do an online evaluation of your recommender system is
through A/B testing.
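As a small illustration, the snippet below computes MAE, RMSE, and a simple hit rate on hypothetical data; a real evaluation would run over a full held-out dataset:

```python
# Toy examples of three metrics above: MAE, RMSE, and hit rate.
import numpy as np

actual    = np.array([4.0, 3.0, 5.0, 2.0])
predicted = np.array([3.5, 3.0, 4.0, 2.5])

mae  = np.mean(np.abs(actual - predicted))
rmse = np.sqrt(np.mean((actual - predicted) ** 2))

# Hit rate: fraction of users whose held-out item appears in their
# top-N recommendation list.
top_n_lists = {"u1": ["A", "B", "C"], "u2": ["D", "E", "F"]}
held_out    = {"u1": "B", "u2": "X"}
hits = sum(held_out[u] in items for u, items in top_n_lists.items())
hit_rate = hits / len(top_n_lists)

print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  hit rate={hit_rate:.2f}")
```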
Data Mining Methods for Recommender Systems
1. Association Rule Mining:
•Overview: Association rule mining identifies relationships or patterns in
user-item interactions. It helps discover associations between items that
are frequently co-purchased or co-viewed.
•Application: Generating recommendations based on association rules, e.g.,
"Users who bought X also bought Y" (a small sketch follows this list).
2. Clustering Algorithms:
•Overview: Clustering methods group users or items with similar
characteristics. Users or items within the same cluster are likely to share
common preferences.
•Application: Recommending items popular within a user's cluster,
assuming similar preferences within the group.
3. Classification Algorithms:
•Overview: Classification models predict user preferences for items based
on historical interactions. These models can be trained to classify items as
relevant or irrelevant to a user.
•Application: Providing recommendations by predicting user preferences
for items not yet interacted with.
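The sketch referenced in method 1 above: a minimal co-occurrence count over hypothetical shopping baskets, the simplest form of association-style recommendation (full association rule mining, e.g., Apriori, would also track confidence and lift):

```python
# Count item co-occurrences across hypothetical baskets and report
# pairs with enough support ("users who bought X also bought Y").
from itertools import combinations
from collections import Counter

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"beer", "chips"},
]

pairs = Counter()
for basket in baskets:
    pairs.update(combinations(sorted(basket), 2))

min_support = 2  # minimum number of co-occurring baskets
for (x, y), count in pairs.items():
    if count >= min_support:
        print(f"Users who bought {x} also bought {y} ({count} baskets)")
```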
Data Mining Methods for Recommender Systems
4. Matrix Factorization:
•Overview: Matrix factorization techniques decompose the user-item
interaction matrix into latent factors, capturing hidden patterns and
relationships. Singular Value Decomposition (SVD) and Alternating Least
Squares (ALS) are common matrix factorization methods.
•Application: Predicting missing values in the user-item matrix to
recommend items a user might like.
5. Deep Learning Models:
•Overview: Deep learning models, such as neural networks, can capture
complex patterns in user-item interactions. Neural collaborative filtering is
an example where embeddings are used to represent users and items.
•Application: Learning intricate user-item relationships for more accurate
and personalized recommendations.
Similarity Measures
1. Cosine Similarity:
•Definition: Measures the cosine of the angle between two vectors,
representing users or items, in a multidimensional space.
•Cosine similarity is a measure used to determine the similarity between two non-zero vectors in a vector space. It calculates the cosine of the angle between the vectors, representing their orientation and similarity:

cos(A, B) = (A · B) / (||A|| × ||B||)

•A · B denotes the dot product of vectors A and B, which is the sum of the element-wise multiplication of their corresponding components.
•||A|| represents the Euclidean norm or magnitude of vector A, calculated as the square root of the sum of the squares of its components.
•||B|| represents the Euclidean norm or magnitude of vector B.
The resulting value ranges from -1 to 1, where 1 indicates that the vectors
are in the same direction (i.e., completely similar), -1 indicates they are in
opposite directions (i.e., completely dissimilar), and 0 indicates they are
orthogonal or independent (i.e., no similarity).
Similarity Measures
Cosine similarity between two vectors X and Y is computed using the same formula:

cos(X, Y) = (X · Y) / (||X|| × ||Y||)
Example: Consider Row 1 and Row 3 of the four-dimensional data table used in this lesson (shown below). Row 1 contains (10, 3, 3, 5) and Row 3 contains (9, 4, 6, 4). The dot product is 10·9 + 3·4 + 3·6 + 5·4 = 140, ||Row 1|| = √143 ≈ 11.958 and ||Row 3|| = √149 ≈ 12.207, so the cosine similarity between Row 1 and Row 3 is 140 / (11.958 × 12.207) ≈ 0.959.
Now, compute the cosine similarity between Row 1 and Row 5. Row 1 contains (10, 3, 3, 5). Row 5 contains (20, 15, 10, 20). The cosine similarity should be 0.93494699.
Therefore, Row 3 is more similar to Row 1 than Row 5 is.
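The same numbers can be verified with a few lines of NumPy:

```python
# Verifying the cosine-similarity values above.
import numpy as np

row1 = np.array([10, 3, 3, 5])
row3 = np.array([9, 4, 6, 4])
row5 = np.array([20, 15, 10, 20])

def cos_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos_sim(row1, row3))  # ~0.959
print(cos_sim(row1, row5))  # ~0.93494699
```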
Similarity Measures
2. Pearson Correlation Coefficient:
•Definition: Measures linear correlation between two variables, providing a
measure of the strength and direction of a linear relationship.
•The Pearson correlation coefficient, also known as Pearson’s correlation or
simply correlation coefficient, is a statistical measure that quantifies the
linear relationship between two variables. It measures how closely the data
points of the variables align on a straight line, indicating the strength and
direction of the relationship.
The Pearson correlation coefficient is denoted by the symbol “r” and is computed as

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √( Σ(xᵢ − x̄)² × Σ(yᵢ − ȳ)² )

It takes values between -1 and 1. The coefficient value indicates the following:
•r = 1: Perfect positive correlation. The variables have an exact positive linear relationship, meaning that as one variable increases, the other variable also increases proportionally.
•r = -1: Perfect negative correlation. The variables have an exact negative linear relationship, meaning that as one variable increases, the other variable decreases proportionally.
•r = 0: No linear correlation. There is no linear relationship between the variables, although they may still be related in a nonlinear way.
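A small sketch computing r both from the formula above and with NumPy’s built-in corrcoef, on hypothetical data:

```python
# Pearson correlation: by hand and via np.corrcoef (toy data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

# By the formula: centered dot product over the product of norms.
dx, dy = x - x.mean(), y - y.mean()
r = (dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy))

print(r, np.corrcoef(x, y)[0, 1])  # both give the same value
```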
Similarity Measures
3. Jaccard Similarity:
•Definition: Measures the intersection over the union of sets,
quantifying the similarity between two sets.
•It calculates the size of the intersection of the sets divided by the
size of their union. The resulting value ranges from 0 to 1, where 0
indicates no similarity and 1 indicates complete similarity.
For real-valued vectors, the Jaccard index generalizes to the Tanimoto index:

T(A, B) = (A · B) / (||A||² + ||B||² − A · B)

Example: Consider Row 1 and Row 3 of the four-dimensional data table that we have been using in this lesson. Row 1 contains (10, 3, 3, 5) and Row 3 contains (9, 4, 6, 4). Here A · B = 140, ||A||² = 143 and ||B||² = 149, so the Tanimoto index between Row 1 and Row 3 is 140 / (143 + 149 − 140) = 140/152 ≈ 0.921.
Now, compute the Tanimoto similarity between Row 1 and Row 5. Row 1 contains (10, 3, 3, 5). Row 5 contains (20, 15, 10, 20). Please do the calculation as a practice. You will find that the Tanimoto index between Row 1 and Row 5 is 0.419932811.
Row 1 has a larger Tanimoto similarity with Row 3 than with Row 5. Therefore, Rows 1 and 3 are more similar than Rows 1 and 5.
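Both Tanimoto values can be verified in a few lines of NumPy:

```python
# Verifying the Tanimoto values above.
import numpy as np

def tanimoto(a, b):
    dot = a @ b
    return dot / (a @ a + b @ b - dot)

row1 = np.array([10, 3, 3, 5])
row3 = np.array([9, 4, 6, 4])
row5 = np.array([20, 15, 10, 20])

print(tanimoto(row1, row3))  # ~0.921
print(tanimoto(row1, row5))  # ~0.419932811
```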
Similarity Measures
• Suppose we have a four-dimensional dataset (Features 1 through 4):

        Feature 1   Feature 2   Feature 3   Feature 4
Row 1       10          3           3           5
Row 2        5          4           5           3
Row 3        9          4           6           4
Row 4        8          6           2           6
Row 5       20         15          10          20
4. Euclidean Distance:
•Definition: Measures the straight-line distance between two points, i.e., the square root of the sum of squared differences along each dimension.
•Application: A common way to compare user or item vectors; smaller distances indicate greater similarity.
5. Manhattan Distance:
•Definition: Measures the distance between two points by
summing the absolute differences along each dimension.
•Application: Similar to Euclidean distance, but may be less
sensitive to outliers.
6. Hamming Distance:
•Definition: Measures the number of positions at which
corresponding bits differ in two binary strings.
•Application: Suitable for comparing binary user profiles or item
representations.
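Small sketches of both distances on toy data (the Manhattan example reuses Rows 1 and 3 from the table above):

```python
# Manhattan and Hamming distances on toy data.
import numpy as np

# Manhattan distance: sum of absolute differences per dimension.
a, b = np.array([10, 3, 3, 5]), np.array([9, 4, 6, 4])
manhattan = np.sum(np.abs(a - b))  # |10-9| + |3-4| + |3-6| + |5-4| = 6

# Hamming distance: number of differing positions in binary profiles.
p, q = np.array([1, 0, 1, 1, 0]), np.array([1, 1, 1, 0, 0])
hamming = np.sum(p != q)  # positions 2 and 4 differ -> 2

print(manhattan, hamming)
```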
DIMENSIONALITY REDUCTION
Dimensionality reduction is a technique used to reduce the
number of features (dimensions) in a dataset while preserving its
essential information. In the context of recommender systems,
dimensionality reduction is often applied to user-item interaction matrices
to capture latent factors that represent hidden patterns in the data. By
reducing the dimensionality, the computational complexity decreases, and
the model becomes more efficient.
Methods:
•Principal Component Analysis (PCA): PCA is a popular linear
dimensionality reduction method that transforms the original
features into a new set of uncorrelated variables (principal
components) while preserving the variance in the data.
•Singular Value Decomposition (SVD): SVD is a matrix factorization
technique that decomposes a matrix into three other matrices, capturing
latent factors. It is commonly used in collaborative filtering for
recommender systems.
•Non-Negative Matrix Factorization (NMF): NMF decomposes a matrix
into two lower-rank matrices with non-negative elements, making it
suitable for scenarios where non-negativity is a meaningful constraint.
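A minimal sketch of SVD-based dimensionality reduction with NumPy, on a small hypothetical user-item matrix; keeping only k latent factors and reconstructing the matrix gives estimates for the missing (zero) entries:

```python
# Truncated SVD on a hypothetical user-item rating matrix.
import numpy as np

R = np.array([
    [5, 4, 0, 1],   # the 0 marks an unrated item
    [4, 5, 3, 1],
    [1, 1, 5, 4],
    [1, 2, 4, 5],
], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2                                       # latent factors kept
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(R_hat, 2))  # includes an estimate for the 0 entry
```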
Singular Value Decomposition
According to the formula for SVD,

A = U Σ Vᵀ

where
1. A is the input matrix,
2. U contains the left singular vectors,
3. Σ (sigma) is the diagonal matrix of singular values (the square roots of the eigenvalues of AᵀA),
4. V contains the right singular vectors.
The shapes of these matrices will be
1. A — m x n matrix
2. U — m x k matrix
3. Σ — k x k matrix
4. V — n x k matrix
Singular Value Decomposition
• Step 1
• So, as the first step, we need to find the eigenvalues of matrix A. As A can be a rectangular matrix, we first convert it to a square matrix by multiplying A with its transpose. Here, for easier computation, I have taken A as a 2 x 2 matrix.
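Since the worked matrix from the original slides is not reproduced here, the sketch below uses a hypothetical 2 x 2 matrix A whose A·Aᵀ happens to have eigenvalues 10 and 0, consistent with the eigenvalue 10 used in Step 2:

```python
# Step 1 in code: eigenvalues of A @ A.T for a hypothetical A.
import numpy as np

A = np.array([[2.0, 2.0],
              [1.0, 1.0]])

AAt = A @ A.T                     # square matrix: [[8, 4], [4, 2]]
eigvals, eigvecs = np.linalg.eig(AAt)

print(eigvals)                    # approximately [10, 0]
print(np.sqrt(np.abs(eigvals)))   # the singular values of A
```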
Singular Value Decomposition
• Step 2
• Once we have calculated the eigenvalues, it’s time to calculate an eigenvector for each of the two eigenvalues. So, let’s start by calculating the eigenvector for the eigenvalue 10.
• Step 3.1
Next, we need to reduce this matrix to Row-Echelon Form so that we can easily solve the equation. Let’s talk about Row-Echelon Form for a moment here.
Singular Value Decomposition
• Row-Echelon Form
1. The first non-zero entry in each row is called the leading entry.
2. If a column contains a leading entry, then all the entries below the leading entry should be zero.
3. For any two consecutive non-zero rows, the leading entry in the upper row should occur to the left of the leading entry in the lower row.
4. All rows which consist only of zeros should occur at the bottom of the matrix.
• We need to perform some operations on the rows to reduce the matrix. These operations are called elementary row operations, and there are certain rules to follow for these operations.
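Continuing with the same hypothetical A·Aᵀ = [[8, 4], [4, 2]] from Step 1, the sketch below reduces (A·Aᵀ − 10I) to row-echelon form with SymPy and reads off the eigenvector for eigenvalue 10:

```python
# Row-reduce (A@A.T - 10*I) to solve (A@A.T - 10*I) v = 0.
from sympy import Matrix, eye

AAt = Matrix([[8, 4], [4, 2]])
M = AAt - 10 * eye(2)          # [[-2, 4], [4, -8]]

rref, pivots = M.rref()        # reduced row-echelon form
print(rref)                    # Matrix([[1, -2], [0, 0]]): v2 is free

# Row 1 reads v1 - 2*v2 = 0, so v = (2, 1), up to scale, is the
# eigenvector for eigenvalue 10 (check: AAt @ (2, 1) = (20, 10)).
```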
Singular Value Decomposition