Project Report On Recommendation System
(Session: 2016-2020)
Submitted by:
Amal Yadav
UE163007
BE CSE SECTION-I
Table of Contents
1. Abstract
2. Introduction
   i. Need of a recommendation system
   ii. Types of filtering in a recommendation system:
      1. Content-based filtering
      2. Collaborative filtering
      3. Hybrid filtering
3. Basic terminologies
4. Project dependencies:
   1. Dataset
   2. Libraries used
   3. Loss function used
   4. UI
5. Methodology
6. Result
7. Application of recommendation systems
8. Advantages of using a recommendation system
9. Conclusion
10. Future scope
11. References
Acknowledgment
I would like to express a deep sense of gratitude and profuse thanks to Mr. Avinash; without his wise counsel and able guidance, it would have been impossible to complete the project in this manner.
I also express my gratitude to the other team members of the Information Technology department of Access Computer Institute for their intellectual support throughout the course of this work.
I perceive this opportunity as a big milestone in my career development. I will strive to use the gained skills and knowledge in the best possible way, and I will continue to work on their improvement in order to attain my desired career objectives. I hope to continue cooperation with all of you in the future.
Abstract
In this project report, I present a summary of my project: a recommendation system that recommends movies to a given user based on a hybrid approach, a combination of the content-based approach (using the user's past history or choices) and the collaborative approach (using the choices of other, similar users).
For this project, I have used the MovieLens 100K dataset to train and test the model so that it can recommend movies for any given user. The LightFM Python library is used to implement a popular recommendation algorithm, a model based on the WARP (Weighted Approximate-Rank Pairwise) loss. The given user's past viewing history and recommended movies are put on a webpage, which shows the name and poster of each movie; a user can even watch the trailer of any movie present there by clicking on its poster.
Introduction
A product recommendation system is a filtering system that seeks to predict and show the items that a user would like to purchase. It may not be entirely accurate, but if it shows what a user likes, then it is doing its job right.
Recommendation engines are basically data filtering tools that use algorithms and data to recommend the most relevant items to a particular user. In simple terms, they are nothing but an automated form of a "shop counter guy".
➢ In the immortal words of Steve Jobs: "A lot of times, people don't know what they want until you show it to them." Customers may love your movie, your product, or your job opening, but they may not know it exists. The job of the recommender system is to open the customer/user up to completely new products and possibilities which they would not think to search for directly themselves.
➢ With the growing amount of information on the internet and a significant rise in the number of users, it is becoming important for companies to search, map, and provide users with the relevant chunk of information according to their preferences and tastes.
Recommendation systems generally use one of the following three types of filtering:
1. Collaborative filtering
2. Content-Based Filtering
3. Hybrid Recommendation Systems
1. Collaborative filtering:
➢ This filtering method is usually based on collecting and analyzing information on users' behaviors, activities, or preferences, and predicting what they will like based on their similarity to other users.
➢ A key advantage of the collaborative filtering approach is that it does not rely on machine-analyzable content; it is therefore capable of accurately recommending complex items such as movies without requiring an "understanding" of the item itself.
➢ Collaborative filtering is based on the assumption that people who agreed in the past will agree in the future, and that they will like the same kinds of items as they liked in the past.
➢ For example, if a person A likes items 1, 2 and 3 and B likes 2, 3 and 4, then they have similar interests, so A should like item 4 and B should like item 1.
• User-User Collaborative Filtering: Here, we search for lookalike customers and offer products based on what the lookalike has chosen. This algorithm is very effective but takes a lot of time and resources, since it requires computing the similarity information for every pair of customers. So, for platforms with a large user base, this algorithm is hard to put in place.
• Item-Item Collaborative Filtering: This is very similar to the previous algorithm, but instead of finding customer lookalikes, we try to find item lookalikes. Once we have an item-lookalike matrix, we can easily recommend similar items to a customer who has purchased an item from the store. This algorithm requires far fewer resources than user-user collaborative filtering; for a new customer it takes far less time, as we do not need all the similarity scores between customers. Amazon uses this approach in its recommendation engine to show related products, which boosts sales. (A minimal similarity sketch follows this list.)
• Other simpler algorithms: There are other approaches, like market basket analysis, which generally do not have as high predictive power as the algorithms described above.
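To make the item-item idea concrete, here is a minimal sketch that computes cosine similarity between the item columns of a tiny, made-up user-item rating matrix; the ratings and the matrix size are invented purely for illustration.

```python
import numpy as np

# Hypothetical user-item rating matrix: rows are users, columns are items.
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

# Cosine similarity between every pair of item columns.
norms = np.linalg.norm(ratings, axis=0)
norms[norms == 0] = 1.0  # guard against division by zero for unrated items
item_similarity = (ratings.T @ ratings) / np.outer(norms, norms)

# Items most similar to item 0, excluding item 0 itself.
print(np.argsort(-item_similarity[0])[1:])
```

A real system would use sparse matrices and keep only the top-k neighbours per item, but the core computation is the same.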
2. Content-based filtering:
➢ These filtering methods are based on the description of an item and a profile of the user's preferred choices.
➢ In a content-based recommendation system, keywords are used to describe the items, and a user profile is built to state the type of item this user likes. In other words, the algorithm tries to recommend products which are similar to the ones that the user has liked in the past.
➢ The idea of content-based filtering is that if a user likes an item, then he/she will also like a 'similar' item. (A small sketch of genre-based similarity follows this list.)
➢ It fits, for example, when we are recommending the same kind of item, such as a movie or a song. This approach has its roots in information retrieval and information filtering research.
➢ A major issue with content-based filtering is whether the system is able to learn user preferences from users' actions on one content source and replicate them across other content types.
➢ When the system is limited to recommending content of the same type the user is already consuming, the value of the recommendation system is significantly lower than when other content types from other services can be recommended. For example, recommending news articles based on news browsing is useful, but it would be much more useful if music and videos from different services could be recommended based on the news browsing.
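As a small illustration of content-based similarity, the sketch below compares made-up one-hot genre vectors of the kind found in the MovieLens u.item file; the movie names and genre assignments are hypothetical.

```python
import numpy as np

# Hypothetical one-hot genre flags: [Action, Comedy, Romance, Sci-Fi].
movies = {
    "Movie A": np.array([1.0, 0.0, 0.0, 1.0]),  # Action, Sci-Fi
    "Movie B": np.array([1.0, 0.0, 0.0, 0.0]),  # Action
    "Movie C": np.array([0.0, 1.0, 1.0, 0.0]),  # Comedy, Romance
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Recommend the item most similar to one the user liked.
liked = "Movie A"
scores = {m: cosine(movies[liked], v) for m, v in movies.items() if m != liked}
print(max(scores, key=scores.get))  # "Movie B" shares the Action genre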
Fig 1: Filtering method representation of collaborative and content-based filtering
Basic terminologies
1. Label: A label is the thing we're predicting. For example, the label could be the future price of wheat, the kind of animal shown in a picture, the meaning of an audio clip, or just about anything.
2. Feature: A feature is an input variable. For example, in a spam detector, the features could include the words in the email text, the sender's address, etc.
3. Model: A model defines the relationship between features and the label. For example, a spam detection model might associate certain features strongly with "spam".
4. Training means creating or learning the model. That is, the model is shown labeled examples, which enables it to gradually learn the relationships between the features and the label.
5. Inference means applying the trained model to unlabeled examples. That is, you use the trained model to make useful predictions (y').
6. Loss Function: It measures the difference between the model's predictions and the desired output. We want to minimize it during training so that our model becomes more accurate over time.
➢ Loss: Loss is the penalty for a bad prediction. That is, the loss is a number indicating how bad the model's prediction was on a single example. If the model's prediction is perfect, the loss is zero; otherwise, the loss is greater. The goal of training a model is to find a set of weights and biases that have low loss, on average, across all examples. For example, Fig 3 shows a high-loss model on the left and a low-loss model on the right. Note the following about the figure:
• The red arrows represent loss.
• The blue lines represent predictions.
Fig 3: High loss in the left model; low loss in the right model.
The red arrows in the left plot are much longer than their counterparts in the right plot. Clearly, the blue line in the right plot is a much better predictive model than the blue line in the left plot.
➢ Popular Loss Functions
1. Squared loss: The linear regression models we'll examine here use a loss function called squared loss (also known as L2 loss). The squared loss for a single example is the square of the difference between the label and the prediction:

$L_2 = (\text{observation} - \text{prediction}(x))^2 = (y - y')^2$
2. Mean squared error (MSE) is the average squared loss per example over the whole dataset. To calculate MSE, sum up all the squared losses for the individual examples and then divide by the number of examples:

$MSE = \frac{1}{N}\sum_{(x, y) \in D}(y - \text{prediction}(x))^2$

where:
• $(x, y)$ is an example in which
• $x$ is the set of features (for example, chirps/minute, age, gender) that the model uses to make predictions.
• $y$ is the example's label (for example, temperature).
• $\text{prediction}(x)$ is a function of the weights and bias in combination with the set of features $x$.
• $D$ is a data set containing many labeled examples, which are $(x, y)$ pairs.
• $N$ is the number of examples in $D$.
(A short numeric sketch follows this list.)
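As a quick numeric illustration of the formula above, with invented labels and predictions for N = 4 examples:

```python
import numpy as np

y = np.array([3.0, 5.0, 2.5, 4.0])       # labels (hypothetical)
y_pred = np.array([2.5, 5.0, 3.0, 3.0])  # model predictions (hypothetical)

squared_losses = (y - y_pred) ** 2  # per-example L2 loss
mse = squared_losses.mean()         # (1/N) * sum of squared losses
print(squared_losses)  # [0.25 0.   0.25 1.  ]
print(mse)             # 0.375
```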
7. Reducing Loss: Calculating the loss function for every conceivable value of the weight of a feature over the entire data set would be an inefficient way of finding the convergence point. So, we use the following methods to minimize the loss:
i. Gradient Descent: The gradient always points in the direction of the steepest increase in the loss function. The gradient descent algorithm takes a step in the direction of the negative gradient in order to reduce the loss as quickly as possible, then repeats this process, edging ever closer to the minimum.
Fig 4: A gradient step moves us to the next point on the loss curve
c) Learning Rate: Gradient descent algorithms multiply the gradient by a scalar known as the learning rate (also sometimes called the step size) to determine the next point.
d) Batch: The batch is the total number of examples used to calculate the gradient in a single iteration. A very large batch may cause even a single iteration to take a very long time to compute.
ii. SGD (stochastic gradient descent): It uses only a single example (a batch size of 1) per iteration. Given enough iterations, SGD works, but it is very noisy. The term "stochastic" indicates that the one example comprising each batch is chosen at random.
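The following is a minimal, illustrative SGD loop for a linear model y' = wx + b trained on squared loss; the data points and the learning rate are made up for the example.

```python
import random

# Hypothetical data roughly following y = 2x + 1.
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]

w, b = 0.0, 0.0
learning_rate = 0.01

for step in range(2000):
    x, y = random.choice(data)  # batch size of 1: one random example
    error = (w * x + b) - y
    # Step in the direction of the negative gradient of (y' - y)^2.
    w -= learning_rate * 2 * error * x
    b -= learning_rate * 2 * error

print(round(w, 2), round(b, 2))  # should approach w ≈ 2, b ≈ 1
```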
8. Epoch: One epoch is when the entire dataset is passed forward and backward through the neural network exactly once.
➢ Why do we use more than one epoch?
It may not make sense at first that passing the entire dataset through a neural network once is not enough, and that we need to pass the full dataset through the same network multiple times. But we are using a limited dataset, and to optimize the learning we are using gradient descent, which is an iterative process. So, updating the weights with a single pass, i.e. one epoch, is not enough.
One epoch leads to underfitting of the curve in the graph (below).
Fig 5: As the number of epochs increases, the weights of the neural network are updated more times, and the fitted curve goes from underfitting to optimal to overfitting.
Project Dependencies
1. Dataset:
For the project, the MovieLens dataset is used. MovieLens is run by GroupLens, a research lab at the University of Minnesota. The MovieLens 100K dataset is a collection of plain-text files containing 100,000 ratings from 943 users on 1,682 items. Each user has rated at least 20 movies.
DETAILED DESCRIPTIONS OF DATA FILES:
S.No  File            Description
1.    u.data          The full u data set: 100,000 ratings by 943 users on 1,682 items. Each user has rated at least 20 movies. Users and items are numbered consecutively from 1, and the data is randomly ordered. This is a tab-separated list of: user id | item id | rating | timestamp. The timestamps are Unix seconds since 1/1/1970 UTC.
2.    u.info          The number of users, items, and ratings in the u data set.
3.    u.item          Information about the items (movies); a tab-separated list of: movie id | movie title | release date | video release date | IMDb URL | unknown | Action | Adventure | Animation | Children's | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western. The last 19 fields are the genres; a 1 indicates the movie is of that genre, a 0 indicates it is not. Movies can be in several genres at once. The movie ids are the ones used in the u.data data set.
4.    u.genre         A list of the genres.
5.    u.user          Demographic information about the users; a tab-separated list of: user id | age | gender | occupation | zip code. The user ids are the ones used in the u.data data set.
6.    u.occupation    A list of the occupations.
7.    u1.base-u5.base The data sets u1.base and u1.test through u5.base and u5.test are
      u1.test-u5.test 80%/20% splits of the u data into training and test data. Each of u1, ..., u5 has a disjoint test set; this is for 5-fold cross-validation (where you repeat your experiment with each training and test set and average the results). These data sets can be generated from u.data by mku.sh.
Table 1: Brief description of the MovieLens dataset
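For illustration, the two most important files can be read with pandas; this sketch assumes u.data and u.item have already been downloaded into the working directory.

```python
import pandas as pd

# u.data: tab-separated "user id | item id | rating | timestamp".
ratings = pd.read_csv("u.data", sep="\t",
                      names=["user_id", "item_id", "rating", "timestamp"])

# u.item: pipe-separated movie metadata; the first two fields are id and title.
movies = pd.read_csv("u.item", sep="|", header=None, encoding="latin-1",
                     usecols=[0, 1], names=["item_id", "title"])

print(len(ratings))                  # expect 100000
print(ratings["user_id"].nunique())  # expect 943
print(ratings["item_id"].nunique())  # expect 1682
```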
2. Libraries Used:
a) LightFM: LightFM is a Python implementation of a number of popular recommendation algorithms for both implicit and explicit feedback, including efficient implementations of the BPR and WARP ranking losses. It is easy to use, fast (via multithreaded model estimation), and produces high-quality results. In this project, this library is used to fetch the MovieLens dataset at runtime and to create and train the model using the WARP ranking loss. The implementation uses stochastic gradient descent for training.
b) A library for opening web pages: in this project, it is used to open the HTML page containing the watched and recommended movies for a user, and to play a movie's trailer.
c) A library for file-system paths: in this project, it is used to extract the path of the file name which is used to load the HTML page.
3. Loss Function Used:
When training recommenders, we often do not care about the absolute scores of the items being recommended as much as about their rank relative to one another. However, few loss functions actually optimize for this. LightFM provides the following losses:
1) Logistic: Useful when both positive (1) and negative (-1) interactions are present.
2) BPR: Bayesian Personalised Ranking pairwise loss. It maximizes the prediction difference between a positive example and a randomly chosen negative example. It is useful when only positive interactions are present and optimizing ROC AUC is desired.
3) WARP: Weighted Approximate-Rank Pairwise loss. It maximizes the rank of positive examples by repeatedly sampling negative examples until a rank-violating one is found. It is useful when only positive interactions are present and optimizing the top of the recommendation list (precision@k) is desired.
4) k-OS WARP: k-th order statistic loss. A modification of WARP that uses the k-th positive
example for any given user as a basis for pairwise updates.
For this project, the WARP loss function is used to train the model. WARP is an implicit feedback model: all interactions in the training matrix are treated as positive signals, and products that users did not interact with are treated as products they implicitly do not like. The goal of the model is to score these implicit positives highly while assigning low scores to implicit negatives.
WARP loss was first introduced in 2011, not for recommender systems but for image annotation: it was used to assign to an image the correct label from a very large sample of possible labels. Originally, the motivation for developing this loss, which in particular has a novel sampling technique, was memory efficiency. However, the sampling technique also has additional benefits which make it well suited to training a recommender system.
At a high level, WARP loss randomly samples output labels of a model until it finds a pair which it knows is wrongly ranked, and then applies an update only to these two incorrectly ranked examples.
Consider the following example: a recommender system that recommends one of five candy bars. Suppose a customer's journey is fed through the recommender, and it has generated an output vector which assigns to each candy bar a probability that this customer will purchase it. To train the recommender, there is a target vector, which describes the customer's actual behavior using a 1 if the customer purchased a specific candy bar, and a 0 if they did not:
Highlighted in red is the candy bar the customer actually bought (note that for simplicity, we are
only considering a single purchase, but this loss extends to the case where the customer has made
multiple purchases). This is known as the correct label; let’s label it x³+ for clarity (where the +
highlights that this was the purchased item, and the superscript indicates where the element is in
the vector).
Now we are going to randomly sample the other labels until we find one to which the model assigned a higher probability of purchase (or until we run out of labels to sample). Such a randomly sampled label is known to be wrongly ranked, because the Milky Way bar should have the highest probability, since it is the one the customer actually bought!
For instance, if the first random sample we look at is the Mars bar:
Now we have two variables: the correct label, x³+, and the sampled label, which we take as a sampled negative label, x⁵- (negative because the customer didn't buy it).
In this case, our model was correct: 0.59 > 0.17 (or x³+ > x⁵-), so it correctly ranked the Milky Way higher than the Mars bar. When this happens, we sample another label, and we keep doing this until we find a case where the model was wrong.
Say the second random sample we take is of the Kit Kat (which becomes the sampled negative
label, x²-):
In this case, 0.59 < 0.63 (or x³+ < x²-). Our model was wrong here since it thought the customer
would be more likely to buy the Kit Kat. To tell our model to correct this, x³+ and x²- are the two
examples we will use for the WARP loss, where the loss is the difference between the two values.
In addition to this pair, we would like an idea of how well the model did in general: was the Milky Way bar ranked near the top of all the candy bars, or did the model do poorly and stick it near the bottom?
To avoid having to look at all the examples (remember: efficiency!), we can keep track of this while we do the random sampling. If it takes lots of random samples to find an example where the model was wrong, then we can assume it did pretty well. On the other hand, if the first random sample we looked at had a higher score than the correct label, then we can assume it did pretty poorly. This intuition is captured by weighting the loss by approximately

$\ln\left(\frac{X - 1}{N}\right)$

where $X$ is the total number of labels (5, in this case) and $N$ is the number of samples needed to find an example where the model was wrong (2, in this case: the Mars bar and the Kit Kat). This makes sense: the more samples we have to take (the larger $N$ gets), the more correct our model is, so we want the loss to be small. We take the natural logarithm of this ratio to prevent the loss from exploding when $N$ gets small (since $X$ is generally large).
It's interesting to note that the loss only depends on the two examples which we have sampled (and so only the weights for those two examples will be updated). Nothing is done about the fact that the Twix bar was also ranked higher than the Milky Way, or the fact that Snickers got a 0.35 chance of being bought even though the customer didn't buy it (so in the best model, it should have a 0). The model will only learn that the Milky Way bar should be ranked above the Kit Kat.
For a recommender, this is much more desirable than a model which learns that it should output 1s for all positive examples and 0s for all negative examples, because for recommenders a 0 often does not mean a negative interaction. Just because the customer didn't buy a Twix, it doesn't mean they didn't want to buy it; many other factors could have contributed to their not purchasing it, most notably (considering the case where there are not 5 but 500 products to recommend) that they just didn't see it.
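The sampling procedure described above can be condensed into a short sketch. This is a simplified illustration over the candy-bar scores, not LightFM's actual internals, and the weighting follows the ln((X-1)/N) form discussed above.

```python
import math
import random

# Hypothetical model scores for five candy bars; index 2 is the purchased one.
scores = [0.25, 0.63, 0.59, 0.35, 0.17]
positive = 2
X = len(scores)

negatives = [i for i in range(X) if i != positive]
random.shuffle(negatives)

loss = 0.0
for n, neg in enumerate(negatives, start=1):
    if scores[neg] > scores[positive]:  # found a rank-violating sample
        # Fewer samples needed (small n) implies a badly ranked positive,
        # so the weight ln((X - 1) / n) is larger.
        loss = math.log((X - 1) / n) * (scores[neg] - scores[positive])
        break  # only this pair receives an update

print(loss)
```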
4. UI:
For a better user experience and understanding, the known movie choices for a particular user and the recommended movies are put on an HTML page showing the title and poster of each movie; if the user clicks on the poster of any movie, its trailer is played at the center of the screen.
Methodology
i. LightFM includes functions for fetching and processing the dataset. There is a function (fetch_movielens) which downloads the dataset and automatically pre-processes it into sparse matrices suitable for further calculation. In particular, it prepares the sparse user-item matrices, containing positive entries where a user interacted with a product, and zeros otherwise. (See the end-to-end sketch after this list.)
ii. We have two such matrices, a training and a testing set. Both have around 1,000 users and 1,700 items. We'll train the model on the training matrix but test it on the test matrix.
iii. To run this recommendation system, a user id is required first, just as when a particular user logs in to his/her account: only then is his/her past history known to the system, and based on that past history and the choices of other, similar users, movies are recommended. So, for now, the user id is given to the recommender system at runtime.
iv. Then the LightFM model is created. It is a hybrid latent representation recommender model. The user and item representations are expressed in terms of representations of their features: an embedding is estimated for every feature, and these feature embeddings are then summed together to arrive at representations for users and items. For example, if the movie 'Wizard of Oz' is described by the features 'musical fantasy', 'Judy Garland', and 'Wizard of Oz', then its embedding will be given by taking these features' embeddings and adding them together. The same applies to user features.
v. Then we use the WARP (Weighted Approximate-Rank Pairwise) loss function to train our model. WARP is an implicit feedback model: all interactions in the training matrix are treated as positive signals, and products that users did not interact with are treated as products they implicitly do not like. The goal of the model is to score these implicit positives highly while assigning low scores to implicit negatives.
Model training is accomplished via SGD (stochastic gradient descent). This means that with every pass through the data, an epoch, the model learns to fit the data more and more closely. We'll run it for 10 epochs in this example. We can also run it on multiple cores, so we'll set that to 2. (The dataset in this example is too small for that to make a difference, but it will matter on bigger datasets.)
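Putting steps i-v together, a minimal end-to-end sketch using the LightFM API described above:

```python
from lightfm import LightFM
from lightfm.datasets import fetch_movielens

# Step i: fetch MovieLens 100K and pre-process it into sparse matrices.
data = fetch_movielens()
train, test = data["train"], data["test"]  # step ii: train/test matrices

# Step iv: create the hybrid latent representation model.
model = LightFM(loss="warp")  # step v: WARP ranking loss

# SGD training: 10 epochs on 2 threads.
model.fit(train, epochs=10, num_threads=2)
```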
Result
After training, the model predicts the recommended movies for the user id given as input. The figure below shows the known choices and recommended movies for user id 5.
Fig 6: Showing known choices and recommended movies for a user with id 5
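A sketch of how such a listing can be produced for user id 5, modeled on the LightFM quickstart; the variable names are mine, and the model is trained as in the methodology sketch.

```python
import numpy as np
from lightfm import LightFM
from lightfm.datasets import fetch_movielens

data = fetch_movielens()
model = LightFM(loss="warp")
model.fit(data["train"], epochs=10, num_threads=2)

user_id = 5
n_items = data["train"].shape[1]

# Known positives: movies this user has already interacted with.
known = data["item_labels"][data["train"].tocsr()[user_id].indices]

# Score every item for this user and sort in descending order.
scores = model.predict(user_id, np.arange(n_items))
recommended = data["item_labels"][np.argsort(-scores)]

print("Known positives:", known[:3])
print("Recommended:", recommended[:3])
```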
For a better user interface, the above information is put on an HTML page which contains the title and poster of each movie, with the functionality that if the user clicks on the poster of a movie, its trailer runs at the center of the screen.
Fig 7: HTML page containing the titles and posters of the watched and recommended movies.
Fig 8: Selection of a movie when the mouse hovers over the poster of the movie 'Toy Story'.
Fig 9: The trailer of the movie 'Toy Story' playing on the screen after the user clicks on its poster.
Application of Recommendation System
The following are the applications of recommendation systems:
➢ Recommender systems have become increasingly popular in recent years, and are utilized in a variety of areas including movies, music, news, books, research articles, search queries, social tags, and products in general.
➢ Mostly used in the digital domain, the majority of today's e-commerce sites like eBay, Amazon, Alibaba, etc. make use of their proprietary recommendation algorithms in order to better serve customers with the products they are bound to like.
1. Amazon:
Fig 10: Amazon's recommendation system providing recommendations of products
2. YouTube:
3. Netflix:
Fig 12: Netflix's recommendation system giving movie recommendations for a user
4. Gaana Music App
Fig 13: Gaana music app’s recommendation system recommends songs (Made for you).
Advantages of using a recommendation system
Below are some of the potential benefits of recommendation systems in business, and the companies that use them:
1. "Improving with use" (retention): One of the core potential benefits of recommendation systems is their ability to continuously calibrate to the preferences of the user. This makes products more and more "sticky" in their customer retention as time goes on:
❖ You're much less likely to switch to a Netflix competitor when Netflix has such a wonderful sense of which movies and shows you might want to watch next (i.e. they "know you so well"). Because most of Netflix's revenues come from a fixed-rate recurring subscription billing model, the company's biggest ROI "win" with recommendation systems is retention.
2. Improving cart value: A company with an inventory of thousands and thousands of items would be hard pressed to hard-code product suggestions for all of its products, and it's obvious that such static suggestions would quickly become out-of-date or irrelevant for many customers. By using various means of "filtering", e-commerce giants can find opportune times to suggest (on their site, via email, or through other means) new products that you're likely to buy.
❖ Amazon’s quick delivery and emphasis on customer service have earned them millions of
customers. Recommendation engines play a role not only in helping customers find more
of what they need (and see Amazon as an authority), but these systems also improve cart
value. If Amazon doesn’t have to pay much more for shipping to send you two or three
times as many products, their profit margins improve.
3. Improved engagement and delight: Sometimes seeing an ROI doesn’t involve explicitly
asking for payment. Many companies use these systems to simply encourage engagement
and activity on their product or platform.
❖ YouTube has subscription options, but the majority of the firm’s revenues are driven
through advertisements placed across its wide array of video properties. The company
makes more money when users come back time and time again. YouTube doesn’t
optimize for short-term view length, as this might encourage pushy or flashy tactics that
wouldn’t genuinely delight users. Instead, the service aims to encourage long-term use,
because advertising views are the ROI that these systems serve at YouTube. Facebook is
another obvious example of a similar application of recommendation engines.
However, recommendation systems:
1. are likely only to be a fit for companies with enough data and in-house AI talent to use them well, and
2. are not guaranteed to be a higher-yield approach than the alternatives, so many businesses and business models may be better off not using them.
That being said, there are some sectors (most notably digital media and e-commerce) where such systems seem to be borderline inevitable.
1. According to a paper written by Netflix executives Carlos A. Gomez-Uribe and Neil Hunt, the video streaming service's AI recommendation system saves the company around $1 billion each year. This allows them to invest more money in new content which viewers will continue to view, giving them a good ROI. According to McKinsey, 75 percent of what users watch on Netflix comes from product recommendations.
2. According to YouTube, after the recommendation system had been running for more than a year, it was successful in terms of their stated goals, with recommendations accounting for around 60 percent of video clicks from the homepage.
Recommendation systems can significantly boost revenues, CTRs, conversions, and other
important metrics.
Moreover, they can have positive effects on the user experience as well, which translates into
metrics that are harder to measure but are nonetheless of much importance to online businesses,
such as customer satisfaction and retention.
Conclusion
1. Recommendation engines are basically data filtering tools that make use of algorithms and data to recommend the most relevant items to a particular user.
2. The recommendation system made in this project is able to recommend movies for a particular user, provided its user id is given. Our program fetches the MovieLens dataset, then creates and trains a model using the WARP loss function. It uses a hybrid approach, i.e. a combination of the content-based and collaborative approaches, in order to recommend movies for a user appropriately.
For the evaluation of our results, we can use two metrics of accuracy: precision@k and ROC AUC. Both are ranking metrics: to compute them, we construct recommendation lists for all of our users and check the ranking of known positive movies. For precision at k, we look at whether the known positives are within the first k results on the list; for AUC, we calculate the probability that any known positive is ranked higher on the list than a random negative example.
For instance, the metric values can be computed for the user with id 5, and we can compare the performance of the WARP model with other models using these metric values; a minimal evaluation sketch follows.
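LightFM provides helpers for both metrics; here is a minimal sketch, reusing the trained model from the methodology section:

```python
from lightfm import LightFM
from lightfm.datasets import fetch_movielens
from lightfm.evaluation import precision_at_k, auc_score

data = fetch_movielens()
model = LightFM(loss="warp")
model.fit(data["train"], epochs=10, num_threads=2)

# Both helpers return one score per user; average them across users.
print("precision@10: %.3f" % precision_at_k(model, data["test"], k=10).mean())
print("AUC: %.3f" % auc_score(model, data["test"]).mean())
```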
3. The need for a recommendation system: with the growing amount of information on the internet and a significant rise in the number of users, it is becoming important for companies to search, map, and provide users with the relevant chunk of information according to their preferences and tastes.
4. Applications of recommendation systems: nowadays, almost all web-service-based businesses use a recommendation system. Examples of popular recommendation systems are those of Netflix, Amazon, YouTube, the Gaana music app, Flipkart, eBay, etc.
Future Scope
The future scope of this project, the recommendation system, is very wide. There are many additional features which are planned to be incorporated during future enhancements of this project. Although all the main objectives have been achieved, there is still room for enhancement:
• This system can be easily upgraded in the future, and many more features can be added to the existing system.
• The recommendation system can be generalized or changed so that it can give recommendations for other things as well, such as music, books, or videos, provided an appropriate dataset is available to create and train the model.
• The Django framework can be used to provide a more realistic user experience, including logging in to the website; the user id derived from the login id would then be processed on the server, which would provide recommendations accordingly.
References
1. https://developers.google.com/machine-learning/crash-course
2. https://medium.com/@gabrieltseng/intro-to-warp-loss-automatic-differentiation-and-pytorch-b6aa5083187a
3. https://movielens.org/
4. https://lyst.github.io/lightfm/docs/quickstart.html
5. https://dataconomy.com/2015/03/an-introduction-to-recommendation-engines/
6. https://www.youtube.com/Siraj-raval
7. https://towardsdatascience.com/what-are-product-recommendation-engines-and-the-various-versions-of-them-9dcab4ee26d5
8. https://www.datasciencecentral.com/profiles/blogs/5-types-of-recommenders
9. https://en.wikipedia.org/wiki/Recommender_system