Final Project Report

The document discusses building a recommendation system project. It covers collaborative filtering and sentiment analysis techniques. It describes the hardware, software, data flow, and modules required to design the recommendation system.


Mini Project Report (KCS-354)

on
Recommendation System
Submitted in partial fulfillment for award of
BACHELOR OF TECHNOLOGY
Degree
In
COMPUTER SCIENCE & ENGINEERING

2022-23
Submitted By:
Priya Kumari Singh (2100330100174)

Under the Guidance of:
Miss Vernika Singh
Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


RAJ KUMAR GOEL INSTITUTE OF TECHNOLOGY
DELHI-MEERUT ROAD, GHAZIABAD

Affiliated to Dr. A.P.J. Abdul Kalam Technical University, Lucknow


SYNOPSIS

In today's technical era, every startup or company tries to establish better communication
between its products and its users. For that purpose, it needs a mechanism that can promote
its products effectively, and the recommender system serves this purpose. It is essentially a
filtering system that tries to predict and show the items that a user would like to purchase.
By analyzing users' preferences, companies can decide which products to launch in the market
to gain more benefit. These systems have proved very beneficial in a variety of domains,
including music, books, movies, research articles, and general products. In this paper, we
review various mechanisms and techniques required for recommender systems that recommend
products or items in the domain of fashion and books.
One of the main areas where this concept is currently used is e-commerce, which interacts
directly with customers by suggesting products of interest with the aim of improving sales.
Motivated by this observation, a novel Domain-sensitive Recommendation (DsRec) algorithm
is proposed to make rating predictions by exploring user-item subgroup analysis
simultaneously, in which a user-item subgroup is deemed a domain consisting of a subset of
items with similar attributes and a subset of users who have interests in these items.
Collaborative Filtering (CF) is an effective and widely adopted recommendation approach.
Unlike content-based recommender systems, which rely on the profiles of users and items for
predictions, CF approaches make predictions using only user-item interaction information,
such as transaction history or item satisfaction expressed in ratings.

TABLE OF CONTENTS

Synopsis
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.2 Objective
1.3 Problem Definition
2 Hardware and Software Requirements
2.1 Hardware Requirements
2.2 Software Requirements
3 Data Flow Diagram
4 Project Modules Design
4.1 Collaborative Filtering
4.1.1 Divide the dataset into test and train
4.1.2 Apply adjusted cosine similarity
4.1.3 Find top n products
4.1.4 Evaluate using RMSE method
4.2 Sentiment Analysis
4.2.1 Divide the dataset into test and train
4.2.2 Apply oversampling technique
4.2.3 Apply TF-IDF vectorization
4.2.4 Evaluate using F-1 score
4.2.5 Evaluate sensitivity and specificity
4.3 Connecting the Dots
5 Project Snapshot
6 Limitations
7 Future Work
Conclusion
References
LIST OF FIGURES

1. Fig. 3.1 System architecture
2. Fig. 4.1 Collaborative filtering
3. Fig. 4.2 Sentiment analysis
4. Fig. 5.1 Recommended products for Joshua
LIST OF TABLES

1. Table 1 Rated products of user 1
2. Table 2 Rated products of user 2
3. Table 3 The collaborative filtering process of user 1
4. Table 4 The collaborative filtering process of user 2
CHAPTER 1

INTRODUCTION
1.1 Motivation
Nowadays, a recommender system can be found on almost every information-intensive
website. For example, a list of likely preferred products is recommended to a customer who is
browsing a target product on Amazon. Similarly, when a user watches a video clip on YouTube,
the recommender system suggests relevant videos by learning from the user's previous
behavior. Recommender systems have deeply changed the way we obtain information. They not
only make it easier and more convenient for people to receive information, but also provide
great potential for economic growth. As more and more people realize the importance and
power of recommender systems, designing high-quality recommender systems has remained an
active topic in the community over the past decade. Thanks to continuous efforts in the
field, many recommender systems have been developed and used in a variety of domains. A key
question therefore arises: how can the performance of recommender systems be measured, so
that the most suitable ones can be chosen for a given context or domain? The answer lies in
evaluating recommender systems through rigorous and scientific evaluation experiments.

Evaluating recommender systems has become increasingly important with the growing
popularity of recommender systems in applications. An application designer often needs to
choose between a set of candidate recommendation algorithms. This can be done by comparing
the performance of these algorithms in evaluation experiments. In addition, evaluating
recommender systems helps researchers to select, tune, and design recommender systems as a
whole, because some key factors that influence a system's quality are often noticed only in
the process of evaluation. For example, Herlocker et al. highlight that evaluators should
take into account not only accuracy metrics but also additional quality metrics, the
so-called beyond-accuracy metrics. These reflect the fact that users are often not interested
in items they already know and surely like, but rather in discovering new items and exploring
diverse items.

1.2 Objective
The objective of recommender systems is to provide recommendations based on recorded
information on the users' preferences. These systems use information filtering techniques to
process information and provide the user with potentially more relevant items.

1.3 Problem Definition


So, why do we need a recommendation system? The answer is simple: recommender systems give
users personalized recommendations, help them make correct decisions in their online
transactions, increase sales, redefine the user's web browsing experience, retain customers,
and enhance their shopping experience.

CHAPTER 2

HARDWARE AND SOFTWARE REQUIREMENTS

2.1 HARDWARE REQUIREMENTS


- RAM: 1 GB

- Storage: 2 GB

- Internet Connection

2.2 SOFTWARE REQUIREMENTS


- Python, Google

- Operating System (e.g., Windows 10 or 11)

CHAPTER 3

DATA FLOW DIAGRAM

The system architecture consists of three main steps, or technologies, that together make up
the recommendation system:

COLLABORATIVE FILTERING

SENTIMENT ANALYSIS

CONNECTING THE DOTS

Figure 3.1: System Architecture

CHAPTER 4

PROJECT MODULES DESIGN

4.1 COLLABORATIVE FILTERING


The first technique we use is the user-user approach, or user-based collaborative filtering,
which consists of six small steps to find the similarity between users.
The idea is that if two different people like the same products, there is a good chance that
they have similar tastes. The recommendation system therefore suggests the first user's
choices to the second user and the second user's choices to the first. For example, if person
one likes the products {S1, S2, S3, S4, S5} and person two likes {S1, S2, S3, S6, S8}, then
products {S4, S5} will be suggested to the second user and products {S6, S8} will be
suggested to the first user.
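
The exchange of suggestions in this example can be sketched with plain set operations. This is only an illustration of the idea, not the project's implementation; the function name `suggest` is a hypothetical helper.

```python
# Liked-product sets taken from the example in the text.
person_one = {"S1", "S2", "S3", "S4", "S5"}
person_two = {"S1", "S2", "S3", "S6", "S8"}


def suggest(source_likes, target_likes):
    """Return products liked by the source user that the target user has not liked yet."""
    return sorted(source_likes - target_likes)


shared = sorted(person_one & person_two)          # products both users like
for_person_two = suggest(person_one, person_two)  # suggested to the second user
for_person_one = suggest(person_two, person_one)  # suggested to the first user

print("shared:", shared)
print("suggest to person two:", for_person_two)
print("suggest to person one:", for_person_one)
```

The shared products are what make the two users look similar; the set differences are what each user has not seen yet.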

4.1.1 DIVIDE THE DATASET INTO TEST AND TRAIN


The first step of the recommendation system is to import all the libraries and the dataset.
The dataset consists of the user ID, the product name, and the rating. The whole dataset is
divided into two parts: the training dataset and the test dataset. The training dataset is
used for training the system, and the test dataset is used for testing, that is, for checking
the whole application.
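
A minimal sketch of such a split, using only the Python standard library, might look as follows. The 30% test fraction, the fixed seed, and the sample rows are assumptions made for illustration; the report does not state how the split was implemented for this module.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (user_id, product, rating) rows standing in for the real dataset.
ratings = [(f"user{i % 5}", f"product{i}", 1 + i % 5) for i in range(20)]

train_rows, test_rows = train_test_split(ratings)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

Shuffling before splitting keeps the test rows from being just the tail of the file, which matters if the dataset is sorted by user or by date.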
Fig. 4.1: Collaborative filtering (flowchart: importing library and dataset → train dataset
and test dataset → apply adjusted cosine similarity → find top N products → evaluate using
RMSE method)

4.1.2 APPLY ADJUSTED COSINE SIMILARITY
Because different users rate products differently, we need a mechanism to find the
similarity between users' choices. First, the products rated by user 1 and user 2 are
collected, and then cosine similarity or adjusted cosine similarity is used to find the
users' similarity.

User 1:
PRODUCT: Red dvdvideo | Manila file backs | Cheetos cheese flavored
RATING:  4 | 5 | 5
Table 1: Rated products of user 1

User 2:
PRODUCT: Aussie volume shampoo | Colorex disinfecting bathroom cleaner | Blu-ray/DVD
RATING:  5 | 4 | 5
Table 2: Rated products of user 2

To apply adjusted cosine similarity, we first find the average rating of each user:
User 1 = (4 + 5 + 5) / 3 ≈ 4.67
User 2 = (5 + 4 + 5) / 3 ≈ 4.67
The next step subtracts each user's average rating from each of that user's product ratings.

User 1:
PRODUCT: Red dvdvideo | Manila file backs | Cheetos cheese flavored
RATING:  -0.67 | 0.33 | 0.33
Table 3: Collaborative filtering process of user 1

User 2:
PRODUCT: Aussie volume shampoo | Colorex disinfecting bathroom cleaner | Blu-ray/DVD
RATING:  0.33 | -0.67 | 0.33
Table 4: Collaborative filtering process of user 2
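
The mean-centering step and the similarity that follows from it can be sketched as below. This is a minimal sketch under stated assumptions: similarity is computed as the cosine of the mean-centered ratings over the products both users have rated, users with no product in common get 0.0, and `user_3` is a hypothetical third user added only so that there is an overlap to compare.

```python
from math import sqrt


def mean_center(ratings):
    """Subtract the user's average rating from each of the user's ratings."""
    mean = sum(ratings.values()) / len(ratings)
    return {item: rating - mean for item, rating in ratings.items()}


def centered_cosine(user_a, user_b):
    """Cosine similarity of two users' mean-centered ratings over co-rated products."""
    a, b = mean_center(user_a), mean_center(user_b)
    common = set(a) & set(b)
    if not common:
        return 0.0  # assumption: no shared products means no evidence of similarity
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


user_1 = {"Red dvdvideo": 4, "Manila file backs": 5, "Cheetos cheese flavored": 5}
user_2 = {"Aussie volume shampoo": 5, "Colorex disinfecting bathroom cleaner": 4, "Blu-ray/DVD": 5}
# Hypothetical user who rates lower overall but with the same pattern as user 1.
user_3 = {"Red dvdvideo": 2, "Manila file backs": 4, "Cheetos cheese flavored": 4}

print(centered_cosine(user_1, user_2))  # no products in common
print(centered_cosine(user_1, user_3))  # same preference pattern on a lower scale
```

Subtracting each user's own average is what lets a strict rater and a generous rater still count as similar when they prefer the same products.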

4.1.3 FIND TOP N PRODUCTS


This step finds the top products in the dataset for a particular consumer. We find the top n
products as rated by the user-user approach, and we always check how accurate our method is.
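
As a sketch of this selection step, assume that the user-user approach has already produced a predicted rating for each candidate product. The product names, the ratings, and the function `top_n` below are hypothetical.

```python
import heapq


def top_n(predicted_ratings, already_rated, n=5):
    """Return the n products with the highest predicted rating that the user has not rated."""
    candidates = {
        product: rating
        for product, rating in predicted_ratings.items()
        if product not in already_rated
    }
    return heapq.nlargest(n, candidates.items(), key=lambda pair: pair[1])


# Hypothetical predicted ratings for one user.
predicted = {"P1": 4.8, "P2": 3.1, "P3": 4.5, "P4": 2.0, "P5": 4.9}

recommended = top_n(predicted, already_rated={"P5"}, n=2)
print(recommended)
```

Products the user has already rated are filtered out first, so the list contains only new suggestions.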
4.1.4 EVALUATE USING RMSE METHOD
Root mean square error (RMSE), or root mean square deviation, is one of the most commonly
used measures for evaluating the quality of predictions. It shows how far predictions fall
from the measured true values. To compute RMSE, calculate the residual (the difference
between prediction and truth) for each data point, square each residual, compute the mean of
the squared residuals, and take the square root of that mean. RMSE is commonly used in
supervised learning applications, as it needs a true measurement at each predicted data
point.

Root mean square error can be expressed as

\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2} \]

where N is the number of data points, \hat{y}_i is the predicted rating, and y_i is the true
rating of data point i.
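
A short sketch of the RMSE computation on a handful of hypothetical predicted and true ratings:

```python
from math import sqrt


def rmse(predicted, actual):
    """Square root of the mean of the squared residuals."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    squared_residuals = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_residuals) / len(squared_residuals))


# Hypothetical predicted and true ratings for three test rows.
error = rmse([4.0, 3.0, 5.0], [5.0, 3.0, 3.0])
print(round(error, 3))
```

A lower value means the predicted ratings are closer to the ratings held out in the test dataset.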

4.2 SENTIMENT ANALYSIS


The products that the recommendation system suggests are decided after analyzing users'
ratings and reviews. Products rated between 4.5 and 5 are considered excellent. Ratings can
be analyzed directly as numbers, but to analyze users' reviews of a particular product we use
sentiment analysis. This paper uses logistic regression to build the sentiment analysis
model. Logistic regression is a supervised learning algorithm that, despite its name, is used
for classification: it predicts the probability of a class (here, the sentiment of a review)
from the given independent variables.
Sentiment analysis means understanding the emotion behind a statement, that is, whether the
statement is positive, negative, or neutral. Positive reviews can be like: I like the
product, the quality of the product is very good, very amazing product, etc. Negative
reviews can be like: I don't like the product, it's not worth it, it could be better, I
expected a much better product from you, etc. Sometimes sentiment analysis goes beyond
polarity to cover emotions (anger, sadness, urgency) and intentions, such as whether the
customer is interested or not interested.
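
To make the role of logistic regression concrete, here is a minimal pure-Python sketch trained by gradient descent. It is not the project's model: the two features (counts of positive and negative words), the toy data, the learning rate, and the number of epochs are all assumptions chosen for illustration.

```python
from math import exp


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def predict_proba(x, weights, bias):
    """Probability that the review described by feature vector x is positive."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(features, labels, learning_rate=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            error = predict_proba(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(features) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(features)
    return weights, bias


# Hypothetical features per review: [positive-word count, negative-word count].
X = [[3, 0], [2, 0], [2, 1], [0, 2], [0, 3], [1, 2]]
y = [1, 1, 1, 0, 0, 0]  # 1 = positive review, 0 = negative review

weights, bias = train_logistic(X, y)
p_pos = predict_proba([2, 0], weights, bias)
p_neg = predict_proba([0, 2], weights, bias)
print(round(p_pos, 3), round(p_neg, 3))
```

The model outputs a probability, and a threshold (commonly 0.5) turns it into a positive or negative label.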

4.2.1 DIVIDE THE DATASET INTO TEST AND TRAIN


As in collaborative filtering, the first step is to import all the libraries and the dataset.
The dataset consists of the user ID, the product name, and the rating. The whole dataset is
divided into two parts: the training dataset and the test dataset. The training dataset is
used for training the system, and the test dataset is used for testing, that is, for checking
the whole application.

4.2.2 APPLY OVERSAMPLING TECHNIQUE


The training dataset is 70% of the whole dataset and the test dataset is the remaining 30%.
However, our data is not balanced: 88% of it consists of positive reviews and only 12% of
negative reviews. It is therefore important to balance the dataset before using it. We use
the oversampling technique to balance the dataset.

Fig. 4.2: Sentiment analysis (flowchart: importing library and data → train dataset and test
dataset → apply oversampling technique → apply TF-IDF vectorization → evaluate using F-1
score → calculate sensitivity and specificity)

Oversampling refers to copying or duplicating samples of the minority class until its count
matches that of the majority class. It is a technique used in data mining and data analytics
to turn unequal data classes into a balanced dataset.
For example, if the minority class contains 600 values and the majority class contains
30,000 values, we duplicate the minority class 50 times to make it equal to the majority
class.
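
Random oversampling can be sketched with the standard library as follows. The review texts are placeholders, the 88/12 split mirrors the class balance reported above, and the function `oversample` is a hypothetical helper rather than the project's code.

```python
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Placeholder reviews: 88 positive and 12 negative, as in the reported class balance.
reviews = [f"good {i}" for i in range(88)] + [f"bad {i}" for i in range(12)]
sentiments = ["positive"] * 88 + ["negative"] * 12

balanced_reviews, balanced_sentiments = oversample(reviews, sentiments)
counts = Counter(balanced_sentiments)
print(counts)
```

Oversampling is normally applied to the training dataset only, so that duplicated reviews do not leak into the test dataset and inflate the scores.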

4.2.3 APPLY TF-IDF VECTORIZATION


Machines only understand numeric values, so we need to convert all the text into numbers.
For this purpose, we use the TF-IDF vectorizer method, where TF stands for term frequency and
IDF stands for inverse document frequency. TF-IDF assigns a number to each word, and that
number shows how significant or important the word is in the whole document.

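\[ \text{TF-IDF} = \text{TF} \times \text{IDF} \]

\[ \text{TF} = \frac{\text{number of times the term appears in a document}}{\text{total number of words in the document}} \]

\[ \text{IDF} = \log\left( \frac{\text{total number of documents}}{\text{number of documents in which the term appears}} \right) \]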
4.2.4 EVALUATE USING F-1 SCORE


The next step is to evaluate the model using the F-1 score. The F1 score is a machine
learning evaluation metric that combines the precision and recall scores of a model into a
single number, their harmonic mean.
The accuracy metric computes how many times a model made a correct prediction across the
entire dataset. This is a reliable metric only if the dataset is class-balanced; that is,
each class of the dataset has the same number of samples.
However, real-world datasets are often heavily class-imbalanced, which makes accuracy
misleading. For example, if a binary dataset has 90 samples in class-1 and 10 samples in
class-2, a model that only predicts "class-1", regardless of the sample, will still be 90%
accurate.
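
The 90/10 example can be checked directly. The sketch below uses the standard definitions of precision, recall, and F1 (their harmonic mean); returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision_recall_f1(actual, predicted, positive):
    """Precision, recall and F1 for the class treated as positive."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# The example from the text: 90 samples of class-1, 10 of class-2,
# and a model that always predicts class-1.
actual = ["class-1"] * 90 + ["class-2"] * 10
predicted = ["class-1"] * 100

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
f1_class_1 = precision_recall_f1(actual, predicted, "class-1")[2]
f1_class_2 = precision_recall_f1(actual, predicted, "class-2")[2]
print(accuracy, round(f1_class_1, 3), f1_class_2)
```

Accuracy looks good at 90%, but the F1 score for class-2 is zero because the model never finds a single class-2 sample.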

4.2.5 EVALUATE SENSITIVITY AND SPECIFICITY


We calculate the sensitivity and specificity of the model. Sensitivity measures how well the
model correctly predicts the positive values, while specificity measures how well it
correctly predicts the negative values. We then report the model's sensitivity and
specificity on both the training data and the test data.
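
Both quantities follow from the four confusion-matrix counts. In this sketch the labels are hypothetical, with 1 standing for a positive review and 0 for a negative one.

```python
def sensitivity_specificity(actual, predicted):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Hypothetical true labels and model predictions for eight reviews.
true_labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_labels = [1, 1, 1, 0, 0, 0, 1, 1]

sensitivity, specificity = sensitivity_specificity(true_labels, model_labels)
print(sensitivity, specificity)
```

Running the same function on the training predictions and on the test predictions gives the two pairs of values that this step reports.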

4.3 CONNECTING THE DOTS


The third and final step merges the outputs of the two previous modules: the recommendation
model and the sentiment analysis model. First, product ratings are generated for the
recommended products; second, the reviews and the trained data are loaded. After that, we
use a formula that gives the final product ranking:

Ranking = (w1 × predicted rating of the recommended product) + (w2 × normalized sentiment
score of the recommended product on a scale of 1-5)

Users give more priority to reviews than to ratings, because reviews express emotions and
opinions better than ratings do. We have therefore assigned w1 = 1 and w2 = 2, meaning that
reviews are treated as more important than ratings. Products with higher rankings are those
with better ratings and reviews, so the product ranking makes it much easier to decide which
products to recommend.
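
The ranking formula can be sketched as below with the weights given in the text (w1 = 1, w2 = 2). How the sentiment score is normalized to the 1-5 scale is not stated in the report; the linear mapping from the fraction of positive reviews in `scale_sentiment` is an assumption, and the candidate products are hypothetical.

```python
def scale_sentiment(positive_fraction):
    """Map a positive-review fraction in [0, 1] onto the 1-5 scale (assumed linear mapping)."""
    return 1.0 + 4.0 * positive_fraction


def ranking_score(predicted_rating, sentiment_score, w1=1.0, w2=2.0):
    """Ranking = w1 * predicted rating + w2 * sentiment score on the 1-5 scale."""
    return w1 * predicted_rating + w2 * sentiment_score


# Hypothetical candidates: (predicted rating, fraction of positive reviews).
candidates = {"Product A": (4.6, 0.50), "Product B": (4.2, 0.90)}

scores = {
    name: ranking_score(rating, scale_sentiment(fraction))
    for name, (rating, fraction) in candidates.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Product B has the lower predicted rating but ranks first, because the sentiment of its reviews counts twice as much as the rating.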

CHAPTER 5

PROJECT SNAPSHOT

Based on the proposed experimental scenarios, the results are given below. These are the
recommended products for the user Joshua after applying sentiment analysis:

Fig. 5.1: Recommended products for Joshua

CHAPTER 6

LIMITATIONS

• The cold-start problem: Collaborative filtering systems rely on available data about the
actions of similar users. If you are building a brand new recommendation system, you have
no user data to start with. You can use content-based filtering first and then move on to
the collaborative filtering approach.
• Scalability: As the number of users grows, the algorithms suffer from scalability issues.
If you have 10 million customers and 100,000 movies, you would have to create a sparse
matrix with one trillion elements.
• The lack of the right data: Input data may not always be available.
• Lack of data analytics capability: Deep learning-based recommendation engines can
demand high computational complexity. If the data that is fed to the model is less
accurate or valuable, the result will be less useful. So, before investing in
recommendation engines, make sure your business is up to the complex data analytics
demands required.
• Inability to capture changes in user behavior: Consumers do not stand still; they are
constantly changing and evolving, both as people and as customers. Staying on top of
these changes is a constant battle. A strong recommendation engine will be able to
identify changes (or signs of impending changes) in customers' preferences and
behavior, and constantly retrain itself in real time in order to serve relevant
recommendations.

CHAPTER 7

FUTURE WORK
Expand to support more algorithms. The application currently supports only three
collaborative filtering algorithms: user-based, item-based, and biased matrix factorization.
As the application is readily extensible with new recommendation algorithms, it is expected
to include more algorithms, such as hybrid, content-based, and demographic-based algorithms.

In the future, the recommendation problem can also be addressed using deep learning, for
example with a Recurrent Neural Network (RNN). An RNN is a class of neural network that has
a memory unit and a feedback unit, which makes it better at finding patterns in the dataset.
The main component of the RNN is the memory element, which keeps a record of the previous
calculations.

CONCLUSION

We have successfully implemented a recommendation system using users' ratings and reviews.
The model is able to suggest related products that match the user's preferences, and it can
increase users' interest in purchasing products on e-commerce websites. Because it also uses
reviews, its product recommendations are expected to be more accurate.
A general study of the performance of recommender systems was conducted. Many different
recommendation algorithms have been proposed to meet the requirement of discovering
preferred items in a large information space, including content-based, collaborative
filtering, and hybrid algorithms. Among these, collaborative filtering (CF) algorithms are
considered to perform well and are the most commonly applied today. Hence, this project
concentrates on CF algorithms. There are three main types of CF algorithms (user-based,
item-based, and matrix factorization), each of which performs differently in different
domains. In the domain of recommender systems, a major challenge is how to evaluate
recommender systems comprehensively so as to find the algorithms that best suit a certain
domain. Inspired by this challenge, the project presents a recommendation system for offline
evaluation of the three CF algorithms. This report goes from a literature review to a
comparative performance analysis of the three algorithms.

REFERENCES

[1] Khanvilkar, G., & Vora, D. (2018). Sentiment Analysis for Product Recommendation
Using Random Forest. International Journal of Engineering & Technology.
https://doi.org/10.14419/ijet.v7i3.3.14492

[2] Yi Ren, Jingke Xu, Jie Huang and Cuirong Chi, "Research on Collaborative Filtering
Recommendation Algorithm for Personalized Recommendation System", 2019 9th
International Conference on Education and Social Science (ICESS 2019).

[3] Prateek Sappadla, Yash Sadhwani and Pranit Arora, "Movie Recommender System", Search
Engine Architecture, Spring 2017, NYU Courant.

[4] Liaoliang Jiang, Yuting Cheng, L. Y. J. L. H. Y. X. W. (2019). "A trust based
collaborative filtering algorithm for e commerce recommendation system." Journal of Ambient
Intelligence and Humanized Computing, 10, 3023–3034.

[5] Lavanya, R., Rithika Lahari, Palak Gupta. (2019). "An Optimal Enhancement of the
Dynamic Features of Recommender Systems", International Journal of Recent Technology and
Engineering, 8(2S4), 51-55.

[6] Gharbi Alshammari, Stelios Kapetanakis, Abdullah Alshammari and Nikolaos Polatidi,
"Improved Movie Recommendations Based on a Hybrid Feature Combination Method",
International Journal Of Engineering And Computer Science, July 2019.

[7] Yibo Wang, Mingming Wang, and Wei Xu, "A Sentiment-Enhanced Hybrid Recommender
System for Movie Recommendation: A Big Data Analytics Framework", School of
Information, Renmin University of China, Beijing 100872, China, March 2018.

[8] Vishwa, Bhavesh, Aman Gupta, Pranal Soni, "Movie Recommendation System",
International Research Journal of Engg and Technology, Volume: 05 Issue: 02, Feb-2018.

[9] Qianzi Shen, Zijian Wang, and Yaoru Sun, "Sentiment Analysis of Movie Reviews Based
on CNN-BLSTM", Springer International Publishing AG 2017.

[10] Yashar Deldjoo, Markus Schedl, Balázs Hidasi, Peter Knees, "Multimedia Recommender
Systems", RecSys '18 Proceedings of the 12th ACM Conference on Recommender Systems,
Pages 537-538.
