Final

This document describes a web-based book recommender system project submitted by four students. The project aims to understand and compare different recommendation filtering techniques including popularity model, content-based filtering, and collaborative filtering. It will combine ratings from these techniques to provide more accurate recommendations. The document includes sections on literature review, analysis of recommendation techniques, system design diagrams, coding of the system, and methodology for implementing the popularity model, content-based filtering, and collaborative filtering.

Uploaded by

vinay mocharla

WEB BASED BOOK RECOMMENDER SYSTEM

A Mini project report submitted in partial fulfillment of the requirements for the award of
the degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

Submitted by

V.SAI ALEKHYA(1210316358)
M.VINAY(1210316330)
T.RAHUL(1210316356)
N.SHOURI(1210316343)

Under the esteemed guidance of

Mrs P. Saraswathi
Assistant Professor, GITAM

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

GITAM

(Deemed to be University)

VISAKHAPATNAM

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

GITAM INSTITUTE OF TECHNOLOGY

GITAM

(Deemed to be University)

DECLARATION

We hereby declare that the project review entitled WEB BASED BOOK RECOMMENDER
SYSTEM is an original work done in the Department of Computer Science and Engineering, GITAM
Institute of Technology, GITAM (Deemed to be University), submitted in partial fulfillment of the
requirements for the award of the degree of B.Tech in Computer Science and Engineering. The
work has not been submitted to any other college or university for the award of any degree or
diploma.

Date: Signature of HOD

Registration no:(1210316358) Name:V.Sai Alekhya Signature:

Registration no:(1210316330) Name:M.Vinay Signature:

Registration no:(1210316356) Name:T.Rahul Signature:

Registration no:(1210316343) Name:N.Shouri Signature:

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

GITAM INSTITUTE OF TECHNOLOGY

GITAM

(Deemed to be University)

CERTIFICATE

This is to certify that the Mini-Project Report entitled WEB BASED BOOK RECOMMENDER
SYSTEM is a bonafide record of work carried out by V.SAI ALEKHYA(1210316358),
M.VINAY(1210316330), N.SHOURI(1210316343), T.RAHUL(1210316356), submitted in
partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in
Computer Science and Engineering.

PROJECT GUIDE PROJECT REVIEWER

(Mrs P.Saraswathi) Dr Praveen Kumar.S Dr T.Srinivasa Rao

(Asst.professor) (Asst.professor) (Assoc.professor)

ABSTRACT

Online recommendation systems have become a trend. Nowadays, rather than going out and buying
items in person, users rely on online recommendations, which provide an easier and quicker way to
buy items, and online transactions are also quick. Recommender systems are a powerful new
technology that helps users find the items they want to buy. On the Internet, where the number of
choices is overwhelming, there is a need to filter, prioritize and efficiently deliver relevant
information in order to alleviate the problem of information overload, which has created a potential
problem for many Internet users. Recommender systems solve this problem by searching through
large volumes of dynamically generated information to provide users with personalized content and
services. We explore the different characteristics and potentials of different prediction techniques
in recommendation systems in order to serve as a compass for research and practice in the field of
recommendation systems. In this project, we attempt to understand the different kinds of
recommendation systems, such as the popularity model, content-based filtering and collaborative
filtering, and compare their performance on the goodbooks dataset. We combine the ratings from the
popularity model, content-based filtering and collaborative filtering to get more accurate results.

ACKNOWLEDGEMENT

This Mini-Project was a great chance for learning and for professional development. We therefore
consider ourselves lucky to have been given the opportunity to be a part of it.

We express our deepest thanks to P.SARASWATHI (Asst. Professor) for serving as the guide for the
project, for helping us reach useful decisions, for giving the necessary advice and guidance, and
for arranging all facilities to make this work easier. We take this moment to acknowledge her
contribution gratefully.

We would also like to thank the project Review Faculty, Mr S.Praveen Kumar (Asst. Professor), for
reviewing the project every week, explaining our mistakes and setting our project on the correct
path, and finally for being a helping hand in the successful completion of the project.

We would like to express our deepest thanks to Dr. Konala Thammi Reddy sir, HOD of the
CSE Department, GIT, for giving us a great opportunity to complete the project successfully. We
take this moment to acknowledge his contribution gratefully.

We would also like to thank B.Rajesh sir (AMC), who motivated us and explained the importance
of the project in student life.

We perceive this opportunity as a big milestone in our career development. We will strive to use
the skills and knowledge gained in the best possible way, and we will continue to work on their
improvement in order to attain our desired career objectives. We hope to continue cooperating with
all of you in the future.

TABLE OF CONTENTS
ABSTRACT
CHAPTER 1
INTRODUCTION
1.1 MOTIVATION
1.2 PROBLEM STATEMENT
CHAPTER 2
LITERATURE SURVEY
CHAPTER 3
ANALYSIS
3.1 RECOMMENDATION FILTERING TECHNIQUES
3.1.1 CONTENT BASED FILTERING
3.1.2 COLLABORATIVE FILTERING TECHNIQUE
3.1.3 HYBRID FILTERING TECHNIQUE
3.2 SOFTWARE REQUIREMENTS
CHAPTER 4
DESIGN
4.1 UML DIAGRAMS
4.1.1 USE CASE
4.1.2 CLASS DIAGRAM
4.1.4 ACTIVITY DIAGRAM
CHAPTER 5
CODING
CHAPTER 6
METHODOLOGY
6.1 POPULARITY MODEL
6.2 CONTENT BASED FILTERING
6.3 COLLABORATIVE FILTERING
6.4 HYBRID RECOMMENDATION SYSTEM
FUTURE SCOPE
RELATED WORK
SCREENSHOTS
REFERENCES

LIST OF FIGURES

Figure 1: Block diagram of recommender system
Figure 2: Item ratings matrix in collaborative filtering
Figure 3: Block diagram of content based filtering
Figure 4: Block diagram of collaborative filtering
Figure 5: User based collaborative filtering
Figure 6: Item based collaborative filtering
Figure 7: A hybrid recommender system

CHAPTER 1

INTRODUCTION

A recommender system is a type of information filtering system that predicts the rating or
preference that a user would give to an item. Recommender systems are sometimes referred to as
recommendation systems.

The term describes web applications that predict a user's response to a set of options.
Recommender systems are targeted at individuals who do not have enough personal experience
to evaluate the potentially overwhelming number of alternatives that a web site, for instance, may
offer.

Recommender systems often provide personalized recommendations of items to users in the form
of a ranked list of predicted items. Individuals rely on recommendations provided by others in
making routine daily decisions. For example, it is common to rely on peer recommendations when
selecting a book to read, and employers use recommendation letters in their recruiting decisions.

Due to the advances in recommender systems, users constantly expect good recommendations.
They have a low threshold for services that are not able to make appropriate suggestions. If a music
streaming app is not able to predict and play music that the user likes, then the user will simply
stop using it. This has led to a high emphasis by tech companies on improving their
recommendation systems. However, the problem is more complex than it seems.

Every user has different preferences and likes. In addition, even the taste of a single user can vary
depending on a large number of factors, such as mood, season, or the type of activity the user is
doing. For example, the type of music one would like to hear while exercising differs greatly from
the type of music one would listen to when cooking dinner. Another issue that recommendation
systems have to solve is the exploration vs exploitation problem. They must explore new domains
to discover more about the user, while still making the most of what is already known about the
user. Two main approaches are widely used for recommender systems. One is content-based
filtering, where we try to profile the user's interests using the information collected and recommend
items based on that profile. The other is collaborative filtering, where we try to group similar users
together and use information about the group to make recommendations to the user.

1.1 MOTIVATION

The explosive growth in the amount of available digital information and in the number of visitors
to the Internet has created a potential challenge of information overload, which hinders timely
access to items of interest on the Internet. Information retrieval systems such as Google, Devil
Finder and Altavista have partially solved this problem, but prioritization and personalization
(where a system maps available content to the user's interests and preferences) of information were
absent. This has increased the demand for recommender systems more than ever before.
Recommender systems are information filtering systems that deal with the problem of information
overload by filtering vital information fragments out of a large amount of dynamically generated
information according to the user's preferences, interests, or observed behavior about items. A
recommender system has the ability to predict whether a particular user would prefer an item or
not based on the user's profile.

In an e-commerce setting, recommender systems enhance revenues, because they are an effective
means of selling more products. In scientific libraries, recommender systems support users by
allowing them to move beyond catalog searches. Therefore, the need to use efficient and accurate
recommendation techniques within a system that will provide relevant and dependable
recommendations for users cannot be over-emphasized.

1.2 PROBLEM STATEMENT

“Enhancing Performance of Recommender Systems” deals with improving the performance of
recommender systems applicable to various domains. The goal of this work is to make
recommendation methods more accurate and applicable to a broader range of real-life needs. The
system calculates the similarities between different users and then recommends books to them
according to the ratings given by other users with similar tastes. This is intended to give each user
more precise book recommendations than existing systems provide.

CHAPTER 2

LITERATURE SURVEY

A recommender system is defined as a decision making strategy for users under complex
information environments. A recommender system has also been defined, from the perspective of
e-commerce, as a tool that helps users search through records of knowledge related to the users'
interests and preferences. It has further been defined as a means of assisting and augmenting the
social process of using the recommendations of others to make choices when there is not sufficient
personal knowledge or experience of the alternatives. Recommender systems handle the problem
of information overload that users normally encounter by providing them with personalized,
exclusive content and service recommendations. Recently, various approaches for building
recommendation systems have been developed, which can utilize either collaborative filtering,
content-based filtering or hybrid filtering. The collaborative filtering technique is the most mature
and the most commonly implemented. Collaborative filtering recommends items by identifying
other users with similar taste; it uses their opinions to recommend items to the active user.
Collaborative recommender systems have been implemented in different application areas.
GroupLens is a news-based architecture which employed collaborative methods to assist users in
locating articles from a massive news database. Ringo is an online social information filtering
system that uses collaborative filtering to build user profiles based on their ratings of music albums.
Amazon uses topic diversification algorithms to improve its recommendations. The system uses
collaborative filtering to overcome the scalability issue by generating a table of similar items
offline through the use of an item-to-item matrix. The system then recommends, online, other
products which are similar according to the users' purchase history.

On the other hand, content-based techniques match content resources to user characteristics.
Content-based filtering techniques normally base their predictions on the user's own information,
and they ignore contributions from other users, unlike collaborative techniques. Fab relies heavily
on the ratings of different users in order to create a training set, and it is an example of a
content-based recommender system. Other systems that use content-based filtering to help users
find information on the Internet include Letizia. This system makes use of a user interface that
assists users in browsing the Internet; it is able to track the browsing pattern of a user to predict
the pages that they may be interested in. Pazzani et al. designed an intelligent agent that attempts
to predict which web pages will interest a user by using a naive Bayesian classifier. The agent
allows a user to provide training instances by rating different pages as either hot or cold. Jennings
and Higuchi describe a neural network that models the interests of a user in a Usenet news
environment.

Despite the success of these two filtering techniques, several limitations have been identified.
Some of the problems associated with content-based filtering techniques are limited content
analysis, overspecialization and sparsity of data. Collaborative approaches, in turn, exhibit
cold-start, sparsity and scalability problems. These problems usually reduce the quality of
recommendations. In order to mitigate some of the problems identified, hybrid filtering has been
proposed. It combines two or more filtering techniques in different ways in order to increase the
accuracy and performance of recommender systems, harnessing their strengths while leveling out
their corresponding weaknesses. Hybrid techniques can be classified based on their operation into
weighted hybrid, mixed hybrid, switching hybrid, feature-combination hybrid, cascade hybrid,
feature-augmented hybrid and meta-level hybrid.

Collaborative and content-based filtering are commonly combined today either by implementing
the two techniques separately and combining their predictions afterwards, or by adding
characteristics of content-based filtering to collaborative filtering and vice versa. Finally, a general
unified model which incorporates both content-based and collaborative filtering properties could
be developed. The problems of data sparsity and cold-start have been addressed by combining the
ratings, features and demographic information about items in a cascade hybrid recommendation
technique. In Ziegler et al., a hybrid collaborative filtering approach was proposed to exploit bulk
taxonomic information designed for exacting product classification to address the data sparsity
problem of CF recommendations, based on the generation of profiles via inference of super-topic
score and topic diversification. A hybrid recommendation technique is also proposed in Ghazanfar
and Prügel-Bennett, and this uses the content-based profile of an individual user to find similar
users, which are used to make predictions. In Sarwar et al., collaborative filtering was combined
with an information filtering agent. Here, the authors proposed a framework for integrating
content-based filtering agents and collaborative filtering. A hybrid recommender algorithm is
employed by many applications as a result of the new user problem of content-based filtering
techniques and the average user problem of collaborative filtering. A simple and straightforward
method for combining content-based and collaborative filtering was proposed by Cunningham et
al. A music recommendation system which combined tagging information, play counts and social
relations was proposed in Konstas et al. In order to determine the number of neighbors that can be
automatically connected on a social platform, Lee and Brusilovsky embedded social information
into a collaborative filtering algorithm. A Bayesian mixed effects model that integrates user ratings
and user and item features in a single unified framework was proposed by Condliff et al.

CHAPTER 3

ANALYSIS

3.1 RECOMMENDATION FILTERING TECHNIQUES

The use of efficient and accurate recommendation techniques is very important for a system that
is to provide good and useful recommendations to its individual users. This explains the importance
of understanding the features and potentials of different recommendation techniques. Figure 1
shows the anatomy of different recommendation filtering techniques.

3.1.1 CONTENT BASED FILTERING

Content-Based Filtering (CBF) is one of the traditional types of recommender systems. The roots
of content-based filtering are in information retrieval and information filtering research. In this
method, the algorithm suggests new items to users based on the interests they have shown in the
past. Content-based filtering can be used in different recommendation systems, such as news article
recommendation systems or TV program recommendation systems. The method varies partly in
each of these systems. However, some fundamental concepts stay the same, like the two sets of
information that it works with:
1) a set of features that describe the items to be recommended and
2) a user profile built from past choices that the user made.

In content-based recommender systems, the descriptive attributes of items are used to make
recommendations. The term “content” refers to these descriptions. In content-based methods, the
ratings and buying behavior of users are combined with the content information available in the
items. For example, consider a situation where John has rated the movie Terminator highly, but
we do not have access to the ratings of other users. Therefore, collaborative filtering methods are
ruled out. However, the item description of Terminator contains similar genre keywords as other
science fiction movies, such as Alien and Predator. In such cases, these movies can be
recommended to John.

In content-based methods, the item descriptions, which are labeled with ratings, are used as
training data to create a user-specific classification or regression modeling problem. For each user,
the training documents correspond to the descriptions of the items she has bought or rated. The
class (or dependent) variable corresponds to the specified ratings or buying behavior. These
training documents are used to create a classification or regression model, which is specific to the
user at hand (or active user). This user-specific model is used to predict whether the corresponding
individual will like an item for which her rating or buying behavior is unknown.

Finally, content-based filtering uses the information gained from the two sets to recommend new
items: the system compares any new item with those that exist in the user's profile. However, CB
techniques have some limitations, like the data scarcity problem. The only resource for modeling
user interest is the set of features extracted from the user's browsing or purchasing history.
Therefore, CB systems are not able to identify different kinds of items that the user may enjoy,
because they attempt to find those items that are very similar to the items in the history of that user.

The content-based technique is a domain-dependent algorithm, and it places more emphasis on the
analysis of the attributes of items in order to generate predictions. When documents such as web
pages, publications and news are to be recommended, the content-based filtering technique is the
most successful. In content-based filtering, recommendations are made based on user profiles,
using features extracted from the content of the items the user has evaluated in the past. Items that
are most closely related to the positively rated items are recommended to the user.

CBF uses different types of models to find the similarity between documents in order to generate
meaningful recommendations. It could use a Vector Space Model such as Term Frequency-Inverse
Document Frequency (TF-IDF), or learned models such as a Naïve Bayes classifier, decision trees
or neural networks, to model the relationship between different documents within a corpus.
These techniques make recommendations by learning the underlying model with either statistical
analysis or machine learning techniques. The content-based filtering technique does not need the
profiles of other users, since they do not influence the recommendation. Also, if the user profile
changes, the CBF technique still has the potential to adjust its recommendations within a very short
period of time. The major disadvantage of this technique is the need to have in-depth knowledge
and a description of the features of the items in the profile.
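
As an illustration of the vector space model mentioned above, the following is a minimal sketch of
TF-IDF weighting and cosine similarity using only the Python standard library. The book titles and
descriptions are made-up examples, and the weighting (raw term count times log(N / df)) is one
common variant chosen for brevity, not necessarily the one used in this project.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (raw TF times log(N / df))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical item descriptions, for illustration only.
books = {
    "Dune": "science fiction desert planet empire",
    "Foundation": "science fiction galactic empire",
    "Emma": "romance english village matchmaking",
}
titles = list(books)
vecs = dict(zip(titles, tfidf_vectors(list(books.values()))))


def most_similar(title):
    """Return the other title whose description is closest to the given one."""
    others = [t for t in titles if t != title]
    return max(others, key=lambda t: cosine(vecs[title], vecs[t]))
```

Here a user who rated "Dune" highly would be pointed to the description that shares the most
distinctive terms with it, without any ratings from other users being involved.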

Pros and cons of content based filtering

CB filtering techniques overcome the challenges of CF. They have the ability to recommend new
items even if there are no ratings provided by users. So even if the database does not contain user
preferences, recommendation accuracy is not affected. Also, if the user's preferences change, CBF
has the capacity to adjust its recommendations in a short span of time. It can manage situations
where different users do not share the same items, but only identical items according to their
intrinsic features. Users can get recommendations without sharing their profile, and this ensures
privacy. The CBF technique can also provide explanations of how recommendations are generated
for users. However, the technique suffers from various problems, as discussed in the literature.
Content-based filtering techniques are dependent on the items' metadata. That is, they require a
rich description of items and a very well organized user profile before recommendations can be
made to users. This is called limited content analysis. So, the effectiveness of CBF depends on the
availability of descriptive data. Content overspecialization is another serious problem of the CBF
technique. Users are restricted to getting recommendations similar to items already defined in their
profiles.
3.1.2 COLLABORATIVE FILTERING TECHNIQUE

Collaborative filtering is the type of recommendation algorithm that bases its predictions and
recommendations on the ratings or behavior of other users in the system. The fundamental idea of
collaborative filtering is to find other users in the community that share the active user's opinions.

Collaborative filtering models use the collaborative power of the ratings provided by multiple users
to make recommendations. The main challenge in designing collaborative filtering methods is that
the underlying ratings matrices are sparse. Consider an example of a movie application in which
users specify ratings indicating their like or dislike of specific movies. Most users would have
viewed only a small fraction of the large universe of available movies. As a result, most of the
ratings are unspecified. The specified ratings are also referred to as observed ratings. Throughout
this report, the terms “specified” and “observed” will be used interchangeably. The
unspecified ratings will be referred to as “unobserved” or “missing.”

The basic idea of collaborative filtering methods is that these unspecified ratings can be imputed
because the observed ratings are often highly correlated across various users and items. For
example, consider two users named Alice and Bob, who have very similar tastes. If the ratings
which both have specified are very similar, then their similarity can be identified by the underlying
algorithm. In such cases, the ratings for which only one of them has specified a value are also
likely to be similar. This similarity can be used to make inferences about incompletely specified
values. Most of the models for collaborative filtering focus on leveraging either inter-item
correlations or inter-user correlations for the prediction process. Some models use both types of
correlations. Furthermore, some models use carefully designed optimization techniques to create
a training model in much the same way a classifier creates a training model from labeled data.
This model is then used to impute the missing values in the matrix, in the same way that a classifier
imputes the missing test labels.
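
The Alice and Bob example can be sketched in a few lines of Python. The snippet below is only an
illustration under stated assumptions: the rating values and the third user (Carol) are invented, the
similarity is the Pearson correlation over co-rated items, and the missing rating is imputed with a
mean-centered, similarity-weighted average, which is one common choice among several.

```python
import math

# Hypothetical ratings on a 1-5 scale; Bob has not rated "Book D".
ratings = {
    "Alice": {"Book A": 5, "Book B": 3, "Book C": 4, "Book D": 4},
    "Bob": {"Book A": 4, "Book B": 2, "Book C": 3},
    "Carol": {"Book A": 1, "Book B": 5, "Book C": 2, "Book D": 1},
}


def pearson(a, b):
    """Pearson correlation of two users over the items both have rated."""
    common = [i for i in ratings[a] if i in ratings[b]]
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings[a][i] for i in common) / len(common)
    mean_b = sum(ratings[b][i] for i in common) / len(common)
    num = sum((ratings[a][i] - mean_a) * (ratings[b][i] - mean_b) for i in common)
    den = math.sqrt(sum((ratings[a][i] - mean_a) ** 2 for i in common)) * math.sqrt(
        sum((ratings[b][i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def impute(user, item):
    """Estimate a missing rating from the other users who rated the item."""
    mean_user = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, row in ratings.items():
        if other == user or item not in row:
            continue
        sim = pearson(user, other)
        mean_other = sum(row.values()) / len(row)
        # Use the neighbor's deviation from their own mean, weighted by similarity.
        num += sim * (row[item] - mean_other)
        den += abs(sim)
    return mean_user + num / den if den else mean_user


bob_alice = pearson("Bob", "Alice")
bob_carol = pearson("Bob", "Carol")
bob_book_d = impute("Bob", "Book D")
```

In this toy data Alice and Bob rate the shared books in the same pattern, so their correlation is
close to 1, while Carol's ratings run in the opposite direction and her low rating of "Book D"
counts as evidence that Bob will rate it above his own average.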

Collaborative filtering is a domain-independent prediction technique for content that cannot easily
and adequately be described by metadata, such as movies and music. The collaborative filtering
technique works by building a database (user-item matrix) of preferences for items by users. It
then matches users with relevant interests and preferences by calculating similarities between their
profiles to make recommendations. Such users form a group called a neighborhood. A user gets
recommendations for those items that he has not rated before but that were already positively rated
by users in his neighborhood. The output produced by CF can be either a prediction or a
recommendation. A prediction is a numerical value, Rij, expressing the predicted score of item j
for user i, while a recommendation is a list of the top N items that the user will like the most, as
shown in Figure 2. The technique of collaborative filtering can be divided into two categories:
memory-based and model-based.

There are two popular approaches of collaborative filtering:

A. User-based approach

The Book Recommendation System uses the ratings of other users with similar preferences to
recommend a book item to a certain user. User-based recommendation algorithms first identify
the k most similar users to the active user using the Pearson correlation or a vector-space model,
in which each user is treated as a vector in the m-dimensional item space and the similarities
between the active user and other users are computed between the vectors. After the k most similar
users have been discovered, their corresponding rows in the user-item matrix R are aggregated to
identify a set of book items, C, rated by the group, together with their frequency. With the set C,
user-based CF techniques then recommend the top-N most frequent elements in C that the active
user has not rated (Xiaoyuan Su, 2009).
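
The user-based procedure described above can be sketched as follows. This is a minimal
illustration, not the project's implementation: the user and book identifiers and ratings are
hypothetical, cosine similarity in the item space stands in for the vector-space model, and unrated
items are treated as zeros.

```python
import math
from collections import Counter

# Hypothetical user-item ratings; missing entries mean "not rated".
ratings = {
    "u1": {"b1": 5, "b2": 4},
    "u2": {"b1": 5, "b2": 5, "b3": 4},
    "u3": {"b1": 4, "b2": 4, "b3": 5, "b4": 3},
    "u4": {"b5": 5, "b6": 4},
}


def cosine(a, b):
    """Cosine similarity of two users treated as vectors in item space."""
    dot = sum(r * b.get(item, 0) for item, r in a.items())
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(active, k=2, n=2):
    """Top-n items most frequently rated by the k nearest neighbors."""
    sims = sorted(
        ((cosine(ratings[active], row), user)
         for user, row in ratings.items() if user != active),
        reverse=True,
    )
    neighbors = [user for _, user in sims[:k]]
    # Set C with frequencies: items rated by neighbors but not by the active user.
    counts = Counter(
        item
        for user in neighbors
        for item in ratings[user]
        if item not in ratings[active]
    )
    return [item for item, _ in counts.most_common(n)]
```

For the active user "u1", the two nearest neighbors are "u2" and "u3"; "b3" is rated by both and
"b4" by one of them, so they are returned in that order.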

B. Item-based approach

Though the user-based approach is useful, it suffers from a scalability problem as the user base
grows. Searching for the neighbors of a user becomes time-consuming. To extend collaborative
filtering to a large user base, a more scalable version of collaborative filtering, the item-based
approach, was introduced. In the item-based approach, instead of using similarities between users'
ratings to predict preferences, similarities between the rating patterns of items are considered.
Thus, the overall structure of this approach seems similar to that of the content-based approach to
recommendation and personalization, but item similarity is deduced from user preference patterns
rather than extracted from the item data. In its raw form, item-item CF does not remove the cost of
the neighbor search: it is still necessary to find the most similar items to generate predictions and
recommendations. In a system that has more users than items, it allows the neighborhood finding
to take place over the smaller of the two dimensions.

The significant performance gain comes from the fact that the item-based approach lends itself
well to pre-computing the similarity matrix. As a user rates and re-rates items, their rating
vector changes, along with their similarity to other users. Finding similar users in advance is
therefore complicated: a user's neighborhood is determined not only by their own ratings but also
by the ratings of other users, so it can change as a result of new ratings supplied by any user in
the system. For this reason, most user-based CF systems find neighborhoods at the time when
predictions or recommendations are needed (Ekstrand, 2010).

Pros and cons of collaborative filtering

Collaborative filtering has some major advantages over content-based filtering (CBF): it can
perform in domains where there is not much content associated with items and where content is
difficult for a computer system to analyze (such as opinions and ideas). Also, the CF technique
can provide serendipitous recommendations, which means that it can recommend items that are
relevant to the user even when their content is not in the user's profile. Despite the success of
CF techniques, their widespread use has revealed some potential problems, described below.

Cold start problem

This refers to a situation where a recommender does not have adequate information about a user
or an item to make relevant predictions. It is one of the major problems that reduce the
performance of a recommendation system. The profile of a new user is empty, since he has not
rated any item, so his taste is not known to the system; likewise, a new item has no ratings yet.

New items and new users pose a significant challenge to recommender systems. Collectively these
problems are referred to as the cold-start problem. The first of these problems arises in CF systems,
where an item cannot be recommended unless some user has rated it before. This issue applies not
only to new items but also to obscure items, which is particularly detrimental to users with
heterogeneous tastes.

Since content-based approaches do not rely on ratings from other users, they can be used to
produce recommendations for all items, provided attributes of the items are available. In fact, the
content-based predictions of similar users can also be used to improve predictions further for the
active user. The new-user problem is hard to tackle since, without a previous record of a user's
preferences, it is not possible to find similar users or to build a content-based profile. As
such, research in this area has primarily focused on effectively selecting items to be rated by a
user so as to improve recommendation performance rapidly with the least user feedback. In this
setting, classical techniques from active learning can be leveraged to address the task of item
selection (Melville, 2010).

Data sparsity problem

This problem occurs as a result of a lack of information, that is, when only a few of the total
number of items available in a database are rated by users. It leads to a sparse user-item
matrix, an inability to locate successful neighbors and, finally, the generation of weak
recommendations. Data sparsity also leads to coverage problems, where coverage is the percentage
of items in the system for which recommendations can be made.

Most users do not rate most items, and hence the user rating matrix is typically very sparse.
This is a problem for collaborative filtering systems since it decreases the probability of
finding a set of users with similar ratings. The issue often occurs when a system has a very high
item-to-user ratio or is in the initial stages of use. It can be mitigated by using additional
domain information or by making assumptions about the data generation process that allow for
high-quality imputation.
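Sparsity and coverage can both be measured directly on a rating matrix. The sketch below is illustrative only: the helper names `sparsity` and `item_coverage` are hypothetical, unrated cells are assumed to be stored as 0, and coverage is approximated by the share of items that have at least one rating.

```python
import numpy as np


def sparsity(R):
    """Fraction of user-item cells that hold no rating (0 = unrated)."""
    return 1.0 - np.count_nonzero(R) / R.size


def item_coverage(R, min_ratings=1):
    """Fraction of items with at least `min_ratings` ratings (a simple coverage proxy)."""
    ratings_per_item = np.count_nonzero(R, axis=0)
    return float(np.mean(ratings_per_item >= min_ratings))


R = np.array([
    [5, 0, 0, 0],
    [0, 3, 0, 0],
    [4, 0, 0, 0],
], dtype=float)
print(sparsity(R))       # 9 of the 12 cells are empty
print(item_coverage(R))  # 2 of the 4 items have any rating
```

In this toy matrix three quarters of the cells are empty and half of the items can never be recommended by a pure CF method, which is the situation the paragraph above describes.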

Scalability

This is another problem associated with recommendation algorithms, because computation normally
grows linearly with the number of users and items. A recommendation technique that is efficient
when the dataset is small may be unable to generate a satisfactory number of recommendations when
the volume of data increases. Thus, it is crucial to apply recommendation techniques that scale up
successfully as the amount of data in a database increases. Methods used for solving the
scalability problem and speeding up recommendation generation are based on dimensionality
reduction techniques, such as Singular Value Decomposition (SVD), which can produce reliable and
efficient recommendations.
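As a rough illustration of dimensionality reduction, the sketch below keeps only the k largest singular values of a small, fully observed rating matrix. It uses NumPy's plain SVD and assumes there are no missing ratings; recommender libraries typically use a matrix-factorization variant that handles missing entries, so this is not the method used in the project code. The function name `low_rank_approximation` is hypothetical.

```python
import numpy as np


def low_rank_approximation(R, k):
    """Return the rank-k approximation of R built from its k largest singular values."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


R = np.array([
    [5, 4, 1, 1],
    [4, 5, 1, 2],
    [1, 1, 5, 4],
    [2, 1, 4, 5],
], dtype=float)
R2 = low_rank_approximation(R, k=2)
print(np.round(R2, 2))
```

With k = 2 each user and each book is described by two latent factors instead of four raw columns, which is what reduces the cost of later similarity computations.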

Synonymy

Synonymy is the tendency of very similar items to have different names or entries. Most
recommender systems find it difficult to handle closely related items that are named differently,
such as baby wear and baby cloth. Collaborative filtering systems usually find no match between
the two terms and so cannot compute their similarity. Different methods, such as automatic term
expansion, the construction of a thesaurus, and Singular Value Decomposition (SVD), especially
Latent Semantic Indexing, are capable of solving the synonymy problem. The shortcoming of these
methods is that some added terms may have meanings different from what is intended, which
sometimes leads to rapid degradation of recommendation performance.

Fraud

As recommender systems are increasingly adopted by commercial websites, they have started to
play a significant role in affecting the profitability of sellers. This has led to many
unscrupulous vendors engaging in different forms of fraud to game recommender systems for their
benefit. Typically, they attempt to inflate the perceived desirability of their products or lower
the ratings of their competitors. These types of attack have been broadly studied as shilling
attacks or profile injection attacks. Such attacks usually involve setting up dummy profiles and
assume different amounts of knowledge about the system. For instance, the average attack assumes
knowledge of the mean rating for each item; the attacker assigns values randomly distributed
around this average, along with a high score for the item being pushed.

3.1.3 HYBRID FILTERING TECHNIQUE

The hybrid filtering technique combines different recommendation techniques in order to achieve
better system optimization and to avoid some limitations and problems of pure recommendation
systems. The idea behind hybrid techniques is that a combination of algorithms will provide more
accurate and effective recommendations than a single algorithm, as the disadvantages of one
algorithm can be overcome by another. Using multiple recommendation techniques can suppress the
weaknesses of an individual technique in a combined model. The combination of approaches can be
done in any of the following ways: implementing the algorithms separately and combining the
results, utilizing some content-based filtering in a collaborative approach, utilizing some
collaborative filtering in a content-based approach, or creating a unified recommendation system
that brings together both approaches.

Recent research has demonstrated that a hybrid approach, combining collaborative filtering and
content-based filtering, could be more effective in some cases. Hybrid approaches can be
implemented in several ways: by making content-based and collaborative-based predictions
separately and then combining them; by adding content-based capabilities to a collaborative-based
approach (and vice versa); or by unifying the approaches into one model. Several studies
empirically compare the performance of the hybrid with the pure collaborative and content-based
methods and demonstrate that the hybrid methods can provide more accurate recommendations
than pure approaches. These methods can also be used to overcome some of the common problems
in recommendation systems such as cold start and the sparsity problem.

A variety of techniques have been proposed as the basis for recommendation systems:
collaborative, content-based, knowledge-based, and demographic techniques. Each of these
techniques has known shortcomings, such as the well-known cold-start problem for collaborative
and content-based systems and the knowledge engineering bottleneck in knowledge-based
approaches. A hybrid recommendation system is one that combines multiple techniques
to achieve some synergy between them.

3.2 SOFTWARE REQUIREMENTS:

1. PYTHON IDLE

IDLE (Integrated Development and Learning Environment) is an integrated development environment
for Python. The Python installer for Windows contains the IDLE module by default. IDLE is not
available by default in Python distributions for Linux.

2. DATASET

The dataset used for the project is goodbooks-10k. This dataset contains ratings for ten thousand
popular books. Generally, there are 100 reviews for each book, although some have fewer ratings.
Ratings go from one to five.

Contents of dataset:

1. to_read.csv provides IDs of the books marked "to read" by each user, as user_id, book_id
pairs.
2. books.csv has metadata for each book (goodreads IDs, authors, title, average rating, etc.).
3. book_tags.csv contains tags/shelves/genres assigned by users to books. Tags in this file
are represented by their IDs.
4. tags.csv translates tag IDs to names.
5. ratings.csv contains the ratings that users gave to the books.

As to the source, the dataset description says only that these ratings were found on the internet.

Both book IDs and user IDs are contiguous. For books, they are 1-10000, for users, 1-53424.
All users have made at least two ratings. Median number of ratings per user is 8.

There are also books marked to read by the users, book metadata (author, year, etc.) and tags.

CHAPTER 4

DESIGN

4.1 UML DIAGRAMS

4.1.1 USE CASE

4.1.2 CLASS DIAGRAM

4.1.3 SEQUENCE DIAGRAM

4.1.4 ACTIVITY DIAGRAM

CHAPTER 5

CODING

Python is an interpreted, high-level, general-purpose programming language. Python's design
philosophy emphasizes code readability with its notable use of significant whitespace. Its language
constructs and object-oriented approach aim to help programmers write clear, logical code for
small and large-scale projects.

Python is dynamically typed and garbage collected. It supports multiple programming paradigms,
including procedural, object oriented and functional programming.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings; warnings.simplefilter('ignore')
from scipy import stats
from ast import literal_eval
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity
from nltk.stem.snowball import SnowballStemmer
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import wordnet
from surprise import Reader, Dataset, SVD
# surprise.evaluate was removed in Surprise 1.1; cross_validate replaces it.
from surprise.model_selection import cross_validate


class hybrid(object):

    def __init__(self, user_id, ratings):
        self.user_id = user_id
        self.md = pd.read_csv(
            r'C:/Users/TINKU/Downloads/IT556_Worthless_without_coffee_DA-IICT_Final_Project-master (1)'
            r'/IT556_Worthless_without_coffee_DA-IICT_Final_Project-master/CustomData/FinalData.csv')
        self.ratings = ratings
        # Show the ratings this user has already given.
        print(ratings[(ratings['user_id'] == user_id)][['user_id', 'book_id', 'rating']])
        self.popularity_rating = self.popularity(self.md)
        self.collaborative_rating = self.collaborative(self.ratings, self.user_id)
        self.content_rating = self.content_based(self.md, self.ratings, self.user_id)
        self.final_hybrid(self.md, self.popularity_rating, self.collaborative_rating,
                          self.content_rating, self.user_id)

    def popularity(self, md):
        fd = pd.read_csv(
            r'C:\Users\TINKU\Downloads\IT556_Worthless_without_coffee_DA-IICT_Final_Project-master (1)'
            r'\IT556_Worthless_without_coffee_DA-IICT_Final_Project-master\CustomData\AverageRatings.csv')
        fd1 = pd.read_csv(
            r'C:\Users\TINKU\Downloads\IT556_Worthless_without_coffee_DA-IICT_Final_Project-master (1)'
            r'\IT556_Worthless_without_coffee_DA-IICT_Final_Project-master\CustomData\RatingsCount.csv')
        # C: mean of the per-book average ratings.
        vote_averages = fd[fd['rating'].notnull()]['rating'].astype('float')
        C = vote_averages.mean()
        vote_counts = fd1[fd1['rating'].notnull()]['rating'].astype('float')
        # m: in this code, the number of books that have a rating count.
        m = len(vote_counts)
        md['ratings_count'] = fd1['rating']
        md['average_rating'] = fd['rating']
        qualified = md[(md['ratings_count'].notnull())][['book_id', 'title', 'authors', 'ratings_count',
                                                         'average_rating']].copy()
        qualified['ratings_count'] = qualified['ratings_count'].astype('float')
        qualified['average_rating'] = qualified['average_rating'].astype('float')

        def weighted_rating(x):
            v = x['ratings_count']
            R = x['average_rating']
            return (v / (v + m) * R) + (m / (m + v) * C)

        qualified['popularity_rating'] = qualified.apply(weighted_rating, axis=1)
        pop = qualified[['book_id', 'popularity_rating']]
        print(qualified.shape)
        print(pop.shape)
        return pop

    def collaborative(self, ratings, user_id):
        # Ratings go from one to five.
        reader = Reader(rating_scale=(1, 5))
        temp_ratings = ratings
        data = Dataset.load_from_df(temp_ratings[['user_id', 'book_id', 'rating']], reader)
        # Report RMSE and MAE with 2-fold cross-validation.
        svd = SVD()
        cross_validate(svd, data, measures=['RMSE', 'MAE'], cv=2, verbose=True)
        # Train on all ratings, then predict every (user, book) pair that has no rating.
        trainset = data.build_full_trainset()
        algo = SVD()
        algo.fit(trainset)
        testset = trainset.build_anti_testset()
        predictions = algo.test(testset)
        for uid, iid, true_r, est, _ in predictions:
            if uid == user_id:
                # Append the predicted rating as a new row for this user.
                temp_ratings.loc[len(temp_ratings) + 1] = [uid, iid, est]
        # Actual and predicted ratings of this user, one row per book.
        cb = temp_ratings[(temp_ratings['user_id'] == user_id)][['book_id', 'rating']]
        return cb

    def content_based(self, md, ratings, user_id):
        md['book_id'] = md['book_id'].astype('int')
        ratings['book_id'] = ratings['book_id'].astype('int')
        ratings['user_id'] = ratings['user_id'].astype('int')
        ratings['rating'] = ratings['rating'].astype('int')
        # Collapse each author name into a single lowercase token.
        md['authors'] = md['authors'].str.replace(' ', '')
        md['authors'] = md['authors'].str.lower()
        md['authors'] = md['authors'].str.replace(',', ' ')
        # Repeat the authors so that they are counted twice in the soup.
        md['authors'] = md['authors'].apply(lambda x: [x, x])
        md['Genres'] = md['Genres'].str.split(';')
        md['soup'] = md['authors'] + md['Genres']
        md['soup'] = md['soup'].str.join(' ')
        # min_df=1 keeps every term; recent scikit-learn versions reject min_df=0.
        count = CountVectorizer(analyzer='word', ngram_range=(1, 1), min_df=1,
                                stop_words='english')
        count_matrix = count.fit_transform(md['soup'])
        print(count_matrix.shape)
        cosine_sim = cosine_similarity(count_matrix, count_matrix)

        def build_user_profiles():
            # Rows are user IDs (up to 60000); columns are the 999 books (book_id - 1).
            user_profiles = np.zeros((60001, 999))
            # Taking only the first 100000 ratings to build the user profiles.
            for i in range(min(100000, len(ratings))):
                u = ratings.iloc[i]['user_id']
                b = ratings.iloc[i]['book_id']
                user_profiles[u][b - 1] = ratings.iloc[i]['rating']
            return user_profiles

        user_profiles = build_user_profiles()

        def _get_similar_items_to_user_profile(person_id):
            # Scores each book by the similarity-weighted average of the user's ratings.
            # Assumes the metadata file holds exactly 999 books.
            user_ratings = np.zeros((999, 1))
            user_sim = user_profiles[person_id]
            for i in range(999):
                book_sim = cosine_sim[i]
                user_ratings[i] = (book_sim.dot(user_sim)) / sum(cosine_sim[i])
            # Rescale so that the best-matching book scores 5.
            maxval = max(user_ratings)
            print(maxval)
            for i in range(999):
                user_ratings[i] = (user_ratings[i] * 5.0) / maxval
            return user_ratings

        content_ratings = _get_similar_items_to_user_profile(user_id)
        num = md[['book_id']]
        num1 = pd.DataFrame(data=content_ratings[0:, 0:])
        frames = [num, num1]
        # join_axes was removed in pandas 1.0; reindex to num's index instead.
        content_rating = pd.concat(frames, axis=1).reindex(num.index)
        content_rating.columns = ['book_id', 'content_rating']
        return content_rating

    def final_hybrid(self, md, popularity_rating, collaborative_rating, content_rating, user_id):
        hyb = md[['book_id']]
        title = md[['book_id', 'title', 'Genres']]
        hyb = hyb.merge(title, on='book_id')
        hyb = hyb.merge(self.collaborative_rating, on='book_id')
        hyb = hyb.merge(self.popularity_rating, on='book_id')
        hyb = hyb.merge(self.content_rating, on='book_id')

        def weighted_rating(x):
            v = x['rating']  # collaborative rating
            R = x['popularity_rating']
            c = x['content_rating']
            # 40% collaborative, 20% popularity, 40% content-based.
            return 0.4 * v + 0.2 * R + 0.4 * c

        hyb['hyb_rating'] = hyb.apply(weighted_rating, axis=1)
        hyb = hyb.sort_values('hyb_rating', ascending=False).head(999)
        hyb.columns = ['Book ID', 'Title', 'Genres', 'Collaborative Rating', 'Popularity Rating',
                       'Content Rating', 'Hybrid Rating']
        print(len(hyb['Hybrid Rating']))
        print(hyb)

def newUser():
    print('\n Rate from books\n')
    print('ID Author Title Genre\n')
    print('2. J.K. Rowling, Mary Harry Potter and the Sorcerer\'s Stone (Harry Potter, #1) Fantasy;Young-Age')
    print('127. Malcolm Gladwell The Tipping Point: How Little Things Can Make a Big Difference Self-Help')
    print('239. Max Brooks World War Z: An Oral History of the Zombie War Horror;Fiction')
    print('26 Dan Brown The Da Vinci Code Thriller;Drama')
    print('84 Michael Crichton Jurassic Park (Jurassic Park, #1) SciFi;Thriller;Fantasy')
    print('86 John Grisham A Time to Kill Thriller')
    print('966 Scott Turow Presumed Innocent Thriller;Crime')
    print('42 Louisa May Alcott Little Women (Little Women, #1) Young-Age;Romance;Drama')
    print('44 Nicholas Sparks The Notebook (The Notebook, #1) Romance;Drama')
    print('54 Douglas Adams The Hitchhiker\'s Guide to the Galaxy Fantasy;Fiction')
    print('134 Cassandra Clare City of Glass (The Mortal Instruments, #3) Kids;Fantasy;Fiction')
    print('399 J.K. Rowling The Tales of Beedle the Bard Kids;Fantasy;Fiction')
    print('38 Audrey Niffenegger The Time Traveler\'s Wife Romance;SciFi;Fantasy;Domestic')
    print('729 Dan Simmons Hyperion (Hyperion Cantos, #1) SciFi')
    print('807 Dave Eggers The Circle SciFi')
    print('690 Barack Obama The Audacity of Hope: Thoughts on Reclaiming the American Dream Biography')
    print('617 Piper Kerman Orange Is the New Black Biography')
    print('495 Dave Eggers A Heartbreaking Work of Staggering Genius Biography')
    print('770 William Shakespeare,Roma Gill Julius Caesar History;Classic')
    print('773 William Shakespeare The Taming of the Shrew Comedy;Classic')
    print('829 E.M. Forster A Room with a View Classic')
    print('971 Marcus Pfister, J. Alison James The Rainbow Fish Kids')
    print('976 Robert Kapilow, Dr. Seuss Dr. Seuss\'s Green Eggs and Ham: For Soprano, Boy Soprano, and Orchestra Kids')
    print('627 Jon Scieszka, Lane Smith The True Story of the 3 Little Pigs Kids;Fiction')
    print('121 Vladimir Nabokov, Craig Raine Lolita Biography;Romance;Comedy')
    print('196 Chuck Palahniuk Fight Club Comedy;Drama')
    print('444 A.A. Milne, Ernest H. Shepard Winnie-the-Pooh (Winnie-the-Pooh, #1) Kids;Comedy')
    print('745 Jenny Lawson Lets Pretend This Never Happened: A Mostly True Memoir Biography;Comedy')
    ratings = pd.read_csv(
        r'C:/Users/TINKU/Downloads/IT556_Worthless_without_coffee_DA-IICT_Final_Project-master (1)'
        r'/IT556_Worthless_without_coffee_DA-IICT_Final_Project-master/CustomData/ratings.csv')
    # Copy the slice so that new rows can be appended to it safely.
    ratings = ratings[1:100000].copy()
    user_id = 60000
    rating_count = len(ratings['user_id']) + 1
    print(user_id)
    print('\n----------------Welcome User ' + str(user_id) + '-------------------')
    print('\nPlease Rate 5 books from the above list.')
    for x in range(0, 5):
        print("\n")
        # Convert the typed values to integers so the rating columns stay numeric.
        bookId = int(input("BookId:"))
        rating = int(input("Rating:"))
        ratings.loc[rating_count] = [user_id, bookId, rating]
        rating_count = rating_count + 1
    h = hybrid(user_id, ratings)

print("------------------------------Welcome to the Book Recommendation Engine---------------------------\n")
user = input("1. Book Recommendation for New User. \n2. Book Recommendation for Existing User.\n")
if user == '1':
    newUser()
elif user == '2':
    ratings = pd.read_csv(
        r'C:/Users/TINKU/Downloads/IT556_Worthless_without_coffee_DA-IICT_Final_Project-master (1)'
        r'/IT556_Worthless_without_coffee_DA-IICT_Final_Project-master/CustomData/ratings.csv')
    # Taking only the first 100000 ratings.
    ratings = ratings[1:100000].copy()
    userId = int(input("\nPlease Enter User Id: "))
    print('\n----------------Welcome User' + str(userId) + '-------------------')
    h = hybrid(userId, ratings)
else:
    print("Invalid option\n ")

CHAPTER 6

METHODOLOGY

This section explains in detail the process of implementing the web-based book recommender
system. It discusses how the filtering methods are used one after the other to produce the
output. We have implemented three different algorithms to build an efficient recommendation
system.

6.1 POPULARITY MODEL

As the name suggests, a popularity-based recommendation system works with the trend. It basically
uses the items that are trending right now. For example, if a product is usually bought by every
new user, then the system is likely to suggest that item to a user who has just signed up.

The basic idea behind this recommender is that books that are more popular and more critically
acclaimed have a higher probability of being liked by the average reader.

From the ratings matrix, the average rating and the rating count for each book are calculated.
Then, a weighted rating formula is used to construct a chart. Mathematically, it is represented
as follows:

Weighted Rating (WR) = (v / (v + m)) * R + (m / (v + m)) * C

where v is the number of ratings for the book,

m is the minimum number of ratings required to be listed in the chart,

R is the average rating of the book,

C is the mean rating across all books.
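A short worked example may make the effect of the formula clearer. The numbers below (a mean rating C of 3.5 and a threshold m of 50) are hypothetical and chosen only for illustration.

```python
def weighted_rating(v, R, m, C):
    """WR = (v / (v + m)) * R + (m / (v + m)) * C."""
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical values: mean rating C = 3.5, threshold m = 50.
few_votes = weighted_rating(v=10, R=5.0, m=50, C=3.5)
many_votes = weighted_rating(v=5000, R=4.5, m=50, C=3.5)
print(few_votes, many_votes)
```

A book with a perfect average but only 10 ratings is pulled strongly towards the overall mean, while a book with 5000 ratings keeps a score close to its own average, so the heavily rated book ranks higher in the chart.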

6.2 CONTENT BASED FILTERING

The content-based recommendation algorithm takes into account the likes and dislikes of the user
and generates a user profile. To generate the user profile, we take into account the item
profiles (vectors describing the items) and the ratings the user gave them. The user profile is
the weighted sum of the item profiles, with the weights being the user's ratings. Once the user
profile is generated, we calculate its similarity to every item in the dataset, using the cosine
similarity between the user profile and the item profile.

The advantages of the content-based approach are that data from other users is not required and
that the recommender engine can recommend new items that have not been rated yet. Its drawback is
that the algorithm does not recommend items outside the categories of items the user has rated.

The content-based recommender is built using:

1. Authors

2. Genres

The content-based filtering algorithm finds the cosine of the angle between the profile vector and
item vector, i.e. cosine similarity. Cosine Similarity is used to calculate a numeric quantity that
denotes the similarity between two books.
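The profile construction and the cosine similarity can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `content_scores`, the four feature columns and the three example books are hypothetical, and a rating of 0 means the book was not rated.

```python
import numpy as np


def content_scores(item_profiles, user_ratings):
    """Cosine similarity between the rating-weighted user profile and every item profile."""
    # User profile: sum of the item vectors weighted by the user's ratings.
    profile = user_ratings @ item_profiles
    item_norms = np.linalg.norm(item_profiles, axis=1)
    denom = item_norms * np.linalg.norm(profile)
    denom[denom == 0] = 1.0  # empty profiles or items simply score 0
    return (item_profiles @ profile) / denom


# Hypothetical feature columns: [rowling, brown, fantasy, thriller]
item_profiles = np.array([
    [1, 0, 1, 0],  # book 0: Rowling, fantasy
    [1, 0, 1, 0],  # book 1: Rowling, fantasy
    [0, 1, 0, 1],  # book 2: Brown, thriller
], dtype=float)
user_ratings = np.array([5, 0, 0], dtype=float)  # the user rated only book 0
scores = content_scores(item_profiles, user_ratings)
print(scores)
```

Book 1 shares its author and genre with the rated book and therefore gets the highest possible similarity, while book 2 shares nothing and scores zero. This also shows the limitation noted above: books outside the user's rated categories are never suggested.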

6.3 COLLABORATIVE FILTERING

One of the most commonly used techniques for developing recommendation engines is collaborative
filtering. It has been used for years by researchers for implementing recommender systems.
Collaborative filtering recommends items based on similarity. Also known as social information
filtering, it is based on the principle of finding a subset of users who have tastes and
preferences similar to those of the active user, and offering recommendations based on that
subset of users. The idea is that, given an active user u, we compute the n users most similar to
u, {u1, u2, … un}, and predict u's preferences based on the preferences of {u1, u2, … un}.
Similar users are users who share the same kind of tastes and preferences over items. The basic
idea behind collaborative filtering is that users who agreed in the past tend to agree in the
future as well.

Collaborative filtering works based on the following assumptions:

• Users with similar interests have common preferences, and vice versa.
• A sufficiently large number of user preferences is available.

There are different types of collaborative filtering techniques:

1. user-based collaborative filtering

2. item-based collaborative filtering

User based collaborative filtering:

This algorithm first finds the similarity score between users. Based on this similarity score, it then
picks out the most similar users and recommends products which these similar users have liked or
bought previously.

The prediction for an item i and a user u is calculated by computing the weighted sum of the
ratings given by other users to item i, with the similarities between the users as weights.

The prediction Pu,i is given by a similarity-weighted sum, where:

• Pu,i is the predicted rating of book i for user u
• Rv,i is the rating given by a user v to a book i
• Su,v is the similarity between users u and v
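A common way of writing this weighted sum is Pu,i = sum over v of (Su,v * Rv,i) divided by the sum over v of |Su,v|, taken over the users v who rated book i. The normalization by the sum of similarities is an assumption here, since the formula itself is not reproduced in the text. The sketch below uses a hypothetical function name and hand-picked similarity values.

```python
import numpy as np


def predict_user_based(R, S, u, i):
    """Similarity-weighted average of the ratings that other users gave to item i."""
    raters = [v for v in range(R.shape[0]) if v != u and R[v, i] > 0]
    weight = sum(abs(S[u, v]) for v in raters)
    if weight == 0:
        return 0.0  # no usable neighbor has rated item i
    return sum(S[u, v] * R[v, i] for v in raters) / weight


R = np.array([
    [5, 0],  # user 0 has not rated book 1
    [4, 4],
    [1, 2],
], dtype=float)
# Hypothetical user-user similarities Su,v.
S = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])
p = predict_user_based(R, S, u=0, i=1)
print(p)
```

The prediction lies close to the rating of user 1, because user 1 is far more similar to the active user than user 2 is.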

Item based collaborative filtering:

In this algorithm, we compute the similarity between each pair of items.

In the item-based model, it is assumed that books that are often read together by some users
tend to be similar and are more likely to be read together in the future by other users.

We find the similarity between each pair of books and, based on that, recommend books similar to
those that the user liked in the past. This algorithm works like user-user collaborative
filtering with just a little change: instead of taking the weighted sum of the ratings of
"user-neighbors", we take the weighted sum of the ratings of "item-neighbors".

The prediction therefore depends on the similarity between items, which is computed first.
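One way to compute the item similarities and the resulting prediction is sketched below. Cosine similarity between rating columns and normalization by the sum of similarities are assumptions made for the example, as are the function names and the toy ratings; a rating of 0 means "not rated".

```python
import numpy as np


def item_similarity(R):
    """Cosine similarity between the rating columns of R (0 = unrated)."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # items without ratings get similarity 0
    return (R.T @ R) / np.outer(norms, norms)


def predict_item_based(R, S, u, i):
    """Weighted sum of user u's own ratings on the items similar to item i."""
    rated = [j for j in range(R.shape[1]) if j != i and R[u, j] > 0]
    weight = sum(abs(S[i, j]) for j in rated)
    if weight == 0:
        return 0.0  # the user has rated nothing comparable
    return sum(S[i, j] * R[u, j] for j in rated) / weight


R = np.array([
    [5, 5, 1],
    [4, 4, 1],
    [5, 0, 1],  # user 2 has not rated book 1
    [1, 1, 5],
], dtype=float)
S = item_similarity(R)
p = predict_item_based(R, S, u=2, i=1)
print(np.round(S, 2), round(p, 2))
```

Because the similarity matrix S depends only on the item columns, it can be computed once and reused for every user, which is the pre-computation advantage of the item-based approach.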

6.4 HYBRID RECOMMENDATION SYSTEM

Most recommender systems now use a hybrid approach, combining collaborative filtering,
content-based filtering, and other approaches. There is no reason why several different
techniques of the same type could not be hybridized. Hybrid approaches can be implemented in
several ways: by making content-based and collaborative-based predictions separately and then
combining them; by adding content-based capabilities to a collaborative-based approach (and vice
versa); or by unifying the approaches into one model. Several studies have empirically compared
the performance of hybrid methods with pure collaborative and content-based methods and
demonstrated that hybrid methods can provide more accurate recommendations than pure approaches.

In this project, the hybrid is implemented by making popularity, content-based and collaborative
predictions separately and then combining them.

A system that combines content-based filtering and collaborative filtering can potentially take
advantage of both the representation of the content and the similarities among users. One
approach to combining collaborative and content-based filtering is to make predictions based on a
weighted average of the content-based recommendations and the collaborative recommendations.
Various means of doing so are:

• Combining item scores

• Combining item ranks

We combine the ratings from the popularity model, content-based filtering and collaborative
filtering to get more accurate results. The predicted rating is a weighted combination of the
ratings from these three methods. Equal weights are given to the collaborative and content
ratings.

Rhybrid = (1 - 2a) * Rpopularity + a * Rcollaborative + a * Rcontent (where a = 0.4)
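With a = 0.4 the weights are 0.2 for popularity and 0.4 each for the collaborative and content ratings. The sketch below applies the formula to one book; the function name and the three input ratings are hypothetical.

```python
def hybrid_rating(popularity, collaborative, content, a=0.4):
    """R_hybrid = (1 - 2a) * R_popularity + a * R_collaborative + a * R_content."""
    return (1 - 2 * a) * popularity + a * collaborative + a * content


# Hypothetical ratings of one book from the three models.
score = hybrid_rating(popularity=3.9, collaborative=4.5, content=3.0)
print(round(score, 2))
```

Books are then sorted by this combined score, so a book needs support from at least two of the three models to reach the top of the list.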

FUTURE SCOPE

Recommender systems have been an active area of research for a decade or so and continue to be
an interesting research domain. Although recommender systems have witnessed unprecedented
improvements, starting from very primitive content-based and collaborative filtering methods, a
lot of research is going on to further enhance output accuracy and to improve all dimensions of
recommender systems. The research is focused on various areas to make recommender systems (RS)
more usable and practical in real-life scenarios. The following are some of the areas of RS in
which intense research is going on; these efforts are surely shaping the future of recommender
systems.

1.Privacy

Privacy-preserving RS is one of the major challenges in developing a practical RS. There are
various real-life situations where obtaining input data is not easy, which at times makes it
extremely difficult for the recommender system to make a reliable recommendation. In the case of
systems such as a medical recommendation system, the available input data is sparse, as medical
history is often treated as personal, confidential information. As a result, developing a
reliable medical recommender system, or any such system that requires data considered private and
confidential, is extremely difficult. In [26], an approach to privacy-preserving RS is detailed
that makes use of homomorphic cryptography to achieve this.

2.Recommendation list diversity

Most research into recommending items has focused on the accuracy of predicted ratings. Other
factors have also been identified as important to users. One such factor is the diversity of
items in the recommendation list. In a user survey aimed at evaluating the effect of
diversification on user satisfaction, it was found that diversification had a positive effect on
overall satisfaction even though the accuracy of the recommendations was adversely affected.
There is a great need for a shift in focus towards the functionality offered by recommender
systems that can directly exploit the usage data and add more value for the user.

3.Dynamics in user interest

Human beings have varied interests and, most importantly, these interests are dynamic. A
recommender system needs to adapt to this dynamism. Most personalization systems tend to use a
static profile of the user. However, user interests are not static; they change with time and
context. Few systems have attempted to handle the dynamics within the user profile. The behavior
of users varies over time, and this should affect the construction of models. A recommender
system should be able to adapt to the user's behavior when it changes.

4.Data sparseness and cold start

In many practical datasets, data sparseness has been found to be a major issue; many recommender
algorithms make this issue worse, and the cold start problem is becoming a deterrent to RS usage.
There are many research initiatives aimed at reducing data sparseness using singular value
decomposition.

5.Scalable RS

Although the accuracy of RS is being enhanced, the computing requirements are also becoming more
and more complex. Scalable RS has become a pressing need for practical use-case scenarios and is
indeed an area of focused research.

6.Collaborative RS

Recommender systems need to collaborate among themselves in order to increase the accuracy of
RS and also increase the scope of RS. These collaborative RSs would be linked to each other over
a simplified but standard interface and would be complementary to each other.

RELATED WORK

In this section we review some of the works related to various approaches to Recommender Systems
(RS). A lot of work has been done in the area of recommender systems in general. The
collaborative filtering, content-based and hybrid approaches and the issues of recommender
systems are explained in the survey by Adomavicius. A new algorithm for increasing the accuracy
of collaborative filtering is discussed by Herlocker. The hybrid method is a filtering technique
that combines collaborative filtering and content-based filtering. Context-aware RS is discussed
by G. Adomavicius and Alexander Tuzhilin. Their work details the modeling of contextual
information in recommendation systems. They also describe contextual pre-filtering,
post-filtering and contextual modeling, and mention the possibility of combining the three in
order to achieve higher accuracy in RS output. Matthias, Gernot Bauer explore the design space
of RS for mobile applications and describe different dimensions and techniques for capturing the
users, the items, the contexts, etc. Sofiane Abbar, Mokrane and Stephane present an approach
based on data personalization and a Personal Access Model that provides a set of personalization
services. Daniar discusses different types of traditional as well as modern approaches to RS,
along with the main challenges in RS. Chang E, Thomson P et al. discuss the dynamic and fuzzy
nature of trust and its impact. John O'Donovan and Barry Smyth discuss the impact of trust in
recommender systems; different computational models for trust are also discussed.

SCREENSHOTS

REFERENCES

Francesco Ricci, Lior Rokach and Bracha Shapira, Introduction to Recommender Systems
Handbook, Recommender Systems Handbook, Springer, 2011, pp. 1-35.

"Facebook, Pandora Lead Rise of Recommendation Engines - TIME". TIME.com. 27 May 2010.
Retrieved 1 June 2015.

Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Bosagh Zadeh,
WTF: The who-to-follow system at Twitter, Proceedings of the 22nd international
conference on World Wide Web.

H. Chen, A. G. Ororbia II, C. L. Giles, ExpertSeer: a Keyphrase Based Expert Recommender
for Digital Libraries, arXiv preprint, 2015.

H. Chen, L. Gou, X. Zhang, C. Giles, Collabseer: a search engine for collaboration
discovery, in ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2011.

Alexander Felfernig, Klaus Isak, Kalman Szabo, Peter Zachar, The VITA Financial Services
Sales Support Environment, in AAAI/IAAI 2007, pp. 1692-1699, Vancouver, Canada, 2007.

Hosein Jafarkarimi, A.T.H. Sim and R. Saadatdoost, A Naïve Recommendation Model for
Large Databases, International Journal of Information and Education Technology, June 2012.

Prem Melville and Vikas Sindhwani, Recommender Systems, Encyclopedia of Machine
Learning, 2010.

R. J. Mooney & L. Roy (1999). Content-based book recommendation using learning for text
categorization. In Workshop Recom. Sys.: Algo. and Evaluation.

Rubens, Neil; Elahi, Mehdi; Sugiyama, Masashi; Kaplan, Dain (2016). "Active Learning in
Recommender Systems". In Ricci, Francesco; Rokach, Lior; Shapira, Bracha (eds.).
Recommender Systems Handbook (2 ed.). Springer US. doi:10.1007/978-1-4899-7637-6_24.
ISBN 978-1-4899-7637-6.

Elahi, Mehdi; Ricci, Francesco; Rubens, Neil (2016). "A survey of active learning in
collaborative filtering recommender systems". Computer Science Review. 20: 29–50.
doi:10.1016/j.cosrev.2016.05.002.

Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar, David M. Pennock (2002). Methods
and Metrics for Cold-Start Recommendations. Proceedings of the 25th Annual
International ACM SIGIR Conference on Research and Development in Information Retrieval
(SIGIR 2002). New York City, New York: ACM. pp. 253–260. ISBN 1-58113-561-0.
Retrieved 2008-02-02.

Karlgren, Jussi. "An Algebra for Recommendations." Syslab Working Paper 179 (1990).

Karlgren, Jussi. "Newsgroup Clustering Based On User Behavior-A Recommendation
Algebra." SICS Research Report (1994).

Karlgren, Jussi (October 2017). "A digital bookshelf: original work on recommender
systems". Retrieved 27 October 2017.

Shardanand, Upendra, and Pattie Maes. "Social information filtering: algorithms for
automating “word of mouth”." In Proceedings of the SIGCHI conference on Human factors
in computing systems, pp. 210-217. ACM Press/Addison-Wesley Publishing Co., 1995.

Hill, Will, Larry Stead, Mark Rosenstein, and George Furnas. "Recommending and
evaluating choices in a virtual community of use." In Proceedings of the SIGCHI conference
on Human factors in computing systems, pp. 194-201. ACM Press/Addison-Wesley
Publishing Co., 1995.

Resnick, Paul, Neophytos Iacovou, Mitesh Suchak, Peter Bergström, and John Riedl.
"GroupLens: an open architecture for collaborative filtering of netnews." In Proceedings of
the 1994 ACM conference on Computer supported cooperative work, pp. 175-186. ACM,
1994.

Resnick, Paul, and Hal R. Varian. "Recommender systems." Communications of the ACM
40, no. 3 (1997): 56-58.

Montaner, M.; Lopez, B.; de la Rosa, J. L. (June 2003). "A Taxonomy of Recommender
Agents on the Internet". Artificial Intelligence Review. 19 (4): 285–330.
doi:10.1023/A:1022850703159.

Adomavicius, G.; Tuzhilin, A. (June 2005). "Toward the Next Generation of
Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions". IEEE
Transactions on Knowledge and Data Engineering. 17 (6): 734–749.
CiteSeerX 10.1.1.107.2790. doi:10.1109/TKDE.2005.99.

Herlocker, J. L.; Konstan, J. A.; Terveen, L. G.; Riedl, J. T. (January 2004). "Evaluating
collaborative filtering recommender systems". ACM Trans. Inf. Syst. 22 (1): 5–53.
CiteSeerX 10.1.1.78.8384. doi:10.1145/963770.963772.

Beel, J.; Genzmehr, M.; Gipp, B. (October 2013). "A Comparative Analysis
of Offline and Online Evaluations and Discussion of Research Paper Recommender System
Evaluation". Proceedings of the Workshop on Reproducibility and Replication in
Recommender Systems Evaluation (RepSys) at the ACM Recommender System Conference
(RecSys).

Beel, J.; Langer, S.; Genzmehr, M.; Gipp, B.; Breitinger, C. (October 2013). "Research
Paper Recommender System Evaluation: A Quantitative Literature Survey". Proceedings of
the Workshop on Reproducibility and Replication in Recommender Systems Evaluation
(RepSys) at the ACM Recommender System Conference (RecSys).

Beel, J.; Gipp, B.; Langer, S.; Breitinger, C. (26 July 2015). "Research Paper Recommender
Systems: A Literature Survey". International Journal on Digital Libraries. 17 (4): 305–338.
doi:10.1007/s00799-015-0156-0.

Waila, P.; Singh, V.; Singh, M. (26 April 2016). "A Scientometric Analysis of Research in
Recommender Systems". Journal of Scientometric Research. 5: 71–84.
doi:10.5530/jscires.5.1.10.

Stack, Charles. "System and method for providing recommendation of goods and services
based on recorded purchasing history." U.S. Patent 7,222,085, issued May 22, 2007.

Herz, Frederick SM. "Customized electronic newspapers and advertisements." U.S. Patent
7,483,871, issued January 27, 2009.

Herz, Frederick, Lyle Ungar, Jian Zhang, and David Wachob. "System and method for
providing access to data using customer profiles." U.S. Patent 8,056,100, issued November 8,
2011.

Harbick, Andrew V., Ryan J. Snodgrass, and Joel R. Spiegel. "Playlist-based detection of
similar digital works and work creators." U.S. Patent 8,468,046, issued June 18, 2013.


