Final
A Mini project report submitted in partial fulfillment of the requirements for the award of
the degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted by
V.SAI ALEKHYA(1210316358)
M.VINAY(1210316330)
T.RAHUL(1210316356)
N.SHOURI(1210316343)
GITAM
(Deemed to be University)
VISAKHAPATNAM
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
GITAM
(Deemed to be University)
DECLARATION
We hereby declare that the project review entitled WEB BASED BOOK RECOMMENDER
SYSTEM is an original work done in the Department of Computer Science and Engineering, GITAM
Institute of Technology, GITAM (Deemed to be University), submitted in partial fulfillment of the
requirements for the award of the degree of B.Tech in Computer Science and Engineering. The
work has not been submitted to any other college or university for the award of any degree or
diploma.
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
GITAM
(Deemed to be University)
CERTIFICATE
This is to certify that the Mini-Project Report entitled WEB BASED RECOMMENDER
SYSTEM is a bonafide record of work carried out by V.SAI ALEKHYA(1210316358),
M.VINAY(1210316330), N.SHOURI(1210316343), T.RAHUL(1210316356), submitted in
partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in
Computer Science and Engineering.
ABSTRACT
Online recommendation systems have become a trend. Nowadays, rather than going out and
buying items in person, users rely on online recommendations, which provide an easier and quicker
way to buy items, and transactions are also quick when done online. Recommender systems are a
powerful technology that helps users find the items they want to buy. On the Internet, where the
number of choices is overwhelming, there is a need to filter, prioritize and efficiently deliver relevant
information in order to alleviate the problem of information overload, which has created a potential
problem for many Internet users. Recommender systems solve this problem by searching through
large volumes of dynamically generated information to provide users with personalized content and
services. We explore the characteristics and potential of different prediction techniques
in recommendation systems in order to serve as a compass for research and practice in the field.
In this project, we attempt to understand the different kinds of
recommendation systems, such as the popularity model, content-based filtering and collaborative
filtering, and compare their performance on the goodbooks dataset. We combine the ratings from
the popularity model, content-based filtering and collaborative filtering to get more accurate results.
ACKNOWLEDGEMENT
This Mini-Project was a great opportunity for learning and for professional
development. We therefore consider ourselves lucky to have been given the
opportunity to be a part of it.
We would like to thank the project review faculty, Mr. S. Praveen Kumar (Asst. Professor),
for reviewing the project every week, explaining our mistakes, setting our project on the
correct path and lending a helping hand in the successful completion of the project.
We would like to express our deepest thanks to Dr. Konala Thammi Reddy sir, HOD of the
CSE Department, GIT, for giving us a great opportunity to complete the project successfully. We
take this moment to acknowledge his contribution gratefully.
We would also like to thank B. Rajesh sir (AMC), who motivated us and explained
the importance of the project in student life.
We perceive this opportunity as a big milestone in our career development. We will strive to use
the skills and knowledge gained in the best possible way, and we will continue to work on their
improvement in order to attain our desired career objectives. We hope to continue cooperating with
all of you in the future.
TABLE OF CONTENTS
ABSTRACT
CHAPTER 1
INTRODUCTION
1.1 MOTIVATION
CHAPTER 2
LITERATURE SURVEY
CHAPTER 3
ANALYSIS
3.1 RECOMMENDATION FILTERING TECHNIQUES
CHAPTER 4
DESIGN
4.1 UML DIAGRAMS
RELATED WORK
SCREENSHOTS
REFERENCES
LIST OF FIGURES
CHAPTER 1
INTRODUCTION
A recommender system is a type of information filtering system that predicts the rating or
preference that a user would give to an item. Recommender systems are sometimes referred to as
recommendation systems.
Recommendation systems describe web applications that predict a user’s response to options.
Recommender systems are targeted at individuals who do not have enough personal experience
to evaluate the potentially overwhelming alternatives that a web site, for instance, may offer.
Recommender systems often provide personalized recommendations of items to users as a
ranked list of predicted items. Individuals rely on recommendations provided by
others in making routine daily decisions. For example, it is common to rely on peer
recommendations when selecting a book to read, and employers use recommendation letters in their
recruiting decisions.
Due to the advances in recommender systems, users constantly expect good recommendations.
They have a low threshold for services that are not able to make appropriate suggestions. If a
music streaming app is not able to predict and play music that the user likes, then the user will
simply stop using it. This has led to a high emphasis by tech companies on improving their
recommendation systems. However, the problem is more complex than it seems.
Every user has different preferences and likes. In addition, even the taste of a single user can vary
depending on a large number of factors, such as mood, season, or the type of activity the user is doing.
For example, the type of music one would like to hear while exercising differs greatly from the
type of music one would listen to when cooking dinner. Another issue that recommendation systems
have to solve is the exploration vs. exploitation problem. They must explore new domains to
discover more about the user, while still making the most of what is already known about the
user. Two main approaches are widely used for recommender systems. One is content-based
filtering, where we try to profile the user’s interests using the information collected, and recommend
items based on that profile. The other is collaborative filtering, where we try to group similar users
together and use information about the group to make recommendations to the user.
1.1 MOTIVATION
The explosive growth in the amount of available digital information and the number of visitors to
the Internet has created a potential challenge of information overload, which hinders timely access
to items of interest on the Internet. Information retrieval systems such as Google, Devil Finder and
Altavista have partially solved this problem, but prioritization and personalization (where a system
maps available content to a user’s interests and preferences) of information were absent. This has
increased the demand for recommender systems more than ever before. Recommender systems are
information filtering systems that deal with the problem of information overload by filtering vital
information fragments out of a large amount of dynamically generated information according to the
user’s preferences, interests, or observed behavior about items. A recommender system has the
ability to predict whether a particular user would prefer an item or not based on the user’s profile.
In an e-commerce setting, recommender systems enhance revenues because they are an effective
means of selling more products. In scientific libraries, recommender systems support users by
allowing them to move beyond catalog searches. Therefore, the need to use efficient and accurate
recommendation techniques within a system that will provide relevant and dependable
recommendations for users cannot be over-emphasized.
1.2 PROBLEM STATEMENT
CHAPTER 2
LITERATURE SURVEY
A recommender system is defined as a decision-making strategy for users in complex
information environments. From the perspective of e-commerce, a recommender system has also
been defined as a tool that helps users search through records of knowledge related to their
interests and preferences. It has further been defined as a means of assisting and augmenting
the social process of using the recommendations of others to make choices when there is insufficient
personal knowledge or experience of the alternatives. Recommender systems handle the problem
of information overload that users normally encounter by providing them with personalized,
exclusive content and service recommendations. Recently, various approaches for building
recommendation systems have been developed, which can utilize collaborative filtering,
content-based filtering or hybrid filtering. Collaborative filtering is the most mature and
the most commonly implemented technique. It recommends items by identifying other
users with similar taste and using their opinions to recommend items to the active user. Collaborative
recommender systems have been implemented in different application areas. GroupLens is a
news-based architecture that employed collaborative methods to assist users in locating articles
from a massive news database. Ringo is an online social information filtering system that uses
collaborative filtering to build user profiles based on their ratings of music albums. Amazon uses
topic diversification algorithms to improve its recommendations. The system uses collaborative
filtering to overcome the scalability issue by generating a table of similar items offline through the
use of an item-to-item matrix.
The system then recommends, online, other products that are similar according to the users’
purchase history. Content-based techniques, on the other hand, match content resources to user
characteristics. Content-based filtering techniques normally base their predictions on the user’s own
information, and they ignore contributions from other users, unlike collaborative
techniques. Fab relies heavily on the ratings of different users in order to create a training set, and
it is an example of a content-based recommender system. Other systems that use content-based
filtering to help users find information on the Internet include Letizia.
The system makes use of a user interface that assists users in browsing the Internet; it is able to
track the browsing pattern of a user to predict the pages that they may be interested in. Pazzani et
al. designed an intelligent agent that attempts to predict which web pages will interest a user by
using a naive Bayesian classifier. The agent allows a user to provide training instances by rating
different pages as either hot or cold. Jennings and Higuchi describe a neural network that models
the interests of a user in a Usenet news environment.
Despite the success of these two filtering techniques, several limitations have been identified.
Some of the problems associated with content-based filtering techniques are limited content
analysis, overspecialization and sparsity of data. Collaborative approaches, in turn, exhibit cold-start,
sparsity and scalability problems. These problems usually reduce the quality of recommendations.
In order to mitigate some of the problems identified, hybrid filtering, which combines two or more
filtering techniques in different ways in order to increase the accuracy and performance of
recommender systems, has been proposed. These techniques harness the strengths of the combined
approaches while leveling out their corresponding weaknesses.
They can be classified based on their operations into weighted hybrid, mixed hybrid, switching
hybrid, feature-combination hybrid, cascade hybrid, feature-augmented hybrid and meta-level
hybrid.
Hybrids of collaborative filtering and content-based filtering are widely used today. They can be
built by implementing content-based and collaborative techniques separately and later combining
the results of their predictions, or by adding the characteristics of content-based filtering to
collaborative filtering and vice versa. Finally, a general unified model that incorporates both
content-based and collaborative filtering properties could be developed. The problem of data
sparsity and cold-start has been addressed by combining the ratings, features and demographic
information about items in a cascade hybrid recommendation technique. In Ziegler et al., a hybrid
collaborative filtering approach was proposed to exploit bulk taxonomic information designed for
exacting product classification to address the data sparsity problem of CF recommendations, based
on the generation of profiles via inference of super-topic score and topic diversification. A hybrid
recommendation technique is also proposed in Ghazanfar and Prügel-Bennett, which uses the
content-based profile of an individual user to find similar users, who are then used to make
predictions. In Sarwar et al.,
collaborative filtering was combined with an information filtering agent. Here, the authors
proposed a framework for integrating content-based filtering agents and collaborative filtering.
A hybrid recommender algorithm is employed by many applications as a result of the new-user
problem of content-based filtering techniques and the average-user problem of collaborative filtering.
A simple and straightforward method for combining content-based and collaborative filtering was
proposed by Cunningham et al. A music recommendation system that combined tagging
information, play counts and social relations was proposed in Konstas et al. In order to determine
the number of neighbors that can be automatically connected on a social platform, Lee and
Brusilovsky embedded social information into a collaborative filtering algorithm. A
Bayesian mixed-effects model that integrates user ratings and user and item features in a single
unified framework was proposed by Condliff et al.
CHAPTER 3
ANALYSIS
The use of efficient and accurate recommendation techniques is very important for a system that
is to provide good and useful recommendations to its individual users. This explains the importance
of understanding the features and potential of different recommendation
techniques. The figure shows the anatomy of different recommendation filtering techniques.
3.1.1 CONTENT BASED FILTERING
Content-Based Filtering (CBF) is one of the traditional types of recommender systems. It has its
roots in information retrieval and information filtering research. In this
method, the algorithm suggests new items to users based on the interests they have shown in the
past. Content-based filtering can be used in different recommendation systems such as news article
recommendation systems or TV program recommendation systems. The method varies somewhat in
each of these systems. However, some fundamental concepts stay the same, like the two sets of
information that it works with:
1) a set of features that describe the items to be recommended and
2) a user profile built from past choices that the user made.
In content-based recommender systems, the descriptive attributes of items are used to make
recommendations. The term “content” refers to these descriptions. In content-based methods, the
ratings and buying behavior of users are combined with the content information available in the
items. For example, consider a situation where John has rated the movie Terminator highly, but
we do not have access to the ratings of other users. Therefore, collaborative filtering methods are
ruled out. However, the item description of Terminator contains similar genre keywords as other
science fiction movies, such as Alien and Predator. In such cases, these movies can be
recommended to John.
In content-based methods, the item descriptions, which are labeled with ratings, are used as
training data to create a user-specific classification or regression modeling problem. For each user,
the training documents correspond to the descriptions of the items she has bought or rated. The
class (or dependent) variable corresponds to the specified ratings or buying behavior. These
training documents are used to create a classification or regression model, which is specific to the
user at hand (or active user). This user-specific model is used to predict whether the corresponding
individual will like an item for which her rating or buying behavior is unknown.
Finally, content-based filtering uses the information gained from the two sets to recommend a new
item: the system compares any new item with those that exist in the user’s profile. However, CB
techniques have some limitations, like the data scarcity problem. The only resource for modeling
user interest is extracting features from their browsing or purchasing history. Therefore, CB
systems are not able to identify different kinds of items that the user may enjoy, because they
attempt to find those items that are very similar to the items in the history of that user.
The content-based technique is a domain-dependent algorithm that places more emphasis on the
analysis of the attributes of items in order to generate predictions. When documents such as web
pages, publications and news are to be recommended, the content-based filtering technique is the
most successful. In content-based filtering, recommendations are made based on user
profiles, using features extracted from the content of the items the user has evaluated in the past.
Items that are most closely related to the positively rated items are recommended to the user.
CBF uses different types of models to find the similarity between documents in order to generate
meaningful recommendations. It could use a vector space model such as Term Frequency-Inverse
Document Frequency (TF-IDF), or probabilistic and machine learning models such as Naïve Bayes
classifiers, decision trees or neural networks, to model the relationship between different documents
within a corpus. These techniques make recommendations by learning the underlying model with
either statistical analysis or machine learning techniques. The content-based filtering technique does
not need the profiles of other users since they do not influence the recommendation. Also, if the user
profile changes, the CBF technique still has the potential to adjust its recommendations within a very
short period of time. The major disadvantage of this technique is the need to have an in-depth
knowledge and description of the features of the items in the profile.
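The vector space idea described above can be illustrated with a minimal sketch. It assumes each item is described by a short string of keywords, uses a plain TF-IDF weighting with cosine similarity, and relies only on the Python standard library. The item descriptions and the function names (tfidf_vectors, cosine, recommend_similar) are hypothetical and are not taken from the project code.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document (TF times log IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_similar(liked_index, docs, top_n=2):
    """Rank the other items by similarity to the item the user liked."""
    vectors = tfidf_vectors(docs)
    scored = [
        (cosine(vectors[liked_index], vec), idx)
        for idx, vec in enumerate(vectors)
        if idx != liked_index
    ]
    scored.sort(reverse=True)
    # Items that share no weighted keyword with the liked item are dropped.
    return [idx for score, idx in scored[:top_n] if score > 0]


# Hypothetical keyword descriptions; item 0 is the one the user rated highly.
descriptions = [
    "science fiction robot future war",
    "science fiction alien space horror",
    "romantic comedy wedding city",
    "science fiction alien jungle hunter",
]
recommended = recommend_similar(0, descriptions)
```

In this toy data the two other science fiction items are returned and the romantic comedy is not, because it shares no keywords with the liked item. A real system would build the user profile from many rated items rather than a single one.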
CB filtering techniques overcome some of the challenges of CF. They have the ability to recommend
new items even if there are no ratings provided by users. So even if the database does not contain
user preferences, recommendation accuracy is not affected. Also, if the user preferences change, the
technique has the capacity to adjust its recommendations in a short span of time. CB techniques can
manage situations where different users do not share the same items, but only items that are identical
according to their
intrinsic features. Users can get recommendations without sharing their profile, and this ensures
privacy. The CBF technique can also explain to users how recommendations are generated.
However, the techniques suffer from various problems, as discussed in the literature. Content-based
filtering techniques are dependent on items’ metadata. That is, they require a rich description
of items and a very well organized user profile before a recommendation can be made to users. This
is called limited content analysis. So, the effectiveness of CBF depends on the availability
of descriptive data. Content overspecialization is another serious problem of the CBF technique.
Users are restricted to getting recommendations similar to items already defined in their profiles.
Collaborative filtering is the type of recommendation algorithm that bases its predictions and
recommendations on the ratings or behavior of other users in the system. The fundamental idea of
collaborative filtering is to find other users in the community who share the active user’s opinions.
Collaborative filtering models use the collaborative power of the ratings provided by multiple users
to make recommendations. The main challenge in designing collaborative filtering methods is that
the underlying ratings matrices are sparse. Consider an example of a movie application in which
users specify ratings indicating their like or dislike of specific movies. Most users would have
viewed only a small fraction of the large universe of available movies. As a result, most of the
ratings are unspecified. The specified ratings are also referred to as observed ratings. Throughout
this report, the terms “specified” and “observed” will be used interchangeably. The
unspecified ratings will be referred to as “unobserved” or “missing.”
The basic idea of collaborative filtering methods is that these unspecified ratings can be imputed
because the observed ratings are often highly correlated across various users and items. For
example, consider two users named Alice and Bob, who have very similar tastes. If the ratings,
which both have specified, are very similar, then their similarity can be identified by the underlying
algorithm. In such cases, it is very likely that the ratings for which only one of them has specified
a value are also similar. This similarity can be used to make inferences about
incompletely specified values. Most of the models for collaborative filtering focus on leveraging
either inter-item correlations or inter-user correlations for the prediction process. Some models use
both types of correlations. Furthermore, some models use carefully designed optimization
techniques to create a training model in much the same way a classifier creates a training model
from the labeled data. This model is then used to impute the missing values in the matrix, in the
same way that a classifier imputes the missing test labels.
Collaborative filtering is a domain-independent prediction technique for content, such as movies
and music, that cannot easily and adequately be described by metadata. The collaborative filtering
technique works by building a database (user-item matrix) of preferences for items by users. It
then matches users with relevant interests and preferences by calculating similarities between their
profiles to make recommendations. Such users form a group called a neighborhood. A user gets
recommendations for those items that he has not rated before but that were already positively rated
by users in his neighborhood. The output produced by CF can be either a
prediction or a recommendation. A prediction is a numerical value, Rij, expressing the predicted score
of item j for the user i, while a recommendation is a list of the top N items that the user will like the
most, as shown in the figure. The technique of collaborative filtering can be divided into two
categories: memory-based and model-based.
There are two popular approaches of collaborative filtering:
A. User-based approach
The book recommendation system uses the ratings of other users with similar preferences to
recommend a book to a certain user. User-based recommendation algorithms first identify
the k most similar users to the active user using the Pearson correlation or the vector-space model, in
which each user is treated as a vector in the m-dimensional item space, and the similarities between
the active user and other users are computed between the vectors. After the k most similar users
have been discovered, their corresponding rows in the user-item matrix R are aggregated to
identify a set of books, C, rated by the group, together with their frequency. With the set C,
user-based CF techniques then recommend the top-N most frequent elements in C that the active
user has not rated (XiaoyuanSu, 2009).
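The user-based steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: ratings are held in a nested dictionary {user: {book: rating}}, similarity is the Pearson correlation over co-rated books, and the user names, book identifiers and function names are hypothetical rather than taken from the project.

```python
import math
from collections import Counter


def pearson(ratings_a, ratings_b):
    """Pearson correlation computed over the items both users have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_b = math.sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    return num / (den_a * den_b) if den_a and den_b else 0.0


def recommend_user_based(active, ratings, k=2, top_n=2):
    """Top-N most frequent unseen items among the k most similar users."""
    sims = [
        (pearson(ratings[active], other_ratings), user)
        for user, other_ratings in ratings.items()
        if user != active
    ]
    neighbours = [user for _, user in sorted(sims, reverse=True)[:k]]
    # C: items rated by the neighbourhood but not by the active user,
    # together with how many neighbours rated each of them.
    counts = Counter(
        item
        for user in neighbours
        for item in ratings[user]
        if item not in ratings[active]
    )
    return [item for item, _ in counts.most_common(top_n)]


# Hypothetical ratings on a 1 to 5 scale.
ratings = {
    "alice": {"b1": 5, "b2": 3, "b3": 4},
    "bob": {"b1": 5, "b2": 2, "b3": 4, "b4": 5, "b5": 4},
    "carol": {"b1": 4, "b2": 3, "b3": 5, "b4": 4},
    "dave": {"b1": 1, "b2": 5, "b3": 2, "b6": 5},
}
suggestions = recommend_user_based("alice", ratings)
```

Here bob and carol form the neighborhood of alice, so the book both of them rated (b4) is ranked ahead of the book only one of them rated (b5). The sketch ranks by frequency as the text describes; many implementations instead weight neighbor ratings by similarity to produce a predicted score.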
B. Item-based approach
Though the user-based approach is useful, it suffers from a scalability problem as the user base
grows. Searching for the neighbors of a user becomes time-consuming. To extend collaborative
filtering to a large user base, a more scalable version of collaborative filtering, i.e. the item-based
approach, was introduced. In the item-based approach, instead of using similarities between users’
ratings to predict preferences, similarities between the rating patterns of items are
considered. Thus, the overall structure of this approach is similar to that of the content-based
approach to recommendation and personalization, but item similarity is deduced from user
preference patterns rather than extracted from the item data. In its raw form, item-item CF
does not fix anything: it is still necessary to find the most similar items to generate predictions and
recommendations. In a system that has more users than items, it allows the neighborhood finding
to be amongst the smaller of the two dimensions.
The significant performance gain occurs because the approach lends itself well to pre-computing the
similarity matrix. In the user-based approach, by contrast, as a user rates and re-rates items, their
rating vector changes along with their similarity
to other users. Finding similar users in advance is, therefore, complicated: a user’s neighborhood
is determined not only by their ratings but also by the ratings of other users, so their neighborhood
can change as a result of new ratings supplied by any user in the system. For this reason, most
user-based CF systems find neighborhoods at the time when predictions or recommendations are
needed (Ekstrand, 2010).
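The pre-computation mentioned above can be sketched as follows. The example assumes cosine similarity between item rating vectors and a similarity-weighted average for prediction; both are common choices but are assumptions here, as are the data and the function names.

```python
import math
from collections import defaultdict


def item_vectors(ratings):
    """Invert {user: {item: rating}} into {item: {user: rating}}."""
    items = defaultdict(dict)
    for user, user_ratings in ratings.items():
        for item, value in user_ratings.items():
            items[item][user] = value
    return dict(items)


def cosine(u, v):
    """Cosine similarity between two item vectors indexed by user."""
    dot = sum(value * v[user] for user, value in u.items() if user in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_item_similarity(ratings):
    """Item-item similarity table; this is the part that can be built offline."""
    vectors = item_vectors(ratings)
    return {
        a: {b: cosine(vectors[a], vectors[b]) for b in vectors if b != a}
        for a in vectors
    }


def predict(user_ratings, item, similarity):
    """Similarity-weighted average of the user's own ratings."""
    num = den = 0.0
    for rated_item, value in user_ratings.items():
        sim = similarity.get(item, {}).get(rated_item, 0.0)
        num += sim * value
        den += abs(sim)
    return num / den if den else None


# Hypothetical ratings on a 1 to 5 scale.
ratings = {
    "u1": {"a": 5, "b": 4},
    "u2": {"a": 4, "b": 5, "c": 4},
    "u3": {"b": 2, "c": 5},
}
similarity = build_item_similarity(ratings)  # computed once, ahead of time
predicted = predict(ratings["u1"], "c", similarity)  # cheap lookup at request time
```

Because the table depends only on item rating vectors, it can be rebuilt periodically instead of on every request, which is the performance argument made in the text.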
Collaborative filtering has some major advantages over CBF in that it can perform in domains
where there is not much content associated with items and where content is difficult for a computer
system to analyze (such as opinions and ideals). Also, the CF technique has the ability to provide
serendipitous recommendations, which means that it can recommend items that are relevant to the
user even without the content being in the user’s profile. Despite the success of CF techniques,
their widespread use has revealed some potential problems, such as the following.
The cold-start problem refers to a situation where a recommender does not have adequate
information about a user or an item in order to make relevant predictions. This is one of the major
problems that reduce the performance of a recommendation system. The profile of such a new user
or item will be empty; a new user has not rated any item, so their taste is not known to the system.
New items and new users pose a significant challenge to recommender systems. Collectively these
problems are referred to as the cold-start problem. The first of these problems arises in CF systems,
where an item cannot be recommended unless some user has rated it before. This issue applies not
only to new items but also to obscure items, which is particularly detrimental to users with
heterogeneous tastes.
Since content-based approaches do not rely on ratings from other users, they can be used to
produce recommendations for all items, provided attributes of the items are available. In fact, the
content-based predictions of similar users can also be used to improve predictions further for the
active user. The new-user problem is hard to tackle since without previous record of preferences
of a user it is not possible to find similar users or to build a content-based profile. As such, research
in this area has primarily focused on effectively selecting items to be rated by a user so as to
improve recommendation performance rapidly with the least user feedback. In this setting,
classical techniques from active learning can be leveraged to address the task of item selection
(Melville, 2010).
Data sparsity is the problem that occurs as a result of a lack of information, that is, when only a few
of the total number of items available in a database are rated by users. This leads to a sparse
user-item matrix, an inability to locate successful neighbors and, finally, the generation of weak
recommendations. Data sparsity also leads to coverage problems, where coverage is the percentage
of items in the system for which recommendations can be made.
Most users do not rate most items, and hence the user rating matrix is typically very sparse. This is a
problem for collaborative filtering systems since it decreases the probability of finding a set of
users with similar ratings. This issue often occurs when a system has a very high item-to-user ratio
or the system is in the initial stages of use. It can be mitigated by using additional domain
information or making assumptions about the data generation process that allow for high-quality
imputation.
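Both quantities are simple to compute. The sketch below assumes ratings are stored as {user: {item: rating}} and treats an item as covered when it has at least one rating, which is a simplifying proxy for "an item that a pure CF system could recommend". The data and function names are hypothetical.

```python
from collections import Counter


def sparsity(ratings, n_items):
    """Fraction of cells in the user-item matrix that hold no rating."""
    n_users = len(ratings)
    observed = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1.0 - observed / (n_users * n_items)


def coverage(ratings, catalogue, min_ratings=1):
    """Fraction of catalogue items with at least min_ratings ratings (a proxy)."""
    counts = Counter(
        item for user_ratings in ratings.values() for item in user_ratings
    )
    covered = [item for item in catalogue if counts[item] >= min_ratings]
    return len(covered) / len(catalogue)


# Hypothetical data: 3 users, 5 books, 6 observed ratings.
catalogue = ["b1", "b2", "b3", "b4", "b5"]
ratings = {
    "u1": {"b1": 5, "b2": 3},
    "u2": {"b1": 4, "b3": 2},
    "u3": {"b2": 4, "b3": 5},
}
matrix_sparsity = sparsity(ratings, len(catalogue))  # 1 - 6/15
item_coverage = coverage(ratings, catalogue)         # 3 of 5 items have ratings
```

With 6 ratings in a 3 by 5 matrix, 60% of the cells are empty and only 60% of the catalogue can be reached. Real rating datasets are usually far sparser than this toy example.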
Scalability
Synonymy
Synonymy is the tendency of very similar items to have different names or entries. Most
recommender systems find it difficult to recognize that closely related items, such as
baby wear and baby cloth, are the same. Collaborative filtering systems usually find
no match between the two terms and so cannot compute their similarity. Different methods, such as
automatic term expansion, the construction of a thesaurus, and Singular Value Decomposition
(SVD), especially Latent Semantic Indexing, are capable of solving the synonymy problem. The
shortcoming of these methods is that some added terms may have different meanings from what
is intended, which sometimes leads to rapid degradation of recommendation performance.
Fraud
As recommender systems are increasingly adopted by commercial websites, they have started to
play a significant role in affecting the profitability of sellers. This has led to many unscrupulous
vendors engaging in different forms of fraud to game recommender systems for their benefit.
Typically, they attempt to inflate the perceived desirability of their products or lower the ratings
of their competitors. These types of attack have been broadly studied as shilling attacks or profile
injection attacks. Such attacks usually involve setting up dummy profiles and assume different
amounts of knowledge about the system. For instance, the average attack assumes knowledge of
the mean rating for each item; the attacker assigns values randomly distributed around this
average, along with a high score for the item being pushed.
A hybrid filtering technique combines different recommendation techniques in order to achieve
better system performance and avoid some limitations and problems of pure recommendation systems.
The idea behind hybrid techniques is that a combination of algorithms will provide more accurate
and effective recommendations than a single algorithm, as the disadvantages of one algorithm can
be overcome by another algorithm. Using multiple recommendation techniques can suppress the
weaknesses of an individual technique in a combined model. The combination of approaches can
be done in any of the following ways: separate implementation of algorithms and combining the
result, utilizing some content-based filtering in a collaborative approach, utilizing some
collaborative filtering in a content-based approach, or creating a unified recommendation system
that brings together both approaches.
Recent research has demonstrated that a hybrid approach, combining collaborative filtering and
content-based filtering, could be more effective in some cases. Hybrid approaches can be
implemented in several ways: by making content-based and collaborative-based predictions
separately and then combining them; by adding content-based capabilities to a collaborative-based
approach (and vice versa); or by unifying the approaches into one model. Several studies
empirically compare the performance of the hybrid with the pure collaborative and content-based
methods and demonstrate that the hybrid methods can provide more accurate recommendations
than pure approaches. These methods can also be used to overcome some of the common problems
in recommendation systems such as cold start and the sparsity problem.
A variety of techniques have been proposed as the basis for recommendation systems:
collaborative, content-based, knowledge-based, and demographic techniques. Each of these
techniques has known shortcomings, such as the well-known cold-start problem for collaborative
and content-based systems and the knowledge engineering bottleneck in knowledge-based
approaches. A hybrid recommendation system is one that combines multiple techniques together
to achieve some synergy between them.
1. PYTHON IDLE
2. DATASET
The dataset used for the project is goodbooks-10k. It contains ratings for ten thousand popular
books. There are generally 100 reviews for each book, although some have fewer. Ratings range
from one to five.
Contents of dataset:
1. to_read.csv provides IDs of the books marked "to read" by each user, as user_id, book_id
pairs.
2. books.csv has metadata for each book (goodreads IDs, authors, title, average rating, etc.).
3. book_tags.csv contains tags/shelves/genres assigned by users to books. Tags in this file
are represented by their IDs.
4. tags.csv translates tag IDs to names.
5. ratings.csv gives the ratings of all the books.
As to the source, these ratings were found on the internet.
Both book IDs and user IDs are contiguous: books are numbered 1-10000 and users 1-53424.
Every user has made at least two ratings, and the median number of ratings per user is 8.
The dataset also includes the books marked "to read" by users, book metadata (author, year, etc.)
and tags.
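Statistics such as the minimum and median number of ratings per user can be checked with a few lines of pandas. The sketch below runs on a small hypothetical table that only mimics the user_id, book_id, rating layout of ratings.csv; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature stand-in for ratings.csv (user_id, book_id, rating).
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "book_id": [10, 11, 12, 10, 13, 11, 12, 13, 14],
    "rating":  [5, 4, 3, 2, 5, 4, 4, 1, 3],
})

# Number of ratings made by each user.
per_user = ratings.groupby("user_id")["rating"].count()

summary = {
    "min_ratings_per_user": int(per_user.min()),
    "median_ratings_per_user": float(per_user.median()),
    "rating_range": (int(ratings["rating"].min()), int(ratings["rating"].max())),
}
print(summary)
```

Running the same groupby on the real file would be one way to confirm the figures quoted above.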
CHAPTER 4
DESIGN
4.1.1 USE CASE
4.1.3 SEQUENCE DIAGRAM
4.1.4 ACTIVITY DIAGRAM
CHAPTER 5
CODING
Python is dynamically typed and garbage collected. It supports multiple programming paradigms,
including procedural, object-oriented and functional programming.
import pandas as pd
import numpy as np
# The imports below are required by the methods that follow.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from surprise import Dataset, Reader, SVD

class hybrid(object):
    def __init__(self, user_id, ratings):
        self.user_id = user_id
        # Book metadata (book_id, authors, Genres, ...).
        self.md = pd.read_csv(r'C:/Users/TINKU/Downloads/IT556_Worthless_without_coffee_DA-IICT_Final_Project-master (1)/IT556_Worthless_without_coffee_DA-IICT_Final_Project-master/CustomData/FinalData.csv')
        self.ratings = ratings
        self.popularity_rating = self.popularity(self.md)
        # Assumed call: collaborative_rating is used by final_hybrid below.
        self.collaborative_rating = self.collaborative(self.ratings, self.user_id)
        self.content_rating = self.content_based(self.md, self.ratings, self.user_id)
        # Assumed call: combine the three scores once they are available.
        self.final_hybrid(self.md)

    def popularity(self, md):
        # Per-book average rating and rating count, precomputed in separate files.
        fd = pd.read_csv(r'C:\Users\TINKU\Downloads\IT556_Worthless_without_coffee_DA-IICT_Final_Project-master (1)\IT556_Worthless_without_coffee_DA-IICT_Final_Project-master\CustomData\AverageRatings.csv')
        fd1 = pd.read_csv(r'C:\Users\TINKU\Downloads\IT556_Worthless_without_coffee_DA-IICT_Final_Project-master (1)\IT556_Worthless_without_coffee_DA-IICT_Final_Project-master\CustomData\RatingsCount.csv')
        # C is the mean of the per-book average ratings.
        vote_averages = fd.loc[fd['rating'].notnull(), 'rating'].astype('float')
        C = vote_averages.mean()
        vote_counts = fd1.loc[fd1['rating'].notnull(), 'rating'].astype('float')
        m = len(vote_counts)
        md['ratings_count'] = fd1['rating']
        md['average_rating'] = fd['rating']
        # Assumed definition: keep only the books that have a rating count.
        qualified = md[md['ratings_count'].notnull()].copy()
        qualified['ratings_count'] = qualified['ratings_count'].astype('float')
        qualified['average_rating'] = qualified['average_rating'].astype('float')

        def weighted_rating(x):
            v = x['ratings_count']
            R = x['average_rating']
            # Weighted rating formula given in the Methodology chapter.
            return (v / (v + m) * R) + (m / (v + m) * C)

        # Assumed: the score is stored in the column selected on the next line.
        qualified['popularity_rating'] = qualified.apply(weighted_rating, axis=1)
        pop = qualified[['book_id', 'popularity_rating']]
        print(qualified.shape)
        print(pop.shape)
        return pop

    def collaborative(self, ratings, user_id):
        reader = Reader()
        temp_ratings = ratings[['user_id', 'book_id', 'rating']]
        # Assumed: the Surprise dataset is built from the ratings table.
        data = Dataset.load_from_df(temp_ratings, reader)
        # Train SVD on every known rating, then predict all unseen (user, book) pairs.
        trainset = data.build_full_trainset()
        algo = SVD()
        algo.fit(trainset)
        testset = trainset.build_anti_testset()
        predictions = algo.test(testset)
        # Keep the predicted ratings of the requested user only.
        rows = [[p.uid, p.iid, p.est] for p in predictions if p.uid == user_id]
        predicted = pd.DataFrame(rows, columns=['user_id', 'book_id', 'rating'])
        temp_ratings = pd.concat([temp_ratings, predicted], ignore_index=True)
        cb = temp_ratings[temp_ratings['user_id'] == user_id][['book_id', 'rating']]
        return cb

    def content_based(self, md, ratings, user_id):
        md['book_id'] = md['book_id'].astype('int')
        ratings['book_id'] = ratings['book_id'].astype('int')
        ratings['user_id'] = ratings['user_id'].astype('int')
        ratings['rating'] = ratings['rating'].astype('int')
        md['authors'] = md['authors'].str.lower()
        md['Genres'] = md['Genres'].str.split(';')
        # Assumed construction of the 'soup' column: authors and genres joined into one string.
        md['soup'] = md['authors'].fillna('') + ' ' + md['Genres'].apply(
            lambda g: ' '.join(g) if isinstance(g, list) else '')
        # min_df=1 because recent scikit-learn versions reject an integer min_df of 0.
        count = CountVectorizer(analyzer='word', ngram_range=(1, 1), min_df=1,
                                stop_words='english')
        count_matrix = count.fit_transform(md['soup'])
        print(count_matrix.shape)
        # Assumed: book-to-book cosine similarity over the count vectors.
        cosine_sim = cosine_similarity(count_matrix, count_matrix)
        n_books = len(md)

        def build_user_profiles():
            # Rows are users, columns are books, entries are the ratings given.
            user_profiles = np.zeros((60001, n_books))
            for row in ratings.itertuples(index=False):
                user_profiles[row.user_id][row.book_id - 1] = row.rating
            return user_profiles

        user_profiles = build_user_profiles()

        def _get_similar_items_to_user_profile(person_id):
            # Similarity-weighted average of the user's ratings, computed for every book.
            user_ratings = np.zeros((n_books, 1))
            user_sim = user_profiles[person_id]
            for i in range(n_books):
                book_sim = cosine_sim[i]
                user_ratings[i] = book_sim.dot(user_sim) / book_sim.sum()
            # Rescale so that the best-matching book scores 5.
            maxval = user_ratings.max()
            print(maxval)
            return user_ratings * 5.0 / maxval

        content_ratings = _get_similar_items_to_user_profile(user_id)
        num = md[['book_id']]
        num1 = pd.DataFrame(data=content_ratings[0:, 0:])
        # Assumed: each book_id is paired with its content score.
        content_rating = pd.concat([num, num1], axis=1)
        content_rating.columns = ['book_id', 'content_rating']
        return content_rating

    def final_hybrid(self, md):
        # Join the three scores on book_id (the popularity and content merges are assumed).
        hyb = md[['book_id']]
        hyb = hyb.merge(self.collaborative_rating, on='book_id')
        hyb = hyb.merge(self.popularity_rating, on='book_id')
        hyb = hyb.merge(self.content_rating, on='book_id')

        def weighted_rating(x):
            v = x['rating']
            R = x['popularity_rating']
            c = x['content_rating']
            # Assumed weights: collaborative and content scores are weighted equally.
            return 0.4 * v + 0.2 * R + 0.4 * c

        hyb['Hybrid Rating'] = hyb.apply(weighted_rating, axis=1)
        print(len(hyb['Hybrid Rating']))
        print(hyb)
def newUser():
    # Sample books shown to a new user: book ID, authors, title, genres.
    print('2. J.K. Rowling, Mary Harry Potter and the Sorcerer\'s Stone (Harry Potter, #1) Fantasy;Young-Age')
    print('127. Malcolm Gladwell The Tipping Point: How Little Things Can Make a Big Difference Self-Help')
    print('239. Max Brooks World War Z: An Oral History of the Zombie War Horror;Fiction')
    print('84 Michael Crichton Jurassic Park (Jurassic Park, #1) SciFi;Thriller;Fantasy')
    print('770 William Shakespeare,Roma Gill Julius Caesar History;Classic')
    print('976 Robert Kapilow, Dr. Seuss Dr. Seuss\'s Green Eggs and Ham: For Soprano, Boy Soprano, and Orchestra Kids')
    print('627 Jon Scieszka, Lane Smith The True Story of the 3 Little Pigs Kids;Fiction')
    print('745 Jenny Lawson Lets Pretend This Never Happened: A Mostly True Memoir Biography;Comedy')
    ratings = pd.read_csv(r'C:/Users/TINKU/Downloads/IT556_Worthless_without_coffee_DA-IICT_Final_Project-master (1)/IT556_Worthless_without_coffee_DA-IICT_Final_Project-master/CustomData/ratings.csv')
    ratings = ratings[1:100000]
    # The new user gets an ID outside the range of existing user IDs.
    user_id = 60000
    rating_count = len(ratings['user_id']) + 1
    print(user_id)
    # Collect five ratings from the new user and append them to the ratings table.
    for x in range(0, 5):
        print("\n")
        bookId = int(input("BookId:"))
        rating = int(input("Rating:"))
        ratings.loc[rating_count] = [user_id, bookId, rating]
        rating_count = rating_count + 1
    h = hybrid(user_id, ratings)

user = input("1. Book Recommendation for New User. \n2. Book Recommendation for Existing User.\n")
if user == '1':
    newUser()
elif user == '2':
    ratings = pd.read_csv(r'C:/Users/TINKU/Downloads/IT556_Worthless_without_coffee_DA-IICT_Final_Project-master (1)/IT556_Worthless_without_coffee_DA-IICT_Final_Project-master/CustomData/ratings.csv')
    ratings = ratings[1:100000]
    userId = int(input("\nPlease Enter User Id: "))
    print('\n----------------Welcome User' + str(userId) + '-------------------')
    h = hybrid(userId, ratings)
else:
    # Assumed fallback for any other input.
    print('Invalid choice.')
CHAPTER 6
METHODOLOGY
This section explains in detail the process of implementing the web-based book recommender system.
It discusses how the filtering methods are applied one after the other to produce the output. We
have implemented three different algorithms to build an efficient recommendation system.
As the name suggests, a popularity-based recommendation system works with the trend: it
recommends the items that are currently most popular. For example, if a product is bought by most
new users, the system is likely to suggest that item to a user who has just signed up.
The basic idea behind this recommender is that books that are more popular and more highly rated
have a higher probability of being liked by the average reader.
From the ratings matrix, the average rating and the rating count for each book are calculated.
Then the weighted rating formula is used to construct a chart. Mathematically, it is represented
as follows:
Weighted Rating(WR)=(v/(v+m)*R)+(m/(v+m)*C)
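A minimal sketch of this formula is given below. Following the listing in Chapter 5, v is taken as the rating count of a book, R as its average rating and C as the mean of the average ratings over all books; m is treated here as a tunable smoothing constant, which is an assumption of this example, as are the three sample books.

```python
import pandas as pd

def weighted_rating(v: float, R: float, m: float, C: float) -> float:
    """WR = (v/(v+m))*R + (m/(v+m))*C: pulls R towards C when v is small."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Hypothetical books: a well-rated popular one, a barely rated one, an average one.
books = pd.DataFrame({
    "book_id": [1, 2, 3],
    "ratings_count": [1000, 10, 200],
    "average_rating": [4.2, 5.0, 3.5],
})

C = books["average_rating"].mean()
m = 100  # hypothetical smoothing constant

books["popularity_rating"] = books.apply(
    lambda x: weighted_rating(x["ratings_count"], x["average_rating"], m, C),
    axis=1,
)
print(books)
```

A book with only a handful of ratings ends up close to the global mean C, while a heavily rated book keeps a score close to its own average R.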
6.2 CONTENT BASED FILTERING
The Content Based Recommendation algorithm takes into account the likes and dislikes of the user
and generates a User Profile. For generating a user profile, we take into account the item profiles
(vectors describing each item) and their corresponding user ratings. The user profile is the
weighted sum of the item profiles, with the weights being the ratings given by the user. Once the
user profile is generated, we calculate its similarity with all the items in the dataset, using
cosine similarity between the user profile and each item profile.
The advantages of the Content Based approach are that data of other users is not required and that
the recommender engine can recommend new items which have not been rated yet. Its drawback is
that the algorithm doesn’t recommend items outside the categories of items the user has already
rated.
1.Authors
2.Genres
The content-based filtering algorithm finds the cosine of the angle between the profile vector and
item vector, i.e. cosine similarity. Cosine Similarity is used to calculate a numeric quantity that
denotes the similarity between two books.
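The two steps described above, building the user profile as a rating-weighted sum of item profiles and scoring each book by cosine similarity, can be sketched as follows. The four-feature item profiles and the function names are hypothetical and only serve to illustrate the computation.

```python
import numpy as np

def build_user_profile(item_profiles: np.ndarray, ratings: np.ndarray) -> np.ndarray:
    """Weighted sum of item profiles, the weights being the user's ratings (0 = unrated)."""
    return ratings @ item_profiles

def cosine_scores(user_profile: np.ndarray, item_profiles: np.ndarray) -> np.ndarray:
    """Cosine of the angle between the user profile and every item profile."""
    dots = item_profiles @ user_profile
    norms = np.linalg.norm(item_profiles, axis=1) * np.linalg.norm(user_profile)
    return np.divide(dots, norms, out=np.zeros_like(dots, dtype=float), where=norms > 0)

# Hypothetical item profiles over the features [author_a, author_b, fantasy, history].
item_profiles = np.array([
    [1, 0, 1, 0],  # book 0: author_a, fantasy
    [1, 0, 1, 0],  # book 1: author_a, fantasy (unrated by the user)
    [0, 1, 0, 1],  # book 2: author_b, history
    [0, 1, 1, 0],  # book 3: author_b, fantasy (unrated by the user)
], dtype=float)

# The user rated book 0 with 5 and book 2 with 1.
user_ratings = np.array([5, 0, 1, 0], dtype=float)

profile = build_user_profile(item_profiles, user_ratings)
scores = cosine_scores(profile, item_profiles)
print(profile)
print(scores.round(3))
```

In this toy case the unrated book 1 scores highest among the unseen books because it shares both author and genre with the book the user rated highly.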
One of the most commonly used techniques for developing recommendation engines is
Collaborative Filtering. It has been used for years by researchers for implementing
recommender systems. Collaborative filtering is a technique for recommending items based on
similarity. Collaborative Filtering, also known as social information filtering, is based on the
principle of finding a subset of users who have tastes and preferences similar to those of the
active user, and offering recommendations based on that subset of users. The idea is that, given an
active user u, we compute her n most similar users {u1, u2, … un} and predict u’s preference based
on the preferences of {u1, u2, … un}. Similar users are users who share the same kind of tastes
and preferences over items. The basic idea behind collaborative filtering is that users who agreed
in the past tend to agree in the future also.
Users with similar interests have common preferences and vice versa.
A sufficiently large number of user preferences is available.
There are different types of collaborative filtering techniques.
This algorithm first finds the similarity score between users. Based on this similarity score, it then
picks out the most similar users and recommends products which these similar users have liked or
bought previously.
The prediction of an item for a user u is calculated by computing the weighted sum of the user
ratings given by other users to an item i.
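This weighted sum can be sketched as follows. The example assumes cosine similarity between raw rating rows (with 0 for unrated books) as the user-user similarity and normalises by the sum of the absolute similarities; both choices, and the toy matrix, are assumptions of this illustration.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two rating vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def predict_user_based(R: np.ndarray, u: int, i: int) -> float:
    """Predict the rating of user u for item i from the users who rated item i."""
    num = 0.0
    den = 0.0
    for v in range(R.shape[0]):
        if v == u or R[v, i] == 0:
            continue  # skip the active user and users who did not rate item i
        s = cosine_sim(R[u], R[v])
        num += s * R[v, i]
        den += abs(s)
    return num / den if den else 0.0

# Hypothetical ratings: rows are users, columns are books, 0 means unrated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 5, 1],
    [1, 1, 2, 5],
], dtype=float)

pred = predict_user_based(R, u=0, i=2)
print(round(pred, 2))
```

User 0 resembles user 1 far more than user 2, so the prediction for book 2 lies closer to user 1's rating of 5 than to user 2's rating of 2.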
In the item-based model, it is assumed that books that are often read together by some users tend
to be similar and are more likely to be read together in future by other users.
We find the similarity between each pair of books and, based on that, recommend books similar to
those the user has liked in the past. This algorithm works similarly to user-user collaborative
filtering with just a little change: instead of taking the weighted sum of ratings of
“user-neighbors”, we take the weighted sum of ratings of “item-neighbors”. In its commonly used
form, the prediction is given by:
P(u,i) = sum_j(s(i,j)*r(u,j)) / sum_j(|s(i,j)|)
where s(i,j) is the similarity between books i and j, r(u,j) is the rating given by user u to
book j, and the sums run over the books j that user u has rated.
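The item-neighbor weighted sum can be sketched as below. The example assumes cosine similarity between the columns of the rating matrix as the item-item similarity and a normalisation by the sum of absolute similarities; the function names and the toy matrix are hypothetical.

```python
import numpy as np

def item_similarity(R: np.ndarray) -> np.ndarray:
    """Cosine similarity between the item (column) vectors of the rating matrix."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody rated
    X = R / norms
    return X.T @ X

def predict_item_based(R: np.ndarray, S: np.ndarray, u: int, i: int) -> float:
    """Similarity-weighted average of user u's own ratings over the items they rated."""
    rated = np.flatnonzero(R[u])
    rated = rated[rated != i]
    weights = S[i, rated]
    den = np.abs(weights).sum()
    return float(weights @ R[u, rated] / den) if den else 0.0

# Hypothetical ratings: rows are users, columns are books, 0 means unrated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 5, 1],
    [1, 1, 2, 5],
], dtype=float)

S = item_similarity(R)
pred = predict_item_based(R, S, u=0, i=2)
print(round(pred, 2))
```

Because item-item similarities depend only on the rating columns, they can be computed once and reused for every user.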
Most recommender systems now use a hybrid approach, combining collaborative filtering,
content-based filtering, and other approaches. There is no reason why several different techniques
of the same type could not be hybridized. Hybrid approaches can be implemented in several ways: by
making content-based and collaborative-based predictions separately and then combining them;
by adding content-based capabilities to a collaborative-based approach (and vice versa); or by
unifying the approaches into one model. Several studies have empirically compared the performance
of the hybrid with the pure collaborative and content-based methods and demonstrated that
hybrid methods can provide more accurate recommendations than pure approaches.
A system that combines content-based filtering and collaborative filtering can potentially take
advantage of both the representation of the content and the similarities among users. One
approach to combining collaborative and content-based filtering is to make predictions based on a
weighted average of the content-based recommendations and the collaborative recommendations.
We combine the ratings from the popularity model, content-based filtering and collaborative
filtering to get more accurate results. The predicted rating is a weighted combination of the
three methods described above. Equal weights have been given to the collaborative and content
ratings.
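A minimal sketch of such a weighted combination is shown below. The column names follow the listing in Chapter 5; the weights of 0.4, 0.4 and 0.2 are assumed values that merely respect the equal weighting of the collaborative and content ratings, and the three small score tables are made up for illustration.

```python
import pandas as pd

# Assumed weights: collaborative and content equal, popularity lower.
W_COLLAB, W_CONTENT, W_POP = 0.4, 0.4, 0.2

# Hypothetical per-book scores from the three models.
collab = pd.DataFrame({"book_id": [1, 2, 3], "rating": [4.0, 3.0, 5.0]})
content = pd.DataFrame({"book_id": [1, 2, 3], "content_rating": [2.0, 4.0, 5.0]})
popular = pd.DataFrame({"book_id": [1, 2, 3], "popularity_rating": [4.5, 3.5, 3.0]})

# Join the three score tables on book_id.
hyb = collab.merge(content, on="book_id").merge(popular, on="book_id")

hyb["hybrid_rating"] = (
    W_COLLAB * hyb["rating"]
    + W_CONTENT * hyb["content_rating"]
    + W_POP * hyb["popularity_rating"]
)

# Highest hybrid rating first.
ranked = hyb.sort_values("hybrid_rating", ascending=False).reset_index(drop=True)
print(ranked[["book_id", "hybrid_rating"]])
```

Keeping the weights as named constants makes it easy to try other trade-offs between the three models.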
FUTURE SCOPE
Recommender systems have been an active area of research for a decade or so and continue to be
an interesting research domain. Although recommender systems have witnessed unprecedented
improvements, starting from very primitive content-based and collaborative filtering methods, a
lot of research is going on to further enhance output accuracy and to improve recommender
systems in all dimensions. The research is focused on various areas to make RS more usable and
practical in real-life scenarios. The following are some of the areas of RS where intense research
is going on; these efforts are shaping the future of recommender systems.
1.Privacy
Privacy-preserving RS is one of the major challenges in developing a practical RS. There are
various real-life situations where input data is hard to obtain, which makes it extremely
difficult for the recommender system to make a reliable recommendation. In systems such as
medical recommendation systems, input data is sparse because medical history is often treated as
personal, confidential information. As a result, developing a reliable medical recommender system,
or any system that requires data considered private and confidential, is extremely difficult.
In [26], an approach towards privacy-preserving RS is detailed that makes use of homomorphic
cryptography.
Most research into recommending items has focused on the accuracy of predicted ratings. Other
factors have also been identified as important to users. One such factor is the diversity of items
in the recommendation list. A user survey aimed at evaluating the effect of diversification on
user satisfaction found that it had a positive effect on overall satisfaction even though the
accuracy of the recommendations was affected adversely. There is a great need for a shift in focus
towards functionality that directly exploits usage data and adds more value for the user.
3.Dynamics in user interest
Human beings have varied interests and, most importantly, these interests are dynamic. A
recommender system needs to adapt to this dynamism. Most personalization systems tend to use a
static profile of the user. However, user interests are not static; they change with time and
context. Few systems have attempted to handle the dynamics within the user profile. The behavior
of users varies over time, and this should affect the construction of models. A recommender system
should be able to adapt to the user’s behavior when it changes.
In many practical datasets, data sparseness has been found to be a major issue; many recommender
algorithms make this issue worse, and the cold-start problem is becoming a deterrent to RS usage.
There are many research initiatives towards eliminating data sparseness using
singular value decomposition.
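One simple way in which singular value decomposition can be used against sparseness is sketched below: missing ratings are first filled with item means, and a rank-k truncated SVD of the filled matrix then supplies estimates for the missing cells. This is only one of several possible schemes; the function name low_rank_fill, the choice of k and the toy matrix are assumptions of this example.

```python
import numpy as np

def low_rank_fill(R: np.ndarray, k: int) -> np.ndarray:
    """Estimate unrated cells (0) with a rank-k SVD approximation; keep known ratings."""
    R = np.asarray(R, dtype=float)
    mask = R > 0
    # Start from item means so that the SVD sees a complete matrix.
    col_sums = R.sum(axis=0)
    col_counts = mask.sum(axis=0)
    col_means = np.divide(col_sums, col_counts,
                          out=np.zeros_like(col_sums), where=col_counts > 0)
    filled = np.where(mask, R, col_means)
    # Keep only the k largest singular values.
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    approx = (U[:, :k] * s[:k]) @ Vt[:k]
    return np.where(mask, R, approx)

# Hypothetical ratings: rows are users, columns are books, 0 means unrated.
R = np.array([
    [5, 4, 0, 1],
    [4, 0, 5, 1],
    [1, 1, 2, 0],
    [0, 2, 1, 5],
], dtype=float)

dense = low_rank_fill(R, k=2)
print(dense.round(2))
```

The observed ratings are left untouched; only the empty cells receive low-rank estimates.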
5.Scalable RS
Although the accuracy of RS is being enhanced, the computing requirements are also becoming
more and more complex. Scalable RS has become a pressing need for practical use-case
scenarios and is indeed an area of focused research.
6.Collaborative RS
Recommender systems need to collaborate among themselves in order to increase the accuracy of
RS and also increase the scope of RS. These collaborative RSs would be linked to each other over
a simplified but standard interface and would be complementary to each other.
RELATED WORK
In this section we review some of the work related to various approaches to Recommender Systems
(RS). A lot of work has been done in the area of recommender systems in general. The collaborative
filtering, content-based and hybrid approaches, and the issues of recommender systems, are
explained in the survey by Adomavicius. A new algorithm for increasing the accuracy of
collaborative filtering is discussed by Herlocker. The hybrid method is a filtering technique that
combines collaborative filtering and content-based filtering. Context-aware RS is discussed by G.
Adomavicius and Alexander Tuzhilin. Their work details the modeling of contextual information in
recommendation systems. They also describe contextual pre-filtering, post-filtering and
contextual modeling, and mention the possibility of combining the three in order to achieve higher
accuracy in RS output. Matthias, Gernot Bauer explore the design space of RS for mobile
applications and describe different dimensions and techniques for capturing the users, the items,
the contexts etc. Sofiane Abbar, Mokrane and Stephane present an approach based on data
personalization and a Personal Access Model that provides a set of personalization services.
Daniar discusses different types of traditional approaches as well as modern approaches to RS,
along with the main challenges in RS. Chang E, Thomson P et al. discuss the dynamic and fuzzy
nature of trust and its impact. John O’Donovan and Barry Smyth discuss the impact of trust in
recommender systems. Different computational models for trust are also discussed.
SCREENSHOTS
REFERENCES
Francesco Ricci, Lior Rokach and Bracha Shapira, Introduction to Recommender Systems
Handbook, Recommender Systems Handbook, Springer, 2011, pp. 1-35.
Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Bosagh
Zadeh, WTF: The who-to-follow system at Twitter, Proceedings of the 22nd international
conference on World Wide Web.
H. Chen, L. Gou, X. Zhang, C. Giles, Collabseer: a search engine for collaboration discovery,
in ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2011.
Alexander Felfernig, Klaus Isak, Kalman Szabo, Peter Zachar, The VITA Financial Services
Sales Support Environment, in AAAI/IAAI 2007, pp. 1692-1699, Vancouver, Canada, 2007.
R. J. Mooney & L. Roy (1999). Content-based book recommendation using learning for text
categorization. In Workshop Recom. Sys.: Algo. and Evaluation.
Rubens, Neil; Elahi, Mehdi; Sugiyama, Masashi; Kaplan, Dain (2016). "Active Learning in
Recommender Systems". In Ricci, Francesco; Rokach, Lior; Shapira, Bracha (eds.).
Recommender Systems Handbook (2 ed.). Springer US. doi:10.1007/978-1-4899-7637-6_24.
ISBN 978-1-4899-7637-6.
Elahi, Mehdi; Ricci, Francesco; Rubens, Neil (2016). "A survey of active learning in
collaborative filtering recommender systems". Computer Science Review. 20: 29–50.
doi:10.1016/j.cosrev.2016.05.002.
Karlgren, Jussi. 1990. "An Algebra for Recommendations." Syslab Working Paper 179 (1990).
Karlgren, Jussi (October 2017). "A digital bookshelf: original work on recommender
systems". Retrieved 27 October 2017.
Shardanand, Upendra, and Pattie Maes. "Social information filtering: algorithms for
automating “word of mouth”." In Proceedings of the SIGCHI conference on Human factors
in computing systems, pp. 210-217. ACM Press/Addison-Wesley Publishing Co., 1995.
Hill, Will, Larry Stead, Mark Rosenstein, and George Furnas. "Recommending and
evaluating choices in a virtual community of use." In Proceedings of the SIGCHI conference
on Human factors in computing systems, pp. 194-201. ACM Press/Addison-Wesley
Publishing Co., 1995.
Resnick, Paul, Neophytos Iacovou, Mitesh Suchak, Peter Bergström, and John Riedl.
"GroupLens: an open architecture for collaborative filtering of netnews." In Proceedings of
the 1994 ACM conference on Computer supported cooperative work, pp. 175-186. ACM, 1994.
Resnick, Paul, and Hal R. Varian. "Recommender systems." Communications of the ACM
40, no. 3 (1997): 56-58.
Montaner, M.; Lopez, B.; de la Rosa, J. L. (June 2003). "A Taxonomy of Recommender
Agents on the Internet". Artificial Intelligence Review. 19 (4): 285–330.
doi:10.1023/A:1022850703159.
Adomavicius, G.; Tuzhilin, A. (June 2005). "Toward the Next Generation of
Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions". IEEE
Transactions on Knowledge and Data Engineering. 17 (6): 734–749.
CiteSeerX 10.1.1.107.2790. doi:10.1109/TKDE.2005.99.
Herlocker, J. L.; Konstan, J. A.; Terveen, L. G.; Riedl, J. T. (January 2004). "Evaluating
collaborative filtering recommender systems". ACM Trans. Inf. Syst. 22 (1): 5–53.
CiteSeerX 10.1.1.78.8384. doi:10.1145/963770.963772.
Beel, J.; Genzmehr, M.; Gipp, B. (October 2013). "A Comparative Analysis of Offline and
Online Evaluations and Discussion of Research Paper Recommender System Evaluation".
Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems
Evaluation (RepSys) at the ACM Recommender System Conference (RecSys).
Beel, J.; Langer, S.; Genzmehr, M.; Gipp, B.; Breitinger, C. (October 2013). "Research
Paper Recommender System Evaluation: A Quantitative Literature Survey". Proceedings of
the Workshop on Reproducibility and Replication in Recommender Systems Evaluation
(RepSys) at the ACM Recommender System Conference (RecSys).
Beel, J.; Gipp, B.; Langer, S.; Breitinger, C. (26 July 2015). "Research Paper Recommender
Systems: A Literature Survey". International Journal on Digital Libraries. 17 (4): 305–338.
doi:10.1007/s00799-015-0156-0.
Waila, P.; Singh, V.; Singh, M. (26 April 2016). "A Scientometric Analysis of Research in
Recommender Systems". Journal of Scientometric Research. 5: 71–84.
doi:10.5530/jscires.5.1.10.
Stack, Charles. "System and method for providing recommendation of goods and services
based on recorded purchasing history." U.S. Patent 7,222,085, issued May 22, 2007.
Herz, Frederick SM. "Customized electronic newspapers and advertisements." U.S. Patent
7,483,871, issued January 27, 2009.
Herz, Frederick, Lyle Ungar, Jian Zhang, and David Wachob. "System and method for
providing access to data using customer profiles." U.S. Patent 8,056,100, issued November 8,
2011.
Harbick, Andrew V., Ryan J. Snodgrass, and Joel R. Spiegel. "Playlist-based detection of
similar digital works and work creators." U.S. Patent 8,468,046, issued June 18, 2013.