Survey of Recommender System Algorithms
Spring 2000
By
Yuan Qu
Xiaoyun Yang
Tianping Huang
May 5, 2000
Table of Contents
1. Introduction
2. Algorithms
3. Discussion
4. Reference
1. Introduction
In our daily life, we make most of our choices by relying on recommendations from other people, whether by word of mouth, recommendation letters, movie and book reviews printed in newspapers, or general surveys. In this information age, a huge volume of news is published through the Internet every day. This leads to a clear demand for automated methods that locate and retrieve information with respect to users' individual interests. The growing number of people accessing the Internet also provides new possibilities for organizing and sharing such recommendations on a large scale. Recommendation systems can assist and augment this natural social process.
These systems recommend items a user may want based on what he or she has wanted in the past.
The main purpose of recommendation systems is to provide tools for people to leverage the information hunting and gathering activities of other people or groups of people. Recommendation systems have been an important application area and the focus of considerable research.

Recommendation systems are basically divided into two categories: one is called content-based filtering; the other is collaborative filtering (or social filtering). In content-based filtering, items are selected according to features that can be derived from document contents. In a collaborative filtering system, the selection is based instead on the judgments of other readers of the document. The premise is that communities of shared interest can be identified from those judgments, so a collaborative filtering system provides a basis for selecting information items regardless of whether their content can be represented in a way that is useful for selection. The term collaborative filtering was coined by the developers of the first such recommendation system, Tapestry, in 1992 [Goldberg, et al. 1992]. Several years later
the concept of collaborative filtering had already been applied in dozens of publicly available
systems, several proprietary systems, and even some commercially available systems. In
1996, dozens of the researchers in the academic and business areas gathered at the UC-
Berkeley to share their ideas and experiences about these emerging filtering methods
[Collaborative Filtering workshop, 1996]. They presented the vision and definition of
collaborative filtering, and provided some applications of this technique. Since then, more and more published articles have demonstrated applications of collaborative filtering methods.
In this paper, a survey is made of the recommendation systems available on the Internet, and the characteristics of each recommendation system are described. According to the purposes of their application, the recommendation systems can be classified into three categories, as summarized in the figure below.
Figure: Classification of recommendation systems into three application areas: movies or music (EachMovie, Morse, Firefly), news or articles (Tapestry, GroupLens, Lotus Notes), and Web pages (Phoaks, GAB, Fab).
The systems in the first category are used for recommending movies, music, videos, or other services. In this category the database is relatively stable; like a population database, it may not change for years. Typical systems include EachMovie, Firefly, and Morse. The second category is used for news or articles in a newsgroup. The users in a newsgroup generally have similar goals or interests, and the database is also relatively stable, being updated over weeks or shorter periods. Representatives of these systems are Tapestry, GroupLens, and Lotus Notes. The last category is for Web page recommendation. The information in this category is dynamic: new pages can be added to or deleted from the system at any time, and at the same time the users may have very different tastes. Phoaks, GAB, and Fab are the most useful systems of this kind.
Do-I-Care

The Do-I-Care system [Collaborative Filtering workshop, 1996] provides a function that alerts the user when a Web page of interest is changed. The system uses a model-based algorithm built on Bayesian classifier technology. After some users have trained the model many times, other users can benefit from the trained model. According to [Collaborative Filtering workshop, 1996], the accuracy of Do-I-Care can reach 70-90%, and it is said that the accuracy reaches 100% in an application tracking airline fare sales.
Fab
In a collaborative filtering system, if a new item or a new user enters the system, the system has no basis for calculating similarities between users, and it has no way to consider the new item unless some users have rated or recommended it. This is called the cold-start problem. Content-based filtering does not suffer from this problem. To alleviate it, the Fab recommendation system [Turnbull, 1998] combines both approaches. Fab is a Web-based recommendation service that incorporates both content-based and collaborative filtering: each user's profile is maintained as a collection of keywords contained in those documents that the user rates highly.
Documents are presented for rating when either the content of the document matches
previous documents that were rated highly, or neighboring users rate a document highly.
Every time a favorable or unfavorable rating is received, the profile of the user is updated accordingly.
Collection agents are sent out over the web to look for documents with specific
content, each agent using a different set of keywords. After retrieving the documents,
they are passed to a central server where a selection agent matched to each user's profile,
scours through the documents looking for interesting material. Relevant documents are
then presented to the user for rating. This rating dynamically affects the selection agent’s
behavior and changes the user's profile. The rating also affects the collection agent that
retrieved the document. Unpopular collection agents are removed and replaced with more successful ones.
The Fab system combines the best features of both content-based and
collaborative filtering methods and also manages to keep the system dynamically updated
to the current users' tastes. One potential shortcoming is Fab's reliance on explicit user
feedback.
Firefly
The Firefly system [Turnbull, 1997 and 1998] provides recommendations based on similarities between users. At the beginning, the system was used for music and movie recommendations.
The system takes users' profiles as input and uses a constrained Pearson algorithm to compute predictions between users. The basic idea of the algorithm is: a) the system maintains a user profile, which records "like or dislike" of specific items; b) the system compares the similarities of users and decides which group of users the user belongs to; and c) according to the similar users' profiles, it gives a good recommendation.
GAB
GAB [Wittenburg, et. al., 1998] stands for group asynchronous browsing. The
idea of the GAB system is that it collects and merges users' bookmark and hotlist files and then serves these merged files back to users. That means the system has the ability to reach a user's bookmarks and extract information from them. This raises privacy concerns. To
overcome the privacy problem, the system provides a mechanism that lets users control how their saved bookmark files are shared.
The system uses a multi-tree data structure for the bookmarks. To avoid getting lost in hyperspace and to increase the connectivity of the merged subject-tree database, the system defines sibling and cousin relations. A sibling relation between items A and B means that A and B belong to the same specific subject, while a cousin relation means that A and B belong to the same broad subject but not the same specific subject. The system has also been applied to monitoring changes in the content of Web pages. A minimal sketch of how such relations might be represented is given below.
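As an illustration only, here is a minimal sketch of sibling and cousin relations over a merged subject tree; the class and function names are hypothetical and are not GAB's actual implementation.

```python
class SubjectNode:
    """A node in a merged subject tree of bookmarks (hypothetical structure)."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

def are_siblings(a, b):
    # Same specific subject: the two items share the same parent node.
    return a is not b and a.parent is not None and a.parent is b.parent

def are_cousins(a, b):
    # Same broad subject (shared grandparent) but not the same specific subject.
    if a is b or are_siblings(a, b) or a.parent is None or b.parent is None:
        return False
    grand_a, grand_b = a.parent.parent, b.parent.parent
    return grand_a is not None and grand_a is grand_b
```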
Grassroots
Grassroots is described by its authors as a uniform framework for communicating, structuring, and sharing information, and for "Organizing People".
This system provides a special interface of Web pages to access all of the
information it works with. In practice, Grassroots also lets participants continue using
other mechanisms, and takes as much advantage of them as possible. The main engine in
the Grassroots system is a Web server and Proxy server setup that can be used with any
Web browser.
GroupLens
Resnick [Resnick, et al. 1994] presented the GroupLens system, which is built
based on a simple premise "the heuristic that people who agreed in the past will probably
agree again". This system uses the same Pearson algorithm to provide algorithm. At
early stage, the system uses explicit vote ( 1 to 5 scale, 1 stands for dislike it, 5 for like
it). The updated one also includes using implicit method to get the feedback from the
user, such as monitoring reading time. The most characteristics of the system are its
Openness means that the system gives other researchers access to create clients that work with the system's servers, or even to change those servers if there are improvements. As the number of users increases, the system can still provide accurate predictions, but the database and the calculation time become very large.
Let's Browse

Let's Browse and its predecessor, Letizia [Lieberman, 1996; Pryor, 1998], are Web agents that assist a user during his or her browsing experience. By monitoring a user's behavior, such as browsing time on a Web page, Letizia learns the user's interests and suggests pages the user may want to see next. When multiple users are reading the same page at the same time, Let's Browse can determine which users are in the area of the monitor and use their combined profiles to provide recommendations for the group.
Lotus Notes
The Lotus Notes system applies collaborative filtering techniques and serves newsgroups. All Notes users are expected to have similar goals or information interests because they are working in the same group.
Lotus provides a feature to let people annotate documents. After annotation, the
user can send or distribute these links or comments to others. To protect user’s privacy,
the system uses an agent to represent an individual. These agents extract significant
phrases from the document that the user reads, and then exchange the learning results
anonymously.
Mosaic
The Mosaic system [Turnbull, 1997] was the first Web tool that facilitated collaboration. As in the Pointers recommendation system, Mosaic users can publish and distribute bookmarks and add comments to Web pages.
PHOAKS
Terveen [Terveen et. al, 1997] introduced the PHOAKS (People Helping One Another Know Stuff) system, which recommends URLs likely to be of interest to users. The system automatically recognizes Web resource references in newsgroup messages, attempts to classify them, and introduces them to other users. That is, the system scans the group's messages, extracts the most important URLs in these messages, sorts these links, and recommends the URLs to users. The system uses implicit feedback and also takes role specialization into account.
Pointers
If one person is an expert in certain areas, then other users in the group would like to see his or her recommendations. So the system [Maltz and Ehrlich, 1995] provides a mechanism that lets users actively point others to documents they find. This mechanism is realized using "pointers". A pointer consists of a URL link, contextual information, and optional comments by the sender.
Siteseer
Siteseer uses each user's bookmarks to find neighbors and recommend sites. Users with significant overlap in their bookmark listings are determined to be close to one another, allowing previously unvisited sites to be recommended to a user on the basis of his or her neighbors' bookmarks.
Tapestry
Tapestry [Goldberg, et al. 1992] was the first collaborative filtering system. It lets users attach free-text annotations or explicit "like it" or "hate it" annotations to documents, and it is used for filtering electronic mail and newsgroup messages.
Yahoo!
Yahoo! uses a manual approach to realize collaborative filtering: human experts update the Yahoo! index as quickly as possible, which means that every site is examined by a person when it is added. The system also allows Web users to submit pages. Because of its openness, the form of the Yahoo! index has become very popular and has become a classification standard.
WebWatcher
WebWatcher [Joachims, et al. 1996] is a tour-guide agent for the Web that provides recommendations. A user who enters the system can ask a question by typing in his or her interest, and the system then recommends related Web sites. This is not the same as a keyword-based search engine: WebWatcher uses the user's profile and other users' previous tours, calculates the similarities between users, and predicts the user's interests.
2. Algorithms

Today, recommendation systems are used in many fields; virtually all topics that could be of potential interest to users are covered by special-purpose recommendation systems: Web pages, news stories, emails, movies, music videos, books, CDs, restaurants, and many more. These recommendation systems predict users' interests and preferences based on all users' profiles, using information retrieval techniques. The underlying techniques used in today's recommendation systems fall into two distinct categories: content-based filtering and collaborative filtering methods. Content-based filtering uses actual content features of items, while collaborative
filtering predicts a new user's preferences using other users' ratings, assuming that like-minded people tend to make similar choices. Here we concentrate on the algorithms used in collaborative filtering: predicting which products a new user might like, based on a user preference database. Two general classes of algorithms have been proposed for collaborative filtering: memory-based and model-based algorithms.
Memory-based Algorithms
These algorithms are called memory-based because they operate over the entire user database to make predictions. Basically, they all try to find the similarity or correlation between the new, active user and the other users in the database. All users' preferences are represented by their votes (explicit or implicit) on the products, which can be anything related to the users' interests. The active user has an average vote over the products he or she has rated. The predicted votes of the active user on other products are then calculated as this average plus a weighted sum of the other users' deviations from their own average votes. The weights are determined by the similarity between the active user and the other users: the more similar they are, the more they contribute to the sum, so the larger the weights are.
The user's average vote is defined as follows, where I_i is the set of items user i has voted on and v_{i,j} is user i's vote on item j:

\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}

The prediction for the active user a on item j is then

p_{a,j} = \bar{v}_a + k \sum_{i=1}^{n} w(a,i)\,(v_{i,j} - \bar{v}_i)

where k is a normalizing factor and w(a,i) is the weight that user i contributes to the prediction for the active user a. A small sketch of this weighted-sum prediction is given below.
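A minimal Python sketch of the weighted-sum prediction, assuming votes are stored as simple per-user dictionaries and the weights w(a, i) have already been computed (for example with one of the methods described next):

```python
import numpy as np

def predict(active_votes, all_votes, weights):
    """Weighted-sum prediction p_{a,j} for every item the active user has not voted on.

    active_votes : {item: vote} for the active user a
    all_votes    : list of {item: vote}, one dictionary per user i in the database
    weights      : list of w(a, i), one value per user i
    """
    v_bar_a = np.mean(list(active_votes.values()))
    # Normalizing factor k: make the absolute weights sum to one.
    k = 1.0 / max(sum(abs(w) for w in weights), 1e-12)

    predictions = {}
    all_items = {j for votes in all_votes for j in votes}
    for j in all_items - active_votes.keys():
        deviation_sum = 0.0
        for votes, w in zip(all_votes, weights):
            if j in votes:
                v_bar_i = np.mean(list(votes.values()))
                deviation_sum += w * (votes[j] - v_bar_i)
        predictions[j] = v_bar_a + k * deviation_sum
    return predictions
```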
The weights are calculated by comparing the set of common products that the active user and each other user in the database have both rated. Three major ways of defining the weights are collected here.

Mean Squared Difference:

This method defines the weight as the inverse of the mean squared distance between the two users' votes on their commonly rated items J:

w(a,i) = \frac{1}{\frac{1}{|J|} \sum_{j \in J} (v_{a,j} - v_{i,j})^2}
Pearson Correlation:

w(a,i) = \frac{\sum_j (v_{a,j} - \bar{v}_a)(v_{i,j} - \bar{v}_i)}{\sqrt{\sum_j (v_{a,j} - \bar{v}_a)^2 \, \sum_j (v_{i,j} - \bar{v}_i)^2}}
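A short sketch of the Pearson weight for two users stored as vote dictionaries; here the means are taken over the commonly rated items, which is one common convention (the formula above can equally be read with each user's overall average vote).

```python
import math

def pearson_weight(votes_a, votes_i):
    """Pearson correlation w(a, i) over the items both users have voted on."""
    common = votes_a.keys() & votes_i.keys()
    if len(common) < 2:
        return 0.0  # not enough overlap to measure correlation
    mean_a = sum(votes_a[j] for j in common) / len(common)
    mean_i = sum(votes_i[j] for j in common) / len(common)
    num = sum((votes_a[j] - mean_a) * (votes_i[j] - mean_i) for j in common)
    den = math.sqrt(sum((votes_a[j] - mean_a) ** 2 for j in common) *
                    sum((votes_i[j] - mean_i) ** 2 for j in common))
    return num / den if den else 0.0
```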
Vector Similarity:

This method defines the weight based on the angle between the active user's and user i's vote vectors:

w(a,i) = \sum_j \frac{v_{a,j}}{\sqrt{\sum_{k \in I_a} v_{a,k}^2}} \cdot \frac{v_{i,j}}{\sqrt{\sum_{k \in I_i} v_{i,k}^2}}
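A corresponding sketch of the vector (cosine) similarity weight, with each user's votes normalized by the length of his or her full vote vector, as in the formula above:

```python
import math

def cosine_weight(votes_a, votes_i):
    """Vector similarity w(a, i): sum of normalized vote products over co-rated items."""
    norm_a = math.sqrt(sum(v * v for v in votes_a.values()))
    norm_i = math.sqrt(sum(v * v for v in votes_i.values()))
    if norm_a == 0.0 or norm_i == 0.0:
        return 0.0
    common = votes_a.keys() & votes_i.keys()
    return sum(votes_a[j] * votes_i[j] for j in common) / (norm_a * norm_i)
```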
Default Voting:
Usually we are dealing with very sparse databases: there are many products that users did not vote on (explicitly or implicitly). Memory-based algorithms as described above use only the entries at the intersection of two users' voted items. For example, if the active user and user 1 have only one book in common, only the ratings for that single book can be used to calculate user 1's weight. To deal with this problem, default votes are introduced: in most cases a neutral or somewhat negative preference is assigned to the unobserved products, so that the union of the voted sets, rather than the intersection, can be used in the weight calculation. This method does not necessarily improve the performance of memory-based algorithms, however, since an unobserved product does not necessarily mean the product is less interesting.
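A sketch of default voting: both users' vote dictionaries are extended to the union of their items with a default value before the weight is computed (the default value itself is a tunable assumption).

```python
def with_default_votes(votes_a, votes_i, default=2):
    """Extend both users' votes to the union of their items with a default vote."""
    union = votes_a.keys() | votes_i.keys()
    return ({j: votes_a.get(j, default) for j in union},
            {j: votes_i.get(j, default) for j in union})

# Example: compute a weight over the union instead of the intersection,
# e.g. with the pearson_weight sketch above.
# w = pearson_weight(*with_default_votes(active_votes, other_votes))
```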
Inverse User Frequency:

The idea of inverse user frequency is that universally liked products are not as useful as less common products in capturing the similarity between users. So each item j is weighted by the factor

f_j = \log \frac{n}{n_j}

where n is the total number of users and n_j is the number of users who have voted on item j. The correlation weight then becomes

w(a,i) = \frac{\sum_j f_j \left(\sum_j f_j v_{a,j} v_{i,j}\right) - \left(\sum_j f_j v_{a,j}\right)\left(\sum_j f_j v_{i,j}\right)}{\sqrt{UV}}

where

U = \sum_j f_j \left(\sum_j f_j v_{a,j}^2 - \left(\sum_j f_j v_{a,j}\right)^2\right), \qquad
V = \sum_j f_j \left(\sum_j f_j v_{i,j}^2 - \left(\sum_j f_j v_{i,j}\right)^2\right)
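A sketch that computes the inverse-user-frequency factors f_j from the vote database; these factors can then be plugged into the frequency-weighted correlation above, or used to scale votes before a vector-similarity computation.

```python
import math

def inverse_user_frequency(all_votes):
    """Return {item: log(n / n_j)}, where n_j is the number of users who voted on item j."""
    n = len(all_votes)
    counts = {}
    for votes in all_votes:
        for j in votes:
            counts[j] = counts.get(j, 0) + 1
    return {j: math.log(n / n_j) for j, n_j in counts.items()}
```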
Case Amplification:
Case amplification emphasizes the contribution of the most similar users to the
prediction by amplifying the weights close to 1. The new weights are calculated as
below:
w'_{a,i} = \begin{cases} w_{a,i}^{\rho} & \text{if } w_{a,i} \ge 0 \\ -(-w_{a,i})^{\rho} & \text{if } w_{a,i} < 0 \end{cases}
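Case amplification is a one-line transform of an existing weight; the exponent value 2.5 below is a commonly cited choice but should be treated as a tunable assumption.

```python
def amplify(weight, rho=2.5):
    """Raise the weight to the power rho while keeping its sign, so weights near 1
    keep most of their influence and small weights shrink toward 0."""
    return weight ** rho if weight >= 0 else -((-weight) ** rho)
```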
Voting by category:
When the numbers of users and products grow, the voting matrix can become unmanageable, preventing practical calculation over the whole matrix. There may also be very few common votes on the same products if the default voting method mentioned before is not used; however, providing default votes may not improve the performance. Voting by category addresses this by assuming the existence of a small number of generated clusters or pre-existing categories to which products can be assigned. The voting matrix is then transformed into a matrix of much lower dimension by converting users' votes on products into votes on
categories. Continuing the same example, the original 4-by-6 voting matrix is reduced by replacing each user's votes on individual products with an average vote per category:

\bar{v}_{i,c} = \frac{1}{|c \cap I_i|} \sum_{j \in c \cap I_i} v_{i,j}

Each entry of the new matrix is the average of user i's votes over the products in category c. The method can be combined with the other algorithms as well (including the model-based algorithms below); it is described here because the original authors used it along with the correlation algorithm. A sketch of the reduction is given below.
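A minimal sketch of the reduction, assuming a pre-existing assignment of items to categories is available (how that assignment is obtained is outside the scope of this sketch).

```python
def votes_by_category(votes, item_to_category):
    """Collapse a user's item votes {item: vote} into average votes per category."""
    sums, counts = {}, {}
    for item, vote in votes.items():
        c = item_to_category[item]
        sums[c] = sums.get(c, 0.0) + vote
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}
```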
Model-based Algorithms
From a probabilistic perspective, model-based algorithms first learn a model from the user database and then compute the prediction as the expected value of the active user's vote, given the votes the user has already cast. If votes range over the values 0, ..., m, the prediction is

p_{a,j} = E(v_{a,j}) = \sum_{i=0}^{m} \Pr(v_{a,j} = i \mid v_{a,k},\, k \in I_a) \cdot i
Cluster Models:
Based on the idea that there are certain groups or types of users capturing a
common set of preferences and tastes, Breese et al. [Breese, et al. 1998] proposed a cluster model, in which
like-minded users are classified into the same group. Given a user’s class membership,
the user's votes are assumed to be independent, and the joint probability of class and votes is

\Pr(C = c, v_1, \ldots, v_n) = \Pr(C = c) \prod_{i=1}^{n} \Pr(v_i \mid C = c)
Once we know the probability of observing an individual of a class with a set of votes,
the expectation of the future vote can easily be calculated. Since the classes and the number of classes are unknown, the EM algorithm is used to find the model structure with maximum likelihood. A sketch of prediction under such a model is given below.
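This sketch assumes the class prior and the per-class vote probabilities have already been estimated (for example by EM); the parameter layout used here is an assumption for illustration.

```python
import numpy as np

def expected_vote(observed, item, prior, cond):
    """Expected vote on `item` given a user's observed votes under a cluster model.

    observed : {item: vote} for the active user
    prior    : prior[c] = Pr(C = c), one entry per class
    cond     : cond[(item, vote)] = array of Pr(v_item = vote | C = c) over classes c
    """
    # Posterior over classes given the observed votes (naive Bayes assumption).
    post = np.asarray(prior, dtype=float).copy()
    for j, v in observed.items():
        post *= np.asarray(cond[(j, v)], dtype=float)
    post /= post.sum()
    # Expected value of the unseen vote: sum over possible vote values and classes.
    values = sorted({v for (i, v) in cond if i == item})
    return sum(v * float(np.dot(post, np.asarray(cond[(item, v)], dtype=float)))
               for v in values)
```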
Ungar and Foster [Ungar, et al., 1998] proposed a different clustering method. Unlike the standard cluster models, they assume that people come from classes (e.g., intellectual or fun) and that products also come from classes. In the example from their paper, movies belong to three categories: action, foreign, and classic, and a "y" in their table means the person likes the associated movie. For each person/movie pair, the probability of a link depends only on the classes involved. Based on this observation, they establish a model that contains three sets of parameters: P_k (the probability a person is in class k), P_l (the probability a movie is in class l), and P_{kl} (the probability that a person in class k is linked to a movie in class l).
Here the class assignments are unknown. They tried repeated clustering and Gibbs sampling methods. In the repeated clustering method, people are first clustered based on movies and movies based on people; on the second and later passes, people are clustered based on movie clusters and movies based on people clusters. To do the clustering,
they use k-means clustering instead of the EM algorithm, due to the constraint that a person is always in the same class and a movie is always in the same class. A rough sketch of this alternating clustering is given below.
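The sketch below uses ordinary k-means from scikit-learn; it illustrates the alternating procedure described above rather than Ungar and Foster's exact algorithm, and R is assumed to be a 0/1 people-by-movies link matrix.

```python
import numpy as np
from sklearn.cluster import KMeans

def repeated_clustering(R, k_people=2, k_movies=2, passes=3):
    """Alternately cluster people and movies, re-using the other side's clusters."""
    # First pass: cluster people on raw movie columns, movies on raw people rows.
    person_labels = KMeans(n_clusters=k_people, n_init=10).fit_predict(R)
    movie_labels = KMeans(n_clusters=k_movies, n_init=10).fit_predict(R.T)
    for _ in range(passes):
        # Cluster people on their average link rate to each movie cluster.
        person_feats = np.stack([R[:, movie_labels == l].mean(axis=1)
                                 for l in range(k_movies)], axis=1)
        person_labels = KMeans(n_clusters=k_people, n_init=10).fit_predict(person_feats)
        # Cluster movies on their average link rate from each people cluster.
        movie_feats = np.stack([R[person_labels == k, :].mean(axis=0)
                                for k in range(k_people)], axis=1)
        movie_labels = KMeans(n_clusters=k_movies, n_init=10).fit_predict(movie_feats)
    return person_labels, movie_labels
```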
Breese et al. [Breese, et al. 1998] also proposed a model based on a Bayesian belief network with a node corresponding to each product in the database. Missing data are represented by a "no vote" value. After applying an algorithm to train the belief network, each item in the resulting network has a set of parent items that are the best predictors of its votes. A decision tree can be used to represent the classification task: based on a set of ratings from users for products, we can induce a model for each user that allows us to classify unseen products into two or more classes, with the missing data indicated by a "no vote" state. Here is an example:
        I1   I2   I3   I4   I5
U1       4         3
U2            1         2
U3       3    4    2         4
U4       4    2    1         ?
where U_i is the i-th user and I_i is the i-th item. Users rate the items from 1 to 4, with 4 the highest rating. Since the goal is only to recommend items the active user would like, the rating matrix is transformed by replacing every rating greater than 2 with 1, and every other rating with 0. To represent the "no vote" value, every user is further split into two rows (like and dislike).
             E1    E2       E3
U1 like       1     0        1
U1 dislike    0     0        0
U2 like       0     0        0
U2 dislike    0     1        0
U3 like       1     1        0
U3 dislike    0     0        1
Class       like  dislike  dislike
Here U4's ratings for I1, I2, and I3 provide the class labels. After converting a data set of user ratings for items into this format, we can apply virtually any supervised learning algorithm, as sketched below.
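The conversion and one possible learner applied to the example above can be sketched as follows; the positions of the sparse entries and the use of scikit-learn's DecisionTreeClassifier are assumptions for illustration, and any supervised learner could be substituted.

```python
from sklearn.tree import DecisionTreeClassifier

# Other users' ratings from the example above (sparse cells placed as in the table).
ratings = {
    "U1": {"I1": 4, "I3": 3},
    "U2": {"I2": 1, "I4": 2},
    "U3": {"I1": 3, "I2": 4, "I3": 2, "I5": 4},
}
active = {"I1": 4, "I2": 2, "I3": 1}          # U4's known ratings -> class labels

def features(item):
    """Two boolean features per other user: likes the item (>2), dislikes it (<=2)."""
    row = []
    for user in sorted(ratings):
        r = ratings[user].get(item)
        row += [int(r is not None and r > 2), int(r is not None and r <= 2)]
    return row

X = [features(i) for i in active]                              # examples E1..E3
y = ["like" if r > 2 else "dislike" for r in active.values()]  # U4's class labels
model = DecisionTreeClassifier().fit(X, y)
print(model.predict([features("I5")]))                         # class for the unseen item
```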
Other Algorithms
Pennock and Horvitz proposed a method called personality diagnosis (PD), which can be seen as a hybrid between memory-based and model-based approaches. All data is maintained throughout the process, and new data can be added incrementally. Users are assumed to rate items according to their underlying "personality type". Based on the observation that users' votes are affected by environmental factors, such as previous users' votes or the current user's mood, they
assumed that all users report their ratings with Gaussian noise. If we define a user's personality type as a vector V_i^{true} of "true" ratings, then user i's actually reported rating on item j is distributed as

\Pr(v_{i,j} = x \mid v_{i,j}^{true} = y) = k \cdot e^{-(x-y)^2 / 2\sigma^2}

where k is a normalizing constant.
They further assumed that the distribution of the active user's true personality type over the voting vectors in the database is uniform:

\Pr(V_a^{true} = V_i) = \frac{1}{n}

where n is the total number of users in the database. Then the probability that the active user has the same personality type as any other user can be calculated by applying Bayes' rule:
\Pr(v_{a,j} = x_j \mid v_{a,1} = x_1, \ldots, v_{a,m} = x_m)
  = \sum_i \Pr(v_{a,j} = x_j \mid V_a^{true} = V_i) \cdot \Pr(V_a^{true} = V_i \mid v_{a,1} = x_1, \ldots, v_{a,m} = x_m)
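A sketch of personality-diagnosis-style prediction under the assumptions above (uniform prior over the stored users, Gaussian vote noise); the vote values and sigma are illustrative parameters.

```python
import numpy as np

def pd_vote_distribution(active, all_votes, item, vote_values=(1, 2, 3, 4, 5), sigma=1.0):
    """Return Pr(v_{a,item} = x | active user's observed votes) for each value x."""
    # Pr(observed votes | V_a^true = V_i), up to a constant, for every stored user i.
    likelihoods = []
    for votes in all_votes:
        common = active.keys() & votes.keys()
        ll = np.prod([np.exp(-(active[j] - votes[j]) ** 2 / (2 * sigma ** 2))
                      for j in common])
        likelihoods.append(ll)
    posterior = np.array(likelihoods, dtype=float)
    posterior /= posterior.sum() or 1.0     # uniform prior 1/n cancels out

    dist = {}
    for x in vote_values:
        p = 0.0
        for votes, w in zip(all_votes, posterior):
            if item in votes:
                p += w * np.exp(-(x - votes[item]) ** 2 / (2 * sigma ** 2))
        dist[x] = p
    total = sum(dist.values())
    return {x: p / total for x, p in dist.items()} if total else dist
```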
Improvements:

So far we have reviewed memory-based and model-based collaborative filtering methods. Both have their advantages and drawbacks. Memory-based methods are simple and easy to implement, but they may be time- and space-consuming, and at least two problems are hard for them to handle:
1) Missing data: To find the similarity between users, the difference (distance) between users has to be computed. If there are missing data, either only the products that both users voted on are used, or the missing data are given a default vote. The first case has problems with sparse databases; in the second case, giving average or somewhat negative votes to unobserved products may not reflect the users' actual preferences.

2) Memory-based methods cannot handle the situation in which two users are very similar but their agreement cannot be passed on to a third user. For instance, if user 1 and user 2 are very similar but only user 1 has rated product 6, then when we use memory-based methods to predict user 3's preference for product 6, only user 1's votes can be used for the prediction, and user 2's opinions contribute nothing.
For model-based methods, clustering can partly handle missing data by grouping products into fewer categories; the votes for a category are averaged over the available votes for the products in that category. But clustering methods may over-generalize and hurt performance. Bayesian network or neural network models can handle missing data and problem (2) above reasonably well, but for large databases containing many users we end up with thousands of features while the amount of training data per user is very limited, so those models become impractical.
Recently, a promising algorithm has been proposed. The idea is that users rate their
products based on the latent features of products. All products in the database share a set
of common features. Users rate products highly because they rate those features highly.
So by factoring peoples’ ratings into features using linear algebra, we could predict how
users will react to documents they have not seen before based on their preferences for
these features. Singular Value Decomposition (SVD) allows us to break down data sets
into these components and analyze the principal components of the data. We will see
below how SVD could be used to capture the hidden features and help to reduce the
dimension of databases.
The user rating vectors can be represented by an m \times n matrix A, with m users and n products:

A = [a_{i,j}]
The SVD factors A as A = U S V^T, where S is a zero matrix except for the diagonal entries, which are defined as the singular values of A. Consider the example given by Pryor [Pryor, H. Michael, 1998] in his report. Suppose the rating matrix A is

A = \begin{bmatrix} 5 & 4 & 2 & 6 \\ 3 & 7 & 5 & 2 \\ 6 & 4 & 1 & 4 \end{bmatrix}
Computing the SVD of this matrix, we find that the feature described by the singular value 14.4890 in S is the most important feature. The dimension can therefore be reduced by keeping only the most important features, in this case only the one represented by 14.4890. The new rating matrix is then generated by converting the original rating matrix into the feature space:

AV = US, \qquad M = U'S'

where U' contains the columns of U corresponding to the retained features. In this case S' = [14.4890], and once the new rating matrix M in the feature space is obtained, the similarity calculations described earlier can be carried out in this much lower-dimensional space. A numerical sketch is given below.
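A numerical sketch of the decomposition and the one-feature projection, using numpy on the example matrix; the largest printed singular value should match the 14.4890 quoted above, up to rounding.

```python
import numpy as np

A = np.array([[5, 4, 2, 6],
              [3, 7, 5, 2],
              [6, 4, 1, 4]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(s)                       # singular values, largest first

k = 1                          # keep only the most important latent feature
M = U[:, :k] * s[:k]           # same as A @ Vt[:k].T, i.e. AV = US restricted to k features
print(M)                       # each user's ratings projected into the feature space
```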
It has been shown that exploiting the latent structure in matrices of user ratings can lead to better predictions. Besides collaborative filtering (CF), content-based filtering (CBF) methods are also used. CBF filters information by matching information content against the user's interests, and it is therefore able to filter information that has not yet been evaluated by other people. For this reason CBF and CF are combined in some recommender systems: CBF can deal with items the system has not yet learned about, while CF recommends new items based on the opinions of like-minded users.
3. Discussion
Based on the survey above, a good recommendation system should address the following issues:
1) Cold-start problem

When a new item or a new user enters the system, the system has no clue how to recommend the new item to users or how to provide accurate predictions for the new user. Since content-based filtering is based on the features of the item itself, it has no such cold-start problem. The Fab system integrates content-based filtering and collaborative filtering. Building on this kind of integration, Michelle Keim Condliff et al. [Condliff, et al. 1998] use Bayesian theory to give good predictions by fully incorporating all of the available data, such as user ratings, user features, and item features. Claypool [Mark Claypool, et al. 1999] also provides an approach to the cold-start problem: their system is based on a weighted combination of content-based and collaborative predictions, with the weights adjusted for each user over time.
2) User feedback

Users must contribute votes or annotations if they would like to receive recommendations. Since the system depends on users' votes to calculate the similarities between users, it is very important to get enough data from the users, and the system should provide a very easy interface for a user to vote or to annotate. Although explicit annotations or votes simplify the calculation, implicit feedback from users is more helpful in reducing the sparseness of the matrices used for the similarity calculation. Implicit methods include monitoring the user's behavior and
monitoring the user's browsing time on a page: the longer a person stays, the more interest the person shows. The system can also use compensation methods; for example, a user who wants further recommendations must first vote on what he or she has read.
3) Privacy
Privacy becomes an issue when a system collects information about its users and when users share document annotations. On one side, people do not like to release their private identity; on the other side, people like to see who made an annotation. For example, if an annotation is provided by an expert in the area, people in the group are more willing to read it. The system should therefore provide a mechanism that allows users to control whether and how their identity is revealed.
4) Algorithm

The underlying algorithm should provide accurate predictions and scale well as the numbers of users and items grow, while remaining cost-efficient.
4. Reference
Breese, J., Heckerman, D. and Kadie, C., 1998. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence.
Claypool, Mark; Gokhale, Anuja; Miranda, Tim; et al., 1999. Combining Content-Based and Collaborative Filters in an Online Newspaper. http://www.cs.wpi.edu/~claypool/papers/content-collab/
Collaborative Filtering Workshop, 1996. Workshop report, UC Berkeley. http://www.sims.berkeley.edu/resources/collab/collab-report.htr
Condliff, Michelle Keim; Lewis, David D.; Madigan, David and Posse, Christian, 1998. Bayesian Mixed-Effects Models for Recommender Systems. http://www.cs.umbc.edu/~ian/sigir99-rec/
Goldberg, D.; Nichols, D.; Oki, B. M. and Terry, D., 1992. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12).
Joachims, Thorsten; Freitag, Dayne and Mitchell, Tom, 1996. WebWatcher: A Tour Guide for the World Wide Web. http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-6/web-agent/www/project-home.html
Lieberman, H., 1996. Letizia: An Agent That Assists Web Browsing. MIT Media Lab.
Maltz, David and Ehrlich, Kate, 1995. Pointing the way: active collaborative filtering. http://www.acm.org/sigchi/chi95/Electronic/documnts/papers/ke_bdy.htm
Oard, Douglas W. and Marchionini, Gary, 1996. A Conceptual Framework for Text Filtering. http://www.ee.umd.edu/medlab/filter/papers/filter/filter.html
Pennock, D. M. and Horvitz, E. Collaborative Filtering by Personality Diagnosis. http://www.research.microsoft.com/~horvitz/cfpd.htm
Resnick, Paul; Iacovou, Neophytos; et al., 1994. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In Proceedings of the ACM Conference on Computer Supported Cooperative Work (CSCW '94), 175-186.
Shardanand, Upendra and Maes, Pattie, 1995. Social Information Filtering: Algorithms for Automating "Word of Mouth". http://www.acm.org/sigchi/chi95/Electronic/documnts/papers/us_bdy.htm
Terveen, Loren G.; Hill, William C.; et al., 1997. Building Task-Specific Interfaces to High Volume Conversational Data. http://www.acm.org/sigchi/chi97/proceedings/paper/lgt.htm
Turnbull, Don. Augmenting Information Seeking on the World Wide Web Using Collaborative Filtering Techniques. http://donturn.fis.utoronto.ca/research/augmentis.htn
Turnbull, Don. http://donturn.fis.utoronto.ca/research/kmdi-cf.html
Ungar, Lyle H. and Foster, Dean P., 1998. A Formal Statistical Approach to Collaborative Filtering. http://www.cis.upenn.edu/~ungar/papers.html
Wittenburg, Kent; Das, Duco; Hill, Will and Stead, Larry, 1998. Group Asynchronous Browsing on the World Wide Web. http://www.w3.org/Conferences/WWW4/Papers/98/