Content-Based Filtering
The first part of the chapter presents the basic concepts and terminology
of content-based recommender systems, a high-level architecture, and their main
advantages and drawbacks. The second part of the chapter provides a review of the
state of the art of systems adopted in several application domains, by thoroughly
describing both classical and advanced techniques for representing items and user
profiles. The most widely adopted techniques for learning user profiles are also
presented. The last part of the chapter discusses trends and future research which
might lead towards the next generation of systems, by describing the role of User
Generated Content as a way of taking into account evolving vocabularies, and the
challenge of providing users with serendipitous recommendations, that is to say,
surprisingly interesting items that they might not have otherwise discovered.
• PROFILE LEARNER – This module collects data representative of the user's
preferences and tries to generalize this data in order to construct the user profile.
Usually, the generalization strategy is realized through machine learning
techniques, which are able to infer a model of user interests starting from items
liked or disliked in the past. For instance, the PROFILE LEARNER of a Web page
recommender can implement a relevance feedback method in which the learning
technique combines vectors of positive and negative examples into a prototype
vector representing the user profile. Training examples are Web pages for which the
user has provided positive or negative feedback.
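The relevance feedback idea described for the PROFILE LEARNER can be illustrated with a minimal, Rocchio-style sketch. It is not the method of any specific system: the function name `learn_profile`, the weights `beta` and `gamma`, and the toy term vectors are assumptions made only for illustration.

```python
from collections import defaultdict

def learn_profile(positive, negative, beta=1.0, gamma=0.5):
    """Combine term-weight vectors of liked and disliked pages
    into a single prototype vector (hypothetical helper).

    beta and gamma are assumed weights for positive and negative examples.
    """
    profile = defaultdict(float)
    # Add the averaged contribution of pages with positive feedback.
    for vector in positive:
        for term, weight in vector.items():
            profile[term] += beta * weight / len(positive)
    # Subtract the averaged contribution of pages with negative feedback.
    for vector in negative:
        for term, weight in vector.items():
            profile[term] -= gamma * weight / len(negative)
    return dict(profile)

# Toy training examples: each Web page is a dict of term weights.
liked = [{"jazz": 1.0, "concert": 0.5}, {"jazz": 0.5, "review": 1.0}]
disliked = [{"football": 1.0, "review": 0.5}]

profile = learn_profile(liked, disliked)
print(profile)
```

Terms that occur mostly in liked pages end up with positive weights, while terms seen only in disliked pages become negative, so the prototype vector can later be compared with the representation of a new page.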
The first step of the recommendation process is performed by the
CONTENT ANALYZER, which usually borrows techniques from Information
Retrieval systems. Item descriptions coming from the Information Source are processed
by the CONTENT ANALYZER, which extracts features (keywords, n-grams,
concepts, ...) from unstructured text to produce a structured item representation,
stored in the repository Represented Items. In order to construct and update the
profile of the active user u (the user for whom recommendations must be provided), her
reactions to items are collected and recorded in the repository
Feedback. These reactions, called annotations or feedback, together with the related
item descriptions, are exploited during the process of learning a model useful to
predict the actual relevance of newly presented items. Users can also explicitly
define their areas of interest as an initial profile without providing any feedback.
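As a rough illustration of what the CONTENT ANALYZER does, the sketch below turns unstructured text into a bag of keywords and bigrams. The tokenizer, the tiny stopword list, the function name `analyze`, and the dictionary standing in for the Represented Items repository are all assumptions; real systems typically use richer Information Retrieval pipelines.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, far from exhaustive).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for"}

def analyze(text, n=2):
    """Extract keyword and n-gram counts from unstructured text."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOPWORDS]
    # n-grams are built after stopword removal, so they may span removed words.
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(tokens + ngrams)

# Hypothetical stand-in for the Represented Items repository.
represented_items = {
    "page-1": analyze("The history of jazz and the jazz concert"),
}

print(represented_items["page-1"]["jazz"])          # 2
print(represented_items["page-1"]["jazz concert"])  # 1
```

Each item is now a structured feature vector that a profile learner can consume together with the user's feedback on that item.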
Typically, it is possible to distinguish between two kinds of relevance feedback:
positive information (inferring features liked by the user) and
negative information (inferring features the user is not interested in).
Two different techniques can be adopted for recording the user’s feedback.
When a system requires the user to explicitly evaluate items, the technique is
usually referred to as “explicit feedback”; the other technique, called “implicit
feedback”, does not require any active user involvement, in the sense that
feedback is derived from monitoring and analyzing the user’s activities. Explicit
evaluations indicate how relevant or interesting an item is to the user. There are
three main approaches to get explicit relevance feedback:
• like/dislike – items are classified as “relevant” or “not relevant” by
adopting a simple binary rating scale;
• ratings – a discrete numeric scale is usually adopted to judge items.
Alternatively, symbolic ratings are mapped to a numeric scale, such as in
Syskill & Webert, where users have the possibility of rating a Web page as hot,
lukewarm, or cold;
• text comments – comments about a single item are collected and presented to
the users as a means of facilitating the decision-making process. For instance,
customers’ feedback at Amazon.com or eBay.com might help users in deciding
whether an item has been appreciated by the community. Textual comments are
helpful, but they can overload the active user because she must read and
interpret each comment to decide if it is positive or negative, and to what degree.
The literature proposes advanced techniques from the affective
computing research area to make content-based recommenders able to
automatically perform this kind of analysis.
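The like/dislike and rating approaches described above can be reduced to a common numeric scale before being stored in the Feedback repository. The sketch below is one possible way to do so. The numeric values assigned to hot, lukewarm, and cold, the 1-5 rating scale, and the function names are assumptions for illustration, not the actual encoding used by Syskill & Webert. Text comments are left out because they require the kind of automatic analysis mentioned above.

```python
# Assumed numeric encoding of the symbolic scale (illustrative only).
SYMBOLIC_SCALE = {"cold": 1, "lukewarm": 2, "hot": 3}

def normalize(score, low, high):
    """Map a score from the range [low, high] to [0, 1]."""
    return (score - low) / (high - low)

def record_feedback(repository, user, item, kind, value):
    """Store one explicit evaluation as a number in [0, 1]."""
    if kind == "like/dislike":
        score = 1.0 if value else 0.0      # binary scale
    elif kind == "rating":
        score = normalize(value, 1, 5)     # assumed 1-5 numeric scale
    elif kind == "symbolic":
        score = normalize(SYMBOLIC_SCALE[value], 1, 3)
    else:
        raise ValueError(f"unsupported feedback kind: {kind}")
    repository[(user, item)] = score
    return score

feedback = {}  # hypothetical stand-in for the Feedback repository
record_feedback(feedback, "u", "page-1", "like/dislike", True)
record_feedback(feedback, "u", "page-2", "rating", 4)
record_feedback(feedback, "u", "page-3", "symbolic", "lukewarm")
print(feedback)
```

Normalizing every kind of evaluation to the same interval lets a single learning component treat binary, numeric, and symbolic feedback uniformly.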