IR Lecture 6b
This lecture
Improving results
• For high recall. E.g., a search for aircraft does not match plane;
nor does thermodynamic match heat
Options for improving results…
• Focus on relevance feedback
• The complete landscape
• Global methods
• Query expansion
• Thesauri
• Automatic thesaurus generation
• Local methods
• Relevance feedback
• Pseudo relevance feedback
Query expansion
Relevance Feedback
Relevance feedback: user feedback on relevance
of docs in initial set of results
• User issues a (short, simple) query
• The user marks returned documents as relevant or non-relevant.
• The system computes a better representation of the
information need based on feedback.
• Relevance feedback can go through one or more
iterations.
Idea:
it may be difficult to formulate a good query
when you don’t know the collection well, so iterate
Relevance Feedback: Example
Image search engine
http://nayana.ece.ucsb.edu/imsearch/imsearch.html
Results for Initial Query
Relevance Feedback
Results after Relevance Feedback
Rocchio Algorithm
The Rocchio algorithm incorporates relevance feedback
information into the vector space model.
Want to maximize sim(Q, C_r) - sim(Q, C_nr)
The optimal query vector for separating relevant and
non-relevant documents (with cosine similarity):

$$\vec{q}_{opt} = \frac{1}{|C_r|}\sum_{\vec{d}_j \in C_r} \vec{d}_j \;-\; \frac{1}{N - |C_r|}\sum_{\vec{d}_j \notin C_r} \vec{d}_j$$

where C_r is the set of relevant documents and N is the total number of documents.
[Figure: the optimal query separates relevant documents (o) from non-relevant documents (x)]
Rocchio 1971 Algorithm (SMART)
Used in practice:
$$\vec{q}_m = \alpha \vec{q}_0 \;+\; \beta \frac{1}{|D_r|}\sum_{\vec{d}_j \in D_r} \vec{d}_j \;-\; \gamma \frac{1}{|D_{nr}|}\sum_{\vec{d}_j \in D_{nr}} \vec{d}_j$$

where D_r and D_nr are the sets of known relevant and non-relevant documents,
and the weights α, β, γ trade off the original query against positive and
negative feedback.
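A minimal sketch of this update in Python with NumPy. The default weights (alpha=1.0, beta=0.75, gamma=0.15) and the toy vectors are illustrative assumptions, not values given in the lecture.

```python
import numpy as np

def rocchio(q0, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """SMART Rocchio: q_m = alpha*q0 + beta*centroid(Dr) - gamma*centroid(Dnr).

    q0          : original query vector (1-D array)
    rel_docs    : 2-D array, one relevant document vector per row (Dr)
    nonrel_docs : 2-D array, one non-relevant document vector per row (Dnr)
    """
    qm = alpha * q0
    if len(rel_docs):
        qm = qm + beta * np.mean(rel_docs, axis=0)      # centroid of Dr
    if len(nonrel_docs):
        qm = qm - gamma * np.mean(nonrel_docs, axis=0)  # centroid of Dnr
    return np.maximum(qm, 0.0)  # negative term weights are usually clipped to 0

# Toy 4-term vocabulary; all numbers are invented for illustration.
q0  = np.array([1.0, 0.0, 0.0, 1.0])
dr  = np.array([[0.9, 0.8, 0.0, 0.7]])   # judged relevant
dnr = np.array([[0.0, 0.1, 0.9, 0.0]])   # judged non-relevant
print(rocchio(q0, dr, dnr))
```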
Relevant documents may form several clusters (relevance prototypes), e.g.:
• Burma/Myanmar
• Contradictory government policies
Often: instances of a general concept
Good editorial content can address the problem
• E.g., a report on contradictory government policies
Relevance Feedback: Problems
Why do most search engines not use
relevance feedback?
Relevance Feedback: Problems
Long queries are inefficient for a typical IR engine:
• Long response times for the user
• High cost for the retrieval system
• Partial solution:
• Only reweight certain prominent terms
• Perhaps the top 20 by term frequency, as in the sketch below
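A small sketch of that partial solution: keep only the k highest-weighted terms of the expanded query. The function name and the sample weights are hypothetical.

```python
def truncate_query(term_weights, k=20):
    """Keep only the k highest-weighted terms of an expanded query.

    term_weights: dict mapping term -> weight (e.g., term frequency)
    """
    top = sorted(term_weights.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(top)

# Illustrative weights, not from the lecture.
expanded = {"aircraft": 2.1, "plane": 1.4, "engine": 0.9, "the": 0.1}
print(truncate_query(expanded, k=3))  # drops the lowest-weighted term "the"
```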
Users are often reluctant to provide explicit feedback.
It's often harder to understand why a particular document was
retrieved after applying relevance feedback.
Evaluation of relevance feedback strategies
Use q0 and compute a precision-recall graph
Use qm and compute a precision-recall graph
• Assess on all documents in the collection
• Spectacular improvements, but … it’s cheating!
• Partly due to known relevant documents ranked higher
• Must evaluate with respect to documents not seen by user
• Use documents in residual collection (set of documents minus
those assessed relevant)
• Measures usually then lower than for original query
• But a more realistic evaluation
• Relative performance can be validly compared
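A sketch of the residual-collection measurement. The helper `retrieve` (returns a ranked list of doc ids) and the judgment sets `assessed` and `relevant` are hypothetical names, not part of the lecture.

```python
def residual_precision_at_k(retrieve, query, assessed, relevant, k=10):
    """Precision@k on the residual collection: documents the user has
    already judged are removed from the ranking before measuring.

    retrieve : hypothetical helper returning a ranked list of doc ids
    assessed : set of doc ids the user judged during feedback
    relevant : set of doc ids judged relevant (ground truth)
    """
    ranked = [d for d in retrieve(query) if d not in assessed]
    top_k = ranked[:k]
    return sum(1 for d in top_k if d in relevant) / max(len(top_k), 1)
```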
Empirically, one round of relevance feedback is often
very useful. A second round is sometimes marginally useful.
Relevance Feedback on the Web
Relevance Feedback
Summary
Full relevance feedback is not very efficient in most IR systems.
Other types of interactive retrieval may improve relevance with less work:
• Thesaurus
• Controlled vocabulary
• Browse lists of terms in the inverted index
Query Expansion
In relevance feedback, users give additional input
(relevant/non-relevant) on documents, which is used
to reweight the terms in the query.
In query expansion, users give additional input
(good/bad search term) on words or phrases.
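A minimal sketch of query expansion with a hand-built synonym table; the THESAURUS entries are illustrative stand-ins for a real thesaurus or controlled vocabulary.

```python
THESAURUS = {                     # illustrative entries, not a real resource
    "aircraft": ["plane", "airplane"],
    "thermodynamic": ["heat"],
}

def expand_query(terms):
    """Global query expansion: add thesaurus synonyms for each query term."""
    expanded = list(terms)
    for term in terms:
        expanded.extend(THESAURUS.get(term, []))
    return expanded

print(expand_query(["aircraft", "safety"]))
# ['aircraft', 'safety', 'plane', 'airplane']
```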
Automatic Thesaurus Generation
Let A be the term-document matrix with m rows (terms t_i) and n columns
(documents d_j), where A_{ij} is the (normalized) weight of term t_i in
document d_j. Then

$$C = A A^T$$

is a term-term correlation matrix: C_{uv} scores how strongly terms u and v
co-occur across the collection.
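A small NumPy sketch of this construction with a toy 3-term, 3-document matrix; the weights and term labels are invented for illustration.

```python
import numpy as np

# Rows = terms, columns = documents; all weights invented for illustration.
A = np.array([
    [1.0, 0.0, 1.0],   # t0, e.g. "aircraft"
    [1.0, 0.0, 0.8],   # t1, e.g. "plane"
    [0.0, 1.0, 0.0],   # t2, e.g. "heat"
])

# Normalize rows so the entries of C behave like cosine similarities.
A = A / np.linalg.norm(A, axis=1, keepdims=True)
C = A @ A.T            # C[u, v] = similarity between terms u and v
print(np.round(C, 2))  # t0 and t1 come out highly correlated
```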
Automatic Thesaurus Generation
Example
Automatic Thesaurus Generation
Discussion
Quality of associations is usually a problem.
Term ambiguity may introduce irrelevant statistically
correlated terms.
• “Apple computer” → “Apple red fruit computer”
Pseudo Relevance Feedback
Automates the manual part of relevance feedback:
• Retrieve a ranked list of hits for the user's query
• Assume that the top k documents are relevant
• Do relevance feedback (e.g., Rocchio), as in the sketch below
Problems:
• Can go wrong for some queries; repeated iterations can cause query drift
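A sketch of that recipe, reusing the `rocchio` function from the earlier sketch; `retrieve` and `doc_vector` are hypothetical helpers standing in for a real engine.

```python
import numpy as np

# retrieve() and doc_vector() are hypothetical helpers; rocchio() is the
# function from the earlier sketch.
def pseudo_relevance_feedback(q0, retrieve, doc_vector, k=5):
    """Blind feedback: treat the top-k hits as relevant, then run one
    Rocchio step with an empty non-relevant set."""
    top_k = retrieve(q0)[:k]                      # assumed relevant
    dr = np.array([doc_vector(d) for d in top_k])
    dnr = np.empty((0, len(q0)))                  # no negative feedback
    return rocchio(q0, dr, dnr)
```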
There are two well-known models for applying relevance
feedback:
(i) Ide's method [69]: this method adds all the terms of the
relevant documents in the feedback set and subtracts the terms
of the first (highest-ranked) irrelevant document in the set.
The modified query vector is constructed as:
$$\vec{q}_{new} = \vec{q}_{old} + \sum_{\vec{d}_i \in R} \vec{d}_i \;-\; \vec{s}$$

where R is the set of relevant feedback documents and s is the first
(top-ranked) non-relevant document.
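A one-function sketch of this update (the Ide Dec-Hi variant); the function name and the NumPy representation are assumptions.

```python
import numpy as np

def ide_dec_hi(q_old, rel_docs, top_nonrel):
    """Ide Dec-Hi update: add every relevant document vector, subtract
    only the top-ranked non-relevant one (unweighted sums, no 1/|D|)."""
    return q_old + np.sum(rel_docs, axis=0) - top_nonrel
```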
Pseudo relevance feedback at TREC 4 (Cornell SMART; numbers are relevant
documents retrieved):
• lnc.ltc 3210
• lnc.ltc-PsRF 3634
• Lnu.ltu 3709
• Lnu.ltu-PsRF 4350
Indirect relevance feedback
On the web, DirectHit introduced a form of
indirect relevance feedback.
DirectHit ranked documents higher that users clicked on more often
in the results list; clicked-on results are assumed likely to be relevant.
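A sketch of click-based reranking in this spirit; the linear score boost and the weight are illustrative assumptions, not DirectHit's actual formula.

```python
from collections import Counter

clicks = Counter()                      # doc_id -> observed click count

def record_click(doc_id):
    """Log that a user clicked a result (the indirect feedback signal)."""
    clicks[doc_id] += 1

def rerank(scored, weight=0.1):
    """Boost base scores by click counts.
    scored is a list of (doc_id, base_score) pairs; the linear
    combination and weight=0.1 are illustrative assumptions."""
    return sorted(scored, key=lambda ds: ds[1] + weight * clicks[ds[0]],
                  reverse=True)
```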