IR Lecture 6b

Lecture 9: Query Expansion

This lecture
• Improving results
  - For high recall. E.g., searching for aircraft doesn’t match with plane, nor thermodynamic with heat
• Options for improving results…
  - Focus on relevance feedback
  - The complete landscape
    - Global methods: query expansion (thesauri, automatic thesaurus generation)
    - Local methods: relevance feedback, pseudo relevance feedback
Query expansion
Relevance Feedback
• Relevance feedback: user feedback on relevance of docs in an initial set of results
  - User issues a (short, simple) query
  - The user marks returned documents as relevant or non-relevant.
  - The system computes a better representation of the information need based on the feedback.
  - Relevance feedback can go through one or more iterations.
• Idea: it may be difficult to formulate a good query when you don’t know the collection well, so iterate
Relevance Feedback: Example
• Image search engine: http://nayana.ece.ucsb.edu/imsearch/imsearch.html
• (Screenshots: results for initial query, relevance feedback, results after relevance feedback)
Rocchio Algorithm
• The Rocchio algorithm incorporates relevance feedback information into the vector space model.
• Want to maximize sim(Q, Cr) − sim(Q, Cnr)
• The optimal query vector for separating relevant and non-relevant documents (with cosine similarity):

  $$\vec{Q}_{opt} = \frac{1}{|C_r|}\sum_{\vec{d}_j \in C_r}\vec{d}_j \;-\; \frac{1}{N-|C_r|}\sum_{\vec{d}_j \notin C_r}\vec{d}_j$$

  Qopt = optimal query; Cr = set of relevant doc vectors; N = collection size
• Unrealistic: we don’t know the relevant documents.
The Theoretically Best Query
[Figure: the optimal query vector separates relevant documents (o) from non-relevant documents (x) in the vector space]
Rocchio 1971 Algorithm (SMART)
• Used in practice:

  $$\vec{q}_m = \alpha\,\vec{q}_0 \;+\; \beta\,\frac{1}{|D_r|}\sum_{\vec{d}_j \in D_r}\vec{d}_j \;-\; \gamma\,\frac{1}{|D_{nr}|}\sum_{\vec{d}_j \in D_{nr}}\vec{d}_j$$

  qm = modified query vector; q0 = original query vector; α, β, γ: weights (hand-chosen or set empirically); Dr = set of known relevant doc vectors; Dnr = set of known non-relevant doc vectors
• New query moves toward relevant documents and away from non-relevant documents
• Tradeoff α vs. β/γ: if we have a lot of judged documents, we want a higher β/γ.
• Term weights can go negative (see the sketch below)
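To make the update concrete, here is a minimal Python sketch of the Rocchio formula above, assuming queries and documents are already sparse term-weight dictionaries; the function name, weights, and example data are illustrative, not from a specific system.

# A minimal sketch of the Rocchio (1971) update on sparse term->weight dicts;
# alpha/beta/gamma values are illustrative defaults.
from collections import defaultdict

def rocchio_update(q0, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    qm = defaultdict(float)
    for term, w in q0.items():                 # alpha * original query
        qm[term] += alpha * w
    if rel_docs:                               # + beta * centroid of known relevant docs
        for d in rel_docs:
            for term, w in d.items():
                qm[term] += beta * w / len(rel_docs)
    if nonrel_docs:                            # - gamma * centroid of known non-relevant docs
        for d in nonrel_docs:
            for term, w in d.items():
                qm[term] -= gamma * w / len(nonrel_docs)
    # Weights can go negative (as noted above); here they are simply dropped.
    return {t: w for t, w in qm.items() if w > 0}

# Example: query "aircraft" with one judged relevant and one judged non-relevant document.
q0 = {"aircraft": 1.0}
rel = [{"aircraft": 0.5, "plane": 0.8}]
nonrel = [{"aircraft": 0.2, "insurance": 0.9}]
print(rocchio_update(q0, rel, nonrel))   # "plane" enters the query with a positive weight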
Relevance feedback on initial query
[Figure: initial query and revised query after feedback; o = known relevant documents, x = known non-relevant documents; the revised query moves toward the relevant documents]
Relevance Feedback in vector spaces
• We can modify the query based on relevance feedback and apply the standard vector space model.
• Use only the docs that were marked.
• Relevance feedback can improve recall and precision
• Relevance feedback is most useful for increasing recall in situations where recall is important
  - Users can be expected to review results and to take time to iterate
Positive vs Negative Feedback
• Positive feedback is more valuable than negative feedback (so, set γ < β; e.g., γ = 0.25, β = 0.75).
• Many systems only allow positive feedback (γ = 0). Why?
Probabilistic relevance feedback
• Rather than reweighting in a vector space…
• If the user has told us some relevant and irrelevant documents, then we can proceed to build a classifier, such as a Naive Bayes model:
  - P(tk|R) = |Drk| / |Dr|
  - P(tk|NR) = (Nk − |Drk|) / (N − |Dr|)
  - tk = term in document; Drk = set of known relevant docs containing tk; Nk = total number of docs containing tk; Dr = set of known relevant docs; N = collection size
  - This is effectively another way of changing the query term weights (see the sketch below)
  - But note: the above proposal preserves no memory of the original weights
Relevance Feedback: Assumptions
• A1: User has sufficient knowledge for the initial query.
• A2: Relevance prototypes are “well-behaved”.
  - Term distribution in relevant documents will be similar
  - Term distribution in non-relevant documents will be different from those in relevant documents
  - Either: all relevant documents are tightly clustered around a single prototype.
  - Or: there are different prototypes, but they have significant vocabulary overlap.
  - Similarities between relevant and irrelevant documents are small
Violation of A1
• User does not have sufficient initial knowledge.
• Examples:
  - Misspellings (Brittany Speers).
  - Cross-language information retrieval
  - Mismatch of searcher’s vocabulary vs. collection vocabulary
    - Cosmonaut/astronaut
Violation of A2
• There are several relevance prototypes.
• Examples:
  - Burma/Myanmar
  - Contradictory government policies
• Often: instances of a general concept
• Good editorial content can address the problem
  - Report on contradictory government policies
Relevance Feedback: Problems
• Why do most search engines not use relevance feedback?
Relevance Feedback: Problems
• Long queries are inefficient for a typical IR engine.
  - Long response times for the user.
  - High cost for the retrieval system. Why?
  - Partial solution:
    - Only reweight certain prominent terms
    - Perhaps top 20 by term frequency
• Users are often reluctant to provide explicit feedback
• It’s often harder to understand why a particular document was retrieved after applying relevance feedback
Evaluation of relevance feedback strategies
• Use q0 and compute a precision-recall graph
• Use qm and compute a precision-recall graph
  - Assess on all documents in the collection
    - Spectacular improvements, but… it’s cheating!
    - Partly due to known relevant documents being ranked higher
    - Must evaluate with respect to documents not seen by the user
  - Use documents in the residual collection (set of documents minus those assessed relevant); see the sketch below
    - Measures are usually then lower than for the original query
    - But a more realistic evaluation
    - Relative performance can be validly compared
• Empirically, one round of relevance feedback is often very useful. Two rounds are sometimes marginally useful.
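A minimal sketch of the residual-collection idea, assuming we have the ranked list for qm and the set of documents already assessed in the feedback round (names and data are illustrative):

# Sketch: precision at k on the residual collection (assessed docs removed from the ranking).
def residual_precision_at_k(ranking, assessed, relevant, k=10):
    residual = [d for d in ranking if d not in assessed]   # residual collection
    top_k = residual[:k]
    return sum(1 for d in top_k if d in relevant) / max(len(top_k), 1)

# Toy example: d1 and d2 were assessed in the feedback round and are excluded.
ranking = ["d3", "d1", "d7", "d2", "d9"]
assessed = {"d1", "d2"}
relevant = {"d2", "d3", "d7"}
print(residual_precision_at_k(ranking, assessed, relevant, k=3))   # 2/3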
Relevance Feedback on the Web
• Some search engines offer a similar/related pages feature (this is a trivial form of relevance feedback)
  - Google (link-based)
  - (What would α/β/γ be here?)
  - Altavista
  - Stanford WebBase
• But some don’t, because it’s hard to explain to the average user:
  - Alltheweb
  - msn
  - Yahoo
• Excite initially had true relevance feedback, but abandoned it due to lack of use.
Excite Relevance Feedback
Spink et al. 2000
• Only about 4% of query sessions used the relevance feedback option
  - Expressed as a “More like this” link next to each result
• But about 70% of users only looked at the first page of results and didn’t pursue things further
  - So 4% is about 1/8 of people extending their search
• Relevance feedback improved results about 2/3 of the time
Other Uses of Relevance Feedback
• Following a changing information need
• Maintaining an information filter (e.g., for a news feed)
Relevance Feedback: Summary
• Relevance feedback has been shown to be very effective at improving relevance of results.
  - Requires enough judged documents, otherwise it’s unstable (≥ 5 recommended)
  - Requires queries for which the set of relevant documents is medium to large
• Full relevance feedback is painful for the user.
• Full relevance feedback is not very efficient in most IR systems.
• Other types of interactive retrieval may improve relevance by as much with less work.
The complete landscape
• Global methods
  - Query expansion/reformulation
    - Thesauri (or WordNet)
    - Automatic thesaurus generation
  - Global indirect relevance feedback
• Local methods
  - Relevance feedback
  - Pseudo relevance feedback
Query Reformulation: Vocabulary Tools
• Feedback
  - Information about stop lists, stemming, etc.
  - Numbers of hits on each term or phrase
• Suggestions
  - Thesaurus
  - Controlled vocabulary
  - Browse lists of terms in the inverted index
Query Expansion
• In relevance feedback, users give additional input (relevant/non-relevant) on documents, which is used to reweight terms in the documents
• In query expansion, users give additional input (good/bad search term) on words or phrases.
Query Expansion: Example
Also see: www.altavista.com, www.teoma.com
Types of Query Expansion
• Global Analysis (static; of all documents in collection)
  - Controlled vocabulary
    - Maintained by editors (e.g., medline)
  - Manual thesaurus
    - E.g. MedLine: physician, syn: doc, doctor, MD, medico
  - Automatically derived thesaurus
    - (co-occurrence statistics)
  - Refinements based on query log mining
    - Common on the web
• Local Analysis (dynamic)
  - Analysis of documents in result set
Controlled Vocabulary
Thesaurus-based Query Expansion
• This doesn’t require user input
• For each term t in a query, expand the query with synonyms and related words of t from the thesaurus (see the sketch below)
  - feline → feline cat
• May weight added terms less than original query terms.
• Generally increases recall.
• Widely used in many science/engineering fields
• May significantly decrease precision, particularly with ambiguous terms.
  - “interest rate” → “interest rate fascinate evaluate”
• There is a high cost of manually producing a thesaurus
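A minimal sketch of this expansion step, assuming a small hand-built thesaurus dictionary; the entries and the down-weighting factor are made up for illustration.

# Sketch: expand each query term with thesaurus synonyms at a reduced weight.
THESAURUS = {                      # illustrative entries, not a real thesaurus
    "feline": ["cat"],
    "interest": ["fascinate"],
    "rate": ["evaluate"],
}

def expand_query(query_terms, thesaurus=THESAURUS, added_weight=0.5):
    weights = {t: 1.0 for t in query_terms}          # original terms keep full weight
    for t in query_terms:
        for syn in thesaurus.get(t, []):
            weights.setdefault(syn, added_weight)    # added terms weighted less
    return weights

print(expand_query(["feline"]))            # {'feline': 1.0, 'cat': 0.5}
print(expand_query(["interest", "rate"]))  # shows the precision risk with ambiguous terms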
Automatic Thesaurus Generation
• Attempt to generate a thesaurus automatically by analyzing the collection of documents
• Two main approaches
  - Co-occurrence based (co-occurring words are more likely to be similar)
  - Shallow analysis of grammatical relations
    - Entities that are grown, cooked, eaten, and digested are more likely to be food items.
• Co-occurrence based is more robust; grammatical relations are more accurate.
Co-occurrence Thesaurus
• Simplest way to compute one is based on term-term similarities in C = AAᵀ, where A is the m × n term-document matrix (m terms ti, n documents dj).
• wi,j = (normalized) weighted count of term ti in document dj
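A small NumPy sketch of the C = AAᵀ computation; the toy matrix and term weights below are made up for illustration.

import numpy as np

# Toy term-document matrix A (m = 3 terms, n = 3 documents); weights are illustrative.
terms = ["aircraft", "plane", "interest"]
A = np.array([
    [0.8, 0.0, 0.6],   # aircraft
    [0.7, 0.1, 0.5],   # plane
    [0.0, 0.9, 0.0],   # interest
])

C = A @ A.T            # term-term similarity matrix

# Most similar term for each term (ignoring the diagonal).
for i, t in enumerate(terms):
    row = C[i].copy()
    row[i] = -np.inf
    print(t, "->", terms[int(np.argmax(row))])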
Automatic Thesaurus Generation: Example
Automatic Thesaurus Generation: Discussion
• Quality of associations is usually a problem.
• Term ambiguity may introduce irrelevant, statistically correlated terms.
  - “Apple computer” → “Apple red fruit computer”
• Problems:
  - False positives: words deemed similar that are not
  - False negatives: words deemed dissimilar that are similar
• Since terms are highly correlated anyway, expansion may not retrieve many additional documents.
Query Expansion: Summary
• Query expansion is often effective in increasing recall.
  - Not always with general thesauri
  - Fairly successful for subject-specific collections
• In most cases, precision is decreased, often significantly.
• Overall, not as useful as relevance feedback; may be as good as pseudo-relevance feedback
Pseudo Relevance Feedback
• Mostly works (perhaps better than global analysis!)
  - Found to improve performance in the TREC ad-hoc task
  - Danger of query drift
• Pseudo relevance feedback is relevance feedback without user intervention
• Automatic local analysis
• Pseudo relevance feedback attempts to automate the manual part of relevance feedback (see the sketch below):
  - Retrieve an initial set of documents.
  - Assume that the top m ranked documents are relevant.
  - Do relevance feedback
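A minimal sketch of the loop, assuming a search(query_vec, k) function and sparse term-weight vectors; the function, parameter values, and the choice to add only the strongest terms are hypothetical illustrations, and only positive feedback is used.

# Sketch of pseudo relevance feedback: assume the top m results are relevant,
# add their strongest terms to the query, and re-run the search.
from collections import defaultdict

def pseudo_relevance_feedback(query_vec, search, m=10, beta=0.5, n_terms=20):
    initial = search(query_vec, k=m)          # initial retrieval; top m assumed relevant
    centroid = defaultdict(float)
    for doc_vec in initial:                   # centroid of the assumed-relevant docs
        for term, w in doc_vec.items():
            centroid[term] += w / m
    top_terms = sorted(centroid, key=centroid.get, reverse=True)[:n_terms]
    expanded = dict(query_vec)
    for term in top_terms:                    # add expansion terms at reduced weight
        expanded[term] = expanded.get(term, 0.0) + beta * centroid[term]
    return search(expanded, k=100)            # final retrieval with the expanded query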
There are two well-known models for applying relevance feedback:
• (i) Ide’s method [69]: This method adds all the terms of the relevant documents in the feedback set and removes the terms of the first irrelevant document in the set. The modified query vector is constructed as:

  $$\vec{q}_{new} = \vec{q}_{old} + \sum_{d_i \in R}\vec{d}_i - \vec{s}$$

  where qold is the original query vector, qnew is the new query, di is the vector of a relevant document in the set R, and s is the vector of the first irrelevant document in the feedback set.
• (ii) Rocchio’s method [127]: Rocchio’s method consists of moving the initial query vector toward the centroid of the relevant documents and away from the centroid of the non-relevant documents. It attempts to estimate “the optimal” user query through relevance feedback. This can be described by the following equation:

  $$\vec{q}_{new} = \alpha\,\vec{q}_{old} + \beta\sum_{d_i \in R}\vec{d}_i - \gamma\sum_{d_i \in I}\vec{d}_i$$

  where di is a relevant or irrelevant document obtained by manual or automatic feedback during initial retrieval, R is the set of relevant documents, I is the set of irrelevant documents, and α, β and γ are coefficients, set by trial and error; e.g. α = 1, β = 0.75 and γ = 0.15.
• In pseudo-relevance feedback, the modified query vector is calculated by dropping the negative terms appearing in Ide’s and Rocchio’s equations (see the sketch below).
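For comparison with the Rocchio sketch earlier, here is a minimal sketch of Ide’s update and its positive-only variant used for pseudo-relevance feedback, again on sparse term-weight dictionaries; all names are illustrative.

# Sketch of Ide's update: add all terms of the relevant feedback documents and
# subtract the first non-relevant document; the pseudo-feedback variant simply
# omits the subtraction (no negative terms).
from collections import defaultdict

def ide_update(q_old, rel_docs, first_nonrel=None):
    q_new = defaultdict(float, q_old)
    for d in rel_docs:                       # add every relevant document vector
        for term, w in d.items():
            q_new[term] += w
    if first_nonrel is not None:             # subtract the first non-relevant document
        for term, w in first_nonrel.items():
            q_new[term] -= w
    return dict(q_new)

def ide_pseudo_update(q_old, top_docs):
    # pseudo-relevance feedback: treat the top-ranked documents as relevant, drop negatives
    return ide_update(q_old, top_docs, first_nonrel=None)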
Query drift
• Query expansion following retrieval feedback may degrade performance if the top-ranked documents retrieved during the initial run are not relevant.
• Expanding the query by adding terms from irrelevant documents, or adding terms from relevant documents that are not closely related to the query terms, will move the query representation away from what may be the “optimal” query representation.
• This results in an alteration of the focus of the query, i.e. ‘query drift’.
Pseudo relevance feedback: Cornell SMART at TREC 4
• Results show the number of relevant documents out of the top 100 for 50 queries (so out of 5000)
• Results contrast two length normalization schemes (L vs. l) and pseudo relevance feedback (PsRF), done by adding 20 terms:
  - lnc.ltc: 3210
  - lnc.ltc-PsRF: 3634
  - Lnu.ltu: 3709
  - Lnu.ltu-PsRF: 4350
Indirect relevance feedback
• On the web, DirectHit introduced a form of indirect relevance feedback.
• DirectHit ranked documents higher that users looked at more often.
  - Clicked-on links are assumed likely to be relevant
  - Assuming the displayed summaries are good, etc.
• Globally: not user- or query-specific.
• This is the general area of clickstream mining
