Query Languages and Query Operation: Chapter Seven

This document discusses different types of query languages used for information retrieval systems, including keyword, phrase, boolean, and other advanced queries. It explains how each type of query works by matching documents based on words, phrases, boolean operators, or other criteria. The document also covers techniques for improving queries, such as relevance feedback and query expansion, to address problems with basic keyword matching and retrieve additional relevant documents.

Uploaded by milkikoo shifera
Copyright © All Rights Reserved

Query Languages and Query Operation
Chapter Seven

1 04/29/21
Common Types of Queries
A. Keyword-based querying
Queries are combinations of words.
The document collection is searched for documents that contain these words.
• Word queries are intuitive, easy to express, and provide fast ranking.
• Here, the concept of a word must be defined.
• A word is a sequence of letters terminated by a separator (period, comma, space, etc.).
• The definition of letter and separator is flexible; e.g., a hyphen could be defined as a letter or as a separator.
• Usually, common words (such as “a”, “the”, “of”, …) are ignored.
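The word definition above leaves two choices open: what counts as a letter and which common words to drop. A minimal sketch, assuming a hypothetical `tokenize` helper, ASCII letters and digits as "letters", and a small illustrative stopword list, shows how the hyphen choice changes the resulting words:

```python
import re

# Illustrative stopword list (an assumption; real systems use much longer lists).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "to"}

def tokenize(text, hyphen_is_letter=False, drop_stopwords=True):
    """Split text into lowercase words.

    A word is a maximal run of "letters"; any other character acts as a
    separator. Letters are taken to be ASCII letters and digits, plus the
    hyphen when hyphen_is_letter is True.
    """
    letter_run = r"[A-Za-z0-9-]+" if hyphen_is_letter else r"[A-Za-z0-9]+"
    words = [w.lower() for w in re.findall(letter_run, text)]
    if drop_stopwords:
        # Ignore common words such as "a", "the", "of".
        words = [w for w in words if w not in STOPWORDS]
    return words

sentence = "The state-of-the-art search, of course."
print(tokenize(sentence))                         # ['state', 'art', 'search', 'course']
print(tokenize(sentence, hyphen_is_letter=True))  # ['state-of-the-art', 'search', 'course']
```

Treating the hyphen as a separator splits compounds into their parts (and exposes "of" and "the" to stopword removal); treating it as a letter keeps the compound as one searchable word.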

B. Single-word queries
A query is a single word; this form is often used when searching document images.
It is the simplest form of query.
All documents that include this word are retrieved.
• Documents may be ranked by the frequency of the word in each document.
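Ranking by word frequency can be sketched in a few lines. This assumes documents are given as a hypothetical id-to-text dictionary and uses a simplified tokenizer (lowercase alphanumeric runs); the function name `single_word_query` is illustrative:

```python
import re
from collections import Counter

def words(text):
    # Simplified tokenizer (assumption): lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def single_word_query(word, documents):
    """Retrieve every document containing `word`, ranked by its frequency.

    Returns (doc_id, count) pairs, highest count first; documents with
    equal counts keep their original order because sorted() is stable.
    """
    word = word.lower()
    hits = []
    for doc_id, text in documents.items():
        count = Counter(words(text))[word]
        if count > 0:
            hits.append((doc_id, count))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

docs = {
    "d1": "Retrieval of information",
    "d2": "Information retrieval and information filtering",
    "d3": "Database systems",
}
print(single_word_query("information", docs))  # [('d2', 2), ('d1', 1)]
```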

C. Phrase queries
• A query is a sequence of words treated as a single unit.
Also called a “literal string” or “exact phrase” query.
• The phrase is usually surrounded by quotation marks.
• All documents that include this phrase are retrieved.
• Usually, separators (commas, colons, etc.) and common words (e.g., “a”, “the”, “of”, “for”, …) in the phrase are ignored.
In effect, this query is for a set of words that must appear in sequence.
Phrase queries let users specify a context and thus gain more precision.
Example: “Information Processing for Document Retrieval”.
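Since separators and common words are ignored, a phrase query reduces to checking that the remaining words occur consecutively. A minimal sketch under that reading, with an illustrative stopword list and hypothetical helper names:

```python
import re

# Illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "to"}

def content_words(text):
    """Lowercase words with separators and common (stop) words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

def contains_phrase(text, phrase):
    """True if the phrase's content words appear consecutively in the text."""
    doc, target = content_words(text), content_words(phrase)
    n = len(target)
    if n == 0:
        return False
    # Slide a window of length n over the document's words.
    return any(doc[i:i + n] == target for i in range(len(doc) - n + 1))

def phrase_query(phrase, documents):
    return [doc_id for doc_id, text in documents.items() if contains_phrase(text, phrase)]

docs = {
    "d1": "A guide to information processing for document retrieval.",
    "d2": "Document retrieval and information processing.",
}
print(phrase_query("Information Processing for Document Retrieval", docs))  # ['d1']
```

Both documents contain all four content words, but only d1 has them in order, which is where the extra precision over a plain multiple-word query comes from. Because stopwords are dropped on both sides, a different common word in place of "for" would also match.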
D. Multiple-word queries
In this case a query is a set of words (or phrases).
Two options: a document is retrieved if it includes
• any of the query words, or
• each of the query words.
Documents are ranked by the number of query words they contain:
• A document containing n query words is ranked higher than a document containing m < n query words.
• Documents are ranked in decreasing order: those containing all the query words are ranked at the top, those containing only one query word at the bottom.
• Frequency counts may be used to break ties among documents that contain the same number of query words.
Example: what is the result for the query “Red Bird”?
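One way to work the “Red Bird” example is to score each document by the number of distinct query words it contains and break ties with the total frequency of those words. The sketch below assumes four hypothetical documents and a hypothetical `multi_word_query` function; `require_all` switches between the "any" and "each" options:

```python
import re
from collections import Counter

def multi_word_query(query, documents, require_all=False):
    """Rank documents by how many distinct query words they contain.

    require_all=False retrieves documents containing ANY query word;
    require_all=True retrieves only documents containing EACH query word.
    Ties on the number of matched words are broken by the total frequency
    of the query words in the document.
    """
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        matched = sum(1 for t in terms if counts[t] > 0)
        if matched == 0 or (require_all and matched < len(terms)):
            continue
        total = sum(counts[t] for t in terms)
        scored.append((doc_id, matched, total))
    scored.sort(key=lambda s: (s[1], s[2]), reverse=True)
    return [doc_id for doc_id, _, _ in scored]

docs = {
    "d1": "The red car stopped at the red light.",
    "d2": "A red bird sat on the branch.",
    "d3": "Bird songs at dawn.",
    "d4": "The blue sky.",
}
print(multi_word_query("Red Bird", docs))                    # ['d2', 'd1', 'd3']
print(multi_word_query("Red Bird", docs, require_all=True))  # ['d2']
```

d2 contains both words and ranks first; d1 and d3 each contain one word, and d1 wins the tie because "red" occurs twice. d4 contains neither word and is not retrieved.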
E. Boolean queries
Based on concepts from logic, using the operators AND, OR, and NOT.
A Boolean query describes the information needed by relating multiple words with Boolean operators.
Semantics: for each query word w, a corresponding set Dw is constructed that includes the documents that contain w.
The Boolean expression is then interpreted as an expression on the corresponding document sets, with the corresponding set operators:
AND (intersection): finds only documents containing all of the specified words or phrases.
OR (union): finds documents containing at least one of the specified words or phrases.
NOT (difference): excludes documents containing the specified word or phrase.
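The set semantics translate directly into code: build each Dw from an inverted index, then apply intersection, union, and difference. A minimal sketch, assuming a hypothetical four-document collection and an illustrative helper `D`:

```python
import re

def build_index(documents):
    """Map each word w to D_w, the set of ids of the documents that contain w."""
    index = {}
    for doc_id, text in documents.items():
        for w in set(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(w, set()).add(doc_id)
    return index

docs = {
    "d1": "computer hardware",
    "d2": "server maintenance",
    "d3": "mainframe computer",
    "d4": "computer and server racks",
}
index = build_index(docs)
all_docs = set(docs)

def D(word):
    # D_w for a word; an unknown word matches no documents.
    return index.get(word, set())

# AND -> intersection, OR -> union, NOT -> difference from the whole collection.
print(sorted(D("computer") & D("server")))  # ['d4']
print(sorted(D("computer") | D("server")))  # ['d1', 'd2', 'd3', 'd4']
print(sorted(all_docs - D("computer")))     # ['d2']
```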
Boolean Queries
Precedence (order of operations): NOT, then AND, then OR.
• Use parentheses to override precedence.
• Process operators with the same precedence from left to right.
Truth Table

P      Q      NOT P   P AND Q   P OR Q
FALSE  FALSE  TRUE    FALSE     FALSE
FALSE  TRUE   TRUE    FALSE     TRUE
TRUE   FALSE  FALSE   FALSE     TRUE
TRUE   TRUE   FALSE   TRUE      TRUE
Examples: Boolean queries
1. computer OR server
Finds documents containing computer, server, or both.

2. (computer OR server) NOT mainframe
Selects all documents that discuss computers or servers, and does not select any documents that discuss mainframes.

3. computer NOT (server OR mainframe)
Selects all documents that discuss computers and do not discuss either servers or mainframes.

4. computer OR server NOT mainframe
Because NOT has higher precedence than OR, this is read as computer OR (server NOT mainframe): selects all documents that discuss computers, plus documents that discuss servers but do not discuss mainframes.
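The precedence rules and the four examples above can be checked with a small recursive-descent evaluator: one function per precedence level, with the lowest level (OR) calling the next one down. This is a sketch under stated assumptions: operators must be written in upper case, NOT is accepted both as a binary "but not" and as a unary complement, and the class name `BooleanQuery` and the five documents are hypothetical.

```python
import re

def build_index(documents):
    """Map each word w to D_w, the set of ids of the documents containing w."""
    index = {}
    for doc_id, text in documents.items():
        for w in set(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(w, set()).add(doc_id)
    return index

class BooleanQuery:
    """Recursive-descent evaluator for queries using AND, OR and NOT.

    Precedence, highest first: NOT, AND, OR. Parentheses override it, and
    operators of equal precedence are applied left to right. Operators must
    be upper case; every other token is treated as a query word.
    """

    def __init__(self, documents):
        self.index = build_index(documents)
        self.all_docs = set(documents)

    def evaluate(self, query):
        self.tokens = re.findall(r"[()]|[A-Za-z0-9]+", query)
        self.pos = 0
        result = self._or()
        if self.pos != len(self.tokens):
            raise ValueError(f"unexpected token {self.tokens[self.pos]!r}")
        return result

    def _peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _next(self):
        token = self._peek()
        if token is None:
            raise ValueError("unexpected end of query")
        self.pos += 1
        return token

    def _or(self):  # lowest precedence: union
        result = self._and()
        while self._peek() == "OR":
            self._next()
            result = result | self._and()
        return result

    def _and(self):  # middle precedence: intersection
        result = self._not()
        while self._peek() == "AND":
            self._next()
            result = result & self._not()
        return result

    def _not(self):  # highest precedence: binary NOT is set difference
        result = self._primary()
        while self._peek() == "NOT":
            self._next()
            result = result - self._primary()
        return result

    def _primary(self):
        token = self._next()
        if token == "(":
            result = self._or()
            if self._next() != ")":
                raise ValueError("missing ')'")
            return result
        if token == "NOT":  # unary NOT: complement within the collection
            return self.all_docs - self._primary()
        if token in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return set(self.index.get(token.lower(), set()))

docs = {
    "d1": "computer hardware",
    "d2": "server maintenance",
    "d3": "mainframe computer",
    "d4": "computer and server racks",
    "d5": "mainframe server",
}
engine = BooleanQuery(docs)
for q in ["computer OR server",
          "(computer OR server) NOT mainframe",
          "computer NOT (server OR mainframe)",
          "computer OR server NOT mainframe"]:
    print(q, "->", sorted(engine.evaluate(q)))
```

On these documents the four queries return d1 to d5, then {d1, d2, d4}, then {d1}, then {d1, d2, d3, d4}. Document d3 ("mainframe computer") separates example 2 from example 4: without parentheses, NOT binds only to "server", so d3 is still retrieved through "computer".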

Penalizing documents
When interpreting queries, some models demote (rank lower) documents that include keywords that were not requested.
Example: assume the vector model with the cosine measure, and the simple case where both documents and queries use binary values.
Consider these two documents and a query:
• d1 = (0,1,0,1,0), d2 = (0,1,1,1,0), q = (0,1,0,1,0)
• sim(q, d1) = 2/(√2·√2) = 1.0, sim(q, d2) = 2/(√2·√3) ≈ 0.82
• d2 is demoted because it includes an extra keyword not requested by q.
In contrast, the Boolean model does not “penalize” documents with extra (non-requested) keywords.
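The arithmetic of this example can be reproduced directly. The sketch below computes the cosine measure on the binary vectors from the slide and, for contrast, a Boolean AND of the requested keywords (a hypothetical `boolean_and` helper), which accepts both documents equally:

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def boolean_and(query, doc):
    # A document matches if it contains every keyword the query asks for.
    return all(d for qv, d in zip(query, doc) if qv)

q = (0, 1, 0, 1, 0)
d1 = (0, 1, 0, 1, 0)
d2 = (0, 1, 1, 1, 0)

print(round(cosine(q, d1), 2), round(cosine(q, d2), 2))  # 1.0 0.82
print(boolean_and(q, d1), boolean_and(q, d2))            # True True
```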
Query Operations:
Relevance Feedback & Query Expansion

Problems with Keywords
May not retrieve relevant documents that include
synonymous terms.
◦ “restaurant” vs. “café”
◦ “PRC” vs. “China”
May retrieve irrelevant documents that include
ambiguous terms.
◦ “bat” (baseball vs. mammal)
◦ “Apple” (company vs. fruit)
◦ “bit” (unit of data vs. act of eating)

Query operations
Without detailed knowledge of the collection and the retrieval environment:
• it is difficult to formulate queries well designed for retrieval;
• many formulations of a query may be needed for effective retrieval.
The first formulation is often a naïve attempt to retrieve relevant information.
Documents initially retrieved:
• can be examined for relevance, either by the user or automatically by the system;
• can be used to improve the query formulation to retrieve additional relevant documents.

Query reformulation
Two basic techniques revise a query to account for feedback:
• Query expansion: expand the original query by adding new terms taken from relevant documents.
• Term reweighting in the expanded query: modify term weights based on user relevance judgements.
◦ Increase the weight of terms in relevant documents.
◦ Decrease the weight of terms in irrelevant documents.
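The slides do not name a reweighting formula. One widely used choice, swapped in here as an illustration, is Rocchio's formula q' = α·q + β·mean(relevant) − γ·mean(non-relevant), which both expands the query (new terms from relevant documents enter with positive weight) and reweights it. In this sketch vectors are term-to-weight dictionaries, the α, β, γ values are illustrative assumptions, and negative weights are clipped to zero:

```python
from collections import defaultdict

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Reweight and expand a query vector with Rocchio's formula.

    q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant)
    Vectors are dicts mapping term -> weight; terms whose final weight
    is not positive are dropped.
    """
    new_q = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, w in doc.items():
                new_q[term] += coeff * w / len(docs)  # contribution to the mean
    return {t: w for t, w in new_q.items() if w > 0}

q = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 1.0}, {"jaguar": 1.0, "cat": 1.0, "jungle": 1.0}]
nonrelevant = [{"jaguar": 1.0, "car": 1.0}]
new_q = rocchio(q, relevant, nonrelevant)
print({t: round(w, 3) for t, w in new_q.items()})
# {'jaguar': 1.6, 'cat': 0.75, 'jungle': 0.375}
```

"cat" and "jungle" enter the query from the relevant documents, while "car", seen only in the non-relevant document, ends with a negative weight and is dropped.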

Approaches for Relevance Feedback
• Approaches based on user relevance feedback
◦ Relevance feedback with user input.
◦ Description of a cluster built interactively with user assistance.
• Approaches based on pseudo relevance feedback
◦ Use relevance feedback methods without explicit user involvement.
◦ Obtain the cluster description automatically.
◦ Identify terms related to query terms, e.g. synonyms, stemming variations, terms close to query terms in text.
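Of the related-term sources listed, "terms close to query terms in text" is the easiest to sketch: count how often each word appears within a fixed window of a query term. The window size, the stopword set, and the function name `related_terms` are assumptions for illustration, and the three documents are hypothetical:

```python
import re
from collections import Counter

def related_terms(query_terms, documents, window=3, top_k=3, stopwords=frozenset()):
    """Rank words by how often they occur within `window` words of a query term."""
    query_terms = {t.lower() for t in query_terms}
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z0-9]+", text.lower())
        for i, w in enumerate(words):
            if w not in query_terms:
                continue
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                neighbour = words[j]
                if j != i and neighbour not in query_terms and neighbour not in stopwords:
                    counts[neighbour] += 1
    return [term for term, _ in counts.most_common(top_k)]

docs = [
    "cheap restaurant menu in town",
    "restaurant menu and wine list",
    "the cafe menu changes daily",
]
print(related_terms({"restaurant"}, docs, window=2, top_k=2,
                    stopwords={"in", "and", "the"}))  # ['menu', 'cheap']
```

Synonyms such as "café" would need a longer window, more text, or a thesaurus; this sketch only captures words that literally appear near the query term.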

User Relevance Feedback
The most popular query reformulation strategy.
Cycle:
• The user is presented with a list of retrieved documents.
◦ After the initial retrieval results are presented, the user is allowed to give feedback on the relevance of one or more of the retrieved documents.
• The user marks those that are relevant.
◦ In practice, the top 10-20 ranked documents are examined.
• This feedback information is used to reformulate the query.
◦ Select important terms from the documents assessed relevant by users.
• Enhance the importance of these terms in a new query.
◦ Produce new results based on the reformulated query.
• This allows a more interactive, multi-pass process.
Expected:
• The new query moves towards relevant documents and away from non-relevant documents.
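The "select important terms, then enhance their importance" step of the cycle can be sketched as one reformulation pass. Here term importance is simply the number of user-marked relevant documents containing the term; that measure, the `boost` value, the stopword list, and the function name `reformulate` are all assumptions (tf-idf weighting or Rocchio's formula are common alternatives):

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "to", "on"}  # illustrative

def reformulate(query_weights, relevant_texts, n_new=2, boost=0.5):
    """One feedback pass: boost and add terms from documents marked relevant."""
    # Importance of a term = number of relevant documents containing it.
    df = Counter()
    for text in relevant_texts:
        df.update(set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS)
    new_query = dict(query_weights)
    # Enhance existing query terms that occur in the relevant documents.
    for term in new_query:
        if df[term]:
            new_query[term] += boost
    # Add the most important new terms (ties broken alphabetically).
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    candidates = [t for t, _ in ranked if t not in new_query]
    for term in candidates[:n_new]:
        new_query[term] = boost * df[term] / len(relevant_texts)
    return new_query

marked_relevant = [
    "The jaguar is a big cat of the jungle",
    "Jaguar cat hunting in the rainforest jungle",
]
print(reformulate({"jaguar": 1.0}, marked_relevant))
# {'jaguar': 1.5, 'cat': 0.5, 'jungle': 0.5}
```

The reformulated query would then be run again to produce the re-ranked list, and the cycle can repeat.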

User Relevance Feedback Architecture
[Figure: query string → IR system (document corpus) → ranked documents (1. Doc1, 2. Doc2, 3. Doc3, …) → user feedback → query reformulation → revised query → IR system → re-ranked documents (1. Doc2, 2. Doc4, 3. Doc5, …)]
Pseudo Relevance Feedback
Simply assume the top m retrieved documents are relevant, and use them to reformulate the query.
This allows query expansion with terms that are correlated with the query terms.
Two strategies:
• Local strategies: approaches based on information derived from the set of initially retrieved documents (the local set of documents).
• Global strategies: approaches based on global information derived from the whole document collection.
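A local pseudo relevance feedback loop can be sketched end to end: retrieve, assume the top m documents are relevant, add their most frequent non-query terms, and retrieve again. The toy scoring function (count of query-term occurrences), the stopword list, the parameter values, and the four documents are all hypothetical:

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "to", "on", "is"}  # illustrative

def words(text):
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

def score(query_terms, text):
    """Toy retrieval score: number of occurrences of query terms in the text."""
    counts = Counter(words(text))
    return sum(counts[t] for t in query_terms)

def search(query_terms, documents):
    ranked = sorted(documents, key=lambda d: score(query_terms, documents[d]), reverse=True)
    return [d for d in ranked if score(query_terms, documents[d]) > 0]

def pseudo_feedback(query_terms, documents, m=2, n_new=2):
    """Assume the top-m results are relevant and expand the query from them."""
    top = search(query_terms, documents)[:m]
    counts = Counter()
    for d in top:
        counts.update(words(documents[d]))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    expansion = [t for t, _ in ranked if t not in query_terms][:n_new]
    expanded = list(query_terms) + expansion
    return expanded, search(expanded, documents)

docs = {
    "d1": "china economy growth china trade",
    "d2": "china trade policy and prc economy",
    "d3": "prc economy report",
    "d4": "porcelain tableware",
}
print(pseudo_feedback(["china"], docs))
# (['china', 'economy', 'trade'], ['d1', 'd2', 'd3'])
```

The original query retrieves only d1 and d2; after expansion, d3, which mentions "PRC" but never "china", is also retrieved. This is the recall gain expansion aims for, and the same mechanism can pull in off-topic documents when the top m results are not actually relevant.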

Pseudo Feedback Architecture
[Figure: query string → IR system (document corpus) → ranked documents (1. Doc1, 2. Doc2, 3. Doc3, …) → pseudo feedback → query reformulation → revised query → IR system → re-ranked documents (1. Doc2, 2. Doc4, 3. Doc5, …)]
Query Expansion Conclusions
Expanding queries with related terms can improve performance, particularly recall.
However, similar terms must be selected very carefully to avoid problems such as loss of precision.

Thank You
