
Week 2 - Information Retrieval Basics

The document outlines the fundamentals of Information Retrieval (IR), detailing various retrieval models, techniques, and approaches used to find relevant documents from large collections. It covers topics such as text-based retrieval, Boolean retrieval, vector space retrieval, and the importance of indexing and similarity computation in the retrieval process. Additionally, it highlights the role of user queries and the different types of data that can be retrieved, including unstructured data like text and multimedia.

Distributed Information Systems

(CIS3-535)

Prof. Dr. Mourad Elloumi

2022 - 2023
Information Retrieval- 1
Part 1

Information Retrieval

Information Retrieval- 2
1. Information Retrieval - Overview
1.1 Introduction to Information Retrieval
1.2 Basic Information Retrieval
1.2.1 Text-Based Information Retrieval
1.2.2 Boolean Retrieval
1.2.3 Vector Space Retrieval
1.2.4 Evaluating Information Retrieval
1.2.5 Probabilistic Information Retrieval
1.2.6 Query Expansion
1.2.6.1 User Relevance Feedback
1.2.6.2 Global Query Expansion
1.3 Indexing for Information Retrieval
1.3.1 Inverted Index
1.3.2 Web-scale Indexing: Map-Reduce
1.3.3 Distributed Retrieval
1.3.3.1 Fagin’s algorithm
1.3.3.2 Threshold algorithm
1.4 Embedding Techniques
1.4.1 Latent Semantic Indexing
1.4.2 Latent Dirichlet Allocation
1.4.3 Word Embeddings – skipgram, CBOW
1.4.4 Fasttext
1.4.5 Glove
1.5 Link-Based Ranking
1.5.1 PageRank
1.5.2 Hyperlink-Induced Topic Search (HITS)
1.5.3 Link Indexing

Information Retrieval- 3
1.1 INTRODUCTION TO INFORMATION RETRIEVAL

Information Retrieval- 4
What is Information Retrieval?
Information retrieval (IR) is the task of finding in a
large collection of documents those that satisfy the
information needs of a user

Examples
– Searching documents in a library
– Searching the Web

Information Retrieval- 5
Different Types of Information Retrieval
Documents can be
– unstructured data like texts, images, audio, and video
Queries can be both structured and unstructured
– Boolean expressions
– Free text, sample documents
Results can be sorted or unsorted
– Result sets
– Ranked lists

Information Retrieval- 6
Basic Information Retrieval Approach
[Figure: basic information retrieval approach. The content of unstructured information items passes through feature extraction into a structured document representation; the unstructured information need passes through query formulation into a structured query representation. Similarity matching between the two produces a ranked or binary result. The retrieval model determines relevance; the retrieval system determines efficiency.]
Information Retrieval- 7
Example: Text Retrieval
[Figure: the same approach instantiated for text retrieval. The text content of Web documents is processed by feature extraction into terms and words (e.g. "web retrieval"); a Web search need is formulated as a keyword query (e.g. "web information retrieval"). Matching is based on the occurrence of query terms in documents, under a retrieval model such as Boolean or Vector; the retrieval system (Google, Bing, etc.) returns a ranked list of Web documents.]
Information Retrieval- 8
Formally
Document collection: C ⊆ D
Collection feature extraction: F_C : P(D) → Rep_C
Document representation: F_D : D × Rep_C → Rep_D, with rep_d = F_D(d, F_C(C)) for d ∈ C
Query representation: F_Q : Q × P(D) → Rep_Q, with rep_q = F_Q(q, F_C(C)) for q ∈ Q
Similarity function: sim : Rep_D × Rep_Q → V

The representation depends on the individual document as well as on the features of the collection!
The similarity function can be computed for each document individually!

Information Retrieval- 9
Retrieval Model
The retrieval model determines
– the structure of the document representation
– the structure of the query representation
– the similarity matching function

Relevance
– determined by the similarity matching function
– should reflect right topic, user needs, authority, recency
– no objective measure

Information Retrieval- 10
What does the Similarity Function compute?
Two basic models
1. Boolean Retrieval: sim(q, dj) ∈ {0, 1}
2. Ranked Retrieval: sim(q, dj) ∈ [0, 1] or ℝ

General Wisdom
- Boolean retrieval suitable for
- Experts: can formulate accurate queries
- Machines: can consume large results
- Ranked Retrieval good for ordinary users

Information Retrieval- 11
Information Filtering
[Figure: information filtering. The content of incoming information items passes through feature extraction into structured document representations; the user's information needs are captured in a profile, represented as a structured query. Similarity matching disseminates an item to the user if it is relevant. The retrieval model determines relevance; the filtering system determines efficiency.]
Information Retrieval- 12
Information Retrieval and Browsing
Retrieval
– Produce a ranked result from a user request
– Interpretation of the information by the system
Browsing
– Let the user navigate in the information set
– Relevance feedback by the human

[Figure: retrieval (posing a query against the information set) contrasted with browsing (the user navigating the information set).]
Information Retrieval- 13
Other tasks
In a more general sense Information Retrieval is used
for a number of different types of tasks, such as
– Information filtering
– Document summarization
– Question answering
– Recommendation
– Document classification

Information Retrieval- 14
IR is an Information Management Task

[Figure: data at the bottom, a model M at the top, with arrows for model building (data → model) and model usage (model → data).]

Model Building: given data, find a model that matches the data
– In IR the model is often based on simple statistics on the data
Model Usage: given a model, compute some specific data
– In IR: retrieve result documents for a query

Information Retrieval- 15
1.2 BASIC INFORMATION RETRIEVAL

Information Retrieval- 16
1.2.1 Text-based Information Retrieval
Most of the information needs and content are expressed in
natural language
– Library and document management systems
– Web Search Engines
Basic approach: use the words that occur in a text as
features for the interpretation of the content
– This is called the "full text” or “bag of words” retrieval approach
– Ignore grammar, meaning etc.
– Simplification that has proven successful
– Document structure, layout and metadata may be considered
additionally (e.g., PageRank/Google)
Information Retrieval- 17
Architecture of Text Retrieval Systems
[Figure: architecture of a text retrieval system. (1) Feature extraction: the user need, entered through the user interface, and the text documents are processed by text operations into a query representation and document representations. (2) Ranking system: query operations (including user feedback) form the query, the searching module retrieves documents via the index, and the ranking module returns ranked documents. (3) Efficient data access: the indexing module and the DB manager maintain an inverted file over the text database.]
Information Retrieval- 18
Pre-Processing Text for Text Retrieval

[Figure: feature extraction pipeline. Documents are split into tokens; stopwords are removed; stemming is applied; manual indexing may add further terms. The resulting index terms form the full-text representation, while document structure, layout and metadata yield additional structured data.]

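A minimal, illustrative Python sketch of this feature-extraction pipeline (tokenization, stopword removal, naive stemming). The stopword list and the suffix rules are assumptions for illustration, not the ones used in the course.

import re

# Illustrative stopword list and suffix rules (assumptions, not the course's choices)
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "with"}
SUFFIXES = ("ing", "ies", "es", "s")

def tokenize(text):
    # Split a text into lowercase word tokens
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    # Very naive suffix stripping; a real system would use e.g. a Porter stemmer
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def extract_index_terms(text):
    # Docs -> tokens -> stopword removal -> stemming -> index terms
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]

print(extract_index_terms("A Course on Integral Equations"))
# ['course', 'integral', 'equation']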
Information Retrieval- 19
Text Retrieval - Basic Concepts and Notations
Document d: expresses ideas about some topic in a natural language
Query q: expresses an information need for documents pertaining
to some topic
Index term: a semantic unit, a word, short phrase, or potentially root
of a word
Database DB: collection of n documents dj ∈ DB, j = 1, …, n
Vocabulary T: collection of m index terms ki ∈ T, i = 1, …, m

A document is represented by a set of index terms ki

The importance of an index term ki for the meaning of a document dj is represented by a weight wij ∈ [0, 1]; we write dj = (w1j, …, wmj)

The IR system assigns a similarity coefficient sim(q, dj) as an estimate for the relevance of a document dj ∈ DB for a query q.
Information Retrieval- 20
Example: Documents
B1 A Course on Integral Equations
B2 Attractors for Semigroups and Evolution Equations
B3 Automatic Differentiation of Algorithms: Theory, Implementation, and Application
B4 Geometrical Aspects of Partial Differential Equations
B5 Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative
Algebra
B6 Introduction to Hamiltonian Dynamical Systems and the N-Body Problem
B7 Knapsack Problems: Algorithms and Computer Implementations
B8 Methods of Solving Singular Systems of Ordinary Differential Equations
B9 Nonlinear Systems
B10 Ordinary Differential Equations
B11 Oscillation Theory for Neutral Differential Equations with Delay
B12 Oscillation Theory of Delay Differential Equations
B13 Pseudodifferential Operators and Nonlinear Partial Differential Equations
B14 Sinc Methods for Quadrature and Differential Equations
B15 Stability of Stochastic Differential Equations with Respect to Semi-Martingales
B16 The Boundary Integral Approach to Static and Dynamic Contact Problems
B17 The Double Mellin-Barnes Type Integrals and Their Applications to Convolution Theory

Information Retrieval- 21
Term-Document Matrix
Matrix of weights wij

This example
• Vocabulary (contains only terms that occur multiple times, no stop words)
• all weights are set to 1 (equal importance)
Information Retrieval- 22
Implementation in Python

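A small sketch of how the vocabulary for such a term-document matrix can be built in Python; the subset of documents, the stopword list and the crude plural stripping below are illustrative assumptions.

import re
from collections import Counter

# A few of the example documents (titles abbreviated); the full collection is B1..B17
docs = {
    "B1": "A Course on Integral Equations",
    "B3": "Automatic Differentiation of Algorithms: Theory, Implementation, and Application",
    "B7": "Knapsack Problems: Algorithms and Computer Implementations",
    "B17": "The Double Mellin-Barnes Type Integrals and Their Applications to Convolution Theory",
}

STOPWORDS = {"a", "an", "and", "the", "of", "on", "to", "for", "their"}

def index_terms(text):
    # Lowercase tokens minus stopwords, with a crude plural-stripping "stemmer"
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t.rstrip("s") if len(t) > 3 else t for t in tokens if t not in STOPWORDS}

doc_terms = {d: index_terms(text) for d, text in docs.items()}

# Vocabulary: as on the previous slide, keep only terms occurring in more than one document
df = Counter(t for terms in doc_terms.values() for t in terms)
vocabulary = sorted(t for t, cnt in df.items() if cnt > 1)
print(vocabulary)
# ['algorithm', 'application', 'implementation', 'integral', 'theory']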
Information Retrieval- 23
Implementation in Python

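Continuing the sketch from the previous slide (it reuses vocabulary and doc_terms defined there), a binary term-document matrix with weight 1 for every occurring term can be built as follows.

# Binary term-document matrix: w_ij = 1 if index term k_i occurs in document d_j, else 0
matrix = {d: [1 if t in terms else 0 for t in vocabulary]
          for d, terms in doc_terms.items()}

print(vocabulary)
for d in sorted(matrix):
    print(d, matrix[d])
# B1  [0, 0, 0, 1, 0]
# B17 [0, 1, 0, 1, 1]
# B3  [1, 1, 1, 0, 1]
# B7  [1, 0, 1, 0, 0]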
Information Retrieval- 24
A retrieval model attempts to capture …
1. the interface by which a user is accessing information
2. the importance a user gives to a piece of information
for a query
3. the formal correctness of a query formulation by user
4. the structure by which a document is organised

Information Retrieval- 25
Full-text retrieval refers to the fact that …
1. the document text is grammatically fully analyzed for
indexing
2. queries can be formulated as texts
3. all words of a text are considered as potential index
terms
4. grammatical variations of a word are considered as the
same index terms

Information Retrieval- 26
The entries of a term-document matrix indicate …

1. how many relevant terms a document contains


2. how frequent a term is in a given document
3. how relevant a term is for a given document
4. which terms occur in a document collection

Information Retrieval- 27
1.2.2 Boolean Retrieval
Users specify which terms should be present in the documents
– Simple, based on set-theory, precise meaning
– Frequently used in (old) library systems
– Still many applications, e.g., web harvesting
Example query
– "application" AND "theory" → answer: B3, B17

Retrieval Language
expr ::= term | (expr) | NOT expr | expr AND expr | expr OR expr

Weights for index terms appearing in documents
wij = 1 if ki ∈ dj and 0 otherwise
Information Retrieval- 28
"Similarity" Computation in Boolean Retrieval
Step 1:
Determine the disjunctive normal form of the query q
– A disjunction of conjunctions
– Using distributivity and De Morgan's laws, e.g. NOT (s AND t) ≡ NOT s OR NOT t
– Thus q = ct1 OR … OR ctl, where each cti = t1 AND … AND tk is a conjunction of literals, each literal being an index term t or its negation NOT t

Step 2:
For each conjunctive term ct create its query weight vector
vec(ct)
– vec(ct)=(w1,…,wm) :
wi = 1 if ki occurs in ct
wi = -1 if NOT ki occurs in ct
wi = 0 otherwise
Information Retrieval- 29
"Similarity" Computation in Boolean Retrieval
Step 3:
If one weight vector of a conjunctive term ct in q matches
the document weight vector dj = (w1j, …,wmj) of a document
dj , then the document dj is relevant, i.e.,
sim(dj, q) = 1
– vec(ct) matches dj if:
wi = 1 ⇒ wij = 1
wi = -1 ⇒ wij = 0

Information Retrieval- 30
Example
Index terms {application, algorithm, theory}
Query "application" AND ("algorithm" OR NOT "theory")

Disjunctive normal form of query


("application" AND "algorithm" AND "theory") OR
("application" AND "algorithm" AND NOT "theory") OR
("application" AND NOT "algorithm" AND NOT "theory")

Query weight vectors q={(1,1,1), (1,1,-1), (1,-1,-1)}


Documents d1 = {algorithm, theory, application} → (1, 1, 1)
d2 = {algorithm, theory} → (0, 1, 1)
d3 = {application, algorithm} → (1, 1, 0)

Result sim(d1, q) = sim(d3, q) = 1, sim(d2, q) = 0


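A small Python sketch of this matching procedure (the encoding as weight vectors and the function names are illustrative); it reproduces the result above.

# Index terms, in this order: (application, algorithm, theory)

def matches(ct_vec, doc_vec):
    # A conjunctive-term vector matches a document vector if
    # w_i = 1 implies w_ij = 1 and w_i = -1 implies w_ij = 0
    return all((w != 1 or wd == 1) and (w != -1 or wd == 0)
               for w, wd in zip(ct_vec, doc_vec))

def sim(query_vectors, doc_vec):
    # The document is relevant (similarity 1) if any conjunctive term matches
    return 1 if any(matches(ct, doc_vec) for ct in query_vectors) else 0

# "application" AND ("algorithm" OR NOT "theory"), in disjunctive normal form
query = [(1, 1, 1), (1, 1, -1), (1, -1, -1)]

d1, d2, d3 = (1, 1, 1), (0, 1, 1), (1, 1, 0)
print(sim(query, d1), sim(query, d2), sim(query, d3))   # 1 0 1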
Information Retrieval- 31
1.2.3 Vector Space Retrieval
Limitations of Boolean Retrieval
– No ranking: problems with handling large result sets
– Queries are difficult to formulate
– No tolerance for errors
– Queries either return far too many results, or none

Key Idea of Vector Space Retrieval


– Use “free text” queries
– represent both the document and the query by a weight vector
in the m-dimensional keyword space assigning non-binary
weights
– determine their distance in the m-dimensional keyword space

Information Retrieval- 32
Similarity Computation in Vector Space Retrieval

[Figure: document vector dj and query vector q in the m-dimensional keyword space spanned by k1, …, km; their similarity corresponds to the angle between them.]

sim(q, dj) = (q · dj) / (|q| · |dj|) = Σi wij·wiq / ( √(Σi wij²) · √(Σi wiq²) )

Since wij ≥ 0 and wiq ≥ 0, we have 0 ≤ sim(q, dj) ≤ 1

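A minimal sketch of this cosine similarity computation, assuming plain Python tuples as weight vectors; it reproduces the numbers of the example on the next slide.

import math

def cosine(q, d):
    # sim(q, d) = (q . d) / (|q| |d|)
    dot = sum(wq * wd for wq, wd in zip(q, d))
    nq = math.sqrt(sum(w * w for w in q))
    nd = math.sqrt(sum(w * w for w in d))
    return dot / (nq * nd) if nq and nd else 0.0

# Query "application algorithms" = (1, 1) against the document vectors of the example
print(cosine((1, 1), (1, 1)))   # 1.0        (B3)
print(cosine((1, 1), (0, 1)))   # 0.7071...  (B5, B7) = 1/sqrt(2)
print(cosine((1, 1), (1, 0)))   # 0.7071...  (B17)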
Information Retrieval- 33
Vector Space Retrieval - Properties
Properties
– Ranking of documents according to similarity value
– Documents can be retrieved even if they don’t contain some
query keyword

Today’s standard text retrieval technique


– Web Search Engines
– The vector model is the basis of most search engines, however
they do not rely on it exclusively
– It is simple and fast to compute

Information Retrieval- 34
Example
Query vector: "application algorithms" → (1, 1)

Document vectors:
– "application algorithms" → (1, 1) (B3): sim(q, B3) = 1
– "algorithms" → (0, 1) (B5, B7): sim(q, B5) = 1/√2
– "application" → (1, 0) (B17)

Issue: how to determine the weights for q and dj?
Information Retrieval- 35
Term Frequency
Documents are similar if they contain the same keywords
(frequently)
– Therefore use the frequency freq(i,j) of the keyword ki in the
document dj to determine the weight of the document vectors

(Normalized) term frequency of term ki in Document dj


tf(i, j) = freq(i, j) / max_{k∈T} freq(k, j)

Information Retrieval- 36
Example
Vocabulary T = {information, retrieval, agency}
Query q = (information, retrieval) = (1,1,0)

[Figure: four short example documents shown as text snippets containing the words "information", "retrieval" and "agency" with different frequencies.]

D1 = (1, 1, 0)       sim(q, D1) = 1
D2 = (0, 1, 0)       sim(q, D2) = 0.7071…
D3 = (0.5, 0.5, 1)   sim(q, D3) = 0.5773…
D4 = (0, 1, 1)       sim(q, D4) = 0.5
Information Retrieval- 37
Inverse Document Frequency
We have not only to consider how frequent a term occurs within a
document (measure for similarity), but also how frequent a term is in
the document collection of size n (measure for distinctiveness)
Inverse document frequency of term ki

idf(i) = log(n / ni) ∈ [0, log(n)]

ni = number of documents in which term ki occurs

Inverse document frequency can be interpreted as the amount of information associated with the term ki

Term weight (tf-idf): wij = tf(i, j) · idf(i)


Information Retrieval- 38
idf(i) = log(n / ni) ∈ [0, log(n)]

Example
Vocabulary T = {information, retrieval, agency}
Query q = (information, retrieval) = (1, 1, 0)
idf(information) = idf(agency) = log(2),  idf(retrieval) = log(1) = 0

[Figure: the same four example documents as before, now with tf-idf weights.]

D1 = (log(2), 0, 0)             sim(q, D1) = 0.7071…
D2 = (0, 0, 0)                  sim(q, D2) = 0
D3 = (0.5·log(2), 0, log(2))    sim(q, D3) = 0.316…
D4 = (0, 0, log(2))             sim(q, D4) = 0
Information Retrieval- 39
Query Weights
The same considerations as for document term weights apply also to
query term weights

Query weight for query q


wiq = ( freq(i, q) / max_{k∈T} freq(k, q) ) · log(n / ni)

Example: Query q = (information, retrieval)


– Query vector: (log(2), 0, 0)
– Scores: sim(q, D1)= 1
sim(q, D2)=0
sim(q, D3)=0.44…
sim(q, D4)=0
Information Retrieval- 40
Example
Query q = "application theory"

Boolean retrieval result


– application AND theory: B3, B17
– application OR theory: B3, B11, B12, B17

Vector retrieval result


– Query vector (0, 0.77…, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.63)
– Ranked Result:B17 1.0
B3 0.69
B12 0.28
B11 0.28
Information Retrieval- 41
Implementation in Python

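A minimal sketch of tf-idf weighting plus cosine ranking; the raw term counts are chosen so that they match the {information, retrieval, agency} example from the previous slides, and the function and variable names are illustrative.

import math

vocabulary = ["information", "retrieval", "agency"]

# Raw term frequencies freq(i, j) per document, chosen to yield the normalized
# tf vectors (1,1,0), (0,1,0), (0.5,0.5,1), (0,1,1) of the earlier example
freqs = {
    "D1": {"information": 2, "retrieval": 2},
    "D2": {"retrieval": 4},
    "D3": {"information": 1, "retrieval": 1, "agency": 2},
    "D4": {"retrieval": 2, "agency": 2},
}
n = len(freqs)

def idf(term):
    # idf(i) = log(n / n_i), with n_i = number of documents containing the term
    ni = sum(1 for f in freqs.values() if term in f)
    return math.log(n / ni) if ni else 0.0

def tfidf_vector(f):
    # Normalized term frequency times idf, for every term of the vocabulary
    max_freq = max(f.values())
    return [f.get(t, 0) / max_freq * idf(t) for t in vocabulary]

def cosine(q, d):
    dot = sum(a * b for a, b in zip(q, d))
    nq = math.sqrt(sum(a * a for a in q))
    nd = math.sqrt(sum(b * b for b in d))
    return dot / (nq * nd) if nq and nd else 0.0

# Query "information retrieval", weighted with the same tf-idf scheme
q_vec = tfidf_vector({"information": 1, "retrieval": 1})   # (log 2, 0, 0)

for doc, f in freqs.items():
    print(doc, round(cosine(q_vec, tfidf_vector(f)), 3))
# D1 1.0, D2 0.0, D3 0.447, D4 0.0  (cf. the Query Weights example: 1, 0, 0.44..., 0)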
Information Retrieval- 42
Let the query be represented by {(1, 0, -1), (0, -1, 1)}

and the document by (1, 0, 1). The document …


1. matches the query because it matches the first query vector
2. matches the query because it matches the second query vector
3. does not match the query because it does not match the first
query vector
4. does not match the query because it does not match the second
query vector

Information Retrieval- 43
The term frequency of a term is normalized …
1. by the maximal frequency of all terms in the document
2. by the maximal frequency of the term in the document collection
3. by the maximal frequency of any term in the vocabulary
4. by the maximal term frequency of any document in the
collection

Information Retrieval- 44
The inverse document frequency of a
term can increase …
1. by adding the term to a document that contains the term
2. by removing a document from the document collection that does
not contain the term
3. by adding a document to the document collection that contains
the term
4. by adding a document to the document collection that does not
contain the term

Information Retrieval- 45
The Role of Document Length
When computing cosine similarity, document vectors are normalized
Result for the query "information": doc2 will be favored because it is shorter
– doc1 = (information, agency, retrieval, system) → (0.25, 0.25, 0.25, 0.25)
– doc2 = (information, agency) → (0.5, 0.5, 0, 0)
(Singhal 1996)
Information Retrieval- 46
Normalization of Document Vector
Renormalize the document vector
– Standard normalization: the original normalization factor
– Pivoted normalization: a new normalization factor

Information Retrieval- 47
Compensating Bias towards Short Documents

[Figure: the new normalization factor plotted against the old one — a line with slope s and intercept (1 – s)·pivot, which crosses the identity line at the pivot.]

New normalization: N(n) = (1 – s)·pivot + s·n
Information Retrieval- 48
Length Normalization
Weighting scheme: divide term weights by N(n) = (1 – s)·pivot + s·n, where
– s = slope
– n = original normalization factor

Result
• If n < pivot, then N(n) > n and therefore the weights of short documents become smaller

Information Retrieval- 49
Pivoted Unique Query Normalization
Practical implementation of the approach
Weighting scheme: normalize with (1 – s)·pivot + s·nd, where
– pivot = average number of distinct terms in a document (over the collection)
– nd = number of distinct terms in the document

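A sketch of pivoted normalization in the spirit of Singhal et al. (1996), as cited on these slides; the slope value and the pivot used below are illustrative assumptions, not values prescribed by the course.

def pivoted_norm(n, pivot, slope=0.25):
    # Replace the original normalization factor n by a pivoted one: documents
    # shorter than the pivot get a larger normalization factor (smaller weights),
    # longer ones a smaller factor
    return (1.0 - slope) * pivot + slope * n

pivot = 10.0   # e.g. the average number of distinct terms per document
print(pivoted_norm(4, pivot))    # 8.5  > 4   -> weights of the short document shrink
print(pivoted_norm(40, pivot))   # 17.5 < 40  -> weights of the long document grow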
Information Retrieval- 50
Variants of Vector Space Retrieval Model
The vector model with tf-idf weights is a good ranking strategy for
general collections
– many alternative weighting schemes exist, but are not fundamentally
different

Information Retrieval- 51
Discussion of Vector Space Retrieval Model
Advantages
– term-weighting improves quality of the answer set
– partial matching allows retrieval of docs that approximate the
query conditions
– cosine ranking formula sorts documents according to degree of
similarity to the query
Disadvantages
– assumes independence of index terms; not clear that this is a
disadvantage
– No theoretical justification why the model works

Information Retrieval- 52
1.2.4 Evaluating Information Retrieval
Quality of a retrieval model depends on how well it
matches user needs!

Comparison to database querying


– unlike in IR, the correctness of query evaluation can be formally verified

Information Retrieval- 53
Evaluating Information Retrieval
Test collections with test queries, where the relevant documents are
identified manually are used to determine the quality of an IR system
(e.g. TREC)

[Figure: the document collection as a Venn diagram — the relevant documents R and the answer set A overlap in the true positives; documents in A but not in R are false positives, documents in R but not in A are false negatives, and the remaining documents are true negatives.]

Information Retrieval- 54
Recall and Precision
Recall is the fraction of relevant documents retrieved
from the set of total relevant documents collection-
wide
Precision is the fraction of relevant documents
retrieved from the total number retrieved (answer
set)
                 Relevant               Non-relevant
Retrieved        true positives (tp)    false positives (fp)
Not retrieved    false negatives (fn)   true negatives (tn)

Precision P = tp / (tp + fp)    Recall R = tp / (tp + fn)

Information Retrieval- 55
Recall and Precision – A Tradeoff
Suppose you search for “Theory of Relativity”.

Optimizing recall: retrieve all pages mentioning “theory” and “relativ*”


• We will have probably most documents talking about the topic
• We might have results such as, “In theory, I feel relatively good”,
“Relative to the theory of evolution …” etc.
Optimizing precision: retrieve all pages mentioning “relativity theory”
and “expanding universe”
• Most likely all results are relevant
• But we might miss “the theory of relativity by Einstein”

Thus high recall hurts precision and vice versa


Information Retrieval- 56
Combined Measures
Sometimes we want to characterize the performance
of a retrieval system by one number
F-Measure: weighted harmonic mean of precision P and recall R
Fα = 1 / ( α·(1/P) + (1 − α)·(1/R) )

F1: balanced F-Measure with α = 1/2
F1 = 2·P·R / (P + R)

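A small sketch of precision, recall and the weighted harmonic-mean F-measure as defined above; the confusion-matrix counts are taken from the Classifier 1 example on the following slides.

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_measure(p, r, alpha=0.5):
    # Weighted harmonic mean; alpha = 1/2 gives the balanced F1 = 2PR / (P + R)
    return 1.0 / (alpha / p + (1.0 - alpha) / r)

# Classifier 1 from the cancer example below: tp = 45, fp = 20, fn = 5
p, r = precision(45, 20), recall(45, 5)
print(round(p, 2), round(r, 2), round(f_measure(p, r), 2))   # 0.69 0.9 0.78
print(round(f_measure(p, r, alpha=1/5), 2))                  # 0.85 (the slide gets 0.84 by rounding P to 0.69 first)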
Information Retrieval- 57
Accuracy
A = (tp + tn) / (tp + tn + fp + fn)
Appropriate metric when


- Classes are not skewed
- Errors have the same importance

Information Retrieval- 58
Accuracy - Pitfall
Classifier 1 (rows: classified as; columns: actual class)
          Fraud   ¬Fraud
Fraud        5       10
¬Fraud       5       80
A = 85/100 = 0.85

Always ¬Fraud
          Fraud   ¬Fraud
Fraud        0        0
¬Fraud      10       90
A = 90/100 = 0.90
Information Retrieval- 59
Which is the “best” classifier?
Classifier 1 (rows: classified as; columns: actual class)
        A    B
A      45   20
B       5   30

Classifier 2
        A    B
A      40   10
B      10   40

A. Classifier 1
B. Classifier 2
C. Both are equally good

Information Retrieval- 60
Which is the “best” classifier?
Classifier 1 (rows: classified as; columns: actual class)
          Cancer   ¬Cancer
Cancer       45       20
¬Cancer       5       30

Classifier 2
          Cancer   ¬Cancer
Cancer       40       10
¬Cancer      10       40

A. Classifier 1
B. Classifier 2
C. Both are equally good

Information Retrieval- 61
Precision and Recall: Example
Classifier 1 (rows: classified as; columns: actual class)
          Cancer   ¬Cancer
Cancer       45       20
¬Cancer       5       30
P1 = 45/65 = 0.69    R1 = 45/50 = 0.9

Classifier 2
          Cancer   ¬Cancer
Cancer       40       10
¬Cancer      10       40
P2 = 40/50 = 0.8     R2 = 40/50 = 0.8

"Everybody has cancer"
          Cancer   ¬Cancer
Cancer       50       50
¬Cancer       0        0
P = 50/100 = 0.5     R = 50/50 = 1
Information Retrieval- 62
F-Score: Example (alpha = 1/2)
(Same confusion matrices as on the previous slide)

F1 = 2·(0.69·0.9)/(0.69+0.9) = 0.78
F2 = 2·(0.8·0.8)/(0.8+0.8) = 0.8
"Everybody has cancer": F = 2·(0.5·1)/(0.5+1) = 0.66

Information Retrieval- 63
F-alpha-Score: Example (alpha = 1/5)
(Same confusion matrices as above)

F1 = 5·(0.69·0.9)/(4·0.69+0.9) = 0.84
F2 = 5·(0.8·0.8)/(4·0.8+0.8) = 0.8
"Everybody has cancer": F = 5·(0.5·1)/(4·0.5+1) = 0.83

Information Retrieval- 64
Precision/Recall Tradeoff in Ranked Retrieval
An IR system ranks documents by a similarity coefficient,
allowing the user to trade off between precision and recall
by choosing the cutoff level
Precision depends on the number of results retrieved:
P@k = precision for the top-k documents
[Figure: precision at k as a function of the number of retrieved documents, for a hypothetical ideal IR system and for realistic IR systems.]

Information Retrieval- 65
Evaluating Ranked Retrieval
Recall-Precision Plot
R N R N R R N N R R R N R N R R   (R = relevant, N = non-relevant; 10 relevant documents)

Information Retrieval- 66
Interpolated Precision

Interpolated precision: Pint@k = max{ P@k′ : k′ ≥ k }

Information Retrieval- 67
Mean Average Precision (MAP)
Given a set of queries Q
For each query q ∈ Q, let Rq be the set of relevant documents retrieved and rq,k the rank at which the k-th relevant document for q appears
MAP(Q) = (1/|Q|) Σq∈Q (1/|Rq|) Σk Pint@rq,k , i.e., the mean of the interpolated precision values at the ranks of the relevant documents

Information Retrieval- 68
Example
Assume 4 results are returned for a query q:
RNRN

P@1 = 1, P@2 = 0.5, P@3 = 2/3, P@4 = 0.5


Pint@1 = 1, Pint @2 = 2/3, Pint @3 = 2/3, Pint @4 = 0.5

Then MAP({q}) = 1/2 (1 + 2/3) = 5/6


(note : only precision values when retrieving a relevant document are considered)

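A minimal sketch of precision@k, interpolated precision and (mean) average precision following the definitions above; it reproduces the numbers of this example.

def precision_at(results, k):
    # P@k: fraction of relevant documents among the top-k results
    return results[:k].count("R") / k

def interpolated_precision(results, k):
    # P_int@k = max over k' >= k of P@k'
    return max(precision_at(results, i) for i in range(k, len(results) + 1))

def average_precision(results):
    # Average the interpolated precision at the ranks where relevant documents occur
    ranks = [i + 1 for i, r in enumerate(results) if r == "R"]
    return sum(interpolated_precision(results, k) for k in ranks) / len(ranks)

def mean_average_precision(runs):
    # MAP over a set of queries: mean of the per-query average precisions
    return sum(average_precision(r) for r in runs) / len(runs)

results = list("RNRN")
print([round(precision_at(results, k), 2) for k in range(1, 5)])   # [1.0, 0.5, 0.67, 0.5]
print(round(mean_average_precision([results]), 3))                 # 0.833 = 5/6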
Information Retrieval- 69
ROC Curve

[Figure: ROC curve — the rate of true positives (y-axis) plotted against the rate of false positives (x-axis).]

Specificity S = tn / (tn + fp)
The false positive rate (1 − S) indicates how many of the actual negatives have been retrieved as false positives
• The steeper the curve rises at the beginning, the better
• The larger the area under the curve, the better (AUC)

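A sketch of how ROC points and the area under the curve (AUC) could be computed for a ranked result list, assuming the whole collection is ranked and relevance labels are known; the example labels are illustrative.

def roc_points(results, total_relevant, total_nonrelevant):
    # Walk down the ranked list and record (false positive rate, true positive rate)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for label in results:
        if label == "R":
            tp += 1
        else:
            fp += 1
        points.append((fp / total_nonrelevant, tp / total_relevant))
    return points

def auc(points):
    # Area under the curve via the trapezoid rule over consecutive points
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

pts = roc_points(list("RRNRN"), total_relevant=3, total_nonrelevant=2)
print(pts)                 # ends at (1.0, 1.0) when the whole collection is ranked
print(round(auc(pts), 3))  # 0.833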
Information Retrieval- 70
If the top 100 documents contain 50 relevant
documents …
1. the precision of the system at 50 is 0.25
2. the precision of the system at 100 is 0.5
3. the recall of the system is 0.5
4. All of the above

Information Retrieval- 71
If retrieval system A has a higher precision at k
than system B …
1. the top k documents of A will have higher similarity values than
the top k documents of B
2. the top k documents of A will contain more relevant documents
than the top k documents of B
3. A will recall more documents above a given similarity threshold
than B
4. the top k relevant documents in A will have higher similarity
values than in B

Information Retrieval- 72
Let the first four documents retrieved be
R N N R. Then the MAP is
1. 1/2
2. 3/4
3. 2/3
4. 5/6

Information Retrieval- 73
References
Course material based on
– Ricardo Baeza-Yates, Berthier Ribeiro-Neto, Modern Information
Retrieval (ACM Press Series), Addison Wesley, 1999.
– Christopher D. Manning, Prabhakar Raghavan and Hinrich
Schütze, Introduction to Information Retrieval, Cambridge
University Press. 2008 (http://www-nlp.stanford.edu/IR-book/)
– Course Information Retrieval by TU Munich (http://www.cis.lmu.de/~hs/teach/14s/ir/)
– Singhal, A., Salton, G., Mitra, M., & Buckley, C. (1996). Document
length normalization. Information Processing &
Management, 32(5), 619-633.

Information Retrieval- 74
