Lecture 1: Introduction and Overview: Information Retrieval Computer Science Tripos Part II
Information Retrieval
Computer Science Tripos Part II
Simone Teufel
Lent 2014
Overview
1 Motivation
Definition of “Information Retrieval”
IR: beginnings to now
2 Boolean Retrieval
3 Reading
What is Information Retrieval?
Document Collections
What we mean here by document collections
IR Basics
What is Information Retrieval?
Structured vs Unstructured Data
SELECT *
FROM business_catalogue
WHERE category = 'florist'
AND city_zip = 'cb1';
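A minimal sketch of the structured side of this contrast, assuming a hypothetical `business_catalogue` table with made-up rows: Python's built-in `sqlite3` module runs the query above against an in-memory database. Because every field has a fixed meaning, the query can demand exact values per column; free text offers no such named fields to select on.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_catalogue (name TEXT, category TEXT, city_zip TEXT)"
)
conn.executemany(
    "INSERT INTO business_catalogue VALUES (?, ?, ?)",
    [
        ("Rose & Thorn", "florist", "cb1"),
        ("Petal Pushers", "florist", "cb2"),
        ("Corner Bakery", "bakery", "cb1"),
    ],
)

# The structured query: exact matches on named fields.
rows = conn.execute(
    "SELECT * FROM business_catalogue "
    "WHERE category = 'florist' AND city_zip = 'cb1'"
).fetchall()
print(rows)  # [('Rose & Thorn', 'florist', 'cb1')]
conn.close()
```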
Information Needs and Relevance
Types of information needs
Known-item search
Precise information seeking search
Open-ended search (“topical search”)
Information scarcity vs. information abundance
...when a servant had spilled an urn of hot coffee over his legs, he replied to the distressed inquiries of the lady of the house, ’Thank you, madam, the agony is somewhat abated.’ [not Lord Byron, but Lord Macaulay]
Relevance
How well has the system performed?
IR today
Web search
The search space comprises billions of documents on millions of computers.
Issues: spidering; efficient indexing and search; malicious manipulation to boost search engine rankings.
Link analysis is covered in Lecture 8.
A short history of IR
IR for non-textual media
Similarity Searches
Areas of IR
2 Boolean Retrieval
Brutus AND Caesar AND NOT Calpurnia
The term-document incidence matrix
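A minimal sketch of how such a matrix can be built, assuming a tiny hypothetical collection and naive lowercase whitespace tokenisation: one row per vocabulary term, one column per document, and a 1 wherever the term occurs in that document. The document names and texts below are invented for illustration.

```python
# Hypothetical toy collection; names and texts are invented.
docs = {
    "d1": "Brutus killed Caesar",
    "d2": "Caesar married Calpurnia",
    "d3": "Brutus and Caesar in Rome",
    "d4": "the tempest",
}

def incidence_matrix(docs):
    """Return (doc_ids, {term: [0/1 per document]}) for the Boolean model."""
    doc_ids = list(docs)
    # Naive tokenisation: lowercase and split on whitespace.
    term_sets = {d: set(text.lower().split()) for d, text in docs.items()}
    vocabulary = sorted(set().union(*term_sets.values()))
    # Row for each term: 1 if the term occurs in the document, else 0.
    matrix = {
        term: [1 if term in term_sets[d] else 0 for d in doc_ids]
        for term in vocabulary
    }
    return doc_ids, matrix

doc_ids, matrix = incidence_matrix(docs)
for term in ("brutus", "caesar", "calpurnia"):
    print(term, matrix[term])
# brutus [1, 0, 1, 0]
# caesar [1, 1, 1, 0]
# calpurnia [0, 1, 0, 0]
```

Storing one full row per term makes the matrix size grow with vocabulary × collection size, and most entries are 0 for a realistic collection.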
Query “Brutus AND Caesar AND NOT Calpurnia”
We compute the results for our query as the bitwise AND of the vectors for Brutus and Caesar and the complement of the vector for Calpurnia.
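A minimal sketch of that computation, assuming each row of the incidence matrix is stored as a Python integer used as a bit vector (leftmost bit = first document). The four documents and their vectors are hypothetical. Since Python's `~` operates on unbounded integers, the complement has to be masked to the number of documents.

```python
# Hypothetical incidence vectors over four documents d1..d4
# (leftmost bit = d1).
N_DOCS = 4
doc_ids = ["d1", "d2", "d3", "d4"]
vectors = {
    "brutus": 0b1010,
    "caesar": 0b1110,
    "calpurnia": 0b0100,
}

ALL = (1 << N_DOCS) - 1  # one bit set per document: 0b1111

def complement(v):
    # ~v alone would be negative (Python ints are unbounded), so mask it.
    return ~v & ALL

result = vectors["brutus"] & vectors["caesar"] & complement(vectors["calpurnia"])
print(format(result, f"0{N_DOCS}b"))  # 1010

# Decode set bits back to document ids (the highest bit corresponds to d1).
hits = [d for i, d in enumerate(doc_ids) if (result >> (N_DOCS - 1 - i)) & 1]
print(hits)  # ['d1', 'd3']
```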
The results: two documents
Practical Boolean Search
Example: Westlaw
Westlaw Queries/Information Needs
Comments on Westlaw
Does Google use the Boolean Model?
3 Reading