01 Introduction To ISR
Concepts and Technology behind Search. Pearson Education Ltd, Harlow, England.
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze (2008). Introduction
to Information Retrieval. Cambridge University Press, New York.
Gerald J. Kowalski and Mark T. Maybury (2002). Information Storage and Retrieval
Systems: Theory and Implementation. Available at
http://www.ebooks.kluweronline.com
Instructional Methods:
Lectures, discussions, hands-on exercises.
Evaluations:
Assessment 1 10%
Assessment 2 10%
Project 1 20%
Project 2 20%
Final Exam 40%
Introduction to
Information Storage and Retrieval
[Figure: the USER reaches the DB through two tasks, Searching and Browsing.]
The User Task: Searching
• Searching is the process of retrieving information in which the main objective is clearly defined from the onset of the search.
• The user of a retrieval system has to translate an information need into a query in the language provided by the system.
• In this context, the user specifies a set of words and searches for useful information by executing a retrieval task.
• English-language statement of an information need:
I want a book by J. K. Rowling titled The Chamber of Secrets
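The translation step described above can be sketched as follows, assuming the system offers a plain keyword query language. The English statement is reduced to a list of query words by lowercasing, stripping punctuation, and dropping common words. The `STOP_WORDS` set and the `to_keyword_query` helper are hypothetical names chosen for illustration, not part of any particular system.

```python
import re

# Hypothetical stop list for illustration; real systems use larger lists.
STOP_WORDS = {"i", "want", "a", "by", "the", "titled", "of"}

def to_keyword_query(statement: str) -> list[str]:
    """Reduce an English statement to an ordered list of unique query words."""
    words = re.findall(r"[a-z0-9]+", statement.lower())
    query = []
    for word in words:
        if word not in STOP_WORDS and word not in query:
            query.append(word)
    return query

statement = "I want a book by J. K. Rowling titled The Chamber of Secrets"
print(to_keyword_query(statement))
```

The sketch keeps the initials "j" and "k" as separate words. A real query language might instead offer fields such as author and title, which this example does not attempt to model.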
[Figure: a query string and a document corpus enter the IR System, which returns ranked documents: 1. Doc1, 2. Doc2, 3. Doc3, ...]
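As a rough illustration of the flow from query string to ranked documents, the sketch below scores each document by the number of distinct query words it contains. The word-overlap score, the `rank_documents` helper, and the sample corpus are all assumptions made for this example; the slides do not prescribe a scoring method.

```python
def rank_documents(query: str, corpus: dict[str, str]) -> list[tuple[str, int]]:
    """Rank documents by how many distinct query words each one contains."""
    query_words = set(query.lower().split())
    scored = []
    for doc_id, text in corpus.items():
        overlap = len(query_words & set(text.lower().split()))
        if overlap > 0:
            scored.append((doc_id, overlap))
    # Highest overlap first; ties broken by document ID for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical corpus for illustration.
corpus = {
    "Doc1": "harry potter and the chamber of secrets by rowling",
    "Doc2": "a history of secrets and codes",
    "Doc3": "the chamber music of the baroque era",
    "Doc4": "introduction to databases",
}
ranked = rank_documents("rowling chamber secrets", corpus)
for rank, (doc_id, score) in enumerate(ranked, start=1):
    print(f"{rank}. {doc_id} (score {score})")
```

Documents with no matching word are left out of the result rather than ranked last.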
Web Search System (e.g., Google)
[Figure: a web crawler (web spider) collects pages into the document corpus; a query string and the corpus enter the IR System, which returns ranked documents: 1. Page1, 2. Page2, 3. Page3, ...]
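The crawler's role in building the corpus can be sketched without network access, assuming the web is modeled as an in-memory dictionary of pages and links. `FAKE_WEB` and `crawl` are hypothetical names for this example. A real crawler would also fetch pages over HTTP, parse links out of HTML, and respect robots.txt, none of which is shown here.

```python
from collections import deque

# Hypothetical in-memory "web": page name -> (page text, outgoing links).
FAKE_WEB = {
    "page1": ("welcome to the library", ["page2", "page3"]),
    "page2": ("books by rowling", ["page3"]),
    "page3": ("contact the library", ["page1"]),
    "page4": ("unlinked page", []),
}

def crawl(seed: str, web: dict[str, tuple[str, list[str]]]) -> dict[str, str]:
    """Breadth-first crawl from a seed page; returns the collected corpus."""
    corpus = {}
    frontier = deque([seed])
    while frontier:
        url = frontier.popleft()
        if url in corpus or url not in web:
            continue  # skip pages already fetched or missing from the web
        text, links = web[url]
        corpus[url] = text
        frontier.extend(links)
    return corpus

crawled = crawl("page1", FAKE_WEB)
print(sorted(crawled))
```

Only pages reachable by following links from the seed end up in the corpus, so `page4` is never collected in this example.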
Overview of the Retrieval process
[Figure: overview of the retrieval process. Labeled components: User, Text Operations, logical view, DB manager, Query Language & Operations, Indexing Module, feedback, Searching, Index.]
Searching compares the query and document representations to identify relevant documents. This raises two questions:
• What weighting scheme and similarity measure should be used?
• What is a “good” model of retrieval?
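One common answer to the weighting and similarity questions, shown here only as an example and not as the course's chosen model, is TF-IDF weighting with cosine similarity. The sketch below assumes the weight of a term in a document is its raw frequency times log(N / document frequency). The function names and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Weight each term by term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        vectors[doc_id] = {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
    return vectors

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical pre-tokenized documents.
docs = {
    "Doc1": ["chamber", "secrets", "rowling"],
    "Doc2": ["chamber", "music"],
    "Doc3": ["database", "systems"],
}
vectors = tf_idf_vectors(docs)
print(cosine(vectors["Doc1"], vectors["Doc2"]))  # shares the term "chamber"
print(cosine(vectors["Doc1"], vectors["Doc3"]))  # shares no terms
```

Documents that share no terms get a similarity of zero, and a shared term counts for less the more documents it appears in.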
[Figure: indexing pipeline]
documents -> Assign document identifier -> text with document IDs -> Tokenize -> tokens -> Stop list -> non-stoplist tokens -> Stemming & Normalize -> stemmed terms -> Term weighting -> terms with weights -> Index
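The indexing stages above can be sketched in a few lines, under several simplifying assumptions: document IDs are sequential integers, normalization is plain lowercasing done while tokenizing, the stemmer is a toy suffix stripper rather than a real algorithm such as Porter's, and the term weight is the raw term frequency. `STOP_LIST`, `crude_stem`, and `build_index` are hypothetical names.

```python
import re
from collections import Counter, defaultdict

STOP_LIST = {"the", "of", "and", "a", "to", "in"}  # hypothetical stop list

def crude_stem(token: str) -> str:
    """Toy suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def build_index(documents: list[str]) -> dict[str, dict[int, int]]:
    """Return an inverted index: term -> {document ID: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(documents, start=1):    # assign document identifier
        tokens = re.findall(r"[a-z0-9]+", text.lower())   # tokenize, normalize case
        kept = [t for t in tokens if t not in STOP_LIST]  # apply stop list
        terms = [crude_stem(t) for t in kept]             # stemming
        for term, freq in Counter(terms).items():         # term weighting (raw frequency)
            index[term][doc_id] = freq
    return dict(index)

index = build_index(["The Chamber of Secrets", "Secrets and searching in the chamber"])
print(index)
```

The result maps each stemmed term to the documents that contain it, which is the structure the searching side consults at query time.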
Searching Subsystem
[Figure: searching pipeline]
query -> parse query -> query tokens -> Stop list -> non-stoplist tokens -> Stemming & Normalize -> stemmed terms -> Query Term weighting -> query terms -> Similarity Measure (compared against index terms from the Index) -> relevant document set -> ranking -> ranked document set
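The searching stages can be sketched the same way. This example assumes every query term has weight 1 and that similarity is the sum of the index weights of the matching terms; both are simplifications chosen for brevity. The `INDEX` contents, `STOP_LIST`, `crude_stem`, and `search` are hypothetical.

```python
import re

STOP_LIST = {"the", "of", "and", "a", "to", "in", "by"}  # hypothetical stop list

# Hypothetical index, term -> {document ID: weight}, as an indexing step might produce.
INDEX = {
    "chamber": {1: 1.0, 2: 1.0},
    "secret": {1: 1.0, 3: 2.0},
    "rowling": {1: 1.0},
}

def crude_stem(token: str) -> str:
    """Toy stemmer: strip a trailing 's' from tokens of four or more letters."""
    return token[:-1] if token.endswith("s") and len(token) >= 4 else token

def search(query: str, index: dict[str, dict[int, float]]) -> list[tuple[int, float]]:
    """Parse the query, then rank documents by the sum of matching term weights."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())               # parse query
    terms = {crude_stem(t) for t in tokens if t not in STOP_LIST}  # stop list, stemming
    scores: dict[int, float] = {}
    for term in terms:                                             # query term weight fixed at 1
        for doc_id, weight in index.get(term, {}).items():         # similarity: weighted overlap
            scores[doc_id] = scores.get(doc_id, 0.0) + weight
    # Ranking: highest score first, ties broken by document ID.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))

results = search("The Chamber of Secrets by Rowling", INDEX)
print(results)
```

The query passes through the same stop list and stemming as the documents did, so that query terms and index terms are comparable.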