1 introIR
Information Storage and Retrieval
Contents:
What is information?
What are the Sources of information?
WHAT IS INFORMATION RETRIEVAL?
It is a software program that deals with the organization,
storage, retrieval, and evaluation of information from
document repositories, particularly textual information.
It is the activity of finding material (usually documents) of
an unstructured nature (usually text) that satisfies an
information need from within large collections (usually
stored on computers).
For example, information retrieval takes place when a user
enters a query into the system.
The IR system assists users in finding the information
they require, but it does not explicitly return answers to
their questions.
It informs the user of the existence and location of
documents that might contain the required information.
It also supports users in browsing or filtering document
collections, or in processing a set of retrieved
documents.
The system searches over billions of documents stored on
millions of computers.
Email programs, for example, provide a spam filter and manual
or automatic means of classifying mail so that it can be
placed directly into particular folders.
An IR system has the ability to represent, store, organize,
and access information items.
A set of keywords is required to search.
Keywords are what people are searching for in search
engines.
These keywords summarize the description of the
information.
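As a rough illustration of how keywords can be used to store, organize, and access information items, the sketch below builds a small inverted index (keyword to document ids) and answers a keyword query against it. The three documents and the names `DOCS`, `build_index`, and `search` are assumptions made up for this example; the slides do not prescribe any particular data structure.

```python
from collections import defaultdict

# Hypothetical three-document collection, used only for illustration.
DOCS = {
    1: "information retrieval deals with text documents",
    2: "a search engine answers keyword queries",
    3: "keyword search over text documents",
}

def build_index(docs):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(DOCS)
print(sorted(search(INDEX, "text documents")))  # [1, 3]
```

Requiring every keyword to match is only one possible design; a system could equally return documents that match any keyword and rank them afterwards.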
INFORMATION VS DATA RETRIEVAL
STORAGE OF TEXT
Textual documents:
Searchable as text
Words are represented as ASCII/Unicode
Image documents:
Scanned image of a text document, which is not searchable as text
Text (characters, words, etc.) is represented as patterns of pixels
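The difference between the two storage forms can be sketched in a few lines. The example below is a toy, assuming a short string as the textual document and a hand-made 5x3 bitmap of the letter "I" as the scanned image; all names and data are hypothetical.

```python
# A textual document: characters are stored as Unicode code points,
# so a keyword can be found with an ordinary substring search.
text_doc = "Information Retrieval"
code_points = [ord(ch) for ch in text_doc[:4]]  # code points of 'I', 'n', 'f', 'o'
utf8_bytes = text_doc.encode("utf-8")           # the bytes actually stored

def text_contains(document, keyword):
    """Case-insensitive keyword lookup in a textual document."""
    return keyword.lower() in document.lower()

# A scanned page: a toy 5x3 bitmap of the letter "I" (1 = dark pixel).
# Hypothetical data; a real scan would be far larger.
image_doc = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
]

def dark_pixel_count(bitmap):
    """All we can compute directly from pixels: counts, not words."""
    return sum(sum(row) for row in bitmap)

print(code_points)                           # [73, 110, 102, 111]
print(text_contains(text_doc, "retrieval"))  # True
print(dark_pixel_count(image_doc))           # 9
```

The bitmap holds no character codes at all, so a keyword search has nothing to match against until the pixels are converted back into text.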
EXAMPLES OF IR SYSTEMS
Text-based (Lexis-Nexis, Google, FAST): Search by
keywords. Limited search using queries in natural language.
Multimedia (QBIC, WebSeek, SaFe): Search by visual
appearance (shapes, colors, ...).
Question answering systems (AskJeeves,
Answerbus): Search in (restricted) natural language.
Digital and virtual libraries
Other:
Cross-language vs. multilingual information retrieval
Music retrieval
Medical search engines
INFORMATION RETRIEVAL SERVES AS A BRIDGE
An information retrieval system serves as a bridge
between the world of authors and the world of
readers/users.
That is, writers present a set of ideas in a document using a
set of concepts. Users then query the IR system for relevant
documents that satisfy their information need.
[Figure: the IR system as a black box between the user and the documents]
TYPICAL IR SYSTEM ARCHITECTURE
[Figure: a query string and a document corpus are input to the IR system, which outputs ranked relevant documents (1. Doc1, 2. Doc2, 3. Doc3, ...)]
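The architecture above (query string and document corpus in, ranked documents out) can be sketched as a single function. This is a minimal illustration, assuming a made-up three-document corpus and a deliberately naive score: the number of times the query terms occur in each document. Real systems use more refined scoring; `CORPUS` and `rank` are hypothetical names.

```python
from collections import Counter

# Hypothetical corpus; the names Doc1..Doc3 follow the figure.
CORPUS = {
    "Doc1": "information retrieval finds relevant documents",
    "Doc2": "retrieval of text documents by keyword",
    "Doc3": "cooking recipes for dinner",
}

def rank(query, corpus):
    """Return (name, score) pairs, best match first.

    The score is the total number of occurrences of the query terms
    in the document; documents with a score of zero are left out.
    """
    terms = query.lower().split()
    scored = []
    for name, text in corpus.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((name, score))
    # Sort by descending score, then by name so that ties have a fixed order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

RANKED = rank("text retrieval documents", CORPUS)
print(RANKED)  # [('Doc2', 3), ('Doc1', 2)]
```

Doc3 shares no term with the query, so it does not appear in the ranked list at all.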
IR SYSTEM VS. WEB SEARCH SYSTEM
[Figure: a spider collects pages from the Web into the document corpus; a query string and the corpus are input to the IR system, which outputs ranked relevant documents (1. Page1, 2. Page2, 3. Page3, ...)]
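What distinguishes the web search system is the spider that fills the document corpus. The sketch below imitates one with a breadth-first traversal over a tiny in-memory "web", assuming made-up URLs and page texts; it performs no network access, and `WEB` and `crawl` are hypothetical names introduced only for this example.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# The URLs are invented for illustration; nothing is downloaded.
WEB = {
    "http://example.org/a": ("page one about retrieval",
                             ["http://example.org/b", "http://example.org/c"]),
    "http://example.org/b": ("page two about indexing",
                             ["http://example.org/a"]),
    "http://example.org/c": ("page three about ranking",
                             ["http://example.org/d"]),
}

def crawl(seed, web):
    """Follow links breadth-first from seed and return {url: text}."""
    corpus = {}
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # dead link: there is no page to add
        text, links = web[url]
        corpus[url] = text
        for link in links:
            if link not in seen:  # avoid visiting a page twice
                seen.add(link)
                queue.append(link)
    return corpus

PAGES = crawl("http://example.org/a", WEB)
print(list(PAGES))
```

The resulting dictionary plays the role of the document corpus in the figure and could be handed to any ranking function that accepts a mapping from names to text.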
THE RETRIEVAL PROCESS
It is necessary to define the text database before any of the
retrieval processes are initiated.
The text operations transform the original documents & the
Comparing representations:
What is a “good” similarity measure and retrieval model?
How is uncertainty represented?
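One commonly used candidate for a similarity measure is the cosine of the angle between term-frequency vectors. The sketch below is only an illustration of that one choice, assuming whitespace tokenization and raw term counts; it is not presented as the measure these slides settle on, and it says nothing about how uncertainty is represented. The names `term_vector` and `cosine` are hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    """Represent a text as a bag of lower-cased words with their counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)

query = term_vector("text retrieval")
doc = term_vector("text retrieval systems")
print(round(cosine(query, doc), 4))  # 0.8165
```

Because the score is normalized by vector length, a long document is not favoured merely for containing more words, which is one reason this measure is often chosen for comparing a short query with longer documents.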