Introduction
INFORMATION RETRIEVAL
Lynda Tamine-Lechani
[email protected]
https://www.irit.fr/~Lynda.Tamine-Lechani/
FOUNDATIONS OF INFORMATION RETRIEVAL
• Course description
Study the theory, design, and implementation of information retrieval systems
from the perspectives of:
✓ Information representation: focus on texts
✓ Theoretical information retrieval models: focus on language models and learning-based models
✓ Performance evaluation: focus on system-centred evaluation
• Learning objectives
✓ Index and represent textual information;
✓ Recall and discuss well-known information retrieval models;
✓ Design, implement, and evaluate the performance of information retrieval
systems using the retrieval algorithms and models discussed in class.
© L. Tamine-Lechani
FOUNDATIONS OF INFORMATION RETRIEVAL
• Organization
o 12 h of lectures, 6 h of tutorials: Lynda Tamine-Lechani
o 10 h of hands-on work: Jesus-Lovon Melgajero, José G. Moréno and Lynda Tamine-Lechani
• Prerequisites
o Python programming
o Basics of probability and statistics
• Course material
o Copies of the lecture slides are posted on the MOODLE site
o Book and reading references are provided
• Grading
o 1st session
✓ Hands-on work on the techniques discussed in class: 30% of the final score
✓ Final written exam in class: 70% of the final score
o 2nd session
✓ Final written exam in class: 100% of the final score
FOUNDATIONS OF INFORMATION RETRIEVAL
• Schedule
Lecture  Topic
1        Course introduction; text indexing, vector semantics
2        Static embeddings, contextual embeddings
3        Information retrieval (IR) models: query reformulation, learning to rank
4        Tutorial 1: text indexing and representation
5        Neural models for IR
6        PageRank, performance evaluation
7        Tutorial 2: information retrieval techniques and models
8        Question answering systems and chatbots
9        Tutorial 3: performance evaluation
FOUNDATIONS OF INFORMATION RETRIEVAL
Books
Information Retrieval: Algorithms and Heuristics
David A. Grossman, Ophir Frieder, Kluwer Academic Publishers, 1998
Introduction
Salton, 1980:
Information retrieval systems are designed to help analyze and describe the items stored in a file, to
organize them and search among them, and finally to retrieve them in response to a user's query.
Designing and using a retrieval system involves four major activities: information analysis, information
organization and search, query formulation, and information retrieval and dissemination.
Information retrieval (IR) in computing and information science is the process of obtaining
information system resources that are relevant to an information need from a
collection of those resources. Searches can be based on full-text or other content-based
indexing.
Introduction
Heatmaps on SERP
Cross-device search
Introduction
(Web) search systems select, from a corpus of text documents, those that are
relevant to a user's information need, which the user expresses as a query.
Diagram: Information need → Query; Corpus → Documents; Query and Documents → Selection → System's answer to the query
Introduction
Structure
Introduction
Image
Video
Introduction
-Document
-Blog
-Tweet
-News
-Presentation
-E-mail
-...
Introduction
© NIST (TREC)
Introduction
• Deluge of information
o Large-scale information
o Often only a small fraction of the information is relevant and/or useful for a query
o Information is noisy
o Information is not always trustworthy
o Heterogeneous information forms and sources
o ...
Introduction
Source: Infographic
Introduction
Statistics on usage of information access systems (figure): 2003; social networks, 2014-2020
Source: https://datastudio.google.com/embed/reporting/1sImC_rjeWqNXdgQt5MtmrQMbH44qFjtA/page/1fzh
Introduction
Example query: "Roi lion" (French for "The Lion King")
1 query → N intents
Introduction
Source: https://www.leprogres.fr/magazine-sante/2021/12/13/variant-omicron-quels-sont-les-premiers-symptomes-detectes
Introduction
What makes information retrieval similar to, or different from, data retrieval (databases)?
                      Information retrieval              Data retrieval (databases)
Selected information  Information relevant to the query  All the data that satisfies the query
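The contrast can be illustrated with a minimal Python sketch. It is not taken from the course material: the toy documents, the function names `data_retrieval` and `information_retrieval`, and the scoring rule (counting query-term occurrences) are all assumptions chosen for illustration.

```python
from collections import Counter

# Toy collection; the texts are invented for illustration.
DOCS = {
    "d1": "the lion king is an animated film",
    "d2": "lion habitats in the african savanna",
    "d3": "the king of france in the middle ages",
}

def data_retrieval(docs, required_terms):
    """Database-style lookup: return every document that satisfies the
    condition exactly (here: contains all required terms), unranked."""
    required = set(required_terms)
    return sorted(doc_id for doc_id, text in docs.items()
                  if required <= set(text.split()))

def information_retrieval(docs, query):
    """IR-style lookup: estimate relevance with a score (here: number of
    query-term occurrences) and return matching documents, best first."""
    query_terms = query.split()
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.split())
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Sort by decreasing score, then by id to break ties deterministically.
    return sorted(scores, key=lambda d: (-scores[d], d))

exact = data_retrieval(DOCS, ["lion", "king"])
ranked = information_retrieval(DOCS, "lion king")
print(exact)   # ['d1']
print(ranked)  # ['d1', 'd2', 'd3']
```

The exact-match lookup returns only the documents that satisfy the condition, while the ranked lookup also returns partial matches, ordered by an estimate of relevance.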
Introduction
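Diagram of the retrieval process:
Indexing: Documents → Document representations
Expression: Query
Matching: Document representations and Query → Selected documents
Feedback (loop from the selected documents)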
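The stages of this process (indexing, query expression, matching, feedback) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the course's implementation: the documents, the stopword list, the term-frequency scoring, and the helper names `build_index`, `match`, and `expand_query` are all hypothetical.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()

def build_index(docs):
    """Indexing: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def match(index, query_terms):
    """Matching: sum the frequencies of the query terms per document
    and return document ids, highest score first."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(scores, key=lambda d: (-scores[d], d))

def expand_query(query_terms, docs, relevant_id, stopwords):
    """Feedback: add the non-stopword terms of a document that the user
    judged relevant (a crude form of relevance feedback)."""
    extra = [t for t in tokenize(docs[relevant_id])
             if t not in stopwords and t not in query_terms]
    # dict.fromkeys removes duplicates while preserving order.
    return list(query_terms) + list(dict.fromkeys(extra))

# Invented toy documents and stopwords (assumptions for illustration).
DOCS = {
    "d1": "omicron variant symptoms reported by doctors",
    "d2": "first symptoms of seasonal flu",
    "d3": "doctors describe fatigue and cough in omicron patients",
}
STOPWORDS = {"by", "of", "and", "in", "first"}

index = build_index(DOCS)
query = tokenize("omicron symptoms")            # expression of the need
first_pass = match(index, query)                # ['d1', 'd2', 'd3']
# Suppose the user marks d1 as relevant; the query is expanded and rerun.
second_pass = match(index, expand_query(query, DOCS, "d1", STOPWORDS))
print(first_pass, second_pass)                  # second pass: ['d1', 'd3', 'd2']
```

After feedback, d3 moves ahead of d2 because the expanded query now shares two terms with it ("omicron", "doctors").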
FOUNDATIONS OF INFORMATION RETRIEVAL
• Lecture structure
o Introduction
o Chapter 1: Text indexing and representation
"How to transform raw texts into machine-processable representations?"
Keywords: indexing, words, documents, representation learning of texts
o Chapter 2: Information retrieval (IR) models
"How to score the relevance of a document as an answer to a user's
query?"
Keywords: relevance status value, retrieval model
o Chapter 3: Performance evaluation of an IR system
"How to measure the performance of an information retrieval system?"
Keywords: evaluation metrics, test collections
o Chapter 4: From question-answering systems to chatbots
"How to interact with systems while searching for information?"
Keywords: conversation, turn, clarification
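As a preview of the question asked in Chapter 3, the sketch below computes two standard system-centred metrics, precision and recall at a rank cutoff k. The ranking, the set of relevant documents, and the function name `precision_recall_at_k` are invented for illustration; the metrics actually used in the course may differ.

```python
def precision_recall_at_k(ranking, relevant, k):
    """Precision@k: fraction of the top-k results that are relevant.
    Recall@k: fraction of all relevant documents found in the top k."""
    top_k = ranking[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical system output and relevance judgments for one query.
ranking = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2", "d4"}

p, r = precision_recall_at_k(ranking, relevant, k=4)
print(p, r)  # 0.5 0.6666666666666666
```

Two of the top four results are relevant (precision 2/4), and two of the three relevant documents were retrieved (recall 2/3).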