IR Practical 1

The document outlines a practical exercise focused on document indexing and retrieval, specifically implementing an inverted index construction algorithm and a simple document retrieval system. It explains the importance of index construction and query processing in information retrieval, providing code examples for building an inverted index and retrieving documents based on user queries. The sample output demonstrates the functionality of the constructed index and the retrieval process.

Uploaded by

laxmikantkattimani

PRACTICAL 1

Aim: - Document Indexing and Retrieval

a) Implement an inverted index construction algorithm.
b) Build a simple document retrieval system using the
constructed index.

Theory: -
Information retrieval plays a crucial role in modern computing,
enabling efficient access to relevant documents from large datasets
based on user queries. It involves two key processes:

1. Index Construction:
Index construction is a crucial step in building a document retrieval
system. The main idea behind indexing is to efficiently map terms
(words) in a collection of documents to the documents in which they
appear. This process significantly reduces the search time during
retrieval, as it allows a system to look up documents containing a
particular term quickly instead of scanning all documents for every
query.

2. Query Processing:
Once an index is built, the system can process user queries by
matching search terms against indexed words and retrieving the most
relevant documents. Query processing involves analyzing the user's
input, breaking it down into search terms, and using the constructed
index to quickly locate the documents that contain those terms,
often ranking them based on relevance or frequency of occurrence.
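
The ranking step mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical variant of the index that stores, for each term, how often it occurs in each document, and it scores a document by summing those counts over the query terms. The names `build_tf_index` and `rank_documents` are illustrative and are not part of the code in this practical.

```python
from collections import Counter, defaultdict


def build_tf_index(documents):
    """Map each term to {doc_id: number of occurrences in that document}."""
    tf_index = defaultdict(dict)
    for doc_id, document in enumerate(documents):
        # Same simple tokenization as the practical: lowercase, split on whitespace
        for term, count in Counter(document.lower().split()).items():
            tf_index[term][doc_id] = count
    return tf_index


def rank_documents(query, tf_index):
    """Return (doc_id, score) pairs, highest score first; ties broken by doc_id."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, count in tf_index.get(term, {}).items():
            scores[doc_id] += count
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative documents (an assumption, not the practical's sample data)
docs = [
    "index the index terms",
    "build an index",
    "retrieval needs ranking",
]
ranking = rank_documents("index terms", build_tf_index(docs))
print(ranking)  # [(0, 3), (1, 1)]
```

Document 0 scores 3 because "index" occurs twice and "terms" once; document 2 contains neither term and is not returned. Raw frequency is only one possible relevance signal.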

a) Implement an inverted index construction algorithm.

Code:
from collections import defaultdict

# Function to build the inverted index
def build_inverted_index(documents):
    inverted_index = defaultdict(list)
    # Iterate over documents
    for doc_id, document in enumerate(documents):
        words = document.lower().split()  # Convert to lowercase and split into words
        for word in set(words):  # Use set to avoid duplicates in a document
            inverted_index[word].append(doc_id)
    return inverted_index

# Sample documents
documents = [
    "Information retrieval is important.",
    "Document indexing is part of retrieval.",
    "Retrieval systems need an inverted index."
]

# Build the inverted index
inverted_index = build_inverted_index(documents)

# Print the inverted index
for word, doc_ids in inverted_index.items():
    print(f"{word}: {doc_ids}")

OUTPUT:
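
Note on tokenization: split() separates words on whitespace only, so trailing punctuation stays attached to a word. In the sample documents, "retrieval" (document 0) and "retrieval." (document 1) are therefore indexed as different terms. Below is a minimal sketch of one possible remedy, assuming that stripping ASCII punctuation from both ends of each token is acceptable for this collection. The helpers `tokenize` and `build_index_stripped` are hypothetical names, not part of the original code.

```python
import string
from collections import defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, strip leading/trailing ASCII punctuation."""
    tokens = (token.strip(string.punctuation) for token in text.lower().split())
    return [token for token in tokens if token]  # drop tokens that were only punctuation


def build_index_stripped(documents):
    """Same construction as build_inverted_index, but using tokenize()."""
    index = defaultdict(list)
    for doc_id, document in enumerate(documents):
        for word in set(tokenize(document)):
            index[word].append(doc_id)
    return index


sample_docs = [
    "Information retrieval is important.",
    "Document indexing is part of retrieval.",
    "Retrieval systems need an inverted index.",
]
index = build_index_stripped(sample_docs)
print(index["retrieval"])  # [0, 1, 2]
print(index["index"])      # [2]
```

With this tokenizer all three documents share the single term "retrieval", and a query word such as "index" matches document 2 even though the sentence ends in a full stop.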

b) Build a simple document retrieval system using the
constructed index.
Code:
from collections import defaultdict

# Function to build the inverted index
def build_inverted_index(documents):
    inverted_index = defaultdict(list)
    # Iterate over documents
    for doc_id, document in enumerate(documents):
        words = document.lower().split()  # Convert to lowercase and split into words
        for word in set(words):  # Use set to avoid duplicates in a document
            inverted_index[word].append(doc_id)
    return inverted_index

# Function to retrieve documents for a query
def retrieve_documents(query, inverted_index):
    query = query.lower().split()  # Tokenize the query and convert to lowercase
    doc_ids = set()
    # Find documents for each word in the query
    for word in query:
        if word in inverted_index:
            doc_ids.update(inverted_index[word])
    return sorted(doc_ids)

# Sample documents
documents = [
    "Information retrieval is important.",
    "Document indexing is part of retrieval.",
    "Retrieval systems need an inverted index."
]

# Build the inverted index
inverted_index = build_inverted_index(documents)

# Display the inverted index
print("Inverted Index:")
for word, doc_ids in inverted_index.items():
    print(f"{word}: {doc_ids}")

# Search query example
query = "retrieval index"
retrieved_docs = retrieve_documents(query, inverted_index)

# Display the retrieved documents (IDs)
print(f"\nDocuments matching the query '{query}': {retrieved_docs}")

OUTPUT:
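
Note on query semantics: retrieve_documents returns every document that contains at least one query term (OR semantics). Below is a minimal sketch of the stricter alternative, which returns only documents that contain all query terms (AND semantics) by intersecting the posting lists. The function name `retrieve_all_terms` is hypothetical, and the small hand-written index is an assumption made for illustration.

```python
def retrieve_all_terms(query, inverted_index):
    """Return sorted IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start from the first term's postings, then intersect with the rest
    result = set(inverted_index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(inverted_index.get(term, []))
        if not result:
            break  # no document can match once the intersection is empty
    return sorted(result)


# Hand-written index for illustration: term -> list of document IDs
toy_index = {
    "retrieval": [0, 1, 2],
    "inverted": [2],
    "indexing": [1],
}
print(retrieve_all_terms("retrieval inverted", toy_index))  # [2]
print(retrieve_all_terms("retrieval missing", toy_index))   # []
```

Using `.get()` rather than indexing avoids adding empty entries when the index is a defaultdict, so the same function should also work with the index built by build_inverted_index.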
