IR Practical 1

The document outlines a practical exercise focused on document indexing and retrieval, specifically implementing an inverted index construction algorithm and a simple document retrieval system. It explains the importance of index construction and query processing in information retrieval, providing code examples for building an inverted index and retrieving documents based on user queries. The sample output demonstrates the functionality of the constructed index and the retrieval process.

PRACTICAL 1

Aim: - Document Indexing and Retrieval


a) Implement an inverted index construction algorithm.
b) Build a simple document retrieval system using the constructed index.

Theory: -
Information retrieval is concerned with finding, in a large collection,
the documents that are relevant to a user's query, and doing so
efficiently. It involves two key processes:

1. Index Construction:
Index construction is the first step in building a document retrieval
system. An inverted index maps each term (word) in a collection of
documents to the list of documents in which it appears. This
significantly reduces search time during retrieval: the system looks
up the documents containing a term directly instead of scanning every
document for every query.
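
The difference between scanning and looking up can be sketched in a few lines. This is a minimal illustration, not the practical's solution: the three-document collection and the names scan_search and index_search are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical three-document collection used only for this illustration.
docs = ["red apple", "green apple", "red car"]

def scan_search(term, docs):
    """Baseline without an index: inspect every document for every query."""
    return [doc_id for doc_id, text in enumerate(docs) if term in text.split()]

# Build the term -> document-ID mapping once, up front.
index = defaultdict(list)
for doc_id, text in enumerate(docs):
    for term in set(text.split()):
        index[term].append(doc_id)

def index_search(term, index):
    """With an index: a single dictionary lookup per term."""
    return index.get(term, [])

print(scan_search("red", docs))    # [0, 2]
print(index_search("red", index))  # [0, 2]
```

Both functions return the same document IDs. The scan touches every document on each query, while the index pays that cost once at construction time.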

2. Query Processing:
Once an index is built, the system can process user queries by
matching search terms against indexed words and retrieving the most
relevant documents. Query processing involves analyzing the user's
input, breaking it down into search terms, and using the constructed
index to quickly locate the documents that contain those terms,
often ranking them based on relevance or frequency of occurrence.
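
Ranking by frequency of occurrence could look roughly like the sketch below. It assumes a simple scoring rule (the sum of the query terms' counts in each document), which is only one of many possible relevance measures. The names tokenize, build_tf_index and ranked_search, and the sample data, are hypothetical.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and keep only alphanumeric runs as terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_tf_index(documents):
    """Map each term to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(documents):
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index

def ranked_search(query, index):
    """Score each document by the summed term frequency of the query terms."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties broken by lower document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical sample data for illustration.
docs = ["index index retrieval", "retrieval of documents", "nothing relevant here"]
print(ranked_search("retrieval index", build_tf_index(docs)))  # [(0, 3), (1, 1)]
```

Document 0 ranks first because it contains "index" twice and "retrieval" once. Document 2 matches no query term and is left out of the result.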

a) Implement an inverted index construction algorithm.

Code:
import re
from collections import defaultdict

# Function to build the inverted index
def build_inverted_index(documents):
    inverted_index = defaultdict(list)
    # Iterate over documents; the list position serves as the document ID
    for doc_id, document in enumerate(documents):
        # Convert to lowercase and extract words, dropping punctuation
        # so that "retrieval." and "retrieval" are indexed as the same term
        words = re.findall(r"[a-z0-9]+", document.lower())
        for word in set(words):  # Use set to avoid duplicates in a document
            inverted_index[word].append(doc_id)
    return inverted_index

# Sample documents
documents = [
    "Information retrieval is important.",
    "Document indexing is part of retrieval.",
    "Retrieval systems need an inverted index.",
]

# Build the inverted index
inverted_index = build_inverted_index(documents)

# Print the inverted index
for word, doc_ids in inverted_index.items():
    print(f"{word}: {doc_ids}")

OUTPUT:

b) Build a simple document retrieval system using the constructed index.

Code:
import re
from collections import defaultdict

# Split text into lowercase words, dropping punctuation
def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

# Function to build the inverted index
def build_inverted_index(documents):
    inverted_index = defaultdict(list)
    # Iterate over documents; the list position serves as the document ID
    for doc_id, document in enumerate(documents):
        for word in set(tokenize(document)):  # Use set to avoid duplicates in a document
            inverted_index[word].append(doc_id)
    return inverted_index

# Function to retrieve documents for a query
def retrieve_documents(query, inverted_index):
    doc_ids = set()
    # Collect the documents that contain any word in the query
    for word in tokenize(query):
        if word in inverted_index:
            doc_ids.update(inverted_index[word])
    return sorted(doc_ids)

# Sample documents
documents = [
    "Information retrieval is important.",
    "Document indexing is part of retrieval.",
    "Retrieval systems need an inverted index.",
]

# Build the inverted index
inverted_index = build_inverted_index(documents)

# Display the inverted index
print("Inverted Index:")
for word, doc_ids in inverted_index.items():
    print(f"{word}: {doc_ids}")

# Search query example
query = "retrieval index"
retrieved_docs = retrieve_documents(query, inverted_index)

# Display the retrieved documents (IDs)
print(f"\nDocuments matching the query '{query}': {retrieved_docs}")

OUTPUT:
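
The retrieval function in part b) returns every document that contains at least one query term. A stricter variant, which requires every query term to be present, can be sketched by intersecting the posting lists. The function name retrieve_all_terms and the hand-built sample_index are assumptions for illustration only.

```python
def retrieve_all_terms(query, inverted_index):
    """Return sorted IDs of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start from the first term's postings and narrow with each further term.
    result = set(inverted_index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(inverted_index.get(term, []))
    return sorted(result)

# Hypothetical hand-built index for illustration: term -> list of document IDs.
sample_index = {"retrieval": [0, 1, 2], "index": [2], "document": [1]}
print(retrieve_all_terms("retrieval index", sample_index))  # [2]
print(retrieve_all_terms("index document", sample_index))   # []
```

Only document 2 contains both "retrieval" and "index", so it is the sole match for the first query. No document contains both "index" and "document", so the second query returns an empty list.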
