IR Practical 1
Theory:
Information retrieval (IR) is the task of finding, in a large collection,
the documents that are relevant to a user's query, without scanning the
whole collection each time. It involves two key processes:
1. Index Construction:
Index construction is a crucial step in building a document retrieval
system. The main idea behind indexing is to efficiently map terms
(words) in a collection of documents to the documents in which they
appear. This process significantly reduces the search time during
retrieval, as it allows a system to look up documents containing a
particular term quickly instead of scanning all documents for every
query.
2. Query Processing:
Once an index is built, the system can process user queries by
matching search terms against indexed words and retrieving the most
relevant documents. Query processing involves analyzing the user's
input, breaking it down into search terms, and using the constructed
index to quickly locate the documents that contain those terms,
often ranking them based on relevance or frequency of occurrence.
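The two processes above can be sketched together in a few lines of Python. This is a minimal illustration, not the practical's own solution: the names build_index and search, the whitespace tokenization, and the three sample sentences are all assumptions made for the example. The index stores how often each term occurs in each document, so that matching documents can be ranked by frequency of occurrence.

```python
from collections import Counter, defaultdict

def build_index(documents):
    """Map each term to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(documents):
        # Assumed tokenization: lowercase, then split on whitespace.
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index

def search(query, index):
    """Return IDs of documents matching any query term, best match first."""
    scores = Counter()
    for term in query.lower().split():
        # .get() avoids creating empty entries for unknown terms.
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest total term count first; ties broken by ascending doc ID.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical sample documents, made up for this illustration.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]
index = build_index(docs)
print(dict(index["cat"]))        # {0: 1, 1: 1}
print(search("dog cat", index))  # [1, 0]
```

Document 1 ranks first for the query "dog cat" because it contains both terms, while document 0 contains only "cat". Document 2 is not returned at all, since this sketch does no stemming and treats "cats" and "cat" as different terms.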
a) Implement an inverted index construction algorithm.
Code:
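from collections import defaultdict

def build_inverted_index(documents):
    # Map each word to the list of IDs of the documents it appears in.
    inverted_index = defaultdict(list)
    # Assumed: document IDs are list positions, starting at 0.
    for doc_id, document in enumerate(documents):
        # Assumed tokenization: lowercase, then split on whitespace.
        for word in document.lower().split():
            inverted_index[word].append(doc_id)
    return inverted_index

# Sample documents
documents = [
    # ...
]
inverted_index = build_inverted_index(documents)
for word, doc_ids in inverted_index.items():
    print(f"{word}: {doc_ids}")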
OUTPUT:
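from collections import defaultdict

def build_inverted_index(documents):
    inverted_index = defaultdict(list)
    for doc_id, document in enumerate(documents):
        # Assumed tokenization: lowercase, then split on whitespace.
        for word in document.lower().split():
            inverted_index[word].append(doc_id)
    return inverted_index

# Collect the IDs of all documents containing at least one query word.
# The function name and parameters are assumed.
def retrieve_documents(query, inverted_index):
    doc_ids = set()
    for word in query.lower().split():
        if word in inverted_index:
            doc_ids.update(inverted_index[word])
    return sorted(doc_ids)

# Sample documents
documents = [
    # ...
]
inverted_index = build_inverted_index(documents)
print("Inverted Index:")
for word, doc_ids in inverted_index.items():
    print(f"{word}: {doc_ids}")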
OUTPUT: