
Assignment 3 NonOverlap IR

This code builds a non-overlapping list model for a collection of text documents. It imports the os library. The gather_documents function collects all text files in a directory. The non_overlapped_list_model function builds a dictionary that maps each term to the list of documents containing it, and returns the documents that contain any of the query terms. The main function runs a search loop in which the user enters queries and receives the matching documents.

Uploaded by Pac SaQii
Copyright © All Rights Reserved

Information Retrieval

Assignment 3 Non-Overlap

Session: 2020 – 2024

Submitted by:
Saqlain Nawaz 2020-CS-135

Supervised by:
Sir Khaldoon Syed Khurshid

Department of Computer Science


University of Engineering and Technology
Lahore Pakistan
Overview
This code builds a non-overlapping list model over a collection of text files and
provides a search function that lets users query the model for terms of interest.

Libraries Used
The following libraries are used in this code:

● os: Provides functions for interacting with the operating system, used here for
file operations and directory traversal.

Code Flow
The code is structured as follows:

Import Libraries
The required libraries are imported at the beginning of the code.

gather_documents Function
This function takes a directory path as input and returns a list of all text files in that
directory and its subdirectories. It uses the os.walk function to traverse the whole
directory tree.
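The traversal described above might look like the following minimal sketch. It assumes that a "text file" is any file whose name ends in .txt and that the function returns full paths; the original code's exact filtering and return format are not shown in this document.

```python
import os


def gather_documents(directory):
    """Return the paths of all .txt files under directory, including subdirectories."""
    documents = []
    # os.walk yields (current_dir, subdirectory_names, file_names) for every folder in the tree
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # Assumption: "text file" means a file whose name ends in .txt (any case)
            if name.lower().endswith(".txt"):
                documents.append(os.path.join(root, name))
    # Sorted so that results come back in a stable, reproducible order
    return sorted(documents)
```

For example, gather_documents("docs") would return every .txt path under a hypothetical docs folder, including files in nested subfolders. A directory that does not exist simply yields an empty list, because os.walk ignores errors by default.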

non_overlapped_list_model Function
This function takes a list of documents and a list of terms of interest as input. It
builds a dictionary in which each key is a term of interest and each value is the list
of documents that contain that term. It then returns the set of all documents that
contain at least one of the terms of interest. The function catches UnicodeDecodeError
exceptions raised by files that cannot be decoded.
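A minimal sketch of the term-to-documents dictionary and the final union step follows. It assumes UTF-8 file encoding and case-insensitive matching on word tokens (the document does not describe the original tokenization), and it reports and skips any file that raises UnicodeDecodeError instead of stopping the search.

```python
import re


def non_overlapped_list_model(documents, terms_of_interest):
    """Map each term to the documents containing it; return every document matching any term."""
    # One list per term of interest: term -> documents that contain it
    term_lists = {term.lower(): [] for term in terms_of_interest}
    for path in documents:
        try:
            with open(path, encoding="utf-8") as f:
                # Assumption: case-insensitive matching on alphanumeric word tokens
                words = set(re.findall(r"\w+", f.read().lower()))
        except UnicodeDecodeError:
            print(f"Skipping {path}: could not be decoded as UTF-8")
            continue
        for term, docs in term_lists.items():
            if term in words:
                docs.append(path)
    # Union of all per-term lists: documents containing at least one term
    return set().union(*term_lists.values())
```

Returning a set means a document that contains several of the query terms is listed only once.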

Main Function
In the main function, the program first gathers all text files from a specified directory.
It then enters a loop that prompts the user for a query and, for each term in the
query, finds and prints all documents that contain that term.
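The query loop in main can be sketched as below. To keep the loop runnable without a directory of files, it is factored into a hypothetical search_loop function that receives the search step as a parameter. The blank-line and end-of-input exit conditions are assumptions, since the document does not say how the original loop ends.

```python
def search_loop(search, read_query=input):
    """Prompt for queries until a blank line or end of input; print the matches for each term.

    search is any callable that maps a single term to the documents containing it.
    """
    while True:
        try:
            query = read_query("Enter a query (blank line to quit): ")
        except EOFError:  # e.g. Ctrl-D or the end of piped input
            break
        terms = query.lower().split()
        if not terms:  # Assumption: an empty query ends the session
            break
        for term in terms:
            matches = sorted(search(term))
            print(f"{term}: {', '.join(matches) if matches else 'no matching documents'}")
```

In the full program, the search step could be lambda term: non_overlapped_list_model(documents, [term]), with documents obtained from gather_documents.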

Execution
The main function is executed when the script is run. The user interacts with the
program by entering queries, and the program prints the documents that contain
any of the query terms.

Please note that this is a basic implementation of a non-overlapping list model. It
records only whether a term occurs in a document, not how often it occurs or how
close it is to the other query terms. A more advanced implementation might need a
library that can parse and query structured documents.
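To make this limitation concrete, the sketch below applies the same presence-only matching to a few in-memory texts. A hypothetical matching_documents helper stands in for the file-based function. A document that mentions a term fifty times and one that mentions it once come back as indistinguishable members of an unordered set, and so do documents where the query terms are adjacent or far apart.

```python
import re


def matching_documents(texts, terms):
    """Hypothetical in-memory stand-in: names of documents containing any query term."""
    wanted = {term.lower() for term in terms}
    return {
        name
        for name, text in texts.items()
        if wanted & set(re.findall(r"\w+", text.lower()))  # presence only, no counts or positions
    }


texts = {
    "frequent.txt": "retrieval " * 50,                              # term occurs 50 times
    "rare.txt": "a short note on retrieval",                        # term occurs once
    "adjacent.txt": "information retrieval",                        # query terms side by side
    "distant.txt": "information " + "filler " * 100 + "retrieval",  # query terms far apart
}

# Frequency is ignored: "frequent.txt" and "rare.txt" are equal members of the result set.
print(sorted(matching_documents(texts, ["retrieval"])))
# Proximity is ignored: nothing distinguishes "adjacent.txt" from "distant.txt".
print(sorted(matching_documents(texts, ["information", "retrieval"])))
```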
