Assignment 3 NonOverlap IR
Assignment 3 Non-Overlap
Submitted by:
Saqlain Nawaz 2020-CS-135
Supervised by:
Sir Khaldoon Syed Khurshid
Libraries Used
The following libraries are used in this code:
● os: Provides functions for interacting with the operating system, used here for
file operations and directory traversal.
Code Flow
The code is structured as follows:
Import Libraries
The required libraries are imported at the beginning of the code.
gather_documents Function
This function takes a directory path as input and returns a list of all text files
found under it. It uses the os.walk function to traverse the directory and its
subdirectories, so files in nested folders are included as well.
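The traversal described above can be sketched as follows. This is a minimal illustration, not the submitted code: the ".txt" extension filter, the extension parameter, and the sorted return value are assumptions made for the example.

```python
import os


def gather_documents(directory, extension=".txt"):
    """Return the paths of all files under `directory` ending in `extension`.

    Assumption: a "text file" is identified by its ".txt" extension.
    """
    documents = []
    # os.walk yields (current folder, subfolder names, file names) for the
    # starting directory and every subdirectory below it.
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.lower().endswith(extension):
                documents.append(os.path.join(root, name))
    # Sorting is an assumption; it only makes the output order predictable.
    return sorted(documents)
```

Calling gather_documents on a folder returns full paths, so the results can be passed directly to open.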
non_overlapped_list_model Function
This function takes a list of documents and a list of terms of interest as input. It
builds a dictionary in which each key is a term of interest and each value is the list
of documents that contain that term. It then returns the set of all documents that
contain any of the terms of interest. The function catches UnicodeDecodeError so that
a file that cannot be decoded does not abort the search.
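A minimal sketch of such a function is shown below. Several details are assumptions rather than facts about the submitted code: files are read as UTF-8, matching is case-insensitive on whole words, and a file that raises UnicodeDecodeError is simply skipped.

```python
import re


def non_overlapped_list_model(documents, terms):
    """Return the set of documents that contain at least one of `terms`."""
    # One list of matching documents per term of interest.
    term_to_docs = {term: [] for term in terms}

    for path in documents:
        try:
            # Assumption: documents are UTF-8 encoded.
            with open(path, "r", encoding="utf-8") as handle:
                text = handle.read()
        except UnicodeDecodeError:
            # Assumption: undecodable files are skipped silently.
            continue

        # Assumption: a term matches only as a whole word, ignoring case.
        words = set(re.findall(r"\w+", text.lower()))
        for term in terms:
            if term.lower() in words:
                term_to_docs[term].append(path)

    # Union of the per-term lists: documents containing any term.
    matching = set()
    for docs in term_to_docs.values():
        matching.update(docs)
    return matching
```

Keeping the per-term dictionary makes it easy to report which term matched which document, even though only the combined set is returned here.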
Main Function
In the main function, the program first gathers all text files from a specified directory.
Then it enters a loop where it allows the user to enter a query. For each term in the
query, it finds and prints all documents that contain that term.
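The query loop can be sketched as below. The names run_query_loop, search, read_line and write are hypothetical and introduced only for this illustration; search stands in for the non_overlapped_list_model function described above. Ending the loop on an empty query and the exact output format are also assumptions, since the report does not say how the loop ends.

```python
def run_query_loop(documents, search, read_line=input, write=print):
    """Repeatedly read a query and report the matching documents per term.

    `search(documents, terms)` is expected to return a set of documents.
    `read_line` and `write` default to input and print; they are parameters
    only so the loop can be exercised without a terminal.
    """
    while True:
        query = read_line("Enter query (blank to quit): ").strip()
        if not query:
            # Assumption: an empty query ends the session.
            break
        # Each whitespace-separated word of the query is looked up separately.
        for term in query.split():
            matches = sorted(search(documents, [term]))
            write(f"{term}: {', '.join(matches) or 'no matching documents'}")
```

In a script, this loop would typically be called from the main function with the list returned by gather_documents and with non_overlapped_list_model as the search argument.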
Execution
The main function is executed when the script is run. The user can interact with the
program by entering queries, and the program will print out documents containing
any of the query terms.
Please note that this is a basic implementation of a non-overlapping list model. It
does not take into account the frequency or proximity of terms within documents.
For a more advanced implementation, you might need to use a library that can parse
and query structured documents.