Assignment 1 IR
Submitted by:
Saqlain Nawaz 2020-CS-135
Supervised by:
Sir Khaldoon Syed Khurshid
Libraries Used
The following libraries are used in this code:
● os: Provides functions for interacting with the operating system; used here for file
operations and directory traversal.
● NLTK: The Natural Language Toolkit is used for natural language processing tasks
such as tokenization, stemming, and part-of-speech tagging.
● string: Provides string constants, including string.punctuation for punctuation characters.
● nltk.corpus.stopwords: Provides a list of common English stopwords.
● nltk.stem.PorterStemmer: Implements the Porter stemming algorithm for word
stemming.
● nltk.tokenize.word_tokenize: Tokenizes sentences into words.
● nltk.tokenize.sent_tokenize: Tokenizes text into sentences.
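As a rough illustration of how these libraries could fit together, the sketch below lowercases text, tokenizes it, drops punctuation and stopwords, and stems what remains. The helper name preprocess and the small fallback stopword set are assumptions made for this example and do not come from the assignment code. The fallback branch exists only so the sketch still runs when NLTK or its data files (the stopwords corpus and the tokenizer models) are not installed.

```python
import string

try:
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize

    STOP_WORDS = set(stopwords.words("english"))  # needs the "stopwords" corpus
    word_tokenize("probe")  # raises LookupError if tokenizer data is missing
    tokenize = word_tokenize
    stem = PorterStemmer().stem
except (ImportError, LookupError):
    # Assumed fallback (not part of the assignment): whitespace tokens,
    # a tiny stopword set, and no stemming.
    STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}
    tokenize = str.split

    def stem(word):
        return word


def preprocess(text):
    """Return the normalized terms of `text` (hypothetical helper)."""
    terms = []
    for token in tokenize(text.lower()):
        token = token.strip(string.punctuation)  # drops "," "." and similar
        if token and token not in STOP_WORDS:
            terms.append(stem(token))
    return terms


print(preprocess("The cat, and the dog."))
```

With NLTK available, longer words are reduced to their Porter stems, so query words would need the same treatment before being looked up.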
Code Flow
The code is structured as follows:
Import Libraries
Initialize Variables
create_index Function
This function takes a directory path as input and returns an inverted index that maps each
word to the files in which it appears.
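A minimal sketch of such a function is shown below; it is not the submitted implementation. It assumes a flat directory of text files read as UTF-8, and it takes a hypothetical preprocess parameter where tokenization, stopword removal, and stemming would plug in. The default, simple_terms, is a plain lowercase split so the sketch runs without NLTK.

```python
import os
from collections import defaultdict


def simple_terms(text):
    """Assumed stand-in for the real preprocessing: lowercase and split."""
    return text.lower().split()


def create_index(directory, preprocess=simple_terms):
    """Build an inverted index: term -> set of filenames containing it."""
    index = defaultdict(set)
    for filename in sorted(os.listdir(directory)):
        path = os.path.join(directory, filename)
        if not os.path.isfile(path):
            continue  # skip sub-directories in this flat-directory sketch
        with open(path, encoding="utf-8", errors="ignore") as handle:
            for term in preprocess(handle.read()):
                index[term].add(filename)
    return dict(index)
```

Storing filenames in a set means a word that occurs many times in one file is recorded only once for that file.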
search Function
User Interaction
● The user can input a search query, and the code returns the filenames in which each
query word appears.
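The lookup described above can be sketched as follows. The signature search(query, index) and the sample index are assumptions for illustration; the sketch only lowercases and splits the query, whereas real query words would need the same normalization that was applied when the index was built.

```python
def search(query, index):
    """Return {query word: sorted list of filenames containing it}."""
    results = {}
    for word in query.lower().split():
        # Words missing from the index map to an empty list.
        results[word] = sorted(index.get(word, set()))
    return results


# Hypothetical index, shaped as term -> set of filenames.
sample_index = {
    "apple": {"doc1.txt", "doc2.txt"},
    "banana": {"doc2.txt"},
}

for word, files in search("Apple cherry", sample_index).items():
    print(word, "->", files if files else "no matching files")
```

In an interactive version, the fixed query string would be replaced by a value read with input().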
Execution
Block Diagram:
A block diagram is a visual representation of the code's structure and key
components.
Data Flow Diagram (DFD):
A DFD illustrates how data moves through the code.