NLP Lab Programs

The document outlines five NLP lab programs using the NLTK library: text tokenization, extracting the sentences of a document, tokenization with stop words as delimiters, removing stop words and punctuation, and stemming. Each program includes an example code snippet and instructions for downloading the necessary NLTK data.


1. Tokenize a text
from nltk.tokenize import word_tokenize, sent_tokenize
import nltk

nltk.download('punkt')  # Download tokenizer data (newer NLTK versions may also need 'punkt_tab')

# Example text
text = "NLP makes machines understand language. Tokenization is the first step."

# Sentence Tokenization
print("Sentences:", sent_tokenize(text))

# Word Tokenization
print("Words:", word_tokenize(text))

Output:
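Running this should print something close to the following (the exact lists come from NLTK's Punkt sentence tokenizer and word tokenizer):

Sentences: ['NLP makes machines understand language.', 'Tokenization is the first step.']
Words: ['NLP', 'makes', 'machines', 'understand', 'language', '.', 'Tokenization', 'is', 'the', 'first', 'step', '.']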

2. Extract the sentences of a text document


from nltk.tokenize import sent_tokenize
import nltk

nltk.download('punkt')  # Download tokenizer data

# Read the text from a file
file_path = "sample.txt"  # placeholder name; replace with your file path
with open(file_path, 'r') as file:
    text = file.read()

# Sentence Tokenization
sentences = sent_tokenize(text)

# Display the sentences
print("Sentences in the document:")
for i, sentence in enumerate(sentences, 1):
    print(f"{i}: {sentence}")
Save a text file (here called sample.txt as a placeholder) in the Jupyter notebook's working directory before running the program.
Output:
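To make the program self-contained, the sample file can be created first. This is a minimal sketch; the filename sample.txt and its three sentences are assumptions, not part of the original program:

# Create a small sample file (hypothetical name and content)
with open("sample.txt", "w") as f:
    f.write("NLP is fun. It has many applications! Do you agree?")

With that content, the program should print approximately:

Sentences in the document:
1: NLP is fun.
2: It has many applications!
3: Do you agree?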

3. Tokenize text with stop words as delimiters

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import nltk

# Download necessary data
nltk.download('punkt')
nltk.download('stopwords')

# Example text
text = "I enjoy learning Python and coding."

# Define stop words
stop_words = set(stopwords.words('english'))

# Tokenize the text
words = word_tokenize(text)

# Filter out stop words (they act as the boundaries between kept tokens)
tokens_without_stopwords = [word for word in words if word.lower() not in stop_words]

# Output the result
print("Original Tokens:", words)
print("Tokens without Stop Words:", tokens_without_stopwords)

Output:
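With NLTK's English stop word list, this should print approximately:

Original Tokens: ['I', 'enjoy', 'learning', 'Python', 'and', 'coding', '.']
Tokens without Stop Words: ['enjoy', 'learning', 'Python', 'coding', '.']

Note that the code above filters stop words out of the token list. If the goal is to treat stop words as true delimiters, i.e. split the sentence into phrases wherever a stop word occurs, a minimal sketch using itertools.groupby (an assumption, not part of the original program) could look like this:

from itertools import groupby

# Group consecutive non-stop-word tokens into phrases,
# using the stop words as phrase boundaries
phrases = [" ".join(group)
           for is_stop, group in groupby(words, key=lambda w: w.lower() in stop_words)
           if not is_stop]
print("Phrases:", phrases)  # e.g. ['enjoy learning Python', 'coding .']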

4. Remove stop words and punctuation from a text

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import string
import nltk

# Download necessary data
nltk.download('punkt')
nltk.download('stopwords')

# Example text
text = "Python is great! It's simple and powerful."

# Define stop words
stop_words = set(stopwords.words('english'))

# Tokenize the text
words = word_tokenize(text)

# Remove stop words and punctuation
tokens_cleaned = [word for word in words
                  if word.lower() not in stop_words and word not in string.punctuation]

# Output the result
print("Tokens without Stop Words and Punctuation:", tokens_cleaned)

Output:
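With NLTK's English stop word list, this should print approximately:

Tokens without Stop Words and Punctuation: ['Python', 'great', "'s", 'simple', 'powerful']

The token "'s" can survive because string.punctuation only matches single-character tokens; one common alternative is to keep only tokens satisfying word.isalnum(), which would drop it as well.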

5. Perform stemming
# import these modules
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()

# choose some words to be stemmed
words = ["pythonprogramming", "programs", "programmer", "event", "thankyou"]

for w in words:
    print(w, " : ", ps.stem(w))

Output:
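With NLTK's PorterStemmer, the loop should print approximately:

pythonprogramming  :  pythonprogram
programs  :  program
programmer  :  programm
event  :  event
thankyou  :  thankyou

Note that a stemmer clips suffixes by rule, so results such as 'programm' are not always dictionary words, and words no rule applies to ('event', 'thankyou') pass through unchanged.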
