Semantic Analysis with NLTK
Semantic Analysis in NLP is the process of extracting meaning from text beyond the words themselves. While syntax focuses on structure, semantics helps machines understand what the text actually means. The Natural Language Toolkit (NLTK) is a popular Python library that provides foundational tools for this.
WordNet in NLTK
WordNet is like a smart English dictionary that groups words into sets of synonyms called synsets. Each synset represents a single concept or meaning, so we can easily look up all the meanings of a word and see example sentences that use each one.
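For example, looking up "bank" returns one synset per sense, each with a gloss and usage examples. A minimal sketch, assuming the wordnet corpus has already been downloaded (see the installation step below):
Python
from nltk.corpus import wordnet as wn

# Each synset is one sense of "bank", with a definition and examples
for syn in wn.synsets('bank'):
    print(syn.name(), '-', syn.definition())
    if syn.examples():
        print('   e.g.,', syn.examples()[0])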
- Word Sense Disambiguation (WSD): WSD is the task of figuring out which meaning of a word is correct in a given sentence. NLTK includes the classic Lesk algorithm, which chooses the best sense by comparing the words in the sentence with the dictionary definitions of each candidate sense. For example, in the sentence “He went to the bank to deposit money,” WSD should pick the sense of “bank” as a financial institution, not a riverbank.
- Semantic Similarity: WordNet lets you calculate how similar two words are based on how closely related their meanings are in the WordNet hierarchy. For example, “dog” and “cat” are semantically similar because they both belong to the “animal” group (see the sketch after this list). In NLTK you can use measures such as Wu-Palmer similarity, which compare how deep the word senses sit in the WordNet tree and how closely they share a common ancestor concept.
- Named Entity Recognition (NER): NER goes a step beyond words and their meanings; it identifies real-world entities in text. NLTK’s built-in NER can detect names of people, places, organizations, dates and more. For example, in the sentence “Barack Obama was born in Hawaii,” NER labels “Barack Obama” as a person and “Hawaii” as a location.
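To make the dog/cat example concrete, here is a minimal sketch, again assuming the WordNet corpora from the installation step below are available:
Python
from nltk.corpus import wordnet as wn

# First noun sense of each word
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

# Wu-Palmer scores by the depth of the lowest common ancestor
# ('carnivore' here), so closely related animals score high (~0.86)
print(dog.wup_similarity(cat))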
Implementation
Step 1: Install and Download Necessary Libraries
This step installs NLTK if needed, downloads various language resources including tokenizers, taggers and named entity chunkers, and imports modules required for word sense disambiguation and linguistic analysis.
Python
!pip install nltk

import nltk

# Tokenizers, taggers, WordNet data and the named-entity chunker
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('omw-1.4')
nltk.download('maxent_ne_chunker')
nltk.download('words')

from nltk.corpus import wordnet as wn
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize
from nltk import pos_tag, ne_chunk
import pandas as pd
Step 2: Load the Dataset
This step reads the CSV file into a DataFrame with the specified column names and displays the first three rows to verify that the data loaded correctly. You can download the Sentiment140 dataset, which contains 1.6 million tweets, from Kaggle. Note that read_csv can load the zipped file directly, since pandas infers the compression from the .zip extension.
Python
df = pd.read_csv('training.1600000.processed.noemoticon.csv.zip',
                 encoding='latin-1',
                 names=['target', 'ids', 'date', 'flag', 'user', 'text'])
df.head(3)
Output:
Step 3: Select and Display a Sample Tweet
This step picks the first tweet from the dataset and prints it, providing a sample text to work with for further analysis or demonstration.
Python
tweet = df['text'][0]
print(f"Tweet: {tweet}")
Output:
Tweet: @switchfoot https://twitpic.com/ - Awww, that's a bummer. You shoulda got David Carr of Third Day to do it. ;D
Step 4: Tokenize and POS Tag the Sample Tweet
This step tokenizes the selected tweet into words and assigns a part-of-speech tag to each token, helping identify the grammatical role of each word for deeper linguistic analysis.
Python
import nltk

# Newer NLTK releases ship the English tagger as a separate resource
nltk.download('averaged_perceptron_tagger_eng')
tokens = word_tokenize(tweet)
tags = pos_tag(tokens)
print("Tokens:", tokens)
print("POS Tags:", tags)
Output:
Step 5: Perform Word Sense Disambiguation (WSD)
This step uses the Lesk algorithm to choose the most appropriate sense of the ambiguous word "bank" in the given sentence, then prints the selected synset and its definition to clarify the intended meaning based on context.
Python
sentence = "He went to the bank to deposit money."
tokens = word_tokenize(sentence)
sense = lesk(tokens, 'bank')
print("Best sense:", sense)
print("Definition:", sense.definition())
Output:
Best sense: Synset('savings_bank.n.02')
Definition: a container (usually with a slot in the top) for keeping money at home
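Note that Lesk picked the "piggy bank" sense rather than the financial institution: its definition-overlap heuristic is simple and can misfire on short contexts. The function also accepts an optional POS tag ('n', 'v', 'a', 'r') to narrow the candidate senses; a minimal sketch reusing the tokens from above:
Python
# List every noun sense of "bank" to compare against Lesk's pick
for syn in wn.synsets('bank', pos=wn.NOUN):
    print(f"{syn.name()}: {syn.definition()}")

# Restrict the Lesk search to noun senses only
sense_n = lesk(tokens, 'bank', 'n')
print("Noun-only sense:", sense_n.name(), "-", sense_n.definition())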
Step 6: Calculate Semantic Similarity Between Words
This step selects the first adjective senses of "good" and "bad" from WordNet and computes their semantic similarity using the Wu-Palmer measure, which quantifies how closely related two concepts are based on their positions in the lexical hierarchy.
Python
good = wn.synsets('good', pos=wn.ADJ)[0]
bad = wn.synsets('bad', pos=wn.ADJ)[0]
similarity = good.wup_similarity(bad)
print(f"Semantic Similarity (Wu-Palmer): {similarity}")
Output:
Semantic Similarity (Wu-Palmer): 0.5
Step 7: Named Entity Recognition (NER) with Chunking
This step downloads the resource required for NER, applies the named-entity chunker to the POS-tagged tokens and prints the resulting parse tree, which identifies and groups the named entities in the sentence.
Python
import nltk

# Newer NLTK releases ship the NE chunker as a separate resource
nltk.download('maxent_ne_chunker_tab')
tree = ne_chunk(tags)
print(tree)
Output:
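The printed tree marks entity chunks with labels such as PERSON or GPE. Here is a minimal sketch (not part of the original snippet) of pulling (entity, label) pairs out of the chunk tree:
Python
from nltk.tree import Tree

# Labelled chunks are Tree nodes; plain tokens are (word, tag) tuples
entities = [(" ".join(tok for tok, _ in st.leaves()), st.label())
            for st in tree if isinstance(st, Tree)]
print("Entities:", entities)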
Step 8: Perform Comprehensive Semantic Analysis
This step tokenizes and POS-tags the input text, performs named entity recognition, and then retrieves and prints every WordNet synset with its definition for each word. Running it on the first two tweets gives a detailed linguistic and semantic overview.
Python
def semantic_analysis(text):
    tokens = word_tokenize(text)
    tags = pos_tag(tokens)
    print("Original:", text)
    print("Tokens:", tokens)
    print("NER Tree:", ne_chunk(tags))
    # Print every WordNet sense of each token that has one
    for word in tokens:
        synsets = wn.synsets(word)
        if synsets:
            print(f"\nWord: {word}")
            for syn in synsets:
                print(f" - {syn.name()}: {syn.definition()}")
    print("\n")

# Run the analysis on the first two tweets
for i in range(2):
    semantic_analysis(df['text'][i])
Output:
Here, we have performed semantic analysis of the word "Day".
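As a possible refinement (not part of the original walkthrough), the synset lookup can be filtered by each token's POS tag so that, for example, verb senses are not printed for a word used as a noun. The penn_to_wordnet helper below is a hypothetical mapping, not an NLTK built-in:
Python
from nltk import pos_tag
from nltk.tokenize import word_tokenize
from nltk.corpus import wordnet as wn

def penn_to_wordnet(tag):
    # Hypothetical helper: map Penn Treebank tags to WordNet POS constants
    if tag.startswith('J'):
        return wn.ADJ
    if tag.startswith('V'):
        return wn.VERB
    if tag.startswith('N'):
        return wn.NOUN
    if tag.startswith('R'):
        return wn.ADV
    return None

for word, tag in pos_tag(word_tokenize("They deposit money at the bank")):
    wn_pos = penn_to_wordnet(tag)
    if wn_pos:
        print(word, tag, [s.name() for s in wn.synsets(word, pos=wn_pos)[:3]])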