Unit 4

Introduction to Semantic Analysis

Semantic Analysis is a subfield of Natural Language Processing (NLP) that attempts


to understand the meaning of Natural Language. Understanding Natural Language
might seem a straightforward process to us as humans. However, due to the vast
complexity and subjectivity involved in human language, interpreting it is quite a
complicated task for machines. Semantic Analysis of Natural Language captures the
meaning of the given text while taking into account context, logical structuring of
sentences and grammar roles.
Parts of Semantic Analysis
Semantic Analysis of Natural Language can be classified into two broad parts:
1. Lexical Semantic Analysis: Lexical Semantic Analysis involves understanding
the meaning of each word of the text individually. It basically refers to fetching the
dictionary meaning that a word in the text is intended to carry.
2. Compositional Semantics Analysis: Although knowing the meaning of each
word of the text is essential, it is not sufficient to completely understand the meaning
of the text.
For example, consider the following two sentences:
• Sentence 1: Students love College.
• Sentence 2: College loves Students.
Although both these sentences 1 and 2 use the same set of root words {student, love,
College}, they convey entirely different meanings.
Hence, under Compositional Semantics Analysis, we try to understand how
combinations of individual words form the meaning of the text.
Tasks involved in Semantic Analysis
In order to understand the meaning of a sentence, the following are the major
processes involved in Semantic Analysis:
1. Word Sense Disambiguation
2. Relationship Extraction
Word Sense Disambiguation:

In Natural Language, the meaning of a word may vary as per its usage in sentences
and the context of the text. Word Sense Disambiguation involves interpreting the
meaning of a word based upon the context of its occurrence in a text.
For example, the word ‘Bark’ may mean ‘the sound made by a dog’ or ‘the outermost
layer of a tree.’
Likewise, the word ‘rock’ may mean ‘a stone‘ or ‘a genre of music‘ – hence, the
accurate meaning of the word is highly dependent upon its context and usage in the
text.
Thus, the ability of a machine to overcome the ambiguity involved in identifying the
meaning of a word based on its usage and context is called Word Sense
Disambiguation.
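As an illustration, the sketch below (assuming NLTK is installed and its 'wordnet' and 'punkt' data have been downloaded) uses NLTK's built-in lesk() helper — a variant of the Lesk algorithm discussed later in this unit — to pick a WordNet sense for 'bark' in two different contexts; the sentences are made-up examples.

# Word Sense Disambiguation sketch using NLTK's built-in Lesk helper.
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

sentences = [
    "The dog began to bark loudly at the stranger",
    "The bark of the old oak tree was rough and cracked",
]

for sentence in sentences:
    # lesk() returns the WordNet synset whose gloss overlaps most with the context.
    sense = lesk(word_tokenize(sentence), "bark")
    if sense is not None:
        print(sentence)
        print("  ->", sense.name(), "-", sense.definition())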

Relationship Extraction:

Another important task involved in Semantic Analysis is Relationship Extraction. It involves first identifying the various entities present in the sentence and then extracting the relationships between those entities.
For example, consider the following sentence:
Semantic Analysis is a topic of NLP which is explained on the GeeksforGeeks blog.
The entities involved in this text, along with their relationships, are shown below.

[Figure: the entities identified in the sentence and the relationships extracted between them]
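As a rough sketch only (assuming spaCy and its small English model en_core_web_sm are installed), entity extraction and a very naive pattern-based relation extraction could look like the code below; real relationship-extraction systems are considerably more sophisticated, and the exact entities found depend on the model.

# Sketch of entity extraction and naive relation extraction with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Semantic Analysis is a topic of NLP which is explained on the "
          "GeeksforGeeks blog. Students love College.")

# Entity extraction: named entities recognised by the statistical model.
for ent in doc.ents:
    print("ENTITY:", ent.text, ent.label_)

# Naive relation extraction: (subject, verb, object) triples from the dependency parse.
for token in doc:
    if token.pos_ == "VERB":
        subjects = [c.text for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c.text for c in token.children if c.dep_ in ("dobj", "attr", "pobj")]
        for subj in subjects:
            for obj in objects:
                print("RELATION:", (subj, token.lemma_, obj))  # e.g. ('Students', 'love', 'College')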
Elements of Semantic Analysis
Some of the critical elements of Semantic Analysis that must be scrutinized and
taken into account while processing Natural Language are:
• Hyponymy: Hyponymy refers to a term that is an instance of a generic term. The relationship can be understood through a class-object analogy. For example: ‘Color’ is a hypernym, while ‘grey’, ‘blue’, ‘red’, etc., are its hyponyms.
• Homonymy: Homonymy refers to two or more lexical terms with the same spelling but completely distinct meanings. For example: ‘Rose’ might mean ‘the past form of rise’ or ‘a flower’ – same spelling but different meanings; hence, ‘rose’ is a homonym.
• Synonymy: When two or more lexical terms that are spelt differently have the same or a similar meaning, they are called synonyms. For example: (Job, Occupation), (Large, Big), (Stop, Halt).
• Antonymy: Antonymy refers to a pair of lexical terms that have contrasting
meanings – they are symmetric to a semantic axis. For example: (Day, Night),
(Hot, Cold), (Large, Small).
• Polysemy: Polysemy refers to lexical terms that have the same spelling but multiple closely related meanings. It differs from homonymy in that the meanings of homonyms need not be related at all. For example: ‘man’ may mean ‘the human species’, ‘a male human’ or ‘an adult male human’ – since all these meanings bear a close association, the lexical term ‘man’ is polysemous.
• Meronomy: Meronomy refers to a relationship wherein one lexical term is a constituent of some larger entity. For example: ‘Wheel’ is a meronym of ‘Automobile’. (Several of these relations are illustrated in the WordNet sketch below.)
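These relations can be explored programmatically through WordNet via NLTK; the sketch below assumes the WordNet corpus has been downloaded (nltk.download('wordnet')), and the printed examples are indicative rather than exact.

# Exploring lexical relations with WordNet through NLTK.
from nltk.corpus import wordnet as wn

# Hyponymy / hypernymy: 'color' is a hypernym of more specific colour concepts.
print(wn.synset("color.n.01").hyponyms()[:5])
print(wn.synset("red.n.01").hypernyms())       # a more general colour concept

# Synonymy: lemmas that share a synset.
print(wn.synset("large.a.01").lemma_names())   # e.g. ['large', 'big']

# Antonymy: defined on lemmas rather than synsets.
print(wn.lemma("hot.a.01.hot").antonyms())     # e.g. [Lemma('cold.a.01.cold')]

# Meronymy: parts of a whole.
print(wn.synset("car.n.01").part_meronyms()[:5])   # e.g. wheel, window, accelerator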
Meaning Representation
While, as humans, it is pretty simple for us to understand the meaning of textual
information, it is not so in the case of machines. Thus, machines tend to represent
the text in specific formats in order to interpret its meaning. This formal structure
that is used to understand the meaning of a text is called meaning representation.
Basic Units of Semantic System:

In order to accomplish Meaning Representation in Semantic Analysis, it is vital to


understand the building units of such representations. The basic units of semantic
systems are explained below:
1. Entity: An entity refers to a particular unit or individual, such as a person or a location. For example: KodeVortex, Delhi, etc.
2. Concept: A concept may be understood as a generalization of entities. It refers to a broad class of individual units. For example: Learning Portals, City, Students.
3. Relations: Relations establish relationships between various entities and concepts. For example: ‘KodeVortex is a Learning Portal’, ‘Delhi is a City’, etc.
4. Predicate: Predicates represent the verb structures of the sentences.
In Meaning Representation, we employ these basic units to represent textual
information.
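As a purely illustrative sketch — the class names below are hypothetical, not a standard API — these basic units could be modelled with simple Python data structures:

# Hypothetical sketch: modelling entities, concepts, relations and predicates.
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:        # a particular individual, e.g. KodeVortex, Delhi
    name: str

@dataclass(frozen=True)
class Concept:       # a generalization over entities, e.g. Learning Portal, City
    name: str

@dataclass(frozen=True)
class Relation:      # links an entity to a concept through a predicate (the verb structure)
    subject: Entity
    predicate: str
    obj: Concept

facts = [
    Relation(Entity("KodeVortex"), "is_a", Concept("Learning Portal")),
    Relation(Entity("Delhi"), "is_a", Concept("City")),
]

for fact in facts:
    print(f"{fact.subject.name} --{fact.predicate}--> {fact.obj.name}")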

Approaches to Meaning Representations:

Now that we have a basic understanding of Meaning Representations, here are some of the most popular approaches to meaning representation:
1. First-order predicate logic (FOPL)
2. Semantic Nets
3. Frames
4. Conceptual dependency (CD)
5. Rule-based architecture
6. Case Grammar
7. Conceptual Graphs
Semantic Analysis Techniques
Based upon the end goal one is trying to accomplish, Semantic Analysis can be
used in various ways. Two of the most common Semantic Analysis techniques are:
Text Classification

In Text Classification, our aim is to label the text according to the insights we intend to gain from the textual data (a short classification sketch follows the list below).
For example:
• In Sentiment Analysis, we try to label the text with the prominent emotion it conveys. This is highly beneficial when analyzing customer reviews for improvement.
• In Topic Classification, we try to categorize our text into predefined categories. For example: identifying whether a research paper belongs to Physics, Chemistry or Maths.
• In Intent Classification, we try to determine the intent behind a text message.
For example: Identifying whether an e-mail received at customer care service is
a query, complaint or request.
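A minimal text-classification sketch, assuming scikit-learn is installed; the training sentences reuse the review examples from the Naive Bayes section later in this unit, and the pipeline is only a toy illustration.

# Toy sentiment classifier: CountVectorizer features + multinomial Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "I liked the movie",
    "It's a good movie. Nice story",
    "Nice songs. But sadly boring ending.",
    "Sad, boring movie",
]
train_labels = ["positive", "positive", "negative", "negative"]

classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(train_texts, train_labels)

print(classifier.predict(["overall liked the movie"]))   # expected: ['positive'] for this toy data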

Text Extraction

In Text Extraction, we aim at obtaining specific information from our text (a small keyword-extraction sketch follows the list below).


For Example,
• In Keyword Extraction, we try to obtain the essential words that define the
entire document.
• In Entity Extraction, we try to obtain all the entities involved in a document.
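A small keyword-extraction sketch, assuming scikit-learn is available; it ranks the words of one document by TF-IDF weight, and the documents themselves are made-up examples.

# Keyword extraction sketch: rank the words of one document by TF-IDF weight.
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "Semantic analysis helps machines interpret the meaning of text.",
    "Topic classification assigns research papers to physics, chemistry or maths.",
    "Customer care emails may be queries, complaints or requests.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(documents)

terms = vectorizer.get_feature_names_out()
scores = tfidf[0].toarray().ravel()              # weights for the first document
keywords = sorted(zip(terms, scores), key=lambda pair: pair[1], reverse=True)[:5]
print(keywords)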
Significance of Semantic Analysis
Semantic Analysis is a crucial part of Natural Language Processing (NLP). In the
ever-expanding era of textual information, it is important for organizations to draw
insights from such data to fuel businesses. Semantic Analysis helps machines
interpret the meaning of texts and extract useful information, thus providing
invaluable data while reducing manual efforts.
Besides, Semantic Analysis is also widely employed to facilitate automated answering systems such as chatbots, which answer user queries without any human intervention.
Part Of Speech (POS) Tagging Ambiguity

POS tagging refers to the process of classifying words in a text to a part of speech -
whether the word is a verb, noun, etc. Often, you will find that the same word can
take on multiple classifications for its part of speech depending on how the sentence
is constructed. For example, it is quite common to see words that can be used either as a verb or a noun, as the tagging sketch below illustrates −

• I need to mail my friend the files. (Verb)


• I need to find the mail that was sent to me. (Noun)
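A quick tagging sketch with NLTK, assuming the 'punkt' and 'averaged_perceptron_tagger' resources have been downloaded; tagger output can vary between NLTK versions.

# POS-tagging sketch: the same word 'mail' receives different tags depending on context.
from nltk import pos_tag, word_tokenize

print(pos_tag(word_tokenize("I need to mail my friend the files.")))
print(pos_tag(word_tokenize("I need to find the mail that was sent to me.")))
# In the first sentence 'mail' is typically tagged as a verb (VB),
# in the second as a noun (NN).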

Structural Ambiguity

This ambiguity arises because the same exact sentence can be interpreted differently
based on how the sentence is parsed. Take the following sentence −
The boy kicked the ball in his jeans.

This sentence can be construed as the boy either kicking the ball while wearing his
jeans, or kicking the ball while the ball was in the jeans. This depends on how the
sentence is parsed.

Scope Ambiguity

Here we look at ambiguities that occur due to quantifiers. Recalling some basic mathematical logic, or just basic grammar, quantifier words like ‘every’ and ‘any’ come to mind.
Take the following sentence −
All students learn a programming language.

This sentence, due to the scope created with the sequential use of quantifiers ‘all’
followed by ‘a’, can have two different meanings:
• All students learn the same programming language.
• Each student learns some programming language, which does not have to be the same one.

Lexical Ambiguity

Certain words have the property that they can carry more than one meaning. For example, as seen earlier, ‘bark’ can mean the sound made by a dog or the outermost layer of a tree, and only the surrounding context resolves which sense is intended.

Referential Ambiguity

This ambiguity arises when it is unclear which entity an expression in a sentence refers to. Take the following sentence −
I saw Michelle with the telescope.

This sentence can be construed as either Michelle herself was carrying the telescope, or the person saying the sentence was using a telescope to see Michelle.

Anaphoric Ambiguity

Here we have an ambiguity loosely similar to referential ambiguity, but fixated on pronouns. The use of pronouns can cause confusion when multiple people are mentioned in a sentence. Take the following sentence −
Michelle told Romany that she ate the cake.
Now, from the sentence alone it is not exactly clear whether ‘she’ is referring to
Michelle or Romany.
WordNet
Natural Language Processing (NLP) can be challenging when it comes to automatically deciphering and analyzing word meanings and pre-processing text input. To help with this, lexicons are widely used. A lexicon is essentially a dictionary of a language's vocabulary. We often make connections in language using these lexicons, which helps us understand the relationships between various concepts. A great lexical resource is WordNet. Its distinctive semantic network makes it easier to identify word relationships, synonyms, grammatical forms, and more. This aids automatic language translation, sentiment analysis, and text similarity.

What is Wordnet?
WordNet is a massive lexicon of English words. Nouns, verbs, adjectives, and adverbs are arranged into ‘synsets’, which are collections of cognitive synonyms that each communicate a distinct concept. Synsets are connected by conceptual-semantic and lexical links such as hyponymy and antonymy.
WordNet is similar to a thesaurus in that it groups words according to their
meanings. There are, nevertheless, some key distinctions.
• First, WordNet connects not just word forms — strings of letters — but also specific senses of words. As a result, terms that are close to one another in the network are semantically disambiguated.
• Second, WordNet labels the semantic relations among words, whereas the groupings of words in a thesaurus do not follow any explicit pattern other than meaning similarity.

Structure of WordNet
WordNet is a lexical database of the English language that organizes words into
synsets (sets of synonymous words) and describes their semantic relationships.
Here's a simplified diagram illustrating the structure of WordNet:
              +---------+
              | WordNet |
              +---------+
                   |
              +---------+
              | Synsets |
              +---------+
             /     |     \
   +--------+  +--------+  +--------+
   | Synset |  | Synset |  | Synset |
   +--------+  +--------+  +--------+
       |           |           |
   +--------+  +--------+  +--------+
   |  Word  |  |  Word  |  |  Word  |
   +--------+  +--------+  +--------+

Explanation:
• WordNet: The top-level entity represents the entire WordNet database.
• Synsets: Synsets are sets of synonymous words grouped together based on
shared meanings or concepts.
• Synset: Each synset contains a group of words (lemmas) that are semantically
related and represent the same concept or meaning.
• Word: Individual words (lemmas) are linked to synsets, indicating their
membership in a particular synset.
In WordNet, synsets are interconnected through various semantic relations such as
synonymy, antonymy, hypernymy, and meronymy, forming a rich network of
lexical information. This structure allows WordNet to be used for tasks like word
sense disambiguation, semantic similarity calculation, and natural language
processing.
How to use WordNet
The Natural Language Toolkit (NLTK) is a Python module for natural language processing. It ships with many corpora, toy grammars, trained models and, most importantly here, WordNet. The English WordNet bundled with NLTK contains 155,287 words and 117,659 synonym sets.
Synset is the basic interface NLTK provides for looking up words in WordNet. A Synset instance is a collection of synonyms that express the same concept. Some words have only one Synset, while others have several.
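A minimal lookup sketch, assuming the WordNet corpus has been downloaded via nltk.download('wordnet'):

# Looking up all WordNet senses (synsets) of a word with NLTK.
from nltk.corpus import wordnet as wn

for synset in wn.synsets("rock"):
    print(synset.name(), "-", synset.definition())
# The output includes senses related to stone as well as the music genre, among others.

# A synset groups the synonymous lemmas that express one concept.
print(wn.synset("rock.n.01").lemma_names())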

Hypernyms and Hyponyms


• Hypernyms are more general (abstract) terms.
• More specific terms are referred to as hyponyms.
• Synsets are organized in a structure similar to an inheritance tree, which is why both terms come to mind. A root hypernym can be found at the top of this tree. Hypernyms are a means of classifying and grouping words based on their resemblance, as the sketch below shows.
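A short sketch of walking this tree with NLTK, again assuming the WordNet corpus is available; the synsets printed are indicative.

# Walking the hypernym/hyponym hierarchy of a synset.
from nltk.corpus import wordnet as wn

dog = wn.synset("dog.n.01")

print(dog.hypernyms())        # more general terms, e.g. canine, domestic animal
print(dog.hyponyms()[:5])     # more specific terms, e.g. particular breeds
print(dog.root_hypernyms())   # the root of this noun tree, typically entity.n.01

# Full path(s) from the root hypernym down to 'dog'.
for path in dog.hypernym_paths():
    print(" -> ".join(s.name() for s in path))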

What is BabelNet
BabelNet is a multilingual lexicalized semantic network and ontology developed at
the NLP group of the Sapienza University of Rome. BabelNet was automatically
created by linking Wikipedia to the most popular computational lexicon of the
English language, WordNet. The integration is done using an automatic mapping
and by filling in lexical gaps in resource-poor languages by using statistical machine
translation. The result is an encyclopedic dictionary that provides concepts and
named entities lexicalized in many languages and connected with large amounts of
semantic relations.
Lesk’s Algorithm: A simple method for word-sense disambiguation
Perhaps one of the earliest and still most commonly used methods for word-sense
disambiguation today is Lesk’s Algorithm, proposed by Michael E. Lesk in 1986.
Lesk’s algorithm is based on the idea that words that appear together in text are
related somehow, and that the relationship and corresponding context of the words
can be extracted through the definitions of the words of interest as well as the other
words used around it. Developed long before modern statistical NLP, Lesk's algorithm aims to disambiguate the meaning of words of interest — usually appearing within a short phrase or sentence — by finding the pair of dictionary "senses" whose dictionary definitions have the greatest word overlap.
In the example Lesk uses, he references the words “pine” and “cone”, noting that the
words return the following definitions from the Oxford English Dictionary:

[Figure: example of sense-matching proposed by Lesk in 1986, using the OED definitions of "pine" and "cone".]


In the simplest terms, Lesk’s algorithm counts the number of overlaps between all
dictionary definitions of a word of interest and all dictionary definitions of the words
surrounding it, known as a “context window”. Then, it takes the definition
corresponding to the word with the highest number of overlaps, without including
stop words (words such as “the”, “a”, “and”), and infers it to be the word’s “sense”.
If we consider “pine” to be the word of interest, and “cone” to be the only word in
its context window, comparing dictionary definitions of “pine” and “cone” would
find that “evergreen” is the sense common to both terms. Thus, we can logically infer that the word of interest “pine” refers to an evergreen tree, rather than its alternate definition.
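The overlap idea can be sketched in a few lines of Python using WordNet glosses instead of the Oxford English Dictionary that Lesk originally used; the stop-word list and tokenization below are deliberately crude, so the chosen sense may differ from Lesk's original example.

# Simplified Lesk: pick the sense of a target word whose definition overlaps most
# with the definitions of a context word, ignoring a small stop-word list.
from nltk.corpus import wordnet as wn

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "to", "is", "by", "for"}

def content_words(text):
    return {w.lower().strip(".,;()") for w in text.split()} - STOP_WORDS

def simplified_lesk(target_word, context_word):
    # Gather all definition words of the context word's senses.
    context_gloss = set()
    for synset in wn.synsets(context_word):
        context_gloss |= content_words(synset.definition())

    # Choose the target sense with the largest definition overlap.
    best_sense, best_overlap = None, -1
    for synset in wn.synsets(target_word):
        overlap = len(content_words(synset.definition()) & context_gloss)
        if overlap > best_overlap:
            best_sense, best_overlap = synset, overlap
    return best_sense, best_overlap

sense, overlap = simplified_lesk("pine", "cone")
print(sense.name(), "-", sense.definition(), "(overlap:", overlap, ")")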
Naive Bayes to NLP Problems
Naive Bayes Classifier Algorithm is a family of probabilistic algorithms based on applying Bayes' theorem with the "naive" assumption of conditional independence between every pair of features.
Bayes theorem calculates probability P(c|x) where c is the class of the possible
outcomes and x is the given instance which has to be classified, representing some
certain features.
P(c|x) = P(x|c) * P(c) / P(x)
Naive Bayes classifiers are mostly used in natural language processing (NLP) problems. They predict the tag of a text by calculating the probability of each tag for the given text and then outputting the tag with the highest probability.
How Does the Naive Bayes Algorithm Work?
Let's consider an example: classifying a review as positive or negative.
Training Dataset:

Text                                                                 Review

“I liked the movie”                                                  positive
“It's a good movie. Nice story”                                      positive
“Nice songs. But sadly boring ending.”                               negative
“Hero's acting is bad but heroine looks good. Overall nice movie”    positive
“Sad, boring movie”                                                  negative

We classify whether the text “overall liked the movie” has a positive review or a
negative review. We have to calculate,
P(positive | overall liked the movie) — the probability that the tag of a sentence is
positive given that the sentence is “overall liked the movie”.
P(negative | overall liked the movie) — the probability that the tag of a sentence
is negative given that the sentence is “overall liked the movie”.
Before that, we first apply stopword removal and stemming to the text (a short NLTK sketch of these two steps follows below).
Removing Stopwords: These are common words that don't really add anything to the classification, such as ‘an’, ‘able’, ‘either’, ‘else’, ‘ever’ and so on.
Stemming: Stemming reduces each word to its root form.
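These two preprocessing steps can be sketched with NLTK, assuming the 'stopwords' and 'punkt' resources have been downloaded; the exact output depends on the stop-word list and stemmer used, so it will not match the table below character for character.

# Stopword removal and stemming with NLTK.
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(text):
    words = word_tokenize(text.lower())
    return [stemmer.stem(w) for w in words if w.isalpha() and w not in stop_words]

print(preprocess("I liked the movie"))                     # e.g. ['like', 'movi']
print(preprocess("Nice songs. But sadly boring ending."))  # e.g. ['nice', 'song', 'sadli', 'bore', 'end']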
After applying these two techniques, our text becomes:

Text                                                          Review

“ilikedthemovi”                                               positive
“itsagoodmovienicestori”                                      positive
“nicesongsbutsadlyboringend”                                  negative
“herosactingisbadbutheroinelooksgoodoverallnicemovi”          positive
“sadboringmovi”                                               negative

Feature Engineering:
The important part is to find features in the data that make machine learning algorithms work. In this case we have text, so we need to convert the text into numbers that we can do calculations on. We use word frequencies: we treat every document as the bag of words it contains, and our features are the counts of each of these words, as the sketch below shows.
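A bag-of-words sketch of this feature representation, using only the Python standard library; the punctuation handling is deliberately simple.

# Bag-of-words features: represent each review by its word counts.
from collections import Counter

reviews = [
    ("I liked the movie", "positive"),
    ("It's a good movie. Nice story", "positive"),
    ("Sad, boring movie", "negative"),
]

def word_counts(text):
    cleaned = text.lower().replace(".", " ").replace(",", " ")
    return Counter(cleaned.split())

for text, label in reviews:
    print(label, dict(word_counts(text)))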
In our case, we want P(positive | overall liked the movie). By Bayes' theorem:
P(positive | overall liked the movie) = P(overall liked the movie | positive) * P(positive) / P(overall liked the movie)
Since our classifier only has to find out which tag has the bigger probability, we can discard the divisor, which is the same for both tags, and simply compare P(overall liked the movie | positive) * P(positive) with P(overall liked the movie | negative) * P(negative).
There’s a problem though: “overall liked the movie” doesn’t appear in our training
dataset, so the probability is zero. Here, we assume the ‘naive’ condition that every
word in a sentence is independent of the other ones. This means that now we look
at individual words.
We can write this as:
P(overall liked the movie) = P(overall) * P(liked) * P(the) * P(movie)
The next step is just applying the Bayes theorem:-
P(overall liked the movie| positive) = P(overall | positive) * P(liked | positive)
* P(the | positive) * P(movie | positive)
And now, these individual words actually show up several times in our training
data, and we can calculate them!
Calculating probabilities:
First, we calculate the a priori probability of each tag: for a given sentence in our
training data, the probability that it is positive P(positive) is 3/5. Then, P(negative)
is 2/5.
Then, calculating P(overall | positive) means counting how many times the word “overall” appears in positive texts (1) divided by the total number of words in positive texts (17). Therefore, P(overall | positive) = 1/17, P(liked | positive) = 1/17, P(the | positive) = 2/17, P(movie | positive) = 3/17.
If a probability comes out to be zero, we use Laplace smoothing: we add 1 to every count so it is never zero. To balance this, we add the number of possible words to the divisor, so the result will never be greater than 1. In our case, the total number of possible words is 21.
Applying smoothing, the results are:

Word        P(word | positive)       P(word | negative)

overall     (1 + 1) / (17 + 21)      (0 + 1) / (7 + 21)
liked       (1 + 1) / (17 + 21)      (0 + 1) / (7 + 21)
the         (2 + 1) / (17 + 21)      (0 + 1) / (7 + 21)
movie       (3 + 1) / (17 + 21)      (1 + 1) / (7 + 21)

Now we just multiply all the probabilities and see which is bigger:
P(overall | positive) * P(liked | positive) * P(the | positive) * P(movie | positive)
* P(positive ) = 1.38 * 10^{-5} = 0.0000138
P(overall | negative) * P(liked | negative) * P(the | negative) * P(movie |
negative) * P(negative) = 0.13 * 10^{-5} = 0.0000013
Our classifier gives “overall liked the movie” the positive tag.
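The whole calculation can be reproduced in a few lines of Python, using the counts from the worked example above (17 words in positive reviews, 7 in negative, a vocabulary of 21 possible words):

# Reproducing the hand calculation: Laplace-smoothed Naive Bayes scores for
# "overall liked the movie", using the counts from this example.
positive_counts = {"overall": 1, "liked": 1, "the": 2, "movie": 3}
negative_counts = {"overall": 0, "liked": 0, "the": 0, "movie": 1}

total_positive_words = 17   # total words in positive reviews
total_negative_words = 7    # total words in negative reviews
vocabulary_size = 21        # total number of possible words

prior_positive = 3 / 5
prior_negative = 2 / 5

def score(counts, total_words, prior, sentence):
    probability = prior
    for word in sentence.split():
        # Laplace smoothing: add 1 to the count, add the vocabulary size to the divisor.
        probability *= (counts.get(word, 0) + 1) / (total_words + vocabulary_size)
    return probability

sentence = "overall liked the movie"
print(score(positive_counts, total_positive_words, prior_positive, sentence))  # ~1.38e-05
print(score(negative_counts, total_negative_words, prior_negative, sentence))  # ~1.30e-06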
