Unit 3 - NLP Notes

R22 B.Tech. CSE NLP

CS525PE Natural Language Processing (Professional Elective – II)

Prerequisites:
1. Data structures and compiler design

Course Objectives:
Introduction to some of the problems and solutions of NLP and their relation to linguistics and statistics.

Course Outcomes:
1. Show sensitivity to linguistic phenomena and an ability to model them with formal grammars.
2. Understand and carry out proper experimental methodology for training and evaluating empirical NLP systems.
3. Manipulate probabilities, construct statistical models over strings and trees, and estimate parameters using supervised and unsupervised training methods.
4. Design, implement, and analyze NLP algorithms, and design different language modelling techniques.
UNIT - I
Finding the Structure of Words: Words and Their
Components, Issues and Challenges, Morphological Models
Finding the Structure of Documents: Introduction, Methods,
Complexity of the Approaches, Performances of the
Approaches, Features
UNIT - II
Syntax I: Parsing Natural Language, Treebanks: A Data-Driven Approach to Syntax, Representation of Syntactic Structure, Parsing Algorithms

UNIT - III
Syntax II: Models for Ambiguity Resolution in Parsing, Multilingual Issues
Semantic Parsing I: Introduction, Semantic Interpretation, System Paradigms, Word Sense

UNIT - IV
Semantic Parsing II: Predicate-Argument Structure, Meaning Representation Systems

UNIT - V
Language Modeling: Introduction, N-Gram Models, Language Model Evaluation, Bayesian Parameter Estimation, Language Model Adaptation, Language Models (class based, variable length, Bayesian topic based), Multilingual and Cross-Lingual Language Modeling

TEXT BOOKS:
1. Multilingual Natural Language Processing Applications: From Theory to Practice – Daniel M. Bikel and Imed Zitouni, Pearson Publication.

REFERENCE BOOKS:
1. Speech and Language Processing – Daniel Jurafsky & James H. Martin, Pearson Publications.
2. Natural Language Processing and Information Retrieval – Tanveer Siddiqui, U.S. Tiwary.
Syntax II: Models for Ambiguity Resolution in Parsing, Multilingual Issues
Semantic Parsing I: Introduction, Semantic Interpretation, System Paradigms, Word Sense

Syntax II:
1. Models for Ambiguity Resolution in Parsing.
2. Multilingual Issues.

1. Models for Ambiguity Resolution in Parsing:

Ambiguity resolution is an important problem in natural language processing (NLP), as many sentences can have multiple valid syntactic parses. This means that the same sentence can be represented by multiple phrase structure trees or dependency graphs. Resolving ambiguity is crucial for many NLP applications, such as machine translation, text-to-speech synthesis, and information retrieval.

Here are some common models for ambiguity resolution in parsing:

1. Rule-based models:
Rule-based models use hand-crafted grammars and rules to disambiguate sentences. These rules can be based on linguistic knowledge or heuristics, and can help resolve ambiguity by preferring certain syntactic structures over others. For example, a rule-based model might prefer a noun phrase followed by a verb phrase as the primary syntactic structure for a given sentence.

2. Statistical models:
Statistical models use machine learning algorithms to learn from large corpora of text and make predictions about the most likely syntactic structure for a given sentence. These models can be based on various features, such as part-of-speech tags, word embeddings, or contextual information. For example, a statistical model might learn to associate certain word sequences with specific syntactic structures.

3. Hybrid models:
Hybrid models combine both rule-based and statistical approaches to resolve ambiguity. These models can use rules to guide the parsing process and statistical models to make more fine-grained predictions. For example, a hybrid model might use a rule-based approach to identify the main syntactic structures in a sentence, and then use a statistical model to disambiguate specific substructures.

4. Neural network models:
Neural network models use deep learning techniques to learn from large amounts of text and make predictions about the most likely syntactic structure for a given sentence. These models can be based on various neural architectures, such as recurrent neural networks (RNNs) or transformer models. For example, a neural network model might use an attention mechanism to learn which words in a sentence are most relevant for predicting the syntactic structure.

5. Ensemble models:
Ensemble models combine the predictions of multiple parsing models to achieve higher accuracy and robustness. These models can be based on various techniques, such as voting, weighting, or stacking. For example, an ensemble model might combine the predictions of a rule-based model, a statistical model, and a neural network model to improve the overall accuracy of the parsing system.

Overall, there are many models for ambiguity resolution in parsing, each with its own strengths and weaknesses. The choice of model depends on the specific application and the available resources, such as training data and computational power.

1.1 Probabilistic Context-Free Grammars:

Probabilistic Context-Free Grammars (PCFGs) are a popular model for ambiguity resolution in parsing. PCFGs extend context-free grammars (CFGs) by assigning probabilities to each production rule, representing the likelihood of generating a certain symbol given its parent symbol.

PCFGs can be used to compute the probability of a parse tree for a given sentence, which can then be used to select the most likely parse. The probability of a parse tree is computed by multiplying the probabilities of its constituent production rules, from the root symbol down to the leaves. The probability of a sentence is computed by summing the probabilities of all parse trees that generate the sentence.

Here is an example of a PCFG for the sentence "the cat saw the dog":

S -> NP VP [1.0]
NP -> Det N [0.6]
NP -> N [0.4]
VP -> V NP [0.8]
VP -> V [0.2]
Det -> "the" [0.9]
Det -> "a" [0.1]
N -> "cat" [0.5]
N -> "dog" [0.5]
V -> "saw" [1.0]

In this PCFG, each production rule is annotated with a probability. For example, the rule NP -> Det N [0.6] has a probability of 0.6, indicating that a noun phrase can be generated by first generating a determiner, followed by a noun, with a probability of 0.6.
To parse the sentence "the cat saw the dog" using this PCFG, we can use the CKY algorithm to generate all possible parse trees and compute their probabilities. The algorithm starts by filling in the table of all possible subtrees for each span of the sentence, and then combines these subtrees using the production rules of the PCFG. The final cell in the table represents the probability of the best parse tree for the entire sentence.

Using the probabilities from the PCFG, the CKY algorithm generates the following parse tree for the sentence "the cat saw the dog":

         S
       /   \
     NP     VP
    /  \   /  \
  Det   N V    NP
   |    | |   /  \
  the cat saw the dog

The probability of this parse tree is computed as follows:

P(S -> NP VP) * P(NP -> Det N) * P(Det -> "the") * P(N -> "cat") * P(VP -> V NP) * P(V -> "saw") * P(NP -> Det N) * P(Det -> "the") * P(N -> "dog") = 1.0 * 0.6 * 0.9 * 0.5 * 0.8 * 1.0 * 0.6 * 0.9 * 0.5 = 0.05832

Thus, the probability of the best parse tree for the sentence "the cat saw the dog" is 0.05832. This probability can be used to select the most likely parse among all possible parse trees for the sentence.
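The rule-by-rule multiplication above is easy to reproduce in code. The short Python sketch below is an illustration added to these notes, not part of the original example: it stores the toy PCFG as a dictionary of rule probabilities and scores a parse tree by walking it top-down. The nested-tuple tree encoding is an assumption made purely for this sketch.

# Minimal sketch: scoring a parse tree under the toy PCFG above.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("N",)): 0.4,
    ("VP", ("V", "NP")): 0.8,
    ("VP", ("V",)): 0.2,
    ("Det", ("the",)): 0.9,
    ("Det", ("a",)): 0.1,
    ("N", ("cat",)): 0.5,
    ("N", ("dog",)): 0.5,
    ("V", ("saw",)): 1.0,
}

def tree_probability(tree):
    """Multiply the probabilities of all production rules used in the tree."""
    label, children = tree[0], tree[1:]
    if all(isinstance(c, str) for c in children):       # pre-terminal rule, e.g. Det -> "the"
        return RULE_PROB[(label, tuple(children))]
    rhs = tuple(child[0] for child in children)          # non-terminal rule, e.g. S -> NP VP
    prob = RULE_PROB[(label, rhs)]
    for child in children:
        prob *= tree_probability(child)
    return prob

parse = ("S",
         ("NP", ("Det", "the"), ("N", "cat")),
         ("VP", ("V", "saw"),
                ("NP", ("Det", "the"), ("N", "dog"))))
print(tree_probability(parse))   # 1.0*0.6*0.9*0.5*0.8*1.0*0.6*0.9*0.5, i.e. about 0.05832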
dog":
The early parser uses a chart data structure to store all
S
possible parse trees for a sentence. The parser starts with an empty
/ \ chart, and then adds new parse trees to the chart as it progresses
through the sentence. The parser consists of three mainstages:
NP VP
prediction, scanning, and completion.
/ \ / \
In the prediction stage, the parser generates new items in the
Det N V NP chart by applying grammar rules that can generate non-terminal
| | | / \ symbols. For example, if the grammar has a rule S -> NP VP, the
parser would predict the presence of an S symbol in the current
the cat saw the dog span of the sentence by adding a new item to the chart that
The probability of this parse tree is computed as follows: indicates that an S symbol can be generated by an NP symbol
followed by a VP symbol.
P(S -> NP VP) * P(NP -> Det N) * P(Det -> "the") * P(N -> "cat")
* P(VP -> V NP) * P(V ->"saw") * P(NP -> Det N) * P(Det -> In the scanning stage, the parser checks whether a word in
the sentence can be assigned to a non-terminal symbol in the chart.
Page 40 of 76
R22 B.Tech. CSE NLP

For example, if the parser has predicted an NP symbol in the current span of the sentence, and the word "dog" appears in that span, the parser would add a new item to the chart that indicates that the NP symbol can be generated by the word "dog".

In the completion stage, the parser combines items in the chart that have the same end position and can be combined according to the grammar rules. For example, if the parser has added an item to the chart that indicates that an NP symbol can be generated by the word "dog", and another item that indicates that a VP symbol can be generated by the word "saw" and an NP symbol, the parser would add a new item to the chart that indicates that an S symbol can be generated by an NP symbol followed by a VP symbol.

Here is an example of a probabilistic Earley parser applied to the sentence "the cat saw the dog":

Grammar:
S -> NP VP [1.0]
NP -> Det N [0.6]
NP -> N [0.4]
VP -> V NP [0.8]
VP -> V [0.2]
Det -> "the" [0.9]
Det -> "a" [0.1]
N -> "cat" [0.5]
N -> "dog" [0.5]
V -> "saw" [1.0]

Initial chart:
0: [S -> * NP VP [1.0], 0, 0]
0: [NP -> * Det N [0.6], 0, 0]
0: [NP -> * N [0.4], 0, 0]
0: [VP -> * V NP [0.8], 0, 0]
0: [VP -> * V [0.2], 0, 0]
0: [Det -> * "the" [0.9], 0, 0]
0: [Det -> * "a" [0.1], 0, 0]
0: [N -> * "cat" [0.5], 0, 0]
0: [N -> * "dog" [0.5], 0, 0]
0: [V -> * "saw" [1.0], 0, 0]

Predicting S:
0: [S -> * NP VP [1.0], 0, 0]
1: [NP -> * Det N [0.6], 0, 0]
1: [NP -> * N [0.4], 0, 0]
1: [VP -> * V NP [0.8], 0, 0]
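The worked chart above stops after the first prediction step. To show how prediction, scanning, and completion fit together end-to-end, here is a compact Earley recognizer in Python, an illustration added to these notes: rule probabilities are omitted for brevity, and the item format (lhs, rhs, dot position, origin) mirrors the bracketed chart entries shown above.

# Minimal Earley recognizer sketch for the toy grammar above (no probabilities).
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N":   [["cat"], ["dog"]],
    "V":   [["saw"]],
}
NONTERMINALS = set(GRAMMAR)

def earley_recognize(words, start="S"):
    # chart[i] holds items (lhs, rhs, dot, origin) whose span ends at position i
    chart = [set() for _ in range(len(words) + 1)]
    for rhs in GRAMMAR[start]:
        chart[0].add((start, tuple(rhs), 0, 0))
    for i in range(len(words) + 1):
        added = True
        while added:                          # repeat until no new items appear
            added = False
            for (lhs, rhs, dot, origin) in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                    # Prediction: expand the non-terminal to the right of the dot
                    for prod in GRAMMAR[rhs[dot]]:
                        item = (rhs[dot], tuple(prod), 0, i)
                        if item not in chart[i]:
                            chart[i].add(item); added = True
                elif dot == len(rhs):
                    # Completion: advance items that were waiting for this finished symbol
                    for (lhs2, rhs2, dot2, origin2) in list(chart[origin]):
                        if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                            item = (lhs2, rhs2, dot2 + 1, origin2)
                            if item not in chart[i]:
                                chart[i].add(item); added = True
        if i < len(words):
            # Scanning: consume the next word if some item expects it
            for (lhs, rhs, dot, origin) in chart[i]:
                if dot < len(rhs) and rhs[dot] == words[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
    return any(item == (start, tuple(rhs), len(rhs), 0)
               for item in chart[-1] for rhs in GRAMMAR[start])

print(earley_recognize("the cat saw the dog".split()))   # True: a full S item spans the sentence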
1.3 Discriminative Models for Parsing:

Discriminative models for parsing are a family of models that predict a sentence's parse tree by learning to discriminate between different possible trees. One such model is the Maximum Entropy Markov Model.

The Maximum Entropy Markov Model (MEMM) is a discriminative model that models the conditional probability of a parse tree given a sentence. The model is trained on a corpus of labeled sentences and their corresponding parse trees. During training, the model learns a set of feature functions that map the current state of the parser (i.e., the current span of the sentence and the partial parse tree constructed so far) to a set of binary features that are indicative of a particular parse tree. The model then learns the weight of each feature function using maximum likelihood estimation.

During testing, the MEMM uses the learned feature functions and weights to score each possible parse tree for the input sentence. The model then selects the parse tree with the highest score as the final parse tree for the sentence.

Here is an example of a MEMM applied to the sentence "the cat saw the dog":

Features:
F1: current word is "the"
F2: current word is "cat"
F3: current word is "saw"
F4: current word is "dog"
F5: current span is "the cat"
F6: current span is "cat saw"
F7: current span is "saw the"
F8: current span is "the dog"
F9: partial parse tree is "S -> NP VP"

Weights:
F1: 1.2
F2: 0.5
F3: 0.9
F4: 1.1
F5: 0.8
F6: 0.6
F7: 0.7
F8: 0.9
F9: 1.5

Possible parse trees and their scores:

S -> NP VP
- NP -> Det N
- - Det -> "the"
- - N -> "cat"
- VP -> V NP
- - V -> "saw"
- - NP -> Det N
- - - Det -> "the"
- - - N -> "dog"
Score: 5.7

S -> NP VP
- NP -> N
- - N -> "cat"
- VP -> V NP
- - V -> "saw"
- - NP -> Det N
- - - Det -> "the"
- - - N -> "dog"
Score: 4.9

S -> NP VP
- NP -> Det N
- - Det -> "the"
- - N -> "cat"
- VP -> V
- - V -> "saw"
- NP -> Det N
- - Det -> "the"
- - N -> "dog"
Score: 3.5

Selected parse tree:
S -> NP VP
- NP -> Det N
- - Det -> "the" constructed, how they are written, and how they are used in
context.
- - N -> "cat"
For example, in English, words are typically separated by
- VP -> V NP
spaces, making it relatively easy to tokenize a sentence into
- - V -> "saw" individual words. However, in some languages, such as Chinese
- - NP -> Det N or Japanese, there are no spaces between words, and the text must
be segmented into individual units of meaning based on other
- - - Det -> "the" cues, such as syntax or context.
- - - N -> "dog" Furthermore, even within a single language, there can be
Score: 5.7 variation in how words are spelled or written. For example, in
English, words can be spelled with or without hyphens or
In this example, the MEMM generates a score for each apostrophes, and there can be differences in spelling between
possible parse tree and selects the parse tree with the highest score American English and British English.
as the final parse tree for the sentence. The selected parse tree
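The scoring step itself is just a weighted sum of the binary features that fire for a candidate tree. The sketch below is an illustration added here: the sets of firing features per candidate are assumptions and are not meant to reproduce the exact scores 5.7, 4.9 and 3.5 from the example.

# MEMM-style scoring sketch: each candidate parse is reduced to the set of
# feature IDs that fire for it; the weights are the toy values listed above.
WEIGHTS = {"F1": 1.2, "F2": 0.5, "F3": 0.9, "F4": 1.1, "F5": 0.8,
           "F6": 0.6, "F7": 0.7, "F8": 0.9, "F9": 1.5}

# Assumed firing sets for three hypothetical candidate trees.
CANDIDATES = {
    "tree_A": {"F1", "F2", "F3", "F4", "F8", "F9"},
    "tree_B": {"F2", "F3", "F4", "F8", "F9"},
    "tree_C": {"F1", "F2", "F3", "F9"},
}

def score(active_features):
    """Sum the weights of the features that fire for a candidate tree."""
    return sum(WEIGHTS[f] for f in active_features)

for name, feats in CANDIDATES.items():
    print(name, round(score(feats), 2))
best = max(CANDIDATES, key=lambda name: score(CANDIDATES[name]))
print("selected:", best)     # the highest-scoring candidate is chosen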
2. Multilingual Issues:

In natural language processing (NLP), a token is a sequence of characters that represents a single unit of meaning. In other words, it is a word or a piece of a word that has a specific meaning within a language. The process of splitting a text into individual tokens is called tokenization.

However, the definition of what constitutes a token can vary depending on the language being analyzed. This is because different languages have different rules for how words are constructed, how they are written, and how they are used in context.

For example, in English, words are typically separated by spaces, making it relatively easy to tokenize a sentence into individual words. However, in some languages, such as Chinese or Japanese, there are no spaces between words, and the text must be segmented into individual units of meaning based on other cues, such as syntax or context.

Furthermore, even within a single language, there can be variation in how words are spelled or written. For example, in English, words can be spelled with or without hyphens or apostrophes, and there can be differences in spelling between American English and British English.

Multilingual issues in tokenization arise because different languages can have different character sets, which means that the same sequence of characters can represent different words in different languages. Additionally, some languages have complex morphology, which means that a single word can have many different forms that represent different grammatical features or meanings.

To address these issues, NLP researchers have developed multilingual tokenization techniques that take into account the specific linguistic features of different languages. These techniques can include using language-specific dictionaries, models, or rules to identify the boundaries between words or units of meaning in different languages.
2.1 Tokenization, Case, and Encoding:

Tokenization, case, and encoding are all important aspects of natural language processing (NLP) that are used to preprocess text data before it can be analyzed by machine learning algorithms. Here are some examples of each:

1. Tokenization:
Tokenization is the process of splitting a text into individual tokens or words. In English, this is typically done by splitting the text on whitespace and punctuation marks. For example, the sentence "The quick brown fox jumps over the lazy dog." would be tokenized into the following list of words: ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."].

2. Case:
Case refers to the use of upper and lower case letters in text. In NLP, it is often important to standardize the case of words to avoid treating the same word as different simply because it appears in different case. For example, the words "apple" and "Apple" should be treated as the same word.

3. Encoding:
Encoding refers to the process of representing text data in a way that can be processed by machine learning algorithms. One common encoding method used in NLP is Unicode, which is a character encoding standard that can represent a wide range of characters from different languages.

Here is an example of how tokenization, case, and encoding might be applied to a sentence of text:

Text: "The quick brown fox jumps over the lazy dog."

Tokenization: ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."]

Case: ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."]

Encoding: [0x74, 0x68, 0x65, 0x20, 0x71, 0x75, 0x69, 0x63, 0x6b, 0x20, 0x62, 0x72, 0x6f, 0x77, 0x6e, 0x20, 0x66, 0x6f, 0x78, 0x20, 0x6a, 0x75, 0x6d, 0x70, 0x73, 0x20, 0x6f, 0x76, 0x65, 0x72, 0x20, 0x74, 0x68, 0x65, 0x20, 0x6c, 0x61, 0x7a, 0x79, 0x20, 0x64, 0x6f, 0x67, 0x2e]

Note that the encoding is represented in hexadecimal to show the underlying bytes that represent the text.
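The same three preprocessing steps can be reproduced with a few lines of Python. This is a sketch added for illustration; the simple regular-expression tokenizer is an assumption, not a standard library tokenizer.

import re

text = "The quick brown fox jumps over the lazy dog."

# Tokenization: words plus standalone punctuation marks.
tokens = re.findall(r"\w+|[^\w\s]", text)
# -> ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', '.']

# Case: lowercase each token so "The" and "the" are treated as the same word.
lowered = [t.lower() for t in tokens]

# Encoding: the UTF-8 bytes of the lowercased sentence, printed in hexadecimal.
encoded = [hex(b) for b in text.lower().encode("utf-8")]
print(encoded[:5])   # ['0x74', '0x68', '0x65', '0x20', '0x71']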
2.2 Word Segmentation:

Word segmentation is one of the most basic tasks in Natural Language Processing (NLP), and it involves identifying the boundaries between words in a sentence. However, in some languages, such as Chinese and Japanese, there is no clear spacing or punctuation between words, which makes word segmentation more challenging.
In Chinese, for example, a sentence like "我喜欢中文" (which means "I like Chinese") could be segmented in different ways, such as "我 / 喜欢 / 中文" or "我喜欢 / 中文". Similarly, in Japanese, a sentence like "私は日本語が好きです" (which also means "I like Japanese") could be segmented in different ways, such as "私は / 日本語が / 好きです" or "私は日本語 / が好きです".

Here are some examples of the challenges of word segmentation in different languages:

• Chinese: In addition to the lack of spacing between words, Chinese also has a large number of homophones, which are words that sound the same but have different meanings. For example, the words "他" (he) and "她" (she) are both pronounced "tā" in Mandarin, but they are written with different characters.

• Japanese: Japanese also has a large number of homophones, but it also has different writing systems, including kanji (Chinese characters), hiragana, and katakana. Kanji can often have multiple readings, which makes word segmentation more complex.

• Thai: Thai has no spaces between words, and it also has no capitalization or punctuation. In addition, Thai has a unique script with many consonants that can be combined with different vowel signs to form words.

• Vietnamese: Vietnamese uses the Latin alphabet, but it also has many diacritics (accent marks) that can change the meaning of a word. In addition, Vietnamese words can be formed by combining smaller words, which makes word segmentation more complex.

To address these challenges, NLP researchers have developed various techniques for word segmentation, including rule-based approaches, statistical models, and neural networks (a simple dictionary-based example is sketched below). However, word segmentation is still an active area of research, especially for low-resource languages where large amounts of annotated data are not available.
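As an illustration of the rule-based family mentioned above, here is a minimal dictionary-based maximum-matching ("MaxMatch") segmenter in Python. It is a sketch added to these notes; the tiny dictionary is an assumption made purely for illustration.

# Greedy longest-match segmentation against a word dictionary.
DICTIONARY = {"我", "喜欢", "中文", "我喜欢"}

def max_match(text, dictionary, max_len=4):
    """Take the longest dictionary word starting at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:        # longest match wins
                tokens.append(text[i:j])
                i = j
                break
        else:                                   # no match: emit a single character
            tokens.append(text[i])
            i += 1
    return tokens

print(max_match("我喜欢中文", DICTIONARY))   # ['我喜欢', '中文'], one of the two segmentations above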
2.3 Morphology:

Morphology is the study of the structure of words and how they are formed from smaller units called morphemes. Morphological analysis is important in many natural language processing tasks, such as machine translation and speech recognition, because it helps to identify the underlying structure of words and to disambiguate their meanings.

Here are some examples of the challenges of morphology in different languages:

• Turkish: Turkish has a rich morphology, with a complex system of affixes that can be added to words to convey different meanings. For example, the word "kitap" (book) can be modified with different suffixes to indicate things like possession, plurality, or tense.
• Arabic: Arabic also has a rich morphology, with a complex system of prefixes, suffixes, and infixes that can be added to words to convey different meanings. For example, the root "k-t-b" (meaning "write") can be modified with different affixes to form words like "kitab" (book) and "kataba" (he wrote).

• Finnish: Finnish has a complex morphology, with a large number of cases, suffixes, and vowel harmony rules that can affect the form of a word. For example, the word "käsi" (hand) can be modified with different suffixes to indicate things like possession, location, or movement.

• Swahili: Swahili has a complex morphology, with a large number of prefixes and suffixes that can be added to words to convey different meanings. For example, the word "kutaka" (to want) can be modified with different prefixes and suffixes to indicate things like tense, negation, or subject agreement.

To address these challenges, NLP researchers have developed various techniques for morphological analysis, including rule-based approaches, statistical models, and neural networks. However, morphological analysis is still an active area of research, especially for low-resource languages where large amounts of annotated data are not available.

Semantic Parsing I:
1. Introduction
2. Semantic Interpretation
3. System Paradigms
4. Word Sense

1. Introduction to Semantic Parsing:

What is semantic parsing?
The process of understanding the meaning and interpretation of words, signs and sentence structure is called semantic parsing.
• Using semantic parsing, computers can understand natural language the way humans do.
• It is the toughest phase and is not fully solved.
• Semantic --------> the study of meaning.
• Parsing ----------> identifying and relating pieces of information.

2. Semantic Interpretation:

• Semantic parsing is considered a part of a larger process, semantic interpretation.
• Semantic interpretation is a kind of representation of text that can be fed into a computer to allow further computational manipulations and search.
• Here we discuss some of the main components of this process.
• We begin the discussion with Syntactic Structures by Chomsky, which introduced the concept of a transformational phrase structure grammar.
• Later, Katz and Fodor wrote a paper, "The Structure of a Semantic Theory", where they proposed a few properties a semantic theory should possess.

A semantic theory should be able to:
• Explain sentences having ambiguous meaning. (Example: the sentence "the bill is large" can refer to money or to the beak of a bird.)
• Resolve the ambiguities of words in context. (Example: the sentence "the bill is large but need not be paid" can be disambiguated.)
• Identify meaningless but syntactically well-formed sentences. (Example: "colorless green ideas sleep furiously".)
• Identify syntactically or transformationally unrelated paraphrases of a concept having the same semantic content.

Requirements for achieving a semantic representation:
• Structural Ambiguity
• Word Sense
• Entity and Event Resolution
• Predicate-Argument Structure
• Meaning Representation

2.1 Structural Ambiguity:

Structural ambiguity arises when a sentence has more than one meaning due to the way words are arranged in that sentence.

For example, the sentence "Sara caught the butterfly by the tree" is structurally ambiguous because it has two meanings:
1. Sara caught the butterfly while she was standing by the tree.
2. Sara caught the butterfly which was fluttering near the tree.

2.2 Word Sense:

The same word type, or word lemma, is used in different morphological variants to represent different entities or concepts in the world.

For example, take the word nail (human anatomy & metallic object):
1. He nailed the loose arm of the chair.
2. He got a box of metallic nails.
3. These nails are growing too fast.
4. He went for a manicure to trim his nails.

2.3 Entity and Event Resolution:

Entity: The process of identifying entities, i.e. people, organizations, locations and more.
Example: Steve Jobs was the co-founder of Apple, which is headquartered in Cupertino.

Event: The actions described with the associated entities.
Example: Elon Musk announced that Tesla will build a new factory in Texas.

• Entity and event resolution also involves identifying and linking references to the same entity across the text.

2.4 Predicate-Argument Structure:

After identifying the word senses, entities and events, we have to identify the participants of the entities in these events. Generally, this process can be defined as the identification of who did what to whom, when, where, why and how.

2.5 Meaning Representation:

The final process of semantic interpretation is to build a semantic representation, or meaning representation, that can be manipulated by algorithms. This process is sometimes called deep representation.

For example,
1. If our player 2 has the ball, then position our player 5 in the midfield.
((bowler (player 2)) (do (player 5) (pos (midfield))))
2. Which river is the longest?
answer(x1, longest(x1, river(x1)))
3. System Paradigms:

• Researchers from the linguistics community have examined meaning representations at different levels of granularity (the level of detail) and generality (how broad or general the information is).
• In many of the potential experimental conditions, no hand-annotated data is available.
• Therefore, it is important to get a perspective on the various primary dimensions on which the problem of semantic interpretation has been tackled.

The historic approaches which are more prevalent and successful generally fall into three categories.

3.1 System Architectures:

• Knowledge based: These systems use a predefined set of rules or a knowledge base to obtain a solution to a new problem.

• Unsupervised: These systems tend to require minimal or no human intervention to be functional, using existing resources that can be bootstrapped for a particular application or problem domain.

• Supervised: These systems require some manual annotation. Typically, researchers create feature functions. A model is trained to use these features to predict labels, and then it is applied to unseen data.

• Semi-supervised: Manual annotation is very expensive and does not yield enough data. In such instances, researchers can automatically expand the dataset on which their models are trained, either by employing machine-generated output directly or by bootstrapping off of an existing model and having humans correct its output. In many cases, a model from one domain is quickly adapted to a new domain.

3.2 Scope:

a) Domain Dependent: These systems are specific to certain domains.
b) Domain Independent: These systems are general enough that the techniques can be applicable to multiple domains.

3.3 Coverage:

• Shallow: These systems tend to produce an intermediate representation that can then be converted to one that a machine can base its actions on.
• Deep: These systems usually create a terminal representation that is directly consumed by a machine application.

4. Word Sense:

Researchers have explored various system architectures to address the sense disambiguation problem. We classify these systems into four main categories:
1. Rule based or knowledge based
2. Supervised
3. Unsupervised
4. Semi-supervised

4.1 Rule based:

Rule-based systems for word sense disambiguation are among the earliest methods developed to tackle the problem of determining the correct meaning of a word based on its context.
• These systems rely heavily on dictionaries, thesauri, and handcrafted rules.

Algorithms and Techniques:

i) Lesk Algorithm:
• One of the oldest and simplest dictionary-based algorithms.
• The algorithm assigns the sense of a word that has the most overlap, in terms of words, with the words in its context (a small sketch is given at the end of this subsection).
Example: if the word "bank" appears in a context with words like "money" and "deposit", the financial sense of "bank" is chosen.

ii) Enhanced Lesk algorithm:
• Banerjee and Pedersen extended the Lesk algorithm to include synonyms, hypernyms (more general terms), hyponyms (more specific terms) and meronyms (part-whole relationships).
• This increases the accuracy of overlap measurement and improves disambiguation performance.

iii) Structural Semantic Interconnections (SSI):
• Proposed by Navigli and Velardi.
• Constructs semantic graphs using resources like WordNet, domain labels, and annotated corpora.
• Uses an iterative algorithm to match semantic graphs of context words with the target word until the best matching sense is identified.

Working of rule-based systems:
1. Context collection
2. Dictionary / Thesaurus matching
3. Weight computation
4. Sense selection

Advantages:
1. Simple and intuitive approach.
2. Can be very effective when precise dictionary definitions or thesaurus categories are available.

Limitations:
1. Heavily reliant on the availability and quality of lexical resources.
2. Handcrafted rules can be labor-intensive and may not cover all possible contexts.
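To make the simplified Lesk idea concrete, here is a small Python sketch added for illustration; the two toy sense definitions for "bank" are assumptions, not WordNet glosses. It picks the sense whose definition shares the most words with the context.

# Simplified Lesk: choose the sense whose gloss overlaps most with the context.
SENSES = {
    "bank_financial": "an institution where people deposit and withdraw money",
    "bank_river": "the sloping land alongside a river or stream",
}

def simplified_lesk(context_sentence, senses):
    context = set(context_sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(context & set(gloss.lower().split()))   # number of shared words
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("he went to the bank to deposit his money", SENSES))
# -> bank_financial (shares "deposit" and "money" with the context)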

4.2 Supervised systems:

• Supervised systems for word sense disambiguation use machine learning to train classifiers on manually annotated datasets.
• These systems typically perform better than unsupervised methods when tested on annotated data, but require significant manual effort for annotation and a predetermined sense inventory.
• They use various machine learning models, like Support Vector Machines (SVMs) and Maximum Entropy (MaxEnt) classifiers.

Features used in supervised systems:
a) Lexical Context: This feature comprises the words and lemmas of words occurring in the entire paragraph.
b) Parts of Speech: POS tags for words in the context window.
c) Local Collocations: Sequences of nearby words that provide semantic context. Example: for the word "nail" in "He bought a box of nails," collocation features might include "box_of" and "of_nails."
d) Topic Feature: The broad topic or domain of the text can indicate the likely sense of a word.
e) Additional Rich Features: The voice of the sentence (active, passive or semi-passive), and the presence of a subject/object, i.e. whether the word has a subject or object.

Advantages:
1. Typically achieves high accuracy due to the rich feature sets and annotated data.

Limitations:
1. Requires a large amount of manually annotated data, which is time consuming and expensive.
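As a minimal sketch of the supervised setup (added for illustration, assuming scikit-learn is installed): the handful of labelled sentences for "bank" is toy data, and only bag-of-words context features are used rather than the full feature set listed above.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy annotated data: context sentences for "bank" with their sense labels.
train_sentences = [
    "he deposited cash at the bank yesterday",
    "the bank approved the loan application",
    "they had a picnic on the bank of the river",
    "the fisherman stood on the muddy bank of the stream",
]
train_labels = ["financial", "financial", "river", "river"]

# Bag-of-words context features; a real system would add POS tags,
# collocations, and topic features as described above.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_sentences)

classifier = LogisticRegression()
classifier.fit(X_train, train_labels)

test = ["he asked the bank for a loan"]
print(classifier.predict(vectorizer.transform(test)))   # typically ['financial'] on this toy data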
4.3 Unsupervised systems:

• Unsupervised systems for word sense disambiguation tackle the problem without relying heavily on manually annotated training data.
• These systems are essential due to the scarcity of labelled data for every sense of each word in a given language.
• The key strategies include clustering, distance metrics and leveraging cross-linguistic evidence.

Key approaches:
i) Group similar instances of a word into clusters, where each cluster represents a different sense of the word (see the sketch at the end of this subsection).
ii) Use a measure of semantic distance to determine the sense of a word by finding how close it is to known senses in a semantic network like WordNet.
iii) Start with a few examples of each sense (seeds) and grow these examples into larger clusters.

Advantages:
• No need for extensive manual annotation, making it scalable and adaptable to various languages.
• Can discover new senses not present in predefined sense inventories.

Limitations:
• May require sophisticated algorithms to achieve high accuracy.
• Performance can be lower compared to supervised systems when tested on well-annotated data.
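The clustering idea from key approach (i) can be sketched in a few lines (an illustration added here, assuming scikit-learn; the toy contexts and the hope that the two clusters line up with the two senses are assumptions, not guaranteed behaviour).

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy contexts for the ambiguous word "bank"; no sense labels are given.
contexts = [
    "deposit money at the bank branch",
    "the bank approved my loan and mortgage",
    "interest rates at the bank increased",
    "the grassy bank of the river flooded",
    "we fished from the bank of the stream",
    "mud covered the river bank after the storm",
]

vectors = TfidfVectorizer().fit_transform(contexts)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
for sentence, cluster in zip(contexts, clusters):
    print(cluster, sentence)    # ideally the two clusters correspond to the two senses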
4.4 Semi-supervised systems:

• Semi-supervised systems for word sense disambiguation combine limited labelled data with a larger pool of unlabelled data to iteratively improve classification performance.
• These methods aim to leverage the strengths of both supervised and unsupervised approaches.

Key Principles:
1. One sense per collocation: Words that occur in specific syntactic relationships, or near certain types of words, often share the same sense.
2. One sense per discourse: Within a given discourse, instances of the same word tend to share the same sense.

Yarowsky Algorithm:
• Introduced by Yarowsky, this algorithm is foundational for semi-supervised WSD.
• It uses initial seed examples to iteratively classify and expand the training set. Its main steps are:
i) Initial seed selection
ii) Training
iii) Iteration
iv) Termination
v) Application
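A minimal sketch of the bootstrapping loop behind these steps is shown below (an illustration added to the notes, reusing the toy "bank" data idea and assuming scikit-learn; the 0.6 confidence threshold and the number of iterations are arbitrary, and with such tiny data the classifier may or may not clear the threshold - the point is the structure of the loop).

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Seed (labelled) examples and a pool of unlabelled contexts for "bank".
seeds = [("he deposited money at the bank", "financial"),
         ("the grassy bank of the river flooded", "river")]
unlabelled = ["the bank charged an overdraft fee",
              "they walked along the bank of the stream",
              "the bank raised its interest rates"]

THRESHOLD = 0.6            # arbitrary confidence cut-off for this sketch

for _ in range(3):         # a few bootstrapping iterations
    texts = [t for t, _lbl in seeds]
    labels = [lbl for _t, lbl in seeds]
    vec = CountVectorizer().fit(texts)
    clf = LogisticRegression().fit(vec.transform(texts), labels)

    still_unlabelled = []
    for sentence in unlabelled:
        probs = clf.predict_proba(vec.transform([sentence]))[0]
        best = probs.argmax()
        if probs[best] >= THRESHOLD:           # confident: add to the training set
            seeds.append((sentence, clf.classes_[best]))
        else:                                  # keep it for a later iteration
            still_unlabelled.append(sentence)
    unlabelled = still_unlabelled

print(seeds)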
Extensions and variations:

1. SALAAM Algorithm:
Groups words that translate into the same word in another language, identifies senses using WordNet proximity, and propagates sense tags across the parallel text.
2. Unsupervised to supervised combination:
Use unsupervised methods to create labelled data, then train supervised models on this data. This hybrid approach aims to combine the scalability of unsupervised methods with the precision of supervised learning.

Advantages:
• Reduces the need for extensive manual annotation.
• Utilizes both labelled and unlabelled data, making it more scalable than purely supervised approaches.

Limitations:
• Performance depends on the quality and representativeness of the initial seed examples.
• Potential noise from incorrect automatic labelling, though this is mitigated by constraints like one sense per discourse.
Performance:
• Studies have shown semi-supervised methods to perform well, often achieving accuracy in the mid-80% range when tested on standard datasets.

Software:
• Several software programs are available for word sense disambiguation.
• IMS (It Makes Sense): This is a complete word sense disambiguation system.
• WordNet Similarity-2.05: These WordNet similarity modules for Perl provide a quick way of computing various word similarity measures.
• WikiRelate: This is a word similarity measure based on categories in Wikipedia.

Semantic Parsing II: Predicate-Argument Structure, Meaning Representation Systems

1. Predicate-Argument Structure:

• Predicate-argument structure, also called semantic role labelling, is a method used to identify the roles of different parts of a sentence.
• The "predicate" is usually a verb (but can also be a noun, adjective, or preposition) and the "arguments" are the entities that participate in the action or state described by the predicate.

Example:
Consider the sentence: "The cat chased the mouse."
Predicate: chased
Arguments:
The cat (agent)
the mouse (patient)

The PAS for this sentence would be: chased(cat, mouse)
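Represented as data, the same predicate-argument structure might look like this (a sketch added for illustration; the role names follow the agent/patient labels above):

# Predicate-argument structure for "The cat chased the mouse" as a dictionary.
pas = {
    "predicate": "chased",
    "arguments": {
        "agent": "the cat",      # who did the chasing
        "patient": "the mouse",  # who was chased
    },
}
print(pas["predicate"] + "(" + ", ".join(pas["arguments"].values()) + ")")
# -> chased(the cat, the mouse)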

1.1 Resources:
 These resources help computers understand the