
AIM-502: UNIT-3 SYNTACTIC ANALYSIS

3.1 Define Context-Free Grammar


A context-free grammar consists of a set of rules expressing how the symbols of a
language can be grouped and ordered together, plus a lexicon of words and symbols.
 One example is a rule expressing that an NP (noun phrase) can be composed of
either a ProperNoun or a determiner (Det) followed by a Nominal; a Nominal in turn
can consist of one or more Nouns:
NP → Det Nominal | ProperNoun; Nominal → Noun | Nominal Noun
 Context-free rules can also be hierarchically embedded, so we can combine the
previous rules with others, like the following, that express facts about the lexicon:
Det → a | the; Noun → flight
 Context-free grammar is a formalism powerful enough to represent complex relations
and can be efficiently implemented. Context-free grammar is integrated into many
language applications.
 A context-free grammar consists of a set of rules or productions, each expressing the
ways the symbols of the language can be grouped, together with a lexicon of words.
Context-free grammar (CFG) can also be seen as the list of rules that define the set of
all well-formed sentences in a language. Each rule has a left-hand side that identifies a
syntactic category and a right-hand side that defines its alternative parts reading from
left to right.
Example: The rule s --> np vp means that "a sentence is defined as a noun phrase
followed by a verb phrase."
Formalism in rules for context-free grammar: A sentence in the language defined by
a CFG is a series of words that can be derived by systematically applying the rules,
beginning with a rule that has s on its left-hand side.
o Use of parse tree in context-free grammar: A convenient way to describe a
parse is to show its parse tree, simply a graphical display of the parse.
o A parse of the sentence is a series of rule applications in which a syntactic
category is replaced by the right-hand side of a rule that has that category
on its left-hand side, and the final rule application yields the sentence itself.
 Example: A parse of the sentence "the giraffe dreams" is:
s => np vp => det n vp => the n vp => the giraffe vp => the giraffe iv => the giraffe dreams
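As a quick illustration, here is a minimal sketch of this toy grammar in Python using NLTK (assuming NLTK is installed; the grammar and sentence are the examples above, not a full grammar of English):

import nltk

# Toy grammar from the example above: a sentence is an NP followed by a VP.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> IV
    Det -> 'the'
    N -> 'giraffe'
    IV -> 'dreams'
""")

# A chart parser derives every parse tree the grammar licenses.
parser = nltk.ChartParser(grammar)
for tree in parser.parse("the giraffe dreams".split()):
    tree.pretty_print()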

3.2 Define Grammar rules for English


Grammar in NLP is a set of rules for constructing sentences in a language; it is used to
understand and analyze the structure of sentences in text data.


This includes identifying parts of speech such as nouns, verbs, and adjectives,
determining the subject and predicate of a sentence, and identifying the relationships
between words and phrases.
Grammar also plays an essential role in describing the syntactic structure of well-formed
programs, just as it denotes the syntactic rules used for conversation in natural languages.
In the theory of formal languages, grammar is also applicable in Computer Science,
mainly in programming languages and data structures. Example - In the C programming
language, the precise grammar rules state how functions are made with the help of lists
and statements.
Mathematically, a grammar G can be written as a 4-tuple (N, T, S, P)
where:
o N or VN = the set of non-terminal symbols or variables.

o T or ∑ = the set of terminal symbols.

o S = the start symbol, where S ∈ N.

o P = the set of production rules for terminals as well as non-terminals. Each rule has
the form α → β, where α and β are strings over VN ∪ ∑ and at least one symbol of α
belongs to VN.
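For instance, a tiny illustrative grammar (my own toy example, not from the original text) for sentences like "the dog runs" would be:
N = {S, NP, VP}, T = {the, dog, runs}, start symbol S, and
P = { S → NP VP, NP → the dog, VP → runs }.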

3.3 Classify Treebanks


Semantic and Syntactic Treebanks are the two most common types of Treebanks in
linguistics. Let us now learn more about these types −
Semantic Treebanks
These treebanks use a formal representation of a sentence's semantic structure. They
vary in the depth of their semantic representation. The Robot Commands Treebank,
GeoQuery, the Groningen Meaning Bank, and the RoboCup Corpus are some examples of
semantic treebanks.
Syntactic Treebanks
In contrast to semantic treebanks, syntactic treebanks annotate the syntactic structure of
sentences, typically as parse trees over corpus data.
Various syntactic treebanks in different languages have been created so far. For
example, the Penn Arabic Treebank and the Columbia Arabic Treebank are syntactic
treebanks created for the Arabic language; the Sinica Treebank was created for Chinese;
and Lucy, SUSANNE, and the BLLIP WSJ corpus were created for English.

3.4 Explain Normal Forms for grammar


A normal form F for a set C of data objects is a form, i.e., a set of syntactically valid
objects, with the following two properties:
 For every element c of C, except possibly a finite set of special cases, there exists
some element f of F such that f is equivalent to c with respect to some set of tasks.
 F is simpler than the original form in which the elements of C are written. By "simpler"
we mean that at least some tasks are easier to perform on elements of F than they
would be on elements of C.
Types of Normal forms
1) Chomsky Normal Form (CNF)
2) Greibach Normal Form (GNF)


Chomsky normal form (CNF) is a way to simplify context-free grammars.

Every grammar in CNF is context-free, and every CFG can be converted into CNF.
A grammar is in CNF if all of its productions follow one of these forms:
 The start symbol generates the null string (S → ε).
 A non-terminal generates two non-terminals (A → BC).
 A non-terminal generates a single terminal (A → a).
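As a small illustration (my own sketch, not from the original text), the following function checks whether a grammar's productions have CNF form, representing each production as a left-hand side plus a tuple of right-hand-side symbols (uppercase names for variables, lowercase for terminals, "eps" for ε):

# Minimal CNF-form check over a list of (lhs, rhs) productions.
def is_cnf(productions, start="S"):
    for lhs, rhs in productions:
        if rhs == ("eps",):                    # epsilon is allowed only for the start symbol
            if lhs != start:
                return False
        elif len(rhs) == 1:                    # A -> a: a single terminal
            if not rhs[0].islower():
                return False
        elif len(rhs) == 2:                    # A -> B C: exactly two non-terminals
            if not all(sym.isupper() for sym in rhs):
                return False
        else:                                  # any longer right-hand side is not CNF
            return False
    return True

# A few productions from the CNF result of the worked example below:
g = [("S0", ("D", "A")), ("S0", ("C", "B")), ("S0", ("a",)),
     ("S", ("A", "S")), ("A", ("b",)), ("B", ("b",)),
     ("C", ("a",)), ("D", ("A", "S"))]
print(is_cnf(g, start="S0"))  # True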
Steps to convert grammar to CNF
The following steps are used to convert context-free grammar into Chomsky's normal
form.
1. Introduce a new start variable that produces the old start variable.
2. Remove the null productions from non-starting variables.
3. Remove the unit productions.
4. Convert all rules such that the right side has one terminal and two variables. New
variables can be introduced to convert the rules.
Example
Let's consider the following CFG:
S→ASA∣aB
A→B∣S
B→b∣ϵ
Step 1: Add a new start state
S0→S
S→ASA∣aB
A→B∣S
B→b∣ϵ
Step 2: Remove null productions
Remove null productions iteratively: wherever a variable that can produce null appears in
another production, add a variant of that production with the variable omitted.
Step 2.1: Removing B→ϵ
S0→S
S→ASA∣aB∣a
A→B∣S∣ϵ
B→b
Step 2.2: Removing A→ϵ
S0→S
S→ASA∣aB∣a∣AS∣SA
A→B∣S
B→b
Step 3: Remove unit productions
Step 3.1: Replace the productions produced by S in S0 and A
S0→ASA∣aB∣a∣AS∣SA
S→ASA∣aB∣a∣AS∣SA
A→B∣ASA∣aB∣a∣AS∣SA
B→b
Step 3.2: Replace the productions produced by B in A
S0→ASA∣aB∣a∣AS∣SA
S→ASA∣aB∣a∣AS∣SA
A→b∣ASA∣aB∣a∣AS∣SA
B→b


Step 4: Convert each production so that its right side is a single terminal or two variables


Step 4.1: Introduce C→a and replace a with C wherever a occurs alongside a variable
S0→ASA∣CB∣a∣AS∣SA
S→ASA∣CB∣a∣AS∣SA
A→b∣ASA∣CB∣a∣AS∣SA
B→b
C→a
Step 4.2: Introduce D→AS and replace AS with D wherever AS occurs alongside another variable
S0→DA∣CB∣a∣AS∣SA
S→DA∣CB∣a∣AS∣SA
A→b∣DA∣CB∣a∣AS∣SA
B→b
C→a
D→AS
Result
The following grammar is in CNF.
S0→DA∣CB∣a∣AS∣SA
S→DA∣CB∣a∣AS∣SA
A→b∣DA∣CB∣a∣AS∣SA
B→b
C→a
D→AS
Greibach Normal Form (GNF)
A CFG(context free grammar) is in GNF(Greibach normal form) if all the production
rules satisfy one of the following conditions:
 A start symbol generating ε. For example, S → ε.
 A non-terminal generating a terminal. For example, A → a.
 A non-terminal generating a terminal which is followed by any number of non-
terminals. For example, S → aASB.
For example:
G1 = {S → aAB | aB, A → aA| a, B → bB | b}
G2 = {S → aAB | aB, A → aA | ε, B → bB | ε}
The production rules of grammar G1 satisfy the rules specified for GNF, so grammar G1
is in GNF. However, the production rules of grammar G2 do not satisfy the rules specified
for GNF, as A → ε and B → ε contain ε (only the start symbol may generate ε). So
grammar G2 is not in GNF.
Steps for converting CFG into GNF
Step 1: Convert the grammar into CNF.
If the given grammar is not in CNF, convert it into CNF (see the Chomsky normal form
steps above).
Step 2: If the grammar contains left recursion, eliminate it.
If the context-free grammar contains left recursion, eliminate it using the standard
left-recursion elimination procedure.
Step 3: In the grammar, convert the given production rule into GNF form.
If any production rule in the grammar is not in GNF form, convert it.
Example:
S → XB | AA
A → a | SA
B→b
X→a


Solution:
As the given grammar G is already in CNF and there is no left recursion, we can skip
steps 1 and 2 and go directly to step 3.
The production rule A → SA is not in GNF, so we substitute S → XB | AA in the
production rule A → SA as:
S → XB | AA
A → a | XBA | AAA
B→b
X→a
The production rules S → XB and A → XBA are not in GNF, so we substitute X → a in
the production rules S → XB and A → XBA as:
S → aB | AA
A → a | aBA | AAA
B→b
X→a
Now we remove the left recursion (A → AAA) by introducing a new variable C, and we get:
S → aB | AA
A → aC | aBAC
C → AAC | ε
B→b
X→a
Now we remove the null production C → ε, and we get:
S → aB | AA
A → aC | aBAC | a | aBA
C → AAC | AA
B→b
X→a
The production rule S → AA is not in GNF, so we substitute A → aC | aBAC | a | aBA
in the production rule S → AA as:
S → aB | aCA | aBACA | aA | aBAA
A → aC | aBAC | a | aBA
C → AAC
C → aCA | aBACA | aA | aBAA
B→b
X→a
The production rule C → AAC is not in GNF, so we substitute A → aC | aBAC | a | aBA
in the production rule C → AAC as:
S → aB | aCA | aBACA | aA | aBAA
A → aC | aBAC | a | aBA
C → aCAC | aBACAC | aAC | aBAAC
C → aCA | aBACA | aA | aBAA
B→b
X→a
Hence, this is the GNF form for the grammar G.

3.5 State the importance of Dependency Grammar


In natural language processing, dependency parsing is a technique used to identify
semantic relations between words in a sentence. Dependency parsers map the words in a
sentence to semantic roles, thereby identifying the syntactic relations between words.
Dependency parsing is a well-known approach for syntactic analysis of natural language
texts at the surface-structure level. In this method, the syntactic structure of a sentence is
recovered from a linear sequence of word tokens by analyzing the syntactic dependencies
between words and identifying the syntactic category of each word.
Applications of dependency parsing:
One of the major uses of dependency parsing is in semantic role labeling (SRL) and
information extraction, which are components of natural language processing.
Dependency parsing is also used for syntactic chunking. It is fundamentally different from
phrase-structure (constituency) parsing, which maps the words in a sentence to a
corresponding phrase marker or tree structure.
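As a brief illustration, here is a sketch of dependency parsing with the spaCy library (assuming spaCy and its small English pipeline en_core_web_sm are installed):

import spacy

# Load a pretrained English pipeline that includes a dependency parser.
nlp = spacy.load("en_core_web_sm")
doc = nlp("The man saw the girl with the telescope")

# Each token points to its syntactic head with a typed dependency relation.
for token in doc:
    print(f"{token.text:10} --{token.dep_:8}--> {token.head.text}")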
3.6 Describe the process of Syntactic Parsing
The third stage of NLP is syntax analysis, also known as parsing. The goal of this phase
is to check whether the text is well formed and to recover its grammatical structure by
comparing it against formal grammar rules. A phrase like "hot ice cream", for example,
would be rejected by the analyzer.
In this sense, syntactic analysis or parsing can be defined as the process of analyzing
natural language strings of symbols in accordance with formal grammar rules.
 Concept of Parser: A parser is the software component used to carry out parsing. It
accepts input data (text) and returns a structural representation of the input after
checking for correct syntax using a formal grammar. It also builds a data structure,
typically a parse tree, an abstract syntax tree, or another hierarchical structure.
Top-down parsing and bottom-up parsing are the two broad approaches (see the
sketch after this list).
 Concept of Derivation: There are production rules for derivation. During parsing,
we must determine the non-terminal to be replaced as well as the production rule
that will be used to replace the non-terminal. To determine which non-terminal
should be replaced with a production rule, two different types of derivations can be
used: left-most and right-most.
 Concept of Parse Tree: It is a graphical representation of a derivation. The parse
tree's root is the derivation's start symbol. The leaf nodes and interior nodes of
each parse tree are terminal and non-terminal, respectively. An attribute of a parse
tree is that it will return the original input string upon in-order traversal.
 Concept of Grammar: Grammar is critical for describing the syntactic structure of
well-formed programs. They denote syntactical rules for conversation in natural
languages in the literary sense. Linguistics has attempted to define grammar since
the beginning of natural languages such as English, Hindi, and others.
 Phrase Structure or Constituency Grammar: The constituency relation is the
foundation of phrase structure grammar. As a result, it is also known as
constituency grammar. It is the polar opposite of dependency grammar.
 Dependency Grammar: Its foundation is a dependency relationship and it is the
opposite of constituency grammar. Dependency grammar (DG) differs from
constituency grammar in that it lacks phrasal nodes.


 Context-Free Grammar: Context-free grammar, also called CFG, is a notation for
describing languages and a superset of regular grammar.
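Referring back to the parser concept above, here is a small sketch (my own toy grammar, assuming NLTK is installed) contrasting NLTK's top-down recursive-descent parser with its bottom-up shift-reduce parser:

import nltk

grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'the'
    N -> 'man' | 'telescope'
    V -> 'saw'
""")

sentence = "the man saw the telescope".split()

# Top-down: expand from S and try to match the input words.
for tree in nltk.RecursiveDescentParser(grammar).parse(sentence):
    print(tree)

# Bottom-up: shift words and reduce them to constituents.
for tree in nltk.ShiftReduceParser(grammar).parse(sentence):
    print(tree)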

3.7 Explain the problem of Ambiguity


Ambiguity, as the term is used in natural language processing, refers to the capability of
being understood in more than one way. Natural language is very ambiguous.
NLP has the following types of ambiguities −
Lexical Ambiguity
The ambiguity of a single word is called lexical ambiguity. For example, treating the
word silver as a noun, an adjective, or a verb.
Syntactic Ambiguity
This kind of ambiguity occurs when a sentence is parsed in different ways. For example,
the sentence “The man saw the girl with the telescope”. It is ambiguous whether the man
saw the girl carrying a telescope or he saw her through his telescope.
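As a sketch (my own toy grammar, assuming NLTK is installed), a chart parser returns both readings of this sentence:

import nltk

# "with the telescope" can attach to the verb phrase or to the noun phrase.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    PP -> P NP
    NP -> Det N | Det N PP
    VP -> V NP | V NP PP
    Det -> 'the'
    N -> 'man' | 'girl' | 'telescope'
    V -> 'saw'
    P -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "the man saw the girl with the telescope".split()
for tree in parser.parse(sentence):
    print(tree)   # prints two distinct parse trees, one per reading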
Semantic Ambiguity
This kind of ambiguity occurs when the meaning of the words themselves can be
misinterpreted. In other words, semantic ambiguity happens when a sentence contains an
ambiguous word or phrase. For example, the sentence “The car hit the pole while it was
moving” is having semantic ambiguity because the interpretations can be “The car, while
moving, hit the pole” and “The car hit the pole while the pole was moving”.
Anaphoric Ambiguity
This kind of ambiguity arises due to the use of anaphora entities in discourse. For
example, the horse ran up the hill. It was very steep. It soon got tired. Here, the anaphoric
reference of “it” in two situations cause ambiguity.
Pragmatic ambiguity
Such kind of ambiguity refers to the situation where the context of a phrase gives it
multiple interpretations. In simple words, we can say that pragmatic ambiguity arises when
the statement is not specific. For example, the sentence “I like you too” can have multiple
interpretations, such as I like you (just as you like me) or I like you (just as someone else does).

3.8 Explain Dynamic Programming parsing


• Dynamic programming parsing is a technique for parsing natural language
sentences.
• It is typically a bottom-up method (the CYK algorithm of Section 3.9 is a classic
example), in contrast to top-down parsing, which starts with the entire sentence and
works its way down to the individual words.
• Dynamic programming parsing works by building up a table of possible parses for the
sentence.
• Each entry in the table represents a possible parse for the sentence up to that point.
• The table is built up in a bottom-up fashion, starting with the individual words and
working its way up to the entire sentence.
• Once the table is built, the parser can use it to find the most likely parse for the
sentence.
• The most likely parse is the one that has the highest probability of being correct.
• Dynamic programming parsing is a very powerful technique for parsing natural
language sentences.
• It is able to handle a wide variety of sentence structures, including those that are
ambiguous or grammatically incorrect.


Here is an example of how dynamic programming parsing works:

(Figure: parse table built for the sentence "Rahul is eating an apple".)

3.8.1 Shallow parsing


Shallow parsing, also known as chunking, is a type of natural language processing
(NLP) technique that aims to identify and extract meaningful phrases or chunks from a
sentence. Unlike full parsing, which involves analyzing the grammatical structure of a
sentence, shallow parsing focuses on identifying individual phrases or constituents, such
as noun phrases, verb phrases, and prepositional phrases. Shallow parsing is an
essential component of many NLP tasks, including information extraction, text
classification, and sentiment analysis.


One of the primary benefits of shallow parsing is its efficiency. Full parsing involves
analyzing the entire grammatical structure of a sentence, which can be computationally
intensive and time-consuming. Shallow parsing, on the other hand, involves identifying and
extracting only the most important phrases or constituents, making it faster and more
efficient than full parsing. This makes shallow parsing particularly useful for applications
that require processing large volumes of text, such as web crawling, document indexing,
and machine translation.

Shallow parsing involves several key steps. The first step is sentence segmentation,
where a sentence is divided into individual words or tokens. The next step is part-of-
speech tagging, where each token is assigned a grammatical category, such as noun,
verb, or adjective. Once the tokens have been tagged, the next step is to identify and
extract the relevant phrases or constituents from the sentence. This is typically done using
pattern matching or machine learning algorithms that have been trained to recognize
specific types of phrases or constituents.
One of the most common types of shallow parsing is noun phrase chunking, which
involves identifying and extracting all the noun phrases in a sentence. Noun phrases
typically consist of a noun and any associated adjectives, determiners, or modifiers. For
example, in the sentence “The black cat sat on the mat,” the noun phrase “the black cat”
can be identified and extracted using noun phrase chunking.
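As a minimal sketch (assuming NLTK and its tokenizer and tagger models are installed), rule-based noun phrase chunking with NLTK's RegexpParser looks like this:

import nltk

sentence = "The black cat sat on the mat"
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)   # [('The', 'DT'), ('black', 'JJ'), ('cat', 'NN'), ...]

# NP chunk: an optional determiner, any number of adjectives, then a noun.
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tree = chunker.parse(tagged)
print(tree)   # "The black cat" and "the mat" come out as NP chunks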
Another common type of shallow parsing is verb phrase chunking, which involves
identifying and extracting all the verb phrases in a sentence. Verb phrases typically
consist of a verb and any associated adverbs, particles, or complements. For example, in
the sentence “She sings beautifully,” the verb phrase “sings beautifully” can be identified
and extracted using verb phrase chunking.

As defined above, shallow parsing divides a sentence into meaningful phrases, such as
noun phrases or verb phrases. Here are some common algorithms used for shallow
parsing in NLP:
1. Rule-based Chunking: This algorithm uses a set of predefined rules to identify and
extract phrases from a sentence. These rules are based on the part-of-speech tags
and syntactic structure of the sentence. For example, a rule-based chunker might
identify a noun phrase as any sequence of consecutive nouns, adjectives, and
determiners.
2. Hidden Markov Models (HMMs): HMMs are statistical models that can be used for
sequence labeling tasks, such as part-of-speech tagging and chunking. In an HMM-
based chunker, the goal is to find the most likely sequence of chunks given a
sentence. This is done by computing the probability of each possible sequence of
chunks and selecting the one with the highest probability.


3. Conditional Random Fields (CRFs): CRFs are another type of statistical model that
can be used for sequence labeling tasks. In a CRF-based chunker, the goal is to find
the most likely sequence of chunks given a sentence and the previous chunk labels.
This is done by computing the conditional probability of each possible sequence of
chunks given the sentence and the previous chunk labels.
4. Support Vector Machines (SVMs): SVMs are a type of machine learning algorithm
that can be used for classification tasks, including chunking. In an SVM-based
chunker, the goal is to learn a model that can classify each word in a sentence as
belonging to a particular chunk or not. The model is trained on a labeled dataset,
where each word is annotated with its corresponding chunk label.
5. Maximum Entropy Markov Models (MEMMs): MEMMs are a type of statistical
model that combines features from both HMMs and CRFs. In a MEMM-based
chunker, the goal is to find the most likely sequence of chunks given a sentence and
the previous chunk labels, similar to a CRF-based chunker. However, the model is
trained using maximum entropy, which allows it to capture more complex
dependencies between the input and output sequences.
These algorithms are not exhaustive, and there are other approaches to shallow parsing
as well. The choice of algorithm depends on the specific task and the available resources.

3.8.2 Probabilistic CFG


Probabilistic Context-Free Grammar (PCFG) is a type of grammar used in natural
language processing. It is a probabilistic version of a standard context-free grammar
(CFG), a set of rules used to generate strings of words in a particular language, where
the model parameters are the probabilities assigned to the grammar rules.
 Probabilities of all productions rewriting a given non-terminal must add to 1,
defining a distribution for each non-terminal.
 String generation is probabilistic: production probabilities are used to non-
deterministically select a production for rewriting a given non-terminal.
 Each rule is associated with a probability which indicates the likelihood that the rule
will be used to generate a given sentence. This allows the PCFG to model the
uncertainty inherent in natural language and make more accurate predictions about
the structure of sentences.
o Independence assumption is one of the key properties of PCFGs where the
probability of a node sequence depends only on the immediate mother node, not
any node above that or outside the current constituent.
o Different rules may be used depending on the context allowing the PCFG to
model the uncertainty and variability inherent in natural language.
PCFG is a good way to solve ambiguity problems in the syntactic structure field.
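A minimal PCFG sketch using NLTK (assuming NLTK is installed; the rule probabilities are invented for illustration) shows how each non-terminal's alternatives sum to 1 and how a Viterbi parser returns the most probable parse:

import nltk

# The probabilities of each non-terminal's alternatives sum to 1.0.
grammar = nltk.PCFG.fromstring("""
    S -> NP VP        [1.0]
    NP -> Det N       [0.6]
    NP -> Det N PP    [0.4]
    VP -> V NP        [0.7]
    VP -> V NP PP     [0.3]
    PP -> P NP        [1.0]
    Det -> 'the'      [1.0]
    N -> 'man' [0.4] | 'girl' [0.3] | 'telescope' [0.3]
    V -> 'saw'        [1.0]
    P -> 'with'       [1.0]
""")

# The Viterbi parser returns the single most probable parse tree.
parser = nltk.ViterbiParser(grammar)
sentence = "the man saw the girl with the telescope".split()
for tree in parser.parse(sentence):
    print(tree, tree.prob())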
Problems with PCFG
 PCFGs suffer from various problems, among which the two most crucial weaknesses
are a lack of sensitivity to lexical information and a lack of sensitivity to structural
preferences.
o Due to these two problems, PCFGs cannot always capture the full range of syntactic
variation and ambiguity that exists in natural languages leading to errors and incorrect
parses, particularly when working with sentences that are structurally complex or
contain multiple possible interpretations.
o Lexicalized PCFGs were developed to address these issues and perform better than
plain PCFGs.


 Complexity: One other major problem with PCFGs is that they can be very complex,
making them difficult to understand and work with. The complexity makes it difficult to
design a PCFG that accurately represents the structure of a given language and
challenging to implement and use a PCFG in a practical application.
 Data Availability: PCFGs often require a large
amount of training data to produce accurate results. This can be a problem when
working with languages with limited amounts of annotated text or when trying to parse
sentences containing novel or unusual constructions.

3.9 Explain Probabilistic CYK algorithm


The CYK algorithm is a parsing algorithm for context-free grammars. It is used to check
whether a particular string can be derived from the language generated by a given
grammar. It is also called the membership algorithm, as it tells whether the given string
is a member of the language of the given grammar or not. It was developed
independently by John Cocke, Daniel Younger, and Tadao Kasami, hence the name CYK.
In the CYK algorithm, the grammar must be in Chomsky normal form.
In addition, the CYK algorithm uses dynamic programming, i.e., a table-filling algorithm.
The probabilistic version of CYK fills the same table for a PCFG, storing in each cell the
probability of the best sub-parse for each non-terminal.
The grammar will be in CNF if each rule has one of the following forms:
 A→BC (at most two variables on the right-hand side)
 A→ a (a single terminal on the right-hand side)
 S → ε (the null string; allowed for the start symbol only)
If the given Grammar is not in the CNF form, convert it to the CNF form before applying
the CYK Algorithm.
The algorithm
In the CYK algorithm,
 Construct a triangular table whose width is the length of the given string.

(Figure: table for a string 'w' of length 3.)


 Each row corresponds to the length of the substrings of the given word ‘w’:
o The bottom row will have strings of length 1.
o The second row from the bottom will have strings of length 2 and so on.
 X(i,i) is the set of variables A such that A → wi is a production of grammar G,
where wi is the i-th symbol of the given string ‘w’.
 Compute longer cells by comparing at most n pairs of previously computed sets: for a
substring of length 2 there is one split to consider, for length 3 there are two, and so
on. The set X(i,j) is computed from the pairs
(X(i,i), X(i+1,j)), (X(i,i+1), X(i+2,j)), …, (X(i,j−1), X(j,j)),
that is, X(i,j) = { A | A → BC is a production, B ∈ X(i,k), C ∈ X(k+1,j), i ≤ k < j }.
Rule: If the top cell of the table contains the start variable of the grammar, then the given
string can be derived from the grammar.
Example


Let’s look at an example to get a better understanding of the algorithm.
Consider the following CNF grammar G:
 S → AB ∣ BC
 A → BA ∣ a
 B → CC ∣ b
 C → AB ∣ a
The given word w=baaba.
Step 1: Constructing the triangular table

(Figure: triangular table for the word 'baaba'.)


Step 2: Populating the table
 The first row is computed simply by looking at the grammar to see which
production is producing the string of length 1 for the given word ‘w’.
 The second row is computed by comparing the pairs of sets computed previously.
The substrings of ‘w’ with length 2 are: ‘ba’, ‘aa’, ‘ab’, and ‘ba’.
 For the third row, there are three possible substrings of length three in the given
word, namely ′w′: ′baa′, ′aab′, and ′aba′.
 For the substring ‘baa’, there are two possible splits:
[‘b’,‘aa’] and [‘ba’,‘a’].
Finding the sets that correspond to these splits:
[‘b’,‘aa’] = {B} × {B}
[‘ba’,‘a’] = {S,A} × {A,C}
We take the union of the resulting concatenations:
{B}{B} ∪ {S,A}{A,C} = {BB} ∪ {SA, SC, AA, AC} → {BB, SA, SC, AA, AC}
Then we look in the grammar for rules whose right-hand side appears in this set; here
there are none, so the cell for ‘baa’ stays empty.
 We repeat this for all longer substrings, up to the full word of length 5.
The illustration below shows how the table is populated.


(Figure: the final triangular table for 'baaba'.)


Step 3: Check the table
Finally, we check whether the given word is in the language of the grammar: the start
symbol S appears in the top cell of the table, so the word ‘baaba’ can be derived from G.
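As a compact sketch, here is my own Python implementation of the standard CYK recognizer for this grammar (not from the original text):

from itertools import product

# Grammar in CNF: maps each right-hand side (as a tuple) to the set of left-hand sides.
RULES = {
    ("A", "B"): {"S", "C"}, ("B", "C"): {"S"},
    ("B", "A"): {"A"}, ("C", "C"): {"B"},
    ("a",): {"A", "C"}, ("b",): {"B"},
}

def cyk(word, start="S"):
    n = len(word)
    # X[i][j] holds the variables deriving word[i..j] (inclusive, 0-indexed).
    X = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):                  # length-1 substrings
        X[i][i] = set(RULES.get((ch,), set()))
    for length in range(2, n + 1):                 # longer substrings
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):                  # every split point
                for B, C in product(X[i][k], X[k + 1][j]):
                    X[i][j] |= RULES.get((B, C), set())
    return start in X[0][n - 1]

print(cyk("baaba"))   # True: 'baaba' is in the language of G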

3.10 Describe Probabilistic Lexicalized CFGs


 Probabilistic Lexicalized Context-Free Grammar (PLCFG) is also a type of
grammar used in natural language processing to generate and analyze sentences
in a given language.
 It is a combination of a lexicalized context-free grammar, which uses lexical items
(words) as the basic units for generating sentences, and probabilistic models, which
assign probabilities to the different rules and structures in the grammar.
 Probabilistic lexicalized context-free grammars address the somewhat separable
weaknesses that stem from the independence assumptions of PCFGs, of which the
most often remarked upon is their lack of lexicalization.


o In a PCFG, the probability of a VP expanding as a verb followed by two noun phrases
is independent of the choice of verb involved. This is a weakness, as this expansion is
much more likely with ditransitive verbs like hand or tell than with other verbs.
Combining lexicalization with probability models allows the probabilistic lexicalized CFG
model to take into account the likelihood of different sentences and structures, making it
useful for tasks such as language modeling and machine translation.

3.11 Describe the Unification of feature structures.


A feature structure is an attempt to understand the structure of grammar and make a
workable system for representing how different elements of grammar interact.
Each feature structure is represented using an attribute-value matrix (AVM). This is a set
of terms and symbols contained within brackets. It almost looks like the grammatical
equivalent of a mathematical formula.
AVMs are divided into two columns. The left-hand column represents features such as
category and agreement. The right-hand column represents the sub-features of each
feature, such as gender and number. Each sub-feature in the right-hand column is
assigned a value. For gender, the value would be masculine, feminine, or neuter; for
number, it would be singular or plural.
An alternative method for representing a feature structure is a directed acyclic graph
(DAG). The graph begins with a dot, also called a node. Each feature diverges off from
the node along a curved arrow. Each feature can then split into sub-features. The sub-
features end in another node that contains the sub-feature’s value.
An example of this would be a node leading to the agreement feature that leads to the
person sub-feature and a value of 3rd. This represents a sentence told in the third
person. In languages where gender is an important feature, the feature arrow could split
into gender as well as person and would result in two values at the end.
For example, this could mean third person and female.
Unification in a feature structure means that two features can split into sub-features,
which then merge. Mergers within unified feature strands have the same value and are
best represented using a DAG rather than an AVM. As well as unifying feature
structures, attempts at unification can also prove certain structures are incompatible.
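A small sketch of unification using NLTK's feature structures (assuming NLTK is installed; the feature values are illustrative) shows both a successful merge and a failure on incompatible values:

from nltk.featstruct import FeatStruct

# Two partial agreement descriptions of the same word.
fs1 = FeatStruct(agreement=FeatStruct(number="singular", person=3))
fs2 = FeatStruct(agreement=FeatStruct(gender="feminine"))

# Unification merges the compatible information from both structures.
print(fs1.unify(fs2))
# [agreement=[gender='feminine', number='singular', person=3]]

# Incompatible values make unification fail and return None.
fs3 = FeatStruct(agreement=FeatStruct(number="plural"))
print(fs1.unify(fs3))   # None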
Feature structures are used in the Text Encoding Initiative (TEI), which creates markup
schemes for linguists. Such schemes can then be used to analyze or interpret encoded
texts.
There are two main problems with using a feature structure.
 Feature structures produce a great many generalizations.
 Feature structures and unification cannot contain all possible values in a language.
A suggested solution to this problem is to include a third column or master branch called
types. Each type would organize features into appropriate sections or classes. By doing
this, the features would be regulated and so would the values each feature can take.
The types would work in a hierarchy system to regulate feature interactions.
