AIM-502: UNIT-3 SYNTACTIC ANALYSIS
Syntactic analysis includes identifying parts of speech such as nouns, verbs, and adjectives; determining the subject and predicate of a sentence; and identifying the relationships between words and phrases.
Grammar plays an essential role in describing the syntactic structure of well-formed programs, just as it describes the syntactic rules used for communication in natural languages.
In the theory of formal languages, grammar also applies to computer science, mainly to programming languages and data structures. For example, in the C programming language, precise grammar rules state how functions are built from lists of declarations and statements.
Mathematically, a grammar G can be written as a 4-tuple (N, T, S, P),
where:
o N or VN = set of non-terminal symbols, or variables.
o T or ∑ = set of terminal symbols.
o S = start symbol, where S ∈ N.
o P = set of production rules for terminals as well as non-terminals. Each rule has the form α → β, where α and β are strings over VN ∪ ∑ and at least one symbol of α belongs to VN.
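For instance, the four components can be written down directly as data. The sketch below is in Python and uses a made-up toy grammar for simple noun phrases; all the names are illustrative, not part of any standard library:

# G = (N, T, S, P) for a hypothetical toy noun-phrase grammar.
N = {"NP", "Det", "Adj", "Noun"}             # non-terminal symbols (VN)
T = {"the", "a", "black", "cat", "mat"}      # terminal symbols (∑)
S = "NP"                                     # start symbol, S ∈ N
P = {                                        # production rules α → β
    "NP": [["Det", "Noun"], ["Det", "Adj", "Noun"]],
    "Det": [["the"], ["a"]],
    "Adj": [["black"]],
    "Noun": [["cat"], ["mat"]],
}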
Example: Convert the following grammar G into Greibach Normal Form (GNF):
S → XB | AA
A → SA | a
B → b
X → a
Solution:
Since the given grammar G is already in CNF and contains no left recursion, we can skip steps 1 and 2 and go directly to step 3 (substituting until every production begins with a terminal).
The production rule A → SA is not in GNF, so we substitute S → XB | AA into A → SA:
S → XB | AA
A → a | XBA | AAA
B → b
X → a
The production rules S → XB and A → XBA are not in GNF, so we substitute X → a into them:
S → aB | AA
A → a | aBA | AAA
B → b
X → a
Now we remove the left recursion A → AAA by introducing a new non-terminal C:
S → aB | AA
A → aC | aBAC
C → AAC | ε
B → b
X → a
Now we remove the null production C → ε, adding the alternatives obtained by erasing C:
S → aB | AA
A → aC | aBAC | a | aBA
C → AAC | AA
B → b
X → a
The production rules S → AA and C → AA are not in GNF, so we substitute A → aC | aBAC | a | aBA for the leading A in each:
S → aB | aCA | aBACA | aA | aBAA
A → aC | aBAC | a | aBA
C → AAC
C → aCA | aBACA | aA | aBAA
B → b
X → a
The production rule C → AAC is not in GNF, so we substitute A → aC | aBAC | a | aBA for the leading A:
S → aB | aCA | aBACA | aA | aBAA
A → aC | aBAC | a | aBA
C → aCAC | aBACAC | aAC | aBAAC
C → aCA | aBACA | aA | aBAA
B → b
X → a
Hence, this is the GNF form of the grammar G. (Note that X no longer appears on the right-hand side of any production, so the rule X → a has become unreachable and may be dropped.)
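As a quick sanity check, a brute-force enumeration can confirm that the original grammar and the GNF result generate the same strings up to a bounded length. The sketch below is in Python; the dictionary encoding and the function name language are our own conventions, and uppercase letters are treated as non-terminals:

from collections import deque

def language(grammar, start, max_len):
    # Enumerate all terminal strings of length <= max_len via leftmost
    # derivations. Every non-terminal here derives at least one terminal,
    # so sentential forms longer than max_len can be pruned safely.
    seen, results = set(), set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        i = next((k for k, ch in enumerate(form) if ch.isupper()), None)
        if i is None:                          # no non-terminals left: a word
            results.add(form)
            continue
        for rhs in grammar[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

original = {"S": ["XB", "AA"], "A": ["SA", "a"], "B": ["b"], "X": ["a"]}
gnf = {"S": ["aB", "aCA", "aBACA", "aA", "aBAA"],
       "A": ["aC", "aBAC", "a", "aBA"],
       "C": ["aCAC", "aBACAC", "aAC", "aBAAC",
             "aCA", "aBACA", "aA", "aBAA"],
       "B": ["b"], "X": ["a"]}
print(language(original, "S", 8) == language(gnf, "S", 8))  # expected: True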
One of the primary benefits of shallow parsing is its efficiency. Full parsing involves
analyzing the entire grammatical structure of a sentence, which can be computationally
intensive and time-consuming. Shallow parsing, on the other hand, involves identifying and
extracting only the most important phrases or constituents, making it faster and more
efficient than full parsing. This makes shallow parsing particularly useful for applications
that require processing large volumes of text, such as web crawling, document indexing,
and machine translation.
Shallow parsing involves several key steps. The first step is sentence segmentation,
where a sentence is divided into individual words or tokens. The next step is part-of-
speech tagging, where each token is assigned a grammatical category, such as noun,
verb, or adjective. Once the tokens have been tagged, the next step is to identify and
extract the relevant phrases or constituents from the sentence. This is typically done using
pattern matching or machine learning algorithms that have been trained to recognize
specific types of phrases or constituents.
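The first two steps can be sketched with the NLTK library; the example assumes the punkt and averaged_perceptron_tagger resources have been downloaded, and the printed tags are indicative:

import nltk
# First run only:
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

sentence = "The black cat sat on the mat"
tokens = nltk.word_tokenize(sentence)   # step 1: segmentation into tokens
tagged = nltk.pos_tag(tokens)           # step 2: part-of-speech tagging
print(tagged)
# e.g. [('The', 'DT'), ('black', 'JJ'), ('cat', 'NN'), ('sat', 'VBD'),
#       ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]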
One of the most common types of shallow parsing is noun phrase chunking, which
involves identifying and extracting all the noun phrases in a sentence. Noun phrases
typically consist of a noun and any associated adjectives, determiners, or modifiers. For
example, in the sentence “The black cat sat on the mat,” the noun phrase “the black cat”
can be identified and extracted using noun phrase chunking.
Another common type of shallow parsing is verb phrase chunking, which involves
identifying and extracting all the verb phrases in a sentence. Verb phrases typically
consist of a verb and any associated adverbs, particles, or complements. For example, in
the sentence “She sings beautifully,” the verb phrase “sings beautifully” can be identified
and extracted using verb phrase chunking.
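Both chunk types can be extracted with NLTK's rule-based RegexpParser. The tag patterns below are deliberately simplified illustrations, not a complete chunk grammar:

import nltk

grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>+}   # optional determiner, adjectives, noun(s)
  VP: {<VB.*><RB>?}         # verb plus an optional adverb
"""
chunker = nltk.RegexpParser(grammar)

for sent in ["The black cat sat on the mat", "She sings beautifully"]:
    tagged = nltk.pos_tag(nltk.word_tokenize(sent))
    print(chunker.parse(tagged))        # a Tree containing NP and VP chunks

On the first sentence this yields the NP chunks "The black cat" and "the mat" and the VP chunk "sat"; on the second it yields the VP chunk "sings beautifully".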
Shallow parsing, also known as chunking, is a natural language processing task that
involves dividing a sentence into meaningful phrases, such as noun phrases or verb
phrases. Here are some common algorithms used for shallow parsing in NLP:
1. Rule-based Chunking: This algorithm uses a set of predefined rules to identify and
extract phrases from a sentence. These rules are based on the part-of-speech tags
and syntactic structure of the sentence. For example, a rule-based chunker might
identify a noun phrase as any sequence of consecutive nouns, adjectives, and
determiners.
2. Hidden Markov Models (HMMs): HMMs are statistical models that can be used for
sequence labeling tasks, such as part-of-speech tagging and chunking. In an HMM-
based chunker, the goal is to find the most likely sequence of chunk labels given a
sentence. This is done by finding the chunk-label sequence with the highest
probability, which the Viterbi algorithm computes efficiently (a minimal sketch
follows this list).
3. Conditional Random Fields (CRFs): CRFs are another type of statistical model that
can be used for sequence labeling tasks. In a CRF-based chunker, the goal is to find
the most likely sequence of chunk labels given the entire sentence. Unlike an HMM,
a CRF directly models the conditional probability of the whole label sequence and
can therefore use rich, overlapping features of the input.
4. Support Vector Machines (SVMs): SVMs are a type of machine learning algorithm
that can be used for classification tasks, including chunking. In an SVM-based
chunker, the goal is to learn a model that can classify each word in a sentence as
belonging to a particular chunk or not. The model is trained on a labeled dataset,
where each word is annotated with its corresponding chunk label.
5. Maximum Entropy Markov Models (MEMMs): MEMMs combine the sequential
structure of HMMs with maximum entropy (logistic regression) classifiers. In a
MEMM-based chunker, the goal is again to find the most likely sequence of chunk
labels, but each labeling decision is made by a maximum entropy classifier
conditioned on the current word and the previous label, which allows richer
features of the input than an HMM.
This list of algorithms is not exhaustive, and there are other approaches to shallow parsing
as well. The choice of algorithm depends on the specific task and the available resources.
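As a concrete illustration of the HMM approach from item 2, here is a minimal Viterbi decoder over BIO noun-phrase tags. All probabilities are made-up toy values for the example sentence; a real chunker would estimate them from a labeled corpus such as CoNLL-2000:

import math

STATES = ["B-NP", "I-NP", "O"]
# Hand-picked toy parameters -- purely illustrative, not trained.
START = {"B-NP": 0.4, "I-NP": 0.1, "O": 0.5}
TRANS = {"B-NP": {"B-NP": 0.1, "I-NP": 0.6, "O": 0.3},
         "I-NP": {"B-NP": 0.1, "I-NP": 0.5, "O": 0.4},
         "O":    {"B-NP": 0.5, "I-NP": 0.1, "O": 0.4}}
# P(observed POS tag | chunk state); unseen tags get a small floor value.
EMIT = {"B-NP": {"DT": 0.5, "JJ": 0.2, "NN": 0.3},
        "I-NP": {"JJ": 0.4, "NN": 0.6},
        "O":    {"VBD": 0.4, "IN": 0.4, "NN": 0.1, "DT": 0.1}}

def viterbi(obs):
    # V[t][s] = log-probability of the best path ending in state s at time t.
    V = [{s: math.log(START[s]) + math.log(EMIT[s].get(obs[0], 1e-6))
          for s in STATES}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: V[-1][p] + math.log(TRANS[p][s]))
            col[s] = (V[-1][prev] + math.log(TRANS[prev][s])
                      + math.log(EMIT[s].get(o, 1e-6)))
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    path = [max(STATES, key=lambda s: V[-1][s])]
    for ptr in reversed(back):                 # follow the back-pointers
        path.append(ptr[path[-1]])
    return list(reversed(path))

pos = ["DT", "JJ", "NN", "VBD", "IN", "DT", "NN"]  # "The black cat sat on the mat"
print(list(zip(pos, viterbi(pos))))
# -> [('DT', 'B-NP'), ('JJ', 'I-NP'), ('NN', 'I-NP'), ('VBD', 'O'),
#     ('IN', 'O'), ('DT', 'B-NP'), ('NN', 'I-NP')]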
Complexity: Another major problem with PCFGs is that they can become very complex,
which makes them difficult to understand and work with. This complexity makes it hard to
design a PCFG that accurately represents the structure of a given language, and
challenging to implement and use one in a practical application.
Data Availability: PCFGs also typically require a large amount of annotated training
data to produce accurate results. This is a problem when working with languages that
have little annotated text, or when parsing sentences that contain novel or unusual
constructions.
Consider the following CNF grammar G:
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
The given word is w = baaba.
Step 1: Constructing the triangular table
Let X[i, j] denote the set of variables that derive the substring wi … wj. Filling the table upward from the substrings of length 1:
Length 1: X[1,1] = {B}, X[2,2] = {A, C}, X[3,3] = {A, C}, X[4,4] = {B}, X[5,5] = {A, C}
Length 2: X[1,2] = {S, A}, X[2,3] = {B}, X[3,4] = {S, C}, X[4,5] = {S, A}
Length 3: X[1,3] = ∅, X[2,4] = {B}, X[3,5] = {B}
Length 4: X[1,4] = ∅, X[2,5] = {S, A, C}
Length 5: X[1,5] = {S, A, C}
Since the start symbol S belongs to X[1,5], the word w = baaba is in L(G).
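The same table can be built mechanically. Below is a minimal CYK membership check in Python (a sketch; the rules dictionary and the indexing convention table[i][j] = variables deriving the substring of length j + 1 starting at index i are our own):

from itertools import product

rules = {"S": ["AB", "BC"], "A": ["BA", "a"],
         "B": ["CC", "b"], "C": ["AB", "a"]}

def cyk(word):
    n = len(word)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):              # substrings of length 1
        table[i][0] = {v for v, bodies in rules.items() if ch in bodies}
    for length in range(2, n + 1):             # longer substrings, bottom-up
        for i in range(n - length + 1):
            for split in range(1, length):     # left part takes `split` symbols
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for v, bodies in rules.items():
                    if any(x + y in bodies for x, y in product(left, right)):
                        table[i][length - 1].add(v)
    return table

table = cyk("baaba")
print("baaba in L(G):", "S" in table[0][4])    # prints: baaba in L(G): True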