21CSE356T: Natural Language
Processing
Unit 2
Prepared by: Dr. Pritam Khan
Introduction to Parsing in NLP
• Definition: Parsing analyzes sentence structure
based on grammar rules.
• Importance: Essential for NLP applications
like machine translation and speech
recognition.
Context-Free Grammars (CFGs)
• Definition: Formal grammar with production rules to describe
language syntax.
Purpose of context-free grammar:
• To list all strings in a language using a set of rules (production
rules).
• It extends the capabilities of regular expressions and finite automata.
Components:
• - Terminal symbols (Σ): Actual words.
• - Non-terminal symbols (N): Abstract categories.
• - Production rules (P): Define how symbols may be rewritten.
• - Start symbol (S): Root of parse tree.
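These four components can be represented directly in code. A minimal sketch with plain Python types (the rule set and helper function are illustrative, not from any particular library):

```python
# A CFG as a 4-tuple (N, Sigma, P, S), represented with plain Python types.
NONTERMINALS = {"S", "NP", "VP", "Det", "N", "V"}   # N: abstract categories
TERMINALS = {"the", "cat", "dog", "chases"}         # Sigma: actual words
START = "S"                                         # S: root of the parse tree
PRODUCTIONS = {                                     # P: symbol replacements
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"]],
    "N":   [["cat"], ["dog"]],
    "V":   [["chases"]],
}

def is_cfg_well_formed(productions, nonterminals, terminals):
    """Every LHS is a non-terminal and every RHS symbol is known."""
    for lhs, rhss in productions.items():
        if lhs not in nonterminals:
            return False
        for rhs in rhss:
            if any(sym not in nonterminals | terminals for sym in rhs):
                return False
    return True

print(is_cfg_well_formed(PRODUCTIONS, NONTERMINALS, TERMINALS))  # True
```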
Grammar Rules for English
• Phrase Structure Rules:
• - Sentence (S) → NP VP
• - Noun Phrase (NP) → Det N | N
• - Verb Phrase (VP) → V NP | V
• - Prepositional Phrase (PP) → P NP
• Example: 'The cat chases the dog.'
CFG Example
This tree represents the syntactic structure of
the sentence "The cat sat on the mat":
• S (Sentence) is split into NP (Noun Phrase) and VP (Verb Phrase).
• The NP consists of Det (Determiner) "The" and N (Noun) "cat".
• The VP consists of the V (Verb) "sat" and a PP (Prepositional Phrase).
• The PP consists of P (Preposition) "on" and another NP, which includes Det "the" and N "mat".
Top-Down Parsing
• Definition: Starts from the start symbol and rewrites it until it derives the input string.
• Methods:
• - Recursive Descent Parsing: Uses recursive
function calls.
• - Backtracking: Tries different rules.
• Pros: Easy to implement.
• Cons: Inefficient due to backtracking.
Top-Down Parsing
• In this kind of parsing, the parser starts constructing the parse tree from the start symbol and
then tries to transform the start symbol to the input.
• The most common form of top-down parsing uses recursive procedure to process the input.
• Top-down parsing starts its search from the root node and works downwards towards the leaf
node.
• The root is expanded using the grammar rules, with ’S’ as the start (non-terminal) symbol.
• Each non-terminal symbol in the resulting sub-tree is then expanded using the appropriate
grammar rules.
S → NP VP | VP
NP → Det Noun | Det Noun PP | Pronoun | Noun
VP → Verb | Verb NP
PP → Prep NP
Det → This | That | the
Noun → Student | …
Verb → plays | paint | …
Preposition → from | with | on | to
Pronoun → She | He | they
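A grammar like the one above can drive a simple recursive-descent (top-down) parser. A minimal sketch with backtracking, trimmed to the rules needed for "paint the door" (all names are illustrative):

```python
# Recursive-descent parsing with backtracking over a toy grammar.
GRAMMAR = {
    "S":  [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Noun"], ["Pronoun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
    "Det":     [["the"], ["this"], ["that"]],
    "Noun":    [["door"], ["student"]],
    "Verb":    [["paint"], ["plays"]],
    "Pronoun": [["she"], ["he"], ["they"]],
}

def parse(symbol, words, pos):
    """Try to derive words[pos:] from `symbol`; yield every end position."""
    if symbol not in GRAMMAR:              # terminal: must match the next word
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for rhs in GRAMMAR[symbol]:            # try each production in turn
        positions = [pos]
        for sym in rhs:                    # thread positions through the RHS
            positions = [end for p in positions for end in parse(sym, words, p)]
        yield from positions

def accepts(sentence):
    words = sentence.lower().split()
    return len(words) in parse("S", words, 0)

print(accepts("paint the door"))   # True
print(accepts("door the paint"))   # False
```

Note that this sketch inherits top-down parsing's weakness: a left-recursive rule such as NP → NP PP would make `parse` recurse forever.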
Example 1: Paint the door
Level 1: start with S.
Level 2: expand S using each rule: S → NP VP and S → VP.
Level 3: expand the children in every way: NP → Det Noun, NP → Det Noun PP, NP → Pronoun; VP → Verb, VP → Verb NP.
Level 4: the expansion that matches the input is S → VP, VP → Verb NP, with Verb = "paint", Det = "the", Noun = "door":
[S [VP [Verb paint] [NP [Det the] [Noun door]]]]
Example 2: Deepika reads the book
"Deepika reads the book" parses as:
[S [NP [Noun Deepika]] [VP [Verb reads] [NP [Det the] [Noun book]]]]
Example 3: Does this flight include a meal?
"Does this flight include a meal?" parses as:
[S [Aux Does] [NP [Det this] [Noun flight]] [VP [Verb include] [NP [Det a] [Noun meal]]]]
Advantages and Disadvantages of Top-Down
Parsing
• Advantages
• Every tree it builds is rooted in S, so it never wastes time exploring subtrees that could not form a sentence.
• Disadvantages
• It can generate trees that do not match the input, since it expands the start symbol of the grammar before looking at the words.
• Time consuming, because backtracking makes it check each and every rule.
Bottom-Up Parsing
• Definition: Starts with input, builds up to start
symbol.
• Methods:
• - Shift-Reduce Parsing: Uses stack.
• - LR Parsing: Used in compilers.
• Pros: More efficient than top-down parsing.
• Cons: More complex implementation.
Bottom-Up Parsing
• In this kind of parsing, the parser starts with the input
symbol and tries to construct the parse tree in an upward
direction towards the root.
• At each step the parser checks the rules in the grammar
where the RHS matches the portion of the parse tree
constructed so far.
• It then reduces it using the LHS of the production.
• The parse tree is considered to be successful if the parser
reduces the tree to the start symbol of the grammar.
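The match-and-reduce procedure described above can be sketched as a greedy shift-reduce recognizer over a toy grammar (the lexicon and rules are illustrative; real shift-reduce parsers use parse tables and lookahead to decide between shifting and reducing):

```python
# Greedy shift-reduce recognition: shift words, reduce when an RHS matches the stack top.
LEXICON = {"paint": "Verb", "the": "Det", "door": "Noun"}
RULES = [                      # (RHS, LHS) pairs
    (("Det", "Noun"), "NP"),
    (("Verb", "NP"), "VP"),
    (("VP",), "S"),
]

def shift_reduce(sentence):
    stack, trace = [], []
    for word in sentence.lower().split():
        stack.append(LEXICON[word])             # SHIFT: push the word's category
        trace.append(f"shift  -> {stack}")
        reduced = True
        while reduced:                          # REDUCE while any RHS matches
            reduced = False
            for rhs, lhs in RULES:
                if tuple(stack[-len(rhs):]) == rhs:
                    del stack[-len(rhs):]       # pop the RHS ...
                    stack.append(lhs)           # ... and push the LHS
                    trace.append(f"reduce -> {stack}")
                    reduced = True
    return stack, trace

stack, trace = shift_reduce("paint the door")
print(stack)   # ['S'] : reduced all the way to the start symbol
```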
Example 1: Paint the door
The reductions proceed upward from the words:
paint the door
→ Verb Det Noun   (lexical rules)
→ Verb NP         (NP → Det Noun)
→ VP              (VP → Verb NP)
→ S               (S → VP)
Example 2: Deepika reads the book
Deepika reads the book
→ Noun Verb Det Noun   (lexical rules)
→ NP Verb NP           (NP → Noun, NP → Det Noun)
→ NP VP                (VP → Verb NP)
→ S                    (S → NP VP)
Example 3: Does this flight include a meal?
Does this flight include a meal?
→ Aux Det Noun Verb Det Noun   (lexical rules)
→ Aux NP Verb NP               (NP → Det Noun)
→ Aux NP VP                    (VP → Verb NP)
→ S                            (S → Aux NP VP)
Advantages and Disadvantages of
Bottom-Up Parsing
• Advantages
• It never wastes time in exploring a tree that
does not match the input.
• Disadvantages
• It wastes time generating subtrees that have no chance of leading to an S-rooted tree.
Disadvantages of parsing
• Left Recursion Leading to Infinite Loops:
• Top-down parsers cannot handle left-recursive grammars
directly, as they result in non-terminating recursive calls.
• Ambiguity:
• Ambiguous grammars can lead to multiple valid parse trees
for a single input, complicating the parsing process and
making it difficult to determine the intended structure and
meaning.
• Addressing these disadvantages involves transforming
grammars to remove left recursion and refine them to
resolve ambiguities, ensuring that parsers can operate
efficiently and accurately.
Ambiguity in Parsing
• Definition: Sentence has multiple valid parse trees.
• Example: 'I saw the man with the telescope.'
• - Interpretation 1: I used a telescope.
• - Interpretation 2: The man had a telescope.
• Solutions:
• - Probabilistic Context-Free Grammars (PCFGs)
• - Semantic analysis.
• Visual: Parse trees for ambiguity.
Cocke-Kasami-Younger (CKY)
Parsing Algorithm
The Cocke-Kasami-Younger (CKY) algorithm is a bottom-up parsing
algorithm used for parsing context-free grammars (CFGs) in Chomsky
Normal Form (CNF). It is particularly useful for parsing sentences
efficiently in NLP and is a foundational technique in probabilistic parsing.
Cocke-Kasami-Younger (CKY)
Parsing
• Definition: A dynamic programming method for
CFGs in Chomsky Normal Form.
• Steps:
• 1. Convert CFG to CNF.
• 2. Fill table bottom-up.
• 3. Identify valid parse tree.
• Pros: Efficient for large parsing tasks.
• Cons: Needs CNF conversion.
Steps of the CKY Algorithm
Step 1: Convert the Grammar to Chomsky Normal
Form (CNF)
A CNF grammar has rules of the following forms; every rule must take one of them:
1. A→BC (where A,B,C are non-terminals)
2. A→a (where A is a non-terminal, and a is a terminal)
Example:
Standard CFG: Converted to CNF:
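One core part of CNF conversion is binarizing rules whose right-hand side is longer than two symbols, by introducing fresh non-terminals. A minimal sketch (the rule and the helper names like X1 are illustrative; full CNF conversion also removes unit and empty productions):

```python
# Binarize long productions: A -> B C D  becomes  A -> B X1, X1 -> C D.
def binarize(rules):
    out, counter = [], 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            counter += 1
            new = f"X{counter}"                 # fresh helper non-terminal
            out.append((lhs, [rhs[0], new]))    # A -> B X1
            lhs, rhs = new, rhs[1:]             # continue with X1 -> C D ...
        out.append((lhs, rhs))
    return out

print(binarize([("S", ["NP", "VP", "PP"])]))
# [('S', ['NP', 'X1']), ('X1', ['VP', 'PP'])]
```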
Step 2: Initialize the CKY Parsing Table
For an input sentence:
"The dog chases"
Create a table where rows and columns represent substrings of the input.
Step 3: Fill the CKY Table Bottom-Up
1.Fill the diagonal with terminal rules (matching words to CNF rules).
2.Build higher levels using binary productions.
3.Check if the start symbol (S) appears in the top-right cell → If yes, the
sentence is grammatically valid.
Example CKY parse table for "The dog chases" (cell [i, j] holds the non-terminals spanning words i through j):

            The     dog     chases
The         Det     NP      S
dog                 N
chases                      V, VP

• Diagonal: The → Det, dog → N, chases → V (with VP → chases added when the unit rule VP → V is converted to CNF).
• [The, dog]: Det + N reduces to NP.
• [The, chases]: NP + VP reduces to S, so the sentence is grammatically valid.
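The table-filling steps above can be implemented directly with dynamic programming. A minimal CKY recognizer for "the dog chases" (the CNF grammar below, including the lexical VP for "chases", is illustrative):

```python
# CKY recognition over a CNF grammar.
UNARY = {            # terminal rules A -> a
    "the": {"Det"}, "dog": {"N"}, "chases": {"V", "VP"},
}
BINARY = {           # binary rules A -> B C, keyed by (B, C)
    ("Det", "N"): {"NP"},
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
}

def cky(words, start="S"):
    n = len(words)
    # table[i][j] = set of non-terminals that derive words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):                       # 1. diagonal: lexical rules
        table[i][i + 1] = set(UNARY.get(w, ()))
    for span in range(2, n + 1):                        # 2. longer spans, bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                   # every split point
                for b in table[i][k]:
                    for c in table[k][j]:
                        table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n]                         # 3. S in the top cell?

print(cky("the dog chases".split()))      # True
print(cky("dog the chases".split()))      # False
```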
Dependency Parsing
• Definition: Focuses on word relationships instead
of phrase structures.
• Example: 'The dog chased the ball.'
• - 'chased' is the root verb.
• - 'dog' (subject) depends on 'chased.'
• - 'ball' (object) depends on 'chased.'
• Applications: Syntax-based machine translation.
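The relations in the example can be encoded as head-pointing arcs, the representation used in dependency treebanks such as Universal Dependencies (the exact relation labels below are illustrative):

```python
# Dependency parse of "The dog chased the ball" as (word, head_index, relation) triples.
# Index 0 is a virtual ROOT; every word points at its head.
sentence = ["ROOT", "The", "dog", "chased", "the", "ball"]
arcs = [
    ("The",    2, "det"),    # 'The'    depends on 'dog'
    ("dog",    3, "nsubj"),  # 'dog'    is the subject of 'chased'
    ("chased", 0, "root"),   # 'chased' is the root verb
    ("the",    5, "det"),    # 'the'    depends on 'ball'
    ("ball",   3, "obj"),    # 'ball'   is the object of 'chased'
]

for word, head, rel in arcs:
    print(f"{word:7s} --{rel}--> {sentence[head]}")
```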
Earley Parsing
• Definition: Top-down parsing handling left-
recursion efficiently.
• Steps:
• 1. Prediction: Expand non-terminals.
• 2. Scanning: Match terminals.
• 3. Completion: Move to next state.
• Pros: Handles any CFG.
• Cons: Slower than some parsers.
Earley Parsing
A state in Earley Parsing is represented as: A→α∙β,[i]
•A: Non-terminal being expanded.
•α: Already parsed portion.
•β: Remaining to be parsed.
•∙: Position in the rule.
•[i]: Position in the input where this state started.
Three Main Operations
1. Prediction: expand a non-terminal. If β begins with a non-terminal B, add all rules of B.
2. Scanning: match terminals with the input. If β begins with a terminal, match it against the current input symbol.
3. Completion: complete a rule and advance the parser. If β is empty, find and advance the states that predicted this rule.
Steps of Earley Parsing
1. Initialize with the start state: S’ → ∙S, [0]
2. For each input position k:
   a. Prediction: add rules for non-terminals.
   b. Scanning: move the dot past terminals if they match.
   c. Completion: move the dot in the states that awaited this completion.
3. Final state: the parse is successful if S’ → S∙, [0] appears at the end of the input.
Example Parsing "John eats"
Grammar:
Parsing Steps:
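As a sketch, an Earley recognizer for "John eats" can be built from the three operations above. The grammar here (S → NP VP, NP → Noun, VP → Verb, Noun → John, Verb → eats) is an assumption for illustration:

```python
# Earley recognition: a state is (lhs, rhs, dot, origin); chart[k] holds states after k words.
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "NP":   [["Noun"]],
    "VP":   [["Verb"]],
    "Noun": [["John"]],
    "Verb": [["eats"]],
}

def earley(words, start="S"):
    chart = [set() for _ in range(len(words) + 1)]
    chart[0].add(("S'", (start,), 0, 0))                   # augmented start state
    for k in range(len(words) + 1):
        agenda = list(chart[k])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:                         # PREDICTION
                    for prod in GRAMMAR[nxt]:
                        new = (nxt, tuple(prod), 0, k)
                        if new not in chart[k]:
                            chart[k].add(new); agenda.append(new)
                elif k < len(words) and words[k] == nxt:   # SCANNING
                    chart[k + 1].add((lhs, rhs, dot + 1, origin))
            else:                                          # COMPLETION
                for plhs, prhs, pdot, porig in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        new = (plhs, prhs, pdot + 1, porig)
                        if new not in chart[k]:
                            chart[k].add(new); agenda.append(new)
    return ("S'", (start,), 1, 0) in chart[len(words)]     # S' -> S., [0] at the end?

print(earley(["John", "eats"]))    # True
print(earley(["eats", "John"]))    # False
```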
Probabilistic Context-Free
Grammars (PCFGs)
• Definition: CFG with probabilities assigned to rules.
• Example:
• - S → NP VP (0.9)
• - NP → Det N (0.6)
• - VP → V NP (0.7)
• Usage: Resolves ambiguity using probabilities.
• Applications: Speech recognition, machine translation.
Consider a simple grammar for parsing the sentence "The cat sleeps":
Two Possible Parse Trees with Probabilities
Parse Tree 1: Using NP→Det N
Probability:
P(S)=1.0×0.5×1.0×0.7×1.0=0.35
Parse Tree 2: Using NP→N
Probability:
P(S)=1.0×0.5×0.7×1.0=0.35
Total Probability Distribution
The two trees have probabilities 0.35 and 0.35, a total of 0.7; dividing each by this total normalizes them into a probability distribution over parse trees:

Parse Tree     Probability   Normalized
NP → Det N     0.35          0.5
NP → N         0.35          0.5

If there were more trees, they would also be included in the distribution.
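The tree probabilities above are products of rule probabilities, which can be computed directly. A sketch (rule probabilities taken from the worked example where given; the remaining values, such as VP → sleeps, are assumed):

```python
# Probability of a parse tree = product of the probabilities of the rules it uses.
from math import prod

P = {                                 # rule probabilities (partly assumed)
    ("S", "NP VP"): 1.0,
    ("NP", "Det N"): 0.5,
    ("NP", "N"): 0.5,
    ("Det", "the"): 1.0,
    ("N", "cat"): 0.7,
    ("VP", "sleeps"): 1.0,
}

tree1 = [("S", "NP VP"), ("NP", "Det N"), ("Det", "the"), ("N", "cat"), ("VP", "sleeps")]
tree2 = [("S", "NP VP"), ("NP", "N"), ("N", "cat"), ("VP", "sleeps")]

p1 = prod(P[rule] for rule in tree1)      # 1.0 * 0.5 * 1.0 * 0.7 * 1.0 = 0.35
p2 = prod(P[rule] for rule in tree2)      # 1.0 * 0.5 * 0.7 * 1.0       = 0.35
total = p1 + p2
print(p1 / total, p2 / total)             # normalized: 0.5 each
```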