
21CSE356T: Natural Language Processing
Unit 2
Prepared by: Dr. Pritam Khan


Introduction to Parsing in NLP
• Definition: Parsing analyzes sentence structure based on grammar rules.
• Importance: Essential for NLP applications like machine translation and speech recognition.
Context-Free Grammars (CFGs)
• Definition: A formal grammar with production rules that describe the syntax of a language.

Purpose of a context-free grammar:
• To generate all strings in a language using a set of rules (production rules).
• It extends the capabilities of regular expressions and finite automata.

Components:
• - Terminal symbols (Σ): Actual words.
• - Non-terminal symbols (N): Abstract categories.
• - Production rules (P): Define how symbols may be rewritten.
• - Start symbol (S): Root of the parse tree.
Grammar Rules for English
• Phrase Structure Rules:
• - Sentence (S) → NP VP
• - Noun Phrase (NP) → Det N | N
• - Verb Phrase (VP) → V NP | V
• - Prepositional Phrase (PP) → P NP

• Example: 'The cat chases the dog.'
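As a quick sketch, these rules can be written down and tested with NLTK (assuming pip install nltk; the fragment below is illustrative, not a full English grammar):

```python
import nltk

# The phrase-structure rules above in NLTK's CFG notation.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | N
VP -> V NP | V
PP -> P NP
Det -> 'the'
N -> 'cat' | 'dog'
V -> 'chases'
P -> 'with'
""")

# Parse the example sentence and print its tree.
parser = nltk.ChartParser(grammar)
for tree in parser.parse("the cat chases the dog".split()):
    print(tree)
```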


CFG Example
This tree represents the syntactic structure of the sentence 'The cat sat on the mat':

• S (Sentence) is split into NP (Noun Phrase) and VP (Verb Phrase).
• The NP consists of Det (Determiner) "The" and N (Noun) "cat".
• The VP consists of the V (Verb) "sat" and a PP (Prepositional Phrase).
• The PP consists of P (Preposition) "on" and another NP, which includes Det "the" and N "mat".
Top-Down Parsing
• Definition: Starts with the start symbol and rewrites it into the input string.

• Methods:
• - Recursive Descent Parsing: Uses recursive function calls.
• - Backtracking: Tries alternative rules when one fails.

• Pros: Easy to implement.
• Cons: Inefficient due to backtracking.
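NLTK provides a textbook recursive-descent parser; a minimal sketch (toy grammar assumed) whose trace output makes the backtracking visible:

```python
import nltk

# Recursive-descent (top-down) parsing; trace=2 prints each expansion
# and backtrack, making the method's inefficiency visible.
# Note: this parser loops forever on left-recursive grammars.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP | V
Det -> 'the'
N -> 'dog' | 'ball'
V -> 'chased'
""")
parser = nltk.RecursiveDescentParser(grammar, trace=2)
for tree in parser.parse("the dog chased the ball".split()):
    print(tree)
```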
Top-Down Parsing
• In this kind of parsing, the parser starts constructing the parse tree from the start symbol and then tries to transform the start symbol into the input.
• The most common form of top-down parsing uses recursive procedures to process the input.
• Top-down parsing starts its search from the root node and works downwards towards the leaf nodes.
• The root is expanded using the grammar rules whose left-hand side is the start symbol 'S'.
• Each non-terminal symbol in the resulting sub-tree is then expanded using the appropriate grammar rules.

Grammar:
S → NP VP | VP
NP → Det Noun | Noun | Pronoun | Det Noun PP
VP → Verb | Verb NP
PP → Preposition NP
Det → This | That | the
Noun → Student | …
Verb → plays | paint
Pronoun → She | He | they
Preposition → from | with | on | to
Example 1: Paint the door
Level 1: S
Level 2: the two expansions of S are tried:
  S → NP VP        S → VP
Level 3: each non-terminal is expanded in turn:
  S → NP VP with NP → Det Noun, NP → Det Noun PP, or NP → Pronoun
  S → VP with VP → Verb or VP → Verb NP
Level 4: the expansion that matches the input succeeds:
(S (VP (Verb paint)
       (NP (Det the) (Noun door))))
Example 2: Deepika reads the book
(S (NP (Noun Deepika))
   (VP (Verb reads)
       (NP (Det the) (Noun book))))
Example 3: Does this flight include a meal?
(S (Aux Does)
   (NP (Det this) (Noun flight))
   (VP (Verb include)
       (NP (Det a) (Noun meal))))
Advantages and Disadvantages of Top-Down Parsing

• Advantages
• Because it starts generating the tree from the start symbol of the grammar, it never explores a tree that cannot result in an S-rooted parse.
• Disadvantages
• Time consuming, because it checks each and every parsing rule and backtracks on failure; it can also propose trees that are inconsistent with the input words.
Bottom-Up Parsing
• Definition: Starts with the input and builds up to the start symbol.

• Methods:
• - Shift-Reduce Parsing: Uses a stack.
• - LR Parsing: Used in compilers.

• Pros: More efficient than top-down parsing.
• Cons: More complex implementation.
Bottom-Up Parsing
• In this kind of parsing, the parser starts with the input symbols and tries to construct the parse tree upwards towards the root.
• At each step the parser checks the rules in the grammar whose RHS matches the portion of the parse tree constructed so far.
• It then reduces that portion using the LHS of the production.
• The parse is successful if the parser reduces the tree to the start symbol of the grammar.
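A shift-reduce sketch with NLTK's demo parser (the grammar is an assumed fragment covering Example 1 below; note that this parser reduces greedily and can miss parses under other grammars):

```python
import nltk

# Shift-reduce (bottom-up) parsing: words are shifted onto a stack and
# reduced whenever the top of the stack matches a rule's right-hand side.
grammar = nltk.CFG.fromstring("""
S -> VP
VP -> Verb NP
NP -> Det Noun
Verb -> 'paint'
Det -> 'the'
Noun -> 'door'
""")
parser = nltk.ShiftReduceParser(grammar)
for tree in parser.parse("paint the door".split()):
    print(tree)  # (S (VP (Verb paint) (NP (Det the) (Noun door))))
```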
Example 1: Paint the door

Paint the door
→ Verb Det Noun   (lexical rules)
→ Verb NP         (NP → Det Noun)
→ VP              (VP → Verb NP)
→ S               (S → VP)
Example 2: Deepika reads the book

Deepika reads the book
→ Noun Verb Det Noun   (lexical rules)
→ NP Verb NP           (NP → Noun; NP → Det Noun)
→ NP VP                (VP → Verb NP)
→ S                    (S → NP VP)
Example 3: Does this flight include a meal?

Does this flight include a meal?
→ Aux Det Noun Verb Det Noun   (lexical rules)
→ Aux NP Verb NP               (NP → Det Noun)
→ Aux NP VP                    (VP → Verb NP)
→ S                            (S → Aux NP VP)
Advantages and Disadvantages of Bottom-Up Parsing
• Advantages
• It never wastes time exploring a tree that does not match the input.
• Disadvantages
• It wastes time generating trees that have no chance of leading to an S-rooted tree.
Disadvantages of parsing
• Left Recursion Leading to Infinite Loops:
• Top-down parsers cannot handle left-recursive grammars
directly, as they result in non-terminating recursive calls.
• Ambiguity:
• Ambiguous grammars can lead to multiple valid parse trees
for a single input, complicating the parsing process and
making it difficult to determine the intended structure and
meaning.
• Addressing these disadvantages involves transforming
grammars to remove left recursion and refine them to
resolve ambiguities, ensuring that parsers can operate
efficiently and accurately.
Ambiguity in Parsing
• Definition: Sentence has multiple valid parse trees.

• Example: 'I saw the man with the telescope.'


• - Interpretation 1: I used a telescope.
• - Interpretation 2: The man had a telescope.

• Solutions:
• - Probabilistic Context-Free Grammars (PCFGs)
• - Semantic analysis.

• Visual: Parse trees for the two interpretations (see the sketch below).
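The two readings can be made concrete with a small ambiguous grammar (an assumed fragment) in NLTK, which returns one tree per interpretation:

```python
import nltk

# An ambiguous grammar: the PP 'with the telescope' can attach to the
# verb phrase (I used the telescope) or to the noun phrase (the man
# had the telescope).
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'I' | Det N | Det N PP
VP -> V NP | V NP PP
PP -> P NP
Det -> 'the'
N -> 'man' | 'telescope'
V -> 'saw'
P -> 'with'
""")
parser = nltk.ChartParser(grammar)
for tree in parser.parse("I saw the man with the telescope".split()):
    print(tree)  # two trees are printed, one per interpretation
```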


Cocke-Kasami-Younger (CKY)
Parsing Algorithm
The Cocke-Kasami-Younger (CKY) algorithm is a bottom-up parsing
algorithm used for parsing context-free grammars (CFGs) in Chomsky
Normal Form (CNF). It is particularly useful for parsing sentences
efficiently in NLP and is a foundational technique in probabilistic parsing.
Cocke-Kasami-Younger (CKY) Parsing
• Definition: A dynamic programming method for CFGs in Chomsky Normal Form.

• Steps:
• 1. Convert the CFG to CNF.
• 2. Fill the table bottom-up.
• 3. Identify a valid parse tree.

• Pros: Efficient for large parsing tasks.
• Cons: Requires CNF conversion.
Steps of the CKY Algorithm
Step 1: Convert the Grammar to Chomsky Normal Form (CNF)
Every rule of a CNF grammar takes one of the following two forms:

1. A → B C (where A, B, C are non-terminals)

2. A → a (where A is a non-terminal and a is a terminal)
Example:
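An illustrative conversion (assumed rules, not the slide's original figure): a right-hand side with more than two symbols is binarized with a new non-terminal, and unit productions are removed by inlining.

Standard CFG:
S → Aux NP VP
VP → V

Converted to CNF:
S → Aux X1
X1 → NP VP
VP → 'include'   (the unit production VP → V is replaced by V's terminal rules)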
Step 2: Initialize the CKY Parsing Table
For an input sentence:
"The dog chases"
Create a triangular table in which cell [i, j] holds the categories that can span the words from position i to position j.
Step 3: Fill the CKY Table Bottom-Up
1. Fill the diagonal with terminal rules (matching words to CNF lexical rules).
2. Build higher levels (longer spans) using the binary productions.
3. Check whether the start symbol (S) appears in the top-right cell → if yes, the sentence is grammatically valid.

Example CKY Parse Table for "The dog chases"

            The        dog        chases
The         Det        NP         S
dog                    N          –
chases                            V, VP

The diagonal holds the lexical categories (The → Det, dog → N, chases → V, with VP also produced directly by the CNF lexical rule for the verb). Det and N combine into NP over "The dog", and NP and VP combine into S over the whole input, so the sentence is grammatically valid.
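A compact CKY recognizer sketch in Python (the CNF rules are assumed to match the table above, since the slide's grammar is shown as a figure):

```python
from itertools import product

# CKY recognition for a toy CNF grammar: lexical rules map words to
# categories; binary rules combine the categories of adjacent spans.
LEXICAL = {"the": {"Det"}, "dog": {"N"}, "chases": {"V", "VP"}}
BINARY = {("Det", "N"): {"NP"}, ("NP", "VP"): {"S"}}

def cky(words, start="S"):
    n = len(words)
    # table[i][j] holds the set of categories spanning words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):              # Step 1: fill the diagonal
        table[i][i + 1] = set(LEXICAL.get(w.lower(), ()))
    for span in range(2, n + 1):               # Step 2: build longer spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # try every split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n]                # Step 3: S over the whole input?

print(cky("The dog chases".split()))  # True: the sentence is grammatical
```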
Dependency Parsing
• Definition: Focuses on relationships between words instead of phrase structures.

• Example: 'The dog chased the ball.'
• - 'chased' is the root verb.
• - 'dog' (subject) depends on 'chased'.
• - 'ball' (object) depends on 'chased'.

• Applications: Syntax-based machine translation.
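A dependency-parse sketch with spaCy (assumes spaCy and its small English model are installed: pip install spacy, then python -m spacy download en_core_web_sm):

```python
import spacy

# Load a pretrained English pipeline and parse the example sentence.
nlp = spacy.load("en_core_web_sm")
doc = nlp("The dog chased the ball.")

for token in doc:
    # Every word depends on a head word; the root's head is itself.
    print(f"{token.text:10} --{token.dep_}--> {token.head.text}")

# Expected (roughly): 'dog' is the nsubj of 'chased', 'ball' its dobj,
# and 'chased' is the ROOT.
```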


Earley Parsing
• Definition: Top-down chart parsing that handles left recursion efficiently.

• Steps:
• 1. Prediction: Expand non-terminals.
• 2. Scanning: Match terminals.
• 3. Completion: Complete finished rules and advance the states waiting on them.

• Pros: Handles any CFG.
• Cons: Slower than some specialized parsers.
Earley Parsing
A state in Earley parsing is represented as: A → α ∙ β, [i]

• A: Non-terminal being expanded.
• α: Portion already parsed.
• β: Portion remaining to be parsed.
• ∙: Position of the dot within the rule.
• [i]: Position in the input where this state started.

Three Main Operations

1. Prediction: Expand a non-terminal.
   - If β begins with a non-terminal B, add all rules of B.
2. Scanning: Match terminals with the input.
   - If β begins with a terminal, match it against the current input symbol.
3. Completion: Complete a rule and advance the parser.
   - If β is empty, find and advance the states that predicted this rule.
Steps of Earley Parsing
1. Initialize with a start state: S' → ∙ S, [0]
2. For each input position k:
   - Prediction: Add rules for non-terminals.
   - Scanning: Move the dot past terminals if they match.
   - Completion: Move the dot in the states that awaited this completion.
3. Final state: the parse succeeds if S' → S ∙, [0] is in the chart at the end of the input.
Example: Parsing "John eats"
The toy grammar and step-by-step chart appear as figures; the sketch below reproduces the parse in code.
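A minimal Earley recognizer sketch for this example (the grammar is assumed, since the slide's table is a figure; epsilon rules are not handled):

```python
# Toy grammar (assumed): S -> NP VP, NP -> 'John', VP -> 'eats'.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("John",)],
    "VP": [("eats",)],
}

def earley_recognize(words):
    # A state is (lhs, rhs, dot, start): rule lhs -> rhs with the dot
    # at position `dot`, begun at input position `start`.
    chart = [set() for _ in range(len(words) + 1)]
    chart[0].add(("S'", ("S",), 0, 0))          # augmented start state
    for k in range(len(words) + 1):
        agenda = list(chart[k])
        while agenda:
            lhs, rhs, dot, start = agenda.pop()
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                # Prediction: expand the non-terminal after the dot.
                for prod in GRAMMAR[rhs[dot]]:
                    state = (rhs[dot], prod, 0, k)
                    if state not in chart[k]:
                        chart[k].add(state)
                        agenda.append(state)
            elif dot < len(rhs):
                # Scanning: match the terminal against the next word.
                if k < len(words) and words[k] == rhs[dot]:
                    chart[k + 1].add((lhs, rhs, dot + 1, start))
            else:
                # Completion: advance states that were waiting on lhs.
                for l2, r2, d2, s2 in list(chart[start]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        state = (l2, r2, d2 + 1, s2)
                        if state not in chart[k]:
                            chart[k].add(state)
                            agenda.append(state)
    return ("S'", ("S",), 1, 0) in chart[len(words)]

print(earley_recognize(["John", "eats"]))  # True: the input is derivable
```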
Probabilistic Context-Free Grammars (PCFGs)
• Definition: A CFG with probabilities assigned to its rules.

• Example:
• - S → NP VP (0.9)
• - NP → Det N (0.6)
• - VP → V NP (0.7)

• Usage: Resolves ambiguity by ranking parses by probability.

• Applications: Speech recognition, machine translation.


Consider a simple grammar for parsing the sentence "The cat sleeps":

Two Possible Parse Trees with Probabilities

Parse Tree 1: Using NP → Det N
Probability: P(T1) = 1.0 × 0.5 × 1.0 × 0.7 × 1.0 = 0.35
Parse Tree 2: Using NP → N
Probability: P(T2) = 1.0 × 0.5 × 0.7 × 1.0 = 0.35

Total Probability Distribution

The two tree probabilities sum to 0.70 rather than 1.0; normalizing (dividing each by the total 0.70) gives a probability distribution over parse trees:

Parse Tree       Probability
NP → Det N       0.35 / 0.70 = 0.5
NP → N           0.35 / 0.70 = 0.5

If there were more trees, they would also be included in the distribution.
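The same computation can be reproduced with NLTK's PCFG support; the rule probabilities below are an assumed reconstruction chosen to match the 0.35 values above (the slide's grammar is shown as a figure):

```python
import nltk

# A PCFG whose rule probabilities reproduce the computation above;
# the probabilities for each left-hand side must sum to 1.
grammar = nltk.PCFG.fromstring("""
S -> NP VP [1.0]
NP -> Det N [0.5]
NP -> N [0.5]
VP -> V [1.0]
Det -> 'the' [1.0]
N -> 'cat' [0.7]
N -> 'dog' [0.3]
V -> 'sleeps' [1.0]
""")

# ViterbiParser returns the single most probable parse tree.
parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("the cat sleeps".split()):
    print(tree, tree.prob())  # 1.0 * 0.5 * 1.0 * 0.7 * 1.0 = 0.35
```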

You might also like