
Normal Forms for CFGs: Eliminating Useless Variables, Removing Epsilon, Removing Unit Productions, Chomsky Normal Form

The document discusses several algorithms for transforming context-free grammars (CFGs) into different normal forms. It describes discovering and eliminating useless symbols like variables that derive nothing and unreachable symbols. It also explains eliminating epsilon productions by finding nullable symbols and removing unit productions by collapsing derivations using unit productions. Finally, it defines Chomsky normal form as having productions that are either a single terminal or two variables, and provides a proof that any CFG can be transformed into this normal form.

Uploaded by

Lalalalalala

1

Normal Forms for CFGs


Eliminating Useless Variables
Removing Epsilon
Removing Unit Productions
Chomsky Normal Form
2
Variables That Derive Nothing
Consider: S -> AB, A -> aA | a, B -> AB
Although A derives every nonempty string of a's, B
derives no terminal string.
Why? The only production for B leaves a B
in the sentential form.
Thus, S derives nothing, and the
language is empty.
3
Discovery Algorithms
There is a family of algorithms that work
inductively.
They start discovering some facts that
are obvious (the basis).
They discover more facts from what they
already have discovered (induction).
Eventually, nothing more can be
discovered, and we are done.
4
Picture of Discovery
Start with
the basis
facts
Round 1:
Add facts
that follow
from the
basis
Round 2:
Add facts
that follow
from round 1
and the
basis
And so on
5
Testing Whether a Variable
Derives Some Terminal String
Basis: If there is a production A -> w,
where w has no variables, then A
derives a terminal string.
Induction: If there is a production
A -> α, where α consists only of
terminals and variables known to derive
a terminal string, then A derives a
terminal string.
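The basis and induction above can be sketched as a fixed-point loop. This is only an illustration, not part of the original slides: the function name `generating_variables` and the dict-of-tuples grammar encoding are assumptions made for the example.

```python
def generating_variables(grammar):
    """Return the set of variables that derive some terminal string.

    Assumed encoding: `grammar` maps each variable to a list of bodies,
    each body a tuple of symbols. Every variable must be a key; any
    symbol that is not a key is treated as a terminal.
    """
    generating = set()
    changed = True
    while changed:  # one pass per discovery round
        changed = False
        for head, bodies in grammar.items():
            if head in generating:
                continue
            for body in bodies:
                # The body qualifies if every symbol is a terminal or a
                # variable already known to derive a terminal string.
                if all(s not in grammar or s in generating for s in body):
                    generating.add(head)
                    changed = True
                    break
    return generating


# Grammar from the "Variables That Derive Nothing" slide:
# S -> AB, A -> aA | a, B -> AB
g = {
    "S": [("A", "B")],
    "A": [("a", "A"), ("a",)],
    "B": [("A", "B")],
}
print(sorted(generating_variables(g)))  # ['A']
```

The first pass covers the basis (bodies with no variables); later passes cover the induction.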
6
Testing (2)
Eventually, we can find no more
variables.
An easy induction on the order in which
variables are discovered shows that
each one truly derives a terminal string.
Conversely, any variable that derives a
terminal string will be discovered by this
algorithm.
7
Proof of Converse
The proof is an induction on the height
of the least-height parse tree by which
a variable A derives a terminal string.
Basis: Height = 1. The tree has root A with leaf children
a_1 ... a_n, all terminals, so A -> a_1 ... a_n is a production.
Then the basis of the algorithm
tells us that A will be discovered.
8
Induction for Converse
Assume IH for parse trees of height <
h, and suppose A derives a terminal
string via a parse tree of height h.
The root A has children X_1 ... X_n, and each X_i
derives w_i by a subtree of height < h.
By IH, those X_i's that are
variables are discovered.
Thus, A will also be discovered,
because it has a right side of terminals
and/or discovered variables.
9
Algorithm to Eliminate
Variables That Derive Nothing
1. Discover all variables that derive
terminal strings.
2. For all other variables, remove all
productions in which they appear in
either the head or body.
10
Example: Eliminate Variables
S -> AB | C, A -> aA | a, B -> bB, C -> c
Basis: A and C are discovered because
of A -> a and C -> c.
Induction: S is discovered because of
S -> C.
Nothing else can be discovered.
Result: S -> C, A ->aA | a, C -> c
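As a rough check of this example, the two-step algorithm can be sketched as follows. The name `remove_nongenerating` and the grammar encoding (variable mapped to a list of tuple bodies, every variable a key) are assumptions for illustration only.

```python
def remove_nongenerating(grammar):
    """Drop variables that derive no terminal string, and every
    production that mentions one of them in its head or body."""
    # Step 1: discover the variables that derive terminal strings.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in generating and any(
                all(s not in grammar or s in generating for s in body)
                for body in bodies
            ):
                generating.add(head)
                changed = True
    # Step 2: remove all productions that involve any other variable.
    dead = set(grammar) - generating
    return {
        head: [b for b in bodies if not any(s in dead for s in b)]
        for head, bodies in grammar.items()
        if head in generating
    }


# S -> AB | C, A -> aA | a, B -> bB, C -> c
g = {
    "S": [("A", "B"), ("C",)],
    "A": [("a", "A"), ("a",)],
    "B": [("b", "B")],
    "C": [("c",)],
}
result = remove_nongenerating(g)
# result holds S -> C, A -> aA | a, C -> c
```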
11
Unreachable Symbols
Another way a terminal or variable
deserves to be eliminated is if it cannot
appear in any derivation from the start
symbol.
Basis: We can reach S (the start symbol).
Induction: If we can reach A, and there is
a production A -> α, then we can reach all
symbols of α.
12
Unreachable Symbols (2)
Easy inductions in both directions show
that when we can discover no more
symbols, then we have all and only the
symbols that appear in derivations from S.
Algorithm: Remove from the grammar all
symbols not discovered reachable from S
and all productions that involve these
symbols.
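The reachability discovery can be sketched with a simple worklist. The function names and the sample grammar below are hypothetical, chosen only to illustrate the idea; variables are assumed to be the dict keys and every other symbol a terminal.

```python
def reachable_symbols(grammar, start):
    """Return all symbols (variables and terminals) reachable from start."""
    reached = {start}          # basis: the start symbol is reachable
    worklist = [start]
    while worklist:
        head = worklist.pop()
        # Terminals have no productions, so .get() returns an empty list.
        for body in grammar.get(head, []):
            for symbol in body:
                if symbol not in reached:   # induction step
                    reached.add(symbol)
                    worklist.append(symbol)
    return reached


def remove_unreachable(grammar, start):
    """Keep only productions whose head is reachable from start.

    Bodies need no filtering: if a head is reachable, so is every
    symbol in each of its bodies.
    """
    reached = reachable_symbols(grammar, start)
    return {h: bodies for h, bodies in grammar.items() if h in reached}


# Hypothetical grammar: S -> aB, B -> b, C -> cS (C is unreachable)
g = {"S": [("a", "B")], "B": [("b",)], "C": [("c", "S")]}
print(sorted(reachable_symbols(g, "S")))  # ['B', 'S', 'a', 'b']
```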
13
Eliminating Useless Symbols
A symbol is useful if it appears in
some derivation of some terminal
string from the start symbol.
Otherwise, it is useless.
Eliminate all useless symbols by:
1. Eliminate symbols that derive no terminal
string.
2. Eliminate unreachable symbols.
14
Example: Useless Symbols (2)
S -> AB, A -> C, C -> c, B -> bB
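If we eliminated unreachable symbols
first, we would find everything is
reachable.
Eliminating the symbols that derive nothing
would then remove S and B, leaving A, C, and c,
which are now unreachable but would never get eliminated.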
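To see the effect of the order concretely, here is a small sketch that runs both orders on this grammar. The helper names `drop_nongenerating` and `drop_unreachable` are assumptions for this example, as is the dict encoding (variables are the keys, all other symbols are terminals).

```python
def drop_nongenerating(g):
    """Remove variables that derive no terminal string."""
    gen = set()
    while True:
        new = {h for h, bodies in g.items()
               if any(all(s not in g or s in gen for s in b) for b in bodies)}
        if new == gen:
            break
        gen = new
    return {h: [b for b in bodies if all(s not in g or s in gen for s in b)]
            for h, bodies in g.items() if h in gen}


def drop_unreachable(g, start):
    """Remove variables that cannot be reached from the start symbol."""
    reached, todo = {start}, [start]
    while todo:
        for b in g.get(todo.pop(), []):
            for s in b:
                if s not in reached:
                    reached.add(s)
                    todo.append(s)
    return {h: bodies for h, bodies in g.items() if h in reached}


# S -> AB, A -> C, C -> c, B -> bB
g = {"S": [("A", "B")], "A": [("C",)], "C": [("c",)], "B": [("b", "B")]}

right_order = drop_unreachable(drop_nongenerating(g), "S")
wrong_order = drop_nongenerating(drop_unreachable(g, "S"))
# right_order has no productions left: the language is empty.
# wrong_order still contains A -> C and C -> c, both useless.
```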
15
Why It Works
After step (1), every symbol remaining
derives some terminal string.
After step (2), the only symbols
remaining are those reachable from S.
In addition, they still derive a terminal
string, because such a derivation can
only involve symbols reachable from S.
16
Epsilon Productions
We can almost avoid using productions of
the form A -> ε (called ε-productions).
The problem is that ε cannot be in the
language of any grammar that has no
ε-productions.
Theorem: If L is a CFL, then L − {ε} has a
CFG with no ε-productions.
17
Nullable Symbols
To eliminate ε-productions, we first
need to discover the nullable symbols
= variables A such that A =>* ε.
Basis: If there is a production A -> ε,
then A is nullable.
Induction: If there is a production
A -> α, and all symbols of α are
nullable, then A is nullable.
18
Example: Nullable Symbols
S -> AB, A -> aA | ε, B -> bB | A
Basis: A is nullable because of A -> ε.
Induction: B is nullable because of
B -> A.
Then, S is nullable because of S -> AB.
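The nullable-symbol discovery for this example can be sketched as below. The name `nullable_variables` and the encoding are assumptions: each variable maps to a list of tuple bodies, and the empty tuple `()` stands for an ε body.

```python
def nullable_variables(grammar):
    """Return the set of variables A with A =>* epsilon."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # all() over an empty body is True, which covers the basis
            # A -> epsilon; terminals are never in `nullable`.
            if any(all(s in nullable for s in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


# S -> AB, A -> aA | epsilon, B -> bB | A
g = {
    "S": [("A", "B")],
    "A": [("a", "A"), ()],
    "B": [("b", "B"), ("A",)],
}
print(sorted(nullable_variables(g)))  # ['A', 'B', 'S']
```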
19
Eliminating ε-Productions
Key idea: turn each production
A -> X_1 ... X_n
into a family of productions.
For each subset of the nullable X's, there is
one production with those eliminated
from the right side in advance.
Except, if all X's are nullable (or the body
was empty to begin with), do not make a
production with ε as the right side.
20
Example: Eliminating ε-Productions
S -> ABC, A -> aA | ε, B -> bB | ε, C -> ε
A, B, C, and S are all nullable.
New grammar:
S -> ABC | AB | AC | BC | A | B | C
A -> aA | a
B -> bB | b
Note: C is now useless.
Eliminate its productions.
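A minimal sketch of the construction, run on this example. The names `nullable_set` and `eliminate_epsilon` are assumptions for illustration, and the empty tuple `()` encodes an ε body. The sketch does not remove the now-useless C; it simply leaves C with no productions.

```python
from itertools import product


def nullable_set(grammar):
    """Variables that derive epsilon (fixed-point discovery)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(grammar):
    """Replace each production by its family of epsilon-free variants."""
    nullable = nullable_set(grammar)
    new_grammar = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # A nullable symbol may be kept or dropped; others are kept.
            choices = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for pick in product(*choices):
                candidate = tuple(s for part in pick for s in part)
                # Never create a production with an empty right side.
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        new_grammar[head] = new_bodies
    return new_grammar


# S -> ABC, A -> aA | epsilon, B -> bB | epsilon, C -> epsilon
g = {
    "S": [("A", "B", "C")],
    "A": [("a", "A"), ()],
    "B": [("b", "B"), ()],
    "C": [()],
}
new_g = eliminate_epsilon(g)
```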
21
Why it Works
Prove that for all variables A:
1. If w ≠ ε and A =>*_old w, then A =>*_new w.
2. If A =>*_new w, then w ≠ ε and A =>*_old w.
Then, letting A be the start symbol
proves that L(new) = L(old) − {ε}.
(1) is an induction on the number of
steps by which A derives w in the old
grammar.
22
Proof of (1): Basis
If the old derivation is one step, then
A -> w must be a production.
Since w ≠ ε, this production also
appears in the new grammar.
Thus, A =>_new w.
23
Proof of (1): Induction
Let A =>*_old w be a k-step derivation,
and assume the IH for derivations of
fewer than k steps.
Let the first step be A =>_old X_1 ... X_n.
Then w can be broken into w = w_1 ... w_n,
where X_i =>*_old w_i, for all i, in fewer
than k steps.
24
Induction Continued
By the IH, if w_i ≠ ε, then X_i =>*_new w_i.
Also, the new grammar has a
production with A on the left, and just
those X_i's on the right such that w_i ≠ ε.
Note: they cannot all be ε, because w ≠ ε.
Follow a use of this production by the
derivations X_i =>*_new w_i to show that A
derives w in the new grammar.
25
Unit Productions
A unit production is one whose body
consists of exactly one variable.
These productions can be eliminated.
Key idea: If A =>* B by a series of unit
productions, and B -> α is a non-unit
production, then add production A -> α.
Then, drop all unit productions.
26
Unit Productions (2)
Find all pairs (A, B) such that A =>* B
by a sequence of unit productions only.
Basis: Surely (A, A).
Induction: If we have found (A, B), and
B -> C is a unit production, then add
(A, C).
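The pair discovery can be sketched as follows. The function name `unit_pairs` and the expression grammar used to exercise it are not from the slides; they are assumptions chosen for illustration. Variables are assumed to be the dict keys.

```python
def unit_pairs(grammar):
    """Return all pairs (A, B) with A =>* B using unit productions only."""
    pairs = {(a, a) for a in grammar}      # basis: (A, A) for every A
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for body in grammar[b]:
                # A unit production has a body of exactly one variable.
                if len(body) == 1 and body[0] in grammar:
                    if (a, body[0]) not in pairs:
                        pairs.add((a, body[0]))
                        changed = True
    return pairs


# Hypothetical grammar: E -> T | E+T, T -> F | T*F, F -> a | (E)
g = {
    "E": [("T",), ("E", "+", "T")],
    "T": [("F",), ("T", "*", "F")],
    "F": [("a",), ("(", "E", ")")],
}
print(sorted(unit_pairs(g)))
# [('E', 'E'), ('E', 'F'), ('E', 'T'), ('F', 'F'), ('T', 'F'), ('T', 'T')]
```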
27
Proof That We Find Exactly
the Right Pairs
By induction on the order in which pairs
(A, B) are found, we can show A =>* B
by unit productions.
Conversely, by induction on the number
of steps in the derivation by unit
productions of A =>* B, we can show
that the pair (A, B) is discovered.
28
Proof That the Unit-Production-
Elimination Algorithm Works
Basic idea: there is a leftmost
derivation A =>*_lm w in the new
grammar if and only if there is such a
derivation in the old.
A sequence of unit productions and a
non-unit production is collapsed into a
single production of the new grammar.
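The collapsing step can be sketched like this: for each variable A, collect the non-unit bodies of every B reachable from A by unit productions. The function name and the expression grammar are hypothetical, used only to illustrate the idea; variables are assumed to be the dict keys.

```python
def eliminate_unit_productions(grammar):
    """Return an equivalent grammar with no unit productions."""
    def is_unit(body):
        return len(body) == 1 and body[0] in grammar

    new_grammar = {}
    for a in grammar:
        # All B with A =>* B by unit productions only (includes A itself).
        reach, todo = {a}, [a]
        while todo:
            for body in grammar[todo.pop()]:
                if is_unit(body) and body[0] not in reach:
                    reach.add(body[0])
                    todo.append(body[0])
        # A gets every non-unit body of every such B.
        bodies = []
        for b in grammar:          # iterate keys for a stable order
            if b in reach:
                for body in grammar[b]:
                    if not is_unit(body) and body not in bodies:
                        bodies.append(body)
        new_grammar[a] = bodies
    return new_grammar


# Hypothetical grammar: E -> T | E+T, T -> F | T*F, F -> a | (E)
g = {
    "E": [("T",), ("E", "+", "T")],
    "T": [("F",), ("T", "*", "F")],
    "F": [("a",), ("(", "E", ")")],
}
new_g = eliminate_unit_productions(g)
# new_g["E"] holds E -> E+T | T*F | a | (E)
```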
29
Cleaning Up a Grammar
Theorem: If L is a CFL, then there is a
CFG for L − {ε} that has:
1. No useless symbols.
2. No ε-productions.
3. No unit productions.
I.e., every body is either a single
terminal or has length ≥ 2.
30
Cleaning Up (2)
Proof: Start with a CFG for L.
Perform the following steps in order:
1. Eliminate ε-productions. (Must be first: this step can
create unit productions or useless variables.)
2. Eliminate unit productions.
3. Eliminate variables that derive no
terminal string.
4. Eliminate variables not reached from the
start symbol.
31
Chomsky Normal Form
A CFG is said to be in Chomsky
Normal Form if every production is of
one of these two forms:
1. A -> BC (body is two variables).
2. A -> a (body is a single terminal).
Theorem: If L is a CFL, then L − {ε}
has a CFG in CNF.
32
Proof of CNF Theorem
Step 1: Clean the grammar, so every
body is either a single terminal or of
length at least 2.
Step 2: For each body that is not a single terminal,
make the right side all variables.
For each terminal a, create a new variable A_a
and production A_a -> a.
Replace a by A_a in bodies of length ≥ 2.
33
Example: Step 2
Consider production A -> BcDe.
We need variables A_c and A_e, with
productions A_c -> c and A_e -> e.
Note: you create at most one variable for
each terminal, and use it everywhere it is
needed.
Replace A -> BcDe by A -> B A_c D A_e.
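Step 2 can be sketched as below. The function name `terminals_to_variables`, the naming scheme `A_<terminal>` for the new variables, and the productions for B and D are assumptions for this illustration; the sketch also assumes no existing variable already uses a name of that form.

```python
def terminals_to_variables(grammar):
    """In bodies of length >= 2, replace each terminal a by a variable A_a."""
    new_grammar = {}
    terminal_var = {}   # terminal -> its one dedicated variable
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            if len(body) >= 2:
                replaced = []
                for s in body:
                    if s in grammar:           # variables stay as they are
                        replaced.append(s)
                    else:                      # terminal: reuse or create A_s
                        terminal_var.setdefault(s, "A_" + s)
                        replaced.append(terminal_var[s])
                body = tuple(replaced)
            new_bodies.append(body)            # single-terminal bodies are kept
        new_grammar[head] = new_bodies
    for terminal, var in terminal_var.items():
        new_grammar[var] = [(terminal,)]       # A_a -> a
    return new_grammar


# A -> BcDe from the slide; B -> b and D -> d are made up for the example.
g = {"A": [("B", "c", "D", "e")], "B": [("b",)], "D": [("d",)]}
new_g = terminals_to_variables(g)
# new_g["A"] holds A -> B A_c D A_e
```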
34
CNF Proof Continued
Step 3: Break right sides longer than 2
into a chain of productions with right
sides of two variables.
Example: A -> BCDE is replaced by
A -> BF, F -> CG, and G -> DE.
F and G must be used nowhere else.
35
Example of Step 3 Continued
Recall A -> BCDE is replaced by
A -> BF, F -> CG, and G -> DE.
In the new grammar, A => BF => BCG
=> BCDE.
More importantly: Once we choose to
replace A by BF, we must continue to
BCG and BCDE.
Because F and G have only one production.
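Step 3 can be sketched as a loop that peels off one symbol at a time. The function name `break_long_bodies` and the fresh names `F1`, `F2`, ... are assumptions for this example (they play the roles of F and G above); the sketch assumes these names are not already used in the grammar.

```python
def break_long_bodies(grammar):
    """Replace each body longer than 2 by a chain of two-symbol bodies."""
    new_grammar = {}
    counter = 0
    for head, bodies in grammar.items():
        new_grammar.setdefault(head, [])
        for body in bodies:
            current = head
            while len(body) > 2:
                counter += 1
                fresh = f"F{counter}"   # used for this one production only
                new_grammar.setdefault(current, []).append((body[0], fresh))
                current, body = fresh, body[1:]
            new_grammar.setdefault(current, []).append(body)
    return new_grammar


# A -> BCDE becomes A -> B F1, F1 -> C F2, F2 -> DE
g = {"A": [("B", "C", "D", "E")]}
new_g = break_long_bodies(g)
```

Because each fresh variable gets exactly one production, the chain can only be expanded back into the original body.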
