CS242 Module 7
26/12/2021
Theory of Computing
https://www.csa.iisc.ac.in/~deepakd/atc-2016/Seminar-CSG.pdf
https://www3.nd.edu/~cpennycu/2019/assets/fall/TOC/09%20Chomsky%20Normal%20Form.pdf
This Presentation is mainly dependent on the textbook: Introduction to Automata Theory, Languages, and Computation: Global Edition, 3rd edition (2013) PHI
by John E. Hopcroft, Rajeev Motwani and Jeffrey D. Ullman
• Normal Forms for Context-Free Grammars
How to “simplify” CFGs?
Three ways to simplify/clean a CFG
1. Eliminate useless symbols
2. Eliminate ε-productions (A ➔ ε)
3. Eliminate unit productions (A ➔ B)
Eliminating useless symbols
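A symbol X is reachable if there exists a derivation:
◼ S ➔* αXβ
A symbol X is generating if there exists a derivation:
◼ X ➔* w, for some terminal string w
A symbol is useful only if it is both reachable and generating; otherwise it is useless.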
Algorithm to detect useless symbols
1. First, eliminate all symbols that are not generating
2. Next, eliminate all symbols that are not reachable
Example: Useless symbols
◼ S ➔ AB | a
◼ A➔b
1. A, S are generating
2. B is not generating (and therefore B is useless)
3. ==> Eliminating B… (i.e., remove all productions that involve B)
1. S➔a
2. A➔b
4. Now, A is not reachable and therefore is useless
5. Simplified G:
1. S➔a
What would happen if you reverse the order, i.e., test reachability before generating?
Will fail to remove: A ➔ b (A is still reachable through S ➔ AB at the time reachability is tested)
Algorithm to find all generating symbols
◼ X is generating if X ➔* w for some terminal string w
◼ Given: G=(V,T,P,S)
◼ Basis:
◼ Every symbol in T is obviously generating.
◼ Induction:
◼ Suppose there is a production A ➔ α, where every symbol of α is generating
◼ Then, A is also generating
Algorithm to find all reachable symbols
◼ X is reachable if S ➔* αXβ for some strings α, β
◼ Given: G = (V, T, P, S)
◼ Basis:
◼ S is obviously reachable (from itself)
◼ Induction:
◼ Suppose there is a production A ➔ α1 α2 … αk, where A is reachable
◼ Then, all symbols on the right-hand side, {α1, α2, …, αk}, are also reachable.
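The two fixed-point computations above, and the order in which they are applied, can be sketched in Python. This is only a minimal sketch: the dictionary representation of P (each variable mapped to a list of bodies, each body a tuple of symbols) and all function names are assumptions made for illustration, not part of the textbook.

```python
def generating_symbols(productions, terminals):
    """Fixed point: terminals are generating (basis); a head is generating
    once one of its bodies consists only of generating symbols (induction)."""
    generating = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head in generating:
                continue
            if any(all(sym in generating for sym in body) for body in bodies):
                generating.add(head)
                changed = True
    return generating


def reachable_symbols(productions, start):
    """Worklist search: the start symbol is reachable (basis); every symbol
    in a body of a reachable head is reachable (induction)."""
    reachable = {start}
    worklist = [start]
    while worklist:
        head = worklist.pop()
        for body in productions.get(head, []):
            for sym in body:
                if sym not in reachable:
                    reachable.add(sym)
                    worklist.append(sym)
    return reachable


def remove_useless(productions, terminals, start):
    """Step 1: drop non-generating symbols. Step 2: drop unreachable ones."""
    gen = generating_symbols(productions, terminals)
    step1 = {
        head: [body for body in bodies if all(s in gen for s in body)]
        for head, bodies in productions.items()
        if head in gen
    }
    reach = reachable_symbols(step1, start)
    return {head: bodies for head, bodies in step1.items() if head in reach}


# Example from the slides: S -> AB | a, A -> b (B has no productions).
grammar = {"S": [("A", "B"), ("a",)], "A": [("b",)]}
print(remove_useless(grammar, {"a", "b"}, "S"))  # {'S': [('a',)]}
```

Running the two steps in the opposite order on the same grammar would keep A ➔ b, which matches the remark on the example slide.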
Eliminating ε-productions
What’s the point of removing ε-productions (A ➔ ε)?
Caveat: It is not possible to eliminate ε-productions for
languages which include ε in their word set.
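Algorithm to detect all nullable variables
◼ Basis:
◼ If A ➔ ε is a production in G, then A is nullable
◼ Induction:
◼ If there is a production A ➔ B1B2…Bk in G in which every Bi is nullable, then A is nullable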
Eliminating ε-productions
Given: G = (V, T, P, S)
Algorithm:
1. Detect all nullable variables in G
2. Then construct G1 = (V, T, P1, S) as follows:
i. For each production of the form: A ➔ X1X2…Xk, where k ≥ 1, suppose m out of the k Xi’s are nullable symbols
ii. Then G1 will have 2^m versions of this production
i. i.e., all combinations where each nullable Xi is either present or absent
ii. except that, when m = k, the version with every Xi absent (A ➔ ε) is left out
iii. Alternatively, if a production is of the form: A ➔ ε,
then remove it
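The construction above can be sketched in Python as follows. This is a hedged illustration, not the textbook's code: the grammar representation (variable mapped to a list of bodies, each body a tuple of symbols, with the empty tuple standing for ε) and the function names are assumptions.

```python
from itertools import product


def nullable_variables(productions):
    """Basis: A -> ε makes A nullable (an empty body passes all()).
    Induction: A -> B1...Bk with every Bi nullable makes A nullable."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in nullable and any(
                all(sym in nullable for sym in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(productions):
    """Build P1: for each body, keep or drop every nullable symbol in all
    combinations, but never add an empty body (A -> ε)."""
    nullable = nullable_variables(productions)
    result = {}
    for head, bodies in productions.items():
        new_bodies = set()
        for body in bodies:
            # A nullable symbol contributes two choices: present or absent.
            choices = [
                ((sym,), ()) if sym in nullable else ((sym,),) for sym in body
            ]
            for pick in product(*choices):
                new_body = tuple(s for part in pick for s in part)
                if new_body:
                    new_bodies.add(new_body)
        result[head] = new_bodies
    return result


# S -> AB, A -> aAA | ε, B -> bBB | ε
grammar = {
    "S": [("A", "B")],
    "A": [("a", "A", "A"), ()],
    "B": [("b", "B", "B"), ()],
}
for head, bodies in eliminate_epsilon(grammar).items():
    print(head, sorted(bodies))
```

A variable whose only production was A ➔ ε ends up with no bodies at all; the later useless-symbol step would then remove it.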
Example: Eliminating ε-productions
◼ Let L be the language represented by the following CFG G:
i. S ➔ AB
ii. A ➔ aAA | ε
iii. B ➔ bBB | ε
Goal: To construct G1, which is the grammar for L-{ε}
◼ Nullable symbols: {A, B} (and hence also S, through S ➔ AB)
Simplified grammar G1:
i. S ➔ AB | A | B
ii. A ➔ aAA | aA | a
iii. B ➔ bBB | bB | b
Eliminating unit productions (1)
▪ A unit production is one of the form A ➔ B, where
both A & B are variables.
▪ What’s the point of removing unit productions?
▪ It saves substitutions during a derivation
E.g.
A ➔ B | …
B ➔ C | …
C ➔ D | …
D ➔ xxx | yyy | zzz
Here A can be given the bodies xxx | yyy | zzz directly, without going through B, C and D.
The Unit Pair Algorithm
to remove unit productions (1)
◼ Suppose A ➔ B1 ➔ B2 ➔ … ➔ Bn ➔ α
◼ Action: Replace all intermediate productions to produce α
directly
◼ i.e., A ➔ α; B1 ➔ α; … Bn ➔ α
◼ Basis: (A,A) is a unit pair for every variable A.
◼ Induction: If (A,B) is a unit pair and B ➔ C is a unit production, then (A,C) is also a unit
pair.
The Unit Pair Algorithm
to remove unit productions (2)
Input: G = (V, T, P, S)
Goal: to build G1 = (V, T, P1, S) that is devoid of unit
productions.
Algorithm:
1. Find all unit pairs in G.
2. For each unit pair (A,B) in G:
1. Add to P1 a new production A ➔ α, for every B ➔ α in P
which is a non-unit production
2. If a resulting production is already there in P1, then
there is no need to add it.
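A minimal Python sketch of the unit pair algorithm follows. The grammar representation (variable mapped to a list of bodies, each body a tuple of symbols) and the function names are assumptions for illustration; every variable is assumed to appear as a key of the dictionary.

```python
def unit_pairs(productions):
    """Basis: (A, A) for every variable A.
    Induction: (A, B) a unit pair and B -> C a unit production give (A, C)."""
    variables = set(productions)
    pairs = {(a, a) for a in variables}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for body in productions[b]:
                if len(body) == 1 and body[0] in variables:
                    if (a, body[0]) not in pairs:
                        pairs.add((a, body[0]))
                        changed = True
    return pairs


def eliminate_unit(productions):
    """For each unit pair (A, B), give A every non-unit body of B.
    Using sets means a body that is already present is not added twice."""
    variables = set(productions)
    result = {a: set() for a in variables}
    for a, b in unit_pairs(productions):
        for body in productions[b]:
            is_unit = len(body) == 1 and body[0] in variables
            if not is_unit:
                result[a].add(body)
    return result


# Expression grammar used in the example slide.
grammar = {
    "E": [("T",), ("E", "+", "T")],
    "T": [("F",), ("T", "*", "F")],
    "F": [("I",), ("(", "E", ")")],
    "I": [("a",), ("b",), ("I", "a"), ("I", "b"), ("I", "0"), ("I", "1")],
}
for head, bodies in sorted(eliminate_unit(grammar).items()):
    print(head, sorted(bodies))
```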
Example: eliminating unit productions
G:
1. E ➔ T | E+T
2. T ➔ F | T*F
3. F ➔ I | (E)
4. I ➔ a | b | Ia | Ib | I0 | I1
Unit pairs    Only non-unit productions to be added to P1
(E,E)         E ➔ E+T
(E,T)         E ➔ T*F
(E,F)         E ➔ (E)
(E,I)         E ➔ a | b | Ia | Ib | I0 | I1
(T,T)         T ➔ T*F
(T,F)         T ➔ (E)
(T,I)         T ➔ a | b | Ia | Ib | I0 | I1
(F,F)         F ➔ (E)
(F,I)         F ➔ a | b | Ia | Ib | I0 | I1
(I,I)         I ➔ a | b | Ia | Ib | I0 | I1
G1:
1. E ➔ E+T | T*F | (E) | a | b | Ia | Ib | I0 | I1
2. T ➔ T*F | (E) | a | b | Ia | Ib | I0 | I1
3. F ➔ (E) | a | b | Ia | Ib | I0 | I1
4. I ➔ a | b | Ia | Ib | I0 | I1
Putting all together
▪ Theorem: If G is a CFG for a language that contains at
least one string other than ε, then there is another CFG
G1, such that L(G1) = L(G) - {ε}, and G1 has:
◼ no ε-productions
◼ no unit productions
◼ no useless symbols
▪ Algorithm (the steps are applied in this order):
Step 1) eliminate ε-productions
Step 2) eliminate unit productions
Step 3) eliminate useless symbols
Normal forms
◼ If all productions of the grammar could be
expressed in the same form(s), then:
a. It becomes easy to design algorithms that use the
grammar.
b. It becomes easy to show proofs and properties.
◼ Two important normal forms:
◼ Greibach Normal Form (GNF)
◼ All productions of the form
A ➔ aα, where a is a terminal and α is a (possibly empty) string of variables
◼ Chomsky Normal Form (CNF)
◼ Will be discussed here
Chomsky Normal Form (CNF)
Let G be a CFG for some L-{ε}
Definition:
G is said to be in Chomsky Normal Form if all its
productions are in one of the following two
forms:
i. A ➔ BC where A, B, C are variables, or
ii. A ➔ a where a is a terminal
◼ G has no useless symbols
◼ G has no unit productions
◼ G has no ε-productions
CNF checklist
▪ Is this grammar in CNF?
G1:
1. E ➔ E+T | T*F | (E) | Ia | Ib | I0 | I1
2. T ➔ T*F | (E) | Ia | Ib | I0 | I1
3. F ➔ (E) | Ia | Ib | I0 | I1
4. I ➔ a | b | Ia | Ib | I0 | I1
Checklist:
▪ G has no ε-productions
▪ G has no unit productions
▪ G has no useless symbols
▪ But…
▪ the normal form for productions is violated
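How to convert a G into CNF?
◼ Assumption: G has no ε-productions, unit productions or useless symbols
◼ Step 1: for every terminal a that appears in a body of length 2 or more, introduce a new variable Xa with the production Xa ➔ a, and replace a by Xa in that body
◼ Step 2: replace every production A ➔ B1B2…Bk with k ≥ 3 by a chain of two-variable productions (new variables C1, C2 and so on…):
◼ A ➔ B1C1, C1 ➔ B2C2, …, Ck-3 ➔ Bk-2Ck-2, Ck-2 ➔ Bk-1Bk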
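The conversion can be sketched in Python under the stated assumption that the input grammar already has no ε-productions, unit productions or useless symbols. The representation (variable mapped to a list of bodies, each body a tuple of symbols) is an assumption, and so are the fresh variable names: `X` followed by the terminal, and `Y1`, `Y2`, … for the chain variables, which are assumed not to clash with existing variables.

```python
def to_cnf(productions):
    """Convert a cleaned-up grammar to Chomsky Normal Form."""
    variables = set(productions)
    cnf = {}
    term_var = {}   # terminal -> fresh variable standing for it
    counter = 0     # numbering for the chain variables Y1, Y2, ...

    def add(head, body):
        cnf.setdefault(head, [])
        if body not in cnf[head]:
            cnf[head].append(body)

    for head, bodies in productions.items():
        for body in bodies:
            if len(body) == 1:
                add(head, body)          # A -> a is already in CNF
                continue
            # Step 1: replace terminals inside long bodies by variables.
            symbols = []
            for sym in body:
                if sym in variables:
                    symbols.append(sym)
                else:
                    term_var.setdefault(sym, "X" + sym)
                    add(term_var[sym], (sym,))
                    symbols.append(term_var[sym])
            # Step 2: break bodies of length >= 3 into a chain.
            current = head
            while len(symbols) > 2:
                counter += 1
                fresh = "Y" + str(counter)
                add(current, (symbols[0], fresh))
                current = fresh
                symbols = symbols[1:]
            add(current, tuple(symbols))
    return cnf


grammar = {
    "S": [("A", "S"), ("B", "A", "B", "C")],
    "A": [("A", "1"), ("0", "A", "1"), ("0", "1")],
    "B": [("0", "B"), ("0",)],
    "C": [("1", "C"), ("1",)],
}
for head, bodies in to_cnf(grammar).items():
    print(head, "=>", " | ".join("".join(body) for body in bodies))
```

The sketch introduces a chain variable per long body rather than sharing them, which keeps the code short at the cost of a slightly larger grammar.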
Example #1
G:
S => AS | BABC
A => A1 | 0A1 | 01
B => 0B | 0
C => 1C | 1
G in CNF:
X0 => 0
X1 => 1
S => AS | BY1
Y1 => AY2
Y2 => BC
A => AX1 | X0Y3 | X0X1
Y3 => AX1
B => X0B | 0
C => X1C | 1
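Example #2
G:
1. E ➔ E+T | T*F | (E) | Ia | Ib | I0 | I1
2. T ➔ T*F | (E) | Ia | Ib | I0 | I1
3. F ➔ (E) | Ia | Ib | I0 | I1
4. I ➔ a | b | Ia | Ib | I0 | I1
Step (1): one new variable per terminal, written X+, X*, X(, X), Xa, Xb, X0, X1
1. E ➔ E X+ T | T X* F | X( E X) | I Xa | I Xb | I X0 | I X1
2. T ➔ T X* F | X( E X) | I Xa | I Xb | I X0 | I X1
3. F ➔ X( E X) | I Xa | I Xb | I X0 | I X1
4. I ➔ a | b | I Xa | I Xb | I X0 | I X1
5. X+ ➔ +
6. X* ➔ *
7. X( ➔ (
8. …….
Step (2): split the bodies that still have three symbols (E X+ T, T X* F, X( E X)) into chains of two-variable productions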
Languages with ε
◼ For languages that include ε,
◼ Write down the rest of grammar in CNF
◼ Then add production “S => ε” at the end
E.g., Consider:
G:
S => AS | BABC
A => A1 | 0A1 | 01 | ε
B => 0B | 0 | ε
C => 1C | 1 | ε
G in CNF:
X0 => 0
X1 => 1
S => AS | BY1 | ε
Y1 => AY2
Y2 => BC
• The Pumping Lemma for Context-Free Languages
Pumping Lemma
▪ There are some languages which are not CFLs.
◼ For such languages a stack is not enough
◼ E.g. the language of strings of the form ww
▪ The pumping lemma is a result that is useful for proving
that a language is not a CFL, just like its counterpart for regular languages
▪ Let us first prove an important property about
parse trees
The “parse tree theorem”
Observe that any parse tree generated by a CNF grammar will be a binary tree, where all
internal nodes have exactly two children (except those nodes connected to the leaves)
Given:
◼ Suppose we have a parse tree for a string w, according to a
CNF grammar, G = (V, T, P, S)
◼ Let h be the height of the parse tree
Implies:
◼ |w| ≤ 2^(h-1)
◼ I.e., the yield (w) of a CNF parse tree is no longer than 2^(h-1).
[Figure: parse tree for w of height h, with longest path S = A0, A1, A2, …, Ah-1, a]
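Proof of the size of Parse trees
Proof: (using induction on h)
Basis: h = 1
➔ Derivation will have to be “S ➔ a”
➔ |w| = 1 = 2^(1-1)
Ind. Step: h = k > 1
S will have exactly two children:
S ➔ AB
➔ The subtrees rooted at A and B have height at most k-1, so by the hypothesis each yields a string of length at most 2^(k-2)
➔ |w| ≤ 2^(k-2) + 2^(k-2) = 2^(k-1)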
Implication of the Parse Tree Theorem
Fact:
◼ If the height of a parse tree is h, then
◼ |w| ≤ 2^(h-1)
Implication:
◼ If |w| ≥ 2^m, then
◼ Parse tree’s height is at least m+1
The Pumping Lemma for CFLs
Let L be a CFL.
Then there exists a constant N, such that,
◼ if z ∈ L such that |z| ≥ N, then we can write
z = uvwxy, such that:
1. |vwx| ≤ N
2. vx ≠ ε
3. For all k ≥ 0: u v^k w x^k y ∈ L
Note: We are pumping in two places (v & x)
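Proof: Pumping Lemma for CFL
◼ Let G be a CNF grammar for L - {ε}, and let m be the number of variables of G.
◼ Choose N = 2^m.
◼ Pick any z ∈ L with |z| ≥ N; by the parse tree theorem, the parse tree for z has height at least m+1.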
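[Figure: parse tree for z of height h ≥ m+1, with longest path S = A0, A1, A2, …, Ah-1, Ah = a]
Meaning: Repetition in the last m+1 variables
◼ The longest path carries at least m+1 variables but G has only m, so Ai = Aj for some h-m-1 ≤ i < j ≤ h-1
◼ The subtree rooted at Aj yields w and the subtree rooted at Ai yields vwx, so z = uvwxy
• Ai is expanded by a production Ai ➔ BC, and the child that does not lead to Aj yields a non-empty string. Therefore, vx ≠ ε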
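Extending the parse tree…
[Figure: two modified parse trees. Replacing the subtree at Ai by the subtree at Aj gives z = uwy (k = 0); replacing the subtree at Aj by a copy of the subtree at Ai, repeatedly, gives z = u v^k w x^k y]
==> For all k ≥ 0: u v^k w x^k y ∈ L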
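Proof completes
Ai is among the last m+1 variables of the longest path, so the subtree rooted at Ai has height at most m+1 and |vwx| ≤ 2^m
But, 2^m = N
==> |vwx| ≤ N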
Application of Pumping Lemma for CFLs
Example 1: L = {a^m b^m c^m | m > 0}
Claim: L is not a CFL
Proof:
◼ Let N be the pumping lemma constant
◼ Pick z = a^N b^N c^N
◼ Apply pumping lemma to z and show that there exists at
least one other string constructed from z (obtained by
pumping up or down) that is ∉ L
◼ z = uvwxy
◼ As z = a^N b^N c^N and |vwx| ≤ N and vx ≠ ε
◼ ==> vwx cannot contain all three symbols (a, b, c)
◼ ==> we can pump up or pump down to build
another string which is ∉ L
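The argument above can be checked by brute force for one small value of N. The Python sketch below enumerates every split z = uvwxy allowed by the lemma and confirms that each one leaves the language for k = 0 or k = 2. It is a finite sanity check for a single N, not a proof, and the function names are illustrative assumptions.

```python
def in_L(s):
    """Membership test for L = { a^m b^m c^m | m > 0 }."""
    m = len(s) // 3
    return m > 0 and s == "a" * m + "b" * m + "c" * m


def every_split_fails(z, n, in_language, ks=(0, 2)):
    """Return True if every split z = uvwxy with |vwx| <= n and vx != ε
    has some k in ks for which u v^k w x^k y is outside the language."""
    length = len(z)
    for i in range(length + 1):               # u = z[:i]
        limit = min(i + n, length)            # enforces |vwx| <= n
        for j in range(i, limit + 1):         # v = z[i:j]
            for p in range(j, limit + 1):     # w = z[j:p]
                for q in range(p, limit + 1): # x = z[p:q], y = z[q:]
                    u, v, w, x, y = z[:i], z[i:j], z[j:p], z[p:q], z[q:]
                    if not (v or x):
                        continue              # the lemma requires vx != ε
                    if all(in_language(u + v * k + w + x * k + y) for k in ks):
                        return False          # this split survives pumping
    return True


N = 4
z = "a" * N + "b" * N + "c" * N
print(every_split_fails(z, N, in_L))  # True
```

For a language that is context-free, such as {a^n b^n}, the same search finds a split that survives pumping, so the function returns False there.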
Exercise for Pumping Lemma application
◼ L = { ww | w is in {0, 1}* }
◼ Show that L is not a CFL
◼ Try string z = 0^N 0^N
◼ what happens?
◼ Try string z = 0^N 1^N 0^N 1^N
◼ what happens?
◼ L = { 0^(k^2) | k is any integer }
◼ Prove L is not a CFL using Pumping Lemma
• Closure Properties of Context-Free Languages
Closure Property Results
◼ CFLs are closed under:
◼ Union
◼ Concatenation
◼ Kleene closure operator
◼ Substitution
◼ Homomorphism, inverse homomorphism
◼ Reversal
◼ CFLs are not closed under:
◼ Intersection
◼ Difference
◼ Complementation
Note: Regular languages are closed under these three operators
Strategy for Closure Property Proofs
◼ First prove “closure under substitution”
◼ Use the above result to prove other closure properties
◼ CFLs are closed under:
◼ Union
◼ Concatenation
◼ Kleene closure operator
◼ Substitution (prove this first)
◼ Homomorphism, inverse homomorphism
◼ Reversal
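The Substitution operation
◼ For each a ∈ ∑, let s(a) be a language La
◼ For a string w = a1a2…an, s(w) = s(a1) s(a2) … s(an) (concatenation of languages)
◼ For a language L, s(L) is the union of s(w) over all w ∈ L
Note: s(L) can use a
different alphabet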
CFLs are closed under Substitution
What is s(L)?
◼ If L = {w1, w2, w3, w4, …}, then s(L) = s(w1) U s(w2) U s(w3) U s(w4) U …
Note: each s(w) is itself a set of strings
CFLs are closed under Substitution
◼ G=(V, T, P, S) : CFG for L
◼ Because every s(a) is a CFL, there is a CFG for each s(a)
◼ Let Ga = (Va, Ta, Pa, Sa)
◼ Construct G’ = (V’, T’, P’, S) for s(L)
◼ P’ consists of:
◼ The productions of P, but with every occurrence of terminal “a” in their
bodies replaced by Sa.
◼ All productions in any Pa, for any a ∈ ∑
Substitution of a CFL: example
◼ Let L = language of binary palindromes such that, substitutions for 0
and 1 are defined as follows:
◼ s(0) = {a^n b^n | n ≥ 1}, s(1) = {xx, yy}
◼ Prove that s(L) is also a CFL.
Grammar for L:
S => 0S0 | 1S1 | ε
Grammar for s(L):
S => S0 S S0 | S1 S S1 | ε
S0 => aS0b | ab
S1 => xx | yy
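The construction of P’ can be sketched in Python and applied to this example. This is a minimal sketch under two assumptions that are not in the slides: grammars are dictionaries mapping a variable to a list of bodies (tuples of symbols, with the empty tuple for ε), and the variable names of G and of the grammars Ga are pairwise disjoint.

```python
def substitute(grammar, sub_grammars):
    """Build the productions of G' for s(L).

    grammar:      productions of G (variable -> list of bodies)
    sub_grammars: terminal a -> (Sa, productions of Ga)
    """
    new = {}
    # Productions of P, with each terminal a replaced by the start symbol Sa.
    for head, bodies in grammar.items():
        new[head] = [
            tuple(sub_grammars[s][0] if s in sub_grammars else s for s in body)
            for body in bodies
        ]
    # All productions of every Pa.
    for _start, prods in sub_grammars.values():
        for head, bodies in prods.items():
            new.setdefault(head, []).extend(bodies)
    return new


palindromes = {"S": [("0", "S", "0"), ("1", "S", "1"), ()]}
subs = {
    "0": ("S0", {"S0": [("a", "S0", "b"), ("a", "b")]}),
    "1": ("S1", {"S1": [("x", "x"), ("y", "y")]}),
}
for head, bodies in substitute(palindromes, subs).items():
    print(head, "=>", " | ".join(" ".join(b) if b else "ε" for b in bodies))
```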
CFLs are closed under union
Let L1 and L2 be CFLs
To show: L1 U L2 is also a CFL
◼ Let us show by using the result of Substitution
◼ Make a new language:
◼ Lnew = {a,b} s.t., s(a) = L1 and s(b) = L2
==> s(Lnew) = L1 U L2
◼ A more direct, alternative proof
◼ Let S1 and S2 be the starting variables of the grammars for L1 and L2
◼ Then, Snew => S1 | S2
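CFLs are closed under concatenation
◼ Let L1 and L2 be CFLs
◼ Make Lnew = {ab} s.t., s(a) = L1 and s(b) = L2
==> s(Lnew) = L1L2
◼ A more direct, alternative proof: Snew => S1S2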
CFLs are closed under Kleene Closure
◼ Let L be a CFL
◼ Let Lnew = {a}* and s(a) = L
◼ Then, L* = s(Lnew)
CFLs are closed under Reversal
◼ Let L be a CFL, with grammar G=(V,T,P,S)
◼ For L^R, construct G^R = (V, T, P^R, S) such that,
◼ If A ➔ α is in P, then:
◼ A ➔ α^R is in P^R
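The direct grammar constructions for union, concatenation, Kleene closure and reversal are short enough to write down in Python. The sketch below assumes grammars stored as dictionaries (variable mapped to a list of bodies, each a tuple of symbols, with the empty tuple for ε), disjoint variable names in the two input grammars, and a new start symbol named `Snew` that is not already in use; all of these names are illustrative.

```python
def union(g1, s1, g2, s2, new_start="Snew"):
    """Snew -> S1 | S2"""
    return {**g1, **g2, new_start: [(s1,), (s2,)]}


def concatenation(g1, s1, g2, s2, new_start="Snew"):
    """Snew -> S1 S2"""
    return {**g1, **g2, new_start: [(s1, s2)]}


def kleene_star(g, s, new_start="Snew"):
    """Snew -> Snew S | ε"""
    return {**g, new_start: [(new_start, s), ()]}


def reversal(g):
    """Reverse every production body: A -> α becomes A -> α^R."""
    return {
        head: [tuple(reversed(body)) for body in bodies]
        for head, bodies in g.items()
    }


g1 = {"S1": [("a", "S1", "b"), ("a", "b")]}   # { a^n b^n | n >= 1 }
g2 = {"S2": [("c", "S2"), ("c",)]}            # { c^n | n >= 1 }
print(union(g1, "S1", g2, "S2"))
print(reversal(g1))
```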
CFLs are not closed under Intersection
◼ Existential proof:
◼ L1 = {0^n 1^n 2^i | n ≥ 1, i ≥ 1}
◼ L2 = {0^i 1^n 2^n | n ≥ 1, i ≥ 1}
◼ Both L1 and L2 are CFLs
◼ Grammars?
◼ But L1 ∩ L2 cannot be a CFL
◼ Why? L1 ∩ L2 = {0^n 1^n 2^n | n ≥ 1}, which fails the pumping lemma in the same way as {a^m b^m c^m | m > 0}
◼ We have an example where the intersection of two CFLs is not a CFL.
◼ Therefore, CFLs are not closed under intersection.
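CFLs are not closed under complementation
◼ Follows from the fact that CFLs are not closed under
intersection
◼ L1 ∩ L2 = (L1' U L2')', where ' denotes complement
◼ CFLs are closed under union, so closure under complementation would imply closure under intersection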
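CFLs are not closed under difference
◼ Follows from the fact that CFLs are not closed
under complementation
◼ The complement of L is ∑* - L, and ∑* is a CFL, so closure under difference would imply closure under complementation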
• Decision Properties of CFLs
Summary of Decision Properties
• As usual, when we talk about “a CFL” we really mean
“a representation for the CFL”, e.g., a CFG or a PDA
accepting by final state or empty stack.
• There are algorithms to decide if:
1. String w is in CFL L.
2. CFL L is empty.
3. CFL L is infinite.
Non-Decision Properties
• Many questions that can be decided for regular sets cannot
be decided for CFL’s.
• Example: Are two CFL’s the same?
• Example: Are two CFL’s disjoint?
• How would you do that for regular languages?
• Need theory of Turing machines and decidability to prove
no algorithm exists.
Testing Emptiness
• We already did this.
• We learned to eliminate variables that generate no
terminal string.
• If the start symbol is one of these, then the CFL is
empty; otherwise not.
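The emptiness test reduces to computing the set of generating symbols and checking whether the start symbol is in it. A minimal Python sketch, assuming the grammar is a dictionary from variables to lists of bodies (tuples of symbols) and with an illustrative function name:

```python
def is_empty(productions, terminals, start):
    """L(G) is empty exactly when the start symbol generates no terminal string."""
    generating = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in generating and any(
                all(sym in generating for sym in body) for body in bodies
            ):
                generating.add(head)
                changed = True
    return start not in generating


# B -> Bb never terminates, so S -> AB generates nothing.
print(is_empty({"S": [("A", "B")], "A": [("a",)], "B": [("B", "b")]}, {"a", "b"}, "S"))  # True
# S -> a gives the string a.
print(is_empty({"S": [("A", "B"), ("a",)], "A": [("b",)]}, {"a", "b"}, "S"))  # False
```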
Testing Membership
• Want to know if string w is in L(G).
• Assume G is in CNF.
• Or convert the given grammar to CNF.
• w = ε is a special case, solved by testing if the start symbol is
nullable.
• Algorithm (CYK) is a good example of dynamic programming
and runs in time O(n^3), where n = |w|.
CYK Algorithm
• Let w = a1…an.
• We construct an n-by-n triangular array of sets of variables.
• Xij = {variables A | A =>* ai…aj}.
• Induction on j–i+1.
• The length of the derived string.
• Finally, ask if S is in X1n.
• Basis: Xii = {A | A -> ai is a production}.
• Induction: Xij = {A | there is a production A -> BC and an
integer k, with i ≤ k < j, such that B is in Xik and C is in Xk+1,j}.
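Example: CYK Algorithm
Grammar: S -> AB, A -> BC | a, B -> AC | b, C -> a | b
String w = ababa
X15={A}
X14={B,S} X25={A}
X13={A} X24={B,S} X35={A}
X12={B,S} X23={A} X34={B,S} X45={A}
X11={A,C} X22={B,C} X33={A,C} X44={B,C} X55={A,C}
For X13, the split X11 X23 = {A,C}{A} yields nothing; the split X12 X33 = {B,S}{A,C} yields A through A -> BC.
S is not in X15, so ababa is not in L(G).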
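The table for this example can be reproduced with a short Python sketch of CYK. The representation is an assumption made for illustration: the CNF grammar is a dictionary from variables to lists of bodies (tuples of one terminal or two variables), and the table is a dictionary keyed by the 1-based pair (i, j) used on the slides.

```python
def cyk(word, productions, start):
    """Return (accepted, table) where table[(i, j)] is the set Xij."""
    n = len(word)
    table = {}
    # Basis: Xii = {A | A -> ai is a production}
    for i in range(1, n + 1):
        table[(i, i)] = {
            head
            for head, bodies in productions.items()
            if (word[i - 1],) in bodies
        }
    # Induction on the length j - i + 1 of the derived substring.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):             # split point, i <= k < j
                for head, bodies in productions.items():
                    for body in bodies:
                        if (
                            len(body) == 2
                            and body[0] in table[(i, k)]
                            and body[1] in table[(k + 1, j)]
                        ):
                            cell.add(head)
            table[(i, j)] = cell
    return n > 0 and start in table[(1, n)], table


grammar = {
    "S": [("A", "B")],
    "A": [("B", "C"), ("a",)],
    "B": [("A", "C"), ("b",)],
    "C": [("a",), ("b",)],
}
accepted, table = cyk("ababa", grammar, "S")
print(accepted)                 # False
print(sorted(table[(1, 5)]))    # ['A']
```

The three nested loops over length, start position and split point give the cubic dependence on |w|; the inner scan over productions adds a factor that depends only on the grammar.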
Testing Infiniteness
“Undecidable” problems for CFL
◼ Is a given CFG G ambiguous?
◼ Is a given CFL inherently ambiguous?
◼ Is the intersection of two CFLs empty?
◼ Are two CFLs the same?
◼ Is a given L(G) equal to ∑*?
Main Reference
1. Normal Forms for Context-Free Grammars
2. The Pumping Lemma for Context-Free Languages
3. Closure Properties of Context-Free Languages
4. Decision Properties of CFLs
(Introduction to Automata Theory, Languages, and Computation
(2013) Global Edition 3rd Edition)
Additional References
https://www.csa.iisc.ac.in/~deepakd/atc-2016/Seminar-CSG.pdf
https://www3.nd.edu/~cpennycu/2019/assets/fall/TOC/09%20Chomsky%20Normal%20Form.pdf
https://www.cse.iitd.ac.in/~naveen/courses/COL352/slides/cfl5.ppt
Thank You