Chapter 3

Context Free Languages


3.1. Context-free languages
3.1.1. Sentential forms
3.2. Parsing and ambiguity
3.3. Derivation tree or parse tree
3.3.1. Leftmost and rightmost derivations
3.4. Simplification of context-free grammars
3.4.1. Methods for transforming grammars
3.4.2. Chomsky's hierarchy of grammars

Formal Definition of a CFG
• There is a finite set of symbols that form the
strings, i.e. there is a finite alphabet. The
alphabet symbols are called terminals.
• There is a finite set of variables, sometimes
called non-terminals or syntactic categories. Each
variable represents a language (i.e. a set of
strings).
– In the palindrome example, the only variable is P.
• One of the variables is the start symbol. Other
variables may exist to help define the language.
• There is a finite set of productions or production
rules that represent the recursive definition of the
language. Each production:
1. Has a single variable that is being defined, to the left of
the production symbol
2. Has the production symbol →
3. Has a string of zero or more terminals or variables, called
the body of the production. To form strings we can
substitute the body of a production for its variable
wherever that variable appears.
• A CFG is written G = (V, T, P, S), where:
– V is the set of variables
– T is the set of terminals
– P is the set of production rules
– S is the start symbol.
• CFGs derive their name from the fact that the substitution
of the variable on the left of a production can be made
any time that variable appears in a sentential form. It
does not depend on the symbols in the rest of the
sentential form (the context). This feature is a
consequence of allowing only a single variable on the
left side of each production.

Sample CFG
1. E → I // Expression is an identifier
2. E → E+E // Add two expressions
3. E → E*E // Multiply two expressions
4. E → (E) // Add parentheses
5. I → L // Identifier is a Letter
6. I → ID // Identifier + Digit
7. I → IL // Identifier + Letter
8. D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 // Digits
9. L → a | b | c | … | A | B | … | Z // Letters

Note: Identifiers are regular; they could be described as (letter)(letter + digit)*

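The note that identifiers are regular can be illustrated with a regular expression. The sketch below is a minimal illustration, assuming the letters are the ASCII letters a-z and A-Z; the names IDENTIFIER and is_identifier are hypothetical and not part of the slides.

```python
import re

# (letter)(letter + digit)* written as a Python regular expression.
# Assumption: "letter" means an ASCII letter, "digit" means 0-9.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_identifier(text):
    """Return True if text is generated by the variable I of the sample CFG."""
    return IDENTIFIER.fullmatch(text) is not None
```

For instance, is_identifier("x1") holds, while is_identifier("1x") does not, because productions 5 to 7 always start an identifier with a letter.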
Context-Free Grammar: G = (V, T, S, P)

All productions in P are of the form

A → s

where A is a variable and s is a string of variables and terminals.

Example of Context-Free Grammar
S → aSb | λ

G = (V, T, S, P)
V = {S} (variables)
T = {a, b} (terminals)
S is the start variable
P = {S → aSb, S → λ} (productions)

Language of a Grammar:

For a grammar G with start variable S:

L(G) = {w : S ⇒* w, w ∈ T*}

where w is a string of terminals or λ.

Example:

Context-free grammar G: S → aSb | λ

L(G) = {a^n b^n : n ≥ 0}

since there is a derivation S ⇒* a^n b^n for any n ≥ 0.

Context-Free Language definition:
A language L is context-free if there is a context-free grammar G with L = L(G).

Example:
L = {a^n b^n : n ≥ 0}
is a context-free language, since the context-free grammar G:
S → aSb | λ
generates L(G) = L.

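To see which strings a small grammar generates, one can enumerate leftmost derivations up to a length bound. The following is only a sketch under stated assumptions: variables are the single-character keys of the dictionary, the empty string "" stands for λ, and the cap on sentential-form length is a heuristic chosen here, not something taken from the slides. The function name generate is hypothetical.

```python
from collections import deque

def generate(productions, start, max_len):
    """Enumerate terminal strings of length <= max_len derivable from start.

    productions maps each variable (one character) to a list of bodies;
    the body "" stands for lambda.
    """
    results = set()
    seen = {start}
    queue = deque([start])
    max_form_len = 2 * max_len + 2  # heuristic cap so the search terminates
    while queue:
        form = queue.popleft()
        # Find the leftmost variable in the sentential form, if any.
        idx = next((i for i, ch in enumerate(form) if ch in productions), None)
        if idx is None:
            results.add(form)  # only terminals remain
            continue
        for body in productions[form[idx]]:
            new_form = form[:idx] + body + form[idx + 1:]
            terminals = sum(1 for ch in new_form if ch not in productions)
            if terminals > max_len or len(new_form) > max_form_len:
                continue  # too long to yield a string within the bound
            if new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results
```

Calling generate({"S": ["aSb", ""]}, "S", 6) collects the strings a^n b^n for n from 0 to 3, matching L(G) = {a^n b^n : n ≥ 0} restricted to length at most 6.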
Another Example

Context-free grammar G:
S → aSa | bSb | λ
Example derivations:
S ⇒ aSa ⇒ abSba ⇒ abba
S ⇒ aSa ⇒ abSba ⇒ abaSaba ⇒ abaaba

L(G) = {w w^R : w ∈ {a, b}*}
Palindromes of even length

Another Example
Context-free grammar G:
S → aSb | SS | λ
Example derivations:
S ⇒ SS ⇒ aSbS ⇒ abS ⇒ ab
S ⇒ SS ⇒ aSbS ⇒ abS ⇒ abaSb ⇒ abab

L(G) = {w : n_a(w) = n_b(w), and n_a(v) ≥ n_b(v) in any prefix v}

Describes matched parentheses, with a = ( and b = ): () ((())) (())

Derivation Order

Consider the following example grammar with 5 productions:

1. S → AB
2. A → aaA
3. A → λ
4. B → Bb
5. B → λ

Leftmost derivation of the string aab (the production used is shown in parentheses):

S ⇒(1) AB ⇒(2) aaAB ⇒(3) aaB ⇒(4) aaBb ⇒(5) aab

At each step, we substitute the leftmost variable.

Rightmost derivation of the string aab:

S ⇒(1) AB ⇒(4) ABb ⇒(5) Ab ⇒(2) aaAb ⇒(3) aab

At each step, we substitute the rightmost variable.

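The two derivation orders can be replayed mechanically. The sketch below assumes the five numbered productions of the example grammar, single-character variables, and "" for λ; the names PRODUCTIONS and derive are hypothetical helpers for illustration.

```python
# Numbered productions of the example grammar; "" stands for lambda.
PRODUCTIONS = {
    1: ("S", "AB"),
    2: ("A", "aaA"),
    3: ("A", ""),
    4: ("B", "Bb"),
    5: ("B", ""),
}

def derive(start, productions, steps, leftmost=True):
    """Apply the numbered productions in order, always rewriting the
    leftmost (or rightmost) variable. Returns all sentential forms."""
    variables = {head for head, _ in productions.values()}
    form = start
    forms = [form]
    for number in steps:
        head, body = productions[number]
        positions = [i for i, ch in enumerate(form) if ch in variables]
        if not positions:
            raise ValueError("no variable left to rewrite")
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != head:
            raise ValueError(f"production {number} does not apply here")
        form = form[:pos] + body + form[pos + 1:]
        forms.append(form)
    return forms
```

With steps [1, 2, 3, 4, 5] and leftmost=True the function retraces the leftmost derivation; with steps [1, 4, 5, 2, 3] and leftmost=False it retraces the rightmost one. Both end in aab.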
Derivation Trees / Parse Trees
Consider the same example grammar:
S → AB    A → aaA | λ    B → Bb | λ
And a derivation of aab:
S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb ⇒ aab
Definition:
• A parse tree for a context-free grammar G = (V, T, S, P) is
a tree whose nodes are labeled by elements of V ∪ T ∪ {λ} and
that satisfies the following conditions.
• The root is labeled by the start symbol S.
• Each interior node is labeled by a non-terminal.
• Each leaf is labeled by a terminal symbol or by λ.

S → AB    A → aaA | λ    B → Bb | λ

The derivation tree is built one derivation step at a time; the yield is the string of leaves read from left to right:

S ⇒ AB                                  yield: AB
S ⇒ AB ⇒ aaAB                           yield: aaAB
S ⇒ AB ⇒ aaAB ⇒ aaABb                   yield: aaABb
S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb            yield: aaλBb = aaBb
S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb ⇒ aab      yield: aaλλb = aab

Derivation tree (parse tree):

        S
       / \
      A   B
     /|\  |\
    a a A B b
        | |
        λ λ

Sometimes, derivation order doesn't matter
Leftmost derivation:
S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
Rightmost derivation:
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab

Both give the same derivation tree:

        S
       / \
      A   B
     /|\  |\
    a a A B b
        | |
        λ λ

Ambiguity

Grammar for mathematical expressions

E → E + E | E * E | (E) | a

Example string:
(a + a) * a + (a + a * (a + a))

Here a denotes any number.

E → E + E | E * E | (E) | a

A leftmost derivation for a + a * a:

E ⇒ E + E ⇒ a + E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a

      E
    / | \
   E  +  E
   |   / | \
   a  E  *  E
      |     |
      a     a

Another leftmost derivation for a + a * a:

E ⇒ E * E ⇒ E + E * E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a

        E
      / | \
     E  *  E
   / | \   |
  E  +  E  a
  |     |
  a     a

So there are two derivation trees for a + a * a.

Take a = 2:

a + a * a = 2 + 2 * 2

Compute the expression result using each tree:

Good tree (+ at the root): 2 + (2 * 2) = 2 + 4 = 6
Bad tree (* at the root): (2 + 2) * 2 = 4 * 2 = 8

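The effect of the two trees on the computed value can be checked with a few lines of code. This is a minimal sketch: each tree is written as a nested tuple (operator, left, right), a representation chosen here for illustration, and evaluate, good_tree and bad_tree are hypothetical names.

```python
def evaluate(tree):
    """Evaluate an expression tree given as a number or (op, left, right)."""
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    left_value = evaluate(left)
    right_value = evaluate(right)
    if op == "+":
        return left_value + right_value
    if op == "*":
        return left_value * right_value
    raise ValueError(f"unknown operator: {op}")

# Tree with + at the root: 2 + (2 * 2)
good_tree = ("+", 2, ("*", 2, 2))
# Tree with * at the root: (2 + 2) * 2
bad_tree = ("*", ("+", 2, 2), 2)
```

Evaluating good_tree gives 6 and evaluating bad_tree gives 8, so the same string yields different results depending on which derivation tree is used.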
Two different derivation trees may cause problems in applications which
use the derivation trees:

• Evaluating expressions
• In general, in compilers for programming languages

Ambiguous Grammar:
A context-free grammar G is ambiguous
if there is a string w ∈ L(G) which has:

two different derivation trees
or
two leftmost derivations

(Two different derivation trees give two different leftmost derivations and vice versa.)

Example: E → E + E | E * E | (E) | a

This grammar is ambiguous since the string a + a * a has two derivation trees:
one with + at the root and one with * at the root.

This grammar is ambiguous also because the string a + a * a has two leftmost derivations:

E ⇒ E + E ⇒ a + E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a

E ⇒ E * E ⇒ E + E * E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a

Another ambiguous grammar:

IF_STMT → if EXPR then STMT
        | if EXPR then STMT else STMT

Variables: IF_STMT, EXPR, STMT
Terminals: if, then, else

Very common piece of grammar in programming languages

if expr1 then if expr2 then stmt1 else stmt2

Two derivation trees:

Tree 1 (else belongs to the inner if):
IF_STMT
  if expr1 then STMT
                  |
                  if expr2 then stmt1 else stmt2

Tree 2 (else belongs to the outer if):
IF_STMT
  if expr1 then STMT else stmt2
                  |
                  if expr2 then stmt1

In general, ambiguity is bad
and we want to remove it.

Sometimes it is possible to find
a non-ambiguous grammar for a language.

But, in general, it is difficult to achieve this.

A successful example:

Ambiguous Grammar:
E → E + E
E → E * E
E → (E)
E → a

Equivalent Non-Ambiguous Grammar:
E → E + T | T
T → T * F | F
F → (E) | a

The second grammar generates the same language.

E → E + T | T
T → T * F | F
F → (E) | a

E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ a + T ⇒ a + T * F
  ⇒ a + F * F ⇒ a + a * F ⇒ a + a * a

Unique derivation tree for a + a * a:

        E
      / | \
     E  +  T
     |   / | \
     T  T  *  F
     |  |     |
     F  F     a
     |  |
     a  a

An unsuccessful example:

L = {a^n b^n c^m} ∪ {a^n b^m c^m},  n, m ≥ 0

L is inherently ambiguous:
every grammar that generates this
language is ambiguous.

Example (ambiguous) grammar for L:

S → S1 | S2
S1 → S1 c | A
S2 → a S2 | B
A → aAb | λ
B → bBc | λ

The string a^n b^n c^n ∈ L
always has two different derivation trees
(for any grammar).

For example, with the grammar above, one tree begins S ⇒ S1 ⇒ S1 c
and the other begins S ⇒ S2 ⇒ a S2.

Simplifications of Context-Free Grammars

• To simplify a context-free grammar, apply the following steps:
Step 1: Remove nullable variables.
Step 2: Remove unit productions: any
production of the context-free grammar
of the form A → B, where A and B are
elements of V, is called a unit
production.
Step 3: Remove useless variables: a variable
is useless if either it cannot be reached from the
start symbol or it cannot derive a terminal
string.

A Substitution Rule

Original grammar:
S → aB
A → aaA
A → abBc
B → aA
B → b

Substitute B → b to obtain an equivalent grammar:
S → aB | ab
A → aaA
A → abBc | abbc
B → aA

Substitute B → aA to obtain an equivalent grammar:
S → aB | ab | aaA
A → aaA
A → abBc | abbc | abaAc

In general:
A → xBz
B → y1

Substitute B → y1 to obtain the equivalent grammar:
A → xBz | x y1 z

Nullable Variables

λ-production: A → λ

Nullable variable: A ⇒ … ⇒ λ

Removing Nullable Variables

Example Grammar:

S → aMb
M → aMb
M → λ

M is a nullable variable.

Substitute M → λ to obtain the final grammar:

S → aMb
S → ab
M → aMb
M → ab

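The substitution of λ for nullable variables can be sketched in code. The sketch below assumes single-character symbols, a dictionary from each variable to its list of bodies, and "" for λ; the function names are hypothetical. It drops λ itself from the language, which is harmless for the example because λ is not in L(G).

```python
from itertools import product

def nullable_variables(grammar):
    """Return the set of variables A with A =>* lambda."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A body is nullable if every symbol in it is nullable
            # (trivially true for the empty body "").
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable

def remove_lambda_productions(grammar):
    """Return an equivalent grammar (up to lambda) without lambda-productions."""
    nullable = nullable_variables(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # Each nullable symbol may either be kept or dropped.
            options = [(sym, "") if sym in nullable else (sym,) for sym in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate:  # skip the empty body
                    new_bodies.add(candidate)
        result[head] = new_bodies
    return result
```

Applied to S → aMb, M → aMb | λ, the result has the bodies aMb and ab for both S and M, which matches the final grammar on the slide.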
Unit-Productions

Unit production: A → B
(a single variable on both sides)

Removing Unit Productions

Observation: a production A → A is removed immediately.

Example Grammar:

S → aA
A → a
A → B
B → A
B → bb

Substitute A → B:

S → aA | aB
A → a
B → A | B
B → bb

Remove B → B:

S → aA | aB
A → a
B → A
B → bb

Substitute B → A:

S → aA | aB | aA
A → a
B → bb

Remove repeated productions to obtain the final grammar:

S → aA | aB
A → a
B → bb

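Unit productions can also be removed with the common textbook unit-closure method, which is a different technique from the step-by-step substitution shown on the slides. The sketch below assumes single-character variables that are exactly the keys of the dictionary; remove_unit_productions is a hypothetical name.

```python
def remove_unit_productions(grammar):
    """Return an equivalent grammar without unit productions.

    grammar maps each variable to a list of bodies (strings).
    A body is a unit production if it is exactly one variable.
    """
    variables = set(grammar)
    result = {}
    for head in variables:
        # Collect every variable reachable from head by unit productions only.
        reach = {head}
        stack = [head]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if body in variables and body not in reach:
                    reach.add(body)
                    stack.append(body)
        # head inherits all non-unit bodies of the variables it reaches.
        result[head] = {
            body
            for target in reach
            for body in grammar[target]
            if body not in variables
        }
    return result
```

For the example grammar this yields S → aA, A → a | bb, B → a | bb. The productions differ in shape from the slide's final grammar, but both grammars generate the same language {aa, abb}.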
Useless Productions

S → aSb
S → λ
S → A
A → aA    (useless production)

Some derivations never terminate:

S ⇒ A ⇒ aA ⇒ aaA ⇒ … ⇒ aa…aA ⇒ …

Another grammar:

S → A
A → aA
A → λ
B → bA    (useless production: not reachable from S)

In general:

if S ⇒ … ⇒ xAy ⇒ … ⇒ w, where w ∈ L(G) contains only terminals,

then variable A is useful;

otherwise, variable A is useless.

A production A → x is useless
if any of its variables is useless.

S → aSb
S → λ
S → A     (useless production)
A → aA    (useless variable A, useless production)
B → C     (useless variable B, useless production)
C → D     (useless variable C, useless production)

Removing Useless Productions

Example Grammar:

S → aS | A | C
A → a
B → aa
C → aCb

First: find all variables that can produce
strings with only terminals.

Round 1: {A, B}
Round 2: {A, B, S}    (because S → A)

Keep only the variables
that produce terminal strings: {A, B, S}
(the remaining variables are useless).

Remove the useless productions:

S → aS | A
A → a
B → aa

Second: find all variables
reachable from S.

Use a dependency graph:

S → A        B (not reachable)

Keep only the variables
reachable from S
(the remaining variables are useless).

Remove the useless productions to obtain the final grammar:

S → aS | A
A → a

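The two phases above can be sketched as follows. This is an illustrative sketch, assuming single-character symbols with uppercase letters as variables and everything else as terminals; remove_useless is a hypothetical name.

```python
def remove_useless(grammar, start):
    """Remove useless variables and productions in two phases.

    grammar maps each variable to a list of bodies (strings).
    Assumption: uppercase characters are variables, others are terminals.
    """
    def usable(body, allowed):
        return all(not sym.isupper() or sym in allowed for sym in body)

    # Phase 1: variables that can produce strings with only terminals.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in generating and any(usable(b, generating) for b in bodies):
                generating.add(head)
                changed = True
    pruned = {
        head: [b for b in bodies if usable(b, generating)]
        for head, bodies in grammar.items()
        if head in generating
    }

    # Phase 2: variables reachable from the start symbol.
    reachable = set()
    stack = [start] if start in pruned else []
    while stack:
        variable = stack.pop()
        if variable in reachable:
            continue
        reachable.add(variable)
        for body in pruned[variable]:
            stack.extend(s for s in body if s in pruned and s not in reachable)
    return {head: bodies for head, bodies in pruned.items() if head in reachable}
```

On the example grammar, phase 1 discards C and phase 2 discards B, leaving S → aS | A and A → a. The order of the phases matters: running reachability first would keep productions that only become unreachable after the non-generating variables are removed.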
Normal Forms for Context-Free Grammars

Chomsky Normal Form (CNF)
Each production has the form:
A → BC or A → a

where A, B and C are variables and a is a terminal.

Steps to convert to CNF:

• Remove the unit productions and λ-productions, if any.
• Replace the terminals in right-hand sides of length two
or more with new variables.
• Limit the number of variables on the right-hand side of
productions to two.

Examples:

Chomsky Normal Form:
S → AS
S → a
A → SA
A → b

Not Chomsky Normal Form:
S → AS
S → AAS
A → SA
A → aa

Conversion to Chomsky Normal Form

Example (not in Chomsky Normal Form):

S → ABa
A → aab
B → Ac

Introduce variables for terminals: Ta, Tb, Tc

S → ABTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c

Introduce intermediate variable: V1

S → AV1
V1 → BTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c

Introduce intermediate variable: V2

S → AV1
V1 → BTa
A → TaV2
V2 → TaTb
B → ATc
Ta → a
Tb → b
Tc → c

This is the final grammar in Chomsky Normal Form, equivalent to the initial grammar
S → ABa, A → aab, B → Ac.

In general:

From any context-free grammar
(which doesn't produce λ)
not in Chomsky Normal Form

we can obtain
an equivalent grammar
in Chomsky Normal Form.

The Procedure
First remove: nullable variables, unit productions.

Then, for every terminal symbol a:

Add the production Ta → a, where Ta is a new variable.

In productions with right-hand sides of length two or more: replace a with Ta.

Replace any production A → C1 C2 … Cn
with A → C1 V1
V1 → C2 V2
…
V(n-2) → C(n-1) Cn
New intermediate variables: V1, V2, …, V(n-2)
Theorem: For any context-free grammar
(which doesn't produce λ)
there is an equivalent grammar
in Chomsky Normal Form.

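The last two steps of the procedure (terminal variables and intermediate variables) can be sketched in code. The sketch assumes that nullable variables and unit productions have already been removed, that terminals are single lowercase letters, and that the names Ta, Tb, … and V1, V2, … are not already used by the grammar; to_cnf is a hypothetical name.

```python
def to_cnf(productions):
    """Convert productions to Chomsky Normal Form.

    productions is a list of (head, body) pairs, body being a list of symbols.
    Assumptions: no lambda or unit productions; terminals are lowercase letters.
    """
    terminal_vars = {}

    # Step 1: in bodies of length >= 2, replace each terminal a with Ta.
    step1 = []
    for head, body in productions:
        if len(body) >= 2:
            new_body = []
            for sym in body:
                if sym.islower():
                    terminal_vars.setdefault(sym, "T" + sym)
                    new_body.append(terminal_vars[sym])
                else:
                    new_body.append(sym)
            body = new_body
        step1.append((head, list(body)))

    # Step 2: break bodies longer than two with intermediate variables V1, V2, ...
    result = []
    counter = 0
    for head, body in step1:
        while len(body) > 2:
            counter += 1
            new_var = f"V{counter}"
            result.append((head, [body[0], new_var]))
            head, body = new_var, body[1:]
        result.append((head, body))

    # Finally add the productions Ta -> a for the introduced variables.
    result.extend((var, [t]) for t, var in sorted(terminal_vars.items()))
    return result
```

Run on S → ABa, A → aab, B → Ac, the sketch produces S → AV1, V1 → BTa, A → TaV2, V2 → TaTb, B → ATc, Ta → a, Tb → b, Tc → c, the same grammar as in the worked example.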
Greibach Normal Form (GNF)
In Chomsky Normal Form (CNF), restrictions are
put on the length of the right sides of productions,
whereas in Greibach Normal Form (GNF), restrictions
are put on the positions in which terminals and
variables can appear.

All productions have the form:

A → a V1 V2 … Vk,  k ≥ 0

where a is a terminal symbol and V1, …, Vk are variables.

Examples:

Greibach Normal Form:
S → cAB
A → aA | bB | b
B → b

Not Greibach Normal Form:
S → abSb
S → aa

Conversion to Greibach Normal Form:

S → abSb
S → aa

becomes

S → aTbSTb
S → aTa
Ta → a
Tb → b

which is in Greibach Normal Form.

Theorem: For any context-free grammar
(which doesn't produce λ)
there is an equivalent grammar
in Greibach Normal Form.

Observations
• Greibach normal forms are very good
for parsing.
• It is hard to find the Greibach normal
form of an arbitrary context-free grammar.

Closure Properties of Context-Free
Languages
• A set is closed under an operation if and only if
applying the operation to any two elements of the set produces
another element of the set.
• If the operation can produce an element outside the set, then
the set is not closed under that operation.
• Closure thus describes what happens when the operation is
applied to any two elements of the set: the result is also
included in the set.

Context-free languages are closed under:
Union
Concatenation
Star Closure
Context-free languages are not closed under:
Intersection
Complementation
Every regular language is a context-free
language.

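Closure under union, concatenation and star closure follows from simple grammar constructions that add a new start variable. The sketch below shows those standard constructions; it assumes the two grammars use disjoint variable names, that the new start name "S0" is unused, that bodies are lists of symbols, and that the empty list stands for λ. All function names are hypothetical.

```python
def union(g1, s1, g2, s2, start="S0"):
    """Grammar for L(g1) ∪ L(g2): S0 -> s1 | s2."""
    return {start: [[s1], [s2]], **g1, **g2}

def concatenation(g1, s1, g2, s2, start="S0"):
    """Grammar for L(g1) L(g2): S0 -> s1 s2."""
    return {start: [[s1, s2]], **g1, **g2}

def star(g1, s1, start="S0"):
    """Grammar for L(g1)*: S0 -> s1 S0 | lambda."""
    return {start: [[s1, start], []], **g1}

# Example grammars: A generates a^n b^n, B generates c^m.
G1 = {"A": [["a", "A", "b"], []]}
G2 = {"B": [["c", "B"], []]}
```

For example, union(G1, "A", G2, "B") adds only the productions S0 → A | B. No comparable construction exists for intersection or complementation, in line with the list above.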
Pumping Lemma for CFL
The pumping lemma for CFLs is used to prove that
certain sets are not context-free.
Every CFL satisfies the properties stated in the lemma.
However, if a language satisfies all the properties
of the pumping lemma for CFLs, we cannot conclude that
the language is context-free.
The reverse direction does hold: if a language violates the
properties, we can conclude that the language is not
context-free.

Pumping Lemma for CFL: Let L be a
CFL. Then, we can find a natural number n
such that
1. Every z ∈ L with |z| ≥ n can be written as
z = uvwxy, for some strings u, v, w, x, y.
2. |vx| ≥ 1
3. |vwx| ≤ n
4. u v^k w x^k y ∈ L for all k ≥ 0
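The conditions of the lemma can be explored by brute force on a single string. The sketch below is only an illustration, not a proof: it fixes one string z and one value of n, and tests exponents only up to a small bound. The language {a^n b^n c^n}, the membership functions and the name has_pumpable_split are assumptions introduced here.

```python
def is_anbncn(s):
    """Membership test for {a^n b^n c^n : n >= 0}."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

def is_anbn(s):
    """Membership test for {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

def has_pumpable_split(z, n, member, max_exp=3):
    """Search for z = uvwxy with |vx| >= 1 and |vwx| <= n such that
    u v^e w x^e y is in the language for every e in 0..max_exp."""
    length = len(z)
    for i in range(length + 1):                      # u = z[:i]
        for j in range(i, length + 1):               # v = z[i:j]
            for k in range(j, length + 1):           # w = z[j:k]
                # x = z[k:l]; keeping l <= i + n enforces |vwx| <= n
                for l in range(k, min(length, i + n) + 1):
                    u, v, w, x, y = z[:i], z[i:j], z[j:k], z[k:l], z[l:]
                    if len(v) + len(x) < 1:
                        continue
                    if all(member(u + v * e + w + x * e + y)
                           for e in range(max_exp + 1)):
                        return True
    return False
```

For z = aaaabbbbcccc and n = 4 no split survives, which is the pattern a pumping-lemma argument for {a^n b^n c^n} exploits: a window vwx of length at most n cannot touch all three letters. For z = aaabbb in {a^n b^n} with n = 2 a split exists, for example v = a and x = b around the middle.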
