Lecture 9

Uploaded by Moumer Zaryab
Copyright © All Rights Reserved

Context-Free Languages

Context-free grammar

• This is a different model for describing languages


• The language is specified by productions (substitution rules) that tell how strings can be obtained, e.g.

A → 0A1
A → B
B → #

A, B are variables; 0, 1, # are terminals; A is the start variable.

• Using these rules, we can derive strings like this:

A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
Programming languages
• Context-free grammars are also used to describe (parts of)
programming languages
• For instance, expressions like (2 + 3) * 5 or 3 + 8 + 2 * 7 can be described by the CFG

<expr> → <expr> + <expr>
<expr> → <expr> * <expr>
<expr> → (<expr>)
<expr> → 0
<expr> → 1
…
<expr> → 9

Variables: <expr>
Terminals: +, *, (, ), 0, 1, …, 9
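To see that these strings really are generated by the grammar, one can test membership directly on the rules. A minimal Python sketch (the name `is_expr` is hypothetical), assuming spaces are layout only and not terminals: it asks, for every substring, whether some production of <expr> can produce it, memoizing the answers.

```python
from functools import lru_cache

def is_expr(text):
    """True if text can be derived from <expr>; spaces are dropped (assumption)."""
    s = text.replace(" ", "")

    @lru_cache(maxsize=None)
    def derives(i, j):
        # Can <expr> derive the non-empty substring s[i:j]?
        if j - i == 1:
            return s[i] in "0123456789"              # <expr> → 0 | 1 | … | 9
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")" and derives(i + 1, j - 1):
            return True                              # <expr> → (<expr>)
        for k in range(i + 1, j - 1):                # <expr> → <expr> op <expr>
            if s[k] in "+*" and derives(i, k) and derives(k + 1, j):
                return True
        return False

    return len(s) > 0 and derives(0, len(s))

print(is_expr("(2 + 3) * 5"), is_expr("3 + 8 + 2 * 7"), is_expr("23"))
```

Note that "23" is rejected: as written, the grammar only has single digits as terminals.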
Motivation for studying CFGs

• Context-free grammars are essential for understanding the meaning of computer programs
code: (2 + 3) * 5
meaning: “add 2 and 3, and then multiply by 5”

• They are used in compilers


Definition of context-free grammar

• A context-free grammar (CFG) is a 4-tuple (V, T, P, S) where
• V is a finite set of variables or non-terminals
• T is a finite set of terminals (V ∩ T = ∅)
• P is a finite set of productions or substitution rules of the form A → α, where A is a symbol in V and α is a string over V ∪ T
• S is a variable in V called the start variable
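The four conditions of the definition can be checked mechanically. A minimal Python sketch; the class `CFG` and its field names are hypothetical. Productions are (head, body) pairs with the body a tuple of symbols, and the empty tuple () stands for the empty string. Finiteness holds automatically because Python sets are finite.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    """A 4-tuple (V, T, P, S); body () means the empty string (assumption)."""
    variables: frozenset
    terminals: frozenset
    productions: frozenset
    start: str

    def check(self):
        """Raise ValueError if the tuple violates the definition; else return True."""
        if self.variables & self.terminals:
            raise ValueError("V and T must be disjoint")
        if self.start not in self.variables:
            raise ValueError("the start variable must be in V")
        symbols = self.variables | self.terminals
        for head, body in self.productions:
            if head not in self.variables:
                raise ValueError(f"left side {head!r} is not a variable")
            if any(sym not in symbols for sym in body):
                raise ValueError(f"body {body!r} is not a string over V ∪ T")
        return True

# The grammar A → 0A1 | B, B → # as a 4-tuple.
g = CFG(
    variables=frozenset({"A", "B"}),
    terminals=frozenset({"0", "1", "#"}),
    productions=frozenset({("A", ("0", "A", "1")), ("A", ("B",)), ("B", ("#",))}),
    start="A",
)
print(g.check())
```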
Shorthand notation for productions

• When we have multiple productions with the same variable on the left, like

E → E + E     N → 0N
E → E * E     N → 1N
E → (E)       N → 0
E → N         N → 1

Variables: E, N
Terminals: +, *, (, ), 0, 1
Start variable: E

we can write this in shorthand as

E → E + E | E * E | (E) | N
N → 0N | 1N | 0 | 1
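Expanding the shorthand back into individual productions is a simple split on "|". A minimal Python sketch; the name `expand_shorthand` is hypothetical, and it assumes "|" is never itself a terminal.

```python
def expand_shorthand(line):
    """Split 'X → α1 | α2 | …' into the productions (X, α1), (X, α2), …
    Assumption: '|' only separates alternatives and is not a terminal."""
    head, _, rhs = line.partition("→")
    head = head.strip()
    return [(head, alt.strip()) for alt in rhs.split("|")]

print(expand_shorthand("N → 0N | 1N | 0 | 1"))
# [('N', '0N'), ('N', '1N'), ('N', '0'), ('N', '1')]
```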
Derivation
• A derivation is a sequential application of productions:

E ⇒ E * E
  ⇒ (E) * E
  ⇒ (E) * N
  ⇒ (E + E) * N
  ⇒ (E + E) * 1
  ⇒ (E + N) * 1
  ⇒ (N + N) * 1
  ⇒ (N + 1N) * 1
  ⇒ (N + 10) * 1
  ⇒ (1 + 10) * 1

α ⇒ β means β can be obtained from α with one production
α ⇒* β means β can be obtained from α after zero or more productions
Example 1

A → 0A1 | B
B → #
variables: A, B; terminals: 0, 1, #; start variable: A

• Is the string 00#11 in L?
• How about 00#111, 00#0#1#11?
• What is the language of this CFG?

L = {0ⁿ#1ⁿ : n ≥ 0}
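The answer L = {0ⁿ#1ⁿ : n ≥ 0} gives a direct membership test, which settles the three questions above. A minimal Python sketch; the name `in_L` is hypothetical.

```python
def in_L(s):
    """Membership test for L = {0^n#1^n : n ≥ 0}."""
    n, odd = divmod(len(s) - 1, 2)   # a member has length 2n + 1
    return odd == 0 and s == "0" * n + "#" + "1" * n

for w in ["00#11", "00#111", "00#0#1#11"]:
    print(w, in_L(w))
# 00#11 True
# 00#111 False
# 00#0#1#11 False
```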
Example 2

S → SS | (S) | ε   (rules 1, 2, 3)
convention: variables in uppercase, terminals in lowercase, start variable first
• Give derivations of (), (()())

S ⇒ (S)   (rule 2)
  ⇒ ()    (rule 3)

S ⇒ (S)        (rule 2)
  ⇒ (SS)       (rule 1)
  ⇒ ((S)S)     (rule 2)
  ⇒ ((S)(S))   (rule 2)
  ⇒ (()(S))    (rule 3)
  ⇒ (()())     (rule 3)

• How about ())?
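The grammar S → SS | (S) | ε generates exactly the balanced strings of parentheses, so a depth counter decides membership and answers the question about ()). A minimal Python sketch; the name `balanced` is hypothetical.

```python
def balanced(s):
    """True if s is a balanced parenthesis string (the language of S → SS | (S) | ε)."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # a ')' with no matching '('
                return False
        else:
            return False         # only ( and ) are terminals
    return depth == 0            # every '(' was closed

print(balanced("()"), balanced("(()())"), balanced("())"))  # True True False
```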
Examples: Designing CFGs

• Write a CFG for the following languages

• The language L = {aⁿbⁿcᵐdᵐ | n ≥ 0, m ≥ 0}

• The language L = {aⁿbᵐcᵐdⁿ | n ≥ 0, m ≥ 0}
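One way to sanity-check a grammar you design is to generate every string it derives up to some length and compare with a direct description of the language. A minimal Python sketch under stated assumptions: single-character symbols (uppercase = variable, "" = ε body), and a grammar whose sentential forms never contain unboundedly many variables (otherwise the search need not stop). The names `generate`, `in_L1` and `candidate` are hypothetical, and the candidate grammar for the first language is one possible answer, not the official one.

```python
import itertools
import re
from collections import deque

def generate(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    found, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        i = next((k for k, ch in enumerate(form) if ch.isupper()), None)
        if i is None:                      # no variables left: a string of L(G)
            found.add(form)
            continue
        for body in rules[form[i]]:        # expand the leftmost variable
            new = form[:i] + body + form[i + 1:]
            # Terminals are never erased, so forms with too many can be pruned.
            if sum(not ch.isupper() for ch in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

def in_L1(s):
    """Reference predicate for {a^n b^n c^m d^m | n, m ≥ 0}."""
    m = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    return bool(m) and len(m[1]) == len(m[2]) and len(m[3]) == len(m[4])

# Hypothetical candidate: S → AC, A → aAb | ε, C → cCd | ε
candidate = {"S": ["AC"], "A": ["aAb", ""], "C": ["cCd", ""]}

MAX = 6
expected = {"".join(p) for n in range(MAX + 1)
            for p in itertools.product("abcd", repeat=n) if in_L1("".join(p))}
print(generate(candidate, "S", MAX) == expected)  # True
```

Agreement up to a fixed length is evidence, not a proof, that the grammar is correct.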


Context-free versus regular
• Write a CFG for the language (0 + 1)*111
S → A111
A → ε | 0A | 1A
• Can you do so for every regular language?

Every regular language is context-free
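For the grammar S → A111, A → ε | 0A | 1A, every string of (0 + 1)*111 has an obvious leftmost derivation: A produces the prefix one symbol at a time, then A → ε. A minimal Python sketch; the name `derivation_for` is hypothetical, and Python's `re` syntax `[01]*111` stands in for the regular expression (0 + 1)*111.

```python
import re

def derivation_for(w):
    """Leftmost derivation of w from S → A111, A → ε | 0A | 1A (None if w is not in L)."""
    if not re.fullmatch(r"[01]*111", w):
        return None
    prefix = w[:-3]                          # the part generated by A
    forms = ["S", "A111"]                    # S ⇒ A111
    for k in range(1, len(prefix) + 1):
        forms.append(prefix[:k] + "A111")    # A → 0A or A → 1A
    forms.append(w)                          # A → ε
    return forms

print(" ⇒ ".join(derivation_for("0111")))  # S ⇒ A111 ⇒ 0A111 ⇒ 0111
```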


From regular to context-free
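regular expression     CFG
∅                      grammar with no rules
ε                      S → ε
a (alphabet symbol)    S → a
E1 + E2                S → S1 | S2
E1E2                   S → S1S2
E1*                    S → SS1 | ε

In all cases, S becomes the new start symbol. S1 and S2 are the start variables of the grammars for E1 and E2, whose productions (over disjoint sets of variables) are kept.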
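The table is a recursive construction, one case per form of regular expression. A minimal Python sketch under stated assumptions: regular expressions are given as nested tuples such as ("star", ("sym", "a")) for a* (the slides fix no concrete syntax), and the fresh variable names S0, S1, … and the name `regex_to_cfg` are hypothetical. Bodies are tuples of symbols, with () for ε.

```python
import itertools

def regex_to_cfg(regex):
    """Return (start, rules) following the table: one fresh start variable per node."""
    counter = itertools.count()
    rules = {}  # variable -> list of bodies (tuples of symbols)

    def build(node):
        s = f"S{next(counter)}"            # fresh variable (hypothetical naming)
        rules[s] = []
        kind = node[0]
        if kind == "empty":                # ∅: grammar with no rules
            pass
        elif kind == "eps":                # ε: S → ε
            rules[s].append(())
        elif kind == "sym":                # a: S → a
            rules[s].append((node[1],))
        elif kind == "union":              # E1 + E2: S → S1 | S2
            s1, s2 = build(node[1]), build(node[2])
            rules[s] += [(s1,), (s2,)]
        elif kind == "concat":             # E1E2: S → S1S2
            s1, s2 = build(node[1]), build(node[2])
            rules[s].append((s1, s2))
        elif kind == "star":               # E1*: S → SS1 | ε
            s1 = build(node[1])
            rules[s] += [(s, s1), ()]
        else:
            raise ValueError(f"unknown node {kind!r}")
        return s

    return build(regex), rules

print(regex_to_cfg(("star", ("sym", "a"))))
# ('S0', {'S0': [('S0', 'S1'), ()], 'S1': [('a',)]})
```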


Context-free versus regular

• Is every context-free language regular?


• No! We already saw some examples:
A → 0A1 | B
B→#
L = {0ⁿ#1ⁿ : n ≥ 0}

• This language is context-free but not regular


Parse tree
• Derivations can also be represented using parse trees

E → E + E | E - E | (E) | V
V → x | y | z

E ⇒ E + E
  ⇒ V + E
  ⇒ x + E
  ⇒ x + (E)
  ⇒ x + (E - E)
  ⇒ x + (V - E)
  ⇒ x + (y - E)
  ⇒ x + (y - V)
  ⇒ x + (y - z)

Parse tree:
E
├─ E ─ V ─ x
├─ +
└─ E
   ├─ (
   ├─ E
   │  ├─ E ─ V ─ y
   │  ├─ -
   │  └─ E ─ V ─ z
   └─ )
Definition of parse tree

• A parse tree for a CFG G is an ordered tree with labels on the nodes such that
• Every internal node is labeled by a variable
• Every leaf is labeled by a terminal or ε
• Leaves labeled by ε have no siblings
• If a node is labeled A and has children A1, …, Ak from left to right, then the rule A → A1…Ak is a production in G.
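The conditions of the definition translate into a short recursive check. A minimal Python sketch under stated assumptions: an internal node is a (label, children) tuple, a leaf is a plain string, and an ε-leaf is modelled as a node with no children; `GRAMMAR`, `is_parse_tree` and `tree_yield` are hypothetical names. It checks the parse tree of x + (y - z) and reads off its yield.

```python
GRAMMAR = {
    "E": [("E", "+", "E"), ("E", "-", "E"), ("(", "E", ")"), ("V",)],
    "V": [("x",), ("y",), ("z",)],
}

def label(node):
    return node if isinstance(node, str) else node[0]

def is_parse_tree(node, grammar=GRAMMAR):
    """Check the parse-tree conditions recursively."""
    if isinstance(node, str):
        return node not in grammar               # every leaf is a terminal
    head, children = node
    if head not in grammar:                      # internal nodes are variables
        return False
    if tuple(label(c) for c in children) not in grammar[head]:
        return False                             # A → A1…Ak must be a production
    return all(is_parse_tree(c, grammar) for c in children)

def tree_yield(node):
    """Concatenate the leaves from left to right."""
    if isinstance(node, str):
        return node
    return "".join(tree_yield(c) for c in node[1])

tree = ("E", [("E", [("V", ["x"])]), "+",
              ("E", ["(", ("E", [("E", [("V", ["y"])]), "-",
                                 ("E", [("V", ["z"])])]), ")"])])
print(is_parse_tree(tree), tree_yield(tree))  # True x+(y-z)
```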
Left derivation
• Always derive the leftmost variable first:

E ⇒ E + E
  ⇒ V + E
  ⇒ x + E
  ⇒ x + (E)
  ⇒ x + (E - E)
  ⇒ x + (V - E)
  ⇒ x + (y - E)
  ⇒ x + (y - V)
  ⇒ x + (y - z)

[Figure: parse tree for x + (y - z)]

• Corresponds to a left-to-right, depth-first (preorder) traversal of the parse tree
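The correspondence can be made concrete: keep the current sentential form as a list of terminals and not-yet-expanded tree nodes, and always replace the leftmost node by its children. A minimal Python sketch, assuming trees are (label, children) tuples with plain strings as terminal leaves; the name `leftmost_derivation` is hypothetical. Because the unexpanded nodes stay in left-to-right order, they are expanded in preorder.

```python
def leftmost_derivation(tree):
    """Sentential forms from always expanding the leftmost unexpanded node."""
    def show(items):
        return "".join(x if isinstance(x, str) else x[0] for x in items)

    items = [tree]
    forms = [show(items)]
    while True:
        i = next((k for k, x in enumerate(items) if not isinstance(x, str)), None)
        if i is None:
            return forms
        # Apply the production at this node: replace it by its children.
        items = items[:i] + list(items[i][1]) + items[i + 1:]
        forms.append(show(items))

tree = ("E", [("E", [("V", ["x"])]), "+",
              ("E", ["(", ("E", [("E", [("V", ["y"])]), "-",
                                 ("E", [("V", ["z"])])]), ")"])])
print(" ⇒ ".join(leftmost_derivation(tree)))
# E ⇒ E+E ⇒ V+E ⇒ x+E ⇒ x+(E) ⇒ x+(E-E) ⇒ x+(V-E) ⇒ x+(y-E) ⇒ x+(y-V) ⇒ x+(y-z)
```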


Ambiguity
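• A grammar is ambiguous if some strings have more than one parse tree

• Example: E → E + E | E - E | (E) | V
  V → x | y | z

Two parse trees for x + y + z:

E
├─ E ─ V ─ x
├─ +
└─ E
   ├─ E ─ V ─ y
   ├─ +
   └─ E ─ V ─ z

E
├─ E
│  ├─ E ─ V ─ x
│  ├─ +
│  └─ E ─ V ─ y
├─ +
└─ E ─ V ─ z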
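Counting parse trees makes the definition of ambiguity concrete: the grammar is ambiguous as soon as some string has a count of 2 or more. A minimal Python sketch (the name `count_parse_trees` is hypothetical), assuming single-character tokens with spaces dropped. For each substring it adds up the trees for every production of E that could have produced it.

```python
from functools import lru_cache

def count_parse_trees(text):
    """Parse trees of text in E → E + E | E - E | (E) | V, V → x | y | z."""
    s = text.replace(" ", "")            # assumption: spaces are layout only

    @lru_cache(maxsize=None)
    def count_e(i, j):
        # Number of parse trees with root E whose yield is s[i:j].
        total = 1 if j - i == 1 and s[i] in "xyz" else 0    # E → V → x | y | z
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            total += count_e(i + 1, j - 1)                  # E → (E)
        for k in range(i + 1, j - 1):                       # E → E op E, op at k
            if s[k] in "+-":
                total += count_e(i, k) * count_e(k + 1, j)
        return total

    return count_e(0, len(s)) if s else 0

for w in ["x + y + z", "x + (y - z)"]:
    print(w, count_parse_trees(w))
# x + y + z 2
# x + (y - z) 1
```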
Why ambiguity matters
• The parse tree represents the intended meaning:

[Figure: the two parse trees of x + y + z]
Tree grouping x + (y + z): “first add y and z, and then add this to x”
Tree grouping (x + y) + z: “first add x and y, and then add z to this”
Why ambiguity matters
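• Suppose we also had multiplication:

E → E + E | E - E | E * E | (E) | V
V → x | y | z

Two parse trees for x * y + z:

E
├─ E ─ V ─ x
├─ *
└─ E
   ├─ E ─ V ─ y
   ├─ +
   └─ E ─ V ─ z
“first y + z, then x *”

E
├─ E
│  ├─ E ─ V ─ x
│  ├─ *
│  └─ E ─ V ─ y
├─ +
└─ E ─ V ─ z
“first x * y, then + z”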
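With multiplication the two parse trees no longer agree on the value. A minimal Python sketch, assuming hypothetical values x = 2, y = 3, z = 4 and using abstract syntax trees (op, left, right) as a simplification of the full parse trees; `ENV`, `OPS` and `evaluate` are hypothetical names.

```python
import operator

ENV = {"x": 2, "y": 3, "z": 4}                    # hypothetical variable values
OPS = {"+": operator.add, "*": operator.mul}      # only + and * are needed here

def evaluate(tree):
    """Evaluate a variable name or an (op, left, right) tuple."""
    if isinstance(tree, str):
        return ENV[tree]
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

first_y_plus_z = ("*", "x", ("+", "y", "z"))     # “first y + z, then x *”
first_x_times_y = ("+", ("*", "x", "y"), "z")    # “first x * y, then + z”
print(evaluate(first_y_plus_z), evaluate(first_x_times_y))  # 14 10
```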


Disambiguation
• Sometimes we can rewrite the grammar to remove the ambiguity
E → E + E | E - E | E * E | (E) | V
V → x | y | z

• Rewrite grammar so * cannot be broken by +:
E → T | E + T | E - T
T → F | T * F
F → (E) | V
V → x | y | z

T stands for term: x * (y + z)
F stands for factor: x, (y + z)
A term always splits into factors
A factor is either a variable or a parenthesized expression (E)
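The rewritten grammar can be turned directly into a parser. A minimal Python sketch using recursive descent, a standard technique that is not taken from the slides: the left-recursive rules E → E + T and T → T * F are implemented as loops that fold to the left, so each variable becomes one function. Single-character tokens and the name `parse` are assumptions; the result is a nested tuple (op, left, right) or a variable name.

```python
def parse(text):
    """Parse with E → T | E + T | E - T, T → F | T * F, F → (E) | V, V → x | y | z."""
    tokens = [c for c in text if not c.isspace()]   # assumption: 1-char tokens
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1

    def parse_e():
        # E → T | E + T | E - T, read as T followed by any number of (+|-) T,
        # folding to the left so that x - y - z groups as (x - y) - z.
        nonlocal pos
        node = parse_t()
        while peek() in ("+", "-"):
            op = peek()
            pos += 1
            node = (op, node, parse_t())
        return node

    def parse_t():
        # T → F | T * F, read as F followed by any number of * F.
        nonlocal pos
        node = parse_f()
        while peek() == "*":
            pos += 1
            node = ("*", node, parse_f())
        return node

    def parse_f():
        # F → (E) | V with V → x | y | z.
        nonlocal pos
        if peek() == "(":
            expect("(")
            node = parse_e()
            expect(")")
            return node
        if peek() in ("x", "y", "z"):
            pos += 1
            return tokens[pos - 1]
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree

print(parse("x * y + z"))  # ('+', ('*', 'x', 'y'), 'z')
```

Because * is only handled inside T, a + can never split a product, which is exactly the grouping the rewritten grammar enforces.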
Disambiguation
• Example
E → T | E + T | E - T
T → F | T * F
F → (E) | V
V → x | y | z

Parse tree of x * y + z:
E
├─ E ─ T
│      ├─ T ─ F ─ V ─ x
│      ├─ *
│      └─ F ─ V ─ y
├─ +
└─ T ─ F ─ V ─ z
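For comparison, the same parse-tree counting idea can be applied to the rewritten grammar, with one memoized count per variable. A minimal Python sketch (the name `count_etf_trees` is hypothetical), assuming single-character tokens with spaces dropped. It finds exactly one parse tree for each string tried here; checking a few strings illustrates, but does not prove, that the grammar is unambiguous.

```python
from functools import lru_cache

def count_etf_trees(text):
    """Parse trees of text in E → T | E + T | E - T, T → F | T * F, F → (E) | V."""
    s = text.replace(" ", "")            # assumption: spaces are layout only

    @lru_cache(maxsize=None)
    def e(i, j):                         # E → T | E + T | E - T
        total = t(i, j)
        for k in range(i + 1, j - 1):
            if s[k] in "+-":
                total += e(i, k) * t(k + 1, j)
        return total

    @lru_cache(maxsize=None)
    def t(i, j):                         # T → F | T * F
        total = f(i, j)
        for k in range(i + 1, j - 1):
            if s[k] == "*":
                total += t(i, k) * f(k + 1, j)
        return total

    @lru_cache(maxsize=None)
    def f(i, j):                         # F → (E) | V, V → x | y | z
        if j - i == 1:
            return 1 if s[i] in "xyz" else 0
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            return e(i + 1, j - 1)
        return 0

    return e(0, len(s)) if s else 0

for w in ["x * y + z", "x + y + z", "x * (y + z)"]:
    print(w, count_etf_trees(w))
# x * y + z 1
# x + y + z 1
# x * (y + z) 1
```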
