
Context-Free Languages & Grammars (CFLs & CFGs): Reading: Chapter 5

The document discusses context-free languages and grammars (CFLs and CFGs). Some key points: - CFLs are a class of formal languages that are more expressive than regular languages and support recursive definitions using context-free grammars (CFGs). - CFGs define a language using variables, terminals, productions, and a start variable. Productions allow variables to be substituted by strings of variables and terminals. - Examples show how CFGs can generate languages of palindromes, balanced parentheses, and other recursively defined strings. Productions allow recursive substitutions that regular grammars cannot express. - Applications of CFLs/CFGs include syntax analysis in compilers, parsing markup languages like

Uploaded by

Pro Hammad

Context-Free Languages & Grammars
(CFLs & CFGs)
Reading: Chapter 5

1
Not all languages are regular
• So what happens to the languages that are not regular?
• Can we still come up with a language recognizer?
  • i.e., something that will accept (or reject) strings that belong (or do not belong) to the language?

2
Context-Free Languages
• A language class larger than the class of regular languages
• Supports a natural, recursive notation called a “context-free grammar”
• Applications:
  • Parse trees, compilers
  • XML
[Figure: nested sets, with the regular languages (FA/RE) contained in the context-free languages (PDA/CFG)]

3
An Example
• A palindrome is a word that reads the same from both ends
  • E.g., madam, redivider, malayalam, 010010010
• Let L = { w | w is a binary palindrome }
• Is L regular?
  • No.
• Proof:
  • Assume L is regular, and let w = 0^N 1 0^N (where N is the pumping-lemma constant)
  • By the pumping lemma, w can be rewritten as xyz, such that xy^k z is also in L (for any k ≥ 0)
  • But |xy| ≤ N and y ≠ ε
  • ==> y = 0^+ (y lies entirely within the leading block of 0s)
  • ==> xy^k z will NOT be in L for k = 0, because xz = 0^(N-|y|) 1 0^N has fewer 0s before the 1 than after it
  • ==> Contradiction
4
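The contradiction in the proof above can be checked mechanically for a small value of N. The following is a minimal sketch, assuming N = 5 as a stand-in for the (unknown) pumping-lemma constant; the function names are hypothetical and not part of the slides. It tries every split w = xyz with |xy| ≤ N and y ≠ ε and collects any pumped-down string xz that is still a palindrome.

```python
def is_palindrome(s: str) -> bool:
    """A string is a palindrome if it equals its own reverse."""
    return s == s[::-1]


def pumped_down_survivors(n: int) -> list:
    """For w = 0^n 1 0^n, return every xz (the k = 0 case) that is still a palindrome.

    Only splits allowed by the pumping lemma are tried: |xy| <= n and y non-empty.
    """
    w = "0" * n + "1" + "0" * n
    survivors = []
    for i in range(0, n):                  # x = w[:i]
        for j in range(i + 1, n + 1):      # y = w[i:j], so |xy| = j <= n
            x, y, z = w[:i], w[i:j], w[j:]
            assert set(y) == {"0"}         # y lies inside the leading block of 0s
            if is_palindrome(x + z):
                survivors.append(x + z)
    return survivors


# No split survives pumping down, which is the contradiction used in the proof.
print(pumped_down_survivors(5))
```

The check only covers one value of N, so it illustrates the argument rather than replacing it.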
But the language of palindromes…
is a CFL, because it supports recursive substitution (in the form of a CFG)
• This is because we can construct a “grammar” like this:
  Productions:
  1. A ==> ε
  2. A ==> 0
  3. A ==> 1
  4. A ==> 0A0
  5. A ==> 1A1
  Same as: A => 0A0 | 1A1 | 0 | 1 | ε
  Here A is the variable (or non-terminal), and 0 and 1 are the terminals.
How does this grammar work?
5
How does the CFG for palindromes work?
An input string belongs to the language (i.e., is accepted) iff it can be generated by the CFG
G: A => 0A0 | 1A1 | 0 | 1 | ε
• Example: w = 01110
• G can generate w as follows:
  1. A => 0A0
  2.   => 01A10
  3.   => 01110
Generating a string from a grammar:
1. Pick and choose a sequence of productions that would allow us to generate the string.
2. At every step, substitute one variable with the body of one of its productions.
6
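The two generation steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the dictionary layout and the function name `derive` are assumptions, and the empty string stands in for ε.

```python
# Productions of the palindrome grammar; "" plays the role of ε (an encoding assumption).
PRODUCTIONS = {"A": ["0A0", "1A1", "0", "1", ""]}


def derive(start: str, bodies: list) -> list:
    """Apply the chosen production bodies in order, each to the leftmost variable.

    Returns the list of sentential forms, starting with the start variable.
    """
    form = start
    steps = [form]
    for body in bodies:
        # Find the leftmost variable in the current sentential form.
        pos = next(i for i, sym in enumerate(form) if sym in PRODUCTIONS)
        var = form[pos]
        if body not in PRODUCTIONS[var]:
            raise ValueError(f"{var} => {body!r} is not a production")
        form = form[:pos] + body + form[pos + 1:]
        steps.append(form)
    return steps


steps = derive("A", ["0A0", "1A1", "1"])
print(" => ".join(steps))  # A => 0A0 => 01A10 => 01110
```

Choosing a different sequence of bodies produces a different palindrome; the sketch does not search for the sequence, it only replays one.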
Context-Free Grammar: Definition
• A context-free grammar G = (V,T,P,S), where:
  • V: set of variables or non-terminals
  • T: set of terminals (the alphabet of the language; ε may also appear as a production body)
  • P: set of productions, each of which is of the form
      A ==> α1 | α2 | …
    where A is a variable in V and each αi is an arbitrary string of variables and terminals
  • S: start variable
• CFG for the language of binary palindromes:
  G = ({A},{0,1},P,A)
  P: A ==> 0 A 0 | 1 A 1 | 0 | 1 | ε
7
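As a rough illustration of the definition, the four components of the palindrome grammar can be written down directly as data, and the strings of L(G) up to a length bound can be enumerated by expanding sentential forms breadth-first. The encoding (Python sets and a dict, "" for ε) and the name `generate` are assumptions made for this sketch. The pruning rule relies on the fact that terminals never disappear once produced; the search is only guaranteed to terminate for grammars, like this one, whose recursive productions always add terminals.

```python
from collections import deque

# G = (V, T, P, S) for binary palindromes; "" plays the role of ε.
V = {"A"}
T = {"0", "1"}
P = {"A": ["0A0", "1A1", "0", "1", ""]}
S = "A"


def generate(max_len: int) -> set:
    """Collect every terminal string of length <= max_len derivable from S."""
    found = set()
    seen = {S}
    queue = deque([S])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is all terminals.
        pos = next((i for i, sym in enumerate(form) if sym in V), None)
        if pos is None:
            found.add(form)
            continue
        for body in P[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            # Prune forms that already contain too many terminals.
            if sum(sym in T for sym in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


print(sorted(generate(3), key=lambda s: (len(s), s)))
```

Comparing the result with a brute-force list of palindromes of the same lengths is a quick sanity check on the grammar.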
More examples
 Parenthesis matching in code
 Syntax checking
 In scenarios where there is a general need
for:
 Matching a symbol with another symbol, or
 Matching a count of one symbol with that of
another symbol, or
 Recursively substituting one symbol with a string
of other symbols

8
Example #2
• Language of balanced parentheses
  e.g., ()(((())))((()))…
• CFG?
  G: S => (S) | SS | ε

How would you “interpret” the string “(((()))()())” using this grammar?

9
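One way to "interpret" a string with the grammar S => (S) | SS | ε is to ask, for every substring, whether S can derive it. The sketch below does this with memoized recursion; it is a simple illustration (the function names are hypothetical), not an efficient parser.

```python
from functools import lru_cache


def in_language(w: str) -> bool:
    """Decide whether S =>* w for the grammar S => (S) | SS | ε."""

    @lru_cache(maxsize=None)
    def derives(i: int, j: int) -> bool:
        # Can S derive the substring w[i:j]?
        if i == j:
            return True                                   # S => ε
        if j - i >= 2 and w[i] == "(" and w[j - 1] == ")" and derives(i + 1, j - 1):
            return True                                   # S => (S)
        # S => SS, splitting into two non-empty halves
        # (an empty half would only repeat the same question).
        return any(derives(i, k) and derives(k, j) for k in range(i + 1, j))

    return derives(0, len(w))


print(in_language("(((()))()())"))  # True
print(in_language("(()"))           # False
```

For the string on the slide, the outermost pair comes from S => (S), and the inside "((()))()()" is split by repeated use of S => SS.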
Example #3
• A grammar for L = {0^m 1^n | m ≥ n}
• CFG?
  G: S => 0S1 | A
     A => 0A | ε

How would you interpret the string “00000111” using this grammar?

10
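A derivation of 0^m 1^n in this grammar follows a fixed recipe: use S => 0S1 once per 1, switch with S => A, use A => 0A once per surplus 0, and finish with A => ε. The sketch below spells that recipe out; the function name is hypothetical.

```python
def derive_0m1n(m: int, n: int) -> list:
    """Return the sentential forms of a derivation of 0^m 1^n (requires m >= n >= 0)."""
    if not (m >= n >= 0):
        raise ValueError("the grammar only generates 0^m 1^n with m >= n")
    form = "S"
    steps = [form]
    for _ in range(n):                  # S => 0S1, once per 1
        form = form.replace("S", "0S1")
        steps.append(form)
    form = form.replace("S", "A")       # S => A
    steps.append(form)
    for _ in range(m - n):              # A => 0A, once per surplus 0
        form = form.replace("A", "0A")
        steps.append(form)
    form = form.replace("A", "")        # A => ε
    steps.append(form)
    return steps


print(" => ".join(derive_0m1n(5, 3)))
# S => 0S1 => 00S11 => 000S111 => 000A111 => 0000A111 => 00000A111 => 00000111
```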
Example #4
A program containing if-then(-else) statements
if Condition then Statement else Statement
(Or)
if Condition then Statement
CFG?

11
More examples
• L1 = {0^n | n ≥ 0}
• L2 = {0^n | n ≥ 1}
• L3 = {0^i 1^j 2^k | i=j or j=k, where i,j,k ≥ 0}
• L4 = {0^i 1^j 2^k | i=j or i=k, where i,j,k ≥ 1}

12
Applications of CFLs & CFGs
• Compilers use parsers for syntactic checking
• Parsers can be expressed as CFGs
  1. Balancing parentheses:
     • B ==> BB | (B) | Statement
     • Statement ==> …
  2. If-then-else:
     • S ==> SS | if Condition then Statement else Statement | if Condition then Statement | Statement
     • Condition ==> …
     • Statement ==> …
  3. C brace matching { … }
  4. Pascal begin-end matching
  5. YACC (Yet Another Compiler-Compiler)

13
More applications
• Markup languages
  • Nested Tag Matching
• HTML
  • <html> … <p> … <a href=…> … </a> </p> … </html>
• XML
  • <PC> … <MODEL> … </MODEL> … <RAM> … </RAM> … </PC>

14
Tag-Markup Languages
Roll ==> <ROLL> Class Students </ROLL>
Class ==> <CLASS> Text </CLASS>
Text ==> Char Text | Char
Char ==> a | b | … | z | A | B | … | Z
Students ==> Student Students | ε
Student ==> <STUD> Text </STUD>
Here, the left-hand side of each production denotes one non-terminal (e.g., “Roll”, “Class”, etc.)
Those symbols on the right-hand side for which no productions (i.e., substitutions) are defined are terminals (e.g., ‘a’, ‘b’, ‘<’, ‘>’, “ROLL”, etc.). The ‘|’ that separates alternatives is grammar notation, not a terminal.
15
Structure of a production

A ==> α1 | α2 | … | αk

Here A is the head, ==> denotes derivation, and α1 | α2 | … | αk is the body.

The above is the same as:
1. A ==> α1
2. A ==> α2
3. A ==> α3
…
k. A ==> αk
16
CFG conventions
• Terminal symbols <== a, b, c, …
• Non-terminal symbols <== A, B, C, …
• Terminal or non-terminal symbols <== X, Y, Z
• Terminal strings <== w, x, y, z
• Arbitrary strings of terminals and non-terminals <== α, β, γ, …
17
Syntactic Expressions in Programming Languages
result = a*b + score + 10 * distance + c
[Figure: callouts on this expression labelled “terminals”, “variables”, and “Operators are also terminals”]

Regular languages have only terminals
• Regular expression = [a-z][a-z0-1]*
• If we allow only letters a & b, and 0 & 1 for constants (for simplification)
  • Regular expression = (a+b)(a+b+0+1)*

18
String membership
How can we tell whether a string belongs to the language defined by a CFG?
1. Derivation
   • Head to body
2. Recursive inference
   • Body to head
Both are equivalent forms.
Example:
G: A => 0A0 | 1A1 | 0 | 1 | ε
• w = 01110
• Is w a palindrome?
  A => 0A0
    => 01A10
    => 01110

19
Simple Expressions…
• We can write a CFG for accepting simple expressions
• G = (V,T,P,S)
  • V = {E,F}
  • T = {0,1,a,b,+,*,(,)}
  • S = E
  • P:
    • E ==> E+E | E*E | (E) | F
    • F ==> aF | bF | 0F | 1F | a | b | 0 | 1

20
Generalization of derivation
• Derivation is head ==> body
• A ==> X (A derives X in a single step)
• A ==>*G X (A derives X in zero or more steps, using the productions of G)
• Transitivity:
  IF A ==>*G B, and B ==>*G C, THEN A ==>*G C

21
Context-Free Language
 The language of a CFG, G=(V,T,P,S),
denoted by L(G), is the set of terminal
strings that have a derivation from the
start variable S.
 L(G) = { w in T* | S ==>*G w }

22
Left-most & Right-most Derivation Styles
G: E => E+E | E*E | (E) | F
   F => aF | bF | 0F | 1F | ε
Derive the string a*(ab+10) from G: E ==>*G a*(ab+10)

Left-most derivation (always substitute the leftmost variable):
  E
  ==> E * E
  ==> F * E
  ==> aF * E
  ==> a * E
  ==> a * (E)
  ==> a * (E + E)
  ==> a * (F + E)
  ==> a * (aF + E)
  ==> a * (abF + E)
  ==> a * (ab + E)
  ==> a * (ab + F)
  ==> a * (ab + 1F)
  ==> a * (ab + 10F)
  ==> a * (ab + 10)

Right-most derivation (always substitute the rightmost variable):
  E
  ==> E * E
  ==> E * (E)
  ==> E * (E + E)
  ==> E * (E + F)
  ==> E * (E + 1F)
  ==> E * (E + 10F)
  ==> E * (E + 10)
  ==> E * (F + 10)
  ==> E * (aF + 10)
  ==> E * (abF + 10)
  ==> E * (ab + 10)
  ==> F * (ab + 10)
  ==> aF * (ab + 10)
  ==> a * (ab + 10)
23
Leftmost vs. Rightmost derivations
Q1) For every leftmost derivation, there is a rightmost derivation, and vice versa. True or False?
True - will use parse trees to prove this

Q2) Does every word generated by a CFG have a leftmost and a rightmost derivation?
Yes – easy to prove (reverse direction)

Q3) Could there be words which have more than one leftmost (or rightmost) derivation?
Yes – depending on the grammar
24
How to prove that your CFGs
are correct?

(using induction)

25
CFG & CFL
Gpal: A => 0A0 | 1A1 | 0 | 1 | ε
• Theorem: A string w in (0+1)* is in L(Gpal) if and only if w is a palindrome.
• Proof:
  • Use induction
    • on string length for the IF part
    • on length of derivation for the ONLY IF part

26
Parse trees

27
Parse Trees
• Each derivation in a CFG can be represented using a parse tree:
  • Each internal node is labeled by a variable in V
  • Each leaf is labeled by a terminal symbol (or ε)
  • For a production A ==> X1 X2 … Xk, any internal node labeled A has k children, which are labeled X1, X2, …, Xk from left to right

Parse tree for the production A ==> X1..Xi..Xk (and all other subsequent productions):

            A
      /     |     \
   X1   …   Xi   …   Xk

28
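Examples

Parse tree for a + 1:

      E
    / | \
   E  +  E
   |     |
   F     F
   |     |
   a     1

G: E => E+E | E*E | (E) | F
   F => aF | bF | 0F | 1F | 0 | 1 | a | b

Parse tree for 0110:

      A
    / | \
   0  A  0
    / | \
   1  A  1
      |
      ε

G: A => 0A0 | 1A1 | 0 | 1 | ε

Reading a tree from the root down gives a derivation; reading it from the leaves up gives a recursive inference.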
29
Parse Trees, Derivations, and Recursive Inferences
Production: A ==> X1..Xi..Xk
[Figure: diagram relating the parse tree for this production to derivations, left-most derivations, right-most derivations, and recursive inferences]
30
Interchangeability of different CFG representations
• Parse tree ==> left-most derivation
  • DFS left to right
• Parse tree ==> right-most derivation
  • DFS right to left
• ==> left-most derivation <==> right-most derivation
• Derivation ==> Recursive inference
  • Reverse the order of productions
• Recursive inference ==> Parse trees
  • bottom-up construction of the parse tree
31
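The conversions from a parse tree to a left-most or right-most derivation can be sketched as follows. The tree encoding (nested tuples of the form (variable, [children]), with terminal leaves as plain strings) and the function names are assumptions made for this illustration; the tree used is the one for a + 1 under the expression grammar.

```python
# A parse tree node is (variable, [children]); a leaf is a terminal string ("" for ε).
TREE = ("E", [("E", [("F", ["a"])]), "+", ("E", [("F", ["1"])])])


def render(form) -> str:
    """Write a sentential form (a list of subtrees) as a string of symbols."""
    return "".join(node[0] if isinstance(node, tuple) else node for node in form)


def derivation(tree, leftmost: bool = True) -> list:
    """List the sentential forms obtained by always expanding the leftmost
    (or rightmost) variable, using the productions recorded in the tree."""
    form = [tree]
    steps = [render(form)]
    while True:
        variables = [i for i, node in enumerate(form) if isinstance(node, tuple)]
        if not variables:
            return steps
        i = variables[0] if leftmost else variables[-1]
        form[i:i + 1] = form[i][1]      # replace the variable by its children
        steps.append(render(form))


print(" => ".join(derivation(TREE, leftmost=True)))   # E => E+E => F+E => a+E => a+F => a+1
print(" => ".join(derivation(TREE, leftmost=False)))  # E => E+E => E+F => E+1 => F+1 => a+1
```

Both derivations use exactly the same productions, which is why one parse tree corresponds to one left-most and one right-most derivation.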
Connection between CFLs
and RLs

32
What kind of grammars result for regular languages?
CFLs & Regular Languages
• A CFG is said to be right-linear if all the productions are one of the following two forms: A ==> wB (or) A ==> w
  Where:
  • A & B are variables,
  • w is a string of terminals
• Theorem 1: Every right-linear CFG generates a regular language
• Theorem 2: Every regular language has a right-linear grammar
• Theorem 3: Left-linear CFGs also represent RLs
33
Some Examples
[Figure: two finite automata over {0,1} with states A, B, C. For each: right-linear CFG?]
A right-linear grammar (finite automaton?):
A => 01B | C
B => 11B | 0C | 1A
C => 1A | 0 | 1

34
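To illustrate why a right-linear grammar behaves like a finite automaton, the sketch below treats each variable of the grammar A => 01B | C, B => 11B | 0C | 1A, C => 1A | 0 | 1 as a state and each production as a transition that reads a string of terminals. The encoding (pairs of prefix and next variable, with None meaning "no trailing variable") and the function name are assumptions of this sketch.

```python
# Each production X => wY is stored as (w, Y); X => w is stored as (w, None).
GRAMMAR = {
    "A": [("01", "B"), ("", "C")],                  # A => 01B | C
    "B": [("11", "B"), ("0", "C"), ("1", "A")],     # B => 11B | 0C | 1A
    "C": [("1", "A"), ("0", None), ("1", None)],    # C => 1A | 0 | 1
}


def accepts(w: str, start: str = "A") -> bool:
    """Search over (variable, input position) pairs, as an NFA simulation would."""
    seen = set()
    stack = [(start, 0)]
    while stack:
        state, pos = stack.pop()
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        for prefix, nxt in GRAMMAR[state]:
            if not w.startswith(prefix, pos):
                continue
            end = pos + len(prefix)
            if nxt is None:
                if end == len(w):       # the production consumed the rest of the input
                    return True
            else:
                stack.append((nxt, end))
    return False


print(accepts("0100"))  # True:  A => 01B => 010C => 0100
print(accepts("010"))   # False
```

Because there are finitely many (variable, position) pairs, the search always terminates, even with the unit production A => C.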
Ambiguity in CFGs and CFLs

35
Ambiguity in CFGs
• A CFG is said to be ambiguous if there exists a string which has more than one left-most derivation

Example:
S ==> AS | ε
A ==> A1 | 0A1 | 01
Input string: 00111
Can be derived in two ways

LM derivation #1:
S => AS
  => 0A1S
  => 0A11S
  => 00111S
  => 00111

LM derivation #2:
S => AS
  => A1S
  => 0A11S
  => 00111S
  => 00111
36
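Since each left-most derivation corresponds to exactly one parse tree, ambiguity can be checked for a given string by counting its parse trees. The sketch below does this for the grammar S => AS | ε, A => A1 | 0A1 | 01 with memoized recursion over substrings; it is written for this grammar only, and the function names are hypothetical.

```python
from functools import lru_cache


def count_parse_trees(w: str) -> int:
    """Count the parse trees (equivalently, left-most derivations) of w from S."""

    @lru_cache(maxsize=None)
    def count_S(i: int, j: int) -> int:
        total = 1 if i == j else 0                    # S => ε
        # S => AS; A never derives ε, so its part w[i:k] is non-empty.
        total += sum(count_A(i, k) * count_S(k, j) for k in range(i + 1, j + 1))
        return total

    @lru_cache(maxsize=None)
    def count_A(i: int, j: int) -> int:
        s = w[i:j]
        total = 1 if s == "01" else 0                 # A => 01
        if len(s) >= 3 and s[-1] == "1":
            total += count_A(i, j - 1)                # A => A1
            if s[0] == "0":
                total += count_A(i + 1, j - 1)        # A => 0A1
        return total

    return count_S(0, len(w))


print(count_parse_trees("00111"))  # 2
```

A count greater than 1 for any single string is enough to show that the grammar is ambiguous.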
Why does ambiguity matter?
E ==> E + E | E * E | (E) | a | b | c | 0 | 1
string = a * b + c
• LM derivation #1:
  • E => E + E => E * E + E ==>* a * b + c
  • The parse tree groups the string as (a*b)+c
• LM derivation #2:
  • E => E * E => a * E => a * E + E ==>* a * b + c
  • The parse tree groups the string as a*(b+c)
Values are different !!!
The calculated value depends on which of the two parse trees is actually used.
37
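A small numeric check makes the difference concrete. The sketch below evaluates the two parse trees for a * b + c; the values assigned to a, b, and c are hypothetical, and the trees are encoded as nested tuples for illustration.

```python
# Hypothetical values for the identifiers.
VALUES = {"a": 2, "b": 3, "c": 4}

# Trees as nested tuples (left, operator, right); leaves are identifiers.
tree_1 = (("a", "*", "b"), "+", "c")    # LM derivation #1: (a*b)+c
tree_2 = ("a", "*", ("b", "+", "c"))    # LM derivation #2: a*(b+c)


def evaluate(tree) -> int:
    """Evaluate a tree bottom-up."""
    if isinstance(tree, str):
        return VALUES[tree]
    left, op, right = tree
    lhs, rhs = evaluate(left), evaluate(right)
    return lhs + rhs if op == "+" else lhs * rhs


print(evaluate(tree_1), evaluate(tree_2))  # 10 14
```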
Removing Ambiguity in Expression Evaluations
• It MAY be possible to remove ambiguity for some CFLs
  • E.g., in a CFG for expression evaluation, by imposing rules & restrictions such as precedence
  • This would imply a rewrite of the grammar
• Precedence: (), *, +

Ambiguous version:
E ==> E + E | E * E | (E) | a | b | c | 0 | 1

Modified unambiguous version:
E => E + T | T
T => T * F | F
F => I | (E)
I => a | b | c | 0 | 1

How will this avoid ambiguity?
38
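One way to see how the layered grammar E => E + T | T, T => T * F | F, F => I | (E) fixes the grouping is to parse with it. The sketch below is a hand-written recursive-descent parser; note the assumption that the left-recursive productions are unrolled into loops (a standard substitution, since recursive descent cannot follow left recursion directly), and that all function names are hypothetical.

```python
def parse(text: str):
    """Parse text with the unambiguous grammar and return a nested-tuple tree."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1

    def parse_E():
        node = parse_T()
        while peek() == "+":            # E => E + T, unrolled into a loop
            expect("+")
            node = (node, "+", parse_T())
        return node

    def parse_T():
        node = parse_F()
        while peek() == "*":            # T => T * F, unrolled into a loop
            expect("*")
            node = (node, "*", parse_F())
        return node

    def parse_F():
        nonlocal pos
        tok = peek()
        if tok == "(":                  # F => (E)
            expect("(")
            node = parse_E()
            expect(")")
            return node
        if tok is not None and tok in "abc01":   # F => I, I => a | b | c | 0 | 1
            pos += 1
            return tok
        raise SyntaxError(f"unexpected {tok!r} at position {pos}")

    tree = parse_E()
    if peek() is not None:
        raise SyntaxError(f"trailing input at position {pos}")
    return tree


print(parse("a * b + c"))  # (('a', '*', 'b'), '+', 'c')
print(parse("a + b * c"))  # ('a', '+', ('b', '*', 'c'))
```

Because * is only introduced below + in the grammar, every string gets a single tree in which multiplication binds tighter, whatever the order of the operators in the input.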
Inherently Ambiguous CFLs
• However, for some languages, it may not be possible to remove ambiguity
• A CFL is said to be inherently ambiguous if every CFG that describes it is ambiguous
Example:
• L = { a^n b^n c^m d^m | n,m ≥ 1 } U { a^n b^m c^m d^n | n,m ≥ 1 }
• L is inherently ambiguous
• Why?
  Input string: a^n b^n c^n d^n (it belongs to both halves of the union)

39
Summary
 Context-free grammars
 Context-free languages
 Productions, derivations, recursive inference,
parse trees
 Left-most & right-most derivations
 Ambiguous grammars
 Removing ambiguity
 CFL/CFG applications
 parsers, markup languages

40
