Context-Free Grammars: Formalism, Derivations, Backus-Naur Form, Leftmost and Rightmost Derivations
Informal Comments
A context-free grammar (CFG) is a notation for describing languages. It is more powerful than finite automata or regular expressions (REs), but still cannot define all possible languages. Useful for nested structures, e.g., parentheses in programming languages.
CFG Formalism
Terminals = symbols of the alphabet of the language being defined. Variables = nonterminals = a finite set of other symbols, each of which represents a language. Start symbol = the variable whose language is the one being defined.
Productions
A production has the form variable -> string of variables and terminals. Convention:
A, B, C, ... are variables. a, b, c, ... are terminals. ..., X, Y, Z are either terminals or variables. ..., w, x, y, z are strings of terminals only. α, β, γ, ... are strings of terminals and/or variables.
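The four components of a CFG can be written down as a small data structure. The following is a minimal sketch in Python; the class name `Grammar` and the method `is_well_formed` are illustrative assumptions, not part of the formal definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    variables: frozenset    # nonterminals such as "S"
    terminals: frozenset    # alphabet symbols such as "0" and "1"
    productions: tuple      # pairs (head, body); body is a tuple of symbols
    start: str              # the start symbol

    def is_well_formed(self):
        """Check that heads are variables and bodies use only known symbols."""
        symbols = self.variables | self.terminals
        return (
            self.start in self.variables
            and self.variables.isdisjoint(self.terminals)
            and all(
                head in self.variables and all(s in symbols for s in body)
                for head, body in self.productions
            )
        )


# Example grammar with productions S -> 01 and S -> 0S1.
G = Grammar(
    variables=frozenset({"S"}),
    terminals=frozenset({"0", "1"}),
    productions=(("S", ("0", "1")), ("S", ("0", "S", "1"))),
    start="S",
)
```

Storing each body as a tuple of symbols keeps multicharacter terminals possible; a production whose body is the empty tuple would represent an empty right side.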
Derivations Intuition
We derive strings in the language of a CFG by starting with the start symbol, and repeatedly replacing some variable A by the right side of one of its productions.
That is, the productions for A are those that have A on the left side of the ->.
Derivations Formalism
We say αAβ => αγβ if A -> γ is a production. Example: S -> 01; S -> 0S1. S => 0S1 => 00S11 => 000111.
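A single derivation step replaces one occurrence of a variable by the body of one of its productions, leaving the surrounding symbols untouched. The sketch below assumes every symbol is a single character and that productions are given as `(head, body)` string pairs; the function name `derive_step` is hypothetical.

```python
def derive_step(form, productions):
    """Return every string obtainable from `form` in one derivation step."""
    results = []
    for i, symbol in enumerate(form):
        for head, body in productions:
            if symbol == head:
                # Keep the context on both sides, replace only position i.
                results.append(form[:i] + body + form[i + 1:])
    return results


PRODUCTIONS = [("S", "01"), ("S", "0S1")]

step1 = derive_step("S", PRODUCTIONS)       # contains "0S1"
step2 = derive_step("0S1", PRODUCTIONS)     # contains "00S11"
step3 = derive_step("00S11", PRODUCTIONS)   # contains "000111"
```

Each call lists all one-step successors, so the derivation S => 0S1 => 00S11 => 000111 corresponds to picking one element from each result in turn.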
Iterated Derivation
=>* means zero or more derivation steps. Basis: α =>* α for any string α. Induction: if α =>* β and β => γ, then α =>* γ.
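The basis and induction translate directly into a search: the starting string is reachable in zero steps, and anything one step beyond a reachable string is reachable too. The sketch below is an assumption-laden illustration: symbols are single characters, the name `derives` is hypothetical, and strings longer than `max_len` are pruned, which is only safe for grammars whose productions never shrink a string (no empty right sides).

```python
from collections import deque


def derives(alpha, beta, productions, max_len=None):
    """Decide whether alpha =>* beta, pruning forms longer than max_len."""
    if max_len is None:
        max_len = len(beta)
    seen = {alpha}              # basis: alpha =>* alpha
    queue = deque([alpha])
    while queue:
        form = queue.popleft()
        if form == beta:
            return True
        for i, symbol in enumerate(form):
            for head, body in productions:
                if symbol != head:
                    continue
                # induction: one more step from a string already reached
                nxt = form[:i] + body + form[i + 1:]
                if len(nxt) <= max_len and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


PRODUCTIONS = [("S", "01"), ("S", "0S1")]
```

For example, `derives("S", "000111", PRODUCTIONS)` succeeds by way of 0S1 and 00S11, while `derives("S", "001", PRODUCTIONS)` fails because no sentential form of length at most 3 equals 001.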
Sentential Forms
Any string of variables and/or terminals derived from the start symbol is called a sentential form. Formally, α is a sentential form iff S =>* α.
Language of a Grammar
If G is a CFG, then L(G), the language of G, is {w | S =>* w}.
Note: w must be a terminal string, S is the start symbol.
Example: G has productions S -> ε and S -> 0S1. L(G) = {0^n 1^n | n ≥ 0}. Note: ε is a legitimate right side.
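To see which terminal strings a small grammar generates, one can enumerate its sentential forms and keep those without variables. The sketch below assumes single-character symbols, uppercase letters for variables, and the empty Python string for an empty right side; the name `language_up_to` is hypothetical. It prunes forms longer than `max_len + 1`, which is enough here because every sentential form of this grammar contains at most one S, but it is not a complete procedure for arbitrary grammars.

```python
def language_up_to(productions, start, max_len):
    """Collect terminal strings of length <= max_len derivable from start."""
    words = set()
    seen = {start}
    stack = [start]
    while stack:
        form = stack.pop()
        if not any(c.isupper() for c in form):
            if len(form) <= max_len:
                words.add(form)     # no variables left: a terminal string
            continue
        for i, symbol in enumerate(form):
            for head, body in productions:
                if symbol != head:
                    continue
                nxt = form[:i] + body + form[i + 1:]
                if len(nxt) <= max_len + 1 and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return words


# S -> (empty) and S -> 0S1
PRODUCTIONS = [("S", ""), ("S", "0S1")]
words = language_up_to(PRODUCTIONS, "S", 6)
expected = {"0" * n + "1" * n for n in range(4)}
```

Here `words` and `expected` coincide, and both include the empty string, which corresponds to using the empty right side immediately.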
Context-Free Languages
A language that is defined by some CFG is called a context-free language. There are CFLs that are not regular languages, such as the example just given. But not all languages are CFLs. Intuitively: CFLs can count two things, not three.
BNF Notation
Grammars for programming languages are often written in BNF (Backus-Naur Form). Variables are words in <...>; example: <statement>. Terminals are often multicharacter strings indicated by boldface or underline; example: while or WHILE.
The symbol | separates alternative right sides for the same variable. Example: S -> 0S1 | 01 is shorthand for S -> 0S1 and S -> 01.
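Expanding the | shorthand is mechanical: split the right side at each bar and pair every alternative with the same left side. A minimal sketch, assuming the rule is written as plain text and that neither `->` nor `|` occurs as a terminal; the function name `expand_alternatives` is hypothetical.

```python
def expand_alternatives(rule):
    """Split 'head -> alt1 | alt2 | ...' into separate (head, body) pairs."""
    head, _, right_side = rule.partition("->")
    return [(head.strip(), alt.strip()) for alt in right_side.split("|")]


productions = expand_alternatives("S -> 0S1 | 01")
```

The result is a list with one pair per alternative, in the order written.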
Translation: Grouping
You may, if you wish, create a new variable A for {α}. One production for A: A -> α. Use A in place of {α}.
Example: Grouping
L -> S [{;S}] Replace by L -> S [A] and A -> ;S.
A stands for {;S}.
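The grouping translation can be sketched as a small rewrite: find a braced group, replace it with a fresh variable, and add one production for that variable. The code below is an illustration under assumptions: right sides are plain strings, groups are not nested, the fresh variable name is supplied by the caller, and the function name `introduce_group_variable` is hypothetical.

```python
import re


def introduce_group_variable(head, body, fresh):
    """Replace the first {...} group in body by the variable `fresh`."""
    match = re.search(r"\{([^{}]*)\}", body)
    if match is None:
        return [(head, body)]           # nothing to translate
    group = match.group(1)              # the symbols inside the braces
    new_body = body[:match.start()] + fresh + body[match.end():]
    return [(head, new_body), (fresh, group)]


result = introduce_group_variable("L", "S [{;S}]", "A")
```

For the production above this yields L -> S [A] together with A -> ;S.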
Leftmost Derivations
Say wAα =>lm wβα if w is a string of terminals only and A -> β is a production. Also, α =>*lm β if α becomes β by a sequence of 0 or more =>lm steps.
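A leftmost step always rewrites the first variable in the string, so everything to its left consists of terminals. The sketch below assumes single-character symbols with uppercase letters as variables; the function name `leftmost_step` and the balanced-parentheses grammar S -> SS | (S) | () are chosen here for illustration.

```python
def leftmost_step(form, productions):
    """Return all results of rewriting the leftmost variable of `form`."""
    for i, symbol in enumerate(form):
        if symbol.isupper():            # first variable from the left
            return [
                form[:i] + body + form[i + 1:]
                for head, body in productions
                if head == symbol
            ]
    return []                           # no variable left to rewrite


PAREN_PRODUCTIONS = [("S", "SS"), ("S", "(S)"), ("S", "()")]

# One leftmost derivation: S => SS => (S)S => (())S => (())()
derivation = ["S", "SS", "(S)S", "(())S", "(())()"]
valid = all(
    nxt in leftmost_step(cur, PAREN_PRODUCTIONS)
    for cur, nxt in zip(derivation, derivation[1:])
)
```

The flag `valid` records whether each consecutive pair in `derivation` is a legal leftmost step.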
Rightmost Derivations
Say αAw =>rm αβw if w is a string of terminals only and A -> β is a production. Also, α =>*rm β if α becomes β by a sequence of 0 or more =>rm steps.
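A rightmost step mirrors this by rewriting the last variable in the string, so everything to its right consists of terminals. The sketch below makes the same assumptions as before (single-character symbols, uppercase variables) and uses a hypothetical function name `rightmost_step` with the illustrative grammar S -> SS | (S) | ().

```python
def rightmost_step(form, productions):
    """Return all results of rewriting the rightmost variable of `form`."""
    for i in range(len(form) - 1, -1, -1):
        if form[i].isupper():           # first variable from the right
            return [
                form[:i] + body + form[i + 1:]
                for head, body in productions
                if head == form[i]
            ]
    return []                           # no variable left to rewrite


PAREN_PRODUCTIONS = [("S", "SS"), ("S", "(S)"), ("S", "()")]

# One rightmost derivation: S => SS => S() => (S)() => (())()
derivation = ["S", "SS", "S()", "(S)()", "(())()"]
valid = all(
    nxt in rightmost_step(cur, PAREN_PRODUCTIONS)
    for cur, nxt in zip(derivation, derivation[1:])
)
```

This derivation ends in the same terminal string (())() that a leftmost derivation can produce; only the order in which variables are replaced differs.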