Grammar and Parse Trees
(Syntax)
(note to instructor, have a video to load under 1st CFG Grammar Example)
What makes a good programming Language?
What do you think?
BTW, don’t take any of these for granted!!
Creation Order of language X
In a Systems Analysis and Design class you cover the ways applications are
developed
o the process for developing a language is much like a waterfall
[Figure: waterfall-style diagram, “Creation Order of Language X”]
Implementers shape how the code is formed, but are handcuffed by the Language
Syntax vs. semantics
Syntax
o the form or structure of the expressions, statements, and program units
Semantics
o the meaning of the expressions, statements, and program units
Syntax and semantics provide a language’s definition
o Users of a language definition
Other language designers
Implementers
Programmers (the users of the language)
Both are closely related
In a well-designed language you should be able to read a statement's
syntax and understand what it will do (semantics)
Syntax vs. Semantics
Language   Syntax         Semantics
Java       int x = 12;    Set x to the value 12
Python     x = 12         Set x to the value 12
Syntax vs. semantics example
while ( Boolean_expression ) statement - the physical makeup (syntax)
The meaning of that statement: repeat it while the expression is true (semantics)
Terms in syntax
Language
o set of sentences, combination of keywords
Sentence
o a string of characters over some alphabet
o a line of syntax
Lexeme
o lowest level syntactic unit of a language (e.g., *, sum, begin)
o numeric literals, operators, special words, etc…
o a program is a string of lexemes
Tokens
o a token is a category of lexemes (e.g., identifier)
o words in a syntax
Lexemes and Tokens Revisited
“LECK-seem”
examples (each separated by a space)
o const { } cout << 23.23 ++ ; “Lupoli”
Lexemes vs. Tokens
Lexemes
o are read in and recognized by a Scanner
Scanner described below (images)
o that Scanner then places that lexeme into a Token category
Tokens
o lexemes broken down into the categories
reserved words or keywords
an identifier cannot have the same spelling as a reserved word,
so this is illegal:
int else;
identifiers
names of variables, methods, classes, etc…
Operators and special symbols
+, -, /, etc…
Literals or constants
Values placed in equations or hard coded digits
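To make the lexeme/token split concrete, here is a minimal scanner sketch in Python. It is only an illustration: the keyword set, the category names, and the regular expressions are assumptions chosen to match the four categories listed above, not the rules of any real language.

```python
import re

# Hypothetical keyword list and token categories (assumptions for this sketch).
KEYWORDS = {"const", "int", "else", "if", "while"}
TOKEN_SPEC = [
    ("LITERAL",    r'\d+\.\d+|\d+|"[^"]*"'),      # numbers and quoted strings
    ("IDENTIFIER", r"[A-Za-z_]\w*"),              # names (keywords are split off below)
    ("OPERATOR",   r"\+\+|--|<<|>>|[-+*/=<>]"),   # operators, longest first
    ("SPECIAL",    r"[{}();,]"),                  # special symbols
    ("SKIP",       r"\s+"),                       # whitespace separates lexemes
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(source):
    """Read lexemes left to right and place each one into a token category."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"   # a reserved word can never be an identifier
        tokens.append((kind, lexeme))
    return tokens

if __name__ == "__main__":
    for token in scan('const { } cout << 23.23 ++ ; "Lupoli"'):
        print(token)
```

Because keywords are checked after the identifier pattern matches, `int else;` scans as two keywords and a special symbol, which is why a later parsing stage would reject it as a declaration.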
Where the scanner is in the entire process
Unary Operators
operators that act upon a single operand
o written as a prefix or postfix of the value
o e.g., 3 - (-2), x++
C family of Unary Operators
Increment:    ++x, x++     Positive:          +x
Decrement:    --x, x--     Negative:          -x
Address:      &x           One's complement:  ~x
Indirection:  *x           Logical negation:  !x
Grammars
a type of language generator, originally meant to describe the syntax of
natural languages
a grammar can have nested, recursive, self-similar branches in its syntax
trees
o so it can handle nested structures well.
Context-free grammars can be recognized by a state automaton with a stack
(a pushdown automaton)
o This stack is used to represent the nesting level of the syntax
one portion may have to wait until another portion is solved
Two grammar classes
o Context-free (CF)
o Regular
Both used to describe the syntax of programming languages
1st CFG Grammar Example
EXP -> NUM | ( EXP OP EXP )
OP -> + | - | * | /
NUM -> DIGIT | DIGIT NUM
DIGIT -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | -1 | -2 | …
Test if ( 4 - (3 + 2) ) fits the grammar
How did I get this answer?
THINK OF THIS AS PATTERN MATCHING
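The pattern matching can also be done mechanically. The sketch below is a small recursive-descent recognizer for the EXP grammar, written under two simplifying assumptions that are mine and not part of the notes: whitespace is ignored, and only the digits 0 to 9 are handled (the negative alternatives of DIGIT are left out). The function names are hypothetical.

```python
import re

OPERATORS = {"+", "-", "*", "/"}   # OP -> + | - | * | /

def tokenize(text):
    """Split the input into numbers, parentheses and operators."""
    tokens = re.findall(r"\d+|[()+\-*/]", text)
    # Anything left over (letters, other symbols) is outside the grammar.
    if "".join(tokens) != re.sub(r"\s+", "", text):
        raise ValueError("input contains a character outside the grammar")
    return tokens

def parse_exp(tokens, pos):
    """Match one EXP starting at pos; return the position after it, or -1."""
    if pos < len(tokens) and tokens[pos].isdigit():      # EXP -> NUM
        return pos + 1
    if pos < len(tokens) and tokens[pos] == "(":         # EXP -> ( EXP OP EXP )
        pos = parse_exp(tokens, pos + 1)
        if pos == -1 or pos >= len(tokens) or tokens[pos] not in OPERATORS:
            return -1                                     # stop as soon as something is off
        pos = parse_exp(tokens, pos + 1)
        if pos == -1 or pos >= len(tokens) or tokens[pos] != ")":
            return -1
        return pos + 1
    return -1

def fits_grammar(text):
    """True when the whole input is exactly one EXP."""
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    return parse_exp(tokens, 0) == len(tokens)

if __name__ == "__main__":
    for sample in ["( 4 - (3 + 2) )", "9 ( 4 + 8 )", "4 + 3"]:
        print(sample, "->", fits_grammar(sample))
```

Each `if` branch corresponds to one alternative of the EXP rule, which is why an expression without its surrounding parentheses, such as `4 + 3`, is rejected.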
Can a 4-digit value be covered by this grammar?
// remember, <NUM> would not start the parse tree
Do these fit the grammar?? And why, why not??
Hint: STOP as soon as you see something is off!! (The parser/compiler will!!)
Suggestion: double circle your terminal value
EXP -> NUM | ( EXP OP EXP )
OP -> + | - | * | /
NUM -> DIGIT | DIGIT NUM
DIGIT -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | -1 | -2 | …
DRAW THE PARSE TREE!!!!
1. ( -1 * ( 3 + 43) ) Answer:
2. 9 ( 4 + 8 ) Answerb:
3. ( 8 + ( 6 * (5 * 2) ) ) Answerb:
Context-free language and grammar (CFGs)
Context Free Grammar (CFG) Example
a CFG for the language of all palindromes using letters a and b
S → P
P → ε // think of epsilon as a “null”
P → a
P → b
P → aPa
P → bPb
(in each rule: Left Hand Side, then the Meta Symbol →, then the Right Hand Side)
What are the differences between uppercase and lowercase letters?
a CFG consists of a series of grammar rules
o each rule has a left-hand side (LHS) that is a single phrase-structure
name
o then a “metasymbol” (→)
o followed by a right-hand side (RHS) consisting of a sequence of items
that can be symbols or other phrase names.
the rules are also called productions
productions can be in several forms
o Context Free Grammar
o Backus – Naur form (BNF) which is next
Why “Context Free”??
o nonterminals appear singly on the LHS of the productions
o means that each non-terminal can be replaced by any RHS choice, no
matter where the non-terminal may appear
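As a rough illustration of "each non-terminal can be replaced by any RHS choice", the following Python sketch checks whether a string can be derived from P in the palindrome grammar. The function name `derives_p` is hypothetical; each branch mirrors one production.

```python
def derives_p(text):
    """Return True when text can be derived from P in the palindrome grammar."""
    if text == "":                      # P -> ε
        return True
    if text in ("a", "b"):              # P -> a   and   P -> b
        return True
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "ab":
        # P -> aPa   or   P -> bPb : strip the matching ends, then derive the middle
        return derives_p(text[1:-1])
    return False

if __name__ == "__main__":
    for sample in ["aabaa", "abba", "ab"]:
        print(sample, "->", derives_p(sample))
```

The recursive call does not care how deep inside the string it is, which is the "context free" part: the same rules apply to P wherever it appears.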
What other sentences would match the grammar above/below? (Create 3 more
using the example below, with 7 or more characters)
Sentence parse tree proving it fits
aabaa
S→P
P → ε // think of epsilon as a “null”
P→a
P→b
P → aPa
P → bPb
Parse Trees
graphical representation showing the hierarchical syntax structure of the
sentence of the language they define
or (I like better) a hierarchical representation of a derivation
must know
o the grammar
or may need many parse trees to solve what the grammar is
o the sentence you are striving for
Parse Tree Example
Grammar
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <id> + <expr>
| <id> * <expr>
| ( <expr> )
| <id>
Parse Tree for A = B * (A + C)
each internal node is a non-terminal symbol (< ? >)
every leaf is a terminal, operators included!!
every sub-tree describes one instance of an abstraction in a sentence
Terms in Grammar
Grammar
o a finite non-empty set of rules
abstractions
o also called non-terminals
o can have 2 or more distinct definitions/representations …
uses the | to separate the alternatives
Defining separate definitions for the same rule
<if_stmt> → if ( <logic_expr> ) <stmt>
<if_stmt> → if ( <logic_expr> ) <stmt> else <stmt>
using “|” will be
<if_stmt> → if ( <logic_expr> ) <stmt>
| if ( <logic_expr> ) <stmt> else <stmt>
In these rules, <stmt> ….. (finish)
Think of “|” as OR!!!
lexemes and tokens in a grammar
o terminal and nonterminal symbols
o more explained below
Terms in a Grammar
sentence
o entire line in a rule
o and final solution (line of syntax)
usually from a derivation (later)
o Sebesta book uses both!!!
BNF description/grammar
o generation device for defining languages
o collection of rules.
non-terminals
o part of a sentence that can be further broken down
o in normal form (non-BNF), UPPER CASE
terminal
o part of a sentence that cannot be broken down any further
o in normal form (non-BNF), lower case
start symbol
o special non-terminal symbol, the very highest, non-reduced symbol in
the grammar
Backus – Naur form (BNF)
a form of CFG
invented by John Backus for Algol 58
widely used notation
Used “abstraction” for general description notation for syntax structures
Backus – Naur form (BNF)
<assign> (or anything in < >) is defined as an instance of an abstraction
RHS is made up of Tokens and Lexemes (like “,”)
LHS is non-terminals
Each line is called rule/production
The meaning of “list” in Grammar
not the array or list data structure!!
items of the same data type
A “list” of items
int a, b, c, d, e, f;
uses recursion in the Grammar in order to replicate the same “code” over and
over
o like in recursion, the rule must have a “base case” in order to stop
does not call itself (rule) again
o please notice a “,” is used to separate the items of the list
Example rule dealing with lists
<ident_list> → identifier // this line would be the base case
| identifier , <ident_list>
General Grammar setup
some obvious (hopefully) things
o <program> is the starting point of a syntax
or some rule that has an obvious starting name
o special words or lexemes are in bold and mixed throughout the sentence
First Example of a Grammar (again)
<program> → begin <stmt_list> end
<stmt_list> → <stmt>
| <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<expression> → <var> + <var>
| <var> - <var>
| <var>
<var> → A | B | C | D
Always start with upper left most non-terminal
1. Which syntax below would NOT work with the Grammar above?
a. begin A = B + C end
b. begin A = A + B + C end
c. begin A = B end
2. Why would the following syntax not be correct with the grammar given?
begin A = C – B;
B = D + A; end
Hint: No, it’s not because it’s on two lines, and don’t assume anything!!
Answer:
3. Was there recursion in the grammar given?
4. What other operators are not supported in this Grammar? Why?
Answerb:
Derivations in General
are what you JUST used to solve which syntax did NOT fit
o in a more formulated, step by step way
solution set from the Grammar given using its rules and ending with a
sentence
to solve
o given TARGET SYNTAX, GRAMMAR and Derivation Order
order covered in a minute
o always start with the start symbol in the grammar and its rule
o then use the other rules to get to the target syntax
the results
o the symbol “=>” means derives
o notice in each line of the derivation, only one abstraction/substitution is
derived
o each line is derived from the line before (above)
o each line is called a sentential form
Example Grammar/Derivation Setup
Grammar
<program> → begin <stmt_list> end
<stmt_list> → <stmt>
| <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var>
| <var> - <var>
| <var>
target syntax: begin A = B + C ; B = C end
Derivation
<program> => begin <stmt_list> end
=> begin <stmt> ; <stmt_list> end
=> begin <var> = <expression> ; <stmt_list> end
=> begin A = <expression> ; <stmt_list> end
=> begin A = <var> + <var> ; <stmt_list> end
=> begin A = B + <var> ; <stmt_list> end
=> begin A = B + C ; <stmt_list> end
…
** this uses a Left most derivation (explained below)
Derivation Order
order of derivation replacement of a sentential form
Leftmost
o in each line of the derivation, the leftmost non-terminal is replaced
using a rule in the Grammar
Rightmost (another order option)
o in each line, the rightmost non-terminal is replaced first
o produces different sentential forms along the way, but can reach the
same sentence
In order (another order option)
Leftmost examples from Previous Derivation
=> begin <stmt> ; <stmt_list> end
=> begin <var> = <expression> ; <stmt_list> end
** notice in line 2, you have two options, but only the left was replaced
=> begin A = B + C ; <var> = <expression> end
=> begin A = B + C ; B = <expression> end
Derivation order has (or at least should have) no effect on the language
generated by the grammar
only by using ALL of the different rule combinations do you get every
sentence in the language
o that is, if we did not have a target syntax
o in reality it would be impossible (super huge!) to list ALL
combinations, but you can at least get a good feel for the language
Solving Derivations and First Example
to solve
o given TARGET SYNTAX, GRAMMAR and Derivation Order
o always start with the start symbol in the grammar and its rule
o then use the other rules to get to the target syntax
but you CANNOT change derivation orders (left, right, etc…) within the same
derivation
o either the whole derivation is left, right, or inorder
1st Leftmost Derivation Example
Given Grammar
<program> → begin <stmt_list> end
<stmt_list> → <stmt>
| <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var>
| <var> - <var>
| <var>
Derivation with Target Syntax (begin A = B + C ; B = C end)
<program> => begin <stmt_list> end
=> begin <stmt> ; <stmt_list> end
=> begin <var> = <expression> ; <stmt_list> end
=> begin A = <expression> ; <stmt_list> end
=> begin A = <var> + <var> ; <stmt_list> end
=> begin A = B + <var> ; <stmt_list> end
=> begin A = B + C ; <stmt_list> end
=> begin A = B + C ; <stmt> end
=> begin A = B + C ; <var> = <expression> end
=> begin A = B + C ; B = <expression> end
=> begin A = B + C ; B = <var> end
=> begin A = B + C ; B = C end
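The leftmost derivation can be replayed in code. The sketch below stores the grammar as a Python dictionary and, at every step, replaces the leftmost non-terminal with a chosen alternative. The names `GRAMMAR`, `CHOICES` and `leftmost_derivation` are hypothetical, and the list of rule choices was worked out by hand for this one target sentence.

```python
# Each non-terminal maps to its list of alternatives (right-hand sides).
GRAMMAR = {
    "<program>":    [["begin", "<stmt_list>", "end"]],
    "<stmt_list>":  [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>":       [["<var>", "=", "<expression>"]],
    "<var>":        [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}

def leftmost_derivation(start, choices):
    """Apply one rule per step, always to the leftmost non-terminal."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Index of the leftmost symbol that still has rules in the grammar.
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))          # every step is a sentential form
    return steps

# Alternative numbers (0-based) picked by hand for: begin A = B + C ; B = C end
CHOICES = [0, 1, 0, 0, 0, 1, 2, 0, 0, 1, 2, 2]

if __name__ == "__main__":
    for line in leftmost_derivation("<program>", CHOICES):
        print("=>", line)
```

Switching to a rightmost derivation would only change how `index` is found (search from the right end instead); the final sentence stays the same.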
Exercise #1
Given Grammar
<sentence> -> <subject> <predicate>
<subject> -> <article> <noun>
<predicate> -> <verb> <direct-object>
<direct-object> -> <article> <noun>
<article> -> THE | A
<noun> -> MAN | DOG
<verb> -> BITES | PETS
Leftmost Derivation w/ Target Syntax: A DOG PETS A DOG
<sentence> =>
Answer:
Exercise #2
Given Grammar
<sentence> -> <subject> <predicate>
<subject> -> <article> <noun>
<predicate> -> <verb> <direct-object>
<direct-object> -> <article> <noun>
<article> -> THE | A
<noun> -> MAN | DOG
<verb> -> BITES | PETS
Rightmost Derivation w/ Target Syntax: THE MAN BITES A DOG (rightmost!!)
Answer:
Using the JFlap tool
www.jflap.org
o free to download
o used to draw a parse tree
o used to test a grammar in many ways
installation
o fill out form
o find “JFLAP_Thin.jar” and download to desktop. No installation
needed.
how to create parse trees in JFlap
Ambiguity
where a sentence can be represented by more than one parse tree
o NOT DERIVATION!!!
o this is bad!!!
o could be that
the parse trees from the leftmost and rightmost derivations do not match!!
several leftmost derivations do not match each other!!
why does this matter??
o mathematically seems the same
o just try programming this!!
In proving ambiguity you are TRYING TO MISMATCH with the SAME
TARGET Syntax
o just make sure your syntax is correct first!!
1st Ambiguous Grammar Example
Grammar
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <expr>
| <expr> * <expr>
| ( <expr> )
| <id>
Solution(s) for: A = B + C * A
2nd Ambiguous Grammar Example
Grammar
E → E + E
E → E * E
E → i
Solution(s) for: 3 + 4 * 5
Derivation (both columns are leftmost, yet they build different parse trees)
E                      E
=> E + E               => E * E
=> 3 + E               => E + E * E
=> 3 + E * E           => 3 + E * E
=> 3 + 4 * E           => 3 + 4 * E
=> 3 + 4 * 5           => 3 + 4 * 5
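One way to see the ambiguity without drawing is to count the parse trees. The sketch below counts how many distinct trees the grammar E → E + E | E * E | i gives for a token list, assuming `i` stands for any integer literal. The function name is hypothetical, and the counting approach (try every operator as the root) is an illustration, not something taken from the notes.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for tokens under E -> E + E | E * E | i."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 1 if tokens[lo].isdigit() else 0       # E -> i
        total = 0
        for k in range(lo + 1, hi - 1):                   # pick the root operator
            if tokens[k] in ("+", "*"):
                total += count(lo, k) * count(k + 1, hi)  # E -> E op E
        return total

    return count(0, len(tokens))

if __name__ == "__main__":
    print(count_parse_trees("3 + 4 * 5".split()))
```

Any count above 1 means the sentence has more than one parse tree, so the grammar is ambiguous; a grammar with proper precedence levels would give exactly 1.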
3rd Ambiguous Grammar Example
Grammar
<expr> → <expr> <op> <expr>
| const
<op> → /
| -
Solution(s) for: const - const / const
(left most - right most)
when a Grammar is ambiguous, there are some notable causes to look for and avoid
o not defining operator precedence
further explained below, but you already know this
o having the SAME abstraction in a rule more than once
Most ambiguous programming-language grammars can be re-written to remove
the ambiguity
Fixing Ambiguity – well kinda
sadly, there is no “procedure” to fix
cannot be done automatically
more of a trial and error
but USUALLY it includes
o more rules
o better precedence order (see below)
o not as many “|” in a single rule
Proving Ambiguity within a Grammar
start with a Grammar and a legit sentence
o USE ALL RULES WITHIN THE GIVEN GRAMMAR!!!
try both a left and right derivation
o if the trees EXACTLY match, grammar could be good
Proving Ambiguity with input string: x + y + z
Left (parse tree) Right (parse tree)
Grammar to determine if ambiguous
<S> → <A>
<A> → <A> + <A> | <id>
<id> → a | b | c
Leftmost (GREEDY) Parse Tree Rightmost (GREEDY) Parse Tree
What does greedy mean?
watch how to prove it using parse trees
o https://www.youtube.com/watch?v=3F8kBL07TEc
Greedy means the Grammar is crap
Well, really means crap or educational example
The better the grammar the less you worry about greedy
Greedy versus Spare responses
Grammar to determine if ambiguous
<S> → <A>
<A> → <A> + <A> | <id>
<id> → a | b | c
Greedy Leftmost Spare Leftmost
<S> → <A>
<A> → <A> + <A> | <id>
<id> → a | b | c
1. Create another target syntax string that works with the given grammar above
a. Instead of x + y + z, try something new with a, b, c respectively
2. Try left most and right most greedy parse tree to prove ambiguity on that new
target string
3. Do the same thing (greedy) with the grammar below: Answerb:
<binary-string> -> 0
| 1
| <binary-string> <binary-string>
Left (parse-tree) Right (parse-tree)
<S> → <A> a <B> b
<A> → <A> b | b
<B> → a <B> | a
Answerb:
1. Which of the following sentences are in the languages generated above?
a. baab
b. bbbab
c. bbaaaaa Answer:
d. bbaab
2. Describe in English, the language defined by the following grammar:
<S> → <A> <B> <C>
<A> → a <A> | a
<B> → b <B> | b
<C> → c <C> | c
Answer:
Remember how to solve trees?
remember it’s bottom up
leaves-ish are solved first
tree = recursive, remember the last valid recursive call REALLY gets solved
first
we use this idea for precedence below
Inorder Expression Trees (Solving Trees)
(((5 + 2) * 5) + 3)
1. In the left tree, what portion of the equation is completed first? Why?
2. What would the answer be for either of these?
3. What would the equations below look like as a tree??? Use our normal
understanding of (PEMDAS) to create the tree.
a. (((3 + 4) * 5) *6)
b. 6 + 128 * 34
c. 2 ^ 6 * 12 + 5 Answersb:
Setting up Operator Precedence in a Grammar
setting the order of operations in a grammar
can assign different levels or precedence for operators in the grammar design
operators lower(est) on the parse tree must be completed/solved first
o left parse tree from example below is what we want
Solving by precedence
How would we solve 3 + (5 * 9 + 2) using a tree?
<assign> → <id> = <expr>
<expr> → <expr> + <term>
| <term> // but if not +, add layer
<term> → <term> * <factor>
| <factor> // but if not *, add another layer
<factor> → ( <expr> )
| <id>
<id> → A | B | C
the correct setup requires
o separate non-terminal symbols to represent each precedence level of
operators
o the rule for * and / must be more derivation steps away from the start
symbol than the rule for the + and - operators
this will push them further DOWN the parse tree making them
FIRST to be completed
<expr> → <expr> + <term>
| <term>
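To watch the layered rules push * below + in the tree, here is a small Python parser sketch for the <expr> / <term> / <factor> layers. Two things in it are my assumptions rather than part of the notes: the left-recursive rules are implemented as loops (a recursive-descent function that called itself first would never terminate), and the tree is returned as nested tuples of the form (operator, left, right).

```python
import re

def parse(text):
    """Parse + and * expressions over identifiers into a nested-tuple tree."""
    tokens = re.findall(r"[A-Za-z]+|[+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():                       # <expr> -> <expr> + <term> | <term>
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())   # earlier + ends up deeper: left associative
        return node

    def term():                       # <term> -> <term> * <factor> | <factor>
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():                     # <factor> -> ( <expr> ) | <id>
        if peek() == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        token = peek()
        if token is None or not token.isalpha():
            raise SyntaxError(f"expected an identifier, found {token!r}")
        return advance()

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

if __name__ == "__main__":
    print(parse("B + C * A"))
    print(parse("(B + C) * A"))
```

In the result for `B + C * A` the `*` node is nested inside the `+` node, so it sits lower in the tree and is completed first, which is the precedence the grammar was designed to give.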
Unambiguous grammar for Operator (+ and *) Precedence
<assign> → <id> = <expr>
<expr> → <expr> + <term>
| <term>
<term> → <term> * <factor>
| <factor>
<factor> → ( <expr> )
| <id>
<id> → A | B | C
What other operators should be deeper or even with <factor>?
*** notice that JUST to reach <factor> (which has the highest precedence)
takes several rules: <expr> → <term> → <factor>
1. Draw the simple parse trees for: (ignore left or rightmost for now)
a. A = C + B which level (starting at 0) does the + reside?
b. A = C * B which level does the * reside?
answerb:
2. Using the grammar above, which ALWAYS will come first, (top down)
<expr> or <term>?
(We’ll do a more complex one next!)
Derivation for A = B + C * A
<assign> → <id> = <expr>
<expr> → <expr> + <term>
| <term>
<term> → <term> * <factor>
| <factor>
<factor> → ( <expr> )
| <id>
<id> → A | B | C
(leftmost)
<assign> => <id> = <expr>
=> A = <expr>
=> A = <expr> + <term>
=> A = <term> + <term>
=> A = <factor> + <term>
=> A = <id> + <term>
=> A = B + <term>
=> A = B + <term> * <factor>
=> A = B + <factor> * <factor>
=> A = B + <id> * <factor>
=> A = B + C * <factor>
=> A = B + C * <id>
=> A = B + C * A
(rightmost)
<assign> => <id> = <expr>
=> <id> = <expr> + <term>
=> <id> = <expr> + <term> * <factor>
=> <id> = <expr> + <term> * <id>
=> <id> = <expr> + <term> * A
=> <id> = <expr> + <factor> * A
=> <id> = <expr> + <id> * A
=> <id> = <expr> + C * A
=> <id> = <term> + C * A
=> <id> = <factor> + C * A
=> <id> = <id> + C * A
=> <id> = B + C * A
=> A = B + C * A
Parse Tree for A = B + C * A
Draw left parse tree here!! (answernp) Draw right parse tree here!! (answernp)
Parse Tree for A = B + C * A
(left and right!!)
Remember, as long as the tree is the same (left or right) for this
sentence, it does not show the grammar to be ambiguous
1. Try creating parse trees for these:
<assign> → <id> = <expr> // this grammar is unambiguous
<id> → A | B | C
<expr> → <expr> + <term>
| <term>
<term> → <term> * <factor>
| <factor>
<factor> → ( <expr> )
| <id>
A = (B + C) * A A = (B * C) * A
Answerb: Answerb:
1. When you drew the tree, did:
a. the tree solve the actual problem from bottom up correctly?
b. the grammar support the precedence correctly?
Associativity in General
matters not only if you have to deal with * and /
but with any operators of the same precedence
parse trees (from leftmost and rightmost derivations) SHOULD look exactly
the same, and so solve to the same equation, because the grammar fixes the
associativity
BUT THE DERIVATIONS (leftmost and rightmost) will NOT LOOK the
same!!
remember, usually LEFT to RIGHT when using operators of the same
precedence.
o +, -, *, /, % are all evaluated left to right
o B+C+A
is really (B + C) + A
Associativity Grammar for Example
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term> // left associativity
| <term>
<term> → <term> * <factor> // left associativity
| <factor>
<factor> → ( <expr> )
| <id>
Associative Rule Breakers!! Intro. to Recursion
Right to Left rule breakers!!!
o unary
o power (7 ** 8) // ( ** or ^ depending on the language)
o !
o ~ (bitwise complement)
Why Power breaks the rules
1.0000001 * 10^7 = (Correctly) 10000001
(1.0000001 * 10)^7 = (LEFTSIDE done first) ≈ 10000007.000002
1.0000001 * (10^7) = (RIGHTSIDE done first) = 10000001
Recursion
o rule calls itself
Left Recursion, call (symbol) to itself is physically LEFT of the operator
Right Recursion, call (symbol) to itself is physically RIGHT of the operator
Remember, the part of the tree that “dangles” the lowest is completed first!!
Example of Left and Right Recursion Parse Trees
Left Recursive Right Recursive
Solves left hand side first!! Solves right hand side first!!
Recursion Example
Left Recursion (from previous examples)
<expr> → <expr> + <term>
| <term>
<term> → <term> * <factor>
| <factor>
Right Recursion (specifically for power)
<factor> → <exp> ** <factor>
| <exp>
<exp> → ( <expr> )
| <id>
(Precedence: rules further down the grammar bind tighter)
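The effect of the recursion side can be sketched with two tiny Python functions, one per rule shape. Both are simplifications I am assuming for illustration: operands are single identifiers, trailing tokens are not checked, and the left-recursive rule is written as a loop. Python's own `**` operator happens to be right associative, so it can be used to check the arithmetic.

```python
def parse_power(tokens):
    """<factor> -> <exp> ** <factor> | <exp>   (right recursion)."""
    head, rest = tokens[0], tokens[1:]
    if rest and rest[0] == "**":
        # The recursive call sits RIGHT of the operator, so the right side nests deeper.
        return ("**", head, parse_power(rest[1:]))
    return head

def parse_sum(tokens):
    """<expr> -> <expr> + <term> | <term>   (left recursion, written as a loop)."""
    node = tokens[0]
    i = 1
    while i < len(tokens) and tokens[i] == "+":
        node = ("+", node, tokens[i + 1])   # what was built so far becomes the LEFT child
        i += 2
    return node

if __name__ == "__main__":
    print(parse_power("A ** B ** C".split()))
    print(parse_sum("A + B + C".split()))
    print(2 ** 3 ** 2, (2 ** 3) ** 2)       # right-to-left grouping vs forced left-to-right
```

The deepest tuple is the part that "dangles" lowest in the parse tree: the right-hand pair for `**`, the left-hand pair for `+`.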
Showing the Proper Parse Tree for **
Grammar
<expr> → <expr> + <term>
| <expr> - <term>
| <term>
<term> → <term> * <factor>
| <term> / <factor>
| <factor>
<factor> → <exp> ** <factor>
| <exp>
<exp> → ( <expr> )
| <id>
<id> → A | B | C | …
Parse trees shown for: A + B * C and A ** B + C
1. Try A + (B * C) / D just to get used to this new grammar. Did it come out
right?
2. Then try A ** B ** C. Which portion was the lowest on the tree?
Answersb:
FYI Section
Who needs to know and understand your language?
there are many people needed in the development of a new programming
language, but who are they??
Answers:
Language recognizers and Generators
Language Recognizers
o accept a Language like a Machine.
The Machines take a string as input.
The Machines will accept the input if, when run,
the Machine stops at an accept state. Otherwise the input is
rejected.
If a Machine M stops at an accept state on a given input string S,
M is said to accept S. Otherwise M is said to reject S.
M recognizes Language L when S is in L if and only if M
accepts S.
o Finite State Machines (FSMs) are the simplest kind of recognizer
JFLAP is an FSM application!!
o Push Down Automata (PDAs) are a well-known form of language
recognizer.
not really covered here
Language Generators
o create the strings (sentences) of a Language.
o A generator provides a construction description.
o If a generator is able to construct all strings in a Language L, and every
string S that can be constructed by that generator is in L, we can say that
the generator is a generator for the language L.
o If there is no way to construct a string S from the generator, S is not in
L.
o Context Free Grammars (CFGs) are a well-known type of language
generator!!
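A recognizer of the finite-state kind can be sketched in a few lines of Python. The machine below is a hypothetical example, not one from the notes: it accepts identifiers made of a letter followed by any number of letters or digits. The state names and the `accepts` function are assumptions for illustration.

```python
# Transition table: (current state, kind of input character) -> next state
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("ident", "letter"): "ident",
    ("ident", "digit"):  "ident",
}
ACCEPT_STATES = {"ident"}

def classify(char):
    """Reduce a character to the input class the machine understands."""
    if char.isascii() and char.isalpha():
        return "letter"
    if char.isascii() and char.isdigit():
        return "digit"
    return "other"

def accepts(text):
    """Run the machine over text; accept only if it stops in an accept state."""
    state = "start"
    for char in text:
        state = TRANSITIONS.get((state, classify(char)))
        if state is None:            # no transition defined: reject immediately
            return False
    return state in ACCEPT_STATES

if __name__ == "__main__":
    for sample in ["x1", "1x", ""]:
        print(repr(sample), "->", accepts(sample))
```

The machine never builds a string; it only answers yes or no for a string it is handed, which is the difference between a recognizer and a generator such as a grammar.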
Extended (Updated) BNF (EBNF)
like anything else, some updates were made for convenience
only increased the readability/writability (for us!!)
there are other versions of the updates
3 most common updates
o optional part in RHS
Optional parts are placed in brackets [ ]
many rules written with |’s can now be replaced this way
if a symbol is unique to one of the alternatives, the | may need to stay
fewer rules, or at least fewer lines to write
EBNF “Optional” Update (Extension)
in C++:
<if_stmt> → if( <expression> ) <statements> [else <statements> ]
Replaces
<if_stmt> → if( <expression> ) <statements>
| if( <expression> ) <statements> else <statements>
o repeating
0 or more!!
use of braces in an RHS to indicate that the enclosed part can be
repeated indefinitely OR left out altogether
works great for lists!!
look for any recursion in the BNF form to be replaced
EBNF “Repeating” Update
<ident_list> → identifier { , identifier }
Replaces
<ident_list> → identifier
| identifier , <ident_list>
o Multiple choice!!
choose a single element from a group
options are placed in ( )s and separated by |
notice the | count is the same, just now in one line
EBNF “Multiple Choice” Update
<term> → <term> ( * | / | % ) <factor>
Replaces
<term> → <term> * <factor>
| <term> / <factor>
| <term> % <factor>
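The EBNF extensions map quite directly onto control flow when a rule is coded by hand: braces become a loop, brackets become an `if`, and a parenthesized choice becomes a membership test. The Python sketch below shows only the "repeating" case for an identifier list; the function name is hypothetical, the input is assumed to be already split into tokens, and Python's own `str.isidentifier` stands in for the identifier token.

```python
def is_ident_list(tokens):
    """<ident_list> -> identifier { , identifier }"""
    pos = 0
    # The rule must start with one identifier.
    if pos >= len(tokens) or not tokens[pos].isidentifier():
        return False
    pos += 1
    # { , identifier } : zero or more repetitions become a while loop.
    while pos < len(tokens) and tokens[pos] == ",":
        pos += 1
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            return False              # a comma must be followed by an identifier
        pos += 1
    return pos == len(tokens)         # nothing may be left over

if __name__ == "__main__":
    print(is_ident_list(["a", ",", "b", ",", "c"]))
    print(is_ident_list(["a", ","]))
```

The recursive BNF version of the same rule would call itself after each comma instead of looping; both accept the same lists.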
This new fangled EBNF thingy
the brackets, braces, and parentheses in the EBNF form are called
“metasymbols”
o notational tools
o not terminal symbols
issues
o in case the metasymbols are also terminal symbols in the language
the instance that are terminals are underlined or quoted
o loss of associativity
using the EBNF form of + above
no longer does it imply direction of associativity
this is fixed by using an EBNF syntax analyzer discussed later
Comparison #1 between BNF and EBNF
BNF
<expr> → <expr> + <term>
| <expr> - <term>
| <term>
<term> → <term> * <factor>
| <term> / <factor>
| <factor>
<factor> → <exp> ** <factor>
| <exp>
<exp> → ( <expr> )
| <id>
EBNF
<expr> → <term> { (+ | -) <term> }
<term> → <factor> { (* | /) <factor> }
<factor> → <exp> { ** <factor> }
<exp> → ( <expr> )
| <id>
Comparison #2 between BNF and EBNF
BNF
<expr> → <expr> + <term>
| <expr> - <term>
| <term>
<term> → <term> * <factor>
| <term> / <factor>
| <factor>
EBNF
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
Converting BNF to EBNF Hints
Look for recursion in grammar:
A ::= a A | B
⇒ A ::= { a } B
Look for common string that can be factored out with grouping and options.
A ::= a B | a
⇒ A ::= a [B]
BNF to EBNF Conversion Help
(BNF)   <expr> ::= <digits>
        <digits> ::= <digit>
        | <digit> <digits>
(EBNF)  <expr> ::= <digit>+

(BNF)   <expr> ::= <digits> | empty
(EBNF)  <expr> ::= <digit>*

(BNF)   <id> ::= <letter>
        | <id><letter>
        | <id><digit>
(EBNF)  <id> ::= <letter> (<letter> | <digit>)*
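The + and * suffixes used in these EBNF forms behave like the same operators in regular expressions, so the three conversions can be tried out with Python's `re` module. This is only a sketch: it assumes <digit> means 0 to 9 and <letter> means an ASCII letter, which the notes do not spell out, and the constant names are hypothetical.

```python
import re

DIGIT_PLUS = re.compile(r"[0-9]+")                  # <expr> ::= <digit>+
DIGIT_STAR = re.compile(r"[0-9]*")                  # <expr> ::= <digit>*
IDENT      = re.compile(r"[A-Za-z][A-Za-z0-9]*")    # <id> ::= <letter> (<letter> | <digit>)*

def matches(pattern, text):
    """True when the whole text fits the pattern."""
    return pattern.fullmatch(text) is not None

if __name__ == "__main__":
    print(matches(DIGIT_PLUS, "2024"), matches(DIGIT_PLUS, ""))
    print(matches(DIGIT_STAR, ""))
    print(matches(IDENT, "x1"), matches(IDENT, "1x"))
```

The only difference between the first two patterns is whether the empty string is allowed, which is exactly what the "| empty" alternative adds in the BNF version.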
Lupoli’s Strategy on BNFs to EBNF
<program> → begin <stmt_list> end
<stmt_list> → <stmt>
| <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var>
| <var> - <var>
| <var>
1. Find all Multiple choice, notice not ALL | are “multiple choice”
a. those that have a < ? > option < ? > are prime candidates
b. anything with multiple terminal “identifiers” are fair game
c. ( )s
<program> → begin <stmt_list> end
<stmt_list> → <stmt>
| <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → (A | B | C)
<expression> → <var> (+ | -) <var>
| <var>
2. Find “repeating” sequences (anything that can have multiple statements)
a. repeating can be 0!! (or more)
b. { }s
<program> → begin <stmt_list> end
<stmt_list> → <stmt> { ; <stmt> }
<stmt> → <var> = <expression>
<var> → (A | B | C)
<expression> → <var> (+ | -) <var>
| <var>
3. Optionals, rules that are very similar, one extends from another
a. <expression>: the operator and second <var> are optional, so they go in [ ]
(braces would be wrong here: they would allow A + B + C, which the BNF does not)
<program> → begin <stmt_list> end
<stmt_list> → <stmt> { ; <stmt> }
<stmt> → <var> = <expression>
<var> → (A | B | C)
<expression> → <var> [ (+ | -) <var> ]
1. Convert the BNF grammar to EBNF
BNF EBNF
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <id> + <expr>
| <id> * <expr>
| ( <expr> )
| <id>
2. Convert the following from EBNF to BNF
S → A {bA}
A → a [b] A
Answers
Simple Grammar Exercises
#2
#3
Binary String Grammar Example
<binary-string> -> 0
| 1
| <binary-string> <binary-string>
1010111
Leftmost Greedy Rightmost Greedy
Turning Equations into Trees
Why certain operations are not covered in grammar
Where order does not matter (covered)
3+4=
4+3=
3*4=
4*3=
Where order does matter (would need to be added)
4/2=
2/4=
Proving Order precedence works
A=C+B A=C*B
A = (B + C) * A A = (B * C) * A
A+(B*C)/D A ** B ** C