Welcome To CS4212: Compiler Design
. . .
Target program:
LD R1 X
LD R2 1
ADD R3 R2 R1
STR R3 X
Formal languages
A language is formally defined by:
A set T of terminal symbols.
A set N of non-terminal symbols.
A set P of syntactic rules (or production rules).
A start symbol S.
We define a grammar G by G = (T, N, P, S).
The language of context-free grammars
Expr = Expr Operand Expr
Expr = Identifier
Expr = Constant
Expr = ( Expr )
Operand = + | * | - | /
Identifier = Char Char | Char
Char = A | ... | Z | a | ... | z
Constant = Number Number | Number
Number = 0 | ... | 9
This notation was originally developed by J. Backus and P. Naur
for the definition of Algol 60. It is commonly called Backus-Naur
form, or BNF.
Exercise: Determine start, terminal and non-terminal symbols.
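One way to approach the exercise mechanically is to write the productions down as data and read T, N and S off them. The sketch below is only an illustration: the dictionary encoding, the function name `classify` and the choice of Expr as start symbol are assumptions, not part of the slides.

```python
import string

# Productions P of the expression grammar: one entry per left-hand side,
# each alternative is a list of symbols. The "..." ranges are written out.
productions = {
    "Expr": [["Expr", "Operand", "Expr"], ["Identifier"], ["Constant"],
             ["(", "Expr", ")"]],
    "Operand": [["+"], ["*"], ["-"], ["/"]],
    "Identifier": [["Char", "Char"], ["Char"]],
    "Char": [[c] for c in string.ascii_uppercase + string.ascii_lowercase],
    "Constant": [["Number", "Number"], ["Number"]],
    "Number": [[d] for d in string.digits],
}

def classify(productions, start):
    """Return the grammar as a tuple (T, N, P, S)."""
    # Non-terminals are exactly the symbols that have a production.
    nonterminals = set(productions)
    # Every symbol that occurs on some right-hand side.
    used = {sym for alts in productions.values() for alt in alts for sym in alt}
    # Terminals occur on right-hand sides but never on a left-hand side.
    terminals = used - nonterminals
    return terminals, nonterminals, productions, start

# Assumption: Expr is taken as the start symbol.
T, N, P, S = classify(productions, "Expr")
```

Here N contains the six rule names, and T contains the operators, the parentheses, the letters and the digits.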
Extended Backus-Naur Form
Grammars can often be simplified and shortened by using two
more constructs:
{x} expresses repetition: zero, one or more occurrences of x.
[x] expresses option: zero or one occurrence of x.
The resulting formalism is called extended Backus-Naur form, or
EBNF.
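The two EBNF constructs map directly onto control flow in a hand-written recogniser: an option becomes a single `if`, a repetition becomes a `while` loop. The following sketch assumes a hypothetical rule `Integer = ["-"] Digit {Digit}`, which does not appear in the slides and is used only for illustration.

```python
DIGITS = "0123456789"

def match_integer(text):
    """Recognise the hypothetical rule Integer = ["-"] Digit {Digit}."""
    pos = 0
    # ["-"] : option, zero or one occurrence, so a single `if`.
    if pos < len(text) and text[pos] == "-":
        pos += 1
    # Digit : exactly one occurrence is required here.
    if pos >= len(text) or text[pos] not in DIGITS:
        return False
    pos += 1
    # {Digit} : repetition, zero or more occurrences, so a `while` loop.
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    # The whole input must have been consumed.
    return pos == len(text)
```

For instance, `match_integer("-7")` succeeds, while `match_integer("-")` fails because the mandatory Digit is missing.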
Warm up: IMP, a simple imperative language
Concrete syntax (already simplified):
com = skip | x := exp | if exp then com else com
| if exp then com | while exp do com
| var x := exp ; com | com ; com
exp = v | x | exp op exp
op = + | - | * | / | = | < | &&
v = i | true | false
i = 1 | 2 | ...
x = l {l}
l = a | ... | z
Problems:
What is the meaning of the following command?
if e1 then c1; if e2 then c2 else c3
IMP Grammar
Compare
if e1 then c1; (if e2 then c2 else c3)
to
if e1 then (c1; if e2 then c2) else c3
The grammar has conflicts!
Examples
var x:= 1;
var y:= 2;
var z:= if x < y then true else y; skip
var x:= 1;
var x:= x < 1; skip
Are the above programs valid? What is their meaning?
We need to provide a formal specification of IMP.
Abstract syntax
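Variables $x$
Numbers $i$
Values $v ::= i \mid \text{true} \mid \text{false}$
Operators $o ::= + \mid - \mid * \mid / \mid = \mid < \mid \land$
Expressions $e ::= v \mid x \mid e \; o \; e \mid \lnot e$
Commands $c ::= \text{skip} \mid x := e \mid c ; c \mid \text{if } e \text{ then } c \text{ else } c \mid \text{while } e \text{ do } c \mid \text{var } x := e ; c$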
Assume that
if exp then com
has been translated to
if exp then com else skip
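The abstract syntax can be mirrored by one node type per production, with the one-armed conditional translated away when the tree is built. This is a minimal sketch: the class and field names and the helper `if_then` are assumptions made for illustration, and values are represented directly by Python ints and bools.

```python
from dataclasses import dataclass
from typing import Any

# Expressions (values are plain Python ints and bools).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str          # one of + - * / = < &&
    left: Any
    right: Any

@dataclass(frozen=True)
class Not:
    exp: Any

# Commands.
@dataclass(frozen=True)
class Skip:
    pass

@dataclass(frozen=True)
class Assign:
    var: str
    exp: Any

@dataclass(frozen=True)
class Seq:
    first: Any
    second: Any

@dataclass(frozen=True)
class If:
    cond: Any
    then: Any
    orelse: Any

@dataclass(frozen=True)
class While:
    cond: Any
    body: Any

@dataclass(frozen=True)
class NewVar:
    var: str
    init: Any
    body: Any

def if_then(cond, then):
    """Translate `if exp then com` into `if exp then com else skip`."""
    return If(cond, then, Skip())

# The two readings of `if e1 then c1; if e2 then c2 else c3` are
# different trees, so the ambiguity disappears at this level.
e1, e2 = Var("e1"), Var("e2")    # placeholders standing in for expressions
c1, c2, c3 = Assign("a", 1), Assign("b", 2), Assign("c", 3)
reading_a = Seq(if_then(e1, c1), If(e2, c2, c3))
reading_b = If(e1, Seq(c1, if_then(e2, c2)), c3)
```

Once a parser has committed to one of the two trees, later phases never see the concrete-syntax conflict again.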
Side conditions
Types of operands and operators must be compatible.
If-then-else condition must be a Boolean expression.
The types of the variable and the expression in an assignment
must be compatible.
We will employ a type system to enforce these side conditions!
Types classify values!
Here: Int, Bool, Cmd
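Clauses $\Gamma \vdash e : \tau$ state that expression $e$ is well-typed (with type $\tau$)
under type environment $\Gamma$, where $\tau \in \{\text{Int}, \text{Bool}\}$.
Similarly, we have $\Gamma \vdash c : \text{Cmd}$.
$\Gamma = \{x_1 : \tau_1, \ldots, x_n : \tau_n\}$, where the $x_i$ are free variables.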
Rules for expressions
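(Taut) $\Gamma \vdash i : \text{Int} \qquad \Gamma \vdash \text{true} : \text{Bool} \qquad \Gamma \vdash \text{false} : \text{Bool}$
(Var) $\dfrac{(x : \tau) \in \Gamma}{\Gamma \vdash x : \tau}$
(IOP) $\dfrac{\Gamma \vdash e_1 : \text{Int} \quad \Gamma \vdash e_2 : \text{Int} \quad o \in \{+, -, *, /\}}{\Gamma \vdash e_1 \; o \; e_2 : \text{Int}}$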
Rules for expressions
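(CMP) $\dfrac{\Gamma \vdash e_1 : \text{Int} \quad \Gamma \vdash e_2 : \text{Int} \quad o \in \{<, =\}}{\Gamma \vdash e_1 \; o \; e_2 : \text{Bool}}$
(AND) $\dfrac{\Gamma \vdash e_1 : \text{Bool} \quad \Gamma \vdash e_2 : \text{Bool}}{\Gamma \vdash e_1 \land e_2 : \text{Bool}}$
(NOT) $\dfrac{\Gamma \vdash e : \text{Bool}}{\Gamma \vdash \lnot e : \text{Bool}}$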
Rules for commands
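(SKIP) $\Gamma \vdash \text{skip} : \text{Cmd}$
(ASSIGN) $\dfrac{\Gamma \vdash x : \tau \quad \Gamma \vdash e : \tau}{\Gamma \vdash x := e : \text{Cmd}}$
(SEQ) $\dfrac{\Gamma \vdash c_1 : \text{Cmd} \quad \Gamma \vdash c_2 : \text{Cmd}}{\Gamma \vdash c_1 ; c_2 : \text{Cmd}}$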
Rules for commands
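(IF) $\dfrac{\Gamma \vdash e : \text{Bool} \quad \Gamma \vdash c_1 : \text{Cmd} \quad \Gamma \vdash c_2 : \text{Cmd}}{\Gamma \vdash \text{if } e \text{ then } c_1 \text{ else } c_2 : \text{Cmd}}$
(WHILE) $\dfrac{\Gamma \vdash e : \text{Bool} \quad \Gamma \vdash c : \text{Cmd}}{\Gamma \vdash \text{while } e \text{ do } c : \text{Cmd}}$
(NEWVAR) $\dfrac{\Gamma \vdash e : \tau \quad \Gamma, x : \tau \vdash c : \text{Cmd}}{\Gamma \vdash \text{var } x := e ; c : \text{Cmd}}$
We define $\Gamma, x : \tau = \{ (y : \tau') \in \Gamma \mid y \neq x \} \cup \{ x : \tau \}$.
For example, $\{x : \text{Bool}\}, x : \text{Int} = \{x : \text{Int}\}$.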
IMP semantics
We employ an operational semantics.
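Clause $S \vdash c \Downarrow S'$: executing command $c$ in state $S$ yields state $S'$.
State update: $(S \mid x \mapsto v)(y) = v$ if $x = y$, and $S(y)$ otherwise.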
Operational semantics of expressions
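(VALUE) $S \vdash i \Downarrow i \qquad S \vdash \text{false} \Downarrow \text{false} \qquad S \vdash \text{true} \Downarrow \text{true}$
(VAR) $S \vdash x \Downarrow v \quad (S(x) = v)$
(IOP) $\dfrac{S \vdash e_1 \Downarrow v_1 \quad S \vdash e_2 \Downarrow v_2 \quad v = v_1 + v_2}{S \vdash e_1 + e_2 \Downarrow v}$
$\dfrac{S \vdash e_1 \Downarrow v_1 \quad S \vdash e_2 \Downarrow v_2 \quad v = v_1 - v_2}{S \vdash e_1 - e_2 \Downarrow v}$
$\dfrac{S \vdash e_1 \Downarrow v_1 \quad S \vdash e_2 \Downarrow v_2 \quad v_2 \neq 0, \; v = v_1 \div v_2}{S \vdash e_1 / e_2 \Downarrow v}$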
Operational semantics of expressions
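(CMP) $\dfrac{S \vdash e_1 \Downarrow v_1 \quad S \vdash e_2 \Downarrow v_2 \quad v_1 = v_2}{S \vdash e_1 = e_2 \Downarrow \text{true}}$
$\dfrac{S \vdash e_1 \Downarrow v_1 \quad S \vdash e_2 \Downarrow v_2 \quad v_1 \neq v_2}{S \vdash e_1 = e_2 \Downarrow \text{false}}$
(NOT) $\dfrac{S \vdash e \Downarrow \text{true}}{S \vdash \lnot e \Downarrow \text{false}} \qquad \dfrac{S \vdash e \Downarrow \text{false}}{S \vdash \lnot e \Downarrow \text{true}}$
(AND) $\dfrac{S \vdash e_1 \Downarrow \text{true} \quad S \vdash e_2 \Downarrow v_2}{S \vdash e_1 \land e_2 \Downarrow v_2} \qquad \dfrac{S \vdash e_1 \Downarrow \text{false}}{S \vdash e_1 \land e_2 \Downarrow \text{false}}$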
Operational semantics of commands
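(SKIP) $S \vdash \text{skip} \Downarrow S$
(ASSIGN) $\dfrac{S \vdash e \Downarrow v}{S \vdash x := e \Downarrow (S \mid x \mapsto v)}$
(IF) $\dfrac{S \vdash e \Downarrow \text{true} \quad S \vdash c_1 \Downarrow S'}{S \vdash \text{if } e \text{ then } c_1 \text{ else } c_2 \Downarrow S'}$
$\dfrac{S \vdash e \Downarrow \text{false} \quad S \vdash c_2 \Downarrow S'}{S \vdash \text{if } e \text{ then } c_1 \text{ else } c_2 \Downarrow S'}$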
Operational semantics of commands
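(SEQ) $\dfrac{S \vdash c_1 \Downarrow S' \quad S' \vdash c_2 \Downarrow S''}{S \vdash c_1 ; c_2 \Downarrow S''}$
(WHILE) $\dfrac{S \vdash e \Downarrow \text{false}}{S \vdash \text{while } e \text{ do } c \Downarrow S}$
$\dfrac{S \vdash e \Downarrow \text{true} \quad S \vdash c \Downarrow S' \quad S' \vdash \text{while } e \text{ do } c \Downarrow S''}{S \vdash \text{while } e \text{ do } c \Downarrow S''}$
(NEWVAR) $\dfrac{S \vdash e \Downarrow v \quad (S \mid x \mapsto v) \vdash c \Downarrow S'}{S \vdash \text{var } x := e ; c \Downarrow (S' \mid x \mapsto S(x))}$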
Phew! Now we are in a position to write a compiler for IMP!
Operational Semantics Results
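Lemma 1 We have that $S \vdash \text{while } e \text{ do } c \Downarrow S'$
iff
$S \vdash \text{if } e \text{ then } (c; \text{while } e \text{ do } c) \text{ else skip} \Downarrow S'$.
Theorem 1 (a) Let $x \notin fv(e)$ and $S \vdash e \Downarrow v$; then
$(S \mid x \mapsto w) \vdash e \Downarrow v$.
(b) Let $x \notin fv(c)$ and $S \vdash c \Downarrow S'$; then
$(S \mid x \mapsto w) \vdash c \Downarrow (S' \mid x \mapsto w)$.
Theorem 2 IMP is deterministic. For any expression $e$,
command $c$ and state $S$ there exists at most one $v$ and at most one $S'$
such that $S \vdash e \Downarrow v$ and $S \vdash c \Downarrow S'$.
Sound language specification!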
Other compiler related issues
Resource usage verification: prevent incorrect resource access
Source program:
main(n) = open(f) ; a(f,n) ; close(f)
a(f,n) = write(f);
if (n mod 2 == 0) then read(f);
if (n > 0) then a(f,n-1)
Resource usage policy (specified as DFA):
[Figure: DFA with states 1, 2, 3, 4 and transitions labelled open, close, read, write]
What could go wrong?
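To experiment with the question, one can generate the sequence of resource operations the source program performs and run it through a policy automaton. The sketch below rests on several assumptions: the three statements of `a` are read as a plain sequence, and the `POLICY` table is a made-up example ("no read after a write") with invented state names. It is not the DFA from the slide, whose edges are not reproduced here.

```python
def trace(n):
    """Resource operations performed by main(n), in order.

    Assumption: the body of a is read as
    write; (if n even then read); (if n > 0 then recurse).
    """
    ops = ["open"]

    def a(k):
        ops.append("write")
        if k % 2 == 0:
            ops.append("read")
        if k > 0:
            a(k - 1)

    a(n)
    ops.append("close")
    return ops

# Hypothetical policy, NOT the slide's DFA: after the first write,
# no further read is allowed. Missing entries mean "reject".
POLICY = {
    ("closed", "open"): "fresh",
    ("fresh", "read"): "fresh",
    ("fresh", "write"): "written",
    ("written", "write"): "written",
    ("fresh", "close"): "done",
    ("written", "close"): "done",
}

def accepts(ops, start="closed", final=("done",)):
    """Run the DFA over ops and report whether it ends in a final state."""
    state = start
    for op in ops:
        state = POLICY.get((state, op))
        if state is None:       # no transition: the policy is violated
            return False
    return state in final
```

Under this made-up policy, `trace(1)` is open, write, write, read, close, which is rejected at the read. A static analysis would have to establish such facts for every n without running the program.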
Summary
A detailed formal language specification is crucial.
Negative examples: see C and C++.
Good examples: ANSI C and ML. ANSI C is better, but there are still
ambiguities.
The compiler writer needs to follow the language specification
carefully.
Next week: lexical analysis, some theory on DFAs, NFAs,
regular expressions.