Compiler 2
Chang Chi-Chung
2008.03 rev.1
A Simple Syntax-Directed Translator
This chapter contains introductory material to
Chapters 3 to 8
Goal: create a syntax-directed translator that maps
infix arithmetic expressions into postfix expressions.
Building a simple compiler involves:
Defining the syntax of a programming language
Developing a source code parser: for our compiler
we will use predictive parsing
Implementing syntax-directed translation to
generate intermediate code
A Code Fragment To Be Translated
Goal: extend the syntax-directed translator to map code fragments into
three-address code. See Appendix A.
Source fragment:
    {
        int i; int j; float[100] a; float v; float x;
        while (true) {
            do i = i + 1; while ( a[i] < v );
            do j = j - 1; while ( a[j] > v );
            if ( i >= j ) break;
            x = a[i]; a[i] = a[j]; a[j] = x;
        }
    }
Three-address code:
     1: i = i + 1
     2: t1 = a [ i ]
     3: if t1 < v goto 1
     4: j = j - 1
     5: t2 = a [ j ]
     6: if t2 > v goto 4
     7: ifFalse i >= j goto 9
     8: goto 14
     9: x = a [ i ]
    10: t3 = a [ j ]
    11: a [ i ] = t3
    12: a [ j ] = x
    13: goto 1
    14:
A Model of a Compiler Front End
Source program (character stream) → Lexical analyzer → token stream → Parser → syntax tree → Intermediate code generator → three-address code
Symbol table: used by the phases above
Two Forms of Intermediate Code
Abstract syntax tree:
    do-while
        body: assign
            i
            +
                i
                1
        condition: >
            []
                a
                i
            v
Three-address instructions:
    1: i = i + 1
    2: t1 = a [ i ]
    3: if t1 < v goto 1
Syntax Definition
G = <T, N, P, S>
T = { +, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
N = { list, digit }
P =
    list → list + digit
    list → list - digit
    list → digit
    digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
S = list
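The grammar above can be checked by hand on short strings. The sketch below is one possible recognizer for it, not part of the original slides: the class name ListRecognizer and its methods are hypothetical, and it assumes that both + and - may join digits, as the later example 9-5+2 requires.

```java
// Hypothetical helper: recognizes strings generated by
//   list  -> list + digit | list - digit | digit
//   digit -> 0 | 1 | ... | 9
// Whitespace is not a terminal of the grammar, so it is rejected.
class ListRecognizer {
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Unfolding the left-recursive productions shows that every string of
    // the language is one digit followed by zero or more (operator, digit) pairs.
    static boolean accepts(String s) {
        if (s.isEmpty() || !isDigit(s.charAt(0))) return false;
        int i = 1;
        while (i < s.length()) {
            char op = s.charAt(i);
            if (op != '+' && op != '-') return false;
            if (i + 1 >= s.length() || !isDigit(s.charAt(i + 1))) return false;
            i += 2; // consume the operator and the digit after it
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(accepts("9-5+2")); // true
        System.out.println(accepts("9-+2"));  // false
    }
}
```

The loop mirrors how a derivation from list repeatedly appends an operator and a digit on the right.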
Derivations
Example: a production A → XYZ gives a parse-tree node labeled A
with children X, Y, Z, from left to right.
        A
      / | \
     X  Y  Z
Example of a Parse Tree
Parse tree of the string 9-5+2 using grammar G:
    list
        list
            list
                digit
                    9
            -
            digit
                5
        +
        digit
            2
The sequence of leaves, read from left to right, is called the
yield of the parse tree.
Ambiguity
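A grammar is ambiguous if some string has more than one parse tree.
[Figure: two parse trees for 9 - 5 + 2, both rooted at string: one groups the input as (9 - 5) + 2, the other as 9 - (5 + 2)]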
Associativity of Operators
Left-associative
If an operand has an operator on both sides, it belongs
to the operator on its left.
The string a+b+c has the same meaning as (a+b)+c
Left-associative operators have left-recursive productions
left → left + term | term
Right-associative
If an operand has an operator on both sides, it belongs
to the operator on its right.
The string a=b=c has the same meaning as a=(b=c)
Right-associative operators have right-recursive productions
right → term = right | term
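The effect of associativity is easiest to see with an operator for which grouping changes the result. The following sketch is an illustration only (the class Associativity and its method names are made up for this example): it folds the same operands with subtraction from the left and from the right, and shows Java's own right-associative assignment.

```java
// Hypothetical illustration of left vs. right grouping of one operator.
class Associativity {
    // Left-associative grouping: ((a - b) - c) ...
    static int foldLeft(int[] operands) {
        int acc = operands[0];
        for (int i = 1; i < operands.length; i++) {
            acc = acc - operands[i];
        }
        return acc;
    }

    // Right-associative grouping: (a - (b - c)) ...
    static int foldRight(int[] operands) {
        int acc = operands[operands.length - 1];
        for (int i = operands.length - 2; i >= 0; i--) {
            acc = operands[i] - acc;
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] xs = {9, 5, 2};
        System.out.println(foldLeft(xs));  // (9 - 5) - 2 = 2
        System.out.println(foldRight(xs)); // 9 - (5 - 2) = 6

        // Assignment in Java is right-associative: a = b = c means a = (b = c).
        int a, b, c = 7;
        a = b = c;
        System.out.println(a + " " + b);   // 7 7
    }
}
```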
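Associativity of Operators (cont’d)
[Figure: parse tree for a + b + c, built from list and digit, growing down to the left (left-associative); parse tree for a = b = c, built from right and letter, growing down to the right (right-associative)]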
Precedence of Operators
The string 9+5*2 has the same meaning as 9+(5*2)
* has higher precedence than +
Construct a grammar for arithmetic expressions
with operator precedence.
left-associative : + - (expr)
left-associative : * / (term)
Step 1:
    factor → digit | ( expr )
Step 2:
    term → term * factor
         | term / factor
         | factor
Step 3:
    expr → expr + term
         | expr - term
         | term
Step 4:
    expr → expr + term | expr - term | term
    term → term * factor | term / factor | factor
    factor → digit | ( expr )
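To see that the layered grammar really enforces precedence and left associativity, it can be turned into a small evaluator. The sketch below is an assumption-laden illustration, not the slides' translator: the class PrecedenceEvaluator is hypothetical, each left-recursive production is implemented as a loop, operands are single digits, and / is integer division.

```java
// Hypothetical evaluator with one method per nonterminal of the Step 4 grammar.
class PrecedenceEvaluator {
    private final String input;
    private int pos = 0;

    PrecedenceEvaluator(String input) { this.input = input; }

    static int evaluate(String s) {
        PrecedenceEvaluator p = new PrecedenceEvaluator(s);
        int value = p.expr();
        if (p.pos != s.length()) {
            throw new IllegalArgumentException("syntax error at " + p.pos);
        }
        return value;
    }

    // Current character, or '\0' at end of input.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    // expr -> expr + term | expr - term | term  (loop replaces left recursion)
    private int expr() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = input.charAt(pos++);
            int right = term();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // term -> term * factor | term / factor | factor
    private int term() {
        int value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = input.charAt(pos++);
            int right = factor();
            value = (op == '*') ? value * right : value / right;
        }
        return value;
    }

    // factor -> digit | ( expr )
    private int factor() {
        char c = peek();
        if (c >= '0' && c <= '9') {
            pos++;
            return c - '0';
        }
        if (c == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("expected ) at " + pos);
            }
            pos++;
            return value;
        }
        throw new IllegalArgumentException("syntax error at " + pos);
    }
}
```

For example, PrecedenceEvaluator.evaluate("9+5*2") yields 19, while evaluate("(9+5)*2") yields 28, because * is handled one level deeper (in term) than +.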
An Example: Syntax of Statements
The grammar is a subset of Java statements.
Pseudocode for translating expr → expr1 + term:
    translate expr1 ;
    translate term ;
    handle + ;
Syntax-Directed Translation (Cont’d)
Two concepts (approaches) related to
Syntax-Directed Translation.
Synthesized Attributes
Syntax-directed definition
Build up a translation by attaching strings (semantic
rules) as attributes to the nodes in the parse tree.
Translation Schemes
Syntax-directed translation
Build up a translation by program fragments which are
called semantic actions and embedded within production
bodies.
Syntax-directed definition
A syntax-directed definition associates
With each grammar symbol (terminal or nonterminal), a set of attributes.
With each production, a set of semantic rules for computing the
values of the attributes associated with the symbols appearing
in the production.
An attribute is said to be
Synthesized
if its value at a parse-tree node is determined from attribute values
at its children and at the node itself.
Inherited
if its value at a parse-tree node is determined from attribute values
at the node itself, its parent, and its siblings in the parse tree.
An Example: Synthesized Attributes
An annotated parse tree
Suppose a node N in a parse tree is labeled by grammar symbol X.
Then X.a denotes the value of attribute a of X at node N.
[Figure: annotated parse tree for 9 - 5 + 2, with term.t = "9" at the leftmost term node and expr.t = "95-2+" at the root]
Semantic Rules
Production               Semantic Rules
expr → expr1 + term      expr.t = expr1.t || term.t || '+'
expr → expr1 - term      expr.t = expr1.t || term.t || '-'
expr → term              expr.t = term.t
term → 0                 term.t = '0'
term → 1                 term.t = '1'
…                        …
term → 9                 term.t = '9'
|| is the string concatenation operator in the semantic rules.
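The semantic rules can be read as a recipe for computing the attribute t bottom-up. The sketch below is one way to express that in Java, under assumptions of its own: the classes ExprNode, Digit, BinOp and SynthesizedDemo are hypothetical, and the tree is a simplified one in which the chain expr → term is collapsed into a single node.

```java
// Hypothetical node classes; t() plays the role of the synthesized attribute t.
abstract class ExprNode {
    abstract String t();
}

class Digit extends ExprNode {
    final char digit;
    Digit(char digit) { this.digit = digit; }

    // term -> 9 : term.t = '9' (likewise for the other digits)
    String t() { return String.valueOf(digit); }
}

class BinOp extends ExprNode {
    final ExprNode left;
    final char op;
    final ExprNode right;

    BinOp(ExprNode left, char op, ExprNode right) {
        this.left = left;
        this.op = op;
        this.right = right;
    }

    // expr -> expr1 op term : expr.t = expr1.t || term.t || op
    String t() { return left.t() + right.t() + op; }
}

class SynthesizedDemo {
    // Builds the tree for 9 - 5 + 2, grouped as (9 - 5) + 2.
    static String postfixOfExample() {
        ExprNode tree = new BinOp(
                new BinOp(new Digit('9'), '-', new Digit('5')),
                '+',
                new Digit('2'));
        return tree.t();
    }

    public static void main(String[] args) {
        System.out.println(postfixOfExample()); // 95-2+
    }
}
```

Each node's value depends only on its children, which is what makes t a synthesized attribute.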
Depth-First Traversals
Tree traversals
Breadth-First
Depth-First
Preorder: N L R
Inorder: L N R
Postorder: L R N
Depth-first traversal used here: postorder, children visited from left to right
procedure visit(node N)
{
for ( each child C of N, from left to right )
{
visit(C);
}
evaluate semantic rules at node N;
}
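The visit procedure above translates almost line for line into Java. In this sketch, the names TreeNode and PostorderDemo are hypothetical, "evaluating the semantic rules" is replaced by recording the node's label, and the tree is a small operator tree for 9 - 5 + 2 rather than the full parse tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node with an ordered list of children.
class TreeNode {
    final String label;
    final List<TreeNode> children = new ArrayList<TreeNode>();

    TreeNode(String label, TreeNode... kids) {
        this.label = label;
        for (TreeNode k : kids) {
            children.add(k);
        }
    }
}

class PostorderDemo {
    // visit(N): visit each child from left to right, then process N itself.
    static void visit(TreeNode n, List<String> out) {
        for (TreeNode c : n.children) {
            visit(c, out);
        }
        out.add(n.label); // stands in for "evaluate semantic rules at node N"
    }

    // Postorder labels of the tree for (9 - 5) + 2.
    static List<String> exampleOrder() {
        TreeNode tree = new TreeNode("+",
                new TreeNode("-", new TreeNode("9"), new TreeNode("5")),
                new TreeNode("2"));
        List<String> out = new ArrayList<String>();
        visit(tree, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(exampleOrder()); // [9, 5, -, 2, +]
    }
}
```

Because children are processed before their parent, every value a node needs is available by the time the node itself is handled.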
Example: Depth-First Traversals
[Figure: annotated parse tree for 9 - 5 + 2 evaluated in postorder: term.t = 9, expr.t = 9, term.t = 5, and so on up to expr.t = 95-2+ at the root]
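Translation scheme (semantic actions embedded in production bodies):
expr → expr + term { print('+') }
expr → expr - term { print('-') }
expr → term
term → 0 { print('0') }
term → 1 { print('1') }
…
[Figure: parse tree with actions attached as extra leaves, e.g. 9 { print('9') } and 5 { print('5') }]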
Parsing
The process of determining if a string of
terminals (tokens) can be generated by a
grammar.
Time complexity:
For any CFG there is a parser that takes at most
O(n^3) time to parse a string of n terminals.
Linear algorithms suffice to parse essentially all
languages that arise in practice.
Two kinds of methods
Top-down: constructs a parse tree from root to leaves
Bottom-up: constructs a parse tree from leaves to root
Top-Down Parsing
Recursive descent parsing is a top-down method
of syntax analysis in which a set of recursive procedures
is used to process the input.
One procedure is associated with each nonterminal of a grammar.
If a nonterminal has multiple productions, each production is
implemented in a branch of a selection statement based
on input lookahead information.
Predictive parsing
A special form of recursive descent parsing
The lookahead symbol unambiguously determines the flow
of control through the procedure body for each nonterminal.
An Example: Top-Down Parsing
stmt → expr ;
     | if ( expr ) stmt
     | for ( optexpr ; optexpr ; optexpr ) stmt
     | other
optexpr → ε
        | expr
Predictive parser for stmt:
void stmt() {
    switch ( lookahead ) {
    case expr:
        match(expr); match(';'); break;
    case if:
        match(if); match('(');
        match(expr); match(')');
        stmt(); break;
    case for:
        match(for); match('(');
        optexpr(); match(';');
        optexpr(); match(';');
        optexpr(); match(')');
        stmt(); break;
    case other:
        match(other); break;
    default:
        report("syntax error");
    }
}
Use ε-productions
optexpr → ε | expr
void optexpr() {
    if ( lookahead == expr ) match(expr);
}
void match(terminal t) {
    if ( lookahead == t )
        lookahead = nextTerminal;
    else
        report("syntax error");
}
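The pseudocode for stmt, optexpr and match can be made runnable with a few assumptions. In the sketch below, the class StmtParser and the helper parses are hypothetical, tokens are plain strings (with expr and other treated as terminals, as in the grammar), "$" is an assumed end-of-input marker, and a syntax error is reported by throwing an exception.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical predictive parser for
//   stmt    -> expr ; | if ( expr ) stmt
//            | for ( optexpr ; optexpr ; optexpr ) stmt | other
//   optexpr -> ε | expr
class StmtParser {
    private final List<String> tokens;
    private int pos = 0;
    private String lookahead;

    StmtParser(List<String> tokens) {
        this.tokens = tokens;
        this.lookahead = nextTerminal();
    }

    // Next token, or the assumed end marker "$" once the input is exhausted.
    private String nextTerminal() {
        return pos < tokens.size() ? tokens.get(pos++) : "$";
    }

    void match(String t) {
        if (lookahead.equals(t)) {
            lookahead = nextTerminal();
        } else {
            throw new IllegalStateException(
                    "syntax error: expected " + t + " but saw " + lookahead);
        }
    }

    // One branch per production; the lookahead alone selects the branch.
    void stmt() {
        switch (lookahead) {
            case "expr":
                match("expr"); match(";");
                break;
            case "if":
                match("if"); match("("); match("expr"); match(")");
                stmt();
                break;
            case "for":
                match("for"); match("(");
                optexpr(); match(";");
                optexpr(); match(";");
                optexpr(); match(")");
                stmt();
                break;
            case "other":
                match("other");
                break;
            default:
                throw new IllegalStateException("syntax error at " + lookahead);
        }
    }

    // optexpr -> ε | expr : do nothing when the lookahead is not expr.
    void optexpr() {
        if (lookahead.equals("expr")) match("expr");
    }

    // Convenience wrapper: true if the tokens form exactly one stmt.
    static boolean parses(String... tokens) {
        try {
            StmtParser p = new StmtParser(Arrays.asList(tokens));
            p.stmt();
            p.match("$");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For instance, parses("for", "(", ";", "expr", ";", ")", "other") returns true: the first and third optexpr calls take the ε-production because the lookahead is ";" and ")" respectively.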
Example: Predictive Parsing
[Figure: LL(1) parsing; the parse tree is grown from the root stmt while lookahead marks the current position in the input]
FIRST
FIRST(α) is the set of terminals that appear as the
first symbols of one or more strings generated from α,
where α is a sentential form.
Example
FIRST(stmt) = { expr, if, for, other }
FIRST(expr ;) = { expr }
stmt → expr ;
     | if ( expr ) stmt
     | for ( optexpr ; optexpr ; optexpr ) stmt
     | other
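FIRST sets like the ones in this example can be computed mechanically. The following is a minimal sketch of the usual fixed-point computation, with several assumptions: the class FirstSets and its methods are hypothetical, the grammar is hard-coded as a map from nonterminal to production bodies, an empty body stands for an ε-production, and any symbol without productions (including expr here) is treated as a terminal.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FirstSets {
    static final String EPS = "ε";

    // The stmt grammar: nonterminal -> list of production bodies.
    static Map<String, List<List<String>>> grammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<String, List<List<String>>>();
        g.put("stmt", Arrays.asList(
                Arrays.asList("expr", ";"),
                Arrays.asList("if", "(", "expr", ")", "stmt"),
                Arrays.asList("for", "(", "optexpr", ";", "optexpr", ";", "optexpr", ")", "stmt"),
                Arrays.asList("other")));
        g.put("optexpr", Arrays.asList(
                Collections.<String>emptyList(),   // optexpr -> ε
                Arrays.asList("expr")));
        return g;
    }

    // FIRST of a sequence of symbols, using the nonterminal sets found so far.
    static Set<String> firstOf(List<String> body, Map<String, Set<String>> first) {
        Set<String> result = new TreeSet<String>();
        for (String sym : body) {
            if (!first.containsKey(sym)) {      // terminal: it starts the string
                result.add(sym);
                return result;
            }
            Set<String> f = first.get(sym);
            for (String t : f) {
                if (!t.equals(EPS)) result.add(t);
            }
            if (!f.contains(EPS)) return result; // sym cannot vanish, so stop here
        }
        result.add(EPS);                         // empty body, or all symbols can vanish
        return result;
    }

    // Repeat until no FIRST set grows any further.
    static Map<String, Set<String>> compute(Map<String, List<List<String>>> g) {
        Map<String, Set<String>> first = new LinkedHashMap<String, Set<String>>();
        for (String nt : g.keySet()) {
            first.put(nt, new TreeSet<String>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : g.entrySet()) {
                for (List<String> body : e.getValue()) {
                    if (first.get(e.getKey()).addAll(firstOf(body, first))) {
                        changed = true;
                    }
                }
            }
        }
        return first;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> first = compute(grammar());
        System.out.println(first.get("stmt"));    // [expr, for, if, other]
        System.out.println(first.get("optexpr")); // [expr, ε]
    }
}
```

Predictive parsing works for this grammar because the FIRST sets of the four stmt alternatives are pairwise disjoint, so one lookahead token selects the production.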
Examples: First
type → simple
     | ^ id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
Example:
stmt → if ( expr ) stmt
     | if ( expr ) stmt else stmt
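[Figure: two ways of generating the string β α α … α: left-recursively from A, and right-recursively from A and R, where R finally derives ε]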
[Figure: parse tree for 9 - 5 + 2 with nodes labeled expr, term and helper]
Conclusion: Parsing and Translation Scheme
Given a context-free grammar G with embedded semantic actions:
expr → expr + term { print('+') }
expr → expr - term { print('-') }
expr → term
term → 0 { print('0') }
term → 1 { print('1') }
…
term → 9 { print('9') }
[Figure: parse-tree fragment showing term → 2 { print('2') } next to rest → ε]
void term() {
    if ( lookahead is a digit ) {
        t = lookahead; match(lookahead);
        print(t);
    }
    else
        report("syntax error");
}
Conclusion: Parsing and
Translation Scheme
Step 3
Simplifying the Translator
Recursive version:
void rest() {
    if ( lookahead == '+' ) {
        match('+'); term();
        print('+'); rest();
    }
    else if ( lookahead == '-' ) {
        match('-'); term();
        print('-'); rest();
    }
    else { }
}
Iterative version (the tail-recursive calls replaced by a loop):
void rest() {
    while ( true ) {
        if ( lookahead == '+' ) {
            match('+'); term();
            print('+'); continue;
        }
        else if ( lookahead == '-' ) {
            match('-'); term();
            print('-'); continue;
        }
        break;
    }
}
Conclusion: Parsing and Translation Scheme
Complete program
import java.io.*;
class Parser {
static int lookahead;