Compiler 2

This chapter introduces the concepts needed to build a simple syntax-directed translator. It discusses defining a context-free grammar, parsing source code using predictive parsing, and implementing syntax-directed translation to generate intermediate code. The chapter also provides examples of translating infix arithmetic expressions to postfix expressions and mapping code fragments to three-address code.

Uploaded by

mamudu francis

Chapter 2

Chang Chi-Chung
2008.03 rev.1
A Simple Syntax-Directed Translator
• This chapter contains introductory material for Chapters 3 to 8.
• Goal: create a syntax-directed translator that maps infix arithmetic expressions into postfix expressions.
• Building a simple compiler involves:
  ◦ Defining the syntax of a programming language
  ◦ Developing a source code parser (our compiler uses predictive parsing)
  ◦ Implementing syntax-directed translation to generate intermediate code
A Code Fragment To Be Translated
Goal: extend the syntax-directed translator to map code fragments into three-address code. See Appendix A.

Source fragment:

{
    int i; int j; float[100] a; float v; float x;
    while (true) {
        do i = i + 1; while ( a[i] < v );
        do j = j - 1; while ( a[j] > v );
        if ( i >= j ) break;
        x = a[i]; a[i] = a[j]; a[j] = x;
    }
}

Three-address code:

 1: i = i + 1
 2: t1 = a [ i ]
 3: if t1 < v goto 1
 4: j = j - 1
 5: t2 = a [ j ]
 6: if t2 > v goto 4
 7: ifFalse i >= j goto 9
 8: goto 14
 9: x = a [ i ]
10: t3 = a [ j ]
11: a [ i ] = t3
12: a [ j ] = x
13: goto 1
14:
A Model of a Compiler Front End
Source program (character stream) → Lexical analyzer → token stream → Parser → syntax tree → Intermediate code generator → three-address code

The lexical analyzer, the parser, and the intermediate code generator all use the symbol table.
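Two Forms of Intermediate Code
Both forms represent the statement do i = i + 1; while ( a[i] < v );

• Abstract syntax tree:

do-while
├─ body
│  └─ assign
│     ├─ i
│     └─ +
│        ├─ i
│        └─ 1
└─ <
   ├─ []
   │  ├─ a
   │  └─ i
   └─ v

• Three-address instructions:

1: i = i + 1
2: t1 = a [ i ]
3: if t1 < v goto 1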
Syntax Definition

• Syntax is defined using a context-free grammar (CFG).
  ◦ BNF: Backus-Naur Form
• A context-free grammar has four components:
  ◦ A set of tokens (terminal symbols)
  ◦ A set of nonterminals
  ◦ A set of productions
  ◦ A designated start symbol
Example of CFG

• G = <T, N, P, S>
• T = { +, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
• N = { list, digit }
• P =
  ◦ list → list + digit
  ◦ list → list - digit
  ◦ list → digit
  ◦ digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• S = list
Derivations

• The language generated by a CFG is the set of all strings (sequences of tokens) that can be derived from the start symbol.
• A derivation proceeds as follows:
  ◦ Begin with the start symbol.
  ◦ Repeatedly replace a nonterminal symbol in the current sentential form with one of the right-hand sides of a production for that nonterminal.
Example of the Derivations
Productions:
  list → list + digit
  list → list - digit
  list → digit
  digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Derivation of 9-5+2:
  list ⇒ list + digit
       ⇒ list - digit + digit
       ⇒ digit - digit + digit
       ⇒ 9 - digit + digit
       ⇒ 9 - 5 + digit
       ⇒ 9 - 5 + 2

• Leftmost derivation
  ◦ Replaces the leftmost nonterminal in each step. The derivation above is a leftmost derivation.
• Rightmost derivation
  ◦ Replaces the rightmost nonterminal in each step.
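The leftmost derivation above can be reproduced mechanically. The following is a minimal Java sketch, assuming single-digit operands and ASCII + and - operators; the class name LeftmostDerivation and the method derive are illustrative names, not part of the slides. It returns the sentential forms in order, one per derivation step.

```java
import java.util.ArrayList;
import java.util.List;

final class LeftmostDerivation {
    /** Returns the sentential forms of the leftmost derivation of input, e.g. "9-5+2". */
    static List<String> derive(String input) {
        // Assumption: the input is a digit followed by (operator digit) pairs, no spaces.
        if (!input.matches("[0-9]([+-][0-9])*")) {
            throw new IllegalArgumentException("not in the language of G: " + input);
        }
        int ops = input.length() / 2;              // number of operators
        List<String> steps = new ArrayList<>();
        steps.add("list");
        StringBuilder tail = new StringBuilder();
        // list -> list op digit, once per operator; the last operator is introduced first.
        for (int k = ops; k >= 1; k--) {
            tail.insert(0, " " + input.charAt(2 * k - 1) + " digit");
            steps.add("list" + tail);
        }
        // list -> digit
        String form = "digit" + tail;
        steps.add(form);
        // digit -> 0 | ... | 9, always replacing the leftmost remaining nonterminal.
        for (int i = 0; i <= ops; i++) {
            form = form.replaceFirst("digit", String.valueOf(input.charAt(2 * i)));
            steps.add(form);
        }
        return steps;
    }

    public static void main(String[] args) {
        derive("9-5+2").forEach(System.out::println);
    }
}
```

Because every production of G with a nonterminal in its body has list as the leftmost symbol, the leftmost derivation is forced to introduce the operators from right to left before any digit is chosen.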
Parse Trees
• Given a CFG, a parse tree according to the grammar is a tree with the following properties:
  ◦ The root of the tree is labeled by the start symbol.
  ◦ Each leaf of the tree is labeled by a terminal (= token) or by ε.
  ◦ Each interior node is labeled by a nonterminal.
  ◦ If A → X1 X2 … Xn is a production, then node A has immediate children X1, X2, …, Xn, where each Xi is a terminal or a nonterminal, or ε (ε denotes the empty string).
• Example: for the production A → X Y Z

A
├─ X
├─ Y
└─ Z
Example of the Parse Tree
• Parse tree of the string 9-5+2 using grammar G:

list
├─ list
│  ├─ list
│  │  └─ digit
│  │     └─ 9
│  ├─ -
│  └─ digit
│     └─ 5
├─ +
└─ digit
   └─ 2

• The sequence of leaves, read from left to right (9 - 5 + 2), is called the yield of the parse tree.
Ambiguity

• Consider the following context-free grammar, written in the same <T, N, P, S> order as before:
    G = <{+,-,0,1,2,3,4,5,6,7,8,9}, {string}, P, string>
    P = string → string + string | string - string | 0 | 1 | … | 9
• This grammar is ambiguous, because more than one parse tree represents the string 9-5+2.
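Ambiguity (Cont'd)

Two parse trees for 9 - 5 + 2:

Tree 1, grouping (9 - 5) + 2:
string
├─ string
│  ├─ string
│  │  └─ 9
│  ├─ -
│  └─ string
│     └─ 5
├─ +
└─ string
   └─ 2

Tree 2, grouping 9 - (5 + 2):
string
├─ string
│  └─ 9
├─ -
└─ string
   ├─ string
   │  └─ 5
   ├─ +
   └─ string
      └─ 2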
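To see why the ambiguity matters, the sketch below enumerates every parse tree of a string under string → string + string | string - string | digit and evaluates each one. It is an illustration only: the class and method names are hypothetical, and the input is assumed to be well formed (single digits alternating with ASCII + or -).

```java
import java.util.ArrayList;
import java.util.List;

final class AmbiguityDemo {
    /** Returns one value per parse tree of s under the ambiguous grammar. */
    static List<Integer> valuesOfAllParseTrees(String s) {
        List<Integer> results = new ArrayList<>();
        if (s.length() == 1) {                        // string -> digit
            results.add(s.charAt(0) - '0');
            return results;
        }
        // Every operator (odd positions) may serve as the root of a parse tree.
        for (int i = 1; i < s.length(); i += 2) {
            char op = s.charAt(i);
            for (int left : valuesOfAllParseTrees(s.substring(0, i))) {
                for (int right : valuesOfAllParseTrees(s.substring(i + 1))) {
                    results.add(op == '+' ? left + right : left - right);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // One entry per parse tree: 9-(5+2) and (9-5)+2.
        System.out.println(valuesOfAllParseTrees("9-5+2"));
    }
}
```

For 9-5+2 the two trees evaluate to 2 and 6, so the grammar alone does not fix the meaning of the expression.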
Associativity of Operators
• Left-associative
  ◦ If an operand has an operator on both sides of it, it belongs to the operator to its left.
  ◦ The string a+b+c has the same meaning as (a+b)+c.
  ◦ Left-associative operators have left-recursive productions:
      left → left + term | term
• Right-associative
  ◦ If an operand has an operator on both sides of it, it belongs to the operator to its right.
  ◦ The string a=b=c has the same meaning as a=(b=c).
  ◦ Right-associative operators have right-recursive productions:
      right → term = right | term
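Associativity of Operators (cont'd)

left-associative: a + b + c (the tree grows down to the left)
list
├─ list
│  ├─ list
│  │  └─ digit
│  │     └─ a
│  ├─ +
│  └─ digit
│     └─ b
├─ +
└─ digit
   └─ c

right-associative: a = b = c (the tree grows down to the right)
right
├─ letter
│  └─ a
├─ =
└─ right
   ├─ letter
   │  └─ b
   ├─ =
   └─ right
      └─ letter
         └─ c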
Precedence of Operators
• The string 9+5*2 has the same meaning as 9+(5*2).
• * has higher precedence than +.
• Construct a grammar for arithmetic expressions that encodes operator precedence, with one nonterminal per precedence level:
  ◦ left-associative: + - (nonterminal expr)
  ◦ left-associative: * / (nonterminal term)

Step 1:
  factor → digit | ( expr )
Step 2:
  term → term * factor
       | term / factor
       | factor
Step 3:
  expr → expr + term
       | expr - term
       | term
Step 4:
  expr → expr + term | expr - term | term
  term → term * factor | term / factor | factor
  factor → digit | ( expr )
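The effect of the layered grammar can be checked with a small evaluator. This is a sketch under stated assumptions rather than the chapter's own code: operands are single digits, arithmetic is integer arithmetic, and the left-recursive productions for expr and term are written as loops, since a recursive-descent procedure cannot call itself on a left-recursive production directly. The class name ExprEvaluator is hypothetical.

```java
final class ExprEvaluator {
    private final String input;
    private int pos = 0;

    private ExprEvaluator(String input) {
        this.input = input.replace(" ", "");
    }

    /** Evaluates text according to the expr/term/factor grammar. */
    static int evaluate(String text) {
        ExprEvaluator p = new ExprEvaluator(text);
        int value = p.expr();
        if (p.pos != p.input.length()) {
            throw new IllegalArgumentException("syntax error at position " + p.pos);
        }
        return value;
    }

    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    // expr -> expr + term | expr - term | term (left recursion written as a loop)
    private int expr() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = input.charAt(pos++);
            int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // term -> term * factor | term / factor | factor
    private int term() {
        int value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = input.charAt(pos++);
            int rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs;   // integer division
        }
        return value;
    }

    // factor -> digit | ( expr )
    private int factor() {
        char c = peek();
        if (c >= '0' && c <= '9') {
            pos++;
            return c - '0';
        }
        if (c == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("expected ')' at position " + pos);
            }
            pos++;
            return value;
        }
        throw new IllegalArgumentException("syntax error at position " + pos);
    }

    public static void main(String[] args) {
        System.out.println(evaluate("9+5*2"));     // * binds tighter than +
        System.out.println(evaluate("(9+5)*2"));   // parentheses override precedence
    }
}
```

Because term is only reached through expr, every * or / is grouped before the surrounding + or -, and accumulating into value from left to right gives left associativity.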
An Example: Syntax of Statements
• The grammar is a subset of Java statements.
• Semicolons appear only at the end of bodies that do not end in stmt. This approach prevents the build-up of semicolons after statements such as if and while, which end with nested substatements.

stmt → id = expression ;
     | if ( expression ) stmt
     | if ( expression ) stmt else stmt
     | while ( expression ) stmt
     | do stmt while ( expression ) ;
     | { stmts }

stmts → stmts stmt
      | ε
Syntax-Directed Translation
• Syntax-directed translation is done by attaching rules or program fragments to productions in a grammar.
• In this chapter: translate infix expressions into postfix notation.
  ◦ Infix: 9 - 5 + 2
  ◦ Postfix: 9 5 - 2 +
• An example
  ◦ Production: expr → expr1 + term
  ◦ The pseudo-code of the translation:
      translate expr1 ;
      translate term ;
      handle + ;
Syntax-Directed Translation (Cont'd)
• Two concepts (approaches) related to syntax-directed translation:
  ◦ Synthesized attributes (syntax-directed definition): build up a translation by attaching strings, computed by semantic rules, as attributes to the nodes in the parse tree.
  ◦ Translation schemes (syntax-directed translation): build up a translation with program fragments, called semantic actions, embedded within production bodies.
Syntax-directed definition
• A syntax-directed definition associates
  ◦ with each grammar symbol (terminals and nonterminals), a set of attributes;
  ◦ with each production, a set of semantic rules for computing the values of the attributes associated with the symbols appearing in the production.
• An attribute is said to be
  ◦ synthesized if its value at a parse-tree node is determined from attribute values at its children and at the node itself;
  ◦ inherited if its value at a parse-tree node is determined from attribute values at the node itself, its parent, and its siblings in the parse tree.
An Example: Synthesized Attributes
An annotated parse tree
• Suppose a node N in a parse tree is labeled by grammar symbol X.
• X.a denotes the value of attribute a of X at node N.

expr.t = "95-2+"
├─ expr.t = "95-"
│  ├─ expr.t = "9"
│  │  └─ term.t = "9"
│  │     └─ 9
│  ├─ -
│  └─ term.t = "5"
│     └─ 5
├─ +
└─ term.t = "2"
   └─ 2
Semantic Rules
Production               Semantic Rules
expr → expr1 + term      expr.t = expr1.t || term.t || '+'
expr → expr1 - term      expr.t = expr1.t || term.t || '-'
expr → term              expr.t = term.t
term → 0                 term.t = '0'
term → 1                 term.t = '1'
…                        …
term → 9                 term.t = '9'
|| is the operator for string concatenation in the semantic rules.
Depth-First Traversals
• Tree traversals
  ◦ Breadth-first
  ◦ Depth-first
    ▪ Preorder: N L R
    ▪ Inorder: L N R
    ▪ Postorder: L R N
• Depth-first traversal used here: postorder, visiting children from left to right

procedure visit(node N)
{
    for ( each child C of N, from left to right )
    {
        visit(C);
    }
    evaluate semantic rules at node N;
}
Example: Depth-First Traversals

expr.t = 95-2+
├─ expr.t = 95-
│  ├─ expr.t = 9
│  │  └─ term.t = 9
│  │     └─ 9
│  ├─ -
│  └─ term.t = 5
│     └─ 5
├─ +
└─ term.t = 2
   └─ 2

Note: all attributes are synthesized.
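The postorder evaluation of the synthesized attribute t can be sketched in Java as follows. The Node class, the convention that a terminal leaf carries its own text in t, and the helper sampleTree are assumptions made for this illustration; the semantic rules themselves are the ones from the table of productions and semantic rules.

```java
import java.util.Arrays;
import java.util.List;

final class PostfixAttribute {
    /** Parse-tree node; label is a nonterminal name or a terminal symbol. */
    static final class Node {
        final String label;
        final List<Node> children;
        String t = "";                                // synthesized attribute

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Postorder visit: children from left to right, then the rule at n. */
    static void visit(Node n) {
        for (Node c : n.children) {
            visit(c);
        }
        if (n.children.isEmpty()) {
            n.t = n.label;                            // leaf: its own text (helper convention)
        } else if (n.children.size() == 1) {
            n.t = n.children.get(0).t;                // expr -> term, term -> digit
        } else {
            // expr -> expr1 op term : expr.t = expr1.t || term.t || op
            n.t = n.children.get(0).t + n.children.get(2).t + n.children.get(1).t;
        }
    }

    /** Parse tree of 9-5+2, built by hand. */
    static Node sampleTree() {
        Node nine = new Node("expr", new Node("term", new Node("9")));
        Node diff = new Node("expr", nine, new Node("-"), new Node("term", new Node("5")));
        return new Node("expr", diff, new Node("+"), new Node("term", new Node("2")));
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        visit(root);
        System.out.println(root.t);
    }
}
```

Since every rule reads only the attributes of a node's children, a single postorder pass is enough; no node is visited before its children have their t values.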


Translation Schemes
• A translation scheme is a CFG with semantic actions embedded in the production bodies.
• Example:
    rest → + term { print("+") } rest
• The embedded semantic action appears as an extra child in the parse tree:

rest
├─ +
├─ term
├─ { print("+") }
└─ rest

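An Example: Translation Scheme

expr → expr + term { print('+') }
expr → expr - term { print('-') }
expr → term
term → 0 { print('0') }
term → 1 { print('1') }
…

Parse tree for 9-5+2, with the actions shown as extra leaves:

expr
├─ expr
│  ├─ expr
│  │  └─ term
│  │     ├─ 9
│  │     └─ { print('9') }
│  ├─ -
│  ├─ term
│  │  ├─ 5
│  │  └─ { print('5') }
│  └─ { print('-') }
├─ +
├─ term
│  ├─ 2
│  └─ { print('2') }
└─ { print('+') }

A depth-first, left-to-right traversal executes the actions in the order print('9'), print('5'), print('-'), print('2'), print('+'), which prints 95-2+.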


Parsing
• Parsing is the process of determining whether a string of terminals (tokens) can be generated by a grammar.
• Time complexity:
  ◦ For any CFG there is a parser that takes at most O(n^3) time to parse a string of n terminals.
  ◦ Linear-time algorithms suffice to parse essentially all languages that arise in practice.
• Two kinds of methods:
  ◦ Top-down: constructs a parse tree from the root to the leaves
  ◦ Bottom-up: constructs a parse tree from the leaves to the root
Top-Down Parsing
• Recursive descent parsing is a top-down method of syntax analysis in which a set of recursive procedures is used to process the input.
  ◦ One procedure is associated with each nonterminal of a grammar.
  ◦ If a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input lookahead information.
• Predictive parsing
  ◦ A special form of recursive descent parsing
  ◦ The lookahead symbol unambiguously determines the flow of control through the procedure body for each nonterminal.
An Example: Top-Down Parsing
stmt → expr ;
     | if ( expr ) stmt
     | for ( optexpr ; optexpr ; optexpr ) stmt
     | other
optexpr → ε
        | expr

Parse tree for the input for ( ; expr ; expr ) other:

stmt
├─ for
├─ (
├─ optexpr
│  └─ ε
├─ ;
├─ optexpr
│  └─ expr
├─ ;
├─ optexpr
│  └─ expr
├─ )
└─ stmt
   └─ other


Pseudocode for a Predictive Parser

stmt → expr ;
     | if ( expr ) stmt
     | for ( optexpr ; optexpr ; optexpr ) stmt
     | other

void stmt() {
    switch ( lookahead ) {
    case expr:
        match(expr); match(';'); break;
    case if:
        match(if); match('(');
        match(expr); match(')');
        stmt(); break;
    case for:
        match(for); match('(');
        optexpr(); match(';');
        optexpr(); match(';');
        optexpr(); match(')');
        stmt(); break;
    case other:
        match(other); break;
    default:
        report("syntax error");
    }
}

Use of ε-productions:

optexpr → ε | expr

void optexpr() {
    if ( lookahead == expr ) match(expr);
}

void match(terminal t) {
    if ( lookahead == t )
        lookahead = nextTerminal;
    else
        report("syntax error");
}
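The pseudocode translates almost line for line into Java. The sketch below is one possible rendering, not the chapter's implementation: it assumes the input has already been tokenized, treats expr and other as single tokens exactly as the grammar does, and uses a hypothetical Tok enum and an accepts helper for convenience.

```java
import java.util.Arrays;
import java.util.List;

final class PredictiveParser {
    /** Terminals of the stmt grammar; EOF marks the end of input (an assumption). */
    enum Tok { EXPR, IF, FOR, OTHER, LPAREN, RPAREN, SEMI, EOF }

    private final List<Tok> input;
    private int pos = 0;
    private Tok lookahead;

    PredictiveParser(List<Tok> input) {
        this.input = input;
        this.lookahead = next();
    }

    private Tok next() {
        return pos < input.size() ? input.get(pos++) : Tok.EOF;
    }

    private void match(Tok t) {
        if (lookahead == t) {
            lookahead = next();
        } else {
            throw new IllegalStateException("syntax error: expected " + t + ", found " + lookahead);
        }
    }

    void stmt() {
        switch (lookahead) {                          // the lookahead selects the production
            case EXPR:
                match(Tok.EXPR); match(Tok.SEMI);
                break;
            case IF:
                match(Tok.IF); match(Tok.LPAREN); match(Tok.EXPR); match(Tok.RPAREN);
                stmt();
                break;
            case FOR:
                match(Tok.FOR); match(Tok.LPAREN);
                optexpr(); match(Tok.SEMI);
                optexpr(); match(Tok.SEMI);
                optexpr(); match(Tok.RPAREN);
                stmt();
                break;
            case OTHER:
                match(Tok.OTHER);
                break;
            default:
                throw new IllegalStateException("syntax error at " + lookahead);
        }
    }

    void optexpr() {
        // The epsilon production is chosen whenever the lookahead is not expr.
        if (lookahead == Tok.EXPR) {
            match(Tok.EXPR);
        }
    }

    /** Returns true if the tokens form exactly one stmt. */
    static boolean accepts(Tok... tokens) {
        PredictiveParser p = new PredictiveParser(Arrays.asList(tokens));
        try {
            p.stmt();
            return p.lookahead == Tok.EOF;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // for ( ; expr ; expr ) other
        System.out.println(accepts(Tok.FOR, Tok.LPAREN, Tok.SEMI, Tok.EXPR, Tok.SEMI,
                Tok.EXPR, Tok.RPAREN, Tok.OTHER));
    }
}
```

Note that optexpr never reports an error itself: if the lookahead is not expr it returns without consuming input, and any real error surfaces in the following match call.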
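Example: Predictive Parsing (LL(1))

Parse tree under construction: stmt, with children for ( optexpr ; optexpr ; optexpr ) stmt

Calls made by stmt() for this production, in order:

match(for) match('(') optexpr() match(';') optexpr() match(';') optexpr() match(')') stmt()

Input, scanned from left to right, with lookahead marking the current terminal:

for ( ; expr ; expr ) other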
FIRST
• FIRST(α) is the set of terminals that appear as the first symbols of one or more strings generated from α.
• α is a sentential form.
• Example, for the grammar
    stmt → expr ;
         | if ( expr ) stmt
         | for ( optexpr ; optexpr ; optexpr ) stmt
         | other
  ◦ FIRST(stmt) = { expr, if, for, other }
  ◦ FIRST(expr ;) = { expr }
Examples: FIRST
type → simple
     | ^ id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num

FIRST(simple) = { integer, char, num }
FIRST(^ id) = { ^ }
FIRST(type) = { integer, char, num, ^, array }
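The FIRST sets above can be computed by a fixed-point iteration over the productions. The following Java sketch shows one way to do it; the grammar encoding (a map from each nonterminal to its production bodies, with an empty body standing for ε) and all names are assumptions of this example.

```java
import java.util.*;

final class FirstSets {
    static final String EPS = "\u03b5";   // the empty string, epsilon

    /** Computes FIRST for every nonterminal; a symbol is a terminal if it has no productions. */
    static Map<String, Set<String>> first(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> first = new LinkedHashMap<>();
        for (String nt : grammar.keySet()) {
            first.put(nt, new TreeSet<>());
        }
        boolean changed = true;
        while (changed) {                             // repeat until no set grows
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : grammar.entrySet()) {
                for (List<String> body : e.getValue()) {
                    changed |= first.get(e.getKey()).addAll(firstOf(body, grammar, first));
                }
            }
        }
        return first;
    }

    /** FIRST of a string of grammar symbols, using the sets computed so far. */
    static Set<String> firstOf(List<String> symbols,
                               Map<String, List<List<String>>> grammar,
                               Map<String, Set<String>> first) {
        Set<String> result = new TreeSet<>();
        for (String x : symbols) {
            if (!grammar.containsKey(x)) {            // terminal: it starts the string
                result.add(x);
                return result;
            }
            Set<String> fx = first.get(x);
            for (String s : fx) {
                if (!s.equals(EPS)) result.add(s);
            }
            if (!fx.contains(EPS)) return result;     // x cannot vanish, so stop here
        }
        result.add(EPS);                              // every symbol can derive epsilon
        return result;
    }

    /** The type/simple grammar from the example. */
    static Map<String, List<List<String>>> typeGrammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("type", Arrays.asList(
                Arrays.asList("simple"),
                Arrays.asList("^", "id"),
                Arrays.asList("array", "[", "simple", "]", "of", "type")));
        g.put("simple", Arrays.asList(
                Arrays.asList("integer"),
                Arrays.asList("char"),
                Arrays.asList("num", "dotdot", "num")));
        return g;
    }

    public static void main(String[] args) {
        System.out.println(first(typeGrammar()));
    }
}
```

In a predictive parser, the procedure for a nonterminal picks the production whose body has the lookahead in its FIRST set, so the FIRST sets of alternative bodies must be disjoint.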
Designing a Predictive Parser
• A predictive parser is a program consisting of a procedure for every nonterminal.
• The procedure for nonterminal A:
  ◦ Decides which A-production to use by examining the lookahead symbol. Three issues affect this decision:
    ▪ Left factoring
    ▪ Left recursion
    ▪ ε-productions
  ◦ Mimics the body of the chosen production.
• Applying a translation scheme:
  ◦ Construct a predictive parser, ignoring the actions.
  ◦ Copy the actions from the translation scheme into the parser.
Left Factor
• Left factor
  ◦ Two or more productions for a nonterminal A start with the same symbols, so the lookahead symbol cannot choose between them.
• Example:
    stmt → if ( expr ) stmt
         | if ( expr ) stmt else stmt
• Use left factoring to fix it:
    stmt → if ( expr ) stmt rest
    rest → else stmt | ε
Left Recursion
• Left recursive
  ◦ A production for nonterminal A starts with a reference to A itself:
      A → Aα | β
  ◦ A recursive-descent procedure for A would call itself without consuming any input and loop forever.
• An example:
    expr → expr + term | term
• Rewrite the left-recursive productions as right-recursive ones using the following rule:
    A → βR
    R → αR | ε
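The rewriting rule is mechanical enough to automate. The sketch below applies it to one nonterminal whose bodies are given as space-separated strings; it also accepts several left-recursive alternatives and several non-recursive ones. The class name, the string encoding of productions, and the caller-supplied name for the new nonterminal R are all assumptions of this example, and only immediate left recursion is handled.

```java
import java.util.ArrayList;
import java.util.List;

final class LeftRecursion {
    /**
     * Rewrites A -> A a1 | ... | b1 | ... as A -> b1 R | ..., R -> a1 R | ... | epsilon.
     * Each body is a space-separated string of symbols; r is the name of the new nonterminal.
     */
    static List<String> eliminate(String a, List<String> bodies, String r) {
        List<String> aBodies = new ArrayList<>();     // bodies of the form beta R
        List<String> rBodies = new ArrayList<>();     // bodies of the form alpha R
        for (String body : bodies) {
            if (body.equals(a) || body.startsWith(a + " ")) {
                String alpha = body.substring(a.length()).trim();
                if (alpha.isEmpty()) {
                    throw new IllegalArgumentException(a + " -> " + a + " cannot be rewritten");
                }
                rBodies.add(alpha + " " + r);
            } else {
                aBodies.add(body + " " + r);
            }
        }
        List<String> result = new ArrayList<>();
        if (rBodies.isEmpty()) {                      // nothing to eliminate
            result.add(a + " -> " + String.join(" | ", bodies));
            return result;
        }
        if (aBodies.isEmpty()) {
            throw new IllegalArgumentException("no non-recursive alternative for " + a);
        }
        rBodies.add("\u03b5");                        // R -> epsilon ends the repetition
        result.add(a + " -> " + String.join(" | ", aBodies));
        result.add(r + " -> " + String.join(" | ", rBodies));
        return result;
    }

    public static void main(String[] args) {
        List<String> bodies = new ArrayList<>();
        bodies.add("expr + term");
        bodies.add("term");
        eliminate("expr", bodies, "rest").forEach(System.out::println);
    }
}
```

Both grammars generate the same strings, a β followed by zero or more α, but in the rewritten grammar every procedure consumes a token before it recurses.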
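Example: Left and Right Recursive
Both grammars generate strings of the form β α α … α.

left recursive (A → Aα | β): the tree grows down to the left
A
├─ A
│  ├─ …
│  │  └─ A
│  │     └─ β
│  └─ α
└─ α

right recursive (A → βR, R → αR | ε): the tree grows down to the right
A
├─ β
└─ R
   ├─ α
   └─ R
      ├─ α
      └─ …
         └─ R
            └─ ε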


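Abstract and Concrete Syntax

Abstract syntax tree for 9 - 5 + 2:
+
├─ -
│  ├─ 9
│  └─ 5
└─ 2

Concrete syntax (parse) tree for 9 - 5 + 2, in which the interior nodes are helper nonterminals such as expr and term:
expr
├─ expr
│  ├─ expr
│  │  └─ term
│  │     └─ 9
│  ├─ -
│  └─ term
│     └─ 5
├─ +
└─ term
   └─ 2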
Conclusion: Parsing and Translation Scheme
• Given the grammar G below, with semantic actions for translating into postfix notation:
    expr → expr + term { print('+') }
    expr → expr - term { print('-') }
    expr → term
    term → 0 { print('0') }
    term → 1 { print('1') }
    …
    term → 9 { print('9') }


Conclusion: Parsing and Translation Scheme
• Step 1: Eliminate left recursion.
• Technique: transform
    A → Aα | Aβ | γ
  into
    A → γR
    R → αR | βR | ε
• Use this rule to transform G.
Conclusion: Parsing and Translation Scheme
• After left-recursion elimination:
    expr → term rest
    rest → + term { print('+') } rest
         | - term { print('-') } rest
         | ε
    term → 0 { print('0') }
    term → 1 { print('1') }
    …
    term → 9 { print('9') }
An Example: Left-Recursion Elimination

expr → term rest
rest → + term { print('+') } rest
     | - term { print('-') } rest
     | ε
term → 0 { print('0') } | 1 { print('1') } | … | 9 { print('9') }

Parse tree for 9-5+2, with the actions shown as leaves:

expr
├─ term
│  ├─ 9
│  └─ { print('9') }
└─ rest
   ├─ -
   ├─ term
   │  ├─ 5
   │  └─ { print('5') }
   ├─ { print('-') }
   └─ rest
      ├─ +
      ├─ term
      │  ├─ 2
      │  └─ { print('2') }
      ├─ { print('+') }
      └─ rest
         └─ ε
Conclusion: Parsing and Translation Scheme
• Step 2: Procedures for nonterminals.

void expr() {
    term(); rest();
}

void rest() {
    if ( lookahead == '+' ) {
        match('+'); term();
        print('+'); rest();
    }
    else if ( lookahead == '-' ) {
        match('-'); term();
        print('-'); rest();
    }
    else { } // do nothing with the input
}

void term() {
    if ( lookahead is a digit ) {
        t = lookahead; match(lookahead);
        print(t);
    }
    else
        report("syntax error");
}
Conclusion: Parsing and Translation Scheme
• Step 3: Simplifying the translator. The tail-recursive calls to rest() are replaced by iteration.

Before:

void rest() {
    if ( lookahead == '+' ) {
        match('+'); term();
        print('+'); rest();
    }
    else if ( lookahead == '-' ) {
        match('-'); term();
        print('-'); rest();
    }
    else { }
}

After:

void rest() {
    while ( true ) {
        if ( lookahead == '+' ) {
            match('+'); term();
            print('+'); continue;
        }
        else if ( lookahead == '-' ) {
            match('-'); term();
            print('-'); continue;
        }
        break;
    }
}
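Conclusion: Parsing and Translation Scheme
• The complete program

import java.io.IOException;

class Parser {
    static int lookahead;   // current input character

    public Parser() throws IOException {
        lookahead = System.in.read();
    }

    // expr → term rest, with rest written as a loop
    void expr() throws IOException {
        term();
        while ( true ) {
            if ( lookahead == '+' ) {
                match('+'); term();
                System.out.write('+');
                continue;
            }
            else if ( lookahead == '-' ) {
                match('-'); term();
                System.out.write('-');
                continue;
            }
            else return;
        }
    }

    // term → 0 | 1 | … | 9, echoing the digit
    void term() throws IOException {
        if ( Character.isDigit((char) lookahead) ) {
            System.out.write((char) lookahead);
            match(lookahead);
        }
        else throw new Error("syntax error");
    }

    void match(int t) throws IOException {
        if ( lookahead == t )
            lookahead = System.in.read();
        else throw new Error("syntax error");
    }

    // Driver added so the class runs on its own (not on the slide):
    // reads one expression from standard input and prints its postfix form.
    public static void main(String[] args) throws IOException {
        new Parser().expr();
        System.out.write('\n');
        System.out.flush();
    }
}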
