Syntax Directed Translation
Parser stack with an associated value field:
STATE   VAL
Z       Z.VAL   <- TOP
Y       Y.VAL
X       X.VAL
S.D.T for Desk Calculator
Production           Semantic Action
S → E $              { print E.VAL }
E → E(1) + E(2)      { E.VAL := E(1).VAL + E(2).VAL }
E → E(1) * E(2)      { E.VAL := E(1).VAL * E(2).VAL }
E → ( E(1) )         { E.VAL := E(1).VAL }
E → I                { E.VAL := I.VAL }
I → I(1) digit       { I.VAL := 10 * I(1).VAL + LEXVAL }
I → digit            { I.VAL := LEXVAL }
Implementation of Desk Calculator
Production           Program Fragment
S → E $              print VAL[TOP]
E → E(1) + E(2)      VAL[TOP] := VAL[TOP] + VAL[TOP-2]
E → E(1) * E(2)      VAL[TOP] := VAL[TOP] * VAL[TOP-2]
E → ( E(1) )         VAL[TOP] := VAL[TOP-1]
E → I                NONE
I → I(1) digit       VAL[TOP] := 10 * VAL[TOP] + LEXVAL
I → digit            VAL[TOP] := LEXVAL
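The idea of a value stack kept in step with the parser stack can be sketched in Python as follows. This is only an illustration under stated assumptions, not the program fragments above verbatim: it assumes a parser (not shown) reports its shift and reduce moves as a list, it indexes the stack from the top with pop operations rather than with TOP offsets, and the names evaluate, sh and rd are hypothetical. The moves listed are the ones an LR parser would make for the input 23*5+4$.

```python
def evaluate(actions):
    """Apply the desk-calculator semantic actions to a value stack."""
    val = []                      # VAL stack, kept parallel to the parser stack
    for kind, arg in actions:
        if kind == "shift":
            val.append(arg)       # LEXVAL for a digit, None for any other token
        elif arg == "I -> I digit":
            digit = val.pop()
            val[-1] = 10 * val[-1] + digit
        elif arg == "E -> E + E":
            right = val.pop()
            val.pop()             # slot of the operator
            val[-1] = val[-1] + right
        elif arg == "E -> E * E":
            right = val.pop()
            val.pop()             # slot of the operator
            val[-1] = val[-1] * right
        elif arg == "E -> ( E )":
            val.pop()             # slot of ")"
            inner = val.pop()
            val[-1] = inner       # overwrite the slot of "("
        elif arg == "S -> E $":
            val.pop()             # slot of "$"
            return val[-1]
        # "I -> digit" and "E -> I" leave the value on top unchanged
    return None


def sh(lexval=None):
    return ("shift", lexval)


def rd(production):
    return ("reduce", production)


# Moves for the input 23*5+4$
moves = [sh(2), rd("I -> digit"), sh(3), rd("I -> I digit"), rd("E -> I"),
         sh(), sh(5), rd("I -> digit"), rd("E -> I"), rd("E -> E * E"),
         sh(), sh(4), rd("I -> digit"), rd("E -> I"), rd("E -> E + E"),
         sh(), rd("S -> E $")]
result = evaluate(moves)
```

Because every reduction only touches the top few entries, the value stack never needs to be searched.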
Intermediate Code
In many compilers the source code is translated
into a language which is intermediate in
complexity between a programming language
and machine code.
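Ex: treating if-then-else as a ternary postfix operator ?,
the expression
if a then if c-d then a+c else a*c else a+b
is written in postfix as
acd-ac+ac*?ab+?
One language that normally uses postfix
notation is SNOBOL.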
Control flow in Postfix code:
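The solution for control flow in postfix code is to
introduce an unconditional jump and a variety of
conditional jumps, such as jump, jlt or jeqz.
Ex: if e then x else y
can be represented in postfix control-flow code as
e l1 jeqz x l2 jump l1: y l2:
where l1 jeqz jumps to l1 if e is zero and l2 jump
jumps unconditionally to l2.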
Similarly for
if a then if c-d then a+c else a*c else a+b
E → id    { E.CODE := id }
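The jump scheme for if-then-else can be sketched in Python as below. This is a minimal illustration, not the slides' own algorithm: the function name postfix_if, the tuple encoding ("if", cond, then, else) and the label numbering order are assumptions, and operands are taken to be already in postfix form.

```python
from itertools import count


def postfix_if(node, labels=None):
    """Translate nested if-then-else into postfix code with jumps.

    node is either a string holding postfix code for an operand, or a
    tuple ("if", cond, then_part, else_part).
    """
    if labels is None:
        labels = count(1)            # source of fresh label numbers
    if isinstance(node, str):
        return [node]
    _, cond, then_part, else_part = node
    l_else = f"l{next(labels)}"      # start of the else part
    l_end = f"l{next(labels)}"       # first position after the statement
    return (postfix_if(cond, labels) + [l_else, "jeqz"]
            + postfix_if(then_part, labels) + [l_end, "jump", l_else + ":"]
            + postfix_if(else_part, labels) + [l_end + ":"])


simple = " ".join(postfix_if(("if", "e", "x", "y")))
nested = " ".join(postfix_if(("if", "a", ("if", "cd-", "ac+", "ac*"), "ab+")))
```

Here simple reproduces the code shown for if e then x else y, and nested applies the same scheme to if a then if c-d then a+c else a*c else a+b.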
Syntax Trees:
The parse tree itself is a suitable intermediate
representation. A parse tree however contains
redundant information which can be eliminated.
E → id    { E.VAL := LEAF(id) }
Three Address Code:
Three-address code is a sequence of statements,
typically of the general form A := B op C.
Ex: the expression X + Y * Z can be written as
T1 := Y * Z
T2 := X + T1
where T1 and T2 are compiler-generated
temporary names.
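Generating such statements from an expression tree can be sketched in Python as follows. The tuple encoding (op, left, right) and the names gen_tac, code and temps are assumptions made for this illustration only.

```python
from itertools import count


def gen_tac(expr, code, temps):
    """Emit three-address statements for expr and return the name holding its value.

    expr is either a variable name (str) or a tuple (op, left, right).
    """
    if isinstance(expr, str):
        return expr                      # a name is its own place
    op, left, right = expr
    left_place = gen_tac(left, code, temps)
    right_place = gen_tac(right, code, temps)
    temp = f"T{next(temps)}"             # compiler-generated temporary
    code.append(f"{temp} := {left_place} {op} {right_place}")
    return temp


code = []
place = gen_tac(("+", "X", ("*", "Y", "Z")), code, count(1))
```

For X + Y * Z the list code holds the two statements shown above, and place is T2.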
Some of the additional three-address statements
used in a programming environment:
• A := B op C
• A := op B
• Goto L
• If A relop B goto L
• Param A and Call P,n
• Indexed assignments: A := B[I] and A[I] := B
• Address and pointer assignments: A := addr B, A := *B and *A := B
Quadruples:
We may use a record structure with four fields:
Op, Arg1, Arg2 and Result. This representation is
called quadruples.
Ex: A := -B * (C + D)
T1 := -B
T2 := C+D
T3 := T1 * T2
A := T3
The quadruples for the statements are:
      Op       Arg1   Arg2   Result
(0)   Uminus   B      -      T1
(1)   +        C      D      T2
(2)   *        T1     T2     T3
(3)   :=       T3     -      A
Triples:
To avoid entering temporary names into the symbol
table, one can allow the statement computing a
temporary value to represent that value.
The triples for the statements are:
      Op       Arg1   Arg2
(0)   Uminus   B      -
(1)   +        C      D
(2)   *        (0)    (1)
(3)   :=       A      (2)
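The relationship between the two representations can be sketched in Python: each temporary is replaced by the number of the triple that computes it. The function name quads_to_triples and the tuple layouts are assumptions for this illustration; the quadruples are those for A := -B * (C + D).

```python
def quads_to_triples(quads):
    """Convert (op, arg1, arg2, result) quadruples into (op, arg1, arg2) triples."""
    position = {}                # temporary name -> reference to its triple
    triples = []
    for op, arg1, arg2, result in quads:
        arg1 = position.get(arg1, arg1)      # replace temporaries by references
        arg2 = position.get(arg2, arg2)
        if op == ":=":
            triples.append((":=", result, arg1))
        else:
            position[result] = f"({len(triples)})"
            triples.append((op, arg1, arg2))
    return triples


quads = [("uminus", "B", None, "T1"),
         ("+", "C", "D", "T2"),
         ("*", "T1", "T2", "T3"),
         (":=", "T3", None, "A")]
triples = quads_to_triples(quads)
```

No temporary name survives in triples, so none has to be entered into the symbol table.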
Indirect Triples:
Another implementation of three-address code
that has been considered lists pointers
to triples, rather than listing the triples themselves.
This implementation is naturally called indirect triples.
The Indirect Triples for the statements are:
S.D.T for Assignment statements
Assume that all identifiers denote primitive data
types. First we begin with a simple scheme in which
semantic checking is not necessary.
Ex: the boolean expression A or B and C can be translated into
T1 := B and C
T2 := A or T1
A relational expression such as A<B is equivalent to the
conditional statement
If A<B then 1 else 0
which can be translated into the three-address statements
(1) If A<B goto (4)
(2) T := 0
(3) Goto (5)
(4) T := 1
Thus the boolean expression A<B or C
can be translated into
(1) If A<B goto (4)
(2) T := 0
(3) Goto (5)
(4) T := 1
(5) T2 := T or C
S.D.T for Boolean expressions with
numerical representation:
Production               Semantic Rule
E → E(1) or E(2)         { T := NEWTEMP();
                           E.PLACE := T;
                           GEN(T := E(1).PLACE or E(2).PLACE) }
E → id(1) relop id(2)    { T := NEWTEMP();
                           E.PLACE := T;
                           GEN(if id(1).PLACE relop id(2).PLACE goto NEXTQUAD+3);
                           GEN(T := 0);
                           GEN(GOTO NEXTQUAD+2);
                           GEN(T := 1) }
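The numerical scheme can be sketched in Python as below, including the closing T := 1 statement that the A<B example ends with. The class Emitter and the functions relop and bool_or are hypothetical names for this illustration; quadruples are numbered from 1 to match the example.

```python
class Emitter:
    """Collects three-address statements; quadruples are numbered from `first`."""

    def __init__(self, first=1):
        self.first = first
        self.quads = []
        self.temps = 0

    @property
    def nextquad(self):
        # Number the next generated statement will receive (NEXTQUAD)
        return self.first + len(self.quads)

    def newtemp(self):
        self.temps += 1
        return f"T{self.temps}"

    def gen(self, statement):
        self.quads.append(statement)


def relop(em, left, op, right):
    """E -> id relop id: leave 1 or 0 in a fresh temporary."""
    t = em.newtemp()
    em.gen(f"if {left} {op} {right} goto ({em.nextquad + 3})")
    em.gen(f"{t} := 0")
    em.gen(f"goto ({em.nextquad + 2})")
    em.gen(f"{t} := 1")
    return t


def bool_or(em, place1, place2):
    """E -> E or E: combine two already-computed values."""
    t = em.newtemp()
    em.gen(f"{t} := {place1} or {place2}")
    return t


em = Emitter()
place = bool_or(em, relop(em, "A", "<", "B"), "C")
```

For A<B or C this yields the same five statements as the example, with T1 in the role of T.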
S.D.T For Boolean expressions with Control Flow
representation:
If we represent the value of a boolean expression by
position in the program, we may be able to avoid
evaluating the entire expression.
Ex: given the expression A or B, if we determine that A
is true, then we can conclude that the expression is
true without evaluating B.
Similarly, for the expression A and B, if we
determine that A is false, then we can conclude that the
expression is false without evaluating B.
The semantic definition of the language determines whether all
parts of a boolean expression must be evaluated.
In the context of conditional statements such as
if E then S(1) else S(2) and while E do S
an expression E will be translated into a
sequence of three-address statements that
evaluate E. This translation is a sequence of
conditional and unconditional jumps to one of
two locations: one reached if E is true, the other if E is false.
Ex: consider the expression E(1) or E(2). If E(1) is
true, then we know that E itself is true. So we
can make the location true for E(1) be the same
as true for E.
If E(1) is false then we must evaluate E(2), so we
make false for E(1) be the first statement in the
code for E(2). The true and false exits of E(2) can
be made the same as the true and false exits of E.
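The problem that arises when we produce code
bottom-up in a single pass is that the quadruples to
which jumps are to be made may not yet have been
generated when the jumps are emitted.
We therefore generate the jumps with their targets
left unfilled. We call the subsequent filling in of
quadruples backpatching.
To manipulate the lists of quadruples, we use
three functions:
MAKELIST(i)
MERGE(p1, p2)
BACKPATCH(p, i)
MAKELIST(i): creates a new list containing only i,
an index into the array of quadruples being
generated.
MERGE(p1, p2): concatenates the lists p1 and p2
and returns the merged list.
BACKPATCH(p, i): makes each quadruple on the
list p jump to quadruple i.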
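The three list functions can be sketched in Python as follows. This is an illustration under assumptions: quadruples are stored as two-element lists [text, target] numbered from 0, a target of None means "not yet filled in", and the helper names gen and nextquad are hypothetical. The example handles A < B or C < D.

```python
quads = []                       # each entry is [text, target]


def nextquad():
    return len(quads)


def gen(text, target=None):
    """Append a quadruple and return its index."""
    quads.append([text, target])
    return len(quads) - 1


def makelist(i):
    return [i]


def merge(p1, p2):
    return p1 + p2


def backpatch(p, i):
    """Fill in i as the jump target of every quadruple on list p."""
    for q in p:
        quads[q][1] = i


# E(1): A < B
e1_true, e1_false = makelist(gen("if A < B goto")), makelist(gen("goto"))
m_quad = nextquad()              # first quadruple of the code for E(2)
# E(2): C < D
e2_true, e2_false = makelist(gen("if C < D goto")), makelist(gen("goto"))
# E -> E(1) or E(2)
backpatch(e1_false, m_quad)
e_true, e_false = merge(e1_true, e2_true), e2_false
```

After this, only the jumps on e_true and e_false remain unfilled; they are patched once the surrounding statement knows where true and false should lead.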
S → A              { S.NEXT := MAKELIST() }
L → L(1) ; M S     { BACKPATCH(L(1).NEXT, M.QUAD);
                     L.NEXT := S.NEXT }
L → S              { L.NEXT := S.NEXT }
Postfix Translations:
We call a translation scheme postfix if, for each
production A → α, the translation rule for A.CODE consists of
the concatenation of the translations of the nonterminals in α, in
the same order as they appear in α, followed by a tail of output.
Since there is only one production for B, the two new productions can be
used exactly where the old one was.
Ex: S → while M(1) E do M(2) S for the while statement.
M(1) could record the first quadruple of the code for E, and
M(2) the first quadruple of the code for S.
Alternatively, the production can be factored as
S → C S
C → W E do
W → while
where C and W are new nonterminals introduced for the
purpose of postfix translation. We could give W the
translation W.QUAD, which would serve for M(1).QUAD, and C
would have the translation C.QUAD with the same value as
W.QUAD.
When we reduce to C, we can backpatch
E.TRUE to position NEXTQUAD immediately,
so M(2).QUAD is never needed.
A suitable S.D.T is
W → while       { W.QUAD := NEXTQUAD }
C → W E do      { C.QUAD := W.QUAD;
                  BACKPATCH(E.TRUE, NEXTQUAD);
                  C.FALSE := E.FALSE }
S → C S(1)      { BACKPATCH(S(1).NEXT, C.QUAD);
                  S.NEXT := C.FALSE;
                  GEN(GOTO C.QUAD) }
S.D.T for Procedure Call Statements:
Grammar for simple procedure call statements
S → call id ( elist )
elist → elist , E
elist → E
• The translation for a call includes a calling
sequence, a sequence of actions taken on entry to
and exit from each procedure.
• Calling sequences can differ, even between
implementations of the same language.
• First the arguments are evaluated and put in some
known place so that they may be accessed by the
called procedure.
• Also put in a known place is the return address.
• Let us assume that parameters are passed by
reference. When generating three-address code
for this type of call, it is sufficient to generate
three-address code to evaluate the arguments,
followed by a list of param three-address
statements, one for each argument.
• We need to save the value of E.PLACE for each
expression E in id(E,E,E…E).
• A convenient data structure for saving these values
is a queue.
S.D.T:
S → call id ( elist )   { for each item p on QUEUE do
                              GEN(param p);
                          GEN(call id.PLACE) }
elist → elist , E       { append E.PLACE to the end of QUEUE }
elist → E               { initialize QUEUE to contain only E.PLACE }
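The queue-based scheme can be sketched in Python as below. The function names elist_first, elist_more and call_statement are hypothetical, one per production, and the argument places T1 and T2 are assumed to have been computed already.

```python
from collections import deque

code = []                # generated three-address statements
queue = deque()          # E.PLACE values of the arguments, in order


def elist_first(place):
    """elist -> E: initialize QUEUE to contain only E.PLACE."""
    queue.clear()
    queue.append(place)


def elist_more(place):
    """elist -> elist , E: append E.PLACE to the end of QUEUE."""
    queue.append(place)


def call_statement(proc_place):
    """S -> call id ( elist ): one param per queued place, then the call."""
    for p in queue:
        code.append(f"param {p}")
    code.append(f"call {proc_place}")


# call P(T1, T2)
elist_first("T1")
elist_more("T2")
call_statement("P")
```

A queue keeps the param statements in the order the arguments were written, which a stack would reverse.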
S.D.T for declaration Statements:
The simplest form of declaration syntax found in
programming languages is a keyword denoting an
attribute, followed by a list of names having that
attribute.
Grammar:
D → integer namelist
  | real namelist
namelist → id , namelist
  | id
The problem with this grammar is that one cannot get the
attribute associated with namelist until the entire
list of names has been processed.
To avoid this difficulty we can use another variation.
Grammar:
D → integer intlist
  | real reallist
intlist → id , intlist
  | id
reallist → id , reallist
  | id
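Figure: two ways of storing the identifier DIMPLE in a symbol table
entry: directly in the identifier field, or as a pointer and length (6)
into a separate string array. In both cases the identifier field is
followed by the identifier information.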
Reusing Symbol table space:
• The identifier used by a program to denote a
particular name must be preserved in the
symbol table until there are no further references to
that identifier.
• This is essential so that all uses of the identifier
can be associated with the same symbol table
entry.
• However, a compiler can be designed to run in
less space if the space used to store identifiers can be
reused by subsequent passes.
• One exception concerns external names.
• When a name is declared external to the
program, the corresponding identifier must be
preserved.
FORTRAN:
• A FORTRAN program consists of a main program,
subroutines and functions.
• Each name has a scope consisting of one routine only.
• We can generate object code for each routine upon
reaching the end of that routine.
• If we do so, most of the information
in the symbol table can then be expunged.
• We need to preserve only names that are external to
the routine just processed.
Figure: a hash table pointing into permanent data storage
(NAME 1, NAME 2) and reusable data storage (NAME 3, NAME 4).
Activation record layout:
Local data
Old SP               <- SP
Return value
Return address
Arg. count
Actual parameters
• In C all local data (including arrays) are of fixed size,
so the size of the activation record can be computed by the
compiler.
• Hence a simple local name X can be referred to as
X[SP], where X stands for the offset of X.
Procedure calls in C:
Param T1
Param T2
.
Param Tn
Call P,n
The statement Call P,n is then translated into:
PUSH(n)   - store the argument count
PUSH(l1)  - l1 is the label of the return address
PUSH()    - leave space for the return value
PUSH(SP)  - store the old stack pointer
Goto l2   - l2 is the first statement of the called
            procedure P
• The first statement of the called procedure must be a
special three-address statement “procbegin”,
which sets the stack pointer to the place holding the
old SP, and sets TOP to the top of the activation record:
SP := TOP
TOP := SP + sp    (sp is the size of the local data for P)
Assuming that TOP points to the lowest-numbered used
location on the stack and that memory locations are
counted by words, we could translate each param T
into PUSH(T), where PUSH(X) stands for
TOP := TOP – 1
*TOP := X
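The PUSH operation and the order in which the calling sequence fills the stack can be sketched in Python. This is an illustration only: memory is modeled as a small list of words, the size MEM_SIZE and the function name call are assumptions, and the jump to the called procedure is omitted.

```python
MEM_SIZE = 16
mem = [None] * MEM_SIZE   # word-addressed memory holding the stack
top = MEM_SIZE            # TOP: lowest-numbered used location (stack empty)


def push(x):
    """PUSH(X): TOP := TOP - 1; *TOP := X."""
    global top
    top -= 1
    mem[top] = x


def call(args, return_label, old_sp):
    """Stack effect of 'param T1 ... param Tn; call P,n'."""
    for t in args:
        push(t)              # param Ti
    push(len(args))          # argument count
    push(return_label)       # return address
    push(None)               # space for the return value
    push(old_sp)             # old stack pointer


call(["T1", "T2"], "l1", old_sp=MEM_SIZE)
```

Reading mem upward from top gives the old SP, the return value slot, the return address, the argument count and then the actual parameters.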
Procedure MAIN();
    Procedure P(a);
        Procedure Q(b);
            L1: R(x,y);
        end Q;
        L2: Q(z);
    end P;
    Procedure R(c,d);
    end R;
    L3: P(w);
    L4: R(u,v);
end MAIN;
• When we execute MAIN at L5, we call P(w) at L3,
which calls Q(z) at L2, which calls R(x,y) at L1.
The stack and displays are as shown:
TOP ->  Activation record for R
        Activation record for Q
        Activation record for P
        Activation record for MAIN
DISPLAY[1] and DISPLAY[2] hold pointers into this stack.
• Suppose Q calls R(x,y) at L1, and R has the
following declaration of local data
Integer I;
Real Array A[0:n-1, 1:m];
Real Array B[2:10];
• Activation Record for R(x,y)
TOP ->  Activation record for R
        Activation record for Q
        Activation record for P
        Activation record for MAIN
DISPLAY[1], DISPLAY[2] and DISPLAY[3] hold pointers into this stack (LEVEL 3).
The display pointers for procedure R are as shown:
TOP ->  Activation record for R
        Activation record for Q
        Activation record for P
        Activation record for MAIN
DISPLAY[1] and DISPLAY[2] hold pointers into this stack.
l1 is 3 for Q
l3 is 1 for MAIN
l1 – l3 = 3 – 1 = 2, so pop 2 display pointers from the display
Error Detection and Recovery:
Properties:
• Should pinpoint the errors in terms of the original
source program.
• Should be understandable by the user.
• Should be specific and should localize the problem.
• Should not be redundant.
Sources of error:
IFA = B THEN
SUM = SUM + A;
ELSE
SUM = SUM – A ;
Minimum distance correction of errors:
• One theoretical way of defining errors and their
location is the minimum Hamming distance method.
• We define a collection of error transformations.
• A program P has k errors if the shortest sequence of
error transformations that will map some valid
program into P has length k.
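The distance idea can be sketched in Python for one fixed candidate program. This is a simplification under assumptions: the error transformations are taken to be insertion, deletion and replacement of a single token, the function name min_distance is hypothetical, and a real corrector would have to minimize over all valid programs rather than compare against one.

```python
def min_distance(valid, actual):
    """Fewest token insertions, deletions and replacements turning valid into actual."""
    prev = list(range(len(actual) + 1))    # distances from the empty prefix of valid
    for i, v in enumerate(valid, 1):
        cur = [i]
        for j, a in enumerate(actual, 1):
            cur.append(min(prev[j] + 1,               # delete v
                           cur[j - 1] + 1,            # insert a
                           prev[j - 1] + (v != a)))   # replace v by a, or match
        prev = cur
    return prev[-1]


valid_tokens = ["IF", "A", "=", "B", "THEN"]
actual_tokens = ["IFA", "=", "B", "THEN"]
k = min_distance(valid_tokens, actual_tokens)
```

With these transformations the token sequence IFA = B THEN is two steps away from IF A = B THEN.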
Figure: Source code → Lexical analyzer → tokens → Parser →
intermediate code → Semantic checker. The lexical analyzer is paired
with a lexical corrector and the parser with a syntactic corrector.
Lexical phase errors
• The lexical analyzer detects an error when it discovers that no prefix
of the remaining input fits the specification of any token class.
• After detecting an error, the lexical analyzer can invoke an error
recovery routine. This can entail a variety of remedial actions.
• The simplest possible error recovery is to skip the erroneous
characters until the lexical analyzer finds another token.
• But this is likely to cause the parser to see a deletion error, which can
cause severe difficulties in syntax analysis and the remaining phases.
• One way the parser can help the lexical analyzer improve its
ability to recover from errors is to make its list of legitimate tokens (in
the current context) available to the error recovery routine.
• The error recovery routine can then decide whether a prefix of the
remaining input matches one of these tokens closely enough to be treated
as that token.
Syntactic Phase errors
• A parser detects an error when it has no legal move from its
current configuration.
• LL(1) and LR(1) parsers have the valid prefix property;
therefore, they are capable of announcing an error as soon as they
read an input symbol that is not a valid continuation of the input
read so far.
• This is the earliest time that a left-to-right parser can announce an
error. But there are a variety of other types of parsers that do not
necessarily have this property.
• The advantage of using a parser with the valid prefix property
is that it reports an error as soon as possible, and it
minimizes the amount of erroneous output passed to subsequent
phases of the compiler.
Panic Mode Recovery
• Panic mode recovery is an error recovery method
that can be used with any kind of parser, although the
details of error recovery depend somewhat on the
parsing technique used.
• In panic mode recovery, a parser discards input
symbols until a statement delimiter, such as a
semicolon or an end, is encountered.
• The parser then deletes stack entries until it finds an
entry that will allow it to continue parsing, given the
synchronizing token on the input.
• This method is simple to implement, and it never
gets into an infinite loop.
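The two steps of panic mode recovery can be sketched in Python as follows. This is an illustration, not a complete parser: the synchronizing set SYNC, the function names and the predicate can_continue (which stands in for a lookup in the parsing table) are all assumptions.

```python
SYNC = {";", "end"}      # assumed synchronizing tokens (statement delimiters)


def skip_to_sync(tokens, pos, sync=SYNC):
    """Discard input symbols until a synchronizing token; return its index."""
    while pos < len(tokens) and tokens[pos] not in sync:
        pos += 1
    return pos


def panic_recover(stack, tokens, pos, can_continue, sync=SYNC):
    """Skip input, then pop stack entries until parsing can resume.

    can_continue(entry, token) is a hypothetical predicate telling whether
    the parser has a legal move with this stack entry on top and this token
    as the next input. Returns the new input position, or None if recovery
    is impossible.
    """
    pos = skip_to_sync(tokens, pos, sync)
    if pos == len(tokens):
        return None
    while stack and not can_continue(stack[-1], tokens[pos]):
        stack.pop()
    return pos if stack else None


tokens = ["x", "=", "+", "*", ";", "y"]
stack = ["S0", "S_stmt", "S_expr"]
resume_at = panic_recover(stack, tokens, 2,
                          lambda entry, token: entry == "S_stmt")
```

The input position only moves forward and the stack only shrinks, which is why the method cannot loop forever.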
Error Recovery in LR Parsing:
• A systematic method for error recovery
in LR parsing is to scan down the stack until a
state S with a goto on a particular nonterminal A is
found.
• Then discard zero or more input symbols until a
symbol a is found that can legitimately follow A.
• The parser then pushes the state goto[S, A] onto the
stack and resumes normal parsing.
• There might be more than one choice for the
nonterminal A. Normally, these would be
nonterminals representing major program pieces,
such as statements.
• Another method of error recovery that can be
implemented is called "phrase level recovery".
• Each error entry in the LR parsing table is examined, and,
based on language usage, an appropriate error recovery
procedure is constructed.
• For example, to recover from an error in a construct that
starts with an operator, the error recovery routine will
push an imaginary id onto the stack and cover it with the
appropriate state.
• While doing this, the error entries in a particular state
that call for a particular reduction on some input symbols
are replaced by that reduction.
• This has the effect of postponing the error detection until
one or more reductions are made; but the error will still
be caught before any shift takes place.
State   id      +       *       (       )       $       E
0       S3                      S2                      1
1               S4      S5                      ACCEPT
2       S3                      S2                      6
3               R4      R4              R4      R4
4       S3                      S2                      7
5       S3                      S2                      8
6               S4      S5              S9
7               R1      S5              R1      R1
8               R2      R2              R2      R2
9               R3      R3              R3      R3
State   id      +       *       (       )       $       E
0       S3      e1      e1      S2      e2      e1      1
1       e3      S4      S5      e3      e2      ACCEPT
2       S3      e1      e1      S2      e2      e1      6
3       R4      R4      R4      R4      R4      R4
4       S3      e1      e1      S2      e2      e1      7
5       S3      e1      e1      S2      e2      e1      8
6       e3      S4      S5      e3      S9      e4
7       R1      R1      S5      R1      R1      R1
8       R2      R2      R2      R2      R2      R2
9       R3      R3      R3      R3      R3      R3
• e1: This routine is called from states 0, 2, 4 and 5,
all of which expect the beginning of an operand.
Instead an operator or the end of input was
found.
Action: push an imaginary id onto the stack and
cover it with state 3
Error diagnostic: Missing operand
• e2: This routine is called from states 0, 1, 2, 4 and
5 on finding a right parenthesis.
Action: remove the right parenthesis from the input
Error diagnostic: Unbalanced right parenthesis
• e3: This routine is called from states 1 or 6 when
expecting an operator and an id or left
parenthesis is found.
Action: push + onto the stack and cover it with
state 4
Error diagnostic: Missing operator
• e4: This routine is called from state 6 when the
end of input is found. State 6 expects an
operator or a right parenthesis.
Action: push a right parenthesis onto the stack
and cover it with state 9
Error diagnostic: Missing right parenthesis
• Consider the input id+)
Stack       Input
0           id+)$
0id3        +)$
0E1         +)$
0E1+4       )$     /* ) removed by routine e2 */
0E1+4       $
0E1+4id3    $      /* id pushed onto the stack by e1 */
0E1+4E7     $
0E1         $      /* parsing completed */
Error Recovery in LL Parsing:
      id          +           *           (           )           $
E     E → TE'                             E → TE'
E'                E' → +TE'                           E' → ε      E' → ε
T     T → FT'                             T → FT'
T'                T' → ε      T' → *FT'               T' → ε      T' → ε
F     F → id                              F → (E)
      id          +           *           (           )           $
E     E → TE'     e1          e1          E → TE'     e1          e1
E'    E' → ε      E' → +TE'   E' → ε      E' → ε      E' → ε      E' → ε
T     T → FT'     e1          e1          T → FT'     e1          e1
T'    T' → ε      T' → ε      T' → *FT'   T' → ε      T' → ε      T' → ε
F     F → id      e1          e1          F → (E)     e1          e1
id    pop
+                 pop
*                             pop
(                                         pop
)     e2          e2          e2          e2          pop         e2
$     e3          e3          e3          e3          e3          accept
• The table includes pop actions, which match an
input symbol against an identical terminal on
the stack.
• Some entries are still left blank. These are entries
which can obviously never be exercised, even
on erroneous input.
• We also inserted E' → ε and T' → ε in certain
places where an error could be detected.
• This may postpone some error detection, but
cannot cause an error to be missed.
• e1: This routine is called when an operand
beginning with an id or a left parenthesis is
expected, but an operator, a right parenthesis or
the end of input was found.
Action: push id onto the input
Error diagnostic: Missing operand
• e2: Here we have a right parenthesis on top of
the stack but not on the input.
Action: pop the right parenthesis from the stack
Error diagnostic: Missing right parenthesis
• e3: The stack has been emptied, but input
remains.
Action: remove all remaining symbols from the
input
Error diagnostic: Unexpected input
Semantic errors:
The primary sources of semantic errors are
– Undeclared names
– Type incompatibilities
• Recovery from undeclared names is straightforward.
• The first time we encounter an undeclared name we
make an entry for that name in the symbol table with
appropriate attributes.
• The attributes can be determined from the context in
which the name is used.
• A flag in the symbol table entry is set to indicate that
the entry was made in response to a semantic error
rather than a declaration.
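Recovery from undeclared names can be sketched in Python as below. The dictionary layout, the flag name from_error, the diagnostics list and the parameter context_type are assumptions made for this illustration.

```python
symtab = {}              # name -> symbol table entry
diagnostics = []         # error messages reported so far


def declare(name, type_):
    symtab[name] = {"type": type_, "from_error": False}


def lookup(name, context_type="unknown"):
    """Return the entry for name, creating a flagged one if it is undeclared."""
    entry = symtab.get(name)
    if entry is None:
        diagnostics.append(f"undeclared name: {name}")
        # Attributes are guessed from the context in which the name is used
        entry = {"type": context_type, "from_error": True}
        symtab[name] = entry
    return entry


declare("SUM", "real")
lookup("SUM")
lookup("A", context_type="real")   # first use of undeclared A: reported once
lookup("A")                        # later uses find the flagged entry
```

Because the entry is created on the first use, the same undeclared name does not produce a new message at every later use.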
Code Generation
• The input to the code generator is optimized
intermediate code, which can be a sequence of
– Quadruples
– Triples
– A tree
– A postfix Polish string
• The output of the code generator is the object
program. This may take a variety of forms:
– An absolute machine language program
– A relocatable machine language program
– An assembly language program
– A program in some other programming language
• The advantage of generating absolute machine code
is that it can be placed in a fixed memory location and
immediately executed.
Ex: student-job compilers
• A relocatable object program allows subprograms to
be compiled separately. A set of relocatable object
modules can be linked together and loaded for
execution.
– More flexible
• Producing assembly language as output
makes code generation easier.
• Producing a high-level language as output
simplifies code generation even further.
Ex: FORTRAN is the output of the ALTRAN compiler.
Problems in code generation: