
Semantic Analysis

Semantic analysis in compilers involves translating source program constructs by generating intermediate code, entering information into a symbol table, and performing semantic checks. It requires a framework to specify the syntactic structure and associated actions, leading to the development of syntax-directed definitions (SDDs) and syntax-directed translation schemes (SDTS) for implementation. The document also discusses synthesized and inherited attributes, dependency graphs, and methods for translating arithmetic and boolean expressions into intermediate code forms.

Uploaded by yashk6615
Copyright © All Rights Reserved


What is semantic analysis?

• Translating a source program construct involves the following actions on the part of the compiler:
1. Generating equivalent intermediate or object code.
2. Entering information into the symbol table.
3. Performing semantic checks.
• This requires interpreting the construct, which in turn requires gathering semantic information.
• Therefore semantic analysis involves gathering semantic information and using it to interpret the construct so that it can be translated.
Specifying Translations
• The meaning of a construct is related to its syntactic structure.
• We therefore need a framework through which it is possible to specify the syntactic structure of a construct as well as the actions to be carried out by the compiler for translating that construct.
• Hence the specification of the syntactic structure is a part of the specification of the translation.
Specifying Translations
• We therefore suitably extend the CFG notation that we use to specify the syntax, so as to allow for specifying the actions to be carried out by the compiler.
• The actions to be carried out by the compiler involve manipulating the values of various quantities. For example, translating the declaration int a, b, c; requires entering the type attribute int into the symbol-table records for the names a, b and c.
• The compiler is therefore required to keep track of these quantities.
Specifying Translations
• Therefore the CFG must be extended to provide for:
1. specifying placeholders for keeping track of these quantities, and
2. specifying the rules for manipulating these quantities.
• We therefore extend the CFG by associating:
1. attributes with grammar symbols, which hold the values of these quantities, and
2. a set of semantic rules with each production, which manipulate the attributes of the grammar symbols and thereby specify how these quantities are manipulated.
• This extension is called a syntax-directed definition (SDD).
Syntax Directed Definitions: an Example
Productions        Semantic rules
D → T L            L.type = T.type
T → int            T.type = int
L → L1 , id        L1.type = L.type
                   Enter(id.ptr, L.type)
L → id             Enter(id.ptr, L.type)
Abstract Specification
An SDD is a more abstract framework: it lets us specify the actions to be carried out for translation through attributes and semantic rules, but it does not specify how these actions will actually be implemented. In other words, it specifies what semantic analysis must compute, but not when each semantic rule is to be evaluated.
Implementation of translations
specified using SDDs
• The parser generates a parse tree.
• The generated parse tree is traversed.
• During the traversal, when a parse-tree node labeled by symbol X is visited, the values of the attributes of X at that node are computed.
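The traversal described above can be sketched in Python for the declaration int a, b, c. This is a minimal sketch under assumptions: the Node class, the hand-built parse tree, and the dictionary standing in for the symbol table (with enter() writing into it) are all hypothetical; a real compiler would obtain the tree from its parser.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A parse-tree node labelled by a grammar symbol, with its attributes."""
    symbol: str
    children: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)

symbol_table = {}  # hypothetical symbol table: name -> type

def enter(name, typ):
    """Stand-in for Enter(id.ptr, L.type): record the type of a name."""
    symbol_table[name] = typ

def visit(node):
    """Evaluate the semantic rules of the production applied at `node`."""
    kids = node.children
    if node.symbol == "D":                        # D -> T L
        t, l = kids
        visit(t)                                  # T.type is synthesized
        l.attrs["type"] = t.attrs["type"]         # L.type = T.type
        visit(l)
    elif node.symbol == "T":                      # T -> int
        node.attrs["type"] = "int"                # T.type = int
    elif node.symbol == "L" and len(kids) == 3:   # L -> L1 , id
        l1, _comma, ident = kids
        l1.attrs["type"] = node.attrs["type"]     # L1.type = L.type
        visit(l1)
        enter(ident.attrs["ptr"], node.attrs["type"])
    elif node.symbol == "L":                      # L -> id
        enter(kids[0].attrs["ptr"], node.attrs["type"])

def leaf(symbol, **attrs):
    return Node(symbol, attrs=attrs)

# Parse tree for: int a, b, c
tree = Node("D", [
    Node("T", [leaf("int")]),
    Node("L", [
        Node("L", [
            Node("L", [leaf("id", ptr="a")]),
            leaf(","), leaf("id", ptr="b")]),
        leaf(","), leaf("id", ptr="c")]),
])
visit(tree)
print(symbol_table)  # {'a': 'int', 'b': 'int', 'c': 'int'}
```

The visitor evaluates T before copying T.type into L because L.type depends on it; that ordering constraint is what a dependency graph makes explicit.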
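Implementation of SDDs: An Example
Annotated parse tree for int a, b, c (each L node calls enter() for its id):
D
    T (T.type = int)
        int
    L (L.type = int; calls enter() for c)
        L (L.type = int; calls enter() for b)
            L (L.type = int; calls enter() for a)
                id (id.ptr = a)
            ,
            id (id.ptr = b)
        ,
        id (id.ptr = c)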
Synthesized and Inherited attributes
• The attributes associated with grammar symbols are classified into two categories.
Synthesized attributes
An attribute is said to be synthesized if its value at a parse-tree node is determined by the attribute values at the child nodes.
An example
A → XYZ     A.val = f(X.val, Y.val, Z.val)
Here A.val is a synthesized attribute.
Synthesized attributes have a desirable property: they can be evaluated in a single bottom-up traversal of a parse tree.
Synthesized and Inherited attributes
Inherited attributes
An inherited attribute is one whose value at a node in the parse tree is defined in terms of attributes of the parent and/or siblings of that node.
An example
D → T L     L.type = T.type
Here L.type is an inherited attribute.
Inherited attributes are convenient for expressing the dependence of a programming-language construct on the context in which it appears.
Dependency graph
• When inherited attributes are used, the interdependencies among the attributes at the nodes of the parse tree must be taken into account when evaluating the semantic rules that compute the attribute values.
• This requires constructing a dependency graph whose vertices are the attributes at the parse-tree nodes and whose edges are decided as follows:
if attribute b depends on attribute a, then there is a directed edge from a to b.
Dependency graph
• To construct the dependency graph, a semantic rule in the form of a function call is replaced by an assignment that stores the function's return value in a newly introduced dummy attribute.
An example
Replace the following:
L → L1 , id     enter(id.ptr, L.type);
with
L → L1 , id     L.dmy = enter(id.ptr, L.type);
where L.dmy is a dummy attribute.
Deciding the order of evaluation using
dependency graph
• A topological sort of the dependency graph gives an order in which the attributes at the parse-tree nodes can be evaluated.
Topological sort
1. List all the vertices that have no predecessors, and remove them from the graph along with their outgoing edges.
2. Repeat the process until all vertices have been listed.
The order produced is not necessarily unique.
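The two steps above are Kahn's algorithm. A minimal Python sketch, applied to the dependency graph of int a, b, c; the vertex names follow the dependency-graph figure (id1 = a, id2 = b, id3 = c), and the edge list is derived from the semantic rules L1.type = L.type and L.dmy = enter(id.ptr, L.type).

```python
from collections import deque

def topological_sort(vertices, edges):
    """Kahn's algorithm: repeatedly list vertices that have no predecessors."""
    preds = {v: 0 for v in vertices}
    succs = {v: [] for v in vertices}
    for a, b in edges:            # edge a -> b means b depends on a
        succs[a].append(b)
        preds[b] += 1
    ready = deque(v for v in vertices if preds[v] == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in succs[v]:        # removing v deletes its outgoing edges
            preds[w] -= 1
            if preds[w] == 0:
                ready.append(w)
    if len(order) != len(vertices):
        raise ValueError("dependency graph has a cycle")
    return order

vertices = ["id1.ptr", "id2.ptr", "id3.ptr", "T.type",
            "L1.type", "L2.type", "L3.type", "L1.dmy", "L2.dmy", "L3.dmy"]
edges = [("T.type", "L1.type"), ("L1.type", "L2.type"), ("L2.type", "L3.type"),
         ("L1.type", "L1.dmy"), ("id3.ptr", "L1.dmy"),
         ("L2.type", "L2.dmy"), ("id2.ptr", "L2.dmy"),
         ("L3.type", "L3.dmy"), ("id1.ptr", "L3.dmy")]
order = topological_sort(vertices, edges)
print(order)
```

With this edge list the sketch lists L1.dmy before L3.type, which differs from the order given in this section yet is equally valid; this is the non-uniqueness noted above.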
Deciding the order of evaluation using
dependency graph
• One of the orders obtained by a topological sort of the dependency graph for the declaration int a, b, c is:
id1.ptr, id2.ptr, id3.ptr, T.type, L1.type, L2.type, L3.type, L1.dmy, L2.dmy, L3.dmy
S-attributed and L-attributed definitions
S-attributed definitions
Syntax-directed definitions that use synthesized attributes only are known as S-attributed definitions.
L-attributed definitions
A syntax-directed definition is L-attributed if, for every production A → X1 X2 … Xn, each inherited attribute of Xj (1 ≤ j ≤ n) depends only on:
1. the attributes (both synthesized and inherited) of the symbols X1, X2, …, Xj-1, and
2. the inherited attributes of A.
This allows the attributes to be evaluated in the following order:
in(A), in(X1), syn(X1), in(X2), syn(X2), …, in(Xn), syn(Xn), syn(A).
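The order above is what a depth-first, left-to-right traversal produces. The sketch below only records the evaluation events rather than computing values; the tuple encoding of tree nodes, (name, [children]), is an assumption made for brevity.

```python
def dfvisit(node, trace):
    """Depth-first, left-to-right visit used for L-attributed definitions."""
    _name, children = node
    for child in children:
        trace.append(f"in({child[0]})")   # inherited attrs of Xj before its subtree
        dfvisit(child, trace)
        trace.append(f"syn({child[0]})")  # synthesized attrs of Xj after its subtree
    return trace

def evaluation_order(root):
    """in(A) is supplied by A's parent; syn(A) is available once A's subtree is done."""
    name = root[0]
    return [f"in({name})"] + dfvisit(root, []) + [f"syn({name})"]

print(evaluation_order(("A", [("X1", []), ("X2", []), ("X3", [])])))
```

Because every inherited attribute of Xj is produced before Xj is entered, and only from things already visited (X1 … Xj-1 and A's inherited attributes), this single pass is enough for any L-attributed definition.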
Possibility of computing the values of
attributes along with parsing:
• If we are using LL parsing, then computing attribute values during parsing requires that an inherited attribute of a grammar symbol X in the production A → XYZ be computed just before deriving X, and that a synthesized attribute of A be computed after completing the derivation of A.
• So the order of computing the values of attributes is:
in(A), in(X1), syn(X1), in(X2), syn(X2), …, in(Xn), syn(Xn), syn(A).
• This matches the order of evaluation of L-attributed definitions.
Possibility of computing the values of
attributes along with parsing:
• If we are using LR parsing, then computing attribute values during parsing requires that a synthesized attribute of a grammar symbol A in the production A → XYZ be computed when the parser is about to reduce by the production A → XYZ.
So the order of computing the values of attributes matches the order of evaluation of S-attributed definitions.
Conclusion:
It is possible to get attribute values computed by evaluating the semantic rules during parsing itself if we use L-attributed definitions with LL parsing, or S-attributed definitions with LR parsing (every S-attributed definition is also L-attributed).
Possibility of computing the values of
attributes along with parsing:
• Therefore, to get the attribute values computed during parsing, we must specify the point in time at which each semantic rule is evaluated during the parsing process.
• This requires a framework that allows us to specify these implementation details as well.
• Syntax Directed Translation Schemes (SDTS) are a formalism that allows us to specify these implementation details.
Syntax Directed Translation Schemes
• An SDTS also associates attributes with grammar symbols, which hold the values of the quantities,
and
embeds code fragments specifying how to compute the values of these attributes (called semantic actions) within the production rules.
• The position of a semantic action in the production rule defines the point in time when the action is executed (in the middle of the production or at its end).
Syntax Directed Translation Schemes:
For LL parsing
D → T { L.type = T.type; } L
L → { L1.type = L.type; } L1 , id { L.dmy = enter(id.ptr, L.type); }
L → id { L.dmy = enter(id.ptr, L.type); }
T → int { T.type = int; }
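In a recursive-descent (LL) translator, an inherited attribute naturally becomes a parameter of the procedure for its nonterminal. A sketch under one stated assumption: an LL parser cannot use the left-recursive production L → L1 , id, so this sketch parses L as id { , id } instead. The DeclParser class and its dictionary symbol table are hypothetical.

```python
import re

class DeclParser:
    """Recursive-descent translator for declarations such as 'int a, b, c'."""

    def __init__(self, text):
        self.tokens = re.findall(r"[A-Za-z_]\w*|,|\S", text)
        self.pos = 0
        self.table = {}                      # hypothetical symbol table

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def ident(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier():
            raise SyntaxError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    def D(self):                             # D -> T { L.type = T.type; } L
        t_type = self.T()
        self.L(t_type)                       # inherited attribute passed down
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return self.table

    def T(self):                             # T -> int { T.type = int; }
        self.expect("int")
        return "int"

    def L(self, l_type):                     # L -> id { , id }, with L.type inherited
        self.table[self.ident()] = l_type    # enter(id.ptr, L.type)
        while self.peek() == ",":
            self.expect(",")
            self.table[self.ident()] = l_type

print(DeclParser("int a, b, c").D())  # {'a': 'int', 'b': 'int', 'c': 'int'}
```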
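Dependency Graph
Dependency graph for int a, b, c (attributes shown beside their parse-tree nodes):
D
    T (T.type = int)
        int
    L1 (L1.type = int, L1.dmy)
        L2 (L2.type = int, L2.dmy)
            L3 (L3.type = int, L3.dmy)
                id1 (id1.ptr = a)
            ,
            id2 (id2.ptr = b)
        ,
        id3 (id3.ptr = c)
Edges: T.type → L1.type → L2.type → L3.type; L1.type and id3.ptr → L1.dmy; L2.type and id2.ptr → L2.dmy; L3.type and id1.ptr → L3.dmy.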
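Syntax Directed Translation Schemes:
For LR parsing
D → T L { Enter(L.place, T.type); }
L → L1 , id { L.place = L1.place; append(L.place, id.place); }
L → id { append(L.place, id.place); }
T → int { T.type = int; }
Here L.place holds the list of names declared so far, and Enter records T.type for every name in that list.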
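Syntax Directed Translation Schemes:
Example (LR Parsing)
Annotated parse tree for int a, b, c:
D (calls Enter for the list a, b, c)
    T (T.type = int)
        int
    L (L.place = a, b, c)
        L (L.place = a, b)
            L (L.place = a)
                id (a)
            ,
            id (b)
        ,
        id (c)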
SDTS: Example (LR Parsing)
E → E1 + T { E.val = E1.val + T.val; }
E → T { E.val = T.val; }
T → T1 * F { T.val = T1.val * F.val; }
T → F { T.val = F.val; }
F → NUM { F.val = NUM.val; }

For input: 2+3*5


SDTS: Example (LR Parsing)
Annotated parse tree for 2+3*5:
E (E.val = E1.val + T.val = 17)
    E (E.val = T.val = 2)
        T (T.val = F.val = 2)
            F (F.val = NUM.val = 2)
                NUM (NUM.val = 2)
    +
    T (T.val = T1.val * F.val = 15)
        T (T.val = F.val = 3)
            F (F.val = NUM.val = 3)
                NUM (NUM.val = 3)
        *
        F (F.val = NUM.val = 5)
            NUM (NUM.val = 5)

The parsing stack itself is used for holding the values of the attributes of the grammar symbols.
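The idea that the parsing stack carries the attribute values can be sketched with a small shift-reduce parser whose stack entries are (symbol, val) pairs. Assumption: instead of a generated LR table, the shift/reduce decisions for this particular grammar are hand-coded from the top of the stack and the lookahead; each reduction runs the semantic action of the production.

```python
import re

def tokenize(text):
    """NUM tokens carry their integer value; '+' and '*' carry none."""
    tokens = []
    for num, op in re.findall(r"\s*(?:(\d+)|(\S))", text):
        tokens.append(("NUM", int(num)) if num else (op, None))
    tokens.append(("$", None))                       # end marker
    return tokens

def parse_and_evaluate(text):
    tokens, pos = tokenize(text), 0
    stack = []                                       # entries: (symbol, attribute value)
    while True:
        look = tokens[pos][0]
        top = stack[-1][0] if stack else None
        if top == "NUM":                             # F -> NUM { F.val = NUM.val; }
            stack[-1] = ("F", stack[-1][1])
        elif top == "F":
            if len(stack) >= 3 and stack[-2][0] == "*" and stack[-3][0] == "T":
                _, f = stack.pop(); stack.pop(); _, t = stack.pop()
                stack.append(("T", t * f))           # T -> T1 * F { T.val = T1.val * F.val; }
            else:
                stack[-1] = ("T", stack[-1][1])      # T -> F { T.val = F.val; }
        elif top == "T" and look != "*":
            if len(stack) >= 3 and stack[-2][0] == "+" and stack[-3][0] == "E":
                _, t = stack.pop(); stack.pop(); _, e = stack.pop()
                stack.append(("E", e + t))           # E -> E1 + T { E.val = E1.val + T.val; }
            else:
                stack[-1] = ("E", stack[-1][1])      # E -> T { E.val = T.val; }
        elif top == "E" and look == "$" and len(stack) == 1:
            return stack[0][1]                       # accept: E.val
        elif look != "$":
            stack.append(tokens[pos])                # shift
            pos += 1
        else:
            raise SyntaxError("unexpected end of input")

print(parse_and_evaluate("2+3*5"))  # 17
```

The T on the stack is reduced to E only when the lookahead is not '*', which is why 3*5 is reduced before the addition.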
Rewriting grammar

Grammar using inherited attributes:
D → T { L.type = T.type; } L
L → { L1.type = L.type; } L1 , id { enter(id.ptr, L.type); }
L → id { enter(id.ptr, L.type); }
T → int { T.type = int; }

Rewritten grammar using only synthesized attributes:
D → L
L → L1 , id { L.type = L1.type; enter(id.ptr, L1.type); }
L → T id { L.type = T.type; enter(id.ptr, T.type); }
T → int { T.type = int; }
Intermediate Code Forms
• Syntax tree: a condensed form of the parse tree in which operator and keyword nodes are moved up to their parents and chains of single productions are replaced by a single link.
• Three-address code: a sequence of statements, each of which contains at most three addresses (typically two operands and a result).
Syntax Tree: Example
Parse tree for a+b*c:
E
    E
        T
            F
                NUM = a
    +
    T
        T
            F
                NUM = b
        *
        F
            NUM = c

Syntax tree for a+b*c:
+
    a
    *
        b
        c
Three-address code: Example (a+b*c)
t1 = b * c
t2 = a + t1
Three-address statements:
1. Used for representing arithmetic expressions:
x = y op z, x = op y, x = y
2. Used for representing Boolean expressions:
if a relop b goto L, goto L
3. Used for representing array references and dereferencing operations:
x = y[i], x[i] = y, x = *y, *x = y
4. Used for representing procedure calls:
param T, call p, n
SDTS to translate arithmetic
expressions into syntax tree
E → E1 + T { E.ptr = mknode('+', E1.ptr, T.ptr); }
E → T { E.ptr = T.ptr; }
T → T1 * F { T.ptr = mknode('*', T1.ptr, F.ptr); }
T → F { T.ptr = F.ptr; }
F → id { F.ptr = makeleaf(id.place); }

makeleaf(ptr): creates a leaf node containing the pointer ptr and returns a pointer to it.

mknode(op, ptr1, ptr2): creates a node labeled op, makes its left child the node pointed to by ptr1 and its right child the node pointed to by ptr2, and returns a pointer to the node labeled op.
Types of attributes: id.place is a pointer to the symbol-table record containing the name of id;
E.ptr, T.ptr and F.ptr are pointers to syntax-tree nodes.
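The two constructors can be sketched in Python and called in the order an LR parser would perform the reductions for a+b*c. Assumptions: a leaf stores the identifier's name in place of a symbol-table pointer, and the TreeNode class and to_prefix printer are illustrative helpers, not part of the scheme.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TreeNode:
    label: str                           # operator, or identifier name for a leaf
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None

def makeleaf(name):
    """Leaf node; the name stands in for id.place."""
    return TreeNode(name)

def mknode(op, ptr1, ptr2):
    """Interior node labelled op with children ptr1 and ptr2."""
    return TreeNode(op, ptr1, ptr2)

def to_prefix(node):
    if node.left is None:
        return node.label
    return f"({node.label} {to_prefix(node.left)} {to_prefix(node.right)})"

# Actions in the order an LR parser would reduce a+b*c:
a = makeleaf("a")        # F -> id   (then T -> F and E -> T just copy the pointer)
b = makeleaf("b")        # F -> id   (then T -> F)
c = makeleaf("c")        # F -> id
t = mknode("*", b, c)    # T -> T1 * F
e = mknode("+", a, t)    # E -> E1 + T
print(to_prefix(e))      # (+ a (* b c))
```

The unit productions E → T and T → F create no nodes, which is how the chains of single productions disappear from the syntax tree.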
SDTS to translate arithmetic
expressions into Three-address code
E → E1 + T { $1 = gentemp();
             gencode('+', E1.place, T.place, $1);
             E.place = $1; }
E → T { E.place = T.place; }
T → T1 * F { $2 = gentemp();
             gencode('*', T1.place, F.place, $2);
             T.place = $2; }
T → F { T.place = F.place; }
F → id { F.place = id.place; }

gentemp(): creates a temporary name, installs it in the symbol table and returns a pointer to its symbol-table record.
SDTS to translate arithmetic
expressions into Three-address code
gencode(op, ptr1, ptr2, ptr3): generates a three-address statement with operator op, operands ptr1 and ptr2, and result ptr3.
Types of attributes: id.place is a pointer to the symbol-table record containing the name of id;
E.place, T.place and F.place are pointers to symbol-table records.
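A minimal sketch of these actions in Python. Assumptions: the expression arrives as an already-built tuple tree (op, left, right) instead of being parsed, a temporary is just a fresh name with no symbol-table entry, and gencode formats the statement as text.

```python
import itertools

class TACGen:
    """Holds the generated code and the counter used by gentemp()."""
    def __init__(self):
        self.code = []
        self._count = itertools.count(1)

    def gentemp(self):
        # Hypothetical: a fresh name only; a compiler would also install it
        # in the symbol table.
        return f"t{next(self._count)}"

    def gencode(self, op, arg1, arg2, result):
        self.code.append(f"{result} = {arg1} {op} {arg2}")

def translate(node, gen):
    """Return the place holding node's value, emitting code bottom-up."""
    if isinstance(node, str):            # F -> id { F.place = id.place; }
        return node
    op, left, right = node
    p1 = translate(left, gen)            # E1.place (or T1.place)
    p2 = translate(right, gen)           # T.place (or F.place)
    temp = gen.gentemp()                 # $1 = gentemp();
    gen.gencode(op, p1, p2, temp)        # gencode(op, E1.place, T.place, $1);
    return temp                          # E.place = $1;

gen = TACGen()
translate(("+", "a", ("*", "b", "c")), gen)
print("\n".join(gen.code))
# t1 = b * c
# t2 = a + t1
```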
SDTS to translate Boolean
expressions into Three-address code
• One way of representing true and false is by the integers 1 and 0 respectively.
• If this representation is used, translating a Boolean expression involves generating code to set a temporary location to either 0 or 1, and referencing this location whenever the value of the Boolean expression is required.
Example: a < b is translated to the following TAC:
i) if a < b goto i+3
i+1) $1 = 0
i+2) goto i+4
i+3) $1 = 1
SDTS to translate Boolean
expressions into Three-address code
E → E1 relop E2
{ $1 = gentemp();
  gencode(if E1.place relop E2.place goto Nextquad+3);
  gencode($1 = 0);
  gencode(goto Nextquad+2);
  gencode($1 = 1);
  E.place = $1;
}
Nextquad is an attribute used to keep track of the next available index in the data structure that holds the TAC; each call to gencode advances it by one.
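The Nextquad arithmetic can be checked with a short Python sketch. Assumptions: quads are stored as strings in a list whose first index is a chosen start value (100 here, standing for i), and gentemp simply returns fresh names t1, t2, ….

```python
class Quads:
    """Numbered TAC; nextquad is the index the next gencode() call will use."""
    def __init__(self, start=0):
        self.start, self.code, self.ntemps = start, [], 0

    @property
    def nextquad(self):
        return self.start + len(self.code)

    def gencode(self, text):
        self.code.append(text)

    def gentemp(self):
        self.ntemps += 1                 # hypothetical naming scheme
        return f"t{self.ntemps}"

    def listing(self):
        return [f"{self.start + k}) {s}" for k, s in enumerate(self.code)]

def relop_value(q, left, op, right):
    """E -> E1 relop E2 with true = 1 and false = 0; returns E.place."""
    temp = q.gentemp()
    # Each f-string is evaluated before the quad is appended, so
    # q.nextquad is the index of the statement being generated.
    q.gencode(f"if {left} {op} {right} goto {q.nextquad + 3}")
    q.gencode(f"{temp} = 0")
    q.gencode(f"goto {q.nextquad + 2}")
    q.gencode(f"{temp} = 1")
    return temp

q = Quads(start=100)
place = relop_value(q, "a", "<", "b")
print("\n".join(q.listing()))
# 100) if a < b goto 103
# 101) t1 = 0
# 102) goto 104
# 103) t1 = 1
```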
SDTS to translate Boolean
expressions into Three-address code
• Another way to represent the value of a Boolean expression is by the position in the generated TAC to which control is transferred, depending on whether the expression is true or false.
• This representation is called the control-flow representation.
• In this representation a Boolean expression is represented by a conditional goto followed by an unconditional goto. The target of the conditional goto is the code to be executed when the expression is true, whereas the target of the unconditional goto is the code to be executed when it is false.
Example: the expression a < b is represented as follows:
i) if a < b goto __
i+1) goto __
SDTS to translate Boolean
expressions into Three-address code
• Therefore the SDTS for translating a Boolean expression using the control-flow representation is:
E → E1 relop E2
{ E.True = makelist(Nextquad);
  E.False = makelist(Nextquad+1);
  gencode(if E1.place relop E2.place goto __);
  gencode(goto __);
}
E.True and E.False are pointer-valued attributes pointing to lists of indices (into the data structure used for storing TAC) of the jump statements whose targets must later be filled in with the code to be executed when the expression is true or false, respectively.
makelist(i): creates a list node containing index i and returns a pointer to it.
SDTS to translate Boolean
expressions into Three-address code
Example: the expression a < b is represented as follows:
i) if a < b goto __
i+1) goto __

E.True = {i}
E.False = {i+1}
SDTS to translate Boolean
expressions into Three-address code
Example: the expression a < b and c > d is represented as follows (assuming short-circuit evaluation):
i) if a < b goto i+2
i+1) goto __
i+2) if c > d goto __
i+3) goto __

E.True = {i+2}
E.False = {i+1, i+3}

SDTS to translate Boolean
expressions into Three-address code
Therefore the SDTS is:
E → E1 and M E2
{ backpatch(E1.True, M.quad);
  E.True = E2.True;
  E.False = merge(E1.False, E2.False);
}
M → ε { M.quad = Nextquad; }
M.quad: used to remember the start of the code for E2.
backpatch(ptr, i): fills in i as the target of every three-address statement whose index is in the list pointed to by ptr.
merge(p1, p2): returns a pointer to the concatenation of the lists pointed to by p1 and p2.
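Backpatching for and can be sketched in Python as follows. Assumptions: each quad is stored as a list of tokens whose last element is the jump target ("__" until patched), lists of indices are plain Python lists, and the start index 100 stands for i. Because M → ε is reduced after E1 and before E2, the sketch reads M.quad between the two relop translations.

```python
class Quads:
    """TAC stored as token lists; the last token of a jump is its target."""
    def __init__(self, start=0):
        self.start, self.code = start, []

    @property
    def nextquad(self):
        return self.start + len(self.code)

    def gencode(self, *parts):
        self.code.append(list(parts))

    def backpatch(self, indices, target):
        for i in indices:
            self.code[i - self.start][-1] = target

    def listing(self):
        return [f"{self.start + k}) " + " ".join(map(str, s))
                for k, s in enumerate(self.code)]

def makelist(i):
    return [i]

def merge(*lists):
    return [i for lst in lists for i in lst]

def relop(q, a, op, b):
    """E -> E1 relop E2: returns (E.True, E.False)."""
    lists = makelist(q.nextquad), makelist(q.nextquad + 1)
    q.gencode("if", a, op, b, "goto", "__")
    q.gencode("goto", "__")
    return lists

def and_(q, e1, m_quad, e2):
    """E -> E1 and M E2."""
    q.backpatch(e1[0], m_quad)               # backpatch(E1.True, M.quad)
    return e2[0], merge(e1[1], e2[1])        # E.True, E.False

q = Quads(start=100)
e1 = relop(q, "a", "<", "b")
m_quad = q.nextquad                          # M -> ε { M.quad = Nextquad; }
e2 = relop(q, "c", ">", "d")
e_true, e_false = and_(q, e1, m_quad, e2)
print("\n".join(q.listing()), e_true, e_false)
```

The remaining "__" targets stay open until an enclosing statement backpatches E.True and E.False.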
SDTS to translate Boolean
expressions into Three-address code
Example: the expression a < b or c > d is represented as follows (assuming short-circuit evaluation):
i) if a < b goto __
i+1) goto i+2
i+2) if c > d goto __
i+3) goto __

E.True = {i, i+2}
E.False = {i+3}
SDTS to translate Boolean
expressions into Three-address code
Therefore the SDTS is:
E → E1 or M E2
{ backpatch(E1.False, M.quad);
  E.True = merge(E1.True, E2.True);
  E.False = E2.False;
}
M → ε { M.quad = Nextquad; }
SDTS to translate Boolean
expressions into Three-address code
Example: the expression not a < b is represented as follows:
i) if a < b goto __
i+1) goto __

E.True = {i+1}
E.False = {i}
SDTS to translate Boolean
expressions into Three-address code
Therefore the SDTS is:
E → not E1
{ E.True = E1.False;
  E.False = E1.True;
}
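The or and not schemes can be sketched the same way. Assumptions as before: quads are token lists whose last element is the jump target, index lists are Python lists, and 100 stands for i. Note that not generates no code at all; it only swaps the two lists.

```python
class Quads:
    """TAC stored as token lists; the last token of a jump is its target."""
    def __init__(self, start=0):
        self.start, self.code = start, []

    @property
    def nextquad(self):
        return self.start + len(self.code)

    def gencode(self, *parts):
        self.code.append(list(parts))

    def backpatch(self, indices, target):
        for i in indices:
            self.code[i - self.start][-1] = target

    def listing(self):
        return [f"{self.start + k}) " + " ".join(map(str, s))
                for k, s in enumerate(self.code)]

def makelist(i):
    return [i]

def merge(*lists):
    return [i for lst in lists for i in lst]

def relop(q, a, op, b):
    """E -> E1 relop E2: returns (E.True, E.False)."""
    lists = makelist(q.nextquad), makelist(q.nextquad + 1)
    q.gencode("if", a, op, b, "goto", "__")
    q.gencode("goto", "__")
    return lists

def or_(q, e1, m_quad, e2):
    """E -> E1 or M E2."""
    q.backpatch(e1[1], m_quad)               # backpatch(E1.False, M.quad)
    return merge(e1[0], e2[0]), e2[1]        # E.True, E.False

def not_(e1):
    """E -> not E1: swap the lists."""
    return e1[1], e1[0]                      # E.True = E1.False, E.False = E1.True

q = Quads(start=100)
e1 = relop(q, "a", "<", "b")
m_quad = q.nextquad                          # M -> ε { M.quad = Nextquad; }
e2 = relop(q, "c", ">", "d")
e_or = or_(q, e1, m_quad, e2)

q_not = Quads(start=100)
e_not = not_(relop(q_not, "a", "<", "b"))
print(q.listing(), e_or, e_not)
```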
SDTS to translate Statements into
Three-address code
A simple statement is an expression terminated by a semicolon. Therefore the SDTS is:
S → E ;
{ S.Next = NULL; }

S.Next is a pointer-valued attribute pointing to a list of indices (into the data structure used for storing TAC) of the statements whose targets are to be filled in with the logical next statement, i.e. the code that follows S.
SDTS to translate Control structures
into Three-address code
Example: if E then S1 else S2 is represented as follows:
          code for E       (E.True jumps to the code for S1, E.False to the code for S2)
E.True:   code for S1      (S1.Next)
          goto ___         (part of S.Next)
E.False:  code for S2      (S2.Next)
SDTS to translate control structures
into Three-address code
Therefore the SDTS is:
S → if E then M1 S1 N else M2 S2
{ backpatch(E.True, M1.quad);
  backpatch(E.False, M2.quad);
  S.Next = merge(S1.Next, N.Next, S2.Next);
}

M → ε { M.quad = Nextquad; }
N → ε { N.Next = makelist(Nextquad);
        gencode(goto ____);
      }
N generates the goto that skips over the code for S2.
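A Python sketch of the if-then-else scheme, translating if a < b then x = 1 else x = 2. Assumptions: quads are token lists whose last element is the jump target; the helpers cond and assign are hypothetical and return code generators (an assignment is one quad with an empty S.Next); the caller finally backpatches S.Next to whatever follows the statement.

```python
class Quads:
    """TAC stored as token lists; the last token of a jump is its target."""
    def __init__(self, start=0):
        self.start, self.code = start, []

    @property
    def nextquad(self):
        return self.start + len(self.code)

    def gencode(self, *parts):
        self.code.append(list(parts))

    def backpatch(self, indices, target):
        for i in indices:
            self.code[i - self.start][-1] = target

    def listing(self):
        return [f"{self.start + k}) " + " ".join(map(str, s))
                for k, s in enumerate(self.code)]

def makelist(i):
    return [i]

def merge(*lists):
    return [i for lst in lists for i in lst]

def cond(a, op, b):
    """E -> E1 relop E2 as a generator returning (E.True, E.False)."""
    def gen(q):
        lists = makelist(q.nextquad), makelist(q.nextquad + 1)
        q.gencode("if", a, op, b, "goto", "__")
        q.gencode("goto", "__")
        return lists
    return gen

def assign(text):
    """A simple statement: one quad, S.Next empty."""
    def gen(q):
        q.gencode(text)
        return []
    return gen

def if_then_else(q, e_gen, s1_gen, s2_gen):
    e_true, e_false = e_gen(q)
    m1 = q.nextquad                          # M1 -> ε
    s1_next = s1_gen(q)
    n_next = makelist(q.nextquad)            # N -> ε { N.Next = makelist(Nextquad);
    q.gencode("goto", "__")                  #          gencode(goto __); }
    m2 = q.nextquad                          # M2 -> ε
    s2_next = s2_gen(q)
    q.backpatch(e_true, m1)
    q.backpatch(e_false, m2)
    return merge(s1_next, n_next, s2_next)   # S.Next

q = Quads(start=100)
s_next = if_then_else(q, cond("a", "<", "b"), assign("x = 1"), assign("x = 2"))
q.backpatch(s_next, q.nextquad)              # logical next of the if statement
print("\n".join(q.listing()))
```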
SDTS to translate Control structures
into Three-address code
Example: while E do S1 is represented as follows:
M1.quad:  code for E       (E.False jumps to S.Next)
E.True:   code for S1      (S1.Next jumps back to M1.quad)
          goto M1.quad
SDTS to translate control structures
into Three-address code
Therefore the SDTS is:
S → while M1 E do M2 S1
{ backpatch(E.True, M2.quad);
  backpatch(S1.Next, M1.quad);
  S.Next = E.False;
  gencode(goto M1.quad);
}

M → ε { M.quad = Nextquad; }
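A Python sketch of the while scheme, translating while a < b do a = a + 1, under the same assumptions as the if-then-else sketch: token-list quads, hypothetical cond/assign generator helpers, and a final backpatch of S.Next by the caller.

```python
class Quads:
    """TAC stored as token lists; the last token of a jump is its target."""
    def __init__(self, start=0):
        self.start, self.code = start, []

    @property
    def nextquad(self):
        return self.start + len(self.code)

    def gencode(self, *parts):
        self.code.append(list(parts))

    def backpatch(self, indices, target):
        for i in indices:
            self.code[i - self.start][-1] = target

    def listing(self):
        return [f"{self.start + k}) " + " ".join(map(str, s))
                for k, s in enumerate(self.code)]

def makelist(i):
    return [i]

def cond(a, op, b):
    """E -> E1 relop E2 as a generator returning (E.True, E.False)."""
    def gen(q):
        lists = makelist(q.nextquad), makelist(q.nextquad + 1)
        q.gencode("if", a, op, b, "goto", "__")
        q.gencode("goto", "__")
        return lists
    return gen

def assign(text):
    """A simple statement: one quad, S.Next empty."""
    def gen(q):
        q.gencode(text)
        return []
    return gen

def while_do(q, e_gen, s1_gen):
    m1 = q.nextquad                  # M1 -> ε: start of the test
    e_true, e_false = e_gen(q)
    m2 = q.nextquad                  # M2 -> ε: start of the body
    s1_next = s1_gen(q)
    q.backpatch(e_true, m2)
    q.backpatch(s1_next, m1)
    q.gencode("goto", m1)            # jump back to re-evaluate E
    return e_false                   # S.Next = E.False

q = Quads(start=100)
s_next = while_do(q, cond("a", "<", "b"), assign("a = a + 1"))
q.backpatch(s_next, q.nextquad)      # loop exit goes to the following code
print("\n".join(q.listing()))
```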
SDTS to translate Control structures
into Three-address code
Example: for(E1; E2; E3) S1 is represented as follows:
          code for E1
M1.quad:  code for E2      (E2.True jumps to the code for S1, E2.False to S.Next)
M2.quad:  code for E3
          goto M1.quad
N.quad:   code for S1      (S1.Next jumps to M2.quad)
          goto M2.quad
SDTS to translate control structures
into Three-address code
Therefore the SDTS is:
S → for ( E1 ; M1 E2 ; M2 E3 ) N S1
{ backpatch(E2.True, N.quad);
  backpatch(N.Next, M1.quad);
  backpatch(S1.Next, M2.quad);
  S.Next = E2.False;
  gencode(goto M2.quad);
}
M → ε { M.quad = Nextquad; }
N → ε { N.Next = makelist(Nextquad);
        gencode(goto ____);
        N.quad = Nextquad;
      }
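A Python sketch of the for scheme, translating for(i = 0; i < n; i = i + 1) s = s + i. Assumptions: the same token-list quads and hypothetical cond/assign helpers as in the while sketch, with E1 and E3 treated as assignments that each emit one quad.

```python
class Quads:
    """TAC stored as token lists; the last token of a jump is its target."""
    def __init__(self, start=0):
        self.start, self.code = start, []

    @property
    def nextquad(self):
        return self.start + len(self.code)

    def gencode(self, *parts):
        self.code.append(list(parts))

    def backpatch(self, indices, target):
        for i in indices:
            self.code[i - self.start][-1] = target

    def listing(self):
        return [f"{self.start + k}) " + " ".join(map(str, s))
                for k, s in enumerate(self.code)]

def makelist(i):
    return [i]

def cond(a, op, b):
    def gen(q):
        lists = makelist(q.nextquad), makelist(q.nextquad + 1)
        q.gencode("if", a, op, b, "goto", "__")
        q.gencode("goto", "__")
        return lists
    return gen

def assign(text):
    def gen(q):
        q.gencode(text)
        return []
    return gen

def for_loop(q, e1_gen, e2_gen, e3_gen, s1_gen):
    e1_gen(q)                          # code for E1
    m1 = q.nextquad                    # M1 -> ε
    e2_true, e2_false = e2_gen(q)
    m2 = q.nextquad                    # M2 -> ε
    e3_gen(q)                          # code for E3
    n_next = makelist(q.nextquad)      # N -> ε { N.Next = makelist(Nextquad);
    q.gencode("goto", "__")            #          gencode(goto __);
    n_quad = q.nextquad                #          N.quad = Nextquad; }
    s1_next = s1_gen(q)
    q.backpatch(e2_true, n_quad)
    q.backpatch(n_next, m1)
    q.backpatch(s1_next, m2)
    q.gencode("goto", m2)
    return e2_false                    # S.Next = E2.False

q = Quads(start=100)
s_next = for_loop(q, assign("i = 0"), cond("i", "<", "n"),
                  assign("i = i + 1"), assign("s = s + i"))
q.backpatch(s_next, q.nextquad)
print("\n".join(q.listing()))
```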
SDTS to translate an array reference
into Three-address code
An array reference is an l-valued expression, hence we use the following grammar to capture its syntactic structure:
L → id[elist]
elist → elist , E | E

• Being an l-valued expression, translating an array reference involves generating code for the l-value computation.
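L-value computation
\[ \text{L-value}(a[i_1, i_2, \ldots, i_k]) = \text{addr}(a) + \text{offset} \]
\[ \text{offset} = \big[(i_1 - lb_1)(ub_2 - lb_2 + 1)(ub_3 - lb_3 + 1)\cdots(ub_k - lb_k + 1) + (i_2 - lb_2)(ub_3 - lb_3 + 1)\cdots(ub_k - lb_k + 1) + \cdots + (i_k - lb_k)\big] \times \text{size of element} \]
where lb_i and ub_i are the lower and upper bounds of the i-th dimension.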
SDTS to translate an array reference
into Three-address code
If we denote (ub_i - lb_i + 1) by d_i and take the size of an element to be bpw, then
\[ \text{offset} = \big[(i_1 - lb_1)\, d_2 d_3 \cdots d_k + (i_2 - lb_2)\, d_3 d_4 \cdots d_k + \cdots + (i_k - lb_k)\big] \times bpw \]
This can be rewritten as:
\[ \text{offset} = (i_1 d_2 d_3 \cdots d_k + i_2 d_3 d_4 \cdots d_k + \cdots + i_k) \times bpw - (lb_1 d_2 d_3 \cdots d_k + lb_2 d_3 d_4 \cdots d_k + \cdots + lb_k) \times bpw \]
The first term depends on the indices, so it is the variable part, denoted V; the second term is independent of the indices, so it is the constant part, denoted C. Hence
\[ \text{L-value}(a[i_1, i_2, \ldots, i_k]) = \text{addr}(a) + V - C \]
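The split into V and C can be checked numerically. A sketch, assuming the row-major layout of the offset formula above and, as an example, the declaration int a[1..5,2..7] with bpw = 2 (the size the later SDTS gives to int); it confirms offset = V - C for every valid index pair.

```python
from itertools import product
from math import prod

def direct_offset(idx, bounds, bpw):
    """offset = [sum_j (i_j - lb_j) * d_{j+1} * ... * d_k] * bpw."""
    dims = [ub - lb + 1 for lb, ub in bounds]
    total = 0
    for j, (i, (lb, _ub)) in enumerate(zip(idx, bounds)):
        total += (i - lb) * prod(dims[j + 1:])   # prod([]) == 1 for the last index
    return total * bpw

def variable_part(idx, dims, bpw):
    """V = (i1*d2*...*dk + i2*d3*...*dk + ... + ik) * bpw."""
    return sum(i * prod(dims[j + 1:]) for j, i in enumerate(idx)) * bpw

def constant_part(bounds, bpw):
    """C: the same expression with every index replaced by its lower bound."""
    dims = [ub - lb + 1 for lb, ub in bounds]
    return variable_part([lb for lb, _ub in bounds], dims, bpw)

bounds = [(1, 5), (2, 7)]          # int a[1..5, 2..7]
bpw = 2
dims = [ub - lb + 1 for lb, ub in bounds]
C = constant_part(bounds, bpw)
for idx in product(*(range(lb, ub + 1) for lb, ub in bounds)):
    assert direct_offset(idx, bounds, bpw) == variable_part(idx, dims, bpw) - C
print("C =", C)                    # C = 16
```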
SDTS to translate an array reference
into Three-address code
Since addr(a) is fixed, C can be combined with addr(a) at compile time: the L.place attribute keeps track of addr(a) - C, the L.off attribute keeps track of V, and the reference is then expressed using indexed addressing as L.place[L.off].
Since
\[ V = \Big( \big( \cdots \big( (i_1 d_2 + i_2)\, d_3 + i_3 \big) \cdots \big)\, d_k + i_k \Big) \times bpw, \]
the TAC that must be generated is:
$1 = i1
$1 = $1 * d2
$1 = $1 + i2
$1 = $1 * d3
$1 = $1 + i3
…..
$1 = $1 * dk
$1 = $1 + ik
$1 = $1 * bpw
$2 = addr(a) - C
SDTS to translate an array reference
into Three-address code
Therefore the SDTS is:
elist → E          { Initialize Queue by adding E.place; }
elist → elist1 , E { append E.place to Queue; }
L → id[elist]      { $1 = gentemp(); Ndim = 1;
                     gencode($1 = retrieve());
                     while (Queue is not empty) do
                     { Ndim = Ndim + 1;
                       gencode($1 = $1 * limit(id.place, Ndim));
                       gencode($1 = $1 + retrieve());
                     }
                     gencode($1 = $1 * bpw); L.off = $1;
                     $2 = gentemp();
                     gencode($2 = id.place - C); L.place = $2;
                   }
where limit(id.place, Ndim) returns the value of d_Ndim for the array id, and retrieve() removes and returns the place at the front of Queue.
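The L → id[elist] action can be sketched as a Python function that emits the TAC as strings. Assumptions: the index places arrive as a list (standing for the Queue filled by the elist actions), the d_i values, bpw and C are passed in directly rather than looked up in the symbol table, and "$1"/"$2" are fixed temporary names. Ndim is incremented before limit() is consulted, so the first multiplier is d2.

```python
from collections import deque

def array_ref_tac(name, index_places, dims, bpw, c):
    """Emit TAC for name[i1, ..., ik]; returns (code, L.place, L.off).

    dims[j] holds d_{j+1}, i.e. the number of elements in dimension j+1.
    """
    queue = deque(index_places)              # filled by the elist actions
    off, base = "$1", "$2"                   # the scheme's two temporaries
    code = [f"{off} = {queue.popleft()}"]    # gencode($1 = retrieve())
    ndim = 1
    while queue:
        ndim += 1                            # limit(id.place, Ndim) = d_Ndim
        code.append(f"{off} = {off} * {dims[ndim - 1]}")
        code.append(f"{off} = {off} + {queue.popleft()}")
    code.append(f"{off} = {off} * {bpw}")    # L.off = $1
    code.append(f"{base} = addr({name}) - {c}")  # L.place = $2
    return code, base, off

# a[i, j] for int a[1..5, 2..7]: d1 = 5, d2 = 6, bpw = 2, C = 16
code, place, offset = array_ref_tac("a", ["i", "j"], [5, 6], 2, 16)
print("\n".join(code))
print(f"reference: {place}[{offset}]")
```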
SDTS to translate an array reference
into Three-address code
Grammar to capture an array declaration with the following syntax:

int a[1..5,2..7];

D → type id[dlist]
type → int | float
dlist → dlist1 , const1..const2 | const1..const2

Computation of C
\[ C = (lb_1 d_2 d_3 \cdots d_k + lb_2 d_3 d_4 \cdots d_k + \cdots + lb_k) \times bpw \]
which can be written as:
\[ C = \Big( \big( \cdots \big( (lb_1 d_2 + lb_2)\, d_3 + lb_3 \big) \cdots \big)\, d_k + lb_k \Big) \times bpw \]
SDTS to translate an array reference
into Three-address code
Therefore the SDTS is:
type → int    { type.val = 2; }
type → float  { type.val = 4; }
dlist → const1..const2 { dlist.size = const2.val - const1.val + 1;
                         Initialize Queue by adding dlist.size;
                         C = const1.val;
                       }
dlist → dlist1 , const1..const2
                       { dlist.size = const2.val - const1.val + 1;
                         append dlist.size to Queue;
                         C = C * dlist.size + const1.val;
                       }
SDTS to translate an array reference
into Three-address code

D → type id[dlist] { C = C * type.val;
                     while (Queue not empty)
                     { enter(id.place, retrieve()); }
                     enter(id.place, type.val);
                     enter(id.place, C);
                   }
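The declaration actions can be sketched in Python for int a[1..5,2..7]. Assumptions: a regular expression stands in for the parser, the symbol-table entry is a hypothetical dictionary holding the d_i values, the element size (type.val) and C, and the sizes int = 2 and float = 4 are taken from the scheme above.

```python
import re

TYPE_SIZE = {"int": 2, "float": 4}                   # type.val from the scheme

def declare_array(text, table):
    """Process 'type id[lb1..ub1, ..., lbk..ubk];' and record d_i, size and C."""
    m = re.fullmatch(r"\s*(int|float)\s+(\w+)\s*\[(.*)\]\s*;?\s*", text)
    if m is None:
        raise SyntaxError("not an array declaration")
    typ, name, dlist = m.groups()
    dims, c = [], None
    for pair in dlist.split(","):
        lb, ub = (int(x) for x in pair.split(".."))
        size = ub - lb + 1                          # dlist.size = const2.val - const1.val + 1
        dims.append(size)                           # add dlist.size to Queue
        c = lb if c is None else c * size + lb      # C = const1.val, then C = C*dlist.size + const1.val
    c *= TYPE_SIZE[typ]                             # C = C * type.val
    table[name] = {"dims": dims, "bpw": TYPE_SIZE[typ], "C": c}
    return table[name]

symtab = {}
print(declare_array("int a[1..5,2..7];", symtab))
# {'dims': [5, 6], 'bpw': 2, 'C': 16}
```

The value C = 16 agrees with the closed form (lb1 d2 + lb2) × bpw = (1·6 + 2) × 2.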
