CD Unit3 Part2
C.i = A.i
C takes its value from its parent A, so i is an inherited attribute of C
i is an attribute associated with the symbols A, B, C and D
Types of SDD
• 2 types
1. S-Attributed SDD or S-Attributed Definitions or S-Attributed Grammar
2. L-Attributed SDD or L-Attributed Definitions or L-Attributed Grammar
S-Attributed SDD
1. An SDD that uses only synthesized attributes is called an S-Attributed SDD
• S stands for synthesized
2. Semantic actions are always placed at the right end of the production
• When constructing the annotated parse tree, the value of T’.syn cannot be assigned at the starting stage itself; it becomes available only later, once the attributes it depends on have been computed
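Because every attribute in an S-attributed SDD is synthesized, one postorder walk of the parse tree can evaluate all of them, running each production's semantic action after its children have been processed. The following is a minimal Python sketch. It assumes the usual desk-calculator grammar E -> E + T | T, T -> T * F | F, F -> digit, with a synthesized attribute val. The grammar and the names Node, evaluate and leaf are illustrative assumptions, not taken from these notes.

```python
from dataclasses import dataclass, field

# Assumed grammar: E -> E + T | T,  T -> T * F | F,  F -> digit
# Every attribute (val) is synthesized, so one postorder walk evaluates them.

@dataclass
class Node:
    symbol: str                                   # grammar symbol at this node
    children: list = field(default_factory=list)
    lexval: int = 0                               # used only by digit leaves
    val: int = 0                                  # synthesized attribute

def evaluate(node: Node) -> int:
    """Postorder: children first, then the semantic action at the right end."""
    for child in node.children:
        evaluate(child)
    kids = node.children
    if node.symbol == "digit":
        node.val = node.lexval                    # digit leaf: value is digit.lexval
    elif len(kids) == 3 and kids[1].symbol in ("+", "*"):
        left, right = kids[0].val, kids[2].val    # E -> E1 + T  or  T -> T1 * F
        node.val = left + right if kids[1].symbol == "+" else left * right
    elif len(kids) == 1:
        node.val = kids[0].val                    # E -> T, T -> F, F -> digit
    return node.val

def leaf(d: int) -> Node:
    return Node("F", [Node("digit", lexval=d)])

# Parse tree of 5+3*4: E -> E + T, where the right T -> T * F
tree = Node("E", [
    Node("E", [Node("T", [leaf(5)])]),
    Node("+"),
    Node("T", [Node("T", [leaf(3)]), Node("*"), leaf(4)]),
])
print(evaluate(tree))   # 17
```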
Dependency Graph
• A dependency graph has 3 uses
1. It represents the flow of information among the attributes in a parse tree
2. It is used to determine the evaluation order of the attributes // i.e. the order in which the corresponding semantic rules are executed, which rule can be executed first
3. An annotated parse tree shows the values of the attributes, whereas a dependency graph determines how those values can be computed
Draw the dependency graph for the i/p string 5+3*4
first construct the parse tree and then the annotated parse tree
• To construct the annotated parse tree, traverse the parse tree depth-first, left to right. Whenever a reduction happens, execute the corresponding semantic action
• First digit.lexval=5 is executed, so it gets evaluation order number 1; this statement is executed first
• Then comes F; the process continues in the same way, and each attribute is given its evaluation order number
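Any topological sort of the dependency graph is a valid evaluation order, because an attribute gets its number only after every attribute it depends on. The sketch below applies Kahn's algorithm to the parse tree of 5+3*4. The attribute names (digit(5).lexval, F1.val, T3.val, and so on) are made-up labels, and the edges assume the grammar E -> E + T | T, T -> T * F | F, F -> digit. The graph actually drawn depends on the grammar used in class.

```python
from collections import deque

# Hypothetical attribute instances for the parse tree of 5+3*4.
# An edge a -> b means "b is computed from a".
edges = {
    "digit(5).lexval": ["F1.val"],
    "F1.val": ["T1.val"],
    "T1.val": ["E1.val"],
    "digit(3).lexval": ["F2.val"],
    "F2.val": ["T2.val"],
    "digit(4).lexval": ["F3.val"],
    "T2.val": ["T3.val"],
    "F3.val": ["T3.val"],
    "E1.val": ["E.val"],
    "T3.val": ["E.val"],
    "E.val": [],
}

def evaluation_order(graph):
    """Kahn's algorithm: return a topological order of the dependency graph."""
    indegree = {n: 0 for n in graph}
    for targets in graph.values():
        for t in targets:
            indegree[t] += 1
    ready = deque(n for n in graph if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for t in graph[n]:
            indegree[t] -= 1
            if indegree[t] == 0:          # all inputs of t are now computed
                ready.append(t)
    if len(order) != len(graph):
        raise ValueError("cycle: no evaluation order exists")
    return order

for number, attr in enumerate(evaluation_order(edges), start=1):
    print(number, attr)
```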
Method 2:
• Conversion of infix to postfix, e.g. 1+2+3
• The first production of the grammar is left recursive
• After eliminating the left recursion, the grammar is of the form:
• Here we introduced a new NT R
• For this grammar we construct the parse tree
• After constructing the parse tree, we convert infix to postfix
• While converting, a + that appears as an input symbol is not output; a + is output only where the semantic action print('+') appears
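The translation can be sketched as a recursive-descent translator with one function per nonterminal. The grammar in the comments is an assumption: it is the usual textbook form after removing left recursion, with print actions added, because the exact grammar from class is not reproduced here. As the rule above says, only the print actions produce output.

```python
# Assumed grammar after removing left recursion (hypothetical reconstruction):
#   E -> T R
#   R -> + T { print('+') } R | ε
#   T -> digit { print(digit) }
class Translator:
    def __init__(self, text: str):
        self.text = text
        self.pos = 0
        self.out = []                 # collects what the print actions emit

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def E(self):
        self.T()
        self.R()

    def R(self):
        if self.peek() == "+":
            self.pos += 1             # match the input '+', but do not output it
            self.T()
            self.out.append("+")      # semantic action print('+') after T
            self.R()
        # otherwise R -> ε: nothing to do

    def T(self):
        ch = self.peek()
        if ch is None or not ch.isdigit():
            raise SyntaxError(f"digit expected at position {self.pos}")
        self.pos += 1
        self.out.append(ch)           # semantic action print(digit)

def to_postfix(text: str) -> str:
    t = Translator(text)
    t.E()
    if t.peek() is not None:
        raise SyntaxError(f"unexpected {t.peek()!r}")
    return "".join(t.out)

print(to_postfix("1+2+3"))   # 12+3+
```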
Postfix Notation
• If an operator appears after its operands, the notation is postfix notation
• Consider the infix expression (a+b)*c; its postfix form is:
Sol: Since () has the highest priority, a+b is converted first: ab+
Then * is applied:
ab+c*
• Consider the infix expression a+(b*c); its postfix form is:
Sol: Since () has the highest priority, b*c is converted first: bc*
Then + is applied: abc*+
• Consider the infix expression (a-b)*(c/d); its postfix form is:
Sol: Since () has the highest priority, the 1st parenthesized expression is converted first:
ab-
Next the 2nd parenthesized expression: cd/
Finally * is applied: ab-cd/*
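Hand conversions like these can be checked with Dijkstra's shunting-yard algorithm. This is a standard technique, not the parse-tree method used in these notes. The sketch assumes single-character operands and only the left-associative operators + - * /.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def infix_to_postfix(expr: str) -> str:
    """Shunting-yard for single-character operands and left-associative + - * /."""
    output, stack = [], []
    for ch in expr.replace(" ", ""):
        if ch.isalnum():
            output.append(ch)                   # operands go straight to the output
        elif ch == "(":
            stack.append(ch)
        elif ch == ")":
            while stack[-1] != "(":             # pop until the matching '('
                output.append(stack.pop())
            stack.pop()                         # discard the '('
        else:
            # pop operators of higher or equal priority (left associativity)
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]:
                output.append(stack.pop())
            stack.append(ch)
    while stack:
        output.append(stack.pop())
    return "".join(output)

for e in ["(a+b)*c", "a+(b*c)", "(a-b)*(c/d)"]:
    print(e, "->", infix_to_postfix(e))
```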
Three Address Code Representation
• In three address code, each instruction contains at most three addresses, and its RHS contains at most one operator
• A three address code instruction can be represented in 3 ways
1. Quadruple
2. Triple
3. Indirect Triple
• Consider the expression x+y*z
Sol: The RHS may contain only one operator, but here there are 2
• So we have to translate this expression into three address code
• There are 2 operators, + and *; since * has the higher priority, it is evaluated first
t1=y*z // t1 is a temporary variable generated by the compiler
t2=x+t1
Each instruction contains at most three addresses:
t1=y*z
Here t1 is the 1st address, y the 2nd and z the 3rd
The RHS contains at most one operator: *
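The same splitting can be automated by a postorder walk of the expression tree. The walk allocates a fresh temporary for each operator, so every instruction has one operator and at most three addresses. This is a minimal sketch; the tuple encoding of the tree and the name gen_tac are assumptions.

```python
import itertools

def gen_tac(tree, code, temps):
    """Return the address holding tree's value, appending 3AC lines to code.
    tree is a variable name (str) or a tuple (op, left, right)."""
    if isinstance(tree, str):
        return tree                        # a name is already an address
    op, left, right = tree
    a1 = gen_tac(left, code, temps)
    a2 = gen_tac(right, code, temps)
    t = f"t{next(temps)}"                  # fresh compiler-generated temporary
    code.append(f"{t}={a1}{op}{a2}")       # one operator, at most three addresses
    return t

code = []
gen_tac(("+", "x", ("*", "y", "z")), code, itertools.count(1))   # x+y*z
print("\n".join(code))
```

For x+y*z this prints t1=y*z followed by t2=x+t1, matching the hand translation.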
Quadruple
• It contains 4 fields
1. op
2. arg1 // the 1st operand
3. arg2 // the 2nd operand
4. result // holds the result of the operation
Ex: Consider the assignment
a = b * -c + b * -c
-c // unary minus; it has higher priority than * and +
so we first write this expression in three address code
• Then we represent this three address code in the form of quadruples
• After intermediate code generation comes code optimization, which is used to reduce the code
• Let the 1st instruction be stored at (0)
• The major disadvantage of this approach is that many temporary variables are created, and all of them must be stored in the symbol table, so it requires more memory
• To overcome this problem we can use the 2nd approach, triples
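The quadruple table for a = b * -c + b * -c can be sketched as follows. The sketch assumes the usual numbering t1 … t5 for the temporaries and the operator name minus for unary minus. The layout follows the op/arg1/arg2/result fields listed above, and the last lines count the temporaries that would need symbol-table entries.

```python
from typing import NamedTuple, Optional

class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]      # empty (None) for unary minus and copies
    result: str

# a = b * -c + b * -c, one operator per instruction (temporaries t1..t5 assumed)
quads = [
    Quad("minus", "c", None, "t1"),   # (0) t1 = -c, unary minus first
    Quad("*", "b", "t1", "t2"),       # (1) t2 = b * t1
    Quad("minus", "c", None, "t3"),   # (2) t3 = -c
    Quad("*", "b", "t3", "t4"),       # (3) t4 = b * t3
    Quad("+", "t2", "t4", "t5"),      # (4) t5 = t2 + t4
    Quad("=", "t5", None, "a"),       # (5) a = t5
]

print(f"{'':4}{'op':7}{'arg1':6}{'arg2':6}result")
for i, q in enumerate(quads):
    print(f"({i}) {q.op:7}{q.arg1:6}{q.arg2 or '':6}{q.result}")

# Each temporary in the result field needs a symbol-table entry.
temporaries = sorted({q.result for q in quads if q.result.startswith("t")})
print(len(temporaries), "temporaries:", temporaries)
```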
Triple:
• As the name suggests, each instruction contains only 3 fields
1. op
2. arg1
3. arg2
t2=b*t1 // the 2nd argument is t1; t1 is computed by the instruction at (0), so the 2nd argument is written as (0)
In triples we do not use temporary variables; instead we refer to the position of the instruction that computes the value
The major advantage of this representation is that
temporary variables are not needed,
so the instructions can be stored in less memory
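Converting quadruples to triples drops the result field and replaces every temporary with the position of the instruction that computes it. The sketch below makes that assumption and uses the expression a = b * -c + b * -c. The function quads_to_triples is a hypothetical helper.

```python
# Quadruples for a = b * -c + b * -c as (op, arg1, arg2, result)
quads = [
    ("minus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("minus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]

def quads_to_triples(quads):
    """Drop the result field: each temporary is replaced by "(i)", the
    position of the triple that computes it."""
    position = {}                                   # temporary -> "(i)"

    def ref(arg):
        return position.get(arg, arg) if arg is not None else None

    triples = []
    for i, (op, arg1, arg2, result) in enumerate(quads):
        if op == "=":
            triples.append(("=", result, ref(arg1)))    # copy: target name, value
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = f"({i})"
    return triples

for i, t in enumerate(quads_to_triples(quads)):
    print(f"({i})", *[x for x in t if x is not None])
```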
Indirect Triple:
Same as triples, but there is an extra table that contains pointers to the triples
• Since we have 5 instructions, let the 1st pointer be 100; it points to triple (0)
• (0) to (4) are the triples pointed to by these pointers
• So in indirect triples, a pointer table points to the triples, and the information of each triple is stored in the triple table
• Construct quadruples, triples and indirect triples for the statement
(a+b)*(c+d)-(a+b+c)
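One way to check the exercise is to generate the quadruples, the triples and the indirect-triple pointer table together from the expression tree. The sketch makes three assumptions. First, + and - are left associative, so a+b+c is (a+b)+c. Second, no common-subexpression elimination is done, so a+b is computed twice. Third, the pointers start at 100, as in the notes.

```python
import itertools

def build(tree):
    """Return (quads, triples, pointers) for tree = name | (op, left, right)."""
    quads, triples = [], []
    temps = itertools.count(1)

    def walk(node):
        # returns (operand as used in quads, operand as used in triples)
        if isinstance(node, str):
            return node, node
        op, left, right = node
        q1, r1 = walk(left)
        q2, r2 = walk(right)
        t = f"t{next(temps)}"
        quads.append((op, q1, q2, t))
        triples.append((op, r1, r2))
        return t, f"({len(triples) - 1})"

    walk(tree)
    # indirect triples: a list of pointers to the triples, assumed to start at 100
    pointers = {100 + i: f"({i})" for i in range(len(triples))}
    return quads, triples, pointers

expr = ("-", ("*", ("+", "a", "b"), ("+", "c", "d")),
             ("+", ("+", "a", "b"), "c"))            # (a+b)*(c+d)-(a+b+c)
quads, triples, pointers = build(expr)
for i, (q, t) in enumerate(zip(quads, triples)):
    print(f"({i}) quad {q}  triple {t}  <- pointer {100 + i}")
```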
Sequence of Declarations
• A new environment is created, and its offset is initialised to zero
• Its base address is stored on the stack
• If the data type is integer, T.type is integer; the identifier name, its type and its offset (zero for the first declaration) are all stored in the symbol table
• The symbol table maintains the data for the present environment
• top.put: using the top of the stack, which points to the current environment, we place the identifier along with its data type and offset value into the symbol table
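The offset bookkeeping can be sketched as follows. The sketch assumes the common textbook scheme top.put(id.lexeme, T.type, offset) followed by offset = offset + T.width. It also assumes widths of 4 bytes for int and 8 for float. Env and WIDTH are hypothetical names.

```python
# Assumed widths for illustration only.
WIDTH = {"int": 4, "float": 8}

class Env:
    """One environment (scope): its own symbol table and running offset."""
    def __init__(self):
        self.table = {}
        self.offset = 0                      # a new environment starts at offset 0

    def put(self, lexeme, typ):
        self.table[lexeme] = (typ, self.offset)
        self.offset += WIDTH[typ]            # the next declaration goes after this one

stack = [Env()]                              # environments are kept on a stack
top = stack[-1]
for typ, name in [("int", "x"), ("float", "y"), ("int", "z")]:
    top.put(name, typ)

print(top.table)   # {'x': ('int', 0), 'y': ('float', 4), 'z': ('int', 12)}
```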
Fields in records and classes
Translation of Expressions
In the 1st production
• We need to generate three address code for the statement
• S refers to a statement
• To generate three address code for the statement S, we first need the three address code of the expression E; then we get the details of id from the symbol table
• That is why top.get(id.lexeme) is used
• The environment details are stored on the stack
• Using the top pointer we can access the current environment from the stack
• After accessing the id details, we assign E.addr to id
• So we generate the instruction id = E.addr and concatenate it after the three address code of E: S.code = E.code || gen(top.get(id.lexeme) '=' E.addr)
In the 2nd production
• After performing the addition, the result is stored in a compiler generated temporary variable created by new Temp()
• E is the parent, and E1, E2 are its children
• So the three address code is
E.code = E1.code || E2.code || gen(E.addr '=' E1.addr '+' E2.addr)
The codes of the children are concatenated to obtain the parent's three address code.
Along with that, E1.addr and E2.addr give the values held in E1 and E2; we add these values and store the result in the parent's E.addr
• The last instruction is generated using the predefined function gen
In the 3rd production
• Here a new temporary variable is generated by new Temp() and stored in E.addr, the parent attribute
• E.code is the three address code of E1 (E1.code) concatenated with the unary minus instruction E.addr = minus E1.addr, which is generated with the predefined function gen
In the 4th production
• E.addr = E1.addr
• E.code = E1.code: whatever three address code E1 has becomes the parent's three address code
• code is the attribute in which three address code is stored
In the 5th production
• We access the identifier details from the symbol table using the get function and store them in E.addr
• The details of the present environment are on the stack
• We access the stack contents through the top pointer
• No three address code is generated here, so E.code is empty
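A Python sketch of this SDD follows, in which each call returns the pair (E.addr, E.code). It assumes a tuple encoding of expressions and uses a plain dict as a stand-in for top.get(id.lexeme). The names new_temp and gen and the temporary names t1, t2, … are illustrative.

```python
import itertools

# Expressions: "x" (id), ("+", e1, e2), ("minus", e1), ("paren", e1)
_temps = itertools.count(1)

def new_temp():
    return f"t{next(_temps)}"              # stand-in for new Temp()

def gen(*parts):
    return [" ".join(parts)]               # one three address instruction

def expr(e, top):
    """Return (E.addr, E.code)."""
    if isinstance(e, str):                 # E -> id
        return top[e], []                  # E.addr = top.get(id.lexeme); E.code = ''
    kind = e[0]
    if kind == "+":                        # E -> E1 + E2
        a1, c1 = expr(e[1], top)
        a2, c2 = expr(e[2], top)
        t = new_temp()
        return t, c1 + c2 + gen(t, "=", a1, "+", a2)
    if kind == "minus":                    # E -> - E1
        a1, c1 = expr(e[1], top)
        t = new_temp()
        return t, c1 + gen(t, "=", "minus", a1)
    if kind == "paren":                    # E -> ( E1 ): addr and code copied up
        return expr(e[1], top)
    raise ValueError(kind)

def stmt(name, e, top):                    # S -> id = E
    addr, code = expr(e, top)
    return code + gen(top[name], "=", addr)   # S.code = E.code || gen(id '=' E.addr)

top = {"a": "a", "b": "b", "c": "c"}       # hypothetical symbol table
code = stmt("a", ("+", "b", ("minus", "c")), top)   # a = b + -c
print("\n".join(code))
```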
Incremental Translations
• A new node is generated: E.addr = new Node('+', E1.addr, E2.addr)
• The op is +
• The first operand is E1.addr
• The second operand is E2.addr
• E.addr refers to the new node created for the parent
• In the 1st production
• S refers to the statement
• Here we get the identifier details of the present environment from the stack
• For that we use the top pointer, and then we generate the three address code
• To generate it, the SDT needs id = E.addr: the value of the expression is held in E and is accessed through E.addr
• In the 2nd production
• The values of E1 and E2 are held at E1.addr and E2.addr; we access them through these addresses and assign the result to the parent E.addr
• The three address code is generated with the predefined function gen
• After the addition, the result is stored in a temporary variable created by new Temp(); it is a compiler generated temporary, and its address is captured in the parent E
• In the 3rd production
• For unary minus we generate three address code whose result is stored in a new temporary variable
• The value of E1 is accessed through E1.addr
• In the 4th production
• The value of E1 is assigned to E (E.addr = E1.addr)
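In the incremental version, gen appends each instruction to a single growing instruction list. Only E.addr is passed around, and no E.code attribute is built. The sketch below assumes tuple-encoded expressions and a dict standing in for top.get(id.lexeme); all names are illustrative.

```python
import itertools

code = []                                  # the one global instruction list
_temps = itertools.count(1)

def new_temp():
    return f"t{next(_temps)}"

def gen(*parts):
    code.append(" ".join(parts))           # emit immediately, no code attribute

def expr(e, top):
    """Return E.addr; instructions are emitted as a side effect."""
    if isinstance(e, str):                 # E -> id
        return top[e]
    if e[0] == "+":                        # E -> E1 + E2
        a1, a2 = expr(e[1], top), expr(e[2], top)
        t = new_temp()
        gen(t, "=", a1, "+", a2)
        return t
    if e[0] == "minus":                    # E -> - E1
        a1 = expr(e[1], top)
        t = new_temp()
        gen(t, "=", "minus", a1)
        return t
    if e[0] == "paren":                    # E -> ( E1 )
        return expr(e[1], top)
    raise ValueError(e[0])

top = {"a": "a", "b": "b", "c": "c"}       # stand-in for top.get(id.lexeme)
# S -> id = E: the arguments (and so E's instructions) are evaluated first
gen(top["a"], "=", expr(("+", "b", ("minus", "c")), top))
print("\n".join(code))
```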
Addressing array elements
• i stands for index and w for width
• i1 is the index value for the first dimension and w1 is the width of a row (the number of columns times the element width); i2 is the index value for the second dimension and w2 is the width (size) of one element
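Assuming row-major layout and indices starting at 0, the element A[i1][i2] is at base + i1 × w1 + i2 × w2, where w1 = (number of columns) × w2. The following sketch uses made-up numbers: an int A[2][3] with 4-byte ints at base address 1000.

```python
def element_address(base, i1, i2, n2, w2):
    """Row-major address of A[i1][i2] (0-based indices).
    n2 = number of columns, w2 = element width, w1 = n2 * w2 = width of a row."""
    w1 = n2 * w2
    return base + i1 * w1 + i2 * w2

print(element_address(1000, 1, 2, 3, 4))   # 1000 + 1*12 + 2*4 = 1020
```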
Translation of array references
• In the 1st production
• The id details are retrieved from the symbol table using the top.get function
• E is the expression
• L refers to an array reference (the array name followed by its list of indices)
• L.addr gives the offset of the referenced element within the array
• The base address comes from the base attribute of L.array; L.array holds the details of the array retrieved from the symbol table
• Whatever value is stored in E is again retrieved through its addr attribute
• In the 2nd production
• The values of E1 and E2 are added, and the result is placed in the parent's E.addr, where addr is an attribute
• A new temporary variable is created to hold E's address
• In the 3rd production
• The details of id are retrieved from the symbol table with the top.get function and stored in E.addr
• In the 4th production
• To retrieve an element of the array we need its offset L.addr and the base address of the array
• L.array gives the array details from the symbol table
• Then three address code is generated using the gen function
• In the 5th production
• For an array name, we get the identifier details from the symbol table using the top.get function
• L.type is the type of the elements of the array
• The offset is calculated by multiplying E.addr by the width of each element
• Three address code is generated using the gen function
• In the 6th production
• L.array takes the value of L1.array, i.e. the array details retrieved from the symbol table are passed up to the parent
• The offset is calculated and used to find the exact location of the element in the array
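The following sketch shows the code these rules generate for c + a[i][j]. It assumes a is declared int a[2][3] with 4-byte ints, so a row is 12 bytes, and that the indices are plain names. ARRAYS is a hypothetical stand-in for the symbol-table entry of a.

```python
import itertools

# Hypothetical symbol-table entry: width of one element at each level of a
# (a row of int a[2][3] is 3 * 4 = 12 bytes, one int is 4 bytes).
ARRAYS = {"a": [12, 4]}

code = []
_temps = itertools.count(1)

def new_temp():
    return f"t{next(_temps)}"

def gen(line):
    code.append(line)

def array_ref(name, indices):
    """L -> id [E], then L -> L1 [E] for each further index; returns L.addr."""
    widths = ARRAYS[name]
    addr = new_temp()                                # L -> id [E]
    gen(f"{addr} = {indices[0]} * {widths[0]}")      # L.addr = E.addr * width
    for idx, w in zip(indices[1:], widths[1:]):      # L -> L1 [E]
        t = new_temp()
        gen(f"{t} = {idx} * {w}")
        new_addr = new_temp()
        gen(f"{new_addr} = {addr} + {t}")            # L.addr = L1.addr + t
        addr = new_addr
    return addr

offset = array_ref("a", ["i", "j"])
t = new_temp()
gen(f"{t} = a [ {offset} ]")            # E -> L: fetch from base + offset
result = new_temp()
gen(f"{result} = c + {t}")              # E -> E1 + E2
print("\n".join(code))
```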
Type Checking
• While compiling a program, the compiler has to follow the type rules of the language
• Each language has its own type rules
• The information about data types is maintained and computed by the compiler
• The type checker is the module of a compiler that performs the type checking tasks
• Examples of these tasks:
• indexing is allowed only on an array (indexing is used only for arrays, to access array elements)
• an index must be present
• values of integer/float type must be within the range of that type
• Type checking may be either static or dynamic
• Static type checking is done at compile time (C, Pascal); it checks the correctness of a program before its execution
• Dynamic type checking is done at run time
• The main purpose of type checking is to verify, before execution, that the program uses types correctly
• Static type checking is also useful to determine the amount of memory needed to store the variables
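Static checking can be illustrated with a tiny checker over a hypothetical expression language. It enforces the rule "indexing is allowed only on an array". The integer-index rule and the no-conversion rule for + are assumptions added for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Array:
    elem: object        # element type: "int", "float" or another Array
    size: int

def type_of(expr, env):
    """Return the type of expr or raise TypeError, before any execution.
    expr is a name, ("num", n), ("index", array_expr, index_expr) or ("+", e1, e2)."""
    if isinstance(expr, str):
        return env[expr]                      # declared type from the symbol table
    tag = expr[0]
    if tag == "num":
        return "int"
    if tag == "index":
        base, idx = type_of(expr[1], env), type_of(expr[2], env)
        if not isinstance(base, Array):
            raise TypeError("indexing is allowed only on an array")
        if idx != "int":                      # assumed rule: integer indices only
            raise TypeError("array index must be an integer")
        return base.elem
    if tag == "+":
        t1, t2 = type_of(expr[1], env), type_of(expr[2], env)
        if t1 != t2 or t1 not in ("int", "float"):   # assumed: no implicit conversion
            raise TypeError(f"cannot add {t1} and {t2}")
        return t1
    raise ValueError(f"unknown expression {tag!r}")

env = {"a": Array("int", 10), "i": "int", "x": "float"}
print(type_of(("+", ("index", "a", "i"), ("num", 1)), env))   # int
try:
    type_of(("index", "x", "i"), env)
except TypeError as err:
    print("type error:", err)
```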
The design of type checker depends on
• syntactic structure of language constructs
• the type expressions of the language
• the semantic rules for assigning types to constructs
Position of a type checker
• By examining the syntax tree, the type checker checks whether each variable is used with the correct data type
• If any modifications are needed, it makes them and produces its result in the form of a syntax tree
Control Flow
Control flow is also known as
• Intermediate code for flow of control statements
• SDT of flow of control statements into three address code
• SDD of flow of control statements into three address code
• Using control statements, we can transfer the flow of control from one statement of the program to another
• Examples of control statements are:
1. simple if
2. if else, nested if else
3. while, switch, etc.
Here we write IC for 3 control statements:
1. simple if
2. if else
3. while loop
So we translate these 3 control statements into IC, i.e. we write their SDD/SDT
Simple if
• Let the statement be S-> if(B) then S1 // S, S1 are statements; B is the condition, which is a Boolean expression
• If the expression is true, S1 is executed // if it is false, S1 is not executed
Now we write the code for the simple if statement
• S-> if(B) then S1 is our production
• First the condition is evaluated by B.code // B.code jumps to label B.true if the condition is true and to B.false if it is false
• If the condition is true, control goes to label B.true
• There S1 is executed, so the code for S1 is S1.code
• If the condition is false, control goes to S1.next/S.next (both are the same)
• When the condition is true, the body of the if is executed and then control leaves the if
• The statement after the if is S.next
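The rules for the simple if can be sketched as code generation over lists of instructions. The helper bool_code is a hypothetical stand-in for B.code of a single relational condition, and labels are named L1, L2, … by assumption. The sketch follows B.true = newlabel(), B.false = S1.next = S.next, S.code = B.code || label(B.true) || S1.code.

```python
import itertools

_labels = itertools.count(1)

def newlabel():
    return f"L{next(_labels)}"           # fresh label name (naming is assumed)

def label(l):
    return [f"{l}:"]

def bool_code(cond, true_label, false_label):
    """Hypothetical B.code for one relational condition."""
    return [f"if {cond} goto {true_label}", f"goto {false_label}"]

def if_then(cond, s1_code, s_next):
    b_true = newlabel()                  # B.true = newlabel()
    b_false = s_next                     # B.false = S1.next = S.next
    # S.code = B.code || label(B.true) || S1.code
    return bool_code(cond, b_true, b_false) + label(b_true) + s1_code

s_next = newlabel()                      # label of the statement after the if
code = if_then("x < 100", ["x = 0"], s_next) + label(s_next)
print("\n".join(code))
```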
If else
• Let the production be S-> if(B) then S1 else S2
Code:
• First the condition is evaluated with B.code, which yields either T or F
• If it is true, control goes to label B.true; if it is false, to label B.false
• If true, control goes to B.true and S1.code is executed
• Then control leaves the if else, so the next statement is S.next
• After S1 we go to S.next without checking any condition (goto S.next)
• Executing goto S.next skips the false block, and control goes directly to the statement after S2
• If B.code yields false, control goes to label B.false and S2.code is executed
• In this block there is no need to write goto S.next, because the statement after S2 is S.next itself
Semantic Rules
• If the condition is true, control goes to B.true
• B.true=newlabel() // newlabel() creates a new label
• S1.next=S.next // after executing S1.code, the next statement is S.next
• If the condition is false, control goes to B.false
• B.false=newlabel() // another new label
• S2.next=S.next
• Code
S.code=B.code||label(B.true)||S1.code||gen(‘goto’ S.next)||
label(B.false)||S2.code
//1st the code for the Boolean expression
//if true, the label B.true is placed
//then S1.code is executed
//then control goes to S.next, through the goto generated with gen
//if the condition is false, control goes to B.false, so we place a label for false
//next the code for S2
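The if-else rules can be sketched as code generation over lists of instructions. The helpers newlabel, label, gen and bool_code are hypothetical, and bool_code stands in for B.code of a single relational condition.

```python
import itertools

_labels = itertools.count(1)

def newlabel():
    return f"L{next(_labels)}"

def label(l):
    return [f"{l}:"]

def gen(*parts):
    return [" ".join(parts)]

def bool_code(cond, true_label, false_label):
    """Hypothetical B.code for one relational condition."""
    return [f"if {cond} goto {true_label}", f"goto {false_label}"]

def if_else(cond, s1_code, s2_code, s_next):
    b_true, b_false = newlabel(), newlabel()     # B.true, B.false
    # S1.next = S2.next = S.next
    # S.code = B.code || label(B.true) || S1.code || gen('goto' S.next)
    #          || label(B.false) || S2.code
    return (bool_code(cond, b_true, b_false) + label(b_true) + s1_code
            + gen("goto", s_next) + label(b_false) + s2_code)

s_next = newlabel()
code = if_else("a < b", ["x = a"], ["x = b"], s_next) + label(s_next)
print("\n".join(code))
```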
While
• Let the production be S->while(B) S1
• If the condition is true, S1 is executed
• 1st the condition is evaluated with B.code, which yields either T or F
• If the condition is true, control goes to the true label and S1.code is executed
• Then the condition must be checked once again; if it is still true, the body is executed again // the body is executed as long as the condition is true
• So after the body we go back to B.code; the label placed before B.code is named begin
• When the condition is false, control goes to S.next
Semantic Rules
• First create a label for begin
• begin=newlabel()
• Next B.code is executed; if the condition is true:
• B.true=newlabel()
• S1.next=begin
• B.false=S.next // if the condition is false
• Code
S.code=label(begin)||B.code||label(B.true)||S1.code||gen(‘goto’ begin)
//a label is created for begin
//the condition B is evaluated
//if true, the label B.true is placed
//next S1
//then goto begin without checking any condition
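The while rules can be sketched with the hypothetical helpers newlabel, label, gen and bool_code, where bool_code stands in for B.code of one relational condition.

```python
import itertools

_labels = itertools.count(1)

def newlabel():
    return f"L{next(_labels)}"

def label(l):
    return [f"{l}:"]

def gen(*parts):
    return [" ".join(parts)]

def bool_code(cond, true_label, false_label):
    """Hypothetical B.code for one relational condition."""
    return [f"if {cond} goto {true_label}", f"goto {false_label}"]

def while_loop(cond, s1_code, s_next):
    begin = newlabel()                   # begin = newlabel()
    b_true = newlabel()                  # B.true = newlabel()
    # B.false = S.next; S1.next = begin
    # S.code = label(begin) || B.code || label(B.true) || S1.code || gen('goto' begin)
    return (label(begin) + bool_code(cond, b_true, s_next) + label(b_true)
            + s1_code + gen("goto", begin))

s_next = newlabel()
code = while_loop("i < 10", ["i = i + 1"], s_next) + label(s_next)
print("\n".join(code))
```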
Backpatching
• Consider an example
x<100 || y>200 && x!=y
• Let x<100 reside at instruction 100. If x<100 is true, there is no need to evaluate the rest of the expression, but we do not yet know the label (address) of the true exit, so we leave its target blank
• The next instruction, at 101, is executed when x<100 is false. Its target should be y>200, but we do not yet know the address of that instruction either, so we leave it blank
• If x<100 is false, we must evaluate y>200; let it reside at 102
• If y>200 is true, we have to evaluate x!=y, but we do not yet know its address, so we leave a blank
• If y>200 is false, the entire result is false because of &&, but we do not know where the false exit resides
• So the next instruction, 103, is goto ----- // the blank is the label/address of the false exit
• If y>200 is true, we evaluate x!=y; if x!=y is also true, the entire result is true // we do not know where true resides, so we leave a blank
• If x!=y is false, the entire result is false // we do not know where false resides, so we leave a blank
• 106 is the true exit
• 107 is the false exit
• In the process of IC generation, we may not know all the labels at the time a jump is generated
• Instead of making a 2nd pass over the code, we leave these targets blank and fill them in as soon as they become known; this is called backpatching
• Backpatching is the process of filling in the missing labels
• If x<100 is true, there is no need to evaluate the rest of the expression; this kind of evaluation is known as short circuit evaluation
• If x<100 is false, we evaluate y>200, which resides at 102
• If y>200 is true, we evaluate x!=y, whose address is 104
Intermediate Code Generation
100: if x<100 then -------
101: goto --------
102: if y>200 then -------
103: goto -----------
104: if x!=y then ----------
105: goto ---------
106: true
107: false
• When the targets become known, the missing labels are filled in:
100: if x<100 then 106
101: goto 102
102: if y>200 then 104
103: goto 107
104: if x!=y then 106
105: goto 107
106: true
107: false
• Now we write the translation rules for
x<100 || y>200 && x!=y
• Here we have 3 operators: ||, && and !
• Here M is a marker NT; it derives ε
• M gives the address of B2 (M.instr is the index of B2's first instruction) //B, B1, B2 are Boolean expressions
• To write the translation rules we use 3 things:
1. backpatch() // takes 2 arguments
2. B.truelist // list of addresses of the jumps taken when the expression is true (100, 102, 104)
3. B.falselist // list of addresses of the jumps taken when the expression is false
• B -> B1 || M B2
{
backpatch(B1.falselist, M.instr);
B.truelist = merge(B1.truelist, B2.truelist);
B.falselist = B2.falselist; // B1.falselist was backpatched to B2, so it is not included
}
//if B1 is true, there is no need to evaluate the next expression, so B1's true jumps go into B.truelist; B2's true jumps go there too
//so merge(B1.truelist, B2.truelist) gives B.truelist
//if B1 is false, we have to evaluate the next expression B2; if B2 is false, B is false
//for backpatch: only if B1 is false do we evaluate B2, so B1.falselist is backpatched
//to evaluate B2 we use M: M.instr gives the address of B2
• B -> B1 && M B2
{
backpatch(B1.truelist, M.instr);
B.truelist = B2.truelist;
B.falselist = merge(B1.falselist, B2.falselist);
}
//only if B1 is true do we evaluate B2
//so backpatching is needed when B1 is true
//B1.truelist is backpatched with M.instr
• B -> !B1
{
B.truelist = B1.falselist;
B.falselist = B1.truelist;
}
• B -> (B1)
{
B.truelist = B1.truelist; // assignment
B.falselist = B1.falselist;
}
• M -> ε
{
M.instr = nextinstr;
}
• Now construct the annotated parse tree for x<100 || y>200 && x!=y
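The rules can be checked against the listing for x<100 || y>200 && x!=y with a one-pass sketch. The sketch rests on these assumptions:

- It uses the standard rule B -> id1 relop id2 { B.truelist = makelist(nextinstr); B.falselist = makelist(nextinstr+1); emit('if … goto _'); emit('goto _') }.
- Instructions are numbered from 100.
- The true and false exits are placed at 106 and 107, as in the listing.
- B2 is passed as a function, so that M.instr is read before B2's code is emitted.

The class and method names are illustrative.

```python
class Backpatcher:
    """One-pass code generation for Boolean expressions using backpatching.
    Jump targets start out blank (None) and are filled in by backpatch()."""

    def __init__(self, start=100):
        self.code = {}                 # index -> [instruction text, target or None]
        self.nextinstr = start

    def emit(self, text):
        self.code[self.nextinstr] = [text, None]
        self.nextinstr += 1

    @staticmethod
    def makelist(i):
        return [i]                     # new list holding one instruction index

    @staticmethod
    def merge(l1, l2):
        return l1 + l2                 # concatenation of two lists

    def backpatch(self, lst, target):
        for i in lst:
            self.code[i][1] = target   # fill the blank target of every jump in lst

    def rel(self, cond):
        """B -> id1 relop id2: truelist = [nextinstr], falselist = [nextinstr + 1]."""
        truelist = self.makelist(self.nextinstr)
        falselist = self.makelist(self.nextinstr + 1)
        self.emit(f"if {cond} goto")
        self.emit("goto")
        return truelist, falselist

    def or_(self, b1, make_b2):
        """B -> B1 || M B2; make_b2 generates B2 after M.instr is recorded."""
        t1, f1 = b1
        m_instr = self.nextinstr       # M -> ε { M.instr = nextinstr }
        t2, f2 = make_b2()
        self.backpatch(f1, m_instr)
        return self.merge(t1, t2), f2

    def and_(self, b1, make_b2):
        """B -> B1 && M B2."""
        t1, f1 = b1
        m_instr = self.nextinstr
        t2, f2 = make_b2()
        self.backpatch(t1, m_instr)
        return t2, self.merge(f1, f2)

bp = Backpatcher()
# && binds tighter than ||, so this is x<100 || (y>200 && x!=y)
truelist, falselist = bp.or_(
    bp.rel("x<100"),
    lambda: bp.and_(bp.rel("y>200"), lambda: bp.rel("x!=y")),
)
bp.backpatch(truelist, bp.nextinstr)    # true exit placed at 106 (assumed)
bp.emit("true")
bp.backpatch(falselist, bp.nextinstr)   # false exit placed at 107 (assumed)
bp.emit("false")

listing = [f"{i}: {text}" + (f" {target}" if target is not None else "")
           for i, (text, target) in bp.code.items()]
print("\n".join(listing))
```

The printed listing reproduces the filled-in code above, with goto in place of then.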
Functions of backpatching
• makelist(i) – creates a new list containing only i, the index of an instruction
• merge(L1, L2) – concatenates the lists L1 and L2 and returns the result
• backpatch(L, label) – L is a list of three address jump statements and label is the target label; backpatch fills in label as the target of every statement in L
• nextquad – gives the index of the next quadruple
SDT for boolean expression using backpatching
• In first production
• One more marker non terminal M is used. A non terminal that derives epsilon is called a marker non terminal: it generates nothing, but some semantic action must be performed
• Only if E1 is false do we need to evaluate E2; here M holds the index/address of E2
• truelist and falselist are attributes attached to the non terminal E
• If either E1 or E2 is true, E is true
• The merge function concatenates the two truelists
• Only if E1 is false do we evaluate E2; if E2 is also false, E is false
• In the backpatch function, the list is E1.falselist (the false case of E1) and the label is M.quad
• Once M.quad is known, this label is filled into all the statements in E1.falselist
• In the 2nd production
• If E1 is true, we check whether E2 is true; only if both are true is E true
• E.falselist: E is false if either E1 or E2 is false
• In the 5th production
• gen is used to generate the three address statements, but before that truelist and falselist must be created
• makelist creates a list with the index nextquad; at that index the three address statement for the true case is generated
• For E.falselist another list is created with nextquad+1
• nextquad gives the address of the next quadruple
• At that location the false statement is generated
Intermediate Code for Procedures
• A procedure is a function
• D – function/procedure definition
• T – return type/data type
• F – formal parameters
• S – statements
• E – expression
• A – actual parameters
• In the first line
• Formal parameters are the parameters declared in the definition of the called function
• Here the formal parameter list may be empty or may contain parameters with their data types
• In the first line id is the function name, whereas in the second line id is a variable name
• The body must be written within the curly braces
• Example headers:
float add( )
float add(int a)
float add(int a, float b)
{
return add() // id is the function name
}
Actual parameters are the parameters passed by the calling function
If we call a function from within main, main is the calling function