
UNIT-IV
SEMANTIC ANALYSIS
1. SDDs and SDTs
SDD: extends a context-free grammar (CFG) by associating semantic rules with the productions; each rule specifies the value of an attribute.
SDT scheme: embeds program fragments (also called semantic actions) within production bodies. The position of an action within the body defines the point at which it is executed (in the middle of the production or at the end). An SDT is a set of productions with semantic rules embedded within them.
SDD is easier to read; easy for specification.
SDT scheme – can be more efficient; easy for implementation
SDD is a CFG along with attributes and rules. An attribute is associated with grammar symbols (such a grammar is called an attribute grammar); rules are associated with productions.
Evaluation: 1. Construct a parse tree. 2. Compute the values of the attributes at the nodes of the tree by visiting the tree.
Key: we do not need to build a parse tree all the time; translation can be done during parsing, using the classes of SDTs called "L-attributed translations" and "S-attributed translations".
Attributes: An attribute is any quantity associated with a programming construct. Examples: data types, line numbers, instruction details.
Two kinds of attributes, for a non-terminal A at a parse tree node N:
 A synthesized attribute: defined by a semantic rule associated with the production at N, and defined only in terms of attribute values at the children of N and at N itself. In other words, an attribute whose value for a non-terminal A is derived from the attributes of its children (or of A itself) is called a synthesized attribute.

Fig: synthesized attributes (SDD and the corresponding parse tree)
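Since a synthesized attribute depends only on the children, all synthesized attributes can be computed in one post-order (bottom-up) traversal of the parse tree. The following Python sketch is illustrative only; the Node class and the toy grammar E → E + E | digit are assumptions, not part of these notes.

class Node:
    def __init__(self, label, children=None, lexval=None):
        self.label = label              # grammar symbol, e.g. 'E', 'digit', or a terminal
        self.children = children or []
        self.lexval = lexval            # lexical value for leaves
        self.val = None                 # the synthesized attribute

def eval_val(node):
    for child in node.children:         # post-order: evaluate the children first
        eval_val(child)
    if node.label == 'digit':
        node.val = node.lexval                                    # E -> digit : E.val = digit.lexval
    elif node.label == 'E':
        if len(node.children) == 1:                               # E -> digit
            node.val = node.children[0].val
        else:                                                     # E -> E1 '+' E2
            node.val = node.children[0].val + node.children[2].val

# parse tree for 3 + 4
tree = Node('E', [Node('E', [Node('digit', lexval=3)]),
                  Node('+'),
                  Node('E', [Node('digit', lexval=4)])])
eval_val(tree)
print(tree.val)   # 7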
 An inherited attribute: defined by a semantic rule associated with the production at the parent of N, and defined only in terms of attribute values at the parent of N, the siblings of N, and N itself. In other words, an attribute whose value for a non-terminal A is derived from the attributes of its parent or siblings is called an inherited attribute.

Fig: inherited attributes (SDD and the corresponding parse tree)

Dependencies between attributes:
Attribute values are computed from constants and other attributes.
 Synthesized attribute – value computed from the children.
 Inherited attribute – value computed from the siblings and parent.
Key notion: the induced dependency graph.
Attribute dependency graph:
The dependency graph must be acyclic. Evaluation order: topologically sort the dependency graph to order the attributes; using this order, evaluate the rules. The order depends on both the grammar and the input string.
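A minimal sketch of this evaluation strategy, assuming the dependency graph is given as a list of edges (a, b) meaning "attribute b depends on attribute a" and each semantic rule is a small function; all names here are illustrative:

from collections import defaultdict, deque

def topological_order(edges, nodes):
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("dependency graph is cyclic; no evaluation order exists")
    return order

def evaluate(rules, edges):
    # rules: attribute name -> function(values dict) computing that attribute
    values = {}
    for attr in topological_order(edges, list(rules)):
        values[attr] = rules[attr](values)
    return values

# Example: T.val depends on F.val; E.val depends on T.val
rules = {
    "F.val": lambda v: 5,
    "T.val": lambda v: v["F.val"] * 2,
    "E.val": lambda v: v["T.val"] + 1,
}
edges = [("F.val", "T.val"), ("T.val", "E.val")]
print(evaluate(rules, edges))   # {'F.val': 5, 'T.val': 10, 'E.val': 11}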

Fig: Example

2. Evaluation Strategies
Parse-tree methods (dynamic): 1. Build the parse tree. 2. Build the dependency graph. 3. Topologically sort the graph. 4. Evaluate it (this fails if the graph is cyclic).
Q: What if the graph has cycles?
It is hard to tell, for a given grammar, whether there exists any parse tree whose dependency graph has cycles.
Top-Down (LL)
L-attributed grammar:
Informally – dependency-graph edges may go from left to right, but not the other way around.
Given a production A → X1 X2 ···Xn:
Inherited attributes of Xj depend only on:
Inherited attributes of A
Arbitrary attributes of X1, X2, ···, Xj−1.
Synthesized attributes of A depend only on its inherited attributes and arbitrary RHS attributes.
Synthesized attributes of an action depend only on its inherited attributes. The evaluation order is therefore: Inh(A), Inh(X1), Syn(X1), . . . , Inh(Xn), Syn(Xn), Syn(A).
This is precisely the order of evaluation for an LL parser.
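In a recursive-descent (LL) parser this order falls out naturally: inherited attributes are passed down as parameters and synthesized attributes are returned. A hedged sketch for the classic left-factored grammar T → F T', T' → * F T' | ε (the token handling and helper names are assumptions):

# Sketch: L-attributed evaluation in a recursive-descent parser.
# T  -> F T'      : T'.inh = F.val ;              T.val  = T'.syn
# T' -> * F T'1   : T'1.inh = T'.inh * F.val ;    T'.syn = T'1.syn
# T' -> epsilon   : T'.syn = T'.inh
# F  -> digit     : F.val = digit.lexval

def parse_T(tokens):
    f_val, rest = parse_F(tokens)
    return parse_Tprime(f_val, rest)        # inherited attribute passed down as a parameter

def parse_Tprime(inh, tokens):
    if tokens and tokens[0] == '*':
        f_val, rest = parse_F(tokens[1:])
        return parse_Tprime(inh * f_val, rest)
    return inh, tokens                      # epsilon: synthesized value = inherited value

def parse_F(tokens):
    return int(tokens[0]), tokens[1:]       # F -> digit

val, _ = parse_T(['3', '*', '5', '*', '2'])
print(val)   # 30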
Bottom-Up (LR)
S-attributed grammar:
L-attributed, with only synthesized attributes for non-terminals and actions at the far right of a RHS. An S-attributed SDD can be evaluated in one bottom-up (LR) pass.
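A hedged sketch of how an LR parser might evaluate an S-attributed SDD: synthesized attributes sit on a value stack parallel to the state stack, and each reduction pops the attributes of the body and pushes the attribute of the head. The encoding below is an assumption for illustration:

# Sketch: S-attributed evaluation on an LR value stack.
def reduce_action(value_stack, k, rule):
    # On a reduction A -> X1 ... Xk, the semantic rule sees the top k values.
    args = value_stack[-k:] if k else []
    del value_stack[len(value_stack) - k:]
    value_stack.append(rule(args))          # push A's synthesized attribute

# Example reduction for E -> E + T, using values already on the stack:
stack = [7, '+', 5]                         # attributes of E, '+', T
reduce_action(stack, 3, lambda a: a[0] + a[2])
print(stack)                                # [12]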
3. Introduction to ICG (Intermediate Code Generation)
The front end translates a source program into an intermediate representation from which the back
end generates target code.
Benefits of using a machine-independent intermediate form are:
1. Retargeting is facilitated. That is, a compiler for a different machine can be created by
attaching a back end for the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate representation.

Fig: Position of the intermediate code generator (parser → static checker → intermediate code generator → intermediate code → code generator)


Intermediate Codes:
There are three ways of representing intermediate code:
 Syntax tree
 Postfix notation
 Three address code
The semantic rules for generating three-address code from common programming language
constructs are similar to those for constructing syntax trees or for generating postfix notation.
Graphical Representations (Syntax tree):
A syntax tree depicts the natural hierarchical structure of a source program. A dag (Directed
Acyclic Graph) gives the same information but in a more compact way because common sub-expressions
are identified. A syntax tree and dag for the assignment statement a : = b * - c + b * - c are as follows:

Fig: Syntax tree and dag for the assignment a : = b * - c + b * - c


Postfix notation:
Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of the tree
in which a node appears immediately after its children. The postfix notation for the syntax tree given
above is
a b c uminus * b c uminus * + assign
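The postfix string can be produced by listing each node immediately after its children, i.e. a post-order traversal. A small illustrative sketch (the Node encoding is an assumption):

# Sketch: postfix notation via a post-order traversal of the syntax tree.
class Node:
    def __init__(self, op, *children):
        self.op = op
        self.children = children

def postfix(node):
    parts = [postfix(c) for c in node.children]
    parts.append(node.op)                    # node appears right after its children
    return ' '.join(parts)

# syntax tree for  a := b * -c + b * -c
t = Node('assign', Node('a'),
         Node('+', Node('*', Node('b'), Node('uminus', Node('c'))),
                   Node('*', Node('b'), Node('uminus', Node('c')))))
print(postfix(t))   # a b c uminus * b c uminus * + assign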
Syntax-directed definition:
Syntax trees for assignment statements are produced by the syntax-directed definition. Non-
terminal S generates an assignment statement. The two binary operators + and * are examples of the
full operator set in a typical language. Operator associativities and precedences are the usual ones,
even though they have not been put into the grammar. This definition constructs the tree from the
input a : = b * - c + b* - c.

PRODUCTION SEMANTIC RULE

S  id : = E E.nptr handled below; S.nptr := mknode('assign', mkleaf(id, id.place), E.nptr)

E  E1 + E2 E.nptr := mknode('+', E1.nptr, E2.nptr)

E  E1 * E2 E.nptr := mknode('*', E1.nptr, E2.nptr)

E  - E1 E.nptr := mknode('uminus', E1.nptr)

E  ( E1 ) E.nptr := E1.nptr

E  id E.nptr := mkleaf(id, id.place)

Syntax-directed definition to produce syntax trees for assignment statements
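A minimal sketch of how mknode and mkleaf might be realised (the dictionary-based node layout is an assumption; the notes themselves leave the record layout open):

# Sketch: mknode / mkleaf as used by the syntax-tree SDD above.
def mkleaf(label, place):
    # leaf node: token label plus a pointer into the symbol table
    return {'op': label, 'place': place}

def mknode(op, left, right=None):
    # interior node: operator plus pointers to its children
    node = {'op': op, 'left': left}
    if right is not None:
        node['right'] = right
    return node

# Building the tree for  a := b * -c  (rules E -> E1 * E2 and E -> - E1):
b = mkleaf('id', 'b')
c = mkleaf('id', 'c')
tree = mknode('assign', mkleaf('id', 'a'), mknode('*', b, mknode('uminus', c)))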


The token id has an attribute place that points to the symbol-table entry for the identifier. A
symbol-table entry can be found from an attribute id.name, representing the lexeme associated with that
occurrence of id. If the lexical analyzer holds all lexemes in a single array of characters, then attribute
name might be the index of the first character of the lexeme.
Two representations of the syntax tree are as follows. In (a) each node is represented as a record
with a field for its operator and additional fields for pointers to its children. In (b), nodes are allocated
from an array of records and the index or position of the node serves as the pointer to the node. All the
nodes in the syntax tree can be visited by following pointers, starting from the root at position 10.

Fig: Two representations of the syntax tree


4. Three Address codes
Three-address code is a sequence of statements of the general form
x : = y op z
Where x, y and z are names, constants, or compiler-generated temporaries; op stands for any
operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on boolean-valued
data. Thus a source language expression like x + y*z might be translated into the sequence
t1 := y * z
t2 := x + t1
where t1 and t2 are compiler-generated temporary names.
Advantages of three-address code:
 The unravelling of complicated arithmetic expressions and of nested flow-of-control statements
makes three-address code desirable for target code generation and optimization.
 The use of names for the intermediate values computed by a program allows three-address code to
be easily rearranged – unlike postfix notation.
Three-address code is a linearized representation of a syntax tree or a dag in which explicit
names correspond to the interior nodes of the graph. The syntax tree and dag are represented by the three-
address code sequences. Variable names can appear directly in three-address statements.
Three-address code corresponding to the syntax tree and dag given above

(a) Code for the syntax tree:
t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5

(b) Code for the dag:
t1 := -c
t2 := b * t1
t5 := t2 + t2
a := t5
The reason for the term “three-address code” is that each statement usually contains three
addresses, two for the operands and one for the result.
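A minimal sketch of how such statements might be held inside a compiler, assuming a simple (op, arg1, arg2, result) tuple layout and a newtemp counter (both are illustrative choices):

# Sketch: three-address statements as (op, arg1, arg2, result) tuples.
temp_count = 0
def newtemp():
    global temp_count
    temp_count += 1
    return f"t{temp_count}"

code = []
def emit(op, arg1, arg2, result):
    code.append((op, arg1, arg2, result))

# Translating the expression  x + y * z :
t1 = newtemp(); emit('*', 'y', 'z', t1)     # t1 := y * z
t2 = newtemp(); emit('+', 'x', t1, t2)      # t2 := x + t1
for op, a1, a2, res in code:
    print(f"{res} := {a1} {op} {a2}")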
Types of Three-Address Statements:
The common three-address statements are:
1. Assignment statements of the form x : = y op z, where op is a binary arithmetic or logical operation.
2. Assignment instructions of the form x : = op y, where op is a unary operation. Essential unary
operations include unary minus, logical negation, shift operators, and conversion operators that, for
example, convert a fixed-point number to a floating-point number.
3. Copy statements of the form x : = y where the value of y is assigned to x.
4. The unconditional jump goto L. The three-address statement with label L is the next to be executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator (
<, =, >=, etc. ) to x and y, and executes the statement with label L next if x stands in relation
relop to y. If not, the three-address statement following if x relop y goto L is executed next, as in the
usual sequence.
6. param x and call p, n for procedure calls, and return y, where y (representing a returned value) is optional. For example,
param x1
param x2
...
param xn
call p, n
generated as part of a call of the procedure p(x1, x2, ...., xn).
7. Indexed assignments of the form x : = y[i] and x[i] : = y.
8. Address and pointer assignments of the form x : = &y , x : = *y, and *x : = y.
Syntax-Directed Translation into Three-Address Code:
When three-address code is generated, temporary names are made up for the interior nodes of a
syntax tree. For example, id : = E consists of code to evaluate E into some temporary t, followed by the
assignment id.place : = t.
Given input a : = b * - c + b * - c, the three-address code is as shown above. The synthesized
attribute S.code represents the three-address code for the assignment S.
The nonterminal E has two attributes :
1. E.place, the name that will hold the value of E , and
2. E.code, the sequence of three-address statements evaluating E.
PRODUCTION SEMANTIC RULES

S  id : = E S.code : = E.code || gen(id.place ‘:=’ E.place)

E  E1 + E2 E.place := newtemp;
E.code := E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘+’ E2.place)

E  E1 * E2 E.place := newtemp;
E.code := E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘*’ E2.place)

E  - E1 E.place := newtemp;
E.code := E1.code || gen(E.place ‘:=’ ‘uminus’ E1.place)

E  ( E1 ) E.place : = E1.place;
E.code : = E1.code

E  id E.place : = id.place;
E.code : = ‘ ‘
Table: Syntax-directed definition to produce three-address code for assignments
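A hedged sketch of these semantic rules as a post-order walk over an expression tree; the tuple encoding of expressions and the newtemp and gen helpers are assumptions, and the '||' concatenation of code becomes list concatenation:

# Sketch: E.place / E.code computed by a post-order walk (S-attributed).
import itertools
_tmp = itertools.count(1)
def newtemp():
    return f"t{next(_tmp)}"

def gen(*parts):
    return ' '.join(str(p) for p in parts)

def translate(node):
    # node: ('id', name) | ('uminus', e) | (op, e1, e2) for op in '+', '*'
    if node[0] == 'id':
        return node[1], []                              # E.place := id.place ; E.code := ''
    if node[0] == 'uminus':
        place, code = translate(node[1])
        t = newtemp()
        return t, code + [gen(t, ':=', 'uminus', place)]
    op, left, right = node
    p1, c1 = translate(left)
    p2, c2 = translate(right)
    t = newtemp()
    return t, c1 + c2 + [gen(t, ':=', p1, op, p2)]

# S -> id := E :  S.code := E.code || gen(id.place ':=' E.place)
expr = ('+', ('*', ('id', 'b'), ('uminus', ('id', 'c'))),
             ('*', ('id', 'b'), ('uminus', ('id', 'c'))))
place, code = translate(expr)
for line in code + [gen('a', ':=', place)]:
    print(line)    # prints the five statements of figure (a) followed by a := t5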
Semantic rules generating code for a while statement

PRODUCTION SEMANTIC RULES 


S  while E do S1 S.begin := newlabel;
S.after := newlabel;
S.code := gen(S.begin ‘:’) ||
E.code ||
gen ( ‘if’ E.place ‘=’ ‘0’ ‘goto’ S.after)||
S1.code ||
gen ( ‘goto’ S.begin) ||
gen ( S.after ‘:’)
The function newtemp returns a sequence of distinct names t1,t2,….. in response to successive
calls.
 Notation gen(x ‘:=’ y ‘+’ z) is used to represent the three-address statement x := y + z. Expressions appearing in place of variables like x, y and z are evaluated when passed to gen, and quoted operators or operands, like ‘+’, are taken literally.
 Flow-of-control statements can be added to the language of assignments. The code for S  while E do S1 is generated using new attributes S.begin and S.after to mark the first statement in the code for E and the statement following the code for S, respectively.
 The function newlabel returns a new label every time it is called.
 We assume that a non-zero expression represents true; that is when the value of E becomes zero,
control leaves the while statement.
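A hedged sketch of the while rule above, assuming newlabel is a simple counter and that the code for E and S1 has already been generated as lists of strings:

# Sketch: code generation for  S -> while E do S1  following the rule above.
import itertools
_lab = itertools.count(1)
def newlabel():
    return f"L{next(_lab)}"

def gen_while(e_place, e_code, s1_code):
    begin, after = newlabel(), newlabel()           # S.begin, S.after
    return ([f"{begin}:"] + e_code +
            [f"if {e_place} = 0 goto {after}"] +
            s1_code +
            [f"goto {begin}", f"{after}:"])

# Example:  while i < n do i := i + 1   (condition code assumed already generated)
code = gen_while('t1', ['t1 := i < n'], ['i := i + 1'])
print('\n'.join(code))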
Implementation of Three-Address Statements:
A three-address statement is an abstract form of intermediate code. In a compiler, these
statements can be implemented as records with fields for the operator and the operands. Three such
representations are:
 Quadruples
 Triples
 Indirect triples
a) Quadruples:
 A quadruple is a record structure with four fields, which are, op, arg1, arg2 and result.
 The op field contains an internal code for the operator. The three-address statement x : = y op z
is represented by placing y in arg1, z in arg2 and x in result.
 The contents of fields arg1, arg2 and result are normally pointers to the symbol-table entries
for the names represented by these fields. If so, temporary names must be entered into the
symbol table as they are created.
b) Triples:
 To avoid entering temporary names into the symbol table, we might refer to a temporary value
by the position of the statement that computes it.
 If we do so, three-address statements can be represented by records with only three fields: op,
arg1 and arg2.
 The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table or
pointers into the triple structure ( for temporary values ).
 Since three fields are used, this intermediate code format is known as triples.

Fig: Quadruples, triples representation of three address statements


A ternary operation like x[i] := y requires two entries in the triple structure, as shown below, while x := y[i] is naturally represented as two operations.
Fig: More examples on three address code
Indirect Triples:
 Another implementation of three-address code is that of listing pointers to triples, rather than
listing the triples themselves. This implementation is called indirect triples.
 For example, let us use an array statement to list pointers to triples in the desired order.
Then the triples shown above might be represented as follows:

Fig: Indirect triples representation of three address statements
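A compact sketch contrasting the three representations for the first two statements, t1 := -c and t2 := b * t1; the field layout and index scheme are illustrative assumptions:

# Sketch: the same two statements in each representation.

# Quadruples: the result field names the temporary explicitly.
quads = [
    ('uminus', 'c',  None, 't1'),
    ('*',      'b',  't1', 't2'),
]

# Triples: no result field; a temporary is referred to by the
# position (index) of the triple that computes it.
triples = [
    ('uminus', 'c', None),   # (0)
    ('*',      'b', 0),      # (1)  arg2 refers to the result of triple (0)
]

# Indirect triples: a separate statement list holds pointers (indices)
# into the triple table, so statements can be reordered without
# renumbering the triples themselves.
statements = [0, 1]          # execution order, as indices into triples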

5. TYPE CHECKING
A compiler must check that the source program follows both syntactic and semantic
conventions of the source language.
This checking, called static checking, detects and reports programming errors.
Some examples of static checks:
1. Type checks – A compiler should report an error if an operator is applied to an incompatible
operand. Example: If an array variable and function variable are added together.
2. Flow-of-control checks – Statements that cause flow of control to leave a construct must have some place to which to transfer the flow of control. Example: an error occurs when a break statement has no enclosing statement, such as a loop or a switch, out of which to transfer control.
Position of type checker
Fig: Position of the type checker (token stream → parser → syntax tree → type checker → syntax tree → intermediate code generator → intermediate code)

 A type checker verifies that the type of a construct matches that expected by its context. For
example: arithmetic operator mod in Pascal requires integer operands, so a type checker verifies
that the operands of mod have type integer.
 Type information gathered by a type checker may be needed when code is generated.
Type Systems:
The design of a type checker for a language is based on information about the syntactic constructs in
the language, the notion of types, and the rules for assigning types to language constructs.
For example: “ if both operands of the arithmetic operators of +,- and * are of type integer, then the result
is of type integer ”
Type Expressions:
 The type of a language construct will be denoted by a “type expression.”
 A type expression is either a basic type or is formed by applying an operator called a type
constructor to other type expressions.
 The sets of basic types and constructors depend on the language to be checked.
The following are the definitions of type expressions:
1. Basic types such as boolean, char, integer, real are type expressions.
A special basic type, type_error , will signal an error during type checking; void denoting “the
absence of a value” allows statements to be checked.
2. Since type expressions may be named, a type name is a type expression.
3. A type constructor applied to type expressions is a type expression.
Constructors include:
Arrays: If T is a type expression then array (I,T) is a type expression denoting the type of an
array with elements of type T and index set I.
Products: If T1 and T2 are type expressions, then their Cartesian product T1 X T2 is a type
expression.
Records: The difference between a record and a product is that the fields of a record have names.
The record type constructor will be applied to a tuple formed from field names and field types.
For example:
type row = record
address: integer;
lexeme: array[1..15] of char
end;
var table: array[1..101] of row;
declares the type name row representing the type expression record((address X integer) X (lexeme X array(1..15, char))) and the variable table to be an array of records of this type.
Pointers: If T is a type expression, then pointer(T) is a type expression denoting the type “pointer
to an object of type T”.
For example, var p: ↑ row declares variable p to have type pointer(row).
Functions: A function in programming languages maps a domain type D to a range type R. The type
of such function is denoted by the type expression D → R
4. Type expressions may contain variables whose values are type expressions.
Fig: Tree representation for char x char → pointer(integer) – the root is the function-type operator →, its left child is the product operator x with two char leaves, and its right child is pointer with an integer leaf.
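A hedged sketch of type expressions as data, mirroring the constructors listed above (the dataclass layout is an assumption):

# Sketch: type expressions as data, mirroring the constructors above.
from dataclasses import dataclass

@dataclass(frozen=True)
class Basic:              # boolean, char, integer, real, type_error, void
    name: str

@dataclass(frozen=True)
class Array:              # array(I, T)
    index: range
    elem: 'object'

@dataclass(frozen=True)
class Product:            # T1 X T2
    left: 'object'
    right: 'object'

@dataclass(frozen=True)
class Pointer:            # pointer(T)
    to: 'object'

@dataclass(frozen=True)
class Function:           # D -> R
    domain: 'object'
    result: 'object'

CHAR, INT = Basic('char'), Basic('integer')
# char x char -> pointer(integer)
sig = Function(Product(CHAR, CHAR), Pointer(INT))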


Type systems:
 A type system is a collection of rules for assigning type expressions to the various parts of a
program.
 A type checker implements a type system. It is specified in a syntax-directed manner.
 Different type systems may be used by different compilers or processors of the same
language.
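A hedged sketch of a syntax-directed type-checking rule in the spirit of the integer-arithmetic rule quoted earlier; the expression encoding and symbol table are assumptions:

# Sketch: syntax-directed type checking for arithmetic expressions.
# Rule: if both operands of '+', '-', '*' have type integer, the result is integer;
# otherwise the result is type_error.

def type_of(expr, symtab):
    kind = expr[0]
    if kind == 'num':
        return 'integer'
    if kind == 'id':
        return symtab.get(expr[1], 'type_error')
    if kind in ('+', '-', '*'):
        t1 = type_of(expr[1], symtab)
        t2 = type_of(expr[2], symtab)
        return 'integer' if t1 == t2 == 'integer' else 'type_error'
    return 'type_error'

symtab = {'i': 'integer', 'a': 'array(1..10, integer)'}
print(type_of(('+', ('id', 'i'), ('num', 3)), symtab))   # integer
print(type_of(('*', ('id', 'a'), ('num', 2)), symtab))   # type_error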
Static and Dynamic Checking of Types
 Checking done by a compiler is said to be static, while checking done when the target program
runs is termed dynamic.
 Any check can be done dynamically, if the target code carries the type of an element along
with the value of that element.
Sound type system:
A sound type system eliminates the need for dynamic checking for type errors because it allows
us to determine statically that these errors cannot occur when the target program runs. That is, if a sound
type system assigns a type other than type_error to a program part, then type errors cannot occur when
the target code for the program part is run.
Strongly typed language:
A language is strongly typed if its compiler can guarantee that the programs it accepts will
execute without type errors.
Error Recovery:
 Since type checking has the potential for catching errors in a program, it is desirable for the type checker to recover from errors, so that it can check the rest of the input.
 Error handling has to be designed into the type system right from the start; the type
checking rules must be prepared to cope with errors.