Chapter - 4: Semantic Analysis
Semantic analysis
Syntax directed translation
A syntax directed definition is a generalization of the
CFG in which each grammar symbol has an associated set
of attributes (synthesized and inherited).
An attribute can represent anything we choose (a string,
a number, a type, a memory location, etc.).
The value of a synthesized attribute is computed from the
values of attributes at the children of that node in the parse
tree.
The value of an inherited attribute is computed from the
values of attributes at the siblings and parent of that node in
the parse tree.
Attributes
The process of computing an attribute's value is
referred to as binding.
Static: prior to execution, e.g., the number of digits in a number
Dynamic: during execution, e.g., the value of a variable
Parse: int x
Semantic rules/Attributed Grammar
Semantic rules are attached to each production and specify how
the attribute values are computed.
/* Binary number */
N → L1 . L2
L1 → L2 B
L → B
B → 0
B → 1
Parse: 1001.10
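The slide gives only the productions, so the semantic rules below are an assumption: each list L carries two synthesized attributes, val and len, and N.val = L1.val + L2.val / 2^(L2.len). The function names are hypothetical. A minimal sketch of how the value of 1001.10 could be computed bottom-up:

```python
def bit_value(ch):
    # B -> 0 | 1 : B.val is the digit itself
    if ch not in ("0", "1"):
        raise ValueError(f"not a binary digit: {ch!r}")
    return int(ch)


def list_attrs(bits):
    # L -> B     : L.val = B.val,              L.len = 1
    # L -> L1 B  : L.val = 2 * L1.val + B.val, L.len = L1.len + 1
    # The loop mirrors the left-recursive production: each new bit
    # extends the attributes synthesized for the prefix before it.
    if not bits:
        raise ValueError("a bit list needs at least one digit")
    val, length = bit_value(bits[0]), 1
    for ch in bits[1:]:
        val = 2 * val + bit_value(ch)
        length += 1
    return val, length


def number_value(text):
    # N -> L1 . L2 : N.val = L1.val + L2.val / 2 ** L2.len
    left, dot, right = text.partition(".")
    if dot != ".":
        raise ValueError("expected a '.' between the two bit lists")
    left_val, _ = list_attrs(left)
    right_val, right_len = list_attrs(right)
    return left_val + right_val / 2 ** right_len


print(number_value("1001.10"))  # 9.5
```

Every attribute here is synthesized: a node's value depends only on the values computed for its children.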
Type checking
The compiler must check if the source program follows
semantic conventions of the source language.
Static checking
Dynamic checking
The following are examples of static checks:
Type Checks.
Uniqueness Checks.
Flow-of-Control Checks.
Name-Related Checks.
A type checker verifies that the type of a construct matches the
type expected by its context.
For example, a type checker should verify that the type of the value
assigned to a variable is compatible with the type of the variable.
Mod (%) expects two integers.
Type checking (cont.)
In almost all languages, types are either basic or
constructed
Basic types are boolean, character, integer and real
Constructed types include arrays, records and sets
Type Expression
The type of a language construct will be denoted by a type
expression
A type expression is either a basic type or formed by applying
an operator, called a type constructor, to other type expressions
The following are type expressions:
A basic type is a type expression
A type constructor applied to a type expression is a type expression.
Constructors include:
Arrays. If I is an index set and T is a type expression, then array(I, T) is a type
expression
Pointers. If T is a type expression then pointer(T) is a type expression
Functions. The type expression of a function has the form D→R where
D is the type expression of the parameters and R is the type expression of the
returned value.
Etc. ..
Examples of type checker
The following example gives type checking system for function calls:
E→E1 (E2) {
E.type := if E2.type = s and E1.type = s → t then t
else type_error
}
Expressions:
E→E1 mod E2 {E.type := if E1.type = integer and E2.type = integer then integer else type_error}
E→E1[E2] {E.type := if E2.type = integer and E1.type = array(s, t) then t else type_error}
S→id := E
S→if E then S1
S→while E do S1
S→S1 ; S2
Statements:
S→id := E {S.type := if id.type = E.type then void else type_error}
S→S1 ; S2 {S.type := if S1.type = void and S2.type = void then void else type_error}
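The rules above can be sketched as a small recursive checker. The tuple encodings for syntax trees and types, the symbol-table dictionary env, and the function names are all assumptions made for illustration. They are not part of the slides.

```python
TYPE_ERROR = "type_error"


def expr_type(e, env):
    """Return E.type for an expression tree (tuple encoding is an assumption)."""
    kind = e[0]
    if kind == "num":                      # integer literal
        return "integer"
    if kind == "id":                       # look the name up in the symbol table
        return env.get(e[1], TYPE_ERROR)
    t1 = expr_type(e[1], env)
    t2 = expr_type(e[2], env)
    if kind == "mod":                      # E -> E1 mod E2
        return "integer" if t1 == "integer" and t2 == "integer" else TYPE_ERROR
    if kind == "index":                    # E -> E1[E2], E1.type = array(s, t)
        if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
            return t1[2]
        return TYPE_ERROR
    if kind == "call":                     # E -> E1(E2), E1.type = s -> t
        if isinstance(t1, tuple) and t1[0] == "func" and t1[1] == t2:
            return t1[2]
        return TYPE_ERROR
    raise ValueError(f"unknown expression kind: {kind}")


def stmt_type(s, env):
    """Return S.type: 'void' when the statement is well typed."""
    kind = s[0]
    if kind == "assign":                   # S -> id := E
        id_type = env.get(s[1], TYPE_ERROR)
        e_type = expr_type(s[2], env)
        # Rejecting a pair of matching type_error values is an added assumption.
        ok = id_type == e_type and id_type != TYPE_ERROR
        return "void" if ok else TYPE_ERROR
    if kind == "seq":                      # S -> S1 ; S2
        both_void = stmt_type(s[1], env) == "void" and stmt_type(s[2], env) == "void"
        return "void" if both_void else TYPE_ERROR
    raise ValueError(f"unknown statement kind: {kind}")


env = {"x": "integer", "a": ("array", "1..10", "integer")}
print(stmt_type(("assign", "x", ("index", ("id", "a"), ("num", 1))), env))  # void
```

Types are plain strings for basic types and tuples such as ("array", s, t) or ("func", s, t) for constructed ones, which keeps the comparison in each rule a simple equality test.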
Equivalence of type expression
Two type expressions are structurally equivalent if and
only if they are the same basic type or are formed by
applying the same constructor to structurally equivalent
types
For example, integer is equivalent only to integer
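The definition translates directly into a recursive test. In this sketch the representation is an assumption: basic types are strings, and constructed types are tuples whose first element names the constructor, e.g. ("pointer", T) or ("array", I, T). Array index sets are compared by plain equality, which is also an assumption.

```python
def struct_equiv(s, t):
    """True when type expressions s and t are structurally equivalent."""
    # Basic types (and index sets such as "1..10") are plain strings:
    # they are equivalent only when they are identical.
    if isinstance(s, str) and isinstance(t, str):
        return s == t
    # Constructed types must use the same constructor with the same
    # number of arguments, and the arguments must be equivalent in turn.
    if isinstance(s, tuple) and isinstance(t, tuple):
        if s[0] != t[0] or len(s) != len(t):
            return False
        return all(struct_equiv(a, b) for a, b in zip(s[1:], t[1:]))
    # A basic type is never equivalent to a constructed type.
    return False


p1 = ("pointer", ("array", "1..10", "integer"))
p2 = ("pointer", ("array", "1..10", "integer"))
print(struct_equiv(p1, p2))                     # True
print(struct_equiv("integer", ("pointer", "integer")))  # False
```

Python's built-in tuple comparison would give the same answer for this encoding. The explicit function is written out only to show the recursion in the definition.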
Using triples, we refer to the result of an operation x op y by its position, rather than by an explicit temporary name.
Intermediate code generation
Postfix Notation
The postfix notation is practical for an intermediate
representation as the operands are found just before the
operator
In fact, the postfix notation is a linearized representation
of a syntax tree
e.g., 1 + 2 * 3 will be represented in the postfix notation as
1 2 3 * + (whereas 1 2 + 3 * is the postfix form of (1 + 2) * 3)
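Because postfix is a linearization of the syntax tree, it can be produced by a post-order walk: left operand, right operand, then the operator. A minimal sketch, assuming (hypothetically) that a tree node is a tuple (op, left, right) and a leaf is a number or a name:

```python
def to_postfix(node):
    """Return the postfix token list for a syntax tree."""
    if isinstance(node, tuple):
        op, left, right = node
        # Post-order: both operands are emitted before their operator.
        return to_postfix(left) + to_postfix(right) + [op]
    return [str(node)]  # leaf: a constant or an identifier


print(" ".join(to_postfix(("+", 1, ("*", 2, 3)))))  # 1 2 3 * +
print(" ".join(to_postfix(("*", ("+", 1, 2), 3))))  # 1 2 + 3 *
```

The two calls show that precedence and parentheses are already captured by the tree shape, so the postfix string needs neither.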
Three-Address code
The three address code is a sequence of statements
of the form: X := Y op Z
Only one operator is allowed on the right side of the
assignment.
Similarly to postfix notation, the three-address code is a
linearized representation of a syntax tree.
Each statement usually contains three addresses (the two
operands and the result).
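One common way to obtain statements of the form X := Y op Z is to walk the syntax tree and introduce a fresh temporary for every operator node. The sketch below assumes a hypothetical tuple encoding (op, left, right) for tree nodes and temporaries named T1, T2, ...; it is an illustration, not a complete translator.

```python
import itertools


def gen_tac(node, code, temps):
    """Append three-address statements for node to code and return the
    name (variable, constant or temporary) that holds its value."""
    if not isinstance(node, tuple):
        return str(node)                 # leaf: already an address
    op, left, right = node
    left_addr = gen_tac(left, code, temps)
    right_addr = gen_tac(right, code, temps)
    temp = f"T{next(temps)}"             # fresh temporary for this operator
    code.append(f"{temp} := {left_addr} {op} {right_addr}")
    return temp


def translate_assignment(target, expr):
    """Translate target := expr into a list of three-address statements."""
    code = []
    temps = itertools.count(1)
    result = gen_tac(expr, code, temps)
    code.append(f"{target} := {result}")  # final copy statement
    return code


for line in translate_assignment("d", ("+", "d", ("*", "x", "y"))):
    print(line)
# T1 := x * y
# T2 := d + T1
# d := T2
```

Each emitted statement has exactly one operator on its right side, so a nested expression is split across as many temporaries as it has operators.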
Statement                      Format                Comments
Assignment (binary operation)  X := Y op Z           Arithmetic and logical operators used
Copy statement                 X := Y
Unconditional jump             goto L
Conditional jump               if X relop Y goto L
Function call                  param X1              The parameters are specified by param
                               param X2
                               …
                               param Xn
                               call p, n             The procedure p is called by indicating the number of parameters n
Intermediate code generation (Examples)
Intermediate code
dot_prod = 0
i = 0
L1: if (i >= 10) goto L2
T1 = a
T2 = b
T3 = T1*T2
T4 = dot_prod+T3
dot_prod = T4
T5 = i+1
i = T5
goto L1
L2:
Intermediate code generation (Examples)
C-Program (function)
int dot_prod(int x, int y){
int d, i; d = 0;
for (i=0; i<10; i++) d += x*y;
return d;
}
Intermediate code
func begin dot_prod
d = 0
i = 0
L1: if (i >= 10) goto L2
T1 = x
T2 = y
T3 = T1*T2
T4 = d+T3
d = T4
T5 = i+1
i = T5
goto L1
L2: return d
func end
Intermediate code generation (Examples)
C-Program
int fact(int n)
{
if (n==0) return 1;
else return (n*fact(n-1));
}
Intermediate code ?
Chapter - 6: Code Generation
Code Generation
Back end operation
Machine architecture (the CPU as seen by a machine-language programmer)
Instruction set operation
Addressing mode
Data format
CPU registers
I/O instructions
Atoms → binary-coded instructions (object language program)
Query #1:
Can we use a C++ backend while developing a new language?
Query #2:
Given
Source code for Pascal compiler
Pascal Compiler that runs on Mac
Required
Ada compiler which runs on Mac
Query #3:
Given
Source code for Pascal compiler
Pascal Compiler that runs on Mac
Required
Pascal compiler for a new machine called RISC
Converting atoms to instructions
E.g., C = A + B
Converting conditional branches
if (A > B) A = B * C
MOV supporting architecture
MOV 1, T1
TST A, B, 3, L1
MOV 0, T1
LBL L1
TST T1, 0, 1, L2
MUL B, C, A
JMP L3
LBL L2
LBL L3
LOD followed by STO architecture
LOD R1, ='1'
STO R1, T1
LOD R1, A
CMP R1, B, 3
JMP L1
LOD R1, ='0'
STO R1, T1
L1:
LOD R1, T1
CMP R1, ='0', 1
JMP L2
LOD R1, B
MUL R1, C
STO R1, A
CMP 0,0,0
JMP L3
L2:
L3:
Register Allocation
CPU
Single arithmetic register
Specific purpose registers or
General purpose registers (here we need register allocation)
Register Allocation:
Process of assigning a purpose to a particular register
Used to
Maximize utilization of CPU registers
Minimize memory reference
Minimize number of instructions (Can also optimize code)
Allows the code generator to maintain information on
Which registers are used
Which are available for reuse
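The bookkeeping described above (which registers are in use, which can be reused) can be sketched as a small pool object. The class name RegisterPool, its methods, and the register names are all hypothetical; a real code generator would also need a spill policy for when no register is free.

```python
class RegisterPool:
    """Tracks which registers are in use and which are free (a sketch)."""

    def __init__(self, names):
        self.free = list(names)   # registers available for reuse
        self.in_use = {}          # register -> description of the value it holds

    def allocate(self, holds):
        """Assign a free register to the value described by holds."""
        if not self.free:
            # Assumption: spilling a register to memory is not handled here.
            raise RuntimeError("no free register")
        reg = self.free.pop(0)
        self.in_use[reg] = holds
        return reg

    def find(self, holds):
        """Return the register already holding this value, or None."""
        for reg, value in self.in_use.items():
            if value == holds:
                return reg
        return None

    def release(self, reg):
        """Mark a register as available again."""
        del self.in_use[reg]
        self.free.append(reg)


pool = RegisterPool(["R1", "R2"])
r = pool.allocate("C*D")
print(r, pool.find("C*D"), pool.find("A"))  # R1 R1 None
```

Calling find before recomputing a subexpression is what lets the generator skip a reload or a repeated multiplication, which is how register allocation reduces both memory references and instruction count.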
Show the code that can be generated for the following
C++ code segments with efficient register utilization
Example #1:
A=B+C*D;
B=A-C*D;
One solution can be
Code for A=B+C*D:
LOD R1, C
MUL R1, D
STO R1, Temp1
LOD R1, B
ADD R1, Temp1
STO R1, A
Code for B=A-C*D:
LOD R1, C
MUL R1, D
STO R1, Temp2
LOD R1, A
SUB R1, Temp2
STO R1, B
Problems
1. Number of instructions
2. Number of memory references
Example #2:
a - b/c + d*(e-f + g+h)