Check Semantics - Error Reporting - Disambiguate - Type Coercion - Static Checking
• Check semantics
• Error reporting
• Disambiguate
overloaded operators
• Type coercion
• Static checking
– Type checking
– Control flow checking
– Uniqueness checking
– Name checks
1
Beyond syntax analysis
• Parser cannot catch all the program errors
2
Beyond syntax …
• Example 1
string x; int y;
y = x + 3
the use of x is a type error
• Example 2
int a, b;
a = b + c
c is not declared
3
Compiler needs to know?
• Whether a variable has been declared?
4
• How many arguments does a function take?
• Inheritance relationship
5
How to answer these questions?
• These issues are part of semantic analysis phase
6
How to … ?
• Use formal methods
– Context sensitive grammars
– Extended attribute grammars
7
Why attributes ?
• For lexical analysis and syntax analysis
formal techniques were used.
8
Attribute Grammar Framework
• Generalization of CFG where each grammar
symbol has an associated set of attributes
– Translation schemes
• indicate order in which semantic rules are to be evaluated
• allow some implementation details to be shown
9
• Conceptually both:
– parse input token stream
– build parse tree
– traverse the parse tree to evaluate the semantic
rules at the parse tree nodes
• Evaluation may:
– generate code
– save information in the symbol table
– issue error messages
– perform any other activity
10
Example
• Consider a grammar for signed binary numbers
symbol    attributes
number    value
sign      negative
list      position, value
bit       position, value
11
production    attribute rule
bit → 0       bit.value = 0
bit → 1       bit.value = 2^bit.position
12
Parse tree and the dependence graph
[Figure: parse tree and dependence graph for the input -101, with number.value = -5 at the root]
13
Attributes …
• attributes fall into two classes:
synthesized and inherited
14
Attributes …
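• Each grammar production A → α has
associated with it a set of semantic rules of the
form b = f(c1, c2, …, ck), where c1, …, ck are attributes
of the grammar symbols of the production, and either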
– b is a synthesized attribute of A
OR
– b is an inherited attribute of one of the grammar
symbols on the right
15
Synthesized Attributes
• a syntax directed definition that uses
only synthesized attributes is said to
be an S-attributed definition
16
Syntax Directed Definitions for a
desk calculator program
L → E n        print(E.val)
E → E1 + T     E.val = E1.val + T.val
E → T          E.val = T.val
T → T1 * F     T.val = T1.val * F.val
T → F          T.val = F.val
F → (E)        F.val = E.val
F → digit      F.val = digit.lexval
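The rules above use only synthesized attributes, so they can be evaluated in one bottom-up (postorder) walk of the tree. The following is a minimal sketch of that idea; the tuple layout of the tree nodes and the function name val are assumptions made for illustration, and unit productions such as E → T are collapsed because they only copy the value.

```python
# Tree nodes are tuples: ("digit", lexval) or (operator, left, right).
# This layout is an assumption of the sketch, not part of the slides.
def val(node):
    """Return the synthesized attribute val of a node (postorder walk)."""
    kind = node[0]
    if kind == "digit":            # F -> digit : F.val = digit.lexval
        return node[1]
    left, right = val(node[1]), val(node[2])
    if kind == "+":                # E -> E1 + T : E.val = E1.val + T.val
        return left + right
    if kind == "*":                # T -> T1 * F : T.val = T1.val * F.val
        return left * right
    raise ValueError(f"unknown node kind: {kind}")

# 3 * 4 + 5
tree = ("+", ("*", ("digit", 3), ("digit", 4)), ("digit", 5))
result = val(tree)                 # L -> E n would print this value
```

Each node's value depends only on its children, which is what makes the definition S-attributed.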
17
Parse tree for 3 * 4 + 5 n
[Figure: annotated parse tree for 3 * 4 + 5 n. The F nodes above the three leaves have val = 3, 4 and 5; the T node for 3 * 4 has val = 12; the root E has val = 17, and L prints 17.]
18
Inherited Attributes
• an inherited attribute is one whose value is defined in
terms of attributes at the parent and/or siblings
L → id    addtype(id.entry, L.in)
19
Parse tree for
real x, y, z
[Figure: annotated parse tree for real x, y, z. T.type = real is passed to L.in at every L node, and addtype(x, real), addtype(y, real) and addtype(z, real) are called at the identifiers.]
20
Dependence Graph
• If an attribute b depends on an
attribute c then the semantic rule for
b must be evaluated after the
semantic rule for c
21
Algorithm to construct
dependency graph
for each node n in the parse tree do
for each attribute a of the grammar symbol do
construct a node in the dependency graph
for a
22
Example
• Suppose A.a = f(X.x , Y.y) is a semantic
rule for A → X Y
[Figure: parse tree with A above X and Y; in the dependency graph, edges run from X.x and Y.y to A.a]
23
Example
• Whenever following production is used in a parse
tree
E → E1 + E2    E.val = E1.val + E2.val
we create a dependency graph
[Figure: dependency graph with edges from E1.val and E2.val to E.val]
24
Example
• dependency graph for real id1, id2, id3
• put a dummy synthesized attribute b for a
semantic rule that consists of a procedure call
[Figure: dependency graph for real id1, id2, id3 drawn over the parse tree. T.type = real feeds L.in, each L.in is copied to the L below it, and each addtype call depends on L.in and on the entry of its identifier.]
25
Evaluation Order
• Any topological sort of dependency graph
gives a valid order in which semantic rules
must be evaluated
a4 = real
a5 = a4
addtype(id3.entry, a5)
a7 = a5
addtype(id2.entry, a7 )
a9 = a7
addtype(id1.entry, a9 )
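One way to obtain such an order mechanically is to hand the dependency graph to a topological sort. The sketch below uses graphlib.TopologicalSorter from the Python standard library (Python 3.9 or later); the dictionary deps is a hand-written transcription of the seven rules listed above, which is an assumption of this sketch rather than something the slides construct.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each rule maps to the rules whose results it needs (its predecessors).
deps = {
    "a4 = real": [],
    "a5 = a4": ["a4 = real"],
    "addtype(id3.entry, a5)": ["a5 = a4"],
    "a7 = a5": ["a5 = a4"],
    "addtype(id2.entry, a7)": ["a7 = a5"],
    "a9 = a7": ["a7 = a5"],
    "addtype(id1.entry, a9)": ["a9 = a7"],
}

# static_order() yields every rule after all of its predecessors.
order = list(TopologicalSorter(deps).static_order())
for rule in order:
    print(rule)
```

Several orders are valid; the sorter returns one of them, and it raises graphlib.CycleError if the graph has a cycle, in which case no evaluation order exists.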
26
Syntax Tree
• Condensed form of parse tree,
• useful for representing language constructs.
• The production S → if B then s1 else s2
may appear as
if-then-else
B s1 s2
27
Syntax tree …
• Chain of single productions may be collapsed,
and operators move to the parent nodes
[Figure: parse tree for id1 * id2 + id3 beside its syntax tree, in which + is the root with children * and id3, and * has children id1 and id2]
28
Constructing Syntax tree for
expression
• Each node can be represented as a record
29
Example
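the following sequence of function calls
creates a syntax tree P5 for a - 4 + c
P1 = mkleaf(id, entry.a)
P2 = mkleaf(num, 4)
P3 = mknode(-, P1, P2)
P4 = mkleaf(id, entry.c)
P5 = mknode(+, P3, P4)
[Figure: syntax tree with + at the root (P5) and children - (P3) and id (P4, entry of c); the - node has children id (P1, entry of a) and num 4 (P2)]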
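A minimal sketch of how mkleaf and mknode could be written, with each node stored as a record. The Node class, its field names, the helper show, and the use of plain strings in place of symbol-table entries are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: str                       # operator, "id" or "num"
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: object = None             # symbol-table entry or number (leaves only)

def mkleaf(label, value):
    return Node(label, value=value)

def mknode(op, left, right):
    return Node(op, left, right)

# Syntax tree for a - 4 + c; strings stand in for symbol-table entries.
p1 = mkleaf("id", "a")
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", "c")
p5 = mknode("+", p3, p4)

def show(n):
    """Fully parenthesized rendering, to make the tree shape visible."""
    if n.left is None:
        return str(n.value)
    return f"({show(n.left)} {n.label} {show(n.right)})"

print(show(p5))
```

The parenthesization shows that - was built before +, matching the left-associative reading of a - 4 + c.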
30
A syntax directed definition
for constructing syntax tree
E → E1 + T     E.ptr = mknode(+, E1.ptr, T.ptr)
E → T          E.ptr = T.ptr
T → T1 * F     T.ptr = mknode(*, T1.ptr, F.ptr)
T → F          T.ptr = F.ptr
F → (E)        F.ptr = E.ptr
F → id         F.ptr = mkleaf(id, entry.id)
F → num        F.ptr = mkleaf(num, val)
31
DAG for Expressions
Expression a + a * ( b - c ) + ( b - c ) * d
make a leaf or node if not present,
otherwise return pointer to the existing node
P1 = makeleaf(id,a)
P2 = makeleaf(id,a)
P3 = makeleaf(id,b)
P4 = makeleaf(id,c)
P5 = makenode(-,P3,P4)
P6 = makenode(*,P2,P5)
P7 = makenode(+,P1,P6)
P8 = makeleaf(id,b)
P9 = makeleaf(id,c)
P10 = makenode(-,P8,P9)
P11 = makeleaf(id,d)
P12 = makenode(*,P10,P11)
P13 = makenode(+,P7,P12)
[Figure: DAG for the expression. P1 and P2 share the leaf a, P3 and P8 share b, P4 and P9 share c, and P5 and P10 share the node for b - c.]
32
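The "make it if not present, otherwise return the existing node" step can be sketched with a lookup table keyed on the node's label and children. The names nodes, index and _intern, and the use of integer ids in place of pointers, are assumptions of this sketch.

```python
nodes = []    # node id -> (label, left, right); ids play the role of pointers
index = {}    # (label, left, right) -> node id, used to find existing nodes

def _intern(key):
    """Return the id of an existing identical node, or create a new one."""
    if key not in index:
        index[key] = len(nodes)
        nodes.append(key)
    return index[key]

def makeleaf(label, name):
    return _intern((label, name, None))

def makenode(op, left, right):
    return _intern((op, left, right))

# The thirteen calls for a + a * (b - c) + (b - c) * d
P1 = makeleaf("id", "a")
P2 = makeleaf("id", "a")
P3 = makeleaf("id", "b")
P4 = makeleaf("id", "c")
P5 = makenode("-", P3, P4)
P6 = makenode("*", P2, P5)
P7 = makenode("+", P1, P6)
P8 = makeleaf("id", "b")
P9 = makeleaf("id", "c")
P10 = makenode("-", P8, P9)
P11 = makeleaf("id", "d")
P12 = makenode("*", P10, P11)
P13 = makenode("+", P7, P12)
```

Thirteen calls produce only nine distinct nodes, because the repeated leaves and the repeated subexpression b - c are found in the table instead of being rebuilt.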
Bottom-up evaluation of S-attributed definitions
• Can be evaluated while parsing
[Figure: parser stack made of a state stack and a parallel value stack, with ptr marking the top]
33
• Suppose semantic rule A.a = f(X.x, Y.y, Z.z) is
associated with production A → XYZ
34
Assignment 5: Extend the scheme, which
has the rule number → sign list, so that the rule
number → sign list . list replaces it (DUE Feb 28, 2005)
number → sign list     list.position = 0
                       if sign.negative
                       then number.value = - list.value
                       else number.value = list.value
bit → 0                bit.value = 0
bit → 1                bit.value = 2^bit.position
35
Example: desk calculator
L → En print(val(top))
E→E+T val(ntop) = val(top-2) + val(top)
E→T
T→T*F val(ntop) = val(top-2) * val(top)
T→F
F → (E) val(ntop) = val(top-1)
F → digit
36
INPUT      STATE        Val        PRODUCTION
3*5+4n
*5+4n      digit        3
*5+4n      F            3          F → digit
*5+4n      T            3          T → F
5+4n       T *          3 –
+4n        T * digit    3 – 5
+4n        T * F        3 – 5      F → digit
+4n        T            15         T → T * F
+4n        E            15         E → T
4n         E +          15 –
n          E + digit    15 – 4
n          E + F        15 – 4     F → digit
n          E + T        15 – 4     T → F
n          E            19         E → E + T
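The value column of the trace can be reproduced by replaying the parser's moves on a value stack. This sketch does not implement an LR parser: the list of moves is written out by hand from the trace, and the function name replay and the move encoding are assumptions made for illustration.

```python
def replay(moves):
    """Apply shift/reduce moves to a value stack and return the stack."""
    val = []
    for kind, arg in moves:
        if kind == "shift":
            # digits carry their lexical value; operators get an empty slot
            val.append(arg if isinstance(arg, int) else None)
        elif arg in ("E->E+T", "T->T*F"):
            top = len(val) - 1
            left, right = val[top - 2], val[top]
            result = left + right if arg == "E->E+T" else left * right
            del val[top - 1:]          # pop two slots, so ntop = top - 2
            val[-1] = result           # val(ntop) = val(top-2) op val(top)
        elif arg == "F->(E)":
            top = len(val) - 1
            result = val[top - 1]      # val(ntop) = val(top-1)
            del val[top - 1:]
            val[-1] = result
        # F->digit, T->F and E->T leave the value where it is
    return val

moves = [
    ("shift", 3), ("reduce", "F->digit"), ("reduce", "T->F"),
    ("shift", "*"), ("shift", 5), ("reduce", "F->digit"),
    ("reduce", "T->T*F"), ("reduce", "E->T"),
    ("shift", "+"), ("shift", 4), ("reduce", "F->digit"),
    ("reduce", "T->F"), ("reduce", "E->E+T"),
]
final = replay(moves)
```

The unit productions need no code because the value of the single right-hand-side symbol is already in the slot that the left-hand side will occupy.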
37
L-attributed definitions
• When translation takes place during
parsing, order of evaluation is linked to
the order in which nodes are created
38
L attributed definitions …
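• A syntax directed definition is L-attributed if each inherited
attribute of Xj (1 ≤ j ≤ n) on the right hand side of A → X1 X2 … Xn
depends only on
– Attributes of the symbols X1, X2, …, Xj-1 to the left of Xj
– Inherited attributes of A
• Example (L-attributed):
A → L M    L.i = f1(A.i)
           M.i = f2(L.s)
           A.s = f3(M.s)
• Example (not L-attributed, since Q.i depends on R.s to its right):
A → Q R    R.i = f4(A.i)
           Q.i = f5(R.s)
           A.s = f6(Q.s)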
39
Translation schemes
• A CFG where semantic actions occur
within the rhs of production
E→ T R
R→ addop T {print(addop)} R | ε
T→ num {print(num)}
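A minimal sketch of this translation scheme as a recursive-descent translator, with each embedded action executed at the point where it appears in the production. It assumes single-character tokens with no spaces, collects the printed symbols in a list instead of printing them, and the function name to_postfix is hypothetical.

```python
def to_postfix(text):
    tokens = list(text)              # assumption: one character per token
    out = []                         # collects what the print actions emit
    pos = 0

    def T():                         # T -> num {print(num)}
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"digit expected at position {pos}")
        out.append(tokens[pos])      # the embedded action
        pos += 1

    def R():                         # R -> addop T {print(addop)} R | ε
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in ("+", "-"):
            addop = tokens[pos]
            pos += 1
            T()
            out.append(addop)        # action runs after T, before the next R
            R()

    T()                              # E -> T R
    R()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected input at position {pos}")
    return "".join(out)

print(to_postfix("9-5+2"))
```

Because the action for addop sits between T and the trailing R, each operator is emitted right after its second operand, which yields postfix order.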
40
Parse tree for 9-5+2
[Figure: parse tree for 9-5+2 with the actions print(num) and print(addop) attached as extra leaves, ending in R → ε]
41
• Assume actions are terminal symbols
42
• In case of both inherited and synthesized attributes
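S → A1 A2    {A1.in = 1, A2.in = 2}
A → a        {print(A.in)}
[Figure: parse tree in which the actions print(A1.in) and print(A2.in) hang below A1 and A2, while the action setting A1.in = 1 and A2.in = 2 is the last child of S]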
43
Example: Translation scheme
for EQN
S→B B.pts = 10
S.ht = B.ht
B → B1 B2 B1.pts = B.pts
B2.pts = B.pts
B.ht = max(B1.ht,B2.ht)
44
after putting actions in the right place
S → {B.pts = 10} B
{S.ht = B.ht}
B → {B1.pts = B.pts} B1
{B2.pts = B.pts} B2
{B.ht = max(B1.ht,B2.ht)}
45
Top down Translation
Use predictive parsing to implement L-attributed
definitions
46
Eliminate left recursion
E→ T {R.i = T.val}
R {E.val = R.s}
R→ +
T {R1.i = R.i + T.val}
R1 {R.s = R1.s}
R→ -
T {R1.i =R.i – T.val}
R1 {R.s = R1.s}
R→ ε {R.s = R.i}
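In a predictive parser, the inherited attribute R.i can be passed as a function argument and the synthesized attribute R.s returned as the function result. The sketch below follows the scheme above under two assumptions: T derives a single digit, and tokens are single characters. The function name evaluate is hypothetical.

```python
def evaluate(text):
    tokens = list(text)              # assumption: one character per token
    pos = 0

    def T():                         # returns T.val
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"digit expected at position {pos}")
        value = int(tokens[pos])
        pos += 1
        return value

    def R(i):                        # i is R.i; the return value is R.s
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            t_val = T()
            # R1.i = R.i + T.val  or  R1.i = R.i - T.val, then R.s = R1.s
            return R(i + t_val if op == "+" else i - t_val)
        return i                     # R -> ε : R.s = R.i

    result = R(T())                  # E -> T {R.i = T.val} R {E.val = R.s}
    if pos != len(tokens):
        raise SyntaxError(f"unexpected input at position {pos}")
    return result

print(evaluate("9-5+2"))
```

Passing the running total down through R.i keeps the operators left-associative even though the grammar is right-recursive.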
47
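Parse tree for 9-5+2
[Figure: parse tree for 9-5+2 annotated with R.i = T.val and E.val = R.s; the T nodes have val = 9, 5 and 2, and the last R derives ε with R.s = R.i]
48
[Figure: the same parse tree after evaluation, with E.val = R.s = 6]
49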
Removal of left recursion
Suppose we have translation scheme:
A → A1 Y {A = g(A1,Y)}
A→X {A = f(X)}
50
Bottom up evaluation of
inherited attributes
• Remove embedded actions from
translation scheme
51
Therefore,
E → T R
R → + T {print(+)} R
R → - T {print(-)} R
R → ε
T → num {print(num.val)}
transforms to
E → T R
R → + T M R
R → - T N R
R → ε
T → num {print(num.val)}
M → ε {print(+)}
N → ε {print(-)}
52
Inheriting attribute on parser stacks
• bottom up parser reduces rhs of A → XY by removing XY from stack
and putting A on the stack
L → id {addtype(id.entry, L.in)}
53
State stack    INPUT         PRODUCTION
-              real p,q,r
real           p,q,r
T              p,q,r         T → real
T p            ,q,r
T L            ,q,r          L → id
T L ,          q,r
T L , q        ,r
T L            ,r            L → L , id
T L ,          r
T L , r        -
T L            -             L → L , id
D              -             D → T L
54
Example …
• Every time a reduction to L is made, the value of T.type is
just below it
55
Example …
Therefore, the translation scheme
becomes
D → T L
T → int     val[top] = integer
T → real    val[top] = real
L → id      addtype(val[top], val[top-1])
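The stack positions can be checked by replaying the moves for real p,q,r on a value stack. The moves are written out by hand rather than produced by a parser. The rule for L → L , id is not listed above; the sketch assumes it reads addtype(val[top], val[top-3]), since T.type then sits three slots below the identifier. The names symtab and replay are hypothetical.

```python
symtab = {}                              # identifier -> type

def addtype(name, typ):
    symtab[name] = typ

def replay(moves):
    val = []                             # value stack
    for kind, arg in moves:
        if kind == "shift":
            val.append(arg)
        elif arg == "T->real":
            val[-1] = "real"             # val[top] = real
        elif arg == "T->int":
            val[-1] = "integer"          # val[top] = integer
        elif arg == "L->id":
            addtype(val[-1], val[-2])    # val[top], val[top-1]
            val[-1] = None               # L itself carries no value
        elif arg == "L->L,id":
            addtype(val[-1], val[-4])    # assumed rule: val[top], val[top-3]
            del val[-2:]                 # pop "," and id; L stays on top
        elif arg == "D->TL":
            del val[-1:]                 # T L is replaced by D
    return val

replay([
    ("shift", "real"), ("reduce", "T->real"),
    ("shift", "p"), ("reduce", "L->id"),
    ("shift", ","), ("shift", "q"), ("reduce", "L->L,id"),
    ("shift", ","), ("shift", "r"), ("reduce", "L->L,id"),
    ("reduce", "D->TL"),
])
print(symtab)
```

No copy of L.in is ever stored: every lookup reaches down to the slot that already holds T.type.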
56
Simulating the evaluation of
inherited attributes
• The scheme works only if grammar allows position of
attribute to be predicted.
S → a A C      C.i = A.s
S → b A B C    C.i = A.s
C → c          C.s = g(C.i)
• C inherits A.s
57
Simulating the evaluation …
• Insert a marker M just before C in the second
rule and change rules to
S → a A C        C.i = A.s
S → b A B M C    M.i = A.s; C.i = M.s
C → c            C.s = g(C.i)
M → ε            M.s = M.i
58
Simulating the evaluation …
• Markers can also be used to simulate
rules that are not copy rules
S → a A C      C.i = f(A.s)
• using a marker
60
Algorithm …
• If the reduction is to a marker Mj and
the marker belongs to a production
A.i is in position top-2j+2
X1.i is in position top-2j+3
X1.s is in position top-2j+4
61
Space for attributes at
compile time
• Lifetime of an attribute begins when it
is first computed
62
Example
• Consider following definition
D → T L         L.in = T.type
T → real        T.type = real
T → int         T.type = int
L → L1 , I      L1.in = L.in; I.in = L.in
L → I           I.in = L.in
I → I1 [num]    I1.in = array(numeral, I.in)
I → id          addtype(id.entry, I.in)
63
Consider string int x[3], y[5]
its parse tree and dependence graph
[Figure: parse tree and dependence graph for int x[3], y[5], with the nodes of the dependence graph numbered 1 to 9]
64
Resource requirement
node        1   2   3   4   5   6   7   8   9
resource    R1  R1  R2  R3  R2  R1  R1  R2  R1
66
Example
• Consider rule B → B1 B2 with inherited
attribute ps and synthesized attribute ht
[Figure: successive contents of a single attribute stack while B → B1 B2 is evaluated, holding B.ps, B1.ps, B1.ht, B2.ps and B2.ht]
67
Example …
• However, if different stacks are maintained
for the inherited and synthesized attributes,
the stacks will normally be smaller
[Figure: successive stack contents when the inherited attribute B.ps and the synthesized attributes B1.ht and B2.ht are kept on separate stacks]
68
Type system
• A type is a set of values
69
Type system …
• Languages can be divided into three
categories with respect to the type:
– “untyped”
• No type checking needs to be done
• Assembly languages
– Statically typed
• All type checking is done at compile time
• Algol class of languages
• Also called strongly typed
– Dynamically typed
• Type checking is done at run time
• Mostly functional languages like Lisp, Scheme etc.
70
Type systems …
• Static typing
– Catches most common programming errors at
compile time
– Avoids runtime overhead
– May be restrictive in some situations
– Rapid prototyping may be difficult
72
Type system and type checking
• If both the operands of arithmetic operators +, -, x are
integers then the result is of type integer
73
Type expression
• Type of a language construct is denoted by a
type expression
74
Type Constructors
• Array: if T is a type expression then array(I,
T) is a type expression denoting the type of an
array with elements of type T and index set I
75
Type constructors …
• Records: it applies to a tuple formed from field names and
field types. Consider the declaration
76
Type constructors …
• Pointer: if T is a type expression then
pointer( T ) is a type expression denoting type
pointer to an object of type T
• Function: function maps domain set to range
set. It is denoted by type expression D → R
77
Specifications of a type
checker
• Consider a language which
consists of a sequence of
declarations followed by a single
expression
P→D;E
D → D ; D | id : T
T → char | integer | array [ num] of T | ^ T
E → literal | num | id | E mod E | E [E] | E ^
78
Specifications of a type
checker …
• A program generated by this grammar is
key : integer;
key mod 1999
• Assume following:
– basic types are char, integer, type_error
– all arrays start at 1
– array[256] of char has type expression
array(1 .. 256, char)
79
Rules for Symbol Table entry
D → id : T                 addtype(id.entry, T.type)
T → char                   T.type = char
T → integer                T.type = integer
T → ^T1                    T.type = pointer(T1.type)
T → array [ num ] of T1    T.type = array(1..num, T1.type)
80
Type checking for expressions
E → literal E.type = char
E → num E.type = integer
E → id E.type = lookup(id.entry)
E → E1 mod E2 E.type = if E1.type == integer and
E2.type==integer then integer else type_error
E → E1[E2] E.type = if E2.type==integer and
E1.type==array(s,t) then t else type_error
E → E1^ E.type = if E1.type==pointer(t) then t
else type_error
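The rules above translate almost line for line into a recursive function over expression trees. In this sketch, type expressions are strings for basic types and tuples for constructed types, expression nodes are tuples tagged with the production they come from, and an undeclared identifier yields type_error. All of these representation choices, and the name check, are assumptions made for illustration.

```python
def check(expr, symtab):
    """Return the type expression of expr, or "type_error"."""
    kind = expr[0]
    if kind == "literal":                      # E -> literal
        return "char"
    if kind == "num":                          # E -> num
        return "integer"
    if kind == "id":                           # E -> id : lookup(id.entry)
        return symtab.get(expr[1], "type_error")
    if kind == "mod":                          # E -> E1 mod E2
        t1, t2 = check(expr[1], symtab), check(expr[2], symtab)
        return "integer" if t1 == t2 == "integer" else "type_error"
    if kind == "index":                        # E -> E1[E2]
        t1, t2 = check(expr[1], symtab), check(expr[2], symtab)
        if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
            return t1[2]                       # array(s, t) gives t
        return "type_error"
    if kind == "deref":                        # E -> E1^
        t1 = check(expr[1], symtab)
        if isinstance(t1, tuple) and t1[0] == "pointer":
            return t1[1]                       # pointer(t) gives t
        return "type_error"
    raise ValueError(f"unknown expression kind: {kind}")

symtab = {
    "key": "integer",
    "buf": ("array", (1, 256), "char"),        # array(1..256, char)
    "p": ("pointer", "integer"),
}
print(check(("mod", ("id", "key"), ("num", 1999)), symtab))
```

Returning type_error as an ordinary type lets an error in a subexpression propagate upward without special handling.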
81
Type checking for statements
• Statements typically do not have values. Special basic
type void can be assigned to them.
82
Equivalence of Type expression
• Structural equivalence: Two type
expressions are equivalent if
• either these are same basic types
• or these are formed by applying same
constructor to equivalent types
• Name equivalence: types can be given
names
• Two type expressions are equivalent if they
have the same name
83
Function to test structural equivalence
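A minimal sketch of such a function, following the definition of structural equivalence: two type expressions are equivalent if they are the same basic type, or if they apply the same constructor to equivalent arguments. Basic types are strings and constructed types are tuples whose first element is the constructor; this representation and the name sequiv are assumptions of the sketch. It does not handle cyclic type graphs, on which it would recurse forever.

```python
def sequiv(s, t):
    """Structural equivalence of two type expressions."""
    # Basic types, and other atoms such as an index range, compare by equality.
    if isinstance(s, str) or isinstance(t, str):
        return s == t
    # Constructed types: same constructor and same number of arguments ...
    if s[0] != t[0] or len(s) != len(t):
        return False
    # ... applied to pairwise equivalent arguments.
    return all(sequiv(a, b) for a, b in zip(s[1:], t[1:]))

a = ("pointer", ("array", "1..10", "char"))
b = ("pointer", ("array", "1..10", "char"))
c = ("pointer", ("array", "1..20", "char"))
print(sequiv(a, b), sequiv(a, c))
```

Here the index set of an array takes part in the comparison; a checker that ignores array bounds would skip that argument instead.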
84
Efficient implementation
• Bit vectors can be used to represent
type expressions. Refer to: A Tour Through
the Portable C Compiler: S. C. Johnson, 1979.
Basic type    Encoding
Boolean       0000
Char          0001
Integer       0010
Real          0011
Type constructor    Encoding
pointer             01
array               10
function            11
85
Efficient implementation …
Type expression encoding
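A sketch of how a type expression could be packed into a bit vector using the basic-type and constructor codes from the previous slide. The layout is an assumption: the basic type occupies the lowest four bits and each constructor takes a two-bit field above it, with the innermost constructor in the lowest field. The function name encode is hypothetical.

```python
BASIC = {"boolean": 0b0000, "char": 0b0001, "integer": 0b0010, "real": 0b0011}
CONSTRUCTOR = {"pointer": 0b01, "array": 0b10, "function": 0b11}

def encode(basic, constructors=()):
    """Pack a type expression into an integer.

    constructors are listed innermost first, for example
    ("function", "pointer") for pointer(function(char)).
    """
    bits = 0
    for depth, name in enumerate(constructors):
        bits |= CONSTRUCTOR[name] << (2 * depth)   # two bits per constructor
    return (bits << 4) | BASIC[basic]              # basic type in the low 4 bits

# array(pointer(function(char))) with three constructor fields
code = encode("char", ("function", "pointer", "array"))
print(format(code, "010b"))
```

With this encoding, two types built from unary constructors are structurally equivalent exactly when their bit vectors are equal, so the test is a single integer comparison.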
86
Checking name equivalence
• Consider following declarations
type link = ^cell;
var next, last : link;
    p, q, r : ^cell;
87
Name equivalence …
variable type expression
next link
last link
p pointer(cell)
q pointer(cell)
r pointer(cell)
88
Name equivalence …
• Some compilers allow type expressions to have names.
• However, some compilers assign implicit type names to
each declared identifier in the list of variables.
• Consider
type link = ^ cell;
var next : link;
last : link;
p : ^ cell;
q : ^ cell;
r : ^ cell;
89
Name equivalence …
The code is similar to
type link = ^ cell;
np = ^ cell;
nq = ^ cell;
nr = ^ cell;
var next : link;
last : link;
p : np;
q : nq;
r : nr;
90
Cycles in representation of
types
• Data structures like linked lists are defined recursively
91
Cycles in representation of …
• Recursively defined type names can be substituted by
definitions
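[Figure: two representations of the type cell as a record node, one in which the recursive reference is a leaf labelled cell and one in which it is an edge back to the record node, forming a cycle]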
92
Cycles in representation of …
• C uses structural equivalence for all types
except records
93
Type conversion
• Consider expression like x + i where x is of
type real and i is of type integer
94
Type conversion …
• Usually conversion is to the type of the left hand
side
95
Type checking for expressions
E → num E.type = int
E → num.num E.type = real
E → id E.type = lookup( id.entry )
96
Overloaded functions and
operators
• Overloaded symbol has different meaning
depending upon the context
97
Overloaded functions and
operators …
• In Ada standard interpretation of * is
multiplication
98
Overloaded function resolution
• Suppose only possible type for 2, 3 and 5 is
integer and Z is a complex variable
– then 3*5 is either integer or complex depending
upon the context
– in 2*(3*5)
3*5 is integer because 2 is integer
– in Z*(3*5)
3*5 is complex because Z is complex
99
Type resolution
• Try all possible types of each overloaded
function (possible but brute force method!)
100
Determining set of possible
types
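E' → E         E'.types = E.types
E → id         E.types = lookup(id)
E → E1(E2)     E.types = { t | there exists an s in E2.types and
               s → t is in E1.types }
[Figure: possible types for 3 * 5. The leaves 3 and 5 have types {i}; * has types {i×i→i, i×i→c, c×c→c}; the whole expression E has types {i, c}.]
101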
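The rule for E → E1(E2) is a set comprehension, which the sketch below writes directly. Function types are represented as (argument, result) pairs, the argument of a binary operator is a pair of operand types, and i and c abbreviate integer and complex; these representation choices and the name apply_types are assumptions made for illustration.

```python
def apply_types(fn_types, arg_types):
    """E -> E1(E2): { t | there is an s in E2.types with s -> t in E1.types }"""
    return {t for (s, t) in fn_types if s in arg_types}

# The three meanings of * : i x i -> i, i x i -> c, c x c -> c
star = {(("i", "i"), "i"), (("i", "i"), "c"), (("c", "c"), "c")}

three = {"i"}
five = {"i"}
# Possible argument types for 3 * 5: every pairing of the operand types
args = {(a, b) for a in three for b in five}
types_3x5 = apply_types(star, args)

# In Z * (3 * 5) with Z complex, only the complex reading survives
z = {"c"}
types_z = apply_types(star, {(a, b) for a in z for b in types_3x5})
print(types_3x5, types_z)
```

The first result keeps both candidates for 3 * 5; the context supplied by Z then leaves a single type for the whole expression.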
Narrowing the set of possible
types
• Ada requires a complete expression to
have a unique type
102
Narrowing the set of …
E' → E     E'.types = E.types
           E.unique = if E'.types == {t} then t
           else type_error
E → id     E.types = lookup(id)
103
Polymorphic functions
• Functions can be invoked with arguments of
different types
104
Polymorphic functions …
• Strongly typed languages can make programming very
tedious
105
Type variables
• Variables can be used in type expressions to represent
unknown types
106
• Consider
function deref(p);
begin
return p^
end;
• When the first line of the code is seen nothing is known about
type of p
– Represent it by a type variable
107
Reading assignment
• Rest of section 6.6 and section 6.7 of
Aho, Sethi and Ullman
108