22-Introduction to Semantic Analysis

The document provides an introduction to semantic analysis in programming languages, highlighting its role as the final phase of the front-end compiler process that checks for errors not caught by lexical analysis or parsing. It discusses the importance of scope, symbol tables, and type checking, particularly in the context of the COOL programming language. Additionally, it covers the concepts of static and dynamic typing, type inference, and the rules for type checking, emphasizing the need for sound rules in type systems.

Introduction to Semantic Analysis

Arles Rodríguez

Facultad de Ciencias
Departamento de Matemáticas
Universidad Nacional de Colombia
Motivation
• Lexical Analysis:
– Detects inputs with illegal tokens
• Parsing:
– Detects inputs with ill-formed parse trees.
– Ill-formed means not conforming to the rules of a
given language.
• Semantic Analysis:
– Last “front end” phase.
– Last line of defense that catches all remaining errors.
Motivation
• Parsing cannot catch some errors.
• Some language constructs are not context-free.
– What are some examples?
• Identifiers must be declared before they are used in a program.
• Inheritance relationships must be valid.
• Scope rules.
• …
Motivation
• In coolc, semantic analysis checks that:
– Scope restrictions on the identifiers
– All identifiers are declared
– Type checking
– Inheritance relationships should make sense
– Classes are defined only once
– Methods in a class are defined only once
– Reserved identifiers are not misused
– …
Scope
Scope
• Scope analysis matches identifier declarations with their uses:
– A variable x may have more than one definition in the program, so we must determine which definition each use refers to.
– This is required in most programming languages, including COOL.
Scope: COOL example 1
• Given:

let y : String <- "ABC" in y + 3

is this possible? No, because y is a String.


Scope: COOL example 2
• Given:

let y : Int <- … in x + 3

is this possible? It depends on the definition of x.

– If x is undeclared or undefined, we must detect that.
Scope definition
• Scope of an identifier is the portion of the
program in which that identifier is accessible.
• The same identifier may refer to different
things in different parts of the program.
– Different scopes for the same name don’t overlap.
• An identifier may have restricted scope.
– E.g. a local variable in a method.
Scope
• Most languages have static scope:
– Scope depends only on the program text, not on run-time behavior.
– Cool has static scope.
• A few languages are dynamically scoped:
– Scope depends on the execution of the program.
– LISP was originally dynamically scoped, but it has changed to mostly static scoping.
Static Scope example
• Given:
let x : Int <- 0 in {
    x;
    let x : Int <- 1 in
        x;
    x;
}
Static Scope example
• Which definition of x does each use of x refer to?
– The first and last uses of x in the block refer to the outer definition (x : Int <- 0).
– The use of x in the body of the inner let refers to the inner definition (x : Int <- 1).
Static Scope
• We use the most closely nested rule.
• A variable binds to the most closely enclosing definition of the same name.
let x : Int <- 0 in {
    x;
    let x : Int <- 1 in
        x;
    x;
}
Dynamic scope example
• A dynamically scoped variable refers to the closest enclosing binding in the execution of the program.
• Given the following code and dynamic scoping:
g(y) = let a <- 4 in f(3);
f(x) = a;
What value does f return when it is called from g? It returns 4.
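The dynamic-scope example can be simulated with an explicit stack of run-time bindings. This is only a sketch: the names `bindings` and `lookup` are illustrative assumptions, and the functions `f` and `g` mirror the two definitions on the slide.

```python
# Dynamic scoping simulated with an explicit stack of run-time bindings.
# `bindings` and `lookup` are hypothetical helpers, not part of COOL.
bindings = []  # the most recent binding is at the end


def lookup(name):
    """Return the value of the most recent run-time binding of `name`."""
    for bound_name, value in reversed(bindings):
        if bound_name == name:
            return value
    raise NameError(f"unbound identifier: {name}")


def f(x):
    # f has no local `a`; under dynamic scoping it sees the caller's binding.
    return lookup("a")


def g(y):
    bindings.append(("a", 4))  # let a <- 4 in ...
    try:
        return f(3)
    finally:
        bindings.pop()  # the binding disappears when the let ends


result = g(0)
```

Calling `f` directly, outside of `g`, raises `NameError` in this sketch because no binding of `a` is active at that point in the execution.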
Cool identifier bindings
• Cool identifier bindings are introduced by:
– Class declarations (introduce class names)
– Method definitions (introduce method names)
– Let expressions (introduce object id’s)
– Formal parameters (introduce object id’s)
– Attribute definitions (introduce object id’s)
– Case expressions (introduce object id’s)
Scope
• Not all identifiers follow the most closely nested rule.
• In COOL, class definitions cannot be nested.
• Classes are globally visible throughout the program.
• A class name can be used before it is defined.
Scope
• In COOL a class name can be used before it is defined.
class Foo {
    let y : Bar in …
}
class Bar {
    …
}
Scope
• In COOL attribute names are global within the class in which they are defined.
class Foo {
    f(): Int { a };
    a : Int <- 0;
}
class Bar {
    …
}
• Is anything wrong with this code? No, it is legal: attribute definitions usually come before method definitions, but this is not required.
Scope
• Method names have complex rules:
– A method does not have to be defined in the class in which it is used.
– It could be defined in some parent class.
– Methods can be redefined (Overridden).
Exercise
Symbol tables
Motivation symbol tables
• A lot of Semantic Analysis and Code
generation can be expressed as a
recursive descent of an AST.
• In each step:
– Before: Process an AST node n
– Recurse: Process the children of n
– After: Finish processing the AST node n
• When performing semantic analysis
on a portion of the AST, we need to
know which identifiers are defined.
Example: recursive descent strategy
• Scope of let bindings is one subtree of the AST:

let x: Int <- 0 in e

• x is defined in subtree e
[Figure: AST for the let node with subtrees init and e, each annotated with its symbol table; x appears in the table for e.]
Symbol Table
• Given: let x: Int <- 0 in e
• Idea:
– Before processing e, add definition of x to current
definitions, overriding any other definition of x
– Recurse
– After processing e, remove definition of x and
restore old definition of x
• A symbol table is a data structure that tracks
the current bindings of identifiers.
Simple Symbol table implementation
• For a simple symbol table, we can use a stack.
• Operations:
– add_symbol(x): push x and associated info such as
type of x on the stack.
– find_symbol(x): search the stack, starting from the top, for x. Returns the first x found, or NULL if none is found.
– remove_symbol(): pop symbol from stack.
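The three stack operations above can be sketched as follows. The class name `SymbolTable` and the use of a Python list as the stack are assumptions made for illustration; `None` plays the role of NULL.

```python
class SymbolTable:
    """Stack-based symbol table: the newest binding shadows older ones."""

    def __init__(self):
        self._stack = []  # list of (name, info) pairs; the top is the end

    def add_symbol(self, name, info):
        self._stack.append((name, info))

    def find_symbol(self, name):
        # Search from the top so the innermost definition wins.
        for bound_name, info in reversed(self._stack):
            if bound_name == name:
                return info
        return None  # plays the role of NULL

    def remove_symbol(self):
        return self._stack.pop()


table = SymbolTable()
table.add_symbol("x", "Int")      # outer let x : Int
table.add_symbol("x", "String")   # inner let x : String shadows it
inner = table.find_symbol("x")
table.remove_symbol()             # leave the inner let
outer = table.find_symbol("x")
```

Inside the inner let the lookup sees the inner binding; after `remove_symbol()` the outer binding is visible again, which is the restore step described for let.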
Simple symbol table
• Works for let because:
– Symbols are added one at a time.
– Declarations are perfectly nested.
[Figure: three nested let expressions, let x enclosing let y enclosing let z.]
Symbol tables
• To be more robust, some operations are added to the symbol table implementation:
– enter_scope(): starts a new nested scope.
– exit_scope(): exits the current scope.
– check_scope(x): returns true if x is already defined in the current scope. This lets us detect double definitions such as:
f(x: Int, x: Int){…
• For lab 3, this implementation is supplied.
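A minimal sketch of a scoped symbol table follows. It is not the implementation supplied for the lab; the class `ScopedSymbolTable` and the helper `declare_formals` are hypothetical names, and each scope is assumed to be a dictionary.

```python
class ScopedSymbolTable:
    """Symbol table as a stack of scopes, each scope a dict name -> info."""

    def __init__(self):
        self._scopes = [{}]  # start with one outermost scope

    def enter_scope(self):
        self._scopes.append({})

    def exit_scope(self):
        self._scopes.pop()

    def add_symbol(self, name, info):
        self._scopes[-1][name] = info

    def check_scope(self, name):
        # True only if `name` is defined in the innermost scope.
        return name in self._scopes[-1]

    def find_symbol(self, name):
        # Innermost scope first, then outward.
        for scope in reversed(self._scopes):
            if name in scope:
                return scope[name]
        return None


def declare_formals(table, formals):
    """Open a scope, add the formals, and return the names declared twice."""
    table.enter_scope()
    duplicates = []
    for name, type_name in formals:
        if table.check_scope(name):
            duplicates.append(name)
        else:
            table.add_symbol(name, type_name)
    return duplicates


table = ScopedSymbolTable()
dups = declare_formals(table, [("x", "Int"), ("x", "Int")])  # f(x: Int, x: Int)
```

`check_scope` looks only at the innermost scope, so shadowing an outer x stays legal while a repeated formal parameter is reported.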
Symbol tables: class names
• Class names can be used before being defined.
• We cannot check class names in a single pass or
using a symbol table.
• Solution: Perform two passes
– Pass 1: gather all class names.
– Pass 2: do the checking.
• Semantic analysis requires multiple passes (probably more than two).
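The two-pass idea can be sketched as below. The input format (a list of class names paired with the class names each one mentions) and the function name `undefined_class_uses` are assumptions for illustration; built-in classes such as Object are ignored here.

```python
def undefined_class_uses(classes):
    """classes: list of (class_name, [names of classes it refers to]).

    Returns (user, used) pairs where `used` is never defined anywhere.
    """
    # Pass 1: gather all class names, regardless of definition order.
    defined = {name for name, _ in classes}
    # Pass 2: check every use against the complete set of names.
    errors = []
    for name, uses in classes:
        for used in uses:
            if used not in defined:
                errors.append((name, used))
    return errors


program = [
    ("Foo", ["Bar"]),  # Foo uses Bar before Bar is defined: legal
    ("Bar", ["Baz"]),  # Baz is never defined: error
]
errors = undefined_class_uses(program)
```

A single pass would wrongly reject the use of Bar inside Foo, because Bar has not been seen yet when Foo is processed.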
Types
Introduction to types
• What is a type?
– The notion varies from language to language
• Consensus
– A type is a set of values and a set of operations on those values.
– E.g., int: +, -, <=, <; string: concat, isnull
• Classes are one instantiation of the modern
notion of types.
Types: assembly
• Consider the assembly language fragment:
add $r1, $r2, $r3

• What are the types of $r1, $r2, $r3?
– In assembly, they could represent values of any type.
– They are just registers holding zeros and ones.
– add produces a bit pattern and saves it in $r1.
Types: operations
• Certain operations are legal for values of each type. For example:
– It makes sense to add two integers.
– It does not make sense to add a function pointer and an integer in C.
– Both have the same assembly language implementation.
Language types
• A language’s type system specifies which
operations are valid for which types.
• Goal of type checking is to ensure that
operations are used only with the correct
types.
– Enforces intended value interpretations.
Three kinds of languages
• Statically typed: All or almost all checking of types is done as part of compilation (C, Java, COOL).
Java example:
int number=5;
numbr = (number+15)/2;
• Dynamically typed: Almost all checking of types is done as part of program execution (JavaScript, Python, Perl).
Python example:
number=5;
numbr = (number+15)/2;
• Untyped: No type checking (machine code).
Debate: static vs dynamic
Static typing:
• Static checking catches many programming errors at compile time.
• Avoids overhead of runtime checks.
Dynamic typing:
• Static type systems are restrictive.
• Rapid prototyping is difficult within a static type system.
Types
• A lot of code is written in statically typed
languages with an “escape mechanism”
– Unsafe casts in C++/Java
• People retrofit static typing onto dynamically typed languages:
– For optimization and debugging.
• It is debatable whether either compromise
represents the best or the worst of both
worlds.
Cool Types
• Types in Cool are:
– Class Names
– SELF_TYPE
• The user declares the type of each identifier, then the compiler does the rest of the work:
– The compiler infers a type for every expression.
Processing computer types
• There are two different processes:
– Type checking: the process of verifying fully typed programs.
• We have an abstract syntax tree with types filled in on every node.
• We look at each node and confirm that the types are correct in that part of the tree.
– Type inference: the process of filling in missing type information.
• We have an abstract syntax tree with no types, or with types only in a few key locations.
• We fill in the missing types, checking along the way that the types are correct.
Type checking in COOL
• Formal notations for specifying parts of a compiler:
– Regular expressions
– Context-free grammars.
• The appropriate formalism for type checking is
logical rules of inference.
Type checking: inference rules
• Inference rules have the form:
– If hypothesis is true, then conclusion is true
• Type checking computes via reasoning:
– If $e_1$ and $e_2$ have certain types, then $e_3$ has a certain type.
• Inference rules are a compact notation for “If-Then” statements.
Type checking notation
• The notation is easy to read with practice.
• Start with simplified system and gradually add
features.
• Building blocks are:
– Symbol $\wedge$ is “and”
– Symbol $\Rightarrow$ is “if-then”
– $x : T$ is “x has type T”
Type checking example
• If $e_1$ has type Int and $e_2$ has type Int, then $e_1 + e_2$ has type Int
• This can be reduced to a mathematical statement:
($e_1$ has type Int $\wedge$ $e_2$ has type Int) $\Rightarrow$ ($e_1 + e_2$ has type Int)
$(e_1 : \text{Int} \wedge e_2 : \text{Int}) \Rightarrow e_1 + e_2 : \text{Int}$
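Type checking example
• The statement:
$(e_1 : \text{Int} \wedge e_2 : \text{Int}) \Rightarrow e_1 + e_2 : \text{Int}$
• Is a special case of:
$\text{Hypothesis}_1 \wedge \dots \wedge \text{Hypothesis}_n \Rightarrow \text{Conclusion}$
• This is an inference rule
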
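Type checking example
• By tradition inference rules are written:
$$\frac{\vdash \text{Hypothesis}_1 \quad \dots \quad \vdash \text{Hypothesis}_n}{\vdash \text{Conclusion}}$$
• $\vdash$ means “it is provable that …”
• Cool type rules have hypotheses and conclusions of the form $\vdash e : T$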
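Type checking cool example
• Rule for integer literals:
$$\frac{i \text{ is an integer literal}}{\vdash i : \text{Int}} \;[\text{Int}]$$
• Rule for addition:
$$\frac{\vdash e_1 : \text{Int} \quad \vdash e_2 : \text{Int}}{\vdash e_1 + e_2 : \text{Int}} \;[\text{Add}]$$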
Type checking
• These rules give templates describing how to
type integers and + expressions
• By filling the templates, we can produce
complete typing for expressions.
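As an illustration of filling in the templates, the derivation below types the expression 1 + 2. The choice of the literals 1 and 2 is only an example; each leaf instantiates the [Int] template and the root instantiates [Add].

```latex
\[
\frac{
  \dfrac{1 \text{ is an integer literal}}{\vdash 1 : \text{Int}}\;[\text{Int}]
  \qquad
  \dfrac{2 \text{ is an integer literal}}{\vdash 2 : \text{Int}}\;[\text{Int}]
}{
  \vdash 1 + 2 : \text{Int}
}\;[\text{Add}]
\]
```

The shape of the derivation matches the shape of the expression: one rule instance per node.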
Type checking: sound rules
• A type system is sound if
– Whenever $\vdash e : T$
– If we run the program, then $e$ evaluates to a value of type $T$
• We only want sound rules:
– Some sound rules are better than others.
– $\dfrac{i \text{ is an integer literal}}{\vdash i : \text{Object}}$ is true but it is not at all useful
– We want the most specific type that we can derive
Exercise
Type checking summary
• Type checking proves facts e:T
– Proof is on the structure of the AST
– Proof has the shape of the AST
– One type rule is used for each AST node.
• In the type rule used for a node e:
– Hypotheses are the proofs of types of e’s
subexpressions.
– Conclusion is the type of e.
• Types are computed in a bottom-up pass over the
AST.
Type environments
• Type rule for constant false
$$\frac{}{\vdash \text{false} : \text{Bool}} \;[\text{False}]$$
• Type rule for string literal
$$\frac{s \text{ is a string literal}}{\vdash s : \text{String}} \;[\text{String}]$$
Type environments
• Type rule for expression new T
$$\frac{}{\vdash \text{new } T : T} \;[\text{New}]$$
• SELF_TYPE is ignored for now.


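Type environments
• Type rule for expression not
$$\frac{\vdash e : \text{Bool}}{\vdash\, !e : \text{Bool}} \;[\text{Not}]$$
• Type rule for while loop
$$\frac{\vdash e_1 : \text{Bool} \quad \vdash e_2 : T}{\vdash \text{while } e_1 \text{ loop } e_2 \text{ pool} : \text{Object}} \;[\text{Loop}]$$
– We don’t care about the type of $e_2$ because the type of the entire expression is Object.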
Type environments
• What is the type of a variable reference?
$$\frac{x \text{ is a variable}}{\vdash x : \;?} \;[\text{Var}]$$
• We don’t have enough information to give x a type.
• All the information used by an inference rule needs to be local.
• There are no external structures.
Type environments
• We can put more information in the rules
• A type environment gives types for free
variables.
– A type environment is a function from
ObjectIdentifiers to Types
– A variable is free in an expression if it is not
defined within the expression
• In expression x + y, x and y are free variables.
• In let y <- … in x + y, y is bound and x is free
Type Environment
• Let O be a function from ObjectIdentifiers to Types
• The sentence $O \vdash e : T$ is read:
– Under the assumption that free variables have the types given by O, it is provable that the expression $e$ has the type $T$
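Type environments
• The type environment is added to earlier rules:
$$\frac{i \text{ is an integer literal}}{O \vdash i : \text{Int}} \;[\text{Int}]$$
$$\frac{O \vdash e_1 : \text{Int} \quad O \vdash e_2 : \text{Int}}{O \vdash e_1 + e_2 : \text{Int}} \;[\text{Add}]$$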
Type environments
• We can write new rules
$$\frac{O(x) = T}{O \vdash x : T} \;[\text{Var}]$$
• We use the following notation, where $O[T/x]$ is $O$ modified to map $x$ to $T$:
$O[T/x](x) = T$
$O[T/x](y) = O(y)$ for $y \neq x$
$$\frac{O[T_0/x] \vdash e_1 : T_1}{O \vdash \text{let } x : T_0 \text{ in } e_1 : T_1} \;[\text{Let-No-Init}]$$
– Before we type check $e_1$, we need to include a new assumption about x.
– When we leave type checking $e_1$, we remove the assumption about x.
Type environment
$$\frac{O[T_0/x] \vdash e_1 : T_1}{O \vdash \text{let } x : T_0 \text{ in } e_1 : T_1}$$
• This type environment is really implemented by a symbol table
Exercise
Summary: type environments
• A type environment gives types to the free
identifiers in the current scope.
• Type environment is passed down the AST
from the root towards the leaves.
• Types are computed up the AST from the
leaves towards the root.
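One way to picture the environment being passed down is to model O as a dictionary and the update O[T/x] as a copy with one entry changed. This is a sketch under that assumption; `extend` and `type_of_var` are hypothetical helper names.

```python
def extend(env, name, type_name):
    """Return O[T/x]: a copy of env in which `name` maps to `type_name`."""
    updated = dict(env)
    updated[name] = type_name
    return updated


def type_of_var(env, name):
    """[Var]: the type of x is O(x); fail if x is not in the environment."""
    if name not in env:
        raise TypeError(f"undeclared identifier: {name}")
    return env[name]


outer = {"x": "Int"}                  # O
inner = extend(outer, "x", "String")  # O[String/x], used for a let body
```

Because `extend` copies the dictionary, the caller's environment is untouched, so nothing has to be undone when type checking of the let body finishes. A stack-based symbol table reaches the same effect by pushing and popping.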
Subtyping
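Subtyping
• Consider let with initialization
$$\frac{O \vdash e_0 : T_0 \quad O[T_0/x] \vdash e_1 : T_1}{O \vdash \text{let } x : T_0 \leftarrow e_0 \text{ in } e_1 : T_1} \;[\text{Let-init}]$$
• The rule says that $e_0$ must have the same type as x
• There is no problem if $e_0$ has a type which is a subtype of $T_0$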
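Subtyping definition
• Define a relation $\leq$ called subtyping on classes:
– $X \leq X$
– $X \leq Y$ if X inherits from Y
– $X \leq Z$ if $X \leq Y$ and $Y \leq Z$
$$\frac{O \vdash e_0 : T \quad T \leq T_0 \quad O[T_0/x] \vdash e_1 : T_1}{O \vdash \text{let } x : T_0 \leftarrow e_0 \text{ in } e_1 : T_1} \;[\text{Let-init}]$$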
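The subtyping relation can be checked by walking up the inheritance chain. The class hierarchy in `PARENT` is a made-up example, and `is_subtype` is a hypothetical helper name.

```python
# Hypothetical inheritance map: class name -> parent class name.
PARENT = {"Object": None, "Int": "Object", "A": "Object", "B": "A", "C": "B"}


def is_subtype(x, y, parent=PARENT):
    """Return True if X <= Y, following inheritance links from x upward."""
    current = x
    while current is not None:
        if current == y:  # covers X <= X and, by walking up, transitivity
            return True
        current = parent[current]
    return False
```

For this hierarchy, `is_subtype("C", "A")` holds through B, while `is_subtype("A", "C")` does not.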
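Subtyping application ASSIGN
$$\frac{O(x) = T_0 \quad O \vdash e_1 : T_1 \quad T_1 \leq T_0}{O \vdash x \leftarrow e_1 : T_1} \;[\text{Assign}]$$
• The left-hand side is a variable $x$; $e_1$ is an expression
• To validate the expression, we need to look up the type of $x$; it is $T_0$
• In the same environment O, the type of $e_1$ is $T_1$
• For the assignment to be correct, we require $T_1$ to be a subtype of $T_0$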
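Subtyping application: Attribute initialization
• Let $O_C(x) = T$ for all attributes $x : T$ in class $C$
$$\frac{O_C(id) = T_0 \quad O_C \vdash e_1 : T_1 \quad T_1 \leq T_0}{O_C \vdash id : T_0 \leftarrow e_1;} \;[\text{Attr-init}]$$
• The left-hand side is an attribute $id$; $e_1$ is an expression
• To validate the expression, we need to look up the type of $id$; it is $T_0$
• In the same environment $O_C$, the type of $e_1$ is $T_1$
• We require $T_1$ to be a subtype of $T_0$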
Subtyping application: if and else
• Consider:
if $e_0$ then $e_1$ else $e_2$ fi
• The result can be either $e_1$ or $e_2$
• The type can be either $e_1$’s type or $e_2$’s type
• The best we can do is the smallest supertype larger than the types of $e_1$ and $e_2$
• We define this supertype as lub (least upper bound)
Subtyping: lub
• $lub(X, Y)$, the least upper bound of X and Y, is Z if:
– $X \leq Z \wedge Y \leq Z$ ($Z$ is an upper bound)
– $X \leq Z' \wedge Y \leq Z' \Rightarrow Z \leq Z'$ ($Z$ is least among upper bounds)
• $Z$ is the smallest of all possible upper bounds of $X$ and $Y$
• In COOL, the least upper bound of two types is their least common ancestor in the inheritance tree.
Subtyping: lub
• In COOL, the least upper bound of two types is their least common ancestor in the inheritance tree.
[Figure: inheritance tree rooted at Object with X and Y below it; lub(X, Y) is marked at their least common ancestor.]
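Finding the least common ancestor can be sketched as below. The hierarchy in `PARENT` is invented for the example, and `ancestors` and `lub` are hypothetical helper names.

```python
# Hypothetical inheritance map: class name -> parent class name.
PARENT = {"Object": None, "A": "Object", "B": "A", "C": "A", "D": "Object"}


def ancestors(cls, parent=PARENT):
    """Return [cls, parent of cls, ..., Object]."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = parent[cls]
    return chain


def lub(x, y, parent=PARENT):
    """Least common ancestor of x and y in the inheritance tree."""
    x_ancestors = set(ancestors(x, parent))
    for candidate in ancestors(y, parent):  # nearest ancestors of y first
        if candidate in x_ancestors:
            return candidate
    raise ValueError("classes do not share a root")
```

Walking y's chain from the bottom up guarantees that the first common class found is the least one. In this hierarchy `lub("B", "C")` is A and `lub("B", "D")` is Object.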
Exercise
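Subtyping If-then-else
$$\frac{O \vdash e_0 : \text{Bool} \quad O \vdash e_1 : T_1 \quad O \vdash e_2 : T_2}{O \vdash \text{if } e_0 \text{ then } e_1 \text{ else } e_2 \text{ fi} : lub(T_1, T_2)} \;[\text{If-Then-Else}]$$
• The predicate of the if has type Bool
• The branches of the if may have different types
• The type of the entire expression is the least upper bound of $T_1$ and $T_2$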
Subtyping: case in cool
$$\frac{O \vdash e_0 : T_0 \quad O[T_1/x_1] \vdash e_1 : T_1' \quad \dots \quad O[T_n/x_n] \vdash e_n : T_n'}{O \vdash \text{case } e_0 \text{ of } x_1 : T_1 \Rightarrow e_1; \dots; x_n : T_n \Rightarrow e_n; \text{ esac} : lub(T_1', \dots, T_n')} \;[\text{Case}]$$
• Case in COOL evaluates $e_0$ and looks at the runtime type of $e_0$.
• It then looks at the first branch and compares the runtime type of $e_0$ with the type $T_1$.
• It checks whether $T_1$ is the smallest of all the branch types that match the runtime type of $e_0$.
• If so, it binds $x_1$, with type $T_1$, to the value of $e_0$ and evaluates $e_1$.
• If not, it checks whether $T_2$ matches, and so on, and executes the selected branch with its variable of the declared type.
• The type of the entire expression is the least upper bound over all the branch types $T_1', \dots, T_n'$.
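The run-time branch selection of case can be sketched as picking the closest ancestor of the runtime type among the declared branch types. The hierarchy in `PARENT` and the names `ancestors` and `select_branch` are assumptions for this illustration.

```python
# Hypothetical inheritance map: class name -> parent class name.
PARENT = {"Object": None, "A": "Object", "B": "A", "C": "B"}


def ancestors(cls, parent=PARENT):
    """Return [cls, parent of cls, ..., Object]."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = parent[cls]
    return chain


def select_branch(runtime_type, branch_types, parent=PARENT):
    """Return the smallest branch type that the runtime type conforms to.

    Returns None when no branch matches.
    """
    for candidate in ancestors(runtime_type, parent):
        if candidate in branch_types:
            return candidate
    return None
```

With a runtime type of C and branches declared for Object and A, the A branch is chosen because it is the closer ancestor. The order in which the branches are written does not matter in this sketch.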
Typing methods and method calls
Typing methods
$$\frac{O \vdash e_0 : T_0 \quad O \vdash e_1 : T_1 \quad \dots \quad O \vdash e_n : T_n}{O \vdash e_0.f(e_1, \dots, e_n) : \;?} \;[\text{Dispatch}]$$
• The arguments are $e_1, \dots, e_n$
• What kind of value do we get back after calling this method?
• We do not know what f is: unless we have some information about f’s behavior, we cannot say what kind of value it is going to return.
Typing methods
• In COOL, method and object identifiers live in different name spaces.
– A method foo and an object foo can coexist in the same scope.
– There is a separate method environment M that records each method’s signature.
– $M(C, f) = (T_1, \dots, T_n, T_{n+1})$ means that in class C there is a method f
– Such as: $f(x_1 : T_1, \dots, x_n : T_n) : T_{n+1}$, where $T_{n+1}$ is the return type.
Typing methods
$$\frac{O, M \vdash e_0 : T_0 \quad O, M \vdash e_1 : T_1 \quad \dots \quad O, M \vdash e_n : T_n \quad M(T_0, f) = (T_1', \dots, T_n', T_{n+1}) \quad T_i \leq T_i' \text{ for } 1 \leq i \leq n}{O, M \vdash e_0.f(e_1, \dots, e_n) : T_{n+1}} \;[\text{Dispatch}]$$
• We have two mappings, one for object identifiers and another for method names.
• The method environment is propagated through the typing of all the subexpressions.
• We look up method f as defined in class $T_0$.
• We validate that the types of the arguments we are passing are subtypes of the declared formal parameter types.
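The checks performed by the dispatch rule can be sketched as follows. The method environment `M`, the method name `scale`, and the class hierarchy are invented for the example; as a simplification, M is assumed to already contain an entry for every class that has the method, so inherited methods are not searched for.

```python
# Hypothetical inheritance map: class name -> parent class name.
PARENT = {"Object": None, "Int": "Object", "String": "Object",
          "A": "Object", "B": "A"}

# Hypothetical method environment: M[(class, method)] = (T1', ..., Tn', Tn+1)
M = {("A", "scale"): ("Int", "Object", "Int")}


def is_subtype(x, y):
    while x is not None:
        if x == y:
            return True
        x = PARENT[x]
    return False


def check_dispatch(receiver_type, method, arg_types):
    """Return the type of e0.f(e1, ..., en), or raise TypeError."""
    signature = M.get((receiver_type, method))
    if signature is None:
        raise TypeError(f"no method {method} in class {receiver_type}")
    *formals, return_type = signature
    if len(formals) != len(arg_types):
        raise TypeError("wrong number of arguments")
    for actual, formal in zip(arg_types, formals):
        if not is_subtype(actual, formal):  # Ti <= Ti'
            raise TypeError(f"{actual} does not conform to {formal}")
    return return_type
```

Here `arg_types` stands for the types already computed for the argument expressions, so the function covers only the hypotheses that involve M and the subtype checks.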
Typing methods
$$\frac{O, M \vdash e_0 : T_0 \quad O, M \vdash e_1 : T_1 \quad \dots \quad O, M \vdash e_n : T_n \quad T_0 \leq T \quad M(T, f) = (T_1', \dots, T_n', T_{n+1}) \quad T_i \leq T_i' \text{ for } 1 \leq i \leq n}{O, M \vdash e_0@T.f(e_1, \dots, e_n) : T_{n+1}} \;[\text{Static Dispatch}]$$
• This is similar to Dispatch, but the programmer writes the name of the class in which they wish to run the method.
• Instead of running method f as defined in class $T_0$, we run the method f as defined in some ancestor class $T$ of $T_0$.
Exercise
Typing methods
• The method environment must be added to all rules.
• In most cases, M is passed down but not actually used.
– Only the dispatch rules use M
$$\frac{O, M \vdash e_1 : \text{Int} \quad O, M \vdash e_2 : \text{Int}}{O, M \vdash e_1 + e_2 : \text{Int}} \;[\text{Add}]$$
Typing methods: SELF_TYPE
• For some cases involving SELF_TYPE, we need
to know the class in which an expression
appears.
• The full type environment for COOL:
– The mapping O, giving types to object id’s
– A mapping M, giving types to methods.
– The current class C
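Typing methods
• The full form of a sentence in the logic is:
$O, M, C \vdash e : T$
• Example:
$$\frac{O, M, C \vdash e_1 : \text{Int} \quad O, M, C \vdash e_2 : \text{Int}}{O, M, C \vdash e_1 + e_2 : \text{Int}} \;[\text{Add}]$$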
Summary
• Rules given are COOL-specific
– Some other languages have very different rules
• There are some general themes in type checking:
– Type rules are defined on the structure of
expressions (inductive fashion).
– Types of variables are modeled by an environment.
• Type rules are very compact!
– There is a lot of information in these rules.
– They require careful reading to really be understood.
Implementing Type Checking
Implementing type checking
• COOL type checking can be implemented in a
single traversal over the AST.
• There are two phases here:
– Top-down phase: Type environment is passed
down the tree from parent to child.
– Bottom-up phase: Types are passed back up from
child to parent.
• Starting at the leaves, we use the environment to compute the type of each subexpression, working our way back up the tree to the root.
Implementing addition
$$\frac{O, M, C \vdash e_1 : \text{Int} \quad O, M, C \vdash e_2 : \text{Int}}{O, M, C \vdash e_1 + e_2 : \text{Int}} \;[\text{Add}]$$
• This can be implemented via recursion:
TypeCheck(Environment, e1 + e2) = {
    T1 = TypeCheck(Environment, e1);
    T2 = TypeCheck(Environment, e2);
    Check T1 == T2 == Int;
    return Int;
}
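The pseudocode for addition can be turned into a small runnable sketch. The AST node classes `IntLit`, `Var` and `Add` and the function `type_check` are assumptions for illustration, and the environment is reduced to a dictionary for O, leaving out M and C because the [Add] rule does not use them.

```python
from dataclasses import dataclass


@dataclass
class IntLit:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: object
    right: object


def type_check(env, expr):
    """Return the type of expr under env, or raise TypeError."""
    if isinstance(expr, IntLit):          # [Int]
        return "Int"
    if isinstance(expr, Var):             # [Var]
        if expr.name not in env:
            raise TypeError(f"undeclared identifier: {expr.name}")
        return env[expr.name]
    if isinstance(expr, Add):             # [Add]
        t1 = type_check(env, expr.left)   # environment goes down
        t2 = type_check(env, expr.right)
        if t1 != "Int" or t2 != "Int":
            raise TypeError(f"cannot add {t1} and {t2}")
        return "Int"                      # type comes back up
    raise TypeError(f"unknown expression: {expr!r}")
```

For example, `type_check({"x": "Int"}, Add(Var("x"), IntLit(3)))` yields Int, while the same expression under an environment in which x is a String is rejected.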
Implementing Let
$$\frac{O, M, C \vdash e_0 : T_0 \quad O[T/x], M, C \vdash e_1 : T_1 \quad T_0 \leq T}{O, M, C \vdash \text{let } x : T \leftarrow e_0 \text{ in } e_1 : T_1} \;[\text{Let-init}]$$
TypeCheck(Environment, let x : T <- e0 in e1) = {
    T0 = TypeCheck(Environment, e0);
    T1 = TypeCheck(Environment.add(x:T), e1);
    Check subtype(T0, T);
    return T1;
}
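A runnable sketch of the let case follows. The node classes, the tiny class hierarchy in `PARENT`, and the helper names are assumptions for illustration; the environment is again only the object map O, copied rather than mutated when x is added.

```python
from dataclasses import dataclass

# Hypothetical inheritance map: class name -> parent class name.
PARENT = {"Object": None, "Int": "Object", "String": "Object"}


@dataclass
class IntLit:
    value: int


@dataclass
class StrLit:
    value: str


@dataclass
class Var:
    name: str


@dataclass
class Let:
    name: str       # x
    declared: str   # T
    init: object    # e0
    body: object    # e1


def is_subtype(x, y):
    while x is not None:
        if x == y:
            return True
        x = PARENT[x]
    return False


def type_check(env, expr):
    """Return the type of expr under env, or raise TypeError."""
    if isinstance(expr, IntLit):
        return "Int"
    if isinstance(expr, StrLit):
        return "String"
    if isinstance(expr, Var):
        if expr.name not in env:
            raise TypeError(f"undeclared identifier: {expr.name}")
        return env[expr.name]
    if isinstance(expr, Let):
        t0 = type_check(env, expr.init)                    # e0 uses O
        body_env = {**env, expr.name: expr.declared}       # O[T/x]
        t1 = type_check(body_env, expr.body)               # e1 uses O[T/x]
        if not is_subtype(t0, expr.declared):              # T0 <= T
            raise TypeError(f"{t0} does not conform to {expr.declared}")
        return t1
    raise TypeError(f"unknown expression: {expr!r}")
```

The initializer is checked in the original environment, so x is not visible inside its own initializer, and the body sees x at its declared type rather than at the type of the initializer.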
Thank you!
References
• Aho et al. Compilers: principles,
techniques, and tools. Torczon et al.
(2014) (Section 5.1.-5.3)
• Slides are based on the design of Alex Aiken, CS 143. https://web.stanford.edu/class/cs143/
