22-Introduction to Semantic Analysis
Arles Rodríguez
[email protected]
Facultad de Ciencias
Departamento de Matemáticas
Universidad Nacional de Colombia
Motivation
• Lexical Analysis:
– Detects inputs with illegal tokens
• Parsing:
– Detects inputs with ill-formed parse trees.
– Ill-formed means not conforming to the rules of a
given language.
• Semantic Analysis:
– Last “front end” phase.
– Last line of defense that catches all remaining errors.
Motivation
• Parsing cannot catch some errors.
• Some language constructs cannot be checked by a context-free grammar.
– What are some examples?
• Identifiers must be declared before they are used in a program.
• Inheritance relationships must be valid.
• Scope rules.
• …
Motivation
• In coolc, semantic analysis checks:
– Scope restrictions on the identifiers
– All identifiers are declared
– Type checking
– Inheritance relationships should make sense
– Classes are defined only once
– Methods in a class are defined only once
– Reserved identifiers are not misused
– …
Scope
• Scoping means matching identifier declarations with their uses:
– A variable x might have more than one definition in the program, so we must determine which definition each use of x refers to.
– This analysis is performed in most programming languages, including COOL.
Scope: COOL example 1
• Given:
is this possible?
Scope: COOL example 2
• Given:
x;
x;
}
Dynamically-scoped example
• A dynamically-scoped variable refers to the closest enclosing binding in the execution of the program.
• Given the following code and dynamic scoping:
g(y) = let a <- 4 in f(3);
f(x) = a;
What value does f(x) return? It returns 4: when f runs inside g, the most recent binding of a at run time is the one introduced by the let in g.
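The run-time lookup can be made concrete with a small simulation. This is a minimal Python sketch, not how a COOL compiler works (COOL is statically scoped). It models dynamic scoping with a single stack of bindings shared by all calls. The `DynamicEnv` class and the encoding of `g` and `f` as Python functions are illustrative assumptions.

```python
class DynamicEnv:
    """Dynamic scoping: one stack of (name, value) bindings shared by all calls."""

    def __init__(self):
        self.stack = []  # newest binding last

    def bind(self, name, value):
        self.stack.append((name, value))

    def unbind(self):
        self.stack.pop()

    def lookup(self, name):
        # Search from the most recent binding made during execution.
        for bound_name, value in reversed(self.stack):
            if bound_name == name:
                return value
        raise NameError(f"unbound variable {name}")


env = DynamicEnv()


def f(x):
    env.bind("x", x)
    try:
        return env.lookup("a")  # 'a' is not bound anywhere in f's own text
    finally:
        env.unbind()  # drop x


def g(y):
    env.bind("y", y)
    env.bind("a", 4)  # let a <- 4 in ...
    try:
        return f(3)
    finally:
        env.unbind()  # end of the let: drop a
        env.unbind()  # leaving g: drop y


print(g(0))  # prints 4
```

Calling `f(3)` directly, outside `g`, raises `NameError` because no binding of `a` is live at that point. The result depends on the call history, not on where `f` is written.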
Cool identifier bindings
• Cool identifier bindings are introduced by:
– Class declarations (introduce class names)
– Method definitions (introduce method names)
– Let expressions (introduce object id’s)
– Formal parameters (introduce object id’s)
– Attribute definitions (introduce object id’s)
– Case expressions (introduce object id’s)
Scope
• Not all identifiers follow the most-closely-nested rule.
• In COOL, class definitions cannot be nested.
• Classes are globally visible throughout the program.
• A class name can be used before it is defined.
Scope
class Foo {
  let y : Bar
  in …
};
class Bar {
  …
};
• In COOL, a class name can be used before it is defined.
Scope
class Foo {
  f(): Int { a };
  a : Int <- 0;
};
class Bar {
  …
};
• In COOL, attribute names are global within the class in which they are defined.
• What could be wrong with this code? Nothing: it is legal. Attribute definitions usually come before method definitions, but this is not required.
Scope
• Method names have complex rules:
– A method does not have to be defined in the class in which it is used.
– It could be defined in some parent class.
– Methods can be redefined (overridden).
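Because a method may be inherited or overridden, resolving a method name means searching the class and then its ancestors. The following is a minimal Python sketch. It assumes a hypothetical class table that maps each class to its parent and to the methods it defines itself. The first definition found while walking up the hierarchy is the one that applies.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical class table: class -> (parent, {method: (formal types..., return type)}).
CLASSES: dict[str, tuple[Optional[str], dict[str, tuple[str, ...]]]] = {
    "Object": (None, {"abort": ("Object",), "type_name": ("String",)}),
    "A": ("Object", {"foo": ("Int", "Int")}),
    "B": ("A", {"bar": ("Bool",)}),
    "C": ("B", {"foo": ("Int", "Int")}),  # C overrides A.foo
}


def lookup_method(cls: str, name: str) -> Optional[tuple[str, tuple[str, ...]]]:
    """Return (defining class, signature) for the closest definition of
    `name`, starting at `cls` and walking up to Object; None if not found."""
    current: Optional[str] = cls
    while current is not None:
        parent, methods = CLASSES[current]
        if name in methods:
            return current, methods[name]
        current = parent
    return None


print(lookup_method("B", "foo"))  # ('A', ('Int', 'Int')): inherited from A
print(lookup_method("C", "foo"))  # ('C', ('Int', 'Int')): overridden in C
```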
Exercise
Symbol tables
Motivation: symbol tables
• Much of semantic analysis and code generation can be expressed as a recursive descent over an AST.
• In each step:
– Before: Process an AST node n
– Recurse: Process the children of n
– After: Finish processing the AST node n
• When performing semantic analysis
on a portion of the AST, we need to
know which identifiers are defined.
Example: recursive descent strategy
• The scope of a let binding is one subtree of the AST.
[Figure: a let node with children x and +, shown next to the symbol table]
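The before/recurse/after strategy for let can be sketched as follows. This is a minimal Python sketch that assumes a hypothetical three-node AST (`Var`, `Plus`, `Let`) and a symbol table implemented as a stack of dictionaries. Entering a `let` opens a scope, the body is checked while `x` is visible, and leaving the `let` closes the scope again.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Var:
    name: str


@dataclass
class Plus:
    left: "Expr"
    right: "Expr"


@dataclass
class Let:
    name: str
    type_name: str
    body: "Expr"


Expr = Union[Var, Plus, Let]


class SymbolTable:
    """Scoped symbol table: a stack of dicts, innermost scope last."""

    def __init__(self) -> None:
        self.scopes: list[dict[str, str]] = [{}]

    def enter_scope(self) -> None:
        self.scopes.append({})

    def exit_scope(self) -> None:
        self.scopes.pop()

    def add(self, name: str, type_name: str) -> None:
        self.scopes[-1][name] = type_name

    def lookup(self, name: str) -> Optional[str]:
        # The innermost binding wins (most-closely-nested rule).
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


def check_scopes(node: Expr, table: SymbolTable, errors: list[str]) -> None:
    if isinstance(node, Let):
        table.enter_scope()                     # before: x becomes visible
        table.add(node.name, node.type_name)
        check_scopes(node.body, table, errors)  # recurse: the body is x's scope
        table.exit_scope()                      # after: x is no longer visible
    elif isinstance(node, Plus):
        check_scopes(node.left, table, errors)
        check_scopes(node.right, table, errors)
    elif isinstance(node, Var):
        if table.lookup(node.name) is None:
            errors.append(f"undeclared identifier {node.name}")
```

For example, `(let x : Int in x + x) + x` reports exactly one error, for the last `x`, because it lies outside the let subtree.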
Types: assembly
• Consider the assembly language fragment:
add $r1, $r2, $r3
• Dynamically typed: types are checked during program execution (e.g., Python, Perl).
• Untyped: No type checking
(machine code).
Debate: static vs dynamic
Static typing:
• Static checking catches many programming errors at compile time.
• Avoids the overhead of runtime checks.
Dynamic typing:
• Static type systems are restrictive.
• Rapid prototyping is difficult within a static type system.
Types
• A lot of code is written in statically typed
languages with an “escape mechanism”
– Unsafe casts (C++, Java).
• People retrofit static typing onto dynamically typed languages:
– For optimization and debugging.
• It is debatable whether either compromise
represents the best or the worst of both
worlds.
Cool Types
• Types in Cool are:
– Class Names
– SELF_TYPE
• The user declares the type of each identifier; the compiler does the rest of the work:
– It infers a type for every expression.
Processing types
• There are two different processes:
– Type checking is the process of verifying fully typed programs.
• We have an abstract syntax tree with types filled in at every node.
• We look at each node and confirm that the types are correct in that part of the tree.
– Type inference is the process of filling in missing type information.
• We have an abstract syntax tree with no types, or with types only at a few key locations.
• We must fill in the missing types and also check that the types are correct.
Type checking in COOL
• We have seen formal notations for specifying parts of a compiler:
– Regular expressions (lexical analysis).
– Context-free grammars (parsing).
• The appropriate formalism for type checking is
logical rules of inference.
Type checking: inference rules
• Inference rules have the form:
– If hypothesis is true, then conclusion is true
• Type checking computes via reasoning:
– If $E_1$ and $E_2$ have certain types, then $E_3$ has a certain type.
• Inference rules are a compact notation for “If-
Then” statements.
Type checking notation
• The notation is easy to read with practice.
• Start with simplified system and gradually add
features.
• Building blocks are:
– The symbol $\wedge$ is “and”
– The symbol $\Rightarrow$ is “if-then”
– $x : T$ is “x has type T”
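Type checking example
• If $e_1$ has type Int and $e_2$ has type Int, then $e_1 + e_2$ has type Int.
• This can be reduced to a mathematical statement:
$(e_1 \text{ has type Int} \wedge e_2 \text{ has type Int}) \Rightarrow e_1 + e_2 \text{ has type Int}$
$(e_1 : \text{Int} \wedge e_2 : \text{Int}) \Rightarrow e_1 + e_2 : \text{Int}$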
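Type checking example
• The statement:
$(e_1 : \text{Int} \wedge e_2 : \text{Int}) \Rightarrow e_1 + e_2 : \text{Int}$
• is a special case of the inference rule:
$$\frac{\vdash e_1 : \text{Int} \qquad \vdash e_2 : \text{Int}}{\vdash e_1 + e_2 : \text{Int}}\ \text{[Add]}$$
• Integer literals are typed by the rule:
$$\frac{i \text{ is an integer literal}}{\vdash i : \text{Int}}\ \text{[Int]}$$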
Type checking
• These rules give templates describing how to
type integers and + expressions
• By filling in the templates, we can produce complete typings for expressions.
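Filling in the templates can be sketched in Python. This is a minimal illustration that assumes a hypothetical AST with only integer literals (`IntLit`) and additions (`Add`). `derive` instantiates [Int] at each leaf and [Add] at each `+` node, bottom-up. It records one proof line per AST node, so the proof has the shape of the tree.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class IntLit:
    value: int


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[IntLit, Add]


def show(e: Expr) -> str:
    if isinstance(e, IntLit):
        return str(e.value)

    # Parenthesize nested additions so the printed expression is unambiguous.
    def wrap(sub: Expr) -> str:
        return f"({show(sub)})" if isinstance(sub, Add) else show(sub)

    return f"{wrap(e.left)} + {wrap(e.right)}"


def derive(e: Expr, proof: list[str]) -> str:
    """Type e by instantiating the [Int] and [Add] templates bottom-up.
    Appends one proof line per AST node and returns the type of e."""
    if isinstance(e, IntLit):
        proof.append(f"|- {show(e)} : Int   [Int]")
        return "Int"
    t1 = derive(e.left, proof)   # hypothesis 1
    t2 = derive(e.right, proof)  # hypothesis 2
    if t1 != "Int" or t2 != "Int":
        raise TypeError(f"[Add] does not apply to {show(e)}")
    proof.append(f"|- {show(e)} : Int   [Add]")  # conclusion
    return "Int"


steps: list[str] = []
derive(Add(IntLit(1), Add(IntLit(2), IntLit(3))), steps)
print("\n".join(steps))
```

With only integers and `+`, the hypothesis check in [Add] always succeeds. It becomes meaningful once other types are added to the language.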
Type checking: sound rules
• A type system is sound if:
– Whenever $\vdash e : T$,
– if we run the program, then $e$ evaluates to a value of type $T$.
• We only want sound rules:
– Some sound rules are better than others.
– For an integer literal $i$, $\vdash i : \text{Object}$ is true, but it is not at all useful.
– We want the most specific type that we can derive.
Exercise
Type checking summary
• Type checking proves facts e:T
– Proof is on the structure of the AST
– Proof has the shape of the AST
– One type rule is used for each AST node.
• In the type rule used for a node e:
– Hypotheses are the proofs of types of e’s
subexpressions.
– Conclusion is the type of e.
• Types are computed in a bottom-up pass over the
AST.
Type environments
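• Type rule for the constant false:
$$\vdash \text{false} : \text{Bool}\ \text{[False]}$$
• Type rule for string literals:
$$\frac{s \text{ is a string literal}}{\vdash s : \text{String}}\ \text{[String]}$$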
Type environments
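• Type rule for the expression new T:
$$\vdash \text{new } T : T\ \text{[New]}$$
• With an object environment $O$, the rule for addition becomes:
$$\frac{O \vdash e_1 : \text{Int} \qquad O \vdash e_2 : \text{Int}}{O \vdash e_1 + e_2 : \text{Int}}\ \text{[Add]}$$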
Type environments
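• We can write new rules:
$$\frac{O(x) = T}{O \vdash x : T}\ \text{[Var]}$$
• We use the following notation: $O[T/x]$ is the environment $O$ modified so that $O[T/x](x) = T$ and $O[T/x](y) = O(y)$ for $y \neq x$.
• With this notation:
$$\frac{O \vdash e_0 : T \qquad O[T/x] \vdash e_1 : T_1}{O \vdash \text{let } x : T \leftarrow e_0 \text{ in } e_1 : T_1}\ \text{[Let-Init]}$$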
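The environment notation can be made concrete in Python. This is a minimal sketch that models $O$ as a dictionary from identifier names to type names. The helper names `extend` and `type_of_var` are hypothetical. `extend` builds $O[T/x]$ as a new dictionary, so the outer environment is unchanged when the let body has been checked.

```python
from __future__ import annotations

from typing import Mapping


def extend(env: Mapping[str, str], name: str, type_name: str) -> dict[str, str]:
    """O[T/x]: a new environment that maps x to T and agrees with O elsewhere.
    The original environment is left unchanged."""
    new_env = dict(env)
    new_env[name] = type_name
    return new_env


def type_of_var(env: Mapping[str, str], name: str) -> str:
    """[Var]: if O(x) = T then O |- x : T."""
    if name not in env:
        raise TypeError(f"undeclared identifier {name}")
    return env[name]


O = {"y": "String"}
O2 = extend(O, "x", "Int")
print(type_of_var(O2, "x"), type_of_var(O2, "y"))  # Int String
print("x" in O)  # False: O itself is not modified
```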
Subtyping
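• Consider let with initialization:
$$\frac{O \vdash e_0 : T \qquad O[T/x] \vdash e_1 : T_1}{O \vdash \text{let } x : T \leftarrow e_0 \text{ in } e_1 : T_1}\ \text{[Let-Init]}$$
• This rule is too restrictive: $e_0$ must have exactly the declared type $T$. We define a subtype relation $\le$ on classes:
– $X \le X$
– $X \le Y$ if $X$ inherits from $Y$
– $X \le Z$ if $X \le Y$ and $Y \le Z$
• An improved rule:
$$\frac{O \vdash e_0 : T_0 \qquad O[T/x] \vdash e_1 : T_1 \qquad T_0 \le T}{O \vdash \text{let } x : T \leftarrow e_0 \text{ in } e_1 : T_1}\ \text{[Let-Init]}$$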
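The three clauses of $\le$ can be implemented by following parent links. This is a minimal Python sketch that assumes a hypothetical table mapping each class to its parent, with Object as the root. Starting at X covers reflexivity, one step up covers direct inheritance, and continuing the walk covers transitivity. `check_let_init` applies the side condition $T_0 \le T$ of the improved rule.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical inheritance table: class -> parent (Object is the root).
PARENT: dict[str, Optional[str]] = {
    "Object": None,
    "Int": "Object",
    "String": "Object",
    "A": "Object",
    "B": "A",
    "C": "B",
}


def is_subtype(x: str, y: str) -> bool:
    """X <= Y: walk from X towards Object looking for Y."""
    current: Optional[str] = x
    while current is not None:
        if current == y:
            return True
        current = PARENT[current]
    return False


def check_let_init(declared: str, init_type: str, body_type: str) -> str:
    """[Let-Init] with subtyping: requires T0 <= T and returns T1,
    where T = declared, T0 = init_type, T1 = body_type."""
    if not is_subtype(init_type, declared):
        raise TypeError(f"{init_type} does not conform to {declared}")
    return body_type
```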
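Subtyping application ASSIGN
$$\frac{O(x) = T_0 \qquad O \vdash e_1 : T_1 \qquad T_1 \le T_0}{O \vdash x \leftarrow e_1 : T_1}\ \text{[Assign]}$$
• Attribute initialization is similar. It uses the environment $O_C$ of the class $C$ in which the attribute is declared:
$$\frac{O_C(x) = T_0 \qquad O_C \vdash e_1 : T_1 \qquad T_1 \le T_0}{O_C \vdash x : T_0 \leftarrow e_1;}\ \text{[Attr-Init]}$$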
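• $\text{lub}(X, Y)$, the least upper bound of $X$ and $Y$, is $Z$ if:
– $X \le Z$ and $Y \le Z$: $Z$ is an upper bound;
– $X \le Z'$ and $Y \le Z'$ imply $Z \le Z'$: $Z$ is the least among all upper bounds.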
Subtyping: lub
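• In COOL, the least upper bound of two types is their least common ancestor in the inheritance tree.
[Figure: inheritance tree rooted at Object, with X and Y in different branches; lub(X, Y) is marked at their least common ancestor]
• Where is lub(X, Y)? At the closest class from which both X and Y inherit.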
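Least common ancestor can be computed directly from parent links. This is a minimal Python sketch that assumes a hypothetical inheritance table. The function lists the ancestors of X from nearest to farthest and returns the first one that is also an ancestor of Y. Since every class reaches Object, a common ancestor always exists.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical tree: Object -> {A, E}, A -> {B, C}, B -> {D}.
PARENT: dict[str, Optional[str]] = {
    "Object": None, "A": "Object", "B": "A", "C": "A", "D": "B", "E": "Object",
}


def ancestors(x: str) -> list[str]:
    """x, its parent, ..., Object (x itself first)."""
    chain = []
    current: Optional[str] = x
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def lub(x: str, y: str) -> str:
    """Least common ancestor of x and y in the inheritance tree."""
    y_ancestors = set(ancestors(y))
    for candidate in ancestors(x):  # nearest ancestor of x first
        if candidate in y_ancestors:
            return candidate
    raise ValueError("classes do not share a root")  # unreachable if all reach Object


print(lub("D", "C"))  # A
```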
Exercise
Subtyping If-then-else
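$$\frac{O \vdash e_0 : \text{Bool} \qquad O \vdash e_1 : T_1 \qquad O \vdash e_2 : T_2}{O \vdash \text{if } e_0 \text{ then } e_1 \text{ else } e_2 \text{ fi} : \text{lub}(T_1, T_2)}\ \text{[If-Then-Else]}$$
• The type of the entire expression is the least upper bound of the types of its branches, because at compile time we do not know which branch will run.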
Typing methods and method calls
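Typing methods
• Consider a dispatch $e_0.f(e_1, \ldots, e_n)$, where the arguments are $e_1, \ldots, e_n$.
• What kind of value do we get back after calling this method? The answer comes from the method's declared signature, so we add a method environment $M$. $M(C, f) = (T_1, \ldots, T_n, T_{n+1})$ means that class $C$ has a method $f(x_1 : T_1, \ldots, x_n : T_n) : T_{n+1}$.
$$\frac{O, M \vdash e_0 : T_0 \qquad O, M \vdash e_i : T_i \ \ (1 \le i \le n) \qquad M(T_0, f) = (T'_1, \ldots, T'_n, T'_{n+1}) \qquad T_i \le T'_i \ \ (1 \le i \le n)}{O, M \vdash e_0.f(e_1, \ldots, e_n) : T'_{n+1}}\ \text{[Dispatch]}$$
• We have two mappings: one for object identifiers ($O$) and another for method names ($M$).
• The method environment is propagated through the typing of all the subexpressions.
• Method $f$ is looked up in class $T_0$, the type of $e_0$.
• We check that each argument we pass has a type that is a subtype of the corresponding declared formal parameter: $T_i \le T'_i$.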
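The side conditions of [Dispatch] can be checked in a few lines, given the types already computed for $e_0, \ldots, e_n$. This is a minimal Python sketch under several assumptions: a hypothetical class hierarchy, a hypothetical method environment `M` keyed by (class, method), and no SELF_TYPE. `lookup` searches ancestors because a method may be inherited. That is an implementation choice for computing $M(T_0, f)$.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical hierarchy and method environment M:
# M[(C, f)] = (T1', ..., Tn', T_{n+1}'): formal types, then return type.
PARENT: dict[str, Optional[str]] = {"Object": None, "Int": "Object", "A": "Object", "B": "A"}
M: dict[tuple[str, str], tuple[str, ...]] = {
    ("A", "scale"): ("Int", "A", "A"),  # scale(n : Int, other : A) : A
}


def is_subtype(x: str, y: str) -> bool:
    current: Optional[str] = x
    while current is not None:
        if current == y:
            return True
        current = PARENT[current]
    return False


def lookup(cls: str, f: str) -> Optional[tuple[str, ...]]:
    # M(T0, f): the method may be inherited, so search T0 and its ancestors.
    current: Optional[str] = cls
    while current is not None:
        if (current, f) in M:
            return M[(current, f)]
        current = PARENT[current]
    return None


def type_dispatch(t0: str, f: str, arg_types: list[str]) -> str:
    """[Dispatch], given the already-computed types T0 and T1..Tn."""
    sig = lookup(t0, f)
    if sig is None:
        raise TypeError(f"class {t0} has no method {f}")
    formals, ret = sig[:-1], sig[-1]
    if len(formals) != len(arg_types):
        raise TypeError(f"{f} expects {len(formals)} arguments")
    for actual, formal in zip(arg_types, formals):
        if not is_subtype(actual, formal):  # Ti <= Ti'
            raise TypeError(f"{actual} does not conform to {formal}")
    return ret  # T_{n+1}'
```

For instance, `type_dispatch("B", "scale", ["Int", "B"])` finds `scale` in `A`, accepts `B` where `A` is expected, and returns `"A"`.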
Typing methods
$$\frac{O \vdash e_1 : \text{Int} \qquad O \vdash e_2 : \text{Int}}{O \vdash e_1 + e_2 : \text{Int}}\ \text{[Add]}$$
Typing methods
• The method environment must be added to all
rules.
• In most cases, M is passed down but not
actually used.
– Only the dispatch rules use M
$$\frac{O, M \vdash e_1 : \text{Int} \qquad O, M \vdash e_2 : \text{Int}}{O, M \vdash e_1 + e_2 : \text{Int}}\ \text{[Add]}$$
Typing methods: SELF_TYPE
• For some cases involving SELF_TYPE, we need
to know the class in which an expression
appears.
• The full type environment for COOL:
– The mapping O, giving types to object id’s
– A mapping M, giving types to methods.
– The current class C
Typing methods
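• The full form of a sentence in the logic is:
$$O, M, C \vdash e : T$$
• Example:
$$\frac{O, M, C \vdash e_1 : \text{Int} \qquad O, M, C \vdash e_2 : \text{Int}}{O, M, C \vdash e_1 + e_2 : \text{Int}}\ \text{[Add]}$$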
Summary
• The rules given here are COOL-specific.
– Some other languages have very different rules.
• There are some general themes in type checking:
– Type rules are defined on the structure of expressions (inductive fashion).
– Types of variables are modeled by an environment.
• Type rules are very compact!
– There is a lot of information in these rules.
– They require careful reading to be really understood.
Implementing Type Checking
Implementing type checking
• COOL type checking can be implemented in a
single traversal over the AST.
• There are two phases here:
– Top-down phase: Type environment is passed
down the tree from parent to child.
– Bottom-up phase: Types are passed back up from
child to parent.
• Starting at the leaves, we use the environment to compute the types of each sub-expression, working our way back up the tree to the root.
Implementing addition
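$$\frac{O, M, C \vdash e_1 : \text{Int} \qquad O, M, C \vdash e_2 : \text{Int}}{O, M, C \vdash e_1 + e_2 : \text{Int}}\ \text{[Add]}$$
• This can be implemented via recursion:
TypeCheck(Environment, e1 + e2) = {
  T1 = TypeCheck(Environment, e1);
  T2 = TypeCheck(Environment, e2);
  Check T1 == T2 == Int;
  return Int;
}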
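The TypeCheck pseudocode for addition translates almost line by line into Python. This is a minimal sketch that assumes a hypothetical `Env` record bundling O, M and C, and a tiny AST with `IntConst`, `Id` and `Plus`. The environment is passed down to the children, and the computed types flow back up to the parent.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Env:
    """Hypothetical environment bundling O (objects), M (methods), C (class)."""
    objects: dict[str, str] = field(default_factory=dict)
    methods: dict = field(default_factory=dict)
    current_class: str = "Main"


@dataclass
class IntConst:
    value: int


@dataclass
class Id:
    name: str


@dataclass
class Plus:
    e1: "Expr"
    e2: "Expr"


Expr = Union[IntConst, Id, Plus]


def type_check(env: Env, e: Expr) -> str:
    if isinstance(e, IntConst):
        return "Int"
    if isinstance(e, Id):
        if e.name not in env.objects:  # [Var]: O(x) = T
            raise TypeError(f"undeclared identifier {e.name}")
        return env.objects[e.name]
    if isinstance(e, Plus):
        t1 = type_check(env, e.e1)  # environment passed down
        t2 = type_check(env, e.e2)
        if t1 != "Int" or t2 != "Int":  # Check T1 == T2 == Int
            raise TypeError(f"+ expects Int operands, got {t1} and {t2}")
        return "Int"  # type passed back up
    raise TypeError(f"no typing rule for {e!r}")
```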
Implementing Let
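$$\frac{O, M, C \vdash e_0 : T_0 \qquad O[T/x], M, C \vdash e_1 : T_1 \qquad T_0 \le T}{O, M, C \vdash \text{let } x : T \leftarrow e_0 \text{ in } e_1 : T_1}\ \text{[Let-Init]}$$
TypeCheck(Environment, let x : T <- e0 in e1) = {
  T0 = TypeCheck(Environment, e0);
  T1 = TypeCheck(Environment.add(x : T), e1);
  Check subtype(T0, T);
  return T1;
}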
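The Let pseudocode can be sketched the same way. This minimal Python version makes some simplifying assumptions: the environment is only O, represented as a dictionary; the class hierarchy is hypothetical; and `new T` ignores SELF_TYPE. `{**env, e.x: e.declared}` plays the role of `Environment.add(x : T)`. It builds a new dictionary, so `x` is not visible in the initializer `e0` or after the let.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union

PARENT: dict[str, Optional[str]] = {"Object": None, "Int": "Object", "A": "Object", "B": "A"}


def is_subtype(x: str, y: str) -> bool:
    current: Optional[str] = x
    while current is not None:
        if current == y:
            return True
        current = PARENT[current]
    return False


@dataclass
class New:
    type_name: str


@dataclass
class Id:
    name: str


@dataclass
class Let:
    x: str
    declared: str  # T
    init: "Expr"   # e0
    body: "Expr"   # e1


Expr = Union[New, Id, Let]


def type_check(env: dict[str, str], e: Expr) -> str:
    if isinstance(e, New):
        return e.type_name  # [New], ignoring SELF_TYPE
    if isinstance(e, Id):
        if e.name not in env:
            raise TypeError(f"undeclared identifier {e.name}")
        return env[e.name]
    if isinstance(e, Let):
        t0 = type_check(env, e.init)                       # T0 = TypeCheck(Env, e0)
        t1 = type_check({**env, e.x: e.declared}, e.body)  # T1 = TypeCheck(Env.add(x:T), e1)
        if not is_subtype(t0, e.declared):                 # Check subtype(T0, T)
            raise TypeError(f"{t0} does not conform to {e.declared}")
        return t1
    raise TypeError(f"no typing rule for {e!r}")
```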
Thank you!
References
• Aho et al. Compilers: Principles, Techniques, and Tools. Torczon et al. (2014) (Sections 5.1-5.3)
• Slides are based on the design of Alex Aiken. CS 143. https://web.stanford.edu/class/cs143/