
Lecture 10

Introduction to Code Generation


and
Intermediate Representations

Joey Paquet, 2000, 2002 1


Introduction to Code Generation
• Front end:
– Lexical Analysis
– Syntactic Analysis
– Intermediate Code Generation
• Back end:
– Intermediate Code Optimization
– Object Code Generation
• The front end is machine-independent, i.e. it can be
reused to build compilers for different architectures
• The back end is machine-dependent, i.e. these steps
are related to the nature of the assembly or machine
language of the target architecture

Joey Paquet, 2000, 2002 2


Introduction to Code Generation
• After syntactic analysis, we have a number of options to
choose from:
– generate object code directly from the parse
– generate intermediate code, and then generate object
code from it
– generate an intermediate abstract representation, and
then generate code directly from it
– generate an intermediate abstract representation,
generate intermediate code, and then the object code
• All these options have one thing in common: they are all
based on the syntactic information gathered during parsing

Joey Paquet, 2000, 2002 3


Introduction to Code Generation

Lexical Analyzer → Syntactic Analyzer → Object Code

Lexical Analyzer → Syntactic Analyzer → Intermediate Code → Object Code

Lexical Analyzer → Syntactic Analyzer → Intermediate Representation → Object Code

Lexical Analyzer → Syntactic Analyzer → Intermediate Representation → Intermediate Code → Object Code

Front End: Lexical and Syntactic Analyzers        Back End: the remaining stages

Joey Paquet, 2000, 2002 4


Interm. Representations & Code
• Intermediate representations synthesize
the syntactic information gathered during the
parse, generally in the form of a tree or
directed graph.
• Intermediate representations enable high-level
code optimization.
• Intermediate code is a low-level coded (text)
representation of the program, directly
translatable to object code.
• Intermediate code enables low-level,
architecture-dependent optimizations.

Joey Paquet, 2000, 2002 5


Part I

Intermediate Representations

Joey Paquet, 2000, 2002 6


Abstract Syntax Trees
• Each node represents the application of a rule in the grammar
• A subtree is created only after the complete parsing of a right
hand side
• Pointers to subtrees are sent up and grafted as upper subtrees
are completed
• Parse trees (concrete syntax trees) emphasize the
grammatical structure of the program
• Abstract syntax trees emphasize the actual computations
to be performed. They do not refer to the actual non-terminals
defined in the grammar, hence their name.
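
To make the grafting idea concrete, here is a minimal sketch in Python; the Node class and the construction order are illustrative, not part of the project:

# Minimal AST node sketch: a node is created once a right-hand side has been
# fully parsed, and pointers to completed subtrees are grafted into new parents.
class Node:
    def __init__(self, op, *children):
        self.op = op                # operator or leaf symbol, e.g. '=', '+', 'a'
        self.children = children    # already-completed subtrees

# Building the AST for  x = a*b+a*b  bottom-up (see the example on the next slide):
left  = Node('*', Node('a'), Node('b'))   # first a*b is reduced
right = Node('*', Node('a'), Node('b'))   # second a*b is reduced
root  = Node('=', Node('x'), Node('+', left, right))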

Joey Paquet, 2000, 2002 7


Parse vs Abstract Syntax Trees
Parse Tree (concrete syntax tree) for x = a*b+a*b, following the grammar rules:
  A → x = E
  E → E + E
  each E → a * b

Abstract Syntax Tree for x = a*b+a*b, keeping only the computations:
  = ( x , + ( *(a,b) , *(a,b) ) )

Joey Paquet, 2000, 2002 8


Directed Acyclic Graphs (DAG)
• Directed acyclic graphs (DAG) are a relative
of syntax trees: they are used to show the
syntactic structure of valid programs in the
form of a “tree”.
• In DAGs, the nodes for repeated variables and
expressions are merged into a single node.
• DAGs are more complicated to build than
syntax trees, but they directly implement a good
deal of code optimization by avoiding redundant
operations.
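
A minimal sketch of the node-merging idea in Python; keying nodes on (operator, children) so that repeated subexpressions map to one node is an illustrative choice, not the only way to build DAGs:

# Sketch of DAG construction by node sharing: identical (op, children)
# combinations are looked up in a table and reused instead of duplicated.
nodes = {}

def make_node(op, *children):
    key = (op, children)
    if key not in nodes:                   # create each distinct node only once
        nodes[key] = (op, children)
    return nodes[key]

a, b = make_node('a'), make_node('b')
t1 = make_node('*', a, b)
t2 = make_node('*', a, b)                  # same key: the existing node is reused
assert t1 is t2                            # a*b appears only once in the DAG
root = make_node('=', make_node('x'), make_node('+', t1, t2))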

Joey Paquet, 2000, 2002 9


AST vs DAG

Abstract Syntax Tree for x = a*b+a*b (two separate nodes for a*b):
  = ( x , + ( *(a,b) , *(a,b) ) )

Directed Acyclic Graph for x = a*b+a*b (the single node for a*b is shared):
  = ( x , + ( t , t ) )   where t = *(a,b)

Joey Paquet, 2000, 2002 10


Postfix Notation
• Every expression is rewritten with its operators at the
end, e.g.:

a+b → ab+
a+b*c → abc*+
if A then B else C → ABC?
if A then if B then C else D else E → ABCD?E?
x=a*b+a*b → ab*ab*+x=

• Easy to generate from a bottom-up parse


• Can be generated from a syntax tree using postorder
traversal
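
A small sketch of that postorder traversal in Python, over a nested-tuple encoding of the syntax tree (the encoding is assumed purely for illustration):

# Postorder traversal of an expression tree emits postfix notation:
# both children first, then the operator.
def to_postfix(node):
    if isinstance(node, str):                 # leaf: an operand
        return [node]
    op, left, right = node                    # interior node: (op, left, right)
    return to_postfix(left) + to_postfix(right) + [op]

tree = ('+', 'a', ('*', 'b', 'c'))            # a+b*c
print(' '.join(to_postfix(tree)))             # prints: a b c * +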

Joey Paquet, 2000, 2002 11


Postfix Notation
• Its nature allows it to be naturally evaluated with the
use of a stack
• Operands are pushed onto the stack; operators pop the
right amount of operands from the stack, do the
operation, then push the result back onto the stack.
• However, this notation is restricted to simple
expressions, such as in arithmetic, where every rule
conveys an operation
• It cannot be used to express most programming
language constructs
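
A minimal sketch of that stack-based evaluation for arithmetic postfix; the token format and the environment of variable values are assumptions for illustration:

# Stack-based evaluation of postfix: operands are pushed; operators pop two
# operands, perform the operation, and push the result back onto the stack.
def eval_postfix(tokens, env):
    stack = []
    ops = {'+': lambda x, y: x + y, '*': lambda x, y: x * y}
    for tok in tokens:
        if tok in ops:
            y = stack.pop()
            x = stack.pop()
            stack.append(ops[tok](x, y))
        else:
            stack.append(env[tok])            # look up the operand's value
    return stack.pop()

print(eval_postfix('a b c * +'.split(), {'a': 2, 'b': 3, 'c': 4}))   # prints 14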

Joey Paquet, 2000, 2002 12


Three-Address Code
• Three-address code (3AC) is an intermediate
language that maps directly to assembly code,
but that is not architecture-dependent
• It breaks the program into short statements
requiring no more than three variables and no
more than one operator, e.g.:

source          3AC
x = a+b*c       t := b*c
                x := a+t
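
A small sketch in Python of how an expression tree can be flattened into such statements, introducing one compiler-generated temporary per operator (the tree encoding and naming scheme are illustrative):

# Flatten an expression tree into three-address statements,
# allocating one new temporary variable per operator node.
temp_count = 0
code = []

def gen_3ac(node):
    global temp_count
    if isinstance(node, str):                 # leaf: a variable name
        return node
    op, left, right = node
    l, r = gen_3ac(left), gen_3ac(right)
    temp_count += 1
    t = f't{temp_count}'
    code.append(f'{t} := {l} {op} {r}')
    return t

code.append(f"x := {gen_3ac(('+', 'a', ('*', 'b', 'c')))}")   # x = a+b*c
print('\n'.join(code))        # t1 := b * c ; t2 := a + t1 ; x := t2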

Joey Paquet, 2000, 2002 13


Three-Address Code
• The temporary variables are generated at compile time
and added to the symbol table
• In the generated code, the variables will refer to actual
memory cells. Their addresses are also stored in the
symbol table
• 3AC can also be represented as quadruples, which are
even more related to assembly languages

3AC             ASM
t := b*c        L  3,b
                M  3,c
                ST 3,t
x := a+t        L  3,a
                A  3,t
                ST 3,x

3AC             Quadruples
t := b*c        MULT t,b,c
x := a+t        ADD  x,a,t

Joey Paquet, 2000, 2002 14


Intermediate Languages
• In this case, we generate code in a language for which
we already have a compiler or interpreter
• Such languages are generally very low-level and
dedicated to the compiler construction task
• It provides the compiler writer with a “virtual machine”
• Various compilers can be built using the same virtual
machine
• The virtual machine compiler can be compiled on
different machines to provide a translator to various
architectures.
• For the project, we have the Moon compiler, which
provides a virtual assembly language and a compiler for it.

Joey Paquet, 2000, 2002 15


Project Overview

Source Code → Lexical Analyzer → Token Stream → Syntactic Analyzer → Moon Code → Moon Compiler → Object Code

• Your compiler generates Moon code


• The Moon compiler (virtual machine) is used to
generate an executable for your program
• Your compiler is thus retargetable by
recompilation of the Moon compiler on a new
processor

Joey Paquet, 2000, 2002 16


Part II

Semantic Actions
and
Code Generation

Joey Paquet, 2000, 2002 17


Semantic Actions
• Semantics is about giving a meaning to the compiled
program.
• Semantic actions have two parts:
– Semantic checking: check if the compiled program has a
meaning, e.g. variables are declared, operators and functions
have the right parameter types and number of parameters
when called
– Semantic translation: translate declarations, statements
and expressions to machine code
• Semantic translation is conditional on semantic checking

Joey Paquet, 2000, 2002 18


Semantic Actions
• Semantic actions are inserted in the grammar (thus
transforming it into an attribute grammar)
– In recursive descent parsers, they are represented by
function calls embedded in the parsing functions (see the
sketch after this list)
– In table-driven top-down parsers, they are represented by
functions pushed on the stack along with the right hand
sides they belong to
• Most semantic actions use attributes for their resolution:
– In recursive descent parsers, they are migrated using
reference parameter passing
– In table-driven top-down parsers, they are migrated using
a semantic stack
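
For the recursive descent case, here is a minimal sketch in Python of a parsing function with an embedded semantic action call; the grammar fragment, the function names, and the migration of attributes through return values (rather than reference parameters) are simplifications for illustration:

# Sketch of semantic actions embedded in recursive descent parsing functions,
# with attribute records migrated upwards as right-hand sides are completed.
def parse_factor(tokens):
    tok = tokens.pop(0)
    return {'place': tok}                     # semantic record for the factor

def parse_expr(tokens):
    left = parse_factor(tokens)               # attribute from the left operand
    if tokens and tokens[0] == '+':
        tokens.pop(0)
        right = parse_factor(tokens)
        return add_sem(left, right)           # embedded semantic action call
    return left

def add_sem(left, right):
    # semantic translation for '+': emit intermediate code, return a new record
    print(f"t1 := {left['place']} + {right['place']}")
    return {'place': 't1'}

parse_expr(['a', '+', 'b'])                   # prints: t1 := a + b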

Joey Paquet, 2000, 2002 19


Semantic Actions
• There are semantic actions associated with:
– Declarations:
• variable declarations
• type declarations
• function declarations
– Control structures:
• conditional statements
• loop statements
– Expressions:
• assignment operations
• arithmetic and logical expressions

Joey Paquet, 2000, 2002 20


Processing Declarations
• In processing declarations, the only semantic checking
there is to do is to ensure that every object (e.g.
variable, type, class, function, etc.) is declared once and
only once
• This restriction is tested using the symbol table entries
• Symbol table entries are generated as declarations are
encountered
• Afterwards, every time an identifier is encountered, a
check is made in the symbol table to ensure that it has
been properly defined
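
A minimal sketch of these two checks against symbol table entries; the dictionary layout of the table is an assumption for illustration:

# Sketch of declaration checking: every object is declared exactly once,
# and every identifier that is used must have been declared.
symbol_table = {}

def declare(name, type_name):
    if name in symbol_table:                  # multiply declared: semantic error
        raise Exception(f'"{name}" is declared more than once')
    symbol_table[name] = {'type': type_name}

def check_use(name):
    if name not in symbol_table:              # used but never declared
        raise Exception(f'"{name}" is used but never declared')
    return symbol_table[name]

declare('x', 'integer')
check_use('x')                                # fine
# declare('x', 'float')                       # would raise: declared twice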

Joey Paquet, 2000, 2002 21


Processing Declarations
• Code generation in declarations comes in the form of
memory allocation for the objects defined
• Every object defined, no matter its type, will eventually
have to be stored in the computer’s memory
• Memory allocation must be done according to the size
of the objects defined, which depends on the target
machine
• For each identifier declared, you must generate a label
that will be used to refer to that variable in the ASM
code and store it in the location field of its entry in the
symbol table
• See the Moon machine description documentation for
more explanations specific to the project

Joey Paquet, 2000, 2002 22


Processing Variable Declarations
• <varDecl> → <type><id>; {varDeclSem}
– An entry is created in the corresponding symbol table.
Memory space is reserved for the variable according to the
size of the type of the variable and linked to a label in the
ASM code
– The starting address (or its label) is stored in the symbol
table entry of the variable. In the case of arrays, the offset
(the size of the elements) is often stored in the symbol table
record
• <varDecl> → <type><idList>; {varDeclSem}
– To generate each entry, (one for each element in the list),
the compiler must keep track of the type of the
declaration. This is an attribute that is migrated using a
technique appropriate to the parsing method used
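
A small sketch in Python of such a varDeclSem action: it reserves storage according to the type's size, generates a label for each variable, and records the label in the symbol table. The type sizes, the label scheme, and the reserve-storage directive are assumptions for illustration, not exact Moon syntax:

# Sketch of varDeclSem for  <varDecl> -> <type><idList>;
type_sizes = {'integer': 4, 'float': 8}           # assumed target-machine sizes
symbol_table = {}
data_section = []                                 # ASM lines reserving storage

def var_decl_sem(type_name, id_list):
    for name in id_list:                          # one entry per <id> in the list
        label = f'var_{name}'                     # label used in the ASM code
        data_section.append(f'{label}  res {type_sizes[type_name]}')
        symbol_table[name] = {'type': type_name, 'location': label}

var_decl_sem('integer', ['a', 'b'])
print('\n'.join(data_section))                    # var_a  res 4
                                                  # var_b  res 4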

Joey Paquet, 2000, 2002 23


Processing Type Declarations
• Most programming languages allow the definition of
types that are aggregates of the basic types defined in the
language
• These are typically array or record types, or even
abstract data types (or classes) in object-oriented
programming languages
• <typeDecl> → <type><id> is <typeDef>;
{typeDeclSem}
– An entry is created in the symbol table for the new type
defined. It contains a definition (e.g. size) of all the
elements of the new type
– This information is used when new objects of that type are
declared in the program, and to compute the offset when
arrays of elements of that type are created
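
A small sketch in Python of what a typeDeclSem action might record for a record-like type: the total size is the sum of the field sizes, and the field offsets are kept for later use. The layout rules and table format are assumptions for illustration:

# Sketch of typeDeclSem: compute and store the size of an aggregate type.
basic_sizes = {'integer': 4, 'float': 8}
type_table = {}

def type_decl_sem(type_name, fields):             # fields: [(field_name, field_type)]
    offset, layout = 0, {}
    for fname, ftype in fields:
        layout[fname] = offset                    # offset of the field in the record
        offset += basic_sizes[ftype]
    type_table[type_name] = {'size': offset, 'fields': layout}

type_decl_sem('Point', [('x', 'integer'), ('y', 'integer')])
print(type_table['Point'])        # {'size': 8, 'fields': {'x': 0, 'y': 4}}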

Joey Paquet, 2000, 2002 24


Type Compatibility
• In Pascal:
– A, B: array [1..10] of integer;
– C, D: array [1..10] of integer;
• this is equivalent to defining two distinct data types:
– type Type1 = array [1..10] of integer;
– A, B: Type1;
– type Type2 = array [1..10] of integer;
– C, D: Type2;
• Here, Type1 and Type2 are clearly two distinct data
types, e.g. A := D is not permitted in the program

Joey Paquet, 2000, 2002 25


Type Compatibility
• Some compilers instead use rules such as structural
equivalence to define type equivalence. One of the first
compilers to implement this was the Algol 68 compiler
• Advantage:
– gives more flexibility to the language
• Drawbacks:
– compiler is much more complicated to implement
– the compiler can hardly distinguish between structurally
equivalent types that were meant to be distinct
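
A minimal sketch in Python contrasting the two policies on the array example from the previous slide; the representation of types as dictionaries is an assumption for illustration:

# Name equivalence vs structural equivalence:
# Type1 and Type2 are both "array [1..10] of integer".
type1 = {'name': 'Type1', 'kind': 'array', 'bounds': (1, 10), 'elem': 'integer'}
type2 = {'name': 'Type2', 'kind': 'array', 'bounds': (1, 10), 'elem': 'integer'}

def name_equivalent(t1, t2):
    return t1['name'] == t2['name']               # distinct names: A := D rejected

def structurally_equivalent(t1, t2):
    return (t1['kind'], t1['bounds'], t1['elem']) == \
           (t2['kind'], t2['bounds'], t2['elem'])

print(name_equivalent(type1, type2))              # False
print(structurally_equivalent(type1, type2))      # True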

Joey Paquet, 2000, 2002 26


Processing Arrays
• Static arrays are arrays with static size defined at
compile time
• Most programming languages allow only integer literals
for specifying the array size, or constant integer
variables when available in the language
• Pascal: A: array [1..10] of integer;
• C: int A[10];
or
const size=10;
int A[size];

Joey Paquet, 2000, 2002 27


Processing Arrays
• This restriction comes from the fact that the memory
allocated to the array has to be set at compile time, and
is fixed throughout the execution of the program
• When processing an array declaration, a sufficient
amount of memory is allocated to the variable
depending on the size of the elements and the
cardinality of the array
• Only the starting address (or a label) is stored in the
symbol table. The offset (the size of the array in
memory) is also sometimes stored in the symbol table
record to avoid referring of elements outside the
bounds
• Dynamic arrays are generally implemented using
pointers, dynamic memory allocation functions and an
execution stack or heap
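
A small sketch in Python of the compile-time size computation for a static array and the element address calculation (and bounds check) it enables; the element size and layout are assumptions for illustration:

# Static array allocation: the total size is fixed at compile time, and an
# element's address is its offset added to the array's base address (label).
elem_size = 4                                  # assumed size of an integer
lower, upper = 1, 10                           # A: array [1..10] of integer
total_size = (upper - lower + 1) * elem_size   # 40 bytes reserved for A

def element_offset(index):
    if not lower <= index <= upper:            # the stored size allows bounds checks
        raise IndexError('index outside the declared bounds')
    return (index - lower) * elem_size

print(total_size, element_offset(3))           # prints: 40 8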

Joey Paquet, 2000, 2002 28


Processing Expressions
• Semantic records contain the type and location for
variables (normally labels in the ASM code) or the type
and value for constant factors
• Semantic records are created at the leaves of the tree
when factors (F) are recognized, and then passed
upwards in the tree
• These semantic records contain the attributes that are
migrated within the tree to compute a global result at the
root of the tree for that expression

Joey Paquet, 2000, 2002 29


Processing Expressions
• As new nodes (or subtrees) are created going up in the
tree, intermediate results are stored in temporary
semantic records containing subresults for
subexpressions
• Each time an operator node is resolved, its
corresponding semantic checking and translation is
done and its subresult is stored in a temporary variable
for which you have to allocate some memory and
generate a label
• You can even put an entry in the symbol table for each
intermediate result you generated. You can then use
these for further reference, e.g. for debugging
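
A small sketch in Python of that traversal: each time an operator node is resolved, a fresh temporary is allocated and the code for the subresult is emitted. The instruction mnemonics imitate the example on the next slide and are not exact Moon code:

# Sketch of expression translation: each resolved operator node emits code,
# stores its subresult in a new temporary, and passes that temporary upwards.
code, temp_count = [], 0

def translate(node):
    global temp_count
    if isinstance(node, str):                        # leaf factor: its location
        return node
    op, left, right = node
    l, r = translate(left), translate(right)
    temp_count += 1
    t = f't{temp_count}'                             # memory allocated and labelled
    opcode = {'+': 'A', '*': 'M'}[op]
    code.extend([f'L  3,{l}', f'{opcode}  3,{r}', f'ST 3,{t}'])
    return t

result = translate(('+', 'a', ('*', 'b', 'c')))      # x = a + b*c
code.extend([f'L  3,{result}', 'ST 3,x'])
print('\n'.join(code))                               # same sequence as the next slide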

Joey Paquet, 2000, 2002 30


Processing Expressions
• Doing so, the code is generated sequentially as the tree
is traversed:

subtree ASM
=
t1 = b*c L 3,b
M 3,c
x + ST 3,t1
t2 = a+t1 L 3,a
a * A 3,t1
ST 3,t2
b c x = t2 L 3,t2
ST 3,x

Joey Paquet, 2000, 2002 31


Conclusions
• Most compilers build an intermediate
representation of the parsed program,
normally as an abstract syntax tree.
• These will allow high-level optimizations
to occur before the code is generated.
• In the project, we are outputting MOON
code, which is an intermediate
language.
• MOON code could be the subject of low-
level optimizations.

Joey Paquet, 2000, 2002 32


Conclusions
• Semantic actions are composed of a
semantic checking part and a semantic
translation part.
• Semantic actions are inserted at
appropriate places in the grammar to
achieve the semantic checking and
translation phase.
• Semantic translation is conditional on
semantic checking.

Joey Paquet, 2000, 2002 33


Conclusions
• There are semantic actions for:
– Declarations (variables, functions, types,
etc)
– Expressions (arithmetic, logic, etc)
– Control structures (loops, conditions, etc)

Joey Paquet, 2000, 2002 34
