rkCD-Chapter 1 - INTRO TO COMPILERS
INTRODUCTION TO COMPILERS
1.1 COMPILERS
● A compiler is a program that reads a program written in one language, the source
language, and translates it into an equivalent program in another language, the
target language.
● The compiler reports to its user the presence of errors in the source program.
1. PARTS OF COMPILATION
There are two parts to compilation:
i) analysis
ii) synthesis
The analysis part breaks up the source program into constituent pieces and creates an
intermediate representation of the source program.
The synthesis part constructs the desired target program from the intermediate
representation. Of the two parts, synthesis requires the most specialized techniques.
During analysis, the operations implied by the source program are determined and
recorded in a hierarchical structure called a tree.
Often, a special kind of tree called a syntax tree is used, in which each node represents an
operation and the children of a node represent the arguments of the operation.
Structure editor
● A structure editor takes as input a sequence of commands to build a source program.
● The structure editor not only performs the text-creation and modification functions of
an ordinary text editor, but it also analyzes the program text, putting an appropriate
hierarchical structure on the source program.
● For example, it can check that the input is correctly formed, can supply keywords
automatically (e.g., when the user types while, the editor supplies the matching do
and reminds the user that a conditional must come between them), and can jump from
a begin or left parenthesis to its matching end or right parenthesis.
Pretty printers
● A pretty printer analyzes a program and prints it in such a way that the structure of
the program becomes clearly visible.
● For example, comments may appear in a special font, and statements may appear with
an amount of indentation proportional to the depth of their nesting in the hierarchical
organization of the statements.
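The indentation rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the program is taken to be already analyzed into nested lists (a string is a statement, a list is a nested block), and the name pretty_print and the sample statements are hypothetical.

```python
def pretty_print(block, depth=0, indent="    "):
    """Return the statements of a nested block, indented by nesting depth."""
    lines = []
    for item in block:
        if isinstance(item, list):
            # A nested list is a nested block: print it one level deeper.
            lines.extend(pretty_print(item, depth + 1, indent))
        else:
            lines.append(indent * depth + item)
    return lines

# Hypothetical, already-analyzed program structure.
program = [
    "while x > 0 do",
    ["if odd(x) then", ["y := y + 1"], "x := x - 1"],
    "writeln(y)",
]
print("\n".join(pretty_print(program)))
```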
Static checkers
● A static checker reads a program, analyzes it, and attempts to discover potential bugs
without running the program.
● For example, a static checker may detect that parts of the source program can never be
executed.
● It can catch logical errors such as trying to use a real variable as a pointer.
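As a rough sketch of the unreachable-code check, the following uses Python's standard ast module and treats Python itself as the source language, which is an assumption made only for illustration. The function name unreachable_lines is hypothetical, and the check is deliberately narrow: it reports only the statement immediately following a return, raise, break, or continue in the same block.

```python
import ast

def unreachable_lines(source):
    """Line numbers of statements placed directly after a jump in the same block."""
    jumps = (ast.Return, ast.Raise, ast.Break, ast.Continue)
    found = []
    for node in ast.walk(ast.parse(source)):
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if not isinstance(block, list):
                continue  # e.g. the body of a lambda is an expression, not a block
            for previous, statement in zip(block, block[1:]):
                if isinstance(previous, jumps):
                    found.append(statement.lineno)
    return sorted(found)

sample = """\
def f(x):
    if x > 0:
        return 1
        print("never runs")
    return 0
"""
print(unreachable_lines(sample))  # [4]
```

The program is analyzed but never run, which is what distinguishes a static checker from testing.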
Interpreters
● An interpreter performs the operations implied by the source program instead of translating it into a target program.
● For an assignment statement, for example, an interpreter might build a tree like Fig.
1.2, and then carry out the operations at the nodes as it "walks" the tree.
● Interpreters are frequently used to execute command languages, since each operator
executed in a command language is usually an invocation of a complex routine such
as an editor or compiler.
● The analysis portion in each of the following examples is similar to that of a
conventional compiler.
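The tree-walking idea described for interpreters can be sketched as follows. The tree below is an assumed one for the assignment position := initial + rate * 60 and may differ in detail from Fig. 1.2; the tuple encoding and the names evaluate and execute are illustrative choices.

```python
def evaluate(node, env):
    """Walk an expression tree, carrying out the operation at each node."""
    if isinstance(node, (int, float)):
        return node                   # numeric leaf
    if isinstance(node, str):
        return env[node]              # identifier leaf: look up its current value
    op, left, right = node
    a, b = evaluate(left, env), evaluate(right, env)
    if op == "+":
        return a + b
    if op == "*":
        return a * b
    raise ValueError(f"unknown operator {op!r}")

def execute(assignment, env):
    """Execute (':=', name, expression) by storing the expression's value."""
    _, name, expression = assignment
    env[name] = evaluate(expression, env)

# Assumed tree for: position := initial + rate * 60
tree = (":=", "position", ("+", "initial", ("*", "rate", 60)))
env = {"initial": 10.0, "rate": 2.5}
execute(tree, env)
print(env["position"])  # 160.0
```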
SYMBOL-TABLE MANAGEMENT
● An essential function of a compiler is to record the identifiers used in the source
program and collect information about various attributes of each identifier.
● These attributes may provide information about the storage allocated for an identifier,
its type, its scope and, in the case of procedure names, such things as the number and
types of its arguments, the method of passing each argument and the type returned.
● A symbol table is a data structure containing a record for each identifier, with fields
for the attributes of the identifier. The data structure allows us to find the record for
each identifier quickly and to store or retrieve data from that record quickly.
● When an identifier in the source program is detected by the lexical analyzer, the
identifier is entered into the symbol table.
● However, the attributes of an identifier cannot normally be determined during lexical
analysis. For example, in a Pascal declaration like
var position, initial, rate : real;
the type real is not known when position, initial, and rate are seen by the lexical
analyzer.
The remaining phases enter information about identifiers into the symbol table and then
use this information in various ways.
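A minimal sketch of this division of labour, assuming a Python dictionary as the symbol table (hashing gives the quick lookup mentioned above). The function names, the keyword set, and the regular expressions are hypothetical simplifications, not a real Pascal front end.

```python
import re

symbol_table = {}                       # identifier -> record of attributes
KEYWORDS = {"var", "real", "integer"}   # assumed, deliberately tiny keyword set

def enter(name):
    """Return the record for name, creating it the first time it is seen."""
    return symbol_table.setdefault(name, {"type": None})

def lexical_analysis(text):
    """Enter each identifier; its type cannot be filled in at this point."""
    for word in re.findall(r"[A-Za-z_]\w*", text):
        if word not in KEYWORDS:
            enter(word)

def process_declaration(text):
    """A later phase: attach the declared type to every listed identifier."""
    match = re.match(r"\s*var\s+(.*?)\s*:\s*(\w+)\s*;", text)
    names, type_name = match.group(1), match.group(2)
    for name in names.split(","):
        enter(name.strip())["type"] = type_name

declaration = "var position, initial, rate : real;"
lexical_analysis(declaration)
print(symbol_table["rate"])      # the type field is still None here
process_declaration(declaration)
print(symbol_table["rate"])      # the type field now holds "real"
```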
SYNTAX ANALYSIS
● Syntax analysis groups tokens together into syntactic structures, such as the syntax tree in Fig. 1.11(a).
● A typical data structure for the tree is shown in Fig. 1.11(b) in which an interior node
is a record with a field for the operator and two fields containing pointers to the
records for the left and right children.
● A leaf is a record with two or more fields, one to identify the token at the leaf, and the
others to record information about the token.
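The two kinds of record can be sketched with Python dataclasses. The class and field names are assumptions chosen to mirror the description above, and the example expression initial + rate * 60 is illustrative rather than a copy of Fig. 1.11.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    token: str       # identifies the token at the leaf, e.g. "id" or "num"
    value: object    # further information about the token

@dataclass
class Interior:
    op: str                          # field for the operator
    left: Union["Interior", Leaf]    # pointer to the record for the left child
    right: Union["Interior", Leaf]   # pointer to the record for the right child

# Assumed example: initial + rate * 60
tree = Interior("+", Leaf("id", "initial"),
                Interior("*", Leaf("id", "rate"), Leaf("num", 60)))

def to_infix(node):
    """Rebuild a fully parenthesized expression by following the child pointers."""
    if isinstance(node, Leaf):
        return str(node.value)
    return f"({to_infix(node.left)} {node.op} {to_infix(node.right)})"

print(to_infix(tree))  # (initial + (rate * 60))
```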
SEMANTIC ANALYSIS
● An important component of semantic analysis is type checking. Here the compiler
checks that each operator has operands that are permitted by the source language
specification.
● For example, many programming language definitions require a compiler to report an
error every time a real number is used to index an array.
● However, the language specification may permit some operand coercions, for
example, when a binary arithmetic operator is applied to an integer and a real. In this
case, the compiler may need to convert the integer to a real.
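Both checks can be sketched on a small expression tree. This is an illustration under assumptions: trees are tuples (op, left, right), declared types live in a plain dictionary, array types are written as the string "array of real", and inttoreal is a hypothetical name for the inserted conversion node.

```python
def check(node, types):
    """Type-check an expression; return (type, tree with coercions inserted)."""
    if isinstance(node, int):
        return "integer", node
    if isinstance(node, float):
        return "real", node
    if isinstance(node, str):                      # identifier: use declared type
        return types[node], node
    op, left, right = node
    left_type, left = check(left, types)
    right_type, right = check(right, types)
    if op == "index":                              # array[subscript]
        if right_type != "integer":
            raise TypeError(f"array index has type {right_type}, expected integer")
        return left_type[len("array of "):], (op, left, right)
    # Coercion: convert the integer operand when the other operand is real.
    if left_type == "integer" and right_type == "real":
        left, left_type = ("inttoreal", left), "real"
    elif left_type == "real" and right_type == "integer":
        right, right_type = ("inttoreal", right), "real"
    if left_type != right_type:
        raise TypeError(f"operands of {op!r} have types {left_type} and {right_type}")
    return left_type, (op, left, right)

types = {"initial": "real", "rate": "real", "a": "array of real"}
print(check(("+", "initial", ("*", "rate", 60)), types))
```

Calling check(("index", "a", 1.5), types) raises TypeError, which corresponds to the error a compiler reports when a real number is used to index an array.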
CODE OPTIMIZATION
● The code optimization phase attempts to improve the intermediate code, so that faster-
running machine code will result.
● The above intermediate code is optimized like this,
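As a minimal sketch of such an improvement, assume three-address code for the statement position := initial + rate * 60. The instruction format (destination, operator, argument, argument), the names temp1 to temp3 and id1 to id3, and the inttoreal operator are illustrative assumptions, not a listing taken from these notes. The sketch performs the integer-to-real conversion of a constant at compile time and removes a copy out of a temporary, assuming that temporary is not used again.

```python
def optimize(code):
    """Fold inttoreal of a constant and merge 'temp := ...; x := temp' pairs."""
    constants = {}   # temporaries known to hold a compile-time constant
    out = []
    for dest, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "inttoreal" and isinstance(a, int):
            constants[dest] = float(a)          # convert once, at compile time
        elif (op == "copy" and out and isinstance(a, str)
              and a.startswith("temp") and out[-1][0] == a):
            # The previous instruction computed the temporary being copied,
            # so let it write straight into the final destination.
            _, prev_op, prev_a, prev_b = out.pop()
            out.append((dest, prev_op, prev_a, prev_b))
        else:
            out.append((dest, op, a, b))
    return out

# Assumed intermediate code for: position := initial + rate * 60
code = [
    ("temp1", "inttoreal", 60, None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]
for instruction in optimize(code):
    print(instruction)
```

Four instructions shrink to two: a multiplication by the constant 60.0 and an addition that stores directly into id1.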
CODE GENERATION
● The final phase of the compiler is the generation of target code, consisting normally of
relocatable machine code or assembly code. Memory locations are selected for each
of the variables used by the program.
● Then, intermediate instructions are each translated into a sequence of machine
instructions that perform the same task.
● A crucial aspect is the assignment of variables to registers. For example, using
registers 1 and 2, the translation of the above code might become
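A minimal sketch of this translation step, under stated assumptions: the input is two three-address instructions for position := initial + rate * 60, the mnemonics MOVF, MULF, and ADDF belong to a hypothetical two-operand machine whose second operand is the destination, and each instruction simply takes the next unused register. The register numbering and instruction order it prints are therefore illustrative only.

```python
OPCODES = {"+": "ADDF", "*": "MULF"}   # mnemonics of a hypothetical machine

def generate(code):
    """Translate each intermediate instruction into machine instructions."""
    register_of = {}   # temporary name -> register that holds its value
    asm = []

    def operand(x):
        if x in register_of:
            return register_of[x]                     # value already in a register
        return x if isinstance(x, str) else f"#{x}"   # memory name or constant

    for dest, op, a, b in code:
        reg = f"R{len(register_of) + 1}"              # next unused register
        asm.append(f"MOVF {operand(a)}, {reg}")       # load the first operand
        asm.append(f"{OPCODES[op]} {operand(b)}, {reg}")
        if dest.startswith("temp"):
            register_of[dest] = reg                   # keep the temporary in reg
        else:
            asm.append(f"MOVF {reg}, {dest}")         # store into the variable
    return asm

# Assumed intermediate code: temp1 := id3 * 60.0 ; id1 := id2 + temp1
code = [("temp1", "*", "id3", 60.0), ("id1", "+", "id2", "temp1")]
print("\n".join(generate(code)))
```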
Preprocessors
Preprocessors produce input to compilers. They may perform the following functions:
1. Macro processing - A preprocessor may allow a user to define macros that are
shorthands for longer constructs.
2. File inclusion - A preprocessor may include header files into the program text. For
example, the C preprocessor causes the contents of the file <global.h> to replace
the statement #include <global.h> when it processes a file containing this
statement.
3. "Rational" preprocessors - These processors augment older languages with more
modern flow-of-control and data-structuring facilities. For example, such a
preprocessor might provide the user with built-in macros for constructs like while-
statements or if-statements, where none exist in the programming language itself.
4. Language extensions - These processors attempt to add capabilities to the language
by what amounts to built-in macros. For example, the language Equel is a database
query language embedded in C.
Statements beginning with ## are taken by the preprocessor to be database-access
statements, unrelated to C, and are translated into procedure calls on routines that
perform the database access.
Macro processors deal with two kinds of statement:
● macro definition
● macro use
Definitions are normally indicated by some unique character or keyword, like define or
macro. They consist of a name for the macro being defined and a body, forming its
definition.
The use of a macro consists of naming the macro and supplying actual parameters, that is,
values for its formal parameters. The macro processor substitutes the actual parameters
for the formal parameters in the body of the macro; the transformed body then replaces
the macro use itself.
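The substitution step can be sketched as follows. This is a simplified illustration, not a real preprocessor: the names define, expand_use, and expand are hypothetical, a macro use is assumed to have the form NAME(arg, ...), and actual parameters may not contain commas or nested parentheses.

```python
import re

macros = {}   # macro name -> (formal parameter names, body)

def define(name, formals, body):
    """Record a macro definition: a name plus the body forming its definition."""
    macros[name] = (formals, body)

def expand_use(name, actuals):
    """Substitute actual parameters for formal parameters in the macro body."""
    formals, body = macros[name]
    if len(actuals) != len(formals):
        raise ValueError(f"{name} expects {len(formals)} parameter(s)")
    binding = dict(zip(formals, actuals))
    # Replace whole identifiers only, in a single pass over the body.
    return re.sub(r"[A-Za-z_]\w*",
                  lambda m: binding.get(m.group(), m.group()), body)

def expand(text):
    """Replace each macro use in text by its transformed body."""
    def replace(match):
        name, arguments = match.group(1), match.group(2)
        if name not in macros:
            return match.group()          # not a macro use: leave untouched
        return expand_use(name, [a.strip() for a in arguments.split(",")])
    return re.sub(r"([A-Za-z_]\w*)\(([^()]*)\)", replace, text)

define("SQUARE", ["x"], "((x) * (x))")
print(expand("y = SQUARE(a + 1);"))   # y = ((a + 1) * (a + 1));
```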
Assemblers
● Some compilers produce assembly code that is passed to an assembler for further
processing. Other compilers perform the job of the assembler, producing relocatable
machine code that can be passed directly to the loader/link-editor.
● Assembly code is a mnemonic version of machine code, in which names are used
instead of binary codes for operations, and names are also given to memory addresses.
● A typical sequence of assembly instructions might be
● In the second pass, the assembler scans the input again. This time, it translates each
operation code into the sequence of bits representing that operation in machine
language, and it translates each identifier representing a location into the address
given for that identifier in the symbol table.
● The output of the second pass is usually relocatable machine code, meaning that it can
be loaded starting at any location L in memory.
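A two-pass assembler can be sketched as below. The sketch assumes, as is usual, a first pass that records each identifier denoting a storage location in a symbol table and assigns it an address. The three-instruction program, the opcode bit patterns, the word size, and the register naming convention R1, R2, ... are all hypothetical.

```python
import re

OPCODES = {"MOV": "0001", "ADD": "0011"}   # hypothetical bit patterns
WORD = 4                                    # assumed size of a data word in bytes

def parse(line):
    """Split 'OP x, y' into the operation and its list of operands."""
    op, rest = line.split(None, 1)
    return op, [token.strip() for token in rest.split(",")]

def is_location(token):
    """True for identifiers that are not registers (R1, R2, ...) or immediates."""
    return (re.fullmatch(r"[A-Za-z_]\w*", token) is not None
            and re.fullmatch(r"R\d+", token) is None)

def first_pass(lines):
    """Assign the next free relative address to each new location identifier."""
    table = {}
    for line in lines:
        for token in parse(line)[1]:
            if is_location(token) and token not in table:
                table[token] = len(table) * WORD
    return table

def second_pass(lines, table):
    """Translate operation codes to bits and identifiers to their addresses."""
    return [(OPCODES[op], [table.get(token, token) for token in operands])
            for op, operands in map(parse, lines)]

def relocate(code, L):
    """Load at location L: add L to every address taken from the symbol table."""
    return [(bits, [o + L if isinstance(o, int) else o for o in operands])
            for bits, operands in code]

program = ["MOV a, R1", "ADD #2, R1", "MOV R1, b"]   # assumed input
table = first_pass(program)
code = second_pass(program, table)
print(table)
print(relocate(code, 15))
```

Because the addresses produced by the second pass are relative to zero, relocation amounts to adding the load address L to each of them.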
Some general tools have been created for the automatic design of specific compiler
components. These tools use specialized languages for specifying and implementing the
component, and many use algorithms that are quite sophisticated.
4. Automatic code generators - Such a tool takes a collection of rules that define the
translation of each operation of the intermediate language into the machine
language for the target machine.
The rules must include sufficient detail that we can handle the different possible
access methods for data; e.g., variables may be in registers, in a fixed (static)
location in memory, or may be allocated a position on a stack.
The basic technique is "template matching." The intermediate code statements are
replaced by "templates" that represent sequences of machine instructions, in such a
way that the assumptions about storage of variables match from template to
template.
5. Data flow engines - Much of the information needed to perform good code
optimization involves "data-flow analysis," the gathering of information about how
values are transmitted from one part of a program to each other part. Different
tasks of this nature can be performed by essentially the same routine, with the user
supplying details of the relationship between intermediate code statements and the
information being gathered.
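The idea of one routine serving several analyses can be sketched as a small fixed-point solver. This is an illustration under assumptions: the solver below handles only backward analyses that combine information by set union, the user-supplied detail is a single transfer function, and the example task (live-variable analysis on three hypothetical statements) is chosen for brevity.

```python
def solve(successors, transfer):
    """Generic backward analysis: iterate until no set changes.

    successors maps each statement to the statements that may follow it.
    transfer(statement, after) is supplied by the user and returns the
    information holding before the statement, given what holds after it.
    """
    before = {s: set() for s in successors}
    changed = True
    while changed:
        changed = False
        for s in successors:
            after = set().union(*(before[t] for t in successors[s]))
            new = transfer(s, after)
            if new != before[s]:
                before[s], changed = new, True
    return before

# User-supplied details for live-variable analysis (hypothetical program):
#   1: t := rate * 60    2: position := initial + t    3: print position
statements = {
    1: ("t", {"rate"}),                 # (variable defined, variables used)
    2: ("position", {"initial", "t"}),
    3: (None, {"position"}),
}
successors = {1: [2], 2: [3], 3: []}

def live_before(statement, live_after):
    defined, used = statements[statement]
    return (live_after - {defined}) | used

print(solve(successors, live_before))
```

Swapping in a different transfer function reuses solve unchanged for another backward union-based analysis, which is the kind of sharing a data-flow engine provides.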