
CD Uint5

The document discusses several issues that arise during code generation in compilers, including the input format, target program format, memory management, instruction selection, register allocation, evaluation order, and approaches to addressing code generation issues. It also outlines some disadvantages of code generator design such as limited flexibility, maintenance overhead, debugging difficulties, and potential performance and learning curve issues.

UNIT – 5

Issues in Design of a Code Generator:-


A code generator converts the intermediate representation of source code into a
form that can be readily executed by the machine. A code generator is expected
to generate correct code, and it should be designed so that it can be easily
implemented, tested, and maintained.
The following issues arise during the code generation phase:
• Input to the code generator – The input to the code generator is the
intermediate code generated by the front end, along with information in the
symbol table that determines the run-time addresses of the data objects denoted
by the names in the intermediate representation. Intermediate code may be
represented as quadruples, triples, indirect triples, postfix notation, syntax
trees, DAGs, etc. The code generation phase proceeds on the assumption that the
input is free from all syntactic and static semantic errors, that the necessary
type checking has taken place, and that type-conversion operators have been
inserted wherever necessary.
• Target program – The target program is the output of the code generator.
The output may be absolute machine language, relocatable machine
language, or assembly language.
   • Absolute machine language as output has the advantage that it
   can be placed in a fixed memory location and executed
   immediately. For example, WATFIV is a compiler that produces
   absolute machine code as output.
   • Relocatable machine language as output allows subprograms
   and subroutines to be compiled separately. Relocatable object
   modules can be linked together and loaded by a linking loader, at
   the added expense of linking and loading.
   • Assembly language as output makes code generation easier.
   We can generate symbolic instructions and use the macro facilities
   of the assembler in generating code. The price is an additional
   assembly step after code generation.
• Memory management – Mapping the names in the source program to the
addresses of data objects is done jointly by the front end and the code
generator. A name in a three-address statement refers to the symbol table entry
for the name. From that symbol table entry, a relative address can be
determined for the name.
• Instruction selection – Selecting the best instructions improves the
efficiency of the program. The instruction set of the target machine should be
complete and uniform. Instruction speeds and machine idioms also play a major
role when efficiency is considered. If we do not care about the efficiency of
the target program, instruction selection is straightforward: each
three-address statement is translated by a fixed template. For example, the two
three-address statements below would be translated into the code sequence that
follows them:

P:=Q+R
S:=P+T

MOV Q, R0
ADD R, R0
MOV R0, P
MOV P, R0
ADD T, R0
MOV R0, S

Here the fourth instruction is redundant: it reloads the value of P into R0
immediately after the third instruction stored that same value from R0 into P.
This leads to an inefficient code sequence. A given intermediate representation
can be translated into many code sequences, with significant cost differences
between the different implementations. Prior knowledge of instruction costs is
needed in order to design good sequences, but accurate cost information is
difficult to predict.
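The redundancy described above can be removed mechanically. The following is a minimal sketch, not a definitive implementation: it assumes instructions are stored as hypothetical (opcode, source, destination) tuples and that no jump targets the second instruction of the pair. The function name remove_redundant_loads is made up for this illustration.

```python
def remove_redundant_loads(instructions):
    """Drop a MOV that copies a value straight back after the previous MOV stored it."""
    result = []
    for instr in instructions:
        op, src, dst = instr
        if result:
            prev_op, prev_src, prev_dst = result[-1]
            # "MOV R0, P" followed by "MOV P, R0": R0 already holds the value of P.
            if op == "MOV" and prev_op == "MOV" and src == prev_dst and dst == prev_src:
                continue
        result.append(instr)
    return result


# The code sequence generated for P:=Q+R and S:=P+T.
code = [
    ("MOV", "Q", "R0"),
    ("ADD", "R", "R0"),
    ("MOV", "R0", "P"),
    ("MOV", "P", "R0"),   # the redundant fourth instruction
    ("ADD", "T", "R0"),
    ("MOV", "R0", "S"),
]
optimized = remove_redundant_loads(code)
for op, src, dst in optimized:
    print(f"{op} {src}, {dst}")
```

The result keeps the other five instructions in their original order. A real code generator would more likely avoid emitting the load in the first place by tracking what each register holds.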
• Register allocation issues – Computations on registers are faster than
computations on memory operands, so efficient utilization of registers is
important. The use of registers is subdivided into two subproblems:
1. During register allocation, we select the set of variables that will
reside in registers at each point in the program.
2. During the subsequent register assignment phase, we pick the specific
register in which each variable will reside.
To understand the concept, consider the following three-address code
sequence:
t:=a+b
t:=t*c
t:=t/d
An efficient machine code sequence for it is as follows:
MOV a,R0
ADD b,R0
MUL c,R0
DIV d, R0
MOV R0,t
• Evaluation order – The code generator decides the order in which the
instructions will be executed. The order of computations affects the efficiency
of the target code. Among the many possible orders, some require fewer
registers to hold the intermediate results. However, picking the best
order in the general case is an NP-complete problem.
• Approaches to code generation issues – The code generator must always
generate correct code. Correctness deserves special emphasis because of the
number of special cases that a code generator might face. Some of the design
goals of a code generator are:
   • Correct
   • Easily maintainable
   • Testable
   • Efficient
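To illustrate the evaluation-order issue above: for expression trees (as opposed to the general case), the number of registers needed can be computed with the classic Ershov (Sethi-Ullman) labeling. The sketch below is only an illustration under stated assumptions: it assumes a load/store machine where every operand must first be loaded into a register, which differs from the memory-operand MOV/ADD code used elsewhere in these notes, and it uses a hypothetical tuple encoding (op, left, right) for tree nodes.

```python
def registers_needed(node):
    """Return the Ershov number: registers needed to evaluate `node` without spilling."""
    if isinstance(node, str):
        return 1                      # a leaf (variable name) needs one register
    _op, left, right = node
    l, r = registers_needed(left), registers_needed(right)
    # Evaluating the costlier subtree first means an extra register
    # is needed only when both subtrees need the same number.
    return max(l, r) if l != r else l + 1


balanced = ("+", ("+", "a", "b"), ("+", "c", "d"))      # (a+b)+(c+d)
skewed = ("+", "a", ("+", "b", ("+", "c", "d")))        # a+(b+(c+d))
print(registers_needed(balanced))
print(registers_needed(skewed))
```

Both trees add the same four values, yet the skewed tree needs fewer registers, provided the deeper subtree is evaluated first. This is the sense in which some evaluation orders require fewer registers than others.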

Disadvantages in the design of a code generator:

Limited flexibility: Code generators are typically designed to produce a
specific type of code, and as a result, they may not be flexible enough to handle
a wide range of inputs or generate code for different target platforms. This can
limit the usefulness of the code generator in certain situations.
Maintenance overhead: Code generators can add a significant maintenance
overhead to a project, as they need to be maintained and updated alongside the
code they generate. This can lead to additional complexity and potential errors.
Debugging difficulties: Debugging generated code can be more difficult than
debugging hand-written code, as the generated code may not always be easy
to read or understand. This can make it harder to identify and fix issues that
arise during development.
Performance issues: Depending on the complexity of the code being
generated, a code generator may not be able to generate code that is
as performant as hand-written code. This can be a concern in applications
where performance is critical.
Learning curve: Code generators can have a steep learning curve, as they
typically require a deep understanding of the underlying code generation
framework and the programming languages being used. This can make it more
difficult to onboard new developers onto a project that uses a code generator.
Over-reliance: It’s important to ensure that the use of a code generator doesn’t
lead to over-reliance on generated code, to the point where developers are no
longer able to write code manually when necessary. This can limit the flexibility
and creativity of a development team, and may also result in lower quality code
overall.

Object Code Forms:-


Assume that you have a C program. You give the C program to the
compiler, and the compiler produces its output as assembly code. That
assembly code is then given to the assembler, and the code that the
assembler produces is known as object code.
In the context of compiler design, object code is the code that is
generated by the compiler after the syntax analysis, semantic analysis, and
optimization stages. Object code is essentially the machine-readable version of
the source code: machine instructions that the computer’s CPU can execute once
the code has been linked and loaded.
1. Object code is typically stored in a binary file format, which is specific to the
target architecture and operating system. The object code file contains both
the executable code and data, as well as information about the program’s
symbols and their memory locations.
2. Object code is generated by the compiler in multiple steps. First, the source
code is transformed into an intermediate representation, such as an abstract
syntax tree or a three-address code. Then, the intermediate code is
optimized to improve the efficiency and speed of the final executable code.
Finally, the optimized intermediate code is translated into the target
architecture’s machine code, using the appropriate instruction set and
addressing modes.
3. Object code can be linked with other object files to produce a complete
executable program. The linking process involves resolving any unresolved
external references, such as function calls or global variables, and
generating a final executable file that can be run on the target system.
In summary, object code is the machine-readable code that is generated by the
compiler, and it serves as an intermediate step between the source code and
the final executable code. Object code files are specific to the target
architecture and operating system and are typically stored in a binary file
format.

In practice, however, you do not invoke the compiler and the assembler
separately. You give the program to the compiler, and it gives you directly
executable code, because the compiler software bundles the compiler proper
together with the assembler and the linker. So when you call gcc, you are
not just calling the compiler: gcc runs the compiler, then the assembler,
then the linker, and the loader brings the result into memory when the program
is run. After compilation, the object code is stored on the hard disk. This
object code contains various parts –
• Header – The header says which parts are present in the object code
and points to each of them. It records where the text segment starts, where
the data segment starts, and where the relocation information and symbol
information are located. It works like the index of a textbook, which lists
the page number on which each topic appears: the header tells you the place
at which each piece of information is stored, so that other software can
later go directly to those segments.
• Text segment – The set of instructions of the program.
• Data segment – The data segment contains whatever data you have used.
For example, if you have used constants, they will be present in the data
segment.
• Relocation information – When we write a program, we generally use
symbols (labels) to refer to locations. Assume you have instruction 1,
instruction 2, instruction 3, instruction 4, …, and that instruction 4
carries the label L4.

• Now if you write Goto L4 somewhere (even if you don’t write a Goto statement
in the high-level language, the compiler may generate one), then in the
object code Goto L4 will be replaced by Goto 4. Goto 4 for the label L4 works
fine as long as the program is loaded starting at address 0. But in most
cases, the initial part of the RAM is dedicated to the operating system. Even
if it is not, some other process might already be running at address 0. So
when the program is loaded into main memory, it might be loaded anywhere.
Say 1000 is the new starting address: then all the addresses have to be
changed (Goto 4 becomes Goto 1004). This adjustment is known as relocation.

• The original address is known as the relocatable address, and the final
address, which we get after loading the program into main memory, is known
as the absolute address.
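The relocation step described above can be sketched in a few lines. This is a toy model under stated assumptions, not a real object-file format: instructions are hypothetical (opcode, operand) tuples, the relocation information is simply a list of indexes of instructions whose operand is a relocatable address, and the function name relocate is made up for the illustration.

```python
def relocate(instructions, relocation_entries, load_address):
    """Return a copy of `instructions` with each listed address operand shifted by `load_address`."""
    patched = list(instructions)
    for index in relocation_entries:
        op, target = patched[index]
        patched[index] = (op, target + load_address)   # relocatable -> absolute
    return patched


# Object code produced as if the program started at address 0:
# "Goto L4" has already become ("GOTO", 4).
text_segment = [("NOP", None), ("GOTO", 4), ("NOP", None), ("NOP", None), ("HALT", None)]
relocation_info = [1]      # instruction 1 holds a relocatable address

loaded = relocate(text_segment, relocation_info, load_address=1000)
print(loaded[1])
```

Only the entries named in the relocation information are touched, which is why the object file has to record them: the loader cannot otherwise tell an address from an ordinary number.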

• Symbol table – It contains every symbol that appears in your program. For
example, given the declaration int a, b, c, the names a, b, and c are symbols.
The symbol table shows which variables your program contains.

Features :

Machine-readable format: Object code is in a format that can be executed
directly by the processor without the need for further translation.
Architecture-specific: Object code is specific to a particular processor
architecture, so it must be recompiled for other architectures.
Linking: Object code can be linked together with other object files and libraries
to create a complete executable program.
Debugging information: Object code can include debugging information, such
as line numbers and variable names, to aid in debugging the program.
Relocation information: Object code includes information about the addresses
of symbols in the code, allowing the linker to adjust the addresses when the
code is linked with other code.
Code optimization: Object code can be optimized by the compiler to improve
performance, reduce code size, or both.
Assembly code: Object code can be disassembled into assembly code, which
can be useful for understanding how the program works or for reverse
engineering.

Advantages:

1. Efficiency: Object code is optimized for the specific target platform, which
can result in more efficient code than would be possible with a high-level
language.
2. Portability: Object code is typically platform-specific, but it can still be
portable across different systems that use the same platform. This allows
developers to write code once and compile it for multiple target systems.
3. Debugging: Object code can be easier to debug than source code, as it
provides a low-level view of the program’s execution. Developers can use
object code to trace the execution of the program and identify errors or
issues that may be present.
4. Protection: Object code can be protected through the use of obfuscation
techniques, making it harder for others to reverse engineer the code or steal
intellectual property.
5. Security: Object code is more secure than source code because it is not
readable by humans, making it more difficult for attackers to reverse
engineer the code.
6. Interoperability: Object code can be easily linked with other object files to
create a complete executable program.

Disadvantages:

1. Platform-specific: Object code is specific to a particular platform, which
means that it may not be compatible with other systems. This can limit the
portability of the code and make it harder to deploy across multiple systems.
2. Limited readability: Object code is a low-level language that is harder to
read and understand than source code. This can make it more difficult for
developers to maintain and debug the code.
3. Limited control: Object code is generated by the compiler, and developers
have limited control over the resulting code. This can limit the ability to
optimize the code or tailor it to specific requirements.
4. Compatibility issues: Object code can sometimes be incompatible with
other components of the system, which can cause errors or performance
issues.
5. Code size: Object code is typically larger than source code because it
contains additional information, such as symbols and relocation information.
6. Licensing: Object code may be subject to licensing restrictions that limit its
use and distribution.

Code Generator
A code generator is used to produce the target code for three-address statements. It uses
registers to store the operands of the three-address statements.

Example:
Consider the three-address statement x := y + z. It can be translated into the following
sequence of code:

MOV y, R0
ADD z, R0
MOV R0, x

Register and Address Descriptors:


o A register descriptor keeps track of what is currently in each register. The register
descriptors show that all the registers are initially empty.
o An address descriptor stores the location where the current value of a name can
be found at run time.

A code-generation algorithm:
The algorithm takes a sequence of three-address statements as input. For each
three-address statement of the form x := y op z, perform the following actions:

1. Invoke a function getreg to find the location L where the result of the computation
y op z should be stored.
2. Consult the address descriptor for y to determine y', the current location of y. If the
value of y is currently both in memory and in a register, prefer the register for y'. If the
value of y is not already in L, generate the instruction MOV y', L to place a copy of y in L.
3. Generate the instruction OP z', L, where z' is the current location of z. If z is
in both a register and memory, prefer the register. Update the address descriptor of x to
indicate that x is in location L. If L is a register, update its descriptor to indicate that it
contains x, and remove x from all other register descriptors.
4. If the current values of y or z have no next uses, are not live on exit from the block, and
are in registers, alter the register descriptors to indicate that, after execution of
x := y op z, those registers will no longer contain y or z.
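The four steps above can be sketched as a small program. This is a simplified illustration, not the definitive algorithm: the names generate, getreg, reg_desc, and addr_desc are chosen for this sketch, statements are assumed to be hypothetical (x, op, y, z) tuples, next-use information is approximated by a conservative "is the name mentioned later" check, and the spill policy (always evict the first register) is an arbitrary assumption.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(block, live_on_exit, registers=("R0", "R1")):
    code = []
    reg_desc = {r: None for r in registers}   # register descriptor: register -> name held
    addr_desc = {}                            # address descriptor: name -> set of locations

    def locations(name):
        # A name not seen before is assumed to be a variable that lives in memory.
        return addr_desc.setdefault(name, {name})

    def register_of(name):
        return next((loc for loc in sorted(locations(name)) if loc in reg_desc), None)

    def used_later(name, index):
        # Conservative stand-in for next-use and liveness information.
        return name in live_on_exit or any(name in (y, z) for _, _, y, z in block[index + 1:])

    def getreg(y, index):
        reg = register_of(y)
        if reg is not None and not used_later(y, index):
            return reg                        # y is dead afterwards: reuse its register
        for r in registers:
            if reg_desc[r] is None:
                return r                      # otherwise take an empty register
        victim = registers[0]                 # otherwise spill (arbitrary choice of victim)
        held = reg_desc[victim]
        if held not in locations(held):
            code.append(f"MOV {victim}, {held}")
            locations(held).add(held)
        locations(held).discard(victim)
        reg_desc[victim] = None
        return victim

    for index, (x, op, y, z) in enumerate(block):
        L = getreg(y, index)                              # step 1
        y_loc = register_of(y) or y                       # step 2: prefer a register
        if y_loc != L:
            code.append(f"MOV {y_loc}, {L}")
        z_loc = register_of(z) or z                       # step 3
        code.append(f"{OPCODES[op]} {z_loc}, {L}")
        if reg_desc[L] is not None:
            locations(reg_desc[L]).discard(L)             # L no longer holds its old name
        for r in reg_desc:
            if reg_desc[r] == x:
                reg_desc[r] = None                        # older copies of x are stale
        reg_desc[L] = x
        addr_desc[x] = {L}
        for name in (y, z):                               # step 4: free dead operands
            reg = register_of(name)
            if reg is not None and name != x and not used_later(name, index):
                reg_desc[reg] = None
                locations(name).discard(reg)

    for name in sorted(live_on_exit):                     # store live values at block end
        if name not in locations(name):
            code.append(f"MOV {register_of(name)}, {name}")
            locations(name).add(name)
    return code


block = [("t", "-", "a", "b"), ("u", "-", "a", "c"), ("v", "+", "t", "u"), ("d", "+", "v", "u")]
code = generate(block, live_on_exit={"d"})
print("\n".join(code))
```

The input here is the three-address code for d := (a-b) + (a-c) + (a-c) with two registers and d live on exit, so the output can be compared against a hand-worked trace of the algorithm.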

Generating Code for Assignment Statements:


The assignment statement d := (a-b) + (a-c) + (a-c) can be translated into the following
sequence of three-address code:

1. t := a - b
2. u := a - c
3. v := t + u
4. d := v + u

Statement     Code Generated   Register descriptor   Address descriptor

                               Registers empty

t := a - b    MOV a, R0        R0 contains t         t in R0
              SUB b, R0

u := a - c    MOV a, R1        R0 contains t         t in R0
              SUB c, R1        R1 contains u         u in R1

v := t + u    ADD R1, R0       R0 contains v         u in R1
                               R1 contains u         v in R0

d := v + u    ADD R1, R0       R0 contains d         d in R0
              MOV R0, d                              d in R0 and memory

The code sequence for the example, read from the "Code Generated" column, is as follows:

MOV a, R0
SUB b, R0
MOV a, R1
SUB c, R1
ADD R1, R0
ADD R1, R0
MOV R0, d

Resource Allocation and Assignment

Registers are the fastest locations in the memory hierarchy. But unfortunately,
this resource is limited. It comes under the most constrained resources of the
target processor. Register allocation is an NP-complete problem. However, this
problem can be reduced to graph coloring to achieve allocation and
assignment. Therefore a good register allocator computes an effective
approximate solution to a hard problem.

Figure – Input-Output
The register allocator determines which values will reside in the register and
which register will hold each of those values. It takes as its input a program with
an arbitrary number of registers and produces a program with a finite register
set that can fit into the target machine. (See image)
Allocation vs Assignment:
Allocation –
Maps an unlimited namespace onto the register set of the target machine.
• Register-to-register model: Maps virtual registers to physical registers and
spills the excess to memory.
• Memory-to-memory model: Maps some subset of the memory locations to a set of
names that models the physical register set.
Allocation ensures that the code will fit the target machine’s register set at
each instruction.
Assignment –
Maps an allocated name set to the physical register set of the target machine.
• Assumes allocation has been done, so that the code will fit into the set of
physical registers.
• No more than ‘k’ values are designated to reside in registers at any
instruction, where ‘k’ is the number of physical registers.
General register allocation is an NP-complete problem. Assignment is often easier:
• It can be solved in polynomial time when, at each instruction, (no. of
required registers) <= (no. of available physical registers).
• In that case, an assignment for a basic block can be produced in linear time
using interval-graph coloring.
Local Register Allocation And Assignment:
Allocation confined to a single basic block is called local register
allocation. There are two approaches to local register allocation: the top-down
approach and the bottom-up approach.
The top-down approach is a simple approach based on a ‘frequency count’: the
count is used to identify the values that should be kept in registers and those
that should be kept in memory.
Algorithm:
1. Compute a priority for each virtual register.
2. Sort the registers into priority order.
3. Assign registers in priority order.
4. Rewrite the code.
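The four steps above can be sketched as follows. This is only an illustration under stated assumptions: instructions are hypothetical (opcode, destination, source, source) tuples over virtual register names, the priority is simply the number of references, and names left without a register are marked as memory operands instead of being given real spill code (a real allocator would also reserve a few registers for that spill code).

```python
from collections import Counter


def top_down_allocate(block, k):
    """Keep the k most frequently referenced virtual registers in physical registers."""
    # 1. Compute a priority for each virtual register (its frequency count).
    priority = Counter(name for instr in block for name in instr[1:])
    # 2. Sort into priority order (ties broken by name to stay deterministic).
    ordered = sorted(priority, key=lambda name: (-priority[name], name))
    # 3. Assign physical registers in priority order.
    assignment = {name: f"R{i}" for i, name in enumerate(ordered[:k])}
    # 4. Rewrite the code; unallocated names are marked as living in memory.
    rewritten = [
        (instr[0],) + tuple(assignment.get(name, f"mem[{name}]") for name in instr[1:])
        for instr in block
    ]
    return assignment, rewritten


block = [
    ("add", "v1", "v2", "v3"),
    ("mul", "v4", "v1", "v2"),
    ("sub", "v5", "v1", "v4"),
]
assignment, rewritten = top_down_allocate(block, k=2)
print(assignment)
for instr in rewritten:
    print(instr)
```

With k = 2, the two most referenced names (v1 and v2) receive registers for the whole block, which is the main weakness of the approach: a register stays dedicated to a value even in parts of the block where that value is not used.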
Moving beyond single blocks:
• Allocation becomes more complicated because control flow enters the picture.
• Liveness and live ranges: A live range consists of a set of definitions and
uses that are related to each other because they refer to the same value. Two
live ranges that are live at the same point cannot share a register.
The following is a way to find live ranges in a block. A live range is
represented as an interval [i,j], where i is the definition and j is the last use.
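The interval representation above can be computed with one pass over a block. The sketch below assumes hypothetical (destination, source, ...) tuples, numbers program points from 1, assumes each name is defined at most once in the block, and ignores names that are used without being defined in the block (values that are live on entry).

```python
def live_ranges(block):
    """Map each name defined in the block to [i, j]: its definition point and last use."""
    ranges = {}
    for point, (dst, *srcs) in enumerate(block, start=1):
        for name in srcs:
            if name in ranges:
                ranges[name][1] = point        # a later use extends the range
        if dst not in ranges:
            ranges[dst] = [point, point]       # the definition opens the range
    return ranges


block = [
    ("a",),             # 1: a := ...
    ("b", "a"),         # 2: b := f(a)
    ("c", "a", "b"),    # 3: c := a op b
    ("d", "c"),         # 4: d := f(c)
]
ranges = live_ranges(block)
print(ranges)
```

Here a is defined at point 1 and last used at point 3, so its live range is [1,3]; the ranges of a and b overlap, so the two cannot share a register.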
Global Register Allocation and Assignment:
1. The main concern of a register allocator is minimizing the impact of spill code:
   • Execution time for spill code.
   • Code space for spill operations.
   • Data space for spilled values.
2. Global allocation can’t guarantee an optimal solution for the execution time of
spill code.
3. Prime differences between local and global allocation:
   • The structure of a global live range is naturally more complex than that of
   a local one.
   • Within a global live range, distinct references may execute a different
   number of times (for example, when basic blocks form a loop).
4. To make decisions about allocation and assignment, the global allocator
mostly uses graph coloring on an interference graph that it builds.
5. The register allocator then attempts to construct a k-coloring for that graph, where
‘k’ is the number of physical registers.
   • If the compiler can’t directly construct a k-coloring for the graph, it
   modifies the underlying code by spilling some values to memory and tries
   again.
   • Spilling simplifies the graph, which ensures that the algorithm will
   halt.
6. A global allocator can use several approaches; here we consider the top-down and
bottom-up allocation strategies. The subproblems associated with these
approaches are:
   • Discovering global live ranges.
   • Estimating spill costs.
   • Building an interference graph.
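The k-coloring step described in the list above can be sketched with a simplify-and-select heuristic. This is a sketch under stated assumptions, not the allocator of any particular compiler: the interference graph is a hypothetical dict mapping each live range to the set of its neighbours, the spill candidate is simply the highest-degree node, and a spilled node is only recorded (a real allocator would insert spill code, rebuild the graph, and try again).

```python
def color_graph(graph, k):
    """Try to k-color an interference graph; return (coloring, spilled)."""
    work = {node: set(adj) for node, adj in graph.items()}   # working copy
    stack, spilled = [], []
    while work:
        # A node with fewer than k neighbours can always be colored later.
        node = next((n for n in sorted(work) if len(work[n]) < k), None)
        if node is None:
            # Every remaining node has k or more neighbours: spill one.
            node = max(sorted(work), key=lambda n: len(work[n]))
            spilled.append(node)
        else:
            stack.append(node)
        for neighbour in work.pop(node):
            work[neighbour].discard(node)
    coloring = {}
    while stack:
        node = stack.pop()
        used = {coloring[n] for n in graph[node] if n in coloring}
        coloring[node] = next(c for c in range(k) if c not in used)
    return coloring, spilled


triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
coloring, spilled = color_graph(triangle, k=2)
print(coloring, spilled)
```

Three mutually interfering live ranges cannot fit in two registers, so one of them is spilled and the remaining two receive different colors. Removing a node at every iteration is also what guarantees that the loop halts.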
Discovering Global Live Ranges:
How to discover Live range for a variable?

Figure – Discovering live ranges in a single block


The figure above illustrates the process. Take Rarp as an example: it is
initialized at program point 1 and its last use is at program point 11.
Therefore, the live range of Rarp, i.e. Larp, is [1,11]. The live ranges of the
other names are found in the same way.
Figure – Discovering Live Ranges
Estimating Global Spill Cost:
• Essential for making a spill decision. The estimate includes the address
computation, the cost of the memory operation, and the estimated execution
frequency.
• For performance reasons, spilled values are typically kept in the
activation record.
• Some embedded processors offer scratchpad memory to hold such spilled
values.
• Negative spill cost: A live range that consists only of a load and a store to
the same address does no useful work. Spilling it removes those instructions,
so it is given a negative spill cost.
• Infinite spill cost: A live range should have infinite spill cost if no other live
range ends between its definition and its use, because spilling it would not
free a register for anything else.
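The three ingredients of the estimate (address computation, memory operation cost, execution frequency) can be combined as in the sketch below. Every number here is a hypothetical assumption chosen for illustration, including the commonly used heuristic of weighting a reference by 10 raised to its loop-nesting depth as a stand-in for the estimated execution frequency.

```python
def spill_cost(reference_depths, address_cost=1, memory_cost=2, loop_weight=10):
    """Estimate the cost of spilling one live range.

    `reference_depths` holds the loop-nesting depth of each definition or use;
    each reference would need one address computation and one memory operation.
    """
    per_reference = address_cost + memory_cost
    return sum(per_reference * loop_weight ** depth for depth in reference_depths)


# One reference outside any loop and two inside a singly nested loop.
cost = spill_cost([0, 1, 1])
print(cost)
```

With these assumed costs the estimate is 3 * (1 + 10 + 10) = 63. The allocator would compare such estimates across live ranges and prefer to spill the cheapest one.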
Interference and Interference Graph:
Figure – Building Interference Graph from Live Ranges

From the above diagram, it can be observed that the live range Lra starts in
the first basic block and ends in the last basic block. Therefore it shares an
edge with every other live range, i.e. Lrb, Lrc, and Lrd. However, Lrb, Lrc, and
Lrd do not overlap with any live range except Lra, so each of them shares an
edge only with Lra.
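The construction described above can be sketched for the simple case where each live range is a single interval. The interval endpoints below are hypothetical values chosen only to reproduce the shape of the example (global live ranges are generally not simple intervals), and treating any overlap of closed intervals as interference is a simplifying assumption.

```python
def build_interference_graph(live_ranges):
    """Add an edge between every pair of live ranges whose [start, end] intervals overlap."""
    graph = {name: set() for name in live_ranges}
    names = sorted(live_ranges)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (a_start, a_end), (b_start, b_end) = live_ranges[a], live_ranges[b]
            if a_start <= b_end and b_start <= a_end:      # the intervals overlap
                graph[a].add(b)
                graph[b].add(a)
    return graph


# Lra spans the whole region; Lrb, Lrc, and Lrd are short and mutually disjoint.
ranges = {"Lra": (1, 12), "Lrb": (2, 4), "Lrc": (5, 7), "Lrd": (8, 10)}
graph = build_interference_graph(ranges)
for name in sorted(graph):
    print(name, sorted(graph[name]))
```

The resulting graph is a star centred on Lra, so two colors (registers) suffice: one for Lra and one shared by Lrb, Lrc, and Lrd.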
Building an Allocator:
• Note that finding a k-coloring of a graph is an NP-complete problem, so the
allocator needs an approximate (heuristic) method for it.
• Try splitting live ranges into non-trivial chunks (starting with the most
used ones).
