Using Assembler Language in Delphi
Fourth revision - May 2010
Guido Gybels
www.guidogybels.eu
The first edition of this paper was created in the late nineties and based on the
behaviour of Delphi 2. While I had a need to use assembler in some of my
projects, I quickly found out that there wasn't much documentation available on
the topic of how to properly integrate assembler code into Delphi programmes.
This paper was written with the intention of bridging that gap. It has since been
through a number of revisions. Processors, and Delphi, have evolved
significantly since I first wrote this paper, but it should retain a lot of its original
value.
Table of Contents
Introduction
Basic Principles
Chapter 1: General Context of Assembler Code
    1.1. Where to Locate the Assembler Code
    1.2. Labels
    1.3. Loops
    1.4. Entry and Exit Code
    1.5. Register Preservation
Chapter 2: Passing Parameters
    2.1. Calling Conventions
    2.2. Passing Parameters in Registers
    2.3. Using the Stack for Parameter Passing
    2.4. Passing by Value versus Passing by Reference
Chapter 3: Local Variables
    3.1. Local Variables and the Stack Frame
    3.2. Simple Types as Local Variables
    3.3. Records as Local Variables
    3.4. Heap Allocated Types as Local Variables
Chapter 4: Returning Results
    4.1. Returning Integers as Immediate Values
    4.2. Returning Booleans as Immediate Values
    4.3. Returning Real Numbers
    4.4. Returning Characters
    4.5. Returning a Long String
Further Reading
Table 1: Use of CPU Registers
Table 2: Calling Conventions
Table 3: Parameter Passing
Table 4: Returning Results
About the Author
Introduction
Many programmers still associate assembler with a difficult, low-level kind of
programming. Most of them also think it is incomprehensible and impossible to
master. In reality, things are not that bad and these perceptions are mostly
founded in unfamiliarity. It is quite possible to learn how to write good assembly
code without being a genius. On the other hand, don't think a few lessons in
assembly will leave you producing faster code than the average Delphi/Pascal
equivalent. The reason is that when you write Delphi code, you are competing
with an efficient and experienced assembler programmer: the compiler. Overall,
the code produced by it is efficient and fast enough for most applications.
Most people reach for assembler in order to achieve better performance for their
software. However, the key to performance is mostly design, not choice of
language. There is little point in trying to fix a bad algorithm through assembler.
So, the first thing to do when you are experiencing bottlenecks is to review your
code architecture and Pascal implementation, rather than immediately resorting
to assembler. Writing it in assembler is not going to turn a bad algorithm or a
bad approach into a good one.
Similarly, many programmers erroneously assume that code written in
assembler is by definition faster than compiled Delphi/Pascal code. That is
certainly not the case. Badly written assembler routines will perform poorly and
can even cause strange bugs and problems in your application.
There is, however, a good case for hand-crafting assembler code in certain
situations. Delphi's Pascal is a general-purpose programming language and
certain specialised tasks could be done better in assembler. Also, taking
advantage of processor-specific features might require manual code design.
It is not within the scope of this article to teach you the principles of assembler
programming. There are other information resources out there discussing
assembler programming. See Further Reading to find some pointers to relevant
material.
Before resorting to rewriting code in assembler, investigate first where your
bottlenecks are and why. A profiler might be a useful tool during this analysis.
Once you have identified the cause, take a step back and look at the structure
and algorithm. Often, you can get much better performance by revising the
algorithm or the application design, rather than just throwing in some
assembler. On the other hand, in some particular cases, assembler might
indeed be a better and even simpler choice.
Once you conclude that assembler is actually needed, you should take proper
time to draw up a plan for your code and design the algorithm. Only when you
have a clear idea of what you want to do and how you will be implementing it
can you start coding. If you don't think about these issues, you'll end up with
spaghetti code, convoluted statements and an unmaintainable program. Not to
mention the possibility of introducing nasty bugs.
Basic Principles
During the implementation phase, there are several general principles you
should follow. The first important rule is: keep your routines as short as
possible. Assembler should be used for some real core activity in which
performance and simplicity are essential. So, in most cases, you can manage
with fairly short and specialised routines. If you see that you have plenty of long
pieces of assembler in your project, you are probably over-enthusiastic about it.
Secondly, keep the routines readable by commenting them in a meaningful
way. Because of the basic linear nature of assembly code, the individual
statements are quite easy to read. Comments are needed to clarify how you
want to implement the algorithm, so that a third party will immediately
understand what is going on. Your comments need to add valuable information
for the readers of your code. So, the following is clearly wrong:
inc edx {increase edx}
Such comments are totally pointless, since the instruction itself is making it
blatantly clear that you are incrementing the edx register. Comments should
indicate the inner workings of the algorithm and/or provide information not
otherwise immediately clear from the code and its context, rather than just
rephrase what the mnemonics already show. So, the above could, for instance,
be replaced by something like this:
inc edx {go to the next element in our product table}
Thirdly, you should avoid the use of slow instructions wherever possible. In
general, simple instructions are preferable to complex opcodes, since on
modern CPUs the latter are implemented in microcode. Agner Fog's publications
on code optimisation discuss this and many other very useful aspects of how
processors behave.
The final general principle is predictable: test your assembler routines
thoroughly and continuously. For routines written in assembler, the compiler will
produce substantially fewer warnings and error messages, and will offer no
feedback with regard to the basic logic of the assembler code. Unused variables
and wrong use of pointers will not be detected as easily as in Pascal code. So,
prepare for intensive testing and debugging of your code. It will take more time
and effort when compared to debugging regular Pascal code. That is yet
another reason to keep these routines as short and readable as possible.
If after all of this, you still want to go ahead with writing assembler code for your
Delphi/Pascal programs, reading this article is probably a good first step. I have
designed it to be as generic as possible.
It is possible to nest asm blocks inside a Pascal function or procedure, but that
approach is not recommended. You should isolate your assembler code inside
separate function or procedure blocks. First of all, inserting assembler inside a
regular Pascal function will interfere with the compiler's optimisation and
variable management activities. As a result, the generated code will be far from
optimal. Variables are likely to be pushed out of their register, requiring saving
on the stack and reloading afterwards. Also, nesting inside a Pascal block
forces the compiler to adapt its generated code to your assembler code. Again,
this interferes with the optimisation logic and the result will be quite inefficient.
So, the rule is to put assembler code in its own separate function/procedure
block. There is also a design aspect: the readability and maintainability of your
code will benefit greatly when all assembler is clearly isolated in dedicated,
well-commented blocks.
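As a minimal sketch of this rule, the assembler below lives in its own function and the Pascal code simply calls it. The names SwapByteOrder and Demo are assumptions chosen for illustration, and 32-bit Delphi with the register calling convention is assumed:

```pascal
{Hypothetical helper: reverses the byte order of a 32-bit value.
 Value arrives in eax and the result is returned in eax.}
function SwapByteOrder(Value: LongWord): LongWord; register;
asm
  bswap eax
end;

procedure Demo;
var
  Swapped: LongWord;
begin
  {The Pascal caller stays free of inline assembler.}
  Swapped := SwapByteOrder($11223344);   {Swapped = $44332211}
end;
```

Because the asm block is the whole routine body, the compiler's register allocation inside Demo is not disturbed by it.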
1.2. Labels
Labels are tags that mark locations in your code. The most common reason for
having labels is to have a point of reference for branching. There are two kinds
of labels you can use in your assembler code: Pascal-style labels and local
assembly labels. The former type requires you to declare them in a label section
first. Once declared, you can use the label in your code. The label must be
followed by a colon:
label
MyLabel;
asm
...
mov ecx, {Counter}
MyLabel:
... {Loop statements}
dec ecx
jnz MyLabel
...
end;
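Local assembly labels, by contrast, need no declaration: an identifier that starts with an @ sign can be used directly inside the asm block. A small sketch, using a hypothetical SumDownFrom routine and assuming 32-bit Delphi and a value of N greater than zero:

```pascal
{Hypothetical example: returns N + (N-1) + ... + 1 for N > 0.}
function SumDownFrom(N: Integer): Integer; register;
asm
  mov  ecx, eax      {ecx = N, the loop counter (assumed > 0)}
  xor  eax, eax      {eax = running total, also the result}
@@next:              {local label, no label section required}
  add  eax, ecx
  dec  ecx
  jnz  @@next        {repeat until the counter reaches zero}
end;
```

With N = 4 the routine adds 4, 3, 2 and 1 and returns 10.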
Neither kind of label is intrinsically better than the other. There is no advantage
in code size or speed of course, since labels are only reference points for the
compiler to calculate offsets and jumps. The difference between Pascal-style
and local labels in assembler blocks is a relic from the past and is fading away.
As a consequence, even Pascal-style labels are "local" in the sense that it is not
possible to jump to a label outside the current function or procedure block. That
is just as well, since that would be a perfect scenario for disaster.
1.3. Loops
Often, the assembler code will be designed to achieve the highest speed
possible. And quite often also, processing large amounts of data inside loops
will be the task. When loops are involved, you should implement the loop itself
in assembler too. That is not difficult and otherwise you will be wasting a lot of
execution time because of calling overheads. So, instead of doing:
function DoThisFast(...): ...; register;
asm
...
{Here comes your assembler code}
...
end;
procedure SomeRoutine;
var
I: Integer;
begin
I:=0;
...
while I<{NumberOfTimes} do begin
DoThisFast(...);
inc(I);
end;
...
end;
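The preferable structure moves the loop into the assembler routine itself, so that the call overhead is paid only once. The following is a sketch only: the routine name AddToAll, its parameters and the caller are assumptions made for illustration (32-bit Delphi, register calling convention):

```pascal
{Hypothetical routine: adds Delta to each of Count integers at Data.}
procedure AddToAll(Data: PInteger; Count, Delta: Integer); register;
asm
  {eax = Data, edx = Count, ecx = Delta}
  test edx, edx
  jle  @@done              {nothing to do for Count <= 0}
@@loop:
  add  [eax], ecx          {process one element}
  add  eax, 4              {advance to the next element}
  dec  edx                 {the counter counts down to zero}
  jnz  @@loop
@@done:
end;

procedure SomeRoutine;
var
  Values: array[0..99] of Integer;
begin
  FillChar(Values, SizeOf(Values), 0);
  AddToAll(@Values[0], Length(Values), 5);   {one call instead of a hundred}
end;
```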
Note that in the example above, the loop counter counts downwards. That is
because in this way, you can simply check the zero flag after decrementing to
see if the end of the loop has been reached. By contrast, if you simply start off
with ecx=0 and then count upwards, you will need an additional compare
instruction to check whether or not to continue the loop:
mov ecx,0
@@loop:
...
inc ecx
cmp ecx,{NumberOfTimes}
jne @@loop
Alternatively, you can subtract the NumberOfTimes from 0 and then increase
the loop index until zero is reached. This approach is especially useful if you
use the loop index register simultaneously as an index to some table or array in
memory, since cache performance is better when accessing data in forward
direction:
xor ecx,ecx
sub ecx,{NumberOfTimes}
@@loop:
...
inc ecx
jnz @@loop
Remember however that in this case, your base register or address should
point to the end of the array or table, rather than to the beginning. Because the
negative index runs up towards zero, the elements are still visited in forward order.
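A sketch of this negative-index technique applied to an array of integers follows. The routine name SumInts and its parameters are assumptions for illustration; 32-bit Delphi, the register calling convention and a non-negative Count are assumed:

```pascal
{Hypothetical routine: returns the sum of Count integers starting at Data.}
function SumInts(Data: PInteger; Count: Integer): Integer; register;
asm
  {eax = Data, edx = Count (assumed >= 0)}
  xor  ecx, ecx             {ecx = running total}
  lea  eax, [eax+edx*4]     {eax now points just past the last element}
  neg  edx                  {index runs from -Count up to zero}
  jz   @@done               {empty array: nothing to add}
@@loop:
  add  ecx, [eax+edx*4]     {first element first, ascending addresses}
  inc  edx
  jnz  @@loop               {inc sets the zero flag when the index hits 0}
@@done:
  mov  eax, ecx             {return the total in eax}
end;
```

The single register edx serves both as loop counter and as array index, so no separate compare instruction is needed.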
This code preserves ebp, and then copies the stack pointer into the ebp
register. Subsequently ebp can be used as the base register to access
information on the stack frame. The sub esp line reserves space on the stack
for local variables as required. The exit code pattern is as follows:
mov esp,ebp
pop ebp
ret {Size of stack space reserved for parameters}
This exit sequence cleans up the space allocated for local variables by
copying ebp (pointing at the beginning of the stack frame) back to the stack
pointer. Next, ebp is restored to the value it had upon entry of the routine.
Finally, control is returned to the caller, adjusting the stack again for any space
allocated for parameters passed to the routine. This parameter cleanup in the
ret instruction is required for all calling conventions except cdecl: in all other
cases, the called function is responsible for cleaning up the stack space
allocated for parameters, and thus the ret instruction will include the necessary
adjustment. In case of cdecl, however, the caller performs the cleanup.
If your function or procedure has neither local variables nor parameters passed
to it via the stack, then no entry and exit code will be produced, except for the
ret instruction that is always generated.
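To make the pattern concrete, here is a sketch of a routine with three stack parameters and one local variable, with the code the compiler is expected to wrap around it shown as comments. The routine name AddThree is hypothetical, and the exact instructions are an assumption: depending on version and settings the compiler may, for example, reserve the local with push ecx rather than sub esp, 4.

```pascal
{Hypothetical example using the pascal convention, so all three
 parameters travel on the stack.}
function AddThree(First, Second, Third: Integer): Integer; pascal;
var
  SomeTemp: Integer;
asm
  {entry code generated by the compiler (not written by you):
     push ebp
     mov  ebp, esp
     sub  esp, 4          space for SomeTemp}
  mov  eax, First
  add  eax, Second
  mov  SomeTemp, eax
  mov  eax, Third
  add  eax, SomeTemp      {result in eax}
  {exit code generated by the compiler:
     mov  esp, ebp
     pop  ebp
     ret  12              three 4-byte parameters removed by the callee}
end;
```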
types can be passed in a register. Please note that when passing parameters
by reference, you are in fact passing a pointer to the variable in question. Since
pointer types qualify for passing in a register, variables passed by reference
always qualify for passing in a register, with the exception of method pointers.
If the number of parameters passed is lower than or equal to the number of
registers available (three for standalone routines, two for methods), then there
is no need to set up a stack frame for parameter passing. This can save
overhead when calling the routine. Be careful however, because parameter
passing is not the only reason for setting up stack frames: if you declare local
variables, a stack frame is also required and thus extra overhead to manage the
stack frame is still incurred.
In addition, for many structured types, the data itself actually resides on the
stack or on the heap and the variable is a pointer to the actual data. Such a
pointer occupies 32 bits and therefore will fit into a register. This means that
most parameter types will qualify for passing through registers, although
method pointers (consisting of two 32-bit pointers, one to the object instance
and one to the method entry point) will always be passed on the stack.
This article is based on 32-bit modes, so registers are 32 bits wide. When
passing information that doesn't occupy the whole register (byte- and
word-sized values for example), the normal rules apply: bytes go in the lowest 8 bits
(for example al) and words in the lower word of the register (for example ax).
Pointers are always 32-bit values and thus occupy the whole register (for
example eax). In case of byte- or word-sized variables, the content of the rest
of the register is unknown and you should not make any assumptions about its
state. For instance, when passing a byte to a function in al, the remaining 24
bits of eax are unknown, so you cannot assume them to be zeroed out. You
can use an and operation to make sure the remaining bits of the register are
reset:
and eax,$FF {Unsigned byte value in AL, clears 24 highest bits}
or
and eax,$FFFF {unsigned word value in AX, clears 16 highest bits}
When passing signed values (shortint and smallint), you might want to
expand them to a 32-bit value for easier computation, but in doing so you need
to retain the sign. To expand a signed byte to a signed double word, you can
use two instructions:
cbw {extends al to ax}
cwde {extends ax to eax}
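On 386 and later processors, movzx and movsx combine the move and the extension in a single instruction. The two wrappers below are a sketch with hypothetical names, assuming 32-bit Delphi and the register calling convention:

```pascal
{Hypothetical helper: zero-extends an unsigned byte received in al.}
function WidenByte(Value: Byte): LongWord; register;
asm
  movzx eax, al      {upper 24 bits of eax are cleared}
end;

{Hypothetical helper: sign-extends a signed byte received in al.}
function WidenShortInt(Value: ShortInt): LongInt; register;
asm
  movsx eax, al      {upper 24 bits of eax are copies of the sign bit}
end;
```

WidenShortInt has the same effect as the cbw/cwde pair above, while WidenByte replaces the and eax,$FF mask.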
The importance of not relying on the remainder bits having a specific value can
be easily demonstrated. Write the following test routine:
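Judging from the description that follows (the routine returns immediately and leaves eax untouched), a minimal version of such a test routine is an empty asm block. The listing below is a sketch under that assumption:

```pascal
function Test(Value: ShortInt): LongInt; register;
asm
  {deliberately empty: eax is returned exactly as it was received}
end;
```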
Next, drop a button and a label on a form and put the following code in the
button's OnClick event:
var
I: ShortInt;
begin
I:=-7;
Label1.Caption:=IntToStr(Test(I));
end;
Run the project and click the button. The Test routine receives a ShortInt
through al. It returns an integer in the eax register (returning results is
discussed in Chapter 4), which should be unchanged since the subroutine
returns immediately. You can easily observe that eax has undefined content
upon return. Now change the test function as follows and run the project again:
function Test(Value: ShortInt): LongInt; register;
asm
cbw
cwde
end;
will put First in eax, Second in dl and Third in ecx. Next, here is an
example of a method declaration:
procedure TSomeClass.DoSomething(First, Second: Integer); register;
asm
...
end;
In this case, eax will contain Self, edx contains First, while Second is
stored in ecx.
Please note that since register will result in parameters being passed in
registers, you will lose that parameter information as soon as you overwrite the
register's contents. Take the following code:
procedure DoSomething(AValue: Integer); register;
asm
{eax will contain AValue}
...
mov eax, [edx+ecx*4] {eax gets overwritten here}
...
end;
After eax gets overwritten, you no longer have access to the AValue
parameter. If you need to preserve that parameter, make sure to save the
contents of eax on the stack or in local storage for use afterwards. And don't fall
into the common trap of doing the following later on in your code:
mov eax, AValue
because the compiler will, for the above line, simply generate the following
code:
mov eax, eax
as the compiler only knows, from the chosen calling convention, that AValue
was passed in eax to the subroutine.
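One way to keep such a parameter available is to save the register on the stack before it is overwritten. A minimal sketch, in which the routine name TripleValue is an assumption for illustration (32-bit Delphi, register calling convention):

```pascal
{Hypothetical example: returns 3 * AValue.}
function TripleValue(AValue: Integer): Integer; register;
asm
  push eax          {keep a copy of AValue on the stack}
  shl  eax, 1       {eax is overwritten: it now holds 2 * AValue}
  pop  edx          {recover the original AValue into edx}
  add  eax, edx     {result: 3 * AValue}
end;
```

Every push must be matched by a pop before the routine returns, otherwise the return address is no longer at the top of the stack.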
The calling convention is pascal, which means that prior to the call to the
subroutine, the caller pushes three parameters on the stack in the order that
they are declared (remember that the stack grows downwards, which means
the first parameter is located at the highest address):
    First
    Second
    Third              <- esp
Next, the call instruction will push the return address onto the stack and then
hands over execution to the subroutine, so immediately after entry, the stack
looks as follows:
    First
    Second
    Third
    Return Address     <- esp
The compiler generated entry code (see Chapter 1) saves the current value of
ebp and subsequently copies the value of esp to ebp so that the latter can from
now on be used to access the parameter data on the stack frame:
    First
    Second
    Third
    Return Address
    Saved ebp          <- esp, ebp
From this point on, we can access the parameters on the stack frame as offsets
from ebp. Because the return address sits on the stack between the current
top-of-stack and the actual parameters, we access the parameters as follows:
First = ebp + $10 (ebp + 16)
Second = ebp + $0C (ebp + 12)
Third = ebp + $08 (ebp + 8)
However, you can simply refer to these parameters by name. The compiler will
replace each parameter with the correct offset from ebp. So, in the example
above, writing the following:
mov eax, First
will be translated by the compiler into:
mov eax, [ebp+$10]
This will save you the headache of calculating the offsets yourself and it is
also much more readable, so you should use the names of the parameters that
are passed on the stack in your code wherever possible (practically always)
instead of hard coding the offsets. Be careful however: if you use the register
calling convention, the first set of parameters will be passed in registers. For
those parameters that are passed in registers, you should use the correct
register to refer to the variable, to prevent ambiguities in your code. Take the
following example:
procedure DoSomething(AValue: Integer); register;
Since this declaration uses the register calling convention, the AValue
parameter will be passed in the eax register. It is probably wise to explicitly
write eax in your code to refer to this parameter. It will help you to spot the
following potential bug:
mov eax, AValue
which on the basis of the declaration above would result in the following code
being generated:
mov eax, eax
In summary: for parameters passed in a register, you should use the register to
refer to them. For parameters passed via the stack, use the variable name to
refer to them (and don't use the ebp register for anything else, so it remains
available for accessing that information).
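Both rules can be seen side by side in a routine that has more parameters than there are registers available. In the sketch below, the routine name Sum4 is hypothetical, and 32-bit Delphi with the register calling convention is assumed, so that the first three parameters arrive in eax, edx and ecx while the fourth is passed on the stack:

```pascal
{Hypothetical example: returns A + B + C + D.}
function Sum4(A, B, C, D: Integer): Integer; register;
asm
  {A = eax, B = edx, C = ecx: refer to these by register}
  add  eax, edx
  add  eax, ecx
  {D lives on the stack: refer to it by name and let the
   compiler work out the offset from ebp}
  add  eax, D
end;
```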
Stack space is always allocated in 32-bit chunks, and therefore the data passed
on the stack will always occupy a dword multiple. Even if you pass a byte to the
procedure, 4 bytes will be allocated on the stack with the three most significant
bytes having undefined content. You should never assume that this undefined
portion is zeroed out or has any other specific value.
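A sketch of how to read such a byte-sized stack parameter safely is shown below. The routine name ByteFromStack is an assumption, and the pascal convention is used only to force the parameter onto the stack (32-bit Delphi assumed):

```pascal
{Hypothetical example: returns the byte parameter widened to 32 bits.}
function ByteFromStack(AValue: Byte): Integer; pascal;
asm
  {Only the lowest byte of the 4-byte stack slot is defined, so read
   exactly one byte and zero-extend it instead of loading the dword.}
  movzx eax, byte ptr AValue
end;
```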
As the register calling convention is used, the value of the I parameter will
be passed in the eax register (see table 3). So, given I=254, eax will upon
entry contain the value $000000FE, passing I by value. However, if we change
the declaration as follows:
function MyFunction(var I: Integer): Integer; register;
the eax register no longer contains the value of I, but rather a pointer to the
memory location where I is stored, for example $0066F8BC. Passing
parameters by reference using var or const is done by means of a 32-bit
pointer.
When you use const, indicating that a variable is used for read-only access,
the compiler uses either method, depending on the type of the parameter. The
wording in Delphi's online help can be misleading and some people assume
that const always results in passing a 32-bit pointer to the actual value, but that
is not correct. You can use table 3 for guidance.
By using const, the programmer informs the compiler that the data is only
going to be read. Please note however that within an asm..end block, the
compiler will not prevent you from writing code that violates this read-only
characterisation of const parameters in cases where these are pointers to
structured data like AnsiStrings or records. Be careful to honour the
read-only character of the information passed using const in your assembler code,
otherwise you could introduce nasty bugs. And of course, it would be extremely
poor design to label information read-only, yet then proceed to change it. This is
especially important when you are using reference counted types like
AnsiString that use copy-on-write semantics.
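The following sketch reads from a const record parameter without modifying it. The type TPair and the routine PairSum are assumptions made for illustration; the example also assumes 32-bit Delphi with the register convention, under which a record larger than four bytes is passed as a pointer in eax:

```pascal
type
  {Hypothetical 8-byte record, too large to be passed by value.}
  TPair = record
    A, B: Integer;
  end;

{Hypothetical example: returns P.A + P.B, treating P as read-only.}
function PairSum(const P: TPair): Integer; register;
asm
  {eax holds a pointer to the caller's record}
  mov  edx, [eax].TPair.B    {read field B through the pointer}
  mov  eax, [eax].TPair.A    {read field A; the pointer is no longer needed}
  add  eax, edx              {only registers are written, never the record}
end;
```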
All of the above means that you will have to carefully take into account the
differences between passing by value and passing by reference. For example,
imagine a function that calculates the sum of an integer with 12. In case of
passing the integer parameter by value, the code should look as follows:
function MyFunction(I: Integer): Integer; register;
asm
add eax, 12
end;
I will discuss returning results in Chapter 4. For now it is sufficient to know that
the result in this case will be returned to the caller via the eax register. As you
can see, the value of I is taken directly from the eax register. But if we change
the function to pass the information by reference, we would get something like
this:
function MyFunction(var I: Integer): Integer; register;
asm
mov eax, [eax] {eax holds a pointer to I, so load the value through it}
add eax, 12
end;
Because eax does not contain the value of I, but rather a pointer to the
memory location where I is stored, we retrieve the value through the received
pointer.
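The same pointer can be used to change the caller's variable in place, which is the usual reason for declaring a var parameter. A minimal sketch with the hypothetical name AddTwelve, assuming 32-bit Delphi and the register convention:

```pascal
{Hypothetical example: adds 12 to the caller's variable itself.}
procedure AddTwelve(var I: Integer); register;
asm
  {eax = address of I; the operand size must be stated explicitly
   because neither operand is a register}
  add  dword ptr [eax], 12
end;
```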
From that previous chapter, we already know that using the pascal calling
convention will result in parameters being pushed onto the stack prior to
invoking the procedure. The call instruction will push the return address onto the
stack. Next, entry code will cause the value of ebp to also be pushed onto the
stack. Then, ebp is set up as base pointer for accessing the data on the stack
frame. At this point, the stack frame looks therefore as follows:
    First
    Second
    Third
    Return Address
    Saved ebp          <- esp, ebp
Because we have also declared a local variable, SomeTemp, the compiler will
add code (for instance push ecx) to reserve space on the stack for said
variable:
    First              ebp + 16
    Second             ebp + 12
    Third              ebp + 8
    Return Address     ebp + 4
    Saved ebp          <- ebp
    SomeTemp           ebp - 4     <- esp
As stated before, ebp contains a base pointer for accessing data on the stack
frame. Since the stack grows downwards, higher addresses contain
parameters, while lower addresses contain local variables. In our particular
example, the stack frame has the following slots allocated:
Parameters:
First = ebp + $10 (ebp + 16)
Second = ebp + $0C (ebp + 12)
Third = ebp + $08 (ebp + 8)
Local Variables:
SomeTemp = ebp - $04 (ebp - 4)
The next local variable would be allocated at ebp - 8, and so on. Just as with
parameters on the stack, you can (and should) use the variable name to refer to
the actual location on the stack:
mov eax, SomeTemp
Please note that the content of these variables is generally not initialised, and
you should treat it as being undefined. It is your task to initialise them when and
if required.
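A short sketch of a local variable in use follows. The routine name SquarePlusSelf and the local Saved are assumptions for illustration (32-bit Delphi, register calling convention):

```pascal
{Hypothetical example: returns AValue * AValue + AValue.}
function SquarePlusSelf(AValue: Integer): Integer; register;
var
  Saved: Integer;
asm
  mov  Saved, eax      {the local is undefined until we write to it}
  imul eax, eax        {eax = AValue * AValue, the parameter is gone}
  add  eax, Saved      {the compiler substitutes the offset from ebp}
end;
```

Declaring Saved is what makes the compiler set up a stack frame here, since the only parameter is passed in a register.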
Because using local variables causes overhead for creating and managing the
stack frame, it is worth analysing your algorithm carefully to determine whether
or not you need local storage. Clever use of available registers and smart code
design can often avoid the need for local variables altogether. Apart from
avoiding overhead for allocating and managing local variables, moving data
between registers is significantly faster than accessing data in main memory
(but beware of stalls and other performance hits, for instance by reading a
register immediately after writing it). When you are writing Object Pascal code,
the Delphi compiler will perform optimisations by trying to use registers
wherever feasible. Loop counter variables are a particular case in point and
you, too, should favour registers for such usage. Of course, inside an
asm..end block, you are on your own and the compiler will not perform such
optimisations for you. Well structured code will therefore aim to use registers as
much as possible, especially for data that is used most often.
While AValue only requires one byte, a full dword is allocated on the stack
frame. This behaviour ensures that data on the stack is always aligned on a
dword boundary, which improves performance and makes the logic to calculate
variable locations easier (and allows for easy use of scaling in indirect
addressing). You should however not use the remainder part of the allocated
space (the padding) since this is ultimately an implementation issue. Future
compiler versions might behave differently. If you need additional storage
space, simply use the appropriate, larger type.
Please note that this rule for dword allocation does not apply to local variables
of type record, even though they are also stored on the stack frame. Their
member fields' alignment depends on the state of the alignment switch ({$A}
directive) and the use of the packed modifier. This is discussed in more detail
in the next paragraph.
This alignment behaviour is another good reason to refer to local variables
using their variable name, rather than manually calculating the offset yourself.
The compiler will calculate the correct offset for you. In the example above,
AValue occupies only one byte. Hence, only the lowest byte of the allocated
dword is used. So, this instruction:
mov al, AValue
The alignment boundary for each member field of the record depends on its
type and its size. In the example above, FirstValue and ThirdValue are of
type DWord, which is a 32-bit type. With alignment on, they will be aligned to
dword boundaries. Since in between those two members, there is a byte-sized
field, SecondValue, the compiler will add three padding bytes, thus ensuring
that ThirdValue is properly aligned. The following picture shows the memory
allocation for this record in the aligned state:
By adding the packed modifier to the record declaration, the record's member
fields are no longer aligned. You can see the result in the following illustration,
as the padding bytes are no longer present:
Similarly, when alignment is turned off by using the {$A-} directive, even without
the packed modifier there will be no padding between record member fields.
Fortunately, just as for simple types, you can refer to record member fields by
their names, and the compiler will calculate the correct offsets for you. However,
always make sure you use operands of the proper size, i.e. specify the operand
size explicitly. In that way, your code will continue to work correctly even when
alignment is changed or the packed modifier is introduced at a later stage:
mov eax, DWORD PTR [ARecord.FirstValue]
mov al, BYTE PTR [ARecord.SecondValue]
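The effect of alignment and of the packed modifier can be sketched with two declarations of the same three fields. The type and routine names are assumptions for illustration, and LongWord stands in for the DWord type used in the text:

```pascal
{$A+}   {field alignment on, which is the default}
type
  TAlignedRecord = record
    FirstValue: LongWord;    {offset 0}
    SecondValue: Byte;       {offset 4, followed by 3 padding bytes}
    ThirdValue: LongWord;    {offset 8}
  end;                       {SizeOf(TAlignedRecord) = 12}

  TPackedRecord = packed record
    FirstValue: LongWord;    {offset 0}
    SecondValue: Byte;       {offset 4}
    ThirdValue: LongWord;    {offset 5, no padding}
  end;                       {SizeOf(TPackedRecord) = 9}

{Hypothetical example: reads ThirdValue through a record pointer in eax.}
function ReadThird(const ARecord: TAlignedRecord): LongWord; register;
asm
  mov  eax, DWORD PTR [eax].TAlignedRecord.ThirdValue
end;
```

Because the field is addressed by name, ReadThird would keep working if the record were later declared packed: the compiler would simply substitute offset 5 for offset 8.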
In other words, if you use heap allocated types as local variables, memory will
be allocated for the reference (the pointer) to that variable on the stack frame,
but you are responsible for the actual allocation and deallocation of the memory
and for initialising the contents. In Pascal, most of these types are largely
automatically managed, so allocation and deallocation happens behind the
scenes. In assembler blocks, that is obviously not the case.
You can call GetMem to allocate memory and return a pointer to the newly
allocated memory. You need to pass the amount of memory needed in eax and
upon return from GetMem the eax register will contain the pointer, which you
can then store in the appropriate slot on the stack frame.
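The allocation and release steps described above might look as follows. This is a sketch under several assumptions: the routine name UseScratch is hypothetical, Size is assumed to be greater than zero, and the RTL helpers are assumed to be reachable as System.@GetMem and System.@FreeMem, which is the usual spelling in 32-bit Delphi inline assembler but should be verified for your compiler version.

```pascal
{Hypothetical example: allocates a scratch block, touches it, frees it.}
procedure UseScratch(Size: Integer); register;
var
  Scratch: Pointer;
asm
  {eax = Size on entry (assumed > 0)}
  call System.@GetMem        {returns the address of the new block in eax}
  mov  Scratch, eax          {store the pointer in its stack frame slot}
  mov  byte ptr [eax], 0     {use the memory, here just clearing one byte}
  mov  eax, Scratch          {the pointer to release goes in eax}
  call System.@FreeMem       {releasing the block is also our job}
end;
```

Remember that any call may change eax, edx and ecx, which is why the pointer is kept in a local variable rather than in one of those registers.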
little use for the Comp type, which is maintained for backward compatibility. It is
recommended to use Int64 instead. On several processors, Int64 will also
yield better performance for integer arithmetic, as it uses the CPU registers,
rather than the FPU.
The following code demonstrates how to return an integer from assembly code.
It returns the number of set bits in the AValue parameter as an unsigned 8-bit
value (we don't need the larger range of 16 or 32 bit integers, since the returned
value will fall in the range 0-32).
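function CountBits(const AValue: Longword): Byte;
asm
mov ecx, eax {work on a copy of AValue in ecx}
xor al, al {al accumulates the bit count}
test ecx, ecx
jz @@ending {no bits set: return 0}
@@counting:
shr ecx, 1 {shift the lowest bit into the carry flag}
adc al, 0 {add the carry flag to the count}
test ecx, ecx
jnz @@counting {continue while any set bits remain}
@@ending:
end;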
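In line with the earlier advice to test assembler routines thoroughly, a plain Pascal version of the same algorithm makes a convenient reference to compare against. The names CountBitsPascal and CheckCountBits are assumptions for illustration; CountBits is the routine shown above.

```pascal
{Reference implementation in Pascal, used only for checking.}
function CountBitsPascal(AValue: LongWord): Byte;
begin
  Result := 0;
  while AValue <> 0 do
  begin
    if (AValue and 1) <> 0 then
      Inc(Result);
    AValue := AValue shr 1;
  end;
end;

{Compares the assembler routine with the reference on a few inputs.}
procedure CheckCountBits;
begin
  Assert(CountBits(0) = CountBitsPascal(0));
  Assert(CountBits($80000001) = CountBitsPascal($80000001));
  Assert(CountBits($FFFFFFFF) = CountBitsPascal($FFFFFFFF));
end;
```

The chosen inputs cover the early exit for zero, a value with only the highest and lowest bits set, and the maximum count of 32.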
The precision control bits in the FPU control register are highly relevant in this
respect. Under normal circumstances, this two bit binary value ought to be set
to 11 at all times, indicating 64-bit mantissa precision. If the precision control
bits are set to a lower precision, the FPU will reduce precision during
computation and hence your result will be less precise than you would have
expected. The theory of floating point arithmetic and the details of the Intel FPU
are out of scope for this article, but you should familiarise yourself intimately
with the topics before attempting to write elaborate floating point code or
whenever you need to produce results in line with specific guidelines or
standards. Delphi has several supporting functions and variables, such as
Get8087CW, Set8087CW, SetPrecisionMode, etc. See online help for more
information. Note that many libraries and even calls to OS functions can change
the value of the FPU control word. If you change the control word inside your
own code, it is good practice to make sure you restore it to its previous state
when you are done. This ought to be done outside your time-critical FP code,
since on many processors setting the control word causes a considerable stall if
the control word is read immediately afterwards, which is the case for most FP
instructions.
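The save-and-restore practice described above could look like the sketch below. It assumes a Delphi version whose System unit provides Get8087CW and Set8087CW (Delphi 6 or later); the procedure name RunWithFullPrecision is hypothetical. The mask $0300 selects the two precision control bits (bits 8 and 9) of the control word.

```pascal
program FpuControlWordDemo;
{$APPTYPE CONSOLE}

{Hypothetical wrapper: forces 64-bit mantissa precision for the duration
 of a calculation and restores the caller's control word afterwards.}
procedure RunWithFullPrecision;
var
  SavedCW: Word;
begin
  SavedCW := Get8087CW;              {remember the current control word}
  try
    Set8087CW(SavedCW or $0300);     {precision control bits = 11}
    {... time-critical floating point code goes here ...}
  finally
    Set8087CW(SavedCW);              {restore the previous state}
  end;
end;

var
  Before: Word;
begin
  Before := Get8087CW;
  RunWithFullPrecision;
  Writeln(Get8087CW = Before);
end.
```

Both calls to Set8087CW sit outside the calculation itself, so the stall mentioned above is paid once on entry and once on exit rather than inside the loop.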
In addition to single, double and extended, the Real48, Comp and
Currency types are also returned in ST(0). Even though Comp represents a
64-bit integer, it is a type that uses the FPU, rather than the CPU registers and
as such is manipulated using FPU instructions. Currency is a fixed-point type
mainly designed for monetary calculations, but as with Comp it is in fact an FPU
based type. Note that Currency is scaled by 10^4 (10,000). Hence, a Currency
value of 5.4321 is stored in ST(0) as 54321.
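As a small illustration of that scaling, the sketch below returns a Currency result from assembler by loading the already scaled integer into ST(0). It assumes 32-bit Delphi; the function name SampleAmount and the typed constant Scaled are hypothetical.

```pascal
program CurrencyResultDemo;
{$APPTYPE CONSOLE}

{Hypothetical example: returns the Currency value 5.4321 from assembler.}
function SampleAmount: Currency;
const
  Scaled: Integer = 54321;   {5.4321 multiplied by 10,000}
asm
  fild Scaled                {ST(0) = 54321, which the caller reads as 5.4321}
end;

var
  Amount: Currency;
begin
  Amount := SampleAmount;
  Writeln(Amount:0:4);
end.
```

Loading 5.4321 itself into ST(0) would be wrong here: the caller would interpret it as the scaled value and end up with an amount ten thousand times too small.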
Anyone wishing to use FP math for monetary applications ought to make sure
they fully understand the nature of floating point arithmetic. My article on floating
point values provides a brief introduction to the topic and contains links to
further reading material. Using scaled integers might be a better approach for
such applications. Also, Intel CPUs support BCD encoding and arithmetic,
which is useful for these purposes. Unfortunately, the Delphi language has no
support for BCD types, so you will need to encode and decode BCD data
yourself.
Unless needed for compatibility with other applications or environments, you
should avoid the non-native Real48 type altogether. This type is not supported
in hardware, so all manipulation has to be done in code, which makes it very
slow. Convert a Real48 into a native float immediately after receiving it, using
the System unit's _Real2Ext function. When invoking _Real2Ext, eax
contains a pointer to the Real48 value. Upon return, ST(0) is loaded with the
value. You can then use the FPU to perform the required calculations. If you
need to hand the result back as a Real48 type, call _Ext2Real, also in the
System unit, which will convert the value in ST(0) back into a Real48 value.
eax should contain a pointer to a 6-byte wide memory location where the
converted value will be stored. Note that in Delphi versions before Delphi 4, this
non-native 6-byte type was called Real instead of Real48. From Delphi 4
onwards, Real acts as a generic type for real numbers, at present implemented
as a Double.
Table 4 summarises the rules for returning results, including real types.
To conclude this section on returning real numbers, below is a full working
example. The function CalcRelativeMass below demonstrates floating point
arithmetic in assembler within a Delphi environment. The function takes two
parameters, mass and velocity of a body, and calculates the relative mass as
per the theory of relativity.
function CalcRelativeMass(m,v: Double): Double; register;
const
  LightVelocity: Integer = 299792500;
asm
  {Calculate the relative mass according to the following
   formula: Result = m / Sqrt(1-v²/c²), where c = the
   velocity of light, m the mass and v the velocity of
   an object}
  fild LightVelocity
  fild LightVelocity
  fmulp               {Calculate c²}
  fld v
  fld v
  fmulp               {Calculate v²}
  fxch
  fdivp               {v²/c²}
  fld1
  fxch
  fsubp               {ST(0)=1-(v²/c²)}
  fsqrt               {Root of ST(0)}
  fld m
  fxch
  fdivp               {divide mass by root result}
end;
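When writing FPU code like this, it can help to keep a plain Pascal version alongside it to check results against. The sketch below evaluates the same expression the instruction sequence computes, m / Sqrt(1 - v*v/(c*c)); the name CalcRelativeMassRef is hypothetical and the constant simply mirrors the value used in the assembler version.

```pascal
program RelativeMassCheck;
{$APPTYPE CONSOLE}

const
  LightVelocity = 299792500.0;   {same value as the assembler version}

{Hypothetical reference implementation in plain Pascal.}
function CalcRelativeMassRef(m, v: Double): Double;
begin
  Result := m / Sqrt(1.0 - (v * v) / (LightVelocity * LightVelocity));
end;

begin
  {At rest the relative mass equals the rest mass}
  Writeln(CalcRelativeMassRef(1.0, 0.0):0:6);
  {At half the velocity of light the factor is 1/Sqrt(0.75)}
  Writeln(CalcRelativeMassRef(1.0, LightVelocity / 2):0:6);
end.
```

Small differences in the last digits between the two versions are to be expected, since the assembler version keeps all intermediate results on the FPU stack.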
The Delphi online help makes a bit of a mess of things by stating on the one
hand that WideChar is a fundamental type, and on the other hand talking about
its "current implementations [sic]", thereby implying that future versions might
change its 16-bit nature, as if it were some generic type. In practice, there is
little choice but to consider WideChar as a fundamental, 16-bit Unicode type.
See also Table 4, which provides an overview of the rules for returning results.
jnz @@loop2
@@ending:
pop esi
end;
The above example does not need to call any of the System.pas routines, but
it still relies on some implementation specific behaviour, namely where it
retrieves the long string's length. Long strings are preceded by two extra
dwords, a 32-bit length indicator at offset -4 and a 32-bit reference count at
offset -8. There is no guarantee that this scheme will remain unchanged forever.
To use the above function, allocate a string and then call the routine:
procedure DoSomething;
var
ALine: AnsiString;
begin
...
SetLength(ALine, {Required Length});
FillWithPlusMinus(ALine);
...
end;
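The length indicator at offset -4 mentioned earlier can be read in isolation, as in the sketch below. It is subject to the same caveat: it assumes the 32-bit long string layout described above and is not guaranteed to keep working in future versions. The function name RawAnsiLength is hypothetical.

```pascal
program AnsiLengthDemo;
{$APPTYPE CONSOLE}

{Hypothetical example: reads the length indicator stored in front of
 the string data. An empty long string is a nil pointer.}
function RawAnsiLength(const S: AnsiString): Integer;
asm
  test eax, eax          {nil pointer means an empty string}
  jz   @@done            {eax is already 0, the correct result}
  mov  eax, [eax - 4]    {32-bit length indicator at offset -4}
@@done:
end;

var
  ALine: AnsiString;
begin
  SetLength(ALine, 10);
  Writeln(RawAnsiLength(ALine) = Length(ALine));
  Writeln(RawAnsiLength('') = 0);
end.
```

The nil check matters: dereferencing offset -4 of an empty string would read from an invalid address.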
Ansistring;
jz @@remain
@@loop:
mov [esi],eax
add esi,4
dec ecx
jnz @@loop
@@remain: {fill the remaining bytes}
mov ecx, ebx
and ecx, 3
jz @@ending
@@loop2:
mov BYTE PTR [esi],al
shr eax,8
inc esi
dec ecx
jnz @@loop2
@@ending:
pop ebx
pop esi
end;
The example now uses the standard Result mechanism for returning data. It is
called simply with the required length to obtain a string:
procedure DoSomething;
var
ALine: AnsiString;
begin
...
ALine:=PlusMinusLine({Required Length});
...
end;
The above example was written in Delphi 7. It should be very similar for
most other versions of Delphi, although the names of the internal functions do
differ somewhat between Delphi versions. Inspecting System.pas should help
you identify the appropriate function. You can also hold Ctrl and click the left
mouse button on the name of a Pascal function such as UniqueString to jump
to its System.pas implementation.
Further Reading
The Art of Assembly
http://homepage.mac.com/randyhyde/webster.cs.ucr.edu/www.artofasm.com/index.html
Register | Entry | Exit | Preserve?
eax | | | No
ebx | Unknown | n/a | Yes
ecx | | n/a | No
edx | | | No
esi | Undefined | n/a | Yes
edi | Undefined | n/a | Yes
ebp | | | Yes (5)
esp | Stack pointer | Stack pointer | Yes (5)
cs | Code selector | n/a | Yes
ds | Data selector | n/a | Yes
es | Data selector | n/a | Yes
fs | | n/a | Yes
gs | Reserved | n/a | Yes
ss | Stack selector | Stack selector | Yes
(1)
except Register.
(4) Only for Result types that qualify to go into a register. See Table 4 for a complete overview of
how results are returned to the caller.
(5) While you are allowed to use these registers inside your code, such practice is strongly
discouraged. The stack pointer itself changes with every stack operation and its content is
therefore highly volatile, making it difficult to manage in code. The ebp register is used to
access the stack frame and other uses are therefore to be avoided.
Calling convention | Parameter order | Stack cleanup | Common Usage
register | left-to-right | Callee | Delphi applications
pascal | left-to-right | Callee | Backwards compatibility
cdecl | right-to-left | Caller |
stdcall | right-to-left | Callee |
safecall | right-to-left | Callee |
Type | Value in Register? | By Value | Const | By Reference
ShortInt | | 8-bit value(1) | 8-bit value(1) |
SmallInt | | 16-bit value(1) | 16-bit value(1) | 32-bit pointer to 16-bit value(1)
LongInt | | 32-bit value | 32-bit value | 32-bit pointer to 32-bit value
Byte | | 8-bit value(1) | 8-bit value(1) |
Word | | 16-bit value(1) | 16-bit value(1) | 32-bit pointer to 16-bit value(1)
Dword | | 32-bit value | 32-bit value | 32-bit pointer to 32-bit value
Int64 | | 32-bit pointer to 64-bit value(2) | 32-bit pointer to 64-bit value(2) | 32-bit pointer to 64-bit value(2)
Boolean | | 8-bit value(1) | 8-bit value(1) |
ByteBool | | 8-bit value(1) | 8-bit value(1) |
WordBool | | 16-bit value(1) | 16-bit value(1) | 32-bit pointer to 16-bit value(1)
LongBool | | 32-bit value | 32-bit value | 32-bit pointer to 32-bit value
AnsiChar | | 8-bit value(1) | 8-bit value(1) |
WideChar | | 16-bit value(1) | 16-bit value(1) | 32-bit pointer to 16-bit value(1)
ShortString | | 32-bit pointer to the string | 32-bit pointer to the string | 32-bit pointer to the string
AnsiString | | 32-bit pointer to the string | 32-bit pointer to the string | 32-bit pointer to a 32-bit pointer to the string
Variant | | 32-bit pointer to the variant | 32-bit pointer to the variant | 32-bit pointer to the variant
Pointer | | 32-bit pointer | 32-bit pointer | 32-bit pointer to a 32-bit pointer
Object reference | | 32-bit pointer to the object instance | 32-bit pointer to the object instance | 32-bit pointer to a 32-bit pointer to the object instance
Class reference | | 32-bit pointer to the class | 32-bit pointer to the class | 32-bit pointer to a 32-bit pointer to the class
Procedure pointer | | 32-bit pointer to the procedure/function | 32-bit pointer to the procedure/function | 32-bit pointer to a 32-bit pointer to the procedure/function
Method pointer | | 2x 32-bit pointer(3) | 2x 32-bit pointer(3) | 2x 32-bit pointer(3)
Set | | 8/16/32-bit value or 32-bit pointer(4) | 8/16/32-bit value or 32-bit pointer(4) | 32-bit pointer to the set
Record | | 8/16/32-bit value or 32-bit pointer(5) | 8/16/32-bit value or 32-bit pointer(5) | 32-bit pointer to the record
Static Array | | 8/16/32-bit value or 32-bit pointer(6) | 8/16/32-bit value or 32-bit pointer(6) | 32-bit pointer to the array
Dynamic Array | | 32-bit pointer to the array | 32-bit pointer to the array | 32-bit pointer to a 32-bit pointer to the array
Open Array | | 32-bit pointer to | 32-bit pointer to | 32-bit pointer to
Single | | 32-bit value | 32-bit value | 32-bit pointer to 32-bit value
Double (Real48) | | 64-bit value | 64-bit value | 32-bit pointer to 64-bit value
Extended | | 80-bit value(8) | 80-bit value(8) | 32-bit pointer to 80-bit value(8)
Currency | | 64-bit value | 64-bit value | 32-bit pointer to 64-bit value
(1) Data types that occupy less than 32 bits will still take up 32 bits. The actual value is stored in
the lowest part of the stack location or register. The content of the remaining part is
undefined and should be treated as such at all times.
(2) The pointer points to the lowest dword of the value. The highest dword is stored in the next
location.
(3) Method pointers are always passed on the stack. They consist of an instance pointer, which
is pushed before the actual method pointer, which means the method pointer sits at the lowest
address on the stack.
(4) If the set contents fit into a byte/word/dword, its value is passed immediately, as
an 8/16/32-bit value respectively. Otherwise, a 32-bit pointer to the set is passed.
(5) If the record contents fit into a byte/word/dword, the data is passed immediately, as
an 8/16/32-bit value respectively. Otherwise, a 32-bit pointer to the record is passed.
(6) If the array contents fit into a byte/word/dword, the data is passed immediately, as
an 8/16/32-bit value respectively. Otherwise, a 32-bit pointer to the array is passed.
(7) Open arrays are passed as 2 parameters: the first one is the pointer to the actual array, the
second one is the highest index of the array, i.e. the number of elements minus one (the value
High would return). As such, passing an open array parameter actually occupies 2 parameter
slots. For instance: if you use the register calling convention and you pass one open array
parameter, eax will contain the pointer to the array and edx will contain the highest index.
See Chapter 2 for details about calling conventions. Also note
that open array parameters reside on the stack, so refrain from using very large arrays.
(8) While the value itself occupies only 10 bytes, 12 bytes are actually allocated (3 dwords). The
content of the last 2 bytes should be considered undefined.
Type | Returned in | What?
ShortInt | al |
SmallInt(1) | ax |
LongInt(2) | eax |
Byte | al |
Word | ax |
LongWord(3) | eax |
Int64 | edx:eax |
Boolean | al | True or False
ByteBool | al |
WordBool | ax |
LongBool | eax |
Single | ST(0)(4) |
Double | ST(0)(4) |
Extended | ST(0)(4) |
Real48 | ST(0)(4) |
Comp | ST(0)(4) (5) |
Currency | ST(0)(4) |
AnsiChar | al | 8-bit character
WideChar | ax |
Byte | al |
AnsiString | |
(1)