Computer Architecture: Assoc. Prof. Nguyễn Trí Thành, PhD
Intel-based Assembly
INTEL HISTORY
Intel Microprocessor History
11/22/2021 5
Intel microprocessor history
https://en.wikipedia.org/wiki/List_of_Intel_microprocessors
Intel micro-processor series
INTEL REGISTERS
Basic Execution Environment
General-purpose registers
Index and base registers
Specialized register uses
Status flags
Floating-point, MMX, XMM registers
Some Specialized Register Uses (1 of 2)
General-Purpose
RAX/EAX – accumulator
RCX/ECX – loop counter
RSP/ESP – stack pointer
RSI/ESI, RDI/EDI – index registers
RBP/EBP – extended frame pointer (stack)
RIP/EIP/IP – instruction pointer
RFLAGS/EFLAGS
status and control flags
each flag is a single binary bit
Status Flags
• Carry (CF)
unsigned arithmetic out of range
• Overflow (OF)
signed arithmetic out of range
• Sign (SF)
result is negative
• Zero (ZF)
result is zero
• Auxiliary Carry (AF)
carry from bit 3 to bit 4
• Parity (PF)
the low byte of the result contains an even number of 1 bits
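The flag definitions above can be checked with a small C model. This is only a sketch: the struct flags8 type and the names add8 and add8_check are made up for illustration, and only an 8-bit ADD is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Result and status flags of an 8-bit ADD (hypothetical helper type). */
struct flags8 {
    uint8_t result;
    int cf, of, sf, zf, af, pf;
};

/* Model of "add src, dst" on 8-bit operands. */
struct flags8 add8(uint8_t dst, uint8_t src)
{
    struct flags8 f;
    unsigned wide = (unsigned)dst + (unsigned)src;    /* 9-bit sum */
    int ones = 0;

    f.result = (uint8_t)wide;
    f.cf = (wide >> 8) & 1;                           /* carry out of bit 7 */
    /* signed overflow: inputs share a sign, result has the other sign */
    f.of = ((~(dst ^ src) & (dst ^ f.result)) >> 7) & 1;
    f.sf = (f.result >> 7) & 1;                       /* copy of bit 7 */
    f.zf = (f.result == 0);
    f.af = (((dst & 0x0F) + (src & 0x0F)) >> 4) & 1;  /* carry from bit 3 to bit 4 */
    for (int i = 0; i < 8; i++)
        ones += (f.result >> i) & 1;
    f.pf = (ones % 2 == 0);                           /* even number of 1 bits */
    return f;
}

/* Two worked cases: unsigned wrap-around and signed overflow. */
void add8_check(void)
{
    struct flags8 f = add8(0xFF, 0x01);   /* 255 + 1 wraps to 0 */
    assert(f.result == 0 && f.cf && f.zf && !f.of);
    f = add8(0x7F, 0x01);                 /* +127 + 1 leaves the signed range */
    assert(f.result == 0x80 && f.of && f.sf && !f.cf);
}
```

The same bit pattern can set CF without OF (first case) or OF without CF (second case), which is why unsigned and signed code test different flags.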
X86_64
AMD architecture
http://developer.amd.com/documentation/guides/Pages/default.aspx
https://software.intel.com/en-us/articles/intel-sdm
Extends the registers to 64 bits: rax, rbx, rcx, rdx, …
X86-64 Intel registers
https://en.wikipedia.org/wiki/X86
X86_64 registers
X86_64 registers (cont’d)
64-bit  32-bit  16-bit  8-bit high  8-bit low
RAX     EAX     AX      AH          AL
RBX     EBX     BX      BH          BL
RCX     ECX     CX      CH          CL
RDX     EDX     DX      DH          DL
RDI     EDI     DI      -           DIL
RSI     ESI     SI      -           SIL
RBP     EBP     BP      -           BPL
RSP     ESP     SP      -           SPL
R8      R8D     R8W     -           R8B
…
R15     R15D    R15W    -           R15B
BASIC INSTRUCTIONS
ASM Programming levels
An ASM program can perform input-output at each of the following levels:
Level 3: C library
Level 2: OS function
Level 1: BIOS function
Level 0: Hardware
Program structure
.section .data
output: .asciz "The processor Vendor ID is '%s'\n"
.section .text
.globl _start
_start:
program_body
Data Definition Statement
A data definition statement sets aside storage in
memory for a variable.
Syntax:
[name:] directive initializer [,initializer] . . .
All initializers become binary data in memory
Data type: .byte, .short (.2byte), .int (.long, .4byte), .quad
(.8byte), .float, .double, .asciz, .zero expression
Operand Types
Three basic types of operands:
Immediate – a constant integer
Imm8, imm16, imm32, imm64
Register – the name of a register
register name is converted to a number and encoded
within the instruction
r8, r16, r32, r64, x (real number processing register)
Memory – reference to a location in memory
memory address is encoded within the instruction, or a
register holds the address of a memory location
m8, m16, m32, m64
Instruction Operand Notation
Convention:
o w (word): 16 bits;
o d (double word): 32 bits;
o q (quadword): 64 bits
Operand Description
r8 8-bit general purpose register: AH, AL, BH, BL, CH, CL, r8b, …
r16 16 bit general purpose register: AX, BX, CX, DX, SI, DI, r8w, …
r32 32 bit general purpose register: EAX, EBX, ECX, EDX, r8d, …
r64 64 bit general purpose register: RAX, RBX, RCX, RDX, r8, …
imm8/16/32/64 An immediate of 8, 16, 32, 64 bit
m8/16/32/64 A variable of 8, 16, 32, 64 bit
x xmm register
r/m8/16/32/64 A register or variable of 8, 16, 32, 64
reg any general purpose register
Assembly standards
• Intel standard • AT&T standard
Manual
MOV Instruction (assignment)
• Move from source to destination. Syntax:
MOV source, destination
• Both operands must be the same size
• No more than one memory operand permitted
.section .data
output: .asciz "The result is: "
val: .int 10
.section .text
…
mov $4, %eax #eax=4
mov $1, %ebx #ebx=1
mov $output, %rcx #rcx=&output
mov $12, %edx #edx=12
…
mov val, %eax #eax=val
…
mov %eax,val #val=eax
Direct-Offset Operands
An offset is added to a data label to produce an effective
address (EA).
Addition and Subtraction
INC and DEC Instructions
ADD and SUB Instructions
NEG Instruction
Implementing Arithmetic Expressions
Flags Affected by Arithmetic
Zero
Sign
Carry
Overflow
INC and DEC Instructions
ADD and SUB Instructions
• ADD source, destination
• Logic: destination ← destination + source
• SUB source, destination
• Logic: destination ← destination - source
• Same operand rules as for the MOV instruction
ADD and SUB Examples
NEG (negate) Instruction
Reverses the sign of an operand. Operand can be a register or
memory operand.
• NEG destination
• Logic: destination ← -destination
valB: .byte -1
valW: .int +32767
…
mov valB,%al # AL = -1
neg %al # AL = +1
negl valW # valW = -32767 (the l suffix gives the size of the 32-bit memory operand)
MUL Instruction
• Unsigned multiplication
• MUL r8/m8 - MUL r16/m16
• MUL r32/m32 - MUL r64/m64
Multiplicand Multiplier Product
AL r8/m8 AX
AX r16/m16 DX:AX
EAX r32/m32 EDX:EAX
RAX r64/m64 RDX:RAX
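The EAX row of the table can be modelled in C as follows. This is a sketch only: the function names mul32 and mul32_check are hypothetical, and the registers are passed as plain variables.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "mul r/m32": EDX:EAX = EAX * operand (unsigned). */
void mul32(uint32_t *eax, uint32_t *edx, uint32_t operand)
{
    uint64_t product = (uint64_t)*eax * operand;  /* full 64-bit product */
    *eax = (uint32_t)product;                     /* low 32 bits  */
    *edx = (uint32_t)(product >> 32);             /* high 32 bits */
}

/* Worked case: 0xFFFFFFFF * 2 = 0x1FFFFFFFE, so EDX=1 and EAX=0xFFFFFFFE. */
void mul32_check(void)
{
    uint32_t eax = 0xFFFFFFFFu, edx = 0;
    mul32(&eax, &edx, 2);
    assert(edx == 1 && eax == 0xFFFFFFFEu);
}
```

Note that MUL always overwrites the high register (here EDX), even when the product fits in the low one.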
DIV Instruction
• Unsigned division
• DIV r8/m8 - DIV r16/m16 - DIV r32/m32 - DIV r64/m64
Dividend Divisor Quotient Remainder
AX r8/m8 AL AH
DX:AX r16/m16 AX DX
EDX:EAX r32/m32 EAX EDX
RDX:RAX r64/m64 RAX RDX
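The EDX:EAX row of the table can be modelled in C as follows. This is a sketch: div32 and div32_check are hypothetical names, and the return value -1 stands in for the divide error the processor raises when the divisor is 0 or the quotient does not fit in EAX.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "div r/m32": EAX = EDX:EAX / divisor, EDX = EDX:EAX % divisor.
   Returns 0 on success, -1 where the CPU would raise a divide error. */
int div32(uint32_t *eax, uint32_t *edx, uint32_t divisor)
{
    uint64_t dividend = ((uint64_t)*edx << 32) | *eax;
    uint64_t quotient;

    if (divisor == 0)
        return -1;
    quotient = dividend / divisor;
    if (quotient > 0xFFFFFFFFu)        /* quotient too large for EAX */
        return -1;
    *edx = (uint32_t)(dividend % divisor);
    *eax = (uint32_t)quotient;
    return 0;
}

/* Worked case: 7 / 2 gives quotient 3 and remainder 1. */
void div32_check(void)
{
    uint32_t eax = 7, edx = 0;         /* EDX must be cleared first */
    assert(div32(&eax, &edx, 2) == 0);
    assert(eax == 3 && edx == 1);
}
```

Forgetting to clear EDX (or RDX) before an unsigned division is a common source of divide errors, because leftover high bits make the quotient too large.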
Zero Flag (ZF)
The Zero flag is set when the result of an operation produces
zero in the destination operand.
Remember...
• A flag is set when it equals 1.
• A flag is clear when it equals 0.
JMP Instruction
• JMP is an unconditional jump to a label that is usually within
the same procedure.
• Syntax: JMP target
• Logic: RIP ← target
• Example:
top:
.
.
jmp top #goto top
CMP Instruction (1 of 3)
Jumps Based on Equality
cmp left, right
Instruction Description
JE label if(right==left) goto label;
JNE label if(right!=left) goto label;
JCXZ label if (%CX==0) goto label;
JECXZ label if (%ECX==0) goto label;
JRCXZ label if (%RCX==0) goto label;
Jumps Based on Unsigned Comparisons
cmp left, right
Mnemonic    Description                     Flags
JA label    if (right>left) goto label;     CF=0 && ZF=0
JNBE label  if (right>left) goto label;     CF=0 && ZF=0
JAE label   if (right>=left) goto label;    CF=0
JNB label   if (right>=left) goto label;    CF=0
JB label    if (right<left) goto label;     CF=1
JNAE label  if (right<left) goto label;     CF=1
JBE label   if (right<=left) goto label;    CF=1 || ZF=1
JNA label   if (right<=left) goto label;    CF=1 || ZF=1
JE label    if (right==left) goto label;    ZF=1
JNE label   if (right!=left) goto label;    ZF=0
J=Jump; A=Above; B=Below; E=Equal; N=Not
Jumps Based on Signed Comparisons
cmp left, right
Mnemonic    Description                     Flags
JG label    if (right>left) goto label;     SF=OF && ZF=0
JNLE label  if (right>left) goto label;     SF=OF && ZF=0
JGE label   if (right>=left) goto label;    SF=OF
JNL label   if (right>=left) goto label;    SF=OF
JL label    if (right<left) goto label;     SF!=OF
JNGE label  if (right<left) goto label;     SF!=OF
JLE label   if (right<=left) goto label;    SF!=OF || ZF=1
JNG label   if (right<=left) goto label;    SF!=OF || ZF=1
JE label    if (right==left) goto label;    ZF=1
JNE label   if (right!=left) goto label;    ZF=0
G=Greater than; L=Less than
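The difference between the unsigned and signed jumps can be seen in a small C model of CMP. This is a sketch only: struct cmp_flags and the functions cmp32, ja, jbe, jg, jle and cmp32_check are hypothetical helpers, with the operands written in AT&T order (cmp left, right computes right - left).

```c
#include <assert.h>
#include <stdint.h>

struct cmp_flags { int cf, zf, sf, of; };

/* Model of "cmp left, right": flags of right - left; the result is discarded. */
struct cmp_flags cmp32(uint32_t left, uint32_t right)
{
    struct cmp_flags f;
    uint32_t diff = right - left;

    f.cf = right < left;               /* unsigned borrow */
    f.zf = diff == 0;
    f.sf = (diff >> 31) & 1;
    /* signed overflow: operands differ in sign and the result's sign
       differs from that of right */
    f.of = (((right ^ left) & (right ^ diff)) >> 31) & 1;
    return f;
}

int ja(struct cmp_flags f)  { return !f.cf && !f.zf; }       /* unsigned >  */
int jbe(struct cmp_flags f) { return f.cf || f.zf; }         /* unsigned <= */
int jg(struct cmp_flags f)  { return f.sf == f.of && !f.zf; }/* signed >    */
int jle(struct cmp_flags f) { return f.sf != f.of || f.zf; } /* signed <=   */

/* 0xFFFFFFFF is 4294967295 unsigned but -1 signed. */
void cmp32_check(void)
{
    struct cmp_flags f = cmp32(1u, 0xFFFFFFFFu);
    assert(ja(f));      /* 4294967295 > 1 as unsigned */
    assert(!jg(f));     /* -1 > 1 is false as signed  */
}
```

The CMP instruction is the same in both cases; only the choice of jump decides whether the operands are treated as unsigned or signed.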
CONTROL STRUCTURE
Conditional Structures
• Block-Structured IF Statements
• Compound Expressions with AND
• Compound Expressions with OR
• WHILE Loops
• Table-Driven Selection
Applications
• Task: Jump to a label if unsigned EAX is greater than EBX
• Solution: Use CMP, followed by JA
cmp %ebx, %eax #if (%eax > %ebx)
ja Larger #goto Larger
Block-Structured IF Statements
Your turn . . .
Implement the following pseudocode in assembly language. All values are unsigned:
if(ebx <= ecx )
{
eax = 5;
edx = 6;
}
Solution 1 (skip the block when the condition is false):
if: cmp %ecx,%ebx
ja endif
then: mov $5, %eax
mov $6,%edx
endif:
Solution 2 (jump to the block when the condition is true):
if: cmp %ecx,%ebx
jbe then
jmp endif
then: mov $5, %eax
mov $6,%edx
endif:
Compound Expression with AND (2 of 3)
if ((al > bl) && (bl > cl))
X = 1;
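One way to picture the translation is to rewrite the condition with goto, so that each C line corresponds to a compare-and-jump pair. This is a sketch under assumptions: the values are taken to be unsigned (hence jbe in the comments), X is assumed to start at 0, and the names and_example, and_example_check and the label next are made up for illustration.

```c
#include <assert.h>

/* if ((al > bl) && (bl > cl)) X = 1;  written with short-circuit jumps. */
int and_example(unsigned char al, unsigned char bl, unsigned char cl)
{
    int X = 0;                  /* assumed initial value */

    if (al <= bl) goto next;    /* cmp %bl,%al ; jbe next  (first test fails)  */
    if (bl <= cl) goto next;    /* cmp %cl,%bl ; jbe next  (second test fails) */
    X = 1;                      /* both conditions are true */
next:
    return X;
}

void and_example_check(void)
{
    assert(and_example(3, 2, 1) == 1);  /* 3 > 2 and 2 > 1 */
    assert(and_example(1, 2, 0) == 0);  /* first test fails, second is skipped */
}
```

Each condition is inverted and jumps past the body, so the second comparison never runs when the first one already fails.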
Your turn . . .
Implement the following pseudocode in
assembly language. All values are unsigned:
Compound Expression with OR (1 of 2)
if ((al > bl) || (bl > cl))
X = 1;
WHILE Loops
A WHILE loop is really an IF statement followed by the body
of the loop, followed by an unconditional jump to the top of
the loop. Consider the following example:
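The slide's own example is not present in this text, so the sketch below uses a hypothetical loop, while (eax < ebx) eax = eax + 1;, with unsigned 32-bit values. It is written with goto so that each line maps to one assembly instruction pair; the names while_example, while_example_check, top and done are made up.

```c
#include <assert.h>
#include <stdint.h>

/* while (eax < ebx) eax = eax + 1;  in "IF + body + jump to top" form. */
uint32_t while_example(uint32_t eax, uint32_t ebx)
{
top:
    if (eax >= ebx) goto done;  /* cmp %ebx,%eax ; jae done  (the IF)   */
    eax = eax + 1;              /* inc %eax                  (the body) */
    goto top;                   /* jmp top                              */
done:
    return eax;
}

void while_example_check(void)
{
    assert(while_example(0, 5) == 5);   /* body runs five times */
    assert(while_example(7, 5) == 7);   /* body never runs      */
}
```

The test sits at the top, so the body is skipped entirely when the condition is false on entry.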
Your turn . . .
Implement the following loop, using unsigned 32-bit integers:
LOOP Instruction: for loop
Calculate the total of the first n integers (n>0)
for(r9d=0,ecx=n;ecx>0;ecx--) r9d+=ecx;
init:
mov $0,%r9d
mov n,%ecx
for:
add %ecx,%r9d
loop for #rcx--; if(rcx!=0) goto for (LOOP counts in RCX in 64-bit mode)
for loop – general case
Calculate the total of the first n integers (n>0)
for(r9d=0,r10d=0;r10d<=n;r10d++) r9d+=r10d;
init:
mov $0,%r9d
xor %r10d,%r10d
while:
cmp n,%r10d
ja endwhile
do:
add %r10d,%r9d
inc %r10d
jmp while
endwhile:
SSE introduction
SSE instructions: assignment
Instruction Source Destination Description
movss M32/X X dst=src;
movss X M32 dst=src;
movsd M64/X X dst=src;
movsd X M64 dst=src;
SSE instructions (cont’d)
len: .double 23.45
result: .double 0.0
arr: .double 3.1,2.3,3.4,4.5,5.6
...
movsd len,%xmm0
movsd %xmm0,result
mov $1, %edx
movsd arr(,%rdx,8),%xmm1 #xmm1=arr[1]; 64-bit index register
SSE instructions (cont’d)
float double Description
addss src, dst addsd src, dst dst+=src;
subss src, dst subsd src, dst dst-=src;
mulss src, dst mulsd src, dst dst*=src;
divss src, dst divsd src, dst dst/=src;
maxss src, dst maxsd src, dst dst=max(src,dst);
minss src, dst minsd src, dst dst=min(src, dst);
sqrtss src, dst sqrtsd src, dst dst=sqrt(src);
SSE instructions (cont’d)
len: .double 23.45
result: .double 0.0
arr: .double 3.1,2.3,3.4,4.5,5.6
...
movsd len,%xmm0
movsd arr(,%rdx,8),%xmm1
addsd %xmm1,%xmm0
movsd %xmm0,result
SSE instructions (cont’d)
…
ucomisd %xmm1, %xmm0
jb else
movsd %xmm1,%xmm0
else:
movsd %xmm0,result
…
Exercises
Write a program to add two double numbers and print the result on screen
Write a program to multiply two double numbers and print the result on screen
Write a program to print the maximum of two double numbers
Write a program to sum the elements of a double array and print the result on screen
Exercises (cont’d)
Write a program to solve the equation ax+b=0
Write a program to solve the equation ax^2+bx+c=0
Write a program to print the first n terms of a geometric sequence with given values of a (first term) and r (common ratio)
Write a program to print the first n terms of an arithmetic sequence with given values of u (first term) and d (common difference)
Write a program to find the maximum element of a double array
Linux 64bit C data model
Integer data type (in bits)
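The widths can be checked directly on the target machine. The sketch below prints the size in bits of each C integer type; on 64-bit Linux (the LP64 model) int is 32 bits while long and pointers are 64 bits. The function names print_int_widths and is_lp64 are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Print the width in bits of the basic C integer types and of a pointer. */
void print_int_widths(void)
{
    printf("char      %zu\n", sizeof(char) * CHAR_BIT);
    printf("short     %zu\n", sizeof(short) * CHAR_BIT);
    printf("int       %zu\n", sizeof(int) * CHAR_BIT);
    printf("long      %zu\n", sizeof(long) * CHAR_BIT);
    printf("long long %zu\n", sizeof(long long) * CHAR_BIT);
    printf("pointer   %zu\n", sizeof(void *) * CHAR_BIT);
}

/* Returns 1 when this build follows LP64: 32-bit int, 64-bit long and pointers. */
int is_lp64(void)
{
    assert(CHAR_BIT == 8);   /* assumption: 8-bit bytes, true on x86-64 */
    return sizeof(int) * CHAR_BIT == 32 &&
           sizeof(long) * CHAR_BIT == 64 &&
           sizeof(void *) * CHAR_BIT == 64;
}
```

These widths decide which register name to use for a variable: %al for char, %ax for short, %eax for int, %rax for long and pointers.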
SSE: real-to-real, integer-to-real conversion
Instruction Source Destination Description
cvtss2sd M32/X X dst=double(src);//src is float
cvtsd2ss M64/X X dst=float(src);//src is double
unsigned Integer data conversion
unsigned char uc=12; unsigned short us=71;
unsigned int ui = 23; unsigned long ul=98;
Instruction Description Example
movzx r/m8, r16 us=unsigned short(uc); movzbw uc, %ax
movzx r/m8, r32 ui=unsigned int(uc); movzbl uc, %eax
movzx r/m8, r64 ul=unsigned long(uc); movzbq uc, %rax
movzx r/m16, r32 ui=unsigned int(us); movzwl us, %eax
movzx r/m16, r64 ul=unsigned long(us); movzwq us, %rax
mov r/m32, r32 ul=unsigned long(ui); mov ui, %eax #(*)
(*) Writing %eax fills the upper 32 bits of %rax with 0
=> %rax = unsigned long(ui);
With a memory source, AT&T syntax names both operand sizes in the mnemonic: movzbw, movzbl, movzbq, movzwl, movzwq
Signed Integer data conversion
char c=-12; short s=71;
int i = -23; long l=98;
Signed Integer data conversion
Instruction Description
CBW AX=SE(AL)
CWDE EAX=SE(AX)
CDQE RAX=SE(EAX)
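SE in the table stands for sign extension. The difference from zero extension can be sketched in C with fixed-width types; the function names below are hypothetical, and the instruction named in each comment is the usual AT&T-syntax counterpart.

```c
#include <assert.h>
#include <stdint.h>

/* Zero extension: the new upper bits are 0 (movzbq). */
uint64_t zero_extend8(uint8_t v)  { return (uint64_t)v; }

/* Sign extension: the new upper bits copy the sign bit (movsbq). */
int64_t sign_extend8(int8_t v)    { return (int64_t)v; }

/* Sign extension from 32 to 64 bits (CDQE, written cltq in AT&T syntax). */
int64_t sign_extend32(int32_t v)  { return (int64_t)v; }

void extend_check(void)
{
    /* The byte 0xF4 is 244 unsigned and -12 signed. */
    assert(zero_extend8(0xF4) == 0x00000000000000F4ULL);
    assert((uint64_t)sign_extend8(-12) == 0xFFFFFFFFFFFFFFF4ULL);
    assert(sign_extend32(-23) == -23);
}
```

Using zero extension on a signed value (or the reverse) keeps the bit pattern of the low part but changes the number it represents.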
Bigger to smaller integer conversion
char c=-12; short s=71;
int i = -23; long l=98;
unsigned char uc=12; unsigned short us=71;
unsigned int ui = 23; unsigned long ul=98;
Instruction Description
mov ul, %rax ui=%eax; us= %ax; uc=%al;
mov l, %rax i=%eax; s= %ax; c=%al;
mov ui, %eax us= %ax; uc=%al;
mov i, %eax s= %ax; c=%al;
mov us, %ax uc=%al;
mov s, %ax c=%al;
Integer conversions
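i: .int -6
l: .quad 0 # long l; (.quad reserves 8 bytes, .long only 4)
msg: .asciz "long value is %ld"
# l=long(i); is translated as
…
mov i , %eax
movsxd %eax, %rsi #conversion (sign extension)
mov %rsi, l
mov $msg, %rdi
mov $0, %eax #no vector-register arguments
call printf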
Unsigned Integer conversions
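ui: .int 0xFFAABBCC #unsigned int ui;
ul: .quad 0 #unsigned long ul; (.quad reserves 8 bytes)
msg: .asciz "ulong value is %lu"
#ul=unsigned long(ui); is translated as
…
mov ui , %eax
mov %eax, %esi #conversion: upper 32 bits of %rsi become 0
mov %rsi, ul
mov $msg, %rdi
mov $0, %eax #no vector-register arguments
call printf #prints 4289379276 (0xFFAABBCC)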
FUNCTION/PROCEDURE
Procedure/Function
Define:
convert:
mov $10,%ebx
xor %ecx, %ecx
…
ret
Call:
call convert
Steps to call:
1. Assign parameters to suitable registers
2. call proc/funct
3. Use the returned value
Parameters are passed via registers
C library function arguments
Real arguments: 1. xmm0, 2. xmm1, …; return value xmm0
64 32 16 8 Description
%rax %eax %ax %al return value
%rbx %ebx %bx %bl Callee saved
%rcx %ecx %cx %cl 4th argument
%rdx %edx %dx %dl 3rd argument
%rsi %esi %si %sil 2nd argument
%rdi %edi %di %dil 1st argument
%rbp %ebp %bp %bpl Callee saved
%rsp %esp %sp %spl Stack pointer
%r8 %r8d %r8w %r8b 5th argument
%r9 %r9d %r9w %r9b 6th argument
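System call
Each system call has different arguments
Assign parameters to appropriate registers
Use int $0x80
msg: .asciz "Hello World"
Example: print a string
mov $4, %eax #system call number 4 (write)
mov $1, %ebx #file descriptor 1 (stdout)
mov $msg, %ecx #address of the string
mov $11, %edx #number of bytes to write
int $0x80
Example: exit from the program
mov $1, %eax #system call number 1 (exit)
mov $0, %ebx #exit status 0
int $0x80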
Call C library function
64bit architecture: use registers to pass arguments
.section .data
format_string: .asciz "Vendor ID: %d\n"
vendor_id: .int 12
.section .text
.globl _start
_start:
#Arguments to C functions:
mov $format_string, %rdi
mov vendor_id, %esi
mov $0, %eax
call printf #printf("Vendor ID: %d\n", vendor_id);
call exit
Development tools
assembler: as, linker: ld, debugger: gdb
.section .data
output: .asciz "The Vendor ID is '%d'\n"
vendor_id: .int 12
.section .text
.globl _start
_start:
mov $output, %rdi
mov vendor_id, %esi
mov $0, %eax
call printf
call exit
Assemble, link and run the program
$ as -o print.o printf.s
$ ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 -lc -o print print.o
$ ./print
Numeric types and conversions (cont’d)
Celsius to Fahrenheit
double cel2fahr(float temp)
{
return 1.8 * temp + 32;
}
convert the above function into an assembly
procedure
Numeric types and conversions (cont’d)
Celsius to Fahrenheit
#temp is in xmm0; scale: .double 1.8
proc_cel2fahrenheit:
mov $32,%eax #eax=32
cvtsi2sd %eax,%xmm2 #xmm2=double(eax)
movsd scale,%xmm1 #xmm1=scale
cvtss2sd %xmm0,%xmm0 #xmm0=double(xmm0)
mulsd %xmm1,%xmm0 #xmm0*=xmm1
addsd %xmm2,%xmm0 #xmm0+=xmm2
ret
Numeric types and conversions (cont’d)
Celsius to Fahrenheit
double cel2fahr(int *temp){
return 1.8 * (*temp) + 32.0;
}
convert the above function into an assembly
procedure
Numeric types and conversions (cont’d)
Celsius to Fahrenheit
#rdi=&temp; scale: .double 1.8
proc_cel2fahrenheit:
mov 0(%rdi),%ecx #ecx=*rdi; ecx=temp (%rbx is callee saved, so %ecx is used)
cvtsi2sd %ecx,%xmm0 #xmm0=double(ecx)
mov $32,%eax #eax=32
cvtsi2sd %eax,%xmm2 #xmm2=double(eax)
movsd scale,%xmm1 #xmm1=scale
mulsd %xmm1,%xmm0 #xmm0*=xmm1
addsd %xmm2,%xmm0 #xmm0+=xmm2
ret
Exercises
void proc(int a1, double *a1p)
{
*a1p = a1*2.5;
}
Convert the above function into an assembly
procedure
Exercises (cont’d)
double fcvt(int i, float *fp, double *dp, long *lp)
{
float f = *fp; double d = *dp; long l = *lp;
*lp = (long) d;
*fp = (float) i;
*dp = (double) l;
return (double) f;
}
Convert the above function into an assembly
procedure
Exercises (cont’d)
double funct(double a,
float x, double b, int i)
{
return a*x - b/i;
}
Convert the above function into an
assembly procedure
Exercises
Write a procedure to print a number (in %eax)
Write a program to print the value of factorial N (N!)
Write a program to print the value of factorial N (N!) in a
recursive procedure
Write a program to print the product of two integer
numbers (a*b) by an addition procedure
Write a program to print the remainder of dividing two integer numbers (a%b) by a recursive subtraction procedure
Write a program to calculate the sum of an array
Write a program to calculate the sum of the first n
natural numbers (1+2+3+…+n)
Exercises cont’d
Write a program to print the first n Fibonacci numbers
Write a program to print the first n terms of a geometric sequence with given values of a (first term) and r (common ratio)
Write a program to print the first n terms of an arithmetic sequence with given values of u (first term) and d (common difference)
Write a program to find out the greatest common
divisor of the two numbers a and b
Write a program to find out the lowest common
multiple of the two numbers a and b
Write a program to sort an array
Exercises cont’d
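Quickly evaluate the polynomial
f(x) = a_0 + a_1 x + a_2 x^2 + … + a_n x^n
with the following (nested multiplication) method
f(x) = (…((a_n x + a_{n-1}) x + a_{n-2}) x + … + a_1) x + a_0
where a_i is element i of an array float a[n+1]. For example: float a[]={1, 2, 3, 4, 5};
then a_0 = 1, a_1 = 2, …, a_4 = 5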
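A possible C reference for this exercise is sketched below. It rests on an assumption: that the function meant here is a polynomial whose coefficient a[i] multiplies x^i, evaluated by nested multiplication (Horner's rule). The names horner and horner_check are made up for illustration.

```c
#include <assert.h>

/* Evaluate a[0] + a[1]*x + ... + a[n]*x^n with n multiplications and
   n additions. Assumption: a has n+1 elements and a[i] multiplies x^i. */
float horner(const float *a, int n, float x)
{
    float acc;

    assert(n >= 0);
    acc = a[n];                    /* start from the highest coefficient */
    for (int i = n - 1; i >= 0; i--)
        acc = acc * x + a[i];      /* one mulss and one addss per step   */
    return acc;
}

/* Worked case: 1 + 2*2 + 3*4 + 4*8 + 5*16 = 129 */
void horner_check(void)
{
    const float a[] = {1, 2, 3, 4, 5};
    assert(horner(a, 4, 2.0f) == 129.0f);
}
```

In assembly the loop body maps naturally onto mulss followed by addss with the accumulator kept in one xmm register.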
Fibonacci
ebx=1; eax=1;
for(ecx=3;ecx<=n;ecx++){
r8d=ebx+eax;
ebx=eax;
eax=r8d;
}
Fibonacci: recursive version
unsigned long fibonacci(unsigned long n){
if(n<=2) return 1;
unsigned long n1=fibonacci(n-1);
unsigned long n2=fibonacci(n-2);
return n1+n2;
}
Factorial
eax=1;
for(ebx=1;ebx<=n;ebx++)eax*=ebx;
Factorial-recursive version
unsigned long fact(unsigned long n){
if(n<=1) return 1; //n<=1 also stops the recursion for n==0
unsigned long t=fact(n-1);
t*=n;
return t;
}
Geometric sequence
#a(n)=a(n-1)*r=a*r^(n-1);
xmm0=a;
xmm1=r;
for(ecx=0;ecx<n;ecx++)xmm0*=xmm1;
Equation ax+b=0
xmm0=a; xmm2=b;xmm1=0;
if(xmm0==xmm1){ #a==0?
if(xmm2!=xmm1) #b!=0?
edx=-1; #no solution
else edx=0; #infinitely many solutions
}else{
edx=1; #one solution
xmm0=-xmm2/xmm0;
}
Maximum number of an array
xmm0=a[0];
for(ecx=1;ecx<n;ecx++)
if(xmm0<a[ecx])xmm0=a[ecx];
Sum of an array
xmm0=0;
for(ecx=0;ecx<n;ecx++)
xmm0+=a[ecx];
Equation ax^2+bx+c=0 (a!=0)
xmm5=a; xmm1=b;xmm2=c;xmm3=0;
xmm4=xmm1*xmm1-4*xmm5*xmm2; #delta=b*b-4*a*c;
if(xmm4<xmm3) edx=0; #no real solution
else if(xmm4==xmm3){
edx=1; xmm0=-xmm1/(2*xmm5); #one solution: -b/(2a)
}else {
edx=2; #two solutions
xmm0=(-xmm1-sqrt(xmm4))/(2*xmm5);
xmm1=(-xmm1+sqrt(xmm4))/(2*xmm5);
}
End of chapter
Happy coding!
Any questions?
APPENDICES
Appendix: Call C library function in 32-bit architecture
Use stack to pass arguments
.section .data
output: .asciz "The Vendor ID is '%d'\n"
buffer: .byte 12
.section .text
.globl _start
_start:
push $12
push $output
call printf
addl $8, %esp
push $0
call exit
x86 FP Architecture
Originally based on 8087 FP coprocessor
8 × 80-bit extended-precision registers
Used as a push-down stack
Registers indexed from TOS: ST(0), ST(1), …
FP values are 32-bit or 64-bit in memory
Converted on load/store of memory operand
Integer operands can also be converted on load/store
Very difficult to generate and optimize code
Result: poor FP performance
x86 FP Instructions
Data transfer      Arithmetic            Compare         Transcendental
FILD mem/ST(i)     FIADDP mem/ST(i)      FICOMP          FPATAN
FISTP mem/ST(i)    FISUBRP mem/ST(i)     FIUCOMP         F2XM1
FLDPI              FIMULP mem/ST(i)      FSTSW AX/mem    FCOS
FLD1               FIDIVRP mem/ST(i)                     FPTAN
FLDZ               FSQRT                                 FPREM
                   FABS                                  FSIN
                   FRNDINT                               FYL2X
Optional variations
I: integer operand
P: pop operand from stack
R: reverse operand order
But not all combinations allowed