x86-64 Intel Cheat Sheet Summary
These are all the normal x86-64 registers accessible from user code:
Name  Type       64-bit  32-bit  16-bit  8-bit      Notes
                 long    int     short   char
rax   scratch    rax     eax     ax      ah and al  Values are returned from functions in this register.
rcx   scratch    rcx     ecx     cx      ch and cl  Typical scratch register. Some instructions also use it as a counter.
rbx   preserved  rbx     ebx     bx      bh and bl  Preserved register: don't use it without saving it!
rsp   preserved  rsp     esp     sp      spl        The stack pointer. Points to the top of the stack.
rbp   preserved  rbp     ebp     bp      bpl        Preserved register. Sometimes used to store the old value of the stack pointer, or the "base".
rsi   scratch    rsi     esi     si      sil        Scratch register. Also used to pass function argument #2 in 64-bit Linux. String instructions treat it as a source pointer.
rdi   scratch    rdi     edi     di      dil        Scratch register. Function argument #1 in 64-bit Linux. String instructions treat it as a destination pointer.
r8    scratch    r8      r8d     r8w     r8b        Scratch register. These were added in 64-bit mode, so they have numbers, not names.
r12   preserved  r12     r12d    r12w    r12b       Preserved register. You can use it, but you need to save and restore it.
○ Some functions such as printf only get linked if they're called from
C/C++ code, so to call printf from assembly, you need to include
at least one call to printf from the C/C++ too.
○ If you use the Windows MinGW or Visual Studio C++ compiler, "long" is the same size as "int", only
32 bits / 4 bytes even in 64-bit mode. You need to use "long long" to get a 64-bit / 8-byte integer
variable on these systems. (Even on Windows, gcc or g++ running under WSL makes "long" 64 bits, just like
Linux or Mac or Java.) It's probably safest to #include <stdint.h> and refer to int64_t.
○ If you use the MASM assembler, memory accesses must include "PTR", like "DWORD PTR [rsp]".
○ See NASM assembly in 64-bit Windows in Visual Studio to make linking work.
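As a rough illustration of the "long" size note above, here is a minimal C sketch that reports integer widths in bits. The function names are made up for this example; the only width it relies on is that int64_t is exactly 64 bits wherever it exists.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers: width of each type in bits on the current compiler. */
int bits_of_long(void)      { return (int)(sizeof(long) * CHAR_BIT); }
int bits_of_long_long(void) { return (int)(sizeof(long long) * CHAR_BIT); }
int bits_of_int64(void)     { return (int)(sizeof(int64_t) * CHAR_BIT); }

/* Print the widths. "long" is the one that changes between compilers;
   int64_t always matches a full 64-bit register like rax. */
void print_integer_widths(void) {
    assert(bits_of_int64() == 64);
    printf("long: %d bits, long long: %d bits, int64_t: %d bits\n",
           bits_of_long(), bits_of_long_long(), bits_of_int64());
}
```

Calling print_integer_widths() on each compiler you target is a quick way to see which C type lines up with a 64-bit register.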
● In 32-bit mode, parameters are passed by pushing them onto the stack in reverse order, so the function's
first parameter is on top of the stack before making the call. In 32-bit mode, Windows and OS X compilers
also seem to add an underscore before the name of a user-defined function, so if you call a function foo
from C/C++, you need to define it in assembly as "_foo".
Different C++ datatypes get stored in different sized registers, and need to be accessed differently:
C/C++ Bits Bytes Register Access Access Array Allocate Static
datatype memory *ptr ptr[idx] Memory
[1] It's "long long" or "int64_t" on Windows MinGW or Visual Studio; but just "long" everywhere else.
You can convert values between different register sizes using different mov instructions:
Destination  Source size                                                Notes
             64 bit rcx   32 bit ecx      16 bit cx     8 bit cl
64 bit rax   mov rax,rcx  movsxd rax,ecx  movsx rax,cx  movsx rax,cl    Writes to whole register
32 bit eax   mov eax,ecx  mov eax,ecx     movsx eax,cx  movsx eax,cl    Top half of destination gets zeroed
16 bit ax    mov ax,cx    mov ax,cx       mov ax,cx     movsx ax,cl     Only affects low 16 bits, rest unchanged.
8 bit al     mov al,cl    mov al,cl       mov al,cl     mov al,cl       Only affects low 8 bits, rest unchanged.
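The notes column above can be modeled in C. This is only a sketch of the register behavior, not how a compiler implements it; the like_* function names are invented for this example, and old_rax stands for whatever was in rax before the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Like "movsxd rax,ecx": sign-extend a 32-bit value to all 64 bits. */
int64_t like_movsxd(int32_t ecx) { return (int64_t)ecx; }

/* Like "mov eax,ecx": a 32-bit write zeroes the top half of rax. */
uint64_t like_mov_eax(uint64_t old_rax, uint32_t ecx) {
    (void)old_rax;                 /* old contents are discarded entirely */
    return (uint64_t)ecx;
}

/* Like "mov ax,cx": only the low 16 bits change, the rest is kept. */
uint64_t like_mov_ax(uint64_t old_rax, uint16_t cx) {
    return (old_rax & ~(uint64_t)0xFFFF) | cx;
}

/* Like "mov al,cl": only the low 8 bits change, the rest is kept. */
uint64_t like_mov_al(uint64_t old_rax, uint8_t cl) {
    return (old_rax & ~(uint64_t)0xFF) | cl;
}

/* Small self-check of the four cases. */
void check_register_models(void) {
    assert(like_movsxd(-1) == -1);
    assert(like_mov_eax(0xFFFFFFFFFFFFFFFFu, 5) == 5);
    assert(like_mov_ax(0x1122334455667788u, 0xABCD) == 0x112233445566ABCDu);
    assert(like_mov_al(0x1122334455667788u, 0xEF) == 0x11223344556677EFu);
}
```

The 16-bit and 8-bit cases are the ones that tend to surprise people: stale high bits survive the write, so a later 64-bit read of rax sees a mix of old and new data.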
Signed         Unsigned       Notes
signed char    unsigned char  In C/C++, plain char may be signed (the default for gcc and Visual C++ on x86) or
                              unsigned (for example with gcc's -funsigned-char option).
movsx, movsxd  movzx          Assembly, sign extend or zero extend to change register sizes. There is no
                              "movzxd": a plain 32-bit mov like "mov eax,ecx" already zeroes the top half.
imul           mul            Assembly, imul is signed (and more modern), mul is for
                              unsigned (and ancient and horrible!). idiv/div work similarly.
Normally, your assembly code lives in the code section, which can be read but not modified. When you declare
static data, you need to put it in section .data for it to be writeable.
Name             Use        Discussion
section .data    r/w data   This data is initialized, but can be modified.
section .rodata  r/o data   This data can't be modified, which lets it be shared across copies of the
                            program. In C/C++, global "const" or "const static" data is stored in .rodata.
section .bss     r/w space  This is automatically initialized to zero, meaning the contents don't need to be
                            stored explicitly. This saves space in the executable.
section .text    r/o code   This is the program's executable machine code (it's binary data, not plain
                            text--the Microsoft assembler calls this section ".code", a better name).
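For comparison, here is a small C sketch of where a compiler usually places the same kinds of data. The variable and function names are invented for this example, and the section placement in the comments is the typical gcc-on-Linux result, which is an assumption rather than a guarantee; a tool such as nm can show what your toolchain actually did.

```c
#include <assert.h>

int call_count = 5;          /* initialized and writeable: typically .data */
const int max_calls = 10;    /* global const: typically .rodata */
int zeroed_buffer[1024];     /* starts as zeros, not stored in the file: typically .bss */

/* The machine code for this function lives in .text. */
int bump_call_count(void) {
    assert(call_count < max_calls);   /* reads .rodata */
    call_count += 1;                  /* writes .data */
    zeroed_buffer[0] = call_count;    /* writes .bss */
    return call_count;
}
```

Writing to max_calls would be rejected by the C compiler; doing the equivalent store from assembly to a .rodata label usually crashes at run time instead.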
Before you can call some existing function, you need to declare that the function is "extern":
extern puts
call puts
If you want to define a function that can be called from outside, you need to declare your function "global":
global myGreatFunction
myGreatFunction:
ret
When linking a program that calls functions directly like this, you may need gcc's "-no-pie" option, to disable the
position-independent executable support.
Instructions
For gory instruction set details, read this per-instruction reference, or the uselessly huge Intel PDF (4000 pages!).
Instruction   Purpose                                               Examples
mov dest,src  Move data between registers, load immediate data      mov rax,4 ; Load constant into rax
              into registers, move data between registers and       mov rdx,rax ; Copy rax into rdx
              memory.                                               mov [rdi],rdx ; Copy rdx into the memory that rdi is pointing to
push src      Insert a value onto the stack. Useful for passing     push rbx
              arguments, saving registers, etc.
pop dest      Remove topmost value from the stack. Equivalent       pop rbx
              to "mov dest,[rsp]; add rsp,8"
call func     Push the address of the next instruction and start    call puts
              executing func.
ret           Pop the return program counter, and jump there.       ret
              Ends a function.
mul src       Multiply rax and src as unsigned integers, and put    mul rdx ; Multiply rax by rdx
              the result in rax. High 64 bits of product (usually   ; rax=low bits, rdx overflow
              zero) go into rdx.
jmp label     Goto the instruction label:. Skips anything else in   jmp post_mem
              the way.                                              mov [0],rax ; Write to NULL!
                                                                    post_mem: ; OK here...
cmp a,b       Compare two values. Sets flags that are used by       cmp rax,10
              the conditional jumps (below).
jl label      Goto label if previous comparison came out as         jl loop_start ; Jump if rax<10
              less-than. Other conditionals available are:
              jle (<=), je (==), jge (>=), jg (>), jne (!=)
              Also available in unsigned comparisons:
              jb (<), jbe (<=), ja (>), jae (>=)
              And checking for overflow (jo) and carry (jc).
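The difference between the signed jumps (jl, jg) and the unsigned ones (jb, ja) is easiest to see on a negative number. The C sketch below, with invented function names, compares the same 64 bits both ways; it models the comparison result only, not the flags themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Like "cmp rax,rcx" followed by "jl": signed less-than. */
int signed_less(int64_t a, int64_t b) { return a < b; }

/* Like "cmp rax,rcx" followed by "jb": unsigned less-than on the same bits. */
int unsigned_below(int64_t a, int64_t b) { return (uint64_t)a < (uint64_t)b; }

/* -1 is all one bits, so it is tiny when signed and huge when unsigned. */
void check_compare_models(void) {
    assert(signed_less(-1, 10) == 1);
    assert(unsigned_below(-1, 10) == 0);
    assert(unsigned_below(3, 10) == 1);
}
```

Picking jb for a loop bound that might be negative, or jl for a pointer comparison, is a common source of loops that run zero times or nearly forever.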
Standard Idioms
Looping over array elements, including the first-time test at startup:
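The assembly listing for this idiom is not present in this copy, so the following is only a C sketch of the control flow it describes, written with goto so that each line maps onto one or two instructions. The register names in the comments assume the 64-bit Linux argument order from the register table (rdi = first argument, rsi = second), and sum_array is a name invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n array elements. Control jumps to the test first, so the loop
   body is skipped entirely when n is zero. */
int64_t sum_array(const int64_t *arr, size_t n) {
    int64_t total = 0;               /* would live in rax */
    size_t i = 0;                    /* would live in rcx */
    assert(arr != NULL || n == 0);
    goto start_test;                 /* jmp start_test: the first-time test */
loop_start:
    total += arr[i];                 /* add rax, QWORD [rdi + 8*rcx] */
    i += 1;                          /* add rcx,1 */
start_test:
    if (i < n) goto loop_start;      /* cmp rcx,rsi then jb loop_start */
    return total;
}
```

Putting the comparison at the bottom means only one jump executes per iteration, while the initial jmp handles the empty-array case.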
Properties  The stack is only 8 megs on most machines.
            Slowest memory allocation: costs at least a half-dozen function calls.
            Static data stays allocated until the program exits.
     Scalar     Scalar     Packed     Packed
     single     double     single     double     Comments
add  addss      addsd      addps      addpd      sub, mul, div all work the same way
min  minss      minsd      minps      minpd      max works the same way
cvt  cvtss2sd   cvtsd2ss   cvtps2pd   cvtpd2ps   Convert to ("2", get it?) Single
     cvtss2si   cvtsd2si   cvtps2dq   cvtpd2dq   Integer (si, stored in register like
     cvttss2si  cvttsd2si  cvttps2dq  cvttpd2dq  eax) or four DWORDs (dq, stored
                                                 in xmm register). "cvtt" versions
                                                 do truncation (round toward zero);
                                                 "cvt" versions round to nearest.
com  ucomiss    ucomisd    n/a        n/a        Sets CPU flags like a normal x86
                                                 "cmp" of unsigned values (so follow
                                                 it with jb/ja, not jl/jg), from SSE
                                                 registers.
cmp  cmpeqss    cmpeqsd    cmpeqps    cmpeqpd    Compare for equality ("lt", "le",
                                                 "neq", "nlt", "nle" versions work
                                                 the same way). Sets all bits of
                                                 float to zero if false (0.0), or all
                                                 bits to ones if true (a NaN).
                                                 Result is used as a bitmask for
                                                 the bitwise AND and OR
                                                 operations.
● 0=A&0 AND by 0's creates 0's, used to mask out bad stuff
● A=A&~0 AND by 1's has no effect
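Putting the compare mask and the AND rules together gives a branch-free "if". The C sketch below models it on one 32-bit float using integer bit operations; the function names are invented for this example, and real SSE code would do the same thing on xmm registers with the bitwise instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as an integer and back, without converting. */
uint32_t float_bits(float f) {
    uint32_t u;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&u, &f, sizeof u);
    return u;
}
float bits_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Like "cmpltss": all ones if a<b, all zeros otherwise. */
uint32_t mask_lt(float a, float b) { return (a < b) ? 0xFFFFFFFFu : 0u; }

/* Branch-free "a<b ? x : y": (mask AND x) OR (NOT mask AND y). */
float select_lt(float a, float b, float x, float y) {
    uint32_t mask = mask_lt(a, b);
    return bits_float((mask & float_bits(x)) | (~mask & float_bits(y)));
}
```

For instance, select_lt(1.0f, 2.0f, 10.0f, 20.0f) keeps x and returns 10.0f, because the all-ones mask passes x through and the inverted mask zeroes out y.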
Weird Instructions
x86 is ancient, and it has many weird old instructions. The more useful ones include:
div src                    Unsigned divide rax by src, and put the quotient into   mov rax, 100 ; numerator
                           rax, and the remainder into rdx.                        mov rdx,0 ; avoid error
                           Bizarrely, on input rdx must be zero (high bits of      mov rcx, 3 ; denominator
                           numerator), or you get a SIGFPE.                        div rcx ; compute rax/rcx
idiv src                   Signed divide rax by the register src.                  mov rax, 100 ; numerator
                           rdx = rax % src                                         cqo ; sign-extend into rdx
                           rax = rax / src                                         mov rcx, 3 ; denominator
                           Before idiv, rdx must be a sign-extended version of     idiv rcx
                           rax, usually using cqo (Convert Quadword rax to
                           Octword rdx:rax).
shr val,bits               Bitshift a value right by a constant, or the low 8 bits add rcx,4
                           of rcx ("cl").                                          shr rax,cl ; shift by rcx
                           Shift count MUST go in rcx, no other register will
                           work!
lea dest,[ptr expression]  Load Effective Address of the pointer into the          lea rcx,[rax + 4*rdx +12]
                           destination register--doesn't actually access
                           memory, but uses the memory syntax.
loop jumplabel             Decrement rcx, and if it's not zero, jump to the        mov rcx,10
                           label.                                                  start:
                                                                                   add rax,7
                                                                                   loop start
scasb                      Compare the next char from string with register al:
                           cmp BYTE PTR[rdi++], al
rep stringinstruction      Repeat the string instruction rcx times. Only works     mov al,'x'
                           with string instructions (movs, lods, stos, cmps,       mov rcx,100
                           scas, ins, outs)                                        mov rdi,bufferStart
                                                                                   rep stosb
repne stringinstruction    Repeat the string instruction until the instruction     mov al,0
                           sets the zero flag, or rcx gets decremented down to     mov rcx,-1
                           zero. Only useful with scas and cmps, since the         mov rdi,stringStart
                           other string instructions don't set the zero flag.      repne scasb
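The quotient and remainder that idiv leaves in rax and rdx match C's / and % operators, which truncate toward zero. A minimal C sketch of that pairing follows; the struct and function names are invented for this example, and the two asserts mark the inputs on which the real instruction faults.

```c
#include <assert.h>
#include <stdint.h>

/* The pair of results idiv produces. */
typedef struct {
    int64_t quotient;    /* ends up in rax */
    int64_t remainder;   /* ends up in rdx */
} divmod_result;

/* Like "cqo" followed by "idiv rcx". */
divmod_result like_idiv(int64_t numerator, int64_t denominator) {
    divmod_result r;
    assert(denominator != 0);                                /* CPU raises SIGFPE */
    assert(!(numerator == INT64_MIN && denominator == -1));  /* quotient overflows */
    r.quotient = numerator / denominator;    /* rounds toward zero */
    r.remainder = numerator % denominator;   /* takes the sign of the numerator */
    return r;
}
```

So -100 divided by 3 gives a quotient of -33 and a remainder of -1, not -34 and 2.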
Debugging Assembly
error: parser: instruction expected
error: label or instruction expected at start of line
● This means you spelled the instruction name wrong.
It compiles but won't link: "undefined reference to foo()" <- note parentheses!
● The C++ side needs to use "extern "C" long foo(void);" because you get this C++-vs-C link error if you
leave out the extern "C".
It compiles but won't link: "undefined reference to _foo" <- note underscore!
● The assembly side may need to add underscores to match the compiler's linker names. This seems
common on 32-bit machines.
Using a debugger, like gdb, is very handy both for writing new code, and analysing existing programs even if you
just have a compiled binary without source code. Here's my GDB reverse engineering cheat sheet.