
Lecture 10

The document outlines the processes of forward and reverse engineering in programming, detailing the steps involved in building and deconstructing software. It discusses the tools used in each process, the importance of registers in assembly language, and the methods for managing control flow and function calls. Additionally, it covers memory addressing, data storage, and the significance of calling conventions in function execution.

Uploaded by jowaowa101
Copyright © All Rights Reserved

Reverse Engineering

Lecture 10
The Forward Engineering Process

[Figure: Design → Code → Compile → Fix Tons of Bugs → Compile → Extensive Cursing → Fix More Bugs → Compile → Assemble]

"Forward engineering" is an overloaded term, but in this context it is the process of building a program:

1. Figure out what you want to code.
2. Code it.
3. Compile it.
4. Run it.

At every step, information is lost! For example, comments, variable names, and most type information typically do not survive compilation into machine code.


Forward Engineering Tools

Let's look at some tools:
- Visual Studio / an IDE
- gcc
- Strings

Voilà! An ELF is born.

The Reverse Engineering Process

[Figure: Disassemble → Decompile → Lots of Thinking → Understand]

Every step in the reverse-engineering process is imperfect and relies on some amount of human help.

Typically, a reverser uses several reverse-engineering tools to build up a mental model of the target software.

This art is the focus of this module: how do we reverse the design from the binary?
Assembly Refresher
#
Registers

Registers are very fast, temporary stores for data.

You get several "general purpose" registers:
- 8085: a, c, d, b, e, h, l
- 8086: ax, cx, dx, bx, sp, bp, si, di
- x86: eax, ecx, edx, ebx, esp, ebp, esi, edi
- amd64: rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15
- arm: r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, r13, r14

The address of the next instruction is in a register:
eip (x86), rip (amd64), r15 (arm)

Various extensions add other registers (x87, MMX, SSE, etc.).


#
Setting Registers
You load data into registers with... assembly! "mov" means "move".
mov rax, 0x539
mov rbx, 1337    # same value: 0x539 == 1337

Data specified directly in the instruction like this is called an immediate value.
You can also load data into partial registers:
mov ah, 0x5      # bits 8-15 of rax
mov al, 0x39     # bits 0-7 of rax; now ax == 0x0539 == 1337
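To make the partial-register rules concrete, here is a small C model of how writes to al, ah, ax, and eax affect the full 64-bit rax. The helper names (write_al, write_ah, write_ax, write_eax) are hypothetical. The model encodes standard x86-64 behaviour: writes to al, ah, and ax replace only their own bits, while a write to eax also zeroes the upper 32 bits of rax.

```c
#include <stdint.h>

/* Hypothetical helpers: each takes the old rax and returns the new rax. */

static uint64_t write_al(uint64_t rax, uint8_t v) {
    return (rax & ~(uint64_t)0xff) | v;                   /* replaces bits 0..7 only */
}

static uint64_t write_ah(uint64_t rax, uint8_t v) {
    return (rax & ~(uint64_t)0xff00) | ((uint64_t)v << 8); /* replaces bits 8..15 only */
}

static uint64_t write_ax(uint64_t rax, uint16_t v) {
    return (rax & ~(uint64_t)0xffff) | v;                 /* replaces bits 0..15 only */
}

static uint64_t write_eax(uint64_t rax, uint32_t v) {
    (void)rax;   /* the old contents are discarded entirely */
    return v;    /* a 32-bit write zero-extends: bits 32..63 become 0 */
}
```

Starting from rax = 0, calling write_ah with 0x05 and then write_al with 0x39 leaves 0x539 (1337) in rax. This matches the two partial-register movs above.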
#
Register Arithmetic
Once you have data in registers, you can compute!
For most arithmetic instructions, the first specified register stores the result.

Instruction     C / Math equivalent                Description
add rax, rbx    rax = rax + rbx                    add rbx to rax
sub ebx, ecx    ebx = ebx - ecx                    subtract ecx from ebx
imul rsi, rdi   rsi = rsi * rdi                    multiply rsi by rdi, truncating to 64 bits
inc rdx         rdx = rdx + 1                      increment rdx
dec rdx         rdx = rdx - 1                      decrement rdx
neg rax         rax = 0 - rax                      negate the numerical value of rax (two's complement)
not rax         rax = ~rax                         invert each bit of rax
and rax, rbx    rax = rax & rbx                    bitwise AND between the bits of rax and rbx
or rax, rbx     rax = rax | rbx                    bitwise OR between the bits of rax and rbx
xor rcx, rdx    rcx = rcx ^ rdx                    bitwise XOR (don't confuse ^ with exponentiation!)
shl rax, 10     rax = rax << 10                    shift rax's bits left by 10, filling with 10 zeroes on the right
shr rax, 10     rax = rax >> 10 (unsigned)         shift rax's bits right by 10, filling with 10 zeroes on the left
sar rax, 10     rax = rax >> 10 (signed)           shift rax's bits right by 10, filling the now "missing" bits with copies of the sign bit (sign extension)
ror rax, 10     rax = (rax >> 10) | (rax << 54)    rotate the bits of rax right by 10
rol rax, 10     rax = (rax << 10) | (rax >> 54)    rotate the bits of rax left by 10

Curious how these work? Play around with the rappel tool (https://github.com/yrp604/rappel)!
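The shift and rotate rows are the easiest to get wrong, so here is a C sketch of shr, sar, ror, and rol on 64-bit values. The function names are hypothetical. For simplicity, the count n is assumed to be in 1..63. Real x86 masks a 64-bit shift count to 6 bits, and this sketch does not model that.

```c
#include <stdint.h>

/* shr: logical shift right, zeroes enter from the left */
static uint64_t shr64(uint64_t x, unsigned n) { return x >> n; }

/* sar: arithmetic shift right, copies of the sign bit enter from the left.
   Written with unsigned operations only, because right-shifting a negative
   signed value is implementation-defined in C. */
static uint64_t sar64(uint64_t x, unsigned n) {
    if (x >> 63)                /* sign bit set: negative in two's complement */
        return ~(~x >> n);      /* shift the complement, then flip back */
    return x >> n;
}

/* ror/rol: bits shifted out of one end re-enter at the other (n in 1..63) */
static uint64_t ror64(uint64_t x, unsigned n) { return (x >> n) | (x << (64 - n)); }
static uint64_t rol64(uint64_t x, unsigned n) { return (x << n) | (x >> (64 - n)); }
```

shr64 and sar64 agree whenever the top bit is clear. They differ only for "negative" values. For example, sar64(0x8000000000000000, 63) yields all ones (-1), while shr64 yields 1.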
#
Memory (stack)
The stack has several uses. For now, we'll talk about temporary data storage.
Registers and immediates can be pushed onto the stack to save values:
mov rax, 0xc001ca75
push rax
push 0xb0bacafe    # WARNING: even on 64-bit x86, you can only push 32-bit immediates, and they are sign-extended to 64 bits
(Like mov, push leaves the value in the src register intact.)

[Figure: the stack holds c001ca75 after push rax, then c001ca75 and b0bacafe (on top) after push 0xb0bacafe]

Values can be popped back off of the stack (to any register!), last in, first out:
pop rbx    # sets rbx to 0xffffffffb0bacafe (the value pushed last)
pop rcx    # sets rcx to 0xc001ca75
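The sign-extension warning explains why popping the second value yields 0xffffffffb0bacafe rather than 0xb0bacafe. The sketch below computes the 64-bit value that `push imm32` places on the stack in 64-bit mode. The helper name push_imm32_value is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: the 64-bit value stored by `push imm32` in 64-bit
   mode. The 32-bit immediate is sign-extended, so bit 31 is copied into
   bits 32..63. */
static uint64_t push_imm32_value(uint32_t imm) {
    uint64_t v = imm;
    if (imm & 0x80000000u)              /* bit 31 is the sign bit of imm32 */
        v |= 0xffffffff00000000ULL;     /* replicate it into the upper half */
    return v;
}
```

To get 0x00000000b0bacafe onto the stack, you would first mov it into a register and push the register, as the slide does with rax.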
#
Addressing the Stack
The CPU knows where the stack is because its address is stored in rsp.

                    rsp = 0x7f01f3453050    stack: c001ca75
push 0xb0bacafe     rsp = 0x7f01f3453048    stack: c001ca75, b0bacafe
pop rcx             rsp = 0x7f01f3453050    stack: c001ca75

Historical oddity: the stack grows backwards, toward smaller memory addresses!
push decreases rsp (by 8 for a 64-bit value), and pop increases it by the same amount.
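The rsp bookkeeping can be sketched as a tiny simulated stack in C. push subtracts 8 from rsp and then stores at [rsp]; pop loads from [rsp] and then adds 8. Everything here is hypothetical: the sim_stack type, the function names, and the base address. The base is chosen so that the rsp values match the walkthrough above, and the sketch does no overflow or underflow checking.

```c
#include <stdint.h>
#include <string.h>

#define SIM_STACK_BYTES 64

typedef struct {
    uint64_t base;                   /* lowest address of the stack region */
    uint64_t rsp;                    /* simulated stack pointer (an address) */
    uint8_t  mem[SIM_STACK_BYTES];   /* bytes at addresses base .. base+63 */
} sim_stack;

static void sim_init(sim_stack *s, uint64_t base) {
    memset(s->mem, 0, sizeof s->mem);
    s->base = base;
    s->rsp = base + SIM_STACK_BYTES;   /* empty stack: rsp at the high end */
}

/* push: rsp -= 8, then store the 8-byte value at [rsp] */
static void sim_push(sim_stack *s, uint64_t value) {
    s->rsp -= 8;
    memcpy(&s->mem[s->rsp - s->base], &value, sizeof value);
}

/* pop: load the 8-byte value at [rsp], then rsp += 8 */
static uint64_t sim_pop(sim_stack *s) {
    uint64_t value;
    memcpy(&value, &s->mem[s->rsp - s->base], sizeof value);
    s->rsp += 8;
    return value;
}
```

With base 0x7f01f3453018, the first push leaves rsp at 0x7f01f3453050. A second push moves it down to 0x7f01f3453048, and each pop moves it back up, returning values in last-in, first-out order.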
#
Accessing Memory
You can also move data between registers and memory with ... mov!
This will load the 64-bit value stored at memory address 0x12345 into rbx:
mov rax, 0x12345
mov rbx, [rax]

This will store the 64-bit value in rbx into memory at address 0x133337:
mov rax, 0x133337
mov [rax], rbx

This is equivalent to push rcx (except that sub also updates the flags, while push does not):
sub rsp, 8
mov [rsp], rcx

Each addressed memory location contains one byte, so a 64-bit value occupies eight consecutive addresses.


#
Memory Endianness
Data on most modern systems is stored "backwards", in little endian: the least significant byte goes at the lowest address.
mov eax, 0xc001ca75    # sets rax to 0x00000000c001ca75
mov rcx, 0x10000
mov [rcx], eax         # stores the bytes 75 ca 01 c0 at addresses 0x10000-0x10003
mov bh, [rcx]          # reads 0x75

Address:  0x10000  0x10001  0x10002  0x10003
Byte:     75       ca       01       c0

https://en.wikipedia.org/wiki/Endianness
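Here is a C sketch of little-endian stores and loads over a flat byte array. Each array element stands for one addressed byte, and offset 0 stands in for address 0x10000 in the example above. The names store_le and load_le are hypothetical, and there is no bounds checking.

```c
#include <stdint.h>

/* Store the low `size` bytes (1..8) of `value` at mem[addr..],
   least significant byte first (little endian). */
static void store_le(uint8_t *mem, uint64_t addr, uint64_t value, unsigned size) {
    for (unsigned i = 0; i < size; i++)
        mem[addr + i] = (uint8_t)(value >> (8 * i));
}

/* Load `size` bytes (1..8) starting at mem[addr], reading them as little endian. */
static uint64_t load_le(const uint8_t *mem, uint64_t addr, unsigned size) {
    uint64_t value = 0;
    for (unsigned i = 0; i < size; i++)
        value |= (uint64_t)mem[addr + i] << (8 * i);
    return value;
}
```

Storing 0xc001ca75 with size 4 writes 75 ca 01 c0. A 1-byte load from the same address then returns 0x75, just like `mov bh, [rcx]`.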
Assembly Crash Course
#
Computers Make Decisions
if (authenticated) {
leetness = 1337;
}
else {
leetness = 0;
}
So far, we've just shunted data around.
But how do we make decisions?
#
What to Execute?
First, let's look at how computers execute instructions.
Recall: Assembly instructions are direct translations of binary code.
This binary code lives in memory.

[Figure: process memory layout from 0x10000 to 0x7fffffffffff: Program Binary Code, Dynamically Allocated Memory (managed by libraries), Library Code, Process Stack, OS Helper Regions]

Example: the program binary code at 0x400800 contains

0x400800:  58          pop rax
0x400801:  5b          pop rbx
0x400802:  48 01 d8    add rax, rbx
0x400805:  50          push rax

In hex, that is just the byte sequence 58 5b 48 01 d8 50.
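To show that these bytes really are the program, here is a deliberately tiny fetch-decode-execute loop in C. It understands only the three encodings used in the example: 0x50+r (push r64), 0x58+r (pop r64), and 48 01 with a register-to-register ModRM byte (add r/m64, r64). The tiny_cpu type and tiny_run function are hypothetical. The stack is a plain array rather than real memory, and flags are not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* x86 register numbers as used in opcodes and ModRM fields */
enum { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI };

typedef struct {
    uint64_t reg[8];
    uint64_t stack[16];   /* simplified stack: no addresses, no bounds checks */
    size_t   depth;       /* number of values currently on the stack */
} tiny_cpu;

/* Executes `len` bytes of code. Returns 0 on success, -1 on an unknown byte. */
static int tiny_run(tiny_cpu *cpu, const uint8_t *code, size_t len) {
    size_t ip = 0;                                /* offset of the next instruction */
    while (ip < len) {
        uint8_t op = code[ip];
        if (op >= 0x50 && op <= 0x57) {           /* push r64: 1 byte */
            cpu->stack[cpu->depth++] = cpu->reg[op - 0x50];
            ip += 1;
        } else if (op >= 0x58 && op <= 0x5f) {    /* pop r64: 1 byte */
            cpu->reg[op - 0x58] = cpu->stack[--cpu->depth];
            ip += 1;
        } else if (op == 0x48 && ip + 2 < len && code[ip + 1] == 0x01 &&
                   (code[ip + 2] & 0xc0) == 0xc0) {
            uint8_t modrm = code[ip + 2];         /* mod = 11: both operands are registers */
            unsigned src = (modrm >> 3) & 7;      /* reg field: source */
            unsigned dst = modrm & 7;             /* r/m field: destination */
            cpu->reg[dst] += cpu->reg[src];       /* add dst, src (wraps mod 2^64) */
            ip += 3;
        } else {
            return -1;
        }
    }
    return 0;
}
```

With 2 on top of the stack and 3 beneath it, running 58 5b 48 01 d8 50 pops rax = 2 and rbx = 3. It then adds them (ModRM 0xd8 selects reg = rbx, r/m = rax) and pushes 5.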
#
Control Flow: Jumps
CPUs execute instructions in sequence until told not to.
One way to interrupt the sequence is with a jmp instruction:
mov cx, 0x1337
jmp STAY_LEET
mov cx, 0
STAY_LEET:
push rcx

0x400800:  66 b9 37 13    mov cx, 0x1337
0x400804:  eb 04          jmp STAY_LEET   (skip 4 bytes)
0x400806:  66 b9 00 00    mov cx, 0
0x40080a:  51             push rcx        (STAY_LEET)

jmp skips X bytes and then resumes execution! Here, eb 04 skips the 4 bytes that follow the jmp instruction, so execution resumes at 0x40080a.

But that's still not enough for decisions...
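The "skip X bytes" rule can be written as a small C function. For a 2-byte short jump (eb XX, or a 7x XX conditional jump), the target is the address of the next instruction plus the signed 8-bit displacement. The name short_jump_target is hypothetical.

```c
#include <stdint.h>

/* Target of a 2-byte short jump located at `jump_addr` with displacement byte `rel8`.
   The displacement is signed and is relative to the END of the jump instruction. */
static uint64_t short_jump_target(uint64_t jump_addr, uint8_t rel8) {
    uint64_t next = jump_addr + 2;                                    /* opcode byte + rel8 byte */
    int64_t disp = (rel8 & 0x80) ? (int64_t)rel8 - 256 : (int64_t)rel8; /* sign-extend */
    return next + (uint64_t)disp;                   /* unsigned wraparound handles disp < 0 */
}
```

For the example, short_jump_target(0x400804, 0x04) gives 0x40080a, the address of push rcx. A displacement of 0xfe (-2) jumps back onto the jump itself, a common infinite-loop idiom.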
#
Control Flow: Conditional Jumps!
Jumps can rely on conditions!
mov cx, 0x1337
jnz STAY_LEET
mov cx, 0
STAY_LEET:
push rcx

0x400800:  66 b9 37 13    mov cx, 0x1337
0x400804:  75 04          jnz STAY_LEET
0x400806:  66 b9 00 00    mov cx, 0
0x40080a:  51             push rcx        (STAY_LEET)

je    jump if equal
jne   jump if not equal
jg    jump if greater
jl    jump if less
jle   jump if less than or equal
jge   jump if greater than or equal
ja    jump if above (unsigned)
jb    jump if below (unsigned)
jae   jump if above or equal (unsigned)
jbe   jump if below or equal (unsigned)
js    jump if signed
jns   jump if not signed
jo    jump if overflow
jno   jump if not overflow
jz    jump if zero
jnz   jump if not zero

jnz is "jump if not zero", but if what is not zero?


#
Control Flow: Conditions
Conditional jumps check conditions stored in the "flags" register: rflags.

Flags are updated by:
- Most arithmetic instructions.
- The comparison instruction cmp (sub, but discards the result).
- The comparison instruction test (and, but discards the result).

Main conditional flags:
- Carry Flag (CF): was the 65th bit 1 (an unsigned carry out of, or borrow into, a 64-bit result)?
- Zero Flag (ZF): was the result 0?
- Overflow Flag (OF): did the result "wrap" between positive and negative (signed overflow)?
- Sign Flag (SF): was the result's sign bit set (i.e., was it negative)?

je    ZF=1
jne   ZF=0
jg    ZF=0 and SF=OF
jl    SF!=OF
jle   ZF=1 or SF!=OF
jge   SF=OF
ja    CF=0 and ZF=0
jb    CF=1
jae   CF=0
jbe   CF=1 or ZF=1
js    SF=1
jns   SF=0
jo    OF=1
jno   OF=0
jz    ZF=1
jnz   ZF=0

Common patterns:
cmp rax, rbx; ja STAY_LEET     # unsigned rax > rbx: rax = 0xffffffffffffffff, rbx = 0 jumps
cmp rax, rbx; jle STAY_LEET    # signed rax <= rbx: rax = 0xffffffffffffffff = -1 < 0 also jumps
test rax, rax; jnz STAY_LEET   # rax != 0
cmp rax, rbx; je STAY_LEET     # rax == rbx

Thanks to two's complement, a single cmp (or sub) sets flags that serve both signed and unsigned comparisons; only the jumps themselves have to be signedness-aware.
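To see why one cmp serves both signednesses, here is a C sketch. It computes CF, ZF, SF, and OF the way `cmp a, b` does on 64-bit operands (a - b, result discarded) and then evaluates a few of the jump conditions from the table. The flags type and the ja/jb/jg/jle/je helper names are hypothetical, and test is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } flags;

/* Flags produced by `cmp a, b` (i.e., a - b) on 64-bit operands. */
static flags cmp64(uint64_t a, uint64_t b) {
    uint64_t r = a - b;                        /* wraps mod 2^64, like the CPU */
    flags f;
    f.cf = a < b;                              /* unsigned borrow */
    f.zf = r == 0;
    f.sf = (r >> 63) & 1;                      /* sign bit of the result */
    f.of = (((a ^ b) & (a ^ r)) >> 63) & 1;    /* signs of a and b differ, and r's sign differs from a's */
    return f;
}

static bool ja(flags f)  { return !f.cf && !f.zf; }        /* CF=0 and ZF=0 */
static bool jb(flags f)  { return f.cf; }                  /* CF=1 */
static bool jg(flags f)  { return !f.zf && f.sf == f.of; } /* ZF=0 and SF=OF */
static bool jle(flags f) { return f.zf || f.sf != f.of; }  /* ZF=1 or SF!=OF */
static bool je(flags f)  { return f.zf; }                  /* ZF=1 */
```

For a = 0xffffffffffffffff and b = 0, the same flags make both ja (unsigned: huge > 0) and jle (signed: -1 <= 0) taken, matching the common patterns above.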


#
Control Flow: Function Calls!
Assembly code is split into functions with call and ret.
call pushes rip (the address of the next instruction after the call) and jumps away!
ret pops rip and jumps to it!

Using a function that takes an authenticated value and returns leetness:

Assembly:
mov rdi, 0
call FUNC_CHECK_LEET
mov rdi, 1
call FUNC_CHECK_LEET
call FUNC_EXIT

FUNC_CHECK_LEET:
test rdi, rdi
jnz LEET
mov eax, 0
ret
LEET:
mov eax, 1337
ret

FUNC_EXIT:

C equivalent:
int check_leet(int authed)
{
    if (authed) return 1337;
    else return 0;
}

int main() {
    check_leet(0);
    check_leet(1);
    exit(0);
}
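The call/ret pairing can be modelled in a few lines of C. call pushes the address just past itself and sets rip to the target; ret pops that address back into rip. The call_model type and the do_call/do_ret names are hypothetical. The 5-byte call length in the usage note assumes the common `e8 rel32` encoding.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t rip;
    uint64_t stack[32];   /* simplified stack of return addresses; no bounds checks */
    size_t   depth;
} call_model;

/* call: push the return address (rip + length of the call), then jump. */
static void do_call(call_model *m, unsigned call_len, uint64_t target) {
    m->stack[m->depth++] = m->rip + call_len;
    m->rip = target;
}

/* ret: pop the return address into rip. */
static void do_ret(call_model *m) {
    m->rip = m->stack[--m->depth];
}
```

A 5-byte call at 0x400800 to 0x400900 pushes 0x400805. The matching ret resumes execution there, right after the call. Nested calls work because each ret pops the most recently pushed return address.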
#
Calling Conventions
Callee and caller functions must agree on how arguments are passed.
- Linux x86: push arguments (in reverse order), then call (which pushes the return address); return value in eax
- Linux amd64: arguments in rdi, rsi, rdx, rcx, r8, r9; return value in rax
- Linux arm: arguments in r0, r1, r2, r3; return value in r0

Registers are shared between functions, so calling conventions must also agree on which registers are protected.
On Linux amd64:
- rbx, rbp, r12, r13, r14, r15 are "callee-saved" (the function you call must keep their values intact, typically by saving them on the stack and restoring them before returning).
- Other registers are up for grabs (within reason; e.g., rsp must be maintained). If you need their values after a call, save them (on the stack) yourself!
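As a quick reference, here are the Linux amd64 rules above as two C lookup helpers. The helper names are hypothetical. The sketch covers integer and pointer arguments only: floating-point arguments use xmm registers, and integer arguments beyond the sixth are passed on the stack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static const char *const INT_ARG_REGS[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
static const char *const CALLEE_SAVED[] = { "rbx", "rbp", "r12", "r13", "r14", "r15" };

/* Register holding integer argument number i (0-based), or NULL when the
   argument goes on the stack instead (the 7th argument and later). */
static const char *int_arg_reg(size_t i) {
    return i < sizeof INT_ARG_REGS / sizeof INT_ARG_REGS[0] ? INT_ARG_REGS[i] : NULL;
}

/* True if a called function must preserve `reg` for its caller. */
static bool is_callee_saved(const char *reg) {
    for (size_t i = 0; i < sizeof CALLEE_SAVED / sizeof CALLEE_SAVED[0]; i++)
        if (strcmp(reg, CALLEE_SAVED[i]) == 0)
            return true;
    return false;
}
```

For check_leet above, int_arg_reg(0) is "rdi", which is why the caller sets rdi before each call. A caller that keeps a value in rbx can rely on it surviving the call, while a value in rax or rdi must be saved first.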
