David Weinstein: Advanced x86: Virtualization With VT-X
David Weinstein
Acknowledgements
<Could be you!>: Tell me something that I
didn't know that ends up in the course
material
Thanks to Xeno Kovah for pushing me to
create this material and for reviewing it
periodically as it was created.
Thanks to Corey Kallenberg for device driver
signing info for Windows 7
Introductions
Name
Department
Work interests
Projects, sponsor, etc.
Prerequisites
Intro/Intermediate x86 (or equivalent)
required
Rootkits class will probably help
Agenda
Introduction
Lightning x86_64 review
VT-x
VMM detection
Relevant hypervisor projects
Time permitting
Discussion: writing undetectable bot for SC2/Diablo 3?
Questions
Stolen from Xeno
Questions: Ask 'em if you got 'em
If you fall behind and get lost and try to tough it out until you
understand, it's more likely that you will stay lost, so ask
questions ASAP.
Scope
While advanced, still introductory
Fundamentals, challenges, techniques
Open source virtualization technologies and implementations
Primarily Intel specific discussions, 64 bit host/guests
All indications to sections in Intel manual correspond to
December 2011 edition (Order Number: 325384-041US)
which should be provided with these slides
Goals
Identify/understand/implement various
hypervisor concepts, integrate by parts
Blue Pill/Hyperjack
post-boot (hosted) hypervisor shim technique
Introduction
The goal is to get the core virtualization concepts
out of the way and clear up the semantics first.
We'll cover some 64-bit concepts
Then move into specifics for Intel VT-x;
We will be covering the architecture, instructions, and
specifics needed to write real code
With Windows focus
Series of labs to guide the way
Sqr0
Each instance of an OS is called a Virtual
Machine (VM), guest, or domU.
Hypervisor: also called a Virtual Machine Monitor (VMM)
There are fundamentally different approaches
to virtualization; it is important to understand the
differences
Terminology Bootstrap
Virtual Machine Extensions (VMX)
Virtual Machine Monitor (VMM)
VMX Root operation
VMM, host VM
Management VM
dom0
Virtualization is
Resource Abstraction yo!
The process of hiding the underlying physical
hardware in a way that makes it transparently
usable and shareable by multiple operating
systems. [IBM]
Most hardware can be virtualized to the point
that a guest doesn't know/care
Underlying physical hardware supporting VM may
not be dedicated to it
[IBM] http://www.ibm.com/developerworks/linux/library/l-hypervisor/
Abstraction
(Grey for effect; 2005 seems so long ago)
Vendor technologies
                       Intel    AMD
CPU flag               VMX      SVM
Processor emulation    VT-x     AMD-V
MMU emulation          VT-d     AMD-Vi
Network emulation      VT-c
PCI emulation          PCI-SIG I/O Virtualization
VMM Types
Type 1. bare metal hypervisors run directly on
the host hardware
guest OS runs at level above the hypervisor
.data
ext_features DB 8 DUP(0)
.code
mov eax, 80000001h
cpuid
mov [ext_features+0], edx
mov [ext_features+4], ecx
Figure: CPUID.80000001H extended feature bits (R = reserved)
ECX: bit 0 = LSF (LAHF/SAHF available in 64-bit mode); other bits shown as R
EDX: bit 29 = IA-32e (Intel 64), bit 20 = XD (Execute Disable), bit 11 = SYSCALL (SYSCALL/SYSRET); other bits shown as R
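The two dwords saved by the assembly above can be decoded in C. This is a minimal sketch, not the course's code: the type and function names are made up for illustration, and the bit positions are the ones labeled on the CPUID.80000001H slide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in the CPUID.80000001H output. */
#define EXT_ECX_LSF     (1u << 0)   /* LAHF/SAHF available in 64-bit mode */
#define EXT_EDX_SYSCALL (1u << 11)  /* SYSCALL/SYSRET */
#define EXT_EDX_XD      (1u << 20)  /* Execute Disable */
#define EXT_EDX_IA32E   (1u << 29)  /* IA-32e (Intel 64) */

/* Hypothetical container for the decoded flags. */
typedef struct {
    bool lahf_sahf;
    bool syscall;
    bool xd;
    bool ia32e;
} ext_features_t;

/* Hypothetical helper: edx is the dword stored at ext_features+0,
   ecx is the dword stored at ext_features+4. */
ext_features_t decode_ext_features(uint32_t edx, uint32_t ecx)
{
    ext_features_t f;
    f.lahf_sahf = (ecx & EXT_ECX_LSF) != 0;
    f.syscall   = (edx & EXT_EDX_SYSCALL) != 0;
    f.xd        = (edx & EXT_EDX_XD) != 0;
    f.ia32e     = (edx & EXT_EDX_IA32E) != 0;
    return f;
}
```

Keeping the decoder separate from the CPUID call makes it easy to check against values read out of a debugger.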
CMPSQ
CMPXCHG16B
LODSQ
MOVSQ
MOVZX (64-bits)
STOSQ
SWAPGS
SYSCALL
SYSRET
New GP Registers
In 64-bit mode, there are 16 general purpose
(GP) registers and the default operand size is
32 bits. However, general-purpose registers are
able to work with either 32-bit or 64-bit
operands.
R8-R15 represent eight new general-purpose
registers. All of these registers can be accessed
at the byte (B), word (2 B), dword (4 B), and
qword (8 B) level.
Registers
typedef struct _GUEST_REGS
{
    ULONG64 rax;
    ULONG64 rcx;
    ULONG64 rdx;
    ULONG64 rbx;
    ULONG64 rsp;
    ULONG64 rbp;
    ULONG64 rsi;
    ULONG64 rdi;
    ULONG64 r8;
    ULONG64 r9;
    ULONG64 r10;
    ULONG64 r11;
    ULONG64 r12;
    ULONG64 r13;
    ULONG64 r14;
    ULONG64 r15;
} GUEST_REGS, *PGUEST_REGS;
x86-64 Segmentation
Segmentation is generally (but not completely)
disabled, creating a flat 64-bit linear-address
space.
Specifically, the processor treats the segment
base of CS, DS, ES, and SS as zero in 64-bit mode
(this makes a linear address equal an effective
address). Segmented and real address modes are
not available in 64-bit mode.
Intel Vol 3 (Section 3.2.4)
Paging structures
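Figure: linear-address translation to a 4 KB page with 4-level paging (slide labels: 32-bit, 64-bit)
Bits 63:48: sign-extension
Bits 47:39: PML4 index. CR3 -> Page Map Level-4 Table -> PML4E (Table 4-14)
Bits 38:30: PDPT index. Page Directory Pointer Table -> PDPTE (Table 4-16)
Bits 29:21: PDIR index. Page Directory -> PDE (Table 4-18)
Bits 20:12: PTBL index. Page Table -> Page Frame (4 KB)
Bits 11:0: offset within the page frame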
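The index fields in the translation figure can be pulled out of a linear address with shifts and masks. A minimal sketch, assuming 4-level paging with 4 KB pages; the struct and function names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the five fields of a 48-bit linear address. */
typedef struct {
    unsigned pml4;    /* bits 47:39, index into the PML4 table */
    unsigned pdpt;    /* bits 38:30, index into the PDPT */
    unsigned pd;      /* bits 29:21, index into the page directory */
    unsigned pt;      /* bits 20:12, index into the page table */
    unsigned offset;  /* bits 11:0, byte offset inside the 4 KB frame */
} va_indices_t;

va_indices_t split_linear_address(uint64_t va)
{
    va_indices_t ix;
    ix.pml4   = (unsigned)((va >> 39) & 0x1FF);  /* each index is 9 bits */
    ix.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    ix.pd     = (unsigned)((va >> 21) & 0x1FF);
    ix.pt     = (unsigned)((va >> 12) & 0x1FF);
    ix.offset = (unsigned)(va & 0xFFF);
    return ix;
}

/* An address is canonical when bits 63:48 are all copies of bit 47. */
bool is_canonical(uint64_t va)
{
    uint64_t upper = va >> 47;  /* bits 63:47, 17 bits in total */
    return upper == 0 || upper == 0x1FFFF;
}
```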
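Figure: 64-bit paging-structure entry layout (M = MAXPHYADDR)
Bit 63: XD
Bits 62:52: available
Bits 51:M: reserved, must be 0
Bits (M-1):12: base address (the upper dword holds bits [(M-1):32])
Bits 11:9: avail
Bits 8:0: G, PAT, D, A, PCD, PWT, U/S, R/W, P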
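The entry layout translates into a handful of masks. A minimal sketch, assuming a 4 KB page-table entry and a MAXPHYADDR value between 12 and 52; the macro and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* A few flag bits of a 64-bit paging-structure entry. */
#define PTE_P   (1ull << 0)   /* present */
#define PTE_RW  (1ull << 1)   /* read/write */
#define PTE_US  (1ull << 2)   /* user/supervisor */
#define PTE_XD  (1ull << 63)  /* execute disable */

/* Physical base address of the frame: bits (M-1):12 of the entry.
   Assumes 12 <= maxphyaddr <= 52. */
uint64_t pte_frame_address(uint64_t pte, unsigned maxphyaddr)
{
    uint64_t below_m = (1ull << maxphyaddr) - 1;
    return pte & below_m & ~0xFFFull;
}

/* Bits 51:M are reserved and must be zero. */
bool pte_reserved_bits_clear(uint64_t pte, unsigned maxphyaddr)
{
    uint64_t low52   = (1ull << 52) - 1;
    uint64_t below_m = (1ull << maxphyaddr) - 1;
    return (pte & low52 & ~below_m) == 0;
}
```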
MAXPHYADDR (1)
CPUID.80000008H:EAX[7:0] reports the physical-address width supported by the processor.
Ours will probably be 36-bits (64 GB)
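To make the 36-bit / 64 GB figure concrete, here is a small sketch that decodes an EAX value returned by CPUID leaf 80000008H. The function names are hypothetical; the EAX value would come from the `__cpuid` intrinsic or an assembly helper.

```c
#include <stdint.h>

/* EAX[7:0] of CPUID.80000008H: physical-address width in bits. */
unsigned max_phys_addr_bits(uint32_t eax)
{
    return eax & 0xFFu;
}

/* EAX[15:8] of CPUID.80000008H: linear-address width in bits. */
unsigned linear_addr_bits(uint32_t eax)
{
    return (eax >> 8) & 0xFFu;
}

/* Size of the physical address space in bytes; assumes bits < 64. */
uint64_t phys_addr_space_bytes(unsigned bits)
{
    return 1ull << bits;
}
```

With a 36-bit width, `phys_addr_space_bytes(36)` is 2^36 bytes, which is the 64 GB quoted on the slide.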
RIP-relative addressing
In 64-bit mode, the RIP register is the instruction pointer.
This register holds the 64-bit offset of the next instruction to be
executed.
; Ref: http://codegurus.be/codegurus/Programming/riprelativeaddressing_en.htm#Mode64
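Because RIP already points at the next instruction when an operand is resolved, a RIP-relative operand is located at "next instruction + signed 32-bit displacement". A minimal sketch of that arithmetic; the function name and parameters are invented for illustration.

```c
#include <stdint.h>

/* insn_addr: address of the instruction that carries the displacement
   insn_len:  total length of that instruction in bytes
   disp32:    the signed 32-bit displacement encoded in the instruction */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp32)
{
    uint64_t next_rip = insn_addr + insn_len;      /* RIP after fetch */
    return next_rip + (uint64_t)(int64_t)disp32;   /* sign-extend, then add */
}
```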
REX Prefix
REX (byte) prefixes are used to generate 64-bit operand sizes or to reference registers R8-R15.
If REX.W = 1, a 64-bit operand size is used.
See Intel Vol. 2 Section 2.2.1.2 for details
REX
REX.w bitflag
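A REX prefix is a single byte of the form 0100WRXB, so it occupies the values 40h through 4Fh. The sketch below decodes one; the struct and function names are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four REX flag bits. */
typedef struct {
    bool w;  /* 1 = 64-bit operand size */
    bool r;  /* extends the ModRM.reg field */
    bool x;  /* extends the SIB.index field */
    bool b;  /* extends ModRM.rm, SIB.base, or the opcode reg field */
} rex_t;

/* Returns true and fills *out if byte is a REX prefix (0x40..0x4F). */
bool decode_rex(uint8_t byte, rex_t *out)
{
    if ((byte & 0xF0) != 0x40)
        return false;
    out->w = (byte & 0x08) != 0;
    out->r = (byte & 0x04) != 0;
    out->x = (byte & 0x02) != 0;
    out->b = (byte & 0x01) != 0;
    return true;
}
```

For example, 48h decodes to W = 1 with R, X and B clear, which is the prefix seen on most 64-bit `mov` instructions.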
CPUID.EAX = 0x80000003
Characters [16:31] in EAX, EBX, ECX, EDX
CPUID.EAX = 0x80000004
Characters [32:47] in EAX, EBX, ECX, EDX
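Each brand-string leaf returns 16 characters, packed four per register with the first character in the low byte. A minimal sketch of the reassembly, assuming the caller has already collected EAX, EBX, ECX, EDX from leaves 80000002H through 80000004H in that order; the function name is hypothetical.

```c
#include <stdint.h>

/* regs: 12 dwords, i.e. EAX,EBX,ECX,EDX of leaf 0x80000002, then the
         same four registers for 0x80000003 and 0x80000004.
   out:  receives the 48 characters plus a terminating NUL. */
void build_brand_string(const uint32_t regs[12], char out[49])
{
    for (int i = 0; i < 12; i++) {
        /* The low byte of each register holds the earliest character. */
        out[4 * i + 0] = (char)(regs[i] & 0xFF);
        out[4 * i + 1] = (char)((regs[i] >> 8) & 0xFF);
        out[4 * i + 2] = (char)((regs[i] >> 16) & 0xFF);
        out[4 * i + 3] = (char)((regs[i] >> 24) & 0xFF);
    }
    out[48] = '\0';
}
```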
No VMX!
.text
.global main
.type main, @function
main:
    pushq %rbp
    movq %rsp, %rbp
    <<set appropriate eax value>>
    cpuid
    <<look at VMX bit in appropriate register>>
    jz <<no_vmx>>
    leaq S0(%rip), %rdi
    call puts
    leave
    ret
_CpuId PROC
    push rbp
    mov rbp, rsp
    push rbx
    push rsi
    mov [rbp+18h], rdx
    mov eax, ecx
    cpuid
    mov rsi, [rbp+18h]
    mov [rsi], eax
    mov [r8], ebx
    mov [r9], ecx
    mov rsi, [rbp+30h]
    mov [rsi], edx
    pop rsi
    pop rbx
    mov rsp, rbp
    pop rbp
    ret
_CpuId ENDP
Function prototype:
VOID _CpuId (
ULONG32 leaf,
OUT PULONG32 ret_eax,
OUT PULONG32 ret_ebx,
OUT PULONG32 ret_ecx,
OUT PULONG32 ret_edx
);
Example:
Check for 64-bit using intrinsic cpuid
#include <intrin.h>  // __cpuid, _bittest (MSVC)

typedef union _CpuId {
    int i[4];
    struct {
        int eax;
        int ebx;
        int ecx;
        int edx;
    };
} CpuId_t;

// Returns 1 if CPUID.80000001H:EDX[29] (IA-32e) is set, 0 otherwise.
int CheckFor64Bit() {
    int eax;
    CpuId_t regs; // eax, ebx, ecx, edx
    char bitres;

    eax = 0x80000001;
    __cpuid(regs.i, eax);
    bitres = _bittest((long*) &regs.edx, 29);
    return bitres ? 1 : 0;
}
Note
No more inline assembly on 64-bit Windows: the 64-bit Microsoft compiler does not support it, so use intrinsics or a separate .asm file
Steps
Setup your coding environment
Implement code in assembly to determine whether your CPU supports VMX,
and for fun AESNI if you like
You can look up appropriate values in the Intel manual Vol 2a 3-212, Figure
3-6. (Learn to search the manual!)
Implement grabbing the brand string; we'll be playing with that later
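To cross-check the assembly from this lab, the feature bits can also be tested in C. This sketch assumes the caller passes in the ECX value returned by CPUID leaf 1, where VMX is bit 5 and AESNI is bit 25; the function names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_VMX   (1u << 5)   /* Virtual Machine Extensions */
#define CPUID1_ECX_AESNI (1u << 25)  /* AES instruction set */

/* ecx: value of ECX after executing CPUID with EAX = 1. */
bool has_vmx(uint32_t ecx)
{
    return (ecx & CPUID1_ECX_VMX) != 0;
}

bool has_aesni(uint32_t ecx)
{
    return (ecx & CPUID1_ECX_AESNI) != 0;
}
```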
Generating a certificate
Windows 7 requires signed drivers
We can self-sign if we boot into Test signing
mode
In Admin command prompt:
bcdedit /set testsigning on
Back to virtualization
Why is virtualization useful?
How complex is it to implement?
What inherent challenges can be expected?
What techniques have proven successful?
Efficiency / Performance
A statistically dominant fraction of machine instructions must be executed without
VMM intervention.
Who is what?
Full virtualization (aka emulation)
Bochs and QEMU
Paravirtualization
Xen, VMware
Binary Translation
VMware, VirtualPC, VirtualBox, QEMU
Hardware Virtualization
Xen, VMware, VirtualPC, VirtualBox, KVM,
CPUID instruction
Returns processor identification and feature
information
Thought: When employing virtualization, are
there certain undesirable features not to be
exposed to the guest?
Some of these features could make the guest believe
it can do things it can't
Might want to mask off some features from guest
(virtualization)
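Masking a feature amounts to clearing its bit in the CPUID result before the guest sees it. A minimal sketch of a VMM-side filter that hides VMX (CPUID leaf 1, ECX bit 5) from the guest; the struct, the function, and the choice of VMX as the hidden feature are assumptions for illustration.

```c
#include <stdint.h>

#define CPUID1_ECX_VMX (1u << 5)

/* Hypothetical container for one CPUID result. */
typedef struct {
    uint32_t eax, ebx, ecx, edx;
} cpuid_regs_t;

/* Hypothetical filter: host holds the real CPUID output for the given
   leaf, and the return value is what the VMM hands back to the guest. */
cpuid_regs_t filter_cpuid_for_guest(uint32_t leaf, cpuid_regs_t host)
{
    cpuid_regs_t guest = host;
    if (leaf == 1)
        guest.ecx &= ~CPUID1_ECX_VMX;  /* guest must not see VMX */
    return guest;
}
```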
Ring Aliasing
Ring 0 is most privileged
OS kernels assume they are running at
ring 0
Our guest VM is no different
Ring Compression
IA32 supplies two isolation mechanisms, Segmentation and
Paging
Segmentation isn't available in 64-bit
So paging is the only choice for isolating a guest
But paging doesn't distinguish between rings 0-2
See Section 5.11.2: "If the processor is currently operating at a CPL of
0, 1, or 2, it is in supervisor mode; if it is operating at a CPL of 3, it is in
user mode."
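The compression is easy to see in code: paging collapses four privilege levels into one supervisor/user bit. A minimal sketch with illustrative function names; the CPL is taken from the low two bits of the CS selector.

```c
#include <stdbool.h>
#include <stdint.h>

/* The current privilege level is the low two bits of the CS selector. */
unsigned cpl_from_cs(uint16_t cs)
{
    return cs & 3u;
}

/* Paging's view: CPL 0, 1 and 2 are all "supervisor", only CPL 3 is "user". */
bool is_supervisor(unsigned cpl)
{
    return cpl < 3;
}
```

A guest kernel deprivileged to ring 1 therefore gets the same page-level access as a ring 0 VMM, which is why paging alone cannot separate them.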
Faulting instructions
CLI (clear interrupt flag) and STI (set interrupt
flag)
A ring 3 guest that executes CLI or STI raises a CPU
exception
Different choices about how to architect your
virtualization environment
Options: turn these interrupts into virtual
interrupts, trap to VMM, binary translation.
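One way to picture the "virtual interrupts" option: the VMM keeps a per-guest copy of the interrupt flag and updates it whenever a trapped CLI or STI is emulated. This is a toy sketch under heavy assumptions (a single pending interrupt, no STI interrupt shadow, delivery modeled as a counter); every name in it is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-guest CPU state kept by the VMM. */
typedef struct {
    bool virtual_if;     /* the guest's view of the interrupt flag */
    bool irq_pending;    /* an interrupt arrived while virtual_if was clear */
    unsigned delivered;  /* interrupts injected into the guest so far */
} vcpu_t;

/* Called by the VMM after the guest's CLI trapped. */
void emulate_cli(vcpu_t *v)
{
    v->virtual_if = false;
}

/* Called by the VMM after the guest's STI trapped. */
void emulate_sti(vcpu_t *v)
{
    v->virtual_if = true;
    if (v->irq_pending) {   /* deliver what was held back */
        v->irq_pending = false;
        v->delivered++;
    }
}

/* Called when a device interrupt destined for the guest arrives. */
void raise_irq(vcpu_t *v)
{
    if (v->virtual_if)
        v->delivered++;
    else
        v->irq_pending = true;  /* hold it until the guest re-enables */
}
```

The real interrupt flag stays under VMM control the whole time; the guest only ever changes the virtual copy.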
Memory addressing
OS kernel expects full (linear) virtual address space. VMM
could be in guest address space or mostly in separate
address space.
Why mostly?
Because there are some data structures to manage transitions
from guest to VMM (these structures need to be protected).
Reminder
only protection in 64-bit mode is paging (there is no
segmentation)
Address-space compression
Refers to the challenges of protecting these portions of the virtual-address space and supporting guest accesses to them
VMware's older approach could no longer be used on x64 guests
because it required segment limits
The virtual machine monitor's trap handler must reside in the guest's address
space, because an exception cannot switch address spaces.
In theory a task gate in the IDT pointing to a TSS with appropriate CR3 could
help, but the performance overhead might have been prohibitive.
See http://www.pagetable.com/?p=25 (How retiring segmentation in AMD64
long mode broke VMware)
2. Para-virtualization
modification of guest kernel to support being virtualized
Can be pretty efficient
Binary Translation
Can "defang" sensitive instructions such as
POPF
Instruction streams are modified on the fly
(think interpreter) to trap offending
instruction sequences.
Two kinds
static and dynamic translation
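As a toy illustration of the "defang" idea, the sketch below scans a buffer and replaces CLI (FAh), STI (FBh) and POPF (9Dh) with INT3 (CCh) so that they trap. It assumes the buffer contains only single-byte instructions, which is not true of real code: a real translator has to decode instruction boundaries first, because the same byte values can appear inside operands. The function name is invented.

```c
#include <stddef.h>
#include <stdint.h>

#define OP_POPF 0x9D
#define OP_CLI  0xFA
#define OP_STI  0xFB
#define OP_INT3 0xCC

/* Toy patcher: ASSUMES every byte in code[] is a one-byte instruction.
   Returns the number of instructions replaced with INT3. */
size_t defang_single_byte_stream(uint8_t *code, size_t len)
{
    size_t patched = 0;
    for (size_t i = 0; i < len; i++) {
        switch (code[i]) {
        case OP_POPF:
        case OP_CLI:
        case OP_STI:
            code[i] = OP_INT3;  /* force a trap in place of the original */
            patched++;
            break;
        default:
            break;
        }
    }
    return patched;
}
```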
If interested read up
PAYER, M., AND GROSS, T. Requirements for
fast binary translation. In 2nd Workshop on
Architectural and Microarchitectural Support
for Binary Translation (2009).
PAYER, M., AND GROSS, T. R. Generating low-overhead dynamic binary translators. In
SYSTOR '10 (2010).
Microsoft Hyper-V
https://en.wikipedia.org/wiki/File:Hyper-V.png
Microsoft Hyper-V
A hypervisor instance has to have at least one
parent partition
The virtualization stack runs in the parent
partition and has direct access to the hardware
devices.
The parent partition then creates the child
partitions which host the guest OSs.
Xen is pretty similar
Review