Virtualization
Introduction
Virtualization: Providing an interface to
software that maps to some underlying
system.
A one-to-one mapping between a guest
and the host on which it runs
Virtualized system should be an “efficient,
isolated duplicate” of the real one.
Process virtual machine just supports a process;
system virtual machine supports an entire
system.
Why Virtualize?
Reasons for Virtualization
Hardware Economy
Versatility
Environment Specialization
Security
Safe Kernel Development
OS Research
Process Virtualization
VM interfaces with single process
Application sees “virtual machine” as
address space, registers, and
instruction set.
Examples:
Multiprogramming
Emulation for binaries
High-level language VMs (e.g., JVM)
[Figure: layered stack, top to bottom: Application, Virtualization Layer, OS, Hardware]
System Virtualization
[Figure: two layered stacks, top to bottom. Classical Virtualization: Application, OS, Virtualization Layer, Hardware. Hosted Virtualization/Emulation: Application, OS, Virtualization Layer, OS, Hardware.]
System Virtualization
Interfaces with operating system
OS sees VM as an actual machine—memory,
I/O, CPU, etc
Classic virtualization: virtualization layer
runs atop the hardware.
Usually found on servers (Xen, VMWare ESX)
Hosted or whole-system virtualization:
virtualization runs on an operating system
Popular for desktops (VMWare Workstation,
Virtual PC)
Emulation
Providing an interface to a system so that it can
run on a system with a different interface
Lets compiled binaries, OSes run on
architectures with different ISA (binary
translation)
Performance usually worse than classic
virtualization.
Example: QEMU
Breaks CPU instructions into small ops, coded in C.
C code is compiled into small objects on native ISA.
dyngen utility runs code by dynamically
stitching objects together (dynamic code
generation).
Some Important Terms
Virtual Machine (VM): An instance of an
operating system running on a virtualized
system. Also known as a virtual or guest OS.
Hypervisor: The underlying virtualization
system sitting between the guest OSes and
the hardware. Also known as a Virtual
Machine Monitor (VMM).
Requirements of a VMM
Developed by Popek & Goldberg in 1974:
1. Provides environment identical to
underlying hardware.
2. Most of the instructions coming from the guest
OS are executed by the hardware without being
modified by the VMM.
3. Resource management is handled by the
VMM (this includes all non-CPU hardware such as
memory and peripherals).
Guest OS Model
Hypervisor exists as a
layer between the
operating systems and
the hardware.
Performs memory
management and
scheduling required
to coordinate
multiple operating
systems.
May also have a
separate controlling
interface.
[Figure: three Guest OS instances, each running Apps, on top of the Hypervisor (Host), on top of the Hardware]
Virtualization Challenges
Privileged Instructions
Handling architecture-imposed instruction
privilege levels.
Performance Requirements
Holding down the cost of VMM activities.
Memory Management
Managing multiple address spaces efficiently.
I/O Virtualization
Handling I/O requests from multiple
operating systems.
CPU Virtualization
x86 architecture has
four privilege levels
(rings).
The OS assumes it will
be executing in Ring 0.
Many system calls
require 0-level
privileges to execute.
Any virtualization
strategy must find a
way to circumvent this.
Image Source: VMWare White Paper, “Understanding Full Virtualization, Paravirtualization, and
Hardware Assist”, 2007.
Binary Translation with Full Virtualization
“Hardware is
functionally identical
to underlying
architecture.”
Typically accomplished
through interpretation
or binary translation.
Advantage: Guest OS
will run without any
changes to source code.
Disadvantage:
Complex, usually
slower than
paravirtualization.
Paravirtualization
Replace certain
unvirtualized sections
of OS code with
virtualization-friendly
code.
Virtual architecture
“similar but not
identical to the
underlying
architecture.”
Advantages: easier,
lower virtualization
overhead
Disadvantages: requires
modifications to guest
OS
Performance
Modern VMMs based
around trap-and-emulate.
When a guest OS executes
a privileged instruction,
control is passed to VMM
(VMM “traps” on
instruction), which decides
how to handle instruction.
VMM generates
instructions to handle
trapped instruction
(emulation).
Non-privileged instructions
do not trap (system stays
in guest context).
[Figure: Guest OS issues CPU_INST; a privileged instruction TRAPs to the VMM, which EXECs replacement instruction CPU_INST1; a non-privileged CPU_INST runs without trapping]
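The trap-and-emulate flow can be sketched as a toy interpreter. This is only an illustration: the instruction names (LOAD_CR3, HLT, OUT, ADD, MUL) and the VMM class are hypothetical and do not model a real ISA or any real VMM.

```python
# Toy model of trap-and-emulate. All instruction names are hypothetical.
PRIVILEGED = {"LOAD_CR3", "HLT", "OUT"}


class VMM:
    def __init__(self):
        self.trap_count = 0
        self.virtual_cr3 = None  # the guest's view of a privileged register
        self.log = []

    def emulate(self, op, arg):
        """Handle a trapped privileged instruction on the guest's behalf."""
        self.trap_count += 1
        if op == "LOAD_CR3":
            # Update virtual state instead of touching the real register.
            self.virtual_cr3 = arg
        self.log.append((op, arg))


def run_guest(program, vmm):
    """Run guest instructions; privileged ones trap to the VMM."""
    acc = 0
    for op, arg in program:
        if op in PRIVILEGED:
            vmm.emulate(op, arg)  # trap: control passes to the VMM
        elif op == "ADD":
            acc += arg            # non-privileged: stays in guest context
        elif op == "MUL":
            acc *= arg
        else:
            raise ValueError(f"unknown instruction: {op}")
    return acc


program = [("ADD", 2), ("MUL", 5), ("LOAD_CR3", 0x1000), ("ADD", 1), ("HLT", None)]
vmm = VMM()
result = run_guest(program, vmm)
```

Only two of the five instructions reach the VMM; the arithmetic runs without any trap, which is where the efficiency of the approach comes from.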
Trap-and-Emulate Problems
Trap-and-emulate is expensive
Requires context-switch from guest OS mode to
VMM.
x86 is not trap-friendly
Guest’s CPL privilege level is visible in hardware
registers; cannot change it in a way that the guest
OS cannot detect.
Some instructions are not privileged, but
access privileged systems (page tables, for
example).
Hardware-Assisted Virtualization
Hardware virtualization-assist released in 2006.
Intel, AMD both have technologies of this type.
Introduces new VMX runtime mode.
Two modes: guest (for OS) and root (for VMM).
Each mode has all four CPL privilege levels available.
Switching from guest to VMM does not require changes
in privilege level.
Root mode supports special VMX instructions.
Virtual machine control block contains control flags and
state information for active guest OS.
New CPU instructions for entering and exiting VMM
mode.
Does not support I/O virtualization.
VMWare Virtualization
Full virtualization implemented through
dynamic binary translation.
Translated code is grouped and stored in
translation caches (TCs).
Callout method replaces traps with stored
emulation functions.
In-TC emulation blocks are even more efficient.
Adaptive binary translation rewrites translated
blocks to minimize PTE traps.
Direct execution of user-space code further
reduces overhead.
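A minimal sketch of the translation-cache idea, assuming a toy instruction format. The BinaryTranslator class, the priv_ prefix and the call_emulate(...) callout string are all hypothetical; this is not VMWare's implementation.

```python
class BinaryTranslator:
    """Toy dynamic binary translator with a translation cache (TC)."""

    def __init__(self):
        self.tc = {}           # block address -> translated block
        self.translations = 0  # how many blocks were actually translated

    def translate(self, block):
        """Rewrite privileged instructions into callouts; copy the rest."""
        self.translations += 1
        out = []
        for insn in block:
            if insn.startswith("priv_"):
                # Callout: replace the would-be trap with an emulation call.
                out.append("call_emulate(" + insn + ")")
            else:
                out.append(insn)
        return tuple(out)

    def fetch(self, addr, guest_code):
        """Return the translated block, translating only on a TC miss."""
        if addr not in self.tc:
            self.tc[addr] = self.translate(guest_code[addr])
        return self.tc[addr]


guest_code = {0x100: ["add", "priv_cli", "mov"], 0x200: ["sub"]}
bt = BinaryTranslator()
first = bt.fetch(0x100, guest_code)
second = bt.fetch(0x100, guest_code)  # TC hit: no retranslation
bt.fetch(0x200, guest_code)
```

The second fetch of block 0x100 is served from the cache, so hot code pays the translation cost only once.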
Xen Virtualization
Xen occupies privilege level 0; guest OS
occupies privilege level 1.
OS code is modified so that high-privilege
calls (hypercalls) are made to and trapped by
Xen.
Xen traps guest OS instructions using table
of exception handlers.
Frequently used handlers (e.g., system calls)
have special handlers that allow guest OS to
bypass privilege level 0.
Approach does not work with page faults.
Handlers are vetted by Xen before being
stored.
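A rough sketch of a vetted handler table. The class and method names are hypothetical, and the specific check (rejecting handlers that ask to run at privilege level 0) is an assumption about what vetting involves, not Xen's actual code.

```python
class Hypervisor:
    """Toy hypervisor that vets guest exception handlers before storing them."""

    def __init__(self):
        self.handlers = {}  # exception vector -> (handler, privilege level)

    def register_handler(self, vector, handler, ring):
        """Vet a guest-supplied handler, then store it in the table."""
        if ring == 0:
            # Assumed vetting rule: guest code must never run at level 0.
            raise PermissionError("guest handlers may not run at privilege level 0")
        self.handlers[vector] = (handler, ring)

    def raise_exception(self, vector, *args):
        """Dispatch an exception to the stored guest handler."""
        handler, _ring = self.handlers[vector]
        return handler(*args)


hv = Hypervisor()
hv.register_handler(0x80, lambda n: n + 1, ring=1)  # e.g., a system call vector
answer = hv.raise_exception(0x80, 41)

try:
    hv.register_handler(14, lambda: None, ring=0)
    rejected = False
except PermissionError:
    rejected = True
```

Because the check happens once at registration time, dispatch itself needs no further validation.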
Applications of VT-X
Xen uses Intel VT-x to host fully-virtualized
guests alongside paravirtualized guests.
System has root (VMM) and non-root (guest)
modes, each with privilege levels 0-3.
QEMU/Bochs projects provide emulations.
VMWare does not make use of VT technology.
VMWare’s software-based VMMs
significantly outperformed VT-X-based
VMMs.
VT-X virtualization is trap-based, and DBT tries
to eliminate traps wherever possible.
Memory Virtualization
Virtualization software must find a way to handle paging
requests of operating systems, keeping each set of pages
separate.
Memory virtualization must not impose too much overhead,
or performance and scalability will be impaired.
Each guest OS must have its own address space and be convinced
that it has access to the entire address space.
SOLUTION: most modern VMMs add an additional layer
of abstraction in address space.
Machine Address: bare hardware address.
Physical Address: VMM abstraction of machine address, used
by guest OSes.
Guest maintains virtual-to-physical page tables.
VMM maintains pmap structure containing physical-to-machine
page mappings.
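The two mappings can be sketched as two dictionary lookups. The 4 KiB page size, the function name and the table contents below are illustrative assumptions.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def translate(vaddr, guest_page_table, pmap):
    """Translate a guest virtual address to a machine address in two steps."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn = guest_page_table[vpn]  # guest: virtual page -> physical page
    mpn = pmap[ppn]              # VMM: physical page -> machine page
    return mpn * PAGE_SIZE + offset


guest_page_table = {0: 7, 1: 3}  # maintained by the guest OS
pmap = {7: 42, 3: 19}            # maintained by the VMM
maddr = translate(0x1004, guest_page_table, pmap)
```

Here virtual page 1 maps to physical page 3, which the VMM maps to machine page 19, so address 0x1004 lands at offset 4 in machine page 19.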
Memory Problem
[Figure: two-step address translation. The page table for program m on VM n maps virtual address a to physical address b; the pmap structure in the VMM maps physical address b to machine frame c.]
That’s a lot of
lookups!
Shadow Page Tables
Shadow page tables map virtual memory to
machine memory.
One page table maintained per guest OS.
TLB (Translation Lookaside Buffer) caches results
from shadow page tables.
Shadow page tables must be kept consistent
with guest pages.
VMM updates shadow page tables when
pmap (physical-to-machine) records are
updated.
VMM now has access to virtual
addresses, eliminating two page table
lookups.
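A minimal sketch of a shadow page table as the composition of the guest page table and the pmap, kept consistent on updates. The ShadowPager class and its method names are hypothetical, and the TLB is not modeled.

```python
class ShadowPager:
    """Toy shadow page table: a direct virtual -> machine page mapping."""

    def __init__(self, guest_pt, pmap):
        self.guest_pt = dict(guest_pt)  # virtual page -> physical page
        self.pmap = dict(pmap)          # physical page -> machine page
        # Build the shadow table by composing the two mappings.
        self.shadow = {
            vpn: self.pmap[ppn]
            for vpn, ppn in self.guest_pt.items()
            if ppn in self.pmap
        }

    def guest_write(self, vpn, ppn):
        """Guest changes a page table entry; the shadow entry must follow."""
        self.guest_pt[vpn] = ppn
        if ppn in self.pmap:
            self.shadow[vpn] = self.pmap[ppn]
        else:
            self.shadow.pop(vpn, None)

    def pmap_update(self, ppn, mpn):
        """VMM remaps a physical page; fix every shadow entry that used it."""
        self.pmap[ppn] = mpn
        for vpn, p in self.guest_pt.items():
            if p == ppn:
                self.shadow[vpn] = mpn

    def lookup(self, vpn):
        """Single lookup: virtual page straight to machine page."""
        return self.shadow[vpn]


sp = ShadowPager({0: 7, 1: 3}, {7: 42, 3: 19})
before = sp.lookup(1)
sp.guest_write(1, 7)
after_guest_write = sp.lookup(1)
sp.pmap_update(7, 99)
after_remap = (sp.lookup(0), sp.lookup(1))
```

Lookups become a single step, but every guest write and every pmap change now costs extra bookkeeping in the VMM.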
Shadow Page Tables
[Figure: Guest side: the page table for program m on VM n maps virtual a to physical b. VMM side: the pmap structure maps physical b to machine c, and the shadow page table maps virtual a directly to machine c.]
Shadow Page Table Drawbacks
Updates are expensive
On a write, the VMM must update the VM and
the shadow page table.
TLB must be flushed on world switch.
TLB from other guest will be full of machine
addresses that would be invalid in the new context.
Direct Access
Direct access to hardware is not permitted by the
Popek and Goldberg model.
VMWare and Xen both bend this rule, allow guests
to access hardware directly in certain cases.
Xen uses validated access model.
Fine-grained control over direct access.
VMWare allows user-mode instructions to bypass BT,
go straight to CPU.
Memory accesses are sometimes batched to
minimize context switches.
Memory Overcommitment
Overcommitment: committing more total
memory to guest OSes than actually exists on
the system.
Guest memory can be adjusted according to workload.
Higher-workload servers get better performance than
with a simple even allocation.
Requires some mechanism to reclaim memory
from other guests.
Poor page replacement schemes can result in
double paging.
VMM marks page for reclamation, OS immediately
moves reclaimed page out of memory.
Most common in high memory-usage situations.
I/O Virtualization
Performance is critical
for virtualized I/O
Many I/O devices
are time-sensitive or
require low latency.
Most common
method: device
emulation
VMM presents guest
OS with a virtual
device
Preserves security,
handles concurrency, but
imposes more overhead.
[Figure: I/O path, top to bottom: Guest OS with Guest Driver, Virtual Device, VMM with Virtual Driver, Physical Device]
I/O Virtualization
Problems
Multiplexing
How to share hardware access among multiple
OSes.
Switching Expense
Low-level I/O functionality happens at the VMM
level, requiring a context switch.
I/O Rings, continued
Xen
Rings contain memory descriptors pointing to
I/O buffer regions declared in guest
address space.
Guest and VMM deposit and remove messages
using a producer-consumer model.
Xen 3.0 places device drivers on their own
virtual domains, minimizing the effect of
driver crashes.
VMWare
Ring buffer is constructed in and
managed by VMM.
If VMM detects a great deal of entries and exits,
it starts queuing I/O requests in ring buffer.
Next interrupt triggers transmission of
accumulated messages.
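The producer-consumer ring can be sketched as a fixed-size circular buffer with separate producer and consumer counters. The IORing class and the request strings are hypothetical; real rings carry memory descriptors and use shared memory with notifications, which is not modeled here.

```python
class IORing:
    """Toy fixed-size I/O ring using a producer-consumer model."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.prod = 0  # total descriptors deposited so far
        self.cons = 0  # total descriptors removed so far

    def put(self, desc):
        """Producer deposits a descriptor; returns False if the ring is full."""
        if self.prod - self.cons == self.size:
            return False
        self.slots[self.prod % self.size] = desc
        self.prod += 1
        return True

    def get(self):
        """Consumer removes the oldest descriptor; returns None if empty."""
        if self.cons == self.prod:
            return None
        desc = self.slots[self.cons % self.size]
        self.cons += 1
        return desc


ring = IORing(2)
accepted = [ring.put("req-A"), ring.put("req-B"), ring.put("req-C")]
oldest = ring.get()
```

With two slots, the third request is refused until the consumer removes one; requests come out in the order they were deposited.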
Summary
Current VMM implementations provide safe,
relatively efficient virtualization, albeit often at
the expense of theoretical soundness.
The x86 architecture requires a) binary translation,
b) paravirtualization, or c) hardware support to
virtualize.
Binary translation and instruction trapping costs
are currently the largest drains on efficiency.
Management of memory and other resources
remains a complex and expensive task in modern
virtualization implementations.
References
1. Singh, A. “An Introduction To Virtualization”, www.kernelthread.com, 2004.
2. VMWare White Paper, “Understanding Full Virtualization, Paravirtualization,
and Hardware Assist”, 2007.
3. Barham, P. et al. “Xen and the Art of Virtualization”, SOSP 2003.
4. Waldspurger, C. “Memory Resource Management in VMware ESX Server”, OSDI
2002.
5. Adams, K. and Agesen, O. “A Comparison of Software and Hardware Techniques
for x86 Virtualization”, ASPLOS 2006.
6. Pratt, I. et al. “Xen 3.0 and the Art of Virtualization”, Linux Symposium 2005.
7. Sugerman, J. et al. “Virtualizing I/O Devices on VMware Workstation’s Hosted
Virtual Machine Monitor”, Usenix, 2001.
8. Popek, G. and Goldberg, R. “Formal Requirements for Virtualizable
Third-Generation Architectures”, Communications of the ACM, 1974.
9. Mahalingam, M. “I/O Architectures for Virtualization”, VMWorld, 2006.
10. Smith, J. and Nair, R. Virtual Machines, Morgan Kaufmann, 2005.
11. Bellard, F. “QEMU, a Fast and Portable Translator”, USENIX 2005.
12. Silberschatz, A., Galvin, P., Gagne, G. Operating System Concepts, Eighth Edition.
Wiley & Sons, 2009.