Ian Pratt
XenSource Inc. and University of Cambridge
Keir Fraser, Steve Hand, Christian Limpach and many others
Virtualization Overview
Xen Architecture
New Features in Xen 3.0
VM Relocation
Xen Roadmap
Questions
Virtualization Overview
Single OS image: OpenVZ, Vservers, Zones
Group user processes into resource containers
Hard to get strong isolation
Full virtualization: VMware, VirtualPC, QEMU
Run multiple unmodified guest OSes
Hard to efficiently virtualize x86
Para-virtualization: Xen
Run multiple guest OSes ported to a special arch
The Xen/x86 arch is very close to normal x86
Virtualization in the Enterprise
Avoid downtime with VM Relocation
Enforce security policy
Xen 2.0 (5 Nov 2004)
Secure isolation between VMs
Resource control and QoS
Only guest kernel needs to be ported
User-level apps and libraries run unmodified
Linux 2.4/2.6, NetBSD, FreeBSD, Plan9, Solaris
Execution performance close to native
Broad x86 hardware support
Live Relocation of VMs between Xen nodes
Para-Virtualization in Xen
Xen extensions to x86 arch
Like x86, but Xen invoked for privileged ops
Avoids binary rewriting
Minimize number of privilege transitions into Xen
Modifications relatively simple and self-contained
Modify kernel to understand virtualised env.
Wall-clock time vs. virtual processor time (sketch below)
• Desire both types of alarm timer
Expose real resource availability
• Enables OS to optimise its own behaviour
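As an illustration of the wall-clock vs. virtual-time bullet above, here is a minimal sketch of how a paravirtualized guest might read Xen's wall-clock fields from the shared info page. The struct is a simplified stand-in for the public shared-info layout (field names follow Xen's interface, but the real structure has more fields), and xen_wallclock is an assumed pointer to the guest's mapping of that page.

    #include <stdint.h>

    /* Simplified stand-in for the wall-clock fields Xen publishes in the
     * shared info page (names follow Xen's public interface; the real
     * structure has many more fields and a fixed layout). */
    struct xen_wallclock {
        volatile uint32_t wc_version;  /* odd while Xen is updating the fields */
        volatile uint32_t wc_sec;      /* wall-clock seconds                   */
        volatile uint32_t wc_nsec;     /* wall-clock nanoseconds               */
    };

    /* Assumed to point at the guest's mapping of the shared info page. */
    extern struct xen_wallclock *xen_wallclock;

    /* Read a consistent wall-clock snapshot: retry if the version counter
     * shows Xen updated the fields while we were reading them. */
    static uint64_t read_wallclock_ns(void)
    {
        uint32_t ver;
        uint64_t sec, nsec;

        do {
            ver  = xen_wallclock->wc_version;
            __sync_synchronize();              /* keep the reads ordered */
            sec  = xen_wallclock->wc_sec;
            nsec = xen_wallclock->wc_nsec;
            __sync_synchronize();
        } while ((ver & 1) || ver != xen_wallclock->wc_version);

        return sec * 1000000000ULL + nsec;
    }

Per-VCPU virtual time is published through a similar versioned record, which is what lets the guest kernel maintain both wall-clock and virtual alarm timers.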
Xen 3.0 Architecture
[Diagram: Xen 3.0 architecture. VM0 runs the device manager and control software; VM1-VM3 run unmodified user software]
[Chart: relative performance (native Linux = 1.0) for SPEC INT2000 (score), Linux build time (s), OSDB-OLTP (tup/s), and SPEC WEB99 (score); benchmark suite running on Linux (L), Xen (X), VMware Workstation (V), and UML (U)]
Scalability
[Chart: aggregate scores for Linux (L) and Xen (X) at 2, 4, 8, and 16 concurrent instances; y-axis 0 to 1000]
[Diagram: x86_32 privilege rings: Xen in ring 0, guest kernel in ring 1, user space in ring 3]
x86_64
Run user-space and kernel in ring 3 using different pagetables
Two PGDs (PML4s): one with user entries; one with user plus kernel entries
System calls require an additional syscall/sysret transition via Xen
Per-CPU trampoline to avoid needing GS in Xen
[Diagram: user (r3, U) and kernel (r3, U) above Xen (r0, S), with system calls entering via syscall/sysret through Xen]
Para-Virtualizing the MMU
Guest OSes allocate and manage own PTs
Hypercall to change PT base (sketch below)
Xen must validate PT updates before use
Allows incremental updates, avoids revalidation
Validation rules applied to each PTE:
1. Guest may only map pages it owns*
2. Pagetable pages may only be mapped RO
Xen traps PTE updates and emulates, or
‘unhooks’ PTE page for bulk updates
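A rough guest-side sketch of the pagetable hypercalls referenced above. The structures are simplified from Xen's public MMU interface, the hypercall wrappers are assumed to come from the guest's Xen glue code, and the addresses are illustrative only.

    #include <stdint.h>

    /* Request format for the batched pagetable-update hypercall
     * (simplified from Xen's public struct mmu_update). */
    struct mmu_update {
        uint64_t ptr;   /* machine address of the PTE (low bits encode the command) */
        uint64_t val;   /* new PTE contents */
    };

    /* Extended MMU operation, e.g. switching the pagetable base
     * (simplified from Xen's public struct mmuext_op). */
    #define MMUEXT_NEW_BASEPTR 5    /* value as in Xen's public interface */
    struct mmuext_op {
        unsigned int cmd;
        uint64_t     arg1;          /* machine frame number of the new top-level PT */
    };

    #define DOMID_SELF 0x7FF0       /* "this domain", from Xen's public headers */

    /* Hypercall wrappers assumed to be provided by the guest's Xen glue code. */
    extern int HYPERVISOR_mmu_update(struct mmu_update *req, unsigned int count,
                                     unsigned int *done, uint16_t foreigndom);
    extern int HYPERVISOR_mmuext_op(struct mmuext_op *op, unsigned int count,
                                    unsigned int *done, uint16_t foreigndom);

    /* Ask Xen to install one PTE (validated by Xen before it takes effect),
     * then switch to a new pagetable base. */
    static int example_pt_update(uint64_t pte_machine_addr, uint64_t new_pte,
                                 uint64_t new_base_mfn)
    {
        struct mmu_update u  = { .ptr = pte_machine_addr, .val = new_pte };
        struct mmuext_op  op = { .cmd = MMUEXT_NEW_BASEPTR, .arg1 = new_base_mfn };
        unsigned int done;

        if (HYPERVISOR_mmu_update(&u, 1, &done, DOMID_SELF) < 0)
            return -1;              /* Xen refused: the PTE failed validation */
        return HYPERVISOR_mmuext_op(&op, 1, &done, DOMID_SELF);
    }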
Writeable Page Tables : 1 – Write fault
[Diagram: guest reads go straight through the virtual → machine pagetable; the first guest write to a pagetable page faults into the Xen VMM above the hardware MMU]
Writeable Page Tables : 2 – Emulate?
[Diagram: on the write fault Xen asks "emulate?"; if yes, the single update is emulated on the guest's behalf]
Writeable Page Tables : 3 - Unhook
[Diagram: otherwise the pagetable page is unhooked (marked X) from the virtual → machine mapping, so guest reads and writes to it no longer trap to Xen]
Writeable Page Tables : 4 - First Use
[Diagram: the first use of the still-unhooked mapping causes a page fault into the Xen VMM]
Writeable Page Tables : 5 – Re-hook
[Diagram: Xen validates the guest's batched updates and re-hooks the page into the virtual → machine mapping; normal reads and writes resume]
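Putting the five diagrams above together, here is a conceptual sketch, not Xen's actual code, of the decision the hypervisor makes on such a fault: emulate a single PTE write, unhook the page for batched writes, or validate and re-hook on first use. Every type and helper is hypothetical.

    /* Conceptual sketch only: every type and helper below is hypothetical
     * and stands in for Xen-internal machinery shown in the diagrams. */

    struct pt_page { int unhooked; /* plus frame number, type counts, ... */ };

    struct pt_page *unhooked_pt_covering(unsigned long fault_addr);
    struct pt_page *pagetable_page_at(unsigned long fault_addr);
    int  expect_more_updates(struct pt_page *pg);
    int  pte_is_safe(unsigned long val);
    void emulate_pte_write(unsigned long addr, unsigned long val);
    void unhook_from_current_pagetable(struct pt_page *pg);
    void validate_all_entries(struct pt_page *pg);
    void rehook_into_current_pagetable(struct pt_page *pg);
    void forward_fault_to_guest(void);

    void on_page_fault(unsigned long fault_addr, int is_write, unsigned long new_val)
    {
        struct pt_page *pg;

        /* Steps 4-5: the fault came through a region whose pagetable page was
         * unhooked earlier; validate the guest's batched writes and re-hook. */
        pg = unhooked_pt_covering(fault_addr);
        if (pg) {
            validate_all_entries(pg);
            rehook_into_current_pagetable(pg);
            pg->unhooked = 0;
            return;
        }

        /* Steps 1-3: a write hit a read-only pagetable page. */
        pg = pagetable_page_at(fault_addr);
        if (is_write && pg) {
            if (!expect_more_updates(pg)) {
                if (pte_is_safe(new_val))
                    emulate_pte_write(fault_addr, new_val);   /* step 2: emulate */
            } else {
                unhook_from_current_pagetable(pg);            /* step 3: unhook  */
                pg->unhooked = 1;
            }
            return;
        }

        forward_fault_to_guest();   /* an ordinary guest fault, not a PT write */
    }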
MMU Micro-Benchmarks
[Chart: lmbench results, normalised to native Linux, on Linux (L), Xen (X), VMware Workstation (V), and UML (U)]
SMP Guest Kernels
Xen extended to support multiple VCPUs
Virtual IPIs sent via Xen event channels (sketch below)
Currently up to 32 VCPUs supported
Simple hotplug/unplug of VCPUs
From within VM or via control tools
Optimize one active VCPU case by binary
patching spinlocks
NB: Many applications exhibit poor SMP
scalability – often better off running
multiple instances each in their own OS
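As a rough illustration of the virtual-IPI bullet above, this is how a paravirtualized kernel might signal another VCPU over a pre-bound event channel. EVTCHNOP_send and evtchn_send follow Xen's public event-channel interface; the hypercall wrapper and the earlier port binding are assumed to exist elsewhere in the guest.

    #include <stdint.h>

    /* Follows Xen's public event-channel interface (simplified). */
    #define EVTCHNOP_send 4                 /* value as in Xen's public headers */
    typedef uint32_t evtchn_port_t;
    struct evtchn_send {
        evtchn_port_t port;                 /* local port to notify */
    };

    /* Hypercall wrapper assumed to be provided by the guest's Xen glue code. */
    extern int HYPERVISOR_event_channel_op(int cmd, void *arg);

    /* Per-VCPU IPI ports, assumed to have been bound (EVTCHNOP_bind_ipi)
     * during VCPU bring-up. */
    extern evtchn_port_t ipi_port_of_vcpu[];

    /* A "virtual IPI": rather than programming a (virtual) APIC, the guest
     * asks Xen to notify the target VCPU's event channel. */
    static void send_virtual_ipi(unsigned int target_vcpu)
    {
        struct evtchn_send send = { .port = ipi_port_of_vcpu[target_vcpu] };
        HYPERVISOR_event_channel_op(EVTCHNOP_send, &send);
    }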
SMP Guest Kernels
Takes great care to get good SMP performance
while remaining secure
Requires extra TLB synchronization IPIs
SMP scheduling is a tricky problem
Wish to run all VCPUs at the same time
But, strict gang scheduling is not work conserving
Opportunity for a hybrid approach
Paravirtualized approach enables several
important benefits
Avoids many virtual IPIs
Allows ‘bad preemption’ avoidance
Auto hot plug/unplug of CPUs
VT-x / Pacifica : hvm
Enable Guest OSes to be run without modification
E.g. legacy Linux, Windows XP/2003
CPU provides vmexits for certain privileged instrs
Shadow page tables used to virtualize MMU
Xen provides simple platform emulation
BIOS, APIC, IOAPIC, RTC, net (pcnet32), IDE emulation
Install paravirtualized drivers after booting for
high-performance IO
Possibility for CPU and memory paravirtualization
Non-invasive hypervisor hints from OS
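For context on the hardware feature this slide depends on, a small user-space sketch of detecting VT-x or Pacifica (SVM) support via CPUID; the bit positions are the architectural ones, and the inline asm assumes GCC on x86.

    #include <stdio.h>
    #include <stdint.h>

    /* Execute CPUID (GCC inline asm, x86/x86_64 only). */
    static void cpuid(uint32_t leaf, uint32_t *a, uint32_t *b,
                      uint32_t *c, uint32_t *d)
    {
        __asm__ volatile("cpuid"
                         : "=a"(*a), "=b"(*b), "=c"(*c), "=d"(*d)
                         : "a"(leaf), "c"(0));
    }

    int main(void)
    {
        uint32_t a, b, c, d;

        cpuid(1, &a, &b, &c, &d);
        int vmx = (c >> 5) & 1;            /* CPUID.1:ECX[5]         = Intel VT-x */

        cpuid(0x80000001u, &a, &b, &c, &d);
        int svm = (c >> 2) & 1;            /* CPUID.80000001h:ECX[2] = AMD SVM    */

        printf("VT-x: %s, SVM (Pacifica): %s\n",
               vmx ? "yes" : "no", svm ? "yes" : "no");
        return 0;
    }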
[Diagram: HVM architecture. Domain 0 (Linux xen64) hosts the control panel and native device drivers; Domain N runs frontend virtual drivers; 32-bit and 64-bit VMX guest VMs each see a virtual platform with IO emulation. Guests trap to the VMM via VMExit; paravirtualized domains use hypercalls, callbacks, and event channels. The VMM runs over the hardware and MMU]
Xen Tools
[Diagram: management stack. xmlib and xenstore in dom0 sit above libxc, which issues dom0_op hypercalls into Xen to manage dom1]
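To put the libxc layer above in concrete terms, a minimal sketch of querying domains through it. The calls shown follow the Xen 3.0-era libxc API; exact names and signatures vary between releases, so treat this as an assumption to check against the local xenctrl.h.

    /* Minimal libxc example (Xen 3.0-era API; names and signatures changed in
     * later releases, so check the local xenctrl.h).  Build roughly with:
     *     gcc listdoms.c -lxenctrl
     * Must run in dom0 with sufficient privilege. */
    #include <stdio.h>
    #include <xenctrl.h>

    int main(void)
    {
        int xc = xc_interface_open();
        if (xc < 0) {
            perror("xc_interface_open");
            return 1;
        }

        xc_dominfo_t info[16];
        int n = xc_domain_getinfo(xc, 0, 16, info);   /* first 16 domains */

        for (int i = 0; i < n; i++)
            printf("dom%u: %lu pages, %s\n",
                   info[i].domid, (unsigned long)info[i].nr_pages,
                   info[i].running ? "running" : "not running");

        xc_interface_close(xc);
        return 0;
    }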
VM Relocation : Motivation
VM relocation enables:
High-availability
• Machine maintenance
Load balancing
• Statistical multiplexing gain
Assumptions
Networked storage
NAS: NFS, CIFS
SAN: Fibre Channel
iSCSI, network block dev
DRBD network RAID
Good connectivity
common L2 network
L3 re-routeing
[Diagram: Xen hosts attached to networked storage]
Challenges
VMs have lots of state in memory
Some VMs have soft real-time
requirements
E.g. web servers, databases, game servers
May be members of a cluster quorum
Minimize down-time
Performing relocation requires resources
Bound and control resources used
Relocation Strategy
[Figure: Quake 3 server relocation, 52s]
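For context on what the relocation timeline above measures: Xen's live relocation copies memory iteratively while the VM keeps running (pre-copy), then briefly stops the VM to transfer the remaining dirty pages and CPU state. A much-simplified sketch of that loop, with hypothetical helpers, follows.

    /* Much-simplified sketch of iterative pre-copy relocation; every type and
     * helper here is hypothetical and only illustrates the control flow. */

    struct vm;
    struct host;

    void send_all_pages(struct vm *vm, struct host *dest);
    unsigned long count_dirty_pages(struct vm *vm);
    void send_dirty_pages(struct vm *vm, struct host *dest);
    void pause_vm(struct vm *vm);
    void send_cpu_and_device_state(struct vm *vm, struct host *dest);
    void resume_on_destination(struct host *dest);
    void destroy_source_vm(struct vm *vm);

    #define MAX_ROUNDS        30      /* stop iterating after this many passes */
    #define SMALL_ENOUGH    1024      /* pages: acceptable final working set   */

    void relocate_vm(struct vm *vm, struct host *dest)
    {
        /* Round 0: copy every page while the VM keeps running. */
        send_all_pages(vm, dest);

        /* Rounds 1..n: resend only the pages dirtied during the previous round;
         * the VM is still running, so it keeps dirtying pages as we copy. */
        for (int round = 1; round < MAX_ROUNDS; round++) {
            if (count_dirty_pages(vm) < SMALL_ENOUGH)
                break;
            send_dirty_pages(vm, dest);
        }

        /* Stop-and-copy: pause briefly, send the last dirty pages plus CPU and
         * device state, then resume on the destination.  Downtime is only
         * this short final window. */
        pause_vm(vm);
        send_dirty_pages(vm, dest);
        send_cpu_and_device_state(vm, dest);
        resume_on_destination(dest);
        destroy_source_vm(vm);
    }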
Current Status
[Table: status by architecture (x86_32, x86_32p, x86_64, IA64, Power) for Privileged Domains, Guest Domains, SMP Guests, Save/Restore/Migrate, >4GB memory, VT, and Driver Domains]
3.1 Roadmap
Improved full-virtualization support
Pacifica / VT-x abstraction
Enhanced IO emulation
Enhanced control tools
Performance tuning and optimization
Less reliance on manual configuration
NUMA optimizations
Virtual bitmap framebuffer and OpenGL
Infiniband / “Smart NIC” support
IO Virtualization
IO virtualization in s/w incurs overhead
Latency vs. overhead tradeoff
• More of an issue for network than storage
Can burn 10-30% more CPU
Solution is well understood
Direct h/w access from VMs
• Multiplexing and protection implemented in h/w
Smart NICs / HCAs
• Infiniband, Level-5, Aarohi, etc.
• Will become commodity before too long
Research Roadmap
Whole-system debugging
Lightweight checkpointing and replay
Cluster/distributed system debugging
Software implemented h/w fault tolerance
Exploit deterministic replay
Multi-level secure systems with Xen
VM forking
Lightweight service replication, isolation
Conclusions
Xen is a complete and robust hypervisor
Outstanding performance and scalability
Excellent resource control and protection
Vibrant development community
Strong vendor support
http://xensource.com/community
Thanks!