Virtualization

Ian Pratt
XenSource Inc. and University of Cambridge

Keir Fraser, Steve Hand, Christian Limpach and many others…
Outline

 Virtualization Overview
 Xen Architecture
 New Features in Xen 3.0
 VM Relocation
 Xen Roadmap
 Questions
Virtualization Overview
 Single OS image: OpenVZ, Vservers, Zones
 Group user processes into resource containers
 Hard to get strong isolation
 Full virtualization: VMware, VirtualPC, QEMU
 Run multiple unmodified guest OSes
 Hard to efficiently virtualize x86
 Para-virtualization: Xen
 Run multiple guest OSes ported to special arch
 Arch Xen/x86 is very close to normal x86
Virtualization in the Enterprise

 Consolidate under-utilized servers
 Avoid downtime with VM Relocation
 Dynamically re-balance workload to guarantee application SLAs
 Enforce security policy
Xen 2.0 (5 Nov 2004)
 Secure isolation between VMs
 Resource control and QoS
 Only guest kernel needs to be ported
 User-level apps and libraries run unmodified
 Linux 2.4/2.6, NetBSD, FreeBSD, Plan9, Solaris
 Execution performance close to native
 Broad x86 hardware support
 Live Relocation of VMs between Xen nodes
Para-Virtualization in Xen
 Xen extensions to x86 arch
 Like x86, but Xen invoked for privileged ops (see the sketch below)
 Avoids binary rewriting
 Minimize number of privilege transitions into Xen
 Modifications relatively simple and self-contained
 Modify kernel to understand virtualised env.
 Wall-clock time vs. virtual processor time
• Desire both types of alarm timer
 Expose real resource availability
• Enables OS to optimise its own behaviour
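To make "Xen invoked for privileged ops" concrete, here is a minimal sketch of an x86_32 hypercall wrapper. The int $0x82 trap vector and the mmu_update hypercall number follow the classic 32-bit Xen ABI, but the argument layout shown here is simplified and illustrative rather than the exact interface:

    /* A privileged operation (here a batched MMU update) becomes an
     * explicit trap into Xen instead of a faulting native instruction. */
    #define __HYPERVISOR_mmu_update 1        /* hypercall number */

    struct mmu_update {
        unsigned long ptr;   /* machine address of the PTE to update */
        unsigned long val;   /* new PTE contents */
    };

    static inline long hypervisor_mmu_update(struct mmu_update *reqs,
                                             unsigned int count,
                                             unsigned int *done)
    {
        long ret;
        /* int 0x82 enters Xen: hypercall number in EAX, args in EBX/ECX/EDX. */
        __asm__ __volatile__("int $0x82"
                             : "=a"(ret)
                             : "a"(__HYPERVISOR_mmu_update),
                               "b"(reqs), "c"(count), "d"(done)
                             : "memory");
        return ret;
    }

Batching many PTE updates into one hypercall is exactly how the "minimize number of privilege transitions" goal above is met.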
Xen 3.0 Architecture
[Diagram: VM0 runs the Device Manager & Control s/w on a paravirtualized
GuestOS (XenLinux) with native device drivers and back-end drivers;
VM1 and VM2 run unmodified user software on XenLinux guests with
front-end device drivers; VM3 runs an unmodified GuestOS (WinXP) via
VT-x, also with front-end drivers.  The Xen Virtual Machine Monitor
provides the control IF, safe HW IF, event channels, virtual CPU and
virtual MMU; it targets x86_32, x86_64 and IA64 and covers AGP, ACPI,
PCI and SMP.  Below it sits the hardware (SMP, MMU, physical memory,
Ethernet, SCSI/IDE).]
I/O Architecture
 Xen IO-Spaces delegate guest OSes protected
access to specified h/w devices
 Virtual PCI configuration space
 Virtual interrupts
 (Need IOMMU for full DMA protection)
 Devices are virtualised and exported to other
VMs via Device Channels
 Safe asynchronous shared memory transport (see the sketch below)
 ‘Backend’ drivers export to ‘frontend’ drivers
 Net: use normal bridging, routing, iptables
 Block: export any blk dev e.g. sda4,loop0,vg3
 (Infiniband / “Smart NICs” for direct guest IO)
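The device channels above boil down to a producer/consumer ring in shared memory. Below is a minimal single-producer, single-consumer sketch in that spirit; real Xen rings are more involved (they also carry responses, use grant references to share the page, and signal via event channels), so treat the structure and names as illustrative:

    #include <stdint.h>

    #define RING_SIZE 64                  /* must be a power of two */

    struct request { uint64_t id; uint64_t sector; /* payload... */ };

    struct ring {
        volatile uint32_t req_prod;       /* advanced only by the frontend */
        volatile uint32_t req_cons;       /* advanced only by the backend  */
        struct request slot[RING_SIZE];
    };

    /* Frontend: enqueue a request if the ring has space. */
    static int ring_put(struct ring *r, const struct request *req)
    {
        uint32_t prod = r->req_prod;
        if (prod - r->req_cons == RING_SIZE)
            return -1;                    /* ring full */
        r->slot[prod & (RING_SIZE - 1)] = *req;
        __sync_synchronize();             /* publish payload before index */
        r->req_prod = prod + 1;           /* backend now sees the request */
        return 0;                         /* then notify via event channel */
    }

The index-based design is what makes the transport safe: each side only ever writes its own index, so a misbehaving guest can corrupt its own requests but not the other domain.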
System Performance
[Bar chart: performance relative to native Linux (1.0) on SPEC INT2000
(score), Linux build time (s), OSDB-OLTP (tup/s) and SPEC WEB99 (score)]
Benchmark suite running on Linux (L), Xen (X), VMware Workstation (V), and UML (U)
Scalability

[Bar chart: aggregate SPEC WEB99 scores for 2, 4, 8 and 16 simultaneous instances]
Simultaneous SPEC WEB99 Instances on Linux (L) and Xen(X)


x86_32

[Address-space layout: top of 4GB: Xen (S); below 3GB: Kernel (S);
bottom: User (U).  Xen runs in ring 0, the guest kernel in ring 1,
user space in ring 3]

 Xen reserves the top of the 4GB VA space
 Segmentation protects Xen from the kernel
 System call speed unchanged
 Xen 3 now supports PAE for >4GB mem
x86_64

[Address-space layout: Kernel (U) at the top of the 2^64 VA space,
Xen (S) beneath it down to 2^64-2^47; the region down to 2^47 is
reserved; User (U) from 2^47 down to 0]

 Large VA space makes life a lot easier, but:
 No segment limit support
 Need to use page-level protection to protect hypervisor
x86_64

[Diagram: user space in ring 3 on the User pagetable; guest kernel
also in ring 3 on the Kernel pagetable; Xen (S) in ring 0; kernel
entry/exit via syscall/sysret]

 Run user-space and kernel in ring 3 using different pagetables
 Two PGDs (PML4s): one with user entries; one with user plus kernel entries
 System calls require an additional syscall/ret via Xen
 Per-CPU trampoline to avoid needing GS in Xen
Para-Virtualizing the MMU
 Guest OSes allocate and manage own PTs
 Hypercall to change PT base
 Xen must validate PT updates before use
 Allows incremental updates, avoids revalidation
 Validation rules applied to each PTE (sketched below):
1. Guest may only map pages it owns*
2. Pagetable pages may only be mapped RO
 Xen traps PTE updates and emulates, or
‘unhooks’ PTE page for bulk updates
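A minimal sketch of the two validation rules in code. The predicates and types are illustrative placeholders (a real hypervisor keeps per-frame ownership and type information), not Xen's internal API:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t pte_t;
    #define PTE_PRESENT (1ull << 0)
    #define PTE_RW      (1ull << 1)

    /* Placeholder predicates standing in for Xen's frame bookkeeping. */
    bool owns_frame(unsigned int domid, uint64_t mfn);
    bool is_pagetable_frame(uint64_t mfn);

    static bool validate_pte(unsigned int domid, pte_t pte)
    {
        uint64_t mfn = pte >> 12;       /* machine frame the PTE maps */

        if (!(pte & PTE_PRESENT))
            return true;                /* non-present entries are harmless */

        /* Rule 1: a guest may only map machine frames it owns. */
        if (!owns_frame(domid, mfn))
            return false;

        /* Rule 2: pagetable frames may only be mapped read-only, so
         * every later update is forced back through Xen for validation. */
        if (is_pagetable_frame(mfn) && (pte & PTE_RW))
            return false;

        return true;
    }

Rule 2 is what makes the writeable-page-tables trick on the next slides safe: the guest's own PT pages are mapped RO, so any write faults into Xen, which can then emulate it or temporarily unhook the page.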
Writeable Page Tables : 1 – Write fault
[Diagram: guest reads go through the virtual→machine page table; the
first guest write to a PT page faults from the guest OS into the Xen
VMM, above the hardware MMU]
Writeable Page Tables : 2 – Emulate?
[Diagram: on that write fault Xen decides whether to emulate the
single PTE update in place]
Writeable Page Tables : 3 - Unhook
[Diagram: alternatively Xen unhooks the PT page from the
virtual→machine table, so the guest can read and write it directly
for bulk updates]
Writeable Page Tables : 4 - First Use
[Diagram: the first use of a virtual address translated via the
unhooked PT page faults from the guest OS into the Xen VMM]
Writeable Page Tables : 5 – Re-hook
[Diagram: Xen validates the modified PT page and re-hooks it into the
virtual→machine table; guest reads and writes proceed as before]
MMU Micro-Benchmarks

[Bar chart: time relative to native Linux (1.0) for page fault (µs)
and process fork (µs)]

lmbench results on Linux (L), Xen (X), VMware Workstation (V), and UML (U)
SMP Guest Kernels
 Xen extended to support multiple VCPUs
 Virtual IPIs sent via Xen event channels
 Currently up to 32 VCPUs supported
 Simple hotplug/unplug of VCPUs
 From within VM or via control tools
 Optimize one active VCPU case by binary
patching spinlocks (see the sketch after this list)
 NB: Many applications exhibit poor SMP
scalability – often better off running
multiple instances each in their own OS
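The spinlock optimization above can be pictured as follows. A real kernel binary-patches the lock call sites in place; this sketch gets the same effect with function pointers, and all names are illustrative:

    #include <stdatomic.h>

    struct spinlock { atomic_flag f; };

    static void spin_lock_smp(struct spinlock *l)
    {
        while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
            ;                               /* spin until released */
    }
    static void spin_unlock_smp(struct spinlock *l)
    {
        atomic_flag_clear_explicit(&l->f, memory_order_release);
    }

    /* With one active VCPU there is no other CPU to race with, so the
     * lock can collapse to (nearly) nothing. */
    static void spin_lock_up(struct spinlock *l)   { (void)l; }
    static void spin_unlock_up(struct spinlock *l) { (void)l; }

    /* Re-pointed on VCPU hotplug/unplug; a kernel patches the call
     * sites instead of paying for the indirect call shown here. */
    void (*spin_lock)(struct spinlock *)   = spin_lock_up;
    void (*spin_unlock)(struct spinlock *) = spin_unlock_up;

    void vcpus_changed(int nr_active)
    {
        spin_lock   = nr_active > 1 ? spin_lock_smp   : spin_lock_up;
        spin_unlock = nr_active > 1 ? spin_unlock_smp : spin_unlock_up;
    }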
SMP Guest Kernels
 Takes great care to get good SMP performance while remaining secure
 Requires extra TLB synchronization IPIs
 SMP scheduling is a tricky problem
 Wish to run all VCPUs at the same time
 But, strict gang scheduling is not work conserving
 Opportunity for a hybrid approach
 Paravirtualized approach enables several
important benefits
 Avoids many virtual IPIs
 Allows ‘bad preemption’ avoidance
 Auto hot plug/unplug of CPUs
VT-x / Pacifica : hvm
 Enable Guest OSes to be run without modification
 E.g. legacy Linux, Windows XP/2003
 CPU provides vmexits for certain privileged instrs
 Shadow page tables used to virtualize MMU
 Xen provides simple platform emulation
 BIOS, APIC, IOAPIC, RTC, net (pcnet32), IDE emulation
 Install paravirtualized drivers after booting for
high-performance IO
 Possibility for CPU and memory paravirtualization
 Non-invasive hypervisor hints from OS
[Diagram: HVM architecture.  Domain 0 runs Linux xen64 with the
control panel (xm/xend), native device drivers and backend virtual
drivers.  Guest VMX domains (32-bit and 64-bit) run unmodified OSes
with a guest BIOS and virtual platform; frontend virtual drivers and
3D support sit in the guest, and privileged accesses cause VMExits
into IO emulation, with callbacks/hypercalls and event channels
connecting back to Xen.  The Xen hypervisor provides the control
interface, scheduler, event channels and hypercalls, and virtualizes
the processor, memory and platform I/O (PIT, APIC, PIC, IOAPIC).]
MMU Virtualization : Shadow-Mode

[Diagram: the guest reads and writes its own virtual→pseudo-physical
page table; the VMM propagates updates into a shadow virtual→machine
table used by the hardware MMU, and reflects accessed & dirty bits
back into the guest's table]
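A rough sketch of that propagation step. The p2m[] array stands in for the VMM's pseudo-physical→machine map; names and layout are assumptions for illustration, not Xen's actual shadow code (which also handles missing shadows, large pages and accessed/dirty write-back):

    #include <stdint.h>

    #define PAGE_SHIFT      12
    #define PTE_FLAGS_MASK  0xfffull
    #define MAX_PFNS        (1u << 20)

    /* Pseudo-physical frame -> machine frame map, owned by the VMM. */
    static uint64_t p2m[MAX_PFNS];

    /* Called when the VMM traps a guest write of `gpte` into the
     * guest's virtual->pseudo-physical table: compute the shadow PTE
     * that the hardware MMU will actually walk. */
    static uint64_t shadow_pte(uint64_t gpte)
    {
        uint64_t pfn = gpte >> PAGE_SHIFT;   /* guest pseudo-physical frame */
        uint64_t mfn = p2m[pfn];             /* corresponding machine frame */
        return (mfn << PAGE_SHIFT) | (gpte & PTE_FLAGS_MASK);
    }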
Xen Tools
[Diagram: in dom0, management clients (xm, CIM, web services) sit on
xmlib, which talks to xenstore and libxc; libxc drives the domain
builder, control and save/restore paths and enters Xen via the
privileged command interface (dom0_op).  Backend drivers in dom0
connect over xenbus to the frontend drivers and control plane in dom1.]
VM Relocation : Motivation

 VM relocation enables:
 High-availability
• Machine maintenance
 Load balancing
• Statistical multiplexing gain
Assumptions

 Networked storage
 NAS: NFS, CIFS
 SAN: Fibre Channel
 iSCSI, network block dev
 drbd network RAID
 Good connectivity
 common L2 network
 L3 re-routing
Challenges
 VMs have lots of state in memory
 Some VMs have soft real-time
requirements
 E.g. web servers, databases, game servers
 May be members of a cluster quorum
 Minimize down-time
 Performing relocation requires resources
 Bound and control resources used
Relocation Strategy

Stage 0: pre-migration       VM active on host A; destination host
                             selected; (block devices mirrored)
Stage 1: reservation         Initialize container on target host
Stage 2: iterative pre-copy  Copy dirty pages in successive rounds
Stage 3: stop-and-copy       Suspend VM on host A; redirect network
                             traffic; synch remaining state; activate
                             on host B
Stage 4: commitment          VM state on host A released
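Stages 2 and 3 map naturally onto a loop like the sketch below. Every function here is an illustrative placeholder for the dom0 save/restore machinery, not Xen's actual API:

    #include <stddef.h>

    #define MAX_ROUNDS       30
    #define DIRTY_THRESHOLD  50     /* stop iterating below this many pages */

    struct vm;                      /* opaque handles, assumed given */
    struct host;

    void   enable_dirty_logging(struct vm *vm);
    size_t send_dirty_pages(struct vm *vm, struct host *dst);  /* pages sent */
    void   suspend_vm(struct vm *vm);
    void   send_cpu_and_device_state(struct vm *vm, struct host *dst);
    void   activate_on(struct host *dst);
    void   release_local_state(struct vm *vm);

    void relocate(struct vm *vm, struct host *dst)
    {
        enable_dirty_logging(vm);             /* track writes from now on */

        /* Stage 2: iterative pre-copy.  Round 1 sends everything; each
         * later round resends only pages dirtied during the previous one. */
        for (int round = 0; round < MAX_ROUNDS; round++)
            if (send_dirty_pages(vm, dst) < DIRTY_THRESHOLD)
                break;

        suspend_vm(vm);                       /* Stage 3: stop-and-copy  */
        send_dirty_pages(vm, dst);            /* residual dirty pages    */
        send_cpu_and_device_state(vm, dst);
        activate_on(dst);                     /* Stage 4: commit on B... */
        release_local_state(vm);              /* ...then release on A    */
    }

In Xen 3.0's tools the whole sequence is driven from dom0 with xm migrate --live <domain> <host>.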
Pre-Copy Migration: Round 1, Round 2, … Final

[Animation: round 1 copies the VM's pages while it keeps running;
each later round copies only the pages dirtied during the previous
round, until the final short stop-and-copy]
Web Server Relocation

[Graph: iterative pre-copy progress for SPECweb; 52s]
Quake 3 Server relocation
Current Status

[Support matrix across x86_32, x86_32p, x86_64, IA64 and Power for:
Privileged Domains, Guest Domains, SMP Guests, Save/Restore/Migrate,
>4GB memory, VT, Driver Domains]
3.1 Roadmap
 Improved full-virtualization support
 Pacifica / VT-x abstraction
 Enhanced IO emulation
 Enhanced control tools
 Performance tuning and optimization
 Less reliance on manual configuration
 NUMA optimizations
 Virtual bitmap framebuffer and OpenGL
 Infiniband / “Smart NIC” support
IO Virtualization
 IO virtualization in s/w incurs overhead
 Latency vs. overhead tradeoff
• More of an issue for network than storage
 Can burn 10-30% more CPU
 Solution is well understood
 Direct h/w access from VMs
• Multiplexing and protection implemented in h/w
 Smart NICs / HCAs
• Infiniband, Level 5, Aarohi etc
• Will become commodity before too long
Research Roadmap
 Whole-system debugging
 Lightweight checkpointing and replay
 Cluster/distributed system debugging
 Software implemented h/w fault tolerance
 Exploit deterministic replay
 Multi-level secure systems with Xen
 VM forking
 Lightweight service replication, isolation
Conclusions
 Xen is a complete and robust hypervisor
 Outstanding performance and scalability
 Excellent resource control and protection
 Vibrant development community
 Strong vendor support

 Try the demo CD to find out more! (or Fedora 4/5, Suse 10.x)
 http://xensource.com/community
Thanks!

 If you’re interested in working full-time on Xen, XenSource is
looking for great hackers to work in the Cambridge UK office.
Please send me email!

 [email protected]
