Hypervisor Overview Application Note Hypervisor Description ALL REV 0.00
Hypervisor Description
Software Version 11.6
Doc. Rev. 1.8 / None
Reference: DV-0018
Date: 12/10/2021
REDBEND User Manual REDBEND Device Virtualization for Connected Vehicles
Hypervisor Description
Table of Contents
1 Introduction
1.1 This Document
1.2 Related Documentation
2 Prerequisite
3 Overview of the Hypervisor Services
3.1 Hardware Resource Partitioning, Sharing, and Virtualization
3.2 Guest OS Management Services
3.3 Summary of Hypervisor Services
4 Memory Management and Spatial Isolation
4.1 RAM
4.1.1 Virtual Machine Memory Partition
4.1.2 Hypervisor Memory Partition
4.1.3 Memory Granting Between VMs
4.2 I/O Memory
4.3 Interrupt Management
4.3.1 Interrupt Processing
5 Scheduling and Temporal Isolation
5.1 Exclusive CPU Binding
5.2 Hypervisor Scheduling
5.3 CPU Affinity
5.4 CPU Handover
6 Communication Channels
6.1 VM - Hardware Communication Channels
6.2 VM - SMC Communication Channels
6.3 VM - Hypervisor Communication Channels
6.4 Inter-VM Communication Channels
6.4.1 Vlink
7 Hypervisor Public Interfaces
7.1 Example: Console Output Management
7.2 Peripheral Emulation Example
8 Monitoring
9 Hypervisor BSP
9.1 Typical BSP
9.2 BSP Source Tree Layout
10 Initialization
11 Build Overview
11.1 Build Tool Chains
Table of Figures
Figure 1: Example Architecture Diagram
Figure 2: RAM Partition
Figure 3: Hypervisor Memory Partition
Figure 4: Interrupts Mapping
Figure 5: Interrupt Processing
Figure 7: CPU Affinity
Figure 8: Hardware Access
Figure 9: VM SMC Communication
Figure 10: VM - Hypervisor Communication
Figure 11: Vlink Architecture
Figure 12: Hypervisor Public Interfaces
Figure 13: Hypercall Example, Write to the Console
Figure 14: Peripheral Emulation
Figure 15: BSP Source Tree Layout
Figure 16: Build Overview
Figure 17: Detailed Hypervisor Image Generation
Figure 18: Multi-VM Boot Image
1 Introduction
1.1 This Document
This document describes the Hypervisor provided with the REDBEND Device Virtualization for
Connected Vehicles product. As described in the "Product Description" document, the REDBEND
Device Virtualization for Connected Vehicles is more than a mere Hypervisor, as it also includes tools
and virtual drivers to be plugged into guest OSes.
The Hypervisor is the core component of the product. It provides the virtualization services that
enable multiple guest OSes to run simultaneously on the same hardware platform. It is a Type-1
virtualization solution.
This document covers the generic virtualization services provided by the Hypervisor: memory
management and spatial isolation, scheduling and temporal isolation, inter-virtual-machine
communication, Hypervisor interfaces, and initialization.
The Hypervisor needs to deal with different hardware platforms (SoCs and boards), and possibly
different processor architectures. Hence it is architected so that platform-dependent services are
implemented in a BSP (Board Support Package), making it possible to port the Hypervisor to a new
SoC and/or board within a supported processor architecture (e.g. ARMv8).
2 Prerequisite
The reader should be familiar with virtualization concepts.
• Schedules the guest OS virtual CPUs (vCPUs) on the available physical CPUs
• Transmits cross-interrupts from one guest OS to another
• Provides low-level inter-guest-OS communication based on shared memory and cross-interrupts (bus-based communication paradigm)
This architecture allows all guest OSes to run independently and to be readily stopped and
restarted. The REDBEND Virtual Device Driver Framework allows (re)synchronizing guest OSes that
share access to the same peripheral devices.
The following figure shows an example of three virtual machines: one virtual machine for an RTOS
and two virtual machines for two general purpose guest OSes.
Figure 1: Example Architecture Diagram (a highest-priority OS (RTOS or other) and two general-purpose guest OSes, each virtual machine running its own applications)
• Resumption of the execution of a previously suspended guest OS
• Halt and reboot of a guest OS
• Modification of the priority of a guest OS
• Acquisition of usage statistics
Available services vary per guest OS and can be masked. In Linux, for example, masking can be
done by entering the relevant information in a control file located in the /proc subtree.
interrupt requests. This is achieved by statically partitioning these physical resources between the
Hypervisor and the guest OSes. As the target systems are embedded static systems, there is no
need to provide dynamic partition mechanisms.
All partitions are defined up-front as part of the configuration of the overall system. The
configuration is described in a Device Tree provided at system build time. During initialization, the
Hypervisor parses the Device Tree and instantiates the corresponding partitions.
As a result, any configuration errors are detected during early initialization. Once the system has
passed the initialization step, no further errors such as a lack of memory can occur.
4.1 RAM
The physical memory is partitioned by the Hypervisor according to the memory requirements
defined for each VM in the Device Tree. No memory overcommit is allowed. If there is not enough
physical memory to match the configuration defined in the Device Tree, the system stops with a
"configuration error".
Figure 2: RAM Partition
4.1.2 Hypervisor Memory Partition
The Hypervisor runs in its own memory partition. The Hypervisor memory partition is disjoint from
the partitions allocated to the Virtual Machines.
The stage-2 MMU has two roles: it isolates each Virtual Machine from the others, and it performs a
translation that makes each Virtual Machine independent of the physical memory addresses it
actually uses. The mapping provided by the stage-2 MMU can be either identical or non-identical. When an
identical mapping is used, Intermediate Physical Addresses (IPA) are identical to Physical
Addresses (PA). In a non-identical mapping, IPA and PA differ. A Virtual Machine may use an
identical mapping while another one may use a non-identical mapping. The mapping is defined by
the Hypervisor Device Tree (See Device Virtualization Reference Manual > Configuration > VM
Virtual Platform Bindings > Virtual Platform Memory Node - "memory").
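To make the two cases concrete, the following C fragment models one stage-2 region as a sketch; the structure and field names are illustrative assumptions, not Hypervisor definitions, and the actual mapping is declared in the VM Device Tree memory node referenced above.

/* Illustrative model of one stage-2 mapping (hypothetical names). */
#include <stdbool.h>
#include <stdint.h>

struct stage2_region {
    uint64_t ipa;    /* Intermediate Physical Address seen by the VM */
    uint64_t pa;     /* Physical Address actually backing it         */
    uint64_t size;   /* size of the region in bytes                  */
};

/* Identical mapping: IPA == PA; non-identical mapping: they differ. */
static bool stage2_is_identical(const struct stage2_region *r)
{
    return r->ipa == r->pa;
}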
Figure 3: Hypervisor Memory Partition (example layout: Lower Heap at 0xd2000000, Hypervisor Binary and Early Heap from 0xd2080000, Higher Heap at 0xd3000000)
4.3 Interrupt Management
All physical interrupts are virtualized by the Hypervisor. When a physical interrupt occurs, the
interrupt virtualization component of the Hypervisor routes that interrupt to the appropriate
Virtual Machine where it triggers the Interrupt Service Routine of the guest OS.
From the Hypervisor point of view, an interrupt is known as an eXtended interrupt, also called
XIRQ, and is part of the range [0, 1024), which is composed of two disjoint ranges, e.g.:
• [0, 512)¹: physical interrupts
• [512, 1024): software interrupts, also called cross-interrupts, typically used for
communication between virtual drivers
The interrupt mapping configuration defines, for each XIRQ, whether it is forwarded to a given VM, and under
which virtual Interrupt Controller IRQ ID. The picture below provides an example of the interrupt mapping
configuration of two VMs:
¹ 512 is the typical default value, but it depends on the SoC.
Figure 4: Interrupts Mapping (VM1 space with its virtual IC (vGIC) space, the hardware IC (GIC) space, and VM2 space with its virtual IC (vGIC) space)
As seen in the above picture (Figure 4: Interrupts Mapping), hardware interrupts [0, 512) are
partitioned between the different guest OSes: a hardware interrupt ID may be assigned to only one
VM.
Software interrupts, on the other hand, obey a different rule: each VM is provided with an
independent space of software interrupts. Thus, the same software interrupt ID may be reused in
multiple VMs.
In a large majority of use cases, an identical mapping will be defined by each VM for hardware and
software interrupts. One reason for a non-identical mapping might be a limitation on the number of
supported interrupt lines in the guest OS kernel. In the picture above, VM3 only supports interrupts up
to 892, so a non-identical mapping is used.
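As an illustration of the mapping just described, the C fragment below models one entry of an interrupt mapping table; the type and field names are hypothetical, and the real configuration is expressed in the Hypervisor Device Tree, not through such a structure.

/* Hypothetical model of one interrupt mapping entry (illustrative only). */
#include <stdbool.h>
#include <stdint.h>

#define XIRQ_SW_BASE  512u   /* typical split: [0, 512) hardware, [512, 1024) software */

struct xirq_map_entry {
    uint32_t xirq;     /* Hypervisor-level eXtended interrupt number   */
    uint32_t vm_id;    /* VM the interrupt is forwarded to             */
    uint32_t virq;     /* IRQ ID presented on that VM's virtual GIC    */
};

/* A hardware XIRQ may be assigned to only one VM; software XIRQ IDs
 * form an independent space per VM and may be reused across VMs.     */
static bool xirq_is_hardware(uint32_t xirq)
{
    return xirq < XIRQ_SW_BASE;
}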
Machines.
Hardware interrupts directed to a VM are virtualized by the Hypervisor Interrupt Virtualization
component. This component also takes care of software interrupts. Both are transmitted to a
virtual GIC driver provided by the Hypervisor.
This virtual GIC driver sends the virtual interrupt toward the appropriate Virtual Machine where it
is processed by a guest OS GIC driver and then by the appropriate device driver.
Figure: Exclusive CPU Binding (GuestOS1 with vCPU0; GuestOS2 with vCPU0, vCPU1 and vCPU2; each vCPU bound to its own physical CPU)
In the above figure, each vCPU is exclusively bound to a physical
CPU. There is no scheduling overhead imposed by the Hypervisor, and the two Virtual Machines
are isolated from each other from a temporal standpoint.
• The time to preempt a low-priority vCPU and assign the physical CPU to a high-priority vCPU is
small and bounded.
• The overhead induced by interrupt virtualization is small and bounded.
• The different vCPUs of a given VM are equally scheduled on physical CPUs. No vCPU shall lag
behind other vCPUs of the same VM.
The Hypervisor scheduler also provides a fair share scheduling policy. It is currently not described
in this document.
Figure 7: CPU Affinity (GuestOS1 with vCPU0; GuestOS2 with vCPU0, vCPU1, vCPU2 and vCPU3)
6 Communication Channels
6.1 VM - Hardware Communication Channels
Based on the Virtual Machine configuration defined in the Device Tree, the Hypervisor grants the
Virtual Machine access to a subset of the hardware resources (see 3.1).
During initialization, the Hypervisor populates the Virtual Machine I/O space with the subset of
hardware I/O resources granted to the Virtual Machine, as specified by its static configuration. As
a result, the guest OS can access such resources directly without any Hypervisor assistance. No
runtime overhead is imposed on such I/O operations.
The Hypervisor transparently delivers to the guest OS the subset of the hardware IRQs granted to the
Virtual Machine, as specified by its static configuration. Interrupts are virtualized. There is a minimal,
bounded overhead introduced by virtualization between the time the hardware interrupt is triggered and
the time the virtualized interrupt is delivered to the guest OS IRQ routine.
Figure 8: Hardware Access
6.2 VM - SMC Communication Channels
The Hypervisor traps Secure Monitor Call (SMC) instructions issued by the guest OS, as specified
by its static Virtual Machine configuration.
The Hypervisor BSP implements a customizable SMC policy combining the following mechanisms
provided by the Hypervisor frameworks (an illustrative sketch follows the list):
• SMC emulation
• Execution of the trapped SMC or of other SMCs
• SMC rejection
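The sketch below illustrates how such a policy might combine the three mechanisms; the handler shape, names, and return codes are assumptions for illustration, not the actual BSP interface. The PSCI_VERSION function ID shown is the standard SMC32 value.

/* Hypothetical SMC policy dispatcher (illustrative only, not the BSP API). */
#include <stdint.h>

enum smc_action { SMC_EMULATED, SMC_FORWARDED, SMC_REJECTED };

struct smc_regs { uint64_t x[8]; };   /* x0 carries the SMC function ID */

static enum smc_action bsp_smc_policy(uint32_t vm_id, struct smc_regs *regs)
{
    uint32_t fid = (uint32_t)regs->x[0];

    if (fid == 0x84000000u) {        /* example: emulate PSCI_VERSION  */
        regs->x[0] = 0x00010001u;    /* report PSCI 1.1 to the guest   */
        return SMC_EMULATED;
    }
    if (vm_id == 0u) {               /* example: one trusted VM may pass
                                        this (or a rewritten) SMC through */
        return SMC_FORWARDED;        /* executed in the Secure World   */
    }
    regs->x[0] = (uint64_t)-1;       /* reject everything else         */
    return SMC_REJECTED;
}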
Figure 9: VM SMC Communication (SMC instructions issued by guest OSes are trapped by the Hypervisor, which applies the SMC policy before issuing SMCs to the Secure World)
6.3 VM - Hypervisor Communication Channels
The guest OS and the Hypervisor communicate via:
• Hypercalls
The Hypervisor polices hypercalls according to the access list defined as part of the
configuration of the Virtual Machine. Whether the guest OS is entitled to invoke a given
operation is specified by such an access list.
• Virtual I/O peripherals
The Hypervisor populates the Virtual Machine I/O Space with virtual I/O resources as
specified by the static Virtual Machine configuration. The Hypervisor intercepts any guest OS
access to such a virtual resource and emulates the peripheral logic transparently for the guest
OS.
• Virtual interrupts
All IRQs delivered to a guest OS are statically defined in the Virtual Machine configuration.
Figure 10: VM - Hypervisor Communication (hypercalls, virtual I/O accesses and virtual IRQ delivery between the guest OS and the Hypervisor)
provided by a communication framework, on top of which more complex and specialized
communication protocols can be built and then used to connect "communication virtual drivers",
running in different Virtual Machines:
• Memory regions to support data exchanges between different Virtual Machines:
° PDEV: Persistent Device memory to handle device metadata. Such a memory region is
accessed only by one Virtual Machine and the Hypervisor and is not shared between two
Virtual Machines. The metadata contains the description of the virtual link end-points
and data descriptors. Access to this metadata by virtual drivers running in the guest OS
is done by simply dereferencing pointers to such data. The content of PDEV memory
regions is preserved across guest OS reboot.
° PMEM: Persistent Memory to handle data to be exchanged between Virtual Machines.
Such memory is allocated by the Hypervisor and can be mapped by multiple Virtual
Machines.
• Cross-interrupts: virtual interrupts which can be sent from one Virtual Machine to another to
signal that something has to be done. See 4.3 Interrupt Management.
All of these mechanisms are used by virtual drivers running in guest OSes.
6.4.1 Vlink
A vlink (virtual communication link) is an inter-VM communication channel: a peer-to-peer,
unidirectional channel between a server-side end-point and a client-side end-point. Usually the
vlink server and client sides are located in two different Virtual Machines, but they can also be
located within the same Virtual Machine.
A vlink encapsulates the channel's persistent resources:
Figure 11: Vlink Architecture (each VM holds its own PDEV region and XIRQs; the PMEM region is shared between VM1 and VM2; the vlink spans both VMs through the Hypervisor)
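As a mental model only, the C sketch below groups the resources a vlink encapsulates (per-endpoint PDEV, shared PMEM, cross-interrupts); all type and field names are hypothetical, not the Hypervisor's actual definitions.

/* Hypothetical model of a vlink's persistent resources (illustrative only). */
#include <stddef.h>
#include <stdint.h>

struct vlink_endpoint {
    uint32_t vm_id;       /* owning Virtual Machine                        */
    void    *pdev;        /* PDEV: per-endpoint metadata, never shared     */
    size_t   pdev_size;
    uint32_t rx_xirq;     /* cross-interrupt used to signal this end-point */
};

struct vlink {
    struct vlink_endpoint server;   /* server-side end-point               */
    struct vlink_endpoint client;   /* client-side end-point               */
    void    *pmem;        /* PMEM: shared data region, mappable by both VMs */
    size_t   pmem_size;
};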
7 Hypervisor Public Interfaces
Figure 12: Hypervisor Public Interfaces (guest OSes interact with the Hypervisor through hypercalls, synchronous exceptions, asynchronous exceptions (interrupts), and the VM configuration; the Hypervisor itself receives hardware asynchronous exceptions (interrupts))
7.1 Example: Console Output Management
This section describes an example of a hypercall used by guest OSes to print messages on the
console.
A guest OS may print messages on the console using the following hypercalls:
• hyp_call_putchar('H'): prints one character
• hyp_call_putstr("Hello world"): prints a character string
• hyp_call_cons_write("Hello world", 11): prints a given number of characters
The Hypervisor aggregates all guest OS console output in a single circular buffer called the Hypervisor
Console History.
The Hypervisor prints the Console History on the UART console, so that guest OS outputs appear
on the console. Outputs of the different guest OSes are differentiated by using a different color for each guest OS.
The Hypervisor provides a hypercall to retrieve the Console History buffer content:
• char ch = hyp_call_cons_hist_getchar(&ordinal);
It is also possible to retrieve the console content via:
• cat /dev/vlx-history from a shell of a Linux guest OS,
• adb shell cat /dev/vlx-history from a shell of the Host OS.
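As a minimal sketch of how a guest driver could use the calls listed above (both the output hypercalls and the history read-back), consider the C fragment below; the exact prototypes and the end-of-history convention are assumptions inferred from this section, not verified declarations.

/* Prototypes assumed from the examples above (actual headers may differ). */
#include <stdint.h>

extern void hyp_call_putchar(char c);
extern void hyp_call_putstr(const char *s);
extern void hyp_call_cons_write(const char *s, unsigned int len);
extern char hyp_call_cons_hist_getchar(uint32_t *ordinal);

static void console_demo(void)
{
    hyp_call_putchar('H');                   /* one character           */
    hyp_call_putstr("Hello world");          /* NUL-terminated string   */
    hyp_call_cons_write("Hello world", 11);  /* explicit length         */

    /* Drain part of the Console History into a local buffer.
     * A '\0' return is assumed to mean "no more characters". */
    char buf[64];
    uint32_t ordinal = 0;
    unsigned int i = 0;
    while (i < sizeof buf) {
        char ch = hyp_call_cons_hist_getchar(&ordinal);
        if (ch == '\0')
            break;
        buf[i++] = ch;
    }
    (void)buf;   /* the retrieved characters would be used here */
}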
Figure 13: Hypercall Example, Write to the Console (hyp_call_cons_write() fills the Console History Buffer, which is printed on the UART and can be read back with hyp_call_cons_hist_getchar())
7.2 Peripheral Emulation Example
Instead of using hypercalls as shown in the previous section, a guest OS may print messages on the
console using a specific (virtual) UART peripheral (e.g. an Exynos 4210-compatible UART).
A Hypervisor Emulation Driver (e.g. vs5pv210-uart.c) intercepts all guest OS accesses to the
UART registers, emulates the UART logic and aggregates the console output in the Console History
buffer.
This requires that the VM configuration (vplatform) specifies the emulated (virtual) UART
peripheral, e.g.:
/ {
    serial@14C30000 {
        compatible = "samsung,exynos4210-uart"; /* emulated UART model        */
        reg = <0 0x14C30000 0x100>;             /* emulated register window   */
        interrupts = <0 433 0>;                 /* virtual UART interrupt     */
        vl,vfifo = "console";                   /* virtual FIFO "console"     */
    };
};
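To show where such an emulation driver fits, the C fragment below sketches a hypothetical MMIO write handler for the virtual UART; the callback shape, the helper function, and the register offset are assumptions (the offset matches the usual Samsung UART TX holding register), not the actual vs5pv210-uart.c interface.

/* Hypothetical MMIO emulation callback (illustrative only). */
#include <stdbool.h>
#include <stdint.h>

#define VUART_UTXH 0x20u   /* TX holding register offset on Samsung UARTs */

extern void cons_history_putc(char c);   /* assumed helper: feed the
                                            Console History buffer      */

static bool vuart_mmio_write(uint64_t offset, uint64_t value, unsigned size)
{
    if (offset == VUART_UTXH && size == 1u) {
        cons_history_putc((char)value);   /* aggregate guest output     */
        return true;                      /* access fully emulated      */
    }
    /* Other registers (line control, status, baud rate, ...) would be
     * emulated here so the guest UART driver keeps working unchanged. */
    return true;
}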
Figure 14: Peripheral Emulation (guest OS accesses to the virtual UART trigger synchronous exceptions handled by the Emulation Driver, which fills the Console History Buffer; the history is printed on the UART and can be read via hyp_call_cons_hist_getchar() or /dev/vlx-history)
8 Monitoring
The Hypervisor offers services to help monitor, tune and debug the overall software system.
• Monitoring and diagnostics
This service collects data about Hypervisor events such as VM restarts, context switches, interrupts,
and exceptions. The Hypervisor provides APIs to access the collected data. Data collection is
dynamically enabled and disabled using Hypervisor properties.
• Hypervisor event logging
Hypervisor events are logged in per-physical-CPU circular log buffers. The oldest events can
be overwritten if they are not periodically retrieved by a Guest OS. The content of these log
buffers persists across machine reboots, allowing data to be inspected even after a failure.
• Hypervisor event statistics
The Hypervisor counts the occurrences of Hypervisor events in per-CPU/vCPU counter
matrices. Events may be plain event counters (e.g. interrupt, exception, and hypercall counts) or
event timing counters.
• CPU snapshots
In case of panic, the Hypervisor dumps the system registers in a persistent memory area. The
content of this area is preserved across a warm reset of the platform, so the registers can be
examined after reboot of the platform to identify the cause of the failure.
The event logs and statistics are stored in areas of memory shared between the Hypervisor and
the Guest OSes, making it easy to retrieve this information from the Guest OSes. Virtual device
drivers are provided for Linux Guest OSes to make it even easier.
CPU usage statistics and CPU snapshots may be retrieved by Guest OSes through dedicated
hypercalls. Again, virtual device drivers are provided for Linux Guest OSes.
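As an illustration of the shared-memory retrieval path, the fragment below sketches how a guest driver might sum one event counter across CPUs; the layout, sizes, and names are hypothetical, the real structures being defined by the Hypervisor interfaces.

/* Hypothetical per-CPU event counter matrix shared with a guest
 * (illustrative only; the real layout is Hypervisor-defined).   */
#include <stdint.h>

#define NR_CPUS    8u
#define NR_EVENTS  4u   /* e.g. interrupts, exceptions, hypercalls, switches */

struct hyp_event_stats {
    volatile uint64_t counter[NR_CPUS][NR_EVENTS];
};

/* 'stats' would point into the memory area shared by the Hypervisor. */
static uint64_t total_events(const struct hyp_event_stats *stats,
                             unsigned int event)
{
    uint64_t sum = 0;
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += stats->counter[cpu][event];
    return sum;
}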
9 Hypervisor BSP
The Hypervisor is structured in two pieces:
• The core Hypervisor which is generic and portable. It remains unchanged when adapting the
Hypervisor to a new board. The core Hypervisor is usually delivered in binary form.
• The Hypervisor BSP which interfaces the underlying hardware and the core Hypervisor. It
includes all drivers, initialization routines and configuration files that implement
CPU/SoC/Board/Use-case specific integration logic. The Hypervisor BSP also includes the
Hardware Abstraction Layer (HAL) used by the core Hypervisor.
The Hypervisor BSP is seen by the core Hypervisor as a collection of APIs. Therefore, developing a
new Hypervisor BSP consists of providing the implementation of these APIs on a specific board.
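As a rough picture of what "a collection of APIs" can look like, the sketch below shows a hypothetical operations table a BSP could fill in; the actual API set and names are those documented in the Device Virtualization Reference Manual, BSP Edition, not the ones shown here.

/* Hypothetical shape of BSP-provided entry points (illustrative only). */
struct hyp_bsp_ops {
    void (*console_putc)(char c);            /* board UART output         */
    void (*irq_enable)(unsigned int xirq);   /* SoC interrupt controller  */
    void (*cpu_start)(unsigned int cpu,
                      unsigned long entry);  /* secondary CPU bring-up    */
};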
The overall configuration of a system is defined by a Device Tree which includes three kinds of
subtrees:
• The Hypervisor Device Tree defines the configuration of the Hypervisor,
• One Device Tree per Virtual Machine, each describing the configuration of one Virtual
Machine such as memory, VCPUs, devices…
• A physical platform Device Tree which describes the actual hardware. This Device Tree is
generally provided by the bootloader. The Hypervisor BSP mainly deals with this Device Tree.
The Hypervisor BSP has to provide the set of APIs expected by the core Hypervisor. However, it can
use some services defined by the core Hypervisor. These services are described in the "Component
Interfaces" chapter of the "Device Virtualization Reference Manual, BSP Edition".
The Hypervisor BSP is composed of the following sets of components:
• Generic software components
• CPU dependent components
• SoC dependent components
• Board dependent components
This structure is designed to reuse existing software components, and therefore to reduce
the time needed to port to a new SoC or board.
Figure 15: BSP Source Tree Layout (top-level directories include bsp, cpu, soc, board, framework, uart and vdt)
10 Initialization
The hardware boot loader loads the Multi-VM boot image², which includes the Hypervisor image. It
then jumps to the Hypervisor entry point.
The Hypervisor entry routine (crt0) is written in assembly. It performs low-level initializations
(e.g. it sets up an execution stack) and then continues with initialization code written in C.
² See the description of the build mechanism further in this document.
11 Build Overview
The overall build process is depicted by the following picture.
Figure 16: Build Overview (the core Hypervisor source is built into a binary distribution; that distribution and the Hypervisor BSP source are built into the Hypervisor image; guest OS images are built separately; the Hypervisor image and configuration build and the Multi-VM boot image build assemble the final boot image)
The core Hypervisor source is usually not delivered. It is used to generate a binary distribution
which is delivered along with the sources of a Hypervisor BSP. Customers may use the delivered
BSP or develop their own BSP.
From the core Hypervisor binary and the Hypervisor BSP source, one builds a Hypervisor image.
The configurations of the Hypervisor and of the Virtual Machines (Virtual Platform configuration) are
built separately (examples of "virtual platform" device trees and of the hypervisor device tree are
provided in the hypervisor distribution).
This must be complemented with the binary guest OS images to be run within the Virtual Machines. The
Hypervisor image, the configuration file (Device Tree Blob) and the guest OS images are then
packaged into a so-called Multi-VM boot image.
The hardware bootloader is in charge of loading such an image into main memory and of passing
control to the entry point of the Hypervisor.
11.1 Build Tool Chains
Building a Multi-VM boot image is therefore quite complex. Guest OS images must be generated
with their own tool chains. Integrators may have specific build requirements, depending upon their
board and the applications they are developing.
All Hypervisor build mechanisms rely on common Linux tools such as:
• GNU make
• GCC cross tool chain
• dtc
• sh, cp, ln, rm, mkdir
• Python3
Internally, a set of tools (called mkvlm) performs the various steps of the build process. However, as
the basic build mechanisms rely on well-known tools, it is possible to adapt the tool chain to
specific needs. As an example, Yocto has been used to generate Multi-VM boot images. Other
build chain integrations are possible as well.
Figure 17: Detailed Hypervisor Image Generation (inputs from the GuestOS-specific builds: GuestOS images + initrd and VM vplatform DTBs; inputs from the Hypervisor build: Hypervisor image and Hypervisor configuration DTB)
Figure 18: Multi-VM Boot Image (header, Hypervisor image, Hypervisor configuration DTB, VM vplatform DTBs, guest OS images + initrd for VM1 and VM2, end of image)