IBM PowerKVM
Configuration and Use
Murilo Opsfelder Araújo
Breno Leitao
Stephen Lutz
José Ricardo Ziviani
Redbooks
International Technical Support Organization
March 2016
SG24-8231-01
Note: Before using this information and the product it supports, read the information in “Notices” on
page xvii.
This edition applies to Version 3, Release 1, Modification 0 of IBM PowerKVM (product number 5765-KV3).
© Copyright International Business Machines Corporation 2014, 2016. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
Contents
Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi
Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxii
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
Chapter 1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 IBM Power Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 POWER8 processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 IBM scale-out servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Power virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.4 Simultaneous multithreading. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.1.5 Memory architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.1.6 Micro-Threading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.1.7 RAS features. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2 Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.1 PowerKVM versions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.2 PowerKVM Version 3.1 considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.3 Where to download PowerKVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Software stack. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.1 QEMU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.2 KVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.3 Open Power Abstraction Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.4 Guest operating system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.5 Libvirt software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.6 Virsh interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.3.7 Intelligent Platform Management Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3.8 Petitboot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3.9 Kimchi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3.10 Slimline Open Firmware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3.11 Virtio drivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3.12 RAS stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4 Docker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.4.2 Docker hub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.4.3 Docker file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.3.3 File-backed pools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
6.4 I/O pass-through . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
6.4.1 SCSI pass-through . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
6.4.2 USB pass-through. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
6.4.3 PCI pass-through to a virtual machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
6.4.4 I/O limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
6.5 N_Port ID Virtualization (NPIV) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
6.6 Using multipath disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
6.6.1 Multipath disk handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
6.6.2 Direct mapped multipath disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
6.6.3 Multipath disks in a storage pool. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
6.7 Hot plug . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
6.7.1 Adding a new vSCSI adapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Examples
7-12 docker info output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
7-13 Docker service access error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
7-14 Docker failing due to shared library not found . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
7-15 Searching for images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
7-16 Downloading a remote Docker image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
7-17 Listing container images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
7-18 Starting a container . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
7-19 Listing active containers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
7-20 Listing all containers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
7-21 Creating a Docker container based on an image . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
7-22 Getting the Docker console. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
7-23 Docker image changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
7-24 Committing a file system change to a new image. . . . . . . . . . . . . . . . . . . . . . . . . . . 221
7-25 Renaming a Docker image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
7-26 Login information about Docker hub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
7-27 Uploading an image to Docker hub. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
7-28 Finding the image previously uploaded . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
7-29 Importing Ubuntu Core from the web . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
7-30 Renaming an image and starting it . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
7-31 Debootstrapping Debian in a local directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
7-32 Importing from a local tgz file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
8-1 Assuring the online repository is configured . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
8-2 Development packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
8-3 Repository file for development kit installation from ISO. . . . . . . . . . . . . . . . . . . . . . . 231
8-4 Functions used to open a connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
8-5 SASL access. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
8-6 First example in Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
8-7 Running the first example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
8-8 Connecting to a localhost hypervisor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
8-9 Return a domain ID for a domain name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
8-10 First example main body . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
8-11 Get the memory information for the active guests . . . . . . . . . . . . . . . . . . . . . . . . . . 237
8-12 Second development example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
8-13 Output of the second example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
8-14 Initial program implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
8-15 get_mac implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
8-16 print_ip_per_mac implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
8-17 Implementing the main function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
8-18 Showing the final results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
8-19 Python implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
8-20 Program results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
This information was developed for products and services offered in the US. This material might be available
from IBM in other languages. However, you may be required to own a copy of the product or product version in
that language in order to access it.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, MD-NC119, Armonk, NY 10504-1785, US
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without
incurring any obligation to you.
The performance data and client examples cited are presented for illustrative purposes only. Actual
performance results may vary depending on specific configurations and operating conditions.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
Statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to actual people or business enterprises is entirely
coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are
provided “AS IS”, without warranty of any kind. IBM shall not be liable for any damages arising out of your use
of the sample programs.
The following terms are trademarks or registered trademarks of International Business Machines Corporation,
and might also be trademarks or registered trademarks in other countries.
Active Memory™ POWER® PowerVM®
AIX® Power Architecture® Redbooks®
IBM® Power Systems™ Redbooks (logo) ®
IBM SmartCloud® POWER6® Storwize®
IBM z Systems™ POWER7® System z®
Micro-Partitioning® POWER8® z/VM®
SoftLayer, and SoftLayer device are trademarks or registered trademarks of SoftLayer, Inc., an IBM Company.
Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other
countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
This IBM® Redbooks® publication presents the IBM PowerKVM virtualization for scale-out
Linux systems, including the new LC IBM Power Systems™.
PowerKVM is open source server virtualization that is based on the IBM POWER8®
processor technology. It includes the Linux open source technology of KVM virtualization, and
it complements the performance, scalability, and security qualities of Linux.
This book describes the concepts of PowerKVM and how you can deploy your virtual
machines with the software stack included in the product. It helps you install and configure
PowerKVM on your Power Systems server and provides guidance for managing the
supported virtualization features by using the web interface and command-line interface
(CLI).
This information is for professionals who want to acquire a better understanding of PowerKVM
virtualization technology to optimize Linux workload consolidation and use the POWER8
processor features. The intended audience also includes people in these roles:
Clients
Sales and marketing professionals
Technical support professionals
IBM Business Partners
Independent software vendors
Open source community
IBM OpenPower partners
It does not replace the latest marketing materials and configuration tools. It is intended as an
additional source of information that, along with existing sources, can be used to increase
your knowledge of IBM virtualization solutions.
Before you start reading, you must be familiar with the general concepts of kernel-based
virtual machine (KVM), Linux, and IBM Power architecture.
Authors
This book was produced by a team working at the International Technical Support
Organization, Poughkeepsie Center.
Murilo Opsfelder Araújo is a Software Engineer working in the Linux Technology Center at
IBM Brazil. He holds a Bachelor’s degree in Computer Science from Anhanguera Educational
Institute, Brazil. He is a Certified Linux Professional with experience in software development
for Linux appliances and servers. He is also a Linux kernel hobbyist.
Breno Leitao is a Software Engineer at IBM in Brazil. He has been using Linux since 1997
and working in the Linux Technology Center at IBM since 2007. He holds a degree in
Computer Science from the University of São Paulo. His areas of expertise include Linux,
virtualization, networking, and performance.
José Ricardo Ziviani has been a Software Engineer at IBM Brazil since 2010. During that time, he worked on the first versions of the PowerKVM installer tool and is currently working on the
Kimchi/Ginger project. His areas of interest are virtualization, software engineering, and
electronics.
The project that produced this publication was managed by: Scott Vetter, PMP
Thanks to the following people for their contributions to this project:
Leonardo Augusto Guimaraes Garcia, Ricardo Marin Matinata, Gustavo Yokoyama Ribeiro,
Fabiano Almeida Rosas, Lucas Tadeu Teixeira
IBM Brazil
Gregory Kurz
IBM France
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
[email protected]
Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Summary of changes
This section describes the technical changes made in this edition of the book and in previous
editions. This edition might also include minor corrections and editorial changes that are not
identified.
Summary of Changes
for SG24-8231-01
for IBM PowerKVM: Configuration and Use
as created or updated on May 31, 2016.
New information
CPU Hotplug, see 5.4, “CPU Hotplug” on page 145
Memory Hotplug, see 5.6, “Memory Hotplug” on page 157
N_Port ID Virtualization (NPIV), see 6.5, “N_Port ID Virtualization (NPIV)” on page 176
Usage of Multipathing, see 6.6, “Using multipath disks” on page 182
Console configuration for Power LC Systems, see 2.4.2, “Console configuration for Power
LC systems” on page 59
PowerKVM Development Kit, see Chapter 8, “PowerKVM Development Kit” on page 227
Security, see 7.4, “Security” on page 201
Host management using Ginger, see 3.8, “Ginger” on page 96
Docker, see 1.4, “Docker” on page 22
Changed information
Changes in the installer of PowerKVM
– Manual installation
– Automated installation
– Host migration
– Configuration tool
– Console configuration for LC Power Systems
See Chapter 2, “Host installation and configuration” on page 31
Guest management using Kimchi, see Chapter 3, “Managing hosts and guests from a
Web interface” on page 61
Managing guests from the command-line interface, see Chapter 4, “Managing guests from
the command-line interface” on page 105
Update and enhancement of the following OpenStack related sections:
– PowerVC
– IBM Cloud Manager
Chapter 1. Introduction
This chapter covers the concepts of open virtualization on IBM Power Systems. It introduces
the IBM PowerKVM Version 3.1 virtualization stack and covers the following topics:
Quick introduction to virtualization
Introduction and basic concepts of PowerKVM
IBM Power Systems
IBM PowerKVM 3.1 software stack
Docker and container concepts
A comparison of IBM PowerKVM versions and IBM PowerVM®
Terminology used throughout this book
Only a subset of these servers is covered in this book. This subset is referred to as IBM scale-out systems, which includes servers that run only Linux operating systems and are based on the IBM POWER8 processor.
The processor also has 96 MB of shared L3 cache plus 512 KB of L2 cache per core.
The following features can augment performance of the IBM POWER8 processor:
Support for DDR3 and DDR4 memory through memory buffer chips that offload the
memory support from the IBM POWER8 memory controller
The scale-out servers provide many benefits for cloud workloads, including security, simplified
management, and virtualization capabilities. They are developed using open source methods.
At the time of publication, these are the base system models that are part of this family of
servers:
IBM Power System S812L (8247-21L)
IBM Power System S822L (8247-22L)
IBM Power System S824L (8247-42L)
IBM Power System S812LC (8348-21C)
IBM Power System S822LC (8335-GTA)
IBM Power System S822LC (8335-GCA)
For more information about S812L, see IBM Power Systems S812L and S822L Technical
Overview and Introduction, REDP-5098:
http://www.redbooks.ibm.com/abstracts/redp5098.html?Open
For more information about the IBM Power System S824L, see IBM Power System S824L
Technical Overview and Introduction, REDP-5139:
http://www.redbooks.ibm.com/abstracts/redp5139.html?Open
You can see the front and rear views of the S824L in Figure 1-5 and Figure 1-6.
The officially supported GPU for this specific machine is the NVIDIA Tesla K40, which is based on the Kepler architecture. You can see this card in Figure 1-7 on page 6.
NVIDIA CUDA
NVIDIA CUDA is a parallel computing platform and programming model that enables a
dramatic increase in computing performance by offloading compute operations to the GPU
card.
You can find details about the CUDA stack on POWER at the CUDA Zone:
https://developer.nvidia.com/cuda-tools-ecosystem
Figure 1-7 NVIDIA Tesla adapter
8335-GCA
The Power S822LC (8335-GCA) server supports two POWER8 processor sockets offering
either eight 3.32 GHz cores (#EP00) or ten 2.92 GHz cores (#EP01) per socket.
8335-GTA
The Power S822LC (8335-GTA) server supports two POWER8 processor sockets offering
either eight 3.32 GHz cores (#EP00) or ten 2.92 GHz cores (#EP01) per socket. The
8335-GTA model also includes two NVIDIA K80 GPUs (#EC49).
For more information about the S812LC server, see IBM Power System S812LC Technical
Overview and Introduction, REDP-5284:
http://www.redbooks.ibm.com/abstracts/redp5284.html?Open
Supported firmware
Table 1-1 shows the minimum firmware version required to support IBM PowerKVM V3.1 in
each machine model.
Table 1-1 Minimum firmware level per machine model
Machine model   Minimum firmware level
8247-21L        FW840
8247-22L        FW840
8247-42L        FW840
8348-21C        OP810
8335-GTA        OP810
8335-GCA        OP810
With the introduction of the Linux-only scale-out systems with POWER8 technology, a new virtualization mechanism is supported on Power Systems. This mechanism is known as the kernel-based virtual machine (KVM), and the port for Power Systems is called PowerKVM. You cannot run the PowerVM and PowerKVM hypervisors on the same machine at the same time.
KVM is widely regarded as the de facto open source virtualization mechanism and is currently used by many software companies.
IBM PowerKVM is a product that combines Power resilience and performance with the openness of KVM, which provides several advantages:
Higher workload consolidation with processor overcommitment and memory sharing
Dynamic addition and removal of virtual devices
Micro-Threading scheduling granularity
Integration with IBM PowerVC, the IBM Cloud Manager with OpenStack, and native
OpenStack
Simplified management using open source software
Avoids vendor lock-in
Uses POWER8 hardware features, such as SMT8 and Micro-Threading
NPIV support (technology preview)
Docker support (technology preview)
NVIDIA pass-through (technology preview)
For more information about IBM PowerKVM capabilities, check section 1.3.2, “KVM” on
page 14.
Note: The IBM PowerVM and IBM PowerKVM hypervisors cannot be active at the same
time on the same server.
In a POWER8 processor configured with SMT 8, up to 96 threads are available per socket,
and each of them is represented as a processor in the Linux operating system.
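As a quick illustration (a sketch, assuming the host has the ppc64_cpu utility from the powerpc-utils package installed), you can inspect and change the SMT mode from the PowerKVM host shell:
# ppc64_cpu --smt
# ppc64_cpu --smt=8
# lscpu | grep -i thread
The first command prints the current SMT mode, the second enables all eight threads per core, and lscpu shows how many threads per core Linux sees.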
On the PowerKVM supported machines, there are up to three different memory nodes, and each of them experiences different performance when accessing the other memory nodes.
Figure 1-10 on page 9 shows a two-socket server with a processor accessing local memory
and another accessing remote memory.
PowerKVM is aware of the NUMA topology of the virtual machines, and tuning memory access can improve system performance.
For more information about memory tuning, see 5.3.4, “Configuring NUMA” on page 141.
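As a minimal sketch of such tuning (the node number 0 is only an example and not a value from this book), a guest can be bound to one NUMA node through the numatune element in its libvirt domain XML:
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>
With mode='strict', the guest memory is allocated only from the listed host node, which avoids remote memory access at the cost of flexibility.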
Figure 1-10 Local and remote memory access in a two-socket POWER8 server (each POWER8 core reaches the memory blocks behind its own memory controllers locally and the memory behind the other socket remotely)
1.1.6 Micro-Threading
Micro-Threading is a POWER8 feature that enables a POWER8 core to be split into as many as two or four subcores. This gives PowerKVM the capacity to support more than one virtual machine per core. Using Micro-Threading has many advantages when a virtual machine does not have enough workload to use a whole core. Up to four guests can run in parallel on the core.
Figure 1-11 shows the threads and the subcores on a POWER8 core when Micro-Threading
is enabled and configured to support four subcores per core.
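As a sketch (assuming the ppc64_cpu utility from powerpc-utils is available on the host and the change is made while no guests are running), the subcore mode can be inspected and set from the host shell:
# ppc64_cpu --subcores-per-core
# ppc64_cpu --subcores-per-core=4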
Note: Some servers might have additional unique RAS features, and servers such as the LC servers use industry-standard RAS features. It is best to check the features on the
server that you intend to deploy.
For more information about the Power Systems RAS features, see IBM Power Systems
S812L and S822L Technical Overview and Introduction, REDP-5098.
http://www.redbooks.ibm.com/abstracts/redp5098.html?Open
1.2 Virtualization
For practical purposes, this publication focuses on server virtualization, especially ways to run
an operating system inside a virtual machine and how this virtual machine acts. There are
many advantages when the operating system runs in a virtual machine rather than in a real
machine, as later sections of this book explain.
Notes: The terms virtual machine and guest are used interchangeably in this book.
1.2.2 PowerKVM Version 3.1 considerations
There are several new considerations in the PowerKVM V3.1 release:
PowerKVM Version 3.1 is based on the POWER Application Binary Interface (ABI) version 2. The previous version was based on ABI version 1. ABI version 1 stores data in big endian byte order, while ABI version 2 uses little endian.
PowerKVM does not support IBM AIX or IBM i operating systems.
PowerKVM cannot be managed by the Hardware Management Console (HMC).
The SPICE graphic model is not supported.
PowerKVM supports a subset of the I/O adapters. (As PowerKVM is developed, the adapter support continually changes.)
Only one operating system is allowed to run on the host system. PowerKVM does not
provide multiboot support.
Number of para-virtualized devices: 32 PCI device slots per virtual machine and 8 PCI functions per device slot
Note: Make sure that you have registered your system and have your customer number handy.
The update, debuginfo, and source images are available from the IBM Fix Central website:
https://www.ibm.com/support/fixcentral
After you download the full installation image from the Entitled Systems Support (ESS) page, you can use it to install PowerKVM from a NetBoot server or burn it to a DVD to perform a local installation. Both
methods are covered in Chapter 2, “Host installation and configuration” on page 31.
1.3.1 QEMU
QEMU is open source software that hosts the virtual machines on a KVM hypervisor. It is the software that manages and monitors the virtual machines and performs the following basic operations:
Create virtual disk images
Change the state of a virtual machine:
– Start a virtual machine
– Stop a virtual machine
– Suspend a virtual machine
– Resume a virtual machine
– Take and restore snapshots
– Delete a virtual machine
Handle the I/O between guests and the hypervisor
Migrate virtual machines
In a simplified view, you can consider QEMU the user space tool that handles virtualization and KVM the kernel space module.
QEMU can also work as an emulator, but that situation is not covered in this book.
QEMU monitor
QEMU provides a virtual machine monitor that helps control the virtual machine and performs most of the operations that a virtual machine requires. The monitor can inspect low-level virtual machine state, such as CPU registers and I/O device states, eject a virtual CD, and perform many other operations.
You can use the command shown in Figure 1-13 to see the block devices that are attached to
a QEMU image.
Figure 1-13 QEMU monitor window example
To run a QEMU monitor by using the virsh command, use the following parameters:
# virsh qemu-monitor-command --hmp <domain> <monitor command>
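For example, assuming a guest named guest01 (a hypothetical domain name), the following command lists the block devices attached to it through the monitor, similar to what Figure 1-13 shows:
# virsh qemu-monitor-command --hmp guest01 info block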
1.3.2 KVM
A kernel-based virtual machine (KVM) is a part of open source virtualization infrastructure
that turns the Linux kernel into an enterprise-class hypervisor.
QEMU is another part of this infrastructure, and KVM usually refers to the QEMU and KVM stack of software. Throughout this publication, KVM is used to mean the whole infrastructure on the Linux operating system that turns it into a usable hypervisor.
The whole stack used to enable this infrastructure is described in 1.3, “Software stack” on
page 13.
KVM performance
Because KVM is a very thin layer over the firmware, it can deliver enterprise-grade performance to the virtual machines and can consolidate a huge amount of work on a single server. One of the important advantages of virtualization is the possibility of using resource overcommitment.
Resource overcommitment
Overcommitment is a mechanism to expose more CPU, I/O, and memory to the guest machines than exist on the real server, thereby increasing resource use and improving server consolidation.
SPEC performance
KVM is designed to deliver the best performance on virtualization. There are
virtualization-specific benchmarks. Possibly the most important one at the moment is called
SPECvirt, which is part of the Standard Performance Evaluation Corporation (SPEC) group.
SPECvirt is a benchmark that addresses performance evaluation of data center servers that
are used in virtualized server consolidation. It measures performance of all of the important
components in a virtualized environment, from the hypervisor to the application running in the
guest operating system.
For more information about the benchmark and the KVM results, see the SPEC web page:
http://www.spec.org/virt_sc2013
1.3.3 Open Power Abstraction Layer
OPAL is the part of the firmware that interacts with the hardware and exposes it to the PowerKVM hypervisor. It comes in several parts, and this is an example of how it works on an OpenPower server:
1. The baseboard management controller (BMC) is responsible for powering on the system.
2. The BMC starts to boot each chip individually by using the Self Boot Engine (SBE).
3. When the processors are started, the BMC calls the Flexible Service Interface (FSI), which is the primary service interface in the POWER8 processor. There is a direct connection between the BMC and the FSI called Low Pin Count (LPC).
4. After that, the hostboot firmware IPLs the system, using a secondary power-on sequence called Digital Power System Sweep (DPSS).
5. At that time, the hostboot firmware loads OPAL and moves all the CPUs to their starting point.
OPAL development is done in the open on GitHub. You can clone the code from the following repository:
https://github.com/open-power
There are also other community Linux distributions that technically run as guests on
PowerKVM, such as:
Debian
http://www.debian.org
Fedora
https://getfedora.org
OpenSuse
http://www.opensuse.org
CentOS
https://www.centos.org/
1.3.5 Libvirt software
Libvirt software is the open source infrastructure that provides the low-level virtualization capabilities in most of the available hypervisors, including KVM, Xen, VMware, and IBM PowerVM. The purpose of libvirt is to provide a friendlier environment for users.
Libvirt provides different ways of access, from a command line called virsh to a low-level API
for many programming languages.
The main component of the libvirt software is the libvirtd daemon. This is the component that
interacts directly with QEMU or the KVM software.
This book covers only the command-line interface. See Chapter 4, “Managing guests from the command-line interface” on page 105. There are many command-line tools to handle the virtual machines, such as virsh, guestfish, virt-df, virt-clone, and virt-image.
For more Libvirt development information, see Chapter 8, “PowerKVM Development Kit” on
page 227 or check the online documentation:
http://libvirt.org
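As a minimal sketch of what the libvirt API looks like from Python (the connection URI qemu:///system targets the local hypervisor; see Chapter 8 for complete examples):
import libvirt

# Open a connection to the local PowerKVM hypervisor through libvirtd
conn = libvirt.open('qemu:///system')

# Print the name of every defined guest, running or not
for dom in conn.listAllDomains():
    print(dom.name())

conn.close()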
Figure 1-14 Virsh and libvirt architecture
For more information about virsh, see Chapter 4, “Managing guests from the command-line
interface” on page 105.
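As a short illustration (guest01 is a hypothetical guest name), these are some of the virsh subcommands used throughout this book:
# virsh list --all          List all defined guests and their states
# virsh start guest01       Start the guest named guest01
# virsh console guest01     Attach to the guest text console
# virsh shutdown guest01    Request a clean shutdown of the guest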
1.3.7 Intelligent Platform Management Interface
On S8nnL models, the IPMI server is hosted in the service processor controller. On S8nnLC
models, it is hosted in the BMC. This means that the commands directed to the server should
use the service processor IP address, not the hypervisor IP address.
These are some of the IPMI tools that work with PowerKVM:
OpenIPMI
FreeIPMI
IPMItool
Note: The IPMI protocol is based on UDP, which means that datagrams might be lost. Prefer to use it over lossless network routes.
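As a sketch (the service processor address 192.0.2.1 and the ADMIN user are placeholders; use the credentials set on your system), ipmitool can query the power state and open the network console:
# ipmitool -I lanplus -H 192.0.2.1 -U ADMIN -P yourpassword chassis power status
# ipmitool -I lanplus -H 192.0.2.1 -U ADMIN -P yourpassword sol activate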
1.3.8 Petitboot
Petitboot is an open source, platform-independent boot loader based on Linux. It is part of the PowerKVM hypervisor stack and is used to boot the hypervisor operating system.
Petitboot includes graphical and command-line interfaces, and can be used as a client or
server boot loader. For this document, only the basic use is covered.
For more information about this software, check the Petitboot web page:
https://www.kernel.org/pub/linux/kernel/people/geoff/petitboot/petitboot.html
The Petitboot source code is also publicly available in the OpenPower GitHub repository:
https://github.com/open-power/petitboot
1.3.9 Kimchi
Kimchi is a local web management tool meant to manage a few guests virtualized with
PowerKVM. Kimchi has been integrated into PowerKVM and allows initial host configuration,
as well as managing virtual machines using the web browser through an HTML5 interface.
The main goal of Kimchi is to provide a friendly user interface for PowerKVM, allowing users to operate the server by using a browser most of the time. These are some of the other Kimchi
features:
Firmware update
Backup of the configuration
Host monitoring
Virtual machine templates
VM guest console
VM guest VNC
Boot and install from a data stream
For more information about the Kimchi project, see Chapter 3, “Managing hosts and guests
from a Web interface” on page 61 or the project web page:
http://github.com/kimchi-project/kimchi
1.3.10 Slimline Open Firmware
SLOF is a machine-independent firmware based on the IEEE 1275 standard, also known as the Open Firmware standard. It executes at boot time; after that, it is no longer needed by the system and is removed from memory. An abstraction of the SLOF architecture is shown in Figure 1-17.
Figure 1-17 Abstraction of the SLOF architecture (SLOF runs at boot time on top of QEMU/KVM, below the guest operating system)

1.3.11 Virtio drivers

(Figure: the Virtio stack. In the guest, the application, network stack, and network device driver communicate through the Virtio stack with the physical device emulation in the hypervisor.)
Many Virtio drivers are supported by QEMU. The main ones are provided in Table 1-5.
virtio-rng Virtual device driver that exposes a hardware random number generator to the guest
From the guest point of view, the drivers need to be loaded in the operating system.
Note: ibmveth and ibmvscsi are also paravirtualized drivers.
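A quick way to confirm from inside a Linux guest that the paravirtualized drivers are loaded is to list the kernel modules (the module names, such as virtio_net and virtio_blk, can vary by distribution and kernel build):
# lsmod | grep virtio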
1.4 Docker
Docker is a software application that creates an operating system abstraction layer that is able to run independent, isolated containers on top of it. This technology creates an environment similar to virtualization and is usually called a lightweight hypervisor. Docker is based on three major underlying Linux features: namespaces, change root (chroot), and control groups (cgroups).
Docker also provides an official image repository with plenty of ready-to-use and freely distributed container images. Downloading one of these images takes no more than a single Docker command.
Linux namespace
A Linux namespace is an isolation concept where each namespace has the illusion that it has control of the whole system and cannot see the content of any other namespace. There are several types of namespaces. Basically, namespaces exist for all major subsystems, such as processes, users, IPC, network, mount tables, and so on.
Different processes with the same PID can exist in different namespaces. This is what allows different namespaces to run different initialization processes, such as a traditional System V init.d boot process, as depicted in namespace A, and a systemd boot process, as in namespace B.
(Figure: process namespaces. Namespace A sees only its own processes, init as PID 1 and Apache as PID 2, while the global process list also shows namespace B running systemd as PID 1, Nginx, and sshd, together with the mount table: / on /dev/sda and /home on /dev/sdb.)
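As a minimal sketch of a PID namespace, the unshare tool from util-linux (assumed to be installed) starts a shell that sees itself as PID 1:
# unshare --fork --pid --mount-proc /bin/bash
# ps aux
Inside the new namespace, ps shows only the new shell and its children, with the shell running as PID 1.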
Change root
Change root is a Linux feature that changes the file system root directory for a specific process, so the directories that the process can reach are restricted. It is similar to a process namespace, but aimed at file system restriction. This feature also gives the process the illusion that it sees the whole file system and can access any directory, but that is not real: the access happens inside a jail, and the process is not allowed to access any file outside of that jail.
Figure 1-20 shows an example of the change root feature and the jail environment. On the bottom, you see the jail environment. It has a completely new file system hierarchy inside a normal directory. The chrooted process is only able to access files and directories inside that jail. The chrooted process has the illusion that its root directory is the system root directory, but it is, in fact, a simple subdirectory.
Figure 1-20 Change root jail: the /chroot subdirectory contains its own file system hierarchy (/bin, /boot, /etc, /home, /usr), which the chrooted process sees as its root directory /
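As a minimal sketch (the /srv/jail path is hypothetical, and the shared libraries reported by ldd /bin/bash must also be copied into the jail for the shell to start):
# mkdir -p /srv/jail/bin /srv/jail/lib /srv/jail/lib64
# cp /bin/bash /srv/jail/bin/
# chroot /srv/jail /bin/bash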
Control groups
Control groups is a feature that accounts for and limits resource utilization for a specific set of processes. This is the kernel feature that prevents each container from using more memory, CPU, and I/O than the hypervisor administrator specifies.
Besides limiting the resources for a set of processes, control groups are also responsible for prioritizing requests for CPU utilization, I/O throughput, and memory bandwidth.
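As a sketch of how the Docker command line exposes these limits (the image name ubuntu is only an example), a container can be capped to 512 MB of memory and a reduced CPU share:
# docker run -it -m 512m --cpu-shares 512 ubuntu /bin/bash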
Another important feature provided by control groups is the ability to checkpoint and restart a set of userspace applications, allowing a process (or container) to be stopped on one machine and restarted on another. This is the concept behind container migration.
Figure 1-21 depicts a situation where two Docker containers and two virtual machines are
running at the same time.
Figure 1-21 Containers and virtual images on a POWER server
Although a container and a virtual machine can coexist in the same environment, they are managed by using different tool sets, and there is no simple way to see them together. The Libvirt community started a project to manage virtual machines and Docker containers in a libvirt extension named libvirt-lxc, but it is not production ready yet. For now, virtual machines are managed by Kimchi or libvirt, while containers are managed by the Docker command line.
In addition, a container must run using the hypervisor kernel, in this case a PowerKVM kernel. Hence, a container cannot have a different, unique kernel, as virtual machines can. The container architecture allows the container to run its userspace stack in a restricted environment on top of the PowerKVM kernel.
QEMU and KVM allow a full new operating system instantiation, allowing any kind of kernel, and thus any software stack, to run on top of emulated or virtualized hardware, as shown in Figure 1-22.
Figure 1-22 Container and guest OS stack
Container technology has less overhead, but also less software flexibility, when compared to full virtualization.
1.4.2 Docker hub
Docker hub can be accessed directly from the Docker application. To search for an image,
you can use the docker search command. If you want to download the image, you can use
docker pull. To upload an image, the command docker push is used.
For more information about these commands, see 7.6.4, “Uploading your image to Docker
hub” on page 222.
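As a short illustration (the image and account names are placeholders), the three commands look like this:
# docker search httpd
# docker pull <repository>/<image>
# docker push <your-account>/<image>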
1.4.3 Docker file
Figure 1-23 shows the process of creating a Docker image from a Docker file. When the Docker file is built into an image, you can start that image, and in this case you have a unique container. Multiple containers can point to the same image, but each container forks the original image and creates its own forked image.
(Figure 1-23: a Docker file containing directives such as FROM ubuntu:14.04, RUN apt-get update && apt-get install httpd, and ADD /root/www/html /var/www/html is built into a Docker image.)
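As a sketch (the tag my-web is arbitrary), building the image from a directory that contains the Docker file and starting a container from it looks like this:
# docker build -t my-web .
# docker run -d --name my-web-1 my-web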
1.5.1 PowerVM and PowerKVM features
Table 1-7 compares IBM PowerVM and IBM PowerKVM features.
Feature              IBM PowerVM                    IBM PowerKVM
Supported machines   All non-LC IBM Power Systems   IBM scale-out systems only
ABI version          1                              2
1.6 Terminology
Table 1-9 lists the terms used for PowerKVM and the counterpart terms for KVM on x86.
PowerKVM                                   KVM on x86                                              PowerVM
qcow2, raw, nbd, and other image formats   qcow2, raw, nbd, and other image formats                Proprietary image formats
KVM host user space (QEMU)                 KVM host user space (QEMU)                              Virtual I/O Server (VIOS)
Open Power Abstraction Layer (OPAL)        Unified Extensible Firmware Interface (UEFI) and BIOS   PowerVM hypervisor driver (pHyp)
Chapter 2. Host installation and configuration
After reading this chapter, you should be able to perform the following tasks:
Install PowerKVM from a local DVD media
Automate installations over a network
Reinstall PowerKVM
Migrate a host from an older release
Configure an installed system
PowerKVM supports physical DVD media and NetBoot installation methods. The OPAL firmware uses Petitboot, a kexec-based boot loader that is capable of loading a kernel and initrd from any Linux-mountable file system.
PowerKVM is available in ISO format only. For a DVD installation, burn the ISO image file to DVD media. To set up a NetBoot installation, you must extract the NetBoot files from the ISO image on the boot server, as shown in Figure 2-18 on page 42.
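As a rough sketch of one way to extract those files (the ISO file name, the mount point, and the path of the kernel and initrd inside the image are assumptions; check the image that you downloaded):
# mount -o loop ibm-powerkvm.iso /mnt/powerkvm
# cp /mnt/powerkvm/ppc/ppc64le/vmlinuz /var/lib/tftpboot/
# cp /mnt/powerkvm/ppc/ppc64le/initrd.img /var/lib/tftpboot/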
Note: PowerKVM cannot be installed on a SAN disk, such as a Fibre Channel connected
device. However, SAN disks can be used as storage backing for guest images and other
file systems.
Important: For scale-out Power Systems, the IPMI network console must be enabled
with a password on the ASM interface. See 2.4.1, “Console configuration for Scale-out
Power Systems” on page 58 for more information.
For the Power LC Systems, the default password to connect to the system is admin.
Note: Petitboot starts the installer automatically after a 10-second timeout. You can
also change or disable the automatic boot on a persistent basis by changing the
Autoboot option on the System configuration menu.
5. Choose the language. English, as shown in Figure 2-3, is used for this installation.
Tip: The PowerKVM installer allows you to switch between panels during the installation process. For example, if you want to switch to the shell panel, press Ctrl+B and then type 2.
Note: If the installer detects an existing PowerKVM system on one of the disks, it also
offers to do an Install over existing IBM PowerKVM, which preserves the guest images.
For more information about reinstalling PowerKVM, see 2.3, “Install over existing IBM
PowerKVM and host migration” on page 52.
The option Install over existing IBM PowerKVM is for migrating from PowerKVM 2.1 to
PowerKVM V3.1. For more information about migrating PowerKVM, also refer to section
2.3, “Install over existing IBM PowerKVM and host migration” on page 52.
8. In the next step, as shown in Figure 2-6, the name of the volume group can be defined. The default is ibmpkvm_vg_root. Changing the name of the volume group can be useful, for example, if you want to install several instances of PowerKVM on different disks inside one system for demo purposes.
Figure 2-6 Change the name of the volume group and the size of the volumes if needed
10.Set the time zone for the system. Select the indicated box if the system clock uses UTC
(Figure 2-8).
12.Set the date and time. Figure 2-10 shows an example when this publication was written.
14.You can edit network devices by selecting the Edit option, as shown in Figure 2-12.
18.A progress bar is displayed while installing files into the selected device. The example in
Figure 2-16 shows a progress bar status of 25% complete.
20.The system reboots and automatically loads PowerKVM from the installed device, as
shown in Figure 2-18.
Figure 2-18 PowerKVM is automatically loaded from the installed device after reboot
The following are the requirements for preparing the PowerKVM netboot process:
PowerKVM ISO image
DHCP server in the same subnet as the target machine
HTTP (or FTP) and TFTP server that can be reached from the target machine. From now
on, we refer to this server as the netboot server
2. You see the Petitboot Config Retrieval window, as shown in Figure 2-20. Enter your
remote location in Configuration URL, and select OK.
Select Install PowerKVM 3.1.0 and the system boots using the parameters you specified.
The netboot process can load files from the network by using different protocols (Table 2-1).
Task                            Boot parameter        Supported protocols
Load kernel and initrd          kernel location,      HTTP, FTP, and TFTP
                                initrd location
Load minimal root file system   root=live:location    HTTP and FTP (compressed read-only file system image); NFS (ISO image)
The sections ahead describe how to configure the network services HTTP, DHCP, and TFTP
on the netboot server.
Note: You must set file permissions and firewall rules to allow incoming requests to the
DHCP, HTTP, or FTP services that you are configuring. These settings depend on the
Linux distribution that you are running.
Note: The DHCP server needs to be configured in the same subnetwork that the target
system boots. Otherwise, the target system will not be able to discover DHCP, and boot
through PXE will not work.
1. Configure the DHCP server to be capable of reading text files from TFTP server using the
conf-file option. Example 2-2 shows a sample configuration.
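As a rough sketch for ISC dhcpd (the subnet, the addresses, and the configuration file name are placeholders, and option code 209 is assumed to be the PXE configuration-file option that Petitboot requests):
option conf-file code 209 = text;
subnet 192.0.2.0 netmask 255.255.255.0 {
    range 192.0.2.100 192.0.2.200;
    option routers 192.0.2.254;
    next-server 192.0.2.2;
    option conf-file "powerkvm/pxe.conf";
}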
Note: Automatic installation can be executed only on the hvc0 console. That is the console
type supported by ipmitool and QEMU with the -nographic option.
The boot option kvmp.inst.auto supports HTTP, TFTP, and NFS protocols.
To download the kickstart file specified in the kvmp.inst.auto option, the installer needs to
have network access during boot. The network settings can be specified by using the
following parameters in the boot configuration file:
ifname=interface:mac
ip=ip:server-id:gateway:netmask:hostname:interface:none
ip=interface:dhcp
The server-id is rarely used so it is common to omit its value, for example:
ip=192.0.2.10::192.0.2.254:255.255.255.0:powerkvm-host:net0:none
If you specify a domain name instead of an IP address as the kickstart file location in the kvmp.inst.auto option (for example, server.example.com/pub/powerkvm.ks), you need to specify the name servers that are used to resolve server.example.com into an IP address:
nameserver=server
The nameserver parameter can be specified up to three times. Any additional occurrences of this parameter are ignored.
The following Example 2-5 shows the content of a sample Petitboot configuration file with
network settings and a kickstart file specified to perform an unattended installation.
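As a rough sketch of such a file (the server address, the file paths inside the ISO, and the MAC address are placeholders rather than values from the original example):
label PowerKVM 3.1 automated install
    kernel http://192.0.2.2/powerkvm/vmlinuz
    initrd http://192.0.2.2/powerkvm/initrd.img
    append root=live:http://192.0.2.2/powerkvm/LiveOS/squashfs.img kvmp.inst.auto=http://192.0.2.2/powerkvm.ks ifname=net0:00:11:22:33:44:55 ip=net0:dhcp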
In the kickstart example used here, disk /dev/sda is used for the installation. The LVM volume group VOL_GROUP is created. The LVM logical volume / has a size of 100 GiB, and /var/log has 30 GiB. The logical volume /var/lib/libvirt/images is created automatically with the remaining space in the volume group VOL_GROUP.
You can also use more than one disk during installation. Just add multiple part lines and
update the volgroup line in your kickstart file, as shown in Example 2-7.
Example 2-7 Kickstart example using more than one disk for installation
part pv.01 --ondisk=/dev/sda
part pv.02 --ondisk=/dev/mapper/mpathb
volgroup VOL_GROUP pv.01 pv.02
...
Network (required)
The network option is used to configure the network settings of the target system. The settings take effect after the next reboot, when the target system is already installed and properly configured.
Note: The network settings that you specify in the kickstart file are for the target system. They are not meant to configure the network settings of the installer LiveDVD.
Example 2-10 shows how to configure an interface by using a static IP address. It also shows how to specify the netmask, gateway, and name server addresses.
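As a sketch of that kind of configuration (the option names follow common kickstart syntax, and the device name and addresses are placeholders):
network --bootproto=static --device=enP1p3s0f0 --ip=192.0.2.10 --netmask=255.255.255.0 --gateway=192.0.2.254 --nameserver=192.0.2.1 --hostname=powerkvm-host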
Root password
The password of user root can be in plain text or encrypted.
The following example shows how to specify the password in plain text (not encrypted):
rootpw plain-text-password
And the following example shows how to specify an encrypted password:
rootpw --iscrypted password-hash-in-sha512-format
You can generate a hash of your password by executing the following shell command:
echo -n "Your password" | sha512sum
Or by using the hashlib module from the Python standard library:
python -c "import hashlib; print(hashlib.sha512(b'Your password').hexdigest())"
rootpw topsecret
%post
your shell commands to be executed after installation
%end
On the installed system, you can find the files from the automated installation:
Log files: /var/log/powerkvm/*.log
You can change the predictable name by using the ifname option in the boot configuration file,
pxe.conf. Typically, you just need to specify the hardware address of the interface to be
renamed, for example:
ifname=new-nic-name:mac-address
Suppose that your interface hardware address is 00:11:22:33:44:55 and you want to refer to
it as lan0. Just add the following to your boot line arguments:
ifname=lan0:00:11:22:33:44:55 ip=lan0:dhcp
After the system is installed, the interface lan0 exists with the configuration that you specified
in the kickstart file.
If a PowerKVM 3.1.0 instance already exists, reinstallation can be used as a rescue method for reinstalling the host without losing the guest disk images.
It is a preferred practice to save the current libvirt configuration files so that you can convert them to the new syntax and have your guests up and running after the host is completely migrated. To save the libvirt configuration files, you can use the following commands:
mkdir -p /var/lib/libvirt/images/backup
tar cvzf /var/lib/libvirt/images/backup/libvirt-xml-files.tar.gz /etc/libvirt
After that, place libvirt-xml-files.tar.gz in a safe storage media and you are ready to
start the migration process.
In the installation process, when the installer detects a previous instance of PowerKVM, it
offers the option Install over existing IBM PowerKVM, as shown in Figure 2-22. The guest
images on the data partition are preserved. Only the root, swap, and boot partitions of the
host are erased and reinstalled.
Figure 2-22 Install over existing IBM PowerKVM option displayed in the installer menu
After this point, the installer steps are much the same as described in section 2.1, “Host
installation” on page 32.
After the installation of PowerKVM is completed, you can convert the libvirt configuration files to the new syntax used in PowerKVM V3.1.0. The following steps apply only to converting the libvirt configuration file syntax from PowerKVM version 2.1.1.3 to 3.1.0. If you were already running version 3.1.0, you can skip them:
1. Install the powerkvm-xml-toolkit package. If it is not installed by default, use the following
command:
yum install powerkvm-xml-toolkit
2. Extract the saved libvirt configuration files by running:
tar xvf /var/lib/libvirt/images/backup/libvirt-xml-files.tar.gz -C /
3. Run the xml-toolkit.sh script to convert all libvirt configuration files.
For each converted file, a backup file is created with the .bak extension. For additional help, run xml-toolkit.sh -h.
At this point, your host and guests are migrated to the new PowerKVM version 3.1.0.
2.4 Configuration
This section describes how to use the ibm-configure-system tool that is included in the
PowerKVM installation to change some important settings of the machine.
You can use the ibm-configure-system tool to perform the following maintenance tasks on
the installed system:
Reset the root password
Select the time zone
Set the date and time
Configure the network
Configure the DNS
To execute the configuration tool, from the root shell, run this command:
# ibm-configure-system
Timezone selection
By selecting Timezone selection in the ibm-configure-system tool, you can adjust the time zone, as shown in Figure 2-26.
Tip: Use the ibm-configure-system tool if the network is not yet configured. All of the
network scripts and settings are automatically changed in the system.
Note: The network can be also changed by editing the ifcfg-* files under
/etc/sysconfig/network-scripts followed by an ifdown and ifup command for the
changed interface.
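As a rough sketch of that manual approach, assuming a static address on an interface named enP3p9s0f0 (the interface name and addresses are placeholders for your environment):
# cat /etc/sysconfig/network-scripts/ifcfg-enP3p9s0f0
DEVICE=enP3p9s0f0
BOOTPROTO=static
IPADDR=192.168.122.10
NETMASK=255.255.255.0
GATEWAY=192.168.122.1
ONBOOT=yes
# ifdown enP3p9s0f0; ifup enP3p9s0f0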
Note: To enable the IPMI console for the first time, reset the service processor after setting
the IPMI console password. Expand System Service Aids on the left panel and click Reset
Service Processor.
To connect to the server for the first time, you have two possibilities:
1. Connect with IPMI using an IP address issued by a DHCP server
DHCP is the default network setup for Power LC Systems. If you are using a DHCP server and know the IP address that your system was assigned, continue powering on your system with IPMI as described in section 2.1, “Host installation” on page 32.
If you do not know the IP address or plan to use a static IP address, you must connect to
your system by using a serial console session or using an ASCII terminal.
2. Connect using a serial console
If you are using a serial console, follow these steps:
a. Attach the serial-to-RJ-45 cable to the serial port on the Power system.
b. Attach the USB connector to a USB port on your PC or notebook.
c. Open a terminal emulator program such as PuTTY or minicom, as shown in the example after this list.
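For example, assuming that the serial adapter appears on the PC as /dev/ttyUSB0 (the device name and speed are assumptions for illustration), minicom can be started as follows:
# minicom -D /dev/ttyUSB0 -b 115200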
First, we introduce Kimchi, a web-based management tool. Kimchi isolates the administrator
from the task of remembering command-line syntax. But, as seasoned administrators know,
sometimes it is best to study the commands and learn their capabilities to better understand
what happens in the graphical user interface (GUI). For that, we recommend a quick scan of
Chapter 4, “Managing guests from the command-line interface” on page 105.
Several improvements were made in Kimchi for this new PowerKVM release:
Virtual NIC hot-plug support
Upload file to storage pool
Make template defaults configurable through a file
Guest pause/resume support
Support to edit guest MAC address
Allow users to change the guest disk format at the template level
Create guests asynchronously
Bugfixes
Kimchi uses Pluggable Authentication Modules (PAMs) for user authentication. This means that any user account registered on the host, including the root account, can access Kimchi. Users without administration privileges who log in to Kimchi have read-only access. However, the administrator can authorize users to access their own guests. See 3.6.2, “Guest management” on page 86 to learn how to configure a guest.
Note: We suggest avoiding the use of the root account on Kimchi and creating specific
accounts that provide the needed administration privileges.
Using the same menu, it is possible to find out the installed Kimchi version and to safely log
out of Kimchi.
Figure 3-5 on page 65 shows the host system statistics in an easy-to-read graphic pane with
the following fields:
CPU
Memory
Disk I/O
Network I/O
Figure 3-6 shows the software update system and repository manager. The software update system lists all of the packages that are available for update. Click Update All to update the whole system with a single click.
The repository manager allows any repository installed in the system to be enabled, disabled,
or removed. It is also possible to add a new repository and to edit an existing one.
Based on Figure 3-6 on page 65, click Add, type a unique identifier in the Identifier field,
complete the Name field, type the repository path in the URL field, select Repository is a
mirror if necessary, then, click Add.
Kimchi checks whether the given repository is valid before adding it. Note that yum variables
can be used. Kimchi knows how to expand them.
Figure 3-8 shows how to generate an SOS debug report by using Kimchi Debug Reports.
Click Generate to open the debug report dialog box.
Figure 3-10 shows all debug reports listed in Kimchi. It is possible to rename, remove, and
download the report.
Click Add in the Storage tab, as shown in Figure 3-11. Type the name in the Storage Pool
Name field, select DIR as the Storage Pool Type, enter the path in the Storage Path field, and
then, click Create.
Type the name of the storage pool in the Storage Pool Name field, select NFS as the
Storage Pool Type, enter the NFS Server IP field, enter the remote directory in the NFS Path
field, and click Create.
To create the iSCSI storage pool, enter the Storage Pool Name field, select iSCSI as the
Storage Pool Type, enter the IP address in the iSCSI Server field, enter the iSCSI target in the
Target field, and click Create.
To use authentication, select Add iSCSI Authentication, and type the user name and
password on the respective fields before clicking Create.
Figure 3-17 shows a template that uses an iSCSI volume. In contrast to other storage pools, when an iSCSI or Fibre Channel storage pool is used, the guest is created by using a template that points to that specific volume.
To create an LVM storage pool, enter the name in the Storage Pool Name field, select
LOGICAL as the Storage Pool Type, select the physical devices for the volume group, and
then click Create.
3.4 Network
In Kimchi, you can create a NAT network, a bridged network, or an isolated network. Section
6.2, “Network virtualization” on page 164, describes the differences between NAT and bridge
networks.
Figure 3-20 shows the Network tab in Kimchi, where a default NAT network can be found.
New networks can be created by clicking Add.
Note: A new guest cannot be created by using a template configured with a network that is
stopped. Make sure that the network is started when creating and running guests.
Note: The network interface used as the destination of a bridge must be configured and
set up on the host.
3.5 Templates
In Kimchi, a template is a set of basic parameters necessary to create new guests. It is designed to store configuration details so that multiple guests that share that configuration can be created efficiently. Some of the parameters contained in a template are listed:
Path to the operating system (local or remote)
Number of CPUs
CPU topology
Memory size
Disk size
Storage pool to be used
Networks to be used
Figure 3-26 shows the media source dialog box. Click Local ISO Image.
Figure 3-27 shows the ISOs that are available. Pick the operating system and click Create
Templates from Selected ISO.
Figure 3-29 shows the Edit Template dialog box. In the General tab, it is possible to change
the template name, the amount of memory, and the graphics (currently only VNC is
supported).
Processor tab
Figure 3-32 shows the Processor tab. The number of virtual CPUs can be set directly in the
CPU Number field.
Based on Figure 3-25 on page 80, click Add for a new template and select Local Image File,
as shown in Figure 3-34.
When a new template is created, new guests created from that template start in the same state that the original disk image was in when the template was created.
3.6 Guests
Managing guests involves creating or editing guests from a template, and then starting or stopping them as necessary.
In the action box, users can control the guest. The actions that are available depend on the
current guest state. When stopped, users can access a group of functions according to that
state. To start the guest system, click Start or select Actions, then select Start, as shown in
Figure 3-38 on page 87.
Figure 3-41 shows how to attach an existing disk image to a particular guest.
Note: The MAC address is chosen automatically if the MAC address field is left blank.
Figure 3-43 shows how to create a new interface. To persist the changes, click Save at the
right of the MAC address field. It is possible to undo the changes by clicking Undo.
Click the pen icon to change the MAC address of any interface. To delete a network interface,
click the trash can.
Permission tab
Figure 3-44 on page 90 shows the Permission tab, where the Kimchi administrator grants permissions for a particular guest to users or groups. Such control allows users to access only their own guests. Select the user or the group from Available system users and groups and click the right arrow. To undo the process, select the users from Selected system users and groups, and click the left arrow.
Note: Refer to “User Management tool” on page 102 to learn more about user roles in Kimchi.
Note: The Host PCI Device tab does not list devices that are currently attached to another
guest. The device must be detached before becoming available.
Note: If a multifunction device is selected, all of its functions are automatically attached to
or detached from the guest.
Snapshot tab
Figure 3-46 on page 91 shows the Snapshot tab in Kimchi. Snapshot is an important feature that stores the disk state at the moment the snapshot is taken. Multiple snapshots can be taken by clicking Add.
Note: Reverting a guest to a particular snapshot discards any data modified after the snapshot was created.
Click the Livetile box or select Actions and click Connect to open the guest window in a new
browser tab to install your OS.
Storage hotplug
It is possible to attach a disk image to a running guest. Example 3-1 displays all disks attached to a particular guest.
Figure 3-48 shows how to add a storage device. Based on Figure 3-47 on page 91, click Edit,
then select the Storage tab, select the Device Type, then select the storage pool where the
image is allocated, select the image, and click Attach.
Example 3-2 lists all disks in a guest, including the attached disks.
Figure 3-49 shows how to add a new network interface. Click Add, select the Network, type
the MAC address in the MAC Address field or leave it blank for automatic fill, and click Save.
Example 3-4 shows the new interface listed by the ip addr command.
PCI hotplug
To hotplug a PCI device with Kimchi, open the Host PCI Device tab, click Add, and the device
will be automatically detached from the host to be attached to the guest.
Figure 3-50 shows how to attach a PCI device to the guest by simply clicking Add. After it is
attached, the button becomes a minus sign.
Example 3-6 shows the device PCI-E IPR SAS Adapter attached to the guest.
3.7.1 noVNC
noVNC can be accessed either by clicking the preview image on the Guests tab or by clicking
Actions and then clicking Connect.
That opens a new browser tab or window that shows the graphical interface, which enables
the user to interact with the guest machine.
Note: Your browser needs to allow pop-up windows to get a noVNC window. If your
browser blocks pop-up windows, you will not be able to see the noVNC window.
3.7.2 VNC
You can get a graphical display of a guest by using VNC. Each guest uses a different TCP
port, 5900+N, where N is the display number.
Note: Remember to open the VNC ports on the server firewall. We also suggest setting a password for VNC access.
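For example, a guest on display 3 listens on TCP port 5903. Assuming that the host runs firewalld (an assumption; adjust to your firewall), the port can be opened and the display reached like this:
# firewall-cmd --permanent --add-port=5903/tcp
# firewall-cmd --reload
$ vncviewer powerkvm-host:3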
The guest XML files also allow you to add a keymap attribute, as shown in Example 3-9 for a German keyboard, but this might not work for all keys.
Supported key maps can be found under /usr/share/qemu/keymaps on the PowerKVM host.
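A minimal sketch of such a graphics definition with a German key map (the VNC port handling with autoport is illustrative):
<graphics type='vnc' port='-1' autoport='yes' keymap='de'/>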
Note: Accessing the guest by using the virsh console or a network connection does not have this issue with non-US keyboards.
3.8 Ginger
Ginger is a host management plug-in for Kimchi, so it shares the same user experience. In PowerKVM, Ginger is installed by default and can be accessed by clicking the Administration tab in Kimchi. All Ginger administration tools are listed below:
Firmware Update
Configuration Backup
Network Configuration
Power Options
SAN Adapters
Sensor Monitor
SEP Configuration
User Management
Firmware Update
Figure 3-53 shows how to update the firmware using Ginger. On the Administration tab, click
Firmware Update. Enter the path to the firmware in the Package Path field, and click Update.
Note: Ginger does not show detailed information if any error event happens. For a more
verbose firmware update, use the update_flash or ipmitool command directly.
Configuration Backup
This tool is designed to back up the host configuration files.
Figure 3-55 shows the configuration backup tool. Click Generate Default Backup to have the
following directories backed up automatically:
/etc
/var/spool/cron
Note: Generate Default Backup excludes /etc/init.d, /etc/rc.d, and /etc/rcN.d from
the backup package.
Figure 3-57 shows how to delete old backups. It is possible to preserve a given number of the latest backups or to preserve only backups created after a given date.
Figure 3-60 shows how to change a profile. Click default, for instance, then click Activate, and the system is tuned to that particular profile.
SEP Configuration
IBM Service Event Provider (SEP) is a service installed in PowerKVM to identify hardware
problems and send reports to registered listeners.
Figure 3-65 shows all listeners registered to listen to SNMP traps. It is possible to remove any
listener by clicking the trash can icon.
Figure 3-67 on page 103 shows how to add a new user on the host system.
Enter the user name in the User Name field, then enter the password and the password
confirmation in the Password and Confirm Password fields respectively, select Use Other to
edit the Group field if necessary, select the user Profile, and then click Submit.
The profile determines the user authorization level on the host system:
Kimchi User: A user created to access Kimchi only; this user cannot log in to the host system.
Virt User: A regular user added to the KVM group.
Administrator: A user added to the KVM group with administration privileges.
virsh # list
Id Name State
----------------------------------------------------
8 MyGuest running
It is also possible to log in to PowerKVM through a Secure Shell (SSH) and run virsh directly
on the target host.
virsh # list
Id Name State
----------------------------------------------------
8 MyGuest running
# virsh pool-list
Name State Autostart
-------------------------------------------
default active no
ISO active no
MyPool active no
NFS:
# mkdir /MyPoolNFS
# virsh pool-create-as MyPoolNFS netfs \
--source-host=9.57.139.73 \
--source-path=/var/www/powerkvm \
--target=/MyPoolNFS
Pool MyPoolNFS created
# virsh pool-list
Name State Autostart
-------------------------------------------
default active no
ISO active no
MyPool active no
MyPoolNFS active no
Note: The LVM2-based pool must be created with the pool-define-as command and later
built and activated.
# virsh pool-list
Name State Autostart
-------------------------------------------
default active no
ISO active no
MyPoolLVM active no
# vgs
VG #PV #LV #SN Attr VSize VFree
MyPoolLVM 1 0 0 wz--n- 931.51g 931.51g
iSCSI:
# virsh pool-list
Name State Autostart
-------------------------------------------
default active no
ISO active no
MyPoolISCSI active no
To list details of a specific storage pool, run pool-info, as in Example 4-8 on page 111.
Tip: To avoid using two commands, you can use the --details flag shown in Example 4-9.
In this example, a 30 GB volume named mypool.qcow2 is created in the pool named MyPool.
The --allocation argument instructs libvirt to allocate only 4 GB. The rest will be allocated
on demand. This is sometimes referred to as a thin volume.
If you need a more verbose output, use the --details flag as shown in Example 4-13.
The wiped volumes are empty and can be reused for another guest.
To remove a volume completely, use the vol-delete command, as shown in Example 4-15.
Note: This command deletes the volume and therefore the data on the volume.
4.2.6 Snapshots
Snapshots save the current machine state (disk, memory, and device states) to be used later. They are especially useful if a user is going to perform actions that can destroy data. In this case, the guest can be reverted to a snapshot taken before that destructive operation and continue working from that point.
Note: virsh snapshot-revert loses all changes made in the guest since the snapshot was
taken.
When MyNewSnapshot was deleted, its content was merged into NewChild. Figure 4-1
illustrates what happens to a child when its parent snapshot is deleted.
merge
new parent
The hypervisor network configuration is also described as an XML file. The files that describe
the host network configuration are stored in the /var/lib/libvirt/network directory.
The default configuration that is based on Network Address Translation (NAT) is called the
libvirt default network. After the PowerKVM installation, the default network is automatically
created and available for use.
When libvirt adds a network definition based on a bridge, the bridge is created
automatically. You can see it by using the brctl show command, as shown in Example 4-17.
To list detailed information about a specific network entry, use the net-info command, as
shown in Example 4-19.
In the next examples, we use the uuidgen command to create unique identifiers for those devices. However, libvirt can generate them automatically if the uuid parameter is omitted.
# cat nat.xml
<network>
<name>MyNat</name>
<uuid>a2c0da29-4e9c-452a-9b06-2bbe9d8f8f65</uuid>
<forward mode='nat'/>
<bridge name='myvirbr0' stp='on' delay='0'/>
<ip address='192.168.133.1' netmask='255.255.255.0'>
<dhcp>
<range start='192.168.133.2' end='192.168.133.50'/>
</dhcp>
</ip>
</network>
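A definition like this can then be registered and activated with the standard libvirt commands, where net-define makes the network persistent and net-start activates it:
# virsh net-define nat.xml
# virsh net-start MyNat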
# virsh net-list
Name State Autostart Persistent
----------------------------------------------------------
bridge active yes yes
default active yes yes
kop active yes yes
MyNat active no yes
# cat brdg.xml
<network>
<name>MyBridge</name>
<uuid>8ee4536b-c4d3-4e3e-a139-6108f3c2d5f5</uuid>
<forward dev='enP1p12s0f0' mode='bridge'>
<interface dev='enP1p12s0f0'/>
</forward>
</network>
# virsh net-list
Name State Autostart Persistent
----------------------------------------------------------
bridge active yes yes
default active yes yes
kop active yes yes
MyBridge active no yes
MyNat active no yes
# cat ovs.xml
<network>
<name>MyOVSBr</name>
<forward mode='bridge'/>
<bridge name='myOVS'/>
<virtualport type='openvswitch'/>
<portgroup name='default' default='yes'>
</portgroup>
</network>
# virsh net-list
Name State Autostart Persistent
----------------------------------------------------------
bridge active yes yes
default active yes yes
kop active yes yes
MyBridge active no yes
MyNat active no yes
MyOVSBr active no yes
Example 4-23 shows what the guest network interface looks like when using the Open
vSwitch bridge.
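As a hedged sketch, an interface that connects a guest to the MyOVSBr network defined above could be written as follows (the portgroup and model values are illustrative):
<interface type='network'>
<source network='MyOVSBr' portgroup='default'/>
<model type='virtio'/>
</interface>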
Each option has a set of suboptions. Only the most important are covered in this chapter.
The disk argument has the following suboptions, which are also shown in Example 4-25.
pool Specifies the pool name where you are provisioning the volume
size Specifies the volume size in GB
format Format of resulting image (raw | qcow2)
bus Bus used by the block device
Example 4-26 shows an example of attaching a volume from a Fibre Channel or iSCSI pool.
The virt-install command automatically starts the installation and attaches the console.
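As a hedged illustration of those suboptions, a complete invocation could look like the following (the guest name, sizes, and ISO path are placeholders; older virt-install versions use --ram instead of --memory):
# virt-install --name PowerKVM_VirtualMachine --memory 4096 --vcpus 8 \
--disk pool=MyPool,size=30,format=qcow2,bus=virtio \
--network network=default,model=virtio \
--cdrom /var/lib/libvirt/images/distro.iso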
To power off the guest, use the destroy or the shutdown command, as in Example 4-31. The shutdown command interacts with the guest operating system to shut down the system gracefully. This operation can take some time because all services must be stopped. The destroy command shuts down the guest immediately, which can damage the guest operating system.
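For example, using the guest name from the other examples in this chapter:
# virsh shutdown PowerKVM_VirtualMachine
# virsh destroy PowerKVM_VirtualMachine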
Example 4-32 shows the state of the guest before it is suspended, after it is suspended, and
after it is resumed.
# virsh list
Id Name State
----------------------------------------------------
60 PowerKVM_VirtualMachine paused
# virsh list
Id Name State
----------------------------------------------------
60 PowerKVM_VirtualMachine running
Note: To detach an open console, hold down the Ctrl key and press the ] key.
It is also possible to start a guest and attach to the console by using one command:
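That invocation presumably takes the following form (the guest name is illustrative):
# virsh start PowerKVM_VirtualMachine --console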
Example 4-35 shows how to edit a guest. Although a common text editor is used, virsh checks the result for possible mistakes to avoid damaging the guest.
After quitting the editor, virsh checks the file for errors and returns one of the following
messages:
Domain guest XML configuration not changed
The default text editor used by virsh is vi, but it is possible to use any other editor by setting
the EDITOR shell variable. Example 4-36 shows how to configure a different text editor for
virsh.
# cat disk.xml
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/mynewdisk.qcow2'/>
<target dev='vdb' bus='virtio'/>
</disk>
The --config parameter persists the attached device in the guest XML. This means that the disk persists across reboots and, if the device is attached while the guest is running, the device becomes available only after a reboot. To unplug it, run the following command:
# virsh detach-device PowerKVM_VirtualMachine disk.xml --config
Example 4-38 shows how to perform a disk hotplug. By using the same disk created in Example 4-37, start the guest and run the command.
The --live parameter attaches the device to a running guest. The device is automatically detached as soon as the guest is turned off.
Note: If the guest is running, both the --live and --config parameters can be used
together.
Example 4-40 shows how to hotplug a network interface and how to unplug it.
In Example 4-42, we create the XML with the host PCI address information. This is necessary
to attach the device to the guest. In this example, we choose the device 0001:0b:00.0.
After creating the XML, it is necessary to detach the device from the host. This is achieved by
running virsh nodedev-detach. Then, the virsh attach-device command can be used.
As mentioned, the --config parameter means that the command takes effect after the guest reboots. For a live action, the --live parameter must be set. Figure 4-2 shows how to hotplug a PCI device to a guest.
Guest (before the hotplug):
# lspci
00:01.0 Ethernet controller: Red Hat, Inc Virtio network device
00:02.0 USB controller: Apple Inc. KeyLargo/Intrepid USB
00:03.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
Host:
# virsh attach-device PowerKVM_VirtualMachine pci.xml --live
Device attached successfully
# virsh detach-device PowerKVM_VirtualMachine pci.xml --live
Device detached successfully
Guest (after the hotplug):
# lspci
00:01.0 Ethernet controller: Red Hat, Inc Virtio network device
00:02.0 USB controller: Apple Inc. KeyLargo/Intrepid USB
00:03.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
00:05.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
Figure 4-2 Interaction between host and guest during PCI hotplug
When the device is not in use by any guest, it can be reattached to the host by calling the
following command:
# virsh nodedev-reattach pci_0001_0b_00_0
Device pci_0001_0b_00_0 re-attached
When thinking about multi-function PCI pass-through, some rules must be observed:
All functions must be detached from the host
All functions must be attached to the same guest
Note: The first function definition requires the multifunction=‘on’ parameter in the guest
PCI address.
# cat multif1.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0001' bus='0x0c' slot='0x00' function='0x1'/>
</source>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
</hostdev>
# cat multif2.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0001' bus='0x0c' slot='0x00' function='0x2'/>
</source>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
</hostdev>
# cat multif3.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0001' bus='0x0c' slot='0x00' function='0x3'/>
</source>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x3'/>
</hostdev>
Example 4-44 shows how the multifunction device is displayed in the guest.
# lspci
00:01.0 Ethernet controller: Red Hat, Inc Virtio network device
00:02.0 USB controller: Apple Inc. KeyLargo/Intrepid USB
00:03.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
00:05.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit
Ethernet PCIe (rev 01)
00:05.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit
Ethernet PCIe (rev 01)
00:05.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit
Ethernet PCIe (rev 01)
00:05.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit
Ethernet PCIe (rev 01)
Example 4-45 shows how to reattach the multifunction device to the host.
4.4.14 Migration
Refer to section 7.2, “Guest migration” on page 198.
This chapter covers CPU and memory on IBM PowerKVM and includes the following topics:
Resources overcommitment
CPU compatibility mode
SMT support
Dynamic and static Micro-Threading mode
CPU pinning
CPU shares
NUMA
Huge pages
CPU and memory hotplug
Chapter 6, “I/O virtualization” on page 163 covers the I/O subsystem, which includes
networking and storage.
In the beginning of CPU virtualization, most of the instructions that ran on the virtual CPU
were emulated. But with recent virtualization technologies, most of the guest instructions run
directly on the physical CPU, which avoids the translation overhead.
The different ways to virtualize CPUs are covered in the sections that follow.
Full virtualization
In full virtualization mode, the guest operating system runs inside the virtual machine and
does not know that it is running in a virtualized environment. This means that the guest operating system issues instructions as though it were running on real hardware, so the hypervisor needs to emulate that hardware.
In this mode, the hypervisor emulates the full hardware, such as registers, timing, and
hardware limitations. The guest operating system thinks it is interacting with real hardware.
However, emulation is complex and inefficient.
Paravirtualization
In paravirtualization, the guest operating system knows that it is running inside a virtual
machine, so it helps the hypervisor whenever possible. The advantage is the better
performance of the virtual machine, mainly because the communication between hypervisor
and guest can be shortened, which reduces overhead. With PowerKVM, all of the supported
guests can run in paravirtualized mode.
Much of the paravirtualization optimization happens when the virtual machine operating
system (OS) needs to do input and output (I/O) operations, which are processed by the
hypervisor. One example is when the guest operating system needs to send a network packet
outside of the server. When the guest OS sends the packet in full virtualization mode, it
operates in the same way that it would when interacting with a physical NIC, using the same
memory space, interruptions, and so on.
However, when the guest uses the paravirtualization approach, the guest operating system
knows it is virtualized and knows that the guest I/O will arrive in a hypervisor (not on a
physical hardware), and it cooperates with the hypervisor. This cooperation is what provides
most of the performance benefits of paravirtualization.
In the context of KVM, this set of device drivers is called Virtio device drivers (see 1.3.11, “Virtio drivers” on page 21). There is also a set of paravirtualized device drivers initially used on IBM PowerVM that is supported on PowerKVM, including ibmveth, ibmvscsi, and others.
IBM Power Systems introduced virtualization assistance hardware with the POWER5 family
of servers. At that time, Power Systems did much of the assistance by cooperating with the
hypervisor for certain functions, such as fast page movement, micropartitioning, and
Micro-Threading.
For example, Figure 5-1 shows a hypervisor with four CPUs that is hosting two virtual
machines (VMs) that are using three vCPUs each. This means that the guest operating
system can use up to three CPUs if another VM is not using more than one CPU.
If all of the vCPUs are fully used at the same time, the hypervisor multiplexes the vCPUs onto the real CPUs according to the hypervisor scheduling policies.
Figure 5-1 A hypervisor with four CPUs hosting two virtual machines with three vCPUs each
To enable POWER7 compatibility mode, add or edit the XML element in the domain element
of the guest XML configuration file, as shown in Example 5-1.
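As a hedged sketch (not necessarily identical to Example 5-1; verify against your libvirt documentation), the compatibility mode is commonly requested through the cpu element of the domain:
<cpu mode='host-model'>
<model>power7</model>
</cpu>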
Example 5-2 shows how to verify the compatibility mode inside the guest. In this case, for
POWER7.
processor : 1
cpu : POWER7 (architected), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
processor : 2
cpu : POWER7 (architected), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
processor : 3
cpu : POWER7 (architected), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
timebase : 512000000
platform : pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
To enable POWER6 compatibility mode, add or edit the XML element shown in Example 5-3
on the domain element of the guest XML configuration file.
PowerKVM disables SMT in the hypervisor during the boot. Each virtual machine that needs
to use the SMT feature should enable it in the virtual machine configuration.
To check whether the SMT is disabled on the cores, run the ppc64_cpu command with the
--smt or --info parameter. The ppc64_cpu --info command shows the output of the CPUs,
marking the threads for each CPU that are enabled with an asterisk (*) near the thread.
Example 5-4 shows that in a six-core machine, only one thread per CPU is enabled.
If you want to start the VM using SMT, you need to specify that manually. For example, if you want to use only one core with SMT 8, the machine should be assigned eight vCPUs, which will use just one core and eight threads, as covered in “SMT on the guests” on page 136.
To enable SMT support on a guest, the XML configuration file needs to set the number of
threads per core. This number must be a power of 2, that is: 1, 2, 4, or 8. The number of
vCPUs must also be the product of the number of threads per core and the number of cores.
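A minimal sketch of such a definition for the guest described next (two cores with four threads each, eight vCPUs in total):
<vcpu placement='static'>8</vcpu>
<cpu>
<topology sockets='1' cores='2' threads='4'/>
</cpu>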
Example 5-6 shows the CPU information for the guest defined in Example 5-5. The guest is
running with four threads per core and two cores. The example includes the information with
SMT enabled and disabled.
# cat /proc/cpuinfo
processor : 0
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
processor : 1
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
processor : 2
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
processor : 3
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
processor : 4
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
processor : 5
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
processor : 6
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
processor : 7
timebase : 512000000
platform : pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 4
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7
# ppc64_cpu --smt=off
# cat /proc/cpuinfo
processor : 0
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
processor : 4
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
timebase : 512000000
platform : pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0,4
Off-line CPU(s) list: 1-3,5-7
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0,4
Table 5-1 shows the relation between the number of vCPUs in guests, according to the number of sockets, cores, and threads configured in the guest XML definition in libvirt.
Table 5-1 The relation between vCPU, cores, and threads on guest configuration
vCPU Cores SMT Guest XML definition
5.3.3 Micro-Threading
Micro-Threading is an IBM POWER8 feature that enables each POWER8 core to be split into
two or four subcores. Each subcore has also a limited number of threads, as listed in
Table 5-2.
Subcores per core    Threads (SMT) per subcore
2                    1, 2, 4
4                    1, 2
This type of configuration provides performance advantages for some types of workloads.
Figure 5-2 Example of a POWER8 core with four subcores and two threads each subcore
Another way to demonstrate how Micro-Threading works is defining a scenario where a user
wants to start four virtual machines on a single core. You can start it without using
Micro-Threading or with Micro-Threading enabled.
Figure 5-3 shows that four virtual machines are running in the same core, and each VM can
access up to eight threads. The core switches among the four virtual machines, and each
virtual machine runs only about one-fourth of the time. This indicates that the CPU is
overcommitted.
Figure 5-3 Four virtual machines running in a single core without Micro-Threading enabled
Figure 5-4 Four virtual machines running in a single core with Micro-Threading enabled
Micro-Threading benefits:
Better use of CPU resources
More virtual machines per core
Micro-Threading limitations:
SMT limited to 2 or 4 depending on the number of subcores
Guests in single thread (SMT 1) mode cannot use the full core
Dynamic Micro-Threading
PowerKVM V3.1 introduces dynamic Micro-Threading, which is enabled by default. Dynamic
Micro-Threading allows virtual processors from several guests to run concurrently on the
processor core. The processor core is split on guest entry and then made whole again on
guest exit.
If the static Micro-Threading mode is set to anything other than whole core (in other words,
set to 2 or 4 subcores) as described in “Enabling static Micro-Threading on the PowerKVM
hypervisor” on page 139, dynamic Micro-Threading is disabled.
Along with dynamic Micro-Threading, PowerKVM V3.1 also implements a related feature
called subcore sharing. Subcore sharing allows multiple virtual CPUs from the same guest to
run concurrently on one subcore. Subcore sharing applies only to guests that are running in
SMT 1 (whole core) mode and to virtual CPUs in the same guest. It applies in any
Micro-Threading mode (static or dynamic).
Dynamic Micro-Threading can also be disabled or restricted to a mode that allows the core to be dynamically split only into two subcores or only into four subcores. This can be done by using the dynamic_mt_modes parameter.
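dynamic_mt_modes is a parameter of the kvm_hv kernel module; as a hedged sketch, its current value can be inspected and changed through sysfs on the host (writing 0 disables dynamic splitting):
# cat /sys/module/kvm_hv/parameters/dynamic_mt_modes
# echo 0 > /sys/module/kvm_hv/parameters/dynamic_mt_modes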
To verify that the machine has Micro-Threading enabled, use the ppc64_cpu command and display the CPU information with the --info parameter. Example 5-8 on page 140 shows the output of the ppc64_cpu command, displaying that the server has six cores and each core has four subcores.
Note: If Micro-Threading is turned on with four subcores, and a guest is started that uses
more than two threads, this results in the error Cannot support more than 2 threads on
PPC with KVM. A four-thread configuration would be possible by activating
Micro-Threading with only two subcores.
To verify that the Micro-Threading feature is disabled, check with the ppc64_cpu --info
command, as shown previously in Example 5-4 on page 133.
A guest’s NUMA environment is defined in the CPU section of the domain in the XML file.
Example 5-9 shows an environment of a system with two sockets and four cores in each
socket. The guest should run in SMT8 mode. The NUMA section shows that the first 32
vCPUs (0 - 31) should be in NUMA cell 0 and the other 32 vCPUs (32 - 63) will be assigned to
NUMA cell 1. The tag current=’8’ in the vCPU section makes sure that the guest will start
with only eight vCPUs, which is one core with eight threads. More CPUs can be later added
using CPU Hotplug as described in 5.4, “CPU Hotplug” on page 145.
For the memory part of the guest, the XML file shown in Example 5-9 defines that each cell should have 4 GB of memory, equally spread over the two NUMA cells. The sum of the memory in the cells is also the maximum memory stated by the memory tag. If you try to set the maximum memory higher than the sum of the cells, PowerKVM automatically adjusts the maximum memory to the sum of the cells. Nevertheless, it is possible to have a maximum that is higher than the sum of memory in the cells by adding (virtual) dual inline memory modules (DIMMs) to the NUMA cells, as described in “Memory Hotplug in a NUMA configuration” on page 159.
To verify the result inside the guests, the lscpu and numactl commands can be used as
shown in Example 5-10.
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 4096 MB
node 0 free: 2329 MB
node 1 cpus:
node 1 size: 4096 MB
node 1 free: 4055 MB
node distances:
node 0 1
0: 10 10
1: 10 10
The advantage of pinning is that it can improve data locality. Two threads on the same core
using the same data are able to share it on a local cache. The same thing happens for two
cores on the same NUMA node.
Example 5-11 shows a configuration with four vCPUs without SMT turned on (SMT=1), where
the four vCPUs are pinned to the first four cores in the first socket of the host.
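A hedged sketch of such a pinning section in the guest XML, using the host CPU numbering described later in this section (with SMT disabled on the host, each core starts at a multiple of eight):
<cputune>
<vcpupin vcpu='0' cpuset='0'/>
<vcpupin vcpu='1' cpuset='8'/>
<vcpupin vcpu='2' cpuset='16'/>
<vcpupin vcpu='3' cpuset='24'/>
</cputune>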
If the topology fits the system layout, for example within a Power System S812L with two
physical sockets and six cores in each socket, this configuration makes sure that this guest
only runs in the first socket of the system.
With SMT turned on in the guest, pinning CPUs works the same way, as SMT is not activated
on the host. In an example with SMT 4, the first four guest vCPUs are mapped to threads 0, 1,
2, and 3 of the core 0 on the host. The second four guest vCPUs are mapped to threads 8, 9,
10, and 11 of the core 1 on the host, and so on. Example 5-13 shows the same configuration
as in the previous example but with SMT 4.
Note: All threads of a core must be running on the same physical core. It is not supported
to activate SMT on the PowerKVM host and pin single threads to different cores.
CPU pinning can be also used with subcores, which is explained in detail in 5.3.3,
“Micro-Threading” on page 136. Also, in this case the pinning works in the same manner. In
Example 5-14 on page 144, a guest using four subcores with two threads each is pinned to
the first physical core.
The Linux scheduler spreads the vCPUs among the CPU cores. However, when there is
overcommitment, multiple vCPUs can share a CPU core. To balance the amount of time that
a virtual machine has compared to another virtual machine, you can configure shares.
Example 5-15 demonstrates how to configure the relative share time for a guest. By default, guests have a relative share time of 1024. Two guests with a share time of 1024 share the CPU for the same amount of time. If a third guest has a share time of 256, it runs a quarter of the time, relative to the other guests. A guest with a share time of 2048 runs twice as long as the other guests.
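In libvirt, the share value is set with the shares element inside cputune; a minimal sketch for a guest that should get twice the default weight:
<cputune>
<shares>2048</shares>
</cputune>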
powerpc-utils 1.2.26
ppc64-diag 2.6.8
librtas 1.3.9
The addition or removal of CPUs is done on a per-socket basis as defined in the CPU section of the guest's XML file. A socket in that sense is not necessarily a physical socket of the Power System. It is just a virtual definition.
Before you start a hotplug operation, ensure that the rtas_errd daemon is running inside the
guest:
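A simple check, sketched here for a systemd-based guest (the service name can vary by distribution):
[linux-guest]# systemctl status rtas_errd
[linux-guest]# ps -e | grep rtas_errd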
The following examples were created on a Power System S812L with six cores on two
sockets, giving a total of 12 cores in the system. The XML file of the guest system contains
the following configuration as written in Example 5-16.
Example 5-16 Base definition of sockets, cores, and threads for CPU Hotplug
<vcpu placement='static' current='8'>96</vcpu>
...
<cpu>
<topology sockets='12' cores='1' threads='8'/>
</cpu>
In Example 5-16, we defined a guest with 12 sockets, each with one core and eight threads, giving a total of 96 vCPUs. The guest starts with eight vCPUs, which is one socket with one core and eight threads, as defined with the current attribute in the vcpu section. From a CPU Hotplug perspective, the guest can be increased in steps of eight vCPUs up to 96 vCPUs (12 cores with eight threads).
Note: spapr-cpu-socket stands for Server IBM Power Architecture® Platform Reference
CPU socket.
This snippet can be attached to the running guest with a virsh attach-device command as
described in Example 5-18.
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-15
Note: A persistent attachment of CPUs in the XML file by using the --config attribute is
not supported.
Example 5-19 continues Example 5-18 on page 146 by changing the SMT mode to 4 and
adding another socket.
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-3,8-11
Off-line CPU(s) list: 4-7,12-15
Thread(s) per core: 4
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-3,8-11
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-3,8-11,16-23
Off-line CPU(s) list: 4-7,12-15
Thread(s) per core: 5
Core(s) per socket: 1
Socket(s): 3
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-3,8-11,16-23
Removing sockets using CPU Hotplug is also supported. To remove sockets, the same
snippets are needed. The snippets must be applied using virsh detach-device in the
opposite direction as the addition of the sockets. It is not possible to remove a lower sequence
number before a higher sequence number. Example 5-20 shows the removal of one socket
continuing the previous example.
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-3,8-11
Off-line CPU(s) list: 4-7,12-15
Thread(s) per core: 4
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-3,8-11
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 2
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s):
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 2
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 3
NUMA node(s): 2
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15
No removal of sockets that were present at the time of starting the guest
Only added Hotplug sockets can be removed by using a Hotplug action. If, for example, the guest was started with two sockets (as defined in the XML definition) and you try to remove one of them by using virsh detach-device, this results in an error.
5.5 Memory
With virtualization, the memory is basically static, which means that it is not virtualized like the CPU: a block of memory is mapped directly to a single virtual machine. Because each virtual machine is also a Linux process on the hypervisor, the memory can be overcommitted.
Example 5-23 shows the configuration for the maximum amount of memory allocated to the
guest on the memory element and the current amount of memory on the currentMemory
element. Since PowerKVM V3.1, it is possible to also increase the memory across the
maximum amount by using memory hotplug as described in 5.6, “Memory Hotplug” on
page 157.
Note: On the guest, you might notice that the total amount of memory is less than what is set as the current amount. This might happen because the guest kernel has reserved an amount of memory for some purpose. One example is the crashkernel setting, which reserves memory for a kernel dump.
When memory ballooning is enabled on the guest, the hypervisor can remove and add
memory to the guest dynamically.
This technique can be used if the memory should be overcommitted, which means assigning the guests, in sum, more memory than the system provides. If one guest needs more memory while another guest needs less memory at the same time, the memory is used more efficiently. But if all guests need their assigned overcommitted memory at once, performance suffers because the host starts to swap pages to disk.
When a guest is configured to support ballooning, memory can be added to and removed from the virtual machine by using the virsh setmem linux-guest command. The total memory allocated to the virtual machine can be seen with the virsh dommemstat command.
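For example, to shrink the balloon of the guest linux-guest to 2 GB (the size, given in KiB, is illustrative) and then check the result:
# virsh setmem linux-guest 2097152 --live
# virsh dommemstat linux-guest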
Note: If the virtual machine or the guest operating system is not configured properly to
support virtio ballooning, the following message displays on the hypervisor:
Monitoring
To check whether the memory ballooning is working on the guest, you can check with the
QEMU monitor that is running the command, as shown in Example 5-26. If the balloon is not
available in the virtual machine, the output is “Device balloon has not been activated.”
To change the amount of memory in the guest, the balloon <memory in MB> command is used, as in Example 5-27, which changes the memory from 3559 MB to 1024 MB. After this command, only 1024 MB of memory is available to the guest.
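One hedged way to reach the QEMU monitor from the host is through virsh with human monitor pass-through, for example:
# virsh qemu-monitor-command linux-guest --hmp 'info balloon'
# virsh qemu-monitor-command linux-guest --hmp 'balloon 1024'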
Note: Most of the operating systems have virtio-balloon embedded into the kernel. If
you are using an operating system that does not have the virtio-balloon device driver in the
kernel, you need to install it manually.
KSM technology can detect that two virtual machines have identical memory pages. In that case, it merges both pages into the same physical memory page, which reduces the amount of memory used. To do so, a certain number of CPU cycles is used to scan and spot these pages.
For example, Figure 5-5 shows that all three virtual machines have pages that contain the
same content. In this case, when KSM is enabled, all four pages that contain the same
content will use only one physical memory block.
Figure 5-5 Identical virtual memory pages from three virtual machines mapped to one physical memory block
There is a similar feature in the PowerVM hypervisor, called Active Memory Deduplication. For more information about this feature, see Power Systems Memory Deduplication, REDP-4827.
To verify whether KSM is running and to enable and disable it, you need to interact with the
/sys/kernel/mm/ksm/run file.
Example 5-29 shows that KSM is disabled and how to enable it.
Example 5-29 Enable KSM in PowerKVM
# cat /sys/kernel/mm/ksm/run
0
# echo 1 > /sys/kernel/mm/ksm/run
# cat /sys/kernel/mm/ksm/run
1
Monitoring KSM
To monitor the pages being merged by KSM, check the /sys/kernel/mm/ksm files. The
subsections that follow explain some of the status files.
Pages shared
The /sys/kernel/mm/ksm/pages_shared file shows how many merged pages exist in the
system. Example 5-30 shows that 2976 pages are shared by two or more virtual machines in
the system.
Pages sharing
The /sys/kernel/mm/ksm/pages_sharing file shows how many pages on the virtual machines
are using a page that is shared and merged in the hypervisor. Example 5-31 shows the
number of pages in the virtual machines that are linked to a shared page in the hypervisor.
Looking at both of the previous examples, you see that 6824 virtual pages are using 2976
physical pages, which means that 3848 pages are saved. Considering 64 KB pages, this
means that approximately 246 MB of memory was saved by using this feature.
/sys/kernel/mm/ksm option   Description
pages_unshared              How many pages are candidates to be shared but are not shared at the moment
pages_volatile              The number of pages that are candidates to be shared but are being changed so frequently that they will not be merged
full_scans                  How many times KSM scanned the pages looking for duplicated content
merge_across_nodes          Option to enable merges across NUMA nodes (disable it for better performance)
pages_to_scan               How many pages the KSM algorithm scans per turn before sleeping
sleep_millisecs             How many milliseconds ksmd should sleep before the next scan
On IBM PowerKVM, a guest must have its memory backed by huge pages to be able to use them. You need to enable huge pages on the host and configure the guest to use huge pages before you start it.
Example 5-32 demonstrates how to enable huge pages on the host. Run the command on a
host shell. The number of pages to use depends on the total amount of memory for guests
that are backed by huge pages. In this example, 4 GB of memory is reserved for huge pages
(256 pages with 16384 KB each).
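A minimal sketch of that reservation on the host, matching the numbers above (256 pages of 16 MB each):
# echo 256 > /proc/sys/vm/nr_hugepages
# grep HugePages_Total /proc/meminfo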
Example 5-33 shows an excerpt from an XML configuration file for a guest, demonstrating
how to enable huge pages. The memoryBacking element must be inside the domain element
of the XML configuration file.
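A minimal sketch of that element:
<memoryBacking>
<hugepages/>
</memoryBacking>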
Example 5-35 presents the output of a command on the PowerKVM host that shows how
many pages have been allocated on every node before restricting the guest to only one
NUMA node.
Example 5-35 Memory allocation to NUMA nodes before restricting it to one node
# cat
/sys/fs/cgroup/memory/machine.slice/machine-qemu\x2dlinux-guest\x2d1.scope/memory.numa_stat
total=27375 N0=23449 N1=3926
file=0 N0=0 N1=0
anon=27375 N0=23449 N1=3926
unevictable=0 N0=0 N1=0
hierarchical_total=27375 N0=23449 N1=3926
hierarchical_file=0 N0=0 N1=0
hierarchical_anon=27375 N0=23449 N1=3926
hierarchical_unevictable=0 N0=0 N1=0
The output shows that most of the memory is assigned to NUMA node 0 (N0) but some
memory to NUMA node 1 (N1).
Note: The path in the command contains the name of the guest (in Example 5-35
linux-guest) and is only available when the guest is running.
Note: To find out how many nodes a system contains, use the numactl -H command. An
example output is contained in Example 5-42 on page 159.
After restarting the guest and if the system has enough free memory on NUMA node 0, the
command lists that all memory now fits into NUMA node 0 as shown in Example 5-37 on
page 157.
Note: The number of memory pages shown here is used pages by the guest. Therefore,
the number changes over time.
Only adding memory is supported. It is not possible to remove DIMMs that were added by using memory hotplug. Memory hotplug assigns contiguous chunks of memory to the guest. When memory is added by using memory ballooning, this is not necessarily the case, which can result in memory fragmentation. However, it is possible to reduce the memory with memory ballooning if the guest supports it, as described in 5.5.2, “Memory ballooning” on page 151.
Before using memory hotplug, ensure that the guest operating system has the required
packages installed as listed in Table 5-4 on page 145.
Like CPU Hotplug, a memory DIMM can be added by using an XML snippet that defines the size of the DIMM that should be added. Example 5-38 shows a snippet for a DIMM of 4 GB.
Note: In comparison to CPU Hotplug, there is no sequence number needed. That means a
snippet can be used several times for one running guest.
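A hedged sketch of such a snippet for a 4 GB DIMM, modeled on the NUMA variant shown later in Example 5-43, together with the attach command that applies it to a running guest:
# cat dimm.xml
<memory model='dimm'>
<target>
<size unit='KiB'>4194304</size>
</target>
</memory>
# virsh attach-device linux-guest dimm.xml --live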
[linux-guest]# free -m
total used free shared buff/cache available
Mem: 3558 587 2378 19 592 2812
Swap: 1023 0 1023
[linux-guest]# free -m
total used free shared buff/cache available
Mem: 5606 615 4367 19 623 4817
Swap: 1023 0 1023
[linux-guest]# free -m
total used free shared buff/cache available
Mem: 9702 635 8442 19 625 8883
Swap: 1023 0 1023
[linux-guest]# free -m
total used free shared buff/cache available
Mem: 3558 618 2315 19 625 2756
Swap: 1023 0 1023
Remember: It is not possible to remove the added DIMMs by using the memory hotplug
function.
Memory DIMMs can be also added persistently to the configuration of the guest by adding
--config to the attach command as shown in Example 5-40. The DIMMs are added into the
devices section of the guest XML.
This section describes additional options and possibilities used with memory hotplug.
Example 5-42 shows how to attach 1 GB of memory to just NUMA node 1 by using the
snippet as shown in Example 5-41.
[linux-guest]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 2048 MB
node 0 free: 1036 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 3072 MB
node 1 free: 2166 MB
node distances:
node 0 1
0: 10 40
1: 40 10
Also in a NUMA environment, the DIMMs can be added persistently by adding --config to
the virsh attach-device command. As a result, the DIMMs are added including the correct
cell (node) definition for the DIMMs as shown in Example 5-43. The example also shows that
in this case, the maximum memory of the guest is higher than the sum of memory defined in
the NUMA section of the XML file.
<cpu>
<topology sockets='2' cores='4' threads='8'/>
<numa>
<cell id='0' cpus='0-31' memory='4194304' unit='KiB'/>
<cell id='1' cpus='32-63' memory='4194304' unit='KiB'/>
</numa>
</cpu>
...
<device>
<memory model='dimm'>
<target>
<size unit='KiB'>1048576</size>
<node>1</node>
</target>
</memory>
</device>
Note: The SPICE graphical model is not supported for IBM PowerKVM V3.1.
For more information about I/O pass-through, see 6.4, “I/O pass-through” on page 170.
A network card can be assigned directly to the VM by using the I/O pass-through method.
These are the current network virtualization methods that are available:
User mode networking
Network address translation (NAT) networking
Bridged networking
PCI pass-through
Open vSwitch
See 3.4, “Network” on page 75 for how to create NAT and bridged networks by using Kimchi.
Considerations
The main consideration of this scenario is that the virtual machine is not visible from the
external network. For example, the virtual machine is able to access the Internet, but it will not
be able to host an external accessible web server.
A network bridge is an interface that connects other network segments that are described by
IEEE 802.1D standard.
A bridge is capable of passing Layer 2 packets using the attached network interfaces.
Because the packet forwarding works on Layer 2, any Layer 3 protocol works transparently.
To manage the bridge architecture on the host, use the brctl command. For example, to create a bridge network interface, execute the following command:
# brctl addbr <example_bridge_device>
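An interface can then be added to the bridge and the result verified, for example (device names are placeholders):
# brctl addif <example_bridge_device> <example_ethernet_device>
# brctl show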
If you already created a bridge with linux-bridge-tools, you cannot reuse that bridge. The bridge must be re-created by using the ovs tools.
It is required to have Open vSwitch service up and running before plugging virtual machines
into the Open vSwitch network. See 4.3, “Manage guest networks” on page 114 for
information about the initial configuration.
See the documentation on the Open vSwitch website for more information:
http://openvswitch.org
Storage pools can be logically divided into two categories: block device pools and file-backed pools.
Note: You can’t create new volumes on iSCSI and SCSI through a libvirt API. Volumes
must be created manually on a target, instead.
The file system can be backed by iSCSI, as shown in Figure 6-3 on page 169.
You must prepare an XML file with a device description. There are two options available:
Specify the device by vendor, product pair, for example 058f:6387 (see Example 6-3).
Specify the device by bus, device pair, for example 001:003 (see Example 6-4).
Tips: We discovered during testing that vendor, product definition works only with a cold
plug (the virtual machine is in shut-off state). The bus, device combination works quite well
for a hot plug.
Example 6-3 USB XML description example based on vendor, product (IDs) pair
<hostdev type='usb'>
<source>
<vendor id='0x058f'/>
<product id='0x6387'/>
</source>
</hostdev>
Example 6-4 USB XML description example based on bus, device pair
<hostdev type='usb'>
<source>
<address bus='1' device='3'/>
</source>
</hostdev>
Example 6-5 shows how to check what is connected to the virtual machine.
# lsusb
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 003: ID 058f:6387 Alcor Micro Corp. Transcend JetFlash Flash Drive
Note: To make a live change persistent, use the --persistent or --config option.
To detach a USB device, use the virsh detach-device command, as shown in Example 6-7.
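Example 6-7 is not reproduced in full here. The detach step would look something like the following sketch, assuming the same XML description file that was used for the attach:
# virsh detach-device <guest_name> usb_device.xml --live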
# lsusb
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
sles11vm01:~ #
The next example shows passing through a Mellanox adapter. As you can see in
Example 6-10, the device has a 0001:01:00.0 PCI address, and virsh represents it as
pci_0001_01_00_0. To make sure that you have found the correct match, using virsh
nodedev-dumpxml pci_0001_01_00_0 provides more information about the device, as shown
in Example 6-10.
Next, the PCI adapter needs to be detached from a host system. You can use the virsh
nodedev-detach command to do that (see Example 6-11).
After the PCI adapter is detached from the system, the adapter needs to be described in a
virtual machine configuration. To put the adapter description in the <devices> section, edit the
virtual machine configuration by using the virsh edit command (see Example 6-12). Then,
save the file, and the machine is ready to be started.
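Examples 6-11 and 6-12 are not reproduced in full here. A minimal sketch of the two steps, using the PCI address from the text and a placeholder guest name, might look like this:
# virsh nodedev-detach pci_0001_01_00_0
# virsh edit <guest_name>
...
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0001' bus='0x01' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
...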
Note: The status of the managed mode can be either “yes” or “no”:
yes: Libvirt unbinds the device from the existing driver, resets the device, and binds it to pci-stub.
no: You must take care of those steps manually.
Now, you are ready to start the guest with the assigned adapter. In Example 6-13, you can
see that the virtual machine detects the card correctly. Example 6-14 on page 175 shows that
the additional Ethernet interfaces are available.
It is also possible to limit the I/O throughput of a device with the <iotune> subelement of a
<disk> element. These are the options:
total_bytes_sec Total throughput limit in bytes per second
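A minimal sketch of a <disk> definition with an <iotune> limit follows; it caps the disk at 10 MBps of total throughput, and the image path and target device are placeholders:
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/guest-disk.qcow2'/>
  <target dev='vdb' bus='virtio'/>
  <iotune>
    <total_bytes_sec>10485760</total_bytes_sec>
  </iotune>
</disk>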
Before creating a virtual Fibre Channel adapter, first discover the adapters in your system, as
described in Example 6-17.
<path>/sys/devices/pci0001:00/0001:00:00.0/0001:01:00.0/0001:02:09.0/0001:09:00.0/
host1</path>
<parent>pci_0001_09_00_0</parent>
<capability type='scsi_host'>
<host>1</host>
<unique_id>0</unique_id>
<capability type='fc_host'>
<wwnn>20000120fa89ca40</wwnn>
<wwpn>10000090fa89ca40</wwpn>
<fabric_wwn>100000053345e69e</fabric_wwn>
</capability>
<capability type='vport_ops'>
<max_vports>255</max_vports>
<vports>0</vports>
</capability>
<path>/sys/devices/pci0001:00/0001:00:00.0/0001:01:00.0/0001:02:09.0/0001:09:00.1/
host2</path>
<parent>pci_0001_09_00_1</parent>
<capability type='scsi_host'>
<host>2</host>
<unique_id>1</unique_id>
<capability type='fc_host'>
<wwnn>20000120fa89ca41</wwnn>
<wwpn>10000090fa89ca41</wwpn>
<fabric_wwn>0</fabric_wwn>
</capability>
<capability type='vport_ops'>
<max_vports>255</max_vports>
<vports>0</vports>
</capability>
</capability>
</device>
In Example 6-17 on page 176, you can see that this system has two Fibre Channel ports (on
one 2-port adapter). Both ports are able to support up to 255 virtual ports or NPIV ports. In
the example, you also can see the worldwide port names (WWPNs) of the adapters and the
fabric worldwide names (WWNs). In this example, WWN 0 shows that the second port
(scsi_host2) has no connection to a fabric.
In the example above, we use the first port (scsi_host1) to create a virtual Fibre Channel port.
To create a virtual Fibre Channel adapter, you need an XML snippet that refers to the parent
adapter as shown in Example 6-18. The example also shows how to create the adapter using
virsh nodedev-create and what the attributes of the new adapter look like.
Example 6-18 Creating a virtual Fibre Channel adapter
# cat vfc.xml
<device>
<parent>scsi_host1</parent>
<capability type='scsi_host'>
<capability type='fc_host'>
</capability>
</capability>
</device>
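The remainder of Example 6-18 is not reproduced here. The creation step uses the virsh nodedev-create command, along these lines; in the authors' environment, the new adapter appeared as scsi_host3:
# virsh nodedev-create vfc.xml
# virsh nodedev-dumpxml scsi_host3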
In Example 6-18 on page 177, you can also see the assigned WWPN for the virtual Fibre
Channel adapter and that it is connected to the same fabric as the parent adapter.
NPIV is a technology preview and is not yet supported by Kimchi. However, Kimchi does display the virtual Fibre Channel adapters, as shown in Figure 6-6.
After zoning the new WWPN to your storage and assigning a disk to it, you can scan for new
disks by using the following command:
# rescan-scsi-bus.sh -a
Important: Use the flag -a, otherwise the new virtual Fibre Channel adapter will not be
scanned and the disk/LUN will not be added.
Example 6-19 on page 179 shows the newly discovered disk. There are many ways to display the new device; in this case, the virsh nodedev-list command is used. The new SCSI device appears under scsi_host3, which is the virtual Fibre Channel adapter that was just created. The new device appears twice because there are two paths to it. For more information about multipathing, see 6.6, “Using multipath disks” on page 182.
To make the adapter persistent, the preferred practice is to create a storage pool using this
new NPIV adapter. To create a storage pool, an XML snippet with the WWNN and WWPN is
needed. To get the two numbers, use the virsh nodedev-dumpxml command as explained in
Example 6-18 on page 177.
Example 6-20 shows the steps to be done in order to create the storage pool and make the
adapter persistent.
Example 6-20 Creating a storage pool for a virtual Fibre Channel adapter
# cat vfcpool.xml
<pool type='scsi'>
<name>vfcpool</name>
<source>
<adapter type='fc_host' wwnn='5001a4a3b41775fd' wwpn='5001a4a4b5723b96'/>
</source>
<target>
<path>/dev/disk/by-path</path>
<permissions>
<mode>0700</mode>
<owner>0</owner>
<group>0</group>
</permissions>
</target>
</pool>
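Example 6-20 continues with defining and starting the pool. A minimal sketch of those steps:
# virsh pool-define vfcpool.xml
# virsh pool-start vfcpool
# virsh pool-autostart vfcpool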
After you create the storage pool, the volumes of the pool can be assigned to a guest by using an XML snippet, as shown in Example 6-21, or by using Kimchi.
Example 6-21 Attachment of a disk from a storage pool on a virtual Fibre Channel adapter
[powerkvm-host]# cat add_pool_lun.xml
<disk type='volume' device='disk'>
<driver name='qemu' type='raw'/>
<source pool='vfcpool' volume='unit:0:0:0'/>
<target dev='sda' bus='scsi'/>
</disk>
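A sketch of attaching this definition to a guest; the guest name is a placeholder:
[powerkvm-host]# virsh attach-device <guest_name> add_pool_lun.xml --config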
If the two volumes, as shown in Example 6-20 on page 180, are two paths of the same
volume, the preferred practice is to assign a multipath device (mpath) to the guest and not just
one or both units of the storage pool. If both units are assigned to the guest, the operating
system in the virtual machine is not aware that the two disks are just one disk with two paths,
as the disks appear as virtualized QEMU disks. To attach a multipath device, follow the steps
in section 6.6.2, “Direct mapped multipath disks” on page 185.
The following sections describe how to handle multipath disks and how these can be used
with PowerKVM.
In Example 6-22 on page 182, you can also see the UIDs of the LUNs in the storage. This
example is taken from an IBM Storwize® V7000. Figure 6-8 shows the disks (LUNs) on the
storage with the corresponding UIDs.
Figure 6-8 IBM Storwize V7000 storage view of LUNs attached to PowerKVM
blacklist {
}
multipaths {
multipath {
wwid "3600507680281038b080000000000008b"
path_grouping_policy multibus
path_selector "round-robin 0"
alias pvc-disk
}
multipath {
wwid "3600507680281038b08000000000000a7"
alias pkvm-pool1
}
}
After changing the attributes, the multipath service needs to be restarted with:
# service multipathd stop
# service multipathd start
Note: There are more attributes that can be changed. The changed values above are just
an example to illustrate how this can be achieved.
After these changes, the output of multipath -ll looks as shown in Example 6-24.
Note: If a path is not used, it might still be shown as active although the path is failed. After
the first I/O to a failed path, it shows up as failed.
If new disks were added to the PowerKVM host, these do not show up automatically. To scan
for new devices, use the following command:
# rescan-scsi-bus.sh
Example 6-26 XML snippet for attachment of a multipath disk into a guest
# cat mpio_disk.xml
<disk type='block' device='disk'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/dm-name-mpathb'/>
<target dev='sda' bus='scsi'/>
</disk>
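The disk can then be attached with a command along these lines; the guest name is a placeholder:
# virsh attach-device <guest_name> mpio_disk.xml --persistent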
# rescan-scsi-bus.sh
Scanning SCSI subsystem for new devices
Scanning host 0 for SCSI target IDs 0 1 2 3 4 5 6 7, all LUNs
...
# lsscsi
[0:0:0:0] disk QEMU QEMU HARDDISK 2.3. /dev/sda
[0:0:0:2] cd/dvd QEMU QEMU CD-ROM 2.3. /dev/sr0
# fdisk -l
Inside the guest, the disks show up as virtualized QEMU HARDDISK devices. The fact that a disk is a multipath disk is transparent to the guest. Only on the PowerKVM host is it possible to see the paths, as shown in Example 6-24 on page 184. If a path goes offline and multipathing is configured correctly, the guest can still run without any interruption.
Note: In Example 6-26 on page 185, you can see two types of disks: /dev/vda and
/dev/sda. /dev/vda uses the virtio driver and the QEMU HARDDISK uses a SCSI driver.
Instead of device='disk', device='lun' can also be used. In this case, the disk (LUN) is configured as a pass-through device and shows up with its storage origin, as shown in Example 6-28.
[linux-guest]# lsscsi
Kimchi also shows the added multipath disk, as in the screen capture in Figure 6-9, but only when the device was added with device='disk' (not with device='lun').
A multipath disk cannot be added directly to a guest by using Kimchi. Therefore, the preferred practice is to include the disks in a storage pool, as described in the next section, 6.6.3, “Multipath disks in a storage pool” on page 187.
[powerkvm-host]# vgscan
Reading all physical volumes. This may take a while...
Found volume group "ibmpkvm_vg_data" using metadata type lvm2
Found volume group "ibmpkvm_vg_swap" using metadata type lvm2
[linux-guest]# lsscsi
[0:0:0:0] disk QEMU QEMU HARDDISK 2.3. /dev/sda
[0:0:0:2] cd/dvd QEMU QEMU CD-ROM 2.3. /dev/sr0
[0:0:0:3] disk QEMU QEMU HARDDISK 2.3. /dev/sdb
[linux-guest]# fdisk -l
Note: All steps described in Example 6-29 on page 187 can be also achieved by using
Kimchi as described in Chapter 3, “Managing hosts and guests from a Web interface” on
page 61.
To add a disk to the guest online, use the virsh attach-disk command, as shown in
Example 6-30.
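Example 6-30 is not reproduced in full here. The attach step would look something like the following sketch; the guest name, image path, and target name are placeholders:
# virsh attach-disk <guest_name> /var/lib/libvirt/images/hotplug.qcow2 sdc \
  --driver qemu --subdriver qcow2 --persistent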
The disk hot plug works in the same way. A new LUN will be attached to an existing SCSI bus
of a guest. If the guest has multiple SCSI adapters defined, libvirt picks the first one. If you
want to use a specific one, use the --address argument.
After the disk is hot plugged to the guest, rescan the SCSI bus within the guest.
Note: You can use the rescan-scsi-bus.sh script from the sg3-utils package to rescan.
on the host:
[root@powerkvm ~]# virsh vol-create-as default hotplug.qcow2 --format qcow2 20G
--allocation 1G
Vol hotplug.qcow2 created
on the guest:
sles11vm04:~ # rescan-scsi-bus.sh
/usr/bin/rescan-scsi-bus.sh: line 647: [: 1.11: integer expression expected
Host adapter 0 (ibmvscsi) found.
Scanning SCSI subsystem for new devices
Scanning host 0 for SCSI target IDs 0 1 2 3 4 5 6 7, all LUNs
Scanning for device 0 0 0 0 ...
OLD: Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: QEMU Model: QEMU HARDDISK Rev: 1.6.
Type: Direct-Access ANSI SCSI revision: 05
Scanning for device 0 0 0 1 ...
OLD: Host: scsi0 Channel: 00 Id: 00 Lun: 01
Vendor: QEMU Model: QEMU CD-ROM Rev: 1.6.
Type: CD-ROM ANSI SCSI revision: 05
sles11vm04:~ # lsscsi
[0:0:0:0] disk QEMU QEMU HARDDISK 1.6. /dev/sda
[0:0:0:1] cd/dvd QEMU QEMU CD-ROM 1.6. /dev/sr0
[0:0:0:2] disk QEMU QEMU HARDDISK 1.6. /dev/sdb
Notes:
--target sdc does not reflect the device name that is shown in the guest. Instead, it is the name that QEMU uses to identify the device. Also, based on the device name, the bus is automatically determined. Examples: sdX for vSCSI and vdX for Virtio devices, where X is a, b, c, and so on.
spapr-vscsi has a limit of seven devices being attached to the same bus.
To remove a disk from the system, use the virsh detach-disk command, as shown in
Example 6-32.
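Example 6-32 is not reproduced in full here. The host-side command would be along these lines; the guest and target names are placeholders:
# virsh detach-disk <guest_name> sdc --persistent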
On the guest:
sles11vm04:~ # rescan-scsi-bus.sh -r
/usr/bin/rescan-scsi-bus.sh: line 647: [: 1.11: integer expression expected
Host adapter 0 (ibmvscsi) found.
Syncing file systems
Scanning SCSI subsystem for new devices and remove devices that have disappeared
Scanning host 0 for SCSI target IDs 0 1 2 3 4 5 6 7, all LUNs
Scanning for device 0 0 0 0 ...
OLD: Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: QEMU Model: QEMU HARDDISK Rev: 1.6.
Type: Direct-Access ANSI SCSI revision: 05
Scanning for device 0 0 0 1 ...
OLD: Host: scsi0 Channel: 00 Id: 00 Lun: 01
Vendor: QEMU Model: QEMU CD-ROM Rev: 1.6.
Type: CD-ROM ANSI SCSI revision: 05
sg2 changed: LU not available (PQual 3)
REM: Host: scsi0 Channel: 00 Id: 00 Lun: 02
DEL: Vendor: QEMU Model: QEMU HARDDISK Rev: 1.6.
Type: Direct-Access ANSI SCSI revision: 05
0 new or changed device(s) found.
1 device(s) removed.
sles11vm04:~ # lsscsi
[0:0:0:0] disk QEMU QEMU HARDDISK 1.6. /dev/sda
[0:0:0:1] cd/dvd QEMU QEMU CD-ROM 1.6. /dev/sr0
Note: --config instructs libvirt to add an adapter definition, which is available after the next
boot.
See “Host PCI Device tab” on page 90 and 4.4.11, “CPU Hotplug” on page 127.
After reading this chapter, you will have a deeper understanding of these PowerKVM-related
topics:
Install PowerKVM on a hardware Redundant Array of Independent Disks (RAID)
Migrate guests to another host
Add the host to a cloud environment
Security
PowerVC
Docker usage
To proceed, you need to enter the Petitboot shell, as described in Figure 2-1 on page 33.
On the Petitboot shell, launch the iprconfig tool and you see the main window, as shown in
Figure 7-1.
You are prompted to select the disk adapter, as shown in Figure 7-3. Select your disk adapter
by pressing 1 and then Enter.
You are prompted to select the RAID type that you want, as shown in Figure 7-5. Press c to change the RAID type and press Enter to select it. After that, press Enter to proceed.
The message Disk array successfully created is displayed at the bottom of the window.
You can verify the status of your disk array by selecting option 1. Display disk array status in
the main window. You then see the status of the disk array, as shown in Figure 7-7.
At this point, the iprconfig tool created a RAID 10 disk array. It can take several hours until the disk array is fully built and ready to be used. After the disk array is ready, you can follow the installation instructions in section 2.1, “Host installation” on page 32.
To migrate a guest from one PowerKVM host to another, the following requirements must be
satisfied:
The Images volume is mounted at the same location in both hosts, usually
/var/lib/libvirt/images.
Source and destination hosts run the same PowerKVM version.
Hosts have equal libvirt network configuration.
Network traffic on TCP/IP ports 49152-49215 is allowed in both hosts. For migration over
Secure Shell (SSH) protocol, make sure traffic on port 22 is also allowed to the destination
host.
The --persistent option, used in the following examples, saves guest configuration on the
destination host permanently. Otherwise, when this option is not specified, the guest
configuration is erased from libvirt after the guest is shut down.
To perform an offline migration, you need to make sure both source and destination hosts
share the same storage pool for guest disks, as for example:
Fibre Channel
iSCSI
NFS
See section 4.2, “Managing storage pools” on page 107 for how to configure and use shared
storage pools.
The switch --offline is specified in the virsh migrate command line options to indicate an
offline migration.
The option --undefinesource is used to undefine the guest configuration on the source host.
Otherwise, the guest will be configured on both servers after the migration is complete.
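A sketch of an offline migration command; the guest and destination host names are placeholders:
# virsh migrate --offline --persistent --undefinesource <guest_name> \
  qemu+ssh://<destination_host>/system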
Note: Having a guest running on two different hosts can damage the guest disk image on
the shared storage.
The online migration takes longer to complete than the offline one because the entire guest
disk is copied over the network. The transfer time depends on the guest memory usage and
network throughput between source and destination hosts.
To perform an online migration, the guest needs to be running on the source host. During the
transfer, the guest appears as paused on the destination. After migration is complete, the
guest is shut down on the source and resumed on the destination host.
Note: An attempt to perform an online migration with the guest shut down results in the
following error message:
Error: Requested operation is not valid: domain is not running.
Note: Do not use Kimchi for guest XML disk setup if you plan to do live migration with SAN
Fibre Channel.
Before proceeding, make sure that the guest disk image is already available on the destination host. If you are not using shared storage, perform an online migration first, as described in section 7.2.2, “Online migration” on page 199.
To perform a live migration, the guest must be running on the source host and must not be
running on the destination host. During the migration, the guest is paused on the destination.
After the transfer is complete, the guest is shut down on the source and then resumed on the
destination host.
The --timeout option forces the guest to suspend when live migration exceeds the specified
seconds, and then the migration completes offline.
Example 7-4 shows how to perform a live migration specifying a timeout of 120 seconds.
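Example 7-4 is not reproduced here. A sketch of such a live migration; the guest and destination host names are placeholders:
# virsh migrate --live --persistent --timeout 120 <guest_name> \
  qemu+ssh://<destination_host>/system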
The migration can be interrupted due to an intense workload in the guest and can be started
again with no damage to the guest disk image.
Note: Migration can fail if there is not enough contiguous memory space available in the
target system.
The only requirement is to have an HTTP server configured to serve vmlinuz, initrd.img,
squashfs.img, and packages repository. Refer to section “Configuration of an HTTP server” on
page 46 for more details.
The script downloads the kernel and rootfs image, and hands execution over to the
downloaded kernel image by calling the kexec command.
SERVER="http://server-address"
NIC="net0"
MAC="mac-address"
IP="ip-address"
GW="gateway"
NETMASK="netmask"
NS="dns-server"
HOSTNAME="your-system-hostname"
VMLINUZ="${SERVER}/ppc/ppc64le/vmlinuz"
INITRD="${SERVER}/ppc/ppc64le/initrd.img"
SQUASHFS="${SERVER}/LiveOS/squashfs.img"
REPO="${SERVER}/packages"
NET_PARAMS="ifname=${NIC}:${MAC} \
ip=${IP}::${GW}:${NETMASK}:${HOSTNAME}:${NIC}:none nameserver=${NS}"
BOOT_PARAMS="rd.dm=0 rd.md=0 console=hvc0 console=tty0"
cd /tmp
wget $VMLINUZ
wget $INITRD
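The sample script ends with the downloads. The remaining step described in the text, handing control to the downloaded kernel with kexec, might look like the following sketch; the kernel parameters (for example, root=live: and repo=) are assumptions here and depend on the PowerKVM installer:
kexec -l vmlinuz --initrd=initrd.img \
  --append="root=live:${SQUASHFS} repo=${REPO} ${NET_PARAMS} ${BOOT_PARAMS}"
kexec -e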
Remember to update the following variables in the boot.sh sample script to meet your
environment configuration:
SERVER: The IP address or domain name of your HTTP server.
NIC: The name of the network interface.
MAC: The hardware address of the network interface.
IP: The IP address of your host.
GW: The gateway address.
NETMASK: The network mask.
NS: The name server address.
HOSTNAME: The host name of your host system.
7.4 Security
This section gives you an overview of some security aspects present in IBM PowerKVM
V3.1.0.
By default, the PowerKVM Live DVD and the target system run in Enforcing mode. You can
verify the SELinux policy by running the following command:
# getenforce
Enforcing
You can change the runtime policy to Permissive by running the following command:
# setenforce Permissive
The policy can be updated permanently by changing the content of the /etc/selinux/config
file. Example 7-6 is a sample SELinux configuration file with Enforcing policy.
SELINUX=enforcing
SELINUXTYPE=targeted
The SELinux context of a file can be updated by using the chcon command. The following
example updates the context of guest69-disk.img file to samba_share_t type:
# chcon -t samba_share_t guest69-disk.img
The default SELinux context for the guest disk files under /var/lib/libvirt/images is
virt_image_t. Run the restorecon command to restore the original SELinux context of a file.
For example:
# restorecon -v /var/lib/libvirt/images/guest69-disk.img
restorecon reset /var/lib/libvirt/images/guest69-disk.img context \
system_u:object_r:samba_share_t:s0->system_u:object_r:virt_image_t:s0
For more details about SELinux, visit the SELinux Project page at:
http://www.selinuxproject.org
Example 7-7 shows the content of the default repository configuration file.
[powerkvm-updates]
name=IBM PowerKVM $ibmver - $basearch
baseurl=http://public.dhe.ibm.com/software/server/POWER/Linux/powerkvm/$ibmmilestone/$ibmver/updates
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-ibm_powerkvm-$ibmver-$ibmmilestone
skip_if_unavailable=1
[powerkvm-debuginfo]
name=IBM PowerKVM Debuginfo - $ibmver - $basearch
baseurl=http://public.dhe.ibm.com/software/server/POWER/Linux/powerkvm/$ibmmilestone/$ibmver/debuginfo
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-ibm_powerkvm-$ibmver-$ibmmilestone
skip_if_unavailable=1
One preferred practice is to use the ibm-update-system command to apply system updates because it also installs recommended packages that were not installed by default at installation time.
For example, when a package is added to the PowerKVM image in a later release, it is installed on your system the next time that you run the ibm-update-system tool.
Another preferred practice is not to use external package repositories. Otherwise, you can damage your system by installing software from untrusted sources.
With the ibm-update-system command, you can also update your system using a local
PowerKVM ISO image. The following example shows how to apply updates using a local
image:
# ibm-update-system -i ibm-powerkvm.iso
When the command is entered, you are prompted to answer if you want to proceed with the
update. The -y option can be used to assume yes and skip this question.
Run ibm-update-system --help to obtain more information about the supported options.
Kimchi is an open source project for virtualization management within one server.
IBM PowerVC and IBM Cloud Manager are advanced management solutions created and
maintained by IBM, built on OpenStack.
OpenStack is a cloud operating system that controls large pools of compute, storage, and
networking resources throughout a data center. OpenStack Compute (the Nova component)
has an abstraction layer for compute drivers to support different hypervisors, including QEMU
or KVM, through the libvirt virtualization API.
The following sections introduce the virtualization management systems that can be used to
manage the PowerKVM servers in cloud environments.
This section gives an overview of how you can add PowerKVM hosts as a compute node and
deploy cloud images by using PowerVC. For more detailed information about how to install
and configure PowerVC, see IBM PowerVC Version 1.2.3: Introduction and Configuration,
SG24-8199.
To connect to a PowerKVM host, simply enter the credentials for the host as shown in
Figure 7-9. PowerVC automatically installs the necessary OpenStack modules on the
PowerKVM host and adds the host as a compute node in PowerVC.
Before importing images and creating instances on PowerVC, configure the network and
storage settings.
When PowerVC is used together with PowerKVM, the networking is done through an Open vSwitch environment, not the standard bridging that is commonly used with PowerKVM. For a simple environment, PowerVC prepares the Open vSwitch environment when it connects to a new PowerKVM host that does not have an Open vSwitch environment configured.
Example 7-8 shows a simple vSwitch environment configured by PowerVC.
It is also possible to use a virtual machine as a golden image that contains the operating
system, perhaps some applications, and also other customization. This virtual machine can
be equipped with an activation mechanism, such as cloud-init or the IBM Virtual Solutions
Activation Engine (VSAE) that changes settings like an IP address, host name, or the SSH
keys when deploying the image to a new virtual machine. A virtual machine that contains the
golden image can be captured and then used for deployments.
To deploy an image, simply use the Deploy button as shown in Figure 7-12 and enter the
required data for the new guest as shown in Figure 7-13.
In the deploy dialog, all necessary data is gathered to install and configure the new guest on
the PowerKVM host. For the size of the new guest, so-called compute templates are used. In
native OpenStack, compute templates are also referred to as flavors. A compute template
defines the number of virtual processors, sockets, cores and threads, the memory size, and
the size for the boot disk.
For additional data disks, iSCSI volumes can also be attached. The creation of a disk is usually done in PowerVC, but existing volumes can also be imported. Figure 7-14 shows the attachment of a new data disk using iSCSI, connected to an IBM Storwize V7000 system.
For information about implementing PowerVC with PowerKVM servers, an IBM Systems Lab Services Techbook is available here:
https://ibm.box.com/PowerVC123-on-PowerKVM
See also IBM PowerVC Version 1.2.3: Introduction and Configuration, SG24-8199 for more
information about how to manage PowerKVM hosts using PowerVC.
These are among the benefits of using IBM Cloud Manager with OpenStack for Power:
Full access to OpenStack APIs
Simplified cloud management interface
All IBM server architectures and major hypervisors are supported. This includes x86 KVM,
KVM for IBM z™ Systems, PowerKVM, PowerVM, Hyper-V, IBM z/VM®, and VMware.
Chef installation enables flexibility to choose which OpenStack capabilities to use
AutoScale using the OpenStack Heat service
Manage Docker container services
Figure 7-15 IBM Cloud Manager with OpenStack Self Service portal
For more information about the IBM Cloud Manager with OpenStack, refer to the
documentation that can be found here:
http://ibm.co/1cc5r7o
This section gives an overview of how to install and configure compute controller services to
add your PowerKVM server to OpenStack. You can configure these services on a separate
node or the same node. A dedicated compute node requires only openstack-nova-compute,
the service that launches the virtual machines on the PowerKVM host.
RPM is the package management system used by PowerKVM. To install the open source
version of OpenStack compute services on PowerKVM, get the RPM packages from your
preferred Linux distribution or build your own packages.
IBM PowerKVM does not bundle OpenStack packages. The installation instructions in this
section are based on Fedora repositories:
http://repos.fedorapeople.org/repos/openstack
The link has several subdirectories for the OpenStack releases, especially the Liberty
release, which was the latest at the time of writing.
Note: IBM PowerKVM version 3.1.0 does not include OpenStack community packages.
You can choose to install IBM Cloud Manager or IBM PowerVC to have full integration and
support for cloud services.
Compute node
Follow these steps to add a PowerKVM host as a compute node to an existing cloud
controller:
1. Install the openstack-nova-compute service. These dependencies are required:
– openstack-nova-api
– openstack-nova-cert
– openstack-nova-conductor
– openstack-nova-console
– openstack-nova-novncproxy
– openstack-nova-scheduler
– python-novaclient
2. Edit the /etc/nova/nova.conf configuration file:
a. Set the authentication and database settings.
b. Configure the compute service to use the RabbitMQ message broker.
c. Configure Compute to provide remote console access to instances.
3. Start the Compute service and configure it to start when the system boots.
4. Confirm that the compute node is listed as a host on nova, as shown in Example 7-9.
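The following is a minimal sketch of steps 3 and 4; it assumes the RDO-style service name openstack-nova-compute and that the OpenStack credentials are already exported in the environment, so adapt it to your packaging:
# systemctl enable openstack-nova-compute
# systemctl start openstack-nova-compute
$ nova service-list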
Tip: Export Nova credentials and access the API from any system that can reach the
controller machine.
To configure the compute services in the controller node, install these packages:
openstack-nova-api
openstack-nova-cert
openstack-nova-conductor
openstack-nova-console
openstack-nova-novncproxy
openstack-nova-scheduler
python-novaclient
Note: It is also possible to add a PowerKVM compute node to an existing cloud controller
running on an IBM x86 server. You might use host aggregates or an availability zone to
partition mixed architectures into logical groups that share specific types or images.
After the services are running in the controller, you can deploy your images on the PowerKVM
host. To deploy an image and specify the host that you want to run, use the
--availability-zone option, as shown in Example 7-10.
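Example 7-10 is not reproduced here. A deployment command along these lines can be used; the image, flavor, host, and instance names are placeholders, and nova is the default availability zone name:
$ nova boot --image <image_name> --flavor <flavor_name> \
  --availability-zone nova:<powerkvm_hostname> <instance_name>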
Besides PowerKVM, there are several hypervisors that are supported. For details, see the
HypervisorSupportMatrix page on the OpenStack website:
https://wiki.openstack.org/wiki/HypervisorSupportMatrix
For more detailed options for creating virtual machines on OpenStack, see the online
documentation.
As explained in 1.4, “Docker” on page 22, Docker provides an infrastructure for containers that aims to build and run distributed applications. Even though Docker is focused on application containers, this section uses it as a system container.
Dependencies Resolved
================================================================================
 Package                    Arch        Version                 Repository  Size
================================================================================
Installing:
Transaction Summary
================================================================================
Install  1 Package (+2 Dependent packages)
  Installing : uberchain-ppc64le-8.0-4.pkvm3_1_0.ppc64le                     2/3
  Installing : 1:docker-1.7.0-22.gitdcff4e1.5.el7_1.2.ppc64le                3/3
  Verifying  : uberchain-ppc64le-8.0-4.pkvm3_1_0.ppc64le                     1/3
  Verifying  : 1:docker-1.7.0-22.gitdcff4e1.5.el7_1.2.ppc64le                2/3
  Verifying  : 1:docker-selinux-1.7.0-22.gitdcff4e1.5.el7_1.2.ppc64le        3/3
Installed:
  docker.ppc64le 1:1.7.0-22.gitdcff4e1.5.el7_1.2
Dependency Installed:
  docker-selinux.ppc64le 1:1.7.0-22.gitdcff4e1.5.el7_1.2
  uberchain-ppc64le.ppc64le 0:8.0-4.pkvm3_1_0
Complete!
Docker errors
If you run the command shown in Example 7-12 and the output is an error similar to Example 7-13, it means that you do not have access to the running Docker service. Either your user does not have privileges to access the Docker service, or the Docker service is not running. In the latter case, you can start the service with:
# systemctl start docker
You can search for ready-to-deploy container images by using the docker search command, which lists all the images available at Docker Hub whose names contain a specified word.
Example 7-15 shows part of the output of the docker search command when searching for ppc64 images. By convention, POWER images have ppc64 or ppc64le in their names. However, it is important to notice that the image uploader is responsible for naming the image, so an image that has ppc64 in its name is not necessarily designed to run on the POWER architecture.
Docker images can be listed by using the docker images command, as shown in
Example 7-17.
Note: You can also see most of the Docker images for POWER on the Docker Hub website at the following address:
https://hub.docker.com/u/ppc64le
Starting a container by using the run command is more straightforward because a single command is enough to create the container, start it, and get shell access to it. Using the create command requires three different commands, which are shown in “Creating a container” on page 220.
Note: Your user/login needs to have proper access to Docker in order to access Docker
commands. Otherwise, Docker complains by using the following message:
time="2015-11-16T09:29:45-05:00" level=fatal msg="Get
http:///var/run/docker.sock/v1.18/containers/json: dial unix
/var/run/docker.sock: permission denied. Are you trying to connect to a
TLS-enabled daemon without TLS?"
For example, if you want to have a bash shell command inside the container, you can use the
Docker command run, using the following arguments:
$ docker run -t -i <image> <command>
Example 7-18 shows a Debian container being started using the ppc64le/debian image
downloaded previously. In this example, the bash shell is started and returned to the user in
an interactive terminal. After that, you will be inside the container, and any file you see or
execute will come from the container file system and will be executed within the container
context.
After you leave the shell, the container stops by default because Docker was created to be an application container. To list containers, you can use the docker ps command. With no arguments, it lists only the active containers, as shown in Example 7-19. You can also keep the container running after the shell exits; to do that, use the --restart argument.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5c1da35da075 ppc64le/debian:latest "/bin/bash" 14 seconds ago Up 13 seconds itso
On the other hand, if you want to see all of the containers, whether running or stopped, you can use the -a flag, as shown in Example 7-20.
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS
NAMES
5c1da35da075 ppc64le/debian:latest "/bin/bash" 40 seconds ago Exited (130) 4 seconds ago
itso
e022e2a9cb66 ppc64le/debian:latest "/bin/bash" 20 minutes ago Exited (130) About a minute ago
angry_swartz
926963dbc80d ppc64le/debian:latest "/bin/bash" 27 minutes ago Exited (130) 20 minutes ago
mad_lumiere
7118a9b8a56d ppc64le/debian:latest "/bin/bash" 28 minutes ago Exited (0) 28 minutes ago
trusting_shockley
1428429888b1 ppc64le/debian:latest "/bin/bash" 29 minutes ago Exited (0) 29 minutes ago
loving_albattani
Note: If you do not specify a name for the container, Docker creates one automatically, such as “loving_albattani”, “trusting_shockley”, “mad_lumiere”, and “angry_swartz” shown above.
The advantage of using the create method over the run method is the ability to define detailed options for the container, such as memory usage, CPU binding, and so on. A minimal sketch of the create, start, and attach sequence follows the note below.
Note: The arguments for any Docker command must be passed before the Docker image name. Otherwise, they are interpreted as the command to run inside the container. The argument positions are not interchangeable.
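The sketch below assumes a hypothetical container named itso2 with a 512 MB memory limit:
$ docker create -t -i --name itso2 --memory 512m ppc64le/debian /bin/bash
$ docker start itso2
$ docker attach itso2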
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED
STATUS PORTS NAMES
5c1da35da075 ppc64le/debian:latest "/bin/bash" 2 days ago
Up 2 seconds itso
Note: If you try to attach to a container that is not started, Docker complains by using the
following error message:
Example 7-23 shows the commands that were executed in an image after it was created.
To change an image, you work through a container: instantiate the image in a container, change the container's file system, and commit that change to a new image. You can later replace the old image with the new one.
Suppose that you want to change the container image called itso by adding a new subdirectory, foo, under the /tmp directory. To do that, you create a container, attach to its console, and run the mkdir /tmp/foo command.
When you do that, you can see the file system change by using the docker diff command. If you agree with the changes, commit them to a new image by using the docker commit command, which creates a new image for you, based on the previous one, using the layer support from AUFS. This whole process is described in Example 7-24. After the commit completes, a new image is generated with the ID a98c7e146562.
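Example 7-24 is not reproduced in full here. A sketch of the two commands, with the output abbreviated, might look like this:
$ docker diff itso
A /tmp/foo
$ docker commit itso
a98c7e146562...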
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
<none> <none> a98c7e146562 33 seconds ago 127.6 MB
ppc64le/debian latest ba92052e52c6 4 days ago 127.6 MB
You can give a name to any image by using the docker tag command (Example 7-25).
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
itso_version2 latest a98c7e146562 15 minutes ago 127.6 MB
Docker login
In order to upload your image to Docker Hub, you need a login for the Docker Hub service. You can create one easily and at no charge at https://hub.docker.com.
When you have a Docker hub login, you can associate your system to a Docker hub account
by using the docker login command, as shown in Example 7-26.
Note: After you enter your login information for the first time, it is saved at ~/.dockercfg
file. It saves your user ID and the encrypted password.
Docker push
To be able to upload the image to Docker Hub, you need to rename it appropriately. The container image name must include your Docker Hub login as part of the name. For example, if your login is username and the image name is itso, the image name should be username/itso. Otherwise, you get the following error:
time="2015-11-16T11:17:08-05:00" level=fatal msg="You cannot push a \"root\"
repository. Please rename your repository to <user>/<repo> (ex: <user>/itso2)"
Example 7-27 shows a successful image upload to Docker Hub. The image was originally called itso, but it was renamed to username/itso and pushed to Docker Hub.
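A sketch of those two steps, where username stands for your own Docker Hub login:
$ docker tag itso username/itso
$ docker push username/itso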
The easiest way to do so is by creating a .tar.gz file with the root file system and importing it
in Docker by using the docker import command. This is the method that is explained in this
section. Two operating systems are used to describe how to create an image from scratch,
Ubuntu and Debian.
Ubuntu Core
Ubuntu has a distribution called Ubuntu Core, which is an image that is already stripped down for small environments such as a Docker container. It is released alongside the Ubuntu releases and is around 50 MB compressed (172 MB decompressed) for each release. It also contains the 100 most important basic packages in Ubuntu. For more information about Ubuntu Core, see the following site:
https://wiki.ubuntu.com/Core
Because Ubuntu Core is available on the web in a .tar.gz format, you can point Docker at it and create an image with a one-line command, as shown in Example 7-29.
After you import Ubuntu Core, you have an Ubuntu Core image. You can rename it to ubuntucore and start it, as Example 7-30 shows.
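Example 7-29 is not reproduced here. The import is a single docker import invocation; the download URL and release in the sketch below are placeholders:
$ docker import http://<ubuntu-core-download-url>/ubuntu-core-<release>-core-ppc64el.tar.gz ubuntucore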
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
ubuntucore latest 24215bea40ea 12 minutes ago 172.1 MB
Debootstrap
Debian and Ubuntu have a command to create a small operating system rootfs, called debootstrap. It is an application that needs to be run on a Debian or Ubuntu system. It downloads a set of packages from an archive and decompresses the packages into a directory. The set of packages is defined on the command line; in this example, the minbase variant is used, which includes only the minimal set of the most important packages. Ubuntu and Debian have the same package set, so to differentiate between Debian and Ubuntu, you basically point to the Debian or the Ubuntu archive. It does not matter whether you are on Ubuntu or Debian; you can create a root file system for either operating system.
To use it, you need to specify the architecture that you want (ppc64el for POWER8), the variant, the distribution, and the local directory in which to decompress the files. For example, the following command creates a directory named localdirectory and installs the minimal Debian (minbase) variant by using the unstable distribution. Example 7-31 shows part of the expected output:
# debootstrap --arch=ppc64el --variant=minbase unstable localdirectory
http://ftp.debian.org/debian
After the debootstrap command finishes installing the packages in the localdirectory directory, you have a minimal Debian installation in that directory. After that, you can compress the directory into a file and import it into Docker, as shown in Example 7-32.
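Example 7-32 is not reproduced here. A minimal sketch of the compress-and-import step, assuming the image is to be named debian-minbase:
# tar -C localdirectory -c . | docker import - debian-minbase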
For more information about debootstrap and how to add extra packages to the image, check:
https://wiki.debian.org/Debootstrap
This chapter covers the development kit and some development examples for the IBM
PowerKVM product using C and Python programming languages.
Introduction
Development kit installation
Connection and VM status
Advanced operations
You can write applications in the most common programming languages, including:
C
C++
Perl
PHP
Python
This chapter covers the libvirt API, which provides an extensive virtualization API with bindings to several programming languages. All examples in this chapter are written in C and Python.
In addition, the libvirt API can be accessed locally and remotely by using secure protocols.
Glossary
To understand the libvirt API, it helps to define the terms used for each component of a hypervisor machine.
Node
A node is the physical machine that runs the PowerKVM hypervisor.
Hypervisor
A hypervisor is the software stack that virtualizes a node, providing, for example, processor, memory, and I/O support for the domains. All functions that you call for a certain domain are executed by a hypervisor driver. In PowerKVM, the hypervisor driver is QEMU.
8.2 Installation
The development kit installation can be done in two different ways, either by installing from a
public repository, or from the ISO image that is shipped with the POWER machines.
To validate that you have the proper repository working, you can check it with the yum
repolist -v command. You should be able to see the IBM public repository listed, as shown
in Example 8-1.
Repo-id : powerkvm-updates
Repo-name : IBM PowerKVM 3.1.0 - ppc64le
Repo-revision: 1445348973
Repo-updated : Tue Oct 20 13:49:33 2015
Repo-pkgs : 0
Repo-size : 0
Repo-baseurl :
http://public.dhe.ibm.com/software/server/POWER/Linux/powerkvm/release/3.1.0/updates/
Repo-expire : 21,600 second(s) (last: Wed Nov 4 12:24:22 2015)
Repo-filename: /etc/yum.repos.d/base.repo
repolist: 1,641
This group contains around 503 packages and consumes almost 300 MB of disk space when
installed.
For more information about Systemtap on POWER, check SystemTap: Instrumenting the
Linux Kernel for Analyzing Performance and Functional Problems, REDP-4469.
If you have the PowerKVM DVD inserted in the disc tray, you might be able to find it in
/mnt/cdrom. If you have just the ISO image, you can mount it at the same place using the
following command:
# mount -o loop IBM-powerKVM-3.1.0.iso /mnt/cdrom
When you have the image mounted at the /mnt/cdrom directory, you need to create a repository file that points to it in /etc/yum.repos.d. You can name it local_iso.repo, for example, and it should contain content similar to Example 8-3.
Example 8-3 Repository file for development kit installation from ISO
[powerkvm-iso]
name=IBM PowerKVM $ibmver - $basearch - ISO media
baseurl=file:///mnt/cdrom/packages
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-ibm_powerkvm-$ibmver-$ibmmilestone
skip_if_unavailable=1
8.3 Architecture
Libvirt uses a set of pointers and structures that are worth mentioning. The main structure is
called virConnect and it is generated when a connection to the hypervisor is created.
To create a connection to the node that you want, use one of the three basic connection functions shown in Example 8-4.
These three functions are used initially to connect to the libvirt host in order to start host and guest management. You can choose among the three connection functions depending on which aspects of PowerKVM you want to manage.
If you want your software to manage the localhost hypervisor, you can use an empty name argument. Otherwise, you need to provide the URI of the remote machine that you want to access. The following examples vary depending on how you configured PowerKVM.
For this method, use a URI such as qemu+tcp://example.ibm.com/system. To use this option, libvirt must be enabled to listen on TCP port 16509. The following is an example of how to access libvirt by using this method:
uri = "qemu+tcp://sid.ltc.br.ibm.com/system"
conn = libvirt.openAuth(uri,
[[libvirt.VIR_CRED_AUTHNAME, libvirt.VIR_CRED_PASSPHRASE],
getCred,
mydata], 0)
If you do not provide any URI, the default one is taken from the uri_default setting in the /etc/libvirt/libvirt.conf file on the local machine (not on the PowerKVM hypervisor), for example:
uri_default = "qemu:///system"
After connecting to the hypervisor, the function returns a pointer to a complex data structure
that contains all the basic information needed to do the development that you want. This
pointer is called virConnectPtr and points to a structure that is called virConnect. Figure 8-1
shows a little bit of the structure that you have when the connection is successful.
Figure 8-1 shows that virConnectPtr points to the virConnect structure, which in turn references the virHypervisorDriverPtr, virNetworkDriverPtr, virInterfaceDriverPtr, virStorageDriverPtr, and virNodeDeviceDriverPtr driver structures.
def connect_to_hypervisor():
# Opens a read only connection to the hypervisor
connection = libvirt.openReadOnly(None)
if connection is None:
print 'Not able to connect to the localhost hypervisor'
sys.exit(1)
return connection
return dom.ID()
if __name__ == "__main__":
if len(sys.argv) < 2:
print "Usage\n%s <virtual machine name>" % sys.argv[0]
sys.exit(1)
vm_name = sys.argv[1]
con = connect_to_hypervisor()
ID = find_a_domain(con, vm_name)
if ID > 0:
print "Domain %s is running " % vm_name
elif ID < 0:
print "Domain %s is NOT running " % vm_name
else:
# If ID is 0, it means that the virtual machine does not exist
# on the hypervisor
print "Domain %s not found on this hypervisor " % vm_name
$ virsh list
Id Name State
----------------------------------------------------
2 ubuntu1504 running
3 debian running
4 ubuntu1510 running
5 dockerhub running
9 fedora running
The first function is able to connect to a localhost hypervisor (Example 8-8). It is useful when
you want to create a local application. You can use any domain that you want, as explained in
Example 8-4 on page 231.
return conn;
}
When you have the domain ID, you can identify whether the virtual machine is active, inactive, or does not exist. If the domain ID is larger than zero, the virtual machine is active. If the domain ID is a negative number, the domain is inactive. If the domain ID is zero, the virtual machine does not exist, as returned from the find_a_domain() function shown in Example 8-9.
To compile Example 8-10, use the -lvirt option, which links the program against the libvirt shared library:
$ gcc -Wall -o example1 example1.c -lvirt
After that, you are able to call example1 directly and see an output similar to Example 8-7 on
page 235.
conn = connect_to_hypervisor();
id = find_a_domain(conn, vm_name);
virConnectClose(conn);
return 0;
}
Note: If you do not install the development packages, specifically the libvirt development
packages, you can get an error similar to the following:
example1.c:3:29: fatal error: libvirt/libvirt.h: No such file or directory
#include <libvirt/libvirt.h>
^
compilation terminated.
By using these functions, you can get a hypervisor connection. When the connection is returned, you can go through the active domains and collect the memory utilization information. This is done by using the get_guest_info() function, as shown in Example 8-11.
Example 8-11 Get the memory information for the active guests
virDomainInfo *get_guest_info(virConnectPtr conn,
int numDomains,
int *activeDomains)
{
int i;
virDomainInfo *domain_array;
virDomainInfo info;
virDomainPtr dom;
virDomainMemoryStatStruct memstats[VIR_DOMAIN_MEMORY_STAT_NR];
memset(memstats,
0,
sizeof(virDomainMemoryStatStruct) * VIR_DOMAIN_MEMORY_STAT_NR);
return domain_array;
}
When you understand the functions above, the second example is easy to read. You can find it in Example 8-12. This new example prints the collected information in an ordered way.
#define SCREEN_COLS 76
return 0;
}
conn = connect_to_hypervisor();
// Print header
printf("%-15s | %4s | %7s | %9s | %28s\n",
"Name", "vCPU", "Memory", "Mem fault",
"Proportional CPU Utilization");
for (i = 0; i < SCREEN_COLS; ++i)
printf("-");
printf("\n");
// Sleep for 1 second in order to grab the VM stats during this time
sleep(1);
snd_measure = get_guest_info(conn, numDomains, activeDomains);
// Putting the CPU and Memory differences in the virDomainInfo for qsort()
for (i = 0 ; i < numDomains ; i++) {
fst_measure[i].maxMem = snd_measure[i].maxMem - fst_measure[i].maxMem;
fst_measure[i].cpuTime = snd_measure[i].cpuTime - fst_measure[i].cpuTime;
sum += fst_measure[i].cpuTime;
}
// List all the domains and print the info for each
for (i = 0 ; i < numDomains ; i++) {
dom = virDomainLookupByID(conn, activeDomains[i]);
virDomainGetInfo(dom, &info);
virConnectClose(conn);
return 0;
}
Example 8-13 shows the output of the example detailed above. In this output, you see a list of
virtual machines, the number of memory faults each has, and also the CPU utilization for
each of them. This list is sorted by CPU utilization.
Example 8-14 shows how to connect to libvirt and how to get the guest XML definition.
The virConnectOpenReadOnly function opens a restricted connection, which is enough to get
the XML file.
Then, in the get_domain_xml function, the XML file of a given guest is returned by
virDomainGetXMLDesc and stored in domain_xml, which is a simple pointer to the char pointer
allocated in and returned by virDomainGetXMLDesc.
#define MAC_ADDRESS_LEN 17
#define IP_ADDRESS_LEN 15
virConnectPtr connect_to_hypervisor()
{
// opens a read only connection to libvirt
virConnectPtr conn = virConnectOpenReadOnly("");
if (conn == NULL) {
fprintf(stderr, "Not able to connect to libvirt.\n");
exit(1);
}
return conn;
}
virDomainFree(dom);
}
In Example 8-15, the get_macs function is implemented. It receives the domain_xml obtained from get_domain_xml and extracts all of the network interface MAC addresses that are found. Those addresses are stored as elements of the mac_t struct, which is a linked list. The function returns a pointer to the first element in the list.
// read the XML line by line looking for the MAC Address definition
// and insert that address, if found, into a list of mac address
// to be returned
char *xml = strdup(domain_xml);
char *token = strtok(xml, "\n");
while (token) {
return macs;
}
char line[1024];
mac_t *tmp = head;
// open the arp file for read, this is where the IP addresses
// will be found
FILE *arp_table = fopen("/proc/net/arp", "r");
if (arp_table == NULL) {
fprintf(stderr, "Not able to open /proc/net/arp\n");
exit(1);
}
// for each MAC address in the list, try to find the respective
// IP address in the arp file, so print the MAC address and its
// IP address to output
printf("%-17s\t%s\n", "MAC Address", "IP Address");
while (tmp != NULL) {
while (fgets(line, 1024, arp_table)) {
Example 8-17 is the main function, where the program is orchestrated and the resources
used are freed, including the libvirt connection. Notice that the program expects the guest
name as the parameter.
// locates the guest and gets the XML which defines such guest
get_domain_xml(conn, vm_name, &domain_xml);
// gets a list with all mac address found in that particular guest
macs = get_macs(domain_xml);
return 0;
}
Example 8-18 on page 244 finally shows how to compile the program and the results you get.
$ ./example3 MyGuest
MAC Address IP Address
52:54:00:9c:74:28 192.168.122.218
52:54:00:13:96:48 192.168.122.80
The Python version of the program is even simpler as shown in Example 8-19.
def connect_to_hypervisor():
conn = libvirt.openReadOnly(None)
if conn is None:
print 'Not able to connect to libvirt'
sys.exit(1)
return conn
try:
dom = conn.lookupByName(domain)
except:
print 'guest not found'
sys.exit(1)
return dom.XMLDesc()
def get_macs(domain_xml):
if domain_xml is None:
print 'domain_xml is None\n'
sys.exit(1)
mac_regex = '([a-zA-Z0-9]{2}:){5}[a-zA-Z0-9]{2}'
return [m.group(0) for m in re.finditer(mac_regex, domain_xml)]
def print_ip_per_mac(macs):
arp_table = None
try:
arp_table = open('/proc/net/arp', 'r')
except IOError:
print 'Not able to open /proc/net/arp\n'
sys.exit(1)
    for line in arp_table:
        row = line.split()
        if row[3] in macs:
            print '%s\t%s' % (row[3], row[0])
    arp_table.close()
if __name__ == '__main__':
if len(sys.argv) != 2:
print 'Usage: %s <guest name>' % sys.argv[0]
sys.exit(1)
conn = connect_to_hypervisor()
domain_xml = get_domain_xml(conn, sys.argv[1])
macs = get_macs(domain_xml)
print_ip_per_mac(macs)
conn.close()
Example 8-20 shows the results that you get by using the program implemented in Python. As expected, the result is the same as from the program written in C.
For more information about application development using the PowerKVM Development Kit,
refer to libvirt documentation:
http://libvirt.org/devguide.html
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide additional information about the topics in
this book. Some of these publications might be available in softcopy only:
IBM Power Systems S812L and S822L Technical Overview and Introduction, REDP-5098
IBM PowerVC Version 1.2 Introduction and Configuration, SG24-8199
Managing Security and Compliance in Cloud or Virtualized Data Centers Using IBM
PowerSC, SG24-8082
You can search for, view, download, or order these documents and other Redbooks,
Redpapers, Web Docs, draft and additional materials, from the Redbooks website:
ibm.com/redbooks
Online resources
These websites are also relevant as further information sources:
PowerKVM on developerWorks
https://www.ibm.com/developerworks/community/wikis/home?lang=en_us#!/wiki/W51a7ffcf4dfd_4b40_9d82_446ebc23c550/page/PowerKVM
KVM performance: SPEC virt sc2013 benchmark
http://www.spec.org/virt_sc2013
IBM Fix Central
https://www.ibm.com/support/fixcentral
Debian
http://www.debian.org
https://wiki.debian.org
Fedora
https://getfedora.org
Red Hat Enterprise Linux
http://www.redhat.com/products/enterprise-linux
SUSE Linux Enterprise Server
https://www.suse.com/products/server
openSUSE
https://www.opensuse.org
SG24-8231-01
ISBN 073844152X
Printed in U.S.A.
ibm.com/redbooks