Virtual Machines vs Containers
Different tools, some overlap in the use cases, large overlap in the technology stack.
From a security perspective?
The cloud use case
Virtual Machines
"A virtual machine (VM) is an emulation of a computer system. Virtual machines are based
on computer architectures and provide functionality of a physical computer."
"[...] virtual machines [...] provide a substitute for a real machine. They provide functionality
needed to execute entire operating systems."
"(Linux) Virtual machine block diagram" - (C) Francesco Romani 2019 - CC by-sa 4.0
Containers
A (Linux) container is a set of one or more processes isolated from the rest of the system,
using facilities of the Linux kernel
"(Linux) Containers block diagram" - (C) Francesco Romani 2019 - CC by-sa 4.0
Virtual Machines vs Containers
"(Linux) VMs vs Containers block diagram" - (C) Francesco Romani 2019 - CC by-sa 4.0
How a container is made
Meet the containers
"Containers are being loaded on the container ship MSC Sola at the container terminal of Bremerhaven in Germany" by
Tvabutzku1234, public domain, from Wikimedia Commons
A recipe for containers
The basic building blocks:
- namespaces: process isolation
- cgroups: resource limits
A namespace...
wraps a global system resource in an abstraction that makes it appear to the processes
within the namespace that they have their own isolated instance of the global resource.
[...]
One use of namespaces is to implement containers.
Namespaces are ephemeral by default: they are tied to the lifetime of a process.
Once that process is gone, so is the namespace.
///samurai7/~># echo $$
5184
We start a new process (bash) with different network and PID namespaces
Let's double-check:
With Linux namespaces, we have the bare bones of a simpl{e,istic} container engine!
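To see which namespaces a process belongs to, the kernel exposes one symbolic link per namespace under /proc/<pid>/ns. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper names `list_namespaces` and `shared_namespaces` are made up for this example.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, read from /proc.

    Each entry in /proc/<pid>/ns is a symlink whose target identifies the
    namespace. Returns an empty dict if the directory cannot be read
    (non-Linux host, process gone, missing permissions).
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return result
    for name in names:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Some entries may not be readable; skip them.
            continue
    return result


def shared_namespaces(pid_a, pid_b):
    """Return the sorted names of the namespaces both processes share."""
    a = list_namespaces(pid_a)
    b = list_namespaces(pid_b)
    return sorted(name for name in a if name in b and a[name] == b[name])


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:20s} {ident}")
```

Two processes are in the same namespace when the link targets match, so comparing the output for the original shell and for the newly started bash should show different entries for the namespaces that were unshared.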
Most notably, memory and CPU time (and more: block I/O, pids...)
systemd
docker
libvirt (spoiler!!)
cgroups: DIY
Mostly, you don't want to do it :)
Seriously, the management tool (whatever it is) almost always Just Works (tm) and
it is simpler to tune.
cgroups: wrap-up
cgroups provide resource limits and accounting
Organized in hierarchies
Here's why you should not DIY - don't reinvent a square wheel
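Even without doing any DIY, it is easy to look at the hierarchy a process was placed in by its management tool. This is a small sketch, assuming a Linux host: it parses /proc/<pid>/cgroup, whose lines have the form hierarchy-ID:controller-list:cgroup-path. The function names are made up for this example.

```python
def parse_proc_cgroup(text):
    """Parse the content of /proc/<pid>/cgroup into a list of dicts.

    Each non-empty line looks like 'hierarchy-ID:controller-list:cgroup-path'.
    On cgroup v2 the controller list is empty and the hierarchy ID is 0.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two colons only: the path may contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append({
            "hierarchy": int(hierarchy_id),
            "controllers": [c for c in controllers.split(",") if c],
            "path": path,
        })
    return entries


def read_own_cgroups():
    """Return the parsed cgroup membership of the current process.

    Returns an empty list if /proc/self/cgroup is not available.
    """
    try:
        with open("/proc/self/cgroup") as f:
            return parse_proc_cgroup(f.read())
    except OSError:
        return []


if __name__ == "__main__":
    for entry in read_own_cgroups():
        print(entry)
```

This only reads the membership; setting limits is left to systemd, docker, or libvirt, as suggested above.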
Operational modes:
0 disabled
1 for strict: only four system calls: read, write, exit, sigreturn
2 for filter: allows developers to write filters that determine whether a given syscall can run
seccomp: API & DIY
Kernel API (syscall), so just prctl(2) and seccomp(2)
You can add your own syscall filters using BPF language (!!!)
Again, better not to reinvent the wheel: just use the profiles from your management engine
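A quick way to check which operational mode applies to a process, without touching the syscall API, is the Seccomp field of /proc/<pid>/status. This is a minimal sketch, assuming a Linux kernel that exposes that field; the helper names are invented for this example.

```python
# Mode numbers as listed in the operational modes above.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode_from_status(status_text):
    """Extract the numeric seccomp mode from /proc/<pid>/status content.

    Returns None if no 'Seccomp:' line is present.
    """
    for line in status_text.splitlines():
        # Match 'Seccomp:' exactly, not related fields such as 'Seccomp_filters:'.
        if line.startswith("Seccomp:"):
            return int(line.split(":", 1)[1].strip())
    return None


def describe_seccomp(pid="self"):
    """Return 'disabled', 'strict', 'filter' or 'unknown' for a process."""
    try:
        with open(f"/proc/{pid}/status") as f:
            mode = seccomp_mode_from_status(f.read())
    except OSError:
        return "unknown"
    return SECCOMP_MODES.get(mode, "unknown")


if __name__ == "__main__":
    print("seccomp mode of this process:", describe_seccomp())
```

Run inside a container started by a typical management engine, this is expected to report the filter mode, since those engines usually apply a default profile.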
SELinux adds Mandatory Access Control (MAC) and Role-Based Access Control (RBAC) to the Linux
kernel
DAC: access control is based on the discretion of the owner: root can do anything.
MAC: the system (and not the users) specifies who can access what: no, even root cannot
do that.
RBAC: in a nutshell, generalization of MAC: create and manage Roles to specify which entity
can access which data.
Beware, again: the world is much more complex than that... (https://en.wikipedia.org/wiki/Role-based_access_control)
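The difference between DAC and MAC can be shown with a toy model. This is only an illustration, not how SELinux is implemented: the labels, the policy format, and all names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class File:
    path: str
    owner: str
    label: str  # hypothetical security label attached by the system


def dac_allows(user, file):
    """DAC: the owner decides, and root can do anything."""
    return user == "root" or user == file.owner


def mac_allows(subject_label, file, policy):
    """MAC: a system-wide policy decides, whoever the user is.

    The policy is a set of (subject_label, object_label) pairs that are
    allowed; everything else is denied.
    """
    return (subject_label, file.label) in policy


def access_allowed(user, subject_label, file, policy):
    """Access is granted only if both the DAC and the MAC checks pass."""
    return dac_allows(user, file) and mac_allows(subject_label, file, policy)


if __name__ == "__main__":
    policy = {("web", "web_content")}
    page = File("/srv/index.html", owner="alice", label="web_content")
    secrets = File("/etc/secrets", owner="root", label="secrets")
    # root running with the 'web' label can read the page but not the secrets.
    print(access_allowed("root", "web", page, policy))
    print(access_allowed("root", "web", secrets, policy))
```

The second check fails even for root, which is the "no, even root cannot do that" behavior described above.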
SELINUX: Daily usage
Mostly used on CentOS, Fedora, RHEL, RHEL-derived distributions
"Just disable SELinux" was recurrent advice up until not so long ago
It got EXTREMELY better: most of the time, you don't even notice it is running. Just Works (tm)
Again, most often just use the profiles your distribution/management engine provides
Virtual Machines
We will focus only on the "winner stack": [VT-x/SVM +] KVM + QEMU (e.g. not Xen)
After a vmexit, the hypervisor must take actions to let the VM resume its operations, and
then call VMRESUME
The VMCS holds the virtual CPU state (as seen by the guest)
The VMCS holds the reason why a vmexit happened.
The x86 virtualization in a nutshell
The key component of x86 virtualization is the interaction between root and non-root
code:
hypervisor -> VMLAUNCH -> vmexit -> [hypervisor actions from VMCS data] -> VMRESUME
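The control flow above can be simulated with a toy loop. This is purely illustrative: no real virtualization happens here, and the `ToyVMCS` class, the "program" of string opcodes, and the exit reasons are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class ToyVMCS:
    """Stand-in for the VMCS: guest CPU state plus the last exit reason."""
    guest_rip: int = 0
    exit_reason: str = ""


def run_guest(vmcs, program):
    """Pretend to run in non-root mode until the guest needs the hypervisor.

    'nop' runs without help; any other opcode causes a vmexit, with the
    opcode recorded as the exit reason.
    """
    while vmcs.guest_rip < len(program):
        op = program[vmcs.guest_rip]
        if op != "nop":
            vmcs.exit_reason = op
            return
        vmcs.guest_rip += 1
    vmcs.exit_reason = "halt"


def hypervisor_loop(program):
    """Run the toy guest and return the trace of root-mode events."""
    vmcs = ToyVMCS()
    trace = ["VMLAUNCH"]
    run_guest(vmcs, program)
    while vmcs.exit_reason != "halt":
        trace.append(f"vmexit:{vmcs.exit_reason}")
        # Hypervisor action: emulate the instruction, then step past it.
        vmcs.guest_rip += 1
        trace.append("VMRESUME")
        run_guest(vmcs, program)
    trace.append("vmexit:halt")
    return trace


if __name__ == "__main__":
    print(" -> ".join(hypervisor_loop(["nop", "io", "nop", "cpuid"])))
```

The point of the sketch is the shape of the loop: the hypervisor reads the exit reason from the VMCS, acts on it, and resumes the guest.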
KVM
In a nutshell
- Turns Linux into a hypervisor
- built on top of hardware virtualization (VT-x, SVM)
- API as device /dev/kvm, ioctl()s
Do not use directly! (use qemu! or kvmtool or pretty much any other linux tool)
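Just to show what "API as device" means, and not as a suggestion to bypass qemu, here is a minimal sketch that opens /dev/kvm and issues a single ioctl() asking for the API version. It assumes a Linux host; the request number is computed by hand from the kernel's _IO(KVMIO, 0x00) definition, and the function name is made up for this example.

```python
import fcntl
import os

# KVM_GET_API_VERSION is _IO(KVMIO, 0x00) with KVMIO == 0xAE.
KVM_GET_API_VERSION = 0xAE00


def kvm_api_version(device="/dev/kvm"):
    """Return the KVM API version, or None if KVM cannot be queried.

    None covers a missing device node, missing permissions, or a failing
    ioctl (for example when the path is not the KVM device).
    """
    try:
        fd = os.open(device, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        return None
    try:
        return fcntl.ioctl(fd, KVM_GET_API_VERSION)
    except OSError:
        return None
    finally:
        os.close(fd)


if __name__ == "__main__":
    version = kvm_api_version()
    if version is None:
        print("KVM is not available to this process")
    else:
        print("KVM API version:", version)
```

Everything else (creating VMs, vCPUs, mapping memory) goes through further ioctl()s on file descriptors obtained the same way, which is exactly the part better left to qemu and friends.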
VMs
OS-inside-OS
perceived as heavyweight, slow to spin up, hard to manage
guest apps interact with guest Kernel
actually two layers of operating system around your code
more layers -> more code -> more bugs
VM escape techniques do exist
still the greatest possible isolation
Containers
shared kernel with host OS
easy and lightweight to get started - aka nice scaling down
guest apps interact with host kernel - but they believe they are alone :)
made popular by docker
friendlier tooling overall?
weak isolation
The fallout - and more musings about security
Containers as amalgamation of technologies
Containers don't exist -YET- as proper linux objects
Containers are made of a set of linux technologies which create isolation layer(s) around
regular processes
"19th century knowledge mechanisms homemade concrete block mold parts" by Henry Colin Campbell, Public Domain, from
Wikimedia Commons
Containers are turbo-charged processes
Wait, QEMU is a process too!
So what prevents us from using the same isolation technologies around Virtual Machines?
Defense in depth
"(Linux) container stack evolution block diagram" - (C) Francesco Romani 2020 - CC by-sa 4.0
Container building blocks integration /2
Modern Linux systems are gaining more and more container-like capabilities out of the
box
Meaning, will they just become yet another type of service unit?
What's a container, really?
If a container is a way to run isolated workloads, basic Linux systems are gaining
the capabilities to run them
- systemd (and more to come)
- podman?
See:
- kubevirt
- kata containers
- ... 59
Q? A!
Thank you
Francesco Romani
Senior Software Engineer, Red Hat
fromani {gmail,redhat}
http://github.com/{mojaves,fromanirh}