How to build derived bootc container images

The original Docker container model of using "layers" to model applications has been extremely successful. This project aims to apply the same technique for bootable host systems - using standard OCI/Docker containers as a transport and delivery format for base operating system updates.

With bootable containers, you can build and customize the entire host OS with the same tools as for application containers. That means you can build on top of base bootc images with Dockerfiles and tailor the OS to your needs.

Best Practices

Multi-stage builds

Bootc containers are shipped as ordinary OCI containers and are intended to be usable as part of a container build process, but they are primarily designed to run on booted physical/virtual machines via bootc. Hence, there are a number of things to consider when building and running bootc OCI containers.

There are a number of systemd services that set up the filesystem, among other things. For instance, root's home directory is not present in the image but is created by a systemd service on boot/init. That implies that a bootc container is not always the best environment in which to, for instance, compile a project. We recommend multi-stage builds for that purpose: compile the source in a build stage, then copy the build artifacts into the final stage to create the derived image.
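A minimal sketch of such a multi-stage build might look like the following; the Go toolchain and the myapp binary are illustrative assumptions, not part of any base image:

Containerfile
# Build stage: compile in an ordinary (non-bootc) builder image.
FROM registry.fedoraproject.org/fedora:41 AS builder
RUN dnf -y install golang
COPY . /src
WORKDIR /src
# "myapp" is a hypothetical application used for illustration.
RUN mkdir -p /out && go build -o /out/myapp .

# Final stage: the derived bootc image receives only the built artifact.
FROM quay.io/fedora/fedora-bootc:41
COPY --from=builder /out/myapp /usr/bin/myapp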

dnf -y update

Do not invoke dnf -y update (or upgrade) in general. While some things will work correctly, others will not (especially, at the moment, kernel and bootloader updates). We aim to fix much of this over time; for now, prefer explicitly pulling in only the updates (or reversions) that you need.

The secondary reason to avoid this: people often choose image-based updates for their predictability, and you can easily "pin" the base image by digest, for example. The default for dnf repositories is to "float", so an image built today could differ from one built tomorrow. Packages can be locked, but it requires extra effort.
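For example, a base image can be pinned in the FROM line of the Containerfile; the digest below is a placeholder, not a real image digest:

Containerfile
# Pinning by digest means the base image cannot silently change underneath you.
# The digest here is a placeholder for illustration.
FROM quay.io/fedora/fedora-bootc@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef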

Linting

We recommend running the bootc container lint command as a final step in a Containerfile build. This command performs a number of checks inside the container image and exits with an error if it finds issues.

Containerfile
FROM quay.io/fedora/fedora-bootc:41
# Customization steps
RUN bootc container lint

GitHub Actions

You may want to build a derived bootc image in a GitHub project via GitHub Actions. Since bootc-based images can grow large quickly, you are likely to run into disk-space issues on the Actions runner. Adding the following first step to the workflow may solve the space issue:

    # Based on https://github.com/orgs/community/discussions/25678
    - name: Delete huge unnecessary tools folder
      run: rm -rf /opt/hostedtoolcache

For an example project on GitHub using the Buildah and Podman Actions, please visit github.com/nzwulfin/cicd-bootc.

Container metadata

While one can add container configuration metadata (e.g., environment variables, exposed ports, default users) to an OCI container image, bootc generally ignores it. In practice, that means certain things may work when the image is run as an ordinary OCI container via Podman but won't work once booted. For instance, an ENV foo=bar instruction in a Containerfile will be visible in a Podman container but won't be propagated to the booted system.
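If a value does need to reach the booted host, write it into a file the OS actually reads instead. A minimal sketch, assuming the variable only needs to be visible to login sessions (which read /etc/environment via pam_env):

Containerfile
FROM quay.io/fedora/fedora-bootc:41
# Visible when run as an OCI container via Podman, but not on the booted host:
ENV foo=bar
# Reaches the booted host, because it is ordinary file content in the image:
RUN echo 'foo=bar' >> /etc/environment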

For details and recommendations, please refer to the bootc-runtime documentation.

Lifecycle binding code and configuration

At the current time, the role of bootc is solely to boot and upgrade from a single container image. This is a very simplistic model, but it is one that captures many use cases.

In particular, the default assumption is that code and configuration for the base OS are tightly bound. Systems which update one or the other asynchronously often lead to problems with skew.

Containerized vs 1:1 host:app

A webserver is the classic case of something that can be run as a container on a generic host alongside other workloads. However, many systems today still follow a "1:1" model between application and a virtual machine. Migrating to a container build for this can be an important stepping stone into eventually lifting the workload into an application container itself.

Additionally, in practice, even some containerized workloads have such strong bindings/requirements on the host system that they effectively require a 1:1 binding. Production databases often fall into this class.

httpd (bound)

Nevertheless, here’s a classic static HTTP webserver example; an illustrative aspect is that we move content from /var into /usr. It expects an index.html file colocated with the Containerfile.

Containerfile
FROM quay.io/fedora/fedora-bootc:41
# The default package drops content in /var/www, and on bootc systems
# we have /var as a machine-local mount by default. Because this content
# should be read-only (at runtime) and versioned with the container image,
# we move it to /usr/share/www instead.
RUN dnf -y install httpd && \
    systemctl enable httpd && \
    mv /var/www /usr/share/www && \
    echo 'd /var/log/httpd 0700 - - -' > /usr/lib/tmpfiles.d/httpd-log.conf && \
    sed -i -e 's,/var/www,/usr/share/www,' /etc/httpd/conf/httpd.conf
# Further, we also disable the default index.html which includes the operating
# system information (bad idea from a fingerprinting perspective), and crucially
# we inject our own content as part of the container image build.
# This is a key point: In this model, the webserver content is lifecycled exactly
# with the container image build, and you can change it "day 2" by updating
# the image. The content is underneath the /usr readonly bind mount - it
# should not be mutated per machine.
RUN rm -rf /usr/share/httpd/noindex
COPY index.html /usr/share/www/html
EXPOSE 80

httpd (containerized)

In contrast, this example demonstrates a webserver as a "referenced" container image via podman-systemd that is also configured for automatic updates.

This reference example is maintained in app-podman-systemd.

caddy.container
[Unit]
Description=Run a demo webserver

[Container]
# This image happens to be multiarch and somewhat maintained
Image=docker.io/library/caddy
PublishPort=80:80
AutoUpdate=registry

[Install]
WantedBy=default.target

Containerfile
# In this example, a simple "podman-systemd" unit which runs
# an application container via https://docs.podman.io/en/latest/markdown/podman-systemd.unit.5.html
# that is also configured for automatic updates via
# https://docs.podman.io/en/latest/markdown/podman-auto-update.1.html
FROM quay.io/centos-bootc/centos-bootc:stream9
COPY caddy.container /usr/share/containers/systemd
# Enable the simple "automatic update containers" timer, in the same way
# that there is a simplistic bootc upgrade timer. However, you can
# obviously also customize this as you like; for example, using
# other tooling like Watchtower or explicit out-of-band control over container
# updates via e.g. Ansible or other custom logic.
RUN systemctl enable podman-auto-update.timer

Authentication, users and groups

The container images above are just illustrative demonstrations that are not useful standalone. It is highly likely that you will want to run other container images, and perform other customizations.

Among the most likely additions is configuring a mechanism for remote SSH; see Authentication, Users, and Groups.

Invoking useradd as part of a container build

Packaging scripts often invoke useradd. This can cause "state drift" when /etc/passwd is also locally modified on the system and transient /etc is not in use.

More on this in bootc upstream.

If the user does not own any content shipped in /usr and the program runs as a systemd unit, then it is often a good candidate for conversion to systemd's DynamicUser=yes, which has numerous advantages in general. Using DynamicUser will also take care of ownership of e.g. /var/lib/somedaemon (via StateDirectory and related options).
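A sketch of what that can look like; the somedaemon name and binary path are hypothetical:

somedaemon.service
[Unit]
Description=Example daemon without a statically allocated user

[Service]
# systemd allocates a transient user/group for each invocation.
DynamicUser=yes
# systemd creates /var/lib/somedaemon and manages its ownership.
StateDirectory=somedaemon
ExecStart=/usr/bin/somedaemon

[Install]
WantedBy=multi-user.target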

However, porting to DynamicUser=yes can be somewhat involved in complex cases. If the RPM does contain files owned by the allocated user, but that content lives only in e.g. /var/lib/somedaemon or /var/log/somedaemon, then often the best fix is to drop that content from the RPM (you can mark it with %ghost so the RPM still owns the path) and switch to creating it at runtime via systemd-tmpfiles.

You can then also switch to creating the user via systemd-sysusers, and at that point drop the %post script from the RPM which allocates the user.
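Continuing the hypothetical somedaemon example, the corresponding sysusers.d and tmpfiles.d snippets could look like this:

somedaemon.conf (in /usr/lib/sysusers.d)
# Allocate a system user at boot instead of in an RPM %post script.
u somedaemon - "Some Daemon" /var/lib/somedaemon

somedaemon.conf (in /usr/lib/tmpfiles.d)
# Create state and log directories at runtime instead of shipping them in the RPM.
d /var/lib/somedaemon 0750 somedaemon somedaemon -
d /var/log/somedaemon 0750 somedaemon somedaemon -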

When your package owns content shipped in /usr

This occurs with things like setuid/setgid binaries. The first solution: avoid setuid/setgid binaries entirely! There is usually a better approach to the problem domain.
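For instance, where a setuid binary exists only to acquire a single privilege, running the program as a systemd unit that grants just that capability is often a cleaner alternative. A sketch, with a hypothetical service:

somedaemon.service (excerpt)
[Service]
# Instead of a setuid-root binary, run unprivileged and grant only the
# capability actually needed - here, binding to ports below 1024.
AmbientCapabilities=CAP_NET_BIND_SERVICE
DynamicUser=yes
ExecStart=/usr/bin/somedaemon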

Another case is where a daemon wants to drop privileges but still needs access to its configuration state in /etc. For example, polkit does this with /etc/polkit-1/rules.d. One solution here is to use e.g. BindReadOnlyPaths= to mount the source directory into the namespace of the daemon.
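A sketch of that pattern as a systemd drop-in; the service name and configuration path are hypothetical:

somedaemon.service.d/config-access.conf (in /etc/systemd/system)
[Service]
# Make the configuration directory visible (read-only) inside the service's
# mount namespace, so the unprivileged daemon can read it without owning it.
BindReadOnlyPaths=/etc/somedaemon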

If you are in this situation, then there is no solution other than statically allocating the user, which requires global coordination. You can request a static allocation, e.g. via Fedora, but this should be avoided to the greatest extent possible.

General configuration guidance

Many configuration changes to a Linux system effectively boil down to writing configuration files into /etc or /usr; those operations translate seamlessly to booted hosts via a COPY instruction or similar in a container build.
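For example, a sketch that injects an sshd configuration drop-in; the file name is illustrative:

Containerfile
FROM quay.io/fedora/fedora-bootc:41
# Ordinary file content in /etc or /usr becomes part of every booted host.
COPY mycustom-sshd.conf /etc/ssh/sshd_config.d/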

More examples

See Examples for many examples of container image definitions!