Bounded loops in BPF for the 5.3 kernel

July 31, 2019

This article was contributed by Marta Rybczyńska

BPF programs have gained significantly in capabilities over the last few years and can now perform many useful operations. That said, BPF developers have had to work around an annoying limitation until recently: they could not use loops. This restriction was recently lifted by a patch set from Alexei Starovoitov that was merged for Linux 5.3. In addition to adding support for loops, it also greatly decreases the load time of most BPF programs.

The problem

Before a BPF program runs, it needs to be checked to ensure that it cannot cause harm to the system. For example, if it does not complete in a bounded time, it could be used to carry out a denial-of-service attack on the system. The task of checking programs is performed by the BPF verifier. Until recently, the verifier could not handle loops, meaning that all programs with loops were rejected.

Since loops are one of the basic ingredients of a computer program, this limitation was often hit by developers, who worked around it by unrolling loops (either by hand or using a compiler pragma). It comes as no surprise that there have been a number of attempts to change the situation by allowing loops that provably have a limited (and reasonable) number of iterations. Those include a proof-of-concept proposal from Edward Cree. Another attempt by John Fastabend tried to solve the problem with loop analysis: identifying each loop's induction variable and verifying its use. Fastabend covered the theoretical background and the possible solutions in a presentation at the 2018 Linux Plumbers Conference. The kernel developers decided not to take either solution at that time.

In parallel, important work has been put into optimizing the BPF verifier, as shown in a set of Linux Storage, Filesystem and Memory Management Summit slides by Jakub Kicinski. One important outcome of this work was increasing the size limitation for BPF programs in the 5.2 kernel; instead of 4096 instructions, a program can execute up to one million. These optimizations allow a more direct — even brute-force — approach to the bounded-loop-checking problem: instead of adding special handling for loops, the verifier can simply simulate the iterations of a loop as a collection of states no different from any others.

The BPF verifier

When the BPF verifier analyzes a program, it creates a model in the form of a state machine. It checks each state for incorrect behavior; while it does so, it records the states it has already verified. Later on, when it reaches a state that is equivalent to one that was already verified, it can conclude there is nothing more to do on that path, which can thus be pruned. This pruning significantly reduces the amount of work that the verifier must do.

State-based pruning is essential for verifier performance, but it imposes its own cost in the form of comparing states and copying the safe ones. Recent analysis of a set of networking-related BPF programs showed that 80% of the saved states will never be matched and, thus, will never prune a future search. As a result, there may be performance gains to be had by reducing the number of states and pruning points maintained by the verifier.

The old optimizer worked by placing pruning points both before and after each jump instruction, resulting in pruning points being added approximately every four instructions. It turns out that simply placing a pruning point every ten instructions, regardless of the instruction type, improves performance considerably. The verifier was also modified to aggressively drop saved states that do not actually prune paths. These changes improve verifier performance by up to 20% and increase the length of a program that can be verified in a reasonable time by one-third.

Adding bounded loops

Starovoitov's patch set introducing bounded loops builds on that earlier work. It also includes a number of other improvements. This is partly the result of a big performance regression introduced by the first patch in the series that is a necessary building block for the final solution.

That regression takes the form of a performance degradation when adding variable tracking on the stack. The compiler often puts variables on the stack (or "spills" them) when it needs to free up some registers. The verifier should be able to track such variables, since it may need to make decisions based on their contents. Until now it did not do so, resulting in certain programs being incorrectly rejected. It turns out that loop induction variables are often spilled, so fixing the tracking of their contents was necessary to be able to verify loop termination.

On the other hand, tracking specific variable values (as opposed to ranges of possible values) creates more states, decreasing the effectiveness of state pruning; tracking them on the stack makes the problem even worse. The number of states increases, so the verification time also increases. When debugging this issue, Starovoitov found another effect; the performance penalty is aggravated by changes in the Clang compiler. It turns out that newer Clang versions spill fewer variables onto the stack, reducing state pruning even in the absence of complete value tracking. The two problems together caused an important degradation of verifier performance, with the exact results available in the commit message.

Another feature needed by the bounded-loop support is extending the way the verifier handles conditional branches. If a comparison takes place between two constants, the verifier can easily determine which branch will be taken. Comparisons involving variable values are clearly harder, which is why, until now, the verifier supported only comparisons of a register with a constant. In this patch set, Starovoitov added support for tests comparing two registers as well. This enhancement was necessary to be able to simulate the execution of loop tests.

The third and final piece consists of adding a parent-child relationship between the states. When the verifier needs to explore two branches in the program, it considers them both as child states of the parent state. It also counts the number of branches to explore so that it knows how many are left. With those features, and more aggressive heuristics in the exploration state pruning, the verifier can support bounded loops. It does it simply by simulating all possible iterations of their execution.

With bounded-loop support added, only one item remained: the regression introduced at the beginning of the series. The solution comes in the last patch of the series. Based on the improvements added before (especially the parentage tracking), Starovoitov added tracking of precise scalar values. Those values are stored in registers and will be modified during program execution. The verifier needs to have precise values to analyze branches correctly, but it need not track the value of every register precisely; only those that control branching require that precision. So the verifier does not incur the cost of tracking all registers precisely; instead, when the need arises, it backtracks through the use of a register in the code to generate a precise value if possible.

Summary

The BPF verifier has undergone a number of changes in this patch set. The resulting code not only adds support for bounded loops, but also a number of important optimizations. Writing BPF programs for the kernel should be rather easier in the 5.3 release, though undoubtedly BPF developers will still have ample opportunity to complain about the remaining hoops they have to jump through to convince the verifier that their programs are safe.

Index entries for this article
Kernel	BPF/Loops
GuestArticles	Rybczynska, Marta

Bounded loops in BPF for the 5.3 kernel

Posted Jul 31, 2019 21:51 UTC (Wed) by mtaht (subscriber, #11087) [Link] (17 responses)

And now bpf is turing complete. How long will it be until it can read mail?

Bounded loops in BPF for the 5.3 kernel

Posted Jul 31, 2019 22:16 UTC (Wed) by johill (subscriber, #25196) [Link] (12 responses)

It can never be Turing complete (at least not intentionally, given the verifier), see the Halting problem.

Bounded loops in BPF for the 5.3 kernel

Posted Aug 1, 2019 15:29 UTC (Thu) by Karellen (subscriber, #67644) [Link] (11 responses)

Just because a verifier for Turing-complete languages cannot reliably classify *all* arbitrary programs into either "safe" or "unsafe", that does not mean they don't exist or are useless. Verifiers can correctly classify large numbers of useful programs as safe. They can also correctly classify large numbers of unsafe programs as such. Any program that a verifier cannot classify, due to being insufficiently powerful, or due to the Halting problem, can simply be declared as "not verified safe" and rejected.

Note that this means that some useful, safe programs could be rejected. That is unfortunate, but not an argument that verifiers are impossible, or even useless.

Bounded loops in BPF for the 5.3 kernel

Posted Aug 1, 2019 17:27 UTC (Thu) by excors (subscriber, #95769) [Link] (10 responses)

I think there's some ambiguity on what the BPF language is (in the theoretical computer science sense of "language"). Is it all the programs that can be syntactically expressed with BPF bytecode? Or is it all the programs that can be expressed *and* that the verifier accepts? I think johill was using the latter, but the phrase "verifier for Turing-complete languages" seems to imply the former.

The pre-verification language may be Turing-complete (if it had unbounded loops or equivalent). The post-verification language cannot be, assuming the verifier only accepts programs that it can prove will halt (and rejects ones where it can't decide), because there will always exist a computation that halts but that cannot be implemented in a program that the verifier will accept. The post-verification language may still be useful in practice (as you say), but it's not Turing-complete.

In practice, Turing-completeness seems a pretty useless concept and a big distraction. Turing machines only care about the boundary between infinite and non-infinite, so a computation that requires more time than exists in the universe is just as good as one that completes almost immediately. A practical software engineer cares about a boundary measured in seconds or milliseconds, and a program that takes years is just as bad as one that never halts, so they need a totally different set of tools to analyse that problem.

Bounded loops in BPF for the 5.3 kernel

Posted Aug 1, 2019 19:27 UTC (Thu) by johill (subscriber, #25196) [Link]

Yeah, I was indeed thinking of "the programs that will be accepted by the kernel verifier" - because that's actually what the article discusses (changes/improvements in the verifier).

The language itself is fairly obviously Turing complete (not that I've written down an emulator), but of course that is a theoretical concept anyway, as you say.

And in any case, going back to the OP's comment, BPF is almost certainly already reading email somewhere, in the form of DPI :-)

Bounded loops in BPF for the 5.3 kernel

Posted Aug 1, 2019 19:48 UTC (Thu) by Karellen (subscriber, #67644) [Link] (8 responses)

Just because the verifier only outputs programs which are guaranteed to halt, does that mean that the language used to express them is necessarily Turing-incomplete?

Bounded loops in BPF for the 5.3 kernel

Posted Aug 2, 2019 13:39 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (3 responses)

Seems like a philisophical question to me. Is this C code? Or is it invalid because a reasonable C implementation could just say "no" rather than outputting any object code for it?

int* p = NULL;
int i = *p;

Bounded loops in BPF for the 5.3 kernel

Posted Aug 5, 2019 16:40 UTC (Mon) by shane (subscriber, #3335) [Link] (2 responses)

Hm... turning your code into a C program:

#include <stddef.h>

int
main ()
{
    int* p = NULL;
    int i = *p;
    return i;
}

We discover that clang doesn't warn or complain in any way:

$ clang --version
clang version 8.0.0-3 (tags/RELEASE_800/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ clang -Weverything foo.c
$ ./a.out
Segmentation fault (core dumped)

Likewise by default GCC doesn't:

$ gcc --version
gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -Wextra -Wall -Wpedantic foo.c
$ ./a.out
Segmentation fault (core dumped)

We can use a feature disabled by default:

$ gcc -Werror=null-dereference -O foo.c 
foo.c: In function ‘main’:
foo.c:7:9: error: null pointer dereference [-Werror=null-dereference]
     int i = *p;
         ^
cc1: some warnings being treated as errors

Enabling this warning apparently breaks the GCC self-bootstrapping build (as well as things like building the Linux kernel). The ticket which added the warning is here:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16351

Bounded loops in BPF for the 5.3 kernel

Posted Aug 5, 2019 17:09 UTC (Mon) by excors (subscriber, #95769) [Link] (1 responses)

I guess that's because it's only undefined behaviour if you actually execute main(), and the compiler doesn't know if you're going to execute main(), so it can't reject the code at compile time. And in practice this pattern probably occurs frequently in dead code, so programmers would get annoyed if it generated a warning.

GCC still knows that function must never be executed, so it replaces it with the "ud2" instruction (at least with gcc -O2).

Bounded loops in BPF for the 5.3 kernel

Posted Aug 6, 2019 9:27 UTC (Tue) by farnz (subscriber, #17727) [Link]

As an aside, examining the code in Matt Godbolt's Compiler Explorer shows that gcc 9.1 leaves an unnecessary MOV before UD2, while clang goes straight to RET. MSVC compiles to clear RAX, load EAX from *RAX, and return 0, while ICC generates code to set up SSE2 the way it wants it, loads AL from *NULL, and then calls an "abort" subroutine.

Bounded loops in BPF for the 5.3 kernel

Posted Aug 6, 2019 11:39 UTC (Tue) by massimiliano (subscriber, #3048) [Link] (2 responses)

The way I see it, the semantics of the BPF is that a program needs to be verified, and then executed. The verification step is an integral part of the execution, because if affects the final result of the high level invocation of a BPF program (the invocation might fail because the verification fails).

My take is that the verifier, being implemented in a Turing complete language (C), is subject to the halting problem and in principle could never terminate.

It is like if every BPF program had an initial implicit "verify" instruction opcode. A BPF program without verification step is Turing complete (thanks to its loops), and one with the verification step... maybe is not Turing complete by itself (it number of states is bounded so in the end it is a a finite state machine), but it is still subject to the halting problems because of the verifier!

Of course, as said in other comments, nothing of this matters in practice. But it is one way of understanding how BPF programs are not violating the "halting problem" principle: it's just that kernel developers are trusting the verifier to always terminate :-)

Bounded loops in BPF for the 5.3 kernel

Posted Aug 7, 2019 7:39 UTC (Wed) by laarmen (subscriber, #63948) [Link] (1 responses)

What the halting problem basically says is that there are some programs for which one cannot tell if they will halt or not. Doesn't mean all programs, nor even most programs, just "some". I'm not aware of any proof that the verifier is one of those programs (nor of a proof that it halts, for that matter), and a quick Google search doesn't turn out any promising link.

Anyone aware of such work ?

Bounded loops in BPF for the 5.3 kernel

Posted Aug 8, 2019 19:27 UTC (Thu) by johill (subscriber, #25196) [Link]

The verifier always halts, simply because it gives up after a while. It only processes a maximum of 1e6 instructions (BPF_COMPLEXITY_LIMIT_INSNS) before giving up:

                if (++env->insn_processed > BPF_COMPLEXITY_LIMIT_INSNS) {
                        verbose(env,
                                "BPF program is too large. Processed %d insn\n",
                                env->insn_processed);
                        return -E2BIG;
                }

Bounded loops in BPF for the 5.3 kernel

Posted Aug 7, 2019 15:48 UTC (Wed) by fuhchee (guest, #40059) [Link]

Yes, exactly - construing the language as the sequence of bpf bytecodes that are accepted as inputs.

Bounded loops in BPF for the 5.3 kernel

Posted Aug 1, 2019 10:59 UTC (Thu) by pizza (subscriber, #46) [Link] (2 responses)

I'm more worried that, coupled with the kernel's new time travel features and all these new "AI" accelerators, a BPF program could gain sentience, go back in time and try to kill Sarah Conner.

Bounded loops in BPF for the 5.3 kernel

Posted Aug 1, 2019 13:56 UTC (Thu) by abatters (✭ supporter ✭, #6932) [Link] (1 responses)

I'm confused now. So you *don't* want your BPF programs to terminate?

Bounded loops in BPF for the 5.3 kernel

Posted Aug 3, 2019 14:06 UTC (Sat) by sdalley (subscriber, #18550) [Link]

Oh, that's not the issue. With the back-in-time feature, It can easily arrange to terminate before it begins, so everyone should be happy...

Bounded loops in BPF for the 5.3 kernel

Posted Aug 1, 2019 14:22 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

Bounded loops are still push-down automata, not Turing machines. You need unbounded loops to be Turing-complete.

Should verification be done in the kernel?

Posted Aug 1, 2019 0:12 UTC (Thu) by anadav (subscriber, #99427) [Link] (1 responses)

The results are quite impressive and loop support is very useful. However, there might be a question of whether the approach of using ad-hoc optimizations for compiled byte code (following verification failures, or llvm version changes) is a viable path, or whether it is a whack-a-mole. In addition, such optimizations have introduced in the past several security bugs.

Verifying byte code for safety guarantees is a known problem with practical and proven solutions. Since eBPF verification is not usually on the hot-path (or at least it should not be), it raises the question: why does eBPF verification have to be performed in the kernel? Wouldn’t it be better to perform it in userspace using standard tools instead?

It may be beneficial just to introduce this option to users, by separating the binary rewriting and verification stages and provide hooks for userspace to register an external verifier. IMA and other security measures can then be used to ensure the verifier itself is not compromised. Having such a verifier can also assists in the adoption of eBPF in other environments (e.g., DPDK).

As a reference [1], my colleagues have developed an eBPF verifier prototype that uses the well-known verification technique called abstract interpretation and shows promising results (good asymptotic scalability, very low rate of false positives). It verifies loop safety out of the box and can be extended to also check that programs with loops terminate.

[1] https://github.com/vbpf/ebpf-verifier

Should verification be done in the kernel?

Posted Aug 5, 2019 17:16 UTC (Mon) by ecree (guest, #95790) [Link]

The eBPF developers are aware of this work, and we looked into whether any of it could be of use. The following analysis is my own, but I believe the other devs broadly agree with it.
There are some problems which are non-trivial, at least, with trying to extend PREVAIL to cover the full feature set of the existing verifier, including:
* the zone domain can only support relational difference constraints _or_ relational integer congruence constraints (Miné 2004 §5.4.6), not both at once (the paper just states this is 'straightforward' without further explanation) which is needed for alignment checking.
* similarly, the paper doesn't actually explain how abstract interpretation can check loop termination; once you've got a fixed-point state set, how do you determine that the program doesn't go round in circles within that set? Just stating that it's possible doesn't guide implementors much.
* the verifier today has checks more complicated than simple 'safety of execution', for instance it can detect certain speculative-execution vulnerabilities (don't ask me how this works). It's not clear that an abstract semantics can be written that deals with this.
* sure your runtime complexity is good, but that memory usage!
Our conclusion was that the best use we could make of PREVAIL was to incorporate some of its detail ideas, while rejecting the overall architecture. In particular, I was intrigued by the concept (§7.3) of having userspace supply (essentially) a safety proof which the kernel merely validates; the equivalent of this in the existing verifier's architecture would be to specify explored_states at pruning points which are both sufficient to prove safety downstream and provable from the upstream code (which currently are discovered by the verifier and often overly precise, leading to path explosion); this would be particularly useful in the form of loop invariants as the kernel verifier would only need to prove (a) holds at loop entry, (b) maintained by loop and (c) sufficient to prove safety from loop exit. I haven't had the time to dig into this myself (and expect to be far too busy in the near future).

Bounded loops in BPF for the 5.3 kernel

Posted Aug 5, 2019 17:21 UTC (Mon) by ecree (guest, #95790) [Link]

So the verifier does not incur the cost of tracking all registers precisely; instead, when the need arises, it backtracks through the use of a register in the code to generate a precise value if possible.

Alexei can correct me, but I don't think this is quite accurate. AIUI, the verifier always calculates precise value bounds, but normally that precision is ignored for pruning. Only if the path from a pruning point to an exit uses the precise value does the precise value need to be checked for pruning safety. So the backtracking is marking parent states as "needs precision", not actually generating new values.