Improving Linux Kernel Fuzzing
the degree of
Master of Science
in
Palash B. Oswal
May, 2023
© Palash B. Oswal, 2023
All Rights Reserved
Acknowledgements
Abstract
In the Linux Kernel project, one of the most rapidly evolving code bases, fuzz testing is a successful approach for vulnerability detection. However, with the high rate of change in the kernel code, testing each change thoroughly becomes a challenge. In this study, we explore various ways to improve the current Linux Kernel testing landscape. We identify and contribute novel ways of leveraging previously discovered crashes in the Linux Kernel, an approach we call the enriched corpus. We also investigate aspects of program generation for system call fuzzers using iterative deepening.
We work with state-of-the-art kernel fuzzers such as syzkaller [11] and HEALER [9]. During this research, we identified many new kernel bugs and contributed a new open-source framework for enriching fuzzer corpora [6]. We also identify challenges in working with corpora, discuss our ongoing experiments, and lay out future areas of research. These findings provide insight into improving the Linux Kernel fuzz testing process for higher system reliability and security.
Table of Contents
Acknowledgements
Abstract
1 Introduction
2 Background and Problem
3 Regression Testing
3.1 Procedure
3.1.1 Regression Testing
3.1.2 Obtaining Crash Reproducers
3.1.3 Running the Regression Tests
3.2 Analyzing the Test Results
3.3 Contributions
4 Enriched Corpus
4.1 Procedure
4.2 Obtaining Crash Reproducers
4.3 Experiment Setup
4.3.1 Initial Experiments
4.3.2 Production Setup
4.4 Observations
4.4.1 Enriched Corpus with Initial Setup
4.4.2 Corpus Comparison in Production Setup
4.5 Results of Enriched Corpus
4.6 Ongoing and Future Work
4.7 Conclusion
5 Bonsai Fuzzing
5.1 Background and Motivation
5.2 Bonsai Fuzzing Parameters
5.2.1 Program Size
5.2.2 System Call Priority
5.3 Bonsai Fuzzing Lattice
5.4 Procedure
5.4.1 Bonsai Fuzzing with Linux Kernel
5.5 Experiment Setup
5.6 Observations
5.7 Results and Challenges
5.8 Conclusion
6 Conclusion
Bibliography
List of Figures
Figure 4.1 Coverage observed during 48 CPU hours (i.e., 24 hours per VM instance)
Figure 4.2 Unique crashes observed during 48 CPU hours (i.e., 24 hours per VM instance)
Figure 4.3 Total crashes observed during 48 CPU hours (i.e., 24 hours per VM instance)
Figure 4.4 Coverage observed during 384 CPU hours (i.e., 24 hours of 8 VM instances)
Figure 4.5 Total crashes observed during 384 CPU hours (i.e., 24 hours of 8 VM instances)
Figure 4.6 Unique crashes observed during 384 CPU hours (i.e., 24 hours of 8 VM instances)
Figure 4.7 Minimization time over 384 CPU hours fuzzing (i.e., 24 hours of 8 VM instances)
Symbols
[M, N] A Bonsai Fuzzing node generated using properties M and N.
M Size of the program (number of system calls).
Abbreviations
LTS Long Term Stable releases for the Linux Kernel.
1 Introduction
Syzkaller [11] is a state-of-the-art kernel fuzzer. It uses system call descriptions and a choice table to generate system call sequences. The choice table captures the probability of one system call following another in a sequence. Syzkaller has identified over 4000 bugs in the upstream Linux Kernel in the past five years with the help of compiler-instrumented kernel sanitizers [2]. Code coverage exercised by the fuzzer programs is identified using branch coverage information instrumented by the compiler through the KCOV kernel feature. Coverage collection is available to userspace on a per-task basis, and therefore fuzzers can capture the coverage of a single system call.
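To make this per-task coverage interface concrete, the following Go sketch enables KCOV around a single system call, following the interface described in the kernel's KCOV documentation. It is a simplified illustration rather than syzkaller's executor code: the ioctl numbers are hard-coded for x86-64, the buffer size is arbitrary, and running it requires a KCOV-enabled kernel, debugfs, and root.

package main

import (
	"fmt"
	"os"
	"runtime"
	"syscall"
	"unsafe"
)

// KCOV ioctl numbers for x86-64, as given in the kernel's KCOV documentation.
const (
	kcovInitTrace = 0x80086301 // _IOR('c', 1, unsigned long)
	kcovEnable    = 0x6364     // _IO('c', 100)
	kcovDisable   = 0x6365     // _IO('c', 101)
	kcovTracePC   = 0          // collect a trace of program counters
	coverSize     = 64 << 10   // number of 8-byte entries in the shared buffer
)

func ioctl(fd, cmd, arg uintptr) error {
	if _, _, errno := syscall.Syscall(syscall.SYS_IOCTL, fd, cmd, arg); errno != 0 {
		return errno
	}
	return nil
}

func main() {
	// KCOV coverage is per task, so pin this goroutine to one OS thread.
	runtime.LockOSThread()

	f, err := os.OpenFile("/sys/kernel/debug/kcov", os.O_RDWR, 0)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Allocate and map the trace buffer shared between kernel and userspace.
	if err := ioctl(f.Fd(), kcovInitTrace, coverSize); err != nil {
		panic(err)
	}
	mem, err := syscall.Mmap(int(f.Fd()), 0, coverSize*8,
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	cover := (*[coverSize]uint64)(unsafe.Pointer(&mem[0]))

	// Enable coverage for this task, run a single system call, read the
	// number of recorded program counters, then disable coverage again.
	if err := ioctl(f.Fd(), kcovEnable, kcovTracePC); err != nil {
		panic(err)
	}
	cover[0] = 0 // the first word holds the count of recorded PCs
	syscall.Getpid()
	n := cover[0]
	ioctl(f.Fd(), kcovDisable, 0)

	fmt.Printf("the system call covered %d program counters\n", n)
}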
Kernel fuzzers like syzkaller can operate with a known set of input programs, referred to as a corpus of programs. Moonshine [7] is designed to construct input programs using seed distillation; it extracts the initial seeds by observing program execution in user space. HEALER [9] is another coverage-guided, grammar-based system call fuzzer for the Linux Kernel. HEALER aims to identify influence relations between system calls at runtime to inform its choice table. HEALER observes a sequence of system calls and then tries to identify whether the execution of one system call affects the execution of another. This improves the accuracy of the data in the choice table, and HEALER consequently generates programs that place related system calls together more often.
Kernel testing groups like KernelCI [4] also perform regression testing for newer releases of the Linux Kernel. They leverage the historic crashes identified by syzkaller to perform regression testing. However, the process of collecting these crashes is manual. Each run of regression tests takes over 4 hours to complete, and therefore it is infeasible to test each commit exhaustively.
This study investigates improvements to system-call-based, coverage-guided fuzzers for the Linux Kernel. We explore regression testing as a viable strategy to test long-term releases of the Linux Kernel. We also demonstrate the value of leveraging historic crashes as an input corpus for the fuzzers. These crashes have been identified through decades of fuzzing CPU time, and they provide value for future explorations. We call this the enriched corpus for kernel fuzzers. The majority of the new bugs identified with the enriched corpus result from the corpus triage process, and we explore how the enriched corpus can improve existing corpora. We identified 22 new high-severity bugs with the enriched corpus and obtained 4 CVE identifiers for the vulnerabilities. We have open-sourced our enriched corpus framework, along with a continuous integration mechanism that keeps the corpus up to date with historic crashes.
Additionally, we evaluate a new program generation strategy based on iterative deepening, motivated by a survey of the historic crash reproducers.
The results of this study provide insight into improving the Linux Kernel fuzz testing process for higher system reliability and security.
2 Background and Problem
Understanding various avenues of improving the Linux Kernel fuzz testing process will result in higher system reliability and security while supporting rapid development. Over the past few years, fuzz testing has produced numerous kernel bugs, and there is an opportunity to analyze their crash reproducers. In Bonsai Fuzzing [10], the authors demonstrate the generation of minimal fuzzer inputs, which would help reduce the minimization effort spent on complex fuzzer-generated inputs. Kernel fuzzers currently suffer from having to spend time minimizing crash reproducers.
There are various state-of-the-art tools that can be used for fuzzing the Linux Kernel. Syzkaller has been developed by Google in the Go programming language; Google hosts an instance of syzkaller known as syzbot [13]. Syzkaller is an unsupervised coverage-guided, grammar-based kernel fuzzer. HEALER [9] is a kernel fuzzer inspired by syzkaller; it uses dynamic relation learning to inform its fuzzing algorithm while leveraging the same grammar-based approach to input generation as syzkaller.
2.1 Kernel Fuzzing
2.1.1 Syzkaller
Syzkaller generates and mutates programs (system call sequences) based on the system call descriptions in syzlang, the corpus, and the choice table. Depending on the provided inputs, syzkaller generates and executes programs while monitoring the kernel under test. When syzkaller is operated with an existing corpus, its behavior differs. The fuzzer first triages the corpus and, if the corpus is marked for minimization, minimizes it to determine the smallest subset of programs that exercises the maximum coverage. All corpus programs are run and then minimized before the actual fuzzing process begins. Syzkaller duplicates the list of all corpus programs, and this superset is called the candidates. All candidates are triaged before the actual fuzzing, program generation, and minimization process can begin.
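The following Go sketch illustrates this candidate triage flow. The Prog and Coverage types and the executeAndCollectCoverage and minimize helpers are hypothetical stand-ins introduced for illustration; they do not mirror syzkaller's actual internals.

package triage

// Prog is a hypothetical representation of a syzlang program.
type Prog struct {
	Calls []string
}

// Coverage is a set of covered program counters (or basic blocks).
type Coverage map[uint64]bool

// executeAndCollectCoverage is a stand-in for running a program in the
// target VM and reading back its per-call coverage.
func executeAndCollectCoverage(p Prog) Coverage { /* ... */ return nil }

// minimize is a stand-in for the corpus minimization step (sketched later).
func minimize(p Prog, keep Coverage) Prog { /* ... */ return p }

// TriageCandidates runs every candidate program, keeps only those that add
// new coverage over the corpus built so far, and minimizes the kept ones.
func TriageCandidates(candidates []Prog) ([]Prog, Coverage) {
	corpus := make([]Prog, 0, len(candidates))
	total := Coverage{}
	for _, cand := range candidates {
		cov := executeAndCollectCoverage(cand)
		newCov := Coverage{}
		for pc := range cov {
			if !total[pc] {
				newCov[pc] = true
			}
		}
		if len(newCov) == 0 {
			continue // candidate adds nothing new, drop it
		}
		kept := minimize(cand, newCov) // only when the corpus is marked for minimization
		for pc := range newCov {
			total[pc] = true
		}
		corpus = append(corpus, kept)
	}
	return corpus, total
}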
2.1.2 HEALER
HEALER [9] also operates in the same mode as syzkaller, with the major difference being how it maintains the choice table. HEALER makes use of the same grammar as syzkaller, syzlang, for generating programs. HEALER updates the choice table dynamically at runtime by observing the change in coverage when executing different system call sequences in a program. For example, consider the program:
r0 = open(...) // system call 1
close(r0) // system call 2
HEALER executes the program consisting of two system calls, open() and close(). If it notices new coverage, it then executes two programs separately, each consisting of a single system call. If the coverage observed for the two-call program from the first execution is higher than the coverage of the system calls executed independently, then HEALER records an influence relation between the two system calls. This is captured in the choice table to indicate that close() has a higher probability of occurring if the first system call in a newly generated program is open(). HEALER also triages the corpus programs given to it as input before starting the actual fuzzing process. However, the major difference is that corpus programs are not forced to be minimized in the way syzkaller minimizes them. HEALER emphasizes the relationships between system calls and extracts system call pairs that indicate coverage growth.
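The influence check described above can be sketched as follows. The run helper and the choice-table representation are hypothetical simplifications, and the priority boost value is an assumption rather than HEALER's actual update rule.

package relation

// Coverage is a set of covered kernel branches or basic blocks.
type Coverage map[uint64]bool

// run is a hypothetical stand-in for executing a sequence of system calls
// and collecting the resulting coverage.
func run(calls ...string) Coverage { /* ... */ return nil }

// influences reports whether executing c1 before c2 reaches coverage beyond
// what the two calls reach when executed independently, which is the signal
// described above for recording an influence relation.
func influences(c1, c2 string) bool {
	pair := run(c1, c2) // coverage of the two-call program
	independent := Coverage{}
	for pc := range run(c1) {
		independent[pc] = true
	}
	for pc := range run(c2) {
		independent[pc] = true
	}
	// Related if the pair reaches branches neither call reaches on its own.
	for pc := range pair {
		if !independent[pc] {
			return true
		}
	}
	return false
}

// UpdateChoiceTable raises the probability of c2 following c1 when an
// influence relation is observed. weights[c1][c2] is a hypothetical
// priority entry in the choice table.
func UpdateChoiceTable(weights map[string]map[string]int, c1, c2 string) {
	if influences(c1, c2) {
		if weights[c1] == nil {
			weights[c1] = map[string]int{}
		}
		weights[c1][c2] += 10 // the exact increment is an assumption
	}
}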
Crash reproducers are programs that, when run in isolation, deterministically produce their respective crashes. Crash reproducers are of high importance when trying to narrow down the root cause of a bug and are of high value to developers and researchers. When a fuzzing tool such as syzkaller is tasked with identifying a crash reproducer, it tries to narrow down the programs executed before the kernel crash. Algorithms such as bisection and minimization are employed to produce the simplest version of the crash reproducer program. Crash reproducers come in two formats: executable C code that developers can rerun, and syzlang-based reproducers that convey semantic value to the fuzzers.
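As an illustration of the bisection step, the following Go sketch narrows the log of programs executed before a crash down to a suffix that still reproduces it. The causesCrash oracle is a hypothetical stand-in for replaying a slice of the log in a fresh VM, and the sketch assumes the crash is deterministic; real tools re-run each step several times to cope with flaky crashes.

package repro

// Prog is a hypothetical representation of one executed program from the log.
type Prog struct{ Text string }

// causesCrash is a stand-in for booting a fresh VM, replaying the given
// programs in order, and reporting whether the kernel crashed.
func causesCrash(progs []Prog) bool { /* ... */ return false }

// BisectLog narrows the execution log preceding a crash down to the shortest
// suffix that still reproduces it. The full log is assumed to reproduce the
// crash, and the empty suffix not to.
func BisectLog(log []Prog) []Prog {
	// Invariant: log[lo:] reproduces the crash, log[hi:] does not.
	lo, hi := 0, len(log)
	for hi-lo > 1 {
		mid := (lo + hi) / 2
		if causesCrash(log[mid:]) {
			lo = mid // we can still drop everything before mid
		} else {
			hi = mid // dropped too much; the culprit starts earlier
		}
	}
	return log[lo:]
}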
When running syzkaller with a corpus, the tool operates differently when working with corpora that are hand-packaged. All corpora undergo a triage phase; however, additional minimization is only employed for corpora packaged as version 0. To force a corpus to undergo minimization during the triage phase, we can unpack and repack an existing corpus as version 0. During corpus minimization, each program is iteratively reduced until the least complex configuration is found that still generates the same new coverage.
r0 = openat$cdrom(...)
ioctl$CDROMPAUSE(r0, 0x123)
If the preceding corpus program generates new coverage, then the following simplified program will be executed and the coverage from this execution will be compared with the prior run.
r0 = openat$cdrom(...)
ioctl$CDROMPAUSE(r0, 0x0)
The minimized version of the corpus program is then stored in the new corpus if it indeed generates equivalent coverage. The same minimization operation is also employed at fuzzer runtime when newly mutated or generated programs lead to increased coverage.
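A minimal sketch of this minimization loop is shown below, assuming a hypothetical coverageOf helper that executes a program and returns its coverage. The real fuzzers also simplify call arguments, as in the ioctl example above; this sketch only removes calls.

package minimize

// Prog is a hypothetical syzlang program represented as a list of calls.
type Prog struct{ Calls []string }

// Coverage is a set of covered program counters.
type Coverage map[uint64]bool

// coverageOf is a stand-in for executing a program and collecting coverage.
func coverageOf(p Prog) Coverage { /* ... */ return nil }

// covers reports whether got includes every program counter in want.
func covers(got, want Coverage) bool {
	for pc := range want {
		if !got[pc] {
			return false
		}
	}
	return true
}

// Minimize iteratively drops calls as long as the program still reaches the
// coverage that made it interesting in the first place.
func Minimize(p Prog, want Coverage) Prog {
	for i := 0; i < len(p.Calls); {
		trial := Prog{Calls: append(append([]string{}, p.Calls[:i]...), p.Calls[i+1:]...)}
		if covers(coverageOf(trial), want) {
			p = trial // the call was not needed for the interesting coverage
		} else {
			i++ // keep the call and try the next one
		}
	}
	return p
}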
Google deploys a robust fuzz testing infrastructure with multiple instances of syzkaller collectively managed by syzbot [13]. Syzbot is responsible for collecting data from all fuzzer instances, notifying kernel developers about bugs and crash reproducers through mailing lists, and helping test patches. Google deploys a clean instance of syzkaller alongside the latest builds of the Linux Kernel and syzkaller [11] every day. Syzbot has been running for over 1946 days as of this writing and has discovered over 5513 bugs, out of which 4479 bugs have been fixed and 1034 are yet to be addressed. Not all bugs discovered by syzkaller have crash reproducers available, and there are challenges in efficiently identifying crash reproducers for kernel crashes. Syzbot publishes statistics and a copy of the corpus after its daily fuzzing campaigns. These crash reproducers carry immense value in terms of the CPU hours that led to the discovery and identification of the bugs. Besides the Linux Kernel, syzbot also tests other operating system kernels such as the Android kernel and the OpenBSD kernel. Out of the 4479 fixed bugs discovered by syzkaller, only 2901 have C reproducers, and an additional 332 bugs have non-deterministic syzlang reproducers. Each bug can have more than one reproducer. There are more than 14901 unique syzlang-based reproducers currently available for these 4479 fixed bugs. These syzlang-based reproducers can be obtained from syzbot.
Many kernel developer partners collaborate within the KernelCI [4] group's Linux-Arts [5] project. Running regression tests against a kernel takes about 4 hours to complete. Since the Linux Kernel project is evolving at such a rapid pace, regression testing is not employed for every commit. However, regression testing is used by distribution maintainers for their releases and for releases of the Linux Kernel mainline. The set of test programs has historically been provided manually, in bulk, by the syzkaller project maintainers when requested by the kernel testing team. Because the test set is not always up to date, recent regressions that are not covered by it can creep into releases.
2.3.1 Are previously-known crashes and high code coverage corpus entries a valuable source for testing the Linux Kernel?
We obtain a corpus of previously known crashes and high-code-coverage entries from the syzkaller dashboard for various versions of the Linux Kernel and perform regression testing with this corpus. With this, we aim to determine the viability of relying on a static testing corpus for continuous testing of the Linux Kernel.
2.3.2 Do historic crashes add value to fuzzing the Linux Kernel as an initial corpus of inputs?
We prepare an enriched fuzzer corpus that leverages these historic crash reproducers as an input. In this thesis, we evaluate this enriched corpus independently and compare it with other fuzzing corpora.
2.3.3 Can we construct programs intelligently using the Bonsai Fuzzing technique to achieve faster bug discovery?
We determine the Bonsai Fuzzing parameters for program generation and evaluate an initial prototype. We identify the challenges and chart a path for future work.
3 Regression Testing
Regression testing the Linux Kernel involves running a series of tests to ensure that
changes or updates to the kernel do not introduce any new bugs or break existing
functionality. This is important because the kernel is the core of the operating system
and any issue with it can have serious consequences. Regression tests can be run
manually or automated using a testing framework. Automated regression testing
is preferred because it allows for a larger number of tests to be run in a shorter
amount of time, increasing the chances of detecting potential issues. These tests
can include functional tests, which check that the kernel is functioning correctly,
and performance tests, which measure the performance of the kernel under different
conditions. Regression testing is an ongoing process that is critical for maintaining the
stability and reliability of the Linux Kernel. In this chapter, we discuss leveraging bugs previously discovered by fuzzers for continuous regression testing.
3.1 Procedure
3.1.1 Regression Testing
• Set up testing environments for the corresponding kernels with a Debian bootstrapped file system.
• Record the number of discovered bugs and compare the results of the regression tests with the results of the same tests on previous versions of the Linux Kernel to determine the effectiveness of the corpus for continuous testing.
• Analyze the results and identify any patterns or trends in the effectiveness of
the regression testing.
3.1.2 Obtaining Crash Reproducers
The syzkaller dashboard is a web-based interface that allows users to view and analyze the results of syzkaller runs. To obtain crash reproducers for regression testing, we first determine the versions of the Linux Kernel we are going to run regression tests against. We selected the LTS releases v4.14.292 and v4.19.257. For each major-minor version, we obtained the corresponding previously reported crashes that the developer community has marked as fixed. We automated the collection and compilation process for the test suite; a simplified sketch of the collection step follows.
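The Go sketch below shows the general shape of this collection step. The dashboard page path and the reproducer link format are assumptions about the dashboard's HTML layout and may not match its current structure (the links may only appear on individual bug pages, requiring an extra fetch per bug), and the per-version filtering described above is omitted here.

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"regexp"
	"strings"
)

// Base URL of the public syzbot dashboard.
const dashboard = "https://syzkaller.appspot.com"

func fetch(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	outDir := "reproducers"
	if err := os.MkdirAll(outDir, 0o755); err != nil {
		panic(err)
	}
	// List the fixed upstream bugs and pull out links to syzlang reproducers.
	// Both the page path and the link pattern are assumptions for illustration.
	page, err := fetch(dashboard + "/upstream/fixed")
	if err != nil {
		panic(err)
	}
	re := regexp.MustCompile(`/text\?tag=ReproSyz&(?:amp;)?x=[0-9a-f]+`)
	downloaded := 0
	for i, link := range re.FindAllString(string(page), -1) {
		url := dashboard + strings.ReplaceAll(link, "&amp;", "&")
		repro, err := fetch(url)
		if err != nil {
			fmt.Fprintf(os.Stderr, "skipping %s: %v\n", url, err)
			continue
		}
		name := filepath.Join(outDir, fmt.Sprintf("repro-%04d.syz", i))
		if err := os.WriteFile(name, repro, 0o644); err != nil {
			panic(err)
		}
		downloaded++
	}
	fmt.Printf("downloaded %d syzlang reproducers\n", downloaded)
}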
3.1.3 Running the Regression Tests
We ran the reproducers for crashes fixed in the 4.14 series against the newer v4.14.292 kernel, and similarly the fixed 4.19 crash reproducers against the newer v4.19.257 release. After running the baseline tests, we also ran the fixed corpus against newer version releases of the Linux Kernel.
3.2 Analyzing the Test Results
We have identified that the Long Term Stable releases of the Linux Kernel do not have an official channel for reporting regression bugs, and there are many unnoticed bugs in these releases. We reported our regressions to the syzkaller-lts-bugs mailing list, which tracks bugs for the LTS releases.
3.3 Contributions
We have prepared scripts that can be used to set up an automated regression testing framework using previously fixed, syzkaller-discovered kernel bugs. Some of these scripts were shared with the KernelCI testing team [4].
4 Enriched Corpus
In this thesis, we propose a novel approach to generating high-quality input data for fuzzing: enriching the corpus with historic crashes. Specifically, we propose using publicly available data on historic syzkaller crashes as a corpus for generating fuzzing inputs. We specifically target the reproducers of bugs that have been fixed, to avoid re-triggering known unfixed bugs and to avoid bugs marked invalid.
To evaluate our approach, we conducted experiments using syzkaller and HEALER with the historic-crash enriched corpus. Our results show that our approach is effective at generating high-quality, diverse inputs that are capable of identifying new vulnerabilities in the kernel under test. Additionally, our approach is significantly faster and more efficient than starting fuzzing from scratch, making it a valuable resource.
Overall, our work represents an important step forward in the field of kernel security by providing a novel approach to generating high-quality input data for fuzzing. By leveraging the wealth of publicly available historic crash data, we can discover a high number of bugs, ultimately helping to improve the security of the Linux Kernel and the safety of its users. We contribute our pipeline for preparing the enriched corpus as an open-source repository [6].
4.1 Procedure
• Obtain all crash reproducers for fixed bugs from the syzbot dashboard, and condense the reproducers into a corpus that can be fed into the fuzzers, syzkaller and HEALER [9].
• Prepare fuzzing environments and run the fuzzers for a fixed time period.
• Analyze the results and identify any patterns or trends to understand the impact of the enriched corpus on Linux Kernel fuzzing.
4.2 Obtaining Crash Reproducers
We automate obtaining the crash reproducers from syzbot [12] using scripts. We collected reproducers of the bugs that were fixed before the release of the kernel versions under test. We tested various versions of the Linux Kernel, namely v6.0.8 and v6.1.20, and experiments against v6.3-rc6 are ongoing at the time of this writing.
HEALER [9] consumes a corpus as a set of text files written in syzlang, the grammar defined by syzkaller. Syzkaller consumes a corpus formatted as a database created with the built-in syz-db tool; a sketch of how the reproducers can be condensed into both formats follows.
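The following Go sketch shows one way to condense the collected reproducers into both formats: plain syzlang text files for HEALER, and a corpus.db for syzkaller produced by shelling out to the syz-db tool. The directory layout and the exact "syz-db pack <dir> <corpus.db>" invocation are assumptions for illustration; consult the syzkaller documentation for the tool's current usage.

package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// buildCorpus copies every *.syz reproducer into a corpus directory (which
// HEALER can consume directly as syzlang text files) and then packs that
// directory into a corpus database for syzkaller using the syz-db tool.
func buildCorpus(reproDir, corpusDir, corpusDB, syzDB string) error {
	if err := os.MkdirAll(corpusDir, 0o755); err != nil {
		return err
	}
	files, err := filepath.Glob(filepath.Join(reproDir, "*.syz"))
	if err != nil {
		return err
	}
	for _, f := range files {
		data, err := os.ReadFile(f)
		if err != nil {
			return err
		}
		dst := filepath.Join(corpusDir, filepath.Base(f))
		if err := os.WriteFile(dst, data, 0o644); err != nil {
			return err
		}
	}
	// Pack the directory of syzlang programs into syzkaller's database format.
	// The "pack" subcommand and argument order are our assumption of syz-db usage.
	cmd := exec.Command(syzDB, "pack", corpusDir, corpusDB)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("syz-db pack failed: %w", err)
	}
	return nil
}

func main() {
	// Hypothetical paths for illustration only.
	if err := buildCorpus("reproducers", "corpus-healer", "corpus.db", "syz-db"); err != nil {
		panic(err)
	}
}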
For a comprehensive corpus study, we also obtained the corpus from the Google-run syzkaller instance, syzbot [13], which is available for research purposes. We refer to this corpus as the "Google corpus" for ease of comparison.
Number of programs in the corpora tested:
• Enriched Corpus for Production experiments: 14901 Programs
The number of programs in the enriched corpus differs between the two experiments because the production enriched corpus was obtained at a later date, prior to the release of the newer kernel.
4.3 Experiment Setup
We used a ThinkMate server with an Intel® Xeon® Gold 6226R processor (32 threads) for our experiments.
4.3.1 Initial Experiments
The initial set of kernel fuzzing experiments was performed with one virtual machine leveraging the KVM capability of the server. Each virtual machine had 2 vCPU cores and 4 GB of RAM. All experiments ran for 24 hours (effectively 48 CPU hours), and we aggregated the results of 10 isolated runs.
4.3.2 Production Setup
The later set of production experiments ran on 8 virtual machines, each with a configuration similar to the one above. All experiments ran for 24 hours (effectively 384 CPU hours). These experiments are ongoing, and we only present the data collected so far.
4.4 Observations
4.4.1 Enriched Corpus with Initial Setup
• No corpus
Additionally, we run the experiments on syzkaller and HEALER.
Figure 4.1: Coverage observed during 48 CPU hours (i.e., 24 hours per VM instance)
It is known that using a corpus [14] boosts the coverage achieved by the fuzzers. The coverage metric is the number of compiler-instrumented (gcc) basic blocks covered; this information is obtained directly from syzkaller and HEALER. We notice that HEALER outperforms syzkaller at discovering new coverage when working with the same initial corpus. However, as shown in Figure 4.1, syzkaller discovered more coverage when starting without a corpus.
Syzkaller and HEALER classify crashes with unique tracebacks differently. In practice, each unique traceback corresponds to a unique bug. The unique crash counts are shown in Figure 4.2.
Figure 4.2: Unique crashes observed during 48 CPU hours (i.e., 24 hours per VM
instance)
Syzkaller and HEALER also report the total number of crashes across all bugs; each such crash relates to one of the bugs. Similar to the observations with unique crashes, we can observe in Figure 4.3 that the totals plateau more clearly when using the enriched corpus. We also notice that the number of crashes identified with HEALER keeps growing and could potentially overtake syzkaller if run for a longer time.
Figure 4.3: Total crashes observed during 48 CPU hours (i.e., 24 hours per VM instance)
4.4.2 Corpus Comparison in Production Setup
This set of experiments is performed to emulate larger fuzzing campaigns with heavy corpus intake. (The data in this section is incomplete, as the experiments are ongoing.) The single-VM initial experiments took an extremely long time to triage the input corpus and are not representative of the scale at which companies like Google operate.
We also prepare an enriched version of the Google corpus by injecting the enriched corpus into Google's corpus and repacking it.
Figure 4.4: Coverage observed during 384 CPU hours (i.e., 24 hours of 8 VM instances)
We notice in Figure 4.4 that the enriched version of the Google corpus outperforms all other corpus candidates. We also notice that forcing minimization of the Google corpus costs CPU time before it reaches the same coverage as a pre-minimized version of the same corpus. The total coverage observed with the Google corpus is higher than with the enriched corpus alone.
Figure 4.5: Total crashes observed during 384 CPU hours (i.e., 24 hours of 8 VM
instances)
In Figure 4.5 we notice that only the fuzzer instances with the enriched corpus result in a higher number of crashes.
In Figure 4.7 we observe the total triage time for the different corpora. Despite spending a considerable amount of CPU time triaging the candidate programs in the enriched Google corpus, it still results in high coverage and a high number of crashes.
Figure 4.6: Unique crashes observed during 384 CPU hours (i.e., 24 hours of 8 VM
instances)
4.5 Results of Enriched Corpus
We identified 22 new bugs when minimizing the enriched corpus against different versions of the Linux Kernel. We have obtained 4 CVE identifiers for them so far: CVE-2023-26544, CVE-2023-26605, CVE-2023-26606, and CVE-2023-26607.
Minimizing the enriched corpus with syzkaller resulted in the discovery of previously unknown bugs. Most of the bugs were identified during the minimization step of the fuzzer's corpus triage phase. Syzkaller delivers more bugs than HEALER when minimizing the enriched corpus.
Figure 4.7: Minimization time over 384 CPU hours fuzzing (i.e., 24 hours of 8 VM
instances)
4.6 Ongoing and Future Work
Production-scale experiments as described in Section 4.3.2 are ongoing. We will also need to evaluate the performance of HEALER in such a setting with high-CPU-hour campaigns. There is an opportunity to study how effectively minimization can be handled and whether it can serve as an input generation mechanism for a new kernel fuzzing tool.
We have contributed the enriched corpus generation framework as an open-source tool so that researchers can consume an enriched corpus directly. Additionally, we have added a continuous integration pipeline using GitHub Actions to prepare the enriched corpus and an enriched version of Google's corpus daily in an automated manner.
4.7 Conclusion
We observe clear benefits from leveraging historic crashes as an enriched corpus with syzkaller. However, the coverage obtained through historic crashes is limited to the bugs that have been previously discovered and fixed.
5 Bonsai Fuzzing
5.1 Background and Motivation
System-call-based kernel fuzzers such as syzkaller and HEALER [9] leverage syzlang as the grammar for defining system calls. Syzkaller employs a biased random search to identify subsequent system calls when generating a program from scratch. When syzkaller is started, it initializes its choice table, a large matrix that captures the relationships between system calls. In the case of syzkaller, the choice table is generated once at the start of the fuzzing campaign, based on heuristics and the corpus presented to the fuzzer. The heuristics capture similarity in the parameters shared by different system calls. The corpus presented to syzkaller is used to identify system calls that occur together in a program, and this information is added to the heuristic-based knowledge to shape the choice table.
When a program is generated by syzkaller, the first system call is arbitrarily chosen from the list of available system calls. Subsequently, the aforementioned biased random search is applied to the choice table to determine the next system call. This process is repeated until the size of the program matches the requirement set by the fuzzer.
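A minimal sketch of this biased random search is shown below. The choice table is modeled as a map from a call to the priorities of its possible successors, which is a simplification of the real fuzzer's data structures; the syscall names and weights are purely illustrative.

package gen

import "math/rand"

// nextCall picks the next system call given the choice-table row of the
// previous call: each candidate is chosen with probability proportional to
// its priority, which is the "biased" part of the random search.
func nextCall(row map[string]int, rng *rand.Rand) string {
	total := 0
	for _, w := range row {
		total += w
	}
	if total == 0 {
		return ""
	}
	pick := rng.Intn(total)
	for call, w := range row {
		if pick < w {
			return call
		}
		pick -= w
	}
	return "" // unreachable
}

// generate builds a program of the requested size: an arbitrary first call,
// then repeated biased choices from the choice table.
func generate(table map[string]map[string]int, calls []string, size int, rng *rand.Rand) []string {
	prog := []string{calls[rng.Intn(len(calls))]}
	for len(prog) < size {
		next := nextCall(table[prog[len(prog)-1]], rng)
		if next == "" {
			break
		}
		prog = append(prog, next)
	}
	return prog
}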
Typically, syzkaller generates programs of size 30. During fuzzing, if a crash is observed, syzkaller employs a bisection algorithm to find the culprit program and further minimizes that program to determine the crash reproducer. Upon surveying a set of crash reproducers from the Linux Kernel in Figure 5.1, we observe that the majority of programs responsible for crashes are smaller than the programs generated by the fuzzer. This leads to the hypothesis that generating smaller programs by design could save reproducer identification time. Generating smaller programs for fuzzing could also allow more programs to be executed in a given amount of time. With this motivation, we began our experiments on generating programs with an iterative deepening strategy.
5.2 Bonsai Fuzzing Parameters
5.2.1 Program Size
We decided to use the program size as one of the Bonsai Fuzzing parameters in our implementation. The program size determines the number of system calls in a program. We label this property m in our lattice nodes.
5.2.2 System Call Priority
Since the choice table is a key element in most grammar-based, coverage-guided fuzzers for the Linux Kernel, we relied on it as the second parameter for our implementation. The affinity of the highest-priority n system calls constrains program generation: only the top n system calls with the highest affinity are allowed as subsequent system calls in the program, as sketched below.
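The following Go sketch applies such a top-n mask to one row of priorities. It mirrors the masking idea we prototyped on top of syzkaller's choice table, but the row representation here is a simplified stand-in rather than syzkaller's actual structure.

package bonsai

import "sort"

// topNMask keeps only the n highest-affinity successor calls in a
// choice-table row and drops everything else, so that program generation
// can only pick from those n candidates.
func topNMask(row map[string]int, n int) map[string]int {
	type entry struct {
		call string
		prio int
	}
	entries := make([]entry, 0, len(row))
	for call, prio := range row {
		entries = append(entries, entry{call, prio})
	}
	// Highest priority first.
	sort.Slice(entries, func(i, j int) bool { return entries[i].prio > entries[j].prio })
	if n > len(entries) {
		n = len(entries)
	}
	masked := make(map[string]int, n)
	for _, e := range entries[:n] {
		masked[e.call] = e.prio
	}
	return masked
}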
5.3 Bonsai Fuzzing Lattice
Figure 5.2 demonstrates the Bonsai Fuzzing lattice we experimented with. The lattice nodes are labelled [m, n], where m represents the program size and n represents the number of highest-priority system calls allowed based on the choice table.
The corpus from a lattice node [m, n] is provided as input to the subsequent lattice nodes [m+1, n] and [m, n+1]. For a lattice node [m, n], the input corpus is the merged corpus from the nodes [m-1, n] and [m, n-1]. The fuzzing for nodes [m+1, n] and [m, n+1] can run in parallel once their required input corpora are available.
In our experiment, we started with the lattice node [1, 0], which means we generated programs containing a single system call. In the subsequent node, we leveraged the corpus from [1, 0] as input. Node [2, 0] means we generated programs consisting of two system calls, where the first call is chosen at random and the second is the highest-priority system call in the choice table row of the first.
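The corpus flow between lattice nodes can be sketched as follows. The runNode helper is a hypothetical stand-in for a full fuzzing campaign with the node's program-size limit and top-n mask; the sketch walks the lattice in increasing m+n order, and nodes on the same diagonal could be fuzzed in parallel.

package bonsai

import "fmt"

// Node identifies a lattice node [m, n]: programs of at most m calls,
// restricted to the top-n successors in the choice table.
type Node struct{ M, N int }

// runNode is a stand-in for running a fuzzing campaign with the node's
// parameters and an input corpus, returning the corpus it produced.
func runNode(node Node, input []string) []string { /* ... */ return input }

// merge concatenates and de-duplicates the corpora of two predecessor nodes.
func merge(a, b []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, p := range append(append([]string{}, a...), b...) {
		if !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	return out
}

// runLattice walks the lattice: node [m, n] consumes the merged corpora of
// [m-1, n] and [m, n-1]; missing predecessors simply contribute no programs.
func runLattice(maxM, maxN int) map[Node][]string {
	corpora := map[Node][]string{}
	corpora[Node{1, 0}] = runNode(Node{1, 0}, nil) // starting node: single-call programs
	for sum := 2; sum <= maxM+maxN; sum++ {
		for m := 1; m <= maxM; m++ {
			n := sum - m
			if n < 0 || n > maxN {
				continue
			}
			input := merge(corpora[Node{m - 1, n}], corpora[Node{m, n - 1}])
			corpora[Node{m, n}] = runNode(Node{m, n}, input)
			fmt.Printf("ran node [%d,%d] with %d input programs\n", m, n, len(input))
		}
	}
	return corpora
}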
5.4 Procedure
5.4.1 Bonsai Fuzzing with Linux Kernel
• Determine the Bonsai Fuzzing parameters for Linux Kernel fuzzing. The
parameters chosen are system call priority and program size.
• Prototype the design in the syzkaller source code by applying masks to the system call choice table.
• Prepare fuzzing environments and run the fuzzers for a fixed time period.
• Analyze the results and identify any patterns or trends to understand the impact of the Bonsai Fuzzing strategy on kernel fuzzing program generation.
5.5 Experiment Setup
We used a ThinkMate server with an Intel® Xeon® Gold 6226R processor (32 threads) for our experiments.
The initial set of kernel fuzzing experiments was performed with one virtual machine leveraging the KVM capability of the server. Each virtual machine had 2 vCPU cores and 4 GB of RAM. All experiments ran for 24 hours (effectively 48 CPU hours), and we aggregated the results of 10 isolated runs. We were only able to experiment with one lattice configuration.
5.6 Observations
We notice that despite the higher execution throughput of the Bonsai-based approach shown in Figure 5.4, we do not match the coverage baseline of syzkaller shown in Figure 5.3.
Figure 5.3: Coverage over time against Linux v6.0.8
Figure 5.4: Executions over time against Linux v6.0.8
5.8 Conclusion
Program generation and corpus merging pose exciting challenges when exploring iterative-deepening-based input generation with Bonsai Fuzzing in the Linux Kernel. Due to a lack of time and experience, we were not able to fully investigate this research goal.
6 Conclusion
With this research, we identified the value of leveraging historic crashes and contributed the enriched corpus generation framework. We also identified many new bugs during our research and have reported them to the Linux Kernel maintainers.
Our enriched corpus framework [6] has gained traction in the open-source community, and we are currently pursuing the production-scale experiments described in Section 4.3.2 to make a case for Google to adopt the enriched corpus within syzbot.
The majority of the crashes with the enriched corpus are a result of the corpus triage and the subsequent minimization steps performed by syzkaller. We demonstrate the benefits of using the enriched corpus and of improving the quality of existing fuzzer corpora.
There are many opportunities to study program generation; we feel that the Bonsai Fuzzing methodology implemented and tested in this thesis only scratches the surface.
Bibliography
[1] J. Corbet, “Development statistics for the 6.1 kernel (and beyond) [lwn.net],”
Dec 2022. [Online]. Available: https://lwn.net/Articles/915435/. [Accessed
30-Apr-2023].
[8] H. Shi, R. Wang, Y. Fu, M. Wang, X. Shi, X. Jiao, H. Song, Y. Jiang, and
J. Sun, “Industry practice of coverage-guided enterprise Linux kernel fuzzing,”
in Proceedings of the 2019 27th ACM Joint Meeting on European Software Engi-
neering Conference and Symposium on the Foundations of Software Engineering,
2019, pp. 986–995.
[9] H. Sun, Y. Shen, C. Wang, J. Liu, Y. Jiang, T. Chen, and A. Cui, “HEALER:
Relation learning guided kernel fuzzing,” in Proceedings of the ACM SIGOPS
28th Symposium on Operating Systems Principles, 2021, pp. 344–358.
[10] V. Vikram, R. Padhye, and K. Sen, “Growing a test corpus with Bonsai fuzzing,”
in 2021 IEEE/ACM 43rd International Conference on Software Engineering
(ICSE). IEEE, 2021, pp. 723–735.