Improving Linux Kernel Fuzzing

Submitted in partial fulfillment of the requirements for

the degree of

Master of Science

in

Information Technology - Information Security

Palash B. Oswal

Bachelor of Technology in Computer Engineering


Sardar Vallabhbhai National Institute of Technology, INDIA

Carnegie Mellon University


Pittsburgh, PA

May, 2023
© Palash B. Oswal, 2023
All Rights Reserved
Acknowledgements

I would like to express my sincere gratitude to my advisor, Dr. Rohan Padhye,
for their invaluable guidance, encouragement, and support throughout my Master’s
program. Without their guidance and expertise, this thesis would not have been
possible. Dr. Padhye also sponsored the compute required to perform experiments
for the thesis. The thesis is self-funded.
I also extend my heartfelt thanks to my thesis reader, Dr. Maverick Woo, for
their insightful comments and suggestions, which greatly improved the quality of this
thesis.
I am deeply grateful to the members of my research group, Pasta Lab, for their
support, collaboration, and friendship. I have learned so much from each of you and
greatly appreciate all the discussions, feedback, and assistance you provided.
Finally, I would like to thank my department staff for their support.

Abstract

In the Linux Kernel project, one of the most rapidly evolving code bases, fuzz
testing is a successful approach for vulnerability detection. However, with the high
rate of change in the kernel code, testing each change thoroughly becomes a challenge.
With this study, we explore various ways to improve the current Linux Kernel testing
landscape. We identify and contribute novel ways of leveraging previously discovered
crashes in the Linux Kernel, which we call the enriched corpus. We also investigate
aspects of program generation for system call fuzzers using iterative deepening.
We work with state-of-the-art kernel fuzzers such as syzkaller [11] and HEALER [9].
During this research, we identified many new kernel bugs and contributed a new open
source framework for enriching fuzzer corpora [6]. We also identify challenges in
working with corpora, discuss our ongoing experiments, and lay out future areas for
research. These findings provide insight into improving the Linux Kernel fuzz testing
process for higher system reliability and security.

Table of Contents

Acknowledgements

Abstract

List of Figures

1 Introduction

2 Background and Problem
2.1 Kernel Fuzzing
2.1.1 Syzkaller
2.1.2 HEALER
2.1.3 Crash Reproducers
2.1.4 Corpus Minimization
2.2 Current Testing Landscape and Challenges
2.2.1 Syzbot
2.2.2 Regression Testing
2.3 Research Questions
2.3.1 Are previously-known crashes and high code coverage corpus entries a valuable
source for testing the Linux Kernel?
2.3.2 Do historic crashes add value to fuzzing the Linux Kernel as an initial corpus
of inputs?
2.3.3 Can we construct programs intelligently using Bonsai Fuzzing technique that
can lead to faster bug discovery?

3 Regression Testing
3.1 Procedure
3.1.1 Regression Testing
3.1.2 Obtaining Crash Reproducers
3.1.3 Running the Regression Tests
3.2 Analyzing the Test Results
3.3 Contributions

4 Enriched Corpus
4.1 Procedure
4.2 Obtaining Crash Reproducers
4.3 Experiment Setup
4.3.1 Initial Experiments
4.3.2 Production Setup
4.4 Observations
4.4.1 Enriched Corpus with Initial Setup
4.4.2 Corpus Comparison in Production Setup
4.5 Results of Enriched Corpus
4.6 Ongoing and Future Work
4.7 Conclusion

5 Bonsai Fuzzing
5.1 Background and Motivation
5.2 Bonsai Fuzzing Parameters
5.2.1 Program Size
5.2.2 System Call Priority
5.3 Bonsai Fuzzing Lattice
5.4 Procedure
5.4.1 Bonsai Fuzzing with Linux Kernel
5.5 Experiment Setup
5.6 Observations
5.7 Results and Challenges
5.8 Conclusion

6 Conclusion

Bibliography

List of Figures

Figure 4.1 Coverage observed during 48 CPU hours (i.e., 24 hours per VM instance)
Figure 4.2 Unique crashes observed during 48 CPU hours (i.e., 24 hours per VM instance)
Figure 4.3 Total crashes observed during 48 CPU hours (i.e., 24 hours per VM instance)
Figure 4.4 Coverage observed during 384 CPU hours (i.e., 24 hours of 8 VM instances)
Figure 4.5 Total crashes observed during 384 CPU hours (i.e., 24 hours of 8 VM instances)
Figure 4.6 Unique crashes observed during 384 CPU hours (i.e., 24 hours of 8 VM instances)
Figure 4.7 Minimization time over 384 CPU hours fuzzing (i.e., 24 hours of 8 VM instances)

Figure 5.1 Histogram of Crash reproducers in the Linux Kernel
Figure 5.2 Bonsai Fuzzing Lattice
Figure 5.3 Coverage over time against Linux v6.0.8
Figure 5.4 Executions over time against Linux v6.0.8

Symbols
rM, Ns A bonsai fuzzing based node generated using properties M and
N.

M Size of program M .

N Choice Table affinity of system call N .

Abbreviations
LTS Long Term Stable releases for the Linux Kernel.

LiFT Shubhra Kar Linux Foundation Training (LiFT) Scholarship Program

1
Introduction

The Linux Kernel is a highly privileged component in a system, and therefore
vulnerabilities and bugs in the Kernel directly affect the stability and security
of the system. The Linux Kernel project is one of the most rapidly evolving code
bases with over 1000 line change sets per day [1]. Fuzz testing is a successful approach
for vulnerability detection in the Linux Kernel, but with the high rate of change,
testing each change thoroughly becomes a challenge.
For instance, a buffer overflow vulnerability in the Linux Kernel stack has the
potential to render the kernel unresponsive, and might also lead to the compromise
of the system in an attack [3]. However, due to the growing complexity of the feature
requirements and the growth in publicly available system architectures, maintaining
the Linux Kernel code is complicated. As of writing, the Linux Kernel has over 1.1
million commits, resulting in a huge code base. Manually writing unit tests that
achieve high code coverage for a kernel of this size is difficult. At such a scale,
fuzz testing [8] has proven to be a successful technique to identify bugs in the Linux
Kernel. One of the most popular approaches to fuzz testing the Linux Kernel is
system call fuzzing. Syzkaller [11], a coverage-guided system call fuzzer,
is a state-of-the-art kernel fuzzer. It uses system call descriptions and a choice table
to generate system call sequences. The choice table captures the probability of
one system call following another in a sequence. Syzkaller has identified over 4000
bugs in the upstream Linux Kernel in the past five years with the help of compiler
instrumented kernel sanitizers [2]. Code coverage exercised by the fuzzer programs
is identified using branch coverage information instrumented by the compiler and
exposed through the kernel's coverage feature (KCOV). Coverage collection is available
to userspace on a per-task basis, and therefore fuzzers can capture the coverage of a
single system call. Kernel fuzzers like syzkaller can operate with a known set of input
programs, referred to as a corpus of programs. Moonshine [7] is designed to construct
input programs using seed distillation. Moonshine extracts the initial seeds by observing
program execution in user space. HEALER [9] is another coverage-guided, grammar-based
system call fuzzer for the Linux Kernel. HEALER aims to identify influence relations
between system calls at runtime to inform its choice table. HEALER observes a sequence of
system calls and then tries to identify whether the execution of one system call affects the
execution of the second system call. This improves the accuracy of the data available
in the choice table, so HEALER more often generates programs that keep related
system calls together. Kernel testing groups like KernelCI [4] also perform regression
testing for newer releases of the Linux Kernel. They leverage the historic crashes
identified by syzkaller to perform regression testing. However, the process of collecting
these crashes is manual. Each run of regression tests takes over 4 hours to complete,
and therefore it is infeasible to test each commit exhaustively.
This study investigates improvements to system call-based coverage-guided
fuzzers of the Linux Kernel. We explore regression testing as a viable strategy to test
long term releases of the Linux Kernel. We also demonstrate the value of leveraging
historic crashes as an input corpus for the fuzzers. These crashes were
identified over decades of CPU time spent fuzzing, and they provide value in future
explorations. We call this the enriched corpus for kernel fuzzers. The majority of
the new bugs identified with the enriched corpus are a result of the corpus triage process,
and we explore improvements for existing corpora with the enriched corpus. We
identified 22 new high severity bugs with the enriched corpus, and we obtained 4
CVE identifiers for the vulnerabilities. We have open sourced our enriched corpus
framework for everyone to consume, with a continuous integration mechanism to keep
the corpus up to date with historic crashes.
Additionally, we also evaluate how we can approach a new program generation
strategy using iterative deepening after surveying the historic crashes.
The results of this study will provide insight into improving the Linux Kernel fuzz
testing process for higher system reliability and security.

2
Background and Problem

Understanding various avenues of improving the Linux Kernel fuzz testing process will
result in higher system reliability and security whilst supporting rapid development.
Over the past few years, fuzz testing has uncovered numerous bugs, which creates an
opportunity to analyze the resulting crash reproducers. In Bonsai Fuzzing [10], the authors
demonstrate the generation of minimal fuzzer inputs, which would help reduce the
minimization efforts on fuzzer-generated complex inputs. Kernel fuzzers currently
suffer from having to spend time minimizing the crash reproducers.
There are various state-of-the-art tools that can be used for fuzzing the Linux Kernel.
Syzkaller has been developed by Google in the Go programming language. Google
hosts an instance of syzkaller, known as syzbot [13]. Syzkaller is an unsupervised
coverage-guided grammar-based kernel fuzzer. HEALER [9] is a kernel fuzzer that is
inspired by syzkaller but uses dynamic relationship mapping to inform its fuzzing
algorithm, leveraging the same grammar-based approach for input generation as
syzkaller.

2.1 Kernel Fuzzing
2.1.1 Syzkaller

Syzkaller is an unsupervised grammar-based system call fuzzer that supports many
architectures and was developed by Google [11]. It is a mature kernel fuzzer that
leverages a program generation grammar called syzlang. System call definitions
in syzlang might look like:

read(fd fd, buf buffer[out], count len[buf])
close(fd fd)

Syzkaller generates and mutates programs (system call sequences) based on the system call
descriptions in syzlang, the corpus and the choice table. Depending on the provided
inputs, syzkaller generates and executes programs while monitoring the kernel under
test. When syzkaller is operated with an existing corpus, the behavior of the system
is different. The fuzzer first triages the corpus and, if the corpus is marked to be
minimized, minimizes it to determine the smallest subset of programs that
exercises the maximum coverage. All corpus programs are run and subsequently
minimized before the actual fuzzing process begins. Syzkaller duplicates the list of all
corpus programs, and the resulting set is called Candidates. All candidates are triaged
before the actual fuzzing, program generation and minimization process can begin.
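
To make the role of the choice table concrete, the following is a minimal sketch of choice-table-driven program generation. It is not syzkaller's implementation: the table contents, the call names and the `generate_program` helper are hypothetical, and real weights are derived from the syzlang descriptions and the corpus rather than written by hand.

```python
import random

# Hypothetical choice table: CHOICE_TABLE[a][b] is the relative weight of
# generating call b right after call a. The numbers are made up for illustration.
CHOICE_TABLE = {
    "open": {"open": 1, "read": 5, "close": 4},
    "read": {"open": 1, "read": 3, "close": 6},
    "close": {"open": 6, "read": 1, "close": 1},
}


def generate_program(length, rng, first_call="open"):
    """Build a call sequence by sampling each next call from the previous call's row."""
    program = [first_call]
    while len(program) < length:
        row = CHOICE_TABLE[program[-1]]
        calls = list(row)
        weights = [row[name] for name in calls]
        # random.Random.choices performs weighted sampling with replacement.
        program.append(rng.choices(calls, weights=weights, k=1)[0])
    return program
```

Calling `generate_program(5, random.Random(0))` yields a five-call sequence starting with `open`; passing a seeded generator keeps the sketch reproducible.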

2.1.2 HEALER

HEALER [9] operates in the same mode as syzkaller, with the major difference
being how it maintains the choice table. HEALER makes use of the same grammar,
syzlang, for generating programs as syzkaller. HEALER updates the choice table
dynamically at runtime by observing the change in coverage when executing
different system call sequences in a program. For example, consider the program:

r0 = open(&(0x7f0000000000)="./file0", 0x3, 0x9) // system call 1
close(r0) // system call 2

HEALER executes the program consisting of two system calls, open() and close().
If it notices new coverage, it executes two programs separately, each containing one
system call. If the coverage observed when both system calls run together in the first
execution is higher than the coverage of the system calls executed independently,
then HEALER draws an influence relationship between the two system calls. This is
then captured in the choice table to indicate that close() has a higher probability of
occurring if the first system call in a newly generated program is open(). HEALER
also triages the corpus programs given to it as input before starting the actual fuzzing
process. However, the major difference here is that corpus programs are not forced to
be minimized in the same way as in syzkaller. HEALER emphasizes the relationship
between system calls and extracts system call pairs that indicate coverage growth.
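
The influence check described above can be sketched as follows. This is an illustration under stated assumptions, not HEALER's code: `fake_execute` is a hypothetical stand-in for running a program on the kernel and collecting coverage, the block names are invented, and "higher coverage" is interpreted here as "the combined run reaches blocks that neither call reaches alone". The sketch also skips the initial "new coverage" check and goes straight to the comparison.

```python
from collections import defaultdict


def fake_execute(program):
    """Hypothetical executor: returns the set of covered blocks for a call sequence."""
    coverage = set()
    fd_open = False
    for call in program:
        if call == "open":
            coverage.add("open:entry")
            fd_open = True
        elif call == "close":
            coverage.add("close:entry")
            if fd_open:
                # Only reachable when a descriptor produced by open() is closed.
                coverage.add("close:release_file")
                fd_open = False
    return coverage


def learn_influence(first, second, execute, relations):
    """Record that `first` influences `second` if running them together adds coverage."""
    together = execute([first, second])
    separate = execute([first]) | execute([second])
    if together - separate:
        relations[first][second] += 1
        return True
    return False
```

With `relations = defaultdict(lambda: defaultdict(int))`, calling `learn_influence("open", "close", fake_execute, relations)` records a relation from open() to close(), while the reversed pair records nothing.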

2.1.3 Crash Reproducers

Crash reproducers are programs that, when run in isolation, deterministically
produce their respective crashes. Crash reproducers are of high importance when
trying to narrow down the root cause of a bug and are of high value for developers
and researchers. When a fuzzing tool, such as syzkaller, is tasked with identifying
crash reproducers, it tries to narrow down the programs executed before the
kernel crash. Algorithms such as bisection and minimization are employed to
produce the simplest version of the crash reproducer program. Crash reproducers come
in two formats: executable code that can be rerun by developers, and syzlang-based
reproducers that can convey semantic value to the fuzzers.
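
As an illustration of the bisection step, the sketch below searches for the shortest trailing slice of the executed programs that still triggers the crash. It is a simplification and not syzkaller's actual reproduction algorithm: `crashes` is a hypothetical oracle that reruns the given programs on a fresh kernel, and the search assumes that any slice containing a crashing slice also crashes.

```python
def smallest_crashing_suffix(executed, crashes):
    """Return the shortest suffix of `executed` for which crashes(...) is true, or None."""
    if not crashes(executed):
        return None  # The crash does not reproduce even with the full log.
    lo, hi = 1, len(executed)  # Invariant: the suffix of length hi crashes.
    while lo < hi:
        mid = (lo + hi) // 2
        if crashes(executed[-mid:]):
            hi = mid  # A shorter suffix is enough.
        else:
            lo = mid + 1  # The crash needs more history.
    return executed[-hi:]
```

A minimization pass over the individual system calls of the returned programs would then follow to simplify the reproducer further.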

2.1.4 Corpus Minimization

When running syzkaller with a corpus, the tool operates differently when working
with corpora that are hand-packaged. All corpora undergo a triage phase; however,
an additional minimization is only employed for a corpus packaged with version 0.
To force a corpus to undergo minimization during the triage phase, we can unpack and
repack an existing corpus to version 0. During corpus minimization, a program is
iteratively reduced until the least complex configuration is found that still generates
the same new coverage. Consider the following corpus program:

r0 = openat$cdrom(...)
ioctl$CDROMPAUSE(r0, 0x123)

If the corpus program above generates new coverage, the following program is
executed next, and the coverage from this execution is compared to the prior
run.

r0 = openat$cdrom(...)
ioctl$CDROMPAUSE(r0, 0x0)

This minimized version of the corpus program is stored in the new corpus only if
it indeed generates equivalent coverage. The same minimization operation is also
employed at fuzzer runtime when newly mutated or generated programs lead to
increased coverage.
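
The reduction loop can be sketched as below. This is a minimal illustration, not syzkaller's minimizer: programs are modelled as lists of `(call, args)` tuples, `fake_coverage` is a hypothetical coverage oracle in which the ioctl argument does not matter, the device path is invented, and only two simplifications are tried (dropping a call, zeroing an integer argument).

```python
def fake_coverage(program):
    """Hypothetical oracle: returns the blocks covered by a list of (call, args)."""
    covered = set()
    opened = False
    for call, _args in program:
        if call == "openat$cdrom":
            covered.add("cdrom_open")
            opened = True
        elif call == "ioctl$CDROMPAUSE" and opened:
            covered.add("cdrom_pause")  # Independent of the argument value.
    return covered


def minimize(program, coverage_of):
    """Simplify `program` while its coverage stays equal to the original coverage."""
    target = coverage_of(program)
    program = list(program)

    # Pass 1: drop whole calls, last to first, while coverage is unchanged.
    for i in range(len(program) - 1, -1, -1):
        candidate = program[:i] + program[i + 1:]
        if coverage_of(candidate) == target:
            program = candidate

    # Pass 2: replace non-zero integer arguments with 0 while coverage is unchanged.
    for i in range(len(program)):
        call, args = program[i]
        for j in range(len(args)):
            if isinstance(args[j], int) and args[j] != 0:
                simpler = args[:j] + (0,) + args[j + 1:]
                candidate = program[:i] + [(call, simpler)] + program[i + 1:]
                if coverage_of(candidate) == target:
                    program, args = candidate, simpler
    return program
```

Applied to an open followed by two CDROMPAUSE ioctls with arguments 0x123 and 0x456, the sketch drops the redundant ioctl and rewrites the remaining argument to 0x0, mirroring the example above.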

2.2 Current Testing Landscape and Challenges


2.2.1 Syzbot

Google deploys a robust fuzz testing infrastructure with multiple instances of syzkaller
collectively managed by syzbot [13]. Syzbot is responsible for collecting data from
all fuzzer instances, notifying kernel developers through mailing lists about bugs and
crash reproducers, and helping with testing patches. Google deploys a clean instance
of syzkaller, alongside the latest builds of the Linux Kernel and syzkaller [11], every day.
Syzbot has been running for over 1946 days as of this writing and has discovered over
5513 bugs, out of which 4479 bugs have been fixed and 1034 are yet to be addressed.
Not all bugs discovered by syzkaller have crash reproducers available for them. There
are challenges with efficiently identifying crash reproducers for kernel crashes. Syzbot
publishes statistics and a copy of the corpus after daily fuzzing campaigns. These
crash reproducers carry immense value in terms of the CPU hours that led to the
discovery and identification of the bugs. Besides the Linux Kernel, syzbot also tests
other operating system kernels like the Android Kernel and the OpenBSD kernel. Out of
the 4479 bugs discovered by syzkaller that have been fixed, only 2901 bugs have C
reproducers, and an additional 332 bugs have non-deterministic syzlang reproducers.
Each bug can have more than one reproducer. There are more than 14901 unique
syzlang-based reproducers currently available for these 4479 fixed bugs. These
syzlang-based reproducers can be obtained from syzbot.

2.2.2 Regression Testing

Many kernel developer partners collaborate within the KernelCI [4] group’s Linux-Arts [5]
Project. Running regression tests against a kernel takes about 4 hours to
complete. Since the Linux Kernel project is evolving at such a rapid pace, regression
testing is not employed for testing every commit. However, regression testing is used
by distribution maintainers for their releases and for releases of the Linux Kernel
Mainline. The set of test programs has historically been provided manually, in bulk, by the
syzkaller project maintainers when requested by the Kernel testing team.
Since the test set is not always up to date, recent regressions that are not in the
test set can creep into releases undetected.

2.3 Research Questions

The research questions that the thesis aims to explore are:

2.3.1 Are previously-known crashes and high code coverage corpus entries a valuable
source for testing the Linux Kernel?

We obtain a corpus of previously known crashes and high code coverage entries for the
Linux Kernel from the syzkaller dashboard for various versions of the Linux Kernel. We
perform regression testing with this corpus. By this we aim to determine the viability of
relying on a static unit testing corpus for continuous testing of the Linux Kernel.

2.3.2 Do historic crashes add value to fuzzing the Linux Kernel as an initial corpus
of inputs?

We obtain a corpus of previously known crashes from the syzkaller dashboard. We prepare a
fuzzer corpus that can leverage these programs as an input. In the thesis, we investigate
testing this enriched corpus independently, and compare it with other fuzzing corpora.

2.3.3 Can we construct programs intelligently using Bonsai Fuzzing technique that
can lead to faster bug discovery?

We determine the bonsai fuzzing parameters to generate programs and evaluate the initial
prototype.

We identify the challenges and also draw a path for future work.

3
Regression Testing

Regression testing the Linux Kernel involves running a series of tests to ensure that
changes or updates to the kernel do not introduce any new bugs or break existing
functionality. This is important because the kernel is the core of the operating system
and any issue with it can have serious consequences. Regression tests can be run
manually or automated using a testing framework. Automated regression testing
is preferred because it allows for a larger number of tests to be run in a shorter
amount of time, increasing the chances of detecting potential issues. These tests
can include functional tests, which check that the kernel is functioning correctly,
and performance tests, which measure the performance of the kernel under different
conditions. Regression testing is an ongoing process that is critical for maintaining the
stability and reliability of the Linux Kernel. In this chapter, we discuss leveraging
bugs previously discovered by fuzzers for continuous regression testing.

3.1 Procedure
3.1.1 Regression Testing

• Obtain a corpus of previously-known crashes and high code coverage entries
for the Linux Kernel from the syzkaller dashboard for versions v4.14.292 and
v4.19.257 of the Linux Kernel, which are categorized as Long Term Stable
releases.

• Set up testing environments for the corresponding kernels with a Debian
bootstrapped file system.

• Record the number of discovered bugs and compare the results of the regression
tests with the results of the same tests on previous versions of the Linux Kernel
to determine the effectiveness of the corpus for continuous testing.

• Analyze the results and identify any patterns or trends in the effectiveness of
the regression testing.

3.1.2 Obtaining Crash Reproducers

The syzkaller dashboard is a web-based interface that allows users to view and analyze
the results of syzkaller runs. To obtain crash reproducers for regression testing, we
first determine the versions of the Linux Kernel to run regression tests against.
We selected the LTS releases v4.14.292 and v4.19.257. For each major-minor version,
we obtained the corresponding previous crashes that have been reported as fixed
by the developer community. We automated the collection and compilation process
for the test suite.

3.1.3 Running the Regression Tests

After setting up the regression testing environments, automated scripts were set up to
run the corresponding corpus. We ran all the fixed 4.14 crash reproducers against the
newer v4.14.292 kernel and, similarly, the fixed 4.19 crash reproducers against the newer
v4.19.257 release. After running the baseline tests, we also ran the fixed corpus
against newer version releases of the Linux Kernel.
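
The shape of such a runner can be sketched as follows. This is not the script used in the thesis: the names `run_once`, `run_suite` and `DEMO_COMMANDS` are hypothetical, the demo commands are tiny Python processes standing in for compiled reproducers, and a non-zero exit code or a timeout stands in for crash detection. A real setup would run each reproducer inside the test VM and inspect the kernel console log for a crash report.

```python
import subprocess
import sys


def run_once(command, timeout_s):
    """Run one reproducer command and classify the outcome."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "pass" if result.returncode == 0 else "fail"


def run_suite(commands, timeout_s=10.0):
    """Run every named command and return a {name: outcome} mapping."""
    return {name: run_once(cmd, timeout_s) for name, cmd in commands.items()}


# Stand-in "reproducers" for illustration only.
DEMO_COMMANDS = {
    "exits_cleanly": [sys.executable, "-c", "raise SystemExit(0)"],
    "exits_with_error": [sys.executable, "-c", "raise SystemExit(3)"],
}
```

Running `run_suite(DEMO_COMMANDS)` reports one passing and one failing entry; any entry that is not a pass against a kernel where the bug is marked fixed would be a regression candidate.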

3.2 Analyzing the Test Results

We found that the Long Term Stable releases of the Linux Kernel do not have an
official channel for reporting regression bugs, and that many bugs in these
releases go unnoticed. We reported our regressions to the syzkaller-lts-bugs
mailing list, which tracks the bugs for the LTS releases.

3.3 Contributions

We have prepared scripts that can be used to set up an automated regression testing
framework using previously fixed syzkaller discovered kernel bugs. Some of these
scripts were shared with the Kernel CI testing team [4].

4
Enriched Corpus

In this thesis, we propose a novel approach to generating high-quality input data for
fuzzing: enriching the fuzzer corpus with historic crashes. Specifically, we propose using
publicly available data on historic syzkaller crashes as a corpus for generating fuzzing
inputs. We target the reproducers for bugs that have been fixed, to avoid re-triggering
known unfixed bugs and to exclude bugs marked invalid.
To evaluate our approach, we conducted experiments using syzkaller and HEALER
with the historic crash enriched corpus. Our results show that our approach is effective
at generating high-quality, diverse inputs that are capable of identifying new
vulnerabilities in software applications. Additionally, our approach is significantly faster and
more efficient than traditional methods of fuzzing, making it a valuable resource.
Overall, our work represents an important step forward in the field of kernel
security by providing a novel approach to generating high-quality input data for
fuzzing. By leveraging the wealth of historic crash data that is publicly available,
we can uncover a high number of bugs, ultimately helping to improve the security of
software applications and ensure the safety of users. We contribute our pipeline for
preparing the enriched corpus as an open source repository [6].

4.1 Procedure

• Obtain all crash reproducers from syzbot dashboard for fixed bugs. Condense all
reproducers into a corpus that can be fed into fuzzers, syzkaller and HEALER [9].

• Prepare fuzzing environments and run the fuzzers for a fixed time period.

• Collect logs and benchmark statistics data provided by the fuzzer.

• Analyze the results and identify any patterns or trends to understand the impact
of enriched crashes in Linux Kernel fuzzing.

4.2 Obtaining Crash Reproducers

We automate obtaining the crash reproducers from syzbot [12] using scripts. We
collected reproducers of the bugs that were fixed before the release of the kernel
versions under test. We tested various versions of the Linux Kernel, namely v6.0.8 and
v6.1.20, and we have experiments ongoing at the time of this writing against v6.3-rc6.
HEALER [9] consumes a corpus as a set of text files written in syzlang,
the grammar defined by syzkaller. Syzkaller consumes a corpus formatted as a
database using the built-in syz-db tool.
For a comprehensive corpus study, we obtained the corpus from the Google-run syzkaller
instance, syzbot [13], which is available for research purposes. We refer to this corpus
as “Google corpus” for ease of comparison.
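
Condensing downloaded reproducers into a directory of syzlang text files can be sketched as below. This is an illustration rather than the open-sourced pipeline [6]: the `write_corpus_dir` helper is hypothetical, and naming each file after the SHA-1 of its content is an assumption made here so that duplicate reproducers collapse into a single file.

```python
import hashlib
from pathlib import Path


def write_corpus_dir(reproducers, out_dir):
    """Write each distinct syzlang program to out_dir; return how many files were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = 0
    for text in reproducers:
        # Normalize surrounding whitespace so trivially different copies deduplicate.
        normalized = text.strip() + "\n"
        name = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        path = out / name
        if path.exists():
            continue  # Same program already stored.
        path.write_text(normalized, encoding="utf-8")
        written += 1
    return written
```

A directory produced this way is the text-file form that HEALER consumes; for syzkaller, the same directory would then be packed into a database with syz-db.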
Number of programs in the various corpora tested:

• Enriched Corpus for Initial experiments: 14244 Programs

• Moonshine Corpus: 360 Programs

• Google Corpus: 29568 Programs

• Enriched Corpus for Production experiments: 14901 Programs

• Enriched Google Corpus: 43812 Programs

The difference in the number of programs for the enriched corpus in the two
experiments is because the production enriched corpus was obtained at a later date, prior
to the release of the newer kernel.

4.3 Experiment Setup

We used ThinkMate, Intel® Xeon® Gold 6226R server with 32 threads for our
experiments.

4.3.1 Initial Experiments

The initial set of kernel fuzzing experiments were performed with 1 Virtual Machine
leveraging KVM capability on the server. Each Virtual Machine had 2 vCPU cores
and 4 GB RAM. All experiments ran for 24 hours (effectively 48 CPU hours) and we
aggregated results of 10 isolated runs.

4.3.2 Production Setup

The later set of production experiments was run on 8 Virtual Machines, each with a
similar configuration as before. All experiments ran for 24 hours (effectively 384 CPU
hours). These experiments are ongoing and we only present data collected so far.
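
The CPU-hour figures quoted for the two setups follow from multiplying the number of VMs, the vCPUs per VM and the wall-clock duration. The worked arithmetic below assumes that "similar configuration" means each production VM also has 2 vCPUs; the symbol names are introduced here for illustration.

```latex
\begin{align*}
\text{CPU hours} &= N_{\text{VM}} \times N_{\text{vCPU}} \times T_{\text{wall}} \\
\text{Initial setup:}\quad & 1 \times 2 \times 24\,\text{h} = 48\ \text{CPU hours} \\
\text{Production setup:}\quad & 8 \times 2 \times 24\,\text{h} = 384\ \text{CPU hours}
\end{align*}
```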

4.4 Observations
4.4.1 Enriched Corpus with Initial Setup

We compare enriched corpus against:

• No corpus

• Moonshine [7] corpus with sample traces

Additionally, we run the experiments on syzkaller and HEALER.

Coverage Comparison against Linux v6.0.8

Figure 4.1: Coverage observed during 48 CPU hours (i.e., 24 hours per VM instance)

It is known that using a corpus [14] boosts the coverage for the fuzzers. The
coverage information is the number of basic blocks instrumented by the compiler
(gcc). This information is obtained directly from syzkaller and HEALER. We notice
that HEALER outperforms syzkaller in discovering new coverage when working with the
same initial corpus. However, as Figure 4.1 shows, syzkaller discovered more coverage
when starting without a corpus.

Unique Crashes against Linux v6.0.8

Syzkaller and HEALER classify crashes with unique tracebacks differently. In practice,
each unique traceback corresponds to a unique bug. The unique crashes information
is provided directly by the corresponding fuzzers. We notice in Figure 4.2 a large
number of unique crashes when using syzkaller with the enriched corpus. The number
of crashes observed in syzkaller is higher than in HEALER when using the enriched
corpus. We also notice that the unique crashes plateau around the 20 hour mark. The
majority of the crashes are a result of the minimization and triaging
of the enriched corpus.

Figure 4.2: Unique crashes observed during 48 CPU hours (i.e., 24 hours per VM
instance)

Total Crashes against Linux v6.0.8

Syzkaller and HEALER also report the total number of crashes across all bugs. Each
bug relates to a crash. Similar to the observations with unique crashes, we can
observe the plateau more clearly in Figure 4.3 with total crashes when using the
enriched corpus. We also notice that the number of crashes identified with HEALER
is still growing and could potentially overtake syzkaller if it runs for a longer time.

Figure 4.3: Total crashes observed during 48 CPU hours (i.e., 24 hours per VM
instance)
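
The distinction between total and unique crashes can be sketched with a simple tally. The fuzzers report these numbers themselves and, as noted above, classify crashes differently; the `summarize_crashes` helper and the idea of keying each crash on a single title string are assumptions of this sketch.

```python
from collections import Counter


def summarize_crashes(crash_titles):
    """Count total crashes and unique crashes, keyed on the crash title string."""
    per_bug = Counter(crash_titles)
    return {
        "total": sum(per_bug.values()),  # Every observed crash.
        "unique": len(per_bug),  # Distinct titles, one per presumed bug.
        "per_bug": dict(per_bug),
    }
```

For example, four observed crashes that share two distinct titles give a total of 4 and a unique count of 2.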

4.4.2 Corpus Comparison in Production Setup

We compare the enriched corpus against the following (the data in the production section
of experiments is incomplete, as the experiments are ongoing):

• Non-minimized version of Google corpus obtained from syzbot [13]

• Pre-minimized version of Google corpus obtained from syzbot

• Enriched version of the non-minimized Google corpus from syzbot

This set of experiments is performed to emulate larger fuzzing campaigns with
heavy corpus intake. The single-VM initial experiments took an extremely long time
to triage the input corpus and are not representative of the scale at which companies
like Google operate them.
We also prepare an enriched version of the Google corpus by injecting the enriched
corpus into Google’s corpus and repacking it.

Coverage against Linux v6.3-rc6

Figure 4.4: Coverage observed during 384 CPU hours (i.e., 24 hours of 8 VM instances)

We notice in Figure 4.4 that the enriched version of the Google corpus outperforms all
other corpus candidates. We notice that forcing minimization of the Google corpus takes
more CPU time to reach the same coverage than a pre-minimized version of the same
corpus. The total coverage observed with the Google corpus is higher than with just the
enriched corpus.

Figure 4.5: Total crashes observed during 384 CPU hours (i.e., 24 hours of 8 VM
instances)

Total Crashes against Linux v6.3-rc6

In Figure 4.5, we notice that only the fuzzer instances seeded with an enriched corpus
produce a higher number of crashes.

Unique Crashes against Linux v6.3-rc6

In Figure 4.6, consistent with our observations in the initial experiments in
Section 4.4.1, we notice that the enriched corpus continues to display a higher count
of unique crashes.

Time to Triage against Linux v6.3-rc6

In Figure 4.7, we observe the total triaging time for the different corpora. Despite
spending a considerable amount of CPU time triaging the candidate programs, the
enriched Google corpus still results in high coverage and a high number of crashes.

Figure 4.6: Unique crashes observed during 384 CPU hours (i.e., 24 hours of 8 VM
instances)

4.5 Results of Enriched Corpus

We identified 22 new bugs when minimizing the enriched corpus against different
versions of the Linux Kernel. We have also obtained 4 CVE identifiers for them
so far: CVE-2023-26544, CVE-2023-26605, CVE-2023-26606, and CVE-2023-26607.
Minimizing the enriched corpus with syzkaller also resulted in the discovery of
previously unknown bugs. Most of the bugs were identified as a result of the
minimization process during the corpus triage phase of the fuzzer. Syzkaller delivers
more bugs than HEALER when minimizing the enriched corpus.

Figure 4.7: Minimization time over 384 CPU hours fuzzing (i.e., 24 hours of 8 VM
instances)

4.6 Ongoing and Future Work

Production-scale experiments as described in Section 4.3.2 are ongoing. We will also
need to evaluate the performance of HEALER in such a setting with high CPU hour
campaigns. There is an opportunity to study how effectively minimization can be
handled and whether it can be used as an input generation strategy for a new kernel
fuzzing tool.
We have contributed the enriched corpus generation framework as an open source
tool so that researchers can consume an enriched corpus directly. Additionally, we
have added a Continuous Integration framework with GitHub Actions to prepare the
enriched corpus and an enriched version of Google's corpus daily in an automated
manner.

4.7 Conclusion

We observe clear benefits from leveraging historic crashes: seeding syzkaller with
the enriched corpus yields more total crashes and more unique crashes in our
experiments. However, the coverage obtained through historic crashes is limited to
the bugs that have been previously discovered and fixed.

5 Bonsai Fuzzing

Bonsai Fuzzing [10] is a coverage-guided, grammar-based fuzzing technique for
automatically synthesizing a corpus of concise test inputs. The authors' key insight
was that, instead of attempting to minimize convoluted fuzzer-generated test inputs,
one can grow concise test inputs by construction using a form of iterative
deepening. They call this approach Bonsai Fuzzing. In this research, we attempted
to implement a similar approach to generating fuzzing programs for the Linux Kernel
in syzkaller [11].

5.1 Background and Motivation

System call based kernel fuzzers, like syzkaller and HEALER [9], leverage syz-lang
as the grammar for defining system calls. Syzkaller employs a biased random search
to identify subsequent system calls when generating a program from scratch. When
syzkaller is started, it initializes its choice table. The choice table is a large matrix
that captures the relationships between system calls. In the case of syzkaller, the
choice table is generated once at the start of the fuzzing campaign, based on
heuristics and the corpus presented to the fuzzer. The heuristics capture similarity
in the parameters shared by different system calls. The corpus presented to syzkaller
is used to identify the system calls that occur together in a program, and this is
added to the heuristic-based knowledge to shape the choice table.
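
To make the two ingredients of the choice table concrete, the following is a minimal sketch rather than syzkaller's implementation. It assumes each system call is described by a set of parameter type names, scores the heuristic part as the number of shared types, and adds one point for every corpus program in which two calls occur together. The function name `build_choice_table`, the input shapes, and the scoring scheme are all assumptions of this illustration.

```python
from itertools import permutations


def build_choice_table(call_params, corpus):
    """Return table[a][b]: affinity of call b as a follow-up to call a.

    call_params: dict mapping call name -> set of parameter type names.
    corpus: list of programs, each a list of call names.
    """
    calls = sorted(call_params)
    known = set(calls)
    table = {a: {b: 0 for b in calls} for a in calls}

    # Heuristic part: calls that share more parameter types score higher.
    for a in calls:
        for b in calls:
            table[a][b] += len(call_params[a] & call_params[b])

    # Corpus part: calls that occur together in a program get extra weight.
    for program in corpus:
        present = set(program) & known
        for a, b in permutations(sorted(present), 2):
            table[a][b] += 1
    return table


# Toy inputs (hypothetical parameter type names).
params = {
    "open": {"filename", "fd"},
    "read": {"fd", "buffer"},
    "close": {"fd"},
    "mmap": {"addr", "fd"},
}
seed_corpus = [["open", "read", "close"], ["open", "mmap"]]
table = build_choice_table(params, seed_corpus)
```

In this toy table, `read` ends up with a higher affinity to `open` than to `mmap`, because `open` and `read` both share a type and appear together in a seed program.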
When syzkaller generates a program, the first system call is chosen arbitrarily
from the list of available system calls. Subsequently, the aforementioned biased
random search is employed against the choice table to determine the next system
call. This process is repeated until the size of the program matches the requirements
set by the fuzzer.
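
The generation loop described above can be sketched as a weighted random walk over the choice table. This is an illustration under stated assumptions, not syzkaller's code: the table is a nested dictionary of non-negative weights, the bias is taken from the most recently added call (a simplification), and the function name `generate_program` is hypothetical.

```python
import random


def generate_program(choice_table, size, rng):
    """Build a list of `size` call names using a biased random search."""
    if size <= 0:
        return []
    calls = sorted(choice_table)
    program = [rng.choice(calls)]  # the first call is chosen arbitrarily
    while len(program) < size:
        row = choice_table[program[-1]]
        weights = [row[c] for c in calls]
        if sum(weights) == 0:
            # No known affinity for this call: fall back to a uniform choice.
            nxt = rng.choice(calls)
        else:
            # Higher-affinity calls are proportionally more likely to be picked.
            nxt = rng.choices(calls, weights=weights, k=1)[0]
        program.append(nxt)
    return program


# Toy table in which "a" is always followed by "b" and vice versa.
alternating = {"a": {"a": 0, "b": 1}, "b": {"a": 1, "b": 0}}
example = generate_program(alternating, 4, random.Random(0))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which is useful when comparing generation strategies.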
Typically, syzkaller generates programs of size 30. During fuzzing, if a crash is
observed, syzkaller employs a bisection algorithm to find the culprit program and
further minimizes that program to determine the crash reproducer. Upon surveying
a set of crash reproducers from the Linux Kernel in Figure 5.1, we observe that the
majority of programs responsible for the crashes are smaller than the programs that
are generated by the fuzzer. This leads to the hypothesis that generating smaller
programs by design could save reproducer identification time. Generating smaller
programs for fuzzing could also result in more programs being generated in a given
amount of time. With this motivation, we begin our experimentation with generating
programs using an iterative deepening strategy.

Figure 5.1: Histogram of Crash reproducers in the Linux Kernel
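
The cost of shrinking a large crashing program can be illustrated with a simple greedy minimizer. This sketch is not syzkaller's minimization algorithm: it removes one call at a time and keeps the removal whenever a caller-supplied predicate still reports a crash. The predicate `crashes` is a hypothetical stand-in for executing the candidate program in a VM, which is the expensive step in practice.

```python
from typing import Callable, List


def minimize(program: List[str], crashes: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop calls while the crash predicate still holds."""
    current = list(program)
    i = len(current) - 1
    while i >= 0:
        candidate = current[:i] + current[i + 1:]
        if candidate and crashes(candidate):
            current = candidate  # call i was not needed to reproduce the crash
        i -= 1
    return current


def toy_crash(prog: List[str]) -> bool:
    """Hypothetical crash condition: an open followed later by an ioctl."""
    return (
        "open" in prog
        and "ioctl" in prog
        and prog.index("open") < prog.index("ioctl")
    )


# A 30-call program in which only two calls matter for the crash.
big = ["getpid"] * 10 + ["open"] + ["read"] * 10 + ["ioctl"] + ["close"] * 8
reproducer = minimize(big, toy_crash)
```

Each call in the original program costs one predicate evaluation here, so a 30-call program needs 30 executions to shrink to a two-call reproducer, while a program that is small by construction needs far fewer.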

5.2 Bonsai Fuzzing Parameters

To implement Bonsai Fuzzing capabilities in program generation for Linux Kernel
fuzzing, we have to choose the parameters that define the lattice nodes. Syzkaller is
a Linux Kernel fuzzing tool that relies on a choice table to determine system call
priorities while constructing programs. System calls with higher priority are usually
chosen to run together, which results in an increase in coverage.

5.2.1 Program Size

We decided to use the program size as one of the Bonsai Fuzzing parameters in our
implementation. Program size determines the number of system calls in a program.
We label this property as m in our lattice nodes.

5.2.2 System Call Priority

Since the choice table is a key element in most grammar-based, coverage-guided
fuzzers for the Linux Kernel, we relied on it as the second parameter for our
implementation. Program generation was constrained by system call affinity: only
the top n system calls of highest affinity are allowed as subsequent system calls
in the program.
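
One way to picture this constraint is as a mask over each row of the choice table. The sketch below is an illustration, not the prototype's code. Section 5.3 describes node [2,0] as picking the single highest-priority call, so the sketch assumes n is a zero-based rank limit (n = 0 keeps one call per row); that reading, the alphabetical tie-break, and the function name `mask_top_n` are assumptions.

```python
def mask_top_n(choice_table, n):
    """Zero out every entry in each row except the n + 1 highest-affinity calls."""
    masked = {}
    for call, row in choice_table.items():
        # Rank follow-up calls by descending weight; break ties by name.
        ranked = sorted(row, key=lambda c: (-row[c], c))
        allowed = set(ranked[: n + 1])
        masked[call] = {c: (w if c in allowed else 0) for c, w in row.items()}
    return masked


# Toy row (hypothetical weights).
toy_table = {"open": {"open": 2, "read": 5, "close": 3, "mmap": 1}}
only_best = mask_top_n(toy_table, 0)
best_two = mask_top_n(toy_table, 1)
```

Because masked entries have zero weight, a weighted random selection over the masked row can never pick a call outside the allowed set.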

Figure 5.2: Bonsai Fuzzing Lattice (rows of nodes, top to bottom: [1,0]; [2,0],
[2,1], [2,2], [2,3]; [3,0], [3,1], [3,2], [3,3])

5.3 Bonsai Fuzzing Lattice

Figure 5.2 demonstrates the Bonsai Fuzzing lattice we experimented with. The lattice
nodes are labelled [m,n], where m represents the program size and n represents the
number of system calls of highest priority based on the choice table.
The corpus from one lattice node [m,n] will be provided as an input to the future
lattice nodes [m+1,n] and [m,n+1]. For a lattice node [m,n], the input corpus is
the merged corpus from the nodes [m-1,n] and [m,n-1].
The fuzzing for nodes [m+1,n] and [m,n+1] can be run in parallel if their required
input corpus is available.
In our experiment, we started with the lattice node [1,0], which means we
generated programs containing 1 system call. In the subsequent node, [2,0], we
leveraged the corpus from [1,0] as input. [2,0] means we generated programs that
consist of 2 system calls, the first of which was chosen at random and the second
of which is the highest-priority system call in the choice table for the first
system call.
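
The corpus flow through the lattice can be sketched as follows. This is an illustration under stated assumptions: a corpus is modeled as a set of program identifiers, `fuzz` is a hypothetical callback standing in for one fuzzing run at a node, and nodes are visited sequentially in sorted order, which guarantees that both predecessors of a node have finished before it starts. Predecessors that are not part of the lattice simply contribute nothing.

```python
def run_lattice(nodes, fuzz):
    """Run fuzz(node, seed_corpus) for every lattice node.

    nodes: iterable of (m, n) tuples.
    fuzz: callable returning the output corpus (a set) for one node.
    Returns a dict mapping each node to its output corpus.
    """
    corpora = {}
    for node in sorted(nodes):
        m, n = node
        seed = set()
        # Merge the corpora of the two predecessor nodes, if they exist.
        for pred in ((m - 1, n), (m, n - 1)):
            seed |= corpora.get(pred, set())
        corpora[node] = fuzz(node, seed)
    return corpora


def fake_fuzz(node, seed):
    """Stand-in fuzzer: keeps the seed and adds one new program per node."""
    return set(seed) | {f"prog-{node[0]}-{node[1]}"}


lattice = [(1, 0), (2, 0), (2, 1), (2, 2), (2, 3), (3, 0), (3, 1), (3, 2), (3, 3)]
results = run_lattice(lattice, fake_fuzz)
```

A parallel scheduler could start any node as soon as its predecessors' corpora are available; the sequential loop is used here only to keep the sketch short.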

5.4 Procedure
5.4.1 Bonsai Fuzzing with Linux Kernel

• Determine the Bonsai Fuzzing parameters for Linux Kernel fuzzing. The
parameters chosen are system call priority and program size.

• Prototype the design with syzkaller source code by leveraging masks to the
system call choice table.

• Prepare fuzzing environments and run the fuzzers for a fixed time period.

• Collect logs and benchmark statistics data provided by the fuzzer.

• Analyze the results and identify any patterns or trends to understand the impact
of Bonsai Fuzzing strategy for kernel fuzzing program generation.

5.5 Experiment Setup

We used a ThinkMate server with an Intel® Xeon® Gold 6226R processor and 32 threads
for our experiments.
The initial set of kernel fuzzing experiments was performed with 1 Virtual Machine
leveraging the KVM capability of the server. Each Virtual Machine had 2 vCPU cores
and 4 GB of RAM. All experiments ran for 24 hours (effectively 48 CPU hours), and
we aggregated the results of 10 isolated runs. We were only able to experiment with
one lattice configuration.

5.6 Observations

We notice that, despite the high number of executions with the Bonsai-based approach
in Figure 5.4, we do not match the coverage baseline of syzkaller.

Figure 5.3: Coverage over time against Linux v6.0.8

5.7 Results and Challenges

While small program generation is promising, we face challenges in achieving
high coverage with our experiment setup. This can be attributed to a lack of fuzzing
time and a lack of experience in tuning the lattice parameters. We also notice that,
despite having a merged corpus, the fuzzer spends a considerable amount of time
minimizing and triaging the merged corpus in the nodes [3,1], [3,2] and [3,3]. If
we could force the fuzzer to skip the minimization phase, we could potentially spend
more time fuzzing and generate more programs.
We encourage future researchers to experiment with these ideas.

Figure 5.4: Executions over time against Linux v6.0.8

5.8 Conclusion

Program generation and corpus merges present exciting challenges in exploring
iterative-deepening-based input generation with Bonsai Fuzzing in the Linux Kernel.
Due to a lack of time and experience, we were not able to fully investigate this
research goal.

6 Conclusion

With this research, we identified the value in leveraging historic crashes and
contributed the enriched corpus generation framework. We also identified many new
bugs during our research and have reported them to the Linux Kernel maintainers.
Our enriched corpus framework [6] has gained traction in the open source community,
and we are currently exploring the production-scale experiments described in
Section 4.3.2 to make a case for Google to adopt the enriched corpus within syzbot.
The majority of the crashes with the enriched corpus are a result of the corpus
triage and the subsequent minimization steps performed by syzkaller. We demonstrate
the benefits of using the enriched corpus and of improving the quality of the
existing fuzzer corpus.
There are many opportunities to study program generation. We feel that the
Bonsai Fuzzing methodology implemented and tested in this thesis only scratches the
surface.

Bibliography

[1] J. Corbet, “Development statistics for the 6.1 kernel (and beyond) [lwn.net],”
Dec 2022. [Online]. Available: https://lwn.net/Articles/915435/. [Accessed
30-Apr-2023].

[2] Google, “The Linux Kernel.” [Online]. Available:
https://www.kernel.org/doc/html/latest/dev-tools/kasan.html. [Accessed 30-Apr-2023].

[3] J. Horn, “CVE-2022-42703.” [Online]. Available:
https://bugs.chromium.org/p/project-zero/issues/detail?id=2351. [Accessed 30-Apr-2023].

[4] G. Kennedy, “Re: [automated-testing] syzkaller reproducers,” Dec 2022. [Online].
Available: https://groups.io/g/kernelci/message/1633. [Accessed 30-Apr-2023].

[5] S. Khan. [Online]. Available:
https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-arts.git/. [Accessed
30-Apr-2023].

[6] P. Oswal and R. Padhye, “Cmu-pasta/linux-kernel-enriched-corpus - github.”
[Online]. Available: https://github.com/cmu-pasta/linux-kernel-enriched-corpus.
[Accessed 30-Apr-2023].

[7] S. Pailoor, A. Aday, and S. Jana, “MoonShine: Optimizing OS fuzzer seed
selection with trace distillation.” in USENIX Security Symposium, 2018, pp.
729–743.

[8] H. Shi, R. Wang, Y. Fu, M. Wang, X. Shi, X. Jiao, H. Song, Y. Jiang, and
J. Sun, “Industry practice of coverage-guided enterprise Linux kernel fuzzing,”
in Proceedings of the 2019 27th ACM Joint Meeting on European Software
Engineering Conference and Symposium on the Foundations of Software Engineering,
2019, pp. 986–995.

[9] H. Sun, Y. Shen, C. Wang, J. Liu, Y. Jiang, T. Chen, and A. Cui, “HEALER:
Relation learning guided kernel fuzzing,” in Proceedings of the ACM SIGOPS
28th Symposium on Operating Systems Principles, 2021, pp. 344–358.

[10] V. Vikram, R. Padhye, and K. Sen, “Growing a test corpus with Bonsai fuzzing,”
in 2021 IEEE/ACM 43rd International Conference on Software Engineering
(ICSE). IEEE, 2021, pp. 723–735.

[11] D. Vyukov, “Github - google/syzkaller: Syzkaller is an unsupervised
coverage-guided ...” [Online]. Available: https://github.com/google/syzkaller.
[Accessed 30-Apr-2023].

[12] ——, “Syzbot.” [Online]. Available: https://syzkaller.appspot.com/upstream/fixed.
[Accessed 30-Apr-2023].

[13] ——, “Syzbot - appspot.com.” [Online]. Available: https://syzkaller.appspot.com/.
[Accessed 30-Apr-2023].

[14] D. Wang, Z. Zhang, H. Zhang, Z. Qian, S. V. Krishnamurthy, and N. B.
Abu-Ghazaleh, “Syzvegas: Beating kernel fuzzing odds with reinforcement learning.”
in USENIX Security Symposium, 2021, pp. 2741–2758.

ProQuest Number: 30486941

INFORMATION TO ALL USERS


The quality and completeness of this reproduction is dependent on the quality
and completeness of the copy made available to ProQuest.

Distributed by ProQuest LLC ( 2023 ).


Copyright of the Dissertation is held by the Author unless otherwise noted.

This work may be used in accordance with the terms of the Creative Commons license
or other rights statement, as indicated in the copyright statement or in the metadata
associated with this work. Unless otherwise specified in the copyright statement
or the metadata, all rights are reserved by the copyright holder.

This work is protected against unauthorized copying under Title 17,
United States Code and other applicable copyright laws.

Microform Edition where available © ProQuest LLC. No reproduction or digitization
of the Microform Edition is authorized without permission of ProQuest LLC.

ProQuest LLC
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106 - 1346 USA
