
This article has been accepted for publication in IEEE Access. This is the author's version, which has not been fully edited, and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3421989

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2023.0322000

A Coverage-guided Fuzzing Method for Automatic Software Vulnerability Detection using Reinforcement Learning-enabled Multi-Level Input Mutation
VAN-HAU PHAM 1,2, DO THI THU HIEN 1,2, NGUYEN PHUC CHUONG 1,2, PHAM THANH THAI 1,2, and PHAN THE DUY 1,2
1 Information Security Lab, University of Information Technology, Ho Chi Minh City, Vietnam
2 Vietnam National University, Ho Chi Minh City, Vietnam
Corresponding author: Van-Hau Pham (e-mail: [email protected]).
This research was supported by The VNUHCM-University of Information Technology’s Scientific Research Support Fund.

ABSTRACT Fuzzing is a popular and effective software testing technique that automatically generates
or modifies inputs to test the stability and vulnerabilities of a software system, which has been widely
applied and improved by security researchers and experts. The goal of fuzzing is to uncover potential
weaknesses in software by providing unexpected and invalid inputs to the target program to monitor its
behavior and identify errors or unintended outcomes. Recently, researchers have also integrated promising
machine learning algorithms, such as reinforcement learning, to enhance the fuzzing process. Reinforcement
learning (RL) has been proven to be able to improve the effectiveness of fuzzing by selecting and prioritizing
transformation actions with higher coverage, which reduces the required effort to uncover vulnerabilities.
However, RL-based fuzzing models also encounter certain limitations, including an imbalance between
exploitation and exploration. In this study, we propose a coverage-guided RL-based fuzzing model that
enhances grey-box fuzzing, in which we leverage deep Q-learning to predict and select input variations
to maximize code coverage and use code coverage as a reward signal. This model is complemented by
simple input selection and scheduling algorithms that promote a more balanced approach to exploiting and
exploring software. Furthermore, we introduce a multi-level input mutation model combined with RL to
create a sequence of actions for comprehensive input variation. The proposed model is compared to other
fuzzing tools in testing various real-world programs, where the results indicate a notable enhancement in
terms of code coverage, discovered paths, and execution speed of our solution.

INDEX TERMS Reinforcement Learning, Fuzzing, Vulnerability Detection, Coverage Fuzzing.

I. INTRODUCTION

Recently, software testing and security enhancement have become increasingly important and attract significant research and development attention. Among the techniques for detecting software vulnerabilities, fuzzing is one of the most popular, convenient, and effective methods [1], [2]. The goal of fuzzing is to identify potential flaws in software by providing unexpected and invalid inputs to the target program, thereby monitoring behavior and identifying errors, failures, or undesired outcomes [3], [4]. Fuzzing techniques can be broadly categorized into three main types: black-box fuzzing, white-box fuzzing, and grey-box fuzzing.

Black-box fuzzing, also known as dumb fuzzing, operates without any knowledge of the program's internals, randomly generating inputs to uncover basic flaws. White-box fuzzing, in contrast, requires a comprehensive understanding of the program's source code, employing static and dynamic analysis to methodically explore and test the software's execution paths. Grey-box fuzzing strikes a balance between the two, leveraging limited knowledge of the application's internals along with feedback from testing to intelligently refine the fuzzing process [5]. Each approach varies in complexity, resource requirements, and efficiency, with black-box fuzzing being the simplest and quickest, white-box offering the most

VOLUME 11, 2023 1

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4

Van-Hau Pham et al.: A Coverage-guided Fuzzing for Software Vulnerability Detection using RL-enabled Multi-Level Input Mutation

thorough examination at the cost of greater complexity, and grey-box providing an effective middle ground that combines ease of implementation with the potential for deep vulnerability detection.

More specifically, grey-box fuzzing, particularly when augmented with code coverage metrics, represents a prominent methodology within the fuzzing landscape [6]. This approach necessitates access to the executable files of the target application, which are then subjected to fuzzing procedures within a specialized environment. Its widespread adoption can be attributed to its applicability to closed-source software, eliminating the need for source code access. By harnessing insights from code coverage data garnered during the execution of the program, grey-box fuzzing enhances its efficacy [7]. Fuzzing frameworks employing this technique initiate the execution of the program using various test inputs while deploying sophisticated mechanisms to capture detailed code coverage information. This crucial data serves as the basis for evaluating and refining test input criteria, with the overarching goal of augmenting the fuzzing process's effectiveness. Through this iterative optimization based on dynamic execution feedback, grey-box fuzzing with code coverage insight emerges as a powerful tool for identifying software vulnerabilities [8].

The fusion of machine learning (ML) with fuzzing methodologies signifies a groundbreaking advancement in the detection of software vulnerabilities, offering a sophisticated, intelligent approach to security assessments. This integration empowers fuzzing tools with the ability to learn from previous iterations, refining and targeting their search for flaws more effectively. Machine learning algorithms analyze patterns from past fuzzing activities to enhance the generation of test inputs, focusing on areas more likely to reveal critical vulnerabilities. This not only increases the efficiency of the fuzzing process by prioritizing high-risk code paths but also enables the adaptation of fuzzing strategies in real time, optimizing the exploration of complex software environments. The result is a more nuanced, context-aware approach to vulnerability detection that significantly reduces the time and computational resources required, marking a substantial leap forward in the field of software security.

Reinforcement Learning (RL), a subset of ML characterized by an agent learning to make decisions through trial and error to achieve a specific goal, has emerged as a potent tool for enhancing fuzzing techniques in the generation of new test cases. By applying RL principles, fuzzing frameworks can dynamically adjust their strategies based on the outcomes of previously executed test cases, effectively learning which types of inputs are more likely to induce anomalies or reveal vulnerabilities in the software under test. This integration leverages the outcomes of previously executed test cases to inform the creation of new inputs, focusing on those more likely to uncover vulnerabilities by navigating unexplored or less-tested paths in the application's codebase. The application of RL not only enhances the efficiency of the fuzzing process by learning from past interactions but also expertly balances the concepts of exploitation and exploration, key elements in effective fuzzing. Exploitation delves deep into the program to reach critical code segments, while exploration aims for broad branch coverage to ensure no potential fault is overlooked. Achieving a harmonious balance between these two approaches is essential for optimizing the fuzzing model's effectiveness, thereby leading to quicker and more comprehensive identification of software vulnerabilities [9]. This evolution in fuzzing marks a significant milestone in automated software testing, offering a smarter, more adaptive, and outcome-focused method for detecting and addressing security flaws.

Current approaches like [10], [7] that merge RL with fuzzing are intensely focused on developing the algorithms, states, rewards, and parameters specific to RL while overlooking critical factors like the balance between exploitation and exploration. This emphasis leads to RL-based fuzzing models that concentrate excessively on a single code branch, thereby missing potential errors in other branches. These models often give priority to selecting mutation actions without incorporating effective mechanisms for the selection and scheduling of inputs, setting them apart from conventional fuzzing tools. Moreover, there is a significant gap in research offering a comparative analysis of RL models in fuzzing against the efficiency, strengths, and weaknesses of modern fuzzing tools.

To a certain point, fuzzing techniques can be classified into exploitation and exploration techniques [11]. Therein, exploitation and exploration stand as pivotal concepts in the fuzzing process, with exploitation denoting the capacity of test cases to penetrate deeply and access code segments buried within the program [12], [13]. Conversely, exploration pertains to test cases achieving extensive branch coverage. A fuzzing model with an excessive focus on exploration may not generate test cases that effectively pinpoint faulty code segments within the program. Similarly, a model overly concentrated on exploitation might only aim to reach the deepest branch within a program, potentially overlooking faults in other branches. Therefore, finding an optimal balance between exploitation and exploration is essential for the development of an efficient fuzzing model. This balance ensures that the model can effectively uncover faults across different branches of the code, maximizing the potential for identifying vulnerabilities within the software [14], [9], [11].

Hence, in our research, we introduce an innovative fuzzing model guided by coverage metrics, leveraging RL to enhance input selection and scheduling. This approach aims to meticulously balance the exploration of new paths and the exploitation of deeper, potentially vulnerable segments within a program. Furthermore, our model incorporates a novel multi-level input mutation mechanism, designed to synergize with RL. This mechanism facilitates the generation of mutations via a series of deliberate actions, enabling a more granular and targeted exploration of the software's attack surface. To validate the effectiveness and efficiency of our proposed model, we undertake a comprehensive comparative

analysis, juxtaposing it with two notable fuzzing tools: rlfuzz, which also employs RL strategies, and AFLplusplus, a contemporary and widely used fuzzer. Through this comparison, we aim to underscore the advantages and potential of our RL-based fuzzing approach in enhancing the identification and mitigation of software vulnerabilities, making a compelling case for its adoption in the field of cybersecurity.
Our contributions in this work are summarized as follows:
• Propose CTFuzz, a coverage-guided fuzzing model that utilizes reinforcement learning (RL) to optimize input mutation for enhanced code coverage.
• Integrate an RL model with an input selection and scheduling algorithm that ranks the inputs in a queue based on their current coverage and then prioritizes inputs with higher coverage. Besides, it allocates the number of trials for each input to ensure a balanced distribution.
• Design and implement multi-level input mutation, which enhances the capabilities of the proposed model by enabling the generation of complex mutations using a sequence of actions.
• Finally, we provide a perspective on how well our model performs compared to other fuzzing tools, with a particular focus on state-of-the-art fuzzing tools like AFLplusplus and rlfuzz.

The remaining sections of this article are organized as follows. Section II introduces related work on fuzzing using machine learning. Next, the proposed framework and methodology are discussed in Section III. Section IV describes the experimental settings and the evaluation results of CTFuzz compared with another RL-based fuzzing tool, rlfuzz, and a modern fuzzing tool, AFLplusplus. Section V discusses the experiments. Following the discussion, future developments are outlined in Section VI. Finally, we conclude the paper in Section VII.

II. RELATED WORK
A. FUZZING
With continuous development and research, modern fuzzing techniques can be categorized into three main types: black-box fuzzing, white-box fuzzing, and grey-box fuzzing. Among these three types, grey-box fuzzing is the most common and flexible. Therefore, our team decided to focus on investigating grey-box fuzzing for our model. The coverage-guided fuzzing process of an application consists of several fundamental steps, as illustrated in Fig. 1. The functions of each step are outlined below:

FIGURE 1. The coverage-guided fuzzing process.

1) First, we need to provide an input value (seed), usually a valid input of the program, which the fuzzing tool places into a queue of input values.
2) From this queue, the fuzzing tool selects an input value for mutation; each fuzzing tool employs different algorithms for this selection. The effectiveness of these seed selection algorithms significantly influences the fuzzing process's efficiency.
3) Next, using the chosen input value, the fuzzing tool generates various test cases by applying specific actions to the input (e.g., flipping bits, adding bits, shifting bits, etc.). The choice of actions and the effort devoted to this process also impact the fuzzing process's performance.
4) Subsequently, the fuzzing tool tests the generated inputs with the target program, collecting runtime information (program crashes, execution time, code coverage, etc.).
5) Using the collected information, the fuzzing tool decides whether to continue, remove, replace, or adjust the inputs in the input queue. If a crash occurs, the fuzzing tool saves the test case and the crash information for further analysis by the fuzzing operator. The process then returns to step 2, and the cycle continues.

For fuzzing techniques based on the current code coverage, the main challenges that researchers are focusing on improving are as follows:
1) How to increase code coverage?
2) How to generate better test inputs?
3) How to perform input mutation more effectively, reducing ineffective mutations?
4) How to overcome program structure checks?
5) How to reach more branches in the program?
6) How to reduce false positives for detected vulnerabilities?
For each question, various ways and alternatives have been researched and proposed. However, we can divide the improvement approaches into the following steps:
• Generating test cases: In this step, researchers often apply static and dynamic analysis techniques to extract information from the executable file and create an input that can achieve deeper program exploration. This is different from using a random input or a pre-defined input pattern. In cases where the input involves special data structures (file formats, etc.), some studies employ artificial intelligence to learn the data structure and generate test cases that can bypass initial structural checks of the program.
• Input selection and mutation: When multiple good test cases are evaluated and saved for subsequent steps, the selection of the next input for transformation, the choice of transformation operations, and the amount of ef-

fort invested in testing that input significantly influence the model's performance. Researchers focus on minimizing redundant transformation efforts and quickly selecting accurate transformations to increase code coverage, thereby enhancing the fuzzing process's efficiency.
• Post-fuzzing analysis: In certain programs, the number of false positives found during fuzzing can be high, or duplicate vulnerabilities might be discovered. This can be cumbersome for manual analysts. Therefore, researchers also address this by implementing algorithms to detect duplicates, evaluating the return of detected vulnerabilities, or employing machine learning algorithms to score the exploitability of vulnerabilities.

B. TRADITIONAL IMPROVEMENT TECHNIQUES IN FUZZING
These are traditional improvement methods for fuzzing, each with its advantages and drawbacks. Ji Tiantian et al. introduced AFLPro in [15], enhancing input selection and scheduling by combining static analysis with a basic block synthesis model. The goal is to prioritize inputs that reach code segments less explored previously, aiming to increase the coverage of deeper program areas. Tai Yue et al. proposed EcoFuzz [9], modeling input scheduling as a Multi-Armed Bandit problem and presenting a variant of the Adversarial Multi-Armed Bandit model to improve it. The common idea of both techniques is to enhance input selection and scheduling, prioritizing inputs predicted to have a higher likelihood of containing errors and leading to better code coverage. However, the issue with using static analysis techniques always lies in the lack of complete runtime data, and such analysis often produces low-accuracy or false-positive results. Additionally, it might not work effectively on applications utilizing code obfuscation or packing mechanisms. Peng Chen et al. created Matryoshka [16], using taint analysis to solve conditional statements and penetrate deeper into the program. However, this technique demands significant resources and slows down the fuzzing process, while taint analysis also faces challenges with under-tainting and over-tainting. H. Zhang et al. proposed a lightweight and convenient mechanism to surpass input checks by combining static analysis with mutating key bytes in InsFuzz [17]. They identify bytes influencing conditional statement results and then mutate them. However, since it also employs static analysis, it is still subject to limitations like the techniques mentioned above, and it requires modifying the executable files, causing instability in applications with integrity checks.

C. THE COMBINATION OF ARTIFICIAL INTELLIGENCE INTO FUZZING
Recently, with the explosive development of artificial intelligence, machine learning techniques have also been applied by researchers to enhance the fuzzing process. Surveys [12], [18] indicate that the application of machine learning techniques to fuzzing is diverse and creative, yielding promising results. The steps typically addressed by artificial intelligence include (1) input selection, (2) input scheduling, (3) input generation, (4) mutation action selection, etc. RapidFuzz [19] and CGFuzzer [20] employ Generative Adversarial Networks (GANs) to learn the structure of complex inputs, aiming to generate higher-similarity patterns for fuzzing protocols or specific file formats. This approach helps save time by avoiding mutating invalid samples and increasing the likelihood of passing structure checks. NeuFuzz by Wang, Y. [21] models the bug-finding process akin to natural language processing, utilizing a deep-learning Long Short-Term Memory (LSTM) network to learn the structure of error-containing paths, predicting which paths are more likely to have errors and prioritizing them for input scheduling.

RL approaches were first applied to fuzzing in 2018 by Böttinger et al. [22]. They transformed the fuzzing problem into an RL problem, where the selection of the next mutation action is analogous to choosing the next move in a chess game. Although an optimal strategy may exist, the search for optimal actions is performed using the deep Q-learning algorithm. However, their proposed model was specifically designed for PDF files, lacking objective results when compared to modern fuzzing tools and not addressing the balance between exploitation and exploration. A. Kuznetsov et al. also employed deep Q-learning to select mutation actions for application testing [23]. They demonstrated that incorporating RL can reduce the time needed to create expected test cases by up to 30%, yet their evaluation method does not suit real-world applications. S. Reddy et al. improved mutation action selection using the Monte Carlo Control algorithm, creating more valid samples for applications with complex input structures [24]. The results enhanced the rate of samples passing structure checks. However, their model skewed towards exploitation rather than exploration, focusing on generating diverse inputs with similar features instead of exploring new behaviors. Li, X. et al. introduced Reinforcement Compiler Fuzzing [25], also utilizing deep Q-learning for mutation action selection at the compilation level. Nevertheless, their implementation requires source code to work effectively. Drozd et al. combined Deep Double Q-learning to select mutation actions and accelerate libFuzzer [26]. However, they acknowledged that this alone is not sufficient and that further enhancements are needed in terms of input selection and filtering.

Zheng Zhang and colleagues proposed rlfuzz [10], a method to balance exploitation and exploration in a deep Q-learning fuzzing model by randomly selecting trial inputs for subsequent transformations when the model does not experience an increase in code coverage. However, this selection method is not yet optimal, as inputs with low code coverage in the queue still have an equal chance of being transformed as those with higher potential. In a different approach, Wang Jinghan et al. [7] utilized RL for input scheduling rather than for selecting mutation actions like other studies. They proposed a multi-level code coverage model to enhance fuzzing detail and introduced a scheduling mechanism to support this multi-level code coverage model using RL. The results showed a balance between exploitation and exploration in the gener-

ated test cases, but there was no significant improvement in selecting more effective mutation actions.

Current combined RL and fuzzing solutions tend to focus heavily on designing RL algorithms, states, rewards, and parameters without considering factors like the balance between exploitation and exploration. This leads to RL fuzzing models delving deep into one code branch, missing opportunities to find vulnerabilities in other branches. Moreover, the focus often lies solely on mutation action selection, without mechanisms for effective input selection and scheduling that are crucial for real-world fuzzing tools. Furthermore, no RL fuzzing study provides a comparative perspective on performance, strengths, and weaknesses against modern fuzzing tools. Recognizing the exploitation-exploration imbalance of combined RL fuzzing models, this issue could be mitigated by adopting effective input selection and scheduling algorithms, a topic that has been extensively studied in traditional fuzzing. Our work investigates and proposes an RL-based coverage-guided fuzzing model that addresses these weaknesses by integrating an effective input selection and scheduling algorithm. Additionally, we propose a multi-level input transformation algorithm that can be applied to RL-based fuzzing models, coupled with a waste reduction mechanism to improve model efficiency.

III. METHODOLOGY
A. THE ARCHITECTURE OF CTFUZZ
Inherited from the RL-based fuzzing model named rlfuzz [10], our CTFuzz is designed with four main components, as in Fig. 2. The main improvements of CTFuzz over its predecessor include replacing the random selection approach with a balanced input selection and scheduling algorithm, which ensures prioritization of inputs with higher code coverage. Additionally, we have implemented a multi-level input transformation model that can be combined with RL to enhance the long-term performance of the fuzzing model, especially when multiple consecutive actions are needed to transform inputs effectively.

FIGURE 2. The architecture of CTFuzz.

Each component in CTFuzz is responsible for different tasks to enhance the effectiveness of the fuzzing process:
1) Seed selection and scheduling algorithm: From a queue of inputs, the algorithm ranks them based on their current coverage and prioritizes inputs with higher coverage, while allocating the number of trials to the model.
2) RL model: This model receives inputs and selects the appropriate action to mutate them, aiming to predict the best coverage improvement.
3) Multi-level input mutation: This component receives the input from (1) and the action from (2) to perform a mutation on that input, generating a list of test cases, which are then fed into (4) to obtain results. Multi-level input mutation and an early stopping mechanism are applied at this stage.
4) Coverage observer: This component takes responsibility for executing the inputted test cases on the target program to obtain results.

B. INPUT MUTATION WITH RL
When applying RL to chess, the state of the RL model can be considered as the positions of the pieces on the chessboard, the actions as selecting the next move, and the reward as the advantage gained after making the move (chess computer programs have algorithms to evaluate advantages based on the chessboard position, which we will not elaborate on here). Researchers can then apply RL techniques to train the model to find the optimal moves by playing numerous chess games and accumulating experience (rewards). One notable example is AlphaZero, developed by DeepMind, which used a combination of RL and deep learning to play chess automatically. It improved its chess abilities by playing thousands of games against itself, accumulating experience, and has become one of the strongest chess-playing tools in the world.

Similarly, we can apply the same concept to fuzzing, where the test inputs play the role of the positions of chess pieces and become the input for the RL model. Selecting transformation actions is similar to choosing the next move on the chessboard. Meanwhile, code coverage is analogous to the advantage on the chessboard and becomes the reward for the RL model. Consequently, we can train the RL model to select the best transformation actions for each input based on


the accumulated experience. The reward mechanism does not necessarily depend solely on code coverage but can incorporate other factors based on what we want to improve in the model, such as execution time, the length of test cases, or their combinations.

Based on a similar concept, we model the fuzzing process as a Markov decision problem, where the amount of newly discovered code coverage is considered the reward, aiming for the model to seek new code portions within the program. The detailed definitions of states, actions, and rewards in our proposed RL model will be further elaborated in the subsequent sections.

1) State
For the state space, we represent all test inputs as byte arrays with a maximum length of 65,536 (0x10000) bytes, as in Fig. 3. Bytes in each input are converted into values in the range [0, 255]. Such a state representation limits the maximum length of the test input to 65,536 bytes, since a too-large state would slow down the processing of the RL model. Starting from the initial byte sequence, the input values are modified according to the chosen mutate action by adding bytes, truncating bytes, or permuting bytes (producing new states). Hence, our aim is for the fuzzing model to select actions based solely on the mutated input and to choose the best mutation action based on experience.

2) Action
In RL, actions are taken to interact with an environment and achieve specific goals. In the context of fuzzing, the actions are the mutations applied to the test inputs fed into the target programs. Our RL model includes 9 mutate actions on the bytes of the test input, inspired by libFuzzer [27] due to its independence and ease of implementation compared to other fuzzing engines. The detailed definition of these actions is provided in Table 1.

TABLE 1. Actions for mutation

Action                       Description
Mutate_EraseBytes            Erase bytes
Mutate_InsertByte            Insert a byte
Mutate_InsertRepeatedBytes   Insert a sequence of repeated bytes
Mutate_ChangeByte            Change a byte
Mutate_ChangeBit             Change a bit of a byte
Mutate_ShuffleBytes          Shuffle the order of bytes within a range
Mutate_ChangeASCIIInteger    Change an ASCII integer
Mutate_ChangeBinaryInteger   Change a binary integer
Mutate_CopyPart              Perform byte copy and insertion

3) Reward
One of the most crucial steps in designing an RL model is devising the reward mechanism, as it significantly impacts the performance and outcomes of the RL model. Thus, the reward mechanism needs to be well designed and aligned with the desired goals of the model. In the context of our model during fuzzing operations, our aspiration is for the model to maximize code coverage, aiming to explore code regions that have not been touched before. Consequently, new or increased code coverage becomes the criterion within our reward mechanism. Particularly, considering the target program as a graph with code blocks as nodes and their relationships as edges, we aim to discover as many edges as possible. The number of newly explored edges indicates increased coverage and should result in a higher reward. Hence, given that total_new_coverage is the number of newly discovered edges and energy is the number of attempts used, the reward in our RL model is defined as in (1).

    reward = total_new_coverage / energy    (1)

This reward mechanism is designed to help the model optimize the discovery of new code segments, as only those segments truly make an impact, rather than rewarding high coverage of previously discovered parts. It also ensures that the goal of our model is to prioritize finding inputs that contribute to new code coverage, rather than achieving the highest coverage on each input. In addition, dividing by the number of attempts keeps the reward compatible with the input scheduling mechanism described in Section III-C, where the number of trials is distributed differently from input to input, and this averaging ensures fairness for inputs with few trials.
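As an illustration, the reward in (1) amounts to counting coverage bits that are newly set and averaging them over the attempts spent on the seed. The sketch below is our own simplified illustration, not the CTFuzz implementation; names such as `count_new_edges` and the tiny bitmap are purely hypothetical:

```python
def count_new_edges(global_bitmap: bytearray, trial_bitmap: bytes) -> int:
    """Count coverage bits set in this trial's bitmap but not yet recorded
    globally, then merge the trial bitmap into the global record."""
    new_edges = 0
    for i, byte in enumerate(trial_bitmap):
        newly_hit = byte & ~global_bitmap[i]   # bits seen for the first time
        new_edges += bin(newly_hit).count("1")
        global_bitmap[i] |= byte
    return new_edges

def reward(total_new_coverage: int, energy: int) -> float:
    """Eq. (1): newly discovered edges averaged over the attempts spent."""
    return total_new_coverage / energy

# A seed granted 4 attempts whose mutations uncover 2 previously unseen edges:
seen = bytearray(4)                            # tiny global bitmap for the demo
new = count_new_edges(seen, bytes([0b00000011, 0, 0, 0]))
assert reward(new, energy=4) == 0.5
```

Because duplicated edges are masked out by the global record, a seed only earns reward for genuinely new coverage, matching the intent described above.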

FIGURE 3. Converting fuzzing input to state in RL model.

4) Deep Q Network Model
Our RL model employs Deep Q-Learning, with the architecture and general operation depicted in Fig. 4. The initial hidden layer takes an input of size 65,536 (0x10000), which is the state size of the model. The final hidden layer has 9 units, corresponding to the number of available actions. Additionally, at the end of the network, a policy is incorporated to select an action based on the predicted scores of the 9 actions. The chosen action from this process is then used to create a list of test cases for fuzzing by transforming the input.
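To make the state encoding and the action-selection step concrete, here is a small self-contained sketch (our own illustration: the real model predicts the 9 scores with the DQN, whereas `q_scores` below are mocked, and `to_state`/`epsilon_greedy` are hypothetical helper names):

```python
import random

STATE_SIZE = 0x10000    # 65,536-byte state, as described in the State section
NUM_ACTIONS = 9         # one Q-score per mutation action of Table 1

def to_state(data: bytes) -> list:
    """Truncate or zero-pad a test input into the fixed-size state vector,
    one value in [0, 255] per byte."""
    clipped = data[:STATE_SIZE]
    return list(clipped) + [0] * (STATE_SIZE - len(clipped))

def epsilon_greedy(q_values, epsilon=0.7):
    """With probability epsilon pick a random action (exploration);
    otherwise pick the action with the highest predicted score."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

state = to_state(b"\x7fELF")             # toy ELF-like seed
assert len(state) == STATE_SIZE
q_scores = [0.1] * NUM_ACTIONS           # mocked DQN output
q_scores[4] = 0.9
assert epsilon_greedy(q_scores, epsilon=0.0) == 4   # pure exploitation
```

The epsilon value of 0.7 mirrors the EpsGreedyPolicy setting later reported in Table 2; with epsilon = 0 the policy degenerates to always exploiting the highest-scored action.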

FIGURE 4. Architecture of Deep Q Learning.

The outcome of the fuzzing process with the test cases created by transforming the input with the given action is used as feedback to update the layers of the network. Subsequently, the next input in the queue becomes the next input state of the RL model.

C. SEED SELECTION AND SCHEDULE ALGORITHM
Utilizing a random input selection mechanism for transformations, like rlfuzz [10], is ineffective, as it can lead to an improper allocation of test cases due to an equal probability of selecting inputs with both low and high code coverage. Based on the common steps of the coverage-guided fuzzing process, it is noticeable that inputs achieving higher code coverage typically surface in later stages, after multiple transformations from ones with lower code coverage. Therefore, transforming found inputs with higher code coverage has the potential to enable the exploration of deeper and previously unexplored branches, thus increasing code coverage. Moreover, after fuzzing for a certain period, mutating inputs with lower code coverage is more likely to revisit previously discovered code regions, resulting in duplication and wasted time in the fuzzing process.
Hence, our paper introduces an input selection and scheduling algorithm with the following criteria.
• Prioritize fuzzing for inputs with higher code coverage.
• Allocate more trials to inputs with higher code coverage.
• Distribute some trials and ensure a reasonable testing frequency for inputs with lower code coverage, balancing the exploitation and exploration of the model.
To ensure that all inputs are used in the fuzzing process, we iterate through input transformations in a loop. At the beginning of each loop, inputs are organized in the queue based on their code coverage in descending order and are allocated a corresponding number of trials. Once a loop finishes testing all inputs, only the ones that result in new code coverage are added to the queue to be used as inputs for the next loop of further transformations.
In more detail, each input in the ordered queue is assigned a specific number of trials according to (2), where its ranking based on code coverage matters.

    energy_i = max(total_energy * (n − i) / SAP(n), min_energy)    (2)

Where:
• i: The rank of the input after being rearranged in order of decreasing code coverage, starting from 0.
• energy_i: The number of trials corresponding to the i-th input.
• total_energy: The total number of trials for one loop.
• min_energy: The minimum number of trials.
• n: The total number of available inputs in the queue.
• SAP(n): The sum of the arithmetic progression with n elements (1 ... n), defined as in (3).

    SAP(n) = Σ_{i=1}^{n} i    (3)

With this formula, inputs with higher code coverage, or higher rank, will be tested more regularly with more trials and prioritized for testing. Meanwhile, lower-coverage inputs will still be tested later with smaller numbers of trials, but no fewer than the minimum. Moreover, we limit the total trial count of one loop to avoid excessively long loops in which newly added inputs are not tested adequately.
In general, Algorithm 1 provides a pseudocode representation of our seed selection and scheduling algorithm. Additionally, to complement this and prevent wastage for inputs allocated a larger trial count, we integrate a waste-reduction mechanism, as detailed in Section III-D2.

D. MULTI-LEVEL INPUT MUTATION AND EARLY STOPPING MECHANISM
Fuzzing models using RL currently lack practicality due to the absence of certain techniques or straightforward optimization algorithms. Notable examples include the multi-level input mutation mechanism and the early stopping mechanism. These are algorithms already developed and utilized by modern fuzzing tools like AFLplusplus [28]. In the scope of our work, we have made slight simplifications and modifications to incorporate these techniques seamlessly into our fuzzing model.
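Returning to the allocation in (2) and (3), it can be reproduced in a few lines (a sketch of the formula only; integer division is our simplification, and the default constants are the total_energy and min_energy values later reported in Table 3):

```python
def sap(n: int) -> int:
    """Eq. (3): sum of the arithmetic progression 1 + 2 + ... + n."""
    return n * (n + 1) // 2

def allocate_energy(n, total_energy=1_000_000, min_energy=50):
    """Eq. (2): rank 0 is the seed with the highest coverage and receives the
    largest share; every seed still gets at least min_energy trials."""
    return [max(total_energy * (n - i) // sap(n), min_energy) for i in range(n)]

print(allocate_energy(4))   # → [400000, 300000, 200000, 100000]
```

With 4 queued seeds, SAP(4) = 10, so the top-ranked seed receives 4/10 of the budget and the allocation decreases linearly with rank; for very long queues the max(...) clause floors low-ranked seeds at min_energy.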

Algorithm 1 Seed selection and schedule algorithm
Require: Seed queue Q
Ensure: List of seeds with mutation energy, in order
1: function SEED_SELECTION_AND_SCHEDULE(Q)
2:   Q′ = Q
3:   sort(Q′)  // Order seeds in Q′ by coverage
4:   n = len(Q′)
5:   for i ← 0 to n − 1 do
6:     Q′[i].energy = max(total_energy ∗ (n − i)/SAP(n), min_energy)
7:   end for
8:   return Q′

1) Multi-level input mutation
Many modern fuzzing tools like AFLplusplus [28] have implemented the idea of multi-level mutation on input, where the fuzzing model needs to execute more than one consecutive action to discover new "noteworthy" samples. Considering this an essential mechanism for enhancing the effectiveness of the fuzzing process, we designed a simplified multi-level mutation algorithm that is compatible with the RL-based fuzzing model.
Algorithm 2 provides a pseudocode representation of the proposed multi-level mutation algorithm, with the key ideas as follows.
• The initial transformation depth is set to 1.
• Each input is associated with a counter table that tracks the number of times each transformation action is executed.
• Each action is selected no more than C times.
• If an action has been selected more than C times, a different action that has not reached its threshold is randomly chosen for transformation.
• Once all actions for an input have been selected C times, the input is removed from the queue Qn and placed into the queue Qn+1.
• If the queue Qn is emptied, the model proceeds to a new depth of n + 1 and switches to queue Qn+1.

Algorithm 2 Multi-level input mutation algorithm
Require: Input seed, mutation depth depth
Ensure: List of test cases
1: function MULTI_LEVEL_MUTATE(seed, depth)
2:   input = seed.input
3:   energy = seed.energy
4:   for step ← 1 to depth − 1 do
5:     action = get_action(input)  // Get action from RL model
6:     input = mutate(input, action)
7:   end for
8:   action = get_action(input)
9:   // Get action from RL model
10:  if action_is_maximum_try(seed, action, C) then
11:    // Randomly pick another valid action
12:    action = pick_another_action(seed, C)
13:  end if
14:  testcase_list = array[energy]
15:  for i ← 1 to seed.energy do
16:    testcase = mutate(input, action)
17:    testcase_list.append(testcase)
18:  end for
19:  update_mutate_count(seed, action)
20:  if all_mutate_reach_maximum_try(seed, C) then
21:    remove_from_queue(Qdepth, seed)
22:    add_to_queue(Qdepth+1, seed)
23:  end if
24:  return testcase_list

2) Early-stopping mechanism
Another effective algorithm that contributes to the efficiency of the fuzzing process in other tools is the "early stopping" mechanism, also called early abort. In AFLplusplus, for example, this allows the fuzzing system to stop testing early and switch to other inputs when numerous trials on a specific input fail to achieve appropriate results, ignoring the remaining number of available trials. Inheriting this idea, we simplified and implemented this mechanism for our model by introducing a constant M representing the maximum number of consecutive unsuccessful trials. When transforming an input fails to increase code coverage or find new paths M times in a row, the model moves on to a different input. This approach can save a significant number of futile trial attempts.
However, it is noticeable that setting an appropriate value for M is also crucial. While too large a value leads to an ineffective waste-prevention mechanism, a value that is too small can cause the model to ignore significant opportunities for new code coverage in subsequent trials. Thus, a balanced value for M is essential, striking a trade-off for the model's performance. Moreover, the efficacy of this choice also depends on the values of the two constants total_energy and min_energy used in the trial allocation algorithm discussed in Section III-C. Particularly, though adjusting M based on the allocated number of trials might yield better results, in our model we opt for simplicity by using a fixed value for M throughout the process. The mechanism can be outlined in pseudocode form, as shown in Algorithm 3.

Algorithm 3 Early stopping mechanism
1: Q′ = SEED_SELECTION_AND_SCHEDULE(Q)
2: for seed in Q′ do
3:   last_find = 0
4:   for i ← 1 to seed.energy do
5:     input = mutate(seed.input, action)
6:     reward = run_target(input)
7:     if has_new_cov_or_new_unique_path(reward) then
8:       last_find = i
9:     end if
10:    if i − last_find > M then
11:      break  // go to next seed in queue
12:    end if
13:  end for
14: end for

E. COVERAGE OBSERVER
In coverage-guided fuzzing models, a crucial step is the extraction of code coverage information during program execution, which significantly impacts the model's effectiveness. In our work, this process needs to be rapid, precise, stable, and independent of access to the source code of the target program, especially when dealing with numerous test cases.
To meet these requirements, we take the idea of the client-server model using a fork server, introduced in AFLplusplus [28]. While the server is responsible for initializing the target program, employing Frida to inject code, and recording the initial state, the client interacts with it via shared memory to transfer samples and receive code coverage information. To implement our coverage observation mechanism, we consider the CTFuzz model an AFLplusplus client. Then, we designed an application called ex-frsv, which is responsible for not only initiating a forkserver-like server but also playing the role of a proxy, enabling the connection between client and server.
The interaction of the proposed CTFuzz and those components to obtain code coverage is depicted in Fig. 5. In more detail, ex-frsv receives test cases from the CTFuzz model and sends them to the forkserver, which then returns the code coverage result. Techniques such as shared memory and semaphores are also employed to facilitate communication between the model and ex-frsv to enhance both speed and stability. Moreover, being a coverage-guided fuzzing tool, the coverage observer in our model returns the obtained coverage via a bitmap, in which each bit corresponds to a basic code block and is enabled when that block is hit.

FIGURE 5. Coverage observation method.

F. WORKFLOW OF CTFUZZ
The workflow of our proposed model, which involves the cooperation of the components mentioned above, is illustrated in Fig. 6. As a coverage-oriented grey-box fuzzing model using RL, equipped with other supporting algorithms to enhance effectiveness and performance, CTFuzz works through the following main steps.
1) Initially, the model pushes the initial seed to the queue.
2) The model proceeds to mutate the seed in a loop. At the start of a new iteration, all seeds in the queue are sorted in descending order of code coverage to prioritize seeds with higher code coverage.
3) Next, the scheduling algorithm is invoked to allocate the number of attempts for each seed. This is designed to distribute mutation energy reasonably, ensuring that a seed with higher potential is given more priority while still trying to mutate seeds with lower potential. This balances the exploitation and exploration of our model.
4) The model runs a loop, taking each seed and its allocated number of attempts as input for the RL model.
5) The RL model predicts and chooses an action that maximizes the new code coverage for the selected seed.
6) The model mutates the seed using the action chosen by the RL model and the number of attempts given by the scheduling algorithm, generating a series of test cases.
7) The target application is executed with the generated test cases while the code coverage information is collected. Inputs that result in new code coverage are added to the queue to be used in the next iteration. During this phase, the multi-level mutation and early stopping mechanisms are also applied to optimize the fuzzing process.
8) After finishing fuzzing on a seed, the reward points are calculated and returned, and the next seed in the queue becomes the next state for the RL model.
9) Continue with step (4) until the iteration is completed, then return to step (2) to start a new iteration.
With the design of the proposed CTFuzz, we expect to enable a better fuzzing process. First, the RL algorithm can enable the fuzzing model to select better actions based on the criterion of increasing discovered code coverage. Second, the input selection and scheduling algorithm may assist in distributing test attempts according to the nature of the inputs, prioritizing inputs with higher code coverage while ensuring testing for lower-coverage inputs with the minimum number of attempts, balancing the exploitation and exploration of the model. The multi-level mutation mechanism makes the model flexible and more practically effective for long-term and challenging fuzzing scenarios. Meanwhile, the waste prevention or early stopping mechanism helps avoid wasting time on


FIGURE 6. The workflow of CTFuzz.

inefficient inputs with excessive test attempts.

IV. IMPLEMENTATION AND EXPERIMENTS
A. RESEARCH QUESTION
Based on the improvements expected in the model presented above, this experimental section focuses on answering the following questions.
• Question 1: How effective is the model compared to the previous RL-based fuzzing model?
• Question 2: What is the efficiency of the model compared to a modern fuzzing tool?
• Question 3: What is the contribution of the RL model used within the entire framework?
• Question 4: If the speed disparity between programming languages is improved, what will be the effectiveness of the model?

B. ENVIRONMENTAL SETTINGS
1) Implementation setup
We deploy the proposed model and perform experimentation on a VPS machine equipped with an Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz with 4 cores and 64 GB RAM, running Ubuntu 20.04.1 LTS 64-bit. On this system, we installed AFLplusplus version 4.05c, Binutils 2.34, and Poppler 0.86.1, along with requisite libraries such as tensorflow 2.3.3, gym 0.10.3, posix-ipc 1.1.1, keras-rl2 1.0.5, and xxhash 3.2.0.
Moreover, the RL model is implemented with a DQN-based agent and an environment deployed using the gym library [29]. The hyperparameters of the RL model are detailed in Table 2; they are inherited from related work as well as tuned based on the best results over multiple experiments. Meanwhile, the settings for the mutation strategy are described in Table 3.

TABLE 2. Hyperparameters for training the DQN-based fuzzing model

Hyperparameter           Value
Size of input layer      Size of state space: 65,536 (0x10000)
Size of output layer     Size of action space: 9
Optimizer                Adam
Discount factor γ        0.9
ϵ for EpsGreedyPolicy    0.7
Learning rate            0.001

TABLE 3. Settings of the mutation strategy for fuzzing

Parameter                                      Value
total_energy                                   1,000,000
min_energy                                     50
C - Maximum selection times of an action       10
M - Maximum consecutive unsuccessful trials    5,000

2) Target programs for evaluation
In the scope of this paper, our investigation involved statistical analysis and fuzz testing of two toolsets, specifically Binutils-2.34 and Poppler-0.86.1, as outlined in Table 4. These toolsets are commonly used for testing in practical fuzzing research due to their open-source and easy-to-use nature, allowing for testing methods requiring source code access. The target applications used in our experiment are categorized into 2 main types, PDF and ELF. Besides, they are not so excessive or complicated as to require a dedicated fuzzer with specific customization for the fuzzing process. However, the chosen target programs still possess a high potential for vulnerabilities because of their complex functionalities and diverse formats.
In addition, the initial criteria for the model targeted grey-box fuzzing of toolkits and software; therefore, network services were not included in the objectives. Network services

TABLE 4. Target fuzzing programs

ID  Target Program  Parameters       Version         Type
1   readelf         -a @@            Binutils-2.34   ELF
2   strings         -a @@            Binutils-2.34   ELF
3   size            -A -x -t @@      Binutils-2.34   ELF
4   objdump         -a -f -x -d @@   Binutils-2.34   ELF
5   nm              -C @@            Binutils-2.34   ELF
6   pdfinfo         -box @@          Poppler-0.86.1  PDF
7   pdfimages       -list -j @@      Poppler-0.86.1  PDF
8   pdfdetach       -list @@         Poppler-0.86.1  PDF
9   pdftotext       -htmlmeta @@     Poppler-0.86.1  PDF
10  pdftohtml       -stdout @@       Poppler-0.86.1  PDF
11  pdftoppm        -mono @@         Poppler-0.86.1  PDF

receive inputs differently and require a different approach when developing fuzzing tools, and contemporary fuzzing tools also have specific features for fuzzing network services. In terms of image processing tools, fuzzing via parsing software is quite similar to PDF file parsing. However, our tools do not yet support more complex operations like interactive image editing, in terms of fuzzing based on events during UI application interactions.

3) Evaluation metrics
Our approach is evaluated via the four following metrics.
• Code coverage is calculated as the total number of edges found by each model with the edge-based coverage computation method. Each traversed edge is only counted once, ignoring later duplications during the fuzzing process. As mentioned above, a bitmap is returned to indicate the observed coverage, making up a global bitmap that records the edges found after each trial. Higher code coverage represents better performance.
• Unique paths are measured for each test case, based on the set of edges it traverses (without taking the frequency of each traversed edge into account). This is an essential metric for evaluating the path-discovery capability of fuzzing models, reflecting the diversity of executed paths. Once again, more unique paths found indicate better effectiveness of our model.
• Execution rate is a parameter evaluating the speed of the models, identified by time-related information when trying inputs on the target programs. For example, it can be the total number of trials each model performed in a specific amount of time, or the time it takes to finish a particular number of trials. In fact, given the inherently random nature of fuzzing, more trials conducted can increase the chances of discovering vulnerabilities or new paths, and thus performance. In general, this metric can be calculated as the total number of executed trials per second. Hence, a higher speed of a model is reflected in a higher number of trials performed, or in less time consumed.
• Enhancement rate (ER) is used to determine the increase or decrease in performance of CTFuzz compared to other models, calculated by (4). According to this formula, a positive ER indicates that CTFuzz is more effective than its counterpart and vice versa, which can be used to compare all the above metrics.

    ER = (Value of CTFuzz − Value of model X) / Value of model X × 100    (4)

Where:
-- Value of CTFuzz: The value, which can be one of the above 3 metrics, of the CTFuzz model to be compared.
-- Value of model X: The value, which can be one of the above 3 metrics, of the other models.

4) Experimental scenarios
To address these four questions, we conducted experiments to evaluate the effectiveness of our model. In our experiments, our proposed CTFuzz is put side by side with 3 other fuzzing tools, each of which is leveraged to answer one of the above questions. While the first question can be resolved by comparing CTFuzz with rlfuzz [10], another tool, AFLplusplus, is used for the second one. Moreover, to address the third question, we compared CTFuzz with a random generation mechanism, designed to separate the RL model from the overall model and evaluate the performance difference between selecting transformation actions based on RL and selecting them randomly. For the final question, which is related to the performance of a programming language, we observe some metrics after 200,000 initial trials.
To sum up, 2 different scenarios are conducted as follows.
• Scenario 1: We compared the effectiveness of our model against various fuzzing models, including rlfuzz [10], AFLplusplus [28], and a random mechanism, within the same execution time of 6 hours via all metrics and their ERs.
• Scenario 2: This scenario aims to answer the last question, where all models are compared using code coverage and unique paths after completing 200,000 trials.
Note that, in each scenario, each model is tested 5 times to obtain averaged results.

C. EXPERIMENTAL RESULTS AND ANALYSIS
1) Scenario 1: Performance after 6 hours
Table 5 summarizes the number of edges found by each model after 6 hours of experimentation. In general, CTFuzz achieves lower code coverage compared to AFLplusplus when fuzzing most of the tested applications, while it beats rlfuzz regardless of target programs. In more detail, the most significant performance decrease compared to AFLplusplus is observed in the readelf application, at -73.3%, while the best increase in code coverage is observed in objdump, with an ER of 39.5%. Especially, in comparison with rlfuzz, our CTFuzz can even achieve a climb of 315.7%, i.e., roughly 4 times more effective than its counterpart in terms of seeking new edges. On average over the 11 applications, CTFuzz achieves lower coverage of

TABLE 5. Total edge-based coverage (number of found edges) after 6 hours

CTFuzz AFLplusplus Random rlfuzz


ID Program
# Edges # Edges ER (%) # Edges ER (%) # Edges ER (%)
1 readelf 2,950 11,047 -73.3 2,815 4.8 2,278 29.5
2 strings 71.6 72 -0.6 71.6 0 70.4 1.7
3 size 65 65 0 61 6.6 56 16.1
4 objdump 335.6 240.6 39.5 332.4 -1 174,6 92.2
5 nm 100.4 119.6 -16.1 65.2 54 47.6 110.9
6 pdfinfo 149.2 149.4 -0.1 150.6 -0.9 41.8 256.9
7 pdfimages 55 55 0 55 0 52.4 5
8 pdfdetach 53 52.6 0.8 53 0 46.6 13.7
9 pdftotext 195.4 195.4 0 195.2 0.1 47 315.7
10 pdftohtml 326.2 319.2 2.2 322 1.3 207.8 57
11 pdftoppm 105.8 102.2 3.5 105.4 0.4 101.8 3.9
Average ER(%) -4.0 6.1 82.1

4.0% compared to AFLplusplus after 6 hours. In comparison Despite the relatively high ratio, the actual difference is not
with the random transformation model, CTFuzz experiences substantial (9.8 compared to 6.2). On average, CTFuzz lags
a slight increase in code coverage by 6.1% overall. However, behind AFLplusplus by 37.3%. In comparison to the random
the code coverage of the two models is nearly equivalent transformation model, CTFuzz experiences a slight increase.
across all tested applications, except for the nm application, The most significant boost is observed in the nm application
where the difference reaches up to 54% in favor of CTFuzz. with 306.3%, while the lowest decrease is -16.7% in pdfinfo.
Overall, compared with rlfuzz, CTFuzz exhibits a higher code On average, CTFuzz surpasses the random model by 55.8%
coverage with an average increase of 82.1%. in terms of discovered path count across 11 applications.
Concerning the rlfuzz model, CTFuzz once again outperforms
When it comes to finding unique paths, Table 6 con- all test applications. The highest superiority is in pdfinfo with
tains information about the number of discovered paths for 1,053.8%, and the lowest is 15.8% in pdfimages. The average
each model after 6 hours. Once again, CTFuzz trails behind across the 11 applications is 349.1%.
AFLplusplus almost entirely, with the largest gap in the read-
elf application at -95.8%. However, there is an exception in In terms of execution speed, let’s delve into the comparison
the pdftoppm application, where CTFuzz increases by 58.1%. in terms of execution speed, as illustrated in Table 7, based on

TABLE 6. Total number of unique paths found after 6 hours

CTFuzz AFLplusplus Random rlfuzz


ID Program
# Paths # Paths ER (%) # Paths ER (%) # Paths ER (%)
1 readelf 28,753.6 689,824.4 -95.8 11,920.4 141.2 11,056.2 160.1
2 strings 11.4 68.6 -83.4 8.6 32.6 1.6 612.5
3 size 8.8 11.6 -24.1 4 120 2.8 214.3
4 objdump 362.4 477 -24 310.8 16.6 101.4 257.4
5 nm 13 93.2 -86.1 3.2 306.3 2.2 490.9
6 pdfinfo 30 43.4 -30.9 36 -16.7 2.6 1,053.8
7 pdfimages 3 4 -25 3.4 -11.8 2.6 15.4
8 pdfdetach 6 6.8 -11.8 5.8 3.4 3.6 66.7
9 pdftotext 25 60.2 -58.5 25.6 -2.3 3 733.3
10 pdftohtml 309.6 434 -28.7 242 27.9 111.4 177.9
11 pdftoppm 9.8 6.2 58.1 10.2 -3.9 6.2 58.1
Average ER(%) -37.3 55.8 349.1

TABLE 7. Execution speed (number of performed trials per second) after 6 hours

CTFuzz AFLplusplus Random rlfuzz


ID Program
# Trials/s # Trials/s ER (%) # Trials/s ER (%) # Trials/s ER (%)
1 readelf 22.5 208.4 -89.2 23.3 -3.1 12.6 78.7
2 strings 48.5 286.3 -83.1 48.4 0.2 15 224.3
3 size 51.9 598 -91.3 53.1 -2.2 16.2 220
4 objdump 39.9 542.9 -92.7 39.2 1.8 13.9 187.1
5 nm 55.9 731.9 -92.4 57.5 -2.8 16.5 237.8
6 pdfinfo 37.8 200.8 -81.2 39.6 -4.6 14.7 157.2
7 pdfimages 37.6 237.3 -84.1 38.5 -2.2 14.7 155.9
8 pdfdetach 39.7 216.5 -81.7 41.2 -3.6 15.2 161.5
9 pdftotext 36.1 196.8 -81.6 35.4 2.2 14.7 145.3
10 pdftohtml 31.3 202.2 -84.5 32.2 -2.8 14.3 119.5
11 pdftoppm 28.6 210.1 -86.4 28.9 -1 13.4 112.6
Average ER (%) -86.2 -1.7 163.6
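The ER (%) columns in these tables are consistent with a simple relative-change formula of CTFuzz's measurement against each baseline fuzzer. This is a reading reconstructed from the reported numbers rather than a formula restated here; a minimal sketch in Python (the language the compared models are implemented in):

```python
def enhancement_ratio(ctfuzz_value: float, baseline_value: float) -> float:
    """Relative change (%) of a CTFuzz measurement against a baseline fuzzer.

    Positive means CTFuzz is ahead on that metric; negative means it trails.
    """
    return (ctfuzz_value - baseline_value) / baseline_value * 100.0

# readelf row of Table 7: CTFuzz runs 22.5 trials/s vs. 208.4 for AFLplusplus
print(round(enhancement_ratio(22.5, 208.4), 1))  # -89.2, matching the table
```

The same formula reproduces, for example, the 58.1% pdftoppm entry of Table 6 from the 9.8 vs. 6.2 path counts.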

12 VOLUME 11, 2023

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3421989

Van-Hau Pham et al.: A Coverage-guided Fuzzing for Software Vulnerability Detection using RL-enabled Multi-Level Input Mutation

the number of trial attempts per second. CTFuzz experiences a significant speed disadvantage, with an average number of trial attempts per second that is 86.2% lower than that of AFLplusplus. As for the random transformation model, the difference is marginal at -1.7%. However, CTFuzz outpaces the rlfuzz model, running 163.6% faster.

2) Scenario 2: Performance after 200,000 trials
Given a fixed budget of 200,000 initial trials, Table 8 summarizes the performance of the models in the fuzzing process in terms of discovered edges. In this context, CTFuzz outperforms AFLplusplus, exhibiting an improvement of over 6.5% on average in the number of edges discovered. The largest disparity is observed in the objdump application, where CTFuzz achieves a lead of more than 47.3%. Against the rlfuzz model, CTFuzz continues to win on most applications, showing an average improvement of 82.5%; the highest figure is seen in the pdftotext application, with an increase of 314%, and no application shows lower performance. With the random transformation mechanism, the overall difference remains relatively small, and the coverage achieved in various applications is nearly comparable. Notably, in the case of the nm application, CTFuzz exhibits a significant increase of 70.8% in edge coverage compared to the random mechanism. On average, across the 11 applications, CTFuzz achieves more than 9.4% higher edge coverage than the random mechanism after 200,000 executions.

In terms of the number of discovered execution paths within the first 200,000 executions, as shown in Table 9, CTFuzz continues to demonstrate a slight advantage over AFLplusplus, exhibiting an increase of 8.2%. The disparity in improvement varies across applications: the largest boost is seen in objdump at 97.6%, and the most marginal in pdftotext at -34.6%. Compared to the random transformation mechanism, CTFuzz surpasses the average by 33.4%, achieving the highest increase in the readelf application at 169.8% and the lowest in pdfdetach at -15.4%. For the rlfuzz model, the average increase across the 11 applications is 350.4%. Notably, in the pdftotext application, there is a substantial difference of 1,560% because rlfuzz discovers only one path compared to 16.6 paths found by CTFuzz.

Moreover, we compare the execution speeds of the four models within the first 200,000 trials, expressed as the number of executed trials per second, as illustrated in Table 10. The increase ratios of CTFuzz in comparison to AFLplusplus, the random transformation mechanism, and rlfuzz are -86.3%, -3.5%, and 154.6%, respectively.

V. DISCUSSION
Given the nature of the designed experiments, it is essential to provide a few general observations about the experimental results, as follows.
• The 6-hour timeframe is relatively short for a comprehensive comparison of the effectiveness of the fuzzing models in a real-world scenario, where practical fuzzing projects can span over weeks, utilizing multiple

TABLE 8. Total code coverage (number of found edges) after 200,000 trials

ID  Program  |  CTFuzz: # Edges  |  AFLplusplus: # Edges, ER (%)  |  Random: # Edges, ER (%)  |  rlfuzz: # Edges, ER (%)
1 readelf 2,739.2 3,444.6 -20.5 2,512.6 9 2,194.6 24.8
2 strings 70.8 71.6 -1.1 70.4 0.6 70.4 0.6
3 size 63.4 65 -2.5 61 3.9 56 13.2
4 objdump 291.6 198 47.3 246.8 18.2 160.2 82
5 nm 88.8 64.6 37.5 52 70.8 47.6 86.6
6 pdfinfo 148.6 148.4 0.1 147.4 0.8 41 262.4
7 pdfimages 55 55 0 55 0 52.4 5
8 pdfdetach 51 51 0 52.6 -3 46.6 9.4
9 pdftotext 194.6 194.8 -0.1 193.6 0.5 47 314
10 pdftohtml 311.2 295 5.5 305.4 1.9 151.2 105.8
11 pdftoppm 105.4 99.8 5.6 104.6 0.8 101.4 3.9
Average ER (%) 6.5 9.4 82.5
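The edge counts above come from coverage instrumentation. In AFL-family tools such as AFLplusplus, an "edge" is approximated by hashing the previous and current basic-block IDs into a fixed-size shared bitmap; the sketch below illustrates that classic scheme (illustrative only, not CTFuzz's exact instrumentation).

```python
MAP_SIZE = 1 << 16  # 64 KiB coverage bitmap, the classic AFL default

def record_edge(bitmap: bytearray, prev_loc: int, cur_loc: int) -> int:
    """Mark the (prev_block -> cur_block) edge and return the next prev_loc."""
    index = (cur_loc ^ prev_loc) % MAP_SIZE    # edge ID = XOR of block IDs
    bitmap[index] = min(bitmap[index] + 1, 255)  # saturating hit counter
    return cur_loc >> 1  # shift so that A->B and B->A hash differently

bitmap = bytearray(MAP_SIZE)
prev = 0
for block_id in (0x41FF, 0x1234, 0x41FF):  # a toy basic-block trace
    prev = record_edge(bitmap, prev, block_id)

print(sum(1 for hits in bitmap if hits))  # 3: distinct edges exercised
```

Counting the non-zero bitmap entries after a run gives the per-trial edge total that coverage-guided fuzzers accumulate.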

TABLE 9. Total number of unique paths found after 200,000 trials

ID  Program  |  CTFuzz: # Paths  |  AFLplusplus: # Paths, ER (%)  |  Random: # Paths, ER (%)  |  rlfuzz: # Paths, ER (%)
1 readelf 13,111.4 18,269.6 -28.2 4,858.8 169.8 9,033.8 45.1
2 strings 4.6 5 -8 2.8 64.3 1.6 187.5
3 size 7 7.8 -10.3 4 75 2.8 150
4 objdump 149 75.4 97.6 111.4 33.8 79.8 86.7
5 nm 3 4 -25 3 0 2.2 36.4
6 pdfinfo 22.4 23.6 -5.1 24.4 -8.2 1.4 1,500
7 pdfimages 3 3 0 3 0 2.6 15.4
8 pdfdetach 4.4 3.8 15.8 5.2 -15.4 2.8 57.1
9 pdftotext 16.6 25.4 -34.6 16.4 1.2 1 1,560
10 pdftohtml 148.2 153.2 -3.3 111 33.5 54.6 171.4
11 pdftoppm 8.4 4.4 90.9 7.4 13.5 5.8 44.8
Average ER (%) 8.2 33.4 350.4


TABLE 10. Execution speed (number of performed trials per second) after 200,000 trials

ID  Program  |  CTFuzz: # Trials/s  |  AFLplusplus: # Trials/s, ER (%)  |  Random: # Trials/s, ER (%)  |  rlfuzz: # Trials/s, ER (%)
1 readelf 21.9 431.1 -94.9 22.7 -3.6 12.6 73.6
2 strings 46.4 212.6 -78.2 47.3 -1.9 15.1 207.5
3 size 47.4 523.8 -91 53.1 -10.7 16.2 191.9
4 objdump 38.4 454.8 -91.6 37.9 1.3 13.9 177.1
5 nm 55 666.7 -91.7 56.6 -2.8 16.5 234.6
6 pdfinfo 37 198 -81.3 36.7 0.9 14.7 151.4
7 pdfimages 36 232.8 -84.5 38.3 -5.8 14.7 145.2
8 pdfdetach 38.1 220.9 -82.8 38.7 -1.7 15.1 152.3
9 pdftotext 34.5 198.5 -82.6 35.5 -2.9 14.7 135.2
10 pdftohtml 31.3 208.8 -85 31.6 -0.8 14.3 119.1
11 pdftoppm 28.6 204.1 -86 31.9 -10.4 13.4 113.1
Average ER (%) -86.3 -3.5 154.6

machines with significantly higher speeds. However, this duration is sufficient to highlight differences between the models, while running each application three times helps mitigate the impact of luck.
• Many applications achieved similar values across all four models and did not show significant improvement after either 200,000 trials or 6 hours. This could be due to less effective initial inputs or the relatively short timeframe.
• Some comparison values exhibit only minor absolute differences in effectiveness. However, since the base values being compared against were low, the resulting improvement ratios appear substantial, impacting the final averages.

When it comes to addressing the four pre-defined research questions, the above experimental results lead us to the following answers.

Question 1: How effective is the model compared to the previous RL-based fuzzing model?
This question can be answered by the comparison of CTFuzz and rlfuzz, another RL-based model. In both the 6-hour and 200,000-trial experiments, the CTFuzz model consistently outperforms the rlfuzz model in all terms: code coverage, unique path count, and execution speed. CTFuzz's code coverage improves by approximately 80% compared to rlfuzz, and it surpasses rlfuzz in the number of discovered paths by 411.8% in the initial 200,000 trials and by 658.6% within the 6-hour timeframe. In terms of execution speed, CTFuzz is approximately 160% faster in both contexts. These results demonstrate the higher effectiveness of CTFuzz over rlfuzz in both the exploitation and exploration aspects of the model.

Question 2: How does the model compare to a state-of-the-art fuzzing tool?
Taking AFLplusplus as an example of a modern fuzzing tool, the effectiveness of CTFuzz falls slightly behind in all evaluation metrics. However, when examining effectiveness on a per-run basis, CTFuzz achieves slightly better results than AFLplusplus. This indicates that if the speed of the CTFuzz model can be improved without compromising its overall performance, it has the potential to be highly useful.

With the rising popularity of ChatGPT and its applications, ChatGPT-assisted fuzzers could also be a promising solution. Despite its huge database and the diverse knowledge it has obtained from various sources, there are still limitations to adopting ChatGPT in our approach. For instance, the provided ChatGPT APIs incur expenses and may make the proposed fuzzing tool dependent on network speed. Besides, the fuzzing process may impose various restraints, such as data and source code privacy and security, so relying on a third-party service like ChatGPT can be risky.

Question 3: What is the trade-off regarding speed that yields performance improvements?
In the 6-hour and 200,000-trial contexts, the speed difference between CTFuzz and the random mechanism remains marginal, at approximately 0.7%. Despite this small delay, CTFuzz slightly enhances code coverage and path discovery, highlighting the value of the RL mechanism. However, the speed trade-off does not appear to be significant.

Question 4: How would the model's effectiveness change if the speed gap between languages were reduced?
Comparing the execution speeds of the four models reveals a considerable discrepancy between AFLplusplus (implemented in C) and the other models (implemented in Python). However, the speed difference between CTFuzz and the random mechanism is minor, around 0.7%. This indicates that the primary contributor to the speed difference is the Python language rather than the RL component. Considering the slight advantage of CTFuzz over AFLplusplus in per-run effectiveness, optimizing the model's execution speed could yield promising results.

In conclusion, the achieved results partially reflect our expectations for the fuzzing model. CTFuzz boasts a higher execution speed, greater code coverage, and an increased number of found paths compared to the rlfuzz model in both the 6-hour and 200,000-trial experiments. Although CTFuzz operates slightly slower than the random mechanism, it simultaneously improves code coverage and path count. This suggests the RL mechanism has some positive effects, though not overwhelmingly substantial. While the discussion has

provided valuable insights, further detailed comparisons and evaluations between different RL models and their hyperparameters are necessary for optimal and efficient fuzzing; owing to time constraints, we have not conducted such comprehensive evaluations, so our design choices may not be optimal. Additionally, several applications exhibited stagnation in code coverage and path discovery, which suggests that either the initial inputs employed were ineffective for those applications or the experimentation duration was insufficient for the model to explore new paths.

VI. FUTURE DIRECTIONS
Much more research effort is needed before RL-based fuzzing models are effectively applicable in practice. One of the significant challenges is speed, which can possibly be improved by porting the model to a faster programming language, such as Rust. Moreover, our proposed fuzzer can be redesigned in a multithreaded manner to speed it up significantly. A comprehensive comparison with other RL models with more complicated architectures, or using various RL algorithms, is also a promising direction. Besides, instead of focusing on input selection and scheduling, the process of input creation can also be enhanced by deploying state-of-the-art data generation techniques, such as the Generative Adversarial Networks (GANs) in RapidFuzz [19] and CGFuzzer [20]. Such GAN-based approaches can learn data structures to effectively create inputs, which is useful for dealing with complicated input types or strict input verification in some target programs.

In the evolving landscape of operating system security, in addition to software fuzzing in userland mode, kernel fuzzing has emerged as a critical technique for uncovering vulnerabilities that could potentially impact billions of devices globally. One of the foundational tools in this domain is Syzkaller, which utilizes a domain-specific language, syzlang, to meticulously define system call (syscall) sequences and their interdependencies. Despite the progress in automating kernel fuzzing, the generation of Syzkaller specifications has largely remained a manual endeavor, with a significant number of syscalls still not effectively covered. Recognizing this gap, the recent study of Chenyuan Yang et al., introduced in the KernelGPT approach [30], marks a significant advancement. It harnesses the capabilities of Large Language Models (LLMs) to infer Syzkaller specifications, leveraging the extensive kernel code, documentation, and use case data encoded during the pre-training of these models. The KernelGPT model utilizes an iterative method to derive and refine syscall specifications, integrating feedback mechanisms to enhance the accuracy and comprehensiveness of the generated sequences. Preliminary findings highlighted in the study indicate that KernelGPT not only achieves greater coverage but also identifies numerous previously undetected bugs, leading to a collaboration with the Syzkaller team to incorporate these automatically inferred specifications. This development underscores the potential of LLMs to transform the scope and efficacy of kernel fuzzing practices, providing a more automated and systematic approach to securing operating systems against a broad spectrum of threats.

In terms of protocol fuzzing, pretrained LLMs have shown significant potential, particularly in overcoming the limitations of traditional mutation-based fuzzing techniques. The ChatAFL approach by Ruijie Meng et al. [31] leverages the extensive knowledge embedded within LLMs, which have been trained on vast amounts of human-readable protocol specifications, to extract machine-readable information that aids in generating valid and diverse test inputs. This is especially useful given that protocol implementations often lack formal, machine-readable specifications, relying instead on extensive natural-language documentation. The LLM-guided approach enhances state and code coverage by constructing grammars for the various message types within a protocol and predicting subsequent messages in a sequence.

The potential of LLMs like GPT in enhancing fuzzing techniques extends beyond kernel security to help fuzzers encompass userland applications in binary software and protocols [31]. LLMs demonstrate a profound capability to understand and generate complex code patterns, which can be leveraged to automate the generation of fuzzing inputs for userland binary applications. This is particularly valuable in scenarios where conventional fuzzing struggles due to the complexity of the input structures required by these applications. By automating input generation, LLMs can uncover vulnerabilities that might be missed by more traditional methods, thus broadening the scope of security testing in userland environments. Furthermore, the adaptability of LLMs to understand context from documentation and prior code enables them to tailor fuzzing approaches to the specific nuances of userland binaries. This enhances the efficacy and coverage of fuzz tests, pushing the boundaries of what can be achieved with current technology in identifying and mitigating potential software vulnerabilities.

In summary, while ChatGPT and other LLMs can enhance certain aspects of the fuzzing process, especially test case generation and automation [32], their effectiveness in fuzzing binary software is limited by a lack of specialized knowledge in low-level computing and potential scalability issues. Hence, to make them more suitable for fuzzing, LLMs need to be customized, fine-tuned, and pretrained on specific large-scale datasets. In our future work, they can best be used as supplementary tools alongside more specialized fuzzing tools and frameworks. Specifically, in the test-case generation task for fuzzing, ChatGPT or other LLMs can significantly enhance the process by leveraging their language-processing capabilities to create diverse and contextually appropriate inputs. They can generate various forms of user-like data, develop

complex user scenarios, and create semantic variants of input data, which are critical in exploring different execution paths in software. Additionally, LLMs can aid in automating test-script writing and can integrate and interpret outputs from other tools to suggest relevant test cases, thereby enriching the depth and coverage of the fuzzing process in uncovering potential vulnerabilities. We also intend to improve our work with LLM models in the future, to boost fuzzing performance on many open-source software projects, kernel applications, and protocols.

VII. CONCLUSION
Combining RL with current fuzzing techniques holds the potential to accelerate fuzzing effectively. However, there are still significant limitations that hinder its practicality. These include the imbalance between exploitation and exploration in the model, slow speed, and the absence of comparative studies regarding practical effectiveness against real-world fuzzing tools. In this article, we introduce an RL-based, coverage-guided fuzzing model that balances the exploitation and exploration aspects through efficient input selection and scheduling algorithms. Moreover, to improve efficiency, multi-level input mutation algorithms and early termination mechanisms are also implemented. The effectiveness of our proposed CTFuzz model has been demonstrated via experimental results, where it is compared with other modern fuzzing tools and RL-based fuzzing models in the capability of discovering new paths, increasing code coverage, and time-effectiveness. Despite its potential, the contribution of CTFuzz is still bounded: insufficiently rapid speed, limited test-case length, targeting only binary software, and insignificant improvement over the random mechanism, all of which need to be addressed in future efforts.

ACKNOWLEDGMENT
This research was supported by The VNUHCM-University of Information Technology's Scientific Research Support Fund.

REFERENCES
[1] Valentin JM Manès, HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J Schwartz, and Maverick Woo. The art, science, and engineering of fuzzing: A survey. IEEE Transactions on Software Engineering, 47(11):2312–2331, 2019.
[2] Fayozbek Rustamov, Juhwan Kim, Jihyeon Yu, and Joobeom Yun. Exploratory review of hybrid fuzzing for automated vulnerability detection. IEEE Access, 9:131166–131190, 2021.
[3] Xiaogang Zhu, Sheng Wen, Seyit Camtepe, and Yang Xiang. Fuzzing: A survey for roadmap. ACM Computing Surveys, 54:1–36, 2022.
[4] Sanoop Mallissery and Yu-Sung Wu. Demystify the fuzzing methods: A comprehensive survey. ACM Computing Surveys, 56(3):1–38, 2023.
[5] Xiaoqi Zhao, Haipeng Qu, Jianliang Xu, Xiaohui Li, Wenjie Lv, and Gai-Ge Wang. A systematic review of fuzzing. Soft Computing, 2023.
[6] Marcel Böhme, Van-Thuan Pham, and Abhik Roychoudhury. Coverage-based greybox fuzzing as Markov chain. IEEE Transactions on Software Engineering, 45(5):489–506, 2019.
[7] Jinghan Wang, Chengyu Song, and Heng Yin. Reinforcement learning-based hierarchical seed scheduling for greybox fuzzing. In Network and Distributed Systems Security (NDSS) Symposium 2021, 2021.
[8] Hongliang Liang, Xiaoxiao Pei, Xiaodong Jia, Wuwei Shen, and Jian Zhang. Fuzzing: State of the art. IEEE Transactions on Reliability, 67(3):1199–1218, 2018.
[9] Tai Yue, Pengfei Wang, Yong Tang, Enze Wang, Bo Yu, Kai Lu, and Xu Zhou. EcoFuzz: Adaptive energy-saving greybox fuzzing as a variant of the adversarial multi-armed bandit. In Proceedings of the 29th USENIX Conference on Security Symposium, pages 2307–2324, 2020.
[10] Zheng Zhang, Baojiang Cui, and Chen Chen. Reinforcement learning-based fuzzing technology. In Innovative Mobile and Internet Services in Ubiquitous Computing: Proceedings of the 14th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS-2020), pages 244–253. Springer, 2021.
[11] Jiaxi Ye, Ruilin Li, and Bin Zhang. RDFuzz: Accelerating directed fuzzing with intertwined schedule and optimized mutation. Mathematical Problems in Engineering, 2020:1–12, 2020.
[12] Yan Wang, Peng Jia, Luping Liu, Cheng Huang, and Zhonglin Liu. A systematic review of fuzzing based on machine learning techniques. PLoS ONE, 15(8):e0237749, 2020.
[13] Jianye Hao, Tianpei Yang, Hongyao Tang, Chenjia Bai, Jinyi Liu, Zhaopeng Meng, Peng Liu, and Zhen Wang. Exploration in deep reinforcement learning: From single-agent to multiagent domain. IEEE Transactions on Neural Networks and Learning Systems, pages 1–21, 2023.
[14] Jinghan Wang, Chengyu Song, and Heng Yin. Reinforcement learning-based hierarchical seed scheduling for greybox fuzzing. In The Network and Distributed System Security (NDSS) Symposium 2022, 2022.
[15] Tiantian Ji, Zhongru Wang, Zhihong Tian, Binxing Fang, Qiang Ruan, Haichen Wang, and Wei Shi. AFLPro: Direction sensitive fuzzing. Journal of Information Security and Applications, 54:102497, 2020.
[16] Peng Chen, Jianzhong Liu, and Hao Chen. Matryoshka: Fuzzing deeply nested branches. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pages 499–513, 2019.
[17] Hanfang Zhang, Anmin Zhou, Peng Jia, Luping Liu, Jinxin Ma, and Liang Liu. InsFuzz: Fuzzing binaries with location sensitivity. IEEE Access, 7:22434–22444, 2019.
[18] Craig Beaman, Michael Redbourne, J Darren Mummery, and Saqib Hakak. Fuzzing vulnerability discovery techniques: Survey, challenges and future directions. Computers & Security, page 102813, 2022.
[19] Aoshuang Ye, Lina Wang, Lei Zhao, Jianpeng Ke, Wenqi Wang, and Qinliang Liu. RapidFuzz: Accelerating fuzzing via generative adversarial networks. Neurocomputing, 460:195–204, 2021.
[20] Zhenhua Yu, Haolu Wang, Dan Wang, Zhiwu Li, and Houbing Song. CGFuzzer: A fuzzing approach based on coverage-guided generative adversarial networks for industrial IoT protocols. IEEE Internet of Things Journal, 9(21):21607–21619, 2022.
[21] Yunchao Wang, Zehui Wu, Qiang Wei, and Qingxian Wang. NeuFuzz: Efficient fuzzing with deep neural network. IEEE Access, 7, 2019.
[22] Konstantin Böttinger, Patrice Godefroid, and Rishabh Singh. Deep reinforcement fuzzing. In 2018 IEEE Security and Privacy Workshops (SPW), pages 116–122. IEEE, 2018.
[23] Alexandr Kuznetsov, Yehor Yeromin, Oleksiy Shapoval, Kyrylo Chernov, Mariia Popova, and Kostyantyn Serdukov. Automated software vulnerability testing using deep learning methods. In IEEE 2nd UKRCON, 2019.
[24] Sameer Reddy, Caroline Lemieux, Rohan Padhye, and Koushik Sen. Quickly generating diverse valid test inputs with reinforcement learning. In 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE), pages 1410–1421, 2020.
[25] Xiao Liu, Rupesh Prajapati, Xiaoting Li, and Dinghao Wu. Reinforcement compiler fuzzing. In ICML 2019 Workshop, 2019.
[26] William Drozd and Michael D Wagner. FuzzerGym: A competitive framework for fuzzing and learning. arXiv preprint arXiv:1807.07490, 2018.
[27] libFuzzer – a library for coverage-guided fuzz testing. https://llvm.org/docs/LibFuzzer.html.
[28] American Fuzzy Lop plus plus (AFL++). https://github.com/AFLplusplus/AFLplusplus.
[29] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym, 2016.
[30] Chenyuan Yang, Zijie Zhao, and Lingming Zhang. KernelGPT: Enhanced kernel fuzzing via large language models. arXiv preprint arXiv:2401.00563, 2023.
[31] Ruijie Meng, Martin Mirchev, Marcel Böhme, and Abhik Roychoudhury. Large language model guided protocol fuzzing. In Proceedings of the 31st Annual Network and Distributed System Security Symposium (NDSS), 2024.
[32] Yutian Tang, Zhijie Liu, Zhichao Zhou, and Xiapu Luo. ChatGPT vs SBST: A comparative assessment of unit test suite generation. IEEE Transactions on Software Engineering, pages 1–19, 2024.


VAN-HAU PHAM obtained his bachelor's degree in computer science from the University of Natural Sciences of Hochiminh City in 1998. He pursued his master's degree in Computer Science from the Institut de la Francophonie pour l'Informatique (IFI) in Vietnam from 2002 to 2004. Then he did his internship and worked as a full-time research engineer in France for 2 years. He then pursued his Ph.D. thesis on network security under the direction of Professor Marc Dacier from 2005 to 2009. He is now a lecturer at the University of Information Technology, Vietnam National University Ho Chi Minh City (UIT-VNU-HCM), Hochiminh City, Vietnam. His main research interests include network security, system security, mobile security, and cloud computing.

DO THI THU HIEN received the B.Eng. degree in Information Security from the University of Information Technology, Vietnam National University Ho Chi Minh City (UIT-VNU-HCM) in 2017. She received an M.Sc. degree in Information Technology in 2020. From 2017 until now, she has worked as a member of a research group at the Information Security Laboratory (InSecLab) at UIT. Her research interests are malware analysis and detection, information security & privacy, Software-Defined Networking, and its related security-focused problems.

NGUYEN PHUC CHUONG graduated in Information Security with Honor Program in 2023 at the University of Information Technology, Vietnam National University, Ho Chi Minh City, Vietnam (UIT-VNU-HCM). Since 2022, he has been actively engaged as a student member at the Information Security Laboratory (InSecLab), UIT-VNU-HCM, focusing on projects related to information security and AI-driven security. His primary research focuses on software security, automatic exploitation, and AI cybersecurity endeavors, particularly fuzzing techniques for binary exploitation with AI.

PHAM THANH THAI graduated in Information Security with Honor Program in 2023 at the University of Information Technology, Vietnam National University, Ho Chi Minh City, Vietnam (UIT-VNU-HCM). Since 2022, he has been actively engaged as a student member at the Information Security Laboratory (InSecLab), UIT-VNU-HCM, focusing on projects related to information security and AI-driven security. His primary research focuses on cybersecurity and AI cybersecurity endeavors, particularly fuzzing techniques for binary exploitation with AI.

PHAN THE DUY received the B.Eng. and M.Sc. degrees in Software Engineering and Information Technology, respectively, from the University of Information Technology (UIT), Vietnam National University Ho Chi Minh City (VNU-HCM), Hochiminh City, Vietnam, in 2013 and 2016. Currently, he is pursuing a Ph.D. degree majoring in Information Technology, specializing in Cybersecurity at UIT, Hochiminh City, Vietnam. From 2016, he also worked as a researcher member in the Information Security Laboratory (InSecLab), UIT-VNU-HCM, after 5 years in the industry, where he joined and created several security-enhanced and large-scale teleconference systems. His main research interests concentrate on cybersecurity & privacy problems, including Software Security, Software-Defined Networking (SDN), Malware and Cyber threat detection, Digital forensics, Machine Learning and Adversarial Machine Learning in Cybersecurity domains, Private Machine Learning, and Blockchain.
