
Received 4 November 2023, accepted 19 December 2023, date of publication 26 December 2023, date of current version 31 January 2024.


Digital Object Identifier 10.1109/ACCESS.2023.3347652

Machine Learning-Based Fuzz Testing Techniques: A Survey
AO ZHANG 1, YIYING ZHANG 1, YAO XU 1, CONG WANG 1, AND SIWEI LI2
1 College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China
2 State Grid Information and Telecommunication Company Ltd., Beijing 102200, China

Corresponding author: Yiying Zhang ([email protected])

ABSTRACT Fuzz testing is a vulnerability discovery technique that tests the robustness of target programs by
providing them with unconventional data. With the rapid increase in software quantity, scale and complexity,
traditional fuzzing has revealed issues such as incomplete logic coverage, low automation level and
insufficient test cases. Machine learning, with its exceptional capabilities in data analysis and classification
prediction, presents a promising approach for improving fuzzing. This paper investigates the latest research
results in fuzzing and provides a systematic review of machine learning-based fuzzing techniques. Firstly,
by outlining the workflow of fuzzing, it summarizes the optimization of different stages of fuzzing using
machine learning. Specifically, it focuses on the application of machine learning in the preprocessing phase,
test case generation phase, input selection phase and result analysis phase. Secondly, it mainly focuses on
the optimization methods of machine learning in the process of mutation, generation and filtering of test cases
and compares and analyzes its technical principles. Furthermore, it analyzes the performance gains brought
by applying machine learning techniques to fuzzing, mainly including coverage, vulnerability detection
capability, efficiency and effectiveness of test cases. Lastly, it concludes by summarizing the challenges
and difficulties in combining machine learning with fuzzing and presents prospects for future trends in this
field.

INDEX TERMS Vulnerability discovery, fuzzing, machine learning.

The associate editor coordinating the review of this manuscript and approving it for publication was Xinyu Du.
2023 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 12, 2024
A. Zhang et al.: Machine Learning-Based Fuzz Testing Techniques: A Survey

I. INTRODUCTION
In recent years, there has been a proliferation of network attacks and a rapid increase in the number of vulnerabilities, leading to potential risks such as information leakage or loss. Vulnerability discovery techniques aim to identify and patch vulnerabilities before they are exploited by attackers [1], effectively reducing security threats and maintaining the secure operation of networks. Fuzz testing, as an effective method for vulnerability discovery, attempts to trigger program anomalies by automatically or semi-automatically generating test cases, monitoring target program execution and providing feedback to adjust the generation of test cases. It offers the advantages of easy deployment and broad applicability. The concept of fuzz testing was initially proposed by Miller in 1990 [2], who designed a tool called Fuzz to test the robustness of target programs using unconventional data. As the value of fuzzing has been explored, black-box [3], white-box [4] and gray-box fuzzers [5] have appeared one after another. Many scholars have carried out continuous improvement and enhancement, and the coverage rate and anomaly triggering ability have been improved to different degrees. However, traditional fuzzing still faces several challenges, such as an insufficient number of existing test cases, weak ability of generated test cases to trigger vulnerabilities, the lack of differentiation between test case weights during input selection, and a relatively high degree of blindness during the testing process.

With the remarkable performance of machine learning techniques in statistical learning, natural language processing and pattern recognition, researchers have applied these techniques to the field of cybersecurity, including the detection of malicious code [6] and intrusion detection [7]. Machine learning can automatically learn grammar rules that conform to syntax specifications from a large number of samples, effectively addressing classification problems in fuzzing, such as determining the validity of generated test cases and the usability of seeds for mutation. Furthermore, machine learning can reduce manual effort and minimize the time overhead of fuzzing. Therefore, combining machine learning with fuzzing provides new ideas and methods for alleviating the bottlenecks of traditional fuzzing techniques. How to balance the advantages of both to better enhance vulnerability detection is still an area that requires further research. This paper focuses on the background of machine learning, and analyzes and reviews a large body of literature on the combination of machine learning and fuzzing. Using the basic process of fuzzing as the organizing thread, it introduces various improved methods of fuzzing implementation based on different machine learning models, comparing and analyzing their enhancements and improvements. Furthermore, it introduces the performance gain of different machine learning methods for fuzzing and demonstrates the effectiveness of machine learning for fuzzing improvement. Moreover, it identifies existing issues in applying machine learning techniques to fuzzing and provides insights into future development trends.

The primary contributions of our work can be summarized as follows:

(1) This paper refers to and examines a large amount of relevant literature and highlights the latest research results in the past five years, which helps readers better grasp the future direction of the fuzzing field. In addition, this paper analyzes and organizes research on fuzzing in different areas, such as fuzzing in the Internet of Things, web applications, compilers and deep learning models, which encompasses common areas where fuzzing can be used.

(2) This paper focuses on the workflow of fuzzing and introduces the application of machine learning methods in four different stages: preprocessing, test case generation, input selection and result analysis. It compares and contrasts various improvement techniques, explaining their underlying technical principles and the resulting optimization enhancements. Finally, it provides a comprehensive summary of the performance gains achieved through the utilization of machine learning algorithms. It helps readers better understand the overall workflow of fuzzing and carry out in-depth research.

(3) By comparing different improvement methods, the problems and challenges in this field are analyzed and summarized, and the possible hot research directions in the field of fuzzing in the background of machine learning are put forward.

Section II provides a brief overview of the basic process of fuzzing. Section III introduces the application of machine learning techniques at different stages of fuzzing, comparing and analyzing the strengths and technical principles of different fuzzing tools. Furthermore, in Section IV, the performance gains to fuzzing from different machine learning approaches are theorized. Next, the challenges faced by existing fuzzing techniques are analyzed, and in Section V, the existing solutions are presented as well as an insight into the future trends in the field. Finally, Section VI summarizes and concludes the work presented in this paper.

II. OVERVIEW OF FUZZING
A. BASIC FLOW OF FUZZING
Fuzzing involves constructing a large number of illegal test inputs, fuzz testing the target program, monitoring its execution, observing and recording any abnormal behavior, analyzing the cause of abnormality or crashes, and finally detecting vulnerabilities. The basic flow of fuzzing can be divided into six parts: pre-processing, test case generation, input selection, test execution, evaluation and result analysis [8], as shown in Figure 1.

FIGURE 1. Basic flow of fuzzing.

The preprocessing stage primarily involves collecting relevant information about the target program and specifying the strategy for fuzzing to assist the fuzzing tool in detecting or observing the target program. This stage typically relies on program analysis techniques such as instrumentation [9], [10], symbolic execution [11], [12] and taint analysis [13], [14]. Existing research efforts have focused on integrating one or more of these techniques into hybrid fuzzing to improve overall performance. For example, Risk-AFL [10] proposes a risk-guided seed selection method based on AFL. During program operation, the risk fitness of the seeds is calculated, by means of instrumentation, from the risky functions and function calls on the program execution path, and the seed selection strategy of AFL is improved accordingly. Intriguer [11] optimizes symbolic execution by utilizing field-level knowledge to more effectively simulate symbolically relevant instructions. TaintPoint [14] applies to the seed mutation stage of general fuzzing and obtains more accurate taint analysis results to guide mutation.

The test case generation phase is mainly to obtain a large number of test inputs, and based on the relevant information obtained in the preprocessing phase, select the appropriate generation method or mutation strategy to construct a large number of test cases which are suitable for the target program. The test case generation phase consists of seed selection, mutation strategy scheduling and test case generation. Seed selection is a process of evaluating the likelihood that a seed could trigger a program anomaly and prioritizing
the higher quality seeds for mutation, so as to reduce the number of invalid test cases generated. Mutation strategy scheduling is similar to the idea of seed selection, prioritizing near-optimal mutation strategies to improve test case bypass and reduce mutation blindness. Test case generation can be categorized based on the generation method as generation-based, mutation-based and based on a combination of the two. There are more studies on the application of machine learning techniques to this phase, which are described in detail below in the context of existing studies.

The input selection phase screens constructed test cases before execution, eliminates invalid cases, and reduces computation time.

The test execution phase involves entering the constructed test cases into the target program, monitoring the execution of the program, and identifying and recording abnormal state changes.

The evaluation phase selects suitable indicators to assess fuzzing effectiveness and vulnerability mining ability. The results are fed back to the test case generation phase to optimize the fuzzer.

The result analysis phase analyzes output results after fuzzing execution. Based on abnormal program states, causes and defect categories are identified to better detect vulnerabilities.

B. CLASSIFICATION OF FUZZING
Fuzzing can be classified according to different classification bases. As shown in Figure 2, fuzzing can be classified as black-box, gray-box or white-box according to the degree of analysis of the information inside the program. According to the way of test case generation, it can be classified as generation-based fuzzing, mutation-based fuzzing and combined generation and mutation-based fuzzing.

FIGURE 2. Classification of fuzzing.

1) BLACK-BOX, GRAY-BOX AND WHITE-BOX
Black-box fuzzing cannot analyze the internal state and structure of the target, but only obtains information unrelated to its internals, such as the input data format of the target. In addition, during the testing process, black-box fuzzing cannot track the execution status inside the target and can only determine the status of the target by detecting the output data of the target [8]. Black-box fuzzing tools are simple to implement and fast to test, and are more suitable for target programs with highly structured input data, as well as complex and difficult to analyze target programs. However, its detection capability is unsatisfactory.

White-box fuzzing is the opposite of black-box fuzzing in that it obtains sufficient internal information about the target to generate high-quality test inputs. White-box fuzzing has better performance in the coverage of programs and in the detection of deep vulnerabilities. However, the method can seriously affect the efficiency of fuzzing because a detailed and comprehensive analysis of the target program consumes a lot of resources.

Gray-box fuzzing is between black-box and white-box, and only part of the internal information of the target is obtained for fuzzing. The method aims to obtain satisfactory test results with limited internal information and a good testing strategy. Compared to both black-box and white-box, gray-box is more flexible and has more advantages. Gray-box fuzzing can find a suitable balance between detection capability and resource consumption to obtain the best detection results.

2) GENERATION-BASED, MUTATION-BASED AND COMBINATION OF GENERATION-BASED AND MUTATION-BASED
The generation-based test case generation approach is mainly based on the known input case format or protocol syntax to generate new test cases. The method needs to generate and process inputs according to the specification of the expected input format or protocol.

Mutation-based test case generation is based on existing test cases with certain mutation methods (e.g., bit-flip, byte-inversion, arithmetic increment/decrement and splicing operations) [15]. In general, blind mutation or manipulation of data generates a multitude of invalid test cases. The introduction of machine learning techniques can guide mutation operations and improve the quality of generated test cases.

The test case generation approach based on a combination of generation and mutation considers both mutation and generation approaches for different test scenarios to maximize their advantages and better guide test case generation.

III. FUZZING IN THE CONTEXT OF MACHINE LEARNING
Existing fuzzing tools suffer from problems such as a low degree of automation and weak vulnerability triggering ability of fuzzing test cases. The data processing and classification prediction abilities of machine learning can be applied to different stages of fuzzing to optimize the quality of fuzzing test cases and improve the efficiency of vulnerability detection.
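
The mutation operators that Section II-B lists for mutation-based generation (bit-flip, byte-inversion, arithmetic increment/decrement and splicing) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any tool surveyed here: the function names, the single-byte arithmetic width and the uniform random choice of operator are all hypothetical. The uniform choice in `mutate` corresponds to the blind mutation that the machine learning methods in this section aim to replace with a learned choice.

```python
import random


def bit_flip(data: bytes, bit_index: int) -> bytes:
    """Flip one bit, addressed by its absolute bit position in the input."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def byte_invert(data: bytes, index: int) -> bytes:
    """Invert every bit of the byte at the given index."""
    buf = bytearray(data)
    buf[index] ^= 0xFF
    return bytes(buf)


def arithmetic(data: bytes, index: int, delta: int) -> bytes:
    """Add delta to one byte, wrapping around modulo 256 (assumed width)."""
    buf = bytearray(data)
    buf[index] = (buf[index] + delta) % 256
    return bytes(buf)


def splice(a: bytes, b: bytes, cut_a: int, cut_b: int) -> bytes:
    """Join the head of one test case with the tail of another."""
    return a[:cut_a] + b[cut_b:]


def mutate(seed: bytes, other: bytes, rng: random.Random) -> bytes:
    """Apply one operator chosen uniformly at random (blind mutation).

    Both inputs are assumed to be non-empty.
    """
    choice = rng.randrange(4)
    if choice == 0:
        return bit_flip(seed, rng.randrange(len(seed) * 8))
    if choice == 1:
        return byte_invert(seed, rng.randrange(len(seed)))
    if choice == 2:
        return arithmetic(seed, rng.randrange(len(seed)), rng.choice([-1, 1]))
    return splice(seed, other,
                  rng.randrange(len(seed) + 1),
                  rng.randrange(len(other) + 1))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for _ in range(3):
        print(mutate(b"GET /index", b"POST /form", rng))
```

A learned scheduler would replace only the `rng.randrange(4)` line, for example by sampling the operator from a probability distribution that is updated from coverage feedback.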

Currently, machine learning is primarily applied in the preprocessing stage, test case generation stage, input selection stage and result analysis stage of fuzzing. The basic process of fuzzing combined with machine learning algorithms is illustrated in Figure 3. In the preprocessing stage, machine learning algorithms can analyze and predict the program information obtained during preprocessing, enhancing the effectiveness of program analysis techniques combined with fuzzing. In the test case generation stage, machine learning algorithms can be used to optimize seed selection, guide mutation strategies and mutation point selection, facilitating seed and test case generation. In the input selection stage, machine learning algorithms can filter and select test inputs. For example, machine learning algorithms can be used for vulnerability prediction and classification processing of test inputs, prioritizing the selection of test inputs that are more likely to trigger vulnerabilities when passed into the target program. In the result analysis stage, machine learning can efficiently and reasonably analyze the numerous test results, enabling the identification of true vulnerabilities within a large number of crashes and anomalies.

FIGURE 3. Schematic diagram of machine learning-based fuzzing process.

Based on the general process of fuzzing, this section provides a detailed introduction to the application and improvements of machine learning algorithms in different stages of fuzzing. It systematically elucidates fuzzing methods based on different machine learning techniques and intuitively presents the performance of different fuzzing models in tabular form.

A. PRE-PROCESSING
The preprocessing stage of fuzzing can utilize program analysis techniques such as instrumentation and symbolic execution to extract program features or runtime information, providing support for generating subsequent test cases. For instance, Pangolin introduces the concept of incremental fuzzing, which involves a polyhedral path abstraction method to accelerate constraint solving in concolic execution [16]. The results of constraint solving are then used to guide the subsequent fuzzing process, improving the efficiency of vulnerability discovery. Liu et al. propose SiCsFuzzer, a tool that adopts a sparse instrumentation-based tracing strategy combined with "warm-up" optimization to improve the efficiency of fuzzing for closed-source programs by eliminating the redundancy overhead in the coverage tracking process of closed-source software without compromising coverage calculation accuracy [17]. Meanwhile, Xiao et al. leverage runtime information obtained through instrumentation as rewards in a deep reinforcement learning network, guiding the generation of more targeted and directed test cases [18].

Many researchers have dedicated efforts to combining these techniques with fuzzing so that they complement each other. Hybrid fuzzing methods overcome difficulties by employing one technique when the other encounters bottlenecks, leading to higher coverage and the exploration of deep program regions to discover deep-seated vulnerabilities. MPFuzz proposes a hybrid fuzzing technique that combines symbolic simulation and grammar-based fuzzing [19]. Symbolic simulation is used to guide the testing process for achieving high coverage, while grammar-based fuzzing generates test instructions conforming to the syntax specifications of microprocessor RTL designs. The combination of both techniques efficiently generates test instructions for microprocessor RTL designs. Furthermore, deep learning techniques can learn code space features suitable for both techniques before program execution, serving as guidance for hybrid fuzzing. This approach effectively enhances code coverage and significantly improves defect detection capabilities. Gao et al. introduce a hybrid testing method based on deep learning [20]. The algorithm flow is illustrated in Figure 4. Given a program, a graph representation of its paths is constructed, and a gated graph neural network (GGNN) model is
employed to predict whether a path is suitable for fuzzing or symbolic execution. This leads to the development of SmartFuSE, which guides the symbolic execution or fuzzing tool to preferentially execute the set of paths predicted to suit it. Moreover, considering the inaccuracies of model predictions, SmartFuSE proposes a hybrid mechanism that, among the set of paths predicted to be suitable for fuzzing, passes the paths that fuzzing fails to cover to symbolic execution, and uses the symbolic execution technique to traverse those paths to further improve the overall coverage.

FIGURE 4. The overall framework of deep learning-based hybrid fuzzing [20].

In Figure 4, AST denotes abstract syntax tree, depth denotes path depth, X denotes the score suitable for fuzzing, and Y denotes the score suitable for symbolic execution.

Hybrid fuzzing that incorporates one or more techniques has become a new research branch aiming to combine the advantages of multiple techniques and enhance vulnerability detection capability. Many studies have improved the combination strategy for hybrid fuzzing using an "optimal strategy," a "discriminative dispatch strategy," and a "Priority Based Path Searching method" [21], [22], [23]. However, when mining vulnerabilities in large software, the operational costs of hybrid fuzzing tools are invariably high due to path explosion problems caused by program branches. Applying machine learning techniques to mitigate inherent shortcomings of a single technique in hybrid fuzzing and thus improve fuzzing performance is also a possible research direction for future fuzzing development.

B. TEST CASE GENERATION
In the field of test case generation, machine learning can be applied to scenarios such as mutation position selection, mutation strategy scheduling and structured test case generation. It effectively overcomes the limitations of traditional fuzzing techniques, including blind mutation, ineffective sample generation and reliance on manual involvement, thereby greatly improving the quality of generated samples. Machine learning has been widely used in this stage in existing research. Therefore, in this section, we will divide the discussion based on the problems addressed by machine learning algorithms, and specifically introduce their applications in mutation strategy scheduling, seed selection and test case generation problems. We will also analyze how different machine learning models contribute to the improvement and enhancement of the efficiency and capability of fuzzing.

1) SEED SELECTION
Seeds can be mutated using various mutation operations to generate test cases. The quality of seeds is one of the important factors that influence the effectiveness of fuzzing. Selecting well-formed seeds can significantly save CPU time, and mutations based on well-formed seed inputs are more likely to generate test cases that reach deeper levels of the program.

Wei Xiao et al. proposed a test case classification method based on LSTM neural networks, where the test cases are passed through LSTM and linear layers, resulting in two output nodes. The activation function is applied to obtain the probability of the input belonging to a certain class in the label set [24]. The model is trained using the test cases and their coverage states, and after multiple training iterations, an accurate prediction model for input categories is obtained. This model is used to learn high-level features of the input file structure and assess the value of seeds. By prioritizing the mutation of high-value seeds, the seed selection process is guided. NeuFuzz proposes a hidden pattern learning approach for vulnerable program paths based on LSTM models. Firstly, the seed files are subjected to vulnerability detection. Then,
the fuzzer is instructed to prioritize the vulnerable paths identified by the trained model and allocate more mutation energy to them. This approach aims to maximize the efficiency of defect discovery [25]. With the increasing research on graph embedding networks, V-Fuzz applies them to fuzzing and proposes a fuzzing framework that combines graph embedding networks and evolutionary algorithms. This framework enables efficient testing of binary programs without requiring source code [26]. V-Fuzz proposes a vulnerability detection model based on graph embedding networks, which outputs predicted vulnerability probability values for each function in the target program. These probabilities are subsequently used to calculate fitness scores. During test execution using user-defined initial seeds, evolutionary algorithms compute fitness scores for each seed and select seeds that have high fitness scores or trigger crashes as new seed inputs. Subsequently, these seeds are mutated to generate more test inputs that have the potential to discover vulnerabilities.

TABLE 1. Seed selection.

2) MUTATION STRATEGY SCHEDULING
The random selection of mutation strategies and the sequential selection of mutation positions in existing fuzzers are important factors that affect the quality of test cases and vulnerability detection. In this section, we will focus on the application of different machine learning algorithms in the problem of mutation strategy scheduling.

Genetic algorithms simulate the natural process of gene recombination and evolution. Based on the principles of biological evolution, they perform selection, crossover and mutation operations on test cases to enhance their ability to trigger exceptions. To address the low efficiency of test cases in fuzzing for industrial control protocol vulnerability discovery and to automate and streamline the fuzzing process, Zhang et al. designed a protocol fuzzer called GA-Fuzzer that combines genetic algorithms with fuzzing [27]. The model structure is shown in Figure 5. In the paper, a dynamic fitness function is constructed. By monitoring the presence of danger point use cases within the test case population, different fitness functions are selected. Additionally, by introducing dynamic mutation and crossover probabilities, the diversity of test cases within the population can be adjusted based on the population's state, aiming to improve both the test hit rate and the test case coverage as much as possible.

Zhou proposed an improved mutation method based on imitation. It enhances the ability of poor individuals to bypass defense mechanisms by imitating the mutation strategy of good individuals, thereby improving the aggressiveness of test cases and the efficiency of genetic algorithms [28]. Zhang et al. also proposed an improved mutation strategy [29]. They set a threshold as a criterion, where individuals with fitness values higher than the threshold still undergo random mutation to maintain population diversity. Individuals with fitness values lower than the threshold randomly select an individual with a fitness value higher than the threshold, learn its mutation method, and mutate themselves accordingly, guiding the population to evolve towards high aggressiveness.

DARWIN proposed a mutation scheduling optimization method based on evolutionary strategies. It systematically optimizes and updates the probability distribution of mutation methods using evolutionary strategies, selects an approximately optimal mutation strategy, and guides seeds to mutate towards high-quality directions [30]. AMSFuzz introduced an adaptive mutation scheduling framework [31]. It adaptively adjusts the probability distribution of mutation operators using a multi-armed bandit model to determine the capabilities of the mutation operators. It also utilizes a seed slicing mechanism to select the mutation positions and mutation area sizes for seeds, thereby improving the efficiency of fuzzing. SEAMFUZZ also proposed a fuzzing method for adaptive selection of mutation strategies. By learning the individual characteristics of different seeds, different mutation strategies are applied to different seeds. SEAMFUZZ groups seeds into clusters based on their grammar properties and uses Thompson sampling variants to learn the probability distribution of selecting different mutation strategies for each cluster, customizing effective mutation strategies for each seed cluster [32].

Reinforcement learning is the process of adjusting agent behavior during interaction with a system, aiming to maximize the received rewards based on executed actions and system state transitions. Böttinger et al. formalized fuzzing as a reinforcement learning problem using Markov Decision Processes (MDP) [33]. The fuzzing agent learns a policy by
observing the reward induced by mutations of a particular set of actions performed on the initial program input. The learned policy is used to generate new higher-rewarded inputs, which in turn improves the quality of test cases. DRLFuzzer uses MDP and deep Q-learning algorithms to help the fuzzer automatically select mutation operations [34]. Furthermore, it is combined with hardware mechanisms during program execution to effectively improve the path coverage and execution efficiency of fuzzing. A method based on the DDPG reinforcement learning algorithm was proposed by RLFUZZ to improve traditional fuzzing techniques [35]. After modeling traditional fuzzing as a Markov decision process, the DDPG reinforcement learning algorithm with an integrated value function and policy function is used to select an optimal action selection strategy for the process. This helps to reduce the blindness of sample mutation, enables mutation samples to obtain maximum code coverage reward, reduces the generation of invalid samples, and thus improves the efficiency of traditional fuzzing techniques.

FIGURE 5. Flow chart of test case generation algorithm based on genetic algorithm.

TABLE 2. Mutation strategy scheduling.

3) TEST CASE GENERATION
Test cases can be generated through mutation operations on seeds or automatically generated based on the known specification format of test inputs. The content of test cases serves as the payload for attacking the target program, directly influencing the effectiveness of vulnerability detection. Therefore, constructing effective test cases with high code coverage can enhance the efficiency of fuzzers in vulnerability detection. In this section, we introduce three techniques for fuzzing based on test case generation: generation-based fuzzing, mutation-based fuzzing, and combination of generation and mutation-based fuzzing.

• Generation-Based
In the generation-based test case generation approach, machine learning algorithms are primarily used to learn the format specifications of the target program. By learning from well-structured corpus features, these algorithms generate a large number of high-quality test cases that adhere to the specifications.

For fuzzing against the PDF file format, Godefroid et al. proposed Learn&Fuzz, which considers the use of deep learning algorithms to enhance the syntax-based fuzzing case generation process [36]. Learn&Fuzz introduces a generation model based on Char-RNN to learn PDF objects and a SampleFuzz algorithm that can introduce fuzzing when sampling new objects, intelligently guiding the generation of well-formed PDF input files. While their experiment can learn the intrinsic format of test cases and automate the generation of more well-formed test cases. Wang et al. also introduced a machine learning framework that can generate a large number of seed files [38]. They used the Transformer model to learn the internal formatting document syntax of PDF files and guide the generation of a new sequence of objects. These objects were then assembled to form a new PDF file with complete formatting for subsequent fuzzing. Experiments showed that for the mupdf software (version 1.4.0), this approach not only achieves faster coverage growth but also increases the upper limit of code coverage.

GANFuzz proposed an automated test case generation method that enables the generation of test cases without relying on protocol-specific format specifications [39]. Firstly, a real protocol message corpus is used as training data and is partitioned using three clustering strategies. Then, a test case generation model is built using Generative Adversarial Networks (GANs) [39]. By using the generative model based on real protocol messages, fake protocol messages with different degrees of similarity to real protocols can be generated. The SeqGAN algorithm is introduced to update the generator's parameters using reinforcement learning techniques, addressing the issue of the inability to apply backpropagation during the training process of protocol messages. Finally, this approach generates a large number of effective test cases. When these test cases are applied to the Modbus TCP protocol, the experiment confirms their good vulnerability detection ability. However, overall, GAN-based fuzzing methods may somewhat reduce the efficiency of fuzzing.

Security of industrial control protocols is a crucial aspect in overall industrial safety. To overcome the limitations of traditional fuzzing heavily reliant on industrial protocol specifications, Wang et al. proposed a pointer-generator network (PGN)-based approach to handle the generation of fuzz testing data [40]. The aim is to intelligently learn the real sequences of industrial control protocol messages using a pointer-generator network and generate well-structured synthetic test cases similar to actual data frames without detailed protocol specifications. The architecture of this model combines three components: a seq2seq model, an attention mechanism, and a pointer network model. It also incorporates a coverage mechanism, as illustrated in Figure 6.
did not achieve better results, it was still a commendable Firstly, a hierarchical LSTM unit is employed as both
effort. In 2021, Liu and Yang proposed an automatic test the encoder and decoder of the seq2seq model to retain
case generation model based on BLSTM and attention mech- the temporal dimensionality information and feature vector
anism, along with an improved sampling algorithm based on dimensionality information. The encoder consists of bidi-
Learn&fuzz [37]. BLSTM models were employed to extract rectional LSTM units that learn the character probability
and preserve information in the training samples considering distribution within protocol messages. The decoder utilizes
both forward and backward factors. The attention mechanism a unidirectional LSTM unit to predict the learned protocol
highlights key positions of sample sequences and prevents sequence and generate test cases for fuzzing with semanti-
information loss. The sampling algorithm’s performance was cally valid data fields and well-formed sequential grammar.
improved by adding mutations that better predict character Secondly, a coverage mechanism is introduced to address the
sequences. This paper proposed test case generation model issue of message repetition in the seq2seq model generation.


FIGURE 6. The specific generative model structure [40], where p_gen ∈ [0, 1] is the generation probability at time step t.
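To make the role of p_gen concrete, the equations below sketch how a pointer-generator decoder with a coverage mechanism is commonly formulated. This is an assumption about the general technique, not a transcription of the model in [40]: the symbols P_vocab (the decoder's vocabulary distribution), a^t (the attention distribution over input positions at step t) and c^t (the coverage vector) are introduced here for illustration only.

```latex
% Final output distribution: mix generating from the vocabulary
% with copying input tokens through the attention weights.
P(w) = p_{\mathrm{gen}} \, P_{\mathrm{vocab}}(w)
     + \bigl(1 - p_{\mathrm{gen}}\bigr) \sum_{i \,:\, w_i = w} a_i^{t},
\qquad p_{\mathrm{gen}} \in [0, 1]

% Coverage vector: attention accumulated over all previous decoder steps.
c^{t} = \sum_{t' = 0}^{t - 1} a^{t'}

% Coverage penalty: discourages attending again to positions
% that have already received attention, which reduces repetition.
\mathrm{covloss}_t = \sum_{i} \min\bigl(a_i^{t},\, c_i^{t}\bigr)
```

Under this reading, a value of p_gen close to 1 favors generating a new token, while a value close to 0 favors copying a token from the observed protocol message.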

Additionally, a general-purpose intelligent industrial control protocol fuzzing framework called PGNFuzz is proposed based on this method. Experimental results demonstrate that PGNFuzz outperforms GAN-based and LSTM-based seq2seq model fuzzers in industrial control protocol fuzzing scenarios.

Stateful protocols can pose testing difficulties for fuzzing, and in the case of unknown industrial protocols, SeqFuzzer proposes a fuzzing method based on the seq2seq structure [41]. Real traffic in industrial networks is captured, pre-processed, and passed into the seq2seq model as a dataset. The LSTM model is used as both encoder and decoder of seq2seq to automatically learn temporal features of stateful protocols. By learning the syntax of the real protocol sequence, spurious protocol messages similar to real messages are generated as test cases for fuzzing. The experiments verify that SeqFuzzer can generate test cases conforming to the EtherCAT protocol format with unknown protocol structure and detect various vulnerabilities. However, there are many protocols in industrial networks, and building a generalized protocol fuzzing method for industrial networks must cope with a wide variety of private protocols. Future research targeting generalized fuzzing frameworks will also provide significant value to industrial safety.

In fuzzing for web application vulnerabilities, a test case generation method for Web applications based on an improved LeakGAN algorithm has been proposed by Liu [42]. In the optimized LeakGAN algorithm model, the generator consists of two LSTM models acting as the manager module and the worker module, respectively. Additionally, batch normalization is introduced to process the input test cases, preventing extreme data distribution. The discriminator applies an attention mechanism to guide the generator in generating test cases. This method enables the generation of a large number of valid test cases that adhere to the syntactic structure.

Despite their widespread use, compilers and interpreters are still prone to defects that can cause abnormal behavior in programs. DeepFuzz [43] has proposed a test input generation method based on the seq2seq model to implement fuzzing of compiler test suites. The seq2seq model uses LSTM as the encoder and decoder and is trained to learn the language patterns of C programs. More syntactically correct C programs are generated as test inputs by employing insertion, replacement, and deletion strategies to fuzz the compiler. To address the problems of insufficient syntactic correctness and low generation efficiency in existing methods for generating test cases, a feedforward neural network-based compiler fuzzing case generation method is proposed in FAIR [44]. FAIR captures the widespread long-distance syntactic dependencies existing in the source code. Subtrees are extracted from the abstract syntax tree to form a sequence of code snippets. A self-attention-based feedforward neural network is used to capture the syntactic correlations between code snippets. By learning a series of context-aware feature representations in the input sequence, it predicts subsequent code sequences.

For JavaScript engines, Montage utilizes LSTM to learn syntactic and semantic relationships between segments in a regression test set to guide the reconstruction of a given regression JavaScript test case and generate more effective test cases for use in JS engine fuzzing [45]. Similarly, COMFORT [46] has designed a test input generation model based on the GPT-2 model, which can generate more syntactically correct JS programs using the specification rules defined in the ECMAScript standard.

The application of deep learning models in the test case generation phase is widely studied. Seq2seq is a typical encoder-decoder model that can be used to generate more


TABLE 3. Generation-based test case generation.
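As a concrete illustration of the generation-based methods summarized in Table 3, the following is a minimal Python sketch of the sampling idea behind SampleFuzz in Learn&fuzz [36]: sample the next character from a learned model, but when the model is highly confident, occasionally substitute the least likely character instead. Everything named here is an assumption for illustration: `toy_model` is a hypothetical stand-in for a trained character model, `TEMPLATE` and `ALPHABET` are invented, and the thresholds `t_fuzz` and `p_t` are arbitrary. Unlike the published algorithm, this sketch simply stops at `max_len` rather than restarting.

```python
import random

# Hypothetical alphabet and "learned" object, used only by the toy model below.
ALPHABET = "abcdefghijklmnopqrstuvwxyz<>/ TP"
TEMPLATE = "obj << /Type /Page >> endobj"


def toy_model(prefix):
    """Stand-in for a trained character model (assumption, not a real RNN).

    Returns a dict mapping each candidate next character to its probability,
    strongly favoring the character at the same position in TEMPLATE.
    """
    expected = TEMPLATE[min(len(prefix), len(TEMPLATE) - 1)]
    others = [c for c in ALPHABET if c != expected]
    dist = {c: 0.1 / len(others) for c in others}
    dist[expected] = 0.9
    return dist


def sample_fuzz(model, t_fuzz=0.9, p_t=0.8, max_len=64, rng=None):
    """Generate one object, injecting anomalies at high-confidence positions."""
    rng = rng or random.Random()
    seq = "obj "
    while not seq.endswith("endobj") and len(seq) < max_len:
        dist = model(seq)
        chars = list(dist)
        # Sample the next character according to the model's distribution.
        c = rng.choices(chars, weights=[dist[ch] for ch in chars])[0]
        # Fuzz only when a coin flip exceeds t_fuzz AND the model was confident.
        if rng.random() > t_fuzz and dist[c] > p_t:
            c = min(dist, key=dist.get)  # least likely character
        seq += c
    return seq


if __name__ == "__main__":
    print(sample_fuzz(toy_model, rng=random.Random(0)))
```

The design intent is that most of the object stays well-formed, so it passes the parser, while a few confidently predicted positions are corrupted to exercise error-handling code.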

and higher quality test cases by selecting LSTM, BLSTM, RNN, and their variants as encoders and decoders based on data characteristics. The Transformer model is an attention-based encoder-decoder model that dynamically weights all inputs through attention mechanisms, effectively preventing information loss and highlighting key positions in training samples. Attention mechanisms can therefore be applied to various deep learning models. Although the basic RNN model can handle sequential data, it struggles to capture long-range dependencies and suffers from vanishing gradients. In contrast, the LSTM model can effectively address these issues by introducing cell states on top of the RNN and using gate structures to update and delete cell states. The LSTM model uses the output information from the previous step as input for the current step, thus achieving test case generation. Many studies currently focus on designing and implementing test case generation models based on the LSTM model, which have shown good performance in handling PDF files, various protocol information and compilers. BLSTM, considering both forward and backward factors on top of LSTM, extracts and retains more information from the training samples, thereby improving the prediction accuracy of the model. Deep learning models possess powerful learning and data processing capabilities, giving them an advantage in the generation of fuzz testing test cases. They can generate more high-quality test cases that conform to format specifications based on specification information such as syntactic format.

• Mutation-Based
Mutation-based fuzzing focuses on generating new test cases by modifying certain fields of valid inputs. The main optimization directions are selecting appropriate mutation positions and improving the fitness function. Typically, a fuzzer can guide the mutation process based on the evaluation results of test input performance to effectively generate inputs [47].

Rajpal et al. [48] proposed a technique that uses information from training data for mutation coverage to predict a heatmap of complete input files. This heatmap corresponds to the mutation probabilities for each file location that leads to new code coverage. It guides the generation of effective test cases, reducing time wasted on invalid test cases and improving the overall efficiency of fuzzing. DeltaFuzz [49] is a fuzzing technique based on historical version information. By analyzing the differences between historical versions


and the target program, it locates the points of change. Then, it identifies the affected basic code blocks based on impact analysis. Finally, it calculates the fitness value of test cases based on execution traces and iteratively generates new test cases using a genetic algorithm to improve test case quality. Experimental results have shown that DeltaFuzz reaches the target faster compared to existing fuzz testing tools.

DYNFuzz [50] proposes a neural network-based directed grey-box fuzzing method. It uses an LSTM neural network to learn mutation patterns at different positions in previous input files and predicts the mutation gains at different positions in the current input file. This optimization helps in guiding future fuzz search. The entire DGF process consists of two stages: exploration and exploitation. The seed inputs in fuzzing are divided into two groups: coverage seeds for path exploration and directed seeds for exploitation. In the exploration stage, the fuzzer queries a trained neural network model before mutating the seed. The model returns a coverage heatmap for the corresponding complete input file, indicating probabilities of mutation-induced new code coverage for each position in the file. The mutation positions are sorted based on the probabilities from the coverage heatmap, giving priority to positions with a higher likelihood of mutation gains. In the exploitation stage, the distance between basic blocks and target points is calculated using the LLVM method, and each directed fuzzing seed is assigned a distance value for priority sorting and seed mutation optimization.

In protocol-oriented fuzzing, Xiang and Ma [51] aimed to avoid generating a large amount of redundant data. They utilized the returned error codes to indicate the code coverage of test cases and optimized the calculation method of individual fitness function based on two aspects: the similarity between individuals and the seed queue, as well as the error codes of seeds. This approach adjusted the evolution direction of genetic algorithms in a timely manner based on the fuzzing results, effectively improving the efficiency of fuzzing for the Modbus TCP protocol.

In the field of web application vulnerability detection, Qu et al. proposed a test case optimization method based on genetic algorithm to improve the effectiveness of fuzzing [52]. They analyzed different types of attack elements through traffic analysis systems and created a weighted web attack feature database, which was then passed to the genetic algorithm. The construction of the fitness function is based on the analysis of the response information of the web sites to calculate the actual fitness function value. By repeatedly iterating through the selection, crossover and mutation operations, the best test case is generated. Experimental results have shown that this method performs well in web vulnerability mining.

In addition to improving the fitness function and designing optimized mutation strategies during the test case mutation generation phase, some research has also focused on the problem of refining the initial sample set. Wang et al. utilized a heuristic genetic algorithm to optimize chromosome selection methods by eliminating redundancy caused by duplicate genes and selecting chromosomes that contain more genes and richer gene combinations [53]. This optimization allows for improved search conditions and enhanced efficiency of fuzzing without needing to change the working process of the genetic algorithm.

In summary, within the domain of mutation-based test case generation methods, most studies rely on genetic algorithms to generate diverse test cases. By improving the fitness function and optimizing search conditions, the quality of test cases is enhanced, effectively exploring the input space to identify potential vulnerabilities and errors. It should be noted, however, that the search process of genetic algorithms may require numerous iterations and computational resources, leading to potentially reduced efficiency when dealing with large-scale complex systems. Moreover, for target programs with complex software structures and varying variable types, further research is needed to address the handling of non-numeric variables and the construction of appropriate fitness functions. On another front, machine learning algorithms possess exceptional predictive capabilities that can address the issue of selecting suitable mutation positions. Through model training, these algorithms forecast mutation gains for different mutation positions, guiding the fuzzing tool to prioritize mutation at positions with higher gains. This enables faster traversal of the target program and minimizes the time required to trigger exceptions.

• Combination of Generation and Mutation Based
The combined approach of generation and mutation in fuzzing aims to leverage the strengths of both techniques, generating higher-quality and effective test cases.

When applied to network protocols, Wang et al. [54] proposed an adaptive fuzzing method based on transformers. Utilizing transformers, they learned semantic information of the Modbus TCP protocol and generated test cases. By comparing the semantic similarity between protocols, they guided the generated test cases to undergo byte-level mutations, reducing the similarity among test cases and enhancing the probability of triggering exceptions. This method combines both generation and mutation approaches, dynamically adjusting the mutation frequency of bytes that are prone to triggering vulnerabilities. It not only ensures compliance with the protocol’s syntax format but also improves the ability to generate test cases that effectively trigger exceptions. RapidFuzz proposes a combination of an improved WGAN model based on the gradient penalty and a mutated gene detection algorithm for test case generation [55]. The WGAN-GP model is utilized to learn the numerical distribution of seed samples and generate numerous new samples. RapidFuzz proposes a mutated gene detection algorithm that sorts training set samples obtained from AFL based on the frequency


TABLE 4. Mutation-based test case generation.
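Most of the mutation-based methods in Table 4 share the same genetic-algorithm skeleton: score test cases with a fitness function, keep the fittest, and produce new ones by crossover and mutation. The sketch below shows that loop in Python under stated assumptions. `toy_target` is a hypothetical program whose return value stands in for coverage feedback, and the population size, selection ratio and single-byte mutation are arbitrary illustrative choices, not the settings of any surveyed tool.

```python
import random


def toy_target(data: bytes) -> int:
    """Hypothetical target: counts how many leading bytes match b"FUZZ".

    The return value is a stand-in for coverage-style fitness feedback.
    """
    depth = 0
    for i, expected in enumerate(b"FUZZ"):
        if len(data) > i and data[i] == expected:
            depth += 1
        else:
            break
    return depth


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def crossover(a: bytes, b: bytes, rng: random.Random) -> bytes:
    """Single-point crossover; assumes equal-length inputs of at least 2 bytes."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(seeds, fitness, generations=200, pop_size=30, rng=None):
    """Run a simple elitist genetic algorithm and return the fittest input."""
    rng = rng or random.Random()
    population = list(seeds)
    for _ in range(generations):
        # Selection: keep the fittest individuals unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: max(2, pop_size // 3)]
        # Crossover and mutation refill the rest of the population.
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve([b"AAAA", b"BBBB"], toy_target, rng=random.Random(1))
    print(best, toy_target(best))
```

Because the parents are carried over unchanged, the best fitness never decreases between generations; the surveyed work differs mainly in how the fitness function is defined (execution traces, error codes, response analysis) rather than in this outer loop.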

TABLE 5. Test case generation based on the combination of mutation and generation.

of sensitive mutation sites. To combine samples generated by the GAN model with high-frequency mutation points with those generated by the original AFL, a semi-random method is employed. Through the rational combination of generation-based fuzzing with mutation-based fuzzing, RapidFuzz achieves significantly faster fuzzing speeds while obtaining higher coverage.

Generation-based test case generation techniques often require known specifications and relevant information. However, in most cases, fuzzing is a black-box testing technique where limited known information is available for training and optimizing the test case generation model. Additionally, existing mutation-based test case generation techniques exhibit strong randomness and often do not differentiate between seed files that have different vulnerabilities, resulting in wasted time generating ineffective test cases. As a result, research on the combined approach of generation and mutation in test case generation has gained significant attention, aiming to leverage the advantages of both approaches and improve the efficiency and vulnerability detection capability of fuzzing.

C. INPUT SELECTION
In the real world, due to the presence of a large number of invalid test cases in the generated input and various constraints protecting the target program, the efficiency of fuzzing processing is greatly affected. To improve the efficiency of fuzzing, machine learning techniques can be utilized for input selection by classifying a large number of test cases before testing, thus prioritizing test cases that are expected to trigger new paths or specific types of vulnerabilities, as determined by testers, and filtering out the rest.

Input selection involves the direct selection and elimination of test cases. Hu and Pan proposed a Quasi-Recurrent Neural Network (QRNN)-based fuzzing case filtering method for network protocols that combines the processing and prediction capabilities of the QRNN model for sequential data [56]. This method effectively learns the structural features of network protocols to automatically filter invalid test cases, thus improving the efficiency of network protocol fuzzing. Karamcheti et al. proposed a gray-box fuzzing method based on machine learning that directly models program behavior [57]. The learned forward prediction model maps program inputs to execution traces, and the entropy of the distribution of execution traces is used to assess the model’s uncertainty about the input. A higher entropy indicates higher uncertainty, suggesting that the input may cover new code areas during execution. This method filters out test inputs whose behavior the model predicts with low uncertainty, significantly reducing unnecessary executions and improving the efficiency of fuzzing. Zong et al. developed a directed gray-box fuzzer called FuzzGuard [58], which predicts the reachability of test inputs without executing the target program. By learning from previous execution inputs, it predicts whether a program can execute the target error code with


newly generated inputs. If the prediction result is unreachable, the input will not be executed. This method, built upon the mature directed gray-box fuzzing tool AFLGo, improves overall efficiency by filtering out unreachable inputs, thereby saving actual execution time.

While research on input selection is limited, machine learning techniques have demonstrated their ability to enhance the quality of test inputs, reduce unnecessary resource waste, and outperform traditional fuzzing tools in terms of efficiency.

D. ANALYSIS OF RESULTS
The result analysis phase, which follows the completion of fuzzing, focuses on analyzing and processing the output information. In cases of abnormal output states, manual identification and analysis are typically required to determine the cause of the anomaly, a process heavily reliant on domain knowledge and the ability to perform vulnerability analysis and reproduction.

To improve the automation of fuzzing and reduce the influence of subjective experience on analysis results, machine learning techniques can be used for output classification, facilitating the identification of abnormalities and their underlying causes. Harsh et al. utilized four methods (supervised, unsupervised, unsupervised + supervised and semi-supervised) with various techniques such as decision trees, support vector machines, K-Means clustering and Naive Bayes, to experimentally address the root cause analysis problem [59]. Given the lack of labeled data, Harsh et al. proposed a semi-supervised method that is best suited for most real-world scenarios and evaluated the feasibility of the method on Eclipse.

However, the application of machine learning techniques to the post-fuzzing result analysis phase requires further research due to the limited availability of labeled data sets suitable for training and the predictive nature of machine learning results, making it challenging to analyze and interpret the output of fuzzing.

IV. PERFORMANCE GAINS EVALUATION
Different machine learning methods have their own characteristics and can be applied to a variety of target scenarios. So far, machine learning techniques have been applied in the four main stages of fuzzing. In the preprocessing stage, deep learning and reinforcement learning techniques can be used to guide fuzzing in better utilizing program information. In the test case generation stage, evolutionary algorithms can be applied to guide seed mutation, and machine learning techniques can learn effective structural features, optimize seed selection, and generate a large number of test cases in unknown format specifications. Reinforcement learning can also be applied to select the approximate optimal mutation strategy to improve the efficiency of test cases. In the input selection stage, machine learning techniques can be applied to predict the effectiveness and accessibility of test inputs. In the result analysis stage, machine learning techniques can be utilized to automatically classify and identify output results, reducing manual cost. When applying machine learning to improve fuzzing, it is necessary to balance multiple factors and consider practical needs.

This section provides a further summary of existing research, analyzing the gain effects of different machine learning algorithms on fuzzing, mainly including coverage, vulnerability detection capability, efficiency and test case effectiveness.

A. COVERAGE
As one of the important indicators for evaluating fuzzing performance, coverage reflects the possibility of triggering crashes. Existing research on coverage indicators mainly includes statement coverage, branch coverage, number of triggered paths, edge coverage and basic block coverage. MPFuzz [19] improves coverage by a factor of 4 over conventional fuzzers, while SmartFuSE [20] improved statement coverage by 2.8%-3.4%, branch coverage by 20.7%-26.9%, and increased the number of paths by 0.9-13.5 times. In LAVA-M, a total of 929 program defects were discovered. QSYM [21] explored over 20% of the code paths in libpng, increasing the code coverage by about 3%. NeuFuzz [25] can achieve more than 1000 new edge coverages in an hour, which is about 4 times better than AFL. It detected all five types of errors in six real programs and found at least 64 more bugs than the other compared fuzzers in LAVA-M. DARWIN [30] averaged a 6.77% improvement in edge coverage in MOPT and a 1.73% improvement in edge coverage in AFL. The seeds generated by literature [38] covered up to 9914 paths, which is much higher than traditional methods. It can be seen that machine learning algorithms applied in fuzzing can effectively improve coverage and greatly enhance the ability to trigger crashes. Figure 7 shows the coverage ranges of different literature. The figure mainly displays the coverage interval implemented by different methods.

B. VULNERABILITY DETECTION CAPABILITY
The ability to detect vulnerabilities is the most intuitive reflection of a fuzzer’s performance. It mainly includes the number of triggered crashes, errors and CVEs. Existing literature mostly evaluates fuzzer performance by testing on public datasets and real-world applications. For example, QSYM [21] detected 13 unknown errors in eight real programs. Literature [26] discovered 10 CVEs in real programs; Literature [27], Literature [29], and Literature [42] conducted experimental verification in a web target environment and detected 4, 3 and 21 vulnerabilities respectively. AMSFuzz [31] discovered an average of 226.2 bugs in LAVA-M and detected 17 previously unknown bugs in real programs, 15 of which were assigned CVE IDs. Compared with the baseline, AMSFuzz triggered the most bugs in the same amount of time. SEAMFUZZ [32] generated 56.4%-57.1% more crash inputs, triggering a total of 606 crashes, and discovered 99 unique bugs, including 27 bugs that other baselines did not detect.


FIGURE 7. Comparison of coverage across different literature.

FIGURE 8. Comparison of the number of vulnerabilities detected by different methods.
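Since several of the compared results are reported as edge coverage relative to AFL, it may help to recall how an edge counter of that kind works. The Python sketch below follows the scheme described in AFL's technical documentation, where each executed edge is hashed as the XOR of the current block ID and the shifted previous block ID. The block IDs in the usage example are arbitrary illustrative values, and saturating the counter at 255 is a simplification made here rather than AFL's exact behavior.

```python
MAP_SIZE = 1 << 16  # size of the shared coverage bitmap


def edge_coverage(trace, bitmap=None):
    """Record the edges of one execution trace into a coverage bitmap.

    trace:  iterable of integer basic-block IDs in execution order.
    bitmap: bytearray of hit counts shared across executions (created if None).
    Returns (number of edges not seen before, updated bitmap).
    """
    if bitmap is None:
        bitmap = bytearray(MAP_SIZE)
    prev = 0
    new_edges = 0
    for cur in trace:
        idx = (cur ^ prev) % MAP_SIZE
        if bitmap[idx] == 0:
            new_edges += 1
        bitmap[idx] = min(255, bitmap[idx] + 1)  # saturate instead of wrapping
        # Shifting keeps A->B distinct from B->A and A->A distinct from B->B.
        prev = cur >> 1
    return new_edges, bitmap


if __name__ == "__main__":
    first, bm = edge_coverage([0x1234, 0x5678, 0x9ABC])
    again, bm = edge_coverage([0x1234, 0x5678, 0x9ABC], bm)
    print(first, again)
```

A fuzzer would typically keep an input as a new seed only when `new_edges` is greater than zero, which is why edge counts are a natural basis for the comparisons in this subsection.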

Figure 8 shows a comparison of the number of vulnerabilities detected by different methods. Due to the different target datasets used in different literature, the number of bug detections also varies. Therefore, the bug quantity in this figure is selected as the maximum number of vulnerabilities detected in the literature.

C. EFFICIENCY
Efficiency mainly refers to the total number of covered paths or the number of test cases that trigger program crashes in the same amount of time. The larger the number, the higher the efficiency. The average discovery time of vulnerabilities in [18] is 1.5 times higher than AFL. In an average runtime of 30 minutes, QSYM [21] generated hundreds of test cases, exceeding the number of test cases generated by other fuzzers by 10 times. DARWIN [30] is 48.26% faster than MOPT while, in the MAGMA benchmark test, after 5 hours of fuzzing, DARWIN was able to find 15 bugs (out of a total of 21). GANFuzz [39] discovered 5 bugs per 10,000 test cases. COMFORT [46] detected 158 unique vulnerabilities by automatically running on 250k self-generated test cases for 200 hours, of which 129 have been verified and 115 have been fixed by developers. In [51], only 2500 test cases successfully triggered two denial-of-service vulnerabilities. In [56], the total number of paths covered for BIND 9 within the same test time was 2360, an increase of approximately 85.1% in the overall path coverage.

D. TEST CASE EFFECTIVENESS
Due to various filtering and protection mechanisms in the target program, it is necessary to measure the effectiveness of test cases. Test case effectiveness primarily refers to the rate at which test cases are accepted as input by the target program, representing their bypass capability. Different literature sets different specific indicators for this purpose. For example, in GANFuzz [39], the test rejection rate represents the percentage of test cases that are rejected. A lower rejection rate indicates better test case quality, and experimental results show that GANFuzz achieves rejection rates as low as 43%. PGNFuzz [40] introduces the test case identification rate, which primarily measures the percentage of test cases recognized by the test target. Experimental results demonstrate that PGNFuzz achieves an average test case identification rate improvement of 8%-13%. Similarly, SeqFuzzer [41] achieves a pass rate of 90.86%-99.99%. In [51], the acceptance rate has been improved by approximately 45%.

V. PROBLEMS AND PROSPECTS
Research on machine learning-based fuzzing techniques is currently a hot topic. Despite the numerous research results available, the complex and diverse architecture, syntax and input of target programs have resulted in a broad range of vulnerabilities with various causes and types. As such, efficiently and comprehensively detecting vulnerabilities using fuzzing remains a challenge, requiring continued efforts to address obstacles and limitations in this area.

A. MACHINE LEARNING MODELS CAN SLOW DOWN FUZZ TESTING
Fuzzing combined with machine learning techniques has emerged as a cybersecurity research hotspot. However, current efforts focus primarily on improving fuzzing coverage and achieving more accurate vulnerability detection by leveraging the image recognition and feature extraction capabilities of machine learning models to guide the generation of high-quality fuzzing cases in various domains. Nonetheless, given the time-consuming and computationally expensive nature of machine learning model training, fuzzing execution speed may be slower than traditional fuzzing methods, resulting in reduced overall efficiency.

To address this issue, some scholars have begun exploring strategies to expedite the generation of highly structured test cases. SmartSeed proposes a generic and efficient approach for test case generation that employs a WGAN model to learn valuable document features, which are then used to generate additional high-value test cases [60]. RapidFuzz introduces an improved Wasserstein-Generative Adversarial


Network implementation utilizing a gradient penalty-based method to stabilize the GAN model training process and optimize probability distribution learning on the original dataset, addressing unstable model training and unexpected behavior issues caused by weight clipping introduced in WGAN [55].

As testing tasks become increasingly complex, parallel computing is also being leveraged to enhance fuzzing efficiency and effectiveness. Honggfuzz automatically supports multi-process and multi-thread execution of fuzzing techniques [61], ClusterFuzz deploys parallel fuzzing across multiple machines and cores [62], and literature [63] presents a parallelized task allocation method based on execution paths to reduce duplicate tests among fuzzing instances, fully utilize distributed computing resources, and improve parallelized fuzzing efficiency.

Execution speed is a critical metric for fuzzing. While improving machine learning models can reduce fuzzing time and increase efficiency, it is crucial to balance overall fuzzing efficiency with vulnerability detection capabilities through continuous research in this area.

B. EXPANDED APPLICATION AREAS
As research on machine learning techniques gains momentum, an increasing number of scholars are focusing solely on applying machine learning models in fuzzing. However, it is important to note that machine learning models are susceptible to adversarial examples, which may contain vulnerabilities that can cause serious security issues. Yi Qin proposed a hard-labeled black-box attack method based on fuzzing for machine learning models and developed two fuzzers, AdvFuzzer and LocalFuzzer, capable of generating numerous successful adversarial examples [64].

In recent years, there has been a gradual increase in research proposing fuzzing methods for machine learning frameworks. For example, FAME is a DL framework fuzzing system with an API mutation generation model and proposes

suitable mutation strategy for different seeds [68]. Experimental validation on eight deep learning models shows that the method can generate more adversarial inputs and explore more internal states of the deep learning model with less time overhead.

Although existing fuzzing tools have been applied in various domains, such as file formats, network and industrial protocols, binary programs and security vulnerabilities of IoT devices, the application of machine learning models brings advantages such as improved detection accuracy to fuzzing. However, the models themselves may have security issues such as resource leakage, crashes, computational errors and anomalous behavior. An increasing number of scholars have worked on extending fuzzing techniques to machine learning frameworks and conducted experiments to verify their feasibility [65], [66], [67]. Hence, it is crucial to explore the nature of fuzzing technology, extend it to more application objects, and develop fuzzing techniques oriented toward multi-domain vulnerability detection.

C. DATASET STANDARDIZATION
Currently, there is a lack of standardized datasets for benchmarking in the field of fuzzing because different target programs have different characteristics and requirements. Researchers typically collect data through web crawlers, generate test cases using fuzzing, or utilize some publicly available datasets [69]. For example, in fuzzing for file formats and protocols, the commonly used dataset is LAVA-M [70], which was created by NIST and contains various types of files and protocols such as JPEG, MP3, PDF, HTTP, etc. The developers selected four programs, uniq, who, md5sum and base64, to create a corpus and injected some validated errors into each program. For fuzzing for web application vulnerabilities, spider technology is mainly used to collect and organize test input data, and the test data set is constructed manually. For fuzzing for binary programs, publicly available datasets include AFLSmart [71]
and Fuzzingbook [72], which contain over 1,000 binary pro-
the optimization of layer and weight mutations capable of
grams and their corresponding seed files that can be directly
detecting NaN errors and crash errors in deep learning frame-
obtained from the official website. The quality of the dataset
works [65]. Park et al. proposed a mixed constraint mutation
directly affects the training effect of the machine learning
(MCM) strategy for fuzzing deep learning systems, gener-
model and the performance of the vulnerability detection
ating diverse variant results while preserving the original
model. Therefore, it is meaningful to establish a standard-
input semantics by combining various image transforma-
ized dataset for programs and vulnerability types in various
tion algorithms [66]. Muffin proposed a new model fuzzing
fields.
approach to explore target libraries by developing metrics for
measuring inconsistencies between different deep learning
libraries and testing various models for differences, allowing D. EXPAND THE TYPES OF VULNERABILITY DETECTION
the generation of different deep learning models [67]. Deep- Three problems exist with current fuzzing techniques for
Controller uses feedback obtained during test execution to detecting vulnerability types. Firstly, a large number of cur-
dynamically select seed and mutation strategies, proposing rent fuzzers typically rely on program crashes as an indication
adaptive seed selection strategy-AS2, which uses feedback of detected anomalies, but not all vulnerabilities result in
information from test execution to select seeds with high program crashes, such as memory corruption, Trojans and
fault detection potential, and an adaptive mutation strategy viruses. Secondly, vulnerabilities have increasingly been trig-
selection method-AMS2, which analyzes the performance of gered by multiple input points at different levels, and testing
mutation strategies on different seeds and selects the most a single input point may not effectively monitor program


anomalies, making it less effective in detecting vulnerabilities of the multi-point trigger type [73]. Thirdly, for some newly emerged vulnerability types, it can be challenging to build machine learning models due to the lack of relevant reference materials. Therefore, how to detect more types of vulnerabilities will become one of the key directions for future research.

VI. CONCLUSION
The present study investigates the application of machine learning in the field of fuzzing, based on an extensive review of the relevant literature. Machine learning methods are most commonly applied in the test case generation phase, where genetic algorithms effectively generate diverse test cases to improve the coverage and effectiveness of fuzzing. Deep learning methods leverage their powerful pattern recognition and feature extraction capabilities to generate more targeted and high-quality test cases, further uncovering potential vulnerabilities and abnormal behaviors in the system. Reinforcement learning employs reward mechanisms to guide the generation of test cases that are conducive to exploring and discovering abnormal behaviors, thereby enhancing the efficiency and quality of test case generation. Moreover, machine learning has been the subject of many experiments and improvements in the preprocessing, input selection, and result analysis and evaluation stages of fuzzing, effectively improving the efficiency and vulnerability detection capabilities of fuzzing. As an important approach in the field of fuzzing, machine learning provides new ideas and technical means for improving fuzzing techniques. Future research should further explore how to integrate different machine learning methods, harnessing their strengths to address the challenges faced in fuzzing, thus promoting the development and application of fuzzing technology.

Fuzz testing will continue to play a crucial role in future project vulnerability assessments. With the advancement of research in this field, we hope to witness the continuous application of machine learning to address bottlenecks in the fuzzing process. This article provides a detailed introduction to the development of machine learning-based fuzzing techniques and related research, aiming to serve as a valuable reference for researchers in this field.

ACKNOWLEDGMENT
The authors thank the anonymous referees for their helpful comments and suggestions on the initial version of this article. They declare that they have no conflict of interest.

REFERENCES
[1] W. P. Wen, “Automated vulnerability mining and attack detection,” J. Inf. Secur. Res., vol. 8, no. 7, pp. 630–631, Jul. 2022.
[2] B. P. Miller, L. Fredriksen, and B. So, “An empirical study of the reliability of UNIX utilities,” Commun. ACM, vol. 33, no. 12, pp. 32–44, Dec. 1990.
[3] R. Kaksonen, M. Laakso, and A. Takanen, Communications and Multimedia Security Issues of the New Century, vol. 64. Boston, MA, USA: Springer, 2001, pp. 173–183. [Online]. Available: https://link.springer.com/book/10.1007/978-0-387-35413-2
[4] P. Godefroid, M. Y. Levin, and D. Molnar, “SAGE: Whitebox fuzzing for security testing: SAGE has had a remarkable impact at Microsoft,” Queue, vol. 10, no. 1, pp. 20–27, Jan. 2012, doi: 10.1145/2090147.2094081.
[5] J. Li, S. Li, G. Sun, T. Chen, and H. Yu, “SNPSFuzzer: A fast greybox fuzzer for stateful network protocols using snapshots,” IEEE Trans. Inf. Forensics Security, vol. 17, pp. 2673–2687, 2022, doi: 10.1109/TIFS.2022.3192991.
[6] L. Liu, X. He, L. Liu, L. Qing, Y. Fang, and J. Liu, “Capturing the symptoms of malicious code in electronic documents by file’s entropy signal combined with machine learning,” Appl. Soft Comput., vol. 82, Sep. 2019, Art. no. 105598, doi: 10.1016/j.asoc.2019.105598.
[7] A. Javaid, Q. Niyaz, W. Q. Sun, and W. Alam, “A deep learning approach for network intrusion detection system,” in Proc. 9th EAI Int. Conf. Bio-Inspired Inf. Commun. Technol., 2016, pp. 21–26.
[8] Z. H. Ren, H. Zheng, J. Y. Zhang, W. J. Wang, T. Feng, and Y. Q. Zhang, “A review of fuzzing techniques,” J. Comput. Res. Develop., vol. 58, no. 5, pp. 944–963, May 2021, doi: 10.7544/issn1000-1239.2021.20201018.
[9] M. Zalewski. (2017). American Fuzzy Lop. Accessed: Jul. 4, 2023. [Online]. Available: https://lcamtuf.coredump.cx/afl/
[10] X. Zhou and Y. Hu, “Fuzzing method based on path risk fitness,” Commun. Technol., vol. 55, no. 4, pp. 500–505, Apr. 2022, doi: 10.3969/j.issn.1002-0802.2022.04.014.
[11] M. Cho, S. Kim, and T. Kwon, “Intriguer: Field-level constraint solving for hybrid fuzzing,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., New York, NY, USA, Nov. 2019, pp. 515–530.
[12] S. K. Cha, M. Woo, and D. Brumley, “Program-adaptive mutational fuzzing,” in Proc. IEEE Symp. Secur. Privacy, San Jose, CA, USA, May 2015, pp. 725–741.
[13] P. Chen and H. Chen, “Angora: Efficient fuzzing by principled search,” in Proc. IEEE Symp. Secur. Privacy (SP), May 2018, pp. 711–725.
[14] H. R. Fang, F. Guo, and H. Y. Li, “TaintPoint: Fuzzing taint flow efficiently with live trace,” J. Softw., vol. 33, no. 6, pp. 1978–1995, Jan. 2022, doi: 10.13328/j.cnki.jos.006564.
[15] T. T. Gu, S. B. Lu, X. Li, X. H. Kuang, and G. Zhao, “Overview of parallel fuzzing,” Comput. Eng. Sci., vol. 44, no. 6, pp. 1046–1055, Jun. 2022, doi: 10.3969/j.issn.1007-130X.2022.06.012.
[16] H. Huang, P. Yao, R. Wu, Q. Shi, and C. Zhang, “PANGOLIN: Incremental hybrid fuzzing with polyhedral path abstraction,” in Proc. IEEE Symp. Secur. Privacy (SP), San Francisco, CA, USA, May 2020, pp. 1613–1627.
[17] L. Y. Liu, F. Li, Y. Y. Zou, J. H. Zhou, A. H. Piao, F. Liu, and W. Huo, “SiCsFuzzer: A sparse-instrumentation-based fuzzing platform for closed source software,” J. Cyber Secur., vol. 7, no. 4, pp. 55–70, Jul. 2022, doi: 10.19363/J.cnki.cn10-1380/tn.2022.07.05.
[18] T. Xiao, Z. H. Jiang, P. Tang, Z. Huang, J. Guo, and D. W. Qiu, “High-performance directional fuzzing scheme based on deep reinforcement learning,” Chin. J. Netw. Inf. Secur., vol. 9, no. 2, pp. 132–142, Apr. 2023.
[19] D. Luo, T. Li, L. Chen, H. Zou, and M. Shi, “Grammar-based fuzz testing for microprocessor RTL design,” Integration, vol. 86, pp. 64–73, Sep. 2022, doi: 10.1016/j.vlsi.2022.05.001.
[20] F. J. Gao, Y. Wang, L. Y. Situ, and L. Z. Wang, “Deep learning-based hybrid fuzz testing,” J. Softw., vol. 32, no. 4, pp. 988–1005, Apr. 2021.
[21] X. Wang, J. Sun, Z. Chen, P. Zhang, J. Wang, and Y. Lin, “Towards optimal concolic testing,” in Proc. IEEE/ACM 40th Int. Conf. Softw. Eng. (ICSE), Gothenburg, Sweden, May 2018, pp. 291–302.
[22] I. Yun, S. Lee, M. Xu, Y. Jang, and T. Kim, “QSYM: A practical concolic execution engine tailored for hybrid fuzzing,” in Proc. 27th USENIX Secur. Symp., 2018, pp. 745–761.
[23] P.-H. Lin, Z. Hong, Y.-H. Li, and L.-F. Wu, “A priority based path searching method for improving hybrid fuzzing,” Comput. Secur., vol. 105, Jun. 2021, Art. no. 102242, doi: 10.1016/j.cose.2021.102242.
[24] W. Xiao, A. M. Zhou, and P. Jia, “Optimizing seed selection in fuzzing based on deep learning,” Mod. Comput., vol. 28, no. 8, pp. 30–35, Apr. 2022.
[25] Y. Wang, Z. Wu, Q. Wei, and Q. Wang, “NeuFuzz: Efficient fuzzing with deep neural network,” IEEE Access, vol. 7, pp. 36340–36352, 2019, doi: 10.1109/ACCESS.2019.2903291.
[26] Y. Li, S. Ji, C. Lyu, Y. Chen, J. Chen, Q. Gu, C. Wu, and R. Beyah, “V-Fuzz: Vulnerability prediction-assisted evolutionary fuzzing for binary programs,” IEEE Trans. Cybern., vol. 52, no. 5, pp. 3745–3756, May 2022, doi: 10.1109/TCYB.2020.3013675.


[27] G. Y. Zhang, W. L. Shang, B. W. Zhang, C. Y. Chen, and R. Zhang, “Fuzzy test method for industrial control protocol combining genetic algorithm,” Appl. Res. Comput., vol. 38, no. 3, pp. 680–684, 2021, doi: 10.19734/j.issn.1001-3695.2020.03.0048.
[28] X. S. Zhou, “Research and implementation of web vulnerability detecting based on fuzzing test,” M.S. thesis, Dept. Cyberspace Secur., Beijing Univ. Posts Telecommun., Beijing, China, 2020.
[29] H. Zhang, W. Dong, and L. Jiang, “Zokfuzz: Detection of web vulnerabilities via fuzzing,” in Proc. 2nd Int. Conf. Consum. Electron. Comput. Eng. (ICCECE), Guangzhou, China, Jan. 2022, pp. 281–287.
[30] P. Jauernig, D. Jakobovic, S. Picek, E. Stapf, and A.-R. Sadeghi, “DARWIN: Survival of the fittest fuzzing mutators,” in Proc. Netw. Distrib. Syst. Secur. Symp., San Diego, CA, USA, 2023, pp. 24–27.
[31] X. Zhao, H. Qu, J. Xu, S. Li, and G.-G. Wang, “AMSFuzz: An adaptive mutation schedule for fuzzing,” Expert Syst. Appl., vol. 208, Dec. 2022, Art. no. 118162, doi: 10.1016/j.eswa.2022.118162.
[32] M. Lee, S. Cha, and H. Oh, “Learning seed-adaptive mutation strategies for greybox fuzzing,” in Proc. IEEE/ACM 45th Int. Conf. Softw. Eng. (ICSE), May 2023, pp. 384–396.
[33] K. Böttinger, P. Godefroid, and R. Singh, “Deep reinforcement fuzzing,” in Proc. IEEE Secur. Privacy Workshops (SPW), San Francisco, CA, USA, May 2018, pp. 116–122.
[34] C. Chen, “Grey-box fuzzing with deep reinforcement learning and process trace back,” in Proc. 4th Int. Conf. Adv. Electron. Mater., Comput. Softw. Eng. (AEMCSE), Changsha, China, Mar. 2021, pp. 1167–1171.
[35] Z. Zhang, “Research on fuzz testing technology based on DDPG reinforcement learning algorithm,” M.S. thesis, Dept. Cyberspace Secur., Beijing Univ. Posts Telecommun., Beijing, China, 2021.
[36] P. Godefroid, H. Peleg, and R. Singh, “Learn&fuzz: Machine learning for input fuzzing,” in Proc. 32nd IEEE/ACM Int. Conf. Automated Softw. Eng. (ASE), Urbana, IL, USA, Oct. 2017, pp. 50–59.
[37] W. Q. Liu and W. C. Yang, “Research on efficient fuzzing technology based on deep learning,” Highlights Sciencepaper, vol. 14, no. 2, pp. 160–167, Jun. 2021.
[38] M. Wang, D. G. Feng, L. Cheng, and Y. Zhang, “Optimization of fuzzing seed input based on machine learning,” Comput. Syst. Appl., vol. 30, no. 6, pp. 1–8, Jun. 2021.
[39] Z. Hu, J. Shi, Y. Huang, J. Xiong, and X. Bu, “GANFuzz: A GAN-based industrial network protocol fuzzing framework,” in Proc. 15th ACM Int. Conf. Comput. Frontiers, May 2018, pp. 138–145.
[40] T. Y. Wang, S. H. Wu, Z. J. Li, H. G. Xin, X. Li, and Y. L. Chen, “PGNFuzz: Pointer generation network based fuzzing framework for industry control protocols,” Comput. Sci., vol. 49, no. 10, pp. 310–318, Jun. 2022.
[41] H. Zhao, Z. Li, H. Wei, J. Shi, and Y. Huang, “SeqFuzzer: An industrial protocol fuzzing framework from a deep learning perspective,” in Proc. 12th IEEE Conf. Softw. Test., Validation Verification (ICST), Xi’an, China, Apr. 2019, pp. 59–67, doi: 10.1109/ICST.2019.00016.
[42] Y. Y. Liu, “Research on fuzzy testing technology based on deep learning,” M.S. thesis, Dept. Cyberspace Secur., Beijing Univ. Posts Telecommun., Beijing, China, 2021.
[43] X. Liu, X. T. Li, R. Prajapati, and D. H. Wu, “DeepFuzz: Automatic generation of syntax valid C programs for fuzz testing,” in Proc. 33rd AAAI Conf. Artif. Intell. 31st Innov. Appl. Artif. Intell. Conf. 9th AAAI Symp. Educ. Adv. Artif. Intell., 2019, pp. 1044–1051.
[44] H. R. Xu, Y. J. Wang, Z. J. Huang, P. D. Xie, and S. H. Fan, “Compiler fuzzing test case generation with feed-forward neural network,” J. Softw., vol. 33, no. 6, pp. 1996–2011, Jun. 2022.
[45] S. Lee, H. Han, S. K. Cha, and S. Son, “Montage: A neural network language model-guided JavaScript engine fuzzer,” in Proc. 29th USENIX Secur. Symp., Aug. 2020, pp. 2613–2630.
[46] G. Ye, Z. Tang, S. H. Tan, S. Huang, D. Fang, X. Sun, L. Bian, H. Wang, and Z. Wang, “Automated conformance testing for JavaScript engines via deep compiler fuzzing,” in Proc. 42nd ACM SIGPLAN Int. Conf. Program. Lang. Design Implement., New York, NY, USA, Jun. 2021, pp. 435–450.
[47] G. J. Saavedra, K. N. Rodhouse, D. M. Dunlavy, and P. W. Kegelmeyer, “A review of machine learning applications in fuzzing,” 2019, arXiv:1906.11133.
[48] M. Rajpal, W. Blum, and R. Singh, “Not all bytes are equal: Neural byte sieve for fuzzing,” 2017, arXiv:1711.04596.
[49] J.-M. Zhang, Z.-Q. Cui, X. Chen, H.-H. Wu, L.-W. Zheng, and J.-B. Liu, “DeltaFuzz: Historical version information guided fuzz testing,” J. Comput. Sci. Technol., vol. 37, no. 1, pp. 29–49, Feb. 2022.
[50] Z. J. Li, T. Y. Wang, Z. Q. Zhou, Y. Wang, and Y. L. Chen, “Directed grey-box fuzzing technology based on LSTM and dynamic strategy,” Comput. Eng. Appl., vol. 58, no. 18, pp. 147–153, 2022.
[51] L. Xiang and R. F. Ma, “Research on fuzzy testing technology of Modbus TCP protocol based on genetic algorithm,” Ship Electron. Eng., vol. 40, no. 10, pp. 149–153, Oct. 2020.
[52] S. Qu, Z. Zhang, B. Ma, and Y. Shao, “Optimization method of web fuzzy test cases based on genetic algorithm,” J. Phys., Conf. Ser., vol. 2078, no. 1, Nov. 2021, Art. no. 012015.
[53] Z. H. Wang, H. F. Wang, and M. M. Cheng, “Fuzzing testing sample set optimization scheme based on heuristic genetic algorithm,” J. Beijing Univ. Aeronaut. Astronaut., vol. 48, no. 2, pp. 217–224, 2022, doi: 10.13700/j.bh.1001-5965.2020.0422.
[54] W. Wang, Z. Chen, Z. Zheng, and H. Wang, “An adaptive fuzzing method based on transformer and protocol similarity mutation,” Comput. Secur., vol. 129, Jun. 2023, Art. no. 103197.
[55] A. Ye, L. Wang, L. Zhao, J. Ke, W. Wang, and Q. Liu, “RapidFuzz: Accelerating fuzzing via generative adversarial networks,” Neurocomputing, vol. 460, pp. 195–204, Oct. 2021.
[56] Z. H. Hu and Z. L. Pan, “Testcase filtering method based on QRNN for network protocol,” Comput. Sci., vol. 49, no. 5, pp. 318–324, 2022.
[57] S. Karamcheti, G. Mann, and D. Rosenberg, “Improving grey-box fuzzing by modeling program behavior,” 2018, arXiv:1811.08973.
[58] P. Y. Zong, T. Lv, D. W. Wang, Z. Z. Deng, R. G. Liang, and K. Chen, “FuzzGuard: Filtering out unreachable inputs in directed grey-box fuzzing through deep learning,” in Proc. 29th USENIX Secur. Symp., Aug. 2020, pp. 2255–2269.
[59] H. Lal and G. Pahwa, “Root cause analysis of software bugs using machine learning techniques,” in Proc. 7th Int. Conf. Cloud Comput., Data Sci. Eng.-Confluence, Noida, India, Jan. 2017, pp. 105–111.
[60] C. Y. Lv, Y. W. Li, and S. L. Ji, “SmartSeed: Smart seed generation strategy for fuzzing testing,” J. Eng. Heilongjiang Univ., vol. 12, no. 3, pp. 90–108, Sep. 2021.
[61] R. Swiecki. (2016). Honggfuzz. Accessed: Jul. 4, 2023. [Online]. Available: http://code.google.com/p/honggfuzz
[62] (2020). ClusterFuzz. Accessed: Jul. 4, 2023. [Online]. Available: https://google.github.io/clusterfuzz/
[63] R. Tang, “Research on efficient fuzzing technology based on deep learning and parallelization,” M.S. thesis, Dept. Comput. Syst. Org., China Electron. Technol. Group Corp. Electron. Sci. Res. Inst., Beijing, China, 2022.
[64] Y. Qin and C. Yue, “Fuzzing-based hard-label black-box attacks against machine learning models,” Comput. Secur., vol. 117, Jun. 2022, Art. no. 102694.
[65] X. Shen, J. Zhang, X. Wang, H. Yu, and G. Sun, “Deep learning framework fuzzing based on model mutation,” in Proc. IEEE 6th Int. Conf. Data Sci. Cyberspace (DSC), Oct. 2021, pp. 375–380.
[66] L. H. Park, J. Kim, J. Park, and T. Kwon, “Mixed and constrained input mutation for effective fuzzing of deep learning systems,” Inf. Sci., vol. 614, pp. 497–517, Oct. 2022.
[67] J. Gu, X. Luo, Y. Zhou, and X. Wang, “Muffin: Testing deep learning libraries via neural architecture fuzzing,” in Proc. IEEE/ACM 44th Int. Conf. Softw. Eng. (ICSE), May 2022, pp. 1418–1430.
[68] H. Dai, C.-A. Sun, and H. Liu, “DeepController: Feedback-directed fuzzing for deep learning systems,” in Proc. 34th Int. Conf. Softw. Eng. Knowl. Eng., Jul. 2022, pp. 531–536.
[69] X. Zhou and B. Wu, “Web application vulnerability fuzzing based on improved genetic algorithm,” in Proc. IEEE 4th Inf. Technol., Netw., Electron. Autom. Control Conf. (ITNEC), vol. 1, Jun. 2020, pp. 977–981.
[70] B. Dolan-Gavitt, P. Hulin, E. Kirda, T. Leek, A. Mambretti, W. Robertson, F. Ulrich, and R. Whelan, “LAVA: Large-scale automated vulnerability addition,” in Proc. IEEE Symp. Secur. Privacy (SP), May 2016, pp. 110–121.
[71] V.-T. Pham, M. Böhme, A. E. Santosa, A. R. Caciulescu, and A. Roychoudhury, “Smart GreyBox fuzzing,” IEEE Trans. Softw. Eng., vol. 47, no. 9, pp. 1980–1997, Sep. 2021.
[72] A. Zeller, R. Gopinath, M. Böhme, G. Fraser, and C. Holler. The Fuzzing Book. Accessed: Oct. 25, 2023. [Online]. Available: https://www.fuzzingbook.org/
[73] S. Miao, J. Wang, C. Zhang, Z. Lin, J. Gong, X. Zhang, and J. Li, “Deep learning in fuzzing: A literature survey,” in Proc. IEEE 2nd Int. Conf. Electron. Technol., Commun. Inf. (ICETCI), Changchun, China, May 2022, pp. 220–223.


AO ZHANG received the bachelor’s degree in software engineering from Hebei University. She is currently pursuing the master’s degree with the Tianjin University of Science and Technology. Her research interests include fuzzing, vulnerability mining, and network security.

YIYING ZHANG received the B.E. degree from Northeast Normal University, in 1996, the M.Ec. degree from Northeastern University, China, in 2003, and the Ph.D. degree from Korea University, in 2010. He was a Postdoctoral Fellow with State Grid Information and Telecommunication Branch, from 2011 to 2013. He is currently a Professor with the Tianjin University of Science and Technology. His research interests include network security, wireless sensor networks, the Internet of Things, and smart grids.

YAO XU received the bachelor’s degree in software engineering from Jilin University and the master’s degree from the School of Artificial Intelligence, Tianjin University of Science and Technology. His research interests include network security and cloud environment security.

CONG WANG was born in Hunan, China, in 1988. She received the M.S. and Ph.D. degrees in computer science from Tianjin University, China, in 2012 and 2017, respectively. She is currently a Teacher with the Tianjin University of Science and Technology. Her research interests include network security, authentication protocol design, and the Internet of Things.

SIWEI LI received the B.E. degree from Jilin Engineering Normal University in 2011. He is currently pursuing the Ph.D. degree with Tianjin University. He works at State Grid Information and Telecommunication Co., Ltd. He has long been engaged in research on smart grid load management and artificial intelligence for power grids.
