0% found this document useful (0 votes)

13 views

FUZZCODER Byte-level Fuzzing Test via Large Language Model

The document introduces F UZZ C ODER, a framework that utilizes fine-tuned large language models (LLMs) to enhance the fuzzing process for identifying vulnerabilities in software by predicting mutation locations and strategies. It leverages an instruction dataset called Fuzz-Instruct, derived from successful fuzzing attempts, to guide the mutation of input files in a sequence-to-sequence modeling approach. Experimental results demonstrate significant improvements in mutation effectiveness and crash detection across various input formats compared to traditional fuzzing methods.

Uploaded by

cpjesus221

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

13 views

FUZZCODER Byte-level Fuzzing Test via Large Language Model

Uploaded by

cpjesus221

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 11

F UZZ C ODER: Byte-level Fuzzing Test via Large Language Model

Liqun Yang1 , Jian Yang1 ∗, Chaoren Wei1 , Guanglin Niu2 , Ge Zhang3,5 , Yunli Wang1 ,
Linzheng Chai1 , Wanxu Xia1 , Hongcheng Guo1 , Shun Zhang1 , Jiaheng Liu1 , Yuwei Yin1 ,
Junran Peng4 , Jiaxin Ma6 , Liang Sun1 Zhoujun Li1
1
Beihang University; 2 University of British Columbia; 3 University of Waterloo
4
University of Science and Technology Beijing; 5 M-A-P;
6
Beijing University of Posts and Telecommunications
[email protected];

Abstract Small DL/ML models

Fuzzing is an important dynamic program anal-

ysis technique designed for finding vulnerabili-
ties in complex software. Fuzzing involves pre-
arXiv:2409.01944v1 [cs.CL] 3 Sep 2024

(a) Baselines
senting a target program with crafted malicious
input to cause crashes, buffer overflows, mem-
ory errors, and exceptions. Crafting malicious 7f 45 7f 45
inputs in an efficient manner is a difficult open 00 01 1c 01
problem and the best approaches often apply 1d 2c LLM 1d 2c
uniform random mutations to pre-existing valid
(b) FZ-LLM
inputs. In this work, we propose to adopt fine-
tuned large language models (F UZZ C ODER) to
learn patterns in the input files from success- Figure 1: Comparison between the standard byte-level
ful attacks to guide future fuzzing explorations. fuzz test and our proposed method.
Specifically, we develop a framework to lever-
age the code LLMs to guide the mutation pro- become industry standards, while researchers fur-
cess of inputs in fuzzing. The mutation process
ther explore advanced strategies like evolutionary
is formulated as the sequence-to-sequence mod-
eling, where LLM receives a sequence of bytes fuzzing and hybrid approaches to enhance test case
and then outputs the mutated byte sequence. generation and code coverage. As the intricacy of
F UZZ C ODER is fine-tuned on the created in- software systems escalates, fuzzing continues to
struction dataset (Fuzz-Instruct), where the suc- evolve, proving its essential role in the realm of
cessful fuzzing history is collected from the software development and security testing.
heuristic fuzzing tool. F UZZ C ODER can pre- Based on neural network architectures like
dict mutation locations and strategies locations
RNNs and GANs (Goodfellow et al., 2016), this
in input files to trigger abnormal behaviors of
the program. Experimental results show that line of research has shown potential in improv-
F UZZ C ODER based on AFL (American Fuzzy ing test case generation, increasing code coverage,
Lop) gain significant improvements in terms and detecting elusive vulnerabilities. Trained on
of effective proportion of mutation (EPM) and billions of lines of code, large language models
number of crashes (NC) for various input for- (LLMs) have shown exceptional aptitude in vari-
mats including ELF, JPG, MP3, and XML.1 ous software engineering tasks in code generation
(Rozière et al., 2023; Bai et al., 2023; Guo et al.,
1 Introduction
2024a), program repair (Zhang et al., 2023; Guo
Fuzzing test (Guo et al., 2018; Xie et al., 2022; Wei et al., 2023), and fuzzing (Xia et al., 2024; Deng
et al., 2022; Cummins et al., 2018; Manès et al., et al., 2023; Huang et al., 2024; Yang et al., 2024).
2019; Li et al., 2018), a dynamic software test- The rigorous pre-training on vast code datasets
ing technique, has emerged as a powerful method forms the cornerstone of the capabilities of LLM
for uncovering vulnerabilities and defects within in code generation and comprehension, even for
software applications. Fuzzing frameworks like the encoded byte sequence. Byte level byte pair
AFL (American Fuzzy Lop) and libFuzzer have encoding (BBPE) tokenizer (Wang et al., 2020; Wu
∗ et al., 2024; Radford et al., 2019) have become
Corresponding author.
1
https://github.com/weimo3221/ the standard practices for state-of-the-art LLMs,
FUZZ-CODER which brings powerful understanding and genera-
tion capability for byte-like data. Moreover, these and strategies. The data in any format is first
LLMs can be further optimized through fine-tuning converted into a sequence of bytes as the input
or prompting to enhance their proficiency in spe- of LLMs. Then, the code LLM will decide
cific domains. However, how to effectively leverage the possible mutation strategies and positions.
instruction fine-tuning (IFT) to inspire LLMs to
help byte-based mutation for the fuzzing test still • We construct a complete framework to fine-
requires further exploration. tune the code LLMs with the help of the col-
In this paper, we investigate the feasibility of lected instruction corpora Fuzz-Instruct. To
leveraging code LLM to develop a framework, effectively evaluate the performance of differ-
guiding the mutation process of inputs in fuzzing. ent models, we construct a fuzzing test bench-
The mutation process is formulated as the sequence- mark Fuzz-Bench comprised of 8 programs,
to-sequence modeling, where LLM receives a byte which accept different formats of data (e.g.
sequence and then outputs the mutated byte se- ELF, JPG, MP3, and XML).
quence. The LLM is fine-tuned on the created • The experimental results on created bench-
instruction dataset, where the successful fuzzing mark Fuzz-Bench (simulation using AFL)
history is collected from the heuristic fuzzing tool. demonstrate the fine-tuned F UZZ C ODER sig-
In Figure 1, the instruction corpus is coupled into nificantly improves the effective proportion of
pairs comprised of original inputs and successfully mutation (EPM) and triggers more program
mutated inputs. F UZZ C ODER aims at identify- crashes compared to the previous baselines.
ing the most possible bytes within input files for
mutations. To gather the instruction dataset Fuzz- 2 Preliminary: Fuzzing Test
Instruct, we initially adopt standard fuzzing meth-
ods to record mutation instances that yield new Fuzzing is a robust software testing technique de-
code coverage or trigger crashes. Fuzz-Instruct signed to uncover vulnerabilities and flaws in com-
then serves to train F UZZ C ODER based on different puter programs, primarily by subjecting them to
code foundation models to guide towards generat- a barrage of unexpected and often invalid inputs.
ing promising mutated inputs. While our method- The fuzzing test can be mathematically represented
ology is adaptable to various fuzzing frameworks, as follows:
we apply it specifically to the state-of-the-art AFL,
F(T, g(x)) = R (1)
which introduces random mutations into a batch of
seed input files and curates a queue of new inputs, where F(·, ·) represents the fuzzing process receiv-
which are effective in tracing new code executions. ing mutation of input test cases. T is the target
Our proposed method is evaluated on software or program subjected to the fuzzing test.
the benchmark Fuzz-Bench, comprised I represents the input test cases, which are typi-
of 8 programs: NM_ELF, READ_ELF, cally malformed, unexpected, or random data. g(x)
OBJDUMP_ELF, LINT_XML, MP3GAIN_MP3, is the mutation format of the original input x. R
IMAGEMAGICK_GIF, SPLIT_TIFF, and stands for the results or observations obtained dur-
TRAN_JPEG. Fuzz-Bench accepts the different ing the fuzzing test, which may include system
format inputs, including ELF, XML, MP3, and crashes, error messages, or other unexpected be-
GIF. F UZZ C ODER significantly improves line haviors in the target software.
coverage and branch coverage compared to the American Fuzzy Lop2 (AFL) is a widely used au-
previous strong baselines. Further, we observe tomated vulnerability mining tool, which finds se-
that F UZZ C ODER triggers more new paths or the curity vulnerabilities in software programs through
frequency of code blocks found during fuzz testing fuzzy testing techniques. Fuzzy testing is a black-
due to the effective mutation prediction of the box testing methodology that injects random or
understanding capability of the code LLM. semi-random data into program inputs to detect
The key contributions are summarized as: anomalous behavior and potential vulnerabilities
in the program. In AFL, mutation refers to the
• We formulate the fuzzing test as a sequence-
generation of new fuzzy test inputs by modifying
to-sequence paradigm and then introduce the
the input samples, which is a core component of
generation model to attack vulnerable posi-
2
tions by selecting proper mutation positions https://github.com/google/AFL
AFL fuzzy testing. Its mutation strategy employs For OBJDUMP_ELF, the program displays various
a range of random and semi-randomized mutation information from object files (including executable
techniques to create a diversity of test inputs. Let files, target files, and shared libraries), such as dis-
x(i) ∈ {x(1) , . . . , x(n) } denote the seed test input assembled code and section table information. For
from the initial pool comprised of n test cases, MP3GAIN_MP3, the program adjusts the volume
we leverage the NLP techniques to generate the of MP3 audio files, which aims to balance and nor-
mutated test case z (i) . Different from the rule- malize the volume of MP3 files so that they sound
based mutation, we use a generation model to ob- more consistent when played without noticeable
tain variant samples for fuzzy testing by predicting volume differences. For IMAGEMAGICK_GIF,
variant locations and variant types. Specifically, the program is a tool in ImageMagick for process-
(i) (i)
x(i) = {x1 , . . . , xm } is m bytes input sequence, ing various image files (including JPG, PNG, GIF,
the prediction model M chooses k mutation po- etc.). It can get information about the image, ad-
sitions p = {p1 , . . . , pk } and their corresponding just the image, and process it. For SPLIT_TIFF,
mutation strategies s = {s1 , . . . , sk } to modify the it splits a TIFF file containing multiple images
original test case xk into z k . The process can be into multiple separate TIFF files, each file con-
described as: taining a frame or page from the input file. For
m
Y
TRAN_JPEG, it can rotate JPG images 90 degrees,
P (p, s|x(i) ) = P (pj , sj |x(i) , p<j , s<j ; Θ) (2) 180 degrees or 270 degrees clockwise. JPG images
j=1 can also be cropped, optimized, etc.
where p<j = (p1 , . . . , pj−1 ) and s<j = Data Construction For different programs, we
(s1 , . . . , sj−1 ). pj and sj represent the j-th mu- need to collect the data used for LLMs separately
tation position and mutation strategy respectively by fuzzing the programs with heuristic methods,
predicted by the previous context p<j and s<j se- where the baseline is denoted as AFL. Through the
quentially and the original test case x(i) . simulation of the original AFL, we can collect the
k valid mutations {(p1 , s1 ), . . . , (pk , sk )} for the
3 Fuzz-Bench
specific test case x. Then, we can construct the
We introduce 8 fuzzing datasets: NM_ELF, supervised training pair (x, p, s) comprised of the
READ_ELF, OBJDUMP_ELF, LINT_XML, input test case x, valid mutation positions p, and
MP3GAIN_MP3, IMAGEMAGICK_GIF, the corresponding strategies s. For each dataset,
SPLIT_TIFF, and TRAN_JPEG, which ac- we can obtain the corresponding instruction cor-
cept the different format inputs, including the pus Dt = {I (i) , x(i) , y (i) }N
i=1 (1 ≤ t ≤ T = 8,
t

ELF, XML, MP3, and GIF format. The program T is the number of the programs, Nt is the train-
subjected to the fuzzing test originates from the ing data size of the program t, and I (i) is the in-
FuzzBench3 and previous works4 . struction) and merge them as the whole dataset
Here, we describe the details of each dataset. D = {Dt }Tt=1 .
For LINT_XML, the program parses one or more Given the specific test case, there exist different
XML files and prints various types of output, de- valid mutation strategies to successfully fuzz the
pending upon the options selected. It is useful for program (e.g. the mutation leads to the program
detecting errors both in XML code and in the XML crash or triggers a new execution path). We can
parser itself. For READ_ELF, the program reads gather the valid mutation pairs together as the target
and displays information about the contents of sequence. i.e., valid (pi , si ) pairs of the test case.
ELF (executable and linkable Format) format files, In the following example, if its valid (pi , si ) pairs
which include executables, target files, and shared are (1, 2) and (1, 3), it denotes that the 2-th and 3-
libraries. For NM_ELF, the program displays sym- th token in the hexadecimal sequence will perform
bol table information in target files (including ex- 1-th operation to cause crash of the program. the
ecutables, target files, and shared libraries). The final expression can be described as follows:
symbol table contains symbols defined and refer-
enced in the program (e.g., variable names, func- Data Collection
tion names, etc.) and their associated attributes. Byte Input: 0x3c 0x21 0x44 0x4f 0x43
3
https://github.com/google/FuzzBench Mutation strategies: [(1, 2), (1, 3)]
4
https://github.com/fdu-sec/NestFuzz
Test Case LLM
Queue Test Case Encoder Decoder
𝑥! , …, 𝑥" 𝑦! , …, 𝑦#
Decoder-only

𝑥! , … , 𝑥" , 𝑦! , …, 𝑦#
7f 45 4c 46 01 01 01
00 00 00 00 00 00 02

Byte-level
Execute the next file Mutated Cases
Add to Queue
Execute the next file
No
7f 45 46 4c 01 01 01 Bitfilp
00 00 00 00 00 00 02 (0x46 0x4c)
Yes …
7f 45 4c 46 01 01 01 Replace
Trigger New Paths ?
Test Program 00 00 80 00 00 00 02 (0x00 with 0x80)

Figure 2: The workflow of the fuzzing test with fine-tuned LLMs F UZZ C ODER.

The queue of input sequences Q is used to store 4 Fuzzing Test via Generation Model
input test cases (test cases). When the fuzzing pro-
cess (e.g. AFL) starts, it automatically selects and 4.1 Input Encoding
mutates input data based on the response of the Our framework consists of a fuzzer and a model
target program to better explore potential program that highlights useful locations in an input file. Dur-
paths and boundary conditions. Q contains input ing runtime. the fuzzer queries the model for each
files that successfully caused the program to exe- seed file and focuses mutations on the highlighted
cute different paths during testing. These input files locations. Given an open-ended input file, we first
are considered valid because they cause program convert the input file into a sequence of bytes x(i)
execution to enter new code paths or trigger spe- in Figure 2 (hexadecimal sequence). Then, the
cific error conditions. To collect as much mutation generation model should predict the mutation po-
data as possible for each program, each program is sitions p = {p1 , . . . , pk } and the mutation strate-
fuzzed multiple times. gies s = {s1 , . . . , sk }, where the sk is the corre-
sponding mutation strategy of the position pk . To
Data Split Since the training of the model re- jointly model the mutation position and strategy,
quires a training set and a valid set, we randomly the prediction sequence y = (y1 , . . . , y2k ) can be
select 90% of the samples as the training set and described as:
10% of the data as the valid set. The number of
samples is described as: y = (p1 , s1 , . . . , pk , sk ) (3)

Benchmark Train Test Program Input Option where the model first predicts the mutation position
NM_ELF 4534 504 nm-new ELF -a @@ pk and then output the corresponding strategy sk .
READ_ELF 4167 464 readelf ELF -a @@
OBJDUMP_ELF 4009 446 objdump ELF -x -a -d @@
LINT_XML
MP3GAIN_MP3
5442
1431
605
150
xmllint
mp3gain
XML
MP3
–valid –recover @@
@@
4.2 Encoder-Decoder Framework
IMAGEMAGICK_GIT 6477 720 magick GIF identify @@
SPLIT_TIFF 4136 459 tiffsplit TIFF @@ Given the source inputs Dsrc and target predictions
TRAN_JPEG 1376 153 jpegtran JPEG @@
Dtrg , the encoder of the encoder-decoder-based
Table 1: Statistics of the different benchmarks. F UZZ C ODER first receives the original input x and
encodes it into the hidden states Henc with the
bidirectional attention mechanism.
Simulation Environment We incorporate the A
QK T

generation model into the AFL framework to sup- He = S(x, Me ) = Softmax √ ⊗ Me V
a=1 dk
port the fuzzing with LLM. The simulation environ- (4)
ment is Ubuntu 18.04.6 LTS, Intel Xeon Processor
(Skylake, IBRS), A100-PCIE-40GB, AFL-2.57b5 .
where A is the number of attention heads Then,
the decoder predicts the target tokens sequentially
5
https://github.com/google/AFL based on He .
4.3 Decoder-only Framework Task Description:
Now, you are a AFL (American Fuzzy Lop), which is a
Given the source inputs Dsrc and target predictions highly efficient and widely used fuzz testing tool designed
Dtrg , the encoder of the encoder-decoder-based for finding security vulnerabilities and bugs in software. You
are now fuzzing a program named {dataset_name}, which
F UZZ C ODER first receives the original input x and requires variable (a byte sequence) to run. I will give you a
encodes it into the hidden states Henc with the byte sequence as input sequence, and you need to mutate the
input sequence to give me a output sequence through a
bidirectional attention mechanism. mutation operation below. Finally you need to give me a
A
output which includes input sequence, mutation operation
QK T

√ and output sequence.
Hd = S(x, Md ) = Softmax ⊗ Md V
a=1 dk
Mutation Operations:
(5) {Mutation Operations 𝑂}
where A is the number of attention heads The de- Input Sequence Definition:
coder predicts the target tokens sequentially based It consists of bytes represented in hexadecimal, separated by
spaces. It is the byte sequence to be mutated. It is a variable
on He with the casual mask Md . that can cause the program to crash or trigger a new path.

4.4 Mutation Strategy Prediction Output Sequence Definition:

It consists of bytes represented in hexadecimal, separated by
For each mutation position pj , we use the genera- spaces. It is the mutated byte sequence. It is a variable that
tion model to infer the possible mutation strategy can cause the program to crash or trigger a new path.

for the position. 12 candidate mutation strategies

are provided for each position, including: (1) bit- Input Sequence:
{byte_input}
flip 1/1: perform bitfilp on a bit randomly. (2) Please list all possible mutation strategies (mutation position
bitflip 2/1: perform bitfilp on two neighboring bits and mutation operation) with the JSON format as:
output:
randomly. (3) bitflip 4/1: perform bitfilp on four {
neighboring bits randomly. (4) bitflip 8/8: ran- "mutation strategies": [
(𝑜! , 𝑝! ),
domly select a byte and XOR it with 0xff. (5) ... ,
bitflip 16/8: randomly select two neighboring bytes (𝑜" , 𝑝" ),
]
and XOR them with 0xff. (6) bitflip 32/8: randomly }
select four neighboring bytes and XOR them with
0xff. (7) arith 8/8: randomly select a byte and per- Figure 3: The prompt to get mutation positions and
form addition or subtraction on it (operands are strategies of F UZZ C ODER.
0x01 0x23). (8) arith 16/8: randomly select two
be described as:
neighboring bytes and convert these two bytes into
a decimal number. Select whether to swap the po- Lm = −Ex(i) ,p(i) ,s(i) ∈Dsrc log P (p(i) , s(i) |x(i) ) (6)
sitions of these two bytes. Perform addition or
subtraction on it (operands are 1 35). Finally, con- where x(i) is the i-th original input from the col-
vert this number to 2 bytes and put it back to its lected dataset. p = (p1 , . . . , pk ) is the predicted
original position. (9) arith 32/8: randomly select mutation positions and s = (s1 , . . . , sk ) is the mu-
four neighboring bytes. Select whether to swap the tation strategies.
positions of these four bytes. Convert these four 4.6 Incorporating LLMs into Fuzzing Test
bytes into a decimal number. Perform addition or
subtraction on it. Finally, convert this number to 4 The AFL tool will first compile our test program
bytes and put it back to its original position. (10) and then use the test cases after mutation as input
interest 8/8: randomly select a byte and replace it into the compiled program. The mutated test case
with a random byte. (11) interest 16/8: randomly causing a crash or triggering a new path will be
select two neighboring bytes and replace them with used as seeds. F UZZ C ODER adopts the Top-p sam-
two random bytes. (12) interest 32/8: randomly se- pling strategy to produce the candidate mutation
lect four neighboring bytes and replace them with strategy and position for diversity, which ensures
four random bytes. that the effective mutation strategy and mutation
positions are covered as much as possible.
4.5 Jointly Training
5 Experiments
Since the mutation strategies and positions y =
(p1 , s1 , . . . , pk , sk ) are our prediction goals, the su- We evaluate our proposed method F UZZ C ODER
pervised fine-tuning objective of F UZZ C ODER can on 8 test sets, including NM_ELF, READ_ELF,
Method Base Size bitflip 1/1 bitflip 2/1 bitflip 4/1 bitflip 8/8 bitflip 16/8 bitflip 32/8 arith 8/8 arith 16/8 arith 32/8 interest 8/8 interest 16/8 interest 32/8 Avg.
READ_ELF
AFL (Original) - - 1.50 0.66 0.25 0.33 0.09 0.24 0.30 0.00 0.00 0.48 0.06 0.03 0.33
AFL (LSTM) - - 1.37 1.11 0.97 0.00 0.00 0.00 2.49 0.00 0.00 0.00 0.00 0.49 0.54
AFL (Transformer) - - 1.11 1.04 1.02 1.61 0.00 0.90 3.99 0.22 0.30 2.34 1.98 0.82 1.28
F UZZ C ODER StarCoder-2 7B 3.42 0.92 1.28 2.45 0.12 0.15 0.63 0.12 0.05 0.45 2.41 0.34 1.03
F UZZ C ODER StarCoder-2 15B 4.21 2.38 1.43 2.95 0.24 0.21 1.25 0.45 0.38 0.57 1.38 0.45 1.32
F UZZ C ODER CodeLlama 7B 3.82 2.24 1.45 2.01 0.17 0.33 1.36 0.19 0.43 1.24 0.95 0.91 1.26
F UZZ C ODER DeepSeek-Coder 7B 1.98 1.73 0.66 3.13 0.08 0.24 2.92 0.22 0.25 1.48 1.82 2.05 1.38
F UZZ C ODER CodeQwen 7B 3.00 1.41 2.07 1.09 0.66 0.97 5.86 0.37 0.37 0.73 0.54 1.15 1.52
F UZZ C ODER CodeShell 7B 2.08 2.42 1.34 3.81 0.54 0.55 2.45 0.55 0.02 0.45 0.25 1.23 1.31
OBJ_DUMP
AFL (Original) - - 2.07 0.89 0.43 0.43 1.35 1.93 0.31 0.08 0.01 0.79 0.21 0.11 0.72
AFL (LSTM) - - 1.26 4.20 2.95 1.21 1.23 2.81 1.33 1.67 0.00 2.78 2.45 2.64 2.04
AFL (Transformer) - - 1.97 1.68 0.86 0.00 1.38 1.84 1.27 1.61 1.47 1.82 1.01 1.28 1.35
F UZZ C ODER StarCoder-2 7B 1.24 1.71 0.02 1.21 0.23 0.05 1.52 0.85 0.32 0.01 0.23 0.43 0.65
F UZZ C ODER StarCoder-2 15B 1.37 1.74 0.11 2.48 0.08 0.73 1.78 0.43 0.55 0.07 0.11 1.28 0.89
F UZZ C ODER CodeLlama 7B 1.62 1.32 0.18 1.15 0.49 2.43 0.75 0.19 0.37 0.05 0.14 1.15 0.82
F UZZ C ODER DeepSeek-Coder 7B 1.74 1.10 0.50 2.00 1.21 3.45 6.84 1.70 3.45 1.44 1.63 1.47 2.21
F UZZ C ODER CodeQwen 7B 1.16 0.95 0.46 6.23 1.05 0.82 3.87 0.36 1.27 0.42 1.08 1.44 1.59
F UZZ C ODER CodeShell 7B 1.12 0.32 0.07 2.43 2.45 0.35 1.34 0.23 0.05 0.34 0.13 0.92 0.81
NM
AFL (Original) - - 1.35 0.41 0.04 0.38 2.03 1.29 0.10 0.01 0.00 0.23 0.03 0.05 0.49
AFL (LSTM) - - 1.95 0.84 0.09 9.74 0.00 0.90 2.47 0.00 0.00 0.24 0.75 0.72 1.47
AFL (Transformer) - - 0.90 0.83 0.30 3.48 1.27 1.31 3.80 1.32 0.00 0.00 1.29 0.52 1.25
F UZZ C ODER StarCoder-2 7B 1.34 0.23 0.75 0.18 0.85 0.38 1.78 0.01 0.34 0.05 0.11 0.01 0.50
F UZZ C ODER StarCoder-2 15B 1.41 0.37 1.21 0.34 0.93 0.72 2.43 0.08 0.17 0.14 0.05 0.05 0.66
F UZZ C ODER CodeLlama 7B 0.17 0.13 0.83 0.71 0.71 0.81 1.82 0.03 0.11 0.35 0.08 0.26 0.50
F UZZ C ODER DeepSeek-Coder 7B 2.19 1.83 1.01 1.88 1.25 0.97 2.40 1.87 3.42 2.96 1.66 0.44 1.82
F UZZ C ODER CodeQwen 7B 1.83 0.54 1.27 1.39 1.37 1.32 2.98 0.97 2.41 1.12 2.69 2.43 1.69
F UZZ C ODER CodeShell 7B 1.91 0.23 0.83 1.01 0.91 0.24 0.95 1.34 0.85 0.23 1.34 1.23 0.92
LINT_XML
AFL (Original) - - 11.21 1.75 1.49 0.13 3.37 5.42 0.82 0.11 0.00 1.13 0.24 0.08 2.15
AFL (LSTM) - - 2.82 2.06 4.60 0.00 3.09 0.00 3.01 0.00 0.00 4.64 3.24 0.00 1.96
AFL (Transformer) - - 5.71 2.90 3.01 0.00 2.99 3.08 2.82 0.00 0.00 7.15 0.00 0.00 2.31
F UZZ C ODER StarCoder-2 7B 0.05 0.25 0.43 3.42 1.02 3.42 0.55 0.73 0.01 0.53 2.41 1.31 1.18
F UZZ C ODER StarCoder-2 15B 0.13 0.13 0.54 2.72 1.73 2.43 0.48 0.54 0.34 0.71 3.42 2.33 1.29
F UZZ C ODER CodeLlama 7B 0.31 0.32 1.31 12.31 2.43 1.27 0.83 0.34 0.45 0.65 2.45 1.43 2.01
F UZZ C ODER DeepSeek-Coder 7B 0.99 0.00 0.49 14.28 8.31 0.36 0.84 0.72 0.41 2.61 1.42 9.80 3.35
F UZZ C ODER CodeQwen 7B 0.68 0.82 0.19 19.51 6.42 0.00 1.65 0.91 0.28 3.63 0.41 2.51 3.08
F UZZ C ODER CodeShell 7B 0.13 0.15 0.08 5.41 4.65 2.43 0.94 0.45 0.34 0.12 0.71 3.41 1.57
MP3_GAIN
AFL (Original) - - 0.65 0.22 0.15 0.09 0.91 0.40 0.08 0.09 0.01 0.23 0.28 0.17 0.27
AFL (LSTM) - - 1.60 1.68 1.19 0.33 0.65 0.00 1.95 1.61 0.00 1.16 3.46 3.44 1.42
AFL (Transformer) - - 2.70 1.01 0.93 0.00 0.52 0.19 1.25 0.17 0.00 1.02 3.20 3.87 1.24
F UZZ C ODER StarCoder-2 7B 0.85 0.78 0.45 2.10 0.02 0.03 5.67 0.01 0.01 0.95 3.25 4.00 1.51
F UZZ C ODER StarCoder-2 15B 0.90 0.82 0.50 2.20 0.03 0.04 5.80 0.01 0.01 1.00 3.30 4.10 1.56
F UZZ C ODER CodeLlama 7B 0.80 0.76 0.40 2.00 0.01 0.02 5.50 0.00 0.01 0.90 3.20 3.90 1.46
F UZZ C ODER DeepSeek-Coder 7B 0.76 0.75 0.36 2.13 0.00 0.00 6.44 0.00 0.00 1.25 3.30 4.12 1.59
F UZZ C ODER CodeQwen 7B 1.09 0.83 0.48 0.82 1.05 0.00 2.72 0.00 0.00 1.72 3.21 3.50 1.29
F UZZ C ODER CodeShell 7B 0.88 0.79 0.42 2.05 0.01 0.02 5.60 0.00 0.01 1.05 3.22 3.95 1.50
IMAGE_MAGICK
AFL (Original) - - 1.95 0.30 0.36 1.89 1.14 2.26 0.74 0.00 0.09 0.94 0.16 0.09 0.83
AFL (LSTM) - - 3.12 1.29 0.26 0.00 0.00 0.00 5.66 0.00 0.00 0.00 0.00 13.39 1.98
AFL (Transformer) - - 3.88 1.05 0.62 3.02 1.67 1.22 12.28 0.00 0.00 2.34 1.16 0.00 2.27
F UZZ C ODER StarCoder-2 7B 2.05 1.82 0.70 1.40 0.00 0.80 8.90 1.30 0.00 3.20 8.10 3.15 2.62
F UZZ C ODER StarCoder-2 15B 2.25 2.00 0.75 1.50 0.00 0.85 9.05 1.40 0.00 3.30 8.20 3.25 2.71
F UZZ C ODER CodeLlama 7B 2.10 1.85 0.71 1.42 0.00 0.82 8.92 1.32 0.00 3.22 8.12 3.17 2.64
F UZZ C ODER DeepSeek-Coder 7B 2.15 1.88 0.72 1.43 0.00 0.81 8.95 1.34 0.00 3.24 8.15 3.19 2.65
F UZZ C ODER CodeQwen 7B 3.16 0.60 0.52 2.37 0.00 10.33 15.34 0.00 0.00 2.11 6.09 9.88 4.20
F UZZ C ODER CodeShell 7B 2.12 1.86 0.73 1.44 0.00 0.83 8.97 1.35 0.00 3.25 8.16 3.20 2.66
SPLIT_TIFF
AFL (Original) - - 0.80 0.28 0.05 0.03 0.00 2.25 0.29 0.05 0.01 0.04 0.10 0.08 0.33
AFL (LSTM) - - 0.00 0.00 0.00 0.00 0.00 0.18 0.00 0.00 0.00 0.00 0.30 0.18 0.05
AFL (Transformer) - - 0.06 0.02 0.01 0.26 0.00 0.00 0.36 0.14 0.00 0.01 0.25 0.73 0.15
F UZZ C ODER StarCoder-2 7B 0.15 0.05 0.10 2.10 0.00 0.70 0.05 0.00 0.00 0.01 0.02 0.01 0.27
F UZZ C ODER StarCoder-2 15B 0.20 0.08 0.18 2.20 0.00 0.75 0.06 0.00 0.00 0.02 0.03 0.02 0.29
F UZZ C ODER CodeLlama 7B 0.18 0.09 0.15 2.15 0.00 0.73 0.07 0.00 0.00 0.03 0.01 0.03 0.29
F UZZ C ODER DeepSeek-Coder 7B 0.34 1.01 0.22 2.33 0.43 0.76 0.04 1.08 0.44 0.54 0.64 0.34 0.68
F UZZ C ODER CodeQwen 7B 0.23 0.10 0.00 0.00 0.00 0.00 0.19 0.00 0.00 0.00 0.26 0.19 0.08
F UZZ C ODER CodeShell 7B 0.14 0.07 0.11 2.12 0.00 0.72 0.03 0.00 0.00 0.01 0.02 0.01 0.27
TRAN_JPEG
AFL (Original) - - 1.41 0.35 0.15 0.27 0.41 1.18 0.18 0.08 0.01 0.32 0.21 0.11 0.39
AFL (LSTM) - - 2.68 0.98 0.52 0.82 0.00 0.00 5.80 0.94 0.00 1.44 3.67 2.15 1.58
AFL (Transformer) - - 0.14 1.11 0.66 1.32 1.30 1.94 2.42 1.96 0.00 1.83 2.82 2.76 1.52
F UZZ C ODER StarCoder-2 7B 0.40 0.22 0.60 0.10 0.00 0.05 2.60 0.00 0.00 0.05 0.55 2.50 0.59
F UZZ C ODER StarCoder-2 15B 0.50 0.28 0.65 0.15 0.00 0.08 2.70 0.01 0.00 0.10 0.60 2.60 0.64
F UZZ C ODER CodeLlama 7B 0.45 0.25 0.58 0.12 0.00 0.07 2.55 0.00 0.00 0.07 0.54 2.45 0.59
F UZZ C ODER DeepSeek-Coder 7B 0.36 0.21 0.56 0.00 0.00 0.00 2.52 0.00 0.00 0.00 0.53 2.40 0.55
F UZZ C ODER CodeQwen 7B 3.40 0.54 0.86 0.45 0.53 0.54 1.29 1.13 0.54 2.11 6.21 1.34 1.58
F UZZ C ODER CodeShell 7B 0.42 0.23 0.54 0.08 0.00 0.06 2.50 0.00 0.00 0.03 0.52 2.35 0.56

Table 2: Evaluation results (EPM, ‰) of multiple models. Bitflip a/b denotes a ∗ b bits are flipped as a whole. Arith
a/b denotes the a ∗ b bits for addition and subtraction operations.

OBJDUMP_ELF, LINT_XML, MP3GAIN_MP3, 5.1 Implementation Details

IMAGEMAGICK_GIF, SPLIT_TIFF, and By performing fuzzy tests using AFL6 , we col-
TRAN_JPEG. In this section, we provide the lect the original and variant inputs of successful at-
details, results, and analysis of the experiments. tacks as a training set (nearly 30K SFT pairs). Our
6
https://lcamtuf.coredump.cx/afl/
Method Base Size READ_ELF OBJ_DUMP NM LINT_XML MP3_GAIN IMAGE_MAGICK SPLIT_TIFF TRAN_JPEG Avg.
AFL (Original) - - 0 0 0 117 68 0 95 0 35
AFL (LSTM) - - 0 0 0 55 53 0 42 0 19
AFL (Transformer) - - 0 0 0 61 45 0 77 0 23
F UZZ C ODER StarCoder-2 7B 2 3 1 100 150 12 110 1 47
F UZZ C ODER StarCoder-2 15B 4 5 2 120 180 15 130 2 57
F UZZ C ODER CodeLlama 7B 3 2 0 90 140 10 100 1 43
F UZZ C ODER DeepSeek-Coder 7B 2 4 0 130 230 3 224 3 75
F UZZ C ODER CodeQwen 7B 1 9 0 114 209 4 221 2 70
F UZZ C ODER CodeShell 7B 3 6 1 95 160 11 105 1 48

Table 3: Number of crashes of different models on eight datasets.

READ_ELF OBJ_DUMP NM LINT_XML

Line Branch Function Avg. Line Branch Function Avg. Line Branch Function Avg. Line Branch Function Avg.
AFL (Original) 7.9 7.3 9.9 8.4 1.7 1.1 2.8 1.9 0.3 0.1 1.1 0.5 8.2 8.0 11.0 9.1
AFL (LSTM) 7.3 6.6 9.0 7.6 1.6 1.0 2.8 1.8 0.3 0.2 1.1 0.5 8.1 7.8 10.9 8.9
AFL (Transformer) 6.6 5.9 8.2 6.9 1.6 1.0 2.7 1.8 0.3 0.1 1.0 0.5 8.0 7.7 11.0 8.9
F UZZ C ODER (Deepseek-Coder) 14.9 16.5 15.4 15.6 2.0 1.5 3.1 2.2 0.6 0.3 1.9 0.9 9.2 9.4 11.8 10.1
F UZZ C ODER (CodeQwen) 14.5 15.9 15.2 15.2 2.0 1.5 3.1 2.2 0.6 0.4 1.9 1.0 8.7 8.8 11.3 9.6
MP3_GAIN IMAGE_MAGICK SPLIT_TIFF TRAN_JPEG
AFL (Original) 53.5 41.3 58.1 51.0 87.5 50.0 100.0 79.2 1.0 1.4 1.4 1.3 17.8 22.6 27.5 22.6
AFL (LSTM) 53.2 40.8 58.1 50.7 87.5 50.0 100.0 79.2 0.9 1.3 1.1 1.1 15.5 18.8 26.3 20.2
AFL (Transformer) 54.0 41.5 58.1 51.2 87.5 50.0 100.0 79.2 1.0 1.4 1.4 1.3 15.4 18.3 26.3 20.0
F UZZ C ODER (Deepseek-Coder) 54.9 43.2 59.1 52.4 87.5 50.0 100.0 79.2 1.0 1.6 1.4 1.3 19.0 24.7 27.9 23.9
F UZZ C ODER (CodeQwen) 54.9 42.8 59.1 52.3 87.5 50.0 100.0 79.2 1.0 1.6 1.4 1.3 18.2 23.1 27.2 22.8

Table 4: Coverate rate (%) of different models on 8 datasets.

Input Gain
AFL (Original)
6 5.8 AFL (LSTM)
5.6 5.6
AFL (Transformer)
5.0 FuzzCoder (StarCoder2)
5 FuzzCoder (CodeLlama)
FuzzCoder (Deepseek-Coder)
FuzzCoder (CodeQwen)
Number (K)

4 3.8 3.8
3.6
3.4
3.1
2.9
3 2.8

2.4 2.4 2.4

2.3 2.3 2.3
2.1 2.1
1.9
2 1.8
1.7
1.8
1.7 1.7
1.5 1.5 1.5
1.4 1.4
1.3
1.2 1.2
1.1
1.0
0.9 0.9
1 0.7
0.8 0.8 0.8
0.7
0.8 0.8
0.7
0.6 0.6
0.5 0.5
0.4 0.4
0.3 0.3 0.3 0.3
0.2

0
READ_ELF OBJ_DUMP NM_ELF LINT_XML MP3_GAIN IMAGE_MAGICK SPLIT_TIFF TRAN_JPEG
Dataset

Figure 4: Comparison between the baselines and F UZZ C ODER.

Figure 5: Comparison between the original JPG file and the JPG file after blur test

model based on open-source code LLMs CodeL- tool to improve the effectiveness of the fuzzing test.
lama, Deepseek-Coder, and CodeQwen is trained StarCoder-2: StarCoder-2 models with 3B, 7B,
for 3 epochs with a cosine scheduler, starting at a and 15B parameters are trained on 3.3 to 4.3 tril-
learning rate of 5e-5 (3% warmup steps). We use lion tokens, supporting hundreds of programming
the AdamW (Loshchilov and Hutter, 2017) opti- languages. Code-Llama: Code-Llama is a fam-
mizer with a batch size of 1024 (max length 4K). ily of code large language models based on Llama
2, providing infilling and long context capabilities.
5.2 Methods DeepSeek-Coder: Deepseek-coder is a series of
open-source code models with sizes from 1.3B to
AFL (Original): The original AFL with the 33B, pre-trained from scratch on 2 trillion tokens.
heuristic mutation rules is used as a baseline. CodeQwen: CodeQwen with 7B parameters sup-
AFL (LSTM): We use the encoder-decoder-based ports 92 languages and 64K tokens.
LSTM network without pre-training to decide the
mutation position and strategy. AFL (Trans-
former): The encoder-decoder-based Transformer
without pre-training is incorporated into the AFL
5.3 Evaluation Metrics can trigger more complete paths more effectively,
Effective proportion of mutation (EPM): For so the higher these two metrics, the better.
each mutation of the seed sample in the queue, a Case study In Figure 5, we take the
mutation location is selected, and then the corre- JPEG_TRANS program as an example. In
sponding mutation strategy is carried out for a mu- Figure 5, the original Image will get Mutated
tation location. The effective proportion of muta- Image after several rounds of fuzzing test. We use
tions (‰) can be used to evaluate the effectiveness the big language model to guide the mutation of
of different methods. Image. For example, where Original Image was
Number of Crashes (NC): This indicator refers 0x53, it becomes 0x51. And the SSIM Score of
to the number of input samples that cause the pro- Mutated Image vs. Original Image is 0.93. The
gram to crash during fuzz testing and is used to Mutated Image is then fed into the JPEGTRAN
measure the number of malicious inputs and the program, which triggers a new code path or a
number of vulnerabilities. program crash.

5.4 Main Results 7 Related Work

Results of EPM In Table 2, we find that the F UZ - Fuzzing Test Inspired by the success of
Z C ODER generally has better EPM than the AFL sequence-to-sequence learning (s2s) in many NLP
(Original) in each of the 8 programs and differ- tasks (Vaswani et al., 2017; Yang et al., 2020,
ent LLMs have their own advantages in different 2022b,a), the fuzzing test approaches use s2s to
programs. The results demonstrate that the code train neural networks to learn generative models of
LLMs with the powerful understanding and genera- the input formats for fuzzing. For different input
tion capabilities can further bring improvement for formats and the target program, random mutation
the fuzzing test, compared to the AFL with small of the inputs makes it hard to find the vulnerable
models. positions to fuzz the program. Deep-learning-based
methods (Godefroid et al., 2017; He et al., 2019;
Results of NC In Table 3, our vulnerability find-
Patra and Pradel, 2016; Yang et al., 2024) present a
ings for READ_ELF and NM programs have 0 re-
technique to use LSTMs to learn grammar for PDF
sults on AFL (Original), AFL (LSTM and Trans-
objects using a character-level model, which can
former), which indicates that these two datasets are
then be sampled to generate new inputs. Instead
hard to vulnerabilities in the limited time. It shows
of learning grammar, our technique uses neural
that the mutation sequences from the LLMs easily
networks to learn a function to predict promising
lead to the crash for the program to be tested.
locations in a seed file to perform mutations. The
previous methods are hindered by a small number
6 Discussions and Analysis
of parameters and the training corpora lack com-
Input Gain (IG) Figure 4 shows the number of mon knowledge of the byte sequence, codes, and
new paths of changes in the execution of code reasoning. Recently, researchers (Xia et al., 2024;
blocks found during fuzz testing of the target pro- Deng et al., 2023) directly leverage prompt engi-
gram. We can observe that F UZZ C ODER signifi- neering to inspire the instruct-following capability
cantly improves the performance compared to the of LLMs for effective fuzzing.
heuristic methods.
Domain-specific Large Language Model Large
Coverage Rate In Table 4, we report the cover- language models (LLMs) (Touvron et al., 2023a,b;
age rate of different models, including line cover- Achiam et al., 2023; Bai et al., 2023) based on
age, branch coverage, and function coverage. Line the decoder-only Transformer architecture have be-
coverage refers to the ratio of whether each line of come a cornerstone in the realm of natural language
code has been executed at the time of the program processing (NLP). The pre-training on a vast corpus
under test fuzzing, and branch coverage refers to of internet text, encompassing billions of tokens
the ratio of whether each conditional branch has enables LLMs to understand and generate human-
been executed at the time of the program under test style responses, making them highly versatile as
fuzzing. By looking at these two metrics, we can zero-short learners. Further, code LLMs tailored
know whether the test cases mutated by the Fuzzer for software engineering tasks push boundaries of
code understanding and generation (Chai et al.,
2024; Guo et al., 2023, 2024b; Rozière et al., 2023;
Guo et al., 2024a; Wu et al., 2024; Slagle, 2024).
The code LLM supports many code-related works,
such as code translation, code generation, code
refinement, program repair, and fuzzing. Recent
methods tailored for fuzzing (Xia et al., 2024; Yao
et al., 2024; Deng et al., 2023) relying on common
LLMs without domain-specific instruction tuning
can not effectively unleash the potential of LLMs
in the field of fuzzing.

8 Conclusions
In this paper, we present F UZZ C ODER, a series of
fine-tuned large language models for the fuzzing
test. First, we collect the Fuzz-Instruct dataset
based on a self-instruct strategy, which contains
multiple programs to improve the generalization
ability of LLMs on fuzzing operations. Then, to
easily evaluate the performance of existing LLMs
on fuzzing test, we also introduce the Fuzz-Bench
evaluation benchmark dataset with eight programs.
Besides, we also introduce the mixture-of-adapter
strategy to further enhance the instruction tuning
performance. Moreover, extensive experimental
results on our F UZZ C ODER demonstrate the effec-
tiveness of our F UZZ C ODER for fuzzing test.
References Software Engineering Conference and Symposium
on the Foundations of Software Engineering, pages
Josh Achiam, Steven Adler, Sandhini Agarwal, Lama 739–743.
Ahmad, Ilge Akkaya, Florencia Leoni Aleman,
Diogo Almeida, Janko Altenschmidt, Sam Altman, Jingxuan He, Mislav Balunović, Nodar Ambroladze,
Shyamal Anadkat, et al. 2023. Gpt-4 technical report. Petar Tsankov, and Martin Vechev. 2019. Learning
arXiv preprint arXiv:2303.08774. to fuzz from symbolic execution with application to
smart contracts. In Proceedings of the 2019 ACM
Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, SIGSAC conference on computer and communica-
Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei tions security, pages 531–548.
Huang, et al. 2023. Qwen technical report. arXiv
preprint arXiv:2309.16609. Linghan Huang, Peizhou Zhao, Huaming Chen, and
Lei Ma. 2024. Large language models based
Linzheng Chai, Jian Yang, Tao Sun, Hongcheng Guo, fuzzing techniques: A survey. arXiv preprint
Jiaheng Liu, Bing Wang, Xiannian Liang, Jiaqi Bai, arXiv:2402.00350.
Tongliang Li, Qiyao Peng, et al. 2024. xcot: Cross-
lingual instruction tuning for cross-lingual chain-of- Jun Li, Bodong Zhao, and Chao Zhang. 2018. Fuzzing:
thought reasoning. arXiv preprint arXiv:2401.07037. a survey. Cybersecurity, 1(1):1–13.

Chris Cummins, Pavlos Petoumenos, Alastair Murray, Ilya Loshchilov and Frank Hutter. 2017. Decou-
and Hugh Leather. 2018. Compiler fuzzing through pled weight decay regularization. arXiv preprint
deep learning. In Proceedings of the 27th ACM SIG- arXiv:1711.05101.
SOFT International Symposium on Software Testing
and Analysis, pages 95–105. Valentin JM Manès, HyungSeok Han, Choongwoo Han,
Sang Kil Cha, Manuel Egele, Edward J Schwartz,
Yinlin Deng, Chunqiu Steven Xia, Haoran Peng, and Maverick Woo. 2019. The art, science, and engi-
Chenyuan Yang, and Lingming Zhang. 2023. Large neering of fuzzing: A survey. IEEE Transactions on
language models are zero-shot fuzzers: Fuzzing deep- Software Engineering, 47(11):2312–2331.
learning libraries via large language models. In Pro-
Jibesh Patra and Michael Pradel. 2016. Learning to fuzz:
ceedings of the 32nd ACM SIGSOFT international
Application-independent fuzz testing with probabilis-
symposium on software testing and analysis, pages
tic, generative models of input data. TU Darmstadt,
423–435.
Department of Computer Science, Tech. Rep. TUD-
Patrice Godefroid, Hila Peleg, and Rishabh Singh. 2017. CS-2016-14664.
Learn&fuzz: Machine learning for input fuzzing. In Alec Radford, Jeffrey Wu, Rewon Child, David Luan,
2017 32nd IEEE/ACM International Conference on Dario Amodei, Ilya Sutskever, et al. 2019. Language
Automated Software Engineering (ASE), pages 50–59. models are unsupervised multitask learners.
IEEE.
Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi,
2016. Deep Learning. MIT Press. http://www. Jingyu Liu, Tal Remez, Jérémy Rapin, et al. 2023.
deeplearningbook.org. Code Llama: Open foundation models for code.
arXiv preprint arXiv:2308.12950.
Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie,
Kai Dong, Wentao Zhang, Guanting Chen, Xiao Kevin Slagle. 2024. Spacebyte: Towards deleting to-
Bi, Y Wu, YK Li, et al. 2024a. Deepseek-coder: kenization from large language modeling. arXiv
When the large language model meets programming– preprint arXiv:2404.14408.
the rise of code intelligence. arXiv preprint
arXiv:2401.14196. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier
Martinet, Marie-Anne Lachaux, Timothée Lacroix,
Hongcheng Guo, Jian Yang, Jiaheng Liu, Liqun Yang, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal
Linzheng Chai, Jiaqi Bai, Junran Peng, Xiaorong Hu, Azhar, et al. 2023a. Llama: Open and effi-
Chao Chen, Dongfeng Zhang, et al. 2023. Owl: A cient foundation language models. arXiv preprint
large language model for it operations. arXiv preprint arXiv:2302.13971.
arXiv:2309.09298.
Hugo Touvron, Louis Martin, Kevin Stone, Peter Al-
Hongcheng Guo, Wei Zhang, Anjie Le, Jian Yang, Jia- bert, Amjad Almahairi, Yasmine Babaei, Nikolay
heng Liu, Zhoujun Li, Tieqiao Zheng, Shi Xu, Run- Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti
qiang Zang, Liangfan Zheng, et al. 2024b. Lemur: Bhosale, et al. 2023b. Llama 2: Open founda-
Log parsing with entropy sampling and chain-of- tion and fine-tuned chat models. arXiv preprint
thought merging. arXiv preprint arXiv:2402.18205. arXiv:2307.09288.
Jianmin Guo, Yu Jiang, Yue Zhao, Quan Chen, and Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob
Jiaguang Sun. 2018. Dlfuzz: Differential fuzzing Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz
testing of deep learning systems. In Proceedings Kaiser, and Illia Polosukhin. 2017. Attention is all
of the 2018 26th ACM Joint Meeting on European you need. In NIPS 2017, pages 5998–6008.
Changhan Wang, Kyunghyun Cho, and Jiatao Gu. 2020. Quanjun Zhang, Tongke Zhang, Juan Zhai, Chunrong
Neural machine translation with byte-level subwords. Fang, Bowen Yu, Weisong Sun, and Zhenyu Chen.
In The Thirty-Fourth AAAI Conference on Artificial 2023. A critical review of large language model
Intelligence, AAAI 2020, The Thirty-Second Innova- on software engineering: An example from chat-
tive Applications of Artificial Intelligence Conference, gpt and automated program repair. arXiv preprint
IAAI 2020, The Tenth AAAI Symposium on Educa- arXiv:2310.08879.
tional Advances in Artificial Intelligence, EAAI 2020,
New York, NY, USA, February 7-12, 2020, pages
9154–9160. AAAI Press.

Anjiang Wei, Yinlin Deng, Chenyuan Yang, and Ling-

ming Zhang. 2022. Free lunch for testing: Fuzzing
deep-learning libraries from open source. In Proceed-
ings of the 44th International Conference on Software
Engineering, pages 995–1007.

Shangda Wu, Xu Tan, Zili Wang, Rui Wang, Xiaobing

Li, and Maosong Sun. 2024. Beyond language mod-
els: Byte models are digital world simulators. arXiv
preprint arXiv:2402.19155.

Chunqiu Steven Xia, Matteo Paltenghi, Jia Le Tian,

Michael Pradel, and Lingming Zhang. 2024.
Fuzz4all: Universal fuzzing with large language mod-
els. arXiv preprint arXiv:2308.04748.

Danning Xie, Yitong Li, Mijung Kim, Hung Viet Pham,

Lin Tan, Xiangyu Zhang, and Michael W Godfrey.
2022. Docter: documentation-guided fuzzing for
testing deep learning api functions. In Proceedings
of the 31st ACM SIGSOFT International Symposium
on Software Testing and Analysis, pages 176–188.

Jian Yang, Shuming Ma, Dongdong Zhang, Shuangzhi

Wu, Zhoujun Li, and Ming Zhou. 2020. Alternating
language modeling for cross-lingual pre-training. In
AAAI 2020, pages 9386–9393.

Jian Yang, Yuwei Yin, Shuming Ma, Dongdong Zhang,

Zhoujun Li, and Furu Wei. 2022a. High-resource
language-specific training for multilingual neural ma-
chine translation. In IJCAI 2022, pages 4461–4467.

Jian Yang, Yuwei Yin, Shuming Ma, Dongdong Zhang,

Shuangzhi Wu, Hongcheng Guo, Zhoujun Li, and
Furu Wei. 2022b. UM4: unified multilingual multi-
ple teacher-student model for zero-resource neural
machine translation. In IJCAI 2022, pages 4454–
4460.

Liqun Yang, Chaoren Wei, Jian Yang, Jinxin Ma,

Hongcheng Guo, Long Cheng, and Zhoujun Li. 2024.
Seq2seq-afl: Fuzzing via sequence-to-sequence
model. International Journal of Machine Learning
and Cybernetics, pages 1–19.

Dongyu Yao, Jianshu Zhang, Ian G Harris, and Mar-

cel Carlsson. 2024. Fuzzllm: A novel and univer-
sal fuzzing framework for proactively discovering
jailbreak vulnerabilities in large language models.
In ICASSP 2024-2024 IEEE International Confer-
ence on Acoustics, Speech and Signal Processing
(ICASSP), pages 4485–4489. IEEE.

Hourglass Workout Program by Luisagiuliet 2
76% (21)
Hourglass Workout Program by Luisagiuliet 2
51 pages
12 Week Program: Summer Body Starts Now
89% (45)
12 Week Program: Summer Body Starts Now
70 pages
Read People Like A Book by Patrick King-Edited
61% (72)
Read People Like A Book by Patrick King-Edited
12 pages
Livingood, Blake - Livingood Daily Your 21-Day Guide To Experience Real Health
77% (13)
Livingood, Blake - Livingood Daily Your 21-Day Guide To Experience Real Health
260 pages
Cheat Code To The Universe
94% (77)
Cheat Code To The Universe
34 pages
Facial Gains Guide (001 081)
91% (45)
Facial Gains Guide (001 081)
81 pages
Curse of Strahd
95% (467)
Curse of Strahd
258 pages
The Psychiatric Interview - Daniel Carlat
91% (34)
The Psychiatric Interview - Daniel Carlat
473 pages
The Borax Conspiracy
91% (57)
The Borax Conspiracy
14 pages
Trauma-Focused ACT - Russ Harris
95% (38)
Trauma-Focused ACT - Russ Harris
568 pages
The Secret Language of Attraction
86% (107)
The Secret Language of Attraction
278 pages
How To Develop and Write A Grant Proposal
83% (541)
How To Develop and Write A Grant Proposal
17 pages
Workbook For The Body Keeps The Score
88% (52)
Workbook For The Body Keeps The Score
111 pages
Penis Enlargement Secret
61% (123)
Penis Enlargement Secret
12 pages
KamaSutra Positions
78% (69)
KamaSutra Positions
55 pages
7 Hermetic Principles
93% (29)
7 Hermetic Principles
3 pages
27 Feedback Mechanisms Pogil Key
77% (13)
27 Feedback Mechanisms Pogil Key
6 pages
Phone Codes
78% (27)
Phone Codes
5 pages
36 Questions That Lead To Love
91% (35)
36 Questions That Lead To Love
3 pages
2025 MandateForLeadership FULL
70% (10)
2025 MandateForLeadership FULL
920 pages
How To Kiss A Woman's Breast
60% (114)
How To Kiss A Woman's Breast
14 pages
The 36 Questions That Lead To Love - The New York Times
94% (34)
The 36 Questions That Lead To Love - The New York Times
3 pages
100 Questions To Ask Your Partner
80% (35)
100 Questions To Ask Your Partner
2 pages
Satanic Calendar
25% (55)
Satanic Calendar
4 pages
The 36 Questions That Lead To Love - The New York Times
95% (21)
The 36 Questions That Lead To Love - The New York Times
3 pages
Jeffrey Epstein39s Little Black Book Unredacted PDF
75% (12)
Jeffrey Epstein39s Little Black Book Unredacted PDF
95 pages
14 Easiest & Hardest Muscles To Build (Ranked With Solutions)
100% (7)
14 Easiest & Hardest Muscles To Build (Ranked With Solutions)
27 pages
Fault Tolerant & Fault Testable Hardware Design
From Everand
Fault Tolerant & Fault Testable Hardware Design
Parag K. Lala
5/5 (2)
1001 Songs
70% (71)
1001 Songs
1,798 pages
The 4 Hour Workweek, Expanded and Updated by Timothy Ferriss - Excerpt
23% (954)
The 4 Hour Workweek, Expanded and Updated by Timothy Ferriss - Excerpt
38 pages
Zodiac Sign & Their Most Common Addictions
63% (30)
Zodiac Sign & Their Most Common Addictions
9 pages
LLM Reprogramming
No ratings yet
LLM Reprogramming
20 pages
Black-Box Prediction of Flaky Test Fix Categories Using Language Models
No ratings yet
Black-Box Prediction of Flaky Test Fix Categories Using Language Models
12 pages
2406.06852v2-A Survey of Backdoor Attacks and Defenses on Large Language Models-Implications for Security Measures
No ratings yet
2406.06852v2-A Survey of Backdoor Attacks and Defenses on Large Language Models-Implications for Security Measures
19 pages
2502.17139v1
No ratings yet
2502.17139v1
13 pages
W2 SDefense
No ratings yet
W2 SDefense
14 pages
Does Prompt Formatting Have Any Impact on LLM Performance
No ratings yet
Does Prompt Formatting Have Any Impact on LLM Performance
16 pages
2308.08784
No ratings yet
2308.08784
10 pages
3650212.3680389
No ratings yet
3650212.3680389
13 pages
computer 3
No ratings yet
computer 3
17 pages
Robust Prompt Optimization For Defending Language Models Against Jailbreaking Attacks
No ratings yet
Robust Prompt Optimization For Defending Language Models Against Jailbreaking Attacks
17 pages
Catastropic Jailbreak of Open Source Llm...
No ratings yet
Catastropic Jailbreak of Open Source Llm...
19 pages
2024 - Anti-LM Decoding For Zero-Shot In-Context Machine Translation
No ratings yet
2024 - Anti-LM Decoding For Zero-Shot In-Context Machine Translation
18 pages
Chao 等 - 2024 - JailbreakBench An Open Robustness Benchmark for Jailbreaking Large Language Models
No ratings yet
Chao 等 - 2024 - JailbreakBench An Open Robustness Benchmark for Jailbreaking Large Language Models
25 pages
Smart Fuzzing Based On Automatic Input Format Reverse Engineering
No ratings yet
Smart Fuzzing Based On Automatic Input Format Reverse Engineering
18 pages
flaming hot initiation
No ratings yet
flaming hot initiation
9 pages
LLM time series
No ratings yet
LLM time series
24 pages
Your Instructions Are Not Always Helpfu
No ratings yet
Your Instructions Are Not Always Helpfu
10 pages
Lm-I: S O - F L G - L L M: Nfinite Imple N THE LY Ength Ener Alization For Arge Anguage Odels
No ratings yet
Lm-I: S O - F L G - L L M: Nfinite Imple N THE LY Ength Ener Alization For Arge Anguage Odels
14 pages
LOOKAHEAD DECODING
No ratings yet
LOOKAHEAD DECODING
16 pages
Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories
No ratings yet
Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories
17 pages
Ali 等 - 2024 - Prompt-SAW Leveraging Relation-Aware Graphs for Textual Prompt Compression
No ratings yet
Ali 等 - 2024 - Prompt-SAW Leveraging Relation-Aware Graphs for Textual Prompt Compression
16 pages
2401.06468v3
No ratings yet
2401.06468v3
23 pages
2402.02057v1
No ratings yet
2402.02057v1
16 pages
DLAP
No ratings yet
DLAP
15 pages
Agent Coder 2312.13010v2
No ratings yet
Agent Coder 2312.13010v2
21 pages
CodeTree
No ratings yet
CodeTree
16 pages
Unlocking Efficiency in Large Language Model Inference
No ratings yet
Unlocking Efficiency in Large Language Model Inference
17 pages
M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark For Chinese Large Language Models
No ratings yet
M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark For Chinese Large Language Models
14 pages
Instruction Position Matters in Sequence Generation With Large Language Models
No ratings yet
Instruction Position Matters in Sequence Generation With Large Language Models
11 pages
Connecting Large Language Models With Evolutionary Algorithms Yields Powerful Prompt
No ratings yet
Connecting Large Language Models With Evolutionary Algorithms Yields Powerful Prompt
18 pages
Multilingual Machine Translation With Large Language Models: Empirical Results and Analysis
No ratings yet
Multilingual Machine Translation With Large Language Models: Empirical Results and Analysis
14 pages
2502.05664v1
No ratings yet
2502.05664v1
27 pages
2502.12859v1
No ratings yet
2502.12859v1
20 pages
From Quantity To Quality - Boosting LLM Performance With Self-Guided Data Selection For Instruction Tuning
No ratings yet
From Quantity To Quality - Boosting LLM Performance With Self-Guided Data Selection For Instruction Tuning
34 pages
Details For Prompt Engineering
No ratings yet
Details For Prompt Engineering
8 pages
2502.12962v1 - Unknown
No ratings yet
2502.12962v1 - Unknown
21 pages
Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities
No ratings yet
Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities
18 pages
2308.04748v2
No ratings yet
2308.04748v2
13 pages
D2LLM Decomposed and Distilled LLMs For Semantic Search 1719397862
No ratings yet
D2LLM Decomposed and Distilled LLMs For Semantic Search 1719397862
17 pages
On Discrete Prompt Optimization or Di Usion Models
No ratings yet
On Discrete Prompt Optimization or Di Usion Models
20 pages
Vulnerability-Oriented Directed Fuzzing For Binary
No ratings yet
Vulnerability-Oriented Directed Fuzzing For Binary
13 pages
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese
No ratings yet
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese
18 pages
2411.16035v1
No ratings yet
2411.16035v1
32 pages
LLMs Reasoning
No ratings yet
LLMs Reasoning
18 pages
2024_SCL_CVD_Supervised_contrastive_learning_for_code_vulnerability
No ratings yet
2024_SCL_CVD_Supervised_contrastive_learning_for_code_vulnerability
19 pages
2302.06527v4
No ratings yet
2302.06527v4
21 pages
2404.12038v3
No ratings yet
2404.12038v3
26 pages
W S M LLM F: T E D, M F M: HEN Caling Eets Inetuning HE Ffect of ATA Odel and Inetuning Ethod
No ratings yet
W S M LLM F: T E D, M F M: HEN Caling Eets Inetuning HE Ffect of ATA Odel and Inetuning Ethod
20 pages
seminar final presentation
No ratings yet
seminar final presentation
11 pages
Low-Rank Adaptation Method For Wav2vec2-Based Fake Audio Detection
No ratings yet
Low-Rank Adaptation Method For Wav2vec2-Based Fake Audio Detection
6 pages
7469_Magicoder_Empowering_Code
No ratings yet
7469_Magicoder_Empowering_Code
26 pages
LLM Hallucinations in Practical Code Generation
No ratings yet
LLM Hallucinations in Practical Code Generation
13 pages
Mixture-of-Agents Enhances Large Language Model Capabilities
No ratings yet
Mixture-of-Agents Enhances Large Language Model Capabilities
15 pages
What Language Model Architecture and Pretraining Objective Work Best For Zero-Shot Generalization
No ratings yet
What Language Model Architecture and Pretraining Objective Work Best For Zero-Shot Generalization
26 pages
Moe-I: Compressing Mixture of Experts Models Through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
No ratings yet
Moe-I: Compressing Mixture of Experts Models Through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
11 pages
Legal AI
No ratings yet
Legal AI
21 pages
2406.18219v2
No ratings yet
2406.18219v2
19 pages
Evolutionary Algo
No ratings yet
Evolutionary Algo
24 pages
Learning Applied to Ground Vehicles: Enhancing Ground Vehicle Performance through Computer Vision Learning
From Everand
Learning Applied to Ground Vehicles: Enhancing Ground Vehicle Performance through Computer Vision Learning
Fouad Sabry
No ratings yet
UNIVERSITY MANAGEMENT SYSTEM Full Report
No ratings yet
UNIVERSITY MANAGEMENT SYSTEM Full Report
20 pages
Api
No ratings yet
Api
7 pages
Abhishek OS FINAL
No ratings yet
Abhishek OS FINAL
46 pages
UCI 202 Topic 7 Notes
No ratings yet
UCI 202 Topic 7 Notes
17 pages
Qt-Interface For Volume Visualization
No ratings yet
Qt-Interface For Volume Visualization
48 pages
Software Case Tools Overview
100% (1)
Software Case Tools Overview
6 pages
Thesis Attendance Monitoring System
100% (2)
Thesis Attendance Monitoring System
6 pages
Intro To Fmdtools
No ratings yet
Intro To Fmdtools
38 pages
Dbms Lab Manual
No ratings yet
Dbms Lab Manual
37 pages
Ezhil (எழல) : A Tamil Programming Language
No ratings yet
Ezhil (எழல) : A Tamil Programming Language
6 pages
Jahid Hossan Jony: Curriculum Vitae
No ratings yet
Jahid Hossan Jony: Curriculum Vitae
3 pages
Lukas Dagne Thesis
No ratings yet
Lukas Dagne Thesis
37 pages
Mobile APP HANA Dashboard Not Working
No ratings yet
Mobile APP HANA Dashboard Not Working
4 pages
LabVIEW and Web Browser Based UIs
No ratings yet
LabVIEW and Web Browser Based UIs
22 pages
Java PTU
No ratings yet
Java PTU
76 pages
Objective:: Explain How To Take Input A Single Character and Display It, Also Displaying String
No ratings yet
Objective:: Explain How To Take Input A Single Character and Display It, Also Displaying String
3 pages
Openshift Container Platform 4.7 Cli Tools en Us
No ratings yet
Openshift Container Platform 4.7 Cli Tools en Us
123 pages
ATTENDANCE MANAGEMENT SYSTEM (PD) (1) B
No ratings yet
ATTENDANCE MANAGEMENT SYSTEM (PD) (1) B
10 pages
Online Telon Overview
No ratings yet
Online Telon Overview
14 pages
Amazon AWS: Functional Specification
No ratings yet
Amazon AWS: Functional Specification
8 pages
Computer Peripherals
No ratings yet
Computer Peripherals
5 pages
Harish Gadiyaram_VLNR Group_Salesforce Lead Consultant
No ratings yet
Harish Gadiyaram_VLNR Group_Salesforce Lead Consultant
5 pages
For All Subjects Class Notes & PDF Files Download Careerwill App Now
No ratings yet
For All Subjects Class Notes & PDF Files Download Careerwill App Now
146 pages
Digital Forensic & Incident Response
No ratings yet
Digital Forensic & Incident Response
60 pages
GE8151 Notes
No ratings yet
GE8151 Notes
118 pages
Cfadisk
No ratings yet
Cfadisk
9 pages
Railway_Reservation_System
No ratings yet
Railway_Reservation_System
6 pages
HTML Css Javascript
No ratings yet
HTML Css Javascript
88 pages
Angular Questions & Answers
No ratings yet
Angular Questions & Answers
21 pages
Diskless Remote Boot in Linux
No ratings yet
Diskless Remote Boot in Linux
14 pages