Smart Fuzzing Based On Automatic Input Format Reverse Engineering
Ji Shi1,2,3,4 , Zhun Wang∗ 2 , Zhiyao Feng2,6 , Yang Lan2 , Shisong Qin2 , Wei You5 , Wei Zou1,4 ,
Mathias Payer6 , and Chao Zhang†2
1 {CAS-KLONAT‡, BKLONSPT§}, Institute of Information Engineering, Chinese Academy of Sciences
2 Institute for Network Science and Cyberspace & BNRist, Tsinghua University; Zhongguancun Lab
3 Singular Security Lab, Huawei Technologies
4 School of Cyber Security, University of Chinese Academy of Sciences
5 Renmin University of China
6 EPFL
Figure 1: ELF file header definition. (a) ELF file header structure of 32-bit. (b) ELF file header structure of 64-bit.
[Annotated hex dumps of the header (offsets 0x00h to 0x30h) labeling Magic Number, Class, Data, Version, Osabi, Abiver., and Ei_pad (together E_ident), E_type, E_machine, E_version, Address, Address of Program Header, Address of Section Header, E_flags, ELF Header Size, Header Size, Header Number, Sec Header Size, Sec Header Num, and Sec Header Index, with a legend of field types: String, Magic, Enumeration, Offset, Size, Checksum. The address fields are 4 bytes wide in (a) and 8 bytes wide in (b).]

Listing 1: Source code taken from readelf, which is responsible for parsing ELF format input.
 1  #define BYTE_GET(field) byte_get (field, sizeof (field))
 2  static bfd_boolean get_file_header (Filedata *filedata) {
 3    ...
 4    filedata->file_header.e_machine = BYTE_GET (ehdr32.e_machine);
 5    filedata->file_header.e_shnum = BYTE_GET (ehdr32.e_shnum);
 6    ...
 7  }
 8  void init_dwarf_regnames_by_elf_machine_code (unsigned int e_machine) {
 9    dwarf_regnames_lookup_func = NULL;
10    switch (e_machine) {
11      case EM_386:
12        init_dwarf_regnames_i386 ();
13        break;
14      case EM_X86_64:
15        init_dwarf_regnames_x86_64 ();
16        break;
17      ...
18    }
19  }
20  static bfd_boolean process_file_header (Filedata *filedata) {
21    if (header->e_ident[EI_MAG0] != ELFMAG0 || ... || header->e_ident[EI_MAG3] != ELFMAG3) {
22      return error;
23    }
24    ...
25    init_dwarf_regnames_by_elf_machine_code (filedata->file_header.e_machine);
26    ...
27    if (header->e_shstrndx != SHN_UNDEF && header->e_shstrndx >= header->e_shnum) {
28      printf ("corrupt");
29    }
30    ...
31  }

An ELF file consists of several data structures (e.g., the file header, program header, and section header), each composed of several fields. As Figure 1 shows, the file header contains fields that span consecutive bytes (e.g., the magic number from offset 0x00 to 0x03) as well as single-byte fields (e.g., the class at offset 0x04). Take e_machine at offset 0x12 as an example: it indicates the machine architecture (e.g., AArch64, i386, or x86_64), which changes the structure of the ELF file (e.g., the address fields, marked with the red box, can be four or eight bytes long).
Listing 1 shows a code snippet of readelf, which parses ELF inputs. Parsing the input file takes two main steps. The first step reads the input and initializes variables corresponding to the input fields; for example, lines 3 to 6 read fields from the input and initialize the corresponding variables (e.g., file_header.e_machine). The second step further processes these initialized variables in the function process_file_header. Different BBs process different types of variables (i.e., input fields). For example, the four-byte magic number is compared one byte at a time on line 21, while e_machine is handled as an enumeration by a switch-case statement.
From this example, we observe the following:
Observation 1: In most cases, the bytes of an indivisible field are parsed together in one BB. For example, the two-byte field e_machine is processed as a whole in the block starting at line 4 and in another block starting at line 10. Likewise, the field e_shstrndx is parsed as a whole in the BBs at line 27. However, in corner cases, parts of one field can be parsed in different BBs (e.g., the magic number, compared byte by byte on line 21). Developers' coding style and compiler optimizations affect whether a field is parsed as one entity or piece by piece. We illustrate this further in §5.1.1.
Conversely, one BB may process multiple fields and be executed multiple times at runtime. For example, the BBs in the BYTE_GET function are executed multiple times to read different fields from the input buffer and assign them to different variables.
Observation 2: The code processing fields of different types shows different patterns. For example, an enumeration variable (e.g., e_machine) is very likely to be processed by a switch-case statement, and a size variable (e.g., e_shnum) may be processed by arithmetic operations.
Observation 3: The structure of the input may differ, and the program dispatches different code to parse it. For example, a different e_machine value may indicate a different data structure, such as the length of the address fields marked with the red box in Figure 1. If a new structure is found during fuzzing, the fuzzer should reassign power and use the corresponding format knowledge to gain more coverage.
From these observations, we conclude that existing solutions have the following limitations. First, splitting fields at the instruction level is not generally appropriate. For example, the instructions in BYTE_GET parse multiple fields, which causes different fields to be merged erroneously. Second, most existing solutions for field type identification rely on manually extracted code patterns, which are labor-intensive to build and cannot cover complicated cases. For instance, one input field type may be processed by several different BB patterns. Third, existing format-aware fuzzing solutions have limitations in power scheduling during fuzzing. For example, ProFuzzer [8]
[Figure: AIFORE overview. Field Boundary Analysis (§3.1): Binary, Initial Seed, Taint Analysis producing (insn, offset) pairs, BB Level Merge, Minimum Cluster, Field Boundary. Field Type Classification (§3.2): Field Feature Extraction, CNN, Field Type, Format Template. Fuzzing with Power Scheduling (§3.3): Seed Analysis, Format-Aware Mutation, Power Reassign, Valuable Seed.]
AFLSmart with well-prepared pit files, the distributions vary for different formats and programs. For PNG, the average v(s) drops to 35.23%, and 29.90% of the seeds have a v(s) of at least 50%, which is the key threshold AFLSmart uses to judge the quality of a seed file. In AIFORE, we design a new power scheduling algorithm that reassigns energy to formats that are seldom fuzzed.
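To make the idea of reassigning energy concrete, the following C sketch splits a fixed energy budget across the formats observed so far, giving more energy to formats that have received fewer test cases. It is a hypothetical illustration only: the inverse-frequency rule and the names reassign_energy, fuzzed, budget, and energy are our assumptions, not AIFORE's scheduling algorithm described in §3.3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not AIFORE's algorithm: split a fixed energy
 * budget across the input formats seen so far (e.g., 32-bit vs. 64-bit ELF
 * headers) so that formats with fewer test cases receive more energy. */
static void reassign_energy(const unsigned long fuzzed[], size_t n,
                            unsigned long budget, unsigned long energy[])
{
    double total_weight = 0.0;

    assert(n > 0);

    /* Weight each format by 1 / (test cases so far + 1); the +1 avoids
     * dividing by zero for a format that has never been fuzzed. */
    for (size_t i = 0; i < n; i++)
        total_weight += 1.0 / (double)(fuzzed[i] + 1);

    /* Each format gets a share of the budget proportional to its weight.
     * Truncating to an integer means the shares may sum to slightly less
     * than the budget. */
    for (size_t i = 0; i < n; i++) {
        double weight = 1.0 / (double)(fuzzed[i] + 1);
        energy[i] = (unsigned long)((double)budget * weight / total_weight);
    }
}
```

For example, with three formats that have received 999, 99, and 0 test cases and a budget of 10000, the never-fuzzed format receives 9891 units, the second 98, and the first 9. An inverse-frequency weight is only one simple way to favor rarely fuzzed formats.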
3 Design
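[Figure: tainted-instruction example, with movzx eax, byte ptr [rax+1] and cmp al, 45h ('E') each annotated with input offset [1], followed by jnz short loc_41856C; and a plot with two y-axes, Accuracy (0.80 to 1.0) and Growth Rate (0.10 to 0.15).]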
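The intuition behind the BB-level merge, following Observation 1, can be sketched in C as below. We assume a hypothetical taint trace in which each record names the basic block that read an input byte and which dynamic execution of that block it was; consecutive offsets read by the same execution of the same block are merged into one candidate field. The record layout and the names TaintRecord, Field, merge_fields, bb_id, and exec_id are illustrative assumptions, and the sketch omits AIFORE's minimum cluster step, so it is not the tool's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical taint record: which dynamic block execution read one byte. */
typedef struct {
    unsigned bb_id;   /* static basic-block identifier (hypothetical)  */
    unsigned exec_id; /* which dynamic execution of that block read it */
    unsigned offset;  /* input byte offset that was read               */
} TaintRecord;

/* A candidate field: a run of consecutive input bytes. */
typedef struct {
    unsigned start;   /* first input offset of the field */
    unsigned len;     /* number of bytes in the field    */
} Field;

/* Group input bytes into candidate fields: consecutive offsets read by the
 * same dynamic execution of the same basic block form one field.
 * Records must be sorted by (bb_id, exec_id, offset), and `out` must have
 * room for up to n fields. Returns the number of fields written. */
static size_t merge_fields(const TaintRecord *r, size_t n, Field *out)
{
    size_t nfields = 0;

    for (size_t i = 0; i < n; i++) {
        int extends_previous = i > 0 &&
            r[i].bb_id == r[i - 1].bb_id &&
            r[i].exec_id == r[i - 1].exec_id &&
            r[i].offset == r[i - 1].offset + 1;

        if (extends_previous) {
            out[nfields - 1].len++;       /* grow the current field */
        } else {
            out[nfields].start = r[i].offset; /* start a new field */
            out[nfields].len = 1;
            nfields++;
        }
    }
    assert(nfields <= n); /* never more fields than tainted bytes */
    return nfields;
}
```

On a hypothetical trace loosely modeled on Listing 1, where one BYTE_GET block reads e_type (offsets 0x10 to 0x11) in its first execution and e_machine (0x12 to 0x13) in its second, and four separate compare blocks read the magic bytes 0x00 to 0x03, merge_fields returns six candidate fields. Separating executions keeps e_type and e_machine apart even though the same block reads adjacent bytes, whereas the magic number comes out as four one-byte fields, the corner case from Observation 1 that a later clustering step would need to merge.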
ically, for programs in the training set, we collected some new file formats supported by them and applied our trained model to predict the field types of these new formats.
The upper part of Table 5 shows the evaluation result. We consider Top-2 accuracy because the Top-K suggestions from the model are still valuable for fuzzing: the Top-K field type knowledge can help the fuzzer reduce the mutation space and find high-quality test cases faster.
We have two findings from this table: (1) AIFORE predicts field types in unseen formats with high accuracy, with an average Top-1 accuracy above 80% and an average Top-2 accuracy above 90%; (2) the model's performance is insensitive to the programs' optimization levels. Although compiler optimizations might change the features of the code, our model learns patterns that are stable across optimizations.
• Experiment 4: For untrained programs, can the model predict their (unseen) input formats? We apply the aforementioned trained model to analyze untrained programs (marked as X in Table 2) and predict their input formats. We choose 7 untrained programs (and 2 protocols shown in the Appendix) and the corresponding formats to test the accuracy of AIFORE. These untrained programs are chosen based on two criteria: (1) they should be able to parse the chosen formats, and (2) they do not share libraries that process the chosen formats with programs in the training set. The latter requirement ensures a fair comparison. Note that the chosen formats (e.g., ELF) may be processed by other trained programs (e.g., readelf) in the training set, but they are still unseen by the untrained program (e.g., elfutils-readelf).
The bottom of Table 5 shows the evaluation result. AIFORE predicts field types with an average Top-1 accuracy of 81% and an average Top-2 accuracy of 88%. Even with test data mixed across different compiler optimizations, AIFORE still predicts field types based on how the program parses each field.
5.2 RQ2: Comparison of Format Extraction
In this section, we compare AIFORE, using the CNN model trained on programs with mixed optimizations (i.e., -O0 to -Os), against existing input format reverse engineering works, such as Polyglot [3], the Protocol Informatics (PI) project [30], ProFuzzer [8], AFL-Analyze [14], and TIFF-fuzzer [6], to measure format reversing performance.
We collect the field boundary and type results of AIFORE as described in §5.1.1 and §5.1.2, and compare them with those of ProFuzzer, AFL-Analyze, and TIFF-fuzzer. Other tools, such as WEIZZ [9], can also extract input formats. However, we did not compare AIFORE's field boundary accuracy with theirs, since they cannot extract all field boundaries even when the target program already parses those fields. We analyze concrete cases to explain these false negatives in Appendix B.
Metrics. For the field boundary analysis, we calculate each solution's accuracy as described in §5.1.1, i.e., the number of correctly identified fields divided by the total number of fields in the ground truth.
We process the field type results carefully, since different solutions focus on different field types. For example, ProFuzzer classifies field types into 6 categories, while AFL-Analyze recognizes only 3 semantic types. We manually check the results produced by each solution and calculate the accuracy accordingly. For AFL-Analyze, we only check whether its results match the 3 semantic types it defines (i.e., raw data, magic number, and length). We check ProFuzzer's results similarly. For AIFORE, we manually check whether the Top-1 result matches the 6 semantic types we define in §3.2.1.
In addition, we measure the average time each solution takes to analyze one input format.
Test Targets. For fairness, we choose 4 format-program pairs with different accuracy levels from the training set (Table 4) and 4 from the unseen formats and programs (Table 5). We intend to investigate how other solutions perform on targets for which AIFORE achieves different accuracy levels. The chosen targets are shown in Table 6. We compile all target programs with their default compiler optimizations. Regarding input samples, we choose ones that are not in the training set for the trained programs and randomly choose some input samples for the untrained
programs. The details of the training and validation sets for field type prediction are described in the first part of §5.1.2.
Input Size. The input size affects the run-time performance of format reverse engineering. AIFORE, ProFuzzer, and AFL-Analyze all rely on dynamic analysis to predict field types, but with different methods. Both ProFuzzer and AFL-Analyze mutate each input byte and rerun the program to obtain the coverage bitmap as the profile of the current execution. Based on how this profile varies, they analyze the type features of each byte and combine consecutive bytes with similar features into a field of the corresponding type. Thus, a larger file may take more time to analyze. AIFORE infers field boundaries and field types from the taint trace and the CNN model. Although the model is trained only once, the taint analysis is time-consuming. To better understand how the input size affects the performance of each work, we split the input files into groups according to their size and measure the time to complete the analysis in each group.
Results. Table 6 shows the accuracy of the field boundary and type analysis, and Table 7 shows the average time cost to parse an input.

Table 6: Field boundary/type accuracy comparison between different input format reverse engineering solutions (each cell: field boundary accuracy / field type accuracy).
Large Seeds
Group | Format | Program | Size (bytes) | AIFORE | ProFuzzer | AFL-Analyze | TIFF-fuzzer
Trained | ELF | readelf | 808 | 96.43%/87.73% | 37.40%/66.25% | 43.73%/40.00% | 94.30%/(N/A)
Trained | GIF | gifsicle | 695 | 97.64%/87.31% | 12.59%/52.94% | 6.44%/11.76% | 71.96%/(N/A)
Trained | TIFF | tiffdump | 448 | 82.23%/89.33% | 27.13%/42.11% | 13.19%/21.05% | 81.30%/(N/A)
Trained | TTF | freetype | 542 | 67.43%/83.22% | 30.28%/65.69% | 9.93%/20.44% | 5.56%/(N/A)
Untrained | PCAP | tcpdump | 894 | 88.64%/82.34% | 28.84%/71.14% | 0.00%/39.04% | 81.20%/(N/A)
Untrained | WAV | sfinfo | 572 | 100.00%/95.55% | 63.85%/66.67% | 48.77%/57.69% | 100.00%/(N/A)
Untrained | BMP | jasper | 630 | 45.34%/83.54% | 12.11%/91.18% | 1.35%/32.35% | 24.24%/(N/A)
Untrained | XLS | xls2csv | 6656 | 81.25%/74.56% | (N/A)/(N/A)* | 22.16%/0.56% | 25.50%/(N/A)
Average | | | 1406 | 82.37%/85.45% | 26.53%/57.00% | 18.20%/27.86% | 60.51%/(N/A)
Small Seeds
Group | Format | Program | Size (bytes) | AIFORE | ProFuzzer | AFL-Analyze | TIFF-fuzzer
Trained | ELF | readelf | 324 | 98.91%/91.23% | 74.26%/66.67% | 55.70%/38.60% | 97.40%/(N/A)
Trained | GIF | gifsicle | 198 | 97.64%/72.23% | 50.29%/60.00% | 31.96%/12.00% | 65.30%/(N/A)
Trained | TIFF | tiffdump | 166 | 84.33%/88.45% | 37.01%/40.00% | 21.05%/21.67% | 82.33%/(N/A)
Trained | TTF | freetype | 148 | 72.23%/85.32% | 2.20%/0.00% | 1.35%/21.21% | 40.00%/(N/A)
Untrained | PCAP | tcpdump | 114 | 93.18%/84.23% | 73.18%/77.78% | 39.66%/44.44% | 85.60%/(N/A)
Untrained | WAV | sfinfo | 44 | 100.00%/91.22% | 56.25%/63.64% | 56.25%/45.45% | 100.00%/(N/A)
Untrained | BMP | jasper | 58 | 45.34%/84.33% | 75.00%/82.35% | 28.12%/23.53% | 22.22%/(N/A)
Untrained | XLS | xls2csv | 5632 | 94.40%/78.32% | (N/A)/(N/A)* | 0.00%/51.02% | 33.33%/(N/A)
Average | | | 836 | 85.75%/84.42% | 46.02%/48.81% | 29.26%/32.24% | 65.77%/(N/A)
* The tool failed to produce a result within 24 hours.

From Table 6, we can see that AIFORE achieves higher accuracy in both field boundary recognition and field type prediction. AIFORE also performs better on trained programs than on untrained ones. This is consistent with our expectation, since the model has accurately learned how those programs parse different field types.
The average time to parse a file with AIFORE (and TIFF-fuzzer) does not differ significantly across size classes, while ProFuzzer and AFL-Analyze spend much more time parsing larger files. In Table 7, B denotes boundary identification, whose time includes the taint analysis (which accounts for most of the total time), and T denotes type prediction, whose time cost remains stable during the test since we have a trained model with a stable prediction time. We find that in ProFuzzer and AFL-Analyze, the profiling phase consumes most of the time, and its cost is strongly related to the input size, whereas the taint analysis in AIFORE (and TIFF-fuzzer) is not very sensitive to the input size. Since AFL-Analyze and TIFF-fuzzer perform only rough analyses, they have shorter execution times but achieve lower accuracy than AIFORE on average.
Moreover, we also reverse engineer two protocols. The detailed results are presented in Appendix A. Compared with other tools [3, 30, 31], AIFORE provides format knowledge that is not only more accurate in terms of field boundaries and types but also more detailed.
5.3 RQ3: Comparison of Fuzzing Performance
Several fuzzers [2, 6, 8, 13, 15, 32] already try to extract format knowledge and perform power scheduling to optimize the fuzzing process. We compare AIFORE against them to investigate how much our format analysis and power scheduling improve fuzzing efficiency. Note that for field type classification, once the model has been built, we do not need to retrain it during the fuzzing process.
Target Programs and Seeds. We use the 15 file-parsing programs shown in Table 2, of which 6 are trained and 9 have not been seen by AIFORE. For each file type, we randomly choose one input file as the initial seed for all fuzzers.
Fuzzers. We compare AIFORE with 6 fuzzers, including format-aware fuzzers, format-unaware but popular fuzzers, and power scheduling optimization fuzzers, i.e., AFL [33], AFLFast [15], ProFuzzer [8], TIFF-fuzzer [6], WEIZZ [9], and EcoFuzz [13]. AFL is one of the most popular grey-box fuzzing tools, and many fuzzers are built on top of it. AFLFast and EcoFuzz optimize AFL by prioritizing seeds that may lead to new coverage. ProFuzzer has a dynamic
probing stage to infer field boundaries and field types for improving fuzzing efficiency. TIFF-fuzzer and WEIZZ use format knowledge to increase fuzzing efficiency.

Table 8: Basic block coverage after 24 hours.
Format | Program | AIFORE (B) | AIFORE (B+T) | AIFORE (B+T+P) | AFL | AFLFast | EcoFuzz | ProFuzzer | TIFF-fuzzer | WEIZZ
BMP | jasper | 7123 | 7705 | 7887 | 6694 | 5926 | 7777 | 7437 | 5331 | 6344
ELF | readelf* | 9880 | 13798 | 17985 | 9652 | 12020 | 13791 | 17872 | 2426 | 15720
ELF | elfutils-readelf | 7213 | 7541 | 8308 | 6995 | 5873 | 6608 | 7978 | 2180 | 6519
GIF | gifsicle* | 4702 | 5062 | 5435 | 4528 | 4634 | 4675 | 5315 | 4146 | 4578
GIF | magick* | 14221 | 14414 | 16685 | 13529 | 13684 | 12335 | 14512 | 8351 | 10002
GIF | gif2tiff* | 2458 | 2451 | 2576 | 2372 | 2466 | 2500 | 2580 | 2014 | 2383
JPG | exiv2 | 16529 | 17500 | 19245 | 14117 | 14417 | 16602 | 18952 | 7637 | 11437
JPG | jhead | 1228 | 1258 | 1281 | 1244 | 1244 | 1244 | 1253 | 856 | 1246
PCAP | tcpdump | 16772 | 19201 | 22963 | 14535 | 13743 | 19823 | 22930 | 1561 | 11837
PNG | pngtest | 3219 | 3223 | 3629 | 3217 | 3239 | 3268 | 3617 | 2846 | 4802
TIFF | tiffdump* | 1145 | 1155 | 1177 | 1054 | 1073 | 1104 | 1135 | 522 | 1128
TTF | freetype_parser* | 9893 | 10520 | 10370 | 6362 | 6309 | 10030 | 7278 | 4513 | 7316
WAV | sfinfo | 2477 | 2501 | 2632 | 2326 | 2326 | 2379 | 2498 | 2019 | 2566
XLS | xls2csv | 2476 | 2699 | 2706 | 2512 | 2510 | 2619 | 2424 | 1841 | 2470
ZIP | 7za | 24696 | 25266 | 28159 | 21429 | 22089 | 25226 | 27979 | 13299 | 21833
* Trained pairs of formats and programs. B: Field Boundary; T: Field Type; P: Power Scheduling Algorithm.

5.3.1 Code Coverage Result
For each target program, we run all fuzzers for 24 hours with 5 repetitions. We then measure the average BB coverage instead of path coverage, since not all fuzzers use the same metric to calculate paths.
We can draw the following conclusions from the results in Table 8. First, AIFORE increases coverage significantly for most targets, except pngtest, for which WEIZZ performs best. The reason is that WEIZZ can not only detect the checksum field in the file but also correct the checksum values, whereas AIFORE does not support checksum correction even though it can recognize that a field is a checksum field. Across all targets, AIFORE (B+T+P) (i.e., with field boundary analysis, field type analysis, and the power scheduling algorithm enabled) achieves an average coverage increment of 6% and 26% over ProFuzzer and WEIZZ, respectively, as shown in Figure 7.

Figure 7: Coverage comparison between AIFORE and other SOTA fuzzers, and coverage increment by each module of AIFORE.
[Box plot; y-axis "Relative Increment" (−20% to 80%, mean marked); x-axis groups ProFuzzer/(BTP), WEIZZ/(BTP), AFL/(B), (B)/(BT), and (BT)/(BTP).]

Second, the coverage increment with AIFORE is significant even when the target program and the file format are unseen.
Third, AIFORE achieves the best overall performance. TIFF-fuzzer aims to maximize the likelihood of triggering a bug, so it is not good at increasing code coverage. Although ProFuzzer also outperforms format-unaware fuzzers like AFL and AFLFast in most cases, its analysis time is proportional to the input size, which prevents it from scaling to large inputs. Observe that, apart from TIFF-fuzzer, ProFuzzer performs worst on XLS. The reason is that even the smallest XLS seed is larger than 1k bytes, which is too large for ProFuzzer.
5.3.2 Bugs Found by AIFORE
Coordinated Vulnerability Disclosure. To explore AIFORE's bug-finding ability, we fuzz the real-world programs from Table 4 for 7 days. With the help of format knowledge, AIFORE finds 34 bugs in total after manual deduplication (20 of which are not found by other fuzzers), including 10 buffer overflow bugs (CWE-122), 18 NULL pointer dereference bugs (CWE-476), and 6 double-free bugs (CWE-617). Of the 20 bugs missed by other fuzzers, 18 were already known to vendors. AIFORE finds 2 vulnerabilities in the newest version of xls2csv; we have responsibly reported both bugs to Launchpad, with track IDs 1901462 and 1901463.⁵ Here we analyze 2 of the 34 bugs to illustrate how AIFORE finds them.
⁵ https://launchpad.net/. The bugs had not been made public when we submitted the paper.
Sfinfo. Sfinfo is a program that parses WAV files and shows their properties or metadata to users. AIFORE reports a heap overflow bug during testing. The bug is located in a function that parses an enumeration field called FormatTag. This field represents how the wave data is stored and affects how the following bytes are processed. If the FormatTag field in the input file is mutated to WAVE_FORMAT_ADPCM, the length data may be calculated incorrectly, which then causes a heap overflow. The key to triggering this bug is
setting FormatTag to WAVE_FORMAT_ADPCM, which is easier to achieve if the fuzzer is aware of the boundary and type of this field. FormatTag is an enumeration field, and AIFORE identifies its boundary and type correctly. It then mutates the field accordingly and triggers this bug.
Readelf. During the testing of readelf, AIFORE finds a stack overflow bug. The bug can be triggered when a 4-byte offset field is mutated to an invalid value. The offset field indicates the location of a string. When the string is longer than 256 bytes, a call to sprintf causes a stack buffer overflow. The root cause of this bug is twofold: (1) the offset field must be set to point to an invalid string region, and (2) the string field it points to must be long enough to trigger the stack overflow. Backtracking the log, we find that AIFORE identifies the 4-byte offset field correctly and mutates the field as a whole to an invalid value. The invalid offset points to a string field, and AIFORE mutates that string field to a greater length, which triggers the bug.
5.4 RQ4: Contribution of Each Module
In this section, we analyze how each module of AIFORE contributes to the fuzzing performance. We investigate the BBs covered by the different components during the fuzzing experiments in §5.3. The results are shown in columns 3 to 5 of Table 8, and we draw a boxplot in Figure 7. We draw the following conclusions. First, the boundary identification module helps AIFORE mutate bytes that share the same semantics and belong to one field, which lets AIFORE cover 9.3% more BBs than AFL. Second, AIFORE uses the AI model to predict field types, which tells the fuzzer how to mutate; the field type prediction module increases code coverage by another 6.9% on average across all targets. Finally, AIFORE uses a novel power scheduling algorithm to help the fuzzer assign more energy to formats that are not mutated adequately, which brings an 8.8% increment in coverage. Unlike AFLSmart, our approach does not rely on the constraints in a pit file. We further analyze how it works with a case study in Appendix B.3.
6 Related Work
6.1 Format-Aware Fuzzing
Format-aware fuzzers try to understand the format of inputs to increase fuzzing efficiency. TIFF-fuzzer [6] performs bug-directed mutation by inferring program variable types (e.g., int, char*) of input fields. ProFuzzer [8] uses predefined rules to deduce fields and their types. However, adding more data types is labor-intensive, and the rules may not be accurate enough to cover all cases. Steelix [34] identifies magic numbers in the input to pass value validation, but it does not analyze fields of other types. Intriguer [2] uses lightweight taint analysis to find multi-byte fields processed by instructions in the program and then uses this field-level knowledge to optimize symbolic execution. From the perspective of file formats, it can extract only a small part of the fields because of its incomplete traces. WEIZZ [9] splits the input into fields according to dependencies between input bytes and comparison instructions, which ignores bytes that do not affect the program's control flow.
AIFORE extracts more accurate, concrete, and complete format knowledge of field boundaries and semantic types, which improves fuzzing. In addition, AIFORE uses a novel power scheduling algorithm to balance power across different formats.
6.2 Input Format Reverse Engineering
Input format reverse engineering works can be classified into two main categories according to the problem they solve.
Field Boundary Identification. Identifying the boundaries of different fields in the input is fundamental to reversing the format. Several works split the input into fields based on taint analysis [3, 4, 35–38] or on trace analysis of a few network messages [5, 30, 39–41]. The closest work to AIFORE is Tupni [4], which uses weighted instruction-level taint information along with a greedy algorithm to identify different fields. However, instruction-level taint information may produce false positives since it does not consider semantics. Besides, Tupni [4] is coarse-grained: a record it identifies may contain several fields rather than a single field. AIFORE instead treats the BB, rather than the instruction, as the minimum functional unit. AutoFormat [35] combines dynamic taint analysis and the call stack to build a field tree. It relies heavily on tokenization and on operations over the tree itself rather than on how the program processes different fields. MIMID [42] and AUTOGRAM [43] also rely on dynamic taint analysis and call stack analysis. They extract context-free grammars for text-based inputs, rather than the context-sensitive formats of binary inputs, which are more complex. Reverx [44] splits the input into fields by predefined delimiters and cannot be used for binary messages. Some other works [30, 45] identify fields by analyzing a large number of high-quality inputs. In contrast, AIFORE can extract field knowledge from only one input.
Field Type Identification. Identifying the types of distinct fields is another important problem. Current works [3, 34, 46, 47] mainly use dynamic program analysis and predefined rules to classify input fields into types. Dispatcher [48] identifies field types by leveraging taint analysis and heuristic rules, similar to TIFF-fuzzer. Polyglot [3] tries to identify keywords and separators in protocol messages. It also tries to identify length fields by heuristic rules. However, the
identified field types are limited, and the heuristic rules break down when the message format is not strict; for example, a protocol may allow multiple delimiters.
Unlike existing works, AIFORE uses a machine-learning model to extract features and predict field types automatically. We train the model on multi-dimensional semantic features from the tainted instructions, format strings, and library calls in related BBs. In this way, AIFORE requires no additional manually defined rules and is therefore more general.
6.3 Binary Analysis with AI
The closest works in AI-based binary analysis include binary similarity detection [23, 49] and semantic information recovery [50, 51]. For binary similarity detection, [23] uses BERT and a CNN to find similar code, and [52] uses Structure2vec to vectorize CFGs. Unlike these works, AIFORE aims to classify input fields into different categories, which is a classification problem rather than an embedding problem. Other works [50, 51] try to recover semantic information (e.g., function names and variable types) from binary programs with AI. However, a program variable type is simpler than a field type. For example, a variable of type int may represent either an offset or a size field. AIFORE aims to recover the semantic type of an input field, which is harder but more useful.
7 Limitation
While AIFORE achieves good accuracy in field boundary detection and type analysis, some limitations remain. As described, our method fundamentally relies on dynamic taint analysis. Thus, the key limitation of AIFORE is that its results depend heavily on how the program parses the input. For example, if a program does not parse some fields, AIFORE cannot extract their format knowledge. Further, if a program parses individual bytes of a field separately, AIFORE may produce false positives. However, we can mitigate this issue by feeding the input to several programs that parse this format. The second limitation is that our analysis runs at byte granularity, which means bit-level fields cannot be analyzed. Supporting bit-level analysis is technically possible but requires further engineering and optimization. Byte-level analysis also means that AIFORE does not support text-based input, whose minimum unit is a keyword rather than a byte. Third, it is hard for AIFORE to analyze encrypted inputs or obfuscated code (e.g., in malware or ransomware). Encrypted input carries no obvious format information, and developers may also use obfuscation to hide how the code parses the file. Several orthogonal works, for example, Reformat [38], try to reverse the format of encrypted input.
8 Conclusion
Input format knowledge helps fuzzers discover vulnerabilities in programs. Existing approaches struggle to correctly recognize or apply the input format. We introduce AIFORE to automatically reverse engineer input formats and then guide the fuzzing process. Specifically, we propose to use taint analysis to infer the basic blocks responsible for processing each input byte, and to group input bytes with a minimum cluster algorithm. Further, we use a neural network to infer the types of input fields based on the behavior of basic blocks. Based on this input knowledge, we present a novel power scheduling algorithm for fuzzers. A systematic evaluation shows that this solution is more effective and efficient than existing baselines.
Acknowledgements
This work was supported by the National Key Research and Development Program of China (2021YFB2701000), the National Natural Science Foundation of China (61972224), and the Beijing National Research Center for Information Science and Technology (BNRist) under Grant BNR2022RC01006.
References
[1] J. Chen, J. Jiang, H. Duan, N. Weaver, T. Wan, and V. Paxson, "Host of Troubles: Multiple Host Ambiguities in HTTP Implementations," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. Vienna, Austria: ACM, Oct. 2016, pp. 1516–1527.
[2] M. Cho, S. Kim, and T. Kwon, "Intriguer: Field-level constraint solving for hybrid fuzzing," in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 515–530.
[3] J. Caballero, H. Yin, Z. Liang, and D. Song, "Polyglot: Automatic extraction of protocol message format using dynamic binary analysis," in Proceedings of the 14th ACM Conference on Computer and Communications Security, 2007, pp. 317–329.
[4] W. Cui, M. Peinado, K. Chen, H. J. Wang, and L. Irun-Briz, "Tupni: Automatic reverse engineering of input formats," in Proceedings of the 15th ACM Conference on Computer and Communications Security, 2008, pp. 391–402.
[5] W. Cui, J. Kannan, and H. J. Wang, "Discoverer: Automatic protocol reverse engineering from network traces," in USENIX Security Symposium, 2007, pp. 1–14.
[6] V. Jain, S. Rawat, C. Giuffrida, and H. Bos, "TIFF: Using input type inference to improve fuzzing," in Proceedings of the 34th Annual Computer Security Applications Conference, 2018, pp. 505–517.
[7] Z. Lin, X. Zhang, and D. Xu, "Automatic reverse engineering of data structures from binary execution," in Proceedings of the 11th Annual Information Security Symposium, 2010, pp. 1–1.
[8] W. You, X. Wang, S. Ma, J. Huang, X. Zhang, X. Wang, and B. Liang, "ProFuzzer: On-the-fly input type probing for better zero-day vulnerability discovery," in 2019 IEEE Symposium on Security and Privacy (SP). IEEE, 2019, pp. 769–786.
[9] A. Fioraldi, D. C. D'Elia, and E. Coppa, "WEIZZ: Automatic grey-box fuzzing for structured binary formats," in Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2020, pp. 1–13.
[10] V. Pham, M. Böhme, A. E. Santosa, A. R. Caciulescu, and A. Roychoudhury, "Smart greybox fuzzing," IEEE Transactions on Software Engineering, 2019.
[11] "The peach project," https://www.peach.tech/.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105, 2012.
[13] T. Yue, P. Wang, Y. Tang, E. Wang, B. Yu, K. Lu, and X. Zhou, "EcoFuzz: Adaptive Energy-Saving greybox fuzzing as a variant of the adversarial Multi-Armed bandit," in 29th USENIX Security Symposium (USENIX Security 20), 2020, pp. 2307–2324.
[14] "Automatically inferring file syntax with afl-analyze," 2016, https://lcamtuf.blogspot.com/2016/02/say-hello-to-afl-analyze.html.
[15] M. Böhme, V.-T. Pham, and A. Roychoudhury, "Coverage-based greybox fuzzing as markov chain," IEEE Transactions on Software Engineering, vol. 45, no. 5, pp. 489–506, 2017.
[16] M. I. Jordan and T. M. Mitchell, "Machine learning: Trends, perspectives, and prospects," Science, vol. 349, no. 6245, pp. 255–260, 2015.
[17] J. Wang, Y. Yang, J. Mao, Z. Huang, C. Huang, and W. Xu, "Cnn-rnn: A unified framework for multi-label image classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2285–2294.
[18] J. Lee, T. Avgerinos, and D. Brumley, "Tie: Principled reverse engineering of types in binary programs," 2011.
[19] A. Slowinska, T. Stancescu, and H. Bos, "Howard: A dynamic excavator for reverse engineering data structures," in NDSS, 2011.
[20] "Tool interface standard (tis) executable and linking format (elf) specification," 1995, https://refspecs.linuxfoundation.org/elf/elf.pdf.
[21] Y. Shoshitaishvili, R. Wang, C. Hauser, C. Kruegel, and G. Vigna, "Vex," https://github.com/angr/pyvex.
[22] H. Lee and H. Kwon, "Going deeper with contextual cnn for hyperspectral image classification," IEEE Transactions on Image Processing, vol. 26, no. 10, pp. 4843–4855, 2017.
[23] Z. Yu, R. Cao, Q. Tang, S. Nie, J. Huang, and S. Wu, "Order matters: Semantic-aware neural networks for binary code similarity detection," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 01, 2020, pp. 1145–1152.
[24] A. Rebert, S. K. Cha, T. Avgerinos, J. Foote, D. Warren, G. Grieco, and D. Brumley, "Optimizing seed selection for fuzzing," in 23rd USENIX Security Symposium (USENIX Security 14). San Diego, CA: USENIX Association, Aug. 2014, pp. 861–875. [Online]. Available: https://www.usenix.org/conference/usenixsecurity14/technical-sessions/presentation/rebert
[25] S. Rawat, V. Jain, A. Kumar, L. Cojocar, C. Giuffrida, and H. Bos, "Vuzzer: Application-aware evolutionary fuzzing," in NDSS, vol. 17, 2017, pp. 1–14.
[26] V. P. Kemerlis, G. Portokalidis, K. Jee, and A. D. Keromytis, "libdft: Practical dynamic data flow tracking for commodity systems," in Proceedings of the 8th ACM SIGPLAN/SIGOPS Conference on Virtual Execution Environments, 2012, pp. 121–132.
[27] F. Wang and Y. Shoshitaishvili, "Angr-the next generation of binary analysis," in 2017 IEEE Cybersecurity Development (SecDev). IEEE, 2017, pp. 8–9.
[28] Lonami, "Autoit scripting language," 2018, https://www.autoitscript.com/site/.
[29] G. Klees, A. Ruef, B. Cooper, S. Wei, and M. Hicks, "Evaluating fuzz testing," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018, pp. 2123–2138.
[30] M. A. Beddoe, "Network protocol analysis using bioinformatics algorithms," Toorcon, 2004.
[31] A. Orebaugh, G. Ramirez, and J. Beale, Wireshark & Ethereal network protocol analyzer toolkit. Elsevier, 2006.
[32] D. She, A. Shah, and S. Jana, "Effective Seed Scheduling for Fuzzing with Graph Centrality Analysis," in 2022 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, Apr. 2022, pp. 1558–1558.
[33] M. Zalewski, "American fuzzy lop," http://lcamtuf.coredump.cx/afl/, 2014.
[34] Y. Li, B. Chen, M. Chandramohan, S.-W. Lin, Y. Liu, and A. Tiu, "Steelix: program-state based binary fuzzing," in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, 2017, pp. 627–637.
[35] Z. Lin, X. Jiang, D. Xu, and X. Zhang, "Automatic protocol format reverse engineering through context-aware monitored execution," in NDSS, vol. 8. Citeseer, 2008, pp. 1–15.
[36] P. M. Comparetti, G. Wondracek, C. Kruegel, and E. Kirda, "Prospex: Protocol specification extraction," in 2009 30th IEEE Symposium on Security and Privacy. IEEE, 2009, pp. 110–125.
[37] B. Cui, F. Wang, T. Guo, G. Dong, and B. Zhao, "Flowwalker: a fast and precise off-line taint analysis framework," in 2013 Fourth International Conference on Emerging Intelligent Data and Web Technologies. IEEE, 2013, pp. 583–588.
[38] Z. Wang, X. Jiang, W. Cui, X. Wang, and M. Grace, "Reformat: Automatic reverse engineering of encrypted messages," in European Symposium on Research in Computer Security. Springer, 2009, pp. 200–215.
[39] S. Kleber, H. Kopp, and F. Kargl, "NEMESYS: Network message syntax reverse engineering by analysis of the intrinsic structure of individual messages," in 12th USENIX Workshop on Offensive Technologies (WOOT 18), 2018.
[40] I. Bermudez, A. Tongaonkar, M. Iliofotou, M. Mellia, and M. M. Munafò, "Towards automatic protocol field inference," Computer Communications, vol. 84, pp. 40–51, 2016.
[41] J. Kannan, J. Jung, V. Paxson, and C. E. Koksal, "Semi-automated discovery of application session structure," in Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, 2006, pp. 119–132.
[42] R. Gopinath, B. Mathis, and A. Zeller, "Mining input grammars from dynamic control flow," in Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2020. New York, NY, USA: Association for Computing Machinery, 2020, pp. 172–183.
[43] M. Höschele and A. Zeller, "Mining input grammars with AUTOGRAM," in 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C), 2017, pp. 31–34.
[44] J. Antunes, N. Neves, and P. Verissimo, "Reverx: Reverse engineering of protocols," 2011.
[45] I. Bermudez, A. Tongaonkar, M. Iliofotou, M. Mellia, and M. M. Munafò, "Automatic protocol field inference for deeper protocol understanding," in 2015 IFIP Networking Conference (IFIP Networking). IEEE, 2015, pp. 1–9.
[46] T. Wang, T. Wei, G. Gu, and W. Zou, "Taintscope: A checksum-aware directed fuzzing tool for automatic software vulnerability detection," in 2010 IEEE Symposium on Security and Privacy. IEEE, 2010, pp. 497–512.
[47] Z. Lin and X. Zhang, "Deriving input syntactic structure from execution," in Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2008, pp. 83–93.
[48] J. Caballero, P. Poosankam, C. Kreibich, and D. Song, "Dispatcher: Enabling active botnet infiltration using automatic protocol reverse-engineering," in Proceedings of the 16th ACM Conference on Computer and Communications Security, 2009, pp. 621–634.
[49] F. Zuo, X. Li, P. Young, L. Luo, Q. Zeng, and Z. Zhang, "Neural machine translation inspired binary code similarity comparison beyond function pairs," arXiv preprint arXiv:1808.04706, 2018.
[50] J. He, P. Ivanov, P. Tsankov, V. Raychev, and M. Vechev, "Debin: Predicting debug information in stripped binaries," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018, pp. 1667–1680.
[51] Y. David, U. Alon, and E. Yahav, "Neural reverse engineering of stripped binaries using augmented control flow graphs," Proceedings of the ACM on Programming Languages, vol. 4, no. OOPSLA, pp. 1–28, 2020.
[52] X. Xu, C. Liu, Q. Feng, H. Yin, L. Song, and D. Song, "Neural network-based graph embedding for cross-platform binary code similarity detection," in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 363–376.
[53] P. V. Mockapetris, "Rfc1035: Domain names-implementation and specification," 1987.
[54] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-cam: Visual explanations from deep networks via gradient-based localization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626.

A Comparison of Protocol Reverse Engineering

Beyond file inputs, we also choose two protocols, ARP and DNS, to test AIFORE and compare the results with Wireshark [31], Polyglot [3], and PI [30]. The program we choose to parse ARP is arping, which is provided by Ubuntu 18.04, and the command is arping 192.168.1.1 -I eno2 -c 1. This command sends an ARP request and then parses the ARP reply packet. The program we choose to parse DNS is nslookup from BIND 9, and the command is nslookup example.com.

In this experiment, we only focus on reversing the format of the response part (i.e., the ARP reply and the DNS response packet). Given only one response packet, AIFORE can successfully reverse the format via taint analysis and the related analyses. PI, however, fails to reverse the format given only one response packet, so we change the command option from -c 1 to -c 10 to collect 10 response packets for PI to analyze. During the execution of the commands, we capture the packets so that we can later compare the results.

We use the format knowledge taken from Wireshark as the ground truth. Wireshark identifies protocol formats with template scripts carefully written by community experts. However, we find that AIFORE can extract more detailed format knowledge in some cases, such as the DNS target. Thus, we also use RFC 1035 [53], which defines the DNS protocol format, as a complement to the ground truth. AIFORE can identify not only the field boundaries but also the field types, and it predicts the field type correctly for 4 out of 5 fields.

The results of ARP and DNS format reverse engineering are shown in Figure 8 and Figure 9, respectively. The orange parts represent the wrong results. The results show that, in the green parts of DNS, AIFORE extracts more detailed and correct information than Wireshark. According to the DNS specification, RFC 1035 [53], the QNAME field contains several labels, and each label consists of a length byte followed by that number of octets of data. Wireshark marks QNAME as a whole and cannot split it into individual labels, whereas AIFORE extracts these finer field boundaries, as shown in the results, and also identifies their field types correctly.

[Figure 8: Results of the ARP reverse engineering. Fields: Hardware type, Protocol type, Hardware size, Protocol size, Opcode, Sender MAC address.]
[Figure 9: Results of the DNS format reverse engineering. Ground truth from RFC 1035 and Wireshark 3.4.0; header fields ID, Flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT; targets BIND 9.3.4 and BIND 9.16.16.]

B Case Study

In this section, we analyze some concrete examples to show how AIFORE outperforms other state-of-the-art works in field boundary recognition and field type classification. We take ELF as the example format.

We take two format-aware fuzzing tools that can identify fields in the input, TIFF-fuzzer and WEIZZ, as examples. Their field extraction results during fuzzing are shown in Figure 10. TIFF-fuzzer and AIFORE split the first four bytes (i.e., the magic_number field) into single-byte fields. Although this differs from the specification, the program parses these bytes one by one, so it is better to fuzz each byte separately than to fuzz the four bytes as a whole. WEIZZ has a few false negatives, because it relies on cmp instructions when extracting fields, which is not sufficient. Unlike WEIZZ, AIFORE performs a complete taint analysis on valuable inputs, which achieves higher accuracy in field identification and thus improves fuzzing efficiency more than the other fuzzers, as shown in Table 8.

    (a) TIFF-fuzzer
    0x00h: [7F] [45] [4C] [46] [01] [01] [01] [00] [00] [00] [00] [00] [00] [00] [00] [00]
    0x10h: [02 00] [03 00] [01 00 00 00] [74 80 04 08] [34 00 00 00] …
    0x20h: [A4 00 00 00] [00 00 00 00] [34 00] [20 00] [02 00] [28 00]
    0x30h: [04 00] [03 00]

    (b) WEIZZ
    0x00h: [7F 45 4C 46] [01 01 01 00] 00 00 00 00 00 00 00 00
    0x10h: [02 00] [03 00] 01 00 00 00 74 80 04 08 [34 00] [00 00
    0x20h: A4 00 00 00] 00 00 00 00 34 00 [20 00] [02 00] [28 00]
    0x30h: [04 00] [03 00]

    (c) AIFORE
    0x00h: [7F] [45] [4C] [46] [01 01 01 00] 00 00 00 00 00 00 00 00
    0x10h: [02 00] [03 00] 01 00 00 00 74 80 04 08 [34 00] [00 00
    0x20h: A4 00 00 00] 00 00 00 00 34 00 [20 00] [02 00] [28 00]
    0x30h: [04 00] [03 00]

Figure 10: Field identification results. Bytes in red brackets indicate fields that differ from the specification, while the lightly shaded parts indicate fields that the tool fails to extract.

B.2 Field Type Case

For field type identification, state-of-the-art works generally depend on specific rules, and the field types they identify are usually program types rather than the semantic type of a field. For example, TIFF-fuzzer infers field types from library APIs (strcmp, strcpy) and then splits the field into several program variable types such as char* and int. However, such rules and types may not be sufficient. Take the section name, s_name, in the ELF file as an example. Such fields are of string type, but TIFF-fuzzer considers them consecutive int bytes, because readelf parses s_name with the repe cmpsb instruction rather than a strcmp call. AIFORE marks this field as a magic number, which is also reasonable, since the program compares the section name with hardcoded strings. We then investigate why the model in AIFORE predicts the field as a magic number. We leverage Grad-CAM [54], which explains why a model makes a specific decision and helps humans understand the internal working principle of a classification model.

Another work that can identify the semantic type of a field is ProFuzzer. We take the field e_type, located at offset 0x10 in Figure 10, as an example. It represents the file type (e.g., ET_EXEC), which is an enumeration. During the probing stage, ProFuzzer wrongly classifies it as an offset type. Investigating the reason, we find that it is due to a limitation of the code coverage bitmap. ProFuzzer mutates each byte of e_type and observes the similarity of the resulting bitmaps. However, as Figure 11 shows, the different switch cases share the same bitmap, which leads ProFuzzer to the wrong decision. In contrast, AIFORE predicts the field type based on how the program parses the specific field, and it also uses forward slicing to merge basic blocks (BBs) to obtain more code features, which enables it to recognize this field correctly.

    ; char *__cdecl get_file_type(unsigned int e_type)
    …
    mov rcx, 0F3DDh
    call __afl_maybe_log ; bitmap changes here
    mov rax, [rsp + 98h + var_88]
    mov rcx, [rsp + 98h + var_90]
    mov rdx, [rsp + 98h + var_98]
    lea rsp, [rsp + 98h]
    mov edi, edi
    mov edx, 5 ; e_type
    jmp ds:off_77DA00[e_type*8] ; switch jump

    loc_40DFF0: ; jumptable case 0
    mov esi, offset aNoneNone
    xor edi, edi ; domainname
    jmp _dcgettext

    … case 2 …

    loc_40DFE0: ; jumptable case 4
    mov esi, offset aCoreCoreFile
    xor edi, edi ; domainname
    jmp _dcgettext

Figure 11: ProFuzzer failure case.
… Figure 12, the frame has some fields consisting of consecutive bytes (e.g., Incl Len, which spans offsets 0x08 to 0x0B) and some single-byte fields (e.g., L4 Prot.) …

C Other Tables and Figures

Figure 14 shows the field type accuracy of models trained on programs compiled with different optimization levels. The detailed explanation is in Question 1 of §5.1.2.

    0x00h: 36 00 00 00 96 D3 09 00 BD 01 00 00 BD 01 00 00
           Ts Sec | Ts Usec | Incl Len | Orig Len
    0x10h: FF FF FF FF FF FF 10 A1 CD 11 00 0D 08 00 45 00
           Destination MAC | Source MAC | Layer3 Protocol | Ver Len | DiffServ
    0x20h: 01 AF 00 00 00 00 40 11 79 3F 00 00 00 0D FF FF
           Total Length | Identification | Flags | TTL | L4 Prot. | Checksum | Source IP
    0x30h: FF FF 00 44 00 43 01 9B CA 88
           Dest IP | Source Port | Dest Port | Data Length | Checksum

(b) Frame with UDP packet.
Figure 12: Frame structure definition.