A Multi-Strategy Adversarial Attack Method for Deep Learning Based Malware Detectors
Abstract—Deep learning allows building high-accuracy malware detectors without complicated feature engineering. However, research shows that deep learning models are vulnerable and can be deceived if attackers deliberately add perturbations to input samples to craft adversarial examples. By altering the pixel values of images, attackers have been able to generate adversarial examples that fool state-of-the-art deep learning based image classifiers. However, Windows malware is a structured binary program file, so arbitrarily altering its contents will often break the program's functionality. A standard but inefficient way to solve this problem is to run the sample in a sandbox to verify whether its functionality is preserved. This paper proposes a multi-strategy adversarial attack method that generates functionality-preserving malware adversarial examples. Our method manipulates the redundant or extended space in the Windows malware binary, so it does not break functionality. Experiments show that our method achieves a high attack success rate with high efficiency.

Keywords—malware detection, adversarial examples, evasion attacks, deep learning

I. INTRODUCTION

Malware detection is challenging because of the large number, variety, and fast evolution of malware. According to the statistics of AV-TEST [1], the total number of malicious software samples exceeded 1 billion in 2022, and more than 96 million new malware samples were discovered in 2022. Traditionally, building a malware detector requires a malware analyst to extract signatures manually [2] or to perform complicated feature extraction processes [3], [4]. These methods are time-consuming and inefficient against such massive volumes of malware. As a novel technology, deep learning has shown impressive results in computer vision and natural language processing. Not surprisingly, it also performs very well in the field of malware detection. By training end-to-end deep learning models, researchers can build high-accuracy malware detectors without complicated feature engineering.

Although deep learning has excellent performance, research shows that deep learning models are vulnerable and easily deceived by adversarial examples [5]. For example, in the image classification task, attackers [6] can add small perturbations that are difficult for human eyes to recognize to the pixel values of images and thereby make state-of-the-art image classifiers produce wrong classification labels. Similarly, malware detectors based on deep learning are also vulnerable to adversarial attacks.

However, it is not trivial to add perturbations to malware binaries. Windows malware is a structured binary file that follows the Portable Executable (PE) format [7]. The PE file contains metadata, machine code, data, resource files, etc. Arbitrarily altering the content of a PE file will often break its functionality or even cause the program to fail to run at all.

A simple solution is to run the altered malware in a sandbox and check whether its functionality is preserved. Unfortunately, this method is inefficient and impractical because 1) running the sandbox and filtering the generated samples consumes many resources, and 2) it is not easy to thoroughly and reliably verify a program's functionality, especially without source code. Therefore, a new method is needed to generate functionality-preserving malware adversarial examples.

This paper proposes an advanced adversarial attack method for deep learning based malware detectors. Through an in-depth review of the PE file format, we organize the ways of preserving functionality into six attack strategies. Four of them leverage redundant space, and two extend space in PE malware. The attack strategies allow generating practical malware adversarial examples directly while preserving functionality. Furthermore, since there is no need to verify samples in a sandbox, our method is fast and lightweight. Combining the advantages of each attack strategy, we further developed a multi-strategy method to improve the attack success rate. The experiments show that the multi-strategy adversarial attack method has a high attack success rate, outperforming other methods.

II. RELATED WORKS

A. Malware Detection

The early works of malware detection using neural networks still involve complicated feature engineering. Dahl et al. [8] extracted features, including null-terminated patterns in process memory and system API calls, and used random projection to reduce the feature dimension. They trained a fully connected network classifier and got a two-class error rate of 0.49%. Saxe and Berlin [9] proposed a novel two-dimensional byte entropy
• NT Header: Includes PE metadata, such as the characteristics, the address of the entry point, and the number of sections. The first two bytes are the magic number "PE".

• Section Headers: Include metadata for each section.

• Sections: The main part, including code or data.

A. Attack Strategies

Arbitrarily altering some bytes of a PE file will often break its functionality, but there is actually available space in the PE file: redundant and extended space. Modifying the bytes in these spaces keeps the functionality of the PE file. Attack strategies are proposed to locate or expose these spaces. According to the structure of the PE file, the attack strategy generates an index vector for each PE file. The index vector contains the indices (i.e., offsets) of the bytes that can be manipulated.

• DOS Header Attack Strategy: Locate indices 0x2-0x3c, i.e., the whole DOS header except the first two bytes (magic number "MZ") and the integer at 0x3c (address of the NT header). These bytes are redundant and will not be used by the Windows operating system.

• DOS Stub Attack Strategy: Locate the index range from 0x40 to the address of the NT header. These bytes are redundant and will not be used by the Windows operating system.

• NT Header Attack Strategy: Locate certain unused fields, for example, the compilation timestamp, the linker version, and the image version.

• Slack Space Attack Strategy: Locate the padding at the end of sections caused by file alignment. Alignment is widely used to speed up reading and writing. Generally, the PE format requires that the size of each section stored on disk is an integer multiple of 512. If the section size is not an integer multiple of 512, padding is added to the end of the section. This padding is redundant, and modifying it will not affect functionality.

• Slack Space Extend Attack Strategy: Extend the end of sections to match the section alignment. For each section, in addition to file alignment, there is also section alignment. Section alignment means that the memory occupied by the section after it is loaded into memory is an integral multiple of 4096, while the section stored on disk is only an integer multiple of 512. This means the section has extra extended space when it is loaded into memory. With this, it is possible to insert extra bytes at the end of the section to make it meet the section alignment.

• Append Attack Strategy: Simply append bytes to the end of the file. This does not touch the existing file structure, so it is safe.

The first four attack strategies utilize the redundant space in the PE file, and the last two attack strategies expand extra space. When the last two attack strategies are used, the PE file needs to be preprocessed to extend the space before the index vector is generated.

Based on the above six attack strategies, we further developed a multi-strategy method. This method leverages all six attack strategies to maximize the available space of PE files. In this way, it can cover the most sensitive areas of the MalConv model as much as possible and improve the attack success rate.

B. Practical Adversarial Examples Generation

The attack strategy tells us which bytes can be manipulated and generates an index vector to mark those positions. In order to generate practical malware adversarial examples, we also need to generate an adversarial payload and then inject the payload into the marked positions based on the index vector. Here, we use the FGSM (Fast Gradient Sign Method) [22] algorithm to generate the adversarial payload because it is fast and performs well.

Assume that the input of the deep learning model is x ∈ X, the generated adversarial payload is x* ∈ X, the prediction result is f(x), the original label is y, and the loss function is L. Generating the adversarial payload is equivalent to maximizing the loss function value:

    maximize J(x*) = L(f(x*), y),  subject to x* ∈ X    (1)

The FGSM algorithm solves this problem by adding a perturbation in the gradient direction:

    x* = x + ε · sign(∇x L(f(x), y))    (2)

where ε is an adjustable hyperparameter that controls the step length along the gradient.

Another thorny problem is the embedding layer adopted by the MalConv model. Since PE files are binary files, discrete bytes are not conducive to model learning. Therefore, researchers often use an embedding layer to map discrete bytes into floating point vectors.

The embedding layer maintains a learnable lookup table which maps bytes to floating point vectors. Obviously, the embedding layer is non-differentiable, and the gradient cannot be transferred to discrete bytes through the embedding layer. When generating the adversarial payload, it must be mapped back to discrete byte values. Here, we use the KNN algorithm to map each floating point vector to the nearest byte value.

Figure 3 shows the attack architecture. The attack algorithm steps are as follows:

1. Input the malware sample.

2. Preprocess the malware sample with the attack strategy and generate an index vector.

3. Generate the adversarial payload via the FGSM algorithm and map the floating point vectors back to bytes using KNN.

4. Inject the adversarial payload into the preprocessed malware sample to get the practical malware adversarial example.

5. Verify whether the adversarial example fools the deep learning model. If not, feed it back to step (1) and continue to iterate until it succeeds or the maximum number of iterations is reached.
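To make the index vector concrete, the following Python sketch (not the authors' implementation) derives manipulable offsets for four of the strategies directly from the raw PE bytes, using header offsets defined by the PE format [7]. The function names and the uniform "return (data, indices)" interface are illustrative choices; the NT Header and Slack Space Extend strategies, which require parsing or rewriting additional header fields, are omitted for brevity.

```python
import struct

def nt_header_offset(data: bytes) -> int:
    # e_lfanew: little-endian 4-byte offset of the NT header, stored at 0x3C.
    return struct.unpack_from("<I", data, 0x3C)[0]

def dos_header_strategy(data: bytes):
    # DOS Header strategy: bytes 0x02-0x3B are ignored by the Windows loader
    # (keep the "MZ" magic and the e_lfanew field at 0x3C untouched).
    return data, list(range(0x02, 0x3C))

def dos_stub_strategy(data: bytes):
    # DOS Stub strategy: everything from 0x40 up to the NT header is unused.
    return data, list(range(0x40, nt_header_offset(data)))

def slack_space_strategy(data: bytes):
    # Slack Space strategy: padding at the end of each section, added on disk
    # to reach the file alignment (512 bytes in the description above).
    pe = nt_header_offset(data)
    num_sections = struct.unpack_from("<H", data, pe + 6)[0]
    opt_size = struct.unpack_from("<H", data, pe + 20)[0]
    table = pe + 24 + opt_size              # first section header
    indices = []
    for i in range(num_sections):
        hdr = table + 40 * i                # each section header is 40 bytes
        virtual_size = struct.unpack_from("<I", data, hdr + 8)[0]
        raw_size = struct.unpack_from("<I", data, hdr + 16)[0]
        raw_ptr = struct.unpack_from("<I", data, hdr + 20)[0]
        used = min(virtual_size, raw_size)  # bytes actually needed by the section
        indices.extend(range(raw_ptr + used, raw_ptr + raw_size))
    return data, indices

def append_strategy(data: bytes, payload_size: int = 64 * 1024):
    # Append strategy: grow the file and mark the new tail as manipulable.
    return data + bytes(payload_size), list(range(len(data), len(data) + payload_size))
```

Returning the (possibly extended) data together with the offsets lets redundant-space and extend-type strategies share a single interface, which the attack-loop sketch further below relies on.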
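The FGSM step of Eq. (2) combined with the KNN mapping back to bytes can be sketched as follows in PyTorch. This is a minimal sketch, assuming a MalConv-like model whose embedding layer is exposed and which can be called directly on embedded inputs; the 256-row embedding table, the default ε, and the function name are assumptions made for illustration, not the paper's code.

```python
import torch
import torch.nn.functional as F

def fgsm_payload(model, emb_layer, x_bytes, index_vec, y, eps=0.5):
    # Embed the whole file: (L,) byte ids -> (1, L, d) float vectors.
    z = emb_layer(x_bytes.unsqueeze(0)).detach()
    z.requires_grad_(True)

    # Forward pass directly on embeddings (assumes the model accepts
    # embedded inputs, bypassing its own embedding layer).
    logits = model(z)
    loss = F.cross_entropy(logits, y)
    loss.backward()

    # Eq. (2): gradient-sign step, restricted to the manipulable offsets.
    z_adv = z.detach().clone()
    z_adv[0, index_vec] += eps * z.grad[0, index_vec].sign()

    # KNN (1-NN) mapping: snap each perturbed vector to the closest row of
    # the embedding table, i.e. back to a concrete byte value.
    table = emb_layer.weight.detach()                 # (256, d), one row per byte
    dists = torch.cdist(z_adv[0, index_vec], table)   # (|index_vec|, 256)
    new_bytes = dists.argmin(dim=1)

    x_adv = x_bytes.clone()
    x_adv[index_vec] = new_bytes
    return x_adv
```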
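Putting steps 1-5 together, a hedged sketch of the iterative attack loop might look like the following; it assumes the strategy functions and fgsm_payload from the sketches above and treats label 1 as "malicious".

```python
import torch

def multi_strategy_attack(model, emb_layer, sample: bytes, strategies, max_iters=20):
    # Step 2: apply every strategy; extend-type strategies may grow the file
    # before reporting the offsets they expose.
    index_vec = []
    for strategy in strategies:
        sample, indices = strategy(sample)
        index_vec.extend(indices)
    index_vec = torch.tensor(sorted(set(index_vec)), dtype=torch.long)

    x = torch.tensor(list(sample), dtype=torch.long)   # step 1: byte sequence
    y = torch.tensor([1])                              # original label: malicious

    for _ in range(max_iters):
        # Steps 3-4: FGSM payload in embedding space, mapped back to bytes
        # and injected at the marked offsets (see fgsm_payload above).
        x = fgsm_payload(model, emb_layer, x, index_vec, y)

        # Step 5: verify whether the detector is fooled.
        with torch.no_grad():
            pred = model(emb_layer(x.unsqueeze(0))).argmax(dim=1)
        if pred.item() != y.item():
            return bytes(x.tolist())                   # practical adversarial example
    return None                                        # budget exhausted
```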
Before performing the attack, we need to train the MalConv model because it is our attack target. The two datasets were randomly split into two parts, 75% for training and 25% for testing. We trained MalConv on datasets A and B respectively and got 98.43% test accuracy on dataset A and 88.07% on dataset B.

C. Attack Results

Following the steps described in Section III, the maximum number of iterations is set to 20. For the append attack strategy, we append 64K bytes to the end of the file.

The attack success rate is used to evaluate the attack strategies. Assuming that the total number of malware samples is N and the number of adversarial examples that are misclassified by MalConv is n, the success rate is:

    r = n / N    (3)

[Table: Attack Strategy | Success Rate | Average Time (seconds)]
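As an illustration of Eq. (3), a small evaluation loop (assuming the multi_strategy_attack sketch given earlier) could compute the success rate over a set of malware samples as follows:

```python
def attack_success_rate(model, emb_layer, samples, strategies, max_iters=20):
    # Eq. (3): r = n / N, where N is the number of malware samples attacked
    # and n the number of adversarial examples misclassified by the detector.
    n = 0
    for data in samples:
        adv = multi_strategy_attack(model, emb_layer, data, strategies, max_iters)
        if adv is not None:          # the attack loop already verified evasion
            n += 1
    return n / len(samples)
```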
For each task, the experiments compare the success rate and average time of the seven attack strategies. The results show that the three attack strategies targeting the file header have a low success rate, and the success rate of the multi-strategy method is significantly superior to that of the other methods. The success rate of attacks against malware family classification is usually higher. We believe that family classification is a harder task, which makes the deep learning models more prone to errors.

On average, all attack strategies complete their attack in hundreds of milliseconds. However, the three attack strategies for the file header take a long time because they require more iterations than the others. The append attack strategy takes the least time because it does not require complex preprocessing and usually needs fewer iterations. The multi-strategy method needs more time to preprocess samples, but its performance is still great.

V. CONCLUSION

In this paper, we proposed a multi-strategy adversarial attack method for deep learning based malware detectors. Through an in-depth analysis of the PE file format, we carefully manipulate the bytes of redundant space and extended space. This makes it possible to generate adversarial examples without breaking the functionality of the malware. Furthermore, by combining the advantages of the various attack strategies, our method achieves a high attack success rate. The experimental results show that our method has great performance.

Deep learning has state-of-the-art performance for malware detection, but its vulnerability should not be ignored. In particular, malware is a strongly adversarial field, and attackers have a strong motivation to bypass malware detectors. Unfortunately, most malware detectors based on deep learning do not consider defense against adversarial attacks. In fact, some attack strategies have apparent features. In the future, we will develop defense methods against adversarial attacks by leveraging the features of the attack strategies.

REFERENCES

[1] AV-TEST, "Malware Statistics & Trends Report." Accessed: Mar. 01, 2023. [Online]. Available: https://www.av-test.org/en/statistics/malware/
[2] B. Li, K. Roundy, C. Gates, and Y. Vorobeychik, "Large-Scale Identification of Malicious Singleton Files," in Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy, New York, NY, USA, 2017, pp. 227–238. doi: 10.1145/3029806.3029815.
[3] H. S. Anderson and P. Roth, "EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models," arXiv:1804.04637 [cs], Apr. 2018. Accessed: Dec. 01, 2021. [Online]. Available: http://arxiv.org/abs/1804.04637
[4] C. Choi, K. Lee, H. Lee, I. Jeong, and H. Yun, "Malware Family Classification Based on Novel Features from Frequency Analysis," IJCTE, vol. 10, no. 4, pp. 135–138, 2018. doi: 10.7763/IJCTE.2018.V10.1214.
[5] C. Szegedy et al., "Intriguing properties of neural networks," arXiv:1312.6199 [cs], Feb. 2014. Accessed: Nov. 21, 2021. [Online]. Available: http://arxiv.org/abs/1312.6199
[6] N. Carlini and D. Wagner, "Towards Evaluating the Robustness of Neural Networks," in 2017 IEEE Symposium on Security and Privacy (S&P), May 2017, pp. 39–57. doi: 10.1109/SP.2017.49.
[7] Microsoft, "PE format." Accessed: Mar. 01, 2023. [Online]. Available: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format
[8] G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, "Large-scale malware classification using random projections and neural networks," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2013, pp. 3422–3426. doi: 10.1109/ICASSP.2013.6638293.
[9] J. Saxe and K. Berlin, "Deep neural network based malware detection using two dimensional binary program features," in 2015 10th International Conference on Malicious and Unwanted Software (MALWARE), Oct. 2015, pp. 11–20. doi: 10.1109/MALWARE.2015.7413680.
[10] E. Raff, J. Sylvester, and C. Nicholas, "Learning the PE Header, Malware Detection with Minimal Domain Knowledge," in Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, New York, NY, USA: Association for Computing Machinery, 2017, pp. 121–132.
[11] E. Raff, J. Barker, J. Sylvester, R. Brandon, B. Catanzaro, and C. K. Nicholas, "Malware Detection by Eating a Whole EXE," in Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, Jun. 2018.
[12] Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier, "Language modeling with gated convolutional networks," in Proceedings of the 34th International Conference on Machine Learning - Volume 70, Sydney, NSW, Australia, Aug. 2017, pp. 933–941.
[13] W. Hu and Y. Tan, "Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN," arXiv:1702.05983 [cs], Feb. 2017. Accessed: Nov. 21, 2021. [Online]. Available: https://arxiv.org/abs/1702.05983
[14] R. L. Castro, C. Schmitt, and G. Dreo, "AIMED: Evolving Malware with Genetic Programming to Evade Detection," in 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE), 2019, pp. 240–247. doi: 10.1109/TrustCom/BigDataSE.2019.00040.
[15] B. Chen, Z. Ren, C. Yu, I. Hussain, and J. Liu, "Adversarial Examples for CNN-Based Malware Detectors," IEEE Access, vol. 7, pp. 54360–54371, 2019. doi: 10.1109/ACCESS.2019.2913439.
[16] B. Kolosnjaji et al., "Adversarial Malware Binaries: Evading Deep Learning for Malware Detection in Executables," in 2018 26th European Signal Processing Conference (EUSIPCO), Sep. 2018, pp. 533–537. doi: 10.23919/EUSIPCO.2018.8553214.
[17] L. Demetrio, B. Biggio, G. Lagorio, F. Roli, and A. Armando, "Explaining Vulnerabilities of Deep Learning to Adversarial Malware Binaries," arXiv:1901.03583 [cs], Jan. 2019. Accessed: Nov. 12, 2021. [Online]. Available: http://arxiv.org/abs/1901.03583
[18] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, "Deceiving End-to-End Deep Learning Malware Detectors using Adversarial Examples," arXiv:1802.04528 [cs], Jan. 2019. Accessed: Nov. 12, 2021. [Online]. Available: http://arxiv.org/abs/1802.04528
[19] O. Suciu, S. E. Coull, and J. Johns, "Exploring Adversarial Examples in Malware Detection," in 2019 IEEE Security and Privacy Workshops (SPW), May 2019, pp. 8–14. doi: 10.1109/SPW.2019.00015.
[20] L. Demetrio, B. Biggio, G. Lagorio, F. Roli, and A. Armando, "Functionality-Preserving Black-Box Optimization of Adversarial Windows Malware," IEEE Transactions on Information Forensics and Security, vol. 16, pp. 3469–3478, 2021. doi: 10.1109/TIFS.2021.3082330.
[21] L. Demetrio, S. E. Coull, B. Biggio, G. Lagorio, A. Armando, and F. Roli, "Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection," ACM Trans. Priv. Secur., vol. 24, no. 4, pp. 27:1–27:31, Sep. 2021. doi: 10.1145/3473039.
[22] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and Harnessing Adversarial Examples," arXiv:1412.6572 [cs, stat], Mar. 2015. Accessed: Nov. 21, 2021. [Online]. Available: http://arxiv.org/abs/1412.6572
[23] VirusShare, "VirusShare." Accessed: Mar. 01, 2023. [Online]. Available: https://virusshare.com/
[24] L. Yang, A. Ciptadi, I. Laziuk, A. Ahmadzadeh, and G. Wang, "BODMAS: An Open Dataset for Learning based Temporal Analysis of PE Malware," in 2021 IEEE Security and Privacy Workshops (SPW), May 2021, pp. 78–84. doi: 10.1109/SPW53761.2021.00020.