A survey of malware detection using deep learning
Keywords: Malware detection; Multi-task learning; Malware image; Generative adversarial networks; Mobile malware; Convolutional neural network

The problem of malicious software (malware) detection and classification is a complex task, and there is no perfect approach. There is still a lot of work to be done. Unlike most other research areas, standard benchmarks are difficult to find for malware detection. This paper aims to investigate recent advances in malware detection on MacOS, Windows, iOS, Android, and Linux using deep learning (DL) by investigating DL in text and image classification and the use of pre-trained and multi-task learning models for malware detection, in order to identify approaches that obtain high accuracy and to determine which approach would be best given a standard benchmark dataset. We discuss the issues and challenges in malware detection using DL classifiers by reviewing the effectiveness of these classifiers and their inability to explain their decisions and actions to DL developers, which motivates the use of Explainable Machine Learning (XAI) or Interpretable Machine Learning (IML) programs. Additionally, we discuss the impact of adversarial attacks on deep learning models, which negatively affect their generalization capabilities and result in poor performance on unseen data. We believe there is a need to train and test the effectiveness and efficiency of the current state-of-the-art deep learning models on different malware datasets. We examine eight popular DL approaches on various datasets. This survey will help researchers develop a general understanding of malware recognition using deep learning.
∗ Corresponding author.
E-mail addresses: [email protected] (A. Bensaoud), [email protected] (J. Kalita), [email protected] (M. Bensaoud).
1 https://www.nbcnews.com/tech/security/colorado-state-websites-struggle-russian-hackers-vow-attack-rcna51012
2 https://www.nbcnews.com/tech/security/china-hacked-least-six-us-state-governments-report-says-rcna19255
3 https://attackmap.sonicwall.com/live-attack-map
https://doi.org/10.1016/j.mlwa.2024.100546
Received 24 December 2023; Received in revised form 10 March 2024; Accepted 10 March 2024
Available online 20 March 2024
2666-8270/© 2024 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
A. Bensaoud et al. Machine Learning with Applications 16 (2024) 100546
• It reviews the use of natural language processing in malware classification (Section 10).
• It presents deep learning models for detecting cryptographic ransomware (Section 11).
• It shows how Explainable Artificial Intelligence (XAI) can help us decide whether the results of a DL model can be trusted (Section 12).
• It discusses the significant challenge to the reliability and security of deep learning models posed by adversarial attacks (Section 13).

In the rest of this paper, we discuss avenues for future research and examine the EfficientNet B0, B1, B2, B3, B4, B5, B6, and B7 models on malware image datasets for classification.

2. Mechanics of malware attacks

An attacker's goal is to get malware installed onto a victim's computer. Because most computers are protected by some type of firewall, direct attacks are difficult or impossible to perform. Therefore, attackers attempt to trick the computer into running the malicious code. The most common way to do this is by using documents or executable files. For instance, an attacker may send the victim a phishing email with a malicious document attachment or a link to a website where the malicious document is located. Once the victim opens the document, embedded exploits or scripts run and download or extract more malware. This second-stage payload is the real malware the attacker wants to run on the victim's system, and it is often something like a backdoor or ransomware. Thus, malicious documents are usually not the final piece of malware in an attack but rather one of the infection vectors used by the attacker to get onto the system. As an example, below we discuss how a PDF document can be used to initiate an attack.

2.1. PDF and document files

When analyzing a PDF, we examine three things: objects, which form the structure of the PDF; keywords, which control how the PDF works; and data stored or encoded within the PDF.

• Objects are the building blocks of a PDF. Every PDF starts with a header, which needs to be present in the first 1024 bytes of the document. Some attackers take advantage of this by putting unrelated data within the first 1024 bytes, a very simple technique to try to avoid signature-based detection. PDFs are composed of objects; each object holds specific data within the document or performs a specific function. Each object starts with two numbers, followed by the keyword obj, and ends with endobj. There are many kinds of objects, such as font objects, image objects, and even objects that contain metadata.
• There are many keywords that begin with a / and describe how the PDF works. Keywords related to malicious activity include /OpenAction and /AA (additional actions), both of which indicate an automatic action to be performed when the document is viewed.4 These keywords point to another object that is automatically opened or executed when the PDF is opened. Malicious PDFs often have /OpenAction pointing to malicious JavaScript or to an object containing an exploit, so that whenever one opens the document, the system is automatically compromised. The /JavaScript and /JS keywords indicate the presence of JavaScript code. Malicious PDFs usually contain malicious JavaScript to launch an exploit or download additional malware. Some objects can be referred to by a /Name instead of their number. Some PDFs embed files using the /EmbeddedFile keyword or reference external resources through /URI or /SubmitForm; the resource referenced by /URI is accessed or downloaded when the object is loaded.
• PDFs can encode data in multiple ways, which is very flexible, and attackers can use this flexibility to encode and hide their data. For example, names are case sensitive but can be fully or partially hex encoded: the # sign followed by two hex characters represents a hex-encoded byte. Data can also be octal encoded, where an octal-encoded character is a backslash (\) followed by three digits between 0 and 7. Moreover, attackers can mix hex, octal, and ASCII data together, which makes it possible to hide data such as JavaScript code or URLs.

Names and strings can be encoded, but data streams can be modified and encoded further using filters. Filters are algorithms applied to the data to encode or compress it within the PDF. Multiple filters can be used in PDFs, such as /ASCIIHexDecode (hex encoding of characters), /LZWDecode (LZW compression), /FlateDecode (zlib compression), /ASCII85Decode (ASCII base-85 representation), and /Crypt (various encryption algorithms). For example, in Fig. 2, we have a PDF document with three objects. Object 1 is a catalog that has an /OpenAction entry referring to version 0 of object 2, which

4 https://blog.didierstevens.com/programs/pdf-tools/
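The keyword and hex-encoding tricks described in Section 2.1 can be checked with a few lines of code. The following is a minimal Python sketch, loosely in the spirit of keyword-counting tools such as Didier Stevens' pdfid (footnote 4) but not its implementation: it decodes #xx escapes in PDF names before counting suspicious keywords, and it reports where the %PDF- header appears within the first 1024 bytes. The helper names (normalize_name, count_keywords, header_offset) and the keyword list are illustrative assumptions; the sketch ignores octal escapes in strings and does not decompress filtered streams.

```python
import re

# Illustrative keyword list taken from the discussion of PDF keywords above.
SUSPICIOUS = ["/OpenAction", "/AA", "/JavaScript", "/JS",
              "/EmbeddedFile", "/URI", "/SubmitForm"]

# A PDF name is "/" followed by regular characters (no whitespace or delimiters).
NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]+")
# Inside a name, "#" followed by two hex digits encodes one byte.
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(raw: bytes) -> str:
    """Decode #xx escapes, e.g. b"/J#61vaScript" -> "/JavaScript"."""
    decoded = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_keywords(pdf_bytes: bytes) -> dict:
    """Count suspicious keywords, including hex-obfuscated spellings."""
    counts = {k: 0 for k in SUSPICIOUS}
    for token in NAME_RE.findall(pdf_bytes):
        name = normalize_name(token)
        if name in counts:
            counts[name] += 1
    return counts


def header_offset(pdf_bytes: bytes) -> int:
    """Offset of the %PDF- header within the first 1024 bytes, or -1.

    A nonzero offset means data was placed in front of the header.
    """
    return pdf_bytes.find(b"%PDF-", 0, 1024)


# Hypothetical two-object document with an obfuscated /JavaScript name.
sample = (b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
          b"2 0 obj\n<< /S /J#61vaScript /JS (app.alert(1)) >>\nendobj\n")
print(count_keywords(sample), header_offset(sample))
```

A nonzero header offset, or a keyword that only appears after hex decoding, is a signal worth investigating rather than proof that the document is malicious.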
4. Overview & malware detection

Malware detection methods are divided into three types: static, dynamic, and hybrid (Damodaran, Di Troia, Visaggio, Austin, & Stamp, 2017). Static methods inspect an executable file without running it, while dynamic methods must run the executable file and analyze its behavior inside a controlled environment. Hybrid methods collect information about the malware from both static and dynamic analysis.

Some security researchers use static features obtained by decompiling the target file. Naik, Jenkins, Savage, Yang, Boongoen, and Iam-On (2021) proposed a fuzzy-import hashing technique based on static analysis for malware detection. Mohamad, Arif, Ab Razak, Awang, Tuan Mat, Ismail, and Firdaus (2021) proposed machine learning classifiers based on permission-based features for static analysis to detect Android malware.

Several tools can visualize and edit a binary file in hexadecimal or ASCII format, such as IDA Pro,7 x32/x64 Debugger,8 HxD,9 PE-bear,10 Yara,11 Fiddler,12 Metadata,13 XOR analysis,14 and Embedded strings.15

A malware file or its code can be used to generate an image by converting its binary, octal, hexadecimal, or decimal representation into a two-dimensional matrix of pixels. The image can be grayscale or RGB. In a grayscale image, each pixel takes a value in the range [0, 255], where 0 represents black and 255 represents white.

Gray image feature: The machine stores images as a matrix of numbers. These numbers, or pixel values, denote the intensity or brightness of each pixel. Smaller numbers (close to zero) represent black, and larger numbers (closer to 255) denote white (see Fig. 6).

5 https://securityintelligence.com/news/costa-rica-state-emergency-ransomware/
6 https://www.cnn.com/2022/12/28/politics/hackers-access-data-louisiana-hospital-system-ransomware/index.html
7 https://hex-rays.com/ida-pro
8 https://x64dbg.com/#start
9 https://mh-nexus.de/en/hxd
10 https://hshrzd.wordpress.com/pe-bear
11 https://yara.readthedocs.io/en/stable
12 https://www.telerik.com/purchase/fiddler
13 https://www.malwarebytes.com/glossary/metadata
14 https://eternal-todo.com/var/scripts/xorbruteforcer
15 https://virustotal.github.io/yara/
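The byte-to-pixel conversion described above can be sketched in a few lines of NumPy. This assumes that each byte of the file becomes one grayscale pixel in [0, 255] and that the image has a fixed width of 256 pixels; the fixed width is a hypothetical choice (published pipelines often pick the width based on file size), and bytes_to_grayscale is an illustrative helper, not a function from any cited work.

```python
import numpy as np


def bytes_to_grayscale(data: bytes, width: int = 256) -> np.ndarray:
    """Map each byte (0-255) to one grayscale pixel, filling rows left to right.

    The last row is zero-padded (black) so the result is a rectangular
    height x width uint8 matrix.
    """
    pixels = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(pixels) // width)  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(pixels)] = pixels
    return padded.reshape(height, width)


# Usage (hypothetical path): read an executable's raw bytes and convert them.
# with open("sample.exe", "rb") as f:
#     img = bytes_to_grayscale(f.read())
```

The resulting uint8 matrix can be saved as an image with a library such as Pillow, or resized and fed directly to a CNN.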
Deep learning can solve diverse "vision" problems, including malware image classification tasks. Deep learning can extract features automatically, obviating manual feature extraction. The content of the malware executable file is first converted into a digital image. Nataraj, Karthikeyan, Jacob, and Manjunath (2011) visualized the byte codes

16 https://archive.org/download/vxheavens-2010-05-18

Table 2
Comparative performance summary of Transfer Learning models for malware image classification.
Reference | Features | Model | Files | Accuracy | Dataset
Çayır, Ünal, and Dağ (2021) | gray-scale images | CapsNet | PE | 98.63% | Malimg
Çayır et al. (2021) | gray-scale images | RCNF | PE | 98.72% | Malimg
Go et al. (2020) | gray-scale images | ResNeXt | PE | 98.32% | Malimg
Bensaoud, Abudawaood, and Kalita (2020) | gray-scale images | Inception V3 | PE | 99.24% | Malimg
El-Shafai, Almomani, and AlKhayer (2021) | gray-scale images | VGG16 | PE | 99.97% | Malimg
Hemalatha, Roseline, Geetha, Kadry, and Damaševičius (2021) | gray-scale images | DenseNet | PE | 98.23% | Malimg
Hemalatha et al. (2021) | gray-scale images | DenseNet | PE | 98.46% | BIG 2015
Lo, Yang, and Wang (2019) | gray-scale images | Xception | PE | 99.03% | Malimg
Lo et al. (2019) | gray-scale images | Xception | PE | 99.17% | BIG 2015

In feature selection, we choose a subset of the features, in contrast to feature extraction, where we map the original features to a lower-dimensional space. A smaller feature set can help produce both better and faster classification. To do that, we need to find a projection matrix $W$ such that $\bar{Z} = W^{T}\bar{X}$. We expect such a projection to produce new features that are uncorrelated, non-redundant, and cannot be reduced further. Next, we need the features to have large variance. Why? Because if a feature takes similar values for all the instances, that feature cannot be used as a discriminator.
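PCA is one standard way to obtain such a projection matrix $W$: its columns are the eigenvectors of the feature covariance matrix with the largest eigenvalues, so the projected features are uncorrelated and ordered by decreasing variance. The NumPy sketch below assumes instances are stored as rows of X, so applying $W^{T}$ to every centred instance becomes Xc @ W; the synthetic data and the helper name pca_projection are illustrative assumptions.

```python
import numpy as np


def pca_projection(X: np.ndarray, k: int) -> np.ndarray:
    """Return the d x k matrix W whose columns are the top-k eigenvectors
    of the covariance matrix of X (rows = instances, columns = features)."""
    Xc = X - X.mean(axis=0)                    # centre each feature
    cov = np.cov(Xc, rowvar=False)             # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]      # largest variance first
    return eigvecs[:, order]


# Synthetic data: feature 2 is almost a copy of feature 1 (redundant).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
X = np.hstack([latent,
               2 * latent + 0.1 * rng.normal(size=(500, 1)),
               rng.normal(size=(500, 1))])

W = pca_projection(X, k=2)
Z = (X - X.mean(axis=0)) @ W   # each row is W^T x for one centred instance
print(np.round(np.cov(Z, rowvar=False), 3))  # approximately diagonal
```

Because the two redundant columns collapse into one high-variance component, two projected features retain most of the information of the original three.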
Feature extraction methods such as Principal Component Analysis (PCA) (Barath et al., 2016), GIST (Oliva & Torralba, 2001), Hu Moments (Hu, 1962), Color Histogram (Swain & Ballard, 1991), Haralick texture (Lin, Hays, Wu, Kwatra, & Liu, 2004), Discrete Wavelet Transform (DWT) (Kancherla, Donahue, & Mukkamala, 2016), Independent Component Analysis (ICA) (Herault & Jutten, 1986), Linear Discriminant Analysis (LDA) (Fan, Xu, & Zhang, 2011), Oriented FAST and Rotated BRIEF (ORB) (Rublee, Rabaud, Konolige, & Bradski, 2011), Speeded Up Robust Features (SURF) (Bay, Tuytelaars, & Van Gool, 2006), Scale Invariant Feature Transform (SIFT) (Lowe, 1999), Dense Scale Invariant Feature Transform (D-SIFT) (Lowe, 1999), Local Binary Patterns (LBPs) (Ojala, Pietikäinen, & Harwood, 1996), and KAZE (Alcantarilla, Bartoli, & Davison, 2012) have been combined with machine learning, including deep learning. These methods successfully filter the characteristics of malware files. Azad, Riaz, Aftab, Rizvi, Arshad, and Atlam (2022) proposed a method named DEEPSEL (Deep Feature Selection) to identify malicious code belonging to 39 unique malware families. Their model achieved an accuracy of 83.6% and an F-measure of 82.5%. Tobiyama, Yamaguchi, Shimada, Ikuse, and Yagi (2016) proposed feature extraction based on system calls: a Recurrent Neural Network was used to extract features, and a Convolutional Neural Network was used to classify them.

9. Deep transfer learning models for malware detection

Transfer learning takes place when we have a source model with some pre-trained knowledge, and this knowledge is used as the foundation to build a new model (Ye & Dai, 2021). For example, using a very large pre-trained convolutional neural network usually involves saving a network that was previously trained on some large dataset, typically for a large-scale image classification task, using a dataset like ImageNet (Russakovsky et al., 2015). After training a network on the ImageNet dataset, we can re-purpose this trained network. Research papers have discussed applying these pre-trained networks to malware image datasets (Bhodia, Prajapati, Di Troia, & Stamp, 2019; Qiao, Zhang, & Zhang, 2020; Rezende, Ruppert, Carvalho, Ramos, & De Geus, 2017; Vasan et al., 2020) that are generated from PE and APK malware files, which are quite different from each other.

Malware image datasets are very different from ImageNet, which is normally used to pre-train the model: ImageNet and a malware image dataset contain visually completely different images. However, pre-training still seems to help. A pre-trained network can be reused on a new dataset in two ways, as discussed below.

Fig. 8. Feature Extraction for Transfer Learning.

9.1. Using feature extraction

Feature extraction, discussed earlier, is a practical, common, and low-resource way of using pre-trained networks. It takes the convolutional base of a previously trained network, runs the malware data through it, and then trains a new classifier on top of the output. As shown in Fig. 8, we can choose, as an example, a network such as VGG16 (Simonyan & Zisserman, 2014) that has been trained on ImageNet. The input is fed in at the bottom and passes up through the trained convolutional base, which represents the CNN region of VGG16. The trained classifier resides in the dense region, and the prediction is made by this dense region at the end. Usually, there are 1000 neurons at the end to predict the actual ImageNet classes. We take this ImageNet-trained model as the base and remove the classifier layer, keeping the convolutional layers of the pre-trained model along with their weights. In the next step, we attach on top a new classifier with new dense layers for malware classification. The weights of the base are frozen, which means that during training the malware input passes through convolutional layers that keep their prior weights. However, all dense layers are randomly initialized, and the interconnection weights of these layers are learned during the new training process for detecting malware.

Why remove the original dense layers? It has been observed that the representations learned by the convolutional base are generic and therefore reusable for a variety of tasks, whereas the original dense classifier is specific to the 1000 ImageNet classes.

9.2. Using fine tuning

Fine-tuning involves changing some of the convolutional layers by learning new weights. In Fig. 9, we have a network divided into three regions. The yellow region is a pre-trained model. The green region represents our dense layers, for which we need to learn the weights. During training with libraries such as Keras (Ketkar & Santana, 2017) and TensorFlow (Abadi et al., 2016), we can select certain layers and freeze the weights of those layers. For example, we can select convolutional block one and then freeze all the weights of the convolutional layers in this block only. This means that during training, everything else will change, but the weights
Table 3
Fine-tuned pre-trained models applied on different malware image datasets.
Pre-trained model | Samples | Resized image (px) | Epochs | Malimg avg. accuracy | Microsoft challenge avg. accuracy | Drebin avg. accuracy | Our dataset accuracy
EffNet B0 | 30,000 | 224 | 200 | 92.72% | 90.45% | 87.23% | 94.59%
EffNet B1 | 30,000 | 240 | 200 | 95.64% | 93.65% | 88.91% | 95.89%
EffNet B2 | 20,000 | 260 | 200 | 93.84% | 91.78% | 86.82% | 94.12%
EffNet B3 | 15,000 | 300 | 400 | 90.32% | 94.19% | 89.35% | 95.73%
EffNet B4 | 20,000 | 380 | 400 | 95.63% | 96.68% | 90.59% | 97.98%
EffNet B5 | 25,000 | 456 | 400 | 80.19% | 87.54% | 84.23% | 94.68%
EffNet B6 | 40,000 | 528 | 400 | 85.67% | 83.82% | 85.43% | 93.54%
EffNet B7 | 30,000 | 600 | 1000 | 82.76% | 80.76% | 90.57% | 88.45%
Inception V4 | 20,000 | 229 | 300 | 95.98% | 93.21% | 88.93% | 96.39%
Xception | 20,000 | 229 | 200 | 89.50% | 90.84% | 84.39% | 93.53%
CapsNet | 3,000 | 256 | 100 | 88.64% | 72.69% | 78.68% | 92.65%
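The difference between feature extraction (Section 9.1) and fine-tuning (Section 9.2) comes down to which pre-trained weights are allowed to change. The NumPy toy below makes that concrete under loudly stated assumptions: two random matrices stand in for pre-trained convolutional blocks (they are not VGG16 or any ImageNet model), and a logistic-regression head stands in for the new dense classifier. With fine_tune_last_block=False both base blocks stay frozen and only the new head is learned; with True the gradient is also back-propagated into the last block while the first block stays frozen. The function train is hypothetical; in Keras, the corresponding switch is a layer's trainable attribute.

```python
import numpy as np


def relu(a):
    return np.maximum(a, 0.0)


def train(X, y, base, fine_tune_last_block=False, epochs=300, lr=0.05, seed=0):
    """Train a new logistic-regression head on top of a 'pre-trained' base.

    base: [W1, W2], stand-ins for two pre-trained blocks (an assumption).
    W1 is always frozen. W2 is frozen for feature extraction and updated
    only when fine_tune_last_block=True (fine-tuning).
    Returns the (possibly updated) W2 and the learned head (w, b).
    """
    rng = np.random.default_rng(seed)
    W1, W2 = base[0], base[1].copy()      # copy: the caller's base is untouched
    h1 = relu(X @ W1)                     # frozen block, computed once
    w = rng.normal(scale=0.01, size=W2.shape[1])   # new head, random init
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        a2 = h1 @ W2
        h2 = relu(a2)
        p = 1.0 / (1.0 + np.exp(-(h2 @ w + b)))    # malware probability
        d_logit = (p - y) / n                      # d(mean BCE)/d(logit)
        if fine_tune_last_block:
            # Back-propagate through the head and the ReLU into W2
            # (uses the current w, before the head is updated).
            d_a2 = np.outer(d_logit, w) * (a2 > 0)
            W2 -= lr * (h1.T @ d_a2)
        w -= lr * (h2.T @ d_logit)
        b -= lr * d_logit.sum()
    return W2, w, b


# Toy data and a random stand-in for pre-trained weights.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = (X[:, 0] > 0).astype(float)
base = [rng.normal(size=(8, 16)), rng.normal(size=(16, 16))]

W2_frozen, _, _ = train(X, y, base, fine_tune_last_block=False)
W2_tuned, _, _ = train(X, y, base, fine_tune_last_block=True)
```

Freezing is simply the absence of an update step for a weight matrix, which is why feature extraction is cheaper: the frozen activations (h1 here) can even be computed once and cached.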
Table 4
The steps of encoding a domain name for NLP-based processing.
Representation | Step
www.uccs.edu | Start with the domain
uccs | Extract the second-level domain
["u", "c", "c", "s"] | Convert to a character sequence
[21, 3, 3, 18] | Translate characters to numeric values
[0, 0, 0, ..., 0, 21, 3, 3, 18] | Pad the sequence
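The five steps in Table 4 can be sketched in Python as follows. The character vocabulary ('a' to 'z' mapped to 1 to 26, then digits and '-', with 0 reserved for padding), the padded length of 12, and the naive second-level-label extraction (which would mishandle suffixes such as co.uk) are all assumptions for illustration; encode_domain is a hypothetical helper.

```python
import string

# Hypothetical vocabulary (an assumption, not the paper's):
# 'a'..'z' -> 1..26, '0'..'9' -> 27..36, '-' -> 37; 0 is reserved for padding.
ALPHABET = string.ascii_lowercase + string.digits + "-"
VOCAB = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
OOV = len(ALPHABET) + 1  # index for any character outside the vocabulary


def encode_domain(domain, maxlen=12):
    """Encode the second-level label of a domain as a left-padded int list."""
    labels = domain.lower().strip(".").split(".")
    # Steps 1-2: keep the second-level label ("www.uccs.edu" -> "uccs").
    sld = labels[-2] if len(labels) >= 2 else labels[0]
    # Step 3: convert to a character sequence.
    chars = list(sld)
    # Step 4: translate each character to its numeric index.
    ids = [VOCAB.get(ch, OOV) for ch in chars]
    # Step 5: keep at most the last `maxlen` characters, then pre-pad with zeros.
    ids = ids[-maxlen:]
    return [0] * (maxlen - len(ids)) + ids


print(encode_domain("www.uccs.edu"))
```

Under this assumed vocabulary, encode_domain("www.uccs.edu") yields eight zeros followed by [21, 3, 3, 19]; Table 4 lists 18 for 's', so the vocabulary used there evidently differs from this one.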
Fig. 11. Training and testing for accuracy and loss of EfficientNetB0.

(TDM), n-grams, one-hot encoding, ASCII representations, and modern word embeddings such as Word2vec (Mikolov, Chen, Corrado, & Dean, 2013) and Sent2vec (Pagliardini, Gupta, & Jaggi, 2017). Table 4 presents text representation methods used in malware classification. Current word embeddings, when used in malware classification, do not carry much semantic and contextual significance. Bensaoud and Kalita (2024) proposed a novel model for malware classification using API calls and opcodes, incorporating a combined Convolutional Neural Network and Long Short-Term Memory architecture. By transforming features into N-gram sequences and experimenting with various deep learning architectures, including Swin-T and Sequencer2D-L, the method achieves a high accuracy of 99.91%, surpassing state-of-the-art performance. Mimura and Ito (2021) designed NLP-based malware detection using printable ASCII strings. The model can effectively detect packed malware and anti-debugging techniques. Sequence-to-sequence neural models are commonly used for natural language processing and are therefore used for malware detection as well.

10.1. Sequence to sequence neural models

The attention mechanism (Luong, Pham, & Manning, 2015) has achieved high performance in sequential learning applications such as machine translation (Lu et al., 2021), image recognition (Gao, Gong, Ding, & Guo, 2021), text summarization (AlMazrouei, Nelci, Salloum, & Shaalan, 2021), and text classification (Niu, Zhong, & Yu, 2021). The attention mechanism was designed to improve the performance of the encoder–decoder machine translation approach (Ren et al., 2021). The encoder and decoder usually consist of several stacked RNN layers, such as LSTM, as shown in Fig. 12. The encoder converts the text into a fixed-length vector, while the decoder generates the translated text from this vector. The sequence $\{x_1, x_2, \ldots, x_n\}$ can represent either text or an image, as shown in Fig. 13. For sequences, Recurrent Neural Networks (RNNs) can map an input sequence to an output sequence of the same or a different length. In Fig. 14, the encoder creates a compressed representation of the input, called the context vector, while the decoder uses the context vector to generate the output sequence. In this approach, the network is incapable of remembering dependencies in long sentences, because a single short context vector must summarize potentially long sentences and does not have the capacity to store many potential dependencies.

Attention in encoder–decoder: Bahdanau, Cho, and Bengio (2014) proposed an encoder–decoder attention mechanism framework for machine translation. Without attention, an RNN creates a single fixed context vector by encoding the input sequence. Rather than using just this fixed vector, we can also use each state of the encoder along with the current decoder state to generate a dynamic context vector. This has two benefits: first, the information is encoded in a sequence of vectors rather than in one single context vector; second, a subset of these vectors can be chosen adaptively while decoding the translation.

An attention mechanism is another Lego block that can be used in any deep learning model. Vaswani et al. (2017) showed that an attention mechanism is apparently the only Lego block one needs. It
Table 5
State-of-the-art deep learning models.
Ref | Deep learning approach | OS | Features | Accuracy
Kim, Ban, Ko, Cho, and Yi (2022) | MAPAS | Android | API call graphs | 91.27%
Onwuzurike et al. (2019) | MaMaDroid | Android | API calls | 84.99%
Kim and Cho (2022) | Deep Generative Model | Android | Dalvik code, API call, malware images, developers' signature | 97.47%
Olani, Wu, Chang, and Shih (2022) | DeepWare | Windows/Linux | HPC | 96.8%
Lian, Nie, Kang, Jia, and Zhang (2022) | Multi-Modal Deep Learning | Windows | Grayscale image, Byte/Entropy Histogram | 97.01%
Bensaoud and Kalita (2022) | Deep multi-task learning | Windows, Android, Linux, MacOS | Grayscale color image | 99.97%

Fig. 13. Encoder and decoder include RNNs.
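To make the dynamic context vector of Bahdanau, Cho, and Bengio (2014) from Section 10.1 concrete, the NumPy sketch below computes additive attention for a single decoder step: each encoder state $h_j$ is scored against the decoder state $s$ as $v^{T}\tanh(W_{enc}^{T} h_j + W_{dec}^{T} s)$, the scores are normalized with a softmax, and the context vector is the attention-weighted sum of the encoder states. The parameter names are hypothetical, and the weights are random here rather than learned, so the sketch shows the computation, not a trained model.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())   # subtract max for numerical stability
    return e / e.sum()


def additive_attention(enc_states, dec_state, W_enc, W_dec, v):
    """Additive (Bahdanau-style) attention for one decoder step.

    enc_states: (T, d_enc) encoder hidden states h_1..h_T
    dec_state:  (d_dec,)   decoder state s (the previous state in Bahdanau et al.)
    W_enc: (d_enc, d_att), W_dec: (d_dec, d_att), v: (d_att,)
    Returns attention weights (T,) and the context vector (d_enc,).
    """
    scores = np.tanh(enc_states @ W_enc + dec_state @ W_dec) @ v   # (T,)
    weights = softmax(scores)                                      # sum to 1
    context = weights @ enc_states                                 # sum_j a_j h_j
    return weights, context


# Random stand-ins for learned parameters and RNN states.
rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 5, 4, 3, 6
H = rng.normal(size=(T, d_enc))
s = rng.normal(size=d_dec)
weights, context = additive_attention(
    H, s, rng.normal(size=(d_enc, d_att)), rng.normal(size=(d_dec, d_att)),
    rng.normal(size=d_att))
```

Because the weights are recomputed for every decoder state, each output step gets its own context vector instead of sharing one fixed summary of the whole input.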
images. Kim, Park, Kwon, Jang, and Seo (2021) proposed detecting cryptographic ransomware using a Convolutional Neural Network. Their model prevents crypto-ransomware infection by detecting block cipher algorithms. Sharmeen, Ahmed, Huda, Koçer, and Hassan (2020) proposed an approach to extract the intrinsic attack characteristics of unlabeled ransomware samples using a deep learning-based unsupervised model. Fischer et al. (2019) designed a tool to detect security vulnerabilities of cryptographic APIs in Android, achieving an average AUC-ROC of 99.2%.

12. Explainable artificial intelligence (XAI)

Explainable Artificial Intelligence (XAI) is a rapidly emerging field that focuses on creating transparent and interpretable models (see Fig. 19). In the context of malware detection, XAI can help security experts and analysts understand how a machine learning model arrived at its decisions, making it easier to identify and understand false positives and false negatives. By applying XAI techniques, such as Local Interpretable Model-Agnostic Explanations (LIME) (Ribeiro, Singh, & Guestrin, 2016) or Deep Learning Important Features (DeepLIFT)
(Shrikumar, Greenside, & Kundaje, 2017), security teams can gain insights into the most important features and decision-making processes of the model. This can help them identify areas where the model may be vulnerable to evasion, or identify new malware strains that the model may have missed. Ultimately, XAI can improve the trustworthiness and reliability of machine learning models for malware detection, enabling more effective threat detection and response.

Nadeem et al. (2022) provided a comprehensive survey and analysis of the current state of research on explainable machine learning (XAI) techniques for computer security applications. The paper highlights the challenges and opportunities for adopting XAI in the security domain and discusses several approaches for designing and evaluating explainable machine learning models. Vivek, Ravi, Mane, and Naidu (2022) proposed an approach for detecting ATM fraud using explainable artificial intelligence (XAI) and causal inference techniques. They presented a detailed analysis of the proposed method and highlighted its effectiveness in improving the accuracy and interpretability of ATM fraud detection systems. Kinkead, Millar, McLaughlin, and O'Kane (2021) proposed an approach that uses LIME to identify locations in the opcode sequence that the Convolutional Neural Network (CNN) deems significant. McLaughlin et al. (2017) used the LRP (Bach et al., 2015) and DeepLift (Shrikumar et al., 2017) methods to identify the opcode sequences for most malware families, and they demonstrated that the CNN, trained on the DAMD dataset, learned patterns from the underlying opcode representation. Hooker, Erhan, Kindermans, and Kim (2019) proposed removing the features that an XAI approach identifies as relevant and measuring the resulting accuracy degradation. Lin, Lee, and Celik (2021) examined seven different XAI methods and automated the evaluation of the correctness of explanation techniques. The first four are white-box approaches that determine the importance of input features: Backpropagation (BP), Guided Backpropagation (GBP), Gradient-weighted Class Activation Mapping (GCAM), and Guided GCAM (GGCAM). The last three are black-box approaches that identify essential features by observing the output probability on perturbed samples of the input: Occlusion Sensitivity (OCC), Feature Ablation (FA), and Local Interpretable Model-Agnostic Explanations (LIME).

Guo et al. (2018) proposed LEMNA, an approach for explaining deep learning-based security applications, which generates interpretable features to explain how input samples are classified. Kuppa and Le-Khac (2020) presented a comprehensive analysis of the vulnerability of XAI methods to adversarial attacks in the context of cybersecurity, discussing potential risks associated with deploying XAI models in real-world applications and proposing a framework for designing robust and secure XAI systems. Rao and Mane (2021) proposed an approach to protect and analyze systems against the alarm-flooding problem using the NSL-KDD dataset. They included a Security Information and Event Management (SIEM) system in a zero-shot method for detecting alarm labels specific to adversarial attacks. Although explainable artificial intelligence (XAI) has gained significant attention, its effectiveness in malware detection still requires further investigation.

13. Adversarial attack on deep neural networks

Adversarial examples are maliciously crafted inputs designed to deceive a machine learning model into making incorrect predictions. Deep detection in this context refers to the use of deep learning models for detecting and classifying objects or patterns in the input data. Adversarial examples can be specifically crafted to evade deep detection models and cause them to misclassify or miss the target objects or patterns. Therefore, adversarial examples can be seen as a type of attack on deep detection models. Adversarial examples can be generated using a variety of techniques, including optimization-based and perturbation-based approaches, and can be used for various objectives, including evasion attacks and poisoning attacks. Zhong et al. (2023) proposed a novel adversarial malware example generation method called Malfox, which uses conditional generative adversarial networks (conv-GANs) to generate camouflaged adversarial examples against black-box detectors. The method was evaluated on two real-world malware detection systems, and the results showed that Malfox achieved high attack success rates while maintaining low detection rates. Zhao et al. (2023) proposed a new method called SAGE for steering the adversarial generation of examples with accelerations. The technique combines the advantages of gradient-based and gradient-free methods to generate more effective and efficient adversarial examples.

The development of defense mechanisms against adversarial attacks is a computationally expensive process, which can potentially affect the performance of the deep learning model. In addition, adversarial examples can impact the generalization ability of deep learning models, resulting in poor performance on new and unseen data. Moreover, generating adversarial examples can be computationally intensive, especially for large datasets and complex models, which can hinder the practical deployment of deep learning models in real-world applications. Thus, further research is required to improve the efficiency and effectiveness of defense mechanisms, as well as the generalization ability and robustness of deep learning models against adversarial attacks.

Hu and Tan (2023) proposed a method to generate adversarial malware examples using Generative Adversarial Networks (GANs) for black-box attacks. Their results show that the generated adversarial malware samples can evade detection by existing machine learning models while maintaining high similarity to the original malware. Ling et al. (2023) conducted a survey of the state of the art in adversarial attacks against Windows PE malware detection, covering various types of attacks and defense mechanisms. The authors also provided insights into potential future research directions in this area. Xu et al. (2023) proposed a semi-black-box adversarial sample attack framework called Ofei that can generate adversarial samples against Android apps deployed on a DLAAS platform. The framework uses a multi-objective optimization algorithm to generate robust and stealthy adversarial samples. Qiao et al. (2022) proposed an adversarial detection method for ELF malware using model interpretation and showed that their method can effectively identify adversarial ELF malware with high accuracy. The proposed approach combines random forests and LIME to identify the most important features and thus improve the interpretability and robustness of the model. Meenakshi and Maragatham (2023) proposed a defensive technique using the Curvelet transform to recognize adversarial iris images, optimizing image classification accuracy. The method was shown to be effective against several existing adversarial attacks on iris recognition systems. Pintor et al. (2022) introduced a method for debugging and improving the optimization of adversarial examples by identifying and analyzing the indicators of attack failure. The proposed method can help improve the robustness of deep learning models against adversarial attacks.

14. Conclusion

Machine learning has started to gain the attention of malware detection researchers, notably in malware image classification and cipher cryptanalysis. However, more experimentation is required to understand the capabilities and limitations of deep learning when used to detect and classify malware. Deep learning can reduce the need for static and dynamic analysis and can discover suspicious patterns. In the future, researchers may consider developing more accurate, robust, scalable, and efficient deep learning models for malware detection systems on various operating systems. Finally, multi-task learning and transfer learning can provide valuable results in classifying all types of malware. Furthermore, we show that the significant challenges of deep learning approaches that need to be considered are hyperparameter optimization, fine-tuning, and the size and quality of datasets when features are
Fig. 20. Training and testing accuracy and loss of EfficientNetB1.
Fig. 21. Training and testing accuracy and loss of EfficientNetB2.
Fig. 22. Training and testing accuracy and loss of EfficientNetB3.
Fig. 23. Training and testing accuracy and loss of EfficientNetB4.
Data availability
.tbk, .jpeg, .brd, .dot, .jpg, .rtf, .doc, .js, .sch, .3dm, .mp3, .sh, .3ds, .key, .sldm, .3g2, .lay, .sldm, .mkv, .std, .asp, .mml, .sti, .avi, .mov, .stw, .backup, .jsp, .suo, .bak, .mp4, .svg, .bat, .mpeg, .swf, .bmp, .mpg, .sxc, .rb, .msg, .sxd, .bz2, .myd, .sxi, .c, .myi, .sxm, .cgm, .nef, .sxw, .class, .odb, .tar, .cmd, .odg, .123, .onetoc2, .odp, .tgz, .crt, .ods, .tif, .3gp, .lay6, .sldx, .7z, .ldf, .slk, .vsd, .m3u, .sln, .aes, .m4u, .snt, .ai, .max, .sql, .ppam, .mdb, .sqlite3, .asc, .mdf, .sqlitedb, .asf, .mid, .stc, .asm, .cs, .odt, .tiff, .csr, .cpp, .txt, .csv, .pas, .vmx, .docb, .pdf, .vob, .docm, .pem, .accdb, .docx, .pfx, .vsdx, .602, .p12, .wav, .dotm, .pl, .wb2, .dotx, .png, .wk1, .dwg, .pot, .xltx, .edb, .potm, .wma, .eml, .potx, .wmv, .fla, .ARC, .xlc, .flv, .pps, .xlm, .frm, .ppsm, .xls, .gif, .ppsx, .xlsb, .gpg, .ppt, .xlsm, .gz, .pptm, .xlsx, .h, .pptx, .xlt, .hwp, .ps1, .xltm, .ibd, .psd, .wks, .iso, .pst, .xlw, .jar, .rar, .djvu, .java, .raw, .ost, .uop, .db, .otg, .uot, .dbf, .otp, .vb, .dch, .ots, .vbs, .der, .ott, .vcd, .dif, .php, .vdi, .dip, .PAQ, .vmdk, .zip
Fig. 25. Training and testing accuracy and loss of EfficientNetB6.
Fig. 26. Training and testing accuracy and loss of EfficientNetB7.
See Figs. 20–26.
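Section 13 notes that adversarial examples can be generated with perturbation-based approaches. The following is an illustrative sketch only, not a method from this survey: it applies the fast gradient sign method (FGSM), a single-step gradient-sign perturbation, to a toy logistic-regression detector. The weights, feature values, the [0, 1] feature range, and the helper names (`predict_proba`, `fgsm_perturb`) are hypothetical. Real malware features such as byte sequences or API calls are discrete, and a realistic attack must also preserve the program's functionality, which this sketch ignores.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, written via tanh to avoid overflow for large |z|."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def predict_proba(x, w, b):
    """Score in (0, 1) that feature vector x is malicious (toy logistic model)."""
    return sigmoid(np.dot(w, x) + b)


def fgsm_perturb(x, y, w, b, eps):
    """Single-step FGSM perturbation of x against the toy logistic detector.

    With p = sigmoid(w.x + b) and binary cross-entropy loss
    L = -[y*log(p) + (1 - y)*log(1 - p)], the input gradient is
    dL/dx = (p - y) * w. FGSM moves each feature by eps in the direction
    of sign(dL/dx), which increases L and pushes the score away from y.
    """
    p = predict_proba(x, w, b)
    grad_x = (p - y) * w
    x_adv = x + eps * np.sign(grad_x)
    # Hypothetical constraint: features are normalized to the range [0, 1].
    return np.clip(x_adv, 0.0, 1.0)


# Hypothetical detector weights and a sample labeled malicious (y = 1).
w = np.array([2.0, -1.0, 3.0])
b = -1.0
x = np.array([0.8, 0.2, 0.6])

x_adv = fgsm_perturb(x, y=1, w=w, b=b, eps=0.5)
print("clean score:", predict_proba(x, w, b))
print("adversarial score:", predict_proba(x_adv, w, b))
```

With these hypothetical values, the score of the malicious sample falls from about 0.90 to about 0.31, below a 0.5 decision threshold. FGSM is used here because the input gradient of a linear model has a closed form; for a deep detector, the same step would use the network's input gradient computed by an automatic differentiation framework.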
References
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. (2016). Tensorflow: A system for large-scale machine learning. In 12th USENIX symposium on operating systems design and implementation (OSDI 16) (pp. 265–283).
Agrawal, R., Stokes, J. W., Selvaraj, K., & Marinescu, M. (2019). Attention in recurrent neural networks for ransomware detection. In ICASSP 2019 - 2019 IEEE international conference on acoustics, speech and signal processing (pp. 3222–3226).
Alaraimi, S., Okedu, K. E., Tianfield, H., Holden, R., & Uthmani, O. (2021). Transfer learning networks with skip connections for classification of brain tumors. International Journal of Imaging Systems and Technology.
Alcantarilla, P. F., Bartoli, A., & Davison, A. J. (2012). KAZE features. In European conference on computer vision (pp. 214–227). Springer.
AlMazrouei, R. Z., Nelci, J., Salloum, S. A., & Shaalan, K. (2021). Feasibility of using attention mechanism in abstractive summarization. In International conference on emerging technologies and intelligent systems (pp. 13–20). Springer.
Asam, M., Hussain, S. J., Mohatram, M., Khan, S. H., Jamal, T., Zafar, A., et al. (2021). Detection of exceptional malware variants using deep boosted feature spaces and machine learning. Applied Sciences, 11(21), 10464.
Azad, M. A., Riaz, F., Aftab, A., Rizvi, S. K. J., Arshad, J., & Atlam, H. F. (2022). DEEPSEL: A novel feature selection for early identification of malware in mobile applications. Future Generation Computer Systems, 129, 54–63.
Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS One, 10(7), Article e0130140.
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
Bai, Y., Xing, Z., Ma, D., Li, X., & Feng, Z. (2021). Comparative analysis of feature representations and machine learning methods in Android family classification. Computer Networks, 184, Article 107639.
Baksi, A. (2022). Machine learning-assisted differential distinguishers for lightweight ciphers. In Classical and physical security of symmetric key cryptographic algorithms (pp. 141–162). Springer.
Barath, N., Ouboti, D., & Temesguen, M. (2016). Pattern recognition algorithms for malware classification. In Proceeding of 2016 IEEE conference of aerospace and electronics (pp. 338–342).
Bay, H., Tuytelaars, T., & Van Gool, L. (2006). Surf: Speeded up robust features. In European conference on computer vision (pp. 404–417). Springer.
Bensaoud, A., Abudawaood, N., & Kalita, J. (2020). Classifying malware images with convolutional neural network models. International Journal of Network Security, 22, 1022–1031.
Bensaoud, A., & Kalita, J. (2022). Deep multi-task learning for malware image classification. Journal of Information Security and Applications, 64, Article 103057.
Bensaoud, A., & Kalita, J. (2024). CNN-LSTM and transfer learning models for malware classification based on opcodes and API calls. Knowledge-Based Systems, Article 111543.
Bhodia, N., Prajapati, P., Di Troia, F., & Stamp, M. (2019). Transfer learning for image-based malware classification. arXiv preprint arXiv:1903.11551.
Çayır, A., Ünal, U., & Dağ, H. (2021). Random CapsNet forest model for imbalanced malware type classification task. Computers & Security, 102, Article 102133.
Charikar, M. S. (2002). Similarity estimation techniques from rounding algorithms. In Proceedings of the thirty-fourth annual ACM symposium on theory of computing (pp. 380–388).
Chauhan, D., Singh, H., Hooda, H., & Gupta, R. (2022). Classification of malware using visualization techniques. In International conference on innovative computing and communications (pp. 739–750). Springer.
Chaulagain, D., Poudel, P., Pathak, P., Roy, S., Caragea, D., Liu, G., et al. (2020). Hybrid analysis of android apps for security vetting using deep learning. In 2020 IEEE conference on communications and network security (pp. 1–9).
Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1251–1258).
Damodaran, A., Di Troia, F., Visaggio, C. A., Austin, T. H., & Stamp, M. (2017). A comparison of static, dynamic, and hybrid analysis for malware detection. Journal of Computer Virology and Hacking Techniques, 13(1), 1–12.
Darem, A., Abawajy, J., Makkar, A., Alhashmi, A., & Alanazi, S. (2021). Visualization and deep-learning-based malware variant detection using OpCode-level features. Future Generation Computer Systems, 125, 314–323.
Ding, Y., Wu, G., Chen, D., Zhang, N., Gong, L., Cao, M., et al. (2020). DeepEDN: a deep-learning-based image encryption and decryption network for internet of medical things. IEEE Internet of Things Journal, 8(3), 1504–1518.
El-Shafai, W., Almomani, I., & AlKhayer, A. (2021). Visualized malware multi-classification framework using fine-tuned CNN-based transfer learning models. Applied Sciences, 11(14).
Euh, S., Lee, H., Kim, D., & Hwang, D. (2020). Comparative analysis of low-dimensional features and tree-based ensembles for malware detection systems. IEEE Access, 8, 76796–76808.
Eum, S., Lee, H., & Kwon, H. (2018). Going deeper with CNN in malicious crowd event classification. Vol. 10646, In Signal processing, sensor/information fusion, and target recognition XXVII. International Society for Optics and Photonics, Article 1064616.
Fan, Z., Xu, Y., & Zhang, D. (2011). Local linear discriminant analysis framework using sample neighbors. IEEE Transactions on Neural Networks, 22(7), 1119–1132.
Fischer, F., Xiao, H., Kao, C.-Y., Stachelscheid, Y., Johnson, B., Razar, D., et al. (2019). Stack overflow considered helpful! deep learning security nudges towards stronger cryptography. In 28th USENIX security symposium (USENIX Security 19) (pp. 339–356).
Gao, Y., Gong, H., Ding, X., & Guo, B. (2021). Image recognition based on mixed attention mechanism in smart home appliances. Vol. 5, In 2021 IEEE 5th advanced information technology, electronic and automation control conference (pp. 1501–1505).
Gibert, D., Mateu, C., & Planes, J. (2020). The rise of machine learning for detection and classification of malware: Research developments, trends and challenges. Journal of Network and Computer Applications, 153, Article 102526.
Girinoto, Setiawan, H., Putro, P. A. W., & Pramadi, Y. R. (2020). Comparison of LSTM architecture for malware classification. In 2020 international conference on informatics, multimedia, cyber and information system (pp. 93–97).
Go, J. H., Jan, T., Mohanty, M., Patel, O. P., Puthal, D., & Prasad, M. (2020). Visualization approach for malware classification with ResNeXt. In 2020 IEEE congress on evolutionary computation (pp. 1–7).
Guo, W., Mu, D., Xu, J., Su, P., Wang, G., & Xing, X. (2018). Lemna: Explaining deep learning based security applications. In Proceedings of the 2018 ACM SIGSAC conference on computer and communications security (pp. 364–379).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).
Hemalatha, J., Roseline, S. A., Geetha, S., Kadry, S., & Damaševičius, R. (2021). An efficient DenseNet-based deep learning model for malware detection. Entropy, 23(3), 344.
Herault, J., & Jutten, C. (1986). Space or time adaptive signal processing by neural network models. Vol. 151, In AIP conference proceedings (1), (pp. 206–211). American Institute of Physics.
Heron, S. (2009). Advanced encryption standard (AES). Netw. Secur., 2009(12), 8–12.
Hooker, S., Erhan, D., Kindermans, P.-J., & Kim, B. (2019). A benchmark for interpretability methods in deep neural networks. In Advances in neural information processing systems: vol. 32.
Hu, M.-K. (1962). Visual pattern recognition by moment invariants. IRE Transactions on Information Theory, 8(2), 179–187.
Hu, W., & Tan, Y. (2023). Generating adversarial malware examples for black-box attacks based on GAN. In Data mining and big data: 7th international conference, DMBD 2022, Beijing, China, November 21–24, 2022, proceedings, Part II (pp. 409–423). Springer.
Jurafsky, D., & Martin, J. (2021). Speech and Language Processing (third ed.).
Kancherla, K., Donahue, J., & Mukkamala, S. (2016). Packer identification using Byte plot and Markov plot. Journal of Computer Virology and Hacking Techniques, 12(2), 101–111.
Ketkar, N., & Santana, E. (2017). Vol. 1, Deep learning with Python. Springer.
Khan, R. U., Zhang, X., & Kumar, R. (2019). Analysis of ResNet and GoogleNet models for malware detection. Journal of Computer Virology and Hacking Techniques, 15(1), 29–37.
Khayam, S. A. (2003). Vol. 114, The discrete cosine transform (DCT): theory and application (pp. 1–31). Michigan State University, Citeseer.
Kim, J., Ban, Y., Ko, E., Cho, H., & Yi, J. H. (2022). MAPAS: a practical deep learning-based android malware detection system. International Journal of Information Security, 1–14.
Kim, J.-Y., & Cho, S.-B. (2022). Obfuscated malware detection using deep generative model based on global/local features. Computers & Security, 112, Article 102501.
Kim, H., Park, J., Kwon, H., Jang, K., & Seo, H. (2021). Convolutional neural network-based cryptography ransomware detection for low-end embedded processors. Mathematics, 9(7), 705.
Kim, T., Suh, S. C., Kim, H., Kim, J., & Kim, J. (2018). An encoding technique for CNN-based network anomaly detection. In 2018 IEEE international conference on big data (big data) (pp. 2960–2965).
Kinkead, M., Millar, S., McLaughlin, N., & O’Kane, P. (2021). Towards explainable CNNs for android malware detection. Procedia Computer Science, 184, 959–965.
Kocaman, V., Shir, O. M., & Bäck, T. (2021). Improving model accuracy for imbalanced image classification tasks by adding a final batch normalization layer: An empirical study. In 2020 25th international conference on pattern recognition (pp. 10404–10411).
Kok, S., Azween, A., & Jhanjhi, N. (2020). Evaluation metric for crypto-ransomware detection using machine learning. Journal of Information Security and Applications, 55, Article 102646.
Kota, C. M., & Aissi, C. (2022). Implementation of the RSA algorithm and its cryptanalysis. In 2002 GSW.
Kuppa, A., & Le-Khac, N.-A. (2020). Black box attacks on explainable artificial intelligence (XAI) methods in cyber security. In 2020 international joint conference on neural networks (pp. 1–8). IEEE.
Lee, T. R., Teh, J. S., Jamil, N., Yan, J. L. S., & Chen, J. (2021). Lightweight block cipher security evaluation based on machine learning classifiers and active S-boxes. IEEE Access, 9, 134052–134064.
Lian, W., Nie, G., Kang, Y., Jia, B., & Zhang, Y. (2022). Cryptomining malware detection based on edge computing-oriented multi-modal features deep learning. China Communications, 19(2), 174–185.
Lin, W.-C., Hays, J., Wu, C., Kwatra, V., & Liu, Y. (2004). A comparison study of four texture synthesis algorithms on regular and near-regular textures: Tech. Rep., Citeseer.
Lin, Y.-S., Lee, W.-C., & Celik, Z. B. (2021). What do you see? Evaluation of explainable artificial intelligence (XAI) interpretability through neural backdoors. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining (pp. 1027–1035).
Ling, X., Wu, L., Zhang, J., Qu, Z., Deng, W., Chen, X., et al. (2023). Adversarial attacks against Windows PE malware detection: A survey of the state-of-the-art. Computers & Security, Article 103134.
Lo, W. W., Yang, X., & Wang, Y. (2019). An Xception convolutional neural network for malware classification with transfer learning. In 2019 10th IFIP international conference on new technologies, mobility and security (pp. 1–5).
Lowe, D. (1999). Object recognition from local scale-invariant features. Vol. 2, In Proceedings of the seventh IEEE international conference on computer vision (pp. 1150–1157 vol.2).
Lu, Z., Li, X., Liu, Y., Zhou, C., Cui, J., Wang, B., et al. (2021). Exploring multi-stage information interactions for multi-source neural machine translation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 1.
Luong, M.-T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
Ma, X., Guo, S., Li, H., Pan, Z., Qiu, J., Ding, Y., et al. (2019). How to make attention mechanisms more practical in malware classification. IEEE Access, 7, 155270–155280.
Mahendra, M., & Prabha, P. S. (2022). Classification of security levels to enhance the data sharing transmissions using blowfish algorithm in comparison with data encryption standard. In 2022 international conference on sustainable computing and data communication systems (pp. 1154–1160). IEEE.
McLaughlin, N., Martinez del Rincon, J., Kang, B., Yerima, S., Miller, P., Sezer, S., et al. (2017). Deep android malware detection. In Proceedings of the seventh ACM on conference on data and application security and privacy (pp. 301–308).
Meenakshi, K., & Maragatham, G. (2023). An optimised defensive technique to recognize adversarial Iris images using Curvelet transform. Intelligent Automation & Soft Computing, 35(1), 627–643.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Mimura, M., & Ito, R. (2021). Applying NLP techniques to malware detection in a practical environment. International Journal of Information Security, 1–13.
Mimura, M., & Ohminami, T. (2020). Using LSI to detect unknown malicious VBA macros. Journal of Information Processing, 28, 493–501.
Mohamad, Arif, J., Ab Razak, M. F., Awang, S., Tuan Mat, S. R., Ismail, N. S. N., et al. (2021). A static analysis approach for android permission-based malware detection systems. PLoS One, 16(9), Article e0257968.
Mohammed, T. M., Nataraj, L., Chikkagoudar, S., Chandrasekaran, S., & Manjunath, B. (2021). Malware detection using frequency domain-based image visualization and deep learning. arXiv preprint arXiv:2101.10578.
Nadeem, A., Vos, D., Cao, C., Pajola, L., Dieck, S., Baumgartner, R., et al. (2022). Sok: Explainable machine learning for computer security applications. arXiv preprint arXiv:2208.10605.
Naik, N., Jenkins, P., Savage, N., Yang, L., Boongoen, T., & Iam-On, N. (2021). Fuzzy-import hashing: A static analysis technique for malware detection. Forensic Science International: Digital Investigation, Vol. 37, Article 301139.
Narayanan, B. N., & Davuluru, V. S. P. (2020). Ensemble malware classification system using deep neural networks. Electronics, 9(5), 721.
Nataraj, L., Karthikeyan, S., Jacob, G., & Manjunath, B. S. (2011). Malware images: visualization and automatic classification. In Proceedings of the 8th international symposium on visualization for cyber security (pp. 1–7).
Ni, S., Qian, Q., & Zhang, R. (2018). Malware identification using visualization images and deep learning. Computers & Security, 77, 871–885.
Niu, Z., Zhong, G., & Yu, H. (2021). A review on the attention mechanism of deep learning. Neurocomputing, 452, 48–62.
Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1), 51–59.
Olani, G., Wu, C.-F., Chang, Y.-H., & Shih, W.-K. (2022). DeepWare: Imaging performance counters with deep learning to detect ransomware. IEEE Transactions on Computers, 1.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.
Onwuzurike, L., Mariconti, E., Andriotis, P., Cristofaro, E. D., Ross, G., & Stringhini, G. (2019). Mamadroid: Detecting android malware by building Markov chains of behavioral models (extended version). ACM Transactions on Privacy and Security, 22(2), 1–34.
Or-Meir, O., Cohen, A., Elovici, Y., Rokach, L., & Nissim, N. (2021). Pay attention: Improving classification of PE malware using attention mechanisms based on system call analysis. In 2021 international joint conference on neural networks (pp. 1–8).
Pagliardini, M., Gupta, P., & Jaggi, M. (2017). Unsupervised learning of sentence embeddings using compositional n-gram features. arXiv preprint arXiv:1703.02507.
Pintor, M., Demetrio, L., Sotgiu, A., Demontis, A., Carlini, N., Biggio, B., et al. (2022). Indicators of attack failure: Debugging and improving optimization of adversarial examples. Advances in Neural Information Processing Systems, 35, 23063–23076.
Qiao, Y., Zhang, W., Tian, Z., Yang, L. T., Liu, Y., & Alazab, M. (2022). Adversarial ELF malware detection method using model interpretation. IEEE Transactions on Industrial Informatics, 19(1), 605–615.
Qiao, Y., Zhang, B., & Zhang, W. (2020). Malware classification method based on word vector of bytes and multilayer perception. In ICC 2020-2020 IEEE international conference on communications (pp. 1–6). IEEE.
Rao, D., & Mane, S. (2021). Zero-shot learning approach to adaptive cybersecurity using explainable AI. arXiv preprint arXiv:2106.14647.
Ren, S., Zhou, L., Liu, S., Wei, F., Zhou, M., & Ma, S. (2021). Semface: Pre-training encoder and decoder with a semantic interface for neural machine translation. In Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (volume 1: long papers) (pp. 4518–4527).
Rezende, E., Ruppert, G., Carvalho, T., Ramos, F., & De Geus, P. (2017). Malicious software classification using transfer learning of resnet-50 deep neural network. In 2017 16th IEEE international conference on machine learning and applications (pp. 1011–1014). IEEE.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135–1144).
Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011). ORB: An efficient alternative to SIFT or SURF. In 2011 international conference on computer vision (pp. 2564–2571).
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
Sabour, S., Frosst, N., & Hinton, G. E. (2017a). Dynamic routing between capsules. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Vol. 30, Advances in neural information processing systems. Curran Associates, Inc.
Sabour, S., Frosst, N., & Hinton, G. E. (2017b). Dynamic routing between capsules. arXiv preprint arXiv:1710.09829.
Sasi, S. B., & Sivanandam, N. (2015). A survey on cryptography using optimization algorithms in WSNs. Indian Journal of Science and Technology, 8(3), 216.
Sharmeen, S., Ahmed, Y. A., Huda, S., Koçer, B. Ş., & Hassan, M. M. (2020). Avoiding future digital extortion through robust protection against ransomware threats using deep learning based adaptive approaches. IEEE Access, 8, 24522–24534.
Shrikumar, A., Greenside, P., & Kundaje, A. (2017). Learning important features through propagating activation differences. In International conference on machine learning (pp. 3145–3153). PMLR.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Sudhakar, & Kumar, S. (2021). MCFT-CNN: Malware classification with fine-tune convolution neural networks using traditional and transfer learning in internet of things. Future Generation Computer Systems, 125, 334–351.
Swain, M. J., & Ballard, D. H. (1991). Color indexing. International Journal of Computer Vision, 7(1), 11–32.
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-first AAAI conference on artificial intelligence.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–9).
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2818–2826).
Tan, M., & Le, Q. (2019). Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105–6114). PMLR.
Tobiyama, S., Yamaguchi, Y., Shimada, H., Ikuse, T., & Yagi, T. (2016). Malware detection with deep neural network using process behavior. Vol. 2, In 2016 IEEE 40th annual computer software and applications conference (pp. 577–582).
Vasan, D., Alazab, M., Wassan, S., Naeem, H., Safaei, B., & Zheng, Q. (2020). IMCFN: Image-based malware classification using fine-tuned convolutional neural network architecture. Computer Networks, 171, Article 107138.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. In Advances in neural information processing systems, (pp. 5998–6008).
Vivek, Y., Ravi, V., Mane, A. A., & Naidu, L. R. (2022). Explainable artificial intelligence and causal inference based ATM fraud detection. arXiv preprint arXiv:2211.10595.
Xiao, M., Guo, C., Shen, G., Cui, Y., & Jiang, C. (2021). Image-based malware classification using section distribution information. Computers & Security, 110, Article 102420.
Xu, G., Xin, G., Jiao, L., Liu, J., Liu, S., Feng, M., et al. (2023). Ofei: A semi-black-box android adversarial sample attack framework against dlaas. IEEE Transactions on Computers.
Yakura, H., Shinozaki, S., Nishimura, R., Oyama, Y., & Sakuma, J. (2019). Neural malware analysis with attention mechanism. Computers & Security, 87, Article 101592.
Ye, R., & Dai, Q. (2021). Implementing transfer learning across different datasets for time series forecasting. Pattern Recognition, 109, Article 107617.
Yuan, B., Wang, J., Liu, D., Guo, W., Wu, P., & Bao, X. (2020). Byte-level malware classification based on Markov images and deep learning. Computers & Security, 92, Article 101740.
Zhao, Z., Li, Z., Zhang, F., Yang, Z., Luo, S., Li, T., et al. (2023). SAGE: Steering the adversarial generation of examples with accelerations. IEEE Transactions on Information Forensics and Security, 18, 789–803.
Zhong, F., Cheng, X., Yu, D., Gong, B., Song, S., & Yu, J. (2023). Malfox: camouflaged adversarial malware example generation based on conv-GANs against black-box detectors. IEEE Transactions on Computers.
Zhu, J., Jang-Jaccard, J., Singh, A., Watters, P. A., & Camtepe, S. (2021). Task-aware meta learning-based siamese neural network for classifying obfuscated malware. arXiv preprint arXiv:2110.13409.