A survey of malware detection using deep learning

This paper surveys recent advancements in malware detection using deep learning across various operating systems, highlighting the challenges and effectiveness of different deep learning models. It discusses the need for explainable AI in malware detection, the impact of adversarial attacks, and reviews multiple deep learning approaches on malware datasets. The findings aim to provide a comprehensive understanding of malware recognition and the complexities involved in developing effective detection systems.


Machine Learning with Applications 16 (2024) 100546

Contents lists available at ScienceDirect

Machine Learning with Applications


journal homepage: www.elsevier.com/locate/mlwa

A survey of malware detection using deep learning


Ahmed Bensaoud ∗, Jugal Kalita, Mahmoud Bensaoud
Department of Computer Science, University of Colorado Colorado Springs, CO, USA

ARTICLE INFO

Keywords:
Malware detection
Multi-task learning
Malware image
Generative adversarial networks
Mobile malware
Convolutional neural network

ABSTRACT

The problem of malicious software (malware) detection and classification is complex, and there is no perfect approach. There is still a lot of work to be done. Unlike in most other research areas, standard benchmarks are difficult to find for malware detection. This paper investigates recent advances in malware detection on MacOS, Windows, iOS, Android, and Linux using deep learning (DL). It examines DL in text and image classification, the use of pre-trained and multi-task learning models to obtain high accuracy in malware detection, and which approach would be best if a standard benchmark dataset were available. We discuss the issues and challenges in malware detection using DL classifiers by reviewing the effectiveness of these classifiers and their inability to explain their decisions and actions to DL developers, which presents the need to use Explainable Machine Learning (XAI) or Interpretable Machine Learning (IML) programs. Additionally, we discuss the impact of adversarial attacks on deep learning models, which negatively affect their generalization capabilities and result in poor performance on unseen data. We believe there is a need to train and test the effectiveness and efficiency of the current state-of-the-art deep learning models on different malware datasets. We examine eight popular DL approaches on various datasets. This survey will help researchers develop a general understanding of malware recognition using deep learning.

1. Introduction

Operating systems such as Windows, Android, Linux, and MacOS are updated every few weeks to protect against critical vulnerabilities. On the other hand, malware authors are always looking for new ways to refine their malicious code to get past the new operating system updates. Every operating system is vulnerable. In addition, since operating systems run on desktops and servers, and even on routers, security cameras, drones and other devices, the biggest problem is the diversity of systems to protect, because all these devices are very different.

Almost every day, there is a new story about malicious software in the news. For example, in October 2022, cyberattacks coming from a Russia-based hacker group known as Killnet targeted the government service websites of the states of Colorado, Alabama, Alaska, Delaware, Connecticut, Florida, Mississippi, and Kansas.1 Again in 2022, hackers working on behalf of the Chinese government stole $20 million from covid relief benefits.2 The increase in the vulnerability of sensitive data due to cyber-attacks, cyber-threats, cyber-crimes, and malware needs to be countered. Fig. 1 shows the countries that were attacked by malware in 2023 and the top origins of this malware.3

Researchers have used deep learning to classify malware samples since it generalizes well to unseen data. Our survey focuses on static, dynamic and hybrid malware detection methods in Windows, Android, Linux, MacOS, and iOS. We describe the strengths and weaknesses of deep learning models for malware detection. Most recent research uses deep neural networks (DNNs) for malware classification and achieves high success. State-of-the-art DNN models have been developed against modern malware such as Zeus, Fleeceware, RaaS, Mount Locker, REvil, LockBit, Cryptesla, Snugy, and Shlayer.

The contributions of this paper are as follows:

• It gives the big picture of how hackers attack (Sections 2, 3, 4, 5).
• It presents how to generate images from malware files (Section 6).
• It discusses deep learning models for malware image classification (Section 7).
• It describes feature reduction that can improve performance (Section 8).
• It discusses transfer learning approaches in the classification of malware and what needs to improve for better performance (Section 9).

∗ Corresponding author.
E-mail addresses: [email protected] (A. Bensaoud), [email protected] (J. Kalita), [email protected] (M. Bensaoud).
1 https://www.nbcnews.com/tech/security/colorado-state-websites-struggle-russian-hackers-vow-attack-rcna51012
2 https://www.nbcnews.com/tech/security/china-hacked-least-six-us-state-governments-report-says-rcna19255
3 https://attackmap.sonicwall.com/live-attack-map

https://doi.org/10.1016/j.mlwa.2024.100546
Received 24 December 2023; Received in revised form 10 March 2024; Accepted 10 March 2024
Available online 20 March 2024
2666-8270/© 2024 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Fig. 1. Worldwide attacks.

• It reviews the use of natural language processing in malware classification (Section 10).
• It presents the deep learning models for cryptographic ransomware (Section 11).
• It shows how we know if we can trust the results of a DL model using Explainable Artificial Intelligence, XAI (Section 12).
• It discusses the significant challenge to reliability and security posed by adversarial attacks on deep learning models (Section 13).

In the rest of this paper, we discuss avenues for future research and we examine the EfficientNet B0, B1, B2, B3, B4, B5, B6, and B7 models on malware image datasets for classification.

2. Mechanics of malware attacks

The hacker has one goal, which is to get malware installed onto a victim's computer. Because most computers are protected by some type of firewall, direct attacks are difficult or impossible to perform. Therefore, attackers attempt to trick the computer into running the malicious code. The most common way to do this is by using documents or executable files. For instance, a hacker may send an email or a phishing message to the victim with a malicious document attachment or a link to a website where the malicious document is located. Once the victim opens the document, embedded exploits or scripts run and download or extract more malware. This is the real malware the hacker wants to run on the victim's system and is often something like a backdoor or ransomware. Malicious documents are therefore usually not the final piece of malware in an attack, but one of the compromise vectors used by the hacker to get on the system. As an example, below we discuss how a PDF document can be used to initiate an attack.

2.1. PDF and document files

When analyzing a PDF, we find three things: Objects, which form the structure of the PDF; Keywords, which control how the PDF works; and Data stored or encoded within the PDF.

• Objects are the building blocks of a PDF. Every PDF starts with a Header, which needs to be present in the first 1024 bytes of the document. Some hackers take advantage of this by putting unrelated data within the first 1024 bytes. This is a very simple technique to try to avoid signature-based detection. PDFs are composed of objects; each one has specific data within the document or performs a specific function. Each object starts with two numbers, followed by the keyword obj, and ends with endobj. There are many kinds of objects, such as font objects, image objects, and even objects that contain metadata.
• There are many keywords that begin with a / and describe how the PDF works. Some of the keywords related to malicious activity include /OpenAction and /AA (additional actions), both of which indicate an automatic action to be performed when the document is viewed.4 Such a keyword points to another object that automatically gets opened or executed when the PDF is opened. Malicious PDFs have /OpenAction pointing to some malicious JavaScript, or to an object containing an exploit; whenever one opens the document, the system is automatically compromised. The /JavaScript or /JS keyword indicates the presence of JavaScript code. Malicious PDFs usually contain malicious JavaScript to launch an exploit or download additional malware. Some objects can be referred to by a /Name instead of their number. Some PDFs can have files embedded with the keywords /EmbeddedFile, /URL or /SubmitForm. /URL is accessed or downloaded when the object is loaded.
• The PDF format is very flexible and can encode and store data in multiple ways. Hackers can use this to encode and hide their data. For example, names are case sensitive, but can be fully or partially hex encoded. More precisely, the # sign followed by two hex characters represents a hex-encoded character. Data can also be octal encoded, that is, represented by its base-eight number. An octal-encoded character is a \ followed by three digits between 0 and 7. Moreover, hackers can mix hex, octal, and ASCII data all together, which makes it possible to hide data such as JavaScript code or URLs.

Names and strings can be encoded, and data streams can be modified and encoded further using filters. Filters are algorithms that are applied to the data to encode or compress it within the PDF. There are multiple filters that can be used in PDFs, such as /ASCIIHexDecode (hex encoding of characters), /LZWDecode (LZW compression algorithm), /FlateDecode (Zlib compression), /ASCII85Decode (ASCII base-85 representation), and /Crypt (various encryption algorithms). For example, in Fig. 2, we have a PDF document with three objects. Object 1 is a catalog that has OpenAction and refers to version 0 of Object 2, which means that as soon as the document is opened, Object 2 will be run. Object 2 contains a JavaScript keyword, but we do not see any JavaScript code in this object because the JavaScript keyword refers to another object, Object 3. Object 3 is a stream object, as indicated by the stream keyword, and has been ASCIIHex encoded and compressed with the Zlib compression algorithm. We have thus been able to determine that JavaScript will be executed as soon as the PDF opens, but we do not know what the JavaScript's goal is. If this is a malicious PDF, it can cause problems. In Fig. 3, the JavaScript code references two host names, performs an HTTP GET request to each, saves an executable file, and finally runs it.

Fig. 2. PDF format example.

Fig. 3. Malicious JavaScript code.

Is malicious JavaScript used only in documents? No, it is used everywhere. Malicious JavaScript is used in web pages that are created by web attack kits that perform drive-by downloads. The user opens a website that has been compromised or loads a malicious ad, which then loads malicious JavaScript. Without JavaScript, it is difficult for hackers to get their exploit to work.

Most hackers try to hide what their script is doing using obfuscation techniques. Most techniques used to obfuscate script can be broken down into four different categories. How the format of a program is obfuscated is shown in Fig. 4; approaches include adding extra lines of code, obfuscating the data, and substituting variable names.

Fig. 4. Obfuscated malicious JavaScript code.

3. Nature of malware code

The nature of malware code encompasses various characteristics and behaviors that define its purpose and functionality. Malware, short for malicious software, refers to any code or program designed with malicious intent to compromise systems, steal information, or disrupt normal operations. The nature of malware code can vary depending on its specific type and objectives, but some common attributes include:

3.1. Obfuscation

Obfuscation is an attempt by the author of a piece of code to obscure its meaning, to make it unclear, or to make it very difficult to analyze. It may use encryption or compression to hide its true intentions or to evade signature-based detection by security software.

3.2. Payload delivery

Malware code typically carries a payload, which is the malicious action it intends to execute. This can range from stealing sensitive information (e.g., financial data, login credentials) to launching distributed denial-of-service (DDoS) attacks, encrypting files for ransom (ransomware), or providing backdoor access for remote control.

3.3. Command and control (C&C)

Many malware strains establish communication channels with remote servers or command-and-control infrastructure. This allows attackers to remotely control and manage the infected systems, update the malware, and receive stolen data.

3.4. Self-replication

Many malware strains possess the ability to self-replicate, allowing them to spread across networks, devices, or files. This replication can occur through various means, such as attaching to legitimate files, exploiting vulnerabilities, or utilizing network resources.

Fig. 5. Oops, your files have been encrypted!

4 https://blog.didierstevens.com/programs/pdf-tools/


3.5. Exploitation

Malware leverages vulnerabilities and weaknesses in software, operating systems, or user behavior to gain unauthorized access or control. It can exploit security flaws, network vulnerabilities, or social engineering techniques to compromise systems and execute malicious actions.

3.6. Polymorphism

Some malware utilizes polymorphic or metamorphic techniques to dynamically change its code structure or appearance while preserving its functionality. This makes it more challenging for antivirus software to detect and block.

3.7. Ransomware

Ransomware usually combines cryptography with malware. How does it work? The hacker sends the file to an unknowing victim. When the victim opens the file, it executes the malware's payload and encrypts victim data such as photos, documents, multimedia files, and even confidential records. The hacker often forces the victim to pay in cryptocurrency, in most cases Bitcoin.

The WannaCry ransomware has worm-like properties and is also known by names such as WannaCrypt, WanaCrypt0r, WCRY, WanaDecrypt0r, and WCrypt. Each encrypted file is locked by a different key and encrypted with the RSA algorithm, which makes the file inaccessible to an owner who does not have the keys. The WannaCry virus can encrypt a large number of file types. An exhaustive list is given in Appendix A.

The ransomware replaces the desktop wallpaper with the ransom note file by modifying the Windows registry. It holds all files hostage to demand ransom payments of $300 and later $600 in the Bitcoin cryptocurrency, as shown in Fig. 5.

As an example, on May 11, 2022, Costa Rica's newly elected president had to declare a state of national emergency due to a ransomware attack carried out by the Conti ransomware gang. They requested $10 million, but the demand changed to $20 million after Costa Rica refused to pay.5 As another example, in October 2022, a ransomware gang accessed data on 270,000 patients from a Louisiana hospital system.6

Understanding the nature of malware code is crucial for developing effective defense mechanisms and mitigating its impact. It enables security professionals to develop robust detection methods, implement security best practices, and respond promptly to evolving threats.

5 https://securityintelligence.com/news/costa-rica-state-emergency-ransomware/
6 https://www.cnn.com/2022/12/28/politics/hackers-access-data-louisiana-hospital-system-ransomware/index.html

4. Overview & malware detection

Malware detection methods are divided into three types: static, dynamic, and hybrid (Damodaran, Di Troia, Visaggio, Austin, & Stamp, 2017). Static methods inspect an executable file without running it, while dynamic methods must run the executable file and analyze its behaviors inside a controlled environment. In hybrid methods, information about the malware is collected from static as well as dynamic analysis.

Some security researchers use static features by decompiling the target file. Naik, Jenkins, Savage, Yang, Boongoen, and Iam-On (2021) proposed a fuzzy-import hashing technique based on static analysis for malware detection. Mohamad, Arif, Ab Razak, Awang, Tuan Mat, Ismail, and Firdaus (2021) proposed machine learning classifiers based on permission-based features for static analysis to detect Android malware.

Compared to static analysis, dynamic analysis includes system dynamic behavior monitoring, snapshot, debugging, etc. Kim, Suh, Kim, Kim, and Kim (2018) presented a new encoding technique for dynamic features to identify anomalous events using Convolutional Neural Networks (CNNs).

Security researchers have also extracted combined features from different parts of malware files. Bai, Xing, Ma, Li, and Feng (2021) extracted features from static and dynamic analysis of Android apps and applied a deep learning technique. Chaulagain et al. (2020) presented a deep learning-based hybrid analysis technique by collecting different artifacts during static and dynamic analysis to train the deep learning models.

5. Data for malware detection

Numerous system logs of the activities of machines such as phones, tablets, laptops, and other devices are generated by the operating system and other infrastructure software. The data are created and stored on the local device and sent to remote servers. By analyzing log data, we can not only detect breaches or suspicious activity, but also track behavior through the network. Log data allow us to track security events, troubleshoot the infrastructure, and optimize the environment and the machines. Log data can take many different forms, such as syslog, authentication logs, local security event logs, network asset logs, and system logs. One of the goals in malware detection is to be able to read, search, and analyze the data efficiently and effectively.

Table 1 contains some useful information about syslog and Windows logs. Both kinds of logs have many components in different formats that help us in the investigation.

Table 1
Syslog and Windows log.
Syslog | Windows logs
IETF standard | Event log
Timestamp | Contains source, event ID, and log level
Standard for network equipment logging | Logs application, security, and network events from a machine or server
Device-ID, severity level, message number, message text | Timestamp, user, computer, and process ID
Can be customized on network equipment for different events and severity levels | Used in most enterprise environments running Windows

6. Generating malware images for deep learning

Several tools can visualize and edit a binary file in hexadecimal or ASCII formats, such as IDA Pro,7 x32/x64 Debugger,8 HxD,9 PE-bear,10 Yara,11 Fiddler,12 Metadata,13 XOR analysis,14 and Embedded strings.15

7 https://hex-rays.com/ida-pro
8 https://x64dbg.com/#start
9 https://mh-nexus.de/en/hxd
10 https://hshrzd.wordpress.com/pe-bear
11 https://yara.readthedocs.io/en/stable
12 https://www.telerik.com/purchase/fiddler
13 https://www.malwarebytes.com/glossary/metadata
14 https://eternal-todo.com/var/scripts/xorbruteforcer
15 https://virustotal.github.io/yara/

A malware file or its code can be used to generate an image by converting its binary, octal, hexadecimal or decimal representation into a two-dimensional matrix of pixels. The image can be grayscale or RGB. In grayscale, pixels take values in the range [0, 255], where 0 represents black and 255 represents white.

Gray image feature: The machine stores images as a matrix of numbers. These numbers, or pixel values, denote the intensity or brightness of the pixel. Smaller numbers (close to zero) represent black, and larger numbers (closer to 255) denote white (see Fig. 6).
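As a rough illustration of the byte-to-pixel mapping described in Section 6, the following Python sketch lays the raw bytes of a file out as rows of grayscale pixels. It is a minimal sketch under stated assumptions, not the procedure of any specific paper surveyed here: the function name `bytes_to_grayscale`, the fixed row width, and the zero padding of the last row are all illustrative choices.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 16) -> list[list[int]]:
    """Arrange raw bytes as rows of `width` pixels (hypothetical helper).

    Each byte is already an integer in 0..255, so it is used directly as a
    grayscale intensity. The last row is padded with zeros (black), which is
    an assumption made for this sketch.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Stand-in for the content of an executable; a real run would read a file
# in binary mode instead.
sample = bytes(range(10))
image = bytes_to_grayscale(sample, width=4)
for row in image:
    print(row)
```

A real pipeline would typically pick the width from the file size and resize the result to the input shape the classifier expects; both steps are omitted here.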


Fig. 6. Malware feature representation in grayscale image. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. Malware feature representation in RGB image. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

RGB images: There are three matrices or channels (Red, Green, Blue), where each matrix has values between 0 and 255. These three colors are combined together in various ways to represent one of 16,777,216 possible colors (see Fig. 7).

Malware can be converted to images in different ways. Yuan, Wang, Liu, Guo, Wu, and Bao (2020) converted malware binaries into Markov images by computing the transfer probability of bytes, where each pixel is generated by Eq. (1):

$$p_{m,n} = P(n \mid m) = \frac{f(m,n)}{\sum_{n=0}^{255} f(m,n)}, \qquad m, n \in \{0, 1, \ldots, 255\}. \tag{1}$$

Here $f(m,n)$ denotes the number of times byte $m$ is followed by byte $n$.

Mohammed, Nataraj, Chikkagoudar, Chandrasekaran, and Manjunath (2021) used a vector of 16-bit signed hexadecimal numbers to represent a 256 × 256 image. Then, they computed bi-gram frequency counts, which they used as pixel intensity values. A full-frame Discrete Cosine Transform (DCT) (Khayam, 2003) was computed to de-sparsify the image, and the bigram-DCT was used to represent the output image. Euh, Lee, Kim, and Hwang (2020) proposed the Window Entropy Map (WEM) to visualize malware as an image. They calculated the entropy for each byte to measure the degree of uncertainty. Ni, Qian, and Zhang (2018) converted malware code into gray images using SimHash (Charikar, 2002) and then encoded them. They mapped SimHash values to pixels and then converted them to grayscale images.

7. Image classification for malware detection

Deep learning can solve diverse "vision" problems, including malware image classification tasks. Deep learning can extract features automatically, obviating manual feature extraction. The content of the malware executable file is first converted into a digital image. Nataraj, Karthikeyan, Jacob, and Manjunath (2011) visualized the byte codes of samples from 25 malware families as grayscale images. Several visualization techniques have been used for malware classification. The basic idea used in these methods is to explore the distinguishing patterns in malware images. In addition, the visualization techniques help find the correlations among different malware families. Some existing approaches generate grayscale images and others generate RGB images. Most existing approaches use global features to generate malware images.

Yuan et al. (2020) proposed a method based on Markov images built from the byte transfer probability matrix. They used a CNN to classify Markov malware images without scaling. Narayanan and Davuluru (2020) proposed an ensemble approach using RNN and CNN architectures for malware image classification. Images were generated from assembly compiled files and classified using CNNs. Zhu, Jang-Jaccard, Singh, Watters, and Camtepe (2021) proposed a Task-Aware Meta Learning-based Siamese Neural Network to classify obfuscated malware images. Their model showed high effectiveness on unique malware signature detection to classify obfuscated malware. Chauhan, Singh, Hooda, and Gupta (2022) visualized malware files in different color modes: RGB, HSV, grayscale, and BGR. They used a support vector machine (SVM) to classify these malware images, with an accuracy of 96% in all modes. Darem, Abawajy, Makkar, Alhashmi, and Alanazi (2021) designed a semi-supervised method based on malware images and feature engineering for obfuscated malware detection. The model achieved 99.12% accuracy on obfuscated malware detection. Asam et al. (2021) proposed two malware image classification approaches called Deep Feature Space-based Malware classification (DFS-MC) and Deep Boosted Feature Space-based Malware classification (DBFS-MC). The approach achieved a good accuracy of 98.61% on the MalImg malware dataset.

Xiao, Guo, Shen, Cui, and Jiang (2021) presented a visualization method called Colored Label boxes (CoLab) to specify each section in a PE file and convert it to a malware image. The authors built a composed CoLab image, and used VGG16 and a support vector machine for classification. The model was applied on two datasets, VX-Heaven16 and BIG-2015, with 96.59% and 98.94% average accuracies, respectively. A comparison of the reviewed malware image classification approaches is given in Table 2.

16 https://archive.org/download/vxheavens-2010-05-18

Table 2
Comparative performance summary of Transfer Learning models for malware image classification.
Reference | Features | Model | Files | Accuracy | Dataset
Çayır, Ünal, and Dağ (2021) | gray-scale images | CapsNet | PE | 98.63% | Malimg
Çayır et al. (2021) | gray-scale images | RCNF | PE | 98.72% | Malimg
Go et al. (2020) | gray-scale images | ResNeXt | PE | 98.32% | Malimg
Bensaoud, Abudawaood, and Kalita (2020) | gray-scale images | Inception V3 | PE | 99.24% | Malimg
El-Shafai, Almomani, and AlKhayer (2021) | gray-scale images | VGG16 | PE | 99.97% | Malimg
Hemalatha, Roseline, Geetha, Kadry, and Damaševičius (2021) | gray-scale images | DenseNet | PE | 98.23% | Malimg
Hemalatha et al. (2021) | gray-scale images | DenseNet | PE | 98.46% | BIG 2015
Lo, Yang, and Wang (2019) | gray-scale images | Xception | PE | 99.03% | Malimg
Lo et al. (2019) | gray-scale images | Xception | PE | 99.17% | BIG 2015

8. Feature reduction for efficient malware detection

Feature reduction reduces the number of variables or features in the representation of a data example. Approaches to feature reduction can be divided into two subcategories: (a) Feature Selection, which includes methods such as Wrappers, Filters, and Embedded, and (b) Feature Extraction, which includes methods such as Principal Components Analysis (Barath, Ouboti, & Temesguen, 2016). How does feature reduction improve performance? It does so by reducing the number of features that are considered for analysis.

In feature extraction, we start with $n$ features $x_1, x_2, x_3, \ldots, x_n$, which we map to a lower dimensional space to get the new features $z_1, z_2, z_3, \ldots, z_m$, where $m < n$. Each of the new features is usually a linear combination of the original feature set $x_1, x_2, x_3, \ldots, x_n$. Thus, each new feature is obtained as a function $F(X)$ of the original feature set $X$. This makes a projection of a higher dimensional feature space to a lower dimensional feature space, so that the smaller dimensional feature set may lead to better classification or faster classification (see Eq. (2)).

$$\begin{bmatrix} z_1 & \ldots & z_m \end{bmatrix}^{\top} = F\left(\begin{bmatrix} x_1 & \ldots & x_n \end{bmatrix}^{\top}\right) \tag{2}$$

In feature selection, we choose a subset of the features, in contrast to feature extraction where we map the original features to a lower dimensional space. The smaller dimensional feature set can help produce better as well as faster classification. To do that, we need to find a projection matrix $W$ such that $\bar{Z} = W^{T}\bar{X}$. We expect from such a projection that the new features are uncorrelated, cannot be reduced further, and are non-redundant. Next, we need features to have large variance. Why? Because if a feature takes similar values for all the instances, that feature cannot be used as a discriminator.
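The large-variance requirement stated above can be illustrated with a small variance filter. The Python sketch below is only an example under stated assumptions: the helper names `select_by_variance` and `keep_columns`, the threshold `min_variance`, and the toy data are hypothetical and do not come from the surveyed work. It drops every feature whose values barely change across samples.

```python
from statistics import pvariance


def select_by_variance(samples, min_variance=0.0):
    """Return the indices of feature columns whose variance exceeds min_variance.

    `samples` is a list of equal-length feature vectors (hypothetical layout).
    A feature that takes nearly the same value for every sample has a small
    variance and cannot help separate the classes, so it is dropped.
    """
    if not samples:
        return []
    n_features = len(samples[0])
    keep = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        if pvariance(column) > min_variance:
            keep.append(j)
    return keep


def keep_columns(samples, keep):
    """Project every sample onto the selected feature indices."""
    return [[row[j] for j in keep] for row in samples]


# Toy data: feature 0 varies, feature 1 is constant, feature 2 barely varies.
samples = [
    [0.0, 5.0, 1.0],
    [1.0, 5.0, 1.1],
    [0.0, 5.0, 0.9],
    [1.0, 5.0, 1.0],
]
keep = select_by_variance(samples, min_variance=0.01)
reduced = keep_columns(samples, keep)
print(keep, reduced)
```

This is a filter-style selection step: it looks only at the features themselves and ignores the class labels, so it is usually a first pass rather than a complete feature reduction method.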
Feature extraction methods such as Principal Component Analysis (PCA) (Barath et al., 2016), GIST (Oliva & Torralba, 2001), Hu Moments (Hu, 1962), Color Histogram (Swain & Ballard, 1991), Haralick texture (Lin, Hays, Wu, Kwatra, & Liu, 2004), Discrete Wavelet Transform (DWT) (Kancherla, Donahue, & Mukkamala, 2016), Independent Component Analysis (ICA) (Herault & Jutten, 1986), Linear discriminant analysis (LDA) (Fan, Xu, & Zhang, 2011), Oriented Fast and Rotated BRIEF (ORB) (Rublee, Rabaud, Konolige, & Bradski, 2011), Speeded Up Robust Feature (SURF) (Bay, Tuytelaars, & Van Gool, 2006), Scale Invariant Feature Transform (SIFT) (Lowe, 1999), Dense Scale Invariant Feature Transform (D-SIFT) (Lowe, 1999), Local Binary Patterns (LBPs) (Ojala, Pietikäinen, & Harwood, 1996), and KAZE (Alcantarilla, Bartoli, & Davison, 2012) have been combined with machine learning, including deep learning. These methods successfully filter the characteristics of malware files. Azad, Riaz, Aftab, Rizvi, Arshad, and Atlam (2022) proposed a method named DEEPSEL (Deep Feature Selection) to identify malicious codes of 39 unique malware families. Their model achieved an accuracy of 83.6% and an F-measure of 82.5%. Tobiyama, Yamaguchi, Shimada, Ikuse, and Yagi (2016) proposed feature extraction based on system calls. A Recurrent Neural Network was used to extract features and a Convolutional Neural Network to classify these features.

9. Deep transfer learning models for malware detection

Transfer learning takes place if we have a source model which has some pre-trained knowledge and this knowledge is needed as the foundation to build a new model (Ye & Dai, 2021). For example, using a very large pre-trained convolutional neural network usually involves saving a network that was previously trained on some large dataset, typically on a large-scale image classification task, using a dataset like ImageNet (Russakovsky et al., 2015). After training a network on the ImageNet dataset, we can re-purpose this trained network. Research papers have discussed applying these pre-trained networks to malware image datasets (Bhodia, Prajapati, Di Troia, & Stamp, 2019; Qiao, Zhang, & Zhang, 2020; Rezende, Ruppert, Carvalho, Ramos, & De Geus, 2017; Vasan et al., 2020) that are generated from PE and APK malware files, which are quite different from each other.

Fig. 8. Feature Extraction for Transfer Learning.

9.1. Using feature extraction

Feature extraction, discussed earlier, is a practical, common, and low resource-intensive way of using pre-trained networks. It takes the convolutional base of a previously trained network, runs the malware data through it, and then trains a new classifier on top of the output. As shown in Fig. 8, we can choose a network such as VGG16 (Simonyan & Zisserman, 2014) that has been trained on ImageNet, as an example. The input, fed at the bottom, goes up to the trained convolutional base, representing the CNN region of VGG16. The trained classifier resides in the dense region, and the prediction is made by this dense region at the end. Usually, we have 1000 neurons at the end to predict the actual ImageNet classes. We take this ImageNet-trained model as the base and remove the classifier layer, keeping the convolutional layers of the pre-trained model along with their weights. In the next step, we attach a new classifier that has new dense layers for malware classification on top. The weights of the base are frozen, which means that the malware input passes through convolutional layers that keep their prior weights during training. However, all dense layers are randomly initialized, and the interconnection weights for these layers are learned during the new training process for detecting malware.

Why remove the original dense layers? What has been observed is that the representations learned by the convolutional base are generic and therefore reusable for a variety of tasks.

9.2. Using fine tuning

Fine-tuning involves changing some of the convolutional layers by learning new weights. In Fig. 9, we have a network divided into three regions. The yellow region is a pre-trained model. The green region represents our dense layers for which we need to learn the weights.
Malware image datasets are very different from ImageNet, which During training using a library such as Keras (Ketkar & Santana, 2017)
is normally used to pre-train the model. The ImageNet dataset and a and Tensorflow (Abadi et al., 2016), we can select certain layers and
malware image dataset represent visually completely different images. freeze the weights of those layers.
However, pre-trained still seems to help. Training a machine learning For example, we can select convolutional block one and then freeze
algorithm on large datasets can be done in two ways, as discussed all the weights of the convolutional layers, in this block only. This
below. means that during training, everything else will change, but the weights


Table 3
Fine-tuned pre-trained models applied on different malware image datasets.
                   Setting                       Average accuracy                     Our dataset
Pre-trained model  Samples  Resize image  Epoch  Malimg  Microsoft challenge  Drebin  Accuracy
EffNet B0          30,000   224           200    92.72%  90.45%               87.23%  94.59%
EffNet B1          30,000   240           200    95.64%  93.65%               88.91%  95.89%
EffNet B2          20,000   260           200    93.84%  91.78%               86.82%  94.12%
EffNet B3          15,000   300           400    90.32%  94.19%               89.35%  95.73%
EffNet B4          20,000   380           400    95.63%  96.68%               90.59%  97.98%
EffNet B5          25,000   456           400    80.19%  87.54%               84.23%  94.68%
EffNet B6          40,000   528           400    85.67%  83.82%               85.43%  93.54%
EffNet B7          30,000   600           1000   82.76%  80.76%               90.57%  88.45%
Inception V4       20,000   229           300    95.98%  93.21%               88.93%  96.39%
Xception           20,000   229           200    89.50%  90.84%               84.39%  93.53%
CapsNet            3,000    256           100    88.64%  72.69%               78.68%  92.65%
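
As a minimal sketch of the feature-extraction idea from Section 9.1 (a frozen pre-trained base with a newly trained dense head), the toy example below uses plain Python rather than Keras or TensorFlow. The two-neuron base, the four-sample dataset, and all names (`PRETRAINED_BASE`, `TOY_DATA`, `train_head`, and so on) are illustrative assumptions; they are not the networks or data behind Table 3.

```python
import math

def base_features(x, base_weights):
    """Stand-in for a frozen pre-trained base: one fixed ReLU layer."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in base_weights]

def predict(x, base_weights, head_weights, head_bias):
    """Probability of the 'malware' class produced by the new dense head."""
    feats = base_features(x, base_weights)
    z = sum(w * f for w, f in zip(head_weights, feats)) + head_bias
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(data, base_weights, head_weights, head_bias):
    """Average binary cross-entropy over the dataset."""
    total = 0.0
    for x, y in data:
        p = predict(x, base_weights, head_weights, head_bias)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)

def train_head(data, base_weights, epochs=200, lr=0.5):
    """Train only the head; base_weights is read but never modified."""
    head_weights = [0.0] * len(base_weights)
    head_bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = base_features(x, base_weights)
            error = predict(x, base_weights, head_weights, head_bias) - y
            head_weights = [w - lr * error * f for w, f in zip(head_weights, feats)]
            head_bias -= lr * error
    return head_weights, head_bias

# Hypothetical "pre-trained" base weights and a toy labelled dataset
# (label 1 = malware, label 0 = benign).
PRETRAINED_BASE = [[1.0, -1.0], [-1.0, 1.0]]
TOY_DATA = [([1.0, 0.0], 1), ([0.9, 0.2], 1), ([0.0, 1.0], 0), ([0.1, 0.8], 0)]

base_before = [row[:] for row in PRETRAINED_BASE]
loss_before = mean_loss(TOY_DATA, PRETRAINED_BASE, [0.0, 0.0], 0.0)
head_w, head_b = train_head(TOY_DATA, PRETRAINED_BASE)
loss_after = mean_loss(TOY_DATA, PRETRAINED_BASE, head_w, head_b)
```

After training, `PRETRAINED_BASE` is unchanged while the head has learned to separate the two toy classes. Fine-tuning, as in Section 9.2, would additionally update some of the base weights, typically with a small learning rate.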

of the convolutional layers in this block will not change. Similarly, we can keep frozen the convolutional layers of the next block as well as blocks three and four if we so wish. Then, we can fine-tune the convolutional layers that are closer to the dense layer. As a result, the initial layers of representation are kept constant, but new representations are learned by later layers (yellow region) as their weights change, evolve and get updated.

Fig. 9. Fine Tuning of Transfer Learning.

Thus, fine-tuning means unfreezing a few of the top layers of a frozen model base used for feature extraction. What we simply do is jointly train the newly added top part of the model (green region), consisting of dense layers, and the top convolutional layers (yellow region), for which we have unfrozen the weights.

Why fine-tune in this manner? Because we slightly adjust the more abstract representations of the model being reused to make them more relevant for the problem at hand. Sudhakar and Kumar (2021) redesigned ResNet50 (He, Zhang, Ren, & Sun, 2016) by replacing the last layer with a fully connected dense layer to detect unknown malware samples without feature engineering. Go et al. (2020) proposed a visualization approach to classify malware families by using a ResNeXt50 pre-trained model. The model achieved 98.86% accuracy on the Malimg dataset (Nataraj et al., 2011). Çayır et al. (2021) built an ensemble pre-trained capsule network (CapsNet) (Sabour, Frosst, & Hinton, 2017a) based on the bootstrap aggregating approach. The model was trained and tested on two public datasets, Malimg and BIG2015. Their model achieved an F-Score of 96.6% on the Malimg dataset (Nataraj et al., 2011) and 98.20% on the BIG2015 dataset.17 Bensaoud et al. (2020) used six convolutional neural network models for malware classification. Comparison among these models shows that the transfer learning model called Inception-V3 (Szegedy, Vanhoucke, Ioffe, Shlens, & Wojna, 2016) achieved the current state-of-the-art in malware classification. Khan, Zhang, and Kumar (2019) evaluated ResNet and GoogleNet (Szegedy et al., 2015) models for malware detection by converting APK bytecode into grayscale images. Table 3 summarizes transfer learning models for malware classification. We conclude that CNN transfer learning models can be fine-tuned to specific image sizes and are robust and accurate enough to use for malware image classification.

17 https://www.kaggle.com/c/malware-classification

Fig. 10 shows how to train the model on an image dataset. We randomly initialize the model, and then train the model on dataset X, which is a large-scale image dataset. This is the pre-training step. Next, we train the model on dataset Y; this dataset is typically smaller than dataset X. This is the fine-tuning step.

State-of-the-art transfer learning models we have trained and evaluated for malware classification are EffNet B0, B1, B2, B3, B4, B5, B6, and B7 (Tan & Le, 2019); Inception-V4 (Szegedy, Ioffe, Vanhoucke, & Alemi, 2017); Xception (Chollet, 2017); and CapsNet (Sabour, Frosst, & Hinton, 2017b), as shown in Table 3. The datasets used are our RGB malware image dataset and two other datasets, namely Malimg Dataset (Nataraj et al., 2011) and Microsoft Malware Dataset (Gibert, Mateu, & Planes, 2020). The accuracy and loss curve plots for EffNet B1, B2, B3, B4, B5, B6, and B7 are shown in Appendix B, and those for EffNet B0 are shown in Fig. 11.

We found that the Inception-V4 model is most effective in classifying malware images among the ten models. In addition, the training time for each model increases with the size of the input images, since the number of network cells grows quickly in GPU RAM.

9.3. Analysis of transfer learning for malware classification

We found that transfer-learning-based image classification, with a small number of parameters to retrain, successfully classifies malware images. On the other hand, we argue that scaling up to wider and deeper transfer models with more parameters builds a new model that may improve performance. Inception-V3 and Inception-V4 for malware detection and classification avoid the inefficiencies in classifying unknown malware grayscale and RGB images among transfer learning classification models. There are many techniques in transfer learning models, such as batch normalization (Kocaman, Shir, & Bäck, 2021) and skip connections (Alaraimi, Okedu, Tianfield, Holden, & Uthmani, 2021), that are designed to help in training, but the accuracy still needs to improve. For instance, ResNet-101 and ResNet-50 have similar accuracies in terms of malware detection even though the two networks differ greatly in depth (Eum, Lee, & Kwon, 2018).

10. Natural language processing for malware classification

Natural Language Processing (NLP) extracts valuable information so that a program is able to read, understand, and derive meaning from human language text or speech. Malware data contain executable files, Microsoft Word files, macro files, logs from different operating systems, emails, network activities, etc. Many of these files contain extensive amounts of text; some others contain snippets of text mixed with code and other information. NLP can be used to enhance malware classification due to the extensive use of text or text-like content within malware. A critical requirement for malware text classification is using effective text representation in the form of text encoding. The initial step in text encoding is preprocessing by removing redundant opcode or API fragments and discarding unnecessary text. After tokenization, there are different types of non-sequential text representations (Jurafsky & Martin, 2021) such as Bag of Words (BoW), Term Frequency-Inverse Document Frequency matrices (TFIDF), Term document matrices


Fig. 10. Transfer Learning steps.

Table 4
The steps of encoding the domain by NLP.
Domain                            Notes
www.uccs.edu                      Start with domain
uccs                              Extract second level
["u", "c", "c", "s"]              Convert to sequence
[21, 3, 3, 18]                    Translate characters to numeric values
[0, 0, 0, ..., 0, 21, 3, 3, 18]   Pad sequence
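
The steps in Table 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vocabulary (`VOCAB`, mapping 'a' to 1 through 'z' to 26, then digits and the hyphen), the padding length `max_len`, and the helper names are hypothetical, and the second-level extraction is naive (it does not handle suffixes such as `co.uk`). Note that this assumed mapping reproduces 21 and 3 for "u" and "c" but yields 19 rather than 18 for "s", so the vocabulary used for Table 4 presumably differs.

```python
import string

# Assumed vocabulary: 'a'..'z' -> 1..26, then digits and '-'; 0 is reserved for padding.
VOCAB = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase + string.digits + "-")}
UNKNOWN_ID = len(VOCAB) + 1  # hypothetical id for characters outside the vocabulary

def second_level_label(domain):
    """Return the label just left of the top-level domain, e.g. 'uccs'."""
    labels = domain.lower().strip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]

def encode_domain(domain, max_len=16):
    """Domain -> second-level label -> characters -> integer ids -> left-padded sequence."""
    chars = list(second_level_label(domain))          # convert to sequence
    ids = [VOCAB.get(ch, UNKNOWN_ID) for ch in chars]  # translate characters to numbers
    ids = ids[-max_len:]                               # truncate overly long labels
    return [0] * (max_len - len(ids)) + ids            # pad on the left with zeros

encoded = encode_domain("www.uccs.edu", max_len=8)
```

Left padding keeps the most recent characters aligned at the end of the sequence, which is a common choice when the padded sequence is fed to a recurrent model.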

(TDM), n-grams, one-hot encoding, ASCII representations, and modern word embeddings such as Word2vec (Mikolov, Chen, Corrado, & Dean, 2013) and Sent2vec (Pagliardini, Gupta, & Jaggi, 2017). Table 4 presents text representation methods used in malware classification. Current word embeddings, when used in malware classification, do not carry much semantic and contextual significance. Bensaoud and Kalita (2024) proposed a novel model for malware classification using API calls and opcodes, incorporating a combined Convolutional Neural Network and Long Short-Term Memory architecture. By transforming features into N-gram sequences and experimenting with various deep learning architectures, including Swin-T and Sequencer2D-L, the method achieves a high accuracy of 99.91%, surpassing state-of-the-art performance. Mimura and Ito (2021) designed NLP-based malware detection by using printable ASCII strings. The model can effectively detect packed malware and anti-debugging techniques. Sequence-to-sequence neural models are commonly used for natural language processing and are therefore used for malware detection as well.

Fig. 11. Training and testing for accuracy and loss of EfficientnetB0.

10.1. Sequence to sequence neural models

The attention mechanism (Luong, Pham, & Manning, 2015) has achieved high performance in sequential learning applications such as machine translation (Lu et al., 2021), image recognition (Gao, Gong, Ding, & Guo, 2021), text summarization (AlMazrouei, Nelci, Salloum, & Shaalan, 2021), and text classification (Niu, Zhong, & Yu, 2021). The attention mechanism was designed to improve the performance of the encoder–decoder machine translation approach (Ren et al., 2021). The encoder and decoder are usually many stacked RNN layers such as LSTM, as shown in Fig. 12. The encoder converts the text into a fixed-length vector while the decoder generates the translation text from this vector. The sequence {x_1, x_2, …, x_n} can either be a representation of text or image, as shown in Fig. 13. In the case of sequences, Recurrent Neural Networks (RNNs) can take two sequences with the same or arbitrary lengths. In Fig. 14, the encoder creates a compressed representation of the input, called the context vector, while the decoder uses the context vector to generate the output sequence. In this approach, the network is incapable of remembering dependencies in long sentences. This is because the context vector needs to handle potentially long sentences, and a short overall representation does not have the capacity to store many potential dependencies.

Attention in encoder–decoder: Bahdanau, Cho, and Bengio (2014) proposed an encoder–decoder attention mechanism framework for machine translation. A single fixed context vector is created by an RNN by encoding the input sequence. Rather than using just the fixed vector, we can also use each state of the encoder along with the current decoder state to generate a dynamic context vector. There are two benefits: the first is that information is encoded in a sequence of vectors, not just in one single context vector; the second is that a subset of these vectors can be chosen adaptively while decoding the translation.

An attention mechanism is another Lego block that can be used in any deep learning model. Vaswani et al. (2017) showed that an attention mechanism is apparently the only Lego block one needs. It


Fig. 12. Encoder and decoder.


Fig. 15. LSTM with attention mechanism for malware classification.
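
To make the dynamic context vector concrete, the sketch below scores every encoder state against the current decoder state, normalizes the scores with a softmax, and returns the weighted sum of encoder states. It assumes plain dot-product scoring for brevity (closer to Luong et al., 2015, than to the additive scoring of Bahdanau et al., 2014); the function names and the three toy encoder states are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_context(decoder_state, encoder_states):
    """Return (attention weights, dynamic context vector) for one decoding step."""
    # One relevance score per encoder state (dot-product scoring is an assumption).
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector = attention-weighted sum of the encoder states.
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context

# Toy encoder states for a three-token input and one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attention_context(decoder_state, encoder_states)
```

Because the weights are recomputed at every decoding step, the decoder can focus on different input positions over time instead of relying on a single fixed-length vector.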

Fig. 13. Encoder and decoder include RNNs.

Table 5
State-of-the-art deep learning models.
Ref                                     Deep learning approach     OS                              Features                                                      Accuracy
Kim, Ban, Ko, Cho, and Yi (2022)        MAPAS                      Android                         API call graphs                                               91.27%
Onwuzurike et al. (2019)                MaMaDroid                  Android                         API calls                                                     84.99%
Kim and Cho (2022)                      Deep Generative Model      Android                         Dalvik code, API call, malware images, developers’ signature  97.47%
Olani, Wu, Chang, and Shih (2022)       DeepWare                   Windows/Linux                   HPC                                                           96.8%
Lian, Nie, Kang, Jia, and Zhang (2022)  Multi-Modal Deep Learning  Windows                         Grayscale image, Byte/Entropy Histogram                       97.01%
Bensaoud and Kalita (2022)              Deep multi-task learning   Windows, Android, Linux, MacOS  Grayscale color image                                         99.97%

improved the performance of a language translation model by dynamically choosing important parts of the input sequence that matter at a certain point in the output sequence. We can entirely replace traditional Recurrent Neural Network (RNN) blocks by an attention mechanism block. When dealing with sequential data, attention mechanisms allow models to not only perform better but also train faster.

Fig. 14. Encoder and decoder include RNNs with attention mechanism.

Applying attention mechanism in malware classification: Or-Meir, Cohen, Elovici, Rokach, and Nissim (2021) added an attention mechanism to an LSTM model, which improved accuracy in malware classification. Yakura, Shinozaki, Nishimura, Oyama, and Sakuma (2019) proposed a method using a Convolutional Neural Network with Attention Mechanism for malware image classification. Mimura and Ohminami (2020) proposed a sliding local attention mechanism model (SLAM) based on API execution sequences. Ma et al. (2019) proposed a malware classification framework (ACNN) based on two sections within the malware text, the assembly code and binary code, and converted them into multi-dimensional features. A CNN with attention mechanism for classification has a higher malware image classification accuracy than conventional methods (Yakura et al., 2019). To build predictive models using LSTM and attention mechanism for malware classification, we need to add an embedding layer followed by an LSTM layer and dense layers. This approach is superior to capturing a long sequence of Windows API calls and using it directly (Girinoto, Setiawan, Putro, & Pramadi, 2020) (see Fig. 15). Longer malware sequences can be addressed by attention mechanisms that can help detect short repeating patterns and other dependencies (Agrawal, Stokes, Selvaraj, & Marinescu, 2019). While the attention mechanism improves accuracy, it is computationally heavy.

Table 5 shows various approaches and their corresponding accuracies. The methods presented, including MAPAS, MaMaDroid, Deep Generative Model, DeepWare, Multi-Modal Deep Learning, and Deep Multi-Task Learning, employ diverse techniques such as API call graph analysis, static analysis, and hybrid deep generative models. Notably, these methods are evaluated on distinct datasets, so the comparisons are not based on the same dataset. The authors aim to convey the effectiveness of these models in detecting malware across different datasets and scenarios. However, a comprehensive overview of the comparative performance of these methods is needed, highlighting their strengths and capabilities in addressing the challenges of malware detection.

11. Deep learning for cryptographic ransomware

Cryptography has been used traditionally for military and government use, to keep secrets from the enemy. Today most of us use


cryptography when we use commercial websites or services. For example, we use it to protect our emails. A lot of countries try to control the export of cryptography to make sure that good cryptographic algorithms are not in the hands of criminals, enemies, or adversaries. This is the idea behind export administration and regulations as codified in International Traffic in Arms Regulations (ITAR).18 In addition, we have various agreements like the Wassenaar Arrangement,19 where a number of countries got together and developed an agreement for what cryptographic elements can be exported and imported without any type of restrictions. This agreement allows publicly available cryptographic algorithms to be distributed freely. Cryptography provides various security capabilities for us.

18 https://csrc.nist.gov/glossary/term/itar
19 https://www.federalregister.gov/documents/2022/08/15/2022-17125/implementation-of-certain-2021-wassenaar-arrangement-decisions-on-four-section-1758-technologies

• Confidentiality: To protect our intellectual property from somebody else being able to get hold of it.
• Non-repudiation: To repudiate is to deny. For example, if we use digital signatures, we can provide proof that the message came from the person who signed. We can link the signed document to a trusted person, which gives us trust or assurance in the world of e-contracts and e-commerce. The signer cannot repudiate or deny being the source of the document.
• Integrity: Hashing provides integrity, to know that a message was not changed either accidentally or intentionally as it was transmitted or stored. Integrity is built into implementations of electronic communication services today using algorithms such as SHA20 and MD5.21
• Proof-of-Origin: Cryptography can be used to prove where a message came from, the idea of Proof-of-Origin.
• Authenticity: The idea is to ensure that communication is with the intended person. For example, if we go to a bank’s website, then we want to be sure that the website is truly of that bank, not that of an impostor or somebody else masquerading as that bank.

20 https://csrc.nist.gov/glossary/term/sha
21 https://csrc.nist.gov/glossary/term/md5

11.1. Operations of cryptography

Cryptographic algorithms come in three basic flavors: Symmetric, Asymmetric, and Hash algorithms. Each of these different types of algorithms serves a different purpose, but all work together in a cryptography system.

Cryptography is a key to keeping communicated information secret by converting it into an unreadable code that is hard to break. To encrypt or encipher is to take a plaintext message and convert it into something unreadable to anyone who does not have a key. To decrypt or decipher is the reverse step.

In Fig. 16, the basic action includes plaintext being fed into a cryptosystem. This process is used to encrypt and decrypt a message. It contains an algorithm that uses a mathematical process to convert a message from plaintext to ciphertext and then back again. The algorithm includes a key or a cryptovariable. The variable is used by the algorithm during the encryption and decryption processes. Typically the key is a secret password, passphrase, or PIN chosen either by the person or by the tool that encrypts the message. This combination of the key (or a cryptovariable) and the algorithm in the cryptosystem produces a unique ciphertext.

Fig. 16. Crypto Action.

In the symmetric algorithm family, a symmetric key is one that is a shared secret between the sender and receiver of the information. The same key used for encryption is also used for decryption. It is not safe to send a copy of the key along with the message that it encrypts. We need to use another mode of communication to transmit the key. For example, Ahmed sends the symmetric key to Bryan using a certain secure mode of communication. Once Bryan has the key, Ahmed can encrypt the plaintext message into ciphertext and send it over a public network to Bryan with confidence that it will remain encrypted until Bryan decides to decrypt with the received key.

Multiple attacks, such as a man-in-the-middle attack, brute force attack, biclique attack, ciphertext-only attack, known plaintext attack, chosen plaintext attack, chosen ciphertext attack, and chosen text attack, can be used to discover the key and find the plaintext. Attackers know the mathematical relationship of the keys for some algorithms, such as Advanced Encryption Standard (AES) (Heron, 2009), Triple DES (Sasi & Sivanandam, 2015), Blowfish (Mahendra & Prabha, 2022), and Rivest-Shamir-Adleman (RSA) (Kota & Aissi, 2022). We perform cryptanalysis using statistical measures to try to determine the cipher type, but a cryptanalyst can only try as many solvers as possible, via trial and error, to test whether the ciphertext was encrypted using a specific cipher. Machine learning can tell us what type a cipher is (Lee, Teh, Jamil, Yan, & Chen, 2021). The cipher type detection problem is a classification problem. We can use statistical values as features for machine learning.

11.2. Connection between deep learning and cryptography

A neural network can deal with the complexity of computation applied to perform cryptography. Instead of giving an image to a neural network, we can give ciphertext to the neural network to classify the kind of algorithm that was used to obtain the ciphertext, as shown in Fig. 17. To build a machine learning model, we can represent different features of the cipher, which cryptanalysts usually use to identify them. We need to put an intermediate layer between the network and ciphertext that computes the features, such as unigram frequencies, bigram frequencies, Index of Coincidence (IoC), HasDoubleLetters, etc., and then we can train the network with millions of ciphertexts and all American Cryptogram Association (ACA) cipher types. For example, in Fig. 18, the three blue neural networks are given the frequencies of N-grams (1-grams, 2-grams, 3-grams, 4-grams, etc.), and the green neural network computes HasDoubleLetters. Then we have a hidden layer that connects the input and output layers. Finally, in this case the designed neural network shows 90% Seriated Playfair, and the green neural network shows 10% Bazeries. Baksi (2022) designed a machine-learning model for differential attacks on the non-Markov 8-round GIMLI cipher and GIMLI hash. They applied multi-layer perceptron (MLP), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM).

Ransomware families that encrypt data and force the victim to make payment via cryptocurrency include WannaCry, Locky, Stop, CryptoJoker, CryptoWall, TeslaCrypt, Dharma, Locker, Cerber, and GandCrab. Recently, deep learning algorithms have been used for cryptography (Kok, Azween, & Jhanjhi, 2020). Ding et al. (2020) proposed DeepEDN to fulfill the process of encrypting and decrypting medical


Fig. 17. Cryptocurrency malware detection using machine learning.

Fig. 18. Detect the Cipher Type With Neural Networks.
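
The feature layer described in Section 11.2 can be illustrated with a small sketch that turns a ciphertext into a numeric vector of unigram frequencies, the Index of Coincidence, and a HasDoubleLetters flag. The function names are hypothetical, bigram and higher N-gram frequencies are omitted for brevity, and HasDoubleLetters is read here simply as "some letter is immediately repeated", which may differ from the exact ACA reference statistic.

```python
from collections import Counter

def letters_only(text):
    """Keep only the letters A-Z, upper-cased."""
    return [ch for ch in text.upper() if "A" <= ch <= "Z"]

def unigram_frequencies(text):
    """Relative frequency of each letter A..Z (26 values)."""
    letters = letters_only(text)
    counts = Counter(letters)
    n = len(letters)
    return [counts.get(chr(ord("A") + i), 0) / n if n else 0.0 for i in range(26)]

def index_of_coincidence(text):
    """Probability that two letters drawn without replacement are equal."""
    letters = letters_only(text)
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def has_double_letters(text):
    """True if any letter is immediately followed by the same letter (assumed definition)."""
    letters = letters_only(text)
    return any(a == b for a, b in zip(letters, letters[1:]))

def cipher_features(text):
    """Concatenate the statistics into one input vector for a classifier."""
    return unigram_frequencies(text) + [
        index_of_coincidence(text),
        float(has_double_letters(text)),
    ]

features = cipher_features("HELLO WORLD")
```

A vector like `features` would then be fed to the classifier in place of the raw ciphertext, so the network learns from the same statistics a cryptanalyst would inspect.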

Fig. 19. Using explainable artificial intelligence in deep learning.
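
As an illustration of the black-box, perturbation-based explanation methods discussed in Section 12 (such as Feature Ablation), the sketch below replaces one input feature at a time with a baseline value and records how much the predicted malware probability drops. The detector `toy_detector`, its weights, and the sample are hypothetical stand-ins; a real study would query a trained model instead.

```python
import math

def toy_detector(features):
    """Hypothetical black-box model returning P(malware) for a feature vector."""
    weights = [2.0, -1.0, 0.0, 0.5]  # assumed weights, for illustration only
    z = sum(w * f for w, f in zip(weights, features)) - 1.0
    return 1.0 / (1.0 + math.exp(-z))

def feature_ablation(model, features, baseline=0.0):
    """Importance of feature i = output drop when feature i is set to the baseline."""
    reference = model(features)
    importances = []
    for i in range(len(features)):
        perturbed = list(features)
        perturbed[i] = baseline  # ablate a single feature
        importances.append(reference - model(perturbed))
    return importances

sample = [1.0, 1.0, 1.0, 1.0]
scores = feature_ablation(toy_detector, sample)
```

A positive score means the feature pushed the prediction toward the malware class, a negative score means it pushed away from it, and a score of zero means the model ignored that feature for this sample.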

images. Kim, Park, Kwon, Jang, and Seo (2021) proposed detection of cryptographic ransomware using a Convolutional Neural Network. Their model prevents crypto-ransomware infection by detecting a block cipher algorithm. Sharmeen, Ahmed, Huda, Koçer, and Hassan (2020) proposed an approach to extract the intrinsic attack characteristics of unlabeled ransomware samples using a deep learning-based unsupervised learning model. Fischer et al. (2019) designed a tool to detect security vulnerabilities of cryptographic APIs in Android, achieving an average AUC-ROC of 99.2%.

12. Explainable artificial intelligence (XAI)

Explainable Artificial Intelligence (XAI) is a rapidly emerging field that focuses on creating transparent and interpretable models (see Fig. 19). In the context of malware detection, XAI can help security experts and analysts understand how a machine learning model arrived at its decisions, making it easier to identify and understand false positives and false negatives. By applying XAI techniques, such as Local Interpretable Model-Agnostic Explanations (LIME) (Ribeiro, Singh, & Guestrin, 2016) or Deep Learning Important Features (DeepLIFT)


(Shrikumar, Greenside, & Kundaje, 2017), security teams can gain insights into the most important features and decision-making processes of the model. This can help them identify areas where the model may be vulnerable to evasion or identify new malware strains that the model may have missed. Ultimately, XAI can improve the trustworthiness and reliability of machine learning models for malware detection, enabling more effective threat detection and response.

Nadeem et al. (2022) provided a comprehensive survey and analysis of the current state of research on explainable machine learning (XAI) techniques for computer security applications. The paper highlights the challenges and opportunities for adopting XAI in the security domain and discusses several approaches for designing and evaluating explainable machine learning models. Vivek, Ravi, Mane, and Naidu (2022) proposed an approach for detecting ATM fraud using explainable artificial intelligence (XAI) and causal inference techniques. They presented a detailed analysis of the proposed method and highlighted its effectiveness in improving the accuracy and interpretability of ATM fraud detection systems. Kinkead, Millar, McLaughlin, and O’Kane (2021) proposed an approach that uses LIME to identify important locations in the opcode sequence that are deemed significant by the Convolutional Neural Network (CNN). McLaughlin et al. (2017) used LRP (Bach et al., 2015) and DeepLIFT (Shrikumar et al., 2017) methods to identify the opcode sequences for most malware families, and they demonstrated that the CNN, while using the DAMD dataset, learned patterns from the underlying opcode representation. Hooker, Erhan, Kindermans, and Kim (2019) proposed a method to remove relevant features detected by an XAI approach and verify the accuracy degradation. Lin, Lee, and Celik (2021) presented seven different XAI methods and automated the evaluation of the correctness of explanation techniques. The first four XAI methods are white-box approaches to determine the importance of input features: Backpropagation (BP), Guided Backpropagation (GBP), Gradient-weighted Class Activation Mapping (GCAM), and Guided GCAM (GGCAM). The last three are black-box approaches that observe an essential feature in the output probability using perturbed samples of the input: Occlusion Sensitivity (OCC), Feature Ablation (FA), and Local Interpretable Model-Agnostic Explanations (LIME).

Guo et al. (2018) proposed LEMNA (Local Explanation Method using Nonlinear Approximation), an approach for explaining deep learning based security applications, which generates interpretable features to explain how input samples are classified. Kuppa and Le-Khac (2020) presented a comprehensive analysis of the vulnerability of XAI methods to adversarial attacks in the context of cybersecurity, discussing potential risks associated with deploying XAI models in real-world applications, and proposing a framework for designing robust and secure XAI systems. Rao and Mane (2021) proposed an approach to protect and analyze systems against the alarm-flooding problem using the NSL-KDD dataset. They included a Security Information and Event Management (SIEM) system to generate a zero-shot method for detecting alarm labels specific to adversarial attacks. Although explainable artificial intelligence (XAI) has gained significant attention, its effectiveness in malware detection still requires further investigation to fully comprehend its performance.

13. Adversarial attack on deep neural networks

Adversarial examples refer to maliciously crafted inputs to machine learning models designed to deceive the model into making incorrect predictions. Deep detection in this context refers to the use of deep learning models for detecting and classifying objects or patterns in the input data. Adversarial examples can be specifically crafted to evade deep detection models and cause them to misclassify or miss the target objects or patterns. Therefore, adversarial examples can be seen as a type of attack on deep detection models. Adversarial examples can be generated using a variety of techniques, including optimization-based approaches and perturbation-based approaches, and can be used for various objectives, including evasion attacks and poisoning attacks. Zhong et al. (2023) proposed a novel adversarial malware example generation method called Malfox, which uses conditional generative adversarial networks (conv-GANs) to generate camouflaged adversarial examples against black-box detectors. The presented method was evaluated on two real-world malware detection systems, and the results showed that Malfox achieved high attack success rates while maintaining low detection rates. Zhao et al. (2023) proposed a new method called SAGE for steering the adversarial generation of examples with accelerations. The technique combines the advantages of gradient-based and gradient-free methods to generate more effective and efficient adversarial examples.

The development of defense mechanisms against adversarial attacks is a computationally expensive process, which can potentially affect the performance of the deep learning model. In addition, adversarial examples can impact the generalization ability of deep learning models, resulting in poor performance on new and unseen data. Moreover, generating adversarial examples can be computationally intensive, especially for large datasets and complex models, which can hinder the practical deployment of deep learning models in real-world applications. Thus, further research is required to improve the efficiency and effectiveness of defense mechanisms, as well as the generalization ability and robustness of deep learning models to adversarial attacks.

Hu and Tan (2023) proposed a method to generate adversarial malware examples using Generative Adversarial Networks (GANs) for black-box attacks. Their results show that the generated adversarial malware samples can evade detection by existing machine learning models while maintaining high similarity to the original malware. Ling et al. (2023) conducted a survey of the state-of-the-art in adversarial attacks against Windows PE malware detection, covering various types of attacks and defense mechanisms. The authors also provided insights on potential future research directions in this area. Xu et al. (2023) proposed a semi-black-box adversarial sample attack framework called Ofei that can generate adversarial samples against Android apps deployed on a DLAAS platform. The framework utilizes a multi-objective optimization algorithm to generate robust and stealthy adversarial samples. Qiao et al. (2022) proposed an adversarial detection method for ELF malware using model interpretation and showed that their method can effectively identify adversarial ELF malware with high accuracy. The proposed approach combines random forests and LIME to identify the most important features and thus improve the interpretability and robustness of the model. Meenakshi and Maragatham (2023) proposed a defensive technique using the Curvelet transform to recognize adversarial iris images, optimizing the image classification accuracy. The designed method was shown to be effective against several existing adversarial attacks on iris recognition systems. Pintor et al. (2022) introduced a method for debugging and improving the optimization of adversarial examples by identifying and analyzing the indicators of attack failure. The proposed method can help to improve the robustness of deep learning models against adversarial attacks.

14. Conclusion

Machine learning has started to gain the attention of malware detection researchers, notably in malware image classification and cipher cryptanalysis. However, more experimentation is required to understand the capabilities and limitations of deep learning when used to detect/classify malware. Deep learning can reduce the need for static and dynamic analysis and discover suspicious patterns. In the future, researchers may consider developing more accurate, robust, scalable, and efficient deep learning models for malware detection systems for various operating systems. Finally, multi-task learning and transfer learning can provide valuable results in classifying all types of malware. Furthermore, we show that the significant challenges of deep learning approaches that need to be considered are hyperparameter optimization, fine-tuning, and size and quality of datasets when features are

overweighted or overrepresented. We also illustrate the opportunities and challenges of XAI in deep learning as well as future research directions in the context of malware detection. Finally, we presented the idea of adversarial attacks on deep neural networks, which introduce small, carefully crafted perturbations to input data in order to cause misclassification or reduce model performance.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data that has been used is confidential.

Appendix A. File types

.tbk, .jpeg, .brd, .dot, .jpg, .rtf, .doc, .js, .sch, .3dm, .mp3, .sh, .3ds, .key, .sldm, .3g2, .lay, .sldm, .mkv, .std, .asp, .mml, .sti, .avi, .mov, .stw, .backup, .jsp, .suo, .bak, .mp4, .svg, .bat, .mpeg, .swf, .bmp, .mpg, .sxc, .rb, .msg, .sxd, .bz2, .myd, .sxi, .c, .myi, .sxm, .cgm, .nef, .sxw, .class, .odb, .tar, .cmd, .odg, .123, .onetoc2, .odp, .tgz, .crt, .ods, .tif, .3gp, .lay6, .sldx, .7z, .ldf, .slk, .vsd, .m3u, .sln, .aes, .m4u, .snt, .ai, .max, .sql, .ppam, .mdb, .sqlite3, .asc, .mdf, .sqlitedb, .asf, .mid, .stc, .asm, .cs, .odt, .tiff, .csr, .cpp, .txt, .csv, .pas, .vmx, .docb, .pdf, .vob, .docm, .pem, .accdb, .docx, .pfx, .vsdx, .602, .p12, .wav, .dotm, .pl, .wb2, .dotx, .png, .wk1, .dwg, .pot, .xltx, .edb, .potm, .wma, .eml, .potx, .wmv, .fla, .ARC, .xlc, .flv, .pps, .xlm, .frm, .ppsm, .xls, .gif, .ppsx, .xlsb, .gpg, .ppt, .xlsm, .gz, .pptm, .xlsx, .h, .pptx, .xlt, .hwp, .ps1, .xltm, .ibd, .psd, .wks, .iso, .pst, .xlw, .jar, .rar, .djvu, .java, .raw, .ost, .uop, .db, .otg, .uot, .dbf, .otp, .vb, .dch, .ots, .vbs, .der, .ott, .vcd, .dif, .php, .vdi, .dip, .PAQ, .vmdk, .zip

Appendix B. The accuracy and loss curves plots

See Figs. 20–26.

Fig. 20. Training and testing for accuracy and loss of EfficientNetB1.
Fig. 21. Training and testing for accuracy and loss of EfficientNetB2.
Fig. 22. Training and testing for accuracy and loss of EfficientNetB3.
Fig. 23. Training and testing for accuracy and loss of EfficientNetB4.
Fig. 24. Training and testing for accuracy and loss of EfficientNetB5.
Fig. 25. Training and testing for accuracy and loss of EfficientNetB6.
Fig. 26. Training and testing for accuracy and loss of EfficientNetB7.


References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. (2016). Tensorflow: A system for large-scale machine learning. In 12th USENIX symposium on operating systems design and implementation (OSDI 16) (pp. 265–283).
Agrawal, R., Stokes, J. W., Selvaraj, K., & Marinescu, M. (2019). Attention in recurrent neural networks for ransomware detection. In ICASSP 2019 - 2019 IEEE international conference on acoustics, speech and signal processing (pp. 3222–3226).
Alaraimi, S., Okedu, K. E., Tianfield, H., Holden, R., & Uthmani, O. (2021). Transfer learning networks with skip connections for classification of brain tumors. International Journal of Imaging Systems and Technology.
Alcantarilla, P. F., Bartoli, A., & Davison, A. J. (2012). KAZE features. In European conference on computer vision (pp. 214–227). Springer.
AlMazrouei, R. Z., Nelci, J., Salloum, S. A., & Shaalan, K. (2021). Feasibility of using attention mechanism in abstractive summarization. In International conference on emerging technologies and intelligent systems (pp. 13–20). Springer.
Asam, M., Hussain, S. J., Mohatram, M., Khan, S. H., Jamal, T., Zafar, A., et al. (2021). Detection of exceptional malware variants using deep boosted feature spaces and machine learning. Applied Sciences, 11(21), 10464.
Azad, M. A., Riaz, F., Aftab, A., Rizvi, S. K. J., Arshad, J., & Atlam, H. F. (2022). DEEPSEL: A novel feature selection for early identification of malware in mobile applications. Future Generation Computer Systems, 129, 54–63.
Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS One, 10(7), Article e0130140.
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
Bai, Y., Xing, Z., Ma, D., Li, X., & Feng, Z. (2021). Comparative analysis of feature representations and machine learning methods in Android family classification. Computer Networks, 184, Article 107639.
Baksi, A. (2022). Machine learning-assisted differential distinguishers for lightweight ciphers. In Classical and physical security of symmetric key cryptographic algorithms (pp. 141–162). Springer.
Barath, N., Ouboti, D., & Temesguen, M. (2016). Pattern recognition algorithms for malware classification. In Proceeding of 2016 IEEE conference of aerospace and electronics (pp. 338–342).
Bay, H., Tuytelaars, T., & Van Gool, L. (2006). Surf: Speeded up robust features. In European conference on computer vision (pp. 404–417). Springer.
Bensaoud, A., Abudawaood, N., & Kalita, J. (2020). Classifying malware images with convolutional neural network models. International Journal of Network Security, 22, 1022–1031.
Bensaoud, A., & Kalita, J. (2022). Deep multi-task learning for malware image classification. Journal of Information Security and Applications, 64, Article 103057.
Bensaoud, A., & Kalita, J. (2024). CNN-LSTM and transfer learning models for malware classification based on opcodes and API calls. Knowledge-Based Systems, Article 111543.
Bhodia, N., Prajapati, P., Di Troia, F., & Stamp, M. (2019). Transfer learning for image-based malware classification. arXiv preprint arXiv:1903.11551.
Çayır, A., Ünal, U., & Dağ, H. (2021). Random CapsNet forest model for imbalanced malware type classification task. Computers & Security, 102, Article 102133.
Charikar, M. S. (2002). Similarity estimation techniques from rounding algorithms. In Proceedings of the thirty-fourth annual ACM symposium on theory of computing (pp. 380–388).
Chauhan, D., Singh, H., Hooda, H., & Gupta, R. (2022). Classification of malware using visualization techniques. In International conference on innovative computing and communications (pp. 739–750). Springer.
Chaulagain, D., Poudel, P., Pathak, P., Roy, S., Caragea, D., Liu, G., et al. (2020). Hybrid analysis of android apps for security vetting using deep learning. In 2020 IEEE conference on communications and network security (pp. 1–9).
Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1251–1258).
Damodaran, A., Di Troia, F., Visaggio, C. A., Austin, T. H., & Stamp, M. (2017). A comparison of static, dynamic, and hybrid analysis for malware detection. Journal of Computer Virology and Hacking Techniques, 13(1), 1–12.
Darem, A., Abawajy, J., Makkar, A., Alhashmi, A., & Alanazi, S. (2021). Visualization and deep-learning-based malware variant detection using OpCode-level features. Future Generation Computer Systems, 125, 314–323.
Ding, Y., Wu, G., Chen, D., Zhang, N., Gong, L., Cao, M., et al. (2020). DeepEDN: a deep-learning-based image encryption and decryption network for internet of medical things. IEEE Internet of Things Journal, 8(3), 1504–1518.
El-Shafai, W., Almomani, I., & AlKhayer, A. (2021). Visualized malware multi-classification framework using fine-tuned CNN-based transfer learning models. Applied Sciences, 11(14).
Euh, S., Lee, H., Kim, D., & Hwang, D. (2020). Comparative analysis of low-dimensional features and tree-based ensembles for malware detection systems. IEEE Access, 8, 76796–76808.
Eum, S., Lee, H., & Kwon, H. (2018). Going deeper with CNN in malicious crowd event classification. Vol. 10646, In Signal processing, sensor/information fusion, and target recognition XXVII. International Society for Optics and Photonics, Article 1064616.
Fan, Z., Xu, Y., & Zhang, D. (2011). Local linear discriminant analysis framework using sample neighbors. IEEE Transactions on Neural Networks, 22(7), 1119–1132.
Fischer, F., Xiao, H., Kao, C.-Y., Stachelscheid, Y., Johnson, B., Razar, D., et al. (2019). Stack overflow considered helpful! deep learning security nudges towards stronger cryptography. In 28th USENIX security symposium (USENIX Security 19) (pp. 339–356).
Gao, Y., Gong, H., Ding, X., & Guo, B. (2021). Image recognition based on mixed attention mechanism in smart home appliances. Vol. 5, In 2021 IEEE 5th advanced information technology, electronic and automation control conference (pp. 1501–1505).
Gibert, D., Mateu, C., & Planes, J. (2020). The rise of machine learning for detection and classification of malware: Research developments, trends and challenges. Journal of Network and Computer Applications, 153, Article 102526.
Girinoto, Setiawan, H., Putro, P. A. W., & Pramadi, Y. R. (2020). Comparison of LSTM architecture for malware classification. In 2020 international conference on informatics, multimedia, cyber and information system (pp. 93–97).
Go, J. H., Jan, T., Mohanty, M., Patel, O. P., Puthal, D., & Prasad, M. (2020). Visualization approach for malware classification with ResNeXt. In 2020 IEEE congress on evolutionary computation (pp. 1–7).
Guo, W., Mu, D., Xu, J., Su, P., Wang, G., & Xing, X. (2018). Lemna: Explaining deep learning based security applications. In Proceedings of the 2018 ACM SIGSAC conference on computer and communications security (pp. 364–379).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).
Hemalatha, J., Roseline, S. A., Geetha, S., Kadry, S., & Damaševičius, R. (2021). An efficient DenseNet-based deep learning model for malware detection. Entropy, 23(3), 344.
Herault, J., & Jutten, C. (1986). Space or time adaptive signal processing by neural network models. Vol. 151, In AIP conference proceedings (1), (pp. 206–211). American Institute of Physics.
Heron, S. (2009). Advanced encryption standard (AES). Netw. Secur., 2009(12), 8–12.
Hooker, S., Erhan, D., Kindermans, P.-J., & Kim, B. (2019). A benchmark for interpretability methods in deep neural networks. In Advances in neural information processing systems: vol. 32.
Hu, M.-K. (1962). Visual pattern recognition by moment invariants. IRE Transactions on Information Theory, 8(2), 179–187.
Hu, W., & Tan, Y. (2023). Generating adversarial malware examples for black-box attacks based on GAN. In Data mining and big data: 7th international conference, DMBD 2022, Beijing, China, November 21–24, 2022, proceedings, Part II (pp. 409–423). Springer.
Jurafsky, D., & Martin, J. (2021). Speech and Language Processing (third ed.).
Kancherla, K., Donahue, J., & Mukkamala, S. (2016). Packer identification using Byte plot and Markov plot. Journal of Computer Virology and Hacking Techniques, 12(2), 101–111.
Ketkar, N., & Santana, E. (2017). Vol. 1, Deep learning with Python. Springer.
Khan, R. U., Zhang, X., & Kumar, R. (2019). Analysis of ResNet and GoogleNet models for malware detection. Journal of Computer Virology and Hacking Techniques, 15(1), 29–37.
Khayam, S. A. (2003). Vol. 114, The discrete cosine transform (DCT): theory and application (pp. 1–31). Michigan State University, Citeseer.
Kim, J., Ban, Y., Ko, E., Cho, H., & Yi, J. H. (2022). MAPAS: a practical deep learning-based android malware detection system. International Journal of Information Security, 1–14.
Kim, J.-Y., & Cho, S.-B. (2022). Obfuscated malware detection using deep generative model based on global/local features. Computers & Security, 112, Article 102501.
Kim, H., Park, J., Kwon, H., Jang, K., & Seo, H. (2021). Convolutional neural network-based cryptography ransomware detection for low-end embedded processors. Mathematics, 9(7), 705.
Kim, T., Suh, S. C., Kim, H., Kim, J., & Kim, J. (2018). An encoding technique for CNN-based network anomaly detection. In 2018 IEEE international conference on big data (big data) (pp. 2960–2965).
Kinkead, M., Millar, S., McLaughlin, N., & O’Kane, P. (2021). Towards explainable CNNs for android malware detection. Procedia Computer Science, 184, 959–965.
Kocaman, V., Shir, O. M., & Bäck, T. (2021). Improving model accuracy for imbalanced image classification tasks by adding a final batch normalization layer: An empirical study. In 2020 25th international conference on pattern recognition (pp. 10404–10411).
Kok, S., Azween, A., & Jhanjhi, N. (2020). Evaluation metric for crypto-ransomware detection using machine learning. Journal of Information Security and Applications, 55, Article 102646.
Kota, C. M., & Aissi, C. (2022). Implementation of the RSA algorithm and its cryptanalysis. In 2002 GSW.
Kuppa, A., & Le-Khac, N.-A. (2020). Black box attacks on explainable artificial intelligence (XAI) methods in cyber security. In 2020 international joint conference on neural networks (pp. 1–8). IEEE.
Lee, T. R., Teh, J. S., Jamil, N., Yan, J. L. S., & Chen, J. (2021). Lightweight block cipher security evaluation based on machine learning classifiers and active S-boxes. IEEE Access, 9, 134052–134064.
Lian, W., Nie, G., Kang, Y., Jia, B., & Zhang, Y. (2022). Cryptomining malware detection based on edge computing-oriented multi-modal features deep learning. China Communications, 19(2), 174–185.


Lin, W.-C., Hays, J., Wu, C., Kwatra, V., & Liu, Y. (2004). A comparison study of four texture synthesis algorithms on regular and near-regular textures: Tech. Rep., Citeseer.
Lin, Y.-S., Lee, W.-C., & Celik, Z. B. (2021). What do you see? Evaluation of explainable artificial intelligence (XAI) interpretability through neural backdoors. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining (pp. 1027–1035).
Ling, X., Wu, L., Zhang, J., Qu, Z., Deng, W., Chen, X., et al. (2023). Adversarial attacks against Windows PE malware detection: A survey of the state-of-the-art. Computers & Security, Article 103134.
Lo, W. W., Yang, X., & Wang, Y. (2019). An Xception convolutional neural network for malware classification with transfer learning. In 2019 10th IFIP international conference on new technologies, mobility and security (pp. 1–5).
Lowe, D. (1999). Object recognition from local scale-invariant features. Vol. 2, In Proceedings of the seventh IEEE international conference on computer vision (pp. 1150–1157).
Lu, Z., Li, X., Liu, Y., Zhou, C., Cui, J., Wang, B., et al. (2021). Exploring multi-stage information interactions for multi-source neural machine translation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 1.
Luong, M.-T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
Ma, X., Guo, S., Li, H., Pan, Z., Qiu, J., Ding, Y., et al. (2019). How to make attention mechanisms more practical in malware classification. IEEE Access, 7, 155270–155280.
Mahendra, M., & Prabha, P. S. (2022). Classification of security levels to enhance the data sharing transmissions using blowfish algorithm in comparison with data encryption standard. In 2022 international conference on sustainable computing and data communication systems (pp. 1154–1160). IEEE.
McLaughlin, N., Martinez del Rincon, J., Kang, B., Yerima, S., Miller, P., Sezer, S., et al. (2017). Deep android malware detection. In Proceedings of the seventh ACM on conference on data and application security and privacy (pp. 301–308).
Meenakshi, K., & Maragatham, G. (2023). An optimised defensive technique to recognize adversarial Iris images using Curvelet transform. Intelligent Automation & Soft Computing, 35(1), 627–643.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Mimura, M., & Ito, R. (2021). Applying NLP techniques to malware detection in a practical environment. International Journal of Information Security, 1–13.
Mimura, M., & Ohminami, T. (2020). Using LSI to detect unknown malicious VBA macros. Journal of Information Processing, 28, 493–501.
Mohamad, Arif, J., Ab Razak, M. F., Awang, S., Tuan Mat, S. R., Ismail, N. S. N., et al. (2021). A static analysis approach for android permission-based malware detection systems. PLoS One, 16(9), Article e0257968.
Mohammed, T. M., Nataraj, L., Chikkagoudar, S., Chandrasekaran, S., & Manjunath, B. (2021). Malware detection using frequency domain-based image visualization and deep learning. arXiv preprint arXiv:2101.10578.
Nadeem, A., Vos, D., Cao, C., Pajola, L., Dieck, S., Baumgartner, R., et al. (2022). Sok: Explainable machine learning for computer security applications. arXiv preprint arXiv:2208.10605.
Naik, N., Jenkins, P., Savage, N., Yang, L., Boongoen, T., & Iam-On, N. (2021). Fuzzy-import hashing: A static analysis technique for malware detection. Forensic Science International: Digital Investigation, Vol. 37, Article 301139.
Narayanan, B. N., & Davuluru, V. S. P. (2020). Ensemble malware classification system using deep neural networks. Electronics, 9(5), 721.
Nataraj, L., Karthikeyan, S., Jacob, G., & Manjunath, B. S. (2011). Malware images: visualization and automatic classification. In Proceedings of the 8th international symposium on visualization for cyber security (pp. 1–7).
Ni, S., Qian, Q., & Zhang, R. (2018). Malware identification using visualization images and deep learning. Computers & Security, 77, 871–885.
Niu, Z., Zhong, G., & Yu, H. (2021). A review on the attention mechanism of deep learning. Neurocomputing, 452, 48–62.
Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1), 51–59.
Olani, G., Wu, C.-F., Chang, Y.-H., & Shih, W.-K. (2022). DeepWare: Imaging performance counters with deep learning to detect ransomware. IEEE Transactions on Computers, 1.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.
Onwuzurike, L., Mariconti, E., Andriotis, P., Cristofaro, E. D., Ross, G., & Stringhini, G. (2019). Mamadroid: Detecting android malware by building Markov chains of behavioral models (extended version). ACM Transactions on Privacy and Security, 22(2), 1–34.
Or-Meir, O., Cohen, A., Elovici, Y., Rokach, L., & Nissim, N. (2021). Pay attention: Improving classification of PE malware using attention mechanisms based on system call analysis. In 2021 international joint conference on neural networks (pp. 1–8).
Pagliardini, M., Gupta, P., & Jaggi, M. (2017). Unsupervised learning of sentence embeddings using compositional n-gram features. arXiv preprint arXiv:1703.02507.
Pintor, M., Demetrio, L., Sotgiu, A., Demontis, A., Carlini, N., Biggio, B., et al. (2022). Indicators of attack failure: Debugging and improving optimization of adversarial examples. Advances in Neural Information Processing Systems, 35, 23063–23076.
Qiao, Y., Zhang, W., Tian, Z., Yang, L. T., Liu, Y., & Alazab, M. (2022). Adversarial ELF malware detection method using model interpretation. IEEE Transactions on Industrial Informatics, 19(1), 605–615.
Qiao, Y., Zhang, B., & Zhang, W. (2020). Malware classification method based on word vector of bytes and multilayer perception. In ICC 2020-2020 IEEE international conference on communications (pp. 1–6). IEEE.
Rao, D., & Mane, S. (2021). Zero-shot learning approach to adaptive cybersecurity using explainable AI. arXiv preprint arXiv:2106.14647.
Ren, S., Zhou, L., Liu, S., Wei, F., Zhou, M., & Ma, S. (2021). Semface: Pre-training encoder and decoder with a semantic interface for neural machine translation. In Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (volume 1: long papers) (pp. 4518–4527).
Rezende, E., Ruppert, G., Carvalho, T., Ramos, F., & De Geus, P. (2017). Malicious software classification using transfer learning of resnet-50 deep neural network. In 2017 16th IEEE international conference on machine learning and applications (pp. 1011–1014). IEEE.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135–1144).
Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011). ORB: An efficient alternative to SIFT or SURF. In 2011 international conference on computer vision (pp. 2564–2571).
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
Sabour, S., Frosst, N., & Hinton, G. E. (2017a). Dynamic routing between capsules. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Vol. 30, Advances in neural information processing systems. Curran Associates, Inc.
Sabour, S., Frosst, N., & Hinton, G. E. (2017b). Dynamic routing between capsules. arXiv preprint arXiv:1710.09829.
Sasi, S. B., & Sivanandam, N. (2015). A survey on cryptography using optimization algorithms in WSNs. Indian Journal of Science and Technology, 8(3), 216.
Sharmeen, S., Ahmed, Y. A., Huda, S., Koçer, B. Ş., & Hassan, M. M. (2020). Avoiding future digital extortion through robust protection against ransomware threats using deep learning based adaptive approaches. IEEE Access, 8, 24522–24534.
Shrikumar, A., Greenside, P., & Kundaje, A. (2017). Learning important features through propagating activation differences. In International conference on machine learning (pp. 3145–3153). PMLR.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Sudhakar, & Kumar, S. (2021). MCFT-CNN: Malware classification with fine-tune convolution neural networks using traditional and transfer learning in internet of things. Future Generation Computer Systems, 125, 334–351.
Swain, M. J., & Ballard, D. H. (1991). Color indexing. International Journal of Computer Vision, 7(1), 11–32.
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-first AAAI conference on artificial intelligence.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–9).
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2818–2826).
Tan, M., & Le, Q. (2019). Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105–6114). PMLR.
Tobiyama, S., Yamaguchi, Y., Shimada, H., Ikuse, T., & Yagi, T. (2016). Malware detection with deep neural network using process behavior. Vol. 2, In 2016 IEEE 40th annual computer software and applications conference (pp. 577–582).
Vasan, D., Alazab, M., Wassan, S., Naeem, H., Safaei, B., & Zheng, Q. (2020). IMCFN: Image-based malware classification using fine-tuned convolutional neural network architecture. Computer Networks, 171, Article 107138.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998–6008).
Vivek, Y., Ravi, V., Mane, A. A., & Naidu, L. R. (2022). Explainable artificial intelligence and causal inference based ATM fraud detection. arXiv preprint arXiv:2211.10595.
Xiao, M., Guo, C., Shen, G., Cui, Y., & Jiang, C. (2021). Image-based malware classification using section distribution information. Computers & Security, 110, Article 102420.
Xu, G., Xin, G., Jiao, L., Liu, J., Liu, S., Feng, M., et al. (2023). Ofei: A semi-black-box android adversarial sample attack framework against dlaas. IEEE Transactions on Computers.
Yakura, H., Shinozaki, S., Nishimura, R., Oyama, Y., & Sakuma, J. (2019). Neural malware analysis with attention mechanism. Computers & Security, 87, Article 101592.


Ye, R., & Dai, Q. (2021). Implementing transfer learning across different datasets for time series forecasting. Pattern Recognition, 109, Article 107617.
Yuan, B., Wang, J., Liu, D., Guo, W., Wu, P., & Bao, X. (2020). Byte-level malware classification based on Markov images and deep learning. Computers & Security, 92, Article 101740.
Zhao, Z., Li, Z., Zhang, F., Yang, Z., Luo, S., Li, T., et al. (2023). SAGE: Steering the adversarial generation of examples with accelerations. IEEE Transactions on Information Forensics and Security, 18, 789–803.
Zhong, F., Cheng, X., Yu, D., Gong, B., Song, S., & Yu, J. (2023). Malfox: camouflaged adversarial malware example generation based on conv-GANs against black-box detectors. IEEE Transactions on Computers.
Zhu, J., Jang-Jaccard, J., Singh, A., Watters, P. A., & Camtepe, S. (2021). Task-aware meta learning-based siamese neural network for classifying obfuscated malware. arXiv preprint arXiv:2110.13409.
