
Int J Comput Intell Syst (2025) 18:52

RESEARCH ARTICLE

Hybrid Android Malware Detection and Classification Using Deep Neural Networks
Muhammad Umar Rashid1 · Shahnawaz Qureshi2 · Abdullah Abid1 ·
Saad Said Alqahtany3 · Ali Alqazzaz4 · Mahmood ul Hassan5 ·
Mana Saleh Al Reshan6,7 · Asadullah Shaikh6,7

Received: 30 September 2024 / Accepted: 25 February 2025


© The Author(s) 2025

Abstract
This paper presents a deep learning-based framework for Android malware detection that addresses critical limita-
tions in existing methods, particularly in handling obfuscation and scalability under rapid mobile app development
cycles. Unlike prior approaches, the proposed system integrates a multi-dimensional analysis of Android per-
missions, intents, and API calls, enabling robust feature extraction even under reverse engineering constraints.
Experimental results demonstrate state-of-the-art performance, achieving 98.2% accuracy (a 7.5% improvement
over DeepAMD) on a cross-dataset evaluation spanning 15 malware families and 45,000 apps. The framework’s
novel architecture enhances explainability by mapping detection outcomes to specific behavioral patterns while
rigorous benchmarking across five public datasets (including Drebin, AndroZoo, and VirusShare) mitigates dataset
bias and validates generalization. By outperforming existing techniques in accuracy, adaptability, and interpretabil-
ity, this work advances the practicality of deep learning for real-world Android malware defense in evolving threat
landscapes.

Keywords Malware · Android malware · Artificial neural networks · Machine learning

1 Introduction

Malicious software (malware) poses a serious danger to digital systems; it is designed to infiltrate devices, steal critical data, or disrupt operations. Cybercriminals deliver many sorts of malware, including ransomware, trojans, adware, and spyware, via vectors such as phishing emails, malicious downloads, or compromised websites. Once inside a system, malware may exfiltrate personal information, disable networks, or allow unwanted remote control, as demonstrated in botnet-driven assaults [1].
Malware is capable of inflicting severe harm, including but not limited to data theft, file deletion, network
disruption, financial detriment, invasion of privacy, identity theft, and complete system closure. Typically, infected
websites, email attachments, software installations, and external devices such as USB drives are utilized to distribute
malware [2]. Malware may be utilized to steal personally identifiable information (e.g., credit card and password
data), damage or destroy computer systems or networks, disrupt business operations, or extort money from victims
via ransomware attacks. Additionally, hackers may use malicious software to create botnets, which are networks
of compromised computers that are remotely controllable and used to launch coordinated attacks against other
systems or networks [3]. Cybercriminals are using malware more frequently, so individuals and organizations
must constantly be on guard to stop malware attacks. Achieving this goal can be facilitated by adopting a number of preventative measures: keeping existing systems up to date, utilizing strong passwords, avoiding opening dubious
emails and attachments, and downloading software exclusively from reputable sources [4]. A comprehensive
understanding of the most recent malware variants and their methods of operation is essential to avert falling
prey to such assaults. By staying informed about the latest malware variants, individuals and organizations can
proactively update their security measures and stay one step ahead of cybercriminals. Additionally, regularly
educating employees about the risks of malware and providing training on how to identify and report suspicious
activities can further enhance overall cybersecurity.
The granting of permissions necessary for Android applications is facilitated by a module integrated into the
Android operating system. These permissions are granted automatically unless a security policy transgression
occurs [5]. The subsequent sections elaborate on the four distinct protection levels that comprise Android permissions. In addition, the dataset comprises four discrete forms of malicious software that require cate-
gorization: SMS malware, adware, ransomware, and scareware. On account of the dynamic and proliferating
characteristics of malware, a multitude of detection and prevention strategies are currently being suggested.
Scholars have documented two distinct methodologies for the identification of malware. Two methods are utilized
for malware analysis: dynamic analysis examines the behavior of malware in an isolated environment after its
execution, while static analysis examines applications prior to their implementation [6].
The detection of Android malware in its present state is characterized by a lack of coherence among solutions
that rely on either static or dynamic characteristics without a unified model that integrates both. The difficulties
presented by the rapid advancements in mobile application development and anti-reverse engineering techniques
further compound this state of disarray [7]. The research void concerns the necessity for an all-encompassing
Android malware detection system that integrates static and dynamic analysis in an efficient manner. The existing
issue pertains to the lack of a cohesive methodology that can effectively leverage the advantages of both static
and dynamic investigations [8]. These potential drawbacks could affect the precision, extent, and categorization
functionalities of the detection procedure. A hybrid model needs to be researched and created to deal with the
effects of fast progress in application programming and anti-reverse engineering techniques, as well as problems
related to obfuscation, latency, and accuracy limits.
While previous research has examined several approaches to malware detection, such as API call analysis,
permission-based screening, and deep learning algorithms, these methods often have severe shortcomings. Prior
research has shown that dangerous software detection and classification are only marginally successful, with incon-
sistent and unreliable identification of obfuscated apps and sophisticated malware families. Current methodologies
typically show inadequate accuracy in both static and dynamic analysis levels, resulting in serious weaknesses in
mobile device security. These technical limitations necessitate the development of a unique and comprehensive
malware identification method that can accurately identify malware at multiple analytical levels. To address these
issues, a sophisticated solution is required that can not only detect binary malware with high accuracy but also
effectively categorize various malware families and categories.
This paper makes the following contributions to the field of Android malware detection:

• A hybrid model for Android malware detection is proposed that covers a broader spectrum of malware families and categories.
• The proposed approach's effectiveness is assessed using deep learning and traditional machine learning classifiers.
• A comparative analysis is conducted between the proposed approach and conventional machine learning techniques, including deep artificial neural networks (Deep ANN), naive Bayes (NB), sequential minimal optimization (SMO), and multilayer perceptron (MLP).
• A comparative analysis is conducted between different publicly available Android malware datasets.
• The proposed approach outperforms traditional machine learning techniques and cutting-edge studies in terms of detection performance and accuracy on both static and dynamic layers.


The remainder of the manuscript is organized as follows: a succinct synopsis of the technical framework and recent progress concerning the detection and identification of malware on Android is presented in Sect. 2. Section 3 provides an examination of the selected dataset along with fundamental overviews of Android malware identification and detection. A comprehensive outline of the methodology we propose for the identification and classification of Android malware is presented in Sect. 4, including an analysis of the permission, intent, and API call functionalities of Android. In Sect. 5, the steps of the proposed approach are described in detail. The extensive results, along with the evaluation, are provided in Sect. 6. Finally, Sect. 7 presents the conclusions and identifies future directions.

2 Related Work

With 4 billion smartphone users expected by 2024 [9], Android devices are attractive targets for malware owing
to the sensitive data they hold, necessitating many detection techniques. For example, the study referenced in
[10] introduced a deep learning model that combines features like permissions, services, broadcast receivers, and
opcode sequences with two new static features (application size and fuzzy hash). They used the CICMalDroid 2020
dataset and achieved more than 96% accuracy, even though it required a lot of computing power. Subash et al. [11]
used intrinsic permissions and machine learning approaches (K-Neighbours, Naïve Bayes, Decision Tree) on 398
carefully vetted apps. Naïve Bayes performed best; however, this static approach may ignore dynamic behaviors.
[12] used a combination of machine learning and deep learning algorithms to categorise malware families and kinds
across several datasets (e.g., Drebin and CICMaldroid2020), resulting in low false-positive rates after extensive
feature extraction. RealMalSol [13] revealed that a neural network-based static detection technique optimised for
on-device analysis via feature reduction and TensorFlow Lite translation may raise accuracy from 95.2 to 96.4%.
However, its dependency on emulator-based analysis restricts efficacy. Wu et al. [17] introduced DeepCatra, which
combines a bidirectional LSTM and a graph neural network to gather time-based and flow information from call
traces. This method improves F1 scores by 2.7–14.6%, but it faces issues with scaling and difficulty handling
API obfuscation. Sasidharan et al. [14] developed a behavioural method using decompiled API patterns and a
profile-hidden Markov model to achieve 94.5% accuracy, while [15] introduced KronoDroid, an extensive hybrid
dataset with 489 static and dynamic features spanning 2008–2020 from over 209 malware families, which, despite
its richness, requires extensive preprocessing. Our proposed hybrid method combines static and dynamic analysis of API calls, permissions, and intents. This approach mitigates problems such as API obfuscation, scalability issues, and uneven class distribution. Our method achieves 6.9% higher accuracy and 5.7% better recall for classifying
malware families compared to top methods like DeepAMD. We also use the varied CICInvesAndMal2019 dataset
to ensure strong sample representation. Our deep learning system combines deep neural networks with classifiers
such as Naïve Bayes, SMO, and MLP. It effectively handles class imbalance using stratified sampling and adjusting
the learning rate as needed. This approach improves generalisation, adaptability, and explainability, making it a
new standard for classifying Android malware and its practical use.

2.1 Detection of Android Malware Based on API Calls

Detection of Android malware based on API calls is introduced by [16]. Their proposal, Droid-MCFG, utilizes
control flow traces and manifest data analysis to identify malicious Android applications. The technique generates
digital fingerprints of app events by combining manifest data and API calls and trains fingerprint features using
transfer learning with word2vec embedding. This method achieved high accuracy rates in classifying malware but
still has some limitations, such as computational intensity and conventional method usage.
Authors in [17] introduced DeepCatra, an innovative multi-view learning method designed for the purpose of
Android malware detection. By efficiently integrating bidirectional LSTM and graph neural network subnets, its
functionalities are significantly improved. The fundamental principle of this model is predicated on the application of characteristics derived from statically computed call traces that lead to critical APIs and are obtained from
publicly disclosed vulnerabilities. The DeepCatra malware detection model, while effective in improving accuracy,
faces challenges regarding scalability to larger datasets and complex malware. The evolving nature of malware
requires regular updates to adapt DeepCatra to new threats. Identifying critical APIs with NLP may have accuracy
limitations, and including different APIs is resource-intensive. Short system call sequences could miss crucial
connections, and API obfuscation poses a threat in static analysis.
SeGDroid, a novel approach introduced by [18], is designed to extract semantic knowledge from sensitive
function call graphs. In order to preserve context while eliminating extraneous nodes, the algorithm prunes FCGs,
assigns node attributes word2vec and centrality values, and generates graph embeddings using a graph convolu-
tional neural network. The model demonstrates a significant level of precision in identification, achieving F-scores
of 98% and 96% for malware detection and classification of malware families in the respective datasets, respec-
tively. SeGDroid also offers model explanations for the purpose of tracing malicious activity in Android malware.
The approach has many limitations, such as it doesn’t handle concept drift, class imbalance, or generalization to
new malware effectively. It relies heavily on accurate function call graphs (FCGs) and can be computationally
expensive. Additionally, it lacks real-time malware detection during app installation or runtime.

2.2 Android Malware Detection Based on Permissions

By conducting an examination of permission lists, a methodology that is focused on permissions verifies the
presence of potentially malicious applications. A framework for the detection of Android malware is presented by
[19]. This framework is predicated on the examination of app permissions. Multiple linear regression techniques
are utilized by the framework to extricate critical application permissions, which are of the utmost importance for
the security of Android. Application security analyses are performed by employing machine learning techniques.
The research paper presents two classifiers that utilize multiple linear regression to detect permission-based
Android malware. On four distinct datasets, these classifiers are evaluated in comparison to other fundamental
machine learning methods such as support vector machines, k-nearest neighbors, Naive Bayes, and decision trees.
Additionally, the study employs ensemble learning with the bagging method, resulting in improved classification
performance. Notably, the research achieves noteworthy results with the linear regression-based classification
algorithms, negating the need for overly complex methods. However, the proposed framework is based on analyzing
the permissions requested by Android applications, which may not be sufficient to detect all types of malware.
Authors in [20] introduce a method outlined in this document known as SEDMDroid. The approach described
herein is an improved layering ensemble framework that has been meticulously designed to detect malware on
Android. Principal Component Analysis (PCA), bootstrapping sample techniques, and random feature subspaces
are utilized to identify Android malware with exceptional precision. However, the permission-based approach to
Android malware detection has limitations, such as a restricted focus on requested permissions that may result
in false positives due to the omission of crucial factors such as intent-based behavior and API calls. Moreover,
its static permission analysis may prevent it from capturing dynamic malware behavior triggered by particular
conditions or events.
A scalable malware detection approach is known as Significant Permission IDentification (SigPID) was intro-
duced by the authors in their publication [21]. By employing permission utilization analysis, SigPID determines
which permissions are most crucial for differentiating benign applications from malevolent ones. As a result,
only 22 permissions are deemed essential. Then, machine learning is utilized to distinguish between malicious
and benign applications, with an F-measure, precision, recall, and accuracy exceeding 90 percent. By identifying
91.4 percent of unknown/new malware samples and 93.62 percent of malware in the dataset, SigPID significantly
reduces analysis times in comparison to analyzing all permissions. This is in stark contrast to alternative methods.
While the author did attain a commendable level of precision, it is important to acknowledge that DREBIN’s
analysis is conducted exclusively on rooted devices. This may restrict its practical utility, given that rooting is not universally practiced among Android users. Furthermore, the analysis only considers permissions, which restricts
the exploration of other functionalities, including intent-based features and API calls.

2.3 Android Malware Detection Based on Intents

A strategy for identifying and categorising malware is introduced in [22]. They used real-world datasets, specifically
focusing on the CICAndMal2019 dataset. This model played a crucial role in the feature extraction process, where
it employed the PeerShark tool to extract features at the conversation level from network traffic data. However,
API and permission-based features were not taken into consideration, which can produce a biased result because the complete picture of malware behavior is not considered.
Chen et al. [23] proposed a text classification-based method for the effective detection of malware. By utilizing
the Androguard utility, this approach extracts and compiles critical components such as permissions, services,
receivers, and intents from Android application packages. The resulting text report serves as a representation of the application. By employing a BiLSTM network, their method extracts significant insights from this text, attaining a noteworthy 97.47 percent accuracy in their empirical investigations and demonstrating remarkable efficacy in Android malware detection.

2.4 Intent and Permission Based Android Malware Detection

Android malware detection has evolved thanks to integrated intent and permission modeling since rogue appli-
cations use these components to gain illegal access and leak data. Malware often takes advantage of permissions
(like writing to external storage and receiving boot signals) and intents (such as boot completion and sending text
messages) to carry out illegal actions. Traditional methods look at permissions and intents separately. However,
combining static and dynamic analyses improves detection by identifying unusual behavior while a program runs
and understanding the links between permissions and their purposes [43]. This allows for assessing risks using fea-
ture mining algorithms. SensDroid [44] offers a hybrid technique that beats single-feature algorithms by improving
accuracy and threat mitigation. SensDroid shows its effectiveness by classifying malware risk (high/medium/low)
with 97.98% benign and 99% malware detection rates, demonstrating the synergy between intents and permissions.
Sensitivity analysis demonstrates that intents are more important for malware identification than permissions,
with low feature overlap reaffirming their unique roles. The research suggests using deep learning, expanding
across different platforms, and improving dynamic analysis to tackle new evasion techniques. It emphasizes using
localized sensitivity methods to ensure accurate analysis.

2.5 Intent, Permission and API Based Android Malware Detection

DeepAMD, a malware detection method proposed by [24], leverages Android permissions, intents, and API
calls as critical attributes to facilitate identification. This methodology encompasses a comprehensive sequence
of events, including data collection, feature extraction, model training, model evaluation, comparative analysis,
result interpretation, and formulation of a strategic plan for subsequent research. The experimental findings provide
evidence for the effectiveness of DeepAMD, which exhibits significant enhancements in detection performance and
encouraging accuracy in both static and dynamic layers when compared to traditional machine learning approaches
and current advancements in the field. By employing Deep ANN, DeepAMD achieves an exceptional accuracy
rate of 80.3% when categorizing dynamic layer malware, and 55.7% when distinguishing dynamic layer malware
families. These accomplishments signify substantial advancements in the domain of malware detection. However,
the limitations encompass an insufficient rate of dynamic analysis, an inadequate rate of malware detection and
identification, and difficulties in identifying and classifying malware families and categories.


2.6 CICInvesAndMal2019

The CICInvesAndMal2019 dataset employs a dual approach to enhance the accuracy of malware classification
by utilizing both static and dynamic variables. Analyzing permissions, intents, API calls, and generated log
files comprises the initial phase. The present analysis centers around the installation procedure, in addition to the
behaviors executed prior to and subsequent to the phone’s reactivation. Implementing this methodology results in a
substantial performance increase of approximately 30 percent when it comes to the classification and categorization
of malware into distinct families and categories. The enhancement is achieved through the integration of dynamic
attributes, comprising eighty network flows acquired from CICFlowMeter-V3, alongside two-gram sequential
relationships of API calls. In a dual-layer malware analysis framework, the aforementioned attributes are assessed
in conjunction with additional data that is gathered, including battery statuses, log states, packages, and process
logs.
The subsequent segment of the dataset is assembled and gathered in accordance with a consistent methodology. To facilitate analysis, a compilation of over 5,000 samples is deployed on authentic devices: 5,065 are benign software applications and 426 are instances of malware. Four primary classifications are applied to the
malware samples in this dataset: Adware, Ransomware, Scareware, and SMS Malware.

3 Hybrid Malware Detection and Classification

This section goes into depth about our hybrid strategy for identifying malicious programs. As part of our technique,
we extract characteristics, categorize apps as benign or dangerous, and group malware into families and categories.
Our suggested approach, which combines both static binary classification and dynamic malware classification, is
summarized in Fig. 1. In the static layer, the samples are first categorized as benign or malicious. The samples
that the static layer had categorized as malware are then reclassified into forty-nine families and four different
categories (scareware, ransomware, SMS malware, and adware) in the dynamic layer.
We started our experiment by carefully preparing the training and testing datasets. These datasets were in CSV
files and were loaded into Pandas DataFrames, which set the stage for all our future analyses. A comprehensive
assessment for absent values was performed using an indicator-based metric to calculate the missing value ratio for
each feature, and the study verified dataset completeness (i.e., an MVR of 0). We improved the data by removing
unnecessary information that didn’t help predict outcomes. Specifically, we took out the ’family,’ ’category,’ and
’MD5’ attributes, which led to a clearer set of features for training the model. To ensure that numerical features were
treated equally during model training, we standardize them by using each feature’s mean and standard deviation.
The target variable in the category was changed in two steps. First, we used label encoding, and then we applied
one-hot encoding. This process turned the original labels into numbers that a machine can understand. To ensure
balanced class distributions in our training and validation sets, the dataset was partitioned using stratified sampling
with a validation split ratio of 0.2, thereby maintaining similar class proportions throughout the subsets.

3.1 Preprocessing

Within the preprocessing phase of this research endeavor, a meticulous series of steps were undertaken to curate and
refine the provided datasets. Both the training and testing datasets, encapsulated in CSV files, were meticulously
loaded into Pandas DataFrames, serving as the foundational structures for subsequent analyses. In the interest of
preserving data integrity, an exhaustive examination for missing values was conducted on both datasets to ensure
the robustness of the ensuing analyses.

123 Int J Comput Intell Syst (2025) 18:52


https://doi.org/10.1007/s44196-025-00783-x Page 7 of 25 52

Fig. 1 Graphical representation of the proposed method for identifying and detecting malware applications on Android

3.2 Data Cleaning and Missing Value Analysis

Data cleaning is a crucial preprocessing step that ensures the quality and reliability of the dataset. In this stage,
we assess missing values, inconsistencies, and potential errors. Initial data quality assessment revealed the com-
pleteness of both training and testing datasets. We performed a comprehensive missing value analysis using the
following formulation:
Let D = {d1 , d2 , . . . , dn } represent our dataset where each di is a feature vector. The missing value ratio (MVR)
for each feature f is computed as:
\mathrm{MVR}(f) = \frac{\sum_{i=1}^{n} I(d_{if} = \mathrm{null})}{n}    (1)
where I is the indicator function and n is the total number of samples. Our analysis yielded MVR = 0 across all
features, confirming dataset completeness.
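As a minimal illustration, the missing value check of Eq. (1) can be reproduced with pandas; the file name below is a placeholder, since the exact CSV paths are not specified in the paper:

import pandas as pd

# Placeholder path; the actual CSV files are not named in the paper.
df = pd.read_csv("train_static.csv")

# Missing value ratio (MVR) per feature, Eq. (1): the fraction of samples
# whose value for that feature is null.
mvr = df.isnull().sum() / len(df)

# The reported analysis found MVR = 0 for every feature.
assert (mvr == 0).all(), "dataset contains missing values"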

3.3 Feature Selection

Feature selection is essential for improving model efficiency and performance. It involves removing non-
informative or redundant features that do not contribute significantly to the predictive power of the model. The
feature space F was refined by excluding non-predictive metadata attributes M, resulting in the effective feature
set F':

F' = F \setminus M    (2)

where M = {'family', 'category', 'MD5'}, yielding a binary feature matrix X ∈ {0, 1}^{n×m}, where n represents
samples and m represents the selected features.
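A short sketch of this step, assuming the training data are held in a pandas DataFrame as in Sect. 3.2; the dropped column names are the metadata attributes listed above, and the file path is a placeholder:

import pandas as pd

df = pd.read_csv("train_static.csv")          # placeholder path

# Metadata attributes M = {'family', 'category', 'MD5'} are removed from the
# feature space (Eq. 2); 'category' is kept aside as the target variable.
metadata = ["family", "category", "MD5"]
y = df["category"]
X = df.drop(columns=metadata)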

3.4 Data Normalization and Standardization

Normalization and standardization are fundamental techniques for ensuring that numerical features contribute
equally to the model training process. Feature standardization was applied to transform each feature vector x_j to its standardized form x'_j:

x'_j = \frac{x_j - \mu_j}{\sigma_j}    (3)

where \mu_j and \sigma_j are the mean and standard deviation of feature j, respectively, computed as:

\mu_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}    (4)

\sigma_j = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_{ij} - \mu_j)^2}    (5)
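A minimal NumPy sketch of Eqs. (3)-(5) on a toy matrix; scikit-learn's StandardScaler, mentioned in the Discussion, performs the same transformation:

import numpy as np

# Toy feature matrix: rows are samples, columns are features.
X = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 5.0],
              [1.0, 1.0, 7.0]])

mu = X.mean(axis=0)        # Eq. (4): per-feature mean
sigma = X.std(axis=0)      # Eq. (5): per-feature standard deviation
X_std = (X - mu) / sigma   # Eq. (3): z-score standardization

# Equivalent call with scikit-learn:
# from sklearn.preprocessing import StandardScaler
# X_std = StandardScaler().fit_transform(X)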

3.5 Label Processing and Encoding

For machine learning models to process categorical labels effectively, they must be encoded into numerical
representations. The categorical target variable y was encoded using a two-phase transformation:

1. Label Encoding: y → y' where y' ∈ {0, 1, . . . , k − 1} for k classes.

2. One-Hot Encoding: y' → Y where Y ∈ {0, 1}^{n×k}.

The one-hot encoded matrix Y is defined as:

Y_{ij} = \begin{cases} 1 & \text{if } y'_i = j \\ 0 & \text{otherwise} \end{cases}    (6)
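A small sketch of the two-phase transformation on illustrative labels; the class names below are examples drawn from the dataset's categories:

import numpy as np
from sklearn.preprocessing import LabelEncoder

# Illustrative class labels; the dataset's categories include Adware,
# Ransomware, Scareware, SMS malware, and Benign.
y = np.array(["Adware", "Benign", "SMS", "Adware", "Ransomware"])

# Phase 1 -- label encoding: map each class name to an integer in 0..k-1.
le = LabelEncoder()
y_int = le.fit_transform(y)        # [0, 1, 3, 0, 2]

# Phase 2 -- one-hot encoding: the n x k indicator matrix of Eq. (6).
k = len(le.classes_)
Y = np.eye(k, dtype=int)[y_int]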

3.6 Dataset Stratification and Splitting

To ensure balanced representation of classes in training and validation sets, stratified sampling is used. The dataset
was partitioned while maintaining class distribution. For each class c, the following constraint was maintained:

\frac{|S_c^{\mathrm{train}}|}{|S^{\mathrm{train}}|} \approx \frac{|S_c^{\mathrm{val}}|}{|S^{\mathrm{val}}|} \approx p_c    (7)

where Sc represents the subset of samples belonging to class c, and pc is the original class proportion in the dataset.
The split ratio η was set to 0.2:

|S^{\mathrm{val}}| = \eta |S|    (8)

|S^{\mathrm{train}}| = (1 - \eta) |S|    (9)
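A minimal sketch of the stratified 80/20 split on synthetic stand-in data; scikit-learn's train_test_split with the stratify argument enforces the constraint of Eq. (7):

import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: binary feature vectors and five class labels.
X = np.random.randint(0, 2, size=(1000, 50))
y = np.random.randint(0, 5, size=1000)

# Stratified 80/20 split (eta = 0.2): each class keeps roughly its original
# proportion p_c in both subsets, as required by Eq. (7).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)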

4 Model Architecture

The architectural representation of the Deep Neural Network, as illustrated in Fig. 2, is characterized by a sequential
arrangement of layers. Commencing with the input layer, adept at accommodating a feature vector of dimension
8111, the subsequent layers unfold systematically.
The first pivotal layer is a dense layer featuring 1024 neurons, facilitating the transformation of input features into
a higher-dimensional space conducive to intricate pattern recognition. Subsequently, a dropout layer is deliberately
incorporated in order to reduce the potential for overfitting. A portion of the input units is randomly set to zero by this layer during training, thereby augmenting the model's capacity for effective generalization.
Subsequently, the architectural design incorporates an additional dense layer, which decreases the dimensionality from 1024 to 256 neurons. A dropout layer is positioned after it to enforce regularization. Complex data patterns are meticulously captured by an additional dense layer consisting of
256 neurons, which reaffirms this pattern. Following this, a dropout layer is implemented to ensure regularization.
The preservation of architectural continuity is achieved by incorporating two additional sets of dense and dropout
layers. The initial pair maintains 256 neurons in the dense layer, whereas the subsequent pair reduces the number
of neurons to 128. Every paired arrangement is carefully constructed in order to identify complex patterns within
the data while simultaneously preventing overfitting.
The ultimate result of this complex architecture is a dense output layer that comprises two neurons.
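The static-layer stack described above can be sketched in Keras as follows; the layer widths follow the text, while the ReLU/softmax activations, the dropout rate, and the optimizer are assumptions, since they are not stated:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input

# Layer widths per the description (8111 -> 1024 -> 256 -> 256 -> 256 -> 128 -> 2);
# activations, the 0.5 dropout rate, and the optimizer are assumptions.
model = Sequential([
    Input(shape=(8111,)),
    Dense(1024, activation="relu"),
    Dropout(0.5),
    Dense(256, activation="relu"),
    Dropout(0.5),
    Dense(256, activation="relu"),
    Dropout(0.5),
    Dense(256, activation="relu"),
    Dropout(0.5),
    Dense(128, activation="relu"),
    Dropout(0.5),
    Dense(2, activation="softmax"),   # benign vs. malware
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])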
The dynamic structure of the deep neural network is illustrated in Fig. 3. It depicts a series of layers, comprising
an input layer, three dense layers, and two dropout layers, arranged in a sequential fashion. The first layer, referred to as “dense15input,” accepts input of shape (None, 914). This input layer feeds into the
subsequent “dense15” layer, featuring 1024 output units. Following this, a dropout layer named “dropout12”
intervenes to mitigate overfitting by randomly nullifying a fraction of input units during each update in training,
maintaining the same shape as its input (None, 1024).
The second dense layer, labeled “dense16,” reduces dimensionality to (None, 128) and is succeeded by a
corresponding dropout layer named “dropout13” with an identical output shape. Conclusively, the third dense
layer, identified as “dense17,” further diminishes dimensionality to (None, 39), serving as the output for specific
classification tasks.
This proposed architecture adheres to conventional deep-learning design principles, striking a balance between
model complexity for capturing intricate data patterns and regularization to curb overfitting. The selection of a
final output layer with 39 neurons aligns with the requirements of a multi-class classification problem featuring 39
distinct classes. Incorporating dropout layers between dense layers is a widespread strategy to forestall overfitting
and enhance the model’s generalization to unseen data. The gradual reduction in the number of neurons across
dense layers is likely a deliberate design choice, strategically diminishing the dimensionality of data representation
and compelling the network to acquire a compressed, efficient portrayal of input data through the learning process.
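A corresponding sketch of the dynamic-layer stack read from Fig. 3; again, the activations, dropout rate, and optimizer are assumptions:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input

# Stack read from Fig. 3: 914-dim input, 1024-unit dense layer, dropout,
# 128-unit dense layer, dropout, and a 39-unit output for the family classes.
model = Sequential([
    Input(shape=(914,)),
    Dense(1024, activation="relu"),
    Dropout(0.5),
    Dense(128, activation="relu"),
    Dropout(0.5),
    Dense(39, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])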

5 Machine and Deep Learning Models

A range of machine learning and deep learning methods is used to evaluate and benchmark the effectiveness of the proposed technique:
Naive Bayes (NB) Naive Bayes is an effective classification method that is well-suited for handling extensive
datasets. Bayes’ theorem of probability is employed to make predictions regarding class designations that are
undetermined.
Sequential Minimal Optimization (SMO) By means of generating a hyperplane, this technique divides data
elements in an efficient manner. The input to SMO consists of a set of spatial coordinates that are transformed to
ensure the efficient separation of each class. By integrating a kernel function, SMO is able to differentiate between data points with greater precision.
Multilayer Perceptron (MLP) Backpropagation is employed by MLP, which is a subset of artificial neural
networks (ANN), to accomplish training objectives. In contrast to linear perceptron approaches, MLP differenti-
ates itself through the implementation of multiple layers and non-linear activation functions. Implementing this
methodology improves its ability to identify intricate relationships within the dataset.
Decision Tree (DT/J48) As an example of a supervised learning algorithm, decision trees iteratively partition
data into more manageable subgroups based on predetermined criteria. The tree structure is composed of leaves
and decision nodes, with decision nodes denoting locations of data partitions and leaves representing outcomes.
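For illustration, the baseline classifiers can be instantiated with scikit-learn; SVC and DecisionTreeClassifier stand in for Weka's SMO and J48, so this is an approximation of the classifiers named above rather than the exact implementations used, and the hyperparameters are assumptions:

from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative baselines: SVC approximates an SMO-trained SVM and
# DecisionTreeClassifier approximates J48.
baselines = {
    "NB": GaussianNB(),
    "SMO": SVC(kernel="rbf"),
    "MLP": MLPClassifier(hidden_layer_sizes=(128,), max_iter=300),
    "DT (J48)": DecisionTreeClassifier(),
}

# Typical usage, given a prepared train/validation split:
# for name, clf in baselines.items():
#     clf.fit(X_train, y_train)
#     print(name, clf.score(X_val, y_val))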
Deep Neural Networks (DNN) In conducting a thorough assessment, we devote our entire attention to the
field of deep neural networks (DNNs). These networks embody a foundational paradigm in the field of artificial
intelligence, distinguished by an intricate architecture that interconnects nodes spanning multiple strata. DNNs
are characterized by intricate interconnections among the nodes in every layer; each network consists of an input layer, multiple hidden layers, and an output layer. By incorporating hidden layers into these networks, their depth is increased, enabling the extraction of nuanced and hierarchical attributes from the input data.

Fig. 2 Static detection architecture

Fig. 3 Dynamic detection architecture
Deep Artificial Neural Networks (Deep ANN) Deep ANNs, a subtype of artificial neural networks, are distin-
guished by fully linked neural networks with numerous layers. These networks, which consist of an input layer,
several hidden layers, and an output layer, grow in depth as the number of hidden layers increases. Deep artificial
neural networks (ANNs) have become necessary in a wide range of practical applications because of their efficient
functioning. The success may be credited to theoretical advances, particularly in the areas of unsupervised pre-
training and deep belief networks. Their progress has also been accelerated by the employment of increasingly
powerful hardware resources, such as general-purpose graphics processing units (GPGPUs). Deep artificial neural
networks (ANNs) have been shown to be very effective in a variety of applications, including image analysis,
natural language processing, autonomous vehicles, pattern recognition, and object detection. This demonstrates
their versatility and proficiency across a variety of industries.
DNNs provide multifunctionality that outperforms standard machine learning approaches, as proven by their
ability to capture complicated patterns and representations. Deep neural networks (DNNs) have become indis-
pensable tools in a variety of practical applications. Their success may be attributed to the confluence of theoretical
advances and the improved availability of hardware resources. This convergence is based on advances in deep belief networks and unsupervised pre-training, which are made possible by the availability of powerful processing
units like general-purpose graphics processing units (GPGPUs).

6 Results and Evaluations

To evaluate the efficacy of the proposed framework, we undertake two main tasks. The main objective of detection
and identification is to ascertain whether a particular program meets the criteria for being classified as malware
or not. Attribution involves determining the specific family to which the identified malware belongs. The dataset
CICInvesAndMal2019 utilizes a classification scheme that is structured in multiple levels. The dataset initially
categorizes samples into two distinct classes. At the second tier, the dataset covers a comprehensive range of malware categories, including adware, ransomware, scareware, and SMS malware, alongside benign samples. Furthermore, the dataset
is meticulously categorized into 38 distinct malware lineages at the third level.

6.1 Assessment Metrics and Experimental Configurations

In determining the efficacy of a machine-learning model, the experimental design and evaluation metric are critical
components. The experimental design involved the partitioning of the data into two parts: 80% of the data was
designated for training objectives, while the remaining 20% was intended for testing purposes. Accuracy, precision,
recall, and F-score were computed in order to evaluate the effectiveness of the proposed method. The equations
presented below comprise the essential particulars.

\mathrm{Accuracy} = \frac{TP_{\mathrm{Anomaly}} + TN_{\mathrm{Normal}}}{TP_{\mathrm{Anomaly}} + FN_{\mathrm{Anomaly}} + TN_{\mathrm{Normal}} + FP_{\mathrm{Normal}}}    (10)

\mathrm{Precision} = \frac{TP_{\mathrm{Anomaly}}}{TP_{\mathrm{Anomaly}} + FP_{\mathrm{Anomaly}}}    (11)

\mathrm{Recall} = \frac{TP_{\mathrm{Anomaly}}}{TP_{\mathrm{Anomaly}} + FN_{\mathrm{Normal}}}    (12)

Here:

• True Positives (TP): refer to cases that are predicted to be in the positive category (YES) and are indeed in that
category.
• False Positives (FP): refer to cases that are predicted to be positive but are really negative.
• True Negatives (TN): refer to cases that are correctly predicted as not belonging to the YES category and are
indeed not part of it.
• False Negatives (FN): occur when a case is predicted to not be in the YES category, but it is really in the YES
category.
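A minimal sketch of how these metrics can be computed with scikit-learn on illustrative predictions; in the experiments they would be computed on the 20% held-out test split:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Illustrative binary predictions (0 = benign, 1 = malware).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))    # Eq. (10)
print("Precision:", precision_score(y_true, y_pred))   # Eq. (11)
print("Recall   :", recall_score(y_true, y_pred))      # Eq. (12)
print("F-score  :", f1_score(y_true, y_pred))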

The Static Malware Binary dataset results are shown in Table 1. While the proposed technique achieves a
comparable accuracy of 93%, similar to DeepAMD, it outperforms other machine learning models in terms
of performance measures. The accuracy rates of further classic techniques, namely J48, NB, SMO, and MLP,
are 90.5%, 62.0%, 91.8%, and 90.5%, respectively. The primary factor for NB’s lowest accuracy is its need for a
bigger dataset to achieve optimum performance since it relies on probability distribution. The J48, SMO, and MLP
algorithms provide robust performance on the static binary dataset. The suggested strategy exhibits a clear increase in performance when compared to traditional procedures, including a 0.5% improvement in precision over the DeepAMD approach and larger improvements over the other conventional approaches.


Fig. 4 Model accuracy and loss of binary classification on Static layer using train and validation datasets

Fig. 5 Confusion matrix for binary classification of malware

Figure 4 shows two plots. The first illustrates the convergence in accuracy for the suggested technique as the number of epochs increases. The suggested method attains a peak accuracy of 93% during the 14th epoch. The training accuracy begins at 56.47% and gradually increases to 97.63%, after which it reaches a stable state. The initial test accuracy is 68% and it gradually increases to 93%, with a minor decrease around the 50th epoch. The loss plot in Fig. 4 illustrates the progressive reduction in the loss of the suggested technique
throughout different epochs. The suggested methodology attains the minimum loss of 67.71% during the 57th period. The initial training loss is 25.26% and it decreases to 0.36%. Subsequently, the training loss reaches a state of stability. The first test loss is 20.25% and it decreases to 0.60%. The proximity of the training and validation loss, and of the training and validation accuracy, shows how well the model has performed.

Table 1 Binary malware classification performance on the static layer

Approach    Accuracy (%)   F-score (%)   Recall (%)   Precision (%)
J48         90.5           90.6          90.5         90.6
NB          62.0           63.4          62.0         80.9
SMO         91.8           91.3          91.8         92.6
MLP         90.5           90.6          90.5         90.6
DeepAMD     93.4           93.2          93.4         93.5
Proposed    93             93            93           94

Table 2 Malware category classification performance on the static layer

Approach    Accuracy (%)   F-score (%)   Recall (%)   Precision (%)
J48         89.3           89.3          89.3         89.3
NB          56.1           61            56.1         56.1
SMO         86.8           86.8          86.8         86.8
MLP         72.4           72.4          72.4         72.4
DeepAMD     92.5           92.5          92.5         92.2
Proposed    94             94            94           94
Figure 5 illustrates the confusion matrix of the Proposed approach. It demonstrates the frequency with which
legitimate apps are mistaken for harmful ones and vice versa: 30 normal instances are classified as malicious, and 12 malicious instances are classified as normal.

6.2 Identification of Malware Categories Based on Static Analysis

Table 2 shows the results of the Static Malware Category dataset. The suggested method achieves an impressive
accuracy of 94 percent with the use of Artificial Neural Network. The accuracy rates of further classic techniques,
namely J48, NB, SMO, and MLP, are 89.3%, 56.1%, 86.8%, and 72.4% respectively. The Naive Bayes (NB) method
achieves a modest accuracy of 56.1 % because of its dependence on probability distribution, which necessitates a
substantial amount of data samples for optimal performance. The suggested strategy exhibits a significant increase
in performance when compared to existing strategies. The suggested technique exhibits a notable enhancement of
1.5 % in Accuracy, 1.9 % in F-score, 1.5 % in recall, and 1.8 % in precision compared to the Deep AMD approach.
Furthermore, it outperforms all other approaches in terms of performance measures. Figure 6 displays two plots. The first shows how the recommended technique's accuracy converges with epochs. In the 48th epoch, the proposed technique achieves a high accuracy of 94%. The training accuracy starts at 42.94% and steadily rises to 95.85% before stabilizing. The first test accuracy is 67.77%, which subsequently rises to 90.05%. A slight decrease occurred around the 47th epoch. Figure 6's loss plot shows how the recommended technique's loss decreases over time. In the 100th epoch, the proposed strategy achieves a minimal loss of 59.29%. The initial
training loss is 22.23%, and it declines to 0.59%. After that, the training loss stabilizes. The first test loss is 15.99%,
then declines to 0.80%. The training and validation losses, along with the accuracy proximity, demonstrate the
model's performance. Figure 7 depicts six distinct classes: 0 represents adware, 1 represents benign, 2 represents PremiumSMS, 3 represents ransomware, 4 represents SMS, and 5 represents scareware. As observed, 34 samples are properly identified out of 38 adware samples, 450 correct samples out of 474, and so on.


Fig. 6 Model accuracy and loss of category classification of malware on the static layer using train and validation datasets

Fig. 7 Confusion matrix for category classification on the static layer

6.3 Identification and Detection of Malware Families on the Static Layer

Table 3 Illustrates the outcome of the Static Malware Family feature set. The suggested technique attains a superior
accuracy of 92.59% in the Static layer for detecting family malware. Alternative traditional approaches, including
J48, NB, SMO, and MLP, yield accuracy rates of 86.2%, 69.5%, 83.6%, and 69.2%, respectively. The Naive Bayes algorithm attains a minimal accuracy of 69.5% due to its reliance on a larger number of data instances in order to effectively operate on the probability distribution.

Table 3 Performance evaluation of malware family categorization on the static layer

Approach    Accuracy (%)   F-score (%)   Recall (%)   Precision (%)
J48         86.2           86.3          86.2         88.4
NB          69.5           75.3          69.5         85.5
SMO         83.6           78.0          83.6         75.6
MLP         69.2           68.5          69.2         68.3
DeepAMD     90             89.6          89.9         90.4
Proposed    92.59          92.0          93           92

Fig. 8 Accuracy and loss of the model on both the train and validation datasets for family categorization on the static layer
Figure 8 illustrates the accuracy plot, which demonstrates the convergence of the accuracy of the proposed technique throughout several epochs. The suggested method attains a peak accuracy of 92.59% during the 70th
epoch. The initial training accuracy is 43.42% and it gradually increases to 96.20%. The training accuracy pro-
gressively improves with each period. The first test accuracy is 65.88% and it gradually increases to 92.59%. It
saw a minor decrease throughout the 20th epoch. The loss plot in Fig. 8 illustrates the progressive reduction in the loss of the suggested technique throughout different epochs. The suggested methodology attains the minimum loss of 67.71% during the 57th period. The initial training loss is 19.54% and it decreases to 6.38%.
Subsequently, the training loss reaches a state of stability. The first test loss is 12.96% and it decreases to 1.04%.

6.4 Identification and Detection of Malware Categories on the Dynamic Layer

The results of the Dynamic Malware Category feature set are shown in Table 4. The suggested approach uses deep
neural networks to achieve an exceptional accuracy of 86.21%. The accuracy rates of further classic techniques,
namely J48, NB, SMO, MLP, and DeepAMD, are 71.2%, 72.7%, 68.1%, 57.5%, and 80.3% respectively. The
suggested approach exhibits a significant increase in performance when compared to existing solutions. When
compared to the DeepAMD methodology, the proposed method shows a significant improvement of 5.91%, 5.5%, 5.7%, and 5.8% in terms of accuracy, F-score, recall, and precision, respectively.

Table 4 Performance evaluation of malware categories on the dynamic layer

Approach    Accuracy (%)   F-score (%)   Recall (%)   Precision (%)
J48         71.2           71.3          71.2         72.0
NB          72.7           72.3          72.7         73.1
SMO         68.1           70.1          68.1         78.1
MLP         57.5           53.8          57.5         51.2
DeepAMD     80.3           80.5          80.3         82.2
Proposed    86.2           86            86           88

Fig. 9 Accuracy of malware category categorization using the train and validation datasets on the dynamic layer

Fig. 10 Loss of malware category categorization using the train and validation datasets on the dynamic layer

Fig. 11 Confusion matrix for category categorization on the dynamic layer

Table 5 Performance evaluation of malware family categorization on the dynamic layer

Approach    Accuracy (%)   F-score (%)   Recall (%)   Precision (%)
J48         44.2           47.9          44.2         60.3
NB          59.0           58.1          59.0         65.0
SMO         26.2           25.9          26.2         33.4
MLP         4.9            4.0           4.9          7.6
DeepAMD     55.7           54.0          55.0         59.1
Proposed    68.0           65.0          68.0         68.0

Figure 9 illustrates the progressive improvement in the accuracy of the suggested method throughout several
epochs. The suggested method attains a peak accuracy of 86.21% during the 49th epoch. The initial training
accuracy is 0.1% and it gradually increases to reach 96%. Subsequently, the training accuracy reaches a state of stability. The first test accuracy is 34% and it gradually increases to 86.21%. It saw a minor decrease throughout
the 20th epoch. Figure 10 illustrates the progression of the loss of the suggested method in relation to epochs.
The suggested strategy attains the minimum loss of 1.5% during the 49th period. The initial training loss is 3.15%
and it decreases to 0.75%. Subsequently, the training loss reaches a state of stability. The first test loss is 2.2% and
it decreases to 1.39%. Figure 11 illustrates the confusion matrix for the category categorization on the dynamic
layer.

6.5 Identification and Detection of Malware Families in the Dynamic Layer

The results obtained from utilizing the Dynamic Malware Family feature set are presented in Table 5. Notably,
artificial neural networks exhibit a maximum accuracy of 68%. Additional traditional methods, such as J48, SMO,
MLP, NB, and DeepAMD, yield accuracy rates of 44.2%, 26.2%, 4.9%, 59.0%, and 55.7%, respectively. The
suggested strategy demonstrates a substantial improvement in performance when compared to current approaches.
The suggested technique demonstrates a significant improvement of 12.3% in accuracy compared to DeepAMD, with gains of 11% in F-score, 13% in recall, and 8.9% in precision.
Figure 12 illustrates the progression of the accuracy of the suggested method in relation to epochs. The suggested
methodology attains a peak accuracy of 68.10% at the 93rd epoch. The first training accuracy is 6.85% and it
gradually increases to 91.32%. Subsequently, the training accuracy reaches a state of stability. The first test accuracy is 25% and it gradually increases to 68.10%. The value had a modest decrease during the 32nd epoch.


Fig. 12 Model accuracy for identification of malware families on the dynamic layer

Fig. 13 Model loss for identification of malware families on the dynamic layer

Figure 13 illustrates the progressive reduction of the suggested approach's loss throughout several epochs. The suggested methodology attains the minimum loss of 68.10% during the 43rd period. The initial
training loss is 5.69% and it decreases to 1.28%. Subsequently, the training loss reaches a state of stability. The
first validation loss is 4.7451% and it decreases to 2.5505%.

6.6 Comparative Evaluation

In Tables 6, 7 and 9, precision and recall are compared with different machine learning and deep artificial neural network approaches against the CICInvesAndMal2019 dataset. The study was carried out by the authors of [24], who used a Deep Artificial Neural Network to compute accuracy, precision, recall, and F1-score. The deep neural network method used in our suggested technique yields the highest accuracy. In comparison to prior research [22, 24, 25, 28], our findings in binary static analysis were comparable to those of [24]. However, we obtained better results than previous authors in the static family and category classification, as well as in the dynamic layers. For malware binary classification, we achieved accuracy, precision, recall, and F1-scores similar to those of DeepAMD [24] (Table 1) but greater than those of the other authors. We achieved a 6.9% increase in accuracy and a 5.7% increase in recall when classifying malware families, compared to the state-of-the-art DeepAMD technique. Our results were much better than previous approaches. In the category classification of malware, we improved performance by 6.9% on the dynamic layer (Table 9).

Table 6 Comparison of results in binary malware classification within the static layer

References           Precision       Recall
Taheri et al. [25]   85.8% (RF)      88.3% (RF)
Khaled et al. [22]   89% (RF)        83.22% (RF)
Laya et al. [26]     85.4% (KNN)     88.1% (KNN)
Laya et al. [26]     85.1% (DT)      88% (DT)
Khaled et al. [22]   85.7% (DT)      86.1% (DT)
Taheri et al. [25]   95.3% (RF)      95.3% (RF)
Imtiaz et al. [24]   93.5% (RF)      93.4% (RF)
Proposed             93.0%           93.0%

Table 7 Comparison of results in category classification for malware within the dynamic layer

References           Precision       Recall
Kadir et al. [27]    49.9% (RF)      48.5% (RF)
Khaled et al. [22]   80.2% (RF)      79.6% (RF)
Laya et al. [26]     49.5% (KNN)     48% (KNN)
Laya et al. [26]     47.8% (DT)      45.9% (DT)
Khaled et al. [22]   77% (DT)        77% (DT)
Kadir et al. [27]    83.3% (RF)      81% (RF)
Imtiaz et al. [24]   82.2%           80.3%
Proposed             88%             86%
Table 8 shows the comparison of different Android malware datasets that are available publicly. The CICInvesAndMal2019 dataset outperforms numerous previous datasets because of its thorough coverage of a wide variety
of characteristics important for malware research. Unlike many datasets, it captures data using both static and
dynamic approaches (P1), providing a more comprehensive picture of malware activity. It uses real-phone devices
rather than emulators (P2), yielding more dependable and practical results. Furthermore, it has a strong network
architecture for experimental settings (P3) and malware installation techniques (P4), allowing researchers to easily
duplicate and expand investigations. This dataset distinguishes itself by describing various stages of data capture
(P5), resulting in a more comprehensive understanding of malware features. It includes a broad set of fully labeled
malware samples (P6) as well as a balance of malware and benign samples (P8), both of which are required for
successful machine learning model training. Furthermore, CICInvesAndMal2019 divides samples into a broad variety
of families (P7) and incorporates heterogeneity to account for various malware types (P10). Its accurate docu-
mentation (P12), feature set availability (P13), and adherence to taxonomy standards (P14) make it an invaluable
resource for scholars. Importantly, it is still up to date (P15), assuring its applicability in dealing with new threats.
These qualities jointly establish InvestAndMal2019 as a more adaptable and complex dataset than previous ones,
making it very useful for malware research and security analysis.


Table 8 Comparison of publicly available Android malware datasets [24, 45]


Year Dataset P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15

2012 Genome [29] S – –  – –  – –      


2014 Drebin [30] S – –  – –  – ×     × ×
2015 AndroTracker [31] S – –  – –  – –     × ×
2016 SAPIMMDS [32] B    × ×        × ×
2016 Andro-Dumpsys [33] B ×   × ×     ×   × ×
2016 Andro-Profiler [34] B ×   × ×     ×   × ×
2016 Kharon [35] B  ×   ×   ×  ×   × ×
2017 AAGM [36] D       –       
2018 AMD [37] S              
2018 MalDozer [38] S × ×  × ×  –       
2018 UCI [39] B              
2016 AndroZoo [40] S × × × × ×   × × × ×   ×
2017 CICAndMal2017 [41] B              ×
2019 InvestAndMal2019 [42] B              ×
P1: Type of data capturing: Static(S) or Dynamic(D) or both(B), P2: Utilizing Real-Phone devices instead of emulators, P3: Having network
architecture for the experiment set up, P4: Providing malware installation methods, P5: Having malware activation scenario, P6: Defining multiple
states of data capturing, P7: Having trust-able fully-labeled malware samples, P8: Including diverse malware categories and families, P9: Providing
balance between malware and benign samples, P10: Avoiding anonymity and preserving all captured data, P11: Containing a heterogeneous set
of resources, P12: Providing a variety of feature sets for other researchers, P13: For meta-data, includes a proper documentation, P14: Including
malware taxonomy, P15: Being up-to-date

Table 9 Comparison of results in family classification for malware within the dynamic layer

References          Precision      Recall
Taheri et al. [25]  27.5% (RF)     27.5% (RF)
Taheri et al. [25]  27.5% (RF)     27.5% (RF)
Laya et al. [26]    26.66% (DT)    20.06% (DT)
Laya et al. [26]    27.24% (KNN)   23.74% (KNN)
Taheri et al. [25]  59.7% (RF)     61.2% (RF)
Imtiaz et al. [24]  59.1%          55.7%
Proposed            66.0%          69.0%


6.7 Discussion

In the realm of analyzing permission structures for detecting malicious applications, our study addresses the limi-
tations observed in existing research by employing a meticulous approach that combines both Static and Dynamic
analyses. Feature extraction, data balancing, and a multi-step classification procedure are all integral components
of our methodology, which is utilized to differentiate benign from malicious applications, classify malware, and
identify families of malware. During the preprocessing stage, we rigorously ensured the quality and reliability


of our dataset. In particular, our pipeline applied feature selection to eliminate non-informative features, handled missing values, encoded categorical data appropriately, and used StandardScaler to normalize features. The two-layered structure of the CICInvesAndMal2019 framework that we employ adds depth to the analytical procedure: malware applications are analyzed in the Static layer and then designated for a more thorough examination in the Dynamic layer. This sequential analysis enables a nuanced evaluation and mitigates the risk associated with unknown samples. Our dataset encompasses training and testing samples derived from both the Static and Dynamic layers, providing a comprehensive representation of the scope of our research.
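
The preprocessing pipeline described above can be sketched as follows. This is a minimal, illustrative example built with pandas and scikit-learn; the file name, the 'Class' label column, and the variance threshold are assumptions rather than the exact configuration used in our experiments.

# Minimal preprocessing sketch; file name, 'Class' column, and threshold are assumed.
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import LabelEncoder, StandardScaler

df = pd.read_csv("static_layer_features.csv")   # hypothetical feature file
df = df.fillna(0)                                # handle missing values

y = LabelEncoder().fit_transform(df["Class"])    # encode categorical class labels
X = df.drop(columns=["Class"])

# Drop non-informative (near-constant) features.
X = VarianceThreshold(threshold=0.0).fit_transform(X)

# Normalize features to zero mean and unit variance.
X = StandardScaler().fit_transform(X)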
Turning to the model architecture, our deep artificial neural network (ANN) has been engineered to discern complex patterns within the input data. To reduce the likelihood of over-fitting, the architecture combines densely connected layers that use Rectified Linear Unit (ReLU) activation functions with L2 regularization. The model's resilience is further enhanced by the deliberate inclusion of dropout layers, and the final layer applies softmax activation for multi-class classification. Training and evaluation are conducted with the same rigor: the dataset is divided using stratified sampling so that the class distributions in the training and validation sets are consistent. Adaptive learning-rate reduction, early stopping to avert over-fitting, and monitoring training for more than one hundred epochs all contribute to the convergence and stability of our model. In addition to accuracy, our assessment criteria include a comprehensive confusion matrix and classification report, which provide insight into precision, recall, F1-score, and per-class classification accuracy.
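
A condensed sketch of such a network is given below, written with the Keras API. The layer sizes, regularization strength, and dropout rate are illustrative assumptions rather than the exact hyper-parameters of the final model; the evaluation snippet assumes a trained model and held-out arrays X_val and y_val produced by the stratified split.

# Illustrative dense ANN with ReLU, L2 regularization, dropout, and softmax output.
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2

def build_model(input_dim: int, num_classes: int) -> Sequential:
    model = Sequential([
        Input(shape=(input_dim,)),
        Dense(256, activation="relu", kernel_regularizer=l2(1e-4)),
        Dropout(0.3),                                  # mitigate over-fitting
        Dense(128, activation="relu", kernel_regularizer=l2(1e-4)),
        Dropout(0.3),
        Dense(num_classes, activation="softmax"),      # multi-class output
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Per-class evaluation (assumes a trained `model` and held-out X_val, y_val).
from sklearn.metrics import classification_report, confusion_matrix
y_pred = model.predict(X_val).argmax(axis=1)
print(confusion_matrix(y_val, y_pred))
print(classification_report(y_val, y_pred))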
Visualization of training and validation accuracy, as well as loss over epochs, adds transparency to our model’s
learning trajectory.
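
Such curves can be produced directly from the History object that Keras returns from model.fit; the snippet below is a generic sketch and assumes a variable named history obtained from training.

# Plot training vs. validation accuracy and loss (assumes history = model.fit(...)).
import matplotlib.pyplot as plt

fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
ax_acc.plot(history.history["accuracy"], label="train")
ax_acc.plot(history.history["val_accuracy"], label="validation")
ax_acc.set_xlabel("Epoch"); ax_acc.set_ylabel("Accuracy"); ax_acc.legend()

ax_loss.plot(history.history["loss"], label="train")
ax_loss.plot(history.history["val_loss"], label="validation")
ax_loss.set_xlabel("Epoch"); ax_loss.set_ylabel("Loss"); ax_loss.legend()

plt.tight_layout()
plt.show()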
In the approach outlined by the author of DeepAMD [24], a careful examination of the dataset reveals a significant challenge that has not been adequately addressed: the issue of imbalanced classes. Imbalanced class distribution is a common concern in classification problems, particularly in machine learning applications, where the number of instances belonging to one class substantially outweighs the others.
The original methodology, while robust in many aspects, lacked a strategic mechanism to mitigate the adverse
effects of imbalanced classes on the model’s performance. This oversight can lead to biased predictions, where
the model may demonstrate a tendency to favor the majority class, potentially compromising the overall efficacy
and reliability of the predictive system.
Acknowledging the critical need to rectify imbalanced classes, the present study integrates stratification into the data-partitioning process. Stratified sampling ensures that each class is represented proportionally in both the training and validation sets. By mitigating the disproportionate influence of the majority class, this approach substantially enhances the model's capacity to generalize across all classes.
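
In practice, such proportional representation can be obtained with a stratified split. The sketch below uses scikit-learn; the 80/20 ratio and random seed are assumptions chosen for illustration.

# Stratified partition keeps per-class proportions equal in training and validation sets.
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(
    X, y,
    test_size=0.2,       # assumed 80/20 split
    stratify=y,          # preserve class ratios in both subsets
    random_state=42,     # assumed seed for reproducibility
)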
Furthermore, as an integral component of our model training approach, we implement the “Reduce on Plateau”
technique in conjunction with stratification. In training, the learning rate is dynamically adjusted by the Reduce
on Plateau algorithm in response to the model’s performance on the validation set. By permitting the model to
fine-tune its parameters in response to the changing dynamics of the training process, this mechanism is especially
useful for overcoming the obstacles associated with unbalanced datasets. Through the systematic identification of
plateaus in the validation loss, the learning rate is diminished, thereby promoting more accurate convergence and
alleviating the potential for overfitting.
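
A minimal sketch of this training setup, combining Reduce on Plateau with early stopping, is shown below; the patience values, reduction factor, batch size, and epoch budget are assumptions chosen only to illustrate the mechanism.

# Reduce the learning rate when the validation loss plateaus; stop early if it stalls.
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

callbacks = [
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5, min_lr=1e-6),
    EarlyStopping(monitor="val_loss", patience=15, restore_best_weights=True),
]

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=120,          # training monitored for more than one hundred epochs
    batch_size=64,
    callbacks=callbacks,
)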
The integration of these two approaches, stratification and Reduce on Plateau, not only resolves the previously overlooked problem of unbalanced classes but also enhances the model's overall resilience and applicability. This integrative methodology allows the model to be trained and assessed in a way that is cognizant of the subtleties present in the dataset, ultimately improving its ability to generate accurate predictions across a wide range of class distributions.


7 Conclusions and Future Work

The security of Android devices has raised a number of issues for those in the device manufacturing, software development, and cybersecurity industries. The continuous emergence of unidentified Android malware and the creation of innovative malware variants pose significant threats. To address these challenges on Android devices, a novel approach is proposed in this study. The proposed system classifies malware occurrences and accurately identifies binary malware with a notable 93% accuracy on the Static layer. In classifying malware families, the proposed method achieves an accuracy of 92% on the Static layer. When the Dynamic layer is incorporated, the highest accuracy achieved is 86.21% for classifying malware categories and 68.97% for classifying malware families. The experimental results validate that the proposed method is highly effective for identifying and classifying Android malware in both the Static and Dynamic layers, using the CICInvesAndMal2019 dataset.
In future work, we plan to establish an automated dataset-building approach to address the growing challenges of malware detection and analysis. This system will gather data from websites, interact with other applications, and use deep learning algorithms to combine, sort, and classify information from sources such as official app stores, third-party markets, and malware databases. Our objective is to enhance the quality, diversity, and scale of malware datasets by automating the data-generation process, which will help us cover a diverse array of malware types, variants, and behaviors.
Author Contributions MUR: Conceived and designed the analysis; performed the formal statistical analysis, wrote the paper,
original draft; writing review and editing. SQ: Performed formal statistical analysis; contributed reagents, materials, analysis
tools or data; wrote the paper; writing review and editing. AA: Conceived and designed the analysis; performed the analysis;
analyzed and interpreted the data; contributed reagents, materials, and analysis tools; wrote the paper; writing review and
editing. SSA: Analyzed and collected the review; contributed reagents, materials, analysis tools or data; writing review and
editing. AA: Analyzed and collected the review; contributed reagents, materials, analysis tools; Wrote the paper; writing
review and editing. MUH: Analyzed and collected the review; contributed reagents, materials, and analysis tools; Wrote the
paper; writing review and editing. MSAR: Analyzed and collected the review; contributed reagents, materials, and analysis
tools; Wrote the paper; writing review and editing. AS: Analyzed and collected the review; Wrote the paper; writing review
and editing.

Funding The authors are thankful to the Deanship of Graduate Studies and Scientific Research at University of Bisha for
supporting this work through the Fast-Track Research Support Program.

Data Availability All the data are available within the manuscript.

Declarations

Conflict of interest The authors declare that they have no known competing financial interests or personal relationships that
could have appeared to influence the work reported in this paper.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and
indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived
from this article or parts of it. The images or other third party material in this article are included in the article’s Creative
Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative
Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/
licenses/by-nc-nd/4.0/.


References
1. Alsmadi, T., Alqudah, N.: A survey on malware detection techniques. In: 2021 International Conference On Information
Technology (ICIT), pp. 371–376 (2021)
2. Shu, X., Tian, K., Ciambrone, A., Yao, D.: Breaking the target: an analysis of the target data breach and lessons learned.
arXiv Preprint arXiv:1701.04940 (2017)
3. Lange, T., Kettani, H.: On security threats of botnets to cyber systems. In: 2019 6th International Conference on Signal
Processing and Integrated Networks (SPIN), pp. 176–183 (2019)
4. Thakur, K., Hayajneh, T., Tseng, J.: Cyber security in social media: challenges and the way forward. IT Prof. 21, 41–49
(2019)
5. Mayrhofer, R., Stoep, J., Brubaker, C., Kralevich, N.: The android platform security model. ACM Trans. Priv. Secur.
TOPS 24, 1–35 (2021)
6. Maniriho, P., Mahmood, A., Chowdhury, M.: A study on malicious software behaviour analysis and detection techniques:
taxonomy, current trends and challenges. Future Gener. Comput. Syst. 130, 1–18 (2022)
7. Kalauner, P.: Analysis and Bypass of Android Application Anti-Reverse Engineering Mechanisms. Wien (2023)
8. Afianian, A., Niksefat, S., Sadeghiyan, B., Baptiste, D.: Malware dynamic analysis evasion techniques: a survey. ACM
Comput. Surv. CSUR 52, 1–28 (2019)
9. Rathod, H., Agal, S.: A study and overview on current trends and technology in mobile applications and its development.
In: International Conference on ICT For Sustainable Development, pp. 383–395 (2023)
10. İbrahim, M., Issa, B., Jasser, M.: A method for automatic android malware detection based on static analysis and deep
learning. IEEE Access 10, 117334–117352 (2022)
11. Subash, A., Vijay, G., Selvan, G., Ramkumar, M., et al.: Malware detection in android application using static permission.
In: 2023 5th International Conference on Inventive Research In Computing Applications (ICIRCA), pp. 1241–1245
(2023)
12. Nguyen, C., Khoa, N., Doan, K., Cam, N.: Android malware category and family classification using static analysis. In:
2023 International Conference on Information Networking (ICOIN), pp. 162–167 (2023)
13. Chaudhary, M., Masood, A.: RealMalSol: real-time optimized model for Android malware detection using efficient neural
networks and model quantization. Neural Comput. Appl. 35, 11373–11388 (2023)
14. Sasidharan, S., Thomas, C.: ProDroid—an Android malware detection framework based on profile hidden Markov model.
Pervasive Mob. Comput. 72, 101336 (2021)
15. Guerra-Manzanares, A., Bahsi, H., Nõmm, S.: KronoDroid: time-based hybrid-featured dataset for effective android
malware detection and characterization. Comput. Secur. 110, 102399 (2021)
16. Ullah, F., Ullah, S., Srivastava, G., Lin, J.: Droid-MCFG: Android malware detection system using manifest and control
flow traces with multi-head temporal convolutional network. Phys. Commun. 57, 101975 (2023)
17. Wu, Y., Shi, J., Wang, P., Zeng, D., Sun, C.: DeepCatra: learning flow-and graph-based behaviours for Android malware
detection. IET Inf. Secur. 17, 118–130 (2023)
18. Liu, Z., Wang, R., Japkowicz, N., Gomes, H., Peng, B., Zhang, W.: SeGDroid: an Android malware detection method
based on sensitive function call graph learning. Expert Syst. Appl. 235, 121125 (2024)
19. Şahın, D., Akleylek, S., Kiliç, E.: LinRegDroid: detection of Android malware using multiple linear regression models-
based classifiers. IEEE Access 10, 14246–14259 (2022)
20. Zhu, H., Li, Y., Li, R., Li, J., You, Z., Song, H.: SEDMDroid: an enhanced stacking ensemble framework for Android
malware detection. IEEE Trans. Netw. Sci. Eng.. 8, 984–994 (2020)
21. Li, J., Sun, L., Yan, Q., Li, Z., Srisa-An, W., Ye, H.: Significant permission identification for machine-learning-based
android malware detection. IEEE Trans. Ind. Inf. 14, 3216–3225 (2018)
22. Abuthawabeh, M., Mahmoud, K.: Android malware detection and categorization based on conversation-level network
traffic features. In: 2019 International Arab Conference On Information Technology (ACIT), pp. 42–47 (2019)
23. Chen, M., Zhou, Q., Wang, K., Zeng, Z.: An Android malware detection method using deep learning based on multi-
features. In: 2022 IEEE International Conference On Artificial Intelligence And Computer Applications (ICAICA), pp.
187–190 (2022)
24. Imtiaz, S., Rehman, S., Javed, A., Jalil, Z., Liu, X., Alnumay, W.: DeepAMD: detection and identification of Android
malware using high-efficient deep artificial neural network. Future Gener. Comput. Syst. 115, 844–856 (2021)
25. Taheri, L., Kadir, A., Lashkari, A.: Extensible android malware detection and family classification using network-flows
and API-calls. In: 2019 International Carnahan Conference On Security Technology (ICCST), pp. 1–8 (2019)
26. Lashkari, A., Kadir, A., Taheri, L., Ghorbani, A.: Toward developing a systematic approach to generate benchmark android
malware datasets and classification. In: 2018 International Carnahan Conference On Security Technology (ICCST), pp.
1–7 (2018)
27. Taheri, L., Kadir, A., Lashkari, A.: Extensible android malware detection and family classification using network-flows
and API-calls. In: 2019 International Carnahan Conference On Security Technology (ICCST), pp. 1–8 (2019)


28. Sharafaldin, I., Lashkari, A., Ghorbani, A.: Toward generating a new intrusion detection dataset and intrusion traffic
characterization. ICISSp 1, 108–116 (2018)
29. Zhou, Y., Jiang, X.: Dissecting android malware: characterization and evolution. In: 2012 IEEE Symposium on Security
and Privacy, pp. 95–109. IEEE (2012)
30. Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K., Siemens, C.E.R.T.: Drebin: effective and explainable
detection of android malware in your pocket. In: NDSS, vol. 14, pp. 23–26 (2014)
31. Kang, H., Jang, J.W., Mohaisen, A., Kim, H.K.: Detecting and classifying android malware using static analysis along
with creator information. Int. J. Distrib. Sens. Netw. 11(6), 479174 (2015)
32. Jang, J.W., Kang, H., Woo, J., Mohaisen, A., Kim, H.K.: Andro–Dumpsys: anti-malware system based on the similarity
of malware creator and malware centric information. Comput. Secur. 58, 125–138 (2016)
33. Jang, J.W., Yun, J., Mohaisen, A., Woo, J., Kim, H.K.: Detecting and classifying method based on similarity matching of
Android malware behavior with profile. SpringerPlus 5, 1–23 (2016)
34. Ideses, I., Neuberger, A.: Adware detection and privacy control in mobile devices. In: 2014 IEEE 28th Convention of
Electrical & Electronics Engineers in Israel (IEEEI), pp. 1–5 (2014)
35. Cidre, E.:. Kharon dataset: Android malware under a microscope. In: Learning from Authoritative Security Experiment
Results, p. 1 (2016)
36. Lashkari, A.H., Kadir, A.F.A., Gonzalez, H., Mbah, K.F., Ghorbani, A.A.: Towards a network-based framework for
android malware detection and characterization. In: 2017 15th Annual Conference on Privacy, Security and Trust (PST),
pp. 233–23309. IEEE (2017)
37. Wei, F., Li, Y., Roy, S., Ou, X., Zhou, W.: Deep ground truth analysis of current android malware. In: Detection of
Intrusions and Malware, and Vulnerability Assessment: 14th International Conference, DIMVA 2017, Bonn, Germany,
July 6–7, 2017, Proceedings 14, pp. 252–276. Springer (2017)
38. Karbab, E.B., Debbabi, M., Derhab, A., Mouheb, D.: MalDozer: automatic framework for android malware detection
using deep learning. Digit. Investig. 24, S48–S59 (2018)
39. Suarez-Tangil, G., Stringhini, G.: Eight years of rider measurement in the android malware ecosystem: evolution and
lessons learned. arXiv preprint arXiv:1801.08115 (2018)
40. Lashkari, A.H., Kadir, A.F.A., Taheri, L., Ghorbani, A.A.: Toward developing a systematic approach to generate bench-
mark android malware datasets and classification. In: 2018 International Carnahan Conference on Security Technology
(ICCST), pp. 1–7. IEEE (2018)
41. Android malware dataset (CIC-AndMal2017) (2017) https://www.unb.ca/cic/datasets/andmal2017.html. Accessed 12
Mar 2020
42. Investigation of the android malware (cicinvesandmal2019) (2020) https://www.unb.ca/cic/datasets/invesandmal2019.
html. Accessed 12 Mar 2020
43. Shrivastava, G., Kumar, P.: Intent and permission modeling for privacy leakage detection in android. Energy Syst. 13(3),
567–580 (2022)
44. Shrivastava, G., Kumar, P.: SensDroid: analysis for malicious activity risk of Android application. Multimedia Tools
Appl. 78(24), 35713–35731 (2019)
45. Nishimoto, Y., Kajiwara, N., Matsumoto, S., Hori, Y., Sakurai, K.: Detection of android API call using logging mechanism
within android framework. In: Security and Privacy in Communication Networks: 9th International ICST Conference,
SecureComm 2013, Sydney, NSW, Australia, September 25–28, 2013, Revised Selected Papers 9, pp. 393–404. Springer
(2013)

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional
affiliations.

Authors and Affiliations

Muhammad Umar Rashid1 · Shahnawaz Qureshi2 · Abdullah Abid1 · Saad Said Alqahtany3 ·
Ali Alqazzaz4 · Mahmood ul Hassan5 · Mana Saleh Al Reshan6,7 · Asadullah Shaikh6,7

Mana Saleh Al Reshan (corresponding author)
[email protected]
1 National University of Computer and Emerging Sciences, H-11, Islamabad 44000, Pakistan


2 Sino-Pak Center for Artificial Intelligence, School of Computing, Pak-Austria Fachhochschule Institute of Applied
Sciences and Technology, Haripur 22650, Pakistan
3 Faculty of Computer and Information Systems, Islamic University of Madinah, 42351 Madinah, Saudi Arabia
4 College of Computing and Information Technology, University of Bisha, 61922 Bisha, Saudi Arabia
5 Department of Computer Skills, Deanship of Preparatory Year, Najran University, 61441 Najran, Saudi Arabia
6 Department of Information Systems, College of Computer Science and Information Systems, Najran University,
61441 Najran, Saudi Arabia
7 Emerging Technologies Research Lab (ETRL), College of Computer Science and Information Systems, Najran
University, 61441 Najran, Saudi Arabia
