0% found this document useful (0 votes)
3 views

Network_Malware_Detection_Using_Deep_Learning_Netw

This document discusses the challenges of malware detection, particularly focusing on Android malware, and proposes a new framework called Anti-Virus Software Detection for Malware with Deep Learning Network (AVSD-MDLN). The framework utilizes deep learning techniques, specifically convolutional neural networks, to improve detection accuracy and efficiency, achieving a detection accuracy of 97.47%. It also outlines the methodologies for dynamic analysis and classification of malware, emphasizing the importance of feature extraction and selection in the detection process.

Uploaded by

sharanreddy169
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

Network_Malware_Detection_Using_Deep_Learning_Netw

This document discusses the challenges of malware detection, particularly focusing on Android malware, and proposes a new framework called Anti-Virus Software Detection for Malware with Deep Learning Network (AVSD-MDLN). The framework utilizes deep learning techniques, specifically convolutional neural networks, to improve detection accuracy and efficiency, achieving a detection accuracy of 97.47%. It also outlines the methodologies for dynamic analysis and classification of malware, emphasizing the importance of feature extraction and selection in the detection process.

Uploaded by

sharanreddy169
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 26

Network Malware Detection Using Deep

Learning Network Analysis

Peng Xiao

Information Center of Yunnan Power Grid Co., Ltd., Kunming, 650000, Yunnan,
China
E-mail: [email protected]

Received 02 December 2022; Accepted 08 February 2023;


Publication 05 December 2023

Abstract
Malware, short for malicious software, is designed for harmful purposes and
threatens network security because it can propagate without human interac-
tion by exploiting user’s vulnerabilities and carelessness. Having your system
regularly scanned for malicious software is essential for keeping hackers at
bay and avoiding the disclosure of sensitive data. The major drawbacks are
the rapid creation of new malware variants, and it may become difficult to
detect existing threats. With the ever-increasing volume of Android malware,
the sophistication with which it can hide, and the potentially enormous value
of data assets stored on Android devices, detecting or classifying Android
malware is a big data problem. Security researchers have developed various
malware detection and prevention programs for servers, gateways, user work-
stations, and mobile devices. Some offer centralized monitoring for malware
detection software deployed on many systems or computers. The purpose of
this essay is to critically examine the research that has been done specifically
on malware detection. This paper proposes the Anti-Virus Software Detection
for Malware with Deep Learning Network (AVSD-MDLN) framework to
explore the possible threats. The two methods help in finding the threats.

Journal of Cyber Security and Mobility, Vol. 13 1, 27–52.


doi: 10.13052/jcsm2245-1439.1312
© 2023 River Publishers
28 P. Xiao

Dynamic Analysis for the Detection of Spyware (DA-DS) framework is


framed to detect malicious malware, while the other is for classifying Android
malware which is helped out through the Category in an Ensemble (CE)
method. Prior malware detection methods are compared with the results of the
proposed method. According to the research findings, the proposed approach
achieves a higher projected time (0.5 sec) and detection accuracy (97.47%)
than the existing situation machine learning and deep learning methodolo-
gies. Performance, correlation coefficient, and recall rate all improved in the
suggested framework. Likewise, the negative rate (MPR) and the positive rate
(PPR) also improved.

Keywords: Malware detection, spyware, antivirus, deep learning, dynamic


analysis.

1 An Overview of Malware Detection


Devices are increasingly interconnected and sharing data; however, some
companies mass-produce insecure devices and sell them to end users [1].
Thus, millions of Android apps can be obtained by users from numerous app
stores and installed on their devices [2]. Malware authors will likely focus on
IoT devices if they become widely available on the market [3]. It is possible
for malware to rapidly spread to other networks by leaking user information
collected by devices and penetrating significant networks [4]. The number of
malicious Android apps has increased dramatically alongside the popularity
of Android devices and operating systems [5]. With the development of new
computer systems, cyberattacks have progressed [6]. These threats include
novel malware attacks that compromise Android device security and are
difficult to detect [7].
The Android operating system has a built-in module that can grant
the necessary permissions to Android apps if there has been no security
policy violation [8]. The dataset includes four types of malware: adware,
ransomware, scareware, and SMS [9]. Because malware is constantly grow-
ing and changing, many methods for finding and stopping it have been
proposed [10]. Researchers have reported two methods for spotting mal-
ware [11]. There is Static analysis, in which applications are examined
without running them, and Dynamic analysis, in which malware behavior
is reviewed in a sandbox after it has run [12].
Furthermore, the sophistication and multi-stage nature of modern cyber-
attacks has increased [13]. Large-scale property damage and monetary
Network Malware Detection Using Deep Learning Network Analysis 29

losses resulted from these kinds of attacks [14]. As most services have
moved to online and remote mode, the risk of cyberattacks and viruses has
increased, and the COVID-19 pandemic has offered various cybersecurity
challenges [15]. Research on detecting IoT malware by feature learning and
categorization has been spurred by a need to shield IoT devices from novel
and variant malware assaults [16].
Most research separates the process of finding malware into analysis and
detection phases [17]. Static analysis, dynamic analysis, hybrid analysis, and
memory analysis are some methods that can detect malicious software [18].
Unlike dynamic analysis, which runs malware in a sandbox or virtual com-
puter to observe its behavior, static analysis examines the information of
binary files [19]. Although static and dynamic analyses are used in hybrid
analysis, memory analysis is the most thorough method for analyzing mal-
ware in memory [20]. Using the document tree as input, user agents may
use the Visual Formatting Model to format the data appropriately for visual
presentation. File names, hashes, strings like IP addresses, domain names,
and file header data may all be utilized as technical indications to detect
whether or not a file is malicious. Statistical study of system modifications
or probabilistic explanation of an executable being malware based on the
occurrence of literals is the foundations of traditional malware analysis tech-
niques. However, this probabilistic or statistical technique only approximates
a small subset of malware characteristics and has trouble handling disguised
threats. The dataset was short, and the treatment of packed executables was
disregarded, both of which increase the unpredictability of the outcomes
should the system be deployed in real-time. a detection system that doesn’t
care about where the executable was called from or how it was called, and
hence misses suspicious behaviour.
After malware analysis is complete, malware detection can begin. Several
strategies, such as biometrics, memory-based and model, specification-based,
and internet methods, can be employed at the detection stage [21]. Signature-
based malware detection identifies malicious software by comparing known
malware samples’ signatures or characteristics [22]. There is growing consen-
sus that Android malware detection and classification could benefit greatly
from machine learning (ML) applications [23]. Many ML techniques have
been proposed, human ingenuity is still essential in developing these tools and
methods [24]. However, data integrity might be compromised throughout the
process of choosing useful data for a study. The main goal of data selection
is to find the best combination of data type, source, and instrument to answer
research questions.
30 P. Xiao

The significant contributions to this paper are as follows:


• A framework called Anti-Virus Software Malware Detection with a
Deep Learning Network (AVSD-MDLN) is presented to reveal the
current threats in the environment.
• One framework designed to identify spyware is the Dynamic Analysis
for the Detection of Spyware (DA-DS) framework.
• Category in an Ensemble (CE) is a strategy that helps classify Android
malware.
Precisely, this document will follow the following structure: Section 2
discusses the research in the broader context, and Section 3 provides a model
for foreseeing the current threat and malware detection. The findings and dis-
cussion can be found in Section 4, while the endnotes and recommendations
can be found in Section 5.

2 Background Literature
Phone conversations are just one function of smartphones; they may be
used as portable data banks for text messages, photos, videos, documents,
and more. Because of their popularity and the sensitive information they
store, fraudsters see Android smartphones as a prime opportunity to steal
personal information. Research on the visualization of malware’s features
is ongoing to improve classification precision, performance, and efficiency
while minimizing expenses.
Imtiaz, S. I et al. [25]. Proposed DeepAMD [DAMD], a game-changing
method that employs deep Artificial Neural Networks (ANNs) to defend
against actual Android malware. It is done in need of effective means
of identifying malicious software before exploiting sensitive information.
It compares favorably to traditional machine learning classifiers and new
research that pushes the boundaries of existing knowledge regarding effi-
ciency measures like recall, f-score, and precision.
Damaševičius, R et al. [26]. Suggest a method for detecting malware
based on Ensemble Classification [EC]. A stacked ensemble of dense (fully
connected) and convolutional neural networks (CNN) do the initial classifi-
cation step, while a meta-learner handles the final stage. An ensemble of five
dense and CNN neural networks, with the extra trees classifier acting as a
meta-learner, produces the best results.
Naeem H et al. [27]. Developed a framework for Malware Detection
in the IIoT (MD-IIOT). For this reason, protecting the millions of users of
Network Malware Detection Using Deep Learning Network Analysis 31

IIoT from cybercriminals necessitates a clever protection strategy. Malware


analysis is made more accessible with the help of a proposed method that
uses color picture visualization and a deep convolution neural network. Past
malware detection methods are compared with the results of the suggested
approach.
Qiu, J et al. [28]. Introduced DL-based Android Malware Detection and
Classification (DL-AMDC), which seeks to solve the problems by analyzing
how far it is with deep learning methods for identifying and categorizing
Android malware. Choice of DL architecture, feature extraction and process-
ing, performance measurement, and even acquiring appropriate high-quality
data provide difficulties for academics and practitioners. This research aims
to expose state of the art in malware detection on Android by modeling code
semantics. Consequently, offer our perspectives on the potential and threats
facing this new area of study and our predictions for where it will go in the
future.
Jeon, J et al. [29]. Framed Dynamic Analysis for IoT Malware Detection
(DA-IMD), harm done to the Internet of Things gadgets can be mitigated
with the use of a technique that detects both well-known and newly-evolved
IoT malware. DAIMD can dynamically evaluate IoT malware in a nested
cloud environment because of the CNN model it employs. DAIMD performs
dynamic analysis of IoT malware in a nested cloud, extracting features such
as memory, network, virtual file system, process, and system call.
DAMD, EC, MD-IIOT, DL-AMDC, and DA-IMD are just a few of
the well-known models that could benefit from refinement. One solution
to address the need for better models is using a deep learning network to
detect malware through anti-virus software. The other two approaches to
identifying spyware and aiding in the classification of Android malware
make the development of valuable methods far more efficient, accurate,
predictable, and successful than is currently possible. When used to scan
all incoming data, AVSD-MDLN may assist stop malware infections before
it start. In addition to identifying sophisticated malware, AVSD-MDLN
software can defend users against ransomware. A database server may not
detect file corruption until it attempts to retrieve corrupted data. Be sure that
database is still accurate every so often. Data may be checked for accuracy
in a number of ways, including checksums, index data, and the presence
of objects that should be associated with each page of a table. To further
boost efficiency, one may also use an express validation that skips referential
integrity checks. There has been no validation of the current CPS on the
dataset.
32 P. Xiao

3 Anti-Virus Software Detection for Malware with Deep


Learning Network
This paper uses the help of a Deep Neural Network to detect malware
through anti-virus software. The Anti-Virus Software Detection for Malware
dynamically analyzes malware using nested virtualization in the cloud envi-
ronments and then trains them using the deep learning network following
detailed examination and detection techniques comparable to those employed
by generic malware detection systems.
The above Figure 1 shows the system for detecting malware through a
deep learning network. The dataset includes malware and benign code, a
kind of executable file. Debugging, feature extraction, feature pre-processing,
feature selection, and classification are the primary operations of a mal-
ware detection system’s workflow. Consequently, debugging is performed on
datasets that include both malicious and benign files to produce a log file
from which behavioral data can be retrieved. Hence to classify the extracted
features, expected behaviors are chosen from among them. The files are then
classified as either malicious or benign using malware features that have been
trained and employ a model based on a deep neural network to carry out tasks
such as feature selection and classification. CNNs can reliably identify and
categorise malware by examining a training set of PE files. Bytes sequence,
grayscale images, API call sequence, structural entropy, and HTTP traffic are
just some of the data types that CNNs can be used to analyse in malware.

Debugging:
Analyzing tools translate binary files into assembly language, and remote
debugging helps understand the code’s logic and flow. It is done in a
simulated setting where both malicious and safe files can be run.

Figure 1 System for detecting malware.


Network Malware Detection Using Deep Learning Network Analysis 33

Feature Extraction:
Malware memory, network, system call, VFS, and process signatures are
extracted as.csv files and stored in Excel during the feature extraction phase.
A file’s signature is a characteristic explored about its execution by analyzing
its behavior and internal structure during debugging. Through the use of fea-
ture selection methodologies during training, the suggested framework was
able to effectively identify malware in real-world applications. In addition,
the model construction is aided by the specified characteristics when several
machine learning techniques are taken into account. The AVSD-MDLN
framework is educated using various machine learning methods that are
LSTM and SVM, and it operates on the idea of feature selection techniques.
We take into account two different feature selection methodologies, the
feature ranking strategy and the feature subset selection approach, to choose
the optimal features for training.

Feature Pre-Processing:
The term “data preprocessing” refers to the process of transforming a messy
data collection into a usable one. The data on malicious and safe files’ features
are compressed and converted during the feature pre-processing stage into an
image type for more accessible representation, most of which is made up of
metadata concerning behavior and visual terms of that data.

Feature Selection:
Feature Selection is a technique for streamlining the model’s inputs by
keeping just the most pertinent information and discarding irrelevant details,
or noise. The ZFNet CNN model automatically picks and trains representa-
tive aspects of behavior; AVSD-MDLN can detect malware without human
intervention in either the feature selection or classification phases.

Deep Learning Network:


For this, a deep neural network (DNN) or deep net will refer to any neural
network with two or more hidden layers. Complex mathematical modeling is
at the heart of how deep neural networks process data in various ways.

D = ∅(y1 , . . . , yn ) (1)

where D is the output y1 . . . yn are the network’s input. ∅ is the precision


factor. The network learns a map of input space y to space D as output in
34 P. Xiao

Equation (1). "N #


X
yi = A yj ∗ kij + bj (2)
i=1
The above Equation (2) shows the activation map of the network where k
is the kernel, j is the size of the kernel, N is the inputs count, b, yj is the bias
kernel, A is the activation function.

∅lj = ∅jl+1 Wjl+1 of ′ (ul ) = αjl+1 up(∅l+1 ′ l


j )of (u ) (3)

where l + 1 shows the pooling layer, W represents kernel and αjl+1 up do up


sample are shown in Equation (3).
∂E X
= (∅lj )u, v (4)
∂bj s,t

∂E X
= (∅l+1
j )u, v (5)
∂bj s,t

where p(∅jl+1 ) shows the patch for the deep network of yj , kij denotes the
patch center in samples of datasets, and the partial derivative of the error
function is mentioned in the above Equations (4) and (5).

yjl = A(down(yjl−1 ) + bj (6)

In the above Equation (6), down connotes a process involving collecting


and combining resources, and b shows value bias.

∅lj = ∅l+1
j Wj
l+1
of ′ (ul ) (7)

Precision is obtained from the above Equation (7).

Feature Classification:
The trained CNN is then put to use in the validation step, where the probabil-
ity of malware attack is determined. The CNN takes in a feature picture from
the validation process and spits out a two-dimensional vector whose elements
reflect varying degrees of harmful or benign behavior. The harmful value
exceeds the benign value if the input picture is determined to be malicious.
In order to determine the probability that a piece of malware will infect a
Network Malware Detection Using Deep Learning Network Analysis 35

system, we apply the following sigmoid function in Equation (8) on the value
of the harmful class:
1
P (X) = (8)
1 + e(−X)
Where, X is the malicious class.
Digitizing and channelizing integrated behavioral feature data and the
behavior frequency table retrieved from behavior feature metadata and mem-
ory produces these images. The proposed model can identify malicious code
created in the training dataset with the ZFNet model by training on display
pictures. A performance evaluation is carried out to determine if the model
this work proposes a method that can accurately analyze and detect malware,
from the most prevalent forms to the constantly evolving new and variant
forms, and then generate an output that cleanly separates the malware from
the input data.
The architecture of threat detection is shown in Figure 2. The storage
infrastructure comprises three databases: traffic, behavior, and log. The raw
Android files from devices using the control unit are collected and stored in
the traffic database. The history of a network is stored in the database as a
collection of datasets. The behavior database is kept up-to-date by constantly
collecting new malware fingerprints from the log database.
Processing a large amount of data requires more time and human
resources. As a result, incorporate a Deep Neural Network (DNN) and a
color image transformation technique into the design of the analyzer unit.

Figure 2 Architecture for threat detection.


36 P. Xiao

The DNN model is fed a color image previously converted from the raw
Android file. File fingerprints are compared to a behavior database, and the
analyzer decides whether or not the file is malicious. If malicious behavior
is exploreed on the network, the response unit delivers a final warning to the
system administrator, who can then take appropriate action. Then Antivirus
software connected to the gateway helps detect the threat recognized and will
be displayed for the administrator through the anti-virus software. Malware
with more temporal variability in time has been shown to be more con-
sistently classified as harmful by antivirus engines. Figure 2 compares the
distributions of malicious samples with low temporal variability (lower than
the 25th percentile) to those with high variability (on the top of the 75th
percentile). In November of 2019 (exactly one year after the first observation
of the samples), we will check Virus Total for the total number of detections
for each sample. The samples that were subsequently picked up by AV
software exhibited more temporal variation, as was seen. This demonstrates
that time tends to have a more profound influence on variability for samples
that are simpler to identify, even if temporal variability of time is low across
the board generally. It has been shown that the primary benefit of Deep
Learning algorithms is their attempt to learn abstract characteristics from data
in a gradual fashion. Thus, neither domain knowledge nor rigorous feature
extraction are required. In deep learning, a computer model is trained to make
inferences about the world around it without being explicitly programmed to
do so. When it comes to accuracy, deep learning algorithms may sometimes
even outperform humans.

Detection of Spyware:
Spyware is malicious software that sneaks into a user’s device without their
knowledge, monitors their online activities without their consent, and sends
this data to third parties without their knowledge or consent. Spyware, by
definition, is a form of malicious software that gains access to and disrupts
a user’s device without the latter’s knowledge or permission. The raw file
is crucial to the success of any android project while using android studio
for development. Audio and video files (in formats like MP3 and MP4) and
other file types can be stored in Android’s raw folder. Specifically, the raw
folder will be created in the res folder, at raw. To detect threats, one must
conduct a thorough investigation of the complete security infrastructure in
order to identify any malicious actions that might compromise the system.
Once a threat has been identified, appropriate countermeasures must be taken
to prevent it from exploiting any existing vulnerabilities in the system.
Network Malware Detection Using Deep Learning Network Analysis 37

Figure 3 Detection of malicious android application in a visual format.

Figure 3 gives a visual format for the detection of malware. It summa-


rizes the proposed method, which consists of Static binary classification and
Malware detection classification. Therefore, samples from the dataset are
sorted into malware and benign categories in the Static layer. After being
identified as malicious at the Static layer, models are further categorized at
the Dynamic layer into one of four types and 39 families. There are four
types of malware which are as follows:
1. Ransomware
2. Scareware
3. SMS Malware
4. Adware
38 P. Xiao

1. Ransomware:
Ransomware is malicious software that encrypts user data and locks the
infected device before demanding payment from the victim. Once files have
been encrypted, even if the ransomware is removed, the decryption key will
not work to restore the files. The android system is vulnerable to ransomware
because it handles permissions and intents.
In exchange for decrypting the victim’s files, ransomware demands pay-
ment from the infected user. Both lock-screen and crypto-ransomware are
common on Android. In the lock-screen context, a photograph that com-
pletely covers a smartphone’s display is used as a security measure. Second,
ransomware encrypts the client’s vital data, as in the crypto class. The defini-
tion of Android ransomware often mirrors that of a Trojan horse. To further
confuse victims, ransomware can assume the appearance of a legitimate law
enforcement agency or label.

2. Scareware:
Scareware is malicious software designed to trick users into downloading
and installing unnecessary programs. Scareware is software intended to
intimidate or trick users into providing personal information on a phishing
website. Scareware is software designed to intimidate its target audience by
masquerading as a helpful program that usually protects the user’s computer
from malicious software. Fake filtering exchanges, fake progress bars, and
fake alarms are all features designed for scareware.

3. SMS Malware:
Malicious text messages can be sent from an infected Android device if
the malware is allowed to take over the device’s messaging system. The
owner uses the malware to have the infected phones send messages in
his direction. Malware explicitly designed to target a compromised mobile
device is often used in SMS attacks. Without the client’s knowledge or
permission, the Trojan can be programmed to initiate unauthorized commu-
nications such as phone calls or text messages. These communications are
then routed to premium-rate or chargeable SMS content services, providing
the cybercriminal with massive revenue streams.

4. Adware:
Adware is a malicious program that compromises computer systems and
the personal information of its users. Adware can harm the client in several
Network Malware Detection Using Deep Learning Network Analysis 39

ways, including stealing his data and sending it to a remote server, stealing
his screen real estate, or displaying notifications for unrelated ads usually
reserved for major structural events. An adware program can hijack a mobile
device’s volume in rare instances. Adware is software whose primary func-
tion is to force users to view or click on many advertisements, banners, and
posts without their knowledge or consent. Any software that consistently
displays advertisements based on a user’s browsing or application activity
history is considered adware. One aspect of this is data collection, which is
often done for legitimate reasons can be used maliciously.

Category in an Ensemble:
An innovative method for malware identification is proposed in this study;
it employs both fully connected and convolutional neural networks are used
as the underlying learning mechanism and is founded on a category in an
ensemble learning approach. Cybercriminals and hackers often target critical
computer systems and networks to steal information, exact financial com-
pensation, or demonstrate their skills. When identifying previously explored
malware, the tried and true methods performed admirably. Log data is the
most basic data type created by any and all computers, services, and net-
work nodes, and it may be collected at the host level. Log data is a useful
information source for detecting attacks in IT networks because it allows for
monitoring of the current state of a computer system or network. Syslog,
journald, and the Windows event log are all well-known services for logging
data and making it accessible.
Managing the storage infrastructure includes a wide range of tasks,
such as ensuring its availability, enough capacity, optimal performance, and
flawless security. These components are interdependent and must operate in
conjunction to get optimal ROI.
A SAN (storage area network) is a network of storage devices that can
be accessed by multiple servers or computers, providing a shared pool of
storage space. Each computer on the network can access storage on the SAN
as though they were local discs connected directly to the computer.
Data selection is the first step in the system methodologies, as shown
in Figure 4, and progresses to the model evaluation stage. Machine learning
techniques can detect malware in two phases: feature extraction (to determine
which data characteristics should be used to make predictions) and classi-
fication/clustering. The proposed technology centers on machine learning,
which can be trained to identify malicious from safe files and make accurate
predictions about files that have never been seen.
40 P. Xiao

Figure 4 Malware detection procedures.

The following steps, which are part of the data processing, feature
selection, and engineering model, ultimately lead to the desired result. They
are processing, exploration, and choice of data-neural-network models; con-
struction, testing, and evaluation of ensemble models; assessment of model
performance.

Ensemble:
Ensemble techniques assume that a base classifier can be applied to many
differently rearranged training datasets (through resampling or reweighting)
to generate an ensemble of base classifiers. By merging the prediction impacts
of these individual base classifiers, a new ensemble classifier is caused by
the stacked ensemble approach, with the latter learning how to integrate the
predictions of the former more effectively. Here employed a two-step stacking
procedure.
Initially, several models are trained using the same dataset. The results
from each model are then combined to form a fresh set of information.
Each item in the current data set connects to the actual value it is trying to
approximate. Secondly, the output is derived from the dataset used to train
the meta-learning.
Figure 5 depicts a stacking model architecture and a conceptual overview
of ensemble classification in which level-0 models serve as the foundation
upon which a meta-learner (or generalizer) that combines the projections
of multiple level-0 models is built. Multiple atmospheric dispersion models
are used for each simulation in this multi-model ensemble. Typically, data
from many meteorological models is combined with the various atmospheric
dispersion models. The metrics, optimizer, and loss function are all defined
Network Malware Detection Using Deep Learning Network Analysis 41

Figure 5 A conceptual overview of ensemble classification.

during compilation. Compiling a model several times will not affect the
quality of the pretrained weights, and it has nothing to do with the weights
either. It’s impossible to train without a constructed model (because training
uses the loss function and the optimizer). Learning to learn, or meta-learning,
is the process of methodically examining the results of various machine
learning methods across a broad variety of learning tasks and then using
that information to accelerate the acquisition of new knowledge. Typically,
metadata is embedded directly into an image file, along with the data that
defines the image. Metadata text can be extracted from binary data with a
program like Hex Fiend.
Base models are compiled with forecasts and can fit into the training
data. The goal of training a meta-learner (level-1 model) is to merge the
model’s underlying predictions. The meta-learner makes use of straightfor-
ward models to inform their decision-making. To prepare the meta-learner,
feed it a dataset consisting of input-output value pairs and the predictions of
the underlying models. The data and malware are separated in the outcome
section.
The suggested ensemble learning model is tested for its classification
accuracy. True positive (TP), true negative (TN), false positive (FP), and false
negative (FN) values were determined by comparing the actual labels to the
predicted labels. Here presumed a two-category problem, where one class is
labeled positive, denoted by +1 and the other negative, represented by −1.
PN
[b(yi ) = +1][xi = −1]
MPR = i=1 PN (9)
i=1 [xi = −1]
42 P. Xiao

Where, N is the inputs count and The above Equation (9) shows the minus
positive rate (MPR)
PN
[b(yi ) = +1][xi = +1]
PPR = i=1 PN (10)
i=1 [xi = +1]

The above Equation (10) shows the plus positive rate (PPR)
PN
[b(yi ) = −1][xi = +1]
MNR = i=1 PN (11)
i=1 [xi = +1]

The above Equation (11) shows the minus negative rate (MNR) where
b(y) is the input classifier with input yi and output xi . The number of times
malicious files are mistakenly identified as benign is the minus positive (MP).
Still, the frequency with which they are accurately classified as harmless is
the plus negative (PN).
PPR
Accuracy = (12)
PPR + MPR
Equation (12) above helps in calculating the accuracy.
Accuracy ∗ Recall
Performance = 2 ∗ (13)
Accuracy + Recall
The above Equation (13) shows the performance of the proposed system.
PP ∗ PN − MP ∗ MN
CC = (14)
(PP + MP)(PP + MN)(PN + MP)(PN + MN)
where CC represents the correlation coefficient of the sample data taken and
correlation is achieved from the above Equation (14).
1 − ∂0
Recall = 1 − (15)
1 − ∂e
where ∂0 denoted the proportion of valid responses in the evaluation data,
and ∂e is the ratio of the percentage breakdown of correlation that can be
explained by chance alone in Equation (15). Recall Rate is high in perfor-
mance for the proposed model, and it’s achieved by the above Equation (15).
Table 1 above exhibits accuracy variation, showing how each of the
techniques above can record data as per Equation (12). Since it optimizes
gains while minimizing losses, AVSD-MDLN is more accurate than its rivals.
Network Malware Detection Using Deep Learning Network Analysis 43

Table 1 Accuracy variation


Epoch DAMD EC MD-IIOT DL-AMDC DA-IMD AVSD-MDLN
0 85.12 88.12 86.2 87.12 89.12 90.3
100 85.78 89.78 87.34 88.45 90.1 91.3
200 86.3 90.1 88.1 90.3 90.9 92.3
300 88.4 90.8 88.9 92.5 91.5 93.9
400 89.12 91.5 89 93 92.1 95.6
500 90.1 92.1 89.3 93.4 92.4 97.5

Table 2 Recall rate analysis


Epoch DAMD EC MD-IIOT DL-AMDC DA-IMD AVSD-MDLN
0 86.12 89.12 87.4 88.45 90.1 91.4
100 87.23 89.78 87.9 89.45 91.3 92.3
200 88.56 90.1 88.34 91.2 93.4 93.5
300 89.23 90.8 88.9 92.5 94.1 94.6
400 89.98 91.5 90.3 93.6 94.5 96.7
500 90.2 91.2 93 94.5 95 98.5

Accuracy is defined as the degree to which a calculated value is within


a certain tolerance of the actual value. The term “precision” refers to the
consistency with which a certain instrument or procedure returns the same
result. Thus, precision refers to easily an outcome may be replicated, whereas
accuracy measures how closely an outcome matches the original. Precision,
precision, and calibration are all phrases that seem similar but have quite dif-
ferent meanings. Being precise is essential to being accurate. Unfortunately,
accuracy is not a guarantee of precision. Accuracy is achieved by calibration
and precise measurement.
The above methods can all record information according to the Equation,
as shown in Table 2, which displays recall rate variation per Equation (15).
AVSD-MDLN excels compared to its competitors because it maximizes
profits while achieving a high recall rate.
Here, introducing Anti-Virus Software Detection for Malware with Deep
Learning Network is a strategy for enhancing the accuracy and precision with
which a procedure is typically performed. Two more methods have been
developed to evaluate the previously stated approach, with the overarching
intention of detecting data from malware through an assessment of precision,
recall and accuracy, and other factors.
44 P. Xiao

4 Results and Discussion


The research recommends defining and running a Deep Learning Network
in Anti-Virus Software for Malware Detection and simulations to achieve
optimal precision, performance, and accuracy. Different models are compared
with one another in terms of recall rate, technical performance, precision,
and accuracy (such as DAMD, EC, MD-IIOT, DL-AMDC, and DA-IMD).
To evaluate the performance of the proposed AVSD-MDLN, here engage
in two primary functions: (1) detection and identification and (2) attribution
determining if an app is malicious, the purpose of which is to identify the
type of malware that has been found.
The dataset is sourced from the cited website.
https://www.qub.ac.uk/ecit/CSIT/Research/SecurityIntelligence/DeepAndroi
dMalwareDetection/ [30].
Table 3 of the abovementioned data set is used for this analysis, and the
problems that the proposed model solves are outlined, and how all the systems
are compared.

4.1 Accuracy Analysis


The testing procedure for the accuracy analysis ratio is depicted in Figure 6.
Change over time is measured along the X-axis, while accuracy is examined
along the Y-axis. Time variation factor performance versus accuracy is crucial
regarding malware analysis and detection. To help meet these requirements,
the accuracy factor, represented by Equation (12), is provided.

4.2 Precision Analysis


The analytical precision outcome is shown in Figure 7. The optimum preci-
sion rate is determined by plugging Equation (7) into a graph where the Y-axis
represents the analysis ratio of precision, and the X-axis represents time.

Table 3
S.No Classification System Benign Malware Access
1. DAMD 741 1260 0.90
2. EC 834 1000 0.92
3. MD-IIOT 912 2367 0.89
4. DL-AMDC 1005 2890 0.93
5. DA-IMD 1034 3023 0.92
6. AVSD-MDLN 2345 5634 0.97
Network Malware Detection Using Deep Learning Network Analysis 45

100

Accuracy Analysis Ratio 95


DAMD
90 EC
in (%)

85 MD-IIOT
DL-AMDC
80
DA-IMD
75
AVSD-MDLN
0 100 200 300 400 500
Time in (min)

Figure 6 Accuracy analysis.

100
Precision Analysis Ratio

95
DAMD
90 EC
in (%)

85 MD-IIOT
DL-AMDC
80
DA-IMD
75
AVSD-MDLN
0 100 200 300 400 500
Time in (min)

Figure 7 Precision analysis.

When it comes to analyzing and predicting malware detection, AVSD-MDLN


is superior to all other models. Static features, such as a new feature based on
control statement act of trying, are employed for sample classification in the
process of static analysis. While it takes less time to train than s-value in
Equation (16), its accuracy is only 97.3%, a difference of 1.6%. It’s possible
that the actual rate of mistakes is much greater in practice. In this way, s-value
improved training speed marginally while also adding characteristics beyond
those that are static. In order to get higher precision, it is preferable to sacrifice
some time and effort. Static analysis cannot compete with s-value in either
direction because of the time it takes to make projections.
s-value = (subsets of samples)/(malware count in train set) (16)
46 P. Xiao

100

Recall Rate Analysis 95


DAMD
Ratio in (%)
90 EC
85 MD-IIOT
DL-AMDC
80
DA-IMD
75
AVSD-MDLN
0 100 200 300 400 500
Time in (min)

Figure 8 Recall rate analysis.

4.3 Recall Rate Analysis


Figure 8 displays the observed range in relay rates. The optimal recall
rate is found by plotting the analysis ratio of recall rate versus time using
Equation (15). There is no better model than AVSD-MDLN for analyzing
and forecasting malware detection. When the dynamic nature of Android’s
malware landscape is taken into consideration, as demonstrated in figure, both
detection techniques demonstrate strong performance in the identification
of previously identified malware. Our system has an average F-measure
of 0.94 when testing samples from the same year as the training, and
F-measures of 0.84 and 0.72 when testing samples from one and two years
following the training, respectively. Our average F-Measure is 0.92, whereas
AVSD-MDLN’s is 0.84 after one year of training and 0.70 after two years of
training, therefore the two are similar. 14 out of 15 findings suggest that our
system performs better when utilising newer samples for training and older
for testing. In a worldwide comparison of the two methods, 25 of our findings
are on par with or better than the AVSD-MDLN’s, while 13 of the AVSD-
MDLN’s results are superior. Because of this, the approach can outperform
AVSD-MDLN in identifying new forms of malware.

4.4 Performance Analysis


Figure 9 depicts the testing for the Performance evaluation. The Y axis rep-
resents the elapsed time, while the X axis displays the Performance analysis
ratio. Equation (13) helps maximize the performance index compared to other
models; the AVSD-performance MDLN is at its highest, making it ideal for
analyzing and predicting malware.
Network Malware Detection Using Deep Learning Network Analysis 47

500
Time in (min)
400 AVSD-MDLN
300 DA-IMD
200 DL-AMDC
100 MD-IIOT
0 EC
DAMD
75 80 85 90 95 100
Performance Analysis Ratio in (%)

Figure 9 Performance analysis.

100
Correlation Analysis

98
96 DAMD
Ratio in (%)

94
EC
92
90
MD-IIOT
88 DL-AMDC
86 DA-IMD
84
AVSD-MDLN
0 200 400 600
Time in (min)

Figure 10 Correlation analysis.

4.5 Correlation Analysis


Examination for correlation analysis is shown in Figure 10. The time varia-
tion is plotted along the X axis, while the correlation is shown along the Y
axis. Compared to other models, the AVSD- MDLN performance has the
highest correlation index achieved from Equation (14), making it ideally
suited for analyzing and predicting malware.
Existing models such as DAMD, EC, MD-IIOT, DL-AMDC, and
DA-IMD have been compared to the proposed AVSD- MDLN -model, which
has been shown to perform better than all of them. In this research, the authors
conclude that these issues can be overcome with the help of virus detection
in anti-virus software.
48 P. Xiao

5 Endnotes
The need for innovative approaches to identify new forms of malware is
growing since the current procedures are laborious and prone to numerous
mistakes. Malware for Android devices is evolving, and new strains are being
explored. Several investigations into the analysis and detection of malware
have been conducted to respond directly to this problem. Protecting Android
devices from malicious software, this paper proposed a new method based
on AVSD- MDLN. This method may hamper our ability to identify evolving
forms of malware as they emerge.
Moreover, the large amounts of behavior data required to detect Mal-
ware in E-devices are challenging to collect and store for extended periods
due to the devices’ limited hardware resources and purpose-built hardware.
The remaining steps of model development – classification, feature selec-
tion, pre-processing, feature extraction, and bug fixing – were carried out
on the cloud. The proposed model can stop malware from infecting or
spreading to other devices over the Internet. The proposed model in this
paper can analyze the behavioral features of malware in action through a
dynamic analysis technique. Malware aware of its limited execution envi-
ronment can potentially avoid dynamic analysis-based malware analysis and
detection systems. Yet, the suggested malware detection model achieved a
projected time (0.5 sec) and detection accuracy of 97.47% on the dataset.
The Detection of Spyware and Category in an Ensemble are the two methods
to accomplish the other models’ precision, recall rate, and performance.
Correlation coefficient is improved in the suggested framework. Likewise,
the negative rate (MPR) and the positive rate (PPR) also improved. Tra-
ditional malware detection methods cannot be directly applied due to the
computational complexity and limited resources (time, money, and mem-
ory) of devices. So, they hope to eventually develop a model for detecting
malware that operates entirely in the block chain and relies on machine
learning rather than on any persistent storage. There is no mention of
the deep learning architectures’ resilience in the paper offered. Since mal-
ware defection is a crucial application in mission-critical settings, this is a
promising area for further research. Misclassification is a serious problem
that may have far-reaching consequences for businesses. The future direc-
tion of the proposed research are these selected features help to build a
model by considering different machine learning algorithms that is MLP
and PCA.
Network Malware Detection Using Deep Learning Network Analysis 49

References
[1] Sarker, I. H. (2021). Deep cybersecurity: a comprehensive overview
from neural network and deep learning perspective. SN Computer
Science, 2(3), 1–16.
[2] Huang, X., Ma, L., Yang, W., and Zhong, Y. (2021). A method for
windows malware detection based on deep learning. Journal of Signal
Processing Systems, 93(2), 265–273.
[3] Catak, F. O., Ahmed, J., Sahinbas, K., and Khand, Z. H. (2021).
Data augmentation based malware detection using convolutional neural
networks. PeerJ Computer Science, 7, e346.
[4] Nisa, M., Shah, J. H., Kanwal, S., Raza, M., Khan, M. A., Damaševičius,
R., and Blažauskas, T. (2020). Hybrid malware classification method
using segmentation-based fractal texture analysis and deep convolution
neural network features. Applied Sciences, 10(14), 4966.
[5] Usman, N., Usman, S., Khan, F., Jan, M. A., Sajid, A., Alazab, M., and
Watters, P. (2021). Intelligent dynamic malware detection using machine
learning in IP reputation for forensics data analytics. Future Generation
Computer Systems, 118, 124–141.
[6] Amin, M., Tanveer, T. A., Tehseen, M., Khan, M., Khan, F. A., and
Anwar, S. (2020). Static malware detection and attribution in android
byte-code through an end-to-end deep system. Future generation com-
puter systems, 102, 112–126.
[7] Mahindru, A., and Sangal, A. L. (2021). MLDroid – Framework for
Android malware detection using machine learning techniques. Neural
Computing and Applications, 33(10), 5183–5240.
[8] Ren, Z., Wu, H., Ning, Q., Hussain, I., and Chen, B. (2020). End-to-end
malware detection for android IoT devices using deep learning. Ad Hoc
Networks, 101, 102098.
[9] Vasan, D., Alazab, M., Wassan, S., Naeem, H., Safaei, B., and Zheng, Q.
(2020). IMCFN: Image-based malware classification using fine-tuned
convolutional neural network architecture. Computer Networks, 171,
107138.
[10] Alzaylaee, M. K., Yerima, S. Y., and Sezer, S. (2020). DL-Droid: Deep
learning based android malware detection using real devices. Computers
& Security, 89, 101663.
[11] Gibert, D., Mateu, C., and Planes, J. (2020). The rise of machine learning
for detection and classification of malware: Research developments,
50 P. Xiao

trends and challenges. Journal of Network and Computer Applications,


153, 102526.
[12] Feng, R., Chen, S., Xie, X., Meng, G., Lin, S. W., and Liu, Y. (2020).
A performance-sensitive malware detection system using deep learning
on mobile devices. IEEE Transactions on Information Forensics and
Security, 16, 1563–1578.
[13] Pei, X., Yu, L., and Tian, S. (2020). AMalNet: A deep learning frame-
work based on graph convolutional networks for malware detection.
Computers & Security, 93, 101792.
[14] Hwang, J., Kim, J., Lee, S., and Kim, K. (2020). Two-stage ransomware
detection using dynamic analysis and machine learning techniques.
Wireless Personal Communications, 112(4), 2597–2609.
[15] Xiao, G., Li, J., Chen, Y., and Li, K. (2020). MalFCS: An effective
malware classification framework with automated feature extraction
based on deep convolutional neural networks. Journal of Parallel and
Distributed Computing, 141, 49–58.
[16] D’Angelo, G., Palmieri, F., Robustelli, A., and Castiglione, A. (2021).
Effective classification of android malware families through dynamic
features and neural networks. Connection Science, 33(3), 786–801.
[17] Wu, Y., Wei, D., and Feng, J. (2020). Network attacks detection
methods based on deep learning techniques: a survey. Security and
Communication Networks, 2020.
[18] Jha, S., Prashar, D., Long, H. V., and Taniar, D. (2020). Recurrent neural
network for detecting malware. computers & security, 99, 102037.
[19] Vasan, D., Alazab, M., Venkatraman, S., Akram, J., and Qin, Z. (2020).
MTHAEL: Cross-architecture IoT malware detection based on neural
network advanced ensemble learning. IEEE Transactions on Computers,
69(11), 1654–1667.
[20] Jeon, S., and Moon, J. (2020). Malware-detection method with a convo-
lutional recurrent neural network using opcode sequences. Information
Sciences, 535, 1–15.
[21] Park, S., and Choi, J. Y. (2020). Malware detection in self-driving
vehicles using machine learning algorithms. Journal of advanced trans-
portation, 2020.
[22] D’Angelo, G., Ficco, M., and Palmieri, F. (2020). Malware detection in
mobile environments based on Autoencoders and API-images. Journal
of Parallel and Distributed Computing, 137, 26–33.
Network Malware Detection Using Deep Learning Network Analysis 51

[23] Pektaş, A., and Acarman, T. (2020). Deep learning for effective Android
malware detection using API call graph embeddings. Soft Computing,
24(2), 1027–1043.
[24] Darabian, H., Homayounoot, S., Dehghantanha, A., Hashemi, S.,
Karimipour, H., Parizi, R. M., and Choo, K. K. R. (2020). Detecting
cryptomining malware: a deep learning approach for static and dynamic
analysis. Journal of Grid Computing, 18(2), 293–303.
[25] Imtiaz, S. I., ur Rehman, S., Javed, A. R., Jalil, Z., Liu, X., and Alnu-
may, W. S. (2021). DeepAMD: Detection and identification of Android
malware using high-efficient Deep Artificial Neural Network. Future
Generation computer systems, 115, 844–856.
[26] Damaševičius, R., Venčkauskas, A., Toldinas, J., and Grigaliūnas,
Š. (2021). Ensemble-based classification using neural networks and
machine learning models for windows pe malware detection. Electron-
ics, 10(4), 485.
[27] Naeem, H., Ullah, F., Naeem, M. R., Khalid, S., Vasan, D., Jabbar, S.,
and Saeed, S. (2020). Malware detection in industrial Internet of things
based on hybrid image visualization and deep learning model. Ad Hoc
Networks, 105, 102154.
[28] Qiu, J., Zhang, J., Luo, W., Pan, L., Nepal, S., and Xiang, Y. (2020).
A survey of android malware detection with deep neural models. ACM
Computing Surveys (CSUR), 53(6), 1–36.
[29] Jeon, J., Park, J. H., and Jeong, Y. S. (2020). Dynamic analysis for
IoT malware detection with convolution neural network model. IEEE
Access, 8, 96899–96911.
[30] https://www.qub.ac.uk/ecit/CSIT/Research/SecurityIntelligence/Deep
AndroidMalwareDetection/
52 P. Xiao

Biography

Peng Xiao was born in Kunming, Yunnan, P.R. China, in 1988. He received
the bachelor’s degree from Yunnan University Dianchi College, P.R. China in
2012. Now, he works in Information Center of Yunnan Power Grid Co., Ltd,
Kunming, Yunnan, China. His research interests is mainly information secu-
rity evaluation technology, include network attack and defense technology,
network security management, enterprise security system construction, etc.

You might also like