2019 IEEE International Conference on Big Data (Big Data)

Droid-NNet: Deep Learning Neural Network for Android Malware Detection
Mohammad Masum
Analytics and Data Science Institute
Kennesaw State University
Kennesaw, USA
[email protected]

Hossain Shahriar
Department of Information Technology
Kennesaw State University
Marietta, USA
[email protected]

Abstract— Android, the most dominant Operating System (OS), has enjoyed immense popularity on smart devices over the last few years. Due to its popularity and open characteristics, the Android OS is becoming a tempting target for malicious apps, which can pose a serious security threat to financial institutions, businesses, and individuals. Traditional anti-malware systems do not suffice to combat newly created, sophisticated malware. Hence, there is an increasing need for automatic malware detection solutions to reduce the risks of malicious activities. In recent years, machine learning algorithms have shown promising results in classifying malware, where most of the methods are shallow learners such as Logistic Regression (LR). In this paper, we propose a deep learning framework, called Droid-NNet, for malware classification. In contrast to these shallow learners, Droid-NNet is a deep learner that outperforms existing cutting-edge machine learning methods. We performed all the experiments on two datasets (Malgenome-215 & Drebin-215) of Android apps to evaluate Droid-NNet. The experimental results show the robustness and effectiveness of Droid-NNet.

Keywords—neural network, android malware, android security

I. INTRODUCTION

Malware is malicious software (e.g., viruses, ransomware, trojan horses, and spyware) that can damage or execute harmful actions on devices [2]. Android is one of the most widely adopted operating systems for smart devices such as phones, tablets, and other mobile devices. Due to its popularity and open characteristics, Android is prone to malware attacks, which can cause devastating effects such as stealing information, corrupting files, and infecting entire networks of devices [1]. Therefore, malware poses a major security threat to financial institutions, businesses, and individuals.

The number of malware threats on Android-based smart devices is increasing exponentially, and newly created malware has become more sophisticated and more varied. Hence, traditional malware detection techniques such as signature-based detection, heuristic detection, or behavior-based detection are not adequate to combat malicious software [2].

Machine learning algorithms have shown promising results in classifying Android malware. These algorithms can overcome the limitations of traditional detection methods and provide high accuracy. Machine learning approaches such as Support Vector Machine (SVM), Logistic Regression (LR), and Decision Tree (DT) were previously proposed for malware detection [5].

Neural networks are currently widely used for many applications due to their capability to model highly non-linear systems and their flexibility in architecture design. In this paper, we propose a deep neural network framework named Droid-NNet for Android malware detection. Our contributions include: (1) we conduct a comprehensive assessment with a rigorous experimental setting to assess Droid-NNet's performance on two publicly available real-world Android application datasets, and (2) Droid-NNet provides a high weighted F-beta score, a high true positive rate, and a low false positive rate based on a deep neural network architecture, which suggests that detecting Android malware using deep learning techniques is promising.

The rest of the paper is organized as follows: In Section II, we introduce the related work on Android malware detection. Section III describes the methodology of our proposed method Droid-NNet along with the other three classifiers that are implemented in this paper. The experimental setting and results are explained in Section IV. Finally, Section V concludes the paper.

II. BACKGROUND & RELATED WORK

Traditional detection techniques have been applied for classifying Android malware. Signature-based detection is the most widely used anti-malware system. It identifies a malware instance by searching an object for specified byte sequences (called signatures) and checking whether they match known signatures from blacklisted malicious programs. This detection method is not effective against "zero-day attacks" because the system is built on known malware signatures [3]. A signature-based detection method that leverages signature matching algorithms was proposed for Android malware detection. An extension of the signature-based method was proposed that combines anomaly-based and signature-based mechanisms. The combined approach achieved 96% accuracy in classifying malicious apps in experiments on three different data sets [8].

To overcome the limitations of signature-based detection, behavior-based malware detection was proposed [4]. The behavior-based technique analyzes the behavior of a program while it is executing and labels the program as malware if it does not execute normally. However, the method affects the system's performance, requires more space, and generates many false positives and false negatives [3]. The behavior-based Android malware detection method MADAM was proposed, which simultaneously analyzes and correlates features at different levels. MADAM achieved 95% accuracy in classifying malware [8].

978-1-7281-0858-2/19/$31.00 © 2019 IEEE


Addressing the constraints of the traditional methods, researchers have proposed machine learning algorithms for malware classification. A linear SVM was applied to detect Android malware; that study used a set of 32 features that are highly related to the targeted malware and achieved an F-measure of 0.954 [13]. A multilevel classifier fusion approach, DroidFusion, was proposed for Android malware detection. DroidFusion contains two layers: a regular classifier is used in the upper layer, and a ranking-based classifier is then applied to reassign the labels of the test data. DroidFusion was evaluated on four different datasets, including the Malgenome-215 and Drebin-215 datasets. For the Malgenome-215 data, DroidFusion achieved a weighted F-measure of 0.9840, while a weighted F-measure of 0.9872 was obtained for the Drebin-215 dataset.

Neural Network for Android Detection of Malware (NADM) was proposed, leveraging two fully connected hidden layers. NADM was applied to large-scale data containing more than 1 million samples and achieved an average accuracy of 90% in detecting malware [16]. A Random Forest (RF) classifier and deep neural networks with three different architectures (2, 4, and 7 layers) were implemented on a dataset in which 11,308 malicious files were collected from the Malicia project and 2,819 benign files were collected from "virustotal.com" [14]. That paper also presented four further experiments that varied the number of features, which were extracted using an autoencoder with different threshold approaches. Irrespective of the feature set, RF outperformed the deep networks and obtained 99% accuracy. DeepDetector, an Android malware detection method based on deep learning, was proposed that can detect malicious applications and fine-grained malware families at the same time. DeepDetector was tested with varying numbers of hidden layers and neurons, and a maximum F1 score of 94% was obtained in malware classification.

Our proposed method Droid-NNet, a neural network framework, was optimized over different parameters and hyperparameters and evaluated on two real-world Android application datasets. The experimental results show the robustness and effectiveness of Droid-NNet.

III. METHODOLOGY

A. Support Vector Machine
SVM is a well-known supervised learning technique for analyzing high-dimensional data. Given training data, SVM searches for an optimal hyperplane in the input space that separates two classes. The hyperplane is then used to classify new data [6]. Fig. 1 illustrates a hyperplane for an SVM that separates two classes.

Fig. 1: Hyperplane for an SVM trained with samples from two classes

B. Decision Tree
Decision Tree is a well-known supervised machine learning technique for classification. It builds a classification model in the shape of a tree structure through a process known as binary recursive partitioning [7]. It iteratively splits the data into smaller and smaller subsets (branches) until each of the branches achieves a homogeneous partition. The result is a tree with decision nodes and leaf nodes, where each decision node has two or more branches and each leaf node assigns a class or decision.

C. Logistic Regression
Logistic regression is a classical supervised learning classifier. It uses the sigmoid function to squeeze the output of a linear equation into the range between 0 and 1. Thus, the output of logistic regression can be used to predict the probability of a class [6]. Fig. 2 shows an example of a sigmoid function.

Fig. 2: An example of sigmoid function

D. Neural Network
At present, neural networks are widely used for many applications due to their capability to model highly non-linear systems and their flexibility in architecture design. The basic architecture of a neural network contains an input layer, one or more hidden layers, and an output layer, where each layer includes a certain number of neurons. A weighted linear combination of the neurons of a layer is computed and then used as input to a neuron in the succeeding layer. To capture the non-linearity of the data, a non-linear function, called an activation function, can be applied to the weighted sums of neurons. All the weights of a neural network are set to random values at the initial stage of training. Data is fed into the input layer of the network, then travels through the hidden layers, and finally the output is produced in the output layer. The network continually updates the weights by applying backpropagation based on the output and the desired target of the neural network. The network consequently reduces the error between the output and the target in each iteration [17]. In the process, a loss function is used to calculate the error of the network, and the error is minimized by applying an optimization function during backpropagation.

In this paper, we propose a neural network framework named Droid-NNet. Fig. 3 shows the architecture of Droid-NNet, which is a neural network containing three layers: an input layer, a hidden layer, and an output layer. A threshold is applied to the output layer to classify the instances as malware apps or benign apps. The input layer contains 215 neurons (the number of features of the samples), the hidden layer contains 25 neurons, and the output layer includes only one neuron since the problem is a binary classification. We applied binary cross-entropy as the loss function and the Adaptive Moment Estimation (Adam) optimizer for calculating the error and updating the parameters.

Fig. 3: Architecture of Droid-NNet

IV. EXPERIMENT & RESULTS

A. Dataset specification
We performed all experiments on two datasets (Malgenome-215 & Drebin-215) of Android apps to evaluate Droid-NNet. The details of each dataset are shown in Table I. The Drebin-215 dataset is publicly available, and the Malgenome-215 dataset was collected from the supplementary section of [10].

The Malgenome-215 dataset has a total of 3,799 app samples, of which 2,539 are benign and 1,260 are malware samples from the Android Malware Genome Project [11]. The Drebin-215 dataset consists of 15,036 app samples, of which 9,476 are benign and the remaining 5,560 are malware from the Drebin project [12]. Both datasets contain 215 features.

Table I: Details of Datasets

Datasets        Number of apps   Number of benign apps   Number of malware apps   Number of features
Malgenome-215   3,799            2,539                   1,260                    215
Drebin-215      15,036           9,476                   5,560                    215

B. Model evaluation metrics
Both datasets used in this paper are unbalanced. The proportions of benign and malware samples in the Malgenome-215 dataset are approximately 67% and 33%, respectively. In the Drebin-215 dataset, the ratio of benign to malware samples is approximately 63%:37%. Therefore, accuracy alone is not an appropriate metric for assessing the performance of the models, and the following performance measurements are considered instead [10].

1. TPR (True Positive Rate / Recall): The proportion of correctly identified malware apps (TP) to the total number of malware apps (TP+FN). TP (True Positive) is the number of malware apps correctly predicted as malware, while FN (False Negative) is the number of malware apps misclassified as benign.

$$ \mathrm{TPR} = \frac{TP}{TP + FN} $$

2. FPR (False Positive Rate): The proportion of benign apps incorrectly classified as malware (FP) to the total number of benign apps (TN+FP). FP (False Positive) is the number of benign apps incorrectly predicted as malware, and TN (True Negative) is the number of benign apps correctly predicted as benign.

$$ \mathrm{FPR} = \frac{FP}{TN + FP} $$

3. Precision: The proportion of correctly identified malware apps (TP) to all the apps predicted as malware (TP+FP).

$$ \mathrm{Precision} = \frac{TP}{TP + FP} $$

4. F1 score: The harmonic mean of Precision and Recall. The F1 score is a better performance metric than accuracy for imbalanced data [10].

$$ F_1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} $$

The F-beta score is the weighted harmonic mean of precision and recall, where an F-beta value of 1 means a perfect score (perfect precision and recall) and 0 is the worst.

$$ F_\beta = (1 + \beta^2) \frac{\mathrm{Precision} \times \mathrm{Recall}}{(\beta^2 \times \mathrm{Precision}) + \mathrm{Recall}} $$

When $\beta = 1$, F-beta is the F1 score. The $\beta$ parameter determines the relative weight of precision and recall: $\beta < 1$ can be chosen to give more weight to precision, while $\beta > 1$ gives more weight to recall. Since we want to identify the maximum number of malware apps, we give more weight to recall and use $\beta > 1$. Hence, the F-beta score is considered the principal performance metric for evaluating the models in our experiments.

5. Wilcoxon rank-sum test: The Wilcoxon rank-sum test evaluates the statistical significance of the model performance. The test checks the null hypothesis that two sets of measurements are drawn from the same distribution, while the alternative hypothesis is that the measurements in one set are more likely to be higher than those in the other.

C. Experimental Design
We evaluated our model's performance by comparing it with the performance of the LR, SVM, and DT methods. Both datasets were randomly split into training and test data while maintaining the class ratio between benign and malware samples. The training data was used to train each of the models we experimented with, while the test data was used to evaluate their performance. To verify the consistency of the models, we evaluated each of them with 10-fold cross-validation.

The SVM, LR, and DT classifiers were applied to both datasets to compare their results with our proposed Droid-NNet. The algorithms were implemented using the Python scikit-learn library with the available hyperparameter options. The 'rbf' (radial basis function) kernel was chosen for SVM, the 'gini' index was chosen for DT, and the L2 penalty was chosen for the LR classifier.
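The evaluation metrics of Section IV-B can be computed directly from confusion-matrix counts. The following is a minimal Python sketch, assuming malware is the positive class and that every denominator is non-zero. The function names and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def tpr(tp, fn):
    """True positive rate (recall): share of malware apps that were detected."""
    return tp / (tp + fn)


def fpr(fp, tn):
    """False positive rate: share of benign apps wrongly flagged as malware."""
    return fp / (tn + fp)


def precision(tp, fp):
    """Share of apps flagged as malware that really are malware."""
    return tp / (tp + fp)


def f_beta(tp, fp, fn, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta > 1 favors recall)."""
    p = precision(tp, fp)
    r = tpr(tp, fn)
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


# Hypothetical test fold: 126 malware apps and 254 benign apps.
tp, fn, fp, tn = 120, 6, 4, 250

print("TPR      :", round(tpr(tp, fn), 6))
print("FPR      :", round(fpr(fp, tn), 6))
print("F1       :", round(f_beta(tp, fp, fn, beta=1), 6))
print("F-beta 10:", round(f_beta(tp, fp, fn, beta=10), 6))
```

With beta set to 10, as in the experiments, the F-beta score stays very close to the recall, so a missed malware app costs far more than a false alarm.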

Table II: Experimental results of different classifiers on Malgenome-215 dataset

Classifiers   F-beta     TPR        FPR
DT            0.978411   0.973810   0.019305
SVM           0.981564   0.961111   0.008280
LR            0.988859   0.976190   0.004850
Droid-NNet    0.992623   0.988095   0.005124

Table III: Statistical significance of Droid-NNet on Malgenome-215 dataset

Classifiers          Statistical significance
Droid-NNet vs. DT    0.0031971*
Droid-NNet vs. SVM   0.00407199*
Droid-NNet vs. LR    0.0881054

*Statistical significance considering 0.05 significance level

Our proposed method is a deep neural network. We used the 'ReLU' activation function in the hidden layer and the 'sigmoid' function in the output layer. 'Adam' and 'binary cross-entropy' were used as the optimizer and loss function, respectively. We implemented an early stopping method to stop training once the model performance stops improving on the test data. We selected the validation loss as the quantity to monitor for early stopping and set the minimum delta to 1e-4 (the minimum change in the monitored quantity that qualifies as an improvement) and the patience to 10 (the number of epochs with no improvement in the monitored quantity after which training is stopped). Mini-batch gradient descent was used, with a batch size of 64, to train the model. The initial learning rate was set to 0.001 with a decay of 1e-5 in every epoch. The L2 regularization technique was applied to the output of the hidden layer to prevent the network from overfitting, and the regularization parameter 'lambda' was set to 0.001. The 'beta' parameter in calculating the F-beta score was set to 10 to give more weight to recall so that the maximum number of malware apps can be identified. All the parameters and hyperparameters used in the model were optimized by grid search.
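To make the configuration concrete, the following pure-Python sketch runs one forward pass through a 215-25-1 network with ReLU and sigmoid activations and evaluates binary cross-entropy plus an L2 penalty on the hidden-layer output. It is an illustration under stated assumptions, not the authors' Keras implementation: the weight initialization range, the 0.5 decision threshold, the example feature vector, and all function names are hypothetical, and training (Adam, mini-batches, learning-rate decay) is omitted.

```python
import math
import random

N_INPUT, N_HIDDEN = 215, 25  # layer sizes stated in the paper
L2_LAMBDA = 0.001            # regularization parameter stated in the paper


def init_params(seed=0):
    """Random initial weights; the uniform range is an assumption."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUT)] for _ in range(N_HIDDEN)]
    b1 = [0.0] * N_HIDDEN
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(x, params):
    """Return (hidden activations, malware probability) for one feature vector."""
    w1, b1, w2, b2 = params
    # Hidden layer: weighted sum followed by ReLU.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    # Output layer: a single neuron with a sigmoid activation.
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    prob = 1.0 / (1.0 + math.exp(-z))
    return hidden, prob


def loss(x, y, params):
    """Binary cross-entropy plus an L2 penalty on the hidden-layer output."""
    hidden, prob = forward(x, params)
    eps = 1e-12  # guards against log(0)
    bce = -(y * math.log(prob + eps) + (1 - y) * math.log(1 - prob + eps))
    penalty = L2_LAMBDA * sum(h * h for h in hidden)
    return bce + penalty


def classify(x, params, threshold=0.5):
    """Threshold the output neuron: 1 = malware, 0 = benign (threshold assumed)."""
    return int(forward(x, params)[1] >= threshold)


params = init_params()
x = [1.0 if i % 7 == 0 else 0.0 for i in range(N_INPUT)]  # hypothetical feature vector
hidden, prob = forward(x, params)
print(len(hidden), round(prob, 4), classify(x, params), round(loss(x, 1, params), 4))
```

The L2 term here penalizes the hidden activations rather than the weights, which is one reading of "applied to the output of the hidden layer"; a weight penalty would be a different, equally common choice.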
The experiments were carried out on a Windows 10 machine with an Intel(R) Core(TM) i7-8565U CPU at 1.80 GHz, 16.0 GB of RAM, and an NVIDIA GeForce MX250 GPU (2 GB GDDR5). We implemented our experiments with the Keras framework in Python 3.7.
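The early-stopping rule used during training (validation loss monitored, minimum delta of 1e-4, patience of 10) can be illustrated without any framework. Keras exposes such a rule through its `EarlyStopping` callback with `monitor`, `min_delta`, and `patience` arguments; the plain-Python sketch below only mirrors the rule as described in the text and is not the authors' code. The function name and the loss sequences are hypothetical.

```python
def early_stopping_epoch(val_losses, min_delta=1e-4, patience=10):
    """Return the 1-based epoch at which training stops, or None if it never stops.

    An epoch counts as an improvement only if the validation loss drops by
    more than `min_delta` below the best value seen so far. Training stops
    after `patience` consecutive epochs without an improvement.
    """
    best = float("inf")
    wait = 0
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best - min_delta:
            best = val_loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical run: three improving epochs, then a plateau.
plateau = [0.50, 0.40, 0.35] + [0.35] * 10
print(early_stopping_epoch(plateau))

# Hypothetical run: the loss keeps falling by 0.005 per epoch for 100 epochs.
improving = [1.0 - 0.005 * i for i in range(100)]
print(early_stopping_epoch(improving))
```

In the first run the plateau lasts exactly ten epochs, so training stops at epoch 13; in the second run every epoch improves by more than the minimum delta, so the rule never triggers within the 100-epoch budget.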
D. Experimental results
We compared the results of Droid-NNet with the other three classifiers. The F-beta score was used to evaluate the models' performance. 10-fold cross-validation was performed for each of the experiments. The same configuration was applied to both the Malgenome-215 and Drebin-215 datasets to maintain consistency.

1) Performance evaluation on Malgenome-215 dataset: We trained our proposed network Droid-NNet for 100 epochs with early stopping. All the classifiers were trained on 90% of the data and tested on the remaining 10%. Table II shows the experimental results of the different methods on this dataset. Droid-NNet outperformed the other methods and achieved the highest F-beta score of 0.992623±0.007, whereas the second-highest F-beta score, 0.988859±0.006, was achieved by LR. Droid-NNet also achieved the highest TPR (0.988095); its FPR (0.005124) was lower than those of DT and SVM but marginally higher than that of LR (0.004850). Table III presents the statistical significance of the performance of the models. Droid-NNet demonstrates a statistically significantly better F-beta score than SVM and DT, though the superiority of Droid-NNet over LR was not statistically significant at the 0.05 significance level. The boxplot in Fig. 4 shows the distribution of the F-beta scores over the ten folds.

Fig. 4: Boxplot of F-beta score of ten folds on Malgenome-215 dataset

2) Performance evaluation on Drebin-215 dataset: The training and testing split ratio for all the classifiers was 90%:10%. The experimental results of applying our proposed network Droid-NNet and the other three algorithms to the Drebin-215 dataset are shown in Table IV. Droid-NNet outperformed all three other classifiers in terms of F-beta score, FPR, and TPR. From Table IV, Droid-NNet had the maximum F-beta score (0.988157±0.002), whilst the nearest F-beta score was 0.978311±0.002, obtained by the DT classifier. Fig. 5 presents the F-beta scores achieved in each of the ten folds by all the classifiers. The boxplot shows that Droid-NNet performed better than the other classifiers, with better consistency. Table V shows that Droid-NNet provides a statistically significantly higher F-beta score than the other three classifiers at the 0.05 significance level.

Table IV: Experimental results of different classifiers on Drebin-215 dataset

Classifiers   F-beta     TPR        FPR
DT            0.978311   0.973176   0.018679
SVM           0.972919   0.953373   0.015619
LR            0.977644   0.962733   0.013613
Droid-NNet    0.988157   0.979297   0.006648
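The significance values reported for the classifier comparisons come from a Wilcoxon rank-sum test over per-fold F-beta scores. The sketch below implements the two-sided test with the normal approximation in plain Python; whether the authors used this exact variant is an assumption, and the per-fold scores are hypothetical because the paper reports only summary values. One observation supports the assumption without proving it: when two sets of ten scores are completely separated, this approximation gives p ≈ 0.000157, which agrees with the value listed in Table V for the DT and SVM comparisons.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (z, p_value). Tied values receive average ranks; no tie or
    continuity correction is applied to the variance.
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n1, n2 = len(a), len(b)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of equal values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    expected = n1 * (n1 + n2 + 1) / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_a - expected) / std
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return z, p_value


# Hypothetical per-fold F-beta scores (ten folds each), fully separated.
droid_nnet = [0.990, 0.991, 0.989, 0.992, 0.988, 0.993, 0.987, 0.994, 0.986, 0.995]
baseline = [0.975, 0.976, 0.974, 0.977, 0.973, 0.978, 0.972, 0.979, 0.971, 0.980]

z, p = rank_sum_test(droid_nnet, baseline)
print(round(z, 4), p, p < 0.05)
```

With only ten folds per classifier, the normal approximation is coarse; an exact rank-sum test would be a reasonable alternative for samples this small.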

Table V: Statistical significance of Droid-NNet on Drebin-215 dataset

Classifiers          Statistical significance
Droid-NNet vs. DT    0.000157052*
Droid-NNet vs. SVM   0.000157052*
Droid-NNet vs. LR    0.000506541*

*Statistical significance considering 0.05 significance level

Fig. 5: Boxplot of F-beta score of ten folds on Drebin-215 dataset

V. CONCLUSION

Malware increasingly poses a serious security threat to users of Android smart devices. It is essential to develop an automatic malware detection solution to reduce the risks of malicious activities. In this paper, we proposed a neural network-based framework, Droid-NNet, for Android malware detection. We trained Droid-NNet with the L2 regularization technique, early stopping criteria, and the mini-batch gradient descent method. We performed all the experiments on two datasets (Malgenome-215 & Drebin-215) of Android apps to evaluate Droid-NNet. We evaluated Droid-NNet's performance by comparing it with the performance of the LR, SVM, and DT algorithms. The experimental results show that Droid-NNet outperformed the other methods and achieved the highest F-beta score on both datasets.

REFERENCES

[1] Kakavand, Mohsen & Dabbagh, Mohammad & Dehghantanha, Ali. (2018). Application of Machine Learning Algorithms for Android Malware Detection. 32-36. 10.1145/3293475.3293489.
[2] Kalash, M., Rochan, M., Mohammed, N., Bruce, N. D., Wang, Y., & Iqbal, F. (2018, February). Malware classification with deep convolutional neural networks. In 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS) (pp. 1-5). IEEE.
[3] Mujumdar, A., Masiwal, G., & Meshram, D. B. (2013). Analysis of signature-based and behavior-based anti-malware approaches. International Journal of Advanced Research in Computer Engineering and Technology (IJARCET), 2(6).
[4] Burguera, I., Zurutuza, U., & Nadjm-Tehrani, S. (2011, October). Crowdroid: behavior-based malware detection system for Android. In Proceedings of the 1st ACM workshop on Security and privacy in smartphones and mobile devices (pp. 15-26). ACM.
[5] Gavrilut, D., Cimpoesu, M., Anton, D. and Ciortuz, L. (2009) Malware Detection Using Machine Learning, Proceedings of the International Multiconference on Computer Science and Information Technology, 735-741.
[6] Afrin, R., Haddad, H., & Shahriar, H. (2019, July). Supervised and Unsupervised-Based Analytics of Intensive Care Unit Data. In 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC) (Vol. 2, pp. 417-422). IEEE.
[7] Liu, Z., Zeng, Y., Yan, Y., Zhang, P., & Wang, Y. (2017). Machine Learning for Analyzing Malware. Journal of Cyber Security and Mobility, 6(3), 227-244.
[8] Saracino, A., Sgandurra, D., Dini, G., & Martinelli, F. (2016). Madam: Effective and efficient behavior-based Android malware detection and prevention. IEEE Transactions on Dependable and Secure Computing, 15(1), 83-97.
[9] Yu, W., Zhang, H., Ge, L., & Hardy, R. (2013, December). On behavior-based detection of malware on Android platform. In 2013 IEEE global communications conference (GLOBECOM) (pp. 814-819). IEEE.
[10] Yerima, Suleiman Y., and Sakir Sezer. "Droidfusion: A novel multilevel classifier fusion approach for Android malware detection." IEEE transactions on cybernetics 49.2 (2018): 453-466.
[11] Y. Zhou and X. Jiang, “Dissecting Android malware: Characterization and evolution,” in Proc. IEEE Symp. Security Privacy (SP), San Francisco, CA, USA, May 2012, pp. 95–109.
[12] D. Arp, M. Spreitzenbarth, M. Hubner, H. Gascon, and K. Rieck, “Drebin: Efficient and explainable detection of Android malware in your pocket,” in Proc. 20th Annu. Netw. Distrib. Syst. Security Symp. (NDSS), San Diego, CA, USA, Feb. 2014, pp. 1–15.
[13] Ham, H. S., Kim, H. H., Kim, M. S., & Choi, M. J. (2014). Linear SVM-based Android malware detection for reliable IoT services. Journal of Applied Mathematics, 2014.
[14] Sewak, M., Sahay, S. K., & Rathore, H. (2018, June). Comparison of deep learning and the classical machine learning algorithm for the malware detection. In 2018 19th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) (pp. 293-296). IEEE.
[15] Saracino, A., Sgandurra, D., Dini, G., & Martinelli, F. (2016). Madam: Effective and efficient behavior-based Android malware detection and prevention. IEEE Transactions on Dependable and Secure Computing, 15(1), 83-97.
[16] Duc, N. V., & Giang, P. T. (2018, December). NADM: Neural Network for Android Detection Malware. In Proceedings of the Ninth International Symposium on Information and Communication Technology (pp. 449-455). ACM.
[17] Alauthaman, M., Aslam, N., Zhang, L., Alasem, R., & Hossain, M. A. (2018). A P2P Botnet detection scheme based on decision tree and adaptive multilayer neural networks. Neural Computing and Applications, 29(11), 991-1004.
