masum2019
Abstract— Android, the most dominant operating system (OS) for smart devices, has enjoyed immense popularity over the last few years. Due to its popularity and open characteristics, Android OS is becoming a tempting target for malicious apps, which can pose a serious security threat to financial institutions, businesses, and individuals. Traditional anti-malware systems do not suffice to combat newly created, sophisticated malware. Hence, there is an increasing need for automatic malware detection solutions to reduce the risks of malicious activities. In recent years, machine learning algorithms have shown promising results in classifying malware, but most of these methods are shallow learners such as Logistic Regression (LR). In this paper, we propose a deep learning framework, called Droid-NNet, for malware classification. In contrast to these shallow learners, Droid-NNet is a deep learner, and it outperforms existing cutting-edge machine learning methods. We performed all experiments on two datasets of Android apps (Malgenome-215 and Drebin-215) to evaluate Droid-NNet. The experimental results show the robustness and effectiveness of Droid-NNet.

Keywords—neural network, android malware, android security

I. INTRODUCTION
Malware is malicious software (e.g., viruses, ransomware, trojan horses, and spyware) that can damage devices or execute harmful actions on them [2]. Android is one of the most widely adopted OSs for smart devices such as phones, tablets, and other mobile devices. Due to its popularity and open characteristics, Android is prone to malware attacks, which can have devastating effects such as stealing information, corrupting files, and infecting an entire network of devices [1]. Therefore, malware poses a major security threat to financial institutions, businesses, and individuals.

The number of malware threats on Android-based smart devices is increasing exponentially, and newly created malware has become more sophisticated and appears in many variants. Hence, traditional malware detection techniques such as signature-based detection, heuristic detection, or behavior-based detection are not adequate to combat malicious software [2].

Machine learning algorithms have shown promising results in classifying Android malware. These algorithms can overcome the limitations of traditional detection methods and achieve high accuracy. Machine learning approaches such as Support Vector Machine (SVM), Logistic Regression (LR), and Decision Tree (DT) were previously proposed for malware detection [5].

Neural networks are currently widely used in many applications because of their capability to model highly non-linear systems and their flexibility in architecture design. In this paper, we propose a deep neural network framework named Droid-NNet for Android malware detection. Our contributions are: (1) we conduct a comprehensive assessment with a rigorous experimental setting to assess the performance of Droid-NNet on two publicly available real-world Android application datasets, and (2) Droid-NNet achieves a high weighted F-beta score, a high true positive rate, and a low false positive rate, which suggests that detecting Android malware with deep learning techniques is promising.

The rest of the paper is organized as follows. Section II introduces related work on Android malware detection. Section III describes the methodology of our proposed method, Droid-NNet, along with the three other classifiers implemented in this paper. The experimental setting and results are explained in Section IV. Finally, Section V concludes the paper.

II. BACKGROUND & RELATED WORK
Traditional detection techniques have been applied to classify Android malware. Signature-based detection is the most widely used anti-malware approach. It identifies a malware instance by searching an object for specified byte sequences (called signatures) and matching them against the known signatures of blacklisted malicious programs. This method is not effective against "zero-day attacks" because the system is built from known malware signatures [3]. A signature-based method that leverages signature matching algorithms was proposed for Android malware detection. An extension of the signature-based method combines anomaly-based and signature-based mechanisms; the combined approach achieved 96% accuracy in classifying malicious apps in experiments on three different datasets [8].

To overcome the limitations of signature-based detection, behavior-based malware detection was proposed [4]. The behavior-based technique analyzes the behavior of a program while it is executing and flags the program as malware if it does not behave normally. However, the method degrades system performance, requires more space, and generates many false positives and false negatives [3]. MADAM, a behavior-based Android malware detection method that simultaneously analyzes and correlates features at different levels, was proposed and achieved 95% accuracy in classifying malware [8].
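The signature-matching idea, and why it misses zero-day variants, can be illustrated with a few lines of Python. This is a toy sketch: the signature names and byte patterns are hypothetical, and matching is a plain substring search per signature, whereas a production scanner would use a multi-pattern matcher over far larger signature sets.

```python
# Hypothetical signature database: name -> byte sequence known to occur in a malware family.
SIGNATURES = {
    "Demo.SmsTrojan": bytes.fromhex("deadbeef0badf00d"),
    "Demo.Spyware": b"exfiltrate_contacts",
}


def scan(blob: bytes) -> list[str]:
    """Return the names of all known signatures that occur in the byte blob."""
    return [name for name, sig in SIGNATURES.items() if sig in blob]


# A sample containing a known signature is flagged.
known_sample = b"\x00\x01" + bytes.fromhex("deadbeef0badf00d") + b"\xff"
# A variant with a single changed byte no longer matches: a "zero-day" for this scanner.
variant = known_sample.replace(bytes.fromhex("0badf00d"), bytes.fromhex("0badf00e"))

print(scan(known_sample))  # ['Demo.SmsTrojan']
print(scan(variant))       # []
```

Because detection depends on an exact byte match, a one-byte change is enough to evade the scan, which is the weakness that motivates behavior-based and learning-based detection.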
Fig. 1: Hyperplane for an SVM trained with samples from two classes
calculate the error of the network, and the error is minimized by applying an optimization function during backpropagation.

In this paper, we propose a neural network framework named Droid-NNet. Fig. 3 shows the architecture of Droid-NNet, which is a neural network containing three layers: an input layer, a hidden layer, and an output layer. A threshold is applied to the output layer to classify instances as malware or benign apps. The input layer contains 215 neurons (the number of features per sample), the hidden layer contains 25 neurons, and the output layer has only one neuron, since the problem is binary classification. We applied binary cross-entropy as the loss function and the Adaptive Moment Estimation (Adam) optimizer for calculating the error and updating the parameters.

IV. EXPERIMENT & RESULTS
A. Dataset specification
We performed all experiments on two datasets of Android apps (Malgenome-215 and Drebin-215) to evaluate Droid-NNet. The details of each dataset are shown in Table I. The Drebin-215 dataset is publicly available, and the Malgenome-215 dataset was collected from the supplementary section of [10]. The Malgenome-215 dataset has a total of 3,799 app samples from the Android Malware Genome Project [11], of which 2,539 are benign and 1,260 are malware. The Drebin-215 dataset consists of 15,036 app samples from the Drebin project [12], of which 9,476 are benign and the remaining 5,560 are malware. Both datasets contain 215 features.

Table I: Details of Datasets
Dataset          No. of apps   No. of benign apps   No. of malware apps   No. of features
Malgenome-215    3,799         2,539                1,260                 215
Drebin-215       15,036        9,476                5,560                 215

B. Model evaluation metrics
Both datasets used in this paper are imbalanced. The proportions of benign and malware samples in the Malgenome-215 dataset are approximately 67% and 33%, respectively. In the Drebin-215 dataset, the ratio of benign to malware samples is approximately 63%:37%. Therefore, accuracy is not an appropriate metric for assessing the performance of the models, and the following performance measurements are used instead [10].
1. TPR (True Positive Rate / Recall): The proportion of correctly identified malware apps (TP) to the total number of malware apps (TP+FN). TP (True Positive) is the number of correct predictions of malware, while FN (False Negative) is the number of malware apps misclassified as benign.
$$\mathrm{TPR} = \frac{TP}{TP + FN}$$
2. FPR (False Positive Rate): The proportion of benign apps incorrectly classified as malware (FP) to the total number of benign apps (TN+FP). FP (False Positive) is the number of incorrect predictions on benign samples, and TN (True Negative) is the number of correct predictions on benign samples.
$$\mathrm{FPR} = \frac{FP}{TN + FP}$$
3. Precision: The proportion of correctly identified malware apps to all apps predicted as malware.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
4. F1 score: The harmonic mean of Precision and Recall. The F1 score is a better performance metric than accuracy for imbalanced data [10].
$$F_1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
The F-beta score is the weighted harmonic mean of precision and recall, where an F-beta value of 1 means a perfect score (perfect precision and recall) and 0 is the worst.
$$F_\beta = (1 + \beta^2) \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{(\beta^2 \times \mathrm{Precision}) + \mathrm{Recall}}$$
When β = 1, the F-beta score is the F1 score. The β parameter determines the relative weight of precision and recall: β < 1 gives more weight to precision, while β > 1 gives more weight to recall. Since we want to identify as many malware apps as possible, we give more weight to recall and use β > 1. Hence, the F-beta score is considered the principal performance metric for evaluating the models in our experiments.
5. Wilcoxon rank-sum test: The Wilcoxon rank-sum test evaluates the statistical significance of differences in model performance. The test checks the null hypothesis that two sets of measurements are drawn from the same distribution, against the alternative hypothesis that the measurements in one set are more likely to be higher than those in the other.

C. Experimental Design
We evaluated our model by comparing its performance with that of the LR, SVM, and DT methods. Both datasets were randomly split into training and test data while maintaining the class ratio between benign and malware samples. The training data was used to train each of the models, while the test data was used to evaluate their performance. To verify the consistency of the models, we evaluated each of them with 10-fold cross-validation.

The SVM, LR, and DT classifiers were applied to both datasets to compare their results with those of our proposed Droid-NNet. These algorithms were implemented using the Python scikit-learn library with its available hyperparameter options: the 'rbf' (Radial Basis Function) kernel was chosen for SVM, the 'gini' index for DT, and the L2 penalty for the LR classifier.
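The baseline protocol above can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the authors' script: synthetic data from `make_classification` (with roughly the 63:37 class ratio of Drebin-215) stands in for the real 215-feature datasets, β = 2 is a hypothetical recall-weighted choice because the text only says β > 1, and every hyperparameter not named above is left at its scikit-learn default. `StratifiedKFold` is one way to keep the benign/malware ratio the same in every fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a 215-feature dataset with a ~63:37 benign:malware ratio (label 1 = malware).
X, y = make_classification(n_samples=1000, n_features=215, n_informative=20,
                           weights=[0.63, 0.37], random_state=0)

BETA = 2  # hypothetical; the paper only states beta > 1 (recall-weighted)
f_beta = make_scorer(fbeta_score, beta=BETA)

# Only the hyperparameters named in the text are set explicitly.
baselines = {
    "SVM": SVC(kernel="rbf"),
    "DT": DecisionTreeClassifier(criterion="gini", random_state=0),
    "LR": LogisticRegression(penalty="l2", max_iter=1000),
}

# Stratified folds preserve the class ratio in each of the 10 folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

fold_scores = {}
for name, model in baselines.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring=f_beta)
    fold_scores[name] = scores
    print(f"{name}: mean F-beta = {scores.mean():.4f} over {len(scores)} folds")
```

The ten per-fold scores collected in `fold_scores` are the kind of samples that a rank-sum test compares between two models.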
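The evaluation metrics of Section IV-B are simple ratios of confusion-matrix counts. The sketch below computes them directly, treating malware as the positive class as in the definitions above; the counts are made-up illustrative values, and β = 2 is a hypothetical recall-weighted choice.

```python
def rates(tp: int, fp: int, tn: int, fn: int, beta: float = 2.0) -> dict[str, float]:
    """TPR, FPR, precision, F1 and F-beta, with malware as the positive class."""
    tpr = tp / (tp + fn)          # recall: share of malware apps that are detected
    fpr = fp / (tn + fp)          # share of benign apps flagged as malware
    precision = tp / (tp + fp)    # share of malware predictions that are correct
    f1 = 2 * precision * tpr / (precision + tpr)
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * tpr / (b2 * precision + tpr)
    return {"TPR": tpr, "FPR": fpr, "Precision": precision, "F1": f1, "F-beta": f_beta}


# Illustrative counts only (not taken from the paper's experiments).
m = rates(tp=80, fp=5, tn=195, fn=20)
print(round(m["F1"], 4), round(m["F-beta"], 4))  # 0.8649 0.8247
```

With these counts, recall (0.80) is lower than precision (about 0.94), so the recall-weighted F2 falls below F1: missed malware is penalized more heavily than false alarms, which is the behavior wanted from the principal metric. Setting `beta=1.0` reproduces F1.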
Our proposed method is a deep neural network. We used the ReLU activation function in the hidden layer and the sigmoid function in the output layer. Adam and binary cross-entropy were used as the optimizer and loss function, respectively. We implemented early stopping to halt training once the model performance stops improving on the test data. We selected the validation loss as the quantity monitored for early stopping, set the minimum delta (the minimum change in the monitored quantity that qualifies as an improvement) to 1e-4, and set the patience (the number of epochs with no improvement after which training is stopped) to 10. Mini-batch gradient descent was used with a batch size of 64. The initial learning rate was set to 0.001, with a decay of 1e-5 in every epoch. The L2 regularization technique was applied to the output of the hidden layer to prevent the network from overfitting.

Table II: Experimental results of different classifiers on Malgenome-215 dataset
Classifiers   F-beta     TPR        FPR
DT            0.978411   0.973810   0.019305
SVM           0.981564   0.961111   0.008280
LR            0.988859   0.976190   0.004850
Droid-NNet    0.992623   0.988095   0.005124

Table III: Statistical significance of Droid-NNet on Malgenome-215 dataset
Classifiers          p-value (Wilcoxon rank-sum)
Droid-NNet vs. DT    0.0031971*
Droid-NNet vs. SVM   0.00407199*
Droid-NNet vs. LR    0.0881054
*Statistically significant at the 0.05 significance level
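The training configuration above can be sketched without a deep learning framework, which makes each step explicit. The NumPy code below is a from-scratch approximation, not the authors' implementation (the paper does not name its framework): it builds the 215-25-1 network with ReLU and sigmoid units, minimizes binary cross-entropy with Adam (initial learning rate 0.001) on mini-batches of 64, adds an L2 penalty on the hidden-layer output, and stops early on validation loss with a minimum delta of 1e-4 and a patience of 10. The inverse-time form of the 1e-5 per-epoch decay, the penalty weight `lam = 1e-4`, He initialization, restoring the best weights, the 0.5 threshold, and the synthetic data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_params(n_in=215, n_hidden=25):
    # He initialization for the ReLU layer (an assumption; the paper does not say).
    return {"W1": rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden)),
            "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), (n_hidden, 1)),
            "b2": np.zeros(1)}


def forward(p, X):
    h = np.maximum(0.0, X @ p["W1"] + p["b1"])            # hidden layer: 25 ReLU units
    out = 1.0 / (1.0 + np.exp(-(h @ p["W2"] + p["b2"])))  # output layer: 1 sigmoid unit
    return h, out.ravel()


def bce(y, out, eps=1e-7):
    out = np.clip(out, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(out) + (1.0 - y) * np.log(1.0 - out)))


def grads(p, X, y, lam):
    n = len(y)
    h, out = forward(p, X)
    d_logit = ((out - y) / n)[:, None]              # gradient of mean BCE w.r.t. the logit
    g = {"W2": h.T @ d_logit, "b2": d_logit.sum(axis=0)}
    d_h = d_logit @ p["W2"].T + 2.0 * lam * h / n   # plus L2 penalty lam * sum(h**2) / n
    d_h[h <= 0.0] = 0.0                             # ReLU derivative
    g["W1"], g["b1"] = X.T @ d_h, d_h.sum(axis=0)
    return g


def train(X_tr, y_tr, X_val, y_val, lr0=1e-3, decay=1e-5, lam=1e-4, batch=64,
          max_epochs=200, min_delta=1e-4, patience=10):
    p = init_params(X_tr.shape[1])
    mom = {k: np.zeros_like(w) for k, w in p.items()}
    vel = {k: np.zeros_like(w) for k, w in p.items()}
    beta1, beta2, eps, step = 0.9, 0.999, 1e-8, 0
    best_loss, best_p, wait = np.inf, None, 0
    for epoch in range(max_epochs):
        lr = lr0 / (1.0 + decay * epoch)            # assumed inverse-time decay per epoch
        order = rng.permutation(len(y_tr))
        for start in range(0, len(order), batch):   # mini-batch gradient descent
            idx = order[start:start + batch]
            g = grads(p, X_tr[idx], y_tr[idx], lam)
            step += 1
            for k in p:                             # Adam update with bias correction
                mom[k] = beta1 * mom[k] + (1.0 - beta1) * g[k]
                vel[k] = beta2 * vel[k] + (1.0 - beta2) * g[k] ** 2
                m_hat = mom[k] / (1.0 - beta1 ** step)
                v_hat = vel[k] / (1.0 - beta2 ** step)
                p[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
        val_loss = bce(y_val, forward(p, X_val)[1])  # monitored quantity (BCE only)
        if val_loss < best_loss - min_delta:         # counts as an improvement
            best_loss, best_p, wait = val_loss, {k: w.copy() for k, w in p.items()}, 0
        else:
            wait += 1
            if wait >= patience:                     # early stopping
                break
    return best_p                                    # weights from the best epoch


def predict(p, X, threshold=0.5):
    return (forward(p, X)[1] >= threshold).astype(int)  # 1 = malware, 0 = benign


# Synthetic stand-in data: 215 binary features; the label depends on features 0-2 only.
X = rng.integers(0, 2, size=(2000, 215)).astype(float)
y = (X[:, :3].sum(axis=1) >= 2).astype(float)
params = train(X[:1400], y[:1400], X[1400:1700], y[1400:1700])
accuracy = float(np.mean(predict(params, X[1700:]) == y[1700:]))
print(f"held-out accuracy: {accuracy:.3f}")
```

The `min_delta` and `patience` logic follows the semantics of the Keras `EarlyStopping` callback, whose parameter names match the terms used in the text; with a framework such as Keras, the penalty on the hidden-layer output would correspond to an activity regularizer on that layer.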
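The p-values in Tables III and V come from the Wilcoxon rank-sum test applied to per-fold F-beta scores. Below is a standard-library sketch of the two-sided large-sample version of the test (normal approximation, average ranks for ties, no tie or continuity correction). This mirrors the computation documented for `scipy.stats.ranksums`, but the paper does not state which implementation was used, and the fold scores in the example are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation; returns (z, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                        # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2              # expected rank sum under the null
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))       # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical per-fold F-beta scores: every Droid-NNet fold beats every baseline fold.
droid = [0.990, 0.991, 0.992, 0.993, 0.994, 0.995, 0.996, 0.997, 0.998, 0.999]
baseline = [0.970, 0.971, 0.972, 0.973, 0.974, 0.975, 0.976, 0.977, 0.978, 0.979]
z, p = rank_sum_test(droid, baseline)
print(f"z = {z:.4f}, p = {p:.6g}")
```

Because the test depends only on ranks, complete separation of two sets of ten folds always gives the same, smallest possible p of about 1.57e-4 (z = 50/sqrt(175), about 3.78). That equals the value reported for both DT and SVM in Table V, which is consistent with, though not proof of, every Droid-NNet fold outscoring every fold of those two baselines on Drebin-215.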
Table V: Statistical significance of Droid-NNet on Drebin-215 dataset
Classifiers          p-value (Wilcoxon rank-sum)
Droid-NNet vs. DT    0.000157052*
Droid-NNet vs. SVM   0.000157052*
Droid-NNet vs. LR    0.000506541*
*Statistically significant at the 0.05 significance level

Fig. 5: Boxplot of F-beta scores over ten folds on the Drebin-215 dataset

V. CONCLUSION
Malware increasingly poses a serious security threat to users of Android smart devices. It is essential to develop automatic malware detection solutions to reduce the risks of malicious activities. In this paper, we proposed a neural network-based framework, Droid-NNet, for Android malware detection. We trained Droid-NNet with the L2 regularization technique, an early stopping criterion, and mini-batch gradient descent. We performed all experiments on two datasets of Android apps (Malgenome-215 and Drebin-215) to evaluate Droid-NNet, and we compared its performance with that of the LR, SVM, and DT algorithms. The experimental results show that Droid-NNet outperformed the other methods and achieved the highest F-beta score on both datasets.

REFERENCES
[1] Kakavand, M., Dabbagh, M., & Dehghantanha, A. (2018). Application of machine learning algorithms for Android malware detection (pp. 32-36). doi:10.1145/3293475.3293489.
[2] Kalash, M., Rochan, M., Mohammed, N., Bruce, N. D., Wang, Y., & Iqbal, F. (2018, February). Malware classification with deep convolutional neural networks. In 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS) (pp. 1-5). IEEE.
[3] Mujumdar, A., Masiwal, G., & Meshram, D. B. (2013). Analysis of signature-based and behavior-based anti-malware approaches. International Journal of Advanced Research in Computer Engineering and Technology (IJARCET), 2(6).
[4] Burguera, I., Zurutuza, U., & Nadjm-Tehrani, S. (2011, October). Crowdroid: Behavior-based malware detection system for Android. In Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices (pp. 15-26). ACM.
[5] Gavrilut, D., Cimpoesu, M., Anton, D., & Ciortuz, L. (2009). Malware detection using machine learning. In Proceedings of the International Multiconference on Computer Science and Information Technology (pp. 735-741).
[6] Afrin, R., Haddad, H., & Shahriar, H. (2019, July). Supervised and unsupervised-based analytics of intensive care unit data. In 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC) (Vol. 2, pp. 417-422). IEEE.
[7] Liu, Z., Zeng, Y., Yan, Y., Zhang, P., & Wang, Y. (2017). Machine learning for analyzing malware. Journal of Cyber Security and Mobility, 6(3), 227-244.
[8] Saracino, A., Sgandurra, D., Dini, G., & Martinelli, F. (2016). MADAM: Effective and efficient behavior-based Android malware detection and prevention. IEEE Transactions on Dependable and Secure Computing, 15(1), 83-97.
[9] Yu, W., Zhang, H., Ge, L., & Hardy, R. (2013, December). On behavior-based detection of malware on Android platform. In 2013 IEEE Global Communications Conference (GLOBECOM) (pp. 814-819). IEEE.
[10] Yerima, S. Y., & Sezer, S. (2018). DroidFusion: A novel multilevel classifier fusion approach for Android malware detection. IEEE Transactions on Cybernetics, 49(2), 453-466.
[11] Zhou, Y., & Jiang, X. (2012, May). Dissecting Android malware: Characterization and evolution. In Proceedings of the IEEE Symposium on Security and Privacy (SP) (pp. 95-109). San Francisco, CA, USA.
[12] Arp, D., Spreitzenbarth, M., Hübner, M., Gascon, H., & Rieck, K. (2014, February). Drebin: Effective and explainable detection of Android malware in your pocket. In Proceedings of the 20th Annual Network and Distributed System Security Symposium (NDSS) (pp. 1-15). San Diego, CA, USA.
[13] Ham, H. S., Kim, H. H., Kim, M. S., & Choi, M. J. (2014). Linear SVM-based Android malware detection for reliable IoT services. Journal of Applied Mathematics, 2014.
[14] Sewak, M., Sahay, S. K., & Rathore, H. (2018, June). Comparison of deep learning and the classical machine learning algorithm for the malware detection. In 2018 19th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) (pp. 293-296). IEEE.
[15] Saracino, A., Sgandurra, D., Dini, G., & Martinelli, F. (2016). MADAM: Effective and efficient behavior-based Android malware detection and prevention. IEEE Transactions on Dependable and Secure Computing, 15(1), 83-97.
[16] Duc, N. V., & Giang, P. T. (2018, December). NADM: Neural network for Android detection malware. In Proceedings of the Ninth International Symposium on Information and Communication Technology (pp. 449-455). ACM.
[17] Alauthaman, M., Aslam, N., Zhang, L., Alasem, R., & Hossain, M. A. (2018). A P2P botnet detection scheme based on decision tree and adaptive multilayer neural networks. Neural Computing and Applications, 29(11), 991-1004.