Android MLAlg
Abstract - The number of smartphone users, mainly Android users, has been increasing rapidly over the years. As the Android platform grows, malware detection has become its main concern. Among the various existing approaches to detecting malware, machine learning-based algorithms have achieved high accuracy. Therefore, Android devices must be provided with the ability to detect malware using machine learning algorithms. Different researchers have proposed machine learning systems that detect malware using algorithms such as SVM, NB, and DNN. This paper surveys the state of the art in Android malware detection techniques, focusing on machine learning-based classifiers that detect malicious software on Android devices.

Index Terms - Machine Learning, Android OS, Android Security, Malware Detection.

I. INTRODUCTION

The usage of smartphones has been increasing rapidly over the years. More than 408 million devices were sold in the fourth quarter of 2017, a 174% increase over the 149 million devices sold in the same period of 2011 [1] [2]. In particular, the number of Android users increased by 1639.15%, from 76 million in 2011 to 1.3 billion in 2017 [1] [2]. Malware authors have exploited this notable growth to affect and harm a huge number of users.

As the Android platform grows, malware detection has become its main concern. There are two main malware detection approaches: machine learning-based and non-machine learning-based. Malware detection systems based on machine learning employ different machine learning algorithms in the detection process. Non-machine learning-based detectors instead use signatures, permissions, or dynamic taint analysis to detect malicious files. Given the huge volume of malware produced nowadays, non-machine learning-based Android malware detection approaches are time-consuming and cannot detect unseen malware. Thus, Android devices must be provided with the ability to detect malware using machine learning algorithms. Machine learning algorithms have been increasingly applied in security in response to the growing unpredictability and sophistication of modern malware [3].

Among the various existing approaches to detecting malware, machine learning algorithms and techniques have achieved high accuracy [4]. Different researchers have proposed different frameworks to detect malware. These frameworks are built on machine learning algorithms such as SVM, Naive Bayes, Perceptron, and Deep Neural Network algorithms.

This paper surveys the state of the art on Android malware and malware detection techniques, focusing on the machine learning-based algorithms used to detect malware on Android devices. The rest of this paper is organized as follows. Section 2 introduces Android malware. Section 3 discusses the machine learning-based techniques used to detect malware. Section 4 presents a comparison of all existing machine learning malware detection techniques. Section 5 discusses the comparison findings. Section 6 concludes this paper by highlighting the research directions. Finally, future work is discussed briefly in Section 7.

II. ANDROID MALWARE

Malware authors have exploited the notable growth of Android users to affect and harm a huge number of users. Malware (malicious software) in general aims to steal sensitive and personal data stored in mobile devices. It does this by exploiting device vulnerabilities and by luring users into installing applications that allow the malware author to gain unauthorized root access to the infected device. Malware attacks can be Bluetooth, SMS, GPS, phone jail-breaking, and premium rate-based attacks [15].

In 2017, Nokia published the “Threat Intelligence Report” [16] [17], which analyses malware behaviour in network communications. The study showed that 72% of network infections target smartphones in general. Figure 1 illustrates the percentage breakdown of infected devices in 2017.

Figure 1: Devices Breakdowns Percentages in 2017 [17].
As illustrated, the majority of infections (69%) target Android devices specifically [16] [17].

Enck et al. [18] characterized the security of applications in the Android Market. Many Android developers fail to use Application Program Interfaces (APIs) securely, and as a result, widespread misuse of privacy-sensitive information was found across different applications. In addition, some of Android's built-in security features are insufficient and cannot provide enough protection and privacy for sensitive information. Consequently, even non-malicious applications can accidentally expose sensitive information [18].

Once the user installs an application that contains malicious code, the malicious functionality runs in the background while the user is using the application. For example, if the user installs a malicious free game, the malicious functionality starts sending SMS messages and stealing money from the user's bank account as soon as the user starts playing [19].

Hiding malware in apps on the Google Play Store has become a common way to target a huge number of users [20]. Android malware continues to evolve with more sophisticated capabilities. There are already well-documented cases of Android malware, such as ExpensiveWall, which was embedded in at least 50 free Android applications on the official Play Store. These applications were downloaded by 4.2 million users before Google removed them from the store [20].

2017 was one of the worst years in the Android cyber world. It witnessed a series of severe malware attacks such as ExpensiveWall, HummingBad, FalseGuide, and Judy [20] [21].

Early in 2017, researchers discovered a new modified version of the “HummingBad” malware hidden in more than 20 apps on the Google Play Store, which were downloaded by over 12 million users [20]. In April 2017, over 40 apps hiding the “FalseGuide” malware were spotted on the store, and 2 million Android users were infected. In May 2017, researchers found 41 apps on the store hiding the “Judy” malware, through which 36.5 million Android users were infected with malicious ad-click software [20]. In June 2017, more than 800 “Xavier-laden” apps were found on the store, and they were downloaded by millions of users. In the same period, researchers also found the first “code injecting rooting malware” on the store [20].

2018 began slightly better, with malware being detected before it could cause serious damage. Researchers discovered a new malware called “AdultSwine” embedded in around 60 games on the Google Play Store. It is capable of stealing banking credentials. It also pops up sexual content, even in games certified as appropriate for children. The malicious games were downloaded 3 to 7 million times, which is considered a low number compared to the 36.5 million devices infected by Judy [21]. Hence, malware detection is the main concern as the Android platform grows. Section 3 surveys different machine learning-based malware detection techniques that aim to reduce the serious damage caused by malware.

2019 Sixth International Conference on Software Defined Systems (SDS)

III. MACHINE LEARNING BASED MALWARE DETECTION SYSTEMS

Different researchers have demonstrated the power of machine learning in detecting malware. Hence, this section reviews different machine learning-based malware detection systems, classified by the algorithm they use, such as SVM, NB, Perceptron, and DNN.

Vaishanav L. et al. [5] summarized machine learning-based Android malware detection techniques, discussing and evaluating static, dynamic, and hybrid analysis-based approaches. The presented approaches analyse the features of malicious applications and use them to detect and classify unknown malicious applications [5]. Vaishanav L. et al. focused on four classifiers [5]:
- “System for automatically Training and Evaluating Android Malware (STREAM)”
- “Multi-Level Anomaly Detector for Android Malware (MADAM)”
- “Host-based Anomaly Detection System (HOSBAD)”
- “Manifest Analysis for Malware detection in Android (MAMA)”

The STREAM classifier provides an effective method for profiling malware and rapidly training machine learning classifiers. It can run on a single server or on distributed servers. It manages the applications, drives the feature collection, trains the classifier, and evaluates its performance. The STREAM framework provides good accuracy and scalability [5].

MADAM monitors user interaction and running apps by retrieving five groups of features at the kernel, application, user, and package levels. It applies either an anomaly-based or a signature-based approach, which considers behavioural patterns derived from known malware misbehaviours. It is designed to detect malicious behavioural patterns extracted from several malware categories. This multi-level behavioural analysis gives MADAM the ability to detect the misbehaviours of almost all malicious applications. MADAM requires little user interaction and does not affect the user experience. It is therefore considered an efficient and usable classifier [5].

The HOSBAD system is designed to detect Android malware propagated via SMS messages and calls. It integrates data mining and supervised machine learning techniques. First, it uses data mining techniques to monitor and extract application data at the application layer. A K-NN supervised machine learning classifier then uses the extracted data to detect malware infections by differentiating between normal and malicious applications [5].

The MAMA technique detects executable malicious Android applications. It analyses the Manifest file of each Android application with the “Android Asset Packaging” Tool to extract the application's features and permissions. The extracted features are used to build a supervised machine learning classifier that detects malicious applications. MAMA is considered a new method of representing Android applications based on their permissions and the features extracted from the Manifest file [5].

Figure 2 shows the taxonomy of machine learning-based malware detection techniques. The taxonomy is classified by the algorithm used.
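MAMA [5] represents each application by the permissions declared in its manifest. The permission-driven systems [8] and [9] follow a similar idea. The sketch below shows one way such a representation can become a fixed-length feature vector that any of the classifiers in this section could consume. It assumes the binary AndroidManifest.xml inside the APK has already been decoded to plain-text XML; the decoding step itself is not shown. The four-entry permission vocabulary and the function names are hypothetical choices for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace that ElementTree expands the android: attribute prefix into.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical feature vocabulary; a real system would derive it from its training corpus.
PERMISSION_VOCAB = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.RECEIVE_BOOT_COMPLETED",
]


def requested_permissions(manifest_xml: str) -> set[str]:
    """Return the permission names declared via <uses-permission> in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    return {
        elem.get(ANDROID_NS + "name")
        for elem in root.iter("uses-permission")
        if elem.get(ANDROID_NS + "name") is not None
    }


def permission_vector(manifest_xml: str, vocab: list[str] = PERMISSION_VOCAB) -> list[int]:
    """Encode the requested permissions as a 0/1 vector aligned with vocab."""
    perms = requested_permissions(manifest_xml)
    return [1 if p in perms else 0 for p in vocab]


EXAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.game">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <application android:label="Free Game"/>
</manifest>"""

print(permission_vector(EXAMPLE_MANIFEST))  # [1, 1, 0, 0]
```

A binary encoding like this is what lets permission sets be compared by kernels over bit vectors, as in [9].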
Different algorithms have been used alongside the Naive Bayes algorithm in three techniques: the “Parallelizing Machine Learning-Based Classification Malware Detection Technique” [10], the “Data Mining Framework” [11], and the “Mobile Security Platform” [12]. However, to avoid an overly dense taxonomy, Figure 2 shows only the NB algorithm for these techniques.

Most of the surveyed machine learning-based malware detection techniques have focused on Support Vector Machine and Naive Bayes algorithms [7-9] [10-12]. Only a few studies have used other algorithms, such as Deep Neural Networks and Perceptron algorithms [13] [14].

A. Artificial Intelligence Proposed System.

Brown J. et al. [6] introduced a modified version of the Artificial Immune System (AIS) called the “Multiple-Detector Set Artificial Immune System (mAIS)”. AIS produces a single detector set for “non-self” detection. mAIS instead produces two independent detector sets, one for “non-self” and one for “self” detection [6]. The “non-self” detector set detects non-self app instances, and the other set detects “self” app instances [6]. When the detector sets are generated, the “non-self” detector set is trained in the same way as in AIS. The outputs of both detector sets are used to classify and detect Android malware. The classification technique uses an “exclusive or” model based on the proportion of fired detectors per set. The mAIS approach achieved an accuracy of 88.33%, with a true positive rate of 76.67% and a false positive rate of 0.00% [6].

mAIS was extended further to eliminate redundant and irrelevant features by adding two variants: Genetic and Evolutionary Feature Selection (GEFeS) and the Split Detector Method (SDM). Adding GEFeS and SDM to mAIS simultaneously achieved the best results. It improved the accuracy to 93.33%, with a true positive rate of 86.67% and a false positive rate of 0.00% [6].

B. Support Vector Machine Algorithm Based Systems.

Wen L. et al. [7] proposed a “machine learning-based lightweight system” based on the Support Vector Machine algorithm.
The system identifies Android malware by extracting features through static and dynamic analysis. It contains two main parts, a client and a server.

On the client side, a user interface alerts users when malware is found. The system computes the MD5 value of each installed application and compares it with the MD5 values of known malicious applications stored in an SQLite database. If a malicious application is identified, the system alerts the user and recommends deleting the application. Otherwise, the calculated MD5 of the installed application is submitted to the server [7].

On the server side, the application features are extracted using static and dynamic analysis. The static features are “permission, intent, uses-feature, application and API”, extracted using a decoder based on the “Androguard” tool. The dynamic features are “CPU consumption, battery consumption, number of running processes and number of short messages”, extracted using the “DroidBox” tool [7]. All extracted features are sent to the feature selection module, which uses PCA-RELIEF to remove redundant features and select the key features. After the key features are selected, the classification model is built with the SVM algorithm. The SVM model then evaluates unfamiliar or unseen Android applications by classifying each one as malware or benign [7].

The SVM model was built using 80% of the collected samples as training data and 20% as testing data. The collected data contains 2000 Android applications, half benign and half malicious. The benign samples were collected from the Google Play store using “crawler” technology. The malicious applications were taken from the “Drebin Project” [22] and the “Android Malware Genome Project” [23].

Experimental results show that the PCA-RELIEF feature selection algorithm provides an effective method for detecting Android malware. The system achieved 95% accuracy, a 94.7% true positive rate, and a 13.3% false positive rate [7].

Gunalakshmii S. and Ezhumalai P. [8] proposed a machine learning-based keylogger detector system based on the Support Vector Machine algorithm. It combines three components: the Application Permissions Gatherer, the Permissions Analyzer, and the Keyloggers Detector [8].

The first component gathers the permissions of the applications using the Package Manager API and stores the listed permissions in an SQLite database. The API class retrieves various information related to a specific installed application package. The second component, the permission analyser, uses the SVM algorithm in the WEKA software to analyse the different permissions and recognize their patterns. Using SVM, the system is able to identify malicious applications. The last component is the keylogger detector. It detects keylogger applications and recommends that the user disable them to mitigate the risk of losing sensitive information [8].

Sahs J. and Khan L. [9] presented a novel machine learning-based system to detect Android malware. It uses “Androguard”, an open source project, to extract features from Android application packages (APKs). The dataset contains 91 malicious and 2081 benign applications. Because the majority of the dataset is benign, a one-class Support Vector Machine classifier was trained on benign applications only, using the Scikit-learn framework. A random subset of the training set was chosen to perform k-fold cross-validation. The validation step was run four times and the average was taken as the result [9].

The proposed system applies the kernel concept, with kernels chosen to satisfy the Mercer condition (positive definiteness). The features come from several kinds of feature space: binary vectors, string features, labelled directed graphs, and unordered sets of arbitrary cardinality. Accordingly, different kernels were applied: a kernel over binary vectors, a kernel over string features, a kernel over graphs, a kernel over sets, a kernel over non-standard permissions, and a kernel over applications [9].

The graph kernel showed unusual behaviour: its false negative rate is higher than its true negative rate. The bit-vector kernel, on the other hand, showed good results compared with all kernels combined. It divides the feature space into two segments: applications that request some permissions required by malware, and applications that do not request any such permission. It correctly classifies applications that request the same permissions as malware. In general, the proposed system has shown promising results with a very low false negative rate. Sahs J. et al. mention a number of possible improvements at the end of their paper [9].

C. Naïve Bayes Algorithm Based Systems.

In [10], Yerima, S. Y. et al. proposed and investigated a parallel machine learning-based classification technique for malware detection, using real malware and benign application samples. The developed detector is a parallel combination of heterogeneous classifiers: Decision Tree, Simple Logistic, Naïve Bayes, PART, and RIDOR. Combining different machine learning algorithms leverages the efficacy of each algorithm and improves detection accuracy. Because the classifiers have different characteristics, their strengths can be harnessed to enhance Android malware detection and to make white box analysis faster [10]. The proposed detector was evaluated using 10-fold cross-validation, a popular technique for evaluating ML performance. The malicious and benign application samples were partitioned into 10 equal, non-overlapping parts.

When evaluated individually, DT, NB, SL, RIDOR, and PART achieved accuracies of 95.4%, 86.7%, 93.2%, 95%, and 96.3% respectively. For the combined classification approach, four combination schemes were considered:
- average of classifiers' probabilities
- product of classifiers' probabilities
- maximum of classifiers' probabilities
- majority vote

In the parallel classification approach, these schemes achieved accuracies of 96.3%, 97.2%, 95.2%, and 96.3% respectively. The best accuracy was therefore obtained by the product of classifiers' probabilities scheme [10].

Schultz M. G. et al. [11] presented a data mining framework to detect new and unpredictable malicious executables. Three data mining algorithms were used to generate classifiers: RIPPER, Naive Bayes, and a Multi Naive Bayes classifier. These classifiers were applied to a dataset gathered from public sources. It contains 4,266 programs, of which 3,265 are malicious and the rest are benign [11]. The framework was trained on the collected dataset to find patterns and detect new and unseen malicious applications. It was tested using 5-fold cross-validation: the dataset is divided into five equal blocks, four of which are used in the training phase and the fifth in the testing phase [11].

The experimental results showed that RIPPER, Naive Bayes, and the Multi Naive Bayes classifier achieved accuracies of 89.36%, 97.11%, and 96.88%, and false positive rates of 7%, 0.5%, and 0.5%, respectively. Naive Bayes has the highest accuracy and, together with the Multi Naive Bayes classifier, the lowest false positive rate. However, the Multi Naive Bayes classifier has the highest detection rate, at 97.76%, compared with 71.05% for RIPPER and 97.43% for Naive Bayes [11].

Schultz M. G. et al. also compared their system's results with traditional signature-based systems used to detect and differentiate between malicious and benign applications. The traditional signature-based method has a 33.75% detection rate, roughly a third of the Multi Naive Bayes classifier's [11].

In [12], Hatche G. W. et al. proposed a mobile security platform to detect malicious activities on Android devices. It is designed to operate in the cloud environment to support multiple devices simultaneously. The platform contains a Security Web Server, an Analysis Module, an Android application, and Google Cloud Messaging (GCM).

The Security Server is a Linux, Apache, MySQL, PHP server used to manage and store Android device information, log data, and applications' information. It messages the Android device via GCM and then stores the information the device sends back, for processing in the Analysis Module. The Analysis Module uses WEKA to analyse each application and classify it as malicious or benign based on its attributes, using the ZeroR, OneR, Naïve Bayes, and J48 algorithms. The Android application listens to GCM and responds to security analysis requests by sending system information, log data, and applications' information to the security server. GCM is a “cloud-based messaging service provided by Google for developing applications compatible with Android, iOS, and Chrome”. It queues messages when the device is not connected to the Internet [12].

The dataset used in the experiment contains 241 malicious and 241 benign applications, divided into an 80% training set and a 20% testing set. All four algorithms were run over these datasets to measure their accuracy. The best result, 100% accuracy, was achieved by the OneR and J48 algorithms, which also performed best by producing zero false positives [12].

D. Perceptron Algorithm Based Systems

Gavrilut D. et al. [13] proposed a versatile machine learning-based framework to classify malicious and benign applications. The framework is based on a multi-stage combination (“cascade”) of perceptron algorithms [13]. It uses “one-sided perceptrons” and “kernelized one-sided perceptrons”. The “one-sided perceptron” trains on only one label, either malicious or benign. The “kernelized one-sided perceptron” trains the system using the Polynomial Kernel Function [13]. Three datasets were used to train, test, and scale up the framework.

Training and testing were done using a medium-size dataset containing malicious and benign applications. The scale-up dataset has 180 million records and was used to test the framework's ability to identify unknown malware in a huge dataset [13].

The framework was trained using 3-, 5-, 7-, and 10-fold cross-validation over three versions of the “one-sided perceptron”:
- “Cascade One-Sided Perceptron”
- “Cascade One-Sided Perceptron with explicitly mapped features F1 score”
- “Cascade One-Sided Perceptron with explicitly mapped features F2 score”

The highest accuracy among these versions, 96.08%, was achieved by the “Cascade One-Sided Perceptron with explicitly mapped features F1 score” with 3-fold cross-validation [13]. Two functions were used in training the “kernelized one-sided perceptrons”: the “Polynomial Kernel Function” and the “Radial-Base Kernel Function”. These were also trained using 3-, 5-, 7-, and 10-fold cross-validation [13].

The highest result obtained in training was 96.25%, with 5-fold cross-validation of the “Kernelized One-Sided Perceptrons - Polynomial Kernel Function” [13].

All the algorithm versions were then tested on the testing dataset. The “Kernelized One-Sided Perceptrons - Polynomial Kernel Function” showed the best result, with 88.84% accuracy [13].

The “one-sided perceptron” algorithm was tested on 10% to 100% of the scale-up dataset. As the dataset grew, the accuracy of detecting malware decreased. The best accuracy, 71.94%, was obtained when the algorithm was tested on 10% of the dataset [13].
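This survey does not detail the one-sided training rule of [13], so the sketch below covers only the “kernelized” part. It is a standard dual-form (two-sided) perceptron with the polynomial kernel K(x, z) = (x·z + c)^d. It is not the method of [13]. Labels are +1 for malicious and -1 for benign. The class name, degree d = 2, offset c = 1, and the epoch cap are illustrative assumptions. The four-point XOR-style toy set cannot be separated by a linear perceptron, but it is separated once the polynomial kernel implicitly adds the x1·x2 product feature.

```python
# Sketch: standard (two-sided) kernel perceptron with a polynomial kernel.
Vector = list[float]


def polynomial_kernel(x: Vector, z: Vector, degree: int = 2, coef0: float = 1.0) -> float:
    """K(x, z) = (x . z + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree


class KernelPerceptron:
    """Dual-form perceptron: keeps a mistake counter alpha_i per training sample."""

    def __init__(self, degree: int = 2, coef0: float = 1.0, epochs: int = 20):
        self.degree = degree
        self.coef0 = coef0
        self.epochs = epochs

    def _score(self, x: Vector) -> float:
        # f(x) = sum_i alpha_i * y_i * K(x_i, x); samples never misclassified are skipped.
        return sum(
            a * y * polynomial_kernel(xi, x, self.degree, self.coef0)
            for a, y, xi in zip(self.alpha, self.y, self.X)
            if a
        )

    def fit(self, X: list[Vector], y: list[int]) -> "KernelPerceptron":
        """Labels must be +1 (malicious) or -1 (benign)."""
        self.X, self.y = X, y
        self.alpha = [0] * len(X)
        for _ in range(self.epochs):
            mistakes = 0
            for i, (xi, yi) in enumerate(zip(X, y)):
                if yi * self._score(xi) <= 0:  # misclassified or on the boundary
                    self.alpha[i] += 1
                    mistakes += 1
            if mistakes == 0:  # every training sample is now classified correctly
                break
        return self

    def predict(self, x: Vector) -> int:
        return 1 if self._score(x) > 0 else -1


# XOR-style toy data: label = x1 * x2, not linearly separable.
X = [[1, 1], [-1, -1], [1, -1], [-1, 1]]
y = [1, 1, -1, -1]
model = KernelPerceptron().fit(X, y)
print([model.predict(x) for x in X])  # [1, 1, -1, -1]
```

The design choice illustrated here is the usual kernel trick: the model stores per-sample weights instead of an explicit weight vector, so the polynomial feature map never has to be computed. The cost is that prediction time grows with the number of training samples that were ever misclassified. This is one reason large-scale settings such as the 180-million-record scale-up dataset in [13] are demanding.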
E. Deep Neural Networks Based Systems

Y. Chen et al. [14] proposed a “Quantity Dependent Backpropagation (QDBP)” algorithm and an “End-to-end trainable Tree-Shaped Deep Neural Network (TSDNN)”. Unlike the multi-layer perceptron algorithm proposed in [13], TSDNN classifies the data in a layer-wise manner to learn better from minority classes. It outperforms the multi-layer perceptron framework by overcoming the imbalance between classes and adjusting the sensitivity toward each class using QDBP. QDBP is based on backpropagation, a “gradient-based method to train neural networks” [14]. TSDNN is end-to-end trainable: it trains all of the network's nodes simultaneously, and it can expand the network dynamically. In the learning phase, TSDNN tunes the output vector through QDBP to correct erroneous classifications. It classifies the data layer by layer [14].

The model first classifies the data as “Malicious” or “Benign”. After this initial classification, the “Malicious” data is transferred to the “Concatenation Stage”. This stage concatenates the original vector of the data ($V_{i1}$) with the learned feature ($V_{o1}$). The concatenated vector $[V_{i1}, V_{o1}]$ is then fed into the multi-class classification stage, which classifies the malicious data according to its attack behaviour [14]. The malicious data is categorized into five categories: Bot, Exploit, Trojan, Malspam, and Ransomware. Next, the Ransomware samples are transferred to a further “Concatenation Stage”, which concatenates $[V_{i1}, V_{o1}]$ with the Ransomware vector ($V_{o2}$). The resulting vector $[V_{i1}, V_{o1}, V_{o2}]$ is fed into the fine-grained classification stage. This stage further classifies the Ransomware into the Cryptomix, Locky, CrypMic, TeslaCrypt, CryptXXX, Cryptowall, and Cerber families, according to the official names listed in VirusTotal [14].

Incorporating QDBP into TSDNN increases the accuracy from 59.08% with plain deep neural networks to 99.63%. It also increases the detection rate from 8.33% to 85.4% [14].

The proposed model was tested on a new dataset of previously unseen malware families. This dataset includes 14 families that differ from those used in training TSDNN [14]. The malware data used in the experiments was collected from September 2016 to May 2017. The first 6 minutes of each malicious sample's behaviour were recorded using VirusTotal [14].

The experimental results showed that the proposed model can accurately detect potential malware. All 14 unseen families were detected correctly at a rate higher than 80%, and half of them were detected with an accuracy higher than 93%. Testing the proposed system on a new dataset demonstrates the model's ability to detect new malware [14].

IV. MALWARE DETECTION TECHNIQUES COMPARISON

In this section, we compare the malware detection frameworks proposed in the surveyed papers. Table 1 compares the different proposed techniques for detecting Android malware based on machine learning algorithms. These systems use different machine learning techniques, such as SVM, NB, Perceptron, and DNN algorithms. In Table 1, “Not Mentioned” is used wherever the authors of a surveyed paper did not report the accuracy, false positive rate, detection rate, or the algorithm used in a particular framework.
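The last three columns of Table 1 are usually computed from a binary confusion matrix in which the positive class is “malicious”. The minimal sketch below assumes that “detection rate” means the true positive rate (recall) on malicious samples. This is a common convention, but the surveyed papers may define it slightly differently. The counts in the example are hypothetical. With 23 of 30 malicious apps detected and none of 30 benign apps flagged, they reproduce the 88.33% accuracy, 76.67% true positive rate, and 0.00% false positive rate reported for mAIS in [6]. The actual test-set size used in [6] is not stated in this survey.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary malware detector (positive class = malicious)."""
    tp: int  # malicious apps flagged as malicious
    fn: int  # malicious apps missed
    fp: int  # benign apps wrongly flagged
    tn: int  # benign apps correctly passed


def _ratio(num: int, den: int) -> float:
    # Return 0.0 for an empty denominator instead of raising ZeroDivisionError.
    return num / den if den else 0.0


def accuracy(cm: ConfusionMatrix) -> float:
    return _ratio(cm.tp + cm.tn, cm.tp + cm.fn + cm.fp + cm.tn)


def false_positive_rate(cm: ConfusionMatrix) -> float:
    return _ratio(cm.fp, cm.fp + cm.tn)


def detection_rate(cm: ConfusionMatrix) -> float:
    # Assumed here to be the true positive rate (recall) on the malicious class.
    return _ratio(cm.tp, cm.tp + cm.fn)


# Hypothetical counts: 30 malicious and 30 benign test apps.
cm = ConfusionMatrix(tp=23, fn=7, fp=0, tn=30)
print(f"{accuracy(cm):.2%} {detection_rate(cm):.2%} {false_positive_rate(cm):.2%}")
# 88.33% 76.67% 0.00%
```

On a balanced test set, accuracy is simply the mean of the true positive rate and the true negative rate. This is why a 0.00% false positive rate combined with a modest detection rate can still produce a high accuracy figure. When comparing rows of Table 1, it is worth reading the accuracy column together with the other two.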
Table 1: Comparison of machine learning based Android malware detection techniques.

| Ref. | Technique | Algorithm | Accuracy | False Positive Rate | Detection Rate |
|---|---|---|---|---|---|
| [7] | The proposed system contains two main parts, client and server. On the client side, the user interface alerts the user if the application's MD5 value matches one of the stored MD5 values of malicious applications. Otherwise, the calculated MD5 of the installed application is submitted to the server [7]. | SVM and PCA-RELIEF algorithms | 95% | 13.3% | Not Mentioned |
| [8] | A combination of Application Permissions Gatherer, Permissions Analyzer, and Keyloggers Detector components. The first component gathers the applications' permissions using an API and stores them in an SQLite database. The second uses the SVM algorithm to analyse the permissions and recognize their patterns. The last is the keylogger detector, which detects keylogger applications [8]. | SVM | Not Mentioned | Not Mentioned | Not Mentioned |
| [9] | Uses the Androguard project to extract features from APKs. The dataset contains 91 malicious and 2081 benign applications; a one-class SVM classifier was trained on benign applications only, using the Scikit-learn framework [9]. | SVM | Not Mentioned | Not Mentioned | Not Mentioned |
| [10] | Uses real malware and benign application samples. The detector is a parallel combination of heterogeneous classifiers: Decision Tree, Simple Logistic, Naïve Bayes, PART, and RIDOR. Combining different algorithms leverages each one's efficacy and improves detection accuracy [10]. | DT | 95.4% | 4% | 96.4% |
| | | NB | 86.7% | 8.7% | 91.5% |
| | | SL | 93.2% | 4.6% | 97.7% |
| | | RIDOR | 95% | 5.8% | 94.9% |
| | | PART | 96.3% | 3.3% | 97% |
| | | AvgProb | 96.3% | 3.1% | 98.8% |
| | | ProdProb | 97.2% | 3% | 95.3% |
| | | MaxProb | 95.2% | 7.2% | 98.6% |
| | | Mvote | 96.3% | 3.1% | 96.3% |
| [11] | Detects new and unpredictable malicious applications with three algorithms: RIPPER, Naive Bayes, and a Multi Naive Bayes classifier. The framework was trained on the collected dataset to find patterns and detect new and unseen malicious applications [11]. | RIPPER | 89.36% | 7.77% | 71.05% |
| | | NB | 97.11% | 3.80% | 97.43% |
| | | Multi-NB | 96.88% | 6.01% | 97.76% |
| [12] | Designed to operate in the cloud environment to support multiple devices simultaneously. The platform contains a Security Web Server, an Analysis Module, an Android application, and Google Cloud Messaging (GCM) [12]. | ZeroR | 49.7% | 0.00% | 45% |
| | | OneR | 100% | 0.00% | 83% |
| [13] | Classifies malicious and benign applications with a multi-stage combination (“cascade”) of perceptron algorithms, using “one-sided perceptrons” and “kernelized one-sided perceptrons”. The one-sided perceptron trains on only one label, either malicious or benign; the kernelized one-sided perceptron trains the system based on the Polynomial Kernel Function [13]. | One-sided perceptron (OSP) | 74.6% | 0.83% | 68.73% |
| | | OSP F1 | 85.54% | 0.55% | 83.76% |
| | | OSP F2 | 85.17% | 0.55% | 83.22% |
| | | Kernelized OSP - Polynomial function | 88.84% | 3.9% | 89.96% |
| | | Kernelized OSP - Radial-Base Kernel function | 57.08% | 3.6% | 50.97% |
| [14] | TSDNN is end-to-end trainable, training all of the network's nodes simultaneously, and classifies the data layer-wise to learn better from minority classes. It outperforms the multi-layer perceptron framework by overcoming class imbalance and adjusting the sensitivity toward each class using QDBP, which is based on backpropagation, a “gradient-based method to train neural networks” [14]. | TSDNN and QDBP | 99.63% | Not Mentioned | 85.4% |
V. DISCUSSION AND FINDINGS

As illustrated in Table 1, different machine learning algorithms have been used in developing malware detection frameworks. The parallel combination of heterogeneous classifiers such as DT, SL, NB, PART, and RIDOR showed good results [10]. The four combination schemes AvgProb, ProdProb, MaxProb, and Mvote obtained:
- accuracies of 96.3%, 97.2%, 95.2%, and 96.3%
- false positive rates of 3.1%, 3%, 7.2%, and 3.1%
- detection rates of 98.8%, 95.3%, 98.6%, and 96.3%

respectively [10]. The “Kernelized One-Sided Perceptrons - Polynomial Function” obtained 88.84% accuracy, an 89.96% detection rate, and a reasonable false positive rate of 3.9% [13]. The highest accuracy, 100% with a 0.00% false positive rate, was obtained by the OneR and J48 algorithms. However, their detection rates are only 83% and 90% respectively [12]. In addition, using “Quantity Dependent Backpropagation (QDBP)” with an “End-to-end trainable Tree-Shaped Deep Neural Network (TSDNN)” showed good accuracy and detection rate results, of 99.63% and 85.4% respectively [14].

Parallelizing machine learning algorithms showed excellent results in [10]. Parallel schemes that incorporate OneR or J48 [12], or parallelizing TSDNN along with QDBP [14], may therefore increase the accuracy and detection rate of malware detection on Android devices. Integrating the “Kernelized One-Sided Perceptrons - Polynomial Function” [13] with OneR or J48 [12] may also increase the accuracy and detection rate of Android malware detection. Combining one or more techniques requires more detailed experiments that focus on the algorithms' performance, such as robustness and security.
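The four schemes reported for [10] (AvgProb, ProdProb, MaxProb, Mvote) correspond to the standard fixed rules for combining classifier outputs: average, product, maximum, and majority vote. This survey does not give the exact formulation used in [10]. The sketch below is therefore the textbook two-class form of these rules, under the following assumptions:
- each base classifier outputs a probability that an app is malicious;
- the function name `combine` and its scheme keys are hypothetical;
- ties are resolved toward benign.

Adding OneR or J48 [12] to such a parallel scheme, as suggested above, would amount to appending one more probability to the input list.

```python
from math import prod
from statistics import mean


def combine(probs: list[float], scheme: str) -> int:
    """Fuse per-classifier P(malicious) values; return 1 (malicious) or 0 (benign).

    Ties are resolved toward benign (an arbitrary choice for this sketch).
    """
    benign = [1.0 - p for p in probs]
    if scheme == "avg":   # AvgProb: compare mean posterior of each class
        return int(mean(probs) > mean(benign))
    if scheme == "prod":  # ProdProb: compare product of posteriors of each class
        return int(prod(probs) > prod(benign))
    if scheme == "max":   # MaxProb: compare largest single posterior of each class
        return int(max(probs) > max(benign))
    if scheme == "vote":  # Mvote: each classifier votes for its more likely class
        votes = sum(p > 0.5 for p in probs)
        return int(votes > len(probs) - votes)
    raise ValueError(f"unknown scheme: {scheme}")


# Hypothetical outputs of five base classifiers (e.g. DT, SL, NB, PART, RIDOR) for one app.
scores = [0.95, 0.4, 0.4, 0.45, 0.9]
print({s: combine(scores, s) for s in ("avg", "prod", "max", "vote")})
# {'avg': 1, 'prod': 1, 'max': 1, 'vote': 0}
```

The example shows why the schemes can disagree. Two confident “malicious” outputs outweigh three weakly “benign” ones under the probability-based rules. Under majority vote, each classifier counts equally regardless of its confidence, so the same app is labelled benign.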
REFERENCES
[1] Egham, “Gartner Says Worldwide Sales of Smartphones Recorded First Ever Decline During the Fourth Quarter of 2017,” Gartner, 22-Feb-2018. [Online]. Available: https://www.gartner.com/newsroom/id/3859963.
[2] Egham, “Gartner Says Worldwide Smartphone Sales Soared in Fourth Quarter of 2011 With 47 Percent Growth,” Gartner, 15-Feb-2012. [Online]. Available: https://www.gartner.com/newsroom/id/1924314.
[3] H. S. Anderson, A. Kharkar, B. Filar, and P. Roth, “Evading Machine Learning Malware Detection,” Black Hat, 2017.