Mobile Botnet Detection: A Deep Learning Approach Using Convolutional Neural Networks
Abstract— Android, being the most widespread mobile operating system, is increasingly becoming a target for malware. Malicious apps designed to turn mobile devices into bots that may form part of a larger botnet have become quite common, thus posing a serious threat. This calls for more effective methods to detect botnets on the Android platform. Hence, in this paper, we present a deep learning approach for Android botnet detection based on Convolutional Neural Networks (CNN). Our proposed botnet detection system is implemented as a CNN-based model that is trained on 342 static app features to distinguish between botnet apps and normal apps. The trained botnet detection model was evaluated on a set of 6,802 real applications containing 1,929 botnet apps from the publicly available ISCX botnet dataset. The results show that our CNN-based approach had the highest overall prediction accuracy compared to other popular machine learning classifiers. Furthermore, the performance results observed from our model were better than those reported in previous studies on machine learning based Android botnet detection.

Keywords—Botnet detection; Deep learning; Convolutional Neural Networks; Machine learning; Android Botnets

I. INTRODUCTION
Android is now the most widespread mobile operating system worldwide. Over the years the volume of malware targeting Android has continued to grow [1]. This is because it is easier and more profitable for malware authors to target an operating system that is open-source, more prevalent, and does not restrict the installation of apps from any possible source. As a matter of fact, numerous families of malware apps that are capable of infecting Android devices and turning them into malicious bots have been discovered in the wild. These Android bots may become part of a larger botnet that can be used to perform various types of attacks such as Distributed Denial of Service (DDoS) attacks, generation and distribution of spam, phishing attacks, click fraud, stealing login credentials or credit card details, etc.

A botnet consists of a number of Internet-connected devices under the control of a malicious user or group of users known as botmaster(s). It also consists of a Command and Control (C&C) infrastructure that enables the bots to receive commands, get updates and send status information to the malicious actors. Since smartphones and other mobile devices are typically used to connect to online services and are rarely switched off, they provide a rich source of candidates for operating botnets. Thus, the term ‘mobile botnet’ refers to a group of compromised smartphones and other mobile devices that are remotely controlled by botmasters using C&C channels [2], [3].

Nowadays, malicious botnet apps have become a serious threat. Additionally, their increasing use of sophisticated evasive techniques calls for more effective detection approaches. Hence, in this paper we present a deep learning approach that leverages Convolutional Neural Networks (CNN) for Android botnet detection. The CNN model employs 342 static features to classify new or previously unseen apps as either ‘botnet’ or ‘normal’. The features are extracted through automated reverse engineering of the apps, and are used to create feature vectors that feed directly into the CNN model without further pre-processing or feature selection.

We present the design of our CNN-based model for Android botnet detection and evaluate the model on a dataset of real Android apps consisting of 1,929 botnet samples and 4,873 clean samples. Also, we compare the performance of our CNN model to other popular machine learning classifiers including Naïve Bayes, Bayes Net, Decision Tree, Support Vector Machine (SVM), Random Forest, Random Tree, Simple Logistic and Artificial Neural Network (ANN) on the same dataset. The results show that the CNN-based model achieved an overall classification accuracy of 98.9% with an F1-score of 0.981, thus outperforming all the other machine learning classifiers. Furthermore, our CNN model shows better performance results compared to other existing studies focusing on Android botnet detection. Some of these studies utilized the same ISCX botnet apps employed in this paper.

The rest of the paper is organized as follows: Section II discusses related works in Android botnet detection; Section III presents the overall system and gives some background on CNN, including a discussion of 1D CNN which is adopted in this study; Section IV presents the methodology and the experiments performed; results of the experiments are given in Section V; and finally Section VI presents the conclusions of the study and possible future work.
Authorized licensed use limited to: UNIVERSITY OF BIRMINGHAM. Downloaded on July 21,2020 at 18:37:56 UTC from IEEE Xplore. Restrictions apply.
II. RELATED WORK
In the study conducted by Kadir et al. [4], the objective was to address the gap in understanding mobile botnets and their communication characteristics. Thus, they provided an in-depth analysis of the Command and Control (C&C) and built-in URLs of Android botnets. By combining both static and dynamic analyses with visualization, relationships between the analysed botnet families were uncovered, offering insight into each malicious infrastructure. It is in this study that a dataset of 1929 samples from 14 Android botnet families was compiled and released to the research community. This dataset is known as the ISCX Android botnet dataset and is available from [5]. This paper and several previous works on Android botnets have utilized the full dataset or a subset of it to evaluate proposed Android botnet detection techniques.

Anwar et al. [6] proposed a static approach towards mobile botnet detection where they utilized MD5 hashes, permissions, broadcast receivers, and background services as features. These features were extracted from Android apps to build a machine learning classifier for detecting mobile botnet attacks. They conducted their experiments on 1400 apps from the UNB ISCX botnet dataset together with 1400 benign apps. Their best result was 95.1% classification accuracy with a recall value of 0.827 and a precision value of 0.97.

Paper [7] used machine learning to detect Android botnets based on permissions and their protection levels. The authors initially used 138 features and then added novel features known as protection levels to increase the number of features to 145. Their approach was evaluated on four machine learning algorithms: Random Forest, MLP, Decision Trees and Naïve Bayes. They performed their study on 3270 app instances (1635 benign and 1635 botnets). The botnet apps used were also obtained from the ISCX botnet dataset. The best results came from Random Forest with 97.3% accuracy, 0.987 recall, and 0.958 precision.

In [8] a method was proposed to detect Android botnets based on Convolutional Neural Networks using permissions as features. Applications are represented as images that are constructed based on the co-occurrence of permissions used within the applications. The proposed CNN is a binary classifier that is trained using the images. The authors evaluated their proposed method on 5450 Android applications, including 1800 botnet applications from the ISCX dataset. Their results show an accuracy of 97.2% with a recall of 0.96, precision of 0.955 and F-measure of 0.957, which is a promising result considering that only permissions were used in the study.

Paper [9] proposed an Android Botnet Identification System (ABIS) for checking Android applications in order to detect botnets. ABIS utilized both static and dynamic features from API calls, permissions and network traffic. The system was evaluated using several machine learning algorithms, with Random Forest obtaining a precision of 0.972 and a recall of 0.969. In [10], a method is proposed for Android botnet detection based on feature selection and classification algorithms. The paper used ‘permissions requested’ as features and ‘Information gain’ to select the most significant permissions. Afterwards, Naïve Bayes, Random Forest and Decision Trees were used to classify the Android apps. Results show Random Forest achieving the highest detection accuracy of 94.6% with the lowest false positive rate of 0.099.

Karim et al. [11] proposed DeDroid, a static analysis approach to investigate botnet-specific properties that can be used to detect mobile botnets. They first identified ‘critical features’ by observing the coding behaviour of a few known malware binaries having C&C features. They then compared these ‘critical features’ with features of malicious applications from the Drebin dataset [12]. Through this comparison, 35% of the malicious apps in the dataset qualified as botnets. However, closer examination revealed that 90% were confirmed as botnets.

Bernardeschi et al. [13] proposed a method to identify botnets in the Android environment through model checking. Model checking is an automated technique for verifying finite state systems. This is accomplished by checking whether a structure representing a system satisfies a temporal logic formula describing its expected behaviour. In [14], Jadhav et al. propose a cloud-based Android botnet detection system which exploits dynamic analysis by using a virtual environment with cluster analysis. The toolchain for the dynamic analysis process within the botnet detection system is composed of strace, netflow, logcat, sysdump, and tcpdump. However, the authors did not provide any experimental results to evaluate the effectiveness of their proposed solution. Moreover, botnets may easily employ different techniques to evade the virtual environment, and code coverage could limit the system’s effectiveness [15], [24].

Paper [16] proposed an approach to detect mobile botnets using network features such as TCP/UDP packet size, frame duration, and source/destination IP address. The authors used a set of ML box algorithms and five machine learning classifiers to classify network traffic. The five supervised machine learning approaches include Naïve Bayes, Decision Tree, K-nearest neighbour, Neural Network, and Support Vector Machine. In [17], a method to detect Android botnets based on source code mining and source code metrics was proposed. There are also a number of works that have proposed signature based methods for Android botnet detection. These include [18-20]. However, these solutions are likely to suffer from the drawbacks of signature based systems, which include the inability to effectively detect previously unseen botnets.

Unlike most existing studies, our paper proposes a deep learning based Android botnet detection system, using Convolutional Neural Networks. Also, unlike previous studies that utilize only the app permissions, our system is based on 342 features that represent Permissions, API calls, Commands, Extra Files, and Intents. Furthermore, different from the study in [8] which utilized only permissions, we do not convert feature vectors into images prior to model training. Instead our feature vectors are used directly to train 1D CNN models. This makes our approach computationally less demanding.

III. BACKGROUND

A. The CNN-based classification system
The classification system is built by extracting static features from the corpus of botnet and clean samples. To achieve this, we used our bespoke tool built in Python for automated reverse engineering of APKs. With the help of the tool, we extracted 342 features consisting of five different types (see Table 2) from all the training apps. The five feature types include: API calls extracted from the executable; Permissions and Intents from the manifest file; Commands and Extra Files from the APK. These features are represented as vectors of binary numbers with each feature in the vector represented by a ‘1’ or ‘0’. Each feature vector (corresponding to one application) is labelled with its class. The feature vectors are loaded into the CNN model and used to train the model. After training, an unknown application can be predicted to be either ‘clean’ or ‘botnet’ by applying its own extracted feature vector to the trained model. The process is depicted in Figure 1.

... deeper layers of the CNN; hence, the number of layers required depends on the complexity and non-linearity of the data being analysed. Furthermore, the number of filters in each stage determines the number of features extracted. Computational complexity increases with more layers and higher numbers of filters. Also, with more complex architectures, there is the possibility of training an overfitted model which results in poor prediction accuracy on the testing set(s). To reduce overfitting, techniques such as ‘dropout’ [22] and ‘batch regularization’ are implemented during training of our models.

C. One Dimensional Convolutional Neural Networks
Although CNNs are more commonly applied in a multi-dimensional fashion and have thus found success in image and video analysis-based problems, they can also be applied to one-dimensional data. Datasets that possess a one-dimensional structure can be processed using a one-dimensional convolutional neural network (1D CNN). The key difference between a 1D and a 2D or 3D CNN is the dimensionality of the input data and how the filter (feature detector) slides across the data. For 1D CNN, the filters only slide across the input data in one direction. A 1D CNN is quite effective when you expect to derive interesting features from shorter (fixed-length) segments of the overall feature set, and where the location of the feature within the segment is not of high relevance.
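As an illustration of how a filter slides across one-dimensional input in a single direction, the following minimal sketch computes one convolution pass followed by ReLU and max pooling in plain Python. The toy input, the filter weights, and the choice of stride 1 with no padding are assumptions made for the example; they are not taken from the trained model.

```python
def conv1d_valid(x, kernel, bias=0.0):
    """Slide `kernel` along `x` in one direction (stride 1, no padding)."""
    k = len(kernel)
    return [
        sum(x[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(x) - k + 1)
    ]


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(v, 0.0) for v in values]


def max_pool1d(values, size):
    """Keep the maximum of each non-overlapping window of length `size`."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, size)
    ]


# Toy binary feature vector (hypothetical; a real vector has 342 entries).
features = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
# Hypothetical filter of length 4; real filter weights are learned in training.
kernel = [1.0, -1.0, 1.0, 1.0]

feature_map = relu(conv1d_valid(features, kernel))
pooled = max_pool1d(feature_map, size=2)

print(feature_map)  # one value per filter position
print(pooled)       # roughly half as many values after pooling with size 2
```

Under these assumptions, a filter of length 4 applied to a 342-element vector yields 342 - 4 + 1 = 339 positions, which is why longer filters or larger pooling sizes shrink the data passed to later layers.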
... activation functions such as Sigmoid or Tangent functions [24]. A simplified view of our architecture is shown in Figure 2.

[Figure 2 diagram: input layer (L = 342), convolutional layer 1, convolutional layer 2, fully connected layer, output layer (0 = normal, 1 = botnet); a filter slides along the input of each convolutional layer.]
Figure 2: Overview of the implemented 1D CNN model for Android application classification to detect botnets.

IV. METHODOLOGY AND EXPERIMENTS
In this section we present the experiments undertaken to evaluate the CNN models developed in this paper. Our models were implemented using Python and utilized the Keras library with TensorFlow backend. Other libraries used include scikit-learn, Seaborn, Pandas, and NumPy. The model was built and evaluated on an Ubuntu Linux 16.04 64-bit machine with 4GB RAM.

A. Problem definition
Let $A = \{a_1, a_2, \ldots, a_m\}$ be a set of $m$ apps where each $a_i$ is represented by a vector containing the values of $n$ features (where $n = 342$). Let $a = \{f_1, f_2, f_3, \ldots, f_n, cl\}$ where $cl \in \{botnet, normal\}$ is the class label assigned to the app. Thus, $A$ can be used to train the model to learn the behaviours of botnet and normal apps respectively. The goal of a trained model is then to classify a given unlabelled app $A_{unknown} = \{f_1, f_2, f_3, \ldots, f_n, ?\}$ by assigning a label $cl$, where $cl \in \{botnet, normal\}$.

B. Dataset
In this study we used the Android dataset from [5], which is known as the ISCX botnet dataset. The ISCX dataset contains 1,929 botnet apps (from 14 different families) and has been used in previous works including [4], [7-10], and [17]. The botnet families are shown in Table 1. A total of 4,873 clean apps were used for the study in this paper and these were labelled under the category ‘normal’ to facilitate supervised learning when training the CNN and other machine learning classifiers. The clean apps were obtained from different categories of apps on the Google Play store and verified to be non-malicious by using VirusTotal.

Table 1: Botnet dataset composition.
Botnet Family     Number of samples
Anserverbot       244
Bmaster           6
Droiddream        363
Geinimi           264
Misosms           100
Nickyspy          199
Notcompatible     76
Pjapps            244
Pletor            85
Rootsmart         28
Sandroid          44
Tigerbot          96
Wroba             100
Zitmo             80
Total             1929

The 342 static features extracted from the apps for model training were of 5 types: (a) API calls (b) commands (c) permissions (d) Intents (e) extra files. The ‘API calls’ and ‘permissions’ accounted for most of the features. From Table 2, it can be seen that there were 135 ‘API calls’ related features and 130 ‘permissions’ features, while intents accounted for 53 features. Some of the features are shown in Table 3.

Table 2: The five different types of features used to train the CNN model.
Feature type      Number
API calls         135
Permissions       130
Commands          19
Extra files       5
Intents           53
Total             342 features

Table 3: Some of the prominent static features extracted from Android applications for training the CNN model to detect Android Botnets.
Feature name                                 Type
TelephonyManager.*getDeviceId                API
TelephonyManager.*getSubscriberId            API
abortBroadcast                               API
SEND_SMS                                     Permission
DELETE_PACKAGES                              Permission
PHONE_STATE                                  Permission
SMS_RECIVED                                  Permission
Ljava.net.InetSocketAddress                  API
READ_SMS                                     Permission
Android.intent.action.BOOT_COMPLETED         Intent
io.File.*delete(                             API
chown                                        Command
chmod                                        Command
Mount                                        Command
.apk                                         Extra File
.zip                                         Extra File
.dex                                         Extra File
.jar                                         Extra File
CAMERA                                       Permission
ACCESS_FINE_LOCATION                         Permission
INSTALL_PACKAGES                             Permission
android.intent.action.BATTERY_LOW            Intent
.so                                          Extra File
android.intent.action.POWER_CONNECTED        Intent
System.*LoadLibrary                          API
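The binary feature-vector representation described in Sections III-A and IV-A can be sketched as follows. This is a minimal illustration and not the authors' extraction tool: the shortened `FEATURES` list, the function names, and the exact-string matching are assumptions (several names in Table 3 look like patterns, which this sketch does not interpret). The 0/1 label encoding follows the output layer shown in Figure 2.

```python
# Hypothetical, shortened feature list drawn from Table 3.
# The model in the paper uses 342 features in a fixed order.
FEATURES = [
    "SEND_SMS",
    "READ_SMS",
    "INSTALL_PACKAGES",
    "TelephonyManager.*getDeviceId",
    "abortBroadcast",
    "android.intent.action.BATTERY_LOW",
    "chmod",
    ".dex",
]


def to_feature_vector(extracted, feature_list=FEATURES):
    """Return a list with 1 where a listed feature was found in the app, else 0."""
    present = set(extracted)
    return [1 if name in present else 0 for name in feature_list]


def label_to_int(label):
    """Encode the class label as in Figure 2: 0 = normal, 1 = botnet."""
    return {"normal": 0, "botnet": 1}[label]


# Features found in one (made-up) app; names outside FEATURES are ignored.
app_features = {"SEND_SMS", "abortBroadcast", "chmod", "UNLISTED_FEATURE"}

vector = to_feature_vector(app_features)
print(vector, label_to_int("botnet"))
```

Keeping the feature order fixed across all apps matters here, because a 1D convolution treats neighbouring positions in the vector as a local segment.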
C. Experiments to evaluate the proposed CNN based model
In order to investigate the performance of our proposed model, we performed different sets of experiments. Table 4 shows the configuration of the CNN model. The 1D CNN model consists of two pairs of convolutional and maxpooling layers as shown in Figure 2. The output of the second maxpooling layer is flattened and passed on to a fully connected layer with 8 units. This is in turn connected to a sigmoid activated output layer containing one unit.

The first set of experiments was aimed at evaluating the impact of the number of filters on the model’s performance. The second set of experiments was performed to evaluate the effect of varying the length of the filters. In the third, we investigate the impact of the maxpooling size on performance.

Table 4: Summary of model configurations.
Model design summary - 1D CNN
Input layer: Dimension = 342 (feature vector size)
1D Convolutional layer: 4, 8, 16, 32, 64 filters; size = 4, 8, 16, 32, 64 (with number of filters = 32)
MaxPooling layer: Size = 2, 4, 8, 16 (with number of filters = 32)
1D Convolutional layer: 4, 8, 16, 32, 64 filters; size = 4, 8, 16, 32, 64 (with number of filters = 32)
MaxPooling layer: Size = 2, 4, 8, 16 (with number of filters = 32)
Fully Connected (Dense) layer: 8 units, activation = ReLU
Output layer: Fully Connected layer; 1 unit, activation = sigmoid

In order to measure model performance, we used the following metrics: accuracy, precision, recall and F1-score. The metrics are defined as follows (taking the botnet class as positive):

• Accuracy: Defined as the ratio between correctly predicted outcomes and the sum of all predictions. It is given by: $\frac{TP + TN}{TP + TN + FP + FN}$
• Precision: All true positives divided by all positive predictions, i.e. was the model right when it predicted positive? Given by: $\frac{TP}{TP + FP}$
• Recall: True positives divided by all actual positives. That is, how many positives did the model identify out of all possible positives? Given by: $\frac{TP}{TP + FN}$
• F1-score: This is the harmonic mean of precision and recall, given by: $\frac{2 \times Recall \times Precision}{Recall + Precision}$

Where TP is true positives; FP is false positives; FN is false negatives, while TN is true negatives (all w.r.t. the botnet class). All the results of the experiments are from 10-fold cross validation where the dataset is divided into 10 equal parts with 10% of the dataset held out for testing, while the models are trained from the remaining 90%. This is repeated until all of the 10 parts have been used for testing. The average of all 10 results is then taken to produce the final result. Also, during the training of the CNN models (for each fold), 10% of the training set was used for validation.

V. RESULTS AND DISCUSSIONS

A. Varying the numbers of filters.
In this section, we examine the results from experimenting with different numbers of filters. In our model, we kept the number of filters in both convolutional layers the same. Table 5 shows the results from running the 1D CNN model with different numbers of filters. From the table, it is evident that the number of filters had an effect on the performance of the model. When increased from 4 to 8, there is an improvement in performance. The performance does not improve further until we reach 32 filters. It then drops again when we increase this to 64. Based on these results we select 32 filters as the optimal configuration parameter for the model’s number of filters. Notice the increase in the number of training parameters as the number of filters is increased; for 32 filters, the training of 25,625 parameters is required. With 32 filters we obtain a classification accuracy of 98.9% compared to 98.6% that is obtained with 4 filters. Nevertheless, the results obtained with 4 filters were still acceptable.

1) Training epochs, loss and accuracy graphs.
Figures 3 and 4 show the typical outputs obtained with the validation and training sets during the training epochs. From Fig. 3, it can be seen that the validation loss is generally fluctuating from one training epoch to another after an initial drop. During each epoch, a model is trained and the validation loss and accuracy are recorded. Our goal is to obtain the model with the least validation loss because we assume this will be the ‘best’ model that fits the training data. Thus, at every epoch, the validation loss is compared to previous ones and if the current one is lower, the corresponding model is saved as the best model. We implemented a ‘stopping criterion’ which will stop the training once no improvement in performance is observed within 100 epochs. For example in Figure 3, the best model was obtained with the least validation loss of 0.00531 at epoch 45. For the next 100 epochs validation loss did not improve, hence the training was stopped. Figure 4 shows the corresponding accuracy behaviour observed from epoch to epoch.

Table 5: Number of filters vs. model performance. Length of filters used = 4 for first layer and = 4 for second layer; dense layer = 8 units; validation split = 10%.
Number of filters           4       8       16       32       64
Accuracy                    0.986   0.988   0.988    0.989    0.987
Precision                   0.978   0.980   0.980    0.983    0.980
Recall                      0.974   0.977   0.976    0.978    0.975
F1-score                    0.976   0.978   0.978    0.981    0.977
Num. training parameters    2,777   5,657   11,801   25,625   59,417
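The four metrics defined in Section IV-C can be computed directly from confusion-matrix counts, as in the short sketch below. The function name and the example counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the paper. The sketch also assumes that every denominator is non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 with botnet as the positive class.

    Assumes all denominators are non-zero (at least one positive prediction
    and at least one actual positive).
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a single test fold (not taken from the paper).
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 0.96 while recall is only 0.75, which illustrates why precision, recall and F1 are reported alongside accuracy when the classes are imbalanced.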
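The model-selection rule described in Section V-A (keep the model with the lowest validation loss, and stop once 100 epochs pass without improvement) can be sketched in plain Python as below. The function name and the synthetic loss values are assumptions for illustration; the paper does not state how the rule was implemented. In Keras, comparable behaviour is commonly obtained with the `EarlyStopping` and `ModelCheckpoint` callbacks.

```python
def select_best_epoch(val_losses, patience=100):
    """Return (best_epoch, best_loss, stopped_epoch) for a sequence of losses.

    Epochs are numbered from 1. Training stops at the first epoch that is
    `patience` epochs past the best one without any improvement.
    """
    best_epoch = None
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New lowest validation loss: this is where the model would be saved.
            best_epoch, best_loss = epoch, loss
        elif best_epoch is not None and epoch - best_epoch >= patience:
            return best_epoch, best_loss, epoch
    return best_epoch, best_loss, len(val_losses)


# Small synthetic example with a patience of 3 epochs.
losses = [0.5, 0.3, 0.2, 0.25, 0.22, 0.21, 0.3]
print(select_best_epoch(losses, patience=3))

# Synthetic run shaped like the one in Figure 3: the loss falls until
# epoch 45 and never improves afterwards.
long_run = [1.0 - 0.01 * i for i in range(45)] + [0.9] * 200
print(select_best_epoch(long_run, patience=100))
```

In the second example the best epoch is 45 and training stops at epoch 145, which matches the epoch range shown in Figures 3 and 4.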
Figure 3: Training and validation losses at different epochs up to 145. A stopping criterion of 100 epochs is used to obtain the model with the least validation loss.

Figure 4: Training and validation accuracies at different epochs up to 145. These plots correspond to the training and validation losses depicted in Figure 3.

B. Varying the length of the filters.
In this section we examine the effect of the length of filters on the performance of the model while the number of filters is fixed at 32 in each convolutional layer. The length is varied from 4, 8, 16, 32, to 64 respectively (as shown in Table 6). The number of units in the dense layer was fixed at 8. The results indicate that increasing the length of the filters does not appear to have much of an impact on the overall classification accuracy and F1-score. However, the smallest filter length of 4 achieves the highest accuracy and F1-score. Note that as we increase the length of the filters, the number of parameters to be trained increases (from 25,625 for length=4 to 77,465 for length=64).

The lack of improvement with the length of filters may be attributed to the larger number of parameters leading to overfitting the model to the training data, thereby reducing its generalization capability. This in turn leads to degraded performance when tested on new data. Basically, what these results show is that when the training parameters increase beyond a certain limit, the model becomes too complex for the data and this leads to overfitting. This becomes evident in the lack of improvement or degradation in performance when tested on previously unseen data.

Table 6: Length of filters vs. model performance. Number of filters used = 32 in both first and second convolutional layers; dense layer = 8 units; validation split = 10%.
Length of filters       4        8        16       32       64
Accuracy                0.989    0.988    0.988    0.988    0.988
Precision               0.983    0.979    0.980    0.981    0.983
Recall                  0.978    0.977    0.978    0.979    0.974
F1-score                0.981    0.978    0.979    0.979    0.978
Training parameters     25,625   29,081   35,993   49,817   77,465

C. Varying the Maxpooling parameter
The results of the third set of experiments are discussed here. The goal is to investigate the effect of changing the maxpooling parameter. The parameter corresponds to the subsampling ratio, which was set to 2, 4, 6, and 8 respectively as shown in Table 7. A value of 2 means the next layer will be half the dimension of the previous one, etc. Note that the maxpooling layer can be considered a feature reduction layer that also helps to alleviate overfitting since it progressively reduces the number of parameters that need to be trained. The other parameters were fixed as follows: number of filters in both convolutional layers = 32; length of convolutional filters = 4; number of units in dense layer = 8.

It can be seen from Table 7 that as we increase the maxpooling parameter, the total number of training parameters is reduced. At the same time, we witness a progressive decline in overall performance. Therefore, for our CNN model designed to classify applications into ‘botnet’ and ‘normal’, the optimal subsampling ratio for both layers is 2.

Table 7: Maxpooling parameter vs. model performance. Length of filters used = 4 for both convolutional layers; number of filters = 32 for both layers; dense layer = 8 units; validation split = 10%.
Maxpooling parameter/Subsampling ratio    2        4       6       8
Accuracy                                  0.989    0.987   0.983   0.978
Precision                                 0.983    0.982   0.974   0.971
Recall                                    0.978    0.973   0.967   0.948
F1-score                                  0.981    0.978   0.970   0.959
Training parameters                       25,625   9,497   6,425   5,401

D. CNN performance vs. other machine learning classifiers: 10 fold cross validation results.
In Table 8, the performance of the CNN model developed in this paper is compared to other machine learning classifiers: Naïve Bayes, SVM, Random Forest, Artificial Neural Network, Simple Logistic (SL), J48, Random Tree, REPTree, and Bayes Net. Figure 5 shows the F1-scores of the classifiers, where CNN has the
highest F1-score (0.981), followed by SVM (0.976), SL (0.973), ANN (0.973) and Random Forest (0.973). Bayes Net had the lowest F1-score of 0.781. Table 8 shows that the recall of CNN is 0.978, the highest among the classifiers, which indicates that it has the best botnet detection rate. Note that the ANN was a back propagation neural network built with a single hidden layer consisting of 32 units (neurons). The sigmoid activation function was used within the neurons. This ANN represented the application of a neural network without deep learning. The ANN showed no significant improvement in the results when the number of units in the hidden layer was increased beyond 32.

Table 8: Comparison of our CNN results with results from other ML classifiers.
Classifier      ACC     Prec.   Rec.    F1
Naïve Bayes     0.872   0.728   0.874   0.795
SVM             0.987   0.980   0.973   0.976
RF              0.985   0.982   0.965   0.973
ANN             0.985   0.982   0.965   0.973
SL              0.984   0.983   0.963   0.973
J48             0.981   0.974   0.958   0.966
Random Tree     0.972   0.948   0.955   0.951
REPTree         0.979   0.973   0.954   0.963
Bayes Net       0.867   0.736   0.832   0.781
CNN             0.989   0.983   0.978   0.981

[Figure 5 bar chart: F1-scores from Table 8 in descending order (CNN 0.981, SVM 0.976, ANN 0.973, SL 0.973, RF 0.973, J48 0.966, REPTree 0.963, Random Tree 0.951, Naïve Bayes 0.795, Bayes Net 0.781); x-axis: F1 score, from 0.7 to 1.]
Figure 5: F1-score of CNN vs other ML classifiers.

E. Comparison with other works on Android botnet detection.
In Table 9, we present a comparison of our results with those reported in other papers that focus on Android botnet detection. Note that all the papers mentioned in the table have used the ISCX botnet dataset for their work. In our study we utilized the entire 1929 samples within the dataset. In the second column of the table, the numbers of botnet samples and benign samples used in the papers are shown, while the other columns contain the performance results. Not all of the performance metrics we have used are reported in every paper. Nevertheless, it is clear that our CNN model obtained better overall accuracy, F1 and recall than the other works.

Table 9: Performance comparisons with other works. Note that all of the papers used botnet samples from the ISCX dataset.
Paper reference             Botnets/Benign   ACC (%)   Rec.    Prec.   F1
Hojjatinia et al. [8]       1800/3650        97.2      0.96    0.955   0.957
Tansettanakorn et al. [9]   1926/150         -         0.969   0.972   -
Anwar et al. [6]            1400/1400        95.1      0.827   0.97    -
Abdullah et al. [10]        1505/850         -         0.946   0.931   -
Alqatawna & Faris [7]       1635/1635        97.3      0.957   0.987   -
This paper                  1929/4873        98.9      0.978   0.983   0.981

VI. CONCLUSIONS AND FUTURE WORK
In this paper, we proposed a deep learning model based on 1D CNN for the detection of Android botnets. We evaluated the model through extensive experiments with 1,929 botnet apps and 4,873 clean apps. The model outperforms several popular machine learning classifiers evaluated on the same dataset. The results (Accuracy: 98.9%; Precision: 0.983; Recall: 0.978; F1-score: 0.981) indicate that our proposed CNN based model can be used to detect new, previously unseen Android botnets more accurately than the other models. For future work, we will aim to improve the model training process by automating the search and selection of the key influencing parameters (i.e. number of filters, filter length, and number of fully connected (dense) layers) that jointly result in the optimal performing CNN model.

REFERENCES
[1] S. Y. Yerima and S. Khan “Longitudinal Performance Analysis of Machine Learning based Android Malware Detectors” 2019 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), IEEE
[2] H. Pieterse and M. S. Olivier, "Android botnets on the rise: Trends and characteristics," 2012 Information Security for South Africa, Johannesburg, Gauteng, 2012, pp. 1-5.
[3] Letteri, I., Del Rosso, M., Caianiello, P., Cassioli, D., 2018. Performance of botnet detection by neural networks in software-defined networks, in: CEUR WORKSHOP PROCEEDINGS, CEUR-WS.
[4] Kadir, A.F.A., Stakhanova, N., Ghorbani, A.A., 2015. Android botnets: What urls are telling us, in: International Conference on Network and System Security, Springer. pp. 78–91.
[5] ISCX Android botnet dataset. Available from https://www.unb.ca/cic/datasets/android-botnet.html. [Accessed 03/03/2020]
[6] S. Anwar, J. M. Zain, Z. Inayat, R. U. Haq, A. Karim, and A. N. Jabir, "A static approach towards mobile botnet detection," in 2016 3rd International Conference on Electronic Design (ICED), 2016: IEEE, pp. 563-567.
[7] J. f. Alqatawna and H. Faris, "Toward a Detection Framework for Android Botnet," in 2017 International Conference on New Trends in Computing Sciences (ICTCS), 2017: IEEE, pp. 197-202.
[8] S Hojjatinia, S Hamzenejadi, H Mohseni, “Android Botnet Detection using Convolutional Neural Networks” 28th Iranian Conference on Electrical Engineering (ICEE2020).
[9] C. Tansettanakorn, S. Thongprasit, S. Thamkongka, and V. Visoottiviseth, "ABIS: a prototype of android botnet identification
system," in 2016 Fifth ICT International Student Project Conference
(ICT-ISPC), 2016: IEEE, pp. 1-5.
[10] Z. Abdullah, M. M. Saudi, and N. B. Anuar, "ABC: android botnet
classification using feature selection and classification algorithms,"
Advanced Science Letters, vol. 23, no. 5, pp. 4717-4720, 2017.
[11] Karim, Ahmad & Salleh, Rosli & Shah, Syed. (2015). DeDroid: A
Mobile Botnet Detection Approach Based on Static Analysis.
10.1109/UIC-ATC-ScalCom-CBDCom-IoP.2015.240.
[12] The Drebin Dataset. Available at: https://www.sec.cs.tu-bs.de/~danarp/drebin/index.html [accessed 05/03/2020]
[13] Cinzia Bernardeschi, Francesco Mercaldo, Vittoria Nardone, Antonella Santone, Exploiting Model Checking for Mobile Botnet Detection. 23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems. Procedia Computer Science 159 (2019) 963–972.
[14] Jadhav, S., Dutia, S., Calangutkar, K., Oh, T., Kim, Y.H., Kim, J.N.,
2015. Cloud-based android botnet malware detection system, in:
Advanced Communication Technology (ICACT), 2015 17th
International Conference on, IEEE. pp. 347–352.
[15] S. Y. Yerima, M. K. Alzaylaee, and S. Sezer. “Machine learning-based dynamic analysis of Android apps with improved code coverage” EURASIP Journal on Information Security, 4 (2019). https://doi.org/10.1186/s13635-019-0087-1
[16] Meng, X. and Spanoudakis, G. (2016). MBotCS: A mobile botnet
detection system based on machine learning. Lecture Notes in Computer
Science, 9572, pp. 274-291. doi: 10.1007/978-3-319-31811-0_17
[17] B. Alothman and P. Rattadilok ‘Android botnet detection: An integrated source code mining approach’ 12th International Conference for Internet Technology and Secured Transactions (ICITST), 11-14 Dec., Cambridge, UK, 2017, IEEE, pp 111-115.
[18] A. J. Alzahrani and A. A. Ghorbani, "Real-time signature-based
detection approach for sms botnet," in 2015 13th Annual Conference on
Privacy, Security and Trust (PST), 2015: IEEE, pp. 157-164.
[19] D. A. Girei, M. A. Shah, and M. B. Shahid, "An enhanced botnet
detection technique for mobile devices using log analysis," in 2016 22nd
International Conference on Automation and Computing (ICAC), 2016:
IEEE, pp. 450-455.
[20] M. Yusof, M. M. Saudi, and F. Ridzuan, "A New Android Botnet
Classification for GPS Exploitation Based on Permission and API
Calls," in International Conference on Advanced Engineering Theory
and Applications, 2017: Springer, pp. 27-37.
[21] Y. LeCun, Y.Bengio, and G. Hinton, Deep learning, Nature 521 (2015),
no. 7553, 436-444
[22] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. “Dropout: A simple way to prevent neural networks from overfitting” The Journal of Machine Learning Research, 15(1):1929-1958, 2014.
[23] X. Glorot, A. Bordes, and Y. Bengio, ‘‘Deep sparse rectifier neural networks,’’ in Proc. 14th Int. Conf. Artif. Intell. Statist., 2011, pp. 315–323.
[24] M. K. Alzaylaee, S. Y. Yerima, Sakir Sezer “DL-Droid: Deep learning based android malware detection using real devices” Computers & Security, Volume 89, 2020, 101663, ISSN 0167-4048, https://doi.org/10.1016/j.cose.2019.101663.