Feature Engineering and Evaluation for Android Malware Detection Scheme

Jaemin Jung1, Jihyeon Park2, Seong-je Cho2, Sangchul Han3, Minkyu Park3, Hsin-Hung Cho4

1 Department of Computer Science and Engineering, Dankook University, Korea
2 Department of Software Science, Dankook University, Korea
3 Department of Software Technology, Konkuk University, Korea
4 Department of Computer Science and Information Engineering, National Ilan University, Taiwan
* Corresponding Author: Seong-je Cho; E-mail: [email protected]
DOI: 10.3966/160792642021032202017
424 Journal of Internet Technology Volume 22 (2021) No.2
detection and classification, there are several challenges: feature extraction and selection [19, 23-24, 26-30], collection of a comprehensive real-world dataset [13], choosing and optimizing a suitable learning algorithm [21-23], performance evaluation [20, 31], and identifying false alarms [25].

We propose a new machine learning technique to detect Android malware utilizing permissions and API calls. Among the above-mentioned challenges, we focus on feature extraction and selection, dataset collection, and identifying false alarms. Feature extraction maps a large collection of input data onto a small set of features while preserving the relevant information [29-30]. Feature extraction may transform original features into an organized and more significant subset of information. Feature selection reduces the dimensionality of datasets; it is a common preprocessing step in high-dimensional data analysis [24, 27, 30]. Through feature selection, we select the relevant features that we expect to be useful for malware detection. The classification results can be improved by selecting the most relevant features from the extracted features. Feature extraction and selection methods can be applied separately or combined in one step. They significantly affect performance in terms of efficiency, robustness, and accuracy.

In our scheme, we first extract the information on all API invocations and permission requests from sample apps. Next, we reduce the size of the feature set by using two feature selection methods: (1) a minimal domain knowledge-based method and (2) a Gini importance-based selection method. The minimal domain knowledge-based method simply chooses the API calls and permissions used in existing well-known studies [19, 32-34], and the Gini importance-based method further decreases the size of the feature set under consideration. We adopt the feature importance [35-36] of each feature derived from the Gini impurity of the resulting Random Forest (RF) trees.

Many existing studies used imbalanced and/or small datasets. However, an imbalanced dataset may result in a skewed model, and too small a dataset may lead to poor generalization. In our study, we construct a large and balanced dataset to build a generalized and non-skewed model. We collect 27,041 benign apps and 26,276 malware samples from a real-world dataset, AndroZoo.

We have carried out several experiments to evaluate the proposed Android malware detection scheme. It achieved up to 96.51% accuracy with the Random Forest algorithm. We have also investigated the undetected or misclassified apps in detail and discovered that we might incorrectly classify apps that are transformed by code obfuscation tools or written with cross-platform development tools.

The main contributions of this work are summarized as follows:
• We reduce the dimensionality of datasets and mitigate the curse of dimensionality using the combined feature selection technique without degrading the detection performance: the minimal domain knowledge-based method plus the Gini importance-based one. Using minimal domain knowledge is a recent trend in research on malware detection [38-39].
• We construct balanced datasets using real-world datasets, AndroZoo [37] and Drebin [33], in our experiments. The well-known but older datasets such as Drebin, AMD [40] and GooglePlay (during 2014-2016) show different characteristics compared with the latest AndroZoo dataset, especially in terms of the number of APIs invoked by apps (see Section 4).
• We disclose the causes of incorrect classification, where a malicious app goes undetected or a benign app is misclassified as malicious. To the best of our knowledge, few studies have been conducted on identifying incorrect classifications issued by a machine learning technique in malware detection.

This article is organized as follows. Section 2 explains background knowledge about API calls and permissions on the Android platform. Section 3 presents our machine learning-based malware detection technique. Section 4 explains our experimental results and analyzes the misclassified samples. In Section 5, we compare our work with the related works. Finally, we give the concluding remarks and present possible future work in Section 6.

2 Background

2.1 API (Application Programming Interface)

The Android platform provides Application Programming Interfaces (APIs) that applications can use to interact with the underlying Android system to do various things [19]. The framework API refers to the collection of software that makes up the Android SDK, such as a core set of packages and classes, a set of XML elements and attributes for declaring a manifest file, etc. Android apps contain many API calls and permissions. Each API call is composed of four types of information: class name, method name, argument information, and return data type.

API calls reflect the functionality and behavior of an app and have been widely used in studies on malware detection, especially those using machine learning algorithms. Android apps use the official Android APIs and third-party APIs [41]. Third-party APIs are often used in only a few apps, and utilizing those APIs as features for machine learning can lead to sparse data problems. Also, third-party APIs may have different names but the same functionality, and vice versa. Hence, we use only the official Android APIs in malware detection.

Salehi et al. [42-43] mentioned that an API name alone might not represent its operations, and that both API calls and their arguments could be an effective representation of the executable's behavior. They adopted each API call name, its arguments, and return value to detect Microsoft Windows malware. In our work, we consider the following API call information: class name, method name, method's argument types, and method's return data type. API calls with the same class and method name are counted as different API calls if they have different arguments or return data types. The total number of API calls belonging to Android 7.1 (API level 25) is 133,271 [44]. Figure 1 shows a bytecode-level API call that consists of a class name (including a package name), a method name, and a method descriptor. The method descriptor consists of the types of the arguments and return value [45].

Figure 1. An example of bytecode-level API call representation

2.2 Permission

If an app wants to read the address book on the device, it should declare the READ_CONTACTS permission in its AndroidManifest.xml. We collected the list of permissions from an Android application analysis tool, AndroGuard [48]. The total number of Android permissions collected is 474.

The permissions declared in a manifest file are useful in catching the potential risks of apps [19, 32, 47]. The system's behavior depends on how sensitive the permission is. There are three protection levels in the Android permission system: normal, signature, and dangerous. Permissions for resources and data involving the user's private information, or affecting the operation of other apps, fall under dangerous permissions [19, 32]. For example, ACCESS_FINE_LOCATION (to read the location of the user) and READ_CONTACTS (to read the user's contacts) are classified as dangerous. For dangerous permissions, apps should obtain the permission grant from the user at runtime.
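A bytecode-level API call as in Figure 1 can be split into the four pieces of information used as features (class name, method name, argument types, return type). The following is a minimal sketch; the helper name and regular expression are illustrative, not from the paper:

```python
import re

# Parse a Dalvik-style bytecode-level API call of the form shown in Figure 1:
#   Lpackage/Class;.method:(ArgTypes)ReturnType
SIG_RE = re.compile(r"^(?P<cls>L[^;]+;)\.(?P<method>[^:]+):\((?P<args>[^)]*)\)(?P<ret>.+)$")

def parse_api_call(sig: str) -> dict:
    """Split an API signature into class, method, argument types, and return type."""
    m = SIG_RE.match(sig)
    if m is None:
        raise ValueError(f"not a bytecode-level API signature: {sig!r}")
    return {
        "class": m.group("cls"),
        "method": m.group("method"),
        "args": m.group("args"),
        "return": m.group("ret"),
    }

# Two APIs with the same class and method name but different descriptors
# count as different features, as described above.
a = parse_api_call("Landroid/webkit/WebView;.setWebViewClient:(Landroid/webkit/WebViewClient;)V")
b = parse_api_call("Landroid/webkit/WebView;.removeJavascriptInterface:(Ljava/lang/String;)V")
print(a["class"], a["method"], a["args"], a["return"])
```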
more suitable in classifying sample apps as malicious or benign.

Figure 3 lists the top 20 APIs in order of decreasing feature importance. The API call for displaying notifications in the Notification Bar is the most important. The other important APIs include the APIs related to the ContentResolver object, which accesses data in Content Providers or gets information about system settings, the APIs related to the Handler class for Android inter-thread communication, the APIs to perform operations like locating a device, the APIs for Wi-Fi or Bluetooth services, and the APIs for file write operations. On the other hand, the SMS- and audio-related APIs presented in Table 1 do not have important effects on Android malware detection. We found that 861 APIs have a feature importance of 0 (zero).

Figure 4 lists the top 20 permissions in order of decreasing feature importance. The READ_PHONE_STATE permission is ranked first. It allows access to device-specific information such as the IMEI and phone number. The permissions associated with the file system, Wi-Fi service, and Android launcher also have high importance scores. On the other hand, SMS- and Bluetooth-related permissions are ranked below 25th among the 79 permissions selected by minimal domain knowledge. No permission has an importance of zero.
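The Gini importance behind these rankings is, per feature, the average decrease in Gini impurity over the splits of the Random Forest's trees. A toy, pure-Python sketch of a single split (not the paper's implementation, which takes the importances computed by the trained Random Forest):

```python
def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def impurity_decrease(feature_col, labels):
    """Weighted Gini decrease when splitting on a binary feature
    (1 = app uses the API/permission). Averaging this over all splits on a
    feature across the forest yields its Gini importance."""
    n = len(labels)
    left = [y for x, y in zip(feature_col, labels) if x == 1]
    right = [y for x, y in zip(feature_col, labels) if x == 0]
    return gini(labels) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

# Toy data: 1 = malicious, 0 = benign. The first feature separates classes
# perfectly; the second leaves class proportions unchanged, so it contributes
# nothing (like the 861 zero-importance APIs above).
labels  = [1, 1, 1, 0, 0, 0]
perfect = [1, 1, 1, 0, 0, 0]
useless = [1, 0, 0, 1, 0, 0]
print(impurity_decrease(perfect, labels))  # 0.5
print(impurity_decrease(useless, labels))  # ~0.0 (up to rounding)
```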
Based on the feature importance, we select the top N APIs and the top M permissions as features for machine learning. We perform a grid search for the best combination of N and M. Incrementing N from 5 to 987 and M from 5 to 79, each with a step of 5, N+M features were tested. Note that the maximum value of N is 987 (excluding the 861 APIs with zero importance). We found that when N=405 and M=25, Random Forest shows the highest accuracy for detecting Android malware with the least computational overhead.

3.4 Machine Learning Models

We developed the machine learning model for classifying Android apps into malicious or benign using the features selected in Section 3.3. We choose the Random Forest algorithm and use grid search to determine its hyper-parameters. Random Forest has the following advantages [54-55]: (1) it has a relatively small number of parameters that should be controlled and removes the need for pruning the trees, (2) it can achieve high classification accuracy, (3) it can overcome the problem of overfitting, and (4) feature importance is computed automatically. Random Forest takes several hyper-parameters. In our experiments we consider two important parameters among them: max_depth and n_estimators, which control the maximum depth of each tree and the number of trees in the forest, respectively. We perform a grid search to find the parameter values with which the Random Forest model achieves the highest detection accuracy on our datasets.

4 Experiments and Analysis

4.1 Dataset

In our experiments, we leverage the AndroZoo dataset [37, 56], a well-known large-scale collection of Android apps. AndroZoo collects Android apps from several sources including Google Play and VirusShare, and is currently being updated. Recent research such as [57-58] used the AndroZoo dataset in their experiments.

To construct a balanced dataset, we collected a similar number of benign apps and malicious ones. For the benign dataset, we downloaded 27,364 benign apps from the AndroZoo website between 2017 and 2018. For the malware dataset, we also downloaded 26,438 malicious apps between 2014 and 2018. Then we removed apps from which we cannot extract any API calls or permissions. We also removed apps that belong to both datasets. The resulting dataset consists of 27,041 benign apps and 26,276 malware samples.

Table 3 compares our dataset with other well-known datasets in terms of the average number of used APIs. As mobile users require more useful and convenient functions, recent apps use more APIs. This fact makes extracting and selecting significant features more important for the efficiency and effectiveness of machine learning. The high dimensionality of features may lead to computational difficulty, classification noise, or overfitting.
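The grid search over (N, M) from Section 3.3 can be sketched as follows. The scorer here is a dummy stand-in (in the paper it is Random Forest detection accuracy with the top-N APIs and top-M permissions), and the exact grid endpoints are our reading of the step-of-5 description:

```python
from itertools import product

# Candidate feature-set sizes: top-N APIs (5..987, step 5, capped at 987 after
# dropping the 861 zero-importance APIs) and top-M permissions (5..79, step 5).
N_GRID = list(range(5, 986, 5)) + [987]
M_GRID = list(range(5, 76, 5)) + [79]

def evaluate(n_apis: int, m_perms: int) -> float:
    """Stand-in scorer for illustration only; peaks at the paper's reported
    best combination (N=405, M=25)."""
    return 1.0 - abs(n_apis - 405) / 1000 - abs(m_perms - 25) / 1000

# Exhaustively score every (N, M) pair and keep the best one.
best = max(product(N_GRID, M_GRID), key=lambda nm: evaluate(*nm))
print(best)  # (405, 25) with this dummy scorer
```

The same exhaustive pattern applies to the Random Forest hyper-parameter search over max_depth and n_estimators mentioned in Section 3.4.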
4.2 Metrics

We describe the performance of our machine learning model based on a confusion matrix (Table 4), commonly used in machine learning. The performance metrics we consider are recall (True Positive Rate), specificity (True Negative Rate), and accuracy, which can be derived from the confusion matrix. Their definitions are as follows.

Table 4. Confusion matrix

                         Prediction
                         Malicious             Benign
Ground   Malicious       TP (True Positive)    FN (False Negative)
Truth    Benign          FP (False Positive)   TN (True Negative)

Recall = TP / (TP + FN)

Specificity = TN / (TN + FP)

Accuracy = (TP + TN) / (TP + FP + FN + TN)

The ground truth indicates that we already know whether an app is malicious or benign. Reliable ground truth is essential to verify malware detection models. For building a reliable ground truth dataset, we rely on AndroZoo's classification and VirusTotal anti-virus decisions. The malicious dataset consists of the apps that three or more anti-virus engines of VirusTotal judged to be malicious. The benign dataset consists of the apps that all anti-virus engines judged to be benign.

4.3 Experiments

First, we measure the performance using the features obtained after applying the domain knowledge-based feature selection. We construct the feature vector with the relevant 1,848 APIs and 79 permissions as explained in Section 3.3.1. We evaluate our scheme using 5-fold cross-validation. The samples are randomly grouped into 5 disjoint subsets of equal size. The Random Forest is trained and tested five times, using each subset as test data and the others as training data.
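The three definitions above can be checked numerically against the counts reported for the best run in Table 5 (TP=5,105, FN=150, FP=200, TN=5,208); a small sketch:

```python
def metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Recall (TPR), specificity (TNR), and accuracy from confusion-matrix counts,
    using the definitions of Section 4.2."""
    return {
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Counts from Table 5 (domain knowledge-based feature selection, best run).
m = metrics(tp=5105, fn=150, fp=200, tn=5208)
print(f"{m['recall']:.2%} {m['specificity']:.2%} {m['accuracy']:.2%}")  # 97.15% 96.30% 96.72%
```

These reproduce the 97.15% recall, 96.30% specificity, and 96.72% accuracy reported below.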
The detection accuracy is 96.33% with a training time of 38.89s and a testing time of 1.12s on average. Table 5 shows the prediction results when the model performs best. The detection accuracy is 96.72%, the recall is 97.15%, and the specificity is 96.30%.

Table 5. The best prediction results with the domain knowledge-based feature selection

                         Prediction
                         Malicious   Benign
Ground   Malicious       5,105       150
Truth    Benign          200         5,208

Then we measure the performance using the features obtained after applying the combined feature selection. The feature vector is composed of 405 APIs and 25 permissions as explained in Section 3.3.2. We also employ 5-fold cross-validation. The detection accuracy is 96.51% with a training time of 12.06s and a testing time of 0.62s on average. Table 6 shows the prediction results when the model performs best. The detection accuracy is 96.85%, the recall is 97.09%, and the specificity is 96.61%.

Table 6. The best prediction results with the combined feature selection

                         Prediction
                         Malicious   Benign
Ground   Malicious       5,102       153
Truth    Benign          183         5,225

To show that the selection method is effective, we also measure the performance of the model before the feature selection. Before the feature selection, the total number of APIs is 133,271 and the total number of permissions is 474. If we use all the APIs as features, the training could take too long; thus, we applied the domain knowledge-based selection to APIs only. Table 7 summarizes the number of features, training time, and accuracy. The combined feature selection approach reduces the training time by 79.60% compared with the domain knowledge-based approach (only to APIs) and by 69.00% compared with the domain knowledge-based approach. Also, it achieves almost the same detection accuracy despite the reduced features.

Table 7. Summary of experimental results

Feature selection                    | # of APIs | # of permissions | Training time | Detection accuracy
Domain knowledge-based (only to API) | 1,848     | 474              | 59.11s        | 96.36%
Domain knowledge-based               | 1,848     | 79               | 38.89s        | 96.33%
Combined                             | 405       | 25               | 12.06s        | 96.51%

To check whether our model is overfitted, we test it with a new dataset. We collected 200 benign apps from AndroZoo (AndroZoo2019) and 200 malware samples from DREBIN [33] (Drebin). AndroZoo2019 is a set of benign apps collected from AndroZoo during 2019. Note that we collected our original benign apps during 2017~2018 and malware during 2014~2018; both were collected from AndroZoo. No new apps are in our original datasets. We train our model with our original datasets and test it with the new dataset. The results are shown in Table 8. The detection accuracy is 96.0%, the recall is 97.5%, and the specificity is 94.5%. This means that our model is not overfitted.

Table 8. Prediction results with the new test dataset

                         Prediction
                         Malicious   Benign
Ground   Malicious       195         5
Truth    Benign          11          189

Adversarial machine learning is a technique that tries to deceive machine learning models into misclassification by modifying input data. One of the strategies of adversarial machine learning is the evasion attack: attackers obfuscate their apps to hide or distort the features and behaviors and evade detection. We measure the performance of our model against evasion attacks. We conducted an experiment corresponding to the DexGuard-based obfuscation attack in the attack scenarios of [75]. We train the model with our AndroZoo dataset, then test it with 200 benign apps collected from the F-Droid project [76], before and after obfuscation. We obfuscate the apps using Obfuscapk [77] (with reflection). Out of 200 apps, our model misclassified 6 apps before obfuscation and 14 apps after obfuscation. The accuracy decreases from 97% to 93%.

4.4 Analysis of Misclassified Apps

This section analyzes some of the falsely classified apps in the worst-performance experiment of the combined feature selection approach. They are 66 malicious apps (false negatives) and 142 benign apps (false positives). We discuss the possible reasons for the misclassification in terms of code obfuscation, grayware, and cross-platform development tools.

4.4.1 Code Obfuscation

From a laborious manual analysis, we discover that all misclassified apps are obfuscated. Most obfuscators support identifier renaming and/or API hiding [6, 59-60]. Identifier renaming changes the names of packages, classes, and methods. If any of them is changed, the extracted APIs cannot be found in the list of the official APIs. API hiding hides the names of invoked APIs using the Java reflection mechanism: API invocation code is replaced with code for
finding and calling APIs via Java reflection-related APIs. These types of code obfuscation can transform the functional parts of the apps by altering the API invocations. Therefore, code obfuscation can significantly degrade the performance of API call-based malware detection.

4.4.2 Grayware

Grayware is an unwanted application that is not classified as malware by most anti-malware products but behaves in an undesirable manner or causes security risks. Grayware is neither benign nor malicious. Grayware includes spyware, adware, remote access tools, etc. Some grayware tagged as malware are predicted as benign, and vice versa.

We investigated the 66 undetected malicious apps. They are divided into 14 malware families, as shown in Figure 5. We found that about 75% of them (50 out of 66) are adware; their families are Dowgin, Kuguo, Jfpush, Feiwo, and unknown adware. A typical adware program displays advertising sentences in the notification bar. If a user touches the notification, an advertisement is displayed in a WebView component. No permission is required to display a sentence in the notification bar. Moreover, the ranks of the WebView-related APIs in our API ranking are 270~325, as shown in Table 9, which means that the importance of WebView-related APIs is relatively low.
Table 9. Example of WebView-related APIs. The column Rank denotes the importance rank in the API list

Rank | API
270  | Landroid/webkit/WebView;.setWebViewClient:(Landroid/webkit/WebViewClient;)V
271  | Landroid/webkit/WebViewClient;.shouldInterceptRequest:(Landroid/webkit/WebView;Ljava/lang/String;)Landroid/webkit/WebResourceResponse;
279  | Landroid/webkit/WebView$HitTestResult;.getType:()I
285  | Landroid/webkit/WebView;.setFocusable:(Z)V
...  | ...
325  | Landroid/webkit/WebView;.removeJavascriptInterface:(Ljava/lang/String;)V
We submitted the 142 misclassified benign apps to VirusTotal [61] in June 2019. VirusTotal judged nine of them as malware (Table 10), but only one or two of about 70 anti-malware products classified them as malware. We found that these apps are grayware. These apps request unnecessary permissions or use APIs for subsidiary functionality such as advertisements or information sharing. However, the relevant features rank high. These features may cause our approach to misclassify the apps as malware. Figure 6, for example, shows a screenshot of the ‘com.unicrios.funnyskeleton’ app. This app provides live wallpapers. Users can set the animation speed and send feedback to the Google Play Store. Its functionality is simple, but it requires an unnecessary permission, WRITE_EXTERNAL_STORAGE, and contains WebView-related APIs that are irrelevant to its functionality.
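The ground-truth rule from Section 4.2 (three or more VirusTotal engines flag an app as malicious; zero flags means benign) can be sketched as below. Treating the in-between cases, like the one- or two-engine detections above, as inconclusive is our illustrative assumption:

```python
def label_app(detections: int, total_engines: int = 70) -> str:
    """Dataset labeling rule (Section 4.2): >=3 VirusTotal engines -> malicious,
    0 engines -> benign. The 'inconclusive' branch for 1-2 detections
    (grayware-like cases) is an assumption for illustration."""
    if detections >= 3:
        return "malicious"
    if detections == 0:
        return "benign"
    return "inconclusive"

# E.g. the misclassified benign apps above were flagged by only one or two engines.
print(label_app(0), label_app(2), label_app(5))  # benign inconclusive malicious
```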
Figure 7 shows screenshots of another game app. In Figure 7, the left figure displays a game scene, the middle one an advertisement, and the right one a “privacy policy”. This app collects IMEI information, network information (IP address and Wi-Fi information), and location information for advertisements and service improvement. So this app contains several permissions and APIs which rank high, as shown in Table 11. These permissions and APIs have little to do with the functionality of the game but may cause our model to classify the app as malware.
libieunh.so, a malicious advertisement library (Figure 10). We conclude that the effect of cross-platform development tools on our malware detection approach is relatively small.

Figure 10. Malware detection result on libieunh.so

5 Related Work

Two kinds of analysis are used for detecting Android malware using machine learning: static and dynamic. Static analysis is an approach that evaluates Android apps by scanning their executable code without runtime analysis; the static features are obtained without executing the sample apps. On the contrary, dynamic analysis conducts malware detection by executing sample apps and monitoring their behavior.

Dynamic analysis needs to mimic the actual runtime environment and effectively simulate human operations to achieve high code coverage. Static analysis has several advantages over dynamic analysis. It does not need any execution scenario or notion of test cases. It can be implemented in a lightweight manner for deployment on computing resource-limited devices, and it can operate on a stand-alone basis on a mobile device. In addition, there is no possibility of mobile devices being infected by malware during the analysis. In this work, therefore, we focus on static analysis.

Several studies on Android malware detection adopt machine learning algorithms and use APIs and permissions as features. These studies have considered various criteria in selecting APIs and permissions for efficient malware detection. Table 13 summarizes those studies.
Table 13. Comparison of our study and existing studies on Android malware detection

Study | Features | Static/Dynamic | Feature selection (refinement) | Dataset (Malware/Benign) | Acc. | Classifier
Peiravian et al. [19] | APIs, Permissions | Static analysis | None | 1,260 / 1,250 | 96.88% | SVM, J48, Bagging
Arp et al. [33] | APIs, Permissions, Network addresses, Filtered intents, etc. | Static analysis | Feature weight | 5,560 / 123,453 | 94% | SVM
Aafer et al. [34] | APIs (with arguments) | Static analysis | Frequency analysis + data flow analysis | 3,987 / 16,000 | 99% | k-NN, ID3 DT, C4.5 DT, SVM
Chan et al. [72] | APIs, Permissions | Static analysis | Information gain | 175 / 621 | 92.36% | NB, SVM, RBF Network, MLP, Liblinear, J48, RF
Qiao et al. [73] | APIs, Permissions | Static analysis | ANOVA, SVM-RFE | 1,260 / 5,000 | 94.41% | SVM, RF, NN
Zhu et al. [74] | Sensitive APIs, Permission rate | Static analysis | TF-IDF, cosine similarity | 1,065 / 1,065 | 88.26% | Rotation Forest, SVM
Li et al. [71] | Permissions | Dynamic analysis | Multilevel data pruning (PRNR, SPR, PMAR) | 5,494 / 310,926 | 95.63% | FT, RF, Random Committee, SVM, Rotation Forest, PART
Salah et al. [79] | Symmetric patterns | Static analysis | FF-AF based on TF-IDF | 5,560 / 123,453 | 99% | SVM, Logistic regression, SGD, AdaBoost, LDA
Our study | APIs (with arguments, return type), Permissions | Static analysis | Gini importance-based method | 26,276 / 27,041 | 96.51% | RF
Peiravian et al. [19] employed three machine learning models, Bagging, J48 and Support Vector Machine (SVM), with API calls and permissions as features. They performed experiments using a total of 2,510 samples including 1,260 malicious and 1,250 benign apps, and the experiments demonstrated that Bagging achieved the best performance in classifying the datasets. They used a relatively small dataset compared to our work. Their scheme differs from ours in that it does not have a feature selection step. The reduced number of permissions and APIs makes our scheme perform efficiently.

Arp et al. [33] developed the machine learning technique called DREBIN, which resorted to static analysis for malware detection on Android mobile devices. From Android apps, DREBIN extracted APIs, permissions, hardware components, filtered intents, network addresses, etc. The extracted features were represented as strings and organized into eight different feature sets. They embedded the features into a high-dimensional vector space. After representing Android apps as feature vectors, DREBIN learned a linear SVM algorithm to classify them. A dataset of about 120,000 apps was used for training and detection. The evaluation results indicated that DREBIN could achieve a detection accuracy rate of 94% by incorporating numerous features. However, utilizing too many features can increase the computational overhead [71].

Li et al. [71] presented a permission usage-based malware detection system, SigPID. Through three levels of permission pruning methods, they identified 22 significant permissions. Then they experimented with SigPID using 67 machine learning models and found that Functional Tree (FT) yielded the highest recall with the shortest processing time. They also compared SigPID with other malware detection approaches such as DREBIN [33] and showed that SigPID+FT achieved a high detection rate in spite of a small number of features (22 permissions).

Aafer et al. [34] proposed DroidAPIMiner, which used API call information including parameter values. They deployed four classifiers: SVM, k-NN, C4.5, and ID3. They collected around 20,000 apps (3,987 malware and around 16,000 benign apps), and the classifiers achieved a high accuracy (up to 99%).

Chan et al. [72] also considered permissions and APIs. The authors selected permissions and API calls with a positive information gain. They conducted the experiments in WEKA using several machine learning algorithms. On 796 apps (621 benign and 175 malicious), the classifiers achieved an accuracy of 92.36%.

Qiao et al. [73] utilized the patterns of API calls and permissions. They considered APIs that are controlled by permissions. They classified benign and malicious apps using SVM, RBF kernels, Random Forest, and Artificial Neural Networks. Using 6,260 apps (5,000 benign and 1,260 malware), the classifiers with the feature selection achieved an accuracy of about 78~94%.

Zhu et al. [74] presented DroidDet. The information considered in this work is permission requests, APIs, permission rate, and monitored system events. They scored each feature through methods such as TF-IDF or cosine similarity to select the top features. At the classification stage, an ensemble classifier, Rotation Forest, is employed. With 2,130 samples (1,065 benign and 1,065 malware), the classifier achieves an accuracy of 88.26%, which is higher than SVM by 3.33% under the same experimental conditions.

Salah et al. [79] found symmetric features across malicious Android applications. They took into account different types of static features and chose the most important features to detect Android malware. They introduced a frequency-based feature selection method called feature frequency-application frequency (FF-AF) to reduce the feature space size, and merged Android app URLs into a single feature called the URL_score. The proposed method was evaluated using five machine learning classifiers with the DREBIN dataset. They used 349 features from six feature categories such as APIs, permissions, app components, etc. The linear SVM of the five classifiers showed the highest accuracy, up to 99%.

All the aforementioned studies selected features based on domain knowledge. For example, DREBIN [33] analyzed malware, selected relevant APIs, and used them as features. Other approaches selected feature(s) based on statistical analysis or data mining with domain knowledge [34, 71-74]. For example, in [34], after selecting APIs related to malicious behavior, the authors analyzed the frequency of APIs in normal apps and malware and selected the APIs with a large difference in frequency. In this paper, we select features using minimal domain knowledge, and then select the relevant features among them using a Gini importance-based method. Specifically, features are selected based on the decision-tree algorithm in Random Forest, which is a kind of statistical analysis method, and the experimental results before and after the analysis are presented. Most of all, we analyze the falsely classified apps and suggest future work.

Su et al. [80] constructed a behavioral portrait of Android malware to depict the behaviors of malware samples and detect them based on both static and dynamic analysis. They defined several dimensions of behavioral features to depict malware, and defined behavioral tags to generalize the meta-data of the features. They then analyzed the correlation of the behavior tags to construct a behavioral portrait of Android malware. Finally, a Random Forest algorithm was combined with the behavior portrait of malware for Android malware detection.

Alswaina et al. [78] reviewed the literature of the past 10 years related to Android malware families, surveying Android malware family detection, identification, and categorization techniques. The survey was conducted along three dimensions: analysis type (static, dynamic, hybrid), feature (static, dynamic),
and techniques (model-based, analysis-based). They spyware, etc. Our experiment results showed that many
introduced a new taxonomy that could categorize anti-malware products of VirusTotal could not
malware familial classification-related studies in terms detect grayware correctly. In order to correctly detect
of the three dimensions. The limitations of the related Android grayware using machine learning, it is
studies and future trends have been highlighted too. necessary to build reliable ground truth dataset for
A meta-classifier or classifier fusion approach current grayware. Therefore, we plan to construct a
extracts features from Android apps, trains several base reliable ground truth dataset for grayware in the future.
classifiers with the features, and collates their detection
results, and selects a final model [81-82]. The Acknowledgements
performance of this approach depends upon the
accuracy of individual base classifiers. If base
This research was supported by Basic Science
classifiers cannot detect malware accurately, the
Research Program through the National Research
performance of the final classification is limited. Hence,
Foundation of Korea (NRF) funded by the Ministry of
studies on effective base classifiers, like our work, are
Science and ICT (No. 2018R1A2B2004830).
significant.
6 Conclusions

In this paper, we proposed feature extraction and selection techniques that use API call and permission information as features of a machine learning model for classifying Android apps as malicious or benign efficiently and effectively. For the API call information, we used the class name, method name, arguments, and return data type of each method as features. Since Android apps contain a very large number of features, it is necessary to reduce the number of features. By combining a minimal domain knowledge-based method and a Gini importance-based method, we finally selected 405 APIs and 25 permissions out of 133,271 APIs and 474 permissions, respectively. We constructed a dataset that is balanced and large enough to build a generalized machine learning model. We downloaded the latest Android sample apps, 27,041 benign apps and 26,276 malware samples, from the AndroZoo dataset, and then conducted experiments on the sample apps. The experiment results showed that our technique achieved a classification accuracy of 96.51% using the features selected by the combined methods, and reduced the training time by 68.99% without degrading the classification accuracy.
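The Gini importance-based selection step can be sketched as follows. This is a minimal illustration, not our exact pipeline: scikit-learn's feature_importances_ attribute is the impurity-based (Gini, for the default criterion) importance [35, 52-53], the synthetic matrix X stands in for the API/permission features, and the cut-off of 30 features is an arbitrary placeholder for the thresholds used in the paper:

```python
# Minimal sketch of Gini importance-based feature selection with a
# random forest; X stands in for the API/permission feature matrix.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=200,
                           n_informative=20, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)

# feature_importances_ holds each feature's mean decrease in Gini
# impurity across all trees, normalized to sum to 1.
importances = rf.feature_importances_
top_k = 30  # placeholder cut-off; the real threshold is a design choice
keep = np.argsort(importances)[::-1][:top_k]
X_reduced = X[:, keep]
print(X_reduced.shape)  # (1000, 30)
```

Retraining on X_reduced instead of X is what shrinks the feature space and the training time, as long as the discarded low-importance features carry little signal.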
In addition, we demonstrated the generalization ability of our model by performing another experiment with a new test dataset, in which no app appears in the aforementioned dataset. This experiment achieved an accuracy of 96%, which implies that our model is not overfitted.

We finally investigated the 66 misclassified malicious apps and 142 misclassified benign apps in detail and discovered that the performance of our model can be degraded by code obfuscation, grayware, and cross-platform development tools. Especially, API hiding using Java reflection can be a major obstacle to Android malware detection based on API calls because it conceals the functional parts of a sample app by hiding the API calls in the app. Meanwhile, about 75% of the undetected malicious apps and 6.3% of the misclassified benign apps were grayware such as adware, spyware, etc. Our experiment results showed that many anti-malware products of VirusTotal could not detect grayware correctly. In order to detect Android grayware correctly using machine learning, it is necessary to build a reliable ground truth dataset for current grayware. Therefore, we plan to construct a reliable ground truth dataset for grayware in the future.

Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (No. 2018R1A2B2004830).

References

[1] Android Things Home Page, https://developer.android.com/things/get-started, March, 2020.
[2] M. Chibuye, J. Phiri, A Remote Sensor Network using Android Things and Cloud Computing for the Food Reserve Agency in Zambia, International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 8, No. 11, pp. 411-418, 2017.
[3] W. Song, H. Lee, S.-H. Lee, M.-H. Choi, M. Hong, Implementation of Android Application for Indoor Positioning System with Estimote BLE Beacons, Journal of Internet Technology (JIT), Vol. 19, No. 3, pp. 871-878, May, 2018.
[4] B. Sharma, M. S. Obaidat, Comparative analysis of IoT based products, technology and integration of IoT with cloud computing, IET Networks, Vol. 9, No. 2, pp. 43-47, March, 2020.
[5] J. Qi, P. Yang, M. Hanneghan, D. Fan, Z. Deng, F. Dong, Ellipse fitting model for improving the effectiveness of life-logging physical activity measures in an Internet of Things environment, IET Networks, Vol. 5, No. 5, pp. 107-113, September, 2016.
[6] T. Cho, H. Kim, J. H. Yi, Security Assessment of Code Obfuscation based on Dynamic Monitoring in Android Things, IEEE Access, Vol. 5, pp. 6361-6371, April, 2017.
[7] H. S. Ham, H. H. Kim, M. S. Kim, M. J. Choi, Linear SVM-based Android Malware Detection for Reliable IoT Services, Journal of Applied Mathematics, Vol. 2014, Article ID 594501, September, 2014.
[8] A. K. Sikder, H. Aksu, A. S. Uluagac, 6thSense: A context-aware sensor-based attack detector for smart devices, The 26th USENIX Security Symposium (USENIX Security 17), Vancouver, Canada, 2017, pp. 397-414.
[9] A. K. Sikder, H. Aksu, A. S. Uluagac, A context-aware framework for detecting sensor-based threats on smart devices, IEEE Transactions on Mobile Computing, Vol. 19, No. 2, pp. 245-261, February, 2020.
[10] E. B. Karbab, M. Debbabi, A. Derhab, D. Mouheb, MalDozer: Automatic framework for android malware detection using deep learning, Digital Investigation, Vol. 24, No. Supplement, pp. S48-S59, March, 2018.
[11] McAfee, McAfee Mobile Threat Report, https://www.mcafee.com/enterprise/en-us/assets/reports/rp-mobile-threat-report-2019.pdf, March, 2019.
[12] A. P. Felt, M. Finifter, E. Chin, S. Hanna, D. Wagner, A survey of mobile malware in the wild, Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, Chicago, Illinois, USA, 2011, pp. 3-14.
[13] M. Chandramohan, H. B. K. Tan, Detection of mobile malware in the wild, IEEE Computer, Vol. 45, No. 9, pp. 65-71, September, 2012.
[14] K. Shaerpour, A. Dehghantanha, R. Mahmod, Trends in android malware detection, Journal of Digital Forensics, Security and Law, Vol. 8, No. 3, pp. 21-40, 2013.
[15] S. H. Seo, A. Gupta, A. M. Sallam, E. Bertino, K. Yim, Detecting mobile malware threats to homeland security through static analysis, Journal of Network and Computer Applications, Vol. 38, pp. 43-53, February, 2014.
[16] M. Christodorescu, S. Jha, Static analysis of executables to detect malicious patterns, Technical Report, Computer Sciences Department, University of Wisconsin, 2006.
[17] R. W. Lo, K. N. Levitt, R. A. Olsson, MCF: A malicious code filter, Computers & Security, Vol. 14, No. 6, pp. 541-566, 1995.
[18] J. Sahs, L. Khan, A machine learning approach to android malware detection, IEEE European Intelligence and Security Informatics Conference, Odense, Denmark, 2012, pp. 141-147.
[19] N. Peiravian, X. Zhu, Machine learning for android malware detection using permission and api calls, IEEE 25th International Conference on Tools with Artificial Intelligence, Herndon, VA, USA, 2013, pp. 300-305.
[20] F. A. Narudin, A. Feizollah, N. B. Anuar, A. Gani, Evaluation of machine learning classifiers for mobile malware detection, Soft Computing, Vol. 20, No. 1, pp. 343-357, January, 2016.
[21] M. G. Schultz, E. Eskin, F. Zadok, S. J. Stolfo, Data mining methods for detection of new malicious executables, IEEE Symposium on Security and Privacy (S&P 2001), Oakland, CA, USA, 2001, pp. 38-49.
[22] Z. Markel, M. Bilzor, Building a machine learning classifier for malware detection, IEEE Second Workshop on Anti-malware Testing Research (WATeR), Canterbury, UK, 2014, pp. 1-4.
[23] J. Saxe, K. Berlin, Deep neural network based malware detection using two dimensional binary program features, IEEE 10th International Conference on Malicious and Unwanted Software (MALWARE), Fajardo, Puerto Rico, 2015, pp. 11-20.
[24] A. Feizollah, N. B. Anuar, R. Salleh, A. W. A. Wahab, A review on feature selection in mobile malware detection, Digital Investigation, Vol. 13, pp. 22-37, June, 2015.
[25] N. B. Anuar, H. Sallehudin, A. Gani, O. Zakari, Identifying false alarm for network intrusion detection system using hybrid data mining and decision tree, Malaysian Journal of Computer Science, Vol. 21, No. 2, pp. 101-115, December, 2008.
[26] M. Hassen, M. Carvalho, P. Chan, Malware classification using static analysis based features, IEEE Symposium Series on Computational Intelligence (SSCI), Honolulu, HI, USA, 2017, pp. 1-7.
[27] Z. Zhu, T. Dumitraş, Featuresmith: Automatically engineering features for malware detection by mining the security literature, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 2016, pp. 767-778.
[28] M. Ahmadi, D. Ulyanov, S. Semenov, M. Trofimov, G. Giacinto, Novel feature extraction, selection and fusion for effective malware family classification, Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, New Orleans, Louisiana, USA, 2016, pp. 183-194.
[29] S. Ranveer, S. Hiray, Comparative analysis of feature extraction methods of malware detection, International Journal of Computer Applications, Vol. 120, No. 5, pp. 1-7, June, 2015.
[30] S. Khalid, T. Khalil, S. Nasreen, A survey of feature selection and feature extraction techniques in machine learning, IEEE Science and Information Conference, London, UK, 2014, pp. 372-378.
[31] B. N. Narayanan, O. Djaneye-Boundjou, T. M. Kebede, Performance analysis of machine learning and pattern recognition algorithms for malware classification, IEEE National Aerospace and Electronics Conference (NAECON) and Ohio Innovation Summit (OIS), Dayton, OH, USA, 2016, pp. 338-342.
[32] Android developer, Dangerous permission groups, https://developer.android.com/guide/topics/permissions/overview#permission-groups and https://developer.android.com/training/permissions/requesting#normal-dangerous, March, 2019.
[33] D. Arp, M. Spreitzenbarth, M. Hübner, H. Gascon, K. Rieck, DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket, Network and Distributed System Security (NDSS), San Diego, California, USA, 2014, pp. 23-26.
[34] Y. Aafer, W. Du, H. Yin, DroidAPIMiner: Mining API-Level Features for Robust Malware Detection in Android, International Conference on Security and Privacy in Communication Systems, Sydney, NSW, Australia, 2013, pp. 86-103.
[35] B. H. Menze, B. M. Kelm, R. Masuch, U. Himmelreich, P. Bachert, W. Petrich, F. A. Hamprecht, A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data, BMC Bioinformatics, Vol. 10, No. 1, pp. 1-16, July, 2009.
[36] Y. Qi, Random forest for bioinformatics, in: C. Zhang, Y. Ma (Eds.), Ensemble Machine Learning, Springer US, 2012, pp. 307-323.
[37] K. Allix, T. F. Bissyandé, J. Klein, Y. L. Traon, AndroZoo: Collecting millions of android apps for the research community, IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), Austin, Texas, USA, 2016, pp. 468-471.
[38] E. Raff, J. Barker, J. Sylvester, R. Brandon, B. Catanzaro, C. K. Nicholas, Malware detection by eating a whole exe, Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, 2018, pp. 268-276.
[39] E. Raff, J. Sylvester, C. Nicholas, Learning the PE header, malware detection with minimal domain knowledge, Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, Dallas, Texas, USA, 2017, pp. 121-132.
[40] F. Wei, Y. Li, S. Roy, X. Ou, W. Zhou, Deep ground truth analysis of current android malware, International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Bonn, Germany, 2017, pp. 252-276.
[41] M. Linares-Vásquez, G. Bavota, C. Bernal-Cárdenas, M. Di Penta, R. Oliveto, D. Poshyvanyk, API change and fault proneness: a threat to the success of Android apps, Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, Saint Petersburg, Russia, 2013, pp. 477-487.
[42] Z. Salehi, M. Ghiasi, A. Sami, A miner for malware detection based on API function calls and their arguments, The 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP 2012), Shiraz, Fars, Iran, 2012, pp. 563-568.
[43] Z. Salehi, A. Sami, M. Ghiasi, MAAR: Robust features to detect malicious activity based on API calls, their arguments and return values, Engineering Applications of Artificial Intelligence, Vol. 59, pp. 93-102, March, 2017.
[44] Android Studio, SDK Platform release notes: Android 7.1 (API level 25), https://developer.android.com/studio/releases/platforms, January, 2020.
[45] Java Virtual Machine class file format - Method descriptors, https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.3.3, January, 2020.
[46] Manifest.permission, https://developer.android.com/reference/android/Manifest.permission, March, 2019.
[47] S. Liang, X. Du, Permission-combination-based scheme for android mobile malware detection, IEEE International Conference on Communications (ICC), Sydney, NSW, Australia, 2014, pp. 2301-2306.
[48] AndroGuard Home Page, https://github.com/androguard/androguard, March, 2020.
[49] Android AAPT - Android packaging tool to create .APK file, https://androidaapt.com/, January, 2020.
[50] Android Studio and Android SDK tools, https://developer.android.com/studio and https://developer.android.com/studio/command-line#tools-sdk, January, 2020.
[51] Wikipedia, Domain knowledge, https://en.wikipedia.org/wiki/Domain_knowledge, January, 2020.
[52] S. Ronaghan, The Mathematics of Decision Trees, Random Forest and Feature Importance in Scikit-learn and Spark, https://towardsdatascience.com/the-mathematics-of-decision-trees-random-forest-and-feature-importance-in-scikit-learn-and-spark-f2861df67e3, May, 2018.
[53] Scikit-learn, https://scikit-learn.org/, January, 2020.
[54] V. F. Rodriguez-Galiano, B. Ghimire, J. Rogan, M. Chica-Olmo, J. P. Rigol-Sanchez, An assessment of the effectiveness of a random forest classifier for land-cover classification, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 67, pp. 93-104, January, 2012.
[55] J. Ali, R. Khan, N. Ahmad, I. Maqsood, Random forests and decision trees, International Journal of Computer Science Issues (IJCSI), Vol. 9, No. 5, pp. 272-278, September, 2012.
[56] L. Li, J. Gao, M. Hurier, P. Kong, T. F. Bissyandé, A. Bartel, J. Klein, Y. L. Traon, AndroZoo++: Collecting millions of android apps and their metadata for the research community, arXiv preprint arXiv:1709.05281, https://arxiv.org/pdf/1709.05281.pdf, 2017.
[57] H. Cai, N. Meng, B. Ryder, D. Yao, DroidCat: Effective android malware detection and categorization via app-level profiling, IEEE Transactions on Information Forensics and Security, Vol. 14, No. 6, pp. 1455-1470, June, 2019.
[58] A. Hamidreza, N. Mohammed, Permission-based analysis of Android applications using categorization and deep learning scheme, MATEC Web of Conferences, Engineering Application of Artificial Intelligence Conference 2018 (EAAIC 2018), Sabah, Malaysia, 2018, Vol. 255, Article No. 05005, January, 2019.
[59] J. H. Park, H. J. Kim, Y. S. Jeong, S. J. Cho, S. C. Han, M. K. Park, Effects of Code Obfuscation on Android App Similarity Analysis, Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA), Vol. 6, No. 4, pp. 86-98, December, 2015.
[60] M. Backes, S. Bugiel, E. Derr, Reliable third-party library detection in android and its security applications, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 2016, pp. 356-367.
[61] VirusTotal – a free virus, malware and URL online scanning service, https://www.virustotal.com/, January, 2020.
[62] Xamarin homepage, https://dotnet.microsoft.com/apps/xamarin, January, 2021.
[63] Unity homepage, https://unity.com/, 2020.
[64] PhoneGap homepage, https://phonegap.com/, 2020.
[65] Titanium Mobile Development Environment, https://www.appcelerator.com/Titanium/, 2020.
[66] Cocos2D, https://cocos2d-x.org/, 2020.
[67] J. W. Shim, K. H. Lim, S. J. Cho, S. C. Han, M. K. Park, Static and Dynamic Analysis of Android Malware and Goodware Written with Unity Framework, Security and Communication Networks, Vol. 2018, Article ID 6280768, June, 2018.
[68] B. Zahran, S. Nicholson, A. Ali-gombe, Cross-Platform Malware: Study of the Forthcoming Hazard Adaptation and Behavior, Proceedings of the International Conference on Security and Management (SAM), The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), Las Vegas, Nevada, USA, 2019, pp. 91-94.
[69] P. Feng, J. Ma, C. Sun, X. Xu, Y. Ma, A novel dynamic Android malware detection system with ensemble learning,