Available online at www.sciencedirect.com
ScienceDirect
ICT Express 6 (2020) 325–331
www.elsevier.com/locate/icte
Ransomware Detection using Random Forest Technique
Ban Mohammed Khammas
Department of Computer Networks Engineering, College of Information Engineering, Al-Nahrain University, Baghdad, Iraq
Received 27 June 2020; received in revised form 18 September 2020; accepted 5 November 2020
Available online 11 November 2020
Abstract
Nowadays, ransomware has become a serious threat to the computing world and requires immediate attention to avoid financial and moral blackmail. There is therefore a real need for a new method that can detect and stop this type of attack. Most previous detection methods followed a dynamic analysis technique, which involves a complicated process. The present study proposes a novel method based on static analysis to detect ransomware. The main characteristic of the proposed method is that it dispenses with the disassembly process by extracting features directly from the raw bytes using frequent pattern mining, which remarkably increases the detection speed. The Gain Ratio technique was used for feature selection, which showed that 1000 features was the optimal number for the detection process. The study used a random forest classifier with a comprehensive analysis of the effect of both the number of trees and the seed number on ransomware detection. The results showed that 100 trees with a seed number of 1 achieved the best results in terms of time consumption and accuracy. The experimental evaluation revealed that the proposed method can achieve a high accuracy of 97.74% in detecting ransomware.
© 2020 The Korean Institute of Communications and Information Sciences (KICS). Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords: Ransomware detection; Machine learning; Random forest; Cyber security
1. Introduction

Nowadays, attackers use intelligent techniques to generate new and profitable types of malware. One of these attacks, which has spread widely in recent years, is ransomware. Unlike other security problems, ransomware is irreversible and difficult to stop [1]. The strategy of this malware is to restrict access to user files by encrypting them and to demand a ransom in exchange for the decryption key. According to Symantec Corporation (2016), users are forced to pay hundreds of millions of dollars in ransom every year. In 2016, Osterman Research, Inc. [2] conducted a survey of about 290 organizations from various industrial sectors in Europe and the United States. The survey revealed that 50% of them had been victims of ransomware within one year, and about 40% of these victims paid the attackers. In addition, a statistics report from VirusTotal (https://www.virustotal.com/en/statistics) stated that around 1.37 million new cyber-attack samples were submitted in February 2017 [3].

The essential difference between malware and ransomware lies in the time taken for the attack and in the attack behavior. Malware typically hides behind applications and then infects and damages the computer without demanding a ransom [4]. According to Chittooparambil et al. [1], none of the existing methods can reliably detect and stop this type of attack. Weckstén et al. [5] and Kharraz et al. [6] also confirmed the difficulty of stopping this type of attack. Therefore, there is an urgent need for a new technique that can detect ransomware.

This article investigates a machine learning technique for classifying ransomware using random forest and features extracted from the raw bytes of the file. Different numbers of seeds and trees were tested experimentally in order to design the random forest classifier that detects ransomware most accurately.

The remainder of this paper is organized as follows: Section 2 describes previous work in the literature on ransomware detection; Section 3 describes the proposed method; Section 4 describes the dataset collected for the experiments; Section 5 presents the experimental results; and Section 6 concludes the paper.
E-mail address: [email protected].
Peer review under responsibility of The Korean Institute of Communications and Information Sciences (KICS).
https://doi.org/10.1016/j.icte.2020.11.001
2405-9595/© 2020 The Korean Institute of Communications and Information Sciences (KICS). Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

2. Related works

The ransomware attack was launched in September 2013 using RSA public-key cryptography. In 2016, this attack turned
into a global issue when more than 1,400,000 Kaspersky users in various sectors were attacked (Kaspersky Security Bulletin, 2017). In a single day in 2017, about 400,000 machines in 150 countries were infected by the "WannaCry" ransomware (Crowe). Consequently, in the last few years many cyberspace researchers have paid significant attention to ransomware detection.

In general, detection techniques fall into three categories: the first uses dynamic analysis, the second uses static analysis, and the third uses a hybrid system that combines dynamic and static analysis.

Most ransomware detection solutions rely on behavioral detection, which is called "dynamic analysis" [6–9]. Takeuchi et al. [7] detected ransomware based on dynamic analysis using an SVM classifier. They first extracted a specific ransomware feature, the Application Programming Interface (API) call, and then studied the API call history and its behavior using Cuckoo Sandbox. The API calls are represented by q-gram vectors. They used 276 ransomware and 312 goodware files, and the results showed an accuracy of 97.48% in detecting ransomware using SVM. Vinayakumar et al. [8] proposed a new method that uses dynamic analysis to collect API sequences from a sandbox. In their experiment, they downloaded seven ransomware families and used a multilayer perceptron (MLP) to classify ransomware and goodware. This method achieved an accuracy of 98%. Kharraz et al. [6] utilized a dynamic analysis system called UNVEIL to detect ransomware. The system creates an artificial but realistic execution environment to detect ransomware and achieved an accuracy of around 96.3%. Homayoun et al. [9] introduced a ransomware detection framework that uses Sequential Pattern Mining to generate candidate features, which are then fed to machine learning techniques (MLP, Bagging, Random Forest, and J48) for classification. The results showed an accuracy of about 99% for ransomware detection.

Weckstén et al. [5] analyzed the behavior of four types of crypto ransomware in a virtual machine running the Windows 7 operating system. The authors used the Process Monitor software and Regshot to track process activity, registry manipulation, and file system activity. They asserted that crypto ransomware attacks essentially depend on the vssadmin.exe file; therefore, users should avoid access to vssadmin.exe in order to prevent this attack. Tseng et al. [10] used a deep learning method to analyze ransomware behavior from the headers of network packets. Chen et al. [11] proposed a technique based on a generative adversarial network (GAN) that can automatically produce dynamic features.

Since dynamic analysis executes the ransomware, it has a high accuracy rate. However, this analysis takes a relatively long time to process and analyze the sample, and by that time the malicious payload has likely already been delivered [12]. Moreover, if the ransomware fingerprints the analysis environment, significant API sequences cannot be extracted [13].

A few analysts have proposed methods based on static analysis to detect ransomware attacks. A recent study by Zhang et al. [13] used opcodes as features for ransomware detection. Their method transformed opcode sequences into N-gram sequences and then weighted them using Term Frequency-Inverse Document Frequency (TF-IDF). Five machine learning methods were used to distinguish between ransomware and goodware: Decision Tree, Random Forest, K-Nearest Neighbor, Naive Bayes, and Gradient Boosting. The best accuracy, 91.43%, was obtained using random forest. Baldwin and Dehghantanha [14] used static analysis to detect ransomware. They extracted opcode characteristics as the input features of an SVM classifier, using the WEKA machine learning toolset. The best accuracy obtained was about 96.5% on five crypto ransomware families.

Some researchers used a hybrid technique that combines dynamic and static analysis to detect ransomware. Subedi et al. [15] utilized dynamic and static analysis at three different levels: assembly, function calls, and libraries. They also designed CRSTATIC, an analysis tool that builds signatures to identify ransomware families using reverse engineering. Shaukat and Ribeiro [16] introduced a Strong Trap Layer using a hybrid system of dynamic and static analysis with machine learning techniques. They utilized 74 samples from 12 cryptographic ransomware families, and the results showed a detection rate of around 98.25% using the Gradient Tree Boosting algorithm. Ferrante et al. [17] proposed a hybrid approach to Android ransomware detection that combines dynamic and static analysis. The dynamic detection method considers memory usage, system call statistics, CPU usage, and network usage, while the static detection method uses the frequency of opcodes.

The honeypot is another technique that many researchers have used to detect ransomware. Moore [18] used a honeypot folder to monitor the changes occurring in the folder. Other researchers developed special tools to detect ransomware. Kolodenker et al. [19] proposed the PayBreak tool, which stores cryptographic encryption keys in a key vault; these keys are used to decrypt affected files after a ransomware attack. In another work, Scaife et al. [20] proposed the CryptoDrop system, which alerts the user during suspicious file activity using a set of behavior indicators. Continella et al. [21] introduced the ShieldFS system, which scans process memory and searches for any indication of cryptographic activity.

To the author's knowledge, no previous work has tried to detect ransomware using static analysis of byte-level features. In the present study, byte-level static analysis is utilized to overcome the shortcomings of dynamic analysis. The features are extracted directly from the raw bytes of the executable file, and then frequent pattern mining is applied. For classification, the Random Forest machine learning classifier is used to classify ransomware and goodware files.

3. The proposed method

This article provides a novel framework to solve the problem of detecting ransomware attacks using static analysis and
one of the prominent and robust machine learning techniques, the random forest classifier. This classifier has demonstrated better results than other classifiers in detecting different types of attacks [13,22]. In addition, this classifier has several advantages [23]:
- Few input parameters are needed.
- The algorithm is resistant to overfitting.
- The variance decreases as the number of trees increases, without introducing bias.
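The majority-voting principle behind these properties, and the two parameters studied in Section 5.2 (number of trees and random seed), can be illustrated with a minimal sketch. It is not the WEKA RandomForest used in this study: as a simplifying assumption, each tree is a depth-1 decision stump trained on a bootstrap sample of binary (present/absent) features, with a random subset of about log2(d)+1 features per tree. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def train_stump(X, y, feature_ids):
    """Depth-1 tree: choose the binary feature in feature_ids with the fewest
    training errors; each branch predicts its majority label."""
    overall = Counter(y).most_common(1)[0][0]  # fallback label for an empty branch
    best = None
    for f in feature_ids:
        branches = {0: Counter(), 1: Counter()}
        for row, label in zip(X, y):
            branches[row[f]][label] += 1
        pred = {v: (branches[v].most_common(1)[0][0] if branches[v] else overall)
                for v in (0, 1)}
        errors = sum(pred[row[f]] != label for row, label in zip(X, y))
        if best is None or errors < best[0]:
            best = (errors, f, pred)
    return best[1], best[2]  # (feature index, {branch value: label})


def train_forest(X, y, n_trees=100, seed=1):
    """Bagged stumps; n_trees and seed mirror the parameters studied in 5.2."""
    rng = random.Random(seed)  # one seed controls every random choice below
    n, d = len(X), len(X[0])
    m = int(math.log2(d)) + 1  # number of random features examined per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        feats = rng.sample(range(d), m)             # random feature subset for this tree
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = Counter(pred[row[f]] for f, pred in forest)
    return votes.most_common(1)[0][0]
```

Because the seed drives both the bootstrap samples and the feature subsets, calling `train_forest` twice with the same `seed` returns the same forest, and adding trees mainly adds voters, which is one way to see why variance drops as the number of trees grows.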
The strategy adopted in the present study is based on extracting the hierarchical features of ransomware families, since each ransomware family has common features [1,9]. Therefore, byte-level static analysis is utilized, in which the features are extracted directly from the raw bytes of the executable file as n-gram features. Moreover, direct feature extraction is faster and more straightforward because it operates at the byte level [24–26].
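Byte-level n-gram extraction needs no disassembler: the raw file is read as bytes and every window of n consecutive bytes becomes a feature. The following is a minimal Python sketch, assuming a stride of one byte for the 32-bit (4-gram) window and hex strings as feature names; the paper does not state these details, and the function names are illustrative.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 4) -> Counter:
    """Count every window of n consecutive bytes (n=4 gives a 32-bit window).

    Assumes a stride of one byte; each window is stored as a hex string.
    """
    return Counter(data[i:i + n].hex() for i in range(len(data) - n + 1))


def file_ngrams(path: str, n: int = 4) -> Counter:
    """Read a file as raw bytes (no disassembly) and count its n-grams."""
    with open(path, "rb") as fh:
        return byte_ngrams(fh.read(), n)
```

For example, the first bytes of a PE file, `b"MZ\x90\x00\x03"`, yield the two overlapping 4-grams `4d5a9000` and `5a900003`.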
The preprocessing includes three steps, namely feature extraction from raw bytes, frequent pattern mining, and normalization. Feature extraction is performed in a virtual machine (VM) using a 32-bit sliding window (4-gram features), which previous work associates with high detection accuracy [27–30]. In the frequent pattern mining step, frequent patterns, i.e., items of interest that recur across the data, are extracted from the database of extracted features. The last step is normalization, in which all frequent patterns are given equal weight for variance stabilization [31,32].

Fig. 1. The flow diagram of the proposed method, comprising the preprocessing, feature selection, and classification stages used in the experiment.
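The mining and normalization steps can be sketched as follows. The paper does not name the mining algorithm or its support threshold, so this sketch assumes the simplest form of frequent pattern mining: an n-gram is kept if it occurs in at least a fraction `min_support` of the files. Normalization then divides each retained pattern's count by the total number of n-grams in the file, which is how this sketch reads the normalization of Eq. (1). The threshold value and function names are hypothetical.

```python
from collections import Counter


def frequent_patterns(per_file_counts, min_support=0.1):
    """Return the n-grams present in at least min_support of the files."""
    doc_freq = Counter()
    for counts in per_file_counts:
        doc_freq.update(counts.keys())  # each pattern counted once per file
    threshold = min_support * len(per_file_counts)
    return {p for p, df in doc_freq.items() if df >= threshold}


def normalize(counts, patterns):
    """N_f = n_ij / sum_k n_kj for each retained pattern i of file j."""
    total = sum(counts.values())  # all n-gram occurrences in this file
    return {p: (counts.get(p, 0) / total if total else 0.0) for p in sorted(patterns)}
```

Each file's normalized vector over the retained patterns is what the feature selection stage then ranks.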
The normalized frequency is computed according to Eq. (1):

$$N_f = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \tag{1}$$

where $N_f$ is the normalized frequency, $n_{i,j}$ is the frequency of a specific feature $i$ in file $j$, and $\sum_{k} n_{k,j}$ is the total number of features in file $j$.

In the second stage, Gain Ratio (GR) was chosen as the feature selection method. The role of feature selection is to reduce the feature dimensionality by removing irrelevant features and selecting only the most important features for the predictive model.

The most important stage following feature selection is classification. The current model uses the random forest classifier, a supervised learning technique that has been widely adopted in many studies [9,33]. One significant advantage of this classifier is its low time consumption; in addition, it can classify large amounts of data with a low error rate [22,34].

The random forest prediction is based on majority voting over the predictions of multiple decision trees. Each decision tree is constructed from the best combination of variables, and the dataset is then split into subtrees; however, finding the right combination of variables is not an easy task [35]. Random Forest can also be trained efficiently without feature scaling [36]; therefore, it was selected for the proposed technique (see Fig. 1).

4. Dataset

The dataset consists of 1680 executable files: 840 ransomware executables of different families and 840 goodware files. The Windows Portable Executable (PE32) ransomware files comprise three different families [9]: Cerber (267 samples), TeslaCrypt (315 samples), and Locky (258 samples), which were downloaded from VirusTotal [37]. The goodware files include two types of executable files: the first type was collected from the Windows platform, while the other was collected from the PortableApps platform [38]. Both ransomware and goodware files were checked using virustotal.com, a free tool that determines whether a file is goodware or ransomware.

The present method was implemented on a computer with an 8-core Core i7 CPU and 16 GB RAM running two systems: Windows 10 and Linux 4.1.

5. Experimental results and discussion

The first step in the experiments is dividing the preprocessed data into two groups, training and testing. Each group contains 50% of the dataset (840 files), and to avoid class imbalance both groups contain the same number of randomly selected goodware and ransomware files (420 of each).

The experimental analysis used different numbers of trees and different seed values. WEKA (a GUI-based machine learning tool) was used to find the random forest design that provides high accuracy in detecting ransomware attacks. Common machine learning performance evaluation metrics, namely False Negative Rate (FNR), False Positive Rate (FPR), True Negative Rate (TNR), True Positive Rate (TPR), Accuracy, and F-Measure (the harmonic mean of precision and recall) [39], are used to
evaluate the efficiency of the proposed method, as given in the following equations:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$

$$\mathrm{FNR} = \frac{FN}{TP + FN}, \qquad \mathrm{TNR} = \frac{TN}{TN + FP}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

$$\text{F-Measure} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where:
True Positive (TP): the number of ransomware files correctly predicted as ransomware.
True Negative (TN): the number of goodware files correctly classified as goodware.
False Positive (FP): the number of goodware files misclassified as ransomware.
False Negative (FN): the number of ransomware files misclassified as goodware.

Fig. 2. The accuracy for different feature dimensions.
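As a worked check of these definitions, the 1000-feature row of Table 1 can be turned back into counts. Assuming the 840-file test set described above (420 ransomware and 420 goodware files), the reported rates correspond to roughly TP = 419, FN = 1, FP = 18, and TN = 402; these counts are inferred here, not reported in the paper. Plugging them into the equations reproduces the reported accuracy of 97.74%:

```python
def metrics(tp, fn, fp, tn):
    """Evaluation measures defined above, computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to TPR
    return {
        "TPR": recall,
        "FNR": fn / (tp + fn),
        "FPR": fp / (fp + tn),
        "TNR": tn / (tn + fp),
        "Precision": precision,
        "Recall": recall,
        "Accuracy": (tp + tn) / (tp + fp + tn + fn),
        "F-Measure": 2 * precision * recall / (precision + recall),
    }


# Counts inferred (not reported) from the 1000-feature row of Table 1, 420 files per class.
m = metrics(tp=419, fn=1, fp=18, tn=402)
print(f"Accuracy = {m['Accuracy']:.2%}")  # Accuracy = 97.74%
```

The F-Measure computed here is for the ransomware class only. WEKA's output also includes a class-weighted average row, so the values plotted in Fig. 3 may have been computed differently.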
5.1. The effect of feature dimension

To find an effective feature dimension for building the classifier, various numbers of features ranging from 1000 to 7000 were tested with a seed number of 1 and 100 trees. Smaller feature dimensions (from 100 up to 1000) are not included in these results because they gave a very poor detection rate.
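The feature-dimension sweep can be read as ranking the features by Gain Ratio and keeping the top k (k = 1000 to 7000). Gain Ratio is the information gain of a feature with respect to the class divided by the feature's split information (the entropy of its own values). The sketch below assumes binary presence/absence features for simplicity; real-valued features such as the normalized frequencies of Eq. (1) would first need to be discretized. Function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(column, labels):
    """Information gain of one feature divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(column)  # entropy of the feature's own value distribution
    return info_gain / split_info if split_info > 0 else 0.0


def select_top_k(X, y, k):
    """Indices of the k features with the highest Gain Ratio."""
    scores = [gain_ratio([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]
```

Dividing by split information penalizes features that fragment the data into many small groups, which plain information gain tends to favor.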
Figs. 2 and 3 and Table 1 show the classifier accuracy, the machine learning performance criteria, and the classifier confusion-matrix rates, respectively. Fig. 2 depicts the variation of accuracy with feature dimension. The results show that the 1000-feature dimension gives the best accuracy of 97.74%. Meanwhile, Fig. 2 reveals that increasing the number of features does not enhance the classifier accuracy [27]. Fig. 3 illustrates the ROC, Recall, Precision, and F-Measure of the classification model. The performance with 1000 features is the best not only in terms of Recall but also in terms of Precision. The F-measure for 1000 features is above 97.8% and the ROC area is about 99.6%. Table 1 presents the confusion-matrix rates of the present model, which show that the 1000-feature dimension has the best values of FPR, FNR, TPR, and TNR, at 0.043, 0.002, 0.998, and 0.957, respectively. Accordingly, these results confirm that 1000 features is the best size for the classification model, and all remaining experiments adopt this size.

Fig. 3. Recall, F-Measure, Precision, and ROC for different feature dimensions.

Table 1
The TPR, TNR, FPR, and FNR for different feature dimensions.
Feature dimension   FPR     FNR     TPR     TNR
1000                0.043   0.002   0.998   0.957
2000                0.045   0.002   0.998   0.955
3000                0.05    0.002   0.998   0.95
4000                0.055   0.002   0.998   0.945
5000                0.052   0.005   0.995   0.948
6000                0.057   0.005   0.995   0.943
7000                0.143   0.005   0.995   0.857
5.2. The effect of tree and seed numbers

Regarding the effect of the numbers of trees and seeds, the current study tested forests of 10–1000 trees and seed values of 1–1000. In the first test, the seed number was fixed at 1 and the number of trees was varied from 10 to 1000 while measuring the classification time, as shown in Fig. 4. It is evident from this figure that the classification time is directly proportional to the number of trees. The crucial question is therefore which number of trees provides high accuracy with an acceptable classification time.

To resolve this issue, several experiments were conducted to find the best number of trees in terms of accuracy, recall, F-Measure, precision, and ROC value for 10 to 100 trees, as shown in Fig. 5, in addition to the confusion-matrix analysis in Table 2. The results clearly show that 100 is the best number of trees with respect to accuracy, Recall, F-Measure, precision, ROC, FPR, and TNR, with a reasonable time of 1.37 s. The remaining tree numbers (200 to 1000) are not included in these results because they give the same accuracy, recall, F-Measure, precision, and ROC as 100 trees but take more time.

Fig. 4. The classification time using 1000 features on the test dataset for different numbers of trees (10 to 1000) in the random forest classifier.
Fig. 5. Accuracy, Recall, F-Measure, Precision, and ROC for different numbers of trees.

Table 2
The TPR, TNR, FPR, and FNR for different numbers of trees in the random forest classifier.
No. of trees   FPR     FNR     TPR     TNR
10             0.055   0.005   0.995   0.945
20             0.048   0.005   0.995   0.952
30             0.048   0.002   0.998   0.952
40             0.048   0.002   0.998   0.952
50             0.045   0.002   0.998   0.955
60             0.045   0.002   0.998   0.955
70             0.045   0.002   0.998   0.955
80             0.045   0.002   0.998   0.955
90             0.045   0.002   0.998   0.955
100            0.043   0.002   0.998   0.957

To find the effect of the seed number on classifier performance, the number of trees was kept at 100 while the seed number was varied from 1 to 1000. The results show that the best accuracy, 97.74%, is obtained with a seed number of 1, as illustrated in Fig. 6. This result is in line with [40], which mentioned that in most cases choosing one seed is efficient.

Fig. 6. The accuracy of the random forest classifier using different seed numbers.

To demonstrate the ability of the RF classifier to detect ransomware attacks, three other classifiers were used for comparison, namely AdaBoost M1, Bagging, and Rotation Forest. The confusion-matrix rates (FNR, FPR, TNR, and TPR) in Table 3 and the standard classification measures (accuracy, Recall, Precision, ROC, and F-Measure) in Fig. 7 show that RF outperforms AdaBoost M1 on all four rates and outperforms Bagging on FNR and TPR, although Bagging has a slightly lower FPR (0.035 versus 0.043). Meanwhile, the results of Rotation Forest are very close to those of RF. However, Rotation Forest needs a much longer time to build the model (about 25.63 s), whereas RF needs only about 1.3 s. It can therefore be deduced that RF is also more efficient than Rotation Forest in terms of time consumption.

Table 3
The confusion-matrix rates for different classifiers.
Classifier type   FPR     FNR     TPR     TNR
AdaBoost M1       0.05    0.14    0.86    0.95
Bagging           0.035   0.062   0.938   0.965
Rotation Forest   0.026   0.031   0.969   0.974
RF                0.043   0.002   0.998   0.957

To create a basis for comparing the metrics estimated by k-fold cross-validation with those obtained by testing on an unseen dataset, the present study also applied 10-fold cross-validation to the entire dataset of 1680 samples. This method iteratively trains the classifier on 90% of the data and tests it on the remaining 10%. The results are determined after 10 iterations by averaging the accuracy of all models. The standard classification measures and confusion-matrix results using 10-fold cross-validation are illustrated in Fig. 8 and Table 4, respectively.

The present study is also compared with the results of Zhang et al. [13], who represented opcodes as n-gram features. This technique needs a disassembler to extract
opcodes from the file, whereas the present method eliminates the disassembly process by extracting the features directly from the raw data. Table 5 shows the comparison between the proposed technique and that of Zhang et al. [13]. The comparison shows that the present method achieves higher accuracy with a shorter prediction time, although the two methods were evaluated on different datasets.

Fig. 7. The standard classification measures of different classifiers.
Fig. 8. The standard classification measures of different classifiers when 10-fold cross-validation is used.

Table 4
The confusion-matrix rates for 10-fold cross-validation of different classifiers.
Classifier type   FPR     FNR     TPR     TNR
AdaBoost M1       0.045   0.038   0.962   0.955
Bagging           0.026   0.019   0.981   0.974
Rotation Forest   0.01    0.014   0.986   0.99
RF                0.006   0.007   0.993   0.994

Table 5
The comparison between the proposed technique and the technique of Zhang et al. [13].
Method              Features        Dataset (ransomware/goodware)   Accuracy (%)   Prediction time (s)
Proposed method     Byte level      840/840                         97.74          1.37
Zhang et al. [13]   Opcode-based    1787/100                        91.43          7.27

6. Conclusion

This study presented an approach based on a machine learning technique (the random forest classifier) to detect ransomware attacks. The study tested numbers of trees ranging from 10 to 1000 and seed values ranging from 1 to 1000. The following conclusions can be drawn:
1- The experiments show excellent performance of the random forest classifier with byte-level static analysis for ransomware attack detection.
2- The analysis shows that 100 trees with a seed number of 1 achieved a high accuracy (97.74%), a high ROC area (about 99.6%), a low FPR (around 0.04), and a low FNR (around 0.002) in just 1.37 s of detection time.
3- Feature dimensions below 1000 (starting from 100) showed a poor detection rate, and increasing the number of features beyond 1000 led to a degradation in accuracy.
4- Numbers of trees from 200 to 1000 showed the same performance as 100 trees.

CRediT authorship contribution statement
Ban Mohammed Khammas: Conceptualization, Methodology, Software, Data collection and curation, Writing - original draft, Visualization, Investigation, Validation, Writing - review & editing.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References
[1] H.J. Chittooparambil, et al., A review of ransomware families and detection methods, in: International Conference of Reliable Information and Communication Technology, 2018, pp. 588–597.
[2] Osterman Research, Inc., Understanding the depth of the global ransomware problem, 2016, http://www.malwarebytes.com/pdf/white-papers/UnderstandingTheDepthOfRansomwareIntheUS.pdf.
[3] P. Burnap, et al., Malware classification using self organising feature maps and machine activity data, Comput. Secur. 73 (2018) 399–410.
[4] X. Luo, Q. Liao, Awareness education as the key to ransomware prevention, Inf. Syst. Secur. 16 (2007) 195–202.
[5] M. Weckstén, et al., A novel method for recovery from crypto ransomware infections, in: 2016 2nd IEEE International Conference on Computer and Communications (ICCC), 2016, pp. 1354–1358.
[6] A. Kharraz, et al., UNVEIL: A large-scale, automated approach to detecting ransomware, in: 25th USENIX Security Symposium (USENIX Security 16), 2016, pp. 757–772.
[7] Y. Takeuchi, et al., Detecting ransomware using support vector machines, in: Proceedings of the 47th International Conference on Parallel Processing Companion, 2018, p. 1.
[8] R. Vinayakumar, et al., Evaluating shallow and deep networks for ransomware detection and classification, in: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2017, pp. 259–265.
[9] S. Homayoun, et al., Know abnormal, find evil: frequent pattern mining for ransomware threat hunting and intelligence, IEEE Trans. Emerg. Top. Comput. (2017).
[10] A. Tseng, et al., Deep learning for ransomware detection, IEICE Tech. Rep. 116 (2016) 87–92.
[11] L. Chen, et al., Towards resilient machine learning for ransomware detection, 2018, arXiv preprint arXiv:1812.09400.
[12] M. Rhode, et al., Early-stage malware prediction using recurrent neural networks, Comput. Secur. 77 (2018) 578–594.
[13] H. Zhang, et al., Classification of ransomware families with machine learning based on N-gram of opcodes, Future Gener. Comput. Syst. 90 (2019) 211–221.
[14] J. Baldwin, A. Dehghantanha, Leveraging support vector machine for opcode density based detection of crypto-ransomware, Cyber Threat Intell. (2018) 107–136.
[15] K.P. Subedi, et al., Forensic analysis of ransomware families using static and dynamic analysis, in: 2018 IEEE Security and Privacy Workshops (SPW), 2018, pp. 180–185.
[16] S.K. Shaukat, V.J. Ribeiro, RansomWall: A layered defense system against cryptographic ransomware attacks using machine learning, in: 2018 10th International Conference on Communication Systems & Networks (COMSNETS), 2018, pp. 356–363.
[17] A. Ferrante, et al., Extinguishing ransomware - a hybrid approach to android ransomware detection, in: International Symposium on Foundations and Practice of Security, 2017, pp. 242–258.
[18] C. Moore, Detecting ransomware with honeypot techniques, in: 2016 Cybersecurity and Cyberforensics Conference (CCC), 2016, pp. 77–81.
[19] E. Kolodenker, et al., PayBreak: defense against cryptographic ransomware, in: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, 2017, pp. 599–611.
[20] N. Scaife, et al., Cryptolock (and drop it): stopping ransomware attacks on user data, in: 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS), 2016, pp. 303–312.
[21] A. Continella, et al., ShieldFS: a self-healing, ransomware-aware filesystem, in: Proceedings of the 32nd Annual Conference on Computer Security Applications, 2016, pp. 336–347.
[22] I. Ahmad, et al., Performance comparison of support vector machine, random forest, and extreme learning machine for intrusion detection, IEEE Access 6 (2018) 33789–33795.
[23] Y. Meidan, et al., Detection of unauthorized IoT devices using machine learning techniques, 2017, arXiv preprint arXiv:1709.04647.
[24] I. Santos, et al., N-grams-based file signatures for malware detection, ICEIS (2) 9 (2009) 317–320.
[25] M.G. Schultz, et al., Data mining methods for detection of new malicious executables, in: Proceedings 2001 IEEE Symposium on Security and Privacy. S & P 2001, 2000, pp. 38–49.
[26] B.M. Khammas, et al., First line defense against spreading new malware in the network, in: 2018 10th Computer Science and Electronic Engineering (CEEC), 2018, pp. 113–118.
[27] B.M. Khammas, et al., Metamorphic malware detection based on support vector machine classification of malware sub-signatures, TELKOMNIKA (Telecommun. Comput. Electron. Control) 14 (2016).
[28] B.M. Khammas, et al., Feature selection and machine learning classification for malware detection, J. Teknol. 77 (2015).
[29] B.M. Khammas, et al., Pre-filters in-transit malware packets detection in the network, Telkomnika 17 (2019).
[30] I. Ismail, et al., Incorporating known malware signatures to classify new malware variants in network traffic, Int. J. Netw. Manage. 25 (2015) 471–489.
[31] I. Santos, et al., Opcode sequences as representation of executables for data-mining-based unknown malware detection, Inform. Sci. 231 (2013) 64–82.
[32] M. Shankarpani, et al., Computational intelligent techniques and similarity measures for malware classification, in: Computational Intelligence for Privacy and Security, Springer, 2012, pp. 215–236.
[33] K. Singh, et al., Big data analytics framework for peer-to-peer botnet detection using random forests, Inform. Sci. 278 (2014) 488–497.
[34] K. Singh, B. Nagpal, Random forest algorithm in intrusion detection system: a survey, 2018.
[35] D. Bhalla, Random forest tutorial, 2014, Available: http://www.listendata.com/2014/11/random-forest-with-r.html.
[36] H. Takase, et al., A prototype implementation and evaluation of the malware detection mechanism for IoT devices using the processor information, Int. J. Inf. Secur. 19 (2020) 71–81.
[37] Virus Total - Intelligence Search Engine, http://www.virustotal.com.
[38] https://portableapps.com/apps.
[39] H. Hashemi, et al., Graph embedding as a new approach for unknown malware detection, J. Comput. Virol. Hacking Tech. 13 (2017) 153–166.
[40] M.A.M. Hasan, et al., Feature selection for intrusion detection using random forest, J. Inf. Secur. 7 (2016) 129.