Module 1 The Role of Machine Learning in Cyber Security
Machine Learning (ML) represents a pivotal technology for current and future information systems, and many domains
already leverage the capabilities of ML. However, deployment of ML in cybersecurity is still at an early stage, revealing a
significant discrepancy between research and practice. Such a discrepancy has its root cause in the current state of the art,
which does not allow us to identify the role of ML in cybersecurity. The full potential of ML will never be unleashed unless
its pros and cons are understood by a broad audience.
This article is the first attempt to provide a holistic understanding of the role of ML in the entire cybersecurity domain—to
any potential reader with an interest in this topic. We highlight the advantages of ML with respect to human-driven detection
methods, as well as the additional tasks that can be addressed by ML in cybersecurity. Moreover, we elucidate various intrin-
sic problems affecting real ML deployments in cybersecurity. Finally, we present how various stakeholders can contribute
to future developments of ML in cybersecurity, which is essential for further progress in this field. Our contributions are
complemented with two real case studies describing industrial applications of ML as defense against cyber-threats.
Additional Key Words and Phrases: Cybersecurity, incident detection, machine learning, artificial intelligence
1 INTRODUCTION
With the rising complexity of modern information systems and the resulting ever-increasing flow of big data,
the benefits of Artificial Intelligence (AI) are now widely recognized. Specifically, Machine Learning (ML)
methods [85] are already deployed to solve diverse real world tasks—especially with the advent of deep learn-
ing [98]. Fascinating examples of practical achievements of ML are machine translation [168], travel and vacation
recommendations [77], object detection and tracking [139], and even various applications in healthcare [57]. Fur-
thermore, ML is rightly considered to be a technology enabler, as it has shown great potential in the context of
telecommunication systems [114] or autonomous driving [8].
Nevertheless, modern society is increasingly relying on Information Technology (IT) systems—including
autonomous ones—which are also actively leveraged by malicious entities. Digital threats are, in fact, continu-
ously evolving [90], and according to Gartner attackers will have sufficient capabilities to harm or kill humans
by 2025 [3]. To prevent such incidents and mitigate the plethora of risks that can target current and future IT
systems, defensive mechanisms require the capability to quickly adapt to the (i) mutating environments and
(ii) dynamic threat landscape.
Coping with such a twofold requirement via static and human-defined methods is clearly unfeasible, and
deployment of ML in cybersecurity is inescapable. Not surprisingly, abundant work has addressed the integration of ML
in cybersecurity, as evidenced by recent survey papers (e.g., References [23, 36, 71]) and technical reports (e.g.,
References [35, 106]). Despite impressive results in research settings, however, the development and integration
of ML in production environments is progressing at a slow pace. A recent survey [93] shows that although over 90% of companies already use some AI/ML in their defensive tools, most of these solutions still leverage “unsupervised” methods (e.g., References [2, 97]) and mostly for “anomaly detection.” Such an observation
demonstrates a drastic discrepancy between research and practice, especially in comparison with other domains
where ML has already become an indispensable asset.
The peculiarity of the security domain is that all operational decisions—made by the top management—are
about the tradeoff between losses and losses [83]. In simple terms, the rationale is “paying x to avoid paying y ≫ x.” Investment in security should be justified by the prevention of substantially higher but ultimately
unpredictable losses from security incidents. Hence, decision makers must have a clear understanding of the
(i) benefits, (ii) problems, and (iii) challenges of a cybersecurity solution before endorsing its adoption in prac-
tice. However, the current state of the art of ML for cybersecurity fails to deliver such understanding. Taken
individually, research papers—commonly claiming to outperform previous work—often lead to contradictory re-
sults. For instance, Reference [166] shows that deep learning methods outperform “traditional” ML methods, but
the opposite is claimed in Reference [134] in the exact same setting. Furthermore, existing literature surveys re-
lated to ML in cybersecurity do not provide a holistic coverage suitable for operational decisions. Some of them
are too technical and hence tailored for ML experts (e.g., Reference [180]), others focus only on research efforts
neglecting real-world implications (e.g., Reference [23]) or have a limited scope (e.g., only deep learning [36]).
As a result, the role of ML in cybersecurity is portrayed in a highly fragmented way, thus hindering deployment
of ML in practice—despite its great potential for cybersecurity.
We attempt to rectify this problem. Specifically, this article is the first effort to provide a comprehensive anal-
ysis of the role of ML in cybersecurity. We distill scientific knowledge and industrial experience related to de-
ployment of ML within the entire domain of cybersecurity. One of our goals is to make the current state of the
art understandable to any reader, irrespective of their prior expertise in cybersecurity or ML. We also take this
opportunity to clarify many misconceptions related to ML in the context of cybersecurity. We highlight the bene-
fits of using ML in cybersecurity by listing all the tasks where it outperforms or provides novel capabilities with
respect to traditional security mechanisms. We also elucidate the intrinsic problems of ML in the cybersecurity
context. Such an analysis reveals the challenges that require the joint contribution of all relevant stakeholders
to improve the quality of ML-driven security mechanisms.
Let us explain how we achieve our objective and outline the structure of our article, which comprises several
self-contained sections. We begin (Section 2) by introducing the key concepts of the ML paradigm in a notation-
free form. We also define the intended audience of this article and outline the differences of our work from
previous literature surveys and reports.
Then, in Section 3, we present the most emblematic application of ML in security: cyberthreat detection. We
distinguish between three broad areas: network intrusion detection, malware detection, and phishing detection,
which is common in related literature [23, 163]. The goal of this section is to highlight the added value of ML
with respect to traditional detection mechanisms.
Next, in Section 4, we elucidate the cybersecurity tasks orthogonal to threat detection that can exploit the
capabilities of ML to analyze unstructured data. In contrast to detection problems that require (costly) labels, raw
data are abundant in cybersecurity and can also be exploited via ML. For instance, alerts can be filtered to remove
annoying false alarms or compressed into more manageable reports. Furthermore, information from diverse
sources can be cross-correlated to anticipate novel attacks or to identify the weak-spots of a given organization.
The goal of this section is to illustrate that there exist many (and vastly unexplored) additional areas in which
ML can be deployed to enhance the security of modern systems.
We continue (Section 5) by emphasizing the intrinsic problems of cybersecurity applications of ML. Some
of these problems (e.g., concept drift, adversarial examples, confidentiality) are fundamental and arise from the
contrasting assumptions of cybersecurity and ML. Further problems are specific to either in-house development
(e.g., hidden maintenance costs) or commercial products (e.g., limited scope and transparency). The goal of this
section is to highlight that ML is not perfect and that real deployments involve many tradeoffs, which must be known
(to decision makers), mitigated (by ML engineers), and addressed (in future work).
As our main constructive contribution, we outline the impending challenges of ML in cybersecurity in
Section 6. Solving these challenges will strongly facilitate the operational deployment of ML in cybersecurity.
However, it requires the joint effort of (i) regulatory bodies, (ii) corporate executives, (iii) ML engineers and
practitioners, and (iv) the scientific community. Our takeaway is that rectifying the current immaturity of ML
in cybersecurity requires a radical re-thinking of future technological developments. For instance, research
efforts should focus on more pragmatic results instead of merely “outperforming the state of the art.” However,
such efforts necessitate an increased availability of real data whose disclosure requires authorization by senior
management, as well as potentially new regulations that enable public release of such data.
To establish a connection between research and practice, we discuss two real industrial applications of ML in
cybersecurity in Section 7. We note that commercial security products are typically provided as “black boxes” with few technical details about the actual implementation of ML. This section sheds light on the operational
tradeoffs and “tricks of the trade” needed to meet the practical needs of the customers. These case studies are
provided with the contribution of Montimage and S2Grupo.1
This article is a result of collaboration among researchers, industry practitioners, and policy-makers. Our find-
ings reflect the insights from both recent technical reports and scientific literature. To the best of our knowledge,
no previous work combines such a broad scope with our heterogeneous intended audience.
Contribution. Our main goal is to foster the deployment of ML in cybersecurity by bridging the gap between
research and practice. Specifically, our article makes the following contributions:
• it provides an overview of the benefits and problems of ML in the entire cybersecurity domain;
• it considers the twofold perspective of the research and industrial community;
• it identifies many misconceptions that are becoming common in this field;
• it highlights how (i) regulatory bodies, (ii) corporate executives, (iii) engineers, and (iv) the research community can contribute to future developments of ML in cybersecurity;
• it elucidates two real deployments of ML products.
Furthermore, this article is meant to be understandable by any reader, irrespective of their technical expertise.
1 The names of all authors, companies and vendors were anonymised during the reviewing process.
Fig. 1. Machine Learning development. After collecting some training data and analyzing such data via an ML algorithm, an
ML model is obtained. Such an ML model must be tested via some validation data. If the performance of such an assessment
is appreciable, then the ML model can be deployed in production.
classes of existing ML methods. We then define the scope and target audience of this article (Section 2.2) and
highlight the differences of our effort with respect to previous work (Section 2.3).
2 The notion of a “machine” refers to a software component that can be deployed on any computing device, even in the cloud.
3 In time-series forecasting [44], the learning is done by analyzing the past history of a given phenomenon, which is used to make the future
predictions. Such history can be seen as the training data, where each element is associated to its timestamp and its known value (i.e., the
label).
4 Such a mechanism only requires defining the “actions” that can be taken by the ML model and the “reward” that should be provided to the model in response to such actions.
Fig. 2. Typical machine learning algorithms. An algorithm can be “deep” if it relies on neural networks; otherwise, it is “shal-
low.” Algorithms requiring labelled data are used for “supervised” tasks; otherwise, they can also be used for “unsupervised” tasks.
We provide an overview of some of the most popular ML algorithms in the above-mentioned categories in
Figure 2. For a more comprehensive description, we refer the reader to Reference [23].
Finally, let us briefly address the performance assessment of ML models. The most common quality measure
is the accuracy metric, which represents the percentage of correct predictions made by the ML model. However,
accuracy can be misleading in the presence of imbalanced data distributions, which is typical in cyber-threat de-
tection, because malicious activities tend to be rare events and are (hopefully) overshadowed by benign samples.
In such a context, it is common to differentiate between “positives” (i.e., malicious activities) and “negatives” (i.e.,
benign activities). The performance can then be measured by taking into account the correct (i.e., True Positives
and True Negatives) and incorrect (i.e., False Positives and False Negatives) predictions generated by a given ML
model. A complete list of the most common performance metrics is in Table 1. Note that performance assess-
ment pertains to ML models and not methods. Depending on the specific setting—e.g., the training data, the ML
algorithm, its parameters—a ML method may yield many ML models, each having a different performance.
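To make this pitfall concrete, the following sketch (in Python, using scikit-learn; all labels and predictions are fabricated for illustration) shows how accuracy can look excellent on imbalanced data while precision, recall, and the F1-score expose a poor detector:

# Illustration of why accuracy misleads on imbalanced data; the labels and
# predictions below are fabricated for illustration.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# 990 benign samples (negatives, 0) and 10 malicious samples (positives, 1).
y_true = [0] * 990 + [1] * 10
# A detector that finds only 1 of the 10 attacks and raises no false alarms.
y_pred = [0] * 990 + [1] + [0] * 9

print("Accuracy :", accuracy_score(y_true, y_pred))   # 0.991 -- looks excellent
print("Precision:", precision_score(y_true, y_pred))  # 1.0   -- no false positives
print("Recall   :", recall_score(y_true, y_pred))     # 0.1   -- 9 of 10 attacks missed
print("F1-score :", f1_score(y_true, y_pred))         # ~0.18 -- reveals the problem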
Fig. 3. Pros and cons of supervised and unsupervised ML for cyber threat detection.
competence in ML. With respect to past works, this article represents a “meta-review” of the state of the art5 that
provides a (i) comprehensive overview and (ii) practical recommendations and research directions (iii) within the
entire cybersecurity sphere. Moreover, we (iv) clear many misconceptions that are becoming prevalent in this
domain. Finally, we (v) address all potential stakeholders—which include but are not limited to researchers. To
the best of our knowledge, no existing paper unifies all of the above in a single contribution.
5 We observe that our article includes almost 200 referenced works. However, most of such works are cited only once, i.e., in the section
devoted to the specific problem addressed by the referenced article.
6 For instance, anomaly detection can be done in an unsupervised fashion, but not all anomalies correspond to security incidents.
Fig. 4. Typical deployment of a ML-NIDS. The border router forwards all the outgoing/incoming network traffic to a NIDS,
which further analyzes such data via a ML model.
It is common to associate ML methods with anomaly detection (even recent papers suffer from such confusion,
e.g., Reference [34]). This is a misconception, because ML can also be used for misuse-based approaches [91].
Specifically, by analyzing large amounts of data, ML methods can learn the patterns differentiating benign events
from malicious ones so as to automatically define the “signatures” for misuse-based approaches. At the same time,
ML can be used for anomaly detection by automatically identifying the “normal” activities that correspond to
regular behaviors within a given environment.
Let us elucidate some successful applications of ML aimed at the detection of illicit activities that may occur
in a modern enterprise. Without loss of generality, we organize this section by distinguishing three broad cyber
detection areas: network intrusion detection (Section 3.1), malware detection (Section 3.2), and phishing detection
(Section 3.3). There are hundreds of works proposing ML for these tasks, and analyzing all such proposals is
outside our scope. Hence, we focus on some interesting and recent applications of ML, emphasizing their practical
results. Our case studies in Section 7 will consider two exemplary applications of ML for cyberthreat detection.
7 Netflow: https://www.cisco.com/c/en/us/products/ios-nx-os-software/ios-netflow/.
ML methods based on unsupervised learning are particularly appreciated, because acquiring labelled data for an
entire network is difficult [63]. Among these approaches, we highlight the results obtained by clustering methods.
For example, in Reference [26] the authors aim to detect attacks by clustering NetFlows with similar temporal
behavior and, subsequently, finding the clusters containing hosts that raised alarms from a commercial NIDS
based on manual signatures. The results showed a remarkable increase in detection performance8 with respect to the commercial signature-based NIDS, which only detected three malicious hosts, whereas the integration of ML enabled the detection of 12.
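The following sketch illustrates the principle behind such cluster-based triage (inspired by Reference [26], but not its actual implementation): hosts are clustered by behavioral NetFlow features, and every cluster containing a host already flagged by the signature-based NIDS is marked for inspection. All feature values and the alerted hosts are fabricated.

# Sketch of cluster-based alert extension in the spirit of Reference [26].
# Features and the set of hosts flagged by the signature NIDS are fabricated.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

hosts = ["10.0.0.%d" % i for i in range(1, 101)]
rng = np.random.default_rng(0)
# One row per host, e.g., (flows/hour, mean bytes, mean duration) -- fabricated.
features = rng.normal(size=(100, 3))

X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

alerted = {"10.0.0.7", "10.0.0.42"}  # hosts flagged by the signature-based NIDS

# Every cluster containing an alerted host becomes suspicious: its members are
# candidates for inspection, extending the few signature hits to similar hosts.
suspicious = {labels[i] for i, h in enumerate(hosts) if h in alerted}
candidates = [h for i, h in enumerate(hosts) if labels[i] in suspicious]
print(sorted(candidates))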
Unsupervised methods can also be used to support the (manual) generation of rules for misuse-based NIDS.
In CyberProbe [117], the authors cluster honeypot traffic and create specific rules for each cluster: Such rules enabled the detection of over 75% of attacks that were not included in any security feed. Some papers also exploit unsu-
pervised approaches to counter lateral movement 9 : The approach in Reference [43] can successfully detect such
instances (over 90% recall) with low FP (10%). Finally, NIDS can also benefit from deep unsupervised algorithms.
As an example, in Kitsune [113] the authors use deep learning to analyze PCAP data and improve the detection
rate from below 1% to over 95% while maintaining a low FP rate (below 0.1%). The advantages of unsupervised
ML methods make them suitable for commercial products: As an example, the method in Reference [105] is
used by Aizoon10 to support botnet detection via DNS analyses, achieving less than 0.1% FP rate. Our detailed
case study in Section 7.1 presents the deployment of unsupervised ML used by Montimage to detect anomalous
activities in a modern network.
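The core principle of such deep unsupervised detectors can be sketched as follows: an autoencoder is trained to reconstruct benign traffic only, and records with a high reconstruction error are flagged as anomalous. This is a deliberately minimal, single-autoencoder sketch on synthetic data (far simpler than Kitsune's ensemble of autoencoders), but the underlying idea is the same:

# Minimal reconstruction-based anomaly detection on synthetic flow features.
# (Kitsune [113] uses an ensemble of autoencoders; this single, simplified
# autoencoder only conveys the principle.)
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
benign = rng.normal(0, 1, size=(2000, 10))          # training data: benign only
test = np.vstack([rng.normal(0, 1, size=(95, 10)),  # 95 benign records
                  rng.normal(5, 1, size=(5, 10))])  # 5 anomalous records

scaler = StandardScaler().fit(benign)
ae = MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
ae.fit(scaler.transform(benign), scaler.transform(benign))  # learn to reconstruct

def error(X):  # mean squared reconstruction error per record
    Xs = scaler.transform(X)
    return ((ae.predict(Xs) - Xs) ** 2).mean(axis=1)

threshold = np.quantile(error(benign), 0.99)  # tolerate ~1% false positives
print("flagged records:", np.where(error(test) > threshold)[0])  # anomalies stand out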
However, approaches based on supervised learning, due to their reliance on good quality labels, are more
expensive to deploy but can also provide excellent results. For instance, Exposure [40] leverages labelled DNS
records to detect domains involved in malicious activities and achieves less than 10% false alarm rate. A notable
effort against botnets is Reference [155], where the authors collect and label some NetFlows, and then use such
labelled data to develop a ML botnet detector achieving over 95% precision. Moreover, the work in Reference [19]
proposes the usage of probability labels (instead of binary labels) to detect botnet NetFlows that may evade
traditional ML-NIDS and reach over 97% precision. Remarkable successes also include deep learning methods,
such as the approach in Reference [84], which achieves almost 95% detection rate. In particular, we highlight those
solutions that combine deep learning with temporal analyses: A twofold perspective allows us to detect additional
malicious patterns that can improve detection performance. For instance, in Reference [56] the F1-score improves from 0.90 to 0.95 when temporal dependencies are also considered. We will present a real deployment of a similar
solution in Section 7.2, describing how S2Grupo protects Industrial Control Systems (ICS), showcasing the
pros (and cons) of ML with respect to older techniques based on heuristics.
Let us conclude with a remark: The superiority of deep learning for NIDS is not yet proven. For instance, the
authors of References [134] and [166] both evaluate shallow and deep ML methods on the same dataset (the
CICIDS17 [147]): While Reference [166] claims that deep learning outperforms traditional ML, the authors of
Reference [134] achieve the opposite result. Specifically, Reference [166] shows a “deep” neural network achiev-
ing an F1-score of 0.96 and a “shallow” decision tree achieving an F1-score of 0.95, whereas Reference [134] shows
a “deep” neural network also achieving an F1-score of 0.96, but their “shallow” decision tree reaches an F1-score
of 0.99. Our stance on this subject is that, even under the assumption that deep learning is superior, the marginal improvement does not justify its adoption due to its additional complexity and computational requirements.
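Reproducing such comparisons is straightforward, and the sketch below shows the protocol: a “shallow” decision tree and a “deep” neural network are trained and evaluated on the same data. Synthetic data stand in for CICIDS17 here; as the contradiction between References [134] and [166] suggests, which model “wins” depends heavily on tuning and preprocessing rather than on the paradigm itself.

# Sketch of the protocol behind References [134, 166]: compare a "shallow"
# decision tree against a "deep" neural network on the same (here: synthetic,
# imbalanced) dataset. The point is the protocol, not the resulting numbers.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

models = [("decision tree (shallow)", DecisionTreeClassifier(random_state=0)),
          ("neural network (deep)", MLPClassifier(hidden_layer_sizes=(64, 64),
                                                  max_iter=500, random_state=0))]
for name, model in models:
    model.fit(X_tr, y_tr)
    print(name, "F1-score:", round(f1_score(y_te, model.predict(X_te)), 3))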
8 A similar approach has been successfully integrated even in a commercial product, which we cannot name due to NDA.
9 Lateral Movement: https://www.lastline.com/blog/lateral-movement-what-it-is-and-how-to-block-it/.
10 https://www.aizoongroup.com/.
Fig. 5. Malware detection via ML. In static analyses, the properties of a given file are extracted and analyzed by a ML model.
In dynamic analyses, the file is executed and the entire behavior is monitored and then analyzed by a ML model.
antiviruses can be considered as a subset of HIDS [94]. A given malware variant is tailored for a given operating
system (OS). The popularity of Windows OS made it the most common malware target for more than two
decades. However, attackers are now turning their attention to mobile devices running, e.g., Android OS.11
Malware detection can use two types of analyses: static or dynamic. The former aim to detect malware without
running any code by simply analyzing a given file. The latter focus on analyzing the behavior of a piece of
software during its execution, usually by deploying it in a controlled environment and monitoring its activities.
Both static and dynamic analyses, schematically depicted in Figure 5, can benefit from ML.
Static Analysis. These analyses are simple, particularly effective against known pieces of malware, and can
be enhanced via ML in many ways. For instance, clustering is useful to identify properties of similar pieces of
malware. One such method is proposed in Reference [80], with the goal of finding a common treatment against all elements in each cluster, and reaches up to 90% precision. In contrast, the authors of Reference [100] leverage
clustering to improve the detection of Android malware, and exceed 95% detection rate. Static analyses can
be further improved when labelled data are available. An early example is the detection of malicious Portable
Document Format (PDF) files in Reference [153]: Here, the authors use ML to analyze the structural properties
of PDF files, extracting features that yield proficient detection results (over 99% detection rate with less than
0.001% FP rate). Recently, a different approach leverages deep learning to transform executables into images,
which are then used to perform the detection: The authors of Reference [87] achieve over 99% accuracy in
identifying Windows malware.
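A sketch of the bytes-to-image transformation underlying approaches such as Reference [87] is shown below; the input path is hypothetical, and the convolutional classifier that would analyze the resulting images is omitted:

# Sketch of the executable-to-image transformation behind approaches such as
# Reference [87]: every byte becomes one grayscale pixel. The file path is
# hypothetical; a convolutional network (omitted) would classify the images.
import numpy as np

def binary_to_image(path, width=256):
    data = np.frombuffer(open(path, "rb").read(), dtype=np.uint8)
    rows = len(data) // width                         # drop the trailing partial row
    return data[: rows * width].reshape(rows, width)  # 2-D array of pixels (0-255)

img = binary_to_image("sample.exe")  # hypothetical input file
print(img.shape)                     # e.g., (1432, 256), ready for a CNN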
Despite these successes, all static malware detection approaches are prone to evasion. This can be easily
achieved by modifying the malware executable, which can be implemented without changing its underlying
malicious logic. To aggravate the problem, advanced malware variants (e.g., polymorphic or metamorphic) auto-
matically modify their executables, defeating any static detection approach.
Dynamic Analysis. The combination of dynamic approaches with ML techniques yields effective countermea-
sures against polymorphic malware. Multiple ML solutions exploit clustering: grouping malware with similar
behavior allows us to focus only on those clusters that have not been seen before. For example, Reference [141]
proposes a dynamic approach combining clustering and anti-virus scanners to detect and sanitize entire groups
of malware variants, achieving almost perfect accuracy against Windows malware. More recently, the work in
Reference [15] focuses on Windows malware by leveraging a combination of graph and Natural Language Pro-
cessing (NLP) techniques applied to dynamic API calls and achieves 99.99% accuracy. Some papers even propose
deep learning, such as Reference [103], which uses deep neural networks to extract the most relevant dynamic
features to classify Android malware, achieving nearly 80% accuracy. Moreover, the authors of Reference [7]
apply deep learning to detect Windows ransomware and achieve 93% detection rate and 97% precision. An inter-
esting work is HeNet [52], which leverages ML for dynamic malware detection by analyzing hardware-specific
11 https://www.gdatasoftware.com/news/2019/07/35228-mobile-malware-report-no-let-up-with-android-malware.
Fig. 6. Phishing detection via ML. For websites, the ML model can analyze the URL, the HTML, or the visual representation
of a webpage. For emails, the ML model can analyze the body text, the headers, or the attachment of the email.
(i.e., Intel CPU) data streams, achieving perfect accuracy on real benchmarks. Finally, it is possible to combine
static with dynamic analyses via ML: This is done in EC2 [49], which combines unsupervised with supervised
ML to detect novel Android malware, achieving over 90% detection rate.
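The NLP-style treatment of dynamic traces can be sketched as follows (in the spirit of, but much simpler than, Reference [15]): API-call traces recorded in a sandbox are treated as “sentences,” n-grams of consecutive calls become features, and a standard classifier does the rest. The traces and labels below are fabricated:

# Sketch of NLP-style dynamic malware analysis: API-call traces as "text",
# n-grams of consecutive calls as features. Traces and labels are fabricated.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

traces = [  # each trace: space-separated API calls observed in a sandbox
    "CreateFile ReadFile CloseHandle",
    "RegOpenKey RegQueryValue CloseHandle",
    "CreateFile WriteFile CryptEncrypt DeleteFile",                      # ransomware-like
    "OpenProcess VirtualAllocEx WriteProcessMemory CreateRemoteThread",  # injection-like
]
labels = [0, 0, 1, 1]  # 0 = benign, 1 = malicious

vec = CountVectorizer(ngram_range=(1, 2), token_pattern=r"\S+")
clf = RandomForestClassifier(random_state=0).fit(vec.fit_transform(traces), labels)

new = ["CreateFile WriteFile CryptEncrypt CloseHandle"]
print(clf.predict(vec.transform(new)))  # expected: [1], driven by CryptEncrypt n-grams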
Fig. 7. Additional tasks that can be addressed via ML in cybersecurity. All such tasks mostly involve dealing with raw and
unstructured data from heterogeneous sources and provide fertile ground for ML.
HTML and URL): For nearly 1,000 squatting phishing websites, manual blacklisting only detected 9%, whereas
ML detected 70% of such phishing attempts.
Phishing Email Detection. One of the earliest applications of ML for cybersecurity involves the detection of
unsolicited emails (also often referred to as “spam”). Recent advances in NLP can be leveraged by ML to analyze
the body of an email and identify malicious intent [39].
Only a few proposals leverage unsupervised ML, such as Reference [61], which achieves over 95% detection rate. However, as is the case for phishing website detection, acquiring ground-truth labels for emails is a trivial task, which facilitates the deployment of supervised ML used by email providers to enhance their automatic filters [89].
For instance, Reference [9] analyzes the text of an email and reaches almost 99% accuracy with less than 0.01%
false-positive rate. The authors of Themis [67] exploit deep learning to analyze both the text and the header of
an email and exceed 99% accuracy. Finally, we mention the work in Reference [73], where the authors leverage
supervised ML to detect spear-phishing attacks by analyzing an email from different perspectives and achieve
over 90% detection rate at the cost of 1% false positives. Attachments can also be analyzed by any malware
detection technique (Section 3.2).
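A minimal sketch of such supervised email analysis follows (the emails are fabricated, and real deployments rely on far richer features and corpora): TF-IDF features over the body text feed a linear classifier.

# Minimal sketch of supervised phishing/spam detection from the email body,
# via TF-IDF features and a linear classifier. All emails are fabricated.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

emails = [
    "Your account has been suspended, verify your password here immediately",
    "Meeting moved to 3pm, see the updated agenda attached",
    "You won a prize! Click this link and confirm your bank details",
    "Please review the quarterly report before Friday",
]
labels = [1, 0, 1, 0]  # 1 = phishing, 0 = legitimate

model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(emails, labels)
print(model.predict(["Urgent: confirm your password to avoid suspension"]))
# expected: [1] -- the urgency/credential vocabulary drives the decision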
As a small digression, we mention that the fight against phishing (and spam) has recently moved to Online So-
cial Networks. This setting exhibits many similarities with the detection of phishing in emails, as it also involves
NLP techniques. As an example, the authors of Reference [172] use deep learning to detect malicious tweets and
obtain promising results with almost 95% detection rate but with a 5% false-positive rate. Similarly, MalTP [96]
specifically focuses on tweets luring victims to phishing websites, achieving over 95% detection rate and nearly
90% precision.
Takeaway. Using ML for cyberthreat detection has proven to be greatly successful (e.g., References [52, 113,
159]).
determined via expert knowledge. Such a problem was overcome with the advent of deep learning. A prominent
example is DeepLog [64], which analyzes heterogeneous log data (e.g., Hadoop, or OpenStack logs) with a similar
objective as Beehive. DeepLog achieves impressive results in a lab environment, with close to 100% detection
rate after training on only 1% of the available data.
Labelling Optimization. Many threat detection techniques (Section 3) rely on supervised ML, which may re-
quire huge amounts of labelled data. Such a requirement prevents their applicability in real scenarios, because
manual labelling can be prohibitive—especially in Network Intrusion Detection. In contrast, unlabelled data are
common in cybersecurity, and many efforts proposed semi-supervised learning methods to increase the “return”
of small sets of labelled data and hence enable deployment of fully supervised ML methods [29]. For instance,
the botnet detector in Reference [184] reaches an F1-score of 0.83 with only 2,400 labels; in contrast, the detector
in Reference [21] reaches an F1-score of 0.95 on the same network scenario but requires millions of labelled
samples. A parallel line of research leverages the so-called active learning paradigm. The idea is to use a ML
model (trained on a small labelled dataset) to “suggest” which samples should be labelled in a (large) unlabelled
dataset to maximize its “learning rate.” As an example, Reference [183] shows that it is possible to save significant
labelling effort (from 30% up to 90%) by providing the ground truth of only a restricted number of samples. An
intriguing property of active learning is that it can be used even for already-deployed ML models by following
the so-called lifelong learning principle: For instance, Tesseract [130] can boost its performance from 57% to 70%
after being retrained on 700 samples “actively labelled” by a human expert.
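A sketch of pool-based active learning with uncertainty sampling—the idea behind efforts such as Reference [183]—is given below on synthetic data; the “expert” is simulated by revealing the already-known labels of the queried samples:

# Sketch of pool-based active learning with uncertainty sampling. The expert
# is simulated: queried labels are simply revealed from y. Data are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
labelled = list(range(20))    # small labelled seed set
pool = list(range(20, 2000))  # large unlabelled pool

clf = LogisticRegression(max_iter=1000)
for _ in range(10):           # each round, ask the "expert" for 10 more labels
    clf.fit(X[labelled], y[labelled])
    proba = clf.predict_proba(X[pool])[:, 1]
    uncertainty = np.abs(proba - 0.5)          # closest to 0.5 = least certain
    ask = [pool[i] for i in np.argsort(uncertainty)[:10]]
    labelled += ask
    pool = [i for i in pool if i not in ask]

print("labels used:", len(labelled), "accuracy:", round(clf.score(X, y), 3))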
that a crucial aspect in the protection of enterprises revolves around the value of the items being considered:
Hence, ML methods for cyber threat intelligence should be configured so as to prioritize the protection of the
most business-critical infrastructures. Failure to take this into account may limit the usefulness of ML.
Nevertheless, applications of ML for threat intelligence can leverage either internal or external data sources
(or both).
Internal Sources. Foreseeing future attack strategies via ML can be done with exclusive reliance on internal
corporate data. For instance, Reference [158] leverages ML to artificially create alerts corresponding to past cyberattacks and then uses such alerts to study an attacker’s behaviour—potentially by using additional ML so-
lutions. As an example, SAGE [116] exploits ML to compress over 300k individual alerts into fewer than 100 “attack graphs” representing the specific steps of an entire offensive strategy. Another possibility is to use deep learning
to “disassemble” some code executables, allowing us to identify some potentially malicious patterns that can
reappear in future malware: For instance, EKLAVYA [54] achieves a remarkable 80% accuracy in such a task. Fi-
nally, internal and external data sources can be mixed: The authors of Reference [88] exploit historical malware
information (provided by Symantec) to foresee how future malware could affect a corporation, and their ML
solution provided up to 4 times as many correct predictions as non-ML baselines.
External Sources. It is possible to use ML for the so-called open source intelligence. For example, the authors of
Reference [146] focus on security incidents mentioned on Twitter. Their ML approach identified many malicious
activities occurring in 2016, such as the Mirai botnet (October 2016) or the data breach at AdultFriendFinder
(November 2016), where over 400 million accounts were exposed. Similarly, the deep learning method in Refer-
ence [165] analyzed tweets to study the development of ransomware attacks. It is also possible to use information
from security feeds, such as the Common Vulnerability Score (CVS) stored on well-known databases.13 For in-
stance, in Reference [51] the authors use ML to predict the CVS almost 1 week earlier than traditional cybersecurity feeds. Prediction of the CVS with ML can also be done via darkweb data as shown in Reference [12]. The
authors use ML to crawl underground forums and correlate meaningful information with vulnerability descrip-
tions. By validating the results via third-party signatures (e.g., Symantec), the proposed ML method successfully
predicted the exploitability for about 40% of recorded vulnerabilities, compared to about 10% for common feeds.
Automated analyses via ML of underground forums (in different languages) aimed at uncovering “cyber-criminal
markets” are also performed in Reference [135], allowing us to infer the prices of malicious exploits. Finally, we
even mention the existence of patents that leverage ML to predict cyberattacks in modern environments [126].
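The common thread of these intelligence applications—mapping unstructured text to an actionable score—can be sketched in a few lines (in the spirit of References [12, 51], with entirely fabricated descriptions and scores):

# Sketch of text-based severity prediction in the spirit of References [12, 51]:
# vulnerability descriptions (or crawled forum posts) are mapped to a numeric
# severity score via regression. All descriptions and scores are fabricated.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

descriptions = [
    "remote code execution via crafted packet, no authentication required",
    "cross-site scripting in admin panel requires authenticated session",
    "buffer overflow allows privilege escalation on local host",
    "information disclosure of version banner",
]
scores = [9.8, 5.4, 7.8, 3.1]  # fabricated severity scores

model = make_pipeline(TfidfVectorizer(), Ridge()).fit(descriptions, scores)
print(model.predict(["unauthenticated remote code execution in web service"]))
# a severity estimate, potentially available before the official feeds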
Takeaway. There are many tasks complementary to threat detection that can be covered by ML. The main
challenge lies in obtaining relevant information from unlabelled (e.g., References [29, 125]) or unstructured
data coming from heterogeneous sources (e.g., References [12, 64, 173]). Such a challenge, however, also rep-
resents an intriguing opportunity.
13 An example is the CVE database, storing vulnerabilities as well as their exploitation likelihood: https://cve.mitre.org/.
Fig. 8. Problems of ML in cybersecurity. Some are specific to either in-house solutions or to commercial-off-the-shelf (COTS)
ML products. Others are shared by both of these categories.
and we conclude with the problems related to the adoption of commercial-off-the-shelf (COTS) ML products
(Section 5.3).
We stress that all problems herein described are intrinsic: They can be mitigated to some degree, but the current
state of the art does not allow us to completely resolve them.
14 For instance, hundreds of apps in well-known marketplaces were recently found to be malicious [92].
Fig. 9. Machine learning in the presence of concept drift. The ML model expects that the data will not deviate from those seen during its training. In cybersecurity, however, the environment evolves, and adversaries also become more powerful.
Fig. 10. Typical adversarial attack against a deployed ML model. By inserting tiny perturbations in the input data, it is
possible to fool a ML model and induce an incorrect prediction.
deployment of ML also exposes to the threat of adversarial samples [154], which specifically target ML systems.
Such a threat, schematically depicted in Figure 10, involves applying tiny “perturbations” to some input data
with the goal of compromising the predictions of a ML model. Even imperceptible modifications can affect
proficient cybersecurity ML detectors. For instance, Reference [25] evaded 20 ML botnet detectors by appending
a few bytes of junk data to some network communications; whereas References [133] and [154] showed a
similar effect against ML malware detectors. Even commercial products are affected, such as Google Chrome’s
phishing detector [101]. There exists a wide array of strategies to carry out attacks based on adversarial samples,
which can affect either the pre- or post-deployment phase of a ML model [24, 154]. Despite the proposal of many
countermeasures against adversarial samples (e.g., References [19, 76]), no universal solution has been found so far, and some mechanisms can even decrease the baseline performance (as shown in References [21, 59]).
The best defense, according to Biggio and Roli [39], is a proactive approach: The adversary must be anticipated
and evaluated (and, possibly, countered) before ML deployment.
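A feature-space illustration of the evasion depicted in Figure 10 follows, loosely mirroring the “append junk bytes” strategy of Reference [25]. The detector and data are synthetic, and the detector is deliberately made to rely on a single size-related feature, so that inflating it flips the decision:

# Feature-space illustration of an evasion attack, loosely mirroring the
# "append junk bytes" strategy of Reference [25]. Data are synthetic, and the
# detector relies on feature 0 (bytes sent); feature 1 (duration) is noise.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
benign = np.column_stack([rng.normal(5000, 500, 500), rng.normal(30, 5, 500)])
botnet = np.column_stack([rng.normal(800, 100, 500), rng.normal(30, 5, 500)])
X = np.vstack([benign, botnet])
y = [0] * 500 + [1] * 500

clf = RandomForestClassifier(random_state=0).fit(X, y)

flow = np.array([[820.0, 29.0]])  # a genuine botnet flow
print(clf.predict(flow))          # [1]: detected
flow[0, 0] += 4000                # attacker appends ~4 KB of junk data
print(clf.predict(flow))          # [0]: the same malicious flow now evades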
To further stress the importance of such a threat, let us clear two misconceptions:
• it is common to refer to adversarial samples as “illegitimate.” Such terminology is wrong from a security stand-
point: Any sample (adversarial or not) analyzed by a ML model is considered as legitimate (i.e., trusted) by
the underlying system that forwarded such a sample to the ML model. What is illegitimate is the attack, i.e.,
the application of a perturbation that is specifically crafted to thwart a ML model—but not the adversarial
sample.15
• in related literature, it is common to search for the “minimal” perturbation that allows a sample to thwart
a target ML model. However, real attackers are not subject to such a constraint.16
The latter observation is crucial for demystifying the effectiveness of the so-called certified defenses [138], which
only work if the perturbation is minimal or restricted within a very small boundary.
Confidentiality. The cybersecurity domain is characterized by its sensitivity to data-privacy, representing a
strong barrier for long-term reliance on ML. Let us provide a few examples. The increasing usage of encryption
can make some ML systems simply unusable. For instance, a ML-NIDS that inspects the payload of HTTP traffic
will not work if the traffic is encrypted via HTTPS—and HTTPS is increasingly replacing the insecure HTTP
protocol worldwide. Such a problem can also affect other use-cases of ML, such as phishing email detectors:
If the emails are encrypted (e.g., via PGP), then it is impossible to analyze their contents with ML. Another
problematic scenario can involve the analysis of confidential data: The constant changes in data regulation (e.g.,
the GDPR [167]) make it difficult to identify data that can be reliably used in the long term. For instance, consider
the approach in Reference [177] (cf. Section 4.3), which leverages (among others) user information to estimate
the infection risk. Such an approach could not be applied today without the explicit consent of all the users
of a company. Moreover, both of these issues (confidential and encrypted data) also impair labelling procedures,
because it is not possible to (manually) verify the ground truth of a sample if such a sample cannot be “seen” by a
human expert. Finally, it is understandable that enterprises do not want to publicly disclose their data, generating
an overall shortage of publicly available datasets that can be used to evaluate ML systems [147]. Although this
latter problem primarily affects research, it also implicitly affects practice, because showing a ML system that
works in different settings can foster its adoption in real scenarios. We discuss potential solutions to the limited
data availability in Section 6.2.
15 To provide a concrete example, let us consider [22]: It is legitimate to increase the size of network communications, but it is illegitimate
to do so with the intent of thwarting a ML model. However, a ML model considers all analyzed samples as trusted, because the ML model is
oblivious of the intent of the data generation process.
16 For instance, in Reference [22] adding 1 KB of data is more effective than adding only 1 B. Hence, a real attacker is more likely to add 1 KB
than just 1 B.
to label more than 80 malware samples per day. For reference, the initial deployment of Tesseract [130] required
50,000 labelled samples. Unlabelled data may be easier to acquire, but as shown in Section 4 it can come from
heterogeneous sources and be in different formats, requiring a detailed preprocessing pipeline to collect, store,
and forward such data to the ML model. Furthermore, the iid assumption (cf. Section 5.1) prevents a reliable use
of data originating from different environments [149], hence even the (few) publicly available data can have ques-
tionable effectiveness. Finally, a common misconception is thinking that the performance of a ML model is linearly dependent on the size of its training data17 : In some cases, smaller datasets can yield superior ML models—we
will show this in our case studies (Section 7.1). Nevertheless, any given dataset must also be balanced: In real
environments, a malicious event is a rare occurrence and a given dataset should reflect such distribution [170].
To aggravate all of the above, it is not possible to determine a priori which combination (algorithm, fea-
tures, dataset, balancing) yields the best performance after deployment. Hence, empirical and time-consuming
evaluations—by training and testing multiple ML models—are always a necessity. As a result, finding the optimal tuning for real deployments may require a huge amount of manual effort by trial-and-error.
Constant Maintenance. To mitigate the disruptive effects of concept drift (Section 5.1), it is fundamental to
continuously update a given ML solution with data reflecting the current trends. Such procedures are costly
but can be alleviated via lifelong learning solutions (cf. Section 4.2). However, a common misconception is that
“update” procedures simply entail finding (and, if necessary, labelling) new data. This is an underestimation,
because such procedures also necessitate (i) deciding what to do with “old” data and (ii) finding the “sweet spot”
that yields the adequate performance. Indeed, maintaining old data can be detrimental in some cases (e.g., if
some “benign” samples are discovered to be “malicious”), but completely removing it can also adversely affect
the performance (e.g., some “old” phenomena can reappear in the future). Nonetheless, even small changes in the
training data can decrease the performance of an ML system (e.g., this is the fundamental principle of poisoning
attacks [24]). These issues require additional manual labour through trial-and-error.
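The “what to do with old data” tradeoff can be sketched as a sliding-window retraining policy (a simplification, assuming a recent scikit-learn; the window length is precisely the “sweet spot” that must be found empirically):

# Sketch of periodic retraining on a sliding window of recent data: samples
# older than the window are dropped. The window length is the "sweet spot"
# that must be found empirically; WINDOW below is a hypothetical choice.
from collections import deque
import numpy as np
from sklearn.linear_model import SGDClassifier

WINDOW = 5000
window_X, window_y = deque(maxlen=WINDOW), deque(maxlen=WINDOW)

def retrain(new_X, new_y):
    """One update cycle: ingest fresh samples, drop the oldest, refit."""
    window_X.extend(new_X)
    window_y.extend(new_y)
    model = SGDClassifier(loss="log_loss", random_state=0)
    model.fit(list(window_X), list(window_y))
    return model  # redeploy only after validating on held-out recent data

rng = np.random.default_rng(0)
model = retrain(rng.normal(size=(100, 5)), rng.integers(0, 2, 100))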
A potential mitigation for all such tuning operations (both pre- and post-deployment) may come in the de-
velopment of techniques focused on explaining the decisions of ML systems (e.g., Reference [107]), which are
currently difficult to interpret—especially for deep learning [14]. This is an intriguing direction of research, which
has very recently also touched the area of adversarial ML (e.g., References [16, 60]).
Limited Scope. Relying on third-party solutions limits any end-user to their intended scope, meaning that some
tasks simply cannot be accomplished with products currently on the market. For instance, no commercial ML model can be trained on the exact data used by an organization—at least initially. The organization can allow
the vendor to collect their data and use such data to refine the ML model; however, this may not be possible due
to confidentiality reasons (Section 5.1). Therefore, some commercial solutions can be used only if the deployment
environment (of the organization) resembles the pre-deployment environment (of the vendor) used to generate
the data for the corresponding ML model. For example, phishing websites are malicious “everywhere,” meaning
that it is possible to transfer [179] ML phishing detectors. However, such a transfer cannot be easily done for
other cybersecurity tasks, such as NIDS [27]. This is because every network is unique [149], and a malicious
17 According to the founder of DeepLearning.AI, Andrew Ng, this is also becoming true for Deep Neural Networks [152].
behavior in one network can be benign in a different network. Due to such an issue, most COTS products leverage
(unsupervised) ML and mostly for anomaly detection (e.g., References [2, 97]).
Lack of Transparency. COTS solutions come as a “black box,” and the decision to deploy such solutions de-
pends on their advertised performance. This fact leads to many issues, all sharing a common culprit: the cost of
misclassifications in cybersecurity. In some domains, incorrect predictions do not have severe consequences: For
instance, a recommender ML system (e.g., the one in AirBnb [77]) that makes an incorrect recommendation is
not a cause of concern. In contrast, in cybersecurity a single FN can be the difference between a compromised
and a secure system. At the same time, both employees and security analysts are annoyed by false alarms, which
can even be exploited by attackers to conceal more severe threats [53]. By considering the performance met-
rics reported in Table 1 (cf. Section 2.1), we remark that each metric focuses on a single aspect, and even good
scores can be meaningless if not contextualized.18 Nonetheless, even if a COTS ML solution is fully transparent
(i.e., all metrics are reported and contextualized), the performance will always refer to the environment of the
vendor, which is likely to differ from the real deployment setting. Finally, we mention that—to the best of our
knowledge—no COTS ML solution (including those not pertaining to security tasks) reports its robustness to
potential adversarial attacks, which is a severe deficiency in cybersecurity scenarios.
Takeaway. In cybersecurity, ML can provide great benefits but also presents many risks due to the intrinsic
adversarial setting and the dynamic ecosystem. Such risks must be taken into account today and should be
addressed by future works.
18 As an example, consider a detector evaluated on a dataset containing 9,990 benign samples and 10 malicious samples: Accuracy of 99.91% can be obtained by only detecting 1 malicious sample (of 10), despite its inability to detect 90% of the attacks. Another example is an FPR of 1%: It may appear low, but if the environment generates 300k events (as in Reference [116]), then the FPR corresponds to 3,000 false alarms. Note that the inverse is also true: An increment of just 1% in the TPR can be either an almost negligible or an extremely significant
performance boost.
Fig. 11. Future challenges of machine learning in cybersecurity. Addressing all such challenges requires the cooperation of
four players: regulatory bodies, corporate executives, engineers, and researchers.
Performance Certification. Comprehensive testing represents the only instrument for performance verification
of a ML system. However, despite hundreds of works, there is a lack of standardized evaluation protocols. This
is a problem especially for COTS products, as performance assessments may be carried out in biased environ-
ments or may consider unfair comparisons that inflate the results to favor a given ML solution. Meaningful
assessments must consider the realistic distribution of data and take into account the (likely) temporal shift.
Traditional cross-validation techniques, typical for ML in the computer vision domain, should be used only for
tuning: Specifically, the performance should be validated via statistical tests. Establishing standardized evalu-
ation protocols would foster pragmatic and fair comparisons, promoting overall ML deployment in practice.
Nevertheless, the full details of such operations (e.g., the data used, the evaluation methodology, and the final
results) should be transparent to the customers of COTS ML systems.
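A sketch of such a temporally aware protocol is shown below: the model is always trained strictly on the “past” and tested strictly on the “future,” instead of randomly shuffling samples (which leaks future knowledge into training and inflates the measured performance). The data are synthetic and assumed to be time-ordered:

# Sketch of a temporally aware evaluation: train on the past, test on the
# future, per split. Synthetic data, assumed to be ordered by time.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import TimeSeriesSplit

X, y = make_classification(n_samples=3000, random_state=0)

for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    f1 = f1_score(y[test_idx], clf.predict(X[test_idx]))
    print("F1-score on the next time slice:", round(f1, 3))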
Robustness Certification. The increased interest toward ML led to (scientific) investigations of its robustness in
adversarial scenarios, bringing to light the vulnerability to adversarial examples (Section 5.1). Yet no universal
solution has been found so far, with some defenses being broken in the time span between their appearance as a
preprint and their publication as a peer-reviewed article.19 The first step to solve this problem is to acknowledge
that no ML solution is flawless. Indeed, to quote a recent survey on the cybersecurity perspective of European
stakeholders [69]: “security of ML and adversarial attacks was not mentioned as one of the key challenges by the
interviewees,” which epitomizes that such a threat is not perceived by the end-users of ML solutions. To address
these issues, assessments of adversarial robustness must become mandatory in evaluations of any ML-based
solution for cybersecurity. The most likely security risks, and their potential consequences, should be known
before real ML deployments. Moreover, all the details of such assessments should be transparently provided.
Recommendation: To ensure better transparency and reliability, regulatory bodies must enforce the devel-
opment and adoption of standardized procedures that certify the performance and robustness of ML systems.
19 For instance, defensive distillation was proposed in 2016 [128] and broken a few months later [48].
regulation authorities: The former should promote data sharing, and the latter should devise more actionable
data regulation policies.
Data Sharing. A solution to the lack of adequate data is the promotion of data-sharing practices. In cybersecu-
rity, some portions of data can be easily shared: For instance, Sophos has recently released over 20M (labelled)
malware samples [78]; similarly, the recent CrimeBB dataset [129] contains 1 million accounts crawled from darkweb forums over 10 years. In contrast, other pieces of data (especially benign data) are more confidential and
hence their disclosure requires explicit permission from corporate executives. Acquiring such permission is a tough
barrier, especially due to privacy and secrecy issues. However, we observe that sensitive information can be
anonymized (e.g., Reference [136]), and recent advances in federated learning can help overcome such problems [57].
There indeed exist some success stories of data-sharing platforms focused on security information, such as
the EU-OF2CEN project [151]. Similar platforms represent a great opportunity for some companies, as they
open the doors to a new market entirely dedicated to ML datasets, potentially with (updated) ground truth
(e.g., Reference [150]). From this perspective, a promising initiative is STIX CyBox [143]: Its goal is creating
a threat intelligence platform shared by multiple parties, facilitating the entire process of incident detection
and response. Nonetheless, such platforms must (i) contain unbiased data—otherwise, there is a risk of manipulating future developments [109]—and (ii) comply with the existing regulation, hence requiring the
involvement of the respective authorities.
Actionable Data Regulations. The strategic importance of data gave birth to multiple regulations that “pro-
tect” data owners and limit abuse of sensitive information. Despite ensuring more privacy rights, such regula-
tions introduced additional constraints on data gathering and processing, resulting in yet another barrier to ML
developments—both for research and practice. Specifically, the (already costly) data-labelling procedures are cru-
cially affected by such regulations (Section 5.1). Even if action is taken by executives to disclose their corporate
data, existing regulation policies are difficult to interpret and likely to change in the future [124]: For instance, in-
formation that can be shared “today” may not be shareable “tomorrow,” hindering long-term projects. However,
we observe that some GDPR compliant data-sharing platforms exist (e.g., Reference [79]). Hence, the regulatory
authorities should promote such efforts even in the cybersecurity context, for instance by providing actionable
policies that ensure the compliance of (open) data in the long term.
Recommendation: To address the shortage of adequate data, companies should be more willing to share data
originating in their environments (e.g., Reference [151]), whereas regulation authorities should promote such
disclosure by defining proper policies and incentives [124].
Pragmatic Results. One of the primary goals of research is to “outperform the state of the art.” In the context
of ML, such a goal requires us to propose a novel ML method and then show that this method achieves a better
performance than prior works—an objective that can be achieved without providing any “true” contribution
to the state of the art. For example, by slightly changing the training data it is possible to achieve a superior
performance; similarly, an existing solution may be sub-optimally reproduced (by using, e.g., a different dataset,
or different tuning parameters). Note that all such “flaws” can be unconsciously introduced by researchers. This
phenomenon, also referred to as benchmark lottery [58], results in an overall confusion on what really works best
and impairs real ML deployments. Among the main culprits of such a phenomenon is the poor reproducibility of research, as very few works disclose the entire information required to replicate their experiments. Therefore, new studies cannot properly reproduce previous works, and the peer-review process cannot assess whether
the experimental protocol is correct and unbiased. At the same time, however, we point out that most scientific
venues do not allow (nor require) inclusion of any supplementary and technical resource. Hence, even researchers
must face a difficult decision about what low-level information should be included in the actual submission—
which is subject to page limitations.
Recommendation: The peer-review process should facilitate and enforce the inclusion of the material for
replicating ML experiments. At the same time, such material should be evaluated to ensure its correctness—
potentially by a separate set of reviewers with more technical expertise.
Realistic Security Scenarios. As a direct consequence of the benchmark lottery phenomenon, many research
papers simply focus on providing “better numbers” than past work, overlooking the assumptions made by such
past work. In the context of cybersecurity, this is a problem, because realistic circumstances must be considered,
and any result that stems from unrealistic scenarios is of questionable value. For instance, there is a superficial
treatment of training data: Only a few papers (e.g., References [18, 130]) consider the concept drift, which is in-
trinsic in cybersecurity; moreover, many recent papers (e.g., Reference [156]) still use outdated datasets, such as
the NSL-KDD, which is over 20 years old and does not reflect any current environment. The result is that all
papers propose ML methods that achieve near-perfect performance—but what is the practical impact of all such
research? We acknowledge that public (labelled) data are difficult to acquire, but over the past few years several
datasets have been openly released (e.g., References [115, 147]). The impression is that the cybersecurity setting is
turning into yet another research playground where new ML methods are evaluated on some “security-related”
data, but realistic security considerations are only made in the introduction to provide some justification for a
given publication venue. Specifically, there is a lack of realistic threat models. Such a lack is epitomized in the
emerging field of adversarial ML (Section 5.1), where most attacks against security systems assume extremely
powerful opponents. For instance, the authors of Reference [20] show that the majority of attacks against ML-
NIDS require adversaries with direct access to the ML-NIDS itself, an assumption that violates basic
security principles. Similarly, the authors of Reference [133] show that adversarial attacks have a different effectiveness when
the opponent cannot manipulate the data-processing pipeline (which is usually not accessible). Hence, it is not
surprising that the industrial stakeholders are either confused or do not care about adversarial examples—as
evidenced by two recent surveys [69, 95] and the detailed case study in Reference [42].
Recommendation: Future research on ML applications for cybersecurity should have a closer connection
with the real world. The assumed threat model should be realistic, the dataset should resemble recent trends,
and the concept drift should be taken into account.
No single ML solution can protect against all threats that can target modern organizations.21 Addressing all such issues is possible by orches-
trating diverse ML solutions. Indeed, any ML model (irrespective of its goal) ultimately represents just a single
component of a cybersecurity system, which can be a "hybrid" system that also leverages non-ML techniques.
However, such orchestration requires the expertise of ML engineers, who must coordinate different outputs to
extract actionable information. Specifically, ML (and non-ML) models can be combined either in an ensemble or
in a pipeline architecture, depending on the final goal of the system.
Ensemble architecture. One of the most effective ways to combine different ML models is the so-called ensem-
ble [38]. The idea is to leverage many simplified learners with a common goal: Each ML model of the ensemble
analyzes the same data but focuses on a specific problem. For instance, it is possible to create ML-NIDS using
ensembles of ML models, in which each model has the same goal (i.e., intrusion detection) but focuses on a spe-
cific threat (e.g., botnet or Denial of Service (DoS) attacks [113]). Despite the proven performance benefits of
such architectures, a tough challenge faced by engineers is the lack of standardized feature sets that can be used
to devise all such systems. Each model of the ensemble must ultimately analyze the same data, and, depending
on the features provided as input, the performance can greatly differ (as shown in Reference [41]). Our industrial
case studies in Section 7 consider a similar architecture.
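To make the ensemble idea concrete, the following minimal Python sketch (with synthetic data and illustrative per-threat labels, not taken from any cited system) trains one member per threat on the same shared feature set and raises an alert whenever any member fires:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))           # shared flow-level features
y_botnet = (X[:, 0] > 1.2).astype(int)    # toy label: "botnet" traffic
y_dos = (X[:, 1] > 1.2).astype(int)       # toy label: "DoS" traffic

# One member per threat: every model analyzes the *same* feature vector.
ensemble = {
    "botnet": RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_botnet),
    "dos": RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_dos),
}

def detect(sample):
    # Raise an intrusion alert if any per-threat member fires.
    votes = {name: int(clf.predict(sample.reshape(1, -1))[0])
             for name, clf in ensemble.items()}
    return votes, any(votes.values())

print(detect(X[0]))
```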
Pipeline architecture. When a system envisions ML models having systematically different inputs and out-
puts, such models must be organized in a pipeline architecture. For example, it is possible to create an ensemble of ML
models for threat detection (Section 3) and then use their outputs for threat intelligence (Section 4). Similar sys-
tems already exist, either as COTS products (e.g., SIEM22 or SOAR23) or as scientific proposals: For instance,
ARCUS [175] is a security-focused orchestration platform that could benefit from the integration of many of the
ML solutions discussed in this article. However, such architectures are challenging for engineers to implement:
Each individual component is affected by all the issues presented in Section 5, therefore multiplying their impact.
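As a toy illustration of this pipeline idea, the sketch below (all names, fields, and rules are hypothetical and do not reflect any COTS product) feeds the alerts emitted by a detection stage into a downstream intelligence stage that aggregates them into per-source indicators:

```python
from dataclasses import dataclass

@dataclass
class Alert:
    src: str
    threat: str
    confidence: float

def detection_stage(flows):
    # Stand-in for the detection ensemble of Section 3 (here: a trivial rule).
    return [Alert(f["src"], "dos", 0.9) for f in flows if f["pps"] > 1000]

def intelligence_stage(alerts):
    # Stand-in for the threat-intelligence tasks of Section 4: aggregate
    # alerts per source to produce actionable indicators for the operator.
    indicators = {}
    for alert in alerts:
        indicators.setdefault(alert.src, []).append(alert.threat)
    return indicators

flows = [{"src": "10.0.0.5", "pps": 2500}, {"src": "10.0.0.7", "pps": 30}]
print(intelligence_stage(detection_stage(flows)))  # {'10.0.0.5': ['dos']}
```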
Recommendation: Orchestrating complex systems that use (combinations of) ML and non-ML solutions is
beneficial for cybersecurity. Hence, ML engineers and practitioners should clearly highlight how to combine
all such components to maximize their practical effectiveness.
21 The most exemplary use-case is that of zero-day attacks, which can easily evade supervised ML methods: Zero-day samples cannot—by
definition—be included in the training data. Anomaly detection through unsupervised ML is more feasible, but at the cost of many false
positives.
22 Security Information and Event Management: https://www.forcepoint.com/cyber-edu/siem.
23 Security Orchestration Automation and Response Systems: https://www.rapid7.com/solutions/security-orchestration-and-automation/.
24 The commercial nature of such systems—which are built on the end-users' data—means that some low-level details are protected by NDA, but
Scenario and Challenges. This case study focuses on the well-known ICN approach of Named Data Network-
ing (NDN) [181]. NDN leverages a pull-based mechanism using two kinds of packets: Interest (a
request for content) and Data (the response with the content). When a given user wants to retrieve some
content, the user (i) specifies the desired content's name (e.g., "/data/video.mp4") in an Interest, (ii) sends such
Interest through the NDN network, and (iii) receives the corresponding Data—which can be provided either by
the content producer or by any intermediate NDN node storing a copy of such Data. The practical implementation
of NDN exposes it to the risk of new security attacks, such as the Content Poisoning Attack (CPA) [174]. In a CPA,
a malicious producer (content creator) colludes with a malicious consumer (a user requesting content) to force
any NDN node on their path to insert malicious content in its content store (CS). This results in nodes answering
some requests with such malicious content: For example, a victim may
ask for a specific webpage and instead be redirected to a malicious phishing website. CPA are a dangerous threat
to NDN, as shown in Reference [119]: Analyses on a real system highlighted that identifying CPA is impossible
via static and human-based approaches. This is due to the intrinsic characteristics of NDN, as each node in the
network topology reacts differently. Moreover, NDN are also susceptible to Interest Flooding Attacks (IFA),
a variant of DoS in which the NDN is "flooded" with requests [148] for existing or even
non-existing content, which can disrupt the distribution of content. Although IFA are easier to identify than CPA,
countering both IFA and CPA is challenging and requires the usage of more dynamic analytical techniques—such
as ML.
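For readers unfamiliar with NDN, the toy sketch below (a drastic simplification; names and fields are illustrative) captures the pull-based Interest/Data exchange and shows where a CPA takes effect: A poisoned entry in the content store would be served to every subsequent requester.

```python
from dataclasses import dataclass

@dataclass
class Interest:
    name: str            # e.g., "/data/video.mp4"

@dataclass
class Data:
    name: str
    payload: bytes

class NDNNode:
    def __init__(self, producer_content):
        self.cs = {}                       # content store (cache)
        self.producer = producer_content   # upstream source of truth

    def on_interest(self, interest: Interest) -> Data:
        if interest.name in self.cs:       # cache hit: a poisoned copy would be served here
            return self.cs[interest.name]
        data = Data(interest.name, self.producer[interest.name])
        self.cs[interest.name] = data      # cache the Data for future Interests
        return data

node = NDNNode({"/data/video.mp4": b"<video bytes>"})
print(node.on_interest(Interest("/data/video.mp4")).name)
```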
Montimage ML-Solution. The ML-solution developed by Montimage leverages ensembles of ML models orga-
nized in a Bayesian Network Classifier (BNC) [120]. The intuition is that detection of CPA is only possible by
monitoring the behaviour of each node in a NDN network—and, specifically, by analyzing and cross-correlating
the evolution of different metrics for each node.
Such a goal is achieved by means of specific probes deployed on each node to monitor its complete
activity. In particular, each probe collects metrics related to the Data plane of NDN: CS, Pending Interest Table
(PIT), and Faces. The latter, in particular, are an abstraction of the communication channels that NDN uses for packet
forwarding. Such an abstraction represents data coming from diverse "faces," i.e., overlay tunnels over TCP and
UDP, delivery of NDN network-layer packets (e.g., Interest and Data packets), inter-node communication channels
that send packets to other nodes, and intra-node communication channels that send packets to another process
on the same node.
The information captured by these probes is then analyzed by ensembles of micro-anomaly-detectors, each
focusing on deviations from the normal behaviour of a single metric captured by each probe. CPA can impact
many metrics and in different ways, causing each micro-detector to raise hundreds of (likely false) alarms.
However, correlating all the alarms with a BNC allows us to (i) increase the detection performance while
(ii) mitigating the high rate of false alarms generated by individual micro-detectors.
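The following sketch conveys this intuition: Per-metric micro-detectors raise noisy alarms, and a probabilistic correlation step fuses them into a single verdict. For brevity, the BNC is approximated here with a naive-Bayes combination, and all priors and likelihoods are illustrative assumptions, not Montimage's learned parameters.

```python
def micro_detector(value, mean, std, k=3.0):
    # Flags a deviation of a single metric (e.g., a CS, PIT, or Face counter).
    return abs(value - mean) > k * std

def correlate(alarms, p_anomaly=0.01, p_alarm_anom=0.8, p_alarm_norm=0.05):
    # Posterior P(anomaly | alarm pattern), assuming conditional independence.
    like_a, like_n = p_anomaly, 1.0 - p_anomaly
    for fired in alarms:
        like_a *= p_alarm_anom if fired else 1.0 - p_alarm_anom
        like_n *= p_alarm_norm if fired else 1.0 - p_alarm_norm
    return like_a / (like_a + like_n)

# Three metrics deviating from their (assumed) normal profile of 100 +/- 10.
alarms = [micro_detector(v, 100.0, 10.0) for v in (135.0, 142.0, 131.0)]
print(correlate(alarms))  # ~0.98: concordant alarms yield a confident verdict
```

Note how a single firing micro-detector, with the others silent, would leave the posterior below the 1% prior: This is how the correlation step absorbs the false alarms raised by individual detectors.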
25 https://montimage.com/products/MMT_DPI.html.
Fig. 12. Architecture of the Bayesian network classifier adopted by Montimage to detect CPA in NDN. Each node represents
a micro-detector that focuses on a single metric. The Anomaly node correlates the outputs of all the other nodes.
A schematic representation of the considered BNC is provided in Figure 12: the “anomaly” node (denoted
in red) represents the anomalies that can occur in the entire NDN, whereas the remaining nodes represent the
individual micro-detectors. Hence, each node focuses on a single metric, specifically Faces, CS, or PIT (denoted in
green, purple, and blue in Figure 12). The (directed) edges in the BNC represent the causal relationships between
the Anomaly node and a metric (or pairs of metrics). An edge connects the “causing” node to the “affected” node.
The causal relationships are deduced based on the processing of each packet arriving at the NDN node.
Evaluation and Results. It is necessary to conduct a preliminary assessment of the learning efficiency of the BNC
before its deployment. This is because NDN generate a lot of traffic, and even though the BNC can "condense" the
raised alarms, it is still important that such alarms—and, specifically, false alarms—remain within acceptable levels.
To this purpose, Montimage first collects huge amounts of real data from the probes and then uses such data
(assumed to be benign) to train (and test) a BNC. Specifically, multiple BNC are assessed, each considering a
different training size: The goal is finding the optimal size that minimizes the rate of false alarms. The results of
such an assessment are reported in Figure 13, showing the misclassification error (as measured via fivefold cross-
validation) as a function of the training size. We observe that an optimal value is achieved when the training set
contains ∼280 samples.26 For higher values, the error increases due to overfitting (this phenomenon exemplifies the
misconception outlined in Section 5.2). Thus, for the considered deployment scenario, Montimage uses training
sets of 280 samples—corresponding to 23 minutes of real reports.
Fig. 13. Preliminary assessment of the BNC to identify the optimal size of the training dataset.
26 We observe that such samples represent alarms corresponding to multiple signals, and not to raw events.
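A procedure analogous to the training-size assessment above can be sketched with scikit-learn's learning_curve utility; here the data are synthetic and GaussianNB serves as a stand-in for the BNC, so only the methodology mirrors the text:

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 18))              # 18 metrics, as in this case study
y = (X[:, :3].sum(axis=1) > 2).astype(int)  # toy anomaly labels

sizes, _, test_scores = learning_curve(
    GaussianNB(), X, y, train_sizes=np.linspace(0.1, 1.0, 8), cv=5)
errors = 1.0 - test_scores.mean(axis=1)     # misclassification error per size
print("optimal training size:", sizes[np.argmin(errors)], "samples")
```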
To evaluate the performance in production settings, Montimage reproduces the NDN topology in Refer-
ence [182] and creates two distinct environments, each adopting a specific NDN routing strategy: bestroute or
multicast. Then, each environment is monitored for 10 minutes, and the attack is simulated in the last 5 minutes.
Specifically, multiple CPA are launched with increasing payloads, where the payload denotes the number of requests
for content (i.e., Interests) per second; in our case, we consider payloads of 5, 10, 20, and 50 Interests per sec-
ond. In comparison, legitimate clients produce 10 Interests per second (on average): Hence, the malicious traffic
ranges from half to five times the legitimate traffic. The traffic generated during such simulations is collected
and used to assess the quality of the BNC: The goal is to verify whether the BNC is capable of identifying the
CPA, which occurs in the last 5 minutes.
To provide a twofold perspective of the performance (see Section 5.3), Montimage measures the True-
Positive Rate (TPR) and the False Positive Rate (FPR) (cf. Table 1 in Section 2.1). The results of such an evaluation,
performed on a testing set of 240 samples, are reported in Table 2. We observe that the TPR increases for greater
payloads, because the CPA become more conspicuous. Nonetheless, it is appreciable that even CPA with a low
payload can be effectively detected. Finally, the low FPR is crucial for real deployments, as false alarms are annoying
to human operators. All such results are due to the advantages provided by the BNC, whose
probabilistic approach allows us to take into account the underlying random nature of the observed metrics.
Such a property makes BNC well suited for multi-variate anomaly detection in real environments. In contrast, other
ML algorithms present significant drawbacks: For instance, "deep" neural networks are excessively difficult to
develop in such settings (also due to their poor explainability), whereas other "shallow" algorithms, such as
SVM, simply do not allow us to efficiently represent and correlate all the metrics affected by CPA.
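For reference, the two metrics reduce to simple ratios over the confusion-matrix counts of Table 1; the counts below are purely illustrative and do not reproduce Table 2:

```python
def tpr_fpr(tp, fn, fp, tn):
    # TPR = detected attacks / all attacks; FPR = false alarms / all benign samples.
    return tp / (tp + fn), fp / (fp + tn)

print(tpr_fpr(tp=46, fn=4, fp=3, tn=187))  # illustrative counts only
```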
The major limitation of BNC is its intrinsic function as an anomaly detector: Indeed, an anomaly is not necessarily
malicious. For instance, in a NDN setting, a sudden demand for a video from legitimate users could lead to a
temporary increase in traffic, which would appear as an abnormal activity. To mitigate this problem, Montimage considers
four possible "states": normal state, IFA attack state, CPA attack state, and increase in the number of users. Each state is
denoted by different "anomalous" combinations taking into account a total of 18 metrics: Such a solution allows
us to maintain the FPR at acceptable levels (as shown in Table 2). We take this opportunity to make a crucial
remark for real ML deployments: One may believe that defining more "states" and/or increasing the amount
of considered metrics leads to better results. However, according to Montimage, such an approach can yield
proficient results only in a lab environment, because it induces overfitting, and the true deployment performance
may suffer from an excessive FPR.
Finally, an intriguing future development of such an ML solution involves the consideration of "stateful" anal-
yses that take into account the time axis (as done, e.g., in Reference [56]) and allow the detection of anomalies
occurring in the temporal domain. The next case study by S2Grupo considers a similar application.
Industrial Control Systems (ICS) are an appealing target for attackers [68]. In this case study, we share the experience in the design and operation of CAIAC,27 a non-
intrusive device that leverages sequential ML to protect ICS against APT and other cyber-threats.
Scenario and Challenges. This case study highlights the advantages of ML applications for anomaly detection
in time-series data. The intuition is that APT leverage zero-day vulnerabilities and hence cannot be detected via
misuse-based detection approaches—irrespective of being human or data driven. However, pointwise and static
anomaly detection approaches are not enough to detect advanced cyberattacks, and the additional perspective
provided by the temporal domain may facilitate the detection of refined offensive strategies [132].
In the specific ICS scenario, there are two crucial requirements that must be met by security systems. First, they
should operate in a non-intrusive way, avoiding additional overhead and ensuring the regular functionalities of
the ICS: This is a tough requirement, because ICS include hundreds of devices, and while excessive false alarms
are annoying, slow reaction times may imply a failure of the entire ICS. Second, they must take into account the
complexity and variability of the data in ICS, which are difficult to manage due to the intrinsic heterogeneity of ICS.
Such a requirement cannot be met just with traditional approaches for time-series anomaly detection based on
heuristics: To address this problem, S2Grupo leverages the capabilities of deep learning.
S2Grupo ML-Solution. The ML solution developed by S2Grupo, CAIAC, is an intriguing example of ML or-
chestration (Section 6.4): CAIAC not only leverages the benefits provided by “small” ML models (as done in
Section 7.1) but also exploits the potential of non-ML methods for time-series analyses. In particular, the idea
is to combine deep learning algorithms, epitomized by Long Short-Term Memory (LSTM) neural networks,
with statistical approaches for time-series forecasting, such as Seasonal Autoregressive Integrated Moving
Average (SARIMA). The result is an ensemble of ML and non-ML models, exploiting the benefits of both ap-
proaches and overcoming their limitations: Statistical models can be more manageable, but when the data have
high complexity deep learning is superior. Such a design choice is particularly suited for real ICS deployments
due to a threefold advantage with respect to "one-size-fits-all" ML architectures. Specifically:
• individual ML models are easier to train, because they must deal only with a tiny portion of the data,
resulting in better performance and fewer false alarms;
• it allows combining different algorithms, each tailored to a specific problem and data type;
• it makes the resulting system more "future proof," because each ML model can be individually updated,
removed, or replaced.
Furthermore, CAIAC is based on passive monitoring in near real time, hence preventing excessive information
overhead while still allowing timely responses.
Let us explain CAIAC in more detail. The intuition is to analyze the network traffic of the considered ICS from
different perspectives, each associated with a specific time series. These time series can differ on the basis of two
criteria: the network metric (e.g., transmitted packets) and the granularity used to aggregate the corresponding
metric in time slots of fixed length. All such time series are used to devise multiple ML and non-ML models: The
performance of each model can be assessed individually by forwarding its detected anomalies to a higher-level
correlation layer (similarly to Reference [132]). The goal of this layer is to determine the nature of such anomalies:
They can either be legitimate (i.e., a "normal" malfunctioning of a component that must be investigated) or
illegitimate (i.e., an attack is taking place). Such a procedure allows us to identify the most suitable models
that will be integrated in CAIAC, depending on the pros and cons of each model. Indeed, LSTM models may
yield a superior performance but require a training phase, whereas statistical models are easier to develop and
only require some tuning. Hence, such (non-ML) models are the preferred choice when they exhibit similar
performance to LSTM.
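The resulting selection logic can be sketched as follows; the scoring metric, the margin, and the example values are illustrative assumptions:

```python
def select_model(sarima_score, lstm_score, margin=0.05):
    # Prefer the simpler, training-free statistical model unless the LSTM
    # is clearly better on a validation split of the same time series.
    return "lstm" if lstm_score - sarima_score > margin else "sarima"

print(select_model(sarima_score=0.88, lstm_score=0.90))  # -> sarima
```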
27 https://s2grupo.es/en/research-development-innovation/industrial-cybersecurity/caiac.html.
Fig. 14. Anomaly detection with (non-ML) SARIMA, using a sliding window of 30 minutes. The time series represents the
transmitted packets (y-axis) within 5-minute slots, over a period of 1 week (x-axis), corresponding to a total of 2K sam-
ples. Dark blue corresponds to actual values, orange denotes the values predicted with SARIMA, and light blue denotes the
confidence interval of SARIMA’s predictions. Vertical gray lines correspond to the anomalies detected by SARIMA.
Evaluation and Results. To develop CAIAC, it is necessary to first assess the characteristics of the specific
ICS: Indeed, it is not possible to use models trained on different environments (as explained in Section 5.3).
Hence, S2Grupo monitors and collects the network traffic of the considered ICS and creates multiple time series,
each considering a given metric and granularity. Some metrics are commonly adopted in NIDS (e.g., transmitted
packets or bytes, in-/out-degree [132]); others are specific to ICS and require dedicated industrial dissectors that
extract the relevant information (e.g., protocol, parameters, command density). Finally, each metric is aggregated
in time slots of varying length, from 1 minute to 1 hour.
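Such an aggregation step can be sketched with pandas, turning a single raw event log into several time series, one per (metric, slot length) pair; the column names and values are assumptions:

```python
import pandas as pd

events = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=600, freq="s"),
    "packets": 1,     # one packet per event (toy values)
    "bytes": 512,
}).set_index("timestamp")

# One time series per (metric, granularity) combination.
series = {
    (metric, slot): events[metric].resample(slot).sum()
    for metric in ("packets", "bytes")
    for slot in ("1min", "5min")   # up to "1h" in the text
}
print(series[("packets", "5min")].head())
```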
After this data collection phase, which in the considered setting typically amounts to about 10 GB of data per
day, S2Grupo performs an exploratory analysis focused on determining the most proficient (ML and non-ML)
algorithms for studying each time series. Let us elucidate the differences between two specific applications of
SARIMA and LSTM, starting from the non-ML algorithm.
Specifically, SARIMA analyzes a time series by adopting a sliding window approach: All data points within
a given time window are considered by SARIMA to predict a “future” value, which is provided alongside a
confidence range. We provide an example of SARIMA in Figure 14, showing the time series of the transmitted
packets aggregated in time slots of 5 minutes, over a period of 1 week; the sliding window considered by SARIMA
is of 30 minutes. The actual values are reported in dark blue, whereas the values predicted via SARIMA are shown
in orange; the confidence window of each predicted value is shown in light blue: Therefore, actual values that fall
outside of this range are treated as anomalous. In particular, vertical gray lines denote the anomalies detected
by SARIMA.
From Figure 14, we observe that SARIMA accurately detects stationary deviations. However, SARIMA can
only detect non-stationary changes when they happen within its sliding window. Furthermore, non-stationary
(but legitimate) changes that occur after a long stationary interval are falsely detected as anomalies by SARIMA.
Despite some incorrect predictions, the considered application of SARIMA obtained a performance that was
deemed appropriate for the given task, and it was integrated in CAIAC.
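A minimal sketch of such a sliding-window detector, built on the SARIMAX implementation of statsmodels, is shown below; the model order and the synthetic series are assumptions for illustration, whereas the production model is tuned to the real ICS data:

```python
import warnings
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

warnings.filterwarnings("ignore")          # tiny windows emit convergence warnings

rng = np.random.default_rng(0)
series = 100 + rng.normal(0, 5, size=60)   # packets per 5-minute slot
series[45] += 60                           # injected anomaly

window = 6                                 # 30 minutes of 5-minute slots
for t in range(window, len(series)):
    res = SARIMAX(series[t - window:t], order=(1, 0, 0)).fit(disp=False)
    lo, hi = res.get_forecast(steps=1).conf_int(alpha=0.05)[0]
    if not lo <= series[t] <= hi:          # actual value outside the confidence band
        print(f"anomaly at slot {t}: {series[t]:.1f} not in [{lo:.1f}, {hi:.1f}]")
```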
Let us showcase an application of deep learning via LSTM. Since LSTM do not provide a confidence interval for
each prediction, S2Grupo developed a custom anomaly threshold that takes into account the deviation between
predicted and actual values, as well as the degree of accumulation of such deviation in the past history.
Fig. 15. Anomaly detection with a deep LSTM neural network. The time series represents the transmitted packets (y-axis)
within 1-minute slots, over the period of 1 week (x-axis), corresponding to a total of 10K samples. Actual values are shown
in blue, and the LSTM predictions are shown in orange. Vertical gray lines denote the anomalies detected by the LSTM.
An example of such an LSTM application is given in Figure 15, showing the time series of the transmitted packets
(same as Figure 14) but with a time slot of 1 minute. The actual values are shown in blue, whereas the LSTM
predictions are in orange. Vertical gray lines denote the anomalies detected by the LSTM, i.e., when the actual
values fall outside the anomaly threshold computed with the LSTM's predictions.
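One possible reading of such a criterion is sketched below: The pointwise prediction error is combined with an exponentially accumulated error history, and a sample is flagged when either quantity exceeds its threshold. All constants are assumptions, and the LSTM itself is abstracted away as an array of predictions.

```python
import numpy as np

def lstm_anomalies(actual, predicted, point_thr=3.0, acc_thr=6.0, decay=0.9):
    errors = np.abs(actual - predicted)
    accumulated, flags = 0.0, []
    for t, err in enumerate(errors):
        accumulated = decay * accumulated + err   # memory of past deviations
        if err > point_thr or accumulated > acc_thr:
            flags.append(t)
    return flags

actual = np.array([10.0, 11.0, 10.5, 25.0, 10.8])
predicted = np.array([10.2, 10.9, 10.6, 10.7, 10.9])
print(lstm_anomalies(actual, predicted))  # [3, 4]: the burst keeps the accumulator high
```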
From Figure 15, we can observe that, by reducing the time slot from 5 to 1 minute, the resulting time series is
less predictable, making statistical methods unfeasible and requiring the advanced capabilities of deep learning.
Indeed, the considered LSTM can detect anomalous values without being affected by non-stationary changes—
even after long stationary intervals. This example highlights the capability of (deep) ML to deal with data of
high dimensionality: The LSTM takes into account a long "past" history, allowing it to better infer the "normal"
behaviour. In contrast, applying SARIMA to the same time series yielded very poor results due to the intrinsic
variability of the sequence, which forced us to aggregate data in 5-minute time slots.
However, it is important to take into account that the LSTM requires a training step, whereas SARIMA only
requires some parameter adjustment. In this use-case, the LSTM in Figure 15 was trained with data collected
over 3 weeks. Such a characteristic implies that a similar LSTM model requires at least 3 weeks of data collection,
since no previous network traffic data were available to train the model—alongside the additional computational
resources to store such data and train the LSTM model (which were within acceptable levels). Hence, CAIAC
would initially make use of SARIMA and then replace it after enough data have been collected to develop a more
proficient LSTM model.
We can conclude that machine (and deep) learning are powerful instruments for protecting modern ICS, but
methods that do not leverage ML are equally important to compensate for some of the limitations of ML. As such,
future developments should not exclusively focus on ML and overlook the benefits provided by other data-driven
methods.
8 CONCLUSION
This article elucidates the role of ML in cybersecurity by providing a broad and high-level overview of the
benefits, problems, and future challenges of ML in this domain. Our article is addressed to the entire cybersecurity
sphere, and to make our contribution understandable by a broad audience, we limit technical terms to a minimum.
Table 3. The misconceptions clarified in this article.
#   Misconception                              Reference
1   Deep Learning vs Shallow Learning          Section 2.1
2   Machine Learning and Anomaly Detection     Section 3
3   Legitimacy of Adversarial Samples          Section 5.1
4   Minimal Adversarial Perturbations          Section 5.1
5   Size of training data                      Section 5.2
6   Updating ML models with new data           Section 5.2
Moreover, we also clarify many misconceptions (summarized in Table 3) that are becoming common due to the
increasing abundance of works that link ML with cybersecurity applications.
After introducing the basic concepts of ML, we provide a concise summary of their applications to detect three
types of cyber threats: Malware, Phishing, and Network Intrusions. Then we elucidate some additional cyberse-
curity areas that can leverage the autonomous capabilities of ML, such as raw-data analysis, alert management,
cyber risk estimation, and threat intelligence. What follows is a description of the fundamental problems affect-
ing ML within the specific context of operational cybersecurity, which should be known to weigh the pros and
cons of the still-emerging ML solutions. Some of these problems stem from the intrinsic conflicts between the
fundamental principles of ML and the cybersecurity domain and can be addressed only by the joint effort of
different worlds: regulatory and authoritative bodies, corporate executives and engineers, as well as the entire
scientific community. To this end, we highlight the future challenges of ML in cybersecurity, which we complement
with comprehensive recommendations addressed to each of these separate worlds. Finally, we present two case
studies of successful—and operational—industrial deployments of ML to counter cyber threats.
This article will hopefully inspire meaningful developments of ML in the cybersecurity domain, laying the
foundations for an increased deployment of ML solutions to protect current and future systems.
REFERENCES
[1] 2020. On Artificial Intelligence—A European Approach to Excellence and Trust. Technical Report. European Commission.
[2] 2021. Darktrace Industrial Uses Machine Learning to Identify Cyber Campaigns Targeting Critical Infrastructure. Retrieved August
2021 from https://www.darktrace.com/en/press/2017/204/.
[3] 2021. Gartner Predicts by 2025 Cyber Attackers Will Have Weaponized Operational Technology Environments to Successfully Harm
or Kill Humans. Retrieved August 2021 from https://www.gartner.com/en/newsroom/press-releases/2021-07-21-gartner-predicts-by-
2025-cyber-attackers-will-have-we.
[4] 2021. S&T Artificial Intelligence and Machine Learning Strategic Plan. Technical Report. U.S. Department of Homeland Security.
[5] Alexander Afanasyev, Priya Mahadevan, Ilya Moiseenko, Ersin Uzun, and Lixia Zhang. 2013. Interest flooding attack and countermea-
sures in named data networking. In Proceedings of the IFIP Networking Conference. IEEE, 1–9.
[6] Bengt Ahlgren, Christian Dannewitz, Claudio Imbrenda, Dirk Kutscher, and Börje Ohlman. 2012. A survey of information-centric
networking. IEEE Commun. Mag. 50, 7 (2012), 26–36. https://doi.org/10.1109/MCOM.2012.6231276
[7] Muna Al-Hawawreh and Elena Sitnikova. 2019. Leveraging deep learning models for ransomware detection in the industrial Internet
of Things environment. In Proceedings of the IEEE Military Communications and Information Systems Conference. 1–6.
[8] Mohammed Al-Qizwini, Iman Barjasteh, Hothaifa Al-Qassab, and Hayder Radha. 2017. Deep learning algorithm for autonomous
driving using GoogLeNet. In Proceedings of the IEEE Intelligent Vehicles Symposium. 89–96.
[9] Areej Alhogail and Afrah Alsabih. 2021. Applying machine learning and natural language processing to detect phishing email. Comput.
Secur. 110 (2021), 102414.
[10] Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein, and Yves Le Traon. 2016. Androzoo: Collecting millions of android apps for the
research community. In Proceedings of the IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR’16). IEEE, 468–471.
[11] Mohammad Almseidin, Maen Alzubi, Szilveszter Kovacs, and Mouhammd Alkasassbeh. 2017. Evaluation of machine learning algo-
rithms for intrusion detection system. In Proceedings of the IEEE 15th International Symposium on Intelligent Systems and Informatics
(SISY’17). IEEE, 000277–000282.
[12] Mohammed Almukaynizi, Eric Nunes, Krishna Dharaiya, Manoj Senguttuvan, Jana Shakarian, and Paulo Shakarian. 2017. Proactive
identification of exploits in the wild through vulnerability mentions online. In Proceedings of the IEEE International Conference on Cyber
Conflict US (CyCon US’17). Institute of Electrical and Electronics Engineers Inc., 82–88.
[13] Nisreen Alzahrani and Daniyal Alghazzawi. 2019. A review on android ransomware detection using deep learning techniques. In
Proceedings of the ACM International Conference Management of Digital EcoSystems. 330–335.
[14] Kasun Amarasinghe, Kevin Kenney, and Milos Manic. 2018. Toward explainable deep neural network based anomaly detection. In
Proceedings of the IEEE International Conference Human System Interaction. 311–317.
[15] Eslam Amer and Ivan Zelinka. 2020. A dynamic Windows malware detection and prediction method based on contextual understanding
of API call sequence. Comput. Secur. 92 (2020), 101760.
[16] Abderrahmen Amich and Birhanu Eshete. 2021. Explanation-guided diagnosis of machine learning evasion attacks. In Proceedings of
the ACM International Conference on Availability, Reliability and Security.
[17] Hyrum S. Anderson, Jonathan Woodbridge, and Bobby Filar. 2016. DeepDGA: Adversarially-tuned domain generation and detection.
In Proceedings of the ACM Workshop on Artificial Intelligence and Security. 13–21.
[18] Giuseppina Andresini, Feargus Pendlebury, Fabio Pierazzi, Corrado Loglisci, Annalisa Appice, and Lorenzo Cavallaro. 2021. INSOM-
NIA: Towards concept-drift robustness in network intrusion detection. In Proceedings of the ACM CCS Workshop on Artificial Intelligence
and Security.
[19] Giovanni Apruzzese, Mauro Andreolini, Michele Colajanni, and Mirco Marchetti. 2020. Hardening random forest cyber detectors
against adversarial attacks. IEEE Trans. Emerg. Top. Comput. Intell. 4, 4 (2020), 427–439.
[20] Giovanni Apruzzese, Mauro Andreolini, Luca Ferretti, Mirco Marchetti, and Michele Colajanni. 2021. Modeling realistic adversarial
attacks against network intrusion detection systems. ACM Digit. Threats: Res. Pract. (2021).
[21] G. Apruzzese, M. Andreolini, M. Marchetti, A. Venturi, and M. Colajanni. 2020. Deep reinforcement adversarial learning against botnet
evasion attacks. IEEE Trans. Netw. Serv. Manage. (2020).
[22] Giovanni Apruzzese and Michele Colajanni. 2018. Evading botnet detectors based on flows and random forest with adversarial samples.
In Proceedings of the IEEE International Symposium on Network Computing and Applications. 1–8.
[23] Giovanni Apruzzese, Michele Colajanni, Luca Ferretti, Alessandro Guido, and Mirco Marchetti. 2018. On the effectiveness of machine
and deep learning for cybersecurity. In Proceedings of the IEEE International Conference on Cyber Conflicts. 371–390.
[24] Giovanni Apruzzese, Michele Colajanni, Luca Ferretti, and Mirco Marchetti. 2019. Addressing adversarial attacks against security
systems based on machine learning. In Proceedings of the IEEE International Conference on Cyber Conflicts. 1–18.
[25] Giovanni Apruzzese, Michele Colajanni, and Mirco Marchetti. 2019. Evaluating the effectiveness of adversarial attacks against botnet
detectors. In Proceedings of the IEEE 18th International Symposium on Network Computing and Applications (NCA’19). IEEE, 1–8.
[26] Giovanni Apruzzese, Mirco Marchetti, Michele Colajanni, Gabriele Gambigliani Zoccoli, and Alessandro Guido. 2017. Identifying
malicious hosts involved in periodic communications. In Proceedings of the IEEE International Symposium on Network Computing
Applications. 1–8.
[27] Giovanni Apruzzese, Luca Pajola, and Mauro Conti. 2022. The cross-evaluation of machine learning-based network intrusion detection
systems. IEEE Trans. Netw. Serv. Manage. (2022).
[28] Giovanni Apruzzese, Fabio Pierazzi, Michele Colajanni, and Mirco Marchetti. 2017. Detection and threat prioritization of pivoting
attacks in large networks. IEEE Trans. Emerg. Top. Comput. 8, 2 (2017), 404–415.
[29] Giovanni Apruzzese, Aliya Tastemirova, and Pavel Laskov. 2022. SoK: The impact of unlabelled data in cyberthreat detection. In
Proceedings of the IEEE European Symposium on Security Privacy.
[30] Daniel Arp, Erwin Quiring, Feargus Pendlebury, Alexander Warnecke, Fabio Pierazzi, Christian Wressnegger, Lorenzo Cavallaro, and
Konrad Rieck. 2021. Dos and don’ts of machine learning in computer security. In Proceedings of the USENIX Security Symposium.
[31] Daniel Arp, Michael Spreitzenbarth, Malte Hubner, Hugo Gascon, Konrad Rieck, and CERT Siemens. 2014. Drebin: Effective and
explainable detection of android malware in your pocket. In Proceedings of the Network and Distributed System Security Symposium
(NDSS’14), Vol. 14. 23–26.
[32] Mehdi Babagoli, Mohammad Pourmahmood Aghababa, and Vahid Solouk. 2019. Heuristic nonlinear regression strategy for detecting
phishing websites. Soft Comput. 23, 12 (2019), 4315–4327.
[33] Ram Basnet. 2014. Learning to detect phishing URLs. Int. J. Res. Eng. Technol. 3 (2014), 11–24.
[34] Manjula C. Belavagi and Balachandra Muniyal. 2016. Performance evaluation of supervised machine learning algorithms for intrusion
detection. Proc. Comput. Sci. 89 (2016), 117–123.
[35] Jacopo Bellasio and Erik Silfversten. 2020. The impact of new and emerging technologies on the cyber threat landscape and their
implications for NATO. In Cyber Threats and NATO 2030: Horizon Scanning and Analysis, 88.
[36] Daniel S. Berman, Anna L. Buczak, Jeffrey S. Chavis, and Cherita L. Corbett. 2019. A survey of deep learning methods for cyber security.
Information 10, 4 (2019), 122.
[37] Gustavo de Carvalho Bertoli, Lourenço Alves Pereira Junior, Filipe Alves Neto Verri, Aldri Luiz dos Santos, and Osamu Saotome.
2021. Bridging the gap to real-world for network intrusion detection systems with data-centric approach. In Proceedings of the Neural
Information Processing Systems.
[38] Battista Biggio, Igino Corona, Zhi-Min He, Patrick P. K. Chan, Giorgio Giacinto, Daniel S. Yeung, and Fabio Roli. 2015. One-and-a-half-
class multiple classifier systems for secure learning against evasion attacks at test time. In Proceedings of the International Workshop
on Multiple Classifier Systems. Springer, 168–180.
[39] Battista Biggio and Fabio Roli. 2018. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recogn. 84 (2018),
317–331.
[40] Leyla Bilge, Engin Kirda, Christopher Kruegel, and Marco Balduzzi. 2011. EXPOSURE: Finding malicious domains using passive DNS
analysis. In Proceedings of the Network and Distributed System Security Symposium (NDSS’11). 1–17.
[41] Adel Binbusayyis and Thavavel Vaiyapuri. 2019. Identifying and benchmarking key features for cyber intrusion detection: An ensemble
approach. IEEE Access 7 (2019), 106495–106513.
[42] Franziska Boenisch, Verena Battis, Nicolas Buchmann, and Maija Poikela. 2021. “I never thought about securing my machine learning
systems”: A study of security and privacy awareness of machine learning practitioners. In Mensch und Computer 2021. 520–546.
[43] Atul Bohara, Mohammad A. Noureddine, Ahmed Fawaz, and William H. Sanders. 2017. An unsupervised multi-detector approach for
identifying malicious lateral movement. In Proceedings of the IEEE 36th Symposium on Reliable Distributed Systems (SRDS’17). IEEE,
224–233.
[44] Gianluca Bontempi, Souhaib Ben Taieb, and Yann-Aël Le Borgne. 2012. Machine learning strategies for time series forecasting. In
European Business Intelligence Summer School. 62–77.
[45] Emilie Bout, Valeria Loscri, and Antoine Gallais. 2021. How machine learning changes the nature of cyberattacks on IoT networks: A
survey. IEEE Commun. Surv. Tutor. (2021).
[46] Anna L. Buczak and Erhan Guven. 2015. A survey of data mining and machine learning methods for cyber security intrusion detection.
IEEE Commun. Surv. Tutor. 18, 2 (2015), 1153–1176.
[47] Elie Bursztein, Matthieu Martin, and John Mitchell. 2011. Text-based CAPTCHA strengths and weaknesses. In Proceedings of the ACM
Computer and Communications Security Conference. 125–138.
[48] Nicholas Carlini and David Wagner. 2016. Defensive distillation is not robust to adversarial examples. arXiv:1607.04311. Retrieved
from https://arxiv.org/abs/1607.04311.
[49] Tanmoy Chakraborty, Fabio Pierazzi, and V. S. Subrahmanian. 2017. EC2: Ensemble clustering and classification for predicting android
malware families. IEEE Trans. Depend. Sec. Comput. (2017).
[50] Sujita Chaudhary, Austin O’Brien, and Shengjie Xu. 2020. Automated post-breach penetration testing through reinforcement learning.
In Proceedings of the IEEE Conference on Communications and Network Security (CNS’20). 1–2.
[51] Haipeng Chen, Jing Liu, Rui Liu, Noseong Park, and V. S. Subrahmanian. 2019. VASE: A twitter-based vulnerability analysis and score
engine. In Proceedings of the IEEE International Conference on Data Mining (ICDM’19). IEEE, 976–981.
[52] Li Chen, Salmin Sultana, and Ravi Sahita. 2018. Henet: A deep learning approach on intel processor trace for effective exploit detection.
In Proceedings of the IEEE Security and Privacy Workshops. 109–115.
[53] Howard Chivers, John A. Clark, Philip Nobles, Siraj A. Shaikh, and Hao Chen. 2013. Knowing who to watch: Identifying attackers
whose actions are hidden within false alarms and background noise. Inf. Syst. Front. 15, 1 (2013), 17–34.
[54] Zheng Leong Chua, Shiqi Shen, Prateek Saxena, and Zhenkai Liang. 2017. Neural nets can learn function type signatures from binaries.
In Proceedings of the 26th USENIX Security Symposium (USENIX Security’17). 99–116.
[55] Igino Corona, Battista Biggio, Matteo Contini, Luca Piras, Roberto Corda, Mauro Mereu, Guido Mureddu, Davide Ariu, and Fabio Roli.
2017. Deltaphish: Detecting phishing webpages in compromised websites. In European Symposium on Research in Computer Security.
Springer, 370–388.
[56] Andrea Corsini, Shanchieh Yang, and Giovanni Apruzzese. 2021. On the evaluation of sequential machine learning for network intru-
sion detection. In Proceedings of the International Conference Availability, Reliability, Security.
[57] Ittai Dayan, Holger R. Roth, Aoxiao Zhong, Ahmed Harouni, Amilcare Gentili, Anas Z. Abidin, Andrew Liu, Anthony Beardsworth
Costa, Bradford J. Wood, Chien-Sung Tsai, et al. 2021. Federated learning for predicting clinical outcomes in patients with COVID-19.
Nat. Med. (2021), 1–9.
[58] Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, and Oriol Vinyals. 2021. The
benchmark lottery. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’21).
[59] Ambra Demontis, Marco Melis, Battista Biggio, Davide Maiorca, Daniel Arp, Konrad Rieck, Igino Corona, Giorgio Giacinto, and Fabio
Roli. 2017. Yes, machine learning can be more secure! A case study on android malware detection. IEEE Trans. Depend. Sec. Comput.
(2017).
[60] Ambra Demontis, Marco Melis, Maura Pintor, Matthew Jagielski, Battista Biggio, Alina Oprea, Cristina Nita-Rotaru, and Fabio Roli.
2019. Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks. In Proceedings of the USENIX
Security Symposium. 321–338.
[61] Melvin Diale, Turgay Celik, and Christiaan Van Der Walt. 2019. Unsupervised feature learning for spam email filtering. Comput. Electr.
Eng. 74 (2019), 89–104.
[62] Luis Dias, Simão Valente, and Miguel Correia. 2020. Go with the flow: Clustering dynamically-defined netflow features for network
intrusion detection with DynIDS. In Proceedings of the IEEE 19th International Symposium on Network Computing and Applications
(NCA’20). IEEE, 1–10.
[63] Jesús E. Díaz-Verdejo, Antonio Estepa, Rafael Estepa, German Madinabeitia, and Fco Javier Muñoz-Calle. 2020. A methodology for
conducting efficient sanitization of HTTP training datasets. Fut. Gener. Comput. Syst. 109 (2020), 67–82.
[64] Min Du, Feifei Li, Guineng Zheng, and Vivek Srikumar. 2017. Deeplog: Anomaly detection and diagnosis from system logs through
deep learning. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security. 1285–1298.
[65] Murat Dundar, Balaji Krishnapuram, Jinbo Bi, and R. Bharat Rao. 2007. Learning classifiers when the training data is not IID. In
Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’07). 756–61.
[66] Gints Engelen, Vera Rimmer, and Wouter Joosen. 2021. Troubleshooting an intrusion detection dataset: The CICIDS2017 case study.
In Proceedings of the IEEE Security and Privacy Workshop. 7–12.
[67] Yong Fang, Cheng Zhang, Cheng Huang, Liang Liu, and Yue Yang. 2019. Phishing email detection using improved RCNN model with
multilevel vectors and attention mechanism. IEEE Access 7 (2019), 56329–56340.
[68] Cheng Feng, Tingting Li, and Deeph Chana. 2017. Multi-level anomaly detection in industrial control systems via package signatures
and LSTM networks. In Proceedings of the 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN’17).
IEEE, 261–272.
[69] Simone Fischer-Hübner, Cristina Alcaraz, Afonso Ferreira, Carmen Fernandez-Gago, Javier Lopez, Evangelos Markatos, Lejla Islami,
and Mahdi Akil. 2021. Stakeholder perspectives and requirements on cybersecurity in Europe. J. Inf. Secur. Appl. 61 (2021), 102916.
[70] Tushaar Gangavarapu, C. D. Jaidhar, and Bhabesh Chanduka. 2020. Applicability of machine learning in spam and phishing email
filtering: Review and approaches. Artif. Intell. Rev. (2020), 1–63.
[71] Joseph Gardiner and Shishir Nagaraja. 2016. On the security of machine learning in malware c&c detection: A survey. ACM Comput.
Surv. 49, 3 (2016), 59.
[72] José Tomás Martínez Garre, Manuel Gil Pérez, and Antonio Ruiz-Martínez. 2021. A novel machine learning-based approach for the
detection of SSH botnet infection. Fut. Gener. Comput. Syst. 115 (2021), 387–396.
[73] Hugo Gascon, Steffen Ullrich, Benjamin Stritter, and Konrad Rieck. 2018. Reading between the lines: Content-agnostic detection of
spear-phishing emails. In Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 69–91.
[74] Mohamed C. Ghanem and Thomas M. Chen. 2018. Reinforcement learning for intelligent penetration testing. In Proceedings of the
IEEE 2nd World Conference on Smart Trends in Systems, Security and Sustainability. 185–192.
[75] Arnaldo Gouveia and Miguel Correia. 2020. Towards quantum-enhanced machine learning for network intrusion detection. In Pro-
ceedings of the IEEE 19th International Symposium on Network Computing and Applications (NCA’20). 1–8.
[76] Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick McDaniel. 2017. Adversarial examples for malware
detection. In Proceedings of the European Symposium on Research in Computer Security. Springer, 62–79.
[77] Malay Haldar, Mustafa Abdool, Prashant Ramanathan, Tao Xu, Shulin Yang, Huizhong Duan, Qing Zhang, Nick Barrow-Williams,
Bradley C. Turnbull, Brendan M. Collins, et al. 2019. Applying deep learning to AirBnB search. In Proceedings of the ACM SIGKDD
International Conference Knowledge Discovery and Data Mining. 1927–1935.
[78] Richard Harang and Ethan M. Rudd. 2020. SOREL-20M: A large scale benchmark dataset for malicious PE detection. arXiv:2012.07634.
Retrieved from https://arxiv.org/abs/2012.07634.
[79] Martin Horák, Václav Stupka, and Martin Husák. 2019. GDPR compliance in cybersecurity software: A case study of DPIA in informa-
tion sharing platform. In Proceedings of the ACM International Conference Availability, Reliability and Security. 1–8.
[80] Xin Hu, Kang G. Shin, Sandeep Bhatkar, and Kent Griffin. 2013. Mutantx-s: Scalable malware clustering based on static features. In
Proceedings of the USENIX Annual Technical Conference. 187–198.
[81] Yupeng Hu, Wenxin Kuang, Zheng Qin, Kenli Li, Jiliang Zhang, Yansong Gao, Wenjia Li, and Keqin Li. 2021. Artificial intelligence
security: Threats and countermeasures. ACM Comput. Surv. 55, 1 (2021), 1–36.
[82] Martin Husák, Tomáš Jirsík, and Shanchieh Jay Yang. 2020. SoK: Contemporary issues and challenges to enable cyber situational
awareness for network security. In Proceedings of the International Conference on Availability, Reliability and Security. 1–10.
[83] Mohammad S. Jalali, Michael Siegel, and Stuart Madnick. 2019. Decision-making and biases in cybersecurity capability development:
Evidence from a simulation game experiment. J. Strateg. Inf. Syst. 28, 1 (2019), 66–82.
[84] Ahmad Javaid, Quamar Niyaz, Weiqing Sun, and Mansoor Alam. 2016. A deep learning approach for network intrusion detection
system. In Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (Formerly
BIONETICS). 21–26.
[85] Michael I. Jordan and Tom M. Mitchell. 2015. Machine learning: Trends, perspectives, and prospects. Science 349, 6245 (2015), 255–260.
[86] Roberto Jordaney, Kumar Sharad, Santanu K. Dash, Zhi Wang, Davide Papini, Ilia Nouretdinov, and Lorenzo Cavallaro. 2017. Transcend:
Detecting concept drift in malware classification models. In Proceedings of the USENIX Security Symposium. 625–642.
[87] Mahmoud Kalash, Mrigank Rochan, Noman Mohammed, Neil D. B. Bruce, Yang Wang, and Farkhund Iqbal. 2018. Malware classification
with deep convolutional neural networks. In Proceedings of the 9th IFIP International Conference on New Technologies, Mobility and
Security (NTMS’18). IEEE, 1–5.
[88] Chanhyun Kang, Noseong Park, B. Aditya Prakash, Edoardo Serra, and V. S. Subrahmanian. 2016. Ensemble models for data-driven
prediction of malware infections. In Proceedings of the 9th ACM International Conference on Web Search and Data Mining. 583–592.
[89] Asif Karim, Sami Azam, Bharanidharan Shanmugam, Krishnan Kannoorpatti, and Mamoun Alazab. 2019. A comprehensive survey for
intelligent spam email detection. IEEE Access 7 (2019), 168261–168295.
[90] Houssain Kettani and Polly Wainwright. 2019. On the top threats to cyber systems. In Proceedings of the IEEE 2nd International Con-
ference on Information and Computer Technologies (ICICT’19). IEEE, 175–179.
[91] Ahsan Al Zaki Khan. 2019. Misuse intrusion detection using machine learning for gas pipeline scada networks. In Proceedings of the
International Conference Security and Management. 84–90.
[92] Platon Kotzias, Juan Caballero, and Leyla Bilge. 2021. How did that get in my phone? Unwanted app distribution on android devices.
In Proceedings of the IEEE Symposium on Security and Privacy. 53–69.
[93] Nir Kshetri. 2021. Economics of artificial intelligence in cybersecurity. IEEE IT Profess. 23, 5 (2021), 73–77.
[94] Gunupudi Rajesh Kumar, Nimmala Mangathayaru, and Gugulothu Narsimha. 2016. An approach for intrusion detection using fuzzy
feature clustering. In Proceedings of the IEEE International Conference on Engineering & MIS (ICEMIS’16). 1–8.
[95] Ram Shankar Siva Kumar, Magnus Nyström, John Lambert, Andrew Marshall, Mario Goertzel, Andi Comissoneru, Matt Swann, and
Sharon Xia. 2020. Adversarial machine learning-industry perspectives. In Proceedings of the IEEE Security and Privacy Workshops.
69–75.
[96] Eric Lancaster, Tanmoy Chakraborty, and V. S. Subrahmanian. 2018. MALTP: Parallel prediction of malicious tweets. IEEE Trans. Comput.
Soc. Syst. 5, 4 (2018), 1096–1108.
[97] Lastline. 2020. Using AI to Detect and Contain Cyberthreats. Technical Report.
[98] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.
[99] Jhen-Hao Li and Sheng-De Wang. 2017. PhishBox: An approach for phishing validation and detection. In Proceedings of the IEEE
DASC/PiCom/DataCom/CyberSciTech Conference. 557–564.
[100] Yuping Li, Jiyong Jang, Xin Hu, and Xinming Ou. 2017. Android malware clustering through malicious payload mining. In Proceedings
of the International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 192–214.
[101] Bin Liang, Miaoqiang Su, Wei You, Wenchang Shi, and Gang Yang. 2016. Cracking classifiers for evasion: A case study on the google’s
phishing pages filter. In Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences
Steering Committee, 345–356.
[102] Hongyu Liu and Bo Lang. 2019. Machine learning and deep learning methods for intrusion detection systems: A survey. Appl. Sci. 9,
20 (2019), 4396.
[103] Zhen Liu, Ruoyu Wang, Nathalie Japkowicz, Deyu Tang, Wenbin Zhang, and Jie Zhao. 2021. Research on unsupervised feature learning
for Android malware detection based on Restricted Boltzmann Machines. Fut. Gener. Comput. Syst. 120 (2021), 91–108.
[104] Siti-Farhana Lokman, Abu Talib Othman, and Muhammad-Husaini Abu-Bakar. 2019. Intrusion detection system for automotive Con-
troller Area Network (CAN) bus system: A review. EURASIP J. Wireless Commun. Netw. 2019, 1 (2019), 1–17.
[105] Pierangelo Lombardo, Salvatore Saeli, Federica Bisio, Davide Bernardi, and Danilo Massa. 2018. Fast flux service network detection
via data mining on passive DNS traffic. In Proceedings of the International Conference on Information Security. Springer, 463–480.
[106] Dimitris Margaritis. 2020. Artificial Intelligence Cybersecurity Challenges. Technical Report. European Union Agency for Cybersecurity.
[107] Daniel L. Marino, Chathurika S. Wickramasinghe, and Milos Manic. 2018. An adversarial approach for explainable ai in intrusion
detection systems. In Proceedings of the IEEE Conference of the Industrial Electronics Society. 3237–3243.
[108] Nuno Martins, José Magalhães Cruz, Tiago Cruz, and Pedro Henriques Abreu. 2020. Adversarial machine learning applied to intrusion
and malware scenarios: A systematic review. IEEE Access 8 (2020), 35403–35419.
[109] Lennart Maschmeyer, Ronald J. Deibert, and Jon R. Lindsay. 2021. A tale of two cybers-how threat reporting by cybersecurity firms
systematically underrepresents threats to civil society. J. Inf. Technol. Polit. 18, 1 (2021), 1–20.
[110] Steven McElwee, Jeffrey Heaton, James Fraley, and James Cannady. 2017. Deep learning for prioritizing and responding to intrusion
detection alerts. In Proceedings of the IEEE Military Communications Conference. 1–5.
[111] Dean Richard McKinnel, Tooska Dargahi, Ali Dehghantanha, and Kim-Kwang Raymond Choo. 2019. A systematic literature review
and meta-analysis on artificial intelligence in penetration testing and vulnerability assessment. Comput. Electr. Eng. 75 (2019), 175–188.
[112] Brad Miller, Alex Kantchelian, Michael Carl Tschantz, Sadia Afroz, Rekha Bachwani, Riyaz Faizullabhoy, Ling Huang, Vaishaal Shankar,
Tony Wu, George Yiu, et al. 2016. Reviewer integration and performance measurement for malware detection. In Proceedings of the
International Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA’16). 122–141.
[113] Yisroel Mirsky, Tomer Doitshman, Yuval Elovici, and Asaf Shabtai. 2018. Kitsune: An ensemble of autoencoders for online network
intrusion detection. In Proceedings of the Network and Distributed System Security Symposium (NDSS’18), Vol. 5. 2.
[114] Manuel Eugenio Morocho-Cayamcela, Haeyoung Lee, and Wansu Lim. 2019. Machine learning for 5G/B5G mobile and wireless com-
munications: Potential, limitations, and future directions. IEEE Access 7 (2019), 137184–137206.
[115] Nour Moustafa and Jill Slay. 2015. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15
network data set). In Proceedings of the Military Communications and Information Systems Conference (MilCIS’15). IEEE, 1–6.
[116] Azqa Nadeem, Sicco Verwer, Stephen Moskal, and Shanchieh Jay Yang. 2021. Alert-driven attack graph generation using S-PDFA. IEEE
Trans. Depend. Sec. Comput. (2021).
[117] Antonio Nappa, Zhaoyan Xu, M. Zubair Rafique, Juan Caballero, and Guofei Gu. 2014. Cyberprobe: Towards internet-scale active
detection of malicious servers. In Proceedings of the Network and Distributed System Security Symposium (NDSS’14). 1–15.
[118] Tan Nguyen, Hoang-Long Mai, Guillaume Doyen, Rémi Cogranne, Wissam Mallouli, Edgardo Montes De Oca, and Olivier Festor. 2018.
A security monitoring plane for named data networking deployment. IEEE Commun. Mag. 56, 11 (2018), 88–94.
[119] Tan Nguyen, Xavier Marchal, Guillaume Doyen, Thibault Cholez, and Rémi Cogranne. 2017. Content poisoning in named data net-
working: Comprehensive characterization of real deployment. In Proceedings of the IFIP/IEEE Symposium on Integrated Network and
Service Management (IM’17). IEEE, 72–80.
[120] Tan N. Nguyen, Xavier Marchal, Guillaume Doyen, Thibault Cholez, and Rémi Cogranne. 2017. Content poisoning in named data
networking: Comprehensive characterization of real deployment. In Proceedings of the IFIP/IEEE Symposium on Integrated Network
and Service Management (IM’17). IEEE, 72–80. https://doi.org/10.23919/INM.2017.7987266
[121] Thanh Thi Nguyen and Vijay Janapa Reddi. 2021. Deep reinforcement learning for cyber security. IEEE Trans. Neur. Netw. Learn. Syst.
(2021), 1–17.
[122] Amirreza Niakanlahiji, Bei-Tseng Chu, and Ehab Al-Shaer. 2018. PhishMon: A machine learning framework for detecting phishing
webpages. In Proceedings of the IEEE International Conference Intelligent Security Informatics. 220–225.
[123] Beny Nugraha, Anshitha Nambiar, and Thomas Bauschert. 2020. Performance evaluation of botnet detection using deep learning
techniques. In Proceedings of the IEEE International Conference Network of the Future. 141–149.
[124] Livinus Obiora Nweke and Stephen Wolthusen. 2020. Legal issues related to cyber threat information sharing among private entities
for critical infrastructure protection. In Proceedings of the IEEE International Conference on Cyber Conflict (CyCon’20).
[125] Ahmet Okutan and Shanchieh Jay Yang. 2019. ASSERT: Attack synthesis and separation with entropy redistribution towards predictive
cyber defense. Cybersecurity 2, 1 (2019), 1–18.
[126] Ahmet Okutan, Shanchieh Jay Yang, and Katie McConky. 2021. Cyberattack Forecasting Using Predictive Information. (Jan. 21 2021).
US Patent App. 16/898,618.
[127] Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. 2018. SoK: Security and privacy in machine learning. In
Proceedings of the IEEE European Symposium on Security and Privacy. 399–414.
[128] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. 2016. Distillation as a defense to adversarial pertur-
bations against deep neural networks. In Proceedings of the IEEE Symposium on Security and Privacy (SP’16). IEEE, 582–597.
[129] Sergio Pastrana, Daniel R. Thomas, Alice Hutchings, and Richard Clayton. 2018. CrimeBB: Enabling cybercrime research on underground forums at scale. In Proceedings of the World Wide Web Conference. International World Wide Web Conferences Steering Committee, 1845–1854.
[130] Feargus Pendlebury, Fabio Pierazzi, Roberto Jordaney, Johannes Kinder, and Lorenzo Cavallaro. 2019. TESSERACT: Eliminating experimental bias in malware classification across space and time. In Proceedings of the 28th USENIX Security Symposium (USENIX Security’19). 729–746.
[131] Roberto Perdisci and Wenke Lee. 2018. Method and System for Detecting Malicious and/or Botnet-related Domain Names. (July 17
2018). US Patent 10,027,688.
[132] Fabio Pierazzi, Giovanni Apruzzese, Michele Colajanni, Alessandro Guido, and Mirco Marchetti. 2017. Scalable architecture for online prioritisation of cyber threats. In Proceedings of the IEEE International Conference on Cyber Conflict. 1–18.
[133] Fabio Pierazzi, Feargus Pendlebury, Jacopo Cortellazzi, and Lorenzo Cavallaro. 2020. Intriguing properties of adversarial ML attacks in the problem space. In Proceedings of the IEEE Symposium on Security and Privacy. 1332–1349.
[134] Camila Pontes, Manuela Souza, João Gondim, Matt Bishop, and Marcelo Marotta. 2021. A new method for flow-based network intrusion
detection using the inverse Potts model. IEEE Trans. Netw. Serv. Manage. (2021).
[135] Rebecca S. Portnoff, Sadia Afroz, Greg Durrett, Jonathan K. Kummerfeld, Taylor Berg-Kirkpatrick, Damon McCoy, Kirill Levchenko,
and Vern Paxson. 2017. Tools for automated analysis of cybercriminal markets. In Proceedings of the 26th International Conference on
World Wide Web. 657–666.
[136] Artur Potiguara Carvalho, Fernanda Potiguara Carvalho, Edna Dias Canedo, and Pedro Henrique Potiguara Carvalho. 2020. Big data, anonymisation and governance to personal data protection. In Proceedings of the International Conference on Digital Government Research. 185–195.
[137] Petar Radanliev, David De Roure, Rob Walton, Max Van Kleek, Rafael Mantilla Montalvo, Omar Santos, Peter Burnap, Eirini Anthi,
et al. 2020. Artificial intelligence and machine learning in dynamic cyber risk analytics at the edge. SN Appl. Sci. 2, 11 (2020), 1–8.
[138] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. 2018. Certified defenses against adversarial examples. In Proceedings of the
International Conference on Learning Representations.
[139] Vignesh Ramanathan, Rui Wang, and Dhruv Mahajan. 2021. PreDet: Large-scale weakly supervised pre-training for detection. In
Proceedings of the IEEE/CVF International Conference on Computer Vision. 2865–2875.
[140] Supranamaya Ranjan. 2014. Machine Learning Based Botnet Detection Using Real-time Extracted Traffic Features. (March 25 2014).
US Patent 8,682,812.
[141] Konrad Rieck, Thorsten Holz, Carsten Willems, Patrick Düssel, and Pavel Laskov. 2008. Learning and classification of malware behavior.
In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 108–125.
[142] Markus Ring, Sarah Wunderlich, Deniz Scheuring, Dieter Landes, and Andreas Hotho. 2019. A survey of network-based intrusion
detection data sets. Comput. Secur. 86 (2019), 147–167.
[143] Farhan Sadique, Sui Cheung, Iman Vakilinia, Shahriar Badsha, and Shamik Sengupta. 2018. Automated structured threat information
eXpression (STIX) document generation with privacy preservation. In Proceedings of the IEEE Ubiquitous Computing, Electronics &
Mobile Communication Conference (UEMCon’18). 847–853.
[144] Ozgur Koray Sahingoz, Ebubekir Buber, Onder Demir, and Banu Diri. 2019. Machine learning based phishing detection from URLs.
Expert Syst. Appl. 117 (2019), 345–357.
[145] Wojciech Samek, Thomas Wiegand, and Klaus-Robert Müller. 2017. Explainable artificial intelligence: Understanding, visualizing and
interpreting deep learning models. arXiv:1708.08296. Retrieved from https://arxiv.org/abs/1708.08296.
[146] Anna Sapienza, Alessandro Bessi, Saranya Damodaran, Paulo Shakarian, Kristina Lerman, and Emilio Ferrara. 2017. Early warnings of
cyber threats in online discussions. In Proceedings of the IEEE International Conference on Data Mining Workshops (ICDMW’17). IEEE,
667–674.
[147] Iman Sharafaldin, Arash Habibi Lashkari, and Ali A. Ghorbani. 2018. Toward generating a new intrusion detection dataset and intrusion
traffic characterization. In Proceedings of the International Conference on Information Systems Security and Privacy (ICISSP’18). 108–116.
[148] Salvatore Signorello, Samuel Marchal, Jérôme François, Olivier Festor, and Radu State. 2017. Advanced interest flooding attacks in named-data networking. In Proceedings of the IEEE 16th International Symposium on Network Computing and Applications (NCA’17).
[149] Robin Sommer and Vern Paxson. 2010. Outside the closed world: On using machine learning for network intrusion detection. In
Proceedings of the IEEE Symposium on Security and Privacy. IEEE, 305–316.
[150] Qiyang Song, Jiahao Cao, Kun Sun, Qi Li, and Ke Xu. 2021. Try before you buy: Privacy-preserving data evaluation on cloud-based
machine learning data marketplace. In Proceedings of the ACM Annual Computer Security Applications Conference. 260–272.
[151] Paolo Spagnoletti and Andrea Salvi. 2020. Digital systems in high-reliability organizations: Balancing mindfulness and mindlessness. In Proceedings of the International Workshop on Socio-Technical Perspective in Information Systems Development.
[152] IEEE Spectrum. 2022. Andrew Ng: Unbiggen AI. Technical Report.
[153] Nedim Šrndić and Pavel Laskov. 2013. Detection of malicious PDF files based on hierarchical document structure. In Proceedings of the 20th Annual Network & Distributed System Security Symposium. 1–16.
[154] Nedim Šrndić and Pavel Laskov. 2014. Practical evasion of a learning-based classifier: A case study. In Proceedings of the IEEE Symposium on Security and Privacy. IEEE, 197–211.
[155] Matija Stevanovic and Jens Myrup Pedersen. 2014. An efficient flow-based botnet detection using supervised machine learning. In
Proceedings of the International Conference on Computing, Networking and Communications (ICNC’14). IEEE, 797–801.
[156] Tongtong Su, Huazhi Sun, Jinqi Zhu, Sheng Wang, and Yabo Li. 2020. BAT: Deep learning methods on network intrusion detection
using NSL-KDD dataset. IEEE Access 8 (2020), 29575–29585.
[157] Yuan-Hsiang Su, Michael Cheng Yi Cho, and Hsiu-Chuan Huang. 2019. False alert buster: An adaptive approach for NIDS false alert
filtering. In Proceedings of the ACM International Conference on Big Data. 58–62.
[158] Christopher Sweet, Stephen Moskal, and Shanchieh Jay Yang. 2020. On the variety and veracity of cyber intrusion alerts synthesized
by generative adversarial networks. ACM Trans. Manage. Inf. Syst. 11, 4 (2020), 1–21.
[159] Ke Tian, Steve T. K. Jan, Hang Hu, Danfeng Yao, and Gang Wang. 2018. Needle in a haystack: Tracking down elite phishing domains
in the wild. In Proceedings of the Internet Measurement Conference. 429–442.
[160] Daniele Ucci, Leonardo Aniello, and Roberto Baldoni. 2019. Survey of machine learning techniques for malware analysis. Comput.
Secur. 81 (2019), 123–147.
[161] Solomon Ogbomon Uwagbole, William J. Buchanan, and Lu Fan. 2017. Applied machine learning predictive analytics to SQL injection
attack detection and prevention. In Proceedings of the IFIP/IEEE Symposium on Integrated Network and Service Management (IM’17).
1087–1090.
[162] Maneesh Kumar Verma, Shankar Yadav, Bhoopesh Kumar Goyal, Bakshi Rohit Prasad, and Sonali Agarawal. 2019. Phishing website
detection using neural network and deep belief network. In Recent Findings in Intelligent Computing Techniques. Springer, 293–300.
[163] Rakesh M. Verma, Victor Zeng, and Houtan Faridi. 2019. Data quality for security challenges: Case studies of phishing, malware and
intrusion detection datasets. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security. 2605–2607.
[164] Kristijan Vidović, Ivan Tomičić, Karlo Slovenec, Miljenko Mikuc, and Ivona Brajdić. 2021. Ranking network devices for alarm prioritisation: Intrusion detection case study. In Proceedings of the IEEE International Conference on Software, Telecommunications and Computer Networks (SoftCOM’21). 1–5.
[165] R. Vinayakumar, Mamoun Alazab, Alireza Jolfaei, K. P. Soman, and Prabaharan Poornachandran. 2019. Ransomware triage using deep
learning: Twitter as a case study. In Proceedings of the IEEE Cybersecurity & Cyberforensics Conference. 67–73.
[166] Ravi Vinayakumar, Mamoun Alazab, K. P. Soman, Prabaharan Poornachandran, Ameer Al-Nemrat, and Sitalakshmi Venkatraman.
2019. Deep learning approach for intelligent intrusion detection system. IEEE Access 7 (2019), 41525–41550.
[167] Paul Voigt and Axel von dem Bussche. 2017. The EU General Data Protection Regulation (GDPR): A Practical Guide (1st ed.). Springer International Publishing, Cham.
[168] Lise Volkart, Pierrette Bouillon, and Sabrina Girletti. 2018. Statistical vs. neural machine translation: A comparison of MTH and DeepL at Swiss Post’s language service. In Proceedings of the 40th Conference Translating and the Computer. 145–150.
[169] Bachar Wehbi, Edgardo Montes de Oca, and Michel Bourdellès. 2012. Events-based security monitoring using MMT tool. In Proceedings
of the 5th IEEE International Conference on Software Testing, Verification and Validation (ICST’12), Giuliano Antoniol, Antonia Bertolino,
and Yvan Labiche (Eds.). IEEE Computer Society, 860–863. https://doi.org/10.1109/ICST.2012.188
[170] Charles Wheelus, Elias Bou-Harb, and Xingquan Zhu. 2018. Tackling class imbalance in cyber security datasets. In Proceedings of the IEEE International Conference on Information Reuse and Integration. 229–232.
[171] Laurie Williams, Gary McGraw, and Sammy Migues. 2018. Engineering security vulnerability prevention, detection, and response.
IEEE Softw. 35, 5 (2018), 76–80.
[172] Tingmin Wu, Shigang Liu, Jun Zhang, and Yang Xiang. 2017. Twitter spam detection based on deep learning. In Proceedings of the
Australasian Computer Science Week Multiconference. 1–8.
[173] Teng Xu, Gerard Goossen, Huseyin Kerem Cevahir, Sara Khodeir, Yingyezhe Jin, Frank Li, Shawn Shan, Sagar Patel, David Freeman,
and Paul Pearce. 2021. Deep entity classification: Abusive account detection for online social networks. In Proceedings of the USENIX
Security Symposium.
[174] Zhiwei Xu, Bo Chen, Ninghan Wang, Yujun Zhang, and Zhongcheng Li. 2015. ELDA: Towards efficient and lightweight detection of
cache pollution attacks in NDN. In Proceedings of the IEEE 40th Conference on Local Computer Networks (LCN’15). IEEE, 82–90.
[175] Carter Yagemann, Matthew Pruett, Simon P. Chung, Kennon Bittick, Brendan Saltaformaggio, and Wenke Lee. 2021. ARCUS: Symbolic root cause analysis of exploits in production systems. In Proceedings of the USENIX Security Symposium.
[176] Aviv Yehezkel, Eyal Elyashiv, and Or Soffer. 2021. Network anomaly detection using transfer learning based on auto-encoders loss normalization. In Proceedings of the ACM Computer and Communications Security Workshop.
[177] Ting-Fang Yen, Victor Heorhiadi, Alina Oprea, Michael K. Reiter, and Ari Juels. 2014. An epidemiological study of malware encounters
in a large enterprise. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security. 1117–1130.
[178] Ting-Fang Yen, Alina Oprea, Kaan Onarlioglu, Todd Leetham, William Robertson, Ari Juels, and Engin Kirda. 2013. Beehive: Large-scale
log analysis for detecting suspicious activity in enterprise networks. In Proceedings of the 29th Annual Computer Security Applications
Conference. ACM, 199–208.
[179] Jiao Yin, MingJian Tang, Jinli Cao, and Hua Wang. 2020. Apply transfer learning to cybersecurity: Predicting exploitability of vulnerabilities by description. Knowl.-Bas. Syst. 210 (2020), 106529.
[180] Chika Yinka-Banjo and Ogban-Asuquo Ugot. 2020. A review of generative adversarial networks and its application in cybersecurity.
Artif. Intell. Rev. 53, 3 (2020), 1721–1736.
[181] Lixia Zhang, Alexander Afanasyev, Jeff Burke, Van Jacobson, KC Claffy, Patrick Crowley, Christos Papadopoulos, Lan Wang, and Beichuan Zhang. 2014. Named data networking. ACM SIGCOMM Comput. Commun. Rev. 44, 3 (2014), 66–73. https://doi.org/10.1145/2656877.2656887
[183] Xiaohan Zhang, Yuan Zhang, Ming Zhong, Daizong Ding, Yinzhi Cao, Yukun Zhang, Mi Zhang, and Min Yang. 2020. Enhancing state-
of-the-art classifiers with API semantics to detect evolved android malware. In Proceedings of the ACM SIGSAC Conference on Computer
and Communications Security. 757–770.
[184] Yong Zhang, Jie Niu, Guojian He, Lin Zhu, and Da Guo. 2021. Network intrusion detection based on active semi-supervised learning.
In Proceedings of the IEEE International Conference on Dependable Systems and Networks. 129–135.
[185] Weiwei Zhuang, Qingshan Jiang, and Tengke Xiong. 2012. An intelligent anti-phishing strategy model for phishing website detection.
In Proceedings of the 32nd International Conference on Distributed Computing Systems Workshops. IEEE, 51–56.