AI, Machine Learning and Deep Learning: A Security Perspective
Edited By
Fei Hu and Xiali Hei
First edition published 2023
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
© 2023 selection and editorial matter, Fei Hu and Xiali Hei; individual
chapters, the contributors
CRC Press is an imprint of Taylor & Francis Group, LLC
Reasonable efforts have been made to publish reliable data and information,
but the author and publisher cannot assume responsibility for the validity of
all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in
this publication and apologize to copyright holders if permission to publish
in this form has not been obtained. If any copyright material has not been
acknowledged please write and let us know so we may rectify in any future
reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be
reprinted, reproduced, transmitted, or utilized in any form by any electronic,
mechanical, or other means, now known or hereafter invented, including
photocopying, microfilming, and recording, or in any information storage
or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work,
access www.copyright.com or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For
works that are not available on CCC please contact
[email protected]
Trademark notice: Product or corporate names may be trademarks or
registered trademarks and are used only for identification and explanation
without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Names: Hu, Fei, 1972– editor. | Hei, Xiali, editor.
Title: AI, machine learning and deep learning : a security perspective /
edited by Fei Hu and Xiali Hei.
Description: First edition. | Boca Raton : CRC Press, 2023. |
Includes bibliographical references and index.
Identifiers: LCCN 2022055385 (print) | LCCN 2022055386 (ebook) |
ISBN 9781032034041 (hardback) | ISBN 9781032034058 (paperback) |
ISBN 9781003187158 (ebook)
Subjects: LCSH: Computer networks–Security measures. |
Machine learning–Security measures. | Deep learning (Machine
learning)–Security measures. | Computer security–Data processing. |
Artificial intelligence.
Classification: LCC TK5105.59 .A39175 2023 (print) |
LCC TK5105.59 (ebook) | DDC 006.3/1028563–dc23/eng/20221223
LC record available at https://lccn.loc.gov/2022055385
LC ebook record available at https://lccn.loc.gov/2022055386
ISBN: 9781032034041 (hbk)
ISBN: 9781032034058 (pbk)
ISBN: 9781003187158 (ebk)
DOI: 10.1201/9781003187158
Typeset in Times
by Newgen Publishing UK
Contents
Preface
About the Editors
Contributors
PART IV Applications
This book will have a significant impact on AI-related products, since so many AI/ML/DL products have been invented and deployed today, albeit with very limited security support. Many companies have serious concerns about the security and privacy issues of AI systems, and millions of AI engineers and researchers want to learn how to effectively prevent, detect, and overcome such attacks. To the best of our knowledge, no existing book comprehensively discusses these critical areas. This book offers more than a purely theoretical description: it presents many practical designs and implementations for a variety of security models, along with detailed hardware and software design strategies.
For this book, we have invited AI security and privacy experts from around the world to contribute to a complete picture of AI/ML/DL security and privacy issues. First, the book explains the evolution from conventional AI to various ML algorithms (Bayesian learning, SVM, HMM, etc.) and on to the latest DL models. It covers various applications based on those algorithms, from robots to autonomous vehicles and from smart cities to big data applications.
Then it explains how an adversary can attack an intelligent system with built-in AI/ML/DL algorithms and tools. The attack models are described under different environment settings and attack goals. The book then compares those attacks from the perspectives of attack severity, attack cost, and attack pattern. The differences between internal and external attacks are explained, and intentional attacks and privacy leakage issues are also covered.
Next, the book provides different sets of defense solutions to those attacks. The solutions may be based on various encryption/decryption methods, statistical modeling, distributed trust models, differential privacy methods, and even ML/DL algorithms themselves.
The book then discusses the use of different AI/ML/DL models and algorithms for cyber security purposes. In particular, it explains the use of pattern recognition algorithms for the detection of external attacks on a network system.
Finally, it discusses some practical applications that can utilize the
discussed AI security solutions. Concrete implementation strategies are also
explained, and the application scenarios with different external attacks are
compared.
The target audience of this book includes people from both academia and industry. Currently, more than 75 million people worldwide are involved in the R&D of AI products. This book will be highly valuable to AI/ML/DL engineers and scientists in industries such as vehicles, robots, smart cities, and intelligent transportation. In addition, many academics (faculty, students, and researchers) working on intelligent systems will also benefit from this book.
Contributors
Darine Ameyed
École de Technologie Supérieure – Quebec University, Canada
Pallavi Arora
IK Gujral Punjab Technical University, Punjab, India
Sikha Bagui
University of West Florida, USA
Lauren Burgess
Towson University, Maryland, USA
Yen-Hung Chen
College of Informatics, National Chengchi University, Taipei, Taiwan
Mohamed Cheriet
École de Technologie Supérieure – Quebec University, Canada
Long Dang
University of South Florida, USA
Regina Eckhardt
University of West Florida, USA
E. Bijolin Edwin
Karunya Institute of Technology and Sciences, Tamil Nadu, India
Tugba Erpek
Intelligent Automation, Inc., Maryland, USA
Bryse Flowers
University of California, USA
Linsheng He
University of Alabama, USA
William Headley
Virginia Tech National Security Institute, USA
Xiali Hei
University of Louisiana at Lafayette, USA
Diego Heredia
Escuela Politécnica Nacional, Quito, Ecuador
Fei Hu
University of Alabama, USA
Yuh-Jong Hu
National Chengchi University, Taipei, Taiwan
Mu-Tien Huang
National Chengchi University, Taipei, Taiwan
Fehmi Jaafar
Quebec University at Chicoutimi, Quebec, Canada
Krzysztof Jagiełło
Warsaw School of Economics, Poland
Brian Jalaian
Joint Artificial Intelligence Center, Washington, DC, USA
G. Jaspher
Karunya Institute of Technology and Sciences, Tamil Nadu, India
Gabriel Kabanda
University of Zimbabwe, Harare, Zimbabwe
W. Kathrine
Karunya Institute of Technology and Sciences, Tamil Nadu, India
Baljeet Kaur
Guru Nanak Dev Engineering College, Punjab, India
Joseph Layton
University of Alabama, USA
Hengshuo Liang
Towson University, Maryland, USA
Weixian Liao
Towson University, Maryland, USA
Jing Lin
University of South Florida, USA
Zhuo Lu
University of South Florida, USA
Mahbub Rahman
University of Alabama, USA
Mohamed Rahouti
Fordham University, New York, USA
Yi Shi
Intelligent Automation, Inc., Maryland, USA
George Stantchev
Naval Research Laboratory, Washington, DC, USA
Jerzy Surma
Warsaw School of Economics, Poland
D. Roshni Thanka
Karunya Institute of Technology and Sciences, Tamil Nadu, India
Qianlong Wang
Towson University, Maryland, USA
Kaiqi Xiong
University of South Florida, USA
Wei Yu
Towson University, Maryland, USA
Jiamiao Zhao
University of Alabama, USA
PART I
DOI: 10.1201/9781003187158-2
Contents
1.1 Introduction
1.2 Background
1.2.1 Notation
1.2.2 Support Vector Machines
1.2.3 Neural Networks
1.3 White-Box Adversarial Attacks
1.3.1 L-BFGS Attack
1.3.2 Fast Gradient Sign Method
1.3.3 Basic Iterative Method
1.3.4 DeepFool
1.3.5 Fast Adaptive Boundary Attack
1.3.6 Carlini and Wagner’s Attack
1.3.7 Shadow Attack
1.3.8 Wasserstein Attack
1.4 Black-Box Adversarial Attacks
1.4.1 Transfer Attack
1.4.2 Score-based Black-box Attacks
1.4.3 Decision-based Attack
1.5 Data Poisoning Attacks
1.5.1 Label Flipping Attacks
1.5.2 Clean Label Data Poisoning Attack
1.5.3 Backdoor Attack
1.6 Conclusions
Acknowledgment
References
Notes
1.1 INTRODUCTION
1.2 BACKGROUND
1.2.1 Notation
Table 1.1 presents a list of notations used throughout the chapter.
subject to the constraints y_i(w · x_i + b) ≥ 1 for all i, where w is the normal vector of the maximum-margin hyperplane (1.2.1). This QP problem can be reformatted as an unconstrained problem using Lagrange multipliers α_i ≥ 0 as follows:
L(w, b, α) = (1/2)||w||^2 − Σ_i α_i [y_i(w · x_i + b) − 1].
Setting the derivative of L with respect to w to zero yields w = Σ_i α_i y_i x_i (1.2.3). This indicates that the normal vector w of the hyperplane depends on α_i and x_i. Similarly, taking the derivative of the preceding expression with respect to b, we obtain Σ_i α_i y_i = 0.
||x||_p = (Σ_i |x_i|^p)^(1/p), (1.2.10)
where x_i is the ith component of x. Here are three frequently used norms in the literature.
ℓ0-norm: ||x − x'||_0 counts the number of coordinates i such that x_i ≠ x'_i, where x_i and x'_i are the ith components of x and x', respectively. The ℓ0 norm is useful when an attacker wants to limit the number of attacked pixels without limiting the size of the change to each pixel.
ℓ2-norm:
x'_0 = x,  x'_{t+1} = Proj_{x,ε}( x'_t + α · sign(∇_x J(x'_t, y)) ),  (1.3.9)–(1.3.10)
where ε denotes the size of the attack, α is the step size, and Proj_{x,ε} is a projection operator that projects its argument onto the intersection of the box constraint of x (for instance, if x is an image, then a box constraint of x can be the set of integer values between 0 and 255) and the ε-neighborhood ball of x. This procedure ensures that the produced adversarial examples stay within the ε bound of the input x. Using a reduced perturbation magnitude limits the number of actively attacked pixels and thus prevents a simple outlier detector from detecting the adversarial examples.
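To make the projection step concrete, the following is a minimal NumPy sketch of projecting a candidate onto the intersection of a [0, 255] box constraint and an ℓ∞ ε-ball around x, followed by one BIM-style iteration; the function name, array shapes, and hyperparameters are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def project_linf(x_adv, x, eps, lower=0.0, upper=255.0):
    """Project x_adv onto the intersection of the box [lower, upper] and the
    l-infinity ball of radius eps centered at the original input x."""
    x_adv = np.clip(x_adv, x - eps, x + eps)   # stay within eps of every pixel of x
    return np.clip(x_adv, lower, upper)        # stay within the valid pixel range

# One BIM-style iteration: step in the sign of the loss gradient, then project.
x = np.random.uniform(0, 255, size=(28, 28))   # stand-in for an input image
grad = np.random.randn(28, 28)                 # stand-in for the gradient dJ/dx
alpha, eps = 1.0, 8.0
x_adv = project_linf(x + alpha * np.sign(grad), x, eps)
```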
1.3.4 DeepFool
The DeepFool [20] attack is an untargeted white-box attack. Like Szegedy
et al. [21], Moosavi-Dezfooli et al. [20] studied the minimally distorted
adversarial example problem. However, rather than finding the gradient of a
loss function, DeepFool searches for the shortest distance from the original
input to the nearest decision boundary using an iterative linear
approximation of the decision boundary/hyperplane and the orthogonal
projection of the input onto the approximated decision boundary.
Moosavi-Dezfooli et al. searched the shortest distance path for an input x
to cross the decision boundary and get misclassified. Formally, the
DeepFool attack can be defined as follows.
At each iteration i, DeepFool finds the minimal perturbation r_i satisfying g(x_i) + ∇g(x_i)·r_i = 0 by finding the saddle point of the Lagrangian, which gives r_i = −[g(x_i)/||∇g(x_i)||_2^2] ∇g(x_i) (1.3.13). This saddle point can be considered the orthogonal projection of x_i onto the decision hyperplane of g at iteration i, and the iteration stops when x_i is misclassified; i.e., when sign(g(x_i)) ≠ sign(g(x)).
The multiclass (in one-vs.-all scheme) approach of the DeepFool attack
follows the iterative linearization procedure in the binary case, except that
an attacker needs to determine the closest decision boundary l to the input x.
The iterative linear approximation method used to find the minimum
perturbation is a greedy method that may not guarantee the global
minimum. Moreover, the closest distance from the original input to the
nearest decision boundary may not be equivalent to the minimum difference
observed by human eyes. In practice, DeepFool usually generates small
unnoticeable perturbations.
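For illustration, here is a minimal sketch of the binary-classifier DeepFool iteration described above, assuming the attacker can evaluate a differentiable decision function g and its gradient; the function names and the toy linear classifier are hypothetical.

```python
import numpy as np

def deepfool_binary(x, g, grad_g, max_iter=50, overshoot=0.02):
    """DeepFool-style sketch for a binary classifier (label = sign of g).
    Each step projects the current point onto the linearized decision boundary."""
    x_i = x.astype(float).copy()
    for _ in range(max_iter):
        if np.sign(g(x_i)) != np.sign(g(x)):   # stop once the label has flipped
            break
        w = grad_g(x_i)
        # Orthogonal projection of x_i onto the hyperplane {z : g(x_i) + w.(z - x_i) = 0}
        r = -g(x_i) / (np.dot(w, w) + 1e-12) * w
        x_i = x_i + (1 + overshoot) * r        # small overshoot to cross the boundary
    return x_i

# Toy usage with a linear classifier g(x) = w.x + b.
w_true, b = np.array([1.0, -2.0]), 0.5
x0 = np.array([3.0, 1.0])
x_adv = deepfool_binary(x0, lambda z: np.dot(w_true, z) + b, lambda z: w_true)
```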
So far, we have only considered untargeted DeepFool attacks. A targeted version of DeepFool can be achieved if the input x is pushed toward the decision boundary of a target class t. Furthermore, the DeepFool attack can also be adapted to find the minimal distorted perturbation for any ℓp-norm with p ∈ [1, ∞). If interested, see [28] for details.
Add random restarts to widen the search space for adversarial examples (1.3.16). That is, instead of initializing x^(0) = x, a FAB attacker sets x^(0) to a random sample in an ε-neighborhood of x.
Add a final search step to further reduce the distance between the adversarial example and its original input. This step uses a modified binary search between the adversarial example and the original input to find a better adversarial example within a few iterations; e.g., Croce and Hein [25] set the number of iterations to 3 for their experiments. For details of the final search step and the random restarts, see [25].
Note that the standard gradient descent algorithm can be used to find
the solution of this minimization problem.
Carlini and Wagner [27] showed that the C&W attack is very powerful,
and ten proposed defense strategies cannot withstand C&W attacks
constructed by minimizing defense-specific loss functions [28].
Furthermore, the C&W attack can also be used to evaluate the efficacy of
potential defense strategies since it is one of the strongest adversarial
attacks [27].
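As a rough illustration of the C&W ℓ2 formulation, the sketch below evaluates the distortion-plus-hinge objective for a generic logits function; in practice the objective is minimized with Adam over a change of variables, and the constants and toy model here are illustrative assumptions.

```python
import numpy as np

def cw_l2_objective(delta, x, target, logits_fn, c=1.0, kappa=0.0):
    """C&W-style objective (l2 version, sketch): squared distortion plus a hinge term
    that becomes non-positive once the target logit beats every other logit by kappa."""
    z = logits_fn(x + delta)                       # pre-softmax scores Z(x + delta)
    best_other = np.max(np.delete(z, target))      # strongest non-target logit
    hinge = max(best_other - z[target], -kappa)
    return np.sum(delta ** 2) + c * hinge

# Toy usage with a linear "network" that produces three logits.
W = np.random.randn(3, 4)
x = np.random.randn(4)
delta = 0.01 * np.random.randn(4)
loss = cw_l2_objective(delta, x, target=2, logits_fn=lambda v: W @ v)
```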
In the next two subsections, we discuss some adversarial attacks other than ℓp-norm-based attacks.
ZOO Attack
Instead of using transfer attacks to exploit the transferability of adversarial
images, Chen et al. [35] proposed a ZOO attack to directly approximate the
gradients of the target model using confidence scores. Therefore, the ZOO
attack is considered a score-based black-box attack, and it does not need to
train a substitute model. The ZOO attack is as effective as the C&W attack
and markedly surpasses existing transfer attacks in terms of success rate.
Chen et al. [35] also proposed a general framework that utilizes capable gradient-based white-box attacks to generate adversarial examples in the black-box setting.
The ZOO attack finds the adversarial example by also solving the
optimization problem (1.3.19). Motivated by the attack loss functions
(1.3.6.2) used in the C&W attack, a new hinge-like loss function [35] based
on the log probability score vector of the model f, instead of Z, is proposed
as follows:
f(x, t) = max{ max_{i≠t} log[F(x)]_i − log[F(x)]_t, −κ }, (1.4.1)
where [F(x)]_i is the ith element of the probability score vector F(x), and the parameter κ ensures a constant gap between the log probability score of the adversarial example classified as class t and all remaining classes. The log probability score is used instead of the probability score since a well-trained DNN yields a significantly higher confidence score for one class than for the others, and the log function lessens this dominance effect without affecting the order of the confidence scores. The ZOO attack is defined as follows.
ĝ_i := ∂f(x)/∂x_i ≈ [f(x + h·e_i) − f(x − h·e_i)] / (2h), (1.4.2)
where h is a small constant and e_i is a standard basis vector. For networks with a large input dimension n (e.g., the Inception-v3 network [11]), the number of model queries per full gradient evaluation is 2n (two function evaluations per coordinate-wise gradient estimate). This is very query inefficient.
To overcome this inefficiency, Chen et al. [29] proposed five acceleration techniques.
Note that acceleration techniques 3–5 are not required when n is small. For instance, Chen et al. [35] did not use techniques 3–5 for the MNIST and CIFAR-10 datasets. Although the attack success rate is comparable to the
success rate of the C&W attack, the number of required queries is large for
gradient estimation despite the proposed acceleration techniques.
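The coordinate-wise estimate in (1.4.2) can be sketched as below; the black-box score function f, the step size h, and the simple coordinate-descent update are illustrative stand-ins, not the full ZOO implementation with its acceleration techniques.

```python
import numpy as np

def zoo_coordinate_gradient(f, x, i, h=1e-4):
    """Estimate the partial derivative of a black-box scalar score f at coordinate i
    via the symmetric difference quotient (f(x + h*e_i) - f(x - h*e_i)) / (2h).
    Each estimate costs two model queries."""
    e_i = np.zeros_like(x)
    e_i[i] = 1.0
    return (f(x + h * e_i) - f(x - h * e_i)) / (2.0 * h)

# Toy usage: f stands in for the attack loss evaluated through model queries.
f = lambda v: float(np.sum(np.sin(v)))     # placeholder black-box score
x = np.random.randn(10)
i = np.random.randint(x.size)              # ZOO updates randomly chosen coordinates
x[i] -= 0.01 * zoo_coordinate_gradient(f, x, i)
```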
Square Attack
Andriushchenko et al. [26] proposed the square attack (SA), a query-efficient attack for both ℓ2- and ℓ∞-bounded adversarial perturbations. The SA is based on random search (RS), an iterative technique from derivative-free optimization (DFO). Therefore, the SA is a gradient-free attack [39], and it is resistant to gradient masking. SA improves the query efficiency and success rate by employing RS and a task-specific sampling distribution. In some cases, SA even competes with the performance of white-box attacks.
SA aims to find an ℓp-norm-bounded adversarial example by solving the following box-constrained optimization problem: minimize the attack loss L(f(x'), y) over x' subject to ||x' − x||_p ≤ ε, with x' restricted to the valid input range. For the ℓ∞ case, each iteration proceeds as follows.
1. Find the side length h of the square as the closest positive integer to w√p, where p is the percentage of pixels of the original image x that can be perturbed and w is the width of the image. p gradually decreases with the iterations; this mimics the step-size reduction in gradient-based optimization, in which we begin with large initial learning rates to quickly shrink the loss value [40].
2. Find the location of the square with side length h for each color channel by uniformly random selection. The square denotes the set of pixels that can be modified.
3. Uniformly at random assign all pixel values in the square to either +ε or −ε for each color channel c.
4. Add the square perturbation generated in step 3 to the current iterate to obtain a new point.
5. Project the new point onto the intersection of the box constraint of x and the ℓ∞-norm ball of radius ε to obtain the candidate.
6. If the new point attains a lower loss than the best loss so far, the change is accepted, and the best loss is updated. Otherwise, it is discarded.
The iteration continues until the adversarial example is found. For those
who are interested, see [26].
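The ℓ∞ iteration above can be sketched as follows; the image shape, loss function, and hyperparameters are illustrative assumptions, and the real algorithm also schedules p and handles color channels more carefully.

```python
import numpy as np

rng = np.random.default_rng(0)

def square_attack_step(x_adv, x, loss_fn, eps, p, w, channels):
    """One l-infinity square-attack-style iteration (sketch): sample a random square,
    set it to +/-eps per channel, project, and keep the change only if the loss drops."""
    h = max(1, int(round(w * np.sqrt(p))))           # side length from the pixel fraction p
    r, s = rng.integers(0, w - h + 1, size=2)        # random location of the square
    cand = x_adv.copy()
    for ch in range(channels):
        cand[r:r + h, s:s + h, ch] = x[r:r + h, s:s + h, ch] + rng.choice([-eps, eps])
    cand = np.clip(cand, x - eps, x + eps)           # stay inside the eps-ball ...
    cand = np.clip(cand, 0.0, 1.0)                   # ... and the valid pixel range
    return cand if loss_fn(cand) < loss_fn(x_adv) else x_adv

# Toy usage with a placeholder loss over a 32x32 RGB image.
x = rng.random((32, 32, 3))
loss_fn = lambda z: float(np.sum((z - 0.5) ** 2))    # stand-in for the margin loss
x_adv = square_attack_step(x.copy(), x, loss_fn, eps=0.05, p=0.1, w=32, channels=3)
```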
Boundary Attack
Relying neither on training data nor on the assumption of transferability, the boundary attack uses a simple rejection sampling algorithm, with a constrained independent and identically distributed Gaussian distribution as a proposal distribution and a dynamic step-size adjustment inspired by trust region methods, to generate minimal-perturbation adversarial samples. The boundary attack algorithm works as follows. First, a data point is sampled randomly from either a maximum entropy distribution (for an untargeted attack) or a set of data points belonging to the target class (for a targeted attack). The selected data point serves as a starting point. At each step of the algorithm, a random perturbation is drawn from the proposal distribution such that the perturbed data still lies within the input domain and the difference between the perturbed image and the original input is within the specified maximum allowable perturbation. Newly perturbed data is used as a new starting point if it is misclassified for an untargeted attack (or misclassified as the target class for a targeted attack). The process continues until the maximum number of steps is reached.
The boundary attack is conceptually simple, requires little
hyperparameter tuning, and performs as well as the state-of-the-art gradient
attacks (such as the C&W attack) in both targeted and untargeted computer
vision scenarios without algorithm knowledge [43]. Furthermore, it is
robust against common deceptions such as gradient obfuscation or masking,
intrinsic stochasticity, or adversarial training. However, the boundary attack
has two main drawbacks. First, the number of queries for generating an
adversarial sample is large, making it impractical for real-world
applications [44]. Instead of a rejection sampling algorithm, Metropolis–Hastings sampling may be a better option since it does not simply discard the rejected sample. Second, it only considers the ℓ2-norm.
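A minimal decision-based sketch of the rejection-sampling loop is shown below; the decision oracle, step sizes, and toy example are illustrative assumptions and omit the dynamic step-size adjustment used in the actual attack.

```python
import numpy as np

rng = np.random.default_rng(1)

def boundary_attack(x, is_adversarial, steps=200, sigma=0.01, contraction=0.01):
    """Boundary-attack sketch: start from a random adversarial point and accept only
    proposals that remain misclassified while slowly drifting toward the original x."""
    x_adv = rng.uniform(0.0, 1.0, size=x.shape)
    while not is_adversarial(x_adv):                 # resample until the start is adversarial
        x_adv = rng.uniform(0.0, 1.0, size=x.shape)
    for _ in range(steps):
        proposal = x_adv + sigma * rng.normal(size=x.shape)   # Gaussian proposal
        proposal = proposal + contraction * (x - proposal)    # small step toward x
        proposal = np.clip(proposal, 0.0, 1.0)                # stay in the input domain
        if is_adversarial(proposal):                 # rejection sampling on the decision only
            x_adv = proposal
    return x_adv

# Toy usage: the "model" misclassifies anything whose mean pixel value exceeds 0.5.
x = np.full((8, 8), 0.4)
adv = boundary_attack(x, is_adversarial=lambda z: z.mean() > 0.5)
```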
HopSkipJump Attack
In contrast, the HopSkipJump attack is a family of query-efficient algorithms that generate both targeted and untargeted adversarial examples for both ℓ2- and ℓ∞-norm distances. Furthermore, the HopSkipJump attack is more query efficient than the boundary attack [45], and it is a hyperparameter-free iterative algorithm. The HopSkipJump attack is defined as follows.
Equations (1.4.7)–(1.4.10) estimate the gradient direction at the decision boundary by a Monte Carlo average over a batch of perturbed queries, where the perturbation size δ depends on the distance between the current iterate and the original input and on the image size, {u_b} is a set of independent and identically distributed uniform random noise vectors, and B is the batch size (the initial batch size is set to 100 in [53]).
In contrast to the boundary attack and the HopSkipJump attack, which are based on ℓp-bounded adversarial perturbations, the spatial transformation attack [4] is a black-box "semantic" adversarial attack that shows that a small random transformation, such as a slight rotation of an image, can easily fool a state-of-the-art image classifier. This attack seriously questions the robustness of these state-of-the-art image classifiers. Furthermore, the primary disadvantage of the ZOO attack is the need to probe the classifier thousands of times before the adversarial examples are found [27]. This is not the case for spatial transformation attacks: Engstrom et al. [4] showed that the worst-of-10 attack (taking the worst of 10 random transformations) is able to reduce the model accuracy significantly with just 10 queries. Basically, it rotates or translates a natural image slightly to cause misclassification. This attack reveals the vulnerability of current state-of-the-art ML models.
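A worst-of-k spatial attack of this kind can be sketched as follows, assuming query access to the probability the model assigns to the correct class; the transformation ranges, SciPy-based warping, and toy score function are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import rotate, shift

rng = np.random.default_rng(2)

def worst_of_k(x, true_class_prob, k=10, max_angle=30.0, max_shift=3):
    """Worst-of-k sketch: try k small random rotations/translations and keep the one
    that most reduces the model's probability for the correct class."""
    best, best_p = x, true_class_prob(x)
    for _ in range(k):
        angle = rng.uniform(-max_angle, max_angle)
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        cand = rotate(x, angle, reshape=False, mode="nearest")
        cand = shift(cand, (dy, dx), mode="nearest")
        p = true_class_prob(cand)
        if p < best_p:
            best, best_p = cand, p
    return best

# Toy usage: a placeholder "correct-class probability" that rewards centered mass.
x = np.zeros((28, 28)); x[10:18, 10:18] = 1.0
adv = worst_of_k(x, true_class_prob=lambda z: float(z[10:18, 10:18].mean()))
```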
While adversarial attacks cannot change the training process of a model and
can only modify the test instance, data poisoning attacks, on the contrary,
can manipulate the training process. Specifically, in data poisoning attacks,
attackers aim to manipulate the training data (e.g., poisoning features,
flipping labels, manipulating the model configuration settings, and altering
the model weights) in order to influence the learning model. It is assumed
that attackers have the capability to contribute to the training data or have
control over the training data itself. The main objective of injecting poison
data is to influence the model’s learning outcome.
Recent studies on adversarial ML have demonstrated particular interest
in data poisoning attack settings. This section discusses a few of the data
poisoning attack models. We start with briefly going over label flipping
attacks in subsection 1.5.1 and then focus on clean label data poisoning
attacks in subsection 1.5.2 since they are stealthy. To that end, we introduce
backdoor attacks.
This random label flipping attack can further be divided into two groups:
targeted and untargeted. In an untargeted random label flipping attack, an
attacker may select some instances from class 1 to misclassify as class –1
and some instances from class –1 to misclassify as class 1. In contrast, an
attacker misclassifies one class as another consistently in a targeted random
label flipping attack. The targeted random label flipping attack is more
severe compared to the untargeted one as the targeted attack consistently
misleads the learning algorithm to classify a specific class of instances as
another specific class.
Rather than random label flipping, an attacker can also utilize label flip
attacks that are model dependent. For instance, in subsection 1.2.2, we
showed that SVM constructs a decision hyperplane using only the support
vectors. Existing studies presented a few label flipping attacks based on this
characteristic. For example, the Farfirst attack [46] is such a label flipping attack, in which training instances far away from the margin of an SVM have their labels flipped. This attack effectively changes many non-support-vector training instances (training instances that are far from the margin and are correctly classified by an untainted SVM) into support vectors and consequently alters the decision boundary significantly.
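For concreteness, a targeted random label-flipping attack on binary (+1/−1) labels can be sketched as below; the budget parameter and class encoding are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def targeted_label_flip(y, source=1, target=-1, budget=0.1):
    """Targeted random label-flipping sketch: flip a bounded fraction of the labels of
    one class (source) to another class (target), consistently misleading the learner."""
    y_poisoned = y.copy()
    source_idx = np.flatnonzero(y == source)
    n_flip = min(int(budget * y.size), source_idx.size)
    flip_idx = rng.choice(source_idx, size=n_flip, replace=False)
    y_poisoned[flip_idx] = target
    return y_poisoned

# Toy usage on a balanced +1/-1 label vector: 10% of the samples get a flipped label.
y = np.array([1, -1] * 50)
y_poisoned = targeted_label_flip(y)
```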
Formally, an optimal label flipping attack can be considered a bilevel
optimization problem defined as follows.
max_z L(D_val; θ*(z))  subject to  θ*(z) = argmin_θ L(D_train(z); θ) and a budget on the number of flipped labels,  (1.5.2)
where z encodes which training labels are flipped, the inner problem trains the model on the tainted data, and the outer problem chooses the flips that maximize the trained model's loss.
An attacker tries to find a poison instance p that collides with a given target instance t in feature space while remaining indistinguishable from a base instance b drawn from a class c other than the target class. Hence, the generated poison instance looks like the base instance, and an annotator labels it as an instance from class c. However, the poison instance is close to the target instance in feature space, and the ML model is likely to classify it as an instance from class t. This causes the targeted misclassification, and only one such poison instance is needed to poison a transfer learning model. That is why this attack is sometimes called a one-shot attack as well, and its formal definition is as follows:
p = argmin_z ||f(z) − f(t)||_2^2 + β ||z − b||_2^2,
where f denotes the feature extractor of the victim model.
The base instance can be selected randomly from any class other than the target class. However, for some base instances it may be easier for an attacker to find a poison instance than for others. The coefficient β must be tuned by attackers to make the poison instance seem indistinguishable from the base instance. Shafahi et al. [47] solved this optimization problem by using a forward-backward-splitting iterative procedure [48].
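A highly simplified sketch of crafting such a feature-collision poison is shown below: plain gradient descent on ||f(p) − f(t)||^2 + β||p − b||^2 with a fixed linear feature map standing in for a pretrained feature extractor. The feature map, learning rate, and step count are illustrative assumptions and stand in for the forward–backward-splitting procedure used in [47].

```python
import numpy as np

def craft_poison(f, jac_f, target, base, beta=0.1, lr=0.005, steps=500):
    """Clean-label feature-collision sketch: find a poison p close to the target in
    feature space, ||f(p) - f(target)||^2, while staying close to the base instance."""
    p = base.astype(float).copy()
    f_target = f(target)
    for _ in range(steps):
        # Gradient of the feature-collision term plus the input-space proximity term.
        grad = 2.0 * jac_f(p).T @ (f(p) - f_target) + 2.0 * beta * (p - base)
        p -= lr * grad
    return p

# Toy usage: a fixed linear "feature extractor" f(x) = A x, whose Jacobian is A.
A = np.random.randn(5, 16)
target, base = np.random.rand(16), np.random.rand(16)
poison = craft_poison(lambda v: A @ v, lambda v: A, target, base)
```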
This attack has a remarkable attack success rate (e.g., 100% in one
experiment presented in [47]) against transfer learning. Nevertheless, we
want to point out that such an impressive attack success rate reported in
[47] is due to the overfitting of the victim model to the poison instance. The
data poisoning attack success rate drops significantly on end-to-end training
and in black-box settings, and Shafahi et al. [47] proposed to use a
watermark and multiple poison instances to increase the attack success rate
on end-to-end training. However, one obvious drawback is that the pattern
of the target instance sometimes shows up in the poison instances.
(1.5.4) subject to the constraints, where c is a coefficient vector consisting of elements c_j.
1.6 CONCLUSIONS
ACKNOWLEDGMENT
NOTE
REFERENCES
DOI: 10.1201/9781003187158-3
Contents
2.1 Introduction
2.1.1 Scope and Background
2.2 Adversarial Machine Learning
2.3 Challenges and Gaps
2.3.1 Development Environment
2.3.2 Training and Test Datasets
2.3.3 Repeatability, Hyperparameter Optimization, and
Explainability
2.3.4 Embedded Implementation
2.4 Conclusions and Recommendations
References
2.1 INTRODUCTION
Radio access network (RAN): Connects the network and the UE as the
final link.
Multi-access edge computing (MEC): Provides services/computing
functions for edge nodes.
Core network: Controls the networks.
Transport: Links RAN components (fronthaul/midhaul) and links
RAN and core (backhaul).
Our focus is on the challenges regarding the RFML applications for the
air interface and RAN. As RFML solutions rely on various waveform,
channel, and radio hardware characteristics, a multilevel development
environment is ultimately needed, as illustrated in Figure 2.7.
We draw the following conclusions built on the challenges and gaps that we
identified in the previous section:
REFERENCES
DOI: 10.1201/9781003187158-4
Contents
3.1 Introduction
3.2 Categories of Attacks
3.2.1 White-box Attacks
3.2.2 Black-box Attacks
3.3 Attacks Overview
3.3.1 Attacks on Computer-Vision-Based Applications
3.3.2 Attacks on Natural Language Processing Applications
3.3.3 Attacks on Data Poisoning Applications
3.4 Specific Attacks in the Real World
3.4.1 Attacks on Natural Language Processing
3.4.2 Attacks Using Data Poisoning
3.5 Discussions and Open Issues
3.6 Conclusions
References
3.1 INTRODUCTION
A targeted attack means that the DL model misclassifies the attacked image as the adversary's desired class (in which case we say the attack is successful). In general, the target category of a targeted attack can be specified as the category to which the defense model assigns the lowest classification probability for the original image; this is the most difficult case to defend against. The target category can also be specified as the category whose classification probability is second only to that of the correct category for the original image, which is the easiest case to defend against. Finally, the target category can be specified randomly, which falls between the two previous cases in terms of difficulty. There has not been any major breakthrough in the transferability of targeted attacks.
An untargeted attack is the one in which the defense model classifies the
attack image into any category other than the correct one, and the attack is
considered successful. In other words, as long as the model misclassifies the
result, then it is a successful untargeted attack.
FGSM-based Method
Among image attack algorithms, the fast gradient sign method (FGSM) is a
classic algorithm. In [10], the use of gradients to generate attack noise was
proposed. Assume that the general classification model classifies a
conventional image as a panda. But after adding the attack noise generated
by the network gradient, although it looks like a panda, the model may
classify it as a gibbon. The purpose of the image attack is not to modify the
parameters of the classification network but to modify the pixel values of
the input image so that the modified image can destroy the classification of
the classification network. The traditional classification model training is to
subtract the calculated gradient from the parameters when updating the
parameters. The iterative subtractions can make the loss value smaller and
smaller, and the probability of model prediction becomes larger and larger.
If the model mistakenly classifies the input image into any category other than the correct one, the attack is considered successful. An attack therefore only needs to increase the loss value: it adds the calculated gradient direction to the input image so that the loss value becomes greater than that of the unmodified image when passed through the classification network. Thus the probability that the model predicts the label correctly becomes smaller. On one hand, FGSM calculates the gradient based on the input
image. On the other hand, when the model updates the input image, the
gradient is added instead of subtracted. This is just the opposite of the
common classification model for updating parameters. The FGSM
algorithm is simple and effective, and it plays a major role in the field of
image attacks. Many subsequent studies are based on this algorithm.
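The core FGSM step can be sketched in a few lines; the example below uses a hand-written logistic-loss gradient for a linear classifier purely as a stand-in for backpropagation through a real network, and the epsilon value is an illustrative assumption.

```python
import numpy as np

def fgsm(x, grad_loss, eps=8.0 / 255.0, lower=0.0, upper=1.0):
    """FGSM sketch: take a single step of size eps in the sign of the loss gradient with
    respect to the input (the gradient is added, the opposite of a training update)."""
    x_adv = x + eps * np.sign(grad_loss(x))
    return np.clip(x_adv, lower, upper)

# Toy usage: logistic-loss gradient for a linear classifier w.x with true label y = +1,
# i.e. the gradient of log(1 + exp(-w.x)) with respect to x.
w = np.random.randn(32)
x = np.random.rand(32)
grad_loss = lambda v: -w / (1.0 + np.exp(np.dot(w, v)))
x_adv = fgsm(x, grad_loss)
```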
Moosavi-Dezfooli et al. [11] proposed DeepFool, which obtains superior results by calculating the minimum necessary perturbation for adversarial sample construction, limiting the size of the perturbation under the L2 norm. Notably, this algorithm seeks, among all non-true classes, the decision boundary closest to the current point in the high-dimensional space; as a result, that class becomes the post-attack label.
L2 is one of the evaluation indices. The evaluation indices of current attack algorithms mainly use the Lp distance (also commonly called the Lp norm), given by the following formula:
||v||_p = (|v_1|^p + |v_2|^p + … + |v_n|^p)^(1/p).
JSMA-based Method
The Jacobian-based saliency map attack (JSMA) is an attack sample generation method
for acyclic feed-forward DNN networks. It limits the number of perturbed
pixel points in the input image. While gradient-based and GAN-based
adversarial attacks are based on global perturbations and generate
adversarial samples that can be perceived by the human eye, JSMA
generates adversarial samples based on point perturbations. Thus the
resulting adversarial perturbations are relatively small. The adversarial
saliency map (ASP) in [12] is based on the forward derivation of neural
networks for targeted attacks under white boxes, and it indicates which
input features in clean samples can be disturbed to achieve the attack effect.
One of the drawbacks of JSMA is that only targeted attacks can be performed, untargeted attacks cannot be implemented, and the direction of the attack needs to be specified (i.e., whether to increase or decrease pixel values). In the MJSMA paper [13], the authors proposed two variants of JSMA: one without specifying a target class and the other without specifying the direction of the attack (i.e., increasing or decreasing pixels).
In [14], the authors proposed two improved versions of JSMA: the weighted JSMA (WJSMA) and the Taylor JSMA (TJSMA). WJSMA applies a simple weighting by the output probabilities to the adversarial saliency map (ASP), and TJSMA also uses the ASP concept while additionally penalizing the input features. Both attacks are more effective than JSMA. The authors showed that TJSMA outperforms WJSMA in the case of targeted attacks and that WJSMA outperforms TJSMA in the case of untargeted attacks.
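To illustrate the saliency-map idea shared by JSMA and its variants, the sketch below computes the targeted (pixel-increase) saliency scores from a precomputed forward derivative; the Jacobian shape and the toy values are illustrative assumptions.

```python
import numpy as np

def jsma_saliency(jacobian, target):
    """Adversarial saliency map sketch for a targeted, pixel-increasing attack.
    jacobian has shape (n_classes, n_features): the forward derivative dF_j/dx_i.
    A feature scores 0 unless it increases the target class while decreasing the others."""
    jt = jacobian[target]                      # effect of each feature on the target class
    others = jacobian.sum(axis=0) - jt         # combined effect on all the other classes
    return np.where((jt < 0) | (others > 0), 0.0, jt * np.abs(others))

# Toy usage: pick the most salient feature for a random 3-class, 16-feature Jacobian.
J = np.random.randn(3, 16)
best_feature = int(np.argmax(jsma_saliency(J, target=2)))
```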
In addition to these two main methods of white-box attacks, there are
other attack methods such as C&W-based, direction-based, and attention-
based. The white-box attack is a common, basic adversarial attack
technology, and we discuss it more later.
Transferability-based Approach
It is shown that the adversarial samples generated against the target model
have transferability; i.e., the adversarial samples generated against the target
model will likely be effective against other models with different structures.
Therefore, in a black-box scenario, an attacker can train the model on the
same dataset or the dataset with similar distribution as the attack target and
thereby generates the adversarial samples against the trained model. Then it
uses transferability to deceive the black-box target model. If the attacker
does not have access to the training data, the attacker can use the target
model to label the synthetic data based on the idea of model distillation and
then use the synthesized data to train an alternative model to approximate
the target black-box model. Eventually it can use the white-box attack
method to generate adversarial samples against the alternative model and
use the generated adversarial samples to perform a black-box transfer attack on the target model [15]. However, while this approach has been shown to be applicable to datasets with low intraclass variability (e.g., MNIST), no study has yet demonstrated its extension to more complex datasets such as CIFAR or ImageNet. Subsequently, Papernot et al. [16] improved the training efficiency of alternative models by using reservoir sampling, which can further optimize alternative-model-based attack methods in terms of diversity, transferability, and noise size.
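The alternative-model (transfer) idea can be sketched end to end as follows, with a hidden linear model playing the role of the black-box target and scikit-learn's logistic regression as the locally trained substitute; all names and the FGSM-like crafting step are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

# Black-box "target model": only its output labels can be queried, never its gradients.
w_secret = rng.normal(size=10)
query_label = lambda X: (X @ w_secret > 0).astype(int)

# Step 1: label synthetic queries with the target model (model-distillation idea).
X_synth = rng.normal(size=(500, 10))
y_synth = query_label(X_synth)

# Step 2: train a local alternative (substitute) model on the query results.
substitute = LogisticRegression(max_iter=1000).fit(X_synth, y_synth)

# Step 3: craft an adversarial sample against the substitute with a white-box step
# (FGSM-like), then transfer it to the black-box target model.
x = rng.normal(size=10)
y_sign = 2 * query_label(x[None, :])[0] - 1              # current label as +/-1
grad = -y_sign * substitute.coef_[0]                     # direction that raises the loss
x_adv = x + 0.5 * np.sign(grad)
print(query_label(x[None, :])[0], query_label(x_adv[None, :])[0])  # label may flip
```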
TextBugger has five methods to fool the NLP model: insert, delete, swap, substitute a character (Sub-C), and substitute a word (Sub-W). These slight changes work in both black-box and white-box scenarios.
1. White-box attack: It finds the most important word by using the Jacobian matrix, then generates the five types of bugs and selects the best one according to the confidence level.
Step 1: Find the most important word. This is done through the
Jacobian matrix with two parameters: K (the number of categories) and
N (the length of the input). The derivative of each category with
respect to the input at each time step is obtained to measure the degree
of influence of each word in the input on the output label.
Step 2: Bug generation. In order to ensure that the generated
adversarial samples are visually and semantically consistent with the
original samples, the perturbations should be as small as possible. Two
levels of perturbation are considered:
Letter-level perturbation: Simply changing the order of letters in a word at random can easily cause the model to treat the word as unrecognized (out of vocabulary), thus achieving the attack effect.
Word-level perturbation: The closest word can be found in
the word embedding layer. However, in the word2vec model,
some antonyms are also close in distance. This would then
completely change the meaning of the utterance, and then the
semantic preservation technique is used to experiment in the
GloVe model.
The adversarial samples generated using the candidate words are fed into
the model to calculate the confidence level of the corresponding category,
and the word that gives the largest confidence drop is selected. If the
semantic similarity between the adversarial sample after replacing the
words and the original sample is greater than the threshold, the adversarial
sample is generated successfully. If it is not greater than the threshold, the
next word is selected for modification.
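In the black-box setting, the word-importance step is often approximated by measuring how much the predicted confidence drops when a word is removed; the sketch below illustrates this scoring with a placeholder sentiment classifier, which is purely an illustrative assumption.

```python
def word_importance(sentence, predict_proba, label):
    """Score each word by the drop in the model's confidence for the current label when
    that word is deleted (a common black-box stand-in for the Jacobian-based ranking)."""
    words = sentence.split()
    base = predict_proba(" ".join(words))[label]
    scores = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append((words[i], base - predict_proba(reduced)[label]))
    return sorted(scores, key=lambda pair: -pair[1])

# Toy usage: a placeholder sentiment "model" keyed on a single word.
toy_model = lambda s: {0: 0.1, 1: 0.9} if "great" in s else {0: 0.8, 1: 0.2}
ranking = word_importance("the movie was great", toy_model, label=1)
# "great" receives the highest score, so it would be perturbed first (e.g., "gr eat").
```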
3.6 CONCLUSIONS
REFERENCES
DOI: 10.1201/9781003187158-5
Contents
4.1 Introduction
4.2 Background
4.2.1 Deep Learning (DL)
4.2.2 Collaborative Deep Learning (CDL)
4.2.3 Deep Learning Security and Collaborative Deep Learning
Security
4.3 Auror: An Automated Defense
4.3.1 Problem Setting
4.3.2 Threat Model
4.3.3 AUROR Defense
4.3.4 Evaluation
4.4 A New CDL Attack: GAN Attack
4.4.1 Generative Adversarial Network (GAN)
4.4.2 GAN Attack
4.4.3 Experiment Setups
4.4.4 Evaluation
4.5 Defend against GAN Attack in IoT
4.5.1 Threat Model
4.5.2 Defense System
4.5.3 Main Protocols
4.5.4 Evaluation
4.6 Conclusions
Acknowledgment
References
4.1 INTRODUCTION
In recent years, big data processing has drawn more and more attention in
Internet companies. Deep learning (DL) has been used in many
applications, from image recognition (e.g., Tesla Autonomous vehicle [1]
and Google’s Photos [2]) to voice recognition (e.g., Apple’s Siri [3] and
Amazon’s Alexa [4]). The DL’s accuracy is often related to the size of the
training model and the structure of the DL algorithm. However, the training
datasets are enormous, and it takes a long time to train the model on a single computer.
Moreover, in practice the data is often stored in isolated places. For
example, in telemedicine applications, a smart bracelet monitors the user’s
pulse rate and physical activities, while a smart camera can monitor the
user’s daily routines. These data can be collected to provide a higher
personalized service quality. By using such isolated data storage, we can
better maintain data privacy. To utilize deep learning on various data
sources and huge datasets, the researchers proposed the concept of
collaborative deep learning (CDL). It is an enhanced deep learning (DL)
framework where two or more users learn a model together. It transforms
traditional centralized learning into decentralized model training [5]. Since an ensemble of multiple learning models generates better prediction results than a single neural network model, CDL can achieve better performance than centralized DL.
Although CDL is powerful, privacy preservation is still a top concern.
Data from different companies must face the fact that nontrusted third
parties and untrusted users can contaminate the training parameters by
uploading forged features and eventually make prediction useless [6]. Some
attackers can pretend to be trusted users and download the server’s feature
locally to infer other users’ privacy information. However, the severity of
poisoning attacks is not yet well understood.
Therefore, it is significant to understand how cyber attacks can influence
the CDL model and user’s privacy. If a system that deploys CDL is
undergoing cyber attacks, researchers should know how to detect malicious
attacks and eliminate the influences of adversary. In this chapter, we
introduce mechanisms that can detect poisoning attacks and GAN attacks in
a CDL system. These attack types and corresponding workflows are
discussed in sections 4.3 and 4.4.
We first introduce the architecture of CDL and the principles and design
goals behind it in section 4.2. Then we show that although the indirect CDL
is a privacy-preserving system, it is sensitive to malicious users. Then we
introduce a mechanism – AUROR in section 4.3 – which can detect
malicious users, exclude the forged upload features, and maintain the
accuracy of the global learning model. Section 4.4 introduces a new type of
cyber attack: generative adversarial network (GAN) attack that generates
prototypical samples of the training dataset and steal sensitive information
from victim users. In section 4.5, to fight against the newest GANs attack in
the Internet of Things (IoT), a defense mechanism is presented. Server and
users communicate through a trusted third party. Finally, we conclude the
attacks and defenses in CDL and point out the future work in CDL.
4.2 BACKGROUND
Architecture
CDL algorithm is similar to traditional supervised learning: The training
phase trains the model based on the labeled datasets and obtains the
prediction result. We divide CDL into two modes.
1. Direct CDL: There are a central server and multiple users. Users directly upload their datasets to the server, and the server is responsible for processing the whole dataset and running the DL algorithms. However, direct CDL cannot protect the users' information: passive inference attacks can de-anonymize users from public data sources [10]. Thus this chapter mainly focuses on the security of indirect CDL.
2. Indirect CDL: Unlike users in direct CDL, users in indirect CDL train on their own datasets to generate masked features [10] and only submit the masked features to the central server. The server then performs the remaining computation on the masked features to generate a DL model. Each user's data thus stays local, which keeps it secure and relieves the computation burden on the centralized server. The adversary in this mode can only tamper with its local dataset without observing datasets from other users. Figure 4.1 shows the structure of indirect CDL.
FIGURE 4.1 Structure of indirect collaborative deep learning.
1. Selection: Devices that meet the criteria (e.g., charging and connected
to WiFi) are connected to the server and open a bidirectional stream.
The server selects a subset of connected devices based on certain
goals.
2. Configuration: The server sends the federated learning (FL) plan and
global model parameters to selected devices.
3. Reporting: Devices run the model locally on their own dataset and
report updates. Once updates are received, the server aggregates them
using federated averaging and tells devices when to reconnect. If
enough devices report their results in time, the round will be
successfully completed, and the server updates the global model.
Otherwise, this round is abandoned.
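The aggregation step in the reporting phase is typically a weighted (federated) average of the reported models; a minimal sketch is shown below, with the device weights and sample counts as illustrative stand-ins.

```python
import numpy as np

def federated_averaging(local_weights, num_samples):
    """Federated-averaging sketch: the server aggregates the devices' reported model
    weights, weighting each device by the number of samples it trained on."""
    total = float(sum(num_samples))
    return sum((n / total) * w for w, n in zip(local_weights, num_samples))

# Toy round: three devices report locally trained weight vectors for a shared model.
global_w = np.zeros(5)
local_weights = [global_w + 0.1 * np.random.randn(5) for _ in range(3)]
new_global_w = federated_averaging(local_weights, num_samples=[100, 200, 50])
```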
4.3.4 Evaluation
Shen et al. [10] built an indirect CDL model and demonstrated how false
training data affects the global model. They used two well-known datasets:
MNIST (handwritten numbers) and GTSRB (German traffic images). Shen
et al. set 30% of customers as malicious ones, and the accuracy of the
global model dropped by 24% compared to the model trained by the benign
dataset for MNIST. The GTSRB trained model accuracy dropped 9% when
30% of users are malicious. In a word, indirect CDL is susceptible to
poisoning attacks.
After that, Shen et al. [10] added AUROR during indirect CDL. They
observed that the attack success rate was reduced to below 5% when the
malicious ratio was 10%, 20%, and 30% for MNIST simulations. The
accuracy drop is 0%, 1%, and 3% when the malicious ratio is 10%, 20%,
and 30%. This proved that AUROR could efficiently defend poisoning
attacks and that the image recognition system retains similar accuracy even
excluding the malicious features. In the GTSRB experiment, Shen et al.
[10] observed that the attack success rate is below 5% for malicious ratios
from 10% to 30% when AUROR is used to preprocess the input. And the
accuracy drop remains negligible for malicious ratios from 10% to 30%.
Main Protocol
Suppose the adversary A is an insider of a privacy-preserving CDL
protocol. All users agree in advance on the learning object, which means
that they agree on the neural network architecture.
For simplicity, we first consider only two players: the adversary A and
the victim V. V declares labels [a, b], while A declares labels [b, c]. They
first run CDL for several epochs until the accuracy of both the global model
and the local model have reached a predetermined threshold.
Then V continuously trains its network: V downloads parameters from
the central server and updates its local model, which corresponds to the
label [a, b]. V uploads the parameters of its local model to the server after it
trains the local model.
On the other side, A also downloads the parameters, updates its local
network with labels [b, c]. Unlike V, A first trains its local generative
adversarial network to mimic class [a]’s distribution. Then the faked class
[a]’s samples are generated. Next, A labels class [a]’s samples as class [c].
After that, it trains its local model on [b, c] and uploads the parameters to
the central server, as V does. This process iterates until V and A converge
to the threshold.
These GAN attacks work as long as A’s local model improves its
accuracy over time. Even if A and V use different privacy techniques, GAN
attacks still work because the success of the GAN learning relies only on
the accuracy of the discriminative model. And the parameters downloaded
from the server contain the distribution information that can be used in the
discriminator.
Figure 4.5 shows this process.
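The protocol flow can also be summarized in a purely structural sketch, shown below, in which the local training, GAN update, and sampling routines are placeholder functions rather than a working GAN; only the order of downloads, fake-sample injection under the artificial label, and uploads follows the description above.

```python
import numpy as np

rng = np.random.default_rng(5)

# Placeholder routines: each simply perturbs a parameter vector to keep the sketch runnable.
def train_local(params, data, labels):          # one local training round (placeholder)
    return params + 0.01 * rng.normal(size=params.shape)

def train_generator(gen, discriminator_params): # adversary's GAN step; the downloaded
    return gen + 0.01 * rng.normal(size=gen.shape)  # global params act as the discriminator

def sample_fakes(gen, n=32, dim=8):             # imitations of the victim's class "a"
    return rng.normal(size=(n, dim)) + gen[:dim]

server_params = np.zeros(16)
victim_data, victim_labels = rng.normal(size=(64, 8)), np.full(64, "a")
adv_data, adv_labels = rng.normal(size=(64, 8)), np.full(64, "b")
gen = np.zeros(16)

for _ in range(5):                               # a few collaborative epochs
    # Victim: download, train locally on classes [a, b], upload.
    server_params = train_local(server_params, victim_data, victim_labels)
    # Adversary: download, use the global model as the discriminator to mimic class "a",
    # label the fakes as the artificial class "c", train locally on [b, c], upload.
    gen = train_generator(gen, server_params)
    fakes = sample_fakes(gen)
    poisoned = np.vstack([adv_data, fakes])
    poisoned_labels = np.concatenate([adv_labels, np.full(len(fakes), "c")])
    server_params = train_local(server_params, poisoned, poisoned_labels)
```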
System Architecture
A CNN-based architecture is used for the experiments on the MNIST and AT&T datasets, with a dedicated activation function applied in the last layer. The output layer has 41 nodes: 40 for the real classes and 1 for the artificial class (into which the adversary puts the reconstructions of its class of interest). For the AT&T dataset, Hitaj et al. [12] added extra convolutional layers. Batch normalization is applied to the output of all layers except the last one to accelerate training. The output is 64×64 images.
Hyperparameter Setup
For MNIST-related experiments, the learning rates for G and D are set to
1e-3, the learning rate decay is 1e-7, momentum is 0, and the batch size of
64 is used. For AT&T-related experiments, the learning rate is set to 0.02,
and the batch size is 32. The Adam optimizer with a learning rate of 0.0002
and a momentum of 0.5 is used in DCGAN.
4.4.4 Evaluation
After setting up the attack system, Hitaj et al. ran the GAN attack on the MNIST and AT&T datasets while changing the privacy settings and the number of participants. The results show that the GAN attack has a high success rate in CDL even when the CDL uses different privacy techniques. When only 10% of the gradient parameters are downloaded, the adversary can obtain clear images of the victim in the MNIST experiment and noisy images in the AT&T experiment (because of the small dataset). The authors concluded that the generator can learn well as long as the local model's accuracy increases; they observed that the generator starts producing good results as soon as the global model's accuracy reaches 80%. Also, the GAN attack performs better if the adversary uploads an artificial class, which makes the victim release a more detailed distribution of its own data.
4.5 DEFEND AGAINST GAN ATTACK IN IOT
The IoT (Internet of Things) refers to the billions of physical devices that
can connect to the Internet and collect environment data by using embedded
sensors. IoT has become more and more popular recently since IoT devices
can communicate and operate without human involvement. The tech
analysts estimated that, in the IoT, the “things”-to-people ratio has grown
from 0.08 in 2003 to 1.84 in 2010. Now deep learning is being linked to the IoT, making IoT devices more intelligent and more agile at time-sensitive tasks than in previous settings. These IoT devices, such as smartphones, can share
data by using CDL and train deep learning models in the Cloud. In the same
way, IoT devices generate large amounts of data, fueling the development
of deep learning.
CDL can successfully resolve the conflict between data isolation in the
real world and data privacy [14]. However, IoT and DL are inherently
vulnerable to cyber attacks. First, IoT devices often communicate in plaintext or with weak encryption. Second, CDL is not safe because the server can
extract loaded data information from the updated gradient. A simple
solution, like limiting the size of the upload or download parameters, can
protect users from data leakage, but the accuracy of the deep learning model
would also decrease, making the CDL useless. Hitaj et al. [13] also pointed
out that, in the MNIST experiment, the GAN attack’s efficiency was high
even if only 10% of parameters were uploaded or downloaded.
From the GAN attack previously described, we can see that CDL is
susceptible to GAN attack. When an adversary pretends to be an insider and
gets engaged in the learning process, the adversary can steal the victim’s
dataset online. To fight against the GAN attack, Chen et al. [14] constructed
a privacy-preserving CDL in the Internet of things (IoT) network, in which
the participants are isolated from the model parameters and use the
interactive mode to learn a local model. Therefore, neither the adversary nor
the server has access to the other’s data.
Although neither participant has access to the others' local data, Chen
et al. [14] introduced an improved Du-Atallah scheme [16], which helps
participants interact with the central server, and the neural networks learn
from updating parameters. The experiments on the real dataset show that
the defense system proposed by Chen et al. [14] has a good performance
both in model accuracy and efficiency. Their scheme is also easy to
implement for all participants without complex homomorphic encryption
technology.
4.6 CONCLUSIONS
ACKNOWLEDGMENT
REFERENCES
DOI: 10.1201/9781003187158-6
Contents
5.1 Introduction
5.2 Characterizing Attacks on DRL Systems
5.3 Adversarial Attacks
5.4 Policy Induction Attacks
5.5 Conclusions and Future Directions
References
5.1 INTRODUCTION
While deep reinforcement learning systems are useful tools for complex
decision making in higher-dimensional spaces, a number of attack
techniques have been identified for reducing model confidence even
without direct access to internal mechanisms. While some adversarial
attacks can be mitigated through robust defensive techniques implemented
during the training and deployment phases (perturbed training inputs and
multiple deployment observations), some techniques such as policy-based
attacks are difficult to train specifically against.
REFERENCES
DOI: 10.1201/9781003187158-7
Contents
6.1 Introduction
6.2 Deep Reinforcement Learning Overview
6.2.1 Markov Decision Process
6.2.2 Value-based Methods
6.2.3 Policy-based Methods
6.2.4 Actor–Critic Methods
6.2.5 Deep Reinforcement Learning
6.3 The Most Recent Reviews
6.3.1 Adversarial Attack on Machine Learning
6.3.1.1 Evasion Attack
6.3.1.2 Poisoning Attack
6.3.2 Adversarial Attack on Deep Learning
6.3.2.1 Evasion Attack
6.3.2.2 Poisoning Attack
6.3.3 Adversarial Deep Reinforcement Learning
6.4 Attacks on Drl Systems
6.4.1 Attacks on Environment
6.4.2 Attacks on States
6.4.3 Attacks on Policy Function
6.4.4 Attacks on Reward Function
6.5 Defenses Against Drl System Attacks
6.5.1 Adversarial Training
6.5.2 Robust Learning
6.5.3 Adversarial Detection
6.6 Robust Drl Systems
6.6.1 Secure Cloud Platform
6.6.2 Robust DRL Modules
6.7 A Scenario of Financial Stability
6.7.1 Automatic Algorithm Trading Systems
6.8 Conclusion and Future Work
References
6.1 INTRODUCTION
The term artificial intelligence (AI) was first coined by John McCarthy in
1956. AI is the science and engineering of making intelligent machines. The
history of AI development has swung between rule-based knowledge representation (deductive reasoning) and machine learning (ML) from data (inductive reasoning). Modern AI is the study of intelligent (or software) agents in a multiagent system environment, where an agent perceives its environment and takes actions to maximize its reward. For the past
decade, AI has been closely related to ML, deep learning (DL), and deep
reinforcement learning (DRL) for their analytic capability to find a pattern
from big data created on the Social Web or Internet of Things (IoT)
platforms. The exponential growth of computing power based on Moore's
law is another driving factor accelerating machine learning adoption.
ML algorithms used for data analytics have three types: supervised
learning for classification or regression, unsupervised learning for
clustering, and reinforcement learning (RL) for sequential decision making.
DL is an emerging machine learning algorithm for supervised and
unsupervised learning over various pattern matching applications, such as
computer vision, language translation, voice recognition, etc. But we cannot
achieve autonomous decision-making capabilities of intelligent agents
without considering using reinforcement learning (RL). RL provides a
reward for an intelligent agent for self-learning without requiring ground-
truth training data. Furthermore, we can integrate DL with RL to have DRL
for other AI applications, such as automatic robotics, AlphaGo games, Atari
video games, autonomous self-driving vehicles, and intelligent algorithmic
trading in FinTech [1].
Adversarial ML has been intensively studied for the past two decades [2,
3]. Big data analytics services face many security and privacy challenges
from an ML modeling perspective. Numerous studies have developed
multiple adversarial attacks and defenses for the entire big data analytics
pipeline on data preparation, modeling training, and deployment for
inferencing [4, 5]. Specific attacks and defenses also aim at different ML
types and algorithms, especially on attacking DL. Indeed, these DL attack
surfaces are also related to DRL.
Like AI applications for cybersecurity, the AI on adversarial machine
learning for attack and defense is a double-edged sword [6]. On the one
hand, AI-powered attacks can strengthen attacks and evade detection. In
addition, attackers can leverage adversarial AI to defeat protections and
steal or influence AI. On the other hand, AI-powered defenses can
proactively protect and disable AI-powered attacks and counter malicious
AI to fortify AI and withstand adversarial environments [7].
In [8], the authors proposed a framework for analyzing adversarial
attacks against AI models. Attack strategy, based on an attacker’s
knowledge, can be white-box, gray-box, or black-box. Attacks can be
poisoning attacks in the training phase or evasion attacks in the testing
phase. We provide a more detailed discussion in the following sections.
Motivation: In this study, we are dealing with the trust and security
issues of AI. Compared with recent similar AI for cybersecurity studies [6],
we address trust and security of the ML modeling and deployment, notably
the DRL algorithm in the big data analytics process [9]. We are
investigating possible attacks and defenses on the DRL systems on the
secure Cloud platform [10].
A trusted and secure Cloud infrastructure might mitigate existing
adversarial attacks on the DRL system, but it is not guaranteed [11].
Unknown attacks will still appear, and we are not yet aware of them [12].
We will investigate how to follow the principle of designing robust ML
modeling and deployment systems with security by design in mind in a
secure Cloud. We attempt to demonstrate possible attacks on automatic
algorithmic trading systems while applying a robust DRL system to ensure
financial stability in the Cloud [13, 14].
Our contributions: We summarize the main contributions of this study as
follows:
MDP aims to train an agent to find an optimal policy that will return a
maximum cumulative reward by taking a series of actions for each step
with a set of states created from observations in an environment.
An MDP is a 5-tuple (S, A, P, R, γ),
where
S is a set of states (the state space);
A is a set of actions (the action space);
P(s' | s, a) is the transition probability that taking action a in state s at time t will lead to state s' at time t + 1;
R(s, a, s') is the immediate reward received after taking action a, with the state transition from state s to state s';
γ ∈ [0, 1] is a discount factor for the discounted reward.
6.2.2 Value-based Methods
There are three types of RL. Of those, we first focus on value-based
methods, which easily explain the necessary mathematical background from
model-free approaches.
V-value Function
An RL agent's goal is to find a policy π that optimizes the expected return
V^π(s) = E[ Σ_{t≥0} γ^t r_t | s_0 = s, π ].
Q-value Function
The optimal Q-value Q*(s, a) is the expected discounted return when, in a given state s and for a given action a, an agent follows the optimal policy π* after that. We can obtain the optimal policy directly from this optimal value:
π*(s) = argmax_a Q*(s, a).
Advantage Function
We combine the last two functions, which describes "how good" an action a is compared to the expected return obtained by directly following policy π:
A^π(s, a) = Q^π(s, a) − V^π(s).
Bellman Equation
The Bellman equation is used to learn the Q-value. It guarantees a unique solution Q*:
Q*(s, a) = E_{s'}[ R(s, a, s') + γ max_{a'} Q*(s', a') ].
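A single tabular Q-learning update makes the Bellman target concrete; the state/action table sizes and learning rate below are illustrative assumptions.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q(s, a) toward the Bellman target
    r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy usage on a 4-state, 2-action table; the greedy policy is argmax_a Q(s, a).
Q = np.zeros((4, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
greedy_action = int(np.argmax(Q[0]))
```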
The rise of the study of adversarial ML has taken place over more than a
decade [2]. People have used different approaches to address the security
and privacy issues of AI or ML. We briefly introduce a general survey
discussion of adversarial ML as follows. First, in [15], the authors
articulated a comprehensive threat model for ML and categorized attacks
and defenses within an adversarial framework. Second, in another paper [4],
the authors discussed different adversarial attacks with various threat
models. They elaborated on the efficiency and challenges of recent
countermeasures against adversarial attacks.
Third, in [5], a wide range of dataset vulnerabilities and exploits
approaches for defending against these threats were discussed. In addition,
the authors developed their unified taxonomy to describe various poisoning
and backdoor threat models and their relationships. Fourth, in [16], the
authors proposed a system-driven taxonomy of attacks and defenses in
adversarial ML based on the configuration factors of an ML learning
process pipeline, including input dataset, ML architecture, adversary’s
specifications, attack generation methodology, and defense strategy. In [17],
the authors surveyed robust machine learning systems, their trends, and perspectives. They summarized the most critical challenges that hamper the development of robust ML systems.
Fifth, in [18], the authors gave a comprehensive survey on adversarial
attacks in reinforcement learning from an AI security viewpoint. Similarly,
in another paper [19], the authors presented a foundational treatment of the
security problem in DRL. They also provided a high-level threat model by
classifying and identifying vulnerabilities, attack vectors, and adversarial
capabilities. Finally, the authors showed a comprehensive survey of
emerging malicious attacks in DRL-based systems and the potential
countermeasures to defend against these attacks. They highlighted open
issues and research challenges for developing solutions to deal with various
attacks for DRL systems. We will now show more details of adversarial
attacks on ML, DL, and DRL.
FIGURE 6.2 In both figures, the black lines are the boundaries of the three classes. Left: an error-specific evasion attack, in which the attacker specifies the output to be a square with a circle as the input. Right: an error-generic evasion attack, in which the attacker causes an input sample to be misclassified as a diamond rather than a circle.
FIGURE 6.4 An image of class C has some noise δx added and is fed into a trained DNN. The trained DNN then returns the incorrect class D with the highest probability, 0.8. To obtain a good adversarial image, we check whether the noise is as small as required. If yes, a malicious image has been created; if no, we update the noise and feed the new image into the trained DNN again.
For active attacks, the attacker’s target is to change the agent’s behavior
and make the reward as low as possible. Passive attacks do not focus on the
behavior of the agent but on stealing the details of a victim agent, such as
reward function, policy function, or other parts of DRL. The details are
helpful information for an attack agent that can produce adversarial
examples against a victim agent. An attack agent uses these adversarial
examples to attack a victim agent.
The authors also compare the difference between safe RL and secure
RL. On the one hand, the safe RL learns a policy to maximize the
expectation of the reward function. On the other hand, the secure RL makes
the learning process robust and resilient to adversarial attacks. We address
the attacks and defenses of the DRL systems in sections 6.4–6.5.
Furthermore, we present a robust DRL system in the secure Cloud platform
with robust DRL modules in section 6.6.
6.4 ATTACKS ON DRL SYSTEMS
FIGURE 6.8 To attack policy, the attack agent can perturb the states
and make the victim agent execute an action that the attack agent wants.
Hussenot et al. propose two attacks: a per-observation attack, which adds a different adversarial perturbation to each observation, and a constant attack, which adds the same perturbation to every observation [25]. The attack
agent manipulates the environment and makes a victim agent learn the
wrong policy from the controlled environment.
Gleave et al. put two agents in the same environment, one is an
adversarial attack agent, and another is a legitimate victim agent [26]. The
attack agent in the same environment tries to make the victim agent follow
the desired policy by creating adversarial natural observations input to the
victim agent. It uses two-player zero-sum discounted games. The attack
agent tries to obtain as much reward as possible while making the victim agent receive as little reward as possible.
Behzadan et al. provide a model extraction attack on the optimal policy
[22]. If a victim agent is online and provides services to others, it faces a
model extraction attack risk. The attack agent can use imitation learning
technology to extract and steal policy content to train a new mimic agent.
This mimic attack agent can generate adversarial examples to attack the
victim agent.
FIGURE 6.10 The retraining procedure starts from an already trained agent and uses adversarial attack methods, such as FGSM or CDG, to generate adversarial images. The original images and the adversarial images are then used together to create a new, retrained agent.
6.5.2 Robust Learning
The central concept of robust learning is to create a robust agent while in
the training phase. Behzadan et al. used NoisyNets to provide robust
learning modules during the training phase [31, 32]. NoisyNets is a neural
network, and its biases and weights can be perturbed during the training
phase. We can represent the network function as in equation (6.5.1): y = f_θ(x), where f is the neural network function with noisy parameters θ = μ + Σ ⊙ ε, (μ, Σ) are learnable parameters, ε is zero-mean noise, x is the input, and y is the output.
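The following is a minimal sketch of a noisy linear layer in the spirit of NoisyNets, assuming the common parameterization θ = μ + σ ⊙ ε; the shapes, initialization, and noise scale are illustrative choices rather than the authors' exact implementation.

import numpy as np

class NoisyLinear:
    # Linear layer whose weights and biases are perturbed by zero-mean noise on every forward pass.
    def __init__(self, in_dim, out_dim, sigma0=0.5, rng=None):
        self.rng = rng or np.random.default_rng(0)
        self.w_mu = self.rng.normal(0.0, 1.0 / np.sqrt(in_dim), (out_dim, in_dim))
        self.b_mu = np.zeros(out_dim)
        self.w_sigma = np.full((out_dim, in_dim), sigma0 / np.sqrt(in_dim))
        self.b_sigma = np.full(out_dim, sigma0 / np.sqrt(in_dim))

    def forward(self, x):
        # theta = mu + sigma * epsilon, with epsilon resampled on each call.
        w = self.w_mu + self.w_sigma * self.rng.standard_normal(self.w_mu.shape)
        b = self.b_mu + self.b_sigma * self.rng.standard_normal(self.b_mu.shape)
        return w @ x + b

layer = NoisyLinear(4, 2)   # hypothetical layer sizes
print(layer.forward(np.ones(4)))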
In the two-player zero-sum formulation, player 1 maximizes and the adversary minimizes the expected return, max_μ min_ν R(μ, ν), where A1 and A2 are the sets of actions available to the two players, μ is the strategy of player 1 (over A1), ν is the strategy of player 2 (over A2), and R is the total reward under those strategies. The main idea of the zero-sum game is to add an adversarial agent during training: player 1 tries to maximize the reward while player 2 (the adversary) tries to minimize player 1's reward. The algorithm is called robust adversarial reinforcement learning (RARL). After
the training phase, the legitimate victim agent learns all situations and
performs the best action according to observations. Based on this concept,
some researchers consider risk in RARL and propose risk-averse robust
adversarial reinforcement learning (RARARL) [34, 35].
In this section, we discuss in more detail how to deliver a robust DRL system.
We discuss two aspects of robust DRL systems: One is the security of the
Cloud platform, and the other is how to provide robust DRL modules.
Several attacks can happen if we establish the ML system architecture insecurely (see Figure 6.11), because a malicious third party can then exploit several weak points.
FIGURE 6.11 This architecture is insecure because the database and the DRL module are in a public network, which is much easier for an attacker to access than a private, on-site closed network. The communication between the agent and the environment is also not secured.
Kumar et al. made several recommendations for a robust DRL system: at the system level, record adversarial attacks in a repository, audit the logs and monitor the ML system, and adopt a secure coding style to prevent vulnerabilities [38].
Adversarial machine learning (ML) has been studied for more than a
decade. Adversarial deep learning (DL) has also been researched for about a decade. This chapter presents a high-level overview of the trust and security
of deep reinforcement learning (DRL) with its applications in automatic
algorithmic trading to avoid financial instability. We first briefly introduce
deep reinforcement learning (DRL). Then we address the most recent
reviews of adversarial ML, DL, and DRL; we conceptually review the
attacks and defenses of DRL systems to make the complex concepts much
easier to understand.
To see how to preserve financial stability while applying AI techniques,
particularly big data analytics for the FinTech application domain, the big
data analytics pipeline has been extensively designed and developed on the
public Cloud or the on-site private Cloud platforms. We present the robust
DRL systems based on the secure Cloud platforms with possible proposed
defense techniques against Cloud platform system attacks and adversarial
DRL modeling attacks. Finally, we demonstrate a scenario of financial
stability problems with an automatic algorithm trading system application
by using a summarization table of previously proposed adversarial ML, DL,
and DRL concerning this FinTech application. This study is an initial road map of trusted and secure DRL with its applications in the financial world.
In our future work, we intend to investigate further issues on the
adversarial DRL with its instability effects on more inclusive financial
applications, such as high-frequency program trading, asset management,
financial asset pricing, time series forecasting of a financial index, etc. We
will design and implement the proposed robust DRL modules in the secure
Cloud platform. We intend to verify our trust in and the security of the DRL
proposition in the real financial world to see whether it can mitigate the risk
of financial instability while facing different adversarial ML, DL, and DRL
challenges.
REFERENCES
Diego Heredia
Facultad de Ciencias: Escuela Politécnica Nacional, Quito, Ecuador
DOI: 10.1201/9781003187158-8
Contents
7.1 Background
7.2 Topics of Chapter
7.3 Scope
7.4 Cyber Security in IoT Networks
7.4.1 Smart Home
7.4.2 Attack Graphs
7.5 Modeling with Bayesian Networks
7.5.1 Graph Theory
7.5.2 Probabilities and Distributions
7.5.3 Bayesian Networks
7.5.4 Parameter Learning
7.5.5 Inference
7.6 Model Implementation
7.6.1 Network Structure
7.6.2 Attack Simulation
7.6.3 Network Parametrization
7.6.4 Results
7.7 Conclusions and Future Work
References
7.1 BACKGROUND
7.3 SCOPE
The smart home has three wi-fi connections between these levels [4]:
Wi-fi network 1: Provides connection between the access point and the
management level.
Wi-fi network 2: Provides connection between the management level
and the KNX IP router.
Wi-fi network 3: Provides connection between the KNX IP router and
the field devices level.
Figure 7.1 represents the structure of the smart home. The following
attacks are considered in the model implementation: social engineering
(SE), phishing (PH), malware injection (MI), denial of service (DoS),
routing table poisoning (RTP), persistent attack (PA), and man in the middle
(MitM).
Consider one of the factors of equation (7.5.3), P(X_i | X_1, …, X_{i−1}). Since the nodes are in topological order, it follows that Pa(X_i) ⊆ {X_1, …, X_{i−1}} and that none of the descendants of X_i is in the mentioned set. Hence {X_1, …, X_{i−1}} = Pa(X_i) ∪ Z, where Z is a set of nondescendants of X_i. From the local independencies for X_i and from the decomposition property of conditional independencies, it follows that I(X_i, Pa(X_i), Z), so P(X_i | X_1, …, X_{i−1}) = P(X_i | Pa(X_i)). Hence
P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | Pa(X_i)).   (7.5.5)
Equation (7.5.5) is called the chain rule for Bayesian networks, and this result expresses the joint distribution of the variables of the model.
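As a small illustration of equation (7.5.5), the sketch below computes a joint probability for a hypothetical three-node network A → B → C with made-up conditional probability tables.

# Hypothetical CPTs for binary variables A, B, C (values are illustrative only).
p_A = {True: 0.3, False: 0.7}
p_B_given_A = {(True, True): 0.8, (True, False): 0.2,
               (False, True): 0.1, (False, False): 0.9}    # keyed by (a, b)
p_C_given_B = {(True, True): 0.6, (True, False): 0.4,
               (False, True): 0.05, (False, False): 0.95}   # keyed by (b, c)

def joint(a, b, c):
    # Chain rule for this network: P(A, B, C) = P(A) * P(B | A) * P(C | B)
    return p_A[a] * p_B_given_A[(a, b)] * p_C_given_B[(b, c)]

print(joint(True, True, False))  # 0.3 * 0.8 * 0.4 = 0.096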
Bayesian networks represent the joint distribution of a set of random
variables in a compact way. In the case of discrete random variables, the
probabilities of all combinations of values of the variables must be declared
to represent the joint distribution, and the number of probabilities can be
large. For example, if we consider a set of n binary-valued random variables
(Bernoulli variables), the joint distribution requires the specification of 2^n probabilities. Noting that the probabilities must sum to 1, we would need to specify 2^n − 1 numbers in total. For large values of n, this
representation becomes unmanageable. Depending on the problem of study,
estimating those probabilities can be difficult for human experts, and if we
want to learn the distribution from data, we would need large amounts of
data to estimate the parameters robustly [5]. Bayesian networks address this
problem by using the conditional independencies between variables, so that the representation of the joint distribution requires fewer parameters while still describing the whole phenomenon. If each node in the graph of a Bayesian network has at most k parents, then the total number of independent parameters required is less than n · 2^k [5].
Let D#(α) be the number of cases in the data set D that satisfy event α. (7.5.6)
Let x be an instantiation of a variable in X and u an instantiation of its parents in G; then, using definition (7.5.6), the parameters can be estimated with the empirical probability
θ_{x|u} = D#(x, u) / D#(u).
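A minimal sketch of this empirical (maximum-likelihood) estimate on a hypothetical toy data set, counting the cases that satisfy each event:

from collections import Counter

# Each record assigns a value to a node X and to its parent U (hypothetical data).
data = [{"U": 1, "X": 1}, {"U": 1, "X": 0}, {"U": 1, "X": 1},
        {"U": 0, "X": 0}, {"U": 0, "X": 0}]

def estimate_cpt(data, child, parent):
    # theta_{x|u} = D#(x, u) / D#(u), the empirical conditional probability.
    joint_counts = Counter((d[parent], d[child]) for d in data)
    parent_counts = Counter(d[parent] for d in data)
    return {(u, x): c / parent_counts[u] for (u, x), c in joint_counts.items()}

print(estimate_cpt(data, "X", "U"))  # e.g. theta_{X=1|U=1} = 2/3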
7.5.5 Inference
Once the Bayesian network has been designed, the objective is to obtain
information by answering questions about the variables of the Bayesian
network and their probabilities [6, 7]. The techniques used are known in
general as inference. For Bayesian networks, the process of answering these
questions is also known as probabilistic reasoning or belief updating, while
the questions themselves are called queries [7]. The queries used for risk
assessment are the probabilities of evidence, posterior marginals, and most
probable explanations. Considering that we use Bernoulli random variables
in the model, given a Bayesian network for a set of Bernoulli random
variables these inferences are:
ΘSECP
ΘPHCP | SECP
ΘMICP | SECP
ΘDOSR | PHCP, MICP
ΘRTPR | PHCP, MICP
ΘPAR | PHCP, MICP
ΘDOSF | DOSR, RTPR, PAR
ΘMITMF | DOSR, RTPR, PAR
7.6.4 Results
This section presents the results obtained from the implementation of the
model. The number of simulated attacks was 1,000. Using exact inference
algorithms in Bayesian networks [7], four results were calculated:
The first three results are presented in Tables 7.4, 7.5, and 7.6. As for the
most probable explanation: If there is evidence that there was a DoS attack
but not a MitM attack or that there was a MitM attack but not a DoS attack
to the field devices level, the most probable explanation includes a social
engineering attack to the management level, then a phishing attack to the
management level, after that a DoS attack to the KNX IP router, and finally
any of the attacks of the evidence.
TABLE 7.4 Probability of DoS and MitM Attacks to the Field Devices
Level
ATTACK PROBABILITY
DOSF 0.3257822
MITMF 0.2627260
DOSF and MITMF 0.1897563
TABLE 7.5 Probability of DoS and MitM Attacks to the Field Devices
Level Given the Evidence of Attack on Any Other Nodes in the
Network
ATTACK EVIDENCE PROBABILITY
DOSF SECP 0.4134291
MITMF 0.3334087
DOSF and MITMF 0.2408075
DOSF PHCP 0.6054472
MITMF 0.4905264
DOSF and MITMF 0.3567098
DOSF MICP 0.6333557
MITMF 0.5179413
DOSF and MITMF 0.3813129
DOSF DOSR 0.7190847
MITMF 0.5931191
DOSF and MITMF 0.4460161
DOSF RTPR 0.7592573
MITMF 0.6449062
DOSF and MITMF 0.5024086
DOSF PAR 0.7097519
MITMF 0.6507042
DOSF and MITMF 0.4829878
TABLE 7.6 Probability of DoS and MitM Attacks to the Field Devices
Level Given the Evidence of Attacks and That There Were No Phishing
and Malware Injection Attacks to the Management Level
ATTACK EVIDENCE 1 EVIDENCE 2 PROBABILITY
DOSF No PHCP MICP 0.5625418
MITMF 0.4474423
DOSF and MITMF 0.3165204
DOSF PHCP No MICP 0.5468065
MITMF 0.4323725
DOSF and MITMF 0.3036255
DOSF PHCP MICP 0.7013511
MITMF 0.5856344
DOSF and MITMF 0.4435267
REFERENCES
DOI: 10.1201/9781003187158-10
Contents
8.1 Introduction
8.2 Security Threats
8.3 Honeypot Defense
8.4 Poisoned Data Defense
8.5 Mixup Inference Against Adversarial Attacks
8.6 Cyber-Physical Techniques
8.7 Information Fusion Defense
8.8 Conclusions and Future Directions
References
8.1 INTRODUCTION
Recent research into machine learning algorithms has shown that a potential
attack vector is the training dataset used for the machine learning system, a
so-called poisoned data attack. This compromise of the integrity of a
machine learning system early into its operational lifetime can have
deleterious effects, especially if the malicious alterations are subtle and
difficult for human operators to detect. For example, a machine learning
system trained to diagnose patient data could report more false positives
than expected, reducing confidence in the system as well as worsening
patient outcomes.
In their paper on the defensive techniques against poisoned data,
Mozaffari-Kermani et al. presented general algorithm-independent
approaches to poisoning attacks in healthcare systems and countermeasures to those generic attacks [5]. Since this chapter
focuses on defensive measures against machine-learning attacks, the
authors can only cover the generic attacks briefly.
The attack model for this poisoned data assumes that potential attackers
have some knowledge of the training dataset, either through it being
publicly available (population health data) or by eavesdropping on network
traffic to gather specific data. These potential attackers’ motivations may
vary, but the general result of these attacks is the worsening model
performance during particular classifications (denoted as the attacked
class). For example, an attacker may wish to misdiagnose hypothyroidism
as being normal for the patient: Once the initial misdiagnosis is made, either
scenario is advantageous for a malicious attacker: (1) the diagnosis is
corrected, but confidence in the ML system is reduced, or (2) the diagnosis
is not corrected, and the patient suffers the health effects. For an adversarial
attacker to accomplish this goal, it must add enough poisoned data to the
training set that the machine learning system begins to identify
perturbations incorrectly, mistaking the maliciously placed signals as actual
identifiers of health conditions.
The countermeasure proposed by Mozaffari-Kermani et al. is the
periodic construction and evaluation of a separate model using the training
dataset, with threshold alarms being triggered if the accuracy statistics of
the model changes significantly between constructions. By tracking the rate
of correct classifications and Kappa statistic, perturbations in the model can
be measured over time, and the timeframe of when malicious data entered
the training set can be identified. By using both statistics, an alarm can be
triggered even when malicious data does not directly decrease the
classification ratio.
Information fusion (or data fusion) (Figure 8.4) integrates multiple distinct
data sources to produce more helpful information than a single source can
provide [10]. Depending on where this fusion takes place and the
complexity of the data used, the fusion process can be placed on a
continuum, with the low end corresponding to simple combinations of raw
data and the high end corresponding to complex fusions of highly processed
data streams. For machine learning defense, this concept of information
fusion can be applied as a network-level mechanism for detecting and
mitigating potential attacks against IoT devices, as demonstrated in the
2014 paper by Chen et al. [11].
FIGURE 8.4 Network model of IoT fusion.
REFERENCES
DOI: 10.1201/9781003187158-11
Contents
9.1 Introduction
9.2 Categories of Defenses
9.2.1 Modified Training or Modified Input
9.2.2 Modifying Networks Architecture
9.2.3 Network Add-on
9.4 Discussions and Open Issues
9.5 Conclusions
References
9.1 INTRODUCTION
In recent years, technologies like machine learning and deep learning (DL)
neural networks [1] have been widely used in real-world tasks such as
image classification, speech recognition, autonomous driving, spam
filtering, and intelligent antifraud on image-like data. As a result, DL
systems are becoming commonplace and ubiquitous in our lives, which will
open up new possibilities for adversaries to carry out attacks. Attackers
attempt to achieve adversarial goals by altering the model’s input features in
various ways to bypass the detection of machine learning models in realistic
tasks or by attacking the models directly to compromise their integrity.
For example, the initial adversarial attack is intended to construct an
adversarial example by adding finely designed noise to the normal example
that is not perceptible to humans, so as to drive the machine learning or DL
model to misjudge the carefully constructed adversarial example without
interfering with human cognition, as shown in Figure 9.1 [2]. Here the
original image is recognized as “panda” by the image classification model
with 57.7% confidence, while the adversarial image obtained after adding
subtle perturbations is incorrectly recognized as “gibbon” with 99.3%
confidence, yet the adversarial image is still recognized as a giant panda
normally for a human.
White-box attack: A white-box attack assumes that the attacker can fully obtain information such as the structure and parameters of the target model. Therefore, during the attack, the attacker can use the complete information of the model to compute the gradient of the target model and guide the generation of adversarial examples.
Black-box attack: Unlike white-box attacks, black-box attacks assume that the attacker neither knows the training data and model structure nor can obtain the specific parameters of the model; the attacker can only observe the final decision result of the learning model. In this case, the attacker can only probe the sensitivity of the target model or estimate its gradient information by manipulating the model's input and using the final decision result to guide the generation of adversarial samples. Therefore, compared with a white-box attack, a black-box attack uses less information and is more difficult to implement.
1. Decreasing the depth of each pixel: Each image pixel channel has a depth of 2^8 (256 values). It has been found that using a compressed depth of 2^i to represent the image does not affect the neural network's judgment of the image, and it effectively reduces the possibility of being attacked; that is, the effect of the noise added to the adversarial sample is effectively reduced.
2. Reduction in the spatial dimension: In [5], median smoothing was given as an example; it is analogous to pooling in convolutional neural networks. It takes the median of an n×n filter window and replaces the value of the original region. Because the values of the pixels in neighboring regions are strongly correlated, smoothing the whole map does not affect the final result but removes the noise added to it (see the sketch after this list).
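A minimal sketch of these two squeezing operations on a grayscale image array; the bit depth and filter size are illustrative choices.

import numpy as np

def reduce_bit_depth(image, bits):
    # Quantize pixel values in [0, 1] to 2**bits levels.
    levels = 2 ** bits - 1
    return np.round(image * levels) / levels

def median_smooth(image, k=3):
    # Replace each pixel by the median of its k x k neighborhood (edge-padded).
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

img = np.random.rand(8, 8)                               # hypothetical grayscale image
squeezed = median_smooth(reduce_bit_depth(img, bits=2))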
Data Augmentation
Data augmentation is typically used to counter black-box attacks. It extends
the original training set using generated adversarial examples and attempts
to allow the model to see more data during training.
As an example, in [7], the set of adversarial text examples is expanded by training on a reading-comprehension task with a generative adversarial network (GAN) algorithm [8]. They proposed two approaches to generate more diverse data features: (1) knowledge-based, i.e., replacing words with synonym/negative pairs from an online database; (2) neural network-
based, i.e., using a seq2seq model to generate hypotheses containing
examples and measuring the cross-entropy between the original and
generated hypotheses by performing a loss function. In the training process,
they used the GAN algorithm to train the discriminator and generator and
merged the optimization steps of the adversarial exemplar discriminator in
it.
Wang et al. [9] enhanced the complexity of the database, thus improving
the resistance of the model. Moreover, its complexity conversion formula
can update itself with the continuous detection of adversarial samples. Lee
et al. [10] proposed the idea that a data augmentation module can be
incorporated to simulate adversarial attacks when training DL models,
which uses an uncertainty generator to train DL models. The generator can
generate FGSM-like perturbations. After passing through the classifier of
DL, it can yield either normal or interference results. The DL network
learned autonomously by this data augmentation module has greater
robustness against larger adversarial attacks based on FGSM perturbations.
Model Regularization
Model regularization uses the generated adversarial examples as a regularizer in the training objective.
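One common instantiation of this idea, given here only as an assumed example of adversarial training used as a regularizer (the chapter's exact objective may differ), is
J_reg(θ, x, y) = J(θ, x, y) + λ · J(θ, x_adv, y),
where J is the original training loss, x_adv is an adversarial example generated from x, and λ weights the regularization term.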
MagNet Model
Meng et al. [15] conceived that, for most AI tasks, the sample space is a
high-dimensional space but that the effective samples we need are actually
in a manifold space with much lower dimensionality than the original
sample space.
The authors implemented a two-pronged defense system, proposing the
corresponding detector and reformer methods for two reasons: (1) Detector
is used to determine whether a sample is far away from the manifold
boundary. For example, if the task is to classify handwritten digital images,
all other images that do not contain numbers are adversarial samples that
can lead to misclassification because the classifier of the target DL model
has to classify that image. Input samples belonging to this class will be
removed directly. (2) The adversarial sample may be very close to the manifold boundary for that task. If the generalization of the classification method is not good at that point, misclassification can also occur. Efforts are therefore made to find a sample that is close to or on the manifold for that task and then hand it over to the classifier. This is the second defense method
conceived by the authors. Figure 9.3 demonstrates the workflow of
MagNet.
1. Reconstruction-error-based detection: The detector measures the reconstruction error E(x) = ‖x − AE(x)‖ of an autoencoder AE that is trained on the normal samples X_train of the training set. The authors argue that, if the sample to be tested is a normal sample, its reconstruction error is small, because the sample has the same generation process as the training data of the autoencoder, and that, conversely, if the sample under detection is an adversarial sample, the reconstruction error of that sample will be large.
2. Probabilistic divergence-based detection: The authors were able to
find the adversarial input samples by comparing the results of two
different sets of input data that went through the softmax layer output.
They used the Jensen–Shannon divergence to measure the degree of disagreement between the results of the two sets of data.
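A minimal sketch of reconstruction-error-based detection; the stand-in autoencoder, norm, and threshold below are illustrative assumptions rather than MagNet's trained models.

import numpy as np

def reconstruction_error(x, autoencoder):
    # E(x) = || x - AE(x) || with the L2 norm.
    return np.linalg.norm(x - autoencoder(x))

def is_adversarial(x, autoencoder, threshold):
    # Flag the input if its reconstruction error exceeds a threshold chosen on normal validation data.
    return reconstruction_error(x, autoencoder) > threshold

toy_autoencoder = lambda x: 0.95 * x   # hypothetical stand-in for a trained autoencoder
x_normal = np.full(10, 0.1)
x_suspect = np.full(10, 5.0)
print(is_adversarial(x_normal, toy_autoencoder, threshold=0.1),
      is_adversarial(x_suspect, toy_autoencoder, threshold=0.1))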
The ideal reformer needs to satisfy the following two conditions: (1) It
should not change the classification structure of the normal samples; (2) it
should reconstruct the adversarial samples sufficiently so that the
reconstructed samples are close to the normal samples. The authors trained
the autoencoder to minimize the reconstruction errors on the training set
and to ensure that it is well adapted to the validation set. Afterward, when
given a normal input, the autoencoder outputs a very similar example. However, when given an adversarial example, the autoencoder should output an example close to the manifold of normal samples that approximates the original, unperturbed one. In this way, MagNet can improve robustness against adversarial examples while maintaining classification accuracy on normal inputs.
9.5 CONCLUSIONS
With the further development of deep learning research and the widespread
application of DL technologies in real-world scenarios, deep learning
security and privacy have become a new and promising research area,
which attracts a large number of scholars from academia and industry to
conduct in-depth research and achieve many remarkable research results.
However, so far, DL security and privacy research is still in its infancy, and
many critical scientific issues remain to be solved.
This chapter summarized adversarial defense strategies against the DL
model. First, we introduced the targets of different adversarial attacks, for
which we grouped all the defense strategies into three categories: modified
training or modified input, modifying networks architecture, and network
add-on. In the different categories, we also illustrated in detail the different
defense algorithms, each of which has different advantages. However, the
study of the adversarial defense of DL is still in its infancy, and we still face
two major difficulties: how to combat white-box attacks and the
pervasiveness of adversarial defense. We have no effective algorithms to
defend against potentially harmful white-box attacks.
REFERENCES
DOI: 10.1201/9781003187158-12
Contents
10.1 Introduction
10.2 Background
10.2.1 Model-free RL
10.2.2 Deep Reinforcement Learning
10.2.3 Security of DRL
10.3 Certificated Verification for Adversarial Examples
10.3.1 Robustness Certification
10.3.2 System Architecture
10.3.3 Experimental Results
10.4 Robustness on Adversarial State Observations
10.4.1 State-adversarial DRL for Deterministic Policies: DDPG
10.4.2 State-adversarial DRL for Q-Learning: DQN
10.4.3 Experimental Results
10.5 Conclusion and Challenges
Acknowledgment
References
10.1 INTRODUCTION
Machine learning (ML) can be divided into three categories: supervised,
unsupervised, and reinforcement learning. In supervised learning (e.g., SVM, decision trees, etc.), the training samples are labeled for finding the boundary among different classes. On the other hand, if the training dataset does not have ground-truth labels, the ML algorithm is called unsupervised learning, such as K-means clustering; these methods group the training data with respect to distances in the feature space. Reinforcement learning (RL) uses the
intelligent agent to learn from the history states/actions and can use the
maximum cumulative reward to generate actions and interact with the
environment in real time.
RL has been used for many decision-making problems in robotics,
electric grids, sensor networks, and many other real-world applications.
However, the RL algorithm may take a long time to converge to the optimal
point (from a few hours to a few weeks depending on the state space size)
[1]. Recently, a new algorithm combining deep learning (DL) with RL,
called deep reinforcement learning (DRL), has been applied to handle high-
dimensional inputs, such as camera images and large state vectors. DRL has shown great results in many complex decision-making processes, such as robotics [2], autonomous vehicles, smart cities, and games such as Go [3],
etc. DRL has shown its capabilities in dealing with critical real-world
applications. An autonomous racing car, which is called AWS DeepRacer,
has been designed by Amazon using DRL. It uses cameras to visualize the
track and a DRL model to make decisions on throttle and direction.
Therefore, the security and privacy of DRL need to be fully investigated
before deploying DRL in critical real-world systems. Recently, DRL has been shown to be vulnerable to adversarial attacks [4]. Attackers insert perturbations into the input of the DRL model and cause DRL decision errors.
DRL utilizes a deep neural network (DNN) model to achieve high
prediction, but it is not robust against input perturbations. Even a small
change in the input may lead to dramatic oscillations in the output. This
makes it necessary to have a comprehensive understanding of the types and
features of adversarial attacks.
This chapter aims to introduce the security issues in DRL and current
defensive methods to overcome adversary attacks. We discuss DRL’s
structure, its security problems, and the existing attack targets and
objectives in DRL.
Organization of this chapter: Overviews of RL and DRL, as well as the
privacy problems associated with them, are presented in section 10.2.
Section 10.3 talks about a proposed defensive method to achieve a robust
DRL against adversarial attacks. Section 10.4 is an extended version of
section 10.3. It presents an upgraded DRL mechanism by finding a robust
policy during the learning process. We conclude this chapter and discuss
research challenges in designing a privacy-preserved and robust DRL in
section 10.5.
10.2 BACKGROUND
State: An agent interacts with and learns about states. At each time step, the
agent interacts with the state space governed by the policy and obtains
a reward that determines the quality of the action.
Policy: A policy defines how the agent will act in an environment. It maps
the present state to an action. Policies are divided into two types:
deterministic policy and stochastic policy. When the action taken by
the agent is deterministic, the policy is deterministic; if the action
follows a probability distribution function for different states, the
policy is stochastic.
On-policy algorithm evaluates the exploration value.
Off-policy algorithm does not evaluate the exploration value. Q-
learning is an off-policy method.
Action is a behavior performed by the agent for interactions with
environment.
Reward: The agent receives a reward after taking an action. The
goal of the agent is to maximize the accumulative reward.
Value function: The maximum expected accumulative reward an agent
can get at a specific state. It can be calculated by V(s) = E[G_t | s_t = s], where G_t = Σ_{k=0}^{∞} γ^k r_{t+k} is the discounted sum of rewards from state s to the final point and γ ∈ [0, 1] is a discount factor.
Exploration and exploitation: Exploration is the process by which the
agent tries to explore the environment by taking different actions at a
given state. Exploitation occurs after exploration. The agent exploits
the optimal actions to get the highest accumulated reward.
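As a small illustration of this trade-off, the ε-greedy rule below explores a random action with probability ε and otherwise exploits the best known action; the Q-table and ε value are hypothetical.

import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    # With probability epsilon explore a random action; otherwise exploit argmax_a Q(state, a).
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

Q = {("s0", "left"): 0.2, ("s0", "right"): 0.7}   # hypothetical Q-values
print(epsilon_greedy(Q, "s0", ["left", "right"]))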
10.2.1 Model-free RL
Model-free RL focuses on calculating the value function directly from the
interactions with the environment. It can be further divided into two types:
policy optimizing and Q-learning.
1. Attacks targeting the reward: In 2018, Han et al. [9] evaluated the
reaction of DRL in a software-defined network to adversary attacks.
The adversary knows the structure and parameters of the model. By
flipping the rewards and manipulating states, the attacker is able to
mislead the intelligent node into choosing a suboptimal action. Pattanaik et al. [10] proposed three attack methods that add perturbations to the observations. The evaluation results showed that the third attack, which uses stochastic gradient descent to craft the cost function, can
mislead the DRL agent to end up in a predefined adversarial state [1].
These adversaries are assumed to have the knowledge of the DRL, but
Huang et al. [11] proved that even when the adversary has no
information of the model, it can still fool the agent to perform a
desired action.
2. Attacks targeting the policy: Tretschk et al. [12] added an adversarial
transformer network (ATN) to impose adversarial reward on the policy
network for DRL. The ATN makes the cumulative reward reach the
maximum if the agent follows a sequence of adversarial inputs. Thus
the DRL automatically follows the adversary’s policy. But this method
requires the attacker to have the complete information of the agent and
the target environment (called a white-box attack).
3. Attacks targeting the observation: The adversary can also directly
manipulate the sensory data. Real-time attacks on a robotic system in a dynamic environment have already been evaluated.
Clark et al. [13] used a white-box adversarial attack on the DRL policy
of an autonomous robot in a dynamic environment. The robot is
intended to find a route in a maze while the adversary’s goal is to
mislead robot to the wrong routes. The adversary tampers with sensory
data of the robot, so that the robot is deviated from the optimal route.
Clark et al. [13] also observed that if the adversarial input is removed,
the robot automatically corrects its actions and finds the correct route.
Therefore, an adversary can tamper with the sensory data temporarily
and leave behind very little evidence. Recently, Tu et al. [14] showed
that the motion sensing and actuation systems in robotic VR/AR
applications can be manipulated in real-time noninvasive attacks.
4. Attacks targeting the environment: The final way of performing an
adversarial attack on DRL is compromising the environment. The core
idea of an environment attack is placing additional obstacles to confuse the robot. Bai et al. [15] proposed a method of creating
adversarial examples for deep Q-network (DQN), which is trained for
pathfinding. The DQN model already has a complete solution to find
an optimal path. The adversary extracts the model parameters,
analyzes them, and finds the weakness of the Q-value. The adversary
adds adversarial environmental examples to the robot. This method
generates a successful attack and stops the robot from finding an
optimal route in a maze.
The matrix A contains the neural network weights and ReLU activations for each layer, with an identity layer as the last layer. By following this policy, the intelligent agent selects the most robust action under the worst-case perturbation.
Zhang et al. [19] also investigated the robust DRL algorithm under
adversarial perturbation or noisy environment conditions (Figure 10.4).
Zhang et al. [19] further demonstrated that the naïve adversarial training
methods cannot improve DRL robustness significantly in noisy environments but instead make DRL training unstable and deteriorate agent
performance. They concluded that naïvely applying techniques from
supervised learning to RL is not appropriate since RL and supervised
learning are two quite different problems.
FIGURE 10.4 Reinforcement learning with perturbation from
observation.
where ‖·‖ denotes the second (ℓ2) norm. For each state s, the agent needs to solve a maximization problem, which can be solved using convex relaxations. This smoothing procedure can be done at test time. During training time, Zhang et al.'s goal is to keep this maximal deviation small.
Zhang et al. [19] pointed out that, unlike DDPG, the regularized reward sum is similar to the robustness objective of classification tasks if the chosen action is treated as the “correct” label. The maximization can be solved using projected gradient
descent (PGD) or convex relaxation of neural networks [19].
ACKNOWLEDGMENT
REFERENCES
DOI: 10.1201/9781003187158-13
Contents
11.1 Introduction
11.2 Support Vector Machine (Svm) Under Evasion Attacks
11.2.1 Adversary Model
11.2.2 Attack Scenarios
11.2.3 Attack Strategy
11.3 Svm under Causality Availability Attack
11.4 Adversarial Label Contamination on Svm
11.4.1 Random Label Flips
11.4.2 Adversarial Label Flips
11.5 Conclusions
References
11.1 INTRODUCTION
Attacks against learning methods can be carried out either during the
training or testing stage.
In [28], the authors proposed a model for the analysis of label noise in
support vector learning and developed a modification of the SVM
formulation that indirectly compensates for the noise. The model is based
on a simple assumption that any label may be flipped with a fixed
probability. They demonstrated that the noise can be compensated for by
correcting the kernel matrix of SVM with a specially structured matrix,
which depends on the noise parameters. They adopt two different strategies
for contaminating the training sets through label flipping: random or
adversarial label flips.
11.5 CONCLUSIONS
REFERENCES
DOI: 10.1201/9781003187158-14
Contents
12.1 Introduction
12.2 Data Security and Federated Learning
12.3 Federated Learning Context
12.3.1 Type of Federation
12.3.1.1 Model-centric Federated Learning
12.3.1.2 Data-centric Federated Learning
12.3.2 Techniques
12.3.2.1 Horizontal Federated Learning
12.3.2.2 Vertical Federated Learning
12.4 Challenges
12.4.1 Trade-off Between Efficiency and Privacy
12.4.2 Communication Bottlenecks [1]
12.4.3 Poisoning [2]
12.5 Opportunities
12.5.1 Leveraging Blockchain
12.6 Use Case: Leveraging Privacy, Integrity, and Availability for Data-
Centric Federated Learning Using a Blockchain-based Approach
12.6.1 Results
12.7 Conclusion
References
12.1 INTRODUCTION
Data security [5] is defined as the process of securing and protecting private and public sector data and preventing data loss, data tampering,
and unauthorized access. This represents the CIA triad, which consists of
data confidentiality, integrity, and availability.
Seven principles of the GDPR specify the rules that each company or
individual needs to follow in order to deal with customers’ personal data.
Following [6], the principles are as follows:
Consumers have the right to access all the data that an organization
collects about them.
Consumers can choose to not have their information sold to third
parties.
Consumers can demand that an organization delete their personal data.
Consumers have the right to know to whom their data has been sold.
Consumers have the right to know the reason for the data collection.
Consumers can take legal action without proof of damages if they are
subjected to a breach of privacy.
12.3.2 Techniques
In federated learning, understanding your data, how it is split, and its
technical and practical challenges is really important in defining the terms
of implementation.
12.4 CHALLENGES
In this section we discuss the current issues and challenges that federated
learning is facing and that must be addressed in order to obtain higher efficiency [12].
12.5 OPPORTUNITIES
In our case, the model and the data are stored in a decentralized
environment that is the public blockchain, not in a centralized server that is
accessible by all network participants. Those same participants provide the
data to the model in a collaborative approach, making the data available for
model owners. The personal data is fully controlled by its owners, i.e., they
can control the amount of data they want to provide. With every input of
data, we have a data aggregation, but before any update to the model, the
data needs to be verified. To ensure the integrity of the data, we implement
an incentive mechanism to encourage contributors and to oblige each user
to insert a deposit in order to add data to train the model, making it costly
for malicious users to disturb model efficiency. The process starts by the
model owner providing a deposit, reward, and time-out function, thus
establishing the foundation of our incentive mechanism. The deposit
corresponds to the monetary amount that a user needs to make in order to
have access to the model smart contract, so that it can add data to train the
model inside the blockchain. The reward represents the amount that a user
receives for checking the integrity of the data and restoring it. The time-out function defines the time between the return of a deposit and the validation of
the owner. When the approved data is sent to the model for training, the
model is updated; for each beneficial deposit, a refund is made, and for each
beneficial verification of the inputs, a reward is offered. No outside access to the contributor's data is necessary, since the data will only be used to train the model within the smart contract. Those features will provide an enormous
improvement in terms of data security and machine learning.
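A simplified sketch of the deposit/reward accounting described above; it is a plain Python simulation with hypothetical amounts, not an actual smart contract.

class IncentiveLedger:
    # Toy simulation of the deposit/refund/reward logic for data contributors.
    def __init__(self, deposit_amount, reward_amount):
        self.deposit_amount = deposit_amount
        self.reward_amount = reward_amount
        self.balances = {}

    def contribute(self, user, data_is_valid):
        # Every contribution requires a deposit before the data can reach the model.
        self.balances[user] = self.balances.get(user, 0.0) - self.deposit_amount
        if data_is_valid:
            # Verified data: the deposit is refunded and the contribution is used for training.
            self.balances[user] += self.deposit_amount
        # Invalid data: no refund, so poisoning the model becomes costly.

    def reward_verifier(self, user):
        # Users who check the integrity of the data receive a reward.
        self.balances[user] = self.balances.get(user, 0.0) + self.reward_amount

ledger = IncentiveLedger(deposit_amount=1.0, reward_amount=0.2)   # hypothetical amounts
ledger.contribute("malicious_user", data_is_valid=False)
ledger.contribute("honest_user", data_is_valid=True)
ledger.reward_verifier("honest_user")
print(ledger.balances)  # the malicious balance falls while the honest balance rises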
12.6.1 Results
We explored in this approach different security aspects with respect to data
regulation policies while at the same time enhancing the efficiency of AI
models. Using federated learning, we are able to create a model for training
while attending to privacy concerns for personal data, and with blockchain,
we provided contributors with full anonymity, control, and access in
providing the data that they want the model to assess, while having full
transparency of the training process. Those features lead to the growth of
the training data set by an average of 500 additional data points per day.
Also, with the use of the incentive mechanism, we are able to avoid malicious contributions, since the data is verified by the contributors before the training phase. Since we are using a deposit/refund mechanism, each input requires a monetary deposit. After a period of time, we saw that the account balances of malicious users went all the way to zero, because their data was verified and deleted, which meant no refund for the contributor. Simultaneously, the account balances of good contributors went up, since they were rewarded for their verifications. This mechanism helps avoid the
poisoning problem described previously. As for availability, the
decentralized infrastructure of the blockchain helps in reaching countless
users, who can provide their data in a secure and private environment.
Despite all those security features, the accuracy and the efficiency of the
model increased by 5%. This is because we do not add noise to the data to enhance privacy, as is done in differential privacy. Thus the model is trained with quality data received from contributors.
12.7 CONCLUSION
In this chapter, we have shown that, despite new data regulation and
policies, we are still able to provide quality and secure model training for
the AI model, while complying with regulatory rules. And this is possible
due to federated learning and blockchain technology. As we saw in our use
case, where we merged federated learning and blockchain, multiple data
security attributes were addressed while enhancing an AI model. This
combination offers a specific approach to contribute to the CIA triad; i.e.,
by using federated learning, we manage to provide privacy and
confidentiality to the data during training. At the same time, the blockchain
offers traceability, integrity, and transparency. Those attributes also address the GDPR principles and CCPA regulations.
REFERENCES
DOI: 10.1201/9781003187158-16
Contents
13.1 Introduction
13.2 Is Artificial Intelligence Enough to Stop Cyber Crime?
13.3 Corporations’ Use of Machine Learning to Strengthen Their Cyber
Security Systems
13.4 Cyber Attack/Cyber Security Threats and Attacks
13.4.1 Malware
13.4.2 Data Breach
13.4.3 Structured Query Language Injection (SQL-i)
13.4.4 Cross-site Scripting (XSS)
13.4.5 Denial-of-service (DOS) Attack
13.4.6 Insider Threats
13.4.7 Birthday Attack
13.4.8 Network Intrusions
13.4.9 Impersonation Attacks
13.4.10 DDoS Attacks Detection on Online Systems
13.5 Different Machine Learning Techniques in Cyber Security
13.5.1 Support Vector Machine (SVM)
13.5.2 K-nearest Neighbor (KNN)
13.5.3 Naïve Bayes
13.5.4 Decision Tree
13.5.5 Random Forest (RF)
13.5.6 Multilayer Perceptron (MLP)
13.6 Application of Machine Learning
13.6.1 ML in Aviation Industry
13.6.2 Cyber ML Under Cyber Security Monitoring
13.6.3 Battery Energy Storage System (BESS) Cyber Attack
Mitigation
13.6.4 Energy-based Cyber Attack Detection in Large-Scale Smart
Grids
13.6.5 IDS for Internet of Vehicles (IoV)
13.7 Deep Learning Techniques in Cyber Security
13.7.1 Deep Auto-encoder
13.7.2 Convolutional Neural Networks (CNN)
13.7.3 Recurrent Neural Networks (RNNs)
13.7.4 Deep Neural Networks (DNNs)
13.7.5 Generative Adversarial Networks (GANs)
13.7.6 Restricted Boltzmann Machine (RBM)
13.7.7 Deep Belief Network (DBN)
13.8 Applications of Deep Learning in Cyber Security
13.8.1 Keystroke Analysis
13.8.2 Secure Communication in IoT
13.8.3 Botnet Detection
13.8.4 Intrusion Detection and Prevention Systems (IDS/IPS)
13.8.5 Malware Detection in Android
13.8.6 Cyber Security Datasets
13.8.7 Evaluation Metrics
13.9 Conclusion
References
13.1 INTRODUCTION
The facilities of the internet and its ease of use have created a platform for
the movement of data from anywhere to everywhere. Huge volumes of data are accumulated from the various data sources and grow day by day, even second by second. Protecting all this data from malicious users in the real world, where an attack is possible at every corner of the network, is the main focal point. Cyber security has a significant role in this context of maintaining security. Brute-force attacks, credential stuffing, phishing, malware attacks, botnets, denial-of-service, instant messaging abuse, worms,
Trojans, intellectual property theft, rootkit, password sniffing, etc. are just
some of the attacks commonly faced by any organization or by users of the
internet. To detect and prevent these kinds of attacks, to assist analysts in
dealing with them, and to prevent the abuse of data in the modern era,
artificial intelligence categories like machine learning and deep learning
play a vital role. With the use of machine learning techniques, patterns are
developed and applied to algorithms to preemptively prevent various
unforeseen attacks and bolster the security system. To generate these
patterns, relevant, potentially rich-quality data is essential. Models
developed using machine learning techniques automate computers and
assist analysts in preserving valuable data without the explicit presence of
the analysts. In any kind of attack, the very first thing is to know about the
goal of the attack [2], which may be under any of the categories shown in
Figure 13.1. A second important dimension is to deflate an adversary’s
ability to execute malicious activities.
FIGURE 13.1 Types of attack.
In today's emerging technologies, data all over the world keeps growing, and squeezing out its essence is the major challenge; this squeezing can also reveal threats. In this respect, the use of machine learning improves cyber security by predicting the future with the support of human expertise.
13.4.1 Malware
Intruders have developed different kinds of software to destroy computer
systems, devices, and networks in order to steal sensitive information,
greatly disrupting users. The modalities of malware come in the form of
email attachments, infected files, file-sharing software, etc. Some of the
variants of malware include botnets, crypto jacking, virus, worms,
malvertising, ransomware, Trojans, and spyware, to name a few.
Botnets are networks of infected computers that mainly cause denial of
service attacks. Crypto jacking is a kind of malicious software that mines
for cryptocurrencies through malicious links in an email. These are
financially motivated, may finally overload the affected system, and can
lead to physical damage.
Viruses are executable files that attach themselves to clean code or to
an automated process and spread across the network very quickly, inflicting
great damage on the functionalities of the system by corrupting and locking
them. Worm malware is a type of self-replicating attack that may or may not be attached to existing files or programs; it starts from one system but spreads through the entire network very quickly.
A seemingly legitimate advertisement that turns into an attack is
malvertising. Here, an advertisement in a legitimate network and website is
used to spread the malware. The ransomware/scareware malware blocks or
places a hold on some or all of the computer system until a certain amount
of money is paid to the attackers. This creates an expensive loss to the
organization since they cannot access their system.
The Trojan malware acts as legitimate software and creates backdoors
for other malware to attack easily. Spyware is a malware that hides in the
background of a system, by which attackers obtain sensitive information
like credit card numbers, passwords etc. from the infected systems.
Here n denotes the total number of features, and x_i and y_i specify the ith feature values of instances x and y. The objective of any
intrusion detection system is the quick detection of intrusion, mitigating the
false alarm and providing fast activation of the protective system [5, 6, 7].
But the problem with KNN is that it is a lazy learner and requires high computational time. To overcome this, fast KNN is implemented to provide lower computational time with better accuracy.
The naïve Bayes algorithm calculates and saves the probabilities of theft
by providing certain attributes during the training phase. This is done for
each attribute, and the time it takes to calculate the necessary probability for
each attribute is recorded. The time it takes to calculate the probability of a
given class for each sample in the worst scenario is proportional to n, the
number of attributes, during the testing phase. In the worst-case scenario,
the testing phase takes the same amount of time as the training phase.
The user also specifies whether the data is to be handled with a multiclass algorithm or a one-class classification algorithm. A multiclass algorithm makes use of anomalous and nonanomalous data in all sets, whereas in a one-class algorithm anomalous data is used only in the testing set. For designing the model and
for performance evaluation, the machine learning algorithm is performed
using the collected data. This model, as shown in Figure 13.9, is used for
identifying the network status [27].
13.6.3 Battery Energy Storage System (BESS) Cyber Attack
Mitigation
For obtaining sufficient detection accuracy, a dataset of appropriate quality and size is essential. A battery management system (BMS) is used
for managing BESS. For preventing the attack against a battery’s state of
charge, predictive approaches can be applied. A state-of-the-art method for detecting false data injection attacks (FDIA) against the electric grid relies on the state-estimation forecast together with the sensing data. If there is a discrepancy between the estimation and the measurements, the residual surpasses the given
threshold. For residual-based method implementation, the measurement
prediction is necessary due to the constant data exchange between the BESS
and the electric grid; cyber attacks influence the integrity of commands that
the BESS receives, as depicted in Figure 13.10. Distributed methods have
been used in decentralized systems for detecting cyber attacks. The main
objective of the technique is to form a residual signal and then evaluate a residual assessment function against the predefined threshold.
FIGURE 13.10 BESS cyber threat.
Once the attack is detected, its impact on the BESS operation has to be
removed. An unobserved FDIA might threaten the historical data used for training purposes and corrupt the prediction. For that reason, we use pseudo-measurements to fill the gap generated by a cyber attack. Pseudo-measurement generation and SE forecast are the two major
mitigation schemes [28].
13.6.4 Energy-based Cyber Attack Detection in Large-Scale
Smart Grids
Dynamic Bayesian networks (DBN) are a probabilistic graphical model
used to simplify the structure. A directed acyclic graph (DAG), mutual information (MI) for obtaining the features, and a restricted Boltzmann machine (RBM), which trains on the data and learns behavioral patterns with the help of an unsupervised DBN model, are used in the process of cyber attack detection depicted in Figure 13.11.
Each node represents a feature from the dataset that is multiplied with
the weight and added with the bias passed through the activation function,
giving the output of the particular node. Also, multiple inputs can be
combined and given to one particular neuron by multiplying the multiple
input with its weight, summing it up and adding the bias, and then sending
it through the activation function to generate the output. Training makes it
possible to produce a compact depiction of input. RBM helps in
differentiating normal and anomalous traffic.
Here TP (true positives) represents the data points rightly predicted as attacks, TN (true negatives) represents the data points correctly classified as normal, FN (false negatives) are attacks wrongly classified as normal, and FP (false positives) represents normal data points wrongly flagged as attacks.
The different evaluation metrics are as follows:
Precision refers to the fraction of rightly identified threats among all the samples said to be attacks: Precision = TP / (TP + FP).
Recall specifies the fraction of actual attacks rightly classified as attacks: Recall = TP / (TP + FN).
The F-measure summarizes accuracy by considering both precision and recall: F1 = 2 × Precision × Recall / (Precision + Recall).
Accuracy is the ratio of correctly classified data points to the total number of data points: Accuracy = (TP + TN) / (TP + TN + FP + FN).
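A minimal sketch that computes these metrics from hypothetical confusion-matrix counts:

def metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f_measure, accuracy

print(metrics(tp=80, tn=90, fp=10, fn=20))   # hypothetical counts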
13.9 CONCLUSION
REFERENCES
Gabriel Kabanda
Zimbabwe Academy of Sciences, University of Zimbabwe, Harare,
Zimbabwe
DOI: 10.1201/9781003187158-17
Contents
14.1 Introduction
14.1.1 Background on Cyber Security and Machine Learning
14.1.2 Background Perspectives to Big Data Analytics and Cyber
Security
14.1.3 Supervised Learning Algorithms
14.1.4 Statement of the Problem
14.1.5 Purpose of Study
14.1.6 Research Objectives
14.1.7 Research Questions
14.2 Literature Review
14.2.1 Overview
14.2.2 Classical Machine Learning (CML)
14.2.2.1 Logistic Regression (LR)
14.2.2.2 Naïve Bayes (NB)
14.2.2.3 Decision Tree (DT)
14.2.2.4 K-nearest Neighbor (KNN)
14.2.2.5 AdaBoost (AB)
14.2.2.6 Random Forest (RF)
14.2.2.7 Support Vector Machine (SVM)
14.2.3 Modern Machine Learning
14.2.3.1 Deep Neural Network (DNN)
14.2.3.2 Future of AI in the Fight against Cyber Crimes
14.2.4 Big Data Analytics and Cyber Security
14.2.4.1 Big Data Analytics Issues
14.2.4.2 Independent Variable: Big Data Analytics
14.2.4.3 Intermediating Variables
14.2.4.4 Conceptual Framework
14.2.4.5 Theoretical Framework
14.2.4.6 Big Data Analytics Application to Cyber Security
14.2.4.7 Big Data Analytics and Cyber Security Limitations
14.2.4.8 Limitations
14.2.5 Advances in Cloud Computing
14.2.5.1 Explaining Cloud Computing and How It Has
Evolved to Date
14.2.6 Cloud Characteristics
14.2.7 Cloud Computing Service Models
14.2.7.1 Software as a Service (SaaS)
14.2.7.2 Platform as a Service (PaaS)
14.2.7.3 Infrastructure as a Service (IaaS)
14.2.8 Cloud Deployment Models
14.2.8.1 Private Cloud
14.2.8.2 Public Cloud
14.2.8.3 Hybrid Cloud
14.2.8.4 Community Cloud
14.2.8.5 Advantages and Disadvantages of Cloud
Computing
14.2.8.6 Six Main Characteristics of Cloud Computing and
How They Are Leveraged
14.2.8.7 Some Advantages of Network Function
Virtualization
14.2.8.8 Virtualization and Containerization Compared and
Contrasted
14.3 Research Methodology
14.3.1 Presentation of the Methodology
14.3.1.1 Research Approach and Philosophy
14.3.1.2 Research Design and Methods
14.3.2 Population and Sampling
14.3.2.1 Population
14.3.2.2 Sample
14.3.3 Sources and Types of Data
14.3.4 Model for Analysis
14.3.4.1 Big Data
14.3.4.2 Big Data Analytics
14.3.4.3 Insights for Action
14.3.4.4 Predictive Analytics
14.3.5 Validity and Reliability
14.3.6 Summary of Research Methodology
14.3.7 Possible Outcomes
14.4 Analysis and Research Outcomes
14.4.1 Overview
14.4.2 Support Vector Machine
14.4.3 KNN Algorithm
14.4.4 Multilinear Discriminant Analysis (LDA)
14.4.5 Random Forest Classifier
14.4.6 Variable Importance
14.4.7 Model Results
14.4.8 Classification and Regression Trees (CART)
14.4.9 Support Vector Machine
14.4.10 Linear Discriminant Algorithm
14.4.11 K-Nearest Neighbor
14.4.12 Random Forest
14.4.13 Challenges and Future Direction
14.4.13.1 Model 1: Experimental/Prototype Model
14.4.13.2 Model 2: Cloud Computing/Outsourcing
14.4.13.3 Application of Big Data Analytics Models in
Cyber Security
14.4.13.4 Summary of Analysis
14.5 Conclusion
References
14.1 INTRODUCTION
14.1.1 Background on Cyber Security and Machine Learning
Cyber security consolidates the confidentiality, integrity, and availability of
computing resources, networks, software programs, and data into a coherent
collection of policies, technologies, processes, and techniques in order to
prevent the occurrence of attacks [1]. Cyber security refers to a combination
of technologies, processes, and operations that are framed to protect
information systems, computers, devices, programs, data, and networks
from internal or external threats, harm, damage, attacks, or unauthorized
access, such as ransomware or denial of service attacks [2]. The rapid
advances in mobile computing, communications, and mass storage
architectures have precipitated the new phenomena of big data and the
Internet of Things (IoT). Outsider and insider threats can have serious
ramifications on an institution, for example, failure to provide services,
higher costs of operations, loss in revenue, and reputational damage [2, 3].
Therefore, an effective cyber security model must be able to mitigate cyber
security events, such as unauthorized access, zero day attack, denial of
service, data breach, malware attacks, social engineering (or phishing),
fraud, and spam attacks, through intrusion detection and malware detection
[4, 5].
The researcher identifies the primary variable that this study seeks to
investigate as cyber security. Prior literature notes that the definition of
cyber security differs among institutions and across nations [6]. However,
in basic terms cyber security may be defined as a combination of
technologies and processes that are set up to protect computer hosts,
programs, networks, and data from internal and external threats, attacks,
harm, damage, and unauthorized access [2]. The major cyber security
applications are intrusion detection and malware detection. The
transformation and expansion of the cyber space has resulted in an
exponential growth in the amount, quality, and diversity of data generated,
stored, and processed by networks and hosts [7]. These changes have
necessitated a radical shift in the technology and operations of cyber
security to detect and eliminate cyber threats so that cyber security remains
relevant and effective in mitigating costs arising from computers, networks,
and data breaches [2].
A cyber crime is a criminal activity that is computer related or that uses
the internet. Dealing with cyber crimes is a big challenge all over the world.
Network intrusion detection systems (NIDS) are a category of computer
software that monitors system behaviour with a view to ascertain
anomalous violations of security policies and that distinguishes between
malicious users and the legitimate network users [8]. The two taxonomies
of NIDS are anomaly detectors and misuse network detectors. According to
[9], the components in intrusion detection and prevention systems (IDPSs)
can be sensors or agents, servers, and consoles for network management.
Artificial intelligence (AI) emerged as a research discipline at the
Summer Research Project of Dartmouth College in July 1956. Genetic
algorithms are an example of an AI technique, which imitates the process of
natural selection and is founded on the theory of evolutionary computation.
Data over networks may be secured through the use of antivirus software,
firewall, encryption, secure protocols, etc. However, hackers can always
devise innovative ways of breaking into network systems. An intrusion
detection and prevention system (IDPS), shown on Figure 14.1, is placed
inside the network to detect possible network intrusions and, where
possible, prevent cyber attacks. The key functions of the IDPSs are to
monitor, detect, analyze, and respond to cyber threats.
1. Supervised learning: Where the methods are given inputs labeled with
corresponding outputs as training examples
2. Unsupervised learning: Where the methods are given unlabeled inputs
3. Reinforcement learning: Where data is in the form of sequences of observations, actions, and rewards
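As a rough illustration of the first two settings (a sketch only, using scikit-learn and synthetic data rather than anything from this study), the same feature matrix can be used with and without labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))               # 200 synthetic network-flow feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # synthetic labels: 1 = attack, 0 = normal

# Supervised learning: inputs paired with labeled outputs
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Unsupervised learning: the same inputs without labels
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
```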
Big data came into existence when the traditional relational database systems were not able to handle the unstructured data generated by organizations, social media [79], or any other data-generating source [29].
It is easy to “predict” the inexorably increasing availability of big data due
to ongoing technology evolution. Specifically, environmental sensor data
and social media data will become increasingly available for disaster
management due to the advances of many kinds of capable sensors [30].
Characterising big data reveals that it is one place where organizations can
derive very useful data that can aid them in informed decision making. The
challenge with developing countries is that, due to various reasons, they lag
far behind when it comes to technological innovations that support the use
of current inventions. There is a lack of infrastructure to support such
innovations, a lack of skilled data scientists, and a lack of policies or
legislation that promote such innovations.
Some researchers have added a fourth V, that is, veracity, to stress the importance of maintaining quality data within an organization. In such a situation, BDA is largely supported by the Apache Hadoop framework, which is an open-source, completely fault-tolerant, and highly scalable distributed
computing paradigm. Compared to traditional approaches, security
analytics provides a “richer” cyber security context by separating what is
“normal” from what is “abnormal,” i.e., separating the patterns generated
by legitimate users from those generated by suspicious or malicious users.
Passive data sources can include computer-based data, for example
geographical IP location; computer security health certificates; and
keyboard typing and clickstream patterns. Mobile-based data may include,
for example, GPS location, network location, WAP data. Active (relating to
real-time) data sources can include credential data, e.g., username and
password, one-time passwords for, say, online access and digital
certificates.
Big data analytics (BDA) can offer a variety of security dimensions in
network traffic management, access patterns in web transactions [31],
configuration of network servers, network data sources, and user
credentials. These activities have brought a huge revolution in the domains
of security management, identity and access management, fraud prevention
and governance, risk and compliance. However, there is also a lack of in-
depth technical knowledge regarding basic BDA concepts, Hadoop,
predictive analytics, and cluster analysis, etc. With these limitations in
mind, appropriate steps can be taken to build on the skills and competences
of security analytics.
The information that is evaluated in big data analytics includes a mixture of unstructured and semistructured data, for instance, social media content [32, 33, 34], mobile phone records, web server logs, and internet clickstream data. Also analysed are text from survey responses, customer
emails, and machine data captured by sensors connected to the Internet of
Things (IoT).
The research objectives were to:
evaluate machine learning and big data analytics paradigms for use in cyber security;
develop a cyber security system that uses machine learning and big data analytics paradigms.
The corresponding research questions were:
how are the machine learning and big data analytics paradigms used in cyber security?
how is a cyber security system developed using machine learning and big data analytics paradigms?
14.2.1 Overview
Computers, phones, internet, and all other information systems developed
for the benefit of humanity are susceptible to criminal activity [10]. Cyber
crimes consist of offenses such as computer intrusions, misuse of
intellectual property rights, economic espionage, online extortion,
international money laundering, nondelivery of goods or services, etc. [11].
Today, "cyber crime" has become a very common phrase, yet its definition has not been easy to pin down, as there are many variants in its classification. In their varying and diverging definitions, the majority of the
authors agree on two of the most important aspects: that it’s usually an
unauthorized performance of an act on a computer network by a third party
and that, usually, that party seeks to bring damage or loss to the
organization.
On a daily basis, the amount of data processed and stored on computers
as well as other digital technologies is increasing and human dependence on
these computer networks is also increasing. Intrusion detection and
prevention systems (IDPS) include all protective actions or identification of
possible incidents and analysing log information of such incidents [9]. In
[11], the use of various security control measures in an organization was
recommended. Various attack descriptions from the outcome of the research
by [37] are shown on Table 14.2.
TABLE 14.2 Various Attack Descriptions
Attack Type: Description
DoS: Denial of service; an attempt to make a network resource unavailable to its intended users or to temporarily interrupt services of a host connected to the Internet.
Scan: A process that sends client requests to a range of server port addresses on a host to find an active port.
Local access: The attacker has an account on the system in question and can use that account to attempt unauthorized tasks.
User to root: Attackers access a user account on the system and are able to exploit some vulnerability to gain root access to the system.
Data: Attackers involve someone performing an action that they may be able to do on a given computer system but that they are not allowed to do according to policy.
Source: [37].
FIGURE 14.4 Conceptual framework for cyber security and big data
analytics.
14.2.4.8 Limitations
According to [63], there are three major limitations of big data.
14.2.5.2 Definition
Cloud computing is about using the internet to access someone else’s
software running on someone else’s hardware in someone else’s data center
[64]. Cloud computing is essentially virtualized distributed processing,
storage, and software resources and a service, where the focus is on
delivering computing as an on-demand, pay-as-you-go service. In [1],
similar definitions were provided that put more emphasis on the service
being highly scalable and subscription based, the pay-per-use aspect and
delivery channel being the Internet. In [26], cloud computing's role in
driving IT adoption in developing economies was highlighted. According to
[65], a cloud is a type of parallel and distributed system consisting of a
collection of interconnected and virtualized computers that are dynamically
provisioned.
The characteristics of a cloud include the following:
Pay-per-use
Elastic capacity
Illusion of infinite resources
Self-service interface
Resources that are abstracted or virtualized
Provision of development tools and API to build scalable applications
on their services
1. On-demand self-service
2. Broad network access
3. Resource pooling, i.e., location independence
4. Rapid elasticity
5. Measured service
Google Docs,
Aviary,
Pixlr,
Microsoft Office Web App.
language,
operating system (OS),
database,
middleware,
other applications.
rapid deployment,
low cost,
private or public deployment.
1. Geographical presence
a. Responsiveness
b. Availability
2. User interfaces and access to servers
a. Providing means of accessing their cloud
i. GUI
ii. CLI
iii. Web services
3. Advance reservation of capacity
a. Time-frame reservations
4. Automatic scaling and load balancing
a. Elasticity of the service
b. One of the most desirable features of an IaaS cloud
c. Traffic distribution
5. Service-level agreement
a. As with all services, parties must sign an agreement.
b. Metrics
Uptime, performance measures
c. Penalties
Amazon
d. Hypervisor and operating system choice
Xen
VMWare, vCloud, Citric Cloud Center
privacy,
security,
availability,
legal issues,
compliance,
performance.
Who benefits from cloud computing (CC)? CC should not be used by the
internet-impaired, offline workers, the security conscious, and anyone
married to existing applications, e.g., Microsoft Office. The most commonly used CC services (SaaS) include offerings such as Google Docs and Microsoft Office Web Apps, as listed earlier.
For most businesses today when they plan to modernize their computing
and networking architectures, cloud-native architectures have become the
principal target environments [83]. Cloud computing is now a vital part of
corporate life, bringing a significant opportunity to accelerate business
growth through efficient utilization of IT resources and providing new
opportunities for collaboration.
Several benefits are associated with cloud adoption, and these are
highlighted in this section. In [84] as cited in [85], the attributes of cloud
computing include:
14.2.8.5.2.1 Flexibility
Business flexibility is increased tremendously by the adoption of cloud
computing, which provides flexibility to the working arrangements of
employees within or outside their workplace. Employees who are on a business trip can access the data as long as they have an internet connection
through any kind of device. Every employee can get the updated version of
the platforms and services.
14.2.8.5.2.4 Agility
In response to the customer’s fast changing needs, organizations need the
capability to stay competitive. Cloud computing is available 24/7 due to the
availability of the internet around the clock. This enables organizations to
deliver the services in the shortest possible time, and thus it can be used as a
competitive tool for rapid development.
14.2.8.5.3 Summary
The top six benefits of cloud computing are as follows:
Similarly, database services are available in CC, and the common examples
are
Dabbledb.com [89],
Teamdesk.net,
Trackvia.com,
Baseportal.com,
Springbase.com,
Viravis.com,
Infodome.com,
Creator.zoho.com,
Quickbase.intuit.com.
With broad network access, the computing resources on the web can be
accessed from any standard device from anywhere and at anytime. Broad
network access provides the ability to access cloud services through
multiple devices, i.e., heterogeneous thin or thick client platforms (e.g.,
workstations, laptops, mobile phones, and tablets) [66]. In fact, these days
this also speaks to access using any Internet-capable device. With broad network access, users can access cloud computing from anywhere and at any time using devices with minimal local storage.
Broad network access is when any network-based appliance can access
the hosted application from devices that can include but are not limited to
laptops,
desktops,
smartphones,
tablet devices.
3. Resource Pooling
location independence,
pooled provider resources serving multiple clients.
4. Rapid Elasticity
5. Measured Service
6. Multi-tenancy
Massive scale
Homogeneity
Virtualization
Resilient computing
Low-cost software
Geographic distribution
Service orientation
Advanced security technologies
14.2.8.7 Some Advantages of Network Function Virtualization
Network function virtualization (NFV) is a new paradigm for designing and
operating telecommunication networks. Traditionally, these networks rely
on dedicated hardware-based network equipment and their functions to
provide communication services. However, this reliance is becoming
increasingly inflexible and inefficient, especially in dealing with traffic
bursts, for example, during large crowd events. NFV strives to overcome
current limitations by (1) implementing network functions in software and
(2) deploying them in a virtualized environment. The resulting virtualized
network functions (VNFs) require a virtual infrastructure that is flexible,
scalable, and fault tolerant.
The growing maturity of container-based virtualization and the
introduction of production-grade container platforms promotes containers
as a candidate for the implementation of NFV infrastructure (NFVI).
Containers offer a simplified method of packaging and deploying
applications and services.
14.2.8.7.1 Virtualization
Virtualization is basically making a virtual image, or “version,” of
something usable on multiple machines at the same time. This is a way of
managing workload by transforming traditional computing to make it more
scalable, efficient, and economical. Virtualization can be applied to
hardware-level virtualization, operating system virtualization, and server
virtualization. In virtualization, the costs of hardware are reduced and
energy is saved when the resources and services are separated from the
underlying physical delivery environment. A primary driver for
virtualization is consolidation of servers in order to improve the efficiency
and potential cost savings.
Virtualization entails
underutilization of resources,
division of resources,
maintenance required, e.g., controlling job flow.
The benefits of virtualization include the lower costs and extended life
of the technology, which has made it a popular option with small to
medium-sized businesses. The physical infrastructure owned by the service
provider is shared among many users using virtualization in order to
increase the resource utilization. Virtualization facilitates efficient resource
utilization and increased return on investment (ROI), which results in low
capital expenditures (CapEx) and operational expenditures (OpEx).
With virtualization, one can attain better utilization rates of the resources
of the service providers, increase ROI for both the service providers and the
consumers, and promote green IT by reducing energy wastage.
Virtualization technology has a drawback as follows: a single point of
failure in the virtualization software could affect the performance of the
entire system. Virtualization in general has tremendous advantages, as
follows:
Virtualization support
Backbone
CPU, memory, storage
Sizing and resizing
Self-service, on-demand resource provisioning
Directly obtaining services from the cloud
Creation of servers
Tailoring software
Configurations
Security policies
Elimination of having to go through a system administrator
Multiple back-end hypervisors
Drawbacks of virtualization models
Uniform management of virtualization
Storage virtualization
Abstracting logical storage from physical storage
Creation of an independent virtual disk
Storage area networks (SAN)
Fiber channel, iSCSI, NFS
Interface to public clouds
Overloading requiring borrowing
VIMs obtaining resources from external sources during spikes
Dynamic resource allocation
Having to allocate and deallocate resources as needed
Difficulty in calculating demand prediction
Moving loads around to reduce overheating
Monitoring resource utilization and reallocating accordingly
Virtual clusters
Holistically managing interconnected groups of virtual machines
High availability and data recovery
Need for little downtime
14.2.8.8.1 Virtualization
Virtualization is the optimum way to efficiently enhance resource
utilization. It refers to the act of creating a virtual (i.e., similar to the actual) variation of the system. Physical hardware is managed with the help of
software and converted into the logical resource that is in a shared pool or
that can be used by the privileged user. This service is known as
infrastructure-as-a-service. Virtualization is the base of any public or private
cloud development. Most public cloud providers such as Amazon EC2,
Google Compute Engine, and Microsoft Azure leverage virtualization
technologies to power their public cloud infrastructure [90]. The core
component of virtualization is the hypervisor.
14.2.8.8.2 Hypervisor
A hypervisor is software that provides isolation for virtual machines running on top of physical hosts. This thin layer of software typically runs directly on the hardware, provides the capability for virtual partitioning, and is responsible for running multiple kernels on top of the physical host. This form of application and process isolation is relatively expensive, so there is a big impact if computer resources can be used more efficiently. The most popular hypervisors today are VMware, KVM, Xen, and Hyper-V.
As the generic term in the Linux universe, the container offers more
opportunity for confusion. Basically, a container is nothing more than a
virtual file system that is isolated, with some Linux kernel features such as
namespaces and process groups, from the main physical system. A
container framework offers an environment as close to the desired one as
we can ask of a VM but without the overhead that comes with running on
another kernel and simulating all the hardware. Due to the lightweight nature of containers, more of them can run per host than virtual machines can. Unlike
containers, virtual machines require emulation layers (either software or
hardware), which consume more resources and add additional overhead.
Containers are different from virtualization in a number of aspects.
14.3.2.2 Sample
The researcher identified two data analytics models or frameworks from a review of the literature, and the sample size was eight: eight participants in total were interviewed. However, while this may be limited data, it is sufficient for the
present needs of this study. Research in the future may review more
journals to identify more data analytics models that can be applied to cyber
security.
14.4.1 Overview
Figure 14.11 shows the landscape for intrusion detection. Service provision
by each specific piece of equipment with a known IP address determines
the network traffic behavior. To avoid denial of service attacks, the
knowledge representation model can be established using sensibility
analysis [8]. Figure 14.12 details the simple rules for the analysis of attack.
The occurrence of an unusual behavior on the network triggers an alarm on
the IDS in anomaly-based intrusion detection.
FIGURE 14.11 Landscape for intrusion detection.
In [99], the use of machine learning (ML), neural network, and fuzzy
logic to detect attacks on private networks on the different artificial
intelligence (AI) techniques was highlighted. It is not technically feasible to
develop a perfect sophisticated intrusion detection system, since the
majority of IDSs are signature based.
The IDS is divided into either a host IDS (HIDS) or as a network IDS
(NIDS). Analysis of the network traffic can be handled by a NIDS that
distinguishes unlicensed, illegitimate, and anomalous behavior on the
network. Packets traversing through the network should generally be
captured by the IDS using network taps or span port in order to detect and
flag any suspicious activity [99]. Anomalous behavior or malicious activity
on the specific device can be effectively detected by a device-specific IDS.
The ever increasing Internet of Things (IoT) creates a huge challenge to
achieving absolute cyber security. The IDS is characterised by network
observation, analysis, and identification of possible behavior anomalies or
unauthorized access to the network, with some protecting the computer
network in the event of an intrusion. However, the existing methods have several limitations and problems [100].
Provision of a reliable way of protecting the network system or of
trusting an existing IDS is a greater challenge. In cases of specific
weaknesses and limitations, administrators would be required to regularly
update the protection mechanisms, which further challenges the detection
system. The vulnerability of networks and susceptibility to cyber attacks is
exacerbated by the use of wireless technology [100].
The gross inadequacies of classical security measures have been overtly
exposed. Therefore, effective solutions for a dynamic and adaptive network
defence mechanism should be determined. Neural networks can provide
better solutions for the representative sets of training data [100]. [100]
argues for the use of ML classification problems that are solvable with
supervised or semisupervised learning models for the majority of the IDS.
However, the one major limitation of the work done by [100] is on the
informational structure in cyber security for the analysis of the strategies
and the solutions of the players.
Autonomous robotic vehicles attract cyber attacks, which can prevent them from accomplishing their missions and frustrate intrusion prevention. Knowledge-based and vehicle-specific methods have limitations in detection, being applicable only to specific known attacks [41]. The attack vectors of the attack scenarios used by [41] are shown on Figure 14.13.
FIGURE 14.13 Attack vectors of the attack scenarios.
Source: [41].
Table 14.4 shows a comparison of the data mining techniques that can be
used in intrusion detection.
An IDS must keep track of all the data, networking components, and
devices involved. Additional requirements must be met when developing a
cloud-based intrusion detection system due to its complexity and integrated
services.
where x is the input vector to the support vector classifier, w is the real vector of weights, and f is the function that translates the dot product of the input and the weight vector into the desired classes of bank performance; w is learned from the labeled training data set.
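A minimal sketch of this formulation using scikit-learn's linear SVM is given below; the synthetic data, feature count, and class labels are placeholders rather than the study's bank-performance dataset:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))                    # synthetic bank indicators
y = (X @ rng.normal(size=8) > 0).astype(int)     # synthetic performance classes

svm = make_pipeline(StandardScaler(), LinearSVC())
svm.fit(X, y)                                    # w is learned from the labeled data

w = svm.named_steps["linearsvc"].coef_           # the learned weight vector w
print(svm.predict(X[:5]))                        # sign of w.x decides the class
```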
where d(x, y) is the distance measure for finding the similarity between new
observations and training cases and then finding the K-closest instance to
the new instance. Variables are standardized before calculating the distance
since they are measured in different units. Standardization is performed by the following function:
Xs = (X - mean) / s.d.
where Xs is the standardized value, X is the instance measure, and mean and
s.d. are the mean and standard deviation of instances. Lower values of K are
sensitive to outliers, and higher values are more resilient to outliers and
more voters are considered to decide the prediction.
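A hedged sketch of KNN with this z-score standardization (scikit-learn, synthetic data; k = 9 is used here only because that value is reported later in the chapter):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 8)) * [1, 10, 100, 1, 1, 1, 1, 1]   # features on different scales
y = (X[:, 0] > 0).astype(int)

# Standardize each variable ((X - mean) / s.d.) before computing distances,
# then vote among the K closest training instances.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=9))
knn.fit(X, y)
print(knn.predict(X[:5]))
```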
where μ is the overall mean, and μi and Ni are the sample mean and size of the respective classes. The within-class variance is then calculated, which is the distance between the mean and the samples of every class. Let Sw be the within-class variance.
The final procedure is then to construct the lower dimensional space for maximization of the separability between classes and the minimization of the within-class variance. Let P be the lower dimensional space.
The LDA estimates the probability that a new instance belongs to every class. Bayes' theorem is used to estimate the probabilities. For instance, if the output class is a and the input is b, then
P(a | b) = P(a) f(b) / Σ P(aj) fj(b), summed over all classes j,
where P(a) is the prior probability of each class as observed in the training dataset, f(b) is the estimated probability of b belonging to the class, and f(b) uses the Gaussian distribution function to determine whether b belongs to that particular class.
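A brief sketch of an LDA classifier of this kind, where the class priors are estimated from the training data and a Gaussian model is fitted per class (scikit-learn, synthetic data only):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, size=(150, 4)),      # class 0 samples
               rng.normal(1.5, 1, size=(150, 4))])   # class 1 samples
y = np.array([0] * 150 + [1] * 150)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print(lda.priors_)               # P(a): class priors observed in the training set
print(lda.predict_proba(X[:3]))  # posterior P(a | b) for new instances b
```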
The next procedure was fitting these variables into our algorithms and
hence evaluating their performance using the metrics discussed in the
models section. The Boruta algorithm also clusters banks on important
variables, as shown in Figure 14.16, for effective risk management and
analysis.
FIGURE 14.16 Boruta algorithm clustering banks based on
nonperforming loans.
The level of accuracy on the training dataset was 66.2%. The best tune
parameter for our model was k = 9, or 9 neighbors as shown on the
accuracy curve in Figure 14.20. The Kappa statistic and the Kappa SD were
47.2% and 0.17, respectively. On the test dataset, the algorithm achieved an
accuracy level of 67.5% and a kappa of 49%. The algorithm was not highly
effective in classifying bank performance in comparison to other
algorithms.
FIGURE 14.20 KNN confusion accuracy graph.
On the training set, the accuracy of our random forest was 85.5%, as
designated in Table 14.11. The best tune parameter for our model was the
mtry of 14, which is the number of randomly selected predictors in
constructing trees, as shown on Figure 14.21. The Kappa statistic and the
Kappa SD were 78.3% and 0.09, respectively. On the test dataset, the
algorithm achieved an accuracy level of 96% and a Kappa of 96%. The
algorithm was highly effective in classifying bank performance in
comparison to all algorithms.
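For readers working in Python rather than R's caret, the mtry parameter corresponds to max_features in scikit-learn's random forest; the sketch below illustrates the equivalent tuning step on synthetic data, with purely illustrative values:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 20))
y = (X[:, :3].sum(axis=1) > 0).astype(int)

# max_features plays the role of caret's mtry: the number of randomly
# selected predictors considered at each split when constructing trees.
search = GridSearchCV(RandomForestClassifier(n_estimators=200, random_state=0),
                      param_grid={"max_features": [5, 10, 14]},
                      cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```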
14.5 CONCLUSION
REFERENCES
1. Berman, D.S., Buczak, A.L., Chavis, J.S., and Corbett, C.L. (2019).
“Survey of Deep Learning Methods for Cyber Security,” Information
2019, 10, p. 122. DOI:10.3390/info10040122
2. Sarker, I. H., Kayes, A. S. M., Badsha, S., Alqahtani, H., Watters, P., &
Ng, A. (2020). Cybersecurity data science: an overview from machine
learning perspective. Journal of Big Data.
https://doi.org/10.1186/s40537-020-00318-5
3. Gheyas, I. A. & Abdallah, A. E. (2016). Detection and prediction of
insider threats to cyber security: A systematic Literature Review and
Meta-Analysis. Big Data Analytics, 1, p. 6.
4. Kantarcioglu, M & Xi B (2016). Adversarial Data Mining: Big data
meets cybersecurity, CCS, 16 October 24–28, 2016, Vienna, Austria.
5. Lei, G., Liu, C., Li, Y., Chen, D., Guo, Y., Zhu, J. (2019). Robust
Design Optimization of a High-Temperature Superconducting Linear
Synchronous Motor Based on Taguchi Method. IEEE Transactions on
Applied Superconductivity, 29(2), pp. 1–6.
6. Min, K S, Chai S W, & Han M (2015). An international comparative
study on cyber security strategy. International Journal of Security and
Its Applications, 9(2), pp. 13–20.
7. Lee, J. W., & Xuan, Y. (2019). Effects of technology and innovation
management and total factor productivity on the economic growth of
China. Journal of Asian Finance, Economics and Business, 6(2), pp.
63–73. https://doi.org/10.13106/jafeb.2019. vol6.no2.63.
8. Bringas, P.B., and Santos, I. (2010). Bayesian Networks for Network
Intrusion Detection, Bayesian Network, Ahmed Rebai (Ed.), ISBN:
978-953-307-124-4, InTech, Available from:
www.intechopen.com/books/bayesian-network/bayesian-networks-for-
network-intrusion-detection
9. Umamaheswari, K., and Sujatha, S. (2017). Impregnable Defence
Architecture using Dynamic Correlation-based Graded Intrusion
Detection System for Cloud. Defence Science Journal, 67(6), pp. 645–
653. DOI: 10.14429/dsj.67.11118.
10. Nielsen, R. (2015). CS651 Computer Systems Security Foundations 3d
Imagination Cyber Security Management Plan, Technical Report
January 2015, Los Alamos National Laboratory, USA.
11. Stallings, W. (2015). Operating System Stability. Accessed on 27th
March, 2019.
www.unf.edu/public/cop4610/ree/Notes/PPT/PPT8E/CH15-OS8e.pdf
12. Truong, T.C; Diep, Q.B.; & Zelinka, I. (2020). Artificial Intelligence in
the Cyber Domain: Offense and Defense. Symmetry 2020, 12, 410.
13. Proko, E., Hyso, A., and Gjylapi, D. (2018). Machine Learning
Algorithms in Cybersecurity, www.CEURS-WS.org/Vol-2280/paper-
32.pdf
14. Pentakalos, O. (2019). Introduction to machine learning. CMG IMPACT
2019. https://doi.org/10.4018/978-1-7998-0414-7.ch003
15. Russom, P (2011) Big Data Analytics. In: TDWI Best Practices Report,
pp. 1–40.
16. Hammond, K. (2015). Practical Artificial Intelligence For Dummies®,
Narrative Science Edition. Hoboken, New Jersey: Wiley.
17. Bloice, M. & Holzinger, A. (2018). A Tutorial on Machine Learning
and Data Science Tools with Python. Graz, Austria: s.n.
18. Cuzzocrea, A., Song, I., Davis, K.C. (2011). Analytics over Large-Scale
Multidimensional Data: The Big Data Revolution! In: Proceedings of
the ACM International Workshop on Data Warehousing and OLAP, pp.
101–104.
19. Moorthy, M., Baby, R. & Senthamaraiselvi, S. (2014). An Analysis for
Big Data and its Technologies. International Journal of Computer
Science Engineering and Technology(IJCSET), 4(12), pp. 413–415.
20. Menzes, F.S.D., Liska, G.R., Cirillo, M.A. and Vivanco, M.J.F. (2016)
Data Classification with Binary Response through the Boosting
Algorithm and Logistic Regression. Expert Systems with Applications,
69, pp. 62–73. https://doi.org/10.1016/j.eswa.2016.08.014
21. Mazumdar, S & Wang J (2018). Big Data and Cyber security: A visual
Analytics perspective, in S. Parkinson et al (Eds), Guide to Vulnerability
Analysis for Computer Networks and Systems.
22. Economist Intelligence Unit: The Deciding Factor: Big Data & Decision
Making. In: Elgendy, N.: Big Data Analytics in Support of the Decision
Making Process. MSc Thesis, German University in Cairo, p. 164
(2013).
23. Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F.B.,
Babu, S.: Starfish: A Self-tuning System for Big Data Analytics. In:
Proceedings of the Conference on Innovative Data Systems Research,
pp. 261–272 (2011).
24. EMC (2012): Data Science and Big Data Analytics. In: EMC Education
Services, pp. 1–508.
25. Kubick, W.R.: Big Data, Information and Meaning. In: Clinical Trial
Insights, pp. 26–28 (2012).
26. Wilson, B. M. R., Khazaei, B., & Hirsch, L. (2015, November).
Enablers and barriers of cloud adoption among Small and Medium
Enterprises in Tamil Nadu. In: 2015 IEEE International Conference on
Cloud Computing in Emerging Markets (CCEM) (pp. 140–145). IEEE.
27. Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., &
Ullah Khan, S. (2015). The rise of “big data” on cloud computing:
Review and open research issues. In Information Systems.
https://doi.org/10.1016/j.is.2014.07.006
28. Hadi, J. (2015) "Big Data and Five V's Characteristics," International Journal of Advances in Electronics and Computer Science (2), pp. 2393–2835.
[29]. Serrat, O.: Social Network Analysis. Knowledge Network Solutions 28, pp. 1–4 (2009).
29. Siti Nurul Mahfuzah, M., Sazilah, S., & Norasiken, B. (2017). An
Analysis of Gamification Elements in Online Learning To Enhance
Learning Engagement. 6th International Conference on Computing &
Informatics.
30. Pu, C. and Kitsuregawa, M., 2019, Technical Report No. GIT-CERCS-
13-09; Georgia Institute of Technology, CERCS.
31. Shen, Z., Wei, J., Sundaresan, N., Ma, K.L.: Visual Analysis of Massive
Web Session Data. In: Large Data Analysis and Visualization (LDAV),
pp. 65–72 (2012).
32. Asur, S., Huberman, B.A. (2010). Predicting the Future with Social
Media. ACM International Conference on Web Intelligence and
Intelligent Agent Technology, 1, pp. 492–499.
33. Van Der Valk, T., Gibers, G (2010): The Use of Social Network
Analysis in Innovation Studies: Mapping Actors and Technologies.
Innovation: Management, Policy & Practice 12(1), 5–17.
34. Zeng, D., Hsinchun, C., Lusch, R., Li, S.H.: Social Media Analytics and
Intelligence. IEEE Intelligent Systems 25(6), 13–16 (2010).
35. Bolzoni, D. (2009). Revisiting Anomaly-based Network Intrusion
Detection Systems, Ph.D Thesis, University of Twente, The
Netherlands, ISBN: 978-90-365-2853-5, ISSN: 1381-3617.
DOI:10.3990/1.9789036528535
36. Gercke, M. (2012). “Cybercrime Understanding Cybercrime,”
Understanding cybercrime: phenomena, challenges and legal response.
37. Karimpour, J., Lotfi, S., and Siahmarzkooh, A.T. (2016). Intrusion
detection in network flows based on an optimized clustering criterion,
Turkish Journal of Electrical Engineering & Computer Sciences,
Accepted/Published Online: 17.07.2016,
http://journals.tubitak.gov.tr/elektrik
38. Murugan, S., and Rajan, M.S. (2014). Detecting Anomaly IDS in
Network using Bayesian Network, IOSR Journal of Computer
Engineering (IOSR-JCE), e-ISSN: 2278-0661, p- ISSN: 2278-8727,
Volume 16, Issue 1, Ver. III (Jan. 2014), PP 01-07,
www.iosrjournals.org
39. National Institute of Standards and Technology (2018). Framework for
Improving Critical Infrastructure Cybersecurity Version 1.1.
40. Fernando, J. I., & Dawson, L. L. (2009). The health information system
security threat lifecycle: An informatics theory. International Journal of
Medical Informatics. https://doi.org/10.1016/j.ijmedinf.2009.08.006
41. Bezemskij, A., Loukas, G., Gan, D., and Anthony, R.J. (2017).
Detecting cyber-physical threats in an autonomous robotic vehicle using
Bayesian Networks, 2017 IEEE International Conference on Internet of
Things (iThings) and IEEE Green Computing and Communications
(GreenCom) and IEEE Cyber, Physical and Social Computing
(CPSCom) and IEEE Smart Data (SmartData), 21–23 June 2017, IEEE,
United Kingdom, https://ieeexplore.ieee.org/document/8276737
42. Gercke, M. (2012). “Cybercrime Understanding Cybercrime,”
Understanding cybercrime: phenomena, challenges and legal response.
43. National Institute of Standards and Technology (2018). Framework for
Improving Critical Infrastructure Cybersecurity Version 1.1.
44. Analysis, F., Cybersecurity, F., Development, S., & Nemayire, T.
(2019). A Study on National Cybersecurity Strategies A Study on
National Cybersecurity Strategies.
45. Center for Cyber and Information Security. https://ccis.no/cyber-security-versus-information-security/
46. Op-Ed column, Gary Marcus and Ernest Davis (2019). How to Build
Artificial Intelligence We Can Trust,
www.nytimes.com/2019/09/06/opinion/ai-explainability.html
47. Cox, R. & Wang, G. (2014). Predicting the US bank failure: A
discriminant analysis. Economic Analysis and Policy, Issue 44.2, pp.
201–211.
48. Fernando, J. I., & Dawson, L. L. (2009). The health information system
security threat lifecycle: An informatics theory. International Journal of
Medical Informatics. https://doi.org/10.1016/j.ijmedinf.2009.08.006
49. Lakshami, R.V. (2019), Machine Learning for Cyber Security using Big
Data Analytics. Journal of Artificial Intelligence, Machine Learning
and Soft Computing, 4(2), pp. 1–8.
http://doi.org/10.5281/zenodo.3362228
50. Aljebreen, M.J. (2018, p.18), Towards Intelligent Intrusion Detection
Systems for Cloud Comouting, Ph.D. Dissertation, Florida Institute of
Technology, 2018.
51. Yang, C., Yu, M., Hu, F., Jiang, Y., & Li, Y. (2017). Utilizing Cloud
Computing to address big geospatial data challenges. Computers,
Environment and Urban Systems.
https://doi.org/10.1016/j.compenvurbsys.2016.10.010
52. Petrenko, S A & Makovechuk K A (2020). Big Data Technologies for
Cybersecurity.
53. Snowdon, D. A., Sargent, M., Williams, C. M., Maloney, S., Caspers,
K., & Taylor, N. F. (2019). Effective clinical supervision of allied health
professionals: A mixed methods study. BMC Health Services Research.
https://doi.org/10.1186/s12913-019-4873-8
54. Bou-Harb, E., & Celeda, P. (2018). Survey of Attack Projection,
Prediction, and Forecasting in Cyber Security. September.
https://doi.org/10.1109/COMST.2018.2871866
55. Wang, H., Zheng, Z., Xie, S., Dai, H. N., & Chen, X. (2018).
Blockchain challenges and opportunities: a survey. International
Journal of Web and Grid Services.
https://doi.org/10.1504/ijwgs.2018.10016848
56. Zhang, L., Wu, X., Skibniewski, M. J., Zhong, J., & Lu, Y. (2014).
Bayesian-network-based safety risk analysis in construction projects.
Reliability Engineering and System Safety.
https://doi.org/10.1016/j.ress.2014.06.006
57. Laney, D., and Beyer, M.A. (2012), “The importance of ‘big data’: A
definition,” Russell, S., & Norvig, P. (2010). Artificial Intelligence: A
Modern Approach (3rd edition). Prentice Hall.
58. Editing, S., Cnf, D. N. F., Paul M. Muchinsky, Hechavarría, Rodney;
López, G., Paul M. Muchinsky, Drift, T. H., 研究開発戦略センター国
立研究開発法人科学技術振興機構, Basyarudin, Unavailable, O. H.,
Overview, C. W., Overview, S. S. E., Overview, T., Overview, S. S. E.,
Graff, G., Birkenstein, C., Walshaw, M., Walshaw, M., Saurin, R., Van
Yperen, N. W., … Malone, S. A. (2016). Qjarterly. Computers in
Human Behavior. https://doi.org/10.1017/CBO9781107415324.004
59. Iafrate, F. (2015), From Big Data to Smart Data, ISBN: 978-1-848-
21755-3 March, 2015, Wiley-ISTE, 190 Pages.
60. Pence, H. E. (2014). What is Big Data and Why is it important? Journal
of Educational Technology Systems, 43(2), pp. 159–171.
https://doi.org/10.2190/ET.43.2.d
61. Thomas, E. M., Temko, A., Marnane, W. P., Boylan, G. B., &
Lightbody, G. (2013). Discriminative and generative classification
techniques applied to automated neonatal seizure detection. IEEE
Journal of Biomedical and Health Informatics.
https://doi.org/10.1109/JBHI.2012.2237035
62. Pavan Vadapalli (2020). “AI vs Human Intelligence: Difference
Between AI & Human Intelligence,” 15th September, 2020,
www.upgrad.com/blog/ai-vs-human-intelligence/
63. Dezzain.com website (2021), www.dezzain.com/
64. Cunningham, Lawrence A. (2008). The SEC’s Global Accounting
Vision: A Realistic Appraisal of a Quixotic Quest. North Carolina Law
Review, Vol. 87, 2008, GWU Legal Studies Research Paper No. 401,
GWU Law School Public Law Research Paper No. 401, Available at
SSRN: https://ssrn.com/abstract=1118377
65. Buyya, R., Yeo, C. S., & Venugopal, S. (2008). Market-oriented cloud
computing: Vision, hype, and reality for delivering IT services as
computing utilities. Proceedings – 10th IEEE International Conference
on High Performance Computing and Communications, HPCC 2008.
https://doi.org/10.1109/HPCC.2008.172
66. Mell, P. M., & Grance, T. (2011). The NIST definition of cloud
computing. https://doi.org/10.6028/NIST.SP.800-145
67. Xin, Y., Kong, L., Liu, Z., Chen, Y., Li, Y., Zhu, H., Gao, M., Hou, H.,
& Wang, C. (2018). Machine Learning and Deep Learning Methods for
Cybersecurity. IEEE Access, 6, pp. 35365–35381.
https://doi.org/10.1109/ACCESS.2018.2836950
68. Hassan, H. (2017). Organisational factors affecting cloud computing
adoption in small and medium enterprises (SMEs) in service sector.
Procedia Computer Science. https://doi.org/10.1016/j.procs.2017.11.126
69. He, Y., Lee, R., Huai, Y., Shao, Z., Jain, N., Zhang, X., Xu, Z.: RCFile:
A Fast and Space-efficient Data Placement Structure in MapReduce-
based Warehouse Systems. In: IEEE International Conference on Data
Engineering (ICDE), pp. 1199–1208 (2011).
70. Lee, R., Luo, T., Huai, Y., Wang, F., He, Y., Zhang, X.: Ysmart: Yet
Another SQL-to-MapReduce Translator. In: IEEE International
Conference on Distributed Computing Systems (ICDCS), pp. 25–36
(2011).
71. Burt, D., Nicholas, P., Sullivan, K., & Scoles, T. (2013). Cybersecurity
Risk Paradox. Microsoft SIR.
72. Marzantowicz (2015), Corporate Social Responsibility of TSL sector:
attitude analysis in the light of research, “Logistyka” 2014, No. 5, pp.
1773–1785.
73. Pai & Aithal (2017). The basis of social responsibility in management,
Poltext, Warszawa.
74. Sen and Tiwari (2017). Port sustainability and stakeholder management
in supply chains: A framework on resource dependence theory, The
Asian Journal of Shipping and Logistics, No. 28 (3): 301–319.
75. Tashkandi, A. N., & Al-Jabri, I. M. (2015). Cloud computing adoption
by higher education institutions in Saudi Arabia: An exploratory study.
Cluster Computing. https://doi.org/10.1007/s10586-015-0490-4
76. Fehling, C., Leymann, F., Retter, R., Schupeck, W., & Arbitter, P.
(2014). Cloud Computing Patterns. In Cloud Computing Patterns.
https://doi.org/10.1007/978-3-7091-1568-8
77. Sether, A. (2016), Cloud Computing Benefits (2016).
78. Handa, A., Sharma, A., & Shukla, S. K. (2019). Machine learning in
cybersecurity: A review. In Wiley Interdisciplinary Reviews: Data
Mining and Knowledge Discovery. https://doi.org/10.1002/widm.1306
79. KPMG (2018), Clarity on Cybersecurity. Driving growth with
confidence.
80. Gillward & Moyo (2013). Green performance criteria for sustainable
ports in Asia, International Journal of Physical Distribution & Logistics
Management, No. 43(5): p. 5.
81. Greengard, S. (2016). Cybersecurity gets smart. In Communications of
the ACM. https://doi.org/10.1145/2898969
82. Rivard, Raymond, and Verreault (2006). Resource-based view and
competitive strategy: An integrated model of the contribution of
information technology to firm performance,
DOI:10.1016/j.jsis.2005.06.003, Corpus ID: 206514952
83. Kobielus, J. (2018). Deploying Big Data Analytics Applica- tions to the
Cloud: Roadmap for Success. Cloud Standards Customer Council.
84. Furht (2010). Information orientation, competitive advantage, and firm
performance: a resource-based view. European Journal of Business
Research, 12(1), 95–106.
85. Lee, J. (2017). HACKING INTO CHINA’ S CYBERSECURITY LAW, In:
IEEE International Conference on Distributed Computing Systems
(2017).
86. Zhang, Q., Cheng, L., & Boutaba, R. (2010). Cloud computing: state-of-
the-art and research challenges. Journal of Internet Services and
Applications, 1(1), pp.7–18.
87. Oliveira, T., Thomas, M., & Espadanal, M. (2014). Assessing the
determinants of cloud computing adoption: An analysis of the
manufacturing and services sectors. Information and Management.
https://doi.org/10.1016/j.im.2014.03.006
88. Hsu, T. C., Yang, H., Chung, Y. C., & Hsu, C. H. (2018). A Creative IoT
agriculture platform for cloud fog computing. Sustainable Computing:
Informatics and Systems. https://doi.org/10.1016/j.suscom.2018.10.006
89. Bou-Harb, E., & Celeda, P. (2018). Survey of Attack Projection,
Prediction, and Forecasting in Cyber Security. September.
https://doi.org/10.1109/COMST.2018.2871866
90. Berman, D.S., Buczak, A.L., Chavis, J.S., and Corbett, C.L. (2019).
“Survey of Deep Learning Methods for Cyber Security,” Information
2019, 10, p. 122. DOI:10.3390/info10040122
91. Kumar, R. (2011). Research Methodology: A step by step guide for
beginners. 3rd ed. London: Sage.
92. Merrian, S.B. & Tisdell E.J. (2016). Qualitative Research: A guide to
design and implementation, 4th Edition. Jossey-Bass, A Wiley Brand.
93. TechAmerica (2012): Demystifying Big Data: A Practical Guide to
Transforming the Business of Government. In: TechAmerica Reports,
pp. 1–40 (2012).
94. Sanchez, D., Martin-Bautista, M.J., Blanco, I., Torre, C.: Text
Knowledge Mining: An Alternative to Text Data Mining. In: IEEE
International Conference on Data Mining Workshops, pp. 664–672
(2008).
95. Zhang, L., Stoffel, A., Behrisch, M., Mittelstadt, S., Schreck, T., Pompl,
R., Weber, S., Last, H., Keim, D.: Visual Analytics for the Big Data Era
– A Comparative Review of State-of-the-Art Commercial Systems. In:
IEEE Conference on Visual Analytics Science and Technology (VAST),
pp. 173–182 (2012).
96. Song, Z., Kusiak, A. (2009). Optimizing Product Configurations with a
Data Mining Approach. International Journal of Production Research
47(7), 1733–1751.
97. Adams, M.N. (2010). Perspectives on Data Mining. International
Journal of Market Research, 52(1), pp. 11–19.
98. Fu, J., Zhu, E. Zhuang, J. Fu, J. Baranowski, A. Ford and J. Shen
(2016) “A Framework-Based Approach to Utility Big Data Analytics,”
in IEEE Transactions on Power Systems, 31(3), pp. 2455–2462.
99. Napanda, K., Shah, H., and Kurup, L. (2015). Artificial Intelligence
Techniques for Network Intrusion Detection, International Journal of
Engineering Research & Technology (IJERT), ISSN: 2278-0181,
IJERTV4IS110283 www.ijert.org, Vol. 4 Issue 11, November-2015.
100. Stefanova, Z.S. (2018). “Machine Learning Methods for Network
Intrusion Detection and Intrusion Prevention Systems,” Graduate
Theses and Dissertations, 2018,
https://scholarcommons.usf.edu/etd/7367
101. Almutairi, A. (2016). Improving intrusion detection systems using data
mining techniques, Ph.D Thesis, Loughborough University, 2016.
102. Bolzoni, D. (2009). Revisiting Anomaly-based Network Intrusion
Detection Systems, Ph.D Thesis, University of Twente, The
Netherlands, ISBN: 978-90-365-2853-5, ISSN: 1381-3617.
DOI:10.3990/1.9789036528535
103. Mouthami, K., Devi, K.N., Bhaskaran, V.M.: Sentiment Analysis and
Classification Based on Textual Reviews. In: International Conference
on Information Communication and Embedded Systems (ICICES), pp.
271–276 (2013).
104. Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C.,
Byers, A.H.: Big Data: The Next Frontier for Innovation, Competition,
and Productivity. In: McKinsey Global Institute Reports, pp. 1–156
(2011).
15 Using ML and DL Algorithms for Intrusion Detection in the Industrial Internet of Things
DOI: 10.1201/9781003187158-18
Contents
15.1 Introduction
15.2 IDS Applications
15.2.1 Random Forest Classifier
15.2.2 Pearson Correlation Coefficient
15.2.3 Related Works
15.3 Use of ML and DL Algorithms in IIoT Applications
15.4 Practical Application of ML Algorithms in IIoT
15.4.1 Results
15.5 Conclusion
References
15.1 INTRODUCTION
The internet has turned into a worldwide essential resource for humanity,
taking place not only on smartphones and computers but also in TVs,
refrigerators, and even automobiles. This advance is named the Internet of
Things (IoT). According to Nick [1], in 2021 there were 46 billion internet-
connected devices. Running alongside these developments, industrial
devices’ technology and connectivity are also increasing and giving rise to a
new technology, the Industrial Internet of Things (IIoT), which is the IoT
concept applied to industrially dedicated network applications.
IIoT technologies are being heavily used in critical environments, such
as manufacture and energy management, water supply control, and many
others. The system that operates this kind of environment is called an
industrial control system (ICS). Once the industrial machines and devices are connected to the internet, they become potential targets for cyber attacks, at the mercy of data exposure, theft, modification, or destruction [2].
To evade modern security mechanisms, cyber attacks are constantly evolving. The consequences are countless; they can reach ordinary people as well as large companies and turn into catastrophic scenarios. To try to hold back this epidemic, a multitude of intrusion detection systems (IDS) are being developed using machine learning (ML) and deep learning (DL) algorithms to detect attack patterns in network traffic. To detect these patterns, the ML and DL algorithms can be trained using network traffic datasets. Despite the several datasets available to train ML and DL models, it is important to identify the best features in each dataset so that the training can be done properly. In this chapter, we discuss the use of
ML and DL algorithms in practical applications to develop efficient IDSs
for industrial networks considering the best features of each attack present
in the datasets used.
There are various IDS types, each one intended for a different environment.
Among the existing types, the characteristics of host-based and network-
based IDS are described in Figure 15.1.
FIGURE 15.1 Cyber security traditional systems.
Source: Adapted from Teixeira et al. [3].
1. (RF) dataset with only the features selected with the random forest
algorithm
2. (PC) dataset with only the features selected with the Pearson
correlation coefficient
3. (AF) dataset with all the features
The same experiments will be made with these datasets using different
prediction algorithms to find the best algorithm for this purpose.
Afterward, the datasets are balanced using the undersampling method, and the same experiments are made to analyze whether there is any difference compared with training on an imbalanced dataset.
At the end of step 1, all the experiments made are evaluated to pick
up the best approach, which must satisfy the following terms: balanced or
imbalanced dataset according to the best performance; specific features of
each cyber attack or general features as attested by the best performance on
the tests; the prediction algorithm with the best rate at the evaluation.
In step 2, the best approach is put into practice using ensemble learning
to develop the IDS to detect malicious activities in industrial networks. In
order to implement this approach, the features are extracted – using the
features selection approach from the first step – and standardized, so that
the dataset can be divided into two parts: 80% of the dataset for training and
20% for testing. At the testing point, the developed IDS makes its
predictions, which are finally evaluated.
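A minimal sketch of this pipeline in Python (scikit-learn), using a simple voting ensemble as one possible form of ensemble learning; the synthetic features, labels, and choice of base learners are assumptions, not the exact configuration of this chapter's IDS:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 10))                  # selected flow features (synthetic)
y = (X[:, 0] + X[:, 1] > 1).astype(int)          # 1 = flow under attack, 0 = normal

# Split 80% of the dataset for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

ids = make_pipeline(
    StandardScaler(),                            # standardize the selected features
    VotingClassifier([("lr", LogisticRegression(max_iter=1000)),
                      ("rf", RandomForestClassifier(n_estimators=100, random_state=0))],
                     voting="soft"))
ids.fit(X_train, y_train)
print(ids.score(X_test, y_test))                 # predictions evaluated on the test split
```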
To measure the performance of our approach, metrics from the confusion
matrix are used, as shown in Table 15.2.
Input represents the network flow input that will be analyzed by the
model; the input value can be 0 – normal flow – or 1 – flow under
attack.
True-negative represents the number of normal flows correctly
predicted as normal.
False-positive represents the number of normal flows incorrectly
predicted as under attack.
False-negative represents the number of under attack flows incorrectly
predicted as normal.
True-positive represents the number of under attack flows correctly
predicted as under attack.
The metrics from the confusion matrix used are accuracy, false alarm rate
(FAR), undetected rate (UR), Matthews correlation coefficient (MCC), and
sensitivity.
Accuracy is the rate of predictions made correctly considering the total number of predictions and can be represented as shown in equation (15.4.1):
Accuracy = (TP + TN) / (TP + TN + FP + FN) (15.4.1)
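The following Python sketch computes all five metrics from the confusion-matrix counts defined above; the FAR, UR, MCC, and sensitivity formulas are the standard definitions and are assumed here rather than taken verbatim from the chapter's own equations:

```python
import math

def ids_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics for a binary IDS (0 = normal, 1 = attack)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    far = fp / (fp + tn)                 # normal flows wrongly flagged as attacks
    ur = fn / (fn + tp)                  # attack flows that went undetected
    sensitivity = tp / (tp + fn)         # attack flows correctly detected
    mcc = ((tp * tn - fp * fn) /
           math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return accuracy, far, ur, sensitivity, mcc

# Illustrative counts only
print(ids_metrics(tp=480, tn=9500, fp=20, fn=10))
```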
15.4.1 Results
For the development of this practical application, two different datasets
were used – available in [13] – with different cyber attacks:
To find the best algorithm to choose the most important dataset features,
two different techniques are used: the random forest algorithm and the
Pearson correlation coefficient. These techniques are used in three different
scenarios:
RF: dataset with only the features selected with the random forest
algorithm
PC: dataset with only the features selected with the Pearson
correlation coefficient
AF: dataset with all the features
In the first scenario (RF), the random forest [18] was used to find the
features with the best scores in the dataset A. As a result, the best features
found were DstBytes, DstLoad, and DstRate, described in Table 15.3,
where the description and score are shown.
Still in the scenario RF, the best features from the dataset B were found,
with a score over 0.15: Ploss, Dport, SrcPkts, and TotPkts. The features are
described in Table 15.4.
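A hedged sketch of this selection step, ranking features by random forest importance and keeping those scoring above a threshold such as 0.15, is shown below; the data frame and column values are placeholders, not dataset B itself:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Placeholder frame standing in for dataset B (real feature names include
# Ploss, Dport, SrcPkts, TotPkts, ...)
rng = np.random.default_rng(6)
df = pd.DataFrame(rng.normal(size=(500, 6)),
                  columns=["Ploss", "Dport", "SrcPkts", "TotPkts", "DstLoad", "DstRate"])
label = (df["Ploss"] + df["Dport"] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(df, label)
scores = pd.Series(rf.feature_importances_, index=df.columns).sort_values(ascending=False)
selected = scores[scores > 0.15].index.tolist()   # keep features scoring above 0.15
print(scores.round(3), selected)
```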
The same calculation was performed over dataset B, and the best
features were DstLoss, Dport, SrcLoss, and Ploss. The best features and scores in this experiment are described in Table 15.6.
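A corresponding sketch of Pearson-correlation-based selection for the PC scenario, ranking features by the absolute correlation between each feature and the attack label (again with placeholder data and columns):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame(rng.normal(size=(500, 5)),
                  columns=["DstLoss", "Dport", "SrcLoss", "Ploss", "DstRate"])
label = (0.8 * df["DstLoss"] + 0.5 * df["Dport"] + rng.normal(size=500) > 0).astype(int)

# Pearson correlation of every feature with the binary attack label
corr = df.apply(lambda col: np.corrcoef(col, label)[0, 1]).abs().sort_values(ascending=False)
selected = corr.head(4).index.tolist()            # keep the strongest features
print(corr.round(3), selected)
```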
Despite the high levels of accuracy achieved in all tests so far – over 99% – it is necessary to consider other metrics to evaluate the performance of a model trained with imbalanced datasets. For this purpose, the false alarm rate (FAR) was used to calculate the percentage of normal traffic wrongly classified as traffic under attack. Applying the FAR equation presented in the previous section of this chapter, satisfactory results of under 1% were achieved.
Analyzing Figure 15.5, the training using dataset A wrongly predicted more normal flows as attacks than the training using dataset B. While the lowest rate using dataset B was 0.01%, the lowest rate using dataset A was 0.51%, higher than the highest rate using dataset B.
FIGURE 15.5 Accuracy.
Analyzing Figure 15.7, it is possible to observe that all the experiments so far were satisfactory for this approach's objectives, presenting 99.57% for the PC and AF scenarios and 98.35% for the RF scenario using dataset B. The experiments using dataset A presented lower results; however, the lowest rate, 95.39% for RF, is still high.
Finally, the sensitivity equation was employed to measure the efficiency of the models in predicting anomalous activities correctly. As seen in Figure 15.8, the models showed high sensitivity for intrusion detection in network traffic even when trained using imbalanced datasets. The experiments using dataset A presented the best results on this metric, achieving the highest rate – 100% – in the RF and PC scenarios, while using dataset B the rates were 97.38% and 99.46%, respectively (see Figure 15.9).
FIGURE 15.8 MCC.
15.5 CONCLUSION
In this chapter, we presented the methods and metrics used to analyze the best approach to developing an IDS for industrial networks using ML algorithms. Using datasets available in the literature, several experiments were conducted to choose the most efficient approach. To achieve this goal, the performance of the training model was evaluated using the logistic regression algorithm in three different scenarios, considering the learning time and metrics from the confusion matrix, such as accuracy, FAR, UR, MCC, and sensitivity.
In future work, experiments using other prediction algorithms will be carried out, as well as experiments with balanced datasets, to analyze whether balanced and imbalanced datasets lead to different results when training prediction models, so that the best approaches can be used to develop an IDS for industrial networks using ML and ensemble learning algorithms. Moreover, we will conduct experiments considering a dataset with more types of attacks.
REFERENCES
1. Nick G. How Many IoT Devices Are There in 2021? [All You Need To
Know]. [S.l.], 2021. [Online]. https://techjury.net/blog/how-many-iot-
devices-are-there/#gref
2. ISO/IEC 27000:2018(en). Information technology – Security techniques
– Information security management systems – Overview and
vocabulary. [S.l.], 2018. [Online]. www.iso.org/obp/ui/#iso:std:iso-
iec:27000:ed-5:v1:en
3. Brown, D. J.; Suckow, B.; Wang, T. A survey of intrusion detection
systems. Department of Computer Science, University of California,
San Diego, 2002.
4. Biju, Jibi Mariam, Neethu Gopal, and Anju J. Prakash. “Cyber attacks
and its different types.” International Research Journal of Engineering
and Technology 6.3 (2019): 4849–4852.
5. Burton, J.; Dubrawsky, I.; Osipov, V.; Tate Baumrucker, C.; Sweeney,
M. (Ed.). Cisco Security Professional’s Guide to Secure Intrusion
Detection Systems. Burlington: Syngress, 2003. P. 1–38. ISBN 978-1-
932266-69-6. Available in:
www.sciencedirect.com/science/article/pii/B9781932266696500215.
6. Pajouh, H. H.; Javidan, R.; Khayami, R.; Dehghantanha, A.; Choo, K.-
K. R. A two-layer dimension reduction and two-tier classification model
for anomaly-based intrusion detection in IOT backbone networks. IEEE
Transactions on Emerging Topics in Computing, v. 7, n. 2, p. 314–323,
2019.
7. License, B. (Ed.). RandomForestClassifier. [S.l.], 2020. Available in:
https://scikit-
learn.org/stable/modules/generated/sklearn.ensemble.RandomForestCla
ssifier.html.
8. Foundation, P. S. Python. [S.l.], 2021. Available in: www.python.org
9. Ambusaidi, M. A.; He, X.; Nanda, P.; Tan, Z. Building an intrusion
detection system using a filter-based feature selection algorithm. IEEE
Transactions on Computers, v. 65, n. 10, p. 2986–2998, 2016.
10. Sarker, I. H.; Abushark, Y. B.; Alsolami, F.; Khan, A. I. IntruDTree: A Machine Learning Based Cyber Security Intrusion Detection Model. [S.l.], 2020.
11. Keshk, M.; Moustafa, N.; Sitnikova, E.; Creech, G. Privacy preservation
intrusion detection technique for SCADA systems. In: 2017 Military
Communications and Information Systems Conference (MilCIS). [S.l.:
s.n.], 2017. P. 1–6.
12. Zargar, G. R.; Baghaie, T. et al. Category-based intrusion detection
using PCA. Journal of Information Security, Scientific Research
Publishing, v. 3, n. 04, p. 259, 2012.
13. Teixeira, M. A.; Salman, T.; Zolanvari, M.; Jain, R.; Meskin, N.;
Samaka, M. Scada system testbed for cybersecurity research using
machine learning approach. Future Internet, v. 10, n. 8, 2018. ISSN
1999–5903. Available in: www.mdpi.com/1999-5903/10/8/76
14. Apruzzese, G.; Colajanni, M.; Ferretti, L.; Guido, A.; Marchetti, M. On
the effectiveness of machine and deep learning for cyber security. In:
2018 10th International Conference on Cyber Conflict (CyCon). [S.l.:
s.n.], 2018. P. 371–390.
15. Melo, C. Como lidar com dados desbalanceados? [How to deal with imbalanced data?] [S.l.], 2019.
16. Zolanvari, M.; Teixeira, M.; Jain, R. Effect of imbalanced datasets on security of industrial IoT using machine learning. December 2019.
17. Teixeira, M. A.; Zolanvari, M.; Khan, K. M. Flow-based intrusion detection algorithm for supervisory control and data acquisition systems: A real-time approach. 2021.
18. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, v. 12, p. 2825–2830, 2011.
Part IV
Applications
16 On Detecting Interest Flooding Attacks in Named Data Networking (NDN)–based IoT Searches
DOI: 10.1201/9781003187158-20
Contents
16.1 Introduction
16.2 Preliminaries
16.2.1 Named Data Networking (NDN)
16.2.2 Internet of Things Search Engine (IoTSE)
16.2.3 Machine Learning (ML)
16.3 Machine Learning Assisted NDN-Based IFA Detection in IoTSE
16.3.1 Attack Model
16.3.2 Attack Scale
16.3.3 Attack Scenarios
16.3.4 Machine Learning (ML) Detection Models
16.4 Performance Evaluation
16.4.1 Methodology
16.4.2 IFA Performance
16.4.2.1 Simple Tree Topology (Small Scale)
16.4.2.2 Rocketfuel ISP like Topology (Large Scale)
16.4.3 Data Processing for Detection
16.4.4 Detection Results
16.4.4.1 ML Detection Performance in Simple Tree Topology
16.4.4.2 ML Detection in Rocketfuel ISP Topology
16.5 Discussion
16.6 Related Works
16.7 Final Remarks
Acknowledgment
References
16.1 INTRODUCTION
The Internet of Things (IoT) has drawn much attention in recent years [1,
2]. The number of IoT devices is increasing every millisecond and is expected to reach 50 billion by 2025 [3]. To better manage IoT devices and such massive data traffic, the IoT search engine (IoTSE) has been designed to meet those
requirements [4]. Generally speaking, the IoTSE system consists of three
components, as shown in Figure 16.1:
1. Users: There are two types of users: human users and machine users
(e.g., a smart vehicle).
2. IoT searching engine (IoTSE): This has three logical layers:
a. Network communication layer: This sets network communication
protocols determined based on the type of network that the devices
and users are on, such as a wired network or wireless network.
b. Data service layer: This is the core layer of IoTSE, where the search
engine is deployed. At this layer, the requests from the users are processed against the data storage based on each request's needs. The result is then forwarded to the upper layer for any advanced processing or sent back to the users.
c. Application layer: This layer analyzes the requested data to extract
insightful patterns and generate further decisions.
3. Data source: IoT devices generate data and send the data back to the
data storage of IoTSE. In general, the user posts a request to IoTSE at
the beginning. If the requested data exists in data storage, the IoTSE
then returns it to the user. Otherwise, IoTSE requests data from IoT
devices and sends it back to the user.
The most popular IoTSE is the web-crawler-based system that uses the
TCP/IP network structure to transmit data. This type of website-based
configuration raises some issues.
16.2 PRELIMINARIES
This section introduces our proposed attack model, attack scale, attack
scenarios, and machine learning models.
Number of adversary users: For each attack scene, the total number of
users determines the primary user group of the network. It should be noted that a larger user group, whether adversary or legitimate, usually has a higher chance of consuming all of the network resources. For these experiments, the goal is for the adversary users to hinder the network performance for the legitimate users.
Location of adversary users: The distribution and location of the
adversary users are based on the total number of legitimate users. The
distance between the adversary and the IoTSE determines the data
transmission delay. For instance, the closer the adversary is to the IoTSE, the lower the latency in sending fake interest requests, and the greater the negative impact on the performance of legitimate users.
Active time of adversary user: This refers to how long the adversary floods the network, i.e., the amount of time during which these users actively send fake interest packets. For this research, the ideal case is to have each adversary user always active to produce maximum damage to the network.
Flooding rate of adversary user: The number of packets sent by the
adversary user every second determines the bandwidth usage. The
faster the adversary user sends fake interest packets, the more bandwidth is consumed, and the more the flooded network prevents the legitimate users from getting responses from the IoTSE.
In the following section, we first introduce how to prepare the IFA dataset, set up the experiments, and evaluate the performance of IFA, and we present the results of the IFA performance. We then introduce how to preprocess the collected dataset. Lastly, we introduce how to evaluate the performance of the machine learning models and show the results.
16.4.1 Methodology
Based on the attacking scenarios in section 16.3, we generate IFA network
traffic. As mentioned in section 16.3, we have defined two attacking
scenarios, small scale and large scale. We use ns3 [28] as our main
simulation platform and NDNsim [29] as the running protocol in ns3.
For the configuration of both attacking scenarios, we set up the following: (1) The simulation time is fixed to 900 seconds. (2) All the legitimate users send requests at a maximized data rate that does not cause network congestion; this rate is calculated based on the network topology configuration, including the number of users, the location of the IoTSE, and the link table in both scenarios. (3) We launch the attack from 300 to 600 seconds. All the adversary users are randomly active, attacking with a 50% probability in each second and a fixed data generation rate of 1,000 packets per second.
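Purely for illustration (outside ns-3/ndnSIM), the fragment below sketches how the per-second on-off activity and flooding rate of a single adversary described above could be generated; it is not part of the simulation code.

import random

SIM_TIME = 900              # total simulation time in seconds
ATTACK_START, ATTACK_END = 300, 600
FLOOD_RATE = 1000           # fake interest packets per second when active
ACTIVE_PROB = 0.5           # probability that an adversary is active in a given second

random.seed(7)

def adversary_schedule():
    # Number of fake interest packets sent by one adversary in each second of the simulation.
    schedule = []
    for t in range(SIM_TIME):
        active = ATTACK_START <= t < ATTACK_END and random.random() < ACTIVE_PROB
        schedule.append(FLOOD_RATE if active else 0)
    return schedule

packets_per_second = adversary_schedule()
print("total fake interests sent by this adversary:", sum(packets_per_second))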
To evaluate the attacking performance, we consider the following performance metric to quantify the impact of attacks: the packet success ratio, which refers to the successful packet reception ratio of the legitimate users. During the whole simulation, we keep track of all network activities of all users (including the IoTSE), including the number of received or forwarded packets, the number of received or forwarded requests, and the timestamp of each node behavior. Based on those, we can easily calculate the successful packet reception of all legitimate users for each second. If this ratio drops, the attack is successful. Ideally, if the ratio becomes zero, the attack damage is maximized.
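The packet success ratio can be computed per second from the collected traces; the sketch below assumes a simple tuple-based trace format (second, user id, interests sent, data received), which is an assumption about the log layout rather than the ndnSIM output format.

from collections import defaultdict

def success_ratio_per_second(trace, legitimate_users):
    # trace: iterable of (second, user_id, interests_sent, data_received) tuples.
    sent = defaultdict(int)
    received = defaultdict(int)
    for second, user, n_sent, n_recv in trace:
        if user in legitimate_users:
            sent[second] += n_sent
            received[second] += n_recv
    # Ratio of satisfied requests for the legitimate users in every second of the simulation
    return {s: (received[s] / sent[s] if sent[s] else 1.0) for s in sorted(sent)}

trace = [(0, "u1", 50, 50), (0, "u2", 40, 38), (300, "u1", 50, 5)]
print(success_ratio_per_second(trace, {"u1", "u2"}))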
We use the statistic-based machine learning models described in section
16.3.4 via sklearn [30]. We also randomly split the dataset into an 80% training set and a 20% test set. In addition, we define the following performance metrics to measure the detection results of the ML-based models: (i) Highest training accuracy: During training, we keep recording the highest training accuracy and use that configuration for further testing. (ii) Average testing accuracy: During testing, we record each test result and compute the average to measure the overall accuracy. (iii) Average precision and average recall in testing: During the test session, we record the precision and recall of each test and compute their averages. (iv) F1-score: Since this is a binary classification problem, the F1-score, based on the previous two metrics, is a better measure of overall detection performance.
The datasets are summarized in Table 16.4 and Table 16.5. There are around 3,000 records for the small scale, including 1,800 records from legitimate users and 1,200 records from adversary users. There are over 224,000 records in total for the large scale, including 140,000 records from legitimate users and 83,000 records from adversary users. Each record has two features: the number of sent packets and the attack label. Finally, we transform this complicated task into a simplified one by building a binary classifier.
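To make the resulting detection task concrete, the following sketch mirrors the 80/20 split and the testing metrics on a synthetic stand-in for the two-feature records. A decision tree is used here only as one representative of the compared models, and the packet-count distributions are invented for illustration.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

rng = np.random.default_rng(0)
# Synthetic stand-in: legitimate users send few packets per window, adversaries flood
legit = rng.poisson(20, size=1800)
attack = rng.poisson(1000, size=1200)
X = np.concatenate([legit, attack]).reshape(-1, 1)
y = np.concatenate([np.zeros(1800, dtype=int), np.ones(1200, dtype=int)])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
pred = DecisionTreeClassifier().fit(X_train, y_train).predict(X_test)

print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall   :", recall_score(y_test, pred))
print("f1-score :", f1_score(y_test, pred))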
16.5 DISCUSSION
There are some potential future directions for IFA in NDN-based IoTSE,
concerning the improvement of detection, launching more comprehensive
IFA, and IFA mitigation.
Improvement of detection: In this chapter, we assume the adversary does
not know the profile of the IoTSE, including the naming scheme of the
interest request. Thus the adversary can only target PIT in the NDN router
by sending massive numbers of fake interest requests quickly. We consider the packet rate as a detection feature and leverage ML-based schemes to carry out anomaly detection. Since legitimate users have a static packet sampling rate, we can easily identify abnormal behaviors. Nonetheless, supposing the adversary has knowledge of part of the user profile settings (e.g., the request sampling rate), the adversary can use the same configuration, and its behavior will be indistinguishable from that of legitimate users. Therefore, the detection scheme based on packet rate will be ineffective. To deal with such an issue, we shall consider other schemes, e.g., setting more rules [8] to restrict network behaviors, designing an attribute-based detection algorithm [31] to identify unusual traffic statistics in the PIT entity, and using graph neural network-based schemes [32] to detect unusual network traffic, as well as investigating features that are fundamental and difficult for the adversary to manipulate [33].
Studying comprehensive IFA: In this chapter, we set the adversary with a
fixed sampling rate with a 50% probability to be on-off for each second and
assume that the adversary does not know the profile of IoTSE. This kind of
attack can evolve into a much more comprehensive one. There are two
potential ways to hinder the detection: (1) The adversary can send those
nonexistent interest packets to the adversary’s IoTSE. Then the adversary’s
IoTSE responds to those requests by sending data back. Since the target is
to overuse the whole bandwidth of the network instead of the IoTSE, the
detection based on packet rate as a feature at the IoTSE would consider that malicious traffic to be normal traffic destined for the other IoTSE. (2) With knowledge of the configuration of the IoTSE and the users, the adversary can launch satisfiable interest requests with random selections or nonexistent interest requests with a known naming scheme; therefore, those malicious requests
become indistinguishable from those of legitimate users. Thus it can cause
more damage to the network and become more challenging to detect.
IFA mitigation: To better mitigate the IFA, we intend to design a
countermeasure in the future study. There are some existing methods [34–
37] to deal with IFA in NDN. Nonetheless, those methods have a few limitations: (1) The efficacy of those algorithms highly depends on the shape of the topology. (2) Those designs are for single detection points, i.e., one single PIT table or one single interface. Thus we plan to investigate more comprehensive mitigation schemes in our future study. One is to consider different detection points to increase detection accuracy and reduce the detection sensitivity to different topologies. The other is to leverage deep learning-based schemes to deal with large-scale and complex IFA.
16.7 FINAL REMARKS
This chapter has investigated three attacking scenes for interest flooding attacks (IFA) in NDN-based IoTSE. Using ns3 as the evaluation platform, we have designed two IFA scenarios with a nonexistent interest flooding attack strategy in small- and large-scale simulations, utilizing different link configurations. We first showed the severity of the damage caused by IFA. Then, based on the dataset collected from the successful IFA, we processed the dataset and reduced it to only the outgoing requests and outer network interface of the PIT-related data. Finally, we obtained a dataset with two features: the number of sent packets and the attack label. We compared eight ML models to carry out attack detection and concluded that most ML models can effectively detect this kind of attack on a packet-rate-based dataset.
However, we find that the linear-based ML model obtains worse detection
accuracy when the dataset volume increases tremendously.
ACKNOWLEDGMENT
This material is based upon work supported by the Air Force Office of
Scientific Research under award number FA9550-20-1-0418. Any opinions,
findings, and conclusions or recommendations expressed in this material
are those of the author(s) and do not necessarily reflect the views of the
United States Air Force.
REFERENCES
1. J. Lin, W. Yu, N. Zhang, X. Yang, H. Zhang, and W. Zhao, “A survey on
internet of things: Architecture, enabling technologies, security and
privacy, and applications,” IEEE Internet of Things Journal, vol. 4, no.
5, pp. 1125–1142, 2017.
2. H. Xu, W. Yu, D. Griffith, and N. Golmie, “A survey on industrial
internet of things: A cyber-physical systems perspective,” IEEE Access,
vol. 6, pp. 78 238–78 259, 2018.
3. J. Zhang and D. Tao, “Empowering things with intelligence: a survey of
the progress, challenges, and opportunities in artificial intelligence of
things,” IEEE Internet of Things Journal, vol. 8, no. 10, pp. 7789–7817,
2020.
4. F. Liang, C. Qian, W. G. Hatcher, and W. Yu, “Search engine for the
internet of things: Lessons from web search, vision, and opportunities,”
IEEE Access, vol. 7, pp. 104 673–104 691, 2019.
5. A. Abraham, F. Pedregosa, M. Eickenberg, P. Gervais, A. Mueller, J.
Kossaifi, A. Gramfort, B. Thirion, and G. Varoquaux, “Machine learning for neuroimaging with scikit-learn,” Frontiers in neuroinformatics,
vol. 8, p. 14, 2014.
6. M. Mayhew, M. Atighetchi, A. Adler, and R. Greenstadt, “Use of
machine learning in big data analytics for insider threat detection,” in
MILCOM 2015–2015 IEEE Military Communications Conference.
IEEE, 2015, pp. 915–922.
7. R.-T. Lee, Y.-B. Leau, Y. J. Park, and M. Anbar, “A survey of interest
flooding attack in named-data networking: Taxonomy, performance and
future research challenges,” IETE Technical Review, pp. 1–19, 2021.
8. A. Afanasyev, J. Burke, T. Refaei, L. Wang, B. Zhang, and L. Zhang,
“A brief introduction to named data networking,” in MILCOM 2018–
2018 IEEE Military Communications Conference (MILCOM), 2018, pp.
1–6.
9. S. Shannigrahi, C. Fan, and C. Partridge, “What’s in a name?: Naming
big science data in named data networking.” Proceedings of the 7th
ACM Conference on Information-Centric Networking, pp. 12–23, 2020.
10. L. Zhang, A. Afanasyev, J. Burke, V. Jacobson, K. Claffy, P. Crowley,
C. Papadopoulos, L. Wang, and B. Zhang, “Named data networking,”
ACM SIGCOMM Computer Communication Review, vol. 44, no. 3, pp.
66–73, 2014.
11. H. Liang, L. Burgess, W. Liao, C. Lu, and W. Yu, “Towards named data
networking for internet of things search engine,” 2021. [Online].
Available: to be published.
12. F. Liang, W. Yu, D. An, Q. Yang, X. Fu, and W. Zhao, “A survey on big
data market: Pricing, trading and protection,” IEEE Access, vol. 6, pp.
15 132–15 154, 2018.
13. W. G. Hatcher, C. Qian, W. Gao, F. Liang, K. Hua, and W. Yu,
“Towards efficient and intelligent internet of things search engine,”
IEEE Access, 2021.
14. H. Liang, L. Burgess, W. Liao, C. Lu, and W. Yu, “Deep learning assist
iot search engine for disaster damage assessment,” 2021. [Online].
Available: to be published.
15. X. Yang, K. Lingshuang, L. Zhi, C. Yuling, L. Yanmiao, Z. Hongliang,
G. Mingcheng, H. Haixia, and W. Chunhua, “Machine learning and
deep learning methods for cybersecurity.” IEEE Access, vol. 6, pp. 35
365–35 381, 2018.
16. T. G. Dietterich, “Steps toward robust artificial intelligence.” AI
Magazine, vol. 38, no. 3, pp. 3–24, 2017.
17. M. Capra, B. Bussolino, A. Marchisio, G. Masera, and M. Shafique,
“Hardware and software optimizations for accelerating deep neural
networks: Survey for current trends, challenges, and the road ahead,”
IEEE Access, vol. 8, pp. 225 134–225 180, 2020.
18. W. G. Hatcher and W. Yu, “A survey of deep learning: Platforms,
applications and emerging research trends,” IEEE Access, vol. 6, pp. 24
411–24 432, 2018.
19. J. M. Helm, A. M. Swiergosz, H. S. Haeberle, J. M. Karnuta, J. L.
Schaffer, V. E. Krebs, A. I. Spitzer, and P. N. Ramkumar, “Machine
learning and artificial intelligence: Definitions, applications, and future
directions.” Current Reviews in Musculoskeletal Medicine, vol. 13, no.
1, p. 69, 2020.
20. N. Spring, R. Mahajan, and D. Wetherall, “Measuring isp topologies
with rocketfuel,” ACM SIGCOMM Computer Communication Review,
vol. 32, no. 4, pp. 133–145, 2002.
21. P. H. Swain and H. Hauska, “The decision tree classifier: Design and
potential,” IEEE Transactions on Geoscience Electronics, vol. 15, no. 3,
pp. 142–147, 1977.
22. B. Scholkopf, K.-K. Sung, C. J. Burges, F. Girosi, P. Niyogi, T. Poggio,
and V. Vapnik, “Comparing support vector machines with gaussian
kernels to radial basis function classifiers,” IEEE transactions on Signal
Processing, vol. 45, no. 11, pp. 2758–2765, 1997.
23. C. Cortes and V. Vapnik, “Support vector machine,” Machine learning,
vol. 20, no. 3, pp. 273–297, 1995.
24. L. E. Peterson, “K-nearest neighbor,” Scholarpedia, vol. 4, no. 2, p.
1883, 2009.
25. T. K. Ho, “Random decision forests,” in Proceedings of 3rd
international conference on document analysis and recognition, vol. 1.
IEEE, 1995, pp. 278–282.
26. P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees,”
Machine learning, vol. 63, no. 1, pp. 3–42, 2006.
27. L. Meier, S. Van De Geer, and P. Bühlmann, “The group lasso for
logistic regression,” Journal of the Royal Statistical Society: Series B
(Statistical Methodology), vol. 70, no. 1, pp. 53–71, 2008.
28. G. F. Riley and T. R. Henderson, “The ns-3 network simulator,” in
Modeling and tools for network simulation. Springer, 2010, pp. 15–34.
29. A. Afanasyev, I. Moiseenko, L. Zhang et al., “ndnsim: Ndn simulator
for ns-3,” 2012.
30. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O.
Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., “Scikit-
learn: Machine learning in python,” the Journal of machine Learning
research, vol. 12, pp. 2825–2830, 2011.
31. Y. Xin, Y. Li, W. Wang, W. Li, and X. Chen, “A novel interest flooding
attacks detection and countermeasure scheme in ndn,” in 2016 IEEE
Global Communications Conference (GLOBECOM). IEEE, 2016, pp.
1–7.
32. F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” IEEE transactions on neural
networks, vol. 20, no. 1, pp. 61–80, 2008.
33. H. Xu, W. Yu, X. Liu, D. Griffith, and N. Golmie, “On data integrity
attacks against industrial internet of things,” in 2020 IEEE Intl Conf on
Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive
Intelligence and Computing, Intl Conf on Cloud and Big Data
Computing, Intl Conf on Cyber Science and Technology Congress
(DASC/PiCom/CBDCom/CyberSciTech), 2020, pp. 21–28.
34. H. Dai, Y. Wang, J. Fan, and B. Liu, “Mitigate ddos attacks in ndn by
interest traceback,” in 2013 IEEE Conference on Computer
Communications Workshops (INFOCOM WKSHPS). IEEE, 2013, pp.
381–386.
35. P. Gasti, G. Tsudik, E. Uzun, and L. Zhang, “Dos and ddos in named
data networking,” in 2013 22nd International Conference on Computer
Communication and Networks (ICCCN). IEEE, 2013, pp. 1–7.
36. T. Zhi, H. Luo, and Y. Liu, “A gini impurity-based interest flooding
attack defence mechanism in ndn,” IEEE Communications Letters, vol.
22, no. 3, pp. 538–541, 2018.
37. A. Compagno, M. Conti, P. Gasti, and G. Tsudik, “Poseidon: Mitigating
interest flooding ddos attacks in named data networking,” in 38th
annual IEEE conference on local computer networks. IEEE, 2013, pp.
630–638.
38. R. Doshi, N. Apthorpe, and N. Feamster, “Machine learning ddos
detection for consumer internet of things devices,” in 2018 IEEE
Security and Privacy Workshops (SPW). IEEE, 2018, pp. 29–35.
39. Z. He, T. Zhang, and R. B. Lee, “Machine learning based ddos attack
detection from source side in cloud,” in 2017 IEEE 4th International
Conference on Cyber Security and Cloud Computing (CSCloud). IEEE,
2017, pp. 114–120.
40. R. Santos, D. Souza, W. Santo, A. Ribeiro, and E. Moreno, “Machine
learning algorithms to detect ddos attacks in sdn,” Concurrency and
Computation: Practice and Experience, vol. 32, no. 16, p. e5402, 2020.
41. A. Afanasyev, P. Mahadevan, I. Moiseenko, E. Uzun, and L. Zhang,
“Interest flooding attack and countermeasures in named data
networking,” in 2013 IFIP Networking Conference, 2013, pp. 1–9.
42. H. Khelifi, S. Luo, B. Nour, H. Moungla, and S. H. Ahmed,
“Reputation-based blockchain for secure ndn caching in vehicular
networks,” in 2018 IEEE Conference on Standards for Communications
and Networking (CSCN). IEEE, 2018, pp. 1–6.
43. G. Cheng, Z. Li, J. Han, X. Yao, and L. Guo, “Exploring hierarchical
convolutional features for hyperspectral image classification,” IEEE
Transactions on Geoscience and Remote Sensing, vol. 56, no. 11, pp.
6712–6722, 2018.
44. Q. Zhu, R. Wang, Q. Chen, Y. Liu, and W. Qin, “Iot gateway: Bridging wireless sensor networks into internet of things,” in 2010 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing. IEEE, 2010, pp. 347–352.
45. C. Thirumalai, S. Mohan, and G. Srivastava, “An efficient public key
secure scheme for cloud and iot security,” Computer Communications,
vol. 150, pp. 634–643, 2020.
17 Attack on Fraud Detection Systems in Online Banking Using Generative Adversarial Networks
DOI: 10.1201/9781003187158-21
Contents
17.1 Introduction
17.1.1 Problem of Fraud Detection in Banking
17.1.2 Fraud Detection and Prevention System
17.2 Experiment Description
17.2.1 Research Goal
17.2.2 Empirical Data
17.2.3 Attack Scenario
17.3 Generator and Discrimination Model
17.3.1 Model Construction
17.3.1.1 Imitation Fraud Detection System Model
17.3.1.2 Generator Models
17.3.2 Evaluation of Models
17.4 Final Conclusions and Recommendations
References
17.1 INTRODUCTION
17.1.1 Problem of Fraud Detection in Banking
Millions of transactions are registered daily in electronic banking systems.
Most of them are legal operations, but a small percentage are attempts at
illegal activities. The Black’s Law Dictionary [1] defines fraud as “a
knowing misrepresentation of the truth or concealment of a material fact to
induce another to act to his or her detriment.” Fraud is usually a tort, but in
some cases (especially where the conduct is willful), it may be a crime. The
fraud causes significant losses for banks and heavyweight problems for
users. Fraud detection describes a set of activities undertaken to identify
financial fraud.
An essential element of fraud detection systems currently being
developed is data mining, i.e., discovering significant and previously
unknown patterns in large data sets [2]. Several dozen techniques are used
in this context [3]. One of the simplest is logistic regression [4], and other
widely used methods are the support vector machine (SVM) [5] and
decision tree techniques. In recent years, neural networks have gained
particular popularity [6]. Machine learning techniques can be categorized
into supervised and unsupervised learning techniques. Supervised learning
means it is clearly defined whether or not a specific transaction is
fraudulent. Based on such data, a fraud detection model is created. On the
other hand, unsupervised learning refers to a situation in which a model was
created based on data without class assignment. In this way, transactions
that are not necessarily fraudulent but significantly deviate from the pattern
are identified [7].
In this chapter, we look at the use of the generative adversarial networks
(GAN) approach that was introduced by [8] and is currently one of the most
effective types of generative modeling. The use of neural networks allows
one to obtain results unattainable with classical methods. However, they
generally require a large amount of data to operate effectively and are also characterized by a prediction time that is often noticeably longer than that of, for example, logistic regression [9].
These elements can create an effective system only when newly detected
fraud is immediately added to the pool of historically detected ones. The
BFDS should then be updated based on this enlarged learning set.
Some of the first commercially used systems were based on fuzzy logic.
They could apply a specific fraud assessment policy to select the optimal
threshold values [11]. The result of the system’s operation was the
likelihood of fraud. It is assumed that the quality of the assessment was
comparable to that given by the expert. The credit fraud detection model
introduced later [12] used the classification approach. Classification-based
techniques have been systematically developed and have repeatedly proven
their effectiveness in many business applications [13]. An alternative
approach is to build regression models. Classification and regression may
also be used complementarily for fraud detection [10].
The hacker has complete knowledge about the learning set for BFDS
creation.
The hacker does not have direct access to BFDS.
The hacker can have access to different bank accounts.
Table 17.2 shows the averaged results of the IFDS classification. The
results were obtained by calculating each of the presented measures
separately for each of the ten created models and then calculating their
arithmetic mean. The classification accuracy of the IFDS, 0.99, is very good.
Table 17.4 shows the difference between the proportion of a given class
in a given set and the percentage of successful attacks on IFDS that enforce
it. For legal transactions, it is small (the attack using model 1). On the other
hand, the percentage of successful attacks (using model 2) forcing a false-
positive classification was over 200 times greater than the representation of
fraud in the data set.
NOTES
DOI: 10.1201/9781003187158-22
Contents
18.1 Introduction
18.2 Smart Healthcare System (SHS)
18.2.1 Formal Modeling of SHS
18.2.2 Machine Learning (ML)–based Patient Status Classification Module (PSCM) in
SHS
18.2.2.1 Decision Tree (DT)
18.2.2.2 Logistic Regression (LR)
18.2.2.3 Neural Network (NN)
18.2.3 Hyperparameter Optimization of PSCM in SHS
18.2.3.1 Whale Optimization (WO)
18.2.3.2 Grey Wolf Optimization (GWO)
18.2.3.3 Firefly Optimization (FO)
18.2.3.4 Evaluation Results
18.3 Formal Attack Modeling of SHS
18.3.1 Attacks in SHS
18.3.2 Attacker’s Knowledge
18.3.3 Attacker’s Capability
18.3.4 Attacker’s Accessibility
18.3.5 Attacker’s Goal
18.4 Anomaly Detection Models (ADMS) in SHS
18.4.1 ML-based Anomaly Detection Model (ADM) in SHS
18.4.1.1 Density-based Spatial Clustering of Applications with Noise
(DBSCAN)
18.4.1.2 K-means
18.4.1.3 One-class SVM (OCSVM)
18.4.1.4 Autoencoder (AE)
18.4.2 Ensemble-based ADMs in SHS
18.4.2.1 Data Collection and Preprocessing
18.4.2.2 Model Training
18.4.2.3 Threshold Calculation
18.4.2.4 Anomaly Detection
18.4.2.5 Example Case Studies
18.4.2.6 Evaluation Result
18.4.2.7 Hyperparameter Optimization of ADMs in SHS
18.5 Formal Attack Analysis of Smart Healthcare Systems
18.5.1 Example Case Studies
18.5.2 Performance with Respect to Attacker Capability
18.5.3 Frequency of Sensors in the Attack Vectors
18.5.4 Scalability Analysis
18.6 Resiliency Analysis of Smart Healthcare System
18.7 Conclusion and Future Works
References
18.1 INTRODUCTION
The high reliance on human involvement in conventional healthcare systems for consultation, patient monitoring, and treatment creates significant challenges and feasibility issues for the healthcare sector in a pandemic situation, with COVID-19 having spread indiscriminately throughout the world. The widespread nature of the virus is introducing delayed and incorrect treatment, resulting in serious health concerns and human mortality. Moreover, healthcare costs are also skyrocketing in parallel, making this basic human need overburdening for the general population even in first world countries. For instance, the United States spent almost $3.81 trillion on the healthcare sector in 2019, an expenditure projected to reach $6.2 trillion by 2028 [1]. Hence the contemporary healthcare sector is shifting toward adopting the
internet of medical things (IoMT)–based smart healthcare system (SHS) to monitor and treat
patients remotely with wireless body sensor devices (WBSDs) and implantable medical
devices (IMDs) [2]. The WBSDs-provided sensor measurements are analyzed by an artificial
intelligence (AI)–based controller for assessing patients’ health status, which uses various
supervised machine learning (ML) algorithms like decision tree (DT), logistic regression (LR),
support vector machine (SVM), neural network (NN), etc. [3]. The patient statuses identified
by the ML models allow the SHS controller to generate necessary control signals for actuating
the IMDs to deliver automated treatment. Due to computational and device constraints, the
WBSDs’ measurements cannot be protected with computationally expensive cryptographic
algorithms. Moreover, the data transfer among the sensor devices and the controller takes place
in the open network.
The humongous attack surface in open network communication of WBSDs raises reliability
issues on the sensor measurements. Recent reports and statistics suggest that the frequency of
cyber attacks is increasing tremendously. For instance, Check Point Software reported a 45%
increase in cyber attacks (626 average weekly attacks) at healthcare organizations since
November 2020, which is significantly higher than other industry sectors [4]. Five years’
worth of confidential patient records were stolen in Colorado through a ransomware attack in
June 2020 [5]. Another report states that the University of Vermont Medical Center (UVMC)
was losing $1.5 million daily due to a cyber attack [6]. Moreover, a security breach at
Blackbaud cloud service provider exposed the information of almost 1 million patients of 46
hospitals and health systems. Hence a comprehensive anomaly detection model (ADM) with
zero-day attack detection capability is required for next-generation safety-critical SHSs. The effectiveness of using unsupervised ML models (e.g., density-based spatial clustering of applications with noise [DBSCAN], K-means) for SHS ADMs has been examined in one of our ongoing works [7]. We propose an unsupervised machine learning (ML)–based ADM ensembling an autoencoder (AE) and a one-class SVM (OCSVM) for SHS abnormality detection. The effectiveness of the proposed ADM was evaluated in one of our recent works that developed an ADM in the smart building domain [7]. The ensembled OCSVM and AE model is inspired by the fact that the OCSVM model can show significant performance in identifying malicious sensor measurements. However, the OCSVM model creates a lot of unnecessary alarms, while the opposite is true of the AE. The proposed ensembled OCSVM-AE model combines the benefits of both models and incurs significantly fewer false anomalous and false benign sensor measurements.
In our proposed SHS design, both the patient status classification model (PSCM) and ADM
use ML models, and the success of the models largely relies on their hyperparameters. Several
contemporary research works attempt to get the best out of the ML models by obtaining
optimal hyperparameters utilizing various optimization techniques. Various optimization
algorithms can be used for ML model hyperparameter optimization. However, the ML models
are trained with a massive number of samples, and hence computational efficacy is compulsory
for feasible implementation. Therefore, we choose metaheuristic bio-inspired stochastic
optimization algorithms for ML models’ hyperparameter optimization due to their
computational efficiency. The approaches mimic nature’s strategy as a process of constrained
optimization. We consider grey wolf optimization (GWO), whale optimization (WO), and
firefly optimization (FO) algorithms for optimizing ML models’ hyperparameters for both
PSCM and ADM. The latter comes up with many more challenges due to the absence of
abnormal samples in the case of zero-day attack detection. We propose a novel fitness function
for ADMs’ hyperparameter optimization as identified in one of our recently published works
[7].
The SHS embedded with an ADM might still be vulnerable to cyber attacks. Robustness and resiliency analysis of the system is mandatory for ensuring the safety of patients' lives. Hence, we propose a formal attack analyzer leveraging ML models and formal methods to assess the predeployment vulnerability of the SHS. The proposed attack analyzer inspects the underlying PSCM's and ADM's decisions by identifying the possible attacks that can be deployed through minimal alteration of sensor measurement values. Moreover, the analyzer verifies whether or not the attack goal is attainable given the attacker's capability. However, formal constraint acquisition from the ML models is a challenging task. It becomes more difficult for the clustering-based ADMs since they impose plenty of constraints that are not solvable in feasible time. We develop a novel concave-hull-based boundary acquisition algorithm that mimics the original clustering algorithms and creates far fewer linear constraints. A smaller set of constraints makes it feasible to solve them using formal methods like satisfiability modulo theories (SMT)–based solvers. We verify our attack analyzer's efficacy using the University of Queensland Vital Signs (UQVS) dataset and our synthetic dataset, considering several performance metrics [8].
18.2 SMART HEALTHCARE SYSTEM (SHS)
The formal constraint acquisition from the DT model is quite simple and straightforward since the trained DT model outputs paths from the root to the leaves using a hierarchical rule-based system. According to the inference rules from the DT, we define a Boolean function inference(P, j) that returns true if the patient's sensor measurements follow the rules associated with the DT-provided label, j.
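A hedged sketch of what such an inference(P, j) check could look like when the PSCM is a scikit-learn decision tree: the root-to-leaf path for a measurement vector P is walked node by node, and the function returns true only if the reached leaf's majority class matches the provided label j. The function and variable names are illustrative, not the chapter's implementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def inference(tree, P, j):
    # Walk the root-to-leaf path of a trained sklearn DT for measurement vector P and
    # return True iff the reached leaf's majority class equals the provided label j.
    t = tree.tree_
    node = 0
    while t.children_left[node] != -1:             # -1 marks a leaf node
        feature, threshold = t.feature[node], t.threshold[node]
        node = t.children_left[node] if P[feature] <= threshold else t.children_right[node]
    leaf_label = tree.classes_[np.argmax(t.value[node])]
    return leaf_label == j

# Usage sketch (X_train, y_train, X_new, and the label "normal" are hypothetical):
# dt = DecisionTreeClassifier().fit(X_train, y_train)
# consistent = inference(dt, X_new[0], "normal")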
NN is one of the most popular ML algorithms that facilitate nonlinear pattern extraction from
underlying complex feature relationships of data [17]. The NN model is comprised of a
network of nodes, where the arrangement and the number of the nodes depend on the data
distribution. The architecture of NN is inspired by the working principle of the human brain,
which can simultaneously perform several tasks while maintaining system performance. NN
can achieve notable performance in solving multiclass classification problems. However, training such a network requires tuning many hyperparameters, such as the learning rate, batch size, and number of hidden layers.
An NN consists of an input layer, one output layer, and one or more hidden layers. The input of each node is calculated using the weight and bias values of the connecting links along with the outputs of the connected nodes of the previous layer (with an exception for layer 1). The calculation is carried out as in equation (18.2.2), where W(l) and b(l) are the weight matrix and bias vector of layer l, a(l) is the output vector of layer l, and σ is the activation function:

a(l) = σ(W(l) a(l−1) + b(l))    (18.2.2)

The input and output of the layer-1 nodes are simply the input feature values, i.e., the patient sensor measurements P in our case, as in equation (18.2.3):

a(1) = P    (18.2.3)

An N-layer NN model is shown in Figure 18.3, where the last hidden layer is layer N − 1. The output layer applies the softmax function to its node inputs, as in equation (18.2.4):

softmax_j(a(N)) = exp(a_j(N)) / Σ_k exp(a_k(N))    (18.2.4)

The patient sensor measurements in consideration get a label, j, from the NN model if and only if the softmax output of the jth output node is the maximum among all output nodes. The inference rule of the NN for a patient vital sign, P, is shown in equation (18.2.5):

inference(P, j) ⇔ softmax_j(a(N)) ≥ softmax_k(a(N)) for all k ≠ j    (18.2.5)
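A minimal numpy illustration of the inference rule in equation (18.2.5): the measurement vector is propagated through the layers, and the NN label is the output node with the maximum softmax value. The ReLU activation and the random weights are assumptions made only to keep the example runnable.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def nn_label(P, weights, biases):
    # Forward-propagate sensor measurements P through the layers and return the label index
    # of the output node with the maximum softmax value, as in equation (18.2.5).
    a = np.asarray(P, dtype=float)                 # layer-1 output equals the input measurements
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(W @ a + b)                        # hidden layers
    out = softmax(weights[-1] @ a + biases[-1])    # output layer with softmax
    return int(np.argmax(out))

# Tiny example: 4 sensor measurements, one hidden layer of 8 nodes, 3 patient statuses
rng = np.random.default_rng(1)
weights = [rng.normal(size=(8, 4)), rng.normal(size=(3, 8))]
biases = [rng.normal(size=8), rng.normal(size=3)]
print(nn_label([0.7, 0.2, 0.9, 0.1], weights, biases))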
18.2.3 Hyperparameter Optimization of PSCM in SHS
The choice of hyperparameters largely determines the performance of ML models. In this
work, we optimize the hyperparameters of both the ADMs and the PSCMs. The hyperparameter optimization of the ADMs is discussed in section 18.4.2.7. We have used metaheuristic bio-inspired computing (BIC)–based algorithms for hyperparameter optimization of both SHS patient data classification and anomaly detection (Figure 18.4). Three potential BIC algorithms are explored: whale optimization (WO), grey wolf optimization (GWO), and firefly optimization (FO).
FIGURE 18.4 Schematic diagram of the BIC-based optimization algorithms of the ML-
based PSCM’s hyperparameter optimization.
18.2.3.1 Whale Optimization (WO)
In the encircling prey phase, each agent updates its position toward the current best agent according to equations (18.2.6) and (18.2.7):

D = |C · X*(t) − X(t)|    (18.2.6)

X(t + 1) = X*(t) − A · D    (18.2.7)

Here, X*(t) denotes the current best position found by an agent at time, t, while the position vector is indicated by X(t). The following equations are used to determine the coefficient vectors A and C:

A = 2a · r − a    (18.2.8)

C = 2 · r    (18.2.9)

where r is a random vector in [0, 1] and the components of a decrease linearly from 2 to 0 over the iterations. Exploitation is performed in two ways:
1. Shrinking encircling: Follows equation (18.2.8), where the coefficient vector A is altered randomly in the interval [−a, a]. The process allows a linear movement throughout the iterations.
2. Spiral updating position: The agent uses a spiral updating equation resembling a movement similar to a helix shape, which can be represented as equation (18.2.10):

X(t + 1) = D′ · e^(bl) · cos(2πl) + X*(t)    (18.2.10)

Here, D′ = |X*(t) − X(t)|, which signifies the ith agent's distance from the prey, b indicates a logarithmic spiral shape constant, and l denotes a random value in [−1, 1].
These two techniques are used simultaneously based on a probability value, p.
Prey searching: In prey searching, the agents vary the coefficient vector A to explore the search space. By making the value of A fall outside the [−1, 1] interval, the exploration process ensures traversing places remote from the agent of interest. The mathematical representation of the exploration can be expressed as follows:

D = |C · X_rand(t) − X(t)|    (18.2.11)

X(t + 1) = X_rand(t) − A · D    (18.2.12)

Here, X_rand denotes a random position vector selected from the current population.
18.2.3.2 Grey Wolf Optimization (GWO)
GWO updates the position of each grey wolf agent with respect to the prey according to equations (18.2.13) and (18.2.14):

D = |C · X_p(t) − X(t)|    (18.2.13)

X(t + 1) = X_p(t) − A · D    (18.2.14)

Here, t indicates the current iteration, A and C denote the coefficient vectors, X_p corresponds to the position vector of the prey's location, X signifies the current position vector, and, finally, using equation (18.2.14), GWO calculates the new position vector of a grey wolf agent. The coefficient vectors A and C can be determined as A = 2a · r1 − a and C = 2 · r2, where r1 and r2 are random vectors in [0, 1] and the components of a decrease linearly from 2 to 0 over the iterations.
18.2.3.3 Firefly Optimization (FO)
Xin-She Yang developed FO, another bio-inspired optimization algorithm for solving complex optimization problems [20]. The algorithm is designed based on the behavior of fireflies, since they move in directions determined by the luminosity of other fireflies. The FO algorithm follows three basic principles for movement.
All fireflies are unisex. Hence, the progression is unrelated to the sex of the fireflies and solely dependent on the brightness of other fireflies.
The brightness of the fireflies is determined by an encoded objective function.
The attractiveness for progression is directly related to brightness. Consequently, brightness and attractiveness decrease as the distance increases.
In summary, a firefly moves toward brighter fireflies and moves randomly provided the
unavailability of brighter fireflies in visible range. The firefly position update maintains the
following equation:
x_i(t + 1) = x_i(t) + β(r_ij)(x_j(t) − x_i(t)) + α_t ε_i(t)    (18.2.17)

The second term on the right-hand side of the expression expresses the attraction of firefly i toward the brighter firefly j. The last term is a randomization term with randomization parameter α_t, where ε_i(t) denotes a vector of random values selected from a normal or some other distribution at time t. The exploitation (attractiveness) parameter, β, can be expressed as:

β(r) = β_0 e^(−γ r²)    (18.2.18)

Here, β_0 is the attractiveness at distance r = 0, γ is the light absorption coefficient, and r_ij is the distance between fireflies i and j.
In our work, we use a formal method-based attack analyzer to figure out potential
vulnerabilities since the tool allows us to explore the search space of possible system
behaviors. Our attack analyzer takes several constraints like attacker’s capability, accessibility,
and goal along with controller ML models as an input and formally models them using
satisfiability modulo theory (SMT)–based solvers. Depending on the satisfiability outcome of
the SMT solvers, the proposed analyzer can figure out potential threats. Our proposed
framework attempts to find the attack vector that can attain the attacker’s goal compromising a
minimum number of sensors within the attacker's capability. We consider sensor measurement alteration attacks and the following attack model throughout this chapter.
Our attack model assumes that the attacker cannot attack more than Maxsensors sensor measurements (18.3.1). Moreover, the attacker is also unable to alter any measurement by more than Threshold (18.3.2):

Σ_S a_S ≤ Maxsensors    (18.3.1)

|ΔP_S| ≤ Threshold, for every sensor S    (18.3.2)

Here, we consider the attack indicator a_S to be an integer value, where false is thought to be 0 and true to be 1 (18.3.3). In equation (18.3.4), P_S shows the actual measurement, ΔP_S denotes the necessary alteration, and P̄_S represents the altered measurement value of sensor S:

P̄_S = P_S + ΔP_S    (18.3.4)
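The attack-model constraints in equations (18.3.1)–(18.3.4) map almost directly onto an SMT encoding; the snippet below shows a schematic Z3 version under assumed variable names and example values, not the analyzer's actual implementation (the PSCM/ADM misclassification constraints would still have to be added before checking satisfiability).

from z3 import Bool, If, Implies, Not, Real, Solver, Sum, sat

NS = 4                                    # number of sensors (illustrative)
MAX_SENSORS, THRESHOLD = 2, 5.0           # attacker capability bounds (illustrative values)
measurements = [80.0, 120.0, 0.02, 98.0]  # actual sensor measurements P_S (illustrative)

attacked = [Bool(f"a_{s}") for s in range(NS)]    # a_S: sensor S is attacked (18.3.3)
delta = [Real(f"d_{s}") for s in range(NS)]       # alteration applied to sensor S
altered = [Real(f"pbar_{s}") for s in range(NS)]  # altered measurement value (18.3.4)

solver = Solver()
# (18.3.1): at most MAX_SENSORS sensor measurements can be attacked
solver.add(Sum([If(a, 1, 0) for a in attacked]) <= MAX_SENSORS)
for s in range(NS):
    # (18.3.2): the alteration of any measurement is bounded by THRESHOLD
    solver.add(delta[s] <= THRESHOLD, delta[s] >= -THRESHOLD)
    # a sensor that is not attacked keeps its original value
    solver.add(Implies(Not(attacked[s]), delta[s] == 0))
    # (18.3.4): altered measurement = actual measurement + alteration
    solver.add(altered[s] == measurements[s] + delta[s])

# The PSCM/ADM misclassification (attack goal) constraints would be added here.
if solver.check() == sat:
    print(solver.model())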
18.4 ANOMALY DETECTION MODELS (ADMS) IN SHS
This section provides an overview of various ML-based ADMs of SHS along with a short overview of the proposed ensemble of unsupervised ML models.
The DBSCAN-based cluster membership check uses a few helper functions, whose symbols appear in equations (18.4.1)–(18.4.4); Figure 18.5 illustrates the logic for a data point (x, y) and the boundary line segments of a cluster.
The first helper checks whether the y-coordinate of the data point lies between the y-coordinates of the two end points of a line segment (18.4.1).
FIGURE 18.5 Logic behind checking whether a point is inside a polygon cluster in DBSCAN algorithm.
From Figure 18.5, it can be seen that this check returns true for the point with respect to some boundary line segments and false for others.
The second helper checks whether the x-coordinate of the data point is on the left side of the line segment (18.4.2). It is apparent from Figure 18.5 that this check returns true for the data point (x, y) with respect to some line segments and false for others.
For the intersection test, we draw an imaginary line parallel to the x-axis from the point of interest (x, y) and check whether this imaginary line intersects a boundary line segment. A line segment is said to be intersected if the data point both lies within the segment's y-range and is on the left side of the segment, as shown in equation (18.4.3).
Finally, the cluster membership function checks whether a data point (x, y) is within a cluster. It applies the intersection test to all boundary line segments of a particular cluster and XORs the outcomes: the outcome is true for a cluster provided the imaginary line parallel to the x-axis from data point (x, y) intersects an odd number of line segments of that cluster. Equation (18.4.4) shows the function definition. For instance, from Figure 18.5, the point of interest is inside one cluster but not another, since the imaginary line parallel to the x-axis from it intersects three line segments (an odd number) of that cluster's boundary.
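The membership test just described is essentially the classic ray-casting (crossing-number) check; a compact, self-contained version is sketched below, with illustrative function and variable names.

def inside_cluster(point, boundary):
    # Ray casting: (x, y) is inside the polygon iff a horizontal ray cast from it
    # crosses an odd number of boundary line segments.
    x, y = point
    inside = False
    n = len(boundary)
    for i in range(n):
        (x1, y1), (x2, y2) = boundary[i], boundary[(i + 1) % n]
        if (y1 > y) != (y2 > y):                              # y lies between the segment end points
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)   # where the ray meets the segment
            if x < x_cross:                                   # the point is on the left of the crossing
                inside = not inside                           # XOR: flip on every intersection
    return inside

square = [(0, 0), (1, 0), (1, 1), (0, 1)]       # a unit-square cluster boundary
print(inside_cluster((0.5, 0.5), square))       # True
print(inside_cluster((1.5, 0.5), square))       # False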
From Table 18.1, we can see that ns denotes the total number of sensor measurements, and the overall number of possible patient health statuses is denoted by nl. In this work, we assume each measurement is recorded/reported by a sensor. From the table, we can also see that S represents the set of all the sensors, P is the set of measurement values of those sensors, and L is the set of all possible patient statuses. Let us say the status of the patient in consideration is denoted by j, where j ∈ L. For simplicity and other reasons, we consider 2-dimensional clusters, and hence we think of two sensor relationships at a time. Thus, for a label and sensor measurement pair, we obtain one or more clusters, each of which represents the relationship between the two measurements for that specific label, k. These clusters comprise a few line segments. For each pair of sensor measurements, the DBSCAN algorithm checks consistency using equations (18.4.5), (18.4.6), and (18.4.7).
The K-means algorithm produces almost similar constraints to those of DBSCAN with a
variation in the number of clusters, noise points, etc. (Likas et al., 2003). One of the main
challenges in the K-means algorithm is to determine the optimal value of k, which can be
optimally selected with various techniques like the elbow method, the silhouette method, etc.
[29].
18.4.1.3 One-class SVM (OCSVM)
The OCSVM model is a bit different from the standard support vector machine (SVM) model and is used for novel pattern detection [30]. The SVM is a supervised ML model for drawing hyperplanes between different classes, and it depends on the labels of the training samples. For zero-day/novel attack detection, the OCSVM model can be leveraged instead, as it separates the trained patterns from the origin using a decision boundary. The OCSVM model's decision boundary can be modified by tuning two hyperparameters: gamma and nu.
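A minimal scikit-learn illustration of an OCSVM decision boundary and the roles of nu and gamma; the feature ranges and hyperparameter values below are arbitrary examples, not the chapter's configuration.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_benign = rng.normal(loc=[80, 120], scale=[5, 8], size=(500, 2))   # e.g., heart rate, systolic BP

# nu bounds the fraction of training points treated as outliers; gamma controls boundary tightness
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.1).fit(X_benign)

samples = np.array([[82, 125], [150, 200]])
print(ocsvm.decision_function(samples))   # negative score: outside the learned boundary
print(ocsvm.predict(samples))             # +1 benign, -1 anomalous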
Case Study 1
The first case study was performed to check the ADM performance on a benign sample. The OCSVM model's prediction score for the sample is found to be –0.038, so the OCSVM labels the sample as an anomaly since the output is negative. The sample is labeled as benign by the AE model since the calculated reconstruction error (0.41) does not exceed the threshold, E (3.38). Moreover, the normalized OCSVM threshold was found to be 0.84, which makes the normalized weight of the OCSVM 0.28. Similarly, the normalized weight of the AE was found to be 0.46. Since the AE weight is higher than the OCSVM weight, the ensembled model prediction corresponds to the prediction of the AE model. Thus the ensembled model correctly helps to reduce the false anomaly rate.
Case Study 2
Case study 2 is performed on two attack samples. The attack samples are obtained from a
benign sample considering two different cost increment attacks.
For the attacked sample, the OCSVM model's prediction score is 0.06, and the model labels the sample as anomalous. However, the AE reconstruction error was found to be 1.23, which does not exceed the threshold; hence, the sample is labeled as benign by the AE model. The measured normalized weight of the OCSVM is 0.42, and similarly, the normalized weight of the AE was determined to be 0.39. The OCSVM weight being higher than the AE weight makes the ensembled decision correspond to the OCSVM prediction. Thus the ensembled model lowers the false benign rate for critical attack samples.
The formal attack analyzer takes the SHS dataset and trains both ADM and PSCM models. To
recall, the PSCM is responsible for labeling patients’ status, while the ADM checks the
consistency of the sensor measurements of the PSCM-provided label. The attack analyzer
leverages the learned model parameters from both ML models and converts them into decision
boundaries to generate constraints from them. The constraints are formulated as a constraints
satisfaction problem (CSP) and are fed into an SMT solver. The solver takes the constraints of
the attacker's capability, accessibility, and attack goal, along with the ML-model constraints, and
assesses the threat associated with a patient’s sensor measurements.
The solver returns a satisfiable (SAT) outcome provided the constraints are nonconflicting
based on the measurements of the patient in consideration. The solver of the analyzer can
report an attack vector indicating required alteration for accessible sensor measurements to
misclassify the patient’s measurement through PSCM and evade ADM and thus achieve the
attack goal. The analyzer is capable of determining the most critical threats, which signify the
minimal attacker’s capability to obtain the attack goal. It can be said that an attack vector also
implies that an attacker cannot successfully launch an attack to accomplish the attack goal if its capability is less than the analyzer-provided capability. Hence the system is said to be threat
resilient up to that capability. If the solver produces unsatisfiable (UNSAT) results, it means
that, based on the attacker’s capability, the attack goal cannot be attained. In this case, the
analyzer gradually increases the attacker’s capability and redetermines the attack feasibility
until reaching an SAT point. The SMT solver uses several background theories to solve CSP
problems, and the solver outcome possesses a formal guarantee.
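The capability-increase loop described above can be sketched as follows, reusing a Z3-style encoding such as the one shown earlier; build_solver is a hypothetical helper that returns the solver for a given capability value.

from z3 import sat

def minimal_capability(build_solver, max_capability=10):
    # build_solver(k) is a hypothetical helper returning a Z3 Solver that encodes the
    # attack constraints (e.g., the earlier sketch) with MAX_SENSORS = k.
    for k in range(1, max_capability + 1):
        solver = build_solver(k)
        if solver.check() == sat:
            # SAT: the most critical threat needs capability k; the model is the attack vector
            return k, solver.model()
    # UNSAT for every tested capability: the attack goal is not attainable within the budget
    return None, None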
Case Study 1
The first case study is on the synthetic dataset that resembles the futuristic healthcare system
we are considering. The dataset, comprising various vital signs like heart rate, blood pressure, blood alcohol, respiratory rate, etc., was compiled from several datasets, including the Fetal ECG synthetic dataset [33], the UCI ML diabetes dataset [34], the oxygen saturation variability dataset [35], the BIDMC PPG respiration dataset [36], and the StatCrunch dataset [7]. Those datasets contain over
17,000 samples each with eight sensor measurements. The reference datasets of Table 18.3
were used to generate the data. The dataset contains sensor measurements of six different
patient statuses. Four different sensor measurements of a patient are demonstrated in Table
18.3. Our attack analyzer reports that the attacker cannot be successful with the attack intent of making the SHS PSCM incorrectly classify a high blood cholesterol–labeled patient as a high blood pressure one by compromising a single sensor measurement only. An attacker having
access to two sensor measurements (i.e., systolic blood pressure and blood oxygen) might
attain the attack goal and mislabel the patients so that caregivers deliver the wrong
medication/treatment. However, the ADM can detect the attack because altering two sensors’
measurements caused a consistency violation with respect to the learned patient sensor measurement distribution derived from historical data. Our attack analyzer identified a possible
attack path by altering heart rate measurement, systolic blood pressure, diastolic blood
pressure, and blood oxygen sensor measurements by 2%, 7%, 2.5%, and 5.7%, respectively.
Sample DT and DBSCAN constraints from PSCM and ADM are shown in Table 18.4 and
Table 18.5.
Case Study 2
We have also verified the performance of our proposed framework using real test bed data
collected in the University of Queensland Vital Signs dataset [8]. The dataset collects 49 sensor measurements from more than 30 anesthesia patients undergoing surgery at Royal Adelaide Hospital, monitored using Philips IntelliVue monitors and a Datex-Ohmeda anesthesia machine. The preprocessed dataset contains 209,115 samples with 26 sensor measurements and 58 patient statuses (labels) with 28 different alarms. Our attack analyzer found feasible attacks that raise wrong alarms instead of the intended ones.
For instance, an adversary can activate alarms (APNEA, low blood pressure, low end-tidal
carbon dioxide, high inspired concentration of sevoflurane) instead of other (APNEA, high
minute volume) alarms by changing measurement values in artery diastolic pressure, artery
mean pressure, effective end-tidal decreased hemoglobin oxygen saturation label, inspired
decreased hemoglobin oxygen saturation label, end-tidal isoelectric point, inspired isoelectric
point, effective end-tidal concentration of sevoflurane, and inspired concentration of
sevoflurane sensors by 9%, 8%, 8.4%, 2.3%, 6%, 10%, 2%, 4%, respectively.
FIGURE 18.8 Frequency of the different sensors in the attack vectors for (Left) synthetic
dataset and (Right) UQVS dataset.
18.5.4 Scalability Analysis
The scalability analysis enables us to assess the feasibility of implementation of the proposed
attack analyzer for a large-scale system. The scalability of the analyzer is evaluated by inspecting the required time for variable sizes of the SHS. The size of the SHS is varied by altering the number of sensors used to build our SHS model. Figure 18.9 and Figure 18.10 show that DBSCAN cluster creation takes much less time than boundary creation, and both grow linearly with the number of sensor measurements. The figures suggest that the constraint generation for the ADM (DBSCAN) requires significantly more time than that for the DT.
However, the cluster, boundary creation, and constraints generation are performed before
deploying the analyzer in real time. Hence the time requirements for these processes are
insignificant. From the figures, it can also be seen that the execution time for the solver
increases remarkably w.r.t. the attacker’s capability due to the need for checking more
constraints. In the case of real-time threat analysis, we can see an exponential increment of the
execution time requirement, which creates scalability concerns for large SHSs.
FIGURE 18.9 Execution time for the (a) cluster and boundary creation time, based on
number of sensor measurements; (b) ML constraints generation, based on number of sensor
measurements; (c) threat analysis based on threshold for data injection; and (d) threat
analysis based on the number of sensor measurements measured from the synthetic dataset.
FIGURE 18.10 Execution time for the (a) cluster and boundary creation time, based on
number of sensor measurements, (b) ML constraints generation, based on number of sensor
measurements, (c) threat analysis based on threshold for data injection, and (d) threat
analysis based on the number of sensor measurements measured from the UQVS dataset.
18.6 RESILIENCY ANALYSIS OF SMART HEALTHCARE SYSTEM
We consider a system to be resilient based on the degree to which it effectively and speedily saves its critical capabilities from disruption and disturbance caused by adverse conditions and events. Our proposed analyzer can figure out the resiliency of a system for a targeted attack condition. A system is said to be k-resilient if it can perform smoothly even if k of its components are faulty or nonresponding. The resiliency analysis for the synthetic dataset is shown in Table 18.6, which signifies that an attacker cannot be successful in its attack goal of misclassifying a normal patient as a high cholesterol patient provided it cannot modify more than two devices; hence the system is denoted as 2-resilient for that attack goal. An attacker having alteration access to more than one sensor measurement can misclassify a high blood sugar patient as an abnormal oxygen level patient or an excessive sweating state patient as a normal one. Similarly, for the UQVS dataset, the attack goal of altering a normal-status patient to show a decrease in hemoglobin oxygen saturation (DESAT) is 20-resilient. The resiliency
analysis capability of the proposed analyzer provides a design guide indicating the relationship
between the number of protected sensors and associated risk.
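Under this definition, the resiliency for a given attack goal can be read off the threat analyzer by gradually increasing the attacker's capability until a feasible attack is found. The sketch below assumes a hypothetical attack_feasible(goal, k) wrapper around the analyzer's constraint-solver query; it is not the analyzer's actual interface.

def k_resiliency(attack_goal, attack_feasible, max_sensors):
    # attack_feasible(goal, k) is assumed to ask the constraint solver whether
    # the goal can be reached by altering at most k sensor measurements.
    for k in range(1, max_sensors + 1):
        if attack_feasible(attack_goal, k):
            return k - 1      # the system tolerates up to k - 1 compromised sensors
    return max_sensors        # no feasible attack within the modeled capability

# Example: a goal is 2-resilient when attack_feasible(goal, k) is False for
# k <= 2 and True for k = 3.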
REFERENCES
1. National Health Expenditure Data – NHE Fact Sheet. (2020). Retrieved 21 September
2021, from www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-
Reports/NationalHealthExpendData/NHE-Fact-Sheet
2. Chen, M., Li, W., Hao, Y., Qian, Y., & Humar, I. (2018). Edge cognitive computing based
smart healthcare system. Future Generation Computer Systems, 86, 403–411.
3. Wiens, J., & Shenoy, E. S. (2018). Machine learning for healthcare: on the verge of a major
shift in healthcare epidemiology. Clinical Infectious Diseases, 66(1), 149–153.
4. Bracken, B. (2021). Cyberattacks on Healthcare Spike 45% Since November. Retrieved 16
March 2021, from https://threatpost.com/cyberattacks-healthcare-spike-ransomware/162770/
5. Hawkins, L. (2021). Cyberattacks increase in healthcare, but sector unprepared. Retrieved
16 March 2021, from www.healthcareglobal.com/technology-and-ai-3/cyberattacks-increase-
healthcare-sector-unprepared
6. Dyrda, L. (2020). The 5 most significant cyberattacks in healthcare for 2020. Retrieved 16
March 2021, from www.beckershospitalreview.com/cybersecurity/the-5-most-significant-
cyberattacks-in-healthcare-for-2020.html
7. Haque, N. I., Rahman, M. A., & Shahriar, H. (2021, July). Ensemble-based Efficient
Anomaly Detection for Smart Building Control Systems. In 2021 IEEE 45th Annual
Computers, Software, and Applications Conference (COMPSAC) (pp. 504–513). IEEE.
8. Liu, D., Görges, M., & Jenkins, S. A. (2012). University of Queensland vital signs dataset:
development of an accessible repository of anesthesia patient monitoring data for research.
Anesthesia & Analgesia, 114(3), 584–589.
9. Demirkan, H. (2013). A smart healthcare systems framework. IT Professional, 15(5), 38–
45.
10. Fell, J. C., & Voas, R. B. (2014). The effectiveness of a 0.05 blood alcohol concentration
(bac) limit for driving in the United States. Addiction, 109(6), 869–874.
11. Petersen, H., Baccelli, E., & Wählisch, M. (2014, June). Interoperable services on
constrained devices in the internet of things. In W3C Workshop on the Web of Things.
12. Pimentel, M. A., Johnson, A. E., Charlton, P. H., Birrenkott, D., Watkinson, P. J.,
Tarassenko, L., & Clifton, D. A. (2016). Toward a robust estimation of respiratory rate
from pulse oximeters. IEEE Transactions on Biomedical Engineering, 64(8), 1914–1923.
13. Wang, S. C. (2003). Artificial neural network. In Interdisciplinary computing in java
programming (pp. 81–100). Springer, Boston, MA.
14. Quinlan, J. R. (1986). Induction of decision trees. Machine learning, 1(1), 81–106.
15. Haque, N. I., Khalil, A. A., Rahman, M. A., Amini, M. H., & Ahamed, S. I. (2021,
September). BIOCAD: Bio-Inspired Optimization for Classification and Anomaly
Detection in Digital Healthcare Systems. IEEE International Conference on Digital Health.
16. Saputro, D. R. S., & Widyaningsih, P. (2017, August). Limited memory Broyden-Fletcher-
Goldfarb-Shanno (L-BFGS) method for the parameter estimation on geographically
weighted ordinal logistic regression model (GWOLR). In AIP Conference Proceedings
(Vol. 1868, No. 1, p. 040009). AIP Publishing LLC.
17. Hagan, M. T., Demuth, H. B., & Beale, M. (1997). Neural network design. PWS
Publishing Co.
18. Mirjalili, S., & Lewis, A. (2016). The whale optimization algorithm. Advances in
engineering software, 95, 51–67.
19. Mirjalili, S., Mirjalili, S. M., & Lewis, A. (2014). Grey wolf optimizer. Advances in
engineering software, 69, 46–61.
20. Yang, X. S. (2009, October). Firefly algorithms for multimodal optimization. In
International symposium on stochastic algorithms (pp. 169–178). Springer, Berlin,
Heidelberg.
21. Haque, N. I., Rahman, M. A., Shahriar, M. H., Khalil, A. A., & Uluagac, S. (2021, March).
A Novel Framework for Threat Analysis of Machine Learning-based Smart Healthcare
Systems. arXiv preprint arXiv:2103.03472.
22. Wehbe, T., Mooney, V. J., Javaid, A. Q., & Inan, O. T. (2017, May). A novel physiological
features-assisted architecture for rapidly distinguishing health problems from hardware
Trojan attacks and errors in medical devices. In 2017 IEEE International Symposium on
Hardware Oriented Security and Trust (HOST) (pp. 106–109). IEEE.
23. Storm, D. (2015). MEDJACK: Hackers hijacking medical devices to create backdoors in
hospital networks. Retrieved 08 January 2020, from
www.computerworld.com/article/2932371/medjack-hackers-hijacking-medical-devices-to-
create-backdoors-in-hospital-networks.html
24. Almogren, A., Mohiuddin, I., Din, I. U., Almajed, H., & Guizani, N. (2020). Ftm-iomt:
Fuzzy-based trust management for preventing sybil attacks in internet of medical things.
IEEE Internet of Things Journal, 8(6), 4485–4497.
25. Bapuji, V., & Reddy, D. S. (2018). Internet of Things interoperability using embedded Web
technologies. International Journal of Pure and Applied Mathematics, 120(6), 7321–7331.
26. Deshmukh, R. V., & Devadkar, K. K. (2015). Understanding DDoS attack & its effect in
cloud environment. Procedia Computer Science, 49, 202–210.
27. Pournaghshband, V., Sarrafzadeh, M., & Reiher, P. (2012, November). Securing legacy
mobile medical devices. In International Conference on Wireless Mobile Communication
and Healthcare (pp. 163–172). Springer, Berlin, Heidelberg.
28. Asaeedi, S., Didehvar, F., & Mohades, A. (2017). α-Concave hull, a generalization of
convex hull. Theoretical Computer Science, 702, 48–59.
29. Patel, P., Sivaiah, B., & Patel, R. (2022). Approaches for finding optimal number of
clusters using K-means and agglomerative hierarchical clustering techniques. 2022
International Conference on Intelligent Controller and Computing for Smart Power
(ICICCSP), pp. 1–6, doi: 10.1109/ICICCSP53532.2022.9862439
30. Chen, Y., Zhou, X. S., & Huang, T. S. (2001, October). One-class SVM for learning in
image retrieval. In Proceedings 2001 International Conference on Image Processing (Cat.
No. 01CH37205) (Vol. 1, pp. 34–37). IEEE.
31. Ng, A. (2011). Sparse autoencoder. CS294A Lecture notes, 72(2011), 1–19.
32. Liu, K. S., Pinto, E. V., Munir, S., Francis, J., Shelton, C., Berges, M., & Lin, S. (2017,
November). COD: A dataset of commercial building occupancy traces (pp. 1–2).
33. Hypertension. (2018, April). Opgehaal van https://catalog.data.gov/dataset/hypertension/
34. Martin, R. J., Ratan, R. R., Reding, M. J., & Olsen, T. S. (2012). Higher blood glucose
within the normal range is associated with more severe strokes. Stroke research and
treatment, 2012.
35. Bhogal, A. S., & Mani, A. R. (2017). Pattern analysis of oxygen saturation variability in
healthy individuals: Entropy of pulse oximetry signals carries information about mean
oxygen saturation. Frontiers in physiology, 8, 555.
36. Pimentel, M., Johnson, A., Charlton, P., & Clifton, D. (2017). BIDMC PPG and
Respiration Dataset. Physionet, https://physionet.org/content/bidmc/1.0.0/.
19 A User-centric Focus for Detecting Phishing Emails
DOI: 10.1201/9781003187158-23
Contents
19.1 Introduction
19.2 Background and Related Work
19.2.1 Behavioral Models Related to Phishing Susceptibility
19.2.2 User-centric Antiphishing Measures
19.2.3 Technical Antiphishing Measures
19.2.4 Research Gap
19.3 The Dataset
19.4 Understanding the Decision Behavior of Machine Learning Models
19.4.1 Interpreter for Machine Learning Algorithms
19.4.2 Local Interpretable Model-Agnostic Explanations (LIME)
19.4.3 Anchor Explanations
19.4.3.1 Share of Emails in the Data for Which the Rule
Holds
19.5 Designing the Artifact
19.5.1 Background
19.5.2 Identifying Suspected Phishing Attempts
19.5.3 Cues in Phishing Emails
19.5.4 Extracting Cues
19.5.5 Examples of the Application of XAI for Extracting Cues and
Phrases
19.6 Conclusion and Future Works
19.6.1 Completion of the Artifact
References
19.1 INTRODUCTION
Cyber attacks are becoming ever more widespread, and the majority of
cyber attacks are phishing attacks.1 A study conducted by Cofense,
formerly PhishMe, found that 91% of all cyber attacks are phishing attacks.2
In a phishing attack, an email is disguised in such a way that the recipient is
led to believe that the message is something they want to read; hence
the recipient clicks on the link or attachment and ends up revealing
sensitive information.3 Since phishing emails have no single defining
characteristic, they are difficult to detect, and little research has been done
on the detection of phishing emails from a user's perspective.
For a successful defense against phishing attacks, the ability to detect
phishing emails is of utmost necessity. Measures to detect phishing can be
classified as either technical or user-centric [30, 39]. Technical measures
include detecting phishing emails before the user receives them and
warning the email recipient through technical means [30]. Other technical
measures include up-to-date security software that prevents users from
following links to malicious websites by sending out warnings, asking for a
second confirmation, or directly blocking potentially fraudulent websites
[15]. Most research to date has focused on the technical aspects,
trying to perfect the classification of phishing emails through machine
learning (ML) or deep learning (DL) algorithms [51, 52].
To date, little emphasis has been placed on user-centric measures as a
line of defense against phishing attacks. User-centric measures center on the
email recipient and aim to increase user awareness [30, 39].
Antiphishing training is often used to reduce susceptibility to phishing, but
such training often focuses on specific types of phishing [12]. New forms of
phishing attack emerge continuously, and it is difficult for antiphishing
training to keep up with them [31]. Moreover, user-centric
measures need to be understood in the context of behavioral models.
Software might detect an email as phishing and warn the user, but the attack
may still succeed, because it has been shown that users have to not
only trust but also understand the behavior of technical phishing detectors
to feel involved in the detection process [30].
As the second line of defense, this work develops an artifact that acts as
an interpreter for the ML/DL phishing detector. LIME and anchor
explanations are used to take a closer look at the cues of phishing emails and
how they can be extracted by means of explainable AI (XAI) methods
[28]. Being able to interpret ML/DL algorithms enables an effective
interaction between the technical approach of using an ML/DL phishing
detector and a user-focused approach.
Figure 19.1 illustrates how the success of a phishing attack depends
on the interaction between technical phishing detectors and user
involvement. For an effective defense, both technical phishing detectors and
active user involvement are necessary.
The rest of this chapter is organized as follows: section 19.2 presents the
background and related work; section 19.3 presents a summary of the
dataset; section 19.4 presents the decision behavior of ML/DL models; and
section 19.5 presents the design of the artifact. The overarching goal of the
artifact is to prevent users from falling for phishing attempts, and the essential
technical as well as theoretical fundamentals for the development of the
artifact are provided. Section 19.6 presents the conclusions of this study.
Since it has been shown that even the best DL/ML phishing detectors
should be complemented by attentive email users to improve the detection
rate, two different frameworks for interpreting ML classification models are
presented.
$f: \mathbb{R}^d \rightarrow \mathbb{R}$
$f: X \rightarrow Y$
$\mathbb{E}_{D(z|R)}\left[\mathbf{1}_{f(x)=f(z)}\right] \geq \tau$
with R(x) = 1; that is, R is an anchor whenever the probability that another email z
satisfying the rule R produces the same prediction is at least τ. The value of τ needs to be
fixed, whereby a higher value of τ represents more confidence that a rule is
related to the classification of an email. The preceding formula can also be
transformed into:
$P\left(\mathbb{E}_{D(z|R)}\left[\mathbf{1}_{f(x)=f(z)}\right] \geq \tau\right) \geq 1 - \delta,$
that is, the probability of a rule being an anchor is at least 1 − δ. Here, the
smaller the value for δ, the more difficult it is for a rule to be called an
anchor. For the usage of rules, the term coverage is of importance and can
be defined as follows:
19.4.3.1 Share of Emails in the Data for Which the Rule Holds
Once several rules that satisfy the anchor condition have been generated,
the one with the largest coverage is preferred. The question left unanswered
is how to get an anchor from a set of rules, and this is addressed with the
application of the beam-search method. This method represents an approach
to finding the optimal anchor with the largest coverage [46].
Beam search is used to find the optimal number and set of
anchors. It is an improvement over the standard bottom-up construction [57].
The standard bottom-up construction starts with an empty rule R = ∅ as the
anchor R. The empty rule is extended by the feature candidate r1 with
the highest precision, so that the new anchor is:
R = {r1}
This iteration is repeated with the remaining feature candidates until the
probability that the constructed anchor satisfies the precision condition reaches 1 − δ [46].
The final anchor found might not be optimal, since a feature, once added, cannot be removed
from the set in a later iteration [57]. The beam-search method addresses this
shortcoming and considers a set of B candidates instead of a single candidate
in each step. This approach can provide anchors with larger coverage [46].
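To make the construction concrete, the following is a minimal sketch of the greedy bottom-up step under stated assumptions: precision is estimated by Monte Carlo sampling from a hypothetical perturb(email, rule) routine that draws emails consistent with the rule, and classify stands for the phishing detector being explained. It illustrates the idea rather than reproducing the exact procedures of [46] or [57].

def estimate_precision(rule, email, classify, perturb, n_samples=100):
    # Monte Carlo estimate of E_{D(z|R)}[1_{f(x)=f(z)}]: the share of perturbed
    # emails consistent with the rule that keep the original prediction.
    original = classify(email)
    hits = sum(classify(perturb(email, rule)) == original for _ in range(n_samples))
    return hits / n_samples

def bottom_up_anchor(email, candidates, classify, perturb, tau=0.95):
    # Greedily add the candidate feature (e.g., word) that yields the highest
    # estimated precision until the anchor's precision reaches the threshold tau.
    anchor, remaining = set(), set(candidates)
    while remaining:
        best = max(remaining, key=lambda c: estimate_precision(
            anchor | {c}, email, classify, perturb))
        anchor.add(best)
        remaining.discard(best)
        if estimate_precision(anchor, email, classify, perturb) >= tau:
            break
    return anchor

# Beam search keeps the B best partial anchors at every step instead of a single
# one and finally returns the anchor with the largest coverage.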
The value for τ as well as settings for the other parameters mentioned can
be fixed during the implementation of the anchor explanations. One
variable for which a value needs to be assigned is the confidence threshold.
It is defined as follows:
Definition 3: The share c of instances satisfying the rule that lead
to the same prediction can be described by
The minimum of this share is called the confidence threshold ct [27] and
will be set to ct = 0.95.
The higher the confidence threshold, the higher the probability that
another set of words satisfying this rule will lead to the same prediction;
that is, the higher the threshold, the more stably the respective anchor
candidate guarantees a specific prediction.
These functionalities of the anchor explanations were implemented with
the help of the alibi package for Python. Just like the output of LIME, this
algorithm outputs the anchor, i.e., the set of words that characterize the
classification of the email. It also provides the emails with the word
replacements that led to the classification. The implementation used
the default settings of the algorithm, except that the anchor candidate words
were replaced by UNK with a certain probability. An exemplary output for a
phishing email can be seen in Figure 19.3.
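For reference, an explanation of this kind might be produced with alibi roughly as follows. This is only a sketch: predict_fn stands for the trained phishing detector's prediction function, the spaCy model name is illustrative, and the exact constructor arguments can differ between alibi versions.

import spacy
from alibi.explainers import AnchorText

nlp = spacy.load("en_core_web_md")        # language model used for tokenization

# predict_fn(list_of_texts) -> predicted labels of the phishing detector (assumed given)
explainer = AnchorText(predictor=predict_fn,
                       sampling_strategy="unknown",   # perturb by replacing words with UNK
                       nlp=nlp)

explanation = explainer.explain(email_text, threshold=0.95)
print(explanation.anchor)       # the set of words characterizing the classification
print(explanation.precision)    # estimated precision of the returned anchor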
For the email in Figure 19.3, it can be noted that a precision of 0.75 was
set and that no anchor could be found that satisfied this precision.
Nevertheless, the algorithm output the best anchor from all candidate
anchors. The elements of this anchor are "a," "us," "1999," "prevent,"
"account," "are," and ".", and the anchor has a precision of about 0.007. Some of
the elements of the anchor seem useful for presentation to a user, such as
"prevent" or "account," but the remaining ones are less meaningful.
The computation time for finding anchor explanations for an email varies
significantly between emails, from a few minutes up to several hours.
Taking this into account, together with the facts that anchors satisfying the
precision level cannot always be found and that some anchor elements carry
little meaning, there is clearly room to improve the anchor explanations.
The outputs of LIME and anchor explanations both
seem to provide reasonable and understandable explanations, highlighting
the relevant words or phrases in the email, which is more user-friendly
than providing all examples of "successful" replacements. While LIME
is only locally faithful, anchor explanations provide more stable
explanations for predictions. For the design of the artifact, the advantages of
both LIME and anchor explanations were incorporated for an optimal
solution.
19.5.1 Background
The goal of the artifact is to prevent users from falling victim to phishing
attempts. It achieves this objective by drawing the user's attention to cues in
emails. The artifact enables users to carefully examine the parts of the
email most relevant to classifying it as a phishing attempt and alerts
the email recipient, with the objective of moving the user from System 1
behavior to System 2 behavior.
The artifact is designed, as shown in Figure 19.4, based on research in
the areas of phishing susceptibility, antiphishing training, and the generation
of explanations for document classification based on methods of AI [32, 34,
46]. An AI system takes an email as the input and assigns a score that
represents the likelihood that the particular email is a phishing attempt.
Based on the score, the email is assigned to one of three classes:
"phishing," "suspicious," and "legitimate." Emails classified as "phishing"
are immediately discarded, and those classified as "legitimate" are passed
through. For "suspicious" emails, an XAI system determines which properties
of the email's text, that is, the presence of certain words, phrases, or
typographical features, led to the score assigned by the phishing detector.
For this, an efficient search-based XAI algorithm is employed that only
requires query access to the phishing detector. The parts of the text that
contributed to the email's classification as "suspicious" are highlighted
when the email is presented to the user.
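This triage can be summarized in a few lines. The sketch below assumes a hypothetical phishing_score function returning the detector's score in [0, 1], a hypothetical explain_cues function wrapping the search-based XAI step, and illustrative cutoff values; none of these names or thresholds are prescribed by the chapter.

def triage_email(email_text, phishing_score, explain_cues,
                 phishing_cutoff=0.9, legitimate_cutoff=0.1):
    # Score the email and route it into one of the three classes.
    score = phishing_score(email_text)
    if score >= phishing_cutoff:
        return "phishing", []          # discarded immediately
    if score <= legitimate_cutoff:
        return "legitimate", []        # passed through to the inbox
    # For "suspicious" emails, extract the words/phrases that drove the score
    # so they can be highlighted when the email is shown to the user.
    cues = explain_cues(email_text)
    return "suspicious", cues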
Figure 19.6 illustrates the second point in Table 19.1. Blythe et al. [4]
state that a typical phishing email asks for a confirmation of banking
details. This information can be captured by explanation algorithms, since
this is often expressed by keywords such as “bank,” “account,” “update,”
“details,” etc.
Logos of the sender as well as the style of an email can only be partly
analyzed by the explanation algorithm. Just like links, the algorithm does
not recognize a logo as a logo, but it can recognize it as a characteristic
element of an email. Indirectly, the style of an email in terms of length or
tone [4] can also be analyzed. The tone of an email can be categorized by
finding relevant words. The length of an email, however, cannot be
expressed through LIME or anchor explanations, though it might be
important to a neural network. The email presented in Figure 19.9 is striking
due to its short length and its being mostly in capital letters with many
exclamation marks. The capital letters and the exclamation marks evoke a
tone of excitement in that email. The word "ignore" was used as an
imputation whenever the body of an email was empty, so that the
algorithm does not raise an error when working with an empty
body, an empty subject, or both.
FIGURE 19.9 Example tone.
Other factors not covered by the preceding methods are also important in
helping the email user detect a phishing email, for example, the
order of the words in a message together with some key structural elements [4].
These structural elements can be viewed in the preceding methods only in an
indirect way, for example, through the tone of one of the emails just
described. The structure itself cannot be displayed as a characteristic
element in the decision of a neural network. However, a semantic network
analysis can help an algorithm detect patterns in a classification
task by taking into account word frequencies, the proximity between words
or groups of words, and co-occurrences. The order of words in a message, for
example, still cannot be expressed with the explanation algorithms used. The
same holds for the user's trust or distrust in an email, boredom
proneness, and lack of focus. These depend solely on the email user and can
only be affected by specific awareness training. In summary, these
findings can help improve the detection of phishing emails.
NOTES
1. Website: www.csoonline.com/article/2117843/what-is-phishing-how-
this-cyber-attack-works-and-how-to-prevent-it.html
2. Website: https://cofense.com/
3. AppRiver. AppRiver, 2002.
4. Alibi anchors. https://docs.seldon.io/projects/alibi/en/stable/methods/
Anchors.html. Accessed: 2021-01-08.
5. Keras.io. callbacks–keras documentation. https://keras.io/callbacks/.
Accessed: 2021-01-05.
REFERENCES