0% found this document useful (0 votes)
15 views

Explaining Vulnerabilities To Adversarial Machine Learning Through Visual Analytics

1) This document describes a visual analytics framework for explaining and exploring machine learning model vulnerabilities to adversarial attacks. 2) The framework uses multi-faceted visualizations to analyze data poisoning attacks from the perspectives of models, data instances, features, and local structures. 3) It demonstrates the framework on binary classifiers and illustrates how visualizations can reveal model vulnerabilities under different attack strategies.

Uploaded by

Marco Ramirez
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
15 views

Explaining Vulnerabilities To Adversarial Machine Learning Through Visual Analytics

1) This document describes a visual analytics framework for explaining and exploring machine learning model vulnerabilities to adversarial attacks. 2) The framework uses multi-faceted visualizations to analyze data poisoning attacks from the perspectives of models, data instances, features, and local structures. 3) It demonstrates the framework on binary classifiers and illustrates how visualizations can reveal model vulnerabilities under different attack strategies.

Uploaded by

Marco Ramirez
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 26, NO.

1, JANUARY 2020 1075

Explaining Vulnerabilities to Adversarial Machine Learning through


Visual Analytics
Yuxin Ma, Tiankai Xie, Jundong Li, Ross Maciejewski, Senior Member, IEEE

A B
1

7
Recall:
2 0.81 in the poisoned model 6
0.90 in the victim model F
C G
G.1

G.2
4

E D

Fig. 1. Reliability attack on spam filters. (1) Poisoning instance #40 has the largest impact on the recall value, which is (2) also depicted
in the model overview. (3) There is heavy overlap among instances in the two classes as well the poisoning instances. (4) Instance #40
has been successfully attacked causing a number of innocent instances to have their labels flipped. (5) The flipped instances are very
close to the decision boundary. (6) On the feature of words “will” and “email”, the variances of poisoning instances are large. (7) A
sub-optimal target (instance #80) has less impact on the recall value, but the cost of insertions is 40% lower than that of instance #40.

Abstract— Machine learning models are currently being deployed in a variety of real-world applications where model predictions are
used to make decisions about healthcare, bank loans, and numerous other critical tasks. As the deployment of artificial intelligence
technologies becomes ubiquitous, it is unsurprising that adversaries have begun developing methods to manipulate machine learning
models to their advantage. While the visual analytics community has developed methods for opening the black box of machine learning
models, little work has focused on helping the user understand their model vulnerabilities in the context of adversarial attacks. In this
paper, we present a visual analytics framework for explaining and exploring model vulnerabilities to adversarial attacks. Our framework
employs a multi-faceted visualization scheme designed to support the analysis of data poisoning attacks from the perspective of
models, data instances, features, and local structures. We demonstrate our framework through two case studies on binary classifiers
and illustrate model vulnerabilities with respect to varying attack strategies.
Index Terms—Adversarial machine learning, data poisoning, visual analytics

1 I NTRODUCTION real-world phenomena. To date, applications of these models are found


In the era of Big Data, Artificial Intelligence and Machine Learning in cancer diagnosis tools [19], self-driving cars [41], biometrics [58],
have made immense strides in developing models and classifiers for and numerous other areas. Many of these models were developed
under assumptions of static environments, where new data instances
are assumed to be from a statistical distribution similar to that of the
E ; = ;   % ;  !.*" ) . + + &&# & &$',+ %= training and test data. Unfortunately, the real-world application of these
%&)$+ * C  * &% 0*+$* % %) %= ) 1&% ++ % -)* +0; models introduces a dynamic environment which is home to malicious
B$ #< {0,/ %$=+/ 43=)$ !}D*,;,; individuals who may wish to exploit these underlying assumptions in
E ; * . + + ')+$%+ & #+) # % &$',+) % %) %= the machine-learning models. Consider e-mail spam filtering as an
% -)* +0 &  ) % ; B$ #< !#8("D- ) % ;,; example. To date, a variety of machine learning methods [11, 13] have
been developed to protect e-mail inboxes from unwanted messages.
Manuscript received 31 Mar. 2019; accepted 1 Aug. 2019.
Date of publication 16 Aug. 2019; date of current version 20 Oct. 2019. These methods build models to classify e-mail as spam or not-spam.
For information on obtaining reprints of this article, please send e-mail to: However, adversaries still want their spam messages to reach your
[email protected], and reference the Digital Object Identifier below. inbox, and these adversaries try to build input data (i.e., spam e-mails)
Digital Object Identifier no. 10.1109/TVCG.2019.2934631

1077-2626 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
1076 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 26, NO. 1, JANUARY 2020

that will fool the model into classifying their spam as safe. This can relations between feature values and predictions are revealed by using
be done by misspelling words that might cause the machine learning partial dependence diagnostics.
classifier to flag a mail as spam or by inserting words and phrases that While those approaches focused on utilizing model inputs and
might cause the classifier to believe the message is safe. Other adversar- outputs, other visual analytics work focuses on “opening the black box,”
ial attacks have explored methods to fake bio-metric data to gain access utilizing the internal mechanisms of specific models to help explain
to personal accounts [8] and to cause computer vision algorithms to model outputs. Work by Muhlbacher et al. [47] summarizes a set
misclassify stop signs [15]. Such exploits can have devastating effects, of guidelines for integrating visualization into machine learning algo-
and researchers are finding that applications of machine learning in real- rithms through a formalized description and comparison. For automated
world environments are increasingly vulnerable to adversarial attacks. iterative algorithms, which are widely used in model optimization,
As such, it is imperative that model designers and end-users be able to Muhlbacher et al. recommended exposing APIs so that visualization
diagnose security risks in their machine learning models. developers can access the internal iterations for a tighter integration of
Recently, researchers have begun identifying design issues and the user in the decision loop. In terms of decision tree-based models,
research challenges for defending against adversarial machine learning, BaobabView [61] proposes a natural visual representation of decision
such as data de-noising, robust modeling, and defensive validation tree structures where decision criterion are visualized in the tree nodes.
schemes [10, 62], citing the need to identify potential vulnerabilities BOOSTVis [36] and iForest [70] also focus on explaining tree ensemble
and explore attack strategies to identify threats and impacts. These models through the use of multiple coordinated views to help explain
challenges lend themselves well to a visual analytics paradigm, where and explore decision paths. Similarly, recent visual analytics work on
training datasets and models can be dynamically explored against the deep learning [24, 25, 30, 34, 44, 49, 55, 63–65, 68] tackles the issue
backdrop of adversarial attacks. In this paper, we present a visual of the low interpretability of neural network structures and supports
analytics framework (Figure 1) designed to explain model vulnera- revealing the internal logic of the training and prediction processes.
bilities with respect to adversarial attack algorithms. Our framework Model Performance Diagnosis: It is also critical for users to under-
uses modularized components to allow users to swap out various attack stand statistical performance metrics of models, such as accuracy and
algorithms. A multi-faceted visualization scheme summarizes the recall. These metrics are widely-used in the machine learning commu-
attack results from the perspective of the machine learning model and its nity to evaluate prediction results; however, these metrics provide only
corresponding training dataset, and coordinated views are designed to a single measure, obfuscating details about critical instances, failures,
help users quickly identify model vulnerabilities and explore potential and model features [2,69]. To better explain performance diagnostics, a
attack vectors. For an in-depth analysis of specific data instances variety of visual analytics approaches have been developed. Alsallakh
affected by the attack, a locality-based visualization is designed to et al. [1] present a tool for diagnosing probabilistic classifiers through
reveal neighborhood structure changes due to an adversarial attack. To a novel visual design called Confusion Wheel, which is used as a
demonstrate our framework, we explore model vulnerabilities to data replacement for traditional confusion matrices. For multi-class classifi-
poisoning attacks. Our contributions include: cation models, Squares [50] establishes a connection between statistical
performance metrics and instance-level analysis with a stacked view.
W A visual analytics framework that supports the examination, Zhang et al. [69] propose Manifold, a model-agnostic framework that
creation, and exploration of adversarial machine learning attacks; does not rely on specific model types; instead, Manifold analyzes the
input and output of a model through an iterative analysis process of
W A visual representation of model vulnerability that reveals the inspection, explanation, and refinement. Manifold supports a fine-
impact of adversarial attacks in terms of model performance, grained analysis of “symptom” instances where predictions are not
instance attributes, feature distributions, and local structures. agreed upon by different models. Other work has focused on profiling
and debugging deep neural networks, such as LSTMs [56], sequence-
2 R ELATED W ORK to-sequence models [55], and data-flow graphs [65].
Our work focuses on explaining model vulnerabilities in relation to ad- While these works focus on linking performance metrics to input-
versarial attacks. In this section, we review recent work on explainable output instances, other methods have been developed for feature-level
artificial intelligence and adversarial machine learning. analysis to enable users to explore the relations between features and
model outputs. For example, the INFUSE system [28] supports the
2.1 Explainable Artificial Intelligence - XAI interactive ranking of features based on feature selection algorithms and
Due to the dramatic success of machine learning, artificial intelli- cross-validation performances. Later work by Krause et al. [27] also
gence applications have been deployed into a variety of real-world proposed a performance diagnosis workflow where the instance-level
systems. However, the uptake of these systems has been hampered by diagnosis leverages measures of “local feature relevance” to guide the
the inherent black-box nature of these machine learning models [29]. visual inspection of root causes that trigger misclassification.
Users want to know why models perform a certain way, why models As such, the visual analytics community has focused on explain-
make specific decisions, and why models succeed or fail in specific ability with respect to model input-outputs, hidden layers, underlying
instances [18]. The visual analytics community has tackled this problem “black-box” mechanisms, and performance metrics; however, there is
by developing methods to open the black-box of machine learning still a need to explain model vulnerabilities. To this end, Liu et al. [33]
models [5, 35, 38, 39]. The goal is to improve the explainability of present AEVis, a visual analytics tool for deep learning models, which
models, allow for more user feedback, and increase the user’s trust visualizes data-paths along the hidden layers in order to interpret the
in a model. To date, a variety of visual analytics methods have been prediction process of adversarial examples. However, the approach is
developed to support model explainability and performance diagnosis. tightly coupled with generating adversarial examples for deep neural
Model Explainability: Under the black-box metaphor of machine networks, which is not extensible to other attack forms and model
learning, several model-independent approaches have been developed types. Our work builds upon previous visual analytics explainability
in the visual analytics community. EnsembleMatrix [59] supports the work, adopting coordinated multiple views that support various types
visual adjustment of preferences among a set of base classifiers. Since of models and attack strategies. What is unique in our work is the
the base classifiers share the same output protocol (confusion matrices), integration of attack strategies into the visual analytics pipeline, which
the approach does not rely on knowledge of specific model types. In allows us to highlight model vulnerabilities.
EnsembleMatrix, the users are provided a visual summary of the model
outputs to help generate insights into the classification results. The 2.2 Adversarial Machine Learning
RuleMatrix system [45] also focuses on the input-output behavior of a Since our goal is to support the exploration of model vulnerabilities, it
classifier through the use of classification rules, where a matrix based- is critical to identify common attack strategies and model weaknesses.
visualization is used to explain classification criterion. Similarly, model The four main features of an adversary (or attacker) [10, 62] are the
input-output behaviors were utilized in Prospector [29], where the adversary’s Goal, Knowledge, Capability, and Strategy, Figure 2 (Left).
MA ET AL.: EXPLAINING VULNERABILITIES TO ADVERSARIAL MACHINE LEARNING THROUGH VISUAL ANALYTICS 1077

General Aspects Targeted Poisoning Attack how their models may be vulnerable.
Goal Goal
3 D ESIGN OVERVIEW
What do we want from the attack? Make target instances classified as a
desired class Given the key features of an adversary, we have designed a visual
analytics framework that uses existing adversarial attack algorithms as
mechanisms for exploring and explaining model vulnerabilities. Our
Knowledge Knowledge framework is designed to be robust to general adversarial machine
How much do we know about the Perfect-knowledge setting learning attacks. However, in order to demonstrate our proposed
model? (know everything about the model) visual analytics framework, we focus our discussion on targeted data
poisoning attacks [10]. Data poisoning is an adversarial attack that
tries to manipulate the training dataset in order to control the prediction
Capability Capability behavior of a trained model such that the model will label malicious
In which way and to what extend can - Insert specially-crafted instances examples into a desired classes (e.g., labeling spam e-mails as safe).
we manupulate the training? - Limited number of insertions allowed Figure 2 (Right) maps the specific goal, knowledge, capabilities, and
strategies of a poisoning attack to the generalized adversarial attack.
For the purposes of demonstrating our framework, we assume that
Strategy Strategy the attack takes place in a white-box setting, i.e., the attacker has
Binary-Search Attack
full knowledge of the training process. Although the scenario seems
How can we achieve the goal? partial to attackers, it is not unusual for attackers to gain perfect- or
StingRay Attack
near-perfect-knowledge of a model by adopting multi-channel attacks
through reverse engineering or intrusion attacks on the model training
Fig. 2. Key features of an adversary. (Left) The general components servers [7]. Furthermore, in the paradigm of proactive defense, it is
an adversary must consider when planning an attack. (Right) Specific meaningful to use the worst case attack to explore the upper bounds
considerations in a data poisoning attack. of model vulnerability [10]. In terms of poisoning operations on the
training dataset, we focus on causative attacks [4], where attackers are
Goal: In adversarial machine learning, an attacker’s goal can be only allowed to insert specially-crafted data instances. This kind of
separated into two major categories: targeted attacks and reliability insertion widely exists in real-world systems, which need to periodically
attacks. In a targeted attack, the attacker seeks to insert specific collect new training data, examples include recommender systems and
malicious instances or regions in the input feature space and prevent email spam filters [53]. In such attacks, there is a limit to the number
these insertions from being detected [37, 51]. In a reliability attack, the of poisoned instances that can be inserted in each attack iteration, i.e.,
goal of the attacker is to maximize the overall prediction error of the a budget for an attack. An optimal attack attempts to reach its goal by
model and make the model unusable for making predictions [53]. using the smallest number of insertions within the given budget.
Knowledge: The information that can be accessed by an attacker
3.1 Analytical Tasks
plays a significant role in how an attacker will design and deploy attack
operations. The more knowledge an attacker has about a model (victim), After reviewing various literature on poisoning attacks [9, 10, 23, 46, 51,
the more precise an attack can be. In a black-box model, the attacker 53, 57, 60, 62], we extracted common high-level tasks for analyzing poi-
will have imprecise (or even no) knowledge about the machine learning soning attack strategies. These tasks were refined through discussions
model, while in a white-box setting, the attacker will have most (if with our co-author, a domain-expert in adversarial machine learning.
not all) of the information about the model, including the model type,
T1 Summarize the attack space. A prerequisite for many of the
hyper-parameters, input features, and training dataset [10].
algorithms is to set target instances to be attacked in the training dataset.
Capability: The capability of the attacker refers to when and what In our framework, analysts need to be able to identify attack vectors and
the attacker can do to influence the training and testing process to vulnerabilities of the victim model in order to specify target instances.
achieve the attack’s goal. Where the attack takes place (i.e., the stage
of the modeling process - training, testing) limits the capability of T2 Summarize the attack results. By following the well-known
the attacker. For example, poisoning attacks [9, 66] take place during visual information seeking mantra [52], the system should provide
the training-stage, and the attacker attempts to manipulate the training a summary of the attack results after an attack is executed. In data
dataset. Typical operations in data poisoning attacks include adding poisoning, typical questions that the attackers might ask include:
noise instances and flipping labels of existing instances. An evasion
attack [6, 16, 21] takes place during the testing stage. Such an attack is W T2.1 How many poisoning data instances are inserted? What is
intended to manipulate unlabeled data in order to avoid detection in the their distribution? Has the attack goal been achieved yet?
testing stage without touching the training process. In all of these cases,
the attacker is constrained by how much they can manipulate either the W T2.2 What is the performance of the model before and after the
training or test data without being detected or whether the training and attack and is there a significant difference? How many instances
test data are even vulnerable to such attacks. in the training dataset are misclassified by the poisoned model?
Strategy: Given the attacker’s goal, knowledge, and capabilities, all T3 Diagnose the impact of data poisoning. In this phase, the user
that remains is for the attacker to design an attack strategy. An optimal explores the prediction results and analyzes the details of the poisoned
attack strategy can be described as maximizing the attack effectiveness model. Inspired by the recent work in interpretable machine learning [1,
while minimizing the cost of data manipulation or other constraints [43]. 27, 50, 69], we explore the influence of insertion focusing on: attribute
Currently, numerous adversarial machine learning attacks are being changes for individual instances; and drifts of data distributions on
developed, with evasion and poisoning strategies receiving the most features due to poisoning. We consider both instance-level and feature-
attention [60]. In evasion attacks, a common strategy is to add noise level diagnoses when investigating the impact of poisoning data. The
to test data instances. Goodfellow et al. [21] proposed a method to following questions are explored in this phase:
add “imperceptible” noise to an image, which can drastically confuse
a trained deep neural network resulting in unwanted predictions. For W T3.1 At the instance-level, is the original prediction different from
poisoning attacks, the strategies are usually formalized as bi-level the victim model prediction? How close is the data instance to
optimization problems, such as gradient ascending [9] and machine the decision boundary? How do the neighboring instances affect
teaching [43]. Common among these attacks is the goal of manipulating the class label? Is there any poisoned data in the data-instance’s
the trained model, and it is critical for users to understand where and top-k nearest neighbors?
1078 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 26, NO. 1, JANUARY 2020

W T3.2 At the feature-level, what is the impact of data poisoning on Vulnerability Analysis
the feature distributions? Attack Results
Victim Model Poisoning
Algorithm Victim Model Performance
3.2 Design Requirements Victim Training Dataset Inserted Poison
Metrics
1 - - - -
From the task requirements, we iteratively refined a set of framework 2 - - - -
For each
1
2 -
- -
-
-
-
-
-
design requirements to identify how visual analytics can be used to best instance 3 - - - - Vulnerability
3 - - - -
4 - - - - Measures
to support attack analysis and explanation. We have mapped different 4 - - - -
analytic tasks to each design requirement.
Visualizing the Attack Space - D1. The framework should allow Attack Space Analysis
users to upload their victim model and explore vulnerabilities. By
Victim Training Dataset
examining statistical measures of attack costs and potential impact, the 1 - - - -
users should be able to find weak points in the victim model depending 2 - - - -
3 - - - -
on the application scenario, and finally identify desired target instances 4 - - - -
for in-depth analysis in the next step (T1).
Attack Results for All Instances Data Table View
Visualizing Attack Results - D2. To analyze the results of an attack,
the framework should support overview and details-on-demand: Attack Result Analysis
W Model Overview - D2.1, summarize prediction performance for
Feature
e
the victim model as well as the poisoned model (T2.2); View
W Data Instances - D2.2, present the labels of the original and
poisoned data instances (T2.1, T3.1); Model Overview

W Data Features - D2.3, visualize the statistical distributions of data


along each feature (T3.2); Instance View
w
Local Impact View
W Local Impacts - D2.4, depict the relationships between target data
instances and their nearest neighbors (T3.1). Fig. 3. A visual analytics framework for explaining model vulnerabilities
to adversarial machine learning attacks. The framework consists of:
4 V ISUAL A NALYTICS F RAMEWORK vulnerability analysis, attack space analysis, and attack result analysis.
Based on the user tasks and design requirements, we have developed
a visual analytics framework (Figure 3) for identifying vulnerabilities training data in the opposite class {xi |yi = −yt }. The classification
to adversarial machine learning. The framework supports three main model acts as an outlier detector and separates this target from the
activities: vulnerability analysis, analyzing the attack space, and analyz- opposite class −yt . For crafting poisoning instances in a Binary-Search
ing attack results. Each activity is supported by a unique set of multiple attack, the goal is to establish connections between the target and
coordinated views, and the user can freely switch between interfaces the desired class −yt that mitigate the outlyingness of the target. As
and views. All views share the same color mapping in order to establish illustrated in Figure 4, for each iteration, the Binary-Search Attack
a consistent visual design. Negative and positive classes are represented utilizes the midpoint xmid between xt and its nearest neighbor xnn in
by red and blue, respectively, and the dark red and blue colors are used the opposite class, −yt , as a poisoning candidate. If this midpoint is in
for indicating the labels of poisoning data instances. All actions in our the desired class, it is considered to be a valid poisoning instance. This
framework are predicated on the user loading their training data and instance is appended to the original training dataset, and the model is
model. While our framework is designed to be modular to an array of re-trained (θ1 in Step 3 - Figure 4). In this way, the poisoned instances
attack algorithms, different performance and vulnerability measures are iteratively generated, and the classification boundary is gradually
are unique to specific attack algorithms. Thus, for discussion and pushed towards the target until the target label is flipped. Sometimes the
demonstration, we instantiate our framework on data poisoning attacks. midpoint may be outside of the desired class. Under this circumstance,
a reset of the procedure is required by using the midpoint between
4.1 Data-Poisoning Attack Algorithms xmid and xnn as the new candidate.
We focus on the binary classification task described in Figure 4 (a) StingRay Attack. The StingRay attack [57] inserts new copies of
where the training data instances are denoted as x ∈ X , X ⊆ Rn×d existing data instances by perturbing less-informative features. The
with class labels of y ∈ {−1, +1} (we refer to the −1 labels as negative StingRay attack shares the same assumptions and pipeline as the Binary-
and the +1 labels as positive). A classification model θ is trained on Search attack. The main difference between the attacks is how poi-
the victim training dataset, which creates a victim model. For a target soning instances are generated (Step 2, Figure 4). In StingRay, a base
data instance xt and the corresponding predicted label yt = θ(xt ), the instance, xnn , near the target, xt , in the desired class is selected, and a
attacker’s goal is to flip the prediction yt into the desired class −yt copy of the base instance is created as a poisoned candidate. By using
by inserting m poisoning instances P = {pi |pi ∈ Rd , i ∈ [1, m]}. some feature importance measures, a subset of features are selected
We use B to represent the budget, which limits the upper bound of m, for value perturbation on the poisoned candidate. After randomly
i.e., an attacker is only allowed to insert at most B poisoned instances. perturbing the feature values, the poisoned instance closest to the target
To maximize the impact of data poisoning on the classifier, the attack is inserted into the training data.
algorithms craft poisoned instances in the desired class, ypi = −yt .
4.1.2 Attack Results
4.1.1 Attack Strategies
Both attacks insert poisoned data instances into the victim training
Various attack algorithms have been developed to create an optimal set dataset resulting in the poisoned training dataset. The model trained
of P with |P| ≤ B. To demonstrate how attacks can be explored in on this poisoned dataset is called the poisoned model, and we can
our proposed framework, we implement two different attack algorithms explore a variety of performance metrics to help explain the results
(Binary-Search and StingRay) described in Figure 4 (b). of an attack (e.g., prediction accuracy, recall). For data instance level
Binary-Search Attack1 . The Binary-Search Attack [12] assumes that analysis (D2.2), we derive two metrics that can characterize the impact
the target instance xt can be considered as an outlier with respect to the of data poisoning on the model predictions.
1 Forsimplicity, we refer to the Burkard and Lagesse algorithm [12] as Decision Boundary Distance (DBD) [22]: In a classifier, the decision
“Binary-Search Attack” even though it is not named by the original authors. boundary distance is defined as the shortest distance from a data in-
MA ET AL.: EXPLAINING VULNERABILITIES TO ADVERSARIAL MACHINE LEARNING THROUGH VISUAL ANALYTICS 1079

Loop until șpoisoned(xt) = -1, or return “failed” if upper bound B is reached.


Model ș
+1 -1
Step 1: Find the nearest neighbor Step 2: Insert a poisoning instance Step 3: Retrain the model

The Nearest Neighbor xnn Binary-Search StingRay Binary-Search StingRay


Attack Attack Attack Attack
Target xt ș1 ș2

Perturb xnn
xt xt xt
xmid
xnn xnn xnn xp

ymid  ynn
Goal: prevent xt from being classified as +1
Set xmid as the midpoint between xnn and xmid, and go back to Step 1

(a) Attack Setup (b) Binary-Search and StingRay Attack

Fig. 4. An illustration of data poisoning attacks using the Binary-Search and Stingray algorithms. (a) In a binary classification problem, the goal of a
data-poisoning attack is to prevent the target instance, xt , from being classified as its original class. (b) The Binary-Search and StingRay attacks
consist of three main steps: 1) select the nearest neighbor to the target instance, 2) find a proper poisoning candidate, and 3) retrain the model with
the poisoned training data. The procedure repeats until the predicted class label of xt is flipped, or the budget is reached.

Decision Boundary high risk, since only a small amount of poisoned instances can cause
4 steps

3 steps
label flipping in these data instances, and poisoning rates of 20% are
likely infeasible (high risk of being caught). We define three levels for
Unit ball the poisoning rates: 1) high risk (red) - lower than 5%; 2) intermediate
DBD = 3 × step length risk (yellow) - 5% to 20%, and; 3) low risk (green) - more than 20% .
The rows in the table can be sorted by assigning a column as the
Randomly-sampled sorting key. The user can click on one of the checkboxes to browse
directions (infinite steps) details on the data ID, class label, and feature values, Figure 1 (B). In
4 steps
addition, the clicking operation will trigger a dialog to choose between
the two attack algorithms, and the interface for the corresponding attack
Fig. 5. Estimating the decision boundary distance. (Left) Six directional result will be opened in a new tab page below.
vectors are sampled from the unit ball. (Right) For each direction, the
original instance is perturbed one step at a time until it is in the opposite 4.3 Visualizing the Attack Results
class. In this example, the direction highlighted by the red rectangle is
the minimum perturbed step (3 steps) among all the directions. After selecting a target instance and an attack algorithm, the user can
perform an in-depth analysis of the corresponding attack results. To
stance to the decision boundary. Under the assumption of outlyingness visualize the results of the attack, we use four views: model overview,
in the Binary-Search or StingRay attack, DBD is an indication of the instance view, feature view, and kNN graph view.
difficulty of building connections between a target instance and its Model Overview: The model overview provides a summary of the
opposite class. However, it is difficult (and sometimes infeasible) poisoned model as well as a comparison between the original (victim)
to derive exact values of DBD from the corresponding classifiers, and poisoned model (T2, D2.1). The model overview (Figure 1 (C))
especially in non-linear models. We employ a sample-based, model- provides a brief summary of the names of the victim and the poisoned
independent method to estimate the DBDs for the training data as models, the ID of the target data instance, and the class of the poisoned
illustrated in Figure 5. First, with a unit ball centered at the data instances. A radar chart is used to describe the performance of the two
instance, we uniformly sample a set of unit direction vectors from the models. The four elements commonly used in confusion matrices (true
ball. For each vector, we perturb the original instance along the vector negative (TN), false negative (FN), true positive (TP), and false positive
iteratively with a fixed step length, predict the class label with the (FP)) are mapped to the four axes on the left side of the radar chart, and
classifier, and stop if the prediction is flipped. We use the number of accuracy, recall, F1 and ROC-AUC scores are mapped to the right side.
perturbation steps as the distance to the decision boundary. We use the When hovering on the lines, the tooltip shows the detailed values on
product of step length and the minimum steps among all the directions the axes. The two lines in the radar chart can be disabled or enabled by
as an estimation of the DBD for each data instance. clicking on the legends.
Minimum Cost for a Successful Attack (MCSA): To help users Instance View: The instance view illustrates changes in the training
understand the cost of an attack with respect to the budget, we calculate datasets and supports the comparative analysis of predictions made
the minimum number of insertions needed to attack a data instance. by the victim and poisoned models from the perspective of individual
For each data instance, the MCSA is the number of poisoning instances data instances (T3.1, D2.2). The instance view is comprised of two
that must be inserted for a successful attack under an unlimited budget. sub-views, a projection view and an instance attribute view, which
The MCSA value is dependent on the attack algorithm. visualize data instances under the “overview + detail” scheme.
4.2 Visualizing the Attack Space Projection View: The projection view (Figure 1 (D)) provides a global
The data table view (Figure 1 (B)) acts as an entry point to the attack picture of the data distribution, clusters, and relationships between
process. After loading a model, all the training data instances are listed the original and poisoned instances. We apply the t-SNE projection
in the table to provide an initial static check of vulnerabilities (T1, D1). method [40] to the poisoned training dataset. The projection coordinates
Each row represents a data instance in the training dataset, and columns are then visualized in a scatterplot. We share the colors used in the
describe attributes and vulnerability measures which includes the DBD Model Overview, where red is for label predictions in the negative
and MCSA for both the Binary-Search and StingRay attack algorithms, class and blue for the positive class. To support comparisons between
as well as the original and the predicted labels. Inspired by Jagielski the victim and poisoned model, we apply the corresponding poisoning
et al. [23] and Steinhardt et al. [53], we use colored bars for MCSA color to the border of poisoned instances and stripe patterns to the data
to highlight different vulnerability levels based on the poisoning rates, instances whose class prediction changed after the attack.
which is defined as the percentage of poison instances in the entire Instance Attribute View: The instance attribute view (Figure 1 (E))
training dataset. Poisoning rates of lower than 5% are considered to be uses a table-based layout where each row represents the attributes of
1080 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 26, NO. 1, JANUARY 2020

1
1.00 0.50 Dataset Find Build
Length: Classification Probability - - - - kNNs
... kNN Graph
(Most (Most 1
Certain) Uncertain)
2 - - - - 2
3 - - - - * Any Poisoning
Positive Negative
4 - - - - Instance
Class Class
(a) kNN Graph Building

Arrow: from the prediction of the Target and Innocent Instances


Virtual Decision Boundary
victim model to the poisoned one Color: The predicted label
Inner Ring: The class distribution
of kNNs in the victim model Texture: Whether the predicted label
Decision Boundary Distance is flipped from the victim model
Outer Ring: The class distribution
of kNNs in the poisoned model
Size: Classification Probability
Positive On the decision boundary Negative
Poisoning and kNN Instances Edges
Fig. 6. Design of the virtual decision boundaries in the instance attribute
Normal Edge
view. The central vertical line acts as the virtual decision boundary. Two Size: Classification Gray lines for showing the
Probability topological structures
circles representing the prediction results of the victim and the poisoned
models are placed beside the line. In this example, the data instance Lightness: Its total
exported impact value Impact (When edge highlighted)
far away from the decision boundary was classified as positive with a 0 Largest
Color: The source node of the impact
relatively high probability. However, in the poisoned model, the instance Solid small circles for other kNN Gradient: Direction of the impact
crosses the boundary, causing the label to flip. instances Thickness: Impact value

an individual data instance including classification probabilities and (b) Visual Encoding of Node
DBDs from the victim and poisoned model. To conduct a comparison
between the attributes of the victim and poisoned models, we embed an Fig. 7. Visual design of the local impact view. (a) The process of building
illustration of attribute changes into the rows using a virtual decision kNN graph structures. (b) The visual encodings for nodes and edges.
boundary, Figure 6. Here, the vertical central line acts as a virtual
decision boundary and separates the region into two half panes indicat- For the target and innocent instances, we extract their kNNs before
ing the negative and positive class regions. Two glyphs, representing the attack, i.e., the top-k nearest non-poisoned neighbors. This allows
the predictions of the victim and the poisoned models, are placed the user to compare the two sets of kNNs to reveal changes in the local
in the corresponding half panes based on the predicted class labels. structures after inserting poisoned instances.
The horizontal distances from the center dots to the central line are The design of the local impact view is based on a node-link diagram
proportional to their DBDs. To show the direction of change, we link of the extracted kNN graph where the data instances are represented
an arrow from the victim circle to the poisoned circle. Additionally, the as nodes. The coordinates of the nodes are computed with the force-
classification probabilities are mapped to the length of the lines in the directed layout method on the corresponding graph structure. We use
glyph. A set of options are provided in the top right corner of the view three different node glyphs to encode the data instances depending on
for filtering out irrelevant instances based on their types. the instance type (target, poisoned, innocent), Figure 7 (b).
Feature View: The feature view is designed to reflect the relationship For the target and innocent instances, we utilize a nested design
between class features and prediction outputs to help users understand consisting of three layers: a circle, an inner ring, and an outer ring. The
the effects of data poisoning (T3.2, D2.3). In Figure 1 (F), each row in circle is filled with a blue or red color representing the predicted label.
the list represents an individual feature. The feature value distribution is A striped texture is applied to the filled color if the label predicted
visualized as grouped colored bars that correspond to positive, negative, by the poisoned model is different from the victim one, indicating
and poisoning data. To facilitate searching for informative features, the that label flipping has occurred for this data instance. Additionally,
rows can be ranked by a feature importance measure on both the victim the classification probability from the poisoned model is mapped to
and the poisoned models. In our framework, we utilize the feature the radius of the circle. The inner ring uses two colored segments to
weights exported from classifiers as the measure, e.g., weight vectors show the distribution of the two classes in the k-nearest non-poisoning
for linear classifiers and Gini importance for tree-based models. In the neighbors. The outer ring is divided into three segments that correspond
list, the importance values and their rankings from the two models, as to the negative and positive classes and poisoning instances in the kNN.
well as the difference, are shown in the last three columns. For poisoned instances, we use circles that are filled with the
corresponding poisoning color. To depict the total impact on its
Local Impact View: In order to understand model vulnerabilities,
neighborhoods, we map the sum of the impact values due to poisoned
users need to audit the relationship between poisoned instances and
instances to the lightness of the filled color. As in the encoding of the
targets to gain insights into the impact of an attack (T3.1, D2.4). We
target instances, the radius of the poisoned instance circles represent
have designed a local impact view, Figure 1 (G), to assist users in
the classification probability. All other data instances are drawn as
investigating the neighborhood structures of the critical data instances.
small dots colored by their corresponding prediction labels.
For characterizing the neighborhood structures of data instances,
we utilize the k-nearest-neighbor graph (kNN graph), Figure 7 (a), to The edges in the local impact view correspond to measures of relative
represent the closeness of neighborhoods, which can reveal the potential impacts, which are represented by directed curved edges. Inspired by
impact on the nearby decision boundary. A poisoned instance that is the classic leave-one-out cross validation method, the relative impact is
closer to a target may have more impact on the predicted class of the a quantitative measure of how the existence of a data instance (poisoned
target. Such a representation naturally corresponds to the underlying or not) influences the prediction result of another instance with respect
logic of the attack algorithms, which try to influence the neighborhood to the classification probability. Algorithm 1 is used to calculate the
structures of target instances. Our view is designed to help the user impact of a neighbor xnn on a data instance x. First, we train a new
focus on the most influential instances in an attack. To reduce the model with the same parameter settings as the poisoned model; however,
analytical burden, we condense the scale of the kNN graph to contain xnn is excluded. Then, we compute the classification probability of x
only three types of instances as well as their k-nearest neighbors: with this new model. Finally, the relative impact value is calculated as
the absolute difference between the new probability and the previous
1. The target instance, which is the instance being attacked; one. To indicate the source of the impact, we color an edge using the
2. The poisoning instances, and; same color as the impacting data instance. The color gradient maps to
3. The “innocent” instances, whose labels are flipped after an attack, the direction of impact and curve thickness maps to the impact value.
which is a side-effect of poisoning. Additionally, since the kNN graph may not be a fully-connected graph,
MA ET AL.: EXPLAINING VULNERABILITIES TO ADVERSARIAL MACHINE LEARNING THROUGH VISUAL ANALYTICS 1081

1 Data Table View 6 Instance Attribute View 5 Local Impact View

2 The Selected Target 3 Model Overview 4 Projection View

Fig. 8. A targeted attack on hand-written digits. (1) In the data table view, we identify the target instance #152 (2) as a potential vulnerability. (3) In
the model overview, we observe no significant change of the prediction performance after an attack on #152 occurs. (4) In the projection view, the
two classes of instances are clearly separated into two clusters. The poisoning instances (dark blue circles) penetrate class Number 8 and reach the
neighboring region of instance #152. (5) The attack can also be explored in the local impact view where poisoning nodes and the target show strong
neighboring connections. (6) The detailed prediction results for instance #152 are further inspected in the instance attribute view.

we employ dashed curves to link the nodes with the minimum distances implemented in Python Scikit-Learn library [48], using 200 randomly
between two connected components in the kNN graph. sampled images from the numbers 6 and 8, respectively. The value of
k for extracting kNN graphs in the local impact view is set to 7.
Algorithm 1: Computing the impact of xnn on x
Data: training dataset X ; two instances x ∈ X , xnn ∈ X ; Initial Vulnerability Check (T1): After the training dataset and model
previous classification probability of x, px are loaded into the system, vulnerability measures are automatically
Result: The impact value of xnn on x, I(xnn , x) calculated based on all possible attacks from the Binary-Search and
1 θ ← Classifier(X \ {xnn }) StingRay Attack, and results are displayed in the data table view

2 px ← Probability of θ(x) (Figure 8 (1)). By ranking the two columns of MCSAs for each attack
 algorithm, the user finds that the red bar colors indicate that many of
3 I(xnn , x) ← |px − px |
the data instances are at high risk of a low cost poisoning attack. From
The local impact view supports various interactions on the kNN the table, the user can also observe that the accuracy and recall values
graph. Clicking on a node glyph in the local impact view will highlight are not highly influenced by an attack, suggesting that a targeted attack
the connected edges and nodes and fade out other irrelevant elements. on a single instance will not influence prediction performances. To
A tooltip will be displayed as well to show the change of neighboring some extent, this may disguise the behavior of a targeted attack by not
instances before and after the attack. The highlighting effects of data alerting the model owners with a significant performance reduction.
instances are also linked between the projection view and the local
impact view. Triggering a highlighting effect in one view will be Visual Analysis of Attack Results (T2, T3): Next, the user wants to
synchronized in the other one. explore a potential worst case attack scenario. Here, they select the
One limitation in the proposed design is the potential for visual instance with the largest MCSA among all the data instances (instance
clutter once the size of the graph becomes considerably large. In order #152, 3.5% in poisoning rate) (Figure 8 (2)) under the StingRay attack.
to provide a clear entry point and support detail-on-demand analysis, As illustrated in Figure 8 (3), first the user performs a general check of
we support various filters and alternative representations to the visual the model performance (T2.2). In the model overview, the two lines
elements. By default, the edges are replaced by gray lines, which only on four performance metrics in the radar chart overlap, indicating little
indicates the linking relationships between nodes. Users can enable to no model performance impact after a poisoning attack. Next, the
the colored curves mentioned above to examine the impacts with a list user explores the distribution of the poisoning instances (T2.1, T3.1).
of switches, Figure 1 (G.1). Unnecessary types of nodes can also be In the projection results, Figure 8 (4), the poisoning instances span
disabled with the filtering options, Figure 1 (G.2). the border region of two clusters and flip the prediction of the target
instance. However, there are no other innocent instances influenced
5 C ASE S TUDY AND E XPERT I NTERVIEW by the poison insertions. The user can further inspect the impact of at
attack on instance #152 by examining the local impact view, Figure 8
In this section, we present two case studies to demonstrate how our (5). Here, the user can observe that in a poisoning attack on instance
framework can support the analysis of data poisoning attacks from the #152, the neighborhood of #152 must be heavily poisoned, and these
perspective of models, data instances, features, and local structures. We poison insertions establish a connection between the target instance
also summarize feedback from four domain experts. #152 and two other blue instances, leading to label flipping. In this
5.1 Targeted Attack on Hand-written Digits case, the user can identify that the sparsity of the data distribution in the
feature space may be contributing to the vulnerability of instance #152.
Digit recognizers are widely-used in real applications including auto- Finally, the user explores the detailed prediction result of instance #152
graders, automatic mail sorting, and bank deposits. In such a system, by navigating to the instance attribute view (Figure 8 (6)). Here, the
an attacker may wish to introduce reliability issues that can result in user observes that the label has flipped from Number 8 (red) to Number
mis-delivered mail, or create targeted attacks that cause checks to be 6 (blue); however, the poisoning results in a very short DBD and a low
mis-read during electronic deposit. For this case study, we employ a toy classification probability for instance #152.
example in which a model is used to classify hand-written digits. This
case study serves as a mechanism for demonstrating system features. Lessons Learned and Possible Defense: From the analysis, our do-
For this classifier, we utilize the MNIST dataset [31], which contains main expert identified several issues in the victim model and dataset.
60,000 images of ten hand-written digits at a 28×28 pixel resolution First, even if instance #152 is successfully poisoned, the instance is
(784 dimensions in total). We trained a Logistic Regression classifier, fairly near the decision boundary of the poisoned model, which can be
1082 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 26, NO. 1, JANUARY 2020

identified by the low value of DBD and the low classification probability. possibility of flipping innocent instances coupled with a decrease in
If any further data manipulations occur in the poisoned dataset, the prediction performance. In the local impact view (Figure 1 (4)), it can
prediction of the target instance may flip back, i.e., #152 is sensitive be observed that the poisoning instances are also strongly connected to
to future manipulations and the poisoning may be unstable. For the each other in their nearest neighbor graph (T2.1). Additionally, there
attackers, additional protection methods that mitigate the sensitivity are five poisoning instances with a darker color than the others. As
of previous target instances can be adopted by continuously attacking the lightness of poisoning nodes reflects their output relative impact,
neighboring instances, further pushing the decision boundary away these five neighbors of the target instance contributes more to the
from the target, or improving attacking algorithms to insert duplicated prediction results than other poisons. For target instance #40, the outer
poisons near the target. Our domain expert was also interested in the ring consists only of the poisoning color, indicating that it must be
pattern of a clear connection from the two blue instances to instance completely surrounded by poisoning instances in order for the attack
#152 in the local impact view. He noted that it may be due to data spar- to be successful. Additionally, in a successful attack, there would be
sity, where no other instances are along the connection path established more than 20 innocent instances whose label are flipped from spam to
by the poisoning instances, resulting in #152 having a high vulnerability non-spam, which is the main cause of the decreased recall value. After
to poisoning insertions. For defenders who want to alleviate the sparsity examining the details of these instances in Figure 1 (5), we found that
issue and improve the security of the victim model, possible solutions most of their DBDs in the victim model are relatively small, i.e., they
could be to add more validated labeled samples into the original training are close to the previous decision boundary. As such, their prediction
dataset and adopt feature extraction or dimension reduction methods to can be influenced by even a small perturbation of the decision boundary.
reduce the number of the original features. Finally, we conducted a feature-level analysis by browsing the feature
view (T3.2, Figure 1 (6)). We find that for distributions of poisoning
5.2 Reliability Attack on Spam Filters instances along each feature, the variances are quite large on some
For spammers, one of their main goals is to maximize the number of words including “will” and “email”. This suggests that there are large
spam emails that reach the customers’ inbox. Some models, such as gaps between the non-spam emails and instance #40 on these words in
the Naive Bayes spam filter, are extremely vulnerable to data poisoning terms of word frequencies, which could be exploited by attackers when
attacks, as known spammers can exploit the fact that the e-mails they designing the contents of the poisoned emails.
send will be treated as ground truth and used as part of classifier training. Lessons Learned and Possible Defense: From our analysis, our
Since known spammers will have their mail integrated into the modeling domain expert was able to identify several key issues. First, from
process, they can craft poisoned data instances and try to corrupt the the distribution of impact values and classification probabilities among
reliability of the filter. These specially-crafted emails can mislead the the poisoning instances, an interesting finding was that the poisoning
behavior of the updated spam filter once they are selected in the set of instances close to the target are more uncertain (i.e., of low classification
new samples. In this case study, we demonstrate how our framework probability values) and essential to flipping its label. Our domain expert
could be used to explore the vulnerabilities of a spam filter. mentioned that further optimization may be performed by removing
We utilize the Spambase dataset [17] that contains emails tagged poisoning instances far away from the target because their impact and
as non-spam and spam collected from daily business and personal classification uncertainty could be too low to influence the model train-
communications. All emails are tokenized and transformed into 57- ing. Second, even though an attack on instance #40 has the maximum
dimensional vectors containing statistical measures of word frequencies influence on the recall value, there is a large (but not unfathomable)
and lengths of sentences. For demonstration purposes, we sub-sampled cost associated with inserting 51 poisoning instances (poisoning rate =
the dataset into 400 emails, keeping the proportion of non-spam and 12.75%). Given the large attack cost, our domain expert was interested
spam emails (non-spam:spam = 1.538:1) in the original dataset, re- in exploring alternative attacks with similar impacts and lower costs,
sulting in 243 non-spam instances and 157 spam ones. A Logistic such as instance #80 (Figure 1 (7)). A poisoning attack on #80 can result
Regression classifier is trained on the sub-sampled dataset. The value in a reduction of 0.07 on the recall at almost half the cost of #40 (29
of k for the kNN graphs is again set to 7. insertions, poisoning rate = 7.25%). The key takeaway that our analyst
had was that there are multiple viable attack vectors that could greatly
Initial Vulnerability Check (T1): Using the Logistic Regression impact the reliability of the spam filter. Given that there are several
Classifier as our spam-filter model, we can explore vulnerabilities in the critical vulnerable targets, the attackers could perform continuous low-
training data. For spam filters, the recall score (True-Positives / True- cost manipulations to reduce the reliability of the spam filter. This sort
Positives + False-Negatives) is critical as it represents the proportion of approach is typically referred to as a “boiling-frog attack [62]”. Here,
of detected spam emails in all the “true” spams. For a spam filter, a our domain expert noted that the training-sample selection process may
lower recall score indicates that fewer true spam emails are detected need to be monitored.
by the classifier. We want to understand what instances in our training
dataset may be the most exploitable. Here, the user can sort the training 5.3 Expert Interview
data instances by the change in recall score after an attack (Figure 1 To further assess our framework, we conducted a group interview with
(1)). After ranking the two columns of recall in ascending order for our collaborator (E0) and three additional domain experts in adversarial
each attack algorithm, we found that the Binary-Search attack, when machine learning (denoted as E1, E2, and E3). For the interview,
performed on instance #40, could result in a 0.09 reduction in the recall we first introduced the background and goals of our visual analytics
score at the cost of inserting 51 poisoned instances. framework, followed by an illustration of the functions supported by
Visual Analysis of Attack Results (T2, T3): To further understand each view. Then, we presented a tutorial of the analytical flow with the
what an attack on instance # 40 may look like, the user can click on the two case studies described in Section 5.1 and 5.2. Finally, the experts
row of instance #40 and choose “Binary-Search Attack” for a detailed were allowed to freely explore the two datasets (MNIST and Spambase)
attack visualization. In the model overview, Figure 1 (2), we see that in our system. The interview lasted approximately 1.5 hours.
the false negative value representing the undetected spams increased At the end of the interview session, we collected free-form responses
from 16 to 30 (nearly doubling the amount of spam e-mails that would to the following questions:
have gotten through the filter), while the number of detected spams
decreased from 141 to 127. This result indicates that the performance 1. Does the system fulfill the analytical tasks proposed in our work?
of correctly labeling spam emails in the poisoned model is worse than 2. Does our analytical pipeline match your daily workflow?
the victim model (T2.2).
We can further examine the effects of this attack by doing an instance- 3. What are the differences between our visual analytics system and
level inspection using the projection view (T3.1). As depicted in conventional machine learning workflows?
Figure 1 (3), the two classes of points, as well as the poisoned in-
stances, show a heavy overlap. This indicates that there is an increased 4. Is the core information well-represented in the views?
MA ET AL.: EXPLAINING VULNERABILITIES TO ADVERSARIAL MACHINE LEARNING THROUGH VISUAL ANALYTICS 1083

5. Are there any views that are confusing, or that you feel could scientists can use the proposed framework to perform extensive checks
have a better design? on their model training processes in order to enrich the quality of
training datasets. Similarly, security experts can benefit from using our
6. What results can you find with our system that would be difficult framework by actively adopting new attack strategies for the purpose
to discover with non-visualization tools? of penetration testing following the paradigm of “security-by-design”
in proactive defense [10].
Framework: The overall workflow of our framework received positive
Limitations: One major concern in our design is scalability. We have
feedback with the experts noting that the system was practical and
identified issues with both the attack algorithms and visual design.
understandable. E3 appreciated the two-stage (attack space analysis
and attack result analysis) design in the interface, and he conducted Attack Algorithms: The computational efficiency of an attack algorithm
a combination of “general checks + detailed analysis”. E2 noted that has a significant influence on the cost of pre-computing vulnerability
“the stage of attack space analysis gives our domain users a clear sense measures. In order to explore vulnerabilities in data poisoning, every
about the risk of individual samples, so we can start thinking about data instance must undergo an attack. In the two case studies, it takes
further actions to make the original learning models more robust and about 15 minutes to compute the MCSA values for the 400 training data
secure,”. E1 mentioned that the framework could be easily adapted instances. In large-scale datasets, this may make the pre-computation
into their daily workflow and improve the efficiency of diagnosing new infeasible. Sampling methods could be used to reduce the analysis
poisoning attack algorithms. E1 also suggested that it will be more space, and weighted sampling can be adopted to increase the number of
flexible if we can support hot-swapping of attack algorithms to facilitate samples in the potentially vulnerable regions in the feature space. An
the diagnosis process. upper limit on attack costs could also be used so that a poisoning attack
would simply be marked as “failure” if the upper bound is reached.
Visualization: All the experts agreed that the combination of different Furthermore, progressive visual analysis [3, 54] can be employed in the
visualization views can benefit multi-faceted analysis and provide many sampling process, allowing users to conduct a coarse-grained analysis
aspects for scrutinizing the influence of poisoning attacks. E2 was of the samples and then increase sample rates on targeted regions.
impressed by the instance attribute view and felt that the glyphs were
more intuitive than looking at data tables since the changes of distances Visual Design: In our visual design, circles may overlap when the
to the decision boundary can be directly perceived. E3 mentioned number of training data instances is over one thousand. To mitigate
that the local impact view provides essential information on how the visual clutter in the projection result, we have employed semantic
neighboring structures are being influenced. The two-ring design of zooming in the projection view to support interactive exploration
the target and innocent instances provides a clear comparison of two in multiple levels. In the future, various abstraction techniques for
groups of nearest neighbors before and after an attack. E3 further scatterplots such as Splatterplots [42], down-sampling [14], and glyph-
added that the node-link diagram and the visual encoding of impacts based summarization [32] can be integrated to reduce the number of
are effective for tracing the cause of label flipping and the valuable points displayed in the canvas. Interactive filtering can also be adopted
poisoning instances. “With the depiction of impacts, maybe we could to remove the points in less important regions, e.g., far away from the
find how to optimize our attack algorithms by further reducing the target instance. A similar issue also exists in the local impact view
number of insertions, since some of the low-impact poisoning instances where the current implementation supports up to 100 nodes shown as
may be removed or aggregated.” graph structures. The readability of large graph visualizations is still an
open topic in the community. One way to scale our design would be to
Limitations: One issue found by our collaborator, E0, was the training build hierarchical aggregation structures on the nodes by clustering the
time that was necessary for using our framework. E0 commented that corresponding instances with specific criteria [20, 26, 67].
during the first hour of the interview, we were often required to repeat
the visual encoding and functions in the views. However, once the Future Work: In this work, we use the data poisoning attack as the
domain experts became familiar with the system after free exploration main scenario to guide the visual design and interactions in the visual
for some time, they found that the design is useful for gaining insights analytics framework. Based on the successful application in poisoning
from attacks. We acknowledge that there could be a long learning attacks, we plan to adapt our framework for other typical adversarial
curve for domain experts who are novice users in comprehensive visual settings and attack strategies, such as label-flipping attacks [53] where
analytics systems. the labels of training data instances can be manipulated, and evasion
attacks [6, 16, 21] that focus on the testing stage. Another issue of our
6 D ISCUSSION AND C ONCLUSIONS work is that the framework currently does not support the integration
In this work, we propose a visual analytics framework for exploring of known defense strategies. In practice, attack and defense strategies
vulnerabilities and adversarial attacks to machine learning models. By often co-exist and must be simultaneously considered in assessing
focusing on targeted data poisoning attacks, our framework enables vulnerabilities. Future iterations of this framework will incorporate
users to examine potential weak points in the training dataset and defense methods as a post-processing stage to evaluate the vulnerability
explore the impacts of poisoning attacks on model performance. Task and effectiveness of attacks under countermeasures. In addition, since
and design requirements for supporting the analysis of adversarial currently our work only considers classification models for general-
machine learning attacks were identified through collaboration with purpose tasks, another extension would be to specialize our framework
domain experts. System usability was assessed by multiple domain to support domain-specific analyses, such as image recognition, biolog-
experts and case studies were developed by our collaborating domain ical analysis, and network intrusion detection.
scientist. Target users of our framework are data scientists who utilize
machine learning models in mission-critical domains. In contrast to ACKNOWLEDGMENTS
traditional reactive defense strategies that respond when attacks are
This work was supported by the U.S.Department of Homeland Security
detected, our framework serves as a mechanism for iterative proactive
under Grant Award 2017-ST-061-QA0001. The views and conclusions
defense. The users can simulate poisoning operations and explore
contained in this document are those of the authors and should not
attack vectors that have never been seen in the historical records.
be interpreted as necessarily representing the official policies, either
This can enable domain scientists to design more reliable machine
expressed or implied, of the U.S. Department of Homeland Security.
learning models and data processing pipelines. An implementation of
our framework is provided in Github2 .
R EFERENCES
Target Users: The target users of our framework are data scientists
and security experts who wish to explore model vulnerabilities. Data [1] B. Alsallakh, A. Hanbury, H. Hauser, S. Miksch, and A. Rauber. Visual
methods for analyzing probabilistic classification data. IEEE Transactions
2 https://github.com/VADERASU/visual-analytics-adversarial-attacks on Visualization and Computer Graphics, 20(12):1703–1712, 2014.
1084 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 26, NO. 1, JANUARY 2020

[2] S. Amershi, M. Chickering, S. M. Drucker, B. Lee, P. Simard, and J. Suh. ration of industry-scale deep neural network models. IEEE Transactions
Modeltracker: Redesigning performance analysis tools for machine learn- on Visualization and Computer Graphics, 24(1):88–97, 2017.
ing. Proceedings of the ACM Conference on Human Factors in Computing [25] M. Kahng, N. Thorat, D. H. P. Chau, F. B. Viégas, and M. Wattenberg.
Systems, pp. 337–346, 2015. Gan lab: Understanding complex deep generative models using interactive
[3] M. Angelini, G. Santucci, H. Schumann, and H.-J. Schulz. A review visual experimentation. IEEE Transactions on Visualization and Computer
and characterization of progressive visual analytics. Informatics, 5(3):31, Graphics, 25(1):310–320, 2019.
2018. [26] S. Kairam, D. MacLean, M. Savva, and J. Heer. Graphprism: Compact
[4] M. Barreno, B. Nelson, A. D. Joseph, and J. D. Tygar. The security of visualization of network structure. In Proceedings of the International
machine learning. Machine Learning, 81(2):121–148, Nov 2010. Working Conference on Advanced Visual Interfaces, pp. 498–505, 2012.
[5] E. Bertini and D. Lalanne. Surveying the complementary role of automatic [27] J. Krause, A. Dasgupta, J. Swartz, Y. Aphinyanaphongs, and E. Bertini. A
data analysis and visualization in knowledge discovery. Proceedings of the workflow for visual diagnostics of binary classifiers using instance-level
ACM SIGKDD Workshop on Visual Analytics and Knowledge Discovery explanations. In Proceedings of the IEEE Conference on Visual Analytics
Integrating Automated Analysis with Interactive Exploration, pp. 12–20, Science and Technology, pp. 162–172, 2017.
2009. [28] J. Krause, A. Perer, and E. Bertini. INFUSE: Interactive feature selection
[6] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, for predictive modeling of high dimensional data. IEEE Transactions on
G. Giacinto, and F. Roli. Evasion attacks against machine learning at test Visualization and Computer Graphics, 20(12):1614–1623, 2014.
time. In Proceedings of the European Conference on Machine Learning [29] J. Krause, A. Perer, and K. Ng. Interacting with predictions: Visual
and Knowledge Discovery in Databases, pp. 387–402. Springer-Verlag, inspection of black-box machine learning models. In Proceedings of the
2013. CHI Conference on Human Factors in Computing Systems, pp. 5686–5697.
[7] B. Biggio, G. Fumera, and F. Roli. Security evaluation of pattern classifiers ACM, 2016.
under attack. IEEE Transactions on Knowledge and Data Engineering, [30] B. C. Kwon, M.-J. Choi, J. T. Kim, E. Choi, Y. B. Kim, S. Kwon,
26(4):984–996, 2014. J. Sun, and J. Choo. Retainvis: Visual analytics with interpretable and
[8] B. Biggio, G. Fumera, P. Russu, L. Didaci, and F. Roli. Adversarial interactive recurrent neural networks on electronic medical records. IEEE
biometric recognition: A review on biometric system security from the ad- Transactions on Visualization and Computer Graphics, 25(1):299–309,
versarial machine-learning perspective. IEEE Signal Processing Magazine, 2019.
32(5):31–41, 2015. [31] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al. Gradient-based learning
[9] B. Biggio, B. Nelson, and P. Laskov. Poisoning attacks against support applied to document recognition. Proceedings of the IEEE, 86(11):2278–
vector machines. In Proceedings of the International Conference on 2324, 1998.
Machine Learning, pp. 1467–1474, 2012. [32] H. Liao, Y. Wu, L. Chen, and W. Chen. Cluster-based visual abstraction
[10] B. Biggio and F. Roli. Wild patterns: Ten years after the rise of adversarial for multivariate scatterplots. IEEE Transactions on Visualization and
machine learning. Pattern Recognition, 84:317–331, 2018. Computer Graphics, 24(9):2531–2545, Sep. 2018. doi: 10.1109/TVCG.
[11] E. Blanzieri and A. Bryl. A survey of learning-based techniques of email 2017.2754480
spam filtering. Artificial Intelligence Review, 29(1):63–92, 2008. [33] M. Liu, S. Liu, H. Su, K. Cao, and J. Zhu. Analyzing the Noise Robustness
[12] C. Burkard and B. Lagesse. Analysis of causative attacks against svms of Deep Neural Networks. In Proceedings of the IEEE Conference on
learning from data streams. In Proceedings of the 3rd ACM on Interna- Visual Analytics Science and Technology, 2018.
tional Workshop on Security And Privacy Analytics, pp. 31–36. ACM, [34] M. Liu, J. Shi, K. Cao, J. Zhu, and S. Liu. Analyzing the training processes
2017. of deep generative models. IEEE Transactions on Visualization and
[13] G. Caruana and M. Li. A survey of emerging approaches to spam filtering. Computer Graphics, 24(1):77–87, 2018.
ACM Computing Surveys, 44(2):9, 2012. [35] S. Liu, X. Wang, M. Liu, and J. Zhu. Towards better analysis of machine
[14] H. Chen, W. Chen, H. Mei, Z. Liu, K. Zhou, W. Chen, W. Gu, and learning models: A visual analytics perspective. Visual Informatics,
K. Ma. Visual abstraction and exploration of multi-class scatterplots. 1(1):48 – 56, 2017.
IEEE Transactions on Visualization and Computer Graphics, 20(12):1683– [36] S. Liu, J. Xiao, J. Liu, X. Wang, J. Wu, and J. Zhu. Visual diagnosis of
1692, Dec 2014. doi: 10.1109/TVCG.2014.2346594 tree boosting methods. IEEE Transactions on Visualization and Computer
[15] S.-T. Chen, C. Cornelius, J. Martin, and D. H. P. Chau. Shapeshifter: Graphics, 24(1):163–173, 2017.
Robust physical adversarial attack on faster R-CNN object detector. In [37] Y. Liu, S. Ma, Y. Aafer, W.-C. Lee, J. Zhai, W. Wang, and X. Zhang.
Proceedings of the Joint European Conference on Machine Learning and Trojaning attack on neural networks. In Proceedings of the 25th Annual
Knowledge Discovery in Databases, pp. 52–68. Springer, 2018. Network and Distributed System Security Symposium. The Internet Society,
[16] N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma. Adversarial 2018.
classification. In Proceedings of the ACM SIGKDD International Con- [38] J. Lu, W. Chen, Y. Ma, J. Ke, Z. Li, F. Zhang, and R. Maciejewski. Recent
ference on Knowledge Discovery and Data Mining, pp. 99–108. ACM, progress and trends in predictive visual analytics. Frontiers of Computer
2004. Science, 11(2):192–207, Apr 2017.
[17] D. Dua and C. Graff. UCI machine learning repository, 2017. [39] Y. Lu, R. Garcia, B. Hansen, M. Gleicher, and R. Maciejewski. The
[18] A. Endert, W. Ribarsky, C. Turkay, B. L. W. Wong, I. Nabney, I. D. Blanco, state-of-the-art in predictive visual analytics. Computer Graphics Forum,
and F. Rossi. The state of the art in integrating machine learning into 36(3):539–562, 2017.
visual analytics. Computer Graphics Forum, 36(8):1–28, 2017. [40] L. v. d. Maaten and G. Hinton. Visualizing data using t-SNE. Journal of
[19] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, Machine Learning Research, 9(Nov):2579–2605, 2008.
and S. Thrun. Dermatologist-level classification of skin cancer with deep [41] C. M. Martinez, M. Heucke, F.-Y. Wang, B. Gao, and D. Cao. Driving style
neural networks. Nature, 542(7639):115, 2017. recognition for intelligent vehicle control and advanced driver assistance:
[20] M. Freire, C. Plaisant, B. Shneiderman, and J. Golbeck. Manynets: An A survey. IEEE Transactions on Intelligent Transportation Systems,
interface for multiple network analysis and visualization. In Proceedings 19(3):666–676, 2018.
of the CHI Conference on Human Factors in Computing Systems, pp. [42] A. Mayorga and M. Gleicher. Splatterplots: Overcoming overdraw in
213–222. ACM, 2010. scatter plots. IEEE Transactions on Visualization and Computer Graphics,
[21] I. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing 19(9):1526–1538, Sep. 2013. doi: 10.1109/TVCG.2013.65
adversarial examples. In Proceedings of the International Conference on [43] S. Mei and X. Zhu. Using machine teaching to identify optimal training-set
Learning Representations, 2015. attacks on machine learners. In Proceedings of the 29th AAAI Conference
[22] W. He, B. Li, and D. Song. Decision boundary analysis of adversarial on Artificial Intelligence, pp. 2871–2877, 2015.
examples. In Proceedings of the International Conference on Learning [44] Y. Ming, Z. Li, and Y. Chen. Understanding hidden memories of recurrent
Representations, 2018. neural networks. In Proceedings of the IEEE Conference on Visual
[23] M. Jagielski, A. Oprea, B. Biggio, C. Liu, C. Nita-Rotaru, and B. Li. Analytics Science and Technology, pp. 13–24, 2017.
Manipulating machine learning: Poisoning attacks and countermeasures [45] Y. Ming, H. Qu, and E. Bertini. RuleMatrix: Visualizing and understanding
for regression learning. In Proceedings of the IEEE Symposium on Security classifiers with rules. IEEE Transactions on Visualization and Computer
and Privacy, pp. 19–35. IEEE, 2018. Graphics, 25(1):342–352, 2019.
[24] M. Kahng, P. Y. Andrews, A. Kalro, and D. H. Chau. Activis: Visual explo- [46] M. Mozaffari-Kermani, S. Sur-Kolay, A. Raghunathan, and N. K. Jha. Sys-
MA ET AL.: EXPLAINING VULNERABILITIES TO ADVERSARIAL MACHINE LEARNING THROUGH VISUAL ANALYTICS 1085

tematic poisoning attacks on and defenses for machine learning in health- Thumbnails: Identifying and comparing multiple graphs at a glance. IEEE
care. IEEE Journal of Biomedical and Health Informatics, 19(6):1893– Transactions on Visualization and Computer Graphics, 14(8), 2018.
1905, Nov 2015. doi: 10.1109/JBHI.2014.2344095 [68] J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson. Understanding
[47] T. Muhlbacher, H. Piringer, S. Gratzl, M. Sedlmair, and M. Streit. Opening neural networks through deep visualization. In Deep Learning Workshop,
the black box: Strategies for increased user involvement in existing Proceedings of the International Conference on Machine Learning, 2015.
algorithm implementations. IEEE Transactions on Visualization and [69] J. Zhang, Y. Wang, P. Molino, L. Li, and D. S. Ebert. Manifold: A model-
Computer Graphics, 20(12):1643–1652, 2014. agnostic framework for interpretation and diagnosis of machine learning
[48] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, models. IEEE Transactions on Visualization and Computer Graphics,
M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Pas- 25(1):364–373, 2019.
sos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit- [70] X. Zhao, Y. Wu, D. L. Lee, and W. Cui. iForest: Interpreting random
learn: Machine learning in Python. Journal of Machine Learning Research, forests via visual analytics. IEEE Transactions on Visualization and
12:2825–2830, 2011. Computer Graphics, 25(1):407–416, 2019.
[49] P. E. Rauber, S. Fadel, A. Falcao, and A. Telea. Visualizing the hidden
activity of artificial neural networks. IEEE Transactions on Visualization
and Computer Graphics, 23(1):101–110, 2016.
[50] D. Ren, S. Amershi, B. Lee, J. Suh, and J. D. Williams. Squares: Sup-
porting interactive performance analysis for multiclass classifiers. IEEE
Transactions on Visualization and Computer Graphics, 23(1):61–70, 2017.
[51] A. Shafahi, W. R. Huang, M. Najibi, O. Suciu, C. Studer, T. Dumitras,
and T. Goldstein. Poison frogs! Targeted clean-label poisoning attacks on
neural networks. In Proceedings of the 32nd International Conference on
Neural Information Processing Systems, pp. 6106–6116, 2018.
[52] B. Shneiderman. The eyes have it: a task by data type taxonomy for
information visualizations. In Proceedings of IEEE Symposium on Visual
Languages, pp. 336–343, Sep. 1996. doi: 10.1109/VL.1996.545307
[53] J. Steinhardt, P. W. Koh, and P. Liang. Certified defenses for data poisoning
attacks. In Proceedings of the 31st International Conference on Neural
Information Processing Systems, pp. 3520–3532, 2017.
[54] C. D. Stolper, A. Perer, and D. Gotz. Progressive visual analytics: User-
driven visual exploration of in-progress analytics. IEEE Transactions on
Visualization and Computer Graphics, 20(12):1653–1662, 2014.
[55] H. Strobelt, S. Gehrmann, M. Behrisch, A. Perer, H. Pfister, and A. M.
Rush. Seq2seq-Vis: A visual debugging tool for sequence-to-sequence
models. IEEE Transactions on Visualization and Computer Graphics,
25(1):353–363, 2019.
[56] H. Strobelt, S. Gehrmann, H. Pfister, and A. M. Rush. LSTMVis: A tool
for visual analysis of hidden state dynamics in recurrent neural networks.
IEEE Transactions on Visualization and Computer Graphics, 24(1):667–
676, 2018.
[57] O. Suciu, R. Marginean, Y. Kaya, H. Daume III, and T. Dumitras. When
does machine learning fail? Generalized transferability for evasion and
poisoning attacks. In Proceedings of the USENIX Security Symposium, pp.
1299–1316, 2018.
[58] K. Sundararajan and D. L. Woodard. Deep learning for biometrics: A
survey. ACM Computing Surveys, 51(3):65, 2018.
[59] J. Talbot, B. Lee, A. Kapoor, and D. S. Tan. EnsembleMatrix: Interactive
visualization to support machine learning with multiple classifiers. Pro-
ceedings of the International Conference on Human factors in Computing
Systems, p. 1283, 2009.
[60] S. Thomas and N. Tabrizi. Adversarial machine learning: A literature
review. In P. Perner, ed., Proceedings of the Machine Learning and
Data Mining in Pattern Recognition, pp. 324–334. Springer International
Publishing, 2018.
[61] S. van den Elzen and J. J. van Wijk. BaobabView: Interactive construction
and analysis of decision trees. In Proceedings of the IEEE Conference on
Visual Analytics Science and Technology, pp. 151–160, 2011.
[62] Y. Vorobeychik and M. Kantarcioglu. Adversarial Machine Learning. Syn-
thesis Lectures on Artificial Intelligence and Machine Learning. Morgan
& Claypool Publishers, 2018.
[63] J. Wang, L. Gou, H. W. Shen, and H. Yang. DQNViz: A visual ana-
lytics approach to understand deep q-networks. IEEE Transactions on
Visualization and Computer Graphics, 25(1):288–298, 2019.
[64] J. Wang, L. Gou, H. Yang, and H. W. Shen. GANViz: A visual analytics
approach to understand the adversarial game. IEEE Transactions on
Visualization and Computer Graphics, 24(6):1905–1917, 2018.
[65] K. Wongsuphasawat, D. Smilkov, J. Wexler, J. Wilson, D. Mane, D. Fritz,
D. Krishnan, F. B. Viegas, and M. Wattenberg. Visualizing dataflow
graphs of deep learning models in TensorFlow. IEEE Transactions on
Visualization and Computer Graphics, 24(1):1–12, 2017.
[66] H. Xiao, H. Xiao, and C. Eckert. Adversarial label flips attack on support
vector machines. In Proceedings of the European Conference on Artificial
Intelligence, pp. 870–875, 2012.
[67] V. Yoghourdjian, T. Dwyer, K. Klein, K. Marriott, and M. Wybrow. Graph

You might also like