MANUSCRIPT SUBMITTED TO IEEE INTERNET OF THINGS JOURNAL, 2021

Adversarial Diffusion Attacks on Graph-based Traffic Prediction Models

Lyuyi Zhu, Kairui Feng, Ziyuan Pu, Member, IEEE, Wei Ma, Member, IEEE

L. Zhu is with the College of Civil Engineering and Architecture, Zhejiang University, Hangzhou, China (E-mail: [email protected]).
K. Feng is with the Department of Civil and Environmental Engineering, Princeton University, NJ, U.S.A. (E-mail: [email protected]).
Z. Pu is with the School of Engineering, Monash University, Jalan Lagoon Selatan, 47500 Bandar Sunway, Malaysia (E-mail: [email protected]).
W. Ma is with the Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China (E-mail: [email protected]).
Manuscript received: April 20, 2021.
arXiv:2104.09369v1 [cs.LG] 19 Apr 2021

Abstract: Real-time traffic prediction models play a pivotal role in smart mobility systems and have been widely used in route guidance, emerging mobility services, and advanced traffic management systems. With the availability of massive traffic data, neural network-based deep learning methods, especially the graph convolutional networks (GCN), have demonstrated outstanding performance in mining spatio-temporal information and achieving high prediction accuracy. Recent studies reveal the vulnerability of GCN under adversarial attacks, while there is a lack of studies to understand the vulnerability issues of the GCN-based traffic prediction models. Given this, this paper proposes a new task, the diffusion attack, to study the robustness of GCN-based traffic prediction models. The diffusion attack aims to select and attack a small set of nodes to degrade the performance of the entire prediction model. To conduct the diffusion attack, we propose a novel attack algorithm, which consists of two major components: 1) approximating the gradient of the black-box prediction model with Simultaneous Perturbation Stochastic Approximation (SPSA); 2) adapting the knapsack greedy algorithm to select the attack nodes. The proposed algorithm is examined with three GCN-based traffic prediction models: ST-GCN, T-GCN, and A3T-GCN on two cities. The proposed algorithm demonstrates high efficiency in the adversarial attack tasks under various scenarios, and it can still generate adversarial samples under drop regularization such as DropOut, DropNode, and DropEdge. The research outcomes could help to improve the robustness of the GCN-based traffic prediction models and better protect the smart mobility systems.

Index Terms: Traffic Prediction, Deep Learning, Graph Convolutional Network, Adversarial Attack, Intelligent Transportation Systems

I. INTRODUCTION

People's activities and movements in smart cities rely on accurate, robust, and real-time traffic information. With massive data collected in the intelligent transportation systems (ITS), various methods, such as time series models, state-space models, and deep learning, have been developed to carry out the short-term prediction for traffic operation and management [1]. Among these methods, deep learning methods, especially the graph convolutional networks (GCN), achieve state-of-the-art accuracy and are widely employed in industry-level smart mobility applications. For example, DeepMind has partnered with Google Maps to improve the accuracy of real-time Estimated Time of Arrival (ETA) prediction using GCN [2]. The predicted traffic information plays a critical role in our daily traveling, and travelers take for granted that the predicted results are accurate and trustworthy. However, the robustness and vulnerability issues of these deep learning models have not been investigated for traffic prediction models. Recent studies have shown that neural networks are vulnerable to deliberately designed samples, which are known as adversarial samples. In general, the adversarial samples could be generated by adding imperceptible perturbations to the original data sample. Though the adversarial sample is very similar to its original counterpart, it can significantly change the performance of the deep learning models. Szegedy et al. (2013) first discovered this phenomenon on deep neural networks (DNN), and they found that adversarial samples are low-probability but densely distributed [3]. Goodfellow et al. [4] also showed that neural networks are vulnerable to the adversarial samples in the sense that it is sufficient to generate adversarial samples when DNNs demonstrate linear behaviors in high-dimensional spaces.

Due to the existence of the adversarial samples, potential attackers could take advantage of the deep learning models and degrade the model performance. Though related theories and applications have been studied in various areas such as computer vision [5], social networks [6], traffic signs [7], and recommendation systems [8], few of the studies have investigated the vulnerability and robustness of the traffic prediction systems. It has been shown that industry-level traffic information systems can be "attacked" easily. Recently, a German artist walked slowly with a handcart, which was loaded with 99 smartphones. On each smartphone, the mobile application Google Maps was turned on. The 99 cell phones virtually created 99 vehicles on the roads, and all the "vehicles" were slowly moving along the road. As Google Maps estimated and predicted the traffic states based on the data sent back from those cell phones, it wrongly identified an empty street (green) to be a congested road (red), as shown in Fig. 1. Though it is unclear which model Google Maps is using, this experiment indeed reveals the possibility of adversarial attacks on real-world and industry-level traffic information systems.

Adversarial attacks on traffic prediction models can affect every aspect of the smart mobility systems. We summarize the following four scenarios in which adversarial attacks can significantly degrade the performance of the systems.
• Smartphone-based mobility applications. Mobile phone-based mapping services such as Google Maps and AutoNavi make traffic state estimation and prediction based on the GPS trajectories sent from their users [10].
However, users' mobile phones can be hacked and the information can be deliberately altered to attack the systems [11]. In general, these individual mobility applications are vulnerable to adversarial attacks because hacking users' mobile phones is far less difficult than hacking the application server.
• Connected vehicle (CV) systems. In the CV systems, traffic information is collected by the roadside units (RSUs) and sent to the traffic control center [12]. CVs use information from either the RSUs or the control center to plan routes and avoid congestion. However, it is possible to hack some of the RSUs to send the adversarial samples to manipulate the predicted traffic information from the control center. Attackers could make use of adversarial attacks to benefit a group of vehicles while causing unexpected delays to other vehicles.
• Emerging mobility services. Transportation network companies (TNCs) depend on accurate traffic speed prediction for vehicle dispatching, routing, and relocation on the central platform. It is possible that a group of vehicles collude with each other and falsely report their GPS trajectories and speed to confuse the central platform. By carefully designing the adversarial samples with purpose, this group of vehicles could take advantage of the central platform by receiving more orders and running on less congested roads.
• Advanced Traffic Management Systems (ATMS). Most of the network-wide transportation management systems [13] rely on user equilibrium (or stochastic user equilibrium) models to depict and predict travelers' behaviors, and both models assume that travelers can acquire accurate (or nearly accurate) traffic information. Under adversarial attacks, this assumption no longer holds. The inaccurately predicted traffic information can reduce the effectiveness of the ATMS and degrade the efficiency of the entire network.

Fig. 1: Google Maps Hacks using 99 cell phones [9].

Adversarial attack for traffic prediction systems is a unique task that is different from existing literature. In this paper, we focus on the GCN-based traffic prediction models. Previous literature aims at attacking one node by modifying the features of all the nodes in the GCN-based neural networks, while this is impractical on traffic prediction models. On real-world traffic networks, modifying the features on all nodes is challenging and costly, and our objective is to degrade the network-wide system performance instead of a single node. A practical task is to degrade the overall system performance by perturbing the features on a small subset of nodes, which can be viewed as the opposite of the conventional adversarial attack problems. To this end, we propose a novel concept, the diffusion attack, and its definition is presented in Definition 1.

Definition 1 (Diffusion Attack). Considering a GCN-based deep neural network, the input features are associated with each node of the corresponding graph. The diffusion attack aims to modify the input features on a limited set of nodes while keeping the graph topology unchanged, and the goal is to degrade the overall performance of the neural network on all the nodes.

An illustration of the diffusion attack is presented in Fig. 2. On the left-hand side, the prediction model performs normally and can generate accurate predictions. After the diffusion attack, two nodes in red are attacked, and their nearby neighbors are strongly perturbed, followed by their 2-hop neighbors being slightly perturbed. One can see that the attack effects diffuse from the node being attacked to its neighbors, and later we will develop mathematical proofs and numerical experiments to verify this phenomenon. The purpose of attackers is to select the optimal attack nodes and to generate adversarial samples to maximize attack effects.

Fig. 2: An illustration of the diffusion attack. (Legend: attack node; strongly perturbed; slightly perturbed; predicted speed unchanged.)

To summarize, vulnerability issues of the traffic prediction models are critical to smart mobility systems, while the related research is still lacking. Given this, we explore the robustness of the GCN-based traffic prediction models. Based on the
unique characteristics of the adversarial attacks on traffic prediction models, we propose the concept of diffusion attack, which aims to attack a small set of nodes to degrade the performance of the entire network. On top of that, we develop a diffusion attack algorithm, which consists of two major components: 1) approximating the gradient of the black-box prediction model with Simultaneous Perturbation Stochastic Approximation (SPSA); 2) adapting the knapsack greedy algorithm to select nodes to attack. The proposed algorithm is examined with three GCN-based traffic prediction models: ST-GCN, T-GCN, and A3T-GCN on two cities: Los Angeles and Hong Kong. The proposed algorithm demonstrates high efficiency in the diffusion attack tasks under various scenarios, and the algorithm can still generate adversarial samples under different drop regularization. We further discuss how to improve the robustness of the GCN-based traffic prediction models, and the research outcomes could help to better protect the smart mobility systems in both the cyber and physical world. The contributions of this study are summarized as follows:
• Different from existing adversarial attack tasks on graphs, we propose a novel task of diffusion attack, which aims to select and attack a small set of nodes to degrade the performance of the entire traffic prediction model. This task is suitable for the traffic prediction context, while it is overlooked in the existing literature.
• To generate the adversarial samples on traffic prediction models, we propose the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm to efficiently approximate gradients of the black-box prediction models.
• To select the optimal attack nodes, we formulate the diffusion problem as a knapsack problem and then adapt the greedy algorithm to determine the priority of the attack nodes.
• We conduct extensive experiments to attack the widely-used traffic prediction models on Los Angeles and Hong Kong. Different drop regularization strategies (e.g., DropOut, DropEdge, DropNode) for defending against the adversarial attacks are also tested to ensure that the proposed algorithm can still generate effective adversarial samples in various scenarios.

The remainder of this paper is organized as follows. Section II reviews the related studies on traffic prediction, GCN models, and adversarial attacks on graphs. Section III rigorously formulates the diffusion attack problem and presents the developed attack algorithms. In Section IV, three traffic prediction models and two real-world datasets are used to examine the proposed attack algorithms. Finally, conclusions and future research are summarized in Section V.

II. RELATED WORKS

In this section, we first overview the traffic prediction tasks and then summarize recent studies on GCN-based traffic prediction models. Robustness issues of the GCN-based prediction models under adversarial attacks are also discussed.

A. Traffic Prediction

The traffic prediction problem has been extensively studied for decades, and various statistical models have been developed to solve the problem, including History Average (HA) [14], Autoregressive Integrated Moving Average (ARIMA) [15]–[19], Support Vector Regression (SVR) [20], clustering [21], and Kalman filtering [22], [23]. In recent years, the data scale has become large and the spatio-temporal correlation of the data has become complicated, and hence traditional statistical methods reveal their limits in the face of the massive and complex data. Instead, neural network models demonstrate potential in traffic prediction with multi-source data on large-scale networks. Various neural network models have been used for traffic prediction, including Convolutional Neural Network (CNN) [24], Recurrent Neural Network (RNN) [25]–[27], attention [28], [29], and Graph Convolutional Network (GCN) [30]–[33].

Traffic prediction tasks can also be categorized by purpose, such as traffic state prediction, demand prediction, and trajectory prediction [34]. Traffic state prediction includes the prediction of traffic flow [28], speed [30], and travel time [35]. Traffic demand prediction aims to predict the number of users and traffic demand, such as taxi requests [36], subway inflow/outflow [37], bike-sharing demand [38], [39], and origin-destination demand [40]. It is also possible to predict the trajectories of travelers and vehicles, and this task is used for dynamic positioning and resource allocation [41], [42]. Overall, most of the traffic prediction tasks can be carried out by neural network models, and hence it is crucial to study their vulnerability issues.

B. GCN and its Applications on Traffic Prediction

Traffic data is closely associated with the topological structure of the road networks, and hence it is typical graph-based data. The graph-based data is represented in the non-Euclidean space, and conventional machine learning methods (e.g., multi-layer perceptron) overlook the graph-based inter-relationship among data [43]. In this paper, we summarize that traffic data consists of the following two types of information:
• Spatial Information. Traffic-related data can be presented on a graph $G = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{1, 2, \cdots, N\}$ represents a set of nodes with $N = |\mathcal{V}|$, and $\mathcal{E}$ denotes a set of edges. In traffic prediction, one way to construct the graph is to make each node $v_i$ represent a road segment, and each edge represents the connectivity relationship between the road segments [30]. We further define the adjacency matrix $A \in \{0, 1\}^{N \times N}$ on the graph $G$, where $A_{ij} = 1$ when node $i$ connects node $j$, and 0 otherwise.
• Temporal Information. Traffic data, such as traffic speed, density, and flow, on each node can be viewed as a time series. The traffic data on the graph is represented by a feature matrix $X \in \mathbb{R}^{N \times S}$, where $S$ denotes the number of time intervals in the study period.

Graph Convolutional Network (GCN) demonstrates great capacities in learning graph-based information for various applications. Short-term traffic prediction is one of the most important and practical applications. The GCN can be used to
extract the spatial (non-Euclidean) relationship among nodes [30], and it can couple with recurrent neural networks (RNN) to learn the spatio-temporal patterns of the traffic data. For example, Long Short-Term Memory (LSTM) is widely used with GCN for traffic prediction [44], and the Gated Recurrent Unit (GRU) is also adopted to model the time series on each node [30]. Some emerging models for time-series modeling, such as gated CNN and attention, can also be incorporated into the GCN-based deep learning framework [28], [31]–[33]. Readers are referred to [34] for a comprehensive review of the GCN-based traffic prediction methods.

As the backbone of a GCN-based model, the graph convolutional layer is defined as follows:

\[ H^{(l+1)} = \sigma\left(\hat{A} H^{(l)} W^{(l)}\right), \]

where $l$ is the layer index, $H^{(0)} = X$, and $\hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ is the symmetrically normalized adjacency matrix with self-loops (loosely called the Laplacian matrix). $\tilde{D}$ is the corresponding degree matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, and $\tilde{A} = A + I_N$, where $I_N$ is an $N \times N$ identity matrix. $\sigma(\cdot)$ represents the activation function and $W^{(l-1)}$ are the parameters of the $l$th layer.

An $L$-layer GCN model can be expressed as follows:

\[ Y = f(X, A) = g\left(\hat{A} \cdots \sigma\left(\hat{A} X W^{(0)}\right) \cdots W^{(L-1)}\right), \tag{1} \]

where $g(\cdot)$ is a generalized function, $Y \in \mathbb{R}^{N \times T}$, and the parameters $W^{(l-1)}$ for each layer could be learned by minimizing the loss between the estimated $Y$ and the true $Y_{\text{true}}$, represented by $\mathcal{L}(Y, Y_{\text{true}})$. This paper will adopt the GCN as the backbone model and study its robustness and vulnerability issues under adversarial attacks.

C. Adversarial Attacks On Graphs

Like other neural networks, GCN is vulnerable to adversarial attacks. Various attack concepts on graphs have been proposed, such as targeted and non-targeted attacks, structure and feature attacks, and poisoning and evasion attacks. Targeted attacks aim to attack a target node [45], while the non-targeted attacks aim to compromise the global performance of a model [46]–[49]. Structure attacks modify the structure of the graph, such as adding or deleting nodes/edges [47], [50], [51], and feature attacks perturb the labels/features of nodes without changing the connectivity of the graph [46], [52], [53]. Poisoning attacks modify the training data [46], and evasion attacks insert adversarial samples when using the models [47], [48], [52], [54]. To be specific, attacks on traffic prediction systems are non-targeted, feature, and evasion attacks, which have not been well studied in the existing literature.

The attack algorithms can be further categorized into white-box attacks and black-box attacks. In white-box attacks, the neural network structure and parameters, training methods, and training samples are exposed to attackers. Attackers could utilize the neural network model to generate adversarial samples. Many white-box methods achieve great performance, such as fast gradient sign method (FGSM) [4], Jacobian-based saliency map approach (JSMA) [55], Carlini and Wagner Attacks (CW) [56], and Deepfool [57]. In contrast, in black-box attacks, attackers know little about internal information of the target neural network model, especially the model structure and parameters, and only the model input and output are exposed to the attackers. For example, if attackers plan to attack the traffic prediction system, they are unlikely to know the internal structure or information of the prediction model.

Though it is more common in the real world, the black-box attack is much more challenging than the white-box attack. Black-box attacks can be achieved through response surface models [58] and meta-heuristics. For instance, the one-pixel attack [5] uses the Differential Evolution (DE) algorithm and the decision-based attack [59] uses Covariance Matrix Adaptation Evolutionary Strategies (CMA-ES) to generate and improve the adversarial samples by iteration. It is also possible to approximate the gradient of the target models, and the representative models include Zeroth Order Optimization (ZOO) [60], Autoencoder-based Zeroth Order Optimization Method (AutoZOOM) [61], and Natural Evolutionary Strategies (NES) [62]. Besides, the semi-black-box attacks are in between, and it is assumed that the information of the prediction models is partially observed [63]. The above attack algorithms mainly focus on the classification task, while attack methods for traffic prediction models (i.e., regression tasks) are still lacking.

III. PROPOSED WORKS

In this section, we first present the general formulation of the adversarial attack on graphs. Then, the new concept of diffusion attack is proposed for traffic prediction models. Lastly, we propose the new formulation and algorithm to construct the diffusion attack and discuss its implementations.

A. Preliminaries

Traffic prediction on graphs can be regarded as a graph-based regression problem [34]. We use $f: \mathbb{R}^{N \times S} \to \mathbb{R}^{N \times T}$ in Equation 1 as a regression model (e.g., a traffic prediction model) on graph $G$, where $f$ contains an $L$-layer GCN, $S$ represents the look-back time interval, and $T$ is the number of time intervals to predict. The feature matrix $X$ contains the historical traffic states, and $Y$ is the traffic states we want to predict. For traffic prediction problems, $Y$ mainly depends on $X$ as most of $A$ are fixed [34]. We further denote $X = (x_1; \cdots; x_N)$ and $Y = f(X) = (y_1; \cdots; y_N)$, where $x_i$ and $y_i$ are the $i$th rows of $X$ and $Y$, respectively. For node $i$, $x_i$ represents a time series of speed on node $i$. The prediction model for node $i$ can be written as $x_i = (x_{i1}, x_{i2}, \cdots, x_{iS}) \mapsto y_i = (y_{i1}, y_{i2}, \cdots, y_{iT})$, where $x_{i\cdot}$ is the historical traffic states, and $y_{i\cdot}$ is the future traffic states on node $i$.

B. Diffusion Attacks on Traffic State Forecasting Models

This paper aims to attack the traffic prediction system by adding perturbations on the feature matrix $X$, to maximally change the prediction results over a selected set of nodes. As discussed above, the graph structure is fixed and cannot be changed easily for the problem of traffic prediction, so we assume that $A$ is fixed throughout the paper.

We construct an adversarial sample by adding the perturbation $U$, which is the same size as $X$, to the original input
feature matrix $X$, as represented by $X' = X + U$. Consequently, the corresponding output changes from $Y = f(X)$ to $Y' = f(X + U)$.

We suppose that attackers select a set of nodes $\mathcal{P} \subseteq \mathcal{V}$ to attack. For each node $i$,

\[ u_i \begin{cases} \neq 0 & i \in \mathcal{P} \\ = 0 & i \notin \mathcal{P} \end{cases}, \]

where $u_i$ is the $i$th row of $U$. Then the adversarial sample can be expressed as follows:

\[ X' = X + U = (x_1 + u_1; \cdots; x_i + u_i; \cdots; x_N + u_N). \]

By perturbing the node $i$, we change the original prediction result on node $i$ (denoted as $y_i$) to $y_i'$. The attack influence function $\phi_i(U)$ on node $i$ is defined as follows:

\[ \phi_i(U) = \mathcal{L}\left(y_i'(X'), y_i(X)\right), \tag{2} \]

where we use $y_i(X)$ to indicate that $y_i$ is a function of $X$. $\mathcal{L}(\cdot, \cdot)$ represents the loss function between $y_i'$ and $y_i$. Here the attack influence evaluates the difference between the original prediction (instead of the true speed) and the perturbed prediction. The reason is that we assume the prediction is accurate enough, otherwise there is no need to attack. On the other hand, we could never know the true traffic condition in the future, so it is impossible to perturb the prediction against true values.

To mathematically characterize the diffusion phenomenon when attacking the GCN, we demonstrate that the following Proposition 1 holds.

Proposition 1. Using the $L$-layer GCN model presented in Equation 1 for traffic prediction, the effect of perturbation $U$ on each node $i$, which is denoted as $\phi_i(U)$, depends on the perturbations of its $L$-hop neighbors.

Proof. Using $|\cdot|$ to represent the element-wise absolute value operator, and assuming that $g(\cdot)$ is Lipschitz continuous with constant $M$, $[\cdot]_i$ is the $i$th row of a matrix, $\sigma(\cdot)$ is the ReLU function, and $\mathcal{L}$ represents the Mean Squared Error (MSE), we have

\[
\begin{aligned}
\phi_i(U) &= \left\| y_i'(X') - y_i(X) \right\|_2^2 \\
&= \left\| \left[ g\left(\hat{A} H'^{(L-1)} W^{(L-1)}\right) - g\left(\hat{A} H^{(L-1)} W^{(L-1)}\right) \right]_i \right\|_2^2 \\
&\le M \left\| \left[ \hat{A} \left(H'^{(L-1)} - H^{(L-1)}\right) W^{(L-1)} \right]_i \right\|_2^2 \\
&\le M \left\| \left[ \hat{A} \left|H'^{(L-1)} - H^{(L-1)}\right| \left|W^{(L-1)}\right| \right]_i \right\|_2^2 \\
&\le M \left\| \left[ \hat{A} \left|H'^{(L-1)} - H^{(L-1)}\right| \right]_i \right\|_2^2 \left\| \left|W^{(L-1)}\right| \right\|_2^2 \\
&\le M W \left\| \left[ \hat{A} \left|H'^{(L-1)} - H^{(L-1)}\right| \right]_i \right\|_2^2 \\
&\le M W^2 \left\| \left[ \hat{A}^2 \left|H'^{(L-2)} - H^{(L-2)}\right| \right]_i \right\|_2^2 \\
&\le M W^L \left\| \left[ \hat{A}^L \left|H'^{(0)} - H^{(0)}\right| \right]_i \right\|_2^2 \\
&= M W^L \left\| \hat{A}^L_{i\cdot} |U| \right\|_2^2,
\end{aligned}
\]

where $\left\| |W^{(l)}| \right\|_2^2 \le W$ and $\hat{A}^L_{i\cdot}$ represents the $i$th row of $\hat{A}^L$. Here we can see that the upper bound of the attack influence $\phi_i(U)$ associates closely with $\left\| \hat{A}^L_{i\cdot} |U| \right\|_2^2$. Denoting $|U|_{\cdot k}$ as the $k$th column of $|U|$, we can expand $\left\| \hat{A}^L_{i\cdot} |U| \right\|_2^2$ as follows:

\[
\left\| \hat{A}^L_{i\cdot} |U| \right\|_2^2
= \left\| \left[ \hat{A}^L_{i\cdot} |U|_{\cdot 1}, \cdots, \hat{A}^L_{i\cdot} |U|_{\cdot k}, \cdots, \hat{A}^L_{i\cdot} |U|_{\cdot S} \right] \right\|_2^2
= \sum_{k=1}^{S} \left( \hat{A}^L_{i\cdot} |U|_{\cdot k} \right)^2 .
\]

For each $k$ we can further expand it as the sum of perturbations on different nodes:

\[ \hat{A}^L_{i\cdot} |U|_{\cdot k} = \sum_{h=1}^{N} \hat{A}^L_{ih} |U|_{hk}, \]

where $|U|_{hk}$ is the absolute value of the perturbation added on the $k$th historical traffic state of node $h$. $\hat{A}^L_{ih}$ represents the normalized connectivity weight between node $i$ and node $h$ (similar to $A^L_{ih}$, which represents the number of $L$-hop paths between $i$ and $h$, according to graph theory). If $\hat{A}^L_{ih}$ is large, node $h$ will have more impact on $\phi_i(U)$. In particular, if node $h$ is outside the $L$-hop neighbors of node $i$, then $\hat{A}^L_{ih} \equiv 0$, which means that the attack effect will not diffuse to node $i$ when attacking $h$. For most GCN-based traffic prediction models $L \le 3$, so the effect of $U$ only diffuses to its local neighbors.

As discussed in the previous section, this paper focuses on the novel concept of diffusion attack, which aims at changing the network-wide prediction results by perturbing a small subset of node features. Mathematically, the diffusion attack problem is to find the optimal $\mathcal{P}$ and the corresponding perturbation $U$ such that the following influence function $\Phi(U)$ is maximized:

\[
\begin{aligned}
\max_{U, z} \quad & \Phi(U) = \sum_{i \in \mathcal{V}} w_i \phi_i(U) \\
\text{s.t.} \quad & -\varepsilon^- z_i x_i \le u_i \le \varepsilon^+ z_i x_i, && \forall i \in \mathcal{V} \\
& \sum_{i \in \mathcal{V}} b_i z_i \le B \\
& z_i \in \{0, 1\}, && \forall i \in \mathcal{V}
\end{aligned}
\tag{3}
\]

where $z_i$ indicates whether node $i$ is attacked. To be precise, $z_i = 1$ if $i \in \mathcal{P}$ and $z_i = 0$ if $i \notin \mathcal{P}$. $b_i$ represents the cost of attacking node $i$, and $B$ is the total budget. The objective function $\Phi(U)$ represents the attack influence function for the entire network, and $w_i$ is a pre-determined importance weight of node $i$. $\varepsilon^-, \varepsilon^+ > 0$ are used to control the scale of the perturbations.

Formulation 3 is a mixed integer program (MIP), and it contains two components: 1) determining $U$ given a fixed $\mathcal{P}$; 2) determining $\mathcal{P}$. The two components will be discussed in Sections III-C and III-D, respectively.

C. Black-box attacks using SPSA

Given a fixed $\mathcal{P}$, Equation 3 reduces to a continuous optimization problem with affine constraints, as shown in the following equation:

\[
\begin{aligned}
\max_{U} \quad & \Phi(U) = \sum_{i \in \mathcal{V}} w_i \phi_i(U) \\
\text{s.t.} \quad & -\varepsilon^- x_i \le u_i \le \varepsilon^+ x_i, && \forall i \in \mathcal{P}
\end{aligned}
\tag{4}
\]

Formulation 4 suggests that an ideal perturbation $U^*$ should be small, and meanwhile it maximizes the overall influence function $\Phi(U^*)$. In real-world applications, the internal information about the traffic prediction model is opaque, hence it is proper to consider the traffic prediction model as a black-box. We assume both the input feature $X$ and the prediction results $Y$ are known to attackers as both matrices represent the true and predicted traffic states in the real world, and hence Formulation 4 can be viewed as a black-box optimization problem. To solve it, we adopt the Simultaneous Perturbation
Stochastic Approximation (SPSA) method, which features efficiency and scalability [64]. In recent years, SPSA has been used for adversarial attacks on classification problems [65], while it has not been used for regression problems, in particular, the traffic prediction problem.

The SPSA method uses finite differences between two randomly perturbed inputs to approximate the gradient of the objective function. Mathematically, the gradient of $\Phi$ can be calculated as follows:

\[ \widehat{\nabla \Phi}(U_n) = \frac{\Phi(U_n + c_n \Delta_n) - \Phi(U_n - c_n \Delta_n)}{2 c_n \Delta_n}, \tag{5} \]

where the division by $\Delta_n$ is element-wise, $n$ is the index of the iteration, and $\Delta_n$ is a random perturbation vector whose elements are sampled from the Rademacher distribution (Bernoulli $\pm 1$ distribution with probability $p = 0.5$, and we denote $\mathrm{RAD}_{1 \times S} \in \{-1, 1\}^{1 \times S}$ as a sample vector that follows the Rademacher distribution). We further denote sequences $\{c_n\}$ and $\{a_n\}$ as follows:

\[ a_n = \frac{a}{(\eta + n)^{\alpha}}, \qquad c_n = \frac{c}{n^{\gamma}}, \tag{6} \]

where $a$, $c$, $\eta$, $\alpha$, $\gamma$ are hyper-parameters for SPSA [64]. Both sequences decrease when the iteration $n$ increases. Then the gradient ascent approach is utilized to maximize $\Phi(U)$, as shown in the following equation:

\[ U_{n+1} = U_n + a_n \widehat{\nabla \Phi}(U_n) \quad \forall n. \tag{7} \]

To summarize, the adversarial attack with fixed node set $\mathcal{P}$ is presented in Algorithm 1.

Algorithm 1: Determine the adversarial sample $X'$ and optimal perturbation $U$ given a fixed $\mathcal{P}$
Input: Traffic prediction model $f(X)$, input feature matrix $X$, attack set $\mathcal{P}$, maximum iteration MaxIter.
Output: Adversarial sample $X'$, and optimal perturbation $U$.
Initialize $U_1 \in \mathbb{R}^{N \times S}$
for $n = 1, 2, \cdots,$ MaxIter do
    Update $a_n$ and $c_n$ based on Equation 6.
    For $i \in \mathcal{V}$, randomly sample $\delta_i = 0_{1 \times S}$ if $i \notin \mathcal{P}$, and $\delta_i = \mathrm{RAD}_{1 \times S}$ if $i \in \mathcal{P}$.
    $\Delta = (\delta_1; \cdots; \delta_i; \cdots; \delta_N)$.
    $U^+ \leftarrow U_n + c_n \Delta$; $U^- \leftarrow U_n - c_n \Delta$.
    Compute $\widehat{\nabla \Phi}(U_n)$ based on Equation 5.
    Compute $U_{n+1}$ based on Equation 7.
    Set $(u_1; \cdots; u_i; \cdots; u_N) \leftarrow U_{n+1}$.
    for $i \in \mathcal{V}$ do
        if $i \in \mathcal{P}$ then
            $u_i \leftarrow \min(\varepsilon^+ x_i, u_i)$
            $u_i \leftarrow \max(-\varepsilon^- x_i, u_i)$
        else
            $u_i \leftarrow 0$
        end if
    end for
    $U_{n+1} \leftarrow (u_1; \cdots; u_i; \cdots; u_N)$
end for
$U \leftarrow U_{\text{MaxIter}+1}$
$X' \leftarrow X + U$
Return $X'$, $U$
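As a minimal illustrative sketch (not the implementation used in the experiments), Algorithm 1 could be written in NumPy roughly as follows. Several pieces are assumptions made only for illustration: the helper names `normalized_adjacency`, `make_toy_gcn`, `influence`, and `spsa_attack` are hypothetical; the black-box model $f$ is replaced by a toy 2-layer GCN with random weights; the default hyper-parameter values are arbitrary; $U_1$ is drawn uniformly inside the feasible box (Algorithm 1 only states that $U_1$ is initialized); and the loss is the squared $\ell_2$ norm used in the proof of Proposition 1. Because the entries of $\Delta$ in attacked rows are $\pm 1$, dividing by $\Delta$ in Equation 5 equals multiplying by it, which also keeps the rows outside $\mathcal{P}$ at zero.

```python
import numpy as np


def normalized_adjacency(A):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a 0/1 adjacency matrix A."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]


def make_toy_gcn(A, S, T, hidden=8, seed=0):
    """Hypothetical stand-in for the black-box predictor f (2-layer GCN, random weights)."""
    rng = np.random.default_rng(seed)
    A_hat = normalized_adjacency(A)
    W0 = rng.normal(scale=0.5, size=(S, hidden))
    W1 = rng.normal(scale=0.5, size=(hidden, T))

    def f(X):
        H1 = np.maximum(A_hat @ X @ W0, 0.0)  # ReLU activation
        return A_hat @ H1 @ W1

    return f


def influence(f, X, U, w):
    """Return (Phi(U), per-node phi_i(U)) with phi_i = ||y_i(X + U) - y_i(X)||_2^2."""
    diff = f(X + U) - f(X)
    phi = (diff ** 2).sum(axis=1)
    return float(w @ phi), phi


def spsa_attack(f, X, attack_set, w, eps_minus=0.1, eps_plus=0.1, max_iter=200,
                a=0.5, c=0.05, eta=10.0, alpha=0.602, gamma=0.101, seed=0):
    """SPSA gradient ascent on Phi(U), restricted to the rows listed in attack_set."""
    rng = np.random.default_rng(seed)
    N, S = X.shape
    mask = np.zeros((N, 1))
    mask[list(attack_set)] = 1.0
    # Box constraints of Formulation 4; rows outside the attack set are pinned to 0.
    # Assumes X > 0 (e.g., speeds), so that lower <= upper element-wise.
    lower, upper = -eps_minus * X * mask, eps_plus * X * mask
    # Assumption: start from a random feasible point rather than U = 0.
    U = lower + rng.uniform(size=(N, S)) * (upper - lower)
    for n in range(1, max_iter + 1):
        a_n = a / (eta + n) ** alpha          # Equation 6
        c_n = c / n ** gamma
        delta = rng.choice([-1.0, 1.0], size=(N, S)) * mask  # Rademacher rows for i in P
        phi_plus, _ = influence(f, X, U + c_n * delta, w)
        phi_minus, _ = influence(f, X, U - c_n * delta, w)
        grad = (phi_plus - phi_minus) / (2.0 * c_n) * delta  # Equation 5
        U = np.clip(U + a_n * grad, lower, upper)            # Equation 7 plus projection
    return X + U, U
```

On a small path graph, calling `spsa_attack(f, X, attack_set=[0], w=np.ones(N))` with the toy 2-layer model leaves $\phi_i$ at zero for every node more than two hops from node 0, which is the locality stated in Proposition 1. The sketch only queries `f` through its inputs and outputs, so any other predictor with the same signature could be substituted.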
D. Node Selection using Knapsack Greedy

This section focuses on determining the attack set $\mathcal{P}$ with a limited budget $B$. The cost $b_i$ differs across nodes due to the level of difficulty in attacking the node. For example, urban roads may contain a more recent and secured information collection system ($b_i$ is high), while rural roads can be attacked easily ($b_i$ is low). In contrast, attacks on urban roads usually generate a higher impact on the traffic prediction methods because the urban traffic volumes are high. It can be seen that there is a trade-off between the cost and benefit when selecting the attack set $\mathcal{P}$.

We review the formulation of the diffusion attack in Equation 3, and it is actually similar to the 0-1 knapsack problem, except that the utility of each node $i$ ($\phi_i(U)$) is unknown [66]. The attack set $\mathcal{P}$ can be viewed as a knapsack with maximum capacity $B$, and the nodes are items with costs $b_i$ and importance weights $w_i$. Node $i$ is added to $\mathcal{P}$ if $z_i = 1$. Due to the nature of integer programming, there is no provably efficient method to solve Formulation 3, which is an NP-hard problem. Real-world networks contain hundreds or thousands of nodes, and hence it is impractical to enumerate all possible integer solutions. In this paper, we develop a family of Knapsack Greedy (KG) algorithms to solve Formulation 3, and those algorithms are inspired by the original greedy algorithm for the knapsack problem.

A trivial but insightful observation is that any perturbation $U$ could reduce the performance of the prediction model. Proposition 2 shows that if we only perturb one node and the perturbation $u$ is small enough, then the objective function in Formulation 3 is locally convex.

Proposition 2. The objective function $\Phi$ in Formulation 3 is locally convex under small perturbation $U$.

Proof. Given that the function $\Phi$ is smooth and attains its global minimum when the perturbation $U = 0$, there exists a region around $U = 0$ in which $\Phi(U)$ is convex.

Existing literature has shown that the convex separable nonlinear knapsack problems could be approximated with the greedy search framework described by Algorithm 2 [67]. Recalling Proposition 1, the perturbation on GCN-based models only has a local effect, which means that if the attack nodes are selected $L$ hops away from each other, the objective of Formulation 3 would be separable. We also observed in Proposition 2 that the objective is locally convex given an attack on only one node. Combining the insights from Propositions 1 and 2, the greedy search framework would work efficiently for solving Formulation 3.

The proposed solution procedure consists of two steps: 1) compute $\hat{\phi}_i$ to approximate $\phi_i$ for each $i$; 2) adopt the greedy algorithm for the standard knapsack problem with $\hat{\phi}_i$ as the utility. In Step 1, we propose that the utility $\hat{\phi}_i$ can be obtained by SPSA. To be precise, we run Algorithm 1 with $\mathcal{P} = \mathcal{V}$, and the algorithm outcome is $U_{\mathcal{V}}$. Then, $\phi_i$ is approximated by $\hat{\phi}_i = \phi_i(U_{\mathcal{V}})$; in Step 2, we initialize the
attack set $P$ as an empty set, then nodes are added to the attack set sequentially in order of the highest utility per unit of budget. The entire procedure is referred to as KG-SPSA, and details of the algorithm are presented in Algorithm 2.

Algorithm 2 KG-SPSA for the diffusion attack on traffic prediction models.
  Input: Traffic prediction model $f(X)$, input feature matrix $X$, total budget $B$, and cost of each node $\{b_i\}_{i\in V}$.
  Output: Adversarial sample $X'$, optimal perturbation $U$, and attack set $P$.
  Initialize $P = \emptyset$.
  Evaluate $\hat{\phi}_i = \phi_i(U_V)$, $i \in V$ with Algorithm 1.
  while $\sum_{i\in P} b_i \le B$ do
    Set max_utility $= -\infty$, max_idx $= -\infty$.
    for $i \in V \setminus P$ do
      if $\hat{\phi}_i / b_i >$ max_utility then
        Set max_utility $= \hat{\phi}_i / b_i$.
        Set max_idx $= i$.
      end if
    end for
    if $\sum_{i\in P} b_i + b_{\mathrm{max\_idx}} \le B$ then
      $P = P \cup \{\mathrm{max\_idx}\}$
    end if
  end while
  Run Algorithm 1 with fixed $P$, obtain $X'$, $U$.
  Return $X'$, $U$, $P$.

It is possible to adopt other methods to estimate $\hat{\phi}_i$ in Step 1, including clustering methods and graph centrality measures. These algorithms are treated as baselines and compared with KG-SPSA in the numerical experiments.

IV. EXPERIMENTS

In this section, we evaluate the performance of the proposed diffusion attack algorithm under different scenarios using real-world data.

A. Experiment Setup

We examine the proposed attack algorithm on three traffic prediction models and two datasets. Details are described as follows:

Traffic Data. We consider two real-world traffic datasets:
• LA: The LA dataset contains traffic speed obtained from 207 loop detectors in Los Angeles [30]. The data ranges from March 1st to March 7th, 2012. The average degree of the adjacency matrix is 14. The speed data are collected every 5 minutes, and the average speed is 58 km/h. The study area is shown in the upper part of Fig. 4.
• HK: The HK speed data is collected from an open data platform initiated by the Hong Kong government, and 179 roads in total on Hong Kong Island and in the Kowloon area are considered. The data ranges from May 1st to May 31st, 2020. The average degree of the adjacency matrix is 2. The speed data are collected every 5 minutes, and the average speed is 45 km/h. The study area is shown in the lower part of Fig. 4.

Traffic prediction models. We evaluate the developed diffusion attack framework on three traffic prediction models: T-GCN [30], ST-GCN [31], and A3T-GCN [32], which are all based on GCN structures. For each model, we set $S = 12$ and $T = 1$. For a comprehensive evaluation, we train four variants of each model: the original model, and the original model with DropOut, DropNode, and DropEdge regularization, respectively [68], [69]. For all drop regularization strategies, we set the drop probability to 30%. In Appendix A, the Accuracy and Root Mean Squared Error (RMSE) of each model are shown in TABLE III and TABLE IV, and both measures are defined in [30]. Overall, the accuracy of the different prediction models is around 90% on the testing data.

Attack settings. The parameter settings for the diffusion attack models and algorithms, as well as the evaluation criteria, are as follows:
• Model specifications. In Formulation 3, we set the constraint $-z_i x_i \le u_i \le 0.5 z_i x_i$, and $w_i = 1$, which means each node is equally important. The cost is defined as $b_i = \mathrm{Degree}(i) = S_D(i)$, where $S_D(i) = \sum_j A_{ij}$, and $B \in \{20, 50, 100, 150, 200\}$. The objective reduces to $\Phi(U) = \sum_{i\in V} \phi_i(U)$, where $\phi_i(U) = y'_i(X') - y_i(X)$. The attack algorithm aims to reduce the predicted speed, that is, we intend to generate virtual “congestion” on the network.
• Evaluation of Algorithms. We use Average Attack Influence (AAI) and Average Attack Influence Ratio (AAIR) to evaluate the effect of the diffusion attacks. We define

$$\mathrm{AAI} = \frac{1}{N}\sum_{i\in V} |\phi_i(U)|, \qquad \mathrm{AAIR} = \frac{1}{N}\sum_{i\in V} \frac{|\phi_i(U)|}{y_i(X)},$$

where AAI represents the average degradation and AAIR represents the average degradation ratio of the prediction on the entire network, respectively.
• SPSA Setting. For Algorithm 1, $a = 0.328$, $c = 0.1$, $\alpha = 0.202$, $\gamma = 0.101$, and $\eta = n/10$. For the diffusion attack, MaxIter = 30000; for computing $\hat{\phi}_i$, MaxIter = 100.

Baseline algorithms. Because the diffusion attack is a newly proposed task, there are very few existing methods that can be used as baseline algorithms. In addition to the proposed KG-SPSA approach, we modify and develop 8 algorithms for comparison. The baseline algorithms differ mainly in how they select the attack set $P$ and whether they use the KG-* greedy algorithm. Degree selects the nodes with the highest degree $S_D(i)$. K-Medoids selects nodes by clustering the nodes with geo-location features until reaching the total budget $B$ [70]. PageRank selects the nodes with the highest PageRank scores $S_{PR}(i) = \frac{1-\alpha}{N} + \alpha \sum_{j\in N_i} \frac{S_{PR}(j)}{|N_j|}$, where $\alpha = 0.85$, and Betweenness chooses the nodes with high betweenness scores $S_{Bw}(i) = \sum_{j\neq i, k\neq i} \frac{\mathrm{Path}_{jk}(i)}{\mathrm{Path}_{jk}}$, where $\mathrm{Path}_{jk}$ is the number of shortest paths between nodes $j$ and $k$, and $\mathrm{Path}_{jk}(i)$ is the number of those shortest paths that pass through node $i$. The Random
algorithm simply selects nodes at random until the budget limit is met. SPSA selects the nodes with the highest $\hat{\phi}_i = \phi_i(U_V)$ obtained by running Algorithm 1 with $P = V$, and the greedy algorithm is not used in SPSA. When the greedy algorithm is used, it is also possible to use centrality measures such as PageRank and betweenness to represent $\hat{\phi}_i$. Different from PageRank and Betweenness, which determine $P$ by the highest scores, KG-PageRank and KG-Betweenness approximate $\hat{\phi}_i$ by the PageRank and betweenness scores and then run Algorithm 2 with the corresponding $\hat{\phi}_i$.

B. Results of diffusion attack

In this section, we present the experimental results. We first verify the diffusion effect when attacking a single node; then the performance of the proposed attack algorithm is evaluated and compared with baseline methods in different scenarios. Lastly, we examine the robustness of the proposed algorithm under different drop regularization strategies.

1) Diffusion effects of attacks on a single node: To demonstrate the diffusion phenomenon, we construct an attack on a single node for the three traffic prediction models, and the attack effect on different hops of neighbors is presented in Fig. 3. As can be seen, the attack mainly influences the ego node and its 1∼2-hop neighbors, and the influence diminishes as the number of hops increases. As proven in Proposition 1, for an $L$-layer GCN model, the diffusion only occurs within $L$-hop neighbors, which is verified in this experiment. The experimental results also indicate that the influence of a single-node attack is localized, and a successful diffusion attack algorithm requires a scattered node selection strategy.

Fig. 3: Diffusion effects when attacking a single node.

2) Comparisons of different attack algorithms: The proposed algorithm KG-SPSA is compared with different baseline algorithms with $B = 50$. The corresponding AAI is presented in TABLE I (unit: km/h), and the AAIR table is presented in Appendix B. The attack algorithms are categorized into two types: semi-black-box algorithms that know the graph structure, and black-box algorithms that only require inputs and outputs of the prediction models. TABLE I shows that the proposed algorithm KG-SPSA outperforms all the baseline algorithms on LA, and that the KG-* variants generally outperform their original counterparts. Compared with the other methods, KG-SPSA is a black-box method, which does not rely on knowledge of the graph structure (i.e., the adjacency matrix $A$). In the real world, it is challenging to obtain $A$ from the prediction system, as the graph can be generated from different configurations such as sensor layout, network topology, and causal relationships, and this information is hidden from users [34]. Fig. 4 presents the nodes selected by KG-SPSA for different prediction methods with $B = 50$, and the color (from green to red) represents the AAIR of each node under the attack. To maximize the attack effect, the selected nodes are distributed across the entire network, which is consistent with our previous conjecture.

In addition, it is observed that the robustness of the three prediction models differs. A3T-GCN is highly vulnerable under the attack algorithms, while both T-GCN and ST-GCN are more robust. This could be because the attention layers in A3T-GCN strengthen the connections among nodes, and these strengthened connections can also increase the vulnerability of the prediction model. Compared with the LA dataset, predictions on HK are more vulnerable to adversarial diffusion attacks, which might be explained by the drastic changes of Hong Kong’s traffic conditions within a day [71].

3) Sensitivity analysis on the budget B: To study the effect of the budget $B$ on the diffusion attack, we run KG-SPSA with $B \in \{20, 50, 100, 150, 200\}$ for both LA and HK, and the corresponding AAI is presented in Fig. 5. The attack influence increases with the total budget $B$ for both datasets, while the marginal gain diminishes. It is also observed that prediction models on HK are more vulnerable, and that A3T-GCN is the least robust prediction model for both datasets.

4) Performance of the proposed algorithm on drop regularization strategies: To better understand the performance of the proposed attack algorithms, we examine the attack effects on prediction models with drop regularization. Existing studies have widely demonstrated that drop regularization strategies could improve the robustness of GCN-based models [68], [69]; hence it is crucial to show that the proposed methods remain effective under different drop regularization strategies. To this end, we conduct diffusion attacks with KG-SPSA on the prediction models trained with DropOut, DropNode, and DropEdge. DropOut randomly drops rows in the feature matrix $X$, DropNode randomly drops a subset of the nodes on the graph, and DropEdge randomly drops edges of the graph in each epoch during model training. Details of the models are presented in Section IV-A. We run KG-SPSA and KG-PageRank for prediction models with different drop regularization on the two datasets, and the algorithm performance is presented in TABLE II. We choose KG-SPSA and KG-PageRank because both algorithms outperform the other semi-black-box and black-box algorithms. For dataset LA, KG-SPSA outperforms KG-PageRank on all the prediction models, and KG-SPSA is slightly better on HK. In most cases, DropOut could degrade the performance of the attack algorithms, while the other two drop regularization strategies do not protect the prediction models. Overall, the proposed diffusion attack algorithms could still generate adversarial samples under various
drop regularization strategies.

TABLE I: Comparison of different diffusion attack algorithms in terms of AAI (unit: km/h). (B = 50)

| Type | Algorithm | LA ST-GCN | LA T-GCN | LA A3T-GCN | HK ST-GCN | HK T-GCN | HK A3T-GCN |
|---|---|---|---|---|---|---|---|
| Semi-black-box | Degree | 0.74 | 0.65 | 0.64 | 5.30 | 3.64 | 4.22 |
| Semi-black-box | K-Medoids | 0.69 | 0.87 | 1.69 | 12.41 | 5.72 | 7.74 |
| Semi-black-box | PageRank | 2.41 | 2.11 | 2.61 | 9.42 | 4.84 | 5.54 |
| Semi-black-box | Betweenness | 2.35 | 1.56 | 2.06 | 16.28 | 7.84 | 19.88 |
| Semi-black-box | KG-Betweenness | 2.75 | 1.92 | 2.66 | 17.37 | 8.42 | 24.44 |
| Semi-black-box | KG-PageRank | 4.74 | 3.69 | 5.06 | 22.63 | 12.99 | 34.47 |
| Black-box | Random | 1.06 | 1.31 | 1.86 | 14.61 | 8.74 | 20.42 |
| Black-box | SPSA | 3.18 | 1.46 | 7.66 | 18.60 | 7.43 | 28.97 |
| Black-box | KG-SPSA | 5.46 | 4.36 | 12.74 | 23.34 | 12.21 | 32.86 |

Fig. 4: The distribution of selected nodes and AAIR of each node under the attack algorithm KG-SPSA (the selected nodes are marked as triangles, and the color represents AAIR). Upper row: Los Angeles; lower row: Hong Kong. Panels in each row: original model, ST-GCN, T-GCN, A3T-GCN.

Fig. 5: Attack effect with different budget B.

TABLE II: Comparison of KG-SPSA and KG-PageRank on the three defense strategies in terms of AAI. (B = 50)

| Algorithm | Dataset | Model | Baseline | DropOut | DropNode | DropEdge |
|---|---|---|---|---|---|---|
| KG-SPSA | LA | ST-GCN | 5.46 | 5.65 | 6.86 | 11.60 |
| KG-SPSA | LA | T-GCN | 4.36 | 3.43 | 3.49 | 2.79 |
| KG-SPSA | LA | A3T-GCN | 12.74 | 3.07 | 18.59 | 6.18 |
| KG-SPSA | HK | ST-GCN | 23.34 | 11.89 | 14.53 | 25.84 |
| KG-SPSA | HK | T-GCN | 12.21 | 12.34 | 11.63 | 11.43 |
| KG-SPSA | HK | A3T-GCN | 32.86 | 41.77 | 11.44 | 91.77 |
| KG-PageRank | LA | ST-GCN | 4.74 | 2.71 | 4.65 | 6.67 |
| KG-PageRank | LA | T-GCN | 3.69 | 3.06 | 3.21 | 2.47 |
| KG-PageRank | LA | A3T-GCN | 5.06 | 2.68 | 6.07 | 3.42 |
| KG-PageRank | HK | ST-GCN | 22.63 | 12.84 | 16.02 | 24.84 |
| KG-PageRank | HK | T-GCN | 12.99 | 12.34 | 12.31 | 12.04 |
| KG-PageRank | HK | A3T-GCN | 34.47 | 28.34 | 24.10 | 74.93 |
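As a rough illustration of the two ingredients behind the results in this subsection, the sketch below pairs a budgeted greedy node selection in the spirit of Algorithm 2 with the AAI and AAIR metrics. It is a simplified sketch under stated assumptions rather than the authors' code: the function names are hypothetical, node costs are assumed to be strictly positive (for example node degrees on a graph without isolated nodes), and, unlike a literal reading of Algorithm 2, the selection makes a single pass and skips nodes that no longer fit, which guarantees termination.

```python
import numpy as np


def knapsack_greedy(utility, cost, budget):
    """Select nodes by utility-to-cost ratio without exceeding the budget.

    utility : estimated per-node utilities (e.g. SPSA or PageRank scores)
    cost    : strictly positive per-node costs (e.g. node degrees)
    budget  : total budget B
    Returns the selected node indices and the budget actually spent.
    """
    utility = np.asarray(utility, dtype=float)
    cost = np.asarray(cost, dtype=float)
    # Visit nodes from the highest to the lowest utility per unit of cost.
    order = np.argsort(-utility / cost, kind="stable")
    selected, spent = [], 0.0
    for i in order:
        if spent + cost[i] <= budget:
            selected.append(int(i))
            spent += float(cost[i])
    return selected, spent


def aai_aair(y_clean, y_attacked):
    """Average Attack Influence and its ratio over all nodes.

    y_clean    : predictions on the original input, one value per node
    y_attacked : predictions on the adversarial input, same shape
    """
    y_clean = np.asarray(y_clean, dtype=float)
    phi = np.asarray(y_attacked, dtype=float) - y_clean
    aai = float(np.mean(np.abs(phi)))
    aair = float(np.mean(np.abs(phi) / y_clean))
    return aai, aair
```

With degree costs taken as the row sums of an adjacency matrix, the same `knapsack_greedy` call covers the KG-SPSA, KG-PageRank, and KG-Betweenness variants; only the `utility` vector changes.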

C. Discussions

In this section, we discuss the implications and suggestions for improving the robustness of the traffic prediction models. In the previous section, we carried out numerical experiments to demonstrate the performance of the proposed attack algorithms on different datasets, prediction models, and regularization strategies. Based on the experimental results, we provide the following suggestions to improve the model robustness during different phases:

• Model selection. When choosing GCN-based models for speed prediction, RNN-based models are generally more robust than attention-based models. Depending on the data and city scale, it is suggested to choose models with simpler layers, as the complex layers in ST-GCN and A3T-GCN can sometimes degrade significantly under attacks. There is a trade-off between accuracy and robustness, so it is critical to balance the two for practical usage.
• Model regularization. Based on the experimental results, it is suggested to adopt DropOut during training, as the model accuracy remains high while the robustness can be improved after DropOut training. It is also suggested to test different drop regularization strategies before actual deployment.
• Model privacy. The graph structure should not be disclosed to the public, as it can significantly improve the efficiency of the attack algorithms. It is also suggested to update the prediction model frequently, as the attack models rely on repeated trial and error against the prediction models. If the prediction model is updated frequently, the robustness of the entire prediction system can be significantly improved.
• Active defending strategies. Before actual deployment, it is necessary to comprehensively test the vulnerability of the prediction models and to identify the critical nodes with significant attack influence. For those important nodes, we can enhance the protection by regular patrols in the physical world and consistency checking in the cyber system. For example, if an RSU on a road segment is identified as critical, then this device should be protected physically [72]. If an attack on this device does occur, the traffic center should spot the anomaly in real time and block the information sent from this device.

V. CONCLUSION

In this paper, we explore the robustness and vulnerability issues of graph-based neural network models for traffic prediction. Different from existing adversarial attack tasks, adversarial attacks on traffic prediction need to degrade the model performance for the entire network, rather than for a specific sample of nodes. Given this, we propose a novel concept of diffusion attack, which aims to reduce the prediction accuracy of the whole traffic network by perturbing a small number of nodes. To solve the diffusion attack task, we develop an algorithm, KG-SPSA, which consists of two major components: 1) using SPSA to generate the optimal perturbations to maximize the attack effects; 2) adapting the greedy algorithm for the knapsack problem to select the most critical nodes. The proposed algorithm is examined with three widely used GCN-based traffic prediction models (ST-GCN, T-GCN, and A3T-GCN) on the Los Angeles and Hong Kong datasets. The experimental results indicate that the proposed algorithm outperforms the baseline algorithms under various scenarios, which demonstrates the effectiveness and efficiency of the proposed algorithm. In addition, the proposed attack algorithms can still generate effective adversarial samples for traffic prediction models trained with drop regularization. This study could help public agencies and the private sector better understand the robustness and vulnerability of GCN-based traffic prediction models under adversarial attacks, and strategies to improve the model robustness in different phases are also discussed.

As for future research directions, the proposed attack algorithms could be applied not only to road traffic prediction, but also to other traffic modes such as urban railway transit systems, ride-sourcing services, and parking systems [73], [74]. It is also important to study the effect of adversarial attacks on flow prediction, origin-destination demand prediction, and other tasks relying on GCN-based models. For the users of traffic prediction models, it is critical to develop models for defending against adversarial attacks and protecting traffic prediction results. Another way to protect the prediction model is through real-time anomaly detection and filtering of the incoming data stream, which could be a new research direction for improving the robustness of traffic prediction models under adversarial attacks.

SUPPLEMENTARY MATERIALS

The proposed diffusion attack algorithm and evaluation framework are implemented in Python and open-sourced on GitHub (https://github.com/LYZ98/Adversarial-Diffusion-Attacks-on-Graph-based-Traffic-Prediction-Models).

ACKNOWLEDGMENT

The work described in this study was supported by a grant funded by the Hong Kong Polytechnic University (Project No. P0033933). The contents of this paper reflect the views of the authors, who are responsible for the facts and the accuracy of the information presented herein.

APPENDIX A
MORE DETAILS ABOUT THE THREE TRAFFIC PREDICTION MODELS AND ATTACK RESULTS

To train the three traffic prediction models, we set the learning rate to 0.001, the batch size to 32, and the number of epochs to 300. Each dataset is divided into two parts, with 80% used as the training set and 20% as the testing set. Mean Squared Error (MSE) is used as the loss function [30], and Adam is adopted as the optimizer. The testing performance of the trained prediction models in terms of accuracy and Root Mean Squared Error (RMSE) is presented in TABLE III and TABLE IV, respectively. Overall, all the prediction models could achieve high prediction accuracy on both datasets.

TABLE III: Accuracy of trained models

| Dataset | Model | Baseline | DropOut | DropNode | DropEdge |
|---|---|---|---|---|---|
| LA | ST-GCN | 92.68% | 92.70% | 92.77% | 88.18% |
| LA | T-GCN | 90.44% | 89.24% | 90.01% | 90.05% |
| LA | A3T-GCN | 89.04% | 72.71% | 89.15% | 89.84% |
| HK | ST-GCN | 92.88% | 93.20% | 93.17% | 92.80% |
| HK | T-GCN | 88.50% | 87.70% | 86.78% | 88.08% |
| HK | A3T-GCN | 89.47% | 87.30% | 80.57% | 88.12% |

APPENDIX B
COMPARISONS OF DIFFERENT ATTACK ALGORITHMS IN TERMS OF AAIR. (B = 50)

Similar to TABLE I, we evaluate the performance of different attack algorithms in terms of AAIR in TABLE V. In general, similar arguments could be obtained based on the
AAIR, and the proposed KG-SPSA outperforms other baseline models in LA, and performs similarly to the semi-black-box algorithms in HK.

Results in TABLE VI follow the same patterns as in TABLE II, and one can observe that the proposed attack algorithms can still generate effective adversarial samples in terms of AAIR.

TABLE IV: RMSE of trained models

| Dataset | Model | Baseline | DropOut | DropNode | DropEdge |
|---|---|---|---|---|---|
| LA | ST-GCN | 4.30 | 4.28 | 4.25 | 6.95 |
| LA | T-GCN | 5.61 | 6.32 | 5.86 | 5.84 |
| LA | A3T-GCN | 6.43 | 16.02 | 6.36 | 5.96 |
| HK | ST-GCN | 3.55 | 3.39 | 3.41 | 3.59 |
| HK | T-GCN | 5.74 | 6.13 | 6.59 | 5.95 |
| HK | A3T-GCN | 5.25 | 6.33 | 9.68 | 5.92 |

REFERENCES

[1] E. I. Vlahogianni, J. C. Golias, and M. G. Karlaftis, “Short-term traffic forecasting: Overview of objectives and methods,” Transport reviews, vol. 24, no. 5, pp. 533–557, 2004.
[2] O. Lange and L. Perez, “Traffic prediction with advanced graph neural networks.” [Online]. Available: https://deepmind.com/blog/article/traffic-prediction-with-advanced-graph-neural-networks
[3] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013.
[4] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
[5] J. Su, D. V. Vargas, and K. Sakurai, “One pixel attack for fooling deep neural networks,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 5, pp. 828–841, 2019.
[6] K. Zhou, T. P. Michalak, M. Waniek, T. Rahwan, and Y. Vorobeychik, “Attacking similarity-based link prediction in social networks,” in Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, ser. AAMAS ’19. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems, 2019, pp. 305–313.
[7] Y. Li, X. Xu, J. Xiao, S. Li, and H. T. Shen, “Adaptive square attack: Fooling autonomous cars with adversarial traffic signs,” IEEE Internet of Things Journal, pp. 1–1, 2020.
[8] M. Fang, G. Yang, N. Z. Gong, and J. Liu, “Poisoning attacks to graph-based recommender systems,” Proceedings of the 34th Annual Computer Security Applications Conference, Dec 2018.
[9] S. Weckert, “Google maps hacks.” [Online]. Available: http://www.simonweckert.com/googlemapshacks.html
[10] A. Bayen, J. Butler, A. Patire et al., “Mobile millennium,” Tech. Rep. UCB-ITS-CWP-2011-6, CCIT Research Report, UC Berkeley, Tech. Rep., 2011.
[11] M. T. Ahvanooey, Q. Li, M. Rabbani, and A. R. Rajput, “A survey on smartphones security: software vulnerabilities, malware, and attacks,” arXiv preprint arXiv:2001.09406, 2020.
[12] N. Lu, N. Cheng, N. Zhang, X. Shen, and J. W. Mark, “Connected vehicles: Solutions and challenges,” IEEE Internet of Things Journal, vol. 1, no. 4, pp. 289–299, 2014.
[13] B. K. J. Al-Shammari, N. Al-Aboody, and H. S. Al-Raweshidy, “Iot traffic management and integration in the qos supported network,” IEEE Internet of Things Journal, vol. 5, no. 1, pp. 352–370, 2018.
[14] Y. J. Edes, P. G. Michalopoulos, and R. A. Plum, “Improved estimation of traffic flow for real-time control,” Transportation Research Record, vol. 7, no. 9, p. 28, 1980.
[15] M. S. Ahmed and A. R. Cook, “Analysis of freeway traffic time-series data by using box-jenkins techniques,” Transportation Research Record, no. 722, pp. 1–9, 1979.
[16] M. M. Hamed, H. R. Al-Masaeid, and Z. M. B. Said, “Short-term prediction of traffic volume in urban arterials,” Journal of Transportation Engineering, vol. 121, no. 3, pp. 249–254, 1995.
[17] M. Van Der Voort, M. Dougherty, and S. Watson, “Combining kohonen maps with arima time series models to forecast traffic flow,” Transportation Research Part C: Emerging Technologies, vol. 4, no. 5, pp. 307–318, 1996.
[18] S. Lee and D. B. Fambro, “Application of subset autoregressive integrated moving average model for short-term freeway traffic volume forecasting,” Transportation Research Record, vol. 1678, no. 1, pp. 179–188, 1999.
[19] B. M. Williams and L. A. Hoel, “Modeling and forecasting vehicular traffic flow as a seasonal arima process: Theoretical basis and empirical results,” Journal of transportation engineering, vol. 129, no. 6, pp. 664–672, 2003.
[20] C.-H. Wu, J.-M. Ho, and D.-T. Lee, “Travel-time prediction with support vector regression,” IEEE transactions on intelligent transportation systems, vol. 5, no. 4, pp. 276–281, 2004.
[21] F. G. Habtemichael and M. Cetin, “Short-term traffic flow rate forecasting based on identifying similar traffic patterns,” Transportation research Part C: emerging technologies, vol. 66, pp. 61–78, 2016.
[22] I. Okutani and Y. J. Stephanedes, “Dynamic prediction of traffic volume through kalman filtering theory,” Transportation Research Part B: Methodological, vol. 18, no. 1, pp. 1–11, 1984.
[23] C. P. Van Hinsbergen, T. Schreiter, F. S. Zuurbier, J. Van Lint, and H. J. Van Zuylen, “Localized extended kalman filter for scalable real-time traffic state estimation,” IEEE transactions on intelligent transportation systems, vol. 13, no. 1, pp. 385–394, 2011.
[24] X. Ma, Z. Dai, Z. He, J. Ma, Y. Wang, and Y. Wang, “Learning traffic as images: a deep convolutional neural network for large-scale transportation network speed prediction,” Sensors, vol. 17, no. 4, p. 818, 2017.
[25] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[26] A. Azzouni and G. Pujolle, “A long short-term memory recurrent neural network framework for network traffic matrix prediction,” arXiv preprint arXiv:1705.05690, 2017.
[27] N. Ramakrishnan and T. Soni, “Network traffic prediction using recurrent neural networks,” in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2018, pp. 187–193.
[28] S. Guo, Y. Lin, N. Feng, C. Song, and H. Wan, “Attention based spatial-temporal graph convolutional networks for traffic flow forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 922–929.
[29] F. Zhou, Q. Yang, K. Zhang, G. Trajcevski, T. Zhong, and A. Khokhar, “Reinforced spatiotemporal attentive graph neural networks for traffic forecasting,” IEEE Internet of Things Journal, vol. 7, no. 7, pp. 6414–6428, 2020.
[30] L. Zhao, Y. Song, C. Zhang, Y. Liu, P. Wang, T. Lin, M. Deng, and H. Li, “T-gcn: A temporal graph convolutional network for traffic prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 9, pp. 3848–3858, 2020.
[31] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting,” in Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018, pp. 3634–3640.
[32] J. Zhu, Y. Song, L. Zhao, and H. Li, “A3t-gcn: attention temporal graph convolutional network for traffic forecasting,” arXiv preprint arXiv:2006.11583, 2020.
[33] B. Yu, Y. Lee, and K. Sohn, “Forecasting road traffic speeds by considering area-wide spatio-temporal dependencies based on a graph convolutional neural network (gcn),” Transportation Research Part C: Emerging Technologies, vol. 114, pp. 189–204, 2020.
[34] J. Ye, J. Zhao, K. Ye, and C. Xu, “How to build a graph-based deep learning architecture in traffic domain: A survey,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–21, 2020.
[35] D. Wang, J. Zhang, W. Cao, J. Li, and Y. Zheng, “When will you arrive? estimating travel time based on deep neural networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
[36] L. Bai, L. Yao, S. Kanhere, X. Wang, Q. Sheng et al., “Stg2seq: Spatial-temporal graph to sequence model for multi-step passenger demand forecasting,” arXiv preprint arXiv:1905.10069, 2019.
[37] L. Liu, J. Chen, H. Wu, J. Zhen, G. Li, and L. Lin, “Physical-virtual collaboration modeling for intra-and inter-station metro ridership prediction,” IEEE Transactions on Intelligent Transportation Systems, 2020.
[38] L. Lin, Z. He, and S. Peeta, “Predicting station-level hourly demand in a large-scale bike-sharing network: A graph convolutional neural network approach,” Transportation Research Part C: Emerging Technologies, vol. 97, pp. 258–276, Dec 2018.
[39] J. Yang, B. Guo, Z. Wang, and Y. Ma, “Hierarchical prediction based on network-representation-learning-enhanced clustering for bike-sharing
MANUSCRIPT SUBMITTED TO IEEE INTERNET OF THINGS JOURNAL, 2021 12

TABLE V: Comparison of different diffusion attack algorithms in terms of AAIR. (B = 50)


LA HK
Types Algorithm
S T-G CN T-G CN A3 T-G CN S T-G CN T-G CN A3 T-G CN
D EGREE 1.54% 1.88% 2.07% 12.35% 8.69% 10.28%
K-M EDOIDS 1.37% 1.77% 2.33% 25.19% 11.84% 17.09%
PAGERANK 3.79% 3.77% 3.80% 19.90% 10.77% 12.57%
Semi-blackbox B ETWEENNESS 3.83% 3.70% 3.34% 30.95% 14.46% 43.92%
K G -B ETWEENNESS 4.73% 4.21% 2.55% 32.69% 15.07% 52.50%
K G -PAGERANK 7.88% 6.99% 8.38% 42.27% 23.91% 72.37%
R ANDOM 1.65% 2.51% 3.19% 28.57% 16.63% 45.33%
S PSA 5.80% 3.07% 12.36% 35.61% 15.00% 60.63%
Blackbox
K G -S PSA 8.32% 7.76% 22.77% 43.25% 24.26% 70.21%

TABLE VI: Comparison of K G -S PSA and K G -PAGERANK on [52] F. Liu, L. M. Moreno, and L. Sun, “One vertex attack on graph
the three defense strategies im terms of AAIR. (B = 50) neural networks-based spatiotemporal forecasting,” in ICLR Conference
OpenReview, 2021.
Dataset Model Baseline D ROP O UT D ROP N ODE D ROP E DGE [53] B. Finkelshtein, C. Baskin, E. Zheltonozhskii, and U. Alon, “Single-
K G -S PSA node attack for fooling graph neural networks,” arXiv preprint
S T-G CN 8.32% 8.73% 11.36% 13.88% arXiv:2011.03574, 2020.
LA T-G CN 7.76% 6.21% 6.15% 4.98% [54] B. Wang, T. Zhou, M. Lin, P. Zhou, A. Li, M. Pang, C. Fu, H. Li,
A3 T-G CN 22.77% 8.33% 38.11% 14.35% and Y. Chen, “Efficient evasion attacks to graph neural networks via
S T-G CN 43.25% 23.66% 27.51% 46.36% influence function,” arXiv preprint arXiv:2009.00203, 2020.
HK T-G CN 24.26% 23.47% 22.83% 22.75% [55] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and
A3 T-G CN 70.21% 79.80% 24.34% 230.85% A. Swami, “The limitations of deep learning in adversarial settings,” in
K G -PAGERANK 2016 IEEE European symposium on security and privacy (EuroS&P).
S T-G CN 7.88% 4.47% 7.84% 9.38% IEEE, 2016, pp. 372–387.
LA T-G CN 6.99% 5.60% 5.71% 4.44%
A3 T-G CN 8.38% 6.93% 12.08% 6.60%
[56] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural
S T-G CN 42.27% 24.47% 29.25% 45.24% networks,” in 2017 IEEE Symposium on Security and Privacy (SP),
HK T-G CN 23.91% 23.09% 22.07% 22.86% 2017, pp. 39–57.
A3 T-G CN 72.37% 52.71% 56.86% 174.56% [57] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “Deepfool: a simple
and accurate method to fool deep neural networks,” in Proceedings of
the IEEE conference on computer vision and pattern recognition, 2016,
pp. 2574–2582.
system in smart city,” IEEE Internet of Things Journal, vol. 8, no. 8, [58] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and
pp. 6416–6424, 2021. A. Swami, “Practical black-box attacks against machine learning,” in
[40] K. F. Chu, A. Y. S. Lam, and V. O. K. Li, “Deep multi-scale Proceedings of the 2017 ACM on Asia conference on computer and
convolutional lstm network for travel demand and origin-destination communications security, 2017, pp. 506–519.
predictions,” IEEE Transactions on Intelligent Transportation Systems, [59] W. Brendel, J. Rauber, and M. Bethge, “Decision-based adversarial
vol. 21, no. 8, pp. 3219–3232, 2020. attacks: Reliable attacks against black-box machine learning models,”
[41] A. Monti, A. Bertugli, S. Calderara, and R. Cucchiara, “Dag-net: Double arXiv preprint arXiv:1712.04248, 2017.
attentive graph neural network for trajectory forecasting,” arXiv preprint [60] P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, and C.-J. Hsieh, “Zoo: Zeroth
arXiv:2005.12661, 2020. order optimization based black-box attacks to deep neural networks
[42] A. Mohamed, K. Qian, M. Elhoseiny, and C. Claudel, “Social-stgcnn: without training substitute models,” in Proceedings of the 10th ACM
A social spatio-temporal graph convolutional neural network for human Workshop on Artificial Intelligence and Security, 2017, pp. 15–26.
trajectory prediction,” in Proceedings of the IEEE/CVF Conference on [61] C.-C. Tu, P. Ting, P.-Y. Chen, S. Liu, H. Zhang, J. Yi, C.-J. Hsieh, and
Computer Vision and Pattern Recognition, 2020, pp. 14 424–14 432. S.-M. Cheng, “Autozoom: Autoencoder-based zeroth order optimization
[43] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, method for attacking black-box neural networks,” in Proceedings of the
“Geometric deep learning: going beyond euclidean data,” IEEE Signal AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 742–749.
Processing Magazine, vol. 34, no. 4, pp. 18–42, 2017. [62] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin, “Black-box adver-
[44] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang, “Long short-term memory sarial attacks with limited queries and information,” arXiv preprint
neural network for traffic speed prediction using remote microwave arXiv:1804.08598, 2018.
sensor data,” Transportation Research Part C: Emerging Technologies,
[63] N. Akhtar and A. Mian, “Threat of adversarial attacks on deep learning
vol. 54, pp. 187–197, 2015.
in computer vision: A survey,” Ieee Access, vol. 6, pp. 14 410–14 430,
[45] J. Dai, W. Zhu, and X. Luo, “A targeted universal attack on graph
2018.
convolutional network,” arXiv preprint arXiv:2011.14365, 2020.
[46] D. Zügner and S. Günnemann, “Adversarial attacks on graph neural [64] J. C. Spall, “An overview of the simultaneous perturbation method for
networks via meta learning,” arXiv preprint arXiv:1902.08412, 2019. efficient optimization,” Johns Hopkins apl technical digest, vol. 19, no. 4,
[47] H. Dai, H. Li, T. Tian, X. Huang, L. Wang, J. Zhu, and L. Song, “Adver- pp. 482–492, 1998.
sarial attack on graph structured data,” arXiv preprint arXiv:1806.02371, [65] J. Uesato, B. O’Donoghue, P. Kohli, and A. van den Oord, “Adversarial
2018. risk and the dangers of evaluating against weak attacks,” in Proceed-
[48] J. Ma, S. Ding, and Q. Mei, “Towards more practical adversarial attacks ings of the 35th International Conference on Machine Learning, ser.
on graph neural networks,” Advances in neural information processing Proceedings of Machine Learning Research, J. Dy and A. Krause, Eds.,
systems, 2020. vol. 80. Stockholmsmässan, Stockholm Sweden: PMLR, 10–15 Jul
[49] J. Ma, J. Deng, and Q. Mei, “Near-black-box adversarial attacks on 2018, pp. 5025–5034.
graph neural networks as an influence maximization problem,” in ICLR [66] S. Martello, D. Pisinger, and P. Toth, “Dynamic programming and strong
Conference OpenReview, 2021. bounds for the 0-1 knapsack problem,” Management science, vol. 45,
[50] K. Xu, H. Chen, S. Liu, P.-Y. Chen, T.-W. Weng, M. Hong, and no. 3, pp. 414–424, 1999.
X. Lin, “Topology attack and defense for graph neural networks: An [67] B. Zhang and Z. Hua, “A unified method for a class of convex sepa-
optimization perspective,” arXiv preprint arXiv:1906.04214, 2019. rable nonlinear knapsack problems,” European Journal of Operational
[51] X. Xu, X. Du, and Q. Zeng, “Attacking graph-based classification Research, vol. 191, no. 1, pp. 1–6, 2008.
without changing existing connections,” in Annual Computer Security [68] Y. Rong, W. Huang, T. Xu, and J. Huang, “Dropedge: Towards deep
Applications Conference, ser. ACSAC ’20. New York, NY, USA: graph convolutional networks on node classification,” in ICLR Confer-
Association for Computing Machinery, 2020, p. 951–962. ence OpenReview, 2020.
MANUSCRIPT SUBMITTED TO IEEE INTERNET OF THINGS JOURNAL, 2021 13

Wei Ma received bachelor’s degrees in Civil Engineering and Mathematics from
Tsinghua University, China, master’s degrees in Machine Learning and in Civil
and Environmental Engineering, and a Ph.D. degree in Civil and Environmental
Engineering from Carnegie Mellon University, USA. He is currently an assistant
professor with the Department of Civil and Environmental Engineering at the
Hong Kong Polytechnic University (PolyU). His research focuses on the
intersection of machine learning, data mining, and transportation network
modeling, with applications for smart and sustainable mobility systems. He has
received awards for research excellence and his contributions to the area,
including the 2020 Mao Yisheng Outstanding Dissertation Award and the best
paper award (theoretical track) at the INFORMS Data Mining and Decision
Analytics Workshop.

Lyuyi Zhu is an undergraduate student at the College of Civil Engineering and
Architecture, Zhejiang University, Hangzhou, China. He will join the School of
Data Science, City University of Hong Kong, as a Ph.D. student. His research
interests include machine learning, optimization, and numerical methods.

Kairui Feng is currently a Ph.D. student in the Department of Civil and
Environmental Engineering, Princeton University, New Jersey, USA. He received
bachelor’s degrees in Civil Engineering and Mathematics from Tsinghua
University, China. His research interests include infrastructure system
modeling/optimization and climate change using data-driven and numerical
approaches.

Ziyuan Pu is currently a Lecturer (Assistant Professor) at Monash University.
He received the B.S. degree in transportation engineering from Southeast
University, China, in 2010, and the M.S. and Ph.D. degrees in civil and
environmental engineering from the University of Washington, USA, in 2015 and
2020, respectively. His research interests include transportation data science,
smart transportation infrastructures, connected and autonomous vehicles (CAV),
and urban computing.
