
Formal Guarantees on the Robustness of a

Classifier against Adversarial Manipulation

Matthias Hein and Maksym Andriushchenko


Department of Mathematics and Computer Science
Saarland University, Saarbrücken Informatics Campus, Germany
arXiv:1705.08475v2 [cs.LG] 5 Nov 2017

Abstract
Recent work has shown that state-of-the-art classifiers are quite brittle,
in the sense that a small adversarial change of an input that was originally
classified correctly with high confidence leads to a wrong classification,
again with high confidence. This raises concerns that such classifiers are vulnerable
to attacks and calls into question their usage in safety-critical systems. We
show in this paper for the first time formal guarantees on the robustness
of a classifier by giving instance-specific lower bounds on the norm of the
input manipulation required to change the classifier decision. Based on
this analysis we propose the Cross-Lipschitz regularization functional. We
show that using this form of regularization in kernel methods and neural
networks improves the robustness of the classifier with no or only a small loss in
prediction performance.

1 Introduction
The problem of adversarial manipulation of classifiers has been addressed initially in the area
of spam email detection, see e.g. [5, 16]. The goal of the spammer is to manipulate the spam
email (the input of the classifier) in such a way that it is not detected by the classifier. In deep
learning the problem was brought up in the seminal paper [24]. The authors showed that, for
state-of-the-art deep neural networks, one can manipulate an originally correctly classified input
image with an imperceptibly small transformation so that the classifier now misclassifies
this image with high confidence, see [7] or Figure 4 for an illustration. This property calls
into question the usage of neural networks and other classifiers showing this behavior in
safety critical systems, as they are vulnerable to attacks. On the other hand this also shows
that the concepts learned by a classifier are still quite far away from the visual perception
of humans. Subsequent research has found fast ways to generate adversarial samples with
high probability [7, 12, 19] and suggested to use them during training as a form of data
augmentation to gain more robustness. However, it turns out that the so-called adversarial
training does not settle the problem as one can yet again construct adversarial examples
for the final classifier. Interestingly, it has recently been shown that there exist universal
adversarial changes which when applied lead, for every image, to a wrong classification with
high probability [17]. While one needs access to the neural network model for the generation
of adversarial changes, it has been shown that adversarial manipulations generalize across
neural networks [18, 15, 14], which means that neural network classifiers can be attacked
even in a black-box setting. The most extreme case has been shown recently in [15], where
they attack the commercial system Clarifai, which is a black-box system as neither the
underlying classifier nor the training data are known. Nevertheless, they could successfully
generate adversarial images with an existing network and fool this commercial system. This
emphasizes that there are indeed severe security issues with modern neural networks. While
countermeasures have been proposed [8, 7, 26, 18, 12, 2], none of them provides a guarantee

of preventing this behavior [3]. One might think that generative adversarial neural networks
should be resistant to this problem, but it has recently been shown [13] that they can also
be attacked by adversarial manipulation of input images.
In this paper we show for the first time instance-specific formal guarantees on the robustness
of a classifier against adversarial manipulation. That means we provide lower bounds on the
norm of the change of the input required to alter the classifier decision or said otherwise: we
provide a guarantee that the classifier decision does not change in a certain ball around the
considered instance. We exemplify our technique for two widely used families of classifiers:
kernel methods and neural networks. Based on the analysis we propose a new regularization
functional, which we call Cross-Lipschitz Regularization. This regularization functional
can be used in kernel methods and neural networks. We show that using Cross-Lipschitz
regularization improves both the formal guarantees of the resulting classifier (lower bounds)
as well as the change required for adversarial manipulation (upper bounds), while maintaining
prediction performance similar to that achievable with other forms of regularization. While there
exist fast ways to generate adversarial samples [7, 12, 19] without constraints, we provide
algorithms based on the first order approximation of the classifier which generate adversarial
samples satisfying box constraints in O(d log d), where d is the input dimension.

2 Formal Robustness Guarantees for Classifiers


In the following we consider the multi-class setting with $K$ classes and $d$ features, where one
has a classifier $f : \mathbb{R}^d \to \mathbb{R}^K$ and a point $x$ is classified via $c = \arg\max_{j=1,\ldots,K} f_j(x)$. We call a
classifier robust at $x$ if small changes of the input do not alter the decision. Formally, the
problem can be described as follows [24]. Suppose that the classifier outputs class $c$ for
input $x$, that is $f_c(x) > f_j(x)$ for $j \neq c$ (we assume the decision is unique). The problem of
generating an input $x + \delta$ such that the classifier decision changes can be formulated as
$$\min_{\delta \in \mathbb{R}^d} \|\delta\|_p, \quad \text{s.th.} \quad \max_{l \neq c} f_l(x+\delta) \ge f_c(x+\delta) \quad \text{and} \quad x+\delta \in C, \qquad (1)$$

where C is a constraint set specifying certain requirements on the generated input x + δ, e.g.,
an image has to be in $[0,1]^d$. Typically, the optimization problem (1) is non-convex and thus
intractable. The points $x+\delta$ generated in this way are called adversarial samples. Depending on
the p-norm the perturbations have different characteristics: for p = ∞ the perturbations are
small and affect all features, whereas for p = 1 one gets sparse solutions up to the extreme
case that only a single feature is changed. In [24] they used p = 2 which leads to more spread
but still localized perturbations. The striking result of [24, 7] was that for most instances
in computer vision datasets, the change δ necessary to alter the decision is astonishingly
small and thus clearly the label should not change. However, we will see later that our new
regularizer leads to robust classifiers in the sense that the required adversarial change is so
large that now also the class label changes (we have found the correct decision boundary),
see Fig 4. Already in [24] it is suggested to add the generated adversarial samples as a form
of data augmentation during the training of neural networks in order to achieve robustness.
This is denoted as adversarial training. Later on fast ways to approximately solve (1) were
proposed in order to speed up the adversarial training process [7, 12, 19]. However, in this
way, given that the approximation is successful, that is $\arg\max_j f_j(x+\delta) \neq c$, one gets just
upper bounds on the perturbation necessary to change the classifier decision. Also it was
noted early on, that the final classifier achieved by adversarial training is again vulnerable
to adversarial samples [7]. Robust optimization has been suggested as a measure against
adversarial manipulation [12, 21] which effectively boils down to adversarial training in
practice. It is thus fair to say that to date no mechanism exists which prevents the
generation of adversarial samples or can defend against them [3].
In this paper we focus instead on robustness guarantees, that is we show that the classifier
decision does not change in a small ball around the instance. Thus our guarantees hold for
any method to generate adversarial samples or input transformations due to noise or sensor
failure etc. Such formal guarantees are, from our point of view, absolutely necessary when a
classifier becomes part of a safety-critical technical system such as autonomous driving. In
the following we will first show how one can achieve such a guarantee and then explicitly

derive bounds for kernel methods and neural networks. We think that such formal guarantees
on robustness should be investigated further and it should become standard to report them
for different classifiers alongside the usual performance measures.

2.1 Formal Robustness Guarantee against Adversarial Manipulation

The following guarantee holds for any classifier which is continuously differentiable with
respect to the input in each output component. It is instance-specific and depends to some
extent on the confidence in the decision, at least if we measure confidence by the relative
difference $f_c(x) - \max_{j \neq c} f_j(x)$, as is typical for the cross-entropy loss and other multi-class
losses. In the following we use the notation $B_p(x,R) = \{ y \in \mathbb{R}^d \mid \|x-y\|_p \le R\}$.
Theorem 2.1. Let $x \in \mathbb{R}^d$ and let $f : \mathbb{R}^d \to \mathbb{R}^K$ be a multi-class classifier with continuously
differentiable components, and let $c = \arg\max_{j=1,\ldots,K} f_j(x)$ be the class which $f$ predicts for $x$.
Let $q \in \mathbb{R}$ be defined via $\frac{1}{p} + \frac{1}{q} = 1$. Then for all $\delta \in \mathbb{R}^d$ with
$$\|\delta\|_p \;\le\; \max_{R>0}\, \min\Big\{ \min_{j \neq c} \frac{f_c(x) - f_j(x)}{\max_{y \in B_p(x,R)} \|\nabla f_c(y) - \nabla f_j(y)\|_q},\; R \Big\} \;=:\; \alpha,$$
it holds that $c = \arg\max_{j=1,\ldots,K} f_j(x+\delta)$, that is the classifier decision does not change on $B_p(x,\alpha)$.

Proof. By the fundamental theorem of calculus, it holds that
$$f_j(x+\delta) = f_j(x) + \int_0^1 \langle \nabla f_j(x+t\delta), \delta\rangle\, dt, \quad \text{for } j = 1,\ldots,K.$$
Thus, in order to achieve $f_j(x+\delta) \ge f_c(x+\delta)$, it has to hold that
$$0 \le f_c(x) - f_j(x) \le \int_0^1 \langle \nabla f_j(x+t\delta) - \nabla f_c(x+t\delta), \delta\rangle\, dt \le \|\delta\|_p \int_0^1 \|\nabla f_j(x+t\delta) - \nabla f_c(x+t\delta)\|_q\, dt,$$
where the first inequality holds as $f_c(x) \ge f_j(x)$ for all $j = 1,\ldots,K$, and in the last step we
have used Hölder's inequality together with the fact that the $q$-norm is dual to the $p$-norm,
where $q$ is defined via $\frac{1}{p} + \frac{1}{q} = 1$. Thus the minimal norm of the change $\delta$ required to change
the classifier decision from $c$ to $j$ satisfies
$$\|\delta\|_p \ge \frac{f_c(x) - f_j(x)}{\int_0^1 \|\nabla f_j(x+t\delta) - \nabla f_c(x+t\delta)\|_q\, dt}.$$
We upper bound the denominator over some fixed ball $B_p(x,R)$. Note that by doing this,
we can only make assertions for perturbations $\delta \in B_p(0,R)$ and thus the upper bound in the
guarantee is at most $R$. It holds
$$\sup_{\delta \in B_p(0,R)} \int_0^1 \|\nabla f_j(x+t\delta) - \nabla f_c(x+t\delta)\|_q\, dt \;\le\; \max_{y \in B_p(x,R)} \|\nabla f_j(y) - \nabla f_c(y)\|_q.$$
Thus we get the lower bound for the minimal norm of the change $\delta$ required to change the
classifier decision from $c$ to $j$,
$$\|\delta\|_p \ge \min\Big\{ R,\; \frac{f_c(x) - f_j(x)}{\max_{y \in B_p(x,R)} \|\nabla f_j(y) - \nabla f_c(y)\|_q}\Big\} =: \alpha.$$
As we are interested in the worst case, we take the minimum over all $j \neq c$. Finally, the result
holds for any fixed $R > 0$, so that we can maximize over $R$, which yields the final result.

Note that the bound requires in the denominator a bound on the local Lipschitz constant
of all cross terms $f_c - f_j$, which we call the local cross-Lipschitz constant in the following.
However, we do not require a global bound. The problem with a global bound is
that the ideal robust classifier is basically piecewise constant on larger regions with sharp
transitions between the classes. The global Lipschitz constant would then be dominated by
the sharp transition zones and would not yield a good bound, whereas the
local bound can adapt to regions where the classifier is approximately constant and then
yields good guarantees. In [24, 4] they suggest to study the global Lipschitz constant$^1$ of
each $f_j$, $j = 1,\ldots,K$. A small global Lipschitz constant for all $f_j$ implies a good bound as
$$\|\nabla f_j(y) - \nabla f_c(y)\|_q \le \|\nabla f_j(y)\|_q + \|\nabla f_c(y)\|_q, \qquad (2)$$
but the converse does not hold. As discussed below, it turns out that our local estimates are
significantly better than the suggested global estimates, which implies also better robustness
guarantees. In turn we want to emphasize that our bound is tight, that is the bound is
attained, for linear classifiers $f_j(x) = \langle w_j, x\rangle$, $j = 1,\ldots,K$. It holds
$$\|\delta\|_p = \min_{j \neq c} \frac{\langle w_c - w_j, x\rangle}{\|w_c - w_j\|_q}.$$
In Section 4 we refine this result for the case when the input is constrained to $[0,1]^d$. In
general, it is possible to integrate constraints on the input by simply taking the maximum
over the intersection of $B_p(x,R)$ with the constraint set, e.g. $[0,1]^d$ for gray-scale images.
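To make Theorem 2.1 operational, the following is a minimal Python sketch (ours, not part of the paper) of how the guarantee $\alpha$ can be evaluated once one has, for every class $j \neq c$, an upper bound on the local cross-Lipschitz constant $\max_{y \in B_p(x,R)} \|\nabla f_c(y) - \nabla f_j(y)\|_q$ (such bounds are derived in Sections 2.2 and 2.3); the paper optimizes over $R$ with a binary search, here a simple grid is used for clarity.

```python
def robustness_guarantee(f_vals, cross_lip_bound, c, R_grid):
    """Sketch of the guarantee alpha of Theorem 2.1 (hypothetical helper, ours).

    f_vals:          sequence of the K classifier outputs f_j(x).
    cross_lip_bound: callable (j, R) -> upper bound on
                     max_{y in B_p(x, R)} ||grad f_c(y) - grad f_j(y)||_q,
                     e.g. via Proposition 2.1 or 2.2; assumed non-decreasing in R.
    c:               predicted class argmax_j f_j(x).
    R_grid:          candidate radii R > 0 over which the outer maximum is taken.
    """
    K = len(f_vals)
    alpha = 0.0
    for R in R_grid:
        # inner minimum over all classes j != c, with a small guard against division by zero
        bound_R = min((f_vals[c] - f_vals[j]) / max(cross_lip_bound(j, R), 1e-12)
                      for j in range(K) if j != c)
        # for a fixed R the guarantee is additionally capped at R itself
        alpha = max(alpha, min(bound_R, R))
    return alpha
```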

2.2 Evaluation of the Bound for Kernel Methods


Next, we discuss how the bound can be evaluated for different classifier models. For simplicity
we restrict ourselves to the case $p = 2$ (which implies $q = 2$) and leave the other cases to
future work. We consider the class of kernel methods, that is the classifier has the form
$$f_j(x) = \sum_{r=1}^n \alpha_{jr}\, k(x_r, x),$$
where $(x_r)_{r=1}^n$ are the $n$ training points, $k : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ is a positive definite kernel function
and $\alpha \in \mathbb{R}^{K \times n}$ are the trained parameters, e.g. of an SVM. The goal is to upper bound the
term $\max_{y \in B_2(x,R)} \|\nabla f_j(y) - \nabla f_c(y)\|_2$ for this classifier model. A simple calculation shows
$$0 \le \|\nabla f_j(y) - \nabla f_c(y)\|_2^2 = \sum_{r,s=1}^n (\alpha_{jr} - \alpha_{cr})(\alpha_{js} - \alpha_{cs})\, \langle \nabla_y k(x_r, y), \nabla_y k(x_s, y)\rangle. \qquad (3)$$

It has been reported that kernel methods with a Gaussian kernel are robust to noise. Thus
we specialize now to this class, that is $k(x,y) = e^{-\gamma\|x-y\|_2^2}$. In this case
$$\langle \nabla_y k(x_r, y), \nabla_y k(x_s, y)\rangle = 4\gamma^2\, \langle y - x_r, y - x_s\rangle\, e^{-\gamma\|x_r - y\|_2^2}\, e^{-\gamma\|x_s - y\|_2^2}.$$
We will now derive lower and upper bounds on this term uniformly over $B_2(x,R)$, which
allows us to derive the guarantee.

Lemma 2.1. Let $M = \min\big\{\frac{\|2x - x_r - x_s\|_2}{2},\, R\big\}$, then
$$\max_{y \in B_2(x,R)} \langle y - x_r, y - x_s\rangle = \langle x - x_r, x - x_s\rangle + R\,\|2x - x_r - x_s\|_2 + R^2,$$
$$\min_{y \in B_2(x,R)} \langle y - x_r, y - x_s\rangle = \langle x - x_r, x - x_s\rangle - M\,\|2x - x_r - x_s\|_2 + M^2,$$
$$\max_{y \in B_2(x,R)} e^{-\gamma\|x_r - y\|_2^2}\, e^{-\gamma\|x_s - y\|_2^2} = e^{-\gamma\left(\|x-x_r\|_2^2 + \|x-x_s\|_2^2 - 2M\|2x-x_r-x_s\|_2 + 2M^2\right)},$$
$$\min_{y \in B_2(x,R)} e^{-\gamma\|x_r - y\|_2^2}\, e^{-\gamma\|x_s - y\|_2^2} = e^{-\gamma\left(\|x-x_r\|_2^2 + \|x-x_s\|_2^2 + 2R\|2x-x_r-x_s\|_2 + 2R^2\right)}.$$

$^1$ The Lipschitz constant $L$ wrt the $p$-norm of a piecewise continuously differentiable function $f$ is
given as $L = \sup_{x \in \mathbb{R}^d} \|\nabla f(x)\|_q$. Then it holds $|f(x) - f(y)| \le L\,\|x - y\|_p$.

Proof. For the first part we use
$$\max_{y \in B_2(x,R)} \langle y - x_r, y - x_s\rangle = \max_{h \in B_2(0,R)} \langle x - x_r + h, x - x_s + h\rangle = \langle x - x_r, x - x_s\rangle + \max_{h \in B_2(0,R)}\big(\langle h, 2x - x_r - x_s\rangle + \|h\|_2^2\big) = \langle x - x_r, x - x_s\rangle + R\,\|2x - x_r - x_s\|_2 + R^2,$$
where the last equality follows by Cauchy-Schwarz, noting that equality is attained as
we maximize over the Euclidean ball. For the second part we consider
$$\min_{y \in B_2(x,R)} \langle y - x_r, y - x_s\rangle = \min_{h \in B_2(0,R)} \langle x - x_r + h, x - x_s + h\rangle = \langle x - x_r, x - x_s\rangle + \min_{h \in B_2(0,R)}\big(\langle h, 2x - x_r - x_s\rangle + \|h\|_2^2\big) = \langle x - x_r, x - x_s\rangle + \min_{0 \le \alpha \le R}\big(-\alpha\,\|2x - x_r - x_s\|_2 + \alpha^2\big) = \langle x - x_r, x - x_s\rangle - \min\Big\{\frac{\|2x - x_r - x_s\|_2}{2}, R\Big\}\,\|2x - x_r - x_s\|_2 + \Big(\min\Big\{\frac{\|2x - x_r - x_s\|_2}{2}, R\Big\}\Big)^2,$$
where in the second step we have separated direction and norm of the vector; optimization
over the direction yields the result with Cauchy-Schwarz. Finally, the constrained convex
one-dimensional optimization problem can be solved explicitly as $\alpha = \min\big\{\frac{\|2x - x_r - x_s\|_2}{2}, R\big\}$.
The proof of the other results follows analogously, noting that
$$e^{-\gamma\|x+h-x_r\|_2^2}\, e^{-\gamma\|x+h-x_s\|_2^2} = e^{-\gamma\left(\|x-x_r\|_2^2 + \|x-x_s\|_2^2 + 2\langle h, 2x - x_r - x_s\rangle + 2\|h\|_2^2\right)}.$$

Using this lemma it is easy to derive the final result.


Proposition 2.1. Let $\beta_r = \alpha_{jr} - \alpha_{cr}$, $r = 1,\ldots,n$, and define $M = \min\big\{\frac{\|2x - x_r - x_s\|_2}{2}, R\big\}$
and $S = \|2x - x_r - x_s\|_2$. Then
$$\max_{y \in B_2(x,R)} \|\nabla f_j(y) - \nabla f_c(y)\|_2 \le 2\gamma\Bigg( \sum_{\substack{r,s=1\\ \beta_r\beta_s \ge 0}}^n \beta_r\beta_s\Big[ \max\{\langle x - x_r, x - x_s\rangle + RS + R^2, 0\}\, e^{-\gamma\left(\|x-x_r\|_2^2 + \|x-x_s\|_2^2 - 2MS + 2M^2\right)} + \min\{\langle x - x_r, x - x_s\rangle + RS + R^2, 0\}\, e^{-\gamma\left(\|x-x_r\|_2^2 + \|x-x_s\|_2^2 + 2RS + 2R^2\right)}\Big] + \sum_{\substack{r,s=1\\ \beta_r\beta_s < 0}}^n \beta_r\beta_s\Big[ \max\{\langle x - x_r, x - x_s\rangle - MS + M^2, 0\}\, e^{-\gamma\left(\|x-x_r\|_2^2 + \|x-x_s\|_2^2 + 2RS + 2R^2\right)} + \min\{\langle x - x_r, x - x_s\rangle - MS + M^2, 0\}\, e^{-\gamma\left(\|x-x_r\|_2^2 + \|x-x_s\|_2^2 - 2MS + 2M^2\right)}\Big]\Bigg)^{1/2}$$

Proof. We bound each term in the sum in Equation (3) separately, using that $ac \le bd$ if $b, c \ge 0$,
$a \le b$ and $c \le d$, or if $b \le 0$, $d \ge 0$, $a \le b$ and $c \ge d$, where $c, d$ correspond to the exponential
terms and $a, b$ to the bounds of the inner product. Similarly, $ac \ge bd$ if $b, c \ge 0$, $a \ge b$
and $c \ge d$, or if $a \le 0$, $d \ge 0$, $a \ge b$ and $c \le d$. The individual upper and lower bounds are
taken from Lemma 2.1.

While the bound leads to non-trivial estimates as seen in Section 5, the bound is not very
tight. The reason is that the sum is bounded elementwise, which is quite pessimistic. We
think that better bounds are possible but have to postpone this to future work.
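As an illustration of how Proposition 2.1 can be used in practice, here is a NumPy sketch (ours, not from the paper) that evaluates the bound for a Gaussian-kernel classifier; `alpha` is the $K \times n$ coefficient matrix, `X` the matrix of training points, and the element-wise bounds come from Lemma 2.1.

```python
import numpy as np

def kernel_cross_lip_bound(X, alpha, gamma, x, c, j, R):
    """Upper bound of Prop. 2.1 on max_{y in B_2(x,R)} ||grad f_j(y) - grad f_c(y)||_2
    for f_j(x) = sum_r alpha[j, r] * exp(-gamma * ||x_r - x||^2)."""
    beta = alpha[j] - alpha[c]                       # shape (n,)
    diff = x - X                                     # x - x_r, shape (n, d)
    sq = np.sum(diff ** 2, axis=1)                   # ||x - x_r||^2, shape (n,)
    ip = diff @ diff.T                               # <x - x_r, x - x_s>, shape (n, n)
    S = np.linalg.norm(diff[:, None, :] + diff[None, :, :], axis=2)   # ||2x - x_r - x_s||_2
    M = np.minimum(S / 2.0, R)
    sqsum = sq[:, None] + sq[None, :]                # ||x - x_r||^2 + ||x - x_s||^2

    # Lemma 2.1: bounds on the inner product and on the exponential terms over B_2(x, R)
    ip_max = ip + R * S + R ** 2
    ip_min = ip - M * S + M ** 2
    exp_max = np.exp(-gamma * (sqsum - 2 * M * S + 2 * M ** 2))
    exp_min = np.exp(-gamma * (sqsum + 2 * R * S + 2 * R ** 2))

    bb = np.outer(beta, beta)
    terms = np.where(
        bb >= 0,
        bb * (np.maximum(ip_max, 0) * exp_max + np.minimum(ip_max, 0) * exp_min),
        bb * (np.maximum(ip_min, 0) * exp_min + np.minimum(ip_min, 0) * exp_max),
    )
    return 2 * gamma * np.sqrt(max(terms.sum(), 0.0))
```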

2.3 Evaluation of the Bound for Neural Networks
We derive the bound for a neural network with one hidden layer. In principle, the technique
we apply below can be used for arbitrarily many layers, but the computational complexity increases
rapidly. The problem is that in the directed network topology one has to consider almost
every path separately to derive the bound. Let $U$ be the number of hidden units and let $w$ and $u$
be the weight matrices of the output resp. input layer. We assume that the activation
function $\sigma$ is continuously differentiable and that the derivative $\sigma'$ is monotonically
increasing. The prototype activation function we have in mind, and which we use later
on in the experiments, is the differentiable approximation $\sigma_\alpha(x) = \frac{1}{\alpha}\log(1 + e^{\alpha x})$ of the
ReLU activation function $\sigma_{ReLU}(x) = \max\{0, x\}$. Note that $\lim_{\alpha \to \infty} \sigma_\alpha(x) = \sigma_{ReLU}(x)$ and
$\sigma_\alpha'(x) = \frac{1}{1 + e^{-\alpha x}}$. The output of the neural network can be written as
$$f_j(x) = \sum_{r=1}^U w_{jr}\, \sigma\Big(\sum_{s=1}^d u_{rs} x_s\Big), \quad j = 1,\ldots,K,$$
where for simplicity we omit any bias terms, but it is straightforward to also consider models
with bias. A direct computation shows that
$$\|\nabla f_j(y) - \nabla f_c(y)\|_2^2 = \sum_{r,m=1}^U (w_{jr} - w_{cr})(w_{jm} - w_{cm})\, \sigma'(\langle u_r, y\rangle)\, \sigma'(\langle u_m, y\rangle) \sum_{l=1}^d u_{rl} u_{ml}, \qquad (4)$$
where $u_r \in \mathbb{R}^d$ is the $r$-th row of the weight matrix $u \in \mathbb{R}^{U \times d}$. The resulting bound is given
in the following proposition.
Proposition 2.2. Let $\sigma$ be a continuously differentiable activation function with $\sigma'$ monotonically
increasing. Define $\beta_{rm} = (w_{jr} - w_{cr})(w_{jm} - w_{cm}) \sum_{l=1}^d u_{rl} u_{ml}$. Then
$$\max_{y \in B_2(x,R)} \|\nabla f_j(y) - \nabla f_c(y)\|_2 \le \Big[ \sum_{r,m=1}^U \max\{\beta_{rm}, 0\}\, \sigma'\big(\langle u_r, x\rangle + R\|u_r\|_2\big)\, \sigma'\big(\langle u_m, x\rangle + R\|u_m\|_2\big) + \min\{\beta_{rm}, 0\}\, \sigma'\big(\langle u_r, x\rangle - R\|u_r\|_2\big)\, \sigma'\big(\langle u_m, x\rangle - R\|u_m\|_2\big) \Big]^{\frac{1}{2}}$$

Proof. The proof is based on the fact that, due to the monotonicity of $\sigma'$ and with Cauchy-Schwarz,
$$\max_{y \in B_2(x,R)} \sigma'(\langle u_r, y\rangle) = \max_{h \in B_2(0,R)} \sigma'(\langle u_r, x\rangle + \langle u_r, h\rangle) = \sigma'(\langle u_r, x\rangle + R\|u_r\|_2).$$
Similarly, one gets
$$\min_{y \in B_2(x,R)} \sigma'(\langle u_r, y\rangle) = \min_{h \in B_2(0,R)} \sigma'(\langle u_r, x\rangle + \langle u_r, h\rangle) = \sigma'(\langle u_r, x\rangle - R\|u_r\|_2).$$
The rest of the result follows by element-wise bounding of the terms in the sum in Equation (4).

As discussed above, the global Lipschitz bounds of the individual classifier outputs, see (2),
lead to an upper bound of our desired local cross-Lipschitz constant. In the experiments
below our local bounds on the Lipschitz constant are up to 8 times smaller than what one
would achieve via the global Lipschitz bounds of [24]. This shows that their global approach
is much too rough to get meaningful robustness guarantees.
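For concreteness, the following NumPy sketch (ours, not the authors' code) evaluates the bound of Proposition 2.2 for a one hidden layer network with the softplus-type activation $\sigma_\alpha$ from above; `w` has shape $(K, U)$ and `u` has shape $(U, d)$, matching the notation of this section.

```python
import numpy as np

def sigma_prime(z, a=10.0):
    # derivative of sigma_alpha(x) = (1/a) * log(1 + exp(a*x)), i.e. the logistic function
    return 1.0 / (1.0 + np.exp(-a * z))

def nn_cross_lip_bound(w, u, x, c, j, R, a=10.0):
    """Upper bound of Prop. 2.2 on max_{y in B_2(x,R)} ||grad f_j(y) - grad f_c(y)||_2."""
    wd = w[j] - w[c]                                  # shape (U,)
    beta = np.outer(wd, wd) * (u @ u.T)               # beta_{rm}, shape (U, U)
    ux = u @ x                                        # <u_r, x>, shape (U,)
    un = np.linalg.norm(u, axis=1)                    # ||u_r||_2
    s_hi = sigma_prime(ux + R * un, a)                # max of sigma'(<u_r, y>) over the ball
    s_lo = sigma_prime(ux - R * un, a)                # min of sigma'(<u_r, y>) over the ball
    terms = (np.maximum(beta, 0) * np.outer(s_hi, s_hi)
             + np.minimum(beta, 0) * np.outer(s_lo, s_lo))
    return np.sqrt(max(terms.sum(), 0.0))
```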

3 The Cross-Lipschitz Regularization Functional


We have seen in Section 2 that if
$$\max_{j \neq c}\; \max_{y \in B_p(x,R)} \|\nabla f_c(y) - \nabla f_j(y)\|_q \qquad (5)$$
is small and $f_c(x) - f_j(x)$ is large, then we get good robustness guarantees. The latter
property is typically already optimized in a multi-class loss function. We consider for all

methods in this paper the cross-entropy loss, so that the differences in the results only come
from the chosen function class (kernel methods versus neural networks) and the chosen
regularization functional. The cross-entropy loss $L : \{1,\ldots,K\} \times \mathbb{R}^K \to \mathbb{R}$ is given as
$$L(y, f(x)) = -\log\Big( \frac{e^{f_y(x)}}{\sum_{k=1}^K e^{f_k(x)}} \Big) = \log\Big( 1 + \sum_{k \neq y}^K e^{f_k(x) - f_y(x)} \Big).$$
In the latter formulation it becomes apparent that the loss tries to make the difference
$f_y(x) - f_k(x)$ as large as possible for all $k = 1,\ldots,K$.
As our goal is good robustness guarantees, it is natural to consider a proxy of the quantity
in (5) for regularization. We define the Cross-Lipschitz regularization functional as
$$\Omega(f) = \frac{1}{nK^2} \sum_{i=1}^n \sum_{l,m=1}^K \|\nabla f_l(x_i) - \nabla f_m(x_i)\|_2^2, \qquad (6)$$
where the $(x_i)_{i=1}^n$ are the training points. The goal of this regularization functional is to
make the differences of the classifier functions at the data points as constant as possible. In
total, by minimizing
$$\frac{1}{n} \sum_{i=1}^n L\big(y_i, f(x_i)\big) + \lambda\, \Omega(f) \qquad (7)$$

over some function class, we thus try to maximize $f_c(x_i) - f_j(x_i)$ and at the same time
keep $\|\nabla f_l(x_i) - \nabla f_m(x_i)\|_2^2$ small uniformly over all classes. This automatically enforces
robustness of the resulting classifier. It is important to note that this regularization functional
is coherent with the loss as it shares the same degrees of freedom: adding the same
function $g$ to all outputs, $f_j'(x) = f_j(x) + g(x)$, leaves loss and regularization functional
invariant. This is the main difference to [4], where they enforce the global Lipschitz constant
to be smaller than one.

3.1 Cross-Lipschitz Regularization in Kernel Methods

In kernel methods one typically uses the regularization functional induced by the kernel, which
is given as the squared norm of the function $f(x) = \sum_{i=1}^n \alpha_i k(x_i, x)$ in the corresponding
reproducing kernel Hilbert space $H_k$, $\|f\|_{H_k}^2 = \sum_{i,j=1}^n \alpha_i\alpha_j k(x_i, x_j)$. In particular, for
translation invariant kernels one can directly make a connection to penalization of derivatives
of the function $f$ via the Fourier transform, see [20]. However, penalizing higher-order
derivatives is irrelevant for achieving robustness. Given the kernel expansion of $f$, one can
write the Cross-Lipschitz regularization functional as
$$\Omega(f) = \frac{1}{nK^2} \sum_{i=1}^n \sum_{l,m=1}^K \sum_{r,s=1}^n (\alpha_{lr} - \alpha_{mr})(\alpha_{ls} - \alpha_{ms})\, \langle \nabla_y k(x_r, x_i), \nabla_y k(x_s, x_i)\rangle.$$
$\Omega$ is convex in $\alpha \in \mathbb{R}^{K \times n}$ as $k'(x_r, x_s) = \langle \nabla_y k(x_r, x_i), \nabla_y k(x_s, x_i)\rangle$ is a positive definite
kernel for any $x_i$, and with the convex cross-entropy loss the learning problem in (7) is convex.
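A minimal NumPy sketch (ours) of the kernel Cross-Lipschitz regularizer for the Gaussian kernel, using the closed form $\langle \nabla_y k(x_r, x_i), \nabla_y k(x_s, x_i)\rangle = 4\gamma^2 \langle x_i - x_r, x_i - x_s\rangle e^{-\gamma\|x_r - x_i\|_2^2} e^{-\gamma\|x_s - x_i\|_2^2}$ from Section 2.2; the loop over training points is written with matrix products for readability rather than efficiency.

```python
import numpy as np

def gaussian_cross_lip(alpha, X, gamma):
    """Omega(f) for f_l(x) = sum_r alpha[l, r] * exp(-gamma * ||x_r - x||^2).
    alpha: (K, n) coefficient matrix, X: (n, d) training points."""
    K, n = alpha.shape
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)   # ||x_r - x_i||^2
    E = np.exp(-gamma * sq)                                     # (n, n), entry [r, i]
    omega = 0.0
    for i in range(n):
        D = X[i] - X                                            # x_i - x_r, shape (n, d)
        G = 4 * gamma ** 2 * (D @ D.T) * np.outer(E[:, i], E[:, i])  # kernel-gradient Gram matrix
        A = alpha @ G @ alpha.T                                 # (K, K): <grad f_l, grad f_m> at x_i
        # sum_{l,m} ||grad f_l - grad f_m||^2 = 2K * trace(A) - 2 * sum(A)
        omega += 2 * K * np.trace(A) - 2 * A.sum()
    return omega / (n * K ** 2)
```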

3.2 Cross-Lipschitz Regularization in Neural Networks

The standard way to regularize neural networks is weight decay; that is, the squared
Euclidean norm of all weights is added to the objective. More recently dropout [22], which
can be seen as a form of stochastic regularization, has been introduced. Dropout can
also be interpreted as a form of regularization of the weights [22, 10]. It is interesting
to note that classical regularization functionals which penalize derivatives of the resulting
classifier function are not typically used in deep learning, but see [6, 11]. As noted above
we restrict ourselves to one hidden layer neural networks to simplify notation, that is,
$f_j(x) = \sum_{r=1}^U w_{jr}\, \sigma\big(\sum_{s=1}^d u_{rs} x_s\big)$, $j = 1,\ldots,K$. Then we can write the Cross-Lipschitz
regularization as
$$\Omega(f) = \frac{2}{nK^2} \sum_{r,s=1}^U \Big( K \sum_{l=1}^K w_{lr} w_{ls} - \sum_{l=1}^K w_{lr} \sum_{m=1}^K w_{ms} \Big) \sum_{i=1}^n \sigma'(\langle u_r, x_i\rangle)\, \sigma'(\langle u_s, x_i\rangle) \sum_{l=1}^d u_{rl} u_{sl},$$
which leads to an expression that can be evaluated quickly using vectorization. Obviously, one
can also implement the Cross-Lipschitz regularization for all standard deep networks.
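A short NumPy sketch (ours) of the vectorized expression above; `w` is the $(K, U)$ output weight matrix, `u` the $(U, d)$ input weight matrix, `X` the $(n, d)$ matrix of training points, and the softplus derivative $\sigma_\alpha'$ is assumed.

```python
import numpy as np

def one_layer_cross_lip(w, u, X, a=10.0):
    """Cross-Lipschitz regularizer Omega(f) for f_j(x) = sum_r w[j, r] * sigma(<u_r, x>)."""
    K, U = w.shape
    n = X.shape[0]
    S = 1.0 / (1.0 + np.exp(-a * (X @ u.T)))     # sigma'(<u_r, x_i>), shape (n, U)
    # W_fac[r, s] = K * sum_l w_lr * w_ls - (sum_l w_lr) * (sum_m w_ms)
    W_fac = K * (w.T @ w) - np.outer(w.sum(axis=0), w.sum(axis=0))
    G = (S.T @ S) * (u @ u.T)                    # sum_i sigma' sigma'  *  sum_l u_rl u_sl
    return 2.0 / (n * K ** 2) * np.sum(W_fac * G)
```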

4 Box Constrained Adversarial Sample Generation


The main emphasis of this paper is on robustness guarantees without resorting to particular
ways of generating adversarial samples. On the other hand, while Theorem 2.1 gives
lower bounds on the required input transformation, efficient ways to approximately solve
the adversarial sample generation in (1) are helpful to get upper bounds on the required
change. Upper bounds allow us to check how tight our derived lower bounds are. As all of
our experiments are concerned with images, it is reasonable that our adversarial samples
are also images. However, to our knowledge, the current main techniques to generate
adversarial samples [7, 12, 19] integrate box constraints by clipping the results to $[0,1]^d$. We
provide in the following fast algorithms to generate adversarial samples which lie in $[0,1]^d$.
The strategy is similar to [12], where they use a linear approximation of the classifier to
derive adversarial samples with respect to different norms. Formally,
$$f_j(x+\delta) \approx f_j(x) + \langle \nabla f_j(x), \delta\rangle, \quad j = 1,\ldots,K.$$
Assuming that the linear approximation holds, the optimization problem (1) integrating box
constraints for changing class $c$ into $j$ becomes
$$\min_{\delta \in \mathbb{R}^d} \|\delta\|_p \quad \text{sbj. to:} \quad f_j(x) - f_c(x) \ge \langle \nabla f_c(x) - \nabla f_j(x), \delta\rangle, \quad 0 \le x_j + \delta_j \le 1. \qquad (8)$$
In order to get the minimal adversarial sample we have to solve this for all $j \neq c$ and take
the one with minimal $\|\delta\|_p$. This yields the minimal adversarial change for linear classifiers.
Note that (8) is a convex optimization problem which can be reduced to a one-parameter
problem in the dual. This allows us to derive the following result (proofs and algorithms are in
the supplement).

Proposition 4.1. Let $p \in \{1, 2, \infty\}$. Then (8) can be solved in $O(d\log d)$ time.

We start with the problem for $p = 2$, which is given as
$$\min_{\delta \in \mathbb{R}^d} \|\delta\|_2 \quad \text{sbj. to:} \quad f_j(x) - f_c(x) \ge \langle \nabla f_c(x) - \nabla f_j(x), \delta\rangle, \quad 0 \le x_j + \delta_j \le 1. \qquad (9)$$

Lemma 4.1. Let $m = \arg\max_j f_j(x)$ and define $v = \nabla f_m(x) - \nabla f_j(x)$ and $c = f_j(x) - f_m(x) < 0$.
If a solution of problem (9) exists, then it is given as
$$\delta_r = \begin{cases} -\lambda v_r, & \text{if } -x_r \le -\lambda v_r \le 1 - x_r,\\ 1 - x_r, & \text{if } -\lambda v_r \ge 1 - x_r,\\ -x_r, & \text{if } -x_r \ge -\lambda v_r. \end{cases}$$
The optimal $\lambda \ge 0$ can be obtained by solving
$$c = \langle v, \delta\rangle = -\lambda \sum_{-x_r \le -\lambda v_r \le 1 - x_r} v_r^2 + \sum_{-\lambda v_r > 1 - x_r} v_r(1 - x_r) - \sum_{-\lambda v_r < -x_r} v_r x_r.$$
If problem (9) is infeasible, then this equation has no solution for $\lambda \ge 0$. In both the feasible and
infeasible case the solution can be found in $O(d\log d)$. The algorithm is given in Algorithm 1.

Proof. The Lagrangian is given by
$$L(\delta, \lambda, \alpha, \beta) = \frac{1}{2}\|\delta\|_2^2 + \lambda\big(\langle v, \delta\rangle - c\big) + \langle \alpha, x + \delta - \mathbf{1}\rangle - \langle \beta, x + \delta\rangle.$$
The KKT conditions become
$$\delta + \lambda v + \alpha - \beta = 0,$$
$$\alpha_r(x_r + \delta_r - 1) = 0, \quad \beta_r(x_r + \delta_r) = 0, \quad \forall r = 1,\ldots,d,$$
$$\lambda\big(\langle v, \delta\rangle - c\big) = 0,$$
$$\alpha_r \ge 0, \quad \beta_r \ge 0, \quad \forall r = 1,\ldots,d, \qquad \lambda \ge 0.$$
We deduce that if $\beta_r > 0$ then $\alpha_r = 0$, which implies
$$\delta_r = -x_r = -\lambda v_r + \beta_r \;\Longrightarrow\; \beta_r = \max\{0, -x_r + \lambda v_r\}.$$
Similarly, if $\alpha_r > 0$ then $\beta_r = 0$, which implies
$$\delta_r = 1 - x_r = -\lambda v_r - \alpha_r \;\Longrightarrow\; \alpha_r = \max\{0, x_r - 1 - \lambda v_r\}.$$
It follows
$$\delta_r = \begin{cases} -\lambda v_r, & \text{if } -x_r < -\lambda v_r < 1 - x_r,\\ 1 - x_r, & \text{if } -\lambda v_r > 1 - x_r,\\ -x_r, & \text{if } -\lambda v_r < -x_r. \end{cases}$$
We can determine $\lambda$ by inspecting $\langle v, \delta\rangle$, which is given as
$$\langle v, \delta\rangle = -\lambda \sum_{-x_r \le -\lambda v_r \le 1 - x_r} v_r^2 + \sum_{-\lambda v_r > 1 - x_r} v_r(1 - x_r) - \sum_{-\lambda v_r < -x_r} v_r x_r.$$
Note that $\lambda \ge 0$ and $1 - x_r \ge 0$, and thus $-\lambda v_r > 1 - x_r$ implies $v_r < 0$, so $v_r(1 - x_r) \le 0$;
similarly $-\lambda v_r < -x_r$ implies $v_r > 0$ and thus also $-v_r x_r < 0$. Note that the term
$\langle v, \delta\rangle$ is monotonically decreasing as $\lambda$ increases. Thus one can sort $\max\{\frac{x_r - 1}{v_r}, \frac{x_r}{v_r}\}$ in
increasing order; these values represent the thresholds at which the summation changes. Then we
compute $\langle v, \delta\rangle$ for all of these thresholds and determine the largest threshold $\lambda^*$ such that
$\langle v, \delta\rangle \le c$. This fixes the index sets of all sums. Then we determine $\lambda$ by
$$\lambda = \frac{\sum_{-\lambda^* v_r > 1 - x_r} v_r(1 - x_r) - \sum_{-\lambda^* v_r < -x_r} v_r x_r - c}{\sum_{-x_r \le -\lambda^* v_r \le 1 - x_r} v_r^2}.$$
In total, sorting takes time $O(d\log d)$ and solving for $\lambda^*$ has complexity $O(d)$.

Next we consider the case $p = 1$:
$$\min_{\delta \in \mathbb{R}^d} \|\delta\|_1 \quad \text{sbj. to:} \quad f_j(x) - f_c(x) \ge \langle \nabla f_c(x) - \nabla f_j(x), \delta\rangle, \quad 0 \le x_j + \delta_j \le 1. \qquad (10)$$

Lemma 4.2. Let $m = \arg\max_j f_j(x)$ and define $v = \nabla f_m(x) - \nabla f_j(x)$ and $c = f_j(x) - f_m(x) < 0$.
Then the solution of problem (10) can be found by Algorithm 2.

Proof. The result is basically obvious but we derive it formally. First of all we rewrite (10)
as a linear program:
$$\min_{\delta \in \mathbb{R}^d,\, t} \sum_{i=1}^d t_i \quad \text{sbj. to:} \quad c \ge \langle v, \delta\rangle, \quad -x_j \le \delta_j \le 1 - x_j, \quad -t_i \le \delta_i \le t_i, \quad t_i \ge 0. \qquad (11)$$
Algorithm 1 Computation of box-constrained adversarial samples wrt the ‖·‖_2-norm.
INPUT: c = f_j(x) − f_m(x) and v = ∇f_m(x) − ∇f_j(x) (j desired class, m original class)
sort γ_r = max{(x_r − 1)/v_r, x_r/v_r} in increasing order π
s = 0; ρ = 0
while ρ > c do
  s ← s + 1
  compute ρ = ⟨v, δ(γ_{π_s})⟩, where δ(λ) is the function defined in Lemma 4.1
end while
if ρ ≤ c then
  compute I_m = {r | −x_r ≤ −γ_{π_{s−1}} v_r ≤ 1 − x_r},
  compute I_u = {r | −γ_{π_{s−1}} v_r > 1 − x_r},
  compute I_l = {r | −γ_{π_{s−1}} v_r < −x_r}
  λ = (Σ_{r∈I_u} v_r (1 − x_r) − Σ_{r∈I_l} v_r x_r − c) / Σ_{r∈I_m} v_r^2
  δ_r = max{−x_r, min{−λ v_r, 1 − x_r}}
else
  Problem has no feasible solution
end if
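Below is a small NumPy sketch (ours, not the authors' reference implementation) of the ‖·‖_2 case. Instead of the sorted-threshold scan of Algorithm 1 it finds λ by bisection on the monotonically decreasing function λ ↦ ⟨v, δ(λ)⟩, which is easier to verify at the cost of the O(d log d) guarantee.

```python
import numpy as np

def adv_l2_box(c, v, x, tol=1e-10, iters=200):
    """Box-constrained minimal-||.||_2 perturbation for the linearized problem (9).
    c = f_j(x) - f_m(x) < 0, v = grad f_m(x) - grad f_j(x).
    Returns delta with 0 <= x + delta <= 1 and <v, delta> approx. c, or None if infeasible."""
    def delta(lam):
        return np.clip(-lam * v, -x, 1.0 - x)     # cf. Lemma 4.1

    # most negative achievable value of <v, delta> over the box
    extreme = np.where(v > 0, -x, np.where(v < 0, 1.0 - x, 0.0))
    if v @ extreme > c:
        return None                               # problem (9) is infeasible

    # <v, delta(lam)> is monotonically decreasing in lam: bracket the root, then bisect
    lo, hi = 0.0, 1.0
    while v @ delta(hi) > c and hi < 1e12:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if v @ delta(mid) > c:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return delta(hi)
```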

Algorithm 2 Computation of box-constrained adversarial samples wrt the ‖·‖_1-norm.
INPUT: c = f_j(x) − f_m(x) and v = ∇f_m(x) − ∇f_j(x) (j desired class, m original class)
sort |v_i| in decreasing order π
s := 0, g := 0
while g > c do
  s ← s + 1
  if v_{π_s} > 0 then
    δ_{π_s} = −x_{π_s}
  else
    δ_{π_s} = 1 − x_{π_s}
  end if
  g ← g + δ_{π_s} v_{π_s}
end while
if g ≤ c then
  g ← g − v_{π_s} δ_{π_s}
  δ_{π_s} ← (c − g)/v_{π_s}
else
  Problem has no feasible solution
end if
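A NumPy sketch (ours) of Algorithm 2: coordinates are processed in order of decreasing |v_i|, each is pushed to the box boundary that decreases ⟨v, δ⟩, and the last touched coordinate is shrunk so that ⟨v, δ⟩ = c holds exactly.

```python
import numpy as np

def adv_l1_box(c, v, x):
    """Box-constrained minimal-||.||_1 perturbation for the linearized problem (10).
    c = f_j(x) - f_m(x) < 0, v = grad f_m(x) - grad f_j(x)."""
    d = len(v)
    delta = np.zeros(d)
    order = np.argsort(-np.abs(v))           # decreasing |v_i|
    g = 0.0
    last = None
    for idx in order:
        if v[idx] == 0.0:
            continue
        # push coordinate idx to the box boundary that decreases <v, delta>
        delta[idx] = -x[idx] if v[idx] > 0 else 1.0 - x[idx]
        g += v[idx] * delta[idx]
        last = idx
        if g <= c:
            break
    if g > c:
        return None                          # infeasible even with the whole box used
    # shrink the last modified coordinate so that <v, delta> = c exactly
    g -= v[last] * delta[last]
    delta[last] = (c - g) / v[last]
    return delta
```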

The Lagrangian of this problem is
$$L(\delta, t, \alpha, \beta, \gamma, \theta, \kappa, \lambda) = \langle t, \mathbf{1}\rangle + \lambda\big(\langle v, \delta\rangle - c\big) - \langle \alpha, \delta + x\rangle + \langle \beta, \delta - \mathbf{1} + x\rangle - \langle \gamma, \delta + t\rangle + \langle \theta, \delta - t\rangle - \langle \kappa, t\rangle$$
$$= \langle t, \mathbf{1} - \gamma - \theta - \kappa\rangle + \langle \delta, \beta - \alpha + \lambda v + \theta - \gamma\rangle + \langle \beta - \alpha, x\rangle - \langle \beta, \mathbf{1}\rangle - \lambda c.$$
Minimization of the Lagrangian over $t$ resp. $\delta$ leads only to a non-trivial result if
$$\mathbf{1} - \gamma - \theta - \kappa = 0, \qquad \beta - \alpha + \lambda v + \theta - \gamma = 0.$$
We get the dual problem
$$\max_{\alpha,\beta,\theta,\gamma,\kappa,\lambda}\; \langle \beta - \alpha, x\rangle - \langle \beta, \mathbf{1}\rangle - \lambda c \quad \text{sbj. to:} \quad \mathbf{1} - \gamma - \theta - \kappa = 0, \quad \beta - \alpha + \lambda v + \theta - \gamma = 0, \quad \alpha, \beta, \theta, \gamma, \kappa, \lambda \ge 0. \qquad (12)$$
Using the equalities we can now simplify the problem by replacing $\alpha$ and using the fact that
$\kappa$ is not part of the objective; its positivity just induces an additional constraint. We get
$\alpha = \beta + \lambda v + \theta - \gamma$. Plugging this into problem (12) we get
$$\max_{\beta,\theta,\gamma,\lambda}\; -\langle \lambda v + \theta - \gamma, x\rangle - \langle \beta, \mathbf{1}\rangle - \lambda c \quad \text{sbj. to:} \quad \mathbf{1} - \gamma - \theta \ge 0, \quad \beta + \lambda v + \theta - \gamma \ge 0, \quad \beta, \theta, \gamma, \lambda \ge 0. \qquad (14)$$
We get the constraint $\beta \ge \max\{0, -\lambda v - \theta + \gamma\}$ (all the inequalities and functions are taken
here componentwise) and thus we can explicitly maximize over $\beta$:
$$\max_{\theta,\gamma,\lambda}\; -\langle \lambda v + \theta - \gamma, x\rangle - \sum_{i=1}^d \max\{0, -\lambda v_i - \theta_i + \gamma_i\} - \lambda c \quad \text{sbj. to:} \quad \gamma + \theta \le \mathbf{1}, \quad \theta \ge 0, \quad \gamma \ge 0, \quad \lambda \ge 0. \qquad (16)$$
As $0 \le x_i \le 1$ it holds for all $\theta \ge 0$, $\gamma \ge 0$ that
$$\langle -\lambda v - \theta + \gamma, x\rangle - \sum_{i=1}^d \max\{0, -\lambda v_i - \theta_i + \gamma_i\} \le 0.$$
The maximum is attained if $\gamma_i - \theta_i - \lambda v_i = 0$; with the constraints on $\gamma_i, \theta_i$ this is
equivalent to $-1 \le \lambda v_i \le 1$. Suppose that $\lambda v_i > 1$; then the maximum is attained for $\gamma_i = 1$
and $\theta_i = 0$, and for $\lambda v_i < -1$ the maximum is attained for $\gamma_i = 0$ and $\theta_i = 1$. Thus by
solving explicitly for $\theta$ and $\gamma$ we obtain
$$\max_{\lambda \ge 0}\; \sum_{\lambda v_i > 1} (1 - \lambda v_i)\, x_i + \sum_{\lambda v_i < -1} (1 + \lambda v_i)(1 - x_i) - \lambda c. \qquad (18)$$
Note that the first two terms are decreasing with $\lambda$ and the last term is increasing with $\lambda$.
Let $\lambda^*$ be the optimum; then we have the following characterization:
$$-1 < \lambda v_i < 1 \;\Rightarrow\; \gamma_i + \theta_i < 1 \;\Rightarrow\; \kappa_i > 0 \;\Rightarrow\; t_i = 0 \;\Rightarrow\; \delta_i = 0,$$
$$\lambda v_i > 1 \;\Rightarrow\; \gamma_i = 1,\, \theta_i = 0,\, \beta_i = 0,\, \alpha_i > 0 \;\Rightarrow\; \delta_i = -x_i,$$
$$\lambda v_i < -1 \;\Rightarrow\; \gamma_i = 0,\, \theta_i = 1,\, \beta_i > 0 \;\Rightarrow\; \delta_i = 1 - x_i.$$
The cases $|\lambda v_i| = 1$ are undetermined, but given that $\lambda > 0$ the remaining values can be fixed
by solving for $c = \langle v, \delta\rangle$. The time complexity is again determined by the initial sorting step
of $O(d\log d)$; the following linear scan requires $O(d)$.

Finally, we consider the case $p = \infty$:
$$\min_{\delta \in \mathbb{R}^d} \|\delta\|_\infty \quad \text{sbj. to:} \quad f_j(x) - f_c(x) \ge \langle \nabla f_c(x) - \nabla f_j(x), \delta\rangle, \quad 0 \le x_j + \delta_j \le 1. \qquad (20)$$

Lemma 4.3. Let $m = \arg\max_j f_j(x)$ and define $v = \nabla f_m(x) - \nabla f_j(x)$ and $c = f_j(x) - f_m(x) < 0$.
Then the solution of (20) can be found by solving for $t \ge 0$
$$\sum_{v_r > 0} v_r \max\{-t, -x_r\} + \sum_{v_r < 0} v_r \min\{t, 1 - x_r\} = c,$$
which is done in Algorithm 3.

Proof. We can rewrite the optimization problem as a linear program:
$$\min_{t \in \mathbb{R},\, \delta \in \mathbb{R}^d} t \quad \text{sbj. to:} \quad c \ge \langle v, \delta\rangle, \quad -x_j \le \delta_j \le 1 - x_j, \;\; -t \le \delta_j \le t, \;\; j = 1,\ldots,d, \quad t \ge 0. \qquad (21)$$
Thus we have $\max\{-t, -x_r\} \le \delta_r \le \min\{t, 1 - x_r\}$ for $r = 1,\ldots,d$. Then it holds
$$\langle v, \delta\rangle \ge \sum_{v_r > 0} v_r \max\{-t, -x_r\} + \sum_{v_r < 0} v_r \min\{t, 1 - x_r\} = -\sum_{v_r > 0,\, t \ge x_r} v_r x_r - t \sum_{v_r > 0,\, t < x_r} v_r + t \sum_{v_r < 0,\, t < 1 - x_r} v_r + \sum_{v_r < 0,\, t \ge 1 - x_r} v_r(1 - x_r),$$
and the lower bound is attained if $\delta_r = \min\{t, 1 - x_r\}$ for $v_r < 0$, $\delta_r = \max\{-t, -x_r\}$
for $v_r > 0$, and $\delta_r = 0$ if $v_r = 0$. Note that both terms are monotonically decreasing with $t$.
Algorithm 3 has complexity $O(d\log d)$ due to the initial sorting step, followed by steps of
complexity $O(d)$.

For nonlinear classifiers a change of the decision is not guaranteed, and thus we later use
a binary search with a variable $c$ instead of $f_c(x) - f_j(x)$.

5 Experiments
The goal of the experiments is the evaluation of the robustness of the resulting classifiers
and not necessarily state-of-the-art results in terms of test error. In all cases we compute the
robustness guarantees from Theorem 2.1 (lower bound on the norm of the minimal change
required to change the classifier decision), where we optimize over R using binary search,
and adversarial samples with the algorithm for the 2-norm from Section 4 (upper bound
on the norm of the minimal change required to change the classifier decision), where we do
a binary search in the classifier output difference in order to find a point on the decision
boundary. Additional experiments can be found in the supplementary material.
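The upper bounds are computed roughly as follows (a hedged sketch, ours): `f` and `grad_f` stand for the classifier outputs and their input gradients, and `adv_l2_box` is the hypothetical solver sketched after Algorithm 1 in Section 4; the target value $c$ of the linearized problem is rescaled by a binary search until the generated point actually changes the (nonlinear) classifier decision.

```python
import numpy as np

def adv_upper_bound(f, grad_f, x, j, s_max=64.0, iters=30):
    """Upper bound on the minimal ||delta||_2 that changes the decision of f at x towards class j."""
    out = f(x)
    m = int(np.argmax(out))                    # originally predicted class
    G = grad_f(x)                              # (K, d) input gradients
    v = G[m] - G[j]
    c0 = out[j] - out[m]                       # < 0
    lo, hi = 0.0, s_max                        # scaling of the target value c
    best = None
    for _ in range(iters):
        s = 0.5 * (lo + hi)
        delta = adv_l2_box(s * c0, v, x)       # linearized, box-constrained step
        if delta is not None and np.argmax(f(x + delta)) != m:
            best, hi = delta, s                # decision changed: try a smaller change
        else:
            lo = s                             # decision unchanged: require a larger change
    return None if best is None else np.linalg.norm(best)
```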

Kernel methods: We optimize the cross-entropy loss once with the standard regularization
(Kernel-LogReg) and once with Cross-Lipschitz regularization (Kernel-CL). Both are convex
optimization problems and we use L-BFGS to solve them. We use the Gaussian kernel
$k(x,y) = e^{-\gamma\|x-y\|^2}$, where $\gamma = \frac{\alpha}{\rho_{KNN40}^2}$, $\rho_{KNN40}$ is the mean of the 40 nearest-neighbor
distances on the training set, and $\alpha \in \{0.5, 1, 2, 4\}$. We show the results for MNIST (60000
training and 10000 test samples). However, we have checked that parameter selection
using a subset of 50000 images from the training set and evaluating on the rest indeed yields
the parameters which give the best test errors when trained on the full set.
Algorithm 3 Computation of box-constrained adversarial samples wrt the ‖·‖_∞-norm.
INPUT: c = f_j(x) − f_m(x) and v = ∇f_m(x) − ∇f_j(x) (j desired class, m original class)
d_+ := |{l | v_l > 0}| and d_- := |{l | v_l < 0}|
sort {x_l | v_l > 0} in increasing order π, sort {1 − x_l | v_l < 0} in increasing order ρ
s := 1, r := 1, g := 0, t := 0, κ_+ = Σ_{v_l>0} v_l, κ_- = Σ_{v_l<0} v_l, γ_+ = γ_- = 0
while g > c AND (s ≤ d_+ OR r ≤ d_-) do
  if x_{π_s} < 1 − x_{ρ_r} then
    t = x_{π_s}, κ_+ ← κ_+ − v_{π_s}, γ_+ ← γ_+ − v_{π_s} x_{π_s},
    s ← s + 1
  else
    t = 1 − x_{ρ_r}, κ_- ← κ_- − v_{ρ_r}, γ_- ← γ_- + v_{ρ_r} (1 − x_{ρ_r}),
    r ← r + 1
  end if
  g = γ_+ + γ_- + t (κ_- − κ_+)
end while
if g ≤ c then
  undo last step
  t = (c − γ_+ − γ_-)/(κ_- − κ_+)
  compute δ_r = max{−t, −x_r} for v_r > 0, δ_r = min{t, 1 − x_r} for v_r < 0, δ_r = 0 for v_r = 0
else
  Problem has no feasible solution
end if
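A NumPy sketch (ours) for the ‖·‖_∞ case that solves the equation of Lemma 4.3 by bisection on t instead of the sorted scan of Algorithm 3; this trades the O(d log d) complexity for simplicity.

```python
import numpy as np

def adv_linf_box(c, v, x, tol=1e-10, iters=200):
    """Box-constrained minimal-||.||_inf perturbation for the linearized problem (20).
    c = f_j(x) - f_m(x) < 0, v = grad f_m(x) - grad f_j(x)."""
    def delta(t):
        # componentwise optimal perturbation for a given sup-norm budget t (cf. Lemma 4.3)
        return np.where(v > 0, np.maximum(-t, -x),
                        np.where(v < 0, np.minimum(t, 1.0 - x), 0.0))

    def g(t):
        return v @ delta(t)                   # monotonically decreasing in t

    if g(1.0) > c:                            # t = 1 already exhausts the box [0, 1]^d
        return None                           # infeasible
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > c:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return delta(hi)
```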

The regularization parameter is chosen in $\lambda \in \{10^{-k} \mid k \in \{5, 6, 7, 8\}\}$ for Kernel-SVM and
$\lambda \in \{10^{-k} \mid k \in \{0, 1, 2, 3\}\}$ for our Kernel-CL. The results for the optimal parameters are
given in the following table and the performance of all parameters is shown in Figure 1. Note
that due to the high computational complexity we could evaluate the robustness guarantees
only for the optimal parameters.

                      test error    avg. ||delta||_2 adv. samples    avg. ||delta||_2 rob. guar.
  No Reg. (lambda=0)    2.23%                 2.39                            0.037
  K-SVM                 1.48%                 1.91                            0.058
  K-CL                  1.44%                 3.12                            0.045

Figure 1: Kernel methods: Cross-Lipschitz regularization achieves both better test error and robustness against
adversarial samples (upper bounds, larger is better) compared to the standard regularization. The robustness
guarantee is weaker than for neural networks but this is most likely due to the relatively loose bound.

Neural Networks: Before we demonstrate how upper and lower bounds improve using
cross-Lipschitz regularization, we first want to highlight the importance of the usage of the
local cross-Lipschitz constant in Theorem 2.1 for our robustness guarantee.

Local versus global Cross-Lipschitz constant: While no robustness guarantee had
been proven before, it has been discussed in [24] that penalization of the global Lipschitz
constant should improve robustness, see also [4]. For that purpose they derive the Lipschitz
constants of several different layers and use the fact that the Lipschitz constant of a
composition of functions is upper bounded by the product of the Lipschitz constants of
the functions. In analogy, this would mean that the term $\sup_{y \in B(x,R)} \|\nabla f_c(y) - \nabla f_j(y)\|_2$,
which we have upper bounded in Proposition 2.2, in the denominator of Theorem 2.1 could
be replaced$^2$ by the global Lipschitz constant of $g(x) := f_c(x) - f_j(x)$, which is given as
$\sup_{y \in \mathbb{R}^d} \|\nabla g(y)\|_2 = \sup_{x \neq y} \frac{|g(x) - g(y)|}{\|x - y\|_2}$.

$^2$ Note that the optimization over $R$ in Theorem 2.1 would then be unnecessary.

                  MNIST (plain)                               CIFAR10 (plain)
  None    Dropout    Weight Dec.    Cross Lip.    None    Dropout    Weight Dec.    Cross Lip.
  0.69     0.48         0.68           0.21       0.22     0.13         0.24           0.17

Table 1: We show the average ratio $\alpha_{global}/\alpha_{local}$ of the robustness guarantees $\alpha_{global}$, $\alpha_{local}$ from Theorem 2.1 on
the test data for MNIST and CIFAR10 and different regularizers. The guarantees using the local Cross-Lipschitz
constant are up to eight times better than with the global one.

With $\|U\|_{2,2}$ denoting the largest singular value of $U$, we have
$$|g(x) - g(y)| = |\langle w_c - w_j, \sigma(Ux) - \sigma(Uy)\rangle| \le \|w_c - w_j\|_2\, \|\sigma(Ux) - \sigma(Uy)\|_2 \le \|w_c - w_j\|_2\, \|U(x - y)\|_2 \le \|w_c - w_j\|_2\, \|U\|_{2,2}\, \|x - y\|_2,$$
where we used that $\sigma$ is contractive as $\sigma'(z) = \frac{1}{1 + e^{-\alpha z}} \le 1$, and thus we get
$$\sup_{y \in \mathbb{R}^d} \|\nabla f_c(y) - \nabla f_j(y)\|_2 \le \|w_c - w_j\|_2\, \|U\|_{2,2}.$$

The advantage is clearly that this global Cross-Lipschitz constant can just be computed
once and by using it in Theorem 2.1 one can evaluate the guarantees very quickly. However,
it turns out that one gets significantly better robustness guarantees by using the local
Cross-Lipschitz constant in terms of the bound derived in Proposition 2.2 instead of the just
derived global Lipschitz constant. Note that the optimization over R in Theorem 2.1 is done
using a binary search, noting that the bound on the local cross-Lipschitz constant in Proposition
2.2 is monotonically increasing in $R$, so that the ratio in Theorem 2.1 is monotonically
decreasing in $R$. We have the following comparison in Table 1. We
want to highlight that the robustness guarantee with the global Cross-Lipschitz constant
was always worse than when using the local Cross-Lipschitz constant across all regularizers
and data sets. Table 1 shows that the guarantees using the local Cross-Lipschitz constant can be up
to eight times better than for the global one. As these are just one hidden layer networks, it
is obvious that robustness guarantees for deep neural networks based on the global Lipschitz
constants will be too coarse to be useful.
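For reference, the global cross-Lipschitz constant used in this comparison can be computed in essentially one line (a NumPy sketch, ours); the local counterpart is the bound of Proposition 2.2.

```python
import numpy as np

def global_cross_lip(w, u, c, j):
    """Global cross-Lipschitz bound ||w_c - w_j||_2 * ||U||_{2,2} for a one hidden layer net."""
    return np.linalg.norm(w[c] - w[j]) * np.linalg.norm(u, 2)   # largest singular value of u
```

Plugging this constant into the denominator of Theorem 2.1 (and dropping the optimization over $R$) yields the $\alpha_{global}$ reported in Table 1.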

Experiments: We use a one hidden layer network with 1024 hidden units and the softplus
activation function with α = 10. Thus the resulting classifier is continuously differentiable.
We compare three different regularization techniques: weight decay, dropout and our Cross-
Lipschitz regularization. Training is done with SGD. For each method we have adapted the
learning rate (two per method) and regularization parameters (4 per method) so that all
methods achieve good performance. We do experiments for MNIST and CIFAR10 in three
settings: plain, data augmentation and adversarial training. The exact settings of the parameters
and the augmentation techniques are described below. The results for MNIST are shown
in Figure 2 and the results for CIFAR10 in Figure 3. For MNIST there is a clear trend that
our Cross-Lipschitz regularization improves the robustness of the resulting classifier while
having competitive resp. better test error. It is surprising that data augmentation does not
lead to more robust models. However, adversarial training improves the guarantees as well as
the adversarial resistance. For CIFAR10 the picture is mixed: our CL-regularization performs
well for the augmented task in test error and upper bounds but is not significantly better in
the robustness guarantees. The problem might be that the overall bad performance of the
simple model prevents better behavior. Data augmentation leads to better test error
training slightly improves performance compared to the plain setting and improves upper
and lower bounds in terms of robustness. We want to highlight that our guarantees (lower
bounds) and the upper bounds from the adversarial samples are not too far away.
For MNIST (all settings) the learning rate is for all methods chosen from $\{0.2, 0.5\}$.
The regularization parameters for weight decay are chosen from $\{10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}\}$,
for Cross-Lipschitz from $\{10^{-5}, 10^{-4}, 5\cdot 10^{-4}, 10^{-3}\}$, and the dropout probabilities are
taken from $\{0.4, 0.5, 0.6, 0.7\}$. For CIFAR10 the learning rate is for all methods chosen
from $\{0.04, 0.1\}$, the regularization parameters for weight decay and Cross-Lipschitz are
$\{10^{-5}, 10^{-4}, 5\cdot 10^{-4}, 10^{-3}\}$ and dropout probabilities are taken from $\{0.5, 0.6, 0.7, 0.8\}$.
For CIFAR10 with data augmentation we choose the learning rate for all methods from
$\{0.04, 0.1\}$, the regularization parameters for weight decay are $\{10^{-6}, 10^{-5}, 10^{-4}, 10^{-3}\}$ and
Figure 2: Neural networks. Left: Adversarial resistance (upper bound) wrt the $L_2$-norm on MNIST. Right:
Average robustness guarantee (lower bound) wrt the $L_2$-norm on MNIST for different neural networks (one hidden
layer, 1024 HU) and hyperparameters. The Cross-Lipschitz regularization leads to better robustness with similar
or better prediction performance. Top row: plain MNIST, Middle: Data Augmentation, Bottom: Adv. Training

for Cross-Lipschitz $\{10^{-5}, 10^{-4}, 5\cdot 10^{-4}, 10^{-3}\}$ and the dropout probabilities are taken from
$\{0.5, 0.6, 0.7, 0.8\}$. Data augmentation for MNIST means that we apply random rotations with
angle in $[-\frac{\pi}{20}, \frac{\pi}{20}]$ and random crops from 28x28 to 24x24. For CIFAR-10 we apply the same
and additionally we mirror the image (left to right) with probability 0.5 and apply random
brightness changes in $[-0.1, 0.1]$ and random contrast changes in $[0.6, 1.4]$. In each substep we ensure that
we get an image in $[0,1]^d$ by clipping. We implemented adversarial training by generating
adversarial samples wrt the infinity norm with the code from Section 4 and replacing 50%
of each batch by adversarial samples. Finally, we use batch size 64 for SGD in all experiments.
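A sketch (ours) of the augmentation pipeline described above, using NumPy and `scipy.ndimage.rotate`; the interpolation order and the CIFAR-10 crop size are assumptions, not specified in the paper.

```python
import numpy as np
from scipy.ndimage import rotate

def augment(img, rng, crop=24, cifar=False):
    """img: (H, W) or (H, W, 3) array with values in [0, 1]; rng: np.random.default_rng()."""
    # random rotation in [-pi/20, pi/20] radians = [-9, 9] degrees
    img = np.clip(rotate(img, rng.uniform(-9.0, 9.0), reshape=False, order=1), 0.0, 1.0)
    # random crop to crop x crop pixels
    oy = rng.integers(0, img.shape[0] - crop + 1)
    ox = rng.integers(0, img.shape[1] - crop + 1)
    img = img[oy:oy + crop, ox:ox + crop]
    if cifar:
        if rng.random() < 0.5:                                   # mirror left-right
            img = img[:, ::-1]
        img = np.clip(img + rng.uniform(-0.1, 0.1), 0.0, 1.0)    # random brightness
        m = img.mean()
        img = np.clip((img - m) * rng.uniform(0.6, 1.4) + m, 0.0, 1.0)  # random contrast
    return img
```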

Illustration of adversarial samples: We take one test image from MNIST and apply the
adversarial sample generation from Section 4 wrt the 2-norm to generate adversarial samples for
the different kernel methods and neural networks (plain setting), where we use for each method
the parameters leading to best test performance. All classifiers change their originally correct
decision to a “wrong” one. It is interesting to note that for Cross-Lipschitz regularization
(both kernel method and neural network) the “adversarial” sample is really at the decision
boundary between 1 and 8 (as predicted) and thus the new decision is actually correct.
This effect is strongest for our Kernel-CL, which also requires the strongest modification to
generate the adversarial sample. The situation is different for neural networks, where the
classifiers obtained from the two standard regularization techniques are still vulnerable, as
the adversarial sample is still clearly a 1 for dropout and weight decay. We show further
examples below.

Figure 3: Left: Adversarial resistance (upper bound) wrt the $L_2$-norm on the test set of CIFAR10. Right: Average
robustness guarantee (lower bound) wrt the $L_2$-norm on the test set of CIFAR10 for different neural networks (one
hidden layer, 1024 HU) and hyperparameters. While Cross-Lipschitz regularization yields good test errors, the
guarantees are not necessarily stronger. Top row: CIFAR10 (plain), Middle: CIFAR10 trained with data
augmentation, Bottom: Adversarial Training.

Original, Class 6; K-SVM, Pred: 0, $\|\delta\|_2 = 3.0$; K-CL, Pred: 0, $\|\delta\|_2 = 4.7$; NN-WD, Pred: 4, $\|\delta\|_2 = 1.4$; NN-DO, Pred: 4, $\|\delta\|_2 = 1.9$; NN-CL, Pred: 0, $\|\delta\|_2 = 3.2$
Figure 4: Top left: original test image, for each classifier we generate the corresponding adversarial sample which
changes the classifier decision (denoted as Pred). Note that for Cross-Lipschitz regularization this new decision
often makes sense, whereas for the neural network models (weight decay/dropout) the change is so small that
the new decision is clearly wrong.

Original, Class 1; K-SVM, Pred: 7, $\|\delta\|_2 = 1.2$; K-CL, Pred: 8, $\|\delta\|_2 = 3.5$; NN-WD, Pred: 8, $\|\delta\|_2 = 1.2$; NN-DO, Pred: 7, $\|\delta\|_2 = 1.1$; NN-CL, Pred: 8, $\|\delta\|_2 = 2.6$
Figure 5: Top left: original test image, for each classifier we generate the corresponding adversarial sample which
changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes sense,
whereas for all neural network models the change is so small that the new decision is clearly wrong.

Original, Class 4; K-SVM, Pred: 9, $\|\delta\|_2 = 1.4$; K-CL, Pred: 9, $\|\delta\|_2 = 2.2$; NN-WD, Pred: 9, $\|\delta\|_2 = 1.3$; NN-DO, Pred: 9, $\|\delta\|_2 = 1.5$; NN-CL, Pred: 9, $\|\delta\|_2 = 2.2$
Figure 6: Top left: original test image, for each classifier we generate the corresponding adversarial sample which
changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes sense,
whereas for all neural network models the change is so small that the new decision is clearly wrong.

Original, Class 2; K-SVM, Pred: 3, $\|\delta\|_2 = 2.5$; K-CL, Pred: 3, $\|\delta\|_2 = 4.4$; NN-WD, Pred: 3, $\|\delta\|_2 = 1.1$; NN-DO, Pred: 3, $\|\delta\|_2 = 1.4$; NN-CL, Pred: 3, $\|\delta\|_2 = 2.0$
Figure 7: Top left: original test image, for each classifier we generate the corresponding adversarial sample which
changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes sense,
whereas for all neural network models the change is so small that the new decision is clearly wrong.

Original, Class 8; K-SVM, Pred: 3, $\|\delta\|_2 = 2.2$; K-CL, Pred: 5, $\|\delta\|_2 = 4.2$; NN-WD, Pred: 3, $\|\delta\|_2 = 1.4$; NN-DO, Pred: 5, $\|\delta\|_2 = 1.6$; NN-CL, Pred: 3, $\|\delta\|_2 = 2.8$
Figure 8: Top left: original test image, for each classifier we generate the corresponding adversarial sample which
changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes sense,
whereas for all neural network models the change is so small that the new decision is clearly wrong.

Original, Class 1; K-SVM, Pred: 8, $\|\delta\|_2 = 1.2$; K-CL, Pred: 2, $\|\delta\|_2 = 3.7$; NN-WD, Pred: 2, $\|\delta\|_2 = 1.1$; NN-DO, Pred: 2, $\|\delta\|_2 = 1.1$; NN-CL, Pred: 8, $\|\delta\|_2 = 2.7$
Figure 9: Top left: original test image, for each classifier we generate the corresponding adversarial sample which
changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes sense,
whereas for all neural network models the change is so small that the new decision is clearly wrong.

Original, Class 3; K-SVM, Pred: 8, $\|\delta\|_2 = 2.1$; K-CL, Pred: 8, $\|\delta\|_2 = 3.3$; NN-WD, Pred: 8, $\|\delta\|_2 = 1.7$; NN-DO, Pred: 8, $\|\delta\|_2 = 1.4$; NN-CL, Pred: 5, $\|\delta\|_2 = 3.2$
Figure 10: Top left: original test image, for each classifier we generate the corresponding adversarial sample
which changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes
sense, whereas for all neural network models the change is so small that the new decision is clearly wrong.

Original, Class 8; K-SVM, Pred: 9, $\|\delta\|_2 = 2.1$; K-CL, Pred: 5, $\|\delta\|_2 = 2.6$; NN-WD, Pred: 9, $\|\delta\|_2 = 1.4$; NN-DO, Pred: 9, $\|\delta\|_2 = 1.8$; NN-CL, Pred: 9, $\|\delta\|_2 = 1.8$
Figure 11: Top left: original test image, for each classifier we generate the corresponding adversarial sample
which changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes
sense, whereas for all neural network models the change is so small that the new decision is clearly wrong.

Figure 13: Left: Adversarial resistance (upper bound) wrt the $L_2$-norm on the test set of the German Traffic Sign
Benchmark (GTSB) in the plain setting. Right: Average robustness guarantee (lower bound) wrt the $L_2$-norm on
the test set of GTSB for different neural networks (one hidden layer, 1024 HU) and hyperparameters. Here dropout
performs very well both in terms of performance and robustness.

Original, Class 5; K-SVM, Pred: 9, $\|\delta\|_2 = 1.5$; K-CL, Pred: 9, $\|\delta\|_2 = 2.2$; NN-WD, Pred: 9, $\|\delta\|_2 = 1.1$; NN-DO, Pred: 9, $\|\delta\|_2 = 1.0$; NN-CL, Pred: 9, $\|\delta\|_2 = 1.4$
Figure 12: Top left: original test image, for each classifier we generate the corresponding adversarial sample
which changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes
sense, whereas for all neural network models the change is so small that the new decision is clearly wrong.

German Traffic Sign Benchmark: As a third dataset we used the German Traffic Sign
Benchmark (GTSB) [23], which consists of images of German traffic signs and has 43
classes with 34209 training and 12630 test samples. The results are shown in Figure 13. For
this dataset Cross-Lipschitz regularization improves the upper bounds compared to weight
decay but dropout achieves significantly better prediction performance and has similar upper
bounds. The robustness guarantees for weight decay and Cross-Lipschitz are slightly better
than for dropout.

Residual Networks: All experiments so far were done with one hidden layer neural
networks so that we can evaluate lower and upper bounds. Now we want to demonstrate
that Cross-Lipschitz regularization can also successfully be used for deep networks. We
use residual networks proposed in [9] with 32 parameter layers and non-bottleneck residual
blocks. We basically follow their setting, except that we do not subtract the per-pixel
mean, so that all images are in $[0,1]^d$, and we use random crops but without any padding as in [9].
Similar to [9], we train for 160 epochs, and the learning rate is divided by 10 at the 115-th
and 140-th epochs. For the experiments with dropout we followed the recommendation of
[25], inserting a dropout layer between the convolutional layers inside each residual block. For
Cross-Lipschitz regularization we use automatic differentiation in TensorFlow [1] to calculate
the derivative with respect to the input, which slows down the training by a factor of 10.
For the plain setting the learning rate for all methods is chosen from $\{0.2, 0.5\}$, except for the
runs without regularization, for which it is from $\{0.08, 0.1, 0.2, 0.4, 0.6, 0.8\}$. For weight decay
the regularization parameter is chosen from $\{10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}\}$, for Cross-Lipschitz
from $\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}\}$, and for dropout the probabilities are from $\{0.5, 0.6, 0.7, 0.8\}$.
For the data augmentation setting the only difference was in the higher learning rates: no
Figure 14: Results on CIFAR10 for a residual network with different regularizers, showing adversarial resistance
(upper bound) wrt the $L_2$-norm. As we only have lower bounds for one hidden layer networks, we can only show
upper bounds for adversarial resistance. Left: with data augmentation similar to [25]. Right: plain setting.

regularization - $\{0.2, 0.5, 0.8, 1.0, 1.5, 2.0, 3.0, 4.0\}$, weight decay - $\{0.1, 0.4\}$, Cross-Lipschitz
- $\{0.2, 1.0\}$. The results are shown in Figure 14. Cross-Lipschitz regularization improves
the upper bounds on the robustness against adversarial manipulation compared to weight
decay and dropout by a factor of 2 to 3, both in the plain setting (right) and with data
augmentation (left). This comes at the price of a slightly worse test performance. However,
it shows that Cross-Lipschitz regularization is also effective for deep neural networks. It
remains interesting future work to come up with instance-specific lower
bounds (robustness guarantees) for deep neural networks as well.

Outlook Formal guarantees on machine learning systems are becoming increasingly more
important as they are used in safety-critical systems. We think that there should be more
research on robustness guarantees (lower bounds), whereas current research is focused on
new attacks (upper bounds). We have argued that our instance-specific guarantees using the
local Cross-Lipschitz constant are more effective than using a global one and lead to lower
bounds which are up to 8 times better. A major open problem is to come up with tight
lower bounds for deep networks.

References
[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis,
J. Dean, M. Devin, S. Ghemawat, I. J. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia,
R. Józefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. G.
Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. A. Tucker,
V. Vanhoucke, V. Vasudevan, F. B. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke,
Y. Yu, and X. Zheng. Tensorflow: Large-scale machine learning on heterogeneous distributed
systems, 2016.
[2] O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. Nori, and A. Criminisi. Measuring
neural net robustness with constraints. In NIPS, 2016.
[3] N. Carlini and D. Wagner. Adversarial examples are not easily detected: Bypassing ten
detection methods. In ACM Workshop on Artificial Intelligence and Security, 2017.
[4] M. Cisse, P. Bojanowksi, E. Grave, Y. Dauphin, and N. Usunier. Parseval networks: Improving
robustness to adversarial examples. In ICML, 2017.
[5] N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma. Adversarial classification. In KDD,
2004.
[6] H. Drucker and Y. Le Cun. Double backpropagation increasing generalization performance. In
IJCNN, 1992.
[7] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples.
In ICLR, 2015.
[8] S. Gu and L. Rigazio. Towards deep neural network architectures robust to adversarial examples.
In ICLR Workshop, 2015.

[9] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR,
pages 770–778, 2016.
[10] D. P. Helmbold and P. Long. On the inductive bias of dropout. Journal of Machine Learning
Research, 16:3403–3454, 2015.
[11] S. Hochreiter and J. Schmidhuber. Simplifying neural nets by discovering flat minima. In NIPS,
1995.
[12] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvari. Learning with a strong adversary. In
ICLR, 2016.
[13] J. Kos, I. Fischer, and D. Song. Adversarial examples for generative models. In ICLR Workshop,
2017.
[14] A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial examples in the physical world. In
ICLR Workshop, 2017.
[15] Y. Liu, X. Chen, C. Liu, and D. Song. Delving into transferable adversarial examples and
black-box attacks. In ICLR, 2017.
[16] D. Lowd and C. Meek. Adversarial learning. In KDD, 2005.
[17] S.M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations.
In CVPR, 2017.
[18] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial
perturbations against deep networks. In IEEE Symposium on Security & Privacy, 2016.
[19] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: a simple and accurate method to
fool deep neural networks. In CVPR, pages 2574–2582, 2016.
[20] B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[21] U. Shaham, Y. Yamada, and S. Negahban. Understanding adversarial training: Increasing local
stability of neural nets through robust optimization. In NIPS, 2016.
[22] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A
simple way to prevent neural networks from overfitting. Journal of Machine Learning Research,
15:1929–1958, 2014.
[23] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. Man vs. computer: Benchmarking machine
learning algorithms for traffic sign recognition. Neural Networks, 32:323–332, 2012.
[24] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus.
Intriguing properties of neural networks. In ICLR, pages 2503–2511, 2014.
[25] S. Zagoruyko and N. Komodakis. Wide residual networks. In BMVC, pages 87.1–87.12, 2016.
[26] S. Zheng, Y. Song, T. Leung, and I. J. Goodfellow. Improving the robustness of deep neural
networks via stability training. In CVPR, 2016.

