(2017) Formal Guarantees On The Robustness of A Classifier Against Adversarial Manipulation
Abstract
Recent work has shown that state-of-the-art classifiers are quite brittle, in the sense that
a small adversarial change of an input that was originally classified correctly with high
confidence leads to a wrong classification, again with high confidence. This raises concerns
that such classifiers are vulnerable to attacks and calls into question their usage in
safety-critical systems. We show in this paper for the first time formal guarantees on the
robustness of a classifier by giving instance-specific lower bounds on the norm of the
input manipulation required to change the classifier decision. Based on this analysis we
propose the Cross-Lipschitz regularization functional. We show that using this form of
regularization in kernel methods and neural networks improves the robustness of the
classifier with no or only a small loss in prediction performance.
1 Introduction
The problem of adversarial manipulation of classifiers has been addressed initially in the area
of spam email detection, see e.g. [5, 16]. The goal of the spammer is to manipulate the spam
email (the input of the classifier) in such a way that it is not detected by the classifier. In deep
learning the problem was brought up in the seminal paper by [24]. They showed that, for
state-of-the-art deep neural networks, one can manipulate an originally correctly classified input
image with an imperceptibly small transformation so that the classifier now misclassifies
this image with high confidence; see [7] or Figure 4 for an illustration. This property calls
into question the usage of neural networks and other classifiers showing this behavior in
safety critical systems, as they are vulnerable to attacks. On the other hand this also shows
that the concepts learned by a classifier are still quite far away from the visual perception
of humans. Subsequent research has found fast ways to generate adversarial samples with
high probability [7, 12, 19] and suggested to use them during training as a form of data
augmentation to gain more robustness. However, it turns out that the so-called adversarial
training does not settle the problem as one can yet again construct adversarial examples
for the final classifier. Interestingly, it has recently been shown that there exist universal
adversarial changes which when applied lead, for every image, to a wrong classification with
high probability [17]. While one needs access to the neural network model for the generation
of adversarial changes, it has been shown that adversarial manipulations generalize across
neural networks [18, 15, 14], which means that neural network classifiers can be attacked
even in a black-box setting. The most extreme case has been shown recently in [15], where
the authors attack the commercial system Clarifai, which is a black-box system as neither the
underlying classifier nor the training data are known. Nevertheless, they could successfully
generate adversarial images with an existing network and fool this commercial system. This
emphasizes that there are indeed severe security issues with modern neural networks. While
countermeasures have been proposed [8, 7, 26, 18, 12, 2], none of them provides a guarantee
of preventing this behavior [3]. One might think that generative adversarial neural networks
should be resistant to this problem, but it has recently been shown [13] that they can also
be attacked by adversarial manipulation of input images.
In this paper we show for the first time instance-specific formal guarantees on the robustness
of a classifier against adversarial manipulation. That means we provide lower bounds on the
norm of the change of the input required to alter the classifier decision or said otherwise: we
provide a guarantee that the classifier decision does not change in a certain ball around the
considered instance. We exemplify our technique for two widely used families of classifiers:
kernel methods and neural networks. Based on the analysis we propose a new regularization
functional, which we call Cross-Lipschitz Regularization. This regularization functional
can be used in kernel methods and neural networks. We show that using Cross-Lipschitz
regularization improves both the formal guarantees of the resulting classifier (lower bounds)
as well as the change required for adversarial manipulation (upper bounds) while maintaining
similar prediction performance achievable with other forms of regularization. While there
exist fast ways to generate adversarial samples [7, 12, 19] without constraints, we provide
algorithms based on the first order approximation of the classifier which generate adversarial
samples satisfying box constraints in O(d log d), where d is the input dimension.
The generation of adversarial samples can be formulated as the optimization problem
$$\min_{\delta \in \mathbb{R}^d} \|\delta\|_p \qquad \text{sbj. to: } \max_{j\neq c} f_j(x+\delta) \geq f_c(x+\delta), \quad x+\delta \in C, \qquad (1)$$
where $C$ is a constraint set specifying certain requirements on the generated input $x+\delta$, e.g.,
an image has to lie in $[0,1]^d$. Typically, the optimization problem (1) is non-convex and thus
intractable. The points $x+\delta$ generated in this way are called adversarial samples. Depending on
the p-norm the perturbations have different characteristics: for p = ∞ the perturbations are
small and affect all features, whereas for p = 1 one gets sparse solutions up to the extreme
case that only a single feature is changed. In [24] they used p = 2 which leads to more spread
but still localized perturbations. The striking result of [24, 7] was that for most instances
in computer vision datasets, the change δ necessary to alter the decision is astonishingly
small and thus clearly the label should not change. However, we will see later that our new
regularizer leads to robust classifiers in the sense that the required adversarial change is so
large that now also the class label changes (we have found the correct decision boundary),
see Fig 4. Already in [24] it is suggested to add the generated adversarial samples as a form
of data augmentation during the training of neural networks in order to achieve robustness.
This is denoted as adversarial training. Later on fast ways to approximately solve (1) were
proposed in order to speed up the adversarial training process [7, 12, 19]. However, in this
way, given that the approximation is successful, that is $\arg\max_j f_j(x+\delta) \neq c$, one gets just
upper bounds on the perturbation necessary to change the classifier decision. Also it was
noted early on, that the final classifier achieved by adversarial training is again vulnerable
to adversarial samples [7]. Robust optimization has been suggested as a measure against
adversarial manipulation [12, 21] which effectively boils down to adversarial training in
practice. It is thus fair to say that, to date, no mechanism exists which prevents the
generation of adversarial samples or can reliably defend against them [3].
In this paper we focus instead on robustness guarantees, that is we show that the classifier
decision does not change in a small ball around the instance. Thus our guarantees hold for
any method to generate adversarial samples or input transformations due to noise or sensor
failure, etc. Such formal guarantees are, in our view, absolutely necessary when a
classifier becomes part of a safety-critical technical system such as autonomous driving. In
the following we will first show how one can achieve such a guarantee and then explicitly
derive bounds for kernel methods and neural networks. We think that such formal guarantees
on robustness should be investigated further and it should become standard to report them
for different classifiers alongside the usual performance measures.
The following guarantee holds for any classifier which is continuously differentiable with
respect to the input in each output component. It is instance-specific and depends to some
extent on the confidence in the decision, at least if we measure confidence by the relative
difference $f_c(x) - \max_{j\neq c} f_j(x)$, as is typical for the cross-entropy loss and other multi-class
losses. In the following we use the notation $B_p(x,R) = \{y \in \mathbb{R}^d \mid \|x-y\|_p \leq R\}$.
Theorem 2.1. Let $x \in \mathbb{R}^d$ and let $f : \mathbb{R}^d \to \mathbb{R}^K$ be a multi-class classifier with continuously
differentiable components, and let $c = \arg\max_{j=1,\ldots,K} f_j(x)$ be the class which $f$ predicts for $x$. Let
$q \in \mathbb{R}$ be defined via $\frac{1}{p} + \frac{1}{q} = 1$. Then for all $\delta \in \mathbb{R}^d$ with
$$\|\delta\|_p \;\leq\; \max_{R>0}\, \min_{j\neq c}\, \min\Big\{ \frac{f_c(x) - f_j(x)}{\max_{y\in B_p(x,R)} \|\nabla f_c(y) - \nabla f_j(y)\|_q},\; R \Big\} \;=:\; \alpha,$$
it holds that $c = \arg\max_{j=1,\ldots,K} f_j(x+\delta)$, that is, the classifier decision does not change on $B_p(x,\alpha)$.
Proof. By the main theorem of calculus, a change of the classifier decision from $c$ to some class $j \neq c$, that is $f_j(x+\delta) \geq f_c(x+\delta)$, implies
$$0 \;\leq\; f_c(x) - f_j(x) \;\leq\; \int_0^1 \langle \nabla f_j(x+t\delta) - \nabla f_c(x+t\delta),\, \delta\rangle\, dt \;\leq\; \|\delta\|_p \int_0^1 \|\nabla f_j(x+t\delta) - \nabla f_c(x+t\delta)\|_q\, dt,$$
where the first inequality holds as $f_c(x) \geq f_j(x)$ for all $j = 1,\ldots,K$, and in the last step we
have used Hölder's inequality together with the fact that the $q$-norm is dual to the $p$-norm,
where $q$ is defined via $\frac{1}{p} + \frac{1}{q} = 1$. Thus the minimal norm of the change $\delta$ required to change
the classifier decision from $c$ to $j$ satisfies
$$\|\delta\|_p \;\geq\; \frac{f_c(x) - f_j(x)}{\int_0^1 \|\nabla f_j(x+t\delta) - \nabla f_c(x+t\delta)\|_q\, dt}.$$
We upper bound the denominator over some fixed ball $B_p(x,R)$. Note that by doing this
we can only make assertions for perturbations $\delta \in B_p(0,R)$ and thus the upper bound in the
guarantee is at most $R$. It holds that
$$\sup_{\delta\in B_p(0,R)} \int_0^1 \|\nabla f_j(x+t\delta) - \nabla f_c(x+t\delta)\|_q\, dt \;\leq\; \max_{y\in B_p(x,R)} \|\nabla f_j(y) - \nabla f_c(y)\|_q.$$
Thus we get the lower bound for the minimal norm of the change $\delta$ required to change the
classifier decision from $c$ to $j$,
$$\|\delta\|_p \;\geq\; \min\Big\{ R,\; \frac{f_c(x) - f_j(x)}{\max_{y\in B_p(x,R)} \|\nabla f_j(y) - \nabla f_c(y)\|_q} \Big\} \;=:\; \alpha.$$
As we are interested in the worst case, we take the minimum over all $j \neq c$. Finally, the result
holds for any fixed $R > 0$, so that we can maximize over $R$, which yields the final result.
Note that the bound requires in the denominator a bound on the local Lipschitz constant
of all cross terms $f_c - f_j$, which we call the local cross-Lipschitz constant in the following.
However, we do not require a global bound. The problem with a global bound is
that the ideal robust classifier is basically piecewise constant on larger regions with sharp
transitions between the classes. The global Lipschitz constant would then be determined
by the sharp transition zones and would not yield a good bound, whereas the
local bound can adapt to regions where the classifier is approximately constant and then
yields good guarantees. In [24, 4] it is suggested to study the global Lipschitz constant¹ of
each $f_j$, $j = 1,\ldots,K$. A small global Lipschitz constant for all $f_j$ implies a good bound as
$$\|\nabla f_j(y) - \nabla f_c(y)\|_q \;\leq\; \|\nabla f_j(y)\|_q + \|\nabla f_c(y)\|_q, \qquad (2)$$
but the converse does not hold. As discussed below, it turns out that our local estimates are
significantly better than the suggested global estimates, which also implies better robustness
guarantees. Moreover, we want to emphasize that our bound is tight, that is, the bound is
attained, for linear classifiers $f_j(x) = \langle w_j, x\rangle$, $j = 1,\ldots,K$. It holds that
$$\|\delta\|_p \;=\; \min_{j\neq c} \frac{\langle w_c - w_j,\, x\rangle}{\|w_c - w_j\|_q}.$$
In Section 4 we refine this result for the case when the input is constrained to $[0,1]^d$. In
general, it is possible to integrate constraints on the input by simply taking the maximum
over the intersection of $B_p(x,R)$ with the constraint set, e.g., $[0,1]^d$ for gray-scale images.
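To illustrate the tightness statement for linear classifiers, the following sketch (our illustration, not part of the paper's experiments; numpy and the random weights are assumptions) computes the exact guarantee from the formula above.

```python
import numpy as np

def linear_guarantee(W, x, p=2.0):
    """Exact minimal p-norm perturbation needed to change the decision of a linear
    classifier f_j(x) = <w_j, x>, using the tight bound
    alpha = min_{j != c} <w_c - w_j, x> / ||w_c - w_j||_q  with 1/p + 1/q = 1."""
    q = p / (p - 1.0) if p != 1.0 else np.inf   # dual norm exponent
    scores = W @ x
    c = int(np.argmax(scores))                  # predicted class
    alphas = []
    for j in range(W.shape[0]):
        if j == c:
            continue
        diff = W[c] - W[j]
        alphas.append((scores[c] - scores[j]) / np.linalg.norm(diff, ord=q))
    return c, min(alphas)

# toy example with random weights (illustrative only)
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784))                  # K = 10 classes, d = 784 inputs
x = rng.uniform(0, 1, size=784)
c, alpha = linear_guarantee(W, x, p=2.0)
print(f"predicted class {c}, decision unchanged within a 2-norm ball of radius {alpha:.4f}")
```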
We now evaluate the bound for kernel methods, that is, classifiers of the form $f_j(x) = \sum_{r=1}^n \alpha_{jr}\, k(x_r, x)$,
where $(x_r)_{r=1}^n$ are the $n$ training points, $k : \mathbb{R}^d\times\mathbb{R}^d \to \mathbb{R}$ is a positive definite kernel function
and $\alpha \in \mathbb{R}^{K\times n}$ are the trained parameters, e.g., of an SVM. The goal is to upper bound the
term $\max_{y\in B_2(x,R)} \|\nabla f_j(y) - \nabla f_c(y)\|_2$ for this classifier model. A simple calculation shows
$$0 \;\leq\; \|\nabla f_j(y) - \nabla f_c(y)\|_2^2 \;=\; \sum_{r,s=1}^n (\alpha_{jr} - \alpha_{cr})(\alpha_{js} - \alpha_{cs})\, \langle \nabla_y k(x_r, y),\, \nabla_y k(x_s, y)\rangle. \qquad (3)$$
It has been reported that kernel methods with a Gaussian kernel are robust to noise. Thus
we specialize now to this class, that is, $k(x,y) = e^{-\gamma\|x-y\|_2^2}$. In this case
$$\langle \nabla_y k(x_r,y),\, \nabla_y k(x_s,y)\rangle \;=\; 4\gamma^2\, \langle y - x_r,\, y - x_s\rangle\, e^{-\gamma\|x_r-y\|_2^2}\, e^{-\gamma\|x_s-y\|_2^2}.$$
We will now derive lower and upper bounds on this term uniformly over B2 (x, R) which
allows us to derive the guarantee.
Lemma 2.1. Let $M = \min\big\{ \frac{\|2x - x_r - x_s\|_2}{2},\, R\big\}$. Then
$$\max_{y\in B_2(x,R)} \langle y-x_r,\, y-x_s\rangle = \langle x-x_r,\, x-x_s\rangle + R\,\|2x-x_r-x_s\|_2 + R^2,$$
$$\min_{y\in B_2(x,R)} \langle y-x_r,\, y-x_s\rangle = \langle x-x_r,\, x-x_s\rangle - M\,\|2x-x_r-x_s\|_2 + M^2,$$
$$\max_{y\in B_2(x,R)} e^{-\gamma\|x_r-y\|_2^2}\, e^{-\gamma\|x_s-y\|_2^2} = e^{-\gamma\big(\|x-x_r\|_2^2 + \|x-x_s\|_2^2 - 2M\|2x-x_r-x_s\|_2 + 2M^2\big)},$$
$$\min_{y\in B_2(x,R)} e^{-\gamma\|x_r-y\|_2^2}\, e^{-\gamma\|x_s-y\|_2^2} = e^{-\gamma\big(\|x-x_r\|_2^2 + \|x-x_s\|_2^2 + 2R\|2x-x_r-x_s\|_2 + 2R^2\big)}.$$
¹ The Lipschitz constant $L$ with respect to the $p$-norm of a piecewise continuously differentiable function $f$ is
given as $L = \sup_{x\in\mathbb{R}^d} \|\nabla f(x)\|_q$. It then holds that $|f(x) - f(y)| \leq L\,\|x-y\|_p$.
Proof. For the first part we use
$$\max_{y\in B_2(x,R)} \langle y-x_r,\, y-x_s\rangle = \max_{h\in B_2(0,R)} \langle x-x_r+h,\, x-x_s+h\rangle = \langle x-x_r,\, x-x_s\rangle + \max_{h\in B_2(0,R)} \big(\langle h,\, 2x-x_r-x_s\rangle + \|h\|_2^2\big) = \langle x-x_r,\, x-x_s\rangle + R\,\|2x-x_r-x_s\|_2 + R^2,$$
where the last equality follows from the Cauchy-Schwarz inequality, noting that equality is attained as
we maximize over the Euclidean ball. For the second part we consider
$$\min_{y\in B_2(x,R)} \langle y-x_r,\, y-x_s\rangle = \min_{h\in B_2(0,R)} \langle x-x_r+h,\, x-x_s+h\rangle = \langle x-x_r,\, x-x_s\rangle + \min_{h\in B_2(0,R)} \big(\langle h,\, 2x-x_r-x_s\rangle + \|h\|_2^2\big)$$
$$= \langle x-x_r,\, x-x_s\rangle + \min_{0\leq\alpha\leq R} \big(-\alpha\,\|2x-x_r-x_s\|_2 + \alpha^2\big) = \langle x-x_r,\, x-x_s\rangle - \min\Big\{\frac{\|2x-x_r-x_s\|_2}{2},\, R\Big\}\,\|2x-x_r-x_s\|_2 + \min\Big\{\frac{\|2x-x_r-x_s\|_2}{2},\, R\Big\}^2,$$
where in the second step we have separated direction and norm of the vector $h$; optimization
over the direction then yields the result via Cauchy-Schwarz. Finally, the constrained convex
one-dimensional optimization problem is solved explicitly by $\alpha = \min\big\{\frac{\|2x-x_r-x_s\|_2}{2},\, R\big\}$.
The proofs of the remaining statements follow analogously, noting that
$$e^{-\gamma\|x+h-x_r\|_2^2}\, e^{-\gamma\|x+h-x_s\|_2^2} = e^{-\gamma\big(\|x-x_r\|_2^2 + \|x-x_s\|_2^2 + 2\langle h,\, 2x-x_r-x_s\rangle + 2\|h\|_2^2\big)}.$$
Proof. We bound each term in the sum in Equation (3) separately, using that $ac \leq bd$ if either $b, c \geq 0$,
$a \leq b$ and $c \leq d$, or $b \leq 0$, $d \geq 0$, $a \leq b$ and $c \geq d$, where $c, d$ correspond to the exponential
terms and $a, b$ to the bounds on the inner product. Similarly, $ac \geq bd$ if either $b, c \geq 0$, $a \geq b$
and $c \geq d$, or $a \leq 0$, $d \geq 0$, $a \geq b$ and $c \leq d$. The individual upper and lower bounds are
taken from Lemma 2.1.
While the bound leads to non-trivial estimates as seen in Section 5, the bound is not very
tight. The reason is that the sum is bounded elementwise, which is quite pessimistic. We
think that better bounds are possible but have to postpone this to future work.
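For concreteness, the four quantities of Lemma 2.1 can be evaluated directly; the following numpy sketch is our own illustration of these bounds for the Gaussian kernel (the function name and interface are our choice, not the authors').

```python
import numpy as np

def kernel_cross_terms_bounds(x, x_r, x_s, R, gamma):
    """Uniform bounds over the ball B_2(x, R) of the two factors appearing in
    <grad_y k(x_r, y), grad_y k(x_s, y)> for the Gaussian kernel (Lemma 2.1)."""
    v = 2 * x - x_r - x_s
    nv = np.linalg.norm(v)
    M = min(nv / 2.0, R)
    ip = np.dot(x - x_r, x - x_s)                 # inner product at the center
    ip_max = ip + R * nv + R**2                   # max_y <y - x_r, y - x_s>
    ip_min = ip - M * nv + M**2                   # min_y <y - x_r, y - x_s>
    base = np.sum((x - x_r)**2) + np.sum((x - x_s)**2)
    exp_max = np.exp(-gamma * (base - 2 * M * nv + 2 * M**2))   # max of the exponential product
    exp_min = np.exp(-gamma * (base + 2 * R * nv + 2 * R**2))   # min of the exponential product
    return ip_min, ip_max, exp_min, exp_max
```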
2.3 Evaluation of the Bound for Neural Networks
We derive the bound for a neural network with one hidden layer. In principle, the technique
we apply below can be used for arbitrary layers, but the computational complexity increases
rapidly. The problem is that in the directed network topology one has to consider almost
every path separately to derive the bound. Let $U$ be the number of hidden units and let $w$ and $u$
be the weight matrices of the output and input layer, respectively. We assume that the activation
function $\sigma$ is continuously differentiable and that the derivative $\sigma'$ is monotonically
increasing. The prototype activation function we have in mind, and which we use later
on in the experiments, is the differentiable approximation $\sigma_\alpha(x) = \frac{1}{\alpha}\log(1 + e^{\alpha x})$ of the
ReLU activation function $\sigma_{\mathrm{ReLU}}(x) = \max\{0, x\}$. Note that $\lim_{\alpha\to\infty}\sigma_\alpha(x) = \sigma_{\mathrm{ReLU}}(x)$ and
$\sigma_\alpha'(x) = \frac{1}{1+e^{-\alpha x}}$. The output of the neural network can be written as
$$f_j(x) = \sum_{r=1}^U w_{jr}\, \sigma\Big(\sum_{s=1}^d u_{rs} x_s\Big), \qquad j = 1,\ldots,K,$$
where for simplicity we omit any bias terms, but it is straightforward to consider also models
with bias. A direct computation shows that
$$\|\nabla f_j(y) - \nabla f_c(y)\|_2^2 = \sum_{r,m=1}^U (w_{jr} - w_{cr})(w_{jm} - w_{cm})\, \sigma'(\langle u_r, y\rangle)\, \sigma'(\langle u_m, y\rangle) \sum_{l=1}^d u_{rl} u_{ml}, \qquad (4)$$
where $u_r \in \mathbb{R}^d$ is the $r$-th row of the weight matrix $u \in \mathbb{R}^{U\times d}$. The resulting bound is given
in the following proposition.
Proposition 2.2. Let $\sigma$ be a continuously differentiable activation function with $\sigma'$ monotonically
increasing. Define $\beta_{rm} = (w_{jr} - w_{cr})(w_{jm} - w_{cm}) \sum_{l=1}^d u_{rl} u_{ml}$. Then
$$\max_{y\in B_2(x,R)} \|\nabla f_j(y) - \nabla f_c(y)\|_2 \;\leq\; \Big[ \sum_{r,m=1}^U \max\{\beta_{rm}, 0\}\, \sigma'\big(\langle u_r, x\rangle + R\,\|u_r\|_2\big)\, \sigma'\big(\langle u_m, x\rangle + R\,\|u_m\|_2\big) + \min\{\beta_{rm}, 0\}\, \sigma'\big(\langle u_r, x\rangle - R\,\|u_r\|_2\big)\, \sigma'\big(\langle u_m, x\rangle - R\,\|u_m\|_2\big) \Big]^{\frac{1}{2}}.$$
Proof. The proof is based on the fact that, due to the monotonicity of $\sigma'$ and the Cauchy-Schwarz inequality,
$$\max_{y\in B_2(x,R)} \sigma'(\langle u_r, y\rangle) = \max_{h\in B_2(0,R)} \sigma'(\langle u_r, x\rangle + \langle u_r, h\rangle) = \sigma'\big(\langle u_r, x\rangle + R\,\|u_r\|_2\big).$$
Similarly, one gets
$$\min_{y\in B_2(x,R)} \sigma'(\langle u_r, y\rangle) = \min_{h\in B_2(0,R)} \sigma'(\langle u_r, x\rangle + \langle u_r, h\rangle) = \sigma'\big(\langle u_r, x\rangle - R\,\|u_r\|_2\big).$$
The rest of the result follows by bounding the terms in the sum in Equation (4) elementwise.
As discussed above, the global Lipschitz bounds of the individual classifier outputs, see (2),
lead to an upper bound on our desired local cross-Lipschitz constant. In the experiments
below our local bounds on the Lipschitz constant are up to 8 times smaller than what one
would achieve via the global Lipschitz bounds of [24]. This shows that their global approach
is much too rough to obtain meaningful robustness guarantees.
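To illustrate how the local bound of Proposition 2.2 can be evaluated in practice, here is a small numpy sketch (our own, assuming the softplus activation with $\alpha = 10$ that is used later); combined with Theorem 2.1, one can then search over $R$, e.g., by the binary search described in Section 5.

```python
import numpy as np

def softplus_deriv(z, alpha=10.0):
    """Derivative of the softplus approximation of ReLU: sigma'_alpha(z) = 1/(1+exp(-alpha*z))."""
    return 1.0 / (1.0 + np.exp(-alpha * z))

def local_cross_lipschitz_bound(w, u, x, c, j, R, alpha=10.0):
    """Upper bound of Proposition 2.2 on max_{y in B_2(x,R)} ||grad f_j(y) - grad f_c(y)||_2
    for a one-hidden-layer network f_j(x) = sum_r w[j, r] * sigma(<u[r], x>)."""
    dw = w[j] - w[c]                          # shape (U,)
    beta = np.outer(dw, dw) * (u @ u.T)       # beta_rm
    act = u @ x                               # <u_r, x> for all r
    norms = np.linalg.norm(u, axis=1)         # ||u_r||_2
    s_plus = softplus_deriv(act + R * norms, alpha)
    s_minus = softplus_deriv(act - R * norms, alpha)
    total = (np.maximum(beta, 0) * np.outer(s_plus, s_plus)
             + np.minimum(beta, 0) * np.outer(s_minus, s_minus)).sum()
    return np.sqrt(max(total, 0.0))           # clamp only for numerical safety
```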
The analysis of Section 2 shows that if the local cross-Lipschitz constant
$$\max_{y\in B_p(x,R)} \|\nabla f_j(y) - \nabla f_c(y)\|_q \qquad (5)$$
is small and $f_c(x) - f_j(x)$ is large, then we get good robustness guarantees. The latter
property is typically already optimized in a multi-class loss function. We consider for all
methods in this paper the cross-entropy loss so that the differences in the results only come
from the chosen function class (kernel methods versus neural networks) and the chosen
regularization functional. The cross-entropy loss $L : \{1,\ldots,K\}\times\mathbb{R}^K \to \mathbb{R}$ is given as
$$L(y, f(x)) = -\log\frac{e^{f_y(x)}}{\sum_{k=1}^K e^{f_k(x)}} = \log\Big(1 + \sum_{k\neq y} e^{f_k(x) - f_y(x)}\Big).$$
In the latter formulation it becomes apparent that the loss tries to make the differences
$f_y(x) - f_k(x)$ as large as possible for all $k \neq y$.
As our goal is to obtain good robustness guarantees, it is natural to consider a proxy of the quantity
in (5) for regularization. We define the Cross-Lipschitz regularization functional as
$$\Omega(f) = \frac{1}{nK^2} \sum_{i=1}^n \sum_{l,m=1}^K \|\nabla f_l(x_i) - \nabla f_m(x_i)\|_2^2, \qquad (6)$$
where $(x_i)_{i=1}^n$ are the training points. The goal of this regularization functional is to
make the differences of the classifier functions at the data points as constant as possible. In
total, by minimizing
$$\frac{1}{n}\sum_{i=1}^n L\big(y_i, f(x_i)\big) + \lambda\, \Omega(f), \qquad (7)$$
over some function class, we thus try to maximize $f_c(x_i) - f_j(x_i)$ and at the same time
keep $\|\nabla f_l(x_i) - \nabla f_m(x_i)\|_2^2$ small uniformly over all classes. This automatically enforces
robustness of the resulting classifier. It is important to note that this regularization functional
is coherent with the loss as it shares the same degrees of freedom: adding the same function $g$
to all outputs, $f_j'(x) = f_j(x) + g(x)$, leaves both the loss and the regularization functional
invariant. This is the main difference to [4], where the global Lipschitz constant is enforced
to be smaller than one.
The standard way to regularize neural networks is weight decay; that is, the squared
Euclidean norm of all weights is added to the objective. More recently dropout [22], which
can be seen as a form of stochastic regularization, has been introduced. Dropout can
also be interpreted as a form of regularization of the weights [22, 10]. It is interesting
to note that classical regularization functionals which penalize derivatives of the resulting
classifier function are not typically used in deep learning, but see [6, 11]. As noted above
we restrict ourselves to one-hidden-layer neural networks to simplify notation, that is,
$f_j(x) = \sum_{r=1}^U w_{jr}\,\sigma\big(\sum_{s=1}^d u_{rs} x_s\big)$, $j = 1,\ldots,K$. Then we can write the Cross-Lipschitz
regularization as
$$\Omega(f) = \frac{2}{nK^2} \sum_{r,s=1}^U \sum_{m=1}^K \sum_{i=1}^n \Big( \sum_{l=1}^K w_{lr} w_{ls} - \sum_{l=1}^K w_{lr} w_{ms} \Big)\, \sigma'(\langle u_r, x_i\rangle)\, \sigma'(\langle u_s, x_i\rangle) \sum_{l=1}^d u_{rl} u_{sl},$$
which leads to an expression that can be evaluated quickly using vectorization. Obviously, one
can also implement the Cross-Lipschitz regularization for all standard deep networks.
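As an illustration of the vectorized evaluation mentioned above, the following numpy sketch (our own; function name and array shapes are assumptions) computes the closed-form expression for $\Omega(f)$ for a one-hidden-layer network with the softplus activation.

```python
import numpy as np

def cross_lipschitz_regularizer(W, U, X, alpha=10.0):
    """Vectorized evaluation of the closed-form Cross-Lipschitz regularizer Omega(f)
    for a one-hidden-layer network f_j(x) = sum_r W[j, r] * sigma(<U[r], x>).
    W: (K, U_hidden) output weights, U: (U_hidden, d) input weights, X: (n, d) data."""
    n, K = X.shape[0], W.shape[0]
    S = 1.0 / (1.0 + np.exp(-alpha * (X @ U.T)))   # sigma'(<u_r, x_i>), shape (n, U_hidden)
    G = U @ U.T                                    # <u_r, u_s>
    A = W.T @ W                                    # sum_l w_{lr} w_{ls}
    wbar = W.sum(axis=0)                           # sum_l w_{lr}
    B = K * A - np.outer(wbar, wbar)               # = sum_m (sum_l w_{lr} w_{ls} - w_{ms} sum_l w_{lr})
    C = S.T @ S                                    # sum_i sigma'_r(x_i) * sigma'_s(x_i)
    return (2.0 / (n * K**2)) * np.sum(B * G * C)
```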
If Problem (9) is infeasible, then this equation has no solution for λ ≥ 0. In both the feasible and
the infeasible case the solution can be found in O(d log d). The algorithm is given in Algorithm 1.
Proof. The Lagrangian is given by
$$L(\delta, \lambda, \alpha, \beta) = \tfrac{1}{2}\|\delta\|_2^2 + \lambda\big(\langle v, \delta\rangle - c\big) + \langle \alpha,\, x + \delta - \mathbf{1}\rangle - \langle \beta,\, x + \delta\rangle.$$
The KKT conditions become
δ + λv + α − β = 0
αr (xr + δr − 1) = 0, ∀r = 1, . . . , d
βr (xr + δr ) = 0, ∀r = 1, . . . , d
λ(hv, δi − c) = 0
αr ≥ 0, ∀r = 1, . . . , d
βr ≥ 0, ∀r = 1, . . . , d
λ ≥ 0.
We deduce that if βr > 0 then αr = 0 which implies
δr = −xr = −λvr + βr =⇒ βr = max{0, −xr + λvr }.
Similarly, if αr > 0 then βr = 0 which implies
δr = 1 − xr = −λvr − αr =⇒ αr = max{0, xr − 1 − λvr }.
It follows that
$$\delta_r = \begin{cases} -\lambda v_r & \text{if } -x_r < -\lambda v_r < 1 - x_r,\\ 1 - x_r & \text{if } -\lambda v_r > 1 - x_r,\\ -x_r & \text{if } -\lambda v_r < -x_r. \end{cases}$$
Proof. The result is basically obvious but we derive it formally. First of all we rewrite (10)
as a linear program.
$$\min_{\delta \in \mathbb{R}^d,\, t\in\mathbb{R}^d} \sum_{i=1}^d t_i \qquad \text{sbj. to: } c \geq \langle v, \delta\rangle, \quad -x_j \leq \delta_j \leq 1 - x_j, \quad -t_i \leq \delta_i \leq t_i, \quad t_i \geq 0. \qquad (11)$$
Algorithm 1 Computation of box-constrained adversarial samples with respect to the $\|\cdot\|_2$-norm.
INPUT: $c = f_j(x) - f_m(x)$ and $v = \nabla f_m(x) - \nabla f_j(x)$ ($j$: desired class, $m$: original class)
sort $\gamma_r = \max\big\{\frac{x_r - 1}{v_r}, \frac{x_r}{v_r}\big\}$ in increasing order $\pi$
s = 0; ρ = 0
while ρ > c do
s ← s + 1
compute $\rho = \langle v, \delta(\gamma_{\pi_s})\rangle$, where $\delta(\lambda)$ is the function defined in Lemma 4.1
end while
if ρ ≤ c then
compute $I_m = \{r \mid -x_r \leq -\gamma_{\pi_{s-1}} v_r \leq 1 - x_r\}$,
compute $I_u = \{r \mid -\gamma_{\pi_{s-1}} v_r > 1 - x_r\}$,
compute $I_l = \{r \mid -\gamma_{\pi_{s-1}} v_r < -x_r\}$
$\lambda = \dfrac{\sum_{r\in I_u} v_r (1 - x_r) - \sum_{r\in I_l} v_r x_r - c}{\sum_{r\in I_m} v_r^2}$
$\delta_r = \max\{-x_r,\, \min\{-\lambda v_r,\, 1 - x_r\}\}$
else
Problem has no feasible solution
end if
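For illustration, the box-constrained 2-norm problem solved by Algorithm 1 can also be handled with a simple bisection over λ, using the clipping form of δ(λ) from Lemma 4.1; the following Python sketch is our simplified alternative (the tolerance and upper bound on λ are assumptions), not the paper's O(d log d) routine.

```python
import numpy as np

def adversarial_delta_l2(x, v, c, lam_max=1e6, tol=1e-10):
    """Simplified solver for the box-constrained problem behind Algorithm 1: find a small-l2
    delta with <v, delta> <= c (c < 0) and 0 <= x + delta <= 1. The KKT solution has the
    form delta(lambda) = clip(-lambda * v, -x, 1 - x), and <v, delta(lambda)> is
    non-increasing in lambda, so the active lambda is located by bisection."""
    def delta(lam):
        return np.clip(-lam * v, -x, 1.0 - x)

    if np.dot(v, delta(lam_max)) > c:
        return None                      # infeasible within the box constraints
    lo, hi = 0.0, lam_max
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if np.dot(v, delta(mid)) > c:    # constraint not yet satisfied, increase lambda
            lo = mid
        else:
            hi = mid
    return delta(hi)
```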
The Lagrangian of this problem is
$$L(\delta, t, \alpha, \beta, \gamma, \theta, \kappa, \lambda) = \langle t, \mathbf{1}\rangle + \lambda\big(\langle v,\delta\rangle - c\big) - \langle \alpha,\, \delta + x\rangle + \langle \beta,\, \delta - \mathbf{1} + x\rangle - \langle \gamma,\, \delta + t\rangle + \langle \theta,\, \delta - t\rangle - \langle \kappa, t\rangle$$
$$= \langle t,\, \mathbf{1} - \gamma - \theta - \kappa\rangle + \langle \delta,\, \beta - \alpha + \lambda v + \theta - \gamma\rangle + \langle \beta - \alpha,\, x\rangle - \langle \beta, \mathbf{1}\rangle - \lambda c.$$
Minimization of the Lagrangian over $t$ and $\delta$, respectively, leads to a non-trivial result only if
$$\mathbf{1} - \gamma - \theta - \kappa = 0, \qquad \beta - \alpha + \lambda v + \theta - \gamma = 0.$$
We get the dual problem
$$\max_{\alpha,\beta,\theta,\gamma,\kappa,\lambda}\ \langle \beta - \alpha,\, x\rangle - \langle \beta, \mathbf{1}\rangle - \lambda c \qquad (12)$$
$$\text{sbj. to: } \mathbf{1} - \gamma - \theta - \kappa = 0, \quad \beta - \alpha + \lambda v + \theta - \gamma = 0, \quad \alpha \geq 0,\ \beta \geq 0,\ \theta \geq 0,\ \gamma \geq 0,\ \kappa \geq 0,\ \lambda \geq 0. \qquad (13)$$
Using the equalities we can now simplify the problem by eliminating $\alpha$; since $\kappa$ does not appear
in the objective, its positivity just induces an additional constraint. We get
$\alpha = \beta + \lambda v + \theta - \gamma$.
Plugging this into the problem (12) we get
$$\max_{\beta,\theta,\gamma,\lambda}\ -\langle \lambda v + \theta - \gamma,\, x\rangle - \langle \beta, \mathbf{1}\rangle - \lambda c \qquad (14)$$
$$\text{sbj. to: } \mathbf{1} - \gamma - \theta \geq 0, \quad \beta + \lambda v + \theta - \gamma \geq 0, \quad \beta \geq 0,\ \theta \geq 0,\ \gamma \geq 0,\ \lambda \geq 0. \qquad (15)$$
We get the constraint $\beta \geq \max\{0, -\lambda v - \theta + \gamma\}$ (all the inequalities and functions are taken
here componentwise) and thus we can explicitly maximize over $\beta$:
$$\max_{\theta,\gamma,\lambda}\ -\langle \lambda v + \theta - \gamma,\, x\rangle - \sum_{i=1}^d \max\{0, -\lambda v_i - \theta_i + \gamma_i\} - \lambda c \qquad (16)$$
$$\text{sbj. to: } \gamma + \theta \leq \mathbf{1}, \quad \theta \geq 0,\ \gamma \geq 0,\ \lambda \geq 0. \qquad (17)$$
As $0 \leq x_i \leq 1$, it holds for all $\theta \geq 0$, $\gamma \geq 0$ that
$$\langle -\lambda v - \theta + \gamma,\, x\rangle - \sum_{i=1}^d \max\{0, -\lambda v_i - \theta_i + \gamma_i\} \leq 0.$$
Finally, we consider the case $p = \infty$:
$$\min_{\delta\in\mathbb{R}^d}\ \|\delta\|_\infty \qquad \text{sbj. to: } f_j(x) - f_c(x) \geq \langle \nabla f_c(x) - \nabla f_j(x),\, \delta\rangle, \quad 0 \leq x_j + \delta_j \leq 1. \qquad (20)$$
Lemma 4.3. Let $m = \arg\max_j f_j(x)$ and define $v = \nabla f_m(x) - \nabla f_j(x)$ and $c = f_j(x) - f_m(x) < 0$.
Then the solution of (20) can be found by solving, for $t \geq 0$,
$$\sum_{v_r > 0} v_r \max\{-t, -x_r\} + \sum_{v_r < 0} v_r \min\{t, 1 - x_r\} = c,$$
and the lower bound is attained with $\delta_r = \min\{t, 1-x_r\}$ for $v_r < 0$, $\delta_r = \max\{-t, -x_r\}$
for $v_r > 0$, and $\delta_r = 0$ if $v_r = 0$. Note that both terms are monotonically decreasing in $t$.
Algorithm 3 has complexity $O(d\log d)$ due to the initial sorting step, followed by steps of
total complexity $O(d)$.
For nonlinear classifiers a change of the decision is not guaranteed and thus we use later on
a binary search with a variable c instead of fc (x) − fj (x).
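For illustration, the scalar equation of Lemma 4.3 can also be solved by bisection on t, since the left-hand side is monotonically decreasing; the sketch below is our simplified alternative to Algorithm 3 (numpy and the tolerance are assumptions), not the authors' O(d log d) routine.

```python
import numpy as np

def adversarial_delta_linf(x, v, c, tol=1e-10):
    """Solve the p = infinity problem of Lemma 4.3: find the smallest t >= 0 with
    g(t) = sum_{v_r>0} v_r*max(-t, -x_r) + sum_{v_r<0} v_r*min(t, 1-x_r) = c (c < 0),
    then build the corresponding box-feasible perturbation delta."""
    def g(t):
        pos, neg = v > 0, v < 0
        return (v[pos] * np.maximum(-t, -x[pos])).sum() + (v[neg] * np.minimum(t, 1.0 - x[neg])).sum()

    if g(1.0) > c:                       # g saturates for t >= 1 since inputs lie in [0, 1]^d
        return None                      # infeasible: even the extreme point does not reach c
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > c:                   # constraint not yet satisfied, increase t
            lo = mid
        else:
            hi = mid
    t = hi
    delta = np.zeros_like(x)
    delta[v > 0] = np.maximum(-t, -x[v > 0])
    delta[v < 0] = np.minimum(t, 1.0 - x[v < 0])
    return delta
```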
5 Experiments
The goal of the experiments is the evaluation of the robustness of the resulting classifiers
and not necessarily state-of-the-art results in terms of test error. In all cases we compute the
robustness guarantees from Theorem 2.1 (lower bound on the norm of the minimal change
required to change the classifier decision), where we optimize over R using binary search,
and adversarial samples with the algorithm for the 2-norm from Section 4 (upper bound
on the norm of the minimal change required to change the classifier decision), where we do
a binary search in the classifier output difference in order to find a point on the decision
boundary. Additional experiments can be found in the supplementary material.
Kernel methods: We optimize the cross-entropy loss once with the standard regularization
(Kernel-LogReg) and with Cross-Lipschitz regularization (Kernel-CL). Both are convex
optimization problems and we use L-BFGS to solve them. We use the Gaussian kernel
$k(x, y) = e^{-\gamma\|x-y\|_2^2}$, where $\gamma = \frac{\alpha}{\rho_{\mathrm{KNN40}}^2}$, $\rho_{\mathrm{KNN40}}$ is the mean of the 40 nearest-neighbor
distances on the training set and $\alpha \in \{0.5, 1, 2, 4\}$. We show the results for MNIST (60000
training and 10000 test samples). However, we have checked that parameter selection
using a subset of 50000 images from the training set and evaluating on the rest yields
indeed the parameters which give the best test errors when trained on the full set. The
Algorithm 3 Computation of box-constrained adversarial samples with respect to the $\|\cdot\|_\infty$-norm.
INPUT: $c = f_j(x) - f_m(x)$ and $v = \nabla f_m(x) - \nabla f_j(x)$ ($j$: desired class, $m$: original class)
$d_+ := |\{l \mid v_l > 0\}|$ and $d_- := |\{l \mid v_l < 0\}|$
sort $\{x_l \mid v_l > 0\}$ in increasing order $\pi$, sort $\{1 - x_l \mid v_l < 0\}$ in increasing order $\rho$
s := 1, r := 1, g := 0, t := 0, $\kappa_+ = \sum_{v_l>0} v_l$, $\kappa_- = \sum_{v_l<0} v_l$, $\gamma_+ = \gamma_- = 0$
while g > c AND (s ≤ $d_+$ OR r ≤ $d_-$) do
if $x_{\pi_s} < 1 - x_{\rho_r}$ then
$t = x_{\pi_s}$, $\kappa_+ \leftarrow \kappa_+ - v_{\pi_s}$, $\gamma_+ \leftarrow \gamma_+ - v_{\pi_s} x_{\pi_s}$,
$s \leftarrow s + 1$
else
$t = 1 - x_{\rho_r}$, $\kappa_- \leftarrow \kappa_- - v_{\rho_r}$, $\gamma_- \leftarrow \gamma_- - v_{\rho_r}(1 - x_{\rho_r})$,
$r \leftarrow r + 1$
end if
$g = \gamma_+ + \gamma_- + t\,(\kappa_+ - \kappa_-)$
end while
if g ≤ c then
undo last step
$t = (c - \gamma_+ - \gamma_-)/(\kappa_- - \kappa_+)$
compute $\delta_r = \begin{cases} \min\{t,\, 1 - x_r\} & \text{for } v_r > 0 \\ \max\{-t,\, -x_r\} & \text{for } v_r < 0 \end{cases}$
else
Problem has no feasible solution
end if
Figure 1: Kernel Methods: Cross-Lipschitz regularization achieves both better test error and robustness against
adversarial samples (upper bounds, larger is better) compared to the standard regularization. The robustness
guarantee is weaker than for neural networks but this is most likely due to the relatively loose bound.
Neural Networks: Before we demonstrate how upper and lower bounds improve using
cross-Lipschitz regularization, we first want to highlight the importance of the usage of the
local cross-Lipschitz constant in Theorem 2.1 for our robustness guarantee.
² Note that then the optimization of R in Theorem 2.1 would be unnecessary.
MNIST (plain): None 0.69 | Dropout 0.48 | Weight Dec. 0.68 | Cross Lip. 0.21
CIFAR10 (plain): None 0.22 | Dropout 0.13 | Weight Dec. 0.24 | Cross Lip. 0.17
Table 1: We show the average ratio $\alpha_{\mathrm{global}}/\alpha_{\mathrm{local}}$ of the robustness guarantees $\alpha_{\mathrm{global}}$, $\alpha_{\mathrm{local}}$ from Theorem 2.1 on
the test data for MNIST and CIFAR10 and different regularizers. The guarantees using the local Cross-Lipschitz
constant are up to eight times better than with the global one.
A global cross-Lipschitz constant for the one-hidden-layer network can be obtained as follows: with $g := f_c - f_j$,
using that $\sigma$ is $1$-Lipschitz and the operator norm $\|U\|_{2,2}$ of the weight matrix $U$,
$$|g(x) - g(y)| = \big|\langle w_c - w_j,\, \sigma(Ux) - \sigma(Uy)\rangle\big| \leq \|w_c - w_j\|_2\, \|\sigma(Ux) - \sigma(Uy)\|_2 \leq \|w_c - w_j\|_2\, \|U(x-y)\|_2 \leq \|w_c - w_j\|_2\, \|U\|_{2,2}\, \|x-y\|_2.$$
The advantage is clearly that this global Cross-Lipschitz constant can just be computed
once and by using it in Theorem 2.1 one can evaluate the guarantees very quickly. However,
it turns out that one gets significantly better robustness guarantees by using the local
Cross-Lipschitz constant in terms of the bound derived in Proposition 2.2 instead of the just
derived global Lipschitz constant. Note that the optimization over R in Theorem 2.1 is done
using a binary search, noting that the bound of the local Lipschitz constant in Proposition
2.2 is monotonically increasing in R. We have the following comparison in Table 1. We
want to highlight that the robustness guarantee with the global Cross-Lipschitz constant
was always worse than when using the local Cross-Lipschitz constant across all regularizers
and data sets. Table 1 shows that the guarantees using the local Cross-Lipschitz can be up
to eight times better than for the global one. As these are just one hidden layer networks, it
is obvious that robustness guarantees for deep neural networks based on the global Lipschitz
constants will be too coarse to be useful.
Experiments: We use a one hidden layer network with 1024 hidden units and the softplus
activation function with α = 10. Thus the resulting classifier is continuously differentiable.
We compare three different regularization techniques: weight decay, dropout and our Cross-
Lipschitz regularization. Training is done with SGD. For each method we have adapted the
learning rate (two per method) and regularization parameters (4 per method) so that all
methods achieve good performance. We do experiments for MNIST and CIFAR10 in three
settings: plain, data augmentation and adversarial training. The exact settings of the param-
eters and the augmentation techniques are described below. The results for MNIST are shown
in Figure 2 and the results for CIFAR10 in Figure 3. For MNIST there is a clear trend that
our Cross-Lipschitz regularization improves the robustness of the resulting classifier while
having competitive or better test error. It is surprising that data augmentation does not
lead to more robust models. However, adversarial training improves the guarantees as well as
adversarial resistance. For CIFAR10 the picture is mixed: our CL-regularization performs
well for the augmented task in test error and upper bounds but is not significantly better in
the robustness guarantees. The problem might be that the overall poor performance due to the
simple model prevents better behavior. Data augmentation leads to better test error
but the robustness properties (upper and lower bounds) are basically unchanged. Adversarial
training slightly improves performance compared to the plain setting and improves upper
and lower bounds in terms of robustness. We want to highlight that our guarantees (lower
bounds) and the upper bounds from the adversarial samples are not too far away.
For MNIST (all settings) the learning rate is for all methods chosen from {0.2, 0.5}.
The regularization parameters for weight decay are chosen from $\{10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}\}$,
for Cross-Lipschitz from $\{10^{-5}, 10^{-4}, 5\cdot 10^{-4}, 10^{-3}\}$ and the dropout probabilities are
taken from {0.4, 0.5, 0.6, 0.7}. For CIFAR10 the learning rate is for all methods chosen
from {0.04, 0.1}, the regularization parameters for weight decay and Cross-Lipschitz are
$\{10^{-5}, 10^{-4}, 5\cdot 10^{-4}, 10^{-3}\}$ and dropout probabilities are taken from {0.5, 0.6, 0.7, 0.8}.
For CIFAR10 with data augmentation we choose the learning rate for all methods from
{0.04, 0.1}, the regularization parameters for weight decay are $\{10^{-6}, 10^{-5}, 10^{-4}, 10^{-3}\}$ and
[Figure 2: panels showing Adversarial Resistance (Upper Bound) and Robustness Guarantee (Lower Bound), both with respect to the L2-norm]
Figure 2: Neural networks. Left: Adversarial resistance w.r.t. the L2-norm on MNIST. Right: Average robustness
guarantee w.r.t. the L2-norm on MNIST for different neural networks (one hidden layer, 1024 HU) and
hyperparameters. The Cross-Lipschitz regularization leads to better robustness with similar or better prediction
performance. Top row: plain MNIST, Middle: Data Augmentation, Bottom: Adversarial Training.
for Cross-Lipschitz $\{10^{-5}, 10^{-4}, 5\cdot 10^{-4}, 10^{-3}\}$ and the dropout probabilities are taken from
{0.5, 0.6, 0.7, 0.8}. Data augmentation for MNIST means that we apply random rotations with angle in
$[-\frac{\pi}{20}, \frac{\pi}{20}]$ and random crops from 28x28 to 24x24. For CIFAR-10 we apply the same
and additionally we mirror the image (left to right) with probability 0.5 and apply random
brightness changes in $[-0.1, 0.1]$ and random contrast changes in $[0.6, 1.4]$. In each substep we ensure that
we get an image in $[0,1]^d$ by clipping. We implemented adversarial training by generating
adversarial samples with respect to the infinity norm with the code from Section 4 and replaced 50%
of each batch with adversarial samples. Finally, we use for SGD a batch size of 64 in all experiments.
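As an illustration of the augmentation pipeline described above, the following numpy/scipy sketch (our own; the CIFAR-10 input size and the cropping to 24x24 are assumptions based on the description) applies random rotation, cropping, mirroring, brightness and contrast changes with clipping to [0, 1].

```python
import numpy as np
from scipy.ndimage import rotate

def augment_cifar(img, rng):
    """Illustrative augmentation: random rotation in [-pi/20, pi/20], random 24x24 crop,
    horizontal mirroring with probability 0.5, random brightness in [-0.1, 0.1],
    random contrast in [0.6, 1.4], clipping to [0, 1].
    img: float array of shape (32, 32, 3) with values in [0, 1]."""
    angle = rng.uniform(-9.0, 9.0)                       # pi/20 rad corresponds to 9 degrees
    img = rotate(img, angle, axes=(0, 1), reshape=False, mode='nearest')
    top, left = rng.integers(0, img.shape[0] - 24 + 1, size=2)
    img = img[top:top + 24, left:left + 24]              # random crop to 24x24
    if rng.random() < 0.5:
        img = img[:, ::-1]                               # mirror left to right
    img = np.clip(img + rng.uniform(-0.1, 0.1), 0.0, 1.0)                   # brightness
    mean = img.mean()
    img = np.clip((img - mean) * rng.uniform(0.6, 1.4) + mean, 0.0, 1.0)    # contrast
    return img
```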
Illustration of adversarial samples: we take one test image from MNIST and apply the
adversarial generation from Section 4 with respect to the 2-norm to generate the adversarial samples for
the different kernel methods and neural networks (plain setting), where we use for each method
the parameters leading to best test performance. All classifiers change their originally correct
decision to a “wrong” one. It is interesting to note that for Cross-Lipschitz regularization
(both kernel method and neural network) the “adversarial” sample is really at the decision
boundary between 1 and 8 (as predicted) and thus the new decision is actually correct.
This effect is strongest for our Kernel-CL, which also requires the strongest modification to
generate the adversarial sample. The situation is different for neural networks, where the
classifiers obtained from the two standard regularization techniques are still vulnerable, as
the adversarial sample is still clearly a 1 for dropout and weight decay. We show further
examples below.
[Figure 3: panels showing Adversarial Resistance (Upper Bound) and Robustness Guarantee (Lower Bound), both with respect to the L2-norm]
Figure 3: Left: Adversarial resistance w.r.t. the L2-norm on the test set of CIFAR10. Right: Average robustness
guarantee w.r.t. the L2-norm on the test set of CIFAR10 for different neural networks (one hidden
layer, 1024 HU) and hyperparameters. While Cross-Lipschitz regularization yields good test errors, the guarantees
are not necessarily stronger. Top row: CIFAR10 (plain), Middle: CIFAR10 trained with data augmentation,
Bottom: Adversarial Training.
Original, Class 6; K-SVM, Pred: 0, $\|\delta\|_2 = 3.0$; K-CL, Pred: 0, $\|\delta\|_2 = 4.7$
NN-WD, Pred: 4, $\|\delta\|_2 = 1.4$; NN-DO, Pred: 4, $\|\delta\|_2 = 1.9$; NN-CL, Pred: 0, $\|\delta\|_2 = 3.2$
Figure 4: Top left: original test image; for each classifier we generate the corresponding adversarial sample which
changes the classifier decision (denoted as Pred). Note that for Cross-Lipschitz regularization this new decision
makes (often) sense, whereas for the neural network models (weight decay/dropout) the change is so small that
the new decision is clearly wrong.
Original, Class 1; K-SVM, Pred: 7, $\|\delta\|_2 = 1.2$; K-CL, Pred: 8, $\|\delta\|_2 = 3.5$
NN-WD, Pred: 8, $\|\delta\|_2 = 1.2$; NN-DO, Pred: 7, $\|\delta\|_2 = 1.1$; NN-CL, Pred: 8, $\|\delta\|_2 = 2.6$
Figure 5: Top left: original test image; for each classifier we generate the corresponding adversarial sample which
changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes sense,
whereas for all neural network models the change is so small that the new decision is clearly wrong.
Original, Class 4; K-SVM, Pred: 9, $\|\delta\|_2 = 1.4$; K-CL, Pred: 9, $\|\delta\|_2 = 2.2$
NN-WD, Pred: 9, $\|\delta\|_2 = 1.3$; NN-DO, Pred: 9, $\|\delta\|_2 = 1.5$; NN-CL, Pred: 9, $\|\delta\|_2 = 2.2$
Figure 6: Top left: original test image, for each classifier we generate the corresponding adversarial sample which
changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes sense,
whereas for all neural network models the change is so small that the new decision is clearly wrong.
Original, Class 2; K-SVM, Pred: 3, $\|\delta\|_2 = 2.5$; K-CL, Pred: 3, $\|\delta\|_2 = 4.4$
NN-WD, Pred: 3, $\|\delta\|_2 = 1.1$; NN-DO, Pred: 3, $\|\delta\|_2 = 1.4$; NN-CL, Pred: 3, $\|\delta\|_2 = 2.0$
Figure 7: Top left: original test image, for each classifier we generate the corresponding adversarial sample which
changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes sense,
whereas for all neural network models the change is so small that the new decision is clearly wrong.
Original, Class 8; K-SVM, Pred: 3, $\|\delta\|_2 = 2.2$; K-CL, Pred: 5, $\|\delta\|_2 = 4.2$
NN-WD, Pred: 3, $\|\delta\|_2 = 1.4$; NN-DO, Pred: 5, $\|\delta\|_2 = 1.6$; NN-CL, Pred: 3, $\|\delta\|_2 = 2.8$
Figure 8: Top left: original test image, for each classifier we generate the corresponding adversarial sample which
changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes sense,
whereas for all neural network models the change is so small that the new decision is clearly wrong.
Original, Class 1; K-SVM, Pred: 8, $\|\delta\|_2 = 1.2$; K-CL, Pred: 2, $\|\delta\|_2 = 3.7$
NN-WD, Pred: 2, $\|\delta\|_2 = 1.1$; NN-DO, Pred: 2, $\|\delta\|_2 = 1.1$; NN-CL, Pred: 8, $\|\delta\|_2 = 2.7$
Figure 9: Top left: original test image, for each classifier we generate the corresponding adversarial sample which
changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes sense,
whereas for all neural network models the change is so small that the new decision is clearly wrong.
Original, Class 3; K-SVM, Pred: 8, $\|\delta\|_2 = 2.1$; K-CL, Pred: 8, $\|\delta\|_2 = 3.3$
NN-WD, Pred: 8, $\|\delta\|_2 = 1.7$; NN-DO, Pred: 8, $\|\delta\|_2 = 1.4$; NN-CL, Pred: 5, $\|\delta\|_2 = 3.2$
Figure 10: Top left: original test image, for each classifier we generate the corresponding adversarial sample
which changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes
sense, whereas for all neural network models the change is so small that the new decision is clearly wrong.
Original, Class 8; K-SVM, Pred: 9, $\|\delta\|_2 = 2.1$; K-CL, Pred: 5, $\|\delta\|_2 = 2.6$
NN-WD, Pred: 9, $\|\delta\|_2 = 1.4$; NN-DO, Pred: 9, $\|\delta\|_2 = 1.8$; NN-CL, Pred: 9, $\|\delta\|_2 = 1.8$
Figure 11: Top left: original test image, for each classifier we generate the corresponding adversarial sample
which changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes
sense, whereas for all neural network models the change is so small that the new decision is clearly wrong.
[Figure 13: panels showing Adversarial Resistance (Upper Bound) and Robustness Guarantee (Lower Bound), both with respect to the L2-norm]
Figure 13: Left: Adversarial resistance w.r.t. the L2-norm on the test set of the German Traffic Sign Benchmark (GTSB)
in the plain setting. Right: Average robustness guarantee w.r.t. the L2-norm on the test set of GTSB
for different neural networks (one hidden layer, 1024 HU) and hyperparameters. Here dropout performs very well
both in terms of performance and robustness.
Original, Class 5; K-SVM, Pred: 9, $\|\delta\|_2 = 1.5$; K-CL, Pred: 9, $\|\delta\|_2 = 2.2$
NN-WD, Pred: 9, $\|\delta\|_2 = 1.1$; NN-DO, Pred: 9, $\|\delta\|_2 = 1.0$; NN-CL, Pred: 9, $\|\delta\|_2 = 1.4$
Figure 12: Top left: original test image, for each classifier we generate the corresponding adversarial sample
which changes the classifier decision (denoted as Pred). Note that for the kernel methods this new decision makes
sense, whereas for all neural network models the change is so small that the new decision is clearly wrong.
German Traffic Sign Benchmark: As a third dataset we used the German Traffic Sign
Benchmark (GTSB) [23], which consists of images of German traffic signs and has 43
classes with 34209 training and 12630 test samples. The results are shown in Figure 13. For
this dataset Cross-Lipschitz regularization improves the upper bounds compared to weight
decay but dropout achieves significantly better prediction performance and has similar upper
bounds. The robustness guarantees for weight decay and Cross-Lipschitz are slightly better
than for dropout.
Residual Networks: All experiments so far were done with one hidden layer neural
networks so that we can evaluate lower and upper bounds. Now we want to demonstrate
that Cross-Lipschitz regularization can also successfully be used for deep networks. We
use residual networks proposed in [9] with 32 parameter layers and non-bottleneck residual
blocks. We basically follow their setting, except that we do not subtract the per-pixel
mean, so that all images are in $[0,1]^d$, and we use random crops but without the padding used in [9].
Similar to [9], we train for 160 epochs, and the learning rate is divided by 10 at the 115-th
and 140-th epochs. For the experiments with dropout we followed the recommendation of
[25], inserting a dropout layer between the convolutional layers inside each residual block. For
Cross-Lipschitz regularization we use automatic differentiation in TensorFlow [1] to calculate
the derivative with respect to the input, which slows down the training by a factor of 10.
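To illustrate how such an autodiff-based penalty could look, here is a TensorFlow 1.x-style sketch of the Cross-Lipschitz term (our own; the function name, per-class gradient loop and batch handling are assumptions, not the authors' implementation).

```python
import tensorflow as tf  # TensorFlow 1.x style, as used at the time of the paper

def cross_lipschitz_penalty(logits, x, K):
    """Autodiff-based Cross-Lipschitz penalty: average over the batch of
    (1 / K^2) * sum over class pairs (l, m) of ||grad_x f_l(x) - grad_x f_m(x)||_2^2.
    logits: tensor of shape (n, K); x: input tensor of shape (n, d)."""
    # summing the l-th logit over the batch yields per-example input gradients
    grads = [tf.gradients(tf.reduce_sum(logits[:, l]), x)[0] for l in range(K)]
    penalty = 0.0
    for l in range(K):
        for m in range(K):
            diff = grads[l] - grads[m]
            penalty += tf.reduce_sum(tf.square(diff), axis=1)   # per-example squared norm
    return tf.reduce_mean(penalty) / (K ** 2)
```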
For the plain setting the learning rate for all methods is chosen from {0.2, 0.5}, except for the
runs without regularization, for which it is from {0.08, 0.1, 0.2, 0.4, 0.6, 0.8}. For weight decay
the regularization parameter is chosen from $\{10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}\}$, for Cross-Lipschitz
from $\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}\}$, and for dropout the probabilities are from {0.5, 0.6, 0.7, 0.8}.
For the data augmentation setting the only difference was in the higher learning rates: no
[Figure 14: two panels showing Adversarial Resistance (Upper Bound) with respect to the L2-norm for ResNets]
Figure 14: Results on CIFAR10 for a residual network with different regularizers. As we only have lower
bounds for one hidden layer networks, we can only show upper bounds for adversarial resistance. Left: with data
augmentation similar to [25] Right: plain setting
regularization - {0.2, 0.5, 0.8, 1.0, 1.5, 2.0, 3.0, 4.0}, weight decay - {0.1, 0.4}, Cross-Lipschitz
- {0.2, 1.0}. The results are shown in Figure 14. Cross-Lipschitz regularization improves
the upper bounds on the robustness against adversarial manipulation compared to weight
decay and dropout by a factor of 2 to 3, both in the plain setting (right) and with data
augmentation (left). This comes at the price of a slightly worse test performance. However,
it shows that Cross-Lipschitz regularization is also effective for deep neural networks. It
remains interesting future work to also derive instance-specific lower
bounds (robustness guarantees) for deep neural networks.
Outlook Formal guarantees on machine learning systems are becoming increasingly more
important as they are used in safety-critical systems. We think that there should be more
research on robustness guarantees (lower bounds), whereas current research is focused on
new attacks (upper bounds). We have argued that our instance-specific guarantees using the
local Cross-Lipschitz constant are more effective than those using a global one and lead to lower
bounds which are up to 8 times better. A major open problem is to come up with tight
lower bounds for deep networks.
References
[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis,
J. Dean, M. Devin, S. Ghemawat, I. J. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia,
R. Józefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. G.
Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. A. Tucker,
V. Vanhoucke, V. Vasudevan, F. B. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke,
Y. Yu, and X. Zheng. Tensorflow: Large-scale machine learning on heterogeneous distributed
systems, 2016.
[2] O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. Nori, and A. Criminisi. Measuring
neural net robustness with constraints. In NIPS, 2016.
[3] N. Carlini and D. Wagner. Adversarial examples are not easily detected: Bypassing ten
detection methods. In ACM Workshop on Artificial Intelligence and Security, 2017.
[4] M. Cisse, P. Bojanowksi, E. Grave, Y. Dauphin, and N. Usunier. Parseval networks: Improving
robustness to adversarial examples. In ICML, 2017.
[5] N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma. Adversarial classification. In KDD,
2004.
[6] H. Drucker and Y. Le Cun. Double backpropagation increasing generalization performance. In
IJCNN, 1992.
[7] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples.
In ICLR, 2015.
[8] S. Gu and L. Rigazio. Towards deep neural network architectures robust to adversarial examples.
In ICLR Workshop, 2015.
[9] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR,
pages 770–778, 2016.
[10] D. P. Helmbold and P. Long. On the inductive bias of dropout. Journal of Machine Learning
Research, 16:3403–3454, 2015.
[11] S. Hochreiter and J. Schmidhuber. Simplifying neural nets by discovering flat minima. In NIPS,
1995.
[12] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvari. Learning with a strong adversary. In
ICLR, 2016.
[13] J. Kos, I. Fischer, and D. Song. Adversarial examples for generative models. In ICLR Workshop,
2017.
[14] A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial examples in the physical world. In
ICLR Workshop, 2017.
[15] Y. Liu, X. Chen, C. Liu, and D. Song. Delving into transferable adversarial examples and
black-box attacks. In ICLR, 2017.
[16] D. Lowd and C. Meek. Adversarial learning. In KDD, 2005.
[17] S.M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations.
In CVPR, 2017.
[18] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial
perturbations against deep networks. In IEEE Symposium on Security & Privacy, 2016.
[19] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: a simple and accurate method to fool
deep neural networks. In CVPR, pages 2574–2582, 2016.
[20] B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[21] U. Shaham, Y. Yamada, and S. Negahban. Understanding adversarial training: Increasing local
stability of neural nets through robust optimization. In NIPS, 2016.
[22] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A
simple way to prevent neural networks from overfitting. Journal of Machine Learning Research,
15:1929–1958, 2014.
[23] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. Man vs. computer: Benchmarking machine
learning algorithms for traffic sign recognition. Neural Networks, 32:323–332, 2012.
[24] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus.
Intriguing properties of neural networks. In ICLR, pages 2503–2511, 2014.
[25] S. Zagoruyko and N. Komodakis. Wide residual networks. In BMVC, pages 87.1–87.12, 2016.
[26] S. Zheng, Y. Song, T. Leung, and I. J. Goodfellow. Improving the robustness of deep neural
networks via stability training. In CVPR, 2016.