Proximal Exe Solutions
Robert M. Gower.
1 Introduction
This is an exercise in deducing closed form expressions for proximal operators. In the
first part we will show how to deduce that the proximal operator of the L1 norm is the
soft-thresholding operator. In the second part we will show the equivalence between the
proximal operator of the matrix nuclear norm and the singular value soft-thresholding
operator.
First some necessary notation.
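Notation: For every $x, y \in \mathbb{R}^n$ let $\langle x, y\rangle \overset{\text{def}}{=} x^\top y$ and let $\|x\|_2 \overset{\text{def}}{=} \sqrt{\langle x, x\rangle}$. Let $\sigma(A) = [\sigma_1(A), \ldots, \sigma_n(A)]$ be the singular values of $A$. Let $\|A\|_F^2 \overset{\text{def}}{=} \mathrm{Tr}(A^\top A) = \sum_{ij} A_{ij}^2$ denote the Frobenius norm of $A$ and let $\|A\|_* = \sum_i \sigma_i(A)$ be the nuclear norm.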
Let $f : x \in \mathbb{R}^d \mapsto f(x)$ be a convex function. Consider the proximal operator
\[ \mathrm{prox}_f(v) \overset{\text{def}}{=} \arg\min_{x} \frac{1}{2}\|x - v\|_2^2 + f(x). \tag{1} \]
2 Properties
Ex. 1 — If $f(x) = \sum_{i=1}^{d} f_i(x_i)$ then
\[ \mathrm{prox}_f(v) = \bigtimes_{i=1}^{d} \mathrm{prox}_{f_i}(v_i) = (\mathrm{prox}_{f_1}(v_1), \ldots, \mathrm{prox}_{f_d}(v_d)), \quad \forall v \in \mathbb{R}^d. \]
Answer (Ex. I) —
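\begin{align*}
\mathrm{prox}_f(v) &= \arg\min_{w} \frac{1}{2}\|w - v\|_2^2 + f(w) \\
&= \arg\min_{w} \sum_{i=1}^{d} \frac{1}{2}(w_i - v_i)^2 + f_i(w_i) \\
&= \bigtimes_{i=1}^{d} \arg\min_{w_i} \sum_{j=1}^{d} \frac{1}{2}(w_j - v_j)^2 + f_j(w_j) \\
&= \bigtimes_{i=1}^{d} \mathrm{prox}_{f_i}(v_i) \\
&= (\mathrm{prox}_{f_1}(v_1), \ldots, \mathrm{prox}_{f_d}(v_d)).
\end{align*}
The third equality holds because the objective is a sum of terms that each depend on a single coordinate, so it can be minimized one coordinate at a time. The fourth holds because the terms with $j \neq i$ do not depend on $w_i$.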
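As a quick numerical sanity check of this separability property, the sketch below minimizes the joint objective by gradient descent and compares the result with the coordinate-wise proximal operators. The quadratic choice $f_i(x_i) = \tfrac{c_i}{2} x_i^2$, the weights `c`, and the function names are illustrative assumptions and not part of the exercise; for this choice $\mathrm{prox}_{f_i}(v_i) = v_i / (1 + c_i)$.

```python
import numpy as np

# Illustrative separable function f(x) = sum_i (c_i / 2) * x_i**2 (an assumption).
c = np.array([0.5, 2.0, 4.0])
v = np.array([3.0, -1.0, 2.5])


def prox_joint(v, c, num_iters=500):
    """Minimize 0.5 * ||x - v||^2 + f(x) over all coordinates by gradient descent."""
    x = np.zeros_like(v)
    step = 1.0 / (1.0 + c.max())  # 1 / Lipschitz constant of the gradient
    for _ in range(num_iters):
        grad = (x - v) + c * x  # gradient of the joint objective
        x = x - step * grad
    return x


def prox_coordinatewise(v, c):
    """Apply the scalar prox of each f_i separately: v_i / (1 + c_i)."""
    return np.array([v_i / (1.0 + c_i) for v_i, c_i in zip(v, c)])


x_joint = prox_joint(v, c)
x_coord = prox_coordinatewise(v, c)
```

Both vectors should agree up to the tolerance of the iterative solver, which is what Ex. 1 predicts.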
then $\mathrm{prox}_f(v) = \mathrm{proj}_C(v) = \arg\min_{w \in C} \|w - v\|_2$, the orthogonal projection of $v$ onto $C$.
3 Soft Thresholding
Ex. 4 — In this exercise we will show step-by-step that the proximal operator of the L1
norm is the soft thresholding operator, that is
\[ \mathrm{prox}_{\lambda\|w\|_1}(v) = (S_\lambda(v_1), \ldots, S_\lambda(v_d)), \tag{2} \]
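where
\[ S_\lambda(v) = \begin{cases} v - \lambda & \text{if } \lambda < v \\ 0 & \text{if } -\lambda \le v \le \lambda \\ v + \lambda & \text{if } v < -\lambda. \end{cases} \tag{3} \]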
Part I
Show that
\[ \mathrm{prox}_{\lambda\|w\|_1}(v) = (\mathrm{prox}_{\lambda|w_1|}(v_1), \ldots, \mathrm{prox}_{\lambda|w_d|}(v_d)). \]
Part II
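Show that if
\[ \alpha^* = \arg\min_{\alpha} \frac{1}{2}(\alpha - v)^2 + \lambda|\alpha| \tag{4} \]
then
\[ \alpha^* \in v - \lambda\,\partial|\alpha^*|. \tag{5} \]
Note that by definition $\alpha^* = \mathrm{prox}_{\lambda|\alpha|}(v)$.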
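As a reminder of the standard facts behind this part, the following sketch writes out the first-order optimality condition for the convex problem (4) together with the subdifferential of the absolute value. It uses only textbook convex analysis and is not taken from the exercise sheet itself.

\[ 0 \in \partial\Big( \tfrac{1}{2}(\alpha - v)^2 + \lambda|\alpha| \Big)\Big|_{\alpha = \alpha^*} = \alpha^* - v + \lambda\,\partial|\alpha^*| \quad\Longleftrightarrow\quad \alpha^* \in v - \lambda\,\partial|\alpha^*|, \]

\[ \partial|\alpha| = \begin{cases} \{-1\} & \text{if } \alpha < 0 \\ [-1, 1] & \text{if } \alpha = 0 \\ \{1\} & \text{if } \alpha > 0. \end{cases} \]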
Part III
If $\lambda < v$ show that the solution to the inclusion (5) is given by
\[ \alpha^* = v - \lambda. \]
Part IV
If $-\lambda < v < \lambda$ show that the solution to the inclusion (5) is given by
\[ \alpha^* = 0. \]
Part V
Using the previous items, prove that
\[ \mathrm{prox}_{\lambda|\alpha|}(v) = S_\lambda(v) \]
and that the equality (10) holds.
Part VI
Show that the soft-threshold operator can be written in a more compact way as
\[ S_\lambda(v) = \mathrm{sign}(v)\,(|v| - \lambda)_+, \tag{6} \]
where $(\alpha)_+ \overset{\text{def}}{=} \max\{0, \alpha\}$ and $\mathrm{sign}(v)$ is the sign function given by $\mathrm{sign}(v) = 1_{v \ge 0} - 1_{v < 0}$.
This can be implemented efficiently in Python using numpy.
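One possible numpy version of the compact formula (6) is sketched below. The function name `soft_threshold` and the argument name `lam` are illustrative choices, and the sketch assumes $\lambda \ge 0$.

```python
import numpy as np


def soft_threshold(v, lam):
    """Elementwise soft-thresholding S_lam(v) = sign(v) * max(|v| - lam, 0).

    The names `soft_threshold` and `lam` are illustrative; assumes lam >= 0.
    """
    v = np.asarray(v, dtype=float)
    # np.sign(0) is 0 whereas the text uses sign(0) = 1; the product is
    # unaffected because (|0| - lam)_+ = 0 whenever lam >= 0.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


v = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
shrunk = soft_threshold(v, 1.0)  # entries with |v_i| <= 1 are set to zero
```

Because `np.sign`, `np.abs` and `np.maximum` all act elementwise, the same function handles scalars, vectors and matrices without an explicit loop.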
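Consequently
\begin{align*}
\arg\min_{x} \frac{1}{2}\|x - v\|_2^2 + f(x) &= \left( \arg\min_{x_1} \left\{ \frac{1}{2}(x_1 - v_1)^2 + f_1(x_1) \right\}, \ldots, \arg\min_{x_d} \left\{ \frac{1}{2}(x_d - v_d)^2 + f_d(x_d) \right\} \right) \\
&= \left( \mathrm{prox}_{f_1}(v_1), \ldots, \mathrm{prox}_{f_d}(v_d) \right).
\end{align*}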
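Answer (Ex. IV) — If $-\lambda \le v \le \lambda$ then due to (7) the solution to the inclusion (5) is bounded by
\[ \alpha^* \in v - \lambda\,\partial|\alpha^*| \subset \begin{cases} \{v + \lambda\} & \text{if } \alpha^* < 0 \\ [v - \lambda, v + \lambda] & \text{if } \alpha^* = 0 \\ \{v - \lambda\} & \text{if } \alpha^* > 0. \end{cases} \tag{8} \]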
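Now suppose that $\alpha^* < 0$. The above shows that
\[ \alpha^* = v + \lambda \in [0, 2\lambda], \]
which contradicts $\alpha^* < 0$. Similarly, if $\alpha^* > 0$ then
\[ \alpha^* = v - \lambda \in [-2\lambda, 0], \]
which contradicts $\alpha^* > 0$. Hence $\alpha^* = 0$. Thus
\[ \mathrm{prox}_{\lambda|\alpha|}(v) = \alpha^* \overset{(3)}{=} S_\lambda(v). \]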
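The identity between the scalar proximal operator and the soft-thresholding operator can also be checked numerically. The sketch below minimizes the objective in (4) by brute force over a fine grid and compares the minimizer with the three cases in (3). The grid width, grid resolution and function names are arbitrary illustrative choices.

```python
import numpy as np


def soft_threshold_scalar(v, lam):
    """Piecewise definition (3) of the soft-thresholding operator."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def prox_abs_grid(v, lam, half_width=10.0, num=200001):
    """Approximate arg min of 0.5 * (a - v)^2 + lam * |a| on a uniform grid."""
    alphas = np.linspace(-half_width, half_width, num)
    objective = 0.5 * (alphas - v) ** 2 + lam * np.abs(alphas)
    return alphas[np.argmin(objective)]


lam = 1.0
test_points = [-2.5, -1.0, -0.3, 0.0, 0.7, 1.0, 3.2]
# Gap between the brute-force minimizer and the closed form at each point.
gaps = [abs(prox_abs_grid(v, lam) - soft_threshold_scalar(v, lam)) for v in test_points]
```

Each gap should be no larger than the grid spacing, here about $10^{-4}$.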
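Answer (Ex. VI) — It suffices to do a case by case analysis, that is, for $0 \le \lambda < v$ we have that
\[ \mathrm{sign}(v)\,(|v| - \lambda)_+ = (v - \lambda)_+ = v - \lambda, \]
while for $-\lambda \le v \le \lambda$ we have that $|v| \le \lambda$ and so
\[ \mathrm{sign}(v)\,(|v| - \lambda)_+ = 0. \]
Finally, for $v < -\lambda$ we have that
\[ \mathrm{sign}(v)\,(|v| - \lambda)_+ = -(-v - \lambda) = v + \lambda. \]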
4 Singular Value Soft Thresholding
Consider the extension of proximal operators to matrices
\[ \mathrm{prox}_F(A) \overset{\text{def}}{=} \arg\min_{X \in \mathbb{R}^{d \times d}} \frac{1}{2}\|X - A\|_F^2 + F(X). \tag{9} \]
We will now prove step by step that
\[ \mathrm{prox}_{\lambda\|X\|_*}(A) = U\,\mathrm{diag}(S_\lambda(\sigma_i(A)))\,V^\top, \tag{10} \]
where $\|X\|_* = \sum_{i=1}^{d} \sigma_i(X)$ and $A = U\,\mathrm{diag}(\sigma_i(A))\,V^\top$ is the singular value decomposition of $A$.
This proximal operator forms the basis of the celebrated algorithm for solving the
matrix completion problem [CaiCandes:2010].
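A minimal numpy sketch of singular value soft-thresholding is given below: compute the SVD of $A$, soft-threshold the singular values, and recombine with the same singular vectors. The function name `svt` and the argument name `lam` are illustrative, and the sketch assumes $\lambda \ge 0$. It is not the implementation from [CaiCandes:2010].

```python
import numpy as np


def svt(A, lam):
    """Singular value soft-thresholding of A with threshold lam >= 0.

    The names `svt` and `lam` are illustrative, not taken from the source.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Singular values are nonnegative, so S_lam(s_i) reduces to max(s_i - lam, 0).
    s_shrunk = np.maximum(s - lam, 0.0)
    # Scale the columns of U by the shrunk singular values, then recombine.
    return (U * s_shrunk) @ Vt


A = np.diag([3.0, 1.0, 0.5])
A_shrunk = svt(A, 1.0)  # singular values 3, 1, 0.5 become 2, 0, 0
```

Multiplying `U * s_shrunk` scales each column of `U` by the matching singular value, which avoids forming the diagonal matrix explicitly.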
Ex. 5 — Part I
Show that the nuclear and Frobenius norms are invariant under rotations. That is, for any matrix $A$ and orthogonal matrices $O$ and $Q$ we have that
\[ \|A\|_F = \|OA\|_F = \|AQ\|_F \]
and
\[ \|A\|_* = \|OA\|_* = \|AQ\|_*. \]
Part II
(Level HARD): Prove that (10) holds. You may use the following theorem by von Neumann.
Theorem 4.1 (Von Neumann 1937) For any matrices $X$ and $A$ of the same dimensions and orthogonal matrices $U$ and $V$, we have that
where $\mathrm{diag}(\sigma_i(A))$ is a diagonal matrix with the singular values of $A$ on the diagonal.
For the nuclear norm, note that for any orthogonal matrices $O$ and $U$ we have that $OU$ is an orthogonal matrix since
\[ (OU)^\top (OU) = U^\top O^\top O U = U^\top U = I. \]
If the SVD of $A$ is given by $A = U\,\mathrm{diag}(\sigma_i(A))\,V^\top$ then $OA = (OU)\,\mathrm{diag}(\sigma_i(A))\,V^\top$ is the SVD of $OA$, that is, the matrix $OA$ has the same singular values as $A$. Consequently we have that
\[ \|OA\|_* = \|(OU)\,\mathrm{diag}(\sigma_i(A))\,V^\top\|_* = \sum_{i=1}^{d} \sigma_i(A) = \|A\|_*. \]
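The invariance of both norms can be illustrated numerically. The sketch below draws a random matrix and two random orthogonal matrices (obtained from QR factorizations) and compares the norms before and after multiplication. The matrix size, the random seed and the helper names are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
# The Q factor of a QR factorization of a square matrix is orthogonal.
O, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))


def nuclear_norm(M):
    """Sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()


def frobenius_norm(M):
    """Square root of the sum of squared entries of M."""
    return np.sqrt(np.sum(M ** 2))


nuc = [nuclear_norm(A), nuclear_norm(O @ A), nuclear_norm(A @ Q)]
fro = [frobenius_norm(A), frobenius_norm(O @ A), frobenius_norm(A @ Q)]
```

Up to floating point error, the three entries of `nuc` coincide, and likewise the three entries of `fro`.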
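Consequently
\[ \min_{\bar{X}} \frac{1}{2}\|\bar{X} - \mathrm{diag}(\sigma_i(A))\|_F^2 + \lambda\|\bar{X}\|_* \ge \min_{\bar{X}} \frac{1}{2}\|\mathrm{diag}(\sigma_i(\bar{X})) - \mathrm{diag}(\sigma_i(A))\|_F^2 + \lambda\|\mathrm{diag}(\sigma_i(\bar{X}))\|_*, \]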
where we used the invariance of the nuclear norm under orthogonal transformations. This proves that the solution $\bar{X} = \mathrm{diag}(\bar{X}_{11}, \ldots, \bar{X}_{dd})$ will be a diagonal matrix. From now on we assume that $\bar{X} = \mathrm{diag}(\bar{X}_{ii})$ is a diagonal matrix. Let $\bar{x} = (\bar{X}_{11}, \ldots, \bar{X}_{dd})$ be the vector of diagonal entries of $\bar{X}$. Thus $\|\bar{X}\|_* = \|\bar{x}\|_1$ and $\|\bar{X}\|_F^2 = \|\bar{x}\|_2^2$. Let $\sigma(A) = [\sigma_1(A), \ldots, \sigma_d(A)] \in \mathbb{R}^d$. Finally we have that (13) becomes
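\[ \min_{\bar{X} \in \mathbb{R}^{d \times d}} \frac{1}{2}\|\bar{X} - \mathrm{diag}(\sigma_i(A))\|_F^2 + \lambda\|\bar{X}\|_* = \min_{\bar{x} \in \mathbb{R}^d} \frac{1}{2}\|\bar{x} - \sigma(A)\|_2^2 + \lambda\|\bar{x}\|_1. \]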
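Consequently, taking the arg min we have that
\[ S_\lambda(\mathrm{diag}(\sigma(A))) = \arg\min_{\bar{X} \text{ is diag}} \frac{1}{2}\|\bar{X} - \mathrm{diag}(\sigma_i(A))\|_F^2 + \lambda\|\bar{X}\|_*, \]
where $S_\lambda(\mathrm{diag}(\sigma(A))) := \mathrm{diag}(S_\lambda(\sigma_i(A)))$. To conclude, note that our original argument is $U X V^\top = \bar{X}$ due to (12). Thus finally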