
Proximal Exe Solutions

The document provides examples and step-by-step solutions for deducing closed form expressions of proximal operators. It first shows the proximal operator of the L1 norm is the soft-thresholding operator. It then proves this by deriving the solutions for when the threshold is less than, between, or greater than the value. The proximal operator of the nuclear norm is also shown to be the singular value soft-thresholding operator.

Uploaded by Naseer Ahmed
Copyright © All Rights Reserved

Exercise List: Proximal Operator.

Robert M. Gower.

September 23, 2019

1 Introduction
This is an exercise in deducing closed form expressions for proximal operators. In the
first part we will show how to deduce that the proximal operator of the L1 norm is the
soft-thresholding operator. In the second part we will show the equivalence between the
proximal operator of the matrix nuclear norm and the singular value soft-thresholding
operator.
First some necessary notation.

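Notation: For every $x, y \in \mathbb{R}^n$ let $\langle x, y\rangle \overset{\text{def}}{=} x^\top y$ and let $\|x\|_2 \overset{\text{def}}{=} \sqrt{\langle x, x\rangle}$. Let $\sigma(A) = [\sigma_1(A), \dots, \sigma_n(A)]$ be the singular values of $A$.
Let $\|A\|_F^2 \overset{\text{def}}{=} \mathrm{Tr}(A^\top A) = \sum_{ij} A_{ij}^2$ denote the Frobenius norm of $A$ and let $\|A\|_* \overset{\text{def}}{=} \sum_i \sigma_i(A)$ be the nuclear norm.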
Let $f : x \in \mathbb{R}^d \mapsto f(x)$ be a convex function. Consider the proximal operator

$$\mathrm{prox}_f(v) \overset{\text{def}}{=} \arg\min_x \frac{1}{2}\|x - v\|_2^2 + f(x). \tag{1}$$

2 Properties
Ex. 1 — If $f(x) = \sum_{i=1}^d f_i(x_i)$ then

$$\mathrm{prox}_f(v) = \mathop{\times}_{i=1}^{d} \mathrm{prox}_{f_i}(v_i) = (\mathrm{prox}_{f_1}(v_1), \dots, \mathrm{prox}_{f_d}(v_d)), \quad \forall v \in \mathbb{R}^d.$$

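Answer (Ex. I) —

$$\begin{aligned}
\mathrm{prox}_f(v) &= \arg\min_w \frac{1}{2}\|w - v\|_2^2 + f(w) \\
&= \arg\min_w \sum_{i=1}^d \frac{1}{2}(w_i - v_i)^2 + f_i(w_i) \\
&= \mathop{\times}_{i=1}^{d} \arg\min_{w_i} \frac{1}{2}(w_i - v_i)^2 + f_i(w_i) \\
&= \mathop{\times}_{i=1}^{d} \mathrm{prox}_{f_i}(v_i) \\
&= (\mathrm{prox}_{f_1}(v_1), \dots, \mathrm{prox}_{f_d}(v_d)).
\end{aligned}$$

The third equality holds because the $i$-th term of the sum depends on $w_i$ alone, so the sum is minimized by minimizing each term separately.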
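As a minimal sketch of Ex. 1, assuming each $f_i$ comes with its own scalar prox, the separable prox can be evaluated one coordinate at a time. The names `prox_abs`, `prox_half_square` and `prox_separable`, and the two example choices $f_i(x) = \lambda|x|$ and $f_i(x) = \frac{\mu}{2}x^2$, are hypothetical illustrations, not part of the exercise.

```python
import numpy as np

def prox_abs(vi, lam=1.0):
    # Scalar prox of f_i(x) = lam * |x| (soft-thresholding)
    return np.sign(vi) * max(abs(vi) - lam, 0.0)

def prox_half_square(vi, mu=1.0):
    # Scalar prox of f_i(x) = (mu / 2) * x**2:
    # minimizing (x - vi)^2 / 2 + mu x^2 / 2 gives x = vi / (1 + mu)
    return vi / (1.0 + mu)

def prox_separable(v, scalar_proxes):
    # Ex. 1: the prox of f(x) = sum_i f_i(x_i) is the vector of coordinate-wise proxes
    return np.array([p(vi) for p, vi in zip(scalar_proxes, v)])

v = np.array([3.0, -0.5, 2.0])
print(prox_separable(v, [prox_abs, prox_abs, prox_half_square]))
```

Each coordinate may use a different $f_i$; only the additive structure of $f$ is needed.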

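Ex. 2 — Let $C$ be a nonempty closed convex set. If

$$f(x) = I_C(x) \overset{\text{def}}{=} \begin{cases} 0 & \text{if } x \in C \\ \infty & \text{if } x \notin C \end{cases}$$

then $\mathrm{prox}_f(v) = \mathrm{proj}_C(v) = \arg\min_{w \in C}\|w - v\|_2$, the orthogonal projection of $v$ onto $C$.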

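Answer (Ex. II) —

$$\begin{aligned}
\mathrm{prox}_f(v) &= \arg\min_w \frac{1}{2}\|w - v\|_2^2 + I_C(w) \\
&= \arg\min_{w \in C} \frac{1}{2}\|w - v\|_2^2 \\
&= \arg\min_{w \in C} \|w - v\|_2.
\end{aligned}$$

The second equality holds because $I_C(w) = \infty$ for every $w \notin C$, and the third because $t \mapsto t^2/2$ is increasing for $t \geq 0$.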
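A small sketch of Ex. 2, assuming for illustration that $C$ is the box $[-1, 1]^3$ (a choice not made in the text): the projection onto a box is coordinate-wise clipping, which is also an instance of Ex. 1. The helper `proj_box` is hypothetical, and the random-sampling comparison is only a sanity check, not a proof.

```python
import numpy as np

def proj_box(v, lo, hi):
    # Orthogonal projection onto C = {w : lo <= w_i <= hi}, i.e. prox of I_C
    return np.clip(v, lo, hi)

rng = np.random.default_rng(0)
v = np.array([2.5, -3.0, 0.4])
p = proj_box(v, -1.0, 1.0)

# No randomly sampled point of C should be closer to v than p
samples = rng.uniform(-1.0, 1.0, size=(10000, 3))
dists = np.linalg.norm(samples - v, axis=1)
print(p, np.linalg.norm(p - v) <= dists.min())
```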

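Ex. 3 — Let $A \in \mathbb{R}^{d \times d}$ be a symmetric positive semidefinite matrix, that is $A \succeq 0$ and $A = A^\top$. Let $b \in \mathbb{R}^d$ and $\lambda \geq 0$. If

$$f(x) = \frac{\lambda}{2} x^\top A x + \langle b, x\rangle$$

then $\mathrm{prox}_f(v) = (I + \lambda A)^{-1}(v - b)$.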

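Answer (Ex. III) — The objective below is differentiable and strongly convex, so its unique minimizer is the point where its gradient vanishes:

$$\begin{aligned}
w = \mathrm{prox}_f(v) = \arg\min_w \frac{1}{2}\|w - v\|_2^2 + \frac{\lambda}{2} w^\top A w + \langle b, w\rangle \;&\iff\; 0 = (w - v) + \lambda A w + b \\
&\iff\; (I + \lambda A) w = v - b \\
&\iff\; w = (I + \lambda A)^{-1}(v - b).
\end{aligned}$$

The matrix $I + \lambda A$ is invertible because all of its eigenvalues are at least $1$.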
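A minimal NumPy sketch of Ex. 3, assuming a randomly generated $A = M^\top M$ (symmetric positive semidefinite by construction). The helper name `prox_quadratic` is hypothetical. It solves the linear system $(I + \lambda A)w = v - b$ rather than forming the inverse, and then checks the first-order condition derived in the answer.

```python
import numpy as np

def prox_quadratic(v, A, b, lam):
    # prox of f(x) = (lam/2) x^T A x + <b, x>, with A symmetric PSD and lam >= 0.
    # Solve (I + lam A) w = v - b instead of inverting the matrix explicitly.
    d = v.shape[0]
    return np.linalg.solve(np.eye(d) + lam * A, v - b)

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M.T @ M                    # symmetric positive semidefinite
b = rng.standard_normal(4)
v = rng.standard_normal(4)
lam = 0.7

w = prox_quadratic(v, A, b, lam)
# Optimality condition from the answer: (w - v) + lam A w + b = 0
residual = (w - v) + lam * A @ w + b
print(np.linalg.norm(residual))
```

Solving the system is both cheaper and numerically more stable than computing $(I + \lambda A)^{-1}$ and multiplying.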

3 Soft Thresholding
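Ex. 4 — In this exercise we will show step by step that the proximal operator of the L1 norm is the soft-thresholding operator, that is

$$\mathrm{prox}_{\lambda\|w\|_1}(v) = (S_\lambda(v_1), \dots, S_\lambda(v_d)), \tag{2}$$

where, for $\lambda \geq 0$,

$$S_\lambda(v) = \begin{cases} v - \lambda & \text{if } \lambda < v \\ 0 & \text{if } -\lambda \leq v \leq \lambda \\ v + \lambda & \text{if } v < -\lambda. \end{cases} \tag{3}$$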

Part I

Show that

$$\mathrm{prox}_{\lambda\|w\|_1}(v) = (\mathrm{prox}_{\lambda|w_1|}(v_1), \dots, \mathrm{prox}_{\lambda|w_d|}(v_d)).$$

Part II
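Show that if

$$\alpha^* = \arg\min_\alpha \frac{1}{2}(\alpha - v)^2 + \lambda|\alpha| \tag{4}$$

then

$$\alpha^* \in v - \lambda\,\partial|\alpha^*|, \tag{5}$$

where $\partial|\alpha|$ is the subdifferential of the absolute value: $\partial|\alpha| = \{1\}$ if $\alpha > 0$, $\partial|\alpha| = [-1, 1]$ if $\alpha = 0$, and $\partial|\alpha| = \{-1\}$ if $\alpha < 0$.
Note that by definition $\alpha^* = \mathrm{prox}_{\lambda|\alpha|}(v)$.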

Part III
If $\lambda < v$, show that the solution to the inclusion (5) is given by

$$\alpha^* = v - \lambda.$$

Part IV
If $-\lambda \leq v \leq \lambda$, show that the solution to the inclusion (5) is given by

$$\alpha^* = 0.$$

Part V
Using the previous items, prove that

$$\mathrm{prox}_{\lambda|\alpha|}(v) = S_\lambda(v)$$

and that the equality (2) holds.

Part VI
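Show that the soft-thresholding operator can be written more compactly as

$$S_\lambda(v) = \mathrm{sign}(v)(|v| - \lambda)_+, \tag{6}$$

where $(\alpha)_+ \overset{\text{def}}{=} \max\{0, \alpha\}$ and $\mathrm{sign}(v)$ is the sign function given by $\mathrm{sign}(v) = 1_{v \geq 0} - 1_{v < 0}$.
This can be implemented efficiently in Python using NumPy:

import numpy as np

def prox_L1(v, lmbda):
    # Soft-thresholding (6). np.sign(0) = 0 differs from sign(0) = 1 above,
    # but the result is 0 either way since (|0| - lmbda)_+ = 0 for lmbda >= 0.
    return np.sign(v) * np.maximum(np.abs(v) - lmbda, 0.)

The above code also works when v is a vector, since the NumPy operations act elementwise; it then computes (2).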


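Answer (Ex. I) — The trick lies in noticing that

$$\min_x \frac{1}{2}\|x - v\|_2^2 + \sum_{i=1}^d f_i(x_i) = \min_x \frac{1}{2}\sum_{i=1}^d (x_i - v_i)^2 + \sum_{i=1}^d f_i(x_i) = \sum_{i=1}^d \min_{x_i} \frac{1}{2}(x_i - v_i)^2 + f_i(x_i).$$

Consequently

$$\begin{aligned}
\arg\min_x \frac{1}{2}\|x - v\|_2^2 + f(x) &= \left(\arg\min_{x_1}\left\{\frac{1}{2}(x_1 - v_1)^2 + f_1(x_1)\right\}, \dots, \arg\min_{x_d}\left\{\frac{1}{2}(x_d - v_d)^2 + f_d(x_d)\right\}\right) \\
&= \left(\mathrm{prox}_{f_1}(v_1), \dots, \mathrm{prox}_{f_d}(v_d)\right).
\end{aligned}$$

Taking $f(x) = \lambda\|x\|_1$, that is $f_i(x_i) = \lambda|x_i|$, gives Part I.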

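Answer (Ex. II) — The solution to (4) must satisfy the first-order optimality condition

$$0 \in \partial\left(\frac{1}{2}(\alpha^* - v)^2 + \lambda|\alpha^*|\right) = \alpha^* - v + \lambda\,\partial|\alpha^*|,$$

where the equality uses that the quadratic term is differentiable. Rearranging the above gives $\alpha^* \in v - \lambda\,\partial|\alpha^*|$.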
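To make the inclusion (5) concrete, here is a small sketch that represents $\partial|\alpha|$ as an interval and checks, for a few values of $v$, that $\alpha^* = S_\lambda(v)$ from (3) satisfies $\alpha^* \in v - \lambda\,\partial|\alpha^*|$. The helper names are hypothetical, and this is an illustration for sample inputs, not a proof.

```python
def subdiff_abs(alpha):
    # Subdifferential of |.| at alpha, returned as an interval (low, high)
    if alpha > 0:
        return (1.0, 1.0)
    if alpha < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)

def soft_threshold(v, lam):
    # Piecewise definition (3)
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def satisfies_inclusion(alpha, v, lam, tol=1e-12):
    # For lam >= 0: alpha in v - lam*[g_lo, g_hi]  <=>  v - lam*g_hi <= alpha <= v - lam*g_lo
    g_lo, g_hi = subdiff_abs(alpha)
    return v - lam * g_hi - tol <= alpha <= v - lam * g_lo + tol

lam = 1.0
for v in [-3.0, -1.0, -0.2, 0.0, 0.5, 1.0, 2.5]:
    alpha = soft_threshold(v, lam)
    print(v, alpha, satisfies_inclusion(alpha, v, lam))
```

A wrong candidate, such as $\alpha = 0.5$ for $v = 2.5$ and $\lambda = 1$, fails the test because $\partial|0.5| = \{1\}$ forces $\alpha = 1.5$.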

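Answer (Ex. III) — First note that, since $\partial|\alpha| \subset [-1, 1]$ for every $\alpha$,

$$v - \lambda\,\partial|\alpha^*| \subset [v - \lambda, v + \lambda]. \tag{7}$$

If $\lambda < v$, then $v - \lambda > 0$, so the above together with inclusion (5) shows that

$$\alpha^* \in \,]0, \infty[ \;\Rightarrow\; \partial|\alpha^*| = \{1\}.$$

Consequently (5) shows that

$$\alpha^* \in v - \lambda\,\partial|\alpha^*| = \{v - \lambda\}.$$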

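Answer (Ex. IV) — If $-\lambda \leq v \leq \lambda$, then due to (7) the solution to the inclusion (5) is bounded by

$$\alpha^* \in v - \lambda\,\partial|\alpha^*| \subset \begin{cases} \{v + \lambda\} & \text{if } \alpha^* < 0 \\ [v - \lambda, v + \lambda] & \text{if } \alpha^* = 0 \\ \{v - \lambda\} & \text{if } \alpha^* > 0. \end{cases} \tag{8}$$

Now suppose that $\alpha^* < 0$. The above shows that

$$\alpha^* = v + \lambda \in [0, 2\lambda],$$

a contradiction. If $\alpha^* > 0$, then (8) shows that

$$\alpha^* = v - \lambda \in [-2\lambda, 0],$$

another contradiction. Finally, if $\alpha^* = 0$ then (8) offers no contradiction, since it only requires $0 \in [v - \lambda, v + \lambda]$, which holds because $|v| \leq \lambda$.
Consequently $-\lambda \leq v \leq \lambda \Rightarrow \alpha^* = 0$.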

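Answer (Ex. V) — Using arguments analogous to those of Ex. III, we can show that $v < -\lambda \Rightarrow \alpha^* = v + \lambda$: in that case (7) forces $\alpha^* < 0$, so $\partial|\alpha^*| = \{-1\}$ and (5) gives $\alpha^* = v + \lambda$. By combining this observation with the solutions of Ex. III and IV, we have that

$$\alpha^* = \begin{cases} v - \lambda & \text{if } \lambda < v \\ 0 & \text{if } -\lambda \leq v \leq \lambda \\ v + \lambda & \text{if } v < -\lambda. \end{cases}$$

Thus

$$\mathrm{prox}_{\lambda|\alpha|}(v) = \alpha^* \overset{(3)}{=} S_\lambda(v).$$

Together with Part I, this proves (2).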
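As a numerical cross-check of Part V (not a proof), the sketch below evaluates the objective of (4) on a fine grid of candidate values of $\alpha$, keeps the best one, and compares it with the closed form (3). The grid range and spacing are arbitrary assumptions of this sketch, so agreement is only up to the grid resolution.

```python
import numpy as np

def soft_threshold(v, lam):
    # Piecewise definition (3), vectorized with np.where
    return np.where(v > lam, v - lam, np.where(v < -lam, v + lam, 0.0))

def prox_abs_bruteforce(v, lam, grid):
    # Evaluate the objective of (4) on every grid point and return the minimizer
    objective = 0.5 * (grid - v) ** 2 + lam * np.abs(grid)
    return grid[np.argmin(objective)]

lam = 0.8
grid = np.linspace(-5.0, 5.0, 200001)   # grid spacing 5e-5
vs = np.linspace(-3.0, 3.0, 61)
brute = np.array([prox_abs_bruteforce(v, lam, grid) for v in vs])
closed = soft_threshold(vs, lam)
print(np.max(np.abs(brute - closed)))
```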

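Answer (Ex. VI) — It suffices to do a case-by-case analysis, that is, for $0 \leq \lambda < v$ we have that

$$\mathrm{sign}(v)(|v| - \lambda)_+ = (v - \lambda)_+ = v - \lambda,$$

while for $-\lambda \leq v \leq \lambda$ we have that

$$\mathrm{sign}(v)(|v| - \lambda)_+ = \mathrm{sign}(v) \cdot 0 = 0.$$

Finally, for $v < -\lambda \leq 0$ we have that

$$\mathrm{sign}(v)(|v| - \lambda)_+ = -(-v - \lambda)_+ = v + \lambda.$$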
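The case analysis can also be checked numerically. This sketch compares the piecewise form (3) with the compact form (6) on random inputs and on the boundary points $v \in \{0, \pm\lambda\}$. The helper `sign_doc` is a hypothetical name implementing the document's convention $\mathrm{sign}(0) = 1$, which differs from `np.sign`.

```python
import numpy as np

def soft_threshold_piecewise(v, lam):
    # Definition (3)
    return np.where(v > lam, v - lam, np.where(v < -lam, v + lam, 0.0))

def sign_doc(v):
    # sign(v) = 1_{v >= 0} - 1_{v < 0}, so sign(0) = 1 (np.sign would return 0)
    return np.where(v >= 0, 1.0, -1.0)

def soft_threshold_compact(v, lam):
    # Compact form (6): sign(v) * (|v| - lam)_+
    return sign_doc(v) * np.maximum(np.abs(v) - lam, 0.0)

rng = np.random.default_rng(2)
v = rng.uniform(-4.0, 4.0, size=1000)
lam = 1.3
edge = np.array([0.0, lam, -lam])
print(np.allclose(soft_threshold_piecewise(v, lam), soft_threshold_compact(v, lam)),
      np.allclose(soft_threshold_piecewise(edge, lam), soft_threshold_compact(edge, lam)))
```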

4 Singular Value Soft Thresholding
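Consider the extension of proximal operators to matrices

$$\mathrm{prox}_F(A) \overset{\text{def}}{=} \arg\min_{X \in \mathbb{R}^{d \times d}} \frac{1}{2}\|X - A\|_F^2 + F(X). \tag{9}$$

We will now prove step by step that

$$\mathrm{prox}_{\lambda\|X\|_*}(A) = U S_\lambda(\mathrm{diag}(\sigma(A))) V^\top, \tag{10}$$

where $\|X\|_* = \sum_{i=1}^d \sigma_i(X)$, $A = U \mathrm{diag}(\sigma_i(A)) V^\top$ is the singular value decomposition of $A$, and $S_\lambda$ is applied to each diagonal entry.
This proximal operator forms the basis of the celebrated singular value thresholding (SVT) algorithm for solving the matrix completion problem [CaiCandes:2010].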
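A minimal NumPy sketch of (10), assuming a square real matrix as in (9). The function name `svt` is a hypothetical choice. `np.linalg.svd` returns $U$, the singular values in descending order, and $V^\top$; since singular values are nonnegative, $S_\lambda(\sigma_i) = (\sigma_i - \lambda)_+$.

```python
import numpy as np

def svt(A, lam):
    # Singular value soft-thresholding, formula (10): U S_lam(diag(sigma(A))) V^T
    U, s, Vt = np.linalg.svd(A)           # A = U @ np.diag(s) @ Vt
    s_thresh = np.maximum(s - lam, 0.0)   # S_lam on nonnegative singular values
    return U @ np.diag(s_thresh) @ Vt

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
lam = 1.0
X = svt(A, lam)
print(np.linalg.svd(A, compute_uv=False))
print(np.linalg.svd(X, compute_uv=False))
```

Singular values below $\lambda$ are set to zero, so the output typically has lower rank than $A$; this is what makes the operator useful for low-rank problems such as matrix completion.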

Ex. 5 — Part I
Show that the nuclear and the Frobenius norms are invariant under orthogonal transformations (rotations and reflections). That is, for any matrix $A$ and orthogonal matrices $O$ and $Q$ we have that

$$\|A\|_F^2 = \|OA\|_F^2 = \|AQ\|_F^2$$

and

$$\|A\|_* = \|OA\|_* = \|AQ\|_*.$$
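Part II

(Level HARD): Prove that (10) holds. You may use the following theorem by von Neumann.
Theorem 4.1 (Von Neumann 1937) For any matrices $X$ and $A$ of the same dimensions and orthogonal matrices $U$ and $V$, we have that

$$\langle U X V^\top, A\rangle \leq \langle \mathrm{diag}(\sigma_i(X)), \mathrm{diag}(\sigma_i(A))\rangle, \tag{11}$$

where $\langle X, Y\rangle \overset{\text{def}}{=} \mathrm{Tr}(X^\top Y)$ is the trace inner product and $\mathrm{diag}(\sigma_i(A))$ is a diagonal matrix with the singular values of $A$ on the diagonal, in decreasing order.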
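As a numerical illustration of Theorem 4.1 (not a proof), the sketch below draws random $X$, $A$ and random orthogonal $U$, $V$, and checks (11) with the trace inner product. Generating orthogonal matrices as the Q factor of a Gaussian matrix is an assumption of this sketch; any orthogonal matrices would do. Both singular value vectors come out of `np.linalg.svd` in descending order, as the theorem requires.

```python
import numpy as np

def random_orthogonal(d, rng):
    # The Q factor of a QR decomposition is orthogonal
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return Q

rng = np.random.default_rng(4)
d = 4
ok = True
for _ in range(1000):
    X = rng.standard_normal((d, d))
    A = rng.standard_normal((d, d))
    U = random_orthogonal(d, rng)
    V = random_orthogonal(d, rng)
    lhs = np.trace((U @ X @ V.T).T @ A)       # <U X V^T, A> = Tr((U X V^T)^T A)
    sx = np.linalg.svd(X, compute_uv=False)   # descending order
    sa = np.linalg.svd(A, compute_uv=False)   # descending order
    rhs = np.dot(sx, sa)                      # <diag(sigma(X)), diag(sigma(A))>
    ok = ok and lhs <= rhs + 1e-10
print(ok)
```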

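Answer (Ex. I) — By the definition of the Frobenius norm we have that

$$\|OA\|_F^2 = \mathrm{Tr}\left(A^\top (O^\top O) A\right) = \mathrm{Tr}\left(A^\top A\right) = \|A\|_F^2.$$

For the nuclear norm, note that for any orthogonal matrices $O$ and $U$ the product $OU$ is an orthogonal matrix, since

$$(OU)^\top OU = U^\top (O^\top O) U = U^\top U = I.$$

If the SVD of $A$ is given by $A = U \mathrm{diag}(\sigma_i(A)) V^\top$, then $OA = (OU) \mathrm{diag}(\sigma_i(A)) V^\top$ is the SVD of $OA$, that is, the matrix $OA$ has the same singular values as $A$. Consequently we have that

$$\|OA\|_* = \|(OU) \mathrm{diag}(\sigma_i(A)) V^\top\|_* = \sum_{i=1}^d \sigma_i(A) = \|A\|_*,$$

by definition of the nuclear norm. The identities for $AQ$ follow in the same way, since $\|AQ\|_F^2 = \mathrm{Tr}(Q^\top A^\top A Q) = \mathrm{Tr}(A^\top A Q Q^\top) = \|A\|_F^2$ and $AQ = U \mathrm{diag}(\sigma_i(A)) (Q^\top V)^\top$ is an SVD of $AQ$.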
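A quick numerical check of Part I, assuming random orthogonal $O$ and $Q$ obtained from QR decompositions of Gaussian matrices (an assumption of this sketch). The helper `nuclear_norm` is a hypothetical name that sums the singular values, matching the definition in the notation section.

```python
import numpy as np

def nuclear_norm(M):
    # Sum of the singular values of M
    return np.linalg.svd(M, compute_uv=False).sum()

rng = np.random.default_rng(5)
d = 5
A = rng.standard_normal((d, d))
O, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal matrices
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

fro = [np.linalg.norm(M, "fro") ** 2 for M in (A, O @ A, A @ Q)]
nuc = [nuclear_norm(M) for M in (A, O @ A, A @ Q)]
print(np.allclose(fro, fro[0]), np.allclose(nuc, nuc[0]))
```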

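Answer (Ex. II) — Substituting $A = U \mathrm{diag}(\sigma_i(A)) V^\top$ gives

$$\mathrm{prox}_{\lambda\|X\|_*}(A) = \arg\min_{X \in \mathbb{R}^{d \times d}} \frac{1}{2}\|U(U^\top X V - \mathrm{diag}(\sigma_i(A)))V^\top\|_F^2 + \lambda\|X\|_*.$$

Changing variables to

$$\bar{X} = U^\top X V, \tag{12}$$

so that $X = U\bar{X}V^\top$, and noting that the Frobenius norm and the nuclear norm are invariant under orthogonal transformations, we have that

$$\min_{X \in \mathbb{R}^{d \times d}} \frac{1}{2}\|U(\bar{X} - \mathrm{diag}(\sigma_i(A)))V^\top\|_F^2 + \lambda\|U\bar{X}V^\top\|_* = \min_{\bar{X} \in \mathbb{R}^{d \times d}} \frac{1}{2}\|\bar{X} - \mathrm{diag}(\sigma_i(A))\|_F^2 + \lambda\|\bar{X}\|_*. \tag{13}$$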
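I now claim that the solution $\bar{X}$ to the above must be a diagonal matrix. This is where von Neumann's theorem comes into play. To see this, let $U_x \mathrm{diag}(\sigma_i(\bar{X})) V_x^\top$ be the SVD of $\bar{X}$. Thus

$$\begin{aligned}
\|\bar{X} - \mathrm{diag}(\sigma_i(A))\|_F^2 &= \|U_x \mathrm{diag}(\sigma_i(\bar{X})) V_x^\top - \mathrm{diag}(\sigma_i(A))\|_F^2 \\
&= \|\mathrm{diag}(\sigma_i(\bar{X}))\|_F^2 + \|\mathrm{diag}(\sigma_i(A))\|_F^2 - 2\left\langle U_x \mathrm{diag}(\sigma_i(\bar{X})) V_x^\top, \mathrm{diag}(\sigma_i(A))\right\rangle \\
&\overset{\text{Theorem 4.1}}{\geq} \|\mathrm{diag}(\sigma_i(\bar{X}))\|_F^2 + \|\mathrm{diag}(\sigma_i(A))\|_F^2 - 2\left\langle \mathrm{diag}(\sigma_i(\bar{X})), \mathrm{diag}(\sigma_i(A))\right\rangle \\
&= \|\mathrm{diag}(\sigma_i(\bar{X})) - \mathrm{diag}(\sigma_i(A))\|_F^2,
\end{aligned}$$

where the second equality uses the invariance of the Frobenius norm under orthogonal transformations.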

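Consequently

$$\min_{\bar{X}} \frac{1}{2}\|\bar{X} - \mathrm{diag}(\sigma_i(A))\|_F^2 + \lambda\|\bar{X}\|_* \geq \min_{\bar{X}} \frac{1}{2}\|\mathrm{diag}(\sigma_i(\bar{X})) - \mathrm{diag}(\sigma_i(A))\|_F^2 + \lambda\|\mathrm{diag}(\sigma_i(\bar{X}))\|_*,$$

where we used the invariance of the nuclear norm under orthogonal transformations. In other words, for every $\bar{X}$ the objective at the diagonal matrix $\mathrm{diag}(\sigma_i(\bar{X}))$ is no larger than at $\bar{X}$. Since the objective is strongly convex, its minimizer is unique, so applying this to the minimizer proves that the solution $\bar{X} = \mathrm{diag}(\bar{X}_{11}, \dots, \bar{X}_{dd})$ is a diagonal matrix. From now on we assume that $\bar{X} = \mathrm{diag}(\bar{X}_{ii})$ is a diagonal matrix. Let $\bar{x} = (\bar{X}_{11}, \dots, \bar{X}_{dd})$ be the vector of diagonal entries of $\bar{X}$. Thus $\|\bar{X}\|_* = \|\bar{x}\|_1$ and $\|\bar{X}\|_F^2 = \|\bar{x}\|_2^2$. Let $\sigma(A) = [\sigma_1(A), \dots, \sigma_d(A)] \in \mathbb{R}^d$. Finally, (13) becomes

$$\min_{\bar{X} \text{ diagonal}} \frac{1}{2}\|\bar{X} - \mathrm{diag}(\sigma_i(A))\|_F^2 + \lambda\|\bar{X}\|_* = \min_{\bar{x} \in \mathbb{R}^d} \frac{1}{2}\|\bar{x} - \sigma(A)\|_2^2 + \lambda\|\bar{x}\|_1.$$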

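Consequently, taking the minimizing argument and using (2), we have that

$$S_\lambda(\mathrm{diag}(\sigma(A))) = \arg\min_{\bar{X} \text{ diagonal}} \frac{1}{2}\|\bar{X} - \mathrm{diag}(\sigma_i(A))\|_F^2 + \lambda\|\bar{X}\|_*,$$

where $S_\lambda(\mathrm{diag}(\sigma(A))) \overset{\text{def}}{=} \mathrm{diag}(S_\lambda(\sigma_i(A)))$. To conclude, note that our original variable satisfies $U^\top X V = \bar{X}$ due to (12), that is $X = U\bar{X}V^\top$. Thus finally

$$\mathrm{prox}_{\lambda\|X\|_*}(A) = \arg\min_X \frac{1}{2}\|X - A\|_F^2 + \lambda\|X\|_* = U S_\lambda(\mathrm{diag}(\sigma(A))) V^\top.$$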
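As a final sanity check (not a proof), the sketch below computes the closed form (10) and verifies that no random perturbation of it achieves a lower value of the objective $\frac{1}{2}\|X - A\|_F^2 + \lambda\|X\|_*$. The names `svt` and `objective`, the matrix size and the perturbation scale are assumptions of this sketch. Because the objective is strongly convex, every perturbation $\Delta$ should increase it by at least $\frac{1}{2}\|\Delta\|_F^2$.

```python
import numpy as np

def svt(A, lam):
    # Closed form (10): soft-threshold the singular values of A
    U, s, Vt = np.linalg.svd(A)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

def objective(X, A, lam):
    # 1/2 ||X - A||_F^2 + lam ||X||_*
    return 0.5 * np.linalg.norm(X - A, "fro") ** 2 + lam * np.linalg.norm(X, "nuc")

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
lam = 0.9
X_star = svt(A, lam)
f_star = objective(X_star, A, lam)

# Random perturbations of X_star should never give a lower objective value
worse = [objective(X_star + 1e-2 * rng.standard_normal((4, 4)), A, lam) >= f_star
         for _ in range(2000)]
print(all(worse))
```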
