
Lecture Notes on Convex Analysis and Iterative Algorithms
İlker Bayram
[email protected]
About These Notes
These are the lecture notes of a graduate course I offered in the Dept. of Electronics and Telecommunications Engineering at Istanbul Technical University. My goal was to get students acquainted with methods of convex analysis, to make them more comfortable in following arguments that appear in the recent signal processing literature, and to help them understand and analyze the proximal point algorithm, along with its many variants. In the first half of the course, convex analysis is introduced at a level suitable for graduate students in electrical engineering (i.e., assuming some familiarity with the notions of convex sets and convex functions from other courses). Several other algorithms are then derived from the proximal point algorithm, such as the Douglas-Rachford algorithm, ADMM, and some applications to saddle point problems. There are no references in this version. I hope to add some in the future.

İlker Bayram
December, 2018
Contents
1 Convex Sets
  1.1 Basic Definitions
  1.2 Operations Preserving Convexity of Sets
  1.3 Projections onto Closed Convex Sets
  1.4 Separation and Normal Cones
  1.5 Tangent and Normal Cones

2 Convex Functions
  2.1 Operations That Preserve the Convexity of Functions
  2.2 First Order Differentiation
  2.3 Second Order Differentiation

3 Conjugate Functions

4 Duality
  4.1 A General Discussion of Duality
  4.2 Lagrangian Duality
  4.3 Karush-Kuhn-Tucker (KKT) Conditions

5 Subdifferentials
  5.1 Motivation, Definition, Properties of Subdifferentials
  5.2 Connection with the KKT Conditions
  5.3 Monotonicity of the Subdifferential

6 Applications to Algorithms
  6.1 The Proximal Point Algorithm
  6.2 Firmly-Nonexpansive Operators
  6.3 The Dual PPA and the Augmented Lagrangian
  6.4 The Douglas-Rachford Algorithm
  6.5 Alternating Direction Method of Multipliers
  6.6 A Generalized Proximal Point Algorithm

1 Convex Sets
This first chapter introduces convex sets and discusses some of their properties.
Having a solid understanding of convex sets is very useful for convex analysis
of functions because we can and will regard a convex function as a special
representation of a convex set, namely its epigraph.

1.1 Basic Definitions


Definition 1. A set C ⊆ Rn is said to be convex if x ∈ C, x′ ∈ C implies that αx + (1 − α)x′ ∈ C for any α ∈ [0, 1]. 

Consider the sets below. Each pair (x, x′ ) we select in the set on the left defines a line segment which lies inside the set. However, this is not the case for the set on the right. Even though we are able to find line segments with endpoints inside the set (as in (x, x′ )), this is not true in general, as exemplified by (y, y′ ).

[Figure: a convex set (left) and a non-convex set (right), with pairs (x, x′ ) and (y, y′ ) marked.]

For the examples below, decide if the set is convex or not and prove whatever
you think is true.
Example 1 (Hyperplane). For given s ∈ Rn , r ∈ R, consider the set Hs,r = {x ∈ Rn : ⟨s, x⟩ = r}. Notice that this is a subspace for r = 0.
Example 2 (Affine Subspace). This is a set V ⊆ Rn such that if x ∈ V and x′ ∈ V , then αx + (1 − α)x′ ∈ V for all α ∈ R.

Example 3 (Half Space). For given s ∈ Rn , r ∈ R, consider the set H−s,r = {x ∈ Rn : ⟨s, x⟩ ≤ r}.
Example 4 (Cone). A cone K ⊆ Rn is a set such that if x ∈ K, then αx ∈ K for all α > 0. Note that a cone may be convex or non-convex. See below for an example of a convex (left) and a non-convex (right) cone.
Exercise 1. Let K be a cone. Show that K is convex if and only if x, y ∈ K
implies x + y ∈ K. 

1.2 Operations Preserving Convexity of Sets


Proposition 1 (Intersection of Convex Sets). Let C1 , C2 , . . . Ck be convex
sets. Show that C = ∩i Ci is convex.

Proof. Exercise!
Question 1. What happens if the intersection is empty? Is the empty set
convex? 

This simple result is useful for characterizing linear systems of equations or


inequalities.
Example 5. For a matrix A, the solution set of Ax = b is an intersection of
hyperplanes. Therefore it is convex.

For the example above, we can in fact say more, thanks to the following vari-
ation of Prop. 1.
Exercise 2. Show that the intersection of a finite collection of affine subspaces
is an affine subspace. 

Let us continue with systems of linear inequalities.


Example 6. For a matrix A, the solution set of Ax ≤ b is an intersection of
half spaces. Therefore it is convex.
Proposition 2 (Cartesian Products of Convex Sets). Suppose C1 ,. . . , Ck are
convex sets in Rn . Then the Cartesian product C1 × · · · × Ck is a convex set
in Rn×···×n .

Proof. Exercise!

Given an operator F and a set C, we can apply F to elements of C to obtain


the image of C under F . We will denote that set as F C. If F is linear then
it preserves convexity.
Proposition 3 (Linear Transformations of Convex Sets). Let M be a matrix.
If C is convex, then M C is also convex.

Consider now an operator that just adds a vector d to its operand. This is a translation operator. Geometrically, it should be obvious that translation preserves convexity. It is a good exercise to translate this mental picture into an algebraic expression and show the following.

Proposition 4 (Translation). If C is convex, then the set C + d = {x : x = v + d for some v ∈ C} is also convex.

Given C1 , C2 , consider the set of points of the form v = v1 + v2 , where vi ∈ Ci .


This set is denoted by C1 + C2 and is called the Minkowski sum of C1 and C2 .
We have the following result concerning Minkowski sums.

Proposition 5 (Minkowski Sums of Convex Sets). If C1 and C2 are convex


then C1 + C2 is also convex.

Proof. Observe that

C1 + C2 = [ I  I ] (C1 × C2 ).

Thus it follows by Prop. 2 and Prop. 3 that C1 + C2 is convex.

Example 7. The Minkowski sum of a rectangle and a disk in R2 is shown below.

[Figure: a rectangle plus a disk gives a rectangle with rounded corners.]

Definition 2 (Convex Combination). Consider a finite collection of points x1 , . . . , xk . x is said to be a convex combination of the xi ’s if x satisfies

x = α1 x1 + · · · + αk xk

for some αi such that

αi ≥ 0 for all i,  and  Σ_{i=1}^k αi = 1.

Definition 3 (Convex Hull). The set of all convex combinations of a set C is


called the convex hull of C and is denoted as Co(C). 

Below are two examples, showing the convex hulls of the sets C = {x1 , x2 } and C′ = {y1 , y2 , y3 }.

[Figure: the convex hull of two points is the segment between them; the convex hull of three points is a triangle.]

Notice that in the definition of the convex hull, the set C does not have to be convex (in fact, C is not convex in the examples above). Also, regardless of the dimension of C, when constructing Co(C), we can consider convex combinations of any finite number of elements chosen from C. In fact, if we denote the set of all convex combinations of k elements from C as Ck , then we can show that Ck ⊂ Ck+m for m ≥ 0. An interesting result, which we will not use in this course, is the following.

Exercise 3 (Caratheodory’s Theorem). Show that, if C ⊆ Rn , then Co(C) = Cn+1 . 

The following proposition justifies the term ‘convex’ in the definition of the
convex hull.

Proposition 6. The convex hull of a set C is convex.

Proof. Exercise!

The convex hull of C is the smallest convex set that contains C. More precisely,
we have the following.

Proposition 7. If D = Co(C), and if E is a convex set with C ⊂ E, then


D ⊂ E.

Proof. The idea is to show that for any integer k, E contains all convex combinations involving k elements from C. For this, we will present an argument based on induction.
We start with k = 2. Suppose x1 , x2 ∈ C. This implies x1 , x2 ∈ E. Since E is convex, we have αx1 + (1 − α)x2 ∈ E, for all α ∈ [0, 1]. Since x1 , x2 were arbitrary elements of C, it follows that E contains all convex combinations of any two elements from C.
Suppose now that E contains all convex combinations of any k − 1 elements from C. That is, if x1 , . . . , xk−1 are in C, then for Σ_{i=1}^{k−1} αi = 1 and αi ≥ 0, we have Σ_{i=1}^{k−1} αi xi ∈ E. Suppose we pick a k-th element, say xk , from C. Also, let α1 , . . . , αk be on the unit simplex, with αk ≠ 1 (if αk = 1, the combination reduces to xk ∈ E, and we have nothing to prove). Observe that

Σ_{i=1}^{k} αi xi = αk xk + Σ_{i=1}^{k−1} αi xi = αk xk + (1 − αk ) Σ_{i=1}^{k−1} [ αi / (1 − αk ) ] xi .

Notice that

Σ_{i=1}^{k−1} αi / (1 − αk ) = 1,  and  αi / (1 − αk ) ≥ 0 for all i.

Therefore,

y = Σ_{i=1}^{k−1} [ αi / (1 − αk ) ] xi

is an element of E, since it is a convex combination of k − 1 elements of C. But then,

Σ_{i=1}^{k} αi xi = αk xk + (1 − αk ) y

is an element of E (why?). We are done.

Another operation of interest is the affine hull. For that, let us introduce affine
combinations.

Definition 4 (Affine Combination). Consider a finite collection of points x1 ,


. . . , xk . x is said to be an affine combination of xi ’s if x satisfies

x = α 1 x1 + · · · + α k xk

for some αi such that


Σ_{i=1}^k αi = 1.

Definition 5 (Affine Hull). The set of all affine combinations of a set C is


called the affine hull of C. 

The affine hull of two points x1 , x2 is the line passing through the two points, whereas their convex hull is the segment between them.

[Figure: the line through x1 and x2 , extending beyond both points.]

Exercise 4. Consider a set C ⊂ R2 , composed of two points C = {x1 , x2 }.


What is the difference between the affine and convex hull of C? 

Let us end this discussion with some definitions.


Definition 6 (Interior). x is said to be in the interior of C if there exists an open set B such that x ∈ B and B ⊂ C. The interior of C is denoted as int(C).

Definition 7 (Boundary). The boundary of a set C is defined to be C ∩ (int(C))ᶜ , i.e., the points of C that are not interior points.


1.3 Projections onto Closed Convex Sets


Definition 8 (Projection). Let C be a set. For a given point y (inside or outside C), x ∈ C is said to be a projection of y onto C if it is a minimizer of the following problem:

min_{z∈C} ‖y − z‖2 .

In general, projections may not exist, or may not be uniquely defined.

Example 8. Suppose D denotes the open unit disk in R2 and let y be such that ‖y‖2 > 1. Then, the projection of y onto D does not exist. Notice that D is convex but not closed. 
Example 9. Let C be the unit circle. Can you find a point y ∉ C that has infinitely many projections? Can you find a point which does not have a projection onto C? Notice in this case that C is not convex, but closed. 

The two examples above imply that projections are not always guaranteed to
be well-defined. In fact, convexity alone is not sufficient. We will see later that
convexity and closedness together ensure the existence of a unique minimizer.
The following provides a useful characterization of the projection.
Proposition 8. Let C be a convex set. x ∈ C is a projection of y onto C if and only if

⟨z − x, y − x⟩ ≤ 0, for all z ∈ C.

It is useful to understand what this proposition means geometrically. Consider the figure below. If x is the projection of y onto the convex set, then Prop. 8 implies that the angle between the vectors z − x and y − x is not acute.

[Figure: a convex set containing x and z, with y outside; the angle at x between z − x and y − x is obtuse.]

Proof of Prop. 8. (⇒) Suppose x = PC (y) but there exists z ∈ C such that ⟨z − x, y − x⟩ > 0.
The idea is as follows. Consider the figure below.

[Figure: points x, z ∈ C and y outside; a point t on the segment from x to z makes an angle β with y larger than the angle α at x.]

Pick t on the segment between x and z such that β > α. But then

‖y − t‖2 < ‖y − x‖2 .

Thus, x cannot be the projection.
Let us now provide an algebraic proof. Consider

‖y − (αz + (1 − α) x)‖2² = ‖y − x + α(x − z)‖2² = ‖y − x‖2² + α² ‖x − z‖2² + 2α ⟨x − z, y − x⟩.

Write d = ‖x − z‖2² and c = ⟨z − x, y − x⟩, so that the right hand side equals ‖y − x‖2² + α² d − 2αc. Since c > 0 by assumption, the polynomial α² d − 2αc is negative in the interval (0, 2c/d). Thus we can find α ∈ (0, 1) such that ‖y − (αz + (1 − α) x)‖2² is strictly less than ‖y − x‖2² . But this contradicts the assumption that x = PC (y).
(⇐) Suppose x ∈ C satisfies ⟨z − x, y − x⟩ ≤ 0 for all z ∈ C, but x ≠ PC (y). Let z = PC (y). Consider

‖y − z‖2² = ‖y − x + x − z‖2² = ‖y − x‖2² + ‖x − z‖2² + 2 ⟨x − z, y − x⟩.

The last inner product is nonnegative by assumption, and ‖x − z‖2² > 0 since x ≠ z. Therefore ‖y − x‖2 < ‖y − z‖2 . But this is a contradiction, since z = PC (y).
Corollary 1. If C is closed, convex then PC (y) is a unique point.

Proof. Suppose x1 , x2 ∈ C and

kx1 − yk2 = kx2 − yk2 ≤ kz − yk2 for all z ∈ C.

Then, we have, by Prop. 8 that

hy − x1 , x2 − x1 i ≤ 0,
hy − x2 , x1 − x2 i ≤ 0.

Adding these inequalities, we obtain

hx1 − x2 , x1 − x2 i ≤ 0,

which implies x1 = x2 .

Projection operators enjoy useful properties. One of them is the following,


which we will refer to as ‘firm nonexpansivity’ (to be properly defined later).
Proposition 9. kPC (x1 ) − PC (x2 )k22 ≤ hPC (x1 ) − PC (x2 ), x1 − x2 i.

Proof. Let pi = PC (xi ). Then, by Prop. 8, we have

⟨x1 − p1 , p2 − p1 ⟩ ≤ 0,
⟨x2 − p2 , p1 − p2 ⟩ ≤ 0.

Summing these, we have

⟨(p2 − p1 ) + (x1 − x2 ), p2 − p1 ⟩ ≤ 0.

Rearranging, we obtain the desired inequality.

Consider the following figure. The proposition states that the inner product of the two vectors x1 − x2 and PC (x1 ) − PC (x2 ) is at least the squared length of the latter.

[Figure: points x1 , x2 outside a convex set C and their projections PC (x1 ), PC (x2 ).]

As a first corollary of this proposition, we have:
Corollary 2. For a closed, convex C, we have hPC (x1 ) − PC (x2 ), x1 − x2 i ≥ 0.

Applying the Cauchy-Schwarz inequality, we obtain from Prop. 9 that projec-


tion operators are ‘non-expansive’ (also, to be discussed later).

Corollary 3. For a closed, convex C, we have kPC (x1 )−PC (x2 )k2 ≤ kx1 −x2 k2 .
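As a quick numerical illustration of Prop. 9 and Corollaries 2-3 (a minimal sketch assuming Python with NumPy, not part of the original notes; the unit ℓ2 ball is chosen because its projection has a simple closed form):

    import numpy as np

    def project_unit_ball(x):
        # projection onto the closed unit l2 ball: scale x down if it lies outside
        n = np.linalg.norm(x)
        return x if n <= 1.0 else x / n

    rng = np.random.default_rng(0)
    for _ in range(5):
        x1, x2 = rng.normal(size=3), rng.normal(size=3)
        p1, p2 = project_unit_ball(x1), project_unit_ball(x2)
        gap = np.dot(p1 - p2, x1 - x2) - np.linalg.norm(p1 - p2) ** 2
        assert gap >= -1e-12                                               # Prop. 9 (firm nonexpansivity)
        assert np.linalg.norm(p1 - p2) <= np.linalg.norm(x1 - x2) + 1e-12  # Corollary 3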

1.4 Separation and Normal Cones


Proposition 10. Let C be a closed convex set and x ∉ C. Then, there exists s such that

⟨s, x⟩ > sup_{y∈C} ⟨s, y⟩.

Proof. To outline the idea of the proof, consider the left figure below.

[Figure: left, a closed convex set C, a point x ∉ C, its projection p, and the hyperplane H through p normal to x − p; right, a configuration where H fails to separate, so that some z ∈ C satisfies ‖x − z‖ < ‖x − p‖.]

The hyperplane H, normal to x − p and touching C at p, should separate x and C. If this were not the case, the situation would resemble the right figure above, and we would have ‖x − z‖ < ‖x − p‖ for some z ∈ C, contradicting p = PC (x).
Algebraically, ⟨x − PC (x), z − PC (x)⟩ ≤ 0, for all z ∈ C. Set s = x − PC (x). Then, we have

⟨s, z − x + s⟩ ≤ 0, ∀z ∈ C
⇐⇒ ⟨s, x⟩ ≥ ⟨s, s⟩ + ⟨s, z⟩, ∀z ∈ C.

Since s ≠ 0, we have ⟨s, s⟩ > 0, and the claim follows.

As a generalization, we have the following result.

Proposition 11. Let C, D be disjoint closed convex sets. Suppose also that
C − D is closed. Then, there exists s such that hs, xi > hs, yi, for all x ∈ C,
y ∈ D.

Proof. The idea is to consider the segment between the closest points of C and
D, and construct a separating hyperplane that is orthogonal to this segment,
as visualized below.

[Figure: disjoint convex sets C and D, separated by a hyperplane orthogonal to the segment joining their closest points.]
We want to find s such that

⟨s, y − x⟩ < 0, ∀x ∈ C, y ∈ D.

Note that y − x ∈ D − C. Also, since C ∩ D = ∅, we have 0 ∉ D − C. Since D − C is closed and convex, Prop. 10 gives an s such that ⟨s, 0⟩ > ⟨s, z⟩ for all z ∈ D − C. This s satisfies

⟨s, y − x⟩ < 0, ∀ y ∈ D, x ∈ C.

Definition 9. An affine hyperplane Hs,r is said to support the set C if ⟨s, x⟩ ≤ r for all x ∈ C. Notice that this is equivalent to C ⊂ H−s,r , where H−s,r = {x : ⟨s, x⟩ ≤ r} denotes the corresponding half space. 

For a given set C, let ΣC denote the set of pairs (s, r) such that C ⊂ H−s,r .
Proposition 12. If C is closed and convex, then

C = ∩(s,r)∈ΣC H−s,r .

Proof. The proof follows by showing that the sets on the two sides are subsets of each other.
Obviously,

C ⊂ ∩(s,r)∈ΣC H−s,r .

Let us now show the other inclusion. Take x ∉ C. By Prop. 10, there exists p such that

⟨p, x⟩ > ⟨p, z⟩, ∀ z ∈ C.

Set q = sup_{z∈C} ⟨p, z⟩. Then H−p,q ⊃ C, and x ∉ H−p,q . Since (p, q) ∈ ΣC , we find that x ∉ ∩(s,r)∈ΣC H−s,r . Thus,

C ⊃ ∩(s,r)∈ΣC H−s,r .

We also state, without proof, the following result, that will be useful in the
discussion of duality.
Proposition 13. Suppose C is a convex set. There exists a supporting hy-
perplane for any x ∈ bd(C).

1.5 Tangent and Normal Cones
Definition 10. Let C be a closed convex set. The direction s ∈ Rn is said to
be normal to C at x when

hs, y − xi ≤ 0, ∀ y ∈ C.

According to the definition, for any y ∈ C, the angle between the vectors s and y − x shown below is obtuse.

[Figure: a convex set C, a point x on its boundary, a normal direction s at x, and another point y ∈ C.]

Notice that if s is normal to C at x, then αs is also normal, for α ≥ 0. The


set of normal directions is therefore a cone.

Definition 11. The cone mentioned above is called the normal cone of C at
x, and is denoted as NC (x). 

Proposition 14. NC (x) is a convex cone.

Proof. If s1 , s2 are in NC (x), then

hαs1 + (1 − α)s2 , y − xi = αhs1 , y − xi + (1 − α)hs2 , y − xi ≤ 0,

implying that αs1 + (1 − α)s2 ∈ NC (x).

Below are two examples.

[Figure: two convex sets C and D with normal cones attached at boundary points: x1 + NC (x1 ), x2 + NC (x2 ), and z + ND (z).]

From the definition, we directly obtain the following results on normal cones.

Proposition 15. Let C be closed, convex. If s ∈ NC (x), then

PC (x + αs) = x, ∀ α ≥ 0.

Normal cones will be of interest when we discuss constrained minimization,
and subdifferentials of functions.
Let us now define a related cone through ‘polarity’.

Definition 12. Given a closed cone C, the polar cone of C is the set of p such
that

hp, si ≤ 0, ∀ s ∈ C.

In the figure below, the dot indicates the origin, and D is the polar cone of C.
Note also that C is the polar cone of D.

[Figure: a cone C and its polar cone D, both with vertex at the origin.]

Definition 13. The polar cone of NC (x) is called the tangent cone of C at x,
and is denoted by TC (x). 

The figures below show the tangent cones of the sets at the origin (the origin
is indicated by a dot).

[Figure: two sets with their tangent cones TC (0) and normal cones NC (0) at the origin.]

Proposition 16. For a convex C, we have x + TC (x) ⊃ C.

Proof. If p ∈ C, then for every s ∈ NC (x) we have ⟨p − x, s⟩ ≤ 0. Hence p − x belongs to the polar cone of NC (x), that is, p − x ∈ TC (x).
Proposition 17. Suppose C is closed and convex. Then

∩x∈C x + TC (x) = C.

Proof. Let us denote the set on the lhs as D. Note that, by the previous
proposition, D ⊃ C.
To see the converse inclusion, take z ∈
/ C. Let x = PC (z). Then z − x ∈ NC (x)
and z − x ∈/ TC (x). Thus z ∈/ x + TC (x), and thus z ∈
/ D.

Proposition 18. TC (x) is the closure of the cone generated by C − x.


Exercise 5. Show that a closed set C is convex if and only if ½ (x + y) ∈ C, for all x, y ∈ C.


Proposition 19. Let x ∈ bd(C), where C is convex. There exists s such that

hs, xi ≤ hs, yi, ∀y ∈ C.

Proof. Consider a sequence {xk }k with xk ∈ / C and limk xk = x. Also, let


y ∈ C. We can find a sequence {sk }k with ksk k2 = 1 such that hsk , xk i ≤ hsk , yi
for all k. Now, extract a convergent subsequence ski with limit s (such a
subsequence exists by the Bolzano-Weierstrass theorem, since sk are bounded,
and are in Rn .). Then, we must have

hski , xki i ≤ hski , yi, ∀i.

Taking limits, we obtain hs, xi ≤ hs, yi.

Alternative Proof (Sketch). Assume NC (x) ≠ ∅. Take z ∈ NC (x). Then ⟨z, y − x⟩ ≤ 0 for all y ∈ C. This is equivalent to ⟨−z, x⟩ ≤ ⟨−z, y⟩.
2 Convex Functions
The standard definition is as follows.
Definition 14. f : Rn → R is said to be convex if for all x, y ∈ Rn , and
α ∈ [0, 1], we have

f (αx + (1 − α)y) ≤ αf (x) + (1 − α) f (y).

If the inequality is strict whenever x ≠ y and α ∈ (0, 1), the function is said to be strictly convex.
The inequality is demonstrated in the figure below.

[Figure: the graph of f (·) lies below the chord joining (x, f (x)) and (y, f (y)).] 

In order to link convex functions and sets, let us also introduce the following.
Definition 15. Given f : Rn → R, the epigraph of f is the subset of Rn+1 defined as

epi(f ) = { (x, r) ∈ Rn × R : r ≥ f (x) }.

[Figure: the epigraph of a convex function, the region above its graph.]


Proposition 20. f is convex if and only if its epigraph is a convex set in Rn+1 .

Proof. (⇒) Suppose f is convex. Pick (x1 , r1 ), (x2 , r2 ) from epi(f ). Then,

r1 ≥ f (x1 )
r2 ≥ f (x2 ).

Using the convexity of f , we have

f ( ½ x1 + ½ x2 ) ≤ ½ f (x1 ) + ½ f (x2 ) ≤ ½ (r1 + r2 ).

Therefore

½ (x1 , r1 ) + ½ (x2 , r2 ) ∈ epi(f ),

and so epi(f ) is convex (the same computation with weights α, 1 − α handles general convex combinations).
(⇐) Suppose epi(f ) is convex. Notice that since (x1 , f (x1 )), (x2 , f (x2 )) ∈
epi(f ), we have

αx1 + (1 − α)x2 , αf (x1 ) + (1 − α)f (x2 ) ∈ epi(f ).

But this means that



f αx1 + (1 − α)x2 ≤ αf (x1 ) + (1 − α) f (x2 ).

Thus, f is convex.

Definition 16. The domain of f : Rn → R is the set

dom(f ) = {x ∈ Rn : f (x) < ∞}.

Example 10. Let C be a set in Rn . Consider the function

iC (x) = 0, if x ∈ C;  iC (x) = ∞, if x ∉ C.

iC (x) is called the indicator function of the set C. Its domain is the set C.

Exercise 6. Show that iC (x) is convex if and only if C is convex. 

Exercise 7. Consider the function

uC (x) = 0, if x ∈ C;  uC (x) = 1, if x ∉ C.

Determine if uC (x) is convex. If so, under what conditions? 

Proposition 21 (Jensen’s inequality). Let f : Rn → R be convex and x1 , . . . , xk be points in its domain. Also, let α1 , . . . , αk ∈ [0, 1] with Σi αi = 1. Then,

f ( Σi αi xi ) ≤ Σi αi f (xi ). (2.1)

Proof. Notice that (xi , f (xi )) ∈ epi(f ), for i = 1, . . . , k. Since epi(f ) is convex, it follows that

( Σi αi xi , Σi αi f (xi ) ) ∈ epi(f ).

But this is equivalent to (2.1).
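A one-line numerical check of Jensen's inequality (a sketch assuming Python with NumPy, not part of the original notes; f (x) = exp(x) is just an example of a convex function):

    import numpy as np

    f = np.exp                             # a convex function on R
    rng = np.random.default_rng(1)
    x = rng.normal(size=6)                 # points x_1, ..., x_k
    a = rng.random(size=6); a /= a.sum()   # weights on the unit simplex
    assert f(np.dot(a, x)) <= np.dot(a, f(x)) + 1e-12   # inequality (2.1)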

Definition 17. f is said to be concave if −f is convex. 

Below are some examples of convex functions.


Example 11. Affine functions : f (x) = hs, xi+b, for some s and b. In relation
with this, determine if f (x, y) = xy is convex.
Example 12 (Norms). Suppose k·k is a norm. Then by the triangle inequality,
and the homogeneity of the norm, we have

kαx + (1 − α)yk ≤ αkxk + (1 − α)kyk.

In particular, for 1 ≤ p ≤ ∞, the ℓp norm is defined as

‖x‖p = ( Σi |xi |^p )^{1/p} .

Show that ‖x‖p is actually a norm. What happens if p < 1?


Example 13 (Quadratic Forms). f (x) = xT Qx is convex if Q + QT is positive
semidefinite. Show this!
Exercise 8. Show that f (x) above is not convex if Q + QT is not positive
semi-definite. 
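The criterion of Example 13 and Exercise 8 is easy to test numerically: f (x) = xᵀQx is convex exactly when the symmetric part of Q has no negative eigenvalues. A minimal sketch (assuming Python with NumPy; the matrices below are arbitrary examples, not from the notes):

    import numpy as np

    def is_convex_quadratic(Q, tol=1e-10):
        # f(x) = x^T Q x is convex iff (Q + Q^T)/2 is positive semidefinite
        return np.linalg.eigvalsh((Q + Q.T) / 2.0).min() >= -tol

    print(is_convex_quadratic(np.array([[2.0, 1.0], [0.0, 1.0]])))  # True
    print(is_convex_quadratic(np.array([[1.0, 3.0], [0.0, 1.0]])))  # False (symmetric part indefinite)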

2.1 Operations That Preserve the Convexity of Functions
1. Translation by an arbitrary amount, multiplication with a non-negative
constant.
2. Dilation : If f (x) is convex, so is f (αx), for α ∈ R. (Follows by consid-
ering the epigraph.)
3. Pre-Composition with a matrix : If f (x) is convex, so is f (Ax). Show
this!
4. Post-Composition with an increasing convex function : Suppose g : R → R is an increasing convex function and f : Rn → R is convex. Then, g(f (·)) is convex.

Proof. Since g is increasing and convex, we have

g( f (αx1 + (1 − α)x2 ) ) ≤ g( αf (x1 ) + (1 − α)f (x2 ) ) ≤ α g( f (x1 ) ) + (1 − α) g( f (x2 ) ).
5. Pointwise supremum of convex functions : Suppose f1 (·), . . . fk (·) are
convex functions, and define
g(x) = max fi (x).
i

Then g is convex.

Proof. Notice that

epi(g) = ∩i epi(fi ).

Since intersections of convex sets are convex, epi(g) is convex, and so g


is convex.
Below is a visual demonstration of the proof.

[Figure: two convex functions f1 , f2 and the epigraph of their pointwise maximum, the intersection of their epigraphs.]

2.2 First Order Differentiation


Proposition 22. Suppose f : Rn → R is a differentiable function. Then, f is
convex if and only if
f (y) ≥ f (x) + h∇f (x), y − xi ∀x, y ∈ Rn .

Proof. (⇒) Suppose f is convex. Then,



f (x + α(y − x)) ≤ (1 − α)f (x) + αf (y), for 0 ≤ α ≤ 1.

Rearranging, we obtain

[ f (x + α(y − x)) − f (x) ] / α ≤ f (y) − f (x), for 0 < α ≤ 1.
Letting α → 0, the left hand side converges to h∇f (x), y − xi.
(⇐) Consider the function gy (x) = f (y) + ⟨∇f (y), x − y⟩. Then gy (x) ≤ f (x) and gy (y) = f (y) (see below). Also, gy (x) is convex (in fact, affine).

[Figure: the affine function gy (·) supporting the graph of f (·) from below, touching at y.]
Now set

h(x) = sup_y gy (x).

But, by the two properties of gy (·) above, it follows that h(x) = f (x). But
since h(x) is the supremum of a collection of convex functions, it is convex.
Thus, it follows that f (x) is convex.

We remark that a similar construction as in the second part of the proof will
lead to the conjugate of f (x), which will be discussed later.
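The construction in the second part of the proof can also be checked numerically: every tangent line minorizes f , and their pointwise supremum recovers f . A small sketch (assuming Python with NumPy, not part of the original notes; f (x) = x² is our example):

    import numpy as np

    f, df = lambda x: x ** 2, lambda x: 2.0 * x     # a smooth convex function and its derivative
    xs = np.linspace(-2.0, 2.0, 401)

    for x0 in np.linspace(-2.0, 2.0, 21):
        tangent = f(x0) + df(x0) * (xs - x0)        # g_{x0}(x) = f(x0) + <f'(x0), x - x0>
        assert np.all(tangent <= f(xs) + 1e-12)     # tangents minorize f (Prop. 22)

    # h(x) = sup_y g_y(x) recovers f on the grid
    h = np.max(f(xs)[None, :] + df(xs)[None, :] * (xs[:, None] - xs[None, :]), axis=1)
    print(np.max(np.abs(h - f(xs))))                # essentially zero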
Note that if f : R → R is differentiable and convex, then f ′ (·) is a monotonically increasing function. To see this, suppose that f ′ (x0 ) = 0. Then, for y > x0 , we can show that f ′ (y) ≥ f ′ (x0 ) = 0. Indeed, f (y) ≥ f (x0 ) because of the proposition. If f ′ (y) < 0, then

f (x0 ) ≥ f (y) + f ′ (y)(x0 − y) > f (x0 ),

since f (y) ≥ f (x0 ) and f ′ (y)(x0 − y) > 0 (both factors are negative). This is a contradiction. Therefore, we must have f ′ (y) ≥ 0.

To generalize this argument to an arbitrary point x0 , apply it to

h_{x0} (x) = f (x) − f ′ (x0 ) x.

Observe that h′_{x0} (x0 ) = 0, and h′_{x0} (x) = f ′ (x) − f ′ (x0 ).


Question 2. How does the foregoing discussion generalize to convex functions
defined on Rn ? 
Definition 18. An operator M : Rn → Rn is said to be monotone if

hM (x) − M (y), x − yi ≥ 0, for all x, y ∈ Rn .

Below is an instance of this relation. The two vectors x − y and M (x) − M (y)
approximately point in the same direction.
[Figure: points x, y and their images M (x), M (y); the vectors x − y and M (x) − M (y) point roughly in the same direction.]

Proposition 23. Suppose f : Rn → R is differentiable. Then, f is convex if


and only if ∇f is monotone.

Proof. (⇒) Suppose f is convex. This implies the following two inequalities.

f (y) ≥ f (x) + h∇f (x), y − xi,


f (x) ≥ f (y) + h∇f (y), x − yi,

Rearranging these inequalities, we obtain

0 ≥ h∇f (x) − ∇f (y), y − xi.

(⇐) Suppose ∇f is monotone. For y, x ∈ Rn , let z = y − x. Then

f (y) − f (x) = ∫₀¹ ⟨∇f (x + αz), z⟩ dα.

Rearranging,

f (y) − f (x) − ∫₀¹ ⟨∇f (x), z⟩ dα = ∫₀¹ ⟨∇f (x + αz) − ∇f (x), z⟩ dα.

But the right hand side is non-negative by the monotonicity of ∇f . It follows that

f (y) ≥ f (x) + ⟨∇f (x), y − x⟩.

It then follows by Prop. 22 that f is convex.

2.3 Second Order Differentiation


We have seen that f : R → R is convex if and only if its first derivative is monotonically increasing. But the latter property is equivalent to the second derivative being non-negative. Therefore, we also have an additional equivalent condition that involves the second derivative. This generalizes as follows.

Proposition 24. Let f : Rn → R be a twice-differentiable function. Then, f


is convex if and only if ∇2 f is positive semi-definite.

Proof. (⇒) If f is convex, then F = ∇f : Rn → Rn is a monotone mapping,


by Prop. 23. Let d ∈ Rn . Then,

hF (x + αd) − F (x), αdi ≥ 0 for all α > 0.

This implies

⟨ [ F (x + αd) − F (x) ] / α , d ⟩ ≥ 0 for all α > 0.

Taking limits (which exist because f is twice differentiable), we obtain ⟨G(x) d, d⟩ ≥ 0, where G(x) = ∇²f is the Hessian matrix with entries [G(x)]ij = ∂i ∂j f (x).

(⇐) Conversely, assume that G(x) = ∇F is positive semi-definite. We will


show that F = ∇f is monotone. Notice that

F (x + d) − F (x) = ∫₀¹ G(x + αd) d dα.

This implies

⟨F (x + d) − F (x), d⟩ = ∫₀¹ ⟨G(x + αd) d, d⟩ dα ≥ 0,

since each integrand is non-negative.

Therefore F is monotone.

3 Conjugate Functions
This chapter introduces the notion of a conjugate function, along with some
basic properties.
Suppose epi(f ) is a closed set. We will now consider a dual representation of
this set.
Consider a point in epi(f ): (x0 , f (x0 )) ∈ Rn+1 , where f : Rn → R.

[Figure: the point (x0 , f (x0 )) on the boundary of epi(f ).]

Notice that this point is on the boundary of epi(f ). Therefore we can find a
point (z, c) such that

hz, x0 i + cf (x0 ) ≥ hz, yi + cr for all (y, r) ∈ epi(f ). (3.2)

Notice that here c ≤ 0 since if y ∈ dom(f ), r can be arbitrarily large.


Now, if c ≠ 0, we can find s and r such that fs,r (x) = ⟨s, x⟩ + r minorizes f (x) at x0 . That is,

fs,r (x0 ) = f (x0 ),
fs,r (x) ≤ f (x) for all x.

To be concrete, dividing (3.2) by c < 0, we obtain s and r as follows:

⟨z/c, x0 ⟩ + f (x0 ) ≤ ⟨z/c, x⟩ + f (x)
⇐⇒ ⟨−z/c, x⟩ + [ ⟨z/c, x0 ⟩ + f (x0 ) ] ≤ f (x),

so that s = −z/c and r = ⟨z/c, x0 ⟩ + f (x0 ).

The remaining question is : can we always find a (z, c) pair with c 6= 0 such
that (3.2) holds?
Fortunately, the answer is yes, and it is easier to see if we assume dom(f ) = Rn .
Note that, in this case, if (z, c) 6= (0, 0) and (3.2) holds, then c = 0 implies
that

hz, x0 i ≥ hz, yi for all y ∈ Rn .

But this inequality is not valid for y = 2zkx0 k2 /kzk2 . Thus, we must have
c 6= 0. The general case is considered below.
Lemma 1. Suppose f : Rn → R is closed, convex. Then, there exist (z, c) with z ≠ 0, c ≠ 0 such that (3.2) holds.
Proof. To see that we can find a pair (z, c) with c 6= 0, let xk ∈ relint(dom(f ))
with xk → x. Then, we can find (sk , ck ) such that
hsk , xk i + ck f (xk ) ≤ hsk , yi + ck f (y).
Here, if ck = 0, then
hsk , xk − yi ≤ 0, ∀y ∈ dom(f ).
But since xk ∈ relint(dom(f )), xk + α(xk − y) ∈ dom(f ) for sufficiently small
α > 0. This implies hsk , y − xk i ≤ 0, which is a contradiction (here, we
assume that sk is included in the subspace parallel to aff(dom(f ))). Therefore,
ck 6= 0.

The foregoing discussion implies that, for a closed convex f : Rn → R, given


any x0 ∈ Rn , we can find z ∈ Rn and r ∈ R such that the following two
conditions hold:
hz, x0 i + r = f (x0 )
hz, xi + r ≤ f (x).

This is depicted below.


hz, ·i + r f (·)

x0

Notice that there is a maximum value of r, associated with a z, so that the


above two conditions are valid. How can we find this value?
Consider the following figure.

[Figure: the graph of f (·) and the linear function ⟨z, ·⟩; the maximum r is determined by the smallest vertical gap between them, attained near x0 .]

In order to find the maximum r, we can look for the minimum vertical distance
between f (x) and hz, xi. That is, we set
r = inf_x [ f (x) − ⟨z, x⟩ ].
Observe that this definition implies the two conditions above.
In order to work with sup rather than inf, and to emphasize the dependence on z, we define

f ∗ (z) = −r = sup_x [ ⟨z, x⟩ − f (x) ].

The function f ∗ (z) is called the Fenchel conjugate of f , and thanks to the supremum operation, it is convex with respect to z – note that in the definition of f ∗ , x acts like an index. In addition to convexity, f ∗ is also closed, because its epigraph is the intersection of closed sets (half spaces). The following inequality follows immediately from the definition.
Proposition 25 (Fenchel’s inequality). For a closed convex f : Rn → R, we have

f (x) + f ∗ (z) ≥ ⟨z, x⟩, for all x, z.

Consider now the conjugate of the conjugate:

f ∗∗ (x) = sup_z [ ⟨z, x⟩ − f ∗ (z) ].

Since, for each z, the affine function x ↦ ⟨z, x⟩ − f ∗ (z) minorizes f , we have f ∗∗ (x) ≤ f (x). However, by the previous discussion, we also know that for any x∗ , there is a pair (z ∗ , r∗ ) with r∗ = −f ∗ (z ∗ ) such that ⟨z ∗ , x∗ ⟩ + r∗ = f (x∗ ). Therefore, we deduce that f ∗∗ (x) = f (x). Thus, we have shown the following.

Proposition 26. If f : Rn → R is a closed convex function, then

f (x) = sup_z [ ⟨z, x⟩ − f ∗ (z) ].

Example 14. Let f (x) = ½ xᵀQ x, where Q is positive definite. Then,

f ∗ (z) = sup_x [ ⟨z, x⟩ − ½ xᵀQ x ].

The maximum is achieved at x∗ such that z = Q x∗ . Plugging in x∗ = Q⁻¹ z, we obtain

f ∗ (z) = zᵀQ⁻¹z − ½ zᵀQ⁻ᵀQ Q⁻¹z = ½ zᵀQ⁻¹z.
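A quick numerical check of Example 14 (a sketch assuming Python with NumPy, not part of the original notes; Q and z are arbitrary test values): the closed form ½ zᵀQ⁻¹z should match a brute-force maximization of ⟨z, x⟩ − ½ xᵀQx over a grid.

    import numpy as np

    Q = np.array([[3.0, 1.0], [1.0, 2.0]])          # positive definite
    z = np.array([0.5, -1.0])

    closed_form = 0.5 * z @ np.linalg.solve(Q, z)   # (1/2) z^T Q^{-1} z

    g = np.linspace(-3.0, 3.0, 301)
    X = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
    values = X @ z - 0.5 * np.einsum('ij,jk,ik->i', X, Q, X)
    print(closed_form, values.max())                # the two numbers agree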

Example 15. Let C be a convex set. Recall that the indicator function is defined as iC (x) = 0 if x ∈ C, and ∞ if x ∉ C. Its conjugate is

σC (z) = sup_x [ ⟨z, x⟩ − iC (x) ] = sup_{x∈C} ⟨z, x⟩.

The function σC is called the support function of C.
Some Properties of the Conjugate Function

(i) If g(x) = f (x − s), then

g ∗ (z) = sup_x [ ⟨z, x⟩ − f (x − s) ] = sup_y [ ⟨z, y + s⟩ − f (y) ] = f ∗ (z) + ⟨z, s⟩.

(ii) If g(x) = t f (x), with t > 0, then

g ∗ (z) = sup_x [ ⟨z, x⟩ − tf (x) ] = t sup_x [ ⟨z/t, x⟩ − f (x) ] = t f ∗ (z/t).
(iii) If g(x) = f (tx), with t ≠ 0, then

g ∗ (z) = sup_y [ ⟨z, y/t⟩ − f (y) ] = f ∗ (z/t).
(iv) If A is invertible, and g(x) = f (Ax), then

g ∗ (z) = sup_x [ ⟨z, x⟩ − f (Ax) ] = sup_y [ ⟨A⁻ᵀz, y⟩ − f (y) ] = f ∗ (A⁻ᵀz).

(v) If g(x) = f (x) + ⟨s0 , x⟩, then

g ∗ (z) = sup_x [ ⟨z − s0 , x⟩ − f (x) ] = f ∗ (z − s0 ).

(vi) If g(x1 , x2 ) = f1 (x1 ) + f2 (x2 ), then

g ∗ (z1 , z2 ) = sup_{x1 ,x2} [ ⟨z1 , x1 ⟩ + ⟨z2 , x2 ⟩ − f1 (x1 ) − f2 (x2 ) ] = f1∗ (z1 ) + f2∗ (z2 ).

(vii) If g(x) = f1 (x) + f2 (x), then

g ∗ (z) = sup_x [ ⟨z, x⟩ − f1 (x) − sup_y ( ⟨y, x⟩ − f2∗ (y) ) ]
       = sup_x inf_y [ ⟨z, x⟩ − f1 (x) − ⟨y, x⟩ + f2∗ (y) ].

We will later discuss that whenever there is a ‘saddle point’, we can exchange the order of inf and sup. Doing so, we obtain

g ∗ (z) = inf_y sup_x [ ⟨z − y, x⟩ − f1 (x) + f2∗ (y) ]
       = inf_y [ f1∗ (z − y) + f2∗ (y) ].

The operation that appears in the last line is called infimal convolution.

(viii) More generally, if g(x) = f1 (x) + f2 (x) + . . . + fn (x), then

g ∗ (z) = inf_{z1 ,...,zn : z = z1 +...+zn} [ f1∗ (z1 ) + . . . + fn∗ (zn ) ].

Example 16. The last property will be useful when we consider multiple constraints. In particular, let C = A ∩ B, where A, B are convex sets. Then we have that

σC (x) = sup_{z∈C} ⟨z, x⟩ = inf_y [ σA (x − y) + σB (y) ].

To see this, note that σC∗ = iC . But i_{A∩B} (z) = iA (z) + iB (z). Computing the conjugate using property (viii), we obtain

i∗_C (x) = σC (x) = inf_y [ σA (x − y) + σB (y) ].
4 Duality
This chapter introduces the notion of duality. We start with a general discus-
sion of duality. We then pass to Lagrangian duality, and finally consider the
Karush-Kuhn-Tucker conditions of optimality.

4.1 A General Discussion of Duality


Consider a minimization problem like

min_{x∈C} f (x), where C is a closed convex set.

Suppose there exists a function K(x, λ), which is

(i) convex for fixed λ, as a function of x,
(ii) concave for fixed x, as a function of λ,
(iii) such that, for some closed convex set D,

max_{λ∈D} K(x, λ) = f (x), if x ∈ C, and ∞, if x ∉ C.
Example 17. Consider the problem

min_x [ ½ ‖y − x‖2² + ‖x‖2 ], (4.4)

for x, y ∈ Rn (so C = Rn ). Here, by the Cauchy-Schwarz inequality, we can write

‖x‖2 = max_{λ∈B2} ⟨x, λ⟩,

where B2 is the unit ℓ2 ball of Rn . In this case, the problem (4.4) is equivalent to

min_x max_{λ∈B2} [ ½ ‖y − x‖2² + ⟨x, λ⟩ ],

and the expression in brackets is K(x, λ).

In short, “min_{x∈C} f (x)” is equivalent to

min_{x∈C} max_{λ∈D} K(x, λ).

We call (x∗ , λ∗ ) a solution of this problem if

K(x∗ , λ) ≤ K(x∗ , λ∗ ) ≤ K(x, λ∗ ) for all x ∈ C, λ ∈ D.

If such an (x∗ , λ∗ ) exists, it is called a saddle point of K(x, λ). See below for a depiction.

[Figure: a saddle-shaped surface K(x, λ); away from the saddle point, K increases in the x direction and decreases in the λ direction.]

This approach is useful especially if, for fixed λ, K(x, λ) is easy to minimize with respect to x. In that case, if λ = λ∗ , then minimizing K(x, λ∗ ) over x ∈ C is equivalent to solving the problem.
Example 18. Suppose λ is fixed. Then, to minimize

½ ‖y − x‖2² + ⟨λ, x⟩

over x, set the gradient to zero. This gives

x − y + λ = 0 ⇔ x = y − λ.

The question now is, how do we obtain λ∗ ? For that, define the function

g(λ) = min_{x∈C} K(x, λ).

Notice that

g(λ) ≤ K(x, λ) for all x ∈ C.

Consider now the problem

max_{λ∈D} g(λ). (4.5)

Exercise 9. Show that g(λ) is concave for λ ∈ D. 

Now let λ̄ ∈ arg max_{λ∈D} g(λ). Also, let x̄ ∈ arg min_{x∈C} K(x, λ̄). Then,

K(x̄, λ̄) ≤ K(x, λ̄) for x ∈ C,

and

g(λ̄) = K(x̄, λ̄) ≥ g(λ) ≥ K(x̄, λ) for λ ∈ D.

Combining these, we obtain

K(x̄, λ) ≤ K(x̄, λ̄) ≤ K(x, λ̄) for x ∈ C, λ ∈ D.

Therefore, (x̄, λ̄) is a saddle point, and x̄ solves the problem

min f (x).
x∈C

We have shown that if a saddle point exists, we can obtain it either by solving
a min − max or a max − min problem.
Here, (4.4) is called the primal problem and (4.5) is called the dual problem.
Note that there might be different dual problems depending on how we choose
K(x, λ). Notice also that if a saddle point exists, we have

d∗ = max_{λ∈D} g(λ) = min_{x∈C} f (x) = p∗ .

Example 19. Consider the problem

min_x [ ½ ‖y − x‖2² + ‖x‖2 ].

An equivalent problem is

min_x max_{λ∈B2} [ ½ ‖y − x‖2² + ⟨x, λ⟩ ],

where B2 is the unit ball of the ℓ2 norm. Let us define

g(λ) = min_x [ ½ ‖y − x‖2² + ⟨x, λ⟩ ].

Carrying out the minimization (the minimizer is x = y − λ), we find

g(λ) = ½ ‖λ‖2² + ⟨y − λ, λ⟩ = − ½ ‖y − λ‖2² + c,

for some constant c that does not depend on λ. Therefore, the dual problem is

max_{λ∈B2} [ −‖y − λ‖2² ],

or, equivalently,

min_{λ∈B2} ‖y − λ‖2².

The minimizer is the projection of y onto B2 , given by λ∗ = y if ‖y‖2 ≤ 1, and λ∗ = y/‖y‖2 if ‖y‖2 > 1. The solution of the primal problem is

x∗ = y − λ∗ = y − PB2 (y),

which equals 0 if ‖y‖2 ≤ 1, and ( (‖y‖2 − 1)/‖y‖2 ) y if ‖y‖2 > 1.
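The primal and dual solutions of Example 19 are simple enough to code directly. A minimal sketch (assuming Python with NumPy, not part of the original notes; the test vector y is arbitrary):

    import numpy as np

    def prox_l2_norm(y):
        # solution of min_x (1/2)||y - x||^2 + ||x||_2 : shrink y toward 0
        r = np.linalg.norm(y)
        return np.zeros_like(y) if r <= 1.0 else (1.0 - 1.0 / r) * y

    def project_unit_ball(y):
        r = np.linalg.norm(y)
        return y if r <= 1.0 else y / r

    y = np.array([3.0, -4.0])                   # ||y||_2 = 5
    x_star = prox_l2_norm(y)                    # primal solution
    lam_star = project_unit_ball(y)             # dual solution
    print(x_star, lam_star, x_star + lam_star)  # x* + lambda* = y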


Notice that this discussion depends heavily on the existence of a saddle point.
However, even if such a point does not exist, we can define a dual problem,
but in this case, the maximum of the dual d∗ and the minimum of the primal
problem p∗ are not necessarily equal. Instead, we will have d∗ ≤ p∗ . To see
this, note that

g(λ) ≤ K(x, λ) ≤ f (x) for all x, λ.

Therefore,

d∗ = g(λ∗ ) ≤ K(x, λ∗ ) ≤ f (x∗ ) = p∗ .

Therefore, p∗ − d∗ is always nonnegative. This difference is called the duality


gap.
The following proposition is a summary of the foregoing discussion. We provide
a proof for the sake of completeness.

Proposition 27. Let K(x, λ) be convex-concave, and define

f (x) = sup_λ K(x, λ),    g(λ) = inf_x K(x, λ).

Then,

K(x∗ , λ) ≤ K(x∗ , λ∗ ) ≤ K(x, λ∗ ) (4.6)

if and only if

x∗ ∈ arg min_x f (x),  λ∗ ∈ arg max_λ g(λ),  and  inf_x sup_λ K(x, λ) = sup_λ inf_x K(x, λ). (4.7)
Proof. (⇒) Suppose (4.6) holds. Then, since K(x∗ , λ∗ ) = f (x∗ ) = g(λ∗ ), and
since f (x) ≥ g(λ) for arbitrary x, λ, it follows that (4.7) holds too.
(⇐) Suppose (4.7) holds. Then, by the last equality in (4.7), we have f (x∗ ) =
g(λ∗ ). Now by the definition of f (·), we have

f (x∗ ) ≥ K(x∗ , λ), for any λ.

Similarly, by the definition of g(·), we obtain

g(λ∗ ) ≤ K(x, λ∗ ) for any x.

Using the definition of f , and g once again, we can write,

f (x∗ ) ≥ K(x∗ , λ∗ ) ≥ g(λ∗ ).

Since f (x∗ ) = g(λ∗ ), we therefore obtain the desired inequality (4.6) by com-
bining these inequalities.

4.2 Lagrangian Duality


We now discuss a specific dual, associated with a constrained minimization problem:

min_x f (x) subject to gi (x) ≤ 0, i = 1, . . . , m, (4.8)

where all of the functions are closed, convex, and defined on Rn .


In this setting, we define the Lagrangian function as

L(x, λ) = f (x) + λ1 g1 (x) + λ2 g2 (x) + · · · + λm gm (x), if λi ≥ 0 for all i, and L(x, λ) = −∞ if λi < 0 for at least one i.

Notice that

max_{λ≥0} L(x, λ) = f (x), if gi (x) ≤ 0 for all i, and ∞ otherwise.

Therefore, (4.8) is equivalent to

min_x max_{λ≥0} L(x, λ).
Also, notice that for fixed x, L(x, λ) is concave (in fact affine) with respect to λ. It follows from the previous discussion that if (x∗ , λ∗ ) is a saddle point of L(x, λ), then x∗ solves (4.8), and

x∗ ∈ arg min_x L(x, λ∗ ) = arg min_x [ f (x) + λ∗1 g1 (x) + λ∗2 g2 (x) + . . . + λ∗m gm (x) ].

Notice that the problem is transformed into an unconstrained problem, with the help of λ∗ . To obtain λ∗ , we consider the Lagrangian dual, defined as

g(λ) = min_x L(x, λ).

The dual problem is

max_{λ≥0} g(λ).

If a saddle point exists, λ∗ is the solution of this dual problem. In that case, we obtain a minimizer (which need not be unique) as

x∗ ∈ arg min_x L(x, λ∗ ).

Example 20. Consider the problem

min x s.t. x² + 2x ≤ 0.

The dual function is

g(λ) = min_x [ x + λ(x² + 2x) ],

where the expression in brackets is L(x, λ) for λ ≥ 0. At the minimum we must have

1 + λ (2x + 2) = 0 ⇐⇒ x = −1/(2λ) − 1.

Plugging in, we obtain

g(λ) = −1/(2λ) − 1 + λ [ 1/(4λ²) + 1/λ + 1 − 1/λ − 2 ] = −λ − 1/(4λ) − 1.

λ∗ satisfies

−1 + 1/(2λ∗)² = 0.

Solving for λ∗ , and taking the positive root (since λ∗ ≥ 0), we find λ∗ = 1/2. Therefore,

x∗ = arg min_x [ x + ½ (x² + 2x) ],

which can be easily solved to give x∗ = −2. Note that we can see this easily if we sketch the constraint function. 
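A small numerical check of Example 20 (a sketch assuming Python with NumPy, not part of the original notes; the grids are arbitrary): maximizing the dual on a grid recovers λ∗ ≈ 1/2, and minimizing the Lagrangian at λ∗ recovers x∗ ≈ −2 with the constraint active.

    import numpy as np

    g = lambda x: x ** 2 + 2.0 * x                       # constraint g(x) <= 0
    dual = lambda lam: -lam - 1.0 / (4.0 * lam) - 1.0    # dual function from the example

    lams = np.linspace(1e-3, 3.0, 100001)
    lam_star = lams[np.argmax(dual(lams))]               # approx. 0.5

    xs = np.linspace(-4.0, 2.0, 100001)
    x_star = xs[np.argmin(xs + lam_star * g(xs))]        # approx. -2
    print(lam_star, x_star, g(x_star))                   # g(x*) approx. 0 (constraint active)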

So far, the discussion relied on the assumption that the Lagrangian has a saddle
point, so that the duality gap is zero. A natural question is to ask when this
can be guaranteed. The conditions that ensure the existence of a saddle point
are called constraint qualification. A simple one to state is Slater’s condition.
Proposition 28 (Slater’s condition). Suppose f (·) and gi (·) for i = 1, 2, . . . , n are convex functions. Consider the problem

min_x f (x), s.t. gi (x) ≤ 0, ∀i. (4.9)

Suppose there exists x̄ such that gi (x̄) ≤ 0 for all i, and gj (x̄) < 0 for some j. Then, x∗ solves (4.9) if and only if there exists λ∗ ≥ 0 such that (x∗ , λ∗ ) is a saddle point of the function

L(x, λ) = f (x) + Σ_{i=1}^n λi gi (x), if λi ≥ 0 for all i, and L(x, λ) = −∞ otherwise.

Proof. For simplicity, we assume that n = 1, so there is only one constraint


function g(·).
Assume that x∗ solves (4.9). Consider the sets

A = { (z1 , z2 ) : z1 ≥ f (x) and z2 ≥ g(x) for some x },
B = { (z1 , z2 ) : z1 < f (x∗ ) and z2 < 0 }.

[Figure: the sets A and B in the (g, f ) plane, touching at the point (0, f (x∗ )).]
It can be shown that both A and B are convex (show this!). Further, we have
A ∩ B = ∅ (show this!). Therefore, there exists µ1 , µ2 such that
µ1 z1 + µ2 z2 ≤ µ1 t1 + µ2 t2 for all z ∈ B, t ∈ A.
Note here that µ2 ≥ 0 since z2 can be taken as small as desired. Similarly,
µ1 ≥ 0, since z1 can be taken as small as desired. Also notice that µ1 6= 0,
since otherwise we would have 0 ≤ g(x) for all x, but we already know that
g(x̄) < 0. Therefore, for λ∗ = µ2 /µ1 , we can write
z1 + λ∗ z2 ≤ t1 + λ∗ t2 , for all z ∈ B, t ∈ A.
Consequently,

f (x∗ ) ≤ f (x) + λ∗ g(x) for all x and λ∗ ≥ 0.

In particular, we have that f (x∗ ) ≤ f (x∗ ) + λ∗ g(x∗ ). Since g(x∗ ) ≤ 0, and


λ∗ ≥ 0, it follows that λ∗ g(x∗ ) = 0. So, we have

f (x) + λ∗ g(x) ≥ f (x∗ ) = f (x∗ ) + λ∗ g(x∗ ) ≥ f (x∗ ) + λ g(x∗ ) for all λ ≥ 0.

Thus, (x∗ , λ∗ ) is a saddle point.

4.3 Karush-Kuhn-Tucker (KKT) Conditions


Suppose now that Slater’s conditions are satisfied and x∗ solves the problem, so that (x∗ , λ∗ ) is a saddle point of the Lagrangian L(x, λ). Notice that in this case, x∗ is a minimizer for the problem

min f (x) + λ∗1 g1 (x) + λ∗2 g2 (x) + . . . + λ∗m gm (x)


x

If all of the functions above are differentiable, we can write

∇f (x∗ ) + λ∗1 ∇g1 (x∗ ) + λ∗2 ∇g2 (x∗ ) + . . . + λ∗m ∇gm (x∗ ) = 0.

Moreover, since (x∗ , λ∗ ) is a saddle point and λ∗i ≥ 0, if gi (x∗ ) < 0 then we must have λ∗i = 0. Therefore, λ∗i gi (x∗ ) = 0 for all i. Collected together, these conditions are known as the KKT conditions.

λ∗i ≥ 0,
gi (x∗ ) ≤ 0,
λ∗i gi (x∗ ) = 0, (‘complementary slackness’)
∇f (x∗ ) + Σi λ∗i ∇gi (x∗ ) = 0. (4.10a)

By the above discussion, these conditions are necessary for optimality. It turns out that they are also sufficient. To see this, first observe that, since L(·, λ∗ ) is convex, (4.10a) implies that x∗ minimizes L(·, λ∗ ), so that

g(λ∗ ) = L(x∗ , λ∗ ).

But since λ∗i gi (x∗ ) = 0, we have

L(x∗ , λ∗ ) = f (x∗ ) + Σi λ∗i gi (x∗ ) = f (x∗ ).

Therefore, g(λ∗ ) = L(x∗ , λ∗ ) = f (x∗ ), i.e., (x∗ , λ∗ ) is a saddle point of L(x, λ).
Thus x∗ solves the primal problem.

5 Subdifferentials
A convex function does not have to be differentiable. However, even when it
is not differentiable, there is considerable structure in how it varies locally.
Subdifferentials generalize derivatives (or gradients) and capture this structure
for convex functions. In addition, they enjoy a certain calculus, which proves
very useful in deriving minimization algorithms. We introduce, and discuss
some basic properties of subdifferentials in this chapter. We start with some
motivations underlying definitions and basic properties. We then explore con-
nections with KKT conditions. Finally, we discuss the notion of monotonicity,
a fundamental property of subdifferentials.

5.1 Motivation, Definition, Properties of Subdifferen-


tials
Recall that if f : Rn → R is differentiable and convex, then

f (x) ≥ f (y) + hx − y, ∇f (y)i

for all x, y. In fact, s = ∇f (y) is the unique vector s that satisfies the inequality below:

f (x) ≥ f (y) + ⟨x − y, s⟩ for all x. (5.11)

[Figure: the tangent line f (y) + ⟨x − y, ∇f (y)⟩ supporting the graph of f (·) at y.]

This useful observation has the shortcoming that it requires f to be differ-


entiable. In general, a convex function may not be differentiable – consider
for instance f (x) = |x|. We define the subdifferential by making use of the
inequality (5.11).

Definition 19. Let f : Rn → R be convex. The subdifferential of f at y is


the set of s that satisfy

f (x) ≥ f (y) + hx − y, si for all x.

This set is denoted by ∂f (y). 

Since for every y ∈ dom(f ), we can find r, s such that

(i) r + ⟨s, y⟩ = f (y),
(ii) r + ⟨s, x⟩ ≤ f (x) for all x,

it follows that ∂f (y) is non-empty. ∂f (·) has other interesting properties as


well.
Proposition 29. For every y ∈ dom(f ), ∂f (y) is a convex set.

Proof. Suppose s1 , s2 ∈ ∂f (y). Then, we have

f (y) + ⟨x − y, s1 ⟩ ≤ f (x),
f (y) + ⟨x − y, s2 ⟩ ≤ f (x).

Taking a convex combination of these inequalities, we find, for α ∈ [0, 1],

f (y) + ⟨x − y, αs1 + (1 − α)s2 ⟩ ≤ f (x),

so that αs1 + (1 − α)s2 ∈ ∂f (y).
Example 21. Let f (x) = |x|. Then

∂f (x) = {−1} if x < 0,  [−1, 1] if x = 0,  {1} if x > 0.
Notice that in the example above, wherever the function is differentiable, the subdifferential coincides with the gradient. This holds in general.
Proposition 30. If f : Rn → R is differentiable at x, then ∂f (x) = {∇f (x)}.

Proof. Suppose s ∈ ∂f (x). In this case, for any d ∈ Rn and t > 0, we have

f (x + t d) ≥ f (x) + ⟨s, t d⟩ ⇐⇒ ⟨s, d⟩ ≤ [ f (x + td) − f (x) ] / t.
Now let t → 0. The inequality above implies

hs, di ≤ h∇f (x), di, for all d. (5.12)

Now notice that


f (x − td) − f (x)
hs, −di ≤ for all t, d.
t
Therefore,

hs, −di ≤ h∇f (x), −di, for all d. (5.13)

Taken together, (5.12), (5.13) imply hs, di = h∇f, di for all d. Thus, s =
∇f (x).

The subdifferential can be used to characterize the minimizers of convex func-
tions.

Proposition 31. Let f : Rn → R be convex. Then, x∗ ∈ arg min_x f (x) if and only if 0 ∈ ∂f (x∗ ).

Proof. This follows from the equivalences

x∗ ∈ arg min_x f (x)
⇐⇒ f (x∗ ) ≤ f (x) for all x,
⇐⇒ f (x) ≥ f (x∗ ) + ⟨0, x − x∗ ⟩ for all x,
⇐⇒ 0 ∈ ∂f (x∗ ).

Proposition 32. Let f , g : Rn → R be convex functions. Also let h = f + g.


If x ∈ dom(f ) ∩ dom(g), then

∂h(x) = ∂f (x) + ∂g(x).

Proof. Write A = ∂f (x), B = ∂g(x), C = ∂h(x). Let s1 ∈ A, s2 ∈ B. Then,

[ f (x) + ⟨y − x, s1 ⟩ ] + [ g(x) + ⟨y − x, s2 ⟩ ] ≤ f (y) + g(y).

Therefore s1 + s2 ∈ C. Since s1 and s2 are arbitrary members of A, B, it follows that A + B ⊂ C.
For the converse (i.e., C ⊂ A + B), we need to show that for any z ∈ ∂h(x) we can find u ∈ ∂f (x) such that z − u ∈ ∂g(x). This is equivalent to saying that we can find u ∈ ∂f (x), v ∈ ∂g(x) such that z = u + v. We take a detour to show this result.

Let us now study the link between conjugate functions and subdifferentials. This will complete the remaining part of the proof of Prop. 32.

Proposition 33. Let f : Rn → R be a closed, convex function and f ∗ denote its conjugate. Then s ∈ ∂f (x) if and only if

f (x) + f ∗ (s) = ⟨s, x⟩. (5.14)
Proof. Since f (x) = sup_z [ ⟨z, x⟩ − f ∗ (z) ], we have

f (x) + f ∗ (z) ≥ ⟨z, x⟩, for all z, x. (5.15)

Consider now the following chain of equivalences:

s ∈ ∂f (x)
⇐⇒ f (x) − ⟨s, x⟩ ≤ f (y) − ⟨s, y⟩ for all y
⇐⇒ f (x) − ⟨s, x⟩ ≤ inf_y [ f (y) − ⟨s, y⟩ ] = − sup_y [ ⟨s, y⟩ − f (y) ] = −f ∗ (s). (5.16)

Combining the two inequalities (5.15), (5.16), we obtain (5.14).

Corollary 4. x ∈ ∂f ∗ (s) if and only if f (x) + f ∗ (s) = hx, si.

Corollary 5. 0 ∈ ∂f (x) if and only if x ∈ ∂f ∗ (0).

Proof. Consider the following chain of equivalences:

0 ∈ ∂f (x) ⇐⇒ f (x) + f ∗ (0) = ⟨0, x⟩ ⇐⇒ x ∈ ∂f ∗ (0).

Example 22. Recall the definition of a support function : for a closed convex
set C, let σC (x) = supz∈C hz, xi. Recall also that σC∗ (s) = iC (s). Therefore,

s ∈ ∂σC (x) ⇐⇒ σC (x) + iC (s) = ⟨s, x⟩
⇐⇒ σC (x) = ⟨s, x⟩ and s ∈ C.

In words, ∂σC (x) is the set of s ∈ C for which σC (x) = hs, xi.
[Insert Fig. on p37] 

Example 23. Consider now a closed convex set C, and let us obtain a de-
scription of the subdifferential of the characteristic function of C at x, i.e.,
∂iC (x).
Observe that

s ∈ ∂iC (x) ⇐⇒ iC (y) ≥ iC (x) + hs, y − xi, for all y.

If x ∉ C, the inequality cannot be satisfied for any s (take y ∈ C). In this case, ∂iC (x) = ∅.
If x ∈ C, then there are two cases to consider

(i) If y ∈
/ C, then iC (y) = ∞, and the inequality is always satisfied.

(ii) If y ∈ C, then

0 ≥ hs, y − xi, ∀y ∈ C ⇐⇒ s ∈ NC (x).

In summary, ∂iC (x) = NC (x) if x ∈ C, and ∂iC (x) = ∅ if x ∉ C.

See below for when C is a rectangular set.

[Figure: a rectangle C with the sets x + ∂iC (x) = x + NC (x) drawn at several points x (interior, edge, corner).]

We now go back to Prop. 32 and complete the proof. Recall that what remains is to show that ∂h(x) ⊂ ∂f (x) + ∂g(x), where h(x) = f (x) + g(x) for convex functions f (x), g(x).

Proof of Prop. 32, cont’d: Let u ∈ ∂h(x). This implies

h(x) + h∗ (u) = ⟨x, u⟩.

Plugging in the definition of h, and using property (vii) of conjugates, we can write

f (x) + g(x) + inf_z [ f ∗ (z) + g ∗ (u − z) ] = ⟨x, u⟩.

Suppose the infimum is achieved when z = s. Then

[ f (x) + f ∗ (s) − ⟨x, s⟩ ] + [ g(x) + g ∗ (u − s) − ⟨u − s, x⟩ ] = 0.

Both terms in square brackets are non-negative by Fenchel’s inequality. Therefore, for equality to hold, we need both terms to be zero. Thus, we can write

f (x) + f ∗ (s) = ⟨x, s⟩,
g(x) + g ∗ (u − s) = ⟨u − s, x⟩.

Thus, s ∈ ∂f (x) and u − s ∈ ∂g(x), so that

u = s + (u − s) ∈ ∂f (x) + ∂g(x).

Since u was arbitrary, this implies ∂h(x) ⊂ ∂f (x) + ∂g(x).

From this discussion, we obtain the following corollary concerning constrained minimization.
Corollary 6. Consider the problem

min f (x),
x∈C

where f is a convex function, C is a convex set. x∗ is a solution of this problem


if and only if

0 ∈ ∂f (x∗ ) + NC (x∗ ).

5.2 Connection with the KKT Conditions


Let us now study what the condition stated in Corollary 6 means. Let us
consider the problem

min_x f (x) subject to g1 (x) ≤ 0, g2 (x) ≤ 0,

where gi are convex and differentiable. Let Ci = {x : gi (x) ≤ 0}. Note that
both Ci ’s are convex sets. Suppose gi (x) < 0. Then NCi (x) = {0}. However,
if gi (x) = 0, then

NCi (x) = {s : ⟨s, y − x⟩ ≤ 0 for all y with gi (y) ≤ 0}.

That is, NCi (x) = {α∇gi (x) : α ≥ 0}. Also, if gi (x) > 0, then NCi (x) = ∅.
Therefore, the condition

0 ∈ ∇f (x) + NC1 (x) + NC2 (x)

is equivalent to

0 = ∇f (x) + α1 ∇g1 (x) + α2 ∇g2 (x), where αi ≥ 0, αi gi (x) = 0, and gi (x) ≤ 0 for i = 1, 2.

These are precisely the KKT conditions.

5.3 Monotonicity of the Subdifferential


Recall that for a convex differentiable f , we had

h∇f (x) − ∇f (y), x − yi ≥ 0, for all x, y.

A similar property holds for the subdifferential.

Proposition 34. Suppose f : Rn → R is convex. If s ∈ ∂f (x) and z ∈ ∂f (y),
then
hs − z, x − yi ≥ 0. (5.17)

Proof. Observe that


s ∈ ∂f (x) =⇒ f (y) ≥ f (x) + hs, y − xi
z ∈ ∂f (y) =⇒ f (x) ≥ f (y) + hz, x − yi.
Summing the inequalities, we obtain (5.17).
n
Definition 20. An operator T : Rn → 2R is said to be monotone if hs−z, x−
yi ≥ 0, for all x, y, and s ∈ T (x), z ∈ T (y). 

It is useful to think of set-valued operators in terms of their graphs.


n
Definition 21. The graph of T : Rn → 2R is the set of u, v such that v ∈ T (u).


Notice that for a convex function, the graph of ∂f is the set of (x, u) such that
x ∈ dom(f ) and u ∈ ∂f (x).
A curious property satisfied by ∂f (x) is ‘maximal’ monotonicity.
n
Definition 22. T : Rn → 2R is said to be maximal monotone if there is no
monotone operator F , such that the graph of T is a strict subset of the graph
of F . 
Example 24. For f (x) = |x|, the graph of ∂f (x) is shown below.
[Insert Fig on p.41]. 

We state the following fact without proof. See Rockafellar’s book for a proof.
Proposition 35. If T = ∂f for a closed convex function f , then T is maximal
monotone.

Let us now revisit a minimization problem like minx f (x), for a convex f .
In terms of the subdifferential of f , this is equivalent to looking for x such
that 0 ∈ T (x), where T = ∂f . An equivalent problem is to find x such that
x ∈ x + λ T (x), for λ > 0.
n
Definition 23. Given S : Rn → 2R , S −1 is the operator whose graph is the
set of (u, v) where u ∈ S(v). 

Note that, by the foregoing discussion, minx f (x) is equivalent to finding x


such that x = (I + λT )−1 x. In the following, we study the properties of
Jλ T = (I + λT )−1 .
Let us first write down our previous observation as a proposition.

Proposition 36. If f (z) ≤ f (x) for all x, then JλT (z) = z.

Definition 24. An operator U : Rn → Rn is said to be firmly-nonexpansive if

kU (x) − U (y)k22 + k(I − U )(x) − (I − U )(y)k22 ≤ kx − yk22 .

Proposition 37. If T is monotone, then (I + T )−1 is firmly non-expansive.

Proof. Note that,

k(I − J)x − (I − J)yk22 = kx − yk22 + kJx − Jyk22 − 2hJx − Jy, x − yi.

Therefore, if we can show that

hJx − Jy, x − yi ≥ kJx − Jyk22 , (5.18)

we are done.
Now suppose

x = u + v, with v ∈ T (u),
y = z + t, with t ∈ T (z).

Then, J(x) = u and J(y) = z. This implies

⟨Jx − Jy, x − y⟩ = ⟨u − z, (u − z) + (v − t)⟩ = ‖u − z‖2² + ⟨u − z, v − t⟩.

The last term is non-negative by the monotonicity of T . Also, since ‖Jx − Jy‖2² = ‖u − z‖2², the inequality (5.18) follows.

Alternative Proof :

T is monotone
⇐⇒ ⟨x′ − x, y′ − y⟩ ≥ 0 for all (x, y), (x′ , y′ ) in the graph of T,
⇐⇒ ⟨(x′ + y′ ) − (x + y), x′ − x⟩ ≥ ‖x′ − x‖2² for all (x, y), (x′ , y′ ) in the graph of T.

Proposition 38. Suppose f : Rn → R is differentiable, convex and

k∇f (x) − ∇f (y)k2 ≤ Lkx − yk2 , (5.19)

for any x, y pair. Then,

f (x) ≤ f (y) + ⟨∇f (y), x − y⟩ + (L/2) ‖x − y‖2².

Proof. Consider the function g(s) = f (y + s (x − y)). Observe that g(0) = f (y), g(1) = f (x), and

g(1) − g(0) = ∫₀¹ g′(s) ds
            = ∫₀¹ ⟨∇f (y + s (x − y)), x − y⟩ ds
            = ∫₀¹ [ ⟨∇f (y + s (x − y)) − ∇f (y), x − y⟩ + ⟨∇f (y), x − y⟩ ] ds
            ≤ ∫₀¹ [ ‖∇f (y + s (x − y)) − ∇f (y)‖2 ‖x − y‖2 + ⟨∇f (y), x − y⟩ ] ds
            ≤ ∫₀¹ [ L s ‖x − y‖2² + ⟨∇f (y), x − y⟩ ] ds
            = (L/2) ‖x − y‖2² + ⟨∇f (y), x − y⟩,

where we applied the Cauchy-Schwarz inequality and (5.19) to obtain the two inequalities.

Now suppose we want to minimize f and it satisfies the hypothesis of Prop. 38,
namely (5.19). We will derive an algorithm which will start from some initial
point x0 and produce a sequence that converges to the minimizer.
Suppose that at the k-th step, we have xk , and we set xk+1 as

xk+1 = arg min_x { Fk (x) = f (xk ) + ⟨∇f (xk ), x − xk ⟩ + (α/2) ‖x − xk ‖2² }.
Observe that

(i) Fk (xk ) = f (xk ),
(ii) Fk (x) ≥ f (x) for all x, provided α ≥ L (by Prop. 38).

[Figure: the quadratic majorizer Fk (·) lying above f (·), touching it at xk .]
Therefore,

f (xk+1 ) ≤ Fk (xk+1 ) ≤ Fk (xk ) = f (xk ).

In words, at each iteration, we reduce the value of f . Let us derive what xk+1 is. Observe that

0 = ∇f (xk ) + α(xk+1 − xk ).

This is equivalent to

xk+1 = xk − (1/α) ∇f (xk ).
This is an instance of the steepest-descent algorithm with a fixed step-size.
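For a concrete instance (a minimal sketch assuming Python with NumPy, not part of the original notes; the quadratic objective and the choice α = L are just an example):

    import numpy as np

    # minimize f(x) = (1/2) x^T A x - b^T x, whose gradient is Ax - b
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([1.0, -1.0])
    grad = lambda x: A @ x - b

    alpha = np.linalg.eigvalsh(A).max()    # a Lipschitz constant of the gradient
    x = np.zeros(2)
    for _ in range(200):
        x = x - grad(x) / alpha            # x_{k+1} = x_k - (1/alpha) grad f(x_k)

    print(x, np.linalg.solve(A, b))        # the iterate approaches the minimizer A^{-1} b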

6 Applications to Algorithms
We now consider the application of the foregoing discussion of convex analysis
to algorithms. We start with the proximal point algorithm. Although the prox-
imal point algorithm in its first form is almost never used in minimization
algorithms, its convergence proof contains key ideas that are relevant for more
complicated algorithms. In fact, the algorithms discussed in the sequel can be
written as proximal point algorithms. We then discuss firmly non-expansive
operators, which may be regarded as building blocks for developing convergent
algorithms. Following this, we discuss the augmented Lagrangian, the Douglas-
Rachford algorithm, and the alternating direction method of multipliers algo-
rithm (ADMM). The last section considers a generalization of the proximal
point algorithm, along with an application to a saddle point problem. I hope to
add more algorithms to this collection...

6.1 The Proximal Point Algorithm


Consider the minimization of a convex function f (x). The proximal point al-
gorithm constructs a sequence xk that converges to a minimizer. The sequence
is defined as

xk+1 = arg min_x [ (1/(2α)) ‖x − xk ‖2² + f (x) ], (6.20)

and we denote the minimized function by Fk (x).

The function Fk (x) is a perturbation of f (x) around xk . The quadratic term


ensures that xk+1 is close to xk .
In fact, this algorithm may actually be regarded as a majorization-minimization (MM) algorithm, since

• Fk (xk ) = f (xk ),
• Fk (x) ≥ f (x) for all x.

It follows immediately that


f (xk+1 ) ≤ Fk (xk+1 ) ≤ Fk (xk ) = f (xk )

In terms of subdifferentials, we have


0 ∈ (xk+1 − xk ) + α ∂f (xk+1 ).
Equivalently, we can write
xk ∈ (I + α∂f ) xk+1 .
Thus, at the k th step of PPA, we are essentially computing the inverse of
(I + α ∂f ) at xk . Notice that, since Fk (x) in (6.20) is strictly convex, xk+1 is
uniquely defined. Therefore, (I + α ∂f )−1 is a single-valued operator.

Definition 25. For a convex f , the operator Jf = (I + ∂f )−1 is called the
proximity operator of f . 
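Two standard proximity operators, stated here without derivation as a sketch (assuming Python with NumPy; not part of the original notes): the proximity operator of α|·| is the soft-threshold, and the proximity operator of the indicator of a set is the projection onto it.

    import numpy as np

    def prox_abs(x, alpha):
        # proximity operator of alpha*|.| (soft threshold), applied componentwise
        return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

    def prox_indicator_box(x, lo, hi):
        # proximity operator of the indicator of the box [lo, hi]^n = projection onto the box
        return np.clip(x, lo, hi)

    x = np.array([2.0, -0.3, 0.7])
    print(prox_abs(x, 0.5))                  # [1.5, -0., 0.2] (the middle entry shrinks to zero)
    print(prox_indicator_box(x, -1.0, 1.0))  # [1., -0.3, 0.7]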

The proximity operator of a convex function acts like a projection operator in


the following sense.

Proposition 39. For a proximity operator Jαf , we have

hJαf (x) − Jαf (y), x − yi ≥ kJαf (x) − Jαf (y)k22 . (6.21)

Proof. Let x′ = Jαf (x) and y′ = Jαf (y). Then, we have

x ∈ (I + α ∂f ) x′ ⇐⇒ (x − x′ ) ∈ α∂f (x′ ),
y ∈ (I + α ∂f ) y′ ⇐⇒ (y − y′ ) ∈ α∂f (y′ ).

Thanks to the monotonicity of α∂f , we can therefore write

⟨x′ − y′ , (x − x′ ) − (y − y′ )⟩ ≥ 0
⇐⇒ ⟨x′ − y′ , x − y⟩ ≥ ‖x′ − y′ ‖2²,

which is what we wanted to show.

The property in (6.21) is a key observation so we give it a name.

Definition 26. An operator F is said to be firmly-nonexpansive if

hF x − F y, x − yi ≥ kF x − F yk22 .

Thus, proximity operators are firmly-nonexpansive.


Let us now make another observation regarding the proximity operator. Sup-
pose x∗ minimizes f . For α > 0, this is equivalent to

0 ∈ ∂f (x∗ )
⇐⇒ 0 ∈ α ∂f (x∗ )
⇐⇒ x∗ ∈ x∗ + α ∂f (x∗ )
⇐⇒ x∗ = Jαf (x∗ ).

The last equality is a very special one.

Definition 27. A point x is said to be a fixed point of an operator F if


x = F (x). 

We have thus shown the following.

Proposition 40. A point x∗ minimizes the convex function f if and only if
x∗ = Jαf (x∗ ) for all α > 0.

The following theorem is known as the Krasnoselskii-Mann theorem.

Theorem 1 (Krasnoselskii-Mann). Suppose F is a firmly-nonexpansive map-


ping on RN and its set of fixed points is non-empty. Let x0 ∈ RN . If the
sequence xk is defined as xk+1 = F xk , then xk converges to a fixed point of F .

Before we prove the theorem, let us state an auxiliary result of interest, that
provides an alternative definition of firm-nonexpansivity.

Lemma 2. For an operator F , the following conditions are equivalent.

• hF x − F y, x − yi ≥ kF x − F yk22
• kF x − F yk22 + k(I − F )x − (I − F )yk22 ≤ kx − yk22

Proof. Exercise!
Hint : Expand the expression

kF x − F yk22 + k(x − y) − (F x − F y)k22 .

We are now ready for the proof of the Krasnoselskii-Mann Theorem.

Proof of the Krasnoselskii-Mann Theorem. Pick some z such that z = F z. We


have,

    \|Fx^k − z\|_2^2 = \|Fx^k − Fz\|_2^2 ≤ \|x^k − z\|_2^2 − \|(I − F)x^k − \underbrace{(I − F)z}_{=0}\|_2^2 .

Therefore,

    \|x^{k+1} − z\|_2^2 ≤ \|x^k − z\|_2^2 − \|x^k − x^{k+1}\|_2^2 .    (6.22)

Summing this inequality from k = 0 to n, and rearranging, we obtain


    \|x^{n+1} − z\|_2^2 ≤ \|x^0 − z\|_2^2 − \sum_{k=0}^{n} \|x^k − x^{k+1}\|_2^2 .

From this inequality, we obtain


    \sum_{k=0}^{n} \|x^k − x^{k+1}\|_2^2 ≤ \|x^0 − z\|_2^2 .

Therefore, kxk − xk+1 k2 → 0, which implies (I − F )xk → 0.
Also, since kxn − zk2 ≤ kx0 − zk2 for any n, the sequence xk is bounded.
Therefore, we can pick a convergent subsequence {xkn }n with limit, say x∗ .
By the continuity of I − F , we have,
    0 = \lim_{n→∞} (I − F)x^{k_n} = (I − F)x^∗ .

Therefore x∗ = F x∗ . If we now plug in z = x∗ in (6.22), we have


kxkn +m − x∗ k2 ≤ kxkn − x∗ k2 for all m > 0.
Since the kxkn −x∗ k2 can be made arbitrarily small by choosing n large enough,
it follows that xk → x∗ .

An immediate corollary of this general result (which we will refer to later) is


the convergence of PPA.
Corollary 7. Provided f has a minimizer, the sequence x^k constructed by the
proximal point algorithm in (6.20) converges to a minimizer of f for any α > 0.

Thanks to the Krasnoselskii-Mann theorem, firmly-nonexpansive operators


play a central role in the convergence study of a number of algorithms. We
now provide a brief study of these operators. We state the results in their most
general form, for later reference.

6.2 Firmly-Nonexpansive Operators


We already saw that proximity operators are firmly-nonexpansive. However,
not all firmly-nonexpansive operators are proximity operators. Firmly-nonexpansive
operators can be generated from monotone, or maximal monotone operators.
Specifically, suppose T is a monotone operator with domain RN and consider
SαT = (I + αT )−1 . Let us name this operator, for it has interesting properties
that will be useful later.
Definition 28. For a monotone operator T , the operator ST = (I + T )−1 is
called the resolvent of T . 

We pose two questions regarding resolvent operators.

• Is SαT defined everywhere?


• Is SαT single-valued?

We noted earlier that for a convex function f , if T = ∂f , then the answer to


both questions is affirmative. However, for a general monotone operator, SαT
may not be defined everywhere. Let us consider an example to demonstrate
this.

Example 25. Suppose we define T : R → R as the unit step function, i.e.,

    T(x) = \begin{cases} 0, & \text{if } x ≤ 0, \\ 1, & \text{if } x > 0. \end{cases}

Consider the operator I + T . Observe that

    (I + T)(x) = \begin{cases} x, & \text{if } x ≤ 0, \\ x + 1, & \text{if } x > 0. \end{cases}

Notice that the interval (0, 1] is not included in the range of I + T . Therefore,
ST = (I + T)^{−1} is not defined for any point in (0, 1]. 
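A tiny numerical illustration (added here, not in the original notes): evaluating x + T(x) on a fine grid shows that no value in the interval (0, 1] is ever attained, so the inverse cannot be defined there.

```python
import numpy as np

# The unit step operator of Example 25 and the map (I + T).
xs = np.linspace(-2.0, 2.0, 100001)
vals = xs + (xs > 0).astype(float)          # (I + T)(x)
hit = np.any((vals > 0.0) & (vals <= 1.0))  # any value in the interval (0, 1]?
print(hit)   # False: (0, 1] is missing from the range of I + T
```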

Nevertheless, in the range of I + α T , the inverse (I + α T )−1 is single-valued.

Proposition 41. Suppose T is a monotone mapping and α > 0. Then, for


any x in the range of the operator I + αT , we can find a unique z such that
(I + α T ) z = x. That is, (I + αT )−1 is a single-valued operator on its domain.

Proof. First, observe that for α > 0, the operator T is monotone if and only
if αT is monotone. Therefore, without loss of generality, we take α = 1.
We first show that for w ∈ T u and z ∈ T y, if u + w = y + z, then u = y. To
see this, observe that

u+w−y−z =0

implies

k(u − y) + (w − z)k22 = 0,

which is equivalent to

ku − yk22 + kw − zk22 + 2hu − y, w − zi = 0. (6.23)

But the inner product term in (6.23) is non-negative thanks to the monotonic-
ity of T . Therefore, in order for equality to hold in (6.23), we must have u = y
and w = z.
Now, if x is in the range of I + αT , then there is a point v and u ∈ αT v such
that u + v = x. But the observation above implies that if this is the case, then
v is unique and, in fact, v = (I + αT)^{−1} x.

We noted that for a monotone operator T , the range of I + αT may be a


strict subset of Rn . This restricts the domain of SαT . For maximal monotone
operators, the range of I + αT is in fact the whole RN . This non-trivial result
(which we will not prove) is known as Minty’s theorem.

Theorem 2 (Minty’s Theorem). Suppose T is a maximal monotone operator
defined on RN and α > 0. Then, the range of I + αT is RN .

To demonstrate Minty’s theorem, let us consider the maximal monotone op-


erator whose graph is a superset of the graph of the operator in Example 25.

Example 26. Suppose the set-valued operator T̃ is defined as

    T̃(x) = \begin{cases} \{0\}, & \text{if } x < 0, \\ [0, 1], & \text{if } x = 0, \\ \{1\}, & \text{if } x > 0. \end{cases}

Consider now the operator I + T̃ . Observe that

    (I + T̃)(x) = \begin{cases} \{x\}, & \text{if } x < 0, \\ [0, 1], & \text{if } x = 0, \\ \{x + 1\}, & \text{if } x > 0. \end{cases}

Notice that for any y ∈ R^N , we can find x such that y ∈ (I + T̃)x. Therefore,
the range of I + T̃ is the whole R^N .

Minty’s theorem ensures that the domain of SαT is the whole space if T is
maximal monotone. Therefore, we can state the following corollary.

Corollary 8. If T is a maximal monotone operator on RN , then SαT is single


valued and defined for all x ∈ RN .

We have the following generalization of Prop. 39 in this case.

Proposition 42. Suppose T is maximal monotone and α > 0. Then, SαT is


firmly-nonexpansive.

Proof. Exercise!
(Hint : Consider the argument used in the proof of Prop. 39.)

We remark that a subdifferential is maximal monotone. Therefore, a proximity


operator is in fact the resolvent of a maximal monotone operator and Prop. 39
is a special case of Prop. 42.
We introduce a final object called the ‘reflected resolvent’ that will be of in-
terest in the convergence analysis of algorithms that will follow. Let us first
state a result to motivate the definition.

Proposition 43. An operator S is firmly non-expansive if and only if 2S − I


is non-expansive.

Proof. Suppose S is firmly nonexpansive and let N = 2S − I. Observe that

    \|Nx − Ny\|_2^2 = \|x − y\|_2^2 + \big\{ 4\|Sx − Sy\|_2^2 − 4\langle Sx − Sy, x − y\rangle \big\}.

But the term inside the curly brackets above is non-positive, thanks to the firm-
nonexpansivity of S. Thus we obtain

    \|Nx − Ny\|_2 ≤ \|x − y\|_2 .

Conversely, suppose N = 2S − I is non-expansive. Then, S = \tfrac{1}{2} I + \tfrac{1}{2} N .
Observe also that I − S = \tfrac{1}{2} I − \tfrac{1}{2} N . We compute

    \|Sx − Sy\|_2^2 + \|(I − S)x − (I − S)y\|_2^2 = \tfrac{1}{2}\|x − y\|_2^2 + \tfrac{1}{2}\|Nx − Ny\|_2^2 ≤ \|x − y\|_2^2 .

Thus, S is firmly nonexpansive by Lemma 2.


Definition 29. Suppose T is maximal monotone and α > 0. The operator
NT = 2 ST − I is called the reflected resolvent of T . 

Thus, the reflected resolvent of a maximal monotone operator is non-expansive.


Let us state Prop. 43 from another viewpoint for later reference.
 
Corollary 9. N is nonexpansive if and only if \tfrac{1}{2} I + \tfrac{1}{2} N is firmly-nonexpansive.

Another useful observation is the following.


Corollary 10. If f is a convex function and α > 0, then (2Jαf − I) is non-
expansive.

The following is a useful property to know concerning reflected resolvents


(which, in fact, justifies the term ‘reflected’).
Proposition 44. Suppose T is maximal monotone. Then, NT (x + y) = x − y
if and only if y ∈ T x.

Proof. The claim follows from the following chain of equivalences.

NT (x + y) = x − y
⇐⇒ 2 ST (x + y) − (x + y) = x − y
⇐⇒ ST (x + y) = x
⇐⇒ x + y ∈ (I + T ) x
⇐⇒ y ∈ T x.

6.3 The Dual PPA and the Augmented Lagrangian
Consider now a problem of the form

    \min_x f(x) \quad \text{subject to} \quad Ex = d,    (6.24)

where E is a matrix.
In order to solve this problem, we will apply PPA to the dual problem. For
this, let us derive a dual problem through the use of Lagrangians. Let

    L(x, λ) = f(x) + \langle λ, Ex − d\rangle.

Then, (6.24) can be expressed as

    \min_x \max_λ \; f(x) + \langle λ, Ex − d\rangle.

In order to obtain the dual problem, we change the order of min and max.
This gives us the dual problem

    \max_λ \; g(λ),

where

    g(λ) = \min_x \; f(x) + \langle λ, Ex − d\rangle.

Recall that if

    λ^∗ ∈ \arg\max_λ g(λ),

then

    x^∗ ∈ \arg\min_x L(x, λ^∗)

is a solution of the original problem (6.24). To find λ^∗ , we apply PPA to the
dual problem and define a sequence as
    λ^{k+1} = \arg\max_λ \; g(λ) − \frac{1}{2α}\|λ − λ^k\|_2^2 .

To find λ^{k+1} , we need to solve

    \max_λ \min_x \; f(x) + \langle λ, Ex − d\rangle − \frac{1}{2α}\|λ − λ^k\|_2^2 .    (6.25)

Let (x^{k+1}, λ^{k+1}) denote the solution of this saddle point problem. To find this
point, suppose we first tackle the maximization. Observe that, for fixed x, the
optimality condition for the maximization part implies

    0 = (Ex − d) − \frac{1}{α}(λ^∗ − λ^k)
    ⇐⇒ λ^∗ = λ^k + α(Ex − d).

To find x^{k+1} , plug this expression into the saddle point problem (6.25):

    x^{k+1} = \arg\min_x \; f(x) + \langle λ^k + α(Ex − d), Ex − d\rangle − \frac{1}{2α}\|λ^k + α(Ex − d) − λ^k\|_2^2
            = \arg\min_x \; f(x) + \langle λ^k , Ex − d\rangle + \frac{α}{2}\|Ex − d\|_2^2 .
To summarize, the dual PPA algorithm is

    x^{k+1} = \arg\min_x L_A(x, λ^k),
    λ^{k+1} = λ^k + α(Ex^{k+1} − d),

where

    L_A(x, λ) = f(x) + \langle λ, Ex − d\rangle + \frac{α}{2}\|Ex − d\|_2^2 .

The function L_A is called the augmented Lagrangian. Notice that L_A is similar
to the Lagrangian L but contains the additional quadratic penalty term \frac{α}{2}\|Ex − d\|_2^2 ,
justifying the expression ‘augmented’.
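To make the iterations concrete, here is a minimal numerical sketch (added here, not from the original notes) of the dual PPA / augmented Lagrangian method for the particular choice f(x) = 0.5‖x − c‖², for which the x-update reduces to a linear solve; E, d, c and α below are arbitrary illustrative data.

```python
import numpy as np

# Minimal sketch of the augmented Lagrangian iterations for
#   min_x 0.5 * ||x - c||_2^2   subject to   E x = d.
rng = np.random.default_rng(1)
E = rng.normal(size=(3, 6))
d = rng.normal(size=3)
c = rng.normal(size=6)
alpha = 1.0

lam = np.zeros(3)
A = np.eye(6) + alpha * E.T @ E        # system matrix of the x-update
for _ in range(200):
    # x-update: minimize L_A(x, lam) in closed form
    x = np.linalg.solve(A, c - E.T @ lam + alpha * E.T @ d)
    # multiplier update
    lam = lam + alpha * (E @ x - d)

print("constraint residual:", np.linalg.norm(E @ x - d))
```

The printed residual tends to zero, in agreement with the convergence discussion that follows.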
Let us now consider the convergence of this algorithm. Since the λ^k are produced
by applying PPA to the dual problem, the sequence λ^k is convergent, so that
λ^{k+1} − λ^k → 0. On the other hand, the multiplier update gives

    Ex^{k+1} − d = \frac{1}{α}(λ^{k+1} − λ^k),

so that

    \lim_k \; Ex^k − d = 0.

Therefore, if x^∗ is a limit point of x^k , then Ex^∗ = d. Moreover, if x̄ is an arbitrary
vector such that E x̄ = d, then, passing to the limit in the x-update, x^∗ minimizes
L_A(·, λ^∗), and we have

    f(x^∗) = L_A(x^∗ , λ^∗) ≤ L_A(x̄, λ^∗) = f(x̄).

Therefore x^∗ is a solution.
Note that this argument is not necessarily a convergence proof, because it
depends on the assumption that x^∗ is a limit point of the sequence of x^k ’s.

6.4 The Douglas-Rachford Algorithm


Consider now a problem of the form

    \min_x \; f(x) + g(x),    (6.26)

where both f and g are convex. Let F = ∂f and G = ∂g. If x is a solution
of this problem, we should have

    0 ∈ Fx + Gx.    (6.27)

Let us now derive equivalent expressions of this inclusion, to obtain a fixed
point iteration. Suppose we fix α > 0. Then, (6.27) and the following state-
ments are equivalent.

    There exist u ∈ Fx, z ∈ Gx such that 0 = u + z
    ⇐⇒ There exist u ∈ Fx, z ∈ Gx such that x + αz = x − αu
    ⇐⇒ There exists u ∈ Fx such that x = (I + αG)^{−1}(x − αu)    (6.28)
    ⇐⇒ x ∈ (I + αG)^{−1}(I − αF) x    (6.29)

At this point, observe that if f is differentiable, then F is single-valued. In that
case, the inclusion in (6.29) is actually an equality (at the risk of confusing matters:
this statement is correct if we regard the range of F as R^N rather than 2^{R^N}). This
suggests the following fixed point iteration, known as the forward-backward
splitting algorithm:

    x^{k+1} = (I + αG)^{−1}(I − αF) x^k .

We will discuss this algorithm later.
We would like to derive an algorithm that employs Jαf = (I +αF )−1 . For this,
let us now backtrack to (6.28) and write down another equivalent statement.
⇐⇒ There exist u ∈ F x such that x+αu = (I +αG)−1 (x−αu)+αu (6.30)
If we now define a new variable t = x + αu, we have the following useful
equalities :
x = (I + α F )−1 t = Jαf (t)
αu = t − x = (I − Jαf ) t
Plugging these in (6.30), we obtain the following proposition.
Proposition 45. Suppose f and g are convex and the minimization problem
(6.26) has at least one solution. A point x is a solution of (6.26) if and only if

    x = J_{αf}(t), \quad \text{for some } t \text{ that satisfies } t = \big( J_{αg}(2J_{αf} − I) + (I − J_{αf}) \big)(t).    (6.31)

Thus, if we can obtain t that satisfies the fixed point equation in (6.31), we can
obtain the solution to our minimization problem as Jαf (t). A natural choice
is to consider the fixed point iterations

    t^{k+1} = \big( J_{αg}(2J_{αf} − I) + (I − J_{αf}) \big)(t^k).


These constitute the Douglas-Rachford iterations. By a little algebra, we can


put them in a form that is easier to interpret. For this, observe that

    J_{αg}(2J_{αf} − I) + (I − J_{αf}) = J_{αg}(2J_{αf} − I) − \tfrac{1}{2}(2J_{αf} − I) + \tfrac{1}{2} I
                                       = \tfrac{1}{2} I + \tfrac{1}{2}(2J_{αg} − I)(2J_{αf} − I).    (6.32)

The convergence of the algorithm is easier to see in this form. Recall that,
Corollary 10 implies the non-expansivity of (2Jαf − I) and (2Jαg − I). But
composition of non-expansive operators is also non-expansive. Therefore, the
composite operator (2Jαg − I) (2Jαf − I) is non-expansive. Finally, by Corol-
lary 9, we can conclude that the operator in (6.32) is firmly-nonexpansive.
Combining this observation with Prop. 45, we obtain the following convergence
result as a consequence of the Krasnoselskii-Mann theorem (i.e., Thm. 1).
Proposition 46. Suppose f and g are convex and the minimization problem
(6.26) has at least one solution. The sequence constructed as

    t^{k+1} = \Big( \tfrac{1}{2} I + \tfrac{1}{2}(2J_{αg} − I)(2J_{αf} − I) \Big)(t^k),    (6.33)
is convergent. If t∗ denotes the limit of this sequence, then x∗ = Jαf (t∗ ) is a
solution of (6.26).
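The following minimal sketch (added here, not from the original notes) runs the iteration (6.33) for the simple problem min_x 0.5‖x − a‖² + ‖x‖₁, where both proximity operators are available in closed form; the data a and the parameter α are arbitrary illustrative choices.

```python
import numpy as np

a = np.array([3.0, -0.2, 1.5, -4.0])
lam, alpha = 1.0, 1.0

def prox_f(v, alpha):
    # prox of f(x) = 0.5 * ||x - a||^2
    return (v + alpha * a) / (1.0 + alpha)

def prox_g(v, alpha):
    # prox of g(x) = lam * ||x||_1 (soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - alpha * lam, 0.0)

t = np.zeros_like(a)
for _ in range(200):
    r = 2 * prox_f(t, alpha) - t                   # (2 J_{alpha f} - I) t
    t = 0.5 * t + 0.5 * (2 * prox_g(r, alpha) - r) # iteration (6.33)
x = prox_f(t, alpha)                               # solution estimate J_{alpha f}(t)
print(x)  # approximately the soft-threshold of a: [2., 0., 0.5, -3.]
```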

Generalized Douglas-Rachford Algorithm


In this section, we obtain a small variation on the Douglas-Rachford iterations
in (6.33). Let us start with the following general observation. Suppose T is a
single-valued operator and z = \big( (1 − β^∗)I + β^∗ T \big) z for some β^∗ ≠ 0. Then,
actually z = \big( (1 − β)I + βT \big) z for any β. Therefore, we have the following
generalization of Prop. 45.
Proposition 47. Suppose f and g are convex and the minimization problem
(6.26) has at least one solution. A point x is a solution of (6.26) if and only if
x = J_{αf}(t) for some t that satisfies

    t = \big( (1 − β)I + β(2J_{αg} − I)(2J_{αf} − I) \big)\, t, \quad \text{for all } β.

Although this proposition is true for any β ∈ R, we will specifically be inter-


ested in β ∈ (0, 1), because that is the interval for which we can construct a
convergent sequence that can be used to obtain a minimizer of (6.26). The
generalized DR iterations are as follows.
Proposition 48. Suppose f and g are convex and the minimization problem
(6.26) has at least one solution. Also, let β ∈ (0, 1). Then, the sequence
constructed as

    t^{k+1} = \big( (1 − β)I + β(2J_{αg} − I)(2J_{αf} − I) \big)(t^k),    (6.34)

is convergent. If t∗ denotes the limit of this sequence, then x∗ = Jαf (t∗ ) is a


solution of (6.26).

In order to prove this result, we need a generalization of the Krasnoselskii-


Mann theorem. Let us start with a definition.

Definition 30. An operator T is said to be β-averaged with β ∈ (0, 1) if T
can be written as

T = (1 − β)I + βN,

for a non-expansive operator N . 

Specifically, a firmly-nonexpansive operator is 12 -averaged. Notice also that if


T is β-averaged and β < β 0 , then T is also β 0 -averaged.
Let us now consider how we can generalize the Krasnoselskii-Mann theorem.
Recall that, in the proof of that theorem, a key inequality used in the beginning
of the proof was of the form

kT x − T yk22 + k(I − T )x − (I − T )yk22 ≤ kx − yk22 , (6.35)

where T is firmly nonexpansive. However, this depends heavily on the firm


non-expansivity of T . If T is only β averaged, with β > 1/2, then (6.35) does
not hold any more. Let us now see if we can come up with an alternative. So,
let T = (1 − β)I + βN , for a nonexpansive N . Then, we have,

    \|Tx − Ty\|_2^2 = (1 − β)^2\|x − y\|_2^2 + β^2\|Nx − Ny\|_2^2 + 2β(1 − β)\langle Nx − Ny, x − y\rangle,
    \|(I − T)x − (I − T)y\|_2^2 = β^2\|x − y\|_2^2 + β^2\|Nx − Ny\|_2^2 − 2β^2\langle Nx − Ny, x − y\rangle.

In order to cancel the inner product terms, let us consider a weighted sum:

    \|Tx − Ty\|_2^2 + \frac{1 − β}{β}\|(I − T)x − (I − T)y\|_2^2
        = \big[(1 − β)^2 + β(1 − β)\big]\|x − y\|_2^2 + \big[β^2 + β(1 − β)\big]\|Nx − Ny\|_2^2
        ≤ \|x − y\|_2^2 .
≤ kx − yk22 .

Thus, we have shown the following.


Proposition 49. Suppose T is β-averaged. Then,
    \|Tx − Ty\|_2^2 + \frac{1 − β}{β}\|(I − T)x − (I − T)y\|_2^2 ≤ \|x − y\|_2^2 .    (6.36)

We can now state a generalization of Theorem 1, which we also refer to as the


Krasnoselskii-Mann theorem.
Theorem 3 (Krasnoselskii-Mann (General Statement)). Suppose F is a β-
averaged mapping on RN and its set of fixed points is non-empty. Given an
initial x0 , suppose we define a sequence as xk+1 = F xk . Then, xk converges
to a fixed point of F .

Proof. Exactly the same arguments as in the proof of Thm 1 work, except that
we now start with the inequality (6.36).

Proposition 48 now follows as a corollary of Theorem 3 because the operator


in (6.34) is β-averaged with β ∈ (0, 1).

6.5 Alternating Direction Method of Multipliers


Consider the problem

    \min_x \; f(x) + g(Mx).    (6.37)

In order to solve this problem, we will apply the Douglas-Rachford algorithm


on the dual problem. The resulting algorithm is known as the alternating
direction method of multipliers algorithm (ADMM). We will assume that M
is full column rank.
We first split variables and write (6.37) as a saddle point problem

    \min_{x,z} \max_λ \; f(x) + g(z) + \langle λ, Mx − z\rangle.

The dual problem is then


    \max_λ \; \underbrace{\Big[\min_x \; f(x) + \langle λ, Mx\rangle\Big]}_{−d_1(λ)} + \underbrace{\Big[\min_z \; g(z) − \langle λ, z\rangle\Big]}_{−d_2(λ)} .

Equivalently, the dual problem can be expressed as,

    \min_λ \; d_1(λ) + d_2(λ),    (6.38)

for

    d_1(λ) = \max_x \; \langle −M^T λ, x\rangle − f(x) = f^∗(−M^T λ),    (6.39a)
    d_2(λ) = \max_z \; \langle λ, z\rangle − g(z) = g^∗(λ).    (6.39b)

For this problem, the Douglas-Rachford algorithm, starting from some y 0 , can
be written as follows.

    ȳ^k = N_{αd_2}(y^k),
    ŷ^k = N_{αd_1}(ȳ^k),
    y^{k+1} = \tfrac{1}{2} y^k + \tfrac{1}{2} ŷ^k,
    λ^{k+1} = J_{αd_2}(y^{k+1}).    (6.40)

Recall that the sequence of λk ’s converge to a solution of (6.38). Let us now
find expressions for the terms above in terms of f and g.
Suppose y k = λk + α z k , with z k ∈ ∂d2 (λk ). This implies (also recall Prop. 44)
that λk = Jαd2 (y k ), so that
    ȳ^k = 2J_{αd_2}(y^k) − y^k = 2λ^k − (λ^k + αz^k) = λ^k − αz^k .
Now observe that
    J_{αd_1}(ȳ^k) = \arg\min_y \; \frac{1}{2α}\|y − ȳ^k\|_2^2 + d_1(y)
                 = \arg\min_y \max_x \; \frac{1}{2α}\|y − ȳ^k\|_2^2 − f(x) − \langle M^T y, x\rangle .

Changing the order of min and max, we find that J_{αd_1}(ȳ^k) must satisfy

    J_{αd_1}(ȳ^k) = ȳ^k + αMx^{k+1},

where

    x^{k+1} := \arg\max_x \; \frac{1}{2α}\|αMx\|_2^2 − f(x) − \langle ȳ^k + αMx, Mx\rangle
             = \arg\max_x \; \frac{1}{2α}\|αMx\|_2^2 − f(x) − \langle λ^k − αz^k + αMx, Mx\rangle
             = \arg\min_x \; f(x) + \langle λ^k , Mx\rangle + \frac{α}{2}\|Mx − z^k\|_2^2 .

Therefore,

    ŷ^k = 2J_{αd_1}(ȳ^k) − ȳ^k = ȳ^k + 2αMx^{k+1} = λ^k − αz^k + 2αMx^{k+1} .

We also have

    y^{k+1} = \tfrac{1}{2} y^k + \tfrac{1}{2} ŷ^k = λ^k + αMx^{k+1} .

Let us finally show that y k+1 can be expressed as y k+1 = λk+1 +α z k+1 for some
z k+1 ∈ ∂d2 (λk+1 ), so that the assumption stated in the beginning of the k th
iteration is also valid at the (k + 1)st iteration, when we define z k+1 properly.
We have

    λ^{k+1} = \arg\min_y \; \frac{1}{2α}\|y − y^{k+1}\|_2^2 + d_2(y)
            = \arg\min_y \max_z \; \frac{1}{2α}\|y − y^{k+1}\|_2^2 + \langle y, z\rangle − g(z).

Changing the order of min and max, we find

    λ^{k+1} = y^{k+1} − αz^{k+1} = λ^k + α(Mx^{k+1} − z^{k+1}),

where

    z^{k+1} = \arg\max_z \; \frac{1}{2α}\|αz\|_2^2 + \langle y^{k+1} − αz, z\rangle − g(z)
            = \arg\max_z \; \frac{1}{2α}\|αz\|_2^2 + \langle λ^k + αMx^{k+1} − αz, z\rangle − g(z)
            = \arg\min_z \; g(z) − \langle λ^k , z\rangle + \frac{α}{2}\|Mx^{k+1} − z\|_2^2 .
The optimality conditions for the last equation are

0 ∈ ∂g(z k+1 ) − λk + α(z k+1 − M xk+1 )


⇐⇒ λk − α(z k+1 − M xk+1 ) ∈ ∂g(z k+1 )
⇐⇒ λk+1 ∈ ∂g(z k+1 )
⇐⇒ z k+1 ∈ ∂g ∗ (λk+1 ) = ∂d2 (λk+1 ).

ADMM in Terms of the Primal Variables


We can rewrite the iterations solely in terms of xk , z k , λk . This produces the
following algorithm, referred to as the ADMM.
    x^{k+1} = \arg\min_x \; f(x) + \langle λ^k , Mx\rangle + \frac{α}{2}\|Mx − z^k\|_2^2 ,
    z^{k+1} = \arg\min_z \; g(z) − \langle λ^k , z\rangle + \frac{α}{2}\|Mx^{k+1} − z\|_2^2 ,
    λ^{k+1} = λ^k + α(Mx^{k+1} − z^{k+1}).
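Before turning to convergence, here is a minimal numerical sketch (added here, not from the original notes) of these ADMM iterations for the particular choice f(x) = 0.5‖y − x‖² and g(z) = reg·‖z‖₁, i.e., the problem min_x 0.5‖y − x‖² + reg‖Mx‖₁; the data M, y and the parameters reg, α are arbitrary illustrative choices, with M of full column rank.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(8, 5))
y = rng.normal(size=5)
reg, alpha = 0.5, 1.0

x = np.zeros(5)
z = np.zeros(8)
lam = np.zeros(8)
A = np.eye(5) + alpha * M.T @ M            # system matrix of the x-update
for _ in range(300):
    # x-update: quadratic problem solved in closed form
    x = np.linalg.solve(A, y - M.T @ lam + alpha * M.T @ z)
    # z-update: soft-thresholding
    v = M @ x + lam / alpha
    z = np.sign(v) * np.maximum(np.abs(v) - reg / alpha, 0.0)
    # multiplier update
    lam = lam + alpha * (M @ x - z)

print("||Mx - z|| =", np.linalg.norm(M @ x - z))
```

The splitting residual ‖Mx^k − z^k‖ tends to zero, consistent with M x^∗ = z^∗ in the convergence discussion below.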

Convergence of ADMM
Recall that we derived ADMM as an instance of Douglas-Rachford algorithm
applied on the dual problem in (6.38). The iterations we started with, and
their relation to xk , z k , are given below.
    ȳ^k = N_{αd_2}(y^k)                           \big[\, ȳ^k = λ^k − αz^k \,\big]    (6.42a)
    ŷ^k = N_{αd_1}(ȳ^k)                           \big[\, ŷ^k = λ^k − αz^k + 2αMx^{k+1} \,\big]    (6.42b)
    y^{k+1} = \tfrac{1}{2} y^k + \tfrac{1}{2} ŷ^k   \big[\, y^{k+1} = λ^k + αMx^{k+1} \,\big]    (6.42c)
    λ^{k+1} = J_{αd_2}(y^{k+1})                    \big[\, λ^{k+1} = λ^k + α(Mx^{k+1} − z^{k+1}) \,\big]    (6.42d)

The convergence results on the Douglas-Rachford iterations therefore ensure
that in (6.40), y k is convergent. Since the operators Nαd2 , Nαd1 , Jαd2 are con-
tinuous, this in turn implies that ȳ k , ŷ k and λk are also convergent sequences.
Thanks to the relations in (6.42), and the full column rank property of M
we then obtain that xk and z k are also convergent. Let λ∗ , x∗ , z ∗ denote the
corresponding limits.
First, notice that (6.42d) implies

λ∗ = λ∗ + α(M x∗ − z ∗ ).

Thus, we have M x∗ = z ∗ .
Now, using M x∗ = z ∗ , from (6.42c), and (6.42a) we obtain,

Nαd2 (λ∗ + αz ∗ ) = λ∗ − αz ∗ .

But by Prop. 44, this implies that z ∗ ∈ ∂d2 (λ∗ ). In view of (6.39b), equivalently
λ∗ ∈ ∂g(z ∗ ).
By a similar argument, we obtain from (6.42a) that ȳ ∗ = λ∗ − αM x∗ , from
(6.42b) that ŷ = λ∗ + αM x∗ , and

Nαd1 (λ∗ − αM x∗ ) = λ∗ + αM x∗ .

This time, Prop. 44, implies −M x∗ ∈ ∂d1 (λ∗ ). In view of (6.39a), equivalently
−M T λ∗ ∈ ∂f (x∗ ). Using this and the previous observation λ∗ ∈ ∂g(z ∗ ), we
thus can write

0 ∈ ∂f (x∗ ) + M T λ∗
⇐⇒ 0 ∈ ∂f (x∗ ) + M T ∂g(z ∗ )
⇐⇒ 0 ∈ ∂f (x∗ ) + M T ∂g (M x∗ ).


In words, x∗ is a solution of the primal problem (6.37).

6.6 A Generalized Proximal Point Algorithm


Recall that, given a maximal monotone T , the proximal point algorithm con-
sists of

xk+1 = (I + αT )−1 xk .

The sequence xk converges to some x∗ such that 0 ∈ T (x∗ ). Recall that the
convergence of PPA depended on the firm-nonexpansivity of S = (I + αT )−1 ,
which is equivalent to

hS x1 − S x2 , x1 − x2 i ≥ kS x1 − S x2 k22 .

To derive a generalized PPA, suppose M is a positive definite matrix, and
consider the following train of equivalences

0 ∈ T (x),
⇐⇒ M x ∈ M x + αT (x),
⇐⇒ x = (M + αT )−1 M x.

Note that, the last line assumes that (M + αT ) has an inverse. This is indeed
the case, since M can be written as M = cI + U for some positive definite
matrix U and c > 0. Thanks to positive definiteness, U is maximal monotone.
Consequently, U + αT is also maximal monotone and we can then resort to
Minty’s theorem and Prop. 42 to conclude that (M + αT ) = c I + (U + αT )
has a well-defined inverse. We also remark at this point that M does not have
to be symmetric.
We will study the operator (M + αT )−1 M in a modified norm.

Lemma 3. Suppose M is a positive definite matrix. Then, the mapping


hx, yiM = hx, M yi defines an inner product.

Proof. To be added...
In the following, we denote the induced norm \sqrt{\langle \cdot, \cdot\rangle_M} by \|\cdot\|_M .

Proposition 50. Suppose M is positive definite and T is maximal monotone


with respect to the inner product h·, ·iI . Then, the operator S = (M +αT )−1 M
is firmly-nonexpansive with respect to the inner product h·, ·iM . That is,

hSx − Sy, x − yiM ≥ kSx − Syk2M .

Proof. Without loss of generality, take α = 1. Since the range of M + T is the


whole space, and M is invertible, for any yi we can find xi and ui ∈ T (xi ) such
that M yi = M xi + ui . Notice that xi = Syi . But then,

hSy1 − Sy2 , y1 − y2 iM = hx1 − x2 , x1 − x2 iM + hx1 − x2 , M −1 (u1 − u2 )iM


= kx1 − x2 k2M + hx1 − x2 , u1 − u2 iI
≥ kx1 − x2 k2M .

To be added : generalization of the Krasnoselskii-Mann theorem...


In summary, the generalized PPA consists of the following iterations

xk+1 = (M + αT )−1 M xk .

Application to a Saddle Point Problem
Consider a problem of the form

    \min_x \max_{z ∈ B} \; f(x) + \langle Ax, z\rangle,

where B is a closed, convex set, and f is a convex function. Rewriting, we


obtain an equivalent problem as,

    \min_x \max_z \; L(x, z) = f(x) + \langle Ax, z\rangle − i_B(z).

Observe that L(x, z) is a convex-concave function. For such functions, the
following operator replaces the subdifferential:

    T(x, z) = \begin{bmatrix} ∂_x L(x, z) \\ ∂_z\big(−L(x, z)\big) \end{bmatrix}
            = \begin{bmatrix} ∂f(x) + A^T z \\ ∂i_B(z) − Ax \end{bmatrix} .    (6.43)

Proposition 51. The operator T defined in (6.43) is maximal monotone.

Proof. Let us first show monotonicity. Observe that

    \left\langle T(x_1, z_1) − T(x_2, z_2), \begin{bmatrix} x_1 \\ z_1 \end{bmatrix} − \begin{bmatrix} x_2 \\ z_2 \end{bmatrix} \right\rangle
      = \left\langle \begin{bmatrix} ∂f(x_1) \\ ∂i_B(z_1) \end{bmatrix} − \begin{bmatrix} ∂f(x_2) \\ ∂i_B(z_2) \end{bmatrix},
                     \begin{bmatrix} x_1 − x_2 \\ z_1 − z_2 \end{bmatrix} \right\rangle
        + \underbrace{\begin{bmatrix} x_1 − x_2 \\ z_1 − z_2 \end{bmatrix}^T
                      \begin{bmatrix} 0 & A^T \\ −A & 0 \end{bmatrix}
                      \begin{bmatrix} x_1 − x_2 \\ z_1 − z_2 \end{bmatrix}}_{=0}
      ≥ 0.

Proof of maximality to be added...

If we apply PPA for obtaining a zero of T (x, z) defined in (6.43), the resulting
iterations are complicated. Consider now a generalized PPA (GPPA) with the
choice

    M = \begin{bmatrix} I & αA^T \\ αA & I \end{bmatrix} .

Observe that M is positive definite if α2 σ(AT A) < 1. In order to apply GPPA,


we need an expression for (M + αT)^{−1} . Notice that

    (M + αT)\begin{bmatrix} x \\ z \end{bmatrix}
      = \begin{bmatrix} I & αA^T \\ αA & I \end{bmatrix}\begin{bmatrix} x \\ z \end{bmatrix}
        + α\begin{bmatrix} ∂f & A^T \\ −A & ∂i_B \end{bmatrix}\begin{bmatrix} x \\ z \end{bmatrix}
      = \begin{bmatrix} I + α∂f & 2αA^T \\ 0 & I + α∂i_B \end{bmatrix}\begin{bmatrix} x \\ z \end{bmatrix} .

Thus,

    \begin{bmatrix} x \\ z \end{bmatrix} ∈ (M + αT)\begin{bmatrix} x̂ \\ ẑ \end{bmatrix}

means

    x ∈ (I + α∂f)x̂ + 2αA^T ẑ,    (6.44a)
    z ∈ (I + α∂i_B)ẑ.    (6.44b)

Solving (6.44b) and plugging the result back into (6.44a), we obtain the expressions
for x̂ and ẑ as

    ẑ = P_B(z),
    x̂ = J_{αf}(x − 2αA^T ẑ),

where P_B = J_{αi_B} denotes the projection operator onto B and J_{αf} is the proximity
operator of f .
Observe also that

    M\begin{bmatrix} x \\ z \end{bmatrix} = \begin{bmatrix} x + αA^T z \\ αAx + z \end{bmatrix} .

Therefore, the GPPA for this problem is

    z^{k+1} = P_B(z^k + αAx^k),
    x^{k+1} = J_{αf}\big(x^k + αA^T z^k − 2αA^T z^{k+1}\big) = J_{αf}\big(x^k − αA^T(2z^{k+1} − z^k)\big).

By the analysis of GPPA, we can state that this algorithm converges if α^2 σ(A^T A) < 1.

Application to an Analysis Prior Problem


Consider now the problem
    \min_x \; \frac{1}{2}\|y − Hx\|_2^2 + λ\|Ax\|_1 .
By making use of the dual expression of the `1 norm, we can express this
problem as a saddle point problem. For this, recall that,

    \|x\|_1 = \max_{z ∈ B_∞} \langle x, z\rangle = \max_z \; \langle x, z\rangle − i_{B_∞}(z),

where B∞ denotes the unit ball of the `∞ norm. The equivalent saddle problem
is,
    \min_x \max_z \; \underbrace{\frac{1}{2}\|y − Hx\|_2^2}_{f(x)} + λ\langle Ax, z\rangle − λ\, i_{B_∞}(z).

This is exactly the same form considered above. The choice

    M = \begin{bmatrix} I & −αA^T \\ −αA & I \end{bmatrix}

leads to the iterations

    x^{k+1} = J_{αf}(x^k − αA^T z^k),
    z^{k+1} = P_{B_∞}\big(z^k + αA(2x^{k+1} − x^k)\big).


These are the iterations proposed by Chambolle and Pock.
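Here is a minimal numerical sketch (added here, not from the original notes) of these primal-dual iterations for a one-dimensional denoising instance: H = I, A a first-difference matrix, and the regularization weight absorbed into the analysis operator (λ‖Ax‖₁ = ‖(λA)x‖₁), so K = λA plays the role of A in the iterations. All data and parameters are arbitrary illustrative choices.

```python
import numpy as np

n = 50
y = np.concatenate([np.zeros(25), np.ones(25)]) \
    + 0.1 * np.random.default_rng(3).normal(size=n)   # noisy piecewise-constant signal
A = np.diff(np.eye(n), axis=0)                         # first differences, shape (n-1, n)
lam = 0.5
K = lam * A                                            # weight absorbed into the operator
alpha = 0.9 / np.linalg.norm(K, 2)                     # ensures alpha^2 * sigma(K^T K) < 1

def prox_f(v, alpha):
    # J_{alpha f} for f(x) = 0.5 * ||y - x||^2 (H = I):
    # minimizer of (1/(2*alpha)) * ||x - v||^2 + 0.5 * ||y - x||^2
    return (v + alpha * y) / (1.0 + alpha)

x = np.zeros(n)
z = np.zeros(n - 1)
for _ in range(500):
    x_new = prox_f(x - alpha * K.T @ z, alpha)
    z = np.clip(z + alpha * K @ (2 * x_new - x), -1.0, 1.0)  # projection onto B_inf
    x = x_new

print("data residual:", np.linalg.norm(x - y))
```

The reflection step 2x^{k+1} − x^k in the z-update is exactly the term (2x^{k+1} − x^k) appearing in the iterations above.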
