Elements of Convex Optimization Theory
Costis Skiadas
August 2015
1 Vector Space
A (real) vector or linear space is a set X, whose elements are called vectors or points, together with two operations, addition and multiplication by scalars, that satisfy the following conditions, for all vectors x, y, z and real numbers α, β:

1. There exists a zero vector 0 such that x + 0 = x, and for each x a vector −x such that x + (−x) = 0; moreover, 1x = x.

2. x + y = y + x, x + (y + z) = (x + y) + z and α(βx) = (αβ)x.

3. α(x + y) = αx + αy and (α + β)x = αx + βx.
It follows that the zero vector is unique, and for each x in X, the vector −x is unique. We write x − y = x + (−y).
An underlying vector space X is taken as given throughout this appendix. Although the term “vector space X” is common, it should be emphasized that a vector space specification includes not only a set of vectors but also the rules for adding vectors and multiplying by scalars.
Example 2 Given an underlying probability space, the set of all random variables of finite variance is a vector space, with the usual rules for adding and scaling random variables state by state.
A subset L of X is a linear (or vector) subspace if it is a vector space itself or, equivalently, if for all x, y ∈ L and α ∈ R, x + y and αx are elements of L. The linear subspace generated or spanned by a set of vectors S, denoted by span(S), is the intersection of all linear subspaces that include S. Alternatively, the span of a set can be constructed from within: For a finite set of vectors S = {x1, ..., xn}, the set span(S), also denoted by span(x1, ..., xn), consists of all linear combinations of the form α1 x1 + ··· + αn xn, where α1, ..., αn ∈ R. For every set of vectors S, span(S) = ∪{span(F) : F is a finite subset of S}.
A set of vectors S is linearly independent if every x ∈ span(S) has a unique representation of the form x = α1 x1 + ··· + αn xn, for some α1, ..., αn ∈ R and x1, ..., xn ∈ S. It follows easily that the set of vectors S is linearly independent if and only if for all x1, ..., xn ∈ S and α1, ..., αn ∈ R,

α1 x1 + ··· + αn xn = 0 implies α1 = ··· = αn = 0.
A basis of X is a linearly independent set of vectors S that generates X, that is, span(S) =
X.
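For readers who like to experiment, linear independence of finitely many vectors in R^d can be spot-checked numerically: stack the vectors as the rows of a matrix and compare its rank with the number of vectors. The following sketch (illustrative Python with NumPy; the particular vectors are assumptions made for the example, not from the text) tests the displayed implication:

    import numpy as np

    def linearly_independent(vectors):
        # The vectors are independent exactly when the matrix they form has
        # full row rank, i.e., when a1 x1 + ... + an xn = 0 forces a1 = ... = an = 0.
        A = np.vstack(vectors)
        return np.linalg.matrix_rank(A) == len(vectors)

    x1, x2 = np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])
    print(linearly_independent([x1, x2]))            # True
    print(linearly_independent([x1, x2, x1 + x2]))   # False: x1 + x2 is a linear combination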
A vector space is finite dimensional if it has a finite basis and infinite dimensional otherwise. Example 2 with an infinite state space motivates our general interest in infinite-dimensional vector spaces. In this text, every state space is assumed to be finite, and therefore random variables can be encoded as finite-dimensional Euclidean vectors. We therefore focus on the finite-dimensional case, but as noted in the introductory remarks,
from a perspective that allows direct extensions to contexts such as that of Example 2 with an infinite state space. Every basis of a finite-dimensional vector space has the same number of elements, called the space’s dimension. The vector space {0} has, by definition, dimension zero.
If X has finite dimension d, we represent a basis {B1, ..., Bd} of X as a column matrix

B = (B1, ..., Bd)′,

and we write x̃ to denote the row vector in R^d that represents the point x in X relative to the given basis B, meaning that

x = x̃B = x̃1 B1 + ··· + x̃d Bd.
Given a linear functional f : X → R, we write f(B) = (f(B1), ..., f(Bd))′ for the column matrix that lists the values that f assigns to the basis elements. The single vector f(B) in R^d determines the entire function f, since x = x̃B implies f(x) = x̃ f(B).
A subset C of X is convex if for all x, y ∈ C, α ∈ (0, 1) implies αx + (1 − α)y ∈ C. The function f : D → R, where D ⊆ X, is convex if D is a convex set and for all x, y ∈ D, α ∈ (0, 1) implies f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y). The function f is concave if −f is convex. A subset C of X is a cone if for all x ∈ C, α ∈ R_+ implies αx ∈ C. One can easily check that a cone C is convex if and only if x, y ∈ C implies x + y ∈ C, and that a convex cone C is a linear subspace if and only if x ∈ C implies −x ∈ C.
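As a quick numerical illustration of the convexity inequality (a Python sketch; the function f(x) = ∥x∥² and the random test points are assumptions made for the example):

    import numpy as np

    f = lambda x: float(x @ x)            # f(x) = ||x||^2 is convex on R^2

    rng = np.random.default_rng(0)
    ok = True
    for _ in range(1000):
        x, y = rng.normal(size=2), rng.normal(size=2)
        a = rng.uniform()                 # weight a in [0, 1)
        # Spot-check f(a x + (1 - a) y) <= a f(x) + (1 - a) f(y), up to rounding.
        ok &= f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + 1e-12
    print(ok)                             # True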
An important type of convex set is a linear manifold. Given any subset S of X and vector x, we write x + S = S + x = {x + s : s ∈ S} to denote the translation of S by x.
A subset M of X is a linear manifold if some translation of M is a linear subspace. The
dimension of a linear manifold M is that of the linear subspace to which M translates.
An exercise shows:
2 Inner Product
A widely applicable form of optimization can be thought of as a projection operation.
Informally, the projection of a point x onto a convex set S is the point of S that is closest
to x (assuming such a point exists). The simplest nontrivial instance of this problem is the
case in which S is a line. Suppose that S = span(y) and y has unit “length.” The scalar s such that sy is the projection of x onto S is the inner product of x and y, denoted by (y | x). Note that the functional (y | ·) is linear. Defining (αy | x) = α(y | x) for all α ∈ R, it is not hard to see that the function (· | ·) is bilinear (linear in each argument), symmetric ((x | y) = (y | x)) and positive definite ((x | x) > 0 for x ≠ 0). With this geometric motivation, we take the notion of an inner product as a primitive object satisfying certain axioms, and (in later sections) we use inner products to characterize projections on convex sets more generally.
An inner product on X is a function (· | ·) : X × X → R satisfying the following conditions, for all x, y, z ∈ X and α ∈ R:

1. (x + y | z) = (x | z) + (y | z) and (αx | y) = α(x | y).

2. (x | y) = (y | x).

3. (x | x) > 0 for x ≠ 0.
An inner product space is a vector space together with an inner product on this space.
Example 7 Suppose X is the set of all random variables on some finite probability space in which every state is assigned a positive probability mass. Then (x | y) = E[xy] defines an inner product. If Y is a linear subspace of X that does not contain the constant random variables, then (x | y) = cov[x, y] = E[xy] − E[x]E[y] defines an inner product on Y but not on X (why?).
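To make Example 7 concrete, the following sketch (illustrative Python; the three-state probability space and the particular random variables are assumptions made for the example) encodes random variables as vectors and checks the relevant properties numerically:

    import numpy as np

    p = np.array([0.5, 0.3, 0.2])        # state probabilities, all positive

    E = lambda x: float(p @ x)           # expectation on the finite state space
    ip = lambda x, y: E(x * y)           # (x | y) = E[xy]
    cov = lambda x, y: E(x * y) - E(x) * E(y)

    x = np.array([1.0, -2.0, 3.0])
    y = np.array([0.5, 0.0, 1.0])

    print(ip(x, y) == ip(y, x))          # symmetry
    print(ip(x, x) > 0)                  # positive definiteness for x != 0

    c = np.ones(3)                       # a nonzero constant random variable
    print(cov(c, c))                     # 0.0: cov fails positive definiteness on X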
We henceforth take as given the inner product space (X, (· | ·)), which in the remainder of this section is assumed to be finite dimensional.
It will be convenient to extend the inner product notation to matrices of vectors, using the usual matrix addition and multiplication rules. In particular, given a column matrix of vectors B = (B1, ..., Bd)′ and any vector x, we write

(x | B′) = ((x | B1), ..., (x | Bd)),

and we write (B | B′) for the d × d matrix whose (i, j) entry is (Bi | Bj).
The matrix (B | B′) is known as the Gram matrix of B and plays a crucial role in the computation of projections. The following proposition shows that Gram matrices can be used to convert abstract inner-product notation to concrete expressions involving only matrices of scalars.
(b) The symmetry of the inner product implies that (B | B′) is a symmetric matrix. For any row vector λ ∈ R^d, λ(B | B′)λ′ = (λB | λB). By the positive definiteness of the inner product it follows that λ(B | B′)λ′ ≥ 0, with equality holding if and only if λB = 0. Since B is a basis, λB = 0 if and only if λ = 0. This proves that (B | B′) is a positive definite matrix.

(c) By the linearity of the inner product, x = x̃B implies (x | B′) = x̃(B | B′). By part (b), (B | B′) is invertible, and the claimed expression x̃ = (x | B′)(B | B′)⁻¹ follows.
In particular, for all x = x̃B and y = ỹB in X,

f(y) = f(B)′ỹ′ and (x | y) = x̃(B | B′)ỹ′.
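As a numerical illustration of these formulas (a Python sketch; the choice X = R³ with the Euclidean inner product and the particular non-orthogonal basis are assumptions made for the example):

    import numpy as np

    # Basis vectors B1, B2, B3 of R^3, stacked as the rows of B.
    B = np.array([[1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [1.0, 1.0, 1.0]])

    gram = B @ B.T                       # (B | B'): the Gram matrix

    x = np.array([2.0, -1.0, 3.0])
    y = np.array([0.0, 1.0, 1.0])

    # Coordinates relative to B, computed as x~ = (x | B')(B | B')^{-1}.
    x_c = (B @ x) @ np.linalg.inv(gram)
    y_c = (B @ y) @ np.linalg.inv(gram)

    print(np.allclose(x_c @ B, x))                 # x = x~ B
    print(np.allclose(x @ y, x_c @ gram @ y_c))    # (x | y) = x~ (B | B') y~'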
3 Norm
A norm on the vector space X is a function of the form ∥·∥ : X → R that satisfies the following properties for all x, y ∈ X and α ∈ R:

1. ∥x + y∥ ≤ ∥x∥ + ∥y∥ (the triangle inequality).

2. ∥αx∥ = |α| ∥x∥.

3. ∥x∥ ≥ 0, with equality holding if and only if x = 0.

Here ∥x∥ represents the value that the norm ∥·∥ assigns to x, referred to simply as the norm of x. It is easy to see that the triangle inequality holds for all x, y ∈ X if and only if

|∥x∥ − ∥y∥| ≤ ∥x − y∥ for all x, y ∈ X. (1)
In the current context, we assume that the norm ∥·∥ is induced by the underlying inner product (· | ·), meaning that

∥x∥ = √(x | x), x ∈ X. (2)

We will see shortly that ∥·∥ so defined is indeed a norm. We think of ∥x∥ as the length of x, in the sense used in our earlier informal motivation of inner products. Orthogonality of two vectors x and y, defined by the condition (x | y) = 0, can be characterized entirely in terms of the induced norm.
Proposition 10 The vectors x and y are orthogonal if and only if they satisfy the Pythagorean identity

∥x + y∥² = ∥x∥² + ∥y∥².
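The proposition follows from a one-line expansion: bilinearity of the inner product gives, for all x, y ∈ X,

∥x + y∥² = (x + y | x + y) = ∥x∥² + 2(x | y) + ∥y∥²,

so the Pythagorean identity holds exactly when (x | y) = 0.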
Two vectors x and y are said to be colinear if either x = αy or y = αx is true for some α ∈ R.
Proof. The Cauchy–Schwarz inequality holds trivially as an equality if either x or y is zero. Suppose x and y are nonzero, and let x̂ = x/∥x∥ and ŷ = y/∥y∥. Visualizing the vector (x̂ | ŷ)ŷ as the projection of x̂ on the line spanned by ŷ, we note that x̂ − (x̂ | ŷ)ŷ is orthogonal to ŷ. Indeed,

(x̂ − (x̂ | ŷ)ŷ | ŷ) = (x̂ | ŷ) − (x̂ | ŷ)(ŷ | ŷ) = 0.

By the Pythagorean identity,

0 ≤ ∥x̂ − (x̂ | ŷ)ŷ∥² = 1 − (x̂ | ŷ)²,

and therefore (x̂ | ŷ)² ≤ 1, which is the Cauchy–Schwarz inequality, with equality holding if and only if x̂ = (x̂ | ŷ)ŷ, that is, if and only if x and y are colinear.
4 Convergence
A sequence {xn} = (x1, x2, ...) of points in X converges to the limit x ∈ X if for every ε > 0, there exists an integer Nε such that n > Nε implies ∥xn − x∥ < ε. In this case, the sequence {xn} is said to be convergent. A subset S of X is closed if every convergent sequence in S converges to a point in S. A subset S of X is open if its complement X \ S is closed. The set of all open subsets of X is known as the topology of X.
The following properties of closed and open sets can be easily confirmed.²
set and X are both open and closed. The union of …nitely many closed sets is closed, and
the intersection of …nitely many open sets is open. Arbitrary intersections of closed sets
are closed, and arbitrary unions of open sets are open.
A sequence {xn} of points in X is Cauchy if for every ε > 0, there exists an integer Nε such that m, n > Nε implies ∥xm − xn∥ < ε. A subset S of X is complete if every Cauchy sequence in S converges to a limit in S. It is immediate from the definitions that a subset of a complete set is complete if and only if it is closed. The triangle inequality implies that every convergent sequence is Cauchy. Exercise 4 shows that the converse need not be true for an arbitrary inner product space. Intuitively, a Cauchy sequence should converge to something, but if that something is not within the space X, then the sequence is not convergent. As we will see shortly, difficulties of the sort do not arise in finite-dimensional spaces.

² Just as a norm need not be induced by an inner product, a topology, defined as a set of subsets of X with the stated properties, need not be defined by a norm in a more general theory of convergence, which is not needed in this text.
The following proposition shows that if X is finite dimensional, the convergence or Cauchy property of a sequence is equivalent to the respective property of the sequence’s coordinates relative to any given basis.
A Hilbert space is any inner product space that is complete (relative to the norm
induced by the inner product). One of the fundamental properties of the real line is that
it is complete. Given this fact, the last proposition implies:
Let S be any subset of X. The closure of S is the set

S̄ = {x ∈ X : B(x, ε) ∩ S ≠ ∅ for every ε > 0}, where B(x, ε) = {y ∈ X : ∥x − y∥ < ε},

the interior of S is the set

S⁰ = {x ∈ X : B(x, ε) ⊆ S for some ε > 0},

and the boundary of S is the set S̄ \ S⁰. A vector x is a closure point of the set S if and only if there exists a sequence in S that converges to x. Therefore, the set S is closed if and only if S = S̄, and it is open if and only if S = S⁰. Finally, note that closures and interiors can be described in purely topological terms: The closure of a set is the intersection of all its closed supersets, and the interior of a set is the union of all its open subsets.
5 Continuity
A function f : D → R, where D ⊆ X, is continuous at x ∈ D if for any sequence {xn} in D converging to x, the sequence {f(xn)} converges to f(x). The function f is continuous if it is continuous at every point of its domain D. It is straightforward to check that f : D → R is continuous at x ∈ D if and only if given any ε > 0, there exists some δ > 0 (depending on ε) such that y ∈ D ∩ B(x, δ) implies |f(y) − f(x)| < ε. Based on last section’s discussion, we note that if X is finite dimensional, the continuity of f at a point is true or not independently of the inner product (or norm) used to define the topology.

Inequality (1) shows that the underlying norm is a continuous function. The inner product (· | ·) is also continuous, in the following sense.
Proposition 14 Suppose {xn} and {yn} are sequences in X converging to x and y, respectively. Then {(xn | yn)} converges to (x | y).
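A short argument, sketched here for completeness, combines bilinearity with the Cauchy–Schwarz inequality and the fact that convergent sequences are bounded:

|(xn | yn) − (x | y)| ≤ |(xn − x | yn)| + |(x | yn − y)| ≤ ∥xn − x∥ ∥yn∥ + ∥x∥ ∥yn − y∥ → 0.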
Consider a linear functional f : X → R. If X is finite dimensional, then it has a Riesz representation z (by Proposition 9). An application of the Cauchy–Schwarz inequality shows that |f(x) − f(y)| ≤ ∥z∥ ∥x − y∥ for all x, y ∈ X, and hence the continuity of f. The remainder of this section extends this argument to concave functions, showing in particular that in a finite-dimensional space, a concave function over an open domain is continuous.
Consider any concave function f : D → R, where D is an open subset of X. We introduce two properties of f, which turn out to be equivalent to the continuity of f. We say that f is locally bounded below if given any x ∈ D, there exists a small enough radius r > 0 such that the infimum of f over the ball B(x, r) is finite. We say that f is locally Lipschitz continuous if given any x ∈ D, there exist a constant K and a ball B(x, r) ⊆ D such that

|f(y) − f(z)| ≤ K∥y − z∥ for all y, z ∈ B(x, r).
Proof. Given any x ∈ D, let b denote a lower bound of f on B(x, r) ⊆ D. Fixing any y ∈ B(x, r), let u = (y − x)/∥y − x∥ and φ(θ) = f(x + θu) for θ ∈ [−r, r]. Consider the following claim, where the first equality defines K:
The preceding lemma (and remark) applies to an arbitrary inner product space X. Further restricting X to be finite-dimensional, we can verify that a concave function is locally bounded below, thus proving its continuity.
Proof. Suppose first that X = R^d with the standard Euclidean inner product. We fix any x ∈ D and show that f is bounded below near x. An exercise shows that x is contained in the interior of a set of the form

[α, β] = {x ∈ X : xi ∈ [αi, βi], i = 1, ..., d} ⊆ D,

where α, β ∈ R^d and αi < βi for all i. Since f is concave, it is minimized over [α, β] at some extreme point of [α, β], that is, a point x̄ such that x̄i ∈ {αi, βi} for all i. To see why, take any x ∈ [α, β] and any k ∈ {1, ..., d}, and define the points x⁻ and x⁺ by xi⁻ = xi⁺ = xi for i ≠ k, xk⁻ = αk and xk⁺ = βk. Then for some θ ∈ [0, 1], x = θx⁻ + (1 − θ)x⁺. Concavity implies min{f(x⁻), f(x⁺)} ≤ f(x). We can therefore replace x by one of x⁻ or x⁺ without increasing its value under f. Repeating this process for all coordinates shows that for every x ∈ [α, β], there exists some extreme point x̄ of [α, β] such that f(x̄) ≤ f(x). Since [α, β] has only finitely many extreme points, f is bounded below on [α, β]. Continuity of f at x follows from Lemma 15.
Finally, suppose X is an arbitrary finite-dimensional space and let B be any basis. We have seen in Proposition 12 that whether f is continuous or not does not depend on what inner product we endow X with. We can therefore choose the inner product of Example 6, which makes B an orthonormal basis. Identifying each element x = x̃B ∈ X with its basis representation x̃, the above argument applies.
The inequality (z | s − s̄) ≥ 0 can be visualized as the requirement that the vectors z and s − s̄ form an acute angle. Theorem 19(a) below characterizes projections on a convex set S based on the intuitive geometric idea that a point xS in S is the projection on S of a point x outside S if and only if for any s ∈ S the vectors xS − x and s − xS form an acute angle. The theorem also addresses the uniqueness and existence of a projection on a convex set. Clearly, if S is not closed then a projection of a point on S may not exist. For example, on the real line, the projection of zero on the open interval (1, 2) does not exist. Exercise 4 further shows that in an infinite-dimensional space, S can be closed and convex, and yet the projection of zero on S may still not exist. The key issue in this case is the absence of completeness of the ambient space X.
The central result on projections on convex sets follows. The theorem and its proof apply to any inner product space (not necessarily a finite-dimensional one).
∥x − s∥² = ∥x − xS∥² + ∥s − xS∥² + 2(x − xS | xS − s) ≥ ∥x − xS∥² + ∥s − xS∥².
(c) Apply part (b) with x = y.
(d) Let δ = inf{∥x − s∥ : s ∈ S}, and choose a sequence {sn} in S such that {∥x − sn∥} converges to δ. Direct computation (see Exercise 2) shows

∥sm − sn∥² = 2∥x − sm∥² + 2∥x − sn∥² − 4∥x − (sm + sn)/2∥²
≤ 2∥x − sm∥² + 2∥x − sn∥² − 4δ²,
7 Orthogonal Projections
This section elaborates on the important special case of projections on linear manifolds. A vector x is orthogonal to the linear manifold M if x is orthogonal to m1 − m2 for all m1, m2 ∈ M. The subspace orthogonal to M, denoted by M⊥, is the linear subspace of all vectors that are orthogonal to M. Note that M⊥ = (x + M)⊥ for every x ∈ X, and a vector supports M at some point if and only if it is orthogonal to M. Theorem 19(a) applied to linear manifolds reduces to the following result.
L⊥ = {y ∈ X : (B | y) = 0}. (5)
Finally, L?? = L:
Proof. Let xL = x̃L B, where x̃L = (x | B′)(B | B′)⁻¹. Clearly, xL ∈ L. Moreover, x − xL ∈ L⊥, since, for any y = ỹB ∈ L,

(xL | y) = x̃L(B | B′)ỹ′ = (x | B′)(B | B′)⁻¹(B | B′)ỹ′ = (x | y).
Proof. Fixing any m ∈ M, note that M − m is the linear subspace L⊥ of equation (5), and therefore M⊥ = L⊥⊥ = L = span(B). The point xM is the projection of x on M if and only if xM − m is the projection of x − m on L⊥. We can therefore apply Proposition 22 to conclude that the projection xM of x on M exists and

xM − m = (x − m) − (x − m | B′)(B | B′)⁻¹B.
If the linear subspace L has a finite orthogonal basis B, then (B | B′) is diagonal and formula (4) for the projection of x on L reduces to

xL = Σ_{i=1}^k [(x | Bi)/(Bi | Bi)] Bi. (6)
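The projection formulas translate directly into matrix computations. The following sketch (illustrative Python; the choice X = R⁴ with the Euclidean inner product and the particular subspace are assumptions made for the example) computes a projection via the Gram matrix and checks it against formula (6) after orthogonalizing the basis:

    import numpy as np

    # Two basis vectors of a subspace L of R^4, stacked as the rows of B.
    B = np.array([[1.0, 1.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0, 0.0]])
    x = np.array([1.0, 2.0, 3.0, 4.0])

    # Formula (4): x_L = (x | B')(B | B')^{-1} B.
    x_L = (B @ x) @ np.linalg.inv(B @ B.T) @ B
    print(np.allclose(B @ (x - x_L), 0.0))     # x - x_L is orthogonal to L

    # Formula (6), after a Gram-Schmidt step makes the basis orthogonal.
    B1 = B[0]
    B2 = B[1] - (B[1] @ B1) / (B1 @ B1) * B1
    x_L6 = (x @ B1) / (B1 @ B1) * B1 + (x @ B2) / (B2 @ B2) * B2
    print(np.allclose(x_L, x_L6))              # the two formulas agree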
Formula (4) is the same as the Riesz representation expression of Proposition 9 with f(y) = (x | y). The reason should be clear given the following general relationship between orthogonal projections and Riesz representations (whose proof is immediate from the definitions).
We use orthogonal projections to show that in any Hilbert space the Riesz representation of a continuous linear functional exists. The argument is redundant in the finite-dimensional case, since the claim was established in Proposition 9, but still worth reviewing.
8 Compactness
In this section we show that a continuous function achieves a maximum and a minimum over any compact set. While the result applies to any normed space, it is mostly useful in the finite-dimensional case, in which a set is compact if and only if it is closed and bounded.

A subset S of X is (sequentially) compact if every sequence in S has a subsequence that converges to a vector in S. This definition immediately implies that a compact set is complete and therefore closed. A compact set S is also bounded, meaning that sup{∥s∥ : s ∈ S} < ∞. (If S were unbounded, there would exist a sequence {sn} in S such that ∥sn∥ > n for all n, a condition that precludes the existence of a convergent subsequence.)
Proof. As noted above, it is generally true that a compact set is closed and bounded. We prove the converse, relying on the assumption that X is finite dimensional. By Proposition 12 and Example 6, it suffices to show the result for X = R^d (why?). Let {sn} = {(s_n^1, ..., s_n^d)} be a sequence in a closed bounded subset S of R^d. Then the first coordinate sequence {s_n^1} lies in some bounded interval I. Select a half-interval I1 of I that contains infinitely many points of {s_n^1} and let s_{n1}^1 be one of these points. Then select a half-interval I2 of I1 that contains infinitely many points of {s_n^1 : n > n1} and let s_{n2}^1 be one of these points. Continuing in this manner, we obtain a nested sequence of intervals {Ik} whose length shrinks to zero and a corresponding subsequence {s_{nk}^1} with s_{nk}^1 ∈ Ik for all k. Clearly, the subsequence {s_{nk}^1} is Cauchy and therefore convergent. Repeating the argument we can extract a further subsequence for the second coordinate, then the third, and so on. This process generates a convergent subsequence of {sn} whose limit point must be in S, since S is assumed closed.
Proposition 29 Suppose that S is a compact subset of X and the function f : S → R is continuous. Then there exist s∗, s∗∗ ∈ S such that

f(s∗) ≤ f(s) ≤ f(s∗∗) for all s ∈ S.

Proof. Let {sn} be a sequence such that limₙ f(sn) = sup f. By the compactness of S, there exists a subsequence of {sn} converging to some s∗∗ ∈ S. Since f is continuous, f(s∗∗) = sup f and therefore f(s∗∗) ≥ f(s) for all s ∈ S. The same argument applied to −f completes the proof.
Proof. Suppose A is bounded and therefore compact. Select r > 0 large enough so that the set C = B ∩ {y : ∥x − y∥ ≤ r for some x ∈ A} is nonempty. Note that C is also compact (why?). Let {(an, bn)} be any sequence in A × C such that ∥an − bn∥ converges to δ = inf{∥x − y∥ : (x, y) ∈ A × C}. We can then extract a subsequence {(ank, bnk)} that converges to some (a, b) ∈ A × C. By the triangle inequality,
conjunction with convexity properties. For example, a weak compactness argument can be used to prove that in any Hilbert space, if the nonempty set S is convex, closed and bounded, and the function f : X → R is concave over S and continuous at every point of S, then f achieves a maximum in S. Suitable references are provided in the endnotes.
9 Supporting Hyperplanes
A hyperplane H is a linear manifold whose orthogonal subspace is of dimension one. If
H ? is spanned by the vector y and = (y j x) ; where x is any point in H; then
H = fx : (y j x) = g : (7)
This expression characterizes all hyperplanes as y ranges over the set of all nonzero vectors
and ranges over the set of all scalars.
According to Definition 18, the vector y supports the set S ⊆ X at s̄ ∈ S if and only if (y | s̄) ≤ (y | s) for all s ∈ S. Letting α = (y | s̄) in (7), this condition can be visualized as S being included in the half-space {x : (y | x) ≥ α} on the one side of the hyperplane H, while touching H at s̄. The following is an extension of Definition 18, which can be visualized the same way, but does not require s̄ to be an element of S.
It is intuitively compelling that one should be able to support a convex set at any point of its boundary. The supporting hyperplane theorem, proved below, shows that this is indeed true for a finite-dimensional space. In another example in which finite-dimensional intuition need not apply in infinite-dimensional spaces, Exercise 16 shows that in an infinite-dimensional space it may not be possible to support a convex set at a point of its boundary.
Proof. Since x is not interior, we can construct a sequence {xn} of vectors such that xn ∉ S for all n and xn → x as n → ∞. For each n, let sn be the projection of xn on S and define yn = (sn − xn)/∥sn − xn∥. The dual characterization of projections gives (yn | sn) ≤ (yn | s) for all s ∈ S. By Theorem 19(b), the sequence {sn} converges to the projection s̄ of x on S. The sequence {yn} lies in the closure of the unit ball, which is compact, and therefore we can extract a subsequence {ynk} that converges to some vector y of unit norm. By the continuity of inner products, we conclude that (y | s̄) ≤ (y | s) for all s ∈ S. If x is on the boundary of S, then x = s̄ is the limit of some sequence {sn} in S. Therefore, {(y | sn)} converges to (y | s̄), implying (8). If x is not on the boundary of S, then {ynk} converges to y = (s̄ − x)/∥s̄ − x∥. Since (y | s̄ − x) > 0, it follows that (y | x) < (y | s̄) ≤ (y | s) for all s ∈ S.
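The construction in this proof is easy to visualize numerically. In the following sketch (illustrative Python; the convex set is assumed to be the closed unit disk in R², whose projection map has the closed form z/∥z∥ for points outside the disk), exterior points xn approach the boundary point x = (1, 0), and the normalized directions yn = (sn − xn)/∥sn − xn∥ recover a supporting vector:

    import numpy as np

    def project_to_disk(z):
        # Projection on the closed unit disk in R^2.
        r = np.linalg.norm(z)
        return z if r <= 1.0 else z / r

    x = np.array([1.0, 0.0])                  # boundary point of the disk
    for t in [1e-1, 1e-2, 1e-3]:
        xn = x + np.array([t, 0.0])           # exterior points with xn -> x
        sn = project_to_disk(xn)
        yn = (sn - xn) / np.linalg.norm(sn - xn)
        print(yn)                             # [-1.  0.] in every case

    # y = (-1, 0) supports the disk at x: (y | s) = -s[0] >= -1 = (y | x)
    # for every s in the disk.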
The subgraph of f is the set

sub(f) = {(x, α) ∈ X × R : α ≤ f(x)}. (10)

With K chosen so that K < f on a ball B(0, ε) ⊆ D, any y ∈ ∂f(0) satisfies the supergradient inequality

K < f(−(ε/∥y∥)y) ≤ f(0) + (y | −(ε/∥y∥)y) = f(0) − ε∥y∥,

which results in the bound ∥y∥ < (f(0) − K)/ε. This proves that ∂f(x) is bounded. Since it is also closed, it is compact.⁴
The gradient of f at x is said to exist if f′(x; h) exists for every h ∈ X and the functional f′(x; ·) is linear. In this case, the Riesz representation of the linear functional f′(x; ·) is the gradient of f at x, denoted by ∇f(x). Therefore, when it exists, the gradient ∇f(x) is characterized by the restriction

f′(x; h) = (∇f(x) | h) for all h ∈ X.

⁴ The result extends to general Hilbert spaces, provided the compactness conclusion is replaced by weak compactness.
For concave functions, directional derivatives and gradients are related to superdifferentials as follows.
The set

L = {(x + αh, f(x) + αf′(x; h)) : α ∈ A} ⊆ X × R
does not intersect the interior of the subgraph of f, as defined in (10). By the separating hyperplane theorem (Corollary 33), there exists nonzero (p, β) ∈ X × R that separates the sets L and sub(f).

Since the right-hand side must be finite, β > 0. It follows that the right-hand side is at least as large as (p | x) + βf(x), which is also obtained as the expression on the left-hand side with α = 0. Since 0 ∈ A⁰, the coefficient of α on the left-hand side must vanish: (p | h) + βf′(x; h) = 0. Therefore, f′(x; h) = (y | h), where y = −p/β, and the separation condition reduces to the gradient inequality defining the condition y ∈ ∂f(x).
(c) If ∂f(x) = {y}, then part (b) implies that f′(x; h) = (y | h) for all h ∈ X, and therefore y = ∇f(x). Conversely, if the gradient exists, then part (a) implies that (∇f(x) − y | h) ≤ 0 for all h ∈ X and y ∈ ∂f(x). Letting h = ∇f(x) − y, it follows that y = ∇f(x) if y ∈ ∂f(x).
With the convention sup ∅ = −∞, this defines a function of the form J : R^n → [−∞, +∞], which is clearly monotone: δ1 ≤ δ2 implies J(δ1) ≤ J(δ2). We will characterize the solution to the problem (P0). Since problem (Pδ) is the same as problem (P0) with G − δ in place of G, this covers the general case. The key to understanding (P0), however, is to consider the entire function J.
Associated with problem (P0) is the Lagrangian

L(x, λ) = U(x) − λ · G(x), x ∈ C, λ ∈ R^n,

where the dot denotes the Euclidean inner product in R^n. The parameter λ will be referred to as a Lagrange multiplier. Assuming J(0) is finite, we extend our earlier definition (for finite-valued functions) by defining the superdifferential of J at zero to be the set

∂J(0) = {λ ∈ R^n : J(δ) ≤ J(0) + λ · δ for all δ ∈ R^n}.
The monotonicity of J implies that ∂J(0) ⊆ R^n_+. Indeed, if λ ∈ ∂J(0), then λ · δ ≥ J(δ) − J(0) ≥ 0 for all δ ≥ 0, and therefore λ ≥ 0. The following relationship between the Lagrangian and the superdifferential of J at zero is key.
Proof. In the space R^n × R with the Euclidean inner product, λ ∈ ∂J(0) if and only if (λ, −1) supports at (0, J(0)) (in the sense of Definition 31) the set

S1 = {(δ, t) ∈ R^n × R : t < J(δ)}.

Similarly, J(0) = sup{L(x, λ) : x ∈ C} if and only if (λ, −1) supports at (0, J(0)) the set

S2 = {(G(x), t) ∈ R^n × R : x ∈ C, t ≤ U(x)}.
Lemma 36 does not assume that an optimum is achieved. If a maximum does exist,
the argument can be extended to obtain the following global optimality conditions for
problem (P0 ) :
Theorem 37 Suppose that c ∈ C, G(c) ≤ 0 and λ ∈ R^n. Then the following two conditions are equivalent:

1. U(c) = J(0) and λ ∈ ∂J(0).

2. L(c, λ) = max{L(x, λ) : x ∈ C}, λ ≥ 0 and λ · G(c) = 0.
Remark 38 (a) Given the inequalities G(c) ≤ 0 and λ ≥ 0, the restriction λ · G(c) = 0 is known as complementary slackness, since it is equivalent to G(c)i < 0 implies λi = 0, for every coordinate i. Intuitively, a constraint can have a positive price only if it is binding.

(b) Condition 2 of Theorem 37 is sometimes equivalently stated as a saddle-point condition: L(·, λ) is maximized over C at c, and L(c, ·) is minimized over R^n_+ at λ.
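A one-dimensional illustration of Theorem 37 (a Python sketch; the toy problem of maximizing U(x) = −(x − 2)² over C = R subject to G(x) = x − 1 ≤ 0, with solution c = 1 and multiplier λ = 2, is an assumption made for the example):

    import numpy as np

    U = lambda x: -(x - 2.0) ** 2
    G = lambda x: x - 1.0
    L = lambda x, lam: U(x) - lam * G(x)        # the Lagrangian

    c, lam = 1.0, 2.0
    xs = np.linspace(-5.0, 5.0, 100001)

    print(np.isclose(L(c, lam), L(xs, lam).max()))      # c maximizes L(., lam) over C
    print(G(c) <= 0 and lam >= 0 and lam * G(c) == 0)   # feasibility + complementary slackness

    # lam is a supergradient of the value function at zero: here
    # J(d) = -(d - 1)^2 for d <= 1 and J(d) = 0 otherwise.
    J = lambda d: np.where(d <= 1.0, -(d - 1.0) ** 2, 0.0)
    ds = np.linspace(-3.0, 3.0, 101)
    print(bool(np.all(J(ds) <= J(0.0) + lam * ds)))     # the supergradient inequality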
Assuming the existence of a maximum, the above global optimality conditions are applicable if and only if ∂J(0) is nonempty. Convexity-based sufficient conditions for this to be true are given below. Besides convexity, the key assumption is the so-called Slater condition:

There exists x ∈ C such that G(x)i < 0 for all i ∈ {1, ..., n}. (12)
Proposition 39 Suppose that C is convex, U is concave, G is convex, J(0) is finite and the Slater condition (12) holds. Then ∂J(0) ≠ ∅.
Proof. One can easily verify that sub(J) is convex and has (0, J(0)) on its boundary. By the supporting hyperplane theorem, sub(J) is supported at (0, J(0)) by some nonzero (μ, −β), and therefore t ≤ J(δ) implies μ · δ − βt ≥ −βJ(0), for all δ ∈ R^n. If β < 0, then one obtains a contradiction with δ = 0. The Slater condition guarantees that J(δ) > −∞ for all δ sufficiently close to zero, and therefore β ≠ 0. The only possibility is β > 0, in which case λ = μ/β ∈ ∂J(0).
Suppose that the gradient of U at c ∈ D⁰ ∩ M exists. Then

U(c) = max{U(x) : x ∈ D ∩ M} implies ∇U(c) = λ1 B1 + ··· + λn Bn for some λ ∈ R^n.
Proof. Suppose U(c) = max{U(x) : x ∈ D ∩ M} and consider any vector y in the linear subspace

L = M − c = {x ∈ X : (Bi | x) = 0, i = 1, ..., n}.
Finally, we derive the Kuhn–Tucker optimality conditions for last section’s problem (P0).

Theorem 41 Suppose that c ∈ C⁰ solves problem (P0): U(c) = J(0) and G(c) ≤ 0. Suppose also that the gradients of U and G at c exist, and that
By the separating hyperplane theorem (Corollary 33), there exists nonzero (μ, β) ∈ R^n × R that separates A and B, and therefore

δ ≥ 0 and t ≥ 0 imply μ · δ + βt ≥ 0, (14)

μ · [G(c) + (∇G(c) | h)] + β(∇U(c) | h) ≤ 0 for all h ∈ X. (15)
13 Exercises
1. Prove Proposition 3.
2. (a) Given the inner product space X, prove the parallelogram identity:

∥x + y∥² + ∥x − y∥² = 2∥x∥² + 2∥y∥², x, y ∈ X.
4. Let X = C[0, 1] be the vector space of all continuous functions on the unit interval, with the inner product (x | y) = ∫₀¹ x(t)y(t) dt.

(a) Show that the sequence {xn} defined by xn(t) = 1/(1 + nt), t ∈ [0, 1], is Cauchy but does not converge in X.
(b) Suppose we are interested in finding a continuous function x : [0, 1] → R that minimizes ∫₀¹ x(t)² dt subject to the constraint x(0) = 1. Express this as a projection problem in X, and show that the suitable projection does not exist.
5. Let X be the vector space of all real-valued sequences x = (x1, x2, ...) such that xn = 0 for all but finitely many values of n, with the inner product (x | y) = Σn xn yn. Show that the functional f(x) = Σn n xn is linear but not continuous.
6. Give an alternative proof of the necessity of the support condition in Theorem 19(a) that utilizes the fact that the quadratic f(α) = ∥x − xα∥², where xα = xS + α(s − xS), is minimized at zero and therefore the right derivative of f at zero must be nonnegative. Also draw a diagram that makes it obvious that if the angle between s − xS and xS − x is wider than a right angle, then there exists some α ∈ (0, 1) such that ∥x − xα∥ < ∥x − xS∥.
7. (a) Show that the vector y supports the set S at some vector s̄ if and only if y supports the closure S̄ at s̄. (Note that it is not assumed that s̄ is in S, and therefore Definition 31 is required.)

(b) Show that the closure of a convex set is convex and conclude that, in a complete space, projections on the closure of a convex set always exist.
10. Prove Proposition 26, and derive the projection expression of Proposition 22 as a
corollary of Proposition 9.
11. Suppose P is a linear operator on the inner product space (X, (· | ·)), that is, a function of the form P : X → X such that P(αx + βy) = αP(x) + βP(y) for all α, β ∈ R and x, y ∈ X. P is a projection operator if there exists a linear subspace L such that Px is the projection of x on L for all x ∈ X. The operator P is idempotent if P² = P. Finally, P is self-adjoint if (Px | y) = (x | Py) for all x, y ∈ X.

(a) Prove that P is a projection operator if and only if P is both idempotent and self-adjoint.

(b) Apply part (a) to show that a matrix A ∈ R^{n×n} is idempotent (A² = A) and symmetric (A′ = A) if and only if there exists a full-rank matrix B such that A = B′(BB′)⁻¹B.
12. (a) Suppose X, Y and Z are vector spaces, and f : X → Y and g : X → Z are linear functions such that g(x) = 0 implies f(x) = 0 for all x ∈ X. Show that L = {g(x) : x ∈ X} is a linear subspace of Z, and that there exists a linear function h : L → Y such that f = h ∘ g.

(b) Suppose f, g1, ..., gn are linear functionals on X such that g1(x) = ··· = gn(x) = 0 implies f(x) = 0, for all x ∈ X. Show that f = λ1 g1 + ··· + λn gn for some λ ∈ R^n.
min{x′Qx : Ax = b, x ∈ R^{n×1}},
15. Suppose that the random vector y = (y1, ..., yn)′ is generated by the model y = Aβ + ε, where A ∈ R^{n×m} is a known matrix, β ∈ R^m is an unknown vector, and ε is an unobservable zero-mean random vector, valued in R^n, with variance-covariance matrix E[εε′] = Σ, assumed to be positive definite. We are interested in a linear estimator of the parameter β, that is, an estimator of the form β̂ = By, for some B ∈ R^{m×n}. The linear estimator represented by B is unbiased if BA = I, since then E[β̂] = β for every choice of β. Using projection theory, determine the unbiased linear estimator that minimizes the variance of β̂i − βi for every i ∈ {1, ..., m}.
16. Suppose X = ℓ², the space of square summable sequences with the inner product (x | y) = Σ_{n=1}^∞ x(n)y(n). Let S be the positive cone of X, that is, the set of all x ∈ X such that x(n) ≥ 0 for all n. Show that the boundary of S is S itself, and therefore S contains no interior points. Finally, consider any s̄ ∈ S such that s̄(n) > 0 for all n, and show that the only vector y such that (y | s̄) ≤ (y | s) for all s ∈ S is the zero vector.
17. Verify the claim of display (11), stating the equivalence of the supergradient property and an associated support condition.
19. In the context of Section 11, suppose that U and G are continuous. Is J necessarily continuous at an interior point of its effective domain (that is, the set where J is finite)? Provide a proof or a counterexample.
21. Prove the opening claim of the proof of Theorem 41. Also, provide a set of convexity-based sufficient conditions for the Kuhn–Tucker conditions of Theorem 41 to imply optimality, and prove your claim.
14 Notes
Treatments of finite-dimensional convex optimization theory include Rockafellar (1970), Bertsekas (2003) and Boyd and Vandenberghe (2004), the latter providing an introduction to modern computational optimization algorithms. Infinite-dimensional extensions can be found in Luenberger (1969), Ekeland and Témam (1999) and Bonnans and Shapiro (2000). The Hilbert-space theory outlined in this appendix is presented and extended in the classic text of Dunford and Schwartz (1988), where one can also find the notion of weak compactness and its interplay with convexity alluded to in Section 8.
References

Bertsekas, D. P. (2003): Convex Analysis and Optimization. Athena Scientific, Belmont, MA.

Bonnans, J. F., and A. Shapiro (2000): Perturbation Analysis of Optimization Problems. Springer, New York.

Boyd, S., and L. Vandenberghe (2004): Convex Optimization. Cambridge University Press, Cambridge, UK.

Dunford, N., and J. T. Schwartz (1988): Linear Operators, Part I: General Theory. Wiley, New York.

Ekeland, I., and R. Témam (1999): Convex Analysis and Variational Problems. SIAM, Philadelphia, PA.

Luenberger, D. G. (1969): Optimization by Vector Space Methods. Wiley, New York.

Rockafellar, R. T. (1970): Convex Analysis. Princeton University Press, Princeton, NJ.

Skiadas, C. (2009): Asset Pricing Theory. Princeton University Press, Princeton, NJ.