
Chapter 4

Semidefinite programming

Prior to 1984, linear and nonlinear programming,4.1 one a subset of the other,
had evolved for the most part along unconnected paths, without even a common
terminology. (The use of “programming” to mean “optimization” serves as a
persistent reminder of these differences.)
−Forsgren, Gill, & Wright, 2002 [169]
Given some practical application of convex analysis, it may at first seem puzzling why
a search for its solution ends abruptly with a formalized statement of the problem itself
as a constrained optimization. The explanation is: typically we do not seek analytical
solution because there are relatively few. (§3.5.3, §C) If a problem can instead be expressed
in convex form, then there exist computer programs providing efficient numerical
global solution. [195] [454] [455] [453] [395] [379] The goal, then, becomes conversion of a
given problem (perhaps a nonconvex or combinatorial problem statement) to an equivalent
convex form or to an alternation of convex subproblems convergent to a solution of the
original problem:
By the fundamental theorem of Convex Optimization, any locally optimal point
(solution) of a convex problem is globally optimal. [66, §4.2.2] [348, §1] Given convex real
objective function g and convex feasible set D ⊆ dom g , which is the set of all variable
values satisfying the problem constraints, we pose a generic convex optimization problem
    minimize_X   g(X)
    subject to   X ∈ D        (708)
where constraints are abstract here in membership of variable X to convex feasible set D .
Inequality constraint functions of a convex optimization problem are convex. Quasiconvex
inequality constraint functions are prohibited by prevailing methods for numerical solution.
Equality constraint functions are conventionally affine, but not necessarily so. Affine
equality constraint functions, as opposed to the superset of all convex equality constraint
functions having convex level sets (§3.4.0.0.5), make convex optimization tractable.
Similarly, the problem
    maximize_X   g(X)
    subject to   X ∈ D        (709)
is called convex were g a real concave function and feasible set D convex. As conversion
to convex form is not always possible, there is much ongoing research to determine which
problem types have convex expression or relaxation. [36] [64] [176] [315] [390] [173]
4.1 nascence of polynomial-time interior-point methods of solution [410] [451]. Linear programming ⊂ (convex ∩ nonlinear) programming.


4.1 Conic problem


Still, we are surprised to see the relatively small number of submissions to
semidefinite programming (SDP) solvers, as this is an area of significant
current interest to the optimization community. We speculate that semidefinite
programming is simply experiencing the fate of most new areas: Users have
yet to understand how to pose their problems as semidefinite programs, and
the lack of support for SDP solvers in popular modelling languages likely
discourages submissions. −SIAM News, 2002 [134, p.9]

(confer p.127, (310)) Consider a conic problem (p) and its dual (d): [333, §3.3.1] [273, §2.1] [274]

    (p)  minimize_x   cᵀx            maximize_{y,s}   bᵀy   (d)
         subject to   x ∈ K          subject to       s ∈ K*        (710)
                      Ax = b                          Aᵀy + s = c

where K is a closed convex cone, K* is its dual, matrix A is fixed, and the remaining
quantities are vectors.
When K is a polyhedral cone (§2.12.1), then each conic problem becomes a linear
program; the selfdual nonnegative orthant providing the prototypical primal linear
program and its dual. [104, §3-1]4.2 More generally, each optimization problem is convex
when K is a closed convex cone. Solution to each convex problem is not necessarily
unique; the optimal solution sets {x⋆ } and {y ⋆ , s⋆ } are convex and may comprise more
than a single point.
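When K is the selfdual nonnegative orthant, the pair (710) can be checked numerically with any LP solver. A minimal sketch using scipy.optimize.linprog (an assumption; the data here is illustrative only, not from the text):

    import numpy as np
    from scipy.optimize import linprog

    # (710) with K = K* = nonnegative orthant (selfdual): an ordinary LP pair
    c = np.array([1., 2., 0.])
    A = np.array([[1., 1., 1.]])
    b = np.array([1.])

    # (p): minimize c'x subject to x in K, Ax = b
    p = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3)
    # (d): maximize b'y subject to s = c - A'y in K*, i.e. A'y <= c, y free
    d = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)])
    print(p.fun, -d.fun)   # equal optimal objective values: no duality gap for LP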

4.1.1 a semidefinite program


When K is the selfdual cone of positive semidefinite matrices Sⁿ₊ in the subspace of
symmetric matrices Sⁿ, then each conic problem is called a semidefinite program (SDP);
[315, §6.4] primal problem (P) having matrix variable X ∈ Sⁿ while corresponding dual
(D) has slack variable S ∈ Sⁿ and vector variable y = [yᵢ] ∈ Rᵐ : [11] [12, §2] [461, §1.3.8]

    (P)  minimize_{X ∈ Sⁿ}   ⟨C , X⟩          (D)  maximize_{y ∈ Rᵐ, S ∈ Sⁿ}   ⟨b , y⟩
         subject to   X ⪰ 0                        subject to   S ⪰ 0                      (711)
                      A svec X = b                              svec⁻¹(Aᵀy) + S = C
This is the prototypical primal semidefinite program and its dual, where matrix C ∈ Sⁿ
and vector b ∈ Rᵐ are fixed, as is

    A ≜ [ svec(A₁)ᵀ
             ⋮         ∈ R^{m×n(n+1)/2}        (712)
          svec(Aₘ)ᵀ ]

because {Aᵢ ∈ Sⁿ, i = 1…m} is given. Thus

    A svec X = [ ⟨A₁ , X⟩
                    ⋮                          (713)
                 ⟨Aₘ , X⟩ ]

    svec⁻¹(Aᵀy) = Σᵢ₌₁ᵐ yᵢAᵢ
4.2 Dantzig explains reasoning behind a nonnegativity constraint: . . . negative quantities of activities are
not possible. . . . a negative number of cases cannot be shipped.

The vector inner-product for matrices is defined in the Euclidean/Frobenius sense in the
isomorphic vector space R^{n(n+1)/2}; id est,

    ⟨C , X⟩ ≜ tr(CᵀX) = svec(C)ᵀ svec X        (40)

where svec X, defined by (59), denotes symmetric vectorization.
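A small numerical sketch of symmetric vectorization and its inverse (hypothetical helpers, assuming the convention of (59) that off-diagonal entries carry a factor √2 so the isometry (40) holds):

    import numpy as np

    def svec(X):
        # stack the upper triangle of symmetric X column by column,
        # scaling off-diagonal entries by sqrt(2), per (59)
        n = X.shape[0]
        return np.array([X[i, j] * (1.0 if i == j else np.sqrt(2))
                         for j in range(n) for i in range(j + 1)])

    def svec_inv(s):
        # rebuild the symmetric matrix from its vectorization
        n = int((np.sqrt(8 * len(s) + 1) - 1) / 2)
        X = np.zeros((n, n))
        k = 0
        for j in range(n):
            for i in range(j + 1):
                X[i, j] = X[j, i] = s[k] if i == j else s[k] / np.sqrt(2)
                k += 1
        return X

    # check the isometry (40): <C, X> = tr(C'X) = svec(C)' svec(X)
    rng = np.random.default_rng(0)
    C = rng.standard_normal((4, 4)); C = (C + C.T) / 2
    X = rng.standard_normal((4, 4)); X = (X + X.T) / 2
    assert np.isclose(np.trace(C.T @ X), svec(C) @ svec(X))
    assert np.allclose(svec_inv(svec(X)), X)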

In a national planning problem of some size, one may easily run into several
hundred variables and perhaps a hundred or more degrees of freedom. . . . It
should always be remembered that any mathematical method and particularly
methods in linear programming must be judged with reference to the type of
computing machinery available. Our outlook may perhaps be changed when we
get used to the super modern, high capacity electronic computor that will be
available here from the middle of next year.
−Ragnar Frisch [171]

The Simplex method of solution for linear programming, invented by Dantzig in
1947 [104], is now integral to modern technology. The same cannot yet be said for
semidefinite programming, whose roots trace back to systems of positive semidefinite linear
inequalities studied by Bellman & Fan in 1963 [33] [115] who provided saddle convergence
criteria. Interior-point methods for numerical solution of linear programs can be traced
back to the logarithmic barrier of Frisch in 1954 and Fiacco & McCormick in 1968 [164].
Karmarkar's polynomial-time interior-point method sparked a log-barrier renaissance
in 1984, [312, §11] [451] [410] [315, p.3] but numerical performance of contemporary
general-purpose semidefinite program solvers remains limited: computational intensity
for dense systems varies as O(m²n) (m constraints ≪ n variables) based on interior-point
methods that produce solutions no more relatively accurate than 1E-8. There are no
solvers capable of handling in excess of n = 100,000 variables without significant, sometimes
crippling, loss of precision or time.4.3 [37] [314, p.258] [73, p.3]
Nevertheless, semidefinite programming has recently emerged to prominence because it
admits a new problem type previously unsolvable by convex optimization techniques [64]
and because it theoretically subsumes other convex types: (Figure 94) linear programming,
quadratic programming, second-order cone programming.4.4 Determination of the Riemann
mapping function from complex analysis [324] [31, §8, §13], for example, can be posed as
a semidefinite program.

4.1.2 Maximal complementarity


It has been shown [461, §2.5.3] that contemporary interior-point methods [452] [327]
[315] [12] [66, §11] [169] (developed circa 1990 [176] for numerical solution of semidefinite
programs) can converge to a solution of maximal complementarity; [205, §5] [460] [288]
[183] not a vertex solution but a solution of highest cardinality or rank among all optimal
solutions.4.5
4.3 Heuristics are not ruled out by SIOPT; indeed I would suspect that most successful methods have (appropriately described) heuristics under the hood - my codes certainly do. . . . Of course, there are still questions relating to high-accuracy and speed, but for many applications a few digits of accuracy suffices and overnight runs for non-real-time delivery is acceptable.
−Nicholas I. M. Gould, Stanford alumnus, SIOPT Editor in Chief
4.4 Second-order cone programming (SOCP) was born in the 1990s; it is not posable as a quadratic program. [283]
4.5 This characteristic might be regarded as a disadvantage to interior-point methods of numerical solution, but this behavior is not certain and depends on solver implementation.

[Figure 94 shows nested program classes labeled: convex program P_C, semidefinite, second-order cone, quadratically constrained, quadratic, linear; with geometric program drawn separately.]

Figure 94: Venn diagram of program hierarchy. Convex program P_C represents broadest
class of convex optimization problem having efficient global solution methods. Semidefinite
program subsumes other convex program types excepting geometric program [65] [88].

This phenomenon can be explained by recognizing that, by design, interior-point
methods generally find solutions relatively interior to a feasible set.4.6 [7, p.3] Log barriers
are designed to fail numerically at the feasible set boundary. So low-rank solutions, all
on the boundary, are rendered more difficult to find as numerical error becomes more
prevalent there.

4.1.2.1 Reduced-rank solution


A simple rank reduction algorithm, for construction of a primal optimal solution X⋆ to
(711P) satisfying an upper bound on rank governed by Proposition 2.9.3.0.1, is presented
in §4.3. That proposition asserts existence of feasible solutions with an upper bound
on their rank; [28, §II.13.1] specifically, it asserts an extreme point (§2.6.0.0.1) of primal
feasible set A ∩ Sⁿ₊ satisfies upper bound

    rank X ≤ ⌊(√(8m+1) − 1)/2⌋        (279)

where, given A ∈ R^{m×n(n+1)/2} (712) and b ∈ Rᵐ,

    A ≜ {X ∈ Sⁿ | A svec X = b}        (2297)

is the affine subset from primal problem (711P).

4.1.2.2 Coexistence of low- and high-rank solutions; analogy


That low-rank and high-rank optimal solutions {X⋆} of (711P) coexist may be grasped
with the following analogy: We compare a proper polyhedral cone 𝒮³₊ in R³ (illustrated in
Figure 95) to the positive semidefinite cone S³₊ in isometrically isomorphic R⁶, difficult
to visualize. The analogy is good:


4.6 Simplex methods, in contrast, find vertex solutions. [104, p.158] [17, p.2]

[Figure 95 shows polyhedral cone 𝒮³₊ ⊂ R³ with vertex at the origin, cut by hyperplane A = ∂H in polyhedron P having vertices Γ₁, Γ₂.]

Figure 95: Visualizing positive semidefinite cone in high dimension: Proper polyhedral
cone 𝒮³₊ ⊂ R³ representing positive semidefinite cone S³₊ ⊂ S³; analogizing its intersection
𝒮³₊ ∩ ∂H with hyperplane. Number of facets is arbitrary (an analogy not inspired by
eigenvalue decomposition). The rank-0 positive semidefinite matrix corresponds to origin
in R³, rank-1 positive semidefinite matrices correspond to edges of polyhedral cone, rank-2
to facet relative interiors, and rank-3 to polyhedral cone interior. Vertices Γ₁ and Γ₂ are
extreme points of polyhedron P = ∂H ∩ 𝒮³₊, and extreme directions of 𝒮³₊. A given vector
C is normal to another hyperplane (not illustrated but independent w.r.t ∂H) containing
line segment Γ₁Γ₂ minimizing real linear function ⟨C , X⟩ on P. (confer Figure 29,
Figure 33)

• intr S³₊ is constituted by rank-3 matrices.
  intr 𝒮³₊ has three dimensions.

• boundary ∂S³₊ contains rank-0, rank-1, and rank-2 matrices.
  boundary ∂𝒮³₊ contains 0-, 1-, and 2-dimensional faces.

• the only rank-0 matrix resides in the vertex at the origin.

• Rank-1 matrices are in one-to-one correspondence with extreme directions of S³₊
  and 𝒮³₊. The set of all rank-1 symmetric matrices in this dimension

    {G ∈ S³₊ | rank G = 1}        (714)

  is not a connected set.

• Rank of a sum of members F + G in Lemma 2.9.2.9.1 and location of a difference
  F − G in §2.9.2.12.1 similarly hold for S³₊ and 𝒮³₊.

• Euclidean distance from any particular rank-3 positive semidefinite matrix (in the
  cone interior) to the closest rank-2 positive semidefinite matrix (on the boundary)
  is generally less than the distance to the closest rank-1 positive semidefinite matrix.
  (§7.1.2)

• distance from any point in ∂S³₊ to intr S³₊ is infinitesimal (§2.1.7.1.1).
  distance from any point in ∂𝒮³₊ to intr 𝒮³₊ is infinitesimal.

• faces of S³₊ correspond to faces of 𝒮³₊ (confer Table 2.9.2.3.1):

    k             dim F(𝒮³₊)   dim F(S³₊)   dim F(S³₊ ∋ rank-k matrix)
    0                  0            0            0
    boundary  1        1            1            1
              2        2            3            3
    interior  3        3            6            6

  Integer k indexes k-dimensional faces F of 𝒮³₊. Positive semidefinite cone S³₊
  has four kinds of faces, including cone itself (k = 3, boundary + interior), whose
  dimensions in isometrically isomorphic R⁶ are listed under dim F(S³₊). Smallest
  face F(S³₊ ∋ rank-k matrix) that contains a rank-k positive semidefinite matrix
  has dimension k(k+1)/2 by (230).

• For A equal to intersection of m hyperplanes having linearly independent normals,
  and for X ∈ 𝒮³₊ ∩ A, we have rank X ≤ m; the analogue to (279).
Proof. With reference to Figure 95: Assume one (m = 1) hyperplane A = ∂H
intersects the polyhedral cone. Every intersecting plane contains at least one matrix
having rank less than or equal to 1; id est, from all X ∈ ∂H ∩ 𝒮³₊ there exists an
X such that rank X ≤ 1. Rank 1 is therefore an upper bound in this case.
Now visualize intersection of the polyhedral cone with two (m = 2) hyperplanes
having linearly independent normals. The hyperplane intersection A makes a
line. Every intersecting line contains at least one matrix having rank less than
or equal to 2, providing an upper bound. In other words, there exists a positive
semidefinite matrix X belonging to any line intersecting the polyhedral cone such
that rank X ≤ 2.
In the case of three independent intersecting hyperplanes (m = 3), the hyperplane
intersection A makes a point that can reside anywhere in the polyhedral cone. The
upper bound on rank of a point in 𝒮³₊ is also the greatest upper bound: rank X ≤ 3. ¨

4.1.2.2.1 Example. Optimization over A ∩ 𝒮³₊.
Consider minimization of the real linear function ⟨C , X⟩ over

    P ≜ A ∩ 𝒮³₊        (715)

a polyhedral feasible set;

    f₀⋆ ≜ minimize_X   ⟨C , X⟩
          subject to   X ∈ A ∩ 𝒮³₊        (716)

As illustrated for particular vector C and hyperplane A = ∂H in Figure 95, this linear
function is minimized on any X belonging to the face of P containing extreme points
{Γ₁ , Γ₂} and all the rank-2 matrices in between; id est, on any X belonging to the face
of P

    F(P) = {X | ⟨C , X⟩ = f₀⋆} ∩ A ∩ 𝒮³₊        (717)

exposed by the hyperplane {X | ⟨C , X⟩ = f₀⋆}. In other words, the set of all optimal
points X⋆ is a face of P

    {X⋆} = F(P) = Γ₁Γ₂        (718)

comprising rank-1 and rank-2 positive semidefinite matrices. Rank 1 is the upper bound on
existence in the feasible set P for this case m = 1 hyperplane constituting A. The rank-1
matrices Γ₁ and Γ₂ in face F(P) are extreme points of that face and (by transitivity
(§2.6.1.2)) extreme points of the intersection P as well. As predicted by analogy to
Barvinok's Proposition 2.9.3.0.1, the upper bound on rank of X existent in the feasible
set P is satisfied by an extreme point. The upper bound on rank of an optimal solution
X⋆ existent in F(P) is thereby also satisfied by an extreme point of P precisely because
{X⋆} constitutes F(P);4.7 in particular,

    {X⋆ ∈ P | rank X⋆ ≤ 1} = {Γ₁ , Γ₂} ⊆ F(P)        (719)

As all linear functions on a polyhedron are minimized on a face, [104] [287] [311] [318] by
analogy we so demonstrate coexistence of optimal solutions X⋆ of (711P) having assorted
rank. 2

4.1.2.3 Previous work

Barvinok showed, [26, §2.2] when given a positive definite matrix C and an arbitrarily
small neighborhood of C comprising positive definite matrices, there exists a matrix C̃
from that neighborhood such that optimal solution X⋆ to (711P) (substituting C̃) is an
extreme point of A ∩ Sⁿ₊ and satisfies upper bound (279).4.8 Given arbitrary positive
definite C, this means: nothing inherently guarantees that an optimal solution X⋆ to
problem (711P) satisfies (279); certainly nothing given any symmetric matrix C, as the
problem is posed. This can be proved by example:

4.7 and every face contains a subset of the extreme points of P by the extreme existence theorem (§2.6.0.0.2). This means: because the affine subset A and hyperplane {X | ⟨C , X⟩ = f₀⋆} must intersect a whole face of P, calculation of an upper bound on rank of X⋆ ignores counting the hyperplane when determining m in (279).
4.8 Further, the set of all such C̃ in that neighborhood is open and dense.

4.1.2.3.1 Example. (Ye) Maximal Complementarity.
Assume dimension n to be an even positive number. Then the particular instance of
problem (711P),

    minimize_{X ∈ Sⁿ}   ⟨ [ I   0
                            0  2I ] , X ⟩
    subject to   X ⪰ 0        (720)
                 ⟨I , X⟩ = n

has optimal solution

    X⋆ = [ 2I  0
            0  0 ] ∈ Sⁿ        (721)

with an equal number of twos and zeros along the main diagonal. Indeed, optimal solution
(721) is a terminal solution along the central path taken by the interior-point method as
implemented in [461, §2.5.3]; it is also a solution of highest rank among all optimal solutions
to (720). Clearly, rank of this primal optimal solution exceeds by far a rank-1 solution
predicted by upper bound (279). 2
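This instance is easy to reproduce numerically. A sketch in cvxpy (an assumption; the text's results come from the implementation in [461]); whether the returned solution attains the maximal rank of (721) depends on the particular solver:

    import cvxpy as cp
    import numpy as np

    n = 6                                             # any even positive dimension
    C = np.diag([1.0] * (n // 2) + [2.0] * (n // 2))  # blkdiag(I, 2I) from (720)

    X = cp.Variable((n, n), PSD=True)                 # X >= 0
    prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), [cp.trace(X) == n])
    prob.solve()                                      # assumes an SDP-capable solver installed

    print(prob.value)                                 # optimal objective n, attained by (721)
    print(np.round(np.linalg.eigvalsh(X.value), 5))   # a central-path-following solver tends
                                                      # toward n/2 eigenvalues near 2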

4.1.2.4 Later developments

This rational example (720) indicates the need for a more generally applicable and simple
algorithm to identify an optimal solution X ⋆ satisfying Barvinok’s Proposition 2.9.3.0.1.
We will review such an algorithm in §4.3, but first we provide more background.

4.2 Framework
4.2.1 Feasible sets
Denote by D and D* the convex sets of primal and dual points respectively satisfying the
primal and dual constraints in (711), each assumed nonempty;

    D  = { X ∈ Sⁿ₊ | [ ⟨A₁ , X⟩ ; … ; ⟨Aₘ , X⟩ ] = b } = A ∩ Sⁿ₊
                                                                        (722)
    D* = { S ∈ Sⁿ₊ , y = [yᵢ] ∈ Rᵐ | Σᵢ₌₁ᵐ yᵢAᵢ + S = C }

These are the primal feasible set and dual feasible set. Geometrically, primal feasible
A ∩ Sⁿ₊ represents an intersection of the positive semidefinite cone Sⁿ₊ with an affine subset
A of the subspace of symmetric matrices Sⁿ in isometrically isomorphic R^{n(n+1)/2}. A has
dimension n(n+1)/2 − m when the vectorized Aᵢ are linearly independent. Dual feasible
set D* is a Cartesian product of the positive semidefinite cone with its inverse image
(§2.1.9.0.1) under affine transformation4.9 C − Σ yᵢAᵢ. Both feasible sets are convex, and
the objective functions are linear on a Euclidean vector space. Hence, (711P) and (711D)
are convex optimization problems.

4.9 Inequality C − Σ yᵢAᵢ ⪰ 0 follows directly from (711D) (§2.9.0.1.1) and is known as a linear matrix inequality. (§2.13.6.1.1) Because Σ yᵢAᵢ ⪯ C, matrix S is known as a slack variable (a term borrowed from linear programming [104]) since its inclusion raises this inequality to equality.

4.2.1.1 A ∩ Sⁿ₊ emptiness determination via Farkas' lemma

4.2.1.1.1 Lemma. Semidefinite Farkas' lemma. (confer §4.2.1.1.2)
Given affine subset A = {X ∈ Sⁿ | ⟨Aᵢ , X⟩ = bᵢ , i = 1…m} (2297), vector b = [bᵢ] ∈ Rᵐ,
and set {Aᵢ ∈ Sⁿ, i = 1…m} such that {A svec X | X ⪰ 0} (390) is closed, then primal
feasible set A ∩ Sⁿ₊ is nonempty if and only if yᵀb ≥ 0 holds for each and every vector
y = [yᵢ] ∈ Rᵐ such that Σᵢ₌₁ᵐ yᵢAᵢ ⪰ 0.
Equivalently, primal feasible set A ∩ Sⁿ₊ is nonempty if and only if yᵀb ≥ 0 holds for
each and every vector ‖y‖ = 1 such that Σᵢ₌₁ᵐ yᵢAᵢ ⪰ 0. ⋄

Semidefinite Farkas' lemma provides necessary and sufficient conditions for a set of
hyperplanes to have nonempty intersection A ∩ Sⁿ₊ with the positive semidefinite cone.
Given

    A = [ svec(A₁)ᵀ ; … ; svec(Aₘ)ᵀ ] ∈ R^{m×n(n+1)/2}        (712)

semidefinite Farkas' lemma assumes that a convex cone

    K = {A svec X | X ⪰ 0}        (390)

is closed per membership relation (327) from which the lemma springs: [265, §I] K closure
is attained when matrix A satisfies the cone closedness invariance corollary (p.143). Given
closed convex cone K and its dual from Example 2.13.6.1.1

    K* = { y | Σⱼ₌₁ᵐ yⱼAⱼ ⪰ 0 }        (397)

then we can apply membership relation

    b ∈ K ⇔ ⟨y , b⟩ ≥ 0 ∀ y ∈ K*        (327)

to obtain the lemma

    b ∈ K ⇔ ∃ X ⪰ 0 such that A svec X = b ⇔ A ∩ Sⁿ₊ ≠ ∅        (723)
    b ∈ K ⇔ ⟨y , b⟩ ≥ 0 ∀ y ∈ K* ⇔ A ∩ Sⁿ₊ ≠ ∅        (724)

The final equivalence synopsizes semidefinite Farkas' lemma.
While the lemma is correct as stated, a positive definite version is required for
semidefinite programming [461, §1.3.8] because existence of a feasible solution in the cone
interior A ∩ intr Sⁿ₊ is required by Slater's condition4.10 to achieve 0 duality gap (optimal
primal−dual objective difference, §4.2.3, Figure 64). Geometrically, a positive definite
lemma is required to insure that a point of intersection closest to the origin is not at
infinity; e.g, Figure 48. Then given A ∈ R^{m×n(n+1)/2} having rank m, we wish to detect
existence of nonempty primal feasible set interior to the PSD cone;4.11 (393)

    b ∈ intr K ⇔ ⟨y , b⟩ > 0 ∀ y ∈ K*, y ≠ 0 ⇔ A ∩ intr Sⁿ₊ ≠ ∅        (725)

Positive definite Farkas' lemma is made from proper cones, K (390) and K* (397), and
membership relation (333) for which K closedness is unnecessary:
4.10 Slater's sufficient constraint qualification is satisfied whenever any primal or dual strictly feasible solution exists; id est, any point satisfying the respective affine constraints and relatively interior to the convex cone. [372, §6.6] [43, p.325] If the cone were polyhedral, then Slater's constraint qualification is satisfied when any feasible solution exists (relatively interior to the cone or on its relative boundary). [66, §5.2.3]
4.11 Detection of A ∩ intr Sⁿ₊ ≠ ∅ by examining intr K instead is a trick that need not be lost.

4.2.1.1.2 Lemma. Positive definite Farkas' lemma. (confer §4.2.1.1.1)
Given linearly independent set {Aᵢ ∈ Sⁿ, i = 1…m} and vector b = [bᵢ] ∈ Rᵐ, make affine set

    A = {X ∈ Sⁿ | ⟨Aᵢ , X⟩ = bᵢ , i = 1…m}        (2297)

Primal feasible cone interior A ∩ intr Sⁿ₊ is nonempty if and only if yᵀb > 0 holds for each
and every vector y = [yᵢ] ≠ 0 such that Σᵢ₌₁ᵐ yᵢAᵢ ⪰ 0.
Equivalently, primal feasible cone interior A ∩ intr Sⁿ₊ is nonempty if and only if
yᵀb > 0 holds for each and every vector ‖y‖ = 1 such that Σᵢ₌₁ᵐ yᵢAᵢ ⪰ 0. ⋄

4.2.1.1.3 Example. "New" Farkas' lemma.
Lasserre [265, §III] presented an example in 1995, originally offered by Ben-Israel in 1969
[34, p.378], to support closedness in semidefinite Farkas' Lemma 4.2.1.1.1:

    A ≜ [ svec(A₁)ᵀ    =  [ 0  1  0      b = [ 1
          svec(A₂)ᵀ ]       0  0  1 ] ,        0 ]        (726)

Intersection A ∩ Sⁿ₊ is practically empty because the solution set

    {X ⪰ 0 | A svec X = b} = { [  α    1/√2
                                 1/√2   0   ] ⪰ 0 | α ∈ R }        (727)

is positive semidefinite only asymptotically (α → ∞). Yet the dual system
Σᵢ₌₁ᵐ yᵢAᵢ ⪰ 0 ⇒ yᵀb ≥ 0 erroneously indicates nonempty intersection because K (390)
violates a closedness condition of the lemma; videlicet, for ‖y‖ = 1

    y₁ [  0   1/√2   + y₂ [ 0  0   ⪰ 0  ⇔  y = [ 0   ⇒  yᵀb = 0        (728)
         1/√2   0  ]        0  1 ]              1 ]

On the other hand, positive definite Farkas' Lemma 4.2.1.1.2 certifies that A ∩ intr Sⁿ₊ is
empty; what we need to know for semidefinite programming.
Lasserre suggested addition of another condition to semidefinite Farkas' lemma
(§4.2.1.1.1) to make a new lemma having no closedness condition. But positive definite
Farkas' lemma (§4.2.1.1.2) is simpler and obviates the additional condition proposed.
2

4.2.1.2 Theorem of the alternative for semidefinite programming

Because these Farkas' lemmas follow from membership relations, we may construct
alternative systems from them. Applying the method of §2.13.2.1.1, then from positive
definite Farkas' lemma we get

    A ∩ intr Sⁿ₊ ≠ ∅
    or in the alternative        (729)
    yᵀb ≤ 0 ,  Σᵢ₌₁ᵐ yᵢAᵢ ⪰ 0 ,  y ≠ 0

Any single vector y satisfying the alternative certifies A ∩ intr Sⁿ₊ is empty. Such a vector
can be found as a solution to another semidefinite program: for linearly independent
(vectorized) set {Aᵢ ∈ Sⁿ, i = 1…m}

    minimize_y   yᵀb
    subject to   Σᵢ₌₁ᵐ yᵢAᵢ ⪰ 0        (730)
                 ‖y‖₂ ≤ 1

If an optimal vector y⋆ ≠ 0 can be found such that y⋆ᵀb ≤ 0, then primal feasible cone
interior A ∩ intr Sⁿ₊ is empty.
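A sketch of certificate program (730) in cvxpy (an assumption), applied to the Ben-Israel data of Example 4.2.1.1.3. A caveat: since the optimal objective here is 0, a solver may return the trivial optimizer y = 0; the lemma requires a nonzero optimizer with y⋆ᵀb ≤ 0 (e.g. y = [0, 1]ᵀ) as the emptiness certificate:

    import cvxpy as cp
    import numpy as np

    # data from (726): A1 = [[0, 1/sqrt(2)], [1/sqrt(2), 0]], A2 = [[0, 0], [0, 1]]
    A_mats = [np.array([[0., 1/np.sqrt(2)], [1/np.sqrt(2), 0.]]),
              np.array([[0., 0.], [0., 1.]])]
    b = np.array([1., 0.])

    y = cp.Variable(len(A_mats))
    lmi = sum(y[j] * A_mats[j] for j in range(len(A_mats)))   # sum_i y_i A_i
    prob = cp.Problem(cp.Minimize(b @ y),
                      [lmi >> 0, cp.norm(y, 2) <= 1])
    prob.solve()
    print(prob.value, y.value)   # optimal value 0; any nonzero optimizer with
                                 # y'b <= 0 certifies A ∩ intr S²₊ = ∅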

4.2.1.3 Boundary-membership criterion

(confer (724)(725)) From boundary-membership relation (337), for proper cones K (390)
and K* (397) of linear matrix inequality,

    b ∈ ∂K ⇔ ∃ y ≠ 0 such that ⟨y , b⟩ = 0 , y ∈ K*, b ∈ K ⇔ ∂Sⁿ₊ ⊃ A ∩ Sⁿ₊ ≠ ∅        (731)

Whether vector b ∈ ∂K belongs to cone K boundary, that is a determination we can indeed
make; one that is certainly expressible as a feasibility problem: Given linearly independent
set4.12 {Aᵢ ∈ Sⁿ, i = 1…m}, for b ∈ K (723)

    find   y ≠ 0
    subject to   yᵀb = 0        (732)
                 Σᵢ₌₁ᵐ yᵢAᵢ ⪰ 0

Any such nonzero solution y certifies that affine subset A (2297) intersects the positive
semidefinite cone Sⁿ₊ only on its boundary; in other words, nonempty feasible set A ∩ Sⁿ₊
belongs to the positive semidefinite cone boundary ∂Sⁿ₊.

4.2.2 Duals

The dual objective function from (711D) evaluated at any feasible solution represents a
lower bound on the primal optimal objective value from (711P). We can see this by direct
substitution: Assume the feasible sets A ∩ Sⁿ₊ and D* are nonempty. Then it is always
true:

    ⟨C , X⟩ ≥ ⟨b , y⟩
    ⟨ Σᵢ yᵢAᵢ + S , X ⟩ ≥ [ ⟨A₁ , X⟩ ⋯ ⟨Aₘ , X⟩ ] y        (733)
    ⟨S , X⟩ ≥ 0

The converse also follows because

    X ⪰ 0 , S ⪰ 0 ⇒ ⟨S , X⟩ ≥ 0        (1655)

Optimal value of the dual objective thus represents the greatest lower bound on the primal.
This fact is known as weak duality for semidefinite programming, [461, §1.3.8] and can be
used to detect convergence in any primal/dual numerical method of solution.
4.12 From the results of Example 2.13.6.1.1, vector b on the boundary of K cannot be detected simply by looking for 0 eigenvalues in matrix X. We do not consider a thin-or-square matrix A because then feasible set A ∩ Sⁿ₊ is at most a single point.

[Figure 96 shows primal problems P and P̃ linked by transformation, dual problems D and D̃, with duality arrows between each primal and dual.]

Figure 96: Connectivity indicates paths between particular primal and dual problems
from Exercise 4.2.2.1.1. More generally, any path between primal problems P (and
equivalent P̃) and dual D (and equivalent D̃) is possible: implying, any given path is
not necessarily circuital; dual of a dual problem is not necessarily stated in precisely same
manner as corresponding primal convex problem, in other words, although its solution set
is equivalent to within some transformation.

4.2.2.1 Dual problem statement is not unique

Even subtle but equivalent restatements of a primal convex problem can lead to vastly
different statements of a corresponding dual problem. This phenomenon is of interest
because a particular instantiation of dual problem might be easier to solve numerically or
it might take one of few forms for which analytical solution is known.
Here is a canonical restatement of prototypical dual semidefinite program (711D), for
example, equivalent by (202):

    (D)  maximize_{y ∈ Rᵐ, S ∈ Sⁿ}   ⟨b , y⟩          maximize_{y ∈ Rᵐ}   ⟨b , y⟩
         subject to   S ⪰ 0                      ≡    subject to   svec⁻¹(Aᵀy) ⪯ C        (711D̃)
                      svec⁻¹(Aᵀy) + S = C

Dual feasible cone interior in intr Sⁿ₊ (722) (713) thereby corresponds with canonical dual
(D̃) feasible interior

    rel intr D̃* ≜ { y ∈ Rᵐ | Σᵢ₌₁ᵐ yᵢAᵢ ≺ C }        (734)

4.2.2.1.1 Exercise. Prototypical primal semidefinite program.


Derive prototypical primal (711P) from its canonical dual (711D̃); id est, demonstrate that
particular connectivity in Figure 96. H

4.2.3 Optimality conditions

When primal feasible cone interior A ∩ intr Sⁿ₊ exists in Sⁿ or when canonical dual
feasible interior rel intr D̃* exists in Rᵐ, then these two problems (711P) (711D) become
strong duals by Slater's sufficient condition (p.231). In other words, the primal optimal
objective value becomes equal to the dual optimal objective value: there is no duality gap
(Figure 64) and so determination of convergence is facilitated; id est, if ∃ X ∈ A ∩ intr Sⁿ₊
or ∃ y ∈ rel intr D̃* then

    ⟨C , X⋆⟩ = ⟨b , y⋆⟩
    ⟨ Σᵢ yᵢ⋆Aᵢ + S⋆ , X⋆ ⟩ = [ ⟨A₁ , X⋆⟩ ⋯ ⟨Aₘ , X⋆⟩ ] y⋆        (735)
    ⟨S⋆ , X⋆⟩ = 0

where S⋆, y⋆ denote a dual optimal solution.4.13 We summarize this:

4.2.3.0.1 Corollary. Optimality and strong duality. [406, §3.1] [461, §1.3.8]
For semidefinite programs (711P) and (711D), assume primal and dual feasible sets
A ∩ Sⁿ₊ ⊂ Sⁿ and D* ⊂ Sⁿ × Rᵐ (722) are nonempty. Then

• X⋆ is optimal for (711P)
• S⋆, y⋆ are optimal for (711D)
• duality gap ⟨C , X⋆⟩ − ⟨b , y⋆⟩ is 0

if and only if

i) ∃ X ∈ A ∩ intr Sⁿ₊ or ∃ y ∈ rel intr D̃* , and
ii) ⟨S⋆ , X⋆⟩ = 0 ⋄

For symmetric positive semidefinite matrices, requirement ii is equivalent to the
complementarity

    ⟨S⋆ , X⋆⟩ = 0 ⇔ S⋆X⋆ = X⋆S⋆ = 0        (1767)

Commutativity of diagonalizable matrices is necessary and sufficient [233, §1.3.12] for these
two optimal symmetric matrices to be simultaneously diagonalizable. Therefore

    rank X⋆ + rank S⋆ ≤ n        (736)

Proof. The product of symmetric optimal matrices X⋆, S⋆ ∈ Sⁿ must itself be symmetric
because of commutativity. (1644) The symmetric product has diagonalization [12, cor.2.11]

    S⋆X⋆ = X⋆S⋆ = Q Λ_S⋆ Λ_X⋆ Qᵀ = 0 ⇔ Λ_X⋆ Λ_S⋆ = 0        (737)

where Q is an orthogonal matrix. Product of the nonnegative diagonal Λ matrices can
be 0 if their main diagonal zeros are complementary or coincide. Due only to symmetry,
rank X⋆ = rank Λ_X⋆ and rank S⋆ = rank Λ_S⋆ for these optimal primal and dual solutions.
(1630) So total number of nonzero diagonal entries, from both Λ, cannot exceed n
because of the complementarity. ¨

When equality is attained in (736)

    rank X⋆ + rank S⋆ = n        (738)

there are no coinciding main diagonal zeros in Λ_X⋆ Λ_S⋆, and so we have what is called
strict complementarity.4.14 Logically it follows that a necessary and sufficient condition
for strict complementarity of an optimal primal and dual solution is

    X⋆ + S⋆ ≻ 0        (739)

4.13 Optimality condition ⟨S⋆ , X⋆⟩ = 0 is called a complementary slackness condition, in keeping with LP tradition [104], that forbids dual inequalities in (711) to simultaneously hold strictly. [348, §4]
4.14 distinct from maximal complementarity (§4.1.2).

4.2.3.1 solving primal problem via dual

The beauty of Corollary 4.2.3.0.1 is its conjugacy; id est, one can solve either the primal or
dual problem in (711) and then find a solution to the other via the optimality conditions.
When a dual optimal solution is known, for example, a primal optimal solution is any
primal feasible solution in hyperplane {X | ⟨S⋆ , X⟩ = 0}.

4.2.3.1.1 Example. Minimal cardinality Boolean. [103] [36, §4.3.4] [390]
(confer Example 4.6.1.5.1) Consider finding a minimal cardinality Boolean solution x to
the classic linear algebra problem Ax = b given noiseless data A ∈ R^{m×n} and b ∈ Rᵐ;

    minimize_x   ‖x‖₀
    subject to   Ax = b        (740)
                 xᵢ ∈ {0, 1} ,  i = 1…n

where ‖x‖₀ denotes cardinality of vector x (a.k.a 0-norm; not a convex function).
A minimal cardinality solution answers the question: "Which fewest linear combination
of columns in A constructs vector b?" Cardinality problems have extraordinarily wide
appeal, arising in many fields of science and across many disciplines. [361] [246] [200] [199]
Yet designing an efficient algorithm to optimize cardinality has proved difficult. In this
example, we also constrain the variable to be Boolean. The Boolean constraint forces an
identical solution were the norm in problem (740) instead the 1-norm or 2-norm; id est,
the two problems

    minimize_x   ‖x‖₀                     minimize_x   ‖x‖₁
    subject to   Ax = b        (740)  =   subject to   Ax = b        (741)
                 xᵢ ∈ {0,1}, i = 1…n                   xᵢ ∈ {0,1}, i = 1…n

are the same. The Boolean constraint makes the 1-norm problem nonconvex.
Given data

    A = [ −1  1  8   1    1      0
          −3  2  8  1/2  1/3  1/2−1/3    ,   b = [  1
          −9  4  8  1/4  1/9  1/4−1/9 ]            1/2        (742)
                                                   1/4 ]

the obvious and desired solution to the problem posed,

    x⋆ = e₄ ∈ R⁶        (743)

has norm ‖x⋆‖₂ = 1 and minimal cardinality; the minimum number of nonzero entries in
vector x. The Matlab backslash command x=A\b, for example, finds

    xM = [ 2/128 ; 0 ; 5/128 ; 0 ; 90/128 ; 0 ]        (744)

having norm ‖xM‖₂ = 0.7044. Coincidentally, xM is a 1-norm solution; id est, an optimal
solution to

    minimize_x   ‖x‖₁
    subject to   Ax = b        (529)

The pseudoinverse solution (rounded)

    xP = A†b = [ −0.0456 ; −0.1881 ; 0.0623 ; 0.2668 ; 0.3770 ; −0.1102 ]        (745)

has least norm ‖xP‖₂ = 0.5165; id est, the optimal solution to (§E.0.1.0.1)

    minimize_x   ‖x‖₂
    subject to   Ax = b        (746)

Certainly none of the traditional methods provide x⋆ = e₄ (743).
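The least-norm computations are easy to replicate; a numpy sketch with the data of (742):

    import numpy as np

    # data (742); column 6 of A equals column 4 minus column 5, so b = A e4 exactly
    A = np.array([[-1, 1, 8, 1,   1,   0        ],
                  [-3, 2, 8, 1/2, 1/3, 1/2 - 1/3],
                  [-9, 4, 8, 1/4, 1/9, 1/4 - 1/9]])
    b = np.array([1, 1/2, 1/4])

    xP = np.linalg.pinv(A) @ b        # minimum-2-norm solution (746)
    print(np.round(xP, 4))            # matches (745)
    print(np.linalg.norm(xP))         # 0.5165

    e4 = np.zeros(6); e4[3] = 1       # the desired solution (743)
    print(np.allclose(A @ e4, b))     # True: feasible with cardinality 1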


We can reformulate this minimal cardinality Boolean problem (740) as a semidefinite
program: First transform the variable

x , (x̂ + 1) 21 (747)
so x̂i ∈ {−1, 1} ; equivalently,

minimize k(x̂ + 1) 12 k0

subject to A(x̂ + 1) 12 = b (748)
δ(x̂x̂T ) = 1

where δ is the main-diagonal linear operator (§A.1). By assigning (§B.1)

x̂ [ x̂T 1 ] x̂x̂T x̂
· ¸ · ¸ · ¸
X x̂
G= = , ∈ S n+1 (749)
1 x̂T 1 x̂T 1

problem (748) becomes equivalent to: (Theorem A.3.1.0.7)

minimize 1T x̂
X∈ Sn , x̂∈Rn
1
subject to A(x̂ +
· 1) 2 = b ¸
X x̂ (750)
G= (º 0)
x̂T 1
δ(X) = 1
rank G = 1

where solution is confined to rank-1 vertices of the elliptope in S n+1 (§5.9.1.0.1) by the
rank constraint, the positive semidefiniteness, and the equality constraints δ(X) = 1. The
rank constraint makes this problem nonconvex; by removing it4.15 we get the semidefinite
program
minimize
n n
1T x̂
X∈ S , x̂∈R
1
subject to A(x̂ +
· 1) 2 = b ¸
X x̂ (751)
G= º0
x̂T 1
δ(X) = 1
4.15 Relaxed problem (751) can also be derived via Lagrange duality; it is a dual of a dual program
[sic ] to (750). [346] [66, §5, exer.5.39] [447, §IV] [175, §11.3.4] The relaxed problem must therefore be
convex having a larger feasible set; its optimal objective value represents a generally loose lower bound
(1869) on the optimal objective of problem (750).

whose optimal solution x⋆ (747) is identical to that of minimal cardinality Boolean problem
(740) if and only if rank G⋆ = 1.
Hope4.16 of acquiring a rank-1 solution is not ill-founded because 2ⁿ elliptope vertices
have rank 1 and because we are minimizing an affine function on a subset of the elliptope
(Figure 157) containing rank-1 vertices; id est, by assumption that the feasible set of
minimal cardinality Boolean problem (740) is nonempty, a desired solution resides on the
elliptope relative boundary at a rank-1 vertex.4.17
For that data given in (742), our semidefinite program solver sdpsol [454] [455]
(accurate in solution to approximately 1E-8)4.18 finds optimal solution to (751)

    round(G⋆) = [  1   1   1  −1   1   1  −1
                   1   1   1  −1   1   1  −1
                   1   1   1  −1   1   1  −1
                  −1  −1  −1   1  −1  −1   1        (752)
                   1   1   1  −1   1   1  −1
                   1   1   1  −1   1   1  −1
                  −1  −1  −1   1  −1  −1   1 ]

near a rank-1 vertex of the elliptope in S^{n+1} (Theorem 5.9.1.0.2); its sorted eigenvalues,

    λ(G⋆) = [  6.99999977799099
               0.00000022687241
               0.00000002250296
               0.00000000262974        (753)
              −0.00000000999738
              −0.00000000999875
              −0.00000001000000 ]

Negative eigenvalues are undoubtedly finite-precision effects. Because the largest
eigenvalue predominates by many orders of magnitude, we can expect to find a good
approximation to a minimal cardinality Boolean solution by truncating all smaller
eigenvalues. We find, indeed, the desired result (743)

    x⋆ = round( [ 0.00000000127947
                  0.00000000527369
                  0.00000000181001
                  0.99999997469044        (754)
                  0.00000001408950
                  0.00000000482903 ] ) = e₄

These numerical results are solver dependent; insofar, not all SDP solvers will return a
rank-1 vertex solution. 2
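A sketch of relaxation (751) with the eigenvalue-truncation rounding just described, in cvxpy (an assumption; the text's results were produced by sdpsol, and rank-1 vertex recovery remains solver dependent):

    import cvxpy as cp
    import numpy as np

    A = np.array([[-1, 1, 8, 1,   1,   0        ],    # data (742)
                  [-3, 2, 8, 1/2, 1/3, 1/2 - 1/3],
                  [-9, 4, 8, 1/4, 1/9, 1/4 - 1/9]])
    b = np.array([1, 1/2, 1/4])
    n = A.shape[1]

    G = cp.Variable((n + 1, n + 1), PSD=True)      # G = [X xh; xh' 1] of (749)
    xh = G[:n, n]                                  # last column carries xh
    prob = cp.Problem(cp.Minimize(cp.sum(xh)),     # 1'xh
                      [A @ (xh + 1) / 2 == b,      # A(xh + 1)/2 = b
                       cp.diag(G) == 1])           # delta(X) = 1 and G[n,n] = 1
    prob.solve()

    w, V = np.linalg.eigh(G.value)                 # truncate to largest eigenvalue
    g = V[:, -1] * np.sqrt(w[-1])                  # rank-1 factor of G*
    g *= np.sign(g[-1])                            # sign so trailing entry is +1
    print(np.round((g[:n] + 1) / 2))               # e4 when a rank-1 vertex is found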

4.16 A more deterministic approach to constraining rank and cardinality is in §4.7.0.0.12.
4.17 Confinement to the elliptope can be regarded as a kind of normalization akin to matrix A column normalization suggested in [139] and explored in Example 4.2.3.1.2.
4.18 A typically ignored limitation of interior-point solution methods is their relative accuracy of only about 1E-8 on a machine using 64-bit (double precision) floating-point arithmetic; id est, optimal solution x⋆ cannot be more accurate than square root of machine epsilon (ε = 2.2204E-16). Nonzero primal−dual objective difference is not a good measure of solution accuracy.

4.2.3.1.2 Example. Optimization over elliptope versus 1-norm polyhedron
for minimal cardinality Boolean Example 4.2.3.1.1.
A minimal cardinality problem is typically formulated via, what is by now, a standard
practice [139] [75, §3.2, §3.4] of column normalization applied to a 1-norm problem
surrogate like (529). Suppose we define a diagonal matrix

    Λ ≜ [ ‖A(:,1)‖₂
                ‖A(:,2)‖₂            ∈ S⁶        (755)
                       ⋱
                          ‖A(:,6)‖₂ ]

used to normalize the columns (assumed nonzero) of given noiseless data matrix A. Then
approximate the minimal cardinality Boolean problem

    minimize_x   ‖x‖₀
    subject to   Ax = b        (740)
                 xᵢ ∈ {0, 1} ,  i = 1…n

as

    minimize_ỹ   ‖ỹ‖₁
    subject to   AΛ⁻¹ỹ = b        (756)
                 1 ⪰ Λ⁻¹ỹ ⪰ 0

where optimal solution

    y⋆ = round(Λ⁻¹ỹ⋆)        (757)

The inequality in (756) relaxes Boolean constraint yᵢ ∈ {0, 1} from (740); bounding any
solution y⋆ to a nonnegative unit hypercube whose vertices are binary numbers. Convex
problem (756) is justified by the convex envelope

    cenv ‖x‖₀ on {x ∈ Rⁿ | ‖x‖∞ ≤ κ} = (1/κ)‖x‖₁        (1536)

Donoho concurs with this particular formulation, equivalently expressible as a linear
program via (525).
Approximation (756) is therefore equivalent to minimization of an affine function (§3.2)
on a bounded polyhedron, whereas semidefinite program

    minimize_{X ∈ Sⁿ, x̂ ∈ Rⁿ}   1ᵀx̂
    subject to   A(x̂ + 1)/2 = b
                 G = [ X   x̂    ⪰ 0        (751)
                       x̂ᵀ  1 ]
                 δ(X) = 1

minimizes an affine function on an intersection of the elliptope with hyperplanes. Although
the same Boolean solution is obtained from this approximation (756) as compared with
semidefinite program (751), when given that particular data from Example 4.2.3.1.1,
Singer confides a counterexample: Instead, given data

    A = [ 1  0  1/√2       b = [ 1
          0  1  1/√2 ] ,         1 ]        (758)

then solving approximation (756) yields

    y⋆ = round( [ 1 − 1/√2       [ 0
                  1 − 1/√2  ) =    0        (759)
                      1 ]          1 ]

(infeasible, with or without rounding, with respect to original problem (740)) whereas
solving semidefinite program (751) produces

    round(G⋆) = [  1   1  −1   1
                   1   1  −1   1        (760)
                  −1  −1   1  −1
                   1   1  −1   1 ]

with sorted eigenvalues

    λ(G⋆) = [  3.99999965057264
               0.00000035942736        (761)
              −0.00000000000000
              −0.00000001000000 ]

Truncating all but the largest eigenvalue, from (747) we obtain (confer y⋆)

    x⋆ = round( [ 0.99999999625299       [ 1
                  0.99999999625299  ) =    1        (762)
                  0.00000001434518 ]       0 ]

the desired minimal cardinality Boolean result. 2
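Approximation (756) on Singer's data is an ordinary linear program; a sketch using scipy.optimize.linprog (an assumption) reproduces the infeasible rounding (759):

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1., 0., 1/np.sqrt(2)],          # Singer's data (758)
                  [0., 1., 1/np.sqrt(2)]])
    b = np.array([1., 1.])

    lam = np.linalg.norm(A, axis=0)                # diagonal of Lambda (755)
    # (756): with ytilde >= 0 the 1-norm objective is linear, so this is an LP
    res = linprog(c=np.ones(3),                    # minimize 1' ytilde
                  A_eq=A / lam, b_eq=b,            # A Lambda^{-1} ytilde = b
                  bounds=[(0, s) for s in lam])    # 0 <= Lambda^{-1} ytilde <= 1
    y = np.round(res.x / lam)                      # y* = round(Lambda^{-1} ytilde*), (757)
    print(y)                                       # [0. 0. 1.]
    print(np.allclose(A @ y, b))                   # False: infeasible for (740)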

4.2.3.1.3 Exercise. Minimal cardinality Boolean art.
Assess general performance of standard-practice approximation (756) as compared with
the proposed semidefinite program (751). H

4.2.3.1.4 Exercise. Conic independence.
Matrix A from (742) is full-rank having three-dimensional nullspace. Find its four conically
independent columns. (§2.10)4.19 To what part of proper cone K = {Ax | x ⪰ 0} does
vector b belong? H

4.2.3.1.5 Exercise. Linear independence.
Show why wide matrix A, from compressed sensing problem (529) or (534), may be
regarded full-rank without loss of generality. In other words: Is a minimal cardinality
solution invariant to linear dependence of rows? H

4.3 Rank reduction

. . . it is not clear generally how to predict rank X⋆ or rank S⋆ before solving
the SDP problem.
−Farid Alizadeh, 1995 [12, p.22]

The premise of rank reduction in semidefinite programming is: an optimal solution X⋆
found does not satisfy Barvinok's upper bound (279) on rank. The particular numerical
algorithm solving a semidefinite program may have instead returned a high-rank optimal
solution (§4.1.2; e.g, (721)) when a lower-rank optimal solution was expected. Rank
reduction is a means to adjust rank of an optimal solution to (711P), returned by a solver,
until it satisfies Barvinok's upper bound with the optimal objective value unchanged.

4.19 Hint: §4.4.2.0.2, §4.6.2.0.2.

4.3.1 posit a perturbation of X⋆

Recall (§4.1.2.1) that there is an extreme point of A ∩ Sⁿ₊ satisfying upper bound (279)
on rank. [26, §2.2] It is therefore sufficient to locate an extreme point of A ∩ Sⁿ₊ whose
primal objective value (711P) is optimal:4.20 [127, §31.5.3] [273, §2.4] [274] [8, §3] [331]
Consider again affine subset

    A = {X ∈ Sⁿ | A svec X = b}        (2297)

where for Aᵢ ∈ Sⁿ

    A = [ svec(A₁)ᵀ ; … ; svec(Aₘ)ᵀ ] ∈ R^{m×n(n+1)/2}        (712)

Given any optimal solution X⋆ to SDP

    minimize_{X ∈ Sⁿ}   ⟨C , X⟩
    subject to   X ∈ A ∩ Sⁿ₊        (711P)

whose rank does not satisfy upper bound (279), we posit existence of a set of perturbations

    {tⱼBⱼ | tⱼ ∈ R , Bⱼ ∈ Sⁿ , j = 1…n}        (763)

to X⋆ such that, for some 0 ≤ i ≤ n and scalars {tⱼ , j = 1…i},

    X⋆ + Σⱼ₌₁ⁱ tⱼBⱼ        (764)

becomes an extreme point of A ∩ Sⁿ₊ and remains an optimal solution to (711P).
Membership of (764) to affine subset A is secured, for the iᵗʰ perturbation, by demanding

    ⟨Bᵢ , Aⱼ⟩ = 0 ,  j = 1…m        (765)

while membership to positive semidefinite cone Sⁿ₊ is insured by small perturbation (774).
Feasibility of (764) is certified in this manner, whereas optimality is proved in §4.3.3.
The following simple algorithm has low computational intensity and locates an optimal
extreme point, assuming nontrivial solution: given optimal primal solution X⋆

4.3.1.0.1 Procedure. Rank reduction. [433]

initialize: Bᵢ = 0 ∀ i
for iteration i = 1…n
{
  1. compute a nonzero perturbation matrix Bᵢ (768) of X⋆ + Σⱼ₌₁^{i−1} tⱼ⋆Bⱼ
  2. maximize tᵢ (774)
     subject to X⋆ + Σⱼ₌₁^{i−1} tⱼ⋆Bⱼ + tᵢBᵢ ∈ Sⁿ₊
} ¶

A rank-reduced optimal solution is then

    X⋆ ← X⋆ + Σⱼ₌₁ⁱ tⱼ⋆Bⱼ        (766)

4.20 There is no known construction for Barvinok's tighter result (284). −Monique Laurent, 2004

4.3.2 rank perturbation form

Perturbations of X⋆ are independent of constants C ∈ Sⁿ and b ∈ Rᵐ in primal and dual
problems (711). Numerical accuracy of any rank-reduced result, found by perturbation of
an initial optimal solution X⋆, is therefore quite dependent upon initial accuracy of X⋆.

4.3.2.0.1 Definition. Matrix step function. (confer §A.6.2.2.1)
Define the signum-like quasiconcave real function ψ : Sⁿ → R

    ψ(Z) ≜ {  1 ,  Z ⪰ 0
             −1 ,  otherwise        (767)

The value −1 is taken for indefinite or nonzero negative semidefinite argument.4.21 △

Deza & Laurent [127, §31.5.3] prove: every perturbation matrix Bᵢ , i = 1…n , is of
the form

    Bᵢ = −ψ(Zᵢ) RᵢZᵢRᵢᵀ ∈ Sⁿ        (768)

where

    X⋆ ≜ R₁R₁ᵀ ,    X⋆ + Σⱼ₌₁^{i−1} tⱼ⋆Bⱼ ≜ Xᵢ = RᵢRᵢᵀ ∈ Sⁿ        (769)

where the optimal tⱼ⋆ are scalars and Rᵢ ∈ R^{n×ρ} is full-rank and thin where

    ρ ≜ rank( X⋆ + Σⱼ₌₁^{i−1} tⱼ⋆Bⱼ ) = rank Xᵢ        (770)

and where Zᵢ ∈ S^ρ is found at each iteration i by solving a simple feasibility problem:4.22

    find_{Zᵢ ∈ S^ρ}   RᵢZᵢRᵢᵀ ≠ 0
    subject to   ⟨Zᵢ , RᵢᵀAⱼRᵢ⟩ = 0 ,  j = 1…m        (771)

Were there a sparsity pattern common to each member of set {RᵢᵀAⱼRᵢ ∈ S^ρ , j = 1…m},
then a good choice for Zᵢ has 1 in each entry corresponding to a 0 in the pattern; id est,
a sparsity pattern complement. At iteration i

    X⋆ + Σⱼ₌₁^{i−1} tⱼ⋆Bⱼ + tᵢBᵢ = Rᵢ(I − tᵢψ(Zᵢ)Zᵢ)Rᵢᵀ        (772)

By fact (1620), therefore

    X⋆ + Σⱼ₌₁^{i−1} tⱼ⋆Bⱼ + tᵢBᵢ ⪰ 0 ⇔ 1 − tᵢψ(Zᵢ)λ(Zᵢ) ⪰ 0        (773)

where λ(Zᵢ) ∈ R^ρ denotes the eigenvalues of Zᵢ. Necessity and sufficiency are due to the
facts: Rᵢ can be completed to a nonsingular matrix (§A.3.1.0.5.c), and I − tᵢψ(Zᵢ)Zᵢ can

4.21 Because of how 0 and indefinites are handled, ψ is not an odd function; id est, ψ(−Z) ≠ −ψ(Z).
4.22 A simple method of solution is closed-form projection of a nonzero random point Zᵢ on that proper subspace of isometrically isomorphic R^{ρ(ρ+1)/2} specified by the constraints. (§E.5.0.0.7) Such a solution is nontrivial assuming the specified intersection of hyperplanes is not the origin; guaranteed by ρ(ρ+1)/2 > m. This geometric intuition, about forming a perturbation, is indeed what bounds any solution's rank from below; m is fixed by the number of equality constraints in (711P) while rank ρ decreases with each iteration i. Otherwise, we might iterate indefinitely.

be padded with zeros while maintaining equivalence in (772). Maximization of each tᵢ, in
step 2 of Procedure 4.3.1.0.1, reduces rank of (772) so locates a new point on the boundary
∂(A ∩ Sⁿ₊).4.23 Maximization of tᵢ thereby has closed form;

    (tᵢ⋆)⁻¹ = max{ψ(Zᵢ)λ(Zᵢ)ₖ , k = 1…ρ}        (774)

When Zᵢ is indefinite, direction of perturbation (determined by ψ(Zᵢ)) is arbitrary. We
may take an early exit, from the Procedure, were all feasible RᵢZᵢRᵢᵀ to become {0} or
were ρ to become equal to 1 (assuming a nontrivial solution) or were

    rank[ svec RᵢᵀA₁Rᵢ  svec RᵢᵀA₂Rᵢ  ⋯  svec RᵢᵀAₘRᵢ ] = ρ(ρ+1)/2        (775)

(281) which characterizes rank ρ of any [sic] extreme point in A ∩ Sⁿ₊. [273, §2.4] [274]

4.23 This holds because rank of a positive semidefinite matrix in Sⁿ is diminished below n by the number of its 0 eigenvalues (1630), and because a positive semidefinite matrix having one or more 0 eigenvalues corresponds to a point on the PSD cone boundary (200).

Proof. Assuming the form of every perturbation matrix is indeed (768), then by (771)

    svec Zᵢ ⊥ [ svec(RᵢᵀA₁Rᵢ)  svec(RᵢᵀA₂Rᵢ)  ⋯  svec(RᵢᵀAₘRᵢ) ]        (776)

By orthogonal complement we have

    rank[ svec(RᵢᵀA₁Rᵢ) ⋯ svec(RᵢᵀAₘRᵢ) ]⊥ + rank[ svec(RᵢᵀA₁Rᵢ) ⋯ svec(RᵢᵀAₘRᵢ) ] = ρ(ρ+1)/2        (777)

When Zᵢ can only be 0, then the perturbation is null because an extreme point has been
found; thus

    [ svec(RᵢᵀA₁Rᵢ) ⋯ svec(RᵢᵀAₘRᵢ) ]⊥ = 0        (778)

from which the stated result (775) directly follows. ¨
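A numpy sketch of Procedure 4.3.1.0.1 (a hypothetical helper; a minimal illustration, not a robust solver), using the random-projection solution of feasibility problem (771) suggested in footnote 4.22:

    import numpy as np

    def rank_reduce(X, A_mats, tol=1e-9):
        # perturb optimal X along B = -psi(Z) R Z R' (768) until extreme
        n = X.shape[0]
        for _ in range(n):
            w, V = np.linalg.eigh(X)                   # X = R R', R thin full-rank
            keep = w > tol
            R = V[:, keep] * np.sqrt(w[keep])
            rho = R.shape[1]
            if rho <= 1:
                break
            # (771): <Z, R'AjR> = 0 for all j; project a random symmetric Z
            # onto the constraint nullspace (the correction is a combination
            # of the symmetric R'AjR, so symmetry of Z is preserved)
            M = np.array([(R.T @ Aj @ R).ravel() for Aj in A_mats])
            Z0 = np.random.randn(rho, rho); Z0 = (Z0 + Z0.T) / 2
            z = Z0.ravel() - np.linalg.pinv(M) @ (M @ Z0.ravel())
            if np.linalg.norm(z) < tol:                # only Z = 0 feasible: extreme point (778)
                break
            Z = z.reshape(rho, rho)
            lam = np.linalg.eigvalsh(Z)
            psi = 1.0 if lam[0] >= -tol else -1.0      # matrix step function (767)
            t = 1.0 / np.max(psi * lam)                # closed-form maximal step (774)
            X = R @ (np.eye(rho) - t * psi * Z) @ R.T  # update (772): rank drops
        return X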

4.3.3 Optimality of perturbed X⋆

We show that the optimal objective value is unaltered by perturbation (768); id est,

    ⟨C , X⋆ + Σⱼ₌₁ⁱ tⱼ⋆Bⱼ⟩ = ⟨C , X⋆⟩        (779)

Proof. From Corollary 4.2.3.0.1 we have the necessary and sufficient relationship between
optimal primal and dual solutions under assumption of nonempty primal feasible cone
interior A ∩ intr Sⁿ₊ :

    S⋆X⋆ = S⋆R₁R₁ᵀ = X⋆S⋆ = R₁R₁ᵀS⋆ = 0        (780)

This means R(R₁) ⊆ N(S⋆) and R(S⋆) ⊆ N(R₁ᵀ). From (769) and (772), after 0-padding
Zᵢ for dimensional compatibility, come the sequence:

    X⋆ = R₁R₁ᵀ
    X⋆ + t₁⋆B₁ = R₂R₂ᵀ = R₁(I − t₁⋆ψ(Z₁)Z₁)R₁ᵀ
    X⋆ + t₁⋆B₁ + t₂⋆B₂ = R₃R₃ᵀ = R₂(I − t₂⋆ψ(Z₂)Z₂)R₂ᵀ
                       = R₁ √(I − t₁⋆ψ(Z₁)Z₁) (I − t₂⋆ψ(Z₂)Z₂) √(I − t₁⋆ψ(Z₁)Z₁) R₁ᵀ
    ⋮
    X⋆ + Σⱼ₌₁ⁱ tⱼ⋆Bⱼ = R₁ ( Πⱼ₌₁ⁱ √(I − tⱼ⋆ψ(Zⱼ)Zⱼ) ) ( Πⱼ₌ᵢ¹ √(I − tⱼ⋆ψ(Zⱼ)Zⱼ) ) R₁ᵀ ,  i > 0        (781)

where the second product counts backwards (j = i…1). Substituting C = svec⁻¹(Aᵀy⋆) + S⋆
from (711),

    ⟨C , X⋆ + Σⱼ₌₁ⁱ tⱼ⋆Bⱼ⟩ = ⟨ svec⁻¹(Aᵀy⋆) + S⋆ , R₁ ( Πⱼ₌₁ⁱ √(I − tⱼ⋆ψ(Zⱼ)Zⱼ) ) ( Πⱼ₌ᵢ¹ √(I − tⱼ⋆ψ(Zⱼ)Zⱼ) ) R₁ᵀ ⟩
                           = ⟨ Σₖ₌₁ᵐ yₖ⋆Aₖ , X⋆ + Σⱼ₌₁ⁱ tⱼ⋆Bⱼ ⟩        (782)
                           = ⟨ Σₖ₌₁ᵐ yₖ⋆Aₖ + S⋆ , X⋆ ⟩ = ⟨C , X⋆⟩

because ⟨Bᵢ , Aⱼ⟩ = 0 ∀ i, j by design (765). ¨

4.3.3.0.1 Example. A δ(X) = b.
This academic example demonstrates that a solution found by rank reduction can certainly
have rank less than Barvinok's upper bound (279): Assume that a given vector b belongs
to the conic hull of columns of a given matrix A

    A = [ −1  1  8   1    1                       b = [  1
          −3  2  8  1/2  1/3    ∈ R^{m×n} ,             1/2  ∈ Rᵐ        (783)
          −9  4  8  1/4  1/9 ]                          1/4 ]

Consider the convex optimization problem

    minimize_{X ∈ S⁵}   tr X
    subject to   X ⪰ 0        (784)
                 A δ(X) = b

that minimizes the 1-norm of the main diagonal; id est, problem (784) is the same as

    minimize_{X ∈ S⁵}   ‖δ(X)‖₁
    subject to   X ⪰ 0        (785)
                 A δ(X) = b

that finds a solution to A δ(X) = b. Rank-3 solution X⋆ = δ(xM) is optimal, where
(confer (744))

    xM = [ 2/128 ; 0 ; 5/128 ; 0 ; 90/128 ]        (786)

Yet upper bound (279) predicts existence of at most a

    rank-⌊(√(8m+1) − 1)/2⌋ = rank-2        (787)

feasible solution from m = 3 equality constraints. To find a lower rank ρ optimal solution
to (784) (barring combinatorics), we invoke Procedure 4.3.1.0.1:

Initialize: C = I , ρ = 3 , Aⱼ ≜ δ(A(j,:)) , j = 1, 2, 3 , X⋆ = δ(xM) , m = 3 , n = 5.
{
Iteration i = 1:

Step 1:

    R₁ = [ √(2/128)      0         0
               0         0         0
               0      √(5/128)     0
               0         0         0
               0         0      √(90/128) ]

    find_{Z₁ ∈ S³}   R₁Z₁R₁ᵀ ≠ 0
    subject to   ⟨Z₁ , R₁ᵀAⱼR₁⟩ = 0 ,  j = 1, 2, 3        (788)

A nonzero randomly selected matrix Z₁, having 0 main diagonal, is a solution
yielding nonzero perturbation matrix B₁. Choose arbitrarily

    Z₁ = 11ᵀ − I ∈ S³        (789)

Then (rounding)

    B₁ = [ 0       0  0.0247  0  0.1048
           0       0  0       0  0
           0.0247  0  0       0  0.1657        (790)
           0       0  0       0  0
           0.1048  0  0.1657  0  0      ]

Step 2: t₁⋆ = 1 because λ(Z₁) = [−1 −1 2]ᵀ. So,

    X⋆ ← δ(xM) + t₁⋆B₁ = [ 2/128   0  0.0247  0  0.1048
                           0       0  0       0  0
                           0.0247  0  5/128   0  0.1657        (791)
                           0       0  0       0  0
                           0.1048  0  0.1657  0  90/128 ]

has rank ρ ← 1 and produces the same optimal objective value.
} 2

4.3.3.0.2 Exercise. Rank reduction of maximal complementarity.
Apply rank reduction Procedure 4.3.1.0.1 to the maximal complementarity example
(§4.1.2.3.1). Demonstrate a rank-1 solution; which can certainly be found (by Barvinok's
Proposition 2.9.3.0.1) because there is only one equality constraint. H

4.3.4 thoughts regarding rank reduction

Because rank reduction Procedure 4.3.1.0.1 is guaranteed only to produce another
optimal solution conforming to Barvinok's upper bound (279), the Procedure will not
necessarily produce solutions of arbitrarily low rank; but if they exist, the Procedure
can. Arbitrariness of search direction, when matrix Zᵢ becomes indefinite (mentioned on
page 243), and the enormity of choices for Zᵢ (771) are liabilities for this algorithm.

4.3.4.1 inequality constraints

The question naturally arises: what to do when a semidefinite program (not in prototypical
form (711))4.24 has linear inequality constraints of the form

    αᵢᵀ svec X ≤ φᵢ ,  i = 1…k        (792)

where {φᵢ} are given scalars and {αᵢ} are given vectors. One expedient way to handle this
circumstance is to convert the inequality constraints to equality constraints by introducing
a slack variable γ; id est,

    αᵢᵀ svec X + γᵢ = φᵢ ,  i = 1…k ,  γ ⪰ 0        (793)

thereby converting the problem to prototypical form.
Alternatively, we say the iᵗʰ inequality constraint is active when it is met with equality;
id est, when for particular i in (792), αᵢᵀ svec X⋆ = φᵢ. An optimal high-rank solution
X⋆ is, of course, feasible (satisfying all the constraints). But for the purpose of rank
reduction, inactive inequality constraints are ignored while active inequality constraints are
interpreted as equality constraints. In other words, we take the union of active inequality
constraints (as equalities) with equality constraints A svec X = b to form a composite
affine subset Â substituting for (2297). Then we proceed with rank reduction of X⋆ as
though the semidefinite program were in prototypical form (711P).

4.4 Cardinality reduction


Analogous to rank reduction of semidefinite variable in SDP (§4.3), cardinality reduction
of vector variable in LP means: to lower cardinality of an optimal solution to (710p) (found
by numerical solver) while leaving the optimal objective value unchanged.

4.4.1 perturbation of x⋆

Given affine subset

    A = {x ∈ Rⁿ | Ax = b}        (153)

where

    A = [ a₁ᵀ ; … ; aₘᵀ ] ∈ R^{m×n}        (152)

and given any optimal solution x⋆ to LP

    minimize_x   cᵀx
    subject to   x ⪰ 0        (710p)
                 Ax = b

4.24 Contemporary numerical packages for solving semidefinite programs can solve a range of problems wider than prototype (711). Generally, they do so by transforming a given problem into prototypical form by introducing new constraints and variables. [12] [455] We are momentarily considering a departure from the primal prototype that augments the constraint set with linear inequalities.

whose cardinality is not minimal, an extreme point of A ∩ Rⁿ₊ (whose primal objective
value (710p) is optimal) would possess reduced cardinality. To reveal such an extreme
point, we posit existence of a set of perturbations to x⋆ (like those in §4.3.1)

    {tⱼβⱼ | tⱼ ∈ R , βⱼ ∈ Rⁿ , j = 1…n}        (794)

such that, for some 0 ≤ i ≤ n and set of scalars {tⱼ , j = 1…i},

    x⋆ + Σⱼ₌₁ⁱ tⱼβⱼ        (795)

becomes extreme and optimal. Membership of (795) to affine subset A is guaranteed, for
the iᵗʰ perturbation, by constraints

    ⟨βᵢ , aⱼ⟩ = 0 ,  j = 1…m        (796)

while membership to nonnegative orthant Rⁿ₊ is insured by small perturbation (805).
Thus, feasibility of (795) is certain.

4.4.2 cardinality perturbation form

Perturbation of x⋆ is independent of vector constants c ∈ Rⁿ and b ∈ Rᵐ in primal and
dual problems (710). Every perturbation βᵢ , i = 1…n , is a vector of the form

    βᵢ = −ψ(δ(zᵢ)) zᵢ ∘ xᵢ ∈ Rⁿ        (797)

where

    x⋆ ≜ x₁ ,    x⋆ + Σⱼ₌₁^{i−1} tⱼ⋆βⱼ ≜ xᵢ ∈ Rⁿ        (798)

where the optimal tⱼ⋆ are scalars and where zᵢ is found at each iteration i by solving a
simple feasibility problem:

    find_{zᵢ ∈ Rⁿ}   zᵢ ∘ xᵢ ≠ 0
    subject to   ⟨zᵢ , aⱼ ∘ xᵢ⟩ = 0 ,  j = 1…m        (799)

Cardinality ρ of xᵢ ∈ Rⁿ is equivalent to number of its nonzero entries:

    ρ ≜ card( x⋆ + Σⱼ₌₁^{i−1} tⱼ⋆βⱼ ) = card xᵢ        (800)

At iteration i

    x⋆ + Σⱼ₌₁^{i−1} tⱼ⋆βⱼ + tᵢβᵢ = (1 − tᵢψ(δ(zᵢ))zᵢ) ∘ xᵢ        (801)

Hence, the sequence

    x⋆ = x₁
    x⋆ + t₁⋆β₁ = x₂ = (1 − t₁ψ(δ(z₁))z₁) ∘ x₁
    x⋆ + t₁⋆β₁ + t₂⋆β₂ = x₃ = (1 − t₂ψ(δ(z₂))z₂) ∘ x₂ = (1 − t₁ψ(δ(z₁))z₁) ∘ (1 − t₂ψ(δ(z₂))z₂) ∘ x₁
    ⋮
    x⋆ + Σⱼ₌₁ⁱ tⱼ⋆βⱼ = ( Πⱼ₌₁ⁱ δ(1 − tⱼψ(δ(zⱼ))zⱼ) ) x₁ ,  i > 0        (802)

from which it follows (in order of iteration):

    x⋆ + Σⱼ₌₁^{i−1} tⱼ⋆βⱼ + tᵢβᵢ ⪰ 0 ⇔ 1 − tᵢψ(δ(zᵢ))zᵢ ⪰ 0 ,  i = 1…n        (803)

The following algorithm locates an optimal extreme point, assuming nontrivial solution:
given any optimal primal solution x⋆

4.4.2.0.1 Procedure. Cardinality reduction.

initialize: βᵢ = 0 ∀ i
for iteration i = 1…n
{
  1. compute a nonzero perturbation vector βᵢ (797) of x⋆ + Σⱼ₌₁^{i−1} tⱼ⋆βⱼ
  2. maximize tᵢ (805)
     subject to x⋆ + Σⱼ₌₁^{i−1} tⱼ⋆βⱼ + tᵢβᵢ ⪰ 0
} ¶

A cardinality-reduced optimal solution is then

    x⋆ ← x⋆ + Σⱼ₌₁ⁱ tⱼ⋆βⱼ        (804)

Maximization of tᵢ, in step 2 of Procedure 4.4.2.0.1, reduces cardinality of (801) so locates
a new point on boundary ∂(A ∩ Rⁿ₊). Maximization of tᵢ thereby has closed form;

    (tᵢ⋆)⁻¹ = max{ψ(δ(zᵢ))zᵢ(k) , k = 1…n}        (805)

We may exit early, from the Procedure, were all feasible zᵢ ∘ xᵢ to become {0} or were
cardinality ρ to become 1 or were

    rank[ a₁ ∘ xᵢ  a₂ ∘ xᵢ  ⋯  aₘ ∘ xᵢ ] = ρ        (806)

which characterizes cardinality ρ of any extreme point in A ∩ Rⁿ₊.
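A numpy sketch of Procedure 4.4.2.0.1 (a hypothetical helper; a minimal illustration mirroring the rank-reduction sketch after (778), with Hadamard products in place of congruences):

    import numpy as np

    def cardinality_reduce(x, A, tol=1e-9):
        # perturb optimal x along beta = -psi(delta(z)) z o x (797) until extreme
        m, n = A.shape
        for _ in range(n):
            supp = x > tol                           # support of x_i
            if supp.sum() <= 1:
                break
            # (799): <z, a_j o x> = 0 for all j; project a random z (restricted
            # to the support, where beta lives) onto the constraint nullspace
            M = A[:, supp] * x[supp]                 # rows are a_j o x on the support
            z = np.random.randn(supp.sum())
            z -= np.linalg.pinv(M) @ (M @ z)
            if np.linalg.norm(z) < tol:              # extreme point reached
                break
            psi = 1.0 if z.min() >= -tol else -1.0   # step function (767) on delta(z)
            t = 1.0 / np.max(psi * z)                # closed-form maximal step (805)
            x = x.copy()
            x[supp] = (1 - t * psi * z) * x[supp]    # update (801): cardinality drops
        return x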

4.4.2.0.2 Example. Ax = b .
Cardinality minimization is often at odds with norm minimization because these two
objectives can compete; e.g, §4.2.3.1.1. Yet, prior knowledge of optimal norm objective
value may facilitate a cardinality minimization problem. If optimal solution x⋆ were known
to be binary with particular cardinality ρ , for example, then a linear constraint on the
variable 1T x = ρ might be warranted because ρ = kxk1 for a binary variable. Columns of
this particular A matrix
−1 1 8 1 1 0 1
   

A =  −3 2 8 12 1
3
1
2−3
1 
∈ Rm×n , b =  21  ∈ Rm (742)
1 1 1 1 1
−9 4 8 4 9 4−9 4

constitute generators of a pointed polyhedral cone K .4.25 Vector b predetermines optimal


1-norm, of a binary variable, to be 1 or 2. This convex feasibility problem
find   x ∈ R6
subject to   x º 0
             Ax = b        (807)
             cTx = 1
4.25 Columns {1,5,2,6} are c.i. generators, {1,5} {5,2} {2,6} {6,1} generate facets, {3,4} are interior to K .

brings objective cTx (c = 1) down into the constraints. Were a cardinality-1 solution found,
feasible x would certainly be binary. Because minimization of cTx is forgone, conditions
for 0-duality gap (312) are unmet; objective value cannot be maintained as in §4.3.3.
 2 
159
 0 
 
 5 
 159 
xG =  (808)
 0 

 121 
 159 
31
159

Cardinality-4 xG solves (807). Ignoring norm constraint cTx = 1 , Procedure 4.4.2.0.1


may be invoked to find a lesser cardinality solution:

Initialize: c = 1 , ρ = 1 , aj , j = 1, 2, 3 (152)(p.246), x⋆ = xG , m = 3 , n = 6.
{
Iteration i=1:
Step 1: x1 = x⋆ .
find_{z1 ∈ R6}   z1 ◦ x1 ≠ 0
subject to   ⟨z1 , aj ◦ x1⟩ = 0 ,   j = 1, 2, 3        (809)

Choose

z1 = [ −159/128   1   −159/128   1   1546/3963   159/31 ]T        (810)
Then (797)

β1 = [ −1/64   0   −5/128   0   19/64   1 ]T        (811)

Step 2: t⋆1 = 128/159 . So

x⋆ ← xG + t⋆1 β1 = [ 0 0 0 0 1 1 ]T        (812)

has cardinality ρ ← 2.
}

Further iterations i produce zi = 0. 2

As illustrated by Example 4.4.2.0.2, cardinality reduction can fail (at (797)) to find a
minimal cardinality solution when x1 has a 0-entry in a minimal cardinality location.
This result instigates search for a new method:

4.5 Rank constraint by Convex Iteration


We generalize the trace heuristic (§7.2.2.1), for finding low-rank optimal solutions to
semidefinite programs of a more general form:

4.5.1 constraining rank of semidefinite matrices


Consider a semidefinite feasibility problem of the form
find_{G∈SN}   G
subject to    G ∈ C        (813)
              G º 0
              rank G ≤ n
where C is a convex set presumed to contain positive semidefinite matrices of rank n
or less; id est, C intersects the positive semidefinite cone boundary. We propose: this

rank-constrained feasibility problem can be equivalently expressed as iteration of the


convex problem sequence (814) and (1892a):
minimize_{G∈SN}   ⟨G , W⟩
subject to        G ∈ C        (814)
                  G º 0

where direction vector4.26 W ∈ SN is an optimal solution to the following semidefinite
program, for 0 ≤ n ≤ N − 1

∑_{i=n+1}^{N} λ(G⋆)i  =  minimize_{W∈SN}   ⟨G⋆ , W⟩
                         subject to        0 ¹ W ¹ I        (1892a)
                                           tr W = N − n
whose feasible set is a Fantope (§2.3.2.0.1),4.27 and where G⋆ is an optimal solution to
problem (814) given some iterate W . The idea is to iterate solution of (814) and (1892a)
until convergence as defined in §4.5.1.2: (confer (850))
∑_{i=n+1}^{N} λ(G⋆)i = ⟨G⋆ , W ⋆⟩ = λ(G⋆)T λ(W ⋆) ≜ 0        (815)
defines global optimality of the iteration; a vanishing objective is a certificate of global
optimality, but it cannot be guaranteed. Inner product of eigenvalues follows from (1767)
and properties of commutative matrix products (p.496). Optimal direction vector W ⋆ is
defined as any positive semidefinite matrix yielding optimal solution G⋆ of rank n or less
to then convex equivalent (814) of feasibility problem (813):
find_{G∈SN}   G                          minimize_{G∈SN}   ⟨G , W ⋆⟩
subject to    G ∈ C           ≡          subject to        G ∈ C
              G º 0         (813)                          G º 0        (814)
              rank G ≤ n
id est, any direction vector for which the last N − n nonincreasingly ordered eigenvalues
λ of G⋆ are zero.
In any semidefinite feasibility problem, a solution of least rank must be an extreme
point of the feasible set.4.28 This means there exists a hyperplane supporting the feasible
set at that extreme point. (§2.11) Then there must exist a linear objective function such
that this least-rank feasible solution optimizes the resultant semidefinite program.
We emphasize that convex problem (814) is not a relaxation of rank-constrained
feasibility problem (813); at global optimality, convex iteration (814) (1892a) makes it
instead an equivalent problem.

4.5.1.1 direction matrix interpretation


(confer §4.6.1.2) The feasible set of direction matrices in (1892a) is the convex hull of outer
product of all rank-(N − n) orthonormal matrices; videlicet,
conv{ U U T | U ∈ RN×N−n , U TU = I }  ≡  { A ∈ SN | I º A º 0 , ⟨I , A⟩ = N − n }        (93)
4.26 Search direction W is a hyperplane-normal pointing opposite to direction of movement describing
minimization of a real linear function ⟨G , W⟩ (p.62).
4.27 Sum of eigenvalues follows from a result of Ky Fan (p.539).
4.28 which follows by extremes theorem 2.8.1.1.1, by rank of a sum of positive semidefinite matrices (1636)

(266), and by definition of extreme point (172) for which no convex combination can produce it: If a least
rank solution were expressible as a convex combination of feasible points, then there could exist feasible
matrices of lesser rank.
Figure 97: (confer Figure 195) Projection of G⋆ on subspace Sn of rank ≤ n matrices


whose nullspace contains N (G⋆ ). This direction W is closed-form solution to (1892a).

This set (95), argument to conv{ } , comprises the extreme points of this Fantope (93).
An optimal solution W to (1892a), that is an extreme point, is known in closed form
(p.539): Given ordered diagonalization G⋆ = QΛQT ∈ SN+ (§A.5.1), then direction matrix
W = U ⋆U ⋆T is optimal and extreme where U ⋆ = Q(: , n+1 : N ) ∈ RN×N−n . Eigenvalue

vector λ(W ) has 1 in each entry corresponding to the N − n smallest entries of δ(Λ) and
has 0 elsewhere. By (229) (232), polar direction −W can be regarded as pointing toward
the set of all rank-n (or less) positive semidefinite matrices whose nullspace contains that
of G⋆ . For that particular closed-form solution W , consequent to Theobald (p.495),
(confer (852))

∑_{i=n+1}^{N} λ(G⋆)i = ⟨G⋆ , W⟩ = λ(G⋆)T λ(W ) ≥ 0        (816)

This is the connection to cardinality minimization of vectors;4.29 id est, eigenvalue λ


cardinality (rank) is analogous to vector x cardinality via (852): for positive semidefinite X
∑_i λ(X)i     = tr X       = ‖X‖2∗   ⇔  ‖x‖1
√(∑_i λ(X)i²) = √(tr X²)   = ‖X‖F    ⇔  ‖x‖2        (817)
max_i {λ(X)i}              = ‖X‖2    ⇔  ‖x‖∞

So that this method, for constraining rank, will not be misconstrued under closed-form
solution W to (1892a): Define (confer (229))

Sn , {(I −W )G(I −W ) | G ∈ SN } = {X ∈ SN | N (X) ⊇ N (G⋆ )} (818)


4.29 not trace minimization of a nonnegative diagonal matrix δ(x) as in [156, §1] [342, §2]. To make
rank-constrained problem (813) resemble cardinality problem (541), we could make C an affine subset:

find X ∈ SN
subject to A svec X = b
Xº0
rank X ≤ n
Figure 98: (confer Figure 114) Trace heuristic can be interpreted as minimization of a
hyperplane, with normal I , over positive semidefinite cone drawn here in isometrically
isomorphic R3 . Polar of direction vector W = I points toward origin.

as the symmetric subspace of rank ≤ n matrices whose nullspace contains N (G⋆ ).


Then projection of G⋆ on Sn is (I −W )G⋆ (I −W ). (§E.7) Direction of projection is
−W G⋆ W . (Figure 97) tr(W G⋆ W ) is a measure of proximity to Sn because its orthogonal
complement is Sn⊥ = {W GW | G ∈ SN } ; the point being, convex iteration (incorporating
constrained tr(W GW ) = ⟨G , W⟩ minimization) is not a projection method: certainly
not on these two subspaces. Proposed convex iteration is neither dual projection
(Figure 194) nor alternating projection (Figure 198).
Closed-form solution W to problem (1892a), though efficient, comes with a caveat:
there exist cases where this projection matrix solution W does not provide the shortest
route to an optimal rank-n solution G⋆ ; id est, direction W is not unique. So we
sometimes choose to solve (1892a) instead of employing a known closed-form solution.
When direction matrix W = I , as in the trace heuristic for example, then −W points
directly at the origin (the rank-0 PSD matrix, Figure 98). Vector inner-product of an
optimization variable with direction matrix W is therefore a generalization of the trace
heuristic (§7.2.2.1) for rank minimization; −W is instead trained toward the boundary of
the positive semidefinite cone.
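To make the iteration concrete, here is a minimal Matlab/CVX sketch of cycle (814)
(1892a) employing the closed-form direction matrix. It assumes, for illustration only, that
C is an affine set described by constraints ⟨Aj , G⟩ = bj held in a cell array; the names
Acell , b , N , n and the tolerance are ours, not the text's.

    W = eye(N);                                % first pass = trace heuristic
    for iter = 1:200
        cvx_begin sdp quiet
            variable G(N,N) symmetric
            minimize( trace(W*G) )             % <G,W>  (814)
            for j = 1:numel(Acell)
                trace(Acell{j}*G) == b(j);     % G in C (assumed affine here)
            end
            G >= 0;                            % G positive semidefinite
        cvx_end
        [Q,L] = eig(full(G));                  % ordered diagonalization
        [lam, idx] = sort(diag(L), 'descend');
        U = Q(:, idx(n+1:N));                  % N-n smallest eigenvalues
        W = U*U';                              % closed-form optimum of (1892a)
        if sum(lam(n+1:N)) < 1e-9, break, end  % (815): rank G <= n certified
    end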

4.5.1.2 convergence

We study convergence to ascertain conditions under which a direction matrix will reveal
a feasible solution G , of rank n or less, to semidefinite program (814). Denote by W ⋆
a particular optimal direction matrix from semidefinite program (1892a) such that (815)
holds (feasible rank G ≤ n found). Then we define global optimality of the iteration (814)
(1892a) to correspond with this vanishing vector inner-product (815) of optimal solutions.
Because this iterative technique for constraining rank is not a projection method, it
can find a rank-n solution G⋆ ((815) will be satisfied) only if at least one exists in the
feasible set of program (814).

4.5.1.2.1 Proof. Suppose ⟨G⋆ , W⟩ = τ is satisfied for some nonnegative constant τ


after any particular iteration (814) (1892a) of the two minimization problems. Once
a particular value of τ is achieved, it can never be exceeded by subsequent iterations
because existence of feasible G and W having that vector inner-product τ has been
established simultaneously in each problem. Because the infimum of vector inner-product
of two positive semidefinite matrix variables is zero, the nonincreasing sequence of
iterations is thus bounded below hence convergent because any bounded monotonic
sequence in R is convergent. [294, §1.2] [44, §1.1] Local optimality to some nonnegative
objective value τ is thereby established. ¨

Local optimality, in this context, means convergence of ⟨G⋆ , W⟩ to a fixed point
of possibly infeasible rank. Only local optimality can be established because objective
⟨G , W⟩ , when instead regarded simultaneously in two variables (G , W ) , is generally
multimodal. (§3.14.0.0.3)
Local optimality, convergence to ⟨G⋆ , W⟩ = τ ≠ 0 and definition of a stall, never
implies nonexistence of a rank-n feasible solution to (814). Conversely, a nonexistent
rank-n feasible solution would mean certain failure (τ ≠ 0) to achieve global optimality by
definition (815). But, as proved, convex iteration always converges to a local optimum if
not a global one.
When a rank-n feasible solution to (814) exists, it remains an open problem to
state conditions under which ⟨G⋆ , W ⋆⟩ = τ = 0 (815) is achieved by iterative solution of
semidefinite programs (814) and (1892a). Then rank G⋆ ≤ n and pair (G⋆ , W ⋆ ) becomes
a globally optimal fixed point of iteration. There can be no proof of convergence to a
global optimum because of the implicit high-dimensional multimodal manifold in variables
(G , W ).
When stall occurs, direction vector W can be manipulated to steer out; e.g., by
reversal of search direction as in Example 4.7.0.0.1, or by reinitialization to a random
rank-(N − n) matrix in the same PSD cone face (§2.9.2.3) demanded by the current iterate:
Given ordered diagonalization G⋆ = QΛQT ∈ SN , then W = U ⋆ Φ U ⋆T as in (229) where
U ⋆ = Q(: , n+1 : N ) ∈ RN×N−n and where eigenvalue vector λ(W )1:N−n = λ(Φ) is made
to have nonnegative uniformly distributed random entries in (0 , 1] (by discriminating
selection of Φ ∈ S^{N−n}_+ ) while λ(W )N−n+1:N = 0. Zero eigenvalues act as memory while
randomness largely reduces likelihood of stall. When this direction works, rank and
objective sequence ⟨G⋆ , W⟩ (with respect to iteration) tend to be noisily monotonic.
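In Matlab terms, that randomized reinitialization might look like the following sketch
(rand samples [0, 1) , which stands in for the interval (0 , 1] up to numerical endpoint;
Gstar denotes the current iterate G⋆):

    [Q, L] = eig(Gstar);                       % current iterate G*
    [~, idx] = sort(diag(L), 'descend');       % order the diagonalization
    U = Q(:, idx(n+1:N));                      % face demanded by current iterate
    Phi = diag(rand(N-n, 1));                  % random eigenvalues, memoryless part
    W = U*Phi*U';                              % W = U* Phi U*' as in (229)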

4.5.1.2.2 Exercise. Completely positive semidefinite matrix. [42]
Given rank-2 positive semidefinite matrix

G = [ 0.50  0.55  0.20
      0.55  0.61  0.22
      0.20  0.22  0.08 ]

find a positive factorization G = X TX (1058) by solving

find_{X∈R2×3}   X ≥ 0
subject to   Z = [ I    X
                   X T  G ]  º 0        (819)
             rank Z ≤ 2

via convex iteration. H
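One way to attempt this exercise numerically, sketched with CVX in sdp mode (an
assumed installation; the iteration cap and tolerance are our own choices):

    Gdata = [0.50 0.55 0.20; 0.55 0.61 0.22; 0.20 0.22 0.08];
    N = 5; n = 2; W = eye(N);                  % trace heuristic on first pass
    for iter = 1:100
        cvx_begin sdp quiet
            variable X(2,3)
            minimize( trace(W*[eye(2) X; X' Gdata]) )   % <Z,W>
            X(:) >= 0;                                  % elementwise X >= 0
            [eye(2) X; X' Gdata] >= 0;                  % Z of (819) PSD
        cvx_end
        Z = [eye(2) X; X' Gdata];
        [Q,L] = eig(Z); [lam, idx] = sort(diag(L), 'descend');
        U = Q(:, idx(n+1:N)); W = U*U';                 % closed form (1892a)
        if sum(lam(n+1:N)) < 1e-8, break, end           % rank Z <= 2: done
    end

At convergence, X TX should reproduce G to numerical precision.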



4.5.1.2.3 Exercise. Nonnegative matrix factorization.
Given rank-2 nonnegative matrix

X = [ 17  28  42
      16  47  51
      17  82  72 ]

find a nonnegative factorization

X = WH        (820)

by solving

find_{A∈S3 , B∈S3 , W∈R3×2 , H∈R2×3}   W , H
subject to   Z = [ I    W T  H
                   W    A    X
                   H T  X T  B ]  º 0        (821)
             W ≥ 0
             H ≥ 0
             rank Z ≤ 2
which follows from the fact, at optimality,

Z = [ I ; W ; H T ] [ I  W T  H ]        (822)

Use the known closed-form solution for a direction vector Y to regulate rank by convex
iteration; set Z ⋆ = QΛQT ∈ S8 to an ordered diagonalization and U ⋆ = Q(: , 3 : 8) ∈ R8×6 ,
then Y = U ⋆U ⋆T (§4.5.1.1).
In summary, initialize Y then iterate numerical solution of (convex) semidefinite
program

minimize_{A∈S3 , B∈S3 , W∈R3×2 , H∈R2×3}   ⟨Z , Y⟩
subject to   Z = [ I    W T  H
                   W    A    X
                   H T  X T  B ]  º 0        (823)
             W ≥ 0
             H ≥ 0

with Y = U ⋆U ⋆T until convergence (which is to a global optimum, and occurs in very few
iterations for this instance). H
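A sketch paralleling the previous one, again assuming CVX in sdp mode and our own
tolerance:

    Xd = [17 28 42; 16 47 51; 17 82 72];
    N = 8; n = 2; Y = eye(N);
    for iter = 1:100
        cvx_begin sdp quiet
            variable A(3,3) symmetric
            variable B(3,3) symmetric
            variable W(3,2)
            variable H(2,3)
            minimize( trace(Y*[eye(2) W' H; W A Xd; H' Xd' B]) )   % <Z,Y>
            W(:) >= 0;  H(:) >= 0;                 % elementwise nonnegativity
            [eye(2) W' H; W A Xd; H' Xd' B] >= 0;  % Z of (823) PSD
        cvx_end
        Z = [eye(2) W' H; W A Xd; H' Xd' B];
        [Q,L] = eig(Z); [lam, idx] = sort(diag(L), 'descend');
        U = Q(:, idx(n+1:N)); Y = U*U';            % closed form (1892a)
        if sum(lam(n+1:N)) < 1e-8, break, end      % rank Z <= 2 achieved
    end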

Now, an application to optimal regulation of affine dimension:

4.5.1.2.4 Example. Sensor-Network Localization and Wireless Location.


Heuristic solution to a sensor-network localization problem, proposed by Carter, Jin,
Saunders, & Ye in [79],4.30 is limited to two Euclidean dimensions and applies semidefinite
programming (SDP) to little subproblems. There, a large network is partitioned into
smaller subnetworks (as small as one sensor - a mobile point, whereabouts unknown) and
then semidefinite programming and heuristics called spaseloc are applied to localize each
and every partition by two-dimensional distance geometry. Their partitioning procedure
is one-pass, yet termed iterative; a term applicable only insofar as adjoining partitions can
share localized sensors and anchors (absolute sensor positions known a priori ). But there
is no iteration on the entire network, hence the term “iterative” is perhaps inappropriate.
As partitions are selected based on “rule sets” (heuristics, not geographics), they also term
4.30 The paper constitutes Jin's dissertation for University of Toronto [248] although her name appears as
second author. Ye's authorship is honorary.

Figure 99: Sensor-network localization in R2 , illustrating connectivity and circular


radio-range per sensor. Smaller dark grey regions each hold an anchor at their center;
known fixed sensor positions. Sensor/anchor distance is measurable with negligible
uncertainty for sensor within those grey regions. (Graphic by Geoff Merrett.)

the partitioning adaptive. But no adaptation of a partition actually occurs once it has
been determined.
One can reasonably argue that semidefinite programming methods are unnecessary
for localization of small partitions of large sensor networks. [316] [95] In the past, these
nonlinear localization problems were solved algebraically and computed by least squares
solution to hyperbolic equations; called multilateration.4.31 [261] [303] Indeed, practical
contemporary numerical methods for global positioning (GPS) by satellite do not rely on
convex optimization. [329]
Modern distance geometry is inextricably melded with semidefinite programming. The
beauty of semidefinite programming, as relates to localization, lies in convex expression
of classical multilateration: So & Ye showed [363] that the problem of finding unique
solution, to a noiseless nonlinear system describing the common point of intersection of
hyperspheres in real Euclidean vector space, can be expressed as a semidefinite program
via distance geometry.
But the need for SDP methods in Carter & Jin et alii is enigmatic for two more
reasons: 1) guessing solution to a partition whose intersensor measurement data
or connectivity is inadequate for localization by distance geometry, 2) reliance on
complicated and extensive heuristics for partitioning a large network that could instead
be efficiently solved whole by one semidefinite program [256, §3]. While partitions range
in size between 2 and 10 sensors, 5 sensors optimally, heuristics provided are only for
4.31 Multilateration - literally, having many sides; shape of a geometric figure formed by nearly
intersecting lines of position. In navigation systems, therefore: obtaining a fix from multiple lines of
position. Multilateration can be regarded as noisy trilateration.

two spatial dimensions (no higher-dimensional heuristics are proposed). For these small
numbers it remains unclarified as to precisely what advantage is gained over traditional
least squares: it is difficult to determine what part of their noise performance is attributable
to SDP and what part is attributable to their heuristic geometry.
Partitioning of large sensor networks is a compromise to rapid growth of SDP
computational intensity with problem size. But when impact of noise on distance
measurement is of most concern, one is averse to a partitioning scheme because noise-effects
vary inversely with problem size. [57, §2.2] (§5.13.2) Since an individual partition’s solution
is not iterated in Carter & Jin and is interdependent with adjoining partitions, we expect
errors to propagate from one partition to the next; the ultimate partition solved, expected
to suffer most.
Heuristics often fail on real-world data because of unanticipated circumstances.
When heuristics fail, generally they are repaired by adding more heuristics. Tenuous
is any presumption, for example, that distance measurement errors have distribution
characterized by circular contours of equal probability about an unknown sensor-location.
(Figure 99) That presumption effectively appears within Carter & Jin’s optimization
problem statement as affine equality constraints relating unknowns to distance
measurements that are corrupted by noise. Yet in most all urban environments, this
measurement noise is more aptly characterized by ellipsoids of varying orientation and
eccentricity as one recedes from a sensor. (Figure 153) Each unknown sensor must
therefore instead be bound to its own particular range of distance, primarily determined
by the terrain.4.32 The nonconvex problem we must instead solve is:

find_{i,j∈I}   {xi , xj}
subject to   d̲ij ≤ ‖xi − xj‖² ≤ d̄ij        (824)

where xi represents sensor location, and where d̲ij and d̄ij respectively represent lower
and upper bounds on measured distance-square from i th to j th sensor (or from sensor
to anchor). Figure 104 illustrates contours of equal sensor-location uncertainty. By
establishing these individual upper and lower bounds, orientation and eccentricity can
effectively be incorporated into the problem statement.
Generally speaking, there can be no unique solution to the sensor-network localization
problem because there is no unique formulation; that is the art of Optimization. Any
optimal solution obtained depends on whether or how a network is partitioned, whether
distance data is complete, presence of noise, and how the problem is formulated. When
a particular formulation is a convex optimization problem, then the set of all optimal
solutions forms a convex set containing the actual or true localization. Measurement
noise precludes equality constraints representing distance. The optimal solution set is
consequently expanded; necessitated by introduction of distance inequalities admitting
more and higher-rank solutions. Even were the optimal solution set a single point, it is
not necessarily the true localization because there is little hope of exact localization by
any algorithm once significant noise is introduced.
Carter & Jin gauge performance of their heuristics to the SDP formulation of author
Biswas whom they regard as vanguard to the art. [16, §1] Biswas posed localization as an
optimization problem minimizing a distance measure. [51] [49] Intuitively, minimization
of any distance measure yields compacted solutions; (confer §6.7.0.0.1) precisely the
anomaly motivating Carter & Jin. Their two-dimensional heuristics outperformed Biswas’
localizations both in execution-time and proximity to the desired result. Perhaps, instead
of heuristics, Biswas’ approach to localization can be improved: [48] [50].
4.32 A distinct contour map corresponding to each anchor is required in practice.
Figure 100: 2-lattice in R2 , hand-drawn. Nodes 3 and 4 are anchors; remaining nodes are
sensors. Radio range of sensor 1 indicated by arc.

The sensor-network localization problem is considered difficult. [16, §2] Rank


constraints in optimization are considered more difficult. Control of affine dimension
in Carter & Jin is suboptimal because of implicit projection on R2 . In what follows, we
present the localization problem as a semidefinite program (equivalent to (824)) having an
explicit rank constraint which controls affine dimension of an optimal solution. We show
how to achieve that rank constraint only if the feasible set contains a matrix of desired
rank. Our problem formulation is extensible to any spatial dimension.

proposed standardized test

Jin proposes an academic test in two-dimensional real Euclidean space R2 that we adopt.
In essence, this test is a localization of sensors and anchors arranged in a regular triangular
lattice. Lattice connectivity is solely determined by sensor radio range; a connectivity
graph is assumed incomplete. In the interest of test standardization, we propose adoption
of a few small examples: Figure 100 through Figure 103 and their particular connectivity
represented by matrices (825) through (828) respectively.

0 • ? •
• 0 • •        (825)
? • 0 ◦
• • ◦ 0

Matrix entries dot • indicate measurable distance between nodes while unknown
distance is denoted by ? (question mark ). Matrix entries hollow dot ◦ represent known
distance between anchors (to high accuracy) while zero distance is denoted 0. Because
measured distances are quite unreliable in practice, our solution to the localization problem
substitutes a distinct range of possible distance for each measurable distance; equality
constraints exist only for anchors.
Anchors are chosen so as to increase difficulty for algorithms dependent on existence
of sensors in their convex hull. The challenge is to find a solution in two dimensions close
to the true sensor positions given incomplete noisy intersensor distance information.

Figure 101: 3-lattice in R2 , hand-drawn. Nodes 7, 8, and 9 are anchors; remaining nodes
are sensors. Radio range of sensor 1 indicated by arc.

0 • • ? • ? ? • •
• 0 • • ? • ? • •
• • 0 • • • • • •
? • • 0 ? • • • •
• ? • ? 0 • • • • (826)
? • • • • 0 • • •
? ? • • • • 0 ◦ ◦
• • • • • • ◦ 0 ◦
• • • • • • ◦ ◦ 0

Figure 102: 4-lattice in R2 , hand-drawn. Nodes 13, 14, 15, and 16 are anchors; remaining
nodes are sensors. Radio range of sensor 1 indicated by arc.

0 ? ? • ? ? • ? ? ? ? ? ? ? • •
? 0 • • • • ? • ? ? ? ? ? • • •
? • 0 ? • • ? ? • ? ? ? ? ? • •
• • ? 0 • ? • • ? • ? ? • • • •
? • • • 0 • ? • • ? • • • • • •
? • • ? • 0 ? • • ? • • ? ? ? ?
• ? ? • ? ? 0 ? ? • ? ? • • • •
? • ? • • • ? 0 • • • • • • • •
(827)
? ? • ? • • ? • 0 ? • • • ? • ?
? ? ? • ? ? • • ? 0 • ? • • • ?
? ? ? ? • • ? • • • 0 • • • • ?
? ? ? ? • • ? • • ? • 0 ? ? ? ?
? ? ? • • ? • • • • • ? 0 ◦ ◦ ◦
? • ? • • ? • • ? • • ? ◦ 0 ◦ ◦
• • • • • ? • • • • • ? ◦ ◦ 0 ◦
• • • • • ? • • ? ? ? ? ◦ ◦ ◦ 0

Figure 103: 5-lattice in R2 . Nodes 21 through 25 are anchors.

0 • ? ? • • ? ? • ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
• 0 ? ? • • ? ? ? • ? ? ? ? ? ? ? ? ? ? ? ? ? • •
? ? 0 • ? • • • ? ? • • ? ? ? ? ? ? ? ? ? ? • • •
? ? • 0 ? ? • • ? ? ? • ? ? ? ? ? ? ? ? ? ? ? • ?
• • ? ? 0 • ? ? • • ? ? • • ? ? • ? ? ? ? ? • ? •
• • • ? • 0 • ? • • • ? ? • ? ? ? ? ? ? ? ? • • •
? ? • • ? • 0 • ? ? • • ? ? • • ? ? ? ? ? ? • • •
? ? • • ? ? • 0 ? ? • • ? ? • • ? ? ? ? ? ? ? • ?
• ? ? ? • • ? ? 0 • ? ? • • ? ? • • ? ? ? ? ? ? ?
? • ? ? • • ? ? • 0 • ? • • ? ? ? • ? ? • • • • •
? ? • ? ? • • • ? • 0 • ? • • • ? ? • ? ? • • • •
? ? • • ? ? • • ? ? • 0 ? ? • • ? ? • • ? • • • ?
? ? ? ? • ? ? ? • • ? ? 0 • ? ? • • ? ? • • ? ? ? (828)
? ? ? ? • • ? ? • • • ? • 0 • ? • • • ? • • • • ?
? ? ? ? ? ? • • ? ? • • ? • 0 • ? ? • • • • • • ?
? ? ? ? ? ? • • ? ? • • ? ? • 0 ? ? • • ? • ? ? ?
? ? ? ? • ? ? ? • ? ? ? • • ? ? 0 • ? ? • ? ? ? ?
? ? ? ? ? ? ? ? • • ? ? • • ? ? • 0 • ? • • • ? ?
? ? ? ? ? ? ? ? ? ? • • ? • • • ? • 0 • • • • ? ?
? ? ? ? ? ? ? ? ? ? ? • ? ? • • ? ? • 0 • • ? ? ?
? ? ? ? ? ? ? ? ? • ? ? • • • ? • • • • 0 ◦ ◦ ◦ ◦
? ? ? ? ? ? ? ? ? • • • • • • • ? • • • ◦ 0 ◦ ◦ ◦
? ? • ? • • • ? ? • • • ? • • ? ? • • ? ◦ ◦ 0 ◦ ◦
? • • • ? • • • ? • • • ? • • ? ? ? ? ? ◦ ◦ ◦ 0 ◦
? • • ? • • • ? ? • • ? ? ? ? ? ? ? ? ? ◦ ◦ ◦ ◦ 0
Figure 104: Location uncertainty ellipsoid in R2 for each of 15 sensors • within three city
blocks in downtown San Francisco. (Data by Polaris Wireless.)

problem statement
Ascribe points in a list {xℓ ∈ Rn , ℓ = 1 . . . N } to the columns of a matrix X ;

X = [ x1 · · · xN ] ∈ Rn×N (79)

where N is regarded as cardinality of list X . Positive semidefinite matrix X TX , formed


from inner product of the list, is a Gram matrix ; [285, §3.6]

G = X TX = [ ‖x1‖²    x1Tx2   x1Tx3   ···   x1TxN
             x2Tx1    ‖x2‖²   x2Tx3   ···   x2TxN
             x3Tx1    x3Tx2   ‖x3‖²   ⋱     x3TxN
             ⋮        ⋮       ⋱       ⋱     ⋮
             xNTx1    xNTx2   xNTx3   ···   ‖xN‖² ]  ∈ SN+        (1058)

where SN+ is the convex cone of N × N positive semidefinite matrices in the symmetric
matrix subspace SN .
Existence of noise precludes measured distance from the input data. We instead assign
measured distance to a range estimate specified by individual upper and lower bounds: d̄ij
is an upper bound on distance-square from i th to j th sensor, while d̲ij is a lower bound.
These bounds become the input data. Each measurement range is presumed different from
the others because of measurement uncertainty; e.g., Figure 104.
Our mathematical treatment of anchors and sensors is not dichotomized.4.33 A sensor
position that is known a priori to high accuracy (with absolute certainty) x̌i is called an
anchor. Then the sensor-network localization problem (824) can be expressed equivalently:
Given a number m of anchors and a set of indices I (corresponding to all measurable
distances • ), for 0 < n < N
4.33 Wireless location problem thus stated identically; difference being: fewer sensors.

find_{G∈SN , X∈Rn×N}   X
subject to   d̲ij ≤ ⟨G , (ei − ej )(ei − ej )T⟩ ≤ d̄ij        ∀(i, j) ∈ I
             ⟨G , ei eiT⟩ = ‖x̌i‖² ,   i = N − m + 1 . . . N
             ⟨G , (ei ejT + ej eiT)/2⟩ = x̌iTx̌j ,   i < j ,   ∀ i, j ∈ {N − m + 1 . . . N }
             X(: , N − m + 1 : N ) = [ x̌N−m+1 · · · x̌N ]
             Z = [ I    X
                   X T  G ]  º 0
             rank Z = n        (829)

where ei is the i th member of the standard basis for RN . Distance-square

dij = ‖xi − xj‖₂² = ⟨xi − xj , xi − xj⟩        (1045)

is related to Gram matrix entries G ≜ [gij ] by vector inner-product

dij = gii + gjj − 2gij
    = ⟨G , (ei − ej )(ei − ej )T⟩ = tr(G T(ei − ej )(ei − ej )T)        (1060)

hence the scalar inequalities. Each linear equality constraint in G ∈ SN represents a


hyperplane in isometrically isomorphic Euclidean vector space RN (N +1)/2 , while each
linear inequality pair represents a convex Euclidean body known as slab.4.34 By Schur
complement (§A.4), any solution (G , X) provides comparison with respect to the positive
semidefinite cone
G º X TX (1098)
which is a convex relaxation of the desired equality constraint

[ I    X
  X T  G ]  =  [ I ; X T ] [ I  X ]        (1099)

The rank constraint ensures this equality holds, by Theorem A.4.0.1.3, thus restricting
solution to Rn . Assuming full-rank solution (list) X

rank Z = rank G = rank X (830)

convex equivalent problem statement


Problem statement (829) is nonconvex because of the rank constraint. We do not eliminate
or ignore the rank constraint; rather, we find a convex way to enforce it: for 0 < n < N

minimize_{G∈SN , X∈Rn×N}   ⟨Z , W⟩
subject to   d̲ij ≤ ⟨G , (ei − ej )(ei − ej )T⟩ ≤ d̄ij        ∀(i, j) ∈ I
             ⟨G , ei eiT⟩ = ‖x̌i‖² ,   i = N − m + 1 . . . N
             ⟨G , (ei ejT + ej eiT)/2⟩ = x̌iTx̌j ,   i < j ,   ∀ i, j ∈ {N − m + 1 . . . N }
             X(: , N − m + 1 : N ) = [ x̌N−m+1 · · · x̌N ]
             Z = [ I    X
                   X T  G ]  º 0        (831)

4.34 an intersection of two parallel but opposing halfspaces (Figure 13). In terms of position X , this
distance slab can be thought of as a thick hypershell instead of a hypersphere boundary.

Figure 105: Typical solution for 2-lattice in Figure 100 with noise factor η = 0.1 (834).
Two red rightmost nodes are anchors; two remaining nodes are sensors. Radio range of
sensor 1 indicated by arc; radius = 1.14 . Actual sensor indicated by target # while its
localization is indicated by bullet • . Rank-2 solution found in 1 iteration (831) (1892a)
subject to reflection error.

Convex function tr Z is a well-known heuristic whose sole purpose is to represent convex
envelope of rank Z . (§7.2.2.1) In this convex optimization problem (831), a semidefinite
program, we substitute a vector inner-product objective function for trace;

tr Z = ⟨Z , I⟩  ←  ⟨Z , W⟩        (832)

a generalization of the trace heuristic for minimizing convex envelope of rank, where
W ∈ S^{N+n}_+ is constant with respect to (831). Matrix W is normal to a hyperplane in
S^{N+n} minimized over a convex feasible set specified by the constraints in (831). Matrix
W is chosen so −W points in direction of rank-n feasible solutions G . For properly
chosen W , problem (831) becomes an equivalent to (829). Thus the purpose of vector
inner-product objective (832) is to locate a rank-n feasible Gram matrix assumed existent
on the boundary of positive semidefinite cone SN+ , as explained beginning in §4.5.1; how
to choose direction vector W is explained there and in what follows:

direction matrix W
Denote by Z ⋆ an optimal composite matrix from semidefinite program (831). Then
for Z ⋆ ∈ S^{N+n} whose eigenvalues λ(Z ⋆) ∈ R^{N+n} are arranged in nonincreasing order,
(Ky Fan)

∑_{i=n+1}^{N+n} λ(Z ⋆)i  =  minimize_{W∈S^{N+n}}   ⟨Z ⋆ , W⟩
                            subject to   0 ¹ W ¹ I        (1892a)
                                         tr W = N
which has an optimal solution that is known in closed form (p.539, §4.5.1.1). This
eigenvalue sum is zero when Z ⋆ has rank n or less.
Foreknowledge of optimal Z ⋆ , to make possible this search for W , implies iteration;
id est, semidefinite program (831) is solved for Z ⋆ initializing W = I or W = 0. Once
found, Z ⋆ becomes constant in semidefinite program (1892a) where a new normal direction
W is found as its optimal solution. Then this cycle (831) (1892a) iterates until convergence.
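A hedged Matlab/CVX sketch of this cycle follows. Index pairs of I are stored rowwise
in a p×2 array, bounds in vectors dl and du , anchor positions as columns of xa ∈ Rn×m ;
those names, the iteration cap, and the tolerance are our own:

    W = eye(N + n);                                % or initialize W = 0
    for iter = 1:50
        cvx_begin sdp quiet
            variable G(N,N) symmetric
            variable X(n,N)
            minimize( trace(W*[eye(n) X; X' G]) )  % <Z,W>  (831)
            for p = 1:size(I,1)
                e = zeros(N,1); e(I(p,1)) = 1; e(I(p,2)) = -1;
                e'*G*e >= dl(p);                   % slab per measured distance
                e'*G*e <= du(p);
            end
            X(:, N-m+1:N) == xa;                   % anchor positions known
            G(N-m+1:N, N-m+1:N) == xa'*xa;         % anchor Gram entries known
            [eye(n) X; X' G] >= 0;                 % Z PSD
        cvx_end
        Z = [eye(n) X; X' G];
        [Q,L] = eig(Z); [lam, idx] = sort(diag(L), 'descend');
        U = Q(:, idx(n+1:N+n)); W = U*U';          % closed form for (1892a)
        if sum(lam(n+1:N+n)) < 1e-8, break, end    % rank Z = n achieved
    end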

Figure 106: Typical solution for 3-lattice in Figure 101 with noise factor η = 0.1 (834).
Three red vertical middle nodes are anchors; remaining nodes are sensors. Radio range of
sensor 1 indicated by arc; radius = 1.12 . Actual sensor indicated by target # while its
localization is indicated by bullet • . Rank-2 solution found in 2 iterations (831) (1892a).


Figure 107: Typical solution for 4-lattice in Figure 102 with noise factor η = 0.1 (834).
Four red vertical middle-left nodes are anchors; remaining nodes are sensors. Radio range
of sensor 1 indicated by arc; radius = 0.75 . Actual sensor indicated by target # while its
localization is indicated by bullet • . Rank-2 solution found in 7 iterations (831) (1892a).

Figure 108: Typical solution for 5-lattice in Figure 103 with noise factor η = 0.1 (834).
Five red vertical middle nodes are anchors; remaining nodes are sensors. Radio range of
sensor 1 indicated by arc; radius = 0.56 . Actual sensor indicated by target # while its
localization is indicated by bullet • . Rank-2 solution found in 3 iterations (831) (1892a).


Figure 109: Typical solution for 10-lattice with noise factor η = 0.1 (834) compares better
than Carter & Jin [79, fig.4.2]. Ten red vertical middle nodes are anchors; the rest are
sensors. Radio range of sensor 1 indicated by arc; radius = 0.25 . Actual sensor indicated
by target # while its localization is indicated by bullet • . Rank-2 solution found in 5
iterations (831) (1892a).

Figure 110: Typical localization of 100 randomized noiseless sensors (η = 0 (834)) is exact
despite incomplete EDM. Ten red vertical middle nodes are anchors; remaining nodes are
sensors. Radio range of sensor at origin indicated by arc; radius = 0.25 . Actual sensor
indicated by target # while its localization is indicated by bullet • . Rank-2 solution
found in 3 iterations (831) (1892a).

When rank Z ⋆ = n , solution via this convex iteration solves sensor-network localization
problem (824) and its equivalent (829).

numerical solution
In all examples to follow, number of anchors

m = √N        (833)

equals square root of cardinality N of list X . Indices set I identifying all measurable
distances • is ascertained from connectivity matrix (825), (826), (827), or (828). We
solve iteration (831) (1892a) in dimension n = 2 for each respective example illustrated
in Figure 100 through Figure 103.
In presence of negligible noise, true position is reliably localized for every standardized
example; noteworthy insofar as each example represents an incomplete graph. This implies
that the set of all optimal solutions having least rank must be small.
To make the examples interesting and consistent with previous work, we randomize
each range of distance-square that bounds ⟨G , (ei − ej )(ei − ej )T⟩ in (831); id est, for each
and every (i, j) ∈ I

d̄ij = dij (1 + √3 η χl )²
d̲ij = dij (1 − √3 η χl+1 )²        (834)

where η = 0.1 is a constant noise factor, χl is the l th sample of a noise process realization
uniformly distributed in the interval (0 , 1) like rand(1) from Matlab, and dij is actual
distance-square from i th to j th sensor. Because of distinct function calls to rand() , each
range of distance-square [ d̲ij , d̄ij ] is not necessarily centered on actual distance-square
dij . Unit stochastic variance is provided by factor √3.
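In Matlab, those bounds might be generated as in the following sketch, where dtrue holds
actual distance-squares over the index set I (our naming):

    eta = 0.1;
    du = dtrue.*(1 + sqrt(3)*eta*rand(size(dtrue))).^2;   % upper bounds (834)
    dl = dtrue.*(1 - sqrt(3)*eta*rand(size(dtrue))).^2;   % lower bounds, fresh samples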
Figure 105 through Figure 108 each illustrate one realization of numerical solution to
the standardized lattice problems posed by Figure 100 through Figure 103 respectively.
Exact localization, by any method, is impossible because of measurement noise. Certainly,
by inspection of their published graphical data, our results are better than those of
Carter & Jin. (Figure 109, 110, 111) Obviously our solutions do not suffer from those

Figure 111: Typical solution for 100 randomized sensors with noise factor η = 0.1 (834);
worst measured average sensor error ≈ 0.0044 compares better than Carter & Jin’s 0.0154
computed in 0.71s [79, p.19]. Ten red vertical middle nodes are anchors; same as before.
Remaining nodes are sensors. Interior anchor placement makes localization difficult. Radio
range of sensor at origin indicated by arc; radius = 0.25 . Actual sensor indicated by target
# while its localization is indicated by bullet • . After 1 iteration rank G = 92 , after 2
iterations rank G = 4. Rank-2 solution found in 3 iterations (831) (1892a). (Regular
lattice in Figure 109 is actually harder to solve, requiring more iterations.) Runtime for
SDPT3 [395] under cvx [195] is a few minutes on 2009 vintage laptop Core 2 Duo CPU
(Intel T6400@2GHz, 800MHz FSB).

compaction-type errors (clustering of localized sensors) exhibited by Biswas’ graphical


results for the same noise factor η .

localization example conclusion


Solution to this sensor-network localization problem became apparent by understanding
geometry of optimization. Trace of a matrix, to a student of linear algebra, is perhaps
a sum of eigenvalues. But to us, trace represents the normal I to some hyperplane in
Euclidean vector space. (Figure 98)
Our solutions are globally optimal, requiring: 1) no centralized-gradient postprocessing
heuristic refinement as in [48] because there is effectively no relaxation of (829) at global
optimality, 2) no implicit postprojection on rank-2 positive semidefinite matrices induced
by nonzero G−X TX denoting suboptimality as occurs in [49] [50] [51] [79] [248] [256];
indeed, G⋆ = X ⋆TX ⋆ by convex iteration.
Numerical solution to noisy problems, containing sensor variables well in excess of
100 , becomes difficult via the holistic semidefinite program we proposed. When problem
size is within reach of contemporary general-purpose semidefinite program solvers, then
the convex iteration we presented inherently overcomes limitations of Carter & Jin with
respect to both noise performance and ability to localize in any desired affine dimension.
The legacy of Carter, Jin, Saunders, & Ye [79] is a sobering demonstration of the need
for more efficient methods for solution of semidefinite programs, while that of So & Ye
[363] forever bonds distance geometry to semidefinite programming. Elegance of our
semidefinite problem statement (831), for constraining affine dimension of sensor-network
localization, should provide some impetus to focus more research on computational
intensity of general-purpose semidefinite program solvers. An approach different from
interior-point methods is required; higher speed and greater accuracy from a simplex-like
solver is what is needed. 2

4.5.1.2.5 Example. Nonnegative spectral factorization. (confer §3.14.2.0.2)


Having found optimal real coefficient vectors v ⋆ , u⋆ for a sixteenth order magnitude square
transfer function, evaluated along the ω axis (p.212),

|H(ω)|² = H(ω)H(−ω) = (1 + v1⋆ω² + v2⋆ω⁴ + . . . + v8⋆ω¹⁶) / (1 + u1⋆ω² + u2⋆ω⁴ + . . . + u8⋆ω¹⁶)        (680)

we wish to find real coefficients b , a for corresponding Fourier transform

1 + b1 ω + b2 (ω)2 + . . . + b8 (ω)8
H(ω) = (677)
1 + a1 ω + a2 (ω)2 + . . . + a8 (ω)8

These coefficients b , a , v ⋆ , u⋆ are related through simultaneous nonlinear algebraic


equations:

v1⋆ = b21 − 2b2 , u⋆1 = a21 − 2a2


v2⋆ = b22 − 2b1 b3 + 2b4 , u⋆2 = a22 − 2a1 a3 + 2a4
v3⋆ = b23 − 2b2 b4 + 2b1 b5 − 2b6 , u⋆3 = a23 − 2a2 a4 + 2a1 a5 − 2a6
v4⋆ = b24 − 2b3 b5 + 2b2 b6 − 2b1 b7 + 2b8 , u⋆4 = a24 − 2a3 a5 + 2a2 a6 − 2a1 a7 + 2a8
v5⋆ = b25 − 2b4 b6 + 2b3 b7 − 2b2 b8 , u⋆5 = a25 − 2a4 a6 + 2a3 a7 − 2a2 a8
v6⋆ = b26 − 2b5 b7 + 2b4 b8 , u⋆6 = a26 − 2a5 a7 + 2a4 a8
v7⋆ = b27 − 2b6 b8 , u⋆7 = a27 − 2a6 a8
v8⋆ = b28 , u⋆8 = a28 (835)

Define a rank-1 matrix

G(b) ≜ [ 1 ; b ][ 1  bT ] = [ 1   bT
                              b   bbT ]  ∈ S9        (836)

whose (i , j) entry is b_{i−1} b_{j−1} with b0 ≜ 1.

(Matrix G(a) is similarly defined.) Observe that v ⋆ in (835) is formed by summing


antidiagonals of G(b) whose entries alternate sign. A particular sum is specified by a
predetermined symmetric matrix constant Ai (confer (60)) from a set {Ai ∈ S 9 , i = 1 . . . 8}.
With

A ≜ [ svec(A1 )T
        ⋮
      svec(A8 )T ]  ∈ R^{8×9(9+1)/2}        (712)
as previously defined in §4.1.1, all the sums (835) may be stated as two linear equalities
A svec G(b) = v ⋆ and A svec G(a) = u⋆ . Then the problem of finding coefficients b may be
stated as a feasibility problem4.35

find_{G∈S9}   b ∈ R8
subject to   A svec G = v ⋆
             [ 1 ; b ] = G(: , 1)        (837)
             b º 0
             (G º 0)
             rank G = 1

The rank-1 constraint is handled by convex iteration, as explained in §4.5.1. Positive


semidefiniteness is parenthetical here because, for rank-1 matrices, symmetry is necessary
and sufficient (§A.3.1.0.7). 2
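The constants Ai can be inferred from (835): entry (i , j) of G(b) (0-based indexing, with
b0 ≜ 1) lies on antidiagonal i + j = 2k of sum v⋆k , carrying sign (−1)^{(j−i)/2}. A Matlab
sketch of that construction (our inference from (835), not code from the text):

    Acell = cell(8,1);
    for k = 1:8
        Ak = zeros(9);
        for i = 0:8                            % 0-based row index into G(b)
            j = 2*k - i;                       % antidiagonal i + j = 2k
            if j >= 0 && j <= 8
                Ak(i+1, j+1) = (-1)^((j-i)/2); % alternating sign along antidiagonal
            end
        end
        Acell{k} = Ak;                         % then <Acell{k}, G(b)> = v_k in (835)
    end

For instance, k = 1 yields ⟨A1 , G(b)⟩ = −b2 + b1² − b2 = b1² − 2b2 , matching the first
identity of (835).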

4.5.1.2.6 Example. Nonnegative spectral factorization II.


The purpose of spectral factorization, in electronics, is to facilitate high order filter
implementation in the form of passive and active circuitry. Cascades of second-order
(Laplace) sections are preferred because component sensitivity becomes manageable and
because needed complex poles and zeros cannot be obtained from a first-order section.
Nonnegative spectral factorization on a magnitude square transfer function, evaluated
along the ω axis, was performed in Example 4.5.1.2.5 to recover its corresponding Fourier
transform.4.36 In this example, we nonnegatively decompose a high order magnitude
square transfer function into a product of successively lower order magnitude square
transfer functions. Once fourth order magnitude square functions are found, then
4.35 separately from the similar optimization problem to find vector a . Stability requires a º 0 with more
constraints on a . Minimum phase requires b º 0 and more constraints on b that are missing from problem
statement (837). Both stability and minimum phase may be enforced, subsequent to spectral factorization,
by negating positive real parts of poles and zeros respectively in order to move them into the left half
(Laplace) s-plane with no impact to |H(ω)|.
4.36 When there are no poles on the ω axis, a Laplace transform can be recovered from a Fourier transform

by substitution ω ← s .

Figure 112: Nonnegative spectral factorization, high order bisection strategy. η = 8 th
order Laplace transform corresponds to 2η = 16 th order magnitude square transfer function.
Because numerator v and denominator u are factored separately, number of factorizations
= 2(log2 (η)−1). In the text, double dots v̈ , ü connote first bifurcation (level 2); triple
dots v⃛ , u⃛ connote second bifurcation (level 3). Factors per level = 2^{level−1}.

corresponding second-order Laplace transfer function coefficients are ascertained from


(679) and then passive component values can be determined from those coefficients.
Our strategy, for an eighth order Laplace transfer function, is illustrated in Figure 112.
We begin at the tree’s level 2 factorization. Nonnegative decomposition of a 16 th order
magnitude square transfer function into two 8 th order functions
(1 + v1⋆ω² + v2⋆ω⁴ + . . . + v8⋆ω¹⁶)/(1 + u1⋆ω² + u2⋆ω⁴ + . . . + u8⋆ω¹⁶)
   = [(1 + v̈1ω² + v̈2ω⁴ + v̈3ω⁶ + v̈4ω⁸)/(1 + ü1ω² + ü2ω⁴ + ü3ω⁶ + ü4ω⁸)] ·
     [(1 + v̈5ω² + v̈6ω⁴ + v̈7ω⁶ + v̈8ω⁸)/(1 + ü5ω² + ü6ω⁴ + ü7ω⁶ + ü8ω⁸)]        (838)
implies these simultaneous algebraic identifications with known real coefficient vectors
v ⋆ , u⋆ :
v1⋆ = v̈1 + v̈5 , u⋆1 = ü1 + ü5
v2⋆ = v̈2 + v̈6 + v̈1 v̈5 , u⋆2 = ü2 + ü6 + ü1 ü5
v3⋆ = v̈3 + v̈7 + v̈1 v̈6 + v̈2 v̈5 , u⋆3 = ü3 + ü7 + ü1 ü6 + ü2 ü5
v4⋆ = v̈4 + v̈8 + v̈1 v̈7 + v̈2 v̈6 + v̈3 v̈5 , u⋆4 = ü4 + ü8 + ü1 ü7 + ü2 ü6 + ü3 ü5
(839)
v5⋆ = v̈4 v̈5 + v̈3 v̈6 + v̈2 v̈7 + v̈1 v̈8 , u⋆5 = ü4 ü5 + ü3 ü6 + ü2 ü7 + ü1 ü8
v6⋆ = v̈4 v̈6 + v̈3 v̈7 + v̈2 v̈8 , u⋆6 = ü4 ü6 + ü3 ü7 + ü2 ü8
v7⋆ = v̈4 v̈7 + v̈3 v̈8 , u⋆7 = ü4 ü7 + ü3 ü8
v8⋆ = v̈4 v̈8 , u⋆8 = ü4 ü8
Now define a rank-1 matrix for the numerator

G(v̈) ≜ [ 1 ; v̈ ][ 1  v̈T ] = [ 1   v̈T
                               v̈   v̈v̈T ]  ∈ S9        (840)
(Matrix G(ü) is defined similarly for the denominator.) Terms in (839) are picked
out of G(v̈) by a predetermined symmetric matrix constant Äi (confer (60)) from a set

{Äi ∈ S 9 , i = 1 . . . 8}. Populating rows of


 
A ≜ [ svec(Ä1 )T
        ⋮
      svec(Ä8 )T ]  ∈ R^{8×9(9+1)/2}        (712)

with vectorized Äi (as in §4.1.1), sums (839) are succinctly represented by two linear
equalities A svec G(v̈) = v ⋆ and A svec G(ü) = u⋆ . Then this spectral factorization in v̈
may be posed as a feasibility problem

find_{G∈S9}   v̈ ∈ R8
subject to   A svec G = v ⋆
             [ 1 ; v̈ ] = G(: , 1)        (841)
             v̈ º 0
             (G º 0)
             rank G = 1

Having found two 8 th order square spectral factors in nonnegative v̈ ⋆ from (841), two
pairs of 4 th order level 3 factors remain to be found:
(1 + v̈1⋆ω² + v̈2⋆ω⁴ + v̈3⋆ω⁶ + v̈4⋆ω⁸)/(1 + ü1⋆ω² + ü2⋆ω⁴ + ü3⋆ω⁶ + ü4⋆ω⁸)
   = [(1 + v⃛1ω² + v⃛2ω⁴)/(1 + u⃛1ω² + u⃛2ω⁴)] [(1 + v⃛3ω² + v⃛4ω⁴)/(1 + u⃛3ω² + u⃛4ω⁴)]        (842)

(1 + v̈5⋆ω² + v̈6⋆ω⁴ + v̈7⋆ω⁶ + v̈8⋆ω⁸)/(1 + ü5⋆ω² + ü6⋆ω⁴ + ü7⋆ω⁶ + ü8⋆ω⁸)
   = [(1 + v⃛5ω² + v⃛6ω⁴)/(1 + u⃛5ω² + u⃛6ω⁴)] [(1 + v⃛7ω² + v⃛8ω⁴)/(1 + u⃛7ω² + u⃛8ω⁴)]        (843)

v̈1⋆ = v⃛1 + v⃛3 ,                    ü1⋆ = u⃛1 + u⃛3
v̈2⋆ = v⃛2 + v⃛4 + v⃛1 v⃛3 ,           ü2⋆ = u⃛2 + u⃛4 + u⃛1 u⃛3
v̈3⋆ = v⃛1 v⃛4 + v⃛2 v⃛3 ,             ü3⋆ = u⃛1 u⃛4 + u⃛2 u⃛3        (844)
v̈4⋆ = v⃛2 v⃛4 ,                      ü4⋆ = u⃛2 u⃛4

v̈5⋆ = v⃛5 + v⃛7 ,                    ü5⋆ = u⃛5 + u⃛7
v̈6⋆ = v⃛6 + v⃛8 + v⃛5 v⃛7 ,           ü6⋆ = u⃛6 + u⃛8 + u⃛5 u⃛7
v̈7⋆ = v⃛5 v⃛8 + v⃛6 v⃛7 ,             ü7⋆ = u⃛5 u⃛8 + u⃛6 u⃛7        (845)
v̈8⋆ = v⃛6 v⃛8 ,                      ü8⋆ = u⃛6 u⃛8
 ... ... ... ... ... ... ... ... 
1 v1 v2 v3 v4 v5 v6 v7 v8
... ...2 ... ... ... ... ... ... ... ... ... ... ... ... ... ...
 v1 v1 v 1 v2 v1 v3 v1 v 4 v 1 v5 v1 v6 v1 v 7 v 1 v8 
 ... ... ... ...2 ... ... ... ... ... ... ... ... ... ... ... ... 
 v2 v1 v 2 v2 v2 v3 v2 v 4 v 2 v5 v2 v6 v2 v 7 v 2 v8 
 ... ... ... ... ... ...2 ... ... ... ... ... ... ... ... ... ... 
· ¸ ...T  v3 v1 v 3 v 2 v3 v3 v3 v 4 v 3 v5 v3 v6 v3 v 7 v 3 v8 
... 1 [1 v ]  ... ... ... ... ... ... ... ...2 ... ... ... ... ... ... ... ...  9
G(v) , ... = v4 v1 v 4 v 2 v4 v3 v4 v4 v 4 v5 v4 v6 v4 v 7 v 4 v8 ∈ S (846)
v  ... ... ... ... ... ... ... ... ... ...2 ... ... ... ... ... ... 
 v5 v1 v 5 v 2 v5 v3 v5 v4 v 5 v5 v5 v6 v5 v 7 v 5 v8 
 ... ... ... ... ... ... ... ... ... ... ... ...2 ... ... ... ... 
 v6 v1 v 6 v 2 v6 v3 v6 v4 v 6 v 5 v6 v6 v6 v 7 v 6 v8 
 ... ... ... ... ... ... ... ... ... ... ... ... ... ...2 ... ... 
 v7 v1 v 7 v 2 v7 v3 v7 v4 v 7 v 5 v7 v6 v7 v7 v 7 v8 
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...2
v8 v1 v 8 v 2 v8 v3 v8 v4 v 8 v 5 v8 v6 v8 v7 v 8 v8
Setting

A ≜ [ svec(A⃛1 )T
        ⋮
      svec(A⃛8 )T ]  ∈ R^{8×9(9+1)/2}        (712)

Figure 113: Regularization curve, parametrized by weight w for real convex objective f
minimization (848) with rank constraint to k by convex iteration, illustrates discontinuity
in f .

then all level 3 (Figure 112) nonnegative spectral factorization coefficients v⃛ are found
at once by solving
find_{G∈S9}   v⃛ ∈ R8
subject to   A svec G = v̈ ⋆
             [ 1 ; v⃛ ] = G(: , 1)        (847)
             v⃛ º 0
             (G º 0)
             rank G = 1
...
The feasibility problem to find u is similar. All second-order Laplace transfer function
coefficients can be found via (679). 2

4.5.2 regularization
We test the convex iteration technique, for constraining rank, over a wide range of problems
beyond localization of randomized positions (Figure 111); e.g., stress (§7.2.2.7.1), ball
packing (§5.4.2.2.6), and cardinality (§4.7). We have had some success introducing the
direction matrix inner-product (832) as a regularization term4.37

minimize_{Z∈SN}   f (Z) + w⟨Z , W⟩
subject to        Z ∈ C        (848)
                  Z º 0
4.37 called multiobjective- or vector optimization. Proof of convergence for this convex iteration is identical
to that in §4.5.1.2.1 because f is a convex real function, hence bounded below, and f (Z ⋆ ) is constant in
(849).

minimize_{W∈SN}   f (Z ⋆) + w⟨Z ⋆ , W⟩
subject to        0 ¹ W ¹ I        (849)
                  tr W = N − n
whose purpose is to constrain rank, affine dimension, or cardinality:
The abstraction, that is Figure 113, is a synopsis; a broad generalization of
accumulated empirical evidence: There exists a critical (smallest) weight wc for which a
rank constraint is just met. Graphical discontinuity can subsequently exist when there is a
range of greater w providing required rank k but not necessarily increasing a minimization
objective function f ; e.g., §4.7.0.0.2. Positive scalar w is chosen via bisection so that
⟨Z ⋆ , W ⋆⟩ just vanishes.
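That bisection might be sketched as follows, where run_to_convergence is a hypothetical
routine carrying convex iteration (848) (849) to convergence for a fixed weight w and
returning the converged inner product ⟨Z ⋆ , W ⋆⟩:

    tol = 1e-9;
    wlo = 0;  whi = 1;
    while run_to_convergence(whi) > tol        % grow w until rank constraint met
        whi = 2*whi;
    end
    for t = 1:30                               % bisect toward critical weight wc
        w = (wlo + whi)/2;
        if run_to_convergence(w) <= tol, whi = w; else, wlo = w; end
    end
    wc = whi;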

4.6 Constraining cardinality


The convex iteration technique for constraining rank can be applied to cardinality
problems. There are parallels in its development analogous to how prototypical
semidefinite program (711) resembles linear program (710) on page 224 [452]:

4.6.1 nonnegative variable


Our goal has been to reliably constrain rank in a semidefinite program. There is a direct
analogy to linear programming that is simpler to present but, perhaps, more difficult to
solve. In Optimization, that analogy is known as the cardinality problem.
Consider a feasibility problem Ax = b , but with an upper bound k on cardinality
‖x‖0 of a nonnegative solution x : for A ∈ Rm×n and vector b ∈ R(A)

find   x ∈ Rn
subject to   Ax = b
             x º 0        (541)
             ‖x‖0 ≤ k

where ‖x‖0 ≤ k means4.38 vector x has at most k nonzero entries; such a vector is
presumed existent in the feasible set. Nonnegativity constraint x º 0 is analogous to
positive semidefiniteness; the notation means vector x belongs to the nonnegative orthant
Rn+ . Cardinality is quasiconcave on Rn+ just as rank is quasiconcave on Sn+ . [66, §3.4.2]

4.6.1.1 direction vector


We propose that cardinality-constrained feasibility problem (541) can be equivalently
expressed as iteration of a sequence of two convex problems: for 0 ≤ k ≤ n−1

minimize_{x∈Rn}   ⟨x , y⟩
subject to        Ax = b        (160)
                  x º 0

∑_{i=k+1}^{n} π(x⋆)i  =  minimize_{y∈Rn}   ⟨x⋆ , y⟩
                         subject to        0 ¹ y ¹ 1        (536)
                                           yT1 = n − k
4.38 Although it is a metric (§5.2), cardinality ‖x‖0 cannot be a norm (§3.2) because it is not positively
homogeneous.

Figure 114: (confer Figure 98) 1-norm heuristic for cardinality minimization can be
interpreted as minimization of a hyperplane, ∂H with normal 1 , over nonnegative orthant
drawn here in R3 . Polar of direction vector y = 1 points toward origin.

where π is the (nonincreasing) presorting function (1487). This sequence is iterated until
x⋆Ty ⋆ vanishes; id est, until desired cardinality is achieved. But this global optimality
cannot be guaranteed.4.39
Problem (536) is analogous to the rank constraint problem; (p.250)
∑_{i=k+1}^{N} λ(G⋆)i  =  minimize_{W∈SN}   ⟨G⋆ , W⟩
                         subject to        0 ¹ W ¹ I        (1892a)
                                           tr W = N − k

The feasible set of (536) is Linear Program’s analogue to Fantope (§2.3.2.0.1); its optimal
subset comprises a sum of n− k smallest entries from vector x . In context of problem
(541), we want n− k entries of x to sum to zero; id est, we want a globally optimal
objective x⋆Ty ⋆ to vanish: more generally, (confer (815))
∑_{i=k+1}^{n} π(|x⋆|)i = ⟨|x⋆| , y ⋆⟩ = |x⋆|T y ⋆ ≜ 0        (850)

defines global optimality for the iteration. Then n− k entries of x⋆ are themselves zero
whenever their absolute sum is, and cardinality of x⋆ ∈ Rn is at most k . Optimal direction
vector y ⋆ is defined as any nonnegative vector for which

find   x ∈ Rn                      minimize_{x∈Rn}   ⟨x , y ⋆⟩
subject to   Ax = b        ≡       subject to   Ax = b
             x º 0      (541)                   x º 0        (160)
             ‖x‖0 ≤ k

Existence of such a y ⋆ , whose nonzero entries are complementary to those of x⋆ , is obvious


assuming existence of a cardinality-k solution x⋆ .
4.39 When it succeeds, a sequence may be regarded as a homotopy to minimal 0-norm.

4.6.1.2 direction vector interpretation

(confer §4.5.1.1) Vector y may be interpreted as a negative search direction; it points
opposite to direction of movement of hyperplane {x | ⟨x , y⟩ = τ } in a minimization of
real linear function ⟨x , y⟩ over the feasible set in linear program (160). (p.62) Direction
vector y is not unique. The feasible set of direction vectors in (536) is the convex hull of
all cardinality-(n− k) binary vectors; videlicet,

conv{ u ∈ Rn | card u = n − k , ui ∈ {0, 1} } = { a ∈ Rn | 1 º a º 0 , ⟨1 , a⟩ = n − k }        (851)

This set, argument to conv{ } , comprises the extreme points of set (851) which is a
nonnegative hypercube slice. An optimal solution y to (536), that is an extreme point
of its feasible set, is known in closed form: it has 1 in each entry corresponding to the
n− k smallest entries of x⋆ and has 0 elsewhere. That particular polar direction −y can
be interpreted4.40 (by Proposition 7.1.3.0.3) as pointing toward the nonnegative orthant
in the Cartesian subspace, whose basis is a subset of the Cartesian axes, containing all
cardinality k (or less) vectors having the same ordering as x⋆ . Consequently, for that
closed-form solution, (confer (816))

∑_{i=k+1}^{n} π(|x⋆|)i = ⟨|x⋆| , y⟩ = |x⋆|T y ≥ 0        (852)

When y = 1 , as in 1-norm minimization for example, then polar direction −y points


directly at the origin (the cardinality-0 nonnegative vector) as in Figure 114. We
sometimes solve (536) instead of employing a known closed form because a direction
vector is not unique. Setting direction vector y instead in accordance with an iterative
inverse weighting scheme, called reweighting [188], was described for the 1-norm by Huo
[239, §4.11.3] in 1999.
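A minimal Matlab sketch of iteration (160) (536) with that closed-form direction vector,
using Optimization Toolbox linprog (the tolerance and iteration cap are our own):

    y = ones(n,1);                             % first pass: 1-norm heuristic
    for iter = 1:100
        x = linprog(y, [], [], A, b, zeros(n,1), []);   % solve (160)
        [~, idx] = sort(x, 'descend');
        y = zeros(n,1);
        y(idx(k+1:n)) = 1;                     % closed-form solution to (536)
        if x'*y < 1e-9, break, end             % (850): cardinality <= k attained
    end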

4.6.1.3 convergence can mean stalling

Convex iteration (160) (536) always converges to a locally optimal solution, a fixed point
of possibly infeasible cardinality, by virtue of a monotonically nonincreasing real objective
sequence. [294, §1.2] [44, §1.1] There can be no proof of global optimality, defined by (850).
Constraining cardinality (solution to problem (541)) can often be achieved, but simple
examples can be contrived that stall at a fixed point of infeasible cardinality; at a positive
objective value ⟨x⋆ , y⟩ = τ > 0. Direction vector y is then manipulated, as countermeasure,
to steer out of local minima; e.g., complete randomization as in Example 4.6.1.5.1, or
reinitialization to a random cardinality-(n− k) vector in the same nonnegative orthant
face demanded by the current iterate: y has nonnegative uniformly distributed random
entries in (0 , 1] corresponding to the n− k smallest entries of x⋆ and has 0 elsewhere.
Zero entries behave like memory or state while randomness greatly diminishes likelihood
of a stall. When this particular heuristic is successful, cardinality and objective sequence
⟨x⋆ , y⟩ versus iteration are characterized by noisy monotonicity.

4.40 Convex iteration (160) (536) is not a projection method because there is no thresholding or discard of
variable-vector x entries. An optimal direction vector y must always reside on the feasible set boundary
in (536) page 273; id est, it is ill-advised to attempt simultaneous optimization of variables x and y .

4.6.1.4 algebraic derivation of direction vector for convex iteration


In §3.2.2.1.3, the compressed sensing problem was precisely represented as a nonconvex
difference of convex functions bounded below by 0

find   x ∈ Rn                     minimize_{x∈Rn}   ‖x‖1 − ‖x‖ⁿₖ
subject to   Ax = b        ≡      subject to   Ax = b        (541)
             x º 0                             x º 0
             ‖x‖0 ≤ k

where convex k-largest norm ‖x‖ⁿₖ is monotonic on Rn+ . There we showed how (541) is
equivalently stated in terms of gradients

minimize_{x∈Rn}   ⟨x , ∇‖x‖1 − ∇‖x‖ⁿₖ⟩
subject to        Ax = b        (853)
                  x º 0

because

‖x‖1 = xT∇‖x‖1 ,   ‖x‖ⁿₖ = xT∇‖x‖ⁿₖ ,   x º 0        (854)

The objective function from (853) is a directional derivative (at x in direction x , §D.1.6,
confer §D.1.4.1.1) of the objective function from (541) while the direction vector of convex
iteration
y = ∇‖x‖1 − ∇‖x‖ⁿₖ        (855)

is an objective gradient where ∇‖x‖1 = ∇1Tx = 1 under nonnegativity and

∇‖x‖ⁿₖ = ∇zTx = arg maximize_{z∈Rn}   zTx
                    subject to   0 ¹ z ¹ 1      ,   x º 0        (544)
                                 zT1 = k

is not unique. Substituting 1 − z ← z , the direction vector becomes

y = 1 − arg maximize_{z∈Rn}   zTx   ←   arg minimize_{z∈Rn}   zTx
            subject to   0 ¹ z ¹ 1          subject to   0 ¹ z ¹ 1        (536)
                         zT1 = k                         zT1 = n − k

4.6.1.5 optimality conditions for minimal cardinality


Now we see how global optimality conditions can be stated without reference to a dual
problem: From conditions (478) for optimality of (541), it is necessary [66, §5.5.3] that

x⋆ º 0                                                   (1)
Ax⋆ = b                                                  (2)
∇‖x⋆‖1 − ∇‖x⋆‖ⁿₖ + ATν ⋆ º 0                             (3)        (856)
⟨∇‖x⋆‖1 − ∇‖x⋆‖ⁿₖ + ATν ⋆ , x⋆⟩ = 0                      (4L)

These conditions must hold at any optimal solution (locally or globally). By (854), the
fourth condition is identical to

‖x⋆‖1 − ‖x⋆‖ⁿₖ + ν ⋆TAx⋆ = 0    (4L)        (857)

Because a 1-norm

‖x‖1 = ‖x‖ⁿₖ + ‖π(|x|)k+1:n‖1        (858)

[Figure 115 plot: m/k versus k/n, showing the Donoho bound, the approximation m > k log₂(1+ n/k) (dashed), curves for problem (529) minimize ‖x‖₁ subject to Ax = b and for problem (534) with added constraint x ⪰ 0 (dotted), and the hard region below the curve.]

Figure 115: (confer Figure 76) For Gaussian random matrix A ∈ Rm×n , graph illustrates
Donoho/Tanner least lower bound on number of measurements m below which recovery
of k-sparse n-length signal x by linear programming fails with overwhelming probability.
Hard problems are below curve, but not the reverse; id est, failure above depends on
proximity. Inequality demarcates approximation (− − −) to empirical phase transition
from [25]. Problems having nonnegativity constraint (· · ·) are easier to solve. [141] [142]

is separable into k largest and n − k smallest absolute entries,
$$\|\pi(|x|)_{k+1:n}\|_1=0 \;\Leftrightarrow\; \|x\|_0\leq k\qquad(4\mathrm{g})\qquad(859)$$
is a necessary condition for global optimality. By assumption, matrix A is wide and b ≠ 0 ⇒ Ax⋆ ≠ 0. This means ν⋆ ∈ N(Aᵀ) ⊂ Rᵐ, and ν⋆ = 0 when A is full-rank. By definition, $\nabla\|x\|_1\succeq\nabla\|x\|_{n\atop k}$ always holds. Assuming existence of a cardinality-k solution, then only three of the four conditions are necessary and sufficient for global optimality of (541):
$$\begin{array}{rll}
x^\star\succeq 0 & & (1)\\
Ax^\star=b & & (2)\\
\|x^\star\|_1-\|x^\star\|_{n\atop k}=0 & & (4\mathrm{g})
\end{array}\qquad(860)$$
meaning, global optimality of a feasible solution to (541) is identified by a zero objective.
meaning, global optimality of a feasible solution to (541) is identified by a zero objective.

4.6.1.5.1 Example. Sparsest solution to Ax = b . [77] [137]
(confer Example 4.6.2.0.2) Data (742) induces sparsest solution not easily recoverable by least 1-norm; id est, not by compressed sensing because of proximity to a theoretical lower bound on number of measurements m depicted in Figure 115: for A ∈ Rᵐˣⁿ

• Given data from Example 4.2.3.1.1, for m = 3 , n = 6 , k = 1

$$A=\begin{bmatrix}
-1 & 1 & 8 & 1 & 1 & 0\\
-3 & 2 & 8 & \frac{1}{2} & \frac{1}{3} & \frac{1}{2}{-}\frac{1}{3}\\
-9 & 4 & 8 & \frac{1}{4} & \frac{1}{9} & \frac{1}{4}{-}\frac{1}{9}
\end{bmatrix}\ ,\qquad
b=\begin{bmatrix}1\\[2pt] \frac{1}{2}\\[2pt] \frac{1}{4}\end{bmatrix}
\qquad(742)$$

the sparsest solution to classical linear equation Ax = b is x = e₄ ∈ R⁶ (confer (754)).



Although the sparsest solution is recoverable by inspection, we discern it instead by convex iteration; namely, by iterating problem sequence (160) (536) on page 273. From the numerical data given, cardinality ‖x‖₀ = 1 is expected. Iteration continues until xᵀy vanishes (to within some numerical precision); id est, until desired cardinality is achieved. But this comes not without a stall.

Stalling, whose occurrence is sensitive to initial conditions of convex iteration, is a consequence of finding a local minimum of a multimodal objective ⟨x , y⟩ when regarded as simultaneously variable in x and y . (§3.14.0.0.3) Stalls are simply detected as fixed points x of infeasible cardinality, sometimes remedied by reinitializing direction vector y to a random positive state.

Bolstered by success in breaking out of a stall, we then apply convex iteration to 22,000 randomized problems:

• Given random data for m = 3 , n = 6 , k = 1 , in Matlab notation

    A = randn(3,6), index = round(5*rand(1)) + 1, b = rand(1)*A(:,index)    (861)

the sparsest solution x ∝ e_index is a scaled standard basis vector.


Without convex iteration or a nonnegativity constraint x ⪰ 0 , rate of failure for this minimal cardinality problem Ax = b by 1-norm minimization of x is 22%. That failure rate drops to 6% with a nonnegativity constraint. If we then engage convex iteration, detect stalls, and randomly reinitialize the direction vector, failure rate drops to 0% but the amount of computation is approximately doubled. 2
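A minimal Matlab sketch of this experiment, assuming CVX [195] is installed; tolerances, the iteration cap, and the closed-form update for (536) are our choices, not the published code:

    A = randn(3,6);  index = round(5*rand(1)) + 1;  b = rand(1)*A(:,index);   % data (861)
    [m,n] = size(A);  k = 1;
    y = ones(n,1);                            % y = 1 : first pass is 1-norm problem (534)
    for iter = 1:100
       cvx_begin quiet
          variable x(n)
          minimize( x'*y )                    % problem (160)
          subject to
             A*x == b;
             x >= 0;
       cvx_end
       [xsort,idx] = sort(x,'descend');
       if sum(xsort(k+1:n)) < 1e-9, break, end   % x'*y vanished: cardinality k achieved
       y = zeros(n,1);
       y(idx(k+1:n)) = 1;                     % closed-form solution of (536)
       % stall countermeasure: uncomment to randomize instead
       % y(idx(k+1:n)) = rand(n-k,1);
    end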

Stalling is not an inevitable behavior. For some problem types (beyond mere Ax = b),
convex iteration succeeds nearly all the time. Here is a cardinality problem, with noise,
whose statement is just a bit more intricate but easy to solve in a few convex iterations:

4.6.1.5.2 Example. Signal dropout. [140, §6.2]


Signal dropout is an old problem; well studied from both an industrial and academic
perspective. Essentially dropout means momentary loss or gap in a signal, while passing
through some channel, caused by some man-made or natural phenomenon. The signal
lost is assumed completely destroyed somehow. What remains within the time-gap is
system or idle channel noise. The signal could be voice over Internet protocol (VoIP), for
example, audio data from a compact disc (CD) or video data from a digital video disc
(DVD), a television transmission over cable or the airwaves, or a typically ravaged cell
phone communication, etcetera.
Here we consider signal dropout in a discrete-time signal corrupted by additive white
noise assumed uncorrelated to the signal. The linear channel is assumed to introduce
no filtering. We create a discretized windowed signal for this example by positively
combining k randomly chosen vectors from a discrete cosine transform (DCT) basis
denoted Ψ ∈ Rn×n . Frequency increases, in the Fourier sense, from DC toward Nyquist as
column index of basis Ψ increases. Otherwise, details of the basis are unimportant except
for its orthogonality ΨT = Ψ−1 . Transmitted signal is denoted

s = Ψz ∈ Rn (862)

whose upper bound on DCT basis coefficient cardinality card z ≤ k is assumed known;4.41
hence a critical assumption: transmitted signal s is sparsely supported (k < n) on the DCT
basis. It is further assumed that nonzero signal coefficients in vector z place each chosen
basis vector above the noise floor.
4.41 This simplifies exposition, although it may be an unrealistic assumption in many applications.

We also assume that the gap’s beginning and ending in time are precisely localized to within a sample; id est, index ℓ locates the last sample prior to the gap’s onset, while index n−ℓ+1 locates the first sample subsequent to the gap: for rectangularly windowed received signal g possessing a time-gap loss and additive noise η ∈ Rⁿ

$$g=\begin{bmatrix}
s_{1:\ell}+\eta_{1:\ell}\\
\eta_{\ell+1:n-\ell}\\
s_{n-\ell+1:n}+\eta_{n-\ell+1:n}
\end{bmatrix}\in\mathbb{R}^n\qquad(863)$$

The window is thereby centered on the gap and short enough so that the DCT spectrum of signal s can be assumed static over the window’s duration n . Signal to noise ratio within this window is defined

$$\mathrm{SNR}\,\triangleq\,20\log\frac{\left\|\begin{bmatrix}s_{1:\ell}\\ s_{n-\ell+1:n}\end{bmatrix}\right\|}{\|\eta\|}\qquad(864)$$
In absence of noise, knowing the signal DCT basis and having a good estimate of basis coefficient cardinality makes perfectly reconstructing gap-loss easy: it amounts to solving a linear system of equations and requires little or no optimization; with caveat, number of equations exceeds cardinality of signal representation (roughly ℓ ≥ k) with respect to DCT basis.

But addition of a significant amount of noise η increases level of difficulty dramatically; a 1-norm based method of reducing cardinality, for example, almost always returns DCT basis coefficients numbering in excess of minimal cardinality. We speculate that is because signal cardinality 2ℓ becomes the predominant cardinality. DCT basis coefficient cardinality is an explicit constraint to the optimization problem we shall pose: In presence of noise, constraints equating reconstructed signal f to received signal g are not possible. We can instead formulate the dropout recovery problem as a best approximation:

$$\begin{array}{rl}
\underset{x\in\mathbb{R}^n}{\text{minimize}} & \left\|\begin{bmatrix}f_{1:\ell}-g_{1:\ell}\\ f_{n-\ell+1:n}-g_{n-\ell+1:n}\end{bmatrix}\right\|\\
\text{subject to} & f=\Psi x\\
& x\succeq 0\\
& \operatorname{card} x\leq k
\end{array}\qquad(865)$$

We propose solving this nonconvex problem (865) by moving the cardinality constraint to the objective as a regularization term as explained in §4.6 (p.273); id est, by iteration of two convex problems until convergence:

$$\begin{array}{rl}
\underset{x\in\mathbb{R}^n}{\text{minimize}} & \langle x\,,\,y\rangle+\left\|\begin{bmatrix}f_{1:\ell}-g_{1:\ell}\\ f_{n-\ell+1:n}-g_{n-\ell+1:n}\end{bmatrix}\right\|\\
\text{subject to} & f=\Psi x\\
& x\succeq 0
\end{array}\qquad(866)$$

and

$$\begin{array}{rl}
\underset{y\in\mathbb{R}^n}{\text{minimize}} & \langle x^\star,\,y\rangle\\
\text{subject to} & 0\preceq y\preceq\mathbf{1}\\
& y^T\mathbf{1}=n-k
\end{array}\qquad(536)$$

Signal cardinality 2ℓ is implicit to the problem statement. When number of samples in the dropout region exceeds half the window size, then that deficient cardinality of signal remaining becomes a source of degradation to reconstruction in presence of noise. Thus, by observation, we divine a reconstruction rule for this signal dropout problem to attain good noise suppression: ℓ must exceed a maximum of cardinality bounds; 2ℓ ≥ max{2k , n/2}.
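A minimal Matlab sketch of this iteration, assuming CVX and dctmtx (Image Processing Toolbox); signal synthesis, tolerances, and the iteration cap here are our assumptions, not the published code [435]:

    n = 500;  k = 4;  ell = 130;
    Psi = dctmtx(n)';                        % columns are DCT basis vectors (orthonormal)
    z = zeros(n,1);  p = randperm(n);
    z(p(1:k)) = 10^(-10/20) + (1 - 10^(-10/20))*rand(k,1);  % coefficients above noise floor
    s = Psi*z;                               % transmitted signal (862)
    win = [1:ell, n-ell+1:n];                % samples outside the dropout gap
    eta = randn(n,1);
    eta = eta*norm(s(win))/(norm(eta)*10^(10/20));   % scale noise for 10dB SNR (864)
    g = s + eta;  g(ell+1:n-ell) = eta(ell+1:n-ell); % received signal (863)
    y = zeros(n,1);                          % initial direction vector
    for iter = 1:50
       cvx_begin quiet
          variable x(n)
          minimize( x'*y + norm(Psi(win,:)*x - g(win)) )   % problem (866)
          subject to
             x >= 0;
       cvx_end
       [xsort,idx] = sort(x,'descend');
       y = zeros(n,1);  y(idx(k+1:n)) = 1;   % closed-form solution of (536)
       if x'*y < 1e-9, break, end            % desired cardinality achieved
    end
    f = Psi*x;                               % reconstruction, as in Figure 117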

Figure 116: (a) Signal dropout in signal s corrupted by noise η (SNR = 10dB, g = s + η). Flatline indicates duration of signal dropout. (b) Reconstructed signal f (red) overlaid with corrupted signal g .

Figure 117: (a) Error signal power (reconstruction f less original noiseless signal s) is 36dB below s . (b) Original signal s overlaid with reconstruction f (red) from signal g having dropout plus noise.

Figure 116 and Figure 117 show one realization of this dropout problem. Original signal s is created by adding four (k = 4) randomly selected DCT basis vectors, from Ψ (n = 500 in this example), whose amplitudes are randomly selected from a uniform distribution above the noise floor; in the interval $[10^{-10/20},\,1]$. Then a 240-sample dropout is realized (ℓ = 130) and Gaussian noise η added to make corrupted signal g (from which a best approximation f will be made) having 10dB signal to noise ratio (864). The time gap contains much noise, as apparent from Figure 116a. But in only a few iterations (866) (536), original signal s is recovered with relative error power 36dB down; illustrated in Figure 117. Correct cardinality is also recovered (card x = card z) along with the basis vector indices used to make original signal s . Approximation error is due to DCT basis coefficient estimate error. When this experiment is repeated 1000 times on noisy signals averaging 10dB SNR, the correct cardinality and indices are recovered 99% of the time with average relative error power 30dB down. Without noise, we get perfect reconstruction in one iteration. [435, Matlab code] 2

4.6.1.6 Compressed sensing geometry with a nonnegative variable

It is well known that cardinality problem (541) (p.180) is easier to solve by linear
programming when variable x is nonnegatively constrained than when not. We postulate
a simple geometrical explanation:
Figure 75 illustrates 1-norm ball B₁ in R³ and affine subset A defined {x ∈ R³ | Ax = b}. Prototypical compressed sensing problem, for A ∈ Rᵐˣⁿ

$$\begin{array}{rl}
\underset{x}{\text{minimize}} & \|x\|_1\\
\text{subject to} & Ax=b
\end{array}\qquad(529)$$

is solved when the 1-norm ball B₁ kisses the affine subset.


If variable x is constrained to the nonnegative orthant

$$\begin{array}{rl}
\underset{x\in\mathbb{R}^n}{\text{minimize}} & \|x\|_1\\
\text{subject to} & Ax=b\\
& x\succeq 0
\end{array}
\quad\equiv\quad
\begin{array}{rl}
\underset{x\in\mathbb{R}^n}{\text{minimize}} & \mathbf{1}^Tx\\
\text{subject to} & Ax=b\\
& x\succeq 0
\end{array}
\quad\equiv\quad
\begin{array}{rl}
\underset{c\,\in\mathbb{R},\ x\in\mathbb{R}^n}{\text{minimize}} & c\\
\text{subject to} & Ax=b\\
& x\in c\,\mathcal{S}
\end{array}
\qquad(534)$$

then 1-norm ball B₁ becomes nonnegative simplex S in Figure 118 where

$$c\,\mathcal{S}=\{[\,I\in\mathbb{R}^{n\times n}\ \ 0\in\mathbb{R}^n\,]\,a \mid a^T\mathbf{1}=c\ ,\ a\succeq 0\}=\{x \mid x\succeq 0\ ,\ \mathbf{1}^Tx\leq c\}\qquad(867)$$

Nonnegative simplex S is the convex hull of its vertices. All n + 1 vertices of S are
constituted by standard basis vectors and the origin. In other words, all its nonzero
extreme points are cardinality-1.
Affine subset A kisses nonnegative simplex c⋆ S at optimality of (534). A kissing point
is achieved at x⋆ for optimal c⋆ as B1 or S contracts. Whereas 1-norm ball B1 has
only six vertices in R3 corresponding to cardinality-1 solutions, simplex S has three edges
(along the Cartesian axes) containing an infinity of cardinality-1 solutions. And whereas
B1 has twelve edges containing cardinality-2 solutions, S has three (out of total four)
facets constituting cardinality-2 solutions. In other words, likelihood of a low-cardinality
solution is higher by kissing nonnegative simplex S (534) than by kissing 1-norm ball B1
(529) because facial dimension (corresponding to given cardinality) is higher in S .
Empirically, this observation also holds in other Euclidean dimensions; e.g, Figure 76,
Figure 115.

[Figure 118 labels: R³; cS = {x | x ⪰ 0 , 1ᵀx ≤ c}; A = {x ∈ R³ | Ax = b}; y axis; face F.]
Figure 118: Simplex S is convex hull of origin and all cardinality-1 nonnegative vectors of unit norm (its vertices). Line A , intersecting two-dimensional (cardinality-2) face F of nonnegative simplex cS , emerges from cS at a cardinality-1 vertex. S equals nonnegative orthant R³₊ ∩ 1-norm ball B₁ (Figure 75). Kissing point achieved when • (on edge) meets A as simplex contracts (as scalar c diminishes) under optimization (534).

4.6.1.7 cardinality-1 compressed sensing problem always solvable


In the special case of cardinality-1 feasible solution to nonnegative compressed sensing
problem (534), there is a geometrical interpretation that leads to an algorithm.
Figure 118 illustrates a cardinality-1 feasible solution to problem (534) in R3 ; a vertex
solution. But first-octant S of 1-norm ball B1 does not kiss line A ; which would be an
optimality condition. How can we perform optimization and make A intersect S at a
vertex? Assuming that nonnegative cardinality-1 solutions exist in the feasible set, it so
happens:

4.6.1.7.1 Algorithm. Deprecation.


Columns of measurement matrix A , corresponding to high cardinality solution of
(534)4.42 found by Simplex method [104], may be deprecated and the problem solved
again with those columns missing. Such columns are recursively removed from A until a
cardinality-1 solution is found. ¶

This algorithm intimates that either a solution to problem (534) is cardinality-1 or column indices of A , corresponding to a higher cardinality solution, do not intersect that index corresponding to a cardinality-1 feasible solution.

When problem (534) is first solved, in the example of Figure 118, solution is cardinality-2 at a kissing point on that edge of simplex cS indicated by • . Imagining that the corresponding cardinality-2 face F has collapsed, as a result of zeroing those two extreme points whose convex hull constructs that same edge • of F , then the simplex collapses to a line segment along the y axis. When that line segment kisses A , then the cardinality-1 vertex solution illustrated has been found.4.43
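A minimal Matlab sketch of Algorithm 4.6.1.7.1, assuming the Optimization Toolbox; we use linprog with its dual-simplex algorithm to stand in for the Simplex method the algorithm requires, and offer no termination proof here:

    function x = deprecate(A, b)
    % recursively remove columns of A supporting a high-cardinality solution
    % of (534) until a cardinality-1 (vertex) solution is found
    [~,n] = size(A);
    live = 1:n;                              % surviving column indices
    opts = optimoptions('linprog','Algorithm','dual-simplex','Display','none');
    while true
       xs = linprog(ones(numel(live),1), [],[], A(:,live), b, ...
                    zeros(numel(live),1), [], opts);
       support = xs > 1e-9;                  % active columns of this solution
       if nnz(support) <= 1, break, end      % cardinality-1 solution found
       live(support) = [];                   % deprecate columns of the support
    end
    x = zeros(n,1);
    x(live(support)) = xs(support);
    end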

4.6.1.7.2 Proof (pending). Deprecation algorithm 4.6.1.7.1.
We require proof that a cardinality-1 feasible solution to (534) cannot exist within a higher cardinality optimal solution found by Simplex method; for only then can corresponding columns of A be eliminated without precluding cardinality-1 at optimality of the deprecated problem. Crucial is the Simplex method of solution because then an optimal solution is guaranteed to reside at a vertex of the feasible set. [104, p.158] [17, p.2] ■

Although it is more efficient (compared with our algorithm) to search over individual
columns of matrix A for a cardinality-1 solution known a priori to exist, tables are turned
when cardinality exceeds 1 :

4.6.2 cardinality-k geometric presolver


This idea of deprecating columns has foundation in convex cone theory. (§2.13.5) Removing
columns (and rows)4.44 from A ∈ Rm×n , in a linear program like (534) in §3.2, is known
in the industry as presolving;4.45 the elimination of redundant constraints and identically
4.42 Because signed compressed sensing problem (529) can be equivalently expressed in a nonnegative variable, as we learned in Example 3.2.0.1.1 (p.178), and because a cardinality-1 constraint in (529) transforms to a cardinality-1 constraint in its nonnegative equivalent (533), then this cardinality-1 recursive reconstruction algorithm continues to hold for a signed variable as in (529).
4.43 A similar argument holds for any orientation of line A and cardinality-1 point of emergence from simplex cS . This cardinality-1 reconstruction algorithm also holds more generally when affine subset A has any higher dimension n−m .
4.44 Rows of matrix A are removed based upon linear dependence. Assuming b ∈ R(A) , corresponding entries of vector b may also be removed without loss of generality.
4.45 . . . presolving can in particular do the following:
1. Fix a variable, i.e, permanently set y = p .
2. Aggregate a variable, i.e, conclude that y = ax + c for some values a and c .
3. Multi-aggregate a variable, i.e, conclude that y = a₁x₁ + … + a_k x_k + c .
In all cases, y will be removed from the set of “active” variables and instead added to the set of “fixed” variables. −Tobias Achterberg

[Figure 119 panels: (a) in Rⁿ, polyhedron P formed by Rⁿ₊ and hyperplanes A; (b) in Rᵐ, point b inside cone K.]
Figure 119: Constraint interpretations: (a) Halfspace-description of feasible set in problem (534) is a polyhedron P formed by intersection of nonnegative orthant Rⁿ₊ with hyperplanes A prescribed by equality constraint. (Drawing by Pedro Sánchez.) (b) Vertex-description of constraints in problem (534): point b belongs to polyhedral cone K = {Ax | x ⪰ 0}. Number of extreme directions in K may exceed dimensionality of ambient space.

zero variables prior to numerical solution. We offer a different and geometric presolver
first introduced in §2.13.5:4.46
Two interpretations of the constraints from problem (534) are realized in Figure 119.
Assuming that a cardinality-k solution exists and matrix A describes a pointed polyhedral
cone K = {Ax | x º 0} , as in Figure 119b, columns are removed from A if they do not
belong to the smallest face F of K containing vector b ; those columns correspond to
0-entries in variable vector x (and vice versa). Generators of that smallest face always hold
a minimal cardinality solution, in other words, because a generator outside the smallest
face (having positive coefficient) would violate the assumption that b belongs to that face.
Benefit accrues when vector b does not belong to relative interior of K ; there would
be no columns to remove were b ∈ rel intr K since the smallest face becomes cone K itself
(Example 4.6.2.0.2). Were b an extreme direction, at the other end of the spectrum, then
the smallest face is an edge that is a ray containing b ; this geometrically describes a
cardinality-1 case where all columns, save one, would be removed from A .
When vector b resides in a face F of K that is not cone K itself, benefit is realized
as a reduction in computational intensity because the consequent equivalent problem has
smaller dimension. Number of columns removed depends completely on geometry of a
given problem; particularly, location of b within K . In the example of Figure 119b,
interpreted literally in R3 , all but two columns of A are discarded by our presolver when
b belongs to facet F .
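One way to realize this presolver numerically is sketched below under our own construction (not the method of Example 2.13.5.0.1): column A(:,j) is a generator of the smallest face of K containing b iff some feasible x, with Ax = b and x ⪰ 0, has x_j > 0; each test is then a linear feasibility problem, here with a small positive threshold standing in for strict positivity:

    [m,n] = size(A);
    keep = false(n,1);
    opts = optimoptions('linprog','Display','none');
    for j = 1:n
       lb = zeros(n,1);  lb(j) = 1e-6;       % force x(j) strictly positive
       [~,~,flag] = linprog(zeros(n,1), [],[], A, b, lb, [], opts);
       keep(j) = (flag == 1);                % feasible: column j is in the smallest face
    end
    Ared = A(:,keep);                        % presolved constraint matrix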

4.6.2.0.1 Exercise. Minimal cardinality generators.


Prove that generators of the smallest face F of K = {Ax | x º 0} , containing vector b ,
always hold a minimal cardinality solution to Ax = b . H

4.6.2.0.2 Example. Presolving for cardinality-2 solution to Ax = b .
(confer Example 4.6.1.5.1) Again taking data from Example 4.2.3.1.1 (A ∈ Rᵐˣⁿ, desired cardinality of x is k), for m = 3 , n = 6 , k = 2

$$A=\begin{bmatrix}
-1 & 1 & 8 & 1 & 1 & 0\\
-3 & 2 & 8 & \frac{1}{2} & \frac{1}{3} & \frac{1}{2}{-}\frac{1}{3}\\
-9 & 4 & 8 & \frac{1}{4} & \frac{1}{9} & \frac{1}{4}{-}\frac{1}{9}
\end{bmatrix}\ ,\qquad
b=\begin{bmatrix}1\\[2pt] \frac{1}{2}\\[2pt] \frac{1}{4}\end{bmatrix}
\qquad(742)$$

proper cone K = {Ax | x ⪰ 0} is pointed as proven by method of §2.12.2.2. A cardinality-2 solution is known to exist; sum of the last two columns of matrix A . Generators of the smallest face that contains vector b , found by the method in Example 2.13.5.0.1, comprise the entire A matrix because b ∈ intr K (§2.13.4.2.4). So geometry of this particular problem does not permit number of generators to be reduced below n by discerning the smallest face.4.47 2

There is wondrous bonus to presolving when a constraint matrix is sparse. After columns are removed by theory of convex cones (finding the smallest face), some remaining rows may become 0ᵀ, identical to other rows, or nonnegative. When nonnegative rows appear in an equality constraint to 0 , all nonnegative variables corresponding to nonnegative entries in those rows must vanish (§A.7.1); meaning, more columns may be removed. Once rows and columns have been removed from a constraint matrix, even more rows and columns may be removed by repeating the presolver procedure.
4.46 Comparison of computational intensity to a brute force search would pit combinatorial complexity, a binomial coefficient ∝ $\binom{n}{k}$, against polynomial complexity of this conic presolver.
4.47 But a canonical set of conically independent generators of K comprise only the first two and last two columns of A .

4.6.3 constraining cardinality of signed variable

Now consider a feasibility problem equivalent to the classical problem from linear algebra Ax = b , but with an upper bound k on cardinality ‖x‖₀: for vector b ∈ R(A)

$$\begin{array}{rl}
\text{find} & x\in\mathbb{R}^n\\
\text{subject to} & Ax=b\\
& \|x\|_0\leq k
\end{array}\qquad(868)$$

where ‖x‖₀ ≤ k means vector x has at most k nonzero entries; such a vector is presumed existent in the feasible set. Convex iteration (§4.6.1) utilizes a nonnegative variable; so absolute value |x| is needed here. We propose that nonconvex problem (868) can be equivalently written as a sequence of convex problems that move the cardinality constraint to the objective:

$$\begin{array}{rl}
\underset{x\in\mathbb{R}^n}{\text{minimize}} & \langle|x|\,,\,y\rangle\\
\text{subject to} & Ax=b
\end{array}
\quad\equiv\quad
\begin{array}{rl}
\underset{x\in\mathbb{R}^n,\ t\in\mathbb{R}^n}{\text{minimize}} & \langle t\,,\,y+\varepsilon\mathbf{1}\rangle\\
\text{subject to} & Ax=b\\
& -t\preceq x\preceq t
\end{array}\qquad(869)$$

$$\begin{array}{rl}
\underset{y\in\mathbb{R}^n}{\text{minimize}} & \langle t^\star,\,y+\varepsilon\mathbf{1}\rangle\\
\text{subject to} & 0\preceq y\preceq\mathbf{1}\\
& y^T\mathbf{1}=n-k
\end{array}\qquad(536)$$
where ε is a relatively small positive constant. This sequence is iterated until a direction vector y is found that makes |x⋆|ᵀy⋆ vanish. The term ⟨t , ε1⟩ in (869) is necessary to determine absolute value |x⋆| = t⋆ (§3.2) because vector y can have zero-valued entries. By initializing y to (1−ε)1 , the first iteration of problem (869) is a 1-norm problem (525); id est,

$$\begin{array}{rl}
\underset{x\in\mathbb{R}^n,\ t\in\mathbb{R}^n}{\text{minimize}} & \langle t\,,\,\mathbf{1}\rangle\\
\text{subject to} & Ax=b\\
& -t\preceq x\preceq t
\end{array}
\quad\equiv\quad
\begin{array}{rl}
\underset{x\in\mathbb{R}^n}{\text{minimize}} & \|x\|_1\\
\text{subject to} & Ax=b
\end{array}\qquad(529)$$

Subsequent iterations of problem (869) engaging cardinality term ⟨t , y⟩ can be interpreted as corrections to this 1-norm problem leading to a 0-norm solution; vector y can be interpreted as a direction of search.
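A minimal CVX sketch of iteration (869) (536), with ε and the stopping tolerance chosen by us:

    [m,n] = size(A);  k = 1;  eps_ = 1e-3;
    y = (1 - eps_)*ones(n,1);               % first pass reduces to 1-norm problem (529)
    for iter = 1:100
       cvx_begin quiet
          variable x(n)
          variable t(n)
          minimize( t'*(y + eps_) )         % problem (869); scalar eps_ broadcasts
          subject to
             A*x == b;
             x <= t;
             -t <= x;
       cvx_end
       [tsort,idx] = sort(t,'descend');
       y = zeros(n,1);  y(idx(k+1:n)) = 1;  % closed-form solution of (536)
       if abs(x)'*y < 1e-9, break, end      % |x*|'y* vanished
    end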

4.6.3.1 local optimality


As before (§4.6.1.3), convex iteration (869) (536) always converges to a locally optimal
solution; a fixed point of possibly infeasible cardinality.

4.6.3.2 simple variations on a signed variable


Several useful equivalents to linear programs (869) (536) are easily devised, but their geometrical interpretation is not as apparent: e.g, equivalent in the limit ε → 0⁺

$$\begin{array}{rl}
\underset{x\in\mathbb{R}^n,\ t\in\mathbb{R}^n}{\text{minimize}} & \langle t\,,\,y\rangle\\
\text{subject to} & Ax=b\\
& -t\preceq x\preceq t
\end{array}\qquad(870)$$

$$\begin{array}{rl}
\underset{y\in\mathbb{R}^n}{\text{minimize}} & \langle|x^\star|\,,\,y\rangle\\
\text{subject to} & 0\preceq y\preceq\mathbf{1}\\
& y^T\mathbf{1}=n-k
\end{array}\qquad(536)$$

We get another equivalent to linear programs (869) (536), in the limit, by interpreting problem (529) as infimum to a vertex-description of the 1-norm ball (Figure 75, Example 3.2.0.1.1, confer (528)):

$$\begin{array}{rl}
\underset{x\in\mathbb{R}^n}{\text{minimize}} & \|x\|_1\\
\text{subject to} & Ax=b
\end{array}
\quad\equiv\quad
\begin{array}{rl}
\underset{a\in\mathbb{R}^{2n}}{\text{minimize}} & \langle a\,,\,y\rangle\\
\text{subject to} & [\,A\ \ {-}A\,]\,a=b\\
& a\succeq 0
\end{array}\qquad(871)$$

$$\begin{array}{rl}
\underset{y\in\mathbb{R}^{2n}}{\text{minimize}} & \langle a^\star,\,y\rangle\\
\text{subject to} & 0\preceq y\preceq\mathbf{1}\\
& y^T\mathbf{1}=2n-k
\end{array}\qquad(536)$$

where x⋆ = [ I −I ]a⋆ ; from which it may be rightfully construed that any vector 1-norm minimization problem has equivalent expression in a nonnegative variable.

4.7 Cardinality and rank constraint examples


4.7.0.0.1 Example. Projection on ellipsoid boundary. [56] [172, §5.1] [282, §2]
Consider classical linear equation Ax = b but with constraint on norm of solution x , given matrices C and wide A and vector b ∈ R(A)

$$\begin{array}{rl}
\text{find} & x\in\mathbb{R}^N\\
\text{subject to} & Ax=b\\
& \|Cx\|=1
\end{array}\qquad(872)$$

The set {x | ‖Cx‖ = 1} (27) describes an ellipsoid boundary (Figure 15). This problem is nonconvex because solution is constrained to that boundary. Assign

$$G=\begin{bmatrix}Cx\\1\end{bmatrix}\begin{bmatrix}x^TC^T & 1\end{bmatrix}
=\begin{bmatrix}X & Cx\\ x^TC^T & 1\end{bmatrix}
\,,\ \begin{bmatrix}Cxx^TC^T & Cx\\ x^TC^T & 1\end{bmatrix}\in\mathbb{S}^{N+1}\qquad(873)$$

Any rank-1 solution must have this form. (§B.1.0.2) Ellipsoidally constrained feasibility problem (872) is equivalent to:

$$\begin{array}{rl}
\underset{X\in\mathbb{S}^N}{\text{find}} & x\in\mathbb{R}^N\\
\text{subject to} & Ax=b\\
& G=\begin{bmatrix}X & Cx\\ x^TC^T & 1\end{bmatrix}\ (\succeq 0)\\
& \operatorname{rank} G=1\\
& \operatorname{tr} X=1
\end{array}\qquad(874)$$

This is transformed to an equivalent convex problem by moving the rank constraint to the objective: We iterate solution of

$$\begin{array}{rl}
\underset{X\in\mathbb{S}^N,\ x\in\mathbb{R}^N}{\text{minimize}} & \langle G\,,\,Y\rangle\\
\text{subject to} & Ax=b\\
& G=\begin{bmatrix}X & Cx\\ x^TC^T & 1\end{bmatrix}\succeq 0\\
& \operatorname{tr} X=1
\end{array}\qquad(875)$$

with

$$\begin{array}{rl}
\underset{Y\in\mathbb{S}^{N+1}}{\text{minimize}} & \langle G^\star,\,Y\rangle\\
\text{subject to} & 0\preceq Y\preceq I\\
& \operatorname{tr} Y=N
\end{array}\qquad(876)$$

until convergence. Initially 0 , direction matrix Y ∈ S^{N+1} regulates rank. (1892a) Singular value decomposition $G^\star=U\Sigma Q^T\in\mathbb{S}^{N+1}_+$ (§A.6) provides a new direction matrix Y = U(: , 2:N+1)U(: , 2:N+1)ᵀ that optimally solves (876) at each iteration. An optimal solution to (872) is thereby found in a few iterations, making convex problem (875) its equivalent.

It remains possible for the iteration to stall; were a rank-1 G matrix not found. In that case, the current search direction is momentarily reversed with an added randomized element:

    Y = -U(:,2:N+1) * (U(:,2:N+1)' + randn(N,1) * U(:,1)')    (877)

in Matlab notation. This heuristic is quite effective for problem (872) which is exceptionally easy to solve by convex iteration.
When b ∉ R(A) then problem (872) must be restated as a projection:

$$\begin{array}{rl}
\underset{x\in\mathbb{R}^N}{\text{minimize}} & \|Ax-b\|\\
\text{subject to} & \|Cx\|=1
\end{array}\qquad(878)$$

This is a projection of point b on an ellipsoid boundary because any affine transformation of an ellipsoid remains an ellipsoid. Problem (875) in turn becomes

$$\begin{array}{rl}
\underset{X\in\mathbb{S}^N,\ x\in\mathbb{R}^N}{\text{minimize}} & \langle G\,,\,Y\rangle+\|Ax-b\|\\
\text{subject to} & G=\begin{bmatrix}X & Cx\\ x^TC^T & 1\end{bmatrix}\succeq 0\\
& \operatorname{tr} X=1
\end{array}\qquad(879)$$

We iterate this with calculation (876) of direction matrix Y as before until a rank-1 G matrix is found. 2
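A minimal CVX sketch of iteration (875) (876), assuming problem data A, b, C are in scope; the SVD update for Y follows the text, other details (tolerance, iteration cap) are ours:

    N = size(A,2);
    Y = zeros(N+1);                          % initial direction matrix
    for iter = 1:50
       cvx_begin sdp quiet
          variable X(N,N) symmetric
          variable x(N)
          G = [X, C*x; (C*x)', 1];
          minimize( trace(G*Y) )             % problem (875)
          subject to
             A*x == b;
             G >= 0;                         % G positive semidefinite (sdp mode)
             trace(X) == 1;
       cvx_end
       Gs = [X, C*x; (C*x)', 1];
       [U,S,Q] = svd(Gs);                    % G* = U*S*Q'
       Y = U(:,2:N+1)*U(:,2:N+1)';           % optimal solution of (876)
       if S(2,2) < 1e-9, break, end          % rank-1 achieved
    end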

4.7.0.0.2 Example. Orthonormal Procrustes. [56]
Example 4.7.0.0.1 is extensible. An orthonormal matrix Q ∈ Rⁿˣᵖ is characterized QᵀQ = I . Consider the particular case Q = [ x y ] ∈ Rⁿˣ² as variable to a Procrustes problem (§C.3): given A ∈ Rᵐˣⁿ and B ∈ Rᵐˣ²

$$\begin{array}{rl}
\underset{Q\in\mathbb{R}^{n\times 2}}{\text{minimize}} & \|AQ-B\|_F\\
\text{subject to} & Q^TQ=I
\end{array}\qquad(880)$$

which is nonconvex. By vectorizing matrix Q we can make the assignment:

$$G=\begin{bmatrix}x\\y\\1\end{bmatrix}\begin{bmatrix}x^T & y^T & 1\end{bmatrix}
=\begin{bmatrix}X & Z & x\\ Z^T & Y & y\\ x^T & y^T & 1\end{bmatrix}
\,,\ \begin{bmatrix}xx^T & xy^T & x\\ yx^T & yy^T & y\\ x^T & y^T & 1\end{bmatrix}\in\mathbb{S}^{2n+1}\qquad(881)$$

Now orthonormal Procrustes problem (880) can be equivalently restated:

$$\begin{array}{rl}
\underset{X,\,Y\in\mathbb{S},\ Z,\,x,\,y}{\text{minimize}} & \|A[\,x\ y\,]-B\|_F\\
\text{subject to} & G=\begin{bmatrix}X & Z & x\\ Z^T & Y & y\\ x^T & y^T & 1\end{bmatrix}\ (\succeq 0)\\
& \operatorname{rank} G=1\\
& \operatorname{tr} X=1\\
& \operatorname{tr} Y=1\\
& \operatorname{tr} Z=0
\end{array}\qquad(882)$$

To solve this, we form the convex problem sequence:

$$\begin{array}{rl}
\underset{X,\,Y,\,Z,\,x,\,y}{\text{minimize}} & \|A[\,x\ y\,]-B\|_F+\langle G\,,\,W\rangle\\
\text{subject to} & G=\begin{bmatrix}X & Z & x\\ Z^T & Y & y\\ x^T & y^T & 1\end{bmatrix}\succeq 0\\
& \operatorname{tr} X=1\\
& \operatorname{tr} Y=1\\
& \operatorname{tr} Z=0
\end{array}\qquad(883)$$

and

$$\begin{array}{rl}
\underset{W\in\mathbb{S}^{2n+1}}{\text{minimize}} & \langle G^\star,\,W\rangle\\
\text{subject to} & 0\preceq W\preceq I\\
& \operatorname{tr} W=2n
\end{array}\qquad(884)$$

which has an optimal solution W that is known in closed form (p.539). These two problems are iterated until convergence and a rank-1 G matrix is found. A good initial value for direction matrix W is 0. Optimal Q⋆ equals [ x⋆ y⋆ ].

Numerically, this Procrustes problem is easy to solve; a solution seems always to be found in one or few iterations. This problem formulation is extensible, of course, to orthogonal (square) matrices Q . 2

4.7.0.0.3 Example. Combinatorial Procrustes problem.
In case A , B ∈ Rⁿ, when vector A = ΞB is known to be a permutation of vector B , solution to orthogonal Procrustes problem

$$\begin{array}{rl}
\underset{X\in\mathbb{R}^{n\times n}}{\text{minimize}} & \|A-XB\|_F\\
\text{subject to} & X^T=X^{-1}
\end{array}\qquad(1904)$$

is not necessarily a permutation matrix Ξ even though an optimal objective value of 0 is found by the known analytical solution (§C.3). The simplest method of solution finds permutation matrix X⋆ = Ξ simply by sorting vector B with respect to A .

Instead of sorting, we design two different convex problems each of whose optimal solution is a permutation matrix: one design is based on rank constraint, the other on cardinality. Because permutation matrices are sparse by definition, we depart from a traditional Procrustes problem by instead demanding a vector 1-norm which is known to produce solutions more sparse than Frobenius’ norm.

There are two principal facts exploited by the first convex iteration design (§4.5.1) we propose. Permutation matrices Ξ constitute:
1) the set of all nonnegative orthogonal matrices,
2) all points extreme to the polyhedron (104) of doubly stochastic matrices.

[Figure 120 labels: {X(: , i) | X(: , i)ᵀX(: , i) = 1} (unit circle); {X(: , i) | 1ᵀX(: , i) = 1} (hyperplane).]
Figure 120: Permutation matrix i th column-norm and column-sum constraint, abstract in two dimensions, when rank-1 constraint is satisfied. Optimal solutions reside at intersection of hyperplane with unit circle.

That means:
1) norm of each row and column is 1 ,4.48
$$\|\Xi(:,i)\|=1\ ,\qquad \|\Xi(i,:)\|=1\ ,\qquad i=1\ldots n\qquad(885)$$
2) sum of each nonnegative row and column is 1 , (§2.3.2.0.4)
$$\Xi^T\mathbf{1}=\mathbf{1}\ ,\qquad \Xi\mathbf{1}=\mathbf{1}\ ,\qquad \Xi\geq 0\qquad(886)$$

solution via rank constraint

The idea is to individually constrain each column of variable matrix X to have unity norm. Matrix X must also belong to that polyhedron, (104) in the nonnegative orthant, implied by constraints (886); so each row-sum and column-sum of X must also be unity. It is this combination of nonnegativity, sum, and sum square constraints that extracts the permutation matrices: (Figure 120) given nonzero vectors A , B

$$\begin{array}{rl}
\underset{X\in\mathbb{R}^{n\times n},\ G_i\in\mathbb{S}^{n+1}}{\text{minimize}} & \|A-XB\|_1+w\displaystyle\sum_{i=1}^n\langle G_i\,,\,W_i\rangle\\
\text{subject to} & \left.\begin{array}{l}
G_i=\begin{bmatrix}G_i(1{:}n,1{:}n) & X(:,i)\\ X(:,i)^T & 1\end{bmatrix}\succeq 0\\[4pt]
\operatorname{tr} G_i=2
\end{array}\right\}\ i=1\ldots n\\
& X^T\mathbf{1}=\mathbf{1}\\
& X\mathbf{1}=\mathbf{1}\\
& X\geq 0
\end{array}\qquad(887)$$
4.48 This fact would be superfluous were the objective of minimization linear, because the permutation
matrices reside at the extreme points of a polyhedron (104) implied by (886). But as posed, only
either rows or columns need be constrained to unit norm because matrix orthogonality implies transpose
orthogonality. (§B.5.2) Absence of vanishing inner product constraints that help define orthogonality, like
tr Z = 0 from Example 4.7.0.0.2, is a consequence of nonnegativity; id est, the only orthogonal matrices
having exclusively nonnegative entries are permutations of the Identity.

where w ≈ 10 positively weights the rank regularization term. Optimal solutions G⋆ᵢ are key to finding direction matrices Wᵢ for the next iteration of semidefinite programs (887) (888):

$$\left.\begin{array}{rl}
\underset{W_i\in\mathbb{S}^{n+1}}{\text{minimize}} & \langle G_i^\star,\,W_i\rangle\\
\text{subject to} & 0\preceq W_i\preceq I\\
& \operatorname{tr} W_i=n
\end{array}\right\}\ ,\quad i=1\ldots n\qquad(888)$$

Direction matrices thus found lead toward rank-1 matrices G⋆ᵢ on subsequent iterations. Constraint on trace of G⋆ᵢ normalizes the i th column of X⋆ to unity because (confer p.361)

$$G_i^\star=\begin{bmatrix}X^\star(:,i)\\ 1\end{bmatrix}\begin{bmatrix}X^\star(:,i)^T & 1\end{bmatrix}\qquad(889)$$

at convergence. Binary-valued X⋆ column entries result from the further sum constraint X1 = 1. Columnar orthogonality is a consequence of the further transpose-sum constraint Xᵀ1 = 1 in conjunction with nonnegativity constraint X ≥ 0 ; but we leave proof of orthogonality an exercise. The optimal objective value is 0 for both semidefinite programs when vectors A and B are related by permutation. In any case, optimal solution X⋆ becomes a permutation matrix Ξ .

Because there are n direction matrices Wᵢ to find, it can be advantageous to invoke a known closed-form solution for each from page 539. What makes this combinatorial problem more tractable are relatively small semidefinite constraints in (887). (confer (883)) When a permutation A of vector B exists, number of iterations can be as small as 1. But this combinatorial Procrustes problem can be made even more challenging when vector A has repeated entries.

solution via cardinality constraint

Now the idea is to force solution at a vertex of permutation polyhedron (104) by finding a solution of desired sparsity. Because permutation matrix X is n-sparse by assumption, this combinatorial Procrustes problem may instead be formulated as a compressed sensing problem with convex iteration on cardinality of vectorized X (§4.6.1): given nonzero vectors A , B

$$\begin{array}{rl}
\underset{X\in\mathbb{R}^{n\times n}}{\text{minimize}} & \|A-XB\|_1+w\langle X\,,\,Y\rangle\\
\text{subject to} & X^T\mathbf{1}=\mathbf{1}\\
& X\mathbf{1}=\mathbf{1}\\
& X\geq 0
\end{array}\qquad(890)$$

where direction vector Y is an optimal solution to

$$\begin{array}{rl}
\underset{Y\in\mathbb{R}^{n\times n}}{\text{minimize}} & \langle X^\star,\,Y\rangle\\
\text{subject to} & 0\leq Y\leq\mathbf{1}\\
& \mathbf{1}^TY\mathbf{1}=n^2-n
\end{array}\qquad(536)$$

each a linear program. In this circumstance, use of closed-form solution for direction vector Y is discouraged. When vector A is a permutation of B , both linear programs have objectives that converge to 0. When vectors A and B are permutations and no entries of A are repeated, optimal solution X⋆ can be found as soon as the first iteration. In any case, X⋆ = Ξ is a permutation matrix. 2

4.7.0.0.4 Exercise. Combinatorial Procrustes constraints.


Assume that the objective of semidefinite program (887) is 0 at optimality. Prove that the
constraints in program (887) are necessary and sufficient to produce a permutation matrix
as optimal solution. Alternatively and equivalently, prove those constraints necessary and
sufficient to optimally produce a nonnegative orthogonal matrix. H

4.7.0.0.5 Example. Tractable polynomial constraint.
The set of all coefficients for which a multivariate polynomial were convex is generally difficult to determine. But the ability to handle rank constraints makes any nonconvex polynomial constraint transformable to a convex constraint. All optimization problems having polynomial objective and polynomial constraints can be reformulated as a semidefinite program with a rank-1 constraint. [323] Suppose we require

$$3+2x-xy\leq 0\qquad(891)$$

Identify

$$G=\begin{bmatrix}x\\y\\1\end{bmatrix}\begin{bmatrix}x & y & 1\end{bmatrix}
=\begin{bmatrix}x^2 & xy & x\\ xy & y^2 & y\\ x & y & 1\end{bmatrix}\in\mathbb{S}^3\qquad(892)$$

Then nonconvex polynomial constraint (891) is equivalent to constraint set

$$\begin{array}{c}
\operatorname{tr}(GA)\leq 0\\
G_{33}=1\\
(G\succeq 0)\\
\operatorname{rank} G=1
\end{array}\qquad(893)$$

with direct correspondence to sense of trace inequality where G is assumed symmetric (§B.1.0.2) and

$$A=\begin{bmatrix}0 & -\frac{1}{2} & 1\\ -\frac{1}{2} & 0 & 0\\ 1 & 0 & 3\end{bmatrix}\in\mathbb{S}^3\qquad(894)$$

Then the method of convex iteration from §4.5.1 is applied to implement the rank constraint. 2

4.7.0.0.6 Exercise. Binary Pythagorean theorem.
The technique in Example 4.7.0.0.5 is extensible to any quadratic constraint; e.g, xᵀAx + 2bᵀx + c ≤ 0 , xᵀAx + 2bᵀx + c ≥ 0 , and xᵀAx + 2bᵀx + c = 0. Write a rank-constrained semidefinite program to find the intersection of a line with a circle:

$$\left\{\begin{array}{l}
x+y=1\\
x^2+y^2=1
\end{array}\right.\qquad(895)$$

(Figure 120) a set that is not connected. Implement this system in cvx4.49 by convex iteration. This particular system has no xy terms, so instead of (892) we may assign [154]

$$G=\begin{bmatrix}a-x & y\\ y & a+x\end{bmatrix}\in\mathbb{S}^2\qquad(896)$$

Employ the fact that G is positive semidefinite rank-1 iff $\sqrt{x^2+y^2}=a$ ; which holds, by (1620), because G is positive semidefinite iff eigenvalues $\lambda(G)=a\pm\sqrt{x^2+y^2}$ are nonnegative. a ∈ R is nonnegatively constrained, implicitly, and vanishes iff rank equals 0. H
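One possible cvx sketch of such an implementation (ours, not a unique answer to the exercise): the circle fixes a = 1, and the direction matrix is the outer product of the eigenvector of G⋆ corresponding to its smallest eigenvalue:

    W = zeros(2);                            % initial direction matrix
    for iter = 1:50
       cvx_begin sdp quiet
          variables x y
          G = [1-x, y; y, 1+x];              % assignment (896) with a = 1
          minimize( trace(G*W) )
          subject to
             x + y == 1;                     % the line
             G >= 0;                         % circle relaxed to its disc
       cvx_end
       Gs = [1-x, y; y, 1+x];
       [V,D] = eig(Gs);
       [dmin,i] = min(diag(D));
       W = V(:,i)*V(:,i)';                   % direction matrix from (1892a)
       if dmin < 1e-9, break, end            % rank-1: point on the circle found
    end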

4.7.0.0.7 Example. High order polynomials.
Consider nonconvex problem from Canadian Mathematical Olympiad 1999:

$$\begin{array}{rl}
\underset{x,\,y,\,z\,\in\mathbb{R}}{\text{find}} & x\,,\,y\,,\,z\\
\text{subject to} & x^2y+y^2z+z^2x=\frac{2^2}{3^3}\\
& x+y+z=1\\
& x\,,\,y\,,\,z\geq 0
\end{array}\qquad(897)$$

4.49 cvx is a high-level prototyping language [195] for Optimization that runs under Matlab.

We wish to solve for, what is known to be, a tight upper bound 2²/3³ on the constrained polynomial x²y + y²z + z²x by transformation to a rank-constrained semidefinite program. First identify

$$G=\begin{bmatrix}x\\y\\z\\1\end{bmatrix}\begin{bmatrix}x & y & z & 1\end{bmatrix}
=\begin{bmatrix}
x^2 & xy & zx & x\\
xy & y^2 & yz & y\\
zx & yz & z^2 & z\\
x & y & z & 1
\end{bmatrix}\in\mathbb{S}^4\qquad(898)$$

$$X=\begin{bmatrix}x^2\\y^2\\z^2\\x\\y\\z\\1\end{bmatrix}
\begin{bmatrix}x^2 & y^2 & z^2 & x & y & z & 1\end{bmatrix}
=\begin{bmatrix}
x^4 & x^2y^2 & z^2x^2 & x^3 & x^2y & zx^2 & x^2\\
x^2y^2 & y^4 & y^2z^2 & xy^2 & y^3 & y^2z & y^2\\
z^2x^2 & y^2z^2 & z^4 & z^2x & yz^2 & z^3 & z^2\\
x^3 & xy^2 & z^2x & x^2 & xy & zx & x\\
x^2y & y^3 & yz^2 & xy & y^2 & yz & y\\
zx^2 & y^2z & z^3 & zx & yz & z^2 & z\\
x^2 & y^2 & z^2 & x & y & z & 1
\end{bmatrix}\in\mathbb{S}^7\qquad(899)$$

then apply convex iteration (§4.5.1) to implement rank constraints:

$$\begin{array}{rl}
\underset{A,\,C\in\mathbb{S},\ b}{\text{find}} & b\\
\text{subject to} & \operatorname{tr}(XE)=\frac{2^2}{3^3}\\
& G=\begin{bmatrix}A & b\\ b^T & 1\end{bmatrix}\ (\succeq 0)\\
& X=\begin{bmatrix}C & \begin{bmatrix}\delta(A)\\ b\end{bmatrix}\\ \begin{bmatrix}\delta(A)^T & b^T\end{bmatrix} & 1\end{bmatrix}\ (\succeq 0)\\
& \mathbf{1}^Tb=1\\
& b\succeq 0\\
& \operatorname{rank} G=1\\
& \operatorname{rank} X=1
\end{array}\qquad(900)$$

where

$$E=\begin{bmatrix}
0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}\frac{1}{2}\in\mathbb{S}^7\qquad(901)$$

[429, Matlab code]. Positive semidefiniteness is optional only when rank-1 constraints are explicit by Theorem A.3.1.0.7. Optimal solution (x , y , z) = (0 , 2/3 , 1/3) to problem (897) is not unique. 2

4.7.0.0.8 Exercise. Motzkin polynomial.
Prove xy² + x²y − 3xy + 1 to be nonnegative on the nonnegative orthant. H

4.7.0.0.9 Example. Boolean vector satisfying Ax ⪯ b . (confer §4.2.3.1.1)
Now we consider solution to a discrete problem whose only known analytical method of solution is combinatorial in complexity: given A ∈ R^{M×N} and b ∈ R^M

$$\begin{array}{rl}
\text{find} & x\in\mathbb{R}^N\\
\text{subject to} & Ax\preceq b\\
& \delta(xx^T)=\mathbf{1}
\end{array}\qquad(902)$$

This nonconvex problem demands a Boolean solution [ xᵢ = ±1 , i = 1 … N ].

Assign a rank-1 matrix of variables; symmetric variable matrix X and solution vector x :

$$G=\begin{bmatrix}x\\1\end{bmatrix}\begin{bmatrix}x^T & 1\end{bmatrix}
=\begin{bmatrix}X & x\\ x^T & 1\end{bmatrix}
\,,\ \begin{bmatrix}xx^T & x\\ x^T & 1\end{bmatrix}\in\mathbb{S}^{N+1}\qquad(903)$$

Then design an equivalent semidefinite feasibility problem to find a Boolean solution to Ax ⪯ b :

$$\begin{array}{rl}
\underset{X\in\mathbb{S}^N}{\text{find}} & x\in\mathbb{R}^N\\
\text{subject to} & Ax\preceq b\\
& G=\begin{bmatrix}X & x\\ x^T & 1\end{bmatrix}\ (\succeq 0)\\
& \operatorname{rank} G=1\\
& \delta(X)=\mathbf{1}
\end{array}\qquad(904)$$

where x⋆ᵢ ∈ {−1, 1} , i = 1 … N . The two variables X and x are made dependent via their assignment to rank-1 matrix G . By (1798), an optimal rank-1 matrix G⋆ must take the form (903).

As before, we regularize the rank constraint by introducing a direction matrix Y into the objective:

$$\begin{array}{rl}
\underset{X\in\mathbb{S}^N,\ x\in\mathbb{R}^N}{\text{minimize}} & \langle G\,,\,Y\rangle\\
\text{subject to} & Ax\preceq b\\
& G=\begin{bmatrix}X & x\\ x^T & 1\end{bmatrix}\succeq 0\\
& \delta(X)=\mathbf{1}
\end{array}\qquad(905)$$
Solution of this semidefinite program is iterated with calculation of the direction matrix Y from semidefinite program (876). At convergence, in the sense (815), convex problem (905) becomes equivalent to nonconvex Boolean problem (902).

Direction matrix Y can be an orthogonal projector having closed-form expression, by (1892a), although convex iteration is not a projection method. (§4.5.1.1) Given randomized data A and b for a large problem, we find that stalling becomes likely (convergence of the iteration to a positive objective ⟨G⋆, Y⟩). To overcome this behavior, we introduce a heuristic into the implementation on Wıκımization [418] that momentarily reverses direction of search (like (877)) upon stall detection. We find that rate of convergence can be sped significantly by detecting stalls early. 2
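A minimal CVX sketch of iteration (905) (876); tolerances and the stall heuristic are omitted, and these details are ours, not the Wıκımization implementation [418]:

    [M,N] = size(A);
    Y = zeros(N+1);                          % initial direction matrix
    for iter = 1:100
       cvx_begin sdp quiet
          variable X(N,N) symmetric
          variable x(N)
          G = [X, x; x', 1];
          minimize( trace(G*Y) )             % problem (905)
          subject to
             A*x <= b;
             diag(X) == 1;                   % delta(X) = 1
             G >= 0;
       cvx_end
       [U,S,Q] = svd([X, x; x', 1]);
       Y = U(:,2:N+1)*U(:,2:N+1)';           % direction matrix from (876)
       if S(2,2) < 1e-9, break, end          % rank-1: Boolean solution found
    end
    x = sign(x);                             % entries +-1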

4.7.0.0.10 Example. Variable-vector normalization.
Suppose, within some convex optimization problem, we want vector variables x , y ∈ R^N constrained by a nonconvex equality:

$$x\,\|y\|=y\qquad(906)$$

id est, ‖x‖ = 1 and x points in the same direction as y ≠ 0 ; e.g,

$$\begin{array}{rl}
\underset{x,\,y}{\text{minimize}} & f(x\,,\,y)\\
\text{subject to} & (x\,,\,y)\in\mathcal{C}\\
& x\,\|y\|=y
\end{array}\qquad(907)$$

where f is some convex function and C is some convex set. We can realize the nonconvex equality by constraining rank and adding a regularization term to the objective. Make the assignment:

$$G=\begin{bmatrix}x\\y\\1\end{bmatrix}\begin{bmatrix}x^T & y^T & 1\end{bmatrix}
=\begin{bmatrix}X & Z & x\\ Z & Y & y\\ x^T & y^T & 1\end{bmatrix}
\,,\ \begin{bmatrix}xx^T & xy^T & x\\ yx^T & yy^T & y\\ x^T & y^T & 1\end{bmatrix}\in\mathbb{S}^{2N+1}\qquad(908)$$

where X , Y ∈ S^N, also Z ∈ S^N [sic]. Any rank-1 solution must take the form of (908). (§B.1) The problem statement equivalent to (907) is then written

$$\begin{array}{rl}
\underset{X,\,Y\in\mathbb{S},\ Z,\,x,\,y}{\text{minimize}} & f(x\,,\,y)+\|X-Y\|_F\\
\text{subject to} & (x\,,\,y)\in\mathcal{C}\\
& G=\begin{bmatrix}X & Z & x\\ Z & Y & y\\ x^T & y^T & 1\end{bmatrix}\ (\succeq 0)\\
& \operatorname{rank} G=1\\
& \operatorname{tr}(X)=1\\
& \delta(Z)\succeq 0
\end{array}\qquad(909)$$

The trace constraint on X normalizes vector x while the diagonal constraint on Z maintains sign between respective entries of x and y . Regularization term ‖X − Y‖_F then makes x equal to y to within a real scalar; (§C.2.0.0.2) in this case, a positive scalar. To make this program solvable by convex iteration, as explained in Example 4.5.1.2.4 and other previous examples, we move the rank constraint to the objective

$$\begin{array}{rl}
\underset{X,\,Y,\,Z,\,x,\,y}{\text{minimize}} & f(x\,,\,y)+\|X-Y\|_F+\langle G\,,\,W\rangle\\
\text{subject to} & (x\,,\,y)\in\mathcal{C}\\
& G=\begin{bmatrix}X & Z & x\\ Z & Y & y\\ x^T & y^T & 1\end{bmatrix}\succeq 0\\
& \operatorname{tr}(X)=1\\
& \delta(Z)\succeq 0
\end{array}\qquad(910)$$

by introducing a direction matrix W found from (1892a):

$$\begin{array}{rl}
\underset{W\in\mathbb{S}^{2N+1}}{\text{minimize}} & \langle G^\star,\,W\rangle\\
\text{subject to} & 0\preceq W\preceq I\\
& \operatorname{tr} W=2N
\end{array}\qquad(911)$$

This semidefinite program has an optimal solution that is known in closed form. Iteration (910) (911) terminates when rank G = 1 and linear regularization ⟨G , W⟩ vanishes to within some numerical tolerance in (910); typically, in two iterations. If function f competes too much with the regularization, positively weighting each regularization term will become required. At convergence, problem (910) becomes a convex equivalent to the original nonconvex problem (907). 2

4.7.0.0.11 Example. fast max cut. [127]


Let Γ be an n-node graph, and let the arcs (i , j ) of the graph be associated
with . . . weights aij . The problem is to find a cut of the largest possible weight,
i.e, to partition the set of nodes into two parts Mc , M′c in such a way that the
total weight of all arcs linking Mc and M′c (i.e, with one incident node in Mc
and the other one in M′c [Figure 121]) is as large as possible. −[36, §4.3.3]
Figure 121: A cut partitions nodes {i = 1 … 16} of this graph into Mc and M′c . Linear arcs have circled weights. The problem is to find a cut maximizing total weight of all arcs linking partitions made by the cut.

Literature on the max cut problem is vast because this problem has elegant primal and dual formulation, its solution is very difficult, and there exist many commercial applications; e.g, semiconductor design [144], quantum computing [457].

Our purpose here is to demonstrate how iteration of two simple convex problems can quickly converge to an optimal solution of the max cut problem with a 98% success rate, on average.4.50 max cut is stated:

$$\begin{array}{rl}
\underset{x\in\mathbb{R}^n}{\text{maximize}} & \displaystyle\sum_{1\leq i<j\leq n} a_{ij}\tfrac{1}{2}(1-x_ix_j)\\
\text{subject to} & \delta(xx^T)=\mathbf{1}
\end{array}\qquad(912)$$

where [aᵢⱼ] are real arc weights, and vector x = [xᵢ] ∈ Rⁿ corresponds to the n nodes; specifically,

$$\begin{array}{l}
\text{node } i\in\mathcal{M}_{\rm c} \ \Leftrightarrow\ x_i=1\\
\text{node } i\in\mathcal{M}_{\rm c}' \ \Leftrightarrow\ x_i=-1
\end{array}\qquad(913)$$

4.50 We term our solution to max cut fast because we sacrifice a little accuracy to achieve speed; id est, only about two or three convex iterations, achieved by heavily weighting a rank regularization term.

If nodes i and j have the same binary value xᵢ and xⱼ, then they belong to the same partition and contribute nothing to the cut. Arc (i , j) traverses the cut, otherwise, adding its weight aᵢⱼ to the cut.

max cut statement (912) is the same as, for A = [aᵢⱼ] ∈ Sⁿ

$$\begin{array}{rl}
\underset{x\in\mathbb{R}^n}{\text{maximize}} & \tfrac{1}{4}\langle\mathbf{1}\mathbf{1}^T-xx^T,\,A\rangle\\
\text{subject to} & \delta(xx^T)=\mathbf{1}
\end{array}\qquad(914)$$

Because of Boolean assumption δ(xxᵀ) = 1

$$\langle\mathbf{1}\mathbf{1}^T-xx^T,\,A\rangle=\langle xx^T,\,\delta(A\mathbf{1})-A\rangle\qquad(915)$$

so problem (914) is the same as

$$\begin{array}{rl}
\underset{x\in\mathbb{R}^n}{\text{maximize}} & \tfrac{1}{4}\langle xx^T,\,\delta(A\mathbf{1})-A\rangle\\
\text{subject to} & \delta(xx^T)=\mathbf{1}
\end{array}\qquad(916)$$

This max cut problem is combinatorial (nonconvex).

Because an estimate of upper bound to max cut is needed to ascertain convergence when vector x has large dimension, we digress to derive the dual problem: Directly from (916), its Lagrangian is [66, §5.1.5] (1588)

$$\begin{array}{rcl}
L(x\,,\,\nu) &=& \tfrac{1}{4}\langle xx^T,\,\delta(A\mathbf{1})-A\rangle+\langle\nu\,,\,\delta(xx^T)-\mathbf{1}\rangle\\[2pt]
&=& \tfrac{1}{4}\langle xx^T,\,\delta(A\mathbf{1})-A\rangle+\langle\delta(\nu)\,,\,xx^T\rangle-\langle\nu\,,\,\mathbf{1}\rangle\\[2pt]
&=& \tfrac{1}{4}\langle xx^T,\,\delta(A\mathbf{1}+4\nu)-A\rangle-\langle\nu\,,\,\mathbf{1}\rangle
\end{array}\qquad(917)$$

where quadratic xᵀ(δ(A1 + 4ν) − A)x has supremum 0 if δ(A1 + 4ν) − A is assumed negative semidefinite, and has supremum ∞ otherwise. The finite supremum

$$g(\nu)=\sup_{x\in\mathbb{R}^n}L(x\,,\,\nu)=
\left\{\begin{array}{ll}
-\langle\nu\,,\,\mathbf{1}\rangle\,, & \text{assuming } A-\delta(A\mathbf{1}+4\nu)\succeq 0\\
\infty & \text{otherwise}
\end{array}\right.\qquad(918)$$

is chosen as the objective of minimization to dual (convex semidefinite) problem

$$\begin{array}{rl}
\underset{\nu\in\mathbb{R}^n}{\text{minimize}} & -\nu^T\mathbf{1}\\
\text{subject to} & A-\delta(A\mathbf{1}+4\nu)\succeq 0
\end{array}\qquad(919)$$

whose optimal value (−ν⋆ᵀ1) provides an upper bound to max cut but is not tight4.51 ( ¼⟨xxᵀ, δ(A1)−A⟩ < g(ν) , duality gap is nonzero); [182] problem (919) is not a strong dual to (916).4.52

To transform max cut to its convex equivalent, first define

$$X=xx^T\in\mathbb{S}^n\qquad(924)$$

then max cut (916) becomes

$$\begin{array}{rl}
\underset{X\in\mathbb{S}^n}{\text{maximize}} & \tfrac{1}{4}\langle X\,,\,\delta(A\mathbf{1})-A\rangle\\
\text{subject to} & \delta(X)=\mathbf{1}\\
& (X\succeq 0)\\
& \operatorname{rank} X=1
\end{array}\qquad(920)$$

4.51 Taking the dual of dual problem (919) would provide (920) but without the rank constraint. [175] Dual of a dual of even a convex primal problem is not necessarily the same primal problem; although, optimal solution of one can be obtained from the other.
4.52 Even so, empirically, binary solution arg sup_{x∈{−1,1}ⁿ} L(x , ν⋆) to (917) is optimal to (916).

whose rank constraint can be regularized as in

$$\begin{array}{rl}
\underset{X\in\mathbb{S}^n}{\text{maximize}} & \tfrac{1}{4}\langle X\,,\,\delta(A\mathbf{1})-A\rangle-w\langle X\,,\,W\rangle\\
\text{subject to} & \delta(X)=\mathbf{1}\\
& X\succeq 0
\end{array}\qquad(921)$$

where w ≈ 1000 is a nonnegative fixed weight, and W is a direction matrix determined from

$$\sum_{i=2}^n\lambda(X^\star)_i=
\begin{array}[t]{rl}
\underset{W\in\mathbb{S}^n}{\text{minimize}} & \langle X^\star,\,W\rangle\\
\text{subject to} & 0\preceq W\preceq I\\
& \operatorname{tr} W=n-1
\end{array}\qquad(1892\mathrm{a})$$
which has an optimal solution that is known in closed form. These two problems (921)
and (1892a) are iterated until convergence as defined on page 250.
Because convex problem statement (921) is so elegant, it is numerically solvable for
large binary vectors within reasonable time.4.53 To test our convex iterative method, we
compare an optimal convex result to an actual solution of the max cut problem found
by performing a brute force combinatorial search of (916)4.54 for a tight upper bound.
Search-time limits binary vector lengths to 24 bits (about five days CPU time). 98%
accuracy, actually obtained, is independent of binary vector length (12 , 13 , 20 , 24) when
averaged over more than 231 problem instances including planar, randomized, and toroidal
graphs.4.55 When failure occurred, large and small errors were manifest. That same 98%
average accuracy is presumed maintained when binary vector length is further increased.
A Matlab program is provided on Wıκımization [424]. 2
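A minimal CVX sketch of iteration (921) (1892a); the weight and iteration count follow the text, while everything else, including how we extract the cut, is our assumption, not the Wıκımization program [424]:

    n = size(A,1);  w = 1000;
    W = zeros(n);                            % initial direction matrix
    for iter = 1:3                           % two or three iterations suffice, per the text
       cvx_begin sdp quiet
          variable X(n,n) symmetric
          maximize( trace(X*(diag(A*ones(n,1)) - A))/4 - w*trace(X*W) )   % problem (921)
          subject to
             diag(X) == 1;
             X >= 0;                         % positive semidefinite (sdp mode)
       cvx_end
       [V,D] = eig(X);
       [lam,order] = sort(diag(D),'descend');
       W = V(:,order(2:n))*V(:,order(2:n))'; % projector on n-1 smallest eigenvectors (1892a)
    end
    x = sign(V(:,order(1)));                 % cut (913) from principal eigenvector of X*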

4.7.0.0.12 Example. Cardinality/rank problem.
d’Aspremont, El Ghaoui, Jordan, & Lanckriet [105] propose approximating a positive semidefinite matrix A ∈ S^N₊ by a rank-1 matrix having constraint on cardinality c : for 0 < c < N

$$\begin{array}{rl}
\underset{z}{\text{minimize}} & \|A-zz^T\|_F\\
\text{subject to} & \operatorname{card} z\leq c
\end{array}\qquad(922)$$

which, they explain, is a hard problem equivalent to

$$\begin{array}{rl}
\underset{x}{\text{maximize}} & x^TAx\\
\text{subject to} & \|x\|=1\\
& \operatorname{card} x\leq c
\end{array}\qquad(923)$$

where z ≜ √λ x and where optimal solution x⋆ is a principal eigenvector (1885) (§A.5.1.1) of A and λ = x⋆ᵀAx⋆ is the principal eigenvalue [185, p.331] when c is true cardinality of that eigenvector. This is principal component analysis with a cardinality constraint which controls solution sparsity. Define the matrix variable

$$X\,\triangleq\,xx^T\in\mathbb{S}^N\qquad(924)$$

4.53 We solved for a length-250 binary vector in only a few minutes and convex iterations on a 2006 vintage laptop Core 2 CPU (Intel [email protected], 666MHz FSB).
4.54 more computationally intensive than the proposed convex iteration by many orders of magnitude. Solving max cut by searching over all binary vectors of length 100, for example, would occupy a contemporary supercomputer for a million years.
4.55 Existence of a polynomial-time approximation to max cut with accuracy provably better than 94.11% would refute NP-hardness; which Håstad believes to be highly unlikely. [212, thm.8.2] [213]

whose desired rank is 1 , and whose desired diagonal cardinality

$$\operatorname{card}\,\delta(X)\equiv\operatorname{card} x\qquad(925)$$

is equivalent to cardinality c of vector x . Then we can transform cardinality problem (923) to an equivalent in new variable X :4.56

$$\begin{array}{rl}
\underset{X\in\mathbb{S}^N}{\text{maximize}} & \langle X\,,\,A\rangle\\
\text{subject to} & \langle X\,,\,I\,\rangle=1\\
& (X\succeq 0)\\
& \operatorname{rank} X=1\\
& \operatorname{card}\,\delta(X)\leq c
\end{array}\qquad(926)$$

We transform problem (926) to an equivalent convex problem by introducing two direction matrices into regularization terms: W to achieve desired cardinality card δ(X) , and Y to find an approximating rank-1 matrix X :

$$\begin{array}{rl}
\underset{X\in\mathbb{S}^N}{\text{maximize}} & \langle X\,,\,A-w_1Y\rangle-w_2\langle\delta(X)\,,\,\delta(W)\rangle\\
\text{subject to} & \langle X\,,\,I\,\rangle=1\\
& X\succeq 0
\end{array}\qquad(927)$$

where w₁ and w₂ are positive scalars respectively weighting tr(XY) and δ(X)ᵀδ(W) just enough to insure that they vanish to within some numerical precision, where direction matrix Y is an optimal solution to semidefinite program

$$\begin{array}{rl}
\underset{Y\in\mathbb{S}^N}{\text{minimize}} & \langle X^\star,\,Y\rangle\\
\text{subject to} & 0\preceq Y\preceq I\\
& \operatorname{tr} Y=N-1
\end{array}\qquad(928)$$

and where diagonal direction matrix W ∈ S^N optimally solves linear program

$$\begin{array}{rl}
\underset{W=\delta^2(W)}{\text{minimize}} & \langle\delta(X^\star)\,,\,\delta(W)\rangle\\
\text{subject to} & 0\preceq\delta(W)\preceq\mathbf{1}\\
& \operatorname{tr} W=N-c
\end{array}\qquad(929)$$

Both direction matrix programs are derived from (1892a) whose analytical solution is known but is not necessarily unique. We emphasize (confer p.250): because this iteration (927) (928) (929) (initial Y, W = 0) is not a projection method (§4.5.1.1), success relies on existence of matrices in the feasible set of (927) having desired rank and diagonal cardinality. In particular, the feasible set of convex problem (927) is a Fantope (94) whose extreme points constitute the set of all normalized rank-1 matrices; among those are found rank-1 matrices of any desired diagonal cardinality.

Convex problem (927) is not a relaxation of cardinality problem (923); instead, problem (927) becomes a convex equivalent to (923) at global optimality of iteration (927) (928) (929). Because the feasible set of problem (927) contains all normalized rank-1 (§B.1) symmetric matrices of every nonzero diagonal cardinality, a constraint too low or high in cardinality c will not prevent solution. An optimal rank-1 solution X⋆, whose diagonal cardinality is equal to cardinality of a principal eigenvector of matrix A , will produce the least residual Frobenius norm (to within machine noise processes) in the original problem statement (922). 2

4.56 A semidefiniteness constraint X ⪰ 0 is not required, theoretically, because positive semidefiniteness of a rank-1 matrix is enforced by symmetry. (Theorem A.3.1.0.7)
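A minimal CVX sketch of iteration (927) (928) (929); the weights, tolerance, and recovery of z are our assumptions:

    N = size(A,1);  c = 2;  w1 = 10;  w2 = 10;
    Y = zeros(N);  dW = zeros(N,1);          % direction matrix Y and diagonal of W
    for iter = 1:50
       cvx_begin sdp quiet
          variable X(N,N) symmetric
          maximize( trace(X*(A - w1*Y)) - w2*diag(X)'*dW )   % problem (927)
          subject to
             trace(X) == 1;
             X >= 0;
       cvx_end
       [V,D] = eig(X);
       [lam,order] = sort(diag(D),'descend');
       Y = V(:,order(2:N))*V(:,order(2:N))'; % solution of (928)
       [~,idx] = sort(diag(X),'descend');
       dW = zeros(N,1);  dW(idx(c+1:N)) = 1; % solution of (929): 1s on N-c smallest diagonal entries
       if trace(X*Y) + diag(X)'*dW < 1e-9, break, end   % both regularizations vanished
    end
    xs = V(:,order(1));                      % principal eigenvector of X*
    z = sqrt(xs'*A*xs)*xs;                   % rank-1 cardinality-c approximation, as in (922)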

Figure 122: Shepp-Logan phantom, phantom(256) from the Matlab image processing toolbox.

4.7.0.0.13 Example. Compressive sampling of a phantom.


In Summer 2004, Candès, Romberg, & Tao [77] and Donoho [137] released papers on
perfect signal reconstruction from samples that stand in violation of Shannon’s classical
sampling theorem. These defiant signals are assumed sparse inherently or under some
sparsifying affine transformation. Essentially, they proposed sparse sampling theorems
asserting average sample rate less than Shannon’s and independent of signal bandwidth:
minimum sampling rate:
• of Ω-bandlimited signal: 2Ω ([322, §3.2] Shannon)
• of k-sparse length-n signal: k log₂(1+ n/k) (Figure 115 Candès/Donoho)
In essence, Candès and Donoho provide conditions under which 1-norm minimization
provides solution equivalent, with high probability, to 0-norm (cardinality) minimization.
Certainly, much was already known about nonuniform or random sampling [38] [293]
and about subsampling or multirate systems [100] [404]. Vetterli, Marziliano, & Blu
[413] had congealed a theory of noiseless signal reconstruction, in May 2001, from
samples that violate the Shannon rate. [434, Sampling Sparsity] They anticipated the
sparsifying transform by recognizing: it is the innovation (onset) of functions constituting
a (not necessarily bandlimited) signal that determines minimum sampling rate for perfect
reconstruction. Average onset (sparsity), Vetterli et alii call, the rate of innovation.
Vector inner-products that Candès/Donoho call samples or measurements, Vetterli
calls projections. From those projections Vetterli demonstrates reconstruction (by
digital signal processing and “root finding”) of a Dirac comb, the very same
prototypical signal from which Candès probabilistically derives minimum sampling
rate [Compressive Sampling and Frontiers in Signal Processing, University of Minnesota,
June 6, 2007]. Combining their terminology, we paraphrase a sparse sampling theorem:
• Minimum sampling rate, asserted by Candès/Donoho, ∝ Vetterli's rate of innovation (a.k.a: information rate, degrees of freedom [ibidem, June 5, 2007]).
What distinguishes these researchers are their methods of reconstruction.
Properties of 1-norm were also well understood by June 2004 finding application in
deconvolution of linear systems [92], regularized linear regression (Lasso) [394] [361], and
basis pursuit [85] [86] [245]. But never before had there been a formalized and rigorous
sense that perfect reconstruction were possible by convex optimization of 1-norm when

information lost in a subsampling process became nonrecoverable by classical methods.


Donoho named this discovery compressed sensing to describe a nonadaptive perfect
reconstruction method by means of linear programming. By the time Candès’ and
Donoho’s landmark papers were finally published by IEEE in 2006, compressed sensing was
old news that had spawned intense research which still persists; notably, from prominent
members of the wavelet community.
Reconstruction of the Shepp-Logan phantom (Figure 122), from a severely aliased
image (Figure 124) obtained by Magnetic Resonance Imaging (MRI), was the impetus
driving Candès’ quest for a sparse sampling theorem. He realized that line segments
appearing in the aliased image were regions of high total variation. There is great
motivation, in the medical community, to apply compressed sensing to MRI because it
translates to reduced scan-time which brings great technological and physiological benefits.
MRI is now about 35 years old, beginning in 1973 with Nobel laureate Paul Lauterbur
from Stony Brook USA. There has been much progress in MRI and compressed sensing
since 2004, but there have also been indications of 1-norm abandonment (indigenous to
reconstruction by compressed sensing) in favor of criteria closer to 0-norm because of
a correspondingly smaller number of measurements required to accurately reconstruct a
sparse signal:4.57
5481 complex samples (22 radial lines, ≈ 256 complex samples per) were required in
June 2004 to reconstruct a noiseless 256×256 -pixel Shepp-Logan phantom by 1-norm
minimization of an image-gradient integral estimate called total variation; id est, 8.4%
subsampling of 65536 data. [77, §1.1] [76, §3.2] It was soon discovered that reconstruction of the Shepp-Logan phantom were possible with only 2521 complex samples (10 radial lines, Figure 123); 3.8% subsampled data input to a (nonconvex) ½-norm total-variation minimization. [83, §IIIA] The closer to 0-norm, the fewer the samples required for perfect reconstruction.
Passage of a few years witnessed an algorithmic speedup and dramatic reduction
in minimum number of samples required for perfect reconstruction of the noiseless
Shepp-Logan phantom. But minimization of total variation is ideally suited to recovery of
any piecewise-constant image, like a phantom, because gradient of such images is highly
sparse by design.
There is no inherent characteristic of real-life MRI images that would make reasonable
an expectation of sparse gradient. Sparsification of a discrete image-gradient tends to
preserve edges. Then minimization of total variation seeks an image having fewest edges.
There is no deeper theoretical foundation than that. When applied to human brain scan or
angiogram, with as much as 20% of 256×256 Fourier samples, we have observed4.58 a 30dB
image/reconstruction-error ratio4.59 barrier that seems impenetrable by the total-variation
objective. Total-variation minimization has met with moderate success, in retrospect,
only because some medical images are moderately piecewise-constant signals. One simply
hopes a reconstruction, that is in some sense equal to a known subset of samples and
whose gradient is most sparse, is that unique image we seek.4.60
4.57 Efficient techniques continually emerge urging 1-norm criteria abandonment; [89] [403] [402, §IID] e.g,
five techniques for compressed sensing are compared in [39] demonstrating that 1-norm performance limits
for cardinality minimization can be reliably exceeded.
4.58 Experiments with real-life images were performed by Christine S. W. Law at Lucas Center for Imaging,

Stanford University.
4.59 Noise considered here is due only to the reconstruction process itself; id est, noise in excess of that

produced by the best reconstruction of an image from a complete set of samples in the sense of Shannon.
At less than 30dB image/error, artifacts generally remain visible to the naked eye. We estimate that
about 50dB is required to eliminate noticeable distortion in a visual A/B comparison.
4.60 In vascular radiology, diagnoses are almost exclusively based on morphology of vessels and, in particular, presence of stenoses. There is a compelling argument for total-variation reconstruction of magnetic resonance angiogram because it helps isolate structures of particular interest.

The total-variation objective, operating on an image, is expressible as norm of a linear transformation (948). It is natural to ask whether there exist other sparsifying transforms
that might break the real-life 30dB barrier (any sampling pattern @20% 256×256 data)
in MRI. There has been much research into application of wavelets, discrete cosine
transform (DCT), randomized orthogonal bases, splines, etcetera, but with suspiciously
little focus on objective measures like image/error or illustration of difference images; the
predominant basis of comparison instead being subjectively visual (Duensing & Huang,
ISMRM Toronto 2008).4.61 Despite choice of transform, there seems yet to have been a
breakthrough of the 30dB barrier. Application of compressed sensing to MRI, therefore,
remains fertile in 2008 for continued research.

regularized form of compressed sensing in imaging


We now repeat Candès’ image reconstruction experiment from 2004 which led to discovery
of sparse sampling theorems. [77, §1.2] But we achieve perfect reconstruction with an
algorithm based on vanishing gradient of a compressed sensing problem’s regularization,
which is computationally efficient. Our contraction method (p.307) is fast also because
matrix multiplications are replaced by fast Fourier transform, and number of constraints is
cut in half by sampling symmetrically. Convex iteration for cardinality minimization (§4.6)
is incorporated which allows perfect reconstruction of a phantom at 4.1% subsampling
rate; 50% Candès’ rate. By making neighboring-pixel selection adaptive, convex iteration
reduces discrete image-gradient sparsity of the Shepp-Logan phantom to 1.9% ; 33% lower
than previously reported.
We demonstrate application of discrete image-gradient sparsification to the
n× n = 256× 256 Shepp-Logan phantom, simulating idealized acquisition of MRI data by
radial sampling in the Fourier domain (Figure 123).4.62 Define a Nyquist-centric discrete
Fourier transform (DFT) matrix
          ⎡ 1   1                1                1                ···  1                ⎤
          ⎢ 1   e^{−ȷ2π/n}       e^{−ȷ4π/n}       e^{−ȷ6π/n}       ···  e^{−ȷ(n−1)2π/n} ⎥
F ≜ (1/√n) ⎢ 1   e^{−ȷ4π/n}       e^{−ȷ8π/n}       e^{−ȷ12π/n}      ···  e^{−ȷ(n−1)4π/n} ⎥ ∈ C^{n×n}   (930)
          ⎢ 1   e^{−ȷ6π/n}       e^{−ȷ12π/n}      e^{−ȷ18π/n}      ···  e^{−ȷ(n−1)6π/n} ⎥
          ⎢ ⋮   ⋮                ⋮                ⋮                ⋱   ⋮                ⎥
          ⎣ 1   e^{−ȷ(n−1)2π/n}  e^{−ȷ(n−1)4π/n}  e^{−ȷ(n−1)6π/n}  ···  e^{−ȷ(n−1)²2π/n}⎦

a symmetric (nonHermitian) unitary matrix characterized

F = F^T ,   F^{−1} = F^H   (931)

Denoting an unknown image U ∈ R^{n×n}, its two-dimensional discrete Fourier transform F is

F(U) ≜ F U F   (932)

hence the inverse discrete transform

U = F^H F(U) F^H   (933)
4.61 I have never calculated the PSNR of these reconstructed images [of Barbara]. −Jean-Luc Starck
The sparsity of the image is the percentage of transform coefficients sufficient for diagnostic-quality
reconstruction. Of course the term “diagnostic quality” is subjective. . . . I have yet to see an “objective”
measure of image quality. Difference images, in my experience, definitely do not tell the whole story.
Often I would show people some of my results and get mixed responses, but when I add artificial Gaussian
noise to an image, often people say that it looks better. −Michael Lustig
4.62 k-space is conventional acquisition terminology indicating domain of the continuous raw data provided
by an MRI machine. An image is reconstructed by inverse discrete Fourier transform of that data
interpolated on a Cartesian grid in two dimensions.
Figure 123: MRI radial sampling pattern, in DC-centric Fourier domain, representing 4.1%
(10 lines) subsampled data. Only half of these complex samples, in any halfspace about
the origin in theory, need be acquired for a real image because of conjugate symmetry.
Due to MRI machine imperfections, samples are generally taken over full extent of each
radial line segment. MRI acquisition time is proportional to number of lines.

From §A.1.1 no.33 we have a vectorized two-dimensional DFT via Kronecker product ⊗

vec F(U) ≜ (F ⊗ F) vec U   (934)

and from (933) its inverse [194, p.24]

vec U = (F^H ⊗ F^H)(F ⊗ F) vec U = (F^H F ⊗ F^H F) vec U   (935)
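The Kronecker identities (934)–(935) are easy to confirm numerically; a minimal Matlab sketch (dimension n and image U arbitrary):

n = 8;
F = fft(eye(n))/sqrt(n);                      % unitary DFT matrix (930); F == F.'
U = randn(n);
norm(kron(F,F)*U(:) - reshape(F*U*F,[],1))    % (934): ~1e-16
norm(kron(F',F')*(kron(F,F)*U(:)) - U(:))     % (935): inverse recovers vec U, ~1e-16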

Idealized radial sampling in the Fourier domain can be simulated by Hadamard product
∘ with a binary mask Φ ∈ R^{n×n} whose nonzero entries could, for example, correspond
with the radial line segments in Figure 123. To make the mask Nyquist-centric, like DFT
matrix F, define a circulant [197] symmetric permutation matrix4.63

Θ ≜ ⎡ 0  I ⎤ ∈ S^n   (936)
    ⎣ I  0 ⎦

Then given subsampled Fourier-domain (MRI k-space) measurements in incomplete
K ∈ C^{n×n}, we might constrain F(U) thus:

ΘΦΘ ∘ F U F = K   (937)

and in vector form, (44) (1983)

δ(vec ΘΦΘ)(F ⊗ F) vec U = vec K   (938)

Because measurements K are complex, there are actually twice the number of equality
constraints as there are measurements.
We can cut that number of constraints in half via vertical and horizontal mask Φ
symmetry which forces the imaginary inverse transform to 0 : The inverse subsampled
transform in matrix form is

F^H(ΘΦΘ ∘ F U F)F^H = F^H K F^H   (939)


4.63 Matlab fftshift()
Figure 124: Aliasing of Shepp-Logan phantom in Figure 122 resulting from k-space
subsampling pattern in Figure 123. This image is real because binary mask Φ is vertically
and horizontally symmetric. It is remarkable that the phantom can be reconstructed, by
convex iteration, given only U⁰ = vec⁻¹f.

and in vector form

(F^H ⊗ F^H) δ(vec ΘΦΘ)(F ⊗ F) vec U = (F^H ⊗ F^H) vec K   (940)

later abbreviated
P vec U = f (941)
where

P ≜ (F^H ⊗ F^H) δ(vec ΘΦΘ)(F ⊗ F) ∈ C^{n²×n²}   (942)

Because of idempotence P = P², P is a projection matrix. Because of its Hermitian
symmetry [194, p.24]

P = (F^H ⊗ F^H) δ(vec ΘΦΘ)(F ⊗ F) = (F ⊗ F)^H δ(vec ΘΦΘ)(F^H ⊗ F^H)^H = P^H   (943)

P is an orthogonal projector.4.64 P vec U is real when P is real; id est, when for positive
even integer n

Φ = ⎡ Φ₁₁          Φ(1, 2:n)Ξ     ⎤ ∈ R^{n×n}   (944)
    ⎣ Ξ Φ(2:n, 1)  Ξ Φ(2:n, 2:n)Ξ ⎦

where Ξ ∈ S^{n−1} is the order-reversing permutation matrix (1920). In words, this necessary
and sufficient condition on Φ (for a real inverse subsampled transform [322, p.53]) demands
vertical symmetry about row n/2+1 and horizontal symmetry4.65 about column n/2+1.
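A minimal Matlab sketch of this symmetry condition: enforcing (944) on an arbitrary binary mask (Θ realized via a circular shift, per footnote 4.63) makes the inverse subsampled transform (939) real to machine precision:

n  = 8;  Xi = fliplr(eye(n-1));               % order-reversing permutation (1920)
Phi = double(rand(n) > 0.5);                  % arbitrary binary mask
Phi(1,2:n)   = max(Phi(1,2:n),   Phi(1,2:n)*Xi);       % horizontal symmetry
Phi(2:n,1)   = max(Phi(2:n,1),   Xi*Phi(2:n,1));       % vertical symmetry
Phi(2:n,2:n) = max(Phi(2:n,2:n), Xi*Phi(2:n,2:n)*Xi);  % interior symmetry
F = fft(eye(n))/sqrt(n);  Theta = circshift(eye(n), n/2);  % F (930), Theta (936)
U = randn(n);  K = (Theta*Phi*Theta) .* (F*U*F);           % subsampling (937)
norm(imag(F'*K*F'), 'fro')                    % inverse (939) is real: ~1e-16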
Define

    ⎡  1                    ⎤
    ⎢ −1   1                ⎥
∆ ≜ ⎢     −1   1            ⎥ ∈ R^{n×n}   (945)
    ⎢          ⋱   ⋱       ⎥
    ⎢          −1   1       ⎥
    ⎣ 0^T          −1   1   ⎦

a lower bidiagonal difference matrix: 1 on the main diagonal, −1 on the first subdiagonal.
4.64 (942) is a diagonalization of matrix P whose binary eigenvalues are δ(vec ΘΦΘ) while the
corresponding eigenvectors constitute the columns of unitary matrix F H ⊗ F H .
4.65 This condition on Φ applies to both DC- and Nyquist-centric DFT matrices.
Express an image-gradient estimate

∇U ≜ [ U∆ ; U∆^T ; ∆U ; ∆^T U ] ∈ R^{4n×n}   (946)

(blocks stacked) that is a simple first-order difference of neighboring pixels (Figure 125)
to the right, left, above, and below.4.66 By §A.1.1 no.33, its vectorization: for Ψᵢ ∈ R^{n²×n²}

vec ∇U = [ ∆^T⊗I ; ∆⊗I ; I⊗∆ ; I⊗∆^T ] vec U ≜ [ Ψ₁ ; Ψ₁^T ; Ψ₂ ; Ψ₂^T ] vec U ≜ Ψ vec U ∈ R^{4n²}   (947)

where Ψ ∈ R^{4n²×n²}. A total-variation minimization for reconstructing MRI image U,
that is known suboptimal [239] [78], may be concisely posed

minimize    ‖Ψ vec U‖₁
    U
subject to  P vec U = f   (948)

where
f = (F^H ⊗ F^H) vec K ∈ C^{n²}   (949)
is the known inverse subsampled Fourier data (a vectorized aliased image, Figure 124),
and where a norm of discrete image-gradient ∇U is equivalently expressed as norm of a
linear transformation Ψ vec U.
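A minimal Matlab sketch assembling ∆ (945) and sparse Ψ (947); names mirror the text:

n = 256;
D = speye(n) - spdiags(ones(n,1), -1, n, n);  % Delta (945): lower bidiagonal
I = speye(n);
Psi = [kron(D',I); kron(D,I); kron(I,D); kron(I,D')];   % Psi (947), 4n^2 x n^2 sparse
% total variation of image U, objective of (948):  norm(Psi*U(:), 1)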
Although this simple problem statement (948) is equivalent to a linear program (§3.2),
its numerical solution is beyond the capability of even the most highly regarded of
contemporary commercial solvers.4.67 Our recourse is to recast the problem in regularized
form and write customized code to solve it:
minimize    ⟨|Ψ vec U| , y⟩
    U
subject to  P vec U = f   (950a)

    ≡

minimize    ⟨|Ψ vec U| , y⟩ + ½ λ‖P vec U − f‖₂²   (950b)
    U

where multiobjective parameter λ ∈ R₊ is quite large (λ ≈ 1E8) so as to enforce the equality
constraint: P vec U − f = 0 ⇔ ‖P vec U − f‖₂² = 0 (§A.7.1). We introduce a direction
vector y ∈ R₊^{4n²} as part of a convex iteration (§4.6.3) to overcome that known suboptimal
minimization of discrete image-gradient cardinality: id est, there exists a vector y⋆ with
entries yᵢ⋆ ∈ {0, 1} such that

minimize    ‖Ψ vec U‖₀
    U                       ≡   minimize  ⟨|Ψ vec U| , y⋆⟩ + ½ λ‖P vec U − f‖₂²   (951)
subject to  P vec U = f             U

Existence of such a y⋆, complementary to an optimal vector Ψ vec U⋆, is obvious by
definition of global optimality ⟨|Ψ vec U⋆| , y⋆⟩ = 0 (850) under which a cardinality-c
optimal objective ‖Ψ vec U⋆‖₀ is assumed to exist.
4.66 There is significant improvement in reconstruction quality by augmentation of a nominally two-point
discrete image-gradient estimate to four points per pixel by inclusion of two polar directions. Improvement
is due to centering; symmetry of discrete differences about a central pixel. We find small improvement on
real-life images, ≈ 1dB empirically, by further augmentation with diagonally adjacent pixel differences.
4.67 for images as small as 128×128 pixels. Obstacle to numerical solution is not a computer resource:

e.g, execution time, memory. The obstacle is, in fact, inadequate numerical precision. Even when all
dependent equality constraints are manually removed, the best commercial solvers fail simply because
computer numerics become nonsense; id est, numerical errors enter significant digits and the algorithm
exits prematurely, loops indefinitely, or produces an infeasible solution.
Figure 125: Neighboring-pixel stencil [403] for image-gradient estimation on Cartesian


grid. Implementation selects adaptively from darkest four • about central. Continuous
image-gradient from two pixels holds only in a limit. For discrete differences, better
practical estimates are obtained when centered.

Because (950b) is an unconstrained convex problem, a zero objective-function gradient
is necessary and sufficient for optimality (§2.13.3); id est, (§D.2.1)

Ψ^T δ(y) sgn(Ψ vec U) + λ P^H(P vec U − f) = 0   (952)

Because of P idempotence and Hermitian symmetry and sgn() definition (p.641), this is
equivalent to

lim_{ε→0} ( Ψ^T δ(y) δ(|Ψ vec U| + ε1)^{−1} Ψ + λP ) vec U = λPf   (953)

where small positive constant ε ∈ R₊ has been introduced for invertibility. Speaking
more analytically, introduction of ε serves to uniquely define the objective's gradient
everywhere in the function domain; id est, it transforms absolute value in (950b) from a
function differentiable almost everywhere into a differentiable function. An example of
such a transformation in one dimension is illustrated in Figure 126. When small enough
for practical purposes4.68 (ε ≈ 1E-3), we may ignore the limiting operation. Then the
mapping, for 0 ⪯ y ⪯ 1

vec U^{t+1} = ( Ψ^T δ(y) δ(|Ψ vec U^t| + ε1)^{−1} Ψ + λP )^{−1} λPf   (954)

is a contraction in U^t that can be solved recursively in t for its unique fixed point; id est,
until U^{t+1} → U^t. [259, p.300] [234, p.155] Calculating this inversion directly is not possible
for large matrices on contemporary computers because of numerical precision, so instead
we apply the conjugate gradient method of solution to

( Ψ^T δ(y) δ(|Ψ vec U^t| + ε1)^{−1} Ψ + λP ) vec U^{t+1} = λPf   (955)

which is linear in U^{t+1} at each recursion in the Matlab program [419].4.69
4.68 We are looking for at least 50dB image/error ratio from only 4.1% subsampled data (10 radial lines in
k-space). With this setting of ε, we actually attain in excess of 100dB from a simple Matlab program in
about a minute on a 2006 vintage laptop Core 2 CPU (Intel T7600 @2.33GHz, 666MHz FSB). By trading
execution time and treating discrete image-gradient cardinality as a known quantity for this phantom,
over 160dB is achievable.
4.69 Conjugate gradient method requires positive definiteness. [177, §4.8.3.2]
Figure 126: Real absolute value function f₂(x) = |x| on x ∈ [−1, 1] (from Figure 72b)
superimposed upon the integral of its derivative, ∫₋₁ˣ y/(|y|+ε) dy, at ε = 0.05, which
smooths the objective function.

Observe that P (942), in the equality constraint from problem (950a), is not a
wide matrix.4.70 Although number of Fourier samples taken is equal to the number
of nonzero entries in binary mask Φ , matrix P is square but never actually formed
during computation. Rather, a two-dimensional fast Fourier transform of U is computed
followed by masking with ΘΦΘ and then an inverse fast Fourier transform. This technique
significantly reduces memory requirements and, together with contraction method of
solution, is the principal reason for relatively fast computation.
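One contraction step (955) may be sketched in Matlab as follows, applying P by FFT masking exactly as just described; mask (= ΘΦΘ), sparse Ψ, direction vector y, current iterate Ut, and data f are assumed in scope:

n = 256;  lambda = 1e8;  eps1 = 1e-3;
Pop = @(u) reshape(ifft2(mask .* fft2(reshape(u,n,n))), [], 1);   % applies P (942)
% (fft2/ifft2 scaling cancels, so Pop matches the unitary projector exactly)
w   = y ./ (abs(Psi*Ut(:)) + eps1);     % diagonal of delta(y) delta(|Psi vec Ut|+eps1)^-1
Aop = @(u) real(Psi'*(w .* (Psi*u)) + lambda*Pop(u));             % lhs operator of (955)
u1  = pcg(Aop, lambda*real(Pop(f)), 1e-10, 500, [], [], Ut(:));   % conjugate gradients
Ut1 = reshape(u1, n, n);                % next iterate of contraction (954)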

convex iteration
By convex iteration we mean alternation of solution to (950a) and (956) until convergence.
Direction vector y is initialized to 1 until the first fixed point is found; which means, the
contraction recursion begins calculating a (1-norm) solution U ⋆ to (948) via problem
(950b). Once U⋆ is found, vector y is updated according to an estimate of discrete
image-gradient cardinality c: Sum of the 4n² − c smallest entries of |Ψ vec U⋆| ∈ R^{4n²} is
the optimal objective value from a linear program, for 0 ≤ c ≤ 4n² − 1 (536)

  4n²
   Σ    π(|Ψ vec U⋆|)ᵢ  =  minimize     ⟨|Ψ vec U⋆| , y⟩
 i=c+1                     y ∈ R^{4n²}
                           subject to   0 ⪯ y ⪯ 1
                                        y^T 1 = 4n² − c   (956)

where π is the nonlinear permutation-operator sorting its vector argument into
nonincreasing order. An optimal solution y to (956), that is an extreme point of its feasible
set, is known in closed form: it has 1 in each entry corresponding to the 4n² − c smallest
entries of |Ψ vec U⋆| and has 0 elsewhere. −p.275 Updated image U⋆ is assigned to U^t,
the contraction is recomputed solving (950b), direction vector y is updated again, and so
on until convergence which is guaranteed by virtue of a monotonically nonincreasing real
sequence of objective values in (950a) and (956).
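That closed-form extreme point is immediate to compute; a Matlab sketch, given cardinality estimate c and solution uStar = vec U⋆:

g = abs(Psi*uStar);                 % |Psi vec U*|
[~, idx] = sort(g, 'descend');
y = ones(numel(g), 1);
y(idx(1:c)) = 0;                    % 0 on the c largest entries, 1 elsewhere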
There are two features that distinguish problem formulation (950b) and our particular
implementation of it [419, Matlab code]:

4.70 Wide is typical of compressed sensing problems; e.g, [76] [83].


1) An image-gradient estimate may engage any combination of four adjacent pixels.
In other words, the algorithm is not locked into a four-point gradient estimate
(Figure 125); number of points constituting an estimate is directly determined by
direction vector y.4.71 Indeed, we find only c = 5092 zero entries in y⋆ for the
Shepp-Logan phantom; meaning, discrete image-gradient sparsity is actually closer
to 1.9% than the 3% reported elsewhere; e.g, [402, §IIB].
2) Numerical precision of the fixed point of contraction (954) (≈1E-2 for perfect
reconstruction @−103dB error) is a parameter to the implementation; meaning,
direction vector y is updated after contraction begins but prior to its culmination.
Impact of this idiosyncrasy tends toward simultaneous optimization in variables U
and y while insuring y settles on a boundary point of its feasible set (nonnegative
hypercube slice) in (956) at every iteration; for only a boundary point4.72 can yield
the sum of smallest entries in |Ψ vec U ⋆ |.

Perfect reconstruction of the Shepp-Logan phantom (at 103dB image/error) is achieved


in a Matlab minute with 4.1% subsampled data (2671 complex samples); well below an
11% least lower bound predicted by the sparse sampling theorem. Because reconstruction
approaches optimal solution to a 0-norm problem, minimum number of Fourier-domain
samples is bounded below by cardinality of discrete image-gradient at 1.9%. 2

4.7.0.0.14 Exercise. Contraction operator.
Determine conditions on λ and ε under which Ψ^T δ(y) δ(|Ψ vec U^t| + ε1)^{−1} Ψ + λP from
(955) is positive definite and (954) is a contraction. H

4.7.0.0.15 Example. Eternity II.


A tessellation puzzle game, playable by children, commenced world-wide in July 2007;
introduced in London by Christopher Walter Monckton, 3rd Viscount Monckton of
Brenchley. Called Eternity II, its name derives from an estimate of time that would pass
while trying all allowable tilings of puzzle pieces before obtaining a complete solution. By
the end of 2008, a complete solution had not yet been found although a $10,000 USD
prize was awarded for a high score 467 (out of 480 = 2√M(√M − 1)) obtained by heuristic
methods.4.73 No prize was awarded for 2009 and 2010. Game-rules state that a $2M prize
would be awarded to the first person who completely solves the puzzle before December 31,
2010, but the prize went unclaimed and solution remains yet to be found.
The full game comprises M = 256 square pieces and 16 ×16 gridded board (Figure 128)
whose complete tessellation is considered NP-hard.4.74 [388] [120] A player may tile, retile,
and rotate pieces, indexed 1 through 256, in any order face-up on the square board. Pieces
are immutable in the sense that each is characterized by four colors (and their uniquely
associated British symbols), one at each edge, which are not necessarily the same per
piece or from piece to piece; id est, different pieces may or may not have some edge-colors
in common. There are L = 22 distinct edge-colors plus a solid grey. The object of the
game is to completely tile the board with pieces whose touching edges have identical color.
Boundary of the board must be colored grey.
4.71 This adaptive gradient was not contrived. It is an artifact of the convex iteration method for minimal
cardinality solution; in this case, cardinality minimization of a discrete image-gradient.
4.72 Simultaneous optimization of these two variables U and y should never be a pinnacle of aspiration;

for then, optimal y might not attain a boundary point.


4.73 That score means all but a few of the 256 pieces had been placed successfully (including the mandatory
piece). Although the distance from 467 to 480 is relatively small, there is apparently vast distance to a
solution because no complete solution followed in 2009.
4.74 Even so, combinatorial-intensity brute-force backtracking methods can solve similar puzzles in minutes

given M = 196 pieces on a 14×14 test board; as demonstrated by Yannick Kirschhoffer. There is a steep
rise in level of difficulty going to a 15×15 board.
(a) pieces, indexed columnwise in the tableau:  [ 1 5 9 13 ; 2 6 10 14 ; 3 7 11 15 ; 4 8 12 16 ]
(b) one solution, piece placement:  [ 13 4 16 5 ; 3 2 10 11 ; 12 14 6 8 ; 1 15 7 9 ]
(c) colors:  [ e₁ e₃ ; e₂ e₄ ]

Figure 127: Eternity II is a board game in the puzzle genre. (a) Shown are all of the
16 puzzle pieces (indexed as in the tableau alongside) from a scaled-down computerized
demonstration game version from the TOMY website. Puzzle pieces are square and
partitioned into four colors (with associated symbols). Pieces may be moved, removed,
and rotated at random on a 4×4 board. (b) Illustrated is one complete solution to
this puzzle whose solution is not unique. The piece, whose border is lightly outlined, was
placed last in this realization. There is no mandatory piece placement, as for the full game,
except the grey board-boundary. Solution time for a human is typically on the order of a
minute. (c) This puzzle has four colors, indexed 1 through 4 ; grey corresponds to 0.
full-game rules
1) Any puzzle piece may be rotated face-up in quadrature and placed or replaced on
the square board.

2) Only one piece may occupy any particular cell on the board.

3) All adjacent pieces must match in color (and symbol) at their touching edges.

4) Solid grey edges must appear all along the board’s boundary.

5) One mandatory piece (numbered 139) must have a predetermined rotation in a
predetermined cell (number 121) on the board (Figure 128).

6) The board must be tiled completely (covered ).

A scaled-down demonstration version of the game is illustrated in Figure 127.


Differences between the full game (Figure 128) and scaled-down game are: number of
edge-colors L (22 versus 4, ignoring solid grey), number of pieces M (256 versus 16), and a
single mandatory piece placement interior to the board for the full game. The scaled-down
game has four distinct edge-colors, plus a solid grey, whose coding is illustrated in
Figure 127c.

• For the full-game board, there are L = 22 distinct edge-colors and M = 256 puzzle
pieces with board-dimension √M × √M = 16×16.

• For the scaled-down demonstration game board, there are L = 4 distinct edge-colors
and M = 16 puzzle pieces with board-dimension √M × √M = 4×4.

Euclidean distance intractability


If each square puzzle piece were characterized by four points in quadrature, one point
representing board coordinates and color per edge, then Euclidean distance geometry
would be suitable for solving this puzzle. Since all interpoint distances per piece are
known, this game may be regarded as a Euclidean distance matrix completion problem4.75
in EDM 4M . Because distance information provides for reconstruction of point position to
within an isometry (§5.5), piece translation and rotation are isometric transformations that
abide by rules of the game.4.76 Convex constraints can be devised to prevent puzzle-piece
reflection and to quantize rotation such that piece-edges stay aligned with the board
boundary. (§5.5.2.0.1)
But manipulating such a large EDM is too numerically difficult for contemporary
general-purpose semidefinite program (SDP) solvers which incorporate interior-point
methods; indeed, they are hard-pressed to find a solution for variable matrices of dimension
as small as 100. Our challenge, therefore, is to express this game’s rules as constraints in a
convex and numerically tractable way so as to find one solution from a googol of possible
combinations.4.77
4.75 (§6.7) This EDM would have a block-diagonal structure of known entries were edge-points ordered
sequentially with piece number.
4.76 Translation occurs when a piece moves on the board in Figure 128, rotation occurs when colors are

aligned with an adjacent piece.


4.77 Oliver Riordan asserts that at least one solution exists; I suspect there is only one solution although
Monckton insists they number in the thousands. Ignoring board-boundary constraints and the full game's
single mandatory piece placement, a loose upper bound on number of combinations is M!4^M = 256!·4^256.
That number gets further loosened: 150638!/(256!(150638−256)!) after presolving Eternity II (984).
Figure 128: Eternity II full-game board (16×16, M = 256, L = 22) illustrating boundary
cell numbers (1–16 down the left side, 17–225 in steps of 16 across the top, 32–240 in
steps of 16 across the bottom, 241–256 down the right side). Grid facilitates piece
placement within unit-square cell; one piece per cell. Cell 121 (shaded) holds mandatory
puzzle-piece P₁₃₉ designated by Monckton.

piece P, permutation Ξ, rotation Π strategy
To each puzzle piece, from a given set of M pieces {Pᵢ , i = 1 ... M}, assign an index
i representing a unique piece-number. Each square piece is characterized by four given
colors, in quadrature, corresponding to its four edges. Each color pij ∈ RL is represented
by eℓ ∈ RL an L-dimensional standard basis vector or 0 if grey. These four edge-colors are
represented in a 4 ×L-dimensional matrix; one matrix per piece

pT
 
i1
 pT 
Pi ,  i2  ∈ R4×L , i=1 . . . M (957)
 pT
i3

pT
i4

In other words, each distinct nongrey color is assigned a unique corresponding index
ℓ ∈ {1 . . . L} identifying a standard basis vector eℓ ∈ RL (Figure 127c) that becomes a
vector pij ∈ {e1 . . . eL , 0} ⊂ RL constituting matrix Pi representing a particular piece.
Rows {pT ij , j = 1 . . . 4} of Pi are ordered counterclockwise as in Figure 129. Color data is
given in Figure 130 for the demonstration game board. Then matrix Pi describes the i th
piece, excepting its rotation and position on the board.
Our intent is to show how to vectorize the board, with respect to whole pieces, and
then express Eternity II as a very hard combinatorial objective with linear constraints:
All pieces are initially placed in order of their given index i assigned by Monckton. The
vectorized game-board has initial state represented within a matrix
4.7. CARDINALITY AND RANK CONSTRAINT EXAMPLES 313

P₆ = [ p₆₁^T ; p₆₂^T ; p₆₃^T ; p₆₄^T ] ∈ R^{4×L}

Figure 129: Demo-game piece P₆ illustrating edge-color • p₆ⱼ ∈ R^L counterclockwise
ordering in j beginning from right. For all game boards, edge-color index j = 1 ... 4.

P ≜ [ P₁ ; ⋮ ; P_M ] ∈ R^{4M×L}   (958)

enumerated in Figure 130 for the demonstration game. Moving pieces all at once about
the square board corresponds to permuting pieces Pᵢ on the vectorized board represented
by matrix P, while rotating the i-th piece is equivalent to circularly shifting row indices of
Pᵢ (rowwise permutation). This permutation problem, as stated, is doubly combinatorial
(M!4^M combinations) because we must find a permutation of pieces (M!)

Ξ ∈ R^{M×M}   (959)

and quadrature rotation Πᵢ ∈ R^{4×4} of each individual piece (4^M) that solve the puzzle;

(Ξ ⊗ I₄) Π P = (Ξ ⊗ I₄) [ Π₁P₁ ; ⋮ ; Π_M P_M ] ∈ R^{4M×L}   (960)
where

                          ⎧ ⎡1 0 0 0⎤  ⎡0 1 0 0⎤  ⎡0 0 1 0⎤  ⎡0 0 0 1⎤ ⎫
Πᵢ ∈ {π₁ , π₂ , π₃ , π₄} ≜ ⎨ ⎢0 1 0 0⎥, ⎢0 0 1 0⎥, ⎢0 0 0 1⎥, ⎢1 0 0 0⎥ ⎬   (961)
                          ⎪ ⎢0 0 1 0⎥  ⎢0 0 0 1⎥  ⎢1 0 0 0⎥  ⎢0 1 0 0⎥ ⎪
                          ⎩ ⎣0 0 0 1⎦  ⎣1 0 0 0⎦  ⎣0 1 0 0⎦  ⎣0 0 1 0⎦ ⎭

Π ≜ ⎡ Π₁        0  ⎤ ∈ R^{4M×4M}   (962)
    ⎢     ⋱       ⎥
    ⎣ 0        Π_M ⎦

and where I₄ ≜ I ∈ S⁴ and π₁ = I₄. Initial game-board state P (958) corresponds to
Ξ = I and Πᵢ = π₁ ∀ i. Circulant [197] permutation matrices {π₁ , π₂ , π₃ , π₄} ⊂ R^{4×4}
correspond to clockwise piece-rotations {0°, 90°, 180°, 270°}.

piece edge adjacency ∆


Rules of the game dictate that adjacent pieces on the square board have colors that
match at their touching edges as in Figure 127b.4.78 A complete match is therefore
equivalent to demanding that a constraint, comprising numeric color differences between
2√M(√M − 1) touching edges, vanish. Because vectorized board layout is fixed and its
cells are loaded or reloaded with pieces during play, locations of adjacent edges in R4M ×L
(960) are known a priori. We need simply form differences between colors from adjacent
edges of pieces loaded into those known locations. Each difference may be represented
4.78 Piece adjacencies on the square board map linearly to the vectorized board, of course.
P1 [ e3 0 0 e1 ]T

P2 [ e2 e4 e4 e4 ]T

P3 [ e2 e1 0 e1 ]T

P4 [ e4 e1 0 e1 ]T

P5 [0 0 e3 e1 ]T

P6 [ e2 e2 e4 e2 ]T

P7 [ e2 e3 0 e3 ]T

P8 [ e4 e3 0 e3 ]T

P9 [0 e3 e3 0 ]T

P10 [ e2 e2 e4 e4 ]T

P11 [ e2 e3 0 e1 ]T

P12 [ e4 e1 0 e3 ]T

P13 [0 e1 e1 0 ]T

P14 [ e2 e2 e4 e4 ]T

P15 [ e2 e1 0 e3 ]T

P16 [ e4 e3 0 e1 ]T

Figure 130: Vectorized demo-game board illustrating M = 16 matrices in R4×L describing


initial state P ∈ R4M ×L of puzzle pieces; four colors per puzzle-piece (Figure 129), L = 4
colors total in game (Figure 127c). Standard basis vectors eℓ in RL represent color so that
color difference measurement remains unweighted.
Figure 131: All pieces in their initial state on vectorized demo-game board. Line segments
indicate differences ∆ (965), ° indicate edges on board boundary β (967). Entries are
indices ℓ identifying standard basis vectors eℓ ∈ RL from Figure 130.

by a constant cardinality-2 vector ∆ᵢ, whose entries belong to {−1, 0, 1}, from a set
{∆ᵢ ∈ R^{4M} , i = 1 ... 2√M(√M − 1)}. Defining sparse constant wide matrix

∆ ≜ [ ∆₁^T ; ⋮ ; ∆_{2√M(√M−1)}^T ] ∈ R^{2√M(√M−1)×4M}   (963)

then the desired constraint is

∆ (Ξ ⊗ I₄) Π P = 0 ∈ R^{2√M(√M−1)×L}   (964)

For the demonstration game, the first twelve entries of ∆ correspond to blue line segments
(leftmost) in Figure 131 while the twelve remaining entries correspond to red lines: for
eᵢ ∈ R^64

    ⎡ e₁^T − e₁₉^T  ⎤
    ⎢ e₅^T − e₂₃^T  ⎥
    ⎢ e₉^T − e₂₇^T  ⎥
    ⎢ e₁₃^T − e₃₁^T ⎥
    ⎢ e₁₇^T − e₃₅^T ⎥
    ⎢ e₂₁^T − e₃₉^T ⎥
    ⎢ e₂₅^T − e₄₃^T ⎥
    ⎢ e₂₉^T − e₄₇^T ⎥
    ⎢ e₃₃^T − e₅₁^T ⎥
    ⎢ e₃₇^T − e₅₅^T ⎥
    ⎢ e₄₁^T − e₅₉^T ⎥
∆ = ⎢ e₄₅^T − e₆₃^T ⎥ ∈ R^{24×64}   (965)
    ⎢ e₄^T − e₆^T   ⎥
    ⎢ e₈^T − e₁₀^T  ⎥
    ⎢ e₁₂^T − e₁₄^T ⎥
    ⎢ e₂₀^T − e₂₂^T ⎥
    ⎢ e₂₄^T − e₂₆^T ⎥
    ⎢ e₂₈^T − e₃₀^T ⎥
    ⎢ e₃₆^T − e₃₈^T ⎥
    ⎢ e₄₀^T − e₄₂^T ⎥
    ⎢ e₄₄^T − e₄₆^T ⎥
    ⎢ e₅₂^T − e₅₄^T ⎥
    ⎢ e₅₆^T − e₅₈^T ⎥
    ⎣ e₆₀^T − e₆₂^T ⎦

game board boundary β
Boundary of the square board must be colored grey. This means there are 4√M boundary
locations in R^{4M×L} (960) that must have value 0^T. Because (Ξ ⊗ I₄)Π P ≥ 0, these may
all be lumped into one equality constraint

β^T (Ξ ⊗ I₄) Π P 1 = 0   (966)

where β ∈ R₊^{4M} is a sparse vector constant having entries in {0, 1} complementary to the
known 4√M zeros. For the demonstration game board Figure 131, for example,

β = [0110001000100011010000000000000101000000000000011100100010001001]^T ∈ R^64   (967)

consolidating permutations Φ
By defining
Φ , (Ξ ⊗ I4 )Π ∈ R4M ×4M (968)
this square matrix becomes a structured permutation matrix replacing the product of
permutation matrices. Then puzzle piece edge adjacency constraint (964) becomes

∆ Φ P = 0 ∈ R^{2√M(√M−1)×L}   (969)

while game board boundary constraint (966) becomes

β^T Φ P 1 = 0 ∈ R   (970)
Now partition composite permutation matrix variable Φ into 4×4 blocks

Φ ≜ ⎡ φ₁₁    ···  φ₁M   ⎤ ∈ R^{4M×4M}   (971)
    ⎢  ⋮     ⋱    ⋮    ⎥
    ⎣ φ_{M1} ···  φ_{MM}⎦
Figure 132: Sparsity pattern for composite permutation matrix Φ⋆∈ R4M ×4M representing
solution from Figure 127b. Each four-point cluster represents a circulant permutation
matrix from (961). Any M = 16-piece solution may be verified by the TOMY demo.

where Φ⋆ᵢⱼ ∈ {0, 1} because (961)

φ⋆ᵢⱼ ∈ {0 , π₁ , π₂ , π₃ , π₄} ⊂ R^{4×4}   (972)

An optimal composite permutation matrix Φ⋆ is represented pictorially in Figure 132.


Now we ask what are necessary conditions on Φ⋆ at optimality:
• 4M-sparsity4.79 (cardinality-1 per row or column) and nonnegativity.
• Each column has one 1. Each row has one 1.
• Entries along each and every diagonal of each and every 4×4 block φ⋆ᵢⱼ are equal.
• Corner pair of 2×2 submatrices on antidiagonal of each and every 4×4 block φ⋆ᵢⱼ are equal.
We want an objective function whose global optimum, when attained, certifies that the
puzzle has been solved. Then, in terms of this Φ partitioning (971), the Eternity II problem
is a minimization of cardinality with optimal objective value 8M:4.80

              4M
minimize       Σ  ‖Φ(i, :)^T‖₀ + ‖Φ(: , i)‖₀
Φ∈R^{4M×4M}   i=1
subject to    ∆ Φ P = 0
              β^T Φ P 1 = 0
              Φ 1 = 1
              Φ^T 1 = 1
              (I_M ⊗ R_d) Φ (I_M ⊗ R_d^T) = (I_M ⊗ S_d) Φ (I_M ⊗ S_d^T)
              (I_M ⊗ R_φ) Φ (I_M ⊗ S_φ^T) = (I_M ⊗ S_φ) Φ (I_M ⊗ R_φ^T)
              (e₁₂₁ ⊗ I₄)^T Φ (e₁₃₉ ⊗ I₄) = π₃
              Φ ≥ 0   (973)
4.79 Define sparsity as ratio of number of nonzero entries to matrix-dimension product. For matrices, the
average number of nonzeros per row or column is easier to understand and likely to be small for typical
LP problems, independent of the dimensions. −Michael Saunders
4.80 A nonobvious method to transform cardinality minimization, in permutation problems, to rank

minimization is disclosed in Example 4.7.0.0.3 with reference to Figure 120.


which is convex in the constraints, where e₁₂₁ , e₁₃₉ ∈ R^M are members of the standard
basis representing mandatory piece P₁₃₉ placement in the full game,4.81 where

R_d ≜ ⎡ 1 0 0 0 ⎤ ∈ R^{3×4} ,   S_d ≜ ⎡ 0 1 0 0 ⎤ ∈ R^{3×4}   (974)
      ⎢ 0 1 0 0 ⎥                     ⎢ 0 0 1 0 ⎥
      ⎣ 0 0 1 0 ⎦                     ⎣ 0 0 0 1 ⎦

R_φ ≜ ⎡ 1 0 0 0 ⎤ ∈ R^{2×4} ,   S_φ ≜ ⎡ 0 0 1 0 ⎤ ∈ R^{2×4}   (975)
      ⎣ 0 1 0 0 ⎦                     ⎣ 0 0 0 1 ⎦

and where Φ ≥ 0 denotes entrywise nonnegativity. These matrices R and S enforce
circulance.4.82 Full-game mandatory-piece rotation requires equality constraint π₃.

permutation polyhedron
Constraints Φ1 = 1 and ΦT 1 = 1 and Φ ≥ 0 confine Φ to a permutation polyhedron (104)
in R4M ×4M ; which is, the convex hull of permutation matrices. The objective enforces
minimal cardinality per row and column. Slicing the permutation polyhedron, by looking
at a particular row or column subspace of Φ , looks like intersection of a 1-norm ball
with a nonnegative orthant. Cardinality 1 vectors reside at vertices of a one norm ball.
(Figure 75)4.83 Hence, the optimal objective is a sum of cardinalities 1.
Any vertex, of the permutation polyhedron, is a permutation matrix having minimal
cardinality 4M .4.84 The feasible set of problem (973) is an intersection of the polyhedron
with a number of hyperplanes. Feasible solutions exist that are not permutation matrices.
But the intersection must contain a vertex of the permutation polyhedron because a
solution Φ⋆ cannot otherwise be a permutation matrix; such a solution is presumed to
exist, so it must also be a vertex (extreme point)4.85 of the intersection.
In vectorized variable, by §A.1.1 no.33, problem (973) is equivalent to

              4M
minimize       Σ  ‖Φ(i, :)^T‖₀ + ‖Φ(: , i)‖₀
Φ∈R^{4M×4M}   i=1
subject to    (P^T ⊗ ∆) vec Φ = 0
              (P1 ⊗ β)^T vec Φ = 0
              (1_{4M}^T ⊗ I_{4M}) vec Φ = 1
              (I_{4M} ⊗ 1_{4M}^T) vec Φ = 1
              (I_M ⊗ R_d ⊗ I_M ⊗ R_d − I_M ⊗ S_d ⊗ I_M ⊗ S_d) vec Φ = 0
              (I_M ⊗ S_φ ⊗ I_M ⊗ R_φ − I_M ⊗ R_φ ⊗ I_M ⊗ S_φ) vec Φ = 0
              (e₁₃₉ ⊗ I₄ ⊗ e₁₂₁ ⊗ I₄)^T vec Φ = vec π₃
              vec Φ ⪰ 0   (976)

whose optimal objective value is 8M; cardinality of permutation matrix Φ⋆ is 4M.
With respect to an orthant, ⪰ connotes entrywise nonnegativity (p.642). This problem is
abbreviated:

              4M
minimize       Σ  ‖Φ(i, :)^T‖₀ + ‖Φ(: , i)‖₀
Φ∈R^{4M×4M}   i=1
subject to    E vec Φ = τ
              vec Φ ⪰ 0   (977)
4.81 meaning that piece P139 (numbered 139 by Monckton) must be placed in cell 121 on the board
(Figure 128) with rotation π3 (p.313).
4.82 Since 0 is the trivial circulant matrix, application is democratic over all blocks φᵢⱼ.
4.83 This means: each vertex of the permutation polyhedron, in isometrically isomorphic R^{16M²}, is
coincident with a vertex of 8M 1-norm balls in 4M-dimensional subspaces.


4.84 but maximal Frobenius norm (p.324).
4.85 Vertex means zero-dimensional exposed face (§2.6.1.0.1); intersection with a strictly supporting

hyperplane. There can be no further intersection with a feasible affine subset that would enlarge that
face; id est, a vertex of the permutation polyhedron persists in the feasible set.
where E ∈ R^{(17+2L√M(√M−1)+8M+13M²)×16M²} is highly sparse, having 4,784,144 nonzero
entries in {−1, 0, 1}:

dim E = 864,593 × 1,048,576   (978)

• Any feasible binary solution is minimal cardinality and vice versa because it is a
vertex of the feasible set. (§2.3.2.0.4)

But number of equality constraints is too large for contemporary binary solvers.4.86 So
again, we reformulate the problem:

canonical Eternity II
Because each block φij of Φ (971) is optimally circulant, comprising four permutation
matrices (972) uniquely identifiable by their first column (961), we may take as variable
every fourth column of Φ :

Φ̃ , [ Φ(: , 1) Φ(: , 5) Φ(: , 9) · · · Φ(: , 4M − 3) ] ∈ R4M ×M (979)

where Φ̃ij ∈ {0, 1}. Then, for ei ∈ R4

Φ = (Φ̃ ⊗ e₁^T) + (I_M ⊗ π₄)(Φ̃ ⊗ e₂^T) + (I_M ⊗ π₃)(Φ̃ ⊗ e₃^T) + (I_M ⊗ π₂)(Φ̃ ⊗ e₄^T) ∈ R^{4M×4M}   (980)

This formula describes replication (+), columnar upsampling & shifting (eᵢ ∈ R⁴), and
rotation (πᵢ ∈ R^{4×4}) of Φ̃. By §A.1.1 no.45 and no.46

vec Φ = (I_M ⊗ e₁ ⊗ I_{4M} + I_M ⊗ e₂ ⊗ I_M ⊗ π₄ + I_M ⊗ e₃ ⊗ I_M ⊗ π₃ + I_M ⊗ e₄ ⊗ I_M ⊗ π₂) vec Φ̃
      ≜ Y vec Φ̃ ∈ R^{16M²}   (981)

where Y ∈ R^{16M²×4M²}. Because three out of every four rows (per consecutive quadruple
adjacent rows of Φ̃) equal 0^T, permutation polyhedron (104) demands that each quadruple
and each column sum to 1: respectively, (I_M ⊗ 1₄^T)Φ̃ 1 = 1 and Φ̃^T 1 = 1 where Φ̃ is now
variable and optimally binary. By substitution of columnar subsampled matrix Φ̃ (979) for
permutation matrix Φ (968), circulance constraints in R and S (which are most numerous)
may be dropped from Eternity II problem (973) because circulance of φᵢⱼ is built into
Φ-reconstruction formula (980).
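A Matlab sketch of the sparse map Y (981) for the demonstration game, with rotations πᵢ read from (961):

M   = 16;  IM = speye(M);  I4M = speye(4*M);   % demo game
pi2 = circshift(eye(4),-1);  pi3 = circshift(eye(4),-2);  pi4 = circshift(eye(4),-3);
e   = @(k) sparse(k,1,1,4,1);                  % standard basis e_k in R^4
Y   = kron(IM, kron(e(1), I4M)) ...
    + kron(IM, kron(e(2), kron(IM, sparse(pi4)))) ...
    + kron(IM, kron(e(3), kron(IM, sparse(pi3)))) ...
    + kron(IM, kron(e(4), kron(IM, sparse(pi2))));
% Y in R^(16M^2 x 4M^2):  vec Phi = Y * vec Phitilde  per (981)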
We are left with a feasibility problem equivalent to (973), for e₁₂₁ , e₁₃₉ ∈ R^M

find        Φ̃ ∈ B^{4M×M}
subject to  ∆ Φ P = 0
            β^T Φ P 1 = 0
            (I_M ⊗ 1₄^T) Φ̃ 1 = 1
            Φ̃^T 1 = 1
            (e₁₂₁ ⊗ I₄)^T Φ (e₁₃₉ ⊗ I₄) = π₃   (982)

where ∆ ∈ R^{2√M(√M−1)×4M} (963) (identifying adjacent edges) is evaluated in (965), initial
piece placement P ∈ R^{4M×L} is defined in (958) and enumerated in Figure 130, β ∈ R₊^{4M}
defining a game board boundary in Figure 131 has corresponding value (967), and where
π₃ (961) determines mandatory-piece rotation. Thus, Eternity II (976) is equivalently

4.86 Saunders’
program lusol can reduce that number to 797,508 constraints by eliminating linearly
dependent rows of matrix E , but that is not enough to overcome numerical issues with the best solvers.
transformed:

             4M                     M
minimize      Σ  ‖Φ̃(i, :)^T‖₀  +    Σ  ‖Φ̃(: , j)‖₀
Φ̃∈R^{4M×M}  i=1                   j=1
subject to   (P^T ⊗ ∆) Y vec Φ̃ = 0
             (P1 ⊗ β)^T Y vec Φ̃ = 0
             (1_M^T ⊗ I_M ⊗ 1₄^T) vec Φ̃ = 1
             (I_M ⊗ 1_{4M}^T) vec Φ̃ = 1
             (e₁₃₉ ⊗ e₁₂₁ ⊗ I₄)^T vec Φ̃ = π₃ e₁
             vec Φ̃ ⪰ 0   (983)

whose optimal objective value is 2M since optimal cardinality of Φ̃⋆ (with entries in {0, 1})
is M, where matrix constant Y maps subsampled Φ̃ to Φ via (981), and where e₁ ∈ R⁴.
In abbreviation of reformulation (983)

             4M                     M
minimize      Σ  ‖Φ̃(i, :)^T‖₀  +    Σ  ‖Φ̃(: , j)‖₀
Φ̃∈R^{4M×M}  i=1                   j=1
subject to   Ẽ vec Φ̃ = τ̃
             vec Φ̃ ⪰ 0   (984)

number of equality constraints is now 11,077; an order of magnitude fewer constraints
than (977). Sparse Ẽ ∈ R^{(5+2L√M(√M−1)+2M)×4M²} replaces matrix E. Number of columns
has also been reduced, down from more than a million:

dim Ẽ = 11,077 × 262,144   (985)

But this dimension remains out of reach of most highly regarded academic and commercial
binary solvers; especially disappointing insofar as sparsity of Ẽ is high with 1,503,732
nonzero entries in {−1, 0, 1, 2} ; element {2} arising only in the β constraint which is soon
to disappear after presolving.

presolving: game board’s edge


Any process of discarding rows and columns, prior to numerical optimization, is called
presolving. The constraint in β , which zeroes the board at its edges, has all positive
coefficients. The zero sum means that all vec Φ̃ entries, corresponding to nonzero entries in
row vector (P 1 ⊗ β)T Y , must be zero. For the full game, this means we may immediately
eliminate 57,840 variables from 262,144. After zero-row and dependent-row (two) removal,

dim Ẽ → 10,054 × 204,304 (986)


with entries in {−1, 0, 1}.
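A Matlab sketch of this boundary presolve, assuming P (958), β (967), and Y (981) already built:

row       = (kron(P*ones(L,1), beta))' * Y;   % the row vector (P1 kron beta)' Y
fixedZero = find(row > 0);                    % entries of vec Phitilde forced to 0
keepCols  = setdiff(1:size(Y,2), fixedZero);  % surviving columns of Etilde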

geometric presolver: polyhedral cone theory


Eternity II problem (984) constraints are interpretable in the language of convex cones:
The columns of matrix Ẽ constitute a set of generators for a pointed (§2.12.2.2) polyhedral
cone
K = {Ẽ vec Φ̃ | vec Φ̃ º 0} (987)
Even more intriguing is the observation: vector τ̃ resides on that polyhedral cone’s
boundary.4.87 (§2.13.4.2.4) We may apply techniques from §2.13.5 to prune generators
not belonging to the smallest face of that cone, to which τ̃ belongs, because generators of
4.87 This
observation applies equally well to cones generated by both (985) and (986). And τ is on the
boundary of the polyhedral cone generated by E (978).
that smallest face must hold a minimal cardinality solution. Matrix dimension is thereby
reduced:4.88
Designate I as the set of all surviving column indices of Ẽ from 4M² = 262,144 columns:

I ⊂ {1 ... 4M²}   (988)

The i-th column Ẽ(: , i) of matrix Ẽ belongs to the smallest face F of K that contains τ̃ if
and only if

find        vec Φ̃ ∈ R^{dim I} , µ ∈ R
subject to  µ τ̃ − Ẽ(: , i) = Ẽ vec Φ̃
            vec Φ̃ ⪰ 0   (384)

is feasible. By a transformation of Saunders, this linear feasibility problem is the same as

find        vec Φ̃ ∈ R^{dim I} , µ ∈ R
subject to  Ẽ vec Φ̃ = µ τ̃
            vec Φ̃ ⪰ 0
            (vec Φ̃)ᵢ ≥ 1   (989)

A minimal cardinality solution to Eternity II (984) implicitly constrains Φ̃⋆ to be binary.
So this test (989) of membership to F(K ∋ τ̃) may be tightened to a test of (vec Φ̃)ᵢ = 1;
id est, for i = 1 ... dim I = 1 ... 204,304 distinct linear feasibility problems

find        vec Φ̃ ∈ R^{dim I}
subject to  Ẽ vec Φ̃ = τ̃
            vec Φ̃ ⪰ 0
            (vec Φ̃)ᵢ = 1   (990)

whose feasible set is a proper subset of that in (989). Real variable µ can be set to 1
because, were that not so, a feasible (vec Φ̃)ᵢ = 1 could not be feasible to Eternity II
(984).
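Each membership test (990) is an ordinary linear feasibility problem; a Matlab sketch via linprog (solver choice matters at this scale; Et and tau here denote Ẽ and τ̃):

N  = size(Et,2);
lb = zeros(N,1);  ub = inf(N,1);
lb(i) = 1;  ub(i) = 1;                        % pin (vec Phitilde)_i = 1
[~,~,flag] = linprog(zeros(N,1), [], [], Et, tau, lb, ub);
% flag == 1 : Et(:,i) generates the smallest face of K containing tau
% infeasible: (vec Phitilde)_i = 0, so column i is discarded after all tests;
% conic-independence test (287) is analogous: rhs Et(:,i), entry i pinned to 0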
If infeasible here in (990), then the only choice remaining for (vec Φ̃)i is 0 ; meaning,
column Ẽ(: , i) may be discarded but only after all columns have been tested. This
tightened problem (990) therefore tells us two things when feasible: Ẽ(: , i) belongs to the
smallest face of K that contains τ̃ , and (vec Φ̃)i constitutes a nonzero vertex-coordinate
of permutation polyhedron (104). After presolving via this conic pruning method (with
subsequent zero-row and dependent-row removal),

dim Ẽ → 7,362 × 150,638 (991)

Entries in vec Φ̃, corresponding to discarded columns of Ẽ, are optimally 0. But now τ̃
resides relatively interior to the polyhedral cone (987) generated by this reduction Ẽ. Its
binary nature is evident in Hu's depiction of our reduced “A” matrix in Figure 133.

c.i. presolver Eternity II: Generators of smallest face are conically independent
Matrix Ẽ now accounts for the board’s edge and holds what remains after discard of
all generators not in the smallest face F of cone K that contains τ̃ . To further prune
4.88 Column elimination can be quite dramatic but is dependent upon problem geometry. By method of
convex cones, we will discard 53,666 more columns via Saunders’ pdco; a total of 111,506 columns will
have been removed from 262,144. Following dependent-row removal via lusol, dimension of Ẽ becomes
7,362 × 150,638.
Figure 133: Directed graph of adjacency matrix for Ẽ (991) (Ẽ = A in [422]) representing
reduced equality constraint in Eternity II problem. “Movie” from [236] shows
realization in 3D; color corresponding to line-segment length. (Realization by Yifan Hu.)

generators relatively interior to that smallest face, we may subsequently test for conic
dependence as described in §2.10: for i = 1 ... dim I = 1 ... 150,638

find        vec Φ̃ ∈ R^{dim I}
subject to  Ẽ vec Φ̃ = Ẽ(: , i)
            vec Φ̃ ⪰ 0
            (vec Φ̃)ᵢ = 0   (287)

If feasible, then column Ẽ(: , i) is a conically dependent generator of the smallest face and
must be discarded from matrix Ẽ before proceeding with test of remaining columns.
Generators interior to a smallest face could provide a lower cardinality solution, so it
might be imprudent to prune. It turns out, for Eternity II: generators of the smallest face,
previously found via (990), comprise a minimal set; id est, (287) is never feasible; no more
columns of Ẽ can be discarded.4.89

m × dim I = dim Ẽ = 7,362 × 150,638   (991)


4.89 One cannot help but notice a binary selection of variable by tests (990) and (287): Geometrical test
(990) (smallest face) checks feasibility of vector entry 1 while geometrical test (287) (conic independence)
checks feasibility of 0. Changing 1 to 0 in (990) is always feasible for Eternity II.
Figure 134: Polyhedron vertices • inscribed on sphere skeleton in R3 for visualization of


permutation matrices in abstract isomorphic space. Vertices represent matrices, of the
same dimension, all equidistant from origin. Sphere about origin represents level set at
maximum of simple quadratic xTx where vertices intersect sphere. Permutation matrices
are represented by those vertices in nonnegative orthant. If sphere expands, intersection
with polyhedron becomes empty. (Drawing by Robert Austin using Stella4D.)

Successive reductions of E and τ can be found on Wıκımization [422] in Matlab
format. 2

Incorporating more Clue Pieces, provided by Monckton, makes the Eternity II problem
harder in the sense that solution set is diminished; the target gets smaller.4.90

4.7.0.0.16 Example. Eternity II - affinity for maximization.
Reversing tack on Example 4.7.0.0.15, Eternity II optimization resembles Figure 33a (not
(b)) because variable Φ̃ is implicitly bounded above by design; 1 ⪰ vec Φ̃ by confinement
of Φ in (973) to the permutation polyhedron (104), for i = 1 ... dim I = 1 ... 150,638

1 = maximize    (vec Φ̃)ᵢ
        Φ̃
    subject to  Ẽ vec Φ̃ = τ̃
                vec Φ̃ ⪰ 0   (992)

Unity is always attainable, by (990). By (979) this means (§4.6.1.4)

M = maximize    (1 − y(Φ̃))^T vec Φ̃        maximize    ‖vec Φ̃‖^{dim I}_M
        Φ̃                             ≡        Φ̃
    subject to  Ẽ vec Φ̃ = τ̃               subject to  Ẽ vec Φ̃ = τ̃
                vec Φ̃ ⪰ 0                              vec Φ̃ ⪰ 0   (993)

where
y = 1 − ∇‖vec Φ̃‖^{dim I}_M   (855)

4.90 But given the four clues provided, our geometric presolver (p.320) produces a 15% smaller face; a
total very nearly half the 262,144 columns can be proven to correspond to 0 coefficients.
is a direction vector from the cardinality minimization technique of convex iteration in
§4.6.1.1, and where ‖vec Φ̃‖^{dim I}_M is a k-largest norm (§3.2.2.1, k = M). When upper bound
M in (993) is met, solution vec Φ̃⋆ will be optimal for Eternity II because it must then be
a Boolean vector with minimal cardinality M.
Maximization of convex function ‖vec Φ̃‖^{dim I}_M (monotonic on R₊^{dim I}) is not a convex
problem, though the constraints are convex. [349, §32] Geometrical visualization of this
problem formulation is clear. We therefore choose to work with a complementary
direction vector z, in what follows, in predilection for a mental picture of convex function
maximization.

complementary direction vector is optimal solution of Eternity II
Instead of solving (993), which is difficult, we propose iterating a convex problem sequence:
for 1 − y ← z

maximize         z^T vec Φ̃
vec Φ̃∈R^{dim I}
subject to       Ẽ vec Φ̃ = τ̃
                 vec Φ̃ ⪰ 0   (994)

maximize      z^T vec Φ̃⋆
z∈R^{dim I}
subject to    0 ⪯ z ⪯ 1
              z^T 1 = M   (537)

Variable Φ̃ is implicitly bounded above at 1 by design of Ẽ. A globally optimal
complementary direction vector z⋆ will always exactly match an optimal solution vec Φ̃⋆
for convex iteration of any problem formulated as maximization of a Boolean variable:
here we have

z⋆^T vec Φ̃⋆ ≜ M   (995)

Because z⋆ = vec Φ̃⋆, Eternity II can be equivalently formulated as maximization of a
convex quadratic instead:

maximize         (vec Φ̃)^T vec Φ̃
vec Φ̃∈R^{dim I}
subject to       Ẽ vec Φ̃ = τ̃
                 vec Φ̃ ⪰ 0   (996)

a nonconvex problem but requiring no convex iteration. The optimal objective is known:
(vec Φ̃⋆)^T vec Φ̃⋆ = ‖Φ̃⋆‖²_F = 1^T vec Φ̃⋆ = M with vec Φ̃⋆ binary and cardinality-M attained
at a vertex of the permutation polyhedron (p.318). (Figure 134)

rumination
If it were possible to form a nullspace basis Z for Ẽ, of about equal sparsity, such that

vec Φ̃ = Z ξ + vec Φ̃ₚ   (119)

then a problem formulation equivalent to (996)

maximize    (Z ξ + vec Φ̃ₚ)^T (Z ξ + vec Φ̃ₚ)
    ξ
subject to  Z ξ + vec Φ̃ₚ ⪰ 0   (997)

might invoke optimality conditions as obtained in [228, thm.8]. 2


Figure 135: N × N = 12×12 Chimera topology for D:Wave 1152-qubit • chip architecture
illustrating 3,360 (= 16·12·12 + 2·11·4·12) couplers •−−−• by line segments between
qubits (lines cross without intersection). Coupled qubits are neighbors, but distance is
not preserved by this map. (Drawing by Diane Carr.)
4.8 Quantum optimization


There was a time when the newspapers said that only twelve men understood
the theory of relativity. I don’t believe there ever was such a time. . . . a lot
of people kind of understood the theory of relativity in some way or other, but
more than twelve. On the other hand, I think I can safely say that nobody
understands quantum mechanics. −Richard Feynman, 1964

A superconducting quantum annealer is the physical embodiment of an optimizer that


globally minimizes a hypersurface whose modes increase factorially with dimension.

Note that this architecture is very different from conventional computing. The
processor has no large areas of memory (cache), rather each qubit has a tiny
piece of memory of its own. In fact, the chip is architected more like a biological
brain than the common ‘Von Neumann’ architecture of a conventional silicon
processor. One can think of the qubits as being like neurons, and the couplers as
being like synapses that control the flow of information between those neurons.
−[dwavesys.com , §1.3]

A quantum annealer is unlike a von Neumann computer architecture insofar as it does not
solve equations, there are no conditionally executable instructions, one qubit (the quantum
analogue to bit) can be in the two binary states at once,4.91 and qubit values may not be
set by a programmer [102, §2]. There is no clock in a quantum annealer which operates at
a temperature colder than outer space: near 0 Kelvin. The first commercially available
quantum annealer was delivered in 2011.4.92 Even though its magnetic superconducting
niobium qubits are etched on a silicon substrate, a chip, the D:Wave quantum annealer is
actually the first analog computer of its kind.4.93
Ising’s spin model [47, §2.1] [354, p.297] is a measure of molecular energy for a magnetic
material, for bipolar binary s ∈ Bn± = {−1, 1}n

1
J , ssT + hh , si
­ ®
E(s) = 2 (998)

Given applied field strength h ∈ Rn and interaction field strength J ∈ Snh , a quantum
annealer minimizes this energy E which is always bounded because vector variable s
is bounded above and below.
A graph of the D:Wave N × N Chimera topology (N = 12) is represented in Figure 135;
a neighboring qubit topology. Hollow matrix J represents coupling that occurs among
physically neighboring qubits. Scalar ½ accounts for bidirectional coupling implied by J
matrix symmetry. Coupling, which is an application of entanglement in quantum physics,
can be controlled only for physically neighboring qubits. Increasing number of neighbors is
therefore of practical importance. [46] Effective coupling of distant qubits is implemented
by replicating qubits redundantly. [102, §3.4] As rule of thumb, complete coupling of n
qubits (highest density J) would leave O(√n) qubits available.4.94
4.91 the qubit’s superposition state.
4.92 Itis not capable of solving Eternity II in 2016 because of qubits insufficient in number and coupling.
4.93 At present, there are two emergent technologies for harnessing quantum phenomena: adiabatic model

(analog annealer) and gate model (analogue to Boolean logic gates of digital computers).
4.94 1152-qubit architecture machines, having 3,360 physical couplers, became available in 2015 for $10M

USD. In 2016, 2048-qubit chips (6,016 couplers) were announced. If qubit growth continues following Rose's
law, we should see million-qubit chips in 2025. Given n qubits, complete coupling requires n(n−1)/2
couplers. Insufficient coupler population, not qubits, will become the bottleneck. In the near term,
innovating a three-dimensional qubit topology would accelerate ratio of coupler to qubit growth.
Figure 136: Chimera circuit chip layout abstract, topological dimension N = 1 illustrated.
Eight qubits comprise hollow slabs whereas couplers are represented by sixteen blue
discs •. Chip layout is dual to graph topology in Figure 135. Up/down arrows connote
final qubit states.

Chimera provides 8N² qubits having N(24N − 8) physical couplers;

coupler/qubit = 3 − 1/N

approaches three couplers per qubit on average, as topological dimension N increases,
although it has no complete circuit of three qubits. Because couplers are bidirectional,
each qubit effectively sees twice the coupler/qubit ratio:4.95

6 − 2/N   couplings/qubit   (999)

For 1 ≤ N ≤ 2, this coupling number is exact. For N > 2, this real number should be
regarded as average number of couplings seen by a qubit. Physical layout of Chimera
reveals a duality with graph topology: In chip layout Figure 136, physical qubits are
represented by hollow slabs and couplers by discs •. But in the topological graph in
Figure 135, qubits are represented by discs • and couplers by line segments.
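These counts are readily checked; a minimal Matlab sketch for the Figure 135 architecture:

N = 12;
qubits   = 8*N^2                    % 1152
couplers = N*(24*N - 8)             % 3360 = 16*12*12 + 2*11*4*12
perQubit = 2*couplers/qubits        % 6 - 2/N couplings seen per qubit (999)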
The D:Wave machine performs physical, not simulated, annealing. The system is
initialized to a superposition (a 2n simultaneity) of all possible states [464, p.2] by
application of a globally transverse magnetic field [209, slide 8/45]. At its outset, the
energy hypersurface appears globally convex but settles into the Ising model after about
20µs [441] with 2015 technology.4.96 A globally optimal solution cannot be guaranteed
because present understanding of the quantum annealing process is nondeterministic. To
increase probability of finding a global solution, the same problem is sequentially executed
thousands of times on the quantum annealer. The minimum, from each run, becomes a
sample in proximity to the global minimum of binary quadratic function (998).4.97
4.95 Complete coupling is impossible with current technology; it would require an 8N²(8N²−1)/2
line-segment topology: were coupler/qubit = (8N²−1)/2, each qubit would effectively see 8N²−1 couplings.
4.96 2011 saw 128-qubit machines with settling time at about 75µs [251].
4.97 Sampling is necessary because successive minima can be offset by as much as a few percent from the

global minimum.
By change of variable, for binary q ∈ Bⁿ = {0, 1}ⁿ

s ← 2q − 1   (1000)

the resulting quadratic unconstrained binary optimization (QUBO)

2 minimize    ⟨J , qq^T⟩ + ⟨h − J1 , q⟩   (1001)
  q∈{0,1}ⁿ

remains an equivalent energy minimization whose constant term 1^T(J1·½ − h) is ignored.
Whereas

δ(ss^T) = 1 ,   δ(qq^T) = q   (1002)

this latter equality in q means that QUBO (1001) is the same as

2 minimize  ⟨J , qq^T⟩ + ⟨δ(h − J1) , qq^T⟩  =  2 minimize  ⟨J + δ(h − J1) , qq^T⟩   (1003)
  q∈{0,1}ⁿ                                       q∈{0,1}ⁿ

Coefficient matrix J + δ(h − J1) can be indefinite.
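A minimal numerical check of the change of variable (1000), confirming that Ising energy (998) equals twice the bracketed QUBO objective in (1001) plus the ignored constant:

n = 6;
J = triu(randn(n),1);  J = J + J';            % hollow symmetric interaction field
h = randn(n,1);
q = double(rand(n,1) > 0.5);  s = 2*q - 1;    % (1000)
E    = s'*J*s/2 + h'*s;                       % Ising energy (998)
qubo = q'*J*q + (h - J*ones(n,1))'*q;         % bracketed objective in (1001)
E - (2*qubo + ones(1,n)*(J*ones(n,1)/2 - h))  % ~0 : constant term 1'(J1/2 - h)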


To abstract problem formulation away from the machine a little more (to simplify
presentation), a QUBO shall be generalized

minimize   q^T B q + a^T q   (1004)
q∈{0,1}ⁿ

where a ∈ Rⁿ is a coefficient vector and where hollow matrix B ∈ R^{n×n} (comprising
quadratic coefficients) is not necessarily symmetric; its main diagonal may be assumed
0 and its lower triangular part empty.

4.8.1 quantum gap maximization by linear programming

To further increase probability of finding a globally optimal solution, the discrete gap
between optimal objective and least suboptimal objective is maximized by problem design
[46] (1006); by discriminating coefficients as in Example 4.8.2.0.1 and Example 4.8.2.0.2.
D:Wave quantum annealer coefficient quantization is coarse, encoded by application of an
external magnetic field whose resolution is about 4 or 5 bits over [−2, 2].
Because the Eternity II puzzle (§4.7.0.0.15) can be formulated as a permutation
problem, it is of interest to express a permutation polyhedron constraint (p.318). To
illustrate realization of just one row of a permutation matrix in QUBO form (1004),
consider an n-qubit vector q that is allowed to have only one nonzero; id est, a discrete
impulse (a.k.a Kronecker delta) over a vector of qubits. In other words, we need to
translate this program

find        q ∈ Bⁿ
subject to  q^T 1 = 1   (1005)

into a QUBO. First we analyze a three-qubit case, then generalize to n in (1008):
quantum impulse:

q₁ q₂ q₃     B₁₂q₁q₂ + B₁₃q₁q₃ + B₂₃q₂q₃ + a₁q₁ + a₂q₂ + a₃q₃
desirable
0  0  1      a₃
0  1  0      a₂
1  0  0      a₁
undesirable
0  0  0      0
0  1  1      B₂₃ + a₂ + a₃
1  0  1      B₁₃ + a₁ + a₃
1  1  0      B₁₂ + a₁ + a₂
1  1  1      B₁₂ + B₁₃ + B₂₃ + a₁ + a₂ + a₃
Coefficients B and a (1004) are selected by solution to a linear program whose undesirable
objectives always exceed the objective for each and every desirable state:

maximize gap
B , a , gap
subject to 0 ≥ a3 + gap
0 ≥ a2 + gap
0 ≥ a1 + gap
B23 + a2 + a3 ≥ a3 + gap
B23 + a2 + a3 ≥ a2 + gap
B23 + a2 + a3 ≥ a1 + gap
B13 + a1 + a3 ≥ a3 + gap
B13 + a1 + a3 ≥ a2 + gap (1006)
B13 + a1 + a3 ≥ a1 + gap
B12 + a1 + a2 ≥ a3 + gap
B12 + a1 + a2 ≥ a2 + gap
B12 + a1 + a2 ≥ a1 + gap
B12 + B13 + B23 + a1 + a2 + a3 ≥ a3 + gap
B12 + B13 + B23 + a1 + a2 + a3 ≥ a2 + gap
B12 + B13 + B23 + a1 + a2 + a3 ≥ a1 + gap
−2 ≤ a ≤ 2
−2 ≤ B ≤ 2
having solution:

a⋆ = ⎡ −1 ⎤ ,   B⋆ = ⎡ 0 2 2 ⎤ ,   gap⋆ = 1   (1007)
     ⎢ −1 ⎥          ⎢ 0 0 2 ⎥
     ⎣ −1 ⎦          ⎣ 0 0 0 ⎦

easily found by cvx [195] under Matlab. For higher-dimensional q vectors (by induction),

a⋆ = −1 ∈ Rⁿ ,   B⋆ = ⎡ 0    2 ⎤ ∈ R^{n×n} (every strictly-upper-triangular entry 2) ,   gap⋆ = 1   (1008)
                      ⎣ 0    0 ⎦
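The same linear program generalizes to any n; a Matlab/cvx sketch that enumerates desirable and undesirable states and reproduces (1007) at n = 3:

n = 3;
Q   = dec2bin(0:2^n-1) - '0';              % all 2^n binary states, one per row
des = find(sum(Q,2) == 1);                 % desirable: the discrete impulses
und = find(sum(Q,2) ~= 1);                 % undesirable: everything else
cvx_begin
    variables B(n,n) a(n) gap
    maximize( gap )
    for i = und'
        for j = des'
            Q(i,:)*triu(B,1)*Q(i,:)' + a'*Q(i,:)' >= ...
            Q(j,:)*triu(B,1)*Q(j,:)' + a'*Q(j,:)' + gap;
        end
    end
    -2 <= a <= 2;  -2 <= B(:) <= 2;
cvx_end

Only the strictly upper triangular part of B enters the objective table, matching the hollow-B convention of (1004).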

4.8.2 quantum Eternity II

Any equality of the form Ax = b, having binary solution x, may be expressed as a QUBO

minimize   x^T A^T A x − 2 x^T A^T b   (1009)
x∈{0,1}ⁿ

(§E.0.1.0.1) where B ≜ A^TA − δ²(A^TA) and a ≜ δ(A^TA) − 2A^Tb from (1004). An adiabatic
quantum annealer (like D:Wave's) is theoretically capable of solving Eternity II because
it may be expressed Ẽq = τ̃ (984), assuming that any feasible binary solution is minimal
cardinality (p.319). This formulation (1009) decreases sparsity, from that of A, which
increases required qubit coupling.4.98
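A minimal sketch of coefficient extraction (1009) for given A and b:

B = A'*A - diag(diag(A'*A));        % hollow quadratic coefficient matrix
a = diag(A'*A) - 2*A'*b;            % linear coefficient vector
% for binary q :  q'*B*q + a'*q  ==  norm(A*q - b)^2 - b'*b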
4.98 For sparsity as defined on page 317, for nonsymmetric B matrix, and for:
• matrix E corresponding to (978), sparsity decreases from 0.0000052771 to 0.002683
• matrix Ẽ corresponding to (985), sparsity decreases from 0.00051786 to 0.027965
• matrix Ẽ corresponding to (986), sparsity decreases from 0.00056985 to 0.0047694
• matrix Ẽ corresponding to (991), sparsity decreases from 0.00070522 to 0.0042453 .

4.8.2.0.1 Example. (E. D. Dahl) Nonincreasing discrete step.

quantum step:

  q1 q2     B12 q1 q2 + a1 q1 + a2 q2
  desirable
  0  0      0
  1  0      a1
  1  1      B12 + a1 + a2
  undesirable
  0  1      a2

$$
\begin{array}{cl}
\underset{B,\;a,\;\mathrm{gap}}{\text{maximize}} & \mathrm{gap}\\
\text{subject to} & a_2 \;\ge\; 0 + \mathrm{gap}\\
& a_2 \;\ge\; a_1 + \mathrm{gap}\\
& a_2 \;\ge\; B_{12} + a_1 + a_2 + \mathrm{gap}\\
& -1 \le a \le 1\\
& -1 \le B \le 1
\end{array}
\tag{1010}
$$

Upper and lower bounds, on each entrywise inequality, are 1 in magnitude here because a unit gap is sufficient;

$$
a^\star = \begin{bmatrix} -1\\ 1 \end{bmatrix}, \quad
B^\star = \begin{bmatrix} 0 & -1\\ 0 & 0 \end{bmatrix}, \quad
\mathrm{gap}^\star = 1
\tag{1011}
$$

Extensible to higher dimension; e.g., {000 , 100 , 110 , 111} are desirable q ∈ R³. □

4.8.2.0.2 Example. (E. D. Dahl) Boolean qubit and function.


We consider the case where the second argument to and is complemented:

quantum and function: q3 = q1 ∧ ¬q2

  q1 q2 q3     B12 q1 q2 + B13 q1 q3 + B23 q2 q3 + a1 q1 + a2 q2 + a3 q3
  desirable
  0  0  0      0
  0  1  0      a2
  1  0  1      B13 + a1 + a3
  1  1  0      B12 + a1 + a2
  undesirable
  0  0  1      a3
  0  1  1      B23 + a2 + a3
  1  0  0      a1
  1  1  1      B12 + B13 + B23 + a1 + a2 + a3

$$
\begin{array}{cl}
\underset{B,\;a,\;\mathrm{gap}}{\text{maximize}} & \mathrm{gap}\\
\text{subject to} & a_3 \;\ge\; 0 + \mathrm{gap}\\
& a_3 \;\ge\; a_2 + \mathrm{gap}\\
& a_3 \;\ge\; B_{13} + a_1 + a_3 + \mathrm{gap}\\
& a_3 \;\ge\; B_{12} + a_1 + a_2 + \mathrm{gap}\\
& B_{23} + a_2 + a_3 \;\ge\; 0 + \mathrm{gap}\\
& B_{23} + a_2 + a_3 \;\ge\; a_2 + \mathrm{gap}\\
& B_{23} + a_2 + a_3 \;\ge\; B_{13} + a_1 + a_3 + \mathrm{gap}\\
& B_{23} + a_2 + a_3 \;\ge\; B_{12} + a_1 + a_2 + \mathrm{gap}\\
& a_1 \;\ge\; 0 + \mathrm{gap}\\
& a_1 \;\ge\; a_2 + \mathrm{gap}\\
& a_1 \;\ge\; B_{13} + a_1 + a_3 + \mathrm{gap}\\
& a_1 \;\ge\; B_{12} + a_1 + a_2 + \mathrm{gap}\\
& B_{12} + B_{13} + B_{23} + a_1 + a_2 + a_3 \;\ge\; 0 + \mathrm{gap}\\
& B_{12} + B_{13} + B_{23} + a_1 + a_2 + a_3 \;\ge\; a_2 + \mathrm{gap}\\
& B_{12} + B_{13} + B_{23} + a_1 + a_2 + a_3 \;\ge\; B_{13} + a_1 + a_3 + \mathrm{gap}\\
& B_{12} + B_{13} + B_{23} + a_1 + a_2 + a_3 \;\ge\; B_{12} + a_1 + a_2 + \mathrm{gap}\\
& -2 \le a \le 2\\
& -2 \le B \le 2
\end{array}
\tag{1012}
$$

Optimal coefficients are not unique, but optimal objective gap is:

$$
a^\star = \begin{bmatrix} 1\\ -2\\ 1 \end{bmatrix}, \quad
B^\star = \begin{bmatrix} 0 & 1 & -2\\ 0 & 0 & 2\\ 0 & 0 & 0 \end{bmatrix}, \quad
\mathrm{gap}^\star = 1
\tag{1013}
$$

This optimal B matrix represents the coupling required for and but cannot be implemented in Chimera directly because there is no completely coupled three-qubit circuit. □
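Solution (1013) is easily spot-checked by enumerating the truth table above; a Matlab sketch:

```matlab
% Sketch: verify (1013) by enumerating all eight states of the and table.
a = [1; -2; 1];
B = [0 1 -2; 0 0 2; 0 0 0];
f = @(q) q'*B*q + a'*q;                 % QUBO objective (1004)
des = [0 0 0; 0 1 0; 1 0 1; 1 1 0];     % desirable states  q1 q2 q3
und = [0 0 1; 0 1 1; 1 0 0; 1 1 1];     % undesirable states
fd = arrayfun(@(i) f(des(i,:)'), 1:4);
fu = arrayfun(@(i) f(und(i,:)'), 1:4);
[max(fd) min(fu)]                       % returns [0 1]: a unit gap
```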

4.9 Constraining rank of indefinite matrices


Example 4.9.0.0.1, which follows, demonstrates that convex iteration is more generally
applicable to indefinite or nonsquare matrices X ∈ Rm×n ; not only to positive semidefinite
matrices. Indeed,

$$
\begin{array}{cl}
\underset{X\,\in\,\mathbb{R}^{m\times n}}{\text{find}} & X\\
\text{subject to} & X \in \mathcal{C}\\
& \operatorname{rank} X \le k
\end{array}
\;\equiv\;
\begin{array}{cl}
\underset{X,\,Y,\,Z}{\text{find}} & X\\
\text{subject to} & X \in \mathcal{C}\\
& G = \begin{bmatrix} Y & X\\ X^T & Z \end{bmatrix}\\
& \operatorname{rank} G \le k
\end{array}
\tag{1014}
$$

Proof. rank G ≤ k ⇒ rank X ≤ k because X is the projection of composite matrix G on subspace R^{m×n}. For symmetric Y and Z , any rank-k positive semidefinite composite matrix G can be factored into rank-k terms R : G = RᵀR where R ≜ [ B C ] with rank B , rank C ≤ rank R , B ∈ R^{k×m} , and C ∈ R^{k×n}. Because Y and Z and X = BᵀC are variable, (1635) rank X ≤ rank B , rank C ≤ rank R = rank G is tight. ∎

So, there must exist an optimal direction vector W⋆ such that

$$
\begin{array}{cl}
\underset{X,\,Y,\,Z}{\text{find}} & X\\
\text{subject to} & X \in \mathcal{C}\\
& G = \begin{bmatrix} Y & X\\ X^T & Z \end{bmatrix}\\
& \operatorname{rank} G \le k
\end{array}
\;\equiv\;
\begin{array}{cl}
\underset{X,\,Y,\,Z}{\text{minimize}} & \langle G\,,\,W^\star \rangle\\
\text{subject to} & X \in \mathcal{C}\\
& G = \begin{bmatrix} Y & X\\ X^T & Z \end{bmatrix} \succeq 0
\end{array}
\tag{1015}
$$

Were W⋆ = I , the optimal resulting trace objective would be equivalent to minimization of nuclear norm of X over C by (1877). This means:

• (confer p.178) The argument of any nuclear norm minimization problem may be replaced with a composite semidefinite variable of the same optimal rank but doubly dimensioned.
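This equivalence is readily demonstrated numerically; a cvx sketch, assuming an arbitrary given matrix X0 and with feasible set C artificially fixed to the singleton {X0}:

```matlab
% Sketch: with W = I, minimizing tr(G) over the composite PSD constraint
% recovers twice the nuclear norm of X, per (1877).
X0 = randn(4,6);  [m,n] = size(X0);
cvx_begin sdp quiet
   variable Y(m,m) symmetric
   variable Zc(n,n) symmetric
   variable X(m,n)
   minimize( trace(Y) + trace(Zc) )   % = tr(G) for G = [Y X; X' Zc]
   subject to
      [Y X; X' Zc] >= 0;
      X == X0;                        % trivial feasible set C = {X0}
cvx_end
cvx_optval/2 - sum(svd(X0))           % vanishes to solver precision
```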
Then Figure 98 becomes an accurate geometrical description of a consequent composite semidefinite problem objective. But there are better direction vectors than Identity I , which occurs only under special conditions:

4.9.0.0.1 Example. Compressed sensing, compressive sampling. [342]


As our modern technology-driven civilization acquires and exploits ever-increasing
amounts of data, everyone now knows that most of the data we acquire can be thrown
away with almost no perceptual loss - witness the broad success of lossy compression
formats for sounds, images, and specialized technical data. The phenomenon of ubiquitous
compressibility raises very natural questions: Why go to so much effort to acquire all the
data when most of what we get will be thrown away? Can’t we just directly measure the
part that won’t end up being thrown away? −David Donoho [137]

Lossy data compression techniques like JPEG are popular, but it is also well known that compression artifacts become quite perceptible with signal postprocessing that goes beyond mere playback of a compressed signal. [252] [279] Spatial or audio frequencies presumed masked by a simultaneity are not encoded, for example, so they are rendered imperceptible even with significant postfiltering (of the decompressed signal) meant to reveal them; id est, desirable artifacts are forever lost, so lossy compressed data is not amenable to search, analysis, or postprocessing: e.g., sound effects [109] [110] [112] or image enhancement (Adobe Photoshop).^{4.99} Further, there can be no universally acceptable unique metric of perception for gauging exactly how much data can be tossed. For these reasons, there will always be need for raw (noncompressed) data.
In this example, only so much information is thrown out as to leave perfect reconstruction within reach. Specifically, the MIT logo in Figure 137 is perfectly reconstructed from 700 time-sequential samples {yi} acquired by the one-pixel camera illustrated in Figure 138. The MIT-logo image in this example impinges on a 46×81 micromirror array. This mirror array is modulated by a pseudonoise source that independently positions all the individual mirrors. A single photodiode (one pixel) integrates incident light from all mirrors. After stabilizing the mirrors to a fixed but pseudorandom pattern, light so collected is then digitized into one sample yi by analog-to-digital (A/D) conversion. This sampling process is repeated with the micromirror array modulated to a new pseudorandom pattern.
The most important questions are: How many samples are needed for perfect
reconstruction? Does that number of samples represent compression of the original data?
^{4.99} As simple a process as upward scaling of signal amplitude or image size will always introduce noise, even to a noncompressed signal. But scaling-noise is particularly noticeable in a JPEG-compressed image; e.g., text or any sharp edge.

Figure 137: Massachusetts Institute of Technology (MIT) logo, including its white
boundary, may be interpreted as a rank-5 matrix. This constitutes Scene Y observed
by the one-pixel camera in Figure 138 for Example 4.9.0.0.1.


Figure 138: One-pixel camera. Compressive imaging camera block diagram. Incident
lightfield (corresponding to the desired image Y ) is reflected off a digital micromirror
device (DMD) array whose mirror orientations are modulated in the pseudorandom pattern
supplied by the random number generators (RNG). Each different mirror pattern produces
a voltage at the single photodiode that corresponds to one measurement yi . −[389] [440]


Figure 139: Estimates of compression for various encoding methods:

1) linear interpolation (140 samples),
2) minimal columnar basis (311 samples),
3) convex iteration (700 samples) can achieve the lower bound predicted by compressed sensing (670 samples, n = 46×81 , k = 140 , Figure 115) whereas nuclear norm minimization alone does not [342, §6],
4) JPEG @ 100% quality (2588 samples),
5) no compression (3726 samples).

We claim that perfect reconstruction of the MIT logo can be achieved reliably with as few as 700 samples y = [yi] ∈ R^700 from this one-pixel camera. That number represents only 19% of information obtainable from 3726 micromirrors.^{4.100} (Figure 139)
Our approach to reconstruction is to look for a low-rank solution to an underdetermined system:

$$
\begin{array}{cl}
\underset{X\,\in\,\mathbb{R}^{46\times 81}}{\text{find}} & X\\
\text{subject to} & A \operatorname{vec} X = y\\
& \operatorname{rank} X \le 5
\end{array}
\tag{1016}
$$

where vec X is the vectorized (39) unknown image matrix. Each row of wide matrix A is one realization of a pseudorandom pattern applied to the micromirrors. Since these patterns are deterministic (known), the i th sample yi equals A(i , :) vec Y ; id est, y = A vec Y . Perfect reconstruction here means optimal solution X⋆ equals scene Y ∈ R^{46×81} to within machine precision.
Because variable matrix X is generally not square or positive semidefinite, we constrain its rank by rewriting the problem equivalently

$$
\begin{array}{cl}
\underset{W_1\in\mathbb{R}^{46\times 46},\ W_2\in\mathbb{R}^{81\times 81},\ X\in\mathbb{R}^{46\times 81}}{\text{find}} & X\\
\text{subject to} & A \operatorname{vec} X = y\\
& \operatorname{rank} \begin{bmatrix} W_1 & X\\ X^T & W_2 \end{bmatrix} \le 5
\end{array}
\tag{1017}
$$

This rank constraint on the composite (block) matrix insures rank X ≤ 5 for any choice of dimensionally compatible matrices W1 and W2 . But to solve this problem by convex iteration, we alternate solution of semidefinite program

$$
\begin{array}{cl}
\underset{W_1\in\mathbb{S}^{46},\ W_2\in\mathbb{S}^{81},\ X\in\mathbb{R}^{46\times 81}}{\text{minimize}} & \operatorname{tr}\!\left( \begin{bmatrix} W_1 & X\\ X^T & W_2 \end{bmatrix} Z \right)\\
\text{subject to} & A \operatorname{vec} X = y\\
& \begin{bmatrix} W_1 & X\\ X^T & W_2 \end{bmatrix} \succeq 0
\end{array}
\tag{1018}
$$
^{4.100} That number (700 samples) is difficult to achieve, as reported in [342, §6]. If a minimal basis for the MIT logo were instead constructed, only five rows or columns worth of data (from a 46×81 matrix) are linearly independent. This means a lower bound on achievable compression is about 5×46 = 230 samples plus 81 samples for column encoding, which corresponds to 8% of the original information. (Figure 139)

with semidefinite program

$$
\begin{array}{cl}
\underset{Z\,\in\,\mathbb{S}^{46+81}}{\text{minimize}} & \operatorname{tr}\!\left( \begin{bmatrix} W_1 & X\\ X^T & W_2 \end{bmatrix}^{\!\star} Z \right)\\
\text{subject to} & 0 \preceq Z \preceq I\\
& \operatorname{tr} Z = 46 + 81 - 5
\end{array}
\tag{1019}
$$

(which has an optimal solution known in closed form, p.539) until a rank-5 composite matrix is found.
With 1000 samples {yi} , convergence occurs in two iterations; 700 samples require more than ten iterations, but reconstruction remains perfect. More iteration admits fewer samples. Reconstruction is independent of pseudorandom sequence parameters; e.g., binary sequences succeed with the same efficiency as Gaussian or uniformly distributed sequences. □
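A sketch of the alternation (1018) (1019), in cvx under Matlab, assuming A and y are in the workspace and replacing the second semidefinite program by its known closed-form solution (p.539):

```matlab
% Sketch: convex iteration for problem (1017); A is 700 x 3726, y = A*vec(Y).
m = 46;  n = 81;  k = 5;
Z = eye(m+n);                              % initial direction matrix
for iter = 1:100
   cvx_begin sdp quiet
      variable W1(m,m) symmetric
      variable W2(n,n) symmetric
      variable X(m,n)
      minimize( trace([W1 X; X' W2]*Z) )   % semidefinite program (1018)
      subject to
         A*X(:) == y;
         [W1 X; X' W2] >= 0;
   cvx_end
   G = [W1 X; X' W2];
   [Q, L] = eig((G + G')/2);
   [lam, idx] = sort(diag(L), 'descend');
   Q = Q(:, idx);
   Z = Q(:, k+1:end)*Q(:, k+1:end)';       % closed-form solution of (1019)
   if sum(lam(k+1:end)) < 1e-9, break, end % rank-5 composite matrix found
end
```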

4.9.1 rank-constraint midsummary


We find that this direction matrix idea works well and quite independently of desired rank or affine dimension. This idea of direction matrix is good principally because of its simplicity: When confronted with a problem otherwise convex if not for a rank or cardinality constraint, then that constraint becomes a linear regularization term in the objective.
There exists a common thread through all these Examples; that being, convex iteration with a direction matrix as normal to a linear regularization (a generalization of the well-known trace heuristic). But each problem type (per Example) possesses its own idiosyncrasies that slightly modify how a rank-constrained optimal solution is actually obtained: The ball packing problem (§5.4.2.2.6), for example, requires a sequence of problems over a progressively larger number of balls to find a good initial value for the direction matrix, whereas many of the examples in the present chapter require an initial value of 0. Finding a Boolean solution in Example 4.7.0.0.9 requires a procedure to detect stalls, while other problems have no such requirement. The combinatorial Procrustes problem in Example 4.7.0.0.3 allows use of a known closed-form solution for direction vector when solved via rank constraint, but not when solved via cardinality constraint. Some problems require a careful weighting of the regularization term, whereas other problems do not, and so on. It would be nice if there were a universally applicable method for constraining rank; one that is less susceptible to quirks of a particular problem type.
Poor initialization of the direction matrix from the regularization can lead to an erroneous result. We speculate one reason to be a simple dearth of optimal solutions of desired rank or cardinality;^{4.101} an unfortunate choice of initial search direction can lead the iteration astray. Ease of solution by convex iteration occurs when optimal solutions abound. With this speculation in mind, we now propose a further generalization of convex iteration for constraining rank that attempts to ameliorate quirks and unify problem types:

4.10 Convex Iteration rank-1


We now develop a general method for constraining rank that first decomposes a given
problem via standard diagonalization of matrices (§A.5). This method is motivated
by observation (§4.5.1.1) that an optimal direction matrix can be simultaneously
diagonalizable with an optimal variable matrix. This suggests minimization of an
4.101 In Convex Optimization, an optimal solution generally comes from a convex set of optimal solutions;
(§3.1.1.1) that set can be large.

objective function directly in terms of eigenvalues. A second motivating observation is that variable orthogonal matrices seem easily found by convex iteration; e.g., Procrustes Example 4.7.0.0.2.

4.10.1 rank-1 transformation


It turns out that this general method always requires solution to a rank-1 constrained problem regardless of desired rank ρ from the original problem. To demonstrate, we pose a semidefinite feasibility problem

$$
\begin{array}{cl}
\text{find} & X \in \mathbb{S}^n\\
\text{subject to} & A \operatorname{svec} X = b\\
& X \succeq 0\\
& \operatorname{rank} X \le \rho
\end{array}
\tag{1020}
$$

given an upper bound 0 < ρ < n on rank, a vector b ∈ R^m , and typically wide full-rank

$$
A \triangleq \begin{bmatrix} \operatorname{svec}(A_1)^T\\ \vdots\\ \operatorname{svec}(A_m)^T \end{bmatrix} \in \mathbb{R}^{m\times n(n+1)/2}
\tag{712}
$$

where Ai ∈ S^n , i = 1 . . . m . So, for symmetric matrix vectorization svec as defined in (59),

$$
A \operatorname{svec} X = \begin{bmatrix} \operatorname{tr}(A_1 X)\\ \vdots\\ \operatorname{tr}(A_m X) \end{bmatrix}
\tag{713}
$$

This program (1020) is a statement of the classical problem of finding a matrix X , of rank at most ρ , in the intersection of the positive semidefinite cone with a given number m of hyperplanes in the subspace of symmetric matrices S^n . [28, §II.13] [26, §2.2] Such a matrix is presumed to exist.
To begin transformation of (1020), express the nonincreasingly ordered diagonalization (§A.5.1) of positive semidefinite variable matrix

$$
X \triangleq Q \Lambda Q^T = \sum_{i=1}^{n} \lambda_i\, Q_{ii} \in \mathbb{S}^n
\tag{1021}
$$

which is a sum of rank-1 orthogonal-projection matrices Qii weighted by eigenvalues λi , where Qij ≜ qi qjᵀ ∈ R^{n×n} , Q = [ q1 · · · qn ] ∈ R^{n×n} , Qᵀ = Q⁻¹ , Λii = λi ∈ R , and

$$
\Lambda = \begin{bmatrix} \lambda_1 & & & 0\\ & \lambda_2 & &\\ & & \ddots &\\ 0 & & & \lambda_n \end{bmatrix} \in \mathbb{S}^n
\tag{1022}
$$

where λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0. Recall the fact:

$$
\Lambda \succeq 0 \;\Leftrightarrow\; X \succeq 0
\tag{1620}
$$

From orthogonal matrix Q in ordered diagonalization (1021) of variable X , take a matrix

$$
U \triangleq [\, u_1 \cdots u_\rho \,] \triangleq Q(:,1\!:\!\rho)\,\sqrt{\Lambda(1\!:\!\rho\,,\,1\!:\!\rho)} = \left[\, \sqrt{\lambda_1}\, q_1 \;\cdots\; \sqrt{\lambda_\rho}\, q_\rho \,\right] \in \mathbb{R}^{n\times\rho}
\tag{1023}
$$

Then U has orthogonal but unnormalized columns;

$$
X = U U^T = \sum_{i=1}^{\rho} u_i u_i^T \triangleq \sum_{i=1}^{\rho} U_{ii} = \sum_{i=1}^{\rho} \lambda_i\, q_i q_i^T \in \mathbb{S}^n
\tag{1024}
$$
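Construction (1023) (1024) is mechanical given an eigendecomposition; a Matlab sketch, assuming symmetric positive semidefinite X and rank bound rho are given:

```matlab
% Sketch: build U of (1023) from the nonincreasingly ordered diagonalization.
[Q, L] = eig((X + X')/2);                  % X assumed symmetric PSD
[lam, idx] = sort(diag(L), 'descend');     % lambda_1 >= ... >= lambda_n >= 0
Q = Q(:, idx);
U = Q(:, 1:rho)*diag(sqrt(max(lam(1:rho), 0)));  % orthogonal, unnormalized columns
norm(X - U*U', 'fro')                      % vanishes when rank X <= rho (1024)
```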
Make an assignment

$$
Z = \begin{bmatrix} u_1\\ \vdots\\ u_\rho \end{bmatrix} [\, u_1^T \cdots u_\rho^T \,] \in \mathbb{S}^{n\rho}
= \begin{bmatrix} U_{11} & \cdots & U_{1\rho}\\ \vdots & \ddots & \vdots\\ U_{1\rho}^T & \cdots & U_{\rho\rho} \end{bmatrix}
\triangleq \begin{bmatrix} u_1 u_1^T & \cdots & u_1 u_\rho^T\\ \vdots & \ddots & \vdots\\ u_\rho u_1^T & \cdots & u_\rho u_\rho^T \end{bmatrix}
\tag{1025}
$$

Then transformation of (1020) to its rank-1 equivalent is:

$$
\begin{array}{cl}
\underset{U_{ii}\in\mathbb{S}^n,\ U_{ij}\in\mathbb{R}^{n\times n}}{\text{find}} & X = \displaystyle\sum_{i=1}^{\rho} U_{ii}\\
\text{subject to} & Z = \begin{bmatrix} U_{11} & \cdots & U_{1\rho}\\ \vdots & \ddots & \vdots\\ U_{1\rho}^T & \cdots & U_{\rho\rho} \end{bmatrix} \;(\succeq 0)\\
& A \operatorname{svec}\!\left( \displaystyle\sum_{i=1}^{\rho} U_{ii} \right) = b\\
& \operatorname{tr} U_{ij} = 0\,, \quad i < j = 2 \ldots \rho\\
& \operatorname{rank} Z = 1
\end{array}
\tag{1026}
$$
Symmetry is necessary and sufficient for positive semidefiniteness of a rank-1 matrix.
(§A.3.1.0.7) Matrix X is positive semidefinite whenever Z is. (§A.3.1.0.4, §A.3.1.0.2) This
new problem always enforces a rank-1 constraint on matrix Z ; id est, regardless of upper
bound on rank ρ of variable matrix X , this equivalent problem always poses a rank-1
constraint. Upper bound ρ on rank of positive semidefinite matrix X is assured by rank-1
optimal matrix Z .
We propose solving (1026) by iteration of convex problem

$$
\begin{array}{cl}
\underset{U_{ii}\in\mathbb{S}^n,\ U_{ij}\in\mathbb{R}^{n\times n}}{\text{minimize}} & \operatorname{tr}(Z\,W)\\
\text{subject to} & Z = \begin{bmatrix} U_{11} & \cdots & U_{1\rho}\\ \vdots & \ddots & \vdots\\ U_{1\rho}^T & \cdots & U_{\rho\rho} \end{bmatrix} \succeq 0\\
& A \operatorname{svec}\!\left( \displaystyle\sum_{i=1}^{\rho} U_{ii} \right) = b\\
& \operatorname{tr} U_{ij} = 0\,, \quad i < j = 2 \ldots \rho
\end{array}
\tag{1027}
$$
with convex problem

$$
\begin{array}{cl}
\underset{W\,\in\,\mathbb{S}^{n\rho}}{\text{minimize}} & \operatorname{tr}(Z^\star W)\\
\text{subject to} & 0 \preceq W \preceq I\\
& \operatorname{tr} W = n\rho - 1
\end{array}
\tag{1028}
$$
the latter providing direction of search W for a rank-1 matrix Z in (1027). These convex problems (1027) (1028) are iterated until a rank-1 Z matrix is found (until the objective of (1027) vanishes). Initial value of direction matrix W is the Identity. For subsequent iterations, an optimal solution to (1028) has closed form (p.539).
Because of the nonconvex nature of a rank-constrained problem, there can be no proof of convergence of this convex iteration to a feasible point of (1026). But the iteration always converges to a local minimum because the sequence of objective values is monotonic and nonincreasing; any bounded monotonically nonincreasing real sequence converges. [294, §1.2] [44, §1.1] A rank-ρ matrix X solving the original problem (1020) is found when the objective in (1027) converges to 0 : a certificate of global optimality for the convex iteration. In practice, incidence of success is quite high (99.99% [421]); failures are mostly attributable to numerical accuracy.
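That closed form (p.539) amounts to discarding the principal eigenvector of Z⋆; a Matlab sketch, assuming Zstar holds an optimal solution of (1027):

```matlab
% Sketch: closed-form direction matrix W solving (1028), given Zstar.
[Q, L] = eig((Zstar + Zstar')/2);
[lam, idx] = sort(diag(L), 'descend');
Q = Q(:, idx);
W = Q(:, 2:end)*Q(:, 2:end)';   % discard principal eigenvector; tr W = n*rho - 1
% trace(Zstar*W) = sum(lam(2:end)), which vanishes iff rank Zstar = 1
```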

4.10.2 singular value decomposition by convex iteration


This diagonal decomposition technique (transformation to a rank-1 problem) is extensible to other problem types; e.g., [257, §III]. Rank-1 transformation makes singular value decomposition (SVD, §A.6) possible by convex iteration because orthogonality constraints may then be introduced. We learn that any uniqueness properties the SVD of rank-ρ matrix

$$
X \triangleq U S V^T \in \mathbb{R}^{m\times n}
\tag{1029}
$$

might enjoy stem from demand for singular vector orthonormality.^{4.102}
Assignment Z ∈ S^{2mρ+nρ+ρ+1}_+ is key to finding the SVD of X by convex optimization:

$$
\begin{array}{cl}
\underset{H,\,J}{\text{find}} & U\,,\;\delta(S)\,,\;V\\
\text{subject to} & Z = \begin{bmatrix}
1 & \bigl[\, \operatorname{vec}(H)^T \;\; \operatorname{vec}(U)^T \;\; \delta(S)^T \;\; \operatorname{vec}(V)^T \,\bigr]\\[1ex]
\begin{bmatrix} \operatorname{vec} H\\ \operatorname{vec} U\\ \delta(S)\\ \operatorname{vec} V \end{bmatrix} & J
\end{bmatrix} \succeq 0\\[3ex]
& \delta(S) \succeq 0\\
& H = U S \;\subset\; J\\
& X = H V^T \;\in\; J\\
& H U^T \ \text{symmetry}\\
& U^T\! H \ \text{perpendicularity}\\
& \operatorname{tr}\bigl( H(:,i)\, H(:,i)^T \bigr) = S(i,i)^2\,, \quad i = 1 \ldots \rho\\
& \operatorname{tr}\bigl( H(:,i)\, U(:,i)^T \bigr) = S(i,i)\,, \quad i = 1 \ldots \rho\\
& H \ \text{orthogonality}\\
& U \ \text{orthonormality}\\
& V \ \text{orthonormality}\\
& \operatorname{rank} Z = 1
\end{array}
\tag{1030}
$$
where variable matrix J ∈ S^{2mρ+nρ+ρ}_+ is a large partition of Z , where given rank-ρ matrix X ∈ R^{m×n} is subject to SVD in unknown orthonormal matrices U ∈ R^{m×ρ} and V ∈ R^{n×ρ} and unknown diagonal matrix of singular values S ∈ R^{ρ×ρ} , and where introduction of variable

$$
H \triangleq U S \in \mathbb{R}^{m\times\rho}
\tag{1031}
$$

makes identification of input X = HVᵀ possible within partition J . Orthogonality constraints on columns of H , within J , and orthonormality constraints on columns of U and V are critical; videlicet, h ⊥ v ⇔ tr(hvᵀ) = 0 ; vᵀv = 1 ⇔ tr(vvᵀ) = 1.
4.102 Otherwise, there exist many similarly structured tripartite nonorthogonal matrix decompositions; in
place of ρ nonzero singular values, diagonal matrix S would instead hold exactly ρ coordinates; orthonormal
columns in U and V would become merely linearly independent.

Figure 140: Typical convergence of SVD by convex iteration for a 2×2 random X matrix: objective tr(Z W) versus iteration. Matrix W represents a direction vector of convex iteration rank-1.

Symmetric matrix Z is positive semidefinite rank-1 at optimality, regardless of rank ρ . That rank constraint is the only nonconvex constraint in (1030); the only constraint that cannot be directly implemented in a convex manner per partition J . But the rank constraint is handled well by convex iteration. Matlab implementation of SVD by convex iteration [437] is intricate, although incidence of success is 99.99% barring numerical error.

4.10.2.0.1 Example. SVD of X by convex iteration. (confer [184])

Given rank-2 matrix X = U S Vᵀ ∈ R^{2×2} , we now make explicit every constraint in (1030):

$$
\begin{array}{cl}
\underset{H\in\mathbb{R}^{2\times 2},\ J\in\mathbb{S}^{14}}{\text{find}} & U \in \mathbb{R}^{2\times 2}\,,\quad S = \begin{bmatrix} \sigma_1 & 0\\ 0 & \sigma_2 \end{bmatrix} \in \mathbb{S}^2\,,\quad V \in \mathbb{R}^{2\times 2}\\[2ex]
\text{subject to} & Z = \begin{bmatrix}
1 & h_1^T & h_2^T & u_1^T & u_2^T & [\,\sigma_1\ \sigma_2\,] & v_1^T & v_2^T\\
h_1 & J_{11} & J_{12} & J_{13} & J_{14} & J_{15} & J_{16} & J_{17}\\
h_2 & J_{12}^T & J_{22} & J_{23} & J_{24} & J_{25} & J_{26} & J_{27}\\
u_1 & J_{13}^T & J_{23}^T & J_{33} & J_{34} & J_{35} & J_{36} & J_{37}\\
u_2 & J_{14}^T & J_{24}^T & J_{34}^T & J_{44} & J_{45} & J_{46} & J_{47}\\
\begin{bmatrix} \sigma_1\\ \sigma_2 \end{bmatrix} & J_{15}^T & J_{25}^T & J_{35}^T & J_{45}^T & J_{55} & J_{56} & J_{57}\\
v_1 & J_{16}^T & J_{26}^T & J_{36}^T & J_{46}^T & J_{56}^T & J_{66} & J_{67}\\
v_2 & J_{17}^T & J_{27}^T & J_{37}^T & J_{47}^T & J_{57}^T & J_{67}^T & J_{77}
\end{bmatrix} \succeq 0\\[2ex]
& \sigma_1\,,\ \sigma_2 \ge 0\\
& H = [\, J_{35}(:,1)\ \ J_{45}(:,2) \,]\\
& X = J_{16} + J_{27}\\
& J_{13} = J_{13}^T\,, \quad J_{24} = J_{24}^T\\
& \operatorname{tr} J_{14} = 0\,, \quad \operatorname{tr} J_{23} = 0\\
& \operatorname{tr} J_{11} = J_{55}(1,1)\,, \quad \operatorname{tr} J_{22} = J_{55}(2,2)\\
& \operatorname{tr} J_{13} = \sigma_1\,, \quad \operatorname{tr} J_{24} = \sigma_2\\
& \operatorname{tr} J_{12} = 0\\
& \operatorname{tr} J_{33} = 1\,, \quad \operatorname{tr} J_{44} = 1\,, \quad \operatorname{tr} J_{34} = 0\\
& \operatorname{tr} J_{66} = 1\,, \quad \operatorname{tr} J_{77} = 1\,, \quad \operatorname{tr} J_{67} = 0\\
& \operatorname{rank} Z = 1
\end{array}
\tag{1032}
$$

where H ≜ [ h1 h2 ] , U ≜ [ u1 u2 ] , S ≜ δ([ σ1 σ2 ]ᵀ) , V ≜ [ v1 v2 ] , and Z ∈ S¹⁵. Observe how, excepting the rank constraint, constraints are written as affine expressions of variable matrix J . [437] Convergence is illustrated in Figure 140. □

4.10.2.0.2 Example. SVD of X ∈ R^{2×2} in closed form.

Singular value decomposition of

$$
X = \begin{bmatrix} a & c\\ b & d \end{bmatrix}
$$

is analytically determinable (Mathematica): X = U S Vᵀ where

$$
U = \begin{bmatrix}
\dfrac{2bcd + a(a^2+b^2+c^2-d^2-\gamma)}{n_-} & \dfrac{2bcd + a(a^2+b^2+c^2-d^2+\gamma)}{n_+}\\[2ex]
\dfrac{2acd + b(a^2+b^2-c^2+d^2-\gamma)}{n_-} & \dfrac{2acd + b(a^2+b^2-c^2+d^2+\gamma)}{n_+}
\end{bmatrix},\qquad
S = \begin{bmatrix}
\sqrt{\dfrac{a^2+b^2+c^2+d^2-\gamma}{2}} & 0\\[1ex]
0 & \sqrt{\dfrac{a^2+b^2+c^2+d^2+\gamma}{2}}
\end{bmatrix}
$$

$$
V^T = \begin{bmatrix}
\dfrac{a^2+b^2-c^2-d^2-\gamma}{m_-} & \dfrac{2(ac+bd)}{m_-}\\[2ex]
\dfrac{a^2+b^2-c^2-d^2+\gamma}{m_+} & \dfrac{2(ac+bd)}{m_+}
\end{bmatrix}
\tag{1033}
$$

with normalizing denominators

$$
n_\mp \triangleq \sqrt{\bigl(2bcd + a(a^2+b^2+c^2-d^2\mp\gamma)\bigr)^2 + \bigl(2acd + b(a^2+b^2-c^2+d^2\mp\gamma)\bigr)^2}
$$
$$
m_\mp \triangleq \sqrt{4(ac+bd)^2 + \bigl(a^2+b^2-c^2-d^2\mp\gamma\bigr)^2}
$$

where

$$
\gamma \triangleq \sqrt{\bigl((b+c)^2 + (a-d)^2\bigr)\bigl((b-c)^2 + (a+d)^2\bigr)}
\tag{1034}
$$
□
4.10.2.0.3 Example. Closed-form SVD of X ∈ R^{2×2} rank-1.

Singular value decomposition of a real 2×2 rank-1 matrix is especially simple.

$$
X = \begin{bmatrix} a\\ b \end{bmatrix} [\, c\ \ d \,] = \begin{bmatrix} ac & ad\\ bc & bd \end{bmatrix}
\tag{1035}
$$

$$
X = U S V^T = \frac{1}{\sqrt{a^2+b^2}} \begin{bmatrix} a & -b\\ b & a \end{bmatrix} \begin{bmatrix} \sqrt{a^2+b^2}\,\sqrt{c^2+d^2} & 0\\ 0 & 0 \end{bmatrix} \frac{1}{\sqrt{c^2+d^2}} \begin{bmatrix} c & d\\ -d & c \end{bmatrix}
\tag{1036}
$$
□
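Formula (1036) is easily verified numerically; a Matlab sketch with arbitrary test values:

```matlab
% Sketch: numeric spot-check of closed-form rank-1 SVD (1036).
a = 1; b = 2; c = 3; d = 4;
X  = [a; b]*[c d];
U  = [a -b; b a]/sqrt(a^2 + b^2);
S  = diag([sqrt((a^2 + b^2)*(c^2 + d^2)), 0]);
Vt = [c d; -d c]/sqrt(c^2 + d^2);
norm(X - U*S*Vt, 'fro')            % vanishes to machine precision
```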

4.10.2.0.4 Exercise. Constraints required for SVD calculation by Optimization.


Given matrix X , prove that constraints in (1030) are necessary and sufficient for its
singular value decomposition. H

4.10.3 Convex Iteration accelerant


Convex iteration can be made to converge faster, sometimes by orders of magnitude. The idea here is to determine whether the last three direction vectors are close to their straight-line fit; when they are, the last direction vector may be replaced with its extrapolation along that line.

Figure 141: W1 , W2 , and W3 represent the last three direction vectors in a sequence (in svec coordinates). m1 is the midpoint between direction vectors W1 and W2 ; m2 is the midpoint of W2 and W3 . A straight line passes through the midpoints.

To reduce computation time, the fitted line is not a best fit. Instead, the midpoint between each pair of iteration-adjacent direction vectors is calculated (Figure 141). A straight line is uniquely defined by two midpoints in any dimension. Distance of each direction vector to the line is calculated, then those three distances are summed into a program variable called straight . When that sum is small, the three direction vectors are deemed close to the line they determine. What is meant by close and small depends on problem type and data. For the parameters and normalized random data chosen for two Matlab realizations [421] [437] on Wıκımization (corresponding to problems (1026) and (1030)), small is numerically defined to be 1 or less in the statement if straight < 1 , whose purpose is to determine straightness of the last three direction vectors of convex iteration. The smaller the value of sum straight , the closer the last three direction vectors are to a straight line. Variable straight is inherently bounded below by 0 , which indicates three direction vectors precisely on the line going through them.
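A Matlab sketch of this straightness test, assuming the last three direction vectors are available, vectorized, as columns w1 , w2 , w3 (and that the midpoints are distinct); the unit extrapolation step is an illustrative choice, not prescribed:

```matlab
% Sketch: straightness of the last three direction vectors (Figure 141).
m1 = (w1 + w2)/2;   m2 = (w2 + w3)/2;        % iteration-adjacent midpoints
t  = (m2 - m1)/norm(m2 - m1);                % unit direction of the line
d  = @(w) norm((w - m1) - t*(t'*(w - m1)));  % point-to-line distance
straight = d(w1) + d(w2) + d(w3);
if straight < 1                              % deemed close to a straight line
   w3 = w3 + (w3 - w2);                      % extrapolate last direction vector
end
```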
If linear extrapolation goes too far, then the objective of convex iteration will increase or a solver may fail numerically. In either case, one must forget the last iteration and back up the linear extrapolation until the objective decreases. These techniques are illustrated by the Matlab programs [421] [437]; Figure 140 is one representative.
