Semidefinite programming
Prior to 1984, linear and nonlinear programming,4.1 one a subset of the other,
had evolved for the most part along unconnected paths, without even a common
terminology. (The use of “programming” to mean “optimization” serves as a
persistent reminder of these differences.)
−Forsgren, Gill, & Wright, 2002 [169]
Given some practical application of convex analysis, it may at first seem puzzling why
a search for its solution ends abruptly with a formalized statement of the problem itself
as a constrained optimization. The explanation is: typically we do not seek analytical
solution because there are relatively few. (§3.5.3, §C) If a problem can be expressed in
convex form, rather, then there exist computer programs providing efficient numerical
global solution. [195] [454] [455] [453] [395] [379] The goal, then, becomes conversion of a
given problem (perhaps a nonconvex or combinatorial problem statement) to an equivalent
convex form or to an alternation of convex subproblems convergent to a solution of the
original problem:
By the fundamental theorem of Convex Optimization, any locally optimal point
(solution) of a convex problem is globally optimal. [66, §4.2.2] [348, §1] Given convex real
objective function g and convex feasible set D ⊆ dom g , which is the set of all variable
values satisfying the problem constraints, we pose a generic convex optimization problem
   minimize_X    g(X)
   subject to    X ∈ D                                    (708)
where constraints are abstract here in membership of variable X to convex feasible set D .
Inequality constraint functions of a convex optimization problem are convex. Quasiconvex
inequality constraint functions are prohibited by prevailing methods for numerical solution.
Equality constraint functions are conventionally affine, but not necessarily so. Affine
equality constraint functions, as opposed to the superset of all convex equality constraint
functions having convex level sets (§3.4.0.0.5), make convex optimization tractable.
Similarly, the problem
   maximize_X    g(X)
   subject to    X ∈ D                                    (709)
is called convex were g a real concave function and feasible set D convex. As conversion
to convex form is not always possible, there is much ongoing research to determine which
problem types have convex expression or relaxation. [36] [64] [176] [315] [390] [173]
4.1 nascence of polynomial-time interior-point methods of solution [410] [451].
(confer p.127) Consider a conic problem (p) and its dual (d): [333, §3.3.1] [273, §2.1] [274]
   (confer (310))
   (p)   minimize_x   cᵀx               (d)   maximize_{y,s}   bᵀy
         subject to   x ∈ K                   subject to       s ∈ K⋆
                      Ax = b                                   Aᵀy + s = c      (710)
where K is a closed convex cone, K∗ is its dual, matrix A is fixed, and the remaining
quantities are vectors.
When K is a polyhedral cone (§2.12.1), then each conic problem becomes a linear
program; the selfdual nonnegative orthant providing the prototypical primal linear
program and its dual. [104, §3-1]4.2 More generally, each optimization problem is convex
when K is a closed convex cone. Solution to each convex problem is not necessarily
unique; the optimal solution sets {x⋆ } and {y ⋆ , s⋆ } are convex and may comprise more
than a single point.
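As a concrete illustration, here is a minimal sketch of the prototypical pair (710) with K the selfdual nonnegative orthant (a linear program), assuming CVX is installed; the data are randomized here so that the primal is feasible and bounded:

    % Prototype conic pair (710) with K the nonnegative orthant (an LP);
    % a sketch assuming CVX is installed.  Random data chosen so (p) is feasible.
    m = 3; n = 6;
    A = randn(m,n);  b = A*rand(n,1);  c = rand(n,1);
    cvx_begin quiet                  % primal (p)
        variable x(n)
        minimize( c'*x )
        subject to
            x >= 0;                  % x in K
            A*x == b;
    cvx_end
    p_star = c'*x;
    cvx_begin quiet                  % dual (d)
        variable y(m)
        variable s(n)
        maximize( b'*y )
        subject to
            s >= 0;                  % s in K* = K (selfdual)
            A'*y + s == c;
    cvx_end
    d_star = b'*y;                   % equals p_star to solver precision

Because both problems here are feasible linear programs, the optimal objective values coincide.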
The vector inner-product for matrices is defined in the Euclidean/Frobenius sense in the isomorphic vector space R^{n(n+1)/2} ; id est,

   ⟨C , X⟩ ≜ tr(CᵀX) = svec(C)ᵀ svec X
In a national planning problem of some size, one may easily run into several
hundred variables and perhaps a hundred or more degrees of freedom. . . . It
should always be remembered that any mathematical method and particularly
methods in linear programming must be judged with reference to the type of
computing machinery available. Our outlook may perhaps be changed when we
get used to the super modern, high capacity electronic computor that will be
available here from the middle of next year.
−Ragnar Frisch [171]
(appropriately described) heuristics under the hood - my codes certainly do. . . . Of course, there are still
questions relating to high-accuracy and speed, but for many applications a few digits of accuracy suffices
and overnight runs for non-real-time delivery is acceptable.
−Nicholas I. M. Gould, Stanford alumnus, SIOPT Editor in Chief
4.4 Second-order cone programming (SOCP) was born in the 1990s; it is not posable as a quadratic
program. [283]
4.5 This characteristic might be regarded as a disadvantage to interior-point methods of numerical
solution, but this behavior is not certain and depends on solver implementation.
Figure 94: Venn diagram of program hierarchy (labels: PC , semidefinite, second-order cone, quadratically constrained, quadratic, linear, geometric). Convex program PC represents broadest class of convex optimization problem having efficient global solution methods. Semidefinite program subsumes other convex program types excepting geometric program [65] [88].
   A ≜ { X ∈ Sⁿ | A svec X = b }      (2297)
Figure 95: (labels: 0 , Γ₁ , Γ₂ , P , S³₊ , A = ∂H) Visualizing positive semidefinite cone in high dimension: Proper polyhedral cone S³₊ ⊂ R³ representing positive semidefinite cone S³₊ ⊂ S³ ; analogizing its intersection S³₊ ∩ ∂H with hyperplane. Number of facets is arbitrary (an analogy not inspired by eigenvalue decomposition). The rank-0 positive semidefinite matrix corresponds to origin in R³ , rank-1 positive semidefinite matrices correspond to edges of polyhedral cone, rank-2 to facet relative interiors, and rank-3 to polyhedral cone interior. Vertices Γ₁ and Γ₂ are extreme points of polyhedron P = ∂H ∩ S³₊ , and extreme directions of S³₊ . A given vector C is normal to another hyperplane (not illustrated but independent w.r.t ∂H) containing line segment Γ₁Γ₂ minimizing real linear function ⟨C , X⟩ on P . (confer Figure 29, Figure 33)
                 k    dim F(S³₊ polyhedral)    dim F(S³₊)    dim F(S³₊ ∋ rank-k matrix)
                 0             0                    0                    0
   boundary      1             1                    1                    1
                 2             2                    3                    3
   interior      3             3                    6                    6
4.1.2.2.1 Example. Optimization over A ∩ S³₊ .
Consider minimization of the real linear function ⟨C , X⟩ over

   P ≜ A ∩ S³₊      (715)

   f₀⋆ ≜ minimize_X    ⟨C , X⟩
         subject to    X ∈ A ∩ S³₊      (716)
As illustrated for particular vector C and hyperplane A = ∂H in Figure 95, this linear
function is minimized on any X belonging to the face of P containing extreme points
{Γ1 , Γ2 } and all the rank-2 matrices in between; id est, on any X belonging to the face
of P
   F(P) = { X | ⟨C , X⟩ = f₀⋆ } ∩ A ∩ S³₊      (717)
exposed by the hyperplane {X | hC , X i = f0⋆ }. In other words, the set of all optimal
points X ⋆ is a face of P
{X ⋆ } = F(P) = Γ1 Γ2 (718)
comprising rank-1 and rank-2 positive semidefinite matrices. Rank 1 is the upper bound on
existence in the feasible set P for this case m = 1 hyperplane constituting A . The rank-1
matrices Γ1 and Γ2 in face F(P) are extreme points of that face and (by transitivity
(§2.6.1.2)) extreme points of the intersection P as well. As predicted by analogy to
Barvinok’s Proposition 2.9.3.0.1, the upper bound on rank of X existent in the feasible
set P is satisfied by an extreme point. The upper bound on rank of an optimal solution
X ⋆ existent in F(P) is thereby also satisfied by an extreme point of P precisely because
{X ⋆ } constitutes F(P) ;4.7 in particular,
As all linear functions on a polyhedron are minimized on a face, [104] [287] [311] [318] by
analogy we so demonstrate coexistence of optimal solutions X ⋆ of (711P) having assorted
rank. 2
Barvinok showed, [26, §2.2] when given a positive definite matrix C and an arbitrarily
small neighborhood of C comprising positive definite matrices, there exists a matrix C̃
from that neighborhood such that optimal solution X ⋆ to (711P) (substituting C̃ ) is an
extreme point of A ∩ Sⁿ₊ and satisfies upper bound (279).4.8 Given arbitrary positive
definite C , this means: nothing inherently guarantees that an optimal solution X ⋆ to
problem (711P) satisfies (279); certainly nothing given any symmetric matrix C , as the
problem is posed. This can be proved by example:
4.7 and every face contains a subset of the extreme points of P by the extreme existence theorem
(§2.6.0.0.2). This means: because the affine subset A and hyperplane {X | hC , X i = f0⋆ } must intersect
a whole face of P , calculation of an upper bound on rank of X ⋆ ignores counting the hyperplane when
determining m in (279).
4.8 Further, the set of all such C̃ in that neighborhood is open and dense.
with an equal number of twos and zeros along the main diagonal. Indeed, optimal solution
(721) is a terminal solution along the central path taken by the interior-point method as
implemented in [461, §2.5.3]; it is also a solution of highest rank among all optimal solutions
to (720). Clearly, rank of this primal optimal solution exceeds by far a rank-1 solution
predicted by upper bound (279). 2
This rational example (720) indicates the need for a more generally applicable and simple
algorithm to identify an optimal solution X ⋆ satisfying Barvinok’s Proposition 2.9.3.0.1.
We will review such an algorithm in §4.3, but first we provide more background.
4.2 Framework
4.2.1 Feasible sets
Denote by D and D∗ the convex sets of primal and dual points respectively satisfying the
primal and dual constraints in (711), each assumed nonempty;
   D = { X ∈ Sⁿ₊ | [ ⟨A₁ , X⟩ ; ⋯ ; ⟨Aₘ , X⟩ ] = b } = A ∩ Sⁿ₊

   D⋆ = { S ∈ Sⁿ₊ , y = [yᵢ] ∈ Rᵐ | Σ_{i=1}^m yᵢAᵢ + S = C }      (722)
These are the primal feasible set and dual feasible set. Geometrically, primal feasible set
A ∩ Sⁿ₊ represents an intersection of the positive semidefinite cone Sⁿ₊ with an affine subset
A of the subspace of symmetric matrices Sⁿ in isometrically isomorphic R^{n(n+1)/2}. A has
dimension n(n+1)/2 − m when the vectorized Aᵢ are linearly independent. Dual feasible
set D⋆ is a Cartesian product of the positive semidefinite cone with its inverse image
(§2.1.9.0.1) under affine transformation4.9 C − Σᵢ yᵢAᵢ . Both feasible sets are convex, and the objective functions
the objective functions are linear on a Euclidean vector space. Hence, (711P) and (711D)
are convex optimization problems.
4.9 Inequality C − Σᵢ yᵢAᵢ ⪰ 0 follows directly from (711D) (§2.9.0.1.1) and is known as a linear matrix
inequality. (§2.13.6.1.1) Because Σᵢ yᵢAᵢ ⪯ C , matrix S is known as a slack variable (a term borrowed
from linear programming [104]) since its inclusion raises this inequality to equality.
4.2.1.1 A ∩ Sⁿ₊ emptiness determination via Farkas' lemma

4.2.1.1.1 Lemma. Semidefinite Farkas' lemma. (confer §4.2.1.1.2)
Given affine subset A = { X ∈ Sⁿ | ⟨Aᵢ , X⟩ = bᵢ , i = 1 ... m } (2297), vector b = [bᵢ] ∈ Rᵐ ,
and set { Aᵢ ∈ Sⁿ , i = 1 ... m } such that { A svec X | X ⪰ 0 } (390) is closed, then primal
feasible set A ∩ Sⁿ₊ is nonempty if and only if yᵀb ≥ 0 holds for each and every vector
y = [yᵢ] ∈ Rᵐ such that Σ_{i=1}^m yᵢAᵢ ⪰ 0.
Equivalently, primal feasible set A ∩ Sⁿ₊ is nonempty if and only if yᵀb ≥ 0 holds for
each and every vector y , ‖y‖ = 1 , such that Σ_{i=1}^m yᵢAᵢ ⪰ 0.      ⋄
Semidefinite Farkas' lemma provides necessary and sufficient conditions for a set of
hyperplanes to have nonempty intersection A ∩ Sⁿ₊ with the positive semidefinite cone.
Given

   A = [ svec(A₁)ᵀ ; ⋮ ; svec(Aₘ)ᵀ ] ∈ R^{m×n(n+1)/2}      (712)
semidefinite Farkas’ lemma assumes that a convex cone
   K = { A svec X | X ⪰ 0 }      (390)
is closed per membership relation (327) from which the lemma springs: [265, §I] K closure
is attained when matrix A satisfies the cone closedness invariance corollary (p.143). Given
closed convex cone K and its dual from Example 2.13.6.1.1
   K⋆ = { y | Σ_{j=1}^m yⱼAⱼ ⪰ 0 }      (397)
then we can apply membership relation
b ∈ K ⇔ hy , bi ≥ 0 ∀ y ∈ K∗ (327)
to obtain the lemma
   b ∈ K   ⇔   ∃ X ⪰ 0 such that A svec X = b   ⇔   A ∩ Sⁿ₊ ≠ ∅      (723)
   b ∈ K   ⇔   ⟨y , b⟩ ≥ 0  ∀ y ∈ K⋆   ⇔   A ∩ Sⁿ₊ ≠ ∅      (724)
The final equivalence synopsizes semidefinite Farkas’ lemma.
While the lemma is correct as stated, a positive definite version is required for
semidefinite programming [461, §1.3.8] because existence of a feasible solution in the cone
interior A ∩ intr Sⁿ₊ is required by Slater's condition4.10 to achieve 0 duality gap (optimal
primal−dual objective difference, §4.2.3, Figure 64). Geometrically, a positive definite
lemma is required to insure that a point of intersection closest to the origin is not at
infinity; e.g, Figure 48. Then given A ∈ R^{m×n(n+1)/2} having rank m , we wish to detect
existence of nonempty primal feasible set interior to the PSD cone;4.11 (393)

   b ∈ intr K   ⇔   ⟨y , b⟩ > 0  ∀ y ∈ K⋆ , y ≠ 0   ⇔   A ∩ intr Sⁿ₊ ≠ ∅      (725)
Positive definite Farkas’ lemma is made from proper cones, K (390) and K∗ (397), and
membership relation (333) for which K closedness is unnecessary:
4.10 Slater’ssufficient constraint qualification is satisfied whenever any primal or dual strictly feasible
solution exists; id est, any point satisfying the respective affine constraints and relatively interior to the
convex cone. [372, §6.6] [43, p.325] If the cone were polyhedral, then Slater’s constraint qualification is
satisfied when any feasible solution exists (relatively interior to the cone or on its relative boundary).
[66, §5.2.3]
4.11 Detection of A ∩ intr Sⁿ₊ ≠ ∅ by examining intr K instead is a trick that need not be lost.
4.2.1.1.2 Lemma. Positive definite Farkas' lemma.
Given affine subset A = { X ∈ Sⁿ | ⟨Aᵢ , X⟩ = bᵢ , i = 1 ... m } (2297):
Primal feasible cone interior A ∩ intr Sⁿ₊ is nonempty if and only if yᵀb > 0 holds for each
and every vector y = [yᵢ] ≠ 0 such that Σ_{i=1}^m yᵢAᵢ ⪰ 0.
Equivalently, primal feasible cone interior A ∩ intr Sⁿ₊ is nonempty if and only if
yᵀb > 0 holds for each and every vector y , ‖y‖ = 1 , such that Σ_{i=1}^m yᵢAᵢ ⪰ 0.      ⋄
i=1
   A ≜ [ svec(A₁)ᵀ ; svec(A₂)ᵀ ] = [ 0 1 0 ; 0 0 1 ] ,   b = [ 1 ; 0 ]      (726)

Intersection A ∩ Sⁿ₊ is practically empty because the solution set

   { X ⪰ 0 | A svec X = b } = { [ α  1/√2 ; 1/√2  0 ] ⪰ 0 | α ∈ R }      (727)

is positive semidefinite only asymptotically (α → ∞). Yet the dual system
Σ_{i=1}^m yᵢAᵢ ⪰ 0 ⇒ yᵀb ≥ 0 erroneously indicates nonempty intersection because K (390)
violates a closedness condition of the lemma; videlicet, for ‖y‖ = 1

   y₁ [ 0  1/√2 ; 1/√2  0 ] + y₂ [ 0  0 ; 0  1 ] ⪰ 0   ⇔   y = [ 0 ; 1 ]   ⇒   yᵀb = 0      (728)

On the other hand, positive definite Farkas' Lemma 4.2.1.1.2 certifies that A ∩ intr Sⁿ₊ is
empty; what we need to know for semidefinite programming.
Lasserre suggested addition of another condition to semidefinite Farkas’ lemma
(§4.2.1.1.1) to make a new lemma having no closedness condition. But positive definite
Farkas’ lemma (§4.2.1.1.2) is simpler and obviates the additional condition proposed.
2
Any single vector y satisfying the alternative certifies A ∩ intr Sⁿ₊ is empty. Such a vector
can be found as a solution to another semidefinite program: for linearly independent vectorized Aᵢ
   minimize_y    yᵀb
   subject to    Σ_{i=1}^m yᵢAᵢ ⪰ 0
                 ‖y‖₂ ≤ 1      (730)
If an optimal vector y⋆ ≠ 0 can be found such that y⋆ᵀb ≤ 0 , then primal feasible cone
interior A ∩ intr Sⁿ₊ is empty.
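Applied to the data of (726)-(728), a sketch of certificate program (730), assuming CVX is installed:

    % Emptiness certificate (730) applied to data (726); a sketch assuming CVX.
    A1 = [0 1/sqrt(2); 1/sqrt(2) 0];     % svec(A1)' = [0 1 0]
    A2 = [0 0; 0 1];                     % svec(A2)' = [0 0 1]
    b  = [1; 0];
    cvx_begin sdp quiet
        variable y(2)
        minimize( y'*b )
        subject to
            y(1)*A1 + y(2)*A2 >= 0;      % sum yi Ai positive semidefinite
            norm(y) <= 1;
    cvx_end
    % y = [0;1] attains y'*b = 0 <= 0 , certifying A ∩ intr S²₊ = ∅ , confer (728)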
   find          y ≠ 0
   subject to    yᵀb = 0
                 Σ_{i=1}^m yᵢAᵢ ⪰ 0      (732)
Any such nonzero solution y certifies that affine subset A (2297) intersects the positive
semidefinite cone Sⁿ₊ only on its boundary; in other words, nonempty feasible set A ∩ Sⁿ₊
belongs to the positive semidefinite cone boundary ∂Sⁿ₊ .
4.2.2 Duals
The dual objective function from (711D) evaluated at any feasible solution represents a
lower bound on the primal optimal objective value from (711P). We can see this by direct
substitution: Assume the feasible sets A ∩ Sⁿ₊ and D⋆ are nonempty. Then it is always
true:

   ⟨C , X⟩ ≥ ⟨b , y⟩
   ⟨ Σᵢ yᵢAᵢ + S , X ⟩ ≥ [ ⟨A₁ , X⟩ ⋯ ⟨Aₘ , X⟩ ] y      (733)
   ⟨S , X⟩ ≥ 0
The converse also follows because
   X ⪰ 0 , S ⪰ 0   ⇒   ⟨S , X⟩ ≥ 0      (1655)
Optimal value of the dual objective thus represents the greatest lower bound on the primal.
This fact is known as weak duality for semidefinite programming, [461, §1.3.8] and can be
used to detect convergence in any primal/dual numerical method of solution.
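A numerical spot-check of weak duality, a sketch assuming CVX; the data are randomized here, with C shifted positive definite so both problems are strictly feasible:

    % Weak duality (733) spot-check on a random instance of prototype (711);
    % a sketch assuming CVX.
    n = 4;
    A1 = randn(n); A1 = (A1 + A1')/2;
    A2 = randn(n); A2 = (A2 + A2')/2;
    b  = [trace(A1); trace(A2)];               % X = I is primal feasible
    C  = randn(n); C = (C + C')/2 + 2*n*eye(n);
    cvx_begin sdp quiet                         % primal (711P)
        variable X(n,n) symmetric
        minimize( trace(C*X) )
        subject to
            trace(A1*X) == b(1);  trace(A2*X) == b(2);
            X >= 0;
    cvx_end
    cvx_begin sdp quiet                         % dual (711D)
        variable y(2)
        maximize( b'*y )
        subject to
            C - y(1)*A1 - y(2)*A2 >= 0;         % slack variable S implicit
    cvx_end
    % b'*y <= trace(C*X) at any feasible pair; the gap vanishes at optimality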
4.12 From the results of Example 2.13.6.1.1, vector b on the boundary of K cannot be detected simply
by looking for 0 eigenvalues in matrix X . We do not consider a thin-or-square matrix A because then
feasible set A ∩ Sⁿ₊ is at most a single point.
Figure 96: (diagram: primal P and equivalent P̃ , dual D and equivalent D̃ , connected by duality and transformation paths) Connectivity indicates paths between particular primal and dual problems from Exercise 4.2.2.1.1. More generally, any path between primal problems P (and equivalent P̃) and dual D (and equivalent D̃) is possible: implying, any given path is not necessarily circuital; dual of a dual problem is not necessarily stated in precisely the same manner as corresponding primal convex problem, although its solution set is equivalent to within some transformation.
   (D)   maximize_{y∈Rᵐ, S∈Sⁿ}   ⟨b , y⟩          ≡          maximize_{y∈Rᵐ}   ⟨b , y⟩      (711D̃)
         subject to   S ⪰ 0                                  subject to   svec⁻¹(Aᵀy) ⪯ C
                      svec⁻¹(Aᵀy) + S = C

Dual feasible cone interior in intr Sⁿ₊ (722) (713) thereby corresponds with canonical dual
(D̃) feasible interior

   rel intr D̃⋆ ≜ { y ∈ Rᵐ | Σ_{i=1}^m yᵢAᵢ ≺ C }      (734)
i=1
   ⟨C , X⋆⟩ = ⟨b , y⋆⟩
   ⟨ Σᵢ yᵢ⋆Aᵢ + S⋆ , X⋆ ⟩ = [ ⟨A₁ , X⋆⟩ ⋯ ⟨Aₘ , X⋆⟩ ] y⋆      (735)
   ⟨S⋆ , X⋆⟩ = 0
4.2.3.0.1 Corollary. Optimality and strong duality. [406, §3.1] [461, §1.3.8]
For semidefinite programs (711P) and (711D), assume primal and dual feasible sets
A ∩ Sⁿ₊ ⊂ Sⁿ and D⋆ ⊂ Sⁿ × Rᵐ (722) are nonempty. Then

   X⋆ is optimal for (711P)
   S⋆ , y⋆ are optimal for (711D)
   duality gap ⟨C , X⋆⟩ − ⟨b , y⋆⟩ is 0

if and only if

   i) ∃ X ∈ A ∩ intr Sⁿ₊  or  ∃ y ∈ rel intr D̃⋆
and
   ii) ⟨S⋆ , X⋆⟩ = 0      ⋄
   minimize_x    ‖x‖₀
   subject to    Ax = b
                 xᵢ ∈ {0, 1} ,   i = 1 ... n      (740)
where kxk0 denotes cardinality of vector x (a.k.a 0-norm; not a convex function).
A minimal cardinality solution answers the question: “Which fewest linear combination
of columns in A constructs vector b ?” Cardinality problems have extraordinarily wide
appeal, arising in many fields of science and across many disciplines. [361] [246] [200] [199]
Yet designing an efficient algorithm to optimize cardinality has proved difficult. In this
example, we also constrain the variable to be Boolean. The Boolean constraint forces an
identical solution were the norm in problem (740) instead the 1-norm or 2-norm; id est,
the two problems are the same. The Boolean constraint makes the 1-norm problem nonconvex.
Given data

   A = [ −1  1  8   1    1      0
         −3  2  8  1/2  1/3  1/2−1/3
         −9  4  8  1/4  1/9  1/4−1/9 ] ,   b = [ 1 ; 1/2 ; 1/4 ]      (742)
x⋆ = e4 ∈ R6 (743)
has norm kx⋆ k2 = 1 and minimal cardinality; the minimum number of nonzero entries in
vector x . The Matlab backslash command x=A\b , for example, finds
   xM = [ 2/128 ; 0 ; 5/128 ; 0 ; 90/128 ; 0 ]      (744)

The pseudoinverse solution xP (745) has least norm ‖xP‖₂ = 0.5165 ; id est, the optimal
solution to (§E.0.1.0.1)

   minimize_x    ‖x‖₂
   subject to    Ax = b      (746)
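In Matlab, both solutions take one line each; a sketch using data (742):

    % Basic (backslash) versus least 2-norm (pseudoinverse) solutions for (742).
    A = [ -1 1 8  1   1    0       ;
          -3 2 8 1/2 1/3  1/2-1/3  ;
          -9 4 8 1/4 1/9  1/4-1/9 ];
    b = [1; 1/2; 1/4];
    xM = A\b          % a basic solution (744)
    xP = pinv(A)*b    % least-norm solution, optimal for (746); norm(xP) = 0.5165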
   x ≜ (x̂ + 1)/2      (747)

so x̂ᵢ ∈ {−1, 1} ; equivalently,

   minimize_x̂    ‖(x̂ + 1)/2‖₀
   subject to    A(x̂ + 1)/2 = b
                 δ(x̂x̂ᵀ) = 1      (748)
   G = [ X  x̂ ; x̂ᵀ  1 ] = [ x̂ ; 1 ] [ x̂ᵀ  1 ] ≜ [ x̂x̂ᵀ  x̂ ; x̂ᵀ  1 ] ∈ S^{n+1}      (749)
   minimize_{X∈Sⁿ, x̂∈Rⁿ}   1ᵀx̂
   subject to   A(x̂ + 1)/2 = b
                G = [ X  x̂ ; x̂ᵀ  1 ]   (⪰ 0)
                δ(X) = 1
                rank G = 1      (750)
where solution is confined to rank-1 vertices of the elliptope in S n+1 (§5.9.1.0.1) by the
rank constraint, the positive semidefiniteness, and the equality constraints δ(X) = 1. The
rank constraint makes this problem nonconvex; by removing it4.15 we get the semidefinite
program
   minimize_{X∈Sⁿ, x̂∈Rⁿ}   1ᵀx̂
   subject to   A(x̂ + 1)/2 = b
                G = [ X  x̂ ; x̂ᵀ  1 ] ⪰ 0
                δ(X) = 1      (751)
4.15 Relaxed problem (751) can also be derived via Lagrange duality; it is a dual of a dual program
[sic ] to (750). [346] [66, §5, exer.5.39] [447, §IV] [175, §11.3.4] The relaxed problem must therefore be
convex having a larger feasible set; its optimal objective value represents a generally loose lower bound
(1869) on the optimal objective of problem (750).
whose optimal solution x⋆ (747) is identical to that of minimal cardinality Boolean problem
(740) if and only if rank G⋆ = 1.
Hope4.16 of acquiring a rank-1 solution is not ill-founded because 2n elliptope vertices
have rank 1 and because we are minimizing an affine function on a subset of the elliptope
(Figure 157) containing rank-1 vertices; id est, by assumption that the feasible set of
minimal cardinality Boolean problem (740) is nonempty, a desired solution resides on the
elliptope relative boundary at a rank-1 vertex.4.17
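For comparison with the sdpsol results reported next, here is a sketch of relaxation (751) in CVX (assuming CVX is installed; any SDP solver may stand in for sdpsol):

    % Semidefinite relaxation (751) on data (742); a sketch assuming CVX.
    A = [ -1 1 8  1   1    0       ;
          -3 2 8 1/2 1/3  1/2-1/3  ;
          -9 4 8 1/4 1/9  1/4-1/9 ];
    b = [1; 1/2; 1/4];
    n = size(A,2);
    cvx_begin sdp quiet
        variable X(n,n) symmetric
        variable xh(n)
        minimize( sum(xh) )
        subject to
            A*(xh + 1)/2 == b;
            [X xh; xh' 1] >= 0;     % G PSD; rank constraint dropped
            diag(X) == 1;
    cvx_end
    x = (xh + 1)/2    % recover x via (747); rounds to e4 when rank G = 1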
For that data given in (742), our semidefinite program solver sdpsol [454] [455]
(accurate in solution to approximately 1E-8)4.18 finds optimal solution to (751)
   round(G⋆) = [  1  1  1 −1  1  1 −1
                  1  1  1 −1  1  1 −1
                  1  1  1 −1  1  1 −1
                 −1 −1 −1  1 −1 −1  1
                  1  1  1 −1  1  1 −1
                  1  1  1 −1  1  1 −1
                 −1 −1 −1  1 −1 −1  1 ]      (752)
near a rank-1 vertex of the elliptope in S n+1 (Theorem 5.9.1.0.2); its sorted eigenvalues,
   λ(G⋆) = [  6.99999977799099
              0.00000022687241
              0.00000002250296
              0.00000000262974
             −0.00000000999738
             −0.00000000999875
             −0.00000001000000 ]      (753)
   x⋆ = round( [ 0.00000000127947
                 0.00000000527369
                 0.00000000181001
                 0.99999997469044
                 0.00000001408950
                 0.00000000482903 ] ) = e₄      (754)
These numerical results are solver dependent; insofar, not all SDP solvers will return a
rank-1 vertex solution. 2
1E-8 on a machine using 64-bit (double precision) floating-point arithmetic; id est, optimal solution x⋆
cannot be more accurate than square root of machine epsilon (ǫ = 2.2204E-16). Nonzero primal−dual
objective difference is not a good measure of solution accuracy.
(infeasible, with or without rounding, with respect to original problem (740)) whereas
solving semidefinite program (751) produces
   round(G⋆) = [  1  1 −1  1
                  1  1 −1  1
                 −1 −1  1 −1
                  1  1 −1  1 ]      (760)
with sorted eigenvalues

   λ(G⋆) = [ 3.99999965057264 ; 0.00000035942736 ; −0.00000000000000 ; −0.00000001000000 ]      (761)
Truncating all but the largest eigenvalue, from (747) we obtain (confer y⋆)

   x⋆ = round( [ 0.99999999625299 ; 0.99999999625299 ; 0.00000001434518 ] ) = [ 1 ; 1 ; 0 ]      (762)
2. maximize_{tᵢ}   tᵢ
   subject to   X⋆ + Σ_{j=1}^{i−1} t⋆ⱼBⱼ + tᵢBᵢ ∈ Sⁿ₊      (774)
}
4.20 There is no known construction for Barvinok’s tighter result (284). −Monique Laurent, 2004
where the optimal t⋆ⱼ are scalars and Rᵢ ∈ R^{n×ρ} is full-rank and thin where

   ρ ≜ rank( X⋆ + Σ_{j=1}^{i−1} t⋆ⱼBⱼ ) = rank Xᵢ      (770)

   find_{Zᵢ∈S^ρ}   RᵢZᵢRᵢᵀ ≠ 0
   subject to      ⟨Zᵢ , RᵢᵀAⱼRᵢ⟩ = 0 ,   j = 1 ... m      (771)
where λ(Zi ) ∈ Rρ denotes the eigenvalues of Zi . Necessity and sufficiency are due to the
facts: Ri can be completed to a nonsingular matrix (§A.3.1.0.5.c), and I − ti ψ(Zi )Zi can
4.21 Because of how 0 and indefinites are handled, ψ is not an odd function; id est, ψ(−Z) 6= −ψ(Z).
4.22 A simple method of solution is closed-form projection of a nonzero random point Zi on that
proper subspace of isometrically isomorphic Rρ(ρ+1)/2 specified by the constraints. (§E.5.0.0.7) Such
a solution is nontrivial assuming the specified intersection of hyperplanes is not the origin; guaranteed
by ρ(ρ + 1)/2 > m . This geometric intuition, about forming a perturbation, is indeed what bounds any
solution’s rank from below; m is fixed by the number of equality constraints in (711P) while rank ρ
decreases with each iteration i . Otherwise, we might iterate indefinitely.
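The projection of footnote 4.22 admits a direct sketch; R (n × ρ) and the cell array Acell{1..m} of symmetric Aⱼ are assumed given here, and names are illustrative:

    % Footnote 4.22 sketch: a solution Zi to (771) by closed-form projection of
    % a random symmetric point onto the subspace cut by the m constraints.
    rho = size(R,2);
    M = zeros(m, rho^2);
    for j = 1:m
        Bj = R'*Acell{j}*R;                 % Ri' Aj Ri , symmetric
        M(j,:) = Bj(:)';                    % <Z,Bj> = vec(Bj)' vec(Z)
    end
    Z0 = randn(rho);  Z0 = (Z0 + Z0')/2;    % nonzero random symmetric point
    z  = Z0(:) - M'*(pinv(M*M')*(M*Z0(:))); % orthogonal projection onto nullspace
    Zi = reshape(z, rho, rho);              % remains symmetric by construction
    % Zi is nontrivial with probability 1 whenever rho(rho+1)/2 > m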
(281) which characterizes rank ρ of any [sic] extreme point in A ∩ Sⁿ₊ . [273, §2.4] [274]
Proof. Assuming the form of every perturbation matrix is indeed (768), then by (771)
When Zi can only be 0 , then the perturbation is null because an extreme point has been
found; thus
   [ svec(RᵢᵀA₁Rᵢ)  svec(RᵢᵀA₂Rᵢ)  ⋯  svec(RᵢᵀAₘRᵢ) ]^⊥ = 0      (778)
   ⟨ C , X⋆ + Σ_{j=1}^i t⋆ⱼBⱼ ⟩ = ⟨C , X⋆⟩      (779)
j=1
Proof. From Corollary 4.2.3.0.1 we have the necessary and sufficient relationship between
optimal primal and dual solutions under assumption of nonempty primal feasible cone
interior A ∩ intr Sⁿ₊ :
This means R(R1 ) ⊆ N (S ⋆ ) and R(S ⋆ ) ⊆ N (R1T ). From (769) and (772), after 0-padding
Zi for dimensional compatibility, come the sequence:
4.23 This holds because rank of a positive semidefinite matrix in S n is diminished below n by the number
of its 0 eigenvalues (1630), and because a positive semidefinite matrix having one or more 0 eigenvalues
corresponds to a point on the PSD cone boundary (200).
   X⋆ = R₁R₁ᵀ
   X⋆ + t⋆₁B₁ = R₂R₂ᵀ = R₁(I − t⋆₁ψ(Z₁)Z₁)R₁ᵀ
   X⋆ + t⋆₁B₁ + t⋆₂B₂ = R₃R₃ᵀ = R₂(I − t⋆₂ψ(Z₂)Z₂)R₂ᵀ
                      = R₁ √(I − t⋆₁ψ(Z₁)Z₁) (I − t⋆₂ψ(Z₂)Z₂) √(I − t⋆₁ψ(Z₁)Z₁) R₁ᵀ
   ⋮
   X⋆ + Σ_{j=1}^i t⋆ⱼBⱼ = R₁ ( ∏_{j=1}^i √(I − t⋆ⱼψ(Zⱼ)Zⱼ) ) ( ∏_{j=i}^1 √(I − t⋆ⱼψ(Zⱼ)Zⱼ) ) R₁ᵀ ,   i > 0      (781)

where second product counts backwards. Substituting C = svec⁻¹(Aᵀy⋆) + S⋆ from (711),

   ⟨ C , X⋆ + Σ_{j=1}^i t⋆ⱼBⱼ ⟩ = ⟨ svec⁻¹(Aᵀy⋆) + S⋆ , R₁ ( ∏_{j=1}^i √(I − t⋆ⱼψ(Zⱼ)Zⱼ) ) ( ∏_{j=i}^1 √(I − t⋆ⱼψ(Zⱼ)Zⱼ) ) R₁ᵀ ⟩
      = ⟨ Σ_{k=1}^m y⋆ₖAₖ , X⋆ + Σ_{j=1}^i t⋆ⱼBⱼ ⟩      (782)
      = ⟨ Σ_{k=1}^m y⋆ₖAₖ + S⋆ , X⋆ ⟩ = ⟨C , X⋆⟩
   A = [ −1  1  8   1    1
         −3  2  8  1/2  1/3
         −9  4  8  1/4  1/9 ] ∈ R^{m×n} ,   b = [ 1 ; 1/2 ; 1/4 ] ∈ Rᵐ      (783)
   minimize_{X∈S⁵}   tr X
   subject to        X ⪰ 0
                     A δ(X) = b      (784)

that minimizes the 1-norm of the main diagonal; id est, problem (784) is the same as

   minimize_{X∈S⁵}   ‖δ(X)‖₁
   subject to        X ⪰ 0
                     A δ(X) = b      (785)
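A sketch of trace heuristic (784) on data (783), assuming CVX is installed:

    % Trace heuristic (784) on data (783); a sketch assuming CVX.
    A = [ -1 1 8  1   1  ;
          -3 2 8 1/2 1/3 ;
          -9 4 8 1/4 1/9 ];
    b = [1; 1/2; 1/4];
    cvx_begin sdp quiet
        variable X(5,5) symmetric
        minimize( trace(X) )
        subject to
            X >= 0;
            A*diag(X) == b;
    cvx_end
    rank(X, 1e-6)     % numerical rank of the minimum-trace feasible solution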
feasible solution from m = 3 equality constraints. To find a lower rank ρ optimal solution
to (784) (barring combinatorics), we invoke Procedure 4.3.1.0.1:
   find_{Z₁∈S³}   R₁Z₁R₁ᵀ ≠ 0
   subject to     ⟨Z₁ , R₁ᵀAⱼR₁⟩ = 0 ,   j = 1, 2, 3      (788)

   Z₁ = 11ᵀ − I ∈ S³      (789)
Then (rounding)

   B₁ = [ 0       0  0.0247  0  0.1048
          0       0  0       0  0
          0.0247  0  0       0  0.1657
          0       0  0       0  0
          0.1048  0  0.1657  0  0      ]      (790)
4.4.1 perturbation of x⋆
Given affine subset
A = {x ∈ Rn | Ax = b } (153)
where
   A = [ a₁ᵀ ; ⋮ ; aₘᵀ ] ∈ R^{m×n}      (152)

and given any optimal solution x⋆ to LP

   minimize_x    cᵀx
   subject to    x ⪰ 0
                 Ax = b      (710p)
4.24 Contemporary numerical packages for solving semidefinite programs can solve a range of problems
wider than prototype (711). Generally, they do so by transforming a given problem into prototypical form
by introducing new constraints and variables. [12] [455] We are momentarily considering a departure from
the primal prototype that augments the constraint set with linear inequalities.
whose cardinality is not minimal, an extreme point of A ∩ Rn+ (whose primal objective
value (710p) is optimal) would possess reduced cardinality. To reveal such an extreme
point, we posit existence of a set of perturbations to x⋆ (like those in §4.3.1)
{tj βj | tj ∈ R , βj ∈ Rn , j = 1 . . . n} (794)
becomes extreme and optimal. Membership of (795) to affine subset A is guaranteed, for
the i th perturbation, by constraints
hβi , aj i = 0 , j =1 . . . m (796)
where the optimal t⋆j are scalars and where zi is found at each iteration i by solving a
simple feasibility problem:
   find_{zᵢ∈Rⁿ}   zᵢ ∘ xᵢ ≠ 0
   subject to     ⟨zᵢ , aⱼ ∘ xᵢ⟩ = 0 ,   j = 1 ... m      (799)
   x⋆ = x₁
   x⋆ + t⋆₁β₁ = x₂ = (1 − t₁ψ(δ(z₁))z₁) ∘ x₁
   x⋆ + t⋆₁β₁ + t⋆₂β₂ = x₃ = (1 − t₂ψ(δ(z₂))z₂) ∘ x₂ = (1 − t₁ψ(δ(z₁))z₁) ∘ (1 − t₂ψ(δ(z₂))z₂) ∘ x₁
   ⋮
   x⋆ + Σ_{j=1}^i t⋆ⱼβⱼ = ( ∏_{j=1}^i δ(1 − tⱼψ(δ(zⱼ))zⱼ) ) x₁ ,   i > 0      (802)
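Feasibility problem (799) may be solved just as in footnote 4.22, by projecting a random point onto { z : ⟨z , aⱼ ∘ x⟩ = 0 , j = 1 ... m }; a sketch, with A (m × n, rows aⱼᵀ) and current iterate x assumed given:

    % Feasibility problem (799) by closed-form projection of a random point.
    [m, n] = size(A);
    M  = A .* repmat(x', m, 1);          % row j is (aj o x)'
    z0 = randn(n, 1);
    z  = z0 - M'*(pinv(M*M')*(M*z0));    % orthogonal projection onto nullspace of M
    % z o x is nonzero with probability 1 when that nullspace is nontrivial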
The following algorithm locates an optimal extreme point, assuming nontrivial solution:
given any optimal primal solution x⋆
2. maximize_{tᵢ}   tᵢ
   subject to   x⋆ + Σ_{j=1}^{i−1} t⋆ⱼβⱼ + tᵢβᵢ ⪰ 0      (805)
}
4.4.2.0.2 Example. Ax = b .
Cardinality minimization is often at odds with norm minimization because these two
objectives can compete; e.g, §4.2.3.1.1. Yet, prior knowledge of optimal norm objective
value may facilitate a cardinality minimization problem. If optimal solution x⋆ were known
to be binary with particular cardinality ρ , for example, then a linear constraint on the
variable 1T x = ρ might be warranted because ρ = kxk1 for a binary variable. Columns of
this particular A matrix
   A = [ −1  1  8   1    1      0
         −3  2  8  1/2  1/3  1/2−1/3
         −9  4  8  1/4  1/9  1/4−1/9 ] ∈ R^{m×n} ,   b = [ 1 ; 1/2 ; 1/4 ] ∈ Rᵐ      (742)
brings objective cTx (c = 1) down into the constraints. Were cardinality-1 solution found,
feasible x would certainly be binary. Because minimization of cTx is forgone, conditions
for 0-duality gap (312) are unmet; objective value cannot be maintained as in §4.3.3.
2
   xG = [ 2/159 ; 0 ; 5/159 ; 0 ; 121/159 ; 31/159 ]      (808)
Initialize: c = 1 , ρ = 1 , aj , j = 1, 2, 3 (152)(p.246), x⋆ = xG , m = 3 , n = 6.
{
Iteration i=1:
Step 1: x1 = x⋆ .
   find_{z₁∈R⁶}   z₁ ∘ x₁ ≠ 0
   subject to     ⟨z₁ , aⱼ ∘ x₁⟩ = 0 ,   j = 1, 2, 3      (809)

Choose

   z₁ = [ −159/128   1   −159/128   1   1546/3963   31 ]ᵀ      (810)

Then (797)

   β₁ = [ −1/64   0   −5/128   0   19/64   1 ]ᵀ      (811)

Step 2: t⋆₁ = 128/159 . So

   x⋆ ← xG + t⋆₁β₁ = [ 0  0  0  0  1  1 ]ᵀ      (812)
has cardinality ρ ← 2.
}
As illustrated by Example 4.4.2.0.2, cardinality reduction can fail (at (797)) to find a
minimal cardinality solution when x1 has a 0-entry in a minimal cardinality location.
This result instigates search for a new method:
(266), and by definition of extreme point (172) for which no convex combination can produce it: If a least
rank solution were expressible as a convex combination of feasible points, then there could exist feasible
matrices of lesser rank.
Figure 97: (labels: Sⁿ , Sⁿ⊥ , G⋆ , (I−W)G⋆(I−W) , WG⋆W)
This set (95), argument to conv{ } , comprises the extreme points of this Fantope (93).
An optimal solution W to (1892a), that is an extreme point, is known in closed form
(p.539): Given ordered diagonalization G⋆ = QΛQᵀ ∈ Sᴺ₊ (§A.5.1), then direction matrix
W = U⋆U⋆ᵀ is optimal and extreme where U⋆ = Q(: , n+1 : N) ∈ R^{N×N−n}. Eigenvalue
vector λ(W) has 1 in each entry corresponding to the N−n smallest entries of δ(Λ) and
has 0 elsewhere. By (229) (232), polar direction −W can be regarded as pointing toward
the set of all rank-n (or less) positive semidefinite matrices whose nullspace contains that
of G⋆ . For that particular closed-form solution W , consequent to Theobald (p.495),
(confer (852))

   Σ_{i=n+1}^N λ(G⋆)ᵢ = ⟨G⋆ , W⟩ = λ(G⋆)ᵀλ(W) ≥ 0      (816)
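That closed form is a few lines of Matlab; a sketch, with Gstar (N × N, positive semidefinite) and target rank n assumed given:

    % Closed-form direction matrix (§4.5.1.1).
    [Q, D] = eig(Gstar);
    [lam, idx] = sort(diag(D), 'descend');   % ordered diagonalization
    U = Q(:, idx(n+1:end));                  % eigenvectors of N-n smallest eigenvalues
    W = U*U';                                % optimal extreme point of (1892a)
    trace(Gstar*W)                           % = sum(lam(n+1:end)) >= 0 , per (816)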
So that this method, for constraining rank, will not be misconstrued under closed-form
solution W to (1892a): Define (confer (229))

   find          X ∈ Sᴺ
   subject to    A svec X = b
                 X ⪰ 0
                 rank X ≤ n
Figure 98: (confer Figure 114) Trace heuristic can be interpreted as minimization of a hyperplane ∂H = {G | ⟨G , I⟩ = κ} , with normal I , over positive semidefinite cone S²₊ drawn here in isometrically isomorphic R³ . Polar of direction vector W = I points toward origin.
4.5.1.2 convergence
We study convergence to ascertain conditions under which a direction matrix will reveal
a feasible solution G , of rank n or less, to semidefinite program (814). Denote by W ⋆
a particular optimal direction matrix from semidefinite program (1892a) such that (815)
holds (feasible rank G ≤ n found). Then we define global optimality of the iteration (814)
(1892a) to correspond with this vanishing vector inner-product (815) of optimal solutions.
Because this iterative technique for constraining rank is not a projection method, it
can find a rank-n solution G⋆ ((815) will be satisfied) only if at least one exists in the
feasible set of program (814).
4.5. RANK CONSTRAINT BY CONVEX ITERATION 253
   find_{X∈R^{2×3}}   X ≥ 0
   subject to   Z = [ I  X ; Xᵀ  G ] ⪰ 0
                rank Z ≤ 2      (819)
   X = WH      (820)

by solving

   find_{A∈S³, B∈S³, W∈R^{3×2}, H∈R^{2×3}}   W , H
   subject to   Z = [ I   Wᵀ  H ; W   A   X ; Hᵀ  Xᵀ  B ] ⪰ 0
                W ≥ 0
                H ≥ 0
                rank Z ≤ 2      (821)

which follows from the fact, at optimality,

   Z⋆ = [ I ; W ; Hᵀ ] [ I  Wᵀ  H ]      (822)
Use the known closed-form solution for a direction vector Y to regulate rank by convex
iteration; set Z⋆ = QΛQᵀ ∈ S⁸ to an ordered diagonalization and U⋆ = Q(: , 3 : 8) ∈ R^{8×6} ,
then Y = U⋆U⋆ᵀ (§4.5.1.1).
In summary, initialize Y then iterate numerical solution of (convex) semidefinite
program

   minimize_{A∈S³, B∈S³, W∈R^{3×2}, H∈R^{2×3}}   ⟨Z , Y⟩
   subject to   Z = [ I   Wᵀ  H ; W   A   X ; Hᵀ  Xᵀ  B ] ⪰ 0
                W ≥ 0
                H ≥ 0      (823)

with Y = U⋆U⋆ᵀ until convergence (which is to a global optimum, and occurs in very few
iterations for this instance).
the partitioning adaptive. But no adaptation of a partition actually occurs once it has
been determined.
One can reasonably argue that semidefinite programming methods are unnecessary
for localization of small partitions of large sensor networks. [316] [95] In the past, these
nonlinear localization problems were solved algebraically and computed by least squares
solution to hyperbolic equations; called multilateration.4.31 [261] [303] Indeed, practical
contemporary numerical methods for global positioning (GPS) by satellite do not rely on
convex optimization. [329]
Modern distance geometry is inextricably melded with semidefinite programming. The
beauty of semidefinite programming, as relates to localization, lies in convex expression
of classical multilateration: So & Ye showed [363] that the problem of finding unique
solution, to a noiseless nonlinear system describing the common point of intersection of
hyperspheres in real Euclidean vector space, can be expressed as a semidefinite program
via distance geometry.
But the need for SDP methods in Carter & Jin et alii is enigmatic for two more
reasons: 1) guessing solution to a partition whose intersensor measurement data
or connectivity is inadequate for localization by distance geometry, 2) reliance on
complicated and extensive heuristics for partitioning a large network that could instead
be efficiently solved whole by one semidefinite program [256, §3]. While partitions range
in size between 2 and 10 sensors, 5 sensors optimally, heuristics provided are only for
4.31 Multilateration - literally, having many sides; shape of a geometric figure formed by nearly intersecting lines of position. In navigation systems, therefore: Obtaining a fix from multiple lines of position. Multilateration can be regarded as noisy trilateration.
two spatial dimensions (no higher-dimensional heuristics are proposed). For these small
numbers it remains unclarified as to precisely what advantage is gained over traditional
least squares: it is difficult to determine what part of their noise performance is attributable
to SDP and what part is attributable to their heuristic geometry.
Partitioning of large sensor networks is a compromise to rapid growth of SDP
computational intensity with problem size. But when impact of noise on distance
measurement is of most concern, one is averse to a partitioning scheme because noise-effects
vary inversely with problem size. [57, §2.2] (§5.13.2) Since an individual partition’s solution
is not iterated in Carter & Jin and is interdependent with adjoining partitions, we expect
errors to propagate from one partition to the next; the ultimate partition solved, expected
to suffer most.
Heuristics often fail on real-world data because of unanticipated circumstances.
When heuristics fail, generally they are repaired by adding more heuristics. Tenuous
is any presumption, for example, that distance measurement errors have distribution
characterized by circular contours of equal probability about an unknown sensor-location.
(Figure 99) That presumption effectively appears within Carter & Jin’s optimization
problem statement as affine equality constraints relating unknowns to distance
measurements that are corrupted by noise. Yet in most all urban environments, this
measurement noise is more aptly characterized by ellipsoids of varying orientation and
eccentricity as one recedes from a sensor. (Figure 153) Each unknown sensor must
therefore instead be bound to its own particular range of distance, primarily determined
by the terrain.4.32 The nonconvex problem we must instead solve is:
   find          {xᵢ , xⱼ} ,   i, j ∈ I
   subject to    d̲ᵢⱼ ≤ ‖xᵢ − xⱼ‖² ≤ d̄ᵢⱼ      (824)

where xᵢ represents sensor location, and where d̲ᵢⱼ and d̄ᵢⱼ respectively represent lower
and upper bounds on measured distance-square from i th to j th sensor (or from sensor
to anchor). Figure 104 illustrates contours of equal sensor-location uncertainty. By
establishing these individual upper and lower bounds, orientation and eccentricity can
effectively be incorporated into the problem statement.
Generally speaking, there can be no unique solution to the sensor-network localization
problem because there is no unique formulation; that is the art of Optimization. Any
optimal solution obtained depends on whether or how a network is partitioned, whether
distance data is complete, presence of noise, and how the problem is formulated. When
a particular formulation is a convex optimization problem, then the set of all optimal
solutions forms a convex set containing the actual or true localization. Measurement
noise precludes equality constraints representing distance. The optimal solution set is
consequently expanded; necessitated by introduction of distance inequalities admitting
more and higher-rank solutions. Even were the optimal solution set a single point, it is
not necessarily the true localization because there is little hope of exact localization by
any algorithm once significant noise is introduced.
Carter & Jin gauge performance of their heuristics to the SDP formulation of author
Biswas whom they regard as vanguard to the art. [16, §1] Biswas posed localization as an
optimization problem minimizing a distance measure. [51] [49] Intuitively, minimization
of any distance measure yields compacted solutions; (confer §6.7.0.0.1) precisely the
anomaly motivating Carter & Jin. Their two-dimensional heuristics outperformed Biswas’
localizations both in execution-time and proximity to the desired result. Perhaps, instead
of heuristics, Biswas’ approach to localization can be improved: [48] [50].
4.32 A distinct contour map corresponding to each anchor is required in practice.
Figure 100: 2-lattice in R² , hand-drawn. Nodes 3 and 4 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc.
Jin proposes an academic test in two-dimensional real Euclidean space R2 that we adopt.
In essence, this test is a localization of sensors and anchors arranged in a regular triangular
lattice. Lattice connectivity is solely determined by sensor radio range; a connectivity
graph is assumed incomplete. In the interest of test standardization, we propose adoption
of a few small examples: Figure 100 through Figure 103 and their particular connectivity
represented by matrices (825) through (828) respectively.
   [ 0 • ? •
     • 0 • •
     ? • 0 ◦
     • • ◦ 0 ]      (825)
Matrix entries dot • indicate measurable distance between nodes while unknown
distance is denoted by ? (question mark ). Matrix entries hollow dot ◦ represent known
distance between anchors (to high accuracy) while zero distance is denoted 0. Because
measured distances are quite unreliable in practice, our solution to the localization problem
substitutes a distinct range of possible distance for each measurable distance; equality
constraints exist only for anchors.
Anchors are chosen so as to increase difficulty for algorithms dependent on existence
of sensors in their convex hull. The challenge is to find a solution in two dimensions close
to the true sensor positions given incomplete noisy intersensor distance information.
Figure 101: 3-lattice in R² , hand-drawn. Nodes 7, 8, and 9 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc.
0 • • ? • ? ? • •
• 0 • • ? • ? • •
• • 0 • • • • • •
? • • 0 ? • • • •
• ? • ? 0 • • • • (826)
? • • • • 0 • • •
? ? • • • • 0 ◦ ◦
• • • • • • ◦ 0 ◦
• • • • • • ◦ ◦ 0
Figure 102: 4-lattice in R² , hand-drawn. Nodes 13, 14, 15, and 16 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc.
   [ 0 ? ? • ? ? • ? ? ? ? ? ? ? • •
     ? 0 • • • • ? • ? ? ? ? ? • • •
     ? • 0 ? • • ? ? • ? ? ? ? ? • •
     • • ? 0 • ? • • ? • ? ? • • • •
     ? • • • 0 • ? • • ? • • • • • •
     ? • • ? • 0 ? • • ? • • ? ? ? ?
     • ? ? • ? ? 0 ? ? • ? ? • • • •
     ? • ? • • • ? 0 • • • • • • • •
     ? ? • ? • • ? • 0 ? • • • ? • ?
     ? ? ? • ? ? • • ? 0 • ? • • • ?
     ? ? ? ? • • ? • • • 0 • • • • ?
     ? ? ? ? • • ? • • ? • 0 ? ? ? ?
     ? ? ? • • ? • • • • • ? 0 ◦ ◦ ◦
     ? • ? • • ? • • ? • • ? ◦ 0 ◦ ◦
     • • • • • ? • • • • • ? ◦ ◦ 0 ◦
     • • • • • ? • • ? ? ? ? ◦ ◦ ◦ 0 ]      (827)
Figure 103: 5-lattice in R² , hand-drawn. Nodes 21 through 25 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc.
0 • ? ? • • ? ? • ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
• 0 ? ? • • ? ? ? • ? ? ? ? ? ? ? ? ? ? ? ? ? • •
? ? 0 • ? • • • ? ? • • ? ? ? ? ? ? ? ? ? ? • • •
? ? • 0 ? ? • • ? ? ? • ? ? ? ? ? ? ? ? ? ? ? • ?
• • ? ? 0 • ? ? • • ? ? • • ? ? • ? ? ? ? ? • ? •
• • • ? • 0 • ? • • • ? ? • ? ? ? ? ? ? ? ? • • •
? ? • • ? • 0 • ? ? • • ? ? • • ? ? ? ? ? ? • • •
? ? • • ? ? • 0 ? ? • • ? ? • • ? ? ? ? ? ? ? • ?
• ? ? ? • • ? ? 0 • ? ? • • ? ? • • ? ? ? ? ? ? ?
? • ? ? • • ? ? • 0 • ? • • ? ? ? • ? ? • • • • •
? ? • ? ? • • • ? • 0 • ? • • • ? ? • ? ? • • • •
? ? • • ? ? • • ? ? • 0 ? ? • • ? ? • • ? • • • ?
? ? ? ? • ? ? ? • • ? ? 0 • ? ? • • ? ? • • ? ? ? (828)
? ? ? ? • • ? ? • • • ? • 0 • ? • • • ? • • • • ?
? ? ? ? ? ? • • ? ? • • ? • 0 • ? ? • • • • • • ?
? ? ? ? ? ? • • ? ? • • ? ? • 0 ? ? • • ? • ? ? ?
? ? ? ? • ? ? ? • ? ? ? • • ? ? 0 • ? ? • ? ? ? ?
? ? ? ? ? ? ? ? • • ? ? • • ? ? • 0 • ? • • • ? ?
? ? ? ? ? ? ? ? ? ? • • ? • • • ? • 0 • • • • ? ?
? ? ? ? ? ? ? ? ? ? ? • ? ? • • ? ? • 0 • • ? ? ?
? ? ? ? ? ? ? ? ? • ? ? • • • ? • • • • 0 ◦ ◦ ◦ ◦
? ? ? ? ? ? ? ? ? • • • • • • • ? • • • ◦ 0 ◦ ◦ ◦
? ? • ? • • • ? ? • • • ? • • ? ? • • ? ◦ ◦ 0 ◦ ◦
? • • • ? • • • ? • • • ? • • ? ? ? ? ? ◦ ◦ ◦ 0 ◦
? • • ? • • • ? ? • • ? ? ? ? ? ? ? ? ? ◦ ◦ ◦ ◦ 0
Figure 104: Location uncertainty ellipsoid in R² for each of 15 sensors • within three city blocks (Market St.) in downtown San Francisco. (Data by Polaris Wireless.)
problem statement
Ascribe points in a list {xℓ ∈ Rn , ℓ = 1 . . . N } to the columns of a matrix X ;
X = [ x1 · · · xN ] ∈ Rn×N (79)
   G = XᵀX = [ ‖x₁‖²    x₁ᵀx₂   x₁ᵀx₃   ⋯   x₁ᵀx_N
               x₂ᵀx₁    ‖x₂‖²   x₂ᵀx₃   ⋯   x₂ᵀx_N
               x₃ᵀx₁    x₃ᵀx₂   ‖x₃‖²   ⋱   x₃ᵀx_N
                 ⋮        ⋮       ⋱     ⋱     ⋮
               x_Nᵀx₁   x_Nᵀx₂  x_Nᵀx₃  ⋯   ‖x_N‖² ] ∈ Sᴺ₊      (1058)
where SN+ is the convex cone of N × N positive semidefinite matrices in the symmetric
matrix subspace SN .
Existence of noise precludes measured distance from the input data. We instead assign
measured distance to a range estimate specified by individual upper and lower bounds: d̄ᵢⱼ
is an upper bound on distance-square from i th to j th sensor, while d̲ᵢⱼ is a lower bound.
These bounds become the input data. Each measurement range is presumed different from
the others because of measurement uncertainty; e.g, Figure 104.
Our mathematical treatment of anchors and sensors is not dichotomized.4.33 A sensor
position that is known a priori to high accuracy (with absolute certainty) x̌i is called an
anchor. Then the sensor-network localization problem (824) can be expressed equivalently:
Given a number m of anchors and a set of indices I (corresponding to all measurable
distances • ), for 0 < n < N
4.33 Wireless location problem thus stated identically; difference being: fewer sensors.
262 CHAPTER 4. SEMIDEFINITE PROGRAMMING
   find_{G∈Sᴺ, X∈R^{n×N}}   X
   subject to   d̲ᵢⱼ ≤ ⟨G , (eᵢ − eⱼ)(eᵢ − eⱼ)ᵀ⟩ ≤ d̄ᵢⱼ   ∀(i, j) ∈ I
                ⟨G , eᵢeᵢᵀ⟩ = ‖x̌ᵢ‖² ,   i = N−m+1 ... N
                ⟨G , (eᵢeⱼᵀ + eⱼeᵢᵀ)/2⟩ = x̌ᵢᵀx̌ⱼ ,   i < j ,  ∀ i, j ∈ {N−m+1 ... N}
                X(: , N−m+1 : N) = [ x̌_{N−m+1} ⋯ x̌_N ]
                Z = [ I  X ; Xᵀ  G ] ⪰ 0
                rank Z = n      (829)
   [ I  X ; Xᵀ  G ] = [ I ; Xᵀ ] [ I  X ]      (1099)

The rank constraint insures this equality holds, by Theorem A.4.0.1.3, thus restricting
solution to Rⁿ . Assuming full-rank solution (list) X
   minimize_{G∈Sᴺ, X∈R^{n×N}}   ⟨Z , W⟩
   subject to   d̲ᵢⱼ ≤ ⟨G , (eᵢ − eⱼ)(eᵢ − eⱼ)ᵀ⟩ ≤ d̄ᵢⱼ   ∀(i, j) ∈ I
                ⟨G , eᵢeᵢᵀ⟩ = ‖x̌ᵢ‖² ,   i = N−m+1 ... N
                ⟨G , (eᵢeⱼᵀ + eⱼeᵢᵀ)/2⟩ = x̌ᵢᵀx̌ⱼ ,   i < j ,  ∀ i, j ∈ {N−m+1 ... N}
                X(: , N−m+1 : N) = [ x̌_{N−m+1} ⋯ x̌_N ]
                Z = [ I  X ; Xᵀ  G ] ⪰ 0      (831)
4.34 an intersection of two parallel but opposing halfspaces (Figure 13). In terms of position X , this
distance slab can be thought of as a thick hypershell instead of a hypersphere boundary.
Figure 105: Typical solution for 2-lattice in Figure 100 with noise factor η = 0.1 (834).
Two red rightmost nodes are anchors; two remaining nodes are sensors. Radio range of
sensor 1 indicated by arc; radius = 1.14 . Actual sensor indicated by target # while its
localization is indicated by bullet • . Rank-2 solution found in 1 iteration (831) (1892a)
subject to reflection error.
   tr Z = ⟨Z , I⟩   ←   ⟨Z , W⟩      (832)

a generalization of the trace heuristic for minimizing convex envelope of rank, where
W ∈ S^{N+n}₊ is constant with respect to (831). Matrix W is normal to a hyperplane in
S^{N+n} minimized over a convex feasible set specified by the constraints in (831). Matrix
W is chosen so −W points in direction of rank-n feasible solutions G . For properly
chosen W , problem (831) becomes an equivalent to (829). Thus the purpose of vector
inner-product objective (832) is to locate a rank-n feasible Gram matrix assumed existent
on the boundary of positive semidefinite cone Sᴺ₊ , as explained beginning in §4.5.1; how
to choose direction vector W is explained there and in what follows:
direction matrix W
Denote by Z⋆ an optimal composite matrix from semidefinite program (831). Then
for Z⋆ ∈ S^{N+n} whose eigenvalues λ(Z⋆) ∈ R^{N+n} are arranged in nonincreasing order,
(Ky Fan)

   Σ_{i=n+1}^{N+n} λ(Z⋆)ᵢ = minimize_{W∈S^{N+n}}   ⟨Z⋆ , W⟩
                            subject to   0 ⪯ W ⪯ I
                                         tr W = N      (1892a)

which has an optimal solution that is known in closed form (p.539, §4.5.1.1). This
eigenvalue sum is zero when Z⋆ has rank n or less.
Foreknowledge of optimal Z⋆ , to make possible this search for W , implies iteration;
id est, semidefinite program (831) is solved for Z⋆ initializing W = I or W = 0. Once
found, Z⋆ becomes constant in semidefinite program (1892a) where a new normal direction
W is found as its optimal solution. Then this cycle (831) (1892a) iterates until convergence.
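Schematically, the cycle is a short loop; a sketch, where solve_831 is a hypothetical helper returning an optimal Z⋆ of SDP (831) for fixed direction matrix W:

    % Convex-iteration cycle (831)(1892a); a schematic sketch.
    W = eye(N + n);                            % initialize W = I (or W = 0)
    for iter = 1:50
        Z = solve_831(W);                      % SDP (831): minimize <Z,W>
        [Q, D] = eig(Z);
        [lam, idx] = sort(diag(D), 'descend');
        U = Q(:, idx(n+1:end));
        W = U*U';                              % closed-form optimum of (1892a)
        if sum(lam(n+1:end)) < 1e-9, break, end   % (815): rank Z <= n attained
    end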
Figure 106: Typical solution for 3-lattice in Figure 101 with noise factor η = 0.1 (834).
Three red vertical middle nodes are anchors; remaining nodes are sensors. Radio range of
sensor 1 indicated by arc; radius = 1.12 . Actual sensor indicated by target # while its
localization is indicated by bullet • . Rank-2 solution found in 2 iterations (831) (1892a).
Figure 107: Typical solution for 4-lattice in Figure 102 with noise factor η = 0.1 (834).
Four red vertical middle-left nodes are anchors; remaining nodes are sensors. Radio range
of sensor 1 indicated by arc; radius = 0.75 . Actual sensor indicated by target # while its
localization is indicated by bullet • . Rank-2 solution found in 7 iterations (831) (1892a).
Figure 108: Typical solution for 5-lattice in Figure 103 with noise factor η = 0.1 (834).
Five red vertical middle nodes are anchors; remaining nodes are sensors. Radio range of
sensor 1 indicated by arc; radius = 0.56 . Actual sensor indicated by target # while its
localization is indicated by bullet • . Rank-2 solution found in 3 iterations (831) (1892a).
Figure 109: Typical solution for 10-lattice with noise factor η = 0.1 (834) compares better
than Carter & Jin [79, fig.4.2]. Ten red vertical middle nodes are anchors; the rest are
sensors. Radio range of sensor 1 indicated by arc; radius = 0.25 . Actual sensor indicated
by target # while its localization is indicated by bullet • . Rank-2 solution found in 5
iterations (831) (1892a).
Figure 110: Typical localization of 100 randomized noiseless sensors (η = 0 (834)) is exact
despite incomplete EDM. Ten red vertical middle nodes are anchors; remaining nodes are
sensors. Radio range of sensor at origin indicated by arc; radius = 0.25 . Actual sensor
indicated by target # while its localization is indicated by bullet • . Rank-2 solution
found in 3 iterations (831) (1892a).
When rank Z ⋆ = n , solution via this convex iteration solves sensor-network localization
problem (824) and its equivalent (829).
numerical solution
In all examples to follow, number of anchors

   m = √N      (833)

equals square root of cardinality N of list X . Indices set I identifying all measurable
distances • is ascertained from connectivity matrix (825), (826), (827), or (828). We
solve iteration (831) (1892a) in dimension n = 2 for each respective example illustrated
in Figure 100 through Figure 103.
In presence of negligible noise, true position is reliably localized for every standardized
example; noteworthy insofar as each example represents an incomplete graph. This implies
that the set of all optimal solutions having least rank must be small.
To make the examples interesting and consistent with previous work, we randomize
each range of distance-square that bounds hG , (ei − ej )(ei − ej )T i in (831); id est, for each
and every (i, j) ∈ I

   d̄ᵢⱼ = dᵢⱼ (1 + √3 η χₗ)²
   d̲ᵢⱼ = dᵢⱼ (1 − √3 η χₗ₊₁)²      (834)

where η = 0.1 is a constant noise factor, χₗ is the l th sample of a noise process realization
uniformly distributed in the interval (0 , 1) like rand(1) from Matlab, and dᵢⱼ is actual
distance-square from i th to j th sensor. Because of distinct function calls to rand() , each
range of distance-square [ d̲ᵢⱼ , d̄ᵢⱼ ] is not necessarily centered on actual distance-square
dᵢⱼ . Unit stochastic variance is provided by factor √3.
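A sketch of noise model (834), with d assumed to hold the actual distance-squares for all measurable pairs:

    % Randomized distance-square bounds (834).
    eta  = 0.1;                                       % constant noise factor
    dbar = d .* (1 + sqrt(3)*eta*rand(size(d))).^2;   % upper bounds
    dlow = d .* (1 - sqrt(3)*eta*rand(size(d))).^2;   % lower bounds (distinct rand call)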
Figure 105 through Figure 108 each illustrate one realization of numerical solution to
the standardized lattice problems posed by Figure 100 through Figure 103 respectively.
Exact localization, by any method, is impossible because of measurement noise. Certainly,
by inspection of their published graphical data, our results are better than those of
Carter & Jin. (Figure 109, 110, 111) Obviously our solutions do not suffer from those
Figure 111: Typical solution for 100 randomized sensors with noise factor η = 0.1 (834);
worst measured average sensor error ≈ 0.0044 compares better than Carter & Jin’s 0.0154
computed in 0.71s [79, p.19]. Ten red vertical middle nodes are anchors; same as before.
Remaining nodes are sensors. Interior anchor placement makes localization difficult. Radio
range of sensor at origin indicated by arc; radius = 0.25 . Actual sensor indicated by target
# while its localization is indicated by bullet • . After 1 iteration rank G = 92 , after 2
iterations rank G = 4. Rank-2 solution found in 3 iterations (831) (1892a). (Regular
lattice in Figure 109 is actually harder to solve, requiring more iterations.) Runtime for
SDPT3 [395] under cvx [195] is a few minutes on 2009 vintage laptop Core 2 Duo CPU
(Intel T6400@2GHz, 800MHz FSB).
   H(ω) = (1 + b₁ω + b₂ω² + ⋯ + b₈ω⁸) / (1 + a₁ω + a₂ω² + ⋯ + a₈ω⁸)      (677)
   find_{G∈S⁹}   b ∈ R⁸
   subject to    A svec G = v⋆
                 [ 1 ; b ] = G(: , 1)
                 b ⪰ 0
                 (G ⪰ 0)
                 rank G = 1      (837)
by substitution ω ← s .
Figure 112: Spectral factorization levels: one 8 th order Laplace transfer function at level 1; two 4 th order factors at level 2; four 2 nd order factors at level 3.
with vectorized Äi (as in §4.1.1), sums (839) are succinctly represented by two linear
equalities A svec G(v̈) = v ⋆ and A svec G(ü) = u⋆ . Then this spectral factorization in v̈
may be posed as a feasibility problem
   find_{G∈S⁹}   v̈ ∈ R⁸
   subject to    A svec G = v⋆
                 [ 1 ; v̈ ] = G(: , 1)
                 v̈ ⪰ 0
                 (G ⪰ 0)
                 rank G = 1      (841)

Having found two 8 th order square spectral factors in nonnegative v̈⋆ from (841), two
pairs of 4 th order level 3 factors remain to be found:

   (1 + v̈₁⋆ω² + v̈₂⋆ω⁴ + v̈₃⋆ω⁶ + v̈₄⋆ω⁸)/(1 + ü₁⋆ω² + ü₂⋆ω⁴ + ü₃⋆ω⁶ + ü₄⋆ω⁸)
      = (1 + v⃛₁ω² + v⃛₂ω⁴)/(1 + u⃛₁ω² + u⃛₂ω⁴) · (1 + v⃛₃ω² + v⃛₄ω⁴)/(1 + u⃛₃ω² + u⃛₄ω⁴)      (842)

   (1 + v̈₅⋆ω² + v̈₆⋆ω⁴ + v̈₇⋆ω⁶ + v̈₈⋆ω⁸)/(1 + ü₅⋆ω² + ü₆⋆ω⁴ + ü₇⋆ω⁶ + ü₈⋆ω⁸)
      = (1 + v⃛₅ω² + v⃛₆ω⁴)/(1 + u⃛₅ω² + u⃛₆ω⁴) · (1 + v⃛₇ω² + v⃛₈ω⁴)/(1 + u⃛₇ω² + u⃛₈ω⁴)      (843)
Figure 113: (labels: ⟨Z , W⟩ , rank Z , wc , w , f(Z)) Regularization curve, parametrized by weight w for real convex objective f minimization (848) with rank constraint to k by convex iteration, illustrates discontinuity in f .
then all level 3 (Figure 112) nonnegative spectral factorization coefficients v⃛ are found
at once by solving

   find_{G∈S⁹}   v⃛ ∈ R⁸
   subject to    A svec G = v̈⋆
                 [ 1 ; v⃛ ] = G(: , 1)
                 v⃛ ⪰ 0
                 (G ⪰ 0)
                 rank G = 1      (847)

The feasibility problem to find u⃛ is similar. All second-order Laplace transfer function
coefficients can be found via (679). 2
4.5.2 regularization
We test the convex iteration technique, for constraining rank, over a wide range of problems
beyond localization of randomized positions (Figure 111); e.g, stress (§7.2.2.7.1), ball
packing (§5.4.2.2.6), and cardinality (§4.7). We have had some success introducing the
direction matrix inner-product (832) as a regularization term4.37
   minimize_{W∈Sᴺ}   f(Z⋆) + w⟨Z⋆ , W⟩
   subject to        0 ⪯ W ⪯ I
                     tr W = N − n      (849)
whose purpose is to constrain rank, affine dimension, or cardinality:
The abstraction, that is Figure 113, is a synopsis; a broad generalization of
accumulated empirical evidence: There exists a critical (smallest) weight wc for which a
rank constraint is just met. Graphical discontinuity can subsequently exist when there is a
range of greater w providing required rank k but not necessarily increasing a minimization
objective function f ; e.g, §4.7.0.0.2. Positive scalar w is chosen via bisection so that
⟨Z⋆ , W⋆⟩ just vanishes.
   find          x ∈ Rⁿ
   subject to    Ax = b
                 x ⪰ 0
                 ‖x‖₀ ≤ k      (541)
where ‖x‖₀ ≤ k means4.38 vector x has at most k nonzero entries; such a vector is
presumed existent in the feasible set. Nonnegativity constraint x ⪰ 0 is analogous to
positive semidefiniteness; the notation means vector x belongs to the nonnegative orthant
Rⁿ₊ . Cardinality is quasiconcave on Rⁿ₊ just as rank is quasiconcave on Sⁿ₊ . [66, §3.4.2]
   minimize_{x∈Rⁿ}   ⟨x , y⟩
   subject to        Ax = b
                     x ⪰ 0      (160)

   Σ_{i=k+1}^n π(x⋆)ᵢ = minimize_{y∈Rⁿ}   ⟨x⋆ , y⟩
                        subject to   0 ⪯ y ⪯ 1
                                     yᵀ1 = n − k      (536)
4.38 Although it is a metric (§5.2), cardinality kxk0 cannot be a norm (§3.2) because it is not positively
homogeneous.
Figure 114: (confer Figure 98) 1-norm heuristic for cardinality minimization can be interpreted as minimization of a hyperplane ∂H = {x | ⟨x , 1⟩ = κ} , with normal 1 , over nonnegative orthant R³₊ drawn here in R³ . Polar of direction vector y = 1 points toward origin.
where π is the (nonincreasing) presorting function (1487). This sequence is iterated until
x⋆Ty ⋆ vanishes; id est, until desired cardinality is achieved. But this global optimality
cannot be guaranteed.4.39
Problem (536) is analogous to the rank constraint problem; (p.250)
   Σ_{i=k+1}^N λ(G⋆)ᵢ = minimize_{W∈Sᴺ}   ⟨G⋆ , W⟩
                        subject to   0 ⪯ W ⪯ I
                                     tr W = N − k      (1892a)
The feasible set of (536) is Linear Program’s analogue to Fantope (§2.3.2.0.1); its optimal
subset comprises a sum of n− k smallest entries from vector x . In context of problem
(541), we want n− k entries of x to sum to zero; id est, we want a globally optimal
objective x⋆Ty ⋆ to vanish: more generally, (confer (815))
   Σ_{i=k+1}^n π(|x⋆|)ᵢ = ⟨|x⋆| , y⋆⟩ = |x⋆|ᵀy⋆ ≜ 0      (850)
defines global optimality for the iteration. Then n− k entries of x⋆ are themselves zero
whenever their absolute sum is, and cardinality of x⋆ ∈ Rn is at most k . Optimal direction
vector y ⋆ is defined as any nonnegative vector for which
   find          x ∈ Rⁿ                              minimize_{x∈Rⁿ}   ⟨x , y⋆⟩
   subject to    Ax = b           (541)    ≡         subject to   Ax = b           (160)
                 x ⪰ 0                                            x ⪰ 0
                 ‖x‖₀ ≤ k
This set, argument to conv{ } , comprises the extreme points of set (851) which is a
nonnegative hypercube slice. An optimal solution y to (536), that is an extreme point
of its feasible set, is known in closed form: it has 1 in each entry corresponding to the
n− k smallest entries of x⋆ and has 0 elsewhere. That particular polar direction −y can
be interpreted4.40 (by Proposition 7.1.3.0.3) as pointing toward the nonnegative orthant
in the Cartesian subspace, whose basis is a subset of the Cartesian axes, containing all
cardinality k (or less) vectors having the same ordering as x⋆ . Consequently, for that
closed-form solution, (confer (816))

   Σ_{i=k+1}^n π(|x⋆|)ᵢ = ⟨|x⋆| , y⟩ = |x⋆|ᵀy ≥ 0      (852)
Convex iteration (160) (536) always converges to a locally optimal solution, a fixed point
of possibly infeasible cardinality, by virtue of a monotonically nonincreasing real objective
sequence. [294, §1.2] [44, §1.1] There can be no proof of global optimality, defined by (850).
Constraining cardinality (solution to problem (541)) can often be achieved, but simple
examples can be contrived that stall at a fixed point of infeasible cardinality; at a positive
objective value ⟨x⋆, y⟩ = τ > 0. Direction vector y is then manipulated, as countermeasure,
to steer out of local minima; e.g., complete randomization as in Example 4.6.1.5.1, or
reinitialization to a random cardinality-(n−k) vector in the same nonnegative orthant
face demanded by the current iterate: y has nonnegative uniformly distributed random
entries in (0 , 1] corresponding to the n−k smallest entries of x⋆ and has 0 elsewhere.
Zero entries behave like memory or state while randomness greatly diminishes likelihood
of a stall. When this particular heuristic is successful, cardinality and objective sequence
⟨x⋆, y⟩ versus iteration are characterized by noisy monotonicity.
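A sketch of that reinitialization heuristic in Matlab (xstar from the latest iterate):

    [~, idx] = sort(xstar, 'descend');
    y = zeros(n, 1);
    y(idx(k+1:n)) = 1 - rand(n-k, 1);   % uniform on (0,1]: zeros act as memory,
                                        % randomness perturbs remaining entries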
4.40 Convex iteration (160) (536) is not a projection method because there is no thresholding or discard of
variable-vector x entries. An optimal direction vector y must always reside on the feasible set boundary
in (536) page 273; id est, it is ill-advised to attempt simultaneous optimization of variables x and y .
where convex k-largest norm ‖x‖ⁿₖ is monotonic on Rⁿ₊ . There we showed how (541) is
equivalently stated in terms of gradients

minimize_{x∈Rⁿ}   ⟨x , ∇‖x‖₁ − ∇‖x‖ⁿₖ⟩
subject to   Ax = b                           (853)
             x ⪰ 0

because

‖x‖₁ = xᵀ∇‖x‖₁ ,   ‖x‖ⁿₖ = xᵀ∇‖x‖ⁿₖ ,   x ⪰ 0        (854)
The objective function from (853) is a directional derivative (at x in direction x , §D.1.6,
confer §D.1.4.1.1) of the objective function from (541) while the direction vector of convex
iteration

y = ∇‖x‖₁ − ∇‖x‖ⁿₖ        (855)

is an objective gradient where ∇‖x‖₁ = ∇1ᵀx = 1 under nonnegativity and

∇‖x‖ⁿₖ = ∇zᵀx = arg maximize_{z∈Rⁿ}   zᵀx
                subject to   0 ⪯ z ⪯ 1 ,   x ⪰ 0        (544)
                             zᵀ1 = k

y = 1 − arg maximize_{z∈Rⁿ}   zᵀx    ←    arg minimize_{z∈Rⁿ}   zᵀx
           subject to   0 ⪯ z ⪯ 1             subject to   0 ⪯ z ⪯ 1        (536)
                        zᵀ1 = k                            zᵀ1 = n − k
x⋆ ⪰ 0                                                  (1)
Ax⋆ = b                                                 (2)
∇‖x⋆‖₁ − ∇‖x⋆‖ⁿₖ + Aᵀν⋆ ⪰ 0                             (3)        (856)
⟨∇‖x⋆‖₁ − ∇‖x⋆‖ⁿₖ + Aᵀν⋆ , x⋆⟩ = 0                      (4L)
These conditions must hold at any optimal solution (locally or globally). By (854), the
fourth condition is identical to

‖x⋆‖₁ − ‖x⋆‖ⁿₖ + ν⋆ᵀAx⋆ = 0   (4L)        (857)

Because a 1-norm

‖x‖₁ = ‖x‖ⁿₖ + ‖π(|x|)ₖ₊₁:ₙ‖₁        (858)
[Figure 115 plot: m/k versus k/n ; curves for the Donoho bound, its approximation
(− − −), and the x > 0 constrained case (· · ·); region below the curve marked hard.
Inset problem (529): minimize_x ‖x‖₁ subject to Ax = b .]
Figure 115: (confer Figure 76) For Gaussian random matrix A ∈ Rᵐˣⁿ , graph illustrates
Donoho/Tanner least lower bound on number of measurements m below which recovery
of a k-sparse n-length signal x by linear programming fails with overwhelming probability.
Hard problems are below the curve, but not the converse; id est, failure above depends on
proximity. Inequality demarcates approximation (− − −) to empirical phase transition
from [25]. Problems having a nonnegativity constraint (· · ·) are easier to solve. [141] [142]
Stalling is not an inevitable behavior. For some problem types (beyond mere Ax = b),
convex iteration succeeds nearly all the time. Here is a cardinality problem, with noise,
whose statement is just a bit more intricate but easy to solve in a few convex iterations:
s = Ψz ∈ Rⁿ        (862)
whose upper bound on DCT basis coefficient cardinality card z ≤ k is assumed known;4.41
hence a critical assumption: transmitted signal s is sparsely supported (k < n) on the DCT
basis. It is further assumed that nonzero signal coefficients in vector z place each chosen
basis vector above the noise floor.
4.41 This simplifies exposition, although it may be an unrealistic assumption in many applications.
We also assume that the gap’s beginning and ending in time are precisely localized to
within a sample; id est, index ℓ locates the last sample prior to the gap’s onset, while
index n−ℓ+1 locates the first sample subsequent to the gap: for rectangularly windowed
received signal g possessing a time-gap loss and additive noise η ∈ Rⁿ

g = ⎡ s₁:ℓ + η₁:ℓ         ⎤
    ⎢ ηℓ₊₁:ₙ₋ℓ            ⎥ ∈ Rⁿ        (863)
    ⎣ sₙ₋ℓ₊₁:ₙ + ηₙ₋ℓ₊₁:ₙ ⎦

The window is thereby centered on the gap and short enough so that the DCT spectrum
of signal s can be assumed static over the window’s duration n . Signal to noise ratio
within this window is defined

SNR ≜ 20 log ( ‖[ s₁:ℓ ; sₙ₋ℓ₊₁:ₙ ]‖ / ‖η‖ )        (864)
In absence of noise, knowing the signal DCT basis and having a good estimate of basis
coefficient cardinality makes perfectly reconstructing gap-loss easy: it amounts to solving
a linear system of equations and requires little or no optimization; with the caveat that the
number of equations exceeds cardinality of signal representation (roughly ℓ ≥ k) with respect
to the DCT basis.
But addition of a significant amount of noise η increases the level of difficulty dramatically;
a 1-norm based method of reducing cardinality, for example, almost always returns
DCT basis coefficients numbering in excess of minimal cardinality. We speculate that this is
because signal cardinality 2ℓ becomes the predominant cardinality. DCT basis coefficient
cardinality is an explicit constraint of the optimization problem we shall pose: In presence
of noise, constraints equating reconstructed signal f to received signal g are not possible.
We can instead formulate the dropout recovery problem as a best approximation:
We can instead formulate the dropout recovery problem as a best approximation:
°· ¸°
° f1:ℓ − g1:ℓ °
minimize °
° fn−ℓ+1:n − gn−ℓ+1:n °
°
x∈ Rn
subject to f = Ψ x (865)
xº0
card x ≤ k
We propose solving this nonconvex problem (865) by moving the cardinality constraint to
the objective as a regularization term as explained in §4.6 (p.273); id est, by iteration of
two convex problems until convergence:
minimize_{x∈Rⁿ}   ⟨x , y⟩ + ‖[ f₁:ℓ − g₁:ℓ ; fₙ₋ℓ₊₁:ₙ − gₙ₋ℓ₊₁:ₙ ]‖
subject to   f = Ψx                           (866)
             x ⪰ 0

and

minimize_{y∈Rⁿ}   ⟨x⋆ , y⟩
subject to   0 ⪯ y ⪯ 1                        (536)
             yᵀ1 = n − k
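One pass of this pair, sketched in CVX under Matlab (DCT basis Psi, received signal g ,
gap index ell , and bound k assumed given; direction vector y carries over from the
previous pass, initially 1 ; (536) is solved in its known closed form):

    I1 = 1:ell;  I2 = n-ell+1:n;             % the windowed (known) samples
    cvx_begin quiet                          % problem (866)
        variables x(n) f(n)
        minimize( x'*y + norm([f(I1) - g(I1); f(I2) - g(I2)]) )
        subject to
            f == Psi*x
            x >= 0
    cvx_end
    [~, idx] = sort(x, 'descend');           % problem (536), closed form
    y = zeros(n, 1);  y(idx(k+1:n)) = 1;
    % iterate until x'*y vanishes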
Signal cardinality 2ℓ is implicit to the problem statement. When number of samples in
the dropout region exceeds half the window size, then that deficient cardinality of signal
remaining becomes a source of degradation to reconstruction in presence of noise. Thus, by
observation, we divine a reconstruction rule for this signal dropout problem to attain good
noise suppression: ℓ must exceed a maximum of cardinality bounds; 2ℓ ≥ max{2k , n/2}.
Figure 116: (a) Signal dropout in signal s corrupted by noise η (SNR = 10dB, g = s + η).
Flatline indicates duration of signal dropout. (b) Reconstructed signal f (red) overlaid
with corrupted signal g .
Figure 117: (a) Error signal power (reconstruction f less original noiseless signal s) is
36dB below s . (b) Original signal s overlaid with reconstruction f (red) from signal g
having dropout plus noise.
Figure 116 and Figure 117 show one realization of this dropout problem. Original
signal s is created by adding four (k = 4) randomly selected DCT basis vectors, from
Ψ (n = 500 in this example), whose amplitudes are randomly selected from a uniform
distribution above the noise floor; in the interval [10^(−10/20) , 1]. Then a 240-sample dropout
is realized (ℓ = 130) and Gaussian noise η added to make corrupted signal g (from which
a best approximation f will be made) having 10dB signal to noise ratio (864). The time
gap contains much noise, as apparent from Figure 116a. But in only a few iterations (866)
(536), original signal s is recovered with relative error power 36dB down; illustrated in
Figure 117. Correct cardinality is also recovered (card x = card z) along with the basis
vector indices used to make original signal s . Approximation error is due to DCT basis
coefficient estimate error. When this experiment is repeated 1000 times on noisy signals
averaging 10dB SNR, the correct cardinality and indices are recovered 99% of the time
with average relative error power 30dB down. Without noise, we get perfect reconstruction
in one iteration. [435, Matlab code]  □
It is well known that cardinality problem (541) (p.180) is easier to solve by linear
programming when variable x is constrained to be nonnegative than when it is not. We
postulate a simple geometrical explanation:
Figure 75 illustrates 1-norm ball B₁ in R³ and affine subset A = {x ∈ R³ | Ax = b}.
The prototypical compressed sensing problem, for A ∈ Rᵐˣⁿ

minimize_x   ‖x‖₁
subject to   Ax = b                           (529)

cS = {[ I ∈ Rⁿˣⁿ  0 ∈ Rⁿ ] a | aᵀ1 = c , a ⪰ 0} = {x | x ⪰ 0 , 1ᵀx ≤ c}        (867)

Nonnegative simplex S is the convex hull of its vertices. All n+1 vertices of S are
constituted by standard basis vectors and the origin. In other words, all its nonzero
extreme points are cardinality-1.
Affine subset A kisses nonnegative simplex c⋆ S at optimality of (534). A kissing point
is achieved at x⋆ for optimal c⋆ as B1 or S contracts. Whereas 1-norm ball B1 has
only six vertices in R3 corresponding to cardinality-1 solutions, simplex S has three edges
(along the Cartesian axes) containing an infinity of cardinality-1 solutions. And whereas
B1 has twelve edges containing cardinality-2 solutions, S has three (out of total four)
facets constituting cardinality-2 solutions. In other words, likelihood of a low-cardinality
solution is higher by kissing nonnegative simplex S (534) than by kissing 1-norm ball B1
(529) because facial dimension (corresponding to given cardinality) is higher in S .
Empirically, this observation also holds in other Euclidean dimensions; e.g., Figure 76,
Figure 115.
[Figure 118 labels: R³ ; cS = {x | x ⪰ 0 , 1ᵀx ≤ c} ; A = {x ∈ R³ | Ax = b} ;
direction y ; face F .]
Figure 118: Simplex S is convex hull of origin and all cardinality-1 nonnegative vectors of
unit norm (its vertices). Line A , intersecting two-dimensional (cardinality-2) face F of
nonnegative simplex cS , emerges from cS at a cardinality-1 vertex. S equals nonnegative
orthant R3+ ∩ 1-norm ball B1 (Figure 75). Kissing point achieved when • (on edge) meets
A as simplex contracts (as scalar c diminishes) under optimization (534).
Although it is more efficient (compared with our algorithm) to search over individual
columns of matrix A for a cardinality-1 solution known a priori to exist, tables are turned
when cardinality exceeds 1 :
simplex cS . This cardinality-1 reconstruction algorithm also holds more generally when affine subset A
has any higher dimension n−m .
4.44 Rows of matrix A are removed based upon linear dependence. Assuming b ∈ R(A) , corresponding
columns of A .
[Figure 119: (a) A and P in Rⁿ with nonnegative orthant Rⁿ₊ ; (b) vector b in Rᵐ .]
zero variables prior to numerical solution. We offer a different and geometric presolver
first introduced in §2.13.5:4.46
Two interpretations of the constraints from problem (534) are realized in Figure 119.
Assuming that a cardinality-k solution exists and matrix A describes a pointed polyhedral
cone K = {Ax | x º 0} , as in Figure 119b, columns are removed from A if they do not
belong to the smallest face F of K containing vector b ; those columns correspond to
0-entries in variable vector x (and vice versa). In other words, generators of that smallest
face always hold a minimal cardinality solution, because a generator outside the smallest
face (having positive coefficient) would violate the assumption that b belongs to that face.
Benefit accrues when vector b does not belong to relative interior of K ; there would
be no columns to remove were b ∈ rel intr K since the smallest face becomes cone K itself
(Example 4.6.2.0.2). Were b an extreme direction, at the other end of the spectrum, then
the smallest face is an edge that is a ray containing b ; this geometrically describes a
cardinality-1 case where all columns, save one, would be removed from A .
When vector b resides in a face F of K that is not cone K itself, benefit is realized
as a reduction in computational intensity because the consequent equivalent problem has
smaller dimension. Number of columns removed depends completely on geometry of a
given problem; particularly, location of b within K . In the example of Figure 119b,
interpreted literally in R3 , all but two columns of A are discarded by our presolver when
b belongs to facet F .
A = ⎡ −1     1    8    1     1     0  ⎤         ⎡  1  ⎤
    ⎢ −3     2    8   1/2   1/3  −1/2 ⎥ ,   b = ⎢ 1/2 ⎥        (742)
    ⎣ −1/9  1/4  1/8  1/4   1/9  −1/4 ⎦         ⎣ 1/4 ⎦
minimize_{y∈Rⁿ}   ⟨|x⋆| , y⟩
subject to   0 ⪯ y ⪯ 1                        (536)
             yᵀ1 = n − k
We get another equivalent to linear programs (869) (536), in the limit, by interpreting
problem (529) as infimum to a vertex-description of the 1-norm ball (Figure 75,
Example 3.2.0.1.1, confer (528)):

minimize_{x∈Rⁿ}   ‖x‖₁              minimize_{a∈R²ⁿ}   ⟨a , y⟩
subject to   Ax = b          ≡      subject to   [ A −A ] a = b        (871)
                                                 a ⪰ 0

minimize_{y∈R²ⁿ}   ⟨a⋆ , y⟩
subject to   0 ⪯ y ⪯ 1                        (536)
             yᵀ1 = 2n − k

where x⋆ = [ I −I ] a⋆ ; from which it may be rightfully construed that any vector 1-norm
minimization problem has equivalent expression in a nonnegative variable.
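For the first iteration (direction y = 1), (871) reduces to an ordinary linear program;
a Matlab sketch using linprog:

    n = size(A, 2);
    a = linprog(ones(2*n,1), [], [], [A -A], b, zeros(2*n,1), []);
    x = a(1:n) - a(n+1:2*n);            % x* = [I -I]a*, a 1-norm minimizer (529)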
find x ∈ Rᴺ
subject to   Ax = b                           (872)
             ‖Cx‖ = 1

The set {x | ‖Cx‖ = 1} (27) describes an ellipsoid boundary (Figure 15). This problem is
nonconvex because solution is constrained to that boundary. Assign

G = ⎡ X      Cx ⎤ ≜ ⎡ Cx ⎤ [ xᵀCᵀ 1 ] = ⎡ CxxᵀCᵀ  Cx ⎤ ∈ S^{N+1}        (873)
    ⎣ xᵀCᵀ    1 ⎦   ⎣ 1  ⎦              ⎣ xᵀCᵀ     1 ⎦

Any rank-1 solution must have this form. (§B.1.0.2) Ellipsoidally constrained feasibility
problem (872) is equivalent to:

find_{X∈Sᴺ}  x ∈ Rᴺ
subject to   Ax = b
             G = ⎡ X     Cx ⎤ (⪰ 0)           (874)
                 ⎣ xᵀCᵀ   1 ⎦
             rank G = 1
             tr X = 1

This is transformed to an equivalent convex problem by moving the rank constraint to the
objective: We iterate solution of

minimize_{X∈Sᴺ, x∈Rᴺ}   ⟨G , Y⟩
subject to   Ax = b
             G = ⎡ X     Cx ⎤ ⪰ 0             (875)
                 ⎣ xᵀCᵀ   1 ⎦
             tr X = 1
with

minimize_{Y∈S^{N+1}}   ⟨G⋆ , Y⟩
subject to   0 ⪯ Y ⪯ I                        (876)
             tr Y = N

This heuristic is quite effective for problem (872), which is
exceptionally easy to solve by convex iteration.
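A sketch of iteration (875) (876) in CVX under Matlab (data A , b , C and dimension N
assumed given; the direction matrix update uses the known closed form):

    Y = zeros(N+1);                          % initial direction matrix
    for iter = 1:30
        cvx_begin sdp quiet                  % problem (875)
            variable X(N,N) symmetric
            variable x(N)
            minimize( trace([X C*x; (C*x)' 1]*Y) )
            subject to
                A*x == b
                [X C*x; (C*x)' 1] >= 0
                trace(X) == 1
        cvx_end
        G = [X C*x; (C*x)' 1];               % numeric value after solve
        [Q, L] = eig((G + G')/2);            % problem (876) in closed form:
        [~, order] = sort(diag(L), 'descend');
        U = Q(:, order(2:end));              % projector on N trailing eigenvectors
        Y = U*U';
        if trace(G*Y) < 1e-9, break, end     % rank-1 G found
    end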
When b ∉ R(A) then problem (872) must be restated as a projection:

minimize_{x∈Rᴺ}   ‖Ax − b‖
subject to   ‖Cx‖ = 1                         (878)

minimize_{X∈Sᴺ, x∈Rᴺ}   ⟨G , Y⟩ + ‖Ax − b‖
subject to   G = ⎡ X     Cx ⎤ ⪰ 0             (879)
                 ⎣ xᵀCᵀ   1 ⎦
             tr X = 1

We iterate this with calculation (876) of direction matrix Y as before until a rank-1
G matrix is found.  □

minimize_{Q∈Rⁿˣ²}   ‖AQ − B‖F
subject to   QᵀQ = I                          (880)

G = ⎡ x ⎤ [ xᵀ yᵀ 1 ] = ⎡ X   Z   x ⎤ ≜ ⎡ xxᵀ  xyᵀ  x ⎤
    ⎢ y ⎥               ⎢ Zᵀ  Y   y ⎥   ⎢ yxᵀ  yyᵀ  y ⎥ ∈ S^{2n+1}        (881)
    ⎣ 1 ⎦               ⎣ xᵀ  yᵀ  1 ⎦   ⎣ xᵀ   yᵀ   1 ⎦
{X(: , i) | 1ᵀX(: , i) = 1}

That means:
1) norm of each row and column is 1 ,4.48

Ξᵀ1 = 1 ,   Ξ1 = 1 ,   Ξ ≥ 0        (886)

correspondingly, in the relaxation variable:

Xᵀ1 = 1 ,   X1 = 1 ,   X ≥ 0
4.48 This fact would be superfluous were the objective of minimization linear, because the permutation
matrices reside at the extreme points of a polyhedron (104) implied by (886). But as posed, only
either rows or columns need be constrained to unit norm because matrix orthogonality implies transpose
orthogonality. (§B.5.2) Absence of vanishing inner product constraints that help define orthogonality, like
tr Z = 0 from Example 4.7.0.0.2, is a consequence of nonnegativity; id est, the only orthogonal matrices
having exclusively nonnegative entries are permutations of the Identity.
where w ≈ 10 positively weights the rank regularization term. Optimal solutions G⋆ᵢ are
key to finding direction matrices Wᵢ for the next iteration of semidefinite programs
(887) (888):

minimize_{Wᵢ∈S^{n+1}}   ⟨G⋆ᵢ , Wᵢ⟩
subject to   0 ⪯ Wᵢ ⪯ I ,   i = 1 ... n       (888)
             tr Wᵢ = n

Direction matrices thus found lead toward rank-1 matrices G⋆ᵢ on subsequent iterations.
The constraint on trace of G⋆ᵢ normalizes the i th column of X⋆ to unity because (confer p.361)

G⋆ᵢ = ⎡ X⋆(: , i) ⎤ [ X⋆(: , i)ᵀ  1 ]        (889)
      ⎣     1     ⎦

at convergence. Binary-valued X⋆ column entries result from the further sum constraint
X1 = 1. Columnar orthogonality is a consequence of the further transpose-sum constraint
Xᵀ1 = 1 in conjunction with nonnegativity constraint X ≥ 0 ; but we leave proof of
orthogonality an exercise. The optimal objective value is 0 for both semidefinite programs
when vectors A and B are related by permutation. In any case, optimal solution X⋆
becomes a permutation matrix Ξ .
Because there are n direction matrices Wᵢ to find, it can be advantageous to invoke
a known closed-form solution for each from page 539. What makes this combinatorial
problem more tractable are the relatively small semidefinite constraints in (887). (confer (883))
When a permutation A of vector B exists, the number of iterations can be as small as 1. But
this combinatorial Procrustes problem can be made even more challenging when vector A
has repeated entries.  □
We wish to solve for, what is known to be, a tight upper bound [323] on the constrained
polynomial x²y + y²z + z²x by transformation to a rank-constrained semidefinite program.
First identify

G = ⎡ x ⎤ [ x y z 1 ] = ⎡ x²  xy  zx  x ⎤
    ⎢ y ⎥               ⎢ xy  y²  yz  y ⎥ ∈ S⁴        (898)
    ⎢ z ⎥               ⎢ zx  yz  z²  z ⎥
    ⎣ 1 ⎦               ⎣ x   y   z   1 ⎦

X = ⎡ x² ⎤ [ x² y² z² x y z 1 ] = ⎡ x⁴    x²y²  z²x²  x³   x²y  zx²  x² ⎤
    ⎢ y² ⎥                        ⎢ x²y²  y⁴    y²z²  xy²  y³   y²z  y² ⎥
    ⎢ z² ⎥                        ⎢ z²x²  y²z²  z⁴    z²x  yz²  z³   z² ⎥
    ⎢ x  ⎥                        ⎢ x³    xy²   z²x   x²   xy   zx   x  ⎥ ∈ S⁷        (899)
    ⎢ y  ⎥                        ⎢ x²y   y³    yz²   xy   y²   yz   y  ⎥
    ⎢ z  ⎥                        ⎢ zx²   y²z   z³    zx   yz   z²   z  ⎥
    ⎣ 1  ⎦                        ⎣ x²    y²    z²    x    y    z    1  ⎦

x‖y‖ = y        (906)

minimize_{x , y}   f(x , y)
subject to   (x , y) ∈ C                      (907)
             x‖y‖ = y
equality by constraining rank and adding a regularization term to the objective. Make the
assignment:

G = ⎡ x ⎤ [ xᵀ yᵀ 1 ] = ⎡ X   Z  x ⎤ ≜ ⎡ xxᵀ  xyᵀ  x ⎤
    ⎢ y ⎥               ⎢ Z   Y  y ⎥   ⎢ yxᵀ  yyᵀ  y ⎥ ∈ S^{2N+1}        (908)
    ⎣ 1 ⎦               ⎣ xᵀ  yᵀ 1 ⎦   ⎣ xᵀ   yᵀ   1 ⎦

where X , Y ∈ Sᴺ , also Z ∈ Sᴺ [sic]. Any rank-1 solution must take the form of (908).
(§B.1) The problem statement equivalent to (907) is then written

minimize_{X , Y∈Sᴺ, Z , x , y}   f(x , y) + ‖X − Y‖F
subject to   (x , y) ∈ C
             G = ⎡ X   Z  x ⎤ (⪰ 0)           (909)
                 ⎢ Z   Y  y ⎥
                 ⎣ xᵀ  yᵀ 1 ⎦
             rank G = 1
             tr X = 1
             δ(Z) ⪰ 0

minimize_{X , Y , Z , x , y}   f(x , y) + ‖X − Y‖F + ⟨G , W⟩
subject to   (x , y) ∈ C
             G = ⎡ X   Z  x ⎤ ⪰ 0             (910)
                 ⎢ Z   Y  y ⎥
                 ⎣ xᵀ  yᵀ 1 ⎦
             tr X = 1
             δ(Z) ⪰ 0

minimize_{W∈S^{2N+1}}   ⟨G⋆ , W⟩
subject to   0 ⪯ W ⪯ I                        (911)
             tr W = 2N
This semidefinite program has an optimal solution that is known in closed form. Iteration
(910) (911) terminates when rank G = 1 and linear regularization ⟨G , W⟩ vanishes to
within some numerical tolerance in (910); typically, in two iterations. If function f
competes too much with the regularization, it becomes necessary to positively weight each
regularization term. At convergence, problem (910) becomes a convex equivalent to the
original nonconvex problem (907).  □
[Figure 121 graph: 16 numbered nodes, arcs with circled weights (e.g., 5 , −1 , −3), and
a CUT separating Mc from M′c .]
Figure 121: A cut partitions nodes {i = 1 . . . 16} of this graph into Mc and M′c . Linear
arcs have circled weights. The problem is to find a cut maximizing total weight of all arcs
linking partitions made by the cut.
Literature on the max cut problem is vast because this problem has elegant primal
and dual formulation, its solution is very difficult, and there exist many commercial
applications; e.g., semiconductor design [144], quantum computing [457].
Our purpose here is to demonstrate how iteration of two simple convex problems can
quickly converge to an optimal solution of the max cut problem with a 98% success rate,
on average.4.50 max cut is stated:
maximize_{x∈Rⁿ}   ½ Σ_{1≤i<j≤n} aᵢⱼ (1 − xᵢxⱼ)
subject to   δ(xxᵀ) = 1                       (912)

where [aᵢⱼ] are real arc weights, and vector x = [xᵢ] ∈ Rⁿ corresponds to the n nodes;
specifically,

node i ∈ Mc   ⇔  xᵢ = 1
node i ∈ M′c  ⇔  xᵢ = −1                      (913)
4.50 We term our solution to max cut fast because we sacrifice a little accuracy to achieve speed; id est,
only about two or three convex iterations, achieved by heavily weighting a rank regularization term.
If nodes i and j have the same binary value xi and xj , then they belong to the same
partition and contribute nothing to the cut. Arc (i , j) traverses the cut, otherwise, adding
its weight aij to the cut.
max cut statement (912) is the same as, for A = [aᵢⱼ] ∈ Sⁿ

maximize_{x∈Rⁿ}   ¼ ⟨11ᵀ − xxᵀ , A⟩
subject to   δ(xxᵀ) = 1                       (914)

whose optimal value (−ν⋆ᵀ1) provides an upper bound to max cut but is not
tight4.51 ( ¼⟨xxᵀ , δ(A1)−A⟩ < g(ν) , duality gap is nonzero); [182] problem
(919) is not a strong dual to (916).4.52
To transform max cut to its convex equivalent, first define

X = xxᵀ ∈ Sⁿ        (924)

then max cut (916) becomes

maximize_{X∈Sⁿ}   ¼ ⟨X , δ(A1) − A⟩
subject to   δ(X) = 1
             (X ⪰ 0)                          (920)
             rank X = 1

4.51 Taking the dual of dual problem (919) would provide (920) but without the rank constraint. [175]
Dual of a dual of even a convex primal problem is not necessarily the same primal problem; although,
optimal solution of one can be obtained from the other.
4.52 Even so, empirically, binary solution arg sup_{x∈Bⁿ±} L(x , ν⋆) to (917) is optimal to (916).
maximize_x   xᵀA x
subject to   ‖x‖ = 1                          (923)
             card x ≤ c

where z ≜ √λ x and where optimal solution x⋆ is a principal eigenvector (1885) (§A.5.1.1)
of A and λ = x⋆ᵀA x⋆ is the principal eigenvalue [185, p.331] when c is true cardinality of
that eigenvector. This is principal component analysis with a cardinality constraint which
controls solution sparsity. Define the matrix variable

X ≜ xxᵀ ∈ Sᴺ        (924)

4.53 We solved for a length-250 binary vector in only a few minutes and convex iterations on a 2006 vintage
laptop Core 2 CPU (Intel Core 2 at 1.83GHz, 666MHz FSB).
4.54 more computationally intensive than the proposed convex iteration by many orders of magnitude.
Solving max cut by searching over all binary vectors of length 100, for example, would occupy a
contemporary supercomputer for a million years.
4.55 Existence of a polynomial-time approximation to max cut with accuracy provably better than 94.11%
would refute NP-hardness; which Håstad believes to be highly unlikely. [212, thm.8.2] [213]
maximize_{X∈Sᴺ}   ⟨X , A⟩
subject to   ⟨X , I⟩ = 1
             (X ⪰ 0)                          (926)
             rank X = 1
             card δ(X) ≤ c

where w₁ and w₂ are positive scalars respectively weighting tr(XY) and δ(X)ᵀδ(W)
just enough to insure that they vanish to within some numerical precision, where direction
matrix Y is an optimal solution to semidefinite program

minimize_{Y∈Sᴺ}   ⟨X⋆ , Y⟩
subject to   0 ⪯ Y ⪯ I                        (928)
             tr Y = N − 1

minimize_{W=δ²(W)}   ⟨δ(X⋆) , δ(W)⟩
subject to   0 ⪯ δ(W) ⪯ 1                     (929)
             tr W = N − c
Both direction matrix programs are derived from (1892a) whose analytical solution is
known but is not necessarily unique. We emphasize (confer p.250): because this iteration
(927) (928) (929) (initial Y, W = 0) is not a projection method (§4.5.1.1), success relies
on existence of matrices in the feasible set of (927) having desired rank and diagonal
cardinality. In particular, the feasible set of convex problem (927) is a Fantope (94) whose
extreme points constitute the set of all normalized rank-1 matrices; among those are found
rank-1 matrices of any desired diagonal cardinality.
Convex problem (927) is not a relaxation of cardinality problem (923); rather,
problem (927) becomes a convex equivalent to (923) at global optimality of iteration (927)
(928) (929). Because the feasible set of problem (927) contains all normalized rank-1 (§B.1)
symmetric matrices of every nonzero diagonal cardinality, a constraint too low or high in
cardinality c will not prevent solution. An optimal rank-1 solution X⋆ , whose diagonal
cardinality is equal to cardinality of a principal eigenvector of matrix A , will produce the
least residual Frobenius norm (to within machine noise processes) in the original problem
statement (922).  □
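Both direction updates admit closed forms derived from (1892a); a Matlab sketch
(Xstar assumed symmetric, c the cardinality bound):

    [Q, L] = eig((Xstar + Xstar')/2);        % (928): projector on the N-1
    [~, order] = sort(diag(L), 'descend');   % trailing eigenvectors of Xstar
    U = Q(:, order(2:end));
    Y = U*U';
    d = diag(Xstar);                         % (929): diagonal direction matrix,
    [~, idx] = sort(d, 'descend');           % 1 on the N-c smallest diagonal entries
    w = zeros(size(d));  w(idx(c+1:end)) = 1;
    W = diag(w);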
4.56 A semidefiniteness constraint X º 0 is not required, theoretically, because positive semidefiniteness
of a rank-1 matrix is enforced by symmetry. (Theorem A.3.1.0.7)
[Figure 122: Shepp-Logan phantom, phantom(256) in Matlab.]
4.59 Noise considered here is due only to the reconstruction process itself; id est, noise in excess of that
produced by the best reconstruction of an image from a complete set of samples in the sense of Shannon.
At less than 30dB image/error, artifacts generally remain visible to the naked eye. We estimate that
about 50dB is required to eliminate noticeable distortion in a visual A/B comparison.
4.60 In vascular radiology, diagnoses are almost exclusively based on morphology of vessels and, in

U = Fᴴ F(U) Fᴴ        (933)
4.61 I have never calculated the PSNR of these reconstructed images [of Barbara]. −Jean-Luc Starck
The sparsity of the image is the percentage of transform coefficients sufficient for diagnostic-quality
reconstruction. Of course the term “diagnostic quality” is subjective. . . . I have yet to see an “objective”
measure of image quality. Difference images, in my experience, definitely do not tell the whole story.
Often I would show people some of my results and get mixed responses, but when I add artificial Gaussian
noise to an image, often people say that it looks better. −Michael Lustig
4.62 k-space is conventional acquisition terminology indicating domain of the continuous raw data provided
by an MRI machine. An image is reconstructed by inverse discrete Fourier transform of that data
interpolated on a Cartesian grid in two dimensions.
Figure 123: MRI radial sampling pattern, in DC-centric Fourier domain, representing 4.1%
(10 lines) subsampled data. Only half of these complex samples, in any halfspace about
the origin in theory, need be acquired for a real image because of conjugate symmetry.
Due to MRI machine imperfections, samples are generally taken over full extent of each
radial line segment. MRI acquisition time is proportional to number of lines.
From §A.1.1 no.33 we have a vectorized two-dimensional DFT via Kronecker product ⊗
Idealized radial sampling in the Fourier domain can be simulated by Hadamard product
∘ with a binary mask Φ ∈ Rⁿˣⁿ whose nonzero entries could, for example, correspond
with the radial line segments in Figure 123. To make the mask Nyquist-centric, like DFT
matrix F , define a circulant [197] symmetric permutation matrix4.63

Θ ≜ ⎡ 0  I ⎤ ∈ Sⁿ        (936)
    ⎣ I  0 ⎦

ΘΦΘ ∘ FUF = K        (937)

and in vector form, (44) (1983)
Because measurements K are complex, there are actually twice the number of equality
constraints as there are measurements.
We can cut that number of constraints in half via vertical and horizontal mask Φ
symmetry which forces the imaginary inverse transform to 0 : The inverse subsampled
transform in matrix form is
Figure 124: Aliasing of Shepp-Logan phantom in Figure 122 resulting from k-space
subsampling pattern in Figure 123. This image is real because binary mask Φ is vertically
and horizontally symmetric. It is remarkable that the phantom can be reconstructed, by
convex iteration, given only U⁰ = vec⁻¹f .
later abbreviated

P vec U = f        (941)

where

P ≜ (Fᴴ ⊗ Fᴴ) δ(vec ΘΦΘ) (F ⊗ F) ∈ C^{n²×n²}        (942)

Because of idempotence P² = P , P is a projection matrix. Because of its Hermitian
symmetry [194, p.24], P is an orthogonal projector.4.64 P vec U is real when P is real;
id est, when for positive even integer n

Φ = ⎡ Φ₁₁             Φ(1 , 2:n)Ξ    ⎤ ∈ Rⁿˣⁿ        (944)
    ⎣ Ξ Φ(2:n , 1)    Ξ Φ(2:n , 2:n)Ξ ⎦

where Ξ ∈ Sⁿ⁻¹ is the order-reversing permutation matrix (1920). In words, this necessary
and sufficient condition on Φ (for a real inverse subsampled transform [322, p.53]) demands
vertical symmetry about row n/2+1 and horizontal symmetry4.65 about column n/2+1.
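That symmetry is easy to verify numerically; a Matlab sketch for a Nyquist-centric
binary mask Phi (n even):

    Xi = fliplr(eye(n-1));                       % order-reversing permutation (1920)
    mask_is_real = isequal(Phi(1,2:n),   Phi(1,2:n)*Xi)  && ...
                   isequal(Phi(2:n,1),   Xi*Phi(2:n,1))  && ...
                   isequal(Phi(2:n,2:n), Xi*Phi(2:n,2:n)*Xi);   % condition (944)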
Define

∆ ≜ ⎡  1                    0 ⎤
    ⎢ −1   1                  ⎥
    ⎢     −1   1              ⎥ ∈ Rⁿˣⁿ        (945)
    ⎢         ⋱    ⋱          ⎥
    ⎢             −1   1      ⎥
    ⎣ 0ᵀ              −1   1  ⎦
4.64 (942) is a diagonalization of matrix P whose binary eigenvalues are δ(vec ΘΦΘ) while the
corresponding eigenvectors constitute the columns of unitary matrix F H ⊗ F H .
4.65 This condition on Φ applies to both DC- and Nyquist-centric DFT matrices.
I ⊗ ∆ᵀ

where Ψ ∈ R^{4n²×n²} . A total-variation minimization for reconstructing MRI image U ,
that is known suboptimal [239] [78], may be concisely posed

minimize_U   ‖Ψ vec U‖₁
subject to   P vec U = f                      (948)

where

f = (Fᴴ ⊗ Fᴴ) vec K ∈ C^{n²}        (949)

is the known inverse subsampled Fourier data (a vectorized aliased image, Figure 124),
and where a norm of discrete image-gradient ∇U is equivalently expressed as norm of a
linear transformation Ψ vec U .
Although this simple problem statement (948) is equivalent to a linear program (§3.2),
its numerical solution is beyond the capability of even the most highly regarded of
contemporary commercial solvers.4.67 Our recourse is to recast the problem in regularized
form and write customized code to solve it:

minimize_U   ⟨|Ψ vec U| , y⟩
subject to   P vec U = f                                      (950a)
                        ≡
minimize_U   ⟨|Ψ vec U| , y⟩ + ½ λ ‖P vec U − f‖₂²            (950b)

where multiobjective parameter λ ∈ R₊ is quite large (λ ≈ 1E8) so as to enforce the equality
constraint: P vec U − f = 0 ⇔ ‖P vec U − f‖₂² = 0 (§A.7.1). We introduce a direction
vector y ∈ R₊^{4n²} as part of a convex iteration (§4.6.3) to overcome that known suboptimal
minimization of discrete image-gradient cardinality: id est, there exists a vector y⋆ with
entries yᵢ⋆ ∈ {0, 1} such that

minimize_U   ‖Ψ vec U‖₀
subject to   P vec U = f    ≡    minimize_U   ⟨|Ψ vec U| , y⋆⟩ + ½ λ ‖P vec U − f‖₂²        (951)
e.g., execution time, memory. The obstacle is, in fact, inadequate numerical precision. Even when all
dependent equality constraints are manually removed, the best commercial solvers fail simply because
computer numerics become nonsense; id est, numerical errors enter significant digits and the algorithm
exits prematurely, loops indefinitely, or produces an infeasible solution.
where small positive constant ǫ ∈ R₊ has been introduced for invertibility. Speaking
more analytically, introduction of ǫ serves to uniquely define the objective’s gradient
everywhere in the function domain; id est, it transforms absolute value in (950b) from a
function differentiable almost everywhere into a differentiable function. An example of
such a transformation in one dimension is illustrated in Figure 126. When small enough
for practical purposes4.68 (ǫ ≈ 1E-3), we may ignore the limiting operation. Then the
mapping, for 0 ⪯ y ⪯ 1

vec Uᵗ⁺¹ = ( Ψᵀ δ(y) δ(|Ψ vec Uᵗ| + ǫ1)⁻¹ Ψ + λP )⁻¹ λP f        (954)

is a contraction in Uᵗ that can be solved recursively in t for its unique fixed point; id est,
until Uᵗ⁺¹ → Uᵗ . [259, p.300] [234, p.155] Calculating this inversion directly is not possible
for large matrices on contemporary computers because of numerical precision, so instead
we apply the conjugate gradient method of solution to

( Ψᵀ δ(y) δ(|Ψ vec Uᵗ| + ǫ1)⁻¹ Ψ + λP ) vec Uᵗ⁺¹ = λP f        (955)

which is linear in Uᵗ⁺¹ at each recursion in the Matlab program [419].4.69
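One recursion of (955), sketched with Matlab’s pcg (sparse Psi, vectors y and f ,
scalars lambda and ǫ (denoted eps1 below), and a function handle applyP realizing P
by fft2/mask/ifft2 are assumed):

    u = U(:);                                    % vec U^t
    d = y ./ (abs(Psi*u) + eps1);                % diagonal weights delta(y)delta(.)^-1
    Afun = @(v) Psi'*(d.*(Psi*v)) + lambda*applyP(v);
    u = pcg(Afun, lambda*applyP(f), 1e-10, 400, [], [], u);
    U = reshape(u, n, n);                        % vec^-1: next iterate U^{t+1}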
4.68 We are looking for at least 50dB image/error ratio from only 4.1% subsampled data (10 radial lines in
k-space). With this setting of ǫ , we actually attain in excess of 100dB from a simple Matlab program in
about a minute on a 2006 vintage laptop Core 2 CPU (Intel Core 2 at 1.83GHz, 666MHz FSB). By trading
execution time and treating discrete image-gradient cardinality as a known quantity for this phantom,
over 160dB is achievable.
4.69 Conjugate gradient method requires positive definiteness. [177, §4.8.3.2]
[Figure 126 curves: |x| and ∫₋₁ˣ y/(|y|+ǫ) dy .]

Figure 126: Real absolute value function f₂(x) = |x| on x ∈ [−1, 1] (from Figure 72b)
superimposed upon integral of its derivative at ǫ = 0.05 , which smooths objective function.
Observe that P (942), in the equality constraint from problem (950a), is not a
wide matrix.4.70 Although number of Fourier samples taken is equal to the number
of nonzero entries in binary mask Φ , matrix P is square but never actually formed
during computation. Rather, a two-dimensional fast Fourier transform of U is computed
followed by masking with ΘΦΘ and then an inverse fast Fourier transform. This technique
significantly reduces memory requirements and, together with contraction method of
solution, is the principal reason for relatively fast computation.
convex iteration
By convex iteration we mean alternation of solution to (950a) and (956) until convergence.
Direction vector y is initialized to 1 until the first fixed point is found; which means, the
contraction recursion begins calculating a (1-norm) solution U ⋆ to (948) via problem
(950b). Once U⋆ is found, vector y is updated according to an estimate of discrete
image-gradient cardinality c : the sum of the 4n² − c smallest entries of |Ψ vec U⋆| ∈ R^{4n²}
is the optimal objective value of a linear program, for 0 ≤ c ≤ 4n² − 1 (536)

 Σ_{i=c+1}^{4n²} π(|Ψ vec U⋆|)ᵢ = ⟨|Ψ vec U⋆| , y⟩ ,  where y solves

minimize_{y∈R^{4n²}}   ⟨|Ψ vec U⋆| , y⟩
subject to   0 ⪯ y ⪯ 1                        (956)
             yᵀ1 = 4n² − c
piece). Although distance between 467 to 480 is relatively small, there is apparently vast distance to a
solution because no complete solution followed in 2009.
4.74 Even so, combinatorial-intensity brute-force backtracking methods can solve similar puzzles in minutes
given M = 196 pieces on a 14×14 test board; as demonstrated by Yannick Kirschhoffer. There is a steep
rise in level of difficulty going to a 15×15 board.
[Figure 127 panels: (a) pieces, indexed 1–16 in a 4×4 tableau; (b) one solution, piece
placement 13 4 16 5 / 3 2 10 11 / 12 14 6 8 / 1 15 7 9 ; (c) colors e₁ e₂ e₃ e₄ .]
Figure 127: Eternity II is a board game in the puzzle genre. (a) Shown are all of the
16 puzzle pieces (indexed as in the tableau alongside) from a scaled-down computerized
demonstration game version from the TOMY website. Puzzle pieces are square and
partitioned into four colors (with associated symbols). Pieces may be moved, removed,
and rotated at random on a 4×4 board. (b) Illustrated is one complete solution to
this puzzle whose solution is not unique. The piece, whose border is lightly outlined, was
placed last in this realization. There is no mandatory piece placement, as for the full game,
except the grey board-boundary. Solution time for a human is typically on the order of a
minute. (c) This puzzle has four colors, indexed 1 through 4 ; grey corresponds to 0.
full-game rules
1) Any puzzle piece may be rotated face-up in quadrature and placed or replaced on
the square board.
2) Only one piece may occupy any particular cell on the board.
3) All adjacent pieces must match in color (and symbol) at their touching edges.
4) Solid grey edges must appear all along the board’s boundary.
Monckton insists they number in the thousands. Ignoring board-boundary constraints and the full game’s
single mandatory piece placement, a loose upper bound on number of combinations is M ! 4M = 256! 4256 .
That number gets further loosened: 150638!/(256!(150638−256)!) after presolving Eternity II (984).
[Figure 128 board: boundary cell numbers 1–16 down the left, 241–256 down the right,
17–225 in steps of 16 across the top, 32–240 in steps of 16 across the bottom.]
Figure 128: Eternity II full-game board (16 ×16 , M = 256 , L = 22) illustrating boundary
cell numbers. Grid facilitates piece placement within unit-square cell; one piece per cell.
Cell 121 (shaded) holds mandatory puzzle-piece P139 designated by Monckton.
Pᵢ ≜ ⎡ pᵢ₁ᵀ ⎤
     ⎢ pᵢ₂ᵀ ⎥ ∈ R⁴ˣᴸ ,   i = 1 ... M        (957)
     ⎢ pᵢ₃ᵀ ⎥
     ⎣ pᵢ₄ᵀ ⎦
In other words, each distinct nongrey color is assigned a unique corresponding index
ℓ ∈ {1 . . . L} identifying a standard basis vector eℓ ∈ RL (Figure 127c) that becomes a
vector pij ∈ {e1 . . . eL , 0} ⊂ RL constituting matrix Pi representing a particular piece.
Rows {pT ij , j = 1 . . . 4} of Pi are ordered counterclockwise as in Figure 129. Color data is
given in Figure 130 for the demonstration game board. Then matrix Pi describes the i th
piece, excepting its rotation and position on the board.
Our intent is to show how to vectorize the board, with respect to whole pieces, and
then express Eternity II as a very hard combinatorial objective with linear constraints:
All pieces are initially placed in order of their given index i assigned by Monckton. The
vectorized game-board has initial state represented within a matrix
[Figure 129: piece P₆ with edge vectors p₆₁ , p₆₂ , p₆₃ , p₆₄ ordered counterclockwise.]

P₆ = ⎡ p₆₁ᵀ ⎤
     ⎢ p₆₂ᵀ ⎥ ∈ R⁴ˣᴸ
     ⎢ p₆₃ᵀ ⎥
     ⎣ p₆₄ᵀ ⎦
P ≜ ⎡ P₁ ⎤
    ⎢ ⋮  ⎥ ∈ R^{4M×L}        (958)
    ⎣ P_M ⎦
enumerated in Figure 130 for the demonstration game. Moving pieces all at once about
the square board corresponds to permuting pieces Pᵢ on the vectorized board represented
by matrix P , while rotating the i th piece is equivalent to circularly shifting row indices of
Pᵢ (rowwise permutation). This permutation problem, as stated, is doubly combinatorial
(M! 4ᴹ combinations) because we must find a permutation of pieces (M!)

Ξ ∈ Rᴹˣᴹ        (959)

and quadrature rotation Πᵢ ∈ R⁴ˣ⁴ of each individual piece (4ᴹ) that solve the puzzle;

(Ξ ⊗ I₄) Π P = (Ξ ⊗ I₄) ⎡ Π₁P₁   ⎤
                        ⎢  ⋮     ⎥ ∈ R^{4M×L}        (960)
                        ⎣ Π_M P_M ⎦

where

Πᵢ ∈ {π₁ , π₂ , π₃ , π₄} ≜ { ⎡1 0 0 0⎤   ⎡0 1 0 0⎤   ⎡0 0 1 0⎤   ⎡0 0 0 1⎤ }
                             ⎢0 1 0 0⎥ , ⎢0 0 1 0⎥ , ⎢0 0 0 1⎥ , ⎢1 0 0 0⎥        (961)
                             ⎢0 0 1 0⎥   ⎢0 0 0 1⎥   ⎢1 0 0 0⎥   ⎢0 1 0 0⎥
                             ⎣0 0 0 1⎦   ⎣1 0 0 0⎦   ⎣0 1 0 0⎦   ⎣0 0 1 0⎦

Π ≜ ⎡ Π₁        0  ⎤
    ⎢     ⋱       ⎥ ∈ R^{4M×4M}        (962)
    ⎣ 0       Π_M ⎦
P₁  = [ e₃  0   0   e₁ ]ᵀ
P₂  = [ e₂  e₄  e₄  e₄ ]ᵀ
P₃  = [ e₂  e₁  0   e₁ ]ᵀ
P₄  = [ e₄  e₁  0   e₁ ]ᵀ
P₅  = [ 0   0   e₃  e₁ ]ᵀ
P₆  = [ e₂  e₂  e₄  e₂ ]ᵀ
P₇  = [ e₂  e₃  0   e₃ ]ᵀ
P₈  = [ e₄  e₃  0   e₃ ]ᵀ
P₉  = [ 0   e₃  e₃  0  ]ᵀ
P₁₀ = [ e₂  e₂  e₄  e₄ ]ᵀ
P₁₁ = [ e₂  e₃  0   e₁ ]ᵀ
P₁₂ = [ e₄  e₁  0   e₃ ]ᵀ
P₁₃ = [ 0   e₁  e₁  0  ]ᵀ
P₁₄ = [ e₂  e₂  e₄  e₄ ]ᵀ
P₁₅ = [ e₂  e₁  0   e₃ ]ᵀ
P₁₆ = [ e₄  e₃  0   e₁ ]ᵀ

Figure 131: All pieces in their initial state on vectorized demo-game board. Line segments
indicate differences ∆ (965), ∘ indicates edges on board boundary β (967). Entries are
indices ℓ identifying standard basis vectors eℓ ∈ Rᴸ from Figure 130.
∆ ≜ ⎡ ∆₁ᵀ              ⎤
    ⎢ ⋮                ⎥ ∈ R^{2√M(√M−1)×4M}        (963)
    ⎣ ∆ᵀ_{2√M(√M−1)}   ⎦
For the demonstration game, the first twelve entries of ∆ correspond to blue line segments
(leftmost) in Figure 131 while the twelve remaining entries correspond to red lines: for
ei ∈ R64
∆ = ⎡ e₁ᵀ − e₁₉ᵀ  ⎤
    ⎢ e₅ᵀ − e₂₃ᵀ  ⎥
    ⎢ e₉ᵀ − e₂₇ᵀ  ⎥
    ⎢ e₁₃ᵀ − e₃₁ᵀ ⎥
    ⎢ e₁₇ᵀ − e₃₅ᵀ ⎥
    ⎢ e₂₁ᵀ − e₃₉ᵀ ⎥
    ⎢ e₂₅ᵀ − e₄₃ᵀ ⎥
    ⎢ e₂₉ᵀ − e₄₇ᵀ ⎥
    ⎢ e₃₃ᵀ − e₅₁ᵀ ⎥
    ⎢ e₃₇ᵀ − e₅₅ᵀ ⎥
    ⎢ e₄₁ᵀ − e₅₉ᵀ ⎥
    ⎢ e₄₅ᵀ − e₆₃ᵀ ⎥ ∈ R^{24×64}        (965)
    ⎢ e₄ᵀ − e₆ᵀ   ⎥
    ⎢ e₈ᵀ − e₁₀ᵀ  ⎥
    ⎢ e₁₂ᵀ − e₁₄ᵀ ⎥
    ⎢ e₂₀ᵀ − e₂₂ᵀ ⎥
    ⎢ e₂₄ᵀ − e₂₆ᵀ ⎥
    ⎢ e₂₈ᵀ − e₃₀ᵀ ⎥
    ⎢ e₃₆ᵀ − e₃₈ᵀ ⎥
    ⎢ e₄₀ᵀ − e₄₂ᵀ ⎥
    ⎢ e₄₄ᵀ − e₄₆ᵀ ⎥
    ⎢ e₅₂ᵀ − e₅₄ᵀ ⎥
    ⎢ e₅₆ᵀ − e₅₈ᵀ ⎥
    ⎣ e₆₀ᵀ − e₆₂ᵀ ⎦
consolidating permutations Φ

By defining

Φ ≜ (Ξ ⊗ I₄) Π ∈ R^{4M×4M}        (968)

this square matrix becomes a structured permutation matrix replacing the product of
permutation matrices. Then puzzle piece edge adjacency constraint (964) becomes

∆ Φ P = 0 ∈ R^{2√M(√M−1)×L}        (969)

while game board boundary constraint (966) becomes

βᵀ Φ P 1 = 0 ∈ R        (970)

Now partition composite permutation matrix variable Φ into 4×4 blocks

Φ ≜ ⎡ φ₁₁   ⋯  φ₁M  ⎤
    ⎢  ⋮    ⋱   ⋮   ⎥ ∈ R^{4M×4M}        (971)
    ⎣ φ_M1  ⋯  φ_MM ⎦
Figure 132: Sparsity pattern for composite permutation matrix Φ⋆∈ R4M ×4M representing
solution from Figure 127b. Each four-point cluster represents a circulant permutation
matrix from (961). Any M = 16-piece solution may be verified by the TOMY demo.
which is convex in the constraints where e₁₂₁ , e₁₃₉ ∈ Rᴹ are members of the standard
basis representing mandatory piece P₁₃₉ placement in the full game,4.81 where

R_d ≜ ⎡ 1 0 0 0 ⎤                ⎡ 0 1 0 0 ⎤
      ⎢ 0 1 0 0 ⎥ ∈ R³ˣ⁴ , S_d ≜ ⎢ 0 0 1 0 ⎥ ∈ R³ˣ⁴        (974)
      ⎣ 0 0 1 0 ⎦                ⎣ 0 0 0 1 ⎦

R_φ ≜ ⎡ 1 0 0 0 ⎤ ∈ R²ˣ⁴ , S_φ ≜ ⎡ 0 0 1 0 ⎤ ∈ R²ˣ⁴        (975)
      ⎣ 0 1 0 0 ⎦                ⎣ 0 0 0 1 ⎦

and where Φ ≥ 0 denotes entrywise nonnegativity. These matrices R and S enforce
circulance.4.82 Full game mandatory-piece rotation requires equality constraint π₃ .
permutation polyhedron

Constraints Φ1 = 1 and Φᵀ1 = 1 and Φ ≥ 0 confine Φ to a permutation polyhedron (104)
in R^{4M×4M} ; which is, the convex hull of permutation matrices. The objective enforces
minimal cardinality per row and column. Slicing the permutation polyhedron, by looking
at a particular row or column subspace of Φ , looks like intersection of a 1-norm ball
with a nonnegative orthant. Cardinality-1 vectors reside at vertices of a 1-norm ball.
(Figure 75)4.83 Hence, the optimal objective is a sum of cardinalities 1.
Any vertex, of the permutation polyhedron, is a permutation matrix having minimal
cardinality 4M .4.84 The feasible set of problem (973) is an intersection of the polyhedron
with a number of hyperplanes. Feasible solutions exist that are not permutation matrices.
But the intersection must contain a vertex of the permutation polyhedron because a
solution Φ⋆ cannot otherwise be a permutation matrix; such a solution is presumed to
exist, so it must also be a vertex (extreme point)4.85 of the intersection.
In vectorized variable, by §A.1.1 no.33, problem (973) is equivalent to

minimize_{Φ∈R^{4M×4M}}   Σ_{i=1}^{4M} ‖Φ(i , :)ᵀ‖₀ + ‖Φ(: , i)‖₀
subject to   (Pᵀ ⊗ ∆) vec Φ = 0
             (P1 ⊗ β)ᵀ vec Φ = 0
             (1ᵀ₄M ⊗ I₄M) vec Φ = 1
             (I₄M ⊗ 1ᵀ₄M) vec Φ = 1                                        (976)
             (I_M ⊗ R_d ⊗ I_M ⊗ R_d − I_M ⊗ S_d ⊗ I_M ⊗ S_d) vec Φ = 0
             (I_M ⊗ S_φ ⊗ I_M ⊗ R_φ − I_M ⊗ R_φ ⊗ I_M ⊗ S_φ) vec Φ = 0
             (e₁₃₉ ⊗ I₄ ⊗ e₁₂₁ ⊗ I₄)ᵀ vec Φ = vec π₃
             vec Φ ⪰ 0

whose optimal objective value is 8M ; cardinality of permutation matrix Φ⋆ is 4M .
With respect to an orthant, ⪰ connotes entrywise nonnegativity (p.642). This problem is
abbreviated:

minimize_{Φ∈R^{4M×4M}}   Σ_{i=1}^{4M} ‖Φ(i , :)ᵀ‖₀ + ‖Φ(: , i)‖₀
subject to   E vec Φ = τ                                                   (977)
             vec Φ ⪰ 0
4.81 meaning that piece P₁₃₉ (numbered 139 by Monckton) must be placed in cell 121 on the board
(Figure 128) with rotation π₃ (p.313).
4.82 Since 0 is the trivial circulant matrix, application is democratic over all blocks φᵢⱼ .
4.83 This means: each vertex of the permutation polyhedron, in isometrically isomorphic R^{16M²}, is
exposed by a hyperplane. There can be no further intersection with a feasible affine subset that would
enlarge that face; id est, a vertex of the permutation polyhedron persists in the feasible set.
where E ∈ R^{(17+2L√M(√M−1)+8M+13M²)×16M²} is highly sparse, having 4,784,144 nonzero
entries in {−1, 0, 1}.

dim E = 864,593 × 1,048,576        (978)

Any feasible binary solution is minimal cardinality and vice versa because it is a
vertex of the feasible set. (§2.3.2.0.4)
But number of equality constraints is too large for contemporary binary solvers.4.86 So
again, we reformulate the problem:
canonical Eternity II

Because each block φᵢⱼ of Φ (971) is optimally circulant, comprising four permutation
matrices (972) uniquely identifiable by their first column (961), we may take as variable
every fourth column of Φ :

Φ = (Φ̃ ⊗ e₁ᵀ) + (I_M ⊗ π₄)(Φ̃ ⊗ e₂ᵀ) + (I_M ⊗ π₃)(Φ̃ ⊗ e₃ᵀ) + (I_M ⊗ π₂)(Φ̃ ⊗ e₄ᵀ) ∈ R^{4M×4M}        (980)

This formula describes replication (+), columnar upsampling & shifting (eᵢ ∈ R⁴), and
rotation (πᵢ ∈ R⁴ˣ⁴) of Φ̃ . By §A.1.1 no.45 and no.46

find Φ̃ ∈ B^{4M×M}
subject to   ∆ Φ P = 0
             βᵀ Φ P 1 = 0
             (I_M ⊗ 1₄ᵀ) Φ̃ 1 = 1                              (982)
             Φ̃ᵀ 1 = 1
             (e₁₂₁ ⊗ I₄)ᵀ Φ (e₁₃₉ ⊗ I₄) = π₃

where ∆ ∈ R^{2√M(√M−1)×4M} (963) (identifying adjacent edges) is evaluated in (965), initial
piece placement P ∈ R^{4M×L} is defined in (958) and enumerated in Figure 130, β ∈ R^{4M}₊
defining a game board boundary in Figure 131 has corresponding value (967), and where
π₃ (961) determines mandatory-piece rotation. Thus, Eternity II (976) is equivalently
4.86 Saunders’ program lusol can reduce that number to 797,508 constraints by eliminating linearly
dependent rows of matrix E , but that is not enough to overcome numerical issues with the best solvers.
transformed

minimize_{Φ̃∈R^{4M×M}}   Σ_{i=1}^{4M} ‖Φ̃(i , :)ᵀ‖₀ + Σ_{j=1}^{M} ‖Φ̃(: , j)‖₀
subject to   (Pᵀ ⊗ ∆) Y vec Φ̃ = 0
             (P1 ⊗ β)ᵀ Y vec Φ̃ = 0
             (1ᵀ_M ⊗ I_M ⊗ 1₄ᵀ) vec Φ̃ = 1                     (983)
             (I_M ⊗ 1ᵀ₄M) vec Φ̃ = 1
             (e₁₃₉ ⊗ e₁₂₁ ⊗ I₄)ᵀ vec Φ̃ = π₃ e₁
             vec Φ̃ ⪰ 0

whose optimal objective value is 2M since optimal cardinality of Φ̃⋆ (with entries in {0, 1})
is M , where matrix constant Y maps subsampled Φ̃ to Φ via (981), and where e₁ ∈ R⁴ .
In abbreviation of reformulation (983)

minimize_{Φ̃∈R^{4M×M}}   Σ_{i=1}^{4M} ‖Φ̃(i , :)ᵀ‖₀ + Σ_{j=1}^{M} ‖Φ̃(: , j)‖₀
subject to   Ẽ vec Φ̃ = τ̃                                      (984)
             vec Φ̃ ⪰ 0
But this dimension remains out of reach of the most highly regarded academic and commercial
binary solvers; especially disappointing insofar as sparsity of Ẽ is high, with 1,503,732
nonzero entries in {−1, 0, 1, 2} ; element {2} arising only in the β constraint, which is soon
to disappear after presolving.
that smallest face must hold a minimal cardinality solution. Matrix dimension is thereby
reduced:4.88
Designate I as the set of all surviving column indices of Ẽ from 4M² = 262,144 columns:

I ⊂ {1 ... 4M²}        (988)

The i th column Ẽ(: , i) of matrix Ẽ belongs to the smallest face F of K that contains τ̃ if
and only if

find   vec Φ̃ ∈ R^{dim I} , µ ∈ R
subject to   µ τ̃ − Ẽ(: , i) = Ẽ vec Φ̃        (384)
             vec Φ̃ ⪰ 0
is feasible. By a transformation of Saunders, this linear feasibility problem is the same
as (990), whose feasible set is a proper subset of that in (989). Real variable µ can be set
to 1 because, were that not so, then feasible (vec Φ̃)ᵢ = 1 could not be feasible to Eternity II
(984).
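With µ fixed to 1, each column test is a linear feasibility problem; a Matlab sketch
using linprog (Etilde and tau assumed in memory):

    nc = size(Etilde, 2);
    [~, ~, flag] = linprog(zeros(nc,1), [], [], ...
                           Etilde, tau - Etilde(:,i), zeros(nc,1), []);
    keep(i) = (flag == 1);              % infeasible => column i may be discarded
                                        % (but only after all columns are tested)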
If infeasible here in (990), then the only choice remaining for (vec Φ̃)ᵢ is 0 ; meaning,
column Ẽ(: , i) may be discarded but only after all columns have been tested. This
tightened problem (990) therefore tells us two things when feasible: Ẽ(: , i) belongs to the
smallest face of K that contains τ̃ , and (vec Φ̃)ᵢ constitutes a nonzero vertex-coordinate
of permutation polyhedron (104). After presolving via this conic pruning method (with
subsequent zero-row and dependent-row removal),
c.i. presolver Eternity II: Generators of smallest face are conically independent
Matrix Ẽ now accounts for the board’s edge and holds what remains after discard of
all generators not in the smallest face F of cone K that contains τ̃ . To further prune
4.88 Column elimination can be quite dramatic but is dependent upon problem geometry. By method of
convex cones, we will discard 53,666 more columns via Saunders’ pdco; a total of 111,506 columns will
have been removed from 262,144. Following dependent-row removal via lusol, dimension of Ẽ becomes
7,362 × 150,638.
Figure 133: Directed graph of adjacency matrix for Ẽ (991) (Ẽ = A in [422]) representing
reduced equality constraint in Eternity II problem. Movie from [236] shows
realization in 3D; color corresponding to line-segment length. (Realization by Yifan Hu.)
generators relatively interior to that smallest face, we may subsequently test for conic
dependence as described in §2.10: for i = 1 . . . dim I = 1 . . . 150,638
If feasible, then column Ẽ(: , i) is a conically dependent generator of the smallest face and
must be discarded from matrix Ẽ before proceeding with test of remaining columns.
Generators interior to a smallest face could provide a lower cardinality solution, so it
might be imprudent to prune. It turns out, for Eternity II: generators of the smallest face,
previously found via (990), comprise a minimal set; id est, (287) is never feasible; no more
columns of Ẽ can be discarded.4.89
Incorporating more Clue Pieces, provided by Monckton, makes the Eternity II problem
harder in the sense that solution set is diminished; the target gets smaller.4.90
4.90 But given the four clues provided, our geometric presolver (p.320) produces a 15% smaller face; a
total very nearly half the 262,144 columns can be proven to correspond to 0 coefficients.
M in (993) is met, solution vec Φ̃⋆ will be optimal for Eternity II because it must then be
a Boolean vector with minimal cardinality M .
Maximization of convex function ‖vec Φ̃‖^{dim I}_M (monotonic on R^{dim I}₊) is not a convex
problem, though the constraints are convex. [349, §32] Geometrical visualization of this
problem formulation is clear. We therefore choose to work with a complementary
direction vector z , in what follows, in predilection for a mental picture of convex function
maximization.
rumination

If it were possible to form a nullspace basis Z for Ẽ , of about equal sparsity, such that

Figure 135: N×N = 12×12 Chimera topology for D:Wave 1152-qubit (•) chip architecture
illustrating 3,360 (= 16·12·12 + 2·11·4·12) couplers (•−−−•) by line segments between
qubits (lines cross without intersection). Coupled qubits are neighbors, but distance is
not preserved by this map. (Drawing by Diane Carr.)
Note that this architecture is very different from conventional computing. The
processor has no large areas of memory (cache), rather each qubit has a tiny
piece of memory of its own. In fact, the chip is architected more like a biological
brain than the common ‘Von Neumann’ architecture of a conventional silicon
processor. One can think of the qubits as being like neurons, and the couplers as
being like synapses that control the flow of information between those neurons.
−[dwavesys.com , §1.3]
A quantum annealer is unlike a von Neumann computer architecture insofar as it does not
solve equations, there are no conditionally executable instructions, one qubit (the quantum
analogue to bit) can be in the two binary states at once,4.91 and qubit values may not be
set by a programmer [102, §2]. There is no clock in a quantum annealer, which operates at
a temperature colder than outer space: near 0 Kelvin. The first commercially available
quantum annealer was delivered in 2011.4.92 Even though its magnetic superconducting
niobium qubits are etched on a silicon substrate, a chip, the D:Wave quantum annealer is
actually the first analog computer of its kind.4.93
Ising’s spin model [47, §2.1] [354, p.297] is a measure of molecular energy for a magnetic
material: for bipolar binary s ∈ Bⁿ± = {−1, 1}ⁿ

E(s) = ½ ⟨J , ssᵀ⟩ + ⟨h , s⟩        (998)

Given applied field strength h ∈ Rⁿ and interaction field strength J ∈ Sⁿ_h , a quantum
annealer minimizes this energy E , which is always bounded because vector variable s
is bounded above and below.
A graph of the D:Wave N×N Chimera topology (N = 12) is represented in Figure 135;
a neighboring qubit topology. Hollow matrix J represents coupling that occurs among
physically neighboring qubits. Scalar ½ accounts for bidirectional coupling implied by J
matrix symmetry. Coupling, which is an application of entanglement in quantum physics,
can be controlled only for physically neighboring qubits. Increasing number of neighbors is
therefore of practical importance. [46] Effective coupling of distant qubits is implemented
by replicating qubits redundantly. [102, §3.4] As rule of thumb, complete coupling of n
qubits (highest density J) would leave O(√n) qubits available.4.94
4.91 the qubit’s superposition state.
4.92 It is not capable of solving Eternity II in 2016 because of qubits insufficient in number and coupling.
4.93 At present, there are two emergent technologies for harnessing quantum phenomena: adiabatic model
(analog annealer) and gate model (analogue to Boolean logic gates of digital computers).
4.94 1152-qubit architecture machines, having 3,360 physical couplers, became available in 2015 for $10M
USD. In 2016, 2048-qubit chips (6,016 couplers) were announced. If qubit growth continues following Rose’s
law, we should see million-qubit chips in 2025. Given n qubits, complete coupling requires n(n−1)/2
couplers. Insufficient coupler population, not qubits, will become the bottleneck. In the near term,
innovating a three-dimensional qubit topology would accelerate ratio of coupler to qubit growth.
Figure 136: Chimera circuit chip layout abstract, topological dimension N = 1 illustrated.
Eight qubits comprise hollow slabs, whereas couplers are represented by sixteen blue
discs • . Chip layout is dual to graph topology in Figure 135. Up/down arrows connote
final qubit states.
global minimum.
Coefficients B and a (1004) are selected by solution to a linear program whose undesirable
objectives always exceed the objective for each and every desirable state:
maximize_{B , a , gap}   gap
subject to 0 ≥ a3 + gap
0 ≥ a2 + gap
0 ≥ a1 + gap
B23 + a2 + a3 ≥ a3 + gap
B23 + a2 + a3 ≥ a2 + gap
B23 + a2 + a3 ≥ a1 + gap
B13 + a1 + a3 ≥ a3 + gap
B13 + a1 + a3 ≥ a2 + gap (1006)
B13 + a1 + a3 ≥ a1 + gap
B12 + a1 + a2 ≥ a3 + gap
B12 + a1 + a2 ≥ a2 + gap
B12 + a1 + a2 ≥ a1 + gap
B12 + B13 + B23 + a1 + a2 + a3 ≥ a3 + gap
B12 + B13 + B23 + a1 + a2 + a3 ≥ a2 + gap
B12 + B13 + B23 + a1 + a2 + a3 ≥ a1 + gap
−2 ≤ a ≤ 2
−2 ≤ B ≤ 2
having solution:

a⋆ = ⎡ −1 ⎤ ,   B⋆ = ⎡ 0 2 2 ⎤ ,   gap⋆ = 1        (1007)
     ⎢ −1 ⎥          ⎢ 0 0 2 ⎥
     ⎣ −1 ⎦          ⎣ 0 0 0 ⎦

easily found by cvx [195] under Matlab. For higher-dimensional q vectors (by induction),

a⋆ = −1 ∈ Rⁿ ,   B⋆ = ⎡ 0  2  ⋯  2 ⎤
                      ⎢    ⋱  ⋱  ⋮ ⎥ ∈ Rⁿˣⁿ ,   gap⋆ = 1        (1008)
                      ⎢       ⋱  2 ⎥
                      ⎣ 0        0 ⎦
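A sketch of gap LP (1006) in CVX under Matlab, writing all fifteen inequalities at once
(the one-hot states, with objectives a₁ , a₂ , a₃ , are desirable; the remaining five states
are undesirable):

    cvx_begin quiet
        variables a(3) B(3,3) gap
        maximize( gap )
        des = a;                                 % objectives of 100, 010, 001
        und = [ 0; B(2,3)+a(2)+a(3); B(1,3)+a(1)+a(3); ...
                B(1,2)+a(1)+a(2); B(1,2)+B(1,3)+B(2,3)+sum(a) ];
        subject to
            und*ones(1,3) >= ones(5,1)*des' + gap;   % every undesirable exceeds
            -2 <= a(:) <= 2                          % every desirable by gap
            -2 <= B(:) <= 2
    cvx_end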
minimize_{x∈{0,1}ⁿ}   xᵀAᵀA x − 2xᵀAᵀb        (1009)

(§E.0.1.0.1) where B ≜ AᵀA − δ²(AᵀA) and a ≜ δ(AᵀA) − 2Aᵀb from (1004). An adiabatic
quantum annealer (like D:Wave’s) is theoretically capable of solving Eternity II because
it may be expressed Ẽq = τ̃ (984), assuming that any feasible binary solution is minimal
cardinality (p.319). This formulation (1009) decreases sparsity, from that of A , which
increases required qubit coupling.4.98
4.98 For sparsity as defined on page 317, for nonsymmetric B matrix, and for:
matrix E corresponding to (978), sparsity decreases from 0.0000052771 to 0.002683 ;
matrix Ẽ corresponding to (985), sparsity decreases from 0.00051786 to 0.027965 ;
matrix Ẽ corresponding to (986), sparsity decreases from 0.00056985 to 0.0047694 ;
matrix Ẽ corresponding to (991), sparsity decreases from 0.00070522 to 0.0042453 .
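Forming these coefficients from data A , b is immediate; a Matlab sketch:

    AtA = A'*A;
    B = AtA - diag(diag(AtA));          % B = A'A - delta^2(A'A)
    a = diag(AtA) - 2*A'*b;             % a = delta(A'A) - 2A'b
    % for binary x: x'*AtA*x - 2*x'*(A'*b) == x'*B*x + a'*x, since x.^2 == x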
quantum step:

q₁ q₂ | B₁₂q₁q₂ + a₁q₁ + a₂q₂
desirable:
 0  0 | 0
 1  0 | a₁
 1  1 | B₁₂ + a₁ + a₂
undesirable:
 0  1 | a₂
maximize_{B , a , gap}   gap
subject to   a₂ ≥ 0 + gap
             a₂ ≥ a₁ + gap                    (1010)
             a₂ ≥ B₁₂ + a₁ + a₂ + gap
             −1 ≤ a ≤ 1
             −1 ≤ B ≤ 1

Upper and lower bounds are 1 , on each entrywise inequality, because gap is sufficient;

a⋆ = ⎡ −1 ⎤ ,   B⋆ = ⎡ 0 −1 ⎤ ,   gap⋆ = 1        (1011)
     ⎣  1 ⎦          ⎣ 0  0 ⎦

Extensible to higher dimension; e.g., {000 , 100 , 110 , 111}ᵀ are desirable q ∈ R³ .  □
maximize_{B , a , gap}   gap
subject to a3 ≥ 0 + gap
a3 ≥ a2 + gap
a3 ≥ B13 + a1 + a3 + gap
a3 ≥ B12 + a1 + a2 + gap
B23 + a2 + a3 ≥ 0 + gap
B23 + a2 + a3 ≥ a2 + gap
B23 + a2 + a3 ≥ B13 + a1 + a3 + gap
B23 + a2 + a3 ≥ B12 + a1 + a2 + gap
a1 ≥ 0 + gap (1012)
a1 ≥ a2 + gap
a1 ≥ B13 + a1 + a3 + gap
a1 ≥ B12 + a1 + a2 + gap
B12 + B13 + B23 + a1 + a2 + a3 ≥ 0 + gap
B12 + B13 + B23 + a1 + a2 + a3 ≥ a2 + gap
B12 + B13 + B23 + a1 + a2 + a3 ≥ B13 + a1 + a3 + gap
B12 + B13 + B23 + a1 + a2 + a3 ≥ B12 + a1 + a2 + gap
−2 ≤ a ≤ 2
−2 ≤ B ≤ 2
Optimal coefficients are not unique, but optimal objective gap is:

a⋆ = ⎡  1 ⎤ ,   B⋆ = ⎡ 0 1 −2 ⎤ ,   gap⋆ = 1        (1013)
     ⎢ −2 ⎥          ⎢ 0 0  2 ⎥
     ⎣  1 ⎦          ⎣ 0 0  0 ⎦

This optimal B matrix represents required coupling for “and” but cannot be implemented
in Chimera directly because there is no completely coupled three-qubit circuit.  □
find_{X∈R^{m×n}}   X                     find_{X , Y , Z}   X
subject to   X ∈ C           ≡           subject to   X ∈ C
             rank X ≤ k                               G = ⎡ Y   X ⎤        (1014)
                                                          ⎣ Xᵀ  Z ⎦
                                                      rank G ≤ k
Lossy data compression techniques like JPEG are popular, but it is also well known that
compression artifacts become quite perceptible with signal postprocessing that goes beyond
mere playback of a compressed signal. [252] [279] Spatial or audio frequencies presumed
masked by a simultaneity are not encoded, for example, so rendered imperceptible even
with significant postfiltering (of the decompressed signal) that is meant to reveal them;
id est, desirable artifacts are forever lost, so lossy compressed data is not amenable
to search, analysis, or postprocessing: e.g., sound effects [109] [110] [112] or image
enhancement (Adobe Photoshop).4.99 Further, there can be no universally acceptable
unique metric of perception for gauging exactly how much data can be tossed. For these
reasons, there will always be need for raw (noncompressed) data.
In this example, only so much information is thrown out as to leave perfect
reconstruction within reach. Specifically, the MIT logo in Figure 137 is perfectly
reconstructed from 700 time-sequential samples {yi } acquired by the one-pixel camera
illustrated in Figure 138. The MIT-logo image in this example impinges a 46×81
array micromirror device. This mirror array is modulated by a pseudonoise source
that independently positions all the individual mirrors. A single photodiode (one pixel)
integrates incident light from all mirrors. After stabilizing the mirrors to a fixed
but pseudorandom pattern, light so collected is then digitized into one sample yi
by analog-to-digital (A/D) conversion. This sampling process is repeated with the
micromirror array modulated to a new pseudorandom pattern.
The most important questions are: How many samples are needed for perfect
reconstruction? Does that number of samples represent compression of the original data?
4.99 As simple a process as upward scaling of signal amplitude or image size will always introduce noise, even to a noncompressed signal. But scaling noise is particularly noticeable in a JPEG-compressed image; e.g., at text or any sharp edge.
Figure 137: Massachusetts Institute of Technology (MIT) logo, including its white
boundary, may be interpreted as a rank-5 matrix. This constitutes Scene Y observed
by the one-pixel camera in Figure 138 for Example 4.9.0.0.1.
Figure 138: One-pixel camera. Compressive imaging camera block diagram. Incident
lightfield (corresponding to the desired image Y ) is reflected off a digital micromirror
device (DMD) array whose mirror orientations are modulated in the pseudorandom pattern
supplied by the random number generators (RNG). Each different mirror pattern produces
a voltage at the single photodiode that corresponds to one measurement yi . −[389] [440]
We claim that perfect reconstruction of the MIT logo can be achieved reliably with as few as 700 samples y = [yi] ∈ R^700 from this one-pixel camera. That number represents only 19% of the information obtainable from 3726 micromirrors.4.100 (Figure 139)
Our approach to reconstruction is to look for a low-rank solution to an underdetermined system:

find           X
X ∈ R^{46×81}
subject to     A vec X = y                                    (1016)
               rank X ≤ 5
where vec X is the vectorized (39) unknown image matrix. Each row of wide matrix A is one realization of a pseudorandom pattern applied to the micromirrors. Because these patterns are deterministic (known), the i th sample yi equals A(i , :) vec Y ; id est, y = A vec Y . Perfect reconstruction here means optimal solution X⋆ equals scene Y ∈ R^{46×81} to within machine precision.
Because variable matrix X is generally neither square nor positive semidefinite, we constrain its rank by rewriting the problem equivalently:

find                                              X
W1 ∈ R^{46×46} , W2 ∈ R^{81×81} , X ∈ R^{46×81}
subject to     A vec X = y                                    (1017)
               rank [ W1    X  ] ≤ 5
                    [ X^T   W2 ]

This rank constraint on the composite (block) matrix ensures rank X ≤ 5 for any choice of dimensionally compatible matrices W1 and W2 . To solve this problem by convex iteration, we alternate solution of semidefinite program

minimize                                tr( [ W1    X  ] Z )
W1 ∈ S^46 , W2 ∈ S^81 , X ∈ R^{46×81}       [ X^T   W2 ]
subject to     A vec X = y                                    (1018)
               [ W1    X  ] ⪰ 0
               [ X^T   W2 ]
4.100 That number (700 samples) is difficult to achieve, as reported in [342, §6]. If a minimal basis for the MIT logo were instead constructed, only five rows' or columns' worth of data (from a 46×81 matrix) would be linearly independent. This means a lower bound on achievable compression is about 5×46 = 230 samples plus 81 samples for column encoding, which corresponds to 8% of the original information. (Figure 139)
(which has an optimal solution known in closed form, p.539) until a rank-5 composite matrix is found.
With 1000 samples {yi} , convergence occurs in two iterations; 700 samples require more than ten iterations, but reconstruction remains perfect. More iterations admit taking of fewer samples. Reconstruction is independent of pseudorandom sequence parameters; e.g., binary sequences succeed with the same efficiency as Gaussian or uniformly distributed sequences. □
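A hedged sketch of this convex iteration follows, on a smaller stand-in instance (an 8×12 scene of rank 2, 60 samples) so that a generic SDP solver finishes quickly; the 46×81, rank-5 instance is handled identically but more slowly. The use of cvxpy, the sizes, and the tolerances are all our assumptions, not the book's code:

import numpy as np
import cvxpy as cp

m, n, rho, samples = 8, 12, 2, 60
rng = np.random.default_rng(1)
Y = rng.standard_normal((m, rho)) @ rng.standard_normal((rho, n))   # rank-rho scene
A = rng.integers(0, 2, (samples, m * n)).astype(float)              # mirror patterns
y = A @ Y.flatten(order='F')       # one sample per pattern (column-major, like cvxpy vec)

N = m + n
Z = np.eye(N)                              # initial direction matrix
G = cp.Variable((N, N), symmetric=True)    # composite [[W1, X], [X^T, W2]]
X = G[:m, m:]                              # its off-diagonal block
for it in range(20):
    prob = cp.Problem(cp.Minimize(cp.trace(G @ Z)),
                      [A @ cp.vec(X) == y, G >> 0])
    prob.solve()                           # default SDP solver (e.g. SCS)
    w, V = np.linalg.eigh(G.value)         # eigenvalues ascending
    Z = V[:, :N - rho] @ V[:, :N - rho].T  # closed-form direction update
    if w[:N - rho].sum() < 1e-6:           # composite rank <= rho attained
        break
print(np.linalg.norm(X.value - Y))         # small at convergence

The direction update puts Z on the eigenvectors of the composite matrix beyond the rho largest, so the objective tr(GZ) sums the eigenvalues that must vanish for the rank constraint of (1017) to hold.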
find           X ∈ S^n
subject to     A svec X = b                                   (1020)
               X ⪰ 0
               rank X ≤ ρ

given an upper bound 0 < ρ < n on rank, a vector b ∈ R^m , and typically wide full-rank

A = [ svec(A1)^T ]
    [     ⋮      ]  ∈ R^{m×n(n+1)/2}                          (712)
    [ svec(Am)^T ]

where

A svec X = [ tr(A1 X) ]
           [    ⋮     ]                                       (713)
           [ tr(Am X) ]
minimize                      tr(Z W)
Uii ∈ S^n , Uij ∈ R^{n×n}
subject to     Z = [ U11    ···  U1ρ ]
                   [  ⋮      ⋱    ⋮  ]  ⪰ 0
                   [ U1ρ^T  ···  Uρρ ]                        (1027)
                ρ
                Σ  A svec Uii = b
               i=1
               tr Uij = 0 ,   i < j = 2…ρ
with convex problem

minimize       tr(Z⋆ W)
W ∈ S^{nρ}
subject to     0 ⪯ W ⪯ I                                      (1028)
               tr W = nρ − 1
the latter providing direction of search W for a rank-1 matrix Z in (1027). These convex problems (1027) (1028) are iterated until a rank-1 Z matrix is found (until the objective of (1027) vanishes). The initial value of direction matrix W is the identity. For subsequent iterations, an optimal solution to (1028) has closed form (p.539).
Because of the nonconvex nature of a rank-constrained problem, there can be no proof of convergence of this convex iteration to a feasible point of (1026). But the iteration always converges to a local minimum because the sequence of objective values is monotonically nonincreasing and bounded below by 0 (tr(Z W) ≥ 0 whenever Z and W are positive semidefinite); any bounded monotonic real sequence converges. [294, §1.2] [44, §1.1] A rank-ρ matrix X solving the original problem (1020) is found when the objective in (1027) converges to 0 , a certificate of global optimality for the convex iteration. In practice, incidence of success is quite high (99.99% [421]); failures are mostly attributable to numerical accuracy.
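The closed form cited (p.539) is an eigenvalue construction: with Z⋆ ⪰ 0, one optimal W for (1028) is the projector onto the orthogonal complement of a principal eigenvector of Z⋆. A minimal sketch of that update (function name is ours):

import numpy as np

def direction_matrix(Zstar):
    """One optimal W for (1028): identity minus a principal eigenvector dyad."""
    w, V = np.linalg.eigh(Zstar)       # eigenvalues ascending
    u = V[:, -1:]                      # principal eigenvector of Z*
    return np.eye(Zstar.shape[0]) - u @ u.T

# tr(Z* W) then equals the sum of all but the largest eigenvalue of Z*,
# vanishing exactly when Z* is rank-1.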
find           U , δ(S) , V
H , J
subject to   Z = [   1      vec(H)^T   vec(U)^T   δ(S)^T   vec(V)^T ]
                 [ vec H                                            ]
                 [ vec U                    J                       ]  ⪰ 0
                 [ δ(S)                                             ]
                 [ vec V                                            ]
             δ(S) ⪰ 0
             H = US ⊂ J
             X = HV^T ∈ J                                           (1030)
             HU^T symmetry
             U^T H perpendicularity
             tr( H(: , i) H(: , i)^T ) = S(i , i)^2 ,   i = 1…ρ
             tr( H(: , i) U(: , i)^T ) = S(i , i) ,     i = 1…ρ
             H orthogonality
             U orthonormality
             V orthonormality
             rank Z = 1
where variable matrix J ∈ S_+^{2mρ+nρ+ρ} is a large partition of Z , where the given rank-ρ matrix X ∈ R^{m×n} is subject to SVD in unknown orthonormal matrices U ∈ R^{m×ρ} and V ∈ R^{n×ρ} and unknown diagonal matrix of singular values S ∈ R^{ρ×ρ} , and where introduction of variable

H ≜ US ∈ R^{m×ρ}                                              (1031)

makes identification of input X = HV^T possible within partition J . Orthogonality constraints on columns of H , within J , and orthonormality constraints on columns of U and V are critical;4.102 videlicet, h ⊥ v ⇔ tr(hv^T) = 0 ; v^T v = 1 ⇔ tr(vv^T) = 1.
4.102 Otherwise, there exist many similarly structured tripartite nonorthogonal matrix decompositions; in
place of ρ nonzero singular values, diagonal matrix S would instead hold exactly ρ coordinates; orthonormal
columns in U and V would become merely linearly independent.
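These trace identities are easily confirmed numerically, since tr(hv^T) = v^T h; a quick check (dimensions here are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
h = rng.standard_normal(5)
v = rng.standard_normal(5)
print(np.isclose(np.trace(np.outer(h, v)), v @ h))   # True : tr(h v^T) = v^T h
v /= np.linalg.norm(v)
print(np.isclose(np.trace(np.outer(v, v)), 1.0))     # True : unit norm <=> unit trace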
Figure 140: Typical convergence of SVD by convex iteration for a 2×2 random X matrix; vertical axis is tr(Z W) (0 to 5), horizontal axis is iteration (1 to 5). Matrix W represents a direction vector of convex iteration rank-1.
For 2×2 X (m = n = ρ = 2), partition J into blocks Jij bordered by h1 , h2 , u1 , u2 , [σ1 ; σ2] , v1 , v2 :

subject to   Z = [ J11    J12    J13    J14    J15    J16    J17 ]
                 [ J12^T  J22    J23    J24    J25    J26    J27 ]
                 [ J13^T  J23^T  J33    J34    J35    J36    J37 ]
                 [ J14^T  J24^T  J34^T  J44    J45    J46    J47 ]  ⪰ 0     (1032)
                 [ J15^T  J25^T  J35^T  J45^T  J55    J56    J57 ]
                 [ J16^T  J26^T  J36^T  J46^T  J56^T  J66    J67 ]
                 [ J17^T  J27^T  J37^T  J47^T  J57^T  J67^T  J77 ]
             σ1 , σ2 ≥ 0
             H = [ J35(: , 1)  J45(: , 2) ]
             X = J16 + J27
             J13 = J13^T ,  J24 = J24^T
             tr J14 = 0 ,  tr J23 = 0
             tr J11 = J55(1 , 1) ,  tr J22 = J55(2 , 2)
             tr J13 = σ1 ,  tr J24 = σ2
             tr J12 = 0
             tr J33 = 1 ,  tr J44 = 1 ,  tr J34 = 0
             tr J66 = 1 ,  tr J77 = 1 ,  tr J67 = 0
             rank Z = 1
X = U S V^T , where the first and second columns of U (and rows of V^T) take the − and + signs respectively:

U = [ u−/‖u−‖   u+/‖u+‖ ] ,    u∓ = [ 2bcd + a(a² + b² + c² − d² ∓ γ) ]
                                    [ 2acd + b(a² + b² − c² + d² ∓ γ) ]

S = [ √((a² + b² + c² + d² − γ)/2)               0               ]          (1033)
    [              0               √((a² + b² + c² + d² + γ)/2)  ]

V^T = [ v−^T/‖v−‖ ] ,          v∓ = [ a² + b² − c² − d² ∓ γ ]
      [ v+^T/‖v+‖ ]                 [       2(ac + bd)      ]

where

γ ≜ √( ((b + c)² + (a − d)²) ((b − c)² + (a + d)²) )                        (1034)
□
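The closed form (1033)-(1034) is easy to verify numerically. In this sketch the entry assignment a = X(1,1), b = X(2,1), c = X(1,2), d = X(2,2) (column-major vec ordering) is our assumption; it is the assignment under which the printed factors reproduce X in spot tests:

import numpy as np

a, b, c, d = 1.0, 2.0, 3.0, 4.0
g = np.sqrt(((b + c)**2 + (a - d)**2) * ((b - c)**2 + (a + d)**2))   # gamma, (1034)

u = lambda s: np.array([2*b*c*d + a*(a**2 + b**2 + c**2 - d**2 + s*g),
                        2*a*c*d + b*(a**2 + b**2 - c**2 + d**2 + s*g)])
U = np.column_stack([u(-1) / np.linalg.norm(u(-1)),
                     u(+1) / np.linalg.norm(u(+1))])
S = np.diag([np.sqrt((a**2 + b**2 + c**2 + d**2 - g) / 2),
             np.sqrt((a**2 + b**2 + c**2 + d**2 + g) / 2)])
vt = lambda s: np.array([a**2 + b**2 - c**2 - d**2 + s*g, 2*(a*c + b*d)])
Vt = np.vstack([vt(-1) / np.linalg.norm(vt(-1)),
                vt(+1) / np.linalg.norm(vt(+1))])

X = np.array([[a, c], [b, d]])          # assumed entry ordering (column-major vec)
print(np.allclose(U @ S @ Vt, X))       # True

The formula degenerates when the singular values coincide (γ = 0), where the normalizations above can divide by zero.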
Figure 141: W1 , W2 , and W3 represent the last three direction vectors in a sequence. m1 represents the midpoint between direction vectors W1 and W2 ; m2 is the midpoint of W2 and W3 . A straight line passes through both midpoints.