Expanding Subspace Theorem
1.2. Conjugate Direction Methods. In this section we focus on the problem P when f
has the form
\[
f(x) := \tfrac{1}{2}\, x^T Q x - b^T x, \tag{1.1}
\]
where Q is a symmetric positive definite matrix. Our development in this section revolves
around the notion of Q-conjugacy.
Definition 1.1 (Conjugacy). Let $Q \in \mathbb{R}^{n \times n}$ be symmetric and positive definite. We say that the vectors $x, y \in \mathbb{R}^n \setminus \{0\}$ are Q-conjugate (or Q-orthogonal) if $x^T Q y = 0$.
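As a quick numerical illustration (a sketch of our own, not part of the notes), Q-conjugacy can be checked directly from the definition; the matrix and vectors below are made up for the example.

```python
import numpy as np

# A made-up symmetric positive definite matrix for illustration.
Q = np.array([[2.0, 0.0],
              [0.0, 3.0]])

def q_conjugate(x, y, Q, tol=1e-12):
    """Check the definition: nonzero x and y are Q-conjugate iff x^T Q y = 0."""
    return abs(x @ Q @ y) < tol

x = np.array([1.0, 0.0])
y = np.array([0.0, 5.0])
print(q_conjugate(x, y, Q))  # True: x^T Q y = 0
print(q_conjugate(x, x, Q))  # False: x^T Q x = 2 > 0 by positive definiteness
```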
Proposition 1.1.1 (Conjugacy implies Linear Independence). If $Q \in \mathbb{R}^{n \times n}$ is positive definite and the nonzero vectors $d_0, d_1, \dots, d_k$ are (pairwise) Q-conjugate, then these vectors are linearly independent.
Proof. If $0 = \sum_{i=0}^{k} \alpha_i d_i$, then for any $i_0 \in \{0, 1, \dots, k\}$
\[
0 = d_{i_0}^T Q \Bigl[ \sum_{i=0}^{k} \alpha_i d_i \Bigr] = \alpha_{i_0}\, d_{i_0}^T Q d_{i_0},
\]
since $d_{i_0}^T Q d_i = 0$ whenever $i \ne i_0$. As $d_{i_0}^T Q d_{i_0} > 0$, this forces $\alpha_{i_0} = 0$. Hence $d_0, d_1, \dots, d_k$ are linearly independent. $\square$
so that
\[
\mu_j = \frac{-d_j^T \nabla f(x_0)}{d_j^T Q d_j}\,, \qquad j = 0, 1, \dots, k-1. \tag{1.3}
\]
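The formula (1.3) can be sanity-checked numerically. The sketch below (our own construction, with made-up data) uses the eigenvectors of Q, which are automatically Q-conjugate for symmetric Q, and verifies that when the directions span all of $\mathbb{R}^n$ the coefficients $\mu_j$ reproduce the global minimizer $Q^{-1}b$.

```python
import numpy as np

# Made-up example data: Q symmetric positive definite, b arbitrary.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x0 = np.zeros(2)

# Eigenvectors of the symmetric matrix Q are pairwise Q-conjugate.
_, V = np.linalg.eigh(Q)
d = [V[:, 0], V[:, 1]]

g0 = Q @ x0 - b                                  # grad f(x0) = Q x0 - b
mu = [-(dj @ g0) / (dj @ Q @ dj) for dj in d]    # the coefficients from (1.3)
x_star = x0 + sum(m * dj for m, dj in zip(mu, d))

# With a full set of conjugate directions the subspace minimizer is global.
print(np.allclose(x_star, np.linalg.solve(Q, b)))  # True
```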
This observation motivates the following theorem.
Theorem 1.2 (Expanding Subspace Theorem). Let $\{d_i\}_{i=0}^{n-1}$ be a sequence of nonzero Q-conjugate vectors in $\mathbb{R}^n$. Then for any $x_0 \in \mathbb{R}^n$ the sequence $\{x_k\}$ generated according to
\[
x_{k+1} = x_k + \alpha_k d_k
\]
with
\[
\alpha_k := \arg\min \{ f(x_k + \alpha d_k) : \alpha \in \mathbb{R} \}
\]
has the property that $f(x) = \frac{1}{2} x^T Q x - b^T x$ attains its minimum value on the affine set $x_0 + \operatorname{Span}\{d_0, \dots, d_{k-1}\}$ at the point $x_k$.
Proof. Let us first compute the value of the $\alpha_k$'s. Set
\[
\varphi_k(\alpha) = f(x_k + \alpha d_k) = \frac{\alpha^2}{2}\, d_k^T Q d_k + \alpha\, g_k^T d_k + f(x_k),
\]
where $g_k = \nabla f(x_k) = Q x_k - b$. Then $\varphi_k'(\alpha) = \alpha\, d_k^T Q d_k + g_k^T d_k$. Since $f$ is strictly convex, so is $\varphi_k$, and so $\alpha_k$ is the unique solution to $\varphi_k'(\alpha) = 0$, which is given by
\[
\alpha_k = - \frac{g_k^T d_k}{d_k^T Q d_k}.
\]
Therefore,
\[
x_k = x_0 + \alpha_0 d_0 + \alpha_1 d_1 + \cdots + \alpha_{k-1} d_{k-1}
\]
with
\[
\alpha_j = - \frac{g_j^T d_j}{d_j^T Q d_j}\,, \qquad j = 0, 1, \dots, k-1.
\]
Preceding the theorem it was shown that if $x^*$ is the solution to the problem
\[
\min \{ f(x) \mid x \in x_0 + \operatorname{Span}(d_0, d_1, \dots, d_{k-1}) \},
\]
then $x^*$ is given by (1.2) and (1.3). Therefore, if we can now show that $\mu_j = \alpha_j$ for $j = 0, 1, \dots, k-1$, then $x^* = x_k$, which proves the result. For each $j \in \{0, 1, \dots, k-1\}$ we have
\[
\begin{aligned}
\nabla f(x_j)^T d_j &= (Q x_j - b)^T d_j \\
&= (Q(x_0 + \alpha_0 d_0 + \alpha_1 d_1 + \cdots + \alpha_{j-1} d_{j-1}) - b)^T d_j \\
&= (Q x_0 - b)^T d_j + \alpha_0\, d_0^T Q d_j + \alpha_1\, d_1^T Q d_j + \cdots + \alpha_{j-1}\, d_{j-1}^T Q d_j \\
&= (Q x_0 - b)^T d_j \\
&= \nabla f(x_0)^T d_j .
\end{aligned}
\]
Therefore, for each $j \in \{0, 1, \dots, k-1\}$,
\[
\alpha_j = \frac{-\nabla f(x_j)^T d_j}{d_j^T Q d_j} = \frac{-\nabla f(x_0)^T d_j}{d_j^T Q d_j} = \mu_j . \qquad \square
\]
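To illustrate the theorem, here is a small numerical sketch (our own, with made-up Q and b): performing exact line searches along a full set of Q-conjugate directions reaches the global minimizer in $n$ steps.

```python
import numpy as np

# Made-up data: Q symmetric positive definite, b arbitrary.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

# Eigenvectors of the symmetric matrix Q give a Q-conjugate set of directions.
_, V = np.linalg.eigh(Q)

x = np.array([10.0, -7.0])               # arbitrary starting point x0
for k in range(2):
    d = V[:, k]
    g = Q @ x - b                        # g_k = grad f(x_k)
    alpha = -(g @ d) / (d @ Q @ d)       # alpha_k from the exact line search
    x = x + alpha * d

# By the theorem, x_2 minimizes f over all of R^2, i.e. x_2 = Q^{-1} b.
print(np.allclose(x, np.linalg.solve(Q, b)))  # True
```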
$g_{k+1} \notin \operatorname{Span}[g_0, \dots, Q^k g_0]$, and so $\operatorname{Span}[g_0, g_1, \dots, g_{k+1}] = \operatorname{Span}[g_0, \dots, Q^{k+1} g_0]$, which proves (1).
To prove (2) write
\[
d_{k+1} = -g_{k+1} + \beta_k d_k ,
\]
so that (2) follows from (1) and the induction hypothesis on (2).
To see (3) observe that
\[
d_{k+1}^T Q d_i = -g_{k+1}^T Q d_i + \beta_k\, d_k^T Q d_i .
\]
For $i = k$ the right-hand side is zero by the definition of $\beta_k$. For $i < k$ both terms vanish: the term $g_{k+1}^T Q d_i = 0$ by Theorem 1.2, since $Q d_i \in \operatorname{Span}[d_0, \dots, d_k]$ by (1) and (2), and the term $d_k^T Q d_i$ vanishes by the induction hypothesis on (3).
To prove (4) write
\[
-g_k^T d_k = g_k^T g_k - \beta_{k-1}\, g_k^T d_{k-1},
\]
where $g_k^T d_{k-1} = 0$ by Theorem 1.2.
To prove (5) note that $g_{k+1}^T g_k = 0$ by Theorem 1.2, because $g_k \in \operatorname{Span}[d_0, \dots, d_k]$. Hence, since $g_{k+1} - g_k = Q(x_{k+1} - x_k) = \alpha_k Q d_k$,
\[
g_{k+1}^T Q d_k = \frac{1}{\alpha_k}\, g_{k+1}^T [\, g_{k+1} - g_k \,] = \frac{1}{\alpha_k}\, g_{k+1}^T g_{k+1} .
\]
Therefore,
\[
\beta_k = \frac{1}{\alpha_k} \frac{g_{k+1}^T g_{k+1}}{d_k^T Q d_k} = \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k}\,,
\]
where the last equality follows from the formula for $\alpha_k$ and (4). $\square$
Remarks:
(1) The C–G method described above is a descent method since the values
\[
f(x_0), f(x_1), \dots, f(x_n)
\]
form a decreasing sequence. Moreover, note that
\[
\nabla f(x_k)^T d_k = -g_k^T g_k \quad \text{and} \quad \alpha_k > 0 .
\]
Thus, the C–G method behaves very much like the descent methods discussed previously.
(2) It should be observed that, due to the occurrence of round-off error, the C–G algorithm is best implemented as an iterative method. That is, at the end of $n$ steps, $f$ may not attain its global minimum at $x_n$ and the intervening directions $d_k$ may not be Q-conjugate. Consequently, at the end of the $n$th step one should check the value $\|\nabla f(x_n)\|$. If it is sufficiently small, then accept $x_n$ as the point at which $f$ attains its global minimum value; otherwise, reset $x_0 := x_n$ and run the algorithm again. Due to the observations in the remark above, this approach is guaranteed to continue to reduce the function value when possible, since the overall method is a descent method. In this sense the C–G algorithm is self-correcting.
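The iterative use of C–G described in remark (2) can be sketched as follows; this is a simplified illustration of our own (function name and test data are made up), combining the step sizes $\alpha_k$ and update coefficients $\beta_k$ derived above with the restart-after-$n$-steps strategy.

```python
import numpy as np

def cg_with_restarts(Q, b, x0, tol=1e-8, max_restarts=10):
    """Run passes of n C-G steps; after each pass test ||grad f(x_n)|| and
    restart from the final iterate if it is not yet small enough."""
    x = np.array(x0, dtype=float)
    n = len(b)
    for _ in range(max_restarts):
        g = Q @ x - b                        # reset: x0 := current iterate
        d = -g                               # d_0 = -g_0
        for _ in range(n):                   # one pass of n C-G steps
            if np.linalg.norm(g) < tol:
                break
            alpha = -(g @ d) / (d @ Q @ d)   # exact line search along d_k
            x = x + alpha * d
            g_new = Q @ x - b
            beta = (g_new @ g_new) / (g @ g) # beta_k from property (5)
            d = -g_new + beta * d
            g = g_new
        if np.linalg.norm(Q @ x - b) < tol:  # accept x_n if gradient is small
            return x
    return x

Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = cg_with_restarts(Q, b, np.zeros(2))
```

Because each pass is a descent method, every restart can only continue to reduce the function value, mirroring the self-correcting behavior noted above.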
To get some kind of control on this behavior, we define a measure of conjugacy and restart if this measure is violated. Second, we need to make sure that the search directions $d_k$ are descent directions. Moreover, (a) the angle between these directions and the negative gradient should be bounded away from zero in order to force the gradient to zero, and (b) the directions should have a magnitude comparable to that of the gradient in order to prevent ill-conditioning. The precise restart conditions are given below.
Restart Conditions
(1) $k = n$
(2) $|g_{k+1}^T g_k| \ge 0.2\, g_k^T g_k$
(3) $-2\, g_k^T g_k \le g_k^T d_k \le -0.2\, g_k^T g_k$ is violated
Conditions (2) and (3) above are known as the Powell restart conditions.
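The three restart tests can be collected into a small predicate. The sketch below is our own (the function name and argument names are made up); it returns True when any of the conditions above signals a restart.

```python
import numpy as np

def should_restart(k, n, g_k, g_next, d_k):
    """Return True if any restart condition fires: k = n, loss of gradient
    orthogonality (Powell condition (2)), or a search direction that is not
    a well-scaled descent direction (Powell condition (3))."""
    gg = g_k @ g_k
    if k == n:                                   # condition (1)
        return True
    if abs(g_next @ g_k) >= 0.2 * gg:            # condition (2)
        return True
    gd = g_k @ d_k
    if not (-2.0 * gg <= gd <= -0.2 * gg):       # condition (3) violated
        return True
    return False
```

For example, with $g_k = (1,0)$, $d_k = -g_k$, and a nearly orthogonal next gradient, no condition fires and the iteration continues.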