EE364a Homework 7 Solutions
Boyd
8.16 Maximum volume rectangle inside a polyhedron. Formulate the following problem as a
convex optimization problem. Find the rectangle
\[
\mathcal{R} = \{ x \in \mathbf{R}^n \mid l \preceq x \preceq u \}
\]
of maximum volume, enclosed in a polyhedron $\mathcal{P} = \{ x \mid Ax \preceq b \}$. The variables are
$l, u \in \mathbf{R}^n$. Your formulation should not involve an exponential number of constraints.
Solution. A straightforward, but very inefficient, way to express the constraint $\mathcal{R} \subseteq \mathcal{P}$
is to use the set of $m2^n$ inequalities $Av_i \preceq b$, where $v_i$ are the $2^n$ corners of $\mathcal{R}$. (If the
corners of a box lie inside a polyhedron, then the box does.) Fortunately it is possible
to express the constraint in a far more efficient way. Define
\[
a_{ij}^{+} = \max\{a_{ij}, 0\}, \qquad a_{ij}^{-} = \max\{-a_{ij}, 0\}.
\]
Then we have $\mathcal{R} \subseteq \mathcal{P}$ if and only if
\[
\sum_{j=1}^{n} \left( a_{ij}^{+} u_j - a_{ij}^{-} l_j \right) \leq b_i, \quad i = 1, \ldots, m.
\]
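This equivalence holds because the worst case of $a_i^T x$ over the rectangle is attained coordinate-wise; explicitly (a step left implicit in the original solution),
\[
\sup_{l \preceq x \preceq u} a_i^T x = \sum_{j=1}^{n} \sup_{l_j \leq x_j \leq u_j} a_{ij} x_j = \sum_{j=1}^{n} \left( a_{ij}^{+} u_j - a_{ij}^{-} l_j \right).
\]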
The maximum volume rectangle is the solution of
\[
\begin{array}{ll}
\mbox{maximize} & \left( \prod_{i=1}^{n} (u_i - l_i) \right)^{1/n} \\
\mbox{subject to} & \sum_{j=1}^{n} \left( a_{ij}^{+} u_j - a_{ij}^{-} l_j \right) \leq b_i, \quad i = 1, \ldots, m,
\end{array}
\]
with implicit constraint $u \succeq l$. Another formulation can be found by taking the log of
the objective, which yields
\[
\begin{array}{ll}
\mbox{maximize} & \sum_{i=1}^{n} \log(u_i - l_i) \\
\mbox{subject to} & \sum_{j=1}^{n} \left( a_{ij}^{+} u_j - a_{ij}^{-} l_j \right) \leq b_i, \quad i = 1, \ldots, m.
\end{array}
\]
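As a concrete illustration (not part of the original solution), the log formulation can be passed more or less directly to CVX; here the problem data A (m x n), b, and the dimension n are assumed to be given:

% Sketch only: maximum volume rectangle via the log formulation, in CVX.
% Assumes problem data A (m x n), b (m x 1), and dimension n are defined.
Ap = max(A,0);            % a_ij^+ = max{a_ij, 0}
Am = max(-A,0);           % a_ij^- = max{-a_ij, 0}
cvx_begin
    variables l(n) u(n)
    maximize( sum(log(u - l)) )      % log of the rectangle's volume
    subject to
        Ap*u - Am*l <= b;            % equivalent to R contained in P
cvx_end

The implicit constraint $u \succeq l$ is enforced automatically by the domain of the log terms.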
9.30 Gradient and Newton methods. Consider the unconstrained problem
\[
\mbox{minimize} \quad f(x) = -\sum_{i=1}^{m} \log(1 - a_i^T x) - \sum_{i=1}^{n} \log(1 - x_i^2),
\]
with variable $x \in \mathbf{R}^n$, and $\mathbf{dom}\, f = \{ x \mid a_i^T x < 1,\ i = 1, \ldots, m,\ |x_i| < 1,\ i = 1, \ldots, n \}$.
This is the problem of computing the analytic center of the set of linear inequalities
\[
a_i^T x \leq 1, \quad i = 1, \ldots, m, \qquad |x_i| \leq 1, \quad i = 1, \ldots, n.
\]
Note that we can choose $x^{(0)} = 0$ as our initial point. You can generate instances of
this problem by choosing $a_i$ from some distribution on $\mathbf{R}^n$.
(a) Use the gradient method to solve the problem, using reasonable choices for the
backtracking parameters, and a stopping criterion of the form $\|\nabla f(x)\|_2 \leq \eta$.
Plot the objective function and step length versus iteration number. (Once you
have determined $p^\star$, the optimal value of the problem, you can also plot $f - p^\star$
versus iteration.)
Experiment with the backtracking parameters $\alpha$ and $\beta$ to see their effect on
the total number of iterations required. Carry these experiments out for several
instances of the problem, of different sizes.
(b) Repeat using Newton's method, with stopping criterion based on the Newton
decrement $\lambda^2$. Look for quadratic convergence. You do not have to use an efficient
method to compute the Newton step, as in exercise 9.27; you can use a general
purpose dense solver, although it is better to use one that is based on a Cholesky
factorization.
Hint. Use the chain rule to find expressions for $\nabla f(x)$ and $\nabla^2 f(x)$.
Solution.
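For reference (these expressions are not written out explicitly in the solution, but they are what the Matlab code below implements), with $d_i = 1/(1 - a_i^T x)$ and $A$ the matrix with rows $a_i^T$,
\[
\nabla f(x) = A^T d + g, \qquad g_i = \frac{1}{1-x_i} - \frac{1}{1+x_i},
\]
\[
\nabla^2 f(x) = A^T \mathbf{diag}(d)^2 A + \mathbf{diag}\!\left( \frac{1}{(1+x_i)^2} + \frac{1}{(1-x_i)^2} \right).
\]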
(a) Gradient method. The figures show the function values and step lengths versus
iteration number for an example with m = 200, n = 100. We used $\alpha = 0.01$,
$\beta = 0.5$, and exit condition $\|\nabla f(x^{(k)})\|_2 \leq 10^{-3}$.
[Figures: function value $f(x^{(k)})$ and step length $t^{(k)}$ versus iteration number $k$.]
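A minimal backtracking gradient implementation consistent with the parameters above might look as follows (an illustrative sketch, not the original solution code):

% Sketch of the gradient method with backtracking (not the original code).
m = 200; n = 100; A = randn(m,n);        % random problem instance
ALPHA = 0.01; BETA = 0.5; GRADTOL = 1e-3; MAXITERS = 1000;
x = zeros(n,1); vals = []; steps = [];
for iter = 1:MAXITERS
    val = -sum(log(1-A*x)) - sum(log(1+x)) - sum(log(1-x));
    vals = [vals, val];
    grad = A'*(1./(1-A*x)) - 1./(1+x) + 1./(1-x);
    if (norm(grad) < GRADTOL), break; end;
    v = -grad;                                  % descent direction
    t = 1;
    % first backtrack until x+t*v is strictly feasible (inside dom f)
    while ((max(A*(x+t*v)) >= 1) | (max(abs(x+t*v)) >= 1)), t = BETA*t; end;
    % then standard backtracking on the objective value
    while (-sum(log(1-A*(x+t*v))) - sum(log(1-(x+t*v).^2)) > ...
            val + ALPHA*t*grad'*v), t = BETA*t; end;
    x = x + t*v;
    steps = [steps, t];
end;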
The following is a Matlab implementation of Newton's method (part (b)), run on the same problem instance.
% Newton method (ALPHA = 0.01, BETA = 0.5 as in the text; problem data A as above)
MAXITERS = 1000; NTTOL = 1e-9;      % iteration limit and Newton decrement tolerance
vals = []; steps = [];
x = zeros(n,1);
for iter = 1:MAXITERS
    val = -sum(log(1-A*x)) - sum(log(1+x)) - sum(log(1-x));
    vals = [vals, val];
    d = 1./(1-A*x);
    grad = A'*d - 1./(1+x) + 1./(1-x);
    hess = A'*diag(d.^2)*A + diag(1./(1+x).^2 + 1./(1-x).^2);
    v = -hess\grad;
    fprime = grad'*v;
    if abs(fprime) < NTTOL, break; end;
    t = 1;
    % backtrack until x+t*v is strictly feasible, then on the objective value
    while ((max(A*(x+t*v)) >= 1) | (max(abs(x+t*v)) >= 1)),
        t = BETA*t;
    end;
    while ( -sum(log(1-A*(x+t*v))) - sum(log(1-(x+t*v).^2)) > ...
            val + ALPHA*t*fprime )
        t = BETA*t;
    end;
    x = x+t*v;
    steps = [steps,t];
end;
optval = vals(length(vals));
figure(3)
semilogy([0:(length(vals)-2)], vals(1:length(vals)-1)-optval, '-', ...
    [0:(length(vals)-2)], vals(1:length(vals)-1)-optval, 'o');
xlabel('x'); ylabel('z');
figure(4)
plot([1:length(steps)], steps, '-', [1:length(steps)], steps, 'o');
axis([0, length(steps), 0, 1.1]);
xlabel('x'); ylabel('z');
9.31 Some approximate Newton methods. The cost of Newton's method is dominated by
the cost of evaluating the Hessian $\nabla^2 f(x)$ and the cost of solving the Newton system.
For large problems, it is sometimes useful to replace the Hessian by a positive definite
approximation that makes it easier to form and solve for the search step. In this
problem we explore some common examples of this idea.
For each of the approximate Newton methods described below, test the method on some
instances of the analytic centering problem described in exercise 9.30, and compare the
results to those obtained using the Newton method and gradient method.
(a) Re-using the Hessian. We evaluate and factor the Hessian only every N iterations,
where N > 1, and use the search step $\Delta x = -H^{-1}\nabla f(x)$, where H is the last
Hessian evaluated. (We need to evaluate and factor the Hessian once every N
steps; for the other steps, we compute the search direction using back and forward
substitution.)
(b) Diagonal approximation. We replace the Hessian by its diagonal, so we only have
to evaluate the n second derivatives $\partial^2 f(x)/\partial x_i^2$, and computing the search step
is very easy.
Solution.
(a) The figure shows the function value versus approximate total number of flops
required (for the same example as in the solution of exercise 9.30), for N = 1
(i.e., Newton's method), N = 15, and N = 30.
[Figure: function value versus approximate number of flops, for Newton (N = 1), N = 15, and N = 30.]
We see that the speed of convergence is increased by re-using the factorized Hessian
for several steps, as measured by true effort (i.e., number of flops required). Of course
in terms of iterations, the method is worse than the basic Newton method.
The following is a Matlab implementation.
randn('state',1);
m = 200;
n = 100;
ALPHA = 0.01;
BETA = 0.5;
MAXITERS = 1000;
NTTOL = 1e-9;
GRADTOL = 1e-3;
% generate random problem
A = randn(m,n);
% Newton method with periodically updated Hessian
for N = [1,15,30];      % re-compute Hessian every N iterations
    vals = [];
    flops = [];
    flop = 0;
    x = zeros(n,1);
    for iter = 1:MAXITERS
        val = -sum(log(1-A*x))-sum(log(1+x))-sum(log(1-x));
        vals = [vals,val];
        flops = [flops,flop];
        d = 1./(1-A*x);
        grad = A'*d-1./(1+x)+1./(1-x);
        if (rem(iter-1,N) == 0)
            H = A'*diag(d.^2)*A+diag(1./(1+x).^2+1./(1-x).^2);
            L = chol(H,'lower');
            flop = (1/3)*n^3;   % add flops for Cholesky factorization
        else
            flop = 0;
        end
        v = -L'\(L\grad);
        flop = flop+2*n^2;      % add flops for forward/backward substitution
        fprime = grad'*v;
        if (abs(fprime) < NTTOL) break; end
        t = 1;
        while ((max(A*(x+t*v))>=1) | (max(abs(x+t*v))>=1)),
            t = BETA*t;
        end
        while (-sum(log(1-A*(x+t*v)))-sum(log(1-(x+t*v).^2)) > ...
                val + ALPHA*t*fprime )
            t = BETA*t;
        end
        x = x+t*v;
    end
    if (N==1), optval = vals(length(vals)); end
    figure(1)
    cflops = cumsum(flops(1:end-1));
    perror = vals(1:end-1)-optval;
    semilogy(cflops,perror,'-',cflops,perror,'o');
    hold on;
    semilogy(cflops(1:N:end-1),perror(1:N:end-1),...
        'mo','MarkerEdgeColor','k','MarkerFaceColor','b','MarkerSize',8);
    text(cflops(end),perror(end),['N = ',num2str(N)]);
    hold on;
end
xlabel('x'); ylabel('z');
(b) The figure shows the function value versus iteration number (for the same example
as in the solution of exercise 9.30), for a diagonal approximation of the Hessian.
The experiment shows that the algorithm converges very much like the gradient
method.
[Figure: function value $f(x^{(k)})$ versus iteration number $k$, for the diagonal Hessian approximation.]
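The original solution does not include code for the diagonal approximation; a minimal sketch of the modified search-direction computation, re-using the setup and backtracking line search from the code above, might be:

% Diagonal Hessian approximation (sketch only; A, x, etc. as in the code above).
% Replaces the Hessian/Cholesky steps; the line search is unchanged.
d = 1./(1-A*x);
grad = A'*d - 1./(1+x) + 1./(1-x);
hdiag = (A.^2)'*(d.^2) + 1./(1+x).^2 + 1./(1-x).^2;   % diagonal of the Hessian
v = -grad./hdiag;                                     % approximate Newton step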
\[
\mbox{minimize} \quad \sum_{i=1}^{k} (a_i^T x - b_i)^2 + \delta \sum_{i=1}^{n-1} (x_i - x_{i+1})^2 + \eta \sum_{i=1}^{n} x_i^2,
\]
where $x \in \mathbf{R}^n$ is the variable, and $\delta, \eta > 0$ are parameters.
(a) Express the optimality conditions for this problem as a set of linear equations
involving x. (These are called the normal equations.)
(b) Now assume that $k \ll n$. Describe an efficient method to solve the normal
equations found in (2a). Give an approximate flop count for a general method
that does not exploit structure, and also for your efficient method.
(c) A numerical instance. In this part you will try out your efficient method. We'll
choose k = 100 and n = 2000, and $\delta = \eta = 1$. First, randomly generate A and
b with these dimensions. Form the normal equations as in (2a), and solve them
using a generic method. Next, write (short) code implementing your efficient
method, and run it on your problem instance. Verify that the solutions found by
the two methods are nearly the same, and also that your efficient method is much
faster than the generic one.
Note: You'll need to know some things about Matlab to be sure you get the speedup
from the efficient method. Your method should involve solving linear equations with a
tridiagonal coefficient matrix. In this case, both the factorization and the back sub-
stitution can be carried out very efficiently. The Matlab documentation says that
banded matrices are recognized and exploited when solving equations, but we found
this wasn't always the case. To be sure Matlab knows your matrix is tridiagonal, you
can declare the matrix as sparse, using spdiags, which can be used to create a tridi-
agonal matrix. You could also create the tridiagonal matrix conventionally, and then
convert the resulting matrix to a sparse one using sparse.
One other thing you need to know. Suppose you need to solve a group of linear
equations with the same coefficient matrix, i.e., you need to compute $F^{-1}a_1, \ldots, F^{-1}a_m$,
where F is invertible and $a_i$ are column vectors. By concatenating columns, this can
be expressed as a single matrix
\[
\left[ \begin{array}{ccc} F^{-1}a_1 & \cdots & F^{-1}a_m \end{array} \right]
= F^{-1} \left[ \begin{array}{ccc} a_1 & \cdots & a_m \end{array} \right].
\]
To compute this matrix using Matlab, you should collect the righthand sides into one
matrix (as above) and use Matlab's backslash operator: F\A. This will do the right
thing: factor the matrix F once, and carry out multiple back substitutions for the
righthand sides.
Solution.
(a) The objective function is
\[
x^T (A^T A + \delta \Delta + \eta I) x - 2 b^T A x + b^T b,
\]
where $A \in \mathbf{R}^{k \times n}$ is the matrix with rows $a_i^T$, and $\Delta \in \mathbf{R}^{n \times n}$ is the tridiagonal
matrix
\[
\Delta = \left[ \begin{array}{rrrrrrr}
1 & -1 & 0 & \cdots & 0 & 0 & 0 \\
-1 & 2 & -1 & \cdots & 0 & 0 & 0 \\
0 & -1 & 2 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 2 & -1 & 0 \\
0 & 0 & 0 & \cdots & -1 & 2 & -1 \\
0 & 0 & 0 & \cdots & 0 & -1 & 1
\end{array} \right].
\]
Since the problem is unconstrained, the optimality conditions are
\[
(A^T A + \delta \Delta + \eta I) x^\star = A^T b. \qquad (1)
\]
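As a brief intermediate step (not spelled out in the original solution), equation (1) is just the gradient of the objective set to zero:
\[
\nabla \left( x^T (A^T A + \delta \Delta + \eta I) x - 2 b^T A x + b^T b \right)
= 2 (A^T A + \delta \Delta + \eta I) x - 2 A^T b = 0.
\]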
(b) If no structure is exploited, then solving (1) costs approximately $(1/3)n^3$ flops. If
$k \ll n$, we need to solve a system $Fx = g$ where F is the sum of a tridiagonal
and a (relatively) low-rank matrix. We can use the Sherman-Morrison-Woodbury
formula
\[
x^\star = (\delta \Delta + \eta I)^{-1} g
- (\delta \Delta + \eta I)^{-1} A^T \left( I + A (\delta \Delta + \eta I)^{-1} A^T \right)^{-1} A (\delta \Delta + \eta I)^{-1} g
\]
to efficiently solve (1) as follows:
i. Solve $(\delta \Delta + \eta I) z_1 = g$ and $(\delta \Delta + \eta I) Z_2 = A^T$ for $z_1$ and $Z_2$. Since $\delta \Delta + \eta I$
is tridiagonal, the total cost for this is approximately $4nk + 5n$ flops ($n$ for
factorization and $4n(k+1)$ for the solves).
ii. Form $A z_1$ and $A Z_2$ ($2nk + 2nk^2$ flops).
iii. Solve $(I + A Z_2) z_3 = A z_1$ for $z_3$ ($(1/3)k^3$ flops).
iv. Form $x^\star = z_1 - Z_2 z_3$ ($2nk$ flops).
The total flop count, keeping only leading terms, is $2nk^2$ flops, which is much
smaller than $(1/3)n^3$ when $k \ll n$.
(c) Here's the Matlab code:
clear all; close all;
n = 2000;
k = 100;
delta = 1;
eta = 1;
A = rand(k,n);
b = rand(k,1);
e = ones(n,1);
D = spdiags([-e 2*e -e],[-1 0 1], n,n);
D(1,1) = 1; D(n,n) = 1;
I = speye(n);
F = A'*A + eta*I + delta*D;
P = eta*I + delta*D;   % P is cheap to invert since it's tridiagonal
g = A'*b;
% Directly computing optimal solution
fprintf('\nComputing solution directly\n');
s1 = cputime;
x_gen = F\g;
s2 = cputime;
fprintf('Done (in %g sec)\n',s2-s1);
fprintf('\nComputing solution using efficient method\n');
% x_eff = P^{-1}g - P^{-1}A'(I + A P^{-1} A')^{-1} A P^{-1} g
t1 = cputime;
Z_0 = P\[g A'];
z_1 = Z_0(:,1);
% z_2 = A*z_1;
Z_2 = Z_0(:,2:k+1);
z_3 = (sparse(1:k,1:k,1) + A*Z_2)\(A*z_1);
x_eff = z_1 - Z_2*z_3;
t2 = cputime;
fprintf('Done (in %g sec)\n',t2-t1);
fprintf('\nrelative error = %e\n',norm(x_eff-x_gen)/norm(x_gen));