Gradient and Lagrange multipliers
Prof. S. Boyd
Derivative

Suppose f : R^n → R^m. The derivative (or Jacobian) of f at a point x ∈ R^n is the matrix Df(x) ∈ R^{m×n} with entries

\[
(Df(x))_{ij} = \left.\frac{\partial f_i}{\partial x_j}\right|_x,
\qquad i = 1, \ldots, m, \quad j = 1, \ldots, n.
\]
Example. Consider the function f : R^3 → R^2 given by f(x) = (x_1 + x_2^2, x_1 x_3). Its derivative is

\[
Df(x) = \begin{bmatrix} 1 & 2x_2 & 0 \\ x_3 & 0 & x_1 \end{bmatrix}.
\]

At the point x = (1, 0, 1), for example, we have f(x) = (1, 1) and

\[
Df(x) = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \end{bmatrix}.
\]
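As a quick numerical sanity check (an addition, not part of the original notes; the helper names and step size are my own choices), the sketch below approximates Df(x) by finite differences for the example above and compares it with the formula:

import numpy as np

def f(x):
    # the example function above: f(x) = (x1 + x2^2, x1*x3)
    return np.array([x[0] + x[1]**2, x[0] * x[2]])

def Df(x):
    # Jacobian from the formula above
    return np.array([[1.0, 2 * x[1], 0.0],
                     [x[2], 0.0, x[0]]])

def finite_diff_jacobian(f, x, h=1e-6):
    # (Df(x))_ij ~= (f_i(x + h*e_j) - f_i(x)) / h  (forward differences)
    m, n = f(x).size, x.size
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (f(x + e) - f(x)) / h
    return J

x = np.array([1.0, 0.0, 1.0])
print(Df(x))                       # [[1. 0. 0.], [1. 0. 1.]]
print(finite_diff_jacobian(f, x))  # approximately the same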
Gradient
For f : R^n → R, the gradient at x ∈ R^n is denoted ∇f(x) ∈ R^n, and it is defined as ∇f(x) = Df(x)^T, the transpose of the derivative. In terms of partial derivatives, we have

\[
(\nabla f(x))_i = \left.\frac{\partial f}{\partial x_i}\right|_x,
\qquad i = 1, \ldots, n.
\]
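As a small added illustration (this particular function is my own, not from the notes): for h : R^3 → R with h(x) = x_1 + x_2^2 x_3, the derivative is a row vector and the gradient is its transpose, a column vector:

\[
Dh(x) = \begin{bmatrix} 1 & 2x_2 x_3 & x_2^2 \end{bmatrix},
\qquad
\nabla h(x) = \begin{bmatrix} 1 \\ 2x_2 x_3 \\ x_2^2 \end{bmatrix}.
\]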
Minimizing a function
Suppose f : R^n → R, and we want to choose x so as to minimize f(x). Assuming f is differentiable, any optimal x (and it's possible that there isn't an optimal x) must satisfy ∇f(x) = 0. The converse is false: ∇f(x) = 0 does not mean that x minimizes f. Such a point is a stationary point, which could be a saddle point, a maximum of f, or a local minimum. We refer to ∇f(x) = 0 as an optimality condition for minimizing f. It is necessary, but not sufficient, for x to minimize f.

We use this result as follows. To minimize f, we find all points that satisfy ∇f(x) = 0. If there is a point that minimizes f, it must be one of these.
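A small added illustration of why the condition is not sufficient (my example, not from the notes): for f(x) = x_1^2 - x_2^2, the optimality condition

\[
\nabla f(x) = \begin{bmatrix} 2x_1 \\ -2x_2 \end{bmatrix} = 0
\]

is satisfied only at x = 0, which is a saddle point, not a minimizer; in fact f has no minimizer at all, since it is unbounded below.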
Example: Least-squares. Suppose we want to choose x ∈ R^n to minimize ‖Ax − b‖, where A ∈ R^{m×n} is skinny and full rank. This is the same as minimizing f(x) = (1/2)‖Ax − b‖^2. The optimality condition is

\[
\nabla f(x) = A^T A x - A^T b = 0.
\]

Since A is skinny and full rank, A^T A is invertible, so only one value of x satisfies this equation: x_ls = (A^T A)^{-1} A^T b.
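As an aside (a short derivation added here for completeness), the gradient formula follows by expanding

\[
f(x) = (1/2)\|Ax - b\|^2 = (1/2)\, x^T A^T A x - b^T A x + (1/2)\, b^T b
\]

and using ∇((1/2) x^T P x) = Px for symmetric P = A^T A, together with ∇(q^T x) = q for q = A^T b.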
We have to use other methods to determine that f is actually minimized (and not, say, maximized) by x_ls. Here is one method. For any z, we have

\[
(Az)^T (A x_{ls} - b) = z^T (A^T A x_{ls} - A^T b) = 0,
\]

since x_ls satisfies the optimality condition above. In other words, Az ⊥ (Ax_ls − b) for every z. Using this orthogonality, for any z we have

\[
\|A(x_{ls} + z) - b\|^2 = \|A x_{ls} - b\|^2 + \|Az\|^2 \geq \|A x_{ls} - b\|^2,
\]

so x_ls really does minimize f. With this argument, we really didn't need the optimality condition. But the optimality condition gave us a quick way to find the answer, if not verify it.
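Here is a small numerical sketch (my own, not part of the notes; it uses numpy, a random problem instance, and arbitrary variable names) that solves the normal equations and checks the optimality condition:

import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 5
A = rng.standard_normal((m, n))   # skinny; full rank with probability one
b = rng.standard_normal(m)

# solve the normal equations A^T A x = A^T b
x_ls = np.linalg.solve(A.T @ A, A.T @ b)

grad = A.T @ A @ x_ls - A.T @ b   # optimality condition: should be ~0 (roundoff level)
print(np.linalg.norm(grad))

# cross-check against numpy's least-squares solver
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_ls - x_ref))  # should be tiny

Solving the normal equations directly is fine for a well-conditioned A; np.linalg.lstsq is the more numerically robust route and is used above only as a cross-check.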
Lagrange multipliers
Suppose we want to solve the constrained optimization problem
minimize   f(x)
subject to g(x) = 0,

where f : R^n → R and g : R^n → R^p.
Lagrange introduced an extension of the optimality condition above for problems with constraints. We first form the Lagrangian

\[
L(x, \lambda) = f(x) + \lambda^T g(x),
\]

where λ ∈ R^p is called the Lagrange multiplier. The (necessary, but not sufficient) optimality conditions are

\[
\nabla_x L(x, \lambda) = 0,
\qquad
\nabla_\lambda L(x, \lambda) = g(x) = 0.
\]
These two conditions are called the KKT (Karush-Kuhn-Tucker) equations. The second
condition is not very interesting; we already knew that the optimal x must satisfy g(x) = 0.
The first is interesting, however.
To solve the constrained problem, we attempt to solve the KKT equations. The optimal
point (if one exists) must satisfy the KKT equations.
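A tiny worked illustration (my own example, not from the notes): minimize f(x) = x_1^2 + x_2^2 subject to g(x) = x_1 + x_2 - 1 = 0. The Lagrangian is L(x, λ) = x_1^2 + x_2^2 + λ(x_1 + x_2 - 1), and the KKT equations

\[
\nabla_x L(x, \lambda) = \begin{bmatrix} 2x_1 + \lambda \\ 2x_2 + \lambda \end{bmatrix} = 0,
\qquad
x_1 + x_2 - 1 = 0
\]

give x_1 = x_2 = 1/2 and λ = -1. In this case the point found really is the minimizer: it is the closest point to the origin on the line x_1 + x_2 = 1.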
Example: Linearly constrained least-squares. Consider the linearly constrained least-squares problem (see lecture slides 8)

minimize   (1/2)‖Ax − b‖^2
subject to Cx − d = 0,

with A ∈ R^{m×n} and C ∈ R^{p×n}. The Lagrangian is

\[
\begin{aligned}
L(x, \lambda) &= (1/2)\|Ax - b\|^2 + \lambda^T (Cx - d) \\
              &= (1/2)\, x^T A^T A x - b^T A x + (1/2)\, b^T b + (C^T \lambda)^T x - \lambda^T d.
\end{aligned}
\]
The KKT equations are

\[
\nabla_x L(x, \lambda) = A^T A x - A^T b + C^T \lambda = 0,
\qquad
\nabla_\lambda L(x, \lambda) = Cx - d = 0.
\]

We can write these as a single set of linear equations in the variables x and λ:

\[
\begin{bmatrix} A^T A & C^T \\ C & 0 \end{bmatrix}
\begin{bmatrix} x \\ \lambda \end{bmatrix}
=
\begin{bmatrix} A^T b \\ d \end{bmatrix},
\]

so, assuming the coefficient matrix is invertible,

\[
\begin{bmatrix} x \\ \lambda \end{bmatrix}
=
\begin{bmatrix} A^T A & C^T \\ C & 0 \end{bmatrix}^{-1}
\begin{bmatrix} A^T b \\ d \end{bmatrix}.
\]
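Here is a numerical sketch of this formula (my own, not part of the notes; numpy only, with a random instance and arbitrary names), forming the block KKT matrix and checking both KKT equations:

import numpy as np

rng = np.random.default_rng(1)
m, n, p = 20, 6, 2
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
C = rng.standard_normal((p, n))
d = rng.standard_normal(p)

# block KKT system:  [A^T A  C^T] [x  ]   [A^T b]
#                    [  C     0 ] [lam] = [  d  ]
K = np.block([[A.T @ A, C.T],
              [C, np.zeros((p, p))]])
rhs = np.concatenate([A.T @ b, d])
sol = np.linalg.solve(K, rhs)
x, lam = sol[:n], sol[n:]

print(np.linalg.norm(A.T @ A @ x - A.T @ b + C.T @ lam))  # first KKT eq.: should be ~0
print(np.linalg.norm(C @ x - d))                          # constraint: should be ~0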
As in the least-squares example above, you have to use another argument to show that the x found this way actually minimizes f subject to Cx = d. We don't expect you to be able to come up with this argument, but here's how it goes. Suppose that z satisfies Cz = 0. Then, using the first KKT equation A^T A x − A^T b = −C^T λ,

\[
(Az)^T (Ax - b) = z^T (A^T A x - A^T b) = z^T (-C^T \lambda) = -(Cz)^T \lambda = 0,
\]

so Az ⊥ (Ax − b). Any point satisfying the constraint can be written as x + z with Cz = 0, and using exactly the same calculation as for least-squares above, we get

\[
\|A(x + z) - b\|^2 = \|Ax - b\|^2 + \|Az\|^2 \geq \|Ax - b\|^2,
\]

which shows that x does indeed minimize ‖Ax − b‖ subject to Cx = d.