
EE263

Prof. S. Boyd

Derivative, Gradient, and Lagrange Multipliers


Derivative
Suppose f : R^n → R^m is differentiable. Its derivative or Jacobian at a point x ∈ R^n is
denoted Df(x) ∈ R^{m×n}, defined as

    (Df(x))_ij = ∂f_i/∂x_j |_x,    i = 1, ..., m,    j = 1, ..., n.

The first order Taylor expansion of f at (or near) x is given by

    f̂(y) = f(x) + Df(x)(y − x).

When y − x is small, f(y) − f̂(y) is very small. This is called the linearization of f at (or
near) x.
As an example, consider n = 3, m = 2, with

    f(x) = [ x_1 − x_2^2 ]
           [   x_1 x_3   ]

Its derivative at the point x is

    Df(x) = [  1   −2x_2    0  ]
            [ x_3    0     x_1 ]

and its first order Taylor expansion near x = (1, 0, 1) is given by

    f̂(y) = [ 1 ] + [ 1  0  0 ] (y − (1, 0, 1)).
           [ 1 ]   [ 1  0  1 ]
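This linearization is easy to check numerically. The sketch below (Python with NumPy; it takes f(x) = (x_1 − x_2^2, x_1 x_3) as in the example, with the small perturbation chosen arbitrarily) compares f(y) with its first order Taylor expansion at x = (1, 0, 1):

```python
import numpy as np

def f(x):
    # the example map f : R^3 -> R^2 from the text
    return np.array([x[0] - x[1]**2, x[0] * x[2]])

def Df(x):
    # Jacobian of f, entry (i, j) = partial of f_i with respect to x_j
    return np.array([[1.0, -2.0 * x[1], 0.0],
                     [x[2], 0.0, x[0]]])

x0 = np.array([1.0, 0.0, 1.0])
y = x0 + np.array([0.01, -0.02, 0.005])    # a point near x0

exact = f(y)
linearized = f(x0) + Df(x0) @ (y - x0)     # first order Taylor expansion
print(np.max(np.abs(exact - linearized)))  # small, of order ||y - x0||^2
```

Shrinking the perturbation by a factor of 10 shrinks the discrepancy by roughly a factor of 100, as the second-order error term predicts.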

Gradient
For f : R^n → R, the gradient at x ∈ R^n is denoted ∇f(x) ∈ R^n, and it is defined as
∇f(x) = Df(x)^T, the transpose of the derivative. In terms of partial derivatives, we have

    ∇f(x)_i = ∂f/∂x_i |_x,    i = 1, ..., n.

The first order Taylor expansion of f at x is given by

    f̂(y) = f(x) + ∇f(x)^T (y − x).

Gradient of affine and quadratic functions


You can check the formulas below by working out the partial derivatives.
For f affine, i.e., f(x) = a^T x + b, we have ∇f(x) = a (independent of x).
For f a quadratic form, i.e., f(x) = x^T P x with P ∈ R^{n×n}, we have ∇f(x) = (P + P^T)x.
When P is symmetric, this simplifies to ∇f(x) = 2Px.
We can use these basic facts and some simple calculus rules, such as linearity of the gradient
operator (the gradient of a sum is the sum of the gradients, and the gradient of a scaled
function is the scaled gradient), to find the gradient of more complex functions. For example,
let's compute the gradient of

    f(x) = (1/2)‖Ax − b‖^2 + c^T x,

with A ∈ R^{m×n}. We expand the first term to get

    f(x) = (1/2)x^T (A^T A)x − b^T Ax + (1/2)b^T b + c^T x,

and now use the rules above to get

    ∇f(x) = A^T Ax − A^T b + c = A^T (Ax − b) + c.
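A quick way to check a gradient formula like this is to compare it against finite differences. A minimal NumPy sketch, with randomly generated A, b, and c standing in for concrete problem data:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.standard_normal(n)

def f(x):
    # f(x) = (1/2)||Ax - b||^2 + c^T x
    return 0.5 * np.linalg.norm(A @ x - b)**2 + c @ x

def grad_f(x):
    # gradient from the rules above: A^T (Ax - b) + c
    return A.T @ (A @ x - b) + c

# forward-difference approximation of each partial derivative
x = rng.standard_normal(n)
h = 1e-6
num = np.array([(f(x + h * e) - f(x)) / h for e in np.eye(n)])
print(np.max(np.abs(num - grad_f(x))))  # on the order of h
```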

Minimizing a function
Suppose f : R^n → R, and we want to choose x so as to minimize f(x). Assuming f is
differentiable, any optimal x (and it's possible that there isn't an optimal x) must satisfy
∇f(x) = 0. The converse is false: ∇f(x) = 0 does not mean that x minimizes f. Such a
point is actually a stationary point, and could be a saddle point or a maximum of f, or a
local minimum. We refer to ∇f(x) = 0 as an optimality condition for minimizing f. It is
necessary, but not sufficient, for x to minimize f.
We use this result as follows. To minimize f, we find all points that satisfy ∇f(x) = 0.
If there is a point that minimizes f, it must be one of these.
Example: Least-squares. Suppose we want to choose x ∈ R^n to minimize ‖Ax − b‖,
where A ∈ R^{m×n} is skinny and full rank. This is the same as minimizing
f(x) = (1/2)‖Ax − b‖^2. The optimality condition is

    ∇f(x) = A^T Ax − A^T b = 0.

Only one value of x satisfies this equation: x_ls = (A^T A)^{-1} A^T b.
We have to use other methods to determine that f is actually minimized (and not, say,
maximized) by x_ls. Here is one method. For any z, we have

    (Az)^T (Ax_ls − b) = z^T (A^T Ax_ls − A^T b) = 0,

so Az ⊥ Ax_ls − b. Now we note that

    ‖Ax − b‖^2 = ‖Ax_ls − b + A(x − x_ls)‖^2
               = ‖Ax_ls − b‖^2 + 2(A(x − x_ls))^T (Ax_ls − b) + ‖A(x − x_ls)‖^2
               = ‖Ax_ls − b‖^2 + ‖A(x − x_ls)‖^2
               ≥ ‖Ax_ls − b‖^2,

using the orthogonality result above. So this shows that x_ls really does minimize f. With
this argument, we really didn't need the optimality condition. But the optimality condition
gave us a quick way to find the answer, if not verify it.
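The example can be reproduced numerically. The sketch below (random data; NumPy) solves the optimality condition A^T Ax = A^T b directly and checks both the vanishing gradient and the orthogonality property:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3
A = rng.standard_normal((m, n))   # skinny, full rank with probability 1
b = rng.standard_normal(m)

# solve the optimality condition A^T A x = A^T b
x_ls = np.linalg.solve(A.T @ A, A.T @ b)

# the gradient of (1/2)||Ax - b||^2 vanishes at x_ls ...
grad = A.T @ (A @ x_ls - b)
print(np.linalg.norm(grad))       # ~0 up to roundoff

# ... and the residual is orthogonal to the range of A: (Az)^T (Ax_ls - b) = 0
z = rng.standard_normal(n)
print((A @ z) @ (A @ x_ls - b))   # ~0 up to roundoff

# agrees with NumPy's least-squares solver
print(np.allclose(x_ls, np.linalg.lstsq(A, b, rcond=None)[0]))
```

(Forming A^T A explicitly squares the condition number; `lstsq` uses a factorization of A instead, which is why it is preferred in practice.)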

Lagrange multipliers
Suppose we want to solve the constrained optimization problem

    minimize    f(x)
    subject to  g(x) = 0,

where f : R^n → R and g : R^n → R^p.
Lagrange introduced an extension of the optimality condition above for problems with
constraints. We first form the Lagrangian

    L(x, λ) = f(x) + λ^T g(x),

where λ ∈ R^p is called the Lagrange multiplier. The (necessary, but not sufficient) optimality
conditions are

    ∇_x L(x, λ) = 0,    ∇_λ L(x, λ) = g(x) = 0.

These two conditions are called the KKT (Karush-Kuhn-Tucker) equations. The second
condition is not very interesting; we already knew that the optimal x must satisfy g(x) = 0.
The first is interesting, however.
To solve the constrained problem, we attempt to solve the KKT equations. The optimal
point (if one exists) must satisfy the KKT equations.
Example: Linearly constrained least-squares. Consider the linearly constrained
least-squares problem (see lecture slides 8)

    minimize    (1/2)‖Ax − b‖^2
    subject to  Cx − d = 0,

with A ∈ R^{m×n} and C ∈ R^{p×n}. The Lagrangian is

    L(x, λ) = (1/2)‖Ax − b‖^2 + λ^T (Cx − d)
            = (1/2)x^T A^T Ax − b^T Ax + (1/2)b^T b + (C^T λ)^T x − λ^T d.

The KKT conditions are

    ∇_x L(x, λ) = A^T Ax − A^T b + C^T λ = 0,
    ∇_λ L(x, λ) = Cx − d = 0.

These are a set of n + p linear equations in n + p variables, which we can write as

    [ A^T A   C^T ] [ x ]   [ A^T b ]
    [   C      0  ] [ λ ] = [   d   ].

If the matrix on the left is invertible, this has one solution,

    [ x ]   [ A^T A   C^T ]^{-1} [ A^T b ]
    [ λ ] = [   C      0  ]      [   d   ].
As in the least-squares example above, you have to use another argument to show that the
x found this way actually minimizes f subject to Cx = d. We don't expect you to be able
to come up with this argument, but here's how it goes. Suppose that z satisfies Cz = 0.
Then

    (Az)^T (Ax − b) = z^T (A^T Ax − A^T b) = z^T (−C^T λ) = −(Cz)^T λ = 0,

so (Az) ⊥ (Ax − b). Using exactly the same calculation as for least-squares above, we get,
for any feasible point x̃ = x + z (so that Cx̃ = d),

    ‖Ax̃ − b‖^2 ≥ ‖Ax − b‖^2,

which shows that x does indeed minimize ‖Ax − b‖ subject to Cx = d.
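The KKT conditions here are just an (n + p) × (n + p) set of linear equations, so they can be solved directly. A NumPy sketch with random data (the projection step used to build a feasible comparison point is an illustrative choice, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 8, 4, 2
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
C = rng.standard_normal((p, n))
d = rng.standard_normal(p)

# assemble and solve the (n + p) x (n + p) KKT system
K = np.block([[A.T @ A, C.T],
              [C, np.zeros((p, p))]])
rhs = np.concatenate([A.T @ b, d])
sol = np.linalg.solve(K, rhs)
x, lam = sol[:n], sol[n:]

print(np.linalg.norm(C @ x - d))                       # feasibility, ~0
print(np.linalg.norm(A.T @ (A @ x - b) + C.T @ lam))   # stationarity, ~0

# any other feasible point x + z (with Cz = 0) does no better
z = rng.standard_normal(n)
z -= np.linalg.pinv(C) @ (C @ z)   # project z into the nullspace of C
print(np.linalg.norm(A @ (x + z) - b) >= np.linalg.norm(A @ x - b))  # True
```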
