Vector Calculus for Optimization and Root-Finding

Brian Van Koten

2024/04/22

Unconstrained Optimization
The goal is to find

x ∈ argmin_z f(z)

for some function f : R^n → R.

That is, we want to find a minimizer of f.
Here, we call f the objective function.

Minima vs. Maxima


The minima of f are the maxima of −f , so methods for finding minima can also
be used to find maxima.
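As a concrete illustration (a minimal sketch of my own, not part of the original notes), the snippet below minimizes a made-up objective f(x) = (x_1 - 1)^2 + x_2^2 with scipy.optimize.minimize; the final comment shows how the −f trick would be used for maximization.

import numpy as np
from scipy.optimize import minimize

# Made-up objective for illustration: f(x) = (x1 - 1)^2 + x2^2,
# whose unique minimizer is x = (1, 0).
def f(x):
    return (x[0] - 1.0)**2 + x[1]**2

x0 = np.zeros(2)          # initial guess
result = minimize(f, x0)  # find a (local) minimizer of f
print(result.x)           # approximately [1, 0]

# To maximize some objective g instead, minimize -g:
# minimize(lambda x: -g(x), x0)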

Constraints
In many applications, one has constraints, e.g.

minimize f(x) subject to Ax = 0

for some matrix A ∈ R^{k×n}.
We will not consider constrained minimization here, but it is very important.
See, for example, the Wikipedia article on Linear Programming.
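For readers who want to experiment anyway, here is a minimal sketch (my own, not from the notes) of passing a linear equality constraint Ax = 0 to scipy.optimize.minimize via a LinearConstraint; the objective and the matrix A are made up for illustration.

import numpy as np
from scipy.optimize import minimize, LinearConstraint

# Made-up objective: squared distance to the point (1, 2, 3).
def f(x):
    return np.sum((x - np.array([1.0, 2.0, 3.0]))**2)

A = np.array([[1.0, 1.0, 1.0]])            # constraint: x1 + x2 + x3 = 0

# LinearConstraint(A, lb, ub) with lb = ub = 0 encodes Ax = 0.
result = minimize(f, x0=np.zeros(3),
                  constraints=[LinearConstraint(A, 0.0, 0.0)],
                  method="trust-constr")
print(result.x)                            # approximately [-1, 0, 1]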

Review of Vector Calculus


Gradient and Directional Derivatives
Let g : R^m → R.

Definition. The directional derivative of g at x ∈ R^m in direction h ∈ R^m is

\partial_h g(x) := \left. \frac{d}{ds} \right|_{s=0} g(x + sh),

if the derivative above exists.


Define g_h : R → R by g_h(s) = g(x + sh).
Think of the graph of g_h as the cross-section of the graph of g over the line {x + sh : s ∈ R}.
(Picture on board.)
The directional derivative is g_h'(0) = \left. \frac{d}{ds} \right|_{s=0} g(x + sh), i.e., the slope of the tangent to the cross-section of the graph of g over the line {x + sh : s ∈ R} at the point x.
(Picture on board.)
If you prefer to understand the graph of g_h as a curve over the s-axis in a two-dimensional space, then the directional derivative is the slope of the tangent at s = 0.
(Picture on board.)
Alternatively, suppose one were to move through R^m along the path γ(s) = x + sh.
Here, γ(s) is the position at time s.
The rate of change in the value of g observed at position γ(s) is

\frac{d}{ds} g(\gamma(s)) = \frac{d}{ds} g(x + sh),

so the directional derivative is the rate of change of g along the path γ at time zero.
(Picture on board.)
The directional derivatives can be calculated from the gradient.
Definition. The gradient of g at x is

\nabla g(x) = \begin{pmatrix} \frac{\partial g}{\partial x_1}(x) \\ \frac{\partial g}{\partial x_2}(x) \\ \vdots \\ \frac{\partial g}{\partial x_m}(x) \end{pmatrix} \in R^m.

Lemma. If all partial derivatives of g exist at x and are continuous, then

\partial_h g(x) = \left. \frac{d}{ds} \right|_{s=0} g(x + sh) = \nabla g(x) \cdot h.
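The following sketch (mine, not from the notes) checks this identity numerically for a made-up function g: a centered finite difference in s approximates \partial_h g(x), and it should match \nabla g(x) \cdot h.

import numpy as np

# Made-up example: g(x) = sin(x1) + x1 * x2^2, so
# grad g(x) = (cos(x1) + x2^2, 2 * x1 * x2).
def g(x):
    return np.sin(x[0]) + x[0] * x[1]**2

def grad_g(x):
    return np.array([np.cos(x[0]) + x[1]**2, 2.0 * x[0] * x[1]])

x = np.array([0.3, -1.2])
h = np.array([1.0, 2.0])
s = 1e-6

# Centered difference approximation of d/ds g(x + s h) at s = 0.
fd = (g(x + s * h) - g(x - s * h)) / (2.0 * s)
print(fd, grad_g(x) @ h)   # the two values should agree to high accuracy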

First-Order Optimality Condition
Lemma. If x ∈ R^m is a local minimizer of a differentiable function g : R^m → R, then

\nabla g(x) = 0.

Proof. If x is a local minimizer of g, then for every h ∈ R^m, zero is a local minimizer of the function g_h : R → R defined by

g_h(s) = g(x + sh).

Therefore, we must have g_h'(0) = \partial_h g(x) = \nabla g(x) \cdot h = 0 for all h ∈ R^m. It follows that ∇g(x) = 0. (If this last point is not obvious, observe that we can take h = ∇g(x), and then ∇g(x) · h = |∇g(x)|² = 0.)
(Picture on board.)
Q.E.D.
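As a small numerical illustration (a sketch of mine, not from the notes), consider a made-up quadratic f(x) = ½ xᵀAx − bᵀx with A symmetric positive definite; its gradient is Ax − b, so the first-order condition reduces to the linear system Ax = b.

import numpy as np

# Made-up SPD matrix A and vector b defining f(x) = 0.5 x^T A x - b^T x.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

x_star = np.linalg.solve(A, b)   # stationary point: solve grad f(x) = A x - b = 0
print(A @ x_star - b)            # gradient at x_star, numerically zero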

Tangent Vector to a Path


Let γ : [a, b] → R^m be a path in R^m.
Let γ_i : [a, b] → R be the i'th coordinate component of γ for i = 1, . . . , m.

Definition. The derivative or tangent vector of γ at s ∈ [a, b] is

\gamma'(s) = \begin{pmatrix} \gamma_1'(s) \\ \vdots \\ \gamma_m'(s) \end{pmatrix}.

(Picture on board in class.)
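As a quick sketch (not from the notes), the tangent vector can be checked componentwise with finite differences; the circular path below is a made-up example.

import numpy as np

# Made-up path: gamma(s) = (cos s, sin s), with tangent gamma'(s) = (-sin s, cos s).
def gamma(s):
    return np.array([np.cos(s), np.sin(s)])

def gamma_prime(s):
    return np.array([-np.sin(s), np.cos(s)])

s, ds = 0.7, 1e-6
fd = (gamma(s + ds) - gamma(s - ds)) / (2.0 * ds)   # componentwise difference quotient
print(fd, gamma_prime(s))                            # the two vectors should agree closely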

Jacobian Matrix as Derivative of Vector-Valued Function


Let f : R^m → R^n.
Let f_i : R^m → R denote the i'th coordinate component of f for i = 1, . . . , n.

Definition. The directional derivative of f at x ∈ R^m in direction h ∈ R^m is the derivative of the path s ↦ f(x + sh) at s = 0. That is, the directional derivative is

\partial_h f(x) := f_h'(0) = \begin{pmatrix} f_{h,1}'(0) \\ \vdots \\ f_{h,n}'(0) \end{pmatrix}

for f_h : R → R^n defined by f_h(s) = f(x + sh).

Definition. The derivative (or Jacobian) of a function f : R^m → R^n at a point x ∈ R^m is

f'(x) := \begin{pmatrix}
\frac{\partial f_1}{\partial x_1}(x) & \frac{\partial f_1}{\partial x_2}(x) & \cdots & \frac{\partial f_1}{\partial x_m}(x) \\
\frac{\partial f_2}{\partial x_1}(x) & \frac{\partial f_2}{\partial x_2}(x) & \cdots & \frac{\partial f_2}{\partial x_m}(x) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_n}{\partial x_1}(x) & \frac{\partial f_n}{\partial x_2}(x) & \cdots & \frac{\partial f_n}{\partial x_m}(x)
\end{pmatrix} \in R^{n×m},

if the partial derivatives all exist and are continuous. Sometimes the derivative is denoted Df(x) or δf(x) instead of f'(x) in this context.
Observe that for a function f : R^n → R, the derivative is the transpose of the gradient: f'(x) = ∇f(x)^t.
Lemma. If the partial derivatives of f are continuous at x, we have

\partial_h f(x) = \begin{pmatrix} \nabla f_1(x) \cdot h \\ \nabla f_2(x) \cdot h \\ \vdots \\ \nabla f_n(x) \cdot h \end{pmatrix} = \underbrace{\begin{pmatrix} \nabla f_1(x)^t \\ \nabla f_2(x)^t \\ \vdots \\ \nabla f_n(x)^t \end{pmatrix}}_{\text{matrix whose } i\text{'th row is } \nabla f_i(x)^t} h = f'(x) h.

Proof.
To derive the first equality, express the directional derivatives of the f_i's in terms of their gradients, as in the section above. The second equality follows from the definition of matrix multiplication: each entry of a matrix-vector product is the dot product of the corresponding row of the matrix with the vector.
Q.E.D.
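Here is a short numerical check of the lemma (my own sketch with a made-up map f : R^2 → R^2, not taken from the notes): a finite difference along h should match the matrix-vector product f'(x) h.

import numpy as np

# Made-up map f(x) = (x1 * x2, sin(x1) + x2^2).
def f(x):
    return np.array([x[0] * x[1], np.sin(x[0]) + x[1]**2])

def jacobian(x):
    # Rows are the transposed gradients of the components f_1 and f_2.
    return np.array([[x[1],         x[0]],
                     [np.cos(x[0]), 2.0 * x[1]]])

x = np.array([0.5, 1.5])
h = np.array([2.0, -1.0])
s = 1e-6

fd = (f(x + s * h) - f(x - s * h)) / (2.0 * s)   # approximates the directional derivative
print(fd, jacobian(x) @ h)                        # both approximate f'(x) h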

Linearizations
1-d Case
Let f : R → R be differentiable, and define

e(h) = f(x + h) - \underbrace{\big( f(x) + f'(x) h \big)}_{\text{linearization of } f \text{ at } x}

to be the error of the approximation of f(x + h) by f(x) + f'(x)h.


We have seen how to approximate e(h) using Taylor’s theorem.
I’d like to take a somewhat different perspective now.
Observe that

\lim_{h \to 0} \frac{e(h)}{h} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} - f'(x) = f'(x) - f'(x) = 0.

This means, roughly speaking, that the error e(h) tends to zero faster than h.
We know from Taylor's theorem that if f is twice continuously differentiable, then e(h) ∼ h², so this is what one would expect.
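A tiny numerical sketch (mine, with a made-up choice of f) makes this concrete: the ratio e(h)/h shrinks as h does, roughly like h/2 here.

import numpy as np

# Made-up example: f(x) = exp(x) at x = 0, so the linearization is 1 + h
# and the error is e(h) = exp(h) - (1 + h) ~ h^2 / 2.
x = 0.0
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    e = np.exp(x + h) - (np.exp(x) + np.exp(x) * h)
    print(h, e / h)    # the ratio e(h)/h tends to zero roughly like h/2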

General Case
Let g : R^m → R^n be differentiable and define

e(h) = \big\| g(x + h) - \underbrace{\big( g(x) + g'(x) h \big)}_{\text{linearization of } g \text{ at } x} \big\|.

Lemma. If the partial derivatives of g exist and are continuous, then

\lim_{h \to 0} \frac{\|e(h)\|}{\|h\|} = 0.

Thus, the linearization approximates the function as in the one-dimensional case.


Remark. One can (and probably should) define the derivative g'(x) of g at x to be the linear operator L ∈ R^{n×m} such that

\lim_{h \to 0} \frac{\|g(x + h) - g(x) - Lh\|}{\|h\|} = 0,

provided that such an operator exists. When such an L exists, it is unique.
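The lemma above can also be checked numerically; the sketch below (mine, with a made-up map g and its hand-computed Jacobian) shows the ratio \|e(h)\| / \|h\| shrinking as \|h\| does.

import numpy as np

# Made-up map g : R^2 -> R^2 and its Jacobian g'(x), computed by hand:
# g(x) = (exp(x1) * x2, x1^2 + x2).
def g(x):
    return np.array([np.exp(x[0]) * x[1], x[0]**2 + x[1]])

def g_prime(x):
    return np.array([[np.exp(x[0]) * x[1], np.exp(x[0])],
                     [2.0 * x[0],          1.0]])

x = np.array([0.4, -0.8])
direction = np.array([1.0, 2.0])

for t in [1e-1, 1e-2, 1e-3]:
    h = t * direction
    e = np.linalg.norm(g(x + h) - (g(x) + g_prime(x) @ h))
    print(t, e / np.linalg.norm(h))    # the ratio tends to zero roughly like t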

Hessian and Directional Derivatives


Definition (Hessian Matrix). The Hessian or second derivative of f : R^n → R at the point x ∈ R^n is the matrix f''(x) ∈ R^{n×n} whose ij'th entry is

f''(x)_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x).

Sometimes the Hessian is denoted D²f(x) or δ²f(x) instead of f''(x).


Note that f''(x) is symmetric by Clairaut's theorem when the second partial derivatives exist and are continuous.
Observe that f''(x) is the derivative of the gradient of f: set G(x) = ∇f(x), and note that G : R^n → R^n. We have

G'(x)_{ij} = \frac{\partial}{\partial x_j} G_i(x) = \frac{\partial}{\partial x_j} \nabla f(x)_i = \frac{\partial}{\partial x_j} \frac{\partial}{\partial x_i} f(x) = f''(x)_{ij}.

Second-order directional derivatives can be expressed in terms of the Hessian.

Lemma. For v, w ∈ R^n, we have

\partial_v \partial_w f(x) = v^t f''(x) w.

In particular,

\left. \frac{d^2}{ds^2} \right|_{s=0} f(x + sv) = v^t f''(x) v.

Proof. Exercise using the results above!
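While the proof is left as an exercise, the identity is easy to test numerically; in the sketch below (mine, with a made-up f and its hand-computed Hessian), a centered second difference along v approximates v^t f''(x) v.

import numpy as np

# Made-up example: f(x) = x1^2 * x2 + x2^3, with Hessian
# f''(x) = [[2 x2, 2 x1], [2 x1, 6 x2]].
def f(x):
    return x[0]**2 * x[1] + x[1]**3

def hessian(x):
    return np.array([[2.0 * x[1], 2.0 * x[0]],
                     [2.0 * x[0], 6.0 * x[1]]])

x = np.array([1.0, 0.5])
v = np.array([1.0, -2.0])
s = 1e-4

# Centered second difference approximates d^2/ds^2 f(x + s v) at s = 0.
fd2 = (f(x + s * v) - 2.0 * f(x) + f(x - s * v)) / s**2
print(fd2, v @ hessian(x) @ v)     # both approximate v^t f''(x) v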

Hessian and Second-Order Optimality Condition


For f : R → R, you know that x is a local minimizer if f'(x) = 0 and f''(x) > 0. The same is true in higher dimensions:

Lemma. If f : R^n → R is twice continuously differentiable and, for some x ∈ R^n,

∇f(x) = 0 and f''(x) ∈ SPD (i.e., f''(x) is symmetric positive definite),

then x is a local minimizer of f.


Proof.
Fix v ∈ R^n \ {0}. Consider the cross-section of the graph of f over the line {x + sv : s ∈ R}. The cross-section coincides with the graph of the function f_v : R → R defined by f_v(s) = f(x + sv). We have

f_v'(0) = \partial_v f(x) = \nabla f(x)^t v = 0 \quad \text{and} \quad f_v''(0) = \partial_v^2 f(x) = v^t f''(x) v > 0,

since ∇f(x) = 0 and f''(x) ∈ SPD. Therefore, at s = 0, the tangent to the graph of f_v is horizontal and f_v is convex (concave up), hence s = 0 is a local minimizer of f_v. Equivalently, the cross-section of the graph of f over the line {x + sv : s ∈ R} has a horizontal tangent at x and is convex at x, hence x is a local minimizer of f along {x + sv : s ∈ R}.
Since x is a local minimizer of f along any line through x (that is, for all directions v in the paragraph above), it follows that x must be a local minimizer.
(Picture on board in class.)
Q.E.D.
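As a final numerical sketch (mine, with a made-up quadratic, not from the notes), one can check both conditions of the lemma at a candidate point: the gradient should vanish and the Hessian's eigenvalues should all be positive.

import numpy as np

# Made-up function: f(x) = x1^2 + x1*x2 + 2*x2^2, with
# grad f(x) = (2 x1 + x2, x1 + 4 x2) and constant Hessian [[2, 1], [1, 4]].
def grad_f(x):
    return np.array([2.0 * x[0] + x[1], x[0] + 4.0 * x[1]])

hessian = np.array([[2.0, 1.0],
                    [1.0, 4.0]])

x = np.zeros(2)                        # candidate point
print(grad_f(x))                       # gradient is zero at x ...
print(np.linalg.eigvalsh(hessian))     # ... and the eigenvalues are positive,
                                       # so the Hessian is SPD and x is a local minimizer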
