18 Vector Calculus and Optimization
Root-Finding
2024/04/22
Unconstrained Optimization
Goal is to find
    x ∈ argmin_z f(z).
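A quick numerical illustration of the argmin notation (the objective f below is an arbitrary choice, not from the notes): a minimizer can be approximated by brute-force search over a grid of candidate points z.

```python
import numpy as np

# Illustrative objective: f(z) = (z - 1)^2 + 1, minimized at z = 1.
f = lambda z: (z - 1.0) ** 2 + 1.0

# Crude version of "x ∈ argmin_z f(z)": evaluate f on a grid of candidates
# and keep the point with the smallest value.
zs = np.linspace(-5.0, 5.0, 10001)
x = zs[np.argmin(f(zs))]
print(x)  # ≈ 1.0
```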
Constraints
In many applications, one has constraints, e.g.
Definition. The directional derivative of g at x ∈ R^m in direction h ∈ R^m is

    ∂_h g(x) := (d/ds) g(x + sh) |_{s=0},

provided the derivative exists. Writing γ(s) := x + sh for the line through x in direction h, this reads

    (d/ds) g(γ(s)) = (d/ds) g(x + sh),

so the directional derivative is the rate of change of g along the path γ at time zero.
(Picture on board.)
The directional derivatives can be calculated from the gradient.
Definition. The gradient of g at x is

    ∇g(x) := ( ∂g/∂x_1 (x), ∂g/∂x_2 (x), …, ∂g/∂x_m (x) )^t ∈ R^m.
    ∂_h g(x) = (d/ds) g(x + sh) |_{s=0} = ∇g(x) · h.
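The identity ∂_h g(x) = ∇g(x) · h can be checked numerically. Below, g and its hand-computed gradient are illustrative choices, not from the notes; a central finite difference in s approximates the derivative of s ↦ g(x + sh) at s = 0.

```python
import numpy as np

# Illustrative function g(x) = x1^2 + 3*x1*x2, with gradient
# ∇g(x) = (2*x1 + 3*x2, 3*x1) computed by hand.
g = lambda x: x[0] ** 2 + 3.0 * x[0] * x[1]
grad_g = lambda x: np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

x = np.array([1.0, 2.0])
h = np.array([0.5, -1.0])

# Central-difference approximation of (d/ds) g(x + s*h) at s = 0 ...
s = 1e-6
fd = (g(x + s * h) - g(x - s * h)) / (2.0 * s)

# ... agrees with the dot product ∇g(x) · h.
print(fd, grad_g(x) @ h)
```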
First-Order Optimality Condition
Lemma. If x ∈ R^m is a local minimizer of a differentiable function g : R^m → R, then

    ∇g(x) = 0.
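The lemma suggests searching for minimizers among points where the gradient vanishes. As a sketch (the objective and step size below are illustrative assumptions, not from the notes), plain gradient descent drives ∇g toward zero:

```python
import numpy as np

# Illustrative objective g(x) = (x1 - 2)^2 + (x2 + 1)^2, minimized at (2, -1).
grad_g = lambda x: np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] + 1.0)])

x = np.zeros(2)
for _ in range(200):
    x = x - 0.1 * grad_g(x)  # step against the gradient

# At convergence the first-order condition ∇g(x) = 0 holds (approximately).
print(x, grad_g(x))
```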
Definition. The derivative (or Jacobian) of a function f : R^m → R^n at a point x ∈ R^m is

    f′(x) :=
        [ ∂f_1/∂x_1 (x)   ∂f_1/∂x_2 (x)   …   ∂f_1/∂x_m (x) ]
        [ ∂f_2/∂x_1 (x)   ∂f_2/∂x_2 (x)   …   ∂f_2/∂x_m (x) ]
        [       ⋮                ⋮          ⋱        ⋮       ]
        [ ∂f_n/∂x_1 (x)   ∂f_n/∂x_2 (x)   …   ∂f_n/∂x_m (x) ]   ∈ R^{n×m},
if the partial derivatives all exist and are continuous. Sometimes the derivative is denoted Df(x) or δf(x) instead of f′(x) in this context.
Observe that for a scalar-valued function f : R^m → R, the derivative is the transpose of the gradient: f′(x) = ∇f(x)^t.
Lemma. If the partial derivatives of f are continuous at x, we have

    ∂_h f(x) = [ ∇f_1(x) · h ]   [ ∇f_1(x)^t ]
               [ ∇f_2(x) · h ] = [ ∇f_2(x)^t ] h = f′(x) h,
               [      ⋮      ]   [     ⋮     ]
               [ ∇f_n(x) · h ]   [ ∇f_n(x)^t ]

where the middle matrix, whose i'th row is ∇f_i(x)^t, is exactly f′(x).
Proof.
To derive the first equality, express the directional derivatives of the fi ’s in terms
of their gradients, as in the section above. The second equality follows from the
definition of matrix multiplication: Each entry of a matrix vector product is the
dot product of the corresponding row of the matrix with the vector.
Q.E.D.
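The lemma ∂_h f(x) = f′(x) h can likewise be verified with finite differences. The map f and its hand-computed Jacobian below are illustrative assumptions, not from the notes:

```python
import numpy as np

# Illustrative map f : R^2 -> R^2, f(x) = (x1*x2, x1 + x2^2),
# with Jacobian f'(x) = [[x2, x1], [1, 2*x2]] computed by hand.
f = lambda x: np.array([x[0] * x[1], x[0] + x[1] ** 2])
jac = lambda x: np.array([[x[1], x[0]], [1.0, 2.0 * x[1]]])

x = np.array([1.5, -0.5])
h = np.array([1.0, 2.0])

# Central-difference approximation of (d/ds) f(x + s*h) at s = 0 ...
s = 1e-6
fd = (f(x + s * h) - f(x - s * h)) / (2.0 * s)

# ... matches the matrix-vector product f'(x) h.
print(fd, jac(x) @ h)
```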
Linearizations
1-d Case
Let f : R → R be differentiable, and define the linearization error

    e(h) := f(x + h) − f(x) − f′(x) h.

Then

    lim_{h→0} e(h)/h = lim_{h→0} [ (f(x + h) − f(x))/h − f′(x) ] = f′(x) − f′(x) = 0.

This means, roughly speaking, that the error e(h) tends to zero faster than h. We know from Taylor's theorem that if f is twice continuously differentiable, then e(h) = O(h²), so this is what one would expect.
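This behavior is easy to observe numerically. Taking f(x) = exp(x) at x = 0 (an illustrative choice), the ratio e(h)/h shrinks roughly in proportion to h, consistent with e(h) = O(h²):

```python
import numpy as np

# Linearization error of f(x) = exp(x) at x = 0, where f'(0) = 1:
# e(h) = exp(h) - exp(0) - 1*h = exp(h) - 1 - h.
e = lambda h: np.exp(h) - 1.0 - h

# The ratio e(h)/h decreases roughly like h/2 as h shrinks.
ratios = [e(h) / h for h in (1e-1, 1e-2, 1e-3)]
print(ratios)
```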
General Case
Let g : R^m → R^n be differentiable, and define the linearization error

    e(h) := g(x + h) − g(x) − g′(x) h.

Then

    lim_{h→0} ∥e(h)∥ / ∥h∥ = 0.
Definition. The Hessian of a twice continuously differentiable function f : R^m → R at x is the matrix f′′(x) ∈ R^{m×m} with entries

    f′′(x)_{ij} = ∂²f/∂x_i ∂x_j (x).
Second order directional derivatives can be expressed in terms of the Hessian.
Lemma. For v, w ∈ R^m, we have

    ∂_v ∂_w f(x) = v^t f′′(x) w.
In particular,

    (d²/ds²) f(x + sv) |_{s=0} = v^t f′′(x) v.
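The quadratic-form identity can be checked with a second-order central difference. The function f and its constant Hessian below are illustrative assumptions, not from the notes:

```python
import numpy as np

# Illustrative f(x) = x1^2 + x1*x2 + 2*x2^2; its Hessian is the constant
# matrix f''(x) = [[2, 1], [1, 4]], computed by hand.
f = lambda x: x[0] ** 2 + x[0] * x[1] + 2.0 * x[1] ** 2
hess = np.array([[2.0, 1.0], [1.0, 4.0]])

x = np.array([0.3, -0.7])
v = np.array([1.0, 1.0])

# Second-order central difference for (d^2/ds^2) f(x + s*v) at s = 0 ...
s = 1e-4
fd2 = (f(x + s * v) - 2.0 * f(x) + f(x - s * v)) / s ** 2

# ... agrees with the quadratic form v^t f''(x) v.
print(fd2, v @ hess @ v)
```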
    f_v′(0) = ∂_v f(x) = ∇f(x)^t v = 0   and   f_v′′(0) = ∂_v² f(x) = v^t f′′(x) v > 0,

where f_v(s) := f(x + sv).