ECE5570 - CH3 - 12feb17
We will first develop tools and concepts associated with the general optimization process, and we begin with problems that are independent of time, known as static or parameter optimization problems.
A point $u^*$ is a local minimum if

$$ L(u_1^* + \Delta u_1,\ u_2^* + \Delta u_2,\ \ldots,\ u_m^* + \Delta u_m) > L(u_1^*, u_2^*, \ldots, u_m^*) $$

for all infinitesimal changes $\Delta u_1, \Delta u_2, \ldots, \Delta u_m$, where the values $u^*$ denote the optimal (minimizing) values of $u$.
• Along any line $u(\alpha) = u^* + \alpha s$ through $u^*$, $L[u(\alpha)]$ has both zero slope and non-negative curvature at $u^*$ (see Figure 3.3).
• This is the usual condition derived from a Taylor series for a local minimum of a function of one variable:
$$ L(u) = L(u^*) + \left.\frac{dL}{du}\right|_{u^*}\Delta u + \frac{1}{2}\left.\frac{d^2L}{du^2}\right|_{u^*}\Delta u^2 + \cdots $$

or

$$ \Delta L = L(u) - L(u^*) = \left.\frac{dL}{du}\right|_{u^*}\Delta u + \frac{1}{2}\left.\frac{d^2L}{du^2}\right|_{u^*}\Delta u^2 + \cdots $$
– Since $u^*$ is a local minimum, we know two things:
1. $L(u) - L(u^*) > 0$ for all $u$ in a neighborhood of $u^*$
2. $\Delta u$ is an arbitrary but infinitesimal change in $u$ away from $u^*$, so the higher-order terms in the Taylor series expansion are insignificant:
$$ \Rightarrow\ \Delta L \approx \left.\frac{dL}{du}\right|_{u^*}\Delta u $$
But since $\Delta u$ is arbitrary, $\left.\dfrac{dL}{du}\right|_{u^*} \neq 0 \Rightarrow \Delta L < 0$ for some $\Delta u$, and by deduction,

$$ \Rightarrow\ u^* \text{ can only be a minimum if } \left.\frac{dL}{du}\right|_{u^*} = 0 $$
If $\left.\dfrac{dL}{du}\right|_{u^*} = 0$, then

$$ \Delta L \approx \frac{1}{2}\left.\frac{d^2L}{du^2}\right|_{u^*}\Delta u^2 $$

but $\Delta u^2 > 0$ for all $\Delta u \neq 0$, so $\Delta L > 0$ if $\left.\dfrac{d^2L}{du^2}\right|_{u^*} > 0$

$$ \Rightarrow\ u^* \text{ will be a minimum if } \left.\frac{d^2L}{du^2}\right|_{u^*} > 0 $$
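As a quick illustration, take $L(u) = (u-2)^2$: $dL/du = 2(u-2) = 0$ gives $u^* = 2$, and $d^2L/du^2 = 2 > 0$, so $u^* = 2$ satisfies both conditions and is a local (indeed global) minimum.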
Two-Parameter Problem
• Consider the function $L(u_1, u_2)$ where $L(u_1^*, u_2^*)$ is a local minimum.
• We’ll use the same Taylor series arguments as above to develop conditions for a minimum, but now the Taylor series is more complicated:

$$ L(u_1, u_2) = L(u_1^*, u_2^*) + \left.\frac{\partial L}{\partial u_1}\right|_{u_1^*, u_2^*}\Delta u_1 + \left.\frac{\partial L}{\partial u_2}\right|_{u_1^*, u_2^*}\Delta u_2 + \frac{1}{2}\left[\left.\frac{\partial^2 L}{\partial u_1^2}\right|_*\Delta u_1^2 + 2\left.\frac{\partial^2 L}{\partial u_1 \partial u_2}\right|_*\Delta u_1 \Delta u_2 + \left.\frac{\partial^2 L}{\partial u_2^2}\right|_*\Delta u_2^2\right] + \cdots $$
– Following the same argument as in the single-parameter case, a first-order condition is attained:

$$ \left.\frac{\partial L}{\partial u_1}\right|_* = \left.\frac{\partial L}{\partial u_2}\right|_* = 0 $$
– If these conditions are satisfied, then the second-order term in the Taylor series expansion must be greater than or equal to zero for $(u_1^*, u_2^*)$ to be a minimizer.
• Let’s re-write the 2nd-order term to see how we can validate this condition:
$$ \frac{1}{2}\left[\left.\frac{\partial^2 L}{\partial u_1^2}\right|_*\Delta u_1^2 + 2\left.\frac{\partial^2 L}{\partial u_1 \partial u_2}\right|_*\Delta u_1 \Delta u_2 + \left.\frac{\partial^2 L}{\partial u_2^2}\right|_*\Delta u_2^2\right] = \frac{1}{2}\begin{bmatrix}\Delta u_1 & \Delta u_2\end{bmatrix} \begin{bmatrix}\left.\dfrac{\partial^2 L}{\partial u_1^2}\right|_* & \left.\dfrac{\partial^2 L}{\partial u_1 \partial u_2}\right|_* \\ \left.\dfrac{\partial^2 L}{\partial u_2 \partial u_1}\right|_* & \left.\dfrac{\partial^2 L}{\partial u_2^2}\right|_*\end{bmatrix} \begin{bmatrix}\Delta u_1 \\ \Delta u_2\end{bmatrix} = \frac{1}{2}\Delta u^T \frac{\partial^2 L}{\partial u^2}\Delta u $$
– NOTE: $\dfrac{\partial^2 L}{\partial u^2}$ is the Hessian of $L$.
• This result clearly indicates that the 2nd-order term in the Taylor series expansion will be greater than or equal to zero if $\dfrac{\partial^2 L}{\partial u^2}$ is positive semidefinite.
N-Parameter Problem
• The vector notation introduced in the two-parameter problem above is ideally suited to the $N$-parameter problem and leads to precisely the same necessary and sufficient conditions as those stated above; a numerical check of these conditions is sketched below.
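As a numerical illustration (not part of the original notes), the first- and second-order tests can be automated; `classify_stationary_point` below is a hypothetical helper that checks the gradient norm and the eigenvalues of the Hessian at a candidate point, and it is exercised on case 1 of Example 1 that follows.

```python
import numpy as np

def classify_stationary_point(grad, hess, u, tol=1e-8):
    """Apply the first- and second-order tests at a candidate point u."""
    g = np.asarray(grad(u), dtype=float)
    if np.linalg.norm(g) > tol:
        return "not a stationary point"      # first-order condition fails
    eig = np.linalg.eigvalsh(np.asarray(hess(u), dtype=float))
    if np.all(eig > tol):
        return "local minimum"               # Hessian positive definite
    if np.all(eig < -tol):
        return "local maximum"               # Hessian negative definite
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point"                # Hessian indefinite
    return "inconclusive (semidefinite Hessian)"

# Case 1 of Example 1: f(x) = x1^2 + x2^2, stationary at the origin
grad = lambda x: np.array([2 * x[0], 2 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 2.0]])
print(classify_stationary_point(grad, hess, np.zeros(2)))  # -> local minimum
```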
Example 1

• Consider the following four cases:

1. $f(x) = x_1^2 + x_2^2$

$$ \frac{\partial f}{\partial x} = \begin{bmatrix} 2x_1 & 2x_2 \end{bmatrix} = 0 \ \Rightarrow\ x_1 = x_2 = 0 $$

$$ \frac{\partial^2 f}{\partial x^2} = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} > 0 $$

2. $f(x) = -x_1^2 - x_2^2$

$$ \frac{\partial f}{\partial x} = \begin{bmatrix} -2x_1 & -2x_2 \end{bmatrix} = 0 \ \Rightarrow\ x_1 = x_2 = 0 $$

$$ \frac{\partial^2 f}{\partial x^2} = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix} < 0 $$

3. $f(x) = x_1^2 - x_2^2$

$$ \frac{\partial f}{\partial x} = \begin{bmatrix} 2x_1 & -2x_2 \end{bmatrix} = 0 \ \Rightarrow\ x_1 = x_2 = 0 $$

$$ \frac{\partial^2 f}{\partial x^2} = \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix} \text{ (indefinite: saddle point)} $$

4. $f(x) = -x_1^2 + x_2^2$

$$ \frac{\partial f}{\partial x} = \begin{bmatrix} -2x_1 & 2x_2 \end{bmatrix} = 0 \ \Rightarrow\ x_1 = x_2 = 0 $$

$$ \frac{\partial^2 f}{\partial x^2} = \begin{bmatrix} -2 & 0 \\ 0 & 2 \end{bmatrix} \text{ (indefinite: saddle point)} $$
[Figure 3.4: $f(x) = x_1^2 + x_2^2$]  [Figure 3.5: $f(x) = -x_1^2 - x_2^2$]
[Figure 3.6: $f(x) = x_1^2 - x_2^2$]  [Figure 3.7: $f(x) = -x_1^2 + x_2^2$]
• Now consider $f(x) = (x_1 - x_2 + 2)^2 + (x_1 + x_2 - 4)^4$. Setting $\partial f / \partial x = 0$ and combining the two component equations gives

$$ 4(x_1 - x_2 + 2) = 0 \ \Rightarrow\ x_1 - x_2 = -2 $$
$$ x_1 + x_2 - 4 = 0 \ \Rightarrow\ x_1 + x_2 = 4 $$

$$ \Rightarrow\ x = \begin{bmatrix} 1 & 3 \end{bmatrix}^T $$

$$ \frac{\partial^2 f}{\partial x^2} = \begin{bmatrix} 2 + 12(x_1 + x_2 - 4)^2 & -2 + 12(x_1 + x_2 - 4)^2 \\ -2 + 12(x_1 + x_2 - 4)^2 & 2 + 12(x_1 + x_2 - 4)^2 \end{bmatrix} $$

Evaluated at $x = \begin{bmatrix} 1 & 3 \end{bmatrix}^T$:

$$ \frac{\partial^2 f}{\partial x^2} = \begin{bmatrix} 2 & -2 \\ -2 & 2 \end{bmatrix} \ \Rightarrow\ \lambda_1 = 4,\ \lambda_2 = 0 $$

Since $\lambda_2 = 0$, the Hessian is only positive semidefinite here: the necessary condition for a minimum is met, but the second-order test alone cannot confirm a strict minimum.
Line Search

• It is apparent that the slope $dL/d\alpha$ at $\alpha^{(k)}$ must be zero, which gives

$$ \nabla L^{(k+1)T} s^{(k)} = 0 $$
• If $\hat{\alpha}$ is the least value of $\alpha > 0$ at which the $f(\alpha)$ curve intersects the $\rho$-line, and $\sigma > \rho$, then it can be shown that there exists an interval of acceptable points satisfying the Goldstein conditions (proof omitted).
• In practice, it is customary to use $\sigma = 0.1$ and $\rho = 0.01$, though the behavior is not really too sensitive to the choice of $\rho$; a step-acceptance check based on these conditions is sketched below.
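A minimal sketch of such a check, assuming the two-sided Goldstein test with the $\rho$-line $f(0) + \rho\alpha f'(0)$ and the companion line $f(0) + (1-\rho)\alpha f'(0)$ (both appear in the bracketing figure later in this section):

```python
def goldstein_acceptable(f0, df0, f_alpha, alpha, rho=0.01):
    """Two-sided Goldstein test for a trial step alpha.

    f0, df0 : f(0) and f'(0) along the search direction (df0 < 0 for descent)
    f_alpha : f(alpha) at the trial step
    """
    upper = f0 + rho * alpha * df0          # rho-line: sufficient decrease
    lower = f0 + (1.0 - rho) * alpha * df0  # step must not be too short
    return lower <= f_alpha <= upper
```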
• The line search algorithm comprises two phases, bracketing and sectioning:
Bracketing Algorithm

For $i = 1, 2, \ldots$
1. evaluate $f(\alpha_i)$
2. if $f(\alpha_i) \le f_{min}$ $\Rightarrow$ terminate line search
3. if $f(\alpha_i) > f(0) + \alpha_i \rho f'(0)$ or $f(\alpha_i) \ge f(\alpha_{i-1})$
(a) $a_i = \alpha_{i-1}$
(b) $b_i = \alpha_i$
$\Rightarrow$ terminate bracket
4. evaluate $f'(\alpha_i)$
5. if $|f'(\alpha_i)| \le -\sigma f'(0)$ $\Rightarrow$ terminate line search
6. if $f'(\alpha_i) \ge 0$
(a) $a_i = \alpha_i$
(b) $b_i = \alpha_{i-1}$
$\Rightarrow$ terminate bracket
7. if $\mu \le 2\alpha_i - \alpha_{i-1}$
(a) $\alpha_{i+1} = \mu$
8. else
(a) choose $\alpha_{i+1} \in \left[2\alpha_i - \alpha_{i-1},\ \min\left(\mu,\ \alpha_i + \tau_1(\alpha_i - \alpha_{i-1})\right)\right]$
end
• Parameter $\tau_1$ is preset and governs the size of the jumps; $\tau_1 = 9$ is a reasonable choice. A sketch of this phase follows.
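A minimal Python sketch of the bracketing phase above, assuming callables `f` and `df` for $f(\alpha)$ and $f'(\alpha)$ along the search direction and a user-supplied maximum step `mu`; the interpolated choice in step 8 is simplified here to the midpoint of the permitted interval.

```python
def bracket(f, df, mu, f_min, rho=0.01, sigma=0.1, tau1=9.0,
            alpha1=0.1, max_iter=50):
    """Bracketing phase of the line search (sketch of the algorithm above).

    Returns ('found', alpha), ('bracket', (a, b)), or ('fail', None).
    """
    f0, df0 = f(0.0), df(0.0)
    alpha_prev, f_prev = 0.0, f0
    alpha = alpha1
    for _ in range(max_iter):
        fa = f(alpha)                                      # step 1
        if fa <= f_min:                                    # step 2
            return 'found', alpha
        if fa > f0 + alpha * rho * df0 or fa >= f_prev:    # step 3
            return 'bracket', (alpha_prev, alpha)
        dfa = df(alpha)                                    # step 4
        if abs(dfa) <= -sigma * df0:                       # step 5
            return 'found', alpha
        if dfa >= 0:                                       # step 6
            return 'bracket', (alpha, alpha_prev)
        if mu <= 2 * alpha - alpha_prev:                   # step 7
            alpha_next = mu
        else:                                              # step 8 (midpoint)
            lo = 2 * alpha - alpha_prev
            hi = min(mu, alpha + tau1 * (alpha - alpha_prev))
            alpha_next = 0.5 * (lo + hi)
        alpha_prev, f_prev, alpha = alpha, fa, alpha_next
    return 'fail', None
```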
Example: Bracketing

[Figure: $f(\alpha)$ versus $\alpha$, showing the $\rho$-lines for $\rho = 0.25$ and $\rho = 0.5$, the slopes $f'(0)$, $\sigma f'(0)$ and $(1-\rho)f'(0)$, the maximum step $\mu$, and the resulting bracket.]
Sectioning Algorithm

For $j = i, i+1, \ldots$
1. choose $\alpha_j \in \left[a_j + \tau_2(b_j - a_j),\ b_j - \tau_3(b_j - a_j)\right]$
2. evaluate $f(\alpha_j)$
3. if $f(\alpha_j) > f(0) + \rho\alpha_j f'(0)$ or $f(\alpha_j) \ge f(a_j)$
(a) $a_{j+1} = a_j$
(b) $b_{j+1} = \alpha_j$
4. else
end
• Parameters $\tau_2$ and $\tau_3$ are preset and restrict $\alpha_j$ from getting too close to the extremes of the interval $\left[a_j, b_j\right]$:

$$ 0 < \tau_2 < \tau_3 \le \frac{1}{2} $$

• Typical values are $\tau_2 = 0.1$ and $\tau_3 = 0.5$; one sectioning step is sketched below.
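A minimal sketch of one sectioning step, assuming a callable `f`; note that the `else` branch of step 4 is abridged in the notes above, so the update used here (making $\alpha_j$ the new left endpoint) is an assumption:

```python
def section_step(f, f0, df0, a, b, rho=0.01, tau2=0.1, tau3=0.5):
    """One sectioning step on the bracket [a, b] (sketch).

    f0, df0 : f(0) and f'(0) along the search direction
    Returns (a_next, b_next, alpha_j).
    """
    lo = a + tau2 * (b - a)        # step 1: keep alpha_j away from the
    hi = b - tau3 * (b - a)        #         extremes of [a, b]
    alpha = 0.5 * (lo + hi)        # simple choice inside the interval
    f_alpha = f(alpha)             # step 2
    if f_alpha > f0 + rho * alpha * df0 or f_alpha >= f(a):   # step 3
        return a, alpha, alpha     # a_{j+1} = a_j, b_{j+1} = alpha_j
    # step 4 (else): abridged in the notes; assume alpha_j becomes
    # the new left endpoint of the bracket
    return alpha, b, alpha
```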
• For the quadratic case, we can define the 2nd-order polynomial

$$ p_q(z) = p_2 z^2 + p_1 z + p_0 $$

• For the cubic case, $p_c(z) = p_3 z^3 + p_2 z^2 + p_1 z + p_0$; matching the function and derivative values at the normalized endpoints $z = 0$ ($\alpha = a_j$) and $z = 1$ ($\alpha = b_j$) gives, in matrix form,

$$ \begin{bmatrix} 1 & 1 & 1 & 1 \\ 3 & 2 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} p_3 \\ p_2 \\ p_1 \\ p_0 \end{bmatrix} = \begin{bmatrix} f(b_j) \\ f'_z(b_j) \\ f'_z(a_j) \\ f(a_j) \end{bmatrix} $$
Thus giving the solution:

$$ p_0 = f(a_j) $$
$$ p_1 = f'_z(a_j) $$
$$ p_2 = 3\left(f(b_j) - f(a_j)\right) - 2f'_z(a_j) - f'_z(b_j) $$
$$ p_3 = f'_z(a_j) + f'_z(b_j) - 2\left(f(b_j) - f(a_j)\right) $$
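A small sketch that computes these coefficients and minimizes the resulting cubic over the normalized interval $z \in [0, 1]$; the mapping $\alpha = a_j + z(b_j - a_j)$ and the derivative scaling $f'_z = (b_j - a_j)f'(\alpha)$ are assumptions about the normalization, which the notes do not spell out.

```python
import numpy as np

def cubic_coeffs(fa, fb, dfa_z, dfb_z):
    """Coefficients (p3, p2, p1, p0) of the interpolating cubic on z in [0, 1].

    fa, fb       : f at the endpoints a_j, b_j
    dfa_z, dfb_z : derivatives with respect to z at the endpoints
    """
    p0 = fa
    p1 = dfa_z
    p2 = 3.0 * (fb - fa) - 2.0 * dfa_z - dfb_z
    p3 = dfa_z + dfb_z - 2.0 * (fb - fa)
    return p3, p2, p1, p0

def cubic_minimizer(p3, p2, p1, p0):
    """Minimizer of p3*z^3 + p2*z^2 + p1*z + p0 over z in [0, 1]."""
    # stationary points satisfy 3*p3*z^2 + 2*p2*z + p1 = 0
    crit = [z.real for z in np.roots([3.0 * p3, 2.0 * p2, p1])
            if abs(z.imag) < 1e-12 and 0.0 <= z.real <= 1.0]
    poly = lambda z: ((p3 * z + p2) * z + p1) * z + p0
    return min([0.0, 1.0] + crit, key=poly)
```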
• Choosing parameters

$$ \sigma = 0.1,\quad \rho = 0.01,\quad \tau_1 = 9,\quad \tau_2 = 0.1,\quad \tau_3 = 0.5 $$

with trial steps $\alpha_1 = 0.1$ and $\alpha_1 = 1$:

Iteration       0      1      2       3          4
$\alpha$        0      1      0.1     0.19       0.160922
$f(\alpha)$     1      100    0.82    0.786421   0.771112
$f'(\alpha)$    -2     -      -1.4    1.1236     -0.011269
• Descent methods are line search methods where the search direction satisfies the descent property

$$ s^{(k)T} g^{(k)} < 0 $$

where

$$ g^{(k)} = \nabla L(u^{(k)}) $$
Convergence

• A test is required to terminate the iteration; for a prescribed tolerance $\epsilon$, one common criterion is

$$ L^{(k)} - L^{(k+1)} \le \epsilon $$
Newton-like Methods

• The key to this algorithm is that the values of $u$ which minimize $L$ are the same as the ones which satisfy

$$ \frac{\partial L}{\partial u} = 0 $$

• So, we’ll set up an algorithm which searches for a solution to this problem.
• Writing the truncated Taylor series expansion of $L(u)$ about $u^{(k)}$:

$$ L(u^{(k)} + \delta) \approx q^{(k)}(\delta) = L^{(k)} + g^{(k)T}\delta + \frac{1}{2}\delta^T G^{(k)}\delta $$

where $\delta = u - u^{(k)}$, and $q^{(k)}(\delta)$ is the resulting quadratic approximation for iteration $k$.
• Iterate $u^{(k+1)}$ is computed as $u^{(k)} + \delta^{(k)}$, where the correction $\delta^{(k)}$ minimizes $q^{(k)}(\delta)$.
• The method requires the zeroth-, first- and second-order derivatives of $L(u)$.
• The basic algorithm can be written:

Complete Algorithm

1. Compute the general functions $g^{(k)} = \dfrac{\partial L}{\partial u}$ and $G^{(k)} = \dfrac{\partial^2 L}{\partial u^2}$ a priori
2. Choose starting value $u^{(1)}$
3. Evaluate $g^{(1)}$ and $G^{(1)}$ at $u^{(1)}$
4. Solve for $\delta^{(1)}$ (solve the set of simultaneous equations $G^{(1)}\delta^{(1)} = -g^{(1)}$)
5. Compute $u^{(2)} = u^{(1)} + \delta^{(1)}$
6. Repeat steps (3)-(5) for increasing values of $k$ until the convergence condition is satisfied
• The biggest problem with this algorithm is that the calculation of the Hessian $\dfrac{\partial^2 L}{\partial u^2}$ may be extremely tedious.
• Let

$$ L(u) = u_1^4 + u_1 u_2 + (1 + u_2)^2 $$

where

$$ u^{(1)} = \begin{bmatrix} 1.25 \\ -0.2 \end{bmatrix} $$

k             1       2        3        4         5          6          7
$u_1^{(k)}$   1.25    0.9110   0.7451   0.69932   0.6959029  0.6958844  0.6958843
$u_2^{(k)}$   -0.2    -1.455   -1.3726  -1.34966  -1.347951  -1.3479422 -1.3479422

From the above, it can be shown that the ratio $\|h^{(k+1)}\| / \|h^{(k)}\|^2 \to a$ where $a \approx 1.4$, indicating second-order convergence.
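A minimal Newton-iteration sketch that reproduces this example; the gradient and Hessian are hand-derived from $L(u) = u_1^4 + u_1 u_2 + (1 + u_2)^2$, and the loop follows the Complete Algorithm above.

```python
import numpy as np

def newton(u, grad, hess, tol=1e-10, max_iter=50):
    """Basic Newton iteration: solve G delta = -g, step, repeat."""
    for _ in range(max_iter):
        g = grad(u)
        if np.linalg.norm(g) < tol:            # convergence test
            break
        delta = np.linalg.solve(hess(u), -g)   # G^(k) delta^(k) = -g^(k)
        u = u + delta
    return u

# L(u) = u1^4 + u1*u2 + (1 + u2)^2
grad = lambda u: np.array([4 * u[0]**3 + u[1], u[0] + 2 * (1 + u[1])])
hess = lambda u: np.array([[12 * u[0]**2, 1.0], [1.0, 2.0]])

u_star = newton(np.array([1.25, -0.2]), grad, hess)
print(u_star)   # approx [0.6958843, -1.3479422], matching the table above
```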
• The basic Newton method is not suitable as a general-purpose algorithm:
– $G^{(k)}$ may not be positive definite when $x^{(k)}$ is far from the solution
– even if $G^{(k)}$ is positive definite, the algorithm still may not converge
– convergence can be addressed by using Newton’s method with a line search:

$$ s^{(k)} = -G^{(k)-1} g^{(k)} \quad \text{(search direction)} $$
• If we choose

$$ x^{(1)} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} $$

then we have

$$ g^{(1)} = \begin{bmatrix} 0 \\ 2 \end{bmatrix},\quad G^{(1)} = \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix},\quad s^{(1)} = \begin{bmatrix} -2 \\ 0 \end{bmatrix} $$
• Other modifications exist, but they are beyond the scope of this course.
– if $K$ is too big, the 2nd-order term in the Taylor series may become significant and the algorithm may overshoot the stationary point and not converge at all
– if $K$ is too small, the higher-order terms will be truly insignificant, but it may take forever to reach the solution
– how is it done in practice? Vary $K$ during the iteration process, as sketched below
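One simple way to vary the gain $K$ (an illustrative assumption, since the surrounding gradient-method discussion is abridged here) is to shrink it whenever a step fails to decrease $L$ and to grow it after each successful step:

```python
import numpy as np

def gradient_descent_adaptive(u, L, grad, K=1.0, grow=1.5, shrink=0.5,
                              tol=1e-8, max_iter=1000):
    """Steepest descent with a crude adaptive gain K (illustrative sketch)."""
    fu = L(u)
    for _ in range(max_iter):
        g = grad(u)
        if np.linalg.norm(g) < tol:
            break
        trial = u - K * g
        f_trial = L(trial)
        if f_trial < fu:          # successful step: accept it and grow K
            u, fu = trial, f_trial
            K *= grow
        else:                     # overshoot: reject the step and shrink K
            K *= shrink
    return u
```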
Quasi-Newton Methods
• Basic Algorithm:
$$ \alpha^{(2)} = 0.4775 $$

Iteration $k = 3$:

$$ \Rightarrow\ u^{(3)} = u^{(2)} + \alpha^{(2)} s^{(2)} = \begin{bmatrix} -0.0818 \\ 0.8182 \end{bmatrix} + (0.4775)\begin{bmatrix} 0.1713 \\ -1.7135 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} $$

$$ g^{(3)} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} $$

$$ \delta^{(2)} = u^{(3)} - u^{(2)} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} - \begin{bmatrix} -0.0818 \\ 0.8182 \end{bmatrix} = \begin{bmatrix} 0.0818 \\ -0.8182 \end{bmatrix} $$

$$ \gamma^{(2)} = g^{(3)} - g^{(2)} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} - \begin{bmatrix} -1.6364 \\ 1.6364 \end{bmatrix} = \begin{bmatrix} 1.6364 \\ -1.6364 \end{bmatrix} $$
" #
$0:0895
v.2/ D ı .2/ $ H .2/! .2/ D
0:8953
" #
0:05 0
H .3/ D
0 0:5
– DAVIDON-FLETCHER-POWELL (DFP):

$$ H_{DFP}^{(k+1)} = H + \frac{\delta\delta^T}{\delta^T\gamma} - \frac{H\gamma\gamma^T H}{\gamma^T H\gamma} $$

– BROYDEN-FLETCHER-GOLDFARB-SHANNO (BFGS):

$$ H_{BFGS}^{(k+1)} = H + \left(1 + \frac{\gamma^T H\gamma}{\delta^T\gamma}\right)\frac{\delta\delta^T}{\delta^T\gamma} - \frac{\delta\gamma^T H + H\gamma\delta^T}{\delta^T\gamma} $$