
ECE5570: Optimization Methods for Systems and Control
Lecture notes prepared by M. Scott Trimboli. Copyright © 2013–2017, M. Scott Trimboli.

Parameter Optimization: Unconstrained

We will first develop tools and concepts associated with the general optimization process, beginning with problems that are independent of time, known as static or parameter optimization problems.

Useful Algorithmic and Geometric Properties

• We'll first examine a useful class of algorithms known as iterative methods, which generate a sequence of points x^(1), x^(2), x^(3), ..., or more compactly {x^(k)}, that converges to a fixed point x* which is the solution to a given problem.

• Let us define a line as the set of points x(α) = x⁰ + αs, where x⁰ is a fixed point and s is the direction of the line (see a 2-D representation in Figure 3.1).


Figure 3.1 A line in two dimensions

• An acceptable iterative optimization algorithm exhibits the following properties:

  – iterates x^(k) move steadily toward the neighborhood of a local minimizer x*
  – iterates converge rapidly to the point x*, i.e., for h^(k) = x^(k) − x*, h^(k) → 0 in some appropriate measure of h^(k)
  – the rate of convergence is an important measure of the goodness of the algorithm

• A method is usually based on a model – an approximation of the objective function – which enables an estimate of the local minimizer to be made

  – the most successful have been quadratic models

Unconstrained Parameter Optimization

• To begin this investigation, we must define the goals that we hope to achieve when attacking these problems

  – First, we will define an index of performance, or objective function, that we'll call L for this development
    ◦ In general, L will be a function of one, two, or many variables; i.e., L = f(u1, u2, ..., um)
    ◦ NOTE: it's also customary to use J to denote an objective function and x_k for the independent variables; e.g., J(x1, x2, ..., xm)
  – Our goal in the parameter optimization process will be to select the variables {u1, u2, ..., um} such that L is minimized
    ◦ Recall that maximization can be achieved by switching the sign on a minimization problem

  – But what exactly do we mean by a minimum? We generally consider two definitions:

    absolute (or global) minimum
    ⇒ L(u1* + Δu1, u2* + Δu2, ..., um* + Δum) > L(u1*, u2*, ..., um*)
       for all changes Δu1, Δu2, ..., Δum

    local minimum
    ⇒ L(u1* + Δu1, u2* + Δu2, ..., um* + Δum) > L(u1*, u2*, ..., um*)
       for all infinitesimal changes Δu1, Δu2, ..., Δum, where the values u* denote the optimal (minimizing) values of u

  – An optimization problem usually assumes that an optimum solution u* exists, is unique, and can be found, but this ideal situation may not hold for a number of reasons:
    ◦ L(u) is unbounded below
    ◦ L(u) is bounded below but the minimum is not attained
    ◦ u* is not unique
    ◦ a local minimum exists that is not a global minimum
    ◦ a local minimum exists although L(u) is unbounded below (see Figure 3.2)

Figure 3.2 f(x) = x³ − 3x

  – The conditions for a local minimum are considerably easier to solve than those for a global minimum; we'll address the local minimum problem in this course
  – NOTE: We will focus on minimizing performance indices (or objective functions). The problem of maximizing an objective function fits easily within this framework by simply letting K = −L
Conditions for Local Minima

• Along any line u(α) = u* + αs through u*, L[u(α)] has both zero slope and non-negative curvature at u* (see Figure 3.3)
• This is the usual condition derived from a Taylor series for a local minimum of a function of one variable

Figure 3.3 Zero slope and non-negative curvature at α = 0

Single Parameter Problem

• Consider the function L(u) = (u − 1)². How do we find the minimum?

  dL/du = 2(u − 1) = 0  ⇒  u = 1

  d²L/du² = 2 > 0
• Why does this work?
  – If we let u* denote a local minimum of L(u), then L can be expanded in a Taylor series about u*:

    L(u) = L(u*) + (dL/du)|_{u*} Δu + (1/2)(d²L/du²)|_{u*} Δu² + ···

    or

    ΔL = L(u) − L(u*) = (dL/du)|_{u*} Δu + (1/2)(d²L/du²)|_{u*} Δu² + ···

  – Since u* is a local minimum, we know two things:

    1. L(u) − L(u*) > 0 for all u in a neighborhood of u*
    2. Δu is an arbitrary but infinitesimal change in u away from u*, so the higher-order terms in the Taylor series expansion are insignificant:

       ⇒  ΔL ≈ (dL/du)|_{u*} Δu

    But Δu is arbitrary, so if (dL/du)|_{u*} ≠ 0 then ΔL < 0 for some Δu, and by deduction,

       ⇒  u* can only be a minimum if (dL/du)|_{u*} = 0

    If (dL/du)|_{u*} = 0, then

       ΔL ≈ (1/2)(d²L/du²)|_{u*} Δu²

    but Δu² > 0 for all Δu ≠ 0, so ΔL > 0 if (d²L/du²)|_{u*} > 0

       ⇒  u* will be a minimum if (d²L/du²)|_{u*} > 0

Sufficient Conditions For a Local Minimum:

  (dL/du)|_{u*} = 0   and   (d²L/du²)|_{u*} > 0

• What if (d²L/du²)|_{u*} = 0?
• Then we must go to higher-order derivatives (the odd derivatives must be zero, and the first nonzero even derivative must be positive)

Necessary Conditions For a Local Minimum:

  (dL/du)|_{u*} = 0   and   (d²L/du²)|_{u*} ≥ 0

QUESTION: What is the difference between necessary and sufficient conditions?

Two-Parameter Problem

• Consider the function L(u1, u2), where L(u1*, u2*) is a local minimum
• We'll use the same Taylor series arguments as above to develop conditions for a minimum, but now the Taylor series is more complicated:

  L(u1, u2) = L(u1*, u2*) + (∂L/∂u1)|* Δu1 + (∂L/∂u2)|* Δu2
              + (1/2)[ (∂²L/∂u1²)|* Δu1² + 2(∂²L/∂u1∂u2)|* Δu1Δu2 + (∂²L/∂u2²)|* Δu2² ] + ···

• Clearly, (u1*, u2*) can only be a minimum if the following stationarity condition is attained:

  (∂L/∂u1)|* = (∂L/∂u2)|* = 0

  – If these conditions are satisfied, then the second-order term in the Taylor series expansion must be greater than or equal to zero for (u1*, u2*) to be a minimizer

• Let's re-write the 2nd-order term to see how we can validate this condition:

  (1/2)[ (∂²L/∂u1²)|* Δu1² + 2(∂²L/∂u1∂u2)|* Δu1Δu2 + (∂²L/∂u2²)|* Δu2² ]

      = (1/2) [Δu1  Δu2] [ (∂²L/∂u1²)|*     (∂²L/∂u1∂u2)|* ] [Δu1]
                         [ (∂²L/∂u2∂u1)|*   (∂²L/∂u2²)|*   ] [Δu2]

      = (1/2) Δuᵀ (∂²L/∂u²) Δu

  – NOTE: ∂²L/∂u² is the Hessian of L

• This result clearly indicates that the 2nd-order term in the Taylor series expansion will be greater than or equal to zero if ∂²L/∂u² is positive semidefinite


Sufficient Conditions For a Local Minimum:

  ∂L/∂u = 0   and   ∂²L/∂u² positive definite

Necessary Conditions For a Local Minimum:

  ∂L/∂u = 0   and   ∂²L/∂u² positive semidefinite
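As a quick numerical illustration of these conditions (a minimal sketch, not part of the original notes), the Python snippet below approximates the gradient and Hessian at a candidate point with central differences and inspects the Hessian eigenvalues; the function handle L, the step h, and the tolerance are assumptions chosen for illustration.

import numpy as np

def check_second_order_conditions(L, u, h=1e-5, tol=1e-6):
    """Numerically test the stationarity and curvature conditions at u."""
    n = len(u)
    g = np.zeros(n)
    G = np.zeros((n, n))
    for i in range(n):
        ei = np.zeros(n); ei[i] = h
        g[i] = (L(u + ei) - L(u - ei)) / (2 * h)          # central-difference gradient
        for j in range(n):
            ej = np.zeros(n); ej[j] = h
            G[i, j] = (L(u + ei + ej) - L(u + ei - ej)
                       - L(u - ei + ej) + L(u - ei - ej)) / (4 * h * h)
    eigs = np.linalg.eigvalsh(0.5 * (G + G.T))            # symmetrize, then eigenvalues
    return {"gradient": g,
            "stationary": bool(np.all(np.abs(g) <= tol)),
            "hessian eigenvalues": eigs,
            "necessary (PSD)": bool(np.all(eigs >= -tol)),
            "sufficient (PD)": bool(np.all(eigs > tol))}

# Example 1, case 1 (below): f(x) = x1^2 + x2^2 has a minimum at the origin
print(check_second_order_conditions(lambda x: x[0]**2 + x[1]**2, np.array([0.0, 0.0])))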

N-Parameter Problem

• The vector notation introduced in the two-parameter problem above is ideally suited to the N-parameter problem and leads to precisely the same necessary and sufficient conditions as those stated above

Example 1

• Consider the following four cases:

1. f(x) = x1² + x2²

   ∂f/∂x = [2x1  2x2] = 0  ⇒  x1 = x2 = 0

   ∂²f/∂x² = [ 2  0 ]
             [ 0  2 ]   > 0

2. f(x) = −x1² − x2²

   ∂f/∂x = [−2x1  −2x2] = 0  ⇒  x1 = x2 = 0

   ∂²f/∂x² = [ −2   0 ]
             [  0  −2 ]   < 0

3. f .x/ D x12 $ x22


@f h i
D 2x1 $2x2 D 0
@x
) x1 D x2 D 0
" #
2
@f 2 0
D ( indeterminate
@x 2 0 $2
4. f .x/ D $x12 C x22
@f h i
D $2x1 2x2 D 0
@x
) x1 D x2 D 0
" #
2
@f $2 0
D ( indeterminate
@x 2 0 2

• The corresponding function surface graphs are depicted in the following figures


Figure 3.4 f(x) = x1² + x2²      Figure 3.5 f(x) = −x1² − x2²

Figure 3.6 f(x) = x1² − x2²      Figure 3.7 f(x) = −x1² + x2²

Example 2

• Consider the objective function given by:

  f(x) = (x1 − x2 + 2)² + (x1 + x2 − 4)⁴

  ∂f/∂x = [  2(x1 − x2 + 2) + 4(x1 + x2 − 4)³ ]   [ 0 ]
          [ −2(x1 − x2 + 2) + 4(x1 + x2 − 4)³ ] = [ 0 ]

  Subtracting the two equations gives

  4(x1 − x2 + 2) = 0   ⇒   x1 − x2 = −2

  and adding them gives

  8(x1 + x2 − 4)³ = 0   ⇒   x1 + x2 = 4

  ⇒   x = [1  3]ᵀ

  The Hessian is

  ∂²f/∂x² = [  2 + 12(x1 + x2 − 4)²    −2 + 12(x1 + x2 − 4)² ]
            [ −2 + 12(x1 + x2 − 4)²     2 + 12(x1 + x2 − 4)² ]

  which, evaluated at x = [1  3]ᵀ, becomes

  ∂²f/∂x² = [  2  −2 ]
            [ −2   2 ]        λ1 = 4,  λ2 = 0

  The necessary conditions are satisfied; the sufficient conditions are not.

  f(x) ≥ 0 for all (x1, x2)
  f(x) = 0 at (1, 3)

  ⇒   f(1, 3) is a local minimum

• For many multi-parameter optimization problems, the necessary condition

  ∂L/∂u = 0

  generates a set of equations that are too difficult to solve analytically.
• So what do we do? Compute numerically!

Descent Algorithms for Unconstrained Optimization

• Here we seek an iterative method for unconstrained optimization, i.e., one that iterates u^(k) so that it moves rapidly toward the neighborhood of a local minimizer u* and converges rapidly to the point u* itself

  – Order of convergence is a useful measure of algorithm behavior
    ◦ Define the error vector h^(k) = u^(k) − u*
    ◦ Then if h^(k) → 0 (convergence), it may be possible to give local convergence results:

      ‖h^(k+1)‖ / ‖h^(k)‖^p → a

      where a > 0 implies the order of convergence is p-th order.

  – Here the notation ‖·‖ denotes a vector norm, and

    ◦ p = 1  ⇒  first-order or linear convergence
    ◦ p = 2  ⇒  second-order or quadratic convergence

Line Search Algorithms

• The basic idea is to search for a minimum function value along coordinate directions, or in more general directions
• First we generate an initial estimate u^(1); then for each k-th iteration,

  1. Determine a direction of search s^(k)
  2. Find α^(k) to minimize L(u^(k) + αs^(k)) with respect to α
  3. Set u^(k+1) = u^(k) + α^(k) s^(k)

• Different methods correspond to different ways of choosing s^(k) in step 1
• Step 2 is the line search subproblem and involves sampling L(u) (and possibly its derivatives) along the line

  – Ideally, an exact minimizing value of α^(k) is required, but this is not practical in a finite number of steps

• It is apparent that the slope dL/dα must be zero at α^(k), which gives

  ∇L^(k+1)ᵀ s^(k) = 0
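A minimal sketch of the three-step iteration above is given below (an illustration, not code from the notes): it uses the steepest-descent direction for step 1 and solves the 1-D subproblem of step 2 numerically with scipy.optimize.minimize_scalar; the stopping test and iteration limit are assumptions.

import numpy as np
from scipy.optimize import minimize_scalar

def line_search_descent(L, grad, u0, max_iter=100, gtol=1e-8):
    """Generic descent iteration: choose s(k), minimize along the line, update."""
    u = np.asarray(u0, dtype=float)
    for k in range(max_iter):
        g = grad(u)
        if np.linalg.norm(g) <= gtol:            # simple convergence test
            break
        s = -g                                   # step 1: one possible choice (steepest descent)
        phi = lambda alpha: L(u + alpha * s)     # step 2: the line search subproblem
        alpha = minimize_scalar(phi).x           # approximate 1-D minimization
        u = u + alpha * s                        # step 3: update the iterate
    return u

# Example: minimize L(u) = (u1 - 1)^2 + 10 (u2 + 2)^2, which has its minimum at (1, -2)
L = lambda u: (u[0] - 1)**2 + 10 * (u[1] + 2)**2
grad = lambda u: np.array([2 * (u[0] - 1), 20 * (u[1] + 2)])
print(line_search_descent(L, grad, [0.0, 0.0]))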


Figure 3.8 Exact line search

• Generally, inexact or approximate line searches are used to satisfy this minimizing condition
• The requirement that L^(k+1) < L^(k) is unsatisfactory by itself because the reductions in L might be negligible
• The aim of a line search is to:

  – find a step α^(k) which gives a significant reduction in L on each iteration
  – ensure points are not near the extremes of the interval [0, ᾱ^(k)], where ᾱ^(k) denotes the least positive value of α for which L(u^(k) + αs^(k)) = L(u^(k))

• The Goldstein conditions meet the above requirements:

  – f(α) ≤ f(0) + αρ f′(0)
  – f(α) ≥ f(0) + α(1 − ρ) f′(0)

  where ρ ∈ (0, 1/2) is a fixed parameter; the geometry is illustrated in the accompanying Figure 3.9.

• The second of these conditions might exclude the minimizing point of f(α), so an alternate condition is often used:

  |f′(α)| ≤ −σ f′(0)

Figure 3.9 Line search geometry

• If α̂ is the least value of α > 0 at which the f(α) curve intersects the ρ-line, and σ > ρ, then it can be shown that there exists an interval of acceptable points satisfying the Goldstein conditions (proof omitted).
• In practice, it is customary to use σ = 0.1 and ρ = 0.01, though the behavior is not really too sensitive to the choice of ρ
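These acceptance tests are easy to write down in code. The sketch below (illustrative only; the function names and defaults are assumptions) checks the two Goldstein conditions and the alternate slope condition for a trial step α.

def goldstein_accept(f, fp, alpha, rho=0.01):
    """Goldstein conditions for a trial step alpha, given f(a) and f'(a) along the line."""
    f0, fp0 = f(0.0), fp(0.0)
    not_too_large = f(alpha) <= f0 + alpha * rho * fp0           # significant reduction
    not_too_small = f(alpha) >= f0 + alpha * (1.0 - rho) * fp0   # step not negligibly short
    return not_too_large and not_too_small

def slope_accept(fp, alpha, sigma=0.1):
    """Alternate condition: |f'(alpha)| <= -sigma * f'(0)."""
    return abs(fp(alpha)) <= -sigma * fp(0.0)

# Trial step on f(a) = 100 a^4 + (1 - a)^2 (the line of Example 3 later in these notes)
f  = lambda a: 100 * a**4 + (1 - a)**2
fp = lambda a: 400 * a**3 - 2 * (1 - a)
print(goldstein_accept(f, fp, 0.16), slope_accept(fp, 0.16))   # both True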
• The line search algorithm comprises two phases: bracketing and sectioning:

  – Bracketing: iterates αi move out to the right in increasingly large jumps until an acceptable interval is located
  – Sectioning: generates a sequence of brackets [aj, bj] whose lengths tend toward zero


Bracketing Algorithm

For i = 1, 2, ...
  1. evaluate f(αi)
  2. if f(αi) ≤ f_min  ⇒  terminate line search
  3. if f(αi) > f(0) + αi ρ f′(0) or f(αi) ≥ f(α_{i−1})
     (a) ai = α_{i−1}
     (b) bi = αi
     ⇒ terminate bracket
  4. evaluate f′(αi)
  5. if |f′(αi)| ≤ −σ f′(0)  ⇒  terminate line search
  6. if f′(αi) ≥ 0
     (a) ai = αi
     (b) bi = α_{i−1}
     ⇒ terminate bracket
  7. if μ ≤ 2αi − α_{i−1}
     (a) α_{i+1} = μ
  8. else
     (a) choose α_{i+1} ∈ [2αi − α_{i−1}, min(μ, αi + τ1(αi − α_{i−1}))]
end

Here f_min is an absolute lower bound on acceptable function values, and μ denotes the largest admissible step (the value of α at which the ρ-line reaches f_min).

• The parameter τ1 is preset and governs the size of the jumps; τ1 = 9 is a reasonable choice
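A minimal sketch of this bracketing phase is shown below (illustrative only; names and defaults are assumptions). It follows the steps above but simplifies step 8 by taking the new trial point as the midpoint of the allowed interval rather than the minimizer of a cubic fit (see the next bullet).

def bracket(f, fp, alpha1, f_min, mu, rho=0.01, sigma=0.1, tau1=9.0, max_iter=50):
    """Bracketing phase: returns ('converged', alpha) or ('bracket', a, b)."""
    f0, fp0 = f(0.0), fp(0.0)
    a_prev, f_prev = 0.0, f0
    alpha = alpha1
    for _ in range(max_iter):
        fa = f(alpha)
        if fa <= f_min:                                    # step 2: reached the lower bound
            return ("converged", alpha)
        if fa > f0 + alpha * rho * fp0 or fa >= f_prev:    # step 3: rho-line violated or no decrease
            return ("bracket", a_prev, alpha)
        dfa = fp(alpha)                                    # step 4
        if abs(dfa) <= -sigma * fp0:                       # step 5: acceptable point found
            return ("converged", alpha)
        if dfa >= 0.0:                                     # step 6: slope has turned non-negative
            return ("bracket", alpha, a_prev)
        if mu <= 2.0 * alpha - a_prev:                     # step 7: cannot jump past mu
            a_prev, f_prev, alpha = alpha, fa, mu
        else:                                              # step 8: jump further to the right
            lo = 2.0 * alpha - a_prev
            hi = min(mu, alpha + tau1 * (alpha - a_prev))
            a_prev, f_prev, alpha = alpha, fa, 0.5 * (lo + hi)   # midpoint instead of a cubic fit
    return ("bracket", a_prev, alpha)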


• The choice of α_{i+1} can be made in any way, but a sensible choice is to minimize a cubic polynomial interpolating f(αi), f′(αi), f(α_{i−1}), and f′(α_{i−1}).


Example: Bracketing

• Consider the quadratic function

  f(α) = 0.5 + 2(α − 3)²

• Since this is a quadratic, it's somewhat of a special case. For this example, we choose the following parameters for the start of the line search:

  α0 = 0,   α1 = 1,   ρ = 0.25,   σ = 0.5

  – For simplicity, we select f̄ = 0 as an absolute lower bound, playing the role of f_min in the bracketing algorithm (although f is obviously bounded below by 0.5)

• We begin the first iteration of the bracketing algorithm (i = 1):

  1. f(α1) = 8.5
  2. Test: f(α1) ≤ f̄?   No
  3. Test: f(α1) > f(0) + α1 ρ f′(0)?
     8.5 > 18.5 + (1)(0.25)(−12) = 15.5?   No
  4. f′(α1) = −8
  5. Test: |f′(α1)| ≤ −σ f′(0)?
     |−8| ≤ −(0.5)(−12) = 6?   No
  6. Test: f′(α1) ≥ 0?   No
  7. Test: μ ≤ 2α1 − α0?   No
     (Here μ = (f̄ − f(0)) / (ρ f′(0)) = 6.1667, the step at which the ρ-line reaches the lower bound f̄.)

  ⇒ Therefore, choose the next iterate within the interval

  α2 ∈ [2, min(6.1667, 1 + τ1(α1 − α0))]

  Substituting values,

  α2 ∈ [2, min(6.1667, 1 + 9(1 − 0))] = [2, 6.1667]


• Quadratic interpolation over this interval will give α2 = 3 as the next iterate; this will terminate the line search in the next iteration at step 5
  – The bracketing sequence is depicted in the figure below

Figure: Bracketing on f(α) = 0.5 + 2(α − 3)², showing f(0), the ρ-line and (1 − ρ)f′(0) line, the slope σf′(0), the bound μ, and the resulting bracket


Sectioning Algorithm

For j = i, i+1, ...
  1. choose αj ∈ [aj + τ2(bj − aj), bj − τ3(bj − aj)]
  2. evaluate f(αj)
  3. if f(αj) > f(0) + ραj f′(0) or f(αj) ≥ f(aj)
     (a) a_{j+1} = aj
     (b) b_{j+1} = αj
  4. else
     (a) evaluate f′(αj)
     (b) if |f′(αj)| ≤ −σ f′(0)  ⇒  terminate line search
     (c) a_{j+1} = αj
     (d) if (bj − aj) f′(αj) ≥ 0
         i. b_{j+1} = aj
     (e) else
         i. b_{j+1} = bj
end

• The parameters τ2 and τ3 are preset and restrict αj from getting too close to the extremes of the interval [aj, bj]:

  0 < τ2 < τ3 ≤ 1/2

• Typical values are τ2 = 0.1 and τ3 = 0.5
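A minimal sketch of the sectioning phase appears below (illustrative; names and defaults are assumptions). For simplicity the trial point is taken as the middle of the allowed sub-interval rather than an interpolated minimizer. In a complete line search the two phases are chained: when bracket (sketched earlier) returns an interval, section refines it to an acceptable α.

def section(f, fp, a, b, rho=0.01, sigma=0.1, tau2=0.1, tau3=0.5, max_iter=50):
    """Sectioning phase: shrink the bracket [a, b] until an acceptable alpha is found."""
    f0, fp0 = f(0.0), fp(0.0)
    alpha = a
    for _ in range(max_iter):
        # step 1: pick a trial point away from both ends of the current bracket
        alpha = 0.5 * ((a + tau2 * (b - a)) + (b - tau3 * (b - a)))
        fa = f(alpha)                                      # step 2
        if fa > f0 + rho * alpha * fp0 or fa >= f(a):      # step 3: step too long or no decrease
            b = alpha
        else:
            dfa = fp(alpha)                                # step 4(a)
            if abs(dfa) <= -sigma * fp0:                   # step 4(b): acceptable point
                return alpha
            a_old, a = a, alpha                            # step 4(c)
            if (b - a_old) * dfa >= 0.0:                   # step 4(d)
                b = a_old                                  # step 4(d)i
    return alpha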

Polynomial Interpolation

• For the quadratic case, we can define the 2nd-order polynomial

  p_q(z) = p2 z² + p1 z + p0

• Considering the normalized interval z ∈ [0, 1] corresponding to [aj, bj] allows us to write the interpolation conditions:

  p_q(0) = f(aj)
  p_q(1) = f(bj)
  p_q′(0) = f_z′(aj)

• Assuming we can compute the values f(aj), f_z′(aj), and f(bj), substituting for z allows us to write

  p_q(0) = p0 = f(aj)
  p_q′(0) = p1 = f_z′(aj)
  p_q(1) = p2 + p1 + p0 = f(bj)

  or, assembling in matrix-vector form,

  [ 1  1  1 ] [ p2 ]   [ f(bj)    ]
  [ 0  1  0 ] [ p1 ] = [ f_z′(aj) ]
  [ 0  0  1 ] [ p0 ]   [ f(aj)    ]

  – Solving this system of equations yields

    p0 = f(aj)
    p1 = f_z′(aj)
    p2 = f(bj) − f(aj) − f_z′(aj)

    giving the interpolating polynomial:

    p_q(z) = [f(bj) − f(aj) − f_z′(aj)] z² + f_z′(aj) z + f(aj)

  – Note that the mapping transformation is given by

    α = a + z(b − a)

    where by the chain rule we have

    f_z′ = df/dz = (df/dα)(dα/dz) = (df/dα)(b − a)

    which relates the derivatives of the mapped variables.
• If in addition f′(bj) is available, we can find the cubic interpolating polynomial

  p_c(z) = p3 z³ + p2 z² + p1 z + p0

  where we assemble the interpolation equations:

  p_c(1) = p3 + p2 + p1 + p0 = f(bj)
  p_c′(1) = 3p3 + 2p2 + p1 = f_z′(bj)
  p_c′(0) = p1 = f_z′(aj)
  p_c(0) = p0 = f(aj)

  or in matrix form,

  [ 1  1  1  1 ] [ p3 ]   [ f(bj)    ]
  [ 3  2  1  0 ] [ p2 ] = [ f_z′(bj) ]
  [ 0  0  1  0 ] [ p1 ]   [ f_z′(aj) ]
  [ 0  0  0  1 ] [ p0 ]   [ f(aj)    ]

  thus giving the solution:

  p0 = f(aj)
  p1 = f_z′(aj)
  p2 = 3[f(bj) − f(aj)] − 2f_z′(aj) − f_z′(bj)
  p3 = f_z′(aj) + f_z′(bj) − 2[f(bj) − f(aj)]
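Both fits translate directly into code; the sketch below (illustrative, with assumed function names) computes the quadratic and cubic coefficients on the normalized interval and, for the quadratic, the minimizing z.

def quadratic_coeffs(fa, dfa, fb):
    """p_q(z) = p2 z^2 + p1 z + p0 fitted to f(a), f_z'(a), f(b) on z in [0, 1]."""
    return fb - fa - dfa, dfa, fa          # (p2, p1, p0)

def quadratic_min_z(fa, dfa, fb):
    """Stationary point of the quadratic fit, z* = -p1 / (2 p2); meaningful when p2 > 0."""
    p2, p1, _ = quadratic_coeffs(fa, dfa, fb)
    return -p1 / (2.0 * p2)

def cubic_coeffs(fa, dfa, fb, dfb):
    """p_c(z) = p3 z^3 + p2 z^2 + p1 z + p0 fitted to f and f_z' at both ends."""
    p3 = dfa + dfb - 2.0 * (fb - fa)
    p2 = 3.0 * (fb - fa) - 2.0 * dfa - dfb
    return p3, p2, dfa, fa                 # (p3, p2, p1, p0)

# Remember that the derivatives here are with respect to z: on [a, b], f_z' = (b - a) * df/dalpha.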


Example 3: Line Search

• Consider the function

  f(x) = 100(x2 − x1²)² + (1 − x1)²

  and let

  x^(k) = [0  0]ᵀ   and   s^(k) = [1  0]ᵀ

• We then have

  f(α) = 100α⁴ + (1 − α)²
  f′(α) = 400α³ − 2(1 − α)

• Choosing the parameters

  σ = 0.1,  ρ = 0.01,  τ1 = 9,  τ2 = 0.1,  τ3 = 0.5

  gives the following results for the cases α1 = 0.1 and α1 = 1:

  Iteration     0      1       2         3          4
  --------------------------------------------------------
  α1 = 0.1
    α           0      0.1     0.2       0.160948
    f(α)        1      0.82    0.8       0.771111
    f′(α)       −2     −1.4    1.6       −0.010423

  α1 = 1
    α           0      1       0.1       0.19       0.160922
    f(α)        1      100     0.82      0.786421   0.771112
    f′(α)       −2     —       −1.4      1.1236     −0.011269

  (At α = 1 the bracket terminates at step 3, so f′ is not evaluated there.)

  Table 3.1 Line search example


Figure 3.10 Example 3: f(α) = 100α⁴ + (1 − α)²

Descent Methods

• Descent methods are line search methods where the search direction satisfies the descent property:

  s^(k)ᵀ g^(k) < 0

  where g^(k) = ∇L(u^(k))

• This condition ensures that

  – the slope dL/dα is always negative at α = 0 (unless u^(k) is a stationary point)
  – the function L(u) can be reduced in the line search for some α^(k) > 0

Steepest Descent Methods

• Steepest descent is defined by the condition

  s^(k) = −g^(k)

  for all k.
• This condition ensures that L(u) decreases most rapidly local to u^(k)
• Although appealing, the steepest descent method is not suited for practical use, largely because:

  – it usually exhibits oscillatory behavior
  – it usually terminates far from the exact solution due to round-off errors


• The inadequacy of steepest descent is due mostly to the model: the steepest descent property along the line holds only at α = 0 (not for all α)
• An exception occurs for quadratic models, which we'll investigate in more detail later

Convergence

• It is important to be able to determine when an algorithm has converged to an acceptable solution
• A useful test would be L^(k) − L* ≤ ε or |x_i^(k) − x_i*| ≤ ε_i, but these are not practical because they require the solution!
• A practical alternative is ‖g^(k)‖ ≤ ε, though in practice it's hard to choose an appropriate ε
• Far more practical are tests of the following form:

  |x_i^(k) − x_i^(k+1)| ≤ ε_i   for all i

  or

  L^(k) − L^(k+1) ≤ ε
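These practical tests are only a few lines of code; the sketch below (illustrative; the tolerance names are assumptions) combines the component-wise change test with the objective-decrease test.

import numpy as np

def converged(x_prev, x_new, L_prev, L_new, x_tol=1e-8, L_tol=1e-10):
    """Practical stopping tests: small change in the iterate or small drop in the objective."""
    small_step = np.all(np.abs(np.asarray(x_prev) - np.asarray(x_new)) <= x_tol)
    small_drop = (L_prev - L_new) <= L_tol
    return bool(small_step) or bool(small_drop)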

Newton-like Methods

• It was shown previously that there is a great advantage to deriving a method based on a quadratic model
• Newton's method is the most straightforward such method

Newton’s Method

• The key to this algorithm is that the values of u which minimize L are the same as the ones which satisfy

  ∂L/∂u = 0

• So, we'll set up an algorithm which searches for a solution to this problem
• Writing the truncated Taylor series expansion of L(u) about u^(k):

  L(u^(k) + δ) ≈ q^(k)(δ) = L^(k) + g^(k)ᵀ δ + (1/2) δᵀ G^(k) δ

  where δ = u − u^(k), and q^(k)(δ) is the resulting quadratic approximation for iteration k
• The iterate u^(k+1) is computed as u^(k) + δ^(k), where the correction δ^(k) minimizes q^(k)(δ)
• The method requires the zeroth-, first-, and second-order derivatives of L(u)
• The basic algorithm can be written:

  – solve G^(k) δ = −g^(k) for δ = δ^(k)
  – set u^(k+1) = u^(k) + δ^(k)

• Newton's method exhibits second-order convergence

Complete Algorithm ⇒


1. Compute the general functions g^(k) = ∂L/∂u and G^(k) = ∂²L/∂u² a priori
2. Choose a starting value u^(1)
3. Evaluate g^(1) and G^(1) at u^(1)
4. Solve for δ^(1) (solve the set of simultaneous equations G^(1) δ^(1) = −g^(1))
5. Compute u^(2) = u^(1) + δ^(1)
6. Repeat steps (3)–(5) for increasing values of k until the convergence condition is satisfied

• The biggest problem with this algorithm is that the calculation of the Hessian ∂²L/∂u² may be extremely tedious
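A minimal sketch of the complete algorithm is shown below (illustrative only; the convergence test and iteration cap are assumptions). It solves G^(k) δ = −g^(k) with a linear solver at each step rather than forming G⁻¹ explicitly.

import numpy as np

def newton(grad, hess, u1, tol=1e-10, max_iter=50):
    """Basic Newton iteration: solve G(u) delta = -g(u) and update u."""
    u = np.asarray(u1, dtype=float)
    for k in range(max_iter):
        g = grad(u)                        # step 3: evaluate the gradient ...
        G = hess(u)                        # ... and the Hessian at the current iterate
        delta = np.linalg.solve(G, -g)     # step 4: simultaneous equations G delta = -g
        u = u + delta                      # step 5: update the iterate
        if np.linalg.norm(delta) <= tol:   # step 6: stop when the correction is negligible
            break
    return u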


Example 4: Newton-like methods

Let

  L(u) = u1⁴ + u1u2 + (1 + u2)²

where

  u^(1) = [1.25  −0.2]ᵀ

• Implementing Newton's method for values of k from 1 to 7 gives the results summarized in the table below; a graphical representation is shown in Figure 3.11.

  k          1        2        3          4           5             6                 7
  u1^(k)     1.25     0.9110   0.7451     0.69932     0.6959029     0.6958844         0.6958843
  u2^(k)     −0.2     −1.455   −1.3726    −1.34966    −1.347951     −1.3479422        −1.3479422
  g1^(k)     7.6125   1.5683   0.2823     −0.018382   0.0000982235  −0.0000000028559  0
  g2^(k)     2.8500   0        0          0           0             0                 0
  L^(k)      2.8314   −0.4298  −0.5757    −0.582414   −0.5824452    −0.5824452        −0.5824452
  ‖h^(k)‖    1.2727   0.24046  0.0550691  0.00384881  0.0000206765  0.00000000064497  0

  Table 3.2 Newton's method example


Figure 3.11 Newton’s method example

From the above, it can be shown that the ratio ‖h^(k+1)‖ / ‖h^(k)‖² → a, where a ≈ 1.4, indicating second-order convergence.
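Example 4 can be reproduced with the newton sketch given earlier by supplying analytic derivatives of L(u) = u1⁴ + u1u2 + (1 + u2)²; the gradient and Hessian expressions below are worked out from that L (they are not printed in the original notes).

import numpy as np

L    = lambda u: u[0]**4 + u[0] * u[1] + (1.0 + u[1])**2
grad = lambda u: np.array([4.0 * u[0]**3 + u[1],          # dL/du1
                           u[0] + 2.0 * (1.0 + u[1])])    # dL/du2
hess = lambda u: np.array([[12.0 * u[0]**2, 1.0],
                           [1.0,            2.0]])

u_star = newton(grad, hess, [1.25, -0.2])   # starting point u(1) from the example
print(u_star, L(u_star))                    # approx (0.695884, -1.347942) and -0.5824452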
• The basic Newton method is not suitable for a general-purpose algorithm:

  – G^(k) may not be positive definite when x^(k) is far from the solution
  – even if G^(k) is positive definite, the algorithm still may not converge
  – convergence can be addressed by using Newton's method with a line search:

    s^(k) = −G^(k)⁻¹ g^(k)   ⇒   search direction

Example 5

• Returning to the function of Example 4,

  L(u) = u1⁴ + u1u2 + (1 + u2)²

  If we choose x^(1) = [0  0]ᵀ, then we have

  g^(1) = [ 0 ],   G^(1) = [ 0  1 ],   s^(1) = [ −2 ]
          [ 2 ]            [ 1  2 ]            [  0 ]

• A search along ±s^(1) changes only the x1 component of x^(1), and finds x1 = 0 as the minimizing value in the search ⇒ the algorithm fails to make progress!
• The reason for this is that s^(1)ᵀ g^(1) = 0 ⇒ the direction is not downhill!
  – This stems from the fact that λ(G^(1)) = {2.4142, −0.4142} ⇒ G^(1) is not positive definite
• But the function L(u) has a well-defined minimum which can be found by searching along the steepest descent direction

Modifications to Newton's Method

• Clearly, some modification is required to make the Newton algorithm generally applicable
• Modification 1: Revert to the steepest descent direction s^(k) = −g^(k) whenever G^(k) is not positive definite
  – Unfortunately, this exhibits slow oscillatory behavior since the method ignores the information in the model quadratic function

• Modification 2: Adjust the Newton search direction by giving it a bias towards the steepest descent vector −g^(k):

  (G^(k) + νI) s^(k) = −g^(k)

  – The method adds the factor ν to the eigenvalues of G^(k) to (hopefully) make it positive definite
  – It takes into account more of the function's quadratic information (except in the vicinity of a saddle point)
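The sketch below illustrates Modification 2 (an example implementation, not the notes' prescribed rule for choosing ν): it shifts G^(k) by just enough to make its smallest eigenvalue comfortably positive before solving for the search direction.

import numpy as np

def modified_newton_direction(G, g, margin=1e-3):
    """Search direction from (G + nu*I) s = -g, with nu chosen so that G + nu*I is positive definite."""
    eigs = np.linalg.eigvalsh(0.5 * (G + G.T))
    nu = max(0.0, margin - eigs.min())       # shift only when the smallest eigenvalue is too small
    return np.linalg.solve(G + nu * np.eye(len(g)), -g)

# Example 5's indefinite Hessian at x(1) = (0, 0): the shifted direction is now downhill
G = np.array([[0.0, 1.0], [1.0, 2.0]])
g = np.array([0.0, 2.0])
s = modified_newton_direction(G, g)
print(s, s @ g)                              # s'g < 0, so the descent property holds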

• Other modifications exist, but they are beyond the scope of this course

1st-order Gradient Methods (a simplified approach)

• Instead of using knowledge about the "curvature" of L to help us find δ, let's simply step by some amount (to be determined) in the direction of decreasing L until we reach a minimum

  – the solution of the linear equation Gδ = −g can be expressed as δ = −G⁻¹g (though in practice this is not how we would solve it)
  – we now replace G⁻¹ by a positive scalar constant K so that δ = −Kg
  – we can now perform the same iterative algorithm outlined above

• Can I convince you that this will work?

  – remember, L(u + δ) = L(u) + (∂L/∂u) δ + O(δ²)

  – if δ = −K (∂L/∂u), then

    L(u + δ) − L(u) ≈ −K |∂L/∂u|² < 0

  – so, to first order, we are moving in the right direction!

• This verification also provides some insight into the problem of selecting K:

  – if K is too big, the 2nd-order term in the Taylor series may become significant and the algorithm may overshoot the stationary point and not converge at all
  – if K is too small, the higher-order terms will be truly insignificant, but it may take forever to get to the solution
  – how is it done in practice? Vary K during the iteration process
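A minimal sketch of this first-order scheme is given below (illustrative; the halving rule for K is one simple way to "vary K" and is an assumption, not the notes' prescription): it steps with δ = −K g and shrinks K whenever a step fails to decrease L.

import numpy as np

def gradient_descent(L, grad, u0, K=0.1, shrink=0.5, tol=1e-8, max_iter=10000):
    """First-order gradient method: delta = -K * g, with K reduced when a step overshoots."""
    u = np.asarray(u0, dtype=float)
    Lu = L(u)
    for _ in range(max_iter):
        g = grad(u)
        if np.linalg.norm(g) <= tol:
            break
        u_new = u - K * g
        L_new = L(u_new)
        if L_new < Lu:              # the first-order prediction held; accept the step
            u, Lu = u_new, L_new
        else:                       # K was too big and the step overshot; reduce it
            K *= shrink
    return u

# The quadratic of Example 6 (later in the notes): L(u) = 10 u1^2 + u2^2
L = lambda u: 10 * u[0]**2 + u[1]**2
grad = lambda u: np.array([20 * u[0], 2 * u[1]])
print(gradient_descent(L, grad, [0.1, 1.0], K=0.15))   # approaches the origin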

Quasi-Newton Methods

• The main disadvantage of Newton's method is that the user must supply explicit formulae to compute the second derivative matrix G
• But methods very similar to Newton's method can be derived when only first derivative formulae are available
• One straightforward approach is the Finite Difference Newton Method:

  – estimate G^(k) by using finite differences in the gradient vectors, i.e., the (i, j) element of the estimate Ĝ^(k) is computed as

    Ĝ_ij = [ g_j(x^(k) + h_i e_i) − g_j^(k) ] / h_i

    where h_i is an increment length in the coordinate direction e_i.

  – make Ĝ symmetric by computing

    Ĝ_s = (1/2)(Ĝ + Ĝᵀ)

  – use Ĝ_s in place of G^(k) in Newton's method
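A sketch of this finite-difference estimate follows (illustrative; the increment h and function names are assumptions): each row of Ĝ comes from differencing the gradient along one coordinate direction, and the result is symmetrized before use.

import numpy as np

def fd_hessian(grad, x, h=1e-6):
    """Estimate G at x by forward-differencing the gradient along each coordinate direction."""
    x = np.asarray(x, dtype=float)
    n = x.size
    g0 = grad(x)
    G_hat = np.zeros((n, n))
    for i in range(n):
        e_i = np.zeros(n); e_i[i] = 1.0
        G_hat[i, :] = (grad(x + h * e_i) - g0) / h   # row i: G_hat[i, j] = (g_j(x + h e_i) - g_j) / h
    return 0.5 * (G_hat + G_hat.T)                    # symmetrized estimate G_s

# Check against Example 4's objective, whose exact Hessian at (1.25, -0.2) is [[18.75, 1], [1, 2]]
grad = lambda u: np.array([4 * u[0]**3 + u[1], u[0] + 2 * (1 + u[1])])
print(fd_hessian(grad, [1.25, -0.2]))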

• The method can be useful, but it has some disadvantages:

  – Ĝ_s may not be positive definite
  – n gradient evaluations are required to estimate G^(k)
  – a set of linear equations must be solved at each iteration

Quasi-Newton Methods

• Quasi-Newton methods avoid some of the disadvantages outlined above by

  – employing Newton's method with a line search
  – approximating G^(k)⁻¹ by a symmetric positive definite matrix H^(k) which is updated at each iteration

• Basic Algorithm:

  – initialize H^(1) to any positive definite matrix (H^(1) = I is a good choice)
  – set the search direction s^(k) = −H^(k) g^(k)
  – perform a line search along s^(k), giving u^(k+1) = u^(k) + α^(k) s^(k)
  – update H^(k), giving H^(k+1)

• Advantages of this method:

  – only first derivatives are required


  – positive definite H^(k) implies the descent property
  – order of n² multiplications per iteration

• The new aspect is the update calculation of H^(k+1) from H^(k)

  – it attempts to augment H^(k) with second derivative information gained from the k-th iteration
  – ideally, we want the update to change H^(1) into a close approximation of G^(k)⁻¹
  – one method of doing this involves defining the differences

    δ^(k) = α^(k) s^(k) = x^(k+1) − x^(k)
    γ^(k) = g^(k+1) − g^(k)

    Then the Taylor series of the gradient g^(k) gives

    γ^(k) = G^(k) δ^(k) + o(‖δ^(k)‖)

    where the higher-order terms can be neglected.
  – since δ^(k) and γ^(k) can only be calculated after the line search, H^(k) does not usually relate them correctly
  – thus, H^(k+1) is chosen to correctly relate the differences (the quasi-Newton condition):

    H^(k+1) γ^(k) = δ^(k)

• Computationally, one approach is to introduce a recursive form:

  H^(k+1) = H^(k) + E^(k)

  – let E^(k) be the rank-one symmetric matrix a v vᵀ
  – satisfying the quasi-Newton condition requires

    H^(k) γ^(k) + a v vᵀ γ^(k) = δ^(k)


  – which gives rise to the rank-one formula:

    H^(k+1) = H + (δ − Hγ)(δ − Hγ)ᵀ / [(δ − Hγ)ᵀ γ]
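Putting the basic algorithm and the rank-one update together gives the sketch below (illustrative only; the exact line search via scipy.optimize.minimize_scalar and the small-denominator guard are assumptions, not part of the notes).

import numpy as np
from scipy.optimize import minimize_scalar

def quasi_newton_rank1(L, grad, u1, tol=1e-8, max_iter=100):
    """Quasi-Newton iteration using the symmetric rank-one update of H (an approximation of G^-1)."""
    u = np.asarray(u1, dtype=float)
    H = np.eye(u.size)                              # H(1) = I
    g = grad(u)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        s = -H @ g                                  # search direction
        alpha = minimize_scalar(lambda a: L(u + a * s)).x
        u_new = u + alpha * s
        g_new = grad(u_new)
        delta = u_new - u                           # delta(k)
        gamma = g_new - g                           # gamma(k)
        v = delta - H @ gamma
        denom = v @ gamma
        if abs(denom) > 1e-12:                      # guard against a vanishing denominator
            H = H + np.outer(v, v) / denom          # rank-one update enforcing H gamma = delta
        u, g = u_new, g_new
    return u, H

# Example 6 (next): L(u) = 10 u1^2 + u2^2 from (0.1, 1); H should approach G^-1 = diag(0.05, 0.5)
L = lambda u: 10 * u[0]**2 + u[1]**2
grad = lambda u: np.array([20 * u[0], 2 * u[1]])
print(quasi_newton_rank1(L, grad, [0.1, 1.0]))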


Example 6: Quasi-Newton methods

• Consider the quadratic function

  L(u) = 10u1² + u2² = uᵀ [ 10  0 ] u
                           [  0  1 ]

  where the initial point is given by

  u^(1) = [0.1  1]ᵀ

• Gradient:

  g(u) = [ 20u1 ]
         [  2u2 ]

• Hessian:

  G(u) = [ 20  0 ]
         [  0  2 ]

Iteration k = 1

  g^(1) = [ 2 ],   H^(1) = [ 1  0 ],   s^(1) = [ −2 ],   α^(1) = 0.0909
          [ 2 ]            [ 0  1 ]            [ −2 ]

Iteration k = 2

  ⇒ u^(2) = u^(1) + α^(1) s^(1) = [ 0.1 ] + 0.0909 [ −2 ] = [ −0.0818 ]
                                  [ 1   ]          [ −2 ]   [  0.8182 ]

  g^(2) = [ −1.6364 ]
          [  1.6364 ]


" # " # " #


$0:0818 0:1 $0:1818
ı .1/ D u.2/ $ u.1/ D D $
0:8182 1 0:1818
" # " # " #
$1:6364 2 $3:6363
! .1/ D g .2/ $ g .1/ D $ D
1:6364 2 $0:3636
" # " #" # " #
$0:1818 $3:6363 1 0
3:4545
v.1/ D ı .1/ $ H .1/! .1/ D $ D
0:1818 $:3636 0 1
0:1818
& .1/ ' & .1/ ' " #
.1/ .1/ .1/ .1/ T
ı $H ! ı $H ! 0:0550 $0:0497
H .2/ D H .1/ C & .1/ 'T D
ı $ H .1/! .1/ ! .1/ $0:0497 0:9974
" #" # " #
0:0550 $0:0497 $1:6364 0:1713
s.2/ D $H .2/g .2/ D D
$0:0497 0:9974 1:6364 $1:7135

˛ .2/ D 0:4775

Iteration k D 3
" # " # " #
$0:0818 0:1713 0
) u.3/ D u.2/ C ˛ .2/s.2/ D C .0:4775/ D
0:8182 $1:7135 0
" #
0
g .3/ D
0
" # " # " #
0 $0:0818 0:0818
ı .2/ D u.3/ $ u.2/ D $ D
0 0:8182 $0:8182
" # " # " #
0 $1:6364 1:6364
! .2/ D g .3/ $ g .2/ D $ D
0 1:6364 $1:6364

" #
$0:0895
v.2/ D ı .2/ $ H .2/! .2/ D
0:8953
" #
0:05 0
H .3/ D
0 0:5

• Note that the algorithm terminates with g* = 0 and H* = G⁻¹
• It can be proven that, under some mild conditions, the method terminates on a quadratic function in at most n + 1 steps, with H^(n+1) = G⁻¹
• Two other well-known quasi-Newton algorithms are:

  – DAVIDON–FLETCHER–POWELL (DFP):

    H_DFP^(k+1) = H + δδᵀ/(δᵀγ) − Hγγᵀ H/(γᵀHγ)

  – BROYDEN–FLETCHER–GOLDFARB–SHANNO (BFGS):

    H_BFGS^(k+1) = H + [1 + γᵀHγ/(δᵀγ)] δδᵀ/(δᵀγ) − (δγᵀH + Hγδᵀ)/(δᵀγ)

    ◦ The BFGS algorithm is perhaps the most widely used quasi-Newton numerical algorithm and works well with low-accuracy line searches
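For reference, the BFGS update of the inverse-Hessian approximation can be written as below (a sketch; the variable names mirror the formula above, and the curvature guard on δᵀγ is an added assumption).

import numpy as np

def bfgs_update(H, delta, gamma):
    """BFGS update of the inverse-Hessian approximation H from the differences delta and gamma."""
    dg = delta @ gamma
    if dg <= 1e-12:                  # the update is only well behaved when delta' gamma > 0
        return H
    Hg = H @ gamma
    term1 = (1.0 + (gamma @ Hg) / dg) * np.outer(delta, delta) / dg
    term2 = (np.outer(delta, Hg) + np.outer(Hg, delta)) / dg
    return H + term1 - term2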
