Dynamic Programming
1 The principle of optimality
Figure 1.1: (a) Optimal path from a to e. (b) Two possible optimal paths from b to e (Kirk, 2004).
Suppose that the first decision (made at a) results in segment a-b with cost $J_{ab}$ and that the remaining decisions yield segment b-e at a cost $J_{be}$. The minimum cost $J^*_{ae}$ from a to e is therefore
$$J^*_{ae} = J_{ab} + J_{be}$$
Assert that b-e is the optimal path from b to e. If it were not, some other path from b to e (say b-c-e in Figure 1.1, with $J_{bce} < J_{be}$) would give the path a-b-c-e a cost $J_{ab} + J_{bce} < J^*_{ae}$; but this last relation can be satisfied only by violating the condition that a-b-e is the optimal path from a to e. Thus the assertion is proved.
Bellman has called the above property of an optimal policy the principle
of optimality:
An optimal policy has the property that whatever the initial state and
initial decision are, the remaining decisions must constitute an optimal
policy with regard to the state resulting from the first decision.
In other words, an optimal trajectory has the property that, at an intermediate point, no matter how it was reached, the rest of the trajectory must coincide with an optimal trajectory computed from this intermediate point taken as the initial point.
Bellman's principle of optimality serves to limit the number of potentially optimal control strategies that must be investigated.
Rutherford Aris restates the principle in more colloquial terms:
If you don't do the best with what you have happened to have got, you will never do the best with what you should have had.
The following example illustrates the procedure for making a single optimal decision with the aid of the principle of optimality.
Example 1.1, from (Weber, 2000).
Consider the problem in which a traveler wishes to minimize the length of a journey from town A to town J by first traveling to one of B, C or D, then onwards to one of E, F or G, then onwards to one of H or I, and finally to J. Thus there are 4 stages. The arcs are marked with the distances between towns. Let $F(X)$ be the minimal distance required to reach J from X. Then clearly:
$$F(J) = 0, \quad F(H) = 3, \quad F(I) = 4$$
[Figure: the road network from A to J, with the distance marked on each arc.]
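Since the figure with the arc lengths is not reproduced here, the following sketch shows how the backward recursion $F(X) = \min_Y \{d(X, Y) + F(Y)\}$ could be carried out in code. The distances in the `arcs` dictionary are illustrative placeholders (chosen only so that F(H) = 3 and F(I) = 4 agree with the values above), not the distances of the original figure.

```python
# Backward dynamic programming on a staged network: F[X] is the minimal
# distance from town X to the destination J.
# NOTE: the arc lengths below are hypothetical; the real ones are in the figure.
arcs = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},   # gives F(H) = 3, as in the text
    "I": {"J": 4},   # gives F(I) = 4, as in the text
}

F = {"J": 0}        # boundary condition: F(J) = 0
best_next = {}      # optimal decision made at each town

# Work backwards through the stages: towns nearer to J are evaluated first.
for stage in (["H", "I"], ["E", "F", "G"], ["B", "C", "D"], ["A"]):
    for town in stage:
        # Principle of optimality: only F of the successor matters,
        # not how the journey would continue after it.
        nxt, cost = min(((y, d + F[y]) for y, d in arcs[town].items()),
                        key=lambda pair: pair[1])
        F[town], best_next[town] = cost, nxt

# Recover the optimal route from A by following the stored decisions.
route, town = ["A"], "A"
while town != "J":
    town = best_next[town]
    route.append(town)
print(F["A"], " -> ".join(route))
```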
2 The recurrence relation of dynamic programming
Consider the process described by the state equation
$$\dot{x}(t) = a(x(t), u(t)) \qquad (2.1)$$
with the performance measure
$$J = h(x(t_f)) + \int_0^{t_f} g(x(t), u(t))\, dt \qquad (2.2)$$
Dividing the interval $[0, t_f]$ into $N$ samples of length $\Delta t$ and approximating the system by its discrete-time equivalent gives the state equation
$$x_{k+1} = f(x_k, u_k), \quad k = 0, 1, \ldots, N-1 \qquad (2.3)$$
and the performance measure
$$J = h(x_N) + \sum_{k=0}^{N-1} g(x_k, u_k) \qquad (2.4)$$
or
$$J = h(x_N) + \sum_{k=0}^{N-1} L(x_k, u_k) \qquad (2.5)$$
By making the problem discrete as we have done, it is now required that the optimal control law
$$u^*(x_0, 0),\; u^*(x_1, 1),\; \ldots,\; u^*(x_{N-1}, N-1)$$
be determined for the system given by (2.3) with the performance measure (2.5).
Begin by defining
$$J_{N,N}(x_N) = h(x_N)$$
$J_{N,N}$ is the cost of reaching the final state value $x_N$. Next define:
$$J_{N-1,N}(x_{N-1}, u_{N-1}) = L(x_{N-1}, u_{N-1}) + h(x_N) = L(x_{N-1}, u_{N-1}) + J_{N,N}(x_N)$$
which is the cost of operation during the interval $(N-1)\Delta t \le t \le N\Delta t$. Observe that $J_{N-1,N}$ is also the cost of a one-stage process with initial state $x_{N-1}$. The value of $J_{N-1,N}$ depends only on $x_{N-1}$ and $u_{N-1}$, since $x_N$ is related to $x_{N-1}$ and $u_{N-1}$ through the state equation (2.3). The optimal cost is then:
$$J^*_{N-1,N}(x_{N-1}) = \min_{u_{N-1}} \big( L(x_{N-1}, u_{N-1}) + J_{N,N}(f(x_{N-1}, u_{N-1})) \big)$$
[Figure: candidate controls applied at the state $x_{N-2}$ of a two-stage process and the resulting values of $x_{N-1}$, with the associated costs.]
The principle of optimality states that for this two-stage process, whatever the initial state $x_{N-2}$ and initial decision $u_{N-2}$, the remaining decision must be optimal with respect to the value of $x_{N-1}$ that results from applying $u_{N-2}$. Therefore:
$$J^*_{N-2,N}(x_{N-2}) = \min_{u_{N-2}} \big( L(x_{N-2}, u_{N-2}) + J^*_{N-1,N}(x_{N-1}) \big)$$
For a stage $k$, the performance measure over the last $N - k$ stages is defined as:
$$J_{k,N}(x_k, u_k, \ldots, u_{N-1}) = h(x_N) + \sum_{i=k}^{N-1} L(x_i, u_i)$$
By the principle of optimality, whatever $x_k$ and $u_k$ are, the remaining decisions must be optimal, with cost $J^*_{k+1,N}(x_{k+1})$; i.e. $u^*_k$ can be found from:
$$J^*_{k,N}(x_k) = \min_{u_k} \big( L(x_k, u_k) + J^*_{k+1,N}(x_{k+1}) \big) \qquad (2.6)$$
or
$$J^*_{k,N}(x_k) = \min_{u_k} \big( L(x_k, u_k) + J^*_{k+1,N}(f(x_k, u_k)) \big) \qquad (2.7)$$
To begin the process we simply start with a zero-stage process and generate $J^*_{N,N} = J_{N,N}$ (the $*$ notation is just a notational convenience here; no choice of a control is implied). Next the optimal cost can be found for a one-stage process by using $J^*_{N,N}$.
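In code, the recurrence (2.7) can be written directly as a recursion that bottoms out at the zero-stage cost $J_{N,N}(x_N) = h(x_N)$. The sketch below assumes a finite set of admissible controls and generic functions `f`, `L`, `h`; the names are mine, not taken from the text.

```python
# Direct implementation of the recurrence (2.7):
# J*_{k,N}(x) = min_u [ L(x, u) + J*_{k+1,N}(f(x, u)) ],  with J*_{N,N}(x) = h(x).

def optimal_cost(k, x, N, f, L, h, controls):
    """Return (J*_{k,N}(x), minimizing u_k) for a finite set of admissible controls."""
    if k == N:                      # zero-stage process: no control is chosen
        return h(x), None
    best_cost, best_u = float("inf"), None
    for u in controls:
        cost_to_go, _ = optimal_cost(k + 1, f(x, u), N, f, L, h, controls)
        cost = L(x, u) + cost_to_go
        if cost < best_cost:
            best_cost, best_u = cost, u
    return best_cost, best_u
```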
2.1

For the system described by (2.3) with the performance measure
$$J = h(x_N) + \sum_{k=0}^{N-1} L(x_k, u_k)$$
the recurrence relation of dynamic programming is
$$J^*_{k,N}(x_k) = \min_{u_k} \big( L(x_k, u_k) + J^*_{k+1,N}(x_{k+1}) \big) = \min_{u_k} \big( L(x_k, u_k) + J^*_{k+1,N}(f(x_k, u_k)) \big)$$
This can be used to solve recursively for the optimal control, starting from
$$J^*_{N,N}(x_N) = h(x_N)$$
The following algorithm may summarize these results:

Step 0: $J^*_{N,N}(x_N) = h(x_N)$
Step 1: $J^*_{N-1,N}(x_{N-1}) = \min_{u_{N-1}} \{ L(x_{N-1}, u_{N-1}) + J^*_{N,N}(f(x_{N-1}, u_{N-1})) \}$
...
Step N: $J^*_{0,N}(x_0) = \min_{u_0} \{ L(x_0, u_0) + J^*_{1,N}(f(x_0, u_0)) \}$
Consider the system described by
$$x_{k+1} = 2x_k - 3u_k$$
with initial state $x_0 = 4$ and $k = 0, 1, 2, \ldots, N-1$. Find the optimal control policy that minimizes the cost function:
$$J = (x_N - 10)^2 + \frac{1}{2} \sum_{k=0}^{N-1} \left( x_k^2 + u_k^2 \right)$$
For $N = 2$:
$$J = (x_2 - 10)^2 + \frac{1}{2} \sum_{k=0}^{1} \left( x_k^2 + u_k^2 \right)$$
so that
$$J_{22} = (x_2 - 10)^2$$
Then:
$$J^*_{12}(x_1) = \min_{u_1} J_{12}(x_1, u_1) = \min_{u_1} \left( \frac{1}{2}(x_1^2 + u_1^2) + J_{22}(x_2) \right)$$
$$= \min_{u_1} \left( \frac{1}{2}(x_1^2 + u_1^2) + (x_2 - 10)^2 \right) = \min_{u_1} \left( \frac{1}{2}(x_1^2 + u_1^2) + (2x_1 - 3u_1 - 10)^2 \right)$$
$u^*_1$ can be found from
$$\frac{\partial J_{12}(x_1, u_1)}{\partial u_1} = 0$$
$$\frac{\partial J_{12}(x_1, u_1)}{\partial u_1} = u_1 - 6(2x_1 - 3u_1 - 10) = 0$$
$$u^*_1 = \frac{12x_1 - 60}{19} \qquad (2.8)$$
and
$$J^*_{12}(x_1) = \frac{1}{2} \left( x_1^2 + \left( \frac{12x_1 - 60}{19} \right)^2 \right) + \left( 2x_1 - 3\,\frac{12x_1 - 60}{19} - 10 \right)^2 = \frac{27}{38} x_1^2 - \frac{40}{19} x_1 + \frac{100}{19}$$
$u^*_0$ can be found from
$$\frac{\partial J_{02}(x_0, u_0)}{\partial u_0} = 0$$
where
$$J_{02}(x_0, u_0) = \frac{1}{2}(x_0^2 + u_0^2) + \frac{27}{38}(2x_0 - 3u_0)^2 - \frac{40}{19}(2x_0 - 3u_0) + \frac{100}{19}$$
$$\frac{\partial J_{02}(x_0, u_0)}{\partial u_0} = u_0 - \frac{6 \cdot 27}{38}(2x_0 - 3u_0) + \frac{120}{19} = 0$$
$$u^*_0 = \frac{81}{131} x_0 - \frac{60}{131} = 0.618 x_0 - 0.458 \qquad (2.9)$$
$J^*_{02}(x_0) = J^*_{02}(4) = 13.97$ is the minimum value of the performance measure for transferring the system from $x_0 = 4$ to $x_2$. Using the control laws (2.9) and (2.8) with $x_0 = 4$:
$$u^*_0 = 0.618 \cdot 4 - 0.458 = 2.015, \qquad x_1 = 2x_0 - 3u^*_0 = 1.9541$$
$$u^*_1 = \frac{12x_1 - 60}{19} = -1.923, \qquad x_2 = 2x_1 - 3u^*_1 = 9.679$$
$$J = J^*_{02} = 13.97$$
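The backward sweep above can be checked symbolically; a minimal sketch (using sympy, and assuming the state equation $x_{k+1} = 2x_k - 3u_k$ used in the derivation) is given below. The function names are mine.

```python
# Symbolic backward sweep for the example: x_{k+1} = 2 x_k - 3 u_k,
# J = (x_2 - 10)^2 + (1/2) * sum_{k=0}^{1} (x_k^2 + u_k^2).
import sympy as sp

x0, x1, u0, u1 = sp.symbols("x0 x1 u0 u1")
f = lambda x, u: 2 * x - 3 * u                       # state equation
L = lambda x, u: sp.Rational(1, 2) * (x**2 + u**2)   # stage cost
h = lambda x: (x - 10) ** 2                          # terminal cost

# Stage 1: J_{1,2}(x1, u1) = L(x1, u1) + h(f(x1, u1)); minimize over u1.
J12 = L(x1, u1) + h(f(x1, u1))
u1_star = sp.solve(sp.diff(J12, u1), u1)[0]          # -> (12*x1 - 60)/19, cf. (2.8)
J12_star = sp.expand(J12.subs(u1, u1_star))          # -> 27/38*x1**2 - 40/19*x1 + 100/19

# Stage 0: J_{0,2}(x0, u0) = L(x0, u0) + J*_{1,2}(f(x0, u0)); minimize over u0.
J02 = L(x0, u0) + J12_star.subs(x1, f(x0, u0))
u0_star = sp.solve(sp.diff(J02, u0), u0)[0]          # -> (81*x0 - 60)/131, cf. (2.9)

print(u1_star, u0_star)
```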
There are two ways of using these results. With a known initial state, one can calculate in advance the entire control sequence $u_0, u_1, \ldots, u_{N-1}$ and apply it open loop to bring the plant from the initial state to the final state. Alternatively, one can use the dependence of the control value on the state at every discrete time, as calculated for example in (2.8) and (2.9); this is closed-loop control with a time-varying control law (depending on the sample number $k$). The closed-loop control is preferable because open-loop control is vulnerable to disturbances and to uncertainties in the model parameters.
Note also that this computational procedure is not trivial to apply for a large number of samples; it may result in very complicated calculations and is therefore applicable only in a limited number of situations.
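To illustrate why the feedback form is preferable, the sketch below applies both implementations to the example above when a hypothetical additive disturbance $w$ acts on the state equation at $k = 0$; the disturbance, its value and the function names are assumptions made only for this illustration.

```python
# Open-loop vs closed-loop use of the DP solution for the example above,
# with an (assumed) additive disturbance w acting at k = 0.
def f(x, u, w=0.0):
    return 2 * x - 3 * u + w          # state equation plus disturbance

def feedback(k, x):
    # Time-varying state feedback from (2.9) and (2.8).
    return 0.618 * x - 0.458 if k == 0 else (12 * x - 60) / 19

x0, w = 4.0, 0.5

# Open loop: the whole sequence is computed for the nominal model and applied blindly.
u0 = feedback(0, x0)
u1_nominal = feedback(1, f(x0, u0))   # uses the *predicted* x1, not the measured one
x2_open = f(f(x0, u0, w), u1_nominal)

# Closed loop: u_k is recomputed from the measured state at every sample.
x1 = f(x0, feedback(0, x0), w)
x2_closed = f(x1, feedback(1, x1))

print("x2 open loop:  ", round(x2_open, 3))
print("x2 closed loop:", round(x2_closed, 3))
```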
2.2 Exercises

1. $J = \sum_{k=0}^{1} \left( x_k^2 + 2u_k^2 \right)$

2. $J = \sum_{k=0}^{N-1} u_k^2$
2.3

2.3.1 Example

Consider the system described by
$$x_{k+1} = x_k + u_k \qquad (2.10)$$
Find the optimal control policy $u_0, u_1, \ldots, u_{N-1}$ that minimizes the performance measure:
$$J = x_N^2 + \frac{1}{2} \sum_{k=0}^{N-1} u_k^2 \qquad (2.11)$$
Let $N = 2$ and consider that $u_k$ is limited to the discrete values $(-1, -1/2, 0, 1/2, 1)$ and $x_k$ is limited to $(0, 1/2, 1, 3/2)$. The computational grid (i.e. all the admissible values for all time samples) for $u$ and $x$ is presented in Figure 2.3.
[Figure 2.3: the computational grid: admissible values of $x$ (0, 1/2, 1, 3/2) and of $u$ (-1, -1/2, 0, 1/2, 1) at each time sample $k$.]
For $N = 2$ the performance measure is
$$J = x_2^2 + \frac{1}{2} \sum_{k=0}^{1} u_k^2 \qquad (2.12)$$

$N = 2$: $\;J_{2,2}(x_2) = x_2^2$

x2      J_{2,2}(x_2) = x_2^2
0       0
1/2     1/4
1       1
3/2     9/4
$N = 1$: $\;J^*_{1,2}(x_1) = \min_{u_1} \left( J_{2,2}(x_2) + \frac{1}{2} u_1^2 \right)$

x1     u1     x2 = x1 + u1    J_{2,2}(x_2) + 1/2 u_1^2      optimum
0      -1     -1              not admissible
       -1/2   -1/2            not admissible
       0      0               0 + 1/2*0^2 = 0               u_1* = 0, J*_{1,2}(0) = 0
       1/2    1/2             1/4 + 1/2*(1/2)^2 = 3/8
       1      1               1 + 1/2*1^2 = 3/2
1/2    -1     -1/2            not admissible
       -1/2   0               0 + 1/2*(1/2)^2 = 1/8         u_1* = -1/2, J*_{1,2}(1/2) = 1/8
       0      1/2             1/4 + 1/2*0^2 = 1/4
       1/2    1               1 + 1/2*(1/2)^2 = 9/8
       1      3/2             9/4 + 1/2*1^2 = 11/4
1      -1     0               0 + 1/2*1^2 = 1/2
       -1/2   1/2             1/4 + 1/2*(1/2)^2 = 3/8       u_1* = -1/2, J*_{1,2}(1) = 3/8
       0      1               1 + 1/2*0^2 = 1
       1/2    3/2             9/4 + 1/2*(1/2)^2 = 19/8
       1      2               not admissible
3/2    -1     1/2             1/4 + 1/2*1^2 = 3/4           u_1* = -1, J*_{1,2}(3/2) = 3/4
       -1/2   1               1 + 1/2*(1/2)^2 = 9/8
       0      3/2             9/4 + 1/2*0^2 = 9/4
       1/2    2               not admissible
       1      5/2             not admissible
$N = 0$: $\;J^*_{0,2}(x_0) = \min_{u_0} \left( J^*_{1,2}(x_1) + \frac{1}{2} u_0^2 \right)$

x0     u0     x1 = x0 + u0    J*_{1,2}(x_1) + 1/2 u_0^2      optimum
0      -1     -1              not admissible
       -1/2   -1/2            not admissible
       0      0               0 + 1/2*0^2 = 0                u_0* = 0, J*_{0,2}(0) = 0
       1/2    1/2             1/8 + 1/2*(1/2)^2 = 1/4
       1      1               3/8 + 1/2*1^2 = 7/8
1/2    -1     -1/2            not admissible
       -1/2   0               0 + 1/2*(1/2)^2 = 1/8          u_0* = -1/2 or u_0* = 0, J*_{0,2}(1/2) = 1/8
       0      1/2             1/8 + 1/2*0^2 = 1/8
       1/2    1               3/8 + 1/2*(1/2)^2 = 1/2
       1      3/2             3/4 + 1/2*1^2 = 5/4
1      -1     0               0 + 1/2*1^2 = 1/2
       -1/2   1/2             1/8 + 1/2*(1/2)^2 = 1/4        u_0* = -1/2, J*_{0,2}(1) = 1/4
       0      1               3/8 + 1/2*0^2 = 3/8
       1/2    3/2             3/4 + 1/2*(1/2)^2 = 7/8
       1      2               not admissible
3/2    -1     1/2             1/8 + 1/2*1^2 = 5/8
       -1/2   1               3/8 + 1/2*(1/2)^2 = 1/2        u_0* = -1/2, J*_{0,2}(3/2) = 1/2
       0      3/2             3/4 + 1/2*0^2 = 3/4
       1/2    2               not admissible
       1      5/2             not admissible
If the initial state is $x_0 = 1$, the last table indicates the value of the optimal cost from $x_0 = 1$ to the final state $x_2$:
$$J^*_{0,2}(1) = \frac{1}{4}$$
and the optimal control value found at the last step of the recursion ($N = 0$), $u^*_0 = -1/2$, transfers the system from state $x_0 = 1$ to the state $x_1 = x_0 + u_0 = 1/2$. The table calculated for $N = 1$ gives the optimal control $u^*_1 = -1/2$, which transfers the system from $x_1 = 1/2$ to $x_2 = x_1 + u_1 = 1/2 + (-1/2) = 0$.
Thus, the optimal control sequence is:
$$u^*: \; -\frac{1}{2}, \; -\frac{1}{2}$$
and the optimal state trajectory starting from 1:
$$x: \; 1, \; \frac{1}{2}, \; 0$$
The minimum cost is:
$$J^*_{0,2} = \frac{1}{4}$$
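The backward tabulation above is easy to mechanize. The following sketch (plain Python with exact fractions; the variable names are mine) reproduces the optimal costs and the optimal sequence found in the tables.

```python
# Tabular dynamic programming for the example: x_{k+1} = x_k + u_k,
# J = x_2^2 + (1/2)(u_0^2 + u_1^2), with quantized states and controls.
from fractions import Fraction as F

X = [F(0), F(1, 2), F(1), F(3, 2)]            # admissible state values
U = [F(-1), F(-1, 2), F(0), F(1, 2), F(1)]    # admissible control values
N = 2

J = {x: x**2 for x in X}                      # J*_{2,2}(x_2) = x_2^2
policy = []                                   # policy[k][x]: optimal u_k at state x

for k in reversed(range(N)):                  # stages k = 1, then k = 0
    Jk, uk = {}, {}
    for x in X:
        best = None
        for u in U:
            x_next = x + u
            if x_next not in J:               # next state outside the grid: not admissible
                continue
            cost = J[x_next] + F(1, 2) * u**2
            if best is None or cost < best[0]:
                best = (cost, u)
        Jk[x], uk[x] = best
    J, policy = Jk, [uk] + policy

print(J)        # J*_{0,2}: {0: 0, 1/2: 1/8, 1: 1/4, 3/2: 1/2}

# Optimal sequence starting from x_0 = 1: u* = -1/2, -1/2 and x = 1, 1/2, 0.
x = F(1)
for k in range(N):
    u = policy[k][x]
    print(k, x, u)
    x = x + u
```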
2.3.2 Interpolation
In the preceding example, all of the trial control values drive the state of
the system either to a computational grid point or to a value outside the
allowable range.
If the numerical values were not carefully selected, this happy situation
would not have been obtained and interpolation would have been required.
[Figure: linear interpolation of the cost $J$ between the grid points $(x_1, J_1)$ and $(x_2, J_2)$.]

$$m = \frac{J_1 - J_2}{x_1 - x_2}, \qquad n = \frac{x_1 J_2 - x_2 J_1}{x_1 - x_2}$$

Then calculate $J_i$ for a state value $x_i$ (which is not on the computational grid) from
$$J_i = m x_i + n$$
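As a small illustration (the function name and the sample values are mine), the interpolation above can be written as:

```python
def interpolate_cost(x_i, x1, J1, x2, J2):
    """Linearly interpolate the cost at x_i between grid points (x1, J1) and (x2, J2)."""
    m = (J1 - J2) / (x1 - x2)
    n = (x1 * J2 - x2 * J1) / (x1 - x2)
    return m * x_i + n

# e.g. with costs 1/8 at x = 1/2 and 3/8 at x = 1 (values from the previous example):
print(interpolate_cost(0.75, 0.5, 0.125, 1.0, 0.375))   # -> 0.25
```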
2.3.3 The algorithm

A general algorithm for discrete dynamic programming, for a first-order system, may be formulated as in Algorithm 1. It may be extended to higher-order systems in a similar way (for details see (Kirk, 2004)).
[Algorithm 1: discrete dynamic programming for a first-order system with performance measure $J = h(x_N) + \sum_{k=0}^{N-1} L(x_k, u_k)$, looping at each stage over the state grid points $i = 1, \ldots, nrx$ and the control grid points $j = 1, \ldots, nru$.]
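Since the listing of Algorithm 1 is not reproduced here, the sketch below shows one way such an algorithm could look for a first-order system, with a state grid, a control grid and linear interpolation of the stored costs; all names and the demonstration data are assumptions, not taken from the original listing.

```python
# Discrete dynamic programming for a first-order system x_{k+1} = f(x_k, u_k),
# J = h(x_N) + sum_k L(x_k, u_k), with quantized states/controls and linear
# interpolation of J* between state grid points.
import bisect

def dp_first_order(f, L, h, x_grid, u_grid, N):
    x_grid = sorted(x_grid)
    J = [h(x) for x in x_grid]                # J*_{N,N} on the state grid
    policy = []                               # policy[k][i]: best u at x_grid[i]

    def interp(x):
        # Linearly interpolate J* at x; states outside the grid are inadmissible.
        if x < x_grid[0] or x > x_grid[-1]:
            return None
        j = bisect.bisect_left(x_grid, x)
        if x_grid[j] == x:
            return J[j]
        x1, x2, J1, J2 = x_grid[j - 1], x_grid[j], J[j - 1], J[j]
        return (J1 - J2) / (x1 - x2) * x + (x1 * J2 - x2 * J1) / (x1 - x2)

    for k in reversed(range(N)):              # backward over the time samples
        Jk, uk = [], []
        for x in x_grid:                      # i = 1, ..., nrx
            best_cost, best_u = float("inf"), None
            for u in u_grid:                  # j = 1, ..., nru
                J_next = interp(f(x, u))
                if J_next is None:
                    continue
                cost = L(x, u) + J_next
                if cost < best_cost:
                    best_cost, best_u = cost, u
            Jk.append(best_cost)
            uk.append(best_u)
        J = Jk
        policy.insert(0, uk)
    return J, policy

# Example call with the data of section 2.3.1:
J0, pol = dp_first_order(
    f=lambda x, u: x + u,
    L=lambda x, u: 0.5 * u**2,
    h=lambda x: x**2,
    x_grid=[0, 0.5, 1, 1.5],
    u_grid=[-1, -0.5, 0, 0.5, 1],
    N=2,
)
print(J0)   # optimal costs J*_{0,2} at the grid points 0, 0.5, 1, 1.5
```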
Bibliography

Kirk, D. E. (2004). Optimal Control Theory: An Introduction. Dover Publications.

Weber, R. (2000). Optimization and control. Online lecture notes.

Wen, J. T. (2002). Optimal control. Online at http://cats-fs.rpi.edu/~wenj/ECSE644S12/info.htm.