Dynamic Programming and Linear Quadratic (LQ) Control (Discrete-Time and Continuous Time Cases)
Books:
• Kirk (1998), "Optimal Control Theory"
• Bryson and Ho (1975), "Applied Optimal Control: Optimization, Estimation, and Control"
• Athans and Falb (1966), "Optimal Control: An Introduction to the Theory and Its Applications"
LQ problems (linear system, quadratic criterion)

Continuous time. Criterion to be minimized:
$$\min_u J = \int_0^{t_f} \left[ x(t)^T Q(t) x(t) + u(t)^T R(t) u(t) \right] dt$$

Discrete time. Criterion to be minimized:
$$\min_{u_k} J = \sum_{k=0}^{N} \left[ x_k^T Q_k x_k + u_k^T R_k u_k \right]$$
DYNAMIC PROGRAMMING

[Figure: block diagram of state feedback u = -K(t) xe, where the state estimate xe is produced by a state estimator from the plant state x]
Principle of Optimality
(Bellman 1957)
"An optimal policy has the property that no matter what the previous decisions (i.e. controls) have been, the remaining decisions must constitute an optimal policy with regard to the state resulting from those previous decisions."
Criterion to be minimized:
$$J_i(x_i) = \phi(N, x_N) + \sum_{k=i}^{N-1} L_k(x_k, u_k)$$

Determination of the solution by the principle of optimality:
$$J_k^*(x_k) = \min_{u_k} \left[ L_k(x_k, u_k) + J_{k+1}^*(x_{k+1}) \right]$$
Example. Process (plant):
$$x_{k+1} = x_k + u_k$$
Criterion (to be minimized):
$$J_0 = \frac{1}{2} x_N^2 + \frac{1}{2} \sum_{k=0}^{N-1} u_k^2$$
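The backward recursion for this scalar example can be carried out directly; a minimal Python sketch (the function name is illustrative, not from the lecture):

```python
# Dynamic-programming solution of the scalar example
#   x_{k+1} = x_k + u_k,  J = 1/2 x_N^2 + 1/2 sum_{k=0}^{N-1} u_k^2
# Working backwards, the cost-to-go has the form 1/2 S_k x^2 with S_N = 1.
def solve_scalar_lq(N):
    S = 1.0                        # S_N = 1 (terminal weight)
    gains = [0.0] * N
    for k in range(N - 1, -1, -1):
        gains[k] = S / (1.0 + S)   # optimal control u_k* = -K_k x_k
        S = S / (1.0 + S)          # cost-to-go update (here A = B = R = 1, Q = 0)
    return gains, S                # S is now S_0; optimal cost = 1/2 S_0 x_0^2

gains, S0 = solve_scalar_lq(4)     # for this problem S_0 = 1/(N+1)
```

The recursion shows the typical pattern: the gain shrinks the further we are from the terminal time, and the optimal cost is quadratic in the initial state.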
Note that if the matrix A is not symmetric in $x^T A x$, it can be symmetrized by the identity (prove)
$$x^T A x = x^T \frac{A + A^T}{2} x$$
Hence it can always be assumed that the weight matrices in the cost criterion (Q and R) are symmetric.
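The identity is easy to check numerically; a small sketch (numpy, arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))            # nonsymmetric by construction
x = rng.standard_normal(3)

sym = (A + A.T) / 2                        # symmetrized weight matrix
assert np.isclose(x @ A @ x, x @ sym @ x)  # x^T A x == x^T ((A+A^T)/2) x
```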
Process:
$$x_{k+1} = A x_k + B u_k$$
Criterion:
$$J = \frac{1}{2} x_N^T S_N x_N + \frac{1}{2} \sum_{k=i}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right)$$
$(S_N \ge 0,\; Q \ge 0,\; R > 0)$, all symmetric; $x_i$ given, $x_N$ free.
Find $u_k^*$ in the interval $[i, N]$ minimizing the criterion.
Cost from the end state:
$$J_N^* = \frac{1}{2} x_N^T S_N x_N, \quad k = N$$
Backwards in time to the time instant N-1:
$$J_{N-1} = \frac{1}{2} x_{N-1}^T Q x_{N-1} + \frac{1}{2} u_{N-1}^T R u_{N-1} + \frac{1}{2} x_N^T S_N x_N$$
$$= \frac{1}{2} x_{N-1}^T Q x_{N-1} + \frac{1}{2} u_{N-1}^T R u_{N-1} + \frac{1}{2} \left( A x_{N-1} + B u_{N-1} \right)^T S_N \left( A x_{N-1} + B u_{N-1} \right)$$
Minimize:
$$\frac{\partial J_{N-1}}{\partial u_{N-1}} = 0 = R u_{N-1} + B^T S_N \left( A x_{N-1} + B u_{N-1} \right)$$
$$u_{N-1}^* = -\left( B^T S_N B + R \right)^{-1} B^T S_N A x_{N-1}$$
The solution can be presented in the form
$$u_{N-1}^* = -K_{N-1} x_{N-1}, \quad K_{N-1} = \left( B^T S_N B + R \right)^{-1} B^T S_N A$$
$$J_{N-1}^* = \frac{1}{2} x_{N-1}^T \left[ \left( A - B K_{N-1} \right)^T S_N \left( A - B K_{N-1} \right) + K_{N-1}^T R K_{N-1} + Q \right] x_{N-1}$$
Define
$$S_{N-1} = \left( A - B K_{N-1} \right)^T S_N \left( A - B K_{N-1} \right) + K_{N-1}^T R K_{N-1} + Q$$
so that
$$J_{N-1}^* = \frac{1}{2} x_{N-1}^T S_{N-1} x_{N-1}$$
Backwards to the time instant k = N-2:
$$J_{N-2} = \frac{1}{2} x_{N-2}^T Q x_{N-2} + \frac{1}{2} u_{N-2}^T R u_{N-2} + \frac{1}{2} x_{N-1}^T S_{N-1} x_{N-1}$$
This has the same structure as the previous stage, so the same steps repeat at every stage, and in general
$$J_k^* = \frac{1}{2} x_k^T S_k x_k$$
The equation for $S_k$ (Riccati equation) can also be written in the form
$$S_k = A_k^T \left[ S_{k+1} - S_{k+1} B_k \left( B_k^T S_{k+1} B_k + R_k \right)^{-1} B_k^T S_{k+1} \right] A_k + Q_k, \quad k < N, \quad S_N \text{ given}$$
If the horizon is long, the solution approaches a stationary value $S_k = S_{k+1} = S$, which also gives a constant K. In Matlab the command dlqr does the job.
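In Python, the stationary solution can be computed with SciPy's `solve_discrete_are`, which plays the role of `dlqr` here (the system matrices below are illustrative, not from the lecture):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative sampled double-integrator (not from the lecture)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

S = solve_discrete_are(A, B, Q, R)                 # stationary Riccati solution
K = np.linalg.solve(B.T @ S @ B + R, B.T @ S @ A)  # K = (B'SB + R)^{-1} B'SA

# The closed loop x_{k+1} = (A - BK) x_k should be asymptotically stable
assert max(abs(np.linalg.eigvals(A - B @ K))) < 1.0
```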
Continuous-time systems
Two approaches: 1. Discretization, 2. Direct continuous-time solution

1. Discretization
$$\dot{x}(t) = f(x, u, t)$$
$$J(0) = \phi(x(T), T) + \int_0^T L(x(t), u(t), t) \, dt$$
Euler approximation:
$$\dot{x}(kh) \approx \left( x_{k+1} - x_k \right) / h \;\Rightarrow\; x_{k+1} = x_k + h f(x_k, u_k, kh)$$
By defining
$$f_k(x_k, u_k) = x_k + h f(x_k, u_k, kh)$$
$$J_0 = J(0)$$
$$\phi(N, x_N) = \phi(x(Nh), Nh)$$
$$L_k(x_k, u_k) = h L(x_k, u_k, kh)$$
the criterion becomes
$$J_0 = \phi(N, x_N) + \sum_{k=0}^{N-1} L_k(x_k, u_k)$$
If the system is linear, it is preferable to use the ZOH-equivalent
$$x_{k+1} = A_S x_k + B_S u_k$$
in which the coefficient matrices are
$$A_S = e^{Ah}, \quad B_S = \int_0^h e^{A\tau} B \, d\tau$$
and the control is piecewise constant:
$$u(t) = u_k^*, \quad kh \le t < (k+1)h$$
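One common way to compute both ZOH matrices at once is to exponentiate an augmented matrix, since the exponential of $M = \begin{bmatrix} A & B \\ 0 & 0 \end{bmatrix}$ contains $A_S$ and $B_S$ in its top blocks; a sketch with an illustrative double-integrator pair:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])      # illustrative continuous-time pair
B = np.array([[0.0],
              [1.0]])
h = 0.1                         # sampling interval

# expm of the augmented matrix [[A, B], [0, 0]] * h gives [A_S, B_S] on top
n, m = A.shape[0], B.shape[1]
M = np.zeros((n + m, n + m))
M[:n, :n] = A
M[:n, n:] = B
E = expm(M * h)
A_S, B_S = E[:n, :n], E[:n, n:]
```

For the double integrator this reproduces the familiar result $A_S = \begin{bmatrix} 1 & h \\ 0 & 1 \end{bmatrix}$, $B_S = \begin{bmatrix} h^2/2 \\ h \end{bmatrix}$.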
Taylor series of functions
One variable:
$$f(x+h) = f(x) + f'(x) h + \frac{1}{2!} f''(x) h^2 + \dots$$
$$f(x+h) \approx f(x) + f'(x) h \quad \text{(1st order approximation)}$$
Two variables:
$$f(x+h, y+k) = f(x,y) + f_x(x,y) h + f_y(x,y) k + \frac{1}{2!} \left[ h^2 f_{xx}(x,y) + 2hk f_{xy}(x,y) + k^2 f_{yy}(x,y) \right] + \dots$$
$$f(x+h, y+k) \approx f(x,y) + f_x(x,y) h + f_y(x,y) k \quad \text{(1st order approximation)}$$
2. Direct continuous-time solution
$$\dot{x} = g(x, u), \quad x(0) = x_0$$
$$J = \int_0^T h(x, u) \, dt$$
Divide into two intervals and apply the principle of optimality:
$$J = \int_0^{\Delta t} h(x, u) \, dt + \int_{\Delta t}^{T} h(x, u) \, dt$$
Let $f(x, T)$ denote the optimal cost when the state is $x$ and time $T$ remains. Then
$$f(x, T) = h(x, u) \Delta t + f(x + \Delta x, \, T - \Delta t)$$
$$= h(x, u) \Delta t + f(x + g(x, u) \Delta t, \, T - \Delta t)$$
$$\approx h(x, u) \Delta t + f(x, T) + \frac{\partial f}{\partial x} g(x, u) \Delta t - \frac{\partial f}{\partial T} \Delta t$$
Minimizing over the control and letting $\Delta t \to 0$ gives the Hamilton-Jacobi-Bellman equation
$$\frac{\partial f}{\partial T} = \min_u \left[ h(x, u) + \frac{\partial f}{\partial x} g(x, u) \right]$$
Ex. $\dot{x}(t) = u(t), \quad x(0) = x_0$
$$J = \int_0^T \left[ u^2(t) + x^2(t) + \frac{1}{2} x^4(t) \right] dt$$
H-J-B:
$$\frac{\partial f}{\partial T} = \min_u \left[ u^2 + x^2 + \frac{1}{2} x^4 + \frac{\partial f}{\partial x} u \right]$$
which is minimized by the control
$$u^* = -\frac{1}{2} \frac{\partial f}{\partial x}$$
leading to the cost equation
$$\frac{\partial f}{\partial T} = x^2 + \frac{1}{2} x^4 - \frac{1}{4} \left( \frac{\partial f}{\partial x} \right)^2, \quad f(x, 0) = 0$$
But how to solve this?
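In general only numerically. As a rough illustration, the cost equation above can be marched forward in $T$ with an explicit finite-difference scheme (grid and step sizes are illustrative; this is a crude sketch, not a production HJB solver):

```python
import numpy as np

# Explicit finite-difference march in T for the scalar HJB cost equation
#   df/dT = x^2 + x^4/2 - (1/4)(df/dx)^2,   f(x, 0) = 0
x = np.linspace(-1.0, 1.0, 201)
dx = x[1] - x[0]
dT = 1e-4                       # small step for stability of the explicit scheme
f = np.zeros_like(x)            # initial condition f(x, 0) = 0
for _ in range(1000):           # march out to time-to-go T = 0.1
    fx = np.gradient(f, dx)     # central-difference approximation of df/dx
    f = f + dT * (x**2 + 0.5 * x**4 - 0.25 * fx**2)
# the optimal control then follows pointwise as u*(x) = -(1/2) df/dx
```

Even this simple scheme shows why the LQ special case below is so valuable: there the guess $f = \frac{1}{2} x^T S x$ reduces the PDE to an ordinary differential equation for $S$.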
Process:
$$\dot{x} = A x + B u, \quad t \ge t_0$$
Criterion to be minimized:
$$J(t_0) = \frac{1}{2} x^T(t_f) S(t_f) x(t_f) + \frac{1}{2} \int_{t_0}^{t_f} \left( x^T Q x + u^T R u \right) dt$$
$S(t_f) \ge 0$, $Q \ge 0$, $R > 0$, symmetric weight matrices.
H-J-B:
$$\frac{\partial f}{\partial T} = \min_u \left[ h(x, u) + \frac{\partial f}{\partial x} g(x, u) \right]$$
An educated guess: when using optimal control, the cost has the form $f(x, T) = \frac{1}{2} x^T S x$, with $S$ symmetric and depending on the remaining time $T$. Then $\partial f / \partial x = x^T S$, and the expression to be minimized in the H-J-B equation becomes
$$\min_u \left[ \frac{1}{2} x^T Q x + \frac{1}{2} u^T R u + x^T S A x + x^T S B u \right]$$
Setting the derivative with respect to $u$ to zero:
$$\frac{\partial}{\partial u} (\cdot) = \frac{1}{2} u^T \left( R + R^T \right) + x^T S B = u^T R + x^T S B = 0$$
$$\Rightarrow \; R u + B^T S x = 0 \;\Rightarrow\; u = -R^{-1} B^T S x = -K x$$
But (prove)
$$x^T S A x = \frac{1}{2} x^T \left( S A + A^T S \right) x$$
so that substituting the minimizing control gives
$$\frac{1}{2} x^T \frac{\partial S}{\partial T} x = \frac{1}{2} x^T \left[ Q - S B R^{-1} B^T S + S A + A^T S \right] x$$
Since this holds for all $x$,
$$\frac{\partial S}{\partial T} = Q + A^T S + S A - S B R^{-1} B^T S, \quad \text{boundary condition } S(T = 0) = S(t_f)$$
[Figure: trajectory of S plotted against time; S is essentially constant far from the terminal time and changes only near it]

In real time $t$ (with $T = t_f - t$) the result reads
$$u^* = -K x, \quad K = R^{-1} B^T S(t)$$
$$-\dot{S}(t) = A^T S + S A - S B R^{-1} B^T S + Q, \quad S(t_f) \text{ given}$$
The Riccati equation. The optimal cost is
$$J^*(t_0) = \frac{1}{2} x^T(t_0) S(t_0) x(t_0)$$
If the optimization horizon is "long", K approaches a constant matrix, which is obtained by solving the stationary Riccati equation; that is, set $\dot{S}(t) = 0$ in the Riccati equation. But the resulting equation is still nonlinear and difficult to solve analytically.
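Numerical solvers for the stationary (algebraic) Riccati equation are standard; a Python sketch using SciPy (the matrices are illustrative, not from the lecture):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])     # illustrative double integrator
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

S = solve_continuous_are(A, B, Q, R)   # solves A'S + SA - S B R^{-1} B'S + Q = 0
K = np.linalg.solve(R, B.T @ S)        # K = R^{-1} B' S

# Closed-loop eigenvalues of A - BK should have negative real parts
assert max(np.linalg.eigvals(A - B @ K).real) < 0.0
```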
Discrete-time case
$$x_{k+1} = A_k x_k + B_k u_k, \quad k \ge i$$
$$J_i = \frac{1}{2} x_N^T S_N x_N + \frac{1}{2} \sum_{k=i}^{N-1} \left( x_k^T Q_k x_k + u_k^T R_k u_k \right)$$
$$S_N \ge 0, \quad Q_k \ge 0, \quad R_k > 0$$
Solution:
$$S_k = \left( A_k - B_k K_k \right)^T S_{k+1} \left( A_k - B_k K_k \right) + K_k^T R_k K_k + Q_k$$
$$K_k = \left( B_k^T S_{k+1} B_k + R_k \right)^{-1} B_k^T S_{k+1} A_k, \quad k < N$$
$$u_k = -K_k x_k, \quad k < N$$
$$J_i^* = \frac{1}{2} x_i^T S_i x_i$$
The Riccati equation can also be written in the form
$$S_k = A_k^T \left[ S_{k+1} - S_{k+1} B_k \left( B_k^T S_{k+1} B_k + R_k \right)^{-1} B_k^T S_{k+1} \right] A_k + Q_k, \quad k < N, \quad S_N \text{ given}$$
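The solution formulas above translate directly into code; a sketch in Python/numpy with constant matrices for brevity (the function name is illustrative):

```python
import numpy as np

def lq_backward(A, B, Q, R, S_N, N):
    """Finite-horizon discrete LQ: returns gains K_0..K_{N-1} and S_0."""
    S = S_N
    gains = [None] * N
    for k in range(N - 1, -1, -1):
        # K_k = (B' S_{k+1} B + R)^{-1} B' S_{k+1} A
        K = np.linalg.solve(B.T @ S @ B + R, B.T @ S @ A)
        # S_k = (A - B K)' S_{k+1} (A - B K) + K' R K + Q
        S = (A - B @ K).T @ S @ (A - B @ K) + K.T @ R @ K + Q
        gains[k] = K
    return gains, S                # S is S_0; optimal cost = 1/2 x_0' S_0 x_0
```

For a long horizon the early gains settle to the constant value given by the stationary Riccati equation, as noted above.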
Continuous-time case:
$$\dot{x} = A x + B u, \quad t \ge t_0$$
$$J(t_0) = \frac{1}{2} x^T(t_f) S(t_f) x(t_f) + \frac{1}{2} \int_{t_0}^{t_f} \left( x^T Q x + u^T R u \right) dt$$
$$S(t_f) \ge 0, \quad Q \ge 0, \quad R > 0$$
Solution:
$$-\dot{S}(t) = A^T S + S A - S B R^{-1} B^T S + Q, \quad S(t_f) \text{ given}$$
$$K = R^{-1} B^T S$$
$$u = -K x$$
$$J^*(t_0) = \frac{1}{2} x^T(t_0) S(t_0) x(t_0)$$
It can be proven that if the system is reachable, then the solution of the LQ problem exists, is unique, and leads to an asymptotically stable closed-loop system (in both the discrete- and continuous-time cases).
$$\dot{x} = A x + B u$$
$$y = C x$$
The state feedback with a reference input
$$u = -L x + r$$
leads to the closed-loop system
$$\dot{x} = (A - B L) x + B r$$
The corresponding transfer function is
$$Y(s) = C \left( s I - (A - B L) \right)^{-1} B \, R(s)$$
but the static gain
$$-C (A - B L)^{-1} B$$
is not necessarily one. If the reference is a known constant, a suitable (static) precompensator can be used, which makes the gain from r to y equal to one.
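The precompensator is simply the inverse of the closed-loop static gain; a small numerical sketch (the matrices below are illustrative, and `Acl` stands for the closed-loop matrix A - BL):

```python
import numpy as np

# Illustrative closed-loop data (not from the lecture); Acl = A - BL, stable
Acl = np.array([[0.0, 1.0],
                [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])

# Static gain from r to y: C (sI - Acl)^{-1} B evaluated at s = 0
g = (-C @ np.linalg.solve(Acl, B)).item()
N_r = 1.0 / g        # precompensator; u = -L x + N_r r gives unit static gain
```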
Define an additional state $x_{n+1}$ such that
$$\dot{x}_{n+1} = r - y = r - C x$$
An augmented state-space realization is obtained:
$$\begin{bmatrix} \dot{x} \\ \dot{x}_{n+1} \end{bmatrix} = \begin{bmatrix} A & 0 \\ -C & 0 \end{bmatrix} \begin{bmatrix} x \\ x_{n+1} \end{bmatrix} + \begin{bmatrix} B \\ 0 \end{bmatrix} u + \begin{bmatrix} 0 \\ 1 \end{bmatrix} r$$
Apply the state feedback to this:
$$u = -\begin{bmatrix} L & l_{n+1} \end{bmatrix} \begin{bmatrix} x \\ x_{n+1} \end{bmatrix}, \quad l_{n+1} \text{ is a scalar}$$
The closed-loop system becomes
$$\begin{bmatrix} \dot{x} \\ \dot{x}_{n+1} \end{bmatrix} = \begin{bmatrix} A - B L & -B l_{n+1} \\ -C & 0 \end{bmatrix} \begin{bmatrix} x \\ x_{n+1} \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} r$$
[Figure: simulated states x1, x2 and control u as functions of t over 0...10]
Reference is constant; calculate the static gain:
[A1,B1,C1,D1]=linmod('intha2')   % linearize the Simulink model
K=1/dcgain(A1,B1,C1,D1)          % precompensator = inverse of the static gain
[Figure: simulation results with the static precompensator, t over 0...10]
Adding an integrator
C2=[1 0];                      % integrate the error r - x1
A2=[A zeros(2,1);-C2 0];       % augmented system matrix
B2=[B;0];                      % augmented input matrix
Q2=eye(3);                     % state weight for the augmented state
R2=1;                          % control weight
[L,S,E]=lqr(A2,B2,Q2,R2);      % stationary LQ gain for the augmented system
[Figure: simulation results with integral action, t over 0...10]