
Dynamic Programming and Linear Quadratic (LQ) Control (Discrete-Time and Continuous-Time Cases)

Dynamic programming

• "Principle of optimality" (Bellman 1957)
• Applies to linear, nonlinear, and time-varying systems.
• Well suited to computer implementation; can be used to solve the LQ problem analytically.

Books:
• Kirk (1998), "Optimal Control Theory"
• Bryson and Ho (1975), "Applied Optimal Control: Optimization, Estimation, and Control"
• Athans and Falb (1966), "Optimal Control: An Introduction to the Theory and Its Applications"
LQ problems (linear system, quadratic criterion)

Continuous time

Model:
$$\dot{x}(t) = A(t)\,x(t) + B(t)\,u(t), \qquad x(t_0) = x_0$$
$$y(t) = C(t)\,x(t) + D(t)\,u(t)$$

Criterion to be minimized:
$$\min_{u} \; J = \int_{t_0}^{t_f} \left[ x(t)^T Q(t)\,x(t) + u(t)^T R(t)\,u(t) \right] dt$$

Find u such that the criterion is minimized; that is the control law.
Discrete time

Model:
$$x(k+1) = A(k)\,x(k) + B(k)\,u(k), \qquad x(k_0) = x_0$$
$$y(k) = C(k)\,x(k) + D(k)\,u(k)$$

Criterion to be minimized:
$$\min_{u} \; J = \sum_{k=k_0}^{N} \left[ x(k)^T Q(k)\,x(k) + u(k)^T R(k)\,u(k) \right]$$
The final time of the optimization horizon, t_f or N, can be finite or infinite. The initial state x(t_0) or x(k_0) is given. The final state x(t_f) or x(N) can be fixed or free.

Note: The theory also covers time-varying systems and criteria (matrices such as A(k), Q(k) etc. can depend on time).

In the regulator problem the system is desired to follow a constant signal. The system equations can always be scaled such that the desired state is the origin of the state space.

In the servo problem (tracking problem) the system is desired to follow a changing reference signal.
Solution to the LQ problem

For a discrete process, dynamic programming gives recursive equations for the cost (the Riccati equation). For a continuous process, the counterpart is the Hamilton-Jacobi-Bellman equation. In both cases the result is a state feedback control law u(x(t), t) = -K(t)x(t) acting on the plant; when the full state x is not measured, a state estimator provides the estimate x_e used in the feedback.
Principle of Optimality (Bellman 1957)

"An optimal policy has the property that no matter what the previous decisions (i.e. controls) have been, the remaining decisions must constitute an optimal policy with regard to the state resulting from those previous decisions."

By applying this principle the number of candidates for the optimal solution can be reduced. The calculations proceed "backwards in time".

Ex. Routing problem
Discrete-time optimization problem

$$x_{k+1} = f_k(x_k, u_k) \qquad \text{(process)}$$
$$J_i(x_i) = \phi(N, x_N) + \sum_{k=i}^{N-1} L_k(x_k, u_k) \qquad \text{(criterion to be minimized)}$$

Note that when the final state is free, there can be an additional cost \(\phi(N, x_N)\) related to that state.
Use the principle of optimality: let the optimal control be calculated from time k+1 to N for all states x_{k+1} at time k+1, and consider what happens at time k. Applying the control u_k costs L_k(x_k, u_k) and moves the state to x_{k+1} = f_k(x_k, u_k), from which the optimal cost-to-go is J*_{k+1}(x_{k+1}). The solution is determined by the principle of optimality:

$$J_k^*(x_k) = \min_{u_k} \left[ L_k(x_k, u_k) + J_{k+1}^*(x_{k+1}) \right]$$

Find u_k such that the expression is minimized; this gives the optimal cost at time k.
An example tells more:

$$x_{k+1} = x_k + u_k \qquad \text{(process, plant)}$$
$$J_0 = x_N^2 + \frac{1}{2} \sum_{k=0}^{N-1} u_k^2 \qquad \text{(criterion to be minimized)}$$
$$u_k \in \{-1, -0.5, 0, 0.5, 1\} \qquad \text{(admissible (allowed) controls)}$$
$$x_k \in \{0, 0.5, 1.0, 1.5\}, \quad \text{i.e. } 0 \le x_k \le 1.5 \qquad \text{(admissible set of states)}$$
Some matrix theory

Definition: Let A be a symmetric matrix. It is positive definite if the scalar x^T A x > 0 for all non-zero vectors x (often written as A > 0).

Correspondingly, A is negative definite if x^T A x < 0 for all non-zero x (often written as A < 0).

A is positive semidefinite if x^T A x ≥ 0 (A ≥ 0), and negative semidefinite if x^T A x ≤ 0 (A ≤ 0).

Note that if the matrix A in x^T A x is not symmetric, it can be symmetrized by the identity (prove):

$$x^T A x = x^T \left( \frac{A + A^T}{2} \right) x$$

Hence it can always be assumed that the weight matrices in the cost criterion (Q and R) are symmetric.
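The proof is one line: a scalar equals its own transpose, so

$$x^T A x = \left( x^T A x \right)^T = x^T A^T x \quad\Rightarrow\quad x^T A x = \tfrac{1}{2}\left( x^T A x + x^T A^T x \right) = x^T \left( \frac{A + A^T}{2} \right) x.$$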

From linear algebra: the eigenvalues of a symmetric matrix are real. A symmetric matrix is positive definite if and only if all its eigenvalues are positive, and positive semidefinite if and only if all its eigenvalues are nonnegative.
Consider, for a vector x, the scalar f(x) = x^T A x and the vector Ax. It holds that

$$\frac{\partial f}{\partial x} = x^T \left( A + A^T \right), \qquad \frac{\partial (Ax)}{\partial x} = A$$

where the gradient of a scalar with respect to the components of x is taken to be a row vector.
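The first identity can be checked componentwise: since \( f(x) = \sum_{i,j} x_i A_{ij} x_j \),

$$\frac{\partial f}{\partial x_k} = \sum_j A_{kj} x_j + \sum_i x_i A_{ik} = (Ax)_k + (A^T x)_k,$$

which, collected into a row vector, gives \( \partial f / \partial x = x^T (A + A^T) \).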
Solution of the discrete-time LQ problem by using dynamic programming

$$x_{k+1} = A x_k + B u_k \qquad \text{(process)}$$
$$J = \frac{1}{2} x_N^T S_N x_N + \frac{1}{2} \sum_{k=i}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right) \qquad \text{(criterion)}$$
$$S_N \ge 0, \quad Q \ge 0, \quad R > 0 \qquad \text{(symmetric)}$$

x_i is given and x_N is free. Find u_k^* in the interval [i, N] minimizing the criterion.

$$J_N^* = \frac{1}{2} x_N^T S_N x_N, \quad k = N \qquad \text{(cost from the end state)}$$
Backwards in time to time instant N-1:

$$J_{N-1} = \frac{1}{2} x_{N-1}^T Q x_{N-1} + \frac{1}{2} u_{N-1}^T R u_{N-1} + \frac{1}{2} x_N^T S_N x_N$$
$$= \frac{1}{2} x_{N-1}^T Q x_{N-1} + \frac{1}{2} u_{N-1}^T R u_{N-1} + \frac{1}{2} \left( A x_{N-1} + B u_{N-1} \right)^T S_N \left( A x_{N-1} + B u_{N-1} \right)$$

Minimize:
$$\frac{\partial J_{N-1}}{\partial u_{N-1}} = 0 = R u_{N-1} + B^T S_N \left( A x_{N-1} + B u_{N-1} \right)$$
$$u_{N-1}^* = -\left( B^T S_N B + R \right)^{-1} B^T S_N A x_{N-1}$$
The solution can be presented in the form

$$u_{N-1}^* = -K_{N-1} x_{N-1}, \qquad K_{N-1} = \left( B^T S_N B + R \right)^{-1} B^T S_N A$$

Substituting into J_{N-1} gives the optimal cost

$$J_{N-1}^* = \frac{1}{2} x_{N-1}^T \left[ \left( A - B K_{N-1} \right)^T S_N \left( A - B K_{N-1} \right) + K_{N-1}^T R K_{N-1} + Q \right] x_{N-1}$$

Define
$$S_{N-1} = \left( A - B K_{N-1} \right)^T S_N \left( A - B K_{N-1} \right) + K_{N-1}^T R K_{N-1} + Q$$
so that
$$J_{N-1}^* = \frac{1}{2} x_{N-1}^T S_{N-1} x_{N-1}$$
Backwards to the time instant k = N-2:

$$J_{N-2} = \frac{1}{2} x_{N-2}^T Q x_{N-2} + \frac{1}{2} u_{N-2}^T R u_{N-2} + \frac{1}{2} x_{N-1}^T S_{N-1} x_{N-1}$$

Now determine u_{N-2}^*; the equations have the same form as above. We obtain the general solution

$$K_k = \left( B^T S_{k+1} B + R \right)^{-1} B^T S_{k+1} A$$
$$u_k^* = -K_k x_k$$
$$S_k = \left( A - B K_k \right)^T S_{k+1} \left( A - B K_k \right) + K_k^T R K_k + Q$$
$$J_k^* = \frac{1}{2} x_k^T S_k x_k$$
The equation for S_k (the Riccati equation) can also be written in the form

$$S_k = A_k^T \left[ S_{k+1} - S_{k+1} B_k \left( B_k^T S_{k+1} B_k + R_k \right)^{-1} B_k^T S_{k+1} \right] A_k + Q_k, \quad k < N, \quad S_N \text{ given}$$

Note that S_k and K_k are calculated "backwards in time". They can be calculated in advance and stored, to be used when control starts at time k_0. The procedure matches exactly the principle of optimality and the earlier graphical examples.

The result is quite beautiful, because it can be presented in closed form (state feedback). One may ask whether a similar procedure exists when the final state is fixed. Unfortunately the answer is no. Dynamic programming can also be used to form the solution in that case, but it cannot be expressed as compactly as in the previous case; the solution must be computed case by case.

The control law is time-varying, because the coefficient K_k changes with k. For infinite-horizon problems we can use the stationary solution of the Riccati equation,

$$S_k = S_{k+1} = S$$

which also gives a constant K. In Matlab the command dlqr does the job.
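A minimal sketch of both routes (assumptions: the system matrices below are illustrative only, the terminal weight is taken equal to Q, and dlqr requires the Control System Toolbox):

% Illustrative data (assumed, not from the text)
A = [1 0.1; 0 1];  B = [0; 0.1];
Q = eye(2);  R = 1;  N = 50;

% Finite horizon: backward Riccati recursion with stored gains
S = Q;                               % S_N (terminal weight, assumed = Q here)
K = cell(N, 1);
for k = N-1:-1:0
    Kk = (B'*S*B + R) \ (B'*S*A);    % K_k = (B'S_{k+1}B + R)^{-1} B'S_{k+1}A
    S  = (A - B*Kk)'*S*(A - B*Kk) + Kk'*R*Kk + Q;   % Riccati recursion for S_k
    K{k+1} = Kk;                     % stored gain (MATLAB indexing starts at 1)
end

% Infinite horizon: stationary solution directly
[Kinf, Sinf] = dlqr(A, B, Q, R);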
Continuous-time systems

1. Discretization, 2. Direct continuous-time solution

1. Discretization

$$\dot{x}(t) = f(x, u, t)$$
$$J(0) = \phi(x(T), T) + \int_0^T L(x(t), u(t), t)\, dt$$

Discretize both the model and the criterion with interval h:

$$\dot{x}(kh) \approx \left( x_{k+1} - x_k \right)/h \quad\Rightarrow\quad x_{k+1} \approx x_k + h f(x_k, u_k, kh)$$

By defining
$$f_k(x_k, u_k) = x_k + h f(x_k, u_k, kh)$$
the model attains the desired form
$$x_{k+1} = f_k(x_k, u_k)$$

Discretize the criterion function by writing

$$J(0) = \phi(x(T), T) + \sum_{k=0}^{N-1} \int_{kh}^{(k+1)h} L(x(t), u(t), t)\, dt, \qquad N = T/h$$

which can be approximated by

$$J(0) \approx \phi(x(T), T) + \sum_{k=0}^{N-1} h L(x_k, u_k, kh)$$
By defining the discrete functions

$$J_0 = J(0)$$
$$\phi_s(N, x_N) = \phi(x(Nh), Nh)$$
$$L_k(x_k, u_k) = h L(x_k, u_k, kh)$$

the criterion attains the desired form

$$J_0 = \phi_s(N, x_N) + \sum_{k=0}^{N-1} L_k(x_k, u_k)$$
If the system is linear, it is preferable to use the ZOH equivalent

$$x_{k+1} = A^S x_k + B^S u_k$$

in which the coefficient matrices are

$$A^S = e^{Ah}, \qquad B^S = \int_0^h e^{A\tau} B\, d\tau$$

The control is piecewise constant:

$$u(t) = u_k^*, \qquad kh \le t < (k+1)h$$
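In MATLAB the ZOH equivalent can be obtained with c2d from the Control System Toolbox; a minimal sketch with assumed matrices:

% Illustrative continuous-time system (assumed values)
A = [0 1; -2 -3];  B = [0; 1];  C = eye(2);  D = zeros(2, 1);
h = 0.1;                                % sampling interval

sysd = c2d(ss(A, B, C, D), h, 'zoh');   % ZOH discretization
AS = sysd.A;                            % A^S = expm(A*h)
BS = sysd.B;                            % B^S = integral of expm(A*tau)*B over [0, h]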
Taylor series of functions

One variable:
$$f(x+h) = f(x) + f'(x)h + \frac{1}{2!} f''(x) h^2 + \cdots$$
$$f(x+h) \approx f(x) + f'(x)h \qquad \text{(1st order approximation)}$$

Two variables:
$$f(x+h, y+k) = f(x, y) + f_x(x, y)h + f_y(x, y)k + \frac{1}{2!} \left[ h^2 f_{xx}(x, y) + 2hk f_{xy}(x, y) + k^2 f_{yy}(x, y) \right] + \cdots$$
$$f(x+h, y+k) \approx f(x, y) + f_x(x, y)h + f_y(x, y)k \qquad \text{(1st order approximation)}$$
2. Direct continuous-time solution

The Hamilton-Jacobi-Bellman equation

$$\dot{x} = g(x, u), \qquad x(0) = x_0$$
$$J = \int_0^T h(x, u)\, dt$$

Divide into two intervals and apply the principle of optimality:

$$J = \int_0^{\Delta} h(x, u)\, dt + \int_{\Delta}^{T} h(x, u)\, dt$$

Let f(x, T) denote the optimal cost when the state is x and the time left is T. Over a short first interval of length \(\Delta\) (using first-order Taylor approximations),

$$J = f(x, T) \approx h(x, u)\Delta + f(x + \Delta x,\; T - \Delta)$$
$$\approx h(x, u)\Delta + f(x + g(x, u)\Delta,\; T - \Delta)$$
$$\approx h(x, u)\Delta + f(x, T) + \frac{\partial f}{\partial x} g(x, u)\Delta - \frac{\partial f}{\partial T}\Delta$$

Minimizing both sides over the control and cancelling f(x, T) gives the Hamilton-Jacobi-Bellman equation

$$\frac{\partial f}{\partial T} = \min_u \left[ h(x, u) + \frac{\partial f}{\partial x} g(x, u) \right]$$

Ex. x (t )  u (t ), x(0)  x0

 
T
1
J   u 2 (t )  x 2 (t )  x 4 (t ) dt
0  
2

H-J-B: f 1 4 f 
 2
 min u  x  x  u 
2

T  2 x 
which is minimized by the control

1 f
u 
*

2 x
leading to the cost equation

f 1  f 
2
1
     x 2  x 4 , f ( x, 0)  0
T 4  x  2
But how to solve this?

Actually, an analytical solution exists in this case, but it is very complicated. In general, analytical solutions are not available and numerical solutions are needed.
Solution to the continuous-time LQ problem:

$$\dot{x} = Ax + Bu, \qquad t \ge t_0 \qquad \text{(process)}$$
$$J(t_0) = \frac{1}{2} x^T(t_f) S(t_f) x(t_f) + \frac{1}{2} \int_{t_0}^{t_f} \left( x^T Q x + u^T R u \right) dt \qquad \text{(criterion to be minimized)}$$
$$S(t_f) \ge 0, \quad Q \ge 0, \quad R > 0 \qquad \text{(symmetric weight matrices)}$$

H-J-B:
$$\frac{\partial f}{\partial T} = \min_{u} \left[ h(x, u) + \frac{\partial f}{\partial x} g(x, u) \right]$$
An educated guess: when using optimal control, the cost has the form

$$f(x, T) = \frac{1}{2} x^T S(T) x$$

Note: T is the "time left", T = t_f - t, and S is symmetric. Substituting the guess into the H-J-B equation:

$$\frac{1}{2} x^T \frac{\partial S}{\partial T} x = \min_u \left[ \frac{1}{2} x^T Q x + \frac{1}{2} u^T R u + \frac{1}{2} x^T \left( S + S^T \right) \left( Ax + Bu \right) \right]$$
$$= \min_u \left[ \frac{1}{2} x^T Q x + \frac{1}{2} u^T R u + x^T S A x + x^T S B u \right]$$

Setting the gradient with respect to u to zero:

$$\frac{\partial}{\partial u}: \quad \frac{1}{2} u^T \left( R + R^T \right) + x^T S B = u^T R + x^T S B = 0$$
$$\Rightarrow\quad Ru + B^T S x = 0 \quad\Rightarrow\quad u = -R^{-1} B^T S x = -Kx$$

Substitute the control into the H-J-B equation to obtain the optimal cost:

$$\frac{1}{2} x^T \frac{\partial S}{\partial T} x = \frac{1}{2} x^T Q x + \frac{1}{2} x^T S B R^{-1} R R^{-1} B^T S x + x^T S A x - x^T S B R^{-1} B^T S x$$
$$= \frac{1}{2} x^T \left( Q - S B R^{-1} B^T S + 2 S A \right) x$$

But (prove: x^T S A x is a scalar, hence equals its transpose x^T A^T S x, since S is symmetric)

$$x^T S A x = \frac{1}{2} x^T \left( S A + A^T S \right) x$$

so that

$$\frac{1}{2} x^T \frac{\partial S}{\partial T} x = \frac{1}{2} x^T \left( Q - S B R^{-1} B^T S + S A + A^T S \right) x$$
$$\frac{\partial S}{\partial T} = Q + A^T S + S A - S B R^{-1} B^T S, \qquad \text{boundary condition } S(T=0) = S(t_f)$$

Now the cost is quadratic, as desired. Note the direction of time: since T = t_f - t is the time left, \(\partial S / \partial T = -dS/dt\), and S is integrated backwards in time from t_f.

[Figure: trajectory of S versus time, computed backwards from t = t_f.]

$$u^* = -Kx, \qquad K = R^{-1} B^T S$$
$$-\dot{S}(t) = A^T S + S A - S B R^{-1} B^T S + Q$$

The Riccati equation

$$-\dot{S}(t) = A^T S + S A - S B R^{-1} B^T S + Q, \qquad t \le t_f, \quad \text{boundary condition } S(t_f)$$
$$K = R^{-1} B^T S$$
$$u = -Kx \qquad \text{(optimal control; note } K = K(t)\text{)}$$
$$J^*(t_0) = \frac{1}{2} x^T(t_0) S(t_0) x(t_0) \qquad \text{(optimal cost)}$$
If the optimization horizon is "long", K approaches a constant matrix, which is obtained by solving the stationary Riccati equation: set \(\dot{S}(t) = 0\) in the Riccati equation. The resulting algebraic equation is still nonlinear and difficult to solve by hand.

The Matlab function lqr calculates the stationary solution.
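A minimal sketch of both computations (assumptions: the matrices are illustrative only; ode45 is used for the backward integration; lqr requires the Control System Toolbox):

% Illustrative data (assumed, not from the text)
A = [0 1; -2 -3];  B = [0; 1];  Q = eye(2);  R = 1;
tf_ = 5;  Stf = zeros(2);                  % horizon and terminal weight S(tf)

% Finite horizon: integrate the Riccati ODE backwards from tf to t0 = 0
[t, s] = ode45(@(t, s) riccati_rhs(t, s, A, B, Q, R), [tf_ 0], Stf(:));
S0 = reshape(s(end, :), 2, 2);             % S(t0); optimal cost J* = 0.5*x0'*S0*x0

% Infinite horizon: stationary (algebraic) Riccati solution
[Kinf, Sinf, E] = lqr(A, B, Q, R);

function ds = riccati_rhs(~, s, A, B, Q, R)
    S  = reshape(s, 2, 2);                        % unpack the vectorized S
    dS = -(A'*S + S*A - S*B*(R\(B'*S)) + Q);      % dS/dt from the Riccati equation
    ds = dS(:);
end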
Summary:

Discrete-time case

$$x_{k+1} = A_k x_k + B_k u_k, \qquad k \ge i$$
$$J_i = \frac{1}{2} x_N^T S_N x_N + \frac{1}{2} \sum_{k=i}^{N-1} \left( x_k^T Q_k x_k + u_k^T R_k u_k \right)$$
$$S_N \ge 0, \quad Q_k \ge 0, \quad R_k > 0$$

Solution:
$$S_k = \left( A - B K_k \right)^T S_{k+1} \left( A - B K_k \right) + K_k^T R K_k + Q$$
$$K_k = \left( B_k^T S_{k+1} B_k + R_k \right)^{-1} B_k^T S_{k+1} A_k, \qquad k < N$$
$$u_k = -K_k x_k, \qquad k < N$$
$$J_i^* = \frac{1}{2} x_i^T S_i x_i$$
The Riccati equation can also be written in the form
$$S_k = A_k^T \left[ S_{k+1} - S_{k+1} B_k \left( B_k^T S_{k+1} B_k + R_k \right)^{-1} B_k^T S_{k+1} \right] A_k + Q_k, \quad k < N, \quad S_N \text{ given}$$
Continuous-time case:

$$\dot{x} = Ax + Bu, \qquad t \ge t_0$$
$$J(t_0) = \frac{1}{2} x^T(t_f) S(t_f) x(t_f) + \frac{1}{2} \int_{t_0}^{t_f} \left( x^T Q x + u^T R u \right) dt$$
$$S(t_f) \ge 0, \quad Q \ge 0, \quad R > 0$$

Note: The matrices can also be time-varying, A = A(t) etc., as previously in the discrete case.
Riccati equation:
$$-\dot{S}(t) = A^T S + S A - S B R^{-1} B^T S + Q, \qquad t \le t_f, \quad \text{boundary condition } S(t_f)$$
$$K = R^{-1} B^T S$$
$$u = -Kx$$
$$J^*(t_0) = \frac{1}{2} x^T(t_0) S(t_0) x(t_0)$$
It can be proven that if the system is reachable, then the solution of the LQ problem exists, is unique, and leads to an asymptotically stable closed-loop system (in both the discrete and continuous-time cases).

Actually, it is enough that the system is stabilizable (any uncontrollable modes are asymptotically stable). Reachability (no uncontrollable modes) implies stabilizability, of course.
But what about the servo problem? How do we get rid of the steady-state error?

$$\dot{x} = Ax + Bu$$
$$y = Cx$$

The optimal control, when the reference r is connected,
$$u = -Lx + r$$
leads to the closed-loop system
$$\dot{x} = (A - BL)x + Br$$
The corresponding transfer function is
$$Y(s) = \left[ C \left( sI - (A - BL) \right)^{-1} B \right] R(s)$$
but the static gain
$$-C (A - BL)^{-1} B$$
is not necessarily one. If the reference is a known constant, a suitable (static) precompensator can be used, which makes the gain from r to y equal to one.

But what if r varies? Solution: add integration to the system (controller), which removes the error.
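A sketch of such a precompensator (assumptions: A, B, C and the LQ gain L are given; dcgain and ss are from the Control System Toolbox):

% Static precompensator: scale r so that the DC gain from r to y is one
G0 = dcgain(ss(A - B*L, B, C, 0));    % closed-loop static gain -C*(A-B*L)^(-1)*B
Nbar = 1/G0;                          % precompensator gain
% control law: u = -L*x + Nbar*r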
How to add integration?

Take a new state variable x_{n+1} such that
$$\dot{x}_{n+1} = r - y = r - Cx$$
An augmented state-space realization is obtained:
$$\begin{bmatrix} \dot{x} \\ \dot{x}_{n+1} \end{bmatrix} = \begin{bmatrix} A & 0 \\ -C & 0 \end{bmatrix} \begin{bmatrix} x \\ x_{n+1} \end{bmatrix} + \begin{bmatrix} B \\ 0 \end{bmatrix} u + \begin{bmatrix} 0 \\ 1 \end{bmatrix} r$$

Apply state feedback to this:
$$u = -\begin{bmatrix} L & l_{n+1} \end{bmatrix} \begin{bmatrix} x \\ x_{n+1} \end{bmatrix} + r, \qquad l_{n+1} \text{ is a scalar}$$

The closed-loop system is then
$$\begin{bmatrix} \dot{x} \\ \dot{x}_{n+1} \end{bmatrix} = \begin{bmatrix} A - BL & -B l_{n+1} \\ -C & 0 \end{bmatrix} \begin{bmatrix} x \\ x_{n+1} \end{bmatrix} + \begin{bmatrix} B \\ 1 \end{bmatrix} r$$

When the state settles to a constant value, the derivative \(\dot{x}_{n+1} = r - y\) goes to zero; then the output follows the reference.

Note that this is a suboptimal solution.


Example.

Q=[1 0;0 1];
R=1;
[L,S,E]=lqr(A,B,Q,R);

L = 0.2361    0.5723
S = 1.5158    0.2361
    0.2361    0.5723
E = -0.7862 + 1.2720i
    -0.7862 - 1.2720i
[Figure: closed-loop state trajectories x1 and x2 versus time t (upper plot), and the control u versus time t (lower plot).]
Reference is constant; calculate the static gain:

[A1,B1,C1,D1]=linmod('intha2')
K=1/dcgain(A1,B1,C1,D1)
[Figure: closed-loop responses with the static precompensator; the output settles to the constant reference.]
Adding an integrator
C2=[1 0];
A2=[A zeros(2,1);-C2 0];
B2=[B;0];
Q2=eye(3);
R2=1;
[L,S,E]=lqr(A2,B2,Q2,R2);
[Figure: closed-loop responses with the integrator, for two different weightings; see the note below.]

In the lower figure the component x3 has been given more weight in the criterion.
