4 Linear-Quadratic Optimal Control
Linear quadratic optimization is a basic method for designing controllers for linear (and often nonlinear) dynamical systems and is frequently used in practice, for example in aerospace applications. Moreover, it has interpretations in terms of "classical control" notions, such as disturbance rejection and phase and gain margins (topics we will not cover; see [AM07] for a reference). In estimation, it leads to the Kalman filter, which we will encounter in chapter 5. In this chapter, however, we continue our investigation of problems where the full state of the system is observable, and describe the solution of the Linear Quadratic Regulator (LQR) problem.
Some references: [Ber07, section 4.1], slides from Stephen Boyd's EE363 class [Boya], and [Ath71, AM07] (mostly continuous-time).
4.1 Model
We consider in this chapter a system with linear dynamics
$$x_{k+1} = A_k x_k + B_k u_k + w_k,$$
where $x_k$ and $u_k$ are real vectors of dimension $n$ and $m$ respectively, the states and controls are unconstrained ($X_k = \mathbb{R}^n$, $U_k(x_k) = \mathbb{R}^m$ for all $k$), and the disturbances $w_k$ are independent random vectors (independent of $x_k$ and $u_k$), with a known probability distribution with zero mean and finite second moment matrix $W_k = E[w_k w_k^T]$. The matrices $A_k \in \mathbb{R}^{n \times n}$ and $B_k \in \mathbb{R}^{n \times m}$ are called the dynamics and input matrices respectively. Such a model sometimes comes from the discretization of a continuous-time system, as in example 1.1.2 of chapter 1, but sometimes the discrete-time nature of the model is more intrinsic, for example in production planning or inventory control problems.
We wish to minimize the quadratic (finite-horizon) cost
$$E\left[ \sum_{k=0}^{N-1} \left( x_k^T Q_k x_k + u_k^T R_k u_k \right) + x_N^T Q_N x_N \right], \qquad (4.1)$$
where the cost matrices satisfy $Q_k \succeq 0$ and $R_k \succ 0$.
4.2 Solution of the Linear Quadratic Regulator
We first recall a result on the partial minimization of a quadratic function. For symmetric matrices $A$ and $C$ with $C \succ 0$, consider
$$\min_u \; x^T A x + u^T C u + 2 x^T B u = \min_u \begin{bmatrix} x \\ u \end{bmatrix}^T \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \begin{bmatrix} x \\ u \end{bmatrix}, \qquad X = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}.$$
Setting the gradient with respect to $u$ to zero gives the minimizer $u^* = -C^{-1} B^T x$, and the minimum value is $x^T S x$, where $S = A - B C^{-1} B^T$ is the Schur complement of $C$ in $X$. If $X \succeq 0$, we know from convex analysis that the minimum value is a convex function of $x$, so $S \succeq 0$.
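To make this concrete, here is a minimal numerical check of this partial-minimization result, a sketch assuming $C \succ 0$; the random test matrices and the NumPy calls are illustrative choices, not part of the original development:

```python
import numpy as np

# Minimal check of partial minimization of a quadratic, assuming C > 0.
rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n)); A = A @ A.T               # symmetric A
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, m)); C = C @ C.T + np.eye(m)   # C > 0

S = A - B @ np.linalg.solve(C, B.T)      # Schur complement of C in X

x = rng.standard_normal(n)
u_star = -np.linalg.solve(C, B.T @ x)    # minimizer u* = -C^{-1} B^T x

f = lambda u: x @ A @ x + u @ C @ u + 2 * x @ B @ u
assert np.isclose(f(u_star), x @ S @ x)  # minimum value equals x^T S x
assert f(u_star) <= f(u_star + rng.standard_normal(m))  # perturbations cost more
```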
Solution using Backward Induction
In this section, we compute the optimal cost and optimal policy for the LQR
problem. The DP algorithm gives
$$J_N^*(x_N) = x_N^T Q_N x_N$$
$$J_k^*(x_k) = \min_{u_k} E_{w_k}\left[ x_k^T Q_k x_k + u_k^T R_k u_k + J_{k+1}^*(A_k x_k + B_k u_k + w_k) \right].$$
Similarly to chapter 3, we show the key property of the value function using backward induction. In this case, we show that $J_k^*(x_k) = x_k^T P_k x_k + q_k$, for some matrix $P_k \succeq 0$ and constant $q_k \geq 0$. That is, the cost function is convex quadratic in the state (with no linear terms). The induction hypothesis is true for $k = N$, with $P_N = Q_N$ and $q_N = 0$. Assuming it is true for index $k+1$, we show that it is then true for index $k$. In the DP recursion, we obtain
$$J_k^*(x_k) = \min_{u_k} E_{w_k}\left[ x_k^T Q_k x_k + u_k^T R_k u_k + (A_k x_k + B_k u_k + w_k)^T P_{k+1} (A_k x_k + B_k u_k + w_k) + q_{k+1} \right].$$
When evaluating the expectation, all terms that are linear in $w_k$ vanish because $w_k$ is zero mean. So we obtain
$$J_k^*(x_k) = \min_{u_k} \left[ x_k^T Q_k x_k + u_k^T R_k u_k + (A_k x_k + B_k u_k)^T P_{k+1} (A_k x_k + B_k u_k) \right] + E[w_k^T P_{k+1} w_k] + q_{k+1}.$$
We can rewrite the last expectation using the trace (it's useful to remember this trick for other optimization problems, although here it's not adding much):
$$E[w_k^T P_{k+1} w_k] = E[\operatorname{Tr}(P_{k+1} w_k w_k^T)] = \operatorname{Tr}(P_{k+1} W_k).$$
Now the minimization over $u_k$ in the first term corresponds to our result in the previous paragraph on the partial minimization of a quadratic function, with matrix
$$X = \begin{bmatrix} A_k^T P_{k+1} A_k + Q_k & A_k^T P_{k+1} B_k \\ B_k^T P_{k+1} A_k & B_k^T P_{k+1} B_k + R_k \end{bmatrix}.$$
Because of our assumption that $R_k \succ 0$, the matrix $B_k^T P_{k+1} B_k + R_k$ is positive definite. The solution is then a control law that is linear in the state,
$$u_k^* = -K_k x_k, \qquad K_k = (B_k^T P_{k+1} B_k + R_k)^{-1} B_k^T P_{k+1} A_k, \qquad (4.2)$$
and using the Schur complement result, we obtain directly that $J_k^*(x_k) = x_k^T P_k x_k + q_k$, with
$$P_k = A_k^T P_{k+1} A_k + Q_k - A_k^T P_{k+1} B_k (B_k^T P_{k+1} B_k + R_k)^{-1} B_k^T P_{k+1} A_k \qquad (4.3)$$
and $q_k = q_{k+1} + \operatorname{Tr}(P_{k+1} W_k)$. Note that for all $k$ we have $P_k \succeq 0$, which can be seen simply by observing that the cost $J_k^*(x_k)$ is by its initial definition lower bounded by $0$; hence $x_k^T P_k x_k = J_k^*(x_k) - q_k$ is bounded below over all $x_k$, which forces $P_k \succeq 0$.
Finally, we see that the optimal cost for the problem is
$$J_0^*(x_0) = x_0^T P_0 x_0 + \sum_{k=0}^{N-1} \operatorname{Tr}(P_{k+1} W_k) = \operatorname{Tr}(P_0 X_0) + \sum_{k=0}^{N-1} \operatorname{Tr}(P_{k+1} W_k), \qquad (4.4)$$
where $X_0 = x_0 x_0^T$.
Remark. Without the assumption $R_k \succ 0$, we could use the more general Schur complement result and replace inverses by pseudo-inverses under appropriate assumptions, following the remark in the previous paragraph.
This solution has a number of attractive properties which help explain its popularity. First, we automatically obtain a closed-loop feedback law (4.2) (as always with the DP approach), which has the convenient property of being linear. Hence we can automatically synthesize an optimal linear controller once the cost matrices have been specified. This control law depends on the availability of the gain matrices $K_k$, which can however be computed in advance and stored in the memory of the controller. This in turn requires computing the matrices $P_k$ using the backward recursion (4.3), called the (discrete-time) Riccati difference equation, initialized with $P_N = Q_N$.
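As an illustration of this offline computation, here is a minimal NumPy sketch of the backward Riccati recursion (4.3) together with the gains (4.2) and the noise term of the optimal cost (4.4). The function name lqr_backward and the time-invariant restriction ($A_k = A$, etc.) are our own simplifications:

```python
import numpy as np

def lqr_backward(A, B, Q, R, QN, W, N):
    """Backward Riccati recursion (4.3) and gains (4.2) for a
    time-invariant system; the time-varying case indexes inputs by k."""
    P = [None] * (N + 1)
    K = [None] * N
    P[N] = QN                                          # P_N = Q_N
    for k in range(N - 1, -1, -1):
        G = B.T @ P[k + 1] @ B + R                     # positive definite since R > 0
        K[k] = np.linalg.solve(G, B.T @ P[k + 1] @ A)  # u_k = -K_k x_k
        P[k] = A.T @ P[k + 1] @ A + Q - A.T @ P[k + 1] @ B @ K[k]
    noise_cost = sum(np.trace(P[k + 1] @ W) for k in range(N))  # sum Tr(P_{k+1} W)
    return K, P, noise_cost
```

The optimal cost (4.4) for an initial state x0 is then `x0 @ P[0] @ x0 + noise_cost`.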
Moreover, a remarkable property of the solution is that the gain matrices $K_k$ and the Riccati equation do not depend on the actual characteristics of the disturbances $w_k$, which only enter the total cost (4.4). The deterministic problem with no disturbance (variables $\{w_k\}$ deterministic and equal to their mean, $0$ in this case) has the exact same solution, with total cost simply equal to $\operatorname{Tr}(P_0 X_0)$. This special property, shared by most linear quadratic optimal control problems, is called the certainty equivalence principle: we could have designed the optimal controller even if we had assumed that the values of the disturbances were certain and fixed to their mean. Note that although this might be a positive characteristic from a computational point of view, it is not necessarily so from a modeling point of view. Indeed, a consequence of the certainty equivalence principle is that the controller does not change with the disturbance variability as long as the disturbance is zero mean, i.e., the LQR controller is risk-neutral. There is a more general class of problems (linear exponential quadratic [Jac73]) which allows us to take into account the second-order properties of the process noise while still admitting a linear solution for the control law. A brief introduction to this risk-sensitive control problem will be given in the problems.
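To illustrate this risk neutrality, here is a small closed-loop simulation sketch on a hypothetical two-state system (all numerical values are made up for illustration), reusing lqr_backward from the sketch above. The gains come out identical for a noiseless and a noisy run, since $W$ never enters the recursion; only the realized and expected costs change:

```python
# Certainty equivalence in action: W never enters the recursion, so the
# gains K_k are identical for every disturbance level sigma.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, QN = np.eye(2), np.array([[1.0]]), np.eye(2)
N, x0 = 50, np.array([1.0, 0.0])
for sigma in (0.0, 0.1):
    W = sigma**2 * np.eye(2)
    K, P, noise_cost = lqr_backward(A, B, Q, R, QN, W, N)
    rng = np.random.default_rng(1)
    x, cost = x0.copy(), 0.0
    for k in range(N):
        u = -K[k] @ x                     # same K[k] regardless of sigma
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + sigma * rng.standard_normal(2)
    cost += x @ QN @ x
    print(f"sigma={sigma}: realized cost {cost:.2f}, "
          f"expected cost {x0 @ P[0] @ x0 + noise_cost:.2f}")
```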
Finally, note that we could use this solution for the inventory control problem, for which the dynamics are linear, as long as we assume the cost to be quadratic. This was proposed in 1960 by Holt et al. [HMMS60]. However, according to Porteus [Por02, p. 112], "the weakness of this model lies in the difficulty of fitting its parameters to the cost functions observed in practice and in the fact that it must allow for redemption of excess stock as well as replenishment of low stock levels". Indeed, a quadratic cost does not allow for a fixed positive ordering cost, and reductions in the stock level are allowed.
Figure 4.1: Sample state and control trajectories for the problem (4.5) with
two different control costs. The realization of the noise trajectory is the same
in both cases.
Note here already that there is an apparent difficulty in defining the optimal control problem as $N \to \infty$, because the total cost (4.4) diverges, at least in the presence of noise. The appropriate modification in this case is to study the average cost
$$J^* = \lim_{N \to \infty} \frac{1}{N} E\left[ \sum_{k=0}^{N-1} \left( x_k^T Q_k x_k + u_k^T R_k u_k \right) \right]. \qquad (4.7)$$
The optimal average-cost value, obtained by using the steady-state controller $K$ described above, is then $\operatorname{Tr}(P W)$, and in particular it is independent of the initial condition! For practical applications, the optimal steady-state controller is often used even for finite-horizon problems, because it is simpler to compute and much easier to store in memory than the time-varying controller. Its good practical performance can be explained by the rapid convergence of the gains $K_k$ observed in Fig. 4.2.
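As a sketch of how the steady-state controller might be computed, one can simply iterate the Riccati recursion (4.3) to an approximate fixed point, assuming the iteration converges (e.g., under stabilizability of $(A, B)$). The function name and tolerance below are our own choices, and a production implementation would more likely call a dedicated solver such as scipy.linalg.solve_discrete_are:

```python
import numpy as np

def steady_state_lqr(A, B, Q, R, tol=1e-10, max_iter=100_000):
    """Iterate the Riccati recursion (4.3) until P stops changing;
    returns the constant gain K and the fixed point P."""
    P = Q.copy()
    for _ in range(max_iter):
        G = B.T @ P @ B + R
        P_next = A.T @ P @ A + Q \
                 - A.T @ P @ B @ np.linalg.solve(G, B.T @ P @ A)
        done = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if done:
            break
    K = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
    return K, P

# The optimal average cost is then Tr(P W) for disturbance second moment W:
# K, P = steady_state_lqr(A, B, Q, R); J_avg = np.trace(P @ W)
```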
Figure 4.2: Controller gains Kk for different values of the final state cost matrix.
Note in every case the rapid convergence as the horizon becomes more distant.