Performing Aggressive Maneuvers Using Iterative Learning Control
Abstract: This paper presents an algorithm to iteratively drive a system quickly from one state to another. A simple model which captures the essential features of the system is used to compute the reference trajectory as the solution of an optimal control problem. Based on a lifted domain description of that same model an iterative learning controller is synthesized by solving a linear least-squares problem. The non-causality of the approach makes it possible to anticipate recurring disturbances. Computational requirements are modest, allowing controller update in real-time. The experience gained from successful maneuvers can be used to significantly reduce transients when performing similar motions. The algorithm is successfully applied to a real quadrotor unmanned aerial vehicle. The results are presented and discussed.

I. INTRODUCTION

With the increasing popularity of autonomous systems there arises a need to take advantage of their full capabilities. One approach to increase the performance is to identify the system well and apply advanced control methods. However, this possibly involves extensive system identification efforts. A different paradigm is to put the complexity in the software. A relatively simple model in conjunction with an adaptive algorithm and a well-chosen set of sensors allows each vehicle to experimentally determine how to perform a difficult maneuver and to compensate for individual differences in the system dynamics.

One data-based approach is called iterative learning control (ILC). The idea behind ILC is that the performance of a system executing the same kind of motion repeatedly can be improved by learning from previous executions. Given a desired output signal, ILC algorithms experimentally determine an open-loop input signal which approximately inverts the system dynamics and yields the desired output. Bristow et al. [1] provide a survey of different design techniques for ILC.

Chen and Moore [2] present an approach based on local symmetrical double-integration of the feedback signal and apply it to a simulated omnidirectional ground vehicle. Ghosh and Paden [3] show an approach based on approximate inversion of the system dynamics. Chin et al. [4] merge a model predictive controller [5] with an ILC [6]. The real-time feedback component of this approach is intended to reject non-repetitive noise while the ILC adjusts to the repetitive disturbance. Cho et al. [7] put this approach in a state-space framework.

Non-causal control laws allow ILC to preemptively compensate for disturbances or model uncertainties which are constant from trial to trial. Formulating the problem and controller in the lifted domain is a natural way of exploiting the repetitive nature of the experiments, made viable by advancements in computer processors and memory. Rice and Verhaegen [8] present a structured unified approach to ILC synthesis based on the lifted state-space description of the plant/controller system.

In practice ILC has been applied to repetitive tasks performed by stationary systems, such as wafer stages [9], chemical reactors [10], or industrial robots [11]. Applications to autonomous vehicles are rarer.

This paper presents a lifted domain ILC algorithm which enables a system to perform an aggressive motion, i.e. drive the system from one state to another. Aggressive in this context characterizes a maneuver that takes place in the nonlinear regime of the system and/or close to the state or input constraints. This maneuver would be hard to tune by hand or would require very accurate knowledge of the underlying system. Instead, the featured algorithm only requires a comparatively simple model and initial guess for the input. In case of an unstable system it is assumed that a stabilizing controller is available.

The algorithm is intended for autonomous vehicles. However, it can be applied to a wide range of different dynamic systems without change. The controller update can be executed online with modest computational resources. If a particular maneuver is performed satisfactorily the gained knowledge can be utilized to perform a maneuver which is similar to the one just learned. This can reduce transients and improve convergence.

The first step of the algorithm is the computation of the reference trajectory and input by solving an optimal control problem. The nonlinear model is then linearized about the reference and discretized, resulting in a discrete time linear time-varying (LTV) system. The lifted description of this LTV system defines the input-output relationship of the system for one complete experimental run in form of a single matrix. After performing an experiment the results are stored and compared with the ideal trajectory, yielding an error vector. Solving a linear least-squares (LLS) problem based on the lifted LTV system and the error vector yields the change in the input signal for the next trial. The algorithm terminates
A. Generation of a reference trajectory

The reference trajectory $q_d(t)$ is the solution of an optimal control problem (OCP): compute the minimum time solution

$$(q_d(t),\ u_{d,temp}(t)), \qquad u_{d,temp}(t) = [f_{0,d}(t) \;\; f_{1,d}(t)]^T \tag{8}$$

which drives the system (1) from the initial state $q(0) = q_0$ to the final state $q(t_f) = q_f$, subject to constraints on the control effort

$$|f_i(t)| \le f_{i,max}, \qquad |\dot f_i(t)| \le \dot f_{i,max} \tag{9}$$

with the maximum thrust $f_{i,max}$ and maximum rate of change of thrust $\dot f_{i,max}$ dependent on the used motor/battery/propeller combination. For the purpose of this paper the OCP is solved using RIOTS [12], an optimal control toolbox written in Matlab and C. The above formulation of the OCP has been chosen for simplicity in the expressions of the constraints, which benefits the numerical solution process of the OCP. However, in subsequent parts of the algorithm the inputs according to system (5) are being used. Therefore the optimal inputs $u_{d,temp}(t)$ have to be transformed to $u_d(t) = [f_{a,d}(t) \;\; \theta_{c,d}(t)]^T$, such that $u_d(t)$ applied to (5) yields $q_d(t)$. Substituting (3) into the third line of (1) yields

$$\frac{1}{j}(f_0 l_0 - f_1 l_1) = k_{\dot\theta}\,\dot\theta + k_\theta(\theta - \theta_c) \tag{10}$$

which can be solved for $\theta_c$. Together with (4) this yields

$$f_{a,d}(t) = f_0(t) + f_1(t) \tag{11}$$

$$\theta_{c,d}(t) = \frac{k_{\dot\theta}\,\dot\theta_d(t) + k_\theta\,\theta_d(t) - \frac{1}{j}\left(f_0(t) l_0 - f_1(t) l_1\right)}{k_\theta} \tag{12}$$

Note that $k_\theta$ is not equal to zero if the control loop around $\theta$ has a proportional term, which is the case for the application at hand.

B. Tracking the reference trajectory

One basic assumption of the algorithm is that the motion of the vehicle stays close to the generated reference trajectory $q_d(t)$. Linearizing (5) about this trajectory and input yields

$$\dot{\tilde q}(t) = \left.\frac{\partial f}{\partial q}\right|_{q_d,u_d} \tilde q(t) + \left.\frac{\partial f}{\partial u}\right|_{q_d,u_d} \tilde u(t) \tag{13}$$

$$\phantom{\dot{\tilde q}(t)} = A(t)\tilde q(t) + B(t)\tilde u(t) \tag{14}$$

with $q = q_d + \tilde q$ and $u = u_d + \tilde u$. Converting to a discrete time system results in a linear time-varying system

$$\tilde q(k+1) = A_D(k)\tilde q(k) + B_D(k)\tilde u(k) \tag{15}$$

with $k$ denoting a discrete time step and $N$ being the trial length. The dynamics (15) of a complete trial are written in the lifted domain to exploit the repetitiveness of the experiments and synthesize a non-causal controller

$$\tilde Q = P\tilde U \tag{16}$$

$$\tilde Q = \begin{bmatrix} \tilde q(1) \\ \tilde q(2) \\ \vdots \\ \tilde q(N) \end{bmatrix}, \qquad \tilde U = \begin{bmatrix} \tilde u(0) \\ \tilde u(1) \\ \vdots \\ \tilde u(N-1) \end{bmatrix} \tag{17}$$

$$P = \begin{bmatrix} B_D(0) & 0 & \cdots & 0 \\ A_D(1)B_D(0) & B_D(1) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \Phi B_D(0) & \cdots & \cdots & B_D(N-1) \end{bmatrix} \tag{18}$$

$$\Phi = A_D(N-1)A_D(N-2)\cdots A_D(1) \tag{19}$$

where the capital letters $\tilde Q$ and $\tilde U$ indicate lifted versions of $\tilde q$ and $\tilde u$ while $P$ denotes the matrix containing the lifted dynamics. A widely used [1] ILC approach is

$$U_{j+1} = L_1[U_j + L_2 E_j] \tag{20}$$

where the index $j$ denotes the trial, $L_1$ and $L_2$ denote two filter functions (matrices), and $E_j$ is the lifted error signal

$$E_j = Q_d - Q_{j,m}, \qquad Q_{j,m} = Q_j + \text{noise} \tag{21}$$

with $Q_{j,m}$ representing noisy measurements. The presented ILC takes advantage of the given model which captures the essential dynamics of the underlying system. The exact formulation of the update law depends on the assumptions made about the noise. In case that the noise $d(k)$ does not change from trial to trial the system takes the form

$$q_{j,m}(k) = q_d(k) + \tilde q_j(k) + d(k) \tag{22}$$

$$Q_{j,m} = Q_d + \tilde Q_j + D \tag{23}$$

with $D$ being the lifted constant disturbance vector, representing modeling errors or repeatable process noise for example. Using (16) and (21) it follows that

$$Q_{j,m} = Q_d + P\tilde U_j + D \tag{24}$$

$$E_j = Q_d - Q_{j,m} = -P\tilde U_j - D \tag{25}$$

Performing a single experiment or trial with $\tilde U_0 = 0$ yields an error signal of $E_0 = -D$. The input which minimizes the square of the error signal is the solution of a linear least-squares problem:

$$\tilde U_1 = \arg\min_{\tilde U} \|E\|_2^2 = \arg\min_{\tilde U} \|P\tilde U + D\|_2^2 \tag{26}$$

$$\phantom{\tilde U_1} = -P^\dagger D \tag{27}$$

$$\phantom{\tilde U_1} = \tilde U_0 + P^\dagger E_0 \tag{28}$$

$$\phantom{\tilde U_1} = \tilde U_0 + L_2 E_0 \tag{29}$$

where $P^\dagger$ indicates the pseudo-inverse

$$P^\dagger = \lim_{\epsilon \to 0} (P^T P + \epsilon I)^{-1} P^T, \qquad \epsilon > 0 \tag{30}$$

which can be computed by well-established methods such as singular value decomposition (SVD). The input $\tilde U_1$, applied to the same system (24), will result in an error of

$$E_1 = -(I - PP^\dagger)D \tag{31}$$
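As a hedged illustration of the lifted description (15)-(19) and the least-squares update (25)-(31), the following minimal NumPy sketch builds P for a small made-up LTV system and performs one ILC update against a repeatable disturbance. The system matrices, the horizon, the disturbance, and the helper names `lifted_matrix`, `simulate`, and `run_trial` are assumptions chosen for illustration only; they are not the quadrotor model used in this paper.

```python
import numpy as np

def lifted_matrix(A_list, B_list):
    """Build P as in (18): block (r, c) maps u(c) to q(r+1)."""
    N = len(B_list)
    n, m = B_list[0].shape
    P = np.zeros((N * n, N * m))
    for c in range(N):
        block = B_list[c]                      # B_D(c): effect of u(c) on q(c+1)
        for r in range(c, N):
            P[r * n:(r + 1) * n, c * m:(c + 1) * m] = block
            if r + 1 < N:
                block = A_list[r + 1] @ block  # propagate one more time step
    return P

def simulate(A_list, B_list, U):
    """Roll out (15) from q(0) = 0 and return the stacked [q(1); ...; q(N)]."""
    n, m = B_list[0].shape
    q = np.zeros(n)
    out = []
    for k in range(len(B_list)):
        q = A_list[k] @ q + B_list[k] @ U[k * m:(k + 1) * m]
        out.append(q)
    return np.concatenate(out)

# Hypothetical 2-state, 1-input LTV system (values are illustrative assumptions).
N, dt = 5, 0.1
A_list = [np.array([[1.0, dt], [0.0, 1.0 - 0.05 * k]]) for k in range(N)]
B_list = [np.array([[0.0], [dt * (1.0 + 0.1 * k)]]) for k in range(N)]
P = lifted_matrix(A_list, B_list)

# Repeatable disturbance D, unknown to the controller (assumed values).
rng = np.random.default_rng(0)
D = 0.1 * rng.standard_normal(P.shape[0])

def run_trial(U):
    """Stand-in for one experiment: returns E_j = -P U_j - D as in (25)."""
    return -(simulate(A_list, B_list, U) + D)

U0 = np.zeros(P.shape[1])
E0 = run_trial(U0)             # equals -D
P_pinv = np.linalg.pinv(P)     # SVD-based pseudo-inverse
U1 = U0 + P_pinv @ E0          # update law (28)
E1 = run_trial(U1)             # remaining error, compare with (31)
```

Because this toy P is tall (10 rows, 5 columns), the disturbance cannot be cancelled completely. The remaining error is the component of D outside the range of P, in agreement with (31).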
The update law $L_2 = P^\dagger$ can be non-causal, which results in a dense matrix $P^\dagger$. In case there is not only a constant disturbance but additional white noise $v_j(k)$ that changes from trial to trial the system takes the form

$$q_{j,m}(k) = q_d(k) + \tilde q_j(k) + d(k) + v_j(k) \tag{32}$$

$$Q_{j,m} = Q_d + \tilde Q_j + D + V_j \tag{33}$$

$$E_j = Q_d - Q_{j,m} = -P\tilde U_j - D - V_j \tag{34}$$

with $V_j$ being the lifted noise. All components of $v_j(k)$ and $V_j$ are assumed to be independent and identically distributed (iid) zero-mean Gaussian white noise. Using a similar approach as (28) for the update law results in

$$\tilde U_{j+1} = \tilde U_j + \alpha P^\dagger E_j, \qquad \alpha \in (0, 1) \tag{35}$$

Substituting (34) the input dynamics in the trial domain are

$$\tilde U_{j+1} = (I - \alpha P^\dagger P)\tilde U_j - \alpha P^\dagger D - \alpha P^\dagger V_j \tag{36}$$

which can readily be shown to be equal to

$$\tilde U_{j+1} = -\left[1 - (1-\alpha)^{j+1}\right] P^\dagger D - \alpha P^\dagger \sum_{i=0}^{j} (1-\alpha)^{j-i} V_i \tag{37}$$

while assuming $\tilde U_0 = 0$. In the limit with the number of iterations tending towards infinity the input and error become

$$\tilde U_\infty = -P^\dagger D - P^\dagger W \tag{38}$$

$$E_\infty = -[I - PP^\dagger]D - V_\infty + PP^\dagger W \tag{39}$$

$$W = \alpha \sum_{i=0}^{j} (1-\alpha)^{j-i} V_i \tag{40}$$

while introducing the equivalent noise vector $W$

$$E[W] = 0, \qquad E[WW^T] = \frac{\alpha}{2-\alpha} E[VV^T] \tag{41}$$

The parameter $\alpha$ serves as a tuning parameter to regulate the influence of $V_j$. For $\alpha$ approaching 1 the variance is not reduced, it is the same as performing the update only once. For $\alpha$ approaching zero the solution tends towards the optimum, i.e. minimizes the influence of $V_j$. The downside is that the number of iterations required to get this solution tends towards infinity. In practice the trade-off has to be somewhere in between.

In addition to the update law $L_2$ it is possible to introduce a low-pass filter in $L_1$, which rejects high frequency noise that gets injected by the measurements.

The algorithm terminates successfully if the error $E_j$ is smaller than a specified threshold. In that case the final input is denoted $\tilde U_M$ and the final experimental trajectory is denoted $\tilde Q_M$.

IV. EXTENDING THE MANEUVER

In the previous section an algorithm was presented which iteratively tracks a given trajectory. In this section a method is proposed which facilitates the tracking of $q_{d,2}(t)$, assuming that it is already known how to track $q_{d,1}(t)$, with $q_{d,1}(t)$ and $q_{d,2}(t)$ being similar. The most straightforward approach is the direct application of the algorithm from Section III. However, this would neglect valuable information gained from previous experiments. A better method is to utilize the final input $\tilde U_{M,1}$ and trajectory $\tilde Q_{M,1}$ from tracking $q_{d,1}(t)$ in order to provide better initial guesses for the tracking of $q_{d,2}(t)$. The approach described here involves the adjustment of the model parameters $\rho$ (7). The goal is to adjust the nonlinear model such that the model (5) in conjunction with the real input explains the real output, i.e.

$$q_{M,1}(k) = q_{th,1}(k), \qquad k \in [1, N] \tag{42}$$

with

$$q_{th,1}(k) = \int_0^{t(k)} f(\rho, q(\tau), u_{M,1}(\tau))\,d\tau, \qquad q(0) = q_{M,1}(0) \tag{43}$$

Using lifted vectors this can be posed as a nonlinear quadratic optimization problem

$$\rho^* = \arg\min_{\rho} \|Q_{M,1} - Q_{th,1}\|_2^2 \tag{44}$$

This optimal $\rho^*$ is then substituted into the model (5) to provide the basis for the algorithm as described in Section III.

The selection of adjustment parameters can be crucial for the convergence of an optimization process involving a nonlinear system. The particular vector (7) has been chosen since it allows the adjustment of the relationships between inputs and states. Further, it provides good results in practice, see Section V. However, it should be noted that this selection is not unique and that other parameter vectors could provide similar results.

The easiest approach is to define a $\rho$ which is valid over the entire duration of the trial. However, to improve performance of the adjustment process it is possible to define $N_\rho$ different parameter sets $\rho_n$ which are valid during consecutive intervals $[k_{\rho,n,0},\ k_{\rho,n,f}]$ of equal size $\Delta k_\rho$, i.e.

$$k_{\rho,n,f} + 1 = k_{\rho,n+1,0} \tag{45}$$

$$k_{\rho,n,f} - k_{\rho,n,0} = \Delta k_\rho \tag{46}$$

Fig. 3. Residual of Optimization over Number of Parameter Sets

Figure 3 shows a plot of the residual of the optimization (44) over the number of parameter sets $N_\rho$ for a typical experimental result $(\tilde Q_M, \tilde U_M)$. For the actual experiments $N_\rho$ was set to four. The values of $\rho_n^*$ are not unreasonably different from the unadjusted $\rho_0$, as can be seen in Table I.
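The parameter adjustment (42)-(44) can be sketched as follows for the simplest case of a single parameter vector valid over the whole trial. Everything concrete here is an assumption made for illustration: the scalar model `q' = -rho[0]*q + rho[1]*u`, the forward-Euler integration, the Gauss-Newton iteration with a finite-difference Jacobian, and all names and numbers. The text above does not state which optimizer was used for (44), so this is only one possible way to solve it.

```python
import numpy as np

def rollout(rho, u, q0, dt):
    """Forward-Euler integration of the hypothetical model q' = -rho[0]*q + rho[1]*u.

    Returns the stacked model outputs q_th(1..N), i.e. the lifted vector Q_th.
    """
    q = q0
    out = np.empty(len(u))
    for k, uk in enumerate(u):
        q = q + dt * (-rho[0] * q + rho[1] * uk)
        out[k] = q
    return out

def fit_parameters(rho0, u, q_meas, q0, dt, iters=20, h=1e-6):
    """Minimize ||Q_M - Q_th(rho)||^2 as in (44) by Gauss-Newton steps."""
    rho = np.asarray(rho0, dtype=float).copy()
    for _ in range(iters):
        q_th = rollout(rho, u, q0, dt)
        residual = q_meas - q_th
        # Finite-difference Jacobian of Q_th with respect to rho.
        J = np.empty((len(residual), len(rho)))
        for i in range(len(rho)):
            step = np.zeros_like(rho)
            step[i] = h
            J[:, i] = (rollout(rho + step, u, q0, dt) - q_th) / h
        # Linear least-squares step: Q_th(rho + delta) ~ Q_th(rho) + J delta.
        delta, *_ = np.linalg.lstsq(J, residual, rcond=None)
        rho = rho + delta
        if np.linalg.norm(delta) < 1e-10:
            break
    return rho

# Synthetic "experiment": data generated with parameters unknown to the fit.
dt, N, q0 = 0.02, 50, 1.0
u = np.sin(2.0 * np.pi * dt * np.arange(N))   # recorded input U_M (assumed)
rho_true = np.array([1.2, 0.8])               # plays the role of the real system
q_meas = rollout(rho_true, u, q0, dt)         # recorded trajectory Q_M, noise-free

rho0 = np.array([1.0, 1.0])                   # unadjusted model parameters
rho_star = fit_parameters(rho0, u, q_meas, q0, dt)
```

In this noise-free toy case the fit recovers the generating parameters. With real measurements the residual would stay nonzero, and splitting the trial into several intervals as in (45)-(46) would amount to fitting one such parameter vector per interval.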
TABLE I
COMPARISON OF $\rho$
Fig. 5. Maneuver 2, State Error

Fig. 6. Maneuver 3, State Error

previously gained information by adjusting the underlying model. The legend for Figure 6 is the same as for Figure 5.

VI. CONCLUSION

An algorithm has been presented which enables a system to iteratively perform an aggressive motion, given a simple model which captures the essential dynamics of the system. Expressing the problem in the lifted domain allows the synthesis of a non-causal controller, which can anticipate recurring disturbances and compensate for them by adjusting a feedforward signal. The controller synthesis is formulated as an LLS problem, which can be readily solved and executed online with modest computational resources. Using the data from a well tracked trajectory it is possible to adjust the model in order to learn a motion which is similar to the original reference. The algorithm has been successfully applied to a quadrotor UAV, reducing the error norm of the desired maneuver by an order of magnitude.

REFERENCES

[1] Bristow D.A., Tharayil M., Alleyne A.G.: “A survey of iterative learning control”, IEEE Control Systems Magazine, Vol. 26, No. 3, pp. 96-114, 2006
[2] Chen Y.Q., Moore K.L.: “A Practical Iterative Learning Path-Following Control of an Omni-Directional Vehicle”, Special Issue on Iterative Learning Control, Asian Journal of Control, Vol. 4, No. 1, pp. 90-98, 2002
[3] Ghosh J., Paden B.: “Pseudo-inverse based iterative learning control for nonlinear plants with disturbances”, Proceedings of the 38th IEEE Conference on Decision and Control, Vol. 5, pp. 5206-5212, 1999, DOI 10.1109/CDC.1999.833379
[4] Chin I., Qin S.J., Lee K.S., Cho M.: “A two-stage iterative learning control technique combined with real-time feedback for independent disturbance rejection”, Automatica, Vol. 40, No. 11, pp. 1913-1922, 2004
[5] Lee K.S., Lee J.H., Chin I.S., Lee H.J.: “A model predictive control technique for batch processes and its application to temperature tracking control of an experimental batch reactor”, A.I.Ch.E. Journal, Vol. 45, No. 10, pp. 2175-2187, 1999
[6] Lee J.H., Lee K.S., Kim W.C.: “Model-based iterative learning control with a quadratic criterion for time-varying linear systems”, Automatica, Vol. 36, pp. 641-657, 2000
[7] Cho M., Lee Y., Joo S., Lee K.S.: “Semi-Empirical Model-Based Multivariable Iterative Learning Control of an RTP System”, IEEE Transactions on Semiconductor Manufacturing, Vol. 18, No. 3, pp. 430-439, 2005
[8] Rice J.K., Verhaegen M.: “Lifted repetitive learning control for stochastic LTV systems: A structured matrix approach”, Submitted to: Automatica, March, 2007
[9] de Roover D., Bosgra O.H.: “Synthesis of robust multivariable iterative learning controllers with application to a wafer stage motion system”, Int. J. Contr., Vol. 73, No. 10, pp. 968-979, 2000
[10] Mezghani M., Roux G., Cabassud M., Le Lann M.V., Dahhou B., Casamatta G.: “Application of iterative learning control to an exothermic semibatch chemical reactor”, IEEE Trans. Contr. Syst. Technol., Vol. 10, No. 6, pp. 822-834, 2002
[11] Norrlöf M.: “An adaptive iterative learning control algorithm with experiments on an industrial robot”, IEEE Trans. Robot. Automat., Vol. 18, No. 2, pp. 245-251, 2002
[12] Schwartz A., Polak E., Chen Y.: “RIOTS 95”, Optimal Control Toolbox for Matlab V6.5