Local Koopman Operators For Data-Driven Control of Robotic Systems
Abstract: This paper presents a data-driven methodology for linear embedding of nonlinear systems. Utilizing structural knowledge of general nonlinear dynamics, the authors exploit the Koopman operator to develop a systematic, data-driven approach for constructing a linear representation in terms of higher-order derivatives of the underlying nonlinear dynamics. With the linear representation, the nonlinear system is then controlled with an LQR feedback policy, the gains of which need to be calculated only once. As a result, the approach enables fast control synthesis. We demonstrate the efficacy of the approach with simulations and experimental results on the modeling and control of a tail-actuated robotic fish and show that the proposed policy is comparable to backstepping control. To the best of our knowledge, this is the first experimental validation of Koopman-based LQR control.

I. INTRODUCTION

Optimal control theory has reached a level of maturity such that there are a number of available schemes suitable for problems with known dynamics. Examples include the linear quadratic regulator (LQR) [1], linear model predictive control (LMPC) [2], nonlinear model predictive control (NMPC) [3], feedback linearization [4], differential dynamic programming (DDP) [5], sequential action control (SAC) [6] and variants of the above [7]–[9]. The plethora of available techniques allows one to compute satisfactory solutions for nonlinear and high-dimensional systems. At the same time, it is often imperative that control solutions be calculated in real time. Unfortunately, the high nonlinearity and dimensionality present in many robotic systems are often an obstacle to the real-time implementation of nonlinear feedback control schemes [10]. Further, many robotic applications involve dynamics that are unknown or ever-changing. These challenges call for feedback policies that can use data to adapt their models [11] and that make the necessary approximations to reach a good balance between model accuracy and computational efficiency. This is why, together with the evolution of machine learning tools, there is increasing interest in data-driven modeling and control approaches that can run in real time.

In light of these challenges, the Koopman operator has recently drawn attention in the robotics community, as it can help address both the difficulty with nonlinearity and the need to incorporate data in the model [12], [13]. Specifically, the Koopman operator propagates a nonlinear system in a linear manner without loss of accuracy by evolving functions of the states, termed observables [14]. The linear representation allows one to control the nonlinear system using tools from linear optimal control [15], [16], which are typically much easier and faster to implement than nonlinear methods. As a result, it enables online feedback for high-dimensional nonlinear systems. Interestingly, beyond the reduction in feedback complexity, controlling the linear representation instead of the original nonlinear system can even lead to better performance [17].

The Koopman operator can be readily combined with machine learning tools to help learn unknown dynamics from data [18]–[24]. For robotic tasks involving fluid environments, uncertain terrains, or other complicated dynamics such as those of bipedal walking or running, the ability to use data to learn or adapt the model is significant. As a result, the Koopman operator is a promising framework for data-driven system identification. More importantly, however, and as we detail later in Section II, the Koopman operator framework differs from standard system identification schemes in that it places the learning task in the context of seeking linear transformations of the states, which is useful for control purposes [25]–[27].

A downside of the Koopman operator, however, is that, unless a finite-dimensional invariant subspace exists [17], it is infinite-dimensional and presents practical challenges in modeling and control. For this reason, recent studies try to obtain a finite-dimensional approximation to the Koopman operator that describes the dynamics with high fidelity [13], [16]. In this trade-off between the dimensionality and the modeling error of the linear representation, the challenge becomes finding the minimum number of basis functions for the desired accuracy [24]. Choosing observable functions that best approximate the Koopman operator, however, remains an open research question. To the best of our knowledge, there has not been a systematic way of choosing the Koopman basis functions for general nonlinear systems. Rather, most efforts have relied on trial-and-error [28] and machine learning tools [24], or are system-specific [12].

In this work, we introduce a way of constructing the basis functions for the Koopman operator using higher-order derivatives of general, but known, nonlinear dynamics, where the values of the linear coefficients may be unknown. Using a data-driven, least-squares technique with a closed-form solution, we obtain the coefficients for the linear transformation, based upon which an LQR policy is found. In particular, the LQR
gains need to be computed only once, and the actual feedback control value is computed online with negligible cost. As a result, our approach differs from other data-driven efforts that require more intensive online calculations of the control [28]. We validate our approach with simulation and experimental results using a tail-actuated robotic fish and compare our method to backstepping control, a sophisticated and well-studied feedback scheme [29]–[31].

The organization of the paper is as follows. Section II reviews the Koopman operator and explains how it is used in the present study for data-driven control. Section III describes the control synthesis approach that uses LQR feedback. Section IV illustrates the approach and demonstrates its performance using the system of a tail-actuated robotic fish. Section V discusses the findings of this paper, as well as ideas for further expanding this work.

II. KOOPMAN OPERATOR

This section reviews the Koopman operator and methods to obtain a finite approximation of it, and explains its relevance to system identification and optimal control.

The Koopman operator $\mathcal{K}$ is an infinite-dimensional linear operator that evolves functions of the state $s \in \mathbb{R}^n$ (i.e., $\Psi(s)$, commonly referred to as observables) of a dynamical system. That is,

$$\frac{d}{dt}\Psi(s) = \mathcal{K}\Psi(s) \quad \text{and} \quad \Psi(s_{k+1}) = \mathcal{K}_d\Psi(s_k), \tag{1}$$

for continuous-time and discrete-time systems, respectively. In other words, it allows one to evolve the nonlinear dynamics in a linear setting without loss of accuracy. Contrary to dynamics that are linearized around a fixed point and become inaccurate away from the linearization point, the Koopman operator evolves a nonlinear system with full fidelity throughout the state space.

Expressing nonlinear systems in a linear manner is a desirable property for many reasons. For example, Koopman eigenfunctions reveal state partitions along which the nonlinear dynamics evolve linearly. The ability to obtain geometric properties of nonlinear systems using the Koopman eigenvalues has drawn the attention of the scientific community. Work in [32], for example, investigates the global stability of a system using the eigenfunctions of the Koopman operator, whereas work in [33] extends the local linearization around a stationary point to the whole basin of attraction.

In addition to studying the behavior of complex systems, the Koopman framework enables the use of feedback that is as simple as linear optimal control, while capturing the original nonlinear dynamics. The ability to control complex systems with linear feedback is rather promising for robotic applications that remain challenging with nonlinear schemes, such as underwater locomotion. At the same time, the infinite-dimensional nature of the Koopman operator renders its direct practical use prohibitive.

A. Koopman Invariant Subspaces

There exist nonlinear systems that admit a finite-dimensional linear Koopman representation. Work in [17] shows that, for certain systems, there exist Koopman invariant subspaces that lead to finite-dimensional linear representations of nonlinear systems. The authors also demonstrate that LQR control based on the linear representation can outperform LQR control calculated based on the original, nonlinear dynamics [34]. Unfortunately, Koopman invariant subspaces have only been found for a limited class of polynomial systems. Moreover, there is no finite-dimensional Koopman invariant subspace for systems with multiple fixed points; the representation of the Koopman operator has no closure [17].

Recent studies have focused on approximating the infinite-dimensional operator $\mathcal{K}$ with a finite representation $\tilde{\mathcal{K}} \in \mathbb{R}^{w \times w}$ that captures the original nonlinear dynamics with acceptable accuracy [13], [16]. These efforts have largely benefited from advances in machine learning, which make it possible to use data-driven regression schemes to obtain a finite-dimensional approximation to the Koopman operator. In this paper, we adopt the least-squares method shown in [13], which we detail next.

B. Data-driven Finite-dimensional Approximation to Koopman Operators

In the absence of a finite-dimensional Koopman invariant subspace, a linear propagation of the states will induce errors. The challenge is then to obtain an approximation to the Koopman operator that will linearly evolve the nonlinear system with tolerable error.

To obtain an approximation $\tilde{\mathcal{K}}$ to the Koopman operator, one chooses a set of observable functions $\Psi(s) = [\psi_1(s), \psi_2(s), \ldots, \psi_w(s)] \in \mathbb{R}^w$ (which can include the states themselves) and uses data to solve a least-squares minimization problem. To allow for the effect of actuation, (1) is modified such that the observables include control terms as well [16], [28]. For the discrete-time case, this minimization takes the form

$$\tilde{\mathcal{K}}_d^* = \operatorname*{argmin}_{\tilde{\mathcal{K}}_d} \sum_{k=0}^{P-1} \frac{1}{2}\left\lVert \Psi(s_{k+1}, u_{k+1}) - \tilde{\mathcal{K}}_d\Psi(s_k, u_k) \right\rVert^2, \tag{2}$$

where $P$ is the number of measurements. Each measurement is a set of an initial state $s_k$, a final state $s_{k+1}$, and the actuation applied at the same instants, $u_k$ and $u_{k+1}$, respectively.

The above expression has a closed-form solution, given by

$$\tilde{\mathcal{K}}_d^* = \mathcal{A}\mathcal{G}^{\dagger}, \tag{3}$$

where

$$\mathcal{A} = \frac{1}{P}\sum_{k=0}^{P-1}\Psi(s_{k+1}, u_{k+1})\,\Psi(s_k, u_k)^T, \qquad \mathcal{G} = \frac{1}{P}\sum_{k=0}^{P-1}\Psi(s_k, u_k)\,\Psi(s_k, u_k)^T, \tag{4}$$

and $\dagger$ denotes the Moore-Penrose pseudoinverse. For a derivation of (3), the reader can refer to the Appendix. Note that the time spacing $\delta t$ between measurements $s_k$ and $s_{k+1}$ must
be consistent for all $P$ training measurements. Last, one can switch between the continuous-time and discrete-time operators via the matrix logarithm $\mathcal{K} = \log(\mathcal{K}_d)/t_s$, where $t_s$ is the time between measurements $s_k$ and $s_{k+1}$.

We should note that the data-driven approximation of the Koopman operator is not inherently different from other system identification techniques. The Koopman operator can be approximated using any of the standard regression techniques, such as ridge or lasso regression [35], [36]. However, the Koopman operator places the system identification task in a meaningful context. Contrary to system identification tools that may try to estimate unknown parameters or, more generally, the nonlinear dynamics of a system [23], [37], searching for a data-driven Koopman operator translates to searching for a linear representation of the nonlinear system.

C. Synthesis of Basis Functions

Here, we motivate the use of higher-order derivatives of known nonlinear dynamics to populate the observables. The proposed method is a data-driven way of constructing the observables in order to approximate a Koopman invariant subspace with a finite number of functions. We should note that the method is not meant to contribute to system identification of completely unknown dynamics, but rather to capturing with minimal error the evolution of an existing nonlinear model using a linear representation, for the purposes of computational efficiency and control performance. As such, it does require that a model of the dynamics already exists, but not that the linear coefficients of its terms are known. When no model is available, system identification tools, such as those in [23], can be used to characterize the underlying system.

This method is inspired by work in [17] and [34]. The former study identifies Koopman invariant subspaces for a very limited class of nonlinear systems (whose dynamics have a specific polynomial structure). Its proposed methodology populates the Koopman observable functions using the Carleman linearization approach, which is appropriate for the types of systems considered there. Despite commenting on the challenge of non-closure, that approach is specific to polynomial systems. Further, the suggested approach is to identify eigenfunctions from data and use those for control, as illustrated in [34].

For the systems shown to admit a finite-dimensional Koopman invariant subspace, it is straightforward to show that the terms in the observable function $\Psi(s)$ capture all higher-order derivatives of the original states. This is precisely the reason why the linear representation matches the nonlinear dynamics with no error. In cases where an invariant subspace has not been found, there is no closure of the higher-order derivatives. However, this way of reasoning allows one to infer information about the priority of certain functions over others in populating $\Psi(s_k)$. That is, the evolution of a nonlinear equation $\dot{x}(t) = g(t)$ can be approximated with a Taylor series as

$$g(t) \approx g(t_0) + g'(t_0)\,\delta t + \frac{1}{2}g''(t_0)\,\delta t^2 + \cdots + \frac{1}{n!}g^{(n)}(t_0)\,\delta t^n = \begin{bmatrix} 1 & \delta t & \frac{\delta t^2}{2} & \cdots & \frac{\delta t^n}{n!} \end{bmatrix} \begin{bmatrix} g(t_0) \\ g'(t_0) \\ g''(t_0) \\ \vdots \\ g^{(n)}(t_0) \end{bmatrix}, \tag{5}$$

where $\delta t = t - t_0$.

For a fixed $t$, (5) can be written in the form of (1), where the derivatives of the function $g(t)$ are equivalent to the observables $\Psi(s_k)$. That is,

$$\underbrace{\begin{bmatrix} g(t) \\ g'(t) \\ g''(t) \\ \vdots \\ g^{(n)}(t) \end{bmatrix}}_{\Psi(s_{k+1})} \approx \underbrace{\begin{bmatrix} 1 & \delta t & \frac{\delta t^2}{2} & \cdots & \frac{\delta t^n}{n!} \\ 0 & 1 & \delta t & \cdots & \frac{\delta t^{n-1}}{(n-1)!} \\ 0 & 0 & 1 & \cdots & \frac{\delta t^{n-2}}{(n-2)!} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}}_{\tilde{\mathcal{K}}_d} \underbrace{\begin{bmatrix} g(t_0) \\ g'(t_0) \\ g''(t_0) \\ \vdots \\ g^{(n)}(t_0) \end{bmatrix}}_{\Psi(s_k)}. \tag{6}$$

For $t$ close to $t_0$, it suffices in (5) to use just the first few derivatives, where each additional derivative has a decreasing effect on the update of the function $g(t)$. Note that, in (6), all derivatives of $g(t)$ are treated as different functions. Further, the highest derivative $g^{(n)}$ is not propagated at all with this representation: the last row of (6) is $[0, \ldots, 0, 1]$, so $g^{(n)}$ is simply held constant.

We argue that populating the observables $\Psi(s_k)$ with the next higher-order derivatives, instead of randomly choosing basis functions, generates, locally in time, the most accurate linear representation of the nonlinear dynamics. However, because there is no closure of the higher-order derivatives and the series has to be truncated at some point, the analytical expression (6) would lead to a very inaccurate Koopman operator, as is noted in [17]. For this reason, we use data-driven techniques to approximate $\tilde{\mathcal{K}}_d$, as shown in Section II-B, even when a model is known, so as to propagate the last derivatives in terms of the existing terms in the observables more accurately than an analytical model of the form in (6).

In other words, linearly representing nonlinear dynamics without error requires the existence of Koopman invariant subspaces. These are formed by populating the observable functions with higher-order derivatives, provided that there is a finite-dimensional closure. Even when no closure exists, populating the basis functions with higher-order derivatives creates increasingly better approximations to the invariant subspaces. Data-driven methods can then be used to improve the linear approximation of the nonlinear dynamics.
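The construction in (5) and (6) and the least-squares fit in (3) and (4) can be sketched together in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: a hypothetical scalar toy system $\dot{x} = -x^3$ (not the robotic fish), whose derivative observables are $\Psi = [x, \dot{x}, \ddot{x}] = [x, -x^3, 3x^5]$; training pairs generated with RK4 at a uniform spacing $\delta t$; and no control input. All function names are illustrative.

```python
import numpy as np
from math import factorial


def f(x):
    """Hypothetical toy dynamics xdot = -x^3 (illustration only)."""
    return -x**3


def observables(x):
    """Derivative-based observables Psi = [x, xdot, xddot] = [x, -x^3, 3 x^5]."""
    return np.stack([x, f(x), 3.0 * x**5], axis=-1)


def rk4_step(x, h):
    """One classical Runge-Kutta step of xdot = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


def snapshot_pairs(x0, dt, steps, substeps=10):
    """Pairs (x_k, x_{k+1}) with a uniform spacing dt, from several initial states."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        x = xs[-1]
        for _ in range(substeps):
            x = rk4_step(x, dt / substeps)
        xs.append(x)
    xs = np.array(xs)  # shape (steps + 1, number of trajectories)
    return xs[:-1].ravel(), xs[1:].ravel()


def fit_koopman(psi_now, psi_next):
    """Closed-form least squares of (3)-(4): K_d = A G^+ (rows are samples)."""
    P = psi_now.shape[0]
    A = psi_next.T @ psi_now / P  # (1/P) sum Psi(s_{k+1}) Psi(s_k)^T
    G = psi_now.T @ psi_now / P   # (1/P) sum Psi(s_k) Psi(s_k)^T
    return A @ np.linalg.pinv(G)


def taylor_matrix(dt, n):
    """Analytic upper-triangular matrix of (6) for derivatives of order 0..n."""
    M = np.zeros((n + 1, n + 1))
    for i in range(n + 1):
        for j in range(i, n + 1):
            M[i, j] = dt ** (j - i) / factorial(j - i)
    return M


def residual(K, psi_now, psi_next):
    """Sum of squared one-step prediction errors (twice the cost in (2))."""
    return float(np.sum((psi_next - psi_now @ K.T) ** 2))


dt = 0.05
x_now, x_next = snapshot_pairs(np.linspace(-1.0, 1.0, 21), dt, steps=40)
psi_now, psi_next = observables(x_now), observables(x_next)
K_ls = fit_koopman(psi_now, psi_next)
K_taylor = taylor_matrix(dt, n=2)
print("least-squares residual:", residual(K_ls, psi_now, psi_next))
print("truncated Taylor residual:", residual(K_taylor, psi_now, psi_next))
```

Because (3) is the minimizer of (2), its residual on the training pairs can never exceed that of the truncated analytic matrix (6), which keeps $\ddot{x}$ frozen over each step. This is the sense in which the data-driven fit propagates the last derivative better.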
III. LQR ON KOOPMAN OPERATOR

Consider a linear system with states $s \in \mathbb{R}^n$, control $u \in \mathbb{R}^m$, and dynamics given by

$$\frac{d}{dt}s = As + Bu, \tag{7}$$

and a performance objective

$$J = \int_0^\infty (s - s_{des})^T Q (s - s_{des}) + u^T R u \, dt, \tag{8}$$

where $Q \succeq 0 \in \mathbb{R}^{n \times n}$ and $R \succ 0 \in \mathbb{R}^{m \times m}$ are weights on the deviation from the desired states and on the applied control, respectively. For linear systems of the form in (7), the linear quadratic regulator (LQR) calculates the minimizer of (8) in one iteration [1]. The control solution has the form of a state feedback law

$$u = -K_{LQR}(s - s_{des}), \tag{9}$$

where $K_{LQR} \in \mathbb{R}^{m \times n}$, the LQR gains, need to be calculated only once for each minimization task defined by (8). Next, we show how we modify the LQ optimization problem to control the Koopman representation of a nonlinear system.

Consider an approximate Koopman operator $\tilde{\mathcal{K}}$ such that

$$\frac{d}{dt}\Psi(s, u) \approx \tilde{\mathcal{K}}\Psi(s, u). \tag{10}$$

Let $\Psi(s, u) = [\Psi_s(s), \Psi_{s,u}(s, u)]^T$, where $\Psi_s(s) \in \mathbb{R}^{w_s}$ are the functions that depend only on the states, $\Psi_{s,u}(s, u) \in \mathbb{R}^{w_u}$ are those that depend on the input as well, and $w = w_s + w_u$. This notation allows us to rewrite (10) as

$$\frac{d}{dt}\begin{bmatrix} \Psi_s(s) \\ \Psi_{s,u}(s, u) \end{bmatrix} \approx \begin{bmatrix} \tilde{\mathcal{K}}_s & \tilde{\mathcal{K}}_{s,u} \\ \tilde{\mathcal{K}}_{u,s} & \tilde{\mathcal{K}}_{u,u} \end{bmatrix} \begin{bmatrix} \Psi_s(s) \\ \Psi_{s,u}(s, u) \end{bmatrix}, \tag{11}$$

where $\tilde{\mathcal{K}}_s \in \mathbb{R}^{w_s \times w_s}$ and $\tilde{\mathcal{K}}_{s,u} \in \mathbb{R}^{w_s \times w_u}$ are the sub-matrices of $\tilde{\mathcal{K}}$ that describe the dynamics of the functions $\Psi_s(s)$ that depend only on the states. Note that the dynamical equation (10) has been modified from (1) to allow for control inputs [16], [28].

In order to obtain a state- and control-affine form, we can linearize the Koopman representation with respect to $\Psi_s(s)$ and $u$. Note that in the Koopman representation, the states are expanded from $s$ to $\Psi_s$ (which can include the states $s$ of the original system, too). We can then write

$$\begin{aligned} \frac{d}{dt}\Psi_s(s) &\approx \frac{\partial}{\partial \Psi_s(s)}\big(\tilde{\mathcal{K}}_s\Psi_s(s) + \tilde{\mathcal{K}}_{s,u}\Psi_{s,u}(s, u)\big)\cdot\Psi_s(s) + \frac{\partial}{\partial u}\big(\tilde{\mathcal{K}}_s\Psi_s(s) + \tilde{\mathcal{K}}_{s,u}\Psi_{s,u}(s, u)\big)\cdot u \\ &= \Big(\tilde{\mathcal{K}}_s + \tilde{\mathcal{K}}_{s,u}\frac{\partial}{\partial \Psi_s(s)}\Psi_{s,u}(s, u)\Big)\cdot\Psi_s(s) + \Big(\tilde{\mathcal{K}}_{s,u}\frac{\partial}{\partial u}\Psi_{s,u}(s, u)\Big)\cdot u. \end{aligned} \tag{12}$$

This form is equivalent to the linearized dynamics of the original system with $\big(\tilde{\mathcal{K}}_s + \tilde{\mathcal{K}}_{s,u}\frac{\partial}{\partial \Psi_s(s)}\Psi_{s,u}(s, u)\big) \equiv A(s, u)$ and $\big(\tilde{\mathcal{K}}_{s,u}\frac{\partial}{\partial u}\Psi_{s,u}(s, u)\big) \equiv B(s)$ for the purposes of designing a linear controller. This form allows one to employ linear control policies after evaluating the terms $A(s, u)$ and $B(s)$.

For the purposes of speed, we wish to avoid constantly re-evaluating the expression in (12) when calculating the feedback control input. To do that, we choose the basis functions such that it is not required to re-evaluate $A$ and $B$. That is, we choose $\Psi_{s,u}(s, u) = u$, and (12) becomes

$$\frac{d}{dt}\Psi_s(s) \approx \tilde{\mathcal{K}}_s\Psi_s(s) + \tilde{\mathcal{K}}_{s,u}\,u,$$

where $\tilde{\mathcal{K}}_s$ and $\tilde{\mathcal{K}}_{s,u}$ are fixed.¹ If one wishes to update the Koopman operator online, these matrices would vary in response to how incoming state measurements change the solution to (3).

Then, we define an optimization problem similar to (8),

$$J_{\tilde{\mathcal{K}}} = \int_0^\infty \big(\Psi_s(s) - \Psi_s(s_{des})\big)^T Q_{\tilde{\mathcal{K}}}\big(\Psi_s(s) - \Psi_s(s_{des})\big) + u^T R u \, dt, \tag{13}$$

where $Q_{\tilde{\mathcal{K}}} \succeq 0 \in \mathbb{R}^{w_s \times w_s}$ penalizes the deviation from the desired state of the observable functions $\Psi_s(s_{des})$. We set

$$Q_{\tilde{\mathcal{K}}} = \begin{bmatrix} Q & 0 \\ 0 & 0 \end{bmatrix},$$

so that a meaningful comparison can be made with the original nonlinear system and the associated objective (8). The LQR feedback law for (13) becomes

$$u = -K_{LQR}\big(\Psi_s(s) - \Psi_s(s_{des})\big). \tag{14}$$

Note that the control solution only updates based on the functions $\Psi_s(s)$, thereby significantly reducing the computational time compared to other feedback schemes that forward-predict the evolution of the system, compute an optimal response, and then perform a line search over the entire time horizon to decide the control solution.

To validate our proposed method for the synthesis of the basis functions, we implement the Koopman-based LQR policy described in this section on a tail-actuated robotic fish, shown in Fig. 1.

IV. RESULTS

The states of the robotic fish are $s = [x, y, \psi, v_x, v_y, \omega]^T$, where $x, y$ are the world-frame coordinates, $\psi$ is the orientation, $v_x$ and $v_y$ are the body-frame linear velocities (surge and sway, respectively), and $\omega$ is the body-frame angular velocity. We use $\alpha$ to denote the angle of the tail. The tail is actuated with $\alpha(t) = \alpha_0 + \alpha_a \sin(\omega_a t)$, where $\alpha_a$, $\alpha_0$, and $\omega_a$ are the amplitude, bias, and frequency of the tail beat. The ranges of the bias and the amplitude are $\alpha_0 \in [-50^\circ, 50^\circ]$ and $\alpha_a \in [0, 30^\circ]$, respectively. To simplify the problem, we keep the frequency fixed at $\omega_a = 2\pi$ rad/s. These actuation constraints are applied throughout the simulations and experiments.

¹ Choosing $\Psi_{s,u}(s, u) = u$ allows us to simplify (12) at the expense, however, of a less accurate approximation of the dynamics. For example, if $\cos(x)u$ appears in the dynamics, it will be approximated in the Koopman model as $c_1 u$, where $c_1 \in \mathbb{R}$.
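The LQR design of Section III, (13) and (14), can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation. The learned operator is taken to be the discrete-time $\tilde{\mathcal{K}}_d$ from (3), converted with $\mathcal{K} = \log(\mathcal{K}_d)/t_s$ through an eigendecomposition, so $\tilde{\mathcal{K}}_d$ is assumed diagonalizable with no eigenvalues on the negative real axis. As in footnote 1, $\Psi_{s,u}(s,u) = u$. The first $n$ entries of $\Psi_s$ are assumed to be the original states, so that $Q_{\tilde{\mathcal{K}}} = \operatorname{diag}(Q, 0)$ applies. The continuous-time algebraic Riccati equation is solved with the textbook Hamiltonian-eigenvector method, which requires stabilizability and detectability. All function names are hypothetical.

```python
import numpy as np


def discrete_to_continuous(Kd, ts):
    """K = log(Kd) / ts via eigendecomposition (assumes Kd diagonalizable,
    no eigenvalues on the closed negative real axis; principal log branch)."""
    evals, evecs = np.linalg.eig(Kd)
    log_kd = evecs @ np.diag(np.log(evals.astype(complex))) @ np.linalg.inv(evecs)
    return np.real(log_kd) / ts


def solve_care(A, B, Q, R):
    """Continuous-time algebraic Riccati equation from the stable eigenvectors of
    the Hamiltonian (assumes (A, B) stabilizable and (Q, A) detectable, so that
    exactly n stable eigenvalues exist)."""
    n = A.shape[0]
    BRB = B @ np.linalg.solve(R, B.T)
    H = np.block([[A, -BRB], [-Q, -A.T]])
    evals, evecs = np.linalg.eig(H)
    stable = evecs[:, evals.real < 0]
    X1, X2 = stable[:n, :], stable[n:, :]
    P = np.real(X2 @ np.linalg.inv(X1))
    return 0.5 * (P + P.T)  # symmetrize against round-off


def koopman_lqr_gains(Kd, ts, ws, Q, R):
    """Gains for (13) with Psi_{s,u} = u and Q_K = blkdiag(Q, 0)."""
    K = discrete_to_continuous(Kd, ts)
    Ks, Ksu = K[:ws, :ws], K[:ws, ws:]  # sub-blocks of (11)
    n = Q.shape[0]                      # original states lead Psi_s
    QK = np.zeros((ws, ws))
    QK[:n, :n] = Q
    P = solve_care(Ks, Ksu, QK, R)
    return np.linalg.solve(R, Ksu.T @ P)  # K_LQR = R^{-1} Ksu^T P


def feedback(K_lqr, psi_s, psi_s_des):
    """Feedback law (14): u = -K_LQR (Psi_s(s) - Psi_s(s_des))."""
    return -K_lqr @ (np.asarray(psi_s, dtype=float) - np.asarray(psi_s_des, dtype=float))
```

In practice, `scipy.linalg.logm` and `scipy.linalg.solve_continuous_are` are more robust replacements for the two helper routines. Because the gains depend only on $\tilde{\mathcal{K}}_s$, $\tilde{\mathcal{K}}_{s,u}$, $Q$, and $R$, they are computed once offline, and each control update reduces to the matrix-vector product in `feedback`.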
Simulation Parameters

Parameter   Value                     Parameter   Value
m_b         0.725 kg                  ρ           1000 kg/m³
m_ax        −0.217 kg                 S           0.03 m²
m_ay        −0.7888 kg                C_D         0.97
J_bz        2.66 × 10⁻³ kg·m²         C_L         3.9047
J_az        −7.93 × 10⁻⁴ kg·m²        K_D         4.5 × 10⁻³
L           0.071 m                   K_f         0.7
d           0.04 m                    K_m         0.45
c           0.105 m
[39].

A. Simulation Results

In this section, we present simulation results for the data-driven LQR control of the tail-actuated robotic fish. First,

that appears in the dynamics is similar to using the time derivatives of the entire equation of a state (e.g., $\ddot{v}_y(t)$). Despite increasing the number of basis functions, we prefer the first approach because it does not require knowing the coefficients of the individual terms in advance (e.g., $m_2/m_1$). As a result, it can be readily used for other tail-actuated robotic fish that have a different morphology.
[Figure 2 panels: v_x (m/s) and angular velocity (rad/s), Koopman LQR versus desired; resulting trajectory over x (m).]

Fig. 2: LQR-controlled robotic fish in simulation. The LQR gains are generated once using the learned Koopman operator. The desired trajectory is given in terms of the forward and angular velocity. Despite using fixed LQR gains, the controlled system successfully tracks the desired trajectories, which were designed to produce a figure-8 pattern.

[Table II body partially lost in extraction; surviving rows read "30 45" and "30 50".]

TABLE II: Amplitude and bias for the tail-beat oscillations of the robotic fish used to collect experimental data to train the Koopman operator. The actuation is kept constant throughout each of the 22 runs (2 trials repeated for each combination of the controls).
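The tail command and its actuation limits from Section IV can be sketched as follows. This is a minimal illustration under our own assumptions: enforcing the stated ranges by clipping is our choice (the text only gives the ranges), and the constant and function names are hypothetical. Angles are in degrees and $\omega_a$ is in rad/s.

```python
import numpy as np

# Actuation limits and tail-beat frequency as stated in the text.
ALPHA0_LIMITS = (-50.0, 50.0)  # bias alpha_0 (deg)
ALPHAA_LIMITS = (0.0, 30.0)    # amplitude alpha_a (deg)
OMEGA_A = 2.0 * np.pi          # fixed frequency omega_a (rad/s)


def saturate_tail_params(alpha0, alphaa):
    """Clip a requested (bias, amplitude) pair to the admissible ranges.

    Clipping is an assumption; the text only states the ranges.
    """
    return (float(np.clip(alpha0, *ALPHA0_LIMITS)),
            float(np.clip(alphaa, *ALPHAA_LIMITS)))


def tail_angle(t, alpha0, alphaa, omega_a=OMEGA_A):
    """Tail angle alpha(t) = alpha_0 + alpha_a sin(omega_a t), in degrees."""
    alpha0, alphaa = saturate_tail_params(alpha0, alphaa)
    return alpha0 + alphaa * np.sin(omega_a * np.asarray(t, dtype=float))


# One second of a constant-parameter run sampled every 0.005 s, using the
# bias and amplitude of Fig. 3(b): alpha_0 = 45 deg, alpha_a = 25 deg.
t = np.arange(0.0, 1.0, 0.005)
alpha = tail_angle(t, 45.0, 25.0)
```

With constant $(\alpha_0, \alpha_a)$ per run, as in the experiments, the tail oscillates between $\alpha_0 - \alpha_a$ and $\alpha_0 + \alpha_a$.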
[Figure 3, first run: x (m), y (m), orientation (rad), v_x (m/s), v_y (m/s), and angular velocity (rad/s) versus t (sec).]

experimental run, we apply a constant tail bias and amplitude for the oscillations of the tail fin. We perform 22 experimental runs, with two trials for each of eleven different combinations of actuation parameters. The tail-beat frequency is $\omega_a = 2\pi$ rad/s for all runs. The actuation patterns used for the experimental data are shown in Table II.
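Assembling the training pairs for (2) from such runs can be sketched as below. This is a minimal illustration under our own assumptions. Each run is resampled onto a uniform grid, here using the $dt = 0.005$ s interpolation mentioned for Fig. 3, so that $\delta t$ is consistent across all $P$ pairs as (2) to (4) require. Pairs never straddle two runs. The observables are the hypothetical minimal choice $\Psi = [s, u]$ with $u = (\alpha_0, \alpha_a)$ held constant per run. How the authors actually assembled their data is not specified here, and the function names are hypothetical.

```python
import numpy as np


def resample_run(t, states, dt=0.005):
    """Linearly interpolate each state column onto a uniform grid of spacing dt."""
    t = np.asarray(t, dtype=float)
    states = np.asarray(states, dtype=float)
    n = int(np.floor((t[-1] - t[0]) / dt + 1e-9)) + 1  # grid points within the run
    grid = t[0] + dt * np.arange(n)
    cols = [np.interp(grid, t, states[:, j]) for j in range(states.shape[1])]
    return grid, np.column_stack(cols)


def training_pairs(runs, dt=0.005):
    """Stack (Psi(s_k, u_k), Psi(s_{k+1}, u_{k+1})) pairs from several runs.

    Each run is (timestamps, states, u) with u = (alpha_0, alpha_a) constant.
    Psi = [s, u] is a hypothetical minimal choice of observables.
    """
    now, nxt = [], []
    for t, states, u in runs:
        _, s = resample_run(t, states, dt)
        U = np.tile(np.asarray(u, dtype=float), (s.shape[0], 1))
        psi = np.hstack([s, U])
        now.append(psi[:-1])  # pairs stay within a single run
        nxt.append(psi[1:])
    return np.vstack(now), np.vstack(nxt)
```

The two stacked arrays can be passed directly to a least-squares fit of the form (3) and (4), with rows as samples.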
continuously based on the initial states of each of the 22 experimental runs. Then, we compare the resulting simulated trajectories against the corresponding experimental ones. For the purposes of illustration, we show two such comparisons in Fig. 3. The linear Koopman model, although not perfect, reasonably follows the experimental data for at least five seconds. We believe that the modeling accuracy is limited by the averaged model (15) used to describe the longer-term behavior of the dynamics, rather than the original dynamic model. To improve the fit, one might avoid the averaged model and instead use Kirchhoff's equations for a rigid body in a fluid environment. Alternatively, one could use a system identification algorithm, such as SINDy [23], to first obtain a model for the nonlinear dynamics of the system. We plan to explore these avenues in future work.

Next, we use the Koopman operator, choose $Q$ and $R$ to define the minimization problem (13), and calculate the infinite-horizon LQR gains. We then run experiments using the feedback policy in (14) to track a line, an arc, and a circle. Feedback is implemented at a slow rate (1 Hz) due to speed limitations in the image processing (about 4 Hz) used for the estimation of the states.

We compare our method to backstepping feedback that uses the model in (15) to calculate control responses online. Backstepping is a widely used method that offers a systematic way of synthesizing control for a system in strict-feedback form [30], [31]. Via Lyapunov analysis, it guarantees the stability of the system, and it allows the accommodation of input constraints. Backstepping control generally requires low computational effort, which is often necessary for online applications. As is seen in Fig. 4, the proposed data-driven policy is comparable to the more sophisticated, model-based backstepping controller. This result is promising, given that Koopman-based LQR does not require the special form of dynamics that is needed for backstepping control.

[Figure 3, second run, panel (b) α_0 = 45°, α_a = 25°: x (m), y (m), v_x (m/s), v_y (m/s), and angular velocity (rad/s) versus t (sec).]

Fig. 3: Fit between the Koopman model and experimental measurements. The green line shows data interpolated from experimental measurements (blue dots) every dt = 0.005 s. The red line shows the evolution of the states using the Koopman model. The actuation is constant for each of the two runs and is indicated in the caption.

[Figure 4 panels: Koopman LQR; Straight Line; Arc Line; Circle.]

While we validate our approach with an example of tracking in a fluid environment, the proposed method can be used for any system that can benefit from data-driven methods or from a reduction of the nonlinearity. However, underwater robotics is perhaps the most suitable application for this method, due to the inherent environmental uncertainty, the highly nonlinear dynamics, and the need for controllers that use limited computation (to preserve battery life or because of limited computational power). While this method could certainly be applied to other systems, it would perhaps not be as useful for low-dimensional systems with known dynamics and few nonlinearities.

In the future, we will extend both the experimental and theoretical aspects of this work. Specifically, we plan on validating our method on a gliding robotic fish, currently under development, moving in a 3D underwater environment. We also hope to compare our method against using traditional system identification tools to obtain the nonlinear dynamics. Further, we wish to provide formal guarantees for the optimality of our proposed structure of the basis functions, as well as describe a process that can provably capture a nonlinear system in a linear setting up to the desired order of accuracy.