2017 Model-Based Control Using Koopman Operators
Abstract—This paper explores the application of Koopman operator theory to the control of robotic systems. The operator is introduced as a method to generate data-driven models that

of data to perform model-based control of various dynamical systems [11]. Nonetheless, several questions about the training
arXiv:1709.01568v1 [cs.RO] 5 Sep 2017
$$y_k = g(x_k) = \Psi(x_k), \tag{5}$$
where $\Psi(x)$ is a user-defined vector-valued function
$$\Psi(x) = [\psi_1(x), \psi_2(x), \ldots, \psi_N(x)]. \tag{6}$$
Next, the relation described by (4) is now given as
$$\Psi(x_{k+1}) = \Psi(x_k) K + r(x_k), \tag{7}$$
where $K \in \mathbb{C}^{N \times N}$ and $r(x_k)$ is a residual (approximation error). Note that the matrix $K$ advances $\Psi$ forward one time

Note that the state of the system, $x \in \mathbb{R}^n$, is now included in $\Psi(x)$. Thus we can write the approximate dynamical equations of the considered system as
$$x_{k+1} \approx \hat{K}^\top \Psi(x_k)^\top, \tag{15}$$
where $\hat{K}^\top \in \mathbb{R}^{n \times N}$ is the transpose of the first $n$ columns of $K$. Note that equation (15) simply propagates forward the quantities of interest (e.g., system states). Furthermore, in this work, $x_{k+1}$ is described as a linear combination of the system state, $x_k$, and the functions $\psi_i(x_k)$.
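As an illustration of how $K$ in (7) can be estimated and then used in (15), here is a minimal least-squares sketch in the spirit of extended dynamic mode decomposition [18]. It is not the authors' code. It assumes a hypothetical scalar system $x_{k+1} = 0.9x_k + 0.1x_k^2$ and a hypothetical basis $\Psi(x) = [x, x^2, 1]$ with the state listed first, so that $\hat{K}$ is the first column of $K$ ($n = 1$). Pure Python keeps the block self-contained; a library least-squares routine would be the practical choice.

```python
import random


def psi(x):
    """Hypothetical basis Psi(x) = [x, x^2, 1]; the state comes first, as (15) assumes."""
    return [x, x * x, 1.0]


def solve(a, b):
    """Solve a X = b (a: N x N, b: N x M, nested lists) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                fac = aug[r][c] / aug[c][c]
                aug[r] = [v - fac * w for v, w in zip(aug[r], aug[c])]
    return [[aug[i][j] / aug[i][i] for j in range(n, len(aug[i]))] for i in range(n)]


def fit_koopman(pairs):
    """Least-squares K minimizing sum_k ||Psi(x_{k+1}) - Psi(x_k) K||^2, i.e. the residual r in (7)."""
    rows = [psi(x) for x, _ in pairs]
    rows_next = [psi(y) for _, y in pairs]
    dim = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(dim)] for i in range(dim)]
    cross = [[sum(r[i] * s[j] for r, s in zip(rows, rows_next)) for j in range(dim)]
             for i in range(dim)]
    return solve(gram, cross)  # K = (Psi^T Psi)^{-1} Psi^T Psi_next


def predict(k_mat, x, n_state=1):
    """Equation (15): x_{k+1} ~ K_hat^T Psi(x_k)^T with K_hat = first n_state columns of K."""
    p = psi(x)
    return [sum(k_mat[i][j] * p[i] for i in range(len(p))) for j in range(n_state)]


def toy_step(x):
    """Assumed toy dynamics, used only to generate training pairs."""
    return 0.9 * x + 0.1 * x * x


random.seed(0)
data = [(x, toy_step(x)) for x in (random.uniform(-1.0, 1.0) for _ in range(200))]
K = fit_koopman(data)
print("first column of K:", [round(K[i][0], 6) for i in range(3)])
print("predicted vs true x_1 from x_0 = 0.5:", predict(K, 0.5)[0], toy_step(0.5))
```

Because $x_{k+1}$ happens to be exactly linear in this $\Psi$, the first column of $K$ recovers the coefficients 0.9, 0.1 and 0. The $x^2$ column is where the residual $r(x_k)$ of (7) shows up, since $x_{k+1}^2$ contains $x^3$ and $x^4$ terms outside the span of $\Psi$.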
III. CONTROL SYNTHESIS: OPEN- AND CLOSED-LOOP CONTROLLERS

In this section we formulate open- and closed-loop model-based controllers using the Koopman operator. It is first shown that for a differentiable choice of basis function Ψ, the Koopman operator has a linearization that can be computed for model-based control methods. Given the linearizable Koopman operator, a model-based optimal control problem is formulated for open- and closed-loop controllers.

A. Koopman Operator Linearization

By choosing a Ψ that is differentiable, the Koopman operator approximation to the dynamical system can be linearized:
\begin{align}
x_{k+1} &\approx \hat{K}^\top \frac{\partial \Psi}{\partial x}\, x_k \tag{16}\\
&\approx A(x_k)\, x_k. \tag{17}
\end{align}
Control inputs are readily incorporated into the definition of Ψ as an augmented state,
$$\Psi(x, u) = [x^\top, u^\top, \psi_1(x, u), \psi_2(x, u), \ldots, \psi_N(x, u)]. \tag{18}$$
This yields the approximate dynamical equations
$$x_{k+1} \approx \hat{K}^\top \Psi(x_k, u_k)^\top \tag{19}$$
and the linearization of the approximate dynamical equations,
\begin{align}
x_{k+1} &\approx \hat{K}^\top \frac{\partial \Psi}{\partial x}\, x_k + \hat{K}^\top \frac{\partial \Psi}{\partial u}\, u_k \tag{20}\\
&\approx A(x_k, u_k)\, x_k + B(x_k, u_k)\, u_k. \tag{21}
\end{align}
Note that linearizable equations of motion of a dynamical system can be computed solely from data.

B. Optimal Control Problem

Control synthesis for trajectory optimization is generated for mobile robot dynamics of the form
$$x_{k+1} = f(x_k, u_k), \tag{22}$$
where $x \in \mathbb{R}^n$ is the state and $u \in \mathbb{R}^m$ is the control input. For a discrete system, we can solve for a trajectory that minimizes the objective defined as
$$J = \sum_{k=0}^{N} \frac{1}{2}(x_k - \tilde{x}_k)^\top P (x_k - \tilde{x}_k) + \frac{1}{2} u_k^\top R u_k, \tag{23}$$
where $P \in \mathbb{R}^{n \times n}$ and $R \in \mathbb{R}^{m \times m}$ are positive definite weight matrices on state and control and $\tilde{x}_k$ is the reference trajectory at time $k$. Note that the accuracy of the system model (22) will largely determine the effectiveness of the synthesized optimal control.

Open-Loop: Open-loop trajectory optimization precomputes the set of trajectory and control actions that minimize the objective function (23) subject to the modeled dynamical constraints in (22). Projection-based optimization [19] is used in discrete time to generate the set of trajectory and control actions given an initial trajectory $x_k$ and control $u_k$ for $k \in [0, N]$. In the experiment, the projection-based optimization algorithm first generates the control actions based on the dynamical model; the command signals are then sent to the robot at a fixed rate via Bluetooth. Odometry data is collected only for post-processing and is not used to update the command signals.

Closed-Loop: In the simulated and the experimental work, a discrete-time version of Sequential Action Control (SAC) [20] is used with the Koopman operator to generate closed-loop optimal control calculations. However, any MPC technique can be used with the Koopman operator. Here, SAC operates by first forward simulating an open-loop trajectory over some horizon $N$ for a control-affine dynamical system given by
$$x_{k+1} = f(x_k, u_k) = g(x_k) + h(x_k)\, u_k. \tag{24}$$
The sensitivity of the objective function to a control injection at any given discrete time is given as
$$\frac{dJ}{d\lambda_k} = \rho_k^\top \big(f_2(k) - f_1(k)\big), \tag{25}$$
where
\begin{align}
f_1(k) &= f(x_k, u_{0,k}), \tag{26}\\
f_2(k) &= f(x_k, u^\star_k) \tag{27}
\end{align}
are the dynamics subject to the default control $u_{0,k}$ and the derived control $u^\star_k$. The co-state variable $\rho_k \in \mathbb{R}^n$ is computed by simulating the following discrete equation backwards in time:
$$\rho_{k-1} = \frac{\partial l_k}{\partial x} + \left(\frac{\partial f_k}{\partial x}\right)^{\!\top} \rho_k, \tag{28}$$
where $l_k = \frac{1}{2}(x_k - \tilde{x}_k)^\top P (x_k - \tilde{x}_k) + \frac{1}{2} u_k^\top R u_k$ and $f_k = f(x_k, u_{0,k})$ for some default $u_{0,k}$, subject to $\rho_N = \vec{0}$. The optimal control $u^\star_k$ is computed by first defining a secondary objective function as
$$J_u = \sum_{k=0}^{N} \frac{1}{2}\left(\frac{dJ}{d\lambda_k} - \alpha_d\right)^2 + \frac{1}{2}\lVert u^\star_k - u_{0,k} \rVert_R^2. \tag{29}$$
The objective (29) is convex in $u^\star_k$ and is minimized when
$$u^\star_k = (\Lambda + R^\top)^{-1} h(x_k)^\top \rho_k\, \alpha_d + u_{0,k}, \tag{30}$$
where $\Lambda = h(x_k)^\top \rho_k \rho_k^\top h(x_k)$. Given the sequence of actions $u^\star_k$, it is then possible to calculate the time of control application $t^\star_k$ as
$$t^\star_k = \arg\min \frac{dJ}{d\lambda_k}. \tag{31}$$
The control duration in discrete time is found using an outward line search [21] for a sufficient descent on the cost.
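To make (20), (21), (28) and (30) concrete, here is a minimal sketch of one SAC pass on a Koopman model. Everything specific is a hypothetical assumption rather than the authors' implementation: a two-state, one-input system, a made-up $\hat{K}^\top$, a basis $\Psi(x,u)$ that is affine in $u$ (so the model is control-affine as in (24), with $h(x_k) = B(x_k)$), and central differences for the Jacobians. The closed form (30) follows from (29) term by term: with $v = u^\star_k - u_{0,k}$ and $b = h(x_k)^\top \rho_k$, the $k$-th term of (29) is $\frac{1}{2}(b^\top v - \alpha_d)^2 + \frac{1}{2}v^\top R v$, whose gradient $b(b^\top v - \alpha_d) + Rv$ vanishes at $v = (bb^\top + R)^{-1} b\,\alpha_d$, and $bb^\top = \Lambda$.

```python
# Hypothetical fitted operator K_hat^T (n x dim Psi) for n = 2 states and m = 1 input.
K_HAT_T = [
    [1.0, 0.1, 0.0, 0.0, 0.0],
    [-0.2, 0.95, 0.1, -0.05, 0.02],
]


def psi(x, u):
    """Assumed basis (18): [x1, x2, u1, x1^2, x2*u1]; affine in u, hence control-affine (24)."""
    return [x[0], x[1], u[0], x[0] ** 2, x[1] * u[0]]


def f(x, u):
    """Koopman model (19): x_{k+1} ~ K_hat^T Psi(x_k, u_k)^T."""
    p = psi(x, u)
    return [sum(w * q for w, q in zip(row, p)) for row in K_HAT_T]


def jacobian(fn, z, eps=1e-6):
    """Central-difference Jacobian: jac[i][j] = d fn_i / d z_j."""
    cols = []
    for j in range(len(z)):
        zp, zm = list(z), list(z)
        zp[j] += eps
        zm[j] -= eps
        cols.append([(a - b) / (2.0 * eps) for a, b in zip(fn(zp), fn(zm))])
    return [list(row) for row in zip(*cols)]


def solve(a, b):
    """Solve the small linear system a v = b by Gauss-Jordan elimination."""
    n = len(a)
    aug = [a[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                fac = aug[r][c] / aug[c][c]
                aug[r] = [v - fac * w for v, w in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def sac_actions(x0, x_ref, u_default, P, R, alpha_d):
    """One SAC pass: forward rollout, co-state (28), closed-form actions (30)."""
    N, n, m = len(u_default), len(x0), len(u_default[0])
    xs = [list(x0)]
    for k in range(N):  # forward simulate under the default control
        xs.append(f(xs[k], u_default[k]))
    rho = [[0.0] * n for _ in range(N + 1)]  # terminal condition rho_N = 0
    for k in range(N, 0, -1):  # backward pass of (28); dl_k/dx = P (x_k - x~_k) for symmetric P
        grad = [sum(P[i][j] * (xs[k][j] - x_ref[k][j]) for j in range(n)) for i in range(n)]
        if k < N:
            a_k = jacobian(lambda z: f(z, u_default[k]), xs[k])  # A(x_k) as in (20)-(21)
            grad = [g + sum(a_k[j][i] * rho[k][j] for j in range(n)) for i, g in enumerate(grad)]
        rho[k - 1] = grad
    actions = []
    for k in range(N):
        h = jacobian(lambda v: f(xs[k], v), u_default[k])  # h(x_k) = B(x_k), n x m
        b = [sum(h[i][j] * rho[k][i] for i in range(n)) for j in range(m)]  # h^T rho_k
        lam_plus_r = [[b[i] * b[j] + R[i][j] for j in range(m)] for i in range(m)]  # R symmetric
        v = solve(lam_plus_r, [bi * alpha_d for bi in b])
        actions.append([u + dv for u, dv in zip(u_default[k], v)])
    return xs, rho, actions


P = [[60.0, 0.0], [0.0, 0.1]]  # illustrative symmetric weights
R = [[20.0]]
N = 10
x_ref = [[0.0, 0.0] for _ in range(N + 1)]
u_default = [[0.0] for _ in range(N)]
xs, rho, u_star = sac_actions([0.5, 0.0], x_ref, u_default, P, R, alpha_d=-1.0)
print([round(u[0], 4) for u in u_star])
```

In a full SAC loop only the action at the time selected by (31) would be applied, for a duration found by the line search; those two steps are omitted here. With $\alpha_d < 0$, each candidate action has a non-positive sensitivity $dJ/d\lambda_k$, which is the descent property SAC relies on.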
IV. EXPERIMENTS USING SPHERO SPRK

In this section, we describe the experimental set-up for use of the Sphero SPRK robot with model-based control algorithms that utilize a state-space model generated via the Koopman operator. In particular, we define data-driven closed- and open-loop model predictive controllers, and we motivate and explore the utility of the Koopman operator for control of a robotic system.

In the experiments with the SPRK, trajectory optimization is run in both open-loop and closed-loop feedback form. Here, the tracked states of the robot are position $x, y$ and velocity $\dot{x}, \dot{y}$, and the inputs to the robot are desired velocities $u_1, u_2$. The objective function parameters are defined as $P = \mathrm{diag}([60, 60, 0.1, 0.1])$ and $R = \mathrm{diag}([20, 20])$ and are held constant throughout both the open-loop and closed-loop experiments. An additional set of experiments is performed to show the use of the Koopman operator for control in a sand environment.

A. SPRK

The SPRK is a differential drive mobile robot enclosed in a spherical case. The dynamics of the SPRK are driven by the nonlinear coupling between the internal mechanism and the outer spherical casing. In addition, proprietary underlying controllers govern how the command velocities are translated into low-level motor commands. The proprietary embedded software uses the on-board gyro-accelerometers to balance the robot upright while rolling. The caster wheels on top of the internal mechanism ensure constant contact of the lower wheels, which are driven by two motors. The embedded software accepts heading and velocity (or $x$-$y$ velocity) command inputs sent via Bluetooth. A high-fidelity model of the robot would include several internal states characterizing the internal mechanism and controller. However, rather than seeking to approximate a high-dimensional model, a reduced-state model was sought.

Figure 1 shows a close-up view of the SPRK robot. Odometry is collected using an Xbox Kinect with OpenCV [23] image processing. More details about odometry and motion capture are given in the caption of Fig. 1.

Fig. 1. The Sphero SPRK robot is shown with its clear spherical casing revealing the underlying mechanism. The internal mechanism shifts the center of mass by rolling and rotating within the spherical enclosure, causing the SPRK to roll. RGB LEDs on the top of the SPRK are used to track the odometry of the robot through an Xbox Kinect with the OpenCV and OpenKinect libraries for image processing and motion capture. ROS [22] is used to transmit and collect data at 20 Hz.

B. SPRK Koopman Operator

The representation of the system consists of the position of the robot $(x, y)$, its velocity $(\dot{x}, \dot{y})$, and the commanded velocity $(u_x, u_y)$. Odometry data from the Kinect paired with recorded velocity commands are used to generate the approximate Koopman operator. The vector-valued functions used in this experiment are polynomial basis functions given as
$$\Psi(x) = [x, y, \dot{x}, \dot{y}, u_x, u_y, 1, \psi_1, \psi_2, \ldots, \psi_M], \tag{32}$$
$$\psi_i(x) = \dot{x}^{\alpha_i} \dot{y}^{\beta_i}, \tag{33}$$
where $\alpha_i$ and $\beta_i$ are nonnegative integers, the index $i$ enumerates all combinations such that $\alpha_i + \beta_i \le Q$, and $Q \ge 1$ defines the largest allowed polynomial degree. We ignore higher-order position dependence in the operator in order to prevent possible overfitting to position-based external disturbances. The approximate Koopman operator was computed using data captured while the robot was operating at velocities under 1 m/s for the open-loop trials.

V. RESULTS

A. Simulation: Mechanical Energy

In this section, the equations of motion of a double pendulum are approximated with the method described in Section II. Both pendulums have a mass of 1 kg and a length of 1 m, and each mass is assumed to be concentrated at the end of its link. The system is conservative and subject to a gravitational field (9.81 m/s²).

The state of the system, $x$, is described by the relative angles of the pendulums with respect to the vertical ($\theta_1$ and $\theta_2$) and their time derivatives ($\dot{\theta}_1$ and $\dot{\theta}_2$). Data was collected by simulating the system multiple times with random initial conditions given by
$$x_0 = [\,\mathcal{U}(-1,1)\, l_{\theta_1},\ \mathcal{U}(-1,1)\, l_{\theta_2},\ \mathcal{U}(-1,1)\, l_{\dot{\theta}_1},\ \mathcal{U}(-1,1)\, l_{\dot{\theta}_2}\,],$$
where $\mathcal{U}(-1,1)$ is a uniformly distributed random variable with range $-1$ to $1$. Furthermore, $l_{\theta_1} = l_{\theta_2} = \pi/3$ and $l_{\dot{\theta}_1} = l_{\dot{\theta}_2} = 0.5$. Therefore, the initial condition is uniformly distributed around the origin, which is the stable equilibrium, and its range is defined by $L = [l_{\theta_1}, l_{\theta_2}, l_{\dot{\theta}_1}, l_{\dot{\theta}_2}]$. Any data point that fell outside the range defined by $L$ was not used to approximate the Koopman operator. Data collection occurred at 100 Hz and was stopped when 2,000 data points were collected.

The vector-valued functions used in this numerical experiment are polynomial basis functions given as
$$\Psi(x) = [\theta_1, \theta_2, \dot{\theta}_1, \dot{\theta}_2, 1, \psi_1, \psi_2, \ldots, \psi_M], \tag{34}$$
$$\psi_i(x) = (\theta_1/l_{\theta_1})^{\alpha_i} (\theta_2/l_{\theta_2})^{\beta_i} (\dot{\theta}_1/l_{\dot{\theta}_1})^{\gamma_i} (\dot{\theta}_2/l_{\dot{\theta}_2})^{\delta_i} \tag{35}$$
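The index sets behind (33) and (35) can be enumerated mechanically. The sketch below is an illustration under assumptions, not the authors' code: it lists every exponent tuple with total degree at most $Q$, evaluates the scaled monomials of (35) (unit scales give (33)), and draws initial conditions $x_0 = \mathcal{U}(-1,1)\,L$, discarding data points outside $L$. The range values come from the text; the function names and the choice to skip degree-0 and degree-1 monomials (which (32) and (34) already list explicitly) are hypothetical.

```python
import itertools
import math
import random

L_RANGE = [math.pi / 3, math.pi / 3, 0.5, 0.5]  # [l_theta1, l_theta2, l_dtheta1, l_dtheta2] from the text


def exponent_tuples(n_vars, q, min_degree=2):
    """Exponent tuples (alpha, beta, ...) with min_degree <= total degree <= q.

    min_degree=2 is an assumption: constant and linear terms already appear
    explicitly in (32) and (34), so they are skipped to avoid duplicate entries.
    """
    tuples = []
    for degree in range(min_degree, q + 1):
        for combo in itertools.product(range(degree + 1), repeat=n_vars):
            if sum(combo) == degree:
                tuples.append(combo)
    return tuples


def monomials(z, scales, exps):
    """prod_j (z_j / scale_j)^e_j for each exponent tuple; unit scales give (33), L gives (35)."""
    return [math.prod((zj / sj) ** e for zj, sj, e in zip(z, scales, ex)) for ex in exps]


def psi_sprk(state, command, q):
    """(32)-(33): [x, y, xd, yd, ux, uy, 1, psi_1, ..., psi_M] with psi_i = xd^a * yd^b."""
    x, y, xd, yd = state
    return [x, y, xd, yd, *command, 1.0] + monomials([xd, yd], [1.0, 1.0], exponent_tuples(2, q))


def psi_double_pendulum(state, q):
    """(34)-(35): [theta1, theta2, dtheta1, dtheta2, 1, psi_1, ..., psi_M], scaled by L."""
    return list(state) + [1.0] + monomials(state, L_RANGE, exponent_tuples(4, q))


def sample_initial_condition(rng):
    """x0 = [U(-1,1) l_theta1, U(-1,1) l_theta2, U(-1,1) l_dtheta1, U(-1,1) l_dtheta2]."""
    return [rng.uniform(-1.0, 1.0) * lim for lim in L_RANGE]


def inside_range(state):
    """Data points outside the box defined by L are discarded before fitting."""
    return all(abs(s) <= lim for s, lim in zip(state, L_RANGE))


print(len(exponent_tuples(2, 3)))  # 7 (degree-2 and degree-3 monomials in xd, yd)
rng = random.Random(1)
x0 = sample_initial_condition(rng)
print(inside_range(x0), len(psi_double_pendulum(x0, 2)))  # True 15
```

With $Q = 3$ the SPRK basis contains $\dot{x}^2, \dot{x}\dot{y}, \dot{y}^2, \dot{x}^3, \dot{x}^2\dot{y}, \dot{x}\dot{y}^2, \dot{y}^3$, which covers the quadratic and cubic velocity terms that appear in the learned velocity update (6).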
[Fig. 2 panels: Q = 1, Q = 2, Q = 3; simulated vs. predicted trajectories; prediction error]
Fig. 2. Simulated trajectories when the approximate Koopman operator was used to propagate the system's configuration. As the complexity of Ψ increases, so does the accuracy of the prediction. 100 trials with uniformly random initial conditions were conducted to investigate the relationship between accuracy and total mechanical energy. The prediction error tended to increase with total mechanical energy.
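Because Fig. 2 relates prediction error to total mechanical energy, a short helper for that energy may be useful when reproducing the experiment. This is a sketch using the parameters stated in Section V-A (1 kg point masses at the ends of 1 m links, g = 9.81 m/s²), with two added assumptions: $\theta_1$ and $\theta_2$ are each measured from the downward vertical, and potential energy is zero at the hanging equilibrium. The function name is hypothetical.

```python
import math

M1 = M2 = 1.0  # kg, point masses at the link ends
L1 = L2 = 1.0  # m, link lengths
G = 9.81       # m/s^2


def mechanical_energy(theta1, theta2, dtheta1, dtheta2):
    """Total energy T + V of a planar double pendulum with point masses.

    Assumes both angles are measured from the downward vertical and V = 0
    at the hanging equilibrium (theta1 = theta2 = 0).
    """
    kinetic = (0.5 * (M1 + M2) * L1**2 * dtheta1**2
               + 0.5 * M2 * L2**2 * dtheta2**2
               + M2 * L1 * L2 * dtheta1 * dtheta2 * math.cos(theta1 - theta2))
    potential = ((M1 + M2) * G * L1 * (1.0 - math.cos(theta1))
                 + M2 * G * L2 * (1.0 - math.cos(theta2)))
    return kinetic + potential


print(mechanical_energy(0.0, 0.0, 0.0, 0.0))  # 0.0 at rest
print(mechanical_energy(math.pi / 3, math.pi / 3, 0.5, 0.5))  # corner of the sampling box L
```

Evaluating this at each sampled initial condition gives the energy axis against which prediction error can be plotted; since the system is conservative, the value stays constant along each simulated trajectory.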
[Fig. 3 panels: Fourier basis and polynomial basis; cart position x (m) versus time (s); attempted swing-up]
Fig. 3. The progressive improvement in control as the basis complexity Q of the Koopman operator increases. Each pendulum configuration is taken as a snapshot in time. Koopman operators with complexity Q are trained on the first 20 seconds of data collected with the nominal model. Note that because the pendulum angle evolves on SO(2), a periodic configuration, a Fourier basis of complexity Q = 1 is sufficient to invert and stabilize the cart-pendulum. Adding a higher complexity Q = 2 does not provide a different Koopman matrix (this does not necessarily hold true for non-simulated systems). It is interesting to note that as the complexity of the polynomial basis increases, so does the number of attempts at swinging up the cart-pendulum. Link to multimedia provided: https://vimeo.com/219458009.
[24]. For this example, the problem of inverting the pendulum attached to a VTOL is slightly modified. Specifically, it is assumed that a well-known model of the VTOL exists, but the interaction between the VTOL and the pendulum remains unknown. Thus, the goal of this simulated example is to generate a Koopman operator that describes the interaction between the VTOL and the pendulum.

In this example, the Koopman operator is redefined as an augmentation to a dynamical system:
$$x_{k+1} = f(x_k, u_k) + \tilde{K}^\top \Psi(x_k, u_k)^\top. \tag{41}$$
By subtracting the current nominal model of the system, $f(x_k, u_k)$, from both sides of equation (41) and treating $x_{k+1}$ as the measurement of state, we can define the following nonlinear process, which can be used to generate a Koopman operator:
$$\Psi(x_{k+1}) = x_{k+1} - f(x_k, u_k) = \tilde{K}^\top \Psi(x_k, u_k)^\top. \tag{42}$$
Given the previous cart-pendulum result, we see that the interaction between the VTOL and the pendulum can be captured solely via a large set of basis functions across the state of the VTOL-pendulum system. In Fig. 4, the VTOL is shown attempting to invert and balance the attached pendulum using the Koopman operator. Each successive Koopman operator, of increasing complexity, is generated from the first 20 seconds of data. Starting from the nominal model, it can be seen that the swinging behavior captures a portion of the energy-pumping maneuvers required to invert the pendulum. As the Koopman basis order increases, so does the refinement in control authority. When Q = 2 for the polynomial basis, the swing-up attempts are more successful. Once the Koopman operator generated from the Fourier basis functions is used, the controller generates the appropriate control strategy to swing up and invert the pendulum.

In the following section, our discussion of the use of the Koopman operator is extended to control of a Sphero SPRK robot in a reduced-state setting.

C. SPRK Experiments

1) Open-Loop Trajectory Optimization: Figure 5 shows trajectories generated using the open-loop controller with varying Q. The reference trajectory is given as
$$\begin{bmatrix} \tilde{x} \\ \tilde{y} \\ \dot{\tilde{x}} \\ \dot{\tilde{y}} \end{bmatrix} = \begin{bmatrix} r\cos(vt) \\ r\sin(2vt) \\ -rv\sin(vt) \\ 2rv\cos(2vt) \end{bmatrix}, \tag{43}$$
where $r = 0.5$ and $v = 1.3$. The reference trajectory was made sufficiently aggressive to excite the system's internal nonlinearities.

As expected, the system's performance in tracking the reference trajectory improves with increasing Q. In particular, as Q goes from 1 to 2, visibly less drift is noted at the end of the resulting open-loop trajectory. As Q is further increased, more complexity is added to the description of the SPRK via the Koopman operator, which in turn reduces drift and improves the tracking performance. Furthermore, the standard deviation of the tracking error across trials is shown to decrease as Q is increased. This implies consistency in the behavior of the robot under the controller. Therefore, it can be concluded that the approximate Koopman operator is better able to represent the dynamics of the system as the complexity of Ψ increases.
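The figure-eight reference (43) can be sampled directly. The following is a minimal sketch, assuming sampling at 20 Hz and supporting the closed-loop parameters ($r_x = 0.7$, $r_y = 0.4$, $v = 0.9$) through separate radii, where $r_x$ scales the $x$ rows and $r_y$ the $y$ rows. That reading of "r is split into two components" is an interpretation, and the function names are hypothetical. As a sanity check, the velocity rows of (43) should match finite differences of the position rows.

```python
import math


def reference(t, rx=0.5, ry=0.5, v=1.3):
    """Figure-eight reference (43): [x~, y~, dx~/dt, dy~/dt] at time t.

    Separate radii rx and ry are an assumption for the closed-loop case;
    with rx == ry == r this is exactly (43).
    """
    return [
        rx * math.cos(v * t),
        ry * math.sin(2.0 * v * t),
        -rx * v * math.sin(v * t),
        2.0 * ry * v * math.cos(2.0 * v * t),
    ]


def sample_reference(duration, rate_hz=20.0, **params):
    """Reference states x~_k at t_k = k / rate_hz for k = 0, ..., duration * rate_hz."""
    steps = int(round(duration * rate_hz))
    return [reference(k / rate_hz, **params) for k in range(steps + 1)]


open_loop_ref = sample_reference(10.0)                           # r = 0.5, v = 1.3
closed_loop_ref = sample_reference(10.0, rx=0.7, ry=0.4, v=0.9)  # r_x = 0.7, r_y = 0.4, v = 0.9
print(len(open_loop_ref), open_loop_ref[0])
```

The $y$ component runs at twice the frequency of the $x$ component, which is what produces the figure-eight shape and forces the robot to reverse its lateral velocity twice per lap.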
[Fig. 4 panels: Nominal Model, Polynomial (Q = 1), Polynomial (Q = 2), Fourier (Q = 1); pendulum end point over time]
Fig. 4. Each Koopman operator is trained on the residual modeling error from 20 seconds of attempted pendulum inversion using the nominal model. As the order of the polynomial basis increases from 1 to 2, the number of swing-up attempts also increases. Notably, a first-order Fourier basis captures the features needed for the controller to invert and stabilize the pendulum. Link to multimedia provided: https://vimeo.com/219458009.
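Equation (42) reduces learning the unknown interaction to a regression on the nominal model's residuals. The sketch below illustrates that idea under assumptions that do not come from the paper: a scalar toy system whose nominal model omits a $\sin x$ term and an $x u$ coupling, a hypothetical basis $\Psi(x, u) = [x, u, \sin x, xu, 1]$, and an ordinary least-squares fit for $\tilde{K}$. Because the true residual lies in the span of this basis, the fit recovers it exactly; with real data, the choice of basis decides how much of the interaction can be captured, as the polynomial and Fourier comparisons in Fig. 4 suggest.

```python
import math
import random


def nominal(x, u):
    """Assumed known nominal model f(x, u)."""
    return 0.98 * x + 0.05 * u


def true_step(x, u):
    """Toy 'real' system: nominal model plus an interaction the nominal model omits."""
    return nominal(x, u) - 0.03 * math.sin(x) + 0.01 * x * u


def psi(x, u):
    """Hypothetical basis Psi(x, u) for the residual model."""
    return [x, u, math.sin(x), x * u, 1.0]


def solve(a, b):
    """Solve the small linear system a v = b by Gauss-Jordan elimination."""
    n = len(a)
    aug = [a[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                fac = aug[r][c] / aug[c][c]
                aug[r] = [v - fac * w for v, w in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_residual_operator(samples):
    """Least-squares K_tilde for (42): x_{k+1} - f(x_k, u_k) ~ K_tilde^T Psi(x_k, u_k)^T (scalar state)."""
    rows = [psi(x, u) for x, u, _ in samples]
    targets = [x_next - nominal(x, u) for x, u, x_next in samples]
    dim = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(dim)] for i in range(dim)]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(dim)]
    return solve(gram, rhs)


def augmented_step(k_tilde, x, u):
    """Augmented model (41): nominal prediction plus the learned correction."""
    return nominal(x, u) + sum(w * p for w, p in zip(k_tilde, psi(x, u)))


rng = random.Random(0)
samples = []
for _ in range(300):
    x, u = rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)
    samples.append((x, u, true_step(x, u)))
k_tilde = fit_residual_operator(samples)
print([round(w, 4) for w in k_tilde])
```

The design keeps the well-understood part of the model (the vehicle) fixed and asks the data only for the correction, which is why a small basis can suffice even when the full system is strongly nonlinear.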
[Fig. 5 panels: trajectories in y (m); integrated error; standard deviation]
Fig. 5. Here we show reference tracking using open-loop trajectory optimization. The reference trajectory was made sufficiently aggressive to excite the system's internal nonlinearities that cannot be captured completely by the minimal state representation. The respective integrated tracking errors are shown to decrease with an increase in Q. This suggests that the approximate Koopman operator better represents the dynamics of the system with increasing complexity of Ψ.
2) Closed-Loop Trajectory Tracking: Figure 6 shows the experimental results for trajectory tracking on tarp and sand terrain using closed-loop model-based controllers with the Koopman operator. The optimal control signal was updated at 20 Hz, and the reference trajectory was given by equation (43), where r is split into two components, $r_x = 0.7$ and $r_y = 0.4$, with $v = 0.9$. The nominal linear model is given by
$$x_{k+1} = A x_k + B u_k, \tag{44}$$
where A and B define a fully controllable double-integrator system.

The effectiveness of the closed-loop controller is benchmarked by comparing the model generated from the Koopman operator to a simulated example in which the controller knows the true system model (Fig. 6). Using only the first 20 seconds of data from the nominal-model controller, we can see in Fig. 6 A) that as the operator increases in complexity, so does the performance of the controller relative to the benchmark test. Specifically, Fig. 6 B) shows the tracking error for experimental trials with increasing complexity of the Koopman operator. Notably, when Q = 3 in sand, the Koopman operator did not have a data set rich enough to span the higher-order terms in the operator. This can be fixed by collecting more data that spans the robot's operating region.

[Fig. 6 panels: A) Experimental Results with Sphero SPRK, Tarp Results and Sand Results (Target Trajectory, Simulated Baseline, Nominal Model, Q = 1, Q = 2, Q = 3; x (m) and y (m) versus time (s)); B) Tracking Error Across Operator Orders (integrated error versus basis order)]
Fig. 6. Here, we show closed-loop model-based control using sequentially increasing basis complexity Q in the Koopman operator. Two examples using the SPRK robot are run on a tarp and on sand. A simulated baseline is provided to show the best-case performance of the controller subject to the nominal model used. As the complexity of the operator's basis functions is increased, so does the tracking performance. Note that in B), the third-order operator used in sand (shown as the dashed red line) did not have a rich enough data set to provide a stable model, although it performed better than the nominal model. Link to multimedia provided: https://vimeo.com/219458009.

$$\begin{bmatrix} \dot{x}_{k+1} \\ \dot{y}_{k+1} \end{bmatrix} = \begin{bmatrix} 0.08x_k - 0.35y_k + 0.76\dot{x}_k + 0.21\dot{y}_k + 1.06u_1 - 0.17u_2 - 0.05\dot{x}_k^2 - 0.19\dot{y}_k^2 - 1.09\dot{x}_k\dot{y}_k^2 - 0.71\dot{y}_k\dot{x}_k^2 + 0.40 \\ -0.06x_k + 0.16y_k + 0.17\dot{x}_k + 0.87\dot{y}_k - 0.38u_1 + 0.57u_2 - 0.20\dot{x}_k^2 - 0.52\dot{y}_k^2 - 0.45\dot{x}_k\dot{y}_k^2 - 3.17\dot{y}_k\dot{x}_k^2 - 0.04 \end{bmatrix} \tag{6}$$

Here, the nonlinear dynamics driven by the internal mechanism become more apparent as the order of the operator is increased. In particular, equation (6) provides some insight into the output of the data-driven Koopman model for the update of the SPRK's velocity subject to control inputs. Because the effect of the internal mechanism's configuration (typically described on SO(3)) cannot be linearly approximated, the Koopman operator begins to approximate a Taylor expansion (6). Therefore, the Koopman operator captures the inherent nonlinearities that are utilized by the model-based controller with respect to the terrain. However, achieving a representation that performs consistently across all operating terrains seems infeasible with such limited information without extra structure on the Koopman operator, such as global Lie group structure or mechanical properties (e.g., symmetries).

VI. CONCLUSION

We present Koopman operator theory and focus on the practical implementation of the theory for model-based control. We derive a linearizable data-driven model using the Koopman operator. Closed-loop and open-loop controllers were formulated using the proposed data-driven model. The open-loop experiments reveal that the Koopman operator improves performance as the complexity of the basis increases. Closed-loop experiments reveal that the Koopman operator is able to capture the nonlinear dynamics of simulated examples with the cart- and VTOL-pendulum and of the SPRK robot.

Future research directions include an in-depth analysis of the choice of basis for dynamical systems with distinct structure (e.g., conservative systems, mechanical systems, etc.). The relationship between available states and the accuracy of the approximate Koopman operator needs rigorous stability analysis. Moreover, numerical stability analysis and algorithmic optimization are other possible research avenues.

REFERENCES

[1] N. Hovakimyan and C. Cao, L1 Adaptive Control Theory: Guaranteed Robustness with Fast Adaptation. SIAM, 2010.
[2] K. J. Åström and B. Wittenmark, Adaptive Control. Courier Corporation, 2013.
[3] K. Zhou and J. C. Doyle, Essentials of Robust Control. Prentice Hall, Upper Saddle River, NJ, 1998, vol. 104.
[4] D. S. Bernstein and W. M. Haddad, “LQG control with an H∞ performance bound: A Riccati equation approach,” IEEE Transactions on Automatic Control, vol. 34, no. 3, pp. 293–305, 1989.
[5] S. C. Ong, S. W. Png, D. Hsu, and W. S. Lee, “Planning under uncertainty for robotic tasks with mixed observability,” The International Journal of Robotics Research, vol. 29, no. 8, pp. 1053–1068, 2010.
[6] A. Bry and N. Roy, “Rapidly-exploring random belief trees for motion planning under uncertainty,” in International Conference on Robotics and Automation (ICRA), 2011, pp. 723–730.
[7] J. Van Den Berg, S. Patil, and R. Alterovitz, “Motion planning under uncertainty using differential dynamic programming in belief space,” in Robotics Research. Springer, 2017, pp. 473–490.
[8] D. Nguyen-Tuong and J. Peters, “Model learning for robot control: A survey,” Cognitive Processing, vol. 12, no. 4, pp. 319–340, 2011.
[9] M. Jordan and T. Mitchell, “Machine learning: Trends, perspectives, and prospects,” Science, vol. 349, no. 6245, pp. 255–260, 2015.
[10] C. G. Atkeson, A. W. Moore, and S. Schaal, “Locally weighted learning for control,” in Lazy Learning. Springer, 1997, pp. 75–113.
[11] G. Williams, N. Wagener, B. Goldfain, P. Drews, J. M. Rehg, B. Boots, and E. A. Theodorou, “Information theoretic MPC for model-based reinforcement learning,” in International Conference on Robotics and Automation (ICRA), 2017.
[12] B. O. Koopman, “Hamiltonian systems and transformation in Hilbert space,” Proceedings of the National Academy of Sciences, vol. 17, no. 5, pp. 315–318, 1931.
[13] D. Henrion, I. Mezić, and M. Putinar, “Applied Koopmanism,” 2016.
[14] I. Mezić, “Analysis of fluid flows via spectral properties of the Koopman operator,” Annual Review of Fluid Mechanics, vol. 45, pp. 357–378, 2013.
[15] A. Mauroy and I. Mezić, “Global stability analysis using the eigenfunctions of the Koopman operator,” IEEE Transactions on Automatic Control, vol. 61, no. 11, pp. 3356–3369, 2016.
[16] I. Mezić, “On applications of the spectral theory of the Koopman operator in dynamical systems and control theory,” in Decision and Control (CDC), 2015, pp. 7034–7041.
[17] A. Broad, T. D. Murphey, and B. Argall, “Learning models for shared control of human-machine systems with unknown dynamics,” in Robotics: Science and Systems Proceedings, 2017.
[18] M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, “A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition,” Journal of Nonlinear Science, vol. 25, no. 6, pp. 1307–1346, 2015.
[19] J. Hauser, “A projection operator approach to the optimization of trajectory functionals,” IFAC Proceedings Volumes, vol. 35, no. 1, pp. 377–382, 2002.
[20] A. R. Ansari and T. D. Murphey, “Sequential action control: Closed-form optimal control for nonlinear and nonsmooth systems,” IEEE Transactions on Robotics, vol. 32, no. 5, pp. 1196–1214, Oct. 2016.
[21] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed., 2006.
[22] M. Quigley, K. Conley, B. P. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, “ROS: An open-source robot operating system,” in ICRA Workshop on Open Source Software, 2009.
[23] G. Bradski, Dr. Dobb’s Journal of Software Tools, 2000.
[24] T. Luukkonen, “Modelling and control of quadcopter,” Independent research project in applied mathematics, Espoo, 2011.