
Bridging Data Science and Dynamical Systems Theory

Tyrus Berry, Dimitrios Giannakis, and John Harlim


Modern science is undergoing what might arguably be called a “data revolution,” manifested by a rapid growth of observed and simulated data from complex systems, as well as vigorous research on mathematical and computational frameworks for data analysis. In many scientific branches, these efforts have led to the creation of statistical models of complex systems that match or exceed the skill of first-principles models. Yet, despite these successes, statistical models are oftentimes treated as black boxes, providing limited guarantees about stability and convergence as the amount of training data increases. Black-box models also offer limited insights about the operating mechanisms (physics), the understanding of which is central to the advancement of science.

In this short review, we describe mathematical techniques for statistical analysis and prediction of time-evolving phenomena, ranging from simple examples such as an oscillator, to highly complex systems such as the turbulent motion of the Earth’s atmosphere, the folding of proteins, and the evolution of species populations in an ecosystem. Our main thesis is that combining ideas from the theory of dynamical systems with learning theory provides an effective route to data-driven models of complex systems, with refinable predictions as the amount of training data increases, and physical interpretability through discovery of coherent patterns around which the dynamics is organized. Our article thus serves as an invitation to explore ideas at the interface of the two fields.

This is a vast subject, and invariably a number of important developments in areas such as deep learning, reservoir computing, control, and nonautonomous/stochastic systems are not discussed here.¹ Our focus will be on topics drawn from the authors’ research and related work.

Tyrus Berry is an assistant professor of mathematics at George Mason University. His email address is [email protected].
Dimitrios Giannakis is an associate professor of mathematics at New York University. His email address is [email protected].
John Harlim is a professor of mathematics and meteorology, and Faculty Fellow of the Institute for Computational and Data Sciences, at the Pennsylvania State University. His email address is [email protected].
Communicated by Notices Associate Editor Reza Malek-Madani.
For permission to reprint this article, please contact: [email protected].
¹See https://arxiv.org/abs/2002.07928 for a version of this article with references to the literature on these topics.
DOI: https://doi.org/10.1090/noti2151

Statistical Forecasting and Coherent Pattern Extraction

Consider a dynamical system of the form Φ𝑡 ∶ Ω → Ω, where Ω is the state space and Φ𝑡, 𝑡 ∈ ℝ, the flow map. For

1336 NOTICES OF THE AMERICAN MATHEMATICAL SOCIETY VOLUME 67, NUMBER 9


example, Ω could be Euclidean space ℝ𝑑, or a more general manifold, and Φ𝑡 the solution map for a system of ODEs defined on Ω. Alternatively, in a PDE setting, Ω could be an infinite-dimensional function space and Φ𝑡 an evolution group acting on it. We consider that Ω has the structure of a metric space equipped with its Borel 𝜎-algebra, playing the role of an event space, with measurable functions on Ω acting as random variables, called observables.

In a statistical modeling scenario, we consider that available to us are time series of various such observables, sampled along a dynamical trajectory which we will treat as being unknown. Specifically, we assume that we have access to two observables, 𝑋 ∶ Ω → 𝒳 and 𝑌 ∶ Ω → 𝒴, respectively referred to as covariate and response functions, together with corresponding time series 𝑥0, 𝑥1, …, 𝑥𝑁−1 and 𝑦0, 𝑦1, …, 𝑦𝑁−1, where 𝑥𝑛 = 𝑋(𝜔𝑛), 𝑦𝑛 = 𝑌(𝜔𝑛), and 𝜔𝑛 = Φ𝑛Δ𝑡(𝜔0). Here, 𝒳 and 𝒴 are metric spaces, Δ𝑡 is a positive sampling interval, and 𝜔0 is an arbitrary point in Ω initializing the trajectory. We shall refer to the collection {(𝑥0, 𝑦0), …, (𝑥𝑁−1, 𝑦𝑁−1)} as the training data. We require that 𝒴 be a Banach space (so that one can talk about expectations and other functionals applied to 𝑌), but allow the covariate space 𝒳 to be nonlinear.

Many problems in statistical modeling of dynamical systems can be expressed in this framework. For instance, in a low-dimensional ODE setting, 𝑋 and 𝑌 could both be the identity map on Ω = ℝ𝑑, and the task could be to build a model for the evolution of the full system state. Weather forecasting is a classical high-dimensional application, where Ω is the abstract state space of the climate system, and 𝑋 a (highly noninvertible) map representing measurements from satellites, meteorological stations, and other sensors available to a forecaster. The response 𝑌 could be temperature at a specific location, 𝒴 = ℝ, illustrating that the response space may be of considerably lower dimension than the covariate space. In other cases, e.g., forecasting the temperature field over a geographical region, 𝒴 may be a function space. The two primary questions that will concern us here are:

Problem 1 (Statistical forecasting). Given the training data, construct (“learn”) a function 𝑍𝑡 ∶ 𝒳 → 𝒴 that predicts 𝑌 at a lead time 𝑡 ≥ 0. That is, 𝑍𝑡 should have the property that 𝑍𝑡 ∘ 𝑋 is closest to 𝑌 ∘ Φ𝑡 among all functions in a suitable class.

Problem 2 (Coherent pattern extraction). Given the training data, identify a collection of observables 𝑧𝑗 ∶ Ω → 𝒴 that have the property of evolving coherently under the dynamics. By that, we mean that 𝑧𝑗 ∘ Φ𝑡 should be relatable to 𝑧𝑗 in a natural way.

These problems have an extensive history of study from an interdisciplinary perspective spanning mathematics, statistics, physics, and many other fields. Here, our focus will be on nonparametric methods, which do not employ explicit parametric models for the dynamics. Instead, they use universal structural properties of dynamical systems to inform the design of data analysis techniques. From a learning standpoint, Problems 1 and 2 can be thought of as supervised and unsupervised learning, respectively. A mathematical requirement we will impose on methods addressing either problem is that they have a well-defined notion of convergence, i.e., they are refinable, as the number 𝑁 of training samples increases.

Analog and POD Approaches

Among the earliest examples of nonparametric forecasting techniques is Lorenz’s analog method [Lor69]. This simple, elegant approach makes predictions by tracking the evolution of the response along a dynamical trajectory in the training data (the analogs). Good analogs are selected according to a measure of geometrical similarity between the covariate variable observed at forecast initialization and the covariate training data. This method posits that past behavior of the system is representative of its future behavior, so looking up states in a historical record that are closest to current observations is likely to yield a skillful forecast. Subsequent methodologies have also emphasized aspects of state space geometry, e.g., using the training data to approximate the evolution map through patched local linear models, often leveraging delay coordinates for state space reconstruction.

Early approaches to coherent pattern extraction include the proper orthogonal decomposition (POD), which is closely related to principal component analysis (PCA, introduced in the early twentieth century by Pearson), the Karhunen–Loève expansion, and empirical orthogonal function (EOF) analysis. Assuming that 𝒴 is a Hilbert space, POD yields an expansion 𝑌 ≈ 𝑌𝐿 = ∑𝐿𝑗=1 𝑧𝑗, 𝑧𝑗 = 𝑢𝑗𝜎𝑗𝜓𝑗. Arranging the data into a matrix 𝐘 = (𝑦0, …, 𝑦𝑁−1), the 𝜎𝑗 are the singular values of 𝐘 (in decreasing order), the 𝑢𝑗 are the corresponding left singular vectors, called EOFs, and the 𝜓𝑗 are given by projections of 𝑌 onto the EOFs, 𝜓𝑗(𝜔) = ⟨𝑢𝑗, 𝑌(𝜔)⟩𝒴. That is, the principal component 𝜓𝑗 ∶ Ω → ℝ is a linear feature characterizing the unsupervised data {𝑦0, …, 𝑦𝑁−1}. If the data is drawn from a probability measure 𝜇, as 𝑁 → ∞ the POD expansion is optimal in an 𝐿2(𝜇) sense; that is, 𝑌𝐿 has minimal 𝐿2(𝜇) error ‖𝑌 − 𝑌𝐿‖𝐿2(𝜇) among all rank-𝐿 approximations of 𝑌. Effectively, from the perspective of POD, the important components of 𝑌 are those capturing maximal variance.

Despite many successes in challenging applications (e.g., turbulence), it has been recognized that POD may not reveal dynamically significant observables, offering limited predictability and physical insight. In recent years, there has been significant interest in techniques that address this shortcoming by modifying the linear map 𝐘 to



have an explicit dependence on the dynamics [BK86], or replacing it by an evolution operator [DJ99, Mez05]. Either directly or indirectly, these methods make use of operator-theoretic ergodic theory, which we now discuss.

Operator-Theoretic Formulation

The operator-theoretic formulation of dynamical systems theory shifts attention from the state-space perspective, and instead characterizes the dynamics through its action on linear spaces of observables. Denoting the vector space of 𝒴-valued functions on Ω by ℱ, for every time 𝑡 the dynamics has a natural induced action 𝑈𝑡 ∶ ℱ → ℱ given by composition with the flow map, 𝑈𝑡𝑓 = 𝑓 ∘ Φ𝑡. It then follows by definition that 𝑈𝑡 is a linear operator; i.e., 𝑈𝑡(𝛼𝑓 + 𝑔) = 𝛼𝑈𝑡𝑓 + 𝑈𝑡𝑔 for all observables 𝑓, 𝑔 ∈ ℱ and every scalar 𝛼 ∈ ℂ. The operator 𝑈𝑡 is known as a composition operator, or Koopman operator after classical work of Bernard Koopman in the 1930s [Koo31], which established that a general (potentially nonlinear) dynamical system can be characterized through intrinsically linear operators acting on spaces of observables. A related notion is that of the transfer operator, 𝑃𝑡 ∶ ℳ → ℳ, which describes the action of the dynamics on a space of measures ℳ via the pushforward map, 𝑃𝑡𝑚 ∶= Φ𝑡∗𝑚 = 𝑚 ∘ Φ−𝑡. In a number of cases, ℱ and ℳ are dual spaces to one another (e.g., continuous functions and Radon measures), in which case 𝑈𝑡 and 𝑃𝑡 are dual operators.

If the space of observables under consideration is equipped with a Banach or Hilbert space structure, and the dynamics preserves that structure, the operator-theoretic formulation allows a broad range of tools from spectral theory and approximation theory for linear operators to be employed in the study of dynamical systems. For our purposes, a particularly advantageous aspect of this approach is that it is amenable to rigorous statistical approximation, which is one of our principal objectives. It should be kept in mind that the spaces of observables encountered in applications are generally infinite-dimensional, leading to behaviors with no counterparts in finite-dimensional linear algebra, such as unbounded operators and continuous spectrum. In fact, as we will see below, the presence of continuous spectrum is a hallmark of mixing (chaotic) dynamics.

In this review, we restrict attention to the operator-theoretic description of measure-preserving, ergodic dynamics. By that, we mean that there is a probability measure 𝜇 on Ω such that (i) 𝜇 is invariant under the flow, i.e., Φ𝑡∗𝜇 = 𝜇; and (ii) every measurable, Φ𝑡-invariant set has either zero or full 𝜇-measure. We also assume that 𝜇 is a Borel measure with compact support 𝐴 ⊆ Ω; this set is necessarily Φ𝑡-invariant. An example known to rigorously satisfy these properties is the Lorenz 63 (L63) system on Ω = ℝ3, which has a compactly supported, ergodic invariant measure supported on the famous “butterfly” fractal attractor; see Figure 1. L63 exemplifies the fact that a smooth dynamical system may exhibit invariant measures with nonsmooth supports. This behavior is ubiquitous in models of physical phenomena, which are formulated in terms of smooth differential equations, but whose long-term dynamics concentrate on lower-dimensional subsets of state space due to the presence of dissipation. Our methods should therefore not rely on the existence of a smooth structure for 𝐴.

In the setting of ergodic, measure-preserving dynamics on a metric space, two relevant structures that the dynamics may be required to preserve are continuity and 𝜇-measurability of observables. If the flow Φ𝑡 is continuous, then the Koopman operators act on the Banach space ℱ = 𝐶(𝐴, 𝒴) of continuous, 𝒴-valued functions on 𝐴, equipped with the uniform norm, by isometries, i.e., ‖𝑈𝑡𝑓‖ℱ = ‖𝑓‖ℱ. If Φ𝑡 is 𝜇-measurable, then 𝑈𝑡 lifts to an operator on equivalence classes of 𝒴-valued functions in 𝐿𝑝(𝜇, 𝒴), 1 ≤ 𝑝 ≤ ∞, and acts again by isometries. If 𝒴 is a Hilbert space (with inner product ⟨⋅, ⋅⟩𝒴), the case 𝑝 = 2 is special, since 𝐿2(𝜇, 𝒴) is a Hilbert space with inner product ⟨𝑓, 𝑔⟩𝐿2(𝜇,𝒴) = ∫Ω ⟨𝑓(𝜔), 𝑔(𝜔)⟩𝒴 𝑑𝜇(𝜔), on which 𝑈𝑡 acts as a unitary map, 𝑈𝑡∗ = 𝑈−𝑡.

Clearly, the properties of approximation techniques for observables and evolution operators depend on the underlying space. For instance, 𝐶(𝐴, 𝒴) has a well-defined notion of pointwise evaluation at every 𝜔 ∈ Ω by a continuous linear map 𝛿𝜔 ∶ 𝐶(𝐴, 𝒴) → 𝒴, 𝛿𝜔𝑓 = 𝑓(𝜔), which is useful for interpolation and forecasting, but lacks an inner-product structure and associated orthogonal projections. On the other hand, 𝐿2(𝜇) has inner-product structure, which is very useful theoretically as well as for numerical algorithms, but lacks the notion of pointwise evaluation.

Letting ℱ stand for any of the 𝐶(𝐴, 𝒴) or 𝐿𝑝(𝜇, 𝒴) spaces, the set 𝑈 = {𝑈𝑡 ∶ ℱ → ℱ}𝑡∈ℝ forms a strongly continuous group under composition of operators. That is, 𝑈𝑡 ∘ 𝑈𝑠 = 𝑈𝑡+𝑠, (𝑈𝑡)−1 = 𝑈−𝑡, and 𝑈0 = Id, so that 𝑈 is a group, and for every 𝑓 ∈ ℱ, 𝑈𝑡𝑓 converges to 𝑓 in the norm of ℱ as 𝑡 → 0. A central notion in such evolution groups is that of the generator, defined by the ℱ-norm limit 𝑉𝑓 = lim𝑡→0 (𝑈𝑡𝑓 − 𝑓)/𝑡 for all 𝑓 ∈ ℱ for which the limit exists. It can be shown that the domain 𝐷(𝑉) of all such 𝑓 is a dense subspace of ℱ, and 𝑉 ∶ 𝐷(𝑉) → ℱ is a closed, unbounded operator. Intuitively, 𝑉 can be thought of as a directional derivative of observables along the dynamics. For example, if 𝒴 = ℂ, 𝐴 is a 𝐶1 manifold, and the flow Φ𝑡 ∶ 𝐴 → 𝐴 is generated by a continuous vector field 𝑉⃗ ∶ 𝐴 → 𝑇𝐴, then the generator of the Koopman group on 𝐶(𝐴) has as its domain the space 𝐶1(𝐴) ⊂ 𝐶(𝐴) of continuously differentiable, complex-valued functions, and 𝑉𝑓 = 𝑉⃗ ⋅ ∇𝑓 for 𝑓 ∈ 𝐶1(𝐴). A strongly continuous



evolution group is completely characterized by its generator, as any two such groups with the same generator are identical.

The generator acquires additional properties in the setting of unitary evolution groups on 𝐻 = 𝐿2(𝜇, 𝒴), where it is skew-adjoint, 𝑉∗ = −𝑉. Note that the skew-adjointness of 𝑉 holds for more general measure-preserving dynamics than Hamiltonian systems, whose generator is skew-adjoint with respect to Lebesgue measure. By the spectral theorem for skew-adjoint operators, there exists a unique projection-valued measure 𝐸 ∶ ℬ(ℝ) → 𝐵(𝐻), giving the generator and Koopman operator as the spectral integrals

𝑉 = ∫ℝ 𝑖𝛼 𝑑𝐸(𝛼),  𝑈𝑡 = 𝑒𝑡𝑉 = ∫ℝ 𝑒𝑖𝛼𝑡 𝑑𝐸(𝛼).

Here, ℬ(ℝ) is the Borel 𝜎-algebra on the real line, and 𝐵(𝐻) the space of bounded operators on 𝐻. Intuitively, 𝐸 can be thought of as an operator analog of a complex-valued spectral measure in Fourier analysis, with ℝ playing the role of frequency space. That is, given 𝑓 ∈ 𝐻, the ℂ-valued Borel measure 𝐸𝑓(𝑆) = ⟨𝑓, 𝐸(𝑆)𝑓⟩𝐻 is precisely the Fourier spectral measure associated with the time-autocorrelation function 𝐶𝑓(𝑡) = ⟨𝑓, 𝑈𝑡𝑓⟩𝐻. The latter admits the Fourier representation 𝐶𝑓(𝑡) = ∫ℝ 𝑒𝑖𝛼𝑡 𝑑𝐸𝑓(𝛼).

The Hilbert space 𝐻 admits a 𝑈𝑡-invariant splitting 𝐻 = 𝐻𝑎 ⊕ 𝐻𝑐 into orthogonal subspaces 𝐻𝑎 and 𝐻𝑐 associated with the point and continuous components of 𝐸, respectively. In particular, 𝐸 has a unique decomposition 𝐸 = 𝐸𝑎 + 𝐸𝑐 with 𝐻𝑎 = ran 𝐸𝑎(ℝ) and 𝐻𝑐 = ran 𝐸𝑐(ℝ), where 𝐸𝑎 is a purely atomic spectral measure, and 𝐸𝑐 is a spectral measure with no atoms. The atoms of 𝐸𝑎 (i.e., the singletons {𝛼𝑗} with 𝐸𝑎({𝛼𝑗}) ≠ 0) correspond to eigenfrequencies of the generator, for which the eigenvalue equation 𝑉𝑧𝑗 = 𝑖𝛼𝑗𝑧𝑗 has a nonzero solution 𝑧𝑗 ∈ 𝐻𝑎. Under ergodic dynamics, every eigenspace of 𝑉 is one-dimensional, so that if 𝑧𝑗 is normalized to unit 𝐿2(𝜇) norm, 𝐸({𝛼𝑗})𝑓 = ⟨𝑧𝑗, 𝑓⟩𝐿2(𝜇) 𝑧𝑗. Every such 𝑧𝑗 is an eigenfunction of the Koopman operator 𝑈𝑡 at eigenvalue 𝑒𝑖𝛼𝑗𝑡, and {𝑧𝑗} is an orthonormal basis of 𝐻𝑎. Thus, every 𝑓 ∈ 𝐻𝑎 has the quasiperiodic evolution 𝑈𝑡𝑓 = ∑𝑗 𝑒𝑖𝛼𝑗𝑡 ⟨𝑧𝑗, 𝑓⟩𝐿2(𝜇) 𝑧𝑗, and the autocorrelation 𝐶𝑓(𝑡) is also quasiperiodic. While 𝐻𝑎 always contains constant eigenfunctions with zero frequency, it might not have any nonconstant elements. In that case, the dynamics is said to be weak-mixing. In contrast to the quasiperiodic evolution of observables in 𝐻𝑎, observables in the continuous spectrum subspace exhibit a loss of correlation characteristic of mixing (chaotic) dynamics. Specifically, for every 𝑓 ∈ 𝐻𝑐 the time-averaged autocorrelation function 𝐶̄𝑓(𝑡) = ∫0𝑡 |𝐶𝑓(𝑠)| 𝑑𝑠/𝑡 tends to 0 as |𝑡| → ∞, as do cross-correlation functions ⟨𝑔, 𝑈𝑡𝑓⟩𝐿2(𝜇) between observables in 𝐻𝑐 and arbitrary observables in 𝐿2(𝜇).

Data-Driven Forecasting

Based on the concepts introduced above, one can formulate statistical forecasting in Problem 1 as the task of constructing a function 𝑍𝑡 ∶ 𝒳 → 𝒴 on covariate space 𝒳, such that 𝑍𝑡 ∘ 𝑋 optimally approximates 𝑈𝑡𝑌 among all functions in a suitable class. We set 𝒴 = ℂ, so the response variable is scalar-valued, and consider the Koopman operator on 𝐿2(𝜇), so we have access to orthogonal projections. We also assume for now that the covariate function 𝑋 is injective, so 𝑌̂𝑡 ∶= 𝑍𝑡 ∘ 𝑋 should be able to approximate 𝑈𝑡𝑌 to arbitrarily high precision in 𝐿2(𝜇) norm. Indeed, let {𝑢0, 𝑢1, …} be an orthonormal basis of 𝐿2(𝜈), where 𝜈 = 𝑋∗𝜇 is the pushforward of the invariant measure onto 𝒳. Then, {𝜙0, 𝜙1, …} with 𝜙𝑗 = 𝑢𝑗 ∘ 𝑋 is an orthonormal basis of 𝐿2(𝜇). Given this basis, and because 𝑈𝑡 is bounded, we have 𝑈𝑡𝑌 = lim𝐿→∞ 𝑈𝑡𝐿𝑌, where the partial sum 𝑈𝑡𝐿𝑌 ∶= ∑𝐿−1𝑗=0 ⟨𝑈𝑡𝑌, 𝜙𝑗⟩𝐿2(𝜇) 𝜙𝑗 converges in 𝐿2(𝜇) norm. Here, 𝑈𝑡𝐿 is a finite-rank map on 𝐿2(𝜇) with range span{𝜙0, …, 𝜙𝐿−1}, represented by an 𝐿 × 𝐿 matrix 𝐔(𝑡) with elements 𝑈𝑖𝑗(𝑡) = ⟨𝜙𝑖, 𝑈𝑡𝜙𝑗⟩𝐿2(𝜇). Defining 𝑦⃗ = (𝑦̂0, …, 𝑦̂𝐿−1)⊤, 𝑦̂𝑗 = ⟨𝜙𝑗, 𝑈𝑡𝑌⟩𝐿2(𝜇), and (𝑧̂0(𝑡), …, 𝑧̂𝐿−1(𝑡))⊤ = 𝐔(𝑡)𝑦⃗, we have 𝑈𝑡𝐿𝑌 = ∑𝐿−1𝑗=0 𝑧̂𝑗(𝑡)𝜙𝑗. Since 𝜙𝑗 = 𝑢𝑗 ∘ 𝑋, this leads to the estimator 𝑍̂𝑡,𝐿 ∈ 𝐿2(𝜈), with 𝑍̂𝑡,𝐿 = ∑𝐿−1𝑗=0 𝑧̂𝑗(𝑡)𝑢𝑗.

The approach outlined above tentatively provides a consistent forecasting framework. Yet, while in principle appealing, it has three major shortcomings: (i) Apart from special cases, the invariant measure and an orthonormal basis of 𝐿2(𝜇) are not known. In particular, orthogonal functions with respect to an ambient measure on Ω (e.g., Lebesgue-orthogonal polynomials) will not suffice, since there are no guarantees that such functions form a Schauder basis of 𝐿2(𝜇), let alone be orthonormal. Even with a basis, we cannot evaluate 𝑈𝑡 on its elements without knowing Φ𝑡. (ii) Pointwise evaluation on 𝐿2(𝜇) is not defined, making 𝑍̂𝑡,𝐿 inadequate in practice, even if the coefficients 𝑧̂𝑗(𝑡) are known. (iii) The covariate map 𝑋 is oftentimes noninvertible, and thus the 𝜙𝑗 span a strict subspace of 𝐿2(𝜇). We now describe methods to overcome these obstacles using learning theory.

Sampling measures and ergodicity. The dynamical trajectory {𝜔0, …, 𝜔𝑁−1} in state space underlying the training data is the support of a discrete sampling measure 𝜇𝑁 ∶= ∑𝑁−1𝑛=0 𝛿𝜔𝑛/𝑁. A key consequence of ergodicity is that for Lebesgue-a.e. sampling interval Δ𝑡 and 𝜇-a.e. starting point 𝜔0 ∈ Ω, as 𝑁 → ∞, the sampling measures 𝜇𝑁 weak-converge to the invariant measure 𝜇; that is,

lim𝑁→∞ ∫Ω 𝑓 𝑑𝜇𝑁 = ∫Ω 𝑓 𝑑𝜇  ∀𝑓 ∈ 𝐶(Ω).  (1)
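The weak convergence in (1) is easy to observe numerically. The sketch below uses an irrational circle rotation as a toy ergodic system; the map, observable, and parameter values are illustrative assumptions of ours, not taken from the article. The time average of a continuous observable along a single trajectory approaches its integral against the invariant measure (here, Lebesgue measure on the circle).

```python
import numpy as np

# Toy ergodic system (illustrative assumption): the irrational rotation
# Phi(omega) = omega + alpha (mod 1), whose invariant measure mu is
# Lebesgue measure on [0, 1).
alpha = np.sqrt(2) - 1                        # irrational rotation number
f = lambda w: np.cos(2 * np.pi * w) ** 2      # a continuous observable

N = 200_000                                   # number of training samples
omega0 = 0.1                                  # arbitrary starting point omega_0
orbit = (omega0 + alpha * np.arange(N)) % 1.0 # states omega_0, ..., omega_{N-1}

time_avg = f(orbit).mean()                    # integral of f against mu_N
space_avg = 0.5                               # integral of cos^2(2*pi*w) dw over [0, 1)
print(time_avg, space_avg)
```

As 𝑁 grows, the empirical average converges to the expectation under 𝜇, which is the mechanism that makes the sampling measures 𝜇𝑁 usable as data-driven surrogates for 𝜇.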



Since integrals against 𝜇𝑁 are time averages on dynamical trajectories, i.e., ∫Ω 𝑓 𝑑𝜇𝑁 = ∑𝑁−1𝑛=0 𝑓(𝜔𝑛)/𝑁, ergodicity provides an empirical means of accessing the statistics of the invariant measure. In fact, many systems encountered in applications possess so-called physical measures, where (1) holds for 𝜔0 in a “larger” set of positive measure with respect to an ambient measure (e.g., Lebesgue measure) from which experimental initial conditions are drawn. Hereafter, we will let 𝑀 be a compact subset of Ω, which is forward-invariant under the dynamics (i.e., Φ𝑡(𝑀) ⊆ 𝑀 for all 𝑡 ≥ 0), and contains 𝐴. For example, in dissipative dynamical systems such as L63, 𝑀 can be chosen as a compact absorbing ball.

Shift operators. Ergodicity suggests that appropriate data-driven analogs are the 𝐿2(𝜇𝑁) spaces induced by the sampling measures 𝜇𝑁. For a given 𝑁, 𝐿2(𝜇𝑁) consists of equivalence classes of measurable functions 𝑓 ∶ Ω → ℂ having common values at the sampled states 𝜔𝑛, and the inner product of two elements 𝑓, 𝑔 ∈ 𝐿2(𝜇𝑁) is given by an empirical time-correlation, ⟨𝑓, 𝑔⟩𝜇𝑁 = ∫Ω 𝑓∗𝑔 𝑑𝜇𝑁 = ∑𝑁−1𝑛=0 𝑓∗(𝜔𝑛)𝑔(𝜔𝑛)/𝑁. Moreover, if the 𝜔𝑛 are distinct (as we will assume for simplicity of exposition), 𝐿2(𝜇𝑁) has dimension 𝑁, and is isomorphic as a Hilbert space to ℂ𝑁 equipped with a normalized dot product. Given that, we can represent every 𝑓 ∈ 𝐿2(𝜇𝑁) by a column vector 𝑓⃗ = (𝑓(𝜔0), …, 𝑓(𝜔𝑁−1))⊤ ∈ ℂ𝑁, and every linear map 𝐴 ∶ 𝐿2(𝜇𝑁) → 𝐿2(𝜇𝑁) by an 𝑁 × 𝑁 matrix 𝐀, so that 𝑔⃗ = 𝐀𝑓⃗ is the column vector representing 𝑔 = 𝐴𝑓. The elements of 𝑓⃗ can also be understood as expansion coefficients in the standard basis {𝑒0,𝑁, …, 𝑒𝑁−1,𝑁} of 𝐿2(𝜇𝑁), where 𝑒𝑗,𝑁(𝜔𝑛) = 𝑁1/2𝛿𝑗𝑛; that is, 𝑓(𝜔𝑛) = ⟨𝑒𝑛,𝑁, 𝑓⟩𝐿2(𝜇𝑁). Similarly, the elements of 𝐀 correspond to the operator matrix elements 𝐴𝑖𝑗 = ⟨𝑒𝑖,𝑁, 𝐴𝑒𝑗,𝑁⟩𝐿2(𝜇𝑁).

Next, we would like to define a Koopman operator on 𝐿2(𝜇𝑁), but this space does not admit such an operator as a composition map induced by the dynamical flow Φ𝑡 on Ω. This is because Φ𝑡 does not preserve null sets with respect to 𝜇𝑁, and thus does not lead to a well-defined composition map on equivalence classes of functions in 𝐿2(𝜇𝑁). Nevertheless, on 𝐿2(𝜇𝑁) there is an analogous construct to the Koopman operator on 𝐿2(𝜇), namely, the shift operator, 𝑈𝑞𝑁 ∶ 𝐿2(𝜇𝑁) → 𝐿2(𝜇𝑁), 𝑞 ∈ ℤ, defined as

𝑈𝑞𝑁𝑓(𝜔𝑛) = 𝑓(𝜔𝑛+𝑞) if 0 ≤ 𝑛 + 𝑞 ≤ 𝑁 − 1, and 0 otherwise.

Even though 𝑈𝑞𝑁 is not a composition map, intuitively it should have a connection with the Koopman operator 𝑈𝑞Δ𝑡. One could consider, for instance, the matrix representation 𝐔̃𝑁(𝑞) = [⟨𝑒𝑖,𝑁, 𝑈𝑞𝑁𝑒𝑗,𝑁⟩𝐿2(𝜇𝑁)] in the standard basis, and attempt to connect it with a matrix representation of 𝑈𝑞Δ𝑡 in an orthonormal basis of 𝐿2(𝜇). However, the issue with this approach is that the 𝑒𝑗,𝑁 do not have 𝑁 → ∞ limits in 𝐿2(𝜇), meaning that there is no suitable notion of 𝑁 → ∞ convergence of the matrix elements of 𝑈𝑞𝑁 in the standard basis. In response, we will construct a representation of the shift operator in a different orthonormal basis with a well-defined 𝑁 → ∞ limit. The main tools that we will use are kernel integral operators, which we now describe.

Kernel integral operators. In the present context, a kernel function will be a real-valued, continuous function 𝑘 ∶ Ω × Ω → ℝ with the property that there exists a strictly positive, continuous function 𝑑 ∶ Ω → ℝ such that

𝑑(𝜔)𝑘(𝜔, 𝜔′) = 𝑑(𝜔′)𝑘(𝜔′, 𝜔)  ∀𝜔, 𝜔′ ∈ Ω.  (2)

Notice the similarity between (2) and the detailed balance relation in reversible Markov chains. Now let 𝜌 be any Borel probability measure with compact support 𝑆 ⊆ 𝑀 included in the forward-invariant set 𝑀. It follows by continuity of 𝑘 and compactness of 𝑆 that the integral operator 𝐾𝜌 ∶ 𝐿2(𝜌) → 𝐶(𝑀),

𝐾𝜌𝑓 = ∫Ω 𝑘(⋅, 𝜔)𝑓(𝜔) 𝑑𝜌(𝜔),  (3)

is well-defined as a bounded operator mapping elements of 𝐿2(𝜌) into continuous functions on 𝑀. Using 𝜄𝜌 ∶ 𝐶(𝑀) → 𝐿2(𝜌) to denote the canonical inclusion map, we consider two additional integral operators, 𝐺𝜌 ∶ 𝐿2(𝜌) → 𝐿2(𝜌) and 𝐺̃𝜌 ∶ 𝐶(𝑀) → 𝐶(𝑀), with 𝐺𝜌 = 𝜄𝜌𝐾𝜌 and 𝐺̃𝜌 = 𝐾𝜌𝜄𝜌, respectively.

The operators 𝐺𝜌 and 𝐺̃𝜌 are compact operators acting with the same integral formula as 𝐾𝜌 in (3), but their codomains and domains, respectively, are different. Nevertheless, their nonzero eigenvalues coincide, and 𝜙 ∈ 𝐿2(𝜌) is an eigenfunction of 𝐺𝜌 corresponding to a nonzero eigenvalue 𝜆 if and only if 𝜑 ∈ 𝐶(𝑀) with 𝜑 = 𝐾𝜌𝜙/𝜆 is an eigenfunction of 𝐺̃𝜌 at the same eigenvalue. In effect, 𝜙 ↦ 𝜑 “interpolates” the 𝐿2(𝜌) element 𝜙 (defined only up to 𝜌-null sets) to the continuous, everywhere-defined function 𝜑. It can be verified that if (2) holds, 𝐺𝜌 is a trace-class operator with real eigenvalues, |𝜆0| ≥ |𝜆1| ≥ ⋯ ↘ 0+. Moreover, there exists a Riesz basis {𝜙0, 𝜙1, …} of 𝐿2(𝜌) and a corresponding dual basis {𝜙′0, 𝜙′1, …} with ⟨𝜙′𝑖, 𝜙𝑗⟩𝐿2(𝜌) = 𝛿𝑖𝑗, such that 𝐺𝜌𝜙𝑗 = 𝜆𝑗𝜙𝑗 and 𝐺∗𝜌𝜙′𝑗 = 𝜆𝑗𝜙′𝑗. We say that the kernel 𝑘 is 𝐿2(𝜌)-universal if 𝐺𝜌 has no zero eigenvalues; this is equivalent to ran 𝐺𝜌 being dense in 𝐿2(𝜌). Moreover, 𝑘 is said to be 𝐿2(𝜌)-Markov if 𝐺𝜌 is a Markov operator, i.e., 𝐺𝜌 ≥ 0, 𝐺𝜌𝑓 ≥ 0 if 𝑓 ≥ 0, and 𝐺𝜌1 = 1.

Observe now that the operators 𝐺𝜇𝑁 associated with the sampling measures 𝜇𝑁, henceforth abbreviated by 𝐺𝑁, are represented by 𝑁 × 𝑁 kernel matrices 𝐆𝑁 = [⟨𝑒𝑖,𝑁, 𝐺𝑁𝑒𝑗,𝑁⟩𝐿2(𝜇𝑁)] = [𝑘(𝜔𝑖, 𝜔𝑗)] in the standard basis of 𝐿2(𝜇𝑁). Further, if 𝑘 is a pullback kernel from covariate space, i.e., 𝑘(𝜔, 𝜔′) = 𝜅(𝑋(𝜔), 𝑋(𝜔′)) for 𝜅 ∶ 𝒳 × 𝒳 → ℝ, then 𝐆𝑁 = [𝜅(𝑥𝑖, 𝑥𝑗)] is empirically accessible from the



training data. Popular kernels in applications include the covariance kernel 𝜅(𝑥, 𝑥′) = ⟨𝑥, 𝑥′⟩𝒳 on an inner-product space and the radial Gaussian kernel 𝜅(𝑥, 𝑥′) = 𝑒−‖𝑥−𝑥′‖²𝒳/𝜖. It is also common to employ Markov kernels constructed by normalization of symmetric kernels [CL06, BH16]. We will use 𝑘𝑁 to denote kernels with data-dependent normalizations.

A widely used strategy for learning with integral operators [vLBB08] is to construct families of kernels 𝑘𝑁 converging in 𝐶(𝑀 × 𝑀) norm to 𝑘. This implies that for every nonzero eigenvalue 𝜆𝑗 of 𝐺 ≡ 𝐺𝜇, the sequence of eigenvalues 𝜆𝑗,𝑁 of 𝐺𝑁 satisfies lim𝑁→∞ 𝜆𝑗,𝑁 = 𝜆𝑗. Moreover, there exists a sequence of eigenfunctions 𝜙𝑗,𝑁 ∈ 𝐿2(𝜇𝑁) corresponding to 𝜆𝑗,𝑁, whose continuous representatives, 𝜑𝑗,𝑁 = 𝐾𝑁𝜙𝑗,𝑁/𝜆𝑗,𝑁, converge in 𝐶(𝑀) to 𝜑𝑗 = 𝐾𝜙𝑗/𝜆𝑗, where 𝜙𝑗 ∈ 𝐿2(𝜇) is any eigenfunction of 𝐺 at eigenvalue 𝜆𝑗. In effect, we use 𝐶(𝑀) as a “bridge” to establish spectral convergence of the operators 𝐺𝑁, which act on different spaces. Note that (𝜆𝑗,𝑁, 𝜑𝑗,𝑁) does not converge uniformly with respect to 𝑗, and for a fixed 𝑁, eigenvalues/eigenfunctions at larger 𝑗 exhibit larger deviations from their 𝑁 → ∞ limits. Under measure-preserving, ergodic dynamics, convergence occurs for 𝜇-a.e. starting state 𝜔0 ∈ 𝑀, and 𝜔0 in a set of positive ambient measure if 𝜇 is physical. In particular, the training states 𝜔𝑛 need not lie on 𝐴. See Figure 1 for eigenfunctions of 𝐺𝑁 computed from data sampled near the L63 attractor.

Diffusion forecasting. We now have the ingredients to build a concrete statistical forecasting scheme based on data-driven approximations of the Koopman operator. In particular, note that if 𝜙′𝑖,𝑁, 𝜙𝑗,𝑁 are biorthogonal eigenfunctions of 𝐺∗𝑁 and 𝐺𝑁, respectively, at nonzero eigenvalues, we can evaluate the matrix element 𝑈𝑁,𝑖𝑗(𝑞) ∶= ⟨𝜙′𝑖,𝑁, 𝑈𝑞𝑁𝜙𝑗,𝑁⟩𝐿2(𝜇𝑁) of the shift operator using the continuous representatives 𝜑′𝑖,𝑁, 𝜑𝑗,𝑁,

𝑈𝑁,𝑖𝑗(𝑞) = (1/𝑁) ∑𝑁−1−𝑞𝑛=0 𝜑′𝑖,𝑁(𝜔𝑛)𝜑𝑗,𝑁(𝜔𝑛+𝑞) = ((𝑁 − 𝑞)/𝑁) ∫ 𝜑′𝑖,𝑁 𝑈𝑞Δ𝑡𝜑𝑗,𝑁 𝑑𝜇𝑁−𝑞,

where 𝑈𝑞Δ𝑡 is the Koopman operator on 𝐶(𝑀). Therefore, if the corresponding eigenvalues 𝜆𝑖, 𝜆𝑗 of 𝐺 are nonzero, by the weak convergence of the sampling measures in (1) and 𝐶(𝑀) convergence of the eigenfunctions, as 𝑁 → ∞, 𝑈𝑁,𝑖𝑗(𝑞) converges to the matrix element 𝑈𝑖𝑗(𝑞Δ𝑡) = ⟨𝜙𝑖, 𝑈𝑞Δ𝑡𝜙𝑗⟩𝐿2(𝜇) of the Koopman operator on 𝐿2(𝜇). This convergence is not uniform with respect to 𝑖, 𝑗, but if we fix a parameter 𝐿 ∈ ℕ (which can be thought of as spectral resolution) such that 𝜆𝐿−1 ≠ 0, we can obtain a statistically consistent approximation of 𝐿 × 𝐿 Koopman operator matrices, 𝐔(𝑞Δ𝑡) = [𝑈𝑖𝑗(𝑞Δ𝑡)], by shift operator matrices, 𝐔𝑁(𝑞) = [𝑈𝑁,𝑖𝑗(𝑞)], with 𝑖, 𝑗 ∈ {0, …, 𝐿 − 1}. Checkerboard plots of 𝐔𝑁(𝑞) for the L63 system are displayed in Figure 1.

This method for approximating matrix elements of Koopman operators was proposed in a technique called diffusion forecasting (named after the diffusion kernels employed) [BGH15]. Assuming that the response 𝑌 is continuous and by spectral convergence of 𝐺𝑁, for every 𝑗 ∈ ℕ0 such that 𝜆𝑗 > 0, the inner products 𝑌̂𝑗,𝑁 = ⟨𝜙′𝑗,𝑁, 𝑌⟩𝜇𝑁 converge, as 𝑁 → ∞, to 𝑌̂𝑗 = ⟨𝜙′𝑗, 𝑌⟩𝐿2(𝜇). This implies that for any 𝐿 ∈ ℕ such that 𝜆𝐿−1 > 0, ∑𝐿−1𝑗=0 𝑌̂𝑗,𝑁𝜑𝑗,𝑁 converges in 𝐶(𝑀) to a continuous representative of Π𝐿𝑌, where Π𝐿 is the orthogonal projection on 𝐿2(𝜇) mapping into span{𝜙0, …, 𝜙𝐿−1}. Suppose now that 𝜚𝑁 is a sequence of continuous functions converging uniformly to 𝜚 ∈ 𝐶(𝑀), such that 𝜚𝑁 are probability densities with respect to 𝜇𝑁 (i.e., 𝜚𝑁 ≥ 0 and ‖𝜚𝑁‖𝐿1(𝜇𝑁) = 1). By similar arguments as for 𝑌, as 𝑁 → ∞, the continuous function ∑𝐿−1𝑗=0 𝜚̂𝑗,𝑁𝜑𝑗,𝑁 with 𝜚̂𝑗,𝑁 = ⟨𝜑′𝑗,𝑁, 𝜚𝑁⟩𝐿2(𝜇𝑁) converges to Π𝐿𝜚 in 𝐿2(𝜇). Putting these facts together, and setting 𝜚⃗𝑁 = (𝜚̂0,𝑁, …, 𝜚̂𝐿−1,𝑁)⊤ and 𝑌⃗𝑁 = (𝑌̂0,𝑁, …, 𝑌̂𝐿−1,𝑁)⊤, we conclude that

𝜚⃗⊤𝑁 𝐔𝑁(𝑞)𝑌⃗𝑁 ⟶ ⟨Π𝐿𝜚, Π𝐿𝑈𝑞Δ𝑡𝑌⟩𝐿2(𝜇) as 𝑁 → ∞.  (4)

Here, the left-hand side is given by matrix–vector products obtained from the data, and the right-hand side is equal to the expectation of Π𝐿𝑈𝑞Δ𝑡𝑌 with respect to the probability measure 𝜌 with density 𝑑𝜌/𝑑𝜇 = 𝜚; i.e., ⟨Π𝐿𝜚, Π𝐿𝑈𝑞Δ𝑡𝑌⟩𝐿2(𝜇) = 𝔼𝜌(Π𝐿𝑈𝑞Δ𝑡𝑌), where 𝔼𝜌(⋅) ∶= ∫Ω (⋅) 𝑑𝜌.

What about the dependence of the forecast on 𝐿? As 𝐿 increases, Π𝐿 converges strongly to the orthogonal projection Π𝐺 ∶ 𝐿2(𝜇) → 𝐿2(𝜇) onto the closure of the range of 𝐺. Thus, if the kernel 𝑘 is 𝐿2(𝜇)-universal (i.e., ran 𝐺 is dense in 𝐿2(𝜇)), Π𝐺 = Id, and under the iterated limit of 𝐿 → ∞ after 𝑁 → ∞ the left-hand side of (4) converges to 𝔼𝜌𝑈𝑞Δ𝑡𝑌. In summary, implemented with an 𝐿2(𝜇)-universal kernel, diffusion forecasting consistently approximates the expected value of the time-evolution of any continuous observable with respect to any probability measure with continuous density relative to 𝜇. An example of an 𝐿2(𝜇)-universal kernel is the pullback of a radial Gaussian kernel on 𝒳 = ℝ𝑚. In contrast, the covariance kernel is not 𝐿2(𝜇)-universal, as in this case the rank of 𝐺 is bounded by 𝑚. This illustrates that forecasting in the POD basis may be subject to intrinsic limitations, even with full observations.

Kernel analog forecasting. While providing a flexible framework for approximating expectation values of observables under measure-preserving, ergodic dynamics, diffusion forecasting does not directly address the problem of constructing a concrete forecast function, i.e., a function 𝑍𝑡 ∶ 𝒳 → ℂ approximating 𝑈𝑡𝑌 as stated in

OCTOBER 2020 NOTICES OF THE AMERICAN MATHEMATICAL SOCIETY 1341


[Figure 1, panels (f) and (g): commutative diagrams summarizing the diffusion forecast (DF) and kernel analog forecast (KAF) approximations; see the caption below.]

Figure 1. Panel (a) shows eigenfunctions 𝜙𝑗,𝑁 of 𝐺𝑁 for a dataset sampled near the L63 attractor. Panel (b) shows the action of the shift operator 𝑈𝑁^𝑞 on the 𝜙𝑗,𝑁 from (a) for 𝑞 = 50 steps, approximating the Koopman operator 𝑈^{𝑞Δ𝑡}. Panels (c, d) show the matrix elements ⟨𝜙𝑖,𝑁, 𝑈𝑁^𝑞 𝜙𝑗,𝑁⟩𝜇𝑁 of the shift operator for 𝑞 = 5 and 50. The mixing dynamics is evident in the larger far-from-diagonal components in 𝑞 = 50 vs. 𝑞 = 5. Panel (e) shows the matrix representation of a finite-difference approximation of the generator 𝑉, which is skew-symmetric. Panels (f, g) summarize the diffusion forecast (DF) and kernel analog forecast (KAF) for lead time 𝑡 = 𝑞Δ𝑡. In each diagram, the data-driven finite-dimensional approximation (top row) converges to the true forecast (middle row). DF maps an initial state 𝜔 ∈ 𝑀 ⊆ Ω to the future expectation of an observable 𝔼_{Ψ(𝜔)} 𝑈^𝑡 𝑌 = 𝔼_{𝑈^{𝑡∗}Ψ(𝜔)} 𝑌, and KAF maps a response function 𝑌 ∈ 𝐶(𝑀) to the conditional expectation 𝔼(𝑈^𝑡 𝑌 ∣ 𝑋).
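The diffusion forecast summarized in panel (f) reduces, in practice, to the matrix–vector product in (4). The following sketch exercises it on a circle rotation, a deliberately simple stand-in for the L63 examples; the kernel, bandwidth, and all variable names are our own illustrative choices, and we use the full spectrum (𝐿 = 𝑁), in which case the estimate collapses to a weighted ensemble average of the shifted observable that serves as an internal consistency check.

```python
import numpy as np

# Illustrative sketch of the diffusion forecast in eq. (4), using a circle
# rotation as a stand-in ergodic system.  All names and parameters here are
# our own choices, not specified in the article.
N, a, q = 500, 0.3, 40
omega = (np.arange(N) * a) % (2 * np.pi)      # training trajectory
Y = np.cos(omega)                             # forecast observable

# Symmetric Gaussian kernel matrix on the chordal distance; a simple
# stand-in for the kernel constructions described in the text.
d2 = (np.cos(omega)[:, None] - np.cos(omega)) ** 2 \
   + (np.sin(omega)[:, None] - np.sin(omega)) ** 2
K = np.exp(-d2 / 0.1)

lam, V = np.linalg.eigh(K / N)                # data-driven integral operator G_N
lam, V = lam[::-1], V[:, ::-1]                # descending eigenvalue order
phi = np.sqrt(N) * V                          # orthonormal basis of L2(mu_N)

# Shift-operator matrix U_N(q)_{ij} = (1/N') sum_n phi_i(omega_n) phi_j(omega_{n+q}).
Np = N - q
U = phi[:Np].T @ phi[q:q + Np] / Np

# Initial probability density rho concentrated near omega = 0 (a kernel
# section, normalized so that (1/N) sum_n rho(omega_n) = 1).
rho = K[0] / K[0].mean()
rho_hat = phi.T @ rho / N                     # expansion coefficients of rho
Y_hat = phi.T @ Y / N                         # expansion coefficients of Y

# Forecast of E_rho[U^{q dt} Y] via the matrix-vector product in (4),
# here at full spectral resolution (L = N).
forecast = rho_hat @ U @ Y_hat

# At L = N the estimate collapses to a weighted ensemble average of the
# shifted observable, giving an internal consistency check.
direct = (rho[:Np] * Y[q:q + Np]).sum() / Np
```

At reduced spectral resolution one would simply truncate `rho_hat`, `Y_hat`, and `U` to their leading 𝐿 entries, mirroring the projection Π𝐿 in (4).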



One way of defining such a function is to let 𝜅𝑁 be an 𝐿2(𝜈𝑁)-Markov kernel on 𝒳 for 𝜈𝑁 = 𝑋∗𝜇𝑁, and to consider the “feature map” Ψ𝑁 ∶ 𝒳 → 𝐶(𝑀) mapping each point 𝑥 ∈ 𝒳 in covariate space to the kernel section Ψ𝑁(𝑥) = 𝜅𝑁(𝑥, 𝑋(⋅)). Then, Ψ𝑁(𝑥) is a continuous probability density with respect to 𝜇𝑁, and we can use diffusion forecasting to define 𝑍𝑞Δ𝑡(𝑥) = Ψ⃗𝑁(𝑥)⊤ 𝐔𝑁(𝑞) 𝑌𝑁⃗ with notation as in (4).

While this approach has a well-defined 𝑁 → ∞ limit, it does not provide optimality guarantees, particularly in situations where 𝑋 is noninjective. Indeed, the 𝐿2(𝜇)-optimal approximation to 𝑈^𝑡 𝑌 of the form 𝑍𝑡 ∘ 𝑋 is given by the conditional expectation 𝔼(𝑈^𝑡 𝑌 ∣ 𝑋). In the present 𝐿2 setting we have 𝔼(𝑈^𝑡 𝑌 ∣ 𝑋) = Π𝑋 𝑈^𝑡 𝑌, where Π𝑋 is the orthogonal projection onto 𝐿2𝑋(𝜇) ∶= {𝑓 ∈ 𝐿2(𝜇) ∶ 𝑓 = 𝑔 ∘ 𝑋}. That is, the conditional expectation minimizes the error ‖𝑓 − 𝑈^𝑡 𝑌‖²_{𝐿2(𝜇)} among all pullbacks 𝑓 ∈ 𝐿2𝑋(𝜇) from covariate space. Even though 𝔼(𝑈^𝑡 𝑌 ∣ 𝑋 = 𝑥) can be expressed as an expectation with respect to a conditional probability measure 𝜇(⋅ ∣ 𝑥) on Ω, that measure will generally not have an 𝐿2(𝜇) density, and there is no map Ψ ∶ 𝒳 → 𝐶(𝑀) such that ⟨Ψ(𝑥), 𝑈^𝑡 𝑌⟩_{𝐿2(𝜇)} equals 𝔼(𝑈^𝑡 𝑌 ∣ 𝑋 = 𝑥).

To construct a consistent estimator of the conditional expectation, we require that 𝑘 be a pullback of a kernel 𝜅 ∶ 𝒳 × 𝒳 → ℝ on covariate space which is (i) symmetric, 𝜅(𝑥, 𝑥′) = 𝜅(𝑥′, 𝑥) for all 𝑥, 𝑥′ ∈ 𝒳 (so (2) holds); (ii) strictly positive; and (iii) strictly positive-definite. The latter means that for any sequence 𝑥0, …, 𝑥𝑛−1 of distinct points in 𝒳 the matrix [𝜅(𝑥𝑖, 𝑥𝑗)] is strictly positive. These properties imply that there exists a Hilbert space ℋ of complex-valued functions on Ω, such that (i) for every 𝜔 ∈ Ω, the kernel sections 𝑘𝜔 = 𝑘(𝜔, ⋅) lie in ℋ; (ii) the evaluation functional 𝛿𝜔 ∶ ℋ → ℂ is bounded and satisfies 𝛿𝜔𝑓 = ⟨𝑘𝜔, 𝑓⟩ℋ; (iii) every 𝑓 ∈ ℋ has the form 𝑓 = 𝑔 ∘ 𝑋 for a continuous function 𝑔 ∶ 𝒳 → ℂ; and (iv) 𝜄𝜇ℋ lies dense in 𝐿2𝑋(𝜇).

A Hilbert space of functions satisfying (i) and (ii) above is known as a reproducing kernel Hilbert space (RKHS), and the associated kernel 𝑘 is known as a reproducing kernel. RKHSs have many useful properties for statistical learning [CS02], not least because they combine the Hilbert space structure of 𝐿2 spaces with pointwise evaluation in spaces of continuous functions. The density of ℋ in 𝐿2𝑋(𝜇) is a consequence of the strict positive-definiteness of 𝜅. In particular, because the conditional expectation 𝔼(𝑈^𝑡 𝑌 ∣ 𝑋) lies in 𝐿2𝑋(𝜇), it can be approximated by elements of ℋ to arbitrarily high precision in 𝐿2(𝜇) norm, and every such approximation will be a pullback 𝑌̂𝑡 = 𝑍𝑡 ∘ 𝑋 of a continuous function 𝑍𝑡 that can be evaluated at arbitrary covariate values.

We now describe a data-driven technique for constructing such a prediction function, which we refer to as kernel analog forecasting (KAF) [AG20]. Mathematically, KAF is closely related to kernel principal component regression. To build the KAF estimator, we work again with integral operators as in (3), with the difference that now 𝐾𝜌 ∶ 𝐿2(𝜌) → ℋ(𝑀) takes values in the restriction of ℋ to the forward-invariant set 𝑀, denoted ℋ(𝑀). One can show that the adjoint 𝐾𝜌∗ ∶ ℋ(𝑀) → 𝐿2(𝜌) coincides with the inclusion map 𝜄𝜌 on continuous functions, so that 𝐾𝜌∗ maps 𝑓 ∈ ℋ(𝑀) ⊂ 𝐶(𝑀) to its corresponding 𝐿2(𝜌) equivalence class. As a result, the integral operator 𝐺𝜌 ∶ 𝐿2(𝜌) → 𝐿2(𝜌) takes the form 𝐺𝜌 = 𝐾𝜌∗𝐾𝜌, becoming a self-adjoint, positive-definite, compact operator with eigenvalues 𝜆0 ≥ 𝜆1 ≥ ⋯ ↘ 0⁺, and a corresponding orthonormal eigenbasis {𝜙0, 𝜙1, …} of 𝐿2(𝜌). Moreover, {𝜓0, 𝜓1, …} with 𝜓𝑗 = 𝐾𝜌𝜙𝑗/𝜆𝑗^{1/2} is an orthonormal set in ℋ(𝑀). In fact, Mercer's theorem provides an explicit representation 𝑘(𝜔, 𝜔′) = ∑_{𝑗=0}^∞ 𝜓𝑗(𝜔)𝜓𝑗(𝜔′), where direct evaluation of the kernel on the left-hand side (known as the “kernel trick”) avoids the complexity of inner-product computations between feature vectors 𝜓𝑗. Here, our perspective is to rely on the orthogonality of the eigenbasis to approximate observables of interest at fixed 𝐿, and establish convergence of the estimator as 𝐿 → ∞. A similar approach was adopted for density estimation on noncompact domains, with Mercer-type kernels based on orthogonal polynomials [ZHL19].

Now a key operation that the RKHS enables is the Nyström extension, which interpolates 𝐿2(𝜌) elements of appropriate regularity to RKHS functions. The Nyström operator 𝒩𝜌 ∶ 𝐷(𝒩𝜌) → ℋ(𝑀) is defined on the domain 𝐷(𝒩𝜌) = {∑𝑗 𝑐𝑗𝜙𝑗 ∶ ∑𝑗 |𝑐𝑗|²/𝜆𝑗 < ∞} by linear extension of 𝒩𝜌𝜙𝑗 = 𝜓𝑗/𝜆𝑗^{1/2}. Note that 𝒩𝜌𝜙𝑗 = 𝐾𝜌𝜙𝑗/𝜆𝑗 = 𝜑𝑗, so 𝒩𝜌 maps 𝜙𝑗 to its continuous representative, and 𝐾𝜌∗𝒩𝜌𝑓 = 𝑓, meaning that 𝒩𝜌𝑓 = 𝑓, 𝜌-a.e. While 𝐷(𝒩𝜌) may be a strict 𝐿2(𝜌) subspace, for any 𝐿 with 𝜆𝐿−1 > 0 we define a spectrally truncated operator 𝒩𝐿,𝜌 ∶ 𝐿2(𝜌) → ℋ(𝑀), 𝒩𝐿,𝜌 ∑𝑗 𝑐𝑗𝜙𝑗 = ∑_{𝑗=0}^{𝐿−1} 𝑐𝑗𝜓𝑗/𝜆𝑗^{1/2}. Then, as 𝐿 increases, 𝐾𝜌∗𝒩𝐿,𝜌𝑓 converges to Π𝐺𝜌𝑓 in 𝐿2(𝜌).

To make empirical forecasts, we set 𝜌 = 𝜇𝑁, compute the expansion coefficients 𝑐𝑗,𝑁(𝑡) of 𝑈^𝑡 𝑌 in the {𝜙𝑗,𝑁} basis of 𝐿2(𝜇𝑁), and construct 𝑌𝑡,𝐿,𝑁 = 𝒩𝐿,𝑁 𝑈^𝑡 𝑌 ∈ ℋ(𝑀). Because the 𝜓𝑗,𝑁 are pullbacks of known functions 𝑢𝑗,𝑁 ∈ 𝐶(𝒳), we have 𝑌𝑡,𝐿,𝑁 = 𝑍𝑡,𝐿,𝑁 ∘ 𝑋, where 𝑍𝑡,𝐿,𝑁 = ∑_{𝑗=0}^{𝐿−1} 𝑐𝑗(𝑡)𝑢𝑗,𝑁/𝜆𝑗,𝑁^{1/2} can be evaluated at any 𝑥 ∈ 𝒳.

The function 𝑌𝑡,𝐿,𝑁 is our estimator of the conditional expectation 𝔼(𝑈^𝑡 𝑌 ∣ 𝑋). By spectral convergence of kernel integral operators, as 𝑁 → ∞, 𝑌𝑡,𝐿,𝑁 converges to 𝑌𝑡,𝐿 ∶= 𝒩𝐿 𝑈^𝑡 𝑌 in 𝐶(𝑀) norm, where 𝒩𝐿 ≡ 𝒩𝐿,𝜇. Then, as 𝐿 → ∞, 𝐾∗𝑌𝑡,𝐿 converges in 𝐿2(𝜇) norm to Π𝐺 𝑈^𝑡 𝑌. Because 𝜅 is strictly positive-definite, 𝐺 has dense range in 𝐿2𝑋(𝜇), and thus Π𝐺 𝑈^𝑡 𝑌 = Π𝑋 𝑈^𝑡 𝑌 = 𝔼(𝑈^𝑡 𝑌 ∣ 𝑋). We therefore conclude that 𝑌𝑡,𝐿,𝑁 converges to the conditional expectation as 𝐿 → ∞ after 𝑁 → ∞. Forecast results from the L63 system are shown in Figure 2.
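Concretely, the KAF pipeline above (kernel eigendecomposition on covariate space, expansion of the time-shifted observable, and Nyström extension to arbitrary covariate values) can be sketched in a few lines on a circle rotation. The system, bandwidth, truncation level, and normalization (a standard kernel principal component regression form rather than the 𝜓𝑗/𝜆𝑗^{1/2} convention above) are our illustrative choices.

```python
import numpy as np

# Minimal KAF sketch: kernel eigenbasis on covariate space + Nystrom
# out-of-sample extension.  System and parameters are illustrative choices.
N, a, q = 400, 0.3, 30
omega = (np.arange(N + q) * a) % (2 * np.pi)        # keep q extra steps for shifts
Xall = np.stack([np.cos(omega), np.sin(omega)], 1)  # covariate map (injective here)
Y = np.cos(omega)                                   # observable to be forecast
Xtr = Xall[:N]

def kappa(A, B, eps=0.2):
    """Gaussian covariate-space kernel between point sets A (m,2), B (n,2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / eps)

lam, V = np.linalg.eigh(kappa(Xtr, Xtr) / N)        # eigenpairs of G_N
lam, V = lam[::-1], V[:, ::-1]                      # descending eigenvalue order
phi = np.sqrt(N) * V                                # orthonormal in L2(mu_N)

c = phi.T @ Y[q:N + q] / N                          # c_j = <phi_j, U^q Y>_{L2(mu_N)}

L = 12                                              # spectral truncation level
def Z(x):
    """Forecast function Z_t(x), t = q*dt, via Nystrom extension."""
    sections = kappa(np.atleast_2d(x), Xtr)         # kernel sections at x, (1, N)
    phi_x = (sections @ phi[:, :L]) / (N * lam[:L]) # Nystrom values phi_j(x)
    return (phi_x @ c[:L]).item()

# Out-of-sample evaluation: for this deterministic, fully observed system the
# conditional expectation is the true future value cos(theta + q*a).
theta = 1.234
pred = Z(np.array([np.cos(theta), np.sin(theta)]))
true = np.cos(theta + q * a)
```

Keeping 𝐿 well inside the numerically resolved part of the spectrum matters here: dividing the kernel sections by very small eigenvalues in the Nyström step amplifies sampling and roundoff errors.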



Figure 2. KAF applied to the L63 state vector component 𝑌(𝜔) = 𝜔1 with full (blue) and partial (red) observations. In the fully observed case, the covariate 𝑋 is the identity map on Ω = ℝ3. In the partially observed case, 𝑋(𝜔) = 𝜔1 is the projection to the first coordinate. Top: Forecasts 𝑍𝑡,𝐿,𝑁(𝑥) initialized from fixed 𝑥 = 𝑋(𝜔), compared with the true evolution 𝑈^𝑡 𝑌(𝜔) (black). Shaded regions show error bounds based on KAF estimates of the conditional standard deviation, 𝜎𝑡(𝑥). Bottom: RMS forecast errors (solid lines) and 𝜎𝑡 (dashed lines). The agreement between actual and estimated errors indicates that 𝜎𝑡 provides useful uncertainty quantification.

Coherent Pattern Extraction
We now turn to the task of coherent pattern extraction in Problem 2. This is a fundamentally unsupervised learning problem, as we seek to discover observables of a dynamical system that exhibit a natural time evolution (by some suitable criterion), rather than approximate a given observable as in the context of forecasting. We have mentioned POD as a technique for identifying coherent observables through eigenfunctions of covariance operators. Kernel PCA [SSM98] is a generalization of this approach utilizing integral operators with potentially nonlinear kernels. For data lying on Riemannian manifolds, it is popular to employ kernels approximating geometrical operators, such as heat operators and their associated Laplacians. Examples include Laplacian eigenmaps [BN03], diffusion maps [CL06], and variable-bandwidth kernels [BH16]. Meanwhile, coherent pattern extraction techniques based on evolution operators have also gained popularity in recent years. These methods include spectral analysis of transfer operators for detection of invariant sets [DJ99, DFS00], harmonic averaging [Mez05] and dynamic mode decomposition (DMD) [RMB+09, Sch10, WKR15, KNK+18] techniques for approximating Koopman eigenfunctions, and Darboux kernels for approximating spectral projectors [KPM20]. While natural from a theoretical standpoint, evolution operators tend to have more complicated spectral properties than kernel integral operators, including nonisolated eigenvalues and continuous spectrum. The following examples illustrate distinct behaviors associated with the point (𝐻𝑎) and continuous (𝐻𝑐) spectrum subspaces of 𝐿2(𝜇).

Example 1 (Torus rotation). A quasiperiodic rotation on the 2-torus, Ω = 𝕋2, is governed by the system of ODEs 𝜔̇ = 𝑉⃗(𝜔), where 𝜔 = (𝜔1, 𝜔2) ∈ [0, 2𝜋)2, 𝑉⃗ = (𝜈1, 𝜈2), and 𝜈1, 𝜈2 ∈ ℝ are rationally independent frequency parameters. The resulting flow, Φ𝑡(𝜔) = (𝜔1 + 𝜈1𝑡, 𝜔2 + 𝜈2𝑡) mod 2𝜋, has a unique Borel ergodic invariant probability measure 𝜇 given by a normalized Lebesgue measure. Moreover, there exists an orthonormal basis of 𝐿2(𝜇) consisting of Koopman eigenfunctions 𝑧𝑗𝑘(𝜔) = 𝑒^{𝑖(𝑗𝜔1+𝑘𝜔2)}, 𝑗, 𝑘 ∈ ℤ, with eigenfrequencies 𝛼𝑗𝑘 = 𝑗𝜈1 + 𝑘𝜈2. Thus, 𝐻𝑎 = 𝐿2(𝜇), and 𝐻𝑐 is the zero subspace. Such a system is said to have a pure point spectrum.

Example 2 (Lorenz 63 system). The L63 system on Ω = ℝ3 is governed by a system of smooth ODEs with two quadratic nonlinearities. This system is known to exhibit a physical ergodic invariant probability measure 𝜇 supported on a compact set (the L63 attractor), with mixing dynamics. This means that 𝐻𝑎 is the one-dimensional subspace of 𝐿2(𝜇) consisting of constant functions, and 𝐻𝑐 consists of all 𝐿2(𝜇) functions orthogonal to the constants (i.e., with zero expectation value with respect to 𝜇).

Delay-coordinate approaches. For the point spectrum subspace 𝐻𝑎, a natural class of coherent observables is provided by the Koopman eigenfunctions. Every Koopman eigenfunction 𝑧𝑗 ∈ 𝐻𝑎 evolves as a harmonic oscillator at the corresponding eigenfrequency, 𝑈^𝑡 𝑧𝑗 = 𝑒^{𝑖𝛼𝑗𝑡} 𝑧𝑗, and the associated autocorrelation function, 𝐶𝑧𝑗(𝑡) = 𝑒^{𝑖𝛼𝑗𝑡}, also has a harmonic evolution. Short of temporal invariance (which only occurs for constant eigenfunctions under measure-preserving ergodic dynamics), it is natural to think of a harmonic evolution as being “maximally” coherent. In particular, if 𝑧𝑗 is continuous, then for any 𝜔 ∈ 𝐴, the real and imaginary parts of the time series 𝑡 ↦ 𝑈^𝑡 𝑧𝑗(𝜔) are pure sinusoids, even if the flow Φ𝑡 is aperiodic. Further attractive properties of Koopman eigenfunctions include the facts that they are intrinsic to the dynamical system generating the data, and they are closed under pointwise multiplication, 𝑧𝑗𝑧𝑘 = 𝑧𝑗+𝑘, allowing one to generate every eigenfunction from a potentially finite generating set.
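These properties are easy to confirm numerically for the rotation of Example 1. In the snippet below the frequencies are arbitrary (rationally independent) choices of ours; the check verifies the harmonic evolution 𝑈^𝑡 𝑧𝑗𝑘 = 𝑒^{𝑖𝛼𝑗𝑘𝑡} 𝑧𝑗𝑘 and the closure of eigenfunctions under pointwise multiplication.

```python
import numpy as np

# Numerical check of Example 1: Koopman eigenfunctions of the torus
# rotation evolve harmonically and are closed under multiplication.
# The frequencies are arbitrary rationally independent choices.
nu1, nu2 = 1.0, np.sqrt(2.0)

def flow(omega, t):
    """Quasiperiodic rotation Phi^t on the 2-torus."""
    return ((omega[0] + nu1 * t) % (2 * np.pi),
            (omega[1] + nu2 * t) % (2 * np.pi))

def z(j, k, omega):
    """Koopman eigenfunction z_jk(omega) = exp(i(j*omega_1 + k*omega_2))."""
    return np.exp(1j * (j * omega[0] + k * omega[1]))

omega0, t, (j, k) = (0.7, 2.1), 3.5, (2, -3)
alpha = j * nu1 + k * nu2                 # eigenfrequency alpha_jk

lhs = z(j, k, flow(omega0, t))            # (U^t z_jk)(omega0)
rhs = np.exp(1j * alpha * t) * z(j, k, omega0)

# Pointwise product of eigenfunctions: z_{2,-3} * z_{1,4} = z_{3,1}.
prod = z(2, -3, omega0) * z(1, 4, omega0)
```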



Yet, consistently approximating Koopman eigenfunctions from data is a nontrivial task, even for simple systems. For instance, the torus rotation in Example 1 has a dense set of eigenfrequencies by rational independence of the basic frequencies 𝜈1 and 𝜈2. Thus, any open interval in ℝ contains infinitely many eigenfrequencies 𝛼𝑗𝑘, necessitating some form of regularization. Arguably, the term “pure point spectrum” is somewhat of a misnomer for such systems, since a nonempty continuous spectrum is present. Indeed, since the spectrum of an operator on a Banach space includes the closure of the set of eigenvalues, 𝑖ℝ ⧵ {𝑖𝛼𝑗𝑘} lies in the continuous spectrum.

As a way of addressing these challenges, observe that if 𝐺 is a self-adjoint, compact operator commuting with the Koopman group (i.e., 𝑈^𝑡 𝐺 = 𝐺𝑈^𝑡), then any eigenspace 𝑊𝜆 of 𝐺 corresponding to a nonzero eigenvalue 𝜆 is invariant under 𝑈^𝑡, and thus under the generator 𝑉. Moreover, by compactness of 𝐺, 𝑊𝜆 has finite dimension. Thus, for any orthonormal basis {𝜙0, …, 𝜙𝑙−1} of 𝑊𝜆, the generator 𝑉 on 𝑊𝜆 is represented by a skew-symmetric, and thus unitarily diagonalizable, 𝑙 × 𝑙 matrix 𝐕 = [⟨𝜙𝑖, 𝑉𝜙𝑗⟩_{𝐿2(𝜇)}]. The eigenvectors 𝑢⃗ = (𝑢0, …, 𝑢𝑙−1)⊤ ∈ ℂ𝑙 of 𝐕 then contain expansion coefficients of Koopman eigenfunctions 𝑧 = ∑_{𝑗=0}^{𝑙−1} 𝑢𝑗𝜙𝑗 in 𝑊𝜆, and the eigenvalues corresponding to 𝑢⃗ are eigenvalues of 𝑉.

On the basis of the above, since any integral operator 𝐺 on 𝐿2(𝜇) associated with a symmetric kernel 𝑘 ∈ 𝐿2(𝜇 × 𝜇) is Hilbert–Schmidt (and thus compact), and we have a wide variety of data-driven tools for approximating integral operators, we can reduce the problem of consistently approximating the point spectrum of the Koopman group on 𝐿2(𝜇) to the problem of constructing a commuting integral operator. As we now argue, the success of a number of techniques, including singular spectrum analysis (SSA) [BK86], diffusion-mapped delay coordinates (DMDC) [BCGFS13], nonlinear Laplacian spectral analysis (NLSA) [GM12], and Hankel DMD [BBP+17], in identifying coherent patterns can at least be partly attributed to the fact that they employ integral operators that approximately commute with the Koopman operator.

A common characteristic of these methods is that they employ, in some form, delay-coordinate maps [SYC91]. With our notation for the covariate function 𝑋 ∶ Ω → 𝒳 and sampling interval Δ𝑡, the 𝑄-step delay-coordinate map is defined as 𝑋𝑄 ∶ Ω → 𝒳^𝑄 with 𝑋𝑄(𝜔) = (𝑋(𝜔0), 𝑋(𝜔−1), …, 𝑋(𝜔−𝑄+1)) and 𝜔𝑞 = Φ^{𝑞Δ𝑡}(𝜔). That is, 𝑋𝑄 can be thought of as a lift of 𝑋, which produces “snapshots,” to a map taking values in the space 𝒳^𝑄 containing “videos.” Intuitively, by virtue of its higher-dimensional codomain and dependence on the dynamical flow, a delay-coordinate map such as 𝑋𝑄 should provide additional information about the underlying dynamics on Ω over the raw covariate map 𝑋. This intuition has been made precise in a number of “embedology” theorems [SYC91], which state that under mild assumptions, for any compact subset 𝑆 ⊆ Ω (including, for our purposes, the invariant set 𝐴), the delay-coordinate map 𝑋𝑄 is injective on 𝑆 for sufficiently large 𝑄. As a result, delay-coordinate maps provide a powerful tool for state space reconstruction, as well as for constructing informative predictor functions in the context of forecasting.

Aside from considerations associated with topological reconstruction, however, observe that a metric 𝑑 ∶ 𝒳 × 𝒳 → ℝ on covariate space pulls back to a distance-like function 𝑑̃𝑄 ∶ Ω × Ω → ℝ such that

𝑑̃𝑄²(𝜔, 𝜔′) = (1/𝑄) ∑_{𝑞=0}^{𝑄−1} 𝑑²(𝑋(𝜔−𝑞), 𝑋(𝜔′−𝑞)). (5)

In particular, 𝑑̃𝑄² has the structure of an ergodic average of a continuous function under the product dynamical flow Φ𝑡 × Φ𝑡 on Ω × Ω. By the von Neumann ergodic theorem, as 𝑄 → ∞, 𝑑̃𝑄 converges in 𝐿2(𝜇 × 𝜇) norm to a bounded function 𝑑̃∞, which is invariant under the Koopman operator 𝑈^𝑡 ⊗ 𝑈^𝑡 of the product dynamical system. Note that 𝑑̃∞ need not be 𝜇 × 𝜇-a.e. constant, as Φ𝑡 × Φ𝑡 need not be ergodic, and aside from special cases it will not be continuous on 𝐴 × 𝐴. Nevertheless, based on the 𝐿2(𝜇 × 𝜇) convergence of 𝑑̃𝑄 to 𝑑̃∞, it can be shown [DG19] that for any continuous function ℎ ∶ ℝ → ℝ, the integral operator 𝐺∞ on 𝐿2(𝜇) associated with the kernel 𝑘∞ = ℎ ∘ 𝑑̃∞ commutes with 𝑈^𝑡 for any 𝑡 ∈ ℝ. Moreover, as 𝑄 → ∞, the operators 𝐺𝑄 associated with 𝑘𝑄 = ℎ ∘ 𝑑̃𝑄 converge to 𝐺∞ in 𝐿2(𝜇) operator norm, and thus in spectrum.

Many of the operators employed in SSA, DMDC, NLSA, and Hankel DMD can be modeled after the 𝐺𝑄 described above. In particular, because 𝐺𝑄 is induced by a continuous kernel, its spectrum can be consistently approximated by data-driven operators 𝐺𝑄,𝑁 on 𝐿2(𝜇𝑁), as described in the context of forecasting. The eigenfunctions of these operators at nonzero eigenvalues approximate eigenfunctions of 𝐺𝑄, which approximate in turn eigenfunctions of 𝐺∞ lying in finite unions of Koopman eigenspaces. Thus, for sufficiently large 𝑁 and 𝑄, the eigenfunctions of 𝐺𝑄,𝑁 at nonzero eigenvalues capture distinct timescales associated with the point spectrum of the dynamical system, providing physically interpretable features. These kernel eigenfunctions can also be employed in Galerkin schemes to approximate individual Koopman eigenfunctions.
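Both roles of delays, restoring injectivity of a noninjective covariate and averaging the covariate metric toward the invariant limit in (5), can be seen in a few lines on the circle rotation. The covariate, metric, and parameters below are our illustrative choices, and the closed-form ergodic limit 1 − cos(𝜔 − 𝜔′) is specific to this example.

```python
import numpy as np

# Delay-coordinate sketch on the circle rotation with sampling step a.
# X = cos is a noninjective covariate; two delays already separate states,
# and the delay distance of eq. (5) converges to an invariant limit.
a = 0.3

def X(omega):
    return np.cos(omega)

def X_Q(omega, Q):
    """Q-step delay map (X(omega_0), X(omega_{-1}), ..., X(omega_{-Q+1}))."""
    return np.array([X(omega - q * a) for q in range(Q)])

def delay_dist2(w, wp, Q):
    """Squared delay distance of eq. (5) for the metric d(x, x') = |x - x'|."""
    return np.mean([(X(w - q * a) - X(wp - q * a)) ** 2 for q in range(Q)])

w1, w2 = 1.0, -1.0                        # distinct states with X(w1) == X(w2)
raw_gap = abs(X(w1) - X(w2))              # 0: the raw covariate cannot separate them
delay_gap = np.linalg.norm(X_Q(w1, 2) - X_Q(w2, 2))  # > 0: Q = 2 suffices

# Ergodic (von Neumann) average: for this rotation the limit of (5) is
# d_infty^2(w, w') = 1 - cos(w - w'), approached as Q grows.
limit = 1.0 - np.cos(w1 - w2)
approx = delay_dist2(w1, w2, 5000)
```

Note how the limiting distance depends only on the state-space difference 𝑤 − 𝑤′, illustrating the invariance of 𝑑̃∞ under the product dynamics.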



Besides the spectral considerations described above, in [BCGFS13] a geometrical characterization of the eigenspaces of 𝐺𝑄 was given based on Lyapunov metrics of dynamical systems. In particular, it follows by Oseledets's multiplicative ergodic theorem that for 𝜇-a.e. 𝜔 ∈ 𝑀 there exists a decomposition 𝑇𝜔𝑀 = 𝐹1,𝜔 ⊕ ⋯ ⊕ 𝐹𝑟,𝜔, where 𝑇𝜔𝑀 is the tangent space at 𝜔 ∈ 𝑀, and the 𝐹𝑗,𝜔 are subspaces satisfying the equivariance condition 𝐷Φ𝑡 𝐹𝑗,𝜔 = 𝐹𝑗,Φ𝑡(𝜔). Moreover, there exist Λ1 > ⋯ > Λ𝑟, such that for every 𝑣 ∈ 𝐹𝑗,𝜔, Λ𝑗 = lim_{𝑡→∞} (1/𝑡) ∫_0^𝑡 log‖𝐷Φ^𝑠 𝑣‖ 𝑑𝑠, where ‖⋅‖ is the norm on 𝑇𝜔𝑀 induced by a Riemannian metric. The numbers Λ𝑗 are called Lyapunov exponents, and are metric-independent. Note that the dynamical vector field 𝑉⃗(𝜔) lies in a subspace 𝐹𝑗0,𝜔 with a corresponding zero Lyapunov exponent. If 𝐹𝑗0,𝜔 is one-dimensional, and the norms ‖𝐷Φ𝑡 𝑣‖ obey appropriate uniform growth/decay bounds with respect to 𝜔 ∈ 𝑀, the dynamical flow is said to be uniformly hyperbolic. If, in addition, the support 𝐴 of 𝜇 is a differentiable manifold, then there exists a class of Riemannian metrics, called Lyapunov metrics, for which the 𝐹𝑗,𝜔 are mutually orthogonal at every 𝜔 ∈ 𝐴. In [BCGFS13], it was shown that using a modification of the delay-distance in (5) with exponentially decaying weights, as 𝑄 → ∞, the top eigenfunctions 𝜙𝑗^{(𝑄)} of 𝐺𝑄 vary predominantly along the subspace 𝐹𝑟,𝜔 associated with the most stable Lyapunov exponent. That is, for every 𝜔 ∈ Ω and tangent vector 𝑣 ∈ 𝑇𝜔𝑀 orthogonal to 𝐹𝑟,𝜔 with respect to a Lyapunov metric, lim_{𝑄→∞} 𝑣 ⋅ ∇𝜙𝑗^{(𝑄)} = 0.

RKHS approaches. While delay-coordinate maps are effective for approximating the point spectrum and associated Koopman eigenfunctions, they do not address the problem of identifying coherent observables in the continuous spectrum subspace 𝐻𝑐. Indeed, one can verify that in mixed-spectrum systems the infinite-delay operator 𝐺∞, which provides access to the eigenspaces of the Koopman operator, has a nontrivial nullspace that includes 𝐻𝑐 as a subspace. More broadly, there is no obvious way of identifying coherent observables in 𝐻𝑐 as eigenfunctions of an intrinsic evolution operator. As a remedy to this problem, we relax the problem of seeking Koopman eigenfunctions, and consider instead approximate eigenfunctions. An observable 𝑧 ∈ 𝐿2(𝜇) is said to be an 𝜖-approximate eigenfunction of 𝑈^𝑡 if there exists 𝜆𝑡 ∈ ℂ such that

‖𝑈^𝑡 𝑧 − 𝜆𝑡 𝑧‖_{𝐿2(𝜇)} < 𝜖 ‖𝑧‖_{𝐿2(𝜇)}. (6)

The number 𝜆𝑡 is then said to lie in the 𝜖-approximate spectrum of 𝑈^𝑡. A Koopman eigenfunction is an 𝜖-approximate eigenfunction for every 𝜖 > 0, so we think of (6) as a relaxation of the eigenvalue equation, 𝑈^𝑡 𝑧 − 𝜆𝑡 𝑧 = 0. This suggests that a natural notion of coherence of observables in 𝐿2(𝜇), appropriate to both the point and continuous spectrum, is that (6) holds for 𝜖 ≪ 1 and all 𝑡 in a “large” interval.

We now outline an RKHS-based approach [DGS18], which identifies observables satisfying this condition through eigenfunctions of a regularized operator 𝑉𝜏̃ on 𝐿2(𝜇) approximating 𝑉 with the properties of (i) being skew-adjoint and compact; and (ii) having eigenfunctions in the domain of the Nyström operator, which maps them to differentiable functions in an RKHS. Here, 𝜏 is a positive regularization parameter such that, as 𝜏 → 0⁺, 𝑉𝜏̃ converges to 𝑉 in a suitable spectral sense. We will assume that the forward-invariant, compact manifold 𝑀 has 𝐶1 regularity, but will not require that the support 𝐴 of the invariant measure be differentiable.

With these assumptions, let 𝑘 ∶ Ω × Ω → ℝ be a symmetric, positive-definite kernel, whose restriction on 𝑀 × 𝑀 is continuously differentiable. Then, the corresponding RKHS ℋ(𝑀) embeds continuously in the Banach space 𝐶1(𝑀) of continuously differentiable functions on 𝑀, equipped with the standard norm. Moreover, because 𝑉 is an extension of the directional derivative 𝑉⃗ ⋅ ∇ associated with the dynamical vector field, every function in ℋ(𝑀) lies, upon inclusion, in 𝐷(𝑉). The key point here is that regularity of the kernel induces RKHSs of observables which are guaranteed to lie in the domain of the generator. In particular, the range of the integral operator 𝐺 = 𝐾∗𝐾 on 𝐿2(𝜇) associated with 𝑘 lies in 𝐷(𝑉), so that 𝐴 = 𝑉𝐺 is well-defined. This operator is, in fact, Hilbert–Schmidt, with Hilbert–Schmidt norm bounded by the 𝐶1(𝑀 × 𝑀) norm of the kernel 𝑘. What is perhaps less obvious is that 𝐺^{1/2}𝑉𝐺^{1/2} (which “distributes” the smoothing by 𝐺 to the left and right of 𝑉), defined on the dense subspace {𝑓 ∈ 𝐿2(𝜇) ∶ 𝐺^{1/2}𝑓 ∈ 𝐷(𝑉)}, is also bounded, and thus has a unique closed extension 𝑉̃ ∶ 𝐿2(𝜇) → 𝐿2(𝜇), which turns out to be Hilbert–Schmidt. Unlike 𝐴, 𝑉̃ is skew-adjoint, and thus preserves an important structural property of the generator. By skew-adjointness and compactness of 𝑉̃, there exists an orthonormal basis {𝑧𝑗̃ ∶ 𝑗 ∈ ℤ} of 𝐿2(𝜇) consisting of its eigenfunctions 𝑧𝑗̃, with purely imaginary eigenvalues 𝑖𝛼𝑗̃. Moreover, (i) all 𝑧𝑗̃ corresponding to nonzero 𝛼𝑗̃ lie in the domain of the Nyström operator, and therefore have 𝐶1 representatives in ℋ(𝑀); and (ii) if 𝑘 is 𝐿2(𝜇)-universal, Markov, and ergodic, then 𝑉̃ has a simple eigenvalue at zero, in agreement with the analogous property of 𝑉.

Based on the above, we seek to construct a one-parameter family of such kernels 𝑘𝜏, with associated RKHSs ℋ𝜏(𝑀), such that as 𝜏 → 0⁺, the regularized generators 𝑉𝜏̃ converge to 𝑉 in a sense suitable for spectral convergence. Here, the relevant notion of convergence is strong resolvent convergence; that is, for every element 𝜆 of the resolvent set of 𝑉 and every 𝑓 ∈ 𝐿2(𝜇), (𝑉𝜏̃ − 𝜆)⁻¹𝑓 must converge to (𝑉 − 𝜆)⁻¹𝑓. In that case, for every element 𝑖𝛼 of the spectrum of 𝑉 (both point and continuous), there exists a sequence of eigenvalues 𝑖𝛼̃𝑗𝜏,𝜏 of 𝑉𝜏̃ converging to 𝑖𝛼 as 𝜏 → 0⁺. Moreover, for any 𝜖 > 0 and 𝑇 > 0, there exists 𝜏∗ > 0 such that for all 0 < 𝜏 < 𝜏∗ and |𝑡| < 𝑇, 𝑒^{𝑖𝛼̃𝑗𝜏,𝜏 𝑡} lies in the 𝜖-approximate spectrum of 𝑈^𝑡 and 𝑧̃𝑗𝜏,𝜏 is an 𝜖-approximate eigenfunction.



In [DGS18], a constructive procedure was proposed for obtaining the kernel family 𝑘𝜏 through a Markov semigroup on 𝐿2(𝜇). This method has a data-driven implementation, with analogous spectral convergence results for the associated integral operators 𝐺𝜏,𝑁 on 𝐿2(𝜇𝑁) to those described in the setting of forecasting. Given these operators, we approximate 𝑉𝜏̃ by 𝑉𝜏,𝑁̃ = 𝐺𝜏,𝑁^{1/2} 𝑉𝑁 𝐺𝜏,𝑁^{1/2}, where 𝑉𝑁 is a skew-adjoint, finite-difference approximation of the generator. For example, 𝑉𝑁 = (𝑈𝑁^1 − 𝑈𝑁^{1∗})/(2Δ𝑡) is a second-order finite-difference approximation based on the 1-step shift operator 𝑈𝑁^1. See Figure 1 for a graphical representation of a generator matrix for L63. As with our data-driven approximations of 𝑈^𝑡, we work with a rank-𝐿 operator 𝑉𝜏̂ ∶= Π𝜏,𝑁,𝐿 𝑉𝜏,𝑁̃ Π𝜏,𝑁,𝐿, where Π𝜏,𝑁,𝐿 is the orthogonal projection onto the subspace spanned by the first 𝐿 eigenfunctions of 𝐺𝜏,𝑁. This family of operators converges spectrally to 𝑉𝜏̃ in a limit of 𝑁 → ∞, followed by Δ𝑡 → 0 and 𝐿 → ∞, where we note that 𝐶1 regularity of 𝑘𝜏 is important for the finite-difference approximations to converge.

At any given 𝜏, an a posteriori criterion for identifying candidate eigenfunctions 𝑧𝑗,𝜏̂ satisfying (6) for small 𝜖 is to compute a Dirichlet energy functional, 𝒟(𝑧𝑗,𝜏̂) = ‖𝒩𝜏,𝑁 𝑧𝑗,𝜏̂‖²_{ℋ𝜏(𝑀)} / ‖𝑧𝑗,𝜏̂‖²_{𝐿2(𝜇𝑁)}. Intuitively, 𝒟 assigns a measure of roughness to every nonzero element in the domain of the Nyström operator (analogously to the Dirichlet energy in Sobolev spaces on differentiable manifolds), and the smaller 𝒟(𝑧𝑗,𝜏̂) is, the more coherent 𝑧𝑗,𝜏̂ is expected to be. Indeed, as shown in Figure 3, the 𝑧𝑗,𝜏̂ corresponding to low Dirichlet energy identify observables of the L63 system with a coherent dynamical evolution, even though this system is mixing and has no nonconstant Koopman eigenfunctions. Sampled along dynamical trajectories, the approximate Koopman eigenfunctions resemble amplitude-modulated wavepackets, exhibiting a low-frequency modulating envelope while maintaining phase coherence and a precise carrier frequency. This behavior can be thought of as a “relaxation” of Koopman eigenfunctions, which generate pure sinusoids with no amplitude modulation.

Figure 3. A representative eigenfunction 𝑧𝑗,𝜏̂ of the compactified generator 𝑉𝜏̂ for the L63 system, with low corresponding Dirichlet energy. Top: Scatterplot of Re 𝑧𝑗,𝜏̂ on the L63 attractor. Bottom: Time series of Re 𝑧𝑗,𝜏̂ sampled along a dynamical trajectory.

Conclusions and Outlook
We have presented mathematical techniques at the interface of dynamical systems theory and data science for statistical analysis and modeling of dynamical systems. One of our primary goals has been to highlight a fruitful interplay of ideas from ergodic theory, functional analysis, and differential geometry, which, coupled with learning theory, provide an effective route for data-driven prediction and pattern extraction, well-adapted to handle nonlinear dynamics and complex geometries.

There are several open questions and future research directions stemming from these topics. First, it should be possible to combine pointwise estimators derived from methods such as diffusion forecasting and KAF with the Mori–Zwanzig formalism so as to incorporate memory effects. Another potential direction for future development is to incorporate wavelet frames, particularly when the measurements or probability densities are highly localized. Moreover, when the attractor 𝐴 is not a manifold, appropriate notions of regularity need to be identified so as to fully characterize the behavior of kernel algorithms such as diffusion maps. While we suspect that kernel-based constructions will still be the fundamental tool, the choice of kernel may need to be adapted to the regularity of the attractor to obtain optimal performance. Finally, a number of applications (e.g., analysis of perturbations) concern the action of dynamics on more general vector bundles besides functions, potentially with a noncommutative algebraic structure, calling for the development of suitable data-driven techniques for such spaces.



ACKNOWLEDGMENTS. Research of the authors described in this review was supported by DARPA grant HR0011-16-C-0116; NSF grants 1842538, DMS-1317919, DMS-1521775, DMS-1619661, DMS-172317, and DMS-1854383; and ONR grants N00014-12-1-0912, N00014-14-1-0150, N00014-13-1-0797, N00014-16-1-2649, and N00014-16-1-2888.

References

[AG20] R. Alexander and D. Giannakis, Operator-theoretic framework for forecasting nonlinear time series with kernel analog techniques, Phys. D 409 (2020), 132520, 24, DOI 10.1016/j.physd.2020.132520. MR4093838
[BBP+17] S. L. Brunton, B. W. Brunton, J. L. Proctor, E. Kaiser, and J. N. Kutz, Chaos as an intermittently forced linear system, Nat. Commun. 8 (2017), no. 19, DOI 10.1038/s41467-017-00030-8.
[BCGFS13] T. Berry, J. R. Cressman, Z. Gregurić-Ferenček, and T. Sauer, Time-scale separation from diffusion-mapped delay coordinates, SIAM J. Appl. Dyn. Syst. 12 (2013), no. 2, 618–649, DOI 10.1137/12088183X. MR3047439
[BGH15] T. Berry, D. Giannakis, and J. Harlim, Nonparametric forecasting of low-dimensional dynamical systems, Phys. Rev. E 91 (2015), 032915, DOI 10.1103/PhysRevE.91.032915.
[BH16] T. Berry and J. Harlim, Variable bandwidth diffusion kernels, Appl. Comput. Harmon. Anal. 40 (2016), no. 1, 68–96, DOI 10.1016/j.acha.2015.01.001. MR3431485
[BK86] D. S. Broomhead and G. P. King, Extracting qualitative dynamics from experimental data, Phys. D 20 (1986), no. 2-3, 217–236, DOI 10.1016/0167-2789(86)90031-X. MR859354
[BN03] M. Belkin and P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15 (2003), 1373–1396.
[CL06] R. R. Coifman and S. Lafon, Diffusion maps, Appl. Comput. Harmon. Anal. 21 (2006), no. 1, 5–30, DOI 10.1016/j.acha.2006.04.006. MR2238665
[CS02] F. Cucker and S. Smale, On the mathematical foundations of learning, Bull. Amer. Math. Soc. (N.S.) 39 (2002), no. 1, 1–49, DOI 10.1090/S0273-0979-01-00923-5. MR1864085
[DFS00] M. Dellnitz, G. Froyland, and S. Sertl, On the isolated spectrum of the Perron-Frobenius operator, Nonlinearity 13 (2000), no. 4, 1171–1188, DOI 10.1088/0951-7715/13/4/310. MR1767953
[DG19] S. Das and D. Giannakis, Delay-coordinate maps and the spectra of Koopman operators, J. Stat. Phys. 175 (2019), no. 6, 1107–1145, DOI 10.1007/s10955-019-02272-w. MR3962976
[DGS18] S. Das, D. Giannakis, and J. Slawinska, Reproducing kernel Hilbert space compactification of unitary evolution groups, 2018, https://arxiv.org/abs/1808.01515.
[GM12] D. Giannakis and A. J. Majda, Nonlinear Laplacian spectral analysis for time series with intermittency and low-frequency variability, Proc. Natl. Acad. Sci. USA 109 (2012), no. 7, 2222–2227, DOI 10.1073/pnas.1118984109. MR2898568
[KNK+18] S. Klus, F. Nüske, P. Koltai, H. Wu, I. Kevrekidis, C. Schütte, and F. Noé, Data-driven model reduction and transfer operator approximation, J. Nonlinear Sci. 28 (2018), no. 3, 985–1010, DOI 10.1007/s00332-017-9437-7. MR3800253
[Koo31] B. O. Koopman, Hamiltonian systems and transformation in Hilbert space, Proc. Natl. Acad. Sci. 17 (1931), no. 5, 315–318, DOI 10.1073/pnas.17.5.315.
[KPM20] M. Korda, M. Putinar, and I. Mezić, Data-driven spectral analysis of the Koopman operator, Appl. Comput. Harmon. Anal. 48 (2020), no. 2, 599–629, DOI 10.1016/j.acha.2018.08.002. MR4047538
[Lor69] E. N. Lorenz, Atmospheric predictability as revealed by naturally occurring analogues, J. Atmos. Sci. 26 (1969), 636–646, DOI 10.1175/1520-0469(1969)26<636:aparbn>2.0.co;2.
[Mez05] I. Mezić, Spectral properties of dynamical systems, model reduction and decompositions, Nonlinear Dynam. 41 (2005), no. 1-3, 309–325, DOI 10.1007/s11071-005-2824-x. MR2157184
[RMB+09] C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D. S. Henningson, Spectral analysis of nonlinear flows, J. Fluid Mech. 641 (2009), 115–127, DOI 10.1017/S0022112009992059. MR2577895
[Sch10] P. J. Schmid, Dynamic mode decomposition of numerical and experimental data, J. Fluid Mech. 656 (2010), 5–28, DOI 10.1017/S0022112010001217. MR2669948
[SSM98] B. Schölkopf, A. Smola, and K. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. 10 (1998), 1299–1319.
[SYC91] T. Sauer, J. A. Yorke, and M. Casdagli, Embedology, J. Statist. Phys. 65 (1991), no. 3-4, 579–616, DOI 10.1007/BF01053745. MR1137425
[vLBB08] U. von Luxburg, M. Belkin, and O. Bousquet, Consistency of spectral clustering, Ann. Statist. 36 (2008), no. 2, 555–586, DOI 10.1214/009053607000000640. MR2396807
[WKR15] M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, A data-driven approximation of the Koopman operator: extending dynamic mode decomposition, J. Nonlinear Sci. 25 (2015), no. 6, 1307–1346, DOI 10.1007/s00332-015-9258-5. MR3415049
[ZHL19] H. Zhang, J. Harlim, and X. Li, Computing linear response statistics using orthogonal polynomial based estimators: An RKHS formulation, 2019, https://arxiv.org/abs/1912.11110.
[DJ99] M. Dellnitz and O. Junge, On the approxi-
mation of complicated dynamical behavior, SIAM J.
Numer. Anal. 36 (1999), no. 2, 491–515, DOI
10.1137/S0036142996313002. MR1668207
1348 NOTICES OF THE AMERICAN MATHEMATICAL SOCIETY VOLUME 67, NUMBER 9
Tyrus Berry

Dimitrios Giannakis

John Harlim

Credits
Figures are courtesy of the authors.
Photo of Tyrus Berry is courtesy of Miruna Tecuci.
Photo of Dimitrios Giannakis is courtesy of Joanna Slawinska.
Photo of John Harlim is courtesy of Leonie Vachon.
OCTOBER 2020 NOTICES OF THE AMERICAN MATHEMATICAL SOCIETY 1349