Exploration and Prediction of Fluid Dynamical Systems Using Auto-Encoder Technology
Lionel Agostini a)
AFFILIATIONS
Department of Aeronautics, Imperial College London, South Kensington, London SW7 2AZ, United Kingdom
a) Author to whom correspondence should be addressed: [email protected]
ABSTRACT
Machine-learning (ML) algorithms offer a new path for investigating high-dimensional, nonlinear problems, such as flow-dynamical systems.
The development of ML methods, associated with the abundance of data and combined with fluid-dynamics knowledge, offers a unique
opportunity for achieving significant breakthroughs in terms of advances in flow prediction and its control. The objective of this paper is to
discuss some possibilities offered by ML algorithms for exploring and predicting flow-dynamical systems. First, an overview of basic concepts
underpinning artificial neural networks, deep neural networks, and convolutional neural networks is given. Building upon this overview, the
I. INTRODUCTION

Fluid mechanics is in the enviable position of having a firmly established mathematical foundation, in the form of the Navier–Stokes (NS) equations, which describe all single-phase, non-reacting fluid-flow scenarios.

For an incompressible flow, the NS equations can be written as

\underbrace{\frac{\partial \mathbf{u}}{\partial t}}_{\text{variation}} + \underbrace{(\mathbf{u}\cdot\nabla)\mathbf{u}}_{\text{convection}} = \underbrace{-\frac{1}{\rho}\nabla p + \mathbf{g}}_{\text{sources}} + \underbrace{\nu\nabla^{2}\mathbf{u}}_{\text{diffusion}}, \qquad (1)

where u is the velocity field, ρ is the density, ν is the kinematic viscosity, p is the pressure, and g represents the body acceleration. The motion of any viscous fluid can, in principle, be accurately predicted by solving these equations if the boundary conditions are fully prescribed (Laplace's determinism65). However, the non-linear terms associated with the convective acceleration are the source of instability that manifests itself, at more than very modest speeds, as a wide range of interacting eddies that evolve chaotically in time and space.

The non-linear terms are the origin of turbulence—a chaotic (but not random) phenomenon that is characterized by a broad range of spatio-temporal scales, some of which are associated with coherent structures and feature certain aspects of order and organization.15,55,64 This makes most fluid flows complex dynamical systems that contain numerous spatio-temporal degrees of freedom—characterizing the system's "dimensionality."

As the dimensionality becomes ever higher with increasing speed (Reynolds number), solving the NS equations becomes more and more resource-intensive, exceeding present computational capabilities in most practically relevant conditions. A route often taken for predicting the flow is, therefore, to simplify the system by replacing it with an appropriate model, thus reducing the system's dimensionality. The real system is then represented by a simplified reduced-order model, composed of a minimum number of modes able to represent the principal physical and operational characteristics of the flow system. As non-trivial fluid-flow systems are inherently complex—characterized by extremely high dimensionality, a wide range of scales, and strong non-linearity—the identification of the phenomena driving the flow dynamics and
their associated modes remains a major challenge in present-day fluid-flow physics and engineering.

Thanks to recent advances in experimental and computational methods, and the drastic reduction in computational costs due to advances in hardware, both the number and the size of databases that contain the output of measurements and simulations have increased dramatically over the past decade or two. This abundance of data holds great promise for deepening the knowledge of fluid-dynamic systems but poses new challenges for processing the data, which requires novel tools. Candidates for such tools and related techniques are progressively becoming available within the rapidly growing field of "Machine Learning (ML)," and present methods are increasingly made applicable to the analysis of spatio-temporal fluid-flow data, aided by recent advances in hardware (Graphics/Tensor Processing Unit—GPU/TPU—platforms, in particular), algorithms, and open-source libraries.

By definition, ML is the field of study that gives computers the ability to learn without being explicitly programmed.61 ML algorithms are categorized as supervised, semi-supervised, and unsupervised (see Fig. 1 and the recent review of Brunton et al.7). The task of supervised learning is to learn a function that maps input to output based on labeled examples. In contrast, unsupervised learning consists of learning unknown patterns in a set of data without a priori information. Finally, semi-supervised learning operates on a large amount of unlabeled data, or with corrective input from an appropriate environment, for training a small amount of labeled data.

ML is also becoming increasingly pertinent to the construction of efficient control algorithms for non-linear fluid-flow systems. Within a closed-loop-control framework, the fluid flow as a whole may be regarded as a dynamical system, which is steered toward a desired state by control laws. Inputs are disturbances and actuation signals applied by the controller, whereas the dynamical system's outputs are the cost function—the quantity that has to be optimized—and sensor signals corresponding to the observable quantities. The controller's objective is to determine from the sensors the control laws that must be imposed onto the system for maximizing or minimizing the cost function.

Rapid developments of ML algorithms in fluid mechanics (see the recent reviews of Brunton et al.7 and Brenner et al.6) and modern computational hardware provide the rationale for pursuing the use of data-driven methods and ML algorithms for active flow control. The following two ML-based routes can be taken for implementing flow-control strategies, one being conventional and the other more progressive:

● Model-driven approach. The real system is represented by a simplified, reduced-order model. This model can be described by a set of equations—e.g., linearized Navier–Stokes equations,2,18 resolvent analysis,41 exact coherent structures,25 or by equations coupled with data—e.g., POD-Galerkin14 or data assimilation.16,68 Whichever method is used, the model-driven approach poses major challenges [...]
● Model-free approach. [...] hardware, data-driven algorithms hold great potential for identifying control strategies by focusing directly on the controller, without any knowledge of the dynamics and without recourse to any model—the so-called model-free approach. ML optimizes the system based on the data the system generates—unlike model-driven control, which optimizes the constrained model.

A whole range of ML methods have been successfully used in fluid mechanics for a variety of desired outcomes; among methods used specifically for "Machine Learning Control (MLC)," evolutionary algorithms and genetic algorithms have been shown to be very effective for designing well-behaved control laws. Evolutionary algorithms are based on the biological principle of natural selection. Starting with an initial population of control laws, which are all competing against one another, only the most effective one for optimizing the objective function is selected. Control laws are improved from generation to generation by applying genetic operations (elitism, mutation, replication, crossover). By "breeding" the most effective control laws, these are selectively improved from generation to generation. Unlike genetic algorithms, which search for the best parameters to apply to the control law, genetic programming control learns both the structure of the control laws and their parameters.17 Genetic programming has been used for turbulence control in various applications, for example, the mixing layer,52 flow separation,21 turbulent separated shear flow,4 and flow around bluff bodies (e.g., the Ahmed body39,40). Another approach for identifying an optimal [...]

Auto-Encoder (AE) technology is a "good candidate" for efficiently reducing the problem dimensionality, as shown by the online article "A look at deep learning for science" by Prabhat,49 which provides an excellent overview of some uses of deep-learning technology in science applications. Indeed, several studies presented in this overview were only possible through the use of AEs. This method aims at extracting the essential information and representing it in an optimal space, by reducing the dimensionality of the data and conserving only the most essential information. By identifying new coordinate systems, built on non-linear embeddings, the AE combined with other ML algorithms allows strongly non-linear dynamics to be approximated by a sparse, weakly non-linear model,11,43,51 enabling MLC methods to design the most efficient control laws.

After a short introduction to Artificial Neural Networks (ANNs) in Sec. II, the concept of the AE is discussed and compared to the Proper Orthogonal Decomposition (POD) in Sec. III, a conventional technique used to represent flow-structure information. In Secs. V–VII, the low-dimensional dynamical representation obtained by the AE is then used, in conjunction with other machine-learning algorithms, to

(i) provide a low-dimensional dynamical model (a probabilistic flow prediction)—in Sec. V,
(ii) give a deterministic flow prediction—in Sec. VI, and
(iii) retrieve high-resolution data in the spatio-temporal domain from contaminated and/or under-sampled data—in Sec. VII.
[...] To demonstrate the effect of the use of non-linear activation functions, the functions in Eq. (2) are initially prescribed to be linear [e.g., f(x) = ax]. The transformation operated by the network is then given by

y = \left(\prod_{i=1}^{L} a_i W_i\right)\left(x + \sum_{i=1}^{L} \frac{b_i}{\prod_{j=1}^{i} a_j W_j}\right). \qquad (3)

This formula shows clearly that their combination is still a linear function—i.e., the final activation function of the last layer is a linear function of the input into the first layer. Therefore, a DNN with L layers can be replaced by an ANN with a single layer, modeling only a linear regression between the input and the output. This contrasts with the relation given in Eq. (2), for which the DNN corresponds to a non-linear regression model if the activation functions are non-linear.

Training the network entails finding the parameters θ = {W_l, b_l}_{l=1}^{L} that minimize the (expected) loss of meaningful information between the output and the target value. Modern neural networks use a technique called backpropagation9,37,60,67 to train the model and discover intricate structures in high-dimensional data. A backpropagation algorithm allows the DNN to adjust the internal parameters used to compute the representation in each layer from the representation in the previous layer. This approach places an increased computational strain on the activation function and its derivative. To properly adjust the weight vector W_l, a gradient vector is computed by the learning algorithm for each weight, providing information on the error variation as the weight is slightly modified. The weight vector is then adjusted in the opposite direction to the gradient vector.
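As an illustration only (not part of the original study), the update rule just described—chain-rule ("backpropagated") derivatives followed by a step opposite to the gradient—can be sketched for a two-weight model in a few lines of Python/NumPy. The names w1, w2, and lr anticipate the circuit analogy discussed next; the squared form of the cost is used here so that the derivative is smooth:

import numpy as np

# Two-weight surrogate: hidden value h = w1 * x, output y = w2 * h.
# The cost J = (y - y_target)**2 is minimised by gradient descent, with
# dJ/dw1 and dJ/dw2 obtained by the chain rule (backpropagated sensitivity).
x, y_target = 1.0, 0.5          # fixed input and desired output
w1, w2 = 0.1, 0.1               # initial weights
lr = 0.1                        # learning rate

for step in range(200):
    h = w1 * x                  # forward pass through the first "switch"
    y = w2 * h                  # forward pass through the second "switch"
    J = (y - y_target) ** 2     # cost function
    dJ_dy = 2.0 * (y - y_target)        # sensitivity of J to the output
    dJ_dw2 = dJ_dy * h                  # chain rule: dJ/dw2 = dJ/dy * dy/dw2
    dJ_dw1 = dJ_dy * w2 * x             # chain rule: dJ/dw1 = dJ/dy * dy/dh * dh/dw1
    w1 -= lr * dJ_dw1           # step in the direction opposite to the gradient
    w2 -= lr * dJ_dw2

print(J)                        # J has decreased toward its minimum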
The concept of backpropagation, by which ANNs learn to tune their parameters, is illustrated in Fig. 2 by using an electric circuit as a surrogate for a DNN. The electric current flows from a socket to a light bulb, and its intensity is regulated by two rotary switches. The intensity of the input and output currents is represented by x and y, respectively, and h is the voltage after the first switch (hidden layer). The objective here is to find the configuration of both switches (weights) for which the brightness is optimal (not necessarily maximal), y′ being the optimal intensity. A cost function J can be defined and represented by the relation J = ∥y − y′∥. The optimal brightness can be obtained by minimizing the cost function J through adjustments of w1 and w2. The variation of the cost function for small changes in w1 and w2 has to be determined. Although these derivatives cannot be defined directly, they can be estimated by backpropagating the sensitivity of J to variations in the electric current. The route to extracting these derivatives and linking them to ∂J/∂y is referred to as the "chain of sensitivity" or "chain rule,"9,35,67 and this is shown in the second frame of Fig. 2. The role of the backpropagation is to define the amounts by which w1 and w2 have to be adjusted, the adjustments being weighted by the sensitivity. Finding the optimal configuration is an iterative process of making small changes in w1 and w2 and updating the sensitivity estimates until J cannot decrease further. Small weight changes Δw1 and Δw2 are imposed at each iteration in order to perform a step in the downhill direction of the sensitivity curve. This approach is referred to as the "gradient descent" method. Weight increments are proportional to the sensitivity through the constant multiplier lr, called the "learning rate." This constant has to be chosen with care. If it is too large, the descent [...]

[...] gradients get smaller and smaller as information is pushed backward in the network, with the result that the neurons embedded in the first layers learn very slowly compared to the neurons in the later layers of the hierarchy. The problem of vanishing gradients implies that DNNs take a long time to train, which degrades the model's accuracy. On the other hand, exploding gradients are the exact opposite problem to vanishing gradients: large gradient errors accumulate when the magnitude of the weights is high, resulting in excessively large updates to the model's weights. The model oscillates around the minimum or even overshoots the optimum, and the model is thus unable to learn.

Generally, the corruption of the learning process by vanishing gradients can be avoided by using the REctified Linear Unit (RELU)48 function as the activation function, as its gradient is 0 for negative inputs and 1 for positive ones. The RELU is relatively robust toward the vanishing/exploding gradient problem, and for DNNs, the initial values of the weights can be set to particular values, depending on the non-linear activation function used, as defined by Glorot and Bengio.22

III. DIMENSIONALITY REDUCTION WITH AUTO-ENCODER

An AE is a neural-network architecture that has three parts: an encoder, a bottleneck (latent space), and a decoder. An AE is an
[...] non-linear functions and consequently is more effective for describing the underlying manifold structure of the data.3,5 The AE can be regarded as a generalization of the POD54 or Dynamic Mode Decomposition (DMD).51,66 The automated extraction of flow features by unsupervised learning algorithms,35 such as the AE combined with other ML algorithms—for instance, clustering methods (to be discussed later)—offers innovative methods for flow modeling and control using low-dimensional dynamical models.

A. Auto-encoder technology: Principles and purpose

The AE's principles were first introduced in the ML community more than 30 years ago.5,34 However, it is only the development of new, powerful computational hardware and Python libraries that has enabled its widespread use. As previously mentioned, an AE is a multi-layered neural network composed of an encoder and a decoder, separated by a "bottleneck." As a first step, the input data are compressed into a lower-dimensional space by the encoder. Then, the decoder reconstitutes the original input by using the compressed information embedded in the "latent space," as illustrated in Fig. 3. If some structure or order exists in the input data, in whatever sense (e.g., correlations between features), this structure is "learned" and consequently leveraged when the input is forced through the AE's bottleneck. By reducing the latent-space dimension, the AE has to exploit the natural data structure to identify the most efficient representation, in order to compress the [...]

POD is intrinsically a linear regression method: it essentially learns a linear transformation that projects the data into another space, where the projection vectors are defined by the variance of the data. The AE methodology is not subject to such linearity constraints, as the data dimensionality is reduced by stacking multiple non-linear transformations. The AE is capable of modeling complex non-linear functions and is consequently more effective at describing the underlying manifold structure of the data.3,5

The manner in which a single-layer AE operates on input data to give an output that optimally represents the input, by way of a low-order representation of the input data, is given by Eq. (2) and illustrated in Fig. 4. For maximizing the likeness between input and reconstruction, the AE has to adjust its weights and biases. If h and g are linear activation functions and no bias is used, then the output is y = WW′x [see also Eq. (3)]. Plaut54 showed that the AE learns to span the same subspace as POD, where y = WW^T x, if the mean-squared error defines the loss function [J = (x − y)²]. By using non-linear activation functions, non-linear trends in the data set can be identified. Hence, the AE is capable of modeling complex non-linear functions, and it can be seen as a generalization of the POD.3,5,12,45,54
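For illustration, a minimal Keras sketch of such an encoder–bottleneck–decoder architecture is given below. This is not the author's code: the layer sizes and activations are illustrative placeholders, and the input is assumed to be a flattened snapshot vector of length n_features.

import tensorflow as tf
from tensorflow.keras import layers, models

n_features = 256      # illustrative input dimension (e.g., a flattened snapshot)
n_latent = 3          # bottleneck ("latent space") dimension

# Encoder: compresses the input into the low-dimensional latent space.
encoder = models.Sequential([
    layers.Dense(64, activation="tanh", input_shape=(n_features,)),
    layers.Dense(n_latent, activation="tanh", name="latent"),
])

# Decoder: reconstitutes the input from the latent representation.
decoder = models.Sequential([
    layers.Dense(64, activation="tanh", input_shape=(n_latent,)),
    layers.Dense(n_features, activation="linear"),
])

# Auto-encoder: encoder and decoder trained end to end so that the
# reconstruction matches the input (mean-squared-error loss).
inputs = tf.keras.Input(shape=(n_features,))
autoencoder = models.Model(inputs, decoder(encoder(inputs)))
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(snapshots, snapshots, epochs=100, batch_size=32)

With linear activations and the mean-squared-error loss, this architecture corresponds to the POD-spanning case discussed above (Plaut54); replacing them with non-linear activations yields the non-linear generalization of the POD.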
To illustrate the difference between PCA and an ANN with non-linear activation functions, an imaginary set of data is prescribed in Fig. 5. A set of points is organized, broadly, along a curved path with two lines of different slopes in quadrants II and IV, [...]

[...] means that, by adding layers with a large number of neurons or filters with non-linear activation functions, the potency of the AE for function approximation can be efficiently increased, but the negative aspect is that the interpretability of the data structure in the latent space decreases. Even if the dynamics in the latent space mimics the fluid dynamics in the original space, data in the latent space cannot easily be related directly to the physics of the flow. Nevertheless, the motivation for using an AE with a sufficiently high level of complexity is the opportunity to work within a space of low dimension.

C. Definition of a convolutional auto-encoder (CAE)

In the early stages of the development of computer vision, neural networks were composed of stacked layers of perceptrons, fully [...]

[...] It can be argued that the CNN pertains to image recognition and cannot be related to, or exploited for, fluid dynamics. However, fluid flows, even if turbulent and thus chaotic, contain influential coherent structures and feature certain aspects of order and organization. By defining "libraries" of filters able to describe coherent scales, the CNN can thus be applied to the identification of patterns featuring in such flows.10,38

D. Autoencoder cost function: Measure of the "likeness"

The primary goal of most ML algorithms is to construct a model able to map input variables, by identifying their features, to output variables, referred to as a target. Given a training set, the objective is to learn a function f, such that f(x) is a "good" predictor for the corresponding value of y. Whether the predictor is considered "good" or "bad" is subjective, and it is tied to the problem hypothesis. In fact, a single data set can be used for different purposes, and the cost function measuring the model accuracy during the training process must be adapted to the objectives in order [...]

[...] However, while the latter measures the linear link between fluctuations, α is more restrictive, as the magnitude is also taken into consideration. For example, if y′ is linearly correlated with x′ by the function y′ = ax′, then γ² is equal to 1 whatever the value of a, while α equals 1 if and only if a = 1, i.e., y′ = x′.

J = \frac{1}{\left[\frac{\alpha+1}{2}\right]^{2n} + \varepsilon} - \frac{1}{1+\varepsilon}. \qquad (5)
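Purely as an illustration, a loss of the form of Eq. (5) could be written as a custom TensorFlow loss as sketched below. The exact definition of α used in the paper is not reproduced in this excerpt; the stand-in index used here is a symmetric correlation-type measure chosen only because it satisfies the property stated above (it equals 1 if and only if the fluctuations match in both shape and magnitude), and it is an assumption, not the author's definition:

import tensorflow as tf

def likeness_loss(n=1, eps=1e-6):
    # Loss of the form of Eq. (5): J = 1/([(alpha+1)/2]^(2n) + eps) - 1/(1+eps).
    # "alpha" below is a stand-in similarity index (an assumption): it equals 1
    # only when the fluctuations of y_pred and y_true agree in shape and magnitude.
    def loss(y_true, y_pred):
        xp = y_true - tf.reduce_mean(y_true)          # fluctuations of the target
        yp = y_pred - tf.reduce_mean(y_pred)          # fluctuations of the prediction
        alpha = (2.0 * tf.reduce_mean(xp * yp) /
                 (tf.reduce_mean(xp * xp) + tf.reduce_mean(yp * yp) + eps))
        return 1.0 / (((alpha + 1.0) / 2.0) ** (2 * n) + eps) - 1.0 / (1.0 + eps)
    return loss

# model.compile(optimizer="adam", loss=likeness_loss(n=2))

By construction, the loss vanishes when α = 1 (perfect likeness) and grows rapidly as α decreases, with the exponent n sharpening the penalty.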
[...] negative inputs are processed: the ELU varies slowly until its output equals a negative constant, whereas the RELU becomes sharply null. In the case of the ELU, the gradient is never identically zero, even for negative inputs, and the training can continue whatever the sign of the input values, unlike in ANNs using the RELU. This modification allows a model using the ELU to give more accurate results with faster convergence.13

B. Autoencoder: Learning process and reconstruction

The database used for the present study contained around 43 shedding periods, sampled over 500 temporal snapshots. A subset composed of 128 snapshots (corresponding approximately to ten shedding cycles) is used for the AE training. This number has been chosen to take full advantage of the TPU hardware developed by Google. All codes used to obtain the results in this paper were written using the COLAB platform with version 14 of the Tensorflow library.1

Once the training has been performed, a volume of streamwise velocity fields is randomly chosen from the validation subset and fed into the AE. One of the 12 snapshots is shown in Fig. 8(a). The corresponding reconstructed snapshot from the AE is shown in Fig. 8(b). For measuring the reconstruction error, the parameter ϕ given by the following equation is introduced:
FIG. 8. Snapshot of the streamwise velocity field for a 2D flow around a cylinder: (a) original field; (b) and (c) reconstructed snapshots from the AE and three POD modes,
respectively; (d) and (e) reconstruction error defined by ϕ2 [Eq. (6)]; (f) probability-weighted distribution of ϕ2 .
\phi_n(x,z) = 100 \times \sqrt{\frac{\left(U_{rec}^{\,n}(x,z) - U_{ori}^{\,n}(x,z)\right)^{2}}{U_{ori}^{\,2n}(x,z)}}, \qquad (6)

where U_ori and U_rec are the original and reconstructed fields, respectively. The reconstruction error defined by ϕ2(x, z) is conveyed in Fig. 8(d). A visual comparison of the input image with the reconstructed one demonstrates the high fidelity of the reconstruction process, with errors occurring mainly in the region just behind the cylinder. The error margin increases up to ϕ2 = 15. However, the probability-weighted distribution of ϕ2, plotted in blue in Fig. 8(f), [...]

FIG. 10. Identification process of a low-dimensional dynamical model by using the AE combined with the clustering algorithm.

A. Low-dimensionality projection using AE

FIG. 11. (a)–(c) Plots showing a variable in the latent space from different perspectives. Each input volume is mapped to a single black dot. The first 11 data volumes are represented by the colored dots, ranging from blue to red.

The requirement that the latent-space dimension should be substantially lower than that of the input tensor implies that the encoder and decoder must be able to identify and keep the most meaningful features. As the flow structures are coherent in space and time, a volume of 12 stacked snapshots is used as an input tensor, 12 snapshots corresponding to the vortex-shedding period. The encoder converts the input volume of 12 × 256 × 88 points to encoded features embedded within a three-component vector. Each data-volume input is thus reduced to a single point in the 3D latent-space representation. The projection of the volumes is conveyed by the black dots in Fig. 11, and the first 11 volumes are represented by the colored dots, sequentially arranged from blue to red. The encoded features form a loop, where one revolution is equivalent to a shedding cycle. Although the physical flow features cannot be interpreted directly from the components in the latent space, the compact representation nevertheless reflects a spatio-temporal organization. The low dimensionality of the latent space can be leveraged for representing the underlying temporal dynamics by processing elements or groups of elements, which will then be used by the decoder to yield a physical representation of the original field.
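The projection step described above can be sketched as follows. This is an illustrative sketch only: "encoder" is assumed to be the trained encoder part of the AE, "snapshots" is a hypothetical array of 256 × 88 velocity fields, and details such as channel dimensions and normalization are omitted.

import numpy as np

def project_volumes(encoder, snapshots, volume_len=12):
    # Stack consecutive snapshots into 12 x 256 x 88 volumes and map each
    # volume to a single point of the 3D latent space (one dot in Fig. 11).
    volumes = np.stack([snapshots[i:i + volume_len]
                        for i in range(len(snapshots) - volume_len + 1)])
    latent = encoder.predict(volumes)      # shape: (n_volumes, 3)
    return latent

# latent = project_volumes(encoder, snapshots)
# Successive latent points trace a closed loop, one revolution per shedding cycle.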
[...] features embedded within an optimal subspace. By combining this low-dimensional representation of the data and the spectral clustering algorithm, a coarse-grained description of the data is defined. Results from the spectral clustering are shown in Fig. 13, each cluster being represented by a different color. The number of clusters is constrained to be 8, and as shown in the histogram in Fig. 14(a), the events are equally distributed over all the states (clusters). Each cluster is associated with a flow state, forming the "skeleton" of the flow-dynamical system. This system can then be represented as a probabilistic Markov chain, corresponding to a stochastic model, which describes a sequence of possible events in which the probability of each event depends only on the state attained in the previous event, following the cluster-based reduced-order modeling (CROM)
FIG. 14. Probabilistic clustered dynamical model. Probability values are estimated in percentage terms and given by the integers. (a) Probabilities for the flow to stay in its current configuration (same cluster) and (b) probabilities for the flow to navigate from one state to another.

FIG. 15. Low-dimensional dynamical model: (a) map of migration probabilities between modes and (b) modes reconstructed from clusters by the decoder. Integers in figure (a) correspond to modes shown in figure (b).
framework introduced in Kaiser et al.29 Thus, it is necessary to determine the probability of a flow configuration remaining in its present state, Fig. 14(a), and the probability of the flow migrating toward another state, the map of the motions between clusters being shown in Fig. 14(b).

The flow configuration used in this study is relatively simple, and the low-dimensional dynamical system is depicted in Fig. 15. The skeleton of the low-dimensional dynamical system is obtained by using the decoder to reconstruct elements from the clusters in the latent space and is shown in Fig. 15(b). The dynamics of the system are illustrated by the orientation and thickness of the arrows in Fig. 15(a).

In this section, it was shown that a model making probabilistic prediction possible can be derived by clustering elements from the latent space obtained by the AE. This approach can be useful for providing insight into which control strategy to adopt, by either promoting some modes rather than others or undermining modes leading the flow to a disadvantageous configuration (such as a drag increase47).

VI. PREDICTION OF TEMPORAL EVOLUTION: DETERMINISTIC APPROACH

In Sec. V, a low-dimensional model of the flow dynamical system was obtained using an AE in combination with spectral clustering. Importantly, the flow prediction is probabilistic—providing [...]

[...] used to make "conventional," deterministic predictions of the flow evolution, based on a preceding learning process.

Intuitively, one approach is to use as input a snapshot at a specific time step and "ask" the AE to construct the following time-step snapshot, using the latter as output—an approach similar to the dynamic mode decomposition. In fact, if the activation functions are linear and the MSE is used as the cost function, the AE is a linear operator, which should be similar to the DMD.66 However, the AE is also capable of performing non-linear transformations if the activation functions are non-linear. Therefore, the "time-lagged AE" can be seen as a generalization of the DMD.

Another possible approach, illustrated in Fig. 16, is to perform the prediction in the latent space, taking advantage of the low dimensionality, in which case only the dynamics of the most relevant features are conserved. Modeling the evolution of the latent variables is easier and more cost-effective, as with this approach, the evaluation of new steps scales with the size of the low-dimensional representation and not with the size of the full-dimensional data. The first step is to train the AE to learn how to extract and compact the most valuable information associated with the flow dynamics in the latent space. Once this task is completed, the following step is to predict the temporal evolution of the low-dimensional system by using a Convolutional Neural Network (CNN) in the latent space, which aims to extract the temporal evolution. As indicated in Fig. 16, once the training phase is completed, the combination of the AE with the CNN can be used to predict the flow evolution from an initial condition [...]
FIG. 16. Illustration of the temporal prediction process using the AE: upper row, training; lower row, prediction.
space to predict the following time step (at t r +1 ). The vector obtained
is fed into the CNN in order to predict the time step t r +2 , and this is
repeated within a closed loop. The vectors obtained are decoded “on
the fly” for constructing the time-lagged data volumes. The CNN
used comprises five convolutional layers with mono-dimensional fil-
ters, the input being time-series data covering 12 time steps in the
latent space.
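A minimal Keras sketch consistent with this description is given below. It is not the author's implementation: the filter counts, kernel sizes, and the dense read-out layer are illustrative assumptions; only the one-dimensional filters, the five convolutional layers, and the 12-step latent input window follow the text.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

n_latent = 3      # latent-space dimension
window = 12       # number of latent time steps used as input

# Five 1D convolutional layers mapping a window of latent vectors to the next one.
cnn = models.Sequential([
    layers.Conv1D(32, 3, padding="same", activation="elu",
                  input_shape=(window, n_latent)),
    layers.Conv1D(32, 3, padding="same", activation="elu"),
    layers.Conv1D(32, 3, padding="same", activation="elu"),
    layers.Conv1D(32, 3, padding="same", activation="elu"),
    layers.Conv1D(32, 3, padding="same", activation="elu"),
    layers.Flatten(),
    layers.Dense(n_latent),      # read-out to the next latent vector (assumption)
])
cnn.compile(optimizer="adam", loss="mse")
# cnn.fit(latent_windows, next_latent_vectors, epochs=200)

def rollout(cnn, history, n_steps):
    # Closed-loop prediction: each predicted latent vector is appended to the
    # window and re-used as input for the following time step.
    window_data = list(history[-window:])
    predictions = []
    for _ in range(n_steps):
        x = np.asarray(window_data[-window:])[None, ...]   # shape (1, window, n_latent)
        nxt = cnn.predict(x, verbose=0)[0]
        predictions.append(nxt)
        window_data.append(nxt)
    return np.asarray(predictions)

The predicted latent vectors can then be decoded "on the fly" by the AE decoder to reconstruct the corresponding flow fields, as described above.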
One of the most useful ways to uncover the structure in high-
dimensional data is to project it down to a subspace, such as a 2D
plane, where hidden features may become visible. Of course, the
challenge is to find the plane that best illuminates the data structure.
In the present test application, the above process is imple-
mented in order to predict up to 20 shedding cycles. The low-
dimensional data predicted in the latent space are compared with the
original encoded data in Fig. 17. The left-hand side plots show the
time evolution for the three latent space dimensions. The predicted
variations (blue lines) virtually collapse onto the variations result-
ing from feeding the actual fields into the latent space (red lines).
However, closer examination reveals differences in the peaks of the
latent-space vector and also reveals a lag between the prediction and
the actual values [see Fig. 17(b)]. In the phase plot in Fig. 17(d), every
projection of the original event over the 20 cycles is represented by
a black dot, forming a near-continuous loop. Prediction results are
compared to the original projection at four selected time steps: ◯ t/T0 ≈ 4, △ t/T0 ≈ 7, ◽ t/T0 ≈ 15, and ◇ t/T0 ≈ 19, with T0 being the period of one shedding cycle. At t/T0 ≈ 4, i.e., after four vortex-shedding
[...] While the general spatial features are well predicted, the precise temporal evolution suffers from the phase lag discussed above. The accuracy of the CNN can probably be improved by tuning its architecture and the training parameters.

To conclude, the implicit assumption behind dimensionality reduction is the existence of relationships between features of high-dimensional data, i.e., data points lying on, or being close to, a smooth manifold in the high-dimensional input space. By identifying dependencies between data in a high-dimensional space, the manifolds on which the data live are learned by the AE and then leveraged for computing a low-dimensional embedding of the underlying manifold. Input data are represented by the AE in a more compact coordinate system in which the manifolds are "unfolded." As only essential information is conserved and represented in lower-space [...]
FIG. 20. Illustration of the process of sparse reconstruction using the AE and CNN.
[...] in allowing him to conduct this work under favorable working conditions. The author is also grateful for the insightful comments offered by Laurent Cordier, Nicholas Hutchins, and the anonymous peer reviewers. The generosity and expertise of one and all have improved this study in innumerable ways.

DATA AVAILABILITY

Data and codes that support the findings of this study are available on Github: github.com/LionelAgo/Vortex_AE. All codes used to obtain the results in this paper were written using the COLAB platform with version 14 of the Tensorflow library.1

REFERENCES

1 M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, "Tensorflow: A system for large-scale machine learning," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (USENIX Association, Savannah, GA, 2016), pp. 265–283.
2 S. Bagheri, L. Brandt, and D. S. Henningson, "Input-output analysis, model reduction and control of the flat-plate boundary layer," J. Fluid Mech. 620, 263–298 (2009).
3 P. Baldi and K. Hornik, "Neural networks and principal component analysis: Learning from examples without local minima," Neural Networks 2(1), 53–58 (1989).
4 N. Benard, J. Pons-Prats, J. Periaux, G. Bugeda, P. Braud, J. P. Bonnet, and [...]
18 B. F. Farrell and P. J. Ioannou, "Stochastic forcing of the linearized Navier-Stokes equations," Phys. Fluids A 5(11), 2600–2609 (1993).
19 I. Foster, R. Ghani, R. S. Jarmin, F. Kreuter, and J. Lane, Big Data and Social Science: A Practical Guide to Methods and Tools (CRC Press, 2016).
20 P. Garnier, J. Viquerat, J. Rabault, A. Larcher, A. Kuhnle, and E. Hachem, "A review on deep reinforcement learning for fluid mechanics," arXiv:1908.04127 (2019).
21 N. Gautier, J.-L. Aider, T. Duriez, B. R. Noack, M. Segond, and M. Abel, "Closed-loop separation control using machine learning," J. Fluid Mech. 770, 442–457 (2015).
22 X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS, 2010), pp. 249–256.
23 F. J. Gonzalez and M. Balajewicz, "Deep convolutional recurrent autoencoders for learning low-dimensional feature dynamics of fluid systems," arXiv:1808.01346 (2018).
24 I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016).
25 P. Hall and S. Sherwin, "Streamwise vortices in shear flows: Harbingers of transition and the skeleton of coherent structures," J. Fluid Mech. 661, 178–205 (2010).
26 R. Hecht-Nielsen, "Kolmogorov's mapping neural network existence theorem," in Proceedings of the International Conference on Neural Networks (IEEE Press, New York, 1987), Vol. 3, pp. 11–14.
27 S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput. 9(8), 1735–1780 (1997).
28 S. Jamal and J. S. Bloom, "On neural architectures for astronomical time-series classification," arXiv:2003.08618 (2020).
29 E. Kaiser, B. R. Noack, L. Cordier, A. Spohn, M. Segond, M. Abel, G. Daviller, J. Östh, S. Krajnović, and R. K. Niven, "Cluster-based reduced-order modelling of [...]
44 R. Maulik, B. Lusch, and P. Balaprakash, "Reduced-order modeling of advection-dominated systems with recurrent neural networks and convolutional autoencoders," arXiv:2002.00470 (2020).
45 M. Milano and P. Koumoutsakos, "Neural network modeling for near wall turbulent flow," J. Comput. Phys. 182(1), 1–26 (2002).
46 A. T. Mohan, N. Lubbers, D. Livescu, and M. Chertkov, "Embedding hard physical constraints in neural network coarse-graining of 3d turbulence," arXiv:2002.00021 (2020).
47 A. G. Nair, C.-A. Yeh, E. Kaiser, B. R. Noack, S. L. Brunton, and K. Taira, "Cluster-based feedback control of turbulent post-stall separated flows," J. Fluid Mech. 875, 345–375 (2019).
48 V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on International Conference on Machine Learning, ICML'10 (Omnipress, Madison, WI, USA, 2010), pp. 807–814.
49 See https://www.oreilly.com/content/a-look-at-deep-learning-for-science/ for information about deep learning technologies applied to science.
50 See https://palabos.unige.ch/lattice-boltzmann/lattice-boltzmann-sample-codes-various-other-programming-languages for information about the code.
51 S. E. Otto and C. W. Rowley, "Linearly recurrent autoencoder networks for learning dynamics," SIAM J. Appl. Dyn. Syst. 18(1), 558–593 (2019).
52 V. Parezanović, J.-C. Laurentie, C. Fourment, J. Delville, J.-P. Bonnet, A. Spohn, T. Duriez, L. Cordier, B. R. Noack, M. Abel et al., "Mixing layer manipulation experiment," Flow, Turbul. Combust. 94(1), 155–173 (2015).
53 K. Pearson, "LIII. On lines and planes of closest fit to systems of points in space," London, Edinburgh Dublin Philos. Mag. J. Sci. 2(11), 559–572 (1901).
54 E. Plaut, "From principal subspaces to principal components with linear autoencoders," arXiv:1804.10253 (2018).
55 S. B. Pope, Turbulent Flows (IOP Publishing, 2001).
56 J. Rabault, J. Kolaas, and A. Jensen, "Performing particle image velocimetry using artificial neural networks: A proof-of-concept," Meas. Sci. Technol. 28(12), 125301 (2017).
57 J. Rabault, U. Reglade, N. Cerardi, M. Kuchta, and A. Jensen, "Deep reinforcement learning achieves flow control of the 2d Karman vortex street," arXiv:1808.10754 (2018).
58 A. Rocchetto, E. Grant, S. Strelchuk, G. Carleo, and S. Severini, "Learning hard quantum distributions with variational autoencoders," npj Quantum Inf. 4(1), 1–7 (2018).
59 F. Rosenblatt, "The perceptron: A probabilistic model for information storage and organization in the brain," Psychol. Rev. 65(6), 386 (1958).
60 D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature 323(6088), 533–536 (1986).
61 A. L. Samuel, "Some studies in machine learning using the game of checkers," IBM J. Res. Dev. 3(3), 210–229 (1959).
62 H. Sidky, W. Chen, and A. L. Ferguson, "Machine learning for collective variable discovery and enhanced sampling in biomolecular simulation," Mol. Phys. 118, e1737742 (2020).
63 R. Sutton and A. Barto, Reinforcement Learning: An Introduction (MIT Press, 2011).
64 H. Tennekes and J. L. Lumley, A First Course in Turbulence (MIT Press, 1972).
65 N. G. Van Kampen, "Determinism and predictability," Synthese 89(2), 273–281 (1991).
66 C. Wehmeyer and F. Noé, "Time-lagged autoencoders: Deep learning of slow collective variables for molecular kinetics," J. Chem. Phys. 148(24), 241703 (2018).
67 P. Werbos, "Beyond regression: New tools for prediction and analysis in the behavioral sciences," Ph.D. dissertation (Harvard University, 1974).
68 X. Zou, I. M. Navon, and J. Sela, "Control of gravitational oscillations in variational data assimilation," Mon. Weather Rev. 121(1), 272–289 (1993).