0% found this document useful (0 votes)
20 views20 pages

Neurocomputing: Craig Michoski, Miloš Milosavljevi C, Todd Oliver, David R. Hatch

Uploaded by

Samir Stha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
20 views20 pages

Neurocomputing: Craig Michoski, Miloš Milosavljevi C, Todd Oliver, David R. Hatch

Uploaded by

Samir Stha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 20

Neurocomputing 399 (2020) 193–212

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Solving differential equations using deep neural networks


Craig Michoski a,∗, Miloš Milosavljević b, Todd Oliver a, David R. Hatch c
a
Oden Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX 78712, United States
b
Department of Astronomy, The University of Texas at Austin, Austin, TX 78712, United States
c
Institute for Fusion Studies, University of Texas at Austin, Austin, TX 78712, United States

a r t i c l e i n f o a b s t r a c t

Article history: Recent work on solving partial differential equations (PDEs) with deep neural networks (DNNs) is pre-
Received 25 July 2019 sented. The paper reviews and extends some of these methods while carefully analyzing a fundamental
Revised 14 December 2019
feature in numerical PDEs and nonlinear analysis: irregular solutions. First, the Sod shock tube solution
Accepted 4 February 2020
to the compressible Euler equations is discussed and analyzed. This analysis includes a comparison of a
Available online 19 February 2020
DNN-based approach with conventional finite element and finite volume methods, and demonstrates that
Communicated by Dr. Biao Luo the DNN is competitive in terms of degrees of freedom required for a given accuracy. Further, the DNN-
based approach is extended to consider performance improvements and simultaneous parameter space
Keywords:
Deep neural networks exploration. Next, a shock solution to compressible magnetohydrodynamics (MHD) is solved for, and used
Differential equations in a scenario where experimental data is utilized to enhance a PDE system that is a priori insufficient to
Partial differential equations validate against the observed/experimental data. This is accomplished by enriching the model PDE sys-
Nonlinear tem with source terms that are then inferred via supervised training with synthetic experimental data.
Shocks The resulting DNN framework for PDEs enables straightforward system prototyping and natural integra-
Data analytics tion of large data sets (be they synthetic or experimental), all while simultaneously enabling single-pass
Optimization exploration of an entire parameter space.
Published by Elsevier B.V.

1. Introduction To explore how efficient, reliable, robust, and generalizable


DNN-based solutions are will undoubtedly require careful and pro-
Solving differential equations using optimization/minimization longed study. Here, we offer a practical starting point by asking
strategies has been popular for some time in the form of least how well, and in what ways, can DNNs manage one of the cleanest,
squares finite element methods (LSFEM) [7,29,57], and element- simplest, and most ubiquitous irregularities observed in differential
free Galerkin methods [4,36], where even for mesh-free formula- systems, namely shock fronts. Shock fronts provide a quintessential
tions, theoretical convergence criteria have been examined [3]. Ap- test bed for a numerical method in that they contain an isolated
proaches such as these were originally applied to neural networks analytically non-differentiable feature that propagates through the
in [32], though using neural networks in this context has recently differential system. The feature reduces the local regularity of the
experienced a resurgence of interest as seen in [5,25,45,51,54]. solution both in space and time and leads to numerical represen-
These recent works have shown that remarkably simple imple- tations that must accommodate local first-order analytic disconti-
mentations of deep neural networks (DNNs) can be used to solve nuities while still maintaining some concept of numerical stability
relatively broad classes of differential equations. From a method and robustness. Indeed the way a numerical method responds to
development point of view, demonstrating this capability is in and a local shock front tends to reveal the fundamental signature re-
of itself of research interest. However, it remains unclear what sponse that the numerical method has to local irregularities in the
the practical utility and scope of these types of methods may solution, which might be thought of as providing an indicator by
ultimately be within science and engineering applications, as well which to gauge the method’s applicability in such cases.
as for the numerical analysis of differential systems. A tremendous amount of work spanning pure mathematical
analysis [33,55], numerical analysis [1,46], physics [10,61], etc., has
been done on differential systems that demonstrate shock-like be-
∗ havior. Shock-like numerical behavior is deeply rooted in discrete
Corresponding author.
E-mail addresses: [email protected], [email protected] (C. Mi-
function representation and approximation theory, and further in
choski). functional and discrete functional analysis, specifically in what it

https://doi.org/10.1016/j.neucom.2020.02.015
0925-2312/Published by Elsevier B.V.
194 C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212

means to locally and globally approximate a member of a function physical systems (as discussed in some detail in Section 4.4 be-
space [11,18,19,26]. low). In (III), another powerful and practical advantage is identi-
Along these lines, the main portion of this paper is dedicated fied, where n-dimensional parameter space exploration simply re-
to examining the method of DNNs for solving PDEs, and exploring quires augmenting the solution domain (space-time) with addi-
the behavior of DNN approximations to the conventional Sod shock tional parameter axes, (x, t, p1 , . . . , pn ), and then optimizing the
tube solution to the compressible Euler equations of gas dynamics DNN to solve the PDEs as a function of the parameters as inputs.
[56]. We use the shock tube solution as a setting in which to ex- The input space augmentation adds little algorithmic or computa-
plore what is meant by a DNN solution to a system of PDEs, what tional complexity over solving on the space-time domain (x, t) at a
is meant by a regular (and irregular) solution to a system of PDEs single parameter point ( p1 , . . . , pn ) and is drastically simpler than
both analytically as well as numerically, how these two frames of exploring parameter space points sequentially.
reference relate to each other, and where they differ. We spend Some of the present drawbacks of the DNN-based approach to
some time examining what regular/irregular solutions really rep- solving PDEs include:
resent and how DNN solutions to differential equations provide an
opportunity for transitioning into, what we refer to in this paper (i) Absence of theoretical convergence guarantees for the non-
as a data-rational1 paradigm where the critical and central focus convex PDE residual minimization.
of study turns to the content of the relational mapping between (ii) Slower overall run time per forward solve.
numerical systems and observation. Then we discuss in some de- (iii) Weaker theoretical grounding of the method in standard
tail how DNN methods compare and relate to more traditional and PDE analysis.
conventional ways of solving systems of PDEs.
We touch on these weaknesses throughout the paper, but
The last portion of the paper is focused more directly on how
briefly, (i) indicates the challenge posed by understanding conver-
to improve and extend the performance of DNN solutions in the
gence criteria in non-convex optimization, where solutions may
context of irregular PDE solvers, and how to capitalize on the
get trapped in local minima. The concern of (ii) is substantially
operational details of DNNs solvers to perform complete and si-
more subtle than it appears at first glance, and strongly depends
multaneous parameter space exploration. We also consider how
on many aspects of the chosen neural network architecture, how
to extend the framework of physics-based modeling to incorpo-
optimized the hyperparameters are, what the ultimate goal of the
rate experimental data into the analytic workflow. These questions
simulation is, etc. Finally, (iii) merely highlights that DNNs have
have growing importance in science and engineering applications
only recently been seriously considered for PDE solvers, and thus,
as data becomes increasingly available. Many of the more tradi-
remain largely theoretically untouched.
tional approaches, including data-informed parameter estimation
The paper is organized as follows: Section 2.3 provides a
methods [2] and real-time filtering [23], are frequently incorpo-
method overview of how DNNs can be used to solve a nonlin-
rated into more standard forward PDE solvers. We argue that DNN-
ear system of PDEs. Within this section is a discussion about
based PDE solvers provide a natural interface between PDE solvers
the relationship between different notions of solutions to PDEs
for physics-based models and advanced data analytics. The general
and attempts to provide some context in which to understand
observation is that because DNNs are able to combine a PDE solver
where DNN solutions can be placed relative to solutions provided
framework with a simple and flexible interface for optimization,
by more conventional numerical methods, and how they point
it becomes natural to ask how sufficient a modeling system is at
towards ‘data-rational’ paradigms. Section 3 provides numerical
representing experimental observations and to explore the nature
experiments which demonstrate how the DNN solution behaves
of the mismatch between a simulated theoretical model and an
given unbounded gradients, how the DNN approximations of the
experimentally measured phenomenon. In this context, the DNN
Sod shock tube problem compare to more standard finite element
framework automatically inherits the ability to couple many prac-
and finite volume methods, and what is meant by concepts of
tical data-driven concepts, from signal analysis, to optimal control,
convergence within this context. Finally Section 4 expands on the
to data-driven PDE enrichment and discovery.
standard DNN solution approach examining natural extensions to
To provide some insight into the utility of the DNN framework
this framework. Here methods for improving convergence times
for solving PDEs, it is instructive to discuss some of the apparent
of DNN solutions are discussed, as are solution dependencies on
strengths and drawbacks. Three strengths of the framework are:
network architecture. The final subsection of Section 4 discusses
(I) Phenomenal ease of prototyping PDEs. data-enrichment of PDEs, which is presented through the lens of a
(II) Natural incorporation of large data. hypothetical experimental scenario, where the experimenter finds
(III) Simultaneous solution over an entire parameter space. herself in the situation in which the anticipated physical model of
the system does not adequately validate against the experimental
Regarding (I), complicated systems of highly parameterized and data.
multidimensional PDEs can be prototyped in TensorFlow or Py-
Torch in hundreds of lines of code, in a day or two. This might be
2. Background on irregular solutions and DNN methods
compared to the decade-long development cycle of many legacy
PDE solvers. For (II), incorporating and then utilizing experimental
In this section we first discuss some of the motivation under-
data in the PDE workflow as a straightforward supervised machine
lying the use of DNN methods for solving differential equations,
learning task is algorithmically very simple and provides a painless
and how this naturally leads to a data-rational scientific paradigm.
integration with the empirical scientific method. This feature could
Next we review the DNN method for solving an n-coupled system
be naturally leveraged for practical uncertainty quantification, risk
of nonlinear PDEs, and then discuss how this method is directly
assessment, data-driven exploration, optimal control, and even
applied to the shock problem.
discovery and identification of the theoretical underpinnings of

2.1. Irregular solutions to PDEs


1
In this paper “data-rational” refers to a systems approach to differential equa-
tions that studies, as its primary focus, the mapping between the explicit numeri-
cal content of a differential system (e.g. a computer register) and the observed data
Though many, if not most, known PDEs have been derived from
(e.g. experimental measurements) it approximates. This topic is discussed in more and are studied for their ability to represent natural phenomena,
detail in Section 2.2. at their core they remain mathematical abstractions that are only
C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212 195

Fig. 1. The unlimited solutions to the Sod shock tube. The solution is fully ill-posed in the classical formulation, leading to undefined behavior (i.e., unbounded derivatives)
along the shock fronts. Nevertheless, the solution remains stable (in contrast with the unlimited versions of other algorithms), and appears relatively robust to the major
features of the solution.

approximations to the phenomena that inspire them. These ab- in a regular solution, enough derivatives exist at every point in the
stractions are often derived from simple variational perturbation solution so that operations within the PDE make sense at every
theories, and are subsequently celebrated for their ability to cap- point in the entire domain of relevance. A well-posed regular so-
ture and predict the physical behavior of complex natural sys- lution is formally defined to provably exist, be unique, and depend
tems. This “capturing” of physical systems is conventionally ac- continuously on the data given in the problem [15]. In contrast to
complished at the level of experimental validation, which is of- regular solutions, the concept of a weak solution was developed to
ten many steps removed from the abstraction that the PDE itself substantially weaken the notion of what is meant by a solution to
represents. a PDE. In weak solutions derivatives only exist in a “weak” distri-
In the study and understanding of PDEs and mathematical anal- butional sense, and the solution is subsequently recast in terms of
ysis, it is difficult to over-emphasize the importance of stability and distributions and integral norms.
regularity. One of the most challenging problems in modern theo- For example, in the case of a “strong solution” to a PDE,
retical science and engineering is developing a strategy for solving which is a “weak solution” that is bounded and stable in the L2 -
a PDE and ultimately interpreting the solution. In this context, ir- norm and thus preserves important geometric properties, a solu-
regular solutions to PDEs are easy to come by. To account for this, tion may admit, e.g., pointwise discontinuities and remain an ad-
convention introduces the concept of mathematically well-posed missible solution. Such solutions are the strongest, and/or most
solutions (in the sense of Hadamard [24]). Well-posedness circum- well-behaved of the weak solutions; and yet still, interpreting the
scribes what is meant, in some sense, by a “good” or admissible space of strong solutions from a physical point of view can be
solution to a PDE. It is within this framework that the concept challenging. Moreover, weak solutions in the sense of Hadamard
of classical and regular solutions, strong solutions, weak solutions, are only proper solutions to a PDE if they are adjoined with a
and so forth has evolved. concept of stability (such as Lp -stability) that implies a bound
A “regular solution” can be thought of as one where the actual in a given normed space. This is a rather practical condition, in
PDE is satisfied in an intuitively sensible and classical way. Namely, that a solution in a distributional sense that is not bounded in
196 C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212

norm is too permissive to pointwise “variation” to be physically data analytic frameworks and coding infrastructures (e.g. Tensor-
interpretable. Flow), it becomes natural and simple to incorporate and couple
A substantial fraction of solutions arising in applied science these frameworks with data-critical observational systems (as dis-
are weak solutions that demonstrate stability in notably irregu- cussed in more detail in Section 4.4).
lar spaces. Even the simplest of equations can admit solutions of
highly irregular character (e.g., the Poisson equation [58]). In fact, 2.3. Method for solving PDEs using deep neural networks
in many applied areas the concept of turbulence becomes impor-
tant, which from a mathematical point of view deals with PDEs In this paper we are interested in systems of n-coupled poten-
in regimes where neither regularity nor stability are observed, a tially nonlinear PDEs that can be written in terms of the initial-
fact with broad conceptual ramifications [28]. This irregular and boundary value problem, for i = 1, . . . , n:
unexpected behavior in PDE analysis leads to a rather basic and
inevitable question that we address in the next section: how well ∂t ui + Ni (u; p) = 0, for (x, t, p) ∈  × [0, Ts ] ×  p ,
do our specific discrete numerical approximate solutions to certain ui |t=0 = ui,0 (x, p), for (x, p) ∈  ×  p ,
PDEs capture their rigorously established mathematical properties, Bi (u, p) = 0, for (x, t, p) ∈ ∂  × [0, Ts ] ×  p , (1)
and why might this question matter?
over the domain  ⊂ Rd with boundary ∂ , and the m di-
2.2. Numerical solutions and the data-rational paradigm mensional parameter domain p ∈ p given m parameters p =
( p1 , . . . , pm ), where u = u(x, t, p) is the n dimensional solution
Discrete numerical systems cannot, in general, exactly replicate vector with ith scalar-valued component ui = ui (x, t, p), Ts is the
infinite dimensional spaces (except in an abstract limit), and thus duration in time, Ni is generally a nonlinear differential oper-
approximate numerical solutions to PDEs remain just that: ap- ator, with arbitrary boundary conditions expressed through the
proximations to mathematical PDEs. As a matter of practice, weak component-wise operator Bi (u, p).
formulations are often implemented in numerical methods, but Here we are interested in approximating the solutions to (1) us-
while these methods often preserve the spirit of their mathemat- ing deep neural networks (DNNs). A feed-forward network can be
ical counterparts, they subsequently lose much of their content. described in terms of the input y ∈ Rdin for y = (x, t, p), the out-
For example, solutions in discontinuous finite element methods put zL ∈ Rdout , and an input-to-output mapping y→zL , where din
are frequently couched in terms of L2 residuals and are presented and dout are the input and output dimension. In the present set-
along with numerical stability results, when in effect the discrete ting, din = d + 1 + m and dout = n. The components of the pre- and
L2 spaces they are purported to represent are simply piecewise post-activations of hidden layer  are denoted yk and zj , respec-
continuous polynomial spaces that are capable of representing only tively, where the activation function is a sufficiently differentiable
a tiny fraction of admissible solutions in L2 . function φ : R → R. The jth component of the activations in the th
It is then a natural question to ask, what has become of all the hidden layer of the network is given by
admissible solutions that even the most rigorous numerical meth-
ods discard by construction? And how exactly has it been deter-

N
zj (y ) = bj +  
W jk yk ( y ), yk = φ (zk−1 (y )), (2)
mined which solutions are to be preserved? A pragmatist might
k=1
respond with the argument that numerical methods develop, pri-
marily, to deliver practical utility, adjoining as a central thematic where N are the numbers of neurons in the hidden layers of the
network, and W jk  and b are the weight and bias parameters of
pillar to their evolution the notion of “physical relevance,” where j
physical relevance is notionally defined in terms of the utility that layer . These parameters make up the tunable, or “trainable”, pa-
the numerical solution provides in the elucidation of a practical rameters of the network, which can be thought of as the degrees
observation. of freedom that parametrize the representation space of the DNN.
While this may certainly be the case, and while we adopt the Note that in Section 4.3 we will consider a slightly more exotic
pragmatic perspective in this paper, we would like to raise a con- network architecture, but for now, we use the standard “multilayer
sideration to the reader: if numerical methods, particularly those perceptron” described in (2).
driving the evaluation of predictive simulation models, are, with In this context, the output of the neural network zL , where the
widespread application, utilizing numerical schemes that to some number of units N is chosen and fixed for each layer, realizes a
extent arbitrarily select solutions of a particular kind from a much family of approximate solutions to (1), such that for each i compo-
larger space of mathematically admissible solutions, and then use nent of (1),
that subset of solutions as the justification upon which interpre-
ziL (x, t; p, ϑ ) ≈ ui (x, t, p), (3)
tive predictive physical understanding is derived, then how can one
evaluate the predictive capability of a model without directly com- where we have set the parameters ϑ = (W, b). Similarly, activation
paring it to observed data? functions φ must be chosen such that the differential operators
As a consequence of these considerations, it seems more sci-
∂ ziL
entifically and logically responsible to operate under what might (x, t; p, ϑ ) and Ni (zL ; p),
be characterized as a data-rational paradigm, that views a dif- ∂t
ferential equation perhaps more primarily as a tool for approxi- can be readily and robustly evaluated using reverse mode auto-
mating functional relationships between measurable and/or deriv- matic differentiation [17].
able quantities for the purposes of validating and predicting tar- We proceed by defining a slightly more general objective func-
geted systems. The numerical solution to a differential equation in tion to that presented in [51,54],
this paradigm is more of a discrete relational mapping between
an encoded numerical representation space (e.g. a computer reg- Ji (ϑ ) = JiPDE (ϑ ) + JiBC (ϑ ) + JiIC (ϑ ) (4)
ister) and a network of observed data (e.g. experimental measure- where
ments). In this regard, the DNN solutions to differential equations
 ∂ zL 
explored in this paper may provide an opportunity for easy tran-  i 
JiPDE (ϑ ) =  ∂ t + Ni ( z L ; p ) , JiBC (ϑ ) = Bi (u, p)Li (∂ ×[0,Ts ]× p ,℘2 ) ,
sition into such a data-rational paradigm. As DNN methods are of- Li (×[0,Ts ]× p ,℘1 )

ten already couched within the context of optimization algorithms, JiIC (ϑ ) = ziL (x, 0, p; ϑ ) − ui,0 (x, p)Li (× p ,℘3 ) , (5)
C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212 197

and Li indicates the form of the ith loss and ℘1,2,3 denotes the arise [12,41]. As a consequence, numerical approaches to the shock
probability density with respect to which the expected values of problem display unique characteristics that often expose deep nu-
the loss are calculated. In particular, the form of the loss function merical insights into the nature of the method. The unique signa-
can correspond to any standard norm (e.g., the discrete kernel of ture that a numerical method displays when dispersing its instabil-
an Lp -norm), the mean square error, or specialized hybrids such as ities throughout a solution foretells the predictive characteristics of
the Huber loss [27]. For this chosen form L , the corresponding the simulation itself.
loss is given by the following expected value: Many successful numerical methods can be viewed as varia-
 tions of general strategies with important but subtle differences in
 f (X )L(Y,℘) = E℘[L ( f )] = L ( f )℘(X )dX. the underlying detail. Consider, for example, finite volume meth-
Y
ods (FVM), spectral, pseudospectral, and Fourier Galerkin meth-
Letting zL = {z1L , . . . , znL }, we define any particular approximate so- ods, continuous and discontinuous Galerkin finite element meth-
lution zˆL to (1) as finding a ϑ that minimizes the loss relative to ods, etc. In each of these cases, the system of PDEs is cast into
the parameter set. That is: a weak formulation, where the spatial derivatives are effectively

n transferred to analytically robust basis functions, and fluxes are re-
ϑˆ = arg min Ji (ϑ ). (6) covered from integration by parts. For example, if we choose some
ϑ i=1 adequately smooth test function ϕ, multiply it by some transport
To solve this optimization problem, one must approximate the equation ∂t U + F (U )x = 0 in a state space where U a state vector,
loss function and couple this approximation with an optimization and integrate by parts, we have an equation that can be written in
algorithm. For instance, using stochastic gradient descent (SGD), the form:
one uses a Monte Carlo approximation of the loss coupled with   
 ϕ  U dx + ∂  ϕ  F dS −  F  ϕx dx
d
dt
= 0, (8)
gradient descent. Thus, the loss is computed by drawing a set of
samples Xbatch from the joint distribution ℘1℘2℘3 and forming the where  denotes componentwise multiplication.
following approximation: To see the relation between distinct numerical methods, recall
here that when ϕ is a high order polynomial, and the discrete do-
Gi (ϑ ; Xbatch ) = GPDE
i (ϑ; Xbatch ) + GBC
i (ϑ ; Xbatch ) + Gi (ϑ ; Xbatch ),
IC
main is, for example, a single element, the recovered method is a
where type of spectral element method (SEM). Whereas, when the dis-

NPDE cretization is turned into a finite element mesh h comprised of
1
GPDE
i (ϑ; Xbatch ) = L (Ri (xn , tn , pn )), i subintervals, 1 , . . . , i , and the basis functions are chosen to
NPDE be continuous or piecewise discontinuous polynomials, we have
n=1
the continuous Galerkin (CG) or discontinuous Galerkin (DG) fi-
1 
NBC

i (ϑ ; Xbatch ) =
GBC L (Bi (xn , pn )), nite element methods, respectively. Similarly, when the elements
NBC i are viewed as finite volumes and the basis functions ϕ are cho-
n=1
sen to be piecewise constant, a finite volume method can be easily
1 
NIC

i (ϑ ; Xbatch ) =
GIC L (ziL (xn , 0, pn ; ϑ ) − ui,0 (xn , pn )), recovered.
NIC As discussed above the weak formulation (8) of a method
n=1

and Ri (xn , tn , pn ) denotes the ith PDE residual evaluated at the should not really be viewed, in and of itself, as an advantage in the
batch point (xn , tn , pn ), which is drawn according to ℘1 , and simi- data-rational paradigm. Indeed, the DNN loss functions (4) could
larly for the BC and IC terms. Then, the SGD algorithm for the k + 1 equally be cast into a weak formulation (as in [7] for example),
iteration can be written as paying a price in the simplicity of the method. Similarly, by using
  different activation functions (e.g., φ being taken as ReLU), the uni-

n
versal approximation theorem can, in the limit sense, recover all
ϑk+1 ← ϑk − αk ∇ϑ Gi ( ϑ k
k ; Xbatch ) , (7)
Lebesgue integral function spaces [38]. We sidestep these issues,
i=1
viewing them as unnecessary complications that muddy the sim-
k
where Xbatch denotes the batch set drawn at iteration k. plicity and elegance of the DNN solution. Instead, we choose the
In practice we find the Adam optimizer [30] to be the most ef- hyperbolic tangent tanh (z) activation function and interpret solu-
ficient SGD-family algorithm for the cases run in this paper. For tions to shock problems through the lens that smooth solutions are
network initialization we use the Xavier uniform initializer [22]. dense in Lp .
It is also worth noting that the hyperparameters L and N as well
as the sampling distribution ℘ are additional hyperparameters that 2.3.2. Euler’s equations and the Sod shock tube
can be optimized. For simplicity, however, we do not perform hy- For simplicity, we start by considering the dimensionless form
perparameter optimization in this paper. of the compressible Euler equations in 1D, solved over only the
space-time domain  × [0, Ts ], with  = [−1, 1] and bound-
2.3.1. Shocks ary ∂  = {−1, 1}. We are interested in solutions to the initial-
Perhaps the simplest type of irregularity that confronts a nu- boundary value problem:
merical method is a numerical shock. A numerical shock need not
be a mathematical discontinuity, and as such it is a fundamen- ∂t U + ∂x F (U ) = 0, U |t=0 = U 0 , U |x∈∂  = U ∂  , (9)
tally different concept than both a mathematical shock and a phys- for a state vector U = (ρ , ρ u, E ) and corresponding flux F (U ) =
ical shock. One definition of a numerical shock is: the numeri- (ρ u, ρ u2 + p, u( p + E )) . The state vector is comprised of the den-
cal response (in the form of numerical instability) of a method sity ρ , velocity u, and total energy of the gas
when the representation space that the method is constructed in
p ρ
does not support the function space it is interrogating. Examples E= + u2 ,
of this behavior can be found from Gibb’s phenomena in spectral
γ −1 2
methods to Runge’s phenomena in polynomial based methods and while the pressure-temperature relation is the single component
elsewhere. More broadly, one can simply posit that the radius of ideal gas law p = ρ T . The constant γ denotes the adiabatic index
convergence of the solution must be contained within the sup- and, unless otherwise noted, is taken throughout the remainder
 of
port of its discrete representation, or else instabilities are likely to the paper to be γ = 5/3, while the speed of sound is cs = γ T .
198 C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212

To analyze the behavior of the network on a simple and classic local minimum. This leads to an interesting feature of the DNN so-
1D shock problem, we solve the Sod shock tube. The initial condi- lutions, namely that they seem to demonstrate robustness while
tion is chosen to be piecewise constant: broadly maintaining many of the small-scale features of the exact
solution, even in the presence of irregularity in the formulation.
Ul, x<0
U0 = (10)
U r, x≥0
3.1.1. DNNs as gridless adaptive inverse PDE solvers
where the left state is defined by U l = (1, 0, 1.5 ) and the right by To explain why an approximate regular DNN solution to a shock
U r = (0.125, 0, 0.15 ) . This also defines the left and right Dirich- problem can perform as well as it does, as shown in Fig. 1, we
let boundaries U ∂  = {U l , U r }. The exact solution to this system is note that a DNN solver can really be viewed as a type of grid-
readily obtained by a standard textbook procedure [35]. Except for less adaptive inverse solver. First, the absence of a fixed grid in the
in section 4.4, the simulation time horizon is set to Ts = 0.2. evaluation of the loss functions (4) and determination of the opti-
mization direction (7) means that the probability density ℘ can be
3. Numerical experiments, results, and discussion finitely sampled in space as densely as the method can computa-
tionally afford at every iteration and for as many iterations as can
In this section we examine the basic numerical behavior of be afforded. Given the computational resources, this can lead to a
DNN solutions to the Sod shock tube, present some results and relatively densely sampled interrogation of the space-time domain.
discussion. In section 3.1 we show how the DNN solution behaves Moreover, adaptive mesh methods for shock problems have long
without any regularization and discuss how that compares to other been known to be highly effective [6,34], particularly when shock
numerical methods. In section 3.2 we discuss how to add analytic fronts can be precisely tracked, and there are a discrete number of
diffusion to regularize the solution. Section 3.3 compares the regu- them. However, AMR and hp-adaptive methods are constrained by
larized DNN results against those from standard finite volume and their meshing/griding geometries, even as isoparametric and isoge-
discontinuous Galerkin finite element methods. ometric methods have been developed to, at least in part, reduce
these dependencies [42].
3.1. An unlimited solution using DNNs The second pertinent observation about the DNN solution is
that because it utilizes an optimization method to arrive at the
The standard solution to the Sod shock tube problem for solution, it can be viewed as a type of inverse problem. Unlike a
(9) poses real challenges to most numerical methods. In the stan- forward numerical method, which accumulates temporal and spa-
dard DG method, for example, when the polynomial degree is tial numerical error as it runs, the DNN solutions become more
nonzero, p > 0, the solution becomes numerically unstable with- accurate with iterations, being a global-in-space-time inverse-like
out the use of some form of slope limiting [43,44]. Similarly when solver. This means that it becomes entirely reasonable to consider
FVM is evaluated at higher than first order accuracy, the absence running such systems in single or even half precision floating point
of flux-limiting leads to unstable solutions [14,59]. This behavior formats, leading to potential performance benefits in comparison
is also observed in spectral methods [21,60], continuous Galerkin to more standard PDE solvers. As an example, all DNN computa-
methods [39], and so on. tions done in this paper are run in single precision.
One of the more robust regimes for solving shock problems
can be found in constant DG methods with degree p = 0 basis 3.2. Diffusing along unbounded gradients
functions (or equivalently, FVM methods with first order accuracy),
wherein both a stable and relatively robust solution can be readily As discussed above, solving a shock problem without adding
calculated, but at the cost of large numerical diffusion. Indeed the some form of diffusion (often in the form of limiters or low or-
ability to solve these types of irregular problems helps explain the der accurate methods with upwind fluxes) tends to lead to unsta-
preference practitioners often demonstrate in choosing low order ble solutions in classical numerical methods. Similarly here, even
methods to compute solutions to irregular systems, where to re- in the case of the DNN solution, when searching for an approxi-
solve solution features one relies solely on mesh/grid refinement. mation to the regular solution shown in Fig. 1, the resulting solu-
In contrast to lower order approaches, the DNN solution to the tion, though reasonably well-behaved, is of course spurious along
Sod shock tube problem is able to exploit the relative flexibility of the undefined shock fronts.
DNN functional approximators, as determined by composition of As in all numerical methods, to ameliorate this problem, all
affine transformations and element-wise activation nonlinearities. that needs to be done is to add some small amount of diffusion
For example, with hyperbolic tangent activations, the DNN solution to constrain the derivatives along the shock fronts. In this case,
to the shock tube problem becomes an approximation to a regular we slightly alter (9) into the non-conductive compressible Navier-
solution of the Sod problem at high order accuracy, in the sense Stokes type system,
that the approximation order of a linear combination of transcen-
dental functions is C∞ . ∂t U + ∂x F (U ) − ∂x G (U ) = 0, U |t=0 = U 0 , U |x∈∂  = V ∂  ,
In the case of the DNN solution, even though the discontin- (11)
uous shock fronts display unbounded derivatives in the classical
sense, the DNN is able to find a relatively well-behaved solution as given a dissipative flux G (U ) = (0, ρτ ux , 12 ρτ u2x ) . Again here we
shown in Fig. 1. The DNN formulation with the hyperbolic tangent start by merely solving over the space-time domain,  × [0, Ts ],
activation, if the optimization is to converge to a parameter point given a fixed and positive constant τ ∈ R+ . As we show in more
ϑˆ , is predicated on a regular solution. Failure to converge could detail below, adding a modest amount of viscosity τ = 0.005 (i.e.
arise if the derivatives at the locus of discontinuity, and the com- Reynolds number ~ 340) leads to a DNN solution that is compet-
ponents of ϑ, enter unstable unbounded growth in the iterations of itive with some of the most effective numerical methods available
the optimization protocol. Indeed, the DNN approximates the dis- for solving the Sod shock tube problem.
continuous mathematical shock precisely in the limit in which cer-
tain components of ϑ tend to infinity. We observe that in practice, 3.3. Numerical benchmarks
however, this does not happen. The inherent spectral bias [49] of
the DNNs seems to be able to contain the potential instability of In this section, we show numerical results obtained for the DNN
the solution until the optimization has settled in a “reasonable” with artificial viscosity and compare to analogous results obtained
C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212 199

Fig. 2. Comparison of slope-limited discontinuous Galerkin (DG) finite element method solvers, flux-limited finite volume method (FVM) solvers, and the dissipative DNN
solver holding fixed the relative resolution in the spatial grids of each solution, Nbatch ∼ Nvol ∼ (n + 1 )Nel .

with DG finite element methods (FEMs) and FVMs for the Sod
shock tube problem. Such a comparison is fraught with complex-
ity for at least two reasons. First, the iterative convergence of the
DNN approach is different from that of a typical method, since it is
posed as an optimization problem rather than a nonlinear system
of equations. Because the optimization problem is non-convex, we
must accept that the optimal DNN approximation is not necessar-
ily unique, and, even if it is, may converge instead to a local mini-
mum. However, as long as the loss function at this local minimum
is small, the resulting DNN represents a reasonable approximation.
Fig. 3 shows a typical convergence result as a function of iteration
number for the DNN method, which is analogous to the conver-
gence behavior of SGD type problems in general. Note here that all
runs in this paper are run to losses of comparable magnitude, i.e.
RMSE between ∼ 2 × 10−4 and ∼ 8 × 10−5 .
Second, it is not clear how to make the comparison between
the DNN approach and typical methods fair. While total time to
solution at a target accuracy is likely the most practically rele-
vant metric, the time to solution for the DNN method depends Fig. 3. Here we show the convergence behavior on the problem in Fig. 2, using a
on a long list of choices and tunable parameters that affect the decreasing learning rate schedule of × 0.1 every 25K iterations, starting at 10−4 .
optimization algorithm, including, for example, the batch size and
the learning rate. We have made little attempt to optimize these
200 C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212

Fig. 4. Comparison of slope-limited discontinuous Galerkin (DG) finite element method solvers, flux-limited finite volume method (FVM) solvers, and the dissipative DNN
solver holding fixed the total degrees of freedom (in both space and time) of the discrete representation.

parameters in this work, and thus, we choose not to compare DNN the residual is evaluated at batch points determined by the
based on time to solution. sampling distribution x ~ ℘, which we denote with Nbatch .
Instead, we use proxies for the complexity of the DNN. In com- One nuance in comparing the methods through residual eval-
paring typical discretization schemes, it is common to make com- uation points is that in standard FVM and DGFEM methods, these
parisons at constant numbers of degrees of freedom (DoFs), essen- points are spatially fixed. In the DNN approach however, the
tially using the DoFs as a proxy for time to solution. In the context batch points are resampled after every evaluation of the objective
of the DNN, there are at least two quantities that affect time to so- function gradients. As such, it might seem more appropriate to
lution in a qualitatively similar way to DoFs in a standard method: look at how DNNs compare to hp-adaptive DGFEM and FVM
the number of batch points, which is somewhat analogous to grid that use adaptive mesh refinement (AMR). Note, however, that
density or number of quadrature points in a typical method, and adaptive forward problems using DGFEM and FVM are adaptive in
the number of tunable parameters (i.e., DoFs) in the DNN, which a fundamentally different way relative to the residual evaluation
is analogous to the number of DoFs in a standard method. Thus, as compared to the DNN method. In forward problems, the resid-
we compare the methods, both graphically and using the mean ual evaluation shifts in space relative to the current solution in
squared error, in terms of the spatial “grid” and the number of time. In the DNN solution, however, though the batch points are
DoFs. resampled, the samples can be chosen independent of the solution
behavior, and are done so over the whole spacetime domain
3.3.1. Solutions as a function of spatial grid density h × [0, Ts ].
A way to evaluate the relationship between the DNN method This dependence on space and time of the sampled batch points
and FVM or DGFEM is by comparing the relative density of ef- Nbatch , distinguishes the DNN solution from even the adaptive FVM
fective residual evaluation points in space. Broadly, in the DG so- and FEM solutions. Thus in order to compare the spatial resolu-
lution, for degree p = 1 polynomial solutions, this corresponds to tion of the two different typesof solutions, we compare solutions
( p + 1 )Nel for Nel the number of elements. In the FVM method this using the rule of thumb that Nbatch ∼ Nvol ∼ (n + 1 )Nel , as these
corresponds to simply Nvol , the number of finite volumes. In the capture the average relative ‘grid density’ along one dimension.
C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212 201

Table 1 results show that the DNN solution compares favorably as a func-
Mean square error of the unlimited DNN solution in Fig. 1 and the solutions as a
tion of degrees of freedom.
function of the grid density in Fig. 2, against the exact Sod shock tube solution.

Solution Type Density Pressure Velocity


4. Natural extensions
Unlimited DNN 0.00199 0.00074 0.01272
FVM 1st Order 0.00117 0.00139 0.00804
FVM 2nd Order 0.00030 0.00019 0.00164
The remainder of the paper is dedicated to examining natural
DG, p = 2 0.00163 0.00259 0.01413 extensions to the DNN framework applied to the Sod Shock tube
DG, p = 5 0.00314 0.00811 0.02945 problem. The first of these is an application of a type of “simulated
DNN, τ = .005 0.00020 0.00025 0.00136 annealing,” that can be used along the viscous and/or dissipative
parameter directions to improve the effective time to convergence
Table 2 in the method. Next we show how solving the DNN framework
Mean square errors of the solutions as a function of fixed degrees of freedom in along a full parameter axis, in this case along the adiabatic index
both space and time, as shown in Fig. 4. axis γ , can lead to a simultaneous and fully parameterized solu-
Solution type Density Pressure Velocity tion in the entire parameter space at nearly no additional cost. We
FVM 1st Order 0.00027 0.00017 0.00136
then examine the behavior of an LSTM-like network for this sys-
FVM 2nd Order 0.00011 0.00005 0.00084 tem. And finally we discuss what might be the most practical ap-
DG, p = 2 0.00021 0.00013 0.00182 plication of the DNN methods for shocktube type problems, which
DG, p = 5 0.00042 0.00029 0.00257 is: PDE solutions via data-enrichment.
DNN, τ = .005 0.00020 0.00025 0.00136

4.1. Dissipative annealing


Results of this comparison are shown in Fig. 2 and Table 1,
One of the potential limitations of the DNN approach, in com-
where the timestepping is chosen sufficiently small. Here the
parison to other numerical methods, is the compute time-to-
DGFEM solution is run at p = 2, Nel = 24 and at p = 5, Nel = 12
convergence. Discussion of convergence in the DNN/PDE setting is
using a Rusanov Riemann solver with adaptive hierarchical slope
complicated by the fact that “convergence” refers to different con-
limiting [44]. Both first and second order FVM solutions are run
cepts in numerical PDE analysis and in numerical optimization. In
at Nvol = 72, where the first order solution is unlimited, and the
numerical PDE analysis, convergence usually refers to rate at which
second order uses a standard per-face slope limiter. The DNN so- an approximate solution tends to the exact solution as a func-
lution is run at Nbatch = 72, using τ = 0.005. The results indicate
tion of the representation parameters, e.g., mesh width h and/or
that the DNN solution performs favorably to the FVM and DGFEM
polyniomial order p. In iterative numerical optimization, however,
solutions relative to spatial grid density.
convergence refers to the rate at which the parameters ϑ converge
to a (possibly local) minimum as a function of the number of iter-
3.3.2. Solutions as a function of degrees of freedom
ations. The global minimum may exist only in the limit in which
Another way of comparing the DNN solution to the FVM and
some components of ϑ tend to infinity. Then, heuristics mandate
DGFEM methods is to look at solutions as a function of their
that we seek convergence to an acceptable local minimum. It is
respective degrees of freedom (DoFs). Heuristically this can be
thus difficult to arrive at a formal definition for what is meant
thought of as comparing the number of “tunable parameters”
by time-to-convergence in the DNN setting, and this can be easily
within the representation space of the solution.
seen in Fig. 3. In this section, we resort to a hybrid heuristic be-
Since the solution in the DNN is a global in space and time so-
tween the two concepts of convergence discussed above, and refer
lution, we consider the DoFs spanning the whole space, h × [0,
to “time-to-convergence” as the number of computational cycles
Ts ]. In the DG solution this corresponds to S(n + 1 )Nel Tgrid , where
required to reach a comparable level of accuracy to that of a more
Tgrid is the temporal grid and S are the number of stages in the
traditional PDE solution method. It is in this sense that DNN solu-
Runge-Kutta discretization. Likewise in the FVM setting, we have
tions can appear noticeably more expensive, though in this section
SNvol Tgrid . In the case of the DNN, on the other hand, the degrees
we discuss some of the nuances that underlie this observation, and
of freedom are recovered by counting the scalar components of the
why the unique capacity and flexibility of DNN solutions makes
vector ϑ parametrizing the network architecture (2), and this yields
the time-to-convergence less restrictive than it seems.
L−1 The premise we assume here about slowly converging solu-

DNNDoFs = N1 (din + 1 ) + dout (NL + 1 ) + N+1 (N + 1 ), (12) tions (in the sense of optimization) is that large-scale features of
=1 a smooth solution are relatively easy to converge to, but when ac-
companied with finer scale features, the optimization noise from
where L is the number of hidden layers, N is the width of the attempting to fit the finer scales can obscure the larger scale fea-
-th layer, and din and dout are the input and output dimensions, tures from the optimizer. To mitigate this effect, inspired by the
respectively. simulated annealing method [31], we recast the dissipative flux in
The results are presented in Fig. 4 and Table 2. In the DNN so- (11) as
lution, the degrees of freedom are DNNDoFs = 248, 065 after setting
L = 16 and N = 128. The DGFEM solutions are run the same as G (U ) = (0, ρτ0 ux , 1
2
ρτ0 u2x ) ,
in Section 3.3.1, except that at sixth order, p = 5, S = 4, Nel = 104
(6 ) where τ0 = τ0 (l ) becomes an iteration-l numerical viscosity. In this
and Tgrid = 100, so that DGFEMDoFs = 249, 600. Similarly the third
case τ0 (l ) = g(l )τsmooth for some smooth τsmooth ∈ R. The function
order accurate DGFEM solutions use p = 2, S = 4, Nel = 208 and
(3 ) g(l) is chosen as a fractional stepwise function bounded from above
Tgrid = 100, corresponding to DGFEMDoFs = 249, 600. The first order by unity.
accurate FVM solution is run the same as in Section 3.3.1, except The idea of using τ 0 , is to converge early and fast in the itera-
S = 1 (a forward Euler solver is used), Nvol = 500, and Tgrid = 500 tion cycle to an overtly smooth solution of (11) with large viscos-
(1 )
so that FVMDoFs = 250, 0 0 0. At second order the FVM method uses ity. Since such a solution has dramatically fewer fine-scale features,
a predictor-corrector method, which we set as S = 2, Nvol = 350, it is conjectured that the optimizer can more easily find a stable
(2 )
and Tgrid = 355 corresponding to FVMDoFs = 248, 500. Again, the minimum of the objective function for such a smooth solution. As
202 C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212

Fig. 5. The Sod shock tube solution using a DNN solver along with dissipative annealing, out to only 60K iterations.

a consequence, fewer total iterations are required to converge to 4.2. Simultaneous parameter scans
the minimum.
We have tested this idea on the Sod problem, and have found Dense parameter space exploration is the most reliable way
that with minimal effort it seems to reduce the number of itera- to develop predictive input-output mappings for scenario devel-
tions needed to arrive at similar results. For example, in Figs. 5, opment, optimal design and control, risk assessment, uncertainty
,–7, we test a DNN using a decreasing learning rate schedule. This quantification, etc., in many real-world applications. Without un-
base case is run with the minibatch size of 50 0 0 and initial learn- derstanding the response to the parameters of the system, be
ing rate of 10−4 that is reduced every 25,0 0 0 iterations by fac- they variable boundary conditions, interior shaping constraints, or
tor of 0.1 for a total of 10 0,0 0 0 iterations. This case is compared constitutive coefficients, it is all but impossible to examine the
to a dissipatively annealed solution obtained with the same mini- utility an engineered model might have in some particular de-
batch size, but instead of 25, 0 0 0 iterations per learning rate decre- sign process. PDEs clearly help in this regard since they are fully
ment we find we can get away with 120 0 0 iterations, where after parametrized models of anticipated physical responses. That being
each boundary set, g = {1, 0.2, 0.18, 0.1386, 0.1}, such that τ0 (l ) = said, PDEs are also often fairly expensive to run forward simula-
g(l )τsmooth for τsmooth = 0.05 and l = 1, . . . , 5. This means the dis- tions on, frequently requiring large HPC machines to simulate even
sipatively annealed solution takes only 60,0 0 0 iterations to arrive modest systems [8,40]. Consequently, parameter space exploration
at τ0 (5 ) = 0.005. of a PDE system is usually both very important and very computa-
As is clear from Figs. 5–7, the results are nearly indistinguish- tionally expensive.
able. This result, however, should be taken with a grain of salt. In To mitigate the expense of running forward models, surrogate
this example case the two runs are set up exactly the same, up modeling [16] and multifidelity hierarchies [48] are often con-
to the dissipative annealing algorithm. It is not clear that this is ceived in order to develop cost-effective ways of examining the re-
an entirely fair comparison, given the number of hyperparameters sponse of a PDE relative to its parametric responses. One of the
that can be tuned in each case, and provided the non-deterministic ways in which these methods function is by developing reduced
nature of SGD. models to emulate the parametric dependencies from a relatively
C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212 203

Fig. 6. The Sod shock tube solution using a DNN solver without dissipative annealing to 100K iterations.

sparse sampling of the space. In this way it becomes possible— parameter sets, the same minibatch size and learning rate sched-
for a sufficiently smooth response surface—to tractably parametrize ule, optimization over the parameter-augmented input space (x, t,
the input-output mapping, and thus interrogate the response at a γ ) reaches similar loss values with almost no computational over-
significantly reduced cost, lending itself to statistical inference, un- head. Contour and time slice plots of the solutions are shown in
certainty quantification, and real-world validation studies. Fig. 8 and Fig. 9 for the density, velocity pressure, and tempera-
However, as is commonly the case, the response surface is not ture, each as a function of (x, t, γ ) ∈ [−1, 1] × [0, 0.2] × [1.1, 2.0].
sufficiently smooth, or its correlation scale-length cannot be de- Remarkably, in this example, the DNN framework demonstrates
termined a priori, and these reduced order mappings can become that it is as cost effective to perform dense parameter space ex-
highly inaccurate. In such circumstances having a way to emu- ploration along the dimension of the adiabatic index γ as it is to
late the exact response surface would clearly be of high practi- solve at a single fixed value of γ . It remains unclear how robust
cal value. As a potential solution to this, one of the most pow- this behavior is over physical parameters of the model — in this
erful immediate features of the DNN framework seems to be the case (11). But however robust this behavior ends up being, even if
ability to perform simultaneous and dense parameter space scans only across isolated and specific parameters, this demonstrates a
with little effort. The framework is thus a natural, and in some remarkably advantageous aspect of the DNN formulation.
sense exact emulator for complicated system response models. In
the case of the Sod shock problem this can be accomplished by
simply recasting (11) relative to its parameter γ , where instead of 4.3. DNNs with residual connections and multiplicative gates
solving over space-time (x, t) ∈  × [0, Ts ], the solution is solved
over the parameter-augmented parameter space (x, t, γ ) ∈  × [0, As pointed out in [54], the DNN architecture has a signifi-
Ts ] × [γ low , γ high ]. cant influence on its solution behavior. In [54] it is shown that
Intuitively one might expect the increase of dimensionality to an LSTM-inspired feed-forward network architecture with resid-
be prohibitive, as the parameter space grows from a discrete 2D ual connections and multiplicative gates can help improve perfor-
system to a discrete 3D system. However, in analogy with the ob- mance for some parabolic PDE problems. These models are effec-
servation of Section 4.1 where the parameter τ is dissipatively an- tively an alternative to the more standard deep neural network ar-
nealed, this is not what is observed. Instead, using the same DNN chitecture presented in (2).
204 C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212

Fig. 7. The final timestep of Sod shock tube solution comparing the dissipative annealed solution at 60K, to the standard solution at 100K iterations, to the exact solution.

We implement these LSTM-like architectures for our 1D mixed and the number of units per layer is N . For more details on the
hyperbolic-parabolic like PDE system (9), to see how it behaves rel- network architecture we refer the reader to [54].
ative to standard DNNs in the context of more irregularity in the To understand whether the standard DNN architecture is more
solution space. Setting ζ = (x, t ), we test the Euler system (9) on effective, in some sense, than (13), it is essential to recognize that
the following architecture comprised of L + 1 hidden layers: the degrees of freedom scale differently for the LSTM-like system
(13) than they do in (12), where the LSTM-like system is substan-
S0 = φ (W 0 ζ + b0 ), tially more expensive per network layer . More explicitly, assum-
Z 
= σ (U z, ζ + W z, S + bz, ),  = 0, . . . , L − 1, ing that each layer has the same width, N , the degrees of freedom
in the LSTM-like system can be calculated as:
G = σ (U g, ζ + W g, S + bg, ),  = 0, . . . , L − 1,
R = φ (U r, ζ + W r, S + br, ),  = 0, . . . , L − 1, LSTMDoFs = 4LN2 + (4L(din + 1 ) + din + dout + 1 )N + dout . (15)
H 
= φ (U h, ζ + W h, (S  R + bh, ),  = 0, . . . , L − 1, Comparing (12) to (15), for a fixed number of layers L = 2, set-
S+1 = ( 1 − G )  H  + Z   S  ,  = 0, . . . , L − 1, ting for (9) din = 2 and dout = 3, the resulting relationship between
the number of units in the LSTM, N ⇒NLSTM , and DNN, N ⇒NDNN ,
zL (ζ , θ ) = W L SL + bL , (13)
can be computed by solving the following ceiling function:
  
where σ (x ) = 1/(1 + e−x ) is the sigmoid function and the network −7 + 2
16NLSTM + 72NLSTM + 49
parameters are given by: NDNN = .
2

θ = W 0 , b0 , U z, , W z, , bz,


L−1
, For our numerical test, we set L = 16 with the reference so-
=0
lution width set to NDNN = 128. This leads to DNNDoFs = 248451.
U g, , W g, , bg,
L−1
, U r, , W r, , br,
L−1
, U h, , W h, , bh,
L−1
, W L , bL , Solving for the relationship above this yields NLSTM = 63. The re-
=0 =0 =0
sults are presented in Fig. 10, and show fairly unambiguously that
(14) for this particular test case the LSTM-like architecture does not
C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212 205

Fig. 8. 100 3D contours of the Sod shock tube solution solved over the continuous parameter scan, (x, t, γ ) ∈ [−1, 1] × [0, 0.2] × [1.1, 2.0] at 100K iterations.

perform as well as the standard DNN. As above, this result must be started to emerge, and we presume that this trend will only ac-
taken in context. It may well be, for example, that the tuned pa- celerate. For example, Raissi et al. [51] introduced “data discovered
rameters for standard DNNs do not tune the LSTM-like networks PDEs,” where a relatively traditional parameter estimation is per-
well, and that the result presented is not a proper comparison. formed over differential operators that are discovered through op-
Again, to fully reveal the relationship between these architectures timization. This parameter estimation can be performed for a λ ∈ R
on even just this one simple hyperbolic test case, would require a that factors through a differential operator λuux . This can be ex-
full study of its own. panded to data driven discovery of PDEs, where parameter hier-
That said, from a purely practical point of view, the implemen- archies can be used to generate libraries of operators that are se-
tation of the DNN versus the implementation of the LSTM-like net- lected based on system specific physical criteria and constraints, in
works seems to lead to a network architecture with a large dispar- order to effectively “construct” and/or discover PDE systems from
ity in the number of graph edges. The DNN network has effectively large data sets [37]. These types of approaches are emerging with
one layer per  ∈ L, while the LSTM-like network has, in some increasing interest [5,9,50,52,53], and naturally lend themselves to
sense, five layer-like networks per  ∈ L as seen in (13. The re- the DNN frameworks.
sult of this is that in our implementation, even at fixed degrees of These types of empirical PDE discovery techniques offer clear
freedom, the LSTM-like network takes almost five times longer in advantages in cases where the systems are assumed to be too com-
compute time to finish than the standard DNN, at a fixed number plex to easily construct first principle models. However, researchers
of iterations. Again, it is not clear if this slowdown can be reduced also find themselves in a different type of situation, where the
by using a more clever implementation of the LSTM-like networks, physics model that describes the system response is confidently
or if this is simply a result of increasing the network complexity in prescribed, so that any mismatch between the model and data
the graph. raises questions about the experimental setup just as it raises them
about the model.
4.4. Data-enriched PDEs To explore this common circumstance, we address a situation
relating to the experimental validation of a first principles model
The simplicity and apparent robustness of DNNs for solving sys- system. First principles models are predicated on the idea that
tems of nonlinear and irregular differential equations raises the the physical dependencies within an experimental system are, or
question: how can one incorporate experimental data into such can be, fully understood. Validation studies, however, frequently
a framework? Work combining DNN-based approaches for solving discover that this is in fact not the case [20,47]. Frequently sys-
PDEs with conventional supervised machine learning has already tematic behaviors, engineering details, alleatoric and/or epistemic
206 C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212

Fig. 9. Variation in γ and x of the Sod shock tube solution, at Ts = 0.2 with 100K iterations.

uncertainties can cascade, and lead to systems with behaviors that 4.4.1. Hypothetical experimental senario
are uncertain and/or different from those of the model systems an- We consider the following hypothetical scenario: an experimen-
ticipated to describe them. tal test facility is set up to run experiments interrogating the be-
In circumstances such as these, a researcher might be in a situ- havior of transonic gas dynamics. All solutions in this section are
ation where a prescribed model system does not plausibly validate run using the DNN method for solving PDEs. Although the experi-
against the large quantities of high resolution experimentally mea- mental setup is mildly exotic, the expected responses of the system
sured data that have been collected. Some of the scientific ques- are anticipated to obey classical gas dynamics (9). Experiments are
tions one might want to raise in such circumstances are: run, and a process is initially set up to validate relative to the fol-
lowing dimensionless ideal model system:
1. Are my experimental results reliable?
2. Is my model system sufficient to describe the experimental ∂t ρ + (ρ u )x = 0,
data? ∂t (ρ u ) + ρ u2 + p − τ ρ ux x = 0,
3. How far from my model system is the measured data?
4. What is the form of the mismatch between the data and the
∂t E + (Eu + pu − τ ρ uux − κ Tx )x = 0, (16)
model? where the total energy density is given by
5. How might I enrich my model system to more accurately cap- p ρ
E= + u2 ,
ture the observed behavior? γ −1 2
6. Are there characteristics in the mismatch that reveal missing
and the initial-boundary data is set to the same as in (9). Here a
physical subsystems?
constant heat conductivity κ = τ = 0.005 is also assumed. In the
Below we present a simple example of how to approach such a process of validating the system, the classical gas dynamic model
circumstance, and provide an outline of how such a series of ques- (16) repeatedly demonstrates that it does not strongly validate
tions can be systematically retired. against the experimentally measured data. After making sure the
C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212 207

Fig. 10. The Sod shock tube solution at the final timestep, comparing the LSTM-like architecture versus standard DNN at fixed DoFs.

Both camps might proceed by testing their subsystems to the best


of their ability and coming to the conclusion that nothing is wrong,
per se, but there is some other physics occurring in the test facil-
ity that they have not properly anticipated. As a consequence, the
next step might be to slowly start adding back physics terms to
(16). Subsequent efforts may include parameter estimation on re-
maining uncertain parameters in the model until a better fit can
be recovered. In this case, however, such a process would be ex-
tremely slow and tedious as illustrated below.
Unbeknownst to both the modellers and experimentalists, the
actual dynamics going on inside the experiment, is a substantially
more complicated system. In fact, it turns out that a feature of the
experimental setup is inadvertently inducing ionization of the gas
in the chamber. As a result, the gas actually behaves like a reac-
Fig. 11. The reactive subsystem from (18). To illustrate the autocatalytic oscillation, tive photoactivated autocatalytic plasma in the center of the reac-
the initial state is set to ρ1 = 2ρl /3, ρ2 = ρl /3, though the autocatalysis is largely tor, characterized exactly by the following equations:
independent of the initial conditions.

∂t ρ1 + (ρ1 u )x = (k1 A1 + k2 ρ12 ρ2 − k3 ρ1 A2 − k4 ρ1 )ζ ,


measurement equipment is well calibrated, and the confidence in
the data is high, the researchers are left with the quandary: what ∂t ρ2 + (ρ2 u )x = (k3 ρ1 A2 − k2 ρ12 ρ2 )ζ ,
 1 2 1 2

is happening?
∂t (ρ u ) + ρ u2 + p + B − Bx − τ ρ ux = 0,
A traditional approach to this problem might begin with a vig- 8π 4π x
orous debate between the simulation experts and the laboratory  1

experts, both claiming systematic errors on behalf of the other. ∂t (ρv ) + ρ uv − Bx By − τ ρvx = 0,
4π x
208 C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212

Fig. 12. The top left shows the result from the total measured density in the experiment. The top right is the DNN-enriched simulated value of the density after adding the
fi ’s. The bottom panels show the function f1 and it’s first derivative, where its dominant signature effect can be found near the initial state of the system, t = 0.

Table 3
The initial-boundary data for the model (16) and the experiment (19).

B.C. ρ u v w T Bx By Bz

Left Mod. B.C. 1.08 1.2 – – 0.8796 – – –


Right Mod. B.C. 0.9891 -0.0131 – – 0.9823 – – –
Left Exp. B.C. 1.08 1.2 0.01 0.5 0.8796 2.0 3.6 2.0
Right Exp. B.C. 0.9891 -0.0131 0.0269 0.010037 0.9823 2.0 4.0244 2.0026

 1

just like for the model system (16), the initial conditions for all
∂t (ρ w ) + ρ uw − Bx Bz − τ ρ wx = 0,
4π x unknowns are initialized with a jump discontinuity at x = 0, over
 1 2 1
 the domain  = [−0.5, 0.5], for t ∈ [0, 0.1], as inspired in the RP2
∂t E + Eu + pu + B − Bx (v · B ) − τ ρ uux − κ Tx = 0, case in [13]. The remaining full boundary conditions are listed in
8π 4π x
Table 3.
∂t Bx = 0, ∇ · B = 0,
The autocatalytic reactive subsystem is induced by virtue of an
∂t By + (uBy − vBx )x = 0, unexpected ionization pulse in the reactor, leading to an oscillating
∂t Bz + (uBz − wBx )x = 0, (17) chaotic attractor characterized by the reactions:
where
k1
p ρ 1 2  A1 −−−→ ρ1
E= + v2 + B , and ρi = ρ .
γ −1 2 8π
i
k2
2ρ1 + ρ2 −−−→ 3ρ1
Here B = (Bx , By , Bz ) is a magnetic field, thought inconsequential in A2 + ρ1 −−−→
k3
ρ2 + A 3
the erroneously presumed absence of ions, and the velocity field
k4
in the gas is defined as v = (u, v, w ). In this hypothetical scenario, ρ1 −−−→ A4 (18)
C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212 209

Fig. 13. The top left is the experimentally measured velocity, and the top right the DNN-enriched simulation including the contribution from the mismatch function f2 . The
mismatch function f2 is shown on the bottom left, with df2 /dx on the bottom right.

where ρ1 is the first chemical species with density ρ 1 , and ρ2 the The solution from the experiment (17) is now used as training
second chemical species with density ρ 2 . Here the Ai are excess data to supervise (19). Formally the objective function from (4) is
bulk species, and the ki are dimensional rate constants. The con- enriched with the training data, U exp = (ρexp , ρexp , uexp , Eexp ) from
dition for instability is that A2 > A21 + 1, thus we set A2 = 2, and (17), by appending the new loss functions
A1 = 0.9, where for simplicity we set ki = 150 for all i. Photoacti-
vation only occurs in the center of the reactor, and thus
ρexp − ρLs (×[0,Ts ],E ) + ρexp uexp − ρ uLs (×[0,Ts ],E )
+Eexp − E L2 (×[0,Ts ],E )
1 x2
ζ= √ e − 2σ 2 .
12σ 2π to (4), where E are the experimentally measured data support
points, and Ls is the mean square error. For simplicity, the sup-
The solution to this subsystem is shown in Fig. 11
port points E are chosen as a 10 0 0 × 10 0 0 point grid in (x, t). In
To reveal the mismatch of the underlying system that is actu-
addition, the neural network outputs associated to the f i = fi (x, t )
ally observed in the experiment, the DNN is trained to model the
are L2 -regularized by setting:
following enriched PDE system instead of (16): 
∂t ρ + (ρ u )x = f1 ,  fi2 Li (×[0,Ts ],℘i ) ,
i
∂t (ρ u ) + ρ u2 + p − τ ρ ux x = f2 ,
where here the distributions ℘i are chosen to be the same as for
∂t E + (Eu + pu − τ ρ uux − κ Tx )x = f3 (19) (19), and the Li are L2 -losses with weight wi = 0.0 0 01 for each i in
where order to minimize the penalization for accumulating non-zero fi .
The resulting system matches up reasonably well to the exper-
p ρ
E= + u2 , imental data. As can be seen in Fig. 12, the simulation density and
γ −1 2
experimental density match to within the eyeball norm, where the
with again the same initial-boundary data from Table 3. initial mismatch is corrected with the f1 function. Though the total
210 C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212

Fig. 14. The experimental temperature profile is given on the top left, and the simulation on the top right with f3 . The mismatch function f3 is shown on the bottom left,
with its x-derivative on the bottom right.

Fig. 15. The experimental magnetic fields, where bx is measured to be negligibly small.
C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212 211

density nearly completely obfuscates the oscillations from the re- Data curation, Writing - original draft. Todd Oliver: Conceptualiza-
active subsystem in Fig. 11, the total density is off primarily near tion, Methodology, Investigation, Formal analysis, Writing - review
the initial state, where the reactive subsystem indicate a dramatic & editing. David R. Hatch: Investigation, Supervision, Writing - re-
influx of mass due to the species being tracked by the sensor data view & editing, Funding acquisition.
in the experiment. This type of mismatch in the mass conservation
clearly indicates chemical reactions relative to the tracked species Acknowledgments
densities, unless, of course, the system is somehow not closed. In
Fig. 13 the velocity mismatch is similarly convincing, where the This work was supported by U.S. DOE Contract No. DE-FG02-
mismatch f2 shows complicated behavior with nontrivial variation 04ER54742 and U.S. DOE Office of Fusion Energy Sciences Scientific
that cannot be readily explained by the mass influx in f1 . Similarly Discovery through Advanced Computing (SciDAC) program under
the mismatch f3 in the temperature in Fig. 14 also shows an un- Award Number DE-SC0018429. This work was also supported by
usual signature not present in f1 . the NSF grant AST-1413501. We acknowledge the Texas Advanced
Returning to our list of potential questions about how the Computing Center at The University of Texas at Austin for provid-
model system relates to the experimental system (17), we can now ing HPC resources. Computations were performed on the Maver-
make some fairly strong conclusions. First, clearly there is a mass ick2 GPU cluster.
mismatch in the measured species density. The mismatch is sub-
stantial and cannot be captured by simple parameter estimation. References
Therefore, there is physics in the mass equation that is not be-
ing accounted for by the ascribed model system (16). Moreover, [1] R. Abgrall, R. Saurel, Discrete equations for physical and numerical compress-
ible multiphase mixtures, J. Comput. Phys. 186 (2) (2003) 361–396, doi:10.
it is also straightforward to conclude that the giant influx of mass 1016/S0 021-9991(03)0 0 011-1.
makes a chemical reaction a very likely candidate to explain the [2] R. Aster, B. Borchers, C. Thurber, Parameter Estimation and Inverse Problems,
experimental behavior, assuming the system is closed. in: Parameter Estimation and Inverse Problems, in: International Geophysics
Series, 90, Academic Press Inc Elsevier Science, 2005, pp. 1–303.
In contrast to the mass equation, the momentum and energy
[3] I. Babuška, U. Banerjee, J.E. Osborn, Survey of meshless and generalized finite
equations in (16) show perturbations away from the model Sod element methods: a unified approach, Acta Numer. 12 (2003) 1–125, doi:10.
shock solution that indicate that complicated system dynamics is 1017/S09624929020 0 0 090.
[4] T. Belytschko, Y.Y. Lu, L. Gu, Element-free Galerkin methods, Int. J. Numer.
occurring in the continuum. While this behavior might be more
Methods Eng. 37 (2) (1994) 229–256, doi:10.1002/nme.1620370205.
challenging to diagnose, it is clear in these cases too that the base [5] J. Berg, K. Nystrm, Data-driven discovery of pdes in complex datasets, J. Com-
equations (16), even with the addition of f1 , cannot account for the put. Phys. 384 (2019) 239–252, doi:10.1016/j.jcp.2019.01.036.
system response, and there is missing physics. As it so happens, [6] M. Berger, P. Colella, Local adaptive mesh refinement for shock hydrodynamics,
J. Comput. Phys. 82 (1) (1989) 64–84, doi:10.1016/0 021-9991(89)90 035-1.
knowing simply the magnetic field variation of the experiment, as [7] P.B. Bochev, M.D. Gunzburger, Least-squares finite element methods, Applied
shown in Fig. 15 is really enough to diagnose the mismatch in both Mathematical Sciences, 166, Springer, New York, 2009, doi:10.1007/b13382.
the the velocity f2 and temperature f3 as related to the magnetic [8] M. Bremer, K. Kazhyken, H. Kaiser, C. Michoski, C. Dawson, Performance com-
parison of hpx versus traditional parallelization strategies for the discontin-
field, since they exhibit signature features that track the magnetic uous galerkin method, J. Sci. Comput. 80 (2) (2019) 878–902, doi:10.1007/
field variation. s10915- 019- 00960- z.
[9] K. Champion, B. Lusch, J.N. Kutz, S.L. Brunton, Data-driven discovery of coordi-
nates and governing equations, 2019.
5. Conclusion [10] R. Chevalier, J. Blondin, R. Emmering, Hydrodynamic instabilities in supernova-
remnants - self-similar driven waves, Astrophys. J. 392 (1, 1) (1992) 118–130,
doi:10.1086/171411.
Gridless representations provided by deep neural networks in [11] R.A. DeVore, G.G. Lorentz, Constructive approximation, Grundlehren der Math-
combination with numerical optimization can be used to solve ematischen Wissenschaften [Fundamental Principles of Mathematical Sci-
complicated systems of differential equations that display irregular ences], 303, Springer-Verlag, Berlin, 1993, doi:10.1007/978- 3- 662- 02888- 9.
[12] C. Domb, M.F. Sykes, J.T. Randall, On the susceptibility of a ferromagnetic above
solutions. This requires fairly straightforward techniques, and in ir- the curie point, Proc. R. Soc. Lond. Series A. Math. Phys. Sci. 240 (1221) (1957)
regular solutions benefits from the usual trick of adding numerical 214–228, doi:10.1098/rspa.1957.0078.
diffusion to smooth numerically unstable function representations. [13] M. Dumbser, D. Balsara, M. Tavelli, F. Fambri, A divergence-free semi-implicit
finite volume scheme for ideal, viscous, and resistive magnetohydrodynamics,
The DNN method compares favorably with regard to accuracy and Int. J. Numer. Methods Fluids 89 (1-2) (2019) 16–42, doi:10.1002/fld.4681.
stability to more conventional numerical methods and yet enables [14] M. Dumbser, M. Kaeser, V.A. Titarev, E.F. Toro, Quadrature-free non-oscillatory
one-shot exploration of the entire parameter space. The incorpo- finite volume schemes on unstructured meshes for nonlinear hyperbolic sys-
tems, J. Comput. Phys. 226 (1) (2007) 204–243, doi:10.1016/j.jcp.20 07.04.0 04.
ration of efficient optimization algorithms in DNN methods into
[15] L.C. Evans, Partial Differential Equations, American Mathematical Society, Prov-
the PDE solver lends itself to an ease and simplicity of integrat- idence, R.I., 2010.
ing advanced data analytic techniques into physics-enriched model [16] A. Forrester, A. Sobester, A. Keane, W.I.O. service), Engineering Design via Sur-
systems, and provides an opportunity for transitioning into a data- rogate Modelling : A Practical Guide, Hoboken, N.J. : Wiley ; Chichester :
John Wiley [distributor], 2008. Description based upon print version of record.
rational scientific paradigm. Early results indicate that DNNs enable http://www.SLQ.eblib.com.au/patron/FullRecord.aspx?p=366798.
a simple and powerful framework for exploring and advancing pre- [17] D.A. Fournier, H.J. Skaug, J. Ancheta, J. Ianelli, A. Magnusson, M.N. Maun-
dictive capabilities in science and engineering. der, A. Nielsen, J. Sibert, Ad model builder: using automatic dif-
ferentiation for statistical inference of highly parameterized com-
plex nonlinear models., Opt. Methods Softw. 27 (2) (2012) 233–249
http://ezproxy.lib.utexas.edu/login?url=http://search.ebscohost.com/login.
Declaration of Competing Interest
aspx?direct=true&db=syh&AN=73326763&site=ehost-live.
[18] J.B. Garnett, Bounded analytic functions, Pure and Applied Mathematics, 96,
he authors declare that they have no known competing finan- Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York-Lon-
cial interests or personal relationships that could have appeared to don, 1981.
[19] T. Gerstner, M. Griebel, Numerical integration using sparse grids, Numer. Algo-
influence the work reported in this paper. rithms 18 (3-4) (1998) 209–232, doi:10.1023/A:1019129717644.
[20] R. Ghanem, D. Higdon, H. Owhadi, Handbook of Uncertainty Quantification,
Springer, New York, 2017.
CRediT authorship contribution statement [21] j. Giannakouros, G. Karniadakis, Spectral element Fct method for scalar hyper-
bolic conservation-laws, Int. J. Numer. Methods Fluids 14 (6) (1992) 707–727,
Craig Michoski: Conceptualization, Methodology, Validation, doi:10.1002/fld.1650140605.
[22] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedfor-
Formal analysis, Visualization, Writing - original draft, Software. ward neural networks, in: Y.W. Teh, M. Titterington (Eds.), Proceedings of the
Miloš Milosavljević: Software, Conceptualization, Methodology, Thirteenth International Conference on Artificial Intelligence and Statistics,
212 C. Michoski, M. Milosavljević and T. Oliver et al. / Neurocomputing 399 (2020) 193–212

Proceedings of Machine Learning Research, 9, PMLR, Chia Laguna Resort, Sar- nonlinear partial differential equations, J. Comput. Phys. 378 (2019) 686–707,
dinia, Italy, 2010, pp. 249–256. http://proceedings.mlr.press/v9/glorot10a.html. doi:10.1016/j.jcp.2018.10.045.
[23] N. Gordon, D. Salmond, A. Smith, Novel-approach to nonlinear non-gaussian [52] S.H. Rudy, S.L. Brunton, J.L. Proctor, J.N. Kutz, Data-driven discovery of partial
Bayesian state estimation, IEE Proc.-F Radar Signal Process. 140 (2) (1993) 107– differential equations, Sci. Adv. 3 (4) (2017), doi:10.1126/sciadv.1602614.
113, doi:10.1049/ip- f- 2.1993.0015. [53] H. Schaeffer, Learning partial differential equations via data discovery and
[24] J. Hadamard, Sur les problèmes aux dérivés partielles et leur signification sparse optimization, Proc. R. Soc. A: Math., Phys. Eng. Sci. 473 (2197) (2017)
physique, Princeton Univ. Bull. 13 (1902) 49–52. 20160446, doi:10.1098/rspa.2016.0446.
[25] J. Han, A. Jentzen, W. E, Solving high-dimensional partial differential equations [54] J. Sirignano, K. Spiliopoulos, DGM: a deep learning algorithm for solving partial
using deep learning, Proc. Natl. Acad. Sci. 115 (34) (2018) 8505–8510, doi:10. differential equations, J. Comput. Phys. 375 (2018) 1339–1364, doi:10.1016/j.jcp.
1073/pnas.1718942115. 2018.08.029.
[26] J.S. Hesthaven, T. Warburton, Nodal discontinuous Galerkin methods, Texts [55] J. Smoller, Shock waves and reaction-diffusion equations, Grundlehren der
in Applied Mathematics, 54, Springer, New York, 2008, doi:10.1007/ Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sci-
978- 0- 387- 72067- 8. Algorithms, analysis, and applications ence], 258, Springer-Verlag, New York-Berlin, 1983.
[27] P.J. Huber, Robust estimation of a location parameter, Ann. Math. Stat. 35 (1) [56] G.A. Sod, A survey of several finite difference methods for systems of nonlinear
(1964) 73–101, doi:10.1214/aoms/1177703732. hyperbolic conservation laws, J. Comput. Phys. 27 (1) (1978) 1–31, doi:10.1016/
[28] P. Isett, A proof of Onsager’s conjecture, Ann. Math. 188 (3) (2018) 871–963, 0 021-9991(78)90 023-2.
doi:10.4007/annals.2018.188.3.4. [57] G. Strang, G.J. Fix, An Analysis of the Finite Element Method, Prentice-Hall, Inc.,
[29] B.-N. Jiang, G.F. Carey, Least-squares finite element methods for compressible Englewood Cliffs, N. J., 1973. Prentice-Hall Series in Automatic Computation
Euler equations, Int. J. Numer. Methods Fluids 10 (5) (1990) 557–568, doi:10. [58] L. Wang, F. Yao, S. Zhou, H. Jia, Optimal regularity for the poisson equation,
1002/fld.1650100504. Proc. Am. Math. Soc. 137 (6) (2009) 2037–2047 http://www.jstor.org/stable/
[30] D.P. Kingma, J. Ba, Adam: a method for stochastic optimization, 2014. 20535956.
[31] S. Kirkpatrick, C.D. Gelatt, M.P. Vecchi, Optimization by simulated annealing., [59] Z. Xu, Parametrized maximum principle preserving flux limiters for high order
Science 220 4598 (1983) 671–680. schemes solving hyperbolic conservation laws: one-dimensional scalar prob-
[32] I.E. Lagaris, A. Likas, D.I. Fotiadis, Artificial neural networks for solving ordinary lem, Math. Comput. 83 (289) (2014) 2213–2238.
and partial differential equations, IEEE Trans. Neural Netw. 9 (5) (1998) 987– [60] Z. Xu, Y. Liu, C.-W. Shu, Hierarchical reconstruction for spectral volume method
10 0 0, doi:10.1109/72.712178. on unstructured grids, J. Comput. Phys. 228 (16) (2009) 5787–5802, doi:10.
[33] P.D. Lax, Hyperbolic Systems of Conservation Laws and the Mathematical The- 1016/j.jcp.20 09.05.0 01.
ory of Shock Waves, Society for Industrial and Applied Mathematics, Philadel- [61] Z. Zhang, G. Gogos, Theory of shock wave propagation during laser ablation,
phia, Pa., 1973. Conference Board of the Mathematical Sciences Regional Con- Phys. Rev. B 69 (23) (2004), doi:10.1103/PhysRevB.69.235403.
ference Series in Applied Mathematics, No. 11.
[34] R. LeVeque, Wave propagation algorithms for multidimensional hyperbolic sys- Craig Michoski is currently a Research Scientist at the
tems, J. Comput. Phys. 131 (2) (1997) 327–353, doi:10.1006/jcph.1996.5603. University of Texas at Austin in the Oden Institute for
[35] R.J. LeVeque, Finite-Volume Methods for Hyperbolic Problems, Cambridge Uni- Computational Engineering and Sciences. He received his
versity Press, 2002. A.B. (1999) from the University of Colorado Boulder in
[36] X. Li, Error estimates for the moving least-square approximation and the chemistry, and his Ph.D. (2009) from the University of
element-free Galerkin method in n-dimensional spaces, Appl. Numer. Math. Texas at Austin in chemistry and applied mathemat-
IMACS J. 99 (2016) 77–97, doi:10.1016/j.apnum.2015.07.006. ics. His research spans computational engineering, ap-
[37] Z. Long, Y. Lu, X. Ma, B. Dong, PDE-net: Learning PDEs from data, 2018. https: plied mathematics, physics, machine learning, and data
//openreview.net/forum?id=SylJ1D1C- analytics.
[38] Z. Lu, H. Pu, F. Wang, Z. Hu, L. Wang, The expressive power of neu-
ral networks: a view from the width, in: I. Guyon, U.V. Luxburg,
S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.),
Advances in Neural Information Processing Systems 30, Curran
Associates, Inc., 2017, pp. 6231–6239. http://papers.nips.cc/paper/
7203- the- expressive- power- of- neural- networks- a- view- from- the- width.pdf.
Miloš Milosavljević is a computational scientist and Asso-
[39] S. Mabuza, J.N. Shadid, D. Kuzmin, Local bounds preserving stabilization for
ciate Professor of Astronomy at the University of Texas at
continuous Galerkin discretization of hyperbolic systems, J. Comput. Phys. 361
Austin. He received A.B. and Ph.D. in Physics, respectively,
(2018) 82–110, doi:10.1016/j.jcp.2018.01.048.
from Harvard and Rutgers Universities, and was a post-
[40] N. Malaya, D. McDougall, C. Michoski, M. Lee, C.S. Simmons, Experiences port-
doctoral scholar at the California Institute of Technology.
ing scientific applications to the intel (knl) xeon phi platform, in: Proceedings
He has worked on problems in astrophysical gravitational
of the Practice and Experience in Advanced Research Computing 2017 on Sus-
and fluid dynamics and relativistic plasma physics.
tainability, Success and Impact, in: PEARC17, ACM, New York, NY, USA, 2017,
pp. 40:1–40:8, doi:10.1145/3093338.3093371.
[41] G. Mercer, A. Roberts, A centre manifold description of contaminant dispersion
in channels with varying flow properties, SIAM J. Appl. Math. 50 (6) (1990)
1547–1565, doi:10.1137/0150091.
[42] C. Michoski, J. Chan, L. Engvall, J. Evans, Foundations of the blended isogeo-
metric discontinuous galerkin (bidg) method, Comput. Methods Appl. Mech.
Eng. 305 (2016) 658–681, doi:10.1016/j.cma.2016.02.015.
[43] C. Michoski, C. Dawson, E.J. Kubatko, D. Wirasaet, S. Brus, J.J. Westerink, A com- Todd Oliver received his S.B. (2002), S.M. (2004), and
parison of artificial viscosity, limiters, and filters, for high order discontinuous Ph.D. (2008) from MIT in Aerospace Engineering. He is
Galerkin solutions in nonlinear settings, J. Sci. Comput. 66 (1) (2016) 406–434, currently a Research Scientist at the Oden Institute for
doi:10.1007/s10915-015-0027-2. Computational Engineering and Sciences at The University
[44] C. Michoski, C. Mirabito, C. Dawson, D. Wirasaet, E.J. Kubatko, J.J. Westerink, of Texas at Austin. His research interests include compu-
Adaptive hierarchic transformations for dynamically p-enriched slope-limiting tational fluid dynamics, turbulence modeling, and uncer-
over discontinuous Galerkin systems of generalized equations, J. Comput. Phys. tainty quantification.
230 (22) (2011) 8028–8056, doi:10.1016/j.jcp.2011.07.009.
[45] M.A. Nabian, H. Meidani, Physics-driven regularization of deep neural net-
works for enhanced engineering design and analysis, J. Comput. Inf. Sci. Eng.
20 (1) (2019), doi:10.1115/1.4044507. 011006
[46] T. Ohwada, S. Kobayashi, Management of discontinuous reconstruction in ki-
netic schemes, J. Comput. Phys. 197 (1) (2004) 116–138, doi:10.1016/j.jcp.2003.
11.020.
[47] T.A. Oliver, G. Terejanu, C.S. Simmons, R.D. Moser, Validating predictions of un- David R. Hatch is a plasma physicist in the Institute
observed quantities, Comput. Methods Appl. Mech. Eng. 283 (2015) 1310–1335, for Fusion Studies at the University of Texas at Austin.
doi:10.1016/j.cma.2014.08.023. He received B.S. degrees in physics and math from Utah
[48] B. Peherstorfer, K. Willcox, M. Gunzburger, Survey of multifidelity methods in State University (2005) and a Ph.D. in physics from the
uncertainty propagation, inference, and optimization, SIAM Rev. 60 (3) (2018) University of Wisconsin-Madison (2010). He is interested
550–591. in kinetic plasma turbulence, turbulent transport in mag-
[49] N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F.A. Hamprecht, Y. Bengio, netically confined fusion plasmas, and most recently the
A. Courville, On the spectral bias of neural networks, 2018. intersection of these topics with machine learning. He is
[50] M. Raissi, Deep hidden physics models: deep learning of nonlinear partial dif- known for his elucidation of the velocity space spectrum
ferential equations, J. Mach. Learn. Res. 19 (1) (2018) 932–955 http://dl.acm. of gyrokinetic turbulence and for his identification of
org/citation.cfm?id=3291125.3291150. the salient transport mechanisms through the tokamak
[51] M. Raissi, P. Perdikaris, G. Karniadakis, Physics-informed neural networks: a periphery.
deep learning framework for solving forward and inverse problems involving

You might also like