Enhancing Trajectory Prediction in Complex Dynamical Systems With Neural Ordinary Differential Equations
Abstract
When the governing ordinary differential equations (ODEs) are still elusive, Neural ODEs (NODEs) offer a promising way to comprehend and forecast the behavior of complex dynamical systems, especially when trajectories are available. This paper explores the use of NODEs to capture the complex physics at play in dynamical systems. By exploring their predictive potential, we aim to demonstrate that NODEs are not only efficient trajectory interpolators but also useful tools that can shed light on the dynamics of systems whose ODEs are unknown.
• PINN are trained by making a prediction and measuring the committed error. Initial and boundary conditions are enforced in a similar manner to Lagrange multipliers in a classical optimization problem. This design specializes the trained NN to the particular case of interest, which limits the generalization property that such a NN should feature.

• RNN [28, 10] are a type of neural network designed to process sequential data. They have loops that allow information to persist across time steps, making them suitable for tasks involving sequences, such as time-series data and natural language processing. Essentially, these networks are trained to predict the subsequent value from a provided sequence.

• LSTM [8] is a specialized RNN variant that addresses the technical issue limiting RNN applications, known as the vanishing and exploding gradient problem. They manage both short- and long-term memory dependencies by using gating mechanisms, making them particularly effective for tasks with longer-term dependencies.

• ResNet [9] is a deep learning architecture commonly used for computer vision tasks. It introduces residual connections between layers or blocks of layers, improving the gradient flow and mitigating the vanishing gradient problem. They enable the training of very deep neural networks.

• NODE [4] are a different paradigm for modeling dynamical systems. Instead of discrete time steps, they represent dynamics using continuous-time differential equations. They provide a flexible and adaptive approach for handling time-series data and are especially useful for physics problems and irregularly sampled time data. Similarly to PINN, NODEs lead to interpretable results, which is needed in a physical context. Essentially, they can independently solve ODE systems when provided with an initial condition.
In [30] the use of physics-guided loss functions in deep learning to study complex dynamical systems is discussed. The PINN paradigm has shown promising results in capturing systems ranging from simple differential equations [14] to more elaborate PDEs [11]. However, PINNs can face challenges in learning complex physical systems and suffer from limited generalizability and sensitivity to input data quality [5, 2, 32, 13, 17, 1]. In [30] other approaches are also explored, improving prediction performance and data generation [5, 32, 33, 29].

As pointed out in [3], Parish and Carlberg [18] and Regazzoni et al. [27] have introduced neural-network-based methods for learning time-stepping models. They extensively compare various neural network architectures, including RNN and LSTM, with traditional time-series modeling approaches. By using neural networks, they establish a mapping between successive time-steps, providing a more accurate and stable time-stepping algorithm. ResNet has demonstrated significant success in learning time-stepping representations within the framework of flow maps [31] (a flow map represents the evolution of a system's state over time, often denoted as u_{k+1} = F(u_k), where u_k is the state at time step t_k and u_{k+1} is the state at the next time step; it describes how the system transitions from one state to another). Qin et al. [22] and Liu et al. [16] have explored the use of neural networks to construct flow maps for advancing solutions in time without relying on high-fidelity simulations. This approach provides a broader framework for fast time-stepping algorithms, eliminating the need for initial dimensionality reduction. Qin et al. [22] specifically used a ResNet as the fundamental architecture for approximating the neural network model fθ, which represents the flow map. They introduced one-step, recurrent, and recursive ResNet models for handling different time-step scenarios. The formulation is designed in the weak form, requiring no derivative information for generating time-stepping approximations. Numerical examples showcased the method's exceptional accuracy, surpassing even direct numerical integration [18, 27], and highlighting the remarkable universal approximation properties of the ResNet-based model fθ.

Furthermore, the flow map approximation scheme has been used to learn multi-scale time-stepping [16]. By constructing flow maps for different characteristic timescales, it becomes possible to efficiently forecast long into the future while minimizing steps on fast timescales. This Hierarchical Time-Stepping (HiTS) scheme achieves remarkable efficiency and accuracy, even when compared to leading time-series neural networks such as LSTM.
NODEs have emerged as a versatile tool with wide-ranging applications, particularly in domains requiring continuous and dynamic models. They excel in constructing continuous-time series models capable of accommodating irregularly spaced data. The concept of continuous neural networks has been further refined in subsequent studies. Augmented NODEs (ANODEs) [6] present enhanced expressiveness, stability, and computational efficiency compared to standard NODEs. Importantly, they can model functions with intersecting continuous trajectory mappings that are beyond the reach of NODEs. Extending this paradigm to graph data, Graph NODEs [20] harness continuous neural networks for graph convolutions. Introducing stochasticity, stochastic NODEs [20] inject stochastic noise through differential equations, reducing overfitting and enhancing generalization. Leveraging mechanistic knowledge, a generative time-series model [15] incorporates the known differential equation directly into the NODEs framework. Furthermore, NODEs have been employed to approximate unknown terms in differential equations [23], utilizing neural networks and the adjoint method for efficient gradient computation, encompassing initial conditions, ODE parameters, and boundary conditions. These advancements highlight NODEs' pivotal role in continuous modeling across diverse contexts.

However, while ResNet, RNN, and LSTM have shown great potential in various applications, including time-series data and language processing, they may not always be the most suitable choices for predicting the coefficients of the Galerkin projection in the context of a physics problem. Here are some remarks:

1. Data Efficiency: Physics problems often have limited data available, especially in experimental scenarios or when simulating expensive computational models. Training deep learning models like ResNet or LSTM typically requires a large amount of data, which might not be feasible in physics applications. In such cases, NODEs might be more suitable, as they can achieve accurate predictions with less data.

2. Interpretability: Physics problems often require interpretable models to gain insights into the underlying physical mechanisms. ResNet and LSTM can be complex and challenging to interpret due to their intricate architectures. In contrast, NODEs provide a more transparent representation of the underlying dynamics, making them more interpretable for physicists.

3. Adaptability to Irregular Time Sampling: Physics data may not always be uniformly sampled in time. NODEs can handle irregular time sampling effectively, whereas LSTM and ResNet might require additional preprocessing or adaptations.

While RNNs, LSTM, and ResNet have certain degrees of versatility and can handle varying initial conditions to some extent, when it comes to handling time-series data and dynamics, RNNs and LSTM are generally better suited due to their sequential processing capabilities and ability to capture temporal dependencies. NODEs still offer advantages in terms of adaptability and stability, especially when dealing with physics problems and irregular time sampling. Therefore, for time-series data in physics applications, NODEs are an appealing choice due to their interpretability, their transparent representation of the dynamics, and their ability to generalize to unseen initial conditions and time points.

3 Introduction to NODEs

ODEs are equations of the form

G(x, F(x), F′(x), . . . , F^(n)(x)) = 0.

In the case of a first-order, continuous-time system, the above expression can be written as

y′(t) = f(y(t), t).

There are an infinite number of solutions to the previous equation, since it is subject to the initial conditions (as many as the order of the ODE). One is thus interested in solving the following system:

y(0) = y0,
y′(t) = f(y(t), t).

It is convenient to express it in its integral form:

y(t) = y0 + ∫_{t0}^{t} y′(s) ds.

In general, the integral of y′ is difficult to perform or cannot be obtained analytically. In that case one can try to approximate it numerically. In the most basic approach, one can use the Euler method to go from a continuous system to a discrete one:

y(t + ∆t) = y(t) + y′(t)∆t. (1)

This method stands as a fundamental numerical integration technique. The underlying concept guiding the Euler method is that trajectory evolution occurs through a series of incremental steps, aligning with the slope's direction in an iterative manner.
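As a minimal illustration, the explicit Euler update in Eq. (1) can be written as a short routine; the step count and the example system are purely illustrative.

```python
import numpy as np

def euler(f, y0, t0, t1, n_steps):
    """Explicit Euler integration of y' = f(y, t) from t0 to t1."""
    y = np.asarray(y0, dtype=float)
    t = t0
    dt = (t1 - t0) / n_steps
    trajectory = [y.copy()]
    for _ in range(n_steps):
        y = y + dt * f(y, t)   # y(t + dt) ~ y(t) + y'(t) dt, Eq. (1)
        t += dt
        trajectory.append(y.copy())
    return np.array(trajectory)

# Example: exponential decay y' = -y with y(0) = 1.
# traj = euler(lambda y, t: -y, [1.0], 0.0, 5.0, 500)
```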
3.1 Machine learning applied to ODEs

ResNets have been adapted for solving ODEs by formulating them as a supervised learning problem. In this context, ResNets can approximate the unknown solution of an ODE by learning the underlying dynamics from data.

In ResNets for ODEs, residual blocks play a crucial role. A residual block is a building block that consists of multiple layers and uses skip connections to directly propagate information from one layer to another. These skip connections allow the network to learn residual functions, which capture the difference between the current layer's input and output. By stacking multiple residual blocks, the network can effectively learn complex nonlinear dynamics.

In the context of ODEs, a residual block takes the form

output = input + residual(input),

where the input represents the hidden state or feature representation at a specific point in the ODE solution, and the residual function captures the transformation applied to the input within the block.
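A minimal PyTorch sketch of such a block follows, with the skip connection implementing output = input + residual(input); the hidden width and the activation are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block: output = input + residual(input)."""
    def __init__(self, dim, hidden=50):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        # Skip connection plus learned residual update.
        return x + self.residual(x)

# A deep ResNet is obtained by stacking such blocks:
# resnet = nn.Sequential(*[ResidualBlock(dim=2) for _ in range(3)])
```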
The graphical depiction of the residual block is presented in Figure 1 (left). The input vector x is subject to matrix multiplication, yielding an output F(x, θ). Additionally, a bypass route allows the unmodified x to directly contribute to the output of the weighted layer. The ⊕ symbol represents value addition. Hence, the overall outcome is expressed as

y = F(x, θ) + x, (2)

representing the combined outcome of the layer output F(x, θ) and the skip connection x. This architecture is termed a residual neural network due to the summation of these two constituents. Basically, it can be understood as a comprehensive network that accepts input x, undergoes network processing to generate output y, all governed by the parameter set θ.

ResNets can be understood as solutions to ODEs. In ResNets, the residual connection in (2) is akin to the update equation (1). We can string together multiple residual blocks in series to get a deep residual network, or ResNet (Figure 1, left). The hidden states in ResNets for ODEs refer to the intermediate representations or features learned by the network at different stages. As the ODE solution is approximated, the network learns to map the input (e.g., initial conditions, time points) to hidden states that capture the dynamics of the system. These hidden states serve as a compressed representation of the ODE solution and are crucial for accurately predicting the system's behavior at different time points.

By training the ResNet on known ODE solutions or data generated from ODEs, the network can learn to generalize and approximate the ODE dynamics. The hidden states within the residual blocks capture the essential information required for accurately predicting the ODE solution at different time points. By recognizing residual connections as discretized Euler steps, we can control network depth via discretization, adjusting solution accuracy and potentially achieving infinite-layer behavior.

Figure 1: Comparison of the ResNet architecture and a NODE. Both panels depict different components of the neural network model. The left panel shows a ResNet architecture with three residual blocks, where z0 and z3 represent the initial and final states and z1 and z2 represent the hidden states. The right panel illustrates a single NODE in the neural network.
However, as it was mentioned in the previous chapter, ResNets have their limitations. These are related to discretization errors, fixed architecture and computational costs. The continuous-time ODE must typically be discretized into a series of discrete time steps before being used with ResNets. With complex dynamics or long-term predictions in particular, this discretization introduces an error. They also have a fixed architecture and a set quantity of residual blocks. This fixed architecture might not be adaptable enough to different ODE systems with different levels of complexity. Finally, they might need a lot of computations and parameters, especially for systems with high dimensions or long time intervals. As a result, training and inference may become computationally expensive. NODEs are another machine-learning technique for solving ODEs, one that improves on ResNets in terms of accuracy, versatility and performance.
3.2 NODEs

Consider the problem of modeling a process governed by an unknown ODE, with observations available along its trajectory. The ODE can be represented as

dz/dt = f(z(t), t),

where z(t) denotes the state of the system at time t and f(z, t) represents the unknown dynamics function. The goal is to approximate this dynamics function with an approximation f̂(z, t, θ) parameterized by θ [12]. This approach differs from a ResNet, since the neural network now predicts the derivative instead of the next state in time.
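A minimal sketch of how f̂(z, t, θ) can be represented in practice is given below; the single hidden layer of 50 neurons matches the architecture reported in Section 5.1, while the activation and the f(t, z) call signature (used by the torchdiffeq solvers of Section 4) are assumptions.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Neural approximation f_hat(z, t, theta) of the unknown dynamics dz/dt = f(z, t)."""
    def __init__(self, dim, hidden=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(),
            nn.Linear(hidden, dim),
        )

    def forward(self, t, z):
        # The dynamics are taken as autonomous here, so t is unused;
        # t could be concatenated to z for explicitly time-dependent systems.
        return self.net(z)
```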
3.3 Forward pass

To simplify the task, let us initially consider a scenario where only two observations are available: one at the beginning of the trajectory (z0, t0) and the other at the end (zm, tm). We can then start the system's evolution from the initial state z0 at time t0 using an initial value solver for ODEs, for a duration of ∆t = tm − t0. This results in a new pair (ẑm, tm). By comparing this predicted state with the observed state (zm, tm), we can adjust the parameters θ to minimize the discrepancy.

More formally, the optimization problem involves minimizing the following loss function L(ẑm):

L(z(tm)) = L( z(t0) + ∫_{t0}^{tm} f(z(t), t, θ) dt ) = L( ODESolve(z(t0), f, t0, tm, θ) ),

where L represents the loss function that quantifies the difference between the predicted state (ẑm, tm), obtained by numerically solving the ODE, and the observed state (zm, tm). The optimization procedure seeks to find the optimal parameter values θ that minimize this loss function, thereby approximating the underlying dynamics function f(z, t).
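With torchdiffeq, the library used for training in Section 4, this forward pass amounts to a single call to odeint, which plays the role of ODESolve above. A minimal sketch with illustrative initial state and time grid:

```python
import torch
from torchdiffeq import odeint  # differentiable ODE solvers

func = ODEFunc(dim=2)                # learned dynamics (see the sketch above)
z0 = torch.tensor([0.05, 0.05])      # initial state z(t0), illustrative values
t = torch.linspace(0.0, 28.0, 800)   # times at which the solution is requested

z_pred = odeint(func, z0, t)         # ODESolve(z(t0), f, t0, ..., tm); shape (len(t), dim)

# The predicted trajectory is then compared with the observations to adjust theta:
# loss = torch.mean((z_pred - z_true) ** 2)
```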
3.4 Backward pass

In a regular neural network, the process of training involves updating the parameters of the network based on the gradients of the loss function with respect to those parameters. The gradients indicate the direction and magnitude of the changes needed to minimize the loss. In the context of NODEs, we also want to update the parameters of the neural network that approximates the ODE dynamics in order to learn the underlying dynamics of the system. The parameters of the neural network capture the behavior of the ODE and affect how the system evolves over time.

To compute the gradients of the loss with respect to the neural network parameters, we need to perform backpropagation. Backpropagation calculates the gradients by propagating the error backwards through the computational graph of the neural network. This process allows us to determine how each parameter contributes to the overall loss. In the case of NODEs, the computational graph includes both the neural network and the ODE solver. The ODE solver is responsible for integrating the dynamics obtained from the neural network to compute the next state. Therefore, in order to compute the gradients of the loss with respect to the neural network parameters, we need to account for the contribution of the ODE solver. However, directly differentiating the ODE solver with respect to the parameters is challenging and computationally expensive in terms of memory.

3.5 The adjoint sensitivity method

This main technical difficulty in training was addressed for the first time in [4] by making use of the adjoint sensitivity method [21] to compute the gradients. An overview of this process is shown in Figure 2. This method facilitates the computation of gradients of a loss function, L, concerning both the initial state, z(t0), and the parameters, θ, by introducing an adjoint state, a(t), defined as the derivative of L with respect to z(t),

a(t) = ∂L/∂z(t),

and governed by the following differential equation [4]:

da(t)/dt = −a(t)^T ∂fθ(z(t), t)/∂z.

This adjoint state facilitates the computation of gradients through the formulas [4]:

∂L/∂θ = −∫_{tm}^{t0} a(t)^T (∂fθ(z(t), t)/∂θ) dt.

These formulas are efficiently computed by solving the ODE for a(t), using the same numerical method applied to the forward ODE solution. During the forward pass, the ODE solver calculates the solution of the differential equation z(t) using the initial state z(t0) and the function fθ(z(t), t). During backpropagation, the adjoint state a(t) is computed by solving its differential equation starting from the final time tm and propagating backwards through time. Subsequently, this adjoint state is employed to determine the gradients of the loss function with respect to the initial state and the parameters of the ODE function, which are then utilized to update the model parameters through gradient descent optimization. This approach optimizes gradient computation, enhancing the efficiency and effectiveness of NODE-based models.
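In torchdiffeq this strategy is exposed through odeint_adjoint, which shares the interface of odeint but obtains gradients by integrating the adjoint state backwards in time instead of differentiating through the solver. A minimal sketch, reusing the ODEFunc above with placeholder data:

```python
import torch
from torchdiffeq import odeint_adjoint  # adjoint-based backpropagation

func = ODEFunc(dim=2)
z0 = torch.tensor([0.05, 0.05])
t = torch.linspace(0.0, 28.0, 800)
z_true = torch.zeros(len(t), 2)          # stand-in for observed data

z_pred = odeint_adjoint(func, z0, t)     # forward ODE solve
loss = torch.mean((z_pred - z_true) ** 2)
loss.backward()                          # gradients of L w.r.t. the parameters of func,
                                         # computed by solving the adjoint ODE from tm to t0
```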
Figure 2: Schematic comparing the iterative process of the ResNet and the NODE to update the network parameters. Since the loss is calculated directly from the output of the ResNet, backpropagation can be applied straightforwardly. In contrast, the NODE returns the derivative, from which the next step and the loss are calculated. This forces the use of the adjoint method in order to update the network parameters.
4 General Training Framework for Neural Network-Based Dynamical Systems

For training the models, we utilized torchdiffeq [4] in PyTorch [19], a powerful library for solving ordinary differential equations efficiently and accurately. The training process is structured as follows:

• Dataset. The neural network training relies on a comprehensive dataset derived from the system under study. The training set encompasses a range of observations over a specified time interval. Additionally, a distinct test set is constructed for validating model performance. This test set covers a subsequent time span and is initialized with data from the final state of the training set.

• Loss function. Our primary objective is to optimize the neural network parameters to minimize the discrepancy between NODE-generated trajectories and ground truth trajectories obtained through traditional numerical solvers. The loss function employed is a combined loss, incorporating a mean squared error (MSE) term, which measures the discrepancy between predicted and true states at each time point, and a mean absolute error term,

MAE = (1/N) Σ_{i=1}^{N} |z_true,i − z_pred,i|.

This combined loss is expressed as a weighted sum of these two components. The weighting coefficient, denoted by α = 0.9, controls the balance between the MSE and MAE terms, allowing for a flexible adjustment of the optimization focus between accurate fitting and robustness to outliers in the training process (a short sketch of this loss, together with the optimizer and scheduler settings below, is given after this list).

• Initialization. Subsequent models are initialized with the weights of the first one, to retain foundational insights while adapting specific aspects.

• Optimizer. We configure the optimizer for the training process. Utilizing the Adam optimizer with an initial learning rate of 10⁻² and weight decay of 10⁻⁴, we set the stage for iterative parameter updates that drive the model's convergence toward an improved representation of plasma-turbulence dynamics.

• Learning rate. We incorporate a learning rate scheduler that dynamically adjusts the learning rate during training (dividing it by half). This mechanism responds to the model's performance and ensures that the optimization process converges effectively. Specifically, we employ the ReduceLROnPlateau scheduler, which monitors the validation loss and adapts the learning rate accordingly.

• Training Process. The training procedure involves the stochastic selection of n samples, subsequently organized into m batches of size p. Initially, a random sampling approach is employed, accompanied by a scheduled learning rate decay on a plateau until convergence in a local minimum of the loss function. Upon achieving convergence, the training regimen resets using the weights of the best model. Subsequent sampling prioritizes regions with 70% focus on less accurate areas and 30% on the best-performing regions. This phase is characterized by a scheduled learning rate decay on a plateau until convergence is once again attained. The cyclic repetition of these phases includes a hybridization step, incorporating crossover operations between the best model in each cycle and the absolute best model.
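The following sketch assembles the pieces described in this list: the combined loss, the Adam optimizer and the ReduceLROnPlateau scheduler. Only α = 0.9, the learning rate 10⁻², the weight decay 10⁻⁴ and the halving factor are taken from the text; the exact weighting convention α·MSE + (1 − α)·MAE and the scheduler patience are assumptions.

```python
import torch
from torchdiffeq import odeint

def combined_loss(z_pred, z_true, alpha=0.9):
    # Weighted sum of MSE and MAE (weighting convention assumed).
    mse = torch.mean((z_pred - z_true) ** 2)
    mae = torch.mean(torch.abs(z_pred - z_true))
    return alpha * mse + (1.0 - alpha) * mae

model = ODEFunc(dim=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=20)

# One illustrative update over a sampled time grid t and observed states z_true:
# z_pred = odeint(model, z_true[0], t)
# loss = combined_loss(z_pred, z_true)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
# scheduler.step(validation_loss)  # halve the learning rate when the validation loss plateaus
```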
5 Application of NODEs in Complex Dynamical Systems

Expanding on the case study, we examine how increasing the number of initial conditions affects the predictive power of NODEs. We show empirically that this approach improves trajectory predictions and enables NODEs to function as efficient interpolators, capturing subtle features in the dynamical system.

5.1 Modeling and Analysis of Lotka-Volterra Dynamics using NODEs

In this study, we endeavor to capture the intricate dynamics of the Lotka-Volterra model using NODEs. Our approach involves encapsulating the Lotka-Volterra equations within a neural network architecture. Specifically, we design a neural network that simulates predator-prey interactions, isolating and modeling individual facets of the system's behavior, namely the interactions between the fluctuation energy and the sheared E × B flow.

Our training set spans a time interval from 0 to 28, representing 80% of the data points. Additionally, we construct a test set to validate the performance of our model. The test set spans a time interval from 28 to 35, representing 20% of the data points (all variables in this chapter are dimensionless). We initialize the test set using the final state of the training set, ensuring a seamless transition between the two. Here, we distinguish two models: the first, trained using 800 data points and only one initial condition (NODE1); and the second, trained using 1600 data points and multiple initial conditions (NODEm). The architecture utilized is a single hidden layer of 50 neurons. The training process encompassed around 300 and 600 epochs, respectively (in general, a more diverse training data set will need more epochs), each comprising multiple data batches.

The results of the training are illustrated in Figure 3. The NODE effectively captures the system's temporal evolution for both models, depicting the expected trend of initial oscillations that gradually dampen until converging to a stable equilibrium. The model's performance is substantiated by a test-set loss below 10⁻² for both models. The flow diagrams extracted from both models are presented and juxtaposed against the actual flow in Figure 4. Notably, both models adeptly capture the overarching trend of the dynamics, exhibiting trajectories that converge towards the same stable equilibrium point. It is noteworthy that the model initialized with a solitary initial condition (NODE1) fails to discern the second saddle point along the E axis at E = 0.8, a distinction achieved by the second model (NODEm). Nevertheless, it is observed that, in both cases, certain flow lines intersect the axes, an occurrence that contradicts the inherent nature of the system, where points satisfying E = 0 signify stationary states with constant U values (as dictated by the underlying differential equations), and vice versa. Such discrepancies, however, are anticipated, given the inherent smoothness assumption intrinsic to the NODE framework. This effect can be attributed to the significant dynamics shift occurring in close proximity to the axes.
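As a hedged illustration of how such a training set can be assembled, the sketch below integrates the classic Lotka-Volterra equations with SciPy and applies the 80%/20% split in time described above, with the test segment starting from the final state of the training segment. The equations and parameter values are the textbook ones and are used only for illustration; the reduced fluctuation-energy/shear-flow model actually studied here, and its parameters, are not given in the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

alpha, beta, gamma, delta = 1.0, 1.0, 1.0, 1.0   # illustrative parameter values

def lotka_volterra(t, s):
    x, y = s
    return [alpha * x - beta * x * y, delta * x * y - gamma * y]

t = np.linspace(0.0, 35.0, 1000)
sol = solve_ivp(lotka_volterra, (0.0, 35.0), [0.05, 0.05], t_eval=t, rtol=1e-8)

# 80% / 20% split in time, mirroring the 0-28 / 28-35 intervals of the text.
split = int(0.8 * len(t))
t_train, z_train = t[:split], sol.y[:, :split].T
t_test, z_test = t[split:], sol.y[:, split:].T   # test starts at the last training state
```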
Figure 3: Left panel: prediction of the evolution of the system starting from E, U = 0.05. Center panel:
prediction of the trajectory in the phase space. Right panel: evolution of the loss function over the epochs for one
initial condition at E, U = 0.05 and multiple initial conditions.
Figure 4: True plot of the dynamical system portrait (left), prediction with an initial condition at E, U = 0.05 (center) and prediction with multiple initial conditions (right).
5.2 Dynamics Analysis and Limit Cycle Investigation of the Rössler System using NODEs

The Rössler system is a classic example of a three-dimensional autonomous chaotic dynamical system. It is described by the following set of ordinary differential equations:

ẋ = −y − z,
ẏ = x + ay,
ż = b + z(x − c).
Figure 5: Left panel: prediction of the evolution of the Rössler system starting from x, y, z = 0.2. Center panel:
prediction of the trajectory in the phase space. Right panel: evolution of the loss function over the epochs.
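A minimal sketch for generating Rössler training trajectories with SciPy follows. The parameter values (a, b, c) = (0.2, 0.2, 5.7) are the commonly used ones and are assumed here, since the text does not state which values were employed; the initial condition x = y = z = 0.2 follows Figure 5.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rossler(t, s, a=0.2, b=0.2, c=5.7):
    x, y, z = s
    return [-y - z, x + a * y, b + z * (x - c)]

t = np.linspace(0.0, 250.0, 10000)
sol = solve_ivp(rossler, (t[0], t[-1]), [0.2, 0.2, 0.2], t_eval=t, rtol=1e-9)
trajectory = sol.y.T   # (x, y, z) samples used as training data for the NODE
```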
Figure 6: Left Panel: True trajectory over a time span of 10000 units, exhibiting intersections with the Poincaré
section. Right Panel: Detailed view of individual intersections (true in blue and NODE in orange) between the
trajectory and the Poincaré section.
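A Poincaré-section analysis like the one in Figure 6 can be reproduced by collecting the states at which a trajectory crosses a chosen plane. The sketch below uses linear interpolation between consecutive samples; the particular section (the plane x = 0, crossed in the direction of increasing x) is an assumption made for illustration, since the text does not specify the section used.

```python
import numpy as np

def poincare_section(traj, coord=0, level=0.0):
    """States where traj[:, coord] crosses `level` upwards, by linear interpolation."""
    crossings = []
    s = traj[:, coord] - level
    for i in range(len(traj) - 1):
        if s[i] < 0.0 <= s[i + 1]:                 # sign change: upward crossing
            w = -s[i] / (s[i + 1] - s[i])          # interpolation weight in [0, 1)
            crossings.append(traj[i] + w * (traj[i + 1] - traj[i]))
    return np.array(crossings)

# Using the Rössler trajectory from the previous sketch:
# points = poincare_section(trajectory, coord=0, level=0.0)
# Their remaining components give section points analogous to those shown in Figure 6.
```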
The NODE thus behaves as an interpolator, interpreting system behavior in the vicinity of observed trajectories. When it comes to extrapolating outside of the studied trajectories, however, it becomes less effective, as Figure 7 illustrates (the validation set is shown in the right column). In particular, the training set's initial conditions, which follow the format x = y = z, are x0 = 0.2 and x1 = 3, while the validation set's initial conditions are x = 1.5 and x = 4. Interestingly, there are noticeable differences in the prediction accuracy for x0 < x < x1 and for x > x1.

Even with its poor ability to extrapolate, the NODE shows a strong grasp of the dynamics in the interval x0 < x < x1. This claim is strengthened by means of an extensive analysis involving several initial conditions. Specifically, Figure 8 shows the loss curves for trajectories starting at x = y = z that were trained with one initial condition (x0 = 1), two initial conditions (x0 = 0.2 and x1 = 3), and three initial conditions (x0 = 0.2, x1 = 3, and x2 = 5). It is evident that a higher number of initial conditions causes the loss curve to flatten, highlighting the increasing improvement in the performance and robustness of the model within the range [xmin, xmax].

6 Conclusion

NODEs can be used to describe and forecast complex physics in dynamical systems. In this work we measure their capacity to interpolate trajectories and to enhance predictions through a greater number of initial conditions.
Figure 7: Top left panel: train (green) / test (yellow) loss over epochs. Top center panel: learning rate schedule over epochs. Top right panel: memory usage over epochs. Bottom left panel: prediction of the trajectory (heavy solid) against its true value (light solid) for the train set (x0 = 0.2 and x1 = 3). Bottom right panel: prediction of the trajectory (heavy solid) against its true value (light solid) for the validation set (x = 1.0 and x1 = 4).
Figure 8: The curves show the trajectory error measured from the points x = y = z for models trained with only one (blue), two (orange) and three (green) initial conditions.
References

[3] Steven L. Brunton and J. Nathan Kutz. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, 2019.

[4] Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. Neural ordinary differential equations. Advances in Neural Information Processing Systems, 31, 2018.

[6] Emilien Dupont, Arnaud Doucet, and Yee Whye Teh. Augmented neural ODEs. Advances in Neural Information Processing Systems, 32, 2019.

[7] James Gleick and M. Berry. Chaos: Making a new science. Nature, 330:293, 1987.
[8] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[9] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[10] Michael I. Jordan. Serial order: A parallel distributed processing approach. Technical report, Institute for Cognitive Science, University of California, San Diego, 1986.

[11] S. Kawaguchi, K. Takahashi, H. Ohkama, and K. Satoh. Deep learning for solving the Boltzmann equation of electrons in weakly ionized plasma. Plasma Sources Science and Technology, 29(2):025021, 2020.

[12] Patrick Kidger. On neural differential equations. arXiv preprint arXiv:2202.02435, 2022.

[13] Wouter M. Kouw and Marco Loog. An introduction to domain adaptation and transfer learning. arXiv preprint arXiv:1812.11806, 2018.

[14] Aditi Krishnapriyan, Amir Gholami, Shandian Zhe, Robert Kirby, and Michael W. Mahoney. Characterizing possible failure modes in physics-informed neural networks. Advances in Neural Information Processing Systems, 34:26548–26560, 2021.

[15] Ori Linial, Neta Ravid, Danny Eytan, and Uri Shalit. Generative ODE modeling with known unknowns. In Proceedings of the Conference on Health, Inference, and Learning, pages 79–94, 2021.

[16] Yuying Liu, J. Nathan Kutz, and Steven L. Brunton. Hierarchical deep learning of multiscale differential equation time-steppers. Philosophical Transactions of the Royal Society A, 380(2229):20210200, 2022.

[17] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2009.

[18] Eric J. Parish and Kevin T. Carlberg. Time-series machine-learning error models for approximate solutions to parameterized dynamical systems. Computer Methods in Applied Mechanics and Engineering, 365:112990, 2020.

[19] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.

[20] Michael Poli, Stefano Massaroli, Junyoung Park, Atsushi Yamashita, Hajime Asama, and Jinkyoo Park. Graph neural ordinary differential equations. arXiv preprint arXiv:1911.07532, 2019.

[21] L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko. Mathematical Theory of Optimal Processes. John Wiley & Sons, 1962.

[22] Tong Qin, Kailiang Wu, and Dongbin Xiu. Data driven governing equations approximation using deep neural networks. Journal of Computational Physics, 395:620–635, 2019.

[23] Christopher Rackauckas, Yingbo Ma, Julius Martensen, Collin Warner, Kirill Zubov, Rohit Supekar, Dominic Skinner, Ali Ramadhan, and Alan Edelman. Universal differential equations for scientific machine learning. arXiv preprint arXiv:2001.04385, 2020.

[24] Maziar Raissi and George Em Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, 357:125–141, 2018.

[25] Maziar Raissi, Paris Perdikaris, and George E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.

[26] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics informed deep learning (part I): Data-driven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561, 2017.

[27] Francesco Regazzoni, Luca Dede, and Alfio Quarteroni. Machine learning for fast and reliable solution of time-dependent differential equations. Journal of Computational Physics, 397:108852, 2019.

[28] David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams, et al. Learning internal representations by error propagation, 1985.

[29] Rui Wang, Karthik Kashinath, Mustafa Mustafa, Adrian Albert, and Rose Yu. Towards physics-informed deep learning for turbulent flow prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1457–1466, 2020.