
Enhancing Trajectory Prediction in Complex Dynamical Systems with Neural Ordinary Differential Equations


David Garrido Gonzalez¹, Nathaniel Saura¹, Peter Beyer¹ and Saddrudin Benkadda¹
¹Aix-Marseille Université, CNRS, PIIM UMR 7345, Marseille, France

Abstract
When the governing ordinary differential equations (ODEs) are still elusive, Neural ODEs (NODEs) offer a promising way to comprehend and forecast the behavior of complex dynamical systems, especially when trajectories are available. This paper explores the use of NODEs to capture the intricacies of the complex physics at play in dynamical systems. By exploring their predictive potential, we hope to demonstrate that NODEs are useful tools that can shed light on the dynamics of systems whose ODEs are unknown, in addition to being efficient trajectory interpolators.

Keywords— NODEs, Dynamical Systems, Trajectory Prediction, Interpolation

1 Introduction

The complexities of dynamical systems, which are distinguished by their dynamic nature and the numerous ways in which distinct phenomena interact with one another, are found in many different fields of science. Dynamical systems are systems that change continuously over time. They are found in many fields, including engineering, chemistry, biology, and physics, which emphasizes the importance of modeling these systems for both scientific research and real-world applications. However, situations often arise where the ODEs governing the system's motion are not readily available. Parameter estimation was the foundation of traditional dynamical system modeling techniques, which required the creation of previous models and the laborious process of fitting parameters to available data. But recent developments in data-driven methods have opened the door for model-free approaches that use deep learning and artificial neural networks (ANNs) to reduce the need for domain-specific knowledge and automate the modeling process.

Residual Neural Networks (ResNet), among the wide variety of ANNs, have played a significant role in introducing skip connections to lessen difficulties related to training deep networks. Building on this framework, NODEs offer a new angle by understanding skip connections as an expression of Euler's algorithm for solving ODEs numerically. The fundamental function of NODEs is the ANN block's ability to parameterize the local derivative of the input data. This block provides a novel method for deriving insights into the system's evolution over time by learning the dynamics encoded in the underlying ODE during training.

NODEs are incredibly flexible, especially when working with time-series data that has irregular measurements. An accurate navigation between different time points is made possible by the ODE solver's adaptive step length, which strikes a careful balance between computational efficiency and prediction accuracy. Because of their versatility, NODEs can be extremely effective at forecasting system dynamics, particularly in situations where time-series data is not uniformly sampled.

The challenge of extracting analytical dynamics without prior system knowledge is taken on as this work explores the use of NODEs for modeling and learning dynamical systems from data. Through the use of the encoded derivatives found in the ANN block, NODEs provide a data-driven method for deciphering the complex dynamics of intricate systems. This investigation aims to shed light on the revolutionary potential of NODEs in redefining our method for comprehending and forecasting the long-term evolution of complex systems, in addition to making a contribution to the developing field of data-driven dynamical systems modeling.

2 State-of-the-Art in Deep Learning Techniques for Complex Systems

Recent developments in the field have seen the emergence of machine learning techniques, particularly Residual Network (ResNet), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) networks, Physics-Informed Neural Network (PINN), and NODEs, to model complex physical dynamics more efficiently and accurately:

• The PINN [26, 24, 25] is a type of neural network designed to emulate PDEs. The targeted physics is incorporated into the loss functions. The main part is solving the wanted PDEs using the running prediction and measuring the committed error. Initial and boundary conditions are enforced in a similar manner as Lagrangian multipliers in a classical optimization problem. This design specializes the trained NN to the particular case of interest. This limits the generalization property that such a NN should feature.

• RNNs [28, 10] are a type of neural network designed to process sequential data. They have loops that allow information to persist across time steps, making them suitable for tasks involving sequences, such as time-series data and natural language processing. Essentially, these networks are trained to predict the subsequent value from a provided sequence.

• The LSTM [8] is a specialized RNN variant that addresses the technical issues limiting RNN applications, known as the vanishing and exploding gradient problem. LSTMs manage both short- and long-term memory dependencies by using gating mechanisms, making them particularly effective for tasks with longer-term dependencies.

• ResNet [9] is a deep learning architecture commonly used for computer vision tasks. It introduces residual connections between layers or blocks of layers, improving the gradient flow and mitigating the vanishing gradient problem. Residual connections enable the training of very deep neural networks.

• NODEs [4] are a different paradigm for modeling dynamical systems. Instead of discrete time steps, they represent dynamics using continuous-time differential equations. They provide a flexible and adaptive approach for handling time-series data and are especially useful for physics problems and irregularly sampled time data. Similarly to PINNs, NODEs lead to interpretable results, which is needed in a physical context. Essentially, they can independently solve ODE systems when provided with an initial condition.

In [30] the use of physics-guided loss functions in deep learning to study complex dynamical systems is discussed. The PINN paradigm has shown promising results in capturing everything from simple differential equations [14] to more elaborate PDEs [11]. However, PINNs can face challenges in learning complex physical systems and suffer from limited generalizability and sensitivity to input data quality [5, 2, 32, 13, 17, 1]. In [30] other approaches are also explored, improving prediction performance and data generation [5, 32, 33, 29].

As pointed out in [3], Parish and Carlberg [18] as well as Regazzoni et al. [27] have introduced neural-network-based methods for learning time-stepping models. They extensively compare various neural network architectures, including RNN and LSTM, with traditional time-series modeling approaches. By using neural networks, they establish a mapping between successive time-steps, providing a more accurate and stable time-stepping algorithm.

ResNet has demonstrated significant success in learning time-stepping representations within the framework of flow maps¹ [31]. Researchers such as Qin et al. [22] and Liu et al. [16] have explored the use of neural networks to construct flow maps for advancing solutions in time without relying on high-fidelity simulations. This approach provides a broader framework for fast time-stepping algorithms, eliminating the need for initial dimensionality reduction. Qin et al. [22] specifically used a ResNet as the fundamental architecture for approximating the neural network model fθ, which represents the flow map. They introduced one-step, recurrent, and recursive ResNet models for handling different time-step scenarios. The formulation is designed in the weak form, requiring no derivative information for generating time-stepping approximations. Numerical examples showcased the method's exceptional accuracy, surpassing even direct numerical integration [18, 27], and highlighting the remarkable universal approximation properties of the ResNet-based model fθ.

Furthermore, the flow map approximation scheme is used to learn multi-scale time-stepping [16]. By constructing flow maps for different characteristic timescales, it becomes possible to efficiently forecast long into the future while minimizing steps on fast timescales. This Hierarchical Time-Stepping (HiTS) scheme achieves remarkable efficiency and accuracy, even when compared to leading time-series neural networks such as LSTM.

NODEs have emerged as a versatile tool with wide-ranging applications, particularly in domains requiring continuous and dynamic models. They excel in constructing continuous-time series models capable of accommodating irregularly spaced data. The concept of continuous neural networks has been further refined in subsequent studies. Augmented NODEs (ANODEs) [6] present enhanced expressiveness, stability, and computational efficiency compared to standard NODEs. Importantly, they can model functions with intersecting continuous trajectory mappings that are beyond the reach of NODEs. Extending this paradigm to graph data, Graph NODEs [20] harness continuous neural networks for graph convolutions. Introducing stochasticity, stochastic variants of NODEs [20] inject noise through differential equations, reducing overfitting and enhancing generalization. Leveraging mechanistic knowledge, a generative time-series model [15] incorporates the known differential equation directly into the NODEs framework. Furthermore, NODEs have been employed to approximate unknown terms in differential equations [23], utilizing neural networks and the adjoint method for efficient gradient computation, encompassing initial conditions, ODE parameters, and boundary conditions. These advancements highlight NODEs' pivotal role in continuous modeling across diverse contexts.

¹A flow map represents the evolution of a system's state over time, often denoted as u_{k+1} = F(u_k), where u_k is the state at time step t_k and u_{k+1} is the state at the next time step. It describes how the system transitions from one state to another.
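To make the flow-map idea in the footnote above concrete, the following is a minimal sketch of a one-step residual time-stepper, u_{k+1} = u_k + f_θ(u_k), in PyTorch. It only illustrates the general scheme discussed in [22, 16]; the network width, depth, and the rollout helper are illustrative assumptions, not the architectures used in those works.

import torch
import torch.nn as nn

class FlowMapStepper(nn.Module):
    """One-step residual flow map u_{k+1} = u_k + f_theta(u_k) (see footnote 1)."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        # Small MLP standing in for the ResNet-style map; sizes are illustrative.
        self.f_theta = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, u_k):
        # The network only learns the increment between consecutive snapshots.
        return u_k + self.f_theta(u_k)

    def rollout(self, u0, n_steps):
        # Advance the state n_steps into the future by repeated application.
        states = [u0]
        for _ in range(n_steps):
            states.append(self(states[-1]))
        return torch.stack(states)

Training such a stepper reduces to regressing u_{k+1} onto u_k over pairs of consecutive snapshots, so no derivative information is needed, which is the weak-form property emphasized above.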

However, while ResNet, RNN, and LSTM have shown great potential in various applications, including time-series data and language processing, they may not always be the most suitable choices for predicting the coefficients of the Galerkin projection in the context of a physics problem. Here are some remarks:

1. Data Efficiency: Physics problems often have limited data available, especially in experimental scenarios or when simulating expensive computational models. Training deep learning models like ResNet or LSTM typically requires a large amount of data, which might not be feasible in physics applications. In such cases, NODEs might be more suitable as they can achieve accurate predictions with less data.

2. Interpretability: Physics problems often require interpretable models to gain insights into underlying physical mechanisms. ResNet and LSTM can be complex and challenging to interpret due to their intricate architectures. In contrast, NODEs provide a more transparent representation of the underlying dynamics, making them more interpretable for physicists.

3. Adaptability to Irregular Time Sampling: Physics data may not always be uniformly sampled in time. NODEs can handle irregular time sampling effectively, whereas LSTM and ResNet might require additional preprocessing or adaptations.

While RNNs, LSTM, and ResNet have certain degrees of versatility and can handle varying initial conditions to some extent, when it comes to handling time-series data and dynamics, RNNs and LSTM are generally more well-suited due to their sequential processing capabilities and ability to capture temporal dependencies. NODEs still offer advantages in terms of adaptability and stability, especially when dealing with physics problems and irregular time sampling. Therefore, for time-series data in physics applications, NODEs are an appealing choice due to their interpretability, the transparent representation of dynamics and the ability to generalize to unseen initial conditions and time points.

3 Introduction to NODEs

ODEs are equations of the form

G(x, F(x), F′(x), . . . , F⁽ⁿ⁾(x)) = 0.

In the case of a first-order, continuous-time system, the above expression can be written as

G(t, y′(t)) = 0 ⟺ y′ = f(y(t), t).

There are an infinite number of solutions to the previous equation, since it is subject to the initial conditions (as many as the order of the ODE). One is thus interested in solving the following system:

y(0) = y0,
y′(t) = f(y(t), t).

It is convenient to express it in its integral form:

y(t) = y0 + ∫_{t0}^{t} y′(s) ds.

In general, the integral of y′ is difficult to perform or not possible to obtain analytically. In that case one can try to approximate it numerically. For example, in the most basic approach one can use the Euler method to go from a continuous system to a discrete one:

y(t + ∆t) = y(t) + y′(t)∆t.    (1)

This method stands as a fundamental numerical integration technique. The underlying concept guiding the Euler method is that trajectory evolution occurs through a series of incremental steps, aligning with the slope's direction in an iterative manner.

3.1 Machine learning applied to ODEs

ResNets have been adapted for solving ODEs by formulating them as a supervised learning problem. In this context, ResNets can approximate the unknown solution of an ODE by learning the underlying dynamics from data.

In ResNets for ODEs, residual blocks play a crucial role. A residual block is a building block that consists of multiple layers and uses skip connections to directly propagate information from one layer to another. These skip connections allow the network to learn residual functions, which capture the difference between the current layer's input and output. By stacking multiple residual blocks, the network can effectively learn complex nonlinear dynamics.

In the context of ODEs, a residual block takes the form

output = input + residual(input)

where the input represents the hidden state or feature representation at a specific point in the ODE solution, and the residual function captures the transformation applied to the input within the block.

The graphical depiction of the residual block is presented in Figure 1 (left). The input vector x is subject to matrix multiplication, yielding an output F(x, θ). Additionally, a bypass route allows the unmodified x to directly contribute to the output of the weighted layer. The ⊕ symbol represents value addition. Hence, the overall outcome is expressed as

y = F(x, θ) + x    (2)

representing the combined outcome of the layer output and the skip connection x. This architecture is termed a residual neural network due to the summation of these two constituents. Basically, it can be understood as a comprehensive network that accepts input x, undergoes network processing to generate output y, all governed by the parameter set θ.
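As a minimal sketch of the correspondence between Eq. (1) and Eq. (2), the code below shows a fixed-step Euler update next to a residual block whose output is y = F(x, θ) + x. The right-hand side, layer sizes and step size are illustrative assumptions only.

import numpy as np
import torch.nn as nn

def euler_step(f, y, t, dt):
    """One forward Euler update, y(t + dt) = y(t) + y'(t) dt, cf. Eq. (1)."""
    return y + f(y, t) * dt

# Example: integrate y' = -y from t = 0 to t = 5 with dt = 0.01.
y, t, dt = np.array([1.0]), 0.0, 0.01
while t < 5.0:
    y = euler_step(lambda y, t: -y, y, t, dt)
    t += dt

class ResidualBlock(nn.Module):
    """y = F(x, theta) + x, cf. Eq. (2): one block acts like one learned Euler step."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        self.F = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        return x + self.F(x)   # the skip connection carries x through unchanged

# A deep ResNet is simply a stack of such blocks.
resnet = nn.Sequential(*[ResidualBlock(dim=2) for _ in range(3)])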

ResNets can be understood as solutions to ODEs. In ResNets, the residual connection in (2) is akin to the update equation (1). We can string together multiple residual blocks in series to get a deep residual network, or ResNet (Figure 1, left). The hidden states in ResNets for ODEs refer to the intermediate representations or features learned by the network at different stages. As the ODE solution is approximated, the network learns to map the input (e.g., initial conditions, time points) to hidden states that capture the dynamics of the system. These hidden states serve as a compressed representation of the ODE solution and are crucial for accurately predicting the system's behavior at different time points.

By training the ResNet on known ODE solutions or data generated from ODEs, the network can learn to generalize and approximate the ODE dynamics. The hidden states within the residual blocks capture the essential information required for accurately predicting the ODE solution at different time points. By recognizing residual connections as discretized Euler steps, we can control network depth via discretization, adjusting solution accuracy and potentially achieving infinite-layer behavior.

However, as mentioned in the previous section, ResNets have their limitations. These are related to discretization errors, fixed architecture and computational costs. The continuous-time ODE must typically be discretized into a series of discrete time steps before being used with ResNets. With complex dynamics or long-term predictions in particular, this discretization introduces an error. They also have a fixed architecture and a set quantity of residual blocks. This fixed architecture might not be adaptable enough for different ODE systems with different levels of complexity. Finally, they might need a lot of computations and parameters, especially for systems with high dimensions or long time intervals. As a result, training and inference may become computationally expensive. NODEs are another machine-learning-oriented technique for solving ODEs which improves on ResNets in terms of accuracy, versatility and performance.

3.2 NODEs

Consider the problem of modeling a process governed by an unknown ODE, with observations available along its trajectory. The ODE can be represented as

dz/dt = f(z(t), t)

where z(t) denotes the state of the system at time t, and f(z, t) represents the unknown dynamics function. The goal is to approximate this dynamics function with an approximation f̂(z, t, θ) parameterized by θ [12]. This approach is different from ResNet, since the neural network now predicts the derivative instead of the next state in time.

3.3 Forward pass

To simplify the task, let us initially consider a scenario where only two observations are available: one at the beginning of the trajectory (z0, t0) and the other at the end (zm, tm). We can then start the system's evolution from the initial state z0 at time t0 using an initial value solver for ODEs, for a duration of ∆t = tm − t0. This results in a new pair (ẑm, tm). By comparing this predicted state with the observed state (zm, tm), we can adjust the parameters θ to minimize the discrepancy.

More formally, the optimization problem involves minimizing the following loss function L(ẑm):

L(z(tm)) = L( z(t0) + ∫_{t0}^{tm} f(z(t), t, θ) dt ) = L( ODESolve(z(t0), f, t0, tm, θ) )

where L represents the loss function that quantifies the difference between the predicted state (ẑm, tm) obtained by numerically solving the ODE and the observed state (zm, tm). The optimization procedure seeks to find the optimal parameter values θ that minimize this loss function, thereby approximating the underlying dynamics function f(z, t).

3.4 Backward pass

In a regular neural network, the process of training involves updating the parameters of the network based on the gradients of the loss function with respect to those parameters. The gradients indicate the direction and magnitude of the changes needed to minimize the loss. In the context of NODEs, we also want to update the parameters of the neural network that approximates the ODE dynamics in order to learn the underlying dynamics of the system. The parameters of the neural network capture the behavior of the ODE and affect how the system evolves over time.

To compute the gradients of the loss with respect to the neural network parameters, we need to perform backpropagation. Backpropagation calculates the gradients by propagating the error backwards through the computational graph of the neural network. This process allows us to determine how each parameter contributes to the overall loss. In the case of NODEs, the computational graph includes both the neural network and the ODE solver. The ODE solver is responsible for integrating the dynamics obtained from the neural network to compute the next state. Therefore, in order to compute the gradients of the loss with respect to the neural network parameters, we need to account for the contribution of the ODE solver. However, directly differentiating the ODE solver with respect to the parameters is challenging and computationally expensive in terms of memory efficiency.
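A compact sketch of Sections 3.2 and 3.3 in code, using the torchdiffeq library adopted later in Section 4: a small network plays the role of f̂(z, t, θ), odeint plays the role of ODESolve, and the loss compares the predicted and observed end states. Layer sizes, time points and the observed values are placeholders for illustration.

import torch
import torch.nn as nn
from torchdiffeq import odeint

class ODEFunc(nn.Module):
    """Approximation f_hat(z, t, theta) of the unknown dynamics dz/dt = f(z, t)."""

    def __init__(self, dim, hidden=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, t, z):
        # torchdiffeq passes (t, z) and expects dz/dt; the dynamics are treated
        # as autonomous here, so t is not used explicitly.
        return self.net(z)

func = ODEFunc(dim=2)
t = torch.tensor([0.0, 1.0])                 # t0 and tm
z0 = torch.tensor([[0.05, 0.05]])            # observed initial state z(t0)
zm_observed = torch.tensor([[0.30, 0.10]])   # hypothetical observed state z(tm)

z_hat = odeint(func, z0, t)                  # ODESolve(z(t0), f_hat, t0, tm, theta)
loss = torch.mean((z_hat[-1] - zm_observed) ** 2)
loss.backward()                              # gradients w.r.t. theta; see Sections 3.4-3.5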

[Figure 1 diagram: the ResNet (left) stacks residual blocks with separate parameters per block, z1 = F1(z0, θ0) + z0, z2 = F2(z1, θ1) + z1, z3 = F3(z2, θ2) + z2; the NODE (right) reuses a single parameterized derivative with an explicit time argument, z1 = F(z0, t1, θ) + z0, z2 = F(z1, t2, θ) + z1, z3 = F(z2, t3, θ) + z2.]

Figure 1: Comparison of ResNet architecture and NODE. Both figures depict different components of the neural
network model. Left panel shows a ResNet architecture with three residual blocks, where the input states z0 and
z3 represent the initial and final states, and z1 and z2 represent the hidden states. Right panel illustrates a single
NODE in the neural network.

3.5 The adjoint sensitivity method

This main technical difficulty in training was addressed for the first time in [4] by making use of the adjoint sensitivity method [21] to compute the gradients. An overview of this process is shown in Figure 2. This method facilitates the computation of gradients of a loss function, L, concerning both the initial state, z(t0), and the parameters, θ, by introducing an adjoint state, a(t), defined as the derivative of L with respect to z(t),

a(t) = ∂L/∂z(t),

and governed by the following differential equation [4]:

da(t)/dt = −a(t)ᵀ ∂fθ(z(t), t)/∂z.

This adjoint state facilitates the computation of gradients through the formulas [4]:

∂L/∂θ = −∫_{tm}^{t0} a(t)ᵀ ∂fθ(z(t), t)/∂θ dt.

These formulas are efficiently computed by solving the ODE for a(t), using the same numerical method applied to the forward ODE solution. During the forward pass, the ODE solver calculates the solution of the differential equation z(t) using the initial state z(t0) and the function fθ(z(t), t). During backpropagation, the adjoint state a(t) is computed by solving the differential equation starting from the final time tm and backpropagating through time. Subsequently, this adjoint state is employed to determine gradients of the loss function with respect to the initial state and the parameters of the ODE function, which are then utilized to update the model parameters through gradient descent optimization. This approach optimizes gradient computation, enhancing the efficiency and effectiveness of NODE-based models.

4 General Training Framework for Neural Network-Based Dynamical Systems

For training the models, we utilized torchdiffeq [4] in PyTorch [19], a powerful library for solving ordinary differential equations efficiently and accurately. The training process is structured as follows:

• Dataset. The neural network training relies on a comprehensive dataset derived from the system under study. The training set encompasses a range of observations over a specified time interval. Additionally, a distinct test set is constructed for validating model performance. This test set covers a subsequent time span and is initialized with data from the final state of the training set.

[Figure 2 diagram. ResNet loop: input, NN, output, total loss, backpropagation, parameters updated. NODE loop: input, NN (dynamics), ODE solver, output, total loss, adjoint computation, backpropagation, parameters updated.]

Figure 2: Schematic comparing the iterative process of the ResNet and NODE to update the network parameters. Since the loss is calculated directly with the output of the ResNet, backpropagation can be applied straightforwardly. In contrast, the NODE returns the derivative, from which the next step and the loss are calculated. This forces the use of the adjoint method in order to update the network parameters.

• Loss function. Our primary objective is to optimize the neural network parameters to minimize the discrepancy between NODE-generated trajectories and ground-truth trajectories obtained through traditional numerical solvers. The loss function employed is a combined loss,

L = α · MSE + (1 − α) · MAE,    (3)

comprising both a mean squared error term,

MSE = (1/N) Σ_{i=1}^{N} (z_{true,i} − z_{pred,i})²,

with N the total number of data points and z_{true,i} (z_{pred,i}) the true (predicted) value of the i-th data point, and a mean absolute error term,

MAE = (1/N) Σ_{i=1}^{N} |z_{true,i} − z_{pred,i}|.

This combined loss is expressed as a weighted sum of these two components. The weighting coefficient, denoted by α = 0.9, controls the balance between the MSE and MAE terms, allowing for a flexible adjustment of the optimization focus between accurate fitting and robustness to outliers in the training process. A sketch of this loss within a full training loop is given after this list.

• Architecture. The neural architecture comprises an initial linear layer mapping the input to a multilayer perceptron with ReLU activation functions. The final layer is a linear projection back to the system's dimensionality.

• Transfer learning. To enhance the training process and transfer relevant knowledge, parameters from pre-trained models can be employed. This process includes freezing certain layers, typically the first one, to retain foundational insights while adapting specific aspects.

• Optimizer. We configure the optimizer for the training process. Utilizing the Adam optimizer with a specified initial learning rate of 10⁻² and a weight decay of 10⁻⁴, we set the stage for iterative parameter updates that drive the model's convergence toward an improved representation of plasma-turbulence dynamics.

• Learning rate. We incorporate a learning rate scheduler that dynamically adjusts the learning rate during training (dividing it by half). This mechanism responds to the model's performance and ensures that the optimization process converges effectively. Specifically, we employ the ReduceLROnPlateau scheduler, which monitors the validation loss and adapts the learning rate accordingly.

• Training Process. The training procedure involves the stochastic selection of n samples, subsequently organized into m batches of size p. Initially, a random sampling approach is employed, accompanied by a scheduled learning rate decay on a plateau until convergence to a local minimum of the loss function. Upon achieving convergence, the training regimen resets using the weights of the best model. Subsequent sampling prioritizes regions with a 70% focus on less accurate areas and 30% on the best-performing regions. This phase is characterized by a scheduled learning rate decay on a plateau until convergence is once again attained. The cyclic repetition of these phases includes a hybridization step, incorporating crossover operations between the best model in each cycle and the absolute best model.
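The sketch below, referenced in the Loss function item above, condenses the ingredients of this framework: the combined loss of Eq. (3), the Adam optimizer with the stated learning rate and weight decay, the ReduceLROnPlateau scheduler, and the adjoint-based solver from torchdiffeq. The data tensors, batch handling and epoch count are placeholders, and the resampling and hybridization cycle of the Training Process item is omitted for brevity.

import torch
from torchdiffeq import odeint_adjoint

def combined_loss(z_pred, z_true, alpha=0.9):
    """Weighted MSE/MAE loss of Eq. (3)."""
    mse = torch.mean((z_pred - z_true) ** 2)
    mae = torch.mean(torch.abs(z_pred - z_true))
    return alpha * mse + (1 - alpha) * mae

func = ODEFunc(dim=2, hidden=50)             # dynamics network from the earlier sketch
optimizer = torch.optim.Adam(func.parameters(), lr=1e-2, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5)

# Placeholder training trajectory; in practice these come from the system under study.
t_train = torch.linspace(0.0, 28.0, 800)
z_train = torch.randn(800, 1, 2)

for epoch in range(300):                     # epoch count is illustrative
    optimizer.zero_grad()
    z_pred = odeint_adjoint(func, z_train[0], t_train)   # gradients via the adjoint method
    loss = combined_loss(z_pred, z_train)
    loss.backward()
    optimizer.step()
    # The framework monitors the validation loss; the training loss is used here for brevity.
    scheduler.step(loss.detach())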

5 Application of NODEs in Complex Dynamical Systems

Expanding on the case study, we examine how increasing the number of initial conditions affects the predictive power of NODEs. We show empirically that this approach improves trajectory predictions and enables NODEs to function as efficient interpolators, capturing subtle features in the dynamical system.

5.1 Modeling and Analysis of Lotka-Volterra Dynamics using NODEs

In this study, we endeavor to capture the intricate dynamics of the Lotka-Volterra model using NODEs. Our approach involves encapsulating the Lotka-Volterra equations within a neural network architecture. Specifically, we design a neural network that simulates predator-prey interactions. It isolates and models individual facets of the system's behavior, building an integrated and comprehensive portrayal of the intricate interactions between fluctuation energy and the sheared E × B flow.

Our training set spans a time interval from 0 to 28, representing 80% of the data points. Additionally, we construct a test set to validate the performance of our model. The test set spans a time interval from 28 to 35, representing 20% of the data points (all variables in this section are dimensionless). We initialize the test set using the final state of the training set, ensuring a seamless transition between the two. Here, we distinguish two models: the first, trained using 800 data points and only one initial condition (NODE1); and the second, trained using 1600 data points and multiple initial conditions (NODEm). The architecture utilized is a single hidden layer of 50 neurons. The training process encompassed around 300 and 600 epochs², respectively, each comprising multiple data batches.

The results of the training are illustrated in Figure 3. The NODE effectively captures the system's temporal evolution for both models, depicting the expected trend of initial oscillations that gradually dampen until converging to a stable equilibrium. The model's performance is substantiated by a test set loss of less than 10⁻² for both models. The flow diagrams extracted from both models are presented and juxtaposed against the actual flow in Figure 4. Notably, both models adeptly capture the overarching trend of the dynamics, exhibiting trajectories that converge towards the same stable equilibrium point. It is noteworthy that the model initialized with a solitary initial condition (NODE1) fails to discern the second saddle point along the E axis at E = 0.8, a distinction achieved by the second model (NODEm). Nevertheless, it is observed that, in both cases, certain flow lines intersect the axes, an occurrence that contradicts the inherent nature of the system, where points satisfying E = 0 signify stationary states with constant U values (as dictated by the underlying differential equations), and vice versa. Such discrepancies, however, are anticipated, given the inherent smoothness assumption intrinsic to the NODE framework. This effect can be attributed to the significant dynamics shift occurring in close proximity to the axes.

5.2 Dynamics Analysis and Limit Cycle Investigation of the Rössler System using NODEs

The Rössler system is a classic example of a three-dimensional autonomous chaotic dynamical system. It is described by the following set of ordinary differential equations:

ẋ = −y − z
ẏ = x + ay
ż = b + z(x − c)

where x, y, and z are the state variables, and a, b and c are system parameters.

²In general, a more diverse training data set will need more epochs to converge.


Figure 3: Left panel: prediction of the evolution of the system starting from E, U = 0.05. Center panel:
prediction of the trajectory in the phase space. Right panel: evolution of the loss function over the epochs for one
initial condition at E, U = 0.05 and multiple initial conditions.

Figure 4: True plot of the dynamical system portrait (left), prediction with an initial condition at E, U = 0.05 (center) and prediction with multiple initial conditions (right).

The Rössler system exhibits intricate behavior characterized by chaotic attractors, which are non-periodic trajectories that are sensitive to initial conditions. The Rössler system's attractor features a distinctive shape with folding and stretching in its trajectory, creating a continuous evolution through phase space. Its chaotic attractor provides a challenging scenario to evaluate the NODE's ability to capture and predict intricate trajectories.

The examined system is characterized by parameters a = 0.2, b = 0.2, and c = 1.7. The training configuration parallels that of the predator-prey model. The training dataset encompasses a temporal range from 0 to 40, comprising a total of 1600 data instances. Subsequently, the test dataset spans from 40 to 48, encompassing 400 data points. The results of the training are illustrated in Figure 5. The NODE effectively captures the system's temporal evolution, depicting the expected trend of initial oscillations that gradually grow in amplitude, leading to a more stable periodic trajectory. The model's performance is substantiated by a test set loss of less than 10⁻².

The phase diagram depicted in Figure 5 (center) illustrates a trajectory that exhibits characteristics indicative of a potential limit cycle. In order to rigorously establish the presence of this limit cycle, we employ the Poincaré section method [7]. We meticulously track the points where the trajectory intersects the plane defined by x = 0 and y > 0. This geometric plane is illustrated in the left panel of Figure 6.

The intricate intersections of the trajectory with the Poincaré plane showcase an oscillatory pattern, oscillating back and forth within a clustered region. This observed behavior holds true for both the true system and the NODE, providing compelling evidence for the plausible existence of a limit cycle within this specific parameter configuration.

Remarkably, the NODE replicates the intersections with the plane quite accurately, even when evaluated over a 10,000 unit time frame. This accuracy is notable considering that the NODE was trained on data ranging only from 0 to 40.

In fact, the NODE model performs exceptionally well as an interpolator, interpreting system behavior in the vicinity of observed trajectories.
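For reference, the sketch below shows one way to generate the reference Rössler trajectory with the parameters quoted above and to collect its crossings of the Poincaré plane x = 0, y > 0. The integrator settings and the sign-change detection with linear interpolation are illustrative choices, not necessarily those used to produce Figure 6.

import numpy as np
from scipy.integrate import solve_ivp

A, B, C = 0.2, 0.2, 1.7                      # parameters of Section 5.2

def rossler(t, state):
    x, y, z = state
    return [-y - z, x + A * y, B + z * (x - C)]

t_span = (0.0, 10_000.0)
t_eval = np.linspace(*t_span, 1_000_000)
sol = solve_ivp(rossler, t_span, [0.2, 0.2, 0.2], t_eval=t_eval, rtol=1e-9, atol=1e-9)
x, y, z = sol.y

# Poincaré section: record (y, z) each time the trajectory crosses x = 0 with y > 0.
crossings = []
for i in range(len(t_eval) - 1):
    if x[i] < 0.0 <= x[i + 1] and y[i] > 0.0:
        w = -x[i] / (x[i + 1] - x[i])        # linear interpolation onto the plane
        crossings.append((y[i] + w * (y[i + 1] - y[i]),
                          z[i] + w * (z[i + 1] - z[i])))
crossings = np.array(crossings)              # intersection points, cf. Figure 6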


Figure 5: Left panel: prediction of the evolution of the Rössler system starting from x, y, z = 0.2. Center panel:
prediction of the trajectory in the phase space. Right panel: evolution of the loss function over the epochs.


Figure 6: Left Panel: True trajectory over a time span of 10000 units, exhibiting intersections with the Poincaré
section. Right Panel: Detailed view of individual intersections (true in blue and NODE in orange) between the
trajectory and the Poincaré section.

But when it comes to extrapolating outside of the studied trajectories, it becomes less effective, as Figure 7 (the validation set is shown in the right column) illustrates. In particular, the training set's initial conditions, which follow the format x = y = z, are x0 = 0.2 and x1 = 3, while the validation set's initial conditions are x = 1.5 and x = 4. Interestingly, there are noticeable differences in the prediction accuracy for x0 < x < x1 and x > x1.

Even with its poor ability to extrapolate, the NODE shows a strong grasp of the dynamics in the interval x0 < x < x1. This claim is strengthened by means of an extensive analysis involving several initial conditions. Specifically, Figure 8 shows the loss curves for trajectories starting at x = y = z that were trained with one initial condition (x0 = 1), two initial conditions (x0 = 0.2 and x1 = 3), and three initial conditions (x0 = 0.2, x1 = 3, and x2 = 5). It is evident that a higher number of initial conditions causes the loss curve to flatten, highlighting the increasing improvement in the performance and robustness of the model within the range [xmin, xmax].

6 Conclusion

NODEs can be used to describe and forecast complex physics in dynamical systems. In this work we measure their capacity to interpolate trajectories and to enhance predictions through a greater number of initial conditions.

Figure 7: Top left panel: train (green) / test (yellow) loss over epochs. Top center panel: learning rate schedule over epochs. Top right panel: memory usage over epochs. Bottom left panel: prediction of the trajectory (heavy solid) against its true value (light solid) for the train set (x0 = 0.2 and x1 = 3). Bottom right panel: prediction of the trajectory (heavy solid) against its true value (light solid) for the validation set (x = 1.0 and x1 = 4).

Figure 8: The curves show the points calculated by measuring the trajectory error from the points x = y = z for
models trained with only one (blue), two (orange) and three (green) initial conditions..

References

[1] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.

[2] Chandrajit Bajaj, Luke McLennan, Timothy Andeen, and Avik Roy. Robust learning of physics informed neural networks. arXiv preprint arXiv:2110.13330, 2021.

[3] Steven L. Brunton and J. Nathan Kutz. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, 2019.

[4] Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. Neural ordinary differential equations. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.

[5] Zhao Chen, Yang Liu, and Hao Sun. Physics-informed learning of governing equations from scarce data. Nature Communications, 12(1):6136, 2021.

[6] Emilien Dupont, Arnaud Doucet, and Yee Whye Teh. Augmented neural ODEs. Advances in Neural Information Processing Systems, 32, 2019.

[7] James Gleick and M. Berry. Chaos: Making a New Science. Nature, 330:293, 1987.

[8] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[9] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[10] M. I. Jordan. Serial order: A parallel distributed processing approach. Technical report, June 1985–March 1986. California Univ., San Diego, La Jolla (USA), Inst. for Cognitive Science, 1986.

[11] S. Kawaguchi, K. Takahashi, H. Ohkama, and K. Satoh. Deep learning for solving the Boltzmann equation of electrons in weakly ionized plasma. Plasma Sources Science and Technology, 29(2):025021, 2020.

[12] Patrick Kidger. On neural differential equations. arXiv preprint arXiv:2202.02435, 2022.

[13] Wouter M. Kouw and Marco Loog. An introduction to domain adaptation and transfer learning. arXiv preprint arXiv:1812.11806, 2018.

[14] Aditi Krishnapriyan, Amir Gholami, Shandian Zhe, Robert Kirby, and Michael W. Mahoney. Characterizing possible failure modes in physics-informed neural networks. Advances in Neural Information Processing Systems, 34:26548–26560, 2021.

[15] Ori Linial, Neta Ravid, Danny Eytan, and Uri Shalit. Generative ODE modeling with known unknowns. In Proceedings of the Conference on Health, Inference, and Learning, pages 79–94, 2021.

[16] Yuying Liu, J. Nathan Kutz, and Steven L. Brunton. Hierarchical deep learning of multiscale differential equation time-steppers. Philosophical Transactions of the Royal Society A, 380(2229):20210200, 2022.

[17] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2009.

[18] Eric J. Parish and Kevin T. Carlberg. Time-series machine-learning error models for approximate solutions to parameterized dynamical systems. Computer Methods in Applied Mechanics and Engineering, 365:112990, 2020.

[19] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.

[20] Michael Poli, Stefano Massaroli, Junyoung Park, Atsushi Yamashita, Hajime Asama, and Jinkyoo Park. Graph neural ordinary differential equations. arXiv preprint arXiv:1911.07532, 2019.

[21] L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko. Mathematical Theory of Optimal Processes. John Wiley & Sons, Nashville, TN, 1962.

[22] Tong Qin, Kailiang Wu, and Dongbin Xiu. Data driven governing equations approximation using deep neural networks. Journal of Computational Physics, 395:620–635, 2019.

[23] Christopher Rackauckas, Yingbo Ma, Julius Martensen, Collin Warner, Kirill Zubov, Rohit Supekar, Dominic Skinner, Ali Ramadhan, and Alan Edelman. Universal differential equations for scientific machine learning. arXiv preprint arXiv:2001.04385, 2020.

[24] Maziar Raissi and George Em Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, 357:125–141, 2018.

[25] Maziar Raissi, Paris Perdikaris, and George E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.

[26] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics informed deep learning (Part I): Data-driven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561, 2017.

[27] Francesco Regazzoni, Luca Dede, and Alfio Quarteroni. Machine learning for fast and reliable solution of time-dependent differential equations. Journal of Computational Physics, 397:108852, 2019.

[28] David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams, et al. Learning internal representations by error propagation, 1985.

[29] Rui Wang, Karthik Kashinath, Mustafa Mustafa, Adrian Albert, and Rose Yu. Towards physics-informed deep learning for turbulent flow prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1457–1466, 2020.

[30] Rui Wang and Rose Yu. Physics-guided deep learning for dynamical systems: A survey. arXiv preprint arXiv:2107.01272, 2021.

[31] S. Wiggins. Introduction to Applied Nonlinear Dynamical Systems and Chaos. Texts in Applied Mathematics. Springer New York, NY, 2nd edition, 2003.

[32] Liu Yang, Xuhui Meng, and George Em Karniadakis. B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data. Journal of Computational Physics, 425:109913, 2021.

[33] Yinhao Zhu, Nicholas Zabaras, Phaedon-Stelios Koutsourelakis, and Paris Perdikaris. Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. Journal of Computational Physics, 394:56–81, 2019.
