Trajectory Flow Matching
1 McGill University, 2 Mila - Quebec AI Institute, 3 Yale School of Medicine, 4 School of Clinical Medicine, University of Cambridge, 5 Université de Montréal, 6 CIFAR Fellow
Abstract
1 Introduction
Real-world problems often involve systems that evolve continuously over time, yet these systems are usually noisy and irregularly sampled. In addition, real-world time series often depend on other covariates, leading to complex patterns such as intersecting trajectories. For instance, in the context of clinical trajectories in healthcare, patients' vital sign evolution can follow drastically different, crossing paths even if the initial measurements are similar, due to the influence of covariates such as medication interventions and underlying health conditions. These covariates can be time-varying or static, and are often sparse.
Differential equation-based dynamical models are proficient at learning continuous variables without imputation [Chen et al., 2018, Rubanova et al., 2019, Kidger et al., 2021b]. Nevertheless, systems governed by ordinary differential equations (ODEs) or stochastic differential equations (SDEs) are unable to accommodate intersecting trajectories, and thus require modifications such as augmentation or modelling higher-order derivatives [Dupont et al., 2019]. While ODEs model deterministic systems, SDEs contain a diffusion term and can better represent the inherent uncertainty and fluctuations present in many real-world systems. However, fitting stochastic equations to real-life data is challenging because it has thus far required time-consuming backpropagation through an SDE integration.

∗ Joint first authorship
† Joint senior authorship. Correspondence to [email protected]
Code available at: https://github.com/nZhangx/TrajectoryFlowMatching
In the domain of generative models, diffusion models [Ho et al., 2020, Nichol and Dhariwal, 2021, Song et al., 2021] and more recently flow matching models [Lipman et al., 2023, Albergo et al., 2023, Li et al., 2020] have had enormous success by training dynamical models in a simulation-free framework. The simulation-free framework facilitates the training of much larger models with significantly improved speed and stability. In this work we generalize simulation-free training for fitting stochastic differential equations to time-series data, learning population trajectories while preserving individual characteristics via conditioning. We present this method as Trajectory Flow Matching. We demonstrate that our method outperforms current state-of-the-art time series modelling architectures, including RNN-based, ODE-based, and flow matching methods. We empirically demonstrate the utility of our method in clinical applications where hemodynamic trajectories are critical for ongoing dynamic monitoring and care. We apply our method to the following longitudinal electronic health record datasets from multiple clinical settings: medical intensive care unit (MICU) data of patients with sepsis, Emergency Department (ED) data of patients with acute gastrointestinal bleeding, and MICU data of patients with acute gastrointestinal bleeding.
Our main contributions are:
• We prove the conditions under which continuous-time dynamics can be trained simulation-free using matching techniques.
• We extend the approach to irregularly sampled trajectories with a time-predictive loss and to uncertainty estimation with an uncertainty prediction loss.
• We empirically demonstrate that our approach reduces error by 15-83% when applied to real-world clinical data modelling.
2 Preliminaries
2.1 Notation
We consider the setting of a distribution of trajectories over $\mathbb{R}^d$ denoted $\mathcal{X} := \{x^1, x^2, \ldots, x^n\}$, where each $x^i$ is a trajectory of length $T$, i.e. $x^i := \{x^i_1, x^i_2, \ldots, x^i_T\}$, with associated times $t^i := \{t^i_1, t^i_2, \ldots, t^i_T\}$. Let $x^i_{[t-h, t-1]}$ denote a vector of the last $h$ observed time points. We denote a (Lipschitz smooth) time-dependent vector field conditioned on arbitrary conditions $c \in \mathbb{R}^e$ as $v(t, x_t, x_{[t-h,t-1]}, c) \approx \frac{dx}{dt} : ([0, 1], \mathbb{R}^d, \mathbb{R}^{h \times d}, \mathbb{R}^e) \to \mathbb{R}^d$, with flow $\phi_t(v)$, which induces the time-dependent density $p_t = \phi_t(v)_\#(p_0)$ for any density $p_0 : \mathbb{R}^d \to \mathbb{R}_+$ with $\int_{\mathbb{R}^d} p_0 = 1$. We also consider the coupling $\pi(x_0, x_1)$, which operates on the product space of the marginal distributions $p_0, p_1$.
2.2 Neural Stochastic Differential Equations
A stochastic differential equation (SDE) can be expressed in terms of a smooth drift $f : [0, T] \times \mathbb{R}^d \to \mathbb{R}^d$ and diffusion $g : [0, T] \times \mathbb{R}^d \to \mathbb{R}^{d \times d}$ in the Itô sense as:
$$dx_t = f\,dt + g\,dW_t$$
where $W_t : [0, T] \to \mathbb{R}^d$ is the $d$-dimensional Wiener process. A density $p_0(x_0)$ evolved according to an SDE induces a collection of marginal distributions $p_t(x_t)$, viewed as a function $p : [0, T] \times \mathbb{R}^d \to \mathbb{R}_+$. In a Neural SDE [Li et al., 2020, Kidger et al., 2021a,b] the drift and diffusion terms are parameterized with neural networks $f_\theta(t, x_t)$ and $g_\theta(t, x_t)$:
$$dx_t = f_\theta(t, x_t)\,dt + g_\theta(t, x_t)\,dW_t \tag{1}$$
where the goal is to select $\theta$ to enforce $x_T \sim \mathcal{X}_{\text{true}}$ for some distributional notion of similarity such as the Wasserstein distance [Kidger et al., 2021b] or Kullback-Leibler divergence [Li et al., 2020]. However, these objectives are simulation-based, requiring backpropagation through an SDE solver, which suffers from severe speed and stability issues. While some issues such as memory and numerical truncation can be ameliorated using the adjoint state method and advanced numerical solvers [Kidger et al., 2021b], optimization of Neural SDEs remains a significant issue.
We note that in the special case of zero diffusion (i.e. $g_\theta(t, x_t) = 0$) this reduces to a neural ordinary differential equation (Neural ODE) [Chen et al., 2018], which is easier to optimize than an SDE but still presents challenges to scalability.
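To make the parameterization concrete, a minimal PyTorch sketch of such drift and diffusion networks might look as follows (the architecture, a diagonal diffusion, and the `NeuralSDE` name are our own illustrative choices, not those of the cited works):

```python
import torch
import torch.nn as nn

class NeuralSDE(nn.Module):
    """Drift f_theta(t, x) and diagonal diffusion g_theta(t, x), as in eq. (1)."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        # Both networks take (t, x_t) concatenated as input.
        self.drift = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(), nn.Linear(hidden, dim)
        )
        self.diffusion = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, dim), nn.Softplus(),  # keep g >= 0
        )

    def f(self, t: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        tx = torch.cat([t.expand(x.shape[0], 1), x], dim=-1)
        return self.drift(tx)

    def g(self, t: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        tx = torch.cat([t.expand(x.shape[0], 1), x], dim=-1)
        return self.diffusion(tx)
```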
2.3 Matching algorithms
Matching algorithms are a simulation-free class of training algorithms which are able to bypass
backpropagation through the solver during training by constructing the marginal distribution as a
mixture of tractable conditional probability paths.
The marginal density $p_t$ induced by eq. 1 evolves according to the Fokker-Planck equation (FPE):
$$\partial_t p_t = -\nabla \cdot (p_t f_t) + \frac{g^2}{2} \Delta p_t \tag{2}$$
where ∆pt = ∇ · (∇pt ) denotes the Laplacian of pt and gradients are taken with respect to xt .
Matching algorithms first construct a factorization of pt into conditional densities pt (xt |z) such that
pt = Eq(z) [pt (xt |z)] and where pt (xt |z) is generated by an SDE dxt = vt (xt |z)dt + σt (xt |z)dWt .
Given this construction it can be shown that the minimizer of
$$\mathcal{L}_{\text{match}}(\theta) := \mathbb{E}_{t, q(z), p_t(x|z)} \left[ \|f_\theta(t, x_t) - v_t(x_t|z)\|^2 + \lambda_t^2 \|g_\theta(t, x_t) - \sigma_t(x_t|z)\|^2 \right] \tag{3}$$
satisfies the FPE of the marginal $p_t$. This is especially useful in the generative modeling setting where $q_0$ is samplable noise (e.g. $\mathcal{N}(0, 1)$) and $q_1$ is the data distribution. Then we can define $z := (x_0, x_1)$ as a tuple of noise and data with $q(z) := q_0(x_0) \otimes q_1(x_1)$. This makes eq. 3 optimize a model which will draw new samples according to the data distribution $q_1(x_1)$ using
$$x_0 \sim q_0; \qquad x_1 = \int_0^1 f_\theta(t, x_t)\,dt + g_\theta(t, x_t)\,dW_t \tag{4}$$
with the integration computed numerically using any off-the-shelf SDE solver. While this is guaran-
teed to preserve the distribution over time, it is not guaranteed to preserve the coupling of q0 and q1
(if given).
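Concretely, eq. 4 can be integrated with a simple Euler-Maruyama scheme; a hedged sketch, reusing the `NeuralSDE` module sketched above (any off-the-shelf SDE solver would also work):

```python
import torch

@torch.no_grad()
def sample(model: NeuralSDE, x0: torch.Tensor, n_steps: int = 100) -> torch.Tensor:
    """Integrate dx = f dt + g dW from t=0 to t=1 by Euler-Maruyama."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((1,), i * dt)
        dw = torch.randn_like(x) * dt ** 0.5  # Wiener increment
        x = x + model.f(t, x) * dt + model.g(t, x) * dw
    return x
```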
Paired bridge matching In generative modeling, random pairings [Liu et al., 2023c, Albergo and Vanden-Eijnden, 2023, Albergo et al., 2023] or optimal transport pairings [Tong et al., 2024, Pooladian et al., 2023] are constructed for the conditional distribution $q(z)$. However, in some problems we would like to match given pairs of points, as in image-to-image translation [Isola et al., 2017, Liu et al., 2023a, Somnath et al., 2023], where training data comes as pairs $(x_0, x_1)$. In this case we set $q(z) := q(x_0, x_1)$ to be samples from these known pairs and optimize eq. 3. While empirically these models perform well, there are no guarantees that the coupling will be preserved outside of the special case when data comes from the (entropic) optimal transport coupling $\pi^*_\varepsilon(q_0, q_1)$, defined as:
$$\pi^*_\varepsilon(q_0, q_1) = \arg\min_{\pi \in U(q_0, q_1)} \int d(x_0, x_1)^2\, d\pi(x_0, x_1) + \varepsilon\, \mathrm{KL}(\pi \| q_0 \otimes q_1), \tag{5}$$
where $U(q_0, q_1)$ is the set of admissible transport plans (i.e. joint distributions over $x_0$ and $x_1$ whose marginals are equal to $q_0$ and $q_1$), as shown in [Shi et al., 2023], for some regularization parameter $\varepsilon \in \mathbb{R}_{\geq 0}$.
Algorithm 1 General Trajectory Flow Matching
Input: Trajectories $\mathcal{X}$, noise $\sigma$, initial networks $v_\theta$, $\sigma_\theta$.
while training do
    $z \sim \mathcal{U}(\mathcal{X})$, $k \sim \mathcal{U}\{1, \ldots, T-1\}$, $t \sim \mathcal{U}(0, 1)$
    $\mu_t \leftarrow t x_{k+1} + (1 - t) x_k$
    $x_t \sim \mathcal{N}(\mu_t, \sigma^2 t(1 - t) I)$
    $\mathcal{L}_{\text{TFM}}(\theta) \leftarrow \left\| v_\theta(k + t, x_t) - \frac{x_{k+1} - x_t}{1 - t} \right\|^2$
    $\mathcal{L}_{\sigma_t}(\theta) \leftarrow \| \sigma_\theta(k + t, x_t) - \mathcal{L}_{\text{TFM}} \|^2$
    $\theta \leftarrow \text{Update}(\theta, \nabla_\theta \mathcal{L}_{\text{TFM}}(\theta), \nabla_\theta \mathcal{L}_{\sigma_t}(\theta))$
return $v_\theta$, $\sigma_\theta$
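A minimal PyTorch sketch of one step of Algorithm 1 (our own rendering; `v_net`, `sigma_net`, and the dense batch layout `X` of shape `(B, T, d)` are illustrative assumptions):

```python
import torch

def tfm_step(v_net, sigma_net, opt, X, sigma=0.1):
    """One step of Algorithm 1 on a dense batch of trajectories X: (B, T, d)."""
    B, T, d = X.shape
    k = torch.randint(0, T - 1, (B,))                 # segment index (0-based)
    t = torch.rand(B, 1)                              # position within segment
    x_k, x_k1 = X[torch.arange(B), k], X[torch.arange(B), k + 1]
    mu_t = t * x_k1 + (1 - t) * x_k                   # bridge mean
    x_t = mu_t + sigma * (t * (1 - t)).sqrt() * torch.randn_like(mu_t)
    target = (x_k1 - x_t) / (1 - t)                   # conditional flow
    s = k.unsqueeze(1).float() + t                    # global time k + t
    loss_tfm = ((v_net(s, x_t) - target) ** 2).sum(-1)
    # Regress sigma_theta onto the flow-matching loss; detach so the
    # uncertainty loss does not backpropagate into v_net (a sketch choice).
    loss_sigma = ((sigma_net(s, x_t) - loss_tfm.detach().unsqueeze(1)) ** 2).sum(-1)
    opt.zero_grad()
    (loss_tfm.mean() + loss_sigma.mean()).backward()
    opt.step()
    return loss_tfm.mean().item(), loss_sigma.mean().item()
```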
where $\Pi(u)^\star$ represents the coupling of a model which attains minimal loss according to eq. 3 and $\Pi^\star(x_{1:T})$ is the coupling of the data distribution. Intuitively, as long as no two paths cross given conditionals $c$, the coupling is preserved. In prior work $c = \emptyset$, and the coupling is only preserved in special cases such as eq. 5.
We next enumerate three assumptions under which the coupling is guaranteed to be preserved at the optimum:
(A1) $c = x_0$ and there exists $T : \mathcal{X} \to \mathcal{X}$ such that $T(x_0) = x_1$ if and only if $(x_0, x_1) \in \operatorname{supp}(\Pi^\star)$. We note that this is equivalent to asserting the existence of a Monge map $T^\star$ for the coupling $\Pi^\star$.
(A2) There exist no two trajectories $x^i, x^j$ such that $x^i_t = x^j_t$ for $h$ consecutive observations, and $g = 0$.
(A3) Trajectories are associated with unique conditional vectors $c$ independent of $t$.
Even in cases when (A1)-(A3) do not hold exactly, TFM can often still learn useful models of the data. In some sense, uniqueness up to some history length is enough, as it shows TFM is as powerful as discrete-time autoregressive models. Proofs and further examples are available in §A.1.
3.2 Target prediction reparameterization
While flow matching generally predicts the flow, there is a target-predicting equivalent: given $v_\theta(t, x) := \frac{\hat{x}^{\lceil t \rceil}_\theta(t, x) - x_t}{\lceil t \rceil - t}$ and $u_t(x|z) := \frac{x^{\lceil t \rceil} - x_t}{\lceil t \rceil - t}$, which is equivalent to $x_1 - x_0$ when $x_t = t x_1 + (1 - t) x_0$, the target-predicting loss is equivalent to a time-weighted flow-matching loss. Specifically, let the target-predicting loss be
$$\mathcal{L}_{\text{target}}(\theta) = \mathbb{E}_{t, q(z), p_t(x|z)} \|\hat{x}^{\lceil t \rceil}_\theta(t, x) - x_{\lceil t \rceil}\|^2 \tag{10}$$
Then it is easy to show that:
Proposition 3.3. There exists a scaling function $c(t) : \mathbb{R}_+ \to \mathbb{R}$ such that $\mathcal{L}_{\text{target}}(\theta) = c(t) \mathcal{L}_{\text{match}}(\theta)$.
3.3 Irregularly sampled trajectories
We next consider irregularly sampled time series of the form $x^i := \{(x^i_1, t^i_1), (x^i_2, t^i_2), \ldots, (x^i_T, t^i_T)\}$ with $t^i_1 < t^i_2 < \cdots < t^i_T$, with $t_{\text{next}}$ denoting the next timepoint observed after time $t$. In this case, when combined with the target-predicting reparameterization in §3.2, we can predict the time until the next observation. We therefore parameterize an auxiliary model $h_\theta(t, x_t) : [0, T] \times \mathbb{R}^d \to [0, T]$ which predicts the next observation time. This is useful numerically but also, perhaps more importantly, in a clinical setting, where the spacing between measurements can be as informative as the measurements themselves [Allam et al., 2021]. $h_\theta$ is trained with the time-predictive loss:
$$\mathcal{L}_{\text{tp}}(\theta) = \sum_{t \in t^i} \|h_\theta(t, x_t) - (t_{\text{next}} - t)\|_2^2 \tag{11}$$
where $t_{\text{next}}$ is the time of the next measurement. This can be used in conjunction with the $x_{\text{next}}$ predictor to calculate the flow at time $t$ as
$$v_\theta(t, x_t) := \frac{\hat{x}^1_\theta(t, x_t) - x_t}{h_\theta(t, x_t) - t} \tag{12}$$
which can be used for inference on new trajectories.
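A hedged sketch of how eqs. 10-12 fit together at one observed transition (the names `x_hat_net` and `h_net` are ours):

```python
import torch

def irregular_losses(x_hat_net, h_net, t, x_t, t_next, x_next):
    """Target-prediction loss (eq. 10) and time-predictive loss (eq. 11)."""
    loss_tp = ((h_net(t, x_t) - (t_next - t)) ** 2).sum(-1)    # eq. 11
    loss_target = ((x_hat_net(t, x_t) - x_next) ** 2).sum(-1)  # eq. 10
    return loss_target, loss_tp

def flow_at(x_hat_net, h_net, t, x_t):
    """Implied vector field for inference on new trajectories (eq. 12)."""
    return (x_hat_net(t, x_t) - x_t) / (h_net(t, x_t) - t)
```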
3.4 Uncertainty prediction
Finally, we consider uncertainty prediction. Until now we have defined conditional probability paths using a fixed noise parameter $\sigma$. However, this does not have to be fixed. Instead, we consider a learned $\sigma_\theta(t, x_t)$, trained iteratively with the loss:
$$\mathcal{L}_{\text{uncertainty}}(\theta, x) = \sum_{t \in t^i} \left( \sigma_\theta(t, x_t) - \|\hat{x}_\theta(t, x_t) - x_{\text{next}}\|_2^2 \right)^2 \tag{13}$$
which learns to predict the error in the estimate of $x_t$. This loss can be interpreted as training an epistemic uncertainty predictor, similar to that proposed in direct epistemic uncertainty prediction (DEUP) [Lahlou et al., 2023].
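A corresponding sketch of eq. 13 (again with our own naming; the `detach` keeps the uncertainty head from influencing the state predictor, an assumption of this sketch):

```python
import torch

def uncertainty_loss(sigma_net, x_hat_net, t, x_t, x_next):
    """Regress sigma_theta onto the squared prediction error (eq. 13)."""
    # Detach so this head does not alter the state predictor (sketch choice).
    err2 = ((x_hat_net(t, x_t).detach() - x_next) ** 2).sum(-1, keepdim=True)
    return ((sigma_net(t, x_t) - err2) ** 2).sum(-1)
```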
Figure 2: 1D harmonic oscillator overfitting experiment results. Left: TFM-ODE (ours) with memory
= 3. Middle: TFM-ODE (ours) without memory. Right: Aligned FM [Liu et al., 2023a, Somnath
et al., 2023].
4 Experimental Results
In this section we empirically evaluate the performance of the trajectory flow matching objective
in terms of time series modeling error, but also uncertainty quantification. We also evaluate a
variety of simulation-based and simulation-free methods including both stochastic and deterministic
methods. Stochastic methods are in general more difficult to fit, but can be used to better model
uncertainty and variance. Further experimental details can be found in §B. Experiments were run on
a computing cluster with a heterogenous cluster of NVIDIA RTX8000, V100, A40, and A100 GPUs
for approximately 24,000 GPU hours. Individual training runs require approximately one gpu day.
Baselines In addition to different ablations of trajectory flow matching, we also evaluate NeuralODE [Chen et al., 2018], NeuralSDE [Li et al., 2020, Kidger et al., 2021b, Kidger, 2022], Latent NeuralODE [Rubanova et al., 2019], and an aligned flow matching method (Aligned FM) [Liu et al., 2023a, Somnath et al., 2023] where the couplings are sampled according to the ground truth coupling during training.
Metrics We primarily make use of two metrics. The average mean squared error (Mean MSE) over held-out time series measures the time series modeling error:
$$\text{MSE}(\hat{x}, x) = \frac{1}{T - 1} \sum_{t \in [2, T]} \|\hat{x}_t - x_t\|_2^2, \tag{14}$$
where $\hat{x}$ and $x$ are the predicted and true trajectories respectively. We also use the maximum mean discrepancy with a radial basis function kernel (RBF MMD), which measures how well the distribution over the next observation is modelled by comparing the predicted distribution to the distribution over next states in the ground truth trajectory. Specifically, we compute:
$$\text{RBF-MMD}(\theta, \hat{x}, x) := \frac{1}{T - 1} \sum_{t \in [2, T]} \text{MMD}(\hat{\Delta}_t, \Delta_t) \tag{15}$$
where $\hat{\Delta}_t = \hat{x}_t - x_{t-1}$, $\Delta_t = x_t - x_{t-1}$, and $\hat{x}_t := \int_{s=t-1}^{t} f_\theta(s, x_s)\,ds + g_\theta(s, x_s)\,dW_s$ is a set of samples from the model prediction at time $t$.
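Hedged sketches of both metrics (the RBF bandwidth `gamma` and the simple biased MMD estimator are our own choices):

```python
import torch

def mean_mse(x_hat, x):
    """Eq. 14: average squared error over time steps 2..T of one trajectory."""
    return ((x_hat[1:] - x[1:]) ** 2).sum(-1).mean()

def rbf_mmd2(a, b, gamma=1.0):
    """Biased MMD^2 estimate between sample sets a, b with an RBF kernel."""
    def k(u, v):
        d2 = ((u[:, None, :] - v[None, :, :]) ** 2).sum(-1)
        return torch.exp(-gamma * d2)
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()
```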
4.1 Exploring coupling preservation with 1D harmonic oscillators
We begin by evaluating how trajectory flow matching performs in a simple one dimensional setting of
harmonic oscillators. We show that vanilla conditional flow and bridge matching [Liu et al., 2023c,b,
Albergo and Vanden-Eijnden, 2023], specifically aligned approaches [Somnath et al., 2023, Liu
et al., 2023a] are unable to preserve the coupling even in a simple one dimensional setting. However,
augmented with our trajectory flow matching approach, and specifically using (A2), which includes
information on previous observations, the model is able to fit the harmonic oscillator dataset well.
The harmonic oscillator dataset consists of one-dimensional oscillatory trajectories from a damped harmonic oscillator, with each trajectory distinguished by a unique damping coefficient $c$. Specifically, we sample trajectories $x$ from:
$$x_i = x_{i-1} + v_{i-1}(t_i - t_{i-1}); \quad x_0 = 1 \tag{16}$$
where $v$ is the velocity of the oscillator, updated by
$$v_i = v_{i-1} + \left( -\frac{c}{m} v_{i-1} - \frac{k}{m} x_{i-1} \right)(t_i - t_{i-1}); \quad v_0 = 0 \tag{17}$$
with $t_i = 0.1 \cdot i$ for $i = 0, 1, 2, \ldots, 99$, spring constant $k = 1$, and mass $m = 1$.
As c increases, the trajectories evolve from underdamped scenarios with prolonged oscillations to
critically and overdamped states where the oscillator quickly stabilizes. This leads to intersecting
trajectories due to frequency and phase differences, despite their shared starting point. We perform
overfitting experiments on three trajectories generated by varying c.
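A small NumPy sketch of this data-generating process, using the damping coefficients reported in Appendix B.1:

```python
import numpy as np

def oscillator(c, k=1.0, m=1.0, n=100, dt=0.1):
    """Euler-integrated damped harmonic oscillator, following eqs. 16-17."""
    x, v = np.empty(n), np.empty(n)
    x[0], v[0] = 1.0, 0.0
    for i in range(1, n):
        x[i] = x[i - 1] + v[i - 1] * dt                                # eq. 16
        v[i] = v[i - 1] + (-(c / m) * v[i - 1] - (k / m) * x[i - 1]) * dt  # eq. 17
    return x

# The three crossing trajectories used in the overfitting experiment (App. B.1).
trajectories = [oscillator(c) for c in (0.25, 2.0, 3.75)]
```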
As shown in Figure 2, models without history information are unable to distinguish between the three crossing trajectories that share the same starting point, resulting in overlapping predictions. In contrast, TFM-ODE incorporating three previous observations fits the crossing trajectories with high accuracy, with the predicted trajectories almost completely overlapping the ground truth. This is because the dataset satisfies (A2) with h = 4 (TFM-ODE) but not h = 0 (TFM-ODE without memory and Aligned FM).
4.2 Experiments on clinical datasets
Next we compared the performance of TFM and TFM-ODE with the current SDE and ODE baselines,
respectively, for modeling real-world patient trajectories formed with heart rate and mean arterial
blood pressure measurements within the first 24 hours of admission across three different datasets.
These are clinical measurements that are taken most frequently and used to evaluate the hemodynamic
status of patients, a key indicator of disease severity. Additionally, we evaluated our models against
flow matching on these datasets, each with distinct characteristics, to assess their ability to generalize
across different distributions. A full description of the datasets are available in Appendix B.2 with
the publicly available datasets used under The PhysioNet Credentialed Health Data License Version
1.5.0 and the EHR dataset with local institutional IRB approval:
• ICU Sepsis: a subset of the eICU Collaborative Research Database v2.0 [Pollard et al.,
2019] of patients admitted with sepsis as the primary diagnosis
• ICU Cardiac Arrest: a subset of the eICU Collaborative Research Database v2.0 [Pollard
et al., 2019] of patients at risk for cardiac arrest
• ICU GIB: a subset of the Medical Information Mart for Intensive Care III [Johnson et al.,
2016] of patients with gastrointestinal bleeding as the primary diagnosis
• ED GIB: patients presenting with signs and symptoms of acute gastrointestinal bleeding to
the emergency department of a large tertiary care academic health system
4.2.1 Prediction accuracy and precision: TFM and TFM-ODE
TFM-ODE yields more accurate trajectory prediction Across the three datasets TFM-ODE outperformed the baseline models by 15% to 20%, as seen in Table 1. We noticed that TFM has similar performance to TFM-ODE; in one case, on the ICU GIB dataset, TFM outperformed the non-stochastic TFM-ODE. For ICU Sepsis, the performance improvement over the baseline is the most significant, around 83%. This coincides with the ICU Sepsis dataset having the largest number of measurements per trajectory. The improvement is seen in both TFM and TFM-ODE, possibly indicating they are able to learn better given more data, resulting in a more precise flow. Though not formally measured, we noted that given the same time constraint, FM-based models were significantly faster and often finished training before the time limit.
TFM yields better uncertainty prediction Though TFM-ODE had lower test MSE in two out of three cases, TFM yielded better uncertainty prediction overall, as seen in Table 2. Notably, TFM also had less variance in the uncertainty prediction than TFM-ODE. A plausible explanation is a sacrifice in bias that subsequently decreases the variance for the stochastic implementation, reflecting the bias-variance trade-off. Sample predictions of TFM are shown in Figure 3; notably, the model is able to detect the measurement uncertainty at certain timepoints, matching the increase in amplitude of oscillation in patient trajectories.
4.2.2 Trajectory Variance Distribution Comparison
TFM trajectories accurately match the noise distribution in the data TFM is able to match the noise distribution in addition to the overall trajectory shape, which is useful in settings where data has high stochasticity. We compared our models to NeuralODE and NeuralSDE in matching the variance of neighboring data points, as seen in Table 3. We verify that, between the baselines, NeuralSDE has a lower MMD than NeuralODE and is better able to match the data points. We find that on the ICU GIB and ED GIB datasets, TFM outperforms both in matching the variance in the data. Notably, the performance pattern is reversed between the MMD and mean MSE metrics for TFM and TFM-ODE: better MSE corresponds to worse MMD and vice versa. This further confirms the bias-variance trade-off for both the TFM and TFM-ODE implementations.

Table 1: Mean ± std. deviation MSE (×10⁻³) by models and datasets, split into deterministic (top) and stochastic (bottom) models. Top performing model for each setting and dataset in bold.
Table 2: Uncertainty test MSE loss for TFM-ODE and TFM on two different ICU datasets.
Figure 3: Three samples of predicted trajectory and uncertainty on the ICU GIB test set. Top: predicted (orange) and ground truth (blue) mean arterial pressure (MAP). Bottom: the absolute value of the uncertainty predicted by TFM.
Table 3: Data variance MMD by models and datasets, split into deterministic (top) and stochastic (bottom) models. Top performing model for each setting and dataset in bold.
Table 4: Mean MSE (×10⁻³) by ablated versions of TFM, TFM-ODE, and datasets.

Model     Uncertainty  Memory  Hidden  ICU Sepsis      ICU Cardiac     ICU GIB         ED GIB
          Prediction           Size                    Arrest
TFM-ODE   ✓            ✓       256     0.793 ± 0.017   2.762 ± 0.017   2.673 ± 0.069   8.245 ± 0.495
                       ✓       256     1.170 ± 0.014   2.759 ± 0.015   3.097 ± 0.054   8.659 ± 0.429
                               256     1.555 ± 0.122   3.242 ± 0.050   2.981 ± 0.161   6.381 ± 0.451
                               64      1.936 ± 0.262   3.244 ± 0.025   4.003 ± 0.347   11.253 ± 4.597
TFM       ✓            ✓       256     0.796 ± 0.026   2.596 ± 0.079   2.762 ± 0.021   8.613 ± 0.260
                       ✓       256     0.816 ± 0.031   2.778 ± 0.021   2.754 ± 0.095   8.600 ± 0.389
                               64      1.965 ± 0.289   3.271 ± 0.031   4.037 ± 0.314   7.549 ± 0.737
Uncertainty improves performance of trajectory prediction For TFM and TFM-ODE, the flow network used to learn the uncertainty $\sigma_{x_t}$ is separate from the flow network learning $x_t$, and the loss of the network learning $x_t$ is independent of the uncertainty network. It was therefore unexpected that removing the uncertainty prediction increased the MSE test loss for learning $x_t$. This suggests a synergistic effect between the $x_t$ flow and the $\sigma_{x_t}$ flow.
Trajectory memory may improve performance in high-frequency measurement settings We conditioned the model on a sliding window of trajectory history to disentangle data points that otherwise look indistinguishable to FM models. This improved interpolation performance on the ICU Sepsis and ICU GIB datasets. Notably, this modification did not improve performance on the ED GIB dataset, which could be due to shorter patient trajectories and lower measurement frequency in the defined time period, or to the decreased severity of disease in the ED compared to the ICU. Adding memory as a condition may be more suitable for patients whose clinical trajectories have a higher frequency of measurements.
5 Related Work
Continuous-time neural network architectures have outperformed traditional RNN methods in modeling irregularly sampled clinical time series for interpolation and extrapolation. Neural ODEs with latent representations of trajectories [Rubanova et al., 2019] outperformed RNN-based approaches [Lipton et al., 2016, Che et al., 2018, Cao et al., 2018, Rajkomar et al., 2018] for interpolation while providing explicit uncertainty estimates about latent states. More recently, Neural SDEs appear to outperform LSTM [Hochreiter and Schmidhuber, 1997], Neural ODE [Chen et al., 2018, De Brouwer et al., 2019, Dupont et al., 2019, Lechner and Hasani, 2020], and attention-based [Shukla and Marlin, 2021, Lee et al., 2022] approaches in interpolation performance while natively handling uncertainty through drift and diffusion terms [Oh et al., 2024].
Discrete-time approaches offer an alternative to our continuous-time model: transformers use a discrete-time representation with sequential processing [Gao et al., 2024, Nie et al., 2023, Woo et al., 2024, Ansari et al., 2024, Dong et al., 2024, Garza and Mergenthaler-Canseco, 2023, Das et al., 2024, Liu et al., 2024, Kuvshinova et al., 2024] for traditional time series modeling. Adaptations of the baseline transformer include structuring observations into text with finetuning [Zhang et al., 2023, Zhou et al., 2023] or without finetuning [Xue and Salim, 2024, Gruver et al., 2023], and using vision transformers to model unevenly spaced time series by converting them into images [Li et al., 2023].
Continuous-time systems are also of great interest for learning causal representations, using observations to directly modify the system state [De Brouwer et al., 2022, Jia and Benson, 2019]. Variations include intervention modeling with separate ODEs for intervention and outcome processes [Gwak et al., 2020], liquid time-constant networks [Hasani et al., 2021, Vorbach et al., 2021], and modeling treatment effects with either one [Bellot and van der Schaar, 2021] or multiple interventions [Seedat et al., 2022]. Accounting for external interventions is a particular challenge in clinical data trajectories, where interventions (changes in environment due to treatment decisions or clinical context such as the ED or ICU) are common.
6 Conclusion
In this work we present Trajectory Flow Matching, a simulation-free training algorithm for neural
differential equation models. We show theoretically when trajectory flow matching is valid, then demonstrate its usefulness empirically in a clinical setting. The ability to model the underlying
continuous physiologic processes during critical illness using irregular, sparsely sampled, and noisy
data has the potential for broad impacts in care settings such as the emergency department or ICU.
These models could be used to improve clinical decision making, inform monitoring strategies,
and optimize resource allocation by identifying which patients are likely to deteriorate or recover.
These use cases will require thorough prospective validation and calibration for specific clinical
outcomes, for example using the likelihood of a patient crossing a specific heart rate or blood pressure
threshold for decisions on level of care (ICU versus inpatient floors) or specific interventions such
as transfusions. In these applications, it will be important to assess and control for bias that may be
present due to which patient subpopulations are present in training data.
Limitations Limitations of the method include the selective utility of integrating memory in clinical settings with high measurement frequency and no current capacity for estimating causal representations, though the latter is an important future research direction. Potential harms include erroneous predictions that result in either delayed care or overutilization of the health system. Accurate trajectory predictions have the potential to inform clinical decision-making regarding the appropriate level of care, leading to more timely and appropriate interventions.
Future work We hope to extend our method to cover other types of time series with periodic components, potentially incorporating Fourier transforms [Li et al., 2021] and physics-informed neural networks (PINNs). Since interpretability is an important factor for clinical reliability, we are developing methods to further elucidate the key components affecting predictions. We also hope to incorporate functional flow matching for the fully continuous setting [Kerrigan et al., 2024].
7 Broader Impact
Our work extends flow matching into the domain of time series modeling, demonstrating a specific instance of clinical time series prediction. In contrast to large transformer-based models, our method has fewer parameters and requires less training time; notably, it scales well with the number of parameters. In addition, our parameterization of stochastic differential equations (SDEs) allows faster training than traditional SDE integration.
Accurate time series modeling in healthcare has the potential for significant benefits, but also introduces risks. Benefits that could be derived from more accurate prediction of clinical courses include improved treatment decisions, resource allocation, and more informative discussions of prognosis with patients or family members. Risks may come from inaccuracies in predictions, which could lead to harm by biasing the decision making of clinical teams. False negative predictions (trajectories with falsely favorable outcomes) may lead to undertreatment, while false positive predictions (trajectories with incorrectly detrimental outcomes) may lead to overtreatment. These inaccuracies may also propagate biases in training data.
To move towards broad impact in the clinical domain, this work will require validation and bias esti-
mates. Furthermore, models deployed in domains with high-stakes prediction require interpretability,
which can help identify biases, miscalibration, discordance with domain knowledge, as well as build
trust with teams using predictions from the model. At this time, flow-based methods have limited
tools for interpretability, and we recognize this as a gap in need of future work.
Acknowledgements
The authors would like to thank Mathieu Blanchette for useful comments on early versions of this
manuscript. We are also grateful to the anonymous reviewers for suggesting numerous improvements.
The authors acknowledge funding from the National Institutes of Health, UNIQUE, CIFAR, NSERC,
Intel, and Samsung. The research was enabled in part by computational resources provided by the Dig-
ital Research Alliance of Canada (https://alliancecan.ca), Mila (https://mila.quebec),
Yale School of Medicine and NVIDIA.
References
M. S. Albergo and E. Vanden-Eijnden. Building normalizing flows with stochastic interpolants.
In The Eleventh International Conference on Learning Representations, 2023. URL https:
//openreview.net/forum?id=li7qeBbCR1t.
M. S. Albergo, N. M. Boffi, and E. Vanden-Eijnden. Stochastic interpolants: A unifying framework
for flows and diffusions. CoRR, abs/2303.08797, 2023. URL https://doi.org/10.48550/
arXiv.2303.08797.
A. Allam, S. Feuerriegel, M. Rebhan, and M. Krauthammer. Analyzing patient trajectories with
artificial intelligence. J Med Internet Res, 23(12):e29812, Dec 2021. ISSN 1438-8871. doi:
10.2196/29812. URL https://www.jmir.org/2021/12/e29812.
A. F. Ansari, L. Stella, C. Turkmen, X. Zhang, P. Mercado, H. Shen, O. Shchur, S. S. Rangapuram,
S. Pineda Arango, S. Kapoor, J. Zschiegner, D. C. Maddix, H. Wang, M. W. Mahoney, K. Torkkola,
A. Gordon Wilson, M. Bohlke-Schneider, and Y. Wang. Chronos: Learning the language of time
series. arXiv preprint arXiv:2403.07815, 2024.
A. Bellot and M. van der Schaar. Policy analysis using synthetic controls in continuous-time. In
M. Meila and T. Zhang, editors, Proceedings of the 38th International Conference on Machine
Learning, volume 139 of Proceedings of Machine Learning Research, pages 759–768. PMLR,
18–24 Jul 2021. URL https://proceedings.mlr.press/v139/bellot21a.html.
W. Cao, D. Wang, J. Li, H. Zhou, L. Li, and Y. Li. Brits: Bidirectional recurrent imputation
for time series. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and
R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran As-
sociates, Inc., 2018. URL https://proceedings.neurips.cc/paper_files/paper/2018/
file/734e6bfcd358e25ac1db0a4241b95651-Paper.pdf.
Z. Che, S. Purushotham, K. Cho, D. Sontag, and Y. Liu. Recurrent neural networks for multivariate
time series with missing values. Scientific Reports, 8(1), 2018. doi: 10.1038/s41598-018-24271-9.
R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud. Neural ordinary differential
equations. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Gar-
nett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Asso-
ciates, Inc., 2018. URL https://proceedings.neurips.cc/paper_files/paper/2018/
file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf.
T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785-794, 2016.
M. M. Churpek, T. C. Yuen, S. Y. Park, D. O. Meltzer, J. B. Hall, and D. P. Edelson. Derivation of a
cardiac arrest prediction model using ward vital signs. Critical Care Medicine, 2012.
A. Das, W. Kong, R. Sen, and Y. Zhou. A decoder-only foundation model for time-series forecasting.
In Forty-first International Conference on Machine Learning, 2024. URL https://openreview.
net/forum?id=jn2iTJas6h.
E. De Brouwer, J. Simm, A. Arany, and Y. Moreau. Gru-ode-bayes: Continuous modeling of
sporadically-observed time series. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-
Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, vol-
ume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_
files/paper/2019/file/455cb2657aaa59e32fad80cb0b65b9dc-Paper.pdf.
E. De Brouwer, J. Gonzalez, and S. Hyland. Predicting the impact of treatments over time with
uncertainty aware neural differential equations. In G. Camps-Valls, F. J. R. Ruiz, and I. Valera,
editors, Proceedings of The 25th International Conference on Artificial Intelligence and Statistics,
volume 151 of Proceedings of Machine Learning Research, pages 4705–4722. PMLR, 28–30 Mar
2022. URL https://proceedings.mlr.press/v151/de-brouwer22a.html.
J. Dong, H. Wu, Y. Wang, Y.-Z. Qiu, L. Zhang, J. Wang, and M. Long. Timesiam: A pre-training
framework for siamese time-series modeling. In Forty-first International Conference on Machine
Learning, 2024. URL https://openreview.net/forum?id=wrTzLoqbCg.
E. Dupont, A. Doucet, and Y. W. Teh. Augmented neural odes. In H. Wal-
lach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, edi-
tors, Advances in Neural Information Processing Systems, volume 32. Curran Associates,
Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/file/
21be9a4bd4f81549a9d1d241981cec3c-Paper.pdf.
S. Gao, T. Koker, O. Queen, T. Hartvigsen, T. Tsiligkaridis, and M. Zitnik. Units: Building a unified
time series model. arXiv, 2024. URL https://arxiv.org/pdf/2403.00131.pdf.
N. Gruver, M. Finzi, S. Qiu, and A. G. Wilson. Large language models are zero-shot time series
forecasters. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,
Advances in Neural Information Processing Systems, volume 36, pages 19622–19635. Curran As-
sociates, Inc., 2023. URL https://proceedings.neurips.cc/paper_files/paper/2023/
file/3eb7ca52e8207697361b2c0fb3926511-Paper-Conference.pdf.
D. Gwak, G. Sim, M. Poli, S. Massaroli, J. Choo, and E. Choi. Neural Ordinary Differential
Equations for Intervention Modeling. arXiv e-prints, art. arXiv:2010.08304, Oct. 2020. doi:
10.48550/arXiv.2010.08304.
R. Hasani, M. Lechner, A. Amini, D. Rus, and R. Grosu. Liquid time-constant networks. Proceedings
of the AAAI Conference on Artificial Intelligence, 35(9):7657–7666, May 2021. doi: 10.1609/aaai.
v35i9.16936. URL https://ojs.aaai.org/index.php/AAAI/article/view/16936.
J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. Advances in Neural
Information Processing Systems, 33:6840–6851, 2020.
P. Isola, J. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial
networks. CVPR, 2017.
G. Kerrigan, G. Migliorini, and P. Smyth. Functional flow matching. In S. Dasgupta, S. Mandt, and
Y. Li, editors, Proceedings of The 27th International Conference on Artificial Intelligence and
Statistics, volume 238 of Proceedings of Machine Learning Research, pages 3934–3942. PMLR,
02–04 May 2024. URL https://proceedings.mlr.press/v238/kerrigan24a.html.
P. Kidger, J. Foster, X. Li, H. Oberhauser, and T. Lyons. Neural sdes as infinite-dimensional gans. In
International conference on machine learning. PMLR, 2021a.
P. Kidger, J. Foster, X. C. Li, and T. Lyons. Efficient and accurate gradients for neural sdes. In
M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors, Advances in
Neural Information Processing Systems, volume 34, pages 18747–18761. Curran Associates,
Inc., 2021b. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/
9ba196c7a6e89eafd0954de80fc1b224-Paper.pdf.
M. Lechner and R. Hasani. Learning long-term dependencies in irregularly-sampled time series.
arXiv preprint arXiv:2006.04418, 2020.
Y. Lee, E. Jun, J. Choi, and H.-I. Suk. Multi-view integrative attention-based deep representation
learning for irregular clinical time-series data. IEEE Journal of Biomedical and Health Informatics,
26(8):4270–4280, 2022. doi: 10.1109/JBHI.2022.3172549.
X. Li, T.-K. L. Wong, R. T. Q. Chen, and D. K. Duvenaud. Scalable gradients and variational
inference for stochastic differential equations. In C. Zhang, F. Ruiz, T. Bui, A. B. Dieng, and
D. Liang, editors, Proceedings of The 2nd Symposium on Advances in Approximate Bayesian
Inference, volume 118 of Proceedings of Machine Learning Research, pages 1–28. PMLR, 08 Dec
2020. URL https://proceedings.mlr.press/v118/li20a.html.
Z. Li, N. B. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar.
Fourier neural operator for parametric partial differential equations. In International Conference on
Learning Representations, 2021. URL https://openreview.net/forum?id=c8P9NQVtmnO.
Z. Li, S. Li, and X. Yan. Time series as images: Vision transformer for irregularly sampled time
series. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,
Advances in Neural Information Processing Systems, volume 36, pages 49187–49204. Curran As-
sociates, Inc., 2023. URL https://proceedings.neurips.cc/paper_files/paper/2023/
file/9a17c1eb808cf012065e9db47b7ca80d-Paper-Conference.pdf.
Y. Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative
modeling. In The Eleventh International Conference on Learning Representations, 2023. URL
https://openreview.net/forum?id=PqvMRDCJT9t.
Z. C. Lipton, D. Kale, and R. Wetzel. Directly modeling missing data in sequences with rnns:
Improved classification of clinical time series. In F. Doshi-Velez, J. Fackler, D. Kale, B. Wallace,
and J. Wiens, editors, Proceedings of the 1st Machine Learning for Healthcare Conference,
volume 56 of Proceedings of Machine Learning Research, pages 253–270, Northeastern University,
Boston, MA, USA, 18–19 Aug 2016. PMLR. URL https://proceedings.mlr.press/v56/
Lipton16.html.
G.-H. Liu, A. Vahdat, D.-A. Huang, E. A. Theodorou, W. Nie, and A. Anandkumar. I2sb: image-
to-image schrödinger bridge. In Proceedings of the 40th International Conference on Machine
Learning, ICML’23. JMLR.org, 2023a.
X. Liu, C. Gong, and Q. Liu. Flow straight and fast: Learning to generate and transfer data with
rectified flow. The Eleventh International Conference on Learning Representations (ICLR), 2023b.
URL https://par.nsf.gov/biblio/10445517.
X. Liu, L. Wu, M. Ye, and Q. Liu. Learning diffusion bridges on constrained domains. In
The Eleventh International Conference on Learning Representations, 2023c. URL https://
openreview.net/forum?id=WH1yCa0TbB.
Y. Liu, H. Zhang, C. Li, X. Huang, J. Wang, and M. Long. Timer: Transformers for Time Series
Analysis at Scale. arXiv e-prints, art. arXiv:2402.02368, Feb. 2024. doi: 10.48550/arXiv.2402.
02368.
A. Q. Nichol and P. Dhariwal. Improved denoising diffusion probabilistic models. In M. Meila and
T. Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume
139 of Proceedings of Machine Learning Research, pages 8162–8171. PMLR, 18–24 Jul 2021.
URL https://proceedings.mlr.press/v139/nichol21a.html.
Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam. A time series is worth 64 words: Long-
term forecasting with transformers. In The Eleventh International Conference on Learning
Representations, 2023. URL https://openreview.net/forum?id=Jbdc0vTOcol.
Y. Oh, D. Lim, and S. Kim. Stable neural stochastic differential equations in analyzing irregular time
series data. In The Twelfth International Conference on Learning Representations, 2024. URL
https://openreview.net/forum?id=4VIgNuQ1pY.
T. Pollard, A. Johnson, J. Raffa, L. A. Celi, O. Badawi, and R. Mark. eICU Collaborative Research Database (version 2.0), 2019. URL https://doi.org/10.13026/C2WM1R.
A. Rajkomar, E. Oren, K. Chen, A. M. Dai, N. Hajaj, M. Hardt, P. J. Liu, X. Liu, J. Marcus, M. Sun,
P. Sundberg, H. Yee, K. Zhang, Y. Zhang, G. Flores, G. E. Duggan, J. Irvine, Q. Le, K. Litsch,
A. Mossin, J. Tansuwan, D. Wang, J. Wexler, J. Wilson, D. Ludwig, S. L. Volchenboum, K. Chou,
M. Pearson, S. Madabushi, N. H. Shah, A. J. Butte, M. D. Howell, C. Cui, G. S. Corrado, and
J. Dean. Scalable and accurate deep learning with electronic health records. npj Digital Medicine,
1(1):18, 2018. doi: 10.1038/s41746-018-0029-1.
N. Seedat, F. Imrie, A. Bellot, Z. Qian, and M. van der Schaar. Continuous-time modeling of
counterfactual outcomes using neural controlled differential equations. In K. Chaudhuri, S. Jegelka,
L. Song, C. Szepesvari, G. Niu, and S. Sabato, editors, Proceedings of the 39th International
Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research,
pages 19497–19521. PMLR, 17–23 Jul 2022. URL https://proceedings.mlr.press/v162/
seedat22b.html.
S. N. Shukla and B. Marlin. Multi-time attention networks for irregularly sampled time series. In
International Conference on Learning Representations, 2021. URL https://openreview.net/
forum?id=4c0J6lwQ4_.
V. R. Somnath, M. Pariset, Y.-P. Hsieh, M. R. Martinez, A. Krause, and C. Bunne. Aligned diffusion
Schrödinger bridges. In R. J. Evans and I. Shpitser, editors, Proceedings of the Thirty-Ninth
Conference on Uncertainty in Artificial Intelligence, volume 216 of Proceedings of Machine
Learning Research, pages 1985–1995. PMLR, 31 Jul–04 Aug 2023. URL https://proceedings.
mlr.press/v216/somnath23a.html.
G. Woo, C. Liu, A. Kumar, C. Xiong, S. Savarese, and D. Sahoo. Unified training of universal time
series forecasting transformers. In Forty-first International Conference on Machine Learning,
2024. URL https://openreview.net/forum?id=Yd8eHMY1wz.
H. Xue and F. D. Salim. PromptCast: A New Prompt-Based Learning Paradigm for Time
Series Forecasting . IEEE Transactions on Knowledge & Data Engineering, 36(11):6851–
6864, Nov. 2024. ISSN 1558-2191. doi: 10.1109/TKDE.2023.3342137. URL https:
//doi.ieeecomputersociety.org/10.1109/TKDE.2023.3342137.
Y. Zhang, K. Gong, K. Zhang, H. Li, Y. Qiao, W. Ouyang, and X. Yue. Meta-transformer: A unified
framework for multimodal learning. arXiv preprint arXiv:2307.10802, 2023.
T. Zhou, P. Niu, X. Wang, L. Sun, and R. Jin. One fits all: Power general time series analysis by
pretrained lm. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,
Advances in Neural Information Processing Systems, volume 36, pages 43322–43355. Curran As-
sociates, Inc., 2023. URL https://proceedings.neurips.cc/paper_files/paper/2023/
file/86c17de05579cde52025f9984e6e2ebb-Paper-Conference.pdf.
J. E. Zimmerman, A. A. Kramer, D. S. McNair, and F. M. Malila. Acute Physiology and Chronic Health Evaluation (APACHE) IV: hospital mortality assessment for today's critically ill patients. Critical Care Medicine, 34(5):1297-1310, 2006.
A Proof of theorems
We first prove a Lemma which shows TFM learns valid flows between distributions with the target
prediction reparameterization trick.
Lemma A.1. If $p_t(x) > 0$ and $\delta_{\text{data}}$ is Lipschitz continuous for all $x \in \mathbb{R}^d$ and $t \in [0, 1]$, then the gradients of $\mathcal{L}_{FM}$ and $\mathcal{L}_{TFM}$ are equal:
$$\nabla_\theta \mathcal{L}_{FM}(\theta) = \nabla_\theta \mathcal{L}_{TFM}(\theta)$$
Proof. This proof is a simple extension of Lipman et al. [2023], Tong et al. [2024], which proved that $\mathcal{L}_{CFM}$ and $\mathcal{L}_{FM}$ are equal under a similar constraint.
Given $\delta_{\text{data}} = t_1 - t_0$, we have $u_t(x) = \frac{x_1 - x_0}{\delta_{\text{data}}}$, where $t_0$ is the previous time in the time series and $t_1$ is the current time for inference. For the time series data, we assume Lipschitz continuity: there exists $L \geq 0$ such that for all $x, y \in \mathbb{R}^n$, $|f(x) - f(y)| \leq L \|x - y\|$.
$$\nabla_\theta \mathbb{E}_{p_t(x)} \|v_\theta(t, x) - u_t(x)\|^2 = \nabla_\theta \mathbb{E}_{t, q(z), p_t(x|z)} \frac{1}{(1 - t)^2} \|\hat{x}^1_\theta(t, x) - x_1\|^2 \tag{18}$$
$$= \nabla_\theta \mathbb{E}_{t, q(z), p_t(x|z)} \frac{1}{(1 - t)^2} \left[ \|\hat{x}^1_\theta(t, x)\|^2 - 2 \langle \hat{x}^1_\theta(t, x), x_1 \rangle + x_1^2 \right] \tag{19}$$
$$= \nabla_\theta \mathbb{E}_{t, q(z), p_t(x|z)} \frac{1}{(1 - t)^2} \left[ \|\hat{x}^1_\theta(t, x)\|^2 - 2 \langle \hat{x}^1_\theta(t, x), x_1 \rangle \right] \tag{20}$$
$$\mathbb{E}_{t, q(z), p_t(x|z)} \|v_\theta(t, x) - u_t(x|z)\|^2 = \mathbb{E}_{t, q(z), p_t(x|z)} \left\| \frac{\hat{x}^{\lceil t \rceil}_\theta(t, x) - x_t}{\lceil t \rceil - t} - \frac{x^{\lceil t \rceil} - x_t}{\lceil t \rceil - t} \right\|^2 \tag{22}$$
$$= \mathbb{E}_{t, q(z), p_t(x|z)} \frac{1}{(\lceil t \rceil - t)^2} \|\hat{x}^{\lceil t \rceil}_\theta(t, x) - x^{\lceil t \rceil}\|^2 \tag{23}$$
Lemma 3.1. The SDE $dx_t = u_t(x|z)dt + \sigma^2 dW_t$, where $u_t$ is defined in eq. 9, generates $p_t(x|z)$ in eq. 8 with initial condition $p_0 := \delta_{x_1}$, where $\delta$ is the Dirac delta function.
Proof. For simplicity of notation, we first show the case $\lceil t \rceil = 1$:
$$dx_t = u_t(x|z)dt + \sigma^2 dW_t = \frac{x_1 - x_t}{1 - t}\,dt + \sigma^2 dW_t \tag{24}$$
which is equivalent to the $d$-dimensional Brownian bridge, which has marginal
$$\mathcal{N}((1 - t)x_0 + t x_1, \sigma^2 t(1 - t)) \tag{25}$$
completing the proof for $\lceil t \rceil = 1$.
Proposition 3.2 (Coupling Preservation). Under mild regularity criteria on $u_t(\cdot|z)$, $p_t$, and $q$, if
$$\mathbb{E}_{t \sim U(0,T),\, z \sim q(z),\, c \sim q(c|z),\, x_t \sim p_t(x_t|z)} \|u_t(x_t|z, c) - u_t(x_t|c)\|_2^2 = 0$$
and $z$, $q(z)$, $p_t(x|z)$, and $u_t(x|z)$ are as defined in eqs. 6-9, then $\Pi(u)^\star = \Pi^\star(x_{1:T})$.
Proof. We prove the deterministic case with $T = 1$; the extensions to the stochastic case and $T > 1$ are evident. The couplings are equal if the marginal vector field satisfies $u_t(x_t|c) = u_t(x_t|z, c)$ everywhere, as the coupling is governed by the push-forward flows $\phi(x_0, c) = \int_0^1 u_t(x_t|c)\,dt$ and $\phi(x_0, c, z) = \int_0^1 u_t(x_t|z, c)\,dt$. If
$$\mathbb{E}_{t \sim U(0,T),\, z \sim q(z),\, c \sim q(c|z),\, x_t \sim p_t(x_t|z)} \|u_t(x_t|z, c) - u_t(x_t|c)\|_2^2 = 0$$
then $\phi(x_0, c, z) = \phi(x_0, c)$ for all $x_0$, and therefore the couplings of the optimal map are equivalent. We note that this requires an exchange of integrals under the same conditions as Lemma A.1.
Figure 4: Left: Distribution of number of complete vital measurements per patient trajectory within
the first 24 hours of admission in each clinical dataset. Right: Distribution of raw heart rate values in
each clinical dataset.
B Experimental Details
B.1 1D Oscillators
The three oscillation trajectories correspond to c = 0.25 (the red trajectory in Figure 2), c = 2 (blue), and c = 3.75 (green). Before being used as an input, t was scaled to [0, 1] by dividing by 10.
primary admission diagnosis (2689 patients in training set, 336 in validation set, and 337 in test
set). The following data fields were extracted: patient sex, age, heart rate, mean arterial pressure,
norepinephrine dose and infusion rate, and a validated ICU score (APACHE-IV). Each patient’s
complete pair measurements of heart rate and mean arterial pressure over time form one trajectory to
be modeled.
Norepinephrine infusion rates were calculated by converting drug doses or infusion rates to µg/kg/min,
and where drug doses were not explicitly available, the dose was inferred from the free text given in
the drug name. Start and end times for norepinephrine infusion were calculated by dividing the dose
by the infusion rate. Where there appeared to be multiple infusions at the same time, the maximum
infusion rate was taken as the infusion rate. As a conditional input to the models, the norepinephrine
infusion doses are then scaled to between 0 and 1 by dividing by the maximum norepinephrine value
in the dataset.
The APACHE-IV score, a validated critical care risk score, predicts individual patient mortality risk [Zimmerman et al., 2006]. In data preprocessing, we used logistic regression of the score against binary hospital mortality to generate a probability for each patient, which serves as an additional input condition for the models.
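A hedged scikit-learn sketch of this preprocessing step (the arrays here are random placeholders, not the actual patient data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Random placeholders standing in for APACHE-IV scores and hospital mortality.
apache_scores = np.random.uniform(0, 200, size=(500, 1))
mortality = (np.random.rand(500) < apache_scores[:, 0] / 400).astype(int)

lr = LogisticRegression().fit(apache_scores, mortality)
# Per-patient mortality probability, used as an additional input condition.
p_mortality = lr.predict_proba(apache_scores)[:, 1]
```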
Intensive Care Unit Cardiac Arrest (ICU Cardiac Arrest) Dataset This dataset was extracted from the eICU Collaborative Research Database v2.0 [Pollard et al., 2019] described above to reflect ICU patients at risk for cardiac arrest. It excludes patients who presented with myocardial infarction (MI) and includes variables used in the Cardiac Arrest Risk Triage (CART) score [Churpek et al., 2012]: respiratory rate, heart rate, diastolic blood pressure, and age at the time of ICU admission. As an input to the model, age was z-score normalized. 51,671 patients were included in the training set, with 6,459 patients each in the validation and test sets.
Intensive Care Unit Acute Gastrointestinal Bleeding (ICU GIB) Dataset The Medical Information Mart for Intensive Care III (MIMIC-III) critical care database contains data for over 40,000 patients requiring an ICU stay at the Beth Israel Deaconess Medical Center from 2001 to 2012 [Johnson et al., 2016]. We selected a cohort of 2,602 ICU patients with a primary diagnosis of gastrointestinal bleeding to form the ICU GIB dataset, split into a training set of 2,082 patients and a validation set and test set of 260 patients each. We extracted the following variables: age, sex, heart rate, systolic blood pressure, diastolic blood pressure, vasopressor usage, blood product usage, packed red blood cell usage, and liver disease. Since vasopressor and blood product usage are encoded as binary values and may not represent the actual infusion amounts, which most likely decay over time, we experimented with adding a Gaussian decay to them for use as conditional inputs. Likewise, the trajectories to model consist of complete pairs of heart rate and mean arterial pressure (calculated from systolic and diastolic blood pressure) measurements.
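A sketch of one way such a Gaussian decay could be applied to a binary usage indicator (the time scale `tau` and the exact functional form are our own placeholder choices):

```python
import numpy as np

def gaussian_decay(event_times, query_times, tau=1.0):
    """Decay a binary usage event so its influence fades smoothly over time."""
    dt = query_times[:, None] - event_times[None, :]
    dt = np.where(dt >= 0, dt, np.inf)        # only past events contribute
    nearest = dt.min(axis=1)                  # time since most recent event
    return np.exp(-(nearest / tau) ** 2)      # 1 at the event, -> 0 afterwards

# e.g. vasopressor given at t = 1 and t = 5, queried on a regular grid:
signal = gaussian_decay(np.array([1.0, 5.0]), np.linspace(0.0, 10.0, 11), tau=2.0)
```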
Emergency Department Acute Gastrointestinal Bleeding (ED GIB) Dataset This dataset reflects 3,348 patients presenting with signs and symptoms of acute gastrointestinal bleeding to two hospital campuses of Yale New Haven Hospital between 2014 and 2018. The patients were split into a training set, a validation set, and a test set of 2,636, 352, and 360 patients respectively. Variables extracted include patient sex, age, heart rate, mean arterial pressure, initial measurements of 24 lab tests, and 17 pre-existing medical conditions as determined by ICD-10 codes. As with the ICU Sepsis data, the trajectories consist of complete pairs of heart rate and mean arterial pressure measurements.
Age, initial lab test measurements (three labs omitted due to missing data), and pre-existing medical conditions were used to train an XGBoost model [Chen and Guestrin, 2016] to predict the binary outcome variable indicating the need for hospital-based care. The resulting probability of requiring hospital-based care (outcome of 1) for each patient was then calculated using the trained model and used as conditional input to conditional models in experiments on this dataset.
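A hedged sketch of the risk-score construction (placeholder arrays and hyperparameters; the actual feature engineering follows the description above):

```python
import numpy as np
from xgboost import XGBClassifier

# Random placeholders for the feature matrix (age, labs, comorbidities)
# and the composite hospital-based-care outcome defined below.
features = np.random.rand(200, 40)
needs_care = np.random.randint(0, 2, size=200)

clf = XGBClassifier(n_estimators=100, max_depth=4).fit(features, needs_care)
# Probability of requiring hospital-based care, used as a conditional input.
risk = clf.predict_proba(features)[:, 1]
```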
Of note, the outcome variable was defined as 1 if a patient (1) required red blood cell transfusion, (2) required urgent intervention (endoscopic, interventional radiologic, or surgical) to stop bleeding, or (3) died from any cause within 30 days. Labs and medical conditions included in this dataset are listed below; labs in bold were excluded from the XGBoost risk score calculation due to missing data.
• Labs: Sodium, Potassium, Chloride, Carbon Dioxide, Blood Urea Nitrogen, Creatinine,
International Normalized Ratio, Partial Thromboplastin Time, White Blood Cell Count,
Hemoglobin, Platelet Count, Hematocrit, Mean Corpuscular Volume, Mean Corpuscular
Hemoglobin, Mean Corpuscular Hemoglobin Concentration, Red Cell Distribution Width,
Figure 5: Sigma mean MSE comparison
Figure 6: Memory Mean MSE comparison