Research Article

Parameter Estimation for Dynamical Systems Using a Deep Neural Network
Received 16 December 2021; Revised 4 April 2022; Accepted 5 April 2022; Published 27 April 2022
Copyright © 2022 Tamirat Temesgen Dufera et al. This is an open access article distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
The deep neural network (DNN) was applied for estimating a set of unknown parameters of a dynamical system whose measured data are given at a set of discrete time points. We developed a new vectorized algorithm that takes the number of unknowns (state variables) and the number of parameters into consideration. The algorithm first trains the network to determine the weights and biases. Next, the algorithm solves a system of algebraic equations to estimate the parameters of the system. If the right-hand side function of the system is smooth and the system has equal numbers of unknowns and parameters, the algorithm solves the algebraic equations at the discrete point where the absolute error between the neural network solution and the measured data is minimal. This improves the accuracy and reduces the computational time. Several tests were carried out on linear and non-linear dynamical systems. Last, we showed that the DNN approach becomes more efficient in terms of computational time as the number of hidden layers increases.
dependent on the initial guesses; see, for example, [20, 21] and further references therein. Despite the variety of methods, this estimation problem remains a well-known challenge.

An alternative approach for calibrating dynamical systems is artificial neural networks (ANN). The beginning of ANN is often attributed to the research article by McCulloch and Pitts [22]. At the time, it was less popular due to the limited capacity of computational machines. Nevertheless, the fast development of computer science and technology in the last decade has led to an exponential increase in the capacity of machines for both storing and processing data.

An ANN is defined as “an information-processing system that has certain performance characteristics in common with biological neural networks” [23, 24]. A network is composed of several layers. The first layer is usually called the input layer, whereas the last one is referred to as the output layer. Layers falling between the input and output layers are called hidden layers. Furthermore, each layer has a set of neurons or units. Deep neural networks (DNN) are ANNs with more than one hidden layer [25, 26]. DNNs are widely applied in artificial intelligence, for instance, in computer vision, image processing, pattern recognition, and cybersecurity [27–29]. The successful performance of DNNs is owed to the fact that deep layers are able to capture more variances [30].

ANNs and, in particular, DNNs could potentially address some of the challenges of the aforementioned standard methods. One of these drawbacks is the need for a large training dataset to obtain a sufficiently accurate estimation of the parameters, which entails a high computational cost [31–33]. The ANN approach has been implemented to minimize this limitation. In particular, Dua [34, 35] proposed ANN methods for parameter estimation in systems of differential equations. However, to the best of the authors' knowledge, we have not encountered in the literature a vectorized DNN algorithm which approximates the solution of a dynamical system with unknown parameters given a set of values of the solution at a set of time points, and thus, this is the main purpose of the paper. Additionally, we provide the following original results.

(1) We extend the algorithm from [36] for systems of differential equations to systems of differential equations with unknown parameters.
(2) We enhance the efficiency and accuracy of the algorithm in the case when the number of unknowns of the dynamical system coincides with the number of unknown parameters.
(3) We show that the DNN algorithm for this problem becomes faster in terms of computational time as the number of layers increases.

For the calculation of the gradients of the cost functions, we utilized the automatic-differentiation technique supported in the code by the Autograd package [37], and for the optimization of the learning rule, we implemented the Adam method [38], which successfully addresses the local minimum problem present in the standard approaches.

The paper is structured as follows: (2) mathematical formulation of the problem; (3) DNN model; (4) vectorized algorithm for parameter estimation; (5) numerical experiments; and (6) conclusions and further work.

2. Problem Formulation

Let us consider a dynamical system described by n ordinary differential equations involving m unknown parameters,

$$\frac{du(t)}{dt} = f(t, u(t), p), \quad t \in [t_0, t_{\mathrm{end}}] \subset \mathbb{R}, \qquad u(t_0) = u_0, \qquad (1)$$

where u(t) = [u_1(t), u_2(t), ..., u_n(t)]^T is a vector field with n components u_i(t), i = 1, ..., n; p = [p_1, p_2, ..., p_m] ∈ R^m is the vector containing the unknown parameters; u_0 ∈ R^n is the initial condition; and

$$f(t, u, p) := \begin{bmatrix} f_1(t, u_1, u_2, \ldots, u_n, p_1, p_2, \ldots, p_m) \\ f_2(t, u_1, u_2, \ldots, u_n, p_1, p_2, \ldots, p_m) \\ \vdots \\ f_n(t, u_1, u_2, \ldots, u_n, p_1, p_2, \ldots, p_m) \end{bmatrix} \qquad (2)$$

is a given vector-valued function, not necessarily linear, with n components f_i(t, u_1, u_2, ..., u_n, p_1, p_2, ..., p_m), i = 1, ..., n. The measured data for the model equation (1) will be denoted by u*(t).

Objective: Given measured data at N time points of (1), i.e., (t_k, u*(t_k)) with k = 1, 2, ..., N, we aim to develop a vectorized deep neural network algorithm that estimates the unknown parameters p and the solution u(t) for all t.

3. Deep Neural Network Model

We consider a deep neural network with a similar architecture as in [36] for non-parametric systems of ODEs, depicted in Figure 1. In this network diagram, t_i ∈ (t_0, t_end) with i ∈ {1, 2, ..., N} represents the N time points; a_j^(l) is the state of the j-th neuron in layer l with l ∈ {1, 2, ..., L}, where L denotes the total number of layers; b^(l) denotes the bias in layer l; and W^(l) := (w_jk^(l)) denotes the weight matrix of layer l. Lastly, û_j^(i) denotes the estimated solution of the unknown function u_j(t^(i)), evaluated at the time point t^(i).

The architecture of the DNN is based on the following:

(i) An input layer (l = 0) consisting of a single neuron corresponding to the time point t^(i);
(ii) An output layer (l = L) with n output functions û_j^(i);
(iii) L − 1 hidden layers with h_l neurons in each layer, l ∈ {1, ..., L − 1}.

Let us remark that the number of neurons in each hidden layer might not be the same. As a result, the weight matrix W has dimension h_l × h_{l−1}, and thus it might not be a square matrix.

Moreover, the number of neurons in each hidden layer could be determined based on the performance of the model [36]. Here, by performance of the model, we mean the DNN architecture which gives the best approximation with a small number of iterations and reduced computational time.
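As a concrete illustration of the formulation above, the following minimal Python sketch encodes a right-hand side f(t, u, p) of (1) together with measured data (t_k, u*(t_k)). The function and variable names are ours, and the snippet illustrates the objects involved rather than reproducing the authors' code; the system shown is the chain of chemical reactions used later as Example 1.

```python
import numpy as np

def f(t, u, p):
    """Right-hand side of system (1) for a chain of irreversible
    chemical reactions: du1/dt = -k1*u1, du2/dt = k1*u1 - k2*u2.
    u is the state vector of length n = 2, and p = [k1, k2] collects
    the m = 2 unknown parameters."""
    k1, k2 = p
    return np.array([-k1 * u[0], k1 * u[0] - k2 * u[1]])

# Measured data (t_k, u*(t_k)) at N = 11 discrete time points in [0, 1],
# here generated from the analytical solution with [k1, k2] = [5, 1].
t_data = np.linspace(0.0, 1.0, 11)       # shape (N,)
u_data = np.stack([
    np.exp(-5.0 * t_data),
    1.25 * np.exp(-5.0 * t_data) * (np.exp(4.0 * t_data) - 1.0),
])                                       # shape (n, N): one row per state
```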
After obtaining the optimal values of the learning parameters W* and b*, we seek the values of the parameter p by solving the following non-linear system of algebraic equations,

$$F(p) := \frac{d\hat{u}}{dt}\left(t^{(i)}, W^*, b^*\right) - f\left(t^{(i)}, \hat{u}, p\right) = 0. \qquad (11)$$

When attempting to solve (11), we will distinguish two possible cases: (1) the number of equations equals the number of parameters, and (2) otherwise.

Case 1. For the particular case in which the right-hand side function of the dynamical system (1) has as many parameters as equations, i.e., n = m, the following proposition addresses the existence of a solution. It follows directly from the well-known inverse function theorem.

Proposition 1. If the number of unknowns and the number of parameters are equal, i.e., n = m, F(p) ∈ R^n is C^1 in a neighborhood of some initial parameters p_0, and the Jacobian (∂F/∂p)(p_0) is non-singular, then equation (11) has a unique solution in the neighborhood of p_0 and at a point t* ∈ (t_0, t_end).

Proof. Let t* ∈ (t_0, t_end); then C* = (dû/dt)(t*, W*, b*) is a constant vector, and

$$F(p) = C^* - f(t^*, \hat{u}, p) = 0 \qquad (12)$$

is a system of non-linear algebraic equations having n equations in n unknowns. Hence, the proposition holds by virtue of the inverse function theorem. □

Remark 1. The properties of F in the proposition are determined by the right-hand side function f of the dynamical system (1). Hence, we require the C^1 smoothness of f in order to apply Proposition 1.

Proposition 2. The best estimation for the parameter p is obtained if t* is chosen according to Proposition 1 such that the sum of the absolute errors between the solutions of the neural network and the measured data is minimum. That is, choose t* such that

$$\sum_{j=1}^{n} \left| \hat{u}_j(t^*, W^*, b^*) - u_j^*(t^*) \right| \qquad (13)$$

is minimum.

Proof. Condition (13) ensures the best fit for the solution trajectories. □

Case 2. The number of parameters does not equal the number of equations of the dynamical system. In this case, we can solve the non-linear system (11) by minimizing the following objective function,

$$J_2(p) = \frac{1}{2} \sum_{i=1}^{m} \left\| \frac{d\hat{u}}{dt}\left(t^{(i)}, W^*, b^*\right) - f\left(t^{(i)}, \hat{u}, p\right) \right\|^2. \qquad (14)$$

Remark 2. Let us note that the approach suggested in Case 2 can also be applied to Case 1, but not the other way round.

4. Vectorized Algorithm for Parameter Estimation

In this section, we describe the DNN algorithm to estimate the parameters of the system of differential equations (1). It differs from the algorithm in [36]. The current algorithm follows supervised learning, since we have the target output. Additionally, it includes the case where the number of parameters equals the number of unknowns. A code sketch of the training steps is given after this list.

(1) Input data: Define the vector T = [t^(1), t^(2), ..., t^(m)] of size 1 × m.
(2) Define the deep neural network structure: Determine the number of layers L, the input layer (having one unit), the L − 1 hidden layers (having n_l units), and the output layer (having n units).
(3) Initialize the network parameters: Choose the weights to have small random entries, whereas the biases and the moment matrices in the Adam algorithm have zero entries.
(i) W_1 has dimension n_1 × 1;
(ii) W_l has dimension n_l × n_{l−1}, and b_l has dimension n_l × 1;
(iii) W_L has dimension n × n_{L−1}, and b_L has dimension n × 1;
(iv) M_w and V_w have the same size as the corresponding W; and
(v) M_b and V_b have the same size as the corresponding b.
(4) Forward propagation:
(i) For the input layer, start by assigning A_0 = T.
(ii) For the hidden layers, 1 ≤ l ≤ L − 1,
$$Z_l = W_l A_{l-1} + b_l, \quad A_l = \sigma_l(Z_l), \qquad (15)$$
where σ_l is the activation function corresponding to the l-th hidden layer.
(iii) For the output layer,
$$Z_L = W_L A_{L-1} + b_L, \quad A_L = \sigma_L(Z_L). \qquad (16)$$
(iv) Assign the trial solution using equation (8).
(5) Compute the cost J_1(T, W, b) and its gradient (9): Calculate the partial derivatives of the J_1 function, given in (9), with respect to T, W, and b. To carry out these computations, we apply automatic differentiation techniques [37, 40].
(6) Update the learning parameters using the Adam method described in (10).
(7) Solve the non-linear system of algebraic equations (11) to obtain the parameter p. Recall that there are two possible scenarios.
(i) If the number of unknowns equals that of the parameters, i.e., m = n, find t* which minimizes the absolute error given by (13) and solve the algebraic equation given by (12).
(ii) If m ≠ n, solve the minimization problem (14).
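Steps (2)–(6) can be sketched compactly in Python with the Autograd package. The sketch below is ours and rests on two labelled assumptions: the trial solution of equation (8), not reproduced in this excerpt, is taken to be of the Lagaris type û(t) = u_0 + t N(t, W, b), and the cost J_1 is taken to be the squared misfit to the measured data, in line with the supervised-learning description above.

```python
import autograd.numpy as np
from autograd import grad

def init_params(layer_sizes, seed=0):
    # Step (3): small random weights and zero biases;
    # layer_sizes = [1, n_1, ..., n_{L-1}, n].
    np.random.seed(seed)
    return [(0.1 * np.random.randn(out, inp), np.zeros((out, 1)))
            for inp, out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, T):
    # Step (4): vectorized forward pass over all time points at once.
    # T is the 1 x m row vector of times, so each A_l has shape (n_l, m).
    A = T
    for W, b in params[:-1]:
        A = np.tanh(np.dot(W, A) + b)   # hidden layers, eq. (15)
    W, b = params[-1]
    return np.dot(W, A) + b             # output layer, eq. (16)

def cost(params, T, u0, U_data):
    # Stand-in for J1: misfit between an assumed Lagaris-type trial
    # solution u_hat(t) = u0 + t * N(t, W, b) and the measured data;
    # u0 is stored as a column vector of shape (n, 1).
    U_hat = u0 + T * forward(params, T)
    return 0.5 * np.sum((U_hat - U_data) ** 2)

cost_grad = grad(cost)  # step (5): automatic differentiation

def train(params, T, u0, U_data, epochs=90000, lr=1e-3,
          b1=0.9, b2=0.999, eps=1e-8):
    # Step (6): standard Adam updates for every weight and bias.
    M = [(np.zeros_like(W), np.zeros_like(b)) for W, b in params]
    V = [(np.zeros_like(W), np.zeros_like(b)) for W, b in params]
    for k in range(1, epochs + 1):
        G = cost_grad(params, T, u0, U_data)
        P, Mn, Vn = [], [], []
        for (W, b), (gW, gb), (mW, mb), (vW, vb) in zip(params, G, M, V):
            mW, mb = b1 * mW + (1 - b1) * gW, b1 * mb + (1 - b1) * gb
            vW, vb = b2 * vW + (1 - b2) * gW ** 2, b2 * vb + (1 - b2) * gb ** 2
            W = W - lr * (mW / (1 - b1 ** k)) / (np.sqrt(vW / (1 - b2 ** k)) + eps)
            b = b - lr * (mb / (1 - b1 ** k)) / (np.sqrt(vb / (1 - b2 ** k)) + eps)
            P.append((W, b)); Mn.append((mW, mb)); Vn.append((vW, vb))
        params, M, V = P, Mn, Vn
    return params
```

A call such as `params = train(init_params([1, 40, 2]), T, u0, U_data)` would then mimic the single-hidden-layer setup used in the experiments of the next section.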
5. Numerical Experiments

In this section, we apply the algorithm proposed in the previous Section 4, implemented in Python, to four different benchmark problems taken from the literature. The choice of the numbers of neurons, numbers of hidden layers, activation functions, and other network parameters is specified within each example. Three out of four of these problems have an analytical exact solution, which is then used for validating the model.

Example 1. In this example, taken from [34, 35] and ([6], problem 1), we consider the following system of differential equations related to the mathematical modeling of a chain of irreversible chemical reactions:

$$\frac{du_1}{dt} = -k_1 u_1, \quad u_1(0) = 1, \qquad \frac{du_2}{dt} = k_1 u_1 - k_2 u_2, \quad u_2(0) = 0. \qquad (17)$$

This system consists of two unknown functions and two parameters. In this case, given the parameters [k_1, k_2] = [5, 1], the analytical expression of the solution is as follows:

$$u_1(t) = e^{-5t}, \qquad u_2(t) = \frac{5}{4} e^{-5t}\left(-1 + e^{4t}\right). \qquad (18)$$

Let us take discrete points from the analytical solution (18); see Table 1.

Table 1: Discrete points of the analytical solution (18).

t     u1        u2
0.0   1.000000  0.000000
0.1   0.606531  0.372883
0.2   0.367879  0.563564
0.3   0.223130  0.647110
0.4   0.135335  0.668731
0.5   0.082085  0.655557
0.6   0.049787  0.623781
0.7   0.030197  0.582985
0.8   0.018316  0.538767
0.9   0.011109  0.494326
1.0   0.006738  0.451427

To solve this problem, we considered a neural architecture with one hidden layer of h = 40 neurons. Furthermore, sigmoid activation functions were applied to each unit, and the network was trained with 90000 epochs.

The results of applying the algorithm are shown in Figure 2. On the one hand, Figure 2(a) proves the accuracy of the method, as we cannot distinguish between the analytical and the approximated solution. On the other hand, Figure 2(b) shows that the absolute error is bounded by 0.00015.

The parameters k_1 and k_2 were worked out using both approaches, Case 1 and Case 2 (see Remark 2). For the approach described in Case 1, the corresponding system of algebraic equations (11) was solved at t* = 0.6. Table 2 shows the estimated parameters obtained through the approaches described in Case 1 and Case 2, which can both be applied due to Remark 2. In this particular example, the approach from Case 1 proved to be more accurate. Moreover, the computational time of Step 7 in the algorithm was 2.5 ms CPU time for the first approach, whereas for the second approach, it was 7.6 ms.

Example 2. This dynamical system models a biomass transfer ([41], p. 531):

$$\frac{du_1}{dt} = -c_1 u_1 + c_2 u_2, \qquad \frac{du_2}{dt} = -c_2 u_2 + c_3 u_3, \qquad \frac{du_3}{dt} = -c_3 u_3, \qquad (19)$$

with initial condition u(0) = (0, 0, 1)^T. For c = [1, 3, 5], the analytical solution is given by the following:

$$u_1(t) = \frac{15}{8}\left(e^{-5t} - 2e^{-3t} + e^{-t}\right), \qquad u_2(t) = \frac{5}{2}\left(-e^{-5t} + e^{-3t}\right), \qquad u_3(t) = e^{-5t}. \qquad (20)$$

Following a similar procedure as for the previous application example, we consider 11 points uniformly distributed in the interval from the analytical solution (20); see Table 3.

In this example, we have three unknown functions and three system parameters. The neural architecture considered has one hidden layer with h = 10 neurons and tanh activation functions for each unit. The network is trained with 71800 epochs, and the results are shown in Figure 3. On the one hand, Figure 3(a) shows the estimated solutions using the DNN approach and the analytical solutions, and proves again the accuracy of the method. On the other hand, Figure 3(b) shows that the absolute error is bounded by 0.0002. The corresponding system of algebraic equations (11) was solved at t* = 0.4. Again, the parameter estimation was conducted using the approaches from Case 1 and Case 2. The output is shown in Table 4. Last, the computational time of Step 7 in the algorithm was 2.9 ms CPU time for the first approach and 10.5 ms for the second approach.
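Both examples above use the same Case 1 machinery in Step 7. The following sketch (ours) spells it out for Example 1 with a standard root finder; here `u_hat` and `du_hat` stand for the trained network's solution and its time derivative, which we assume are available, e.g., from the forward pass and automatic differentiation.

```python
import numpy as np
from scipy.optimize import fsolve

def pick_t_star(t_data, u_data, u_hat):
    # Proposition 2 / eq. (13): choose the data point where the total
    # absolute error between network solution and data is smallest.
    errs = [np.sum(np.abs(u_hat(t) - u_data[:, i]))
            for i, t in enumerate(t_data)]
    return t_data[int(np.argmin(errs))]

def F(p, t_star, u_hat, du_hat):
    # Eq. (12): F(p) = C* - f(t*, u_hat, p) with f from eq. (17).
    k1, k2 = p
    u = u_hat(t_star)       # network solution at t*
    C = du_hat(t_star)      # its time derivative at t*
    return np.array([C[0] - (-k1 * u[0]),
                     C[1] - (k1 * u[0] - k2 * u[1])])

# t_star = pick_t_star(t_data, u_data, u_hat)   # gave t* = 0.6 above
# k_est = fsolve(F, x0=np.ones(2), args=(t_star, u_hat, du_hat))
```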
Figure 2: Results of Example 1. (a) Exact solution, ANN solution, and data. (b) Absolute error between data and ANN solutions.
Table 3: Data generated for Example 2.

t     u1        u2        u3
0.0   0.000000  0.000000  1.000000
0.1   0.055747  0.335719  0.606531
0.2   0.166850  0.452330  0.367879
0.3   0.282767  0.458599  0.223130
0.4   0.381125  0.414647  0.135335
0.5   0.454416  0.352613  0.082085
0.6   0.502502  0.288780  0.049787
0.7   0.528506  0.230648  0.030197
0.8   0.536641  0.181006  0.018316
0.9   0.531127  0.140241  0.011109
1.0   0.515706  0.107623  0.006738

We conclude that in this example, the approach described in Case 1 was again more successful.

Example 3. The next problem is taken from [20] and consists of a dynamical system with an initial value condition containing three parameters p_1, p_2, and p_3; see also ([6], problem 7). The model occurs in several applications such as chemical kinetics, theoretical biology, ecology, etc.:

$$\frac{du_1}{dt} = p_1 u_1 - p_2 u_1 u_2, \quad u_1(0) = 1.0, \qquad \frac{du_2}{dt} = p_2 u_1 u_2 - p_3 u_2, \quad u_2(0) = 0.3. \qquad (21)$$

The data are taken from reference [20], as shown in Table 5.

It is to be noted that the number of unknowns and the number of parameters do not coincide in this example. Hence, unique solvability for the corresponding algebraic system (11) is not guaranteed. For solving this ODE system, we considered a single-hidden-layer neural network with 60 units in the hidden layer; we chose 100000 epochs, and the activation functions in the hidden layer were tanh functions. Using the approach described in Case 2, we obtain p = [p_1, p_2, p_3] = [0.854, 2.201, 2.013]. The proposed algorithm gives a better fit compared to the results reported in [20]. This estimation of the parameters provides an accurate solution; see Figure 4.
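Since the number of parameters (m = 3) exceeds the number of unknowns (n = 2) here, Step 7 falls under Case 2 and amounts to minimizing J_2 in (14). The following sketch (ours, with the same assumed `u_hat` and `du_hat` helpers as before) casts this as a standard least-squares problem for system (21).

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(p, t_pts, u_hat, du_hat):
    # Stacked residuals du_hat/dt(t_i) - f(t_i, u_hat(t_i), p) over all
    # time points; minimizing half their squared sum is exactly (14).
    p1, p2, p3 = p
    res = []
    for t in t_pts:
        u1, u2 = u_hat(t)
        c1, c2 = du_hat(t)
        res += [c1 - (p1 * u1 - p2 * u1 * u2),
                c2 - (p2 * u1 * u2 - p3 * u2)]
    return np.array(res)

# p_est = least_squares(residuals, x0=np.ones(3),
#                       args=(t_pts, u_hat, du_hat)).x
```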
Figure 3: Plot for comparing the ANN solution and the exact solution. (a) Exact solution, ANN, and data. (b) Absolute error.
Example 4. Consider a dynamical system with three state variables which models a methanol-to-hydrocarbon process ([34], Example 5):

$$\frac{du_1}{dt} = -\left(2k_1 - \frac{k_1 u_2}{(k_2 + k_5)u_1 + u_2} + k_3 + k_4\right) u_1,$$
$$\frac{du_2}{dt} = \frac{k_1 u_1 (k_2 u_1 - u_2)}{(k_2 + k_5)u_1 + u_2} + k_3 u_1, \qquad (22)$$
$$\frac{du_3}{dt} = \frac{k_1 u_1 (u_2 + k_5 u_1)}{(k_2 + k_5)u_1 + u_2} + k_4 u_1,$$

with initial conditions u_0 = [1, 0, 0] and t ∈ [0, 1]. We took the data from the reference ([34], Table 6).

Using the second approach, we estimated the parameters k_1, k_2, k_3, k_4, and k_5 for Example 4. Here, the sigmoid activation function with 90000 epochs and one hidden layer having 40 neurons was used. See Table 7 for a comparison. Figure 5 shows how the solution trajectories nicely fit the data. Table 8 shows that the results obtained in this paper are in agreement with those from the existing literature.

Table 5: Measurements taken at discrete points for Example 3.

t     u1    u2
0.0   1.0   0.3
0.5   1.1   0.35
1.0   1.3   0.4
1.5   1.1   0.5
2.0   0.9   0.5
2.5   0.7   0.4
3.0   0.5   0.3
3.5   0.6   0.25
4.0   0.7   0.25
4.5   0.8   0.3
5.0   1.0   0.35

Figure 4: Neural network solution and the measured data for Example 3.
Figure 5: Neural network solution and the measured data for Example 4.
5.1. Shallow vs. Deep Layers. Here, we looked at the effect of adding more layers to the network. Specifically, we compared network architectures with one, two, and three hidden layers. We considered Examples 1 and 2. In the three possible network structures, we take the same activation function, 1/(1 + e^{−2.5x}), and we fix the error tolerance of the cost function at 10^{−6}. Although further analysis is required, the experimental results from Tables 9 and 6 show that reasonable accuracy of the parameter estimation can be achieved by increasing the number of hidden layers, with fewer epochs and less computation time.
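For reference, this comparison only changes the activation function and the layer list relative to the earlier training sketch; a minimal configuration (ours, reusing the hypothetical init_params/train/forward helpers from the sketch in Section 4) could look as follows.

```python
import autograd.numpy as np

def sigma(x):
    # Activation used in the comparison: 1 / (1 + exp(-2.5 x)).
    return 1.0 / (1.0 + np.exp(-2.5 * x))

# Input is the single time value, output the n = 2 states of Example 1;
# training runs until the cost drops below the 1e-6 tolerance.
architectures = {
    "1 hidden layer":  [1, 40, 2],
    "2 hidden layers": [1, 40, 40, 2],
    "3 hidden layers": [1, 40, 40, 40, 2],
}
# e.g. params = train(init_params(sizes), T, u0, U_data), with np.tanh
# replaced by sigma in forward() from the earlier sketch.
```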
Table 9: Comparison of shallow vs. deep layers for Example 1.

Layers and neurons    k1      k2      Time (sec)   Epochs
1, (40)               5.0460  0.9979  19           21378
2, (40, 40)           5.0549  0.9997  30           24358
3, (40, 40, 40)       4.9103  1.0129  26           16021

6. Conclusions and Outlook

In this paper, we developed a vectorized deep neural network algorithm for estimating the unknown parameters in a dynamical system based on a system of ODEs, as well as the solution functions of the system. This problem arises in different sciences, as shown in the Numerical Experiments section. We highlighted that there are two approaches applicable to problems where the number of parameters coincides with the number of ODEs in the system. In this case, the system of algebraic equations arising from the optimal learning parameters (weights and biases) is uniquely solved at a specific point, provided that the right-hand side function satisfies the conditions stated in Proposition 1. Choosing t* in this way significantly improves the accuracy of the parameter estimation.

Last, the experimental results show that, for deep neural networks, reasonable accuracy of parameter estimation can be achieved with fewer epochs and hence less running time.

For future work, it is worth comparing the proposed method with traditional methods in terms of accuracy, computational complexity, and robustness. Furthermore, it is also worth comparing it with other neural network architectures. Moreover, the method is yet to be exploited, for instance, for solving delay differential equations, stochastic dynamical systems, or rather more complex problems such as integral equations.

Data Availability

The underlying data supporting these results are available from the literature cited in this article.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The third author truly acknowledges the support received from the grant PID2020-117800GB-I00 of the Ministry of Science and Innovation of the Government of Spain for the research project METRICA.

References

[1] M. Ashyraliyev, Y. Fomekong-Nanfack, J. A. Kaandorp, and J. G. Blom, "Systems biology: parameter estimation for biochemical models," FEBS Journal, vol. 276, no. 4, pp. 886–902, 2009.
[2] D. Gonze, K. Z. Coyte, L. Lahti, and K. Faust, "Microbial communities as dynamical systems," Current Opinion in Microbiology, vol. 44, pp. 41–49, 2018.
[3] S. Qureshi and A. Yusuf, "Mathematical modeling for the impacts of deforestation on wildlife species using Caputo differential operator," Chaos, Solitons & Fractals, vol. 126, pp. 32–40, 2019.
[4] F. Brauer, C. Castillo-Chavez, and Z. Feng, Mathematical Models in Epidemiology, vol. 32, Springer, Berlin, Germany, 2019.
[5] D. J. Higham, "Modeling and simulating chemical reactions," SIAM Review, vol. 50, no. 2, pp. 347–368, 2008.
[6] I. B. Tjoa and L. T. Biegler, "Simultaneous solution and optimization strategies for parameter estimation of differential-algebraic equation systems," Industrial & Engineering Chemistry Research, vol. 30, no. 2, pp. 376–385, 1991.
[7] M. Peifer and J. Timmer, "Parameter estimation in ordinary differential equations for biochemical processes using the method of multiple shooting," IET Systems Biology, vol. 1, no. 2, pp. 78–88, 2007.
[8] J. Han, A. Jentzen, and W. E, "Solving high-dimensional partial differential equations using deep learning," Proceedings of the National Academy of Sciences, vol. 115, no. 34, pp. 8505–8510, 2018.
[9] H. Murakami and R. Zimka, "On dynamics in a two-sector Keynesian model of business cycles," Chaos, Solitons & Fractals, vol. 130, Article ID 109419, 2020.
[10] V. Novotná and V. Štěpánková, "Parameter estimation for dynamic model of the financial system," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, vol. 63, no. 6, pp. 2051–2055, 2015.
[11] M. Nouiehed and M. Razaviyayn, "Learning deep models: critical points and local openness," 2018, https://arxiv.org/abs/1803.02968.
[12] C. Yun, S. Sra, and A. Jadbabaie, "A critical view of global optimality in deep learning," 2018, https://arxiv.org/abs/1802.03487.
[13] N. Kalogerakis and R. Luus, "Improvement of Gauss-Newton method for parameter estimation through the use of information index," Industrial & Engineering Chemistry Fundamentals, vol. 22, no. 4, pp. 436–445, 1983.
[14] H. U. Voss, J. Timmer, and J. Kurths, "Nonlinear dynamical system identification from uncertain and indirect measurements," International Journal of Bifurcation and Chaos in Applied Sciences and Engineering, vol. 14, no. 6, pp. 1905–1933, 2004.
[15] N. Baden and J. Villadsen, "A family of collocation based methods for parameter estimation in differential equations," The Chemical Engineering Journal, vol. 23, no. 1, pp. 1–13, 1982.
[16] O. Aydogmus and A. H. Tor, "A modified multiple shooting algorithm for parameter estimation in ODEs using adjoint sensitivity analysis," Applied Mathematics and Computation, vol. 390, 2021.
[17] B. Wang and W. Enright, "Parameter estimation for ODEs using a cross-entropy approach," SIAM Journal on Scientific Computing, vol. 35, no. 6, 2013.
[18] J. O. Ramsay, G. Hooker, D. Campbell, and J. Cao, "Parameter estimation for differential equations: a generalized smoothing approach," Journal of the Royal Statistical Society: Series B, vol. 69, no. 5, pp. 741–796, 2007.
[19] A. A. Poyton, M. S. Varziri, K. B. McAuley, P. J. McLellan, and J. O. Ramsay, "Parameter estimation in continuous-time dynamic models using principal differential analysis," Computers & Chemical Engineering, vol. 30, no. 4, pp. 698–708, 2006.
[20] L. Edsberg and P.-A. Wedin, "Numerical tools for parameter estimation in ODE-systems," Optimization Methods and Software, vol. 6, no. 3, pp. 193–217, 1995.
[21] A. Gábor and J. R. Banga, "Robust and efficient parameter estimation in dynamic models of biological systems," BMC Systems Biology, vol. 9, no. 1, article 74, 2015.
[22] W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115–133, 1943.
[23] N. Yadav, A. Yadav, and M. Kumar, An Introduction to Neural Network Methods for Differential Equations, Springer, Berlin, Germany, 2015.
[24] I. A. Basheer and M. Hajmeer, "Artificial neural networks: fundamentals, computing, design, and application," Journal of Microbiological Methods, vol. 43, no. 1, pp. 3–31, 2000.
[25] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[26] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, vol. 1, MIT Press, Cambridge, 2016.
[27] S. Dong, P. Wang, and K. Abbas, "A survey on deep learning and its applications," Computer Science Review, vol. 40, Article ID 100379, 2021.
[28] P. Dixit and S. Silakari, "Deep learning algorithms for cybersecurity applications: a technological and status review," Computer Science Review, vol. 39, Article ID 100317, 2021.
[29] S. Minaee, Y. Y. Boykov, F. Porikli, A. J. Plaza, N. Kehtarnavaz, and D. Terzopoulos, "Image segmentation using deep learning: a survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[30] J. Bruna and L. Dec, Mathematics of Deep Learning, Courant Institute of Mathematical Science, NYU, 2018.
[31] M. Paliwal and U. A. Kumar, "Neural networks and statistical techniques: a review of applications," Expert Systems with Applications, vol. 36, no. 1, pp. 2–17, 2009.
[32] C. G. Villegas-Mier, J. Rodriguez-Resendiz, J. M. Álvarez-Alvarado, H. Rodriguez-Resendiz, A. M. Herrera-Navarro, and O. Rodríguez-Abreo, "Artificial neural networks in MPPT algorithms for optimization of photovoltaic power systems: a review," Micromachines, vol. 12, no. 10, article 1260, 2021.
[33] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256, Chia Laguna Resort, Sardinia, Italy, May 2010.
[34] V. Dua, "An artificial neural network approximation based decomposition approach for parameter estimation of system of ordinary differential equations," Computers & Chemical Engineering, vol. 35, no. 3, pp. 545–553, 2011.
[35] V. Dua and P. Dua, "A simultaneous approach for parameter estimation of a system of ordinary differential equations, using artificial neural network approximation," Industrial & Engineering Chemistry Research, vol. 51, no. 4, pp. 1809–1814, 2012.
[36] T. T. Dufera, "Deep neural network for system of ordinary differential equations: vectorized algorithm and simulation," Machine Learning with Applications, vol. 5, Article ID 100058, 2021.
[37] J. Bradbury, R. Frostig, P. Hawkins et al., "JAX: composable transformations of Python+NumPy programs," 2018, http://github.com/google/jax.
[38] L. Balles and P. Hennig, "Dissecting Adam: the sign, magnitude and variance of stochastic gradients," in Proceedings of the International Conference on Machine Learning, pp. 404–413, PMLR, 2018.
[39] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2014, https://arxiv.org/abs/1412.6980.
[40] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, "Automatic differentiation in machine learning: a survey," Journal of Machine Learning Research, vol. 18, 2018.
[41] B. Winkel and G. B. Gustafson, "Differential equations course materials," 2017, https://www.simiode.org/resources/3892.