Physics-Informed Neural Networks for Quantum Eigenvalue Problems
Henry Jin
School of Engineering and Applied Sciences
Harvard University
Cambridge, MA, USA
[email protected]

Marios Mattheakis
School of Engineering and Applied Sciences
Harvard University
Cambridge, MA, USA
[email protected]
Pavlos Protopapas
School of Engineering and Applied Sciences
Harvard University
Cambridge, MA, USA
[email protected]
Abstract—Eigenvalue problems are critical to several fields of science and engineering. We expand on the method of using unsupervised neural networks for discovering eigenfunctions and eigenvalues for differential eigenvalue problems. The obtained solutions are given in an analytical and differentiable form that identically satisfies the desired boundary conditions. The network optimization is data-free and depends solely on the predictions of the neural network. We introduce two physics-informed loss functions. The first, called ortho-loss, motivates the network to discover pair-wise orthogonal eigenfunctions. The second loss term, called norm-loss, requests the discovery of normalized eigenfunctions and is used to avoid trivial solutions. We find that embedding even or odd symmetries in the neural network architecture further improves the convergence for relevant problems. Lastly, a patience condition can be used to automatically recognize eigenfunction solutions. The proposed unsupervised learning method is used to solve the finite well, multiple finite wells, and hydrogen atom quantum eigenvalue problems.

Index Terms—neural networks, eigenvalue, eigenfunction, differential equation

I. INTRODUCTION

Differential equations are prevalent in every field of science and engineering, ranging from physics to economics. Thus, extensive research has been done on developing numerical methods for solving differential equations. With the unprecedented availability of computational power, neural networks hold promise in redefining how computational problems are solved or improving existing numerical methods. Among other applications in scientific computing, neural networks are capable of efficiently solving differential equations [1]–[4]. These neural network solvers pose several advantages over numerical integrators: the obtained solutions are analytical and differentiable [3], numerical errors are not accumulated [4], networks are more robust against the 'curse of dimensionality' [5], [6], a family of solutions corresponding to different initial or boundary conditions can be constructed [7], the neural solutions can be transferred for fast discovery of new solutions [8], [9], inverse problems can be solved systematically [10], [11], and available data can be incorporated into the loss function to improve the network's performance [12].

Eigenvalue differential equations with certain boundary conditions appear in a wide range of problems of applied mathematics and physics, including quantum mechanics and electromagnetism. Lagaris et al. [1] have shown that neural networks are able to solve eigenvalue problems and proposed a partially iterative method that solves a differential equation with a fixed eigenvalue at each iteration. More recently, Li et al. [13] showed that neural networks can solve the stationary Schrödinger equation for systems of coupled quantum oscillators; this is a variational approach where the eigenvalue is indirectly calculated from the predicted eigenfunction. Our work expands on the unsupervised neural network eigenvalue solver presented by Jin et al. [14], which simultaneously and directly learns the eigenvalues and the associated eigenfunctions using a scanning mechanism. Here, we introduce physics-informed improvements to the regularization loss terms: an orthogonal loss (ortho-loss) and a normalization loss (norm-loss). We further design special neural network architectures with embedded symmetries that ensure the prediction of perfectly even or odd eigenfunctions. Furthermore, a modified parameterization is introduced to handle problems with non-zero boundary conditions. The proposed technique is an extension of physics-informed neural network differential equation solvers and, consequently, inherits all the benefits that neural network solvers have over numerical integrators. Moreover, our method has an additional advantage over integrators in that it discovers solutions that identically satisfy the boundary conditions.

We assess the performance of the proposed architecture by solving a number of standard eigenvalue problems of quantum mechanics: the single finite square well, multiple finite square wells, and the hydrogen atom.

II. BACKGROUND

This study extends the method presented in [14], where a fully connected neural network architecture was proposed,
with a single output corresponding to the predicted eigenfunction, and with a constant input node designed to learn constant eigenvalues through backpropagation. To identically satisfy the boundary conditions, a parametric function was used. In order for the network to find non-trivial solutions to the differential eigenvalue equation, the two regularization loss functions

Lf = 1 / f(x, λ)²,   Lλ = 1 / λ²   (1)

were used to penalize trivial eigenfunctions and zero eigenvalues, respectively. Moreover, a scanning mechanism allows the network to search the eigenvalue space for eigenfunctions of different eigenvalues, enabled by the loss term defined as

Ldrive = e^(−λ + c),   (2)

where c was a value that changed during training through scheduled increases and was used to control the scanning.

The research by Li et al. [13] on neural network-based multi-state solvers is also relevant to this study. However, we present some novelties and differences in methodology. Specifically, we assign a trainable network parameter to discover the eigenvalue instead of indirectly calculating it through the expectation of the Hamiltonian of the system. Our approach avoids the repeated calculation of an integral (i.e., for the expectation value), which would otherwise be evaluated at every training epoch. The second novelty of our approach is the embedding of physical symmetries in the network architecture. The symmetry of the wavefunctions can be determined by the symmetry of the given potential function. We design a specialized architecture with embedded even or odd symmetry that significantly improves the overall network optimization. Finally, we suggest a parameterization that identically satisfies non-zero boundary conditions, which is necessary to solve the radial equation of the hydrogen atom.

Orthogonality loss is also used in [13], where it is leveraged to simultaneously produce multiple eigenvalue solution outputs that are pair-wise orthogonal. This differs from our method, since our neural network outputs one solution at a time, and the orthogonality loss term is used to prevent us from finding the same solution multiple times.

III. METHODOLOGY

We consider an eigenvalue problem that exhibits the form:

Lf(x) = λf(x),   (3)

where x is the spatial variable, L is a differential operator that depends on x and its derivatives, f(x) is the eigenfunction, and λ is the associated eigenvalue. For the finite square well problems, we assume Dirichlet boundary conditions at the left and right boundaries xL and xR, respectively, such that f(xL) = f(xR) = fb, where fb is a given constant boundary value. On the other hand, for the hydrogen atom problem, a single Dirichlet boundary condition f(xR) = fb is enforced.

We expand on the network architecture proposed by [14] and shown in Fig. 1. This feed-forward neural network is capable of solving Eq. (3) when both f(x) and λ are unknown. The network takes two inputs, the variable x and a constant input of ones. The constant input feeds into a single linear neuron (affine transformation) that is updated through optimization, allowing the network to find a constant λ. Afterwards, x and λ are inputs to a fully-connected feed-forward neural network that returns an output function N(x, λ). The predicted eigenfunctions f(x, λ) are defined using a parametric trick, similar to [4], according to the equation:

f(x, λ) = fb + g(x) N(x, λ).   (4)

By choosing an appropriate g(x), the predicted eigenfunction identically satisfies certain boundary conditions.

Fig. 1: Physics-informed neural architecture for solving eigenvalue problems.
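For concreteness, the following PyTorch sketch is one plausible realization of this architecture; the class names and code organization are our illustrative assumptions rather than a transcription of the published implementation, while the width, depth, and sin(·) activation follow Section IV, and g(x) anticipates Eq. (8).

    import torch
    import torch.nn as nn

    class Sin(nn.Module):
        # sin(.) activation, which accelerates convergence (see Section IV)
        def forward(self, z):
            return torch.sin(z)

    class EigenNet(nn.Module):
        # Learns the pair (f, lambda): a single linear neuron maps a constant
        # input of ones to lambda, and an MLP produces N(x, lambda).
        def __init__(self, x_L, x_R, f_b=0.0, hidden=50):
            super().__init__()
            self.x_L, self.x_R, self.f_b = x_L, x_R, f_b
            self.lam = nn.Linear(1, 1)  # eigenvalue weight, updated by backpropagation
            self.body = nn.Sequential(
                nn.Linear(2, hidden), Sin(),
                nn.Linear(hidden, hidden), Sin(),
                nn.Linear(hidden, 1),
            )

        def forward(self, x):
            lam = self.lam(torch.ones_like(x))          # constant eigenvalue
            N = self.body(torch.cat([x, lam], dim=1))   # N(x, lambda)
            # parametric trick of Eq. (4) with the g(x) of Eq. (8)
            g = (1 - torch.exp(-(x - self.x_L))) * (1 - torch.exp(-(x - self.x_R)))
            return self.f_b + g * N, lam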
Our aim is to discover pairs of f(x, λ) and λ that approximately satisfy Eq. (3). This is achieved by minimizing, during the network optimization, a loss function L defined from Eq. (3) as:

L = LDE + Lreg = (1/M) Σ_{i=1..M} [Lf(xi, λ) − λf(xi, λ)]² + Lreg,   (5)

where the averaging with respect to xi in LDE runs over the M training sample points, namely x = (x1, · · · , xM). Any derivative with respect to xi contained in L is calculated using the auto-differentiation technique [15]. The Lreg term in Eq. (5) contains the regularization loss terms. In this work, we introduce and apply a regularization function that consists of three terms, Lreg = νnorm Lnorm + νorth Lorth + νdrive Ldrive. Empirically, for the problems discussed below, we found the optimal regularization coefficients to be νnorm = νorth = 1. The normalization loss Lnorm encourages normalized eigenfunctions, avoiding the discovery of trivial eigenfunctions and eigenvalues, since it enforces a non-zero solution as well as constraining the eigenfunction's squared integral to be finite. The Lorth term motivates the network to scan for orthogonal eigenfunctions and can replace or assist the non-physical scanning (Ldrive) method used in [14]. Ldrive corresponds to the scanning method, which is used to guide the model's eigenvalue weight and is given by Eq. (2). However, for the experiments presented in this study, we use Lorth as a physics-informed regularization term that replaces the non-physical scanning method based on Ldrive, and thus νdrive is set to 0.
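In code, the composite objective of Eq. (5) is then a weighted sum; the function below is a minimal sketch (with names of our choosing), assuming the individual terms are computed as in Eq. (11) and the following subsections:

    def total_loss(L_DE, L_norm, L_orth, L_drive,
                   nu_norm=1.0, nu_orth=1.0, nu_drive=0.0):
        # Eq. (5): L = L_DE + Lreg; nu_drive defaults to 0 because Lorth
        # replaces the scanning term in the experiments of this paper.
        return L_DE + nu_norm * L_norm + nu_orth * L_orth + nu_drive * L_drive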
A. Normalization Loss

Our contribution includes a novel approach to solving the trivial solution problem. While [14] employed the non-trivial eigenfunction and non-trivial eigenvalue loss terms Lf and Lλ, as described in Eq. (1), these loss terms cannot numerically converge to 0 without scaling the solutions to infinity, and thus they introduce numerical error. While they were effective for preventing the network from converging to trivial f(x) and λ, they hold no physical meaning. We present a physics-aware regularization loss function that not only prevents trivial solutions, but also motivates the eigenfunction's inner product with itself to approach a specific constant, which is the normalization constraint physically required of eigenfunctions in quantum mechanics. Thus, Lnorm is given by

Lnorm = ( f(x, λ) · f(x, λ) − M / (xR − xL) )²,   (6)

where the dot denotes the inner product, f(x, λ) represents the network solution evaluated at the sample points, M is the number of samples, and xR − xL is the training range. The loss function in Eq. (6) drives the network to find solutions with non-zero integrals; specifically, it motivates the network solution to have a squared integral equal to one. Unlike Lf and Lλ, Lnorm can strictly reach zero and can also satisfy the normality constraint for eigenfunction solutions of Schrödinger's equation.
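A minimal sketch of Eq. (6), assuming f_pred holds the network output at the M sampled points:

    def norm_loss(f_pred, x_L, x_R):
        # Eq. (6): push sum_i f(x_i)^2 toward M / (x_R - x_L), i.e. a
        # Monte-Carlo estimate of the squared integral of f equal to one
        M = f_pred.shape[0]
        f = f_pred.flatten()
        return (torch.dot(f, f) - M / (x_R - x_L)) ** 2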
B. Orthogonality Loss

An orthogonality loss regularization function is included as part of Lreg to motivate the network to find different eigen-solutions of the Schrödinger equation. This presents a physics-informed approach whereby we can motivate a network to solve for orthogonal solutions for problems where it is known that the solutions are orthogonal, a fundamental property of linear differential eigenvalue problems. Schrödinger's equation is one such example, but this mechanism can be extended to any Hermitian operator. This serves as a replacement for, or an improvement over, solely relying on the scanning mechanism Ldrive presented in [14]. While a scanning search through the eigenvalue space using Ldrive can be useful for providing control over the model's search for eigenfunction solutions, solving equations that are known to be Hermitian (such as the Schrödinger equation) allows the use of an orthogonality loss term, since eigenfunctions of Hermitian operators are orthogonal. In this paper, we show that the neural network is able to find orthogonal eigenfunction solutions solely based on the orthogonality loss. This loss term is given by the following equation:

Lorth = ψeigen · ψ,   (7)

where ψeigen denotes the sum of all eigenfunctions that have already been discovered by the network during training, and ψ is the current network prediction. This regularization term embeds the network with a physics-informed predisposition towards finding orthogonal solutions to a Hermitian operator, serving as a more physics-aware loss term than the brute-force scanning approach.

Following the network's convergence to a new solution, the new eigenfunction is added to ψeigen, which is thus the linear combination of all the discovered solutions. Hence, a single orthogonality loss term is computed for each learning gradient, as opposed to separate orthogonality computations for each learned eigenfunction. This reduces the computational cost, since only one dot product is computed per training iteration, as opposed to multiple dot products, one with each found eigenfunction.
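The term of Eq. (7) might be computed as below. The stored (frozen) models are evaluated at the current sample points; squaring the dot product is our own choice here, so that the penalty is bounded below and minimized exactly at orthogonality:

    def orthogonality_loss(stored_models, model, x):
        # psi_eigen: sum of all previously discovered eigenfunctions
        # (Algorithm 1 stores a frozen copy of the model per accepted solution)
        with torch.no_grad():
            psi_eigen = sum(m(x)[0] for m in stored_models)
        psi, _ = model(x)                  # current prediction
        dot = torch.sum(psi_eigen * psi)   # a single dot product per iteration
        return dot ** 2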
C. Embedding Even and Odd Symmetry

For certain differential equations where prior information about the potential dictates even or odd symmetric eigenfunctions, the neural network architecture can be embedded with a physics-informed modification that enforces the correct symmetry in the eigenfunction output. As demonstrated by Mattheakis et al. in [16] and extended by [17], symmetry can be embedded by feeding a negated input stream in parallel to the original input, then combining the streams before the final dense layer. Adding the streams leads to even symmetric outputs, while subtracting them gives rise to odd symmetric predictions.

We found that embedding symmetry into our model significantly accelerates the convergence to a solution. This is relevant for the multiple finite square wells problem, as we demonstrate below.
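One plausible realization of this embedding, patterned after the hub architecture of [16] (the layer layout is our illustrative assumption): the shared trunk is applied to both x and −x, the streams are added or subtracted, and a bias-free final layer keeps the odd combination exactly antisymmetric:

    class SymmetricEigenNet(nn.Module):
        def __init__(self, hidden=50, even=True):
            super().__init__()
            self.even = even
            self.lam = nn.Linear(1, 1)
            self.trunk = nn.Sequential(
                nn.Linear(2, hidden), Sin(),
                nn.Linear(hidden, hidden), Sin(),
            )
            self.head = nn.Linear(hidden, 1, bias=False)  # bias-free preserves oddness

        def forward(self, x):
            lam = self.lam(torch.ones_like(x))
            h_pos = self.trunk(torch.cat([x, lam], dim=1))   # original input stream
            h_neg = self.trunk(torch.cat([-x, lam], dim=1))  # negated input stream
            h = h_pos + h_neg if self.even else h_pos - h_neg
            return self.head(h), lam

Note that if the output is further multiplied by a parametric factor g(x), as in Eq. (4), g(x) must itself respect the symmetry of the domain for the product to remain even or odd.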
D. Parametric Function

Selecting an appropriate parametric function g(x) is necessary for enforcing boundary conditions. The following parametric function enforces the f(xL) = f(xR) = 0 boundary conditions:

g(x) = (1 − e^(−(x−xL))) (1 − e^(−(x−xR))).   (8)

As demonstrated in [14], this parametric function is suitable for problems where the eigenfunctions are fixed to or converge to zero, as in the case of the infinite square well and the harmonic oscillator problems. In the following experiments, we employ the parametric function of Eq. (8) for finite square well problems, as they similarly require eigenfunctions to taper to zero at the domain limits.

The differential eigenvalue equation for the hydrogen atom, however, has a single zero boundary condition at x → ∞, as the fundamental solution is not fixed to 0 at the origin. For such problems, where a single Dirichlet boundary condition is required, we use the following parametric function:

g(x) = 1 − e^(−(x−xR)).   (9)
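Both choices of g(x) are one-line helpers in code (the function names are ours) and can be swapped into the architecture sketch of Section III:

    def g_two_sided(x, x_L, x_R):
        # Eq. (8): vanishes at both x_L and x_R
        return (1 - torch.exp(-(x - x_L))) * (1 - torch.exp(-(x - x_R)))

    def g_one_sided(x, x_R):
        # Eq. (9): vanishes only at x_R, leaving the other boundary free
        return 1 - torch.exp(-(x - x_R))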
E. Towards Solution Recognition

To automatically extract the correct eigenfunctions, we define convergence to an eigenfunction solution using two criteria: the differential equation loss LDE and patience. LDE describes the loss term for the differential eigenvalue equation in question. For our experiments, without loss of generality, we used Schrödinger's equation; nevertheless, the method is valid for any differential equation eigenvalue problem. Considering that perfect eigenvalue solutions will have an LDE loss equal to zero, we claim that a solution is found when LDE falls below a chosen threshold, which is a hyper-parameter of the training process.

The patience condition describes the model's training progress. When solving for a solution, the model initially improves very quickly, resulting in a fast decrease of LDE. However, over the course of converging to a solution, the rate of decrease in LDE decreases as well. Thus, we use the rate of decrease in LDE as another condition for solution recognition. If the rolling average of the successive differences in LDE over a specified window hyper-parameter falls below a chosen threshold hyper-parameter, we consider the patience condition to be met.

When both the LDE condition (LDE falling below a threshold) and the patience condition are satisfied, we consider an eigenvalue solution to have been found. On the other hand, if only the patience condition is satisfied, we interpret this to mean that the model has converged to a false solution. Consequently, we switch the symmetry of the model (from even to odd or vice versa) to motivate the network to search for other solutions. This approach of switching the symmetry of the model upon convergence to a false solution was inspired by our finding that the network's output after converging to a false solution resembled a true solution, but of the opposite symmetry. Upon adopting this switching approach, we found that the model was able to resume finding true solutions. The above method is described by Algorithm 1.
Algorithm 1 The Physics-Informed Neural Eigenvalue Solver Algorithm

1: Instantiate model with even symmetry
2: while training do
3:   Generate training samples xi
4:   Compute LDE, Lnorm
5:   Compute Lorth using all stored eigenfunctions
6:   Backpropagate and step
7:   if patience condition and LDE < threshold then
8:     Store copy of model
9:   else if patience condition then
10:    Switch model symmetry
11:  end if
12: end while
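As a concrete illustration, the patience condition might be monitored as in the following sketch; the window size, threshold, and helper names are our illustrative assumptions rather than the published hyper-parameters:

    def patience_met(de_history, window=100, threshold=1e-4):
        # rolling average of the successive differences in L_DE over a window;
        # when the average improvement stalls, the patience condition is met
        if len(de_history) < window + 1:
            return False
        diffs = [abs(de_history[-i] - de_history[-i - 1])
                 for i in range(1, window + 1)]
        return sum(diffs) / window < threshold

    # Inside the training loop (cf. Algorithm 1):
    #   if patience_met(history) and history[-1] < de_threshold:
    #       stored_models.append(copy.deepcopy(model))  # solution accepted
    #   elif patience_met(history):
    #       model.even = not model.even                 # false solution: switch symmetry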
IV. EXPERIMENTS

We evaluate the effectiveness of the proposed method by solving eigenvalue problems defined by Schrödinger's equation. Schrödinger's equation is the fundamental equation in quantum mechanics that describes the state wavefunction ψ(x) and the associated energy E of a quantum system. In this study, we are interested in solving the one-dimensional stationary Schrödinger equation defined as:

[ −(ħ²/2m) ∂²/∂x² + V(x) ] ψ(x) = E ψ(x),   (10)

where ħ and m stand for the reduced Planck constant and the mass, respectively, which, without loss of generality, can be set to ħ = m = 1. Equation (10) defines an eigenvalue problem where ψ(x) and E denote the eigenfunction f(x, λ) and eigenvalue λ pair. The differential equation loss for this one-dimensional stationary Schrödinger equation is given by Eq. (11), and henceforth we call it the Schrödinger equation loss:

LDE = (1/M) Σ_{i=1..M} [ ( −(ħ²/2m) ∂²/∂x² + V(xi) ) f(xi, E) − E f(xi, E) ]².   (11)
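Given the EigenNet sketch of Section III, Eq. (11) can be realized with automatic differentiation roughly as follows (a minimal sketch; the function and argument names are our assumptions):

    def schrodinger_loss(model, x, V):
        # Eq. (11) with hbar = m = 1: mean squared residual of the stationary
        # Schrodinger equation; the second derivative comes from two autograd passes
        x = x.clone().requires_grad_(True)
        f, E = model(x)
        df = torch.autograd.grad(f, x, torch.ones_like(f), create_graph=True)[0]
        d2f = torch.autograd.grad(df, x, torch.ones_like(df), create_graph=True)[0]
        residual = -0.5 * d2f + V(x) * f - E * f
        return torch.mean(residual ** 2)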
A boundary-condition eigenvalue problem is defined by considering a certain potential function V(x) and boundary conditions for ψ(x). We assess the performance of the proposed network architecture by solving Eq. (10) for the potential functions of the single finite well, multiple coupled finite wells, and the radial equation for the hydrogen atom, all of which have known analytical solutions.

For the training, a batch of xi points in the interval [xL, xR] is selected as input. In every training iteration (epoch), the input points are perturbed by Gaussian noise to prevent the network from learning the solutions only at fixed points. The Adam optimizer is used with a learning rate of 8 · 10⁻³. We use two hidden layers of 50 neurons per layer with the trigonometric sin(·) activation function. The use of sin(·) instead of more common activation functions, such as Sigmoid(·) and tanh(·), significantly accelerates the network's convergence to a solution [4]. We implemented the proposed neural network in PyTorch [15] and published the code on GitHub¹.

¹ https://github.com/henry1jin/quantumNN
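The setup just described might be wired together as in the sketch below; the number of sample points, the noise scale, and the epoch count are our illustrative assumptions, while the optimizer, learning rate, and network size follow the text:

    def train(model, V, x_L=-6.0, x_R=6.0, epochs=20000, M=200, noise=0.01):
        optimizer = torch.optim.Adam(model.parameters(), lr=8e-3)
        x_grid = torch.linspace(x_L, x_R, M).unsqueeze(1)
        history = []
        for epoch in range(epochs):
            # Gaussian perturbation of the sample points at every epoch
            x = x_grid + noise * torch.randn_like(x_grid)
            loss = schrodinger_loss(model, x, V)
            # the regularizers of Eq. (5), e.g. norm_loss and
            # orthogonality_loss, would be added to `loss` here
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            history.append(loss.item())
        return history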
~
in quantum mechanics that describes the state wavefunction
ψ(x) and the associated energy E of a quantum system. In 1 https://github.com/henry1jin/quantumNN
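In code, this piecewise potential is a vectorized function that plugs directly into the Schrödinger equation loss sketched above (ℓ = 1 and V0 = 20, matching the experiment below):

    def finite_well(x, ell=1.0, V0=20.0):
        # Eq. (12): zero inside the well [0, ell], V0 elsewhere
        inside = (x >= 0.0) & (x <= ell)
        return torch.where(inside, torch.zeros_like(x), torch.full_like(x, V0))

    # e.g.: history = train(EigenNet(x_L=-6.0, x_R=6.0), finite_well)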
The analytical solution to the finite well problem is traditionally found by solving the stationary Schrödinger equation in each region, then 'stitching' the solutions of the regions together while enforcing an eigenfunction that is continuous and continuously differentiable. For bound eigenfunctions, the general form of the solution in regions where the eigenvalue E is greater than the potential reads:

ψ = A sin(kx) + B cos(kx),   k = √(2mE) / ħ.   (13)

For regions where the eigenvalue E is smaller than the potential energy, the solution's general form is

ψ = C e^(−αx) + D e^(αx),   α = √(2m(V0 − E)) / ħ.   (14)

The solution for the potential of Eq. (12) is then the following piece-wise eigenfunction, where the constants c1, c2, and δ are determined by the requirement that the eigenfunction is continuous, continuously differentiable, and normalized:

ψ(x) = { c1 e^(αx),        x ≤ 0,
       { c2 sin(kx + δ),   0 < x ≤ ℓ,
       { c1 e^(−αx),       x > ℓ.   (15)

The ψ(x) eigenfunctions must decay to zero at infinity outside the walls, implying the boundary conditions ψ(−∞) = ψ(∞) = 0. In numerical methods, infinity is approximated with values that are large relative to the potential. We adopt the approximate boundary conditions ψ(xL) = ψ(xR) = 0 with the choice −xL = xR = 6ℓ, for ℓ = 1 and V0 = 20. The proposed model with the orthogonal loss term is capable of solving for all bound eigenstates. In the following, we use the neural network to approximate the first four eigenfunctions and the associated energies.

Fig. 2 presents the results for the quantum finite well. The lower left panel outlines LDE during the training. The red curve in the upper left graph demonstrates the predicted energies, where the plateaus indicate the discovery of an eigenstate; the dashed black lines show the ground truth energies. On the right side, the four predicted eigenfunctions are represented by blue lines; the bottom graph corresponds to the ground state. In particular, the neural network finds the ground state solution with energy E = 0.3586. After the first solution is found, we introduce the orthogonal loss term into the training, motivating the network to find a new eigenfunction. Consequently, the eigenvalue weight departs from its first value and rises to find the next even-symmetry solution, with eigenvalue E = 3.2132 (the third graph on the right side of Fig. 2, counting from the bottom). Once the patience condition is reached, the network automatically adds the latest solution to the orthogonal loss, motivating the network to once again depart from its solution in search of the next orthogonal solution. The model converges to an eigenvalue of around E = 1.8; however, it does not meet both conditions for solution acceptance, and in particular it does not meet the LDE condition. We take this to mean that, while the model has converged, it has converged to a false solution, so the symmetry of the model is switched to odd symmetry. The next two solutions found are odd-symmetric and correspond to the eigenvalues E = 1.4322 and E = 5.6873, shown respectively by the second and fourth images in the right panel of Fig. 2.