
SchNet – a deep learning architecture for molecules and materials

K.T. Schütt,1, a) H.E. Sauceda,2 P.-J. Kindermans,1 A. Tkatchenko,3, b) and K.-R. Müller1, 4, 5, c)

1) Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
2) Fritz-Haber-Institut der Max-Planck-Gesellschaft, 14195 Berlin, Germany
3) Physics and Materials Science Research Unit, University of Luxembourg, L-1511 Luxembourg, Luxembourg
4) Max-Planck-Institut für Informatik, Saarbrücken, Germany
5) Department of Brain and Cognitive Engineering, Korea University, Anam-dong, Seongbuk-gu, Seoul 136-713, South Korea

a) [email protected]
b) [email protected]
c) [email protected]

(Dated: 23 March 2018)
arXiv:1712.06113v3 [physics.chem-ph] 22 Mar 2018

Deep learning has led to a paradigm shift in artificial intelligence, including web, text and image search, speech recognition, as well as bioinformatics, with growing impact in chemical physics. Machine learning in general and deep learning in particular are ideally suited for representing quantum-mechanical interactions, enabling the modeling of nonlinear potential-energy surfaces or enhancing the exploration of chemical compound space. Here we present the deep learning architecture SchNet that is specifically designed to model atomistic systems by making use of continuous-filter convolutional layers. We demonstrate the capabilities of SchNet by accurately predicting a range of properties across chemical space for molecules and materials, where our model learns chemically plausible embeddings of atom types across the periodic table. Finally, we employ SchNet to predict potential-energy surfaces and energy-conserving force fields for molecular dynamics simulations of small molecules and perform an exemplary study of the quantum-mechanical properties of C20-fullerene that would have been infeasible with regular ab initio molecular dynamics.

I. INTRODUCTION

Accelerating the discovery of molecules and materials with desired properties is a long-standing challenge in computational chemistry and the materials sciences. However, the computational cost of accurate quantum-chemical calculations proves prohibitive in the exploration of the vast chemical space. In recent years, there have been increased efforts to overcome this bottleneck using machine learning, where only a reduced set of reference calculations is required to accurately predict chemical properties1–15 or potential-energy surfaces16–25. While these approaches make use of painstakingly hand-crafted descriptors, deep learning has been applied to predict properties from molecular structures using graph neural networks26,27. However, these are restricted to predictions for equilibrium structures due to the lack of atomic positions in the input. Only recently, approaches that learn a representation directly from atom types and positions have been developed28–30. While neural networks are often considered a 'black box', there has recently been an increased effort to explain their predictions in order to understand how they operate or even to extract scientific insight. This can either be done by analyzing a trained model31–37 or by directly designing interpretable models38. For quantum chemistry, some of us have proposed such an interpretable architecture with Deep Tensor Neural Networks (DTNN) that not only learns a representation of atomic environments but also allows for spatially and chemically resolved insights into quantum-mechanical observables28.

Here we build upon this work and present the deep learning architecture SchNet that allows us to model complex atomic interactions in order to predict potential-energy surfaces or to speed up the exploration of chemical space. SchNet, being a variant of DTNNs, is able to learn representations for molecules and materials that follow fundamental symmetries of atomistic systems by construction, e.g., rotational and translational invariance as well as invariance to atom indexing. This enables accurate predictions throughout compositional and configurational chemical space, where symmetries of the potential energy surface are captured by design. Interactions between atoms are modeled using continuous-filter convolutional layers30 that are able to incorporate further chemical knowledge and constraints using specifically designed filter-generating neural networks. We demonstrate that these allow us to efficiently incorporate periodic boundary conditions, enabling accurate predictions of formation energies for a diverse set of bulk crystals. Beyond that, both SchNet and DTNNs provide local chemical potentials to analyze the obtained representation and allow for chemical insights28. An analysis of the obtained representation shows that SchNet learns chemically plausible embeddings of atom types that capture the structure of the periodic table. Finally, we present a path-integral molecular dynamics (PIMD) simulation using an energy-conserving force field learned by SchNet trained on reference data from a classical MD at the PBE+vdW^TS39,40 level of theory, effectively accelerating the simulation by three orders of magnitude. Specifically, we employ the recently developed perturbed path-integral approach41 for carrying out imaginary time PIMD, which allows quick convergence of quantum-mechanical properties with respect to the number of classical replicas (beads). This exemplary study shows the advantages of developing computationally efficient force fields with ab initio accuracy, allowing nanoseconds of PIMD simulations at low temperatures – an inconceivable task for regular ab initio molecular dynamics (AIMD) that could be completed with SchNet within hours instead of years.
II. METHOD

SchNet is a variant of the earlier proposed Deep Tensor Neural Networks (DTNN)28 and therefore shares a number of their essential building blocks. Among these are atom embeddings, interaction refinements and atom-wise energy contributions. At each layer, the atomistic system is represented atom-wise and refined using pair-wise interactions with the surrounding atoms. In the DTNN framework, interactions are modeled by tensor layers, i.e., atom representations and interatomic distances are combined using a parameter tensor. This can be approximated using a low-rank factorization for computational efficiency42–44. SchNet instead makes use of continuous-filter convolutions with filter-generating networks30,45 to model the interaction term. These can be interpreted as a special case of such factorized tensor layers. In the following, we introduce these components and describe how they are assembled to form the SchNet architecture. For an overview of the SchNet architecture, see Fig. 1.

FIG. 1. Illustrations of the SchNet architecture (left) and interaction blocks (right) with atom embeddings in green, interaction blocks in yellow and the property prediction network in blue. For each parameterized layer, the number of neurons is given. The filter-generating network (orange) is shown in detail in Fig. 2.

A. Atom embeddings

An atomistic system can be described uniquely by a set of n atom sites with nuclear charges Z = (Z_1, ..., Z_n) and positions R = (r_1, ..., r_n). Through the layers of SchNet, the atoms are described by a tuple of features X^l = (x^l_1, ..., x^l_n), with x^l_i ∈ R^F, where F is the number of feature maps, n the number of atoms and l the current layer. The representation of site i is initialized using an embedding dependent on the atom type Z_i:

    x^0_i = a_{Z_i}.    (1)

These embeddings a_Z are initialized randomly and optimized during training. They represent atoms of a system disregarding any information about their environment for now.
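As a concrete illustration of Eq. (1), the embedding can be implemented as a trainable lookup table. The following PyTorch-style sketch is illustrative only; the feature dimension F = 64 follows the experiments below, while the maximum atom type is an arbitrary choice:

```python
import torch
import torch.nn as nn

# Minimal sketch of Eq. (1): one trainable F-dimensional vector per atom type.
# num_embeddings and embedding_dim are illustrative choices, not values fixed here.
embedding = nn.Embedding(num_embeddings=100, embedding_dim=64)

Z = torch.tensor([6, 1, 1, 1, 1])   # nuclear charges of a methane-like system
x0 = embedding(Z)                   # shape (n_atoms, F): initial representations x^0_i
```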

B. Atom-wise layers

Atom-wise layers are dense layers that are applied separately to the representations x^l_i of each atom i:

    x^{l+1}_i = W^l x^l_i + b^l    (2)

Since the weights W^l and biases b^l are shared across atoms, our architecture remains scalable with respect to the number of atoms. While the atom representations are passed through the network, these layers transform them and process information about the atomic environments incorporated through interaction layers.
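Since the same weights act on every atom, Eq. (2) amounts to a linear layer applied along the feature axis of the atom-wise representations. A minimal sketch (shapes are illustrative):

```python
import torch
import torch.nn as nn

# Eq. (2): a dense layer shared across atoms; applying nn.Linear to a
# (n_atoms, F) tensor multiplies each atom's feature vector by the same W^l, b^l.
atomwise = nn.Linear(in_features=64, out_features=64)

x = torch.randn(5, 64)      # representations x^l_i for 5 atoms
x_next = atomwise(x)        # shape (5, 64): x^{l+1}_i = W^l x^l_i + b^l
```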
C. Interaction blocks

The interaction blocks of SchNet add refinements to the atom representation based on pair-wise interactions with the surrounding atoms. In contrast to DTNNs, here we model these with continuous-filter convolutional layers (cfconv) that are a generalization of the discrete convolutional layers commonly used, e.g., for images46,47 or audio data48. This generalization is necessary since atoms are not located on a regular grid like image pixels, but can be located at arbitrary positions. Therefore, a filter tensor, as used in conventional convolutional layers, is not applicable. Instead we need to model the filters continuously with a filter-generating neural network. Given atom-wise representations X^l at positions R, we obtain the interactions of atom i as the convolution with all surrounding atoms

    x^{l+1}_i = (X^l * W^l)_i = \sum_{j=0}^{n_{atoms}} x^l_j \circ W^l(r_j - r_i),    (3)

where "\circ" represents element-wise multiplication. Note that we perform feature-wise convolutions for computational efficiency49. Cross-feature processing is subsequently performed by atom-wise layers. Instead of a filter tensor, we define a filter-generating network W^l : R^3 → R^F that maps the atom positions to the corresponding values of the filter bank (see Section II D).
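A hedged sketch of the continuous-filter convolution in Eq. (3), assuming a filter-generating network `filter_net` that maps pairwise position offsets to F filter values per atom pair; cutoffs, batching and the distance expansion of Section II D are omitted for brevity:

```python
import torch
import torch.nn as nn

def cfconv(x, r, filter_net):
    """Continuous-filter convolution of Eq. (3).
    x: (n_atoms, F) atom features; r: (n_atoms, 3) positions;
    filter_net maps pairwise offsets (n_atoms, n_atoms, 3) to filters (n_atoms, n_atoms, F)."""
    r_ij = r[None, :, :] - r[:, None, :]     # offsets r_j - r_i for every pair (i, j)
    W = filter_net(r_ij)                     # continuous filters W^l(r_j - r_i)
    return (x[None, :, :] * W).sum(dim=1)    # sum_j x^l_j ∘ W^l(r_j - r_i), element-wise product

# Illustrative filter-generating network acting directly on the offsets;
# SchNet instead feeds a Gaussian-expanded distance (Section II D).
filter_net = nn.Sequential(nn.Linear(3, 64), nn.Softplus(), nn.Linear(64, 64))
out = cfconv(torch.randn(5, 64), torch.randn(5, 3), filter_net)   # shape (5, 64)
```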
A cfconv layer together with three atom-wise layers constitutes the residual mapping50 of an interaction block (see Fig. 1, right). We use a shifted softplus ssp(x) = ln(0.5 e^x + 0.5) as activation function throughout the network. The shifting ensures that ssp(0) = 0 and improves the convergence of the network while having an infinite order of continuity. This allows us to obtain smooth potential energy surfaces, force fields and second derivatives that are required for training with forces as well as the calculation of vibrational modes.
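In code, the shifted softplus reduces to the standard softplus shifted by ln 2; a minimal sketch:

```python
import torch

def shifted_softplus(x):
    # ssp(x) = ln(0.5 * e^x + 0.5) = softplus(x) - ln(2), so ssp(0) = 0 and ssp is smooth.
    return torch.nn.functional.softplus(x) - torch.log(torch.tensor(2.0))
```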
D. Filter-generating networks

The filter-generating network determines how interactions between atoms are modeled and can be used to constrain the model and include chemical knowledge. We choose a fully-connected neural network that takes the vector pointing from atom i to its neighbor j as input to obtain the filter values W(r_j - r_i) (see Fig. 2, left). This allows us to include known invariances of molecules and materials into the model.

FIG. 2. Architecture of the filter-generating network used in SchNet (left) and 5 Å x 5 Å cuts through generated filters (right) from the same filter-generating networks (columns) under different periodic boundary conditions (rows). Each filter is learned from data and represents the effect of an interaction on a given feature of an atom representation located in the center of the filter. For each parameterized layer, the number of neurons is given.

1. Rotational invariance

It is straightforward to include rotational invariance by computing pairwise distances instead of using relative positions. We further expand the distances in a basis of Gaussians

    e_k(r_j - r_i) = \exp(-\gamma (\|r_j - r_i\| - \mu_k)^2),

with centers \mu_k chosen on a uniform grid between zero and the distance cutoff. This has the effect of decorrelating the filter values, which improves the conditioning of the optimization problem. The number of Gaussians and the hyperparameter \gamma determine the resolution of the filter. We have set the grid spacing and scaling parameter \gamma to 0.1 Å for all models in this work.
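A sketch of this Gaussian expansion, following the formula as stated; the 20 Å range used for QM9 in Section III A serves as the example cutoff, and the function and argument names are illustrative:

```python
import torch

def gaussian_expansion(distances, cutoff=20.0, spacing=0.1, gamma=0.1):
    """Expand interatomic distances in Gaussians e_k(d) = exp(-gamma * (d - mu_k)^2),
    with centers mu_k on a uniform grid between 0 and the cutoff."""
    centers = torch.arange(0.0, cutoff + spacing, spacing)   # mu_k grid with 0.1 Å spacing
    return torch.exp(-gamma * (distances.unsqueeze(-1) - centers) ** 2)

d = torch.tensor([1.09, 1.53, 2.42])     # example interatomic distances in Å
features = gaussian_expansion(d)         # shape (3, n_gaussians)
```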
2. Periodic boundary conditions

For atomistic systems with periodic boundary conditions (PBCs), each atom-wise feature vector x_i has to be equivalent across all periodic repetitions, i.e., x_i = x_{ia} = x_{ib} for repeated unit cells a and b. Due to the linearity of the convolution, we are therefore able to apply the PBCs directly to the filter to accurately describe the atom interactions while keeping invariance to the choice of the unit cell. Given a filter \tilde{W}^l(r_{jb} - r_{ia}) over all atoms with \|r_{jb} - r_{ia}\| < r_{cut}, we obtain the convolution

    x^{l+1}_i = x^{l+1}_{im} = \frac{1}{n_{neighbors}} \sum_{j,n} x^l_{jn} \circ \tilde{W}^l(r_{jn} - r_{im})
              = \frac{1}{n_{neighbors}} \sum_j x^l_j \circ \Big( \sum_n \tilde{W}^l(r_{jn} - r_{im}) \Big),

where the term in parentheses defines the new filter W. This new filter W now depends on the PBCs of the system as we sum over all periodic images within the given cutoff r_{cut}. We find that the training is more stable when normalizing the filter response x^{l+1}_i by the number of atoms within the cutoff range. Fig. 2 (right) shows a selection of generated filters without PBCs, with a cubic diamond crystal cell and with a hexagonal graphite cell. As the filters for diamond and graphite are superpositions of single-atom filters according to their respective lattice, they reflect the structure of the lattice. Note that while the single-atom filters are circular due to the rotational invariance, the periodic filters become rotationally equivariant w.r.t. the orientation of the lattice, which still keeps the property prediction rotationally invariant. While we have followed a data-driven approach where we only incorporate basic invariances in the filters, careful design of the filter-generating network provides the possibility to incorporate further chemical knowledge in the network.
E. Property prediction

Finally, a given property P of a molecule or material is predicted from the obtained atom-wise representations. We compute atom-wise contributions P̂_i with the fully-connected prediction network (see the blue layers in Fig. 1). Depending on whether the property is intensive or extensive, we calculate the final prediction P̂ by averaging or summing over the atomic contributions, respectively.

Since the initial atom embeddings are equivariant to the order of atoms, atom-wise layers are applied independently to each atom, and continuous-filter convolutions sum over all neighboring atoms, indexing equivariance is retained in the atom-wise representations. Therefore, the prediction of properties as a sum over atom-wise contributions guarantees indexing invariance.
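As a sketch, the readout therefore reduces the atom-wise contributions with a permutation-invariant pooling, summing for extensive properties such as the energy and averaging for intensive ones (the helper below is illustrative):

```python
import torch

def predict_property(per_atom_contributions, extensive=True):
    # per_atom_contributions: shape (n_atoms,), the outputs P̂_i of the prediction network.
    # Sum for extensive properties (e.g., total energy), average for intensive ones.
    if extensive:
        return per_atom_contributions.sum()
    return per_atom_contributions.mean()
```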
When predicting atomic forces, we instead differentiate a SchNet model predicting the energy w.r.t. the atomic positions:

    \hat{F}_i(Z_1, ..., Z_n, r_1, ..., r_n) = -\frac{\partial \hat{E}}{\partial r_i}(Z_1, ..., Z_n, r_1, ..., r_n).    (4)

When using a rotationally invariant energy model, this ensures rotationally equivariant force predictions and guarantees an energy-conserving force field21.
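With automatic differentiation, Eq. (4) does not require hand-derived gradients. A hedged sketch, where `energy_model` stands in for a trained SchNet energy model and the toy energy at the bottom only makes the example self-contained:

```python
import torch

def predict_forces(energy_model, Z, r):
    """Eq. (4): forces as the negative gradient of the predicted energy w.r.t. positions."""
    r = r.clone().requires_grad_(True)
    energy = energy_model(Z, r)                                    # scalar Ê(Z, r)
    forces = -torch.autograd.grad(energy, r, create_graph=True)[0]
    return energy, forces                                          # create_graph=True allows training on forces

# Toy stand-in for an energy model (pairwise harmonic energy), just for illustration.
toy_energy = lambda Z, r: ((r[None, :, :] - r[:, None, :]) ** 2).sum()
E, F = predict_forces(toy_energy, torch.tensor([6, 1]), torch.randn(2, 3))
```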
F. Training

We train SchNet for each property target P by minimizing the squared loss

    \ell(\hat{P}, P) = \|P - \hat{P}\|^2.

For the training of energies and forces on molecular dynamics trajectories, we use a combined loss

    \ell((\hat{E}, \hat{F}_1, ..., \hat{F}_n), (E, F_1, ..., F_n)) = \rho \|E - \hat{E}\|^2 + \frac{1}{n_{atoms}} \sum_{i=0}^{n_{atoms}} \Big\| F_i - \Big( -\frac{\partial \hat{E}}{\partial R_i} \Big) \Big\|^2,    (5)

where \rho is a trade-off between energy and force loss51.
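A minimal sketch of the combined loss in Eq. (5); the weighting and per-atom normalization follow the formula as written, while batching is omitted:

```python
import torch

def energy_force_loss(E_pred, F_pred, E_ref, F_ref, rho=0.01):
    """Combined loss of Eq. (5): rho trades off the energy error against the
    per-atom mean of the squared force errors."""
    n_atoms = F_ref.shape[0]
    energy_term = (E_ref - E_pred) ** 2
    force_term = ((F_ref - F_pred) ** 2).sum() / n_atoms
    return rho * energy_term + force_term
```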
All models are trained with mini-batch stochastic gradient descent using the ADAM optimizer52 with mini-batches of 32 examples. We decay the learning rate exponentially with ratio 0.96 every 100,000 steps. In each experiment, we split the data into a training set of given size N and use a validation set for early stopping. The remaining data is used for computing the test errors. Since there is a maximum number of atoms located within a given cutoff, the computational cost of a training step scales linearly with the system size if we precompute the indices of nearby atoms.
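The optimization setup described above could look roughly as follows; the model and data are placeholders, and the initial learning rate is an assumption since the text does not specify it:

```python
import torch
import torch.nn as nn

# Stand-in model and data; the optimization setup follows the text:
# ADAM, mini-batches of 32, exponential decay with ratio 0.96 every 100,000 steps.
model = nn.Linear(64, 1)                                   # placeholder for a SchNet model
data = [(torch.randn(32, 64), torch.randn(32, 1)) for _ in range(10)]

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # initial learning rate assumed
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100_000, gamma=0.96)

for inputs, targets in data:
    optimizer.zero_grad()
    loss = ((model(inputs) - targets) ** 2).mean()         # squared loss as in Section II F
    loss.backward()
    optimizer.step()
    scheduler.step()
```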
III. RESULTS

A. Learning molecular properties

We train SchNet models to predict various properties of the QM9 dataset53–55 of 131k small organic molecules with up to nine heavy atoms (C, O, N, F). Following Gilmer et al.29 and Faber et al.10, we use a validation set of 10,000 molecules. We sum over the atomic contributions P̂_i for all properties but HOMO, LUMO and the gap ∆, where we take the average. We use T = 6 interaction blocks and atomic representations with F = 64 feature dimensions and perform up to 10 million gradient descent parameter updates. Since the molecules of QM9 are quite small, we do not use a distance cutoff. For the Gaussian expansion, we use a range up to 20 Å to cover all interatomic distances occurring in the data. The prediction errors are listed in Table I, where we compare the performance to the message-passing neural network enn-s2s29 that uses additional bond information beyond atomic positions to learn a molecular representation.

TABLE I. Mean absolute errors for energy predictions on the QM9 data set using 110k training examples. For SchNet, we give the average over three repetitions as well as standard errors of the mean of the repetitions. Best models in bold.

Property   Unit        SchNet (T = 6)   enn-s2s29
HOMO       eV          0.041 ± 0.001    0.043
LUMO       eV          0.034 ± 0.000    0.037
∆          eV          0.063 ± 0.000    0.069
ZPVE       meV         1.7 ± 0.033      1.5
µ          Debye       0.033 ± 0.001    0.030
α          Bohr³       0.235 ± 0.061    0.092
⟨R²⟩       Bohr²       0.073 ± 0.002    0.180
U0         eV          0.014 ± 0.001    0.019
U          eV          0.019 ± 0.006    0.019
H          eV          0.014 ± 0.001    0.017
G          eV          0.014 ± 0.000    0.019
Cv         cal/mol K   0.033 ± 0.000    0.040

The SchNet predictions of the polarizability α and the electronic spatial extent ⟨R²⟩ fall noticeably short in terms of accuracy. This is most likely due to the decomposition of the energy into atomic contributions, which is not appropriate for these properties. In contrast to SchNet, Gilmer et al.29 employ a set2set model variant56 that obtains a global representation and does not suffer from this issue. However, SchNet reaches or improves over enn-s2s in 8 out of 12 properties where a decomposition into atomic contributions is a good choice. The distributions of the errors of all predicted properties are shown in Appendix A. Extending SchNet with interpretable, property-specific output layers, e.g. for the dipole moment57, is subject to future work.

FIG. 3. Mean absolute error (in eV) of energy predictions (U0) on the QM9 dataset53–55 depending on the number of interaction blocks and reference calculations used for training. For reference, we give the best performing DTNN models (T=3)28.

Fig. 3 shows learning curves of SchNet for the total energy U0 with T ∈ {1, 2, 3, 6} interaction blocks compared to the best performing DTNN models28. The best performing DTNN with T = 3 interaction blocks can only outperform the SchNet model with T = 1. We observe that beyond two interaction blocks the error improves only slightly, from 0.015 eV with T = 2 interaction blocks to 0.014 eV for T ∈ {3, 6} using 110k training examples. When training on fewer examples, the differences become more significant and T = 6, while having the most parameters, exhibits the lowest errors. Additionally, the model requires far fewer epochs to converge, e.g., using 110k training examples, the required number of epochs is reduced from 2400 with T = 2 to less than 750 with T = 6.
B. Learning formation energies of materials

We employ SchNet to predict formation energies for bulk crystals using 69,640 structures and reference calculations from the Materials Project (MP) repository58,59. It consists of a large variety of bulk crystals with atom types ranging across the whole periodic table up to Z = 94. Mean absolute errors are listed in Table II. Again, we use T = 6 interaction blocks and atomic representations with F = 64 feature dimensions. We set the distance cutoff to r_cut = 5 Å and discard two examples from the data set that would include isolated atoms with this setting. Then, the data is randomly split into 60,000 training examples, a validation set of 4,500 examples and the remaining data as test set. Even though the MP repository is much more diverse than the QM9 molecule benchmark, SchNet is able to predict formation energies up to a mean absolute error of 0.035 eV/atom. The distribution of the errors is shown in Appendix A. On a smaller subset of 3,000 training examples, SchNet still achieves an MAE of 0.127 eV/atom, improving significantly upon the descriptors proposed by Faber et al.5.

TABLE II. Mean absolute errors for formation energy predictions in eV/atom on the Materials Project data set. For SchNet, we give the average over three repetitions as well as standard errors of the mean of the repetitions. Best models in bold.

Model                   N = 3,000       N = 60,000
ext. Coulomb matrix5    0.64            –
Ewald sum matrix5       0.49            –
sine matrix5            0.37            –
SchNet (T = 6)          0.127 ± 0.001   0.035 ± 0.000

Since the MP dataset contains 89 atom types ranging across the periodic table, we examine the learned atom type embeddings x^0. Due to their high dimensionality, we visualize the two leading principal components of all sp-atom type embeddings as well as their corresponding group (see Fig. 4). The neural network aims to use the embedding space efficiently, such that this 2d projection explains only about 20% of the variance of the embeddings, i.e., since important directions are missing, embeddings might cover each other in the projection while actually being further apart. Still, we already recognize a grouping of elements following the groups of the periodic table. This implies that SchNet has learned that atom types of the same group exhibit similar chemical properties. Within some of the groups, we can even observe an ordering from lighter to heavier elements, e.g., in groups IA and IIA from light elements on the left to heavier ones on the right or, less clearly, in group VA with a partial ordering N – {As, P} – {Sb, Bi}. Note that this knowledge was not imposed on the machine learning model, but inferred by SchNet from the geometries and formation energy targets of the MP data.

FIG. 4. The two leading principal components of the embeddings x^0 of sp atoms learned by SchNet from the Materials Project dataset. We recognize a structure in the embedding space according to the groups of the periodic table (color-coded) as well as an ordering from lighter to heavier elements within the groups, e.g., in groups IA and IIA from light atoms (left) to heavier atoms (right).

C. Local chemical potentials

Since SchNet is a variant of DTNNs, we can visualize the learned representation with a "local chemical potential" Ω_{Z_probe}(r) as proposed by Schütt et al.28: We compute the energy of a virtual atom that acts as a test charge. This can be achieved by adding the probe atom (Z_probe, r_probe) as an input of SchNet. The continuous-filter convolution of the probe atom with the atoms of the system

    x^{l+1}_{probe} = (X^l * W^l)_{probe} = \sum_{i=0}^{n_{atoms}} x^l_i \circ W^l(r_{probe} - r_i)    (6)
ensures that the test charge only senses but does not influence the feature representation. We use Mayavi60 to visualize the potentials.

FIG. 5. Local chemical potentials Ω_C(r) of DTNN (top) and SchNet (bottom) using a carbon test charge on a \sum_i \|r - r_i\| = 3.7 Å isosurface are shown for benzene, toluene, methane, pyrazine and propane.

Figure 5 shows a comparison of the local potentials of various molecules from QM9 generated by DTNN and SchNet. Both DTNN and SchNet can clearly grasp fundamental chemical concepts such as bond saturation and different degrees of aromaticity. While the general structure of the potential on the surfaces is similar, the SchNet potentials exhibit sharper features and have a more pronounced separation of high-energy and low-energy areas. The overall appearance of the distinguishing molecular features in the "local chemical potentials" is remarkably robust to the underlying neural network architecture, representing the common quantum-mechanical atomic embedding in its molecular environment. It remains to be seen how the "local chemical potentials" inferred by the networks can be correlated with traditional quantum-mechanical observables such as electron density, electrostatic potentials, or electronic orbitals. In addition, such local potentials could aid in the understanding and prediction of chemical reactivity trends.

In the same manner, we show cuts through Ω_C(r) for graphite and diamond in Fig. 6. As expected, they resemble the periodic structure of the solid, much like the corresponding filters in Fig. 2. In solids, such local chemical potentials could be used to understand the formation and distribution of defects, such as vacancies and interstitials.

FIG. 6. Cuts through local chemical potentials Ω_C(r) of SchNet using a carbon test charge are shown for graphite (left) and diamond (right).

D. Combined learning of energies and atomic forces

We apply SchNet to the prediction of potential energy surfaces and force fields of the MD17 benchmark set of molecular dynamics trajectories introduced by Chmiela et al.21. MD17 is a collection of eight molecular dynamics simulations of small organic molecules. Tables III and IV list mean absolute errors for energy and force predictions. We trained SchNet on randomly sampled training sets with N = 1,000 and N = 50,000 reference calculations for up to 2 million mini-batch gradient steps and additionally used a validation set of 1,000 examples for early stopping. The remaining data was used for testing. We also list the performances of gradient domain machine learning (GDML)21 and DTNN28 for reference. SchNet was trained with T = 3 interaction blocks and F = 64 feature maps using only energies as well as using the combined loss for energies and forces from Eq. 5 with ρ = 0.01. This trade-off constitutes a compromise to obtain a single model that performs well on energies and forces for a fair comparison with GDML. Again, we do not use a distance cutoff due to the small molecules, and a range up to 20 Å for the Gaussian expansion to cover all distances. In Section III E, we will see that even lower errors can be achieved when using two separate SchNet models for energies and forces.
TABLE III. Mean absolute errors for total energies (in kcal/mol). GDML21, DTNN28 and SchNet30 test errors for N = 1,000 and N = 50,000 reference calculations of molecular dynamics simulations of small, organic molecules are shown. Best results are given in bold.

                      N = 1,000                                     N = 50,000
                 GDML      SchNet    SchNet           DTNN      SchNet    SchNet
trained on       forces    energy    energy+forces    energy    energy    energy+forces
Benzene          0.07      1.19      0.08             0.04      0.08      0.07
Toluene          0.12      2.95      0.12             0.18      0.16      0.09
Malonaldehyde    0.16      2.03      0.13             0.19      0.13      0.08
Salicylic acid   0.12      3.27      0.20             0.41      0.25      0.10
Aspirin          0.27      4.20      0.37             –         0.25      0.12
Ethanol          0.15      0.93      0.08             –         0.07      0.05
Uracil           0.11      2.26      0.14             –         0.13      0.10
Naphthalene      0.12      3.58      0.16             –         0.20      0.11

TABLE IV. Mean absolute errors for atomic forces (in kcal/mol/Å). GDML21 and SchNet30 test errors for N = 1,000 and N = 50,000 reference calculations of molecular dynamics simulations of small, organic molecules are shown. Best results are given in bold.

                      N = 1,000                            N = 50,000
                 GDML      SchNet    SchNet           SchNet    SchNet
trained on       forces    energy    energy+forces    energy    energy+forces
Benzene          0.23      14.12     0.31             1.23      0.17
Toluene          0.24      22.31     0.57             1.79      0.09
Malonaldehyde    0.80      20.41     0.66             1.51      0.08
Salicylic acid   0.28      23.21     0.85             3.72      0.19
Aspirin          0.99      23.54     1.35             7.36      0.33
Ethanol          0.79      6.56      0.39             0.76      0.05
Uracil           0.24      20.08     0.56             3.28      0.11
Naphthalene      0.23      25.36     0.58             2.58      0.11

SchNet can take significant advantage of the additional force information, reducing energy and force errors by 1-2 orders of magnitude compared to energy-only training on the small training set. With 50,000 training examples, the improvements are less apparent as the potential energy surface is already well-sampled at this point. On the small training set, SchNet outperforms GDML on the more flexible molecules malonaldehyde and ethanol, while GDML reaches much lower force errors on the remaining MD trajectories that all include aromatic rings. A possible reason is that GDML defines an order of atoms in the molecule, while the SchNet architecture is inherently invariant to indexing, which constitutes a greater advantage in the more flexible molecules.

While GDML is more data-efficient than a neural network, SchNet is scalable to larger data sets. We obtain MAEs of energy and force predictions below 0.12 kcal/mol and 0.33 kcal/mol/Å, respectively. Remarkably, SchNet performs better when using the combined loss with energies and forces on 1,000 reference calculations than when training on energies of 50,000 examples.

E. Application to molecular dynamics of C20-fullerene

After demonstrating the accuracy of SchNet on the MD17 benchmark set, we perform a study of an ML-driven MD simulation of C20-fullerene. This middle-sized molecule has a complex PES that needs to be described accurately to reproduce vibrational normal modes and their degeneracies. Here, we use SchNet to perform an analysis of some basic properties of the PES of C20 when introducing nuclear quantum effects. The reference data was generated by running classical MD at 500 K using DFT at the generalized gradient approximation (GGA) level of theory with the Perdew-Burke-Ernzerhof (PBE)39 exchange-correlation functional and the Tkatchenko-Scheffler (TS) method40 to account for van der Waals interactions. Further details about the simulations can be found in Appendix B.
By training SchNet on DFT data at the PBE+vdW^TS level, we reduce the computation time per single point by three orders of magnitude, from 11 s using 32 CPU cores to 10 ms using one NVIDIA GTX1080. This allows us to perform long MD simulations with DFT accuracy at low computational cost, making this kind of study feasible.

In order to obtain accurate energy and force predictions, we first perform an extensive model selection on the given reference data. We use 20k C20 reference calculations as training set, 4.5k examples for early stopping and report the test error on the remaining data. Table V lists the results for various settings of the number of interaction blocks T, the number of feature dimensions F of the atomic representations and the energy-force trade-off ρ of the combined loss function. First, we select the best hyper-parameters T, F of the model given the trade-off ρ = 0.01 that we established to be a good compromise on MD17 (see the upper part of Table V). We find that the configuration of T = 6 and F = 128 works best for energies as well as forces. Given the selected model, we next validate the best choice for the trade-off ρ. Here we find that the best choices for energy and forces vastly diverge: While we established before that energy predictions benefit from force information (see Table III), we achieve the best force predictions for C20-fullerene when neglecting the energies. We still benefit from using the derivative of an energy model as force model, since this still guarantees an energy-conserving force field21.

TABLE V. Mean absolute errors for energy and force predictions of C20-fullerene in kcal/mol and kcal/mol/Å, respectively. We compare SchNet models with varying numbers of interaction blocks T, feature dimensions F and energy-force tradeoff ρ. For force-only training (ρ = 0), the integration constant is fitted separately. Best models in bold.

T   F     ρ       energy   forces
3   64    0.010   0.228    0.401
6   64    0.010   0.202    0.217
3   128   0.010   0.188    0.197
6   128   0.010   0.100    0.120

6   128   0.100   0.027    0.171
6   128   0.010   0.100    0.120
6   128   0.001   0.238    0.061
6   128   0.000   0.260    0.058

For energy predictions, we obtain the best results when using a larger ρ = 0.1 as this puts more emphasis on the energy loss. Here, we select the force-only model as the force field to drive our MD simulation since we are interested in the mechanical properties of the C20 fullerene. Fig. 7 shows a comparison of the normal modes obtained from DFT and our model. In the bottom panel, we show the accuracy of SchNet with the largest error being ∼1% of the DFT reference frequencies. Given these results and the accuracy reported in Table V, we obtained a model that successfully reconstructs the PES and its symmetries61.

FIG. 7. Normal mode analysis of the fullerene C20 dynamics comparing SchNet and DFT results.

In addition, in Fig. 8 we present an analysis of the nearest neighbor (1nn), diameter and radial distribution functions at 300 K for classical MD (blue) and PIMD (green) simulations that include nuclear quantum effects. See Appendix B for further details on the simulation. From Fig. 8 (and Fig. 11), it appears that nuclear delocalization does not play a significant role in the peaks of the pair distribution function h(r) for C20 at room temperature. The nuclear quantum effects increase the 1nn distances by less than 0.5%, but the delocalization of the bond lengths is considerable. This result agrees with previously reported PIMD simulations of graphene62. However, here we have a non-symmetric distribution due to the finite size of C20.

FIG. 8. Analysis of the fullerene C20 dynamics at 300 K using SchNet@DFT. Distribution functions for nearest neighbours, diameter of the fullerene and the atomic-pairs distribution function using classical MD (blue) and PIMD with 8 beads (green).
Overall, with SchNet we could carry out 1.25 ns of PIMD, reducing the runtime compared to DFT by 3-4 orders of magnitude: from about 7 years to less than 7 hours with much less computational resources. Such long-time MD simulations are required for detailed studies of mechanical and thermodynamical properties as a function of the temperature, especially in the low temperature regime where the nuclear quantum effects become extremely important. Clearly, this application evinces the need for fast and accurate machine learning models such as SchNet to explore the different nature of chemical interactions and quantum behavior to better understand molecules and materials.

IV. CONCLUSIONS

Instead of having to painstakingly design mechanistic force fields or machine learning descriptors, deep learning allows a representation to be learned from first principles that adapts to the task and scale at hand, from property prediction across chemical compound space to force fields in the configurational space of single molecules. The design challenge here has been shifted to modelling quantum interactions by choosing a suitable neural network architecture. This gives rise to the possibility to encode known quantum-chemical constraints and symmetries within the model without losing the flexibility of a neural network. This is crucial in order to be able to accurately represent, e.g., the full potential-energy surface and in particular its anharmonic behavior.

We have presented SchNet as such a versatile deep learning architecture for quantum chemistry and a valuable tool in a variety of applications ranging from property prediction for diverse datasets of molecules and materials to the highly accurate prediction of potential energy surfaces and energy-conserving force fields. As a variant of DTNNs, SchNet follows rotational, translational and permutational invariances by design and, beyond that, is able to directly model periodic boundary conditions. Not only does SchNet yield fast and accurate predictions, it also allows us to examine the learned representation using local chemical potentials28. Beyond that, we have analyzed the atomic embeddings learned by SchNet and found that fundamental chemical knowledge had been recovered purely from a dataset of bulk crystals and formation energies. Most importantly, we have performed an exemplary path-integral molecular dynamics study of the fullerene C20 at the PBE+vdW^TS level of theory that would not have been computationally feasible with common DFT approaches. These encouraging results will guide future work such as studies of larger molecules and periodic systems as well as further developments towards interpretable deep learning architectures to assist chemistry research.

ACKNOWLEDGMENTS

This work was supported by the Federal Ministry of Education and Research (BMBF) for the Berlin Big Data Center BBDC (01IS14013A). Additional support was provided by the DFG (MU 987/20-1), from the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement NO 657679, the BK21 program funded by Korean National Research Foundation grant (No. 2012-005741) and the Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (no. 2017-0-00451). A.T. acknowledges support from the European Research Council (ERC-CoG grant BeStMo). Correspondence to KTS, AT and KRM.

Appendix A: Error distributions

In Figures 9 and 10, we show histograms of the absolute prediction errors for the QM9 and Materials Project datasets, respectively. The histograms include all test errors made across all three repetitions.

FIG. 9. Histograms of absolute errors for all predicted properties of QM9. The histograms are plotted on a logarithmic scale to visualize the tails of the distribution.

FIG. 10. Histogram of absolute errors for the predictions of formation energies per atom for the Materials Project dataset. The histogram is plotted on a logarithmic scale to visualize the tails of the distribution.

Appendix B: MD simulation details

The reference data for C20 was generated using classical molecular dynamics in the NVT ensemble at 500 K using the Nose-Hoover thermostat with a time step of 1 fs. The forces and energies were computed using DFT at the generalized gradient approximation (GGA) level of theory with the non-empirical exchange-correlation functional of Perdew-Burke-Ernzerhof (PBE)39 and the Tkatchenko-Scheffler (TS) method40 to account for ubiquitous van der Waals interactions. The calculations were done using all electrons with a light basis set as implemented in the FHI-aims code63.

The quantum nuclear effects are introduced using path-integral molecular dynamics (PIMD) via Feynman's path integral formalism. The PIMD simulations were done using the SchNet model implementation in the i-PI code64. The integration timestep was set to 0.5 fs to ensure energy conservation along the MD using the NVT ensemble with a stochastic path integral Langevin equation (PILE) thermostat65. In PIMD the treatment of NQE is controlled by the number of beads, P. In our example of the C20 fullerene, we can see that at room temperature using 8 beads already gives a converged radial distribution function h(r), as shown in Figure 11.

FIG. 11. Atomic-pairs distribution function h(r) of C20 at 300 K from PIMD simulations with P = 1, 4, 8 and 12 beads (see Appendix B).
1. M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012).
2. G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, New J. Phys. 15, 095003 (2013).
3. K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O. A. von Lilienfeld, A. Tkatchenko, and K.-R. Müller, J. Chem. Theory Comput. 9, 3404 (2013).
4. K. T. Schütt, H. Glawe, F. Brockherde, A. Sanna, K.-R. Müller, and E. Gross, Phys. Rev. B 89, 205118 (2014).
5. F. Faber, A. Lindmaa, O. A. von Lilienfeld, and R. Armiento, International Journal of Quantum Chemistry 115, 1094 (2015).
6. R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld, Journal of Chemical Theory and Computation 11, 2087 (2015).
7. K. Hansen, F. Biegler, R. Ramakrishnan, W. Pronobis, O. A. von Lilienfeld, K.-R. Müller, and A. Tkatchenko, J. Phys. Chem. Lett. 6, 2326 (2015).
8. F. A. Faber, A. Lindmaa, O. A. von Lilienfeld, and R. Armiento, Physical Review Letters 117, 135502 (2016).
9. M. Hirn, S. Mallat, and N. Poilvert, Multiscale Modeling & Simulation 15, 827 (2017).
10. F. A. Faber, L. Hutchison, B. Huang, J. Gilmer, S. S. Schoenholz, G. E. Dahl, O. Vinyals, S. Kearnes, P. F. Riley, and O. A. von Lilienfeld, arXiv preprint arXiv:1702.05532 (2017).
11. H. Huo and M. Rupp, arXiv preprint arXiv:1704.06439 (2017).
12. M. Eickenberg, G. Exarchakis, M. Hirn, and S. Mallat, in Advances in Neural Information Processing Systems 30 (2017) pp. 6522–6531.
13. O. Isayev, C. Oses, C. Toher, E. Gossett, S. Curtarolo, and A. Tropsha, Nature Communications 8, 15679 (2017).
14. K. Ryczko, K. Mills, I. Luchak, C. Homenick, and I. Tamblyn, arXiv preprint arXiv:1706.09496 (2017).
15. I. Luchak, K. Mills, K. Ryczko, A. Domurad, and I. Tamblyn, arXiv preprint arXiv:1708.06686 (2017).
16. J. Behler and M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007).
17. J. Behler, J. Chem. Phys. 134, 074106 (2011).
18. A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, Phys. Rev. Lett. 104, 136403 (2010).
19. A. P. Bartók, R. Kondor, and G. Csányi, Phys. Rev. B 87, 184115 (2013).
20. A. V. Shapeev, Multiscale Modeling & Simulation 14, 1153 (2016).
21. S. Chmiela, A. Tkatchenko, H. E. Sauceda, I. Poltavsky, K. T. Schütt, and K.-R. Müller, Science Advances 3, e1603015 (2017).
22. F. Brockherde, L. Voigt, L. Li, M. E. Tuckerman, K. Burke, and K.-R. Müller, Nature Communications 8, 872 (2017).
23. J. S. Smith, O. Isayev, and A. E. Roitberg, Chemical Science 8, 3192 (2017).
24. E. V. Podryabinkin and A. V. Shapeev, Computational Materials Science 140, 171 (2017).
25. P. Rowe, G. Csányi, D. Alfè, and A. Michaelides, arXiv preprint arXiv:1710.04187 (2017).
26. D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams, in NIPS, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (2015) pp. 2224–2232.
27. S. Kearnes, K. McCloskey, M. Berndl, V. Pande, and P. F. Riley, Journal of Computer-Aided Molecular Design 30, 595 (2016).
28. K. T. Schütt, F. Arbabzadah, S. Chmiela, K.-R. Müller, and A. Tkatchenko, Nature Communications 8, 13890 (2017).
29. J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, in Proceedings of the 34th International Conference on Machine Learning (2017) pp. 1263–1272.
30. K. T. Schütt, P.-J. Kindermans, H. E. Sauceda, S. Chmiela, A. Tkatchenko, and K.-R. Müller, in Advances in Neural Information Processing Systems 30 (2017) pp. 992–1002.
31. D. Baehrens, T. Schroeter, S. Harmeling, M. Kawanabe, K. Hansen, and K.-R. Müller, Journal of Machine Learning Research 11, 1803 (2010).
32. K. Simonyan, A. Vedaldi, and A. Zisserman, arXiv preprint arXiv:1312.6034 (2013).
33. S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, PLoS ONE 10, e0130140 (2015).
34. L. M. Zintgraf, T. S. Cohen, T. Adel, and M. Welling, in ICLR (2017).
35. G. Montavon, S. Lapuschkin, A. Binder, W. Samek, and K.-R. Müller, Pattern Recognition 65, 211 (2017).
36. P.-J. Kindermans, K. T. Schütt, M. Alber, K.-R. Müller, D. Erhan, B. Kim, and S. Dähne, arXiv preprint arXiv:1705.05598 (2017).
37. G. Montavon, W. Samek, and K.-R. Müller, Digital Signal Processing 73, 1 (2018).
38. K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio, in International Conference on Machine Learning (2015) pp. 2048–2057.
39. J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996).
40. A. Tkatchenko and M. Scheffler, Phys. Rev. Lett. 102, 073005 (2009).
41. I. Poltavsky and A. Tkatchenko, Chem. Sci. 7, 1368 (2016).
42. G. W. Taylor and G. E. Hinton, in Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09 (2009).
43. D. Yu, L. Deng, and F. Seide, IEEE Transactions on Audio, Speech, and Language Processing 21, 388 (2013).
44. R. Socher, A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts, in EMNLP, Vol. 1631 (2013) p. 1642.
45. X. Jia, B. De Brabandere, T. Tuytelaars, and L. V. Gool, in Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (2016) pp. 667–675.
46. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, Neural Computation 1, 541 (1989).
47. A. Krizhevsky, I. Sutskever, and G. E. Hinton, in Advances in Neural Information Processing Systems (2012) pp. 1097–1105.
48. A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, in 9th ISCA Speech Synthesis Workshop (2016) pp. 125–125.
49. F. Chollet, arXiv preprint arXiv:1610.02357 (2016).
50. K. He, X. Zhang, S. Ren, and J. Sun, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016) pp. 770–778.
51. A. Pukrittayakamee, M. Malshe, M. Hagan, L. Raff, R. Narulkar, S. Bukkapatnum, and R. Komanduri, The Journal of Chemical Physics 130, 134101 (2009).
52. D. P. Kingma and J. Ba, in ICLR (2015).
53. R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld, Scientific Data 1, 140022 (2014).
54. L. C. Blum and J.-L. Reymond, J. Am. Chem. Soc. 131, 8732 (2009).
55. J.-L. Reymond, Acc. Chem. Res. 48, 722 (2015).
56. O. Vinyals, S. Bengio, and M. Kudlur, arXiv preprint arXiv:1511.06391 (2015).
57. M. Gastegger, J. Behler, and P. Marquetand, arXiv preprint arXiv:1705.05907 (2017).
58. A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, and K. A. Persson, APL Materials 1, 011002 (2013).
59. S. P. Ong, W. D. Richards, A. Jain, G. Hautier, M. Kocher, S. Cholia, D. Gunter, V. L. Chevrier, K. A. Persson, and G. Ceder, Computational Materials Science 68, 314 (2013).
60. P. Ramachandran and G. Varoquaux, Computing in Science & Engineering 13, 40 (2011).
61. Code and trained models are available at https://github.com/atomistic-machine-learning/SchNet.
62. I. Poltavsky, R. A. DiStasio Jr., and A. Tkatchenko, J. Chem. Phys. 148, 102325 (2018).
63. V. Blum, R. Gehrke, F. Hanke, P. Havu, V. Havu, X. Ren, K. Reuter, and M. Scheffler, Computer Physics Communications 180, 2175 (2009).
64. M. Ceriotti, J. More, and D. E. Manolopoulos, Computer Physics Communications 185, 1019 (2014).
65. M. Ceriotti, M. Parrinello, T. E. Markland, and D. E. Manolopoulos, The Journal of Chemical Physics 133, 124104 (2010), https://doi.org/10.1063/1.3489925.
