SchNet - a Deep Learning Architecture for Molecules and Materials
K.T. Schütt,1, a) H.E. Sauceda,2 P.-J. Kindermans,1 A. Tkatchenko,3, b) and K.-R. Müller1, 4, 5, c)
1) Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
2) Fritz-Haber-Institut der Max-Planck-Gesellschaft, 14195 Berlin, Germany
3) Physics and Materials Science Research Unit, University of Luxembourg, L-1511 Luxembourg, Luxembourg
4) Max-Planck-Institut für Informatik, Saarbrücken, Germany
5) Department of Brain and Cognitive Engineering, Korea University, Anam-dong, Seongbuk-gu, Seoul 136-713, South Korea
(Dated: 23 March 2018)
arXiv:1712.06113v3 [physics.chem-ph] 22 Mar 2018
Deep learning has led to a paradigm shift in artificial intelligence, including web, text and image search,
speech recognition, as well as bioinformatics, with growing impact in chemical physics. Machine learning in
general and deep learning in particular are ideally suited for representing quantum-mechanical interactions,
enabling us to model nonlinear potential-energy surfaces or enhance the exploration of chemical compound
space. Here we present the deep learning architecture SchNet that is specifically designed to model atomistic
systems by making use of continuous-filter convolutional layers. We demonstrate the capabilities of SchNet by
accurately predicting a range of properties across chemical space for molecules and materials where our model
learns chemically plausible embeddings of atom types across the periodic table. Finally, we employ SchNet
to predict potential-energy surfaces and energy-conserving force fields for molecular dynamics simulations of
small molecules and perform an exemplary study of the quantum-mechanical properties of C20-fullerene that
would have been infeasible with regular ab initio molecular dynamics.
FIG. 1. Illustrations of the SchNet architecture (left) and interaction blocks (right) with atom embedding in green, interaction blocks in yellow and property prediction network in blue. For each parameterized layer, the number of neurons is given. The filter-generating network (orange) is shown in detail in Fig. 2.

B. Atom-wise layers

Atom-wise layers are dense layers that are applied separately to the representations $x_i^l$ of each atom $i$:

$$x_i^{l+1} = W^l x_i^l + b^l \qquad (2)$$
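As a minimal sketch of Eq. (2), the atom-wise layer can be written as one matrix product in which every atom shares the same weights. The function name `atomwise_layer`, the array shapes and the random inputs are illustrative assumptions, not part of the paper or of any SchNet code base.

```python
import numpy as np

def atomwise_layer(x, W, b):
    """Apply the same dense layer to every atom: x_i^{l+1} = W x_i^l + b.

    x: (n_atoms, F_in) atom representations, one row per atom
    W: (F_out, F_in) weight matrix shared by all atoms
    b: (F_out,) bias shared by all atoms
    """
    return x @ W.T + b

rng = np.random.default_rng(0)
n_atoms, f_in, f_out = 5, 4, 3      # illustrative sizes, not taken from the paper
x = rng.normal(size=(n_atoms, f_in))
W = rng.normal(size=(f_out, f_in))
b = rng.normal(size=f_out)

y = atomwise_layer(x, W, b)

# Shared weights mean that permuting the atoms permutes the outputs in the same way.
perm = rng.permutation(n_atoms)
permutation_consistent = np.allclose(atomwise_layer(x[perm], W, b), y[perm])
```

Because the weights do not depend on the atom index, the layer works for any number of atoms and does not care how they are ordered.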
positions:

$$\hat{F}_i(Z_1, \dots, Z_n, r_1, \dots, r_n) = -\frac{\partial \hat{E}}{\partial r_i}(Z_1, \dots, Z_n, r_1, \dots, r_n). \qquad (4)$$

When using a rotationally invariant energy model, this ensures rotationally equivariant force predictions and guarantees an energy conserving force field [21].

F. Training

We train SchNet for each property target $P$ by minimizing the squared loss

$$\ell(\hat{P}, P) = \|P - \hat{P}\|^2.$$

For the training of energies and forces of molecular dynamics trajectories, we use a combined loss

TABLE I. Mean absolute errors for energy predictions on the QM9 data set using 110k training examples. For SchNet, we give the average over three repetitions as well as standard errors of the mean of the repetitions. Best models in bold.

Property  Unit      SchNet (T = 6)   enn-s2s [29]
HOMO      eV        0.041 ± 0.001    0.043
LUMO      eV        0.034 ± 0.000    0.037
Δ         eV        0.063 ± 0.000    0.069
ZPVE      meV       1.7 ± 0.033      1.5
μ         Debye     0.033 ± 0.001    0.030
α         Bohr³     0.235 ± 0.061    0.092
⟨R²⟩      Bohr²     0.073 ± 0.002    0.180
U0        eV        0.014 ± 0.001    0.019
U         eV        0.019 ± 0.006    0.019
H         eV        0.014 ± 0.001    0.017
G         eV        0.014 ± 0.000    0.019
Cv        cal/molK  0.033 ± 0.000    0.040
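The relation in Eq. (4) between an invariant energy and its forces can be checked numerically. The sketch below is a toy under stated assumptions: the energy is a made-up harmonic function of pairwise distances (not SchNet), and the derivative is taken by central finite differences instead of the automatic differentiation a neural network framework would use. The names `energy` and `forces` are hypothetical.

```python
import numpy as np

def energy(r, d0=1.5):
    """Toy rotationally invariant energy: harmonic in all pairwise distances."""
    i, j = np.triu_indices(len(r), k=1)
    d = np.linalg.norm(r[i] - r[j], axis=1)
    return 0.5 * np.sum((d - d0) ** 2)

def forces(r, eps=1e-5):
    """F_i = -dE/dr_i, estimated with central finite differences."""
    f = np.zeros_like(r)
    for a in range(r.shape[0]):
        for k in range(3):
            step = np.zeros_like(r)
            step[a, k] = eps
            f[a, k] = -(energy(r + step) - energy(r - step)) / (2 * eps)
    return f

rng = np.random.default_rng(0)
r = rng.normal(size=(4, 3))          # 4 atoms at random positions (illustrative)
f = forces(r)

# Random orthogonal matrix from a QR decomposition, applied to every position.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))

energy_invariant = np.isclose(energy(r @ q.T), energy(r))
forces_equivariant = np.allclose(forces(r @ q.T), f @ q.T, atol=1e-6)
net_force_zero = np.allclose(f.sum(axis=0), 0.0, atol=1e-6)
```

The energy is unchanged by the rotation while the forces rotate with the molecule, which is the behavior the text describes. Since the toy energy depends only on relative positions, the forces also sum to zero.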
TABLE II. Mean absolute errors for formation energy predictions in eV/atom on the Materials Project data set. For SchNet, we give the average over three repetitions as well as
FIG. 5. Local chemical potentials $\Omega_C(r)$ of DTNN (top) and SchNet (bottom) using a carbon test charge on a $\sum_i \|r - r_i\| = 3.7$ Å isosurface are shown for benzene, toluene, methane, pyrazine and propane.
TABLE III. Mean absolute errors for total energies (in kcal/mol). GDML [21], DTNN [28] and SchNet [30] test errors for N = 1,000 and N = 50,000 reference calculations of molecular dynamics simulations of small, organic molecules are shown. Best results are given in bold.

                 N = 1,000                          N = 50,000
                 GDML     SchNet   SchNet           DTNN     SchNet   SchNet
trained on       forces   energy   energy+forces    energy   energy   energy+forces
Benzene          0.07     1.19     0.08             0.04     0.08     0.07
Toluene          0.12     2.95     0.12             0.18     0.16     0.09
Malonaldehyde    0.16     2.03     0.13             0.19     0.13     0.08
Salicylic acid   0.12     3.27     0.20             0.41     0.25     0.10
Aspirin          0.27     4.20     0.37             –        0.25     0.12
Ethanol          0.15     0.93     0.08             –        0.07     0.05
Uracil           0.11     2.26     0.14             –        0.13     0.10
Naphthalene      0.12     3.58     0.16             –        0.20     0.11
TABLE IV. Mean absolute errors for atomic forces (in kcal/mol/Å). GDML [21] and SchNet [30] test errors for N = 1,000 and N = 50,000 reference calculations of molecular dynamics simulations of small, organic molecules are shown. Best results are given in bold.

                 N = 1,000                          N = 50,000
                 GDML     SchNet   SchNet           SchNet   SchNet
trained on       forces   energy   energy+forces    energy   energy+forces
Benzene          0.23     14.12    0.31             1.23     0.17
Toluene          0.24     22.31    0.57             1.79     0.09
Malonaldehyde    0.80     20.41    0.66             1.51     0.08
Salicylic acid   0.28     23.21    0.85             3.72     0.19
Aspirin          0.99     23.54    1.35             7.36     0.33
Ethanol          0.79     6.56     0.39             0.76     0.05
Uracil           0.24     20.08    0.56             3.28     0.11
Naphthalene      0.23     25.36    0.58             2.58     0.11
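The N = 1,000 columns of Tables III and IV make it easy to quantify how much the force information helps SchNet. The sketch below copies those SchNet entries and expresses the gain as the base-10 logarithm of the error ratio between energy-only and energy+forces training. The variable names are illustrative choices.

```python
import numpy as np

molecules = ["Benzene", "Toluene", "Malonaldehyde", "Salicylic acid",
             "Aspirin", "Ethanol", "Uracil", "Naphthalene"]

# SchNet test MAEs for N = 1,000, copied from Table III (energies, kcal/mol)
# and Table IV (forces, kcal/mol/Å).
energy_mae_energy_only = np.array([1.19, 2.95, 2.03, 3.27, 4.20, 0.93, 2.26, 3.58])
energy_mae_combined = np.array([0.08, 0.12, 0.13, 0.20, 0.37, 0.08, 0.14, 0.16])
force_mae_energy_only = np.array([14.12, 22.31, 20.41, 23.21, 23.54, 6.56, 20.08, 25.36])
force_mae_combined = np.array([0.31, 0.57, 0.66, 0.85, 1.35, 0.39, 0.56, 0.58])

# Orders of magnitude gained by adding forces to the training loss.
energy_orders = np.log10(energy_mae_energy_only / energy_mae_combined)
force_orders = np.log10(force_mae_energy_only / force_mae_combined)

for name, e, f in zip(molecules, energy_orders, force_orders):
    print(f"{name:15s} energy: {e:.2f}  forces: {f:.2f}")
```

For every molecule, both values fall between 1 and 2, i.e. the errors shrink by a factor between 10 and 100 on the small training set.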
not use a distance cutoff due to the small molecules and a range up to 20 Å for the Gaussian expansion to cover all distances. In Section III E, we will see that even lower errors can be achieved when using two separate SchNet models for energies and forces.

SchNet can take significant advantage of the additional force information, reducing energy and force errors by 1-2 orders of magnitude compared to energy-only training on the small training set. With 50,000 training examples, the improvements are less apparent as the potential energy surface is already well-sampled at this point. On the small training set, SchNet outperforms GDML on the more flexible molecules malonaldehyde and ethanol, while GDML reaches much lower force errors on the remaining MD trajectories that all include aromatic rings. A possible reason is that GDML defines an order of atoms in the molecule, while the SchNet architecture is inherently invariant to indexing, which constitutes a greater advantage in the more flexible molecules.

While GDML is more data-efficient than a neural network, SchNet is scalable to larger data sets. With 50,000 training examples and the combined loss, we obtain MAEs of energy and force predictions of at most 0.12 kcal/mol and 0.33 kcal/mol/Å, respectively. Remarkably, with the combined loss on only 1,000 reference calculations, SchNet reaches lower force errors than when trained on the energies of 50,000 examples (Table IV).

E. Application to molecular dynamics of C20-fullerene

After demonstrating the accuracy of SchNet on the MD17 benchmark set, we perform a study of an ML-driven MD simulation of C20-fullerene. This medium-sized molecule has a complex PES that must be described accurately to reproduce vibrational normal modes and their degeneracies. Here, we use SchNet to analyze some basic properties of the PES of C20 when introducing nuclear quantum effects. The reference data was generated by running classical MD at 500 K using DFT at the generalized gradient approximation (GGA) level of theory with the Perdew-Burke-Ernzerhof (PBE) [39] exchange-correlation functional and
the Tkatchenko-Scheffler (TS) method [40] to account for van der Waals interactions. Further details about the simulations can be found in Appendix B.

By training SchNet on DFT data at the PBE+vdW^TS level, we reduce the computation time per single point by three orders of magnitude from 11 s using 32 CPU cores to 10 ms using one NVIDIA GTX1080. This allows us to perform long MD simulations with DFT accuracy at low computational cost, making this kind of study feasible.

In order to obtain accurate energy and force predictions, we first perform an extensive model selection on the given reference data. We use 20k C20 reference calculations as training set, 4.5k examples for early stopping and report the test error on the remaining data. Table V lists the results for various settings of the number of interaction blocks T, the number of feature dimensions F of the atomic representations and the energy-force trade-off ρ [...]

TABLE V. Mean absolute errors for energy and force predictions of C20-fullerene in kcal/mol and kcal/mol/Å, respectively.

T   F     ρ       energy   forces
6   128   0.010   0.100    0.120
6   128   0.100   0.027    0.171
6   128   0.010   0.100    0.120
6   128   0.001   0.238    0.061
6   128   0.000   0.260    0.058

we find that the best choices for energy and forces vastly diverge: While we established before that energy predictions benefit from force information (see Table III), we achieve the best force predictions for C20-fullerene when neglecting the energies. We still benefit from using the derivative of an energy model as force model, since this still guarantees an energy-conserving force field [21]. For energy predictions, we obtain the best results when using a larger ρ = 0.1 as this puts more emphasis on the energy loss. Here, we select the force-only model as force field to drive our MD simulation since we are interested in the mechanical properties of the C20 fullerene. Fig. 7 shows a comparison of the normal modes obtained from DFT and our model. In the bottom panel, we show the accuracy of SchNet with the largest error being ∼1% of the DFT reference frequencies. Given these results and the accuracy reported in Table V, we obtained a model that successfully reconstructs the PES and its symmetries [61].

FIG. 7. Normal mode analysis of the fullerene C20 dynamics comparing SchNet and DFT results (horizontal axis: frequency [cm^-1]).

In addition, in Fig. 8 we present an analysis of the nearest neighbor (1nn), diameter and radial distribution functions at 300 K for classical MD (blue) and PIMD (green) simulations that include nuclear quantum effects. See Appendix B for further details on the simulation. From Fig. 8 (and Fig. 11), it appears that nuclear delocalization does not play a significant role in the peaks of the pair distribution function h(r) for C20 at room temperature. The nuclear quantum effects increase the 1nn distances by less than 0.5% but the delocalization of the bond lengths is considerable. This result agrees with previously reported PIMD simulations of graphene [62]. However, here we have non-symmetric distributions due to the finite size of C20.

[Plot of h(r) [a.u.] versus r [Å], presumably Fig. 8; only the end of its caption, "(green).", survives.]

Overall, with SchNet we could carry out 1.25 ns of PIMD, reducing the runtime compared to DFT by 3-4 orders of magnitude: from about 7 years to less than 7 hours with far fewer computational resources. Such long MD simulations are required for detailed studies of mechanical and thermodynamical properties as a function of the temperature, especially in the low-temperature regime where nuclear quantum effects become extremely important. Clearly, this application evinces the need for fast and accurate machine learning models such as SchNet to explore the different nature of chemical interactions and quantum behavior to better understand molecules and materials.

IV. CONCLUSIONS

Instead of having to painstakingly design mechanistic force fields or machine learning descriptors, deep learning allows us to learn a representation from first principles that adapts to the task and scale at hand, from property prediction across chemical compound space to force fields in the configurational space of single molecules. The design challenge here has been shifted to modelling quantum interactions by choosing a suitable neural network architecture. This gives rise to the possibility to encode known quantum-chemical constraints and symmetries within the model without losing the flexibility of a neural network. This is crucial in order to be able to accurately represent, e.g., the full potential-energy surface and in particular its anharmonic behavior.

We have presented SchNet as such a versatile deep learning architecture for quantum chemistry and a valuable tool in a variety of applications ranging from the property prediction for diverse datasets of molecules and materials to the highly accurate prediction of potential energy surfaces and energy-conserving force fields. As a variant of DTNNs, SchNet follows rotational, translational and permutational invariances by design and, beyond that, is able to directly model periodic boundary conditions. Not only does SchNet yield fast and accurate predictions, it also allows us to examine the learned representation using local chemical potentials [28]. Beyond that, we have analyzed the atomic embeddings learned by SchNet and found that fundamental chemical knowledge had been recovered purely from a dataset of bulk crystals and formation energies. Most importantly, we have performed an exemplary path-integral molecular dynamics study of the fullerene C20 at the PBE+vdW^TS level of theory that would not have been computationally feasible with common DFT approaches. These encouraging results will guide future work such as studies of larger molecules and periodic systems as well as further developments towards interpretable deep learning architectures to assist chemistry research.

ACKNOWLEDGMENTS

This work was supported by the Federal Ministry of Education and Research (BMBF) for the Berlin Big Data Center BBDC (01IS14013A). Additional support was provided by the DFG (MU 987/20-1), from the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No. 657679, the BK21 program funded by Korean National Research Foundation grant (No. 2012-005741) and the Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (No. 2017-0-00451). A.T. acknowledges support from the European Research Council (ERC-CoG grant BeStMo). Correspondence to KTS, AT and KRM.

Appendix A: Error distributions

In Figures 9 and 10, we show histograms of the absolute errors of the predicted properties of the QM9 and Materials Project datasets, respectively. The histograms include all test errors made across all three repetitions.

Appendix B: MD simulation details

The reference data for C20 was generated using classical molecular dynamics in the NVT ensemble at 500 K using the Nosé-Hoover thermostat with a time step of 1 fs. The forces and energies were computed using DFT at the generalized gradient approximation (GGA) level of theory with the non-empirical exchange-correlation functional of Perdew-Burke-Ernzerhof (PBE) [39] and the Tkatchenko-Scheffler (TS) method [40] to account for ubiquitous van der Waals interactions. The calculations were all-electron calculations with a light basis set as implemented in the FHI-aims code [63].

The nuclear quantum effects (NQE) are introduced using path-integral molecular dynamics (PIMD) via Feynman's path-integral formalism. The PIMD simulations were done using the SchNet model implementation in the i-PI code [64]. The integration timestep was set to 0.5 fs to ensure energy conservation along the MD using the NVT ensemble with a stochastic path integral Langevin equation (PILE) thermostat [65]. In PIMD, the treatment of NQE is controlled by the number of beads, P. In our example for C20 fullerene, we can see that at room temperature 8 beads already give a converged radial distribution function h(r), as shown in Figure 11.
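As a rough illustration of how a pair distribution function such as the h(r) discussed for C20 can be estimated from a trajectory, the sketch below histograms all interatomic distances over the frames. Everything specific here is an assumption made for illustration: the trajectory is synthetic random data (20 points near a sphere of radius 2 Å), the function name `pair_distance_histogram` is hypothetical, and the unit-area normalization is a convenient choice since the paper reports h(r) in arbitrary units.

```python
import numpy as np

def pair_distance_histogram(traj, r_min=0.0, r_max=5.0, n_bins=100):
    """Histogram of all pairwise distances over a trajectory.

    traj: (n_frames, n_atoms, 3) positions in Å
    Returns bin centers and a density with unit area over [r_min, r_max].
    """
    n_atoms = traj.shape[1]
    i, j = np.triu_indices(n_atoms, k=1)            # each atom pair once
    d = np.linalg.norm(traj[:, i] - traj[:, j], axis=-1).ravel()
    hist, edges = np.histogram(d, bins=n_bins, range=(r_min, r_max), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, hist

# Synthetic stand-in for an MD trajectory: 100 noisy copies of 20 points on a sphere.
rng = np.random.default_rng(0)
base = rng.normal(size=(20, 3))
base = 2.0 * base / np.linalg.norm(base, axis=1, keepdims=True)
traj = base + 0.05 * rng.normal(size=(100, 20, 3))

r, h = pair_distance_histogram(traj)
bin_width = r[1] - r[0]
```

In a path-integral simulation one would typically pool the configurations of all beads before building the histogram; that step is not shown here and would depend on the output format of the MD driver.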
[Figure 9 panels: number of predictions versus absolute error on a logarithmic axis (10^-5 to 10^1) for HOMO, LUMO, Δ and ZPVE [eV]; μ [Debye], α [Bohr³], ⟨R²⟩ [Bohr²] and U0 [eV]; U, H and G [eV] and Cv [cal/molK].]
FIG. 9. Histograms of absolute errors for all predicted properties of QM9. The histograms are plotted on a logarithmic scale
to visualize the tails of the distribution.
The histogram is plotted on a logarithmic scale to visualize the tails of the distribution.
FIG. 11. Radial distribution function h(r) [a.u.] of C20 fullerene as a function of the distance r [Å] from PIMD simulations with P = 1, 4, 8 and 12 beads.