Cui, Qiang - Meuwly, Markus - Ren, Pengyu - Many-Body Effects and Electrostatics in Biomolecules-Pan Stanford Publishing - Roca Raton (2016)

Many-Body Effects

and Electrostatics
in Biomolecules
edited by
Qiang Cui
Markus Meuwly
Pengyu Ren
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2016 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works


Version Date: 20160308

International Standard Book Number-13: 978-981-4613-93-4 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reason-
able efforts have been made to publish reliable data and information, but the author and publisher
cannot assume responsibility for the validity of all materials or the consequences of their use. The
authors and publishers have attempted to trace the copyright holders of all material reproduced in
this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know so
we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organiza-
tion that provides licenses and registration for a variety of users. For organizations that have been
granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
February 11, 2016 11:21 PSP Book - 9in x 6in 00-Qiang-Cui-prelims

Contents

Preface

SECTION I QM AND QM/MM METHODS

1 A Modified Divide-and-Conquer Linear-Scaling Quantum Force Field with Multipolar Charge Densities
Timothy J. Giese and Darrin M. York
1.1 Introduction
1.2 Linear-Scaling Quantum Force Fields
1.3 Methods
1.3.1 The Modified Divide-and-Conquer Method
1.3.2 Models
1.3.3 Computational Details
1.4 Results and Discussion
1.5 Conclusion
1.6 Appendices
1.6.1 Complex Harmonics and the Spherical Tensor Gradient Operator
1.6.2 Real-Valued Harmonics
1.6.3 Gaussian Multipole Expansions
1.6.4 Point Multipole Expansions
1.6.5 Real-Valued Spherical Harmonic Gaunt Coefficients

2 Explicit Polarization Theory
Yingjie Wang, Michael J. M. Mazack, Donald G. Truhlar, and Jiali Gao
2.1 Introduction
2.2 Theoretical Background
2.2.1 Approximation of the Total Wave Function and Total Energy
2.2.2 Approximation on the Electrostatic Interaction between Fragments
2.2.3 Approximations to Interfragment Exchange–Dispersion Interactions
2.2.4 Double Self-Consistent Field
2.3 Computational Details
2.4 Illustrative Examples
2.4.1 Multilevel X-Pol as a Quantum Chemical Model for Macromolecules
2.4.2 The XP3P Model for Water as a Quantum Mechanical Force Field
2.5 Conclusions

3 Quantum Mechanical Methods for Quantifying and Analyzing Non-Covalent Interactions and for Force-Field Development
C. David Sherrill and Kenneth M. Merz, Jr.
3.1 Introduction
3.2 Testing Force Fields Against High-Accuracy Quantum Mechanics
3.2.1 Coupled-Cluster Benchmarks for Non-Bonded Interactions
3.2.2 Comparison of Force Fields to Quantum Mechanical Benchmarks
3.2.3 Performance of Force Fields for π-Interactions
3.2.4 Error Analysis for the Indinavir/HIV-II Protease Complex
3.2.5 Error Analysis for Ubiquitin Folding
3.2.6 The Bio-Fragment Database
3.3 Understanding and Quantifying Intermolecular Interactions using Symmetry-Adapted Perturbation Theory
3.3.1 Using SAPT to Investigate Challenges for Current Force Fields
3.3.2 Atomic-Partitioned Symmetry-Adapted Perturbation Theory
3.4 Force Fields Fit to High-Quality Quantum Mechanical Data
3.4.1 Force Fields Fit to SAPT
3.5 Conclusions

4 Force Field Development with Density-Based Energy Decomposition Analysis
Nengjie Zhou, Qin Wu, and Yingkai Zhang
4.1 Introduction
4.2 Density-Based Energy Decomposition Analysis
4.2.1 The DEDA Approach
4.2.1.1 The frozen density energy
4.2.1.2 The electronic relaxation energy
4.2.1.3 The total binding energy
4.2.1.4 The implementation of DEDA
4.2.2 DEDA vs. EDA
4.2.3 Directional Dependence of Hydrogen Bonding
4.3 Smeared Charge Multipole Model for Electrostatics and Its Parameterization Protocol
4.3.1 Brief Summary of Current Electrostatic Models
4.3.2 Going Beyond Point Charges: The Smeared Charge with Multipole Model
4.4 Examination and Parameterization of Interatomic Potentials for Rare Gas Dimers
4.4.1 Van der Waals Descriptions by Atomic Force Fields
4.4.2 DEDA and the Born–Mayer-D3 van der Waals Model
4.5 Outlook

5 Effective Fragment Potential Method
Lyudmila V. Slipchenko
5.1 Introduction
5.2 Overview of the EFP Theory
5.3 Accuracy of the EFP Method for Describing Intermolecular Interactions
5.4 Chemistry of Non-Covalent Interactions
5.4.1 Competition between H-Bonding, π–π Bonding, and π–H Bonding
5.4.2 Many-Body Interactions in Mixed Systems
5.4.3 Role of Polarization Energy Increases from Dimers to Bulk
5.4.4 Affinity of Ions to Hydrophobic Interfaces
5.5 QM/EFP Schemes
5.6 Excited State Chemistry in the Condensed Phase
5.6.1 Solvatochromic Shifts and Photodynamics of Para-Nitroaniline
5.6.2 Thymine in Water
5.7 Technical Details and Implementation
5.8 Future Directions and Outlook

SECTION II ATOMISTIC MODELS

6 Explicit Inclusion of Induced Polarization in Atomistic Force Fields Based on the Classical Drude Oscillator Model
Alexey Savelyev, Benoît Roux, and Alexander D. MacKerell, Jr.
6.1 Introduction
6.2 Classification of Polarizable Models
6.2.1 Induced Dipole Models
6.2.2 Fluctuating Charge Models
6.2.3 Classical Drude Oscillator Model
6.2.4 Molecular Dynamics Simulations with the Classical Drude Polarizable Model via an Extended Lagrangian Integrator
6.3 Parametrization of the Drude Polarizable Force Field in CHARMM
6.3.1 Optimization of Electrostatic Parameters
6.3.2 Optimization of Lennard–Jones and Intramolecular Parameters
6.3.3 Optimization at the Macromolecular Level
6.4 Historical Overview of the CHARMM Drude Polarizable Force Field for Small Molecules and Biological Polymers
6.5 Conclusion

7 Multipolar Force Fields for Atomistic Simulations
Tristan Bereau and Markus Meuwly
7.1 Introduction
7.2 Describing Electrostatics in Atomistic Force Fields
7.2.1 Multipolar Interactions
7.2.2 Reference Axis Systems and Symmetries
7.2.3 Fluctuating and Conformationally Dependent Multipoles
7.3 Examples of MTP Implementations
7.3.1 Discrete Multipoles
7.3.2 Gaussian Multipoles
7.4 Parametrization of MTPs
7.4.1 Distributed Multipole Analysis
7.4.2 ESP-Based Fitting Methods
7.5 Molecular Simulations with MTPs
7.5.1 Energy Conservation
7.5.2 Long-Range Electrostatics
7.5.3 Performance Issues
7.6 Applications
7.6.1 Spectroscopy
7.6.2 Free-Energy Calculations
7.6.3 Dynamical Properties
7.7 Conclusions and Outlook

8 Status of the Gaussian Electrostatic Model, a Density-Based Polarizable Force Field
Jean-Philip Piquemal and G. Andrés Cisneros
8.1 Introduction
8.2 Density Fitting Methods
8.2.1 Analytical Fitting
8.2.2 Numerical Fitting
8.3 Distributed Multipoles
8.4 Reciprocal Space Methods for Integral Evaluation
8.5 The GEM and GEM* Force Fields
8.5.1 The GEM Functional Form
8.5.2 GEM*: Molecular Dynamics with Fitted Densities
8.6 Combining SIBFA and GEM: S/G–1
8.7 Conclusion and Perspective

9 Water Models: Looking Forward by Looking Backward
Toshiko Ichiye
9.1 Introduction
9.2 Potential Energy Functions for Liquid Water
9.2.1 Multisite Models
9.2.2 Molecular Multipole Models
9.2.3 Summary
9.3 The Pure Liquid
9.3.1 The Water Molecule in the Liquid Phase
9.3.2 Liquid Water
9.3.3 Summary
9.4 Aqueous Solutions
9.4.1 Hydrophobic Solvation
9.4.2 Polar Solvation
9.4.3 Ionic Solvation
9.4.4 Summary
9.5 Conclusions

10 Quantum Mechanics–Based Polarizable Force Field for Proteins
Changge Ji, Ye Mei, and John Z. H. Zhang
10.1 Fragment Quantum Chemistry Calculation of Proteins
10.2 Protein Solvation
10.3 Polarized Protein-Specific Charge
10.4 Dynamically Adapted Hydrogen Bond Charge
10.5 Effective Polarizable Bond Method
10.6 Applications
10.6.1 Thermodynamics of Proton Binding in Protein
10.6.2 Protein Ligand Binding
10.6.3 Protein Folding

11 Polarizable Continuum Models for (Bio)Molecular Electrostatics: Basic Theory and Recent Developments for Macromolecules and Simulations
John M. Herbert and Adrian W. Lange
11.1 Overview
11.2 Theoretical Background
11.2.1 Continuum Electrostatics
11.2.2 Practical Considerations
11.2.2.1 Matrix equations
11.2.2.2 Cavity construction and discretization
11.2.2.3 Beyond electrostatics
11.3 New Models and Insights
11.3.1 Generalized Debye–Hückel Theory
11.3.1.1 Alternative derivation of C-PCM/GCOSMO
11.3.1.2 DESMO and ion exclusion
11.3.2 Connections to Generalized Born Models
11.4 Advances in Algorithms
11.4.1 Intrinsically Smooth Discretization
11.4.2 Linear Scaling and Parallelization
11.4.2.1 Conjugate gradient solvers
11.4.2.2 Fast multipole method
11.4.2.3 Parallelization strategies
11.4.2.4 Surface construction strategies
11.4.2.5 Scalability tests
11.5 Summary and Future Directions

12 Differential Geometry-Based Solvation and Electrolyte Transport Models for Biomolecular Modeling: A Review
Guo Wei Wei and Nathan A. Baker
12.1 Background
12.2 Differential Geometry-Based Solvation Models
12.2.1 Nonpolar Solvation Model
12.2.2 Incorporating Polar Solvation with a Poisson–Boltzmann Model
12.2.3 Improving Poisson–Boltzmann Model Charge Distributions with Quantum Mechanics
12.3 Differential Geometry-Based Electrolyte Transport Models
12.3.1 A Differential Geometry-Based Poisson–Nernst–Planck Model
12.3.2 Quantum Mechanical Charge Distributions in the Poisson–Nernst–Planck Model
12.4 Concluding Remarks

SECTION III COARSE-GRAINED MODELS

13 A Physics-Based Coarse-Grained Model with Electric Multipoles
Guohui Li and Hujun Shen
13.1 Introduction
13.2 Model
13.2.1 GBEMP Energy Function
13.2.2 Gay–Berne Potential
13.2.3 Electric Multipole Potential
13.3 GBEMP Model for Molecular Solvents
13.4 GBEMP Model for Proteins
13.5 Summary

14 Coarse-Grained Membrane Force Field Based on Gay–Berne Potential and Electric Multipoles
Dejun Lin and Alan Grossfield
14.1 Introduction
14.2 GBEMP: A Coarse-Grained Model Based on the Gay–Berne Potential and Electric Multipoles
14.3 Application of the GBEMP Model to Lipid Membranes
14.3.1 Group Neighboring Heavy Atoms into CG Particles
14.3.2 Derive Initial Parameters from Gas-Phase Calculations
14.3.3 Validate and Adjust Parameters by Liquid-Phase Simulations
14.4 Implement the GBEMP Force Field in LAMMPS
14.5 Discussion

15 RNA Coarse-Grained Model Theory
David Bell and Pengyu Ren
15.1 Introduction
15.2 Primary and Secondary Structure
15.3 Three-Dimensional Structure
15.4 Fragment Library–Based Models
15.5 Coarse-Grained Force Field
15.6 Physics-Based Models
15.7 Conclusion

16 Perspectives on the Coarse-Grained Models of DNA
Ignacia Echeverria and Garegin A. Papoian
16.1 Introduction
16.2 Methods
16.2.1 Model 1: One-Bead Double-Stranded DNA Model by Molecular Renormalization Group Coarse-Graining
16.2.2 Model 2: Three-Collinear Bead DNA Model for Applications in Nanotechnology
16.2.3 Model 3: Three-Bead DNA Model to Reproduce Melting Temperatures
16.2.4 Other Models
16.3 Results
16.3.1 Reproducing DNA’s Structural Properties from CG Models
16.3.2 Reproducing DNA’s Thermodynamic Properties from CG Models
16.3.3 Example 1: Salt-Dependent Buckling of Circular DNA Molecules
16.3.4 Example 2: Obtaining the Hybridization Rate Constants
16.3.5 Example 3: Toehold-Mediated DNA Strand Displacement
16.3.6 Modeling of Chromatin
16.4 Conclusions and Outlook

Index

Preface

As computational hardware continues to develop at a rapid pace,
quantitative computations are playing increasingly important roles
in studying biomolecular systems. One of the eminent challenges
that the field faces is to develop the next generation of computational
models that strike the proper balance of computational efficiency
and accuracy, so that problems of increasing complexity can be
tackled in a systematic and physically robust manner. In particular,
properly treating intermolecular interactions is fundamentally
important to the reliability of all computational studies. In this book,
we have invited leading experts in the area of biomolecular simula-
tions to discuss cutting-edge ideas regarding effective strategies to
describe many-body effects and electrostatics at quantum, classical
and coarse-grained levels.
The first section covers recent developments of quantum
mechanical (QM) models for biomolecular applications. We start
with two chapters that discuss quantum mechanics–based force
fields, i.e., linear-scaling quantum mechanical models that divide
a macromolecule into smaller QM fragments that interact with
each other through approximate intermolecular forces. The model
of York and co-workers combines an approximate density
functional tight binding model for the fragments with multipolar
interactions between the fragments. The X-Pol framework of Gao and
Truhlar is general with respect to the QM level and inter-fragment
interactions can be treated with different levels of approximations.
The next few chapters discuss the physical origins of intermolecular
interactions using different energy decomposition schemes, and
the insights provide guidance to the development of the next
generation of force field models. Sherrill and Merz discuss the
symmetry-adapted perturbation theory in great depth, while Zhang
examines intermolecular interactions using a different density-based
energy decomposition scheme; Slipchenko reviews the recent
developments of the effective fragment potential approach, in which
all terms have clear physical connection to an underlying QM model.
The second section focuses on recent advances in atomistic
force fields for biomolecules. MacKerell and Roux summarize
the recent development of polarizable force fields, especially the
Drude-oscillator based model. This is complemented by chapters
by Meuwly et al., and Cisneros and Piquemal, who discuss the
importance of multipole-based electrostatic models in development
of highly accurate atomic force fields. These discussions are then
followed by the contribution of Ichiye, who clearly highlights the
importance of multipoles in describing water, arguably the most
important molecule of all time. Zhang and co-workers then discuss
the treatment of polarization using a framework that couples linear-
scaling QM calculations with classical simulations. Finally, we have
two chapters on the treatment of solvent using a continuum model;
although this topic has a long history, progress is needed to make
such models numerically robust and efficient for very large solutes,
treated either quantum mechanically or classically. Herbert et al.
and Wei and Baker have attacked these issues from complementary
angles.
In the final section, we have several chapters that discuss
the treatment of electrostatics and many-body effects in the
context of coarse-grained (CG) models. CG models are required to
extend the temporal and spatial scales of molecular simulations,
although the development of physically robust and transferable
CG models represents a major challenge as well. The chapters
of Li, Grossfield and Ren emphasize the importance of including
multipolar effects in developing CG models for proteins, lipids and
RNA systems, respectively. The contribution from Papoian and co-
workers nicely demonstrates how electrostatic interactions can be
treated effectively at the CG level for highly charged systems such as
DNA and protein-DNA complexes.
With this book, our goal is to not only provide an up-to-
date snapshot of the current molecular simulation field but also
stimulate the exchange of ideas across different sub-fields of
modern computational (bio)chemistry. We hope that the book will
become a broadly adopted reference for the biomolecular simulation
community and help attract talented young students into this
exciting frontier of research.

Qiang Cui
Markus Meuwly
Pengyu Ren
February 2, 2016 14:20 PSP Book - 9in x 6in 01-Qiang-Cui-c01

SECTION I

QM AND QM/MM METHODS



Chapter 1

A Modified Divide-and-Conquer
Linear-Scaling Quantum Force Field with
Multipolar Charge Densities

Timothy J. Giese and Darrin M. York


Center for Integrative Proteomics Research, BioMaPS Institute for Quantitative Biology
and Department of Chemistry and Chemical Biology, Rutgers University,
Piscataway, NJ 08854-8087, USA

Many-Body Effects and Electrostatics in Biomolecules
Edited by Qiang Cui, Markus Meuwly, and Pengyu Ren
Copyright © 2016 Pan Stanford Publishing Pte. Ltd.
ISBN 978-981-4613-92-7 (Hardcover), 978-981-4613-93-4 (eBook)
www.panstanford.com

1.1 Introduction

Recent advances in biomolecular modeling have emphasized the
importance of including explicit electronic polarizability and
a description of electrostatic interactions that includes atomic
multipoles; however, these additional levels of treatment necessarily
increase a model’s computational cost. Ultimately, the decision as to
whether inclusion of these more rigorous levels is justified rests
on the degree to which they impact the specific application areas
of interest, balanced with the overhead of their computational cost.
The purpose of this book is to stimulate the exchange of effective
strategies used to describe many-body effects and electrostatics
across the quantum, classical, and coarse-grained modeling regimes.
In this chapter, we describe a linear-scaling quantum force
field based on a modified divide-and-conquer (mDC) procedure
and discuss the practical consequence of including (or exclud-
ing) multipolar electrostatic interactions with a few illustrative
examples. These observations are then used to rationalize some
of the hydrogen bond geometries produced by other models,
including the standard DFTB3 semiempirical Hamiltonian, which
includes multipoles within its tight-binding matrix elements but
limits the second-order electrostatic interactions to monopoles.
Furthermore, we assess the ability of a recent mDC parametrization
to reproduce nucleobase dimer binding energies relative to high-
level ab initio calculations and we compare nucleobase trimer
formation enthalpies to experimental estimates.
The description of the mDC method in the present work
is supplemented with mathematical details that we have used
to introduce multipolar densities efficiently into the model. In
particular, we describe the mathematics needed to construct atomic
multipole expansions from atomic orbitals (AOs) and interact the
expansions with point-multipole and Gaussian-multipole functions.
With that goal, we present the key elements required to use the
spherical tensor gradient operator (STGO) and the real-valued
solid harmonics; perform multipole translations for use in the Fast
Multipole Method (FMM); electrostatically interact point-multipole
expansions; interact Gaussian-multipoles in a manner suitable for
real-space Particle Mesh Ewald (PME) corrections; and we list the
relevant real-valued spherical harmonic Gaunt coefficients for the
expansion of AO product densities into atom-centered multipoles.
Section 1.2 discusses the obstacles encountered in producing
a linear-scaling quantum force field and the methods used to
overcome them. The linear-scaling quantum force field energy is
described in Section 1.3. Section 1.4 discusses the consequences
of including higher-order multipoles in the model and assesses the
quality of the mDC method in reproducing nucleobase interactions.
The mathematical details used in the mDC model are collected into a
series of small appendices at the end of the chapter (Sections 1.6.1–1.6.5)
to facilitate the narrative.

1.2 Linear-Scaling Quantum Force Fields

Conventional ab initio electronic structure methods have
computational/memory requirements that scale non-linearly (typically
$N^3$/$N^2$ or higher) with the number of particles. This restricts the size
of the systems to which these methods can be applied. There is a rich
literature associated with the development of electronic structure
methods that scale “linearly” with system size, both at the ab
initio and semiempirical levels, that allow them to be extended to
very large systems [22]. These methods have traditionally involved
introduction of carefully chosen approximations that allow re-
formulation of the equations so that computation can be achieved
with computational cost and memory requirements that increase
in linear proportion to the number of particles and size of the
system. By adjustment of control parameters, these methods can be
made to systematically converge to the full non-linear scaling result.
The simplest and most widely applied linear-scaling electronic
structure methods are based on single-determinant wave function
methods such as Hartree–Fock, Kohn–Sham density-functional
theory, or semiempirical/tight-binding models. With these classes
of methods, the most critical challenge involves circumventing the
need for a globally orthonormal set of molecular orbitals (MOs) or,
equivalently, an exactly idempotent single-particle density-matrix.
A “linear-scaling quantum force field” is a model that abandons
the goal of being able to recover the full nonlinear quantum result,
but instead takes recourse into additional layers of empiricism to
achieve much greater efficiency and even higher accuracy. Typically,
these force fields invoke a construct whereby a large system is
divided into predetermined localized fragments (or residues), and
different models may be employed for intra- and inter-residue in-
teractions. One strategy has been to develop electron density-based
quantum force fields [8, 34] that do not require the construction or
orthogonalization of molecular orbitals (MOs). While this class of
force fields has demonstrated considerable promise for molecular
simulations, it has limitations in its ability to model reactive
chemical processes involving formation and cleavage of chemical
bonds. A different strategy, which we have taken here, involves using
localized MOs to describe intra-residue interactions, and empirical
density-based models to describe inter-residue interactions. This
framework borrows ideas from “orbital-free” density-functional
methods [48, 49, 56], but with the added element that non-bonded
inter-residue interactions are much simpler and empirically fine-
tuned to obtain accuracy required for biological applications. There
are a number of recent models that have used this strategy. The X-
Pol method replaces the explicit inter-residue orbital coupling with
empirical Lennard–Jones or Buckingham potentials [9, 14, 26, 45,
50–52, 55] or through perturbative corrections [7, 27]. In Ref. [16],
we used a charge-dependent density-overlap van der Waals model
as a means of combining the density- and MO-based quantum force
field strategies.
All quantum force fields, regardless of the specific form of the
intra- and inter-residue interactions, involve long-range electrosta-
tic interactions that must be computed with linear-scaling methods.
The two most common linear-scaling methods for electrostatic
interactions are tree codes and FMMs [2, 20, 23, 46] for non-periodic
systems, and linear-scaling Ewald methods such as PME [10, 12, 36]
for periodic systems. In brief, FMM is founded upon the physical
interpretation of the Laplace expansion of the Coulomb kernel
[Eqs. (1.36)–(1.37)], i.e., the Taylor series expansion of $1/r$: if each of
two charge densities is circumscribed by non-intersecting spheres,
then the Coulomb interaction between the two densities can be
computed from a single point-multipole interaction between the
sphere centers. Linear-scaling is achieved by introducing hierarchy,
i.e., the system is recursively divided, the multipole moments of a
region are computed from the moments of its children, and the
electrostatic interaction is performed at the most “ancient level”
possible. Linear-scaling Ewald methods, on the other hand, split
the electrostatic interactions into a short-ranged “direct-space”
[see e.g., Eq. (1.62) and surrounding discussion] and long-ranged
“reciprocal-space” components, the former of which can be computed
using a distance cut-off, and the latter of which can be computed efficiently
with $O[N \log(N)]$ computational scaling using Fast Fourier Transforms.
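
The direct/reciprocal splitting can be illustrated with a minimal sketch. It assumes the common error-function partition of the Coulomb kernel, $1/r = \mathrm{erfc}(\beta r)/r + \mathrm{erf}(\beta r)/r$; the function name `coulomb_split` and the value of the splitting parameter `beta` are hypothetical choices for illustration and are not taken from the chapter.

```python
import math


def coulomb_split(r, beta):
    """Split 1/r into short-ranged and long-ranged parts (Ewald-style).

    `beta` is an assumed splitting parameter with units of inverse length.
    """
    direct = math.erfc(beta * r) / r      # decays rapidly, so a distance cut-off is safe
    reciprocal = math.erf(beta * r) / r   # smooth everywhere, suited to Fourier methods
    return direct, reciprocal


if __name__ == "__main__":
    beta = 0.35  # illustrative value only
    for r in (1.0, 3.0, 6.0, 9.0):
        d, k = coulomb_split(r, beta)
        print(f"r={r:4.1f}  direct={d:.3e}  reciprocal={k:.3e}  sum={d + k:.6f}")
```

The two parts always add up to $1/r$, while the direct part becomes negligible beyond a few multiples of $1/\beta$; this is what permits the cut-off in the direct-space sum.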
In the Methods section that follows, we describe an mDC method
that is based on the DFTB3 Hamiltonian [15] and uses a simple
Lennard–Jones model for the non-electrostatic non-bonded inter-
actions between residues. However, unlike the DFTB3 Hamiltonian,
which uses a monopole approximation in its treatment of second-order
electrostatics, the mDC model employs atomic multipoles to
compute the inter-region interactions. As demonstrated in the Re-
sults and Discussion section, the use of a multipolar representation
for electrostatics is key for obtaining robust hydrogen bond angles
and hydrogen bond and base stacking interactions for nucleobase
dimers and trimers. The appendices contain further key technical
details that are needed for implementation of the mDC method with
linear-scaling electrostatic methods such as FMM and PME.
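
To make the difference between a monopole and a multipolar description concrete, the following sketch compares the exact Coulomb potential of a small neutral set of point charges with its Cartesian multipole expansion truncated at the monopole, dipole, and quadrupole level. This is a generic textbook expansion, not the spherical-tensor machinery of the appendices, and the water-like charges and geometry are hypothetical values chosen only for illustration.

```python
import math


def exact_potential(charges, point):
    """Coulomb potential at `point` from point charges [(q, (x, y, z)), ...]."""
    return sum(q / math.dist(pos, point) for q, pos in charges)


def multipole_potential(charges, point, lmax):
    """Multipole expansion about the origin, truncated at order lmax (0, 1, or 2)."""
    r = math.sqrt(sum(c * c for c in point))
    # Monopole: total charge.
    phi = sum(q for q, _ in charges) / r
    if lmax >= 1:
        # Dipole: mu_i = sum_k q_k x_{k,i}
        mu = [sum(q * pos[i] for q, pos in charges) for i in range(3)]
        phi += sum(mu[i] * point[i] for i in range(3)) / r**3
    if lmax >= 2:
        # Traceless quadrupole contracted with the field point:
        # (1/2) sum_k q_k [3 (x_k . r)^2 - |x_k|^2 r^2] / r^5
        quad = 0.0
        for q, pos in charges:
            x2 = sum(c * c for c in pos)
            dot = sum(pos[i] * point[i] for i in range(3))
            quad += q * (3.0 * dot * dot - x2 * r * r)
        phi += 0.5 * quad / r**5
    return phi


# Hypothetical neutral, water-like charge set (arbitrary units).
WATER_LIKE = [(-0.8, (0.0, 0.0, 0.0)),
              (0.4, (0.76, 0.59, 0.0)),
              (0.4, (-0.76, 0.59, 0.0))]
POINT = (3.0, 4.0, 5.0)

if __name__ == "__main__":
    ref = exact_potential(WATER_LIKE, POINT)
    for lmax in range(3):
        approx = multipole_potential(WATER_LIKE, POINT, lmax)
        print(f"lmax={lmax}  potential={approx:.6e}  error={abs(approx - ref):.2e}")
```

Because the charge set is neutral, the monopole term alone predicts no potential at all; the dipole and quadrupole terms recover most of the exact value at this distance.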

1.3 Methods

1.3.1 The Modified Divide-and-Conquer Method


The mDC total energy is a sum of fragment ab initio energies
$E_A$ (first term), which we compute using the DFTB3 Hamiltonian [15]; the
inter-fragment multipolar electrostatics (second term); the
inter-fragment Lennard–Jones (LJ) interactions (third term); and MM
bond energies $E_{\mathrm{bonded}}$ (fourth term) for those bonds, angles, and
torsions that cross fragment boundaries:

$$
E = \sum_{A} E_A(\mathbf{C}_A^{\sigma}; \mathbf{R}_A)
  + \frac{1}{2} \sum_{a} \sum_{l\mu \in a} q_{l\mu}\, p_{l\mu}
  + {\sum_{b>a}}' E_{\mathrm{LJ}}(R_{ab})
  + E_{\mathrm{bonded}}(\mathbf{R}).
\tag{1.1}
$$

$\mathbf{C}_A^{\sigma}$ are the $\sigma$-spin MO coefficients for the $A$th fragment, and $\mathbf{R}_A$ are
the nuclear positions of the atoms in fragment $A$. The quantities

$$
q_{l\mu \in a} = Z_a \delta_{l0} \delta_{\mu 0}
  - \int \rho_a(\mathbf{r})\, C_{l\mu}(\mathbf{r} - \mathbf{R}_a)\, d^3r
\tag{1.2}
$$

are atomic multipole moments on atom $a$, $C_{l\mu}(\mathbf{r})$ is a real regular
solid harmonic [Eq. (1.43)], $\rho_a(\mathbf{r})$ is an atom-partitioned density, $Z_a$
is a nuclear charge, and

$$
p_{l\mu \in a} = {\sum_{b \neq a}}' \sum_{j\kappa \in b} q_{j\kappa}\,
  \frac{C_{l\mu}(\nabla_a)}{(2l-1)!!}\,
  \frac{C_{j\kappa}(\nabla_b)}{(2j-1)!!}\,
  \frac{1}{R_{ab}}
\tag{1.3}
$$

is a “multipolar potential,” i.e., the derivative of the interaction
with respect to a multipole moment. The primed summations
indicate that intrafragment electrostatics are excluded because
those Coulomb interactions are already considered in the ab initio
calculation of $E_A$. $C_{l\mu}(\nabla_a)$ is a real-valued STGO acting on the
coordinates of atom $a$ (see Sec. 1.6.1). The expressions used to
evaluate Eq. (1.3) are provided in Sec. 1.6.4. $E_{\mathrm{bonded}}$ includes
corrections for those bonds, angles, and dihedrals that cross the
boundary between two covalently bonded fragments; however, the
present work will consider nonbonded interactions exclusively.
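
As a minimal sketch of how the second term of Eq. (1.1) is assembled, the code below evaluates the monopole-only ($l = 0$) limit of Eqs. (1.1) and (1.3), in which the multipolar potential reduces to $p_a = \sum'_{b} q_b / R_{ab}$. The function names and the `fragment_of` list (mapping each atom to a fragment index) are assumptions made for illustration; the full method also includes the higher-order multipoles handled by the expressions in Sec. 1.6.4.

```python
import math


def multipolar_potentials(charges, positions, fragment_of):
    """Monopole (l = 0) limit of Eq. (1.3): p_a = sum' over b of q_b / R_ab.

    The prime on the summation is implemented by skipping every atom b
    that belongs to the same fragment as atom a (this also skips b == a).
    """
    n = len(charges)
    p = [0.0] * n
    for a in range(n):
        for b in range(n):
            if fragment_of[a] == fragment_of[b]:
                continue  # intrafragment Coulomb terms are already inside E_A
            p[a] += charges[b] / math.dist(positions[a], positions[b])
    return p


def interfragment_electrostatics(charges, positions, fragment_of):
    """Second term of Eq. (1.1) restricted to l = 0: (1/2) sum_a q_a p_a."""
    p = multipolar_potentials(charges, positions, fragment_of)
    return 0.5 * sum(q * pa for q, pa in zip(charges, p))


if __name__ == "__main__":
    # Two hypothetical two-atom fragments, 4 length units apart.
    q = [0.4, -0.4, 0.3, -0.3]
    xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 4.0), (1.0, 0.0, 4.0)]
    frag = [0, 0, 1, 1]
    print(interfragment_electrostatics(q, xyz, frag))
```

The factor of 1/2 compensates for each inter-fragment pair being visited twice, once from each atom's potential.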
As discussed in the previous section, the relaxation of the
orthonormality constraints allows one to solve for the MO coefficients
through a series of small generalized eigenvalue problems
(proportional to the size of a fragment),

$$
\mathbf{F}_A^{\sigma} \cdot \mathbf{C}_A^{\sigma}
  = \mathbf{S}_A \cdot \mathbf{C}_A^{\sigma} \cdot \mathbf{E}_A^{\sigma}.
\tag{1.4}
$$
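
The structure of Eq. (1.4) can be seen in a toy two-orbital fragment, for which the generalized eigenvalue problem has a closed-form solution. The sketch below is only an illustration under that assumption: the function name and the matrix values are hypothetical, and a production code would call a dense linear-algebra library for each fragment instead.

```python
import math


def solve_2x2_generalized(F, S):
    """Solve F c = eps S c for symmetric 2x2 F and S (S positive definite).

    Returns [(eps, (c1, c2)), ...] sorted by eps, with each eigenvector
    normalized so that c^T S c = 1.
    """
    (f11, f12), (_, f22) = F
    (s11, s12), (_, s22) = S
    # det(F - eps*S) = 0 is the quadratic a*eps^2 + b*eps + c = 0.
    a = s11 * s22 - s12 * s12
    b = -(f11 * s22 + f22 * s11 - 2.0 * f12 * s12)
    c = f11 * f22 - f12 * f12
    disc = math.sqrt(b * b - 4.0 * a * c)
    solutions = []
    for eps in sorted(((-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a))):
        # Null vector of (F - eps*S), read off its first row
        # (assumes that row is not identically zero).
        v = (-(f12 - eps * s12), f11 - eps * s11)
        norm = math.sqrt(s11 * v[0] ** 2 + 2.0 * s12 * v[0] * v[1] + s22 * v[1] ** 2)
        solutions.append((eps, (v[0] / norm, v[1] / norm)))
    return solutions


if __name__ == "__main__":
    F = ((-0.5, -0.4), (-0.4, -0.5))  # hypothetical Fock matrix
    S = ((1.0, 0.3), (0.3, 1.0))      # hypothetical overlap matrix
    for eps, vec in solve_2x2_generalized(F, S):
        print(eps, vec)
```

Because each fragment is solved independently, the cost of this step grows with the number of fragments rather than with the cube of the total system size.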
The inter-fragment coupling occurs through the interaction of their
atomic multipoles, which are determined from the fragment electron
densities within the self-consistent-field (SCF) procedure. The
$\sigma$-spin Fock matrix for region $A$ with inclusion of this coupling is

$$
F^{\sigma}_{A,ij} =
  \left. \frac{\partial E_A}{\partial P^{\sigma}_{A,ij}} \right|_{q,p,\mathbf{R}}
  + \sum_{a \in A} \sum_{l\mu \in a} p_{l\mu}
  \left. \frac{\partial q_{l\mu}}{\partial P^{\sigma}_{A,ij}} \right|_{p,\mathbf{R}},
\tag{1.5}
$$

where

$$
P^{\sigma}_{A,ij} = \sum_{k} n^{\sigma}_{A,k}\, C^{\sigma}_{A,ik}\, C^{\sigma}_{A,jk}
\tag{1.6}
$$

is the spin-resolved AO density matrix of fragment $A$, and $n^{\sigma}_{A,k}$ is the
occupation number of $\sigma$-spin orbital $k$ in fragment $A$.
The atomic multipoles are computed from the DFTB3 density matrix
$$ \rho_a(\mathbf{r}) = \sum_{ij \in a} P_{A,ij}\, \chi_i(\mathbf{r}) \chi_j(\mathbf{r}) + \sum_{b \neq a} f_{ab}(b_{ab}) \sum_{\substack{i \in a \\ j \in b}} P_{A,ij}\, \chi_i(\mathbf{r}) \chi_j(\mathbf{r}), \tag{1.7} $$
where $\chi_i(\mathbf{r}) = \chi_i(r)\, Y_{l_i\mu_i}(\Omega)$ is an AO basis function, $Y_{l\mu}(\Omega)$ is a real-valued spherical harmonic [Eq. (1.45)],
$$ f_{ab}(b_{ab}) = f^{s}_{ab} + \left( f^{d}_{ab} - f^{s}_{ab} \right) S_{\mathrm{on}}\!\left( \frac{b_{ab} - b^{s}_{ab}}{b^{d}_{ab} - b^{s}_{ab}} \right) \tag{1.8} $$
is a fraction between 0 and 1 and holding the property $f_{ab} = 1 - f_{ba}$,
$$ b_{ab} = 2 \sum_{\substack{i \in a \\ j \in b}} P_{A,ij}\, S_{A,ij} \tag{1.9} $$
is a Mulliken bond-order, and
$$ S_{\mathrm{on}}(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x > 1 \\ 10x^3 - 15x^4 + 6x^5 & \text{otherwise} \end{cases} \tag{1.10} $$
is a smooth polynomial used to switch $f_{ab}$ from $f^{s}_{ab}$ to $f^{d}_{ab}$ as the bond order increases.
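The switching behavior of Eqs. (1.8) and (1.10) is easy to check numerically. The sketch below is illustrative only; the parameter values in `params` are hypothetical and are not the fitted mDC parameters.

```python
def s_on(x):
    """Quintic switch of Eq. (1.10): 0 for x < 0, 1 for x > 1, smooth in between."""
    if x < 0.0:
        return 0.0
    if x > 1.0:
        return 1.0
    return 10.0 * x**3 - 15.0 * x**4 + 6.0 * x**5

def f_ab(b_ab, f_s, f_d, b_s, b_d):
    """Eq. (1.8): move the partition fraction from f_s to f_d as b_ab grows."""
    return f_s + (f_d - f_s) * s_on((b_ab - b_s) / (b_d - b_s))

# Hypothetical parameters, for illustration only.
params = dict(f_s=0.5, f_d=0.7, b_s=1.0, b_d=2.0)
for b in (0.8, 1.5, 2.4):
    print(b, f_ab(b, **params))
```

The polynomial has vanishing first and second derivatives at x = 0 and x = 1, so the fraction changes smoothly at both ends of the switching window.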
The atomic multipoles are obtained by inserting Eq. (1.7) into Eq. (1.2). If we restrict the contributions of the two-center densities to charge only, then the charge on atom a is
$$ q_{00} = Z_a - b_{aa}/2 - \sum_{\substack{b \in A \\ b \neq a}} f_{ab}(b_{ab})\, b_{ab} \tag{1.11} $$
and its higher-order multipole moments are
$$ q_{l\mu} = \sum_{ij \in a} P^{A}_{ij}\, M^{(l)}_{ij} \sqrt{\frac{4\pi}{2l+1}} \int Y_{l\mu}(\Omega)\, Y_{l_i\mu_i}(\Omega)\, Y_{l_j\mu_j}(\Omega)\, d\Omega, \tag{1.12} $$
where the integral is a real-valued spherical harmonic Gaunt coefficient (Sec. 1.6.5) and the
$$ M^{(l)}_{ij} = \int_0^{\infty} \chi_i(r)\, \chi_j(r)\, r^{2+l}\, dr \tag{1.13} $$
are treated as parameters. For an sp-basis, there are two parameters: $M^{(1)}_{sp}$ and $M^{(2)}_{pp}$, which control the magnitude of the dipole and quadrupole contributions, respectively. We restrict the $M^{(2)}_{sd}$, $M^{(1)}_{pd}$, and $M^{(2)}_{dd}$ parameters encountered in an spd-basis to the values of $M^{(2)}_{pp}$, $M^{(1)}_{sp}$, and $M^{(2)}_{pp}$, respectively.
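To make the charge expression of Eq. (1.11) concrete, the following sketch evaluates the Mulliken bond orders of Eq. (1.9) and the resulting atomic charges for a toy system. Everything specific here is an assumption made for illustration: the function name `atomic_charges`, the minimal-basis H2-like density matrix, and the constant fractions in `f` (in the model these come from Eq. (1.8)).

```python
import numpy as np

def atomic_charges(P, S, atom_of_ao, Z, f):
    """q_00 of Eq. (1.11) from b_ab of Eq. (1.9); f[a][b] plays the role of f_ab."""
    atom_of_ao = np.asarray(atom_of_ao)
    n_atoms = len(Z)
    PS = P * S                                  # element-wise P_ij S_ij
    b = np.zeros((n_atoms, n_atoms))
    for a in range(n_atoms):
        for c in range(n_atoms):
            block = PS[np.ix_(atom_of_ao == a, atom_of_ao == c)]
            b[a, c] = 2.0 * block.sum()         # Mulliken bond order
    q = np.array(Z, dtype=float)
    for a in range(n_atoms):
        q[a] -= 0.5 * b[a, a]
        q[a] -= sum(f[a][c] * b[a, c] for c in range(n_atoms) if c != a)
    return q, b

# Minimal-basis H2-like toy: one AO per atom, overlap s, doubly occupied bonding MO.
s = 0.6
S = np.array([[1.0, s], [s, 1.0]])
P = np.full((2, 2), 1.0 / (1.0 + s))
f = [[0.0, 0.3], [0.7, 0.0]]                    # hypothetical, f_ab = 1 - f_ba
q, b = atomic_charges(P, S, atom_of_ao=[0, 1], Z=[1.0, 1.0], f=f)
```

Since $f_{ab} + f_{ba} = 1$ and $b_{ab} = b_{ba}$, the two-center population is split between the two atoms without loss, so the charges sum to the total charge of the fragment regardless of how the fractions are chosen.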

1.3.2 Models
The different models compared and discussed in this paper include:
mDC: The method described in the previous section and parametrized to the S22 [29], S66 [41], JSCH [28, 29, 43], and SCAI [5] databases and to a database of sulfur-containing molecules and water clusters. The description of the parametrization procedure and a detailed and broad analysis of mDC performance are presented in a manuscript that is, at the time of this writing, in press.
mDC(q): The modified “charge-only” model described in Ref. [16]. This model does not expand the atomic densities to higher-order multipoles.
DFTB3: The 3ob parametrized version of DFTB3 [15], i.e., DFTB3-3ob.
GAFF/TIP3P: The general Amber force field [6, 44] and TIP3P water.
PM6: The semiempirical method described in Ref. [38], as implemented in Gaussian 09 [13].
PM3BP: The semiempirical method described in Ref. [17].
mPWPW91, M062X, and B97D: The mPWPW91/MIDI!, M062X/6-311++G**, and B97D/6-311++G** density functional methods implemented in Gaussian 09 [13]. The B97D model contains empirical long-range dispersion corrections [24].

Table 1.1 Average molecular dipole and quadrupole moment errors of 10 nucleobase monomers. All values in a.u.

          Dipole   Quadrupole
mDC       0.07     1.22
DFTB3     0.21     3.26
GAFF      0.25     2.29

1.3.3 Computational Details


Table 1.1 displays molecular dipole and quadrupole moment errors averaged over 10 nucleobases. The reference molecular moments were computed with B3LYP/6-311++G**. An error of these vector quantities is taken to be the magnitude of the difference vector between the model and reference moments. The average magnitudes of the reference dipole and quadrupole moment vectors are 1.98 and 14.58 a.u., respectively.
The reference energies and geometries used in Table 1.2 were computed with counterpoise-corrected CCSD(T)/CBS//MP2/TZVPP or MP2/CBS//MP2/cc-pVTZ, which were taken from Refs. [28] and [43], whose naming convention we adopt. “crms” is the average coordinate root mean square deviation of the optimized dimer geometry relative to the reference geometry. All atoms were included in the calculation of the crms. The row of angle mean signed errors was constructed by comparing the angles formed between the two planes of the nucleobases relative to those in the reference geometry.

Table 1.2 Nucleobase dimer binding energies (kcal/mol), binding energy statistics, and a summary of geometrical errors.

              Ref.     mDC      DFTB3    GAFF     PM6      M062X    B97D

H-bonded dimers
AT wc        −16.86   −15.99    −8.90   −13.60    −8.90   −14.87   −15.56
GC wc        −32.06   −32.50   −21.92   −26.15   −18.47   −28.22   −28.11
GA 1         −19.40   −18.36   −11.32   −15.70   −10.13   −17.09   −18.14
GA 2         −14.40   −14.11    −5.86   −12.40    −6.45   −11.23   −12.57
GA 3         −18.80   −17.90    −9.60   −15.39    −9.18   −16.09   −16.83
GA 4         −13.50   −14.43    −6.76   −12.85    −7.42   −12.07   −13.48
GA 1 pl      −18.90   −17.98   −11.15   −11.20   −10.13   −17.09   −18.14
GA 2 pl      −12.80   −13.28    −5.37    −8.20    −6.11   −11.22   −12.58
E mue          ···      0.73     8.23     3.90     8.74     2.36     1.41
E mse          ···      0.27     8.23     3.90     8.74     2.36     1.41
crms (Å)       ···      0.13     0.16     0.36     0.29     0.19     0.18
∠plane (°)     ···      4.91    13.50    17.48    12.00    10.62     9.66

Stacked dimers
AT S1        −12.30   −13.34    −9.11   −13.20    −5.27   −13.86   −12.25
mAmT S       −14.57   −15.29    −9.17   −14.66    −5.99   −16.46   −14.89
GC S         −19.02   −18.62   −21.92   −26.15   −18.47   −28.25   −28.11
mGmC S       −20.35   −20.02   −21.99   −22.01   −18.28   −27.94   −27.99
E mue          ···      0.62     3.28     2.44     4.56     5.07     4.27
E mse          ···     −0.26     1.01    −2.44     4.56    −5.07    −4.25
crms (Å)       ···      0.28     2.70     0.80     1.56     1.41     1.42
∠plane (°)     ···      8.79    10.29    15.48    10.56     6.38     5.23

Combined statistics
E mue          ···      0.70     6.58     3.42     7.35     3.26     2.37
E mse          ···      0.10     5.83     1.79     7.35    −0.12    −0.47
crms (Å)       ···      0.18     1.00     0.51     0.71     0.60     0.59
∠plane (°)     ···      6.20    12.43    16.81    11.52     9.21     8.18
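The energy statistics in Table 1.2 can be reproduced directly from the tabulated binding energies. The sketch below does this for the mDC column; it assumes the error is defined as model minus reference, which is the convention that reproduces the signs of the tabulated mean signed errors. The function names are illustrative.

```python
# Reference and mDC binding energies (kcal/mol) copied from Table 1.2.
hbonded_ref = [-16.86, -32.06, -19.40, -14.40, -18.80, -13.50, -18.90, -12.80]
hbonded_mdc = [-15.99, -32.50, -18.36, -14.11, -17.90, -14.43, -17.98, -13.28]
stacked_ref = [-12.30, -14.57, -19.02, -20.35]
stacked_mdc = [-13.34, -15.29, -18.62, -20.02]

def mean_signed_error(model, reference):
    """Average of (model - reference); assumed sign convention."""
    return sum(m - r for m, r in zip(model, reference)) / len(reference)

def mean_unsigned_error(model, reference):
    """Average of |model - reference|."""
    return sum(abs(m - r) for m, r in zip(model, reference)) / len(reference)

hb_mue = mean_unsigned_error(hbonded_mdc, hbonded_ref)
hb_mse = mean_signed_error(hbonded_mdc, hbonded_ref)
st_mue = mean_unsigned_error(stacked_mdc, stacked_ref)
st_mse = mean_signed_error(stacked_mdc, stacked_ref)
all_mue = mean_unsigned_error(hbonded_mdc + stacked_mdc, hbonded_ref + stacked_ref)
```

With this convention a positive mean signed error means the model underbinds on average, and the rounded results agree with the mDC entries of the table (0.73 and 0.27 for the H-bonded set, 0.62 and −0.26 for the stacked set, 0.70 combined).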

Table 1.3 Hydrogen bond length (Å) and angle (°) errors.

                       mDC     DFTB3   GAFF    PM6     M062X   B97D
R_N(-H)···O    mse    −0.02    0.02   −0.01    0.07    0.04    0.01
R_(N-)H···O    mse    −0.01    0.01   −0.00    0.07    0.04   −0.00
∠N-H···O       mue     1.44    2.17    4.71    3.88    1.75    1.30

R_N(-H)···N    mse    −0.10    0.08   −0.00    0.19    0.01   −0.03
R_(N-)H···N    mse    −0.10    0.08    0.02    0.21    0.01   −0.04
∠N-H···N       mue     2.93    1.97    6.69    9.40    1.59    2.20

Table 1.4 Nucleobase trimer formation enthalpies (kcal/mol). Brackets represent a Boltzmann averaging of the conformations shown above it at 298 K.

           Expt.    mDC     DFTB3   mPWPW91   PM3BP
UUA 1      ···      28.3    15.3    21.0      25.2
UUA 2      ···      28.0    15.1    21.4      25.1
UUA 3      ···      24.7    16.1    17.0      20.5
UUA 4      ···      26.7    17.5    17.4      20.6
⟨UUA⟩      27−29    28.1    17.3    21.3      25.2

UUU 1      ···      26.5    16.1     8.5      13.1
UUU 2      ···      21.6    16.5    11.3      14.6
⟨UUU⟩      20−22    26.5    16.4    11.3      14.5

UUT        23−25    21.6    15.6     7.1      12.7

CCC 4      33−38    34.0    20.0    22.0      28.9

The geometrical errors shown in Table 1.3 include 6 N-H···O bond lengths and angles and 11 N-H···N bond lengths and angles. R_X(-H)···Y and R_(X-)H···Y denote the X-Y and H-Y distances, respectively.
The experimental numbers appearing in Table 1.4 are taken from
Ref. [53]. The mPWPW91/MIDI! and PM3BP results are taken from
Ref. [17], whose naming convention we adopt. The mDC, DFTB3, and
mPWPW91 results include zero point and thermal corrections to the
enthalpy at 298K using standard ideal-gas statistical mechanics and
the rigid-rotor harmonic-oscillator approximation. The presence of
two experimental numbers represents the two manners used to
analyze the results in Ref. [53].
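The Boltzmann-averaged rows of Table 1.4 can be checked with a short calculation. The sketch below assumes that a larger tabulated formation enthalpy corresponds to a more stable conformation, so that each conformation is weighted by exp(+ΔH/kT); this weighting is an assumption on our part, adopted because it reproduces the averaged entries in the table. The function name and the value of the gas constant are supplied here for illustration.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def boltzmann_average(enthalpies, temperature=298.0):
    """Population-weighted mean, with weights exp(+H/kT) (assumed convention)."""
    kT = K_B * temperature
    h_max = max(enthalpies)  # shift the exponent for numerical stability
    weights = [math.exp((h - h_max) / kT) for h in enthalpies]
    return sum(w * h for w, h in zip(weights, enthalpies)) / sum(weights)

# UUA 1-4 conformations from Table 1.4 (kcal/mol).
uua_mdc = [28.3, 28.0, 24.7, 26.7]
uua_dftb3 = [15.3, 15.1, 16.1, 17.5]
avg_mdc = boltzmann_average(uua_mdc)
avg_dftb3 = boltzmann_average(uua_dftb3)
```

Under this assumption the averages round to 28.1 and 17.3 kcal/mol, matching the ⟨UUA⟩ entries for mDC and DFTB3. At 298 K, kT is about 0.59 kcal/mol, so conformations more than a few kcal/mol less stable contribute almost nothing.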

1.4 Results and Discussion

In addition to providing the mathematical details of how atomic multipoles are incorporated into the mDC model, we wish to explain with some illustrative examples why we consider them necessary and then use these examples to interpret some observations made using the standard DFTB3 semiempirical Hamiltonian.
The DFTB3 Hamiltonian contains four components to the energy:

(1) the MO-computed tight-binding interaction composed of the electron kinetic energy and the first-order interaction of the response density with the effective chemical potential caused by the neutral atom reference density, as modeled by a 1- or 2-body approximation,
(2) the second-order electrostatic interaction of the response density with itself, which has been parametrized to experimental hardness so as to effectively include nonclassical effects,
(3) a short-range repulsive function to achieve good covalent bond lengths, and
(4) a third-order response interaction which attempts to correct the second-order electrostatics to account for the fact that anionic electron densities should be more diffuse than the neutrals.
The DFTB3 electrostatic interactions are computed from atomic charges only, even though the orbitals used to compute the first-order interactions contain higher-order multipoles.

When we built the mDC method upon the DFTB3 Hamiltonian, we were thus faced with the choice of computing the inter-fragment interactions using the atomic charges that DFTB3 happens to use or constructing our own representation of the charge density from the DFTB3 density matrix. Preliminary tests of a method using the DFTB3 charges, mDC(q), proved unsatisfactory upon examining the geometries of hydrogen bonded (H-bonded) clusters (see Fig. 1.1). One of our goals was to make sure that the mDC method was at least as good as DFTB3, but the H-bond angles produced by the mDC(q) model were more similar to the TIP3P model than to either ab initio calculations or DFTB3. Considering that both DFTB3 and mDC(q) use atomic charges to compute the second-order electrostatics, we were left to hypothesize that it was the multipolar character in DFTB3’s tight-binding matrix elements that caused it to achieve good H-bond angles. The inter-fragment tight-binding matrix elements are removed in the mDC model, so we chose to model the behavior by increasing the order of atomic multipoles used to compute the electrostatics [16]. The resulting method, mDC, yields water H-bond angles in good agreement with DFTB3.

Figure 1.1 Optimized water dimer hydrogen bond angles.

Figure 1.2 Hydrogen bond angle of water to asparagine at N···O separations of 3 and 4 Å.

This hypothesis is further supported upon considering the H-bond formed between water and the amine group of asparagine (see Fig. 1.2). Ab initio geometry optimizations produce a water that is angled relative to the plane of asparagine’s amine group, whereas TIP3P water is consistently coplanar. We interpret this observation as resulting from TIP3P’s lack of higher-order multipoles. DFTB3 produces an angled water near the energy minimum, but when the water is pulled away from the amine group, it reverts to a coplanar TIP3P-like structure. In other words, the DFTB3 geometries agree with ab initio when there is significant AO overlap between the molecules, but DFTB3 acts like a point-charge model when the overlap is small. This is consistent with the above hypothesis and suggests that one could improve DFTB3 by extending its second-order electrostatic interactions to include higher-order multipoles. The mDC model uses higher-order multipoles without making intrusive changes to the underlying DFTB3 Hamiltonian and produces H-bond angles in better agreement with ab initio for all separations.
The use of atomic multipoles improves mDC’s description of
electrostatic potentials. Upon comparing the electrostatic potentials
generated by mDC and DFTB3, we’ve found that the most significant
improvements occur in molecules containing π-bonds, sp3 oxygen
and sulfur lone pair electrons, and sp2 nitrogen lone pairs. In
comparison to DFTB3 and GAFF, mDC also shows an overall
statistical improvement in the molecular dipole and quadrupole
moments (see e.g., Table 1.1).
The above assessment of mDC focused on examples that highlight the influence of including higher-order multipoles. We now assess the quality of mDC H-bonded and stacked nucleobase interactions and make comparison to other commonly used methods. There are many small variations and parametrizations of semiempirical models [1, 3, 17, 18, 30–32, 35, 37, 39, 40, 42, 54], but for brevity we limit our comparison to those which have seen widespread use and implementation into common software packages. Firstly, mDC produces the smallest energetic and geometrical errors of any method in Table 1.2. Generally speaking, the high-level reference binding energies are much stronger than those predicted by the standard semiempirical models. The GAFF force field energies are better than those of the semiempirical methods, and GAFF often prevents the stacked dimers from devolving into H-bonded complexes. The DFTB3 method reproduces H-bonded geometries more accurately than GAFF even though GAFF’s H-bonded energetic errors are roughly half as large. The ab initio H-bond interactions are superior to those of the semiempirical models, but do not show a significant improvement for stacked interactions. This is, in part, due to the “de-stacking” of some dimers upon geometry optimization.

Figure 1.3 mGmC S stacked dimer (left) and GA 4 hydrogen bonded dimer (right) coordinate root mean square overlay of the mDC structure (ball-and-stick, colored) onto the reference structures (lines, black). These two dimers are the worst mDC structures in the set of molecules listed in Table 1.2.

Our primary measure of quantifying geometrical errors is through the coordinate root mean square overlays (crms). The ∠plane errors measure the angle formed between the vectors normal to the planes of the two bases, which are computed by diagonalizing their moment of inertia tensors. We attach greater meaning to the H-bonded ∠plane errors than to those of the stacked dimers because the angle in a stacked dimer is approximately zero, but if the geometry optimization de-stacks the structure, then the angle within the resulting (incorrect) H-bonded structure is also approximately zero.
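A plane angle of this kind can be computed as sketched below. This is an illustration under stated assumptions rather than the procedure used to generate Table 1.2: the normal of each base is taken as the principal axis with the largest moment of inertia, unit masses are used unless masses are supplied, and the angle is folded into the range 0° to 90° because a plane normal is defined only up to sign. The function names and the test geometry are hypothetical.

```python
import numpy as np

def plane_normal(coords, masses=None):
    """Principal axis with the largest moment of inertia (normal of a planar molecule)."""
    coords = np.asarray(coords, dtype=float)
    m = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    r = coords - np.average(coords, axis=0, weights=m)   # center-of-mass frame
    # Inertia tensor: sum_i m_i (|r_i|^2 * identity - r_i r_i^T)
    inertia = (np.einsum("i,ij,ij->", m, r, r) * np.eye(3)
               - np.einsum("i,ij,ik->jk", m, r, r))
    vals, vecs = np.linalg.eigh(inertia)                 # eigenvalues ascending
    return vecs[:, -1]

def plane_angle(coords_a, coords_b, masses_a=None, masses_b=None):
    """Angle in degrees between the two plane normals, folded into [0, 90]."""
    na = plane_normal(coords_a, masses_a)
    nb = plane_normal(coords_b, masses_b)
    cosang = abs(float(np.dot(na, nb)))                  # normals are sign-ambiguous
    return float(np.degrees(np.arccos(min(1.0, cosang))))

# Hypothetical test geometry: a square in the xy plane and a copy tilted by 30 degrees.
square = np.array([[1, 0, 0], [0, 1, 0], [-1, 0, 0], [0, -1, 0]], dtype=float)
c, s = np.cos(np.radians(30.0)), np.sin(np.radians(30.0))
rot_x = np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
tilted = square @ rot_x.T
angle = plane_angle(square, tilted)
```

Because both a correctly stacked dimer and a de-stacked, coplanar H-bonded dimer give an angle near zero, this measure cannot by itself flag de-stacking, which is why the crms is the primary geometrical metric.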
Table 1.3 compares the H-bond distance and angle errors. We note that the mDC N-N distances are 0.1 Å too short, which may explain why mDC was capable of reproducing the high-level dimer interaction energies. The mDC errors listed in Table 1.3 and Fig. 1.3 are not disturbing considering that the overall errors in the mDC geometries are significantly smaller than those of the other methods.
Table 1.4 compares the experimental trimer enthalpies of formation to mDC, mPWPW91, and PM3BP. mDC is in much better agreement with the experimental results than the other methods, which underpredict the strength of the H-bonds in the trimer. We suspect that the added strength afforded by mDC is largely a result of the slightly reduced N-N distances, as seen in Table 1.3; however, comparison between the mPWPW91 and mDC geometries shows an overall agreement in geometries (see, e.g., Fig. 1.4).

Figure 1.4 UUA 1 trimer coordinate root mean square overlay of the mDC structure (ball-and-stick, colored) onto the mPWPW91 structure (lines, black). Hydrogen bond lengths are listed in Å.

1.5 Conclusion

This chapter has sought to stimulate the exchange of effective strategies used to describe many-body effects and electrostatics within the context of a linear-scaling quantum force field. In particular, we’ve provided the mathematical details required to implement the multipolar densities used in the mDC model and highlighted the importance of using multipoles in our method with some illustrative examples.
We observe that the water dimer H-bond angles are reproduced when higher-order multipoles are included, whereas a charge-only model causes the dimer to revert into a TIP3P-like structure. Furthermore, we observe that standard DFTB3 H-bond angles are quite good when there is significant overlap between the AOs of two molecules, but it too can revert to a TIP3P-like structure when the AO overlap is small. We attribute this phenomenon to DFTB3’s use of multipoles in the AO tight-binding matrix and not the second-order electrostatics.
In addition, we provided a brief comparison between nucleobase
dimer and trimer binding energies and geometries as computed
with mDC, other semiempirical models, a molecular mechanical
force field, and several ab initio methods. mDC was shown to
reproduce the high-level ab initio and experimental results with the
greatest accuracy.
Further tests with the mDC model will be necessary to fully
realize the benefits of a linear-scaling quantum force field. We are
currently implementing a generalized PME method for condensed
phase calculations using our treatment of atomic multipoles (further
details of which are described in the appendices). Incorporation
of mDC and the generalized PME method is ongoing and will be
described in more detail in future work.

1.6 Appendices

1.6.1 Complex Harmonics and the Spherical Tensor Gradient Operator

The complex spherical harmonic $Y_{lm}(\Omega)$ is related to the associated Legendre polynomial $P_{lm}(x)$ by
$$ Y_{lm}(\Omega) = (-1)^m \sqrt{\frac{2l+1}{4\pi}\, \frac{(l-m)!}{(l+m)!}}\; P_{lm}(\cos\theta)\, e^{im\phi} \tag{1.14} $$
$$ P_{lm}(x) = (1 - x^2)^{m/2} \left( \frac{d}{dx} \right)^{m} P_l(x) \tag{1.15} $$
$$ P_l(x) = \frac{1}{2^l\, l!} \left( \frac{d}{dx} \right)^{l} (x^2 - 1)^l, \tag{1.16} $$
where $P_l(x)$ is a Legendre polynomial. The complex-valued regular $C_{lm}(\mathbf{r})$ and irregular $Z_{lm}(\mathbf{r})$ solid harmonics and the complex-valued scaled regular $R_{lm}(\mathbf{r})$ and irregular $I_{lm}(\mathbf{r})$ solid harmonics are
$$ C_{lm}(\mathbf{r}) = \sqrt{\frac{4\pi}{2l+1}}\; r^{l}\, Y_{lm}(\Omega) \tag{1.17} $$
$$ Z_{lm}(\mathbf{r}) = \sqrt{\frac{4\pi}{2l+1}}\; r^{-l-1}\, Y_{lm}(\Omega) \tag{1.18} $$
$$ R_{lm}(\mathbf{r}) = C_{lm}(\mathbf{r}) / a_{lm} \tag{1.19} $$
$$ I_{lm}(\mathbf{r}) = a_{lm}\, Z_{lm}(\mathbf{r}) \tag{1.20} $$
$$ a_{lm} = \sqrt{(l+m)!\,(l-m)!} \tag{1.21} $$
The spherical tensor gradient operator (STGO) is a solid harmonic whose Cartesian coordinate arguments have been replaced by Cartesian derivatives. Hobson’s theorem [25] is the result of applying an STGO to any spherical function $f(r^2)$
$$ C_{lm}(\nabla) f(r^2) = 2^l\, C_{lm}(\mathbf{r}) \left( \frac{d}{dr^2} \right)^{l} f(r^2) \tag{1.22} $$
$$ R_{lm}(\nabla) f(r^2) = 2^l\, R_{lm}(\mathbf{r}) \left( \frac{d}{dr^2} \right)^{l} f(r^2). \tag{1.23} $$
The STGO obeys the product rule [11]
$$ C_{lm}(\nabla) \left[ f(\mathbf{r}) g(\mathbf{r}) \right] = \sum_{jk} \frac{a_{lm}}{a_{jk}\, a_{l-j,m-k}} \left[ C_{l-j,m-k}(\nabla) f(\mathbf{r}) \right] \left[ C_{jk}(\nabla) g(\mathbf{r}) \right] \tag{1.24} $$
$$ R_{lm}(\nabla) \left[ f(\mathbf{r}) g(\mathbf{r}) \right] = \sum_{jk} \left[ R_{l-j,m-k}(\nabla) f(\mathbf{r}) \right] \left[ R_{jk}(\nabla) g(\mathbf{r}) \right] \tag{1.25} $$
When the STGO acts upon another solid harmonic, one obtains the following STGO differentiation rules [4, 47]
$$ C_{jk}(\nabla)\, C_{lm}(\mathbf{r}) = (-1)^k\, \frac{(2j-1)!!\, a_{lm}}{a_{jk}\, a_{l-j,m+k}}\, C_{l-j,m+k}(\mathbf{r}) \tag{1.26} $$
$$ C^{*}_{jk}(\nabla)\, C_{lm}(\mathbf{r}) = \frac{(2j-1)!!\, a_{lm}}{a_{jk}\, a_{l-j,m-k}}\, C_{l-j,m-k}(\mathbf{r}) \tag{1.27} $$
$$ C_{jk}(\nabla)\, Z_{lm}(\mathbf{r}) = (-1)^j\, \frac{(2j-1)!!\, a_{l+j,m+k}}{a_{lm}\, a_{jk}}\, Z_{l+j,m+k}(\mathbf{r}) \tag{1.28} $$
$$ C^{*}_{jk}(\nabla)\, Z_{lm}(\mathbf{r}) = (-1)^{j+k}\, \frac{(2j-1)!!\, a_{l+j,m-k}}{a_{lm}\, a_{jk}}\, Z_{l+j,m-k}(\mathbf{r}) \tag{1.29} $$
$$ \frac{a_{jk}^2}{(2j-1)!!}\, R_{jk}(\nabla)\, R_{lm}(\mathbf{r}) = (-1)^k\, R_{l-j,m+k}(\mathbf{r}) \tag{1.30} $$
$$ \frac{a_{jk}^2}{(2j-1)!!}\, R^{*}_{jk}(\nabla)\, R_{lm}(\mathbf{r}) = R_{l-j,m-k}(\mathbf{r}) \tag{1.31} $$
$$ \frac{a_{jk}^2}{(2j-1)!!}\, R_{jk}(\nabla)\, I_{lm}(\mathbf{r}) = (-1)^j\, I_{l+j,m+k}(\mathbf{r}) \tag{1.32} $$
$$ \frac{a_{jk}^2}{(2j-1)!!}\, R^{*}_{jk}(\nabla)\, I_{lm}(\mathbf{r}) = (-1)^{j+k}\, I_{l+j,m-k}(\mathbf{r}) = (-1)^{j+m}\, I^{*}_{l+j,k-m}(\mathbf{r}). \tag{1.33} $$


The uses of the above rules are numerous; however, the reader may gain a better appreciation upon considering two brief examples. We can express the translation of a regular or irregular solid harmonic with a Taylor series expansion.
$$ \begin{aligned} C_{lm}(\mathbf{r} + \mathbf{a}) &= e^{\mathbf{a} \cdot \nabla}\, C_{lm}(\mathbf{r}) \\ &= \sum_{jk} \frac{C_{jk}(\mathbf{a})}{(2j-1)!!}\, C^{*}_{jk}(\nabla)\, C_{lm}(\mathbf{r}) \\ &= \sum_{jk} \frac{a_{lm}}{a_{jk}\, a_{l-j,m-k}}\, C_{l-j,m-k}(\mathbf{r})\, C_{jk}(\mathbf{a}) \end{aligned} \tag{1.34} $$
The second line made use of the fact that $\nabla^2 C_{lm}(\mathbf{r}) = 0$, and the third line used Eq. (1.27). This result is known as the addition theorem of solid harmonics. Applying this same procedure to the other harmonics produces
$$ R_{lm}(\mathbf{r} + \mathbf{a}) = \sum_{jk} R_{l-j,m-k}(\mathbf{r})\, R_{jk}(\mathbf{a}) \tag{1.35} $$
$$ Z_{lm}(\mathbf{r} + \mathbf{a}) = \sum_{jk} (-1)^{j+m}\, \frac{a_{l+j,k-m}}{a_{lm}\, a_{jk}}\, C_{jk}(\mathbf{a})\, Z^{*}_{l+j,k-m}(\mathbf{r}) \tag{1.36} $$
$$ I_{lm}(\mathbf{r} + \mathbf{a}) = \sum_{jk} (-1)^{j+m}\, R_{jk}(\mathbf{a})\, I^{*}_{l+j,k-m}(\mathbf{r}). \tag{1.37} $$
For the special case $l = m = 0$ and $\mathbf{a} = -\mathbf{r}'$, Eqs. (1.36)–(1.37) are known as the Laplace expansion.
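As a brief worked illustration of this special case, Eq. (1.37) can be reduced to the familiar multipole expansion of the Coulomb operator. The steps below use only $I_{00}(\mathbf{r}) = 1/r$ and the parity of the regular harmonics, $R_{jk}(-\mathbf{r}') = (-1)^j R_{jk}(\mathbf{r}')$; the condition $r > r'$ is assumed so that the series converges.

```latex
\begin{aligned}
\frac{1}{|\mathbf{r} - \mathbf{r}'|} = I_{00}(\mathbf{r} - \mathbf{r}')
  &= \sum_{jk} (-1)^{j}\, R_{jk}(-\mathbf{r}')\, I^{*}_{jk}(\mathbf{r}) \\
  &= \sum_{jk} R_{jk}(\mathbf{r}')\, I^{*}_{jk}(\mathbf{r}), \qquad r > r' .
\end{aligned}
```

The two parity factors cancel, leaving a sum in which the regular harmonics carry the coordinates of the inner point and the irregular harmonics carry those of the outer point.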

1.6.2 Real-Valued Harmonics

The scaled solid harmonics are decomposed into their real (c) and imaginary (s) components $R_{lm}(\mathbf{r}) = R^{c}_{lm}(\mathbf{r}) + i R^{s}_{lm}(\mathbf{r})$ from which one defines the real-valued scaled solid harmonics
$$ R_{l\mu}(\mathbf{r}) = \begin{cases} R^{c}_{l|\mu|}(\mathbf{r}), & \mu \ge 0 \\ R^{s}_{l|\mu|}(\mathbf{r}), & \mu < 0, \end{cases} \tag{1.38} $$
where a negative μ represents the sine-component of positive m and is used only to simplify notation where appropriate. When −m’s are encountered in formulas, the reader is implicitly instructed to apply the symmetry property $R^{c/s}_{l,-m}(\mathbf{r}) = \pm(-1)^m R^{c/s}_{lm}(\mathbf{r})$, which follows directly from $R^{*}_{lm}(\mathbf{r}) = (-1)^m R_{l,-m}(\mathbf{r})$, where the sign ± corresponds to the cosine/sine designation. The solid harmonics are efficiently computed from recursion [46]
$$ R^{c/s}_{mm}(\mathbf{r}) = -\frac{x R^{c/s}_{m-1,m-1}(\mathbf{r}) \mp y R^{s/c}_{m-1,m-1}(\mathbf{r})}{2m} \tag{1.39} $$
$$ R^{c/s}_{lm}(\mathbf{r}) = \frac{(2l-1)\, z\, R^{c/s}_{l-1,m}(\mathbf{r}) - r^2 R^{c/s}_{l-2,m}(\mathbf{r})}{(l+m)(l-m)} \tag{1.40} $$
$$ I^{c/s}_{mm}(\mathbf{r}) = -\frac{(2m-1)}{r^2} \left[ x I^{c/s}_{m-1,m-1}(\mathbf{r}) \mp y I^{s/c}_{m-1,m-1}(\mathbf{r}) \right] \tag{1.41} $$
$$ I^{c/s}_{lm}(\mathbf{r}) = \frac{(2l-1)}{r^2}\, z\, I^{c/s}_{l-1,m}(\mathbf{r}) - \frac{(l-1)^2 - m^2}{r^2}\, I^{c/s}_{l-2,m}(\mathbf{r}) \tag{1.42} $$
which are initiated from $R^{c}_{00}(\mathbf{r}) = 1$, $R^{s}_{00}(\mathbf{r}) = 0$, $I^{c}_{00}(\mathbf{r}) = 1/r$, and $I^{s}_{00}(\mathbf{r}) = 0$. The real-valued regular and irregular solid harmonics and real-valued spherical harmonics are then
$$ C_{l\mu}(\mathbf{r}) = A_{l\mu}\, R_{l\mu}(\mathbf{r}) \tag{1.43} $$
$$ Z_{l\mu}(\mathbf{r}) = I_{l\mu}(\mathbf{r}) / A_{l\mu} \tag{1.44} $$
$$ Y_{l\mu}(\Omega) = \sqrt{\frac{2l+1}{4\pi}}\; C_{l\mu}(\hat{\mathbf{r}}) \tag{1.45} $$
where
$$ A_{l\mu} = (-1)^{\mu} \sqrt{(2 - \delta_{\mu,0})\,(l+\mu)!\,(l-\mu)!}. \tag{1.46} $$
One can construct a real-valued STGO by replacing the Cartesian coordinate arguments of $C_{l\mu}(\mathbf{r})$ with their Cartesian gradients.
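The recursions of Eqs. (1.39) and (1.40) translate almost line for line into code. The sketch below is an illustrative implementation of the regular harmonics only, under the reading of Eqs. (1.38), (1.43), and (1.46) given in this section; the function names and the dictionary layout are choices made here, not part of the original work.

```python
import math

def regular_solid_harmonics(x, y, z, lmax):
    """Return dicts Rc[(l, m)] and Rs[(l, m)] for 0 <= m <= l <= lmax."""
    r2 = x * x + y * y + z * z
    Rc, Rs = {(0, 0): 1.0}, {(0, 0): 0.0}
    for m in range(lmax + 1):
        if m > 0:  # Eq. (1.39): diagonal step (m-1, m-1) -> (m, m)
            Rc[m, m] = -(x * Rc[m - 1, m - 1] - y * Rs[m - 1, m - 1]) / (2 * m)
            Rs[m, m] = -(x * Rs[m - 1, m - 1] + y * Rc[m - 1, m - 1]) / (2 * m)
        for l in range(m + 1, lmax + 1):  # Eq. (1.40): raise l at fixed m
            for R in (Rc, Rs):
                prev2 = R.get((l - 2, m), 0.0)  # zero when l - 2 < m
                R[l, m] = ((2 * l - 1) * z * R[l - 1, m] - r2 * prev2) / ((l + m) * (l - m))
    return Rc, Rs

def real_solid_harmonic(l, mu, x, y, z):
    """C_{l mu}(r) = A_{l mu} R_{l mu}(r); negative mu selects the sine component."""
    Rc, Rs = regular_solid_harmonics(x, y, z, l)
    m = abs(mu)
    A = (-1) ** m * math.sqrt((2 - (mu == 0)) * math.factorial(l + m) * math.factorial(l - m))
    return A * (Rc[l, m] if mu >= 0 else Rs[l, m])

# Arbitrary test point, chosen only for illustration.
x, y, z = 0.3, -0.4, 0.5
Rc, Rs = regular_solid_harmonics(x, y, z, lmax=3)
```

With these definitions the l = 1 functions reduce to the Cartesian components (C_{1,1} = x, C_{1,−1} = y, C_{1,0} = z), and the z-derivative rule dR_{lm}/dz = R_{l−1,m} can be confirmed by finite differences.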

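By using the complex-valued STGO differentiation rules and the relationship between the complex- and real-valued harmonics, one obtains the gradients [33]
$$ \frac{d R^{c/s}_{lm}(\mathbf{r})}{dx} = \frac{1}{2} \left[ R^{c/s}_{l-1,m+1}(\mathbf{r}) - R^{c/s}_{l-1,m-1}(\mathbf{r}) \right] \tag{1.47} $$
$$ \frac{d R^{c/s}_{lm}(\mathbf{r})}{dy} = \pm\frac{1}{2} \left[ R^{s/c}_{l-1,m+1}(\mathbf{r}) + R^{s/c}_{l-1,m-1}(\mathbf{r}) \right] \tag{1.48} $$
$$ \frac{d R^{c/s}_{lm}(\mathbf{r})}{dz} = R^{c/s}_{l-1,m}(\mathbf{r}) \tag{1.49} $$
and
$$ \frac{d I^{c/s}_{lm}(\mathbf{r})}{dx} = \frac{1}{2} \left[ I^{c/s}_{l+1,m+1}(\mathbf{r}) - I^{c/s}_{l+1,m-1}(\mathbf{r}) \right] \tag{1.50} $$
$$ \frac{d I^{c/s}_{lm}(\mathbf{r})}{dy} = \pm\frac{1}{2} \left[ I^{s/c}_{l+1,m+1}(\mathbf{r}) + I^{s/c}_{l+1,m-1}(\mathbf{r}) \right] \tag{1.51} $$
$$ \frac{d I^{c/s}_{lm}(\mathbf{r})}{dz} = -I^{c/s}_{l+1,m}(\mathbf{r}). \tag{1.52} $$
The real solid harmonics obey the translation theorems [19, 46]
$$ R_{l\mu}(\mathbf{r} - \mathbf{R}_b) = \sum_{j\kappa} W_{l\mu,j\kappa}(\mathbf{R}_{ab})\, R_{j\kappa}(\mathbf{r} - \mathbf{R}_a) \tag{1.53} $$
$$ C_{l\mu}(\mathbf{r} - \mathbf{R}_b) = \sum_{j\kappa} \bar{W}_{l\mu,j\kappa}(\mathbf{R}_{ab})\, C_{j\kappa}(\mathbf{r} - \mathbf{R}_a) \tag{1.54} $$
where
$$ W^{c/s,c}_{lm,jk}(\mathbf{r}) = \left[ R^{c/s}_{l-j,m-k}(\mathbf{r}) + (-1)^k R^{c/s}_{l-j,m+k}(\mathbf{r}) \right] / 2^{\delta_{k,0}} \tag{1.55} $$
$$ W^{c/s,s}_{lm,jk}(\mathbf{r}) = \mp R^{s/c}_{l-j,m-k}(\mathbf{r}) \pm (-1)^k R^{s/c}_{l-j,m+k}(\mathbf{r}) \tag{1.56} $$
and
$$ \bar{W}_{l\mu,j\kappa}(\mathbf{r}) = (A_{l\mu} / A_{j\kappa})\, W_{l\mu,j\kappa}(\mathbf{r}). \tag{1.57} $$
Consider a system composed of atomic multipoles, i.e., $\rho(\mathbf{r}) = \sum_{a,\, l\mu \in a} q_{l\mu}\, \chi_{l\mu}(\mathbf{r} - \mathbf{R}_a)$, where $\chi_{l\mu}(\mathbf{r} - \mathbf{R}_a)$ is any function satisfying
$$ \int \chi_{l\mu}(\mathbf{r} - \mathbf{R}_a)\, C_{j\kappa}(\mathbf{r} - \mathbf{R}_a)\, d^3r = \delta_{lj}\, \delta_{\mu\kappa}; \tag{1.58} $$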

then the multipole moments of $\rho(\mathbf{r})$ evaluated about the origin $\mathbf{R}_o$ are
$$ q_{l\mu} = \int \rho(\mathbf{r})\, C_{l\mu}(\mathbf{r} - \mathbf{R}_o)\, d^3r = \sum_{a,\, j\kappa \in a} \bar{W}_{l\mu,j\kappa}(\mathbf{R}_{ao})\, q_{j\kappa}. \tag{1.59} $$
The translation of multipoles, as in Eq. (1.59), is a key component of the Fast Multipole Method [20, 46].
The gradients of $W(\mathbf{r})$ can be expressed in terms of the matrix elements themselves in a manner analogous to Eqs. (1.47)–(1.49), e.g., $(d/dz)\, W^{c/s,c/s}_{lm,jk}(\mathbf{r}) = W^{c/s,c/s}_{(l-1,m),jk}(\mathbf{r})$. The translation matrix is efficiently computed using the identity $W^{c/s,c/s}_{(l+1,m),(j+1,k)}(\mathbf{r}) = W^{c/s,c/s}_{lm,jk}(\mathbf{r})$.

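1.6.3 Gaussian Multipole Expansions

One deduces the form of a Gaussian multipole upon considering Eq. (1.58) and the orthogonality of the spherical harmonics [19]
$$ \begin{aligned} \chi_{l\mu}(\mathbf{r} - \mathbf{R}_a; \zeta) &= \left( \frac{\zeta}{\pi} \right)^{3/2} \frac{(2\zeta)^l}{(2l-1)!!}\, e^{-\zeta |\mathbf{r} - \mathbf{R}_a|^2}\, C_{l\mu}(\mathbf{r} - \mathbf{R}_a) \\ &= \frac{C_{l\mu}(\nabla_a)}{(2l-1)!!}\, \chi_{00}(\mathbf{r} - \mathbf{R}_a; \zeta) \end{aligned} \tag{1.60} $$
The interaction of two Gaussian multipoles via operator $\hat{O}(\mathbf{r} - \mathbf{r}')$ is [19]
$$ E = \sum_{\substack{l\mu \in a \\ j\kappa \in b}} q_{l\mu}\, q_{j\kappa}\, O_{l\mu,j\kappa}(\mathbf{R}_{ab}), \tag{1.61} $$
where
$$ \begin{aligned} O_{l\mu,j\kappa}(\mathbf{R}_{ab}) &= \frac{C_{l\mu}(\nabla_a)}{(2l-1)!!}\, \frac{C_{j\kappa}(\nabla_b)}{(2j-1)!!} \iint \chi_{00}(\mathbf{r} - \mathbf{R}_a; \zeta_a)\, \hat{O}(\mathbf{r} - \mathbf{r}')\, \chi_{00}(\mathbf{r}' - \mathbf{R}_b; \zeta_b)\, d^3r\, d^3r' \\ &= (-1)^j \sum_{u=0}^{\min(l,j)} \frac{2^l\, (2u-1)!!\, 2^j}{(2l-1)!!\, 2^u\, (2j-1)!!}\, O_{l+j-u} \sum_{\nu=-u}^{u} \bar{W}_{l\mu,u\nu}(\mathbf{R}_{ab})\, \bar{W}_{j\kappa,u\nu}(\mathbf{R}_{ab}) \end{aligned} \tag{1.62} $$
and
$$ O_n = \left( \frac{d}{dR_{ab}^2} \right)^{n} O_{00,00}(R_{ab}) \tag{1.63} $$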

is an “auxiliary vector.” For example, when $O_0 = \mathrm{erfc}(\zeta R_{ab}) / R_{ab}$, Eq. (1.62) is the real-space Ewald correction for point-multipole
interactions. The gradients of Eq. (1.62) can be expressed as a linear
combination of auxiliary matrix elements [19], where the auxiliary
matrix is Eq. (1.62) evaluated with one extra derivative applied to
Eq. (1.63). The beauty of Eq. (1.62) is exhibited when contracted
Gaussian functions are used; in which case, only Eq. (1.63)
depends on the contraction coefficients and primitive exponents.
This property was exploited in Ref. [21] which demonstrated how
Eq. (1.62) is used to efficiently rotate the pretabulated overlap and
tight-binding matrix elements encountered in the DFTB2 and DFTB3
semiempirical Hamiltonians.

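1.6.4 Point Multipole Expansions

A point multipole $\delta_{l\mu}(\mathbf{r} - \mathbf{R}_a)$ is best described as a Gaussian multipole [Eq. (1.60)] in the limit of infinite exponent [19, 47]
$$ \begin{aligned} \delta_{l\mu}(\mathbf{r} - \mathbf{R}_a) &= \lim_{\zeta \to \infty} \chi_{l\mu}(\mathbf{r} - \mathbf{R}_a; \zeta) \\ &= \frac{C_{l\mu}(\nabla_a)}{(2l-1)!!} \lim_{\zeta \to \infty} \left( \frac{\zeta}{\pi} \right)^{3/2} e^{-\zeta |\mathbf{r} - \mathbf{R}_a|^2} \\ &= \frac{C_{l\mu}(\nabla_a)}{(2l-1)!!}\, \delta(\mathbf{r} - \mathbf{R}_a). \end{aligned} \tag{1.64} $$
By writing the real-valued STGO as a linear combination of the complex-valued STGO and applying the product and differentiation rules, one derives the Coulomb interaction energy between two point multipole expansions [46]
$$ \begin{aligned} E &= \sum_{\substack{l\mu \in a \\ j\kappa \in b}} q_{l\mu}\, q_{j\kappa}\, \frac{C_{l\mu}(\nabla_a)}{(2l-1)!!}\, \frac{C_{j\kappa}(\nabla_b)}{(2j-1)!!}\, \frac{1}{R_{ab}} \\ &= \mathbf{q}_a \cdot \bar{\mathbf{T}}(\mathbf{R}_{ab}) \cdot \mathbf{q}_b \end{aligned} \tag{1.65} $$
where
$$ \bar{T}_{l\mu,j\kappa}(\mathbf{R}_{ab}) = T_{l\mu,j\kappa}(\mathbf{R}_{ab}) / (A_{l\mu} A_{j\kappa}), \tag{1.66} $$
$$ T^{c,c/s}_{lm,jk}(\mathbf{r}) = \left[ I^{c/s}_{l+j,m+k}(\mathbf{r}) \pm (-1)^k I^{c/s}_{l+j,m-k}(\mathbf{r}) \right] \frac{2(-1)^l}{2^{\delta_{m,0} + \delta_{k,0}}} \tag{1.67} $$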

Table 1.5 The unique nonzero real-valued spherical harmonic Gaunt coefficients for expanding atomic orbital products to quadrupole. $G^{l\mu}_{(l_a\mu_a),(l_b\mu_b)} = s\, |G^{l\mu}_{(l_a\mu_a),(l_b\mu_b)}|$, where the sign s and the magnitude are listed in the table.

|G|                        s    l   μ    l_a  μ_a   l_b  μ_b

1/√(4π)                    +    1   0    0    0     1    0
                           +    1   1    0    0     1    1
                           +    1  −1    0    0     1   −1
                           +    2   0    0    0     2    0
                           +    2   1    0    0     2    1
                           +    2  −1    0    0     2   −1
                           +    2   2    0    0     2    2
                           +    2  −2    0    0     2   −2

(1/√(4π)) √(4/5)           +    2   0    1    0     1    0

(1/√(4π)) √(1/5)           −    2   0    1    1     1    1
                           −    2   0    1   −1     1   −1

(1/√(4π)) √(3/5)           +    2   1    1    0     1    1
                           +    2  −1    1    0     1   −1
                           +    2   2    1    1     1    1
                           −    2   2    1   −1     1   −1
                           +    2  −2    1    1     1   −1

(5/7) (1/√(4π)) √(4/5)     +    2   0    2    0     2    0
                           −    2   0    2    2     2    2
                           −    2   0    2   −2     2   −2

(5/7) (1/√(4π)) √(1/5)     +    2   0    2    1     2    1
                           +    2   0    2   −1     2   −1

(5/7) (1/√(4π)) √(3/5)     +    2   2    2    1     2    1
                           −    2   2    2   −1     2   −1
                           +    2  −2    2    1     2   −1

and
$$ T^{s,c/s}_{lm,jk}(\mathbf{r}) = \left[ (-1)^k I^{s/c}_{l+j,m-k}(\mathbf{r}) \pm I^{s/c}_{l+j,m+k}(\mathbf{r}) \right] \frac{2(-1)^l}{2^{\delta_{m,0} + \delta_{k,0}}}. \tag{1.68} $$
Eqs. (1.66)–(1.68) are a special case of the more general Eq. (1.62). The gradients of $T(\mathbf{r})$ can be expressed in terms of the matrix elements themselves in a manner analogous to Eqs. (1.50)–(1.52).
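A simple check of these expressions is the interaction between a z-dipole on atom a ($l = 1$, $\mu = 0$) and a charge on atom b ($j = 0$, $\kappa = 0$). The worked example below assumes $\mathbf{R}_{ab} = \mathbf{R}_a - \mathbf{R}_b$ with z-component $z_{ab}$, and uses $C_{10}(\nabla_a) = \partial/\partial z_a$, $A_{00} = A_{10} = 1$ from Eq. (1.46), and $I^{c}_{10}(\mathbf{r}) = z/r^3$ from Eq. (1.42).

```latex
\begin{aligned}
E &= q^{a}_{10}\, q^{b}_{00}\, \frac{\partial}{\partial z_a} \frac{1}{R_{ab}}
   = -\, q^{a}_{10}\, q^{b}_{00}\, \frac{z_{ab}}{R_{ab}^{3}}, \\
T^{c,c}_{10,00}(\mathbf{R}_{ab}) &= \frac{2(-1)^{1}}{2^{1+1}}
   \left[ I^{c}_{10}(\mathbf{R}_{ab}) + I^{c}_{10}(\mathbf{R}_{ab}) \right]
   = -I^{c}_{10}(\mathbf{R}_{ab}) = -\frac{z_{ab}}{R_{ab}^{3}} .
\end{aligned}
```

The direct derivative in the first line and the matrix element in the second line agree, as they must under the stated sign convention for $\mathbf{R}_{ab}$.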

By having written a point-multipole as the spherical tensor gradients passing through a point, one easily derives the particle mesh Ewald method for point multipoles. The main differences occur in the calculation of the structure factor, which requires spherical tensor gradients of the Cardinal B-spline weight, and the calculation of the short-range real-space correction (see Section 1.6.3).

1.6.5 Real-Valued Spherical Harmonic Gaunt Coefficients

A real-valued spherical harmonic Gaunt coefficient corresponds to the integral
$$ \begin{aligned} G^{l\mu}_{(l_a\mu_a),(l_b\mu_b)} &= \int Y_{l\mu}(\Omega)\, Y_{l_a\mu_a}(\Omega)\, Y_{l_b\mu_b}(\Omega)\, d\Omega \\ &= G^{l\mu}_{(l_b\mu_b),(l_a\mu_a)} \\ &= G^{l_a\mu_a}_{(l\mu),(l_b\mu_b)} = G^{l_a\mu_a}_{(l_b\mu_b),(l\mu)} \\ &= G^{l_b\mu_b}_{(l\mu),(l_a\mu_a)} = G^{l_b\mu_b}_{(l_a\mu_a),(l\mu)}, \end{aligned} \tag{1.69} $$
which has a six-fold degeneracy. The values of these coefficients differ from those encountered in textbooks, which tend to list those for complex-valued harmonics. Most combinations of indices produce a zero result. The unique nonzero values used to perform the auxiliary expansion of the DFTB3 density are listed in Table 1.5.
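Entries of Table 1.5 can be verified by direct numerical quadrature over the unit sphere. The sketch below is an independent check, not the procedure used to build the table: the real spherical harmonics for l ≤ 2 are written in closed form with the sign convention implied by Eqs. (1.43)–(1.46) (positive lobes along +x, +y, +z), and the integral uses Gauss-Legendre nodes in cos θ with a uniform grid in φ. The function names and grid sizes are choices made here.

```python
import numpy as np

def real_sph_harm(l, mu, theta, phi):
    """Closed-form real spherical harmonics for l <= 2 (assumed sign convention)."""
    st, ct = np.sin(theta), np.cos(theta)
    table = {
        (0, 0): 0.5 * np.sqrt(1 / np.pi) * np.ones_like(theta),
        (1, 0): np.sqrt(3 / (4 * np.pi)) * ct,
        (1, 1): np.sqrt(3 / (4 * np.pi)) * st * np.cos(phi),
        (1, -1): np.sqrt(3 / (4 * np.pi)) * st * np.sin(phi),
        (2, 0): np.sqrt(5 / (16 * np.pi)) * (3 * ct**2 - 1),
        (2, 1): np.sqrt(15 / (4 * np.pi)) * st * ct * np.cos(phi),
        (2, -1): np.sqrt(15 / (4 * np.pi)) * st * ct * np.sin(phi),
        (2, 2): np.sqrt(15 / (16 * np.pi)) * st**2 * np.cos(2 * phi),
        (2, -2): np.sqrt(15 / (16 * np.pi)) * st**2 * np.sin(2 * phi),
    }
    return table[(l, mu)]

def gaunt(lm, lm_a, lm_b, n_theta=16, n_phi=32):
    """Integral of three real spherical harmonics over the unit sphere."""
    nodes, weights = np.polynomial.legendre.leggauss(n_theta)  # nodes in cos(theta)
    theta = np.arccos(nodes)[:, None]
    phi = (2 * np.pi * np.arange(n_phi) / n_phi)[None, :]
    theta, phi = np.broadcast_arrays(theta, phi)
    integrand = (real_sph_harm(*lm, theta, phi)
                 * real_sph_harm(*lm_a, theta, phi)
                 * real_sph_harm(*lm_b, theta, phi))
    return float((weights[:, None] * integrand).sum() * 2 * np.pi / n_phi)

norm = 1.0 / np.sqrt(4 * np.pi)
g_pz_pz = gaunt((2, 0), (1, 0), (1, 0))     # table: + norm * sqrt(4/5)
g_px_px = gaunt((2, 0), (1, 1), (1, 1))     # table: - norm * sqrt(1/5)
g_dd = gaunt((2, 0), (2, 2), (2, 2))        # table: - (5/7) * norm * sqrt(4/5)
```

For these low angular momenta the integrand is a low-degree polynomial in cos θ and a short trigonometric polynomial in φ, so the modest grids used here integrate it essentially exactly.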

Acknowledgments

The authors are grateful for financial support provided by the National Institutes of Health (GM62248). This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number OCI-1053575.

References

1. Acevedo, O., and Jorgensen, W. L. (2010). Advances in quantum and molecular mechanical (QM/MM) simulations for organic and enzymatic reactions, Acc. Chem. Res. 43, 142–151.

2. Barnes, J., and Hut, P. (1986). A hierarchical O(N log N) force-calculation algorithm, Nature 324, 446–449.
3. Barnett, C. B., and Naidoo, K. J. (2010). Ring puckering: A metric
for evaluating the accuracy of AM1, PM3, PM3CARB-1, and SCC-DFTB
carbohydrate QM/MM Simulations, J. Phys. Chem. B 114, 17142–17154.
4. Bayman, B. F. (1978). A generalization of the spherical harmonic
gradient formula, J. Math. Phys. 19(12), 2558–2562.
5. Berka, K., Laskowski, R., Riley, K. E., Hobza, P., and Vondrášek, J.
(2009). Representative amino acid side chain interactions in proteins.
A comparison of highly accurate correlated ab initio quantum chemical
and empirical potential procedures, J. Chem. Theory Comput. 5, 982–992.
6. Case, D. A., Darden, T. A., Cheatham III, T. E., Simmerling, C. L., Wang,
J., Duke, R. E., Luo, R., Walker, R. C., Zhang, W., Merz, K. M., Roberts, B.,
Hayik, S., Roitberg, A., Seabra, G., Swails, J., Götz, A. W., Kolossváry, I.,
Wong, K. F., Paesani, F., Vanicek, J., Wolf, R. M., Liu, J., Wu, X., Brozell,
S. R., Steinbrecher, T., Gohlke, H., Cai, Q., Ye, X., Wang, J., Hsieh, M.-J., Cui,
G., Roe, D. R., Mathews, D. H., Seetin, M. G., Salomon-Ferrer, R., Sagui, C.,
Babin, V., Luchko, T., Gusarov, S., Kovalenko, A., and Kollman, P. A. (2012).
AMBER 12, University of California, San Francisco, San Francisco, CA.
7. Cembran, A., Bao, P., Wang, Y., Song, L., Truhlar, D. G., and Gao, J. (2010).
On the interfragment exchange in the X-Pol method, J. Chem. Theory
Comput. 6(8), 2469–2476.
8. Cisneros, G. A., Piquemal, J., and Darden, T. A. (2006). Generalization
of the Gaussian electrostatic model: Extension to arbitrary angular
momentum, distributed multipoles, and speedup with reciprocal space
methods, J. Chem. Phys. 125, 184101.
9. Dahlke, E. E., and Truhlar, D. G. (2007). Electrostatically embedded
many-body correlation energy, with applications to the calculation of
accurate second-order Møller–Plesset perturbation theory energies for
large water clusters, J. Chem. Theory Comput. 3(4), 1342–1348.
10. Darden, T., York, D., and Pedersen, L. (1993). Particle mesh Ewald: An
N log(N) method for Ewald sums in large systems, J. Chem. Phys. 98,
10089–10092.
11. Dunlap, B. I. (2001). Direct quantum chemical integral evaluation, Int. J.
Quantum Chem. 81, 373–383.
12. Essmann, U., Perera, L., Berkowitz, M. L., Darden, T., Hsing, L., and
Pedersen, L. G. (1995). A smooth particle mesh Ewald method, J. Chem.
Phys. 103(19), 8577–8593.
13. Frisch, M. J., Trucks, G. W., Schlegel, H. B., Scuseria, G. E., Robb, M. A.,
Cheeseman, J. R., Scalmani, G., Barone, V., Mennucci, B., Petersson, G. A.,
Nakatsuji, H., Caricato, M., Li, X., Hratchian, H. P., Izmaylov, A. F., Bloino,
J., Zheng, G., Sonnenberg, M., Hada, M., Ehara, M., Toyota, K., Fukuda,
R., Hasegawa, J., Ishida, M., Nakajima, T., Honda, Y., Kitao, O., Nakai,
H., Vreven, T., Montgomery, J. A., Jr., Peralta, J. E., Ogliaro, F., Bearpark,
M., Heyd, J. J., Brothers, E., Kudin, K. N., Staroverov, V. N., Kobayashi,
R., Normand, J., Raghavachari, K., Rendell, A., Burant, J. C., Iyengar,
S. S., Tomasi, J., Cossi, M., Rega, N., Millam, J. M., Klene, M., Knox, J. E.,
Cross, J. B., Bakken, V., Adamo, C., Jaramillo, J., Gomperts, R., Stratmann,
R. E., Yazyev, O., Austin, A. J., Cammi, R., Pomelli, C., Ochterski, J. W.,
Martin, R. L., Morokuma, K., Zakrzewski, V. G., Voth, G. A., Salvador, P.,
Dannenberg, J. J., Dapprich, S., Daniels, A. D., Farkas, O., Foresman, J. B.,
Ortiz, J. V., Cioslowski, J., and Fox, D. J. (2009). Gaussian 09, Revision A.02,
Gaussian, Inc., Wallingford, CT.
14. Gao, J., and Wang, Y. (2012). Communication: Variational many-body
expansion: Accounting for exchange repulsion, charge delocalization,
and dispersion in the fragment-based explicit polarization method, J.
Chem. Phys. 136, 071101.
15. Gaus, M., Goez, A., and Elstner, M. (2013). Parametrization and
benchmark of DFTB3 for organic molecules, J. Chem. Theory Comput. 9,
338–354.
16. Giese, T. J., Chen, H., Dissanayake, T., Giambaşu, G. M., Heldenbrand, H.,
Huang, M., Kuechler, E. R., Lee, T.-S., Panteva, M. T., Radak, B. K., and York,
D. M. (2013). A variational linear-scaling framework to build practical,
efficient next-generation orbital-based quantum force fields, J. Chem.
Theory Comput. 9, 1417–1427.
17. Giese, T. J., Sherer, E. C., Cramer, C. J., and York, D. M. (2005). A
semiempirical quantum model for hydrogen-bonded nucleic acid base
pairs, J. Chem. Theory Comput. 1(6), 1275–1285.
18. Giese, T. J., and York, D. M. (2005). Improvement of semiempirical
response properties with charge-dependent response density, J. Chem.
Phys. 123(16), 164108.
19. Giese, T. J., and York, D. M. (2008). Contracted auxiliary Gaussian basis
integral and derivative evaluation, J. Chem. Phys. 128(6), 064104.
20. Giese, T. J., and York, D. M. (2008). Extension of adaptive tree code
and fast multipole methods to high angular momentum particle charge
densities, J. Comput. Chem. 29(12), 1895–1904.
21. Giese, T. J., and York, D. M. (2008). Spherical tensor gradient operator
method for integral rotation: A simple, efficient, and extendable
alternative to Slater–Koster tables, J. Chem. Phys. 129(1), 016102.
February 2, 2016 14:20 PSP Book - 9in x 6in 01-Qiang-Cui-c01

References 29

22. Goedecker, S., and Scuseria, G. E. (2003). Linear scaling electronic


structure methods in chemistry and physics, IEEE Comput. Sci. Eng. 5,
14–21.
23. Greengard, L., and Rokhlin, V. (1987). A fast algorithm for particle
simulations, J. Comput. Phys. 73, 325–348.
24. Grimme, S., Ehrlich, S., and Goerigk, L. (2011). Effect of the damping
function in dispersion corrected density functional theory, J. Comput.
Chem. 32(7), 1456–1465.
25. Hobson, E. W. (1892). On a theorem in differentiation, and its application
to spherical harmonics, Proc. London Math. Soc. 24(1), 55–67.
26. Isegawa, M., Gao, J., and Truhlar, D. G. (2011). Incorporation of charge
transfer into the explicit polarization fragment method by grand
canonical density functional theory, J. Chem. Phys. 135, 084107.
27. Jacobson, L. D., and Herbert, J. M. (2011). An efficient, fragment-based
electronic structure method for molecular systems: Self-consistent
polarization with perturbative two-body exchange and dispersion, J.
Chem. Phys. 134, 094118.
28. Jurečka, P., and Hobza, P. (2003). True stabilization energies for
the optimal planar hydrogen-bonded and stacked structures of gua-
nine...cytosine, adenine...thymine, and their 9- and 1-methyl derivatives:
Complete basis set calculations at the MP2 and CCSD(T) levels and
comparison with experiment, J. Am. Chem. Soc. 125, 15608–15613.
29. Jurečka, P., Šponer, J., Černý, J., and Hobza, P. (2006). Benchmark
database of accurate (MP2 and CCSD(T) complete basis set limit)
interaction energies of small model complexes, DNA base pairs, and
amino acid pairs, Phys. Chem. Chem. Phys. 8, 1985–1993.
30. Korth, M. (2010). Third-generation hydrogen-bonding corrections for
semiempirical QM methods and force fields, J. Chem. Theory Comput. 6,
3808–3816.
31. Martin, B., and Clark, T. (2006). Dispersion treatment for NDDO-based
semiempirical MO techniques, Int. J. Quantum Chem. 106, 1208–1216.
32. McNamara, J. P., and Hillier, I. H. (2007). Semi-empirical molecular
orbital methods including dispersion corrections for the accurate pre-
diction of the full range of intermolecular interactions in biomolecules,
Phys. Chem. Chem. Phys. 9, 2362–2370.
33. Pérez-Jordá, J., and Yang, W. (1996). A concise redefinition of the solid
spherical harmonics and its use in fast multipole methods, J. Chem. Phys.
104, 8003–8006.
February 2, 2016 14:20 PSP Book - 9in x 6in 01-Qiang-Cui-c01

30 A Modified Divide-and-Conquer Linear-Scaling Quantum Force Field

34. Piquemal, J., Cisneros, G., Reinhardt, P., Gresh, N., and Darden, T. A.
(2006). Towards a force field based on density fitting, J. Chem. Phys.
124(10), 104101.
35. Rocha, G. B., Freire, R. O., Simas, A. M., and P. Stewart, J. J. (2006). RM1:
A reparameterization of AM1 for H, C, N, O, P, S, F, Cl, Br, and I, J. Comput.
Chem. 27(10), 1101–1111.
36. Sagui, C., Pedersen, L. G., and Darden, T. A. (2004). Towards an accurate
representation of electrostatics in classical force fields: efficient
implementation of multipolar interactions in biomolecular simulations,
J. Chem. Phys. 120(1), 73–87.
37. Sattelmeyer, K. W., Tubert-Brohman, I., and Jorgensen, W. L. (2006). NO-
MNDO: Reintroduction of the overlap matrix into MNDO, J. Chem. Theory
Comput. 2, 413–419.
38. Stewart, J. J. P. (2007). Optimization of parameters for semiempirical
methods V: Modification of NDDO approximations and application to 70
elements, J. Mol. Model. 13, 1173–1213.
39. Tuttle, T., and Thiel, W. (2008). OMx-D: semiempirical methods with
orthogonalization and dispersion corrections. Implementation and
biochemical application, Phys. Chem. Chem. Phys. 10, 2125–2272.
40. R̆ezác̆, J., and Hobza, P. (2012). Advanced corrections of hydrogen bond-
ing and dispersion for semiempirical quantum mechanical methods, J.
Chem. Theory Comput. 8, 141–151.
41. R̆ez̀ac̆, J., Riley, K. E., and Hobza, P. (2011). S66: A well-balanced database
of benchmark interaction energies relevant to biomolecular structures,
J. Chem. Theory Comput. 7, 2427–2438.
42. Řezáč, J., Fanfrlı́k, J., Salahub, D., and Hobza, P. (2009). Semiempirical
quantum chemical PM6 method augmented by dispersion and H-
bonding correction terms reliably describes various types of noncova-
lent complexes, J. Chem. Theory Comput. 5, 1749–1760.
43. Šponer, J., Jurečka, P., and Hobza, P. (2004). Accurate interaction energies
of hydrogen-bonded nucleic acid base pairs, J. Am. Chem. Soc. 126,
10142–10151.
44. Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A., and Case, D. A. (2004).
Development and testing of a general amber force field, J. Comput. Chem.
25, 1157–1174.
45. Wang, Y., Sosa, C. P., Cembran, A., Truhlar, D. G., and Gao, J. (2012).
Multilevel X-Pol: A fragment-based method with mixed quantum
mechanical representations of different fragments, J. Phys. Chem. B
116(23), 6781–6788.
February 2, 2016 14:20 PSP Book - 9in x 6in 01-Qiang-Cui-c01

References 31

46. Watson, M. A., Sałek, P., Macak, P., and Helgaker, T. (2004). Linear-scaling
formation of Kohn–Sham Hamiltonian: Application to the calculation
of excitation energies and polarizabilities of large molecular systems,
J. Chem. Phys. 121(7), 2915–2931.
47. Weniger, E. J., and Steinborn, E. O. (1983). New representations for the
spherical tensor gradient and the spherical delta function, J. Math. Phys.
24(11), 2553–2563.
48. Wesolowki, T. A. (2008). Embedding a multideterminantal wave
function in an orbital-free environment, Phys. Rev. A. 77, 012504–
012513.
49. Wesolowski, T. A., and Warshel, A. (1993). Frozen density functional
approach for ab Initio calculations of solvated molecules, J. Phys. Chem.
97, 8050–8053.
50. Xie, W., and Gao, J. (2007). Design of a next generation force field: The
X-Pol potential, J. Chem. Theory. Comput. 3(6), 1890–1900.
51. Xie, W., Orozco, M., Truhlar, D. G., and Gao, J. (2009). X-Pol potential:
An electronic structure-based force field for molecular dynamics
simulation of a solvated protein in water, J. Chem. Theory Comput. 5,
459–467.
52. Xie, W., Song, L., Truhlar, D. G., and Gao, J. (2008). The variational explicit
polarization potential and analytical first derivative of energy: Towards
a next generation force field, J. Chem. Phys. 128, 234108.
53. Yanson, I. K., Teplitsky, A. B., and Sukhodub, L. F. (1979). Experimental
studies of molecular interactions between nitrogen bases of nucleic
acids, Biopolymers 18, 1149–1170.
54. Zhang, P., Fiedler, L., Leverentz, H. R., Truhlar, D. G., and Gao, J. (2011).
Polarized molecular orbital model chemistry. 2. The PMO method, J.
Chem. Theory Comput. 7, 857–867.
55. Zhang, P., Truhlar, D. G., and Gao, J. (2012). Fragment-based quantum
mechanical methods for periodic systems with Ewald summation and
mean image charge convention for long-range electrostatic interactions,
Phys. Chem. Chem. Phys. 14(21), 7821–7829.
56. Zhou, B., Ligneres, V. L., and Carter, E. A. (2005). Improving the orbital-
free density functional theory description of covalent materials, J. Chem.
Phys. 122(4), 044103.
This page intentionally left blank
January 27, 2016 13:7 PSP Book - 9in x 6in 02-Qiang-Cui-c02

Chapter 2

Explicit Polarization Theory

Yingjie Wang,(a) Michael J. M. Mazack,(a) Donald G. Truhlar,(a)
and Jiali Gao(a,b)

(a) Department of Chemistry and Supercomputing Institute,
University of Minnesota, Minneapolis, MN 55455, USA

(b) Theoretical Chemistry Institute,
State Key Laboratory of Theoretical and Computational Chemistry,
Jilin University, Changchun, Jilin Province 130023, P. R. China
[email protected]

Molecular mechanical force fields have been successfully used
to model condensed-phase and biomolecular systems for half a
century. Molecular mechanical force fields are analytic potential
energy functions based on classical mechanical force constants, van
der Waals potentials, electrostatics, and torsional potentials, with
parameters fit to experiment, to quantum mechanical calculations,
or to both. Accurate results can be obtained from simulations
employing molecular mechanics for processes not involving bond
breaking or bond forming. In this chapter, we describe a new
approach to developing force fields; this approach involves the direct
use of quantum mechanical calculations rather than using them as
a training set for classical mechanical force fields. Computational
efficiency is achieved by partitioning the entire system into
molecular fragments. Since the mutual electronic polarization is
explicitly treated by electronic structure theory, we call this
approach the explicit polarization (X-Pol) method. Strategies and
examples are presented to illustrate the application of X-Pol to
describe intermolecular interactions as a quantum chemical model
and as a force field to carry out statistical mechanical Monte Carlo
and molecular dynamics simulations.

Many-Body Effects and Electrostatics in Biomolecules
Edited by Qiang Cui, Markus Meuwly, and Pengyu Ren
Copyright © 2016 Pan Stanford Publishing Pte. Ltd.
ISBN 978-981-4613-92-7 (Hardcover), 978-981-4613-93-4 (eBook)
www.panstanford.com

2.1 Introduction

Molecular mechanical force fields (MMFFs) were first proposed in
the 1940s to study steric effects of organic molecules1, 2 and were
extended to model biomolecular systems by Lifson and coworkers
in the 1960s.3−5 Since that time, significant progress has been made,
and a number of force fields have been developed that can be used
to provide excellent quantitative interpretation of experimental
observations.6−27
Although the widely used force fields differ in their details
(for example, some of them include coupling between internal
coordinates), the functional forms used in MMFFs have remained
essentially unchanged over the past half century,5, 28 and the
functional form depicted in Eq. 2.1 captures the essence of a typical
MMFF potential energy function:


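$$
V = \sum_{b}^{\mathrm{bonds}} \frac{1}{2} K_b \left(R_b - R_b^{o}\right)^2
  + \sum_{a}^{\mathrm{angles}} \frac{1}{2} K_a \left(\theta_a - \theta_a^{o}\right)^2
  + \sum_{t}^{\mathrm{torsion}} \sum_{n} \frac{V_t^{n}}{2}
    \left[1 + \cos\left(n\phi_t - \phi_t^{o}\right)\right]
  + \sum_{i<j} \left\{ \varepsilon_{ij}
    \left[ \left(\frac{\sigma_{ij}}{R_{ij}}\right)^{12}
         - \left(\frac{\sigma_{ij}}{R_{ij}}\right)^{6} \right]
    + \frac{q_i q_j}{R_{ij}} \right\}
\tag{2.1}
$$

In this equation, the first sum accounts for bond stretching,
the second sum for valence angle bending, the third (double)
sum for torsions, and the fourth, where the sum goes only over
nonbonded and nongeminal atoms, for van der Waals interactions
and nonbonded Coulomb forces.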
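
The four sums of Eq. 2.1 can be made concrete with a short numerical sketch. The Python function below is illustrative only: the function name, the argument layout, and the idea of passing precomputed internal coordinates are assumptions introduced here, not part of any particular force field. The nonbonded prefactor follows Eq. 2.1 as printed, and Coulomb's constant is set to 1.

```python
import math


def mmff_energy(bonds, angles, torsions, pairs):
    """Evaluate the four sums of Eq. 2.1 from precomputed internal coordinates.

    bonds:    iterable of (K_b, R_b, R_b0)
    angles:   iterable of (K_a, theta_a, theta_a0), angles in radians
    torsions: iterable of (V_tn, n, phi_t, phi_t0), angles in radians
    pairs:    iterable of (eps_ij, sigma_ij, q_i, q_j, R_ij), nonbonded pairs only
    """
    # Harmonic bond stretching and angle bending
    e_bond = sum(0.5 * k * (r - r0) ** 2 for k, r, r0 in bonds)
    e_angle = sum(0.5 * k * (th - th0) ** 2 for k, th, th0 in angles)
    # Cosine series for torsions
    e_tors = sum(0.5 * v * (1.0 + math.cos(n * phi - phi0))
                 for v, n, phi, phi0 in torsions)
    # Lennard-Jones plus Coulomb for nonbonded pairs (hypothetical unit system)
    e_nb = 0.0
    for eps, sigma, qi, qj, r in pairs:
        sr6 = (sigma / r) ** 6
        e_nb += eps * (sr6 * sr6 - sr6) + qi * qj / r
    return e_bond + e_angle + e_tors + e_nb
```

Because every term depends only on fixed parameters and nuclear coordinates, polarization enters such a function only through the parameter values.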
The importance of polarization has long been recognized, and
Eq. (2.1) includes polarization implicitly through the choice of
parameters, which are often designed to include not just the effect
of intramolecular polarization but also the effect of polarization by
the solvent or other surroundings in a condensed-phase medium.
Major current efforts in improving MMFFs are being devoted to the
explicit inclusion of polarization by means of terms of various forms
to account for inductive forces.29−49 We will label force fields that
include polarization explicitly as polarized molecular mechanics
force fields or PMMFFs, while we restrict the acronym MMFFs to
force fields that include polarization only implicitly through the
parametrization.
Despite the success of molecular mechanics,28, 50, 51 there are
also a number of limitations: There is no general approach to
treat the coupling of internal degrees of freedom, the treatment of
electronic polarization is difficult, intermolecular charge transfer is
neglected, excited electronic states cannot be treated, and in the
form usually employed the methods are inapplicable to chemical
reactions.28 In recent years, some extensions to treat chemically
reactive systems have been presented,52−55 and one can overcome
some of the limitations in specific applications by introducing
additional empirical terms,31, 32, 56, 57 but here we discuss another
approach, where the whole treatment is intrinsically based on
quantum mechanics (QM).
Quantum mechanical electronic structure calculations can pro-
vide both reactive and nonreactive potential energy surfaces,
including not only electrostatics and van der Waals forces but
also polarization and charge transfer effects. However, it is a
daunting task (essentially impossible) to solve the Schrödinger
equation for a condensed-phase system. Therefore, a wide range
of approximate quantum chemical model chemistries have been
developed, including both wave function theory (WFT)58 and
density functional theory (DFT),59 as well as various linear scaling
and fragment-based QM methods that have been proposed to reduce
the computational cost.60−95 The latter represent an active approach
to balancing accuracy and efficiency in applying electronic structure
methods to large systems.
The explicit polarization (X-Pol) model is a fragment-based
QM method, in which the entire system is divided into molecular
subunits,65, 66, 77, 80 which can be individual molecules, ions, ligands
or cofactors, and amino acid residues or a group of these entities.
The key assumption in the X-Pol method is that the wave function
of the entire system is approximated as a Hartree product of
the wave functions of the individual fragments. Consequently, the
optimization of the total wave function can be reduced to the
optimization of each fragment embedded in and polarized by
the rest of the system. Clearly, variational optimization of the
mutual dependence of the fragmental wave functions is critical
to the success of this method. As a force field, the energy of
each fragment corresponding to the intramolecular energy terms
in an MMFF is determined by the electronic structure method
used, whereas intermolecular interactions are modeled through
electrostatic embedding in terms of one-electron integrals. The
short-range exchange repulsion interactions between fragments, the
long-range dispersion interactions between different fragments, and
the interfragment correlation energy are neglected in the Hartree
product approximation but are modeled empirically as in molecular
mechanics.65, 66, 77 Alternatively, these energy contributions can be
modeled by density-dependent functionals,96, 97 by Hartree–Fock
(HF) exchange,98 or by making use of many-body expansion
corrections.99 The latter also takes into account interfragment
charge transfer effects, which are otherwise neglected, although
intrafragment charge transfer is fully included. X-Pol92 can also be
used as a general QM-QM fragment-coupling scheme,88, 100, 101 in
which different levels of theory are employed to model different
fragments; we refer to this as a multilevel method.
In the following sections, we summarize the theoretical formula-
tion of the X-Pol model and illustrate the multilevel X-Pol92 method
for studying intermolecular interactions. In addition, we discuss our
work on using X-Pol as a quantum mechanical force field (QMFF) for
liquid water simulations.

2.2 Theoretical Background

In X-Pol, a macromolecular system is partitioned into molecular
fragments, which may be called monomers. The division is flexible
within the constraint that monomers do not overlap (i.e., the
subsystem included in one fragment does not appear in another
monomer). For solutions with small solute molecules, a fragment
can be a single solute or solvent molecule.65, 66 For large solute
molecules or biomacromolecules (e.g., a protein or enzyme–substrate
complex), a fragment can be a connected group of atoms
(e.g., a peptide unit, a metal atom or ion, a cofactor, or a substrate
molecule).77, 102 Several peptide units can be combined into the
same fragment, if desired, which can be useful for modeling systems
containing disulfide bonds. The X-Pol method is derived from a
standard electronic structure method by a nested set of three
approximations, described next.

2.2.1 Approximation of the Total Wave Function and Total Energy

The first approximation in the X-Pol theory is that the molecular
wave function of the entire system, $\Psi$, is approximated as a
Hartree product of the antisymmetric wave functions of individual
fragments, $\{\Psi_A; A = 1, \cdots, N\}$:

$$
\Psi = \prod_{A=1}^{N} \Psi_A. \tag{2.2}
$$

The wave function of fragment A, $\Psi_A$, can either be a single
determinant from HF theory or Kohn–Sham DFT, or a multiconfiguration
wave function derived from complete active space self-consistent
field (CASSCF) or valence bond (VB) calculations.
The effective Hamiltonian of the system is expressed as Eq. 2.3,

$$
\hat{H} = \sum_{A}^{N} \hat{H}_A^{o}
  + \frac{1}{2} \sum_{A}^{N} \sum_{B \neq A}^{N}
    \left( \hat{H}_A^{int}[\rho_B] + E_{AB}^{XD} \right), \tag{2.3}
$$

where the first term sums over the Hamiltonians of all isolated
fragments and the second double summation accounts for pairwise
interactions among all the fragments. The explicit form of
$\hat{H}_A^{o}$, which is the Hamiltonian for an isolated fragment A
in the gas phase, varies according to the level of theory employed;
for instance, post-HF correlated methods can be used to treat the
active site of an enzyme, and HF or semiempirical molecular orbital
methods can be used to treat solvent molecules or peptide units that
are distant from the reactive center. The Hamiltonian
$\hat{H}_A^{int}[\rho_B]$ represents electrostatic interactions
between fragments A and B, and the final term $E_{AB}^{XD}$
specifies exchange-repulsion, dispersion and other interfragment
correlation energy contributions, and charge transfer interactions,
as explained in more detail in the following sections.
The total energy of the system is written as the expectation value
of the effective Hamiltonian,

$$
E[\{\rho\}] = \langle \Psi | \hat{H} | \Psi \rangle
  = \sum_{A}^{N} E_A
  + \frac{1}{2} \sum_{A}^{N} \sum_{B \neq A}^{N}
    \left( E_{AB}^{int}[\rho_A, \rho_B] + E_{AB}^{XD} \right) \tag{2.4}
$$

where $E_A$ is the energy of fragment A that is determined using its
wave function as polarized by all other fragments, and
$E_{AB}^{int}[\rho_A, \rho_B]$ is the electrostatic interaction energy
between fragments A and B, again calculated using the polarized wave
functions. The latter term is calculated from the point of view of
fragment A and also from the point of view of fragment B, and the sum
of these results is divided by two since the same interactions are
counted twice. Therefore, we have

$$
E_A = \langle \Psi_A | \hat{H}_A^{o} | \Psi_A \rangle, \tag{2.5}
$$

$$
E_{AB}^{int}[\rho_A, \rho_B] = \frac{1}{2} \left(
  \langle \Psi_A | \hat{H}_A^{int}[\rho_B] | \Psi_A \rangle
  + \langle \Psi_B | \hat{H}_B^{int}[\rho_A] | \Psi_B \rangle \right). \tag{2.6}
$$
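
The bookkeeping in Eqs. 2.4 to 2.6 is easy to get wrong by a factor of two, so a minimal sketch may help. It assumes that the fragment energies and the one-sided interaction expectation values have already been computed by some electronic structure code; the function name and the array layout are hypothetical choices made here for illustration.

```python
import numpy as np


def xpol_total_energy(e_frag, e_int_from, e_xd):
    """Assemble the total energy of Eq. 2.4 from precomputed pieces.

    e_frag[A]        : E_A, energy of the polarized fragment A (Eq. 2.5)
    e_int_from[A, B] : <Psi_A| H_A^int[rho_B] |Psi_A>, interaction seen from A
    e_xd[A, B]       : symmetric exchange-dispersion term E^XD_AB
    """
    e_frag = np.asarray(e_frag, dtype=float)
    e_int_from = np.asarray(e_int_from, dtype=float)
    e_xd = np.asarray(e_xd, dtype=float)

    # Eq. 2.6: average the interaction seen from A and from B
    e_int = 0.5 * (e_int_from + e_int_from.T)

    # Double sum over A and B != A; the factor 1/2 removes the double counting
    off_diag = ~np.eye(len(e_frag), dtype=bool)
    pair_sum = (e_int + e_xd)[off_diag].sum()
    return e_frag.sum() + 0.5 * pair_sum
```

With point-charge embedding the two one-sided values generally differ, which is why the sketch keeps `e_int_from` as a non-symmetric matrix and symmetrizes it explicitly.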

2.2.2 Approximation on the Electrostatic Interaction between Fragments

The second approximation in the X-Pol theory is the method
of treating the interaction between fragments. The interaction
Hamiltonian between fragments A and B is defined as

$$
\hat{H}_A^{int}[\rho_B] = -\sum_{i=1}^{M_A} e\, E^{B}(\mathbf{r}_i^{A})
  + \sum_{\alpha=1}^{N_A} Z_\alpha^{A}\, E^{B}(\mathbf{R}_\alpha^{A}), \tag{2.7}
$$

where $M_A$ and $N_A$ are respectively the number of electrons and
nuclei in fragment A, $Z_\alpha^{A}$ is the nuclear charge of atom
$\alpha$ of fragment A, and $E^{B}(\mathbf{r}_x^{A})$ is the
electrostatic potential at $\mathbf{r}_x^{A}$ from fragment B.
The electrostatic potential is given by

$$
E^{B}(\mathbf{r}_x^{A}) = \int
  \frac{\rho_B(\mathbf{r}')}{|\mathbf{r}_x^{A} - \mathbf{r}'|}\, d\mathbf{r}', \tag{2.8}
$$

where $\rho_B(\mathbf{r}') = -\rho_{ele}^{B}(\mathbf{r}')
+ \sum_\beta Z_\beta^{B}\, \delta(\mathbf{r}' - \mathbf{R}_\beta^{B})$
is the total charge density of fragment B, including the electron
density $\rho_{ele}^{B}(\mathbf{r}')$ and the nuclear charges
$Z_\beta^{B}$ at $\mathbf{R}_\beta^{B}$. The potential
$E^{B}(\mathbf{r}_x^{A})$ can be used directly
to determine the electrostatic interaction energy of Eq. 2.7; this
involves or is equivalent to evaluating the corresponding four-index
two-electron integrals explicitly, which is time-consuming and could
be ill-behaved when large basis sets are used. Although it yields the
classical electrostatic part of the interaction without approximation,
it does not include the exchange repulsion part of the interfragment
interaction or the interfragment correlation energy, which will be
discussed in Section 2.2.3. To reduce the computational cost of the
two-electron integral calculation, it is desirable to use an efficient
approach to treat interfragment electrostatic interactions.65, 66
The quantity $E^{B}(\mathbf{r}_x^{A})$ may be considered as an
embedding potential of fragment A due to the external charge
distribution of fragment B, and a number of well-established
techniques15, 21, 103−107 can be used to model it. A general approach
for the classical electrostatic potential is to use a multicenter
multipole expansion,107 of which the simplest form is to limit the
expansion to the monopole terms, so the result only depends on
the partial atomic charges. The use of partial atomic charges to
approximate $E^{B}(\mathbf{r}_x^{A})$ is particularly convenient for
constructing the effective Hamiltonian of Eq. 2.7, and this is the
strategy that has been adopted for the classical electrostatic part
in the X-Pol method.65, 66
The next issue in modeling the electrostatic interaction is
the method to obtain the monopole charges. For these charges,
one may use partial atomic charges fitted to the electrostatic
potential (ESP)15, 105, 106, 108−113 or one may use Mulliken popula-
tion analysis,104 population analysis based on Löwdin orthogona-
lization,103 or class IV charges from mapping procedures114, 115 in
which the mapping function has been parametrized to yield atomic
charges that reproduce experimental molecular dipole moments.
Another method is based on optimization of atomic charges to
reproduce the molecular multipole moments from QM calculations,
and we have recently used a procedure that preserves the molecular
dipole moment and polarizability to generate dipole-preserving and
polarization-consistent charges (DPPCs).116
Using the approximation of point charges, Eq. 2.8 is simplified to

$$
E^{B}(\mathbf{r}_x^{A}) = \sum_{\beta}
  \frac{q_\beta^{B}}{|\mathbf{r}_x^{A} - \mathbf{R}_\beta^{B}|}. \tag{2.9}
$$
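
The point-charge potential of Eq. 2.9 reduces to a single sum over the atoms of fragment B. The sketch below assumes atomic units, so that no Coulomb prefactor appears, and uses a hypothetical function name; it illustrates the formula and is not code from the X-Pol programs.

```python
import numpy as np


def point_charge_potential(r, charges, positions):
    """Electrostatic potential at point r from the partial charges of fragment B.

    r         : (3,) evaluation point r_x^A
    charges   : (m,) partial atomic charges q_beta^B
    positions : (m, 3) atomic positions R_beta^B
    """
    r = np.asarray(r, dtype=float)
    charges = np.asarray(charges, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # |r_x^A - R_beta^B| for every atom beta of fragment B
    dist = np.linalg.norm(positions - r, axis=1)
    return float(np.sum(charges / dist))
```

Evaluating this sum at the electron and nuclear positions of fragment A replaces the integral of Eq. 2.8, which is what removes the four-index two-electron integrals from the interfragment term.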

2.2.3 Approximations to Interfragment Exchange–Dispersion Interactions
The Hartree product wave function in Eq. 2.2 neglects the long-range
interfragment dispersion interactions, the other interfragment
correlation energy contributions, and the short-range interfragment
exchange-repulsion interactions arising from the Pauli exclusion
principle. Furthermore, the partition of a molecular system into
fragments and the restriction to an integer number of electrons in
each fragment precludes charge transfer between the fragments. But
interfragment dispersion interactions, the other interfragment cor-
relation energy contributions, the short-range exchange-repulsion
interactions, and charge transfer make critical contributions to
intermolecular interactions, so they must be added to the X-Pol
energy expression. A brute force approach is to employ variational
many-body expansion (VMB) theory to make two-body, three-
body, and higher order corrections.99 Although the accuracy can
be systematically improved by using many-body corrections, the
number of terms involved increases rapidly with the number of
fragments and the order of correction, rendering this approach
impractical beyond two-body correction terms. Thus in using
this approach, it is critical to define the reference state for the
monomer energies such that the higher-order correction terms
are negligible. However, when the X-Pol method is used as a
theoretical framework to develop force fields for condensed-phase
and macromolecular systems, we can use a simpler approach. In
particular, we introduce empirical terms such as Lennard–Jones or
Buckingham potentials (as used in molecular mechanics) to estimate
the exchange repulsion, dispersion, other interfragment correlation,
and charge transfer energies. In one of the applications described
in Section 2.4.1,92 we add the following pairwise Buckingham-potential
term to the interaction energy between fragments A and B:

$$
E_{AB}^{XD} = \sum_{I}^{N_A} \sum_{J}^{N_B}
  \left( A_{IJ}\, e^{-B_{IJ} R_{IJ}} - \frac{C_{IJ}}{R_{IJ}^{6}} \right) \tag{2.10}
$$

where the parameters are determined from the atomic parameters
according to combining rules:

$$
A_{IJ} = (A_I A_J)^{1/2} \tag{2.11}
$$

$$
B_{IJ} = (B_I + B_J)/2 \tag{2.12}
$$

$$
C_{IJ} = (C_I C_J)^{1/2} \tag{2.13}
$$

In the other application discussed in Section 2.4.2, we used pairwise
Lennard–Jones potentials.
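
The Buckingham term of Eq. 2.10 and the combining rules of Eqs. 2.11 to 2.13 translate directly into a double loop over the atoms of the two fragments. In the sketch below the function names and the tuple layout `(A, B, C)` for the atomic parameters are assumptions made for illustration, and units are left to the caller.

```python
import math


def combine(p_i, p_j):
    """Combining rules of Eqs. 2.11-2.13; each argument is a tuple (A, B, C)."""
    a_i, b_i, c_i = p_i
    a_j, b_j, c_j = p_j
    return math.sqrt(a_i * a_j), 0.5 * (b_i + b_j), math.sqrt(c_i * c_j)


def buckingham_xd(params_a, coords_a, params_b, coords_b):
    """Pairwise exchange-dispersion energy of Eq. 2.10 between fragments A and B."""
    energy = 0.0
    for p_i, r_i in zip(params_a, coords_a):
        for p_j, r_j in zip(params_b, coords_b):
            a, b, c = combine(p_i, p_j)
            r = math.dist(r_i, r_j)  # interatomic distance R_IJ
            energy += a * math.exp(-b * r) - c / r ** 6
    return energy
```

Geometric means for the prefactors and an arithmetic mean for the exponent mean that only per-atom parameters need to be stored.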

2.2.4 Double Self-Consistent Field


As in standard electronic structure methods, the Roothaan–Hall
equation on each fragment in X-Pol is solved iteratively. However,
in X-Pol, in addition to the SCF convergence within each molecular
fragment, the mutual polarization among all fragments of the whole
system must be converged. A procedure is depicted in Fig. 2.1, which
may be described as a double self-consistent field (DSCF) iterative
scheme. In practice, however, there is no need to fully converge
the inner, intrafragment SCF before proceeding to the next iteration
step for the outer, interfragment SCF. We found that it is often
computationally efficient to carry out two to three iterations in the
intrafragment SCF between the outer SCF iterations.
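
The control flow of the DSCF scheme can be illustrated with a deliberately simplified toy model. In the sketch below each "fragment" is reduced to a single polarization variable that obeys a linear self-consistency condition, which stands in for the Roothaan–Hall problem; the model, the function name, and all parameter values are assumptions introduced here and are not taken from the X-Pol implementation. What the sketch does show is the structure described above: an outer loop that refreshes the embedding field and an inner loop that is run for only a few iterations per outer step.

```python
import numpy as np


def dscf(f0, T, alpha=0.1, k=0.5, n_inner=3, tol=1e-10, max_outer=200):
    """Toy double-SCF loop for N one-variable 'fragments'.

    Fragment A obeys p_A = alpha * (f0_A + sum_B T_AB p_B) + k * p_A,
    a linear stand-in for an SCF problem embedded in the field of the others.
    T must have a zero diagonal (no self-interaction).
    """
    f0 = np.asarray(f0, dtype=float)
    T = np.asarray(T, dtype=float)
    p = np.zeros_like(f0)
    for outer in range(1, max_outer + 1):
        field = f0 + T @ p          # outer step: refresh the embedding field
        p_old = p.copy()
        for _ in range(n_inner):    # inner step: a few iterations, not full convergence
            p = alpha * field + k * p
        if np.max(np.abs(p - p_old)) < tol:
            return p, outer
    raise RuntimeError("DSCF did not converge")
```

For this linear model the converged result can be checked against a direct solution of the coupled equations, which confirms that truncating the inner loop changes the path to convergence but not the fixed point.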
There are two ways of constructing the Fock matrix for solving
the DSCF equations; one is based on the variational optimization
of the energy of Eq. 2.4,80 and the other, which was first used in
Monte Carlo simulations where analytic forces are not required,65, 66
is written by assuming that each monomer is embedded in the fixed
electrostatic field of the rest of the system. The two approaches are
discussed next.
(a) Variational X-Pol. In X-Pol, the Fock operator for a fragment, A,
is derived by taking the derivative of the total energy (Eq. 2.4) with
respect to each element $P_{\mu\nu}^{A}$ of the electron density matrix:

$$
F_{\mu\nu}^{A,\mathrm{Xpol}}
  = \frac{\partial E[\{\rho\}]}{\partial P_{\mu\nu}^{A,\mathrm{SCF}}}
  = F_{\mu\nu}^{A,o}
  - \frac{1}{2} \sum_{B \neq A} \sum_{b \in B} q_b^{B} \left(I_b^{B}\right)_{\mu\nu}
  + \frac{1}{2} \sum_{a \in A} X_a^{A} \left(\Lambda_a^{A}\right)_{\mu\nu}, \tag{2.14}
$$

where $F_{\mu\nu}^{A,o}$ is the Fock matrix element for the Hamiltonian
of the isolated fragment A, $q_b^{B}$ is the point charge on atom b of
fragment B, $I_b^{B}$ is the matrix of the one-electron integrals of the
embedding potential due to fragment B, $X_a^{A}$ is a vector arising
from the derivative of the electrostatic interaction energy with respect
to the point charge of atom a:

$$
X_a^{A} = \sum_{B \neq A} \left[
  \sum_{\lambda\sigma} P_{\lambda\sigma}^{B} \left(I_a^{A}\right)_{\lambda\sigma}
  + \sum_{b \in B} \frac{Z_b^{B}}{|\mathbf{R}_b^{B} - \mathbf{R}_a^{A}|} \right], \tag{2.15}
$$

and $\Lambda_a^{A}$ is the response density matrix:

$$
\left(\Lambda_a^{A}\right)_{\mu\nu}
  = \frac{\partial q_a^{A}}{\partial P_{\mu\nu}^{A,\mathrm{SCF}}}
  = \frac{\partial q_a^{A,\mathrm{SCF}}}{\partial P_{\mu\nu}^{A,\mathrm{SCF}}}. \tag{2.16}
$$

Figure 2.1 The schematic flow chart of DSCF iterations.

(b) Charge-embedding X-Pol. If each fragment is considered to be
embedded in the instantaneous static electrostatic field of the rest of
the system, one can construct a Fock operator for fragment A simply
as follows:

$$
F^{A,\mathrm{CE}} = F^{A,o} - \sum_{B \neq A} \sum_{b \in B} q_b^{B} I_b^{B}. \tag{2.17}
$$

In the charge-embedding approach, the mutual polarization among
all fragments in the system is achieved by iteratively updating the
partial atomic charges $\{q_b^{B}\}$ derived from the wave function for
each fragment in each outer, interfragment SCF step (Fig. 2.1). Note that
Eq. 2.17 indicates that the wave function of each fragment, A, is fully
polarized by the full electric field of all other fragments, but the total
interaction energy is determined by multiplying by a factor of 0.5,
since the interactions between two monomers are counted twice.
Similar expressions are often found in continuum self-consistent
reaction field models for solvation.
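
The difference between the two Fock constructions (Eqs. 2.14 and 2.17) is mostly a matter of prefactors and of the extra charge-response term, which a few lines of array code make explicit. The sketch below assumes that the isolated-fragment Fock matrix, the external charges, the one-electron embedding integrals, the vector of Eq. 2.15, and the response matrices of Eq. 2.16 are already available as arrays; the function names, argument names, and array shapes are hypothetical.

```python
import numpy as np


def fock_charge_embedding(f0, q_ext, i_ext):
    """Eq. 2.17: isolated-fragment Fock matrix minus the full embedding term.

    f0    : (n, n) Fock matrix of the isolated fragment A
    q_ext : (m,)   point charges q_b^B on all atoms outside fragment A
    i_ext : (m, n, n) one-electron integral matrices I_b^B in the basis of A
    """
    return f0 - np.einsum("b,bmn->mn", q_ext, i_ext)


def fock_variational(f0, q_ext, i_ext, x_a, lam_a):
    """Eq. 2.14: half of the embedding term plus half of the charge-response term.

    x_a   : (k,)      vector X_a^A of Eq. 2.15, one entry per atom of fragment A
    lam_a : (k, n, n) response density matrices of Eq. 2.16
    """
    embedding = np.einsum("b,bmn->mn", q_ext, i_ext)
    response = np.einsum("a,amn->mn", x_a, lam_a)
    return f0 - 0.5 * embedding + 0.5 * response
```

In the charge-embedding form the external charges act on fragment A with full strength, whereas the variational form splits the coupling between a direct term and a response term so that the resulting Fock matrix is the exact derivative of the total energy.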

Comparison. In comparing methods a and b, we note that the
variational X-Pol method has the advantage of allowing the
computation of analytic gradients for efficient geometry optimization
and dynamics simulations. Furthermore, the total energy obtained
from the variational procedure is necessarily lower than that from
the charge-embedding scheme. Consequently, it is expected that
the use of the variational X-Pol energy as the monomer energy
reference state in many-body energy expansion will be more efficient
than other alternatives. Although it is possible to obtain analytic
gradients for the non-variational, charge-embedding approaches,
doing so generally involves the solution of coupled-perturbed
self-consistent field (CPSCF) equations, which is more time consuming.
As a referee of this manuscript lucidly pointed out, “often in the
fragment quantum chemistry literature, those response terms have simply
been ignored, with numerical consequences that have never been
investigated.”

2.3 Computational Details

The X-Pol method has been implemented in a developmental version
of the Gaussian software package (H35).117 Although a single
quantum chemical model can be used to represent all fragments,
any of the electronic structure methods available in Gaussian, such
as HF, DFT, MP2, CCSD, BD, etc., can be mixed to represent different
fragments in a multilevel X-Pol calculation. We have illustrated
the multilevel approach in a recent study92 of two hydrogen-bonded
complexes, including (a) acetic acid (fragment A) and water
(fragment B), and (b) the H5O2+ ion (fragment A) and four surrounding
water molecules (fragments B, five fragments in total). In that
work, the geometries of the complexes and isolated monomers were
optimized using the M06 exchange-correlation functional118 and the
MG3S119 basis set, which was followed by single-point, multilevel
X-Pol calculations using the 6-31G(d)120 basis set.
For condensed-phase and macromolecular simulations, we have
written an X-Pol software package using the C++ language, which
has been incorporated into NAMD121 and CHARMM.27 The X-Pol
program can be used with the popular NDDO-based semiempirical
Hamiltonians as well as the recently developed polarized molecular
orbital (PMO) model.122, 123 Molecular dynamics simulations of
liquid water have been carried out using the NAMD/X-Pol interface.
In addition, we have used an earlier version of the X-Pol model in
Monte Carlo simulations of liquid water.
Statistical mechanical Monte Carlo simulations were performed
on a system consisting of 267 water molecules in a cubic box,
employing the XP3P water model, built upon the PMOw Hamiltonian124
and the DPPC charge model.116 Periodic boundary conditions were
used along with the isothermal-isobaric ensemble (NPT) at 1 atm
and temperatures ranging from −40 to 100 °C. Spherical
cutoffs with a switching function between 8.5 Å and 9.0 Å based
on oxygen–oxygen separations were employed, and a long-range
correction to the Lennard–Jones potential was included. In Monte
Carlo simulations, new configurations were generated by randomly
translating and rotating a randomly selected water molecule within
ranges of ±0.13 Å and ±13°. In addition, the volume of the
system was changed randomly within the limit of ±150 Å³ on every
550th attempted move, and the coordinates of oxygen atoms were
scaled accordingly. At least 5 × 10⁶ configurations were discarded for
equilibration, followed by an additional 10⁷ to 10⁸ configurations for
averaging. About 6 × 10⁶ configurations can be executed per day on
a six-core Intel Xeon X7542 Westmere 2.66 GHz processor.
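As a rough illustration of the move set described above, the following Python sketch generates trial moves with the stated step sizes. It is not the MCSOL implementation: the function names, the choice of rotating about a randomly selected Cartesian axis, and the uniform sampling of each displacement are assumptions made only for this example, and the Metropolis acceptance test is omitted.

```python
import random

MAX_TRANSLATION = 0.13      # Å per Cartesian component (value from the text)
MAX_ROTATION = 13.0         # degrees (value from the text)
MAX_VOLUME_CHANGE = 150.0   # Å^3 (value from the text)
VOLUME_MOVE_INTERVAL = 550  # volume move on every 550th attempt (from the text)


def propose_move(attempt, n_molecules, rng):
    """Describe the trial move for a 1-based attempt counter."""
    if attempt % VOLUME_MOVE_INTERVAL == 0:
        return {
            "kind": "volume",
            "delta_volume": rng.uniform(-MAX_VOLUME_CHANGE, MAX_VOLUME_CHANGE),
        }
    return {
        "kind": "molecule",
        "index": rng.randrange(n_molecules),  # randomly selected water molecule
        "translation": tuple(
            rng.uniform(-MAX_TRANSLATION, MAX_TRANSLATION) for _ in range(3)
        ),
        "axis": rng.choice("xyz"),  # assumption: rotate about one Cartesian axis
        "angle_deg": rng.uniform(-MAX_ROTATION, MAX_ROTATION),
    }


def scale_positions(positions, old_volume, new_volume):
    """Scale reference-site coordinates isotropically after a volume move."""
    scale = (new_volume / old_volume) ** (1.0 / 3.0)
    return [(scale * x, scale * y, scale * z) for x, y, z in positions]
```

In a full simulation each proposal would be followed by an energy evaluation and an acceptance test appropriate to the NPT ensemble; only the bookkeeping of the proposals is shown here.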
The XP3P model was further employed in molecular dynamics
simulations for 500 ps in the NVT ensemble using the Lowe-
Andersen thermostat.125, 126 The volume was fixed at the average
value from the Monte Carlo simulation. The monomer geometries
were enforced by the SHAKE/RATTLE procedure.127 The velocity
Verlet integration algorithm was used with a 1 fs time step. The
Monte Carlo simulations were performed using the MCSOL program
for X-Pol simulations,128 while molecular dynamics simulations
were carried out using a newly developed X-Pol program129 written
in C++ which has been interfaced both with CHARMM27 and
NAMD.121
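For readers unfamiliar with the integrator named above, a minimal one-dimensional velocity Verlet loop can be sketched as follows. This is a generic textbook form, not code from the X-Pol program; the function name and the harmonic test force in the usage note are illustrative assumptions, and constraints (SHAKE/RATTLE) and the thermostat are left out.

```python
def velocity_verlet(position, velocity, force_fn, mass, dt, n_steps):
    """Propagate one degree of freedom with the velocity Verlet scheme."""
    force = force_fn(position)
    for _ in range(n_steps):
        velocity += 0.5 * dt * force / mass   # first half-step velocity update
        position += dt * velocity             # full-step position update
        force = force_fn(position)            # force at the new position
        velocity += 0.5 * dt * force / mass   # second half-step velocity update
    return position, velocity
```

For example, `velocity_verlet(1.0, 0.0, lambda x: -x, 1.0, 0.01, 1000)` propagates a unit-mass harmonic oscillator; the total energy stays close to its initial value, which is the property that makes the scheme attractive for long simulations.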

2.4 Illustrative Examples

2.4.1 Multilevel X-Pol as a Quantum Chemical Model for Macromolecules
The X-Pol theory can be used with a combination of different
electronic structure methods for different fragments. This provides
a general, multilevel QM/QM-type treatment of a large system,
where the region of interest could be modeled by a high-level
theory, embedded in an environment modeled by a lower level
representation. Some arbitrary combinations of different electronic
models are illustrated by calculations92 of the interaction energy
between acetic acid and water at the minimum-energy configuration
optimized with M06/MG3S (Fig. 2.2). To represent the electrostatic
potential in Eq. 2.9, we used two charge models, Mulliken population
analysis (MPA) and ESP charge-fitting with the Merz–Kollman
scheme (MK), to construct the charge-embedding Fock matrix (Eq.
2.14), whereas only the MPA charges were used in variational X-Pol
(Eq. 2.11).

Figure 2.2 Schematic illustration of the optimized configuration of acetic
acid and water using M06/MG3S.

The binding energy for a bimolecular complex is defined by

$$E_b = E_{AB} - E_A^{o} - E_B^{o} \tag{2.18}$$

(We have not applied any correction for the basis set superposition
error since the main purpose here is to illustrate the possibility of
mixing different levels of theory in multi-level X-Pol calculations.)
In X-Pol, the binding energy is written as the sum of electrostatic
($E_{\mathrm{elec}}$) and exchange-charge transfer-dispersion
($E_{\mathrm{XCD}}$) terms,

$$E_b = E_{\mathrm{elec}} + E_{\mathrm{XCD}}, \tag{2.19}$$

where the electrostatic interaction energy in X-Pol is given by

$$E_{\mathrm{elec}} = \frac{1}{2}\left[E_A^{\mathrm{int}}(B) + E_B^{\mathrm{int}}(A)\right] + (E_A - E_A^{o}) + (E_B - E_B^{o}), \tag{2.20}$$

where $E_X^{\mathrm{int}}(Y)$ represents the interaction of “QM” fragment X
polarized by the electrostatic potential from fragment Y, and
($E_X - E_X^{o}$) is the energy difference between fragment X in the complex
and in isolation. Table 2.1 summarizes the results from these
calculations.
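The bookkeeping in Eqs. 2.19 and 2.20 can be written out as a short Python sketch. The function and argument names are hypothetical and are not taken from the X-Pol code; the energies passed in would come from separate fragment calculations, and the numbers in the usage note are made up purely to show the arithmetic.

```python
def electrostatic_energy(e_int_a_in_b, e_int_b_in_a, e_a, e_a_iso, e_b, e_b_iso):
    """Eq. 2.20: averaged interaction term plus the fragment distortion terms.

    e_int_a_in_b -- interaction of fragment A polarized by the potential of B
    e_int_b_in_a -- interaction of fragment B polarized by the potential of A
    e_a, e_b     -- fragment energies in the complex
    e_a_iso, e_b_iso -- fragment energies in isolation
    """
    interaction = 0.5 * (e_int_a_in_b + e_int_b_in_a)
    distortion = (e_a - e_a_iso) + (e_b - e_b_iso)
    return interaction + distortion


def binding_energy(e_elec, e_xcd):
    """Eq. 2.19: electrostatic term plus exchange-charge transfer-dispersion term."""
    return e_elec + e_xcd
```

With illustrative inputs, `electrostatic_energy(-10.0, -9.0, -100.0, -100.5, -50.0, -50.25)` averages the two interaction terms (−9.5) and adds the energetic cost of polarizing each fragment (0.5 + 0.25), all in the same energy unit.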
The $E_{\mathrm{XCD}}$ term can be determined by VMB expansion. For the
bimolecular complex in Fig. 2.2, the two-body correction energy
is exact. For condensed-phase and macromolecular systems, it is
convenient to simply approximate $E_{\mathrm{XCD}}$ by an empirical potential
such as the Lennard–Jones potential or the Buckingham potential.
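To make the empirical option concrete, a Buckingham pair potential in its common exponential-6 form, V(r) = A exp(−Br) − C/r⁶, can be sketched as below. The functional form is the standard one; the function name and any parameter values supplied to it are assumptions for illustration and are not the parameters used in the calculations reported here.

```python
import math


def buckingham(r, a, b, c):
    """Exponential-6 pair potential: repulsion A*exp(-B*r) minus dispersion C/r**6."""
    repulsion = a * math.exp(-b * r)   # first term: short-range exchange repulsion
    dispersion = c / r**6              # second term: attractive dispersion
    return repulsion - dispersion
```

The total $E_{\mathrm{XCD}}$ estimate would then be a sum of such terms over interfragment atom pairs, with A, B, and C assigned per pair type.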
The total binding energy between acetic acid and water was
estimated to be −6.9 and −6.6 kcal/mol from M06/MG3S and
CCSD(T)/MG3S, respectively. Table 2.1 therefore shows that the
approximate electrostatic components computed by the X-Pol
method overestimate binding interactions for all combinations of
methods examined except the combination of M06 for acetic acid
and B3LYP for water. Within the charge-embedding scheme, the use
of ESP-fitted charges resulted in somewhat weaker binding
interactions than those from Mulliken population analysis. However, the
variational approach yielded binding energies about 1–2 kcal/mol
larger in magnitude than the corresponding embedding model; at the
M06/6-31G(d) level, the binding energy difference between the variational
X-Pol result and reference value is about 2 kcal/mol. An empirical
correction based on the Buckingham potential, dominated by the
first term that represents exchange repulsion, gives a correction of
2.1 kcal/mol, and if this is added to the electrostatic terms, the total
X-Pol results obtained using the variational approach become more
consistent with the values from fully delocalized calculations.

Table 2.1 Computed electrostatic interaction energies $E_{\mathrm{elec}}$
(kcal/mol) between acetic acid (A) and water (B) using multilevel X-Pol
with the charge-embedding and variational interaction Hamiltonians

                  Charge-embedding     Variational
A       B         MK-ESP     MPA       MPA            Full QM^a
M06     M06       −7.0       −7.7      −9.0           −6.9
M06     B3LYP     −6.8       −7.3      −8.7           −6.9
M06     HF        −7.2       −7.9      −9.4           −6.9
MP2     HF        −7.1       −7.7      −8.0           −6.5
CCSD    M06       −7.2       −7.6      −8.0           −6.6^b

Note: The 6-31G(d) basis set was used in all calculations with the
M06/MG3S optimized monomer and dimer geometries.
^a Computed for the complex using the method listed under A with the
MG3S basis set.
^b Determined using CCSD(T).

Table 2.2 shows the computed electrostatic interaction energies
and the empirical $E_{\mathrm{XCD}}$ correction term for a protonated water
cluster using the multilevel X-Pol scheme. The protonated water
cluster is a Zundel ion, H5O2+, with four water molecules; the
optimized structure of the complex obtained by the M06/MG3S
method is illustrated in Fig. 2.3. Next we analyze the individual
contributions from exchange-repulsion, dispersion and charge
transfer interactions.

Table 2.2 Computed electrostatic interaction energies $E_{\mathrm{elec}}$
(kcal/mol) between H5O2+ (A) and (H2O)4 (B) using multilevel X-Pol
with the charge-embedding and variational interaction Hamiltonians

                  Charge-embedding     Variational
A       B         MK-ESP     MPA       MPA            E_XCD     E_b
M06     M06       −89.1      −87.5     −91.0          18.2      −72.8
M06     B3LYP     −87.7      −85.2     −88.1          18.2      −69.9
M06     HF        −92.0      −91.7     −94.5          18.2      −76.3
MP2     HF        −92.9      −92.7     −94.4          18.2      −76.2
CCSD    M06       −89.5      −88.0     −83.9          18.2      −65.7

Note: The 6-31G(d) basis set was used in all calculations with M06/MG3S
optimized monomer and dimer geometries.

Figure 2.3 Fragment partition of the H5O2+(H2O)4 cluster optimized using
M06/MG3S.

As explained elsewhere,98 exchange repulsion can be obtained
as the difference between the energy from the antisymmetrized
X-Pol wave function for the two fragments, $\hat{A}\{\Psi_A \Psi_B\}$, and the
X-Pol electrostatic interaction energy $E_{\mathrm{elec}}$ obtained at the SCF level.
Using M06/6-31G(d), the charge-embedding scheme yielded an
exchange repulsion energy of 30.0 kcal/mol with the MK charge
model and 28.5 kcal/mol with the MPA charge scheme. This may be
compared with a value of 35.8 kcal/mol from variational X-Pol using
MPA. The difference between the non-variational charge-embedding
scheme and the variational X-Pol result shows that there is charge
penetration between the two monomer fragments, but the use of
unscreened point-charge interactions does not account for this.130
Note that the exchange energy described above was estimated using
the X-Pol electrostatic energy, which is an approximation to the
two-electron repulsion integrals between the two fragments, as
explained in Section 2.2.2.
The exchange repulsion energy can be obtained more rigorously
by block localized energy decomposition analysis,131, 132 and we
have carried out this analysis for the complex at the HF/aug-cc-
pVDZ level. The computed exchange-repulsion and charge transfer
energies are 38.8 kcal/mol and −13.3 kcal/mol, with a net
contribution of 25.5 kcal/mol from the two energy terms.
The dispersion-correlation energy can be defined as the difference
between the interaction energy computed using an accurate
post-Hartree–Fock method and that at the Hartree–Fock level, both
corrected for basis set superposition error (BSSE). Here, we have not
included the BSSE corrections, which will affect the
quantitative results. Based on the binding energies calculated by
CCSD(T)/MG3S (−69.7 kcal/mol) and by HF/aug-cc-pVDZ (−62.4
kcal/mol), we estimate a dispersion-correlation energy of −7.3
kcal/mol. The sum of these terms, that is, 25.5 minus 7.3 kcal/mol,
which includes exchange repulsion, charge transfer, and
dispersion-correlation, gives an estimate of the $E_{\mathrm{XCD}}$ term, which is 18.2
kcal/mol for the interactions between the Zundel ion and four water
molecules. Including the $E_{\mathrm{XCD}}$ energy, we find that the total
X-Pol binding energies from the various multilevel calculations range from
−65 to −76 kcal/mol, which may be compared with the binding
energy computed using CCSD(T)/MG3S (−69.7 kcal/mol) for the
full system. The discrepancy between the X-Pol results and full QM
results has several contributing factors, chief among which are the use
of geometries fixed at a different level of theory and basis set, and the use of
a rather small basis set in the X-Pol calculations. Without including
$E_{\mathrm{XCD}}$, the binding energies for different X-Pol calculations range
from −83 to −92 kcal/mol, all significantly larger in magnitude than the full
QM value.
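The arithmetic behind the 18.2 kcal/mol estimate and the total binding energies in Table 2.2 can be retraced with a few lines of Python. All numbers are taken from the text and from the variational (MPA) column of Table 2.2; only the variable names are invented for this sketch.

```python
# Energies in kcal/mol, as quoted in the text.
exchange_repulsion = 38.8      # block-localized analysis, HF/aug-cc-pVDZ
charge_transfer = -13.3        # block-localized analysis, HF/aug-cc-pVDZ
e_bind_ccsdt = -69.7           # CCSD(T)/MG3S binding energy
e_bind_hf = -62.4              # HF/aug-cc-pVDZ binding energy

# Dispersion-correlation: post-HF minus HF binding energy (no BSSE correction).
dispersion_correlation = e_bind_ccsdt - e_bind_hf

# E_XCD collects exchange repulsion, charge transfer, and dispersion-correlation.
e_xcd = exchange_repulsion + charge_transfer + dispersion_correlation

# Variational X-Pol electrostatic energies (Table 2.2, MPA column),
# keyed by the methods used for fragments A and B.
e_elec_variational = {
    ("M06", "M06"): -91.0,
    ("M06", "B3LYP"): -88.1,
    ("M06", "HF"): -94.5,
    ("MP2", "HF"): -94.4,
    ("CCSD", "M06"): -83.9,
}

# Total binding energy E_b = E_elec + E_XCD (Eq. 2.19), rounded to 0.1 kcal/mol.
e_bind = {pair: round(e + e_xcd, 1) for pair, e in e_elec_variational.items()}
```

The resulting `e_bind` values correspond to the last column of Table 2.2.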

2.4.2 The XP3P Model for Water as a Quantum Mechanical Force Field
Although ab initio molecular orbital theory and density functional
theory can be used to systematically improve the accuracy of X-Pol
results for large systems, it is still impractical to use these methods
to perform molecular dynamics simulations for an extended period
of time. With increased computing power, this will become feasible
in the future; however, at present, it is desirable to use semiempirical
molecular orbital models such as the popular approaches based
on neglect of diatomic differential overlap (NDDO)133 or the more
recent self-consistent-charge tight-binding density functional
(SCC-DFTB)134, 135 method to model condensed-phase systems and
biomacromolecules.
Most semiempirical molecular orbital methods are known to be
inadequate for describing intermolecular interactions, especially
hydrogen-bonding interactions, because molecular polarizabilities
are systematically underestimated in comparison with experiments.
Recently, we introduced a polarized molecular orbital (PMO)
method122−123 which is based on the MNDO136−138 formalism with
the addition of a set of p-orbitals on each hydrogen atom.139 It
was found that the computed molecular polarizabilities for a range
of compounds containing hydrogen, carbon and oxygen are very
significantly improved.122, 123 In addition to the enhancement in
computed molecular polarizability, a damped dispersion function
is included as a post-SCF correction to the electronic energy. In
principle, the Lennard–Jones terms originally adopted in the X-
Pol method could be used.66 Here, we added damped dispersion
by following the work of, among others, Tang and Toennies
in wave function theory140 and Grimme in density functional
theory141, 142 and we used the parameters proposed by Hillier
and co-workers in the PM3-D method.143−145 The inclusion of
the damped dispersion terms further improves the description of
intermolecular interactions and the performance of PMO on small
molecular clusters.
We note one previous model similar in spirit to PMO, namely
the semiempirical self-consistent polarization neglect of diatomic
differential overlap (SCP-NDDO) method, parametrized to reproduce
properties of water clusters by Chang et al.146 They obtained a
good polarizability of water without using p functions on hydrogen
(i.e., they used the minimal basis set employed in most NDDO
calculations), but their model is parametrized only for water. Since
a minimal basis set does not have the flexibility to yield an accurate
polarizability in ab initio calculations,139 it is not clear if the SCP-
NDDO-type parametrization could be extended to a broader range
of molecules.
The construction of a QMFF based on the X-Pol formalism
has two components. First, a computationally efficient quantum
chemical model is needed to describe the electronic structure of
individual molecular fragments. For liquid water, we adopted the
PMOw Hamiltonian,124 which has been specifically parameterized
for compounds containing oxygen and hydrogen atoms. Second,
a practical and parametrizable procedure is desired to model
interfragment electrostatic and exchange-dispersion interactions.
Here, for the electrostatic component, we used the dipole preserving
and polarization consistent (DPPC) charges to approximate the
electrostatic potential of individual fragments. In this approach,
the partial atomic charges are derived to exactly reproduce the
instantaneous molecular dipole moment from the polarized electron
density of each fragment. Since the DPPC charges are optimized
by the Lagrangian multiplier technique, there are no adjustable
parameters. For the $E_{\mathrm{XCD}}$ term, we used pairwise Lennard–Jones
potentials, which are based on two parameters for each atomic number
(with pairwise potentials obtained by combining rules). Employing
this strategy, we have developed an X-Pol quantum chemical model
for water, called the XP3P model, to be used in fluid simulations.
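The pairwise Lennard–Jones term with per-element parameters can be sketched as follows. The 12-6 form is standard, but the text does not say which combining rule is used; the geometric-mean rule below is an assumption chosen for illustration, and the function names and any parameter values are hypothetical rather than the XP3P parameters.

```python
import math


def combine(params_i, params_j):
    """Combine per-element (sigma, epsilon) pairs; geometric means are assumed."""
    sigma_i, epsilon_i = params_i
    sigma_j, epsilon_j = params_j
    return math.sqrt(sigma_i * sigma_j), math.sqrt(epsilon_i * epsilon_j)


def lennard_jones(r, sigma, epsilon):
    """12-6 Lennard-Jones energy for one atom pair at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)
```

With this form the pair energy is zero at r = σ and reaches its minimum of −ε at r = 2^(1/6) σ, so two numbers per element fix both the size and the well depth of every pair interaction.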
The computed and experimental thermodynamic and dynamic
properties of liquid water at 25 °C and 1 atm are listed in Table 2.3,
along with the results from an MMFF, namely TIP3P,8 and from two
PMMFFs, namely AMOEBA39 and SWM4-NDP.44 The standard errors
(±1σ) were obtained from fluctuations of separate averages over
blocks of 2–4 × 10⁵ configurations. The average density of XP3P is
0.996 ± 0.001 g/cm³, which is within 1% of the experimental value
and is similar to results obtained with other polarizable and
non-polarizable force fields (see Table 2.3). The heat of vaporization was
computed using $H_v = -E_i(l) + RT$, where $E_i(l)$ is the average
interaction energy per monomer from the Monte Carlo simulation,
and R and T are the gas constant and temperature. The XP3P model
for water yielded an average $H_v$ of 10.42 ± 0.01 kcal/mol using
the non-variational (charge-embedding) approximation, whereas
the value increased to 10.58 kcal/mol using the variational Fock
operator in molecular dynamics. The variational X-Pol approach
lowers the interaction energy in the liquid by about 1.5% as
compared to the direct charge-embedding approach. Considering
the difficulty of achieving converged results for quantities involving
fluctuations, including the isothermal compressibility, coefficient of
thermal expansion, and dielectric constant, the overall agreement
with experiment is good, and the XP3P model performs as well as
other empirical force fields in dynamics simulations.

Table 2.3 Computed liquid properties of the XP3P model for water
along with those from experiments, and the TIP3P, AMOEBA, and
SWM4-NDP models^a

Property              XP3P             TIP3P    AMOEBA   SWM4-NDP   Expt.
H_v, kcal/mol         10.42 ± 0.01     10.41    10.48    10.51      10.51
Density, g/cm³        0.996 ± 0.001    1.002    1.000    1.000      0.997
C_p, cal mol⁻¹ K⁻¹    21.8 ± 1.0       20.0     20.9                18.0
10⁶ κ, atm⁻¹          25 ± 2           60                           46
10⁵ α, K⁻¹            37 ± 3           75                           26
μ_gas, D              1.88             2.31     1.77     1.85       1.85
μ_liq, D              2.524 ± 0.002    2.31     2.78     2.33       2.3–2.6
10⁵ D, cm²/s          2.7              5.1      2.02     2.3        2.3
ε                     97 ± 8           92       82       79 ± 3     78

^a H_v, heat of vaporization; C_p, heat capacity; κ, isothermal compressibility; α,
coefficient of thermal expansion; μ, dipole moment; D, diffusion constant; and ε,
dielectric constant.

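The block-averaging error estimate and the heat-of-vaporization expression used above can be sketched in Python as follows. The function names are hypothetical, and dividing the variance of the block means by the number of blocks (a standard error of the mean) is an assumption, since the text only states that the errors come from fluctuations of block averages.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal mol^-1 K^-1


def block_average(samples, block_size):
    """Return the mean of the block averages and its estimated standard error."""
    n_blocks = len(samples) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks")
    means = [
        sum(samples[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]
    mean = sum(means) / n_blocks
    variance = sum((m - mean) ** 2 for m in means) / (n_blocks - 1)
    return mean, math.sqrt(variance / n_blocks)


def heat_of_vaporization(e_interaction, temperature):
    """H_v = -E_i(l) + RT, with E_i(l) in kcal/mol per monomer and T in K."""
    return -e_interaction + GAS_CONSTANT * temperature
```

As a consistency check, an interaction energy of about −9.83 kcal/mol per monomer at 298.15 K (a value back-calculated here from the reported heat of vaporization, not quoted in the text) gives `heat_of_vaporization(-9.83, 298.15)` close to the 10.42 kcal/mol listed for XP3P.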
The average molecular dipole moment of molecules in a
condensed phase is not well defined, but it is very common for it to
be calculated from partial atomic charges or other analysis methods.
We calculated the average dipole moment of water in the liquid,
$\langle \mu_{\mathrm{liq}} \rangle$, to be 2.524 ± 0.002 D, which represents an increase of
35% relative to the gas-phase equilibrium-geometry value (1.88 D
from the PMOw Hamiltonian). We found that water molecules in
the liquid experience a wide spectrum of instantaneous electrostatic
fields from the rest of the system, reflected in the distribution of
the instantaneous molecular dipole moments that range from 2.1
to 2.9 D. In MMFF models, the dipole moment is fixed and thus has
no fluctuation at all. Of the two PMMFFs in the table, the AMOEBA
model produced a much larger dipole moment (2.78 D) than PMOw
in the liquid, but the SWM4-NDP model yielded a somewhat smaller
value of 2.46 D. The PMMFF model of Dang and Chang34 increases
the dipole moment from an equilibrium value of 1.81 D in the gas
to an average value of 2.75 D in the liquid, and a survey of eight
PMMFFs by Chen et al.35 found average dipole moments in the
liquid ranging from 2.31 to 2.83 D. Examining two other PMMFFs,
Habershon et al.147 found average dipole moments of 2.35 and 2.46
D. Stern and Berne,148 based on a fluctuating charge model type
of PMMFF, calculated an equilibrium gas-phase dipole moment of
1.86 D, an average gas-phase dipole moment of 1.92 D (3.6% larger
than experiment), and a liquid-phase average dipole moment of
3.01 D. With another PMMFF, Yu and van Gunsteren42 calculated an
equilibrium gas-phase dipole moment of 1.86 D and a liquid-phase
average dipole moment of 2.57 D.
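Since the molecular dipole moments compared above are commonly evaluated from partial atomic charges, a minimal Python sketch of that evaluation is given here. The function name is hypothetical; the only physical input beyond the charges and coordinates is the conversion factor 1 e·Å ≈ 4.8032 D.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*Å expressed in debye


def dipole_moment(charges, positions):
    """Magnitude (in D) of the dipole of point charges (in e) at positions (in Å)."""
    components = [
        sum(q * r[k] for q, r in zip(charges, positions)) for k in range(3)
    ]
    return E_ANGSTROM_TO_DEBYE * math.sqrt(sum(c * c for c in components))
```

For a neutral molecule the result does not depend on the choice of origin, so the instantaneous dipole of each water molecule in a configuration can be computed directly from its polarized charges and laboratory-frame coordinates.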
Murdachaew et al.149 used the SCP-NDDO semiempirical molecular
orbital model and calculated that the dipole moment increases
from an equilibrium gas-phase value of 2.16 D to a liquid-phase value
of 2.8 D, an increase of 30%, whereas with the older PM3150
and PM6151 NDDO-type methods, which significantly underestimate
the polarizability of water, they found that the increase was only 9%
and 11%, respectively.
Direct dynamics calculations152 with the BLYP exchange-
correlation functional and electric properties computed from
localized Wannier functions predicted an increase of the dipole
moment from an equilibrium value of 1.87 D in the gas to an average
value of 2.95 D in the liquid.
There are no experimental data for direct comparison, but values
ranging from 2.3 to 3.0 D have been advocated, based in part on an
estimate for ice Ih.153, 154 The point of these various comparisons
of the calculated dipole moment of water in the bulk is not
to claim that the X-Pol value is more accurate than the others,
but rather to show that it is consistent with the range of
previous estimates. Nevertheless, based on analysis of dielectric
screening effects of water, Sprik pointed out that an average dipole
moment of 2.5–2.6 D in liquid water would most likely yield the
correct dielectric constant,155 and a similar approach was used by
Lamoureux et al.156

Figure 2.4 Computed (solid) and experimental (dashed) radial distribution
functions for O–O, O–H, and H–H pairs in liquid water at 25 °C.

All other thermodynamic and dynamic properties determined
using the XP3P model in Table 2.3 are in reasonable accord with
experiments and are of similar accuracy in comparison with other
empirical models. We note that in contrast to the large number of
PMMFFs in the literature that are based on parameterization using
different physical approximations, the electronic polarization from
the present XP3P model is explicitly described based on a quantum
chemical formalism.
Figure 2.4 shows the structure of liquid water characterized by
radial distribution functions (RDFs); $g_{xy}(r)$ gives the probability of
finding an atom of type y at a distance r from an atom of type
x relative to the bulk distribution, where the type is determined
by the atomic number. In comparison with the neutron scattering
data, the computational results are in excellent agreement with
experiments. In particular, a well-resolved minimum following the
first peak in the O–O distribution was obtained, whereas the widely
used TIP3P and SPC models do not show this feature.8 For the XP3P
potential, the location of the maximum of the first peak of the O-O
RDF is 2.78 ± 0.05 Å with a peak height of 3.0. For comparison, the
corresponding experimental values are 2.73 Å and 2.8 from neutron
diffraction.157, 158 The coordination number of a water molecule in
the first solvation layer was estimated to be 4.5, in agreement with
the neutron diffraction result of 4.51.157, 158 The oxygen–hydrogen
and hydrogen–hydrogen radial distribution functions also agree well
with experiments.
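The definition of the radial distribution function given above translates into a simple histogram over pair distances. The following Python sketch handles a single configuration in a cubic periodic box; the function name and arguments are hypothetical, and it is meant to show the normalization rather than to reproduce the analysis code used for Figure 2.4.

```python
import math


def radial_distribution(pos_x, pos_y, box_length, n_bins, r_max):
    """Histogram estimate of g_xy(r) for one configuration in a cubic box.

    r_max should not exceed half the box length for the minimum-image
    convention used here to be valid.
    """
    bin_width = r_max / n_bins
    counts = [0] * n_bins
    for xi in pos_x:
        for yj in pos_y:
            d2 = 0.0
            for a, b in zip(xi, yj):
                delta = b - a
                delta -= box_length * round(delta / box_length)  # minimum image
                d2 += delta * delta
            if d2 == 0.0:
                continue  # skip self-pairs when x and y are the same atom list
            k = int(math.sqrt(d2) / bin_width)
            if k < n_bins:
                counts[k] += 1
    # Normalize by the count expected for an ideal gas at the bulk density of y
    # (for identical species this slightly overcounts the available partners).
    density_y = len(pos_y) / box_length**3
    g = []
    for k, count in enumerate(counts):
        shell = 4.0 / 3.0 * math.pi * ((k + 1) ** 3 - k**3) * bin_width**3
        g.append(count / (len(pos_x) * density_y * shell))
    return g
```

In practice the histogram would be accumulated over many configurations before normalization, and the coordination number follows from integrating 4πρ r² g(r) up to the first minimum.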

2.5 Conclusions

Molecular mechanical force fields (MMFFs) have been successfully
used to model condensed-phase and biological systems for a
half century, and more recently polarized molecular mechanics
force fields (PMMFFs) have been developed. Thanks to careful
parametrization, such classical force fields can be used to provide
useful interpretation of experimental findings. In this chapter,
we presented a new strategy to construct the potential energy
surface for macromolecular systems on the basis of quantum
mechanical formalisms. Rather than using quantum chemical results
as the target for fitting empirical parameters in the force field, we
employ electronic structure theory directly to model intermolecular
interactions. As a result, we call this approach a quantum mechanical
force field (QMFF).
Our strategy is based on partitioning condensed-phase and
macromolecular systems into fragments, each of which is explicitly
represented by an electronic structure theory with an
antisymmetrized wave function. To achieve efficient scaling in the
computational cost, the overall molecular wave function of the entire system
is approximated by a Hartree product of the individual fragment
wave functions. Consequently, the self-consistent field optimization
of each molecular wave function can be carried out separately
under the influence of the self-consistent polarization by the electric
field of the rest of the system. Since the electronic polarization
due to interfragment interactions is treated explicitly by electronic
structure theory, we call this method the explicit polarization
(X-Pol) theory. In this chapter, we summarized the theoretical
background of X-Pol and illustrated its application as a versatile
electronic structure method to treat intermolecular interactions
that can be extended to large molecular and biomolecular systems,
including condensed-phase systems. As a key application, we
presented an optimized model for statistical mechanical Monte
Carlo and molecular dynamics simulations of liquid water by using
X-Pol as a QMFF. The illustrative examples in this chapter show
that the X-Pol method can be used as a next-generation force field
to accurately model molecular complexes and condensed-phase
systems; in other work, we have also illustrated the method for
biomolecular systems.102

Acknowledgments

We thank the National Institutes of Health (GM46376) for supporting
this research.

References

1. Hill, T. L. J. Chem. Phys. 1946, 14, 465.
2. Westheimer, F. H. Mayer, J. E. J. Chem. Phys. 1946, 14, 733.
3. Bixon, M. Lifson, S. Tetrahedron 1967, 23, 769.
4. Levitt, M. Lifson, S. J. Mol. Biol. 1969, 46, 269.
5. Levitt, M. Nat. Struct. Biol. 2001, 8, 392.
6. McCammon, J. A. Gelin, B. R. Karplus, M. Nature 1977, 267, 585.
7. Brooks, B. R. Bruccoleri, R. E. Olafson, B. D. States, D. J. Swaminathan, S.
Karplus, M. J. Comput. Chem. 1983, 4, 187.
8. Jorgensen, W. L. Chandrasekhar, J. Madura, J. D. Impey, R. W. Klein, M. L.
J. Chem. Phys. 1983, 79, 926.
9. Weiner, S. J. Kollman, P. A. Case, D. A. Singh, U. C. Ghio, C. Alagona, G.
Profeta, S. Weiner, P. J. Am. Chem. Soc. 1984, 106, 765.
10. Jorgensen, W. L. Tirado-Rives, J. J. Am. Chem. Soc. 1988, 110, 1657.
11. Allinger, N. L. Yuh, Y. H. Lii, J. H. J. Am. Chem. Soc. 1989, 111, 8551.
12. Mayo, S. L. Olafson, B. D. Goddard, W. A. III. J. Phys. Chem. 1990, 94,
8897.
13. Rappé, A. K., Casewit, C. J. Colwell, K. Goddard, W. A. III Skiff, W. J. Am.
Chem. Soc. 1992, 114, 10024.
14. Hagler, A. Ewig, C. Comput. Phys. Commun. 1994, 84, 131.
15. Cornell, W. D. Cieplak, P. Bayly, C. I. Gould, I. R. Merz, K. M. Ferguson, D.
M. Spellmeyer, D. C. Fox, T. Caldwell, J. W. Kollman, P. A. J. Am. Chem. Soc.
1995, 117, 5179.
16. Halgren, T. A. J. Comput. Chem. 1996, 17, 490.
17. Jorgensen, W. L. Maxwell, D. S. Tirado-Rives, J. J. Am. Chem. Soc. 1996,
118, 11225.
18. MacKerell, A. D. Bashford, D. Bellott Dunbrack, R. L. Evanseck, J. D. Field,
M. J. Fischer, S. Gao, J. Guo, H. Ha, S. Joseph-McCarthy, D. Kuchnir, L.
Kuczera, K. Lau, F. T. K. Mattos, C. Michnick, S. Ngo, T. Nguyen, D. T.
Prodhom, B. Reiher, W. E. Roux, B. Schlenkrich, M. Smith, J. C. Stote,
R. Straub, J. Watanabe, M. Wiórkiewicz-Kuczera, J. Yin, D. Karplus, M. J.
Phys. Chem. B 1998, 102, 3586.
19. Sun, H. J. Phys. Chem. B 1998, 102, 7338.
20. Chen, B. Siepmann, J. I. J. Phys. Chem. B 1999, 103, 5370.
21. Cieplak, P. Caldwell, J. Kollman, P. J. Comput. Chem. 2001, 22, 1048.
22. Kaminski, G. A. Friesner, R. A. Tirado-Rives, J. Jorgensen, W. L. J. Phys.
Chem. B 2001, 105, 6474.
23. Van Gunsteren, W. F. Daura, X. Mark, A. E. In Encyclopedia of
Computational Chemistry John Wiley & Sons, Ltd, New York: 2002.
24. Duan, Y. Wu, C. Chowdhury, S. Lee, M. C. Xiong, G. Zhang, W. Yang, R.
Cieplak, P. Luo, R. Lee, T. J. Comput. Chem. 2003, 24, 1999.
25. Wang, J. Wolf, R. M. Caldwell, J. W. Kollman, P. A. Case, D. A. J. Comput.
Chem. 2004, 25, 1157.
26. Oostenbrink, C. Villa, A. Mark, A. E. Van Gunsteren, W. F. J. Comput.
Chem. 2004, 25, 1656.
27. Brooks, B. R. Brooks, C. L. Mackerell, A. D. Nilsson, L. Petrella, R. J. Roux,
B. Won, Y. Archontis, G. Bartels, C. Boresch, S. Caflisch, A. Caves, L. Cui,
Q. Dinner, A. R. Feig, M. Fischer, S. Gao, J. Hodoscek, M. Im, W. Kuczera,
K. Lazaridis, T. Ma, J. Ovchinnikov, V. Paci, E. Pastor, R. W. Post, C. B. Pu,
J. Z. Schaefer, M. Tidor, B. Venable, R. M. Woodcock, H. L. Wu, X. Yang, W.
York, D. M. Karplus, M. J. Comput. Chem. 2009, 30, 1545.
28. Mackerell, A. D. J. Comput. Chem. 2004, 25, 1584.
29. Dykstra, C. E. J. Am. Chem. Soc. 1989, 111, 6168.
30. Bernardo, D. N. Ding, Y. Krogh-Jespersen, K. Levy, R. M. J. Phys. Chem.
1994, 98, 4180.
31. Gao, J. Habibollazadeh, D. Shao, L. J. Phys. Chem. 1995, 99, 16460.
32. Gao, J. Pavelites, J. J. Habibollazadeh, D. J. Phys. Chem. 1996, 100, 2689.
33. Gao, J. J. Comput. Chem. 1997, 18, 1061.
34. Dang, L. X. Chang, T.-M. J. Chem. Phys. 1997, 106, 8149.
35. Chen, B. Xing, J. Siepmann, J. I. J. Phys. Chem. B 2000, 104, 2391.
36. Saint-Martin, H. Hernández-Cobos, J. Bernal-Uruchurtu, M. I. Ortega-
Blake, I. Berendsen, H. J. J. Chem. Phys. 2000, 113, 10899.
37. Ren, P. Ponder, J. W. J. Comput. Chem. 2002, 23, 1497.
38. Kaminski, G. A. Stern, H. A. Berne, B. J. Friesner, R. A. Cao, Y. X. Murphy,
R. B. Zhou, R. Halgren, T. A. J. Comput. Chem. 2002, 23, 1515.
39. Ren, P. Ponder, J. W. J. Phys. Chem. B 2003, 107, 5933.
40. Kaminski, G. A. Stern, H. A. Berne, B. J. Friesner, R. A. J. Phys. Chem. A
2004, 108, 621.
41. Patel, S. Mackerell, A. D. Brooks, C. L. J. Comput. Chem. 2004, 25, 1504.
42. Yu, H. Van Gunsteren, W. F. J. Chem. Phys. 2004, 121, 9549.
43. Wick, C. D. Stubbs, J. M. Rai, N. Siepmann, J. I. J. Phys. Chem. B 2005, 109,
18974.
44. Lamoureux, G. Harder, E. Vorobyov, I. V. Roux, B. MacKerell, A. D. Chem.
Phys. Lett. 2006, 418, 245.
45. Gresh, N. Cisneros, G. A. Darden, T. A. Piquemal, J.-P. J. Chem. Theory
Comput. 2007, 3, 1960.
46. Xie, W. Pu, J. MacKerell, A. D. Gao, J. J. Chem. Theory Comput. 2007, 3,
1878.
47. Lopes, P. E. Roux, B. MacKerell, A. D. Theor. Chem. Acc. 2009, 124, 11.
48. Borodin, O. J. Phys. Chem. B 2009, 113, 11463.
49. Xie, W. Pu, J. Gao, J. J. Phys. Chem. A 2009, 113, 2109.
50. Shaw, D. E. Maragakis, P. Lindorff-Larsen, K. Piana, S. Dror, R. O.
Eastwood, M. P. Bank, J. A. Jumper, J. M. Salmon, J. K. Shan, Y. Wriggers,
W. Science 2010, 330, 341.
51. Zhao, G. Perilla, J. R. Yufenyuy, E. L. Meng, X. Chen, B. Ning, J. Ahn, J.
Gronenborn, A. M. Schulten, K. Aiken, C. Zhang, P. Nature 2013, 497,
643.
52. Van Duin, A. C. Dasgupta, S. Lorant, F. Goddard, W. A. III. J. Phys. Chem.
A 2001, 105, 9396.
53. Brenner, D. W. Shenderova, O. A. Harrison, J. A. Stuart, S. J. Ni, B. Sinnott,
S. B. J. Phys.: Condens. Matter 2002, 14, 783.
54. Nielson, K. D. van Duin, A. C. Oxgaard, J. Deng, W.-Q. Goddard, W. A. III J.
Phys. Chem. A 2005, 109, 493.
55. Zhao, M. Iron, M. A. Staszewski, P. Schultz, N. E. Valero, R. Truhlar, D. G.
J. Chem. Theory Comput. 2009, 5, 594.
56. Vesely, F. J. J. Comput. Phys. 1977, 24, 361.
57. Howard, A. E. Singh, U. C. Billeter, M. Kollman, P. A. J. Am. Chem. Soc.
1988, 110, 6984.
58. Pople, J. A. Rev. Mod. Phys. 1999, 71, 1267.
59. Kohn, W. Becke, A. D. Parr, R. G. J. Phys. Chem. 1996, 100, 12974.
60. Yang, W. Phys. Rev. Lett. 1991, 66, 1438.
61. Gadre, S. R. Shirsat, R. N. Limaye, A. C. J. Phys. Chem. 1994, 98, 9165.
62. Stewart, J. J. P. Int. J. Quantum Chem., 1996, 58, 133.
63. Dixon, S. L. Merz, K. M. J. Chem. Phys. 1996, 104, 6643.
64. Dixon, S. L. Merz, K. M. J. Chem. Phys. 1997, 107, 879.
65. Gao, J. J. Phys. Chem. B 1997, 101, 657.
66. Gao, J. J. Chem. Phys. 1998, 109, 2346.
67. Kitaura, K. Ikeo, E. Asada, T. Nakano, T. Uebayasi, M. Chem. Phys. Lett.
1999, 313, 701.
68. Wierzchowski, S. J. Kofke, D. A. Gao, J. J. Chem. Phys. 2003, 119, 7365.
69. Zhang, D. W. Zhang, J. Z. H. J. Chem. Phys. 2003, 119, 3599.
70. Zhang, D. W. Xiang, Y. Zhang, J. Z. H. J. Phys. Chem. B 2003, 107, 12039.
71. Hirata, S. Valiev, M. Dupuis, M. Xantheas, S. S. Sugiki, S. Sekino, H. Mol.
Phys. 2005, 103, 2255.
72. Collins, M. A. Deev, V. A. J. Chem. Phys. 2006, 125
73. Dahlke, E. E. Truhlar, D. G. J. Chem. Theory Comput. 2006, 3, 46.
74. Dahlke, E. E. Truhlar, D. G. J. Chem. Theory Comput. 2007, 3, 1342.
75. Dułak, M. Kamiński, J. W. Wesołowski, T. A. J. Chem. Theory Comput.
2007, 3, 735.
76. Li, W. Li, S. Jiang, Y. J. Phys. Chem. A 2007, 111, 2193.
77. Xie, W. Gao, J. J. Chem. Theory Comput. 2007, 3, 1890.
78. Hratchian, H. P. Parandekar, P. V. Raghavachari, K. Frisch, M. J. Vreven,
T. J. Chem. Phys. 2008, 128
79. Reinhardt, P. Piquemal, J.-P. Savin, A. J. Chem. Theory Comput. 2008, 4,
2020.
80. Xie, W. Song, L. Truhlar, D. G. Gao, J. J. Chem. Phys. 2008, 128
81. Řezáč, J. Salahub, D. R. J. Chem. Theory Comput. 2009, 6, 91.
82. Song, L. Han, J. Lin, Y.-l. Xie, W. Gao, J. J. Phys. Chem. A 2009, 113, 11656.
83. Sode, O. Hirata, S. J. Phys. Chem. A 2010, 114, 8873.
84. Gao, J. Cembran, A. Mo, Y. J. Chem. Theory Comput. 2010, 6, 2402.
85. Gordon, M. S. Fedorov, D. G. Pruitt, S. R. Slipchenko, L. V. Chem. Rev.
2011, 112, 632.
86. Jacobson, L. D. Herbert, J. M. J. Chem. Phys. 2011, 134
87. Tempkin, J. O. B. Leverentz, H. R. Wang, B. Truhlar, D. G. J. Phys. Chem.
Lett. 2011, 2, 2141.
88. Mayhall, N. J. Raghavachari, K. J. Chem. Theory Comput. 2011, 7, 1336.
89. Le, H.-A. Tan, H.-J. Ouyang, J. F. Bettens, R. P. J. Chem. Theory Comput.
2012, 8, 469.
90. Wen, S. Nanda, K. Huang, Y. Beran, G. J. Phys. Chem. Chem. Phys. 2012,
14, 7578.
91. Mayhall, N. J. Raghavachari, K. J. Chem. Theory Comput. 2012, 8, 2669.
January 27, 2016 13:7 PSP Book - 9in x 6in 02-Qiang-Cui-c02

References 61

92. Wang, Y. Sosa, C. P. Cembran, A. Truhlar, D. G. Gao, J. J. Phys. Chem. B


2012, 116, 6781.
93. Richard, R. M. Herbert, J. M. J. Chem. Phys. 2012, 137, 064113.
94. Qi, H. W. Leverentz, H. R. Truhlar, D. G. J. Phys. Chem. A 2013, 117, 4486.
95. Isegawa, M. Wang, B. Truhlar, D. G. J. Chem. Theory Comput. 2013, 9,
1381.
96. Giese, T. J. York, D. M. J. Chem. Phys. 2007, 127
97. Giese, T. J. Chen, H. Dissanayake, T. Giambaşu, G. M. Heldenbrand, H.
Huang, M. Kuechler, E. R. Lee, T.-S. Panteva, M. T. Radak, B. K. York, D. M.
J. Chem. Theory Comput. 2013, 9, 1417.
98. Cembran, A. Bao, P. Wang, Y. Song, L. Truhlar, D. G. Gao, J. J. Chem. Theory
Comput. 2010, 6, 2469.
99. Gao, J. Wang, Y. J. Chem. Phys. 2012, 136
100. Fedorov, D. G. Ishida, T. Kitaura, K. J. Phys. Chem. A 2005, 109, 2638.
101. Hratchian, H. P. Krukau, A. V. Parandekar, P. V. Frisch, M. J. Raghavachari,
K. J. Chem. Phys. 2011, 135, 014105.
102. Xie, W. Orozco, M. Truhlar, D. G. Gao, J. J. Chem. Theory Comput. 2009, 5,
459.
103. Löwdin, P. O. J. Chem. Phys. 1950, 18, 365.
104. Mulliken, R. S. J. Chem. Phys. 1955, 23, 1833.
105. Besler, B. H. Merz, K. M. Kollman, P. A. J. Comput. Chem. 1990, 11, 431.
106. Wang, J. Cieplak, P. Kollman, P. A. J. Comput. Chem. 2000, 21, 1049.
107. Leverentz, H. Gao, J. Truhlar, D. Theor. Chem. Acc. 2011, 129, 3.
108. Momany, F. A. J. Phys. Chem. 1978, 82, 592.
109. Cox, S. Williams, D. J. Comput. Chem. 1981, 2, 304.
110. Singh, U. C. Kollman, P. A. J. Comput. Chem. 1984, 5, 129.
111. Chirlian, L. E. Francl, M. M. J. Comput. Chem. 1987, 8, 894.
112. Breneman, C. M. Wiberg, K. B. J. Comput. Chem. 1990, 11, 361.
113. Wang, B. Truhlar, D. G. J. Chem. Theory Comput. 2012, 8, 1989.
114. Storer, J. Giesen, D. Cramer, C. Truhlar, D. J. Comput. Aided Mol. Des.
1995, 9, 87.
115. Marenich, A. V. Jerome, S. V. Cramer, C. J. Truhlar, D. G. J. Chem. Theory
Comput. 2012, 8, 527.
116. Zhang, P. Bao, P. Gao, J. J. Comput. Chem. 2011, 32, 2127.
117. Frisch, M. J. Trucks, G. W. Schlegel, H. B. Scuseria, G. E. Robb, M. A.
Cheeseman, J. R. Scalmani, G. Barone, V. Mennucci, B. Petersson, G. A.
January 27, 2016 13:7 PSP Book - 9in x 6in 02-Qiang-Cui-c02

62 Explicit Polarization Theory

Nakatsuji, H. Caricato, M. Li, X. Hratchian, H. P. Izmaylov, A. F. Bloino,


J. Zheng, G. Sonnenberg, J. L. Hada, M. Ehara, M. Toyota, K. Fukuda, R.
Hasegawa, J. Ishida, M. Nakajima, T. Honda, Y. Kitao, O. Nakai, H. Vreven,
T. Montgomery, J. A., Jr. Peralta, J. E. Ogliaro, F. Bearpark, M. Heyd, J.
J. Brothers, E. Kudin, K. N. Staroverov, V. N. Kobayashi, R. Normand, J.
Raghavachari, K. Rendell, A. Burant, J. C. Iyengar, S. S. Tomasi, J. Cossi, M.
Rega, N. Millam, N. J. Klene, M. Knox, J. E. Cross, J. B. Bakken, V. Adamo, C.
Jaramillo, J. Gomperts, R. Stratmann, R. E. Yazyev, O. Austin, A. J. Cammi,
R. Pomelli, C. Ochterski, J. W. Martin, R. L. Morokuma, K. Zakrzewski, V.
G. Voth, G. A. Salvador, P. Dannenberg, J. J. Dapprich, S. Daniels, A. D.
Farkas, Ö. Foresman, J. B. Ortiz, J. V. Cioslowski, J. Fox, D. J. Gaussian
Development Version Gaussian Inc. Wallingford, CT. 2013
118. Zhao, Y. Truhlar, D. G. Theor. Chem. Acc. 2008, 120, 215.
119. Lynch, B. J. Zhao, Y. Truhlar, D. G. J. Phys. Chem. A 2003, 107, 1384.
120. Hehre, W. J. Ditchfield, R. Pople, J. A. J. Chem. Phys. 1972, 56, 2257.
121. Phillips, J. C. Braun, R. Wang, W. Gumbart, J. Tajkhorshid, E. Villa, E.
Chipot, C. Skeel, R. D. Kalé, L. Schulten, K. J. Comput. Chem. 2005, 26,
1781.
122. Zhang, P. Fiedler, L. Leverentz, H. R. Truhlar, D. G. Gao, J. J. Chem. Theory
Comput. 2011, 7, 857.
123. Isegawa, M. Fiedler, L. Leverentz, H. R. Wang, Y. Nachimuthu, S. Gao, J.
Truhlar, D. G. J. Chem. Theory Comput. 2012, 9, 33.
124. Han, J. Mazack, M. J. M. Zhang, P. Truhlar, D. G. Gao, J. J. Chem. Phys. 2013,
139, 054503.
125. Andersen, H. C. J. Chem. Phys. 1980, 72, 2384.
126. Koopman, E. A. Lowe, C. P. J. Chem. Phys. 2006, 124
127. Miyamoto, S. Kollman, P. A. J. Comput. Chem. 1992, 13, 952.
128. Gao, J. Han, J. Zhang, P. MCSOL version 2012xp 2012
129. Mazack, M., J. M. Gao, J. X-Pol, version 2013a1 2013
130. Wang, B. Truhlar, D. G. J. Chem. Theory Comput. 2010, 6, 3330.
131. Mo, Y. Gao, J. Peyerimhoff, S. D. J. Chem. Phys. 2000, 112, 5530.
132. Mo, Y. Bao, P. Gao, J. Phys. Chem. Chem. Phys. 2011, 13, 6760.
133. Pople, J. A. Santry, D. P. Segal, G. A. J. Chem. Phys. 1965, 43, S129.
134. Cui, Q. Elstner, M. Kaxiras, E. Frauenheim, T. Karplus, M. J. Phys. Chem.
B 2001, 105, 569.
135. Elstner, M. Theor. Chem. Acc. 2006, 116, 316.
136. Dewar, M. J. S. Thiel, W. J. Am. Chem. Soc. 1977, 99, 4899.
January 27, 2016 13:7 PSP Book - 9in x 6in 02-Qiang-Cui-c02

References 63

137. Dewar, M. J. S. Thiel, W. J. Am. Chem. Soc. 1977, 99, 4907.


138. Dewar, M. J. S. Thiel, W. Theor. Chim. Acta 1977, 46, 89.
139. Fiedler, L. Gao, J. Truhlar, D. G. J. Chem. Theory Comput. 2011, 7, 852.
140. Tang, K. T. Toennies, J. P. J. Chem. Phys. 1984, 80, 3726.
141. Grimme, S. Antony, J. Ehrlich, S. Krieg, H. J. Chem. Phys. 2010, 132
142. Grimme, S. Ehrlich, S. Goerigk, L. J. Comput. Chem. 2011, 32, 1456.
143. McNamara, J. P. Hillier, I. H. Phys. Chem. Chem. Phys. 2007, 9, 2362.
144. Morgado, C. A. McNamara, J. P. Hillier, I. H. Burton, N. A. Vincent, M. A. J.
Chem. Theory Comput. 2007, 3, 1656.
145. McNamara, J. P. Sharma, R. Vincent, M. A. Hillier, I. H. Morgado, C. A.
Phys. Chem. Chem. Phys. 2008, 10, 128.
146. Chang, D. T. Schenter, G. K. Garrett, B. C. J. Chem. Phys. 2008, 128,
164111.
147. Habershon, S. Markland, T. E. Manolopoulos, D. E. J. Chem. Phys. 2009,
131, 024501.
148. Stern, H. A. Berne, B. J. Chem. Phys. 2001, 115, 7622.
149. Murdachaew, G. Mundy, C. J. Schenter, G. K. Laino, T. Hutter, J. J. Phys.
Chem. A 2011, 115, 6046.
150. Stewart, J. J. P. J. Comput. Chem. 1989, 10, 209.
151. Stewart, J. J. P. J. Mol. Model. 2007, 13, 1173.
152. Silvestrelli, P. L. Parrinello, M. Phys. Rev. Lett. 1999, 82, 3308.
153. Coulson, C. A. Eisenberg, D. Proc. R. Soc. London Ser. A 1966, 291, 445.
154. Caldwell, J. W. Kollman, P. A. J. Phys. Chem. 1995, 99, 6208.
155. Sprik, M. J. Chem. Phys. 1991, 95, 6762.
156. Lamoureux, G. MacKerell, A. D. Roux, B. J. Chem. Phys. 2003, 119, 5185.
157. Soper, A. Chem. Phys. 2000, 258, 121.
158. Head-Gordon, T., Johnson, M. E. Proc. Natl. Acad. Sci. 2006, 103, 7973.
This page intentionally left blank
January 27, 2016 13:11 PSP Book - 9in x 6in 03-Qiang-Cui-c03

Chapter 3

Quantum Mechanical Methods for Quantifying and Analyzing Non-Covalent Interactions and for Force-Field Development

C. David Sherrillᵃ and Kenneth M. Merz, Jr.ᵇ

ᵃ Center for Computational Molecular Science and Technology,
School of Chemistry and Biochemistry,
School of Computational Science and Engineering,
Georgia Institute of Technology, Atlanta, GA 30332-0400, USA
ᵇ Institute for Cyber Enabled Research, Department of Chemistry,
and Department of Biochemistry and Molecular Biology, Michigan State University,
East Lansing, MI 48824-1322, USA
[email protected]

3.1 Introduction

Quantum mechanics is the bedrock upon which multi-scale models
are built. For decades, it has been a source of parameters for
force-field models, which are vastly less computationally expensive
and hence able to reach much longer length and time scales.
It is also being increasingly used in concert with force-field
methods through mixed quantum mechanics/molecular mechanics
(QM/MM) approaches, which have the advantage of accuracy (QM)
in the critical region of interest and efficiency (MM) in the less
important regions.

Many-Body Effects and Electrostatics in Biomolecules
Edited by Qiang Cui, Markus Meuwly, and Pengyu Ren
Copyright © 2016 Pan Stanford Publishing Pte. Ltd.
ISBN 978-981-4613-92-7 (Hardcover), 978-981-4613-93-4 (eBook)
www.panstanford.com

Quantum mechanical methods have come a long way since
Hartree–Fock/6-31G* started being used widely in force-field
development (e.g., to determine various parameters in CHARMM
(MacKerell, Wiórkiewicz-Kuczera, and Karplus, 1995), the torsion
parameters of OPLS-AA (Jorgensen, Maxwell, and Tirado-Rives,
1996), or partial charges in AMBER (Cornell et al., 1995)). Even
the second-order Møller–Plesset perturbation theory (MP2) method
used in some more recent studies to obtain torsion potentials
(Kahn and Bruice, 2002; Kaminski et al., 2001) is considered outdated
and inaccurate compared to more sophisticated approaches
like coupled-cluster theory through perturbative triples, CCSD(T)
(Raghavachari et al., 1989). This approach is considered the “gold
standard” of quantum chemistry and is quite reliable except in
challenging cases of transition metals, bond-breaking, or diradicals.
In fact, when coupled with various approaches to estimate the
complete basis set (CBS) limit, CCSD(T) can provide very high
levels of accuracy for non-covalent interactions—making it ideal
for obtaining parameters of “next generation” force fields, and for
quantifying errors in existing force fields.
Recent advances in quantum mechanical energy component
analysis are also highly relevant for force-field development. These
approaches allow one to analyze an intermolecular interaction
in terms of its fundamental physical components: electrostatics,
London dispersion forces, induction/polarization, and short-range
Pauli exchange-repulsion. Such an analysis is very beneficial in
better understanding the various kinds of non-covalent interactions
that govern biomolecular structure and drug docking, such as π–π
interactions, CH/π interactions, S/π interactions, base stacking,
cation-π interactions, halogen bonding, etc. Even more detail is now
available through an atom-partitioned energy component analysis,
which provides not only quantum mechanical energy components
for a non-covalent contact, but also a breakdown of how these
components arise in terms of pairwise atomic contacts (Parrish and
Sherrill, 2014). These energy component analyses afford deeper
insight into non-covalent interactions and they also provide a
way to more rigorously assess force-field methods and to obtain
parameters for them. If benchmark-quality energy components can
be computed, then one can envision next-generation force fields that
are parameterized component by component; indeed, some work
along these lines has already been performed, as discussed below.
Force fields derived in this way have a built-in ability to provide
an energy component analysis (not advisable for current popular
force fields due to compensating errors between different terms).
They are also more likely to have physically reasonable, transferable
parameters.
This chapter will review recent advances in quantum mechanical
methods and their application to validating and parameterizing
force fields. We highlight substantial discrepancies between standard
force-field models of electrostatics (even rather sophisticated
ones involving multipoles) and quantum mechanics for the case of
π-stacking, and we discuss some possible solutions.

3.2 Testing Force Fields Against High-Accuracy Quantum Mechanics

3.2.1 Coupled-Cluster Benchmarks for Non-Bonded Interactions

Coupled-cluster theory including single, double, and perturbative
triple substitutions (Raghavachari et al., 1989), CCSD(T), has been
a remarkable success of modern electronic structure theory. It
provides quite high accuracy (Lee and Scuseria, 1995; Řezáč and
Hobza, 2013) so long as the system of interest does not feature
substantial electronic near-degeneracies (as might happen in bond-
breaking reactions, systems containing transition metals, etc.).
Unfortunately, CCSD(T) is very demanding computationally, having
a computational cost formally scaling as $O(o^3 v^4)$, where $o$ and
$v$ are the number of occupied and virtual (unoccupied) orbitals,
respectively (meaning, for example, that doubling the size of the
molecule increases the computer time required by a factor of $2^7 = 128$).
Substantial work has gone into reducing the computational cost
of CCSD(T), using a wide variety of techniques including parallel
algorithms (Janowski, Ford, and Pulay, 2007; Janowski and Pulay,
2008; Kus, Lotrich, and Bartlett, 2009; Lotrich et al., 2008; Prochnow
et al., 2010), natural orbitals (DePrince and Sherrill, 2013b; Klopper
et al., 1997; Landau et al., 2010; Sosa et al., 1989; Taube and
Bartlett, 2005, 2008) and the related optimized virtual orbital space
(Adamowicz, 2010; Adamowicz and Bartlett, 1987; Dedíková et al.,
2008; Neogrády, Pitoňák, and Urban, 2005; Pitoňák et al., 2008),
Cholesky decomposition (Aquilante et al., 2010; Boström et al.,
2012; DePrince and Sherrill, 2013a; Epifanovsky et al., 2013;
Pedersen, Sánchez de Merás, and Koch, 2004; Pitoňák et al., 2011), and
density fitting (DePrince et al., 2014; DePrince and Sherrill, 2013a;
Epifanovsky et al., 2013). The most drastic reductions in computational
cost are achieved when using local correlation models (Neese,
Wennmohs, and Hansen, 2009; Saebø and Pulay, 1985, 1993; Schütz
and Werner, 2000; Werner and Schütz, 2011), which delete (or
approximate) terms involving simultaneous excitation of electrons
that are far apart in the molecule; the farther away two electrons are
from each other, the less their motions should be correlated. A recent
paper reported extremely impressive local-CCSD(T) computations
on the crambin protein (Riplinger et al., 2013). However, the
numerous individual approximations that go into these cutting-edge
local correlation methods have not yet been thoroughly tested, so it
remains unclear when these methods remain reliable and when they
might lose the accuracy of the canonical CCSD(T) approach.
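The formal scaling quoted earlier in this section can be made concrete with a few lines of arithmetic. The sketch below only evaluates the ratio implied by an $o^3 v^4$ cost model; the function name and the orbital counts are hypothetical, and real timings also depend on the prefactor, the algorithm, and the hardware.

```python
def ccsdt_relative_cost(n_occ: int, n_virt: int, ref_occ: int, ref_virt: int) -> float:
    """Ratio of formal costs under an o^3 * v^4 scaling model."""
    return (n_occ / ref_occ) ** 3 * (n_virt / ref_virt) ** 4


# Hypothetical orbital counts, chosen only so that the system size doubles.
small_system = (21, 200)   # (occupied, virtual)
large_system = (42, 400)

ratio = ccsdt_relative_cost(*large_system, *small_system)
print(f"Formal cost ratio on doubling the system: {ratio:.0f}x")
```

Doubling both orbital counts multiplies the occupied factor by 2^3 and the virtual factor by 2^4, which is where the overall factor of 2^7 comes from.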
Nevertheless, recent CCSD(T) programs (and modern computers)
are certainly now capable of performing computations on
systems of around 30 atoms, even when using relatively large
triple-ζ basis sets like cc-pVTZ (Dunning, 1989) or aug-cc-pVTZ
(Kendall, Dunning, and Harrison, 1992) (the latter adds diffuse
functions, which can be important for intermolecular interactions).
Hence, it is now feasible to employ CCSD(T) to obtain high-accuracy
benchmark data that could be used to validate or parameterize
force-field models. Although CCSD(T) computations on a large test
set seemed a rather remote possibility a decade ago, recently
several groups have been producing CCSD(T) data for small van der
Waals dimers. These data allow one to examine force-field methods
and also more approximate ab initio methods for their ability to
describe non-covalent interactions. With a sufficient amount of such
data, one could also parameterize the non-bonded terms in force
fields. These data have already proven invaluable in testing and
parameterizing emerging dispersion-corrected density functional
theory (DFT) approaches such as DFT-D (Grimme, 2004, 2006a;
Grimme et al., 2010; Wu and Yang, 2002), double-hybrid functionals
(Chai and Head-Gordon, 2009; Grimme, 2006b; Schwabe and
Grimme, 2007; Zhang, Xu, and Goddard, 2009), the so-called van der
Waals DFT (Dion et al., 2004; Langreth et al., 2005; Lee et al., 2010;
Vydrov and Voorhis, 2010), Becke’s exchange dipole moment (XDM)
approach (Becke and Johnson, 2005; Johnson and Becke, 2006;
Kong et al., 2009) and the related density-dependent dispersion
correction (dDsC) of Steinmann and Corminboeuf (Steinmann and
Corminboeuf, 2010, 2011), and others (von Lilienfeld et al., 2004;
Xu and Goddard, 2004). Benchmark data used for such purposes
have included the very popular S22 test set of Hobza and co-workers
(Jurečka et al., 2006) and its more recently revised interaction
energies (Marshall, Burns, and Sherrill, 2011; Podeszwa, Patkowski,
and Szalewicz, 2010; Takatani et al., 2010), the newer S66 (Řezáč,
Riley, and Hobza, 2011b) and A24 (Řezáč and Hobza, 2013) test sets
from that group, and the NBC10 (Hohenstein and Sherrill, 2009;
Marshall, Burns, and Sherrill, 2011; Sherrill et al., 2009b; Takatani
and Sherrill, 2007), HBC6 (Marshall, Burns, and Sherrill, 2011;
Thanthiriwatte et al., 2011), and HSG (Faver et al., 2011a; Marshall,
Burns, and Sherrill, 2011) test sets. Each of these test sets typically
includes several to a few dozen high-quality CCSD(T) data points.
Some of the test sets (like NBC10 and HBC6) include entire potential
energy curves. Other test sets like S22x5 (Gráfová et al., 2010) and
S66x8 (Řezáč, Riley, and Hobza, 2011a) include some additional,
non-equilibrium geometries generated by displacements from the
equilibrium geometry. Recent work in the Sherrill and Merz groups
is seeking to dramatically expand the volume of available high-quality
data by adding thousands of CCSD(T) energies for interacting
fragments taken from the Protein Data Bank (PDB) (Berman, 2000);
this project, which we call the Bio-Fragment Database (BFDb), is
described in more detail below.
For the purpose of high-accuracy benchmarking, not all CCSD(T)
computations are of equal accuracy. Just as in any electronic
structure computation, the choice of one-particle basis set matters.
Moreover, wavefunction-based methods like CCSD(T) are more
sensitive to the choice of basis than are DFT-based methods. In
particular, for studying non-covalent interactions (as one would
do to obtain non-bonded parameters), even rather large basis sets
like aug-cc-pVTZ are not large enough to obtain accurate results.
Fortunately, a relatively simple remedy exists for this problem: the
focal-point approach of Allen and co-workers (Császár, Allen, and
Schaefer, 1998; East and Allen, 1993). A focal-point approach that
has been independently employed by a number of early studies
(Klopper et al., 1994; Koch, Fernández, and Christiansen, 1998;
Sinnokrot, Valeev, and Sherrill, 2002; Tsuzuki et al., 2002) has recently
gained widespread popularity among those performing benchmark-
quality CCSD(T) computations for non-covalent interactions: one
estimates the interaction energy in a large basis set using the more
tractable MP2 method and then adds a “coupled-cluster correction,”
$\delta_{\mathrm{MP2}}^{\mathrm{CCSD(T)}}$, to account for higher-order correlation absent in MP2. This
approach may be expressed as

$$E_{\mathrm{CCSD(T)}}^{\text{large-basis}} \approx E_{\mathrm{MP2}}^{\text{large-basis}} + \left( E_{\mathrm{CCSD(T)}}^{\text{small-basis}} - E_{\mathrm{MP2}}^{\text{small-basis}} \right), \tag{3.1}$$

where $\left( E_{\mathrm{CCSD(T)}}^{\text{small-basis}} - E_{\mathrm{MP2}}^{\text{small-basis}} \right)$ is the “coupled-cluster correction,” which
may be written more compactly as $\delta_{\mathrm{MP2}}^{\mathrm{CCSD(T)}}$. It is interesting to
note that this same approach might be interpreted alternatively
as a small-basis CCSD(T) computation plus a basis-set correction
evaluated as the difference between MP2 in a large basis and a small
basis, i.e.,

$$E_{\mathrm{CCSD(T)}}^{\text{large-basis}} \approx E_{\mathrm{CCSD(T)}}^{\text{small-basis}} + \left( E_{\mathrm{MP2}}^{\text{large-basis}} - E_{\mathrm{MP2}}^{\text{small-basis}} \right). \tag{3.2}$$

In the past several years, it has become standard practice to replace
$E_{\mathrm{MP2}}^{\text{large-basis}}$ with an estimate of the MP2 complete basis set (CBS) limit,
i.e., $E_{\mathrm{MP2}}^{\mathrm{CBS}}$. This then allows one to approximate CCSD(T) in the CBS
limit. This is important for reliable benchmarks, as even rather large
basis sets like aug-cc-pVTZ are not sufficient to converge to the CBS
limit (Burns, Marshall, and Sherrill, 2014). The Dunning correlation-
consistent basis sets like cc-pVXZ or aug-cc-pVXZ (where X=D, T,
Q, 5, etc.) are designed to systematically converge towards the
CBS limit (Dunning, 1989), so one can estimate the MP2/CBS
limit straightforwardly by, for example, using Helgaker two-point
extrapolation (Halkier et al., 1998) of the MP2 correlation energies
with a pair of basis sets like aug-cc-pVTZ and aug-cc-pVQZ; this
is typically effective at very closely approaching the MP2/CBS limit
(Burns, Marshall, and Sherrill, 2014).
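As a rough illustration of the focal-point recipe in Eqs. (3.1) and (3.2), the sketch below combines a two-point extrapolation of MP2 correlation energies with a small-basis coupled-cluster correction. It assumes the commonly used X^-3 form of the two-point extrapolation and takes the Hartree–Fock contribution from the larger basis, which is one common choice rather than the only one. The function names and all numerical values are hypothetical placeholders, not data from the studies cited here.

```python
def two_point_cbs(e_corr_x: float, e_corr_y: float, x: int, y: int) -> float:
    """Extrapolate correlation energies from cardinal numbers x > y,
    assuming the basis-set error decays as X**-3 (an assumption of this sketch)."""
    return (x**3 * e_corr_x - y**3 * e_corr_y) / (x**3 - y**3)


def focal_point(e_mp2_large: float, e_ccsdt_small: float, e_mp2_small: float) -> float:
    """Eq. (3.1): large-basis MP2 plus the small-basis coupled-cluster correction."""
    delta = e_ccsdt_small - e_mp2_small
    return e_mp2_large + delta


# Hypothetical interaction-energy contributions in kcal/mol (illustrative only).
hf_qz = 1.20                             # Hartree-Fock part from the larger basis
mp2_corr_tz, mp2_corr_qz = -3.60, -3.75  # MP2 correlation parts, X = 3 and X = 4
mp2_cbs = hf_qz + two_point_cbs(mp2_corr_qz, mp2_corr_tz, x=4, y=3)

estimate = focal_point(mp2_cbs, e_ccsdt_small=-2.05, e_mp2_small=-2.40)
print(f"MP2/CBS estimate:     {mp2_cbs:.3f} kcal/mol")
print(f"CCSD(T)/CBS estimate: {estimate:.3f} kcal/mol")
```

Keeping the extrapolation and the correction in separate functions mirrors the way the two pieces are usually converged independently: the MP2 part with large basis sets, the correction with a smaller one.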
The coupled-cluster correction, $\delta_{\mathrm{MP2}}^{\mathrm{CCSD(T)}}$, is often relatively insensitive
to the choice of basis set, and hence a number of studies
have employed the modest aug-cc-pVDZ basis set. However, as first
pointed out by Janowski and Pulay (Janowski and Pulay, 2007), this
basis is not quite sufficient to fully converge $\delta_{\mathrm{MP2}}^{\mathrm{CCSD(T)}}$ to within 0.1
kcal mol−1 for the benzene dimer. Subsequent systematic studies
of various van der Waals dimers indicated modest changes in
$\delta_{\mathrm{MP2}}^{\mathrm{CCSD(T)}}$ if one improves the basis from aug-cc-pVDZ to aug-cc-pVTZ;
the largest improvements are seen for H-bonded systems (which
typically have the largest interaction energies). In some H-bonding
cases, like water dimer or formic acid dimer, the sign of $\delta_{\mathrm{MP2}}^{\mathrm{CCSD(T)}}$
can even be wrong in an aug-cc-pVDZ basis (Marshall, Burns, and
Sherrill, 2011). Unfortunately, small changes in $\delta_{\mathrm{MP2}}^{\mathrm{CCSD(T)}}$ persist when
one proceeds to aug-cc-pVQZ and even larger basis sets (Burns,
Marshall, and Sherrill, 2014; Marshall, Burns, and Sherrill, 2011);
nevertheless, aug-cc-pVTZ seems sufficient in the majority of cases
to converge $\delta_{\mathrm{MP2}}^{\mathrm{CCSD(T)}}$ within a few hundredths of one kcal mol−1.

Direct extrapolation of CCSD(T) correlation energies using aug-cc-pVTZ
and aug-cc-pVQZ (or larger) basis sets is an even better
approach to obtain CCSD(T)/CBS limits, but this is only possible for
very small systems at present. Hence, the best current estimates
of CCSD(T)/CBS benchmark interaction energies for small van der
Waals dimers typically employ MP2/CBS estimates using the aug-cc-pVTZ
and aug-cc-pVQZ basis sets (or better) and a $\delta_{\mathrm{MP2}}^{\mathrm{CCSD(T)}}$ correction
evaluated in the aug-cc-pVTZ basis. Benchmark datasets of this
quality or better include the S22B, NBC10A, HBC6A, and HSG-A datasets
(Marshall, Burns, and Sherrill, 2011) and the A24 dataset (Řezáč
and Hobza, 2013). The S66 dataset is of nearly this quality (using
a not quite as reliable aug-cc-pVDZ and aug-cc-pVTZ extrapolation
of $\delta_{\mathrm{MP2}}^{\mathrm{CCSD(T)}}$).
In most published CCSD(T)/CBS benchmarks for van der Waals
dimers, the Boys–Bernardi counterpoise (CP) correction (Boys and
Bernardi, 1970) has been employed. This procedure is meant to
correct for basis set superposition error (BSSE), in which incompleteness
in the one-particle basis set leads to artificial increases
in computed interaction energies as each monomer “borrows” some
of the basis functions from the other monomer. To approximately
account for this artificial effect, the CP correction estimates the
extent of this basis function borrowing as the difference in energy
between a monomer computed with its own basis and a monomer
computed with a “dimer basis,” i.e., the basis of the dimer complex
but without the electrons or nuclei of the other monomer present. By
making available the full basis of the other monomer, and not just the
unoccupied orbitals, the CP correction can overestimate the extent
of BSSE. Whether or not to use CP correction has been a matter of
much debate in the literature. In our experience, CP correction tends
to be helpful in dispersion-dominated cases (like methane dimer or
benzene dimer). For H-bonded cases, by coincidence one can often
achieve better results in smaller basis sets without the CP correction
(Burns, Marshall, and Sherrill, 2014; Halkier et al., 1999, 1997).
However, in all types of cases, convergence towards the CBS limit is
smoother when using CP correction compared to not using it (Burns,
Marshall, and Sherrill, 2014; Halkier et al., 1999). As a practical
compromise, some authors have advocated using the average of CP
corrected and uncorrected values (Halkier et al., 1999; Kim and
Kim, 1998; Kim et al., 1995, 1992; Kim, Tarakeshwar, and Lee, 2000;
Mackie and DiLabio, 2011; Schütz et al., 1997). A careful study of CP
correction, no correction, or averaged corrections has recently been
reported for focal-point CCSD(T)/CBS schemes (Burns, Marshall,
and Sherrill, 2014).
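The bookkeeping behind the counterpoise procedure described above is simple enough to sketch directly. The example assumes that five total energies are already available from some electronic structure program: the dimer, each monomer in its own basis, and each monomer in the full dimer basis. The function name and the energies are hypothetical, and monomer geometries are taken as frozen at their geometries in the complex, which is a simplification.

```python
def counterpoise_summary(e_dimer, e_a_own, e_b_own, e_a_dimer_basis, e_b_dimer_basis):
    """Return uncorrected, CP-corrected, and averaged interaction energies.

    All energies share one unit. Monomers are assumed frozen at their
    geometries in the complex (a simplification of this sketch).
    """
    uncorrected = e_dimer - e_a_own - e_b_own
    corrected = e_dimer - e_a_dimer_basis - e_b_dimer_basis
    return {
        "uncorrected": uncorrected,
        "cp_corrected": corrected,
        # Energy lowering of the monomers when given the partner's basis functions.
        "bsse_estimate": corrected - uncorrected,
        # The compromise some authors advocate: mean of the two values.
        "average": 0.5 * (uncorrected + corrected),
    }


# Hypothetical total energies in hartree (illustrative only).
result = counterpoise_summary(
    e_dimer=-152.5000,
    e_a_own=-76.2400, e_b_own=-76.2500,
    e_a_dimer_basis=-76.2410, e_b_dimer_basis=-76.2505,
)
for name, value in result.items():
    print(f"{name:>13s}: {value: .4f} hartree")
```

In this made-up case the corrected interaction energy is less attractive than the uncorrected one, which is the direction expected when each monomer's energy is lowered by the borrowed basis functions.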
Before concluding this section, it is worth briefly mentioning
possible additional sources of error in the ab initio CCSD(T)/CBS
interaction energies. Unless heavy elements are present, relativistic
effects should be negligible (Řezáč and Hobza, 2013). Core-valence
correlation, neglected in most studies, contributes a few hundredths
of one kcal mol−1 (or around 0.5%) for small van der Waals dimers
(Podeszwa, Patkowski, and Szalewicz, 2010; Řezáč and Hobza,
2013), which may not be totally negligible for the best benchmarks,
but is not a source of serious concern. Quadruple substitutions
in the wavefunction, neglected in CCSD(T), have been explored
(Hopkins and Tschumper, 2004; Řezáč and Hobza, 2013). In most
cases, the basis sets feasible for demanding CCSDT(Q) or CCSDTQ
computations are quite small (usually smaller than augmented,
polarized double-ζ). Nevertheless, based on limited available
evidence, quadruple excitations typically contribute a rather small
amount to intermolecular interaction energies: in Hobza’s tests,
around 0.03 kcal mol−1 or less for small dimers (although 0.08
kcal mol−1 for formaldehyde dimer), and in Tschumper’s tests of
larger systems, usually 0.1 kcal mol−1 or less (maximum difference
of 0.2 kcal mol−1 for furan dimer). Hopkins and Tschumper
suggest that quadruples corrections can be around 10% of the
triples correction in CCSD(T), which can be substantial for π-stacked
systems. Given the data of Hopkins and Tschumper, in
principle quadruple excitations should be included in high-accuracy
benchmarking; however, given their typically small contribution and
their extreme computational expense, doing so seems impractical at
the present time except for very small systems.

3.2.2 Comparison of Force Fields to Quantum Mechanical Benchmarks

Now that truly high-quality quantum mechanical benchmark data
are becoming available for small systems, it is interesting to evaluate
how existing force fields compare to these benchmarks for non-
bonded contacts. It is also interesting to explore the development
of fully ab initio force fields, without the use of any empirical data.
The former question is examined in this section, and the next section
discusses the latter topic.
Direct comparison of force fields to benchmark-quality CCSD(T)
energies is complicated by the fact that most standard, workhorse
force fields do not include polarization terms. This leads to errors,
but these errors can be partially compensated by other errors.
Hence, a force field that compares poorly to CCSD(T) benchmarks
for a set of van der Waals dimers may still perform fairly well
for condensed-phase properties, due to error cancellation. This is
the rationale for obtaining atomic charges in the AMBER force
field using restrained electrostatic potential (RESP) fitting (Bayly,
1993) to modest-quality Hartree–Fock/6-31G* quantum chemical
computations; this method tends to overestimate dipole moments,
but this is considered beneficial for simulations in water, to
approximately cancel errors from neglecting polarization effects
(Cornell et al., 1995). On the other hand, given that recent advances
in estimating the CCSD(T)/CBS limit allow us to achieve nearly the
exact answer for small van der Waals dimers, any large discrepancies
between force fields and these benchmark values would remain a
valid cause for concern. Moreover, for force fields that do contain
polarization terms, the comparison is fair and one should aim to
closely match the benchmark quantum results.

3.2.3 Performance of Force Fields for π-Interactions


Comparison of force fields vs. high-quality ab initio data is especially
interesting for π-interactions because one might wonder whether
the delocalized nature and polarizability of the π electrons might
make them more difficult to model accurately using standard
force fields. Additionally, π interactions can be quite important in
biomolecular systems (Salonen, Ellermann, and Diederich, 2011).
Around 60% of aromatic side-chains in proteins are involved in π–π
interactions (Burley and Petsko, 1985), and simulations indicate
that base stacking interactions are critical for the stability of DNA
and RNA (Černý et al., 2008).
A 2009 study (Paton and Goodman, 2009) examined a variety
of popular force fields for their ability to match the geometries
and stabilization energies of van der Waals dimers in the S22 and
JSCH-2005 databases (Jurečka et al., 2006). The latter database
contains a large number of H-bonded and stacked nucleobases
and some pairs of amino acid side-chains. The study found that
all force fields considered underestimated H-bonding strength, but
that other interactions were described more accurately, with OPLS-
AA giving a mean unsigned error of 2 kcal mol−1 over the 165
complexes considered, outperforming some DFT methods examined
(most likely due to the omission of dispersion terms in standard DFT
approaches); omitting H-bonding complexes reduced the OPLS-AA
mean unsigned error to 1 kcal mol−1. A drawback of this study is that
the original S22 data, and especially the JSCH-2005 data, are not of
true CCSD(T)/CBS quality: the basis sets used for the MP2 and
$\delta_{\mathrm{MP2}}^{\mathrm{CCSD(T)}}$ components were truncated because of the limitations of
software and hardware in 2005.
Some higher-quality CCSD(T)/CBS benchmark data became
available around this time for potential energy curves of a small
number of prototype van der Waals dimers, and they were used
to assess the accuracy of CHARMM, AMBER, OPLS-AA, and MM3
potentials for non-covalent π interactions (Sherrill et al., 2009a).
Specifically, potential curves for benzene·CH₄, benzene·H₂S, and
the sandwich, T-shaped, and parallel-displaced configurations of the
benzene dimer were examined. While all of the tested force fields
were qualitatively correct, none of them provided a close match to
more than one or two of the benchmark quantum potential curves.
The shapes of the potential curves for the parallel-displaced
benzene dimer (scanning horizontal displacements for fixed vertical
displacements of 3.2, 3.4, and 3.6 Å) were particularly difficult
for the standard force fields to match: they all gave potentials
that were far too flat, instead of showing a pronounced peak at
zero horizontal displacement (sandwich configuration) and distinct
minima at horizontal displacements around 1.6 Å (see Fig. 3.1). It
would be easy to argue that these discrepancies are just a result
of imperfect parameters. To investigate this possibility, the non-
bonded parameters (the one unique atomic charge in benzene and
the Lennard–Jones parameters) were optimized to minimize the
sum of the absolute errors for all five potential curves considered
for benzene dimer. Unfortunately, even with optimal parameters,
performance for the parallel-displaced curves was still rather poor.
This in turn suggested that it is not the parameters that are at
fault, but that the functional form for non-bonded interactions in
the standard force fields is not sufficiently flexible. This problem
was analyzed further using energy component analysis (specifically,
symmetry-adapted perturbation theory), as discussed below.
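As a rough illustration of the functional form at issue, the sketch
below evaluates a fixed-charge, 12-6 Lennard–Jones plus Coulomb
interaction between two rigid fragments, together with a
sum-of-absolute-errors objective of the kind described above. The
function names, the Lorentz–Berthelot combining rules, the data layout,
and the value of the Coulomb constant are assumptions made for
illustration; they are not taken from Sherrill et al. (2009a) or from
any particular force field, and individual force fields differ in these
details.

```python
import math

# Approximate Coulomb conversion factor in kcal mol^-1 Å e^-2 (assumed value;
# individual force fields use slightly different constants).
COULOMB_KCAL = 332.0637


def pair_energy(r, qi, qj, eps_i, eps_j, sig_i, sig_j):
    """12-6 Lennard-Jones plus Coulomb energy (kcal/mol) for one atom pair."""
    eps = math.sqrt(eps_i * eps_j)   # geometric-mean well depth (Berthelot)
    sig = 0.5 * (sig_i + sig_j)      # arithmetic-mean diameter (Lorentz)
    sr6 = (sig / r) ** 6
    lennard_jones = 4.0 * eps * (sr6 * sr6 - sr6)
    coulomb = COULOMB_KCAL * qi * qj / r
    return lennard_jones + coulomb


def interaction_energy(frag_a, frag_b):
    """Sum pair energies over all intermolecular atom pairs.

    Each atom is a tuple (x, y, z, charge, epsilon, sigma), with
    coordinates and sigma in Å and epsilon in kcal/mol.
    """
    total = 0.0
    for xa, ya, za, qa, ea, sa in frag_a:
        for xb, yb, zb, qb, eb, sb in frag_b:
            r = math.dist((xa, ya, za), (xb, yb, zb))
            total += pair_energy(r, qa, qb, ea, eb, sa, sb)
    return total


def sum_abs_error(model_energies, reference_energies):
    """Objective minimized in a fit: sum of absolute errors along the curves."""
    return sum(abs(m - r) for m, r in zip(model_energies, reference_energies))
```

In a parameter fit of this kind, `sum_abs_error` would be evaluated over
all scan points of all potential curves and minimized with respect to
the charge and Lennard–Jones parameters; no choice of those parameters
changes the functional form itself.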

Figure 3.1 Interaction energies (kcal mol−1) for the parallel-displaced
benzene dimer with a fixed vertical separation of 3.6 Å, plotted against
the horizontal displacement R (Å). Curves shown: estimated CCSD(T)/CBS,
AMBER FF99, CHARMM/OPLS, MM3, and OPT-FF. The OPT-FF results
were obtained by optimizing the unique atomic charge and the Lennard–
Jones parameters to minimize the errors for all benzene dimer potential
curves considered (sandwich, T-shaped, and three vertical separations for
parallel-displaced) in Sherrill et al. (2009a).

3.2.4 Error Analysis for the Indinavir/HIV-II Protease Complex

The study discussed above focused on van der Waals dimers
exhibiting various kinds of prototype intermolecular π interactions.
An alternative strategy is to compare force fields to accurate
quantum data for non-bonded contacts found in the crystal structure
of an actual biomolecular complex. This was the approach of
Faver et al., who studied the protein–ligand complex of HIV-II
protease with indinavir (Faver et al., 2011a). The system was
decomposed into 21 interacting fragments, featuring various types
of intermolecular interactions including H-bonding, CH/π , and
simple van der Waals contacts. For each fragment, interaction
energies were computed with various force fields, semiempirical
methods, density functionals, and wavefunction methods, and the
results were compared to accurate CCSD(T)/CBS benchmark values.
The set of 21 CCSD(T)/CBS interaction energies is dubbed the
HSG database, and it has been subsequently revised (Marshall,
Burns, and Sherrill, 2011) and used in the assessment of various
approximate quantum mechanical methods (Burns et al., 2011;
DiLabio, Koleini, and Torres, 2013; Hostaš, Řezáč, and Hobza, 2013;
Johnson et al., 2013; Marshall and Sherrill, 2011; Parker et al., 2014;
Riley et al., 2012; Torres and DiLabio, 2012). Faver et al. found that
most of the theoretical methods examined gave less strongly bound
fragments compared to the benchmark values. Methods without
any explicit description of London dispersion effects (e.g., Hartree–
Fock, and various semi-empirical models) were among the worst
performers.
A major component of the study was a formal error analysis
of binding energies in protein–ligand complexes, proceeding from
an earlier general discussion by Merz (Merz, 2010). Focusing for
simplicity on the electronic energy contribution to binding, the
study fit Gaussian error probability functions to the error distrib-
utions seen for each theoretical method across the 21 fragments
considered, yielding both the mean error and the variance, which is
related to the width of the Gaussian. Quantum methods (apart from
Hartree–Fock) tended to exhibit a small variance but sometimes had
a large mean error. Force-field methods tended to exhibit modest
mean errors but larger variances (with the exception of AMOEBA
(Ponder et al., 2010), which had a small mean error and a small
variance, competitive with MP2/aug-cc-pVDZ). The mean error was
considered as a systematic error (e.g., a particular method tends
to overbind or underbind) and the variance was associated with
random error (random with respect to the particular non-bonded
contact type and geometry). Given these estimates of systematic
and random error, standard error propagation analysis was used to
show how these errors would propagate into errors for the overall
protein–ligand binding energy; the systematic error grows linearly
with the number of contacts, and the random error is proportional
to the square root of the number of contacts. For the HIV-II
protease/indinavir complex, most methods considered exhibited
overall systematic and random errors that were surprisingly large
compared to the experimental binding affinity. Although it is
possible that systematic errors in other parts of the computation of a
free energy of binding (e.g., solvation contributions) might partially
compensate for the systematic errors in the electronic contribution,
in principle an accurate and robust approach for binding energy
estimates should not rely on this type of error cancellation. The
study concluded by suggesting that improved binding energies
might be obtained if one could correct for systematic errors of a
particular force field through knowledge of reliable Gaussian error
probability functions computed by comparison to accurate quantum
data for many types of interactions in different geometries (possibly
broken down into different error functions for different classes of
interactions). Obviously, studies along these lines would require
a very large database of benchmark-quality interaction energies.
Our groups have begun constructing just such a database, the Bio-
Fragment Database (see below).
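The error-propagation argument above can be sketched in a few lines.
The example below assumes that errors are defined as (method minus
benchmark), that the per-contact errors are independent and drawn from
one common Gaussian, and that the sample mean and standard deviation
stand in for the fitted Gaussian parameters. All function names and
numbers are hypothetical and are not taken from Faver et al. (2011a).

```python
import math
import statistics


def error_model(errors):
    """Return (mean, standard deviation) of per-fragment errors.

    The mean is read as the systematic error per contact and the sample
    standard deviation as the random error per contact.
    """
    mu = statistics.fmean(errors)
    sigma = statistics.stdev(errors)
    return mu, sigma


def propagate(mu, sigma, n_contacts):
    """Propagate per-contact errors to a sum over n_contacts contacts.

    Systematic error grows linearly with the number of contacts; random
    error grows with its square root.
    """
    systematic = n_contacts * mu
    random = math.sqrt(n_contacts) * sigma
    return systematic, random


def corrected_estimate(raw_energy, mu, sigma, n_contacts):
    """Remove the expected systematic error; the random part remains."""
    systematic, random = propagate(mu, sigma, n_contacts)
    return raw_energy - systematic, random
```

For instance, a hypothetical method with a per-contact mean error of
1.0 kcal mol−1 and standard deviation of 0.5 kcal mol−1 would, over 16
contacts, accumulate a systematic error of 16 kcal mol−1 and a random
error of 2 kcal mol−1 under these assumptions. Only the former can be
subtracted out.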

3.2.5 Error Analysis for Ubiquitin Folding


The main conclusions of the protein–ligand study discussed above
(Faver et al., 2011a), namely that surprisingly large systematic and
random errors are associated with force-field or approximate
quantum estimations of interaction energies, and that these errors
grow with the size of the system studied, are rather general and
should also apply to protein folding, crystal polymorph prediction,
etc. Hence, in a subsequent study, we examined error propagation in
protein folding using a similar approach (Faver et al., 2011b). As a
model system, the native fold of ubiquitin was examined, and 42 van
der Waals contacts and 50 H-bonding and/or polar contacts were
identified. The interacting fragments were extracted and capped
with hydrogens to form a test set of 92 van der Waals dimers.
As before, benchmark CCSD(T)/CBS gas-phase interaction energies
were obtained for these fragments, and they were used to evaluate
the systematic and random errors for various force fields and
approximate quantum methods. Results were generally consistent
with the study of the indinavir/HIV-II protease complex: quantum
methods lacking an account of dispersion (Hartree–Fock and the
semi-empirical methods tested except for PM6-DH2 (Korth et al.,
2010)) provided large systematic and random errors, while various
dispersion-corrected density functionals and post-Hartree–Fock
methods performed better. The force fields considered displayed
systematic errors similar to those of some of the better quantum
chemical methods, but larger random errors. Consistent with the
larger interaction energies, errors for polar contacts were generally
larger than errors for non-polar contacts.
Even for a relatively reliable quantum method like B97-D
(Grimme, 2006a) (Grimme’s reparameterization of Becke’s B97
functional (Becke, 1997) with a -D dispersion correction) using
the TZVP polarized triple-ζ basis set, the random error for the
sum of the folding interactions in ubiquitin was estimated to be
8.9 kcal mol−1. By comparison, a much more practical computational
method like the ff99sb force field (Lindorff-Larsen et al., 2010)
provides a random error of 18.4 kcal mol−1. Systematic errors may
be partially controlled through benchmark studies of large numbers
of contacts to obtain reliable error probability functions and mean
errors for different types of contacts; however, the remaining
random errors seem to offer a serious challenge for physics-based
scoring functions, and the errors grow for larger molecules as
discussed above.
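To put these totals on a per-contact scale, one can back out an
effective (root-mean-square) random error per contact. This is only a
rough reading: it assumes that the quoted totals were obtained by
adding independent per-contact random errors in quadrature over all
N = 92 contacts, which may not match the details of the original
analysis (for example, if polar and non-polar contacts were treated
with separate error distributions).

```latex
\sigma_{\mathrm{total}} = \sqrt{N}\,\sigma_{\mathrm{contact}}
\quad\Longrightarrow\quad
\sigma_{\mathrm{contact}} = \frac{\sigma_{\mathrm{total}}}{\sqrt{N}}
% B97-D/TZVP:
\sigma_{\mathrm{contact}} \approx \frac{8.9}{\sqrt{92}} \approx 0.93\ \mathrm{kcal\,mol^{-1}}
% ff99sb:
\sigma_{\mathrm{contact}} \approx \frac{18.4}{\sqrt{92}} \approx 1.92\ \mathrm{kcal\,mol^{-1}}
```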

3.2.6 The Bio-Fragment Database


Existing databases of benchmark-quality interaction energies, like
the S22 (Jurečka et al., 2006; Marshall, Burns, and Sherrill,
2011; Podeszwa, Patkowski, and Szalewicz, 2010; Takatani et al.,
2010), S66 (Řezáč, Riley, and Hobza, 2011a), HBC6 (Marshall,
Burns, and Sherrill, 2011; Thanthiriwatte et al., 2011), and NBC10
(Marshall, Burns, and Sherrill, 2011; Sherrill et al., 2009b) test
sets, are unavoidably biased in their selection of test molecules and
geometries. Hence, our groups have embarked upon a project to
obtain a very large database of non-bonded interacting fragments
taken directly from the Protein Data Bank (PDB). The Merz group has mined
the PDB (Berman, 2000) to obtain representative configurations of
the majority of important protein-protein contacts. In each non-
metallic, non-ligand-complexed protein with acceptable crystallo-
graphic resolution (< 2.0 Å), inter-residue contacts were identified,
truncated at the first sp³-hybridized carbon, and capped with hydrogen
to form interacting fragments of suitable size for high-accuracy
benchmarking (typically 40 atoms or fewer). Near-redundant
configurations were removed, and the contacts were sorted into
three databases: SSI (∼3300 sidechain–sidechain interactions), BBI
(100 backbone–backbone interactions), and BSI (∼2800
backbone–sidechain interactions); together these databases comprise the
BFDb. Future work may examine protein–ligand interactions.
Given the very large number of fragment pairs in the database,
the benchmarking work is using a tiered system in which the
very expensive “gold standard” level
(MP2/CBS + $\delta_{\mathrm{MP2}}^{\mathrm{CCSD(T)}}$/aug-cc-pVTZ)
is used only for a limited number of cases, while most
benchmarks are performed using a more affordable but only slightly
less reliable “silver standard” level, namely, dispersion-weighted
explicitly-correlated coupled-cluster theory (Marshall and Sherrill,
2011), DW-CCSD(T)-F12, with an aug-cc-pVDZ basis set (explicitly
correlated methods are capable of good performance even using
modest basis sets). Because the thousands of interacting fragments
in the BFDb are to be studied by a large number of methods
(force fields, density functionals, and wavefunction methods), data
management becomes a serious issue. Our groups are jointly
developing a web portal to make all this data available and to analyze
it. Error statistics for approximate methods can be displayed in
histogram form and in the form of Gaussian error distributions, and
the tables of error data contain hyperlinks to 3D models of particular
interacting fragments. The analysis may be filtered according to
binding motif, fragment identity, etc. We hope that this large
amount of data will be beneficial in quantifying and understanding
the intermolecular contacts that govern protein folding and drug
binding.
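The composite “gold standard” estimate mentioned above adds a
coupled-cluster correction, evaluated in a modest basis set, to an MP2
energy at the complete-basis-set limit. A minimal sketch follows. The
two-point X^-3 extrapolation of the correlation energy is a widely used
choice but is an assumption here, since the text does not state how the
MP2/CBS value is obtained; the function names and inputs are
hypothetical.

```python
def cbs_two_point(e_corr_small, e_corr_large, x_small, x_large):
    """Two-point X^-3 extrapolation of a correlation energy.

    x_small and x_large are basis-set cardinal numbers (e.g., 3 and 4 for
    triple- and quadruple-zeta). Assumes E(X) = E_CBS + A / X**3.
    """
    xs3 = x_small ** 3
    xl3 = x_large ** 3
    return (xl3 * e_corr_large - xs3 * e_corr_small) / (xl3 - xs3)


def delta_ccsdt(e_ccsdt_small, e_mp2_small):
    """Coupled-cluster correction: CCSD(T) minus MP2 in the same basis."""
    return e_ccsdt_small - e_mp2_small


def gold_standard(e_mp2_cbs, e_ccsdt_small, e_mp2_small):
    """Estimated CCSD(T)/CBS interaction energy from the composite scheme."""
    return e_mp2_cbs + delta_ccsdt(e_ccsdt_small, e_mp2_small)
```

The scheme relies on the CCSD(T) minus MP2 difference converging with
basis-set size much faster than either energy does on its own.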
Figure 3.2 shows an error analysis for the General AMBER force
field (GAFF) (Wang et al., 2004), the Austin Model 1 (AM1) semi-
empirical method (Dewar et al., 1985), and the B3LYP density
functional approximation (Stephens et al., 1994) corrected for dis-
persion using Grimme’s third generation semi-empirical correction
(Grimme et al., 2010) (B3LYP-D3) in an aug-cc-pVDZ basis set. For
each method, we show a 20 × 20 grid, representing all possible
combinations of interacting side-chains. Because of the way the
database is constructed, interactions involving glycine are not
included: its side-chain consists of only a single H atom.
Darker regions on the plot represent larger errors vs. the CCSD(T)
benchmarks. We note that GAFF exhibits substantial errors for
charged or polar sidechains interacting with each other, or indeed
for any type of contact involving a negatively charged sidechain. On
the other hand, nonpolar–nonpolar contacts are reasonably good for
GAFF. AM1 exhibits large errors across the board, and despite being
an (approximate) quantum mechanical method, does not perform
nearly as well as GAFF. On the other hand, B3LYP-D3 performs much
better than the other two methods, with much smaller errors (most
contacts are slightly overbound). This is not surprising, given that
B3LYP is a very popular and often reliable quantum mechanical
approach; it is also much more computationally costly than GAFF or
AM1. Not shown are results from B3LYP without the -D3 dispersion
correction; those results are substantially worse and reinforce the
growing consensus that DFT studies of non-covalent interactions
need to use some kind of dispersion correction or else a newer
functional meant to do a better job at modeling London dispersion
interactions.

Figure 3.2 Preliminary results for errors vs. CCSD(T) benchmarks for
the sidechain–sidechain interaction (SSI) database of more than 3300
contacts from the Protein Data Bank. Results are grouped according to
sidechain identity; within each square on the grid are smaller, shaded
squares representing individual contacts within the database (grid locations
with more contacts represented contain smaller squares). Darker shading
represents larger errors. Glycine is not represented in the analysis (see text).
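A grid of the kind shown in Fig. 3.2 can be summarized numerically by
grouping errors according to the pair of interacting residues. The
sketch below computes a mean absolute error per residue pair, which is
a coarser summary than the figure (the figure shades individual
contacts). The record layout and function name are hypothetical and do
not reflect the actual BFDb data format.

```python
from collections import defaultdict


def pair_mae(records):
    """Mean absolute error per unordered residue pair.

    records: iterable of (res_a, res_b, e_method, e_benchmark) tuples,
    with energies in kcal/mol.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for res_a, res_b, e_method, e_benchmark in records:
        key = tuple(sorted((res_a, res_b)))   # ARG/GLU is the same as GLU/ARG
        acc = sums[key]
        acc[0] += abs(e_method - e_benchmark)
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

The resulting dictionary can be filtered by residue class (charged,
polar, non-polar) to reproduce the kind of comparison made in the text.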

3.3 Understanding and Quantifying Intermolecular Interactions using Symmetry-Adapted Perturbation Theory

There are various forms of energy component analysis that can
break down non-bonded contacts in terms of their fundamental
physical components, including the Kitaura–Morokuma energy
decomposition analysis (EDA) (Kitaura and Morokuma, 1976; Mo-
rokuma, 1971), the reduced variational space self-consistent-field
(RVS SCF) method (Bagus, Hermann, and Bauschlicher, 1984; Chen
and Gordon, 1996; Stevens and Fink, 1987), and symmetry-adapted
perturbation theory (SAPT) (Jeziorski, Moszynski, and Szalewicz,
1994; Szalewicz, 2012; Williams et al., 1993). The latter begins with
unperturbed wavefunctions for the monomers and then treats the
intermolecular interaction using perturbation theory. Corrections
are applied to account for the fact that the total wavefunction must
be anti-symmetric with respect to the interchange of (fermionic)
electron coordinates: this is the “symmetry-adapted” part of the
theory. Standard wavefunction-based SAPT applies many-body
perturbation theory (and for some terms, coupled-cluster theory)
to account for corrections due to electron correlation; hence, the
approach is a triple perturbation theory (the perturbations being
$\hat{V}$, the intermolecular operator; $\hat{W}_A$, the electron correlation
perturbation for monomer A; and $\hat{W}_B$, the analogous term for monomer
B). In recent years, an alternative approach termed SAPT(DFT)
(Misquitta et al., 2005) or DFT-SAPT (Heßelmann, Jansen, and
Schütz, 2005) uses DFT for the description of the monomers
(including intramolecular correlation), and dispersion energies are
computed using the frequency-dependent density susceptibility
function. Review articles on SAPT theory and algorithms have been
recently published (Hohenstein and Sherrill, 2012; Szalewicz, 2012).
The SAPT approach leads straightforwardly to electrostatic,
London dispersion, induction/polarization, and exchange-repulsion
terms. Like any perturbation theory, wavefunction-based SAPT can
be carried out to higher and higher orders. The lowest meaningful
order, SAPT0, starts from Hartree–Fock monomer wavefunctions
and treats the intermolecular interaction through second-order;
intramolecular correlation is neglected (treated through “zeroth”
order), and hence the 0 in the name SAPT0. By favorable error
cancellation, SAPT0 can give fairly reliable results (Hohenstein
and Sherrill, 2010) when used in conjunction with a jun-cc-pVDZ
basis set (this is Truhlar’s “calendar” naming scheme for Dunning’s
correlation-consistent basis sets (Papajak and Truhlar, 2011),
denoting cc-pVDZ on H atoms and aug-cc-pVDZ on other atoms,
but including only the s and p diffuse shells). The computational
cost of SAPT0 scales formally as $O(n^5)$ if no local approximations
are employed. However, due to the introduction of density-fitting,
Laplace energy denominators, and other algorithmic techniques
(Hohenstein et al., 2011b; Hohenstein and Sherrill, 2010), SAPT0 is
capable of handling systems with around 300 atoms or 4000 basis
functions on a single workstation computer.
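The bookkeeping behind a SAPT0 interaction energy can be sketched as
follows. The grouping of individual terms into the four physical
components follows a commonly used convention (with the δHF term
assigned to induction) and is stated here as an assumption; the class
name, field names, and numerical values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sapt0Terms:
    """Individual SAPT0 terms in kcal/mol (hypothetical container)."""
    elst10: float           # first-order electrostatics
    exch10: float           # first-order exchange
    ind20_resp: float       # second-order induction (response)
    exch_ind20_resp: float  # second-order exchange-induction (response)
    delta_hf: float         # Hartree-Fock correction for higher-order induction
    disp20: float           # second-order dispersion
    exch_disp20: float      # second-order exchange-dispersion

    def components(self):
        """Group the terms into four physically motivated components."""
        return {
            "electrostatics": self.elst10,
            "exchange": self.exch10,
            "induction": self.ind20_resp + self.exch_ind20_resp + self.delta_hf,
            "dispersion": self.disp20 + self.exch_disp20,
        }

    def total(self):
        """Total SAPT0 interaction energy as the sum of the components."""
        return sum(self.components().values())
```

With placeholder values such as
`Sapt0Terms(-2.0, 5.0, -1.5, 1.0, -0.3, -6.0, 0.6)`, the components are
−2.0, 5.0, −0.8, and −5.4 kcal mol−1, summing to −3.2 kcal mol−1.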
The next level in the perturbation theory, SAPT2, adds cor-
rections due to MP2-like intramolecular electron correlation,
with an increase of computational cost to $O(n^6)$. The following
improvement, which we have called SAPT2+ because it adds the
remaining dispersion contributions first- and second-order in $\hat{W}$
that are missing from SAPT2 (namely, $E_{\mathrm{disp}}^{(21)}$ and
$E_{\mathrm{disp}}^{(22)}$), scales as $O(n^7)$ because it contains a
term analogous to the triples term
in CCSD(T). This approach can be subsequently improved through
additional terms to yield what Szalewicz and co-workers have called
“full SAPT,” or with additional third-order terms to yield what we
have termed SAPT2+(3) and SAPT2+3 (Hohenstein and Sherrill,
2012). All of these methods of SAPT2+-quality or above provide
interaction energies that begin to approach CCSD(T) in quality
[while simultaneously providing interaction energy components,
unlike CCSD(T)].
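The practical meaning of these formal scalings is easy to quantify.
The sketch below uses only the exponents quoted above; it ignores
prefactors and the algorithmic improvements (density fitting and so on)
that matter a great deal in practice, so it should be read as an
order-of-magnitude guide only. The names are hypothetical.

```python
# Formal scaling exponents quoted in the text.
SCALING_EXPONENT = {"SAPT0": 5, "SAPT2": 6, "SAPT2+": 7}


def relative_cost(method, size_ratio):
    """Factor by which the formal cost grows when the system size
    (e.g., number of basis functions) is multiplied by size_ratio."""
    return size_ratio ** SCALING_EXPONENT[method]
```

Doubling the system size thus multiplies the formal cost by 32 for
SAPT0, 64 for SAPT2, and 128 for SAPT2+.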
We recently completed a thorough systematic study of the
accuracy of these various flavors of SAPT (including DFT-SAPT), in
conjunction with several choices of basis set, comparing to gold-
standard CCSD(T)/CBS values (Parker et al., 2014). Selected results
are presented in Fig. 3.3. Perhaps the most surprising result of the
study was that H-bonded systems remain a challenge even for some
of the more elaborate SAPT methods, with mean absolute errors
(MAE’s) of more than 1 kcal mol−1 for some of the SAPT methods
including triple excitations. This disappointing performance can be
partially understood by considering that the H-bonding interactions
have much larger interaction energies (so that the errors are not
quite as bad on a relative scale), and also that H-bonding leads to
close intermolecular contacts, where the perturbation theory that
underlies SAPT can begin to break down. Another explanation is that
the SAPT methods may require larger basis sets to achieve smaller
errors: indeed, even CCSD(T)/aug-cc-pVTZ exhibits an MAE of about
0.7 kcal mol−1 for the H-bonding cases as compared to CCSD(T) in
the complete-basis-set limit.

Figure 3.3 Mean absolute errors (MAE’s) for various SAPT methods of
increasing complexity in conjunction with the aug-cc-pVDZ (aDZ) and
aug-cc-pVTZ (aTZ) basis sets, evaluated against estimated CCSD(T)/CBS
interaction energies from the S22B, HSG-A, NBC10A, and HBC6A databases
(Marshall, Burns, and Sherrill, 2011). Data from Parker et al. (2014).
sSAPT0 refers to an exchange-scaled variant of SAPT0. For SAPT0 and
sSAPT0, the jun-cc-pVDZ (jaDZ) basis set is substituted for aug-cc-pVDZ
because it exhibits better cancellation of errors. Wide bars represent MAE’s
averaged over all four databases; within each bar are three smaller bars,
representing (from left to right) averages over H-bonding, mixed-influence,
and dispersion-dominated interactions.

We attribute part of the problem for H-bonded systems to
breakdowns in the so-called “$S^2$” approximation, which is used for
all terms except for the leading exchange term, $E_{\mathrm{exch}}^{(10)}$.
Hence, all results in our study (including those in Fig. 3.3) apply a scaling of
all exchange-repulsion terms by the ratio
$E_{\mathrm{exch}}^{(10)}/E_{\mathrm{exch}}^{(10)}(S^2)$. For the
case of SAPT0, we found substantially better results if we scaled
the other exchange-repulsion terms by the cube of this ratio, which
we denoted scaled-SAPT0, or sSAPT0. Although using an exponent
other than 1 for this scaling ratio lacks any theoretical justification, it
has some precedent in the literature (Lao and Herbert, 2012) and it
may be useful on a practical basis to provide a relatively inexpensive
SAPT method with good error statistics. Indeed, sSAPT0/jun-cc-
pVDZ is superior on average to many of the more elaborate SAPT
methods considered.
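The exchange scaling described above amounts to multiplying the
exchange-containing terms that were computed within the S² approximation
by a power of the ratio between the full and S²-approximated first-order
exchange energies. The sketch below uses an exponent of 1 for the
standard scaling and 3 for the sSAPT0 variant, as stated in the text.
Which terms are passed in, the function names, and the numbers in the
usage note are assumptions for illustration.

```python
def exchange_scaling_factor(exch10, exch10_s2, alpha=1.0):
    """Ratio of full to S^2-approximated first-order exchange, raised to alpha."""
    return (exch10 / exch10_s2) ** alpha


def scale_exchange_terms(terms_s2, exch10, exch10_s2, alpha=1.0):
    """Scale exchange-containing terms computed within the S^2 approximation.

    terms_s2: dict mapping term names to energies (kcal/mol).
    alpha = 1 gives the standard scaling; alpha = 3 gives the sSAPT0 variant.
    """
    factor = exchange_scaling_factor(exch10, exch10_s2, alpha)
    return {name: factor * value for name, value in terms_s2.items()}
```

For a hypothetical contact with a full first-order exchange of
5.5 kcal mol−1 and an S² value of 5.0 kcal mol−1, the ratio is 1.1, so
the standard scaling raises the remaining exchange terms by 10% and the
cubed ratio raises them by about 33%. The effect therefore grows
quickly at short range, where the S² approximation is least reliable.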
A disappointing result of our systematic study was that DFT-
SAPT, like the other types of SAPT considered, had difficulty with
H-bonded systems, especially those involving double H-bonds like
those in the HBC6 database (Thanthiriwatte et al., 2011). The MAE
over our test sets was nearly 2 kcal mol−1 for an aug-cc-pVDZ
basis, and about 1 kcal mol−1 for an aug-cc-pVTZ basis. Such results
are hardly better on average than those for SAPT0/jun-cc-pVDZ.
This poor performance was not expected, given the very good
results reported in the literature for DFT-SAPT [and SAPT(DFT)] for
dispersion-dominated systems. Unfortunately the exchange scaling
trick that helps sSAPT0 does not help DFT-SAPT because the
corrections are in the wrong direction. Our findings suggest that
SAPT(DFT) may not be the best approach for parameterizing force
fields, at least for H-bonded systems.
Better results can be had from (more computationally expensive)
higher-order wavefunction-based SAPT, although there is a delicate
interplay between the errors from remaining basis set incomplete-
ness, missing higher-order terms, etc. Depending on exactly what
terms are included, some of the higher-order SAPT methods may
work better on average in the smaller aug-cc-pVDZ basis set (e.g.,
SAPT2+) than the larger aug-cc-pVTZ basis set. Some of the best
performers are SAPT2+/aug-cc-pVDZ, SAPT2+3/aug-cc-pVDZ, and
SAPT2+(3)(CCD)/aug-cc-pVTZ. In an effort to obtain even better
agreement with CCSD(T)/CBS benchmarks, we also considered
an approach that mixes SAPT analysis with supermolecular MP2,
analogous to the δHF correction often used in SAPT computations.
The disadvantage of this approach is that while δHF can be
reasonably well ascribed to induction, it is less clear how to classify
the δMP2 term. Nevertheless, the SAPT2+(3)δMP2/aug-cc-pVTZ
results were the best on average in our systematic study.
To summarize, the primary findings of our systematic study
(Parker et al., 2014) are as follows: (1) low-order SAPT0 provides
fairly reliable interaction energies when used in conjunction with
the jun-cc-pVDZ basis set, especially in the sSAPT0 variant which
uses exchange scaling; (2) DFT-SAPT is quite reliable on average
but exhibits errors for H-bonding systems that are perhaps larger
than widely appreciated; (3) continued addition of higher-order
terms in SAPT does not necessarily result in a smooth decrease
of errors, probably due to complex error cancellation between
remaining neglected terms and basis set incompleteness; (4)
nevertheless, some of the higher-order SAPT methods provide very
accurate interaction energies, with the best method considered
being SAPT2+(3)δMP2/aug-cc-pVTZ, with a mean absolute error
over all databases of only 0.15 kcal mol−1.
Numerous researchers have profitably used wavefunction-based
SAPT and DFT-based SAPT to better understand non-covalent
interactions. SAPT analysis was critical in understanding the
unexpected finding that all types of substituents, whether electron-
donating or electron-withdrawing, lead to enhanced π-stacking
in gas-phase interactions between a benzene and a substituted
benzene (Sinnokrot and Sherrill, 2003, 2004). Although the Hunter-
Sanders rules (Hunter, 1993; Hunter and Sanders, 1990) state
that such substituent effects should be governed by electrostatics,
SAPT analysis reveals the importance of differential dispersion
effects (Hohenstein, Duan, and Sherrill, 2011a; Ringer et al., 2006;
Sinnokrot and Sherrill, 2004). A recent article (Sherrill, 2013)
reviews applications of SAPT to better understand intermolecular
π-interactions. Very recent work (below) suggests that an atom-
based partitioning of SAPT interaction energies (A-SAPT) provides
even richer insight.

3.3.1 Using SAPT to Investigate Challenges for Current Force Fields
As discussed above, significant discrepancies were observed be-
tween quantum benchmarks and force fields for non-bonded
interactions in the benzene dimer (Sherrill et al., 2009a). Analysis of
the discrepancies was greatly aided by the use of energy component
analysis, specifically the SAPT method. A detailed analysis of the
parallel-displaced benzene dimer at a fixed vertical distance of 3.4 Å
is shown in Fig. 3.4. As seen from the figure, the London dispersion
interaction computed by the force field through the attractive part
of the Lennard–Jones potential is fairly accurate compared to the
quantum SAPT results. Moreover, in this system, SAPT shows that
polarization effects are not very large, so it should not be necessary
to use a polarizable force field. Instead, the major discrepancies
are in the repulsion term and the electrostatic term. The force
field repulsion is much too weak; this could be mostly corrected
using revised parameters (although it is not clear if the shape of
the repulsion curve could be matched closely). However, the more
troubling problem is the electrostatic term, which in the force field
has the wrong sign. Indeed, it is clear that in the force field, the
electrostatic term must be repulsive at the sandwich geometry
(horizontal displacement of zero), because one benzene is
directly aligned on top of the other, and the closest contacts are all
between identical charges with the same sign.

Figure 3.4 Comparison of CHARMM energy components vs. quantum
mechanical values from SAPT2/jun-cc-pVDZ for the parallel-displaced
benzene dimer at various horizontal displacements R (Å) and a fixed vertical
displacement of 3.4 Å (data from Sherrill et al. (2009a)). Solid lines are SAPT
data, and dashed lines are CHARMM data. Components plotted (kcal mol−1):
electrostatics, repulsion, induction, and dispersion.

However, perhaps surprisingly, the quantum mechanical electrostatic
energy at this same configuration is attractive (by a few kcal
mol−1). While this is impossible to rationalize using any picture
based on atom-centered charges, it is a natural consequence of the
fact that the electrons are smeared out in space according to the laws
of quantum mechanics. If the two monomers are close enough for
their orbitals to overlap, then even simple estimates show that the
overall intermolecular electrostatic term is attractive because the
electron-electron repulsion is smaller (due to the diffuse nature of
the electrons) than the nuclear-nuclear repulsion, and their sum is
smaller in magnitude than the electron-nuclear attraction. This
result is well known in the theory of intermolecular interactions
(Freitag et al., 2000; Kairys and Jensen, 1999; Ng, Meath, and Allnatt,
1976; Stone, 1996; Wheatley and Mitchell, 1994) and is commonly
termed “charge penetration.” What is perhaps less well appreciated
is that these charge penetration terms are quite significant in
π–π interactions; because π surfaces are flat, π stacking can lead to
substantial orbital overlap and hence quite large charge penetration
effects. Indeed, π -stacking may be the ideal motif for magnifying
these charge penetration effects.
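A simple estimate of the kind alluded to above can be made for two
hydrogen atoms with frozen, unperturbed 1s densities (unit Slater
exponent, atomic units), for which the Coulomb terms have textbook
closed forms. This model system is chosen here purely for illustration
and is not taken from the chapter. Because each atom is neutral and
spherical, any point-charge or point-multipole model gives exactly zero
electrostatic interaction, so the entire result below is charge
penetration.

```python
import math


def nuclear_nuclear(r):
    """Repulsion between the two nuclei (hartree), r in bohr."""
    return 1.0 / r


def electron_nuclear(r):
    """Attraction between one atom's 1s cloud and the other nucleus."""
    return -(1.0 / r - math.exp(-2.0 * r) * (1.0 + 1.0 / r))


def electron_electron(r):
    """Repulsion between the two 1s electron clouds."""
    return 1.0 / r - math.exp(-2.0 * r) * (
        1.0 / r + 11.0 / 8.0 + 3.0 * r / 4.0 + r * r / 6.0
    )


def electrostatic_energy(r):
    """First-order electrostatic energy of the two frozen atoms (hartree).

    There are two electron-nuclear terms: cloud A with nucleus B and
    cloud B with nucleus A.
    """
    return nuclear_nuclear(r) + 2.0 * electron_nuclear(r) + electron_electron(r)
```

At a separation of 3 bohr, for example, the electron-electron repulsion
is smaller than the nuclear-nuclear repulsion and the total is negative
(attractive), decaying exponentially with distance. In this model the
term becomes repulsive only at very short separations, below roughly
1.4 bohr, where nuclear repulsion dominates.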
These charge penetration effects explain the otherwise coun-
terintuitive finding by Lewis and co-workers (Watt et al., 2011)
that both electron-withdrawing and electron-donating substituents
enhance the electrostatic interaction between a benzene and a
substituted benzene in a sandwich configuration (a result even more
surprising than the earlier discovery (Sinnokrot and Sherrill, 2003,
2004) that all substituents increase the overall binding in these
systems). Regardless of the nature of the substituent, essentially all
substituents will have increased dispersion interactions compared
to a hydrogen. This increased dispersion interaction leads to
tighter binding (and closer intermolecular distances), and as the
distance decreases, the monomer orbitals overlap more, causing
more favorable charge penetration (electrostatic) terms.

Figure 3.5 Charge penetration in base stacking for the GG:CC base pair step
(at 35° Twist and 0.28 Å Slide) as a function of Rise, the vertical separation
between the base pairs. The difference between the distributed multipole
analysis (DMA) value for electrostatics and the quantum mechanical
symmetry-adapted perturbation theory (SAPT0/jun-cc-pVDZ) value for
electrostatics may be taken as a measure of the charge penetration term.
The DMA analysis includes terms up through order 5 (32-pole-charge,
hexadecapole-dipole, octopole-quadrupole). Charge penetration rapidly
increases in magnitude for smaller intermolecular distances.

The importance of charge penetration effects in π-stacking
interactions is by no means limited to benzene dimers. Base stacking
in DNA and RNA features these same effects, and to an even greater
extent. The first systematic energy component analysis of π-stacking
in DNA and RNA revealed that, at typical values of Rise and nearly
all values of Twist, the electrostatic component of π-stacking is
nearly always attractive (Parker et al., 2013). This is surprising at
first when one considers that for small values of Twist, base pair
steps featuring two identical base pairs should have nearly aligned
dipoles and hence unfavorable electrostatics. Such contributions
are overcome by the favorable charge penetration terms that occur
due to orbital overlap at typical base stacking distances. Figure
3.5 illustrates this effect for two stacked G:C base pairs (at typical
experimental values of Twist and Slide). At large values of Rise,
the electrostatics are unfavorable due to nearly aligned dipoles.
However, as Rise decreases, the electrostatics (as reliably computed
by SAPT0) rapidly become favorable as orbitals begin to overlap. For
comparison, the figure also illustrates the electrostatic interaction as
computed with a distributed multipole analysis (DMA) (Stone, 1981;
Stone and Alderton, 1985) through order-5 terms (e.g., octopole-
quadrupole). Compared to the simple treatment of electrostatics in
most force fields via atom-centered charges, DMA is a very elaborate
and accurate model. Indeed, at long range, it matches very well to
the more rigorous SAPT values for electrostatics. However, at short
distances, any multipole model will break down, and we see rapid
divergence between SAPT and DMA for values of Rise less than
4 Å. Unfortunately, the biologically relevant values of Rise, around
3.3–3.4 Å, are well within this distance. This suggests that even
seemingly quite advanced multipole-based models of electrostatics
may not be sufficient to accurately model electrostatics in base
stacking or other examples of π-stacking, where the geometry of the
system affords a substantial degree of orbital overlap.
Given the above results, one may well wonder how standard
force field models manage to give reasonable results despite not
including explicit charge penetration terms. The answer is that
one can compensate for the lack of attractive charge penetration
terms by decreasing the size of the repulsion terms; indeed, both
have an exponential dependence on the distance between atoms
(Stone, 1996). On the other hand, the exponential behavior is not
precisely the same (Murrell and Teixeira, 1970), and hence there
are limits to the accuracy of folding in charge penetration with
exchange-repulsion (especially when using non-exponential forms
of exchange-repulsion). The poor performance of force fields for
the parallel displaced benzene dimer discussed above is just one
example of this. Hence, more reliable force fields may need a
more sophisticated treatment of charge penetration electrostatics.
Perhaps the simplest way to account for charge penetration is to
damp the electrostatic interactions between the electrons (Freitag
et al., 2000; Kairys and Jensen, 1999; Piquemal, Gresh, and Giessner-
Prettre, 2003; Slipchenko and Gordon, 2009; Stone, 1996, 2011)
or even the charges themselves (Cisneros et al., 2008; Wang
and Truhlar, 2010). Alternatively, one may abandon point-charge
models or even multipole models and represent the electrons by a
continuous charge distribution (e.g., a Slater or Gaussian function).
Such approaches are discussed below.
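
To make the idea of charge penetration concrete, the following minimal sketch compares smeared-charge and point-charge electrostatics for two identical model atoms. It is an illustration only, not any of the cited damping schemes: each atom is assumed to be a point nucleus plus a single spherical Slater-type electron cloud, both clouds share one exponent, and the function names and parameter values are hypothetical. Atomic units are used throughout.

```python
import math

def nucleus_cloud(r, alpha):
    """Coulomb integral between a unit point charge and a unit Slater
    density alpha**3 / (8*pi) * exp(-alpha*r), at separation r."""
    return (1.0 - (1.0 + 0.5 * alpha * r) * math.exp(-alpha * r)) / r

def cloud_cloud(r, alpha):
    """Coulomb integral between two unit Slater densities that share
    the same exponent alpha, at separation r."""
    x = alpha * r
    poly = 1.0 + 11.0 * x / 16.0 + 3.0 * x**2 / 16.0 + x**3 / 48.0
    return (1.0 - poly * math.exp(-x)) / r

def smeared_electrostatics(r, z_nuc, n_elec, alpha):
    """Electrostatic energy of two identical model atoms, each a point
    nucleus (+z_nuc) plus a Slater cloud holding n_elec electrons."""
    nuc_nuc = z_nuc * z_nuc / r
    nuc_elec = -2.0 * z_nuc * n_elec * nucleus_cloud(r, alpha)
    elec_elec = n_elec * n_elec * cloud_cloud(r, alpha)
    return nuc_nuc + nuc_elec + elec_elec

def point_charge_electrostatics(r, z_nuc, n_elec):
    """The same two atoms collapsed to net point charges."""
    q = z_nuc - n_elec
    return q * q / r

def charge_penetration(r, z_nuc, n_elec, alpha):
    """Smeared-charge minus point-charge electrostatics."""
    return (smeared_electrostatics(r, z_nuc, n_elec, alpha)
            - point_charge_electrostatics(r, z_nuc, n_elec))

# Two neutral, hydrogen-like model atoms (assumed values).
e_pen = charge_penetration(2.0, z_nuc=1.0, n_elec=1.0, alpha=2.0)
```

For these assumed parameters the point-charge model gives exactly zero, whereas the smeared model gives about −0.019 hartree at 2 bohr, an attraction that decays exponentially with distance. This mirrors the qualitative behavior described above, although real systems require anisotropic densities and different exponents on each site.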

3.3.2 Atomic-Partitioned Symmetry-Adapted Perturbation Theory

An energy component analysis like SAPT provides insight into the
character of intermolecular interactions by providing a breakdown
of the interaction energy in terms of fundamental physical components
like electrostatics, exchange-repulsion, London dispersion
terms, and induction/polarization. Standard QM methods for
energy component analysis do not, unfortunately, provide a further
breakdown of what particular interacting atoms or groups are the
most important contributors to each of the energy components.
However, it is clear that such knowledge would be very useful. It
might be helpful to know that a particular drug binds to an active site
mainly due to electrostatic interactions, for example, but it would be
even more helpful to know which contacts contribute the most.
This is the motivation for our recent development of an
“atomic-partitioned” version of SAPT we label A-SAPT (Parrish and
Sherrill, 2014).
In A-SAPT, the final energy component expressions are rewritten
in terms of localized orbitals and/or atomic contributions. Terms
involving a local orbital are then assigned to the constituent
atoms according to weights determined by atomic densities. We
use Iterative Stockholder Analysis (ISA) charges (Lillestolen and
Wheatley, 2008, 2009), although the particular choice of charge
model is not an essential part of the method.
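
The bookkeeping behind such a partition can be sketched as follows. This is a schematic only, assuming that an energy term has already been expressed as a matrix over pairs of localized orbitals (one index per monomer) and that each orbital carries a set of atomic weights summing to one. The actual A-SAPT working equations treat each SAPT term separately (Parrish and Sherrill, 2014); the function name and all numbers below are made up for illustration.

```python
import numpy as np

def partition_to_atom_pairs(e_orb, w_a, w_b):
    """Distribute orbital-pair energies onto atom pairs.

    e_orb[a, b] : energy term for local orbital a (monomer A) and b (monomer B)
    w_a[A, a]   : fraction of orbital a assigned to atom A (columns sum to 1)
    w_b[B, b]   : fraction of orbital b assigned to atom B (columns sum to 1)
    Returns e_atom[A, B], whose entries sum to the total of e_orb.
    """
    e_orb = np.asarray(e_orb, dtype=float)
    w_a = np.asarray(w_a, dtype=float)
    w_b = np.asarray(w_b, dtype=float)
    if not (np.allclose(w_a.sum(axis=0), 1.0) and np.allclose(w_b.sum(axis=0), 1.0)):
        raise ValueError("atomic weights must sum to 1 for every orbital")
    return w_a @ e_orb @ w_b.T

# Hypothetical example: 2 local orbitals on each monomer,
# 3 atoms on monomer A and 2 atoms on monomer B.
e_orb = np.array([[-1.2, -0.3],
                  [-0.4, -0.1]])
w_a = np.array([[0.7, 0.0],
                [0.3, 0.5],
                [0.0, 0.5]])
w_b = np.array([[1.0, 0.4],
                [0.0, 0.6]])
e_atom = partition_to_atom_pairs(e_orb, w_a, w_b)
```

Because the weights for each orbital sum to one, the atom-pair contributions add up to the original energy component, so the partition redistributes the term without changing its total.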
Figure 3.6 illustrates an A-SAPT analysis of the polarization of
a benzene by a Na+ cation in the same plane. This is an attractive
interaction that stabilizes the complex. The darker colored regions
of the benzene represent portions of the molecule that are more
important to the induction term in a SAPT0 computation. Prior to the
A-SAPT analysis, we had expected that polarization of the π-cloud
might be the primary contributor to the large, stabilizing induction
energy. Instead, A-SAPT demonstrates that while C–C π-electrons
are strong contributors, there are also important contributions from
nearby C–C σ and C–H σ bonds.
In addition to providing insight into non-bonded contacts, A-
SAPT may also provide opportunities for easier parameterization
of ab initio derived force fields. Because A-SAPT provides energy
component contributions for each interatomic pair, one may fit the
pairwise terms in the force field directly, rather than indirectly
by matching only the overall energy component summed over all
contributions. This should make fitting more straightforward and
robust. Although we have only developed A-SAPT at the most basic
SAPT0 level so far, the general approach should work with more
reliable levels of SAPT such as SAPT(DFT) or higher-order many-body
SAPT.

Figure 3.6 A-SAPT0/jun-cc-pVDZ voxel visualization for the induction term
involving the polarization of benzene by a Na+ cation in the same plane.
Darkly shaded areas correspond to strong contributions to the attractive
induction energy.

3.4 Force Fields Fit to High-Quality Quantum Mechanical Data

As discussed above, a challenge for standard force fields is that
their functional forms may not be sufficiently flexible to reliably
describe non-bonded contacts in a wide variety of situations; the
poor performance of popular force fields in describing the sliding
of one benzene over another is one example (Sherrill et al., 2009a).
This section introduces a few of the various efforts to develop next-
generation force fields using more flexible functional forms and fit
to high-quality quantum mechanical data (i.e., SAPT, CCSD(T)/CBS,
or other approaches that surpass MP2 or DFT in quality). This
has been an active area of research for the past several years, and
hence it would be too difficult to give an exhaustive review here.
Nevertheless, we will attempt to highlight some of the efforts in this
area, particularly those utilizing SAPT. There are, of course, other
approaches, including the computation of all required parameters
directly from monomer properties, as in the effective fragment
potential (EFP) method (Ghosh et al., 2010; Gordon et al., 2001)
(the energy components of EFP have been compared to the energy
components of SAPT for the S22 test set (Flick et al., 2012)).
Work prior to 2000 on the general topic of obtaining force-field
parameters from quantum chemistry computations is summarized
in a review (Engkvist, Åstrand, and Karlström, 2000).
Among the notable early works on fitting ab initio data to
flexible functional forms is the SIBFA (sum of interactions between
fragments ab initio computed) approach (Gresh, 1997; Gresh et al.,
2007; Gresh, Claverie, and Pullman, 1979, 1984). SIBFA represents
interaction energies (intermolecular or intramolecular) in terms of
five components (electrostatics, exchange-repulsion, polarization,
charge transfer, and dispersion) and stresses the fitting of each
of these terms separately (Gresh et al., 2007). Electrostatics are
described by multipole expansions up to quadrupole terms, with the
expansions done at atom and bond centers. Newer versions of SIBFA
add corrections for short-range charge penetration terms (Pique-
mal, Gresh, and Giessner-Prettre, 2003). Distributed anisotropic
polarizability tensors are used to determine the polarization
energy. Exchange repulsion is determined between bonding and
lone-pair orbitals using overlap formulas and bond occupation
numbers. Dispersion energies are determined using damped terms
proportional to R⁻⁶, R⁻⁸, and R⁻¹⁰. Polarization and charge transfer
terms are fit to energy decomposition methods such as the Reduced
Variational Space (RVS) method (Stevens and Fink, 1987), and
dispersion terms for H-bonded systems were calibrated against
SAPT (Langlet et al., 2003). Some of the more recent methods
discussed below bear a resemblance to SIBFA, although in general
they tend to have somewhat simpler functional forms.
Fitting potentials to individual energy components computed
quantum mechanically was also the strategy behind the NEMO
approach of Karlström and co-workers (Wallqvist, Ahlström, and
Karlström, 1990, 1991); an energy decomposition of Hartree–
Fock interaction energies was originally used to obtain parameters.
NEMO used point charges (not necessarily restricted to atomic
centers), atomic dipole polarizabilities, damped R⁻⁶ terms for
dispersion, and a short-range exponential representing the sum of
exchange-repulsion and charge-penetration. NEMO is reviewed in
Ref. (Engkvist, Åstrand, and Karlström, 2000).

3.4.1 Force Fields Fit to SAPT


A number of works have fit specialty force fields to SAPT data
for particular systems by assuming rigid monomers and obtaining
appropriate parameters for the entire molecule, rather than for in-
dividual atoms within the molecule. Hence, parameters contained in
such force fields are not typically transferable to similar molecules.
Nevertheless, these force fields can be very useful for accurate, large-
scale simulations of particular systems. In 1997, Mas, Szalewicz,
Bukowski, and Jeziorski fit two types of analytic potential functions
to SAPT data for more than a thousand interaction energies for
the water dimer (Mas et al., 1997); these included a “site-site”
model (SAPT-ss) using exp(−R) and 1/R terms depending on distances
between the sites, and a more elaborate “pair potential” model
(SAPT-pp) employing the vector between the centers of mass and
the Euler angles defining the relative orientation of the monomers.
Both analytic forms were shown to provide very accurate results for
the second virial coefficient and related thermodynamic properties.
Subsequent work fitting over 2500 SAPT interaction energies led
to the SAPT-5s water potential (Mas et al., 2000), which employed
a “site-site” form (using 8 sites per molecule) but nevertheless
exceeded the previous SAPT-pp function in accuracy. SAPT-based
three-body potentials for water, meant to work in concert with the
SAPT-5s two-body potential, were also developed (Mas, Bukowski,
and Szalewicz, 2003). Torheyden and Jansen also developed a SAPT-
based water potential (Torheyden and Jansen, 2006), comparing the
SAPT interaction energies with those of CCSD(T)/CBS estimates.
After the introduction of SAPT(DFT) (Heßelmann, Jansen, and
Schütz, 2005; Misquitta et al., 2005), this method was used to
deduce a revised site-site water model, SDFT-5s (Bukowski et al.,
2006). Even more accurate models, CC-pol and CC-pol-8s, were fitted
to high-accuracy CCSD(T)/CBS estimates (Bukowski et al., 2007;
Cencek et al., 2008).
Szalewicz and co-workers have fit force fields to SAPT data for
numerous other particular systems, including the Ne-HCN complex
(Murdachaew et al., 2001), the methane-water interaction (Akin-
Ojo and Szalewicz, 2005), and the interaction of CO2 with itself
(Bukowski et al., 1999), dimethylnitramine, acetonitrile, or methyl
alcohol (Bukowski and Szalewicz, 1999).
With the advent of the more computationally affordable
SAPT(DFT) and DFT-SAPT methods, additional force fields were fit
to particular chemical systems. Impressive results were reported
in 2006 (Podeszwa, Bukowski, and Szalewicz, 2006) for a site-site
potential for the benzene dimer fit to SAPT(DFT) data, with results
comparing very favorably to high-level CCSD(T) interaction ener-
gies. Another study used more than 1000 dimer configurations to fit
the potential energy surface of cyclotrimethylene trinitramine (RDX)
dimer to SAPT(DFT) energies (Podeszwa et al., 2007). Again, a site-
site form was used, involving R⁻⁶ terms for long-range induction
and dispersion, a Tang–Toennies damped charge-charge Coulomb
term, and a generalized Buckingham-type potential modeling both
short-range exchange repulsion and charge penetration effects. This
potential was used in molecular dynamics simulations of the RDX
crystal, leading to crystal densities in excellent agreement with
experiment (Podeszwa et al., 2007).
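
The ingredients of such a site-site form can be sketched in a few lines. The snippet below is a simplified stand-in rather than the published RDX potential: it uses a plain exponential in place of the generalized Buckingham term, a single damping parameter for both damped terms, and hypothetical parameter names. The Tang–Toennies damping function itself is the standard one, f_n(x) = 1 − exp(−x) Σ_{k=0..n} x^k/k!.

```python
import math

def tang_toennies(n, x):
    """Tang-Toennies damping function of order n evaluated at x = b*r."""
    partial_sum = sum(x**k / math.factorial(k) for k in range(n + 1))
    return 1.0 - math.exp(-x) * partial_sum

def site_site_energy(r, a, b, c6, q_i, q_j):
    """Energy of one site pair at separation r (consistent units assumed).

    a, b     : prefactor and exponent of the short-range exponential
    c6       : coefficient of the attractive R^-6 term
    q_i, q_j : site charges for the damped Coulomb term
    """
    short_range = a * math.exp(-b * r)
    r6_term = -tang_toennies(6, b * r) * c6 / r**6
    coulomb = tang_toennies(1, b * r) * q_i * q_j / r
    return short_range + r6_term + coulomb

def dimer_energy(pair_list):
    """Sum site-site terms; pair_list holds (r, a, b, c6, q_i, q_j) tuples."""
    return sum(site_site_energy(*pair) for pair in pair_list)
```

The damping functions go to zero as r approaches zero and to one at large r, so the R⁻⁶ and Coulomb terms keep their asymptotic form while remaining finite at short range, where the exponential term dominates.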
Jordan and co-workers have developed site-site force fields
for particular (rigid) molecules based on a combination of
wavefunction-based SAPT and CCSD(T) computations on dimers
and small clusters; they refer to their models as distributed
point polarizable (DPP) models (Defusco, Schofield, and Jordan,
2007). Their most recent approach, labeled DPP2, has been
applied to develop models for H2O (Kumar et al., 2010) and for
CO2 (Wang, Kumar, and Jordan, 2012). These studies employed
exponential exchange terms, Tang–Toennies damped R⁻⁶ dispersion
terms, damped electrostatics to account for charge penetration,
and Thole-damped point dipole polarizabilities to account for
induction. For water, an attractive exponential term was added to
account for charge-transfer (since charge transfer terms are not
readily extracted from a SAPT analysis, these were obtained using
an absolutely localized molecular orbitals energy decomposition
analysis, ALMO EDA (Khaliullin et al., 2007)). Atomic polarizability
parameters were adjusted to provide a good fit to molecular
polarizabilities and to three-body energies in CCSD(T) computations
of clusters. Electrostatic and dispersion parameters were fit to SAPT
data, and exchange parameters were fit to the difference between
CCSD(T) benchmarks and the sum of the other DPP2 terms. The
DPP2 model gives good radial distribution functions and accurate
interaction energies for clusters (Kumar et al., 2010; Wang, Kumar,
and Jordan, 2012).
Although these site-site interaction models can be computation-
ally inexpensive and also quite accurate, if one hopes to develop
a more general force field for generic molecules, then one needs
transferable parameters. Restricting interactions to atomic centers,
and developing atomic parameters, seems helpful in this regard. In
2005 and 2006, Donchev et al. introduced the quantum mechanical
polarizable force fields (QMPFF) (Donchev et al., 2006a,b, 2005),
which model valence electrons by a polarizable charge cloud
represented by an exponential function centered on each atom. The
idea of representing electrons by spatially delocalized functions in
model potentials had been presented earlier (see for example the
work by Wheatley (Wheatley, 1993; Wheatley and Mitchell, 1994)),
but this appears to be one of the first attempts to incorporate diffuse
electron charge models in a general force field.
In QMPFF, atomic multipole moments account for electron shifts
due to bonding, and inducible atomic dipole moments account
for longer-range polarization. Short-range exchange-repulsion is
modeled by a term analogous to the electrostatic term, and London
dispersion forces are modeled by R⁻⁶ and R⁻⁸ terms using Tang–
Toennies damping (Tang and Toennies, 1984). The various energy
components are fit separately to quantum mechanical values.
Electrostatics are fit to dimer electrostatic energies using MP2
densities. Short-range exchange-repulsion energies were obtained
using the Kitaura-Morokuma energy decomposition (Kitaura and
Morokuma, 1976) definition of exchange at the Hartree–Fock
(HF) level, corrected for electron correlation using MP2. Induc-
tion/polarization energies are fit to molecular polarizabilities and,
in later versions, also to non-additive components of the interaction
energies of oligomers. Finally, the dispersion term is fit to the
total QM interaction energy less the other terms already fitted,
initially using MP2 but subsequently using CCSD(T), which worked
better for aromatic hydrocarbons and H2 interactions (Donchev
et al., 2006b; Donchev, Galkin, and Tarasov, 2007). Although the
QM energy components are not determined according to the same
definitions as SAPT, they nevertheless agree fairly well (Donchev
et al., 2006b). QMPFF3 greatly outperforms standard force fields
for the benzene dimer, and the transferability of the approach
was demonstrated by accurate computations of the second virial
coefficient of gaseous benzene, various properties of liquid benzene,
and cohesion energies of various polycyclic aromatic hydrocarbon
crystals (Donchev et al., 2006b).
Around the same time as Donchev’s work on QMPFF, Piquemal,
Cisneros, Darden and co-workers also introduced a force field using
diffuse electrons, the Gaussian Electrostatic Model (GEM) (Cisneros,
Piquemal, and Darden, 2005, 2006; Piquemal et al., 2006). Rather
than trying to use functional forms that mimic the anisotropy of
the electron density, GEM attempts to model the electron density
itself using auxiliary Gaussian basis functions, as is done in density
fitting (Dunlap, Connolly, and Sabin, 1977, 1979a,b; Whitten, 1973).
Intermolecular Coulomb energies (including charge penetration)
can be computed directly from the Coulomb interaction between
the atomic charges and fitted electron densities of monomer A
with those of monomer B. Exchange repulsion is computed using a
density overlap model, and the density overlap is readily computed
using the auxiliary fitting functions. Polarization and charge transfer
energies are computed as in SIBFA (Gresh, 1997; Gresh, Claverie,
and Pullman, 1984) but use electrostatic potentials generated by
the density fitting representation of each monomer’s density, rather
than a less accurate multipolar approximation. Parameters are
adjusted to minimize errors against an energy component analysis
of Hartree–Fock or DFT interaction energies using CSOV or RVS.
However, it should be noted that the fitted densities need not come
from Hartree–Fock or DFT computations; any method capable of
producing a one-particle density matrix can be used. GEM has
been tested using relaxed CCSD density matrices for water dimers,
yielding good results when compared to SAPT electrostatics and
exchange-repulsion terms (Piquemal et al., 2006). More recent
work has examined ways of improving the numerical stability of
the density fitting step (Cisneros et al., 2007) and how to extract
distributed multipoles from the fitted densities and combine them
with the AMOEBA force field (Cisneros, 2012). The related Gaussian
Multipole Model (GMM) (Elking et al., 2010) explores using only
a single contracted Gaussian multipole charge density for each
atom, rather than a large number of distributed auxiliary functions
as in the original GEM approach; electrostatic energies evaluated
in this way tend to match CSOV values for small dimers within
0.1 kcal mol⁻¹. The GEM* approach (Duke et al., 2014) treats
electrostatic and exchange terms using GEM and bonded terms,
polarization, and dispersion terms using AMOEBA. The overlap
and two-center Coulomb integrals required by GEM are accelerated
using extended versions of the particle mesh Ewald and fast Fourier
Poisson method; for the example systems considered, GEM* requires
around 10 times the computational time of AMOEBA, although the
more physical functional form of GEM* is expected to result in more
reliable results once final parameters are available.
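
A toy version of the two density-based quantities mentioned above, the Coulomb interaction and the density overlap, can be written in closed form for spherical Gaussians. The sketch below assumes a single normalized s-type Gaussian per site, which is far simpler than the contracted Hermite Gaussian auxiliary sets actually fitted in GEM. The function names and the proportionality constant k_exch are hypothetical, and atomic units are assumed.

```python
import math

def gaussian_coulomb(q1, a, q2, b, r):
    """Coulomb energy of two spherical Gaussian charge clouds.

    Each cloud is q * (c/pi)**1.5 * exp(-c*|x|**2) with exponent c = a or b;
    r is the distance between the two centers.
    """
    mu = a * b / (a + b)
    if r < 1e-12:
        # Finite limit of erf(sqrt(mu)*r)/r as r -> 0.
        return q1 * q2 * 2.0 * math.sqrt(mu / math.pi)
    return q1 * q2 * math.erf(math.sqrt(mu) * r) / r

def gaussian_overlap(q1, a, q2, b, r):
    """Integral of the product of the same two Gaussian clouds."""
    mu = a * b / (a + b)
    return q1 * q2 * (mu / math.pi) ** 1.5 * math.exp(-mu * r * r)

def overlap_exchange(k_exch, n1, a, n2, b, r):
    """Exchange repulsion taken as proportional to the density overlap
    of two electron clouds holding n1 and n2 electrons."""
    return k_exch * gaussian_overlap(n1, a, n2, b, r)
```

At large separation the Coulomb term reduces to q1*q2/r, the point-charge result, while at short range it stays finite; the difference between the two is the charge penetration contribution that a point-charge or multipole model misses.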
In 2006, a general SAPT-based force field for organic molecules,
based on atomic parameters, was obtained by fitting to SAPT2
interaction energies for 138 small organic complexes and tested
against the interaction energies of small peptide ligands and fragments
of glycopeptide antibiotics by Li, Volkov, Szalewicz, and Coppens (Li
et al., 2006). Exchange and induction were fit to exponential terms,
and dispersion was fit to the standard R⁻⁶ expression. Electrostatics
were evaluated using a Buckingham-style multipole expansion for
long-range contacts, and explicit Coulomb integrals over atomic
densities for short-range contacts in an approach dubbed EP/MM
(exact potential/multipole methods) (Volkov et al., 2004), using a
previously constructed database of aspherical atomic densities. The
authors of this study argued that their approach was more accurate
for the systems studied than the MMFF94 force field, primarily due
to the reliance of the latter on atom-centered point charges (Li et al.,
2006).
In 2007, Stone and Misquitta proposed a general procedure for
developing atom-atom potentials from quantum chemistry com-
putations (Stone and Misquitta, 2007). They advocated computing
long-range interactions using monomer properties, and short-
range interactions by fitting to SAPT computations of dimers.
Distributed multipole analysis (DMA) (Stone, 1981; Stone and
Alderton, 1985) is used to represent the electron density in long-
range interactions. The authors developed an approach to similarly
distribute molecular polarizabilities into atomic contributions (Mis-
quitta and Stone, 2008; Misquitta, Stone, and Price, 2008a; Stone
and Misquitta, 2007); with this approach, the static polarizabilities
needed to compute induction and the dynamic polarizabilities
needed to compute dispersion can be broken down into atomic
contributions. Hence, using their procedure, one can perform
monomer computations to obtain atomic parameters for long-range
electrostatics, induction, and dispersion. The authors advocate mod-
eling short-range effects (exchange, charge-penetration, exchange-
induction, and exchange-dispersion) by fitting SAPT energies to an
interatomic, exponential (Born-Mayer) form. The simplest approach
is to sum all these contributions together and fit them by a
single (isotropic) exponential. Interdependencies between the Born-
Mayer parameters are reduced by assuming a density overlap
model.
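
The single-exponential fit can be illustrated for the simplest possible case of one atom pair. The sketch below assumes that the summed short-range energy is positive at every sampled distance, so that a straight-line fit to its logarithm yields the Born-Mayer parameters. A realistic fit would treat many atom pairs and dimer geometries simultaneously, usually with weights; the function name and the synthetic data are illustrative assumptions only.

```python
import numpy as np

def fit_born_mayer(r, e_short):
    """Fit e_short(r) ~ A * exp(-B * r) by linear least squares on log(e_short).

    r       : distances between the two sites
    e_short : summed short-range energies (must all be positive)
    Returns (A, B).
    """
    r = np.asarray(r, dtype=float)
    e_short = np.asarray(e_short, dtype=float)
    if np.any(e_short <= 0.0):
        raise ValueError("log-linear fit requires positive short-range energies")
    # log(e) = log(A) - B * r, so the slope is -B and the intercept is log(A).
    slope, intercept = np.polyfit(r, np.log(e_short), 1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic data generated from known parameters (not SAPT results).
r_grid = np.linspace(2.0, 4.0, 9)
e_data = 1000.0 * np.exp(-3.5 * r_grid)
a_fit, b_fit = fit_born_mayer(r_grid, e_data)
```

Fitting in logarithmic space weights all distances by relative rather than absolute error, which is one design choice among several; a nonlinear fit of the energies themselves would emphasize the short-range points instead.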
This approach was used in 2008 to fit an intermolecular
potential for 1,3-dibromo-2-chloro-5-fluorobenzene to SAPT(DFT)
data (Misquitta, Welch, Stone, and Price, 2008b). This potential was
then used in one of the Blind Tests of crystal structure prediction
organized by the Cambridge Crystallographic Data Centre. The
SAPT(DFT)-deduced potential yielded excellent results compared
to the experimental structure when it was subsequently revealed
(Misquitta, Welch, Stone, and Price, 2008b). The study used Tang–
Toennies damped dispersion terms through R⁻⁸, DMA multipoles
through rank 4 for electrostatics, and pairwise atomic exponential
terms representing the sum of exchange repulsion and charge
penetration.
More recently, Schmidt and co-workers have pursued a similar
strategy in an effort to develop a general procedure for obtaining
atom-atom potentials based on SAPT(DFT) (McDaniel and Schmidt,
2012, 2013; McDaniel, Yu, and Schmidt, 2012). They present
a detailed recipe (McDaniel and Schmidt, 2013) including the
functional form of the force field, the particular fitting procedure,
the level of theory to use for the monomer properties and SAPT
computations, etc. These authors advocate use of atom-centered
charges instead of multipoles to make it easier to implement their
force fields in standard molecular dynamics packages. However,
unlike standard packages, they apply Tang–Toennies damping to
the point-charge electrostatics. Like Misquitta et al. (Misquitta,
Welch, Stone, and Price, 2008b), they use exponentials to represent
exchange-repulsion, induction, and charge penetration. However,
they keep these as three separate terms with separate coefficients
(although with common exponents for a given pair of atom
types). Damped dispersion terms are retained through R⁻¹², and
a special δHF term is retained (analogous to the term from SAPT)
to account for higher-order induction. Response functions are fit
to libraries of molecules rather than individual molecules to obtain
better transferability of parameters for a particular atom type.
The approach appears to give good results for the second virial
coefficients of various organic molecules (McDaniel and Schmidt,
2013). We noted above (see Section 3.3) that the accuracy of
SAPT(DFT) may not be as high as desired or expected for H-
bonded systems; consistent with this observation, Schmidt and co-
workers recommend special procedures for H-bonded systems in
their parameterization procedure (McDaniel and Schmidt, 2013).
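
Second virial coefficients appear repeatedly in this section as a test of fitted potentials. For the simplest case of a spherically symmetric pair potential u(r), the classical expression is B2(T) = −2π ∫ [exp(−u(r)/kT) − 1] r² dr, and it can be evaluated by simple quadrature as sketched below. Molecular potentials such as those discussed above additionally require averaging over relative orientations, which this sketch omits; the function names and the hard-sphere check are illustrative assumptions.

```python
import numpy as np

def second_virial(u, kT, r_max, n_points=200001):
    """Classical B2 for an isotropic pair potential, by trapezoidal quadrature.

    u     : callable returning the pair energy for an array of distances;
            it must return finite values or +inf (never NaN) on [0, r_max]
    kT    : thermal energy in the same units as u
    r_max : upper integration limit, beyond which u is taken to be zero
    """
    r = np.linspace(0.0, r_max, n_points)
    mayer = np.exp(-u(r) / kT) - 1.0
    integrand = mayer * r**2
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))
    return -2.0 * np.pi * integral

def hard_sphere(sigma):
    """Hard-sphere potential: infinite inside sigma, zero outside."""
    return lambda r: np.where(r < sigma, np.inf, 0.0)

# Sanity check against the analytic hard-sphere result 2*pi*sigma**3/3.
b2_hs = second_virial(hard_sphere(1.0), kT=1.0, r_max=2.0)
```

The hard-sphere case is convenient for validation because its virial coefficient is known exactly and is independent of temperature.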

3.5 Conclusions

Numerous recent advances have substantially increased the feasibility
of deriving general force fields from ab initio data. First,
several competing but often roughly similar functional forms
have been examined that appear to be capable of reproducing
quantum mechanical energies for non-bonded interactions to high
accuracy. Second, advances in computer hardware and in algorithms
have made it easier to perform quantum computations at the
“gold standard” CCSD(T)/CBS level of theory. Such results are
substantially more accurate and reliable than those from lower
levels of theory like Hartree–Fock, DFT, MP2, etc. For example, our
two research groups are completing the initial phase of a project to
obtain CCSD(T)/CBS benchmark data for around 3300 sidechain–
sidechain interactions as part of the Bio-Fragment Database. These
benchmarks, in addition to providing data for parameterization
of force fields, are also very helpful in evaluating the accuracy
of existing force fields and approximate quantum methods, as
illustrated by recent studies of the indinavir/HIV-II protease
complex and ubiquitin. Third, more transferable parameters should
be possible if one parameterizes each component of the interaction
energy (electrostatics, exchange-repulsion, induction/polarization,
and dispersion) separately. This now appears to be possible using
SAPT(DFT) or even more accurate high-order many-body SAPT.
Hence, advances in high-level quantum chemistry, intermolecular
theory and functional forms, and energy component analysis
methods like SAPT appear to be at a stage when they may be very
fruitfully combined to develop a next generation of force fields with
general applicability and substantially improved accuracy.

Acknowledgments

The authors would like to thank Trent Parker for research assistance
and for providing Fig. 3.5, Robert Parrish for providing Fig. 3.6,
and Dr. Lori Burns for providing Figs. 3.2 and 3.3. C.D.S. gratefully
acknowledges support by the National Science Foundation (Grant
No. CHE-1300497). The Center for Computational Molecular Science
and Technology is funded through an NSF CRIF award (Grant
No. CHE-0946869).

References

Adamowicz, L. (2010). Optimized virtual orbital space (OVOS) in coupled-cluster
calculations, Mol. Phys. 108, pp. 3105–3112, doi:10.1080/00268976.2010.520752.
Adamowicz, L., and Bartlett, R. J. (1987). Optimized virtual orbital subspace
for high-level correlated calculations, J. Chem. Phys. 86, pp. 6314–6324.
Akin-Ojo, O., and Szalewicz, K. (2005). Potential energy surface and second
virial coefficient of methane-water from ab initio calculations, J. Chem.
Phys. 123, p. 134311, doi:10.1063/1.2033667.
Aquilante, F., Vico, L. D., Ferre, N., Ghigo, G., Malmqvist, P., Neogrady, P.,
Pedersen, T. B., Pitonak, M., Reiher, M., Roos, B. O., Serrano-Andres, L.,
Urban, M., Veryazov, V., and Lindh, R. (2010). Software news and update
MOLCAS 7: The next generation, J. Comput. Chem. 31, pp. 224–247, doi:
10.1002/jcc.21318.
Bagus, P. S., Hermann, K., and Bauschlicher, C. W. (1984). A new analysis
of charge transfer and polarization for ligand-metal bonding: Model
studies of Al4CO and Al4NH3, J. Chem. Phys. 80, pp. 4378–4386, doi:
10.1063/1.447215.
Bayly, C. I., Cieplak, P., Cornell, W. D., and Kollman, P. A. (1993). A well-
behaved electrostatic potential based method using charge restraints
for deriving atomic charges: The RESP model, J. Phys. Chem. 97, pp.
10269–10280.
Becke, A. D. (1997). Density-functional thermochemistry. v. systematic
optimization of exchange-correlation functionals, J. Chem. Phys. 107,
pp. 8554–8560, doi:10.1063/1.475007.
Becke, A. D., and Johnson, E. R. (2005). Exchange-hole dipole moment and
the dispersion interaction, J. Chem. Phys. 122, p. 154104, doi:10.1063/
1.1884601.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig,
H., Shindyalov, I. N., and Bourne, P. E. (2000). The protein data bank,
Nucleic Acids Res. 28, pp. 235–242, doi:10.1093/nar/28.1.235.
Boström, J., Pitoňák, M., Aquilante, F., Neogrády, P., Pedersen, T. B., and Lindh,
R. (2012). Coupled cluster and Møller–Plesset perturbation theory
calculations of noncovalent intermolecular interactions using density
fitting with auxiliary basis sets from Cholesky decompositions, J. Chem.
Theory Comput. 8, pp. 1921–1928, doi:10.1021/ct3003018.
Boys, S. F., and Bernardi, F. (1970). The calculation of small molecular
interactions by the differences of separate total energies. Some
procedures with reduced errors, Mol. Phys. 19, 4, pp. 553–566.
Bukowski, R., Sadlej, J., Jeziorski, B., Jankowski, P., Szalewicz, K., Kucharski,
S. A., Williams, H. L., and Rice, B. M. (1999). Intermolecular potential of
carbon dioxide dimer from symmetry-adapted perturbation theory, J.
Chem. Phys. 110, pp. 3785–3803, doi:10.1063/1.479108.
Bukowski, R., and Szalewicz, K. (1999). Ab initio interaction potentials
for simulations of dimethylnitramine solutions in supercritical carbon
dioxide with cosolvents, J. Phys. Chem. A 103, pp. 7322–7340, doi:
10.1021/jp991212p.
Bukowski, R., Szalewicz, K., Groenenboom, G., and van der Avoird,
A. (2006). Interaction potential for water dimer from symmetry-
adapted perturbation theory based on density functional description
of monomers, J. Chem. Phys. 125, p. 044301, doi:10.1063/1.2220040.
Bukowski, R., Szalewicz, K., Groenenboom, G. C., and van der Avoird, A.
(2007). Predictions of the properties of water from first principles,
Science 315, pp. 1249–1252, doi:10.1126/science.1136371.
Burley, S. K., and Petsko, G. A. (1985). Aromatic-aromatic interaction: A
mechanism of protein structure stabilization, Science 229, pp. 23–28.
Burns, L. A., Marshall, M. S., and Sherrill, C. D. (2014). Comparing
counterpoise-corrected, uncorrected, and averaged binding energies
for benchmarking noncovalent interactions, J. Chem. Theory Comput.
10, pp. 49–57, doi:10.1021/ct400149j.
Burns, L. A., Vázquez-Mayagoitia, Á., Sumpter, B. G., and Sherrill, C. D.
(2011). Density-functional approaches to noncovalent interactions: A
comparison of dispersion corrections (DFT-D), exchange-hole dipole
moment (XDM) theory, and specialized functionals, J. Chem. Phys. 134,
p. 084107, doi:10.1063/1.3545971.
Cencek, W., Szalewicz, K., Leforestier, C., van Harrevelt, R., and van der
Avoird, A. (2008). An accurate analytic representation of the water pair
potential, Phys. Chem. Chem. Phys. 10, pp. 4716–4731, doi:10.1039/
b809435g.
Černý, J., Kabeláč, M., and Hobza, P. (2008). Double-helical → ladder
structural transition in the B-DNA is induced by a loss of dispersion
energy, J. Am. Chem. Soc. 130, pp. 16055–16059, doi:10.1021/
ja805428q.
Chai, J., and Head-Gordon, M. (2009). Long-range corrected double-hybrid
density functionals, J. Chem. Phys. 131, p. 174105, doi:10.1063/1.
3244209.
Chen, W., and Gordon, M. S. (1996). Energy decomposition analyses for
many-body interaction and applications to water complexes, J. Phys.
Chem. 100, pp. 14316–14328, doi:10.1021/jp960694r.
Cisneros, G. A. (2012). Application of gaussian electrostatic model (GEM)
distributed multipoles in the AMOEBA force field, J. Chem. Theory
Comput. 8, pp. 5072–5080, doi:10.1021/ct300630u.
Cisneros, G. A., Elking, D., Piquemal, J.-P., and Darden, T. A. (2007). Numerical
fitting of molecular properties to Hermite Gaussians, J. Phys. Chem. A
111, pp. 12049–12056, doi:10.1021/jp074817r.
Cisneros, G. A., Piquemal, J.-P., and Darden, T. A. (2005). Intermolecular
electrostatic energies using density fitting, J. Chem. Phys. 123, p.
044109, doi:10.1063/1.1947192.
Cisneros, G. A., Piquemal, J.-P., and Darden, T. A. (2006). Generalization
of the Gaussian electrostatic model: Extension to arbitrary angular
momentum, distributed multipoles, and speedup with reciprocal space
methods, J. Chem. Phys. 125, p. 184101, doi:10.1063/1.2363374.
Cisneros, G. A., Tholander, S. N., Parisel, O., Darden, T. A., Elking, D., Perera,
L., and Piquemal, J.-P. (2008). Simple formulas for improved point-
charge electrostatics in classical force fields and hybrid quantum
mechanical/molecular mechanical embedding, Int. J. Quantum Chem.
108, pp. 1905–1912, doi:10.1002/qua.21675.
Cornell, W. D., Cieplak, P., Bayly, C. I., Gould, I. R., Merz, K. M., Ferguson, D. M.,
Spellmeyer, D. C., Fox, T., Caldwell, J. W., and Kollman, P. A. (1995). A
second generation force field for the simulation of proteins, nucleic
acids, and organic molecules, J. Am. Chem. Soc. 117, pp. 5179–5197, doi:
10.1021/ja00124a002.
Császár, A. G., Allen, W. D., and Schaefer, H. F. (1998). In pursuit of the ab initio
limit for conformational energy prototypes, J. Chem. Phys. 108, 23, pp.
9751–9764.
Dedíková, P., Pitoňák, M., Neogrády, P., Černušák, I., and Urban, M.
(2008). Toward more efficient CCSD(T) calculations of intermolecular
interactions in model hydrogen-bonded and stacked dimers, J. Phys.
Chem. A 112, pp. 7115–7123, doi:10.1021/jp8033903.
Defusco, A., Schofield, D. P., and Jordan, K. D. (2007). Comparison of models
with distributed polarizable sites for describing water clusters, Mol.
Phys. 105, pp. 2681–2696, doi:10.1080/00268970701620669.
DePrince, A. E., Kennedy, M. R., Sumpter, B. G., and Sherrill, C. D.
(2014). Density-fitted singles and doubles coupled cluster on graphics
processing units, Mol. Phys. 112, pp. 844–852, doi:10.1080/00268976.
2013.874599.
DePrince, A. E., and Sherrill, C. D. (2013a). Accuracy and efficiency of
coupled-cluster theory using density fitting/Cholesky decomposition,
frozen natural orbitals, and a t1-transformed Hamiltonian, J. Chem.
Theory Comput. 9, pp. 2687–2696, doi:10.1021/ct400250u.
DePrince, A. E., and Sherrill, C. D. (2013b). Accurate noncovalent interaction
energies using truncated basis sets based on frozen natural orbitals, J.
Chem. Theory Comput. 9, pp. 293–299, doi:10.1021/ct300780u.
Dewar, M. J. S., Zoebisch, E. G., Healy, E. F., and Stewart, J. J. P. (1985). AM1:
a new general purpose quantum mechanical model, J. Am. Chem. Soc.
107, pp. 3902–3909, doi:10.1021/ja00299a024.
DiLabio, G. A., Koleini, M., and Torres, E. (2013). Extension of the B3LYP-
dispersion-correcting potential approach to the accurate treatment of
both inter- and intra-molecular interactions, Theor. Chem. Acc. 132, p.
1389, doi:10.1007/s00214-013-1389-x.
Dion, M., Rydberg, H., Schröder, E., Langreth, D. C., and Lundqvist, B. I.
(2004). van der Waals density functional for general geometries, Phys.
Rev. Lett. 92, 24, p. 246401.
Donchev, A. G., Galkin, N. G., Illarionov, A. A., Khoruzhii, O. V., Olevanov,
M. A., Ozrin, V. D., Subbotin, M. V., and Tarasov, V. I. (2006a). Water
properties from first principles: Simulations by a general-purpose
quantum mechanical polarizable force field, Proc. Natl. Acad. Sci. USA
103, pp. 8613–8617.
Donchev, A. G., Galkin, N. G., Pereyaslavets, L. B., and Tarasov, V. I. (2006b).
Quantum mechanical polarizable force field (QMPFF3): Refinement
and validation of the dispersion interaction for aromatic carbon, J.
Chem. Phys. 125, p. 244107.
Donchev, A. G., Galkin, N. G., and Tarasov, V. I. (2007). Anisotropic
nonadditive ab initio force field for noncovalent interactions of H2, J.
Chem. Phys. 126, p. 174307.
Donchev, A. G., Ozrin, V. D., Subbotin, M. V., Tarasov, O. V., and Tarasov, V. I.
(2005). A quantum mechanical polarizable force field for biomolecular
interactions, Proc. Natl. Acad. Sci. USA 102, pp. 7829–7834.
Duke, R. E., Starovoytov, O. N., Piquemal, J.-P., and Cisneros, G. A. (2014).
GEM*: A molecular electronic density-based force field for molecular
dynamics simulations, J. Chem. Theory Comput. 10, pp. 1361–1365, doi:
10.1021/ct500050p.
Dunlap, B. I., Connolly, J. W. D., and Sabin, J. R. (1977). Applicability of
LCAO-X-alpha methods to molecules containing transition-metal atoms
- nickel atom and nickel hydride, Int. J. Quantum Chem. Symp. 11,
p. 81.
Dunlap, B. I., Connolly, J. W. D., and Sabin, J. R. (1979a). On first-row diatomic
molecules and local density models, J. Chem. Phys. 71, pp. 4993–4999,
doi:10.1063/1.438313.
Dunlap, B. I., Connolly, J. W. D., and Sabin, J. R. (1979b). On some
approximations in applications of Xα theory, J. Chem. Phys. 71, pp.
3396–3402, doi:10.1063/1.438728.
Dunning, T. H. (1989). Gaussian basis sets for use in correlated molecular
calculations. I. The atoms boron through neon and hydrogen, J. Chem.
Phys. 90, pp. 1007–1023.
East, A. L. L., and Allen, W. D. (1993). The heat of formation of NCO, J. Chem.
Phys. 99, 6, pp. 4638–4650.
Elking, D. M., Cisneros, G. A., Piquemal, J.-P., Darden, T. A., and Pedersen, L. G.
(2010). Gaussian multipole model (GMM), J. Chem. Theory Comput. 6,
pp. 190–202, doi:10.1021/ct900348b.
Engkvist, O., Åstrand, P. O., and Karlström, G. (2000). Accurate intermole-
cular potentials obtained from molecular wave functions: Bridging the
gap between quantum chemistry and molecular simulations, Chem. Rev.
100, pp. 4087–4108, doi:10.1021/cr9900477.
Epifanovsky, E., Zuev, D., Feng, X., Khistyaev, K., Shao, Y., and Krylov,
A. I. (2013). General implementation of the resolution-of-the-identity
and Cholesky representations of electron repulsion integrals within
coupled-cluster and equation-of-motion methods: Theory and bench-
marks, J. Chem. Phys. 139, p. 134105, doi:10.1063/1.4820484.
Faver, J. C., Benson, M. L., He, X., Roberts, B. P., Wang, B., Marshall, M. S.,
Kennedy, M. R., Sherrill, C. D., and Merz, K. M. (2011a). Formal
estimation of errors in computed absolute interaction energies of
protein-ligand complexes, J. Chem. Theory Comput. 7, pp. 790–797, doi:
10.1021/ct100563b.
Faver, J. C., Benson, M. L., He, X., Roberts, B. P., Wang, B., Marshall, M. S.,
Sherrill, C. D., and Merz, K. M. (2011b). The energy computation
paradox and ab initio protein folding, PLoS ONE 6, p. e18868, doi:
10.1371/journal.pone.0018868.
Flick, J. C., Kosenkov, D., Hohenstein, E. G., Sherrill, C. D., and Slipchenko,
L. V. (2012). Accurate prediction of noncovalent interaction energies
with the effective fragment potential method: Comparison of energy
components to symmetry-adapted perturbation theory for the S22 test
set, J. Chem. Theory Comput. 8, pp. 2835–2843, doi:10.1021/ct200673a.
Freitag, M. A., Gordon, M. S., Jensen, J. H., and Stevens, W. J. (2000). Evaluation
of charge penetration between distributed multipolar expansions, J.
Chem. Phys. 112, pp. 7300–7306, doi:10.1063/1.481370.
Ghosh, D., Kosenkov, D., Vanovschi, V., Williams, C. F., Herbert, J. M.,
Gordon, M. S., Schmidt, M. W., Slipchenko, L. V., and Krylov, A. I.
(2010). Noncovalent interactions in extended systems described by
the effective fragment potential method: Theory and application to
nucleobase oligomers, J. Phys. Chem. A 114, pp. 12739–12754, doi:
10.1021/jp107557p.
Gordon, M. S., Freitag, M. A., Bandyopadhyay, P., Jensen, J. H., Kairys, V., and
Stevens, W. J. (2001). The effective fragment potential method: A QM-
based MM approach to modeling environmental effects in chemistry, J.
Phys. Chem. A 105, pp. 293–307, doi:10.1021/jp002747h.
Gráfová, L., Pitoňák, M., Řezáč, J., and Hobza, P. (2010). Comparative study of
selected wave function and density functional methods for noncovalent
interaction energy calculations using the extended S22 data set, J. Chem.
Theory Comput. 6, pp. 2365–2376, doi:10.1021/ct1002253.
Gresh, N. (1997). Model, multiply hydrogen-bonded water oligomers (n
= 3-20). How closely can a separable, ab initio-grounded molecular
mechanics procedure reproduce the results of supermolecule quantum
chemical computations? J. Phys. Chem. A 101, pp. 8680–8694, doi:
10.1021/jp9713423.
Gresh, N., Cisneros, G. A., Darden, T. A., and Piquemal, J.-P. (2007).
Anisotropic, polarizable molecular mechanics studies of inter- and
intramolecular interactions and ligand-macromolecule complexes. A
bottom-up strategy, J. Chem. Theory Comput. 3, pp. 1960–1986, doi:
10.1021/ct700134r.
Gresh, N., Claverie, P., and Pullman, A. (1979). Intermolecular interactions:
Reproduction of the results of ab initio supermolecule computations
by an additive procedure, Int. J. Quantum Chem. 16, pp. 243–253, doi:
10.1002/qua.560160826.
Gresh, N., Claverie, P., and Pullman, A. (1984). Theoretical studies of
molecular conformation. Derivation of an additive procedure for the
computation of intramolecular interaction energies. Comparison with
ab initio SCF computations, Theor. Chim. Acta 66, pp. 1–20, doi:10.
1007/BF00577135.
Grimme, S. (2004). Accurate description of van der Waals complexes by
density functional theory including empirical corrections, J. Comput.
Chem. 25, pp. 1463–1473.
Grimme, S. (2006a). Semiempirical GGA-type density functional constructed
with a long-range dispersion correction, J. Comput. Chem. 27, 15, pp.
1787–1799.
Grimme, S. (2006b). Semiempirical hybrid density functional with perturbative
second-order correlation, J. Chem. Phys. 124, p. 034108, doi:
10.1063/1.2148954.
Grimme, S., Antony, J., Ehrlich, S., and Krieg, H. (2010). A consistent and
accurate ab initio parametrization of density functional dispersion
correction (DFT-D) for the 94 elements H-Pu, J. Chem. Phys. 132, p.
154104, doi:10.1063/1.3382344.
Halkier, A., Helgaker, T., Jørgensen, P., Klopper, W., Koch, H., Olsen, J., and
Wilson, A. K. (1998). Basis-set convergence in correlated calculations
on Ne, N2, and H2O, Chem. Phys. Lett. 286, pp. 243–252.
Halkier, A., Klopper, W., Helgaker, T., Jørgensen, P., and Taylor, P. R. (1999).
Basis set convergence of the interaction energy of hydrogen-bonded
complexes, J. Chem. Phys. 111, pp. 9157–9167.
Halkier, A., Koch, H., Jørgensen, P., Christiansen, O., Nielsen, I. M. B., and
Helgaker, T. (1997). A systematic ab initio study of the water dimer in
hierarchies of basis sets and correlation models, Theor. Chem. Acc. 97,
pp. 150–157, doi:10.1007/s002140050248.
Heßelmann, A., Jansen, G., and Schütz, M. (2005). Density-functional theory-
symmetry-adapted intermolecular perturbation theory with density
fitting: A new efficient method to study intermolecular interaction
energies, J. Chem. Phys. 122, p. 014103.
Hohenstein, E. G., Duan, J., and Sherrill, C. D. (2011a). Origin of the
surprising enhancement of electrostatic energies by electron-donating
substituents in substituted sandwich benzene dimers, J. Am. Chem. Soc.
133, pp. 13244–13247, doi:10.1021/ja204294q.
Hohenstein, E. G., Parrish, R. M., Sherrill, C. D., Turney, J. M., and
Schaefer, H. F. (2011b). Large-scale symmetry-adapted perturbation
theory computations via density fitting and Laplace transformation
techniques: Investigating the fundamental forces of DNA-intercalator
interactions, J. Chem. Phys. 135, p. 174107, doi:10.1063/1.3656681.
Hohenstein, E. G., and Sherrill, C. D. (2009). Effects of heteroatoms on
aromatic π-π interactions: Benzene-pyridine and pyridine dimer, J.
Phys. Chem. A 113, pp. 878–886, doi:10.1021/jp809062x.
Hohenstein, E. G., and Sherrill, C. D. (2010). Density fitting and Cholesky
decomposition approximations in symmetry-adapted perturbation
theory: Implementation and application to probe the nature of π–π
interactions in linear acenes, J. Chem. Phys. 132, p. 184111, doi:
10.1063/1.3426316.
Hohenstein, E. G., and Sherrill, C. D. (2012). Wavefunction methods for
noncovalent interactions, WIREs Comput. Mol. Sci. 2, pp. 304–326, doi:
10.1002/wcms.84.
Hopkins, B. W., and Tschumper, G. S. (2004). Ab initio studies of π···π
interactions: The effects of quadruple excitations, J. Phys. Chem. A 108,
15, pp. 2941–2948.
Hostaš, J., Řezáč, J., and Hobza, P. (2013). On the performance of
the semiempirical quantum mechanical PM6 and PM7 methods for
noncovalent interactions, Chem. Phys. Lett. 568, pp. 161–166, doi:10.
1016/j.cplett.2013.02.069.
Hunter, C. A. (1993). Arene-arene interactions: Electrostatic or charge
transfer? Angew. Chem., Int. Ed. Engl. 32, 11, pp. 1584–1586.
Hunter, C. A., and Sanders, J. K. M. (1990). The nature of π–π interactions, J.
Am. Chem. Soc. 112, 14, pp. 5525–5534.
Janowski, T., Ford, A. R., and Pulay, P. (2007). Parallel calculation of coupled
cluster singles and doubles wave functions using array files, J. Chem.
Theory Comput. 3, pp. 1368–1377, doi:10.1021/ct700048u.
Janowski, T., and Pulay, P. (2007). High accuracy benchmark calculations on
the benzene dimer potential energy surface, Chem. Phys. Lett. 447, pp.
27–32.
Janowski, T., and Pulay, P. (2008). Efficient parallel implementation of the
CCSD external exchange operator and the perturbative triples (T) energy
calculation, J. Chem. Theory Comput. 4, pp. 1585–1592, doi:10.1021/
ct800142f.
Jeziorski, B., Moszynski, R., and Szalewicz, K. (1994). Perturbation theory
approach to intermolecular potential energy surfaces of van der Waals
complexes, Chem. Rev. 94, pp. 1887–1930, doi:10.1021/cr00031a008.
Johnson, E. R., and Becke, A. D. (2006). A post-Hartree–Fock model of
intermolecular interactions: Inclusion of higher-order corrections, J.
Chem. Phys. 124, p. 174104, doi:10.1063/1.2190220.
Johnson, E. R., Otero-de-la-Roza, A., Dale, S. G., and DiLabio, G. A. (2013).
Efficient basis sets for non-covalent interactions in XDM-corrected
density-functional theory, J. Chem. Phys. 139, p. 214109, doi:10.1063/
1.4832325.
Jorgensen, W. L., Maxwell, D. S., and Tirado-Rives, J. (1996). Development and
testing of the OPLS all-atom force field on conformational energetics
and properties of organic liquids, J. Am. Chem. Soc. 118, pp. 11225–
11236.
Jurečka, P., Šponer, J., Černý, J., and Hobza, P. (2006). Benchmark database
of accurate (MP2 and CCSD(T) complete basis set limit) interaction
energies of small model complexes, DNA base pairs, and amino acid
pairs, Phys. Chem. Chem. Phys. 8, pp. 1985–1993.
Kahn, K., and Bruice, T. C. (2002). Parameterization of OPLS-AA force field
for the conformational analysis of macrocyclic polyketides, J. Comput.
Chem. 23, pp. 977–996, doi:10.1002/jcc.10051.
Kairys, V., and Jensen, J. H. (1999). Evaluation of the charge penetration
energy between non-orthogonal molecular orbitals using the spherical
Gaussian overlap approximation, Chem. Phys. Lett. 315, pp. 140–144.
Kaminski, G. A., Friesner, R. A., Tirado-Rives, J., and Jorgensen, W. L. (2001).
Evaluation and reparameterization of the OPLS-AA force field for
proteins via comparison with accurate quantum chemical calculations
on peptides, J. Phys. Chem. B 105, pp. 6474–6487.
Kendall, R. A., Dunning, T. H., and Harrison, R. J. (1992). Electron affinities of
the first-row atoms revisited. Systematic basis sets and wave functions,
J. Chem. Phys. 96, pp. 6796–6806.
Khaliullin, R. Z., Cobar, E. A., Lochan, R. C., Bell, A. T., and Head-Gordon,
M. (2007). Unravelling the origin of intermolecular interactions using
absolutely localized molecular orbitals, J. Phys. Chem. A 111, pp. 8753–
8765, doi:10.1021/jp073685z.
Kim, J., and Kim, K. S. (1998). Structures, binding energies, and spectra
of isoenergetic water hexamer clusters: Extensive ab initio studies, J.
Chem. Phys. 109, pp. 5886–5895, doi:10.1063/1.477211.
Kim, J. S., Lee, S., Cho, S. J., Mhin, B. J., and Kim, K. S. (1995). Structures,
energetics, and spectra of aqua-sodium(I): Thermodynamic effects and
nonadditive interactions, J. Chem. Phys. 102, pp. 839–849, doi:10.1063/
1.469199.
Kim, K. S., Mhin, B. J., Choi, U.-S., and Lee, K. (1992). Ab initio studies of the
water dimer using large basis sets: The structure and thermodynamic
energies, J. Chem. Phys. 97, pp. 6649–6662, doi:10.1063/1.463669.
Kim, K. S., Tarakeshwar, P., and Lee, J. Y. (2000). Molecular clusters of
π-systems: Theoretical studies of structures, spectra, and origin of
interaction energies, Chem. Rev. 100, 11, pp. 4145–4185.
Kitaura, K., and Morokuma, K. (1976). New energy decomposition scheme
for molecular-interactions within Hartree–Fock approximation, Int. J.
Quantum Chem. 10, pp. 325–340, doi:10.1002/qua.560100211.
Klopper, W., Lüthi, H. P., Brupbacher, T., and Bauder, A. (1994). Ab initio
computations close to the one-particle basis set limit on the weakly
bound van der Waals complexes benzene-neon and benzene-argon, J.
Chem. Phys. 101, 11, pp. 9747–9754.
Klopper, W., Noga, J., Koch, H., and Helgaker, T. (1997). Multiple basis sets
in calculations of triples corrections in coupled-cluster theory, Theor.
Chem. Acc. 97, pp. 164–176.
Koch, H., Fernández, B., and Christiansen, O. (1998). The benzene-argon
complex: A ground and excited state ab initio study, J. Chem. Phys. 108,
pp. 2784–2790, doi:10.1063/1.475669.
Kong, J., Gan, Z. T., Proynov, E., Freindorf, M., and Furlani, T. R. (2009).
Efficient computation of the dispersion interaction with density-
functional theory, Phys. Rev. A 79, p. 042510, doi:10.1103/PhysRevA.
79.042510.
Korth, M., Pitoňák, M., Řezáč, J., and Hobza, P. (2010). A transferable
H-bonding correction for semiempirical quantum-chemical methods, J.
Chem. Theory Comput. 6, pp. 344–352, doi:10.1021/ct900541n.
Kumar, R., Wang, F., Jenness, G. R., and Jordan, K. D. (2010). A second
generation distributed point polarizable water model, J. Chem. Phys.
132, p. 014309, doi:10.1063/1.3276460.
Kus, T., Lotrich, V. F., and Bartlett, R. J. (2009). Parallel implementation of
the equation-of-motion coupled-cluster singles and doubles method
and application for radical adducts of cytosine, J. Chem. Phys. 130, p.
124122, doi:10.1063/1.3091293.
Landau, A., Khistyaev, K., Dolgikh, S., and Krylov, A. I. (2010). Frozen natural
orbitals for ionized states within equation-of-motion coupled-cluster
formalism, J. Chem. Phys. 132, p. 014109, doi:10.1063/1.3276630.
Langlet, J., Caillet, J., Berges, J., and Reinhardt, P. (2003). Comparison of two
ways to decompose intermolecular interactions for hydrogen-bonded
dimer systems, J. Chem. Phys. 118, pp. 6157–6166, doi:10.1063/1.
1558473.
Langreth, D. C., Dion, M., Rydberg, H., Schröder, E., Hyldgaard, P., and
Lundqvist, B. I. (2005). van der Waals density functional theory with
applications, Int. J. Quantum Chem. 101, pp. 599–610, doi:10.1002/qua.
20315.
Lao, K. U., and Herbert, J. M. (2012). Breakdown of the single-exchange
approximation in third-order symmetry-adapted perturbation theory,
J. Phys. Chem. A 116, pp. 3042–3047, doi:10.1021/jp300109y.
Lee, K., Murray, É., Kong, L., Lundqvist, B. I., and Langreth, D. C. (2010).
Higher-accuracy van der Waals density functional, Phys. Rev. B 82, p.
081101(R), doi:10.1103/PhysRevB.82.081101.
Lee, T. J., and Scuseria, G. E. (1995). Achieving chemical accuracy with
coupled-cluster theory, in S. R. Langhoff (ed.), Quantum Mechanical
Electronic Structure Calculations with Chemical Accuracy (Kluwer
Academic Publishers, Dordrecht), pp. 47–108.
Li, X., Volkov, A. V., Szalewicz, K., and Coppens, P. (2006). Interaction
energies between glycopeptide antibiotics and substrates in complexes
determined by x-ray crystallography: Application of a theoretical
databank of aspherical atoms and a symmetry-adapted perturbation
theory-based set of interatomic potentials, Acta Crystallogr. D62, pp.
639–647, doi:10.1107/S0907444906013072.
Lillestolen, T. C., and Wheatley, R. J. (2008). Redefining the atom: Atomic
charge densities produced by an iterative stockholder approach, Chem.
Comm. 350, pp. 5909–5911, doi:10.1039/b812691g.
Lillestolen, T. C., and Wheatley, R. J. (2009). Atomic charge densities
generated using an iterative stockholder procedure, J. Chem. Phys. 131,
p. 144101, doi:10.1063/1.3243863.
Lindorff-Larsen, K., Piana, S., Palmo, K., Maragakis, P., Klepeis, J. L., Dror, R. O.,
and Shaw, D. E. (2010). Improved side-chain torsion potentials for the
Amber ff99SB protein force field, Proteins 78, pp. 1950–1958, doi:10.
1002/prot.22711.
Lotrich, V., Flocke, N., Ponton, M., Yau, A. D., Perera, A., Deumens, E., and
Bartlett, R. J. (2008). Parallel implementation of electronic structure
energy, gradient, and Hessian calculations, J. Chem. Phys. 128, p.
194104, doi:10.1063/1.2920482.
MacKerell, A. D., Wiórkiewicz-Kuczera, J., and Karplus, M. (1995). An all-atom
empirical energy function for the simulation of nucleic acids, J. Am.
Chem. Soc. 117, pp. 11946–11975.
Mackie, I. D., and DiLabio, G. A. (2011). Approximations to complete basis
set-extrapolated, highly correlated non-covalent interaction energies, J.
Chem. Phys. 135, p. 134318, doi:10.1063/1.3643839.
Marshall, M. S., Burns, L. A., and Sherrill, C. D. (2011). Basis set
convergence of the coupled-cluster correction, $\delta_{\mathrm{MP2}}^{\mathrm{CCSD(T)}}$: Best practices
for benchmarking non-covalent interactions and the attendant revision
of the S22, NBC10, HBC6, and HSG databases, J. Chem. Phys. 135, p.
194102, doi:10.1063/1.3659142.
Marshall, M. S., and Sherrill, C. D. (2011). Dispersion-weighted explicitly cor-
related coupled-cluster theory [DW-CCSD(T**)-F12], J. Chem. Theory
Comput. 7, pp. 3978–3982, doi:10.1021/ct200600p.
Mas, E. M., Bukowski, R., and Szalewicz, K. (2003). Ab initio three-body
interactions for water. I. Potential and structure of water trimer, J. Chem.
Phys. 118, pp. 4386–4403, doi:10.1063/1.1542871.
Mas, E. M., Bukowski, R., Szalewicz, K., Groenenboom, G. C., Wormer,
P. E. S., and van der Avoird, A. (2000). Water pair potential of
near spectroscopic accuracy. I. Analysis of potential surface and
virial coefficients, J. Chem. Phys. 113, pp. 6687–6701, doi:10.1063/1.
1311289.
Mas, E. M., Szalewicz, K., Bukowski, R., and Jeziorski, B. (1997). Pair potential
for water from symmetry-adapted perturbation theory, J. Chem. Phys.
107, pp. 4207–4218, doi:10.1063/1.474795.
McDaniel, J. G., and Schmidt, J. R. (2012). Robust, transferable, and physically
motivated force fields for gas adsorption in functionalized zeolitic
imidazolate frameworks, J. Phys. Chem. C 116, pp. 14031–14039, doi:
10.1021/jp303790r.
McDaniel, J. G., and Schmidt, J. R. (2013). Physically-motivated force fields
from symmetry-adapted perturbation theory, J. Phys. Chem. A 117, pp.
2053–2066, doi:10.1021/jp3108182.
McDaniel, J. G., Yu, K., and Schmidt, J. R. (2012). Ab initio, physically
motivated force fields for CO2 adsorption in zeolitic imidazolate
frameworks, J. Phys. Chem. C 116, pp. 1892–1903, doi:10.1021/
jp209335y.
Merz, K. M. (2010). Limits of free energy computation for protein-ligand
interactions, J. Chem. Theory Comput. 6, pp. 1769–1776, doi:10.1021/
ct100102q.
Misquitta, A. J., Podeszwa, R., Jeziorski, B., and Szalewicz, K. (2005). Inter-
molecular potentials based on symmetry-adapted perturbation theory
with dispersion energies from time-dependent density-functional
calculations, J. Chem. Phys. 123, p. 214103, doi:10.1063/1.2135288.
Misquitta, A. J., and Stone, A. J. (2008). Accurate induction energies for small
organic molecules: 1. Theory, J. Chem. Theory Comput. 4, pp. 7–18, doi:
10.1021/ct700104t.
Misquitta, A. J., Stone, A. J., and Price, S. L. (2008a). Accurate induction
energies for small organic molecules. 2. Development and testing of
distributed polarizability models against SAPT(DFT) energies, J. Chem.
Theory Comput. 4, pp. 19–32, doi:10.1021/ct700105f.
Misquitta, A. J., Welch, G. W. A., Stone, A. J., and Price, S. L. (2008b). A first
principles prediction of the crystal structure of C6Br2ClFH2, Chem. Phys.
Lett. 456, pp. 105–109, doi:10.1016/j.cplett.2008.02.113.
Morokuma, K. (1971). Molecular orbital studies of hydrogen bonds. III. C=O···H-O
hydrogen bond in H2CO···H2O and H2CO···2H2O, J. Chem.
Phys. 55, p. 1236, doi:10.1063/1.1676210.
Murdachaew, G., Misquitta, A. J., Bukowski, R., and Szalewicz, K. (2001).
Intermolecular potential energy surfaces and spectra of Ne-HCN
complex from ab initio calculations, J. Chem. Phys. 114, pp. 764–779,
doi:10.1063/1.1331101.
Murrell, J. N., and Teixeira, J. J. (1970). Dependence of exchange energy on or-
bital overlap, Mol. Phys. 19, p. 521, doi:10.1080/00268977000101531.
Neese, F., Wennmohs, F., and Hansen, A. (2009). Efficient and accurate local
approximations to coupled-electron pair approaches: An attempt to
revive the pair natural orbital method, J. Chem. Phys. 130, p. 114108,
doi:10.1063/1.3086717.
Neogrády, P., Pitoňák, M., and Urban, M. (2005). Optimized virtual orbitals
for correlated calculations: An alternative approach, Mol. Phys. 103, pp.
2141–2157.
Ng, K.-C., Meath, W. J., and Allnatt, A. R. (1976). Charge overlap effects
and validity of multipole results for 1st-order molecule-molecule
interaction energies: Formalism and an application to H2-H2, Mol. Phys.
32, pp. 177–194, doi:10.1080/00268977600101711.
Papajak, E., and Truhlar, D. G. (2011). Convergent partially augmented basis
sets for post-Hartree–Fock calculations of molecular properties and
reaction barrier heights, J. Chem. Theory Comput. 7, pp. 10–18, doi:
10.1021/ct1005533.
Parrish, R. M., and Sherrill, C. D. (2014). Spatial assignment of sym-
metry adapted perturbation theory interaction energy components:
The atomic SAPT partition, J. Chem. Phys. 141, p. 044115, doi:
10.1063/1.4889855.
Parker, T. M., Burns, L. A., Parrish, R. M., Ryno, A. G., and Sherrill, C. D. (2014).
Levels of symmetry adapted perturbation theory (SAPT). I. Efficiency
and performance for interaction energies, J. Chem. Phys. 140, p. 094106,
doi:10.1063/1.4867135.
Parker, T. M., Hohenstein, E. G., Parrish, R. M., Hud, N. V., and Sherrill, C. D.
(2013). Quantum-mechanical analysis of the energetic contributions to
π stacking in nucleic acids versus rise, twist, and slide, J. Am. Chem. Soc.
135, pp. 1306–1316, doi:10.1021/ja3063309.
Paton, R. S., and Goodman, J. M. (2009). Hydrogen bonding and pi-stacking:
How reliable are force fields? A critical evaluation of force field
descriptions of nonbonded interactions, J. Chem. Inf. Model. 49, pp. 944–
955, doi:10.1021/ci900009f.
Pedersen, T. B., Sánchez de Merás, A. M. J., and Koch, H. (2004). Polarizability
and optical rotation calculated from the approximate coupled cluster
singles and doubles CC2 linear response theory using Cholesky
decompositions, J. Chem. Phys. 120, pp. 8887–8897, doi:10.1063/1.
1705575.
Piquemal, J.-P., Cisneros, G. A., Reinhardt, P., Gresh, N., and Darden, T. A.
(2006). Towards a force field based on density fitting, J. Chem. Phys.
124, p. 104101, doi:10.1063/1.2173256.
Piquemal, J.-P., Gresh, N., and Giessner-Prettre, C. (2003). Improved
formulas for the calculation of the electrostatic contribution to the
intermolecular interaction energy from multipolar expansion of the
electronic distribution, J. Phys. Chem. A 107, pp. 10353–10359, doi:
10.1021/jp035748t.
Pitoňák, M., Aquilante, F., Hobza, P., Neogrády, P., Noga, J., and Urban, M.
(2011). Parallelized implementation of the CCSD(T) method in MOLCAS
using optimized virtual orbitals space and Cholesky decomposed two-
electron integrals, Collect. Czech. Chem. Commun. 76, pp. 713–742, doi:
10.1135/cccc2011048.
Pitoňák, M., Neogrády, P., Řezáč, J., Jurečka, P., Urban, M., and Hobza,
P. (2008). Benzene dimer: High-level wave function and density
functional theory calculations, J. Chem. Theory Comput. 4, pp. 1829–
1834.
Podeszwa, R., Bukowski, R., Rice, B. M., and Szalewicz, K. (2007).
Potential energy surface for cyclotrimethylene trinitramine dimer from
symmetry-adapted perturbation theory, Phys. Chem. Chem. Phys. 9, pp.
5561–5569, doi:10.1039/b709192c.
Podeszwa, R., Bukowski, R., and Szalewicz, K. (2006). Potential energy
surface for the benzene dimer and perturbational analysis of π-π
interactions, J. Phys. Chem. A 110, pp. 10345–10354.
Podeszwa, R., Patkowski, K., and Szalewicz, K. (2010). Improved interaction
energy benchmarks for dimers of biological relevance, Phys. Chem.
Chem. Phys. 12, pp. 5974–5979, doi:10.1039/b926808a.
Ponder, J. W., Wu, C., Ren, P., Pande, V. S., Chodera, J. D., Schnieders, M. J.,
Haque, I., Mobley, D. L., Lambrecht, D. S., DiStasio, R. A., Head-Gordon,
M., Clark, G. N. I., Johnson, M. E., and Head-Gordon, T. (2010). Current
status of the AMOEBA polarizable force field, J. Phys. Chem. B 114, pp.
2549–2564, doi:10.1021/jp910674d.
Prochnow, E., Harding, M. E., and Gauss, J. (2010). Parallel calculation of
CCSDT and Mk-MRCCSDT energies, J. Chem. Theory Comput. 6, pp.
2339–2347, doi:10.1021/ct1002016.
Raghavachari, K., Trucks, G. W., Pople, J. A., and Head-Gordon, M. (1989).
A 5th-order perturbation comparison of electron correlation theories,
Chem. Phys. Lett. 157, pp. 479–483.
Řezáč, J., and Hobza, P. (2013). Describing noncovalent interactions beyond
the common approximations: How accurate is the gold standard,
CCSD(T) at the complete basis set limit? J. Chem. Theory Comput. 9, pp.
2151–2155, doi:10.1021/ct400057w.
Řezáč, J., Riley, K. E., and Hobza, P. (2011a). Extensions of the S66
data set: More accurate interaction energies and angular-displaced
nonequilibrium geometries, J. Chem. Theory Comput. 7, pp. 3466–3470,
doi:10.1021/ct200523a.
Řezáč, J., Riley, K. E., and Hobza, P. (2011b). S66: A well-balanced database
of benchmark interaction energies relevant to biomolecular structures,
J. Chem. Theory Comput. 7, pp. 2427–2438, doi:10.1021/ct2002946.
Riley, K. E., Platts, J. A., Řezáč, J., Hobza, P., and Hill, J. G. (2012). Assessment
of the performance of MP2 and MP2 variants for the treatment of
noncovalent interactions, J. Phys. Chem. A 116, pp. 4159–4169, doi:
10.1021/jp211997b.
Ringer, A. L., Sinnokrot, M. O., Lively, R. P., and Sherrill, C. D. (2006). The effect
of multiple substituents on sandwich and T-shaped π–π interactions,
Chem. Eur. J. 12, pp. 3821–3828, doi:10.1002/chem.200501316.
Riplinger, C., Sandhoefer, B., Hansen, A., and Neese, F. (2013). Natural
triple excitations in local coupled cluster calculations with pair natural
orbitals, J. Chem. Phys. 139, p. 134101, doi:10.1063/1.4821834.
Saebø, S., and Pulay, P. (1985). Local configuration interaction: An efficient
approach for larger molecules, Chem. Phys. Lett. 113, pp. 13–18.
Saebø, S., and Pulay, P. (1993). Local treatment of electron correlation, Annu.
Rev. Phys. Chem. 44, pp. 213–236.
Salonen, L. M., Ellermann, M., and Diederich, F. (2011). Aromatic rings in
chemical and biological recognition: Energetics and structures, Angew.
Chem., Int. Ed. Engl. 50, pp. 4808–4842, doi:10.1002/anie.201007560.
Schütz, M., Brdarski, S., Widmark, P. O., Lindh, R., and Karlström, G. (1997).
The water dimer interaction energy: Convergence to the basis set limit
at the correlated level, J. Chem. Phys. 107, pp. 4597–4605, doi:10.1063/
1.474820.
Schütz, M., and Werner, H.-J. (2000). Local perturbative triples correction
(T) with linear cost scaling, Chem. Phys. Lett. 318, pp. 370–378, doi:
10.1016/S0009-2614(00)00066-X.
Schwabe, T., and Grimme, S. (2007). Double-hybrid density functionals
with long-range dispersion corrections: Higher accuracy and extended
applicability, Phys. Chem. Chem. Phys. 9, pp. 3397–3406, doi:10.1039/
b704725h.
Sherrill, C. D. (2013). Energy component analysis of π interactions, Acc.
Chem. Res. 46, pp. 1020–1028, doi:10.1021/ar3001124.
Sherrill, C. D., Sumpter, B. G., Sinnokrot, M. O., Marshall, M. S., Hohenstein,
E. G., Walker, R. C., and Gould, I. R. (2009a). Assessment of standard
force field models against high-quality ab initio potential curves for
prototypes of π–π, CH/π, and SH/π interactions, J. Comput. Chem. 30,
pp. 2187–2193, doi:10.1002/jcc.21226.
Sherrill, C. D., Takatani, T., and Hohenstein, E. G. (2009b). An assessment
of theoretical methods for nonbonded interactions: Comparison to
complete basis set limit coupled-cluster potential energy curves for the
benzene dimer, the methane dimer, benzene-methane, and benzene-H2S,
J. Phys. Chem. A 113, pp. 10146–10159, doi:10.1021/jp9034375.
Sinnokrot, M. O., and Sherrill, C. D. (2003). Unexpected substituent effects
in face-to-face π-stacking interactions, J. Phys. Chem. A 107, pp. 8377–
8379, doi:10.1021/jp030880e.
Sinnokrot, M. O., and Sherrill, C. D. (2004). Highly accurate coupled cluster
potential energy curves for benzene dimer: The sandwich, T-shaped,
and parallel-displaced configurations, J. Phys. Chem. A 108, 46, pp.
10200–10207, doi:10.1021/jp0469517.
Sinnokrot, M. O., Valeev, E. F., and Sherrill, C. D. (2002). Estimates of the ab
initio limit for π–π interactions: The benzene dimer, J. Am. Chem. Soc.
124, pp. 10887–10893, doi:10.1021/ja025896h.
Slipchenko, L. V., and Gordon, M. S. (2009). Damping functions in the
effective fragment potential method, Mol. Phys. 107, pp. 999–1016, doi:
10.1080/00268970802712449.
Sosa, C., Geertsen, J., Trucks, G. W., Bartlett, R. J., and Franz, J. A. (1989).
Selection of the reduced virtual space for correlated calculations - an
application to the energy and dipole-moment of H2O, Chem. Phys. Lett.
159, pp. 148–154, doi:10.1016/0009-2614(89)87399-3.
Steinmann, S. N., and Corminboeuf, C. (2010). A system-dependent density-
based dispersion correction, J. Chem. Theory Comput. 6, pp. 1990–2001,
doi:10.1021/ct1001494.
Steinmann, S. N., and Corminboeuf, C. (2011). A generalized-gradient
approximation exchange hole model for dispersion coefficients, J. Chem.
Phys. 134, p. 044117, doi:10.1063/1.3545985.
Stephens, P. J., Devlin, F. J., Chabalowski, C. F., and Frisch, M. J. (1994).
Ab Initio calculation of vibrational absorption and circular dichroism
spectra using density functional force fields, J. Phys. Chem. 98, pp.
11623–11627.
Stevens, W. J., and Fink, W. H. (1987). Frozen fragment reduced variational
space analysis of hydrogen bonding interactions. Application to
the water dimer, Chem. Phys. Lett. 139, pp. 15–22, doi:10.1016/
0009-2614(87)80143-4.
Stone, A. J. (1981). Distributed multipole analysis, or how to describe a
molecular charge-distribution, Chem. Phys. Lett. 83, pp. 233–239, doi:
10.1016/0009-2614(81)85452-8.
Stone, A. J. (1996). The Theory of Intermolecular Forces (Oxford University
Press, Oxford).
Stone, A. J. (2011). Electrostatic damping functions and the penetration
energy, J. Phys. Chem. A 115, pp. 7017–7027, doi:10.1021/jp112251z.
Stone, A. J., and Alderton, M. (1985). Distributed multipole analysis -
methods and applications, Mol. Phys. 56, pp. 1047–1064, doi:10.1080/
00268978500102891.
Stone, A. J., and Misquitta, A. J. (2007). Atom-atom potentials from ab
initio calculations, Int. Rev. Phys. Chem. 26, pp. 193–222, doi:10.1080/
01442350601081931.
Szalewicz, K. (2012). Symmetry-adapted perturbation theory of intermolec-
ular forces, WIREs Comput. Mol. Sci. 2, pp. 254–272, doi:10.1002/wcms.
86.
Takatani, T., Hohenstein, E. G., Malagoli, M., Marshall, M. S., and Sherrill, C. D.
(2010). Basis set consistent revision of the S22 test set of noncovalent
interaction energies, J. Chem. Phys. 132, p. 144104, doi:10.1063/1.
3378024.
Takatani, T., and Sherrill, C. D. (2007). Performance of spin-component-
scaled Møller-Plesset theory (SCS-MP2) for potential energy curves of
noncovalent interactions, Phys. Chem. Chem. Phys. 9, pp. 6106–6114,
doi:10.1039/b709669k.
Tang, K. T., and Toennies, J. P. (1984). An improved simple model for the
van der Waals potential based on universal damping functions for the
dispersion coefficients, J. Chem. Phys. 80, pp. 3726–3741, doi:10.1063/
1.447150.
Taube, A. G., and Bartlett, R. J. (2005). Frozen natural orbitals: Systematic
basis set truncation for coupled-cluster theory, Collect. Czech. Chem.
Commun. 70, pp. 837–850.
Taube, A. G., and Bartlett, R. J. (2008). Frozen natural orbital coupled-
cluster theory: Forces and application to decomposition of nitroethane,
J. Chem. Phys. 128, p. 164101, doi:10.1063/1.2902285.
Thanthiriwatte, K. S., Hohenstein, E. G., Burns, L. A., and Sherrill, C. D.
(2011). Assessment of the performance of DFT and DFT-D methods
for describing distance dependence of hydrogen-bonded interactions,
J. Chem. Theory Comput. 7, pp. 88–96, doi:10.1021/ct100469b.
Torheyden, M., and Jansen, G. (2006). A new potential energy surface for
the water dimer obtained from separate fits of ab initio electrostatic,
induction, dispersion and exchange energy contributions, Mol. Phys.
104, pp. 2101–2138, doi:10.1080/00268970600679188.
Torres, E., and DiLabio, G. A. (2012). A (nearly) universally applicable
method for modeling noncovalent interactions using B3LYP, J. Phys.
Chem. Lett. 3, pp. 1738–1744, doi:10.1021/jz300554y.
Tsuzuki, S., Honda, K., Uchimaru, T., Mikami, M., and Tanabe, K. (2002).
Origin of attraction and directionality of the π–π interaction: Model
chemistry calculations of benzene dimer interaction, J. Am. Chem. Soc.
124, 1, pp. 104–112.
Volkov, A., Koritsanszky, T., and Coppens, P. (2004). Combination of the
exact potential and multipole methods (EP/MM) for evaluation of
intermolecular electrostatic interaction energies with pseudoatom
representation of molecular electron densities, Chem. Phys. Lett. 391,
pp. 170–175, doi:10.1016/j.cplett.2004.04.097.
von Lilienfeld, O. A., Tavernelli, I., Rothlisberger, U., and Sebastiani, D.
(2004). Optimization of effective atom centered potentials for London
dispersion forces in density functional theory, Phys. Rev. Lett. 93, 15, p.
153004.
Vydrov, O. A., and Van Voorhis, T. (2010). Nonlocal van der Waals density
functional: The simpler the better, J. Chem. Phys. 133, p. 244103, doi:
10.1063/1.3521275.
Wallqvist, A., Ahlström, P., and Karlström, G. (1990). A new intermolecular
energy calculation scheme - applications to potential surface and liquid
properties of water, J. Phys. Chem. 94, pp. 1649–1656, doi:10.1021/
j100367a078.
Wallqvist, A., Ahlström, P., and Karlström, G. (1991). Erratum to a new
intermolecular energy calculation scheme: Applications to potential
surface and liquid properties of water, J. Phys. Chem. 95, p. 4922, doi:
10.1021/j100165a060.
Wang, B., and Truhlar, D. G. (2010). Including charge penetration effects in
molecular modeling, J. Chem. Theory Comput. 6, pp. 3330–3342, doi:
10.1021/ct1003862.
Wang, F., Kumar, R., and Jordan, K. D. (2012). A distributed point polarizable
force field for carbon dioxide, Theor. Chem. Acc. 131, p. 1132, doi:
10.1007/s00214-012-1132-z.
Wang, J. M., Wolf, R. M., Caldwell, J. W., Kollman, P. A., and Case, D. A. (2004).
Development and testing of a general AMBER force field, J. Comput.
Chem. 25, pp. 1157–1174, doi:10.1002/jcc.20035.
Watt, M., Hardebeck, L. K. E., Kirkpatrick, C. C., and Lewis, M. (2011). Face-to-
face arene-arene binding energies: Dominated by dispersion but pre-
dicted by electrostatic and dispersion/polarizability substituent con-
stants, J. Am. Chem. Soc. 133, pp. 3854–3862, doi:10.1021/ja105975a.
Werner, H.-J., and Schütz, M. (2011). An efficient local coupled cluster
method for accurate thermochemistry of large systems, J. Chem. Phys.
135, p. 144116, doi:10.1063/1.3641642.
Wheatley, R. J. (1993). Gaussian multipole functions for describing mole-
cular charge distributions, Mol. Phys. 79, pp. 597–610, doi:10.1080/
00268979300101481.
Wheatley, R. J., and Mitchell, J. B. O. (1994). Gaussian multipoles in practice
- electrostatic energies for intermolecular potentials, J. Comput. Chem.
15, pp. 1187–1198, doi:10.1002/jcc.540151102.
Whitten, J. L. (1973). Coulombic potential-energy integrals and approxima-
tions, J. Chem. Phys. 58, pp. 4496–4501, doi:10.1063/1.1679012.
Williams, H. L., Szalewicz, K., Jeziorski, B., Moszynski, R., and Rybak, S.
(1993). Symmetry-adapted perturbation theory calculation of the Ar-
H2 intermolecular potential energy surface, J. Chem. Phys. 98, pp. 1279–
1292.
Wu, Q., and Yang, W. (2002). Empirical correction to density functional
theory for van der Waals interactions, J. Chem. Phys. 116, pp. 515–524.
Xu, X., and Goddard, W. A. (2004). The X3LYP extended density functional for
accurate descriptions of nonbond interactions, spin states, and thermo-
chemical properties, Proc. Natl. Acad. Sci. USA 101, 9, pp. 2673–2677.
Zhang, Y., Xu, X., and Goddard, W. A. (2009). Doubly hybrid density functional
for accurate descriptions of nonbond interactions, thermochemistry,
and thermochemical kinetics, Proc. Natl. Acad. Sci. USA 106, pp. 4963–
4968.
January 27, 2016 13:13 PSP Book - 9in x 6in 04-Qiang-Cui-c04

Chapter 4

Force Field Development with Density-Based Energy Decomposition Analysis

Nengjie Zhou,a Qin Wu,b and Yingkai Zhanga,c


a Department of Chemistry, New York University, New York 10003, USA
b Center for Functional Nanomaterials, Brookhaven National Laboratory,
Upton, New York 11973, USA
c NYU-ECNU Center for Computational Chemistry at NYU Shanghai,
Shanghai 200062, China
[email protected]

4.1 Introduction

With recent advances in computer hardware and computational
methods, molecular modeling has been increasingly employed
to simulate biomolecules and materials. It has proven powerful in
elucidating structure-dynamics-function relationships [1] and in
facilitating the design of new drugs [2], catalysts, and materials [3].
The foundation of molecular modeling is ab initio quantum mechanics,
which in principle provides the most rigorous potential energy
surface for describing a molecular system; however, its applicability
to large systems
will remain severely limited for the foreseeable future because of its
high computational cost. Currently, molecular mechanical force fields
are the most widely applied method for modeling molecular processes
that do not involve chemical reactions, such as protein folding,
biomolecular recognition, and macromolecular assembly. Meanwhile,
it has long been recognized that limitations in currently available
force fields hamper the power of molecular modeling.
In this chapter, we summarize our recent efforts to examine and
develop force fields with high level ab initio quantum mechanical
calculations and the newly developed density-based energy
decomposition analysis method.

4.2 Density-Based Energy Decomposition Analysis

Pioneered by the Morokuma analysis in the 1970s, a variety of energy
decomposition analysis (EDA) approaches [4–15] based on high level
QM calculations have been developed, and they have proven to be
powerful tools for studying and analyzing intermolecular interactions.
EDA breaks down the total QM interaction energy into physically
meaningful terms such as electrostatics, Pauli repulsion (or exchange),
polarization (or induction), and charge transfer. Alternatively, these
terms may be calculated directly from perturbation theory [16, 17].
The definitions of these physical terms often vary among EDA methods.
Nonetheless, their successes have stimulated the development of a
new generation of quantum mechanical force fields, such as SIBFA
[18–21], EFP [22–25], GEM [26, 27], QMPFF [28–32], X-Pol [33, 34],
and mDC [35–37]. For most wave function–based EDA approaches,
a key intermediate state, which is used to separate polarization
and charge transfer contributions from the total interaction energy,
is represented by the Heitler–London (HL) antisymmetrization of
the two fragments’ wave functions, whose corresponding density does
not equal the sum of the two fragments’ densities. Very recently, Wu
et al. [38] developed a purely density-based energy decomposition
analysis (DEDA) method within the framework of density functional
theory, in which the corresponding intermediate state is variationally
determined through a constrained search to reproduce the sum of the
two fragments’ densities. Thus, this DEDA approach provides a new
tool to examine intermolecular interactions and may have some unique
advantages for force field development over the wave function–based
EDA methods. Below we first summarize the DEDA approach and its
implementation, then compare it with the wave function–based EDA
methods, and finally illustrate its application to the directional
dependence of hydrogen bonding.

4.2.1 The DEDA Approach


The density-based energy decomposition analysis (DEDA) method
was recently developed by Wu et al. [38] for intermolecular
interactions, as illustrated in Fig. 4.1. As in other energy
decomposition analysis approaches, determining the binding energy
components between two isolated molecules, A (with density $\rho_A^0$)
and B (with density $\rho_B^0$), and their binding complex AB (with
density $\rho_{AB}$) involves two constructed intermediate states.

Figure 4.1 Illustration of the density-based energy decomposition analysis
(DEDA) scheme. Reprinted with permission from Lu, Z., Zhou, N., Wu,
Q. and Zhang, Y. Directional dependence of hydrogen bonds: A density-
based energy decomposition analysis and its implications on force field
development. J Chem Theory Comput 7, 4038–4049 (2011). Copyright
(2011) American Chemical Society.

4.2.1.1 The frozen density energy


The first intermediate state is called the frozen density state, in which
the two fragments, A and B, are allowed to approach each other
without distorting their densities. As a result, the density of the
frozen state is simply a superposition of the original densities of
A and B ($\rho = \rho_A^0 + \rho_B^0$). The energy difference between
the initial state and this frozen state is called the frozen density
energy ($E_{\mathrm{frz}}$), which consists of both electrostatic
and van der Waals interactions:

\[ E_{\mathrm{frz}} = E_{\mathrm{es}} + E_{\mathrm{vdW}}. \tag{4.1} \]

4.2.1.2 The electronic relaxation energy


In the second intermediate state, the density on each fragment is
allowed to relax, subject to the constraint that the number of
electrons on each fragment remains the same as in the frozen density
state ($N_A^0$ and $N_B^0$, respectively); that is, no charge transfer
is allowed. The energy difference between intermediate states I and II
gives the polarization component ($E_{\mathrm{pol}}$).
The final step is to allow the density to relax fully and charge to flow
between the two fragments; the resulting energy lowering yields the
charge transfer component ($E_{\mathrm{ct}}$). The polarization and
charge transfer energies add up to the electronic relaxation energy
($E_{\mathrm{relax}}$):

\[ E_{\mathrm{relax}} = E_{\mathrm{pol}} + E_{\mathrm{ct}}. \tag{4.2} \]

4.2.1.3 The total binding energy


The basis set superposition error (BSSE) can be eliminated by
employing BSSE-corrected fragment energies and densities obtained
with the standard counterpoise method [40]. The total BSSE-corrected
binding energy in the DEDA approach separates into two components,
the frozen density term and the electronic relaxation term,

\[ E_{\mathrm{bind}}^{\mathrm{BSSE}} = E[\rho_{AB}] - E[\rho_A^0] - E[\rho_B^0] = E_{\mathrm{frz}} + E_{\mathrm{relax}}. \tag{4.3} \]

4.2.1.4 The implementation of DEDA


The implementation of DEDA differs critically from other EDA methods
in that the energies of all intermediate states are variationally
determined, which to the best of our knowledge had not been achieved
before. For the frozen density energy, the optimization is done through
a constrained search formulation in DFT, i.e.,
$E[\rho_A^0 + \rho_B^0] = \min_{\rho \to \rho_A^0 + \rho_B^0} E[\rho]$,
and implemented with the Wu–Yang (WY) algorithm [38, 41]. The auxiliary
wave function (a Slater determinant) is therefore optimized to reproduce
the frozen density while giving the lowest possible energy for a chosen
energy functional. For the charge-localized state, i.e., before charge
transfer, the energy is optimized using constrained DFT [42]. We use
real-space, atom-size-adjusted Becke integration cells to define charge
populations and count only the charge transfer resulting from the net
population change between the frozen density and the final density.
DEDA of intermolecular interactions therefore involves four
steps of variational calculations, all of which can be done with an in-
house modified version of NWChem [43]. As illustrated in Fig. 4.1,
these four steps are (1) regular DFT calculations of the fragments
(using all the basis functions of the complex), (2) a WY calculation
for the frozen density of the complex, which is built from the sum
of the fragments’ densities, (3) a constrained DFT calculation in which
the fragment charges are constrained to be the same as in the frozen
density, and (4) a regular DFT calculation of the complex.
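
The bookkeeping that turns the four calculations into energy components can be sketched as follows. This is only an illustration of Eqs. 4.1 to 4.3, not the in-house NWChem implementation: the class and function names are hypothetical, and the energies are made-up placeholder numbers (nominally in hartree) rather than computed results.

```python
from dataclasses import dataclass


@dataclass
class DedaEnergies:
    """Total energies from the four variational steps (hypothetical container)."""
    e_frag_a: float       # step 1: fragment A, computed in the full complex basis
    e_frag_b: float       # step 1: fragment B, computed in the full complex basis
    e_frozen: float       # step 2: WY constrained search at the summed density
    e_constrained: float  # step 3: constrained DFT, fragment charges fixed
    e_complex: float      # step 4: regular DFT on the complex


def decompose(e: DedaEnergies) -> dict:
    """Return the DEDA components as differences between successive states."""
    e_frz = e.e_frozen - e.e_frag_a - e.e_frag_b  # frozen density energy
    e_pol = e.e_constrained - e.e_frozen          # polarization
    e_ct = e.e_complex - e.e_constrained          # charge transfer
    e_relax = e_pol + e_ct                        # Eq. 4.2
    return {
        "frz": e_frz,
        "pol": e_pol,
        "ct": e_ct,
        "relax": e_relax,
        "bind": e_frz + e_relax,                  # Eq. 4.3
    }


# Placeholder numbers for illustration only; they are not from any calculation.
example = DedaEnergies(
    e_frag_a=-76.4000,
    e_frag_b=-76.4000,
    e_frozen=-152.8040,
    e_constrained=-152.8065,
    e_complex=-152.8080,
)
components = decompose(example)
print(components)
```

Because each state is obtained by lifting a constraint on the previous one, the polarization and charge transfer components come out non-positive whenever the calculations are converged.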

4.2.2 DEDA vs. EDA


The most important distinction between DEDA and wave
function–based EDA approaches [4–15] lies in the calculation of
the frozen density energy. We have explained above how DEDA
uses a constrained search to variationally calculate the energy of the
frozen density state, in which the fragments’ densities are superimposed
without distortion. This approach not only yields an optimal
$E_{\mathrm{frz}}$ separated from the density relaxation terms
($E_{\mathrm{pol}}$ and $E_{\mathrm{ct}}$) but also allows a clean
separation of the electrostatic and Pauli repulsion terms. The
corresponding intermediate states in wave function–based EDA
approaches are represented by the HL antisymmetrization of the two
fragments’ wave functions, which is necessary because molecular
orbitals from different fragments are not orthogonal. This
antisymmetrized wave function, however, deforms the frozen density
[12]; that is, its density does not correspond to the sum of the
fragments’ densities. This ambiguity makes it difficult to separate
the electrostatic and Pauli repulsion terms in other EDA approaches.
In addition, because the wave functions are antisymmetrized in a
single step, the resulting energy is not variational.
Another unique feature of DEDA [38] is its calculation of the charge
transfer component ($E_{\mathrm{ct}}$), which is also obtained
variationally and is based on the net electron flow in real space. This
net counting matches the classical view of charge transfer more closely,
and the real-space approach leads to a small basis set dependence. Force
field development can benefit from these features, because each
interaction component as defined in DEDA is more consistent with the
physical picture typically employed in the classical force field
description of intermolecular interactions.
To illustrate the key difference between DEDA and wave
function–based EDA (MO-EDA) approaches, we have carried out
calculations on He-He. As shown in Fig. 4.2, the DEDA results are
more consistent with physical understanding: the density relaxation
contribution to the total binding energy of a rare gas dimer, including
both charge transfer and polarization effects, should be minimal. This
follows from the unique feature of DEDA that the frozen density
interaction energy is cleanly separated from the relaxation terms
(charge transfer and polarization).

4.2.3 Directional Dependence of Hydrogen Bonding


As clearly described in the work of Baker et al. [44, 45], one
challenge for existing widely used force fields is their description
of hydrogen bonding directionality, that is, the direction from which
the hydrogen atom approaches the acceptor atom. The results
of high level quantum mechanical calculations agree closely with
observations in protein structures, while force fields yield very
different results.

Figure 4.2 Comparison between MO-EDA and DEDA for He-He. [Plot: energies
in kcal/mol versus dimer separation in angstroms (2.2 to 5.2); curves:
total binding, DEDA relax, MO-EDA relax.]

By employing the DEDA approach, we have systematically
investigated [39] the directional dependence of hydrogen bonding
with both the B3LYP and M06-2X functionals [46–51], both of which
describe structures and binding energies well for a variety of
hydrogen bonding systems according to recent extensive benchmarks
[51–54]. Our results clearly demonstrate that the frozen density
interaction energy term is the key factor in determining the hydrogen
bonding (HB) orientation, while the density relaxation energy term,
including both polarization and charge-transfer components, shows
very little HB directional dependence. This indicates that the
deficient description of HB orientation in current non-polarizable
force fields is not due to the lack of explicit polarization or
charge-transfer terms. This finding [39] is very different from the
current dominant view regarding the origin of hydrogen bonding
directionality, and cannot be obtained with wave function–based EDA
approaches. As shown in Fig. 4.3, there are three clear distinctions
between the MO-EDA and DEDA results: (1) $E_{\mathrm{frz}}$ from
MO-EDA is significantly smaller, which implies that the contribution
of the electronic relaxation energy to $E_{\mathrm{bind}}$ is
significantly larger for MO-EDA; (2) there is no strong correlation
between $E_{\mathrm{frz}}$ and $E_{\mathrm{bind}}$ for MO-EDA; (3)
$E_{\mathrm{frz}}$ varies much more among different DFT functionals
for MO-EDA than for DEDA. These distinctions clearly demonstrate
important novel features of the DEDA approach, and suggest that it
should have some unique advantages for force field development in
comparison with the wave function–based EDA methods.

Figure 4.3 Comparison of EDA frozen energies in kcal/mol along an angle
for water dimer (upper) and formamide dimer (lower) with DEDA (blue
curves) and MO-EDA (red curves). The MO-EDA employs the Heitler–London
(HL) antisymmetrization of two fragments’ wave functions to represent
the frozen density state. Reprinted with permission from Lu, Z., Zhou,
N., Wu, Q. and Zhang, Y. Directional dependence of hydrogen bonds: A
density-based energy decomposition analysis and its implications on force
field development. J Chem Theory Comput 7, 4038–4049 (2011). Copyright
(2011) American Chemical Society.

4.3 Smeared Charge Multipole Model for Electrostatics and Its Parameterization Protocol

4.3.1 Brief Summary of Current Electrostatic Models


The vast majority of force fields used in simulations today include
a Lennard–Jones 12-6 van der Waals term and a point Coulomb
electrostatic term, $q_i q_j / r$, where $i$ and $j$ label the two
interacting particles, $q$ denotes the point charge on a particle, and
$r$ is the distance between the two particles. The charges are usually
fitted to reproduce the electrostatic potentials around the particles
calculated with quantum mechanics. The Coulomb interaction between
point charges, however, neglects both anisotropy and charge
penetration effects, and its accuracy is therefore limited. Anisotropy
can be modeled either with off-center charges [55–60] or with
higher multipole moments [24, 61–63]. To account for charge
penetration [25, 62, 64–66], which is known to contribute significantly
to intermolecular electrostatic interactions at short range, usually
either a damping function based on smeared charges is introduced
[28, 29, 63–70] or the electron density is modeled more accurately
[71–73].

4.3.2 Going Beyond Point Charges: The Smeared Charge with Multipole Model

Here our strategy is to avoid using dimerization data in the
parameterization; instead, we derive all parameters solely from the
electrostatic properties of monomers [39]. Our parameterization
procedure can therefore be applied directly to any molecule. Six
electrostatic models were examined on the directional hydrogen
bonding problem: atomic point charge, off-center point charge, point
charge with distributed multipoles, atomic smeared charge, off-center
smeared charge, and smeared charge with distributed multipoles. It
was found [39] that the smeared charge distributed multipole model
(up to quadrupole), which takes the charge penetration effect into
account, leads to the best agreement with the corresponding DEDA
results.
In the smeared charge multipole model, each atom $i$ is
represented by a smeared charge, which consists of a nuclear
charge $Z_i$ and an exponential charge density
$\rho_i(r) = \frac{q_i a_i^3}{8\pi} e^{-a_i r}$, together with
a point dipole and a point quadrupole. The pairwise electrostatic
interactions among dipoles and quadrupoles can be obtained with the
standard multipole expansion following Stone’s formulation [62].
The interaction between two smeared charges can be calculated with
the following formula:

\[ E_{\text{chg-chg}} = \frac{q_A q_B}{R}\left[1 - f(a, b, R) - f(b, a, R)\right] + \frac{q_A Z_B}{R}\, g(a, R) + \frac{q_B Z_A}{R}\, g(b, R) + \frac{Z_A Z_B}{R}, \tag{4.4} \]

where $f(a, b, R) = e^{-aR}\, \frac{b^4}{(b^2 - a^2)^2} \left(1 - \frac{2a^2}{b^2 - a^2} + \frac{aR}{2}\right)$
and $g(a, R) = 1 - e^{-aR}\left(1 + \frac{aR}{2}\right)$.
The interaction between a smeared charge and a point dipole is

\[ E_{\text{chg-dipole}} = -(Z_A + \lambda_3 q_A)\, \frac{\boldsymbol{\mu}_B \cdot \mathbf{R}}{R^3}, \tag{4.5} \]

where $\lambda_3 = 1 - e^{-aR} - aR\, e^{-aR} - \frac{a^2 R^2}{2} e^{-aR}$.
The interaction between a smeared charge at A and a traceless
point quadrupole at B is

\[ E_{\text{chg-quadrupole}} = (Z_A + \lambda_5 q_A) \sum_{\alpha\beta} \frac{R_\alpha R_\beta}{R^5}\, \Theta^{B}_{\alpha\beta}, \tag{4.6} \]

where $\lambda_5 = \lambda_3 - \frac{1}{6} a^3 R^3 e^{-aR}$, with
$\Theta^{B}_{\alpha\beta}$ being the traceless quadrupole moment at site B.
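
As a rough illustration of how Eq. 4.4 and the damping factors of Eqs. 4.5 and 4.6 behave, the sketch below evaluates them directly. It is not the authors’ code: the function names and the site parameters are hypothetical, atomic units are assumed, and the closed form for $f$ is used as written, so the case of equal exponents (where it is singular) is rejected rather than handled.

```python
import math


def g(a, r):
    """g(a, R): damping for a point charge seen by an exponential density."""
    return 1.0 - math.exp(-a * r) * (1.0 + 0.5 * a * r)


def f(a, b, r):
    """f(a, b, R) of Eq. 4.4; this closed form requires a != b."""
    if math.isclose(a, b):
        raise ValueError("equal exponents need a separate limit, not handled here")
    d = b * b - a * a
    return math.exp(-a * r) * (b ** 4 / d ** 2) * (1.0 - 2.0 * a * a / d + 0.5 * a * r)


def e_chg_chg(q_a, z_a, a, q_b, z_b, b, r):
    """Smeared charge / smeared charge interaction energy, Eq. 4.4."""
    return (
        q_a * q_b / r * (1.0 - f(a, b, r) - f(b, a, r))
        + q_a * z_b / r * g(a, r)
        + q_b * z_a / r * g(b, r)
        + z_a * z_b / r
    )


def lambda3(a, r):
    """Damping factor entering the charge-dipole term, Eq. 4.5."""
    x = a * r
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)


def lambda5(a, r):
    """Damping factor entering the charge-quadrupole term, Eq. 4.6."""
    x = a * r
    return lambda3(a, r) - x ** 3 * math.exp(-x) / 6.0


# Hypothetical site parameters, chosen only to exercise the formulas.
site_a = {"q": -4.2, "z": 4.0, "a": 2.0}
site_b = {"q": -5.8, "z": 6.0, "a": 3.0}
for r in (2.0, 4.0, 8.0):
    damped = e_chg_chg(site_a["q"], site_a["z"], site_a["a"],
                       site_b["q"], site_b["z"], site_b["a"], r)
    point = (site_a["q"] + site_a["z"]) * (site_b["q"] + site_b["z"]) / r
    print(f"R = {r:4.1f}  smeared = {damped: .6e}  point = {point: .6e}")
```

At large separation the exponentials vanish and the smeared result reduces to the Coulomb interaction of the net site charges; the two differ only at short range, where the densities overlap.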
The parameterization scheme for the smeared charge with
multipole model is illustrated in Fig. 4.4. The parameterization starts
with geometry optimizations and electronic structure calculations
of the monomer molecules. The distributed multipoles are then
calculated with the GDMA program version 2.2 [74], using the formatted
checkpoint file produced by Gaussian03 [75] as input. With the nuclear
charge $Z_A$ taken as the number of valence electrons (for example,
$Z = 4$ for a carbon atom), the width parameter $a$ of each charge site
is determined by minimizing the difference between the electrostatic
potential from quantum mechanical calculations and that from the
damped multipolar expansion over a set of grid points [25], employing
a modified “potential” subprogram in TINKER 5.0 [76].

Figure 4.4 Parameterization scheme of the smeared charge with multipole
electrostatic model.
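
The last step of this scheme, fitting a width parameter to an electrostatic potential on a grid, can be illustrated with a deliberately simplified toy problem. The sketch below assumes a single site with monopole terms only, uses synthetic reference potentials generated from a known width in place of quantum mechanical data, and minimizes the squared error with a golden-section search. None of this reproduces the modified TINKER workflow; all names and numbers are hypothetical.

```python
import math


def g(a, r):
    """Damping factor of the exponential smeared charge (as in Eq. 4.4)."""
    return 1.0 - math.exp(-a * r) * (1.0 + 0.5 * a * r)


def site_potential(z, q, a, r):
    """Potential at distance r from one site: nuclear charge z plus smeared charge q."""
    return (z + q * g(a, r)) / r


def fit_width(z, q, grid, v_ref, lo=0.5, hi=6.0, tol=1e-8):
    """Find the width a in [lo, hi] minimizing the squared potential error."""

    def sse(a):
        return sum((site_potential(z, q, a, r) - v) ** 2 for r, v in zip(grid, v_ref))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if sse(c) < sse(d):
            hi = d  # the minimum lies in [lo, d]
        else:
            lo = c  # the minimum lies in [c, hi]
    return 0.5 * (lo + hi)


# Synthetic "reference" data: a carbon-like site with Z = 4 and a made-up
# smeared charge and width. Real reference potentials would come from QM.
z_site, q_site, a_true = 4.0, -4.3, 2.5
grid = [0.8 + 0.2 * k for k in range(17)]
v_ref = [site_potential(z_site, q_site, a_true, r) for r in grid]

a_fit = fit_width(z_site, q_site, grid, v_ref)
print(f"fitted width: {a_fit:.6f}")
```

A one-dimensional search suffices here because only one parameter is fitted; with several sites, the widths would be optimized together against the full damped multipolar potential.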

4.4 Examination and Parameterization of Interatomic Potentials for Rare Gas Dimers [77]

4.4.1 Van der Waals Descriptions by Atomic Force Fields


The accuracy of the frozen energy term is also largely influenced
by the van der Waals (vdW) term, which is “the other” part of the
frozen density interaction. One of the main challenges in force
field development is to model this vdW interaction. Currently the
Lennard–Jones 12-6 term [78],
$E_{\mathrm{vdW}} = \sum_{i<j} \left( A_{ij}/R_{ij}^{12} - B_{ij}/R_{ij}^{6} \right)$,
is overwhelmingly used in combination with a point Coulomb
electrostatic term, $q_i q_j / r$, as in some of the most widely utilized
force fields such as AMBER [79, 80], OPLS-AA [81, 82], GROMOS
[83], and CHARMM [84–87]. Like the Coulomb electrostatic
term ($q_i q_j / r$), the Lennard–Jones 12-6 functional form does not
fully represent the true physical picture of intermolecular interactions,
but rather acts as a compromise between computational feasibility
and accuracy. It was found that the $R^{-12}$ term, which is used
because of mathematical convenience, is too repulsive at short range
[88, 89]. Efforts to improve the Lennard–Jones 12-6 functional form
include adding an exponential term to allow flexibility [89, 90],
re-parametrizing the $R^{-12}$ coefficient $A_{ij}$ using values from
a repulsive exponential term [91, 92], and replacing the $R^{-12}$ term
with a softer $R^{-9}$ term [93].
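Another commonly used and more physically grounded van der
Waals functional form is the Buckingham potential [94, 95], as
included in the MM2 and MM3 force fields [96, 97]. It consists
of a physically appealing repulsion term, known as the Born–Mayer
exponential function
($E_{\mathrm{rep}}^{\mathrm{BM}} = \sum_{ij} C_{ij} e^{-D_{ij} r_{ij}}$),
and an attractive dispersion term. The Buckingham potential is given as

\[ U(R) = A e^{-\alpha R} - \frac{\lambda}{R^6}, \tag{4.7} \]

which is often written in the form

\[ U(R) = \frac{\varepsilon}{1 - 6/\alpha} \left\{ \frac{6}{\alpha} \exp\left[\alpha\left(1 - \frac{R}{R_m}\right)\right] - \left(\frac{R_m}{R}\right)^6 \right\}. \tag{4.8} \]

Here $\varepsilon$ is the well depth and $R_m$ is the position of the
minimum. Note that $\alpha$ in Eq. 4.8 is dimensionless and equals the
exponent $\alpha$ of Eq. 4.7 multiplied by $R_m$.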
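
To make the relation between the two Buckingham forms concrete, and to compare with the Lennard–Jones 12-6 form, a minimal sketch is given below. The function names are hypothetical, and the parameter values ($\varepsilon = 1$, $R_m = 3$, $\alpha = 12$) are arbitrary illustrative choices rather than fitted values for any system.

```python
import math


def lennard_jones_12_6(r, eps, r_m):
    """12-6 form written with well depth eps and minimum position r_m.

    Equivalent to A/r**12 - B/r**6 with A = eps*r_m**12 and B = 2*eps*r_m**6.
    """
    x = (r_m / r) ** 6
    return eps * (x * x - 2.0 * x)


def buckingham_exp6(r, eps, r_m, alpha):
    """Eq. 4.8; alpha is dimensionless and must be larger than 6."""
    if alpha <= 6.0:
        raise ValueError("alpha must exceed 6 for a well at r_m")
    pref = eps / (1.0 - 6.0 / alpha)
    return pref * ((6.0 / alpha) * math.exp(alpha * (1.0 - r / r_m)) - (r_m / r) ** 6)


def exp6_to_eq47(eps, r_m, alpha):
    """Map (eps, r_m, alpha) of Eq. 4.8 to (A, exponent, lambda) of Eq. 4.7."""
    pref = eps / (1.0 - 6.0 / alpha)
    big_a = pref * (6.0 / alpha) * math.exp(alpha)
    exponent = alpha / r_m
    lam = pref * r_m ** 6
    return big_a, exponent, lam


eps, r_m, alpha = 1.0, 3.0, 12.0  # illustrative values only
big_a, exponent, lam = exp6_to_eq47(eps, r_m, alpha)
for r in (2.4, 3.0, 4.0):
    u_47 = big_a * math.exp(-exponent * r) - lam / r ** 6
    u_48 = buckingham_exp6(r, eps, r_m, alpha)
    u_lj = lennard_jones_12_6(r, eps, r_m)
    print(f"R = {r:.1f}  Eq.4.7 = {u_47: .6f}  Eq.4.8 = {u_48: .6f}  LJ = {u_lj: .6f}")
```

Both forms reach $-\varepsilon$ at $R = R_m$. For these particular parameters the 12-6 wall rises faster than the exponential one inside the minimum, but that comparison depends on the chosen $\alpha$.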
Rare gas dimers are prototypical systems for examining van der
Waals functional forms [88, 92, 98–100]. It should be noted that
such examinations of vdW potentials generally assume that the total
interaction between two rare gas atoms comes entirely from the vdW
interaction. A seminal work by Halgren [88] found that neither the
Lennard–Jones type potentials (Lennard–Jones 12-6 or Lennard–Jones
9-6) nor the Buckingham exp-6 potential was able to replicate the high
quality reference data well, while a buffered 14-7 potential yielded
much better performance. It should be noted that in Halgren’s
calculation of the van der Waals reference energies [88], the charge
penetration effects were not separated out. It is well known that there
are significant charge-penetration effects at short range. Thus
an open question is whether, if the charge-penetration effects were
separated out, a functional form with correct long-range behavior
could be a good representation of vdW interactions.

4.4.2 DEDA and the Born–Mayer-D3 van der Waals Model


To address the above question, we have recently
examined interatomic interactions for rare gas dimers using
density-based energy decomposition analysis (DEDA) in conjunction
with computational results from CCSD(T) at the complete
basis set (CBS) limit, namely CCSD(T)/CBS [101–104]. Specifically,
the total intermolecular interaction energy is calculated with the
CCSD(T)/CBS approach; the reference density-relaxation and
electrostatic contributions to the intermolecular
interaction are calculated with B3LYP-D3 and DEDA;
and the remaining binding energy, after the density-relaxation and
electrostatic contributions are separated out, is taken as the reference van der
Waals contribution, which consists of both dispersion and Pauli
repulsion terms.
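The bookkeeping described above can be summarized in one line. The component labels $E_{\mathrm{int}}$, $E_{\mathrm{es}}$, and $E_{\mathrm{relax}}$ are notation assumed here for illustration; only $E_{\mathrm{vdw}}^{\mathrm{ref}}$ follows the notation used later in this section.

```latex
E_{\mathrm{vdw}}^{\mathrm{ref}}
  = E_{\mathrm{int}}^{\mathrm{CCSD(T)/CBS}}
  - E_{\mathrm{es}}^{\mathrm{DEDA}}
  - E_{\mathrm{relax}}^{\mathrm{DEDA}}
```

Here the first term is the total interaction energy, and the two subtracted terms are the electrostatic and density-relaxation contributions obtained from the B3LYP-D3/DEDA calculation.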
Our results [77] clearly indicate that the reference vdW interaction
energies for rare gas dimers can be modeled very well by the
sum of a B3LYP-D3 dispersion term and a physically appealing Born–
Mayer exponential function describing the repulsive interactions.
The B3LYP-D3 dispersion term is an add-on correction
[48, 105] that addresses the well-known difficulty of describing
dispersion with standard Kohn–Sham B3LYP calculations, and it has
been parameterized to achieve CCSD(T) accuracy [48]. Its two-
body interaction term can be cast into the following formula:
$$E_{\mathrm{disp}}^{\mathrm{B3LYP\text{-}D3}} = \sum_{ij} \sum_{n=6,8} s_n \frac{C_n^{ij}}{r_{ij}^{n}} f_{d,n}(r_{ij}), \qquad (4.9)$$
where $C_n^{ij}$ are atom-pairwise specific dispersion coefficients for
atoms $i$ and $j$, which have been computed from first principles, and
$f_{d,n}(r_{ij})$ is a damping function proposed by Chai and Head-Gordon
[106] with the form $f_{d,n}(r_{ij}) = \frac{1}{1 + 6\,[r_{ij}/(s_{r,n} R_0^{ij})]^{-\alpha_n}}$, where $s_{r,n}$ is
the order-dependent scaling factor of the cutoff radii $R_0^{ij}$. Thus,
this dispersion term is screened at short range and has physically
correct long-range behavior. The B3LYP-D3 dispersion parameters
($C_6$, $C_8$ and $C_{10}$), meanwhile, are available with the DFT-D3 program
[48, 105]. The dispersion energies can thus be calculated with the
available DFT-D3 program without changing any parameters.
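To make the screening behavior of Eq. (4.9) concrete, the following minimal sketch evaluates the damping function and the sum over $n = 6, 8$ for a single atom pair. It is not the DFT-D3 program: the function names are hypothetical, and every numerical parameter below is a placeholder chosen for illustration rather than a published D3 value. The function returns the positive sum exactly as written in Eq. (4.9); the sign with which it enters the total energy depends on the convention adopted and should be checked against the DFT-D3 documentation.

```python
def zero_damping(r, r0, s_r, alpha):
    """Damping function of Eq. (4.9): 1 / (1 + 6 * (r / (s_r * r0)) ** (-alpha))."""
    return 1.0 / (1.0 + 6.0 * (r / (s_r * r0)) ** (-alpha))


def pair_dispersion_sum(r, c_n, s_n, s_r, alpha_n, r0):
    """Sum over the orders n of s_n * C_n / r**n * f_{d,n}(r) for one atom pair."""
    total = 0.0
    for n in c_n:
        damping = zero_damping(r, r0, s_r[n], alpha_n[n])
        total += s_n[n] * c_n[n] / r ** n * damping
    return total


# Placeholder parameters (assumptions, not DFT-D3 values), keyed by order n.
C_N = {6: 64.0, 8: 2000.0}      # pairwise dispersion coefficients
S_N = {6: 1.0, 8: 1.5}          # global scaling factors
S_R = {6: 1.2, 8: 1.0}          # order-dependent scaling of the cutoff radius
ALPHA_N = {6: 14.0, 8: 16.0}    # steepness of the damping
R0 = 3.5                        # pairwise cutoff radius
```

The damping factor equals 1/7 when $r_{ij} = s_{r,n} R_0^{ij}$, drops rapidly toward zero inside that distance, and approaches one outside it, so the undamped $C_n / r^n$ behavior is recovered at long range.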
The Born–Mayer repulsive exponential function is given as
$$E_{\mathrm{rep}}^{\mathrm{BM}} = \sum_{ij} C_{ij} e^{-D_{ij} r_{ij}}, \qquad (4.10)$$

where $C_{ij}$ and $D_{ij}$ are atom-pairwise specific parameters. Since the
natural logarithm of the calculated difference between $E_{\mathrm{vdw}}^{\mathrm{ref}}$ and
$E_{\mathrm{disp}}^{\mathrm{B3LYP\text{-}D3}}$, i.e., $\log\left(E_{\mathrm{vdw}}^{\mathrm{ref}} - E_{\mathrm{disp}}^{\mathrm{B3LYP\text{-}D3}}\right)$, has been shown to be
almost perfectly linear in the interatomic distance from 0.6 to
1.0 times the sum of the Bondi van der Waals radii [107], the parameters $C$
and $D$ are determined directly by fitting to these semi-logarithmic
plots over the range of 0.6 to 0.75 times the Bondi van der Waals distance for
homonuclear rare gas dimers. For heterodimers, the parameters
have been found to be very well reproduced by the following
physically motivated combination rule [108]:

$$D_{ij} = \frac{2 D_{ii} D_{jj}}{D_{ii} + D_{jj}} \qquad (4.11)$$

$$\left( C_{ij} D_{ij} \right)^{2/D_{ij}} = \left( C_{ii} D_{ii} \right)^{1/D_{ii}} \left( C_{jj} D_{jj} \right)^{1/D_{jj}} \qquad (4.12)$$
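The fitting step and the combination rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' fitting code: the function names are hypothetical, the least-squares line is fitted to synthetic data, and the heterodimer rule implements the harmonic-mean reading of Eqs. (4.11) and (4.12), solved for $C_{ij}$.

```python
import math


def born_mayer(r, c, d):
    """One pair term of Eq. (4.10): C * exp(-D * r)."""
    return c * math.exp(-d * r)


def fit_born_mayer(distances, repulsion_energies):
    """Least-squares line through (r, ln E), using ln E = ln C - D * r.
    The energies must be positive (reference vdW minus dispersion)."""
    n = len(distances)
    logs = [math.log(e) for e in repulsion_energies]
    mean_r = sum(distances) / n
    mean_l = sum(logs) / n
    sxx = sum((r - mean_r) ** 2 for r in distances)
    sxy = sum((r - mean_r) * (l - mean_l) for r, l in zip(distances, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_r
    return math.exp(intercept), -slope  # (C, D)


def combine(c_ii, d_ii, c_jj, d_jj):
    """Heterodimer parameters from homodimer ones via Eqs. (4.11) and (4.12)."""
    d_ij = 2.0 * d_ii * d_jj / (d_ii + d_jj)
    product = (c_ii * d_ii) ** (1.0 / d_ii) * (c_jj * d_jj) ** (1.0 / d_jj)
    c_ij = product ** (d_ij / 2.0) / d_ij
    return c_ij, d_ij


# Synthetic data generated from hypothetical parameters C = 500, D = 3.2.
DISTANCES = [2.0, 2.2, 2.4, 2.6, 2.8]
ENERGIES = [born_mayer(r, 500.0, 3.2) for r in DISTANCES]
C_FIT, D_FIT = fit_born_mayer(DISTANCES, ENERGIES)
```

A useful sanity check on the combination rule is that it returns the homodimer parameters unchanged when both atoms are the same species.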

Thus, our newly developed molecular mechanical force field for
describing rare gas atoms has three components [77]: a smeared-charge
multipole model for charge-penetrating electrostatics, a
B3LYP-D3 dispersion term for long-range attraction, and a Born–
Mayer exponential function for short-range repulsion.
The test results show that this force field not only reproduces
rare gas interaction energies calculated at the CCSD(T)/CBS level,
but also yields individual interaction components (electrostatic and vdW)
that agree very well with their corresponding reference values.
Considering that none of the force field parameters has been directly
fitted to reproduce total binding energies or any heterodimer
interaction energy component, this finding sets a solid foundation for
systematic force field development based on first-principles quantum
mechanical calculations with density-based energy decomposition
analysis.

4.5 Outlook

In this chapter, we have summarized our recent development of
DEDA and our progress in employing this novel approach to
facilitate force field development. Despite the promising
preliminary results above, more work needs to be carried out to establish
this as a new and fruitful direction for developing better force fields.
We are also currently exploring the possibility of applying DEDA
to intramolecular interactions. We have recently performed frozen
density calculations for the conformational analysis of the ethane
molecule and found that, at fixed bond lengths and bond angles, the
change of the frozen density energy with the torsion angle has a similar
profile to that of the total energy [109]. This result is consistent
with the conventional wisdom that the rotational barrier in ethane
is due to steric effects, which DEDA captures through frozen density energy
changes. Such information would be helpful for the development of
more robust torsional potentials, which remains a major challenge in force
field development.

Acknowledgment

Research carried out in part at the Center for Functional Nanomaterials
was supported by the U.S. Department of Energy, Office
of Basic Energy Sciences under Contract No. DE-AC02-98CH10886.
We acknowledge the contributions from Prof. Paul Ayers and
Dr. Zhenyu Lu.

References

1. Schueler-Furman, O., Wang, C., Bradley, P., Misura, K., and Baker, D.
Progress in modeling of protein structures and interactions. Science,
310, 638–642, doi:10.1126/science.1112160 (2005).
2. Jorgensen, W. L. The many roles of computation in drug discovery.
Science, 303, 1813–1818 (2004).
3. Sears, A., and Batra, R. C. Macroscopic properties of carbon nanotubes
from molecular-mechanics simulations. Phys. Rev. B, 69,
doi:10.1103/PhysRevB.69.235406 (2004).
4. Kitaura, K., and Morokuma, K. A new energy decomposition scheme
for molecular interactions within the Hartree-Fock approximation. Int.
J. Quantum Chem., 10, 325–340 (1976).
5. Umeyama, H., and Morokuma, K. Origin of hydrogen-bonding: energy
decomposition study. J. Am. Chem. Soc., 99, 1316–1332 (1977).
6. Bagus, P. S., and Illas, F. Decomposition of the chemisorption bond by
constrained variations: Order of the variations and construction of the
variational spaces. J. Chem. Phys., 96, 8962–8970 (1992).
7. Chen, W., and Gordon, M. S. Energy decomposition analyses for many-
body interaction and applications to water complexes. J. Phys. Chem.,
100, 14316–14328 (1996).
8. Mo, Y., Gao, J., and Peyerimhoff, S. D. Energy decomposition analysis
of intermolecular interactions using a block-localized wave function
approach. J. Chem. Phys., 112, 5530–5538, doi:10.1063/1.481185
(2000).
9. Mayer, I. Energy partitioning schemes. Phys. Chem. Chem. Phys., 8,
4630–4646, doi:10.1039/B608822h (2006).
10. Khaliullin, R. Z., Cobar, E. A., Lochan, R. C., Bell, A. T., and Head-
Gordon, M. Unravelling the origin of intermolecular interactions using
absolutely localized molecular orbitals. J. Phys. Chem. A, 111, 8753–
8765, doi:10.1021/jp073685z (2007).
11. Reinhardt, P., Piquemal, J. P., and Savin, A. Fragment-localized Kohn–
Sham orbitals via a singles configuration-interaction procedure and
application to local properties and intermolecular energy decomposition
analysis. J. Chem. Theory Comput., 4, 2020–2029,
doi:10.1021/ct800242n (2008).
12. Mitoraj, M. P., Michalak, A., and Ziegler, T. A combined charge and
energy decomposition scheme for bond analysis. J. Chem. Theory
Comput., 5, 962–975, doi:10.1021/ct800503d (2009).
13. Glendening, E. D. Natural energy decomposition analysis: Explicit
evaluation of electrostatic and polarization effects with application to
aqueous clusters of alkali metal cations and neutrals. J. Am. Chem. Soc.,
118, 2473–2482, doi:10.1021/ja951834y (1996).
14. Su, P., and Li, H. Energy decomposition analysis of covalent bonds
and intermolecular interactions. J. Chem. Phys., 131, 014102,
doi:10.1063/1.3159673 (2009).
15. Stevens, W. J., and Fink, W. H. Frozen fragment reduced variational
space analysis of hydrogen bonding interactions. Application to the
water dimer. Chem. Phys. Lett., 139, 15–22 (1987).
16. Jeziorski, B., Moszynski, R., and Szalewicz, K. Perturbation theory
approach to intermolecular potential energy surfaces of van der Waals
complexes. Chem. Rev., 94, 1887–1930, doi:10.1021/cr00031a008
(1994).
17. Misquitta, A. J., Podeszwa, R., Jeziorski, B., and Szalewicz, K.
Intermolecular potentials based on symmetry-adapted perturbation theory
with dispersion energies from time-dependent density-functional
calculations. J. Chem. Phys., 123, doi:10.1063/1.2135288 (2005).
18. Gresh, N. Energetics of Zn2+ binding to a series of biologically relevant
ligands: A molecular mechanics investigation grounded on ab initio
self-consistent field supermolecular computations. J. Comput. Chem.,
16, 856–882, doi:10.1002/jcc.540160705 (1995).
19. Gresh, N., Guo, H., Salahub, D. R., Roques, B. P., and Kafafi, S. A.
Critical role of anisotropy for the dimerization energies of two
protein−protein recognition motifs: Cis-N-methylacetamide versus a
β-sheet conformer of alanine dipeptide. A joint ab initio, density
functional theory, and molecular mechanics investigation. J. Am. Chem.
Soc., 121, 7885–7894, doi:10.1021/ja9742489 (1999).
20. Gresh, N., Piquemal, J.-P., and Krauss, M. Representation of Zn(II)
complexes in polarizable molecular mechanics. Further refinements
of the electrostatic and short-range contributions. Comparisons with
parallel ab initio computations. J. Comput. Chem., 26, 1113–1130,
doi:10.1002/jcc.20244 (2005).
21. Gresh, N., Cisneros, G. A., Darden, T. A., and Piquemal, J.-P. Anisotropic,
polarizable molecular mechanics studies of inter- and intramolecular
interactions and ligand–macromolecule complexes. A bottom-up strategy.
J. Chem. Theory Comput., 3, 1960–1986, doi:10.1021/ct700134r
(2007).
22. Day, P. N., et al. An effective fragment method for modeling solvent
effects in quantum mechanical calculations. J. Chem. Phys., 105, 1968–
1986, doi:10.1063/1.472045 (1996).
23. Chen, W., and Gordon, M. S. The effective fragment model for solvation:
Internal rotation in formamide. J. Chem. Phys., 105, 11081–11090,
doi:10.1063/1.472909 (1996).
24. Gordon, M. S., et al. The effective fragment potential method: A QM-
based MM approach to modeling environmental effects in chemistry. J.
Phys. Chem. A, 105, 293–307, doi:10.1021/jp002747h (2001).
25. Slipchenko, L. V., and Gordon, M. S. Electrostatic energy in the effective
fragment potential method: Theory and application to benzene dimer.
J. Comput. Chem., 28, 276–291 (2007).
26. Piquemal, J.-P., Cisneros, G. A., Reinhardt, P., Gresh, N., and Darden, T.
A. Towards a force field based on density fitting. J. Chem. Phys., 124,
doi:10.1063/1.2173256 (2006).
27. Cisneros, G. A., Piquemal, J.-P., and Darden, T. A. Generalization of
the Gaussian electrostatic model: Extension to arbitrary angular
momentum, distributed multipoles, and speedup with reciprocal space
methods. J. Chem. Phys., 125, doi:10.1063/1.2363374 (2006).
28. Donchev, A. G., Ozrin, V. D., Subbotin, M. V., Tarasov, O. V., and Tarasov,
V. I. A quantum mechanical polarizable force field for biomolecular
interactions. Proc. Natl. Acad. Sci. U. S. A., 102, 7829–7834 (2005).
29. Donchev, A. G., et al. Water properties from first principles: Simulations
by a general-purpose quantum mechanical polarizable force field.
Proc. Natl. Acad. Sci. U. S. A., 103, 8613–8617 (2006).
30. Donchev, A. G., Galkin, N. G., Pereyaslavets, L. B., and Tarasov, V. I.
Quantum mechanical polarizable force field (QMPFF3): Refinement
and validation of the dispersion interaction for aromatic carbon. J.
Chem. Phys., 125, 244107 (2006).
31. Donchev, A. G. Ab initio quantum force field for simulations of
nanostructures. Phys. Rev. B, 74, doi:10.1103/PhysRevB.74.235401
(2006).
32. Donchev, A. G., et al. Assessment of performance of the general purpose
polarizable force field QMPFF3 in condensed phase. J. Comput. Chem.,
29, 1242–1249 (2008).
33. Xie, W., and Gao, J. Design of a next generation force field:
The X-POL potential. J. Chem. Theory Comput., 3, 1890–1900,
doi:10.1021/ct700167b (2007).
34. Xie, W., Orozco, M., Truhlar, D. G., and Gao, J. X-pol potential:
An electronic structure-based force field for molecular dynamics
simulation of a solvated protein in water. J. Chem. Theory Comput., 5,
459–467, doi:10.1021/ct800239q (2009).
35. Giese, T. J., and York, D. M. Charge-dependent model for many-body
polarization, exchange, and dispersion interactions in hybrid quantum
mechanical/molecular mechanical calculations. J. Chem. Phys., 127,
doi:10.1063/1.2778428 (2007).
36. Giese, T. J., et al. A variational linear-scaling framework to build
practical, efficient next-generation orbital-based quantum force fields.
J. Chem. Theory Comput., 9, 1417–1427, doi:10.1021/ct3010134
(2013).
37. Giese, T. J., Chen, H., Huang, M., and York, D. M. Parametrization
of an orbital-based linear-scaling quantum force field for noncovalent
interactions. J. Chem. Theory Comput., 10(3), 1086–1098,
doi:10.1021/ct401035t (2014).
38. Wu, Q., Ayers, P. W., and Zhang, Y. Density-based energy decomposition
analysis for intermolecular interactions with variationally determined
intermediate state energies. J. Chem. Phys., 131, 164112 (2009).
39. Lu, Z., Zhou, N., Wu, Q., and Zhang, Y. Directional dependence of
hydrogen bonds: A density-based energy decomposition analysis and
its implications on force field development. J. Chem. Theory Comput., 7,
4038–4049 (2011).
40. Boys, S. F., and Bernardi, F. Calculation of small molecular interactions
by differences of separate total energies: Some procedures with
reduced errors. Mol. Phys., 19, 553–566 (1970).
41. Wu, Q., and Yang, W. A direct optimization method for calculating
density functionals and exchange–correlation potentials from electron
densities. J. Chem. Phys., 118, 2498–2509, doi:10.1063/1.1535422
(2003).
42. Wu, Q., and Van Voorhis, T. Direct optimization method to study
constrained systems within density-functional theory. Phys. Rev. A, 72,
doi:10.1103/PhysRevA.72.024502 (2005).
43. Valiev, M., Bylaska, E. J., Govind, N., Kowalski, K., Straatsma, T. P., van
Dam, H. J. J., Wang, D., Nieplocha, J., Apra, E., Windus, T. L., and de Jong,
W. A. NWChem: A comprehensive and scalable open-source solution
for large scale molecular simulations. Comput. Phys. Commun., 181,
1477–1489 (2010).
44. Kortemme, T., Morozov, A. V., and Baker, D. An orientation-dependent
hydrogen bonding potential improves prediction of specificity and
structure for proteins and protein–protein complexes. J. Mol. Biol., 326,
1239–1259, doi:10.1016/S0022-2836(03)00021-4 (2003).
45. Morozov, A. V., Kortemme, T., Tsemekhman, K., and Baker, D.
Close agreement between the orientation dependence of hydrogen
bonds observed in protein structures and quantum mechanical
calculations. Proc. Natl. Acad. Sci. U. S. A., 101, 6946–6951,
doi:10.1073/pnas.0307578101 (2004).
46. Becke, A. D. Density-functional exchange-energy approximation with
correct asymptotic-behavior. Phys. Rev. A, 38, 3098–3100 (1988).
47. Becke, A. D. Density-functional thermochemistry. 3. The role of exact
exchange. J. Chem. Phys., 98, 5648–5652 (1993).
48. Grimme, S., Antony, J., Ehrlich, S., and Krieg, H. A consistent and
accurate ab initio parametrization of density functional dispersion
correction (DFT-D) for the 94 elements H-Pu. J. Chem. Phys., 132,
154104 (2010).
49. Lee, C. T., Yang, W. T., and Parr, R. G. Development of the Colle–Salvetti
correlation-energy formula into a functional of the electron-density.
Phys. Rev. B, 37, 785–789 (1988).
50. Wu, Q., and Yang, W. Empirical correction to density functional theory
for van der Waals interactions. J. Chem. Phys., 116, 515 (2002).
51. Zhao, Y., and Truhlar, D. G. The M06 suite of density functionals
for main group thermochemistry, thermochemical kinetics, noncovalent
interactions, excited states, and transition elements: Two new
functionals and systematic testing of four M06-class functionals
and 12 other functionals. Theor. Chem. Acc., 120, 215–241,
doi:10.1007/s00214-007-0310-x (2008).
52. Thanthiriwatte, K. S., Hohenstein, E. G., Burns, L. A., and Sherrill, C.
D. Assessment of the performance of DFT and DFT-D methods for
describing distance dependence of hydrogen-bonded interactions. J.
Chem. Theory Comput., 7, 88–96, doi:10.1021/ct100469b (2011).
53. Hujo, W., and Grimme, S. Comparison of the performance of dispersion-
corrected density functional theory for weak hydrogen bonds.
Phys. Chem. Chem. Phys., 13, 13942–13950, doi:10.1039/c1cp20591a
(2011).
54. Riley, K. E., Pitonak, M., Cerny, J., and Hobza, P. On the structure and
geometry of biomolecular binding motifs (hydrogen-bonding, stacking,
X–H···π): WFT and DFT calculations.
J. Chem. Theory Comput., 6, 66–80, doi:10.1021/ct900376r (2010).
55. Cieplak, P., Caldwell, J., and Kollman, P. Molecular mechanical models
for organic and biological systems going beyond the atom centered
two body additive approximation: Aqueous solution free energies
of methanol and N-methyl acetamide, nucleic acid base, and amide
hydrogen bonding and chloroform/water partition coefficients of the
nucleic acid bases. J. Comput. Chem., 22, 1048–1057 (2001).
56. Dixon, R. W., and Kollman, P. A. Advancing beyond the atom-centered
model in additive and nonadditive molecular mechanics. J. Comput.
Chem., 18, 1632–1646 (1997).
57. Karamertzanis, P. G., and Pantelides, C. C. Optimal site charge models
for molecular electrostatic potentials. Mol. Simulat., 30, 413–436,
doi:10.1080/08927020410001680769 (2004).
58. Tschampel, S. M., Kennerty, M. R., and Woods, R. J. TIP5P-
consistent treatment of electrostatics for biomolecular simulations.
J. Chem. Theory Comput., 3, 1721–1733, doi:10.1021/ct700046j
(2007).
59. Wang, Z. X., et al. Strike a balance: Optimization of backbone
torsion parameters of AMBER polarizable force field for simulations
of proteins and peptides. J. Comput. Chem., 27, 781–790,
doi:10.1002/jcc.20386 (2006).
60. Zhao, D.-X., et al. Development of a polarizable force field using
multiple fluctuating charges per atom. J. Chem. Theory Comput., 6, 795–
804 (2010).
61. Buckingham, A. D., and Fowler, P. W. A model for the geometries of van
der Waals complexes. Can. J. Chem., 63, 2018–2025 (1985).
62. Stone, A. J. The Theory of Intermolecular Forces, Clarendon Press;
Oxford University Press (1997).
63. Ren, P., and Ponder, J. W. Polarizable atomic multipole water model
for molecular mechanics simulation. J. Phys. Chem. B, 107, 5933–5947
(2003).
64. Cisneros, G. A., et al. Simple formulas for improved point-charge
electrostatics in classical force fields and hybrid quantum
mechanical/molecular mechanical embedding. Int. J. Quantum Chem., 108,
1905–1912, doi:10.1002/qua.21675 (2008).
65. Piquemal, J.-P., Gresh, N., and Giessner-Prettre, C. Improved formulas
for the calculation of the electrostatic contribution to the
intermolecular interaction energy from multipolar expansion of
the electronic distribution. J. Phys. Chem. A, 107, 10353–10359,
doi:10.1021/jp035748t (2003).
66. Wang, B., and Truhlar, D. G. Including charge penetration effects in
molecular modeling. J. Chem. Theory Comput., 6, 3330–3342,
doi:10.1021/ct1003862 (2010).
67. Freitag, M. A., Gordon, M. S., Jensen, J. H., and Stevens, W. J. Evaluation
of charge penetration between distributed multipolar expansions. J.
Chem. Phys., 112, 7300–7306, doi:10.1063/1.481370 (2000).
68. Slipchenko, L. V., and Gordon, M. S. Damping functions in the
effective fragment potential method. Mol. Phys., 107, 999–1016,
doi:10.1080/00268970802712449 (2009).
69. Ponder, J. W., and Case, D. A. in Advances in Protein Chemistry, vol. 66
(Daggett, V., ed.), Academic Press (2003), pp. 27–85.
70. Ren, P., Wu, C., and Ponder, J. W. Polarizable atomic multipole-based
molecular mechanics for organic molecules. J. Chem. Theory Comput.,
7, 3143–3161 (2011).
71. Elking, D. M., Cisneros, G. A. S., Piquemal, J.-P., Darden, T. A., and
Pedersen, L. G. Gaussian multipole model (GMM). J. Chem. Theory
Comput., 6, 190–202, doi:10.1021/ct900348b (2010).
72. Masia, M., Probst, M., and Rey, R. On the performance of molecular
polarization methods. II. Water and carbon tetrachloride close to a
cation. J. Chem. Phys., 123, doi:10.1063/1.2075107 (2005).
73. Paricaud, P., Predota, M., Chialvo, A. A., and Cummings, P. T. From dimer
to condensed phases at extreme conditions: Accurate predictions of
the properties of water by a Gaussian charge polarizable model. J.
Chem. Phys., 122, 244511 (2005).
74. Stone, A. J. Distributed multipole analysis: Stability for large basis sets.
J. Chem. Theory Comput., 1, 1128–1132 (2005).
75. Frisch, M. J., et al. Gaussian 03, Revision B.05, Gaussian, Inc. (2003).
76. Ponder, J. W. TINKER, Software Tools for Molecular Design, Version 5.0;
2009.
77. Zhou, N., Lu, Z., Wu, Q., and Zhang, Y. Improved parameterization of
interatomic potentials for rare gas dimers with density-based energy
decomposition analysis. J. Chem. Phys., 140, 214117 (2014).
78. Jones, J. E. On the determination of molecular fields. II. From the
equation of state of a gas. Proc. R. Soc. Lond. A, 106, 463–477,
doi:10.1098/rspa.1924.0082 (1924).
79. Cornell, W. D., et al. A second generation force field for the simulation
of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc., 117,
5179–5197 (1995).
80. Wang, J., Cieplak, P., and Kollman, P. A. How well does a restrained
electrostatic potential (RESP) model perform in calculating conformational
energies of organic and biological molecules? J. Comput. Chem.,
21, 1049–1074 (2000).
81. Jorgensen, W. L., Maxwell, D. S., and Tirado-Rives, J. Development and
testing of the OPLS all-atom force field on conformational energetics
and properties of organic liquids. J. Am. Chem. Soc., 118, 11225–11236
(1996).
82. Jorgensen, W. L., and Tirado-Rives, J. The OPLS [optimized potentials
for liquid simulations] potential functions for proteins, energy
minimizations for crystals of cyclic peptides and crambin. J. Am. Chem.
Soc., 110, 1657–1666, doi:10.1021/ja00214a001 (1988).
83. Hermans, J., Berendsen, H. J. C., Van Gunsteren, W. F., and Postma,
J. P. M. A consistent empirical potential for water–protein
interactions. Biopolymers, 23, 1513–1518, doi:10.1002/bip.360230807
(1984).
84. Brooks, B. R., et al. CHARMM: A program for macromolecular energy,
minimization, and dynamics calculations. J. Comput. Chem., 4, 187–217
(1983).
85. MacKerell, A. D., Wiorkiewicz-Kuczera, J., and Karplus, M. An all-atom
empirical energy function for the simulation of nucleic acids. J. Am.
Chem. Soc., 117, 11946–11975 (1995).
86. MacKerell, A. D., et al. All-atom empirical potential for molecular
modeling and dynamics studies of proteins. J. Phys. Chem. B, 102,
3586–3616 (1998).
87. Foloppe, N., and MacKerell, A. D., Jr. All-atom empirical force field for
nucleic acids: I. Parameter optimization based on small molecule and
condensed phase macromolecular target data. J. Comput. Chem., 21,
86–104 (2000).
88. Halgren, T. A. The representation of van der Waals (vdW) interactions
in molecular mechanics force fields: Potential form, combination rules,
and vdW parameters. J. Am. Chem. Soc., 114, 7827–7843 (1992).
89. Kaminski, G. A., Stern, H. A., Berne, B. J., and Friesner, R. A. Development
of an accurate and robust polarizable molecular mechanics force field
from ab initio quantum chemistry. J. Phys. Chem. A, 108, 621–627
(2004).
90. Borodin, O., and Smith, G. D. Development of many-body
polarizable force fields for Li-battery components: 1. ether, alkane,
and carbonate-based solvents. J. Phys. Chem. B, 110, 6279–6292
(2006).
91. Mayo, S. L., Olafson, B. D., and Goddard, W. A. DREIDING: A generic
force field for molecular simulations. J. Phys. Chem., 94, 8897–8909
(1990).
92. Rappe, A. K., Casewit, C. J., Colwell, K. S., Goddard, W. A., and Skiff, W.
M. UFF, a full periodic table force field for molecular mechanics and
molecular dynamics simulations. J. Am. Chem. Soc., 114, 10024–10035
(1992).
93. Warshel, A., and Lifson, S. Consistent force field calculations. II. Crystal
structures, sublimation energies, molecular and lattice vibrations,
molecular conformations, and enthalpies of alkanes. J. Chem. Phys., 53,
582–594, doi:10.1063/1.1674031 (1970).
94. Buckingham, R. A. The classical equation of state of gaseous helium,
neon and argon. Proc. R. Soc. Lond. Ser. A. Math. Phys. Sci., 168, 264–
283, doi:10.1098/rspa.1938.0173 (1938).
95. Hill, T. L. Steric effects. I. van der Waals potential energy curves. J. Chem.
Phys., 16, 399 (1948).
96. Allinger, N. L. Conformational analysis. 130. MM2. A hydrocarbon force
field utilizing V1 and V2 torsional terms. J. Am. Chem. Soc., 99, 8127–
8134 (1977).
97. Allinger, N. L., Yuh, Y. H., and Lii, J. H. Molecular mechanics. The MM3
force field for hydrocarbons. 1. J. Am. Chem. Soc., 111, 8551–8566
(1989).
98. Toennies, J. P. On the validity of a modified Buckingham potential for
the rare gas dimers at intermediate distances. Chem. Phys. Lett., 20,
238–241, doi:10.1016/0009-2614(73)85166-8 (1973).
99. Tang, K. T., and Toennies, J. P. An improved simple model for the van
der Waals potential based on universal damping functions for the
dispersion coefficients. J. Chem. Phys., 80, 3726 (1984).
100. Tang, K. T., and Toennies, J. P. The van der Waals potentials between
all the rare gas atoms from He to Rn. J. Chem. Phys., 118, 4976–4983
(2003).
101. Takatani, T., Hohenstein, E. G., Malagoli, M., Marshall, M. S., and Sherrill,
C. D. Basis set consistent revision of the S22 test set of noncovalent
interaction energies. J. Chem. Phys., 132, 144104 (2010).
102. Marshall, M. S., Burns, L. A., and Sherrill, C. D. Basis set convergence
of the coupled-cluster correction, $\delta_{\mathrm{MP2}}^{\mathrm{CCSD(T)}}$: Best practices for
benchmarking non-covalent interactions and the attendant revision of
the S22, NBC10, HBC6, and HSG databases. J. Chem. Phys., 135, 194102
(2011).
103. Helgaker, T., Klopper, W., Koch, H., and Noga, J. Basis-set convergence
of correlated calculations on water. J. Chem. Phys., 106, 9639–9646,
doi:10.1063/1.473863 (1997).
104. Halkier, A., et al. Basis-set convergence in correlated calculations on
Ne, N2, and H2O. Chem. Phys. Lett., 286, 243–252 (1998).
105. Grimme, S., Ehrlich, S., and Goerigk, L. Effect of the damping function
in dispersion corrected density functional theory. J. Comput. Chem., 32,
1456–1465 (2011).
106. Chai, J.-D., and Head-Gordon, M. Long-range corrected hybrid density
functionals with damped atom–atom dispersion corrections. Phys.
Chem. Chem. Phys., 10, 6615–6620 (2008).
107. Bondi, A. van der Waals volumes and radii. J. Phys. Chem., 68, 441–451
(1964).
108. Smith, F. T. Atomic distortion and the combining rule for repulsive
potentials. Phys. Rev. A, 5, 1708–1713 (1972).
109. Wu, Q. Variational nature of the frozen density energy in density-
based energy decomposition analysis and its application to torsional
potential. J. Chem. Phys., 140, 244109 (2014).
February 2, 2016 14:21 PSP Book - 9in x 6in 05-Qiang-Cui-c05

Chapter 5

Effective Fragment Potential Method

Lyudmila V. Slipchenko
Department of Chemistry, Purdue University,
West Lafayette, IN 47906, USA
[email protected]

5.1 Introduction

Computational modeling of quantum chemical processes in extended
systems remains one of the main challenges in theoretical
chemistry. This is because modeling a system with a large number
of degrees of freedom is computationally expensive, if not intractable,
without additional approximations. Another challenge is
the increased number of reaction pathways and the need for
configurational sampling and averaging. Thus, efficient algorithms for
configurational sampling should be combined with approximations
that decrease computational cost and scaling, such as classical force
fields and QM/MM schemes, semiempirical and density functional
methods, linear scaling techniques, and fragmentation approaches.
The effective fragment potential (EFP) method emerged as
a promising compromise between computational efficiency and
rigorous ab initio-based formulation of interaction energy in
weakly interacting systems [1–6]. The EFP method decomposes

Many-Body Effects and Electrostatics in Biomolecules


Edited by Qiang Cui, Markus Meuwly, and Pengyu Ren
Copyright © 2016 Pan Stanford Publishing Pte. Ltd.
ISBN 978-981-4613-92-7 (Hardcover), 978-981-4613-93-4 (eBook)
www.panstanford.com
intermolecular non-covalent interactions into Coulomb (electrostatic),
polarization (induction), dispersion, exchange-repulsion, and
optional charge-transfer terms, all of which are derived as truncated
series in a long-range (in terms of 1/R) and short-range (in terms
of intermolecular overlap) perturbation theory [1, 7–10]. The building
block of the EFP method is a so-called “fragment,” which is typically
a molecule in a cluster or liquid. Parameters for individual fragments
are obtained from electronic structure calculations on a (gas-phase)
fragment and comprise a set of properties such as point charges
and multipoles, static and time-dependent polarizabilities, localized
wave function, etc. Thus, on the one hand, EFP can be thought of
as a force field in which both the functional form and the parameters
originate from first principles. On the other hand, the EFP method
is similar in spirit to energy decomposition schemes such as SAPT
(symmetry-adapted perturbation theory) [11, 12], with the difference
that components of specific energy terms are precomputed and
stored as parameters of individual fragments. This comparison
with SAPT provides an obvious means of benchmarking the accuracy
of the EFP method, as shown in Section 5.3. By analogy
with classical force fields, EFP can be characterized as a universal
polarizable force field. When combined with quantum calculations
in a QM/MM scheme (called “QM/EFP”), EFP provides polarizable
embedding for the quantum region.

5.2 Overview of the EFP Theory

Originally EFP was introduced as a model potential for describing


water and aqueous solvation. This water potential is referred to
as EFP1 [1–2]. The emphasis in the development of the EFP1
water potential was placed on detailed description of hydrogen
bonding. As the hydrogen bonds are governed by the Coulomb and
polarization interactions, an advanced description of these terms
is one of the distinguishing features of the EFP method. Namely,
the Coulomb term is modeled by distributed multipoles up to
octopoles, centered at atoms and bond midpoints. The polarization term
is described by using distributed polarizability tensors centered at
the localized molecular orbital (LMO) centroids. The polarization energy
is obtained self-consistently and incorporates many-body effects.
Both Coulomb and polarization terms are screened by short-range
damping functions that account for charge-penetration effects and
safeguard a computer simulation from the polarization collapse [13–
15].
The van der Waals part of the interaction energy in the EFP1
water potential is described by a fitted exponential term for
fragment–fragment interactions and a Gaussian term when the
quantum part is present. Fitting of the water EFP1 van der Waals
term, called $E_{\mathrm{rem}}$, was performed based on the total interaction
energies of water dimers obtained with Hartree–Fock (HF) or
density functional theory (DFT) with B3LYP functional, resulting in
EFP1-RHF and EFP1-DFT versions [16], respectively.
Thus, the EFP1 interaction energy is obtained as
\[ E_{\mathrm{EFP1}} = E_{\mathrm{Coul}} + E_{\mathrm{pol}} + E_{\mathrm{rem}} \tag{5.1} \]
The EFP1 models faithfully reproduce the parent methods, HF or
DFT, and suffer from the limitations of those, e.g., neglecting the
dispersion interactions.
A general description of the van der Waals part of the intermolecular
interactions is implemented in EFP (originally referred to as EFP2),
a potential suitable for a general solvent. The dispersion part of the
van der Waals interactions is modeled by using distributed LMO-centered
dynamic polarizabilities $\bar{\alpha}$, augmented by short-range
screening functions $f_{\mathrm{damp}}$. The dispersion energy between each pair of
fragments A and B is calculated as [9]
\[ E_{\mathrm{disp}} = -\frac{4}{3} \sum_{p \in A} \sum_{q \in B} f_{\mathrm{damp}}^{pq} \frac{C_6^{pq}}{R_{pq}^6}, \tag{5.2} \]
where $R_{pq}$ is the distance and $C_6^{pq}$ is the effective dispersion
coefficient between points p and q, obtained as an integral over
dynamic polarizabilities in imaginary frequency range:
\[ C_6^{pq} = \int_0^{\infty} d\nu \, \bar{\alpha}^{p}(i\nu) \, \bar{\alpha}^{q}(i\nu). \tag{5.3} \]
The screening function $f_{\mathrm{damp}}$ may be represented using the Tang–Toennies
expression [17] or using intermolecular overlap integrals
$S_{pq}$ as [15]
\[ f_{\mathrm{damp}}^{pq} = 1 - |S_{pq}|^2 \left( 1 - 2 \ln |S_{pq}| + 2 \ln^2 |S_{pq}| \right). \tag{5.4} \]
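Equations 5.2 and 5.4 translate directly into a few lines of code. The following Python sketch is only an illustration: the function names, the input layout (one tuple of C6 coefficient, distance, and overlap per LMO pair), and the numerical inputs are assumptions made for this example and are not taken from any EFP implementation. Any consistent set of units (e.g., atomic units) may be used.

```python
import math


def overlap_damping(s_pq):
    """Overlap-based dispersion damping function of Eq. 5.4."""
    s = abs(s_pq)
    if s == 0.0:
        return 1.0  # limit of vanishing overlap: no damping
    ln_s = math.log(s)
    return 1.0 - s * s * (1.0 - 2.0 * ln_s + 2.0 * ln_s * ln_s)


def dispersion_energy(pairs):
    """Dispersion energy of Eq. 5.2 for one pair of fragments A and B.

    pairs: iterable of (c6_pq, r_pq, s_pq) tuples, one per LMO pair
    with p on fragment A and q on fragment B.
    """
    total = sum(overlap_damping(s) * c6 / r**6 for c6, r, s in pairs)
    return -4.0 / 3.0 * total


# Hypothetical inputs for illustration only (not real EFP parameters).
example_pairs = [(10.0, 6.0, 0.01), (8.0, 7.5, 0.002)]
e_disp = dispersion_energy(example_pairs)
print(f"E_disp = {e_disp:.6e}")
```

The damping function tends to 1 for vanishing overlap and to 0 as the overlap approaches 1, so the undamped 1/R^6 behavior is recovered at long range.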
The repulsive part of the potential, represented by the exchange-repulsion
term, is treated quantum-mechanically with an approximate
expression of the Hartree–Fock exchange-repulsion energy in
terms of the intermolecular overlap integrals [7–8, 18]. Specifically,
the exchange-repulsion energy between fragments A and B is
calculated as a sum of contributions from LMOs on each fragment:

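\[ E_{\mathrm{exrep}} = \sum_{p \in A} \sum_{q \in B} E_{\mathrm{exrep}}^{pq}, \tag{5.5} \]
\[
E_{\mathrm{exrep}}^{pq} = -4 \sqrt{\frac{-2 \ln |S_{pq}|}{\pi}} \, \frac{S_{pq}^2}{R_{pq}}
 - 2 S_{pq} \left( \sum_{r \in A} F_{pr}^{A} S_{rq} + \sum_{s \in B} F_{qs}^{B} S_{sp} - 2 T_{pq} \right)
 + 2 S_{pq}^2 \left( -\sum_{J \in B} \frac{Z_J}{R_{pJ}} + 2 \sum_{l \in B} \frac{1}{R_{pl}} - \sum_{I \in A} \frac{Z_I}{R_{Iq}} + 2 \sum_{k \in A} \frac{1}{R_{kq}} - \frac{1}{R_{pq}} \right), \tag{5.6}
\]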
where p, q, r, s are LMOs, I and J are nuclei, S is the overlap
integral, T is the kinetic energy integral, and F is the Fock matrix.
The overlap and kinetic integrals are calculated on-the-fly for each
pair of effective fragments, using frozen localized wave functions
stored for each effective fragment.
Additionally, the charge-transfer energy, i.e., the resonance stabilization
due to a configuration with an electron transferred between
two fragments, can be obtained and added to the total interaction
energy of the system [10, 19]. However, as the charge-transfer
energy is the smallest in magnitude but the most computationally
expensive EFP term, it is often omitted in large-scale simulations.
Overall, the EFP (EFP2) interaction energy is composed as
\[ E_{\mathrm{EFP}} = E_{\mathrm{Coul}} + E_{\mathrm{pol}} + E_{\mathrm{disp}} + E_{\mathrm{exrep}} + E_{\mathrm{ct}} \tag{5.7} \]
Parameters for each effective fragment can be generated in a
special GAMESS [20–21] run called “MAKEFP.” A summary of the
EFP parameters required for each energy term and the relative
computational cost of each EFP term are shown in Table 5.1.
As follows from Table 5.1, the exchange-repulsion and charge-
transfer terms are the most computationally expensive parts of the
EFP calculations because evaluations of one-electron integrals are
involved.
Table 5.1 Overview of the interaction terms present in the EFP method and their relative computational cost

Interaction                                   EFP parameters                            Cost
Coulomb with screening                        Point multipoles through octopoles,       1
                                              screening parameters
Polarization in a self-consistent             Distributed LMO polarizabilities          3
manner with screening
Dispersion 1/R^6 term with guessed            Distributed LMO dynamic                   1
1/R^8 term with screening                     polarizabilities
Exchange-repulsion from first                 Basis set, occupied LMOs, Fock matrix     10
principles using intermolecular overlap
Charge transfer (not physical transfer        Occupied and virtual canonical            100
of charge) from first principles by using     orbitals, Fock matrix
intermolecular overlap
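The practical consequence of omitting the charge-transfer term can be estimated from the relative costs in Table 5.1. The short Python sketch below assumes that the relative costs refer to a common scale and are simply additive, which the table does not state explicitly; the resulting ratio should be read as a rough guide only.

```python
# Relative costs of the EFP terms as listed in Table 5.1.
relative_cost = {
    "Coulomb": 1,
    "polarization": 3,
    "dispersion": 1,
    "exchange-repulsion": 10,
    "charge transfer": 100,
}

# Assumption: the relative costs are additive on a common scale.
full_cost = sum(relative_cost.values())
cost_without_ct = full_cost - relative_cost["charge transfer"]

print("all terms:              ", full_cost)
print("without charge transfer:", cost_without_ct)
print(f"estimated saving factor: {full_cost / cost_without_ct:.1f}")
```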

5.3 Accuracy of the EFP Method for Describing Intermolecular Interactions

In order to provide an accurate description of non-covalent
interactions, a theoretical method should be balanced in treating
different parts of binding energies, such as Coulomb, polarization,
and dispersion interactions. Several datasets have been designed
for benchmarking and comparing the accuracy of computational
models in describing non-covalent interactions [22–27]. One of
these datasets, the S22 set, consists of 22 dimers at their equilibrium
geometries dominated by various types of interactions, such as
H-bonding, dispersion, and mixed type. CCSD(T)/CBS energies are
used as reference values.
Table 5.2 shows the performance of ab initio, DFT, and force
field methods, as well as EFP, on the S22 dataset [28]. The
second-order perturbation theory, MP2, tends to overestimate the
dispersion forces, which becomes obvious from significant errors
in describing dispersion-dominated complexes. On the other hand,
HF and many popular DFT methods do not describe dispersion
at all, again resulting in dramatic errors in dispersion-dominated
systems. Augmenting the DFT functionals with dispersion corrections,
as in BLYP-D3 [29] or ωB97X-D [30], dramatically improves
their performance. Classical force fields are significantly in error
for H-bonded complexes, manifesting insufficient accuracy in
describing Coulomb and polarization interactions.

Table 5.2 Mean unsigned errors (MUE) (kcal/mol) of the total interaction energies for hydrogen bonded (HB), dispersion dominated (DISP), mixed (MIXED) and all (ALL) complexes of the S22 dataset by EFP, molecular mechanics force fields, HF, DFT, and ab initio methods

Method               HB      DISP    MIXED   ALL
EFP                  1.97    0.48    0.34    0.91
Force fields
  Amber              4.79    0.98    0.98    2.16
  OPLSAA             4.59    1.04    0.57    2.02
  MMFF94             3.75    0.88    0.59    1.70
HF and DFT
  HF                 3.29    7.24    3.15    4.56
  B3LYP              1.77    6.22    2.64    3.54
  PBE                1.13    4.53    1.66    2.44
  M05                1.26    3.16    1.09    1.84
  M06                0.89    0.99    0.67    0.85
  M06-2X             0.73    0.36    0.32    0.47
  BLYP-D3            –       –       –       0.23
  ωB97X-D            –       –       –       0.22
Correlated methods
  MP2                0.24    1.69    0.61    0.88
  SCS-MP2            1.54    0.55    0.37    0.80
  SCS-CCSD           0.40    0.23    0.08    0.24
10% (a)              1.38    0.48    0.39    0.74

Source: Adapted from Ref. [28] and references therein.
(a) 10% values of the average interaction energies.
Compared to classical force fields, the EFP treatment of Coulomb
interactions through inclusion of high-order multipoles, charge-penetration
screening, and polarization improves the description of
the H-bonded complexes as well as the overall performance. As a
result, the EFP accuracy is similar to that of MP2 perturbation
theory and the M06 density functional, while the EFP computational
cost is several orders of magnitude lower and its scaling with system
size is more favorable. The EFP accuracy
is consistent for different types of non-covalent interactions, i.e.,
relative errors in H-bonded, dispersion-dominated, and mixed
complexes range between 9% and 14%, as seen by comparing mean
unsigned errors (MUE) and 10% values of the average interaction
energies for each type of complex.
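The quoted range of relative errors follows from the EFP row and the "10%" row of Table 5.2. The Python sketch below reproduces this arithmetic; the variable and function names are chosen for this illustration only.

```python
# EFP mean unsigned errors and 10% of the average interaction energies
# (kcal/mol), both taken from Table 5.2.
mue_efp = {"HB": 1.97, "DISP": 0.48, "MIXED": 0.34, "ALL": 0.91}
ten_percent = {"HB": 1.38, "DISP": 0.48, "MIXED": 0.39, "ALL": 0.74}


def relative_error_percent(mue, ten_pct_of_average):
    """MUE as a percentage of the average interaction energy.

    The table lists 10% of the average energy, so the average itself
    is ten times larger: 100 * mue / (10 * ten_pct) = 10 * mue / ten_pct.
    """
    return 10.0 * mue / ten_pct_of_average


relative = {
    key: relative_error_percent(mue_efp[key], ten_percent[key])
    for key in mue_efp
}
for key, value in relative.items():
    print(f"{key:5s} {value:5.1f}%")
```

The H-bonded complexes give the upper end of the range and the mixed complexes the lower end.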

5.4 Chemistry of Non-Covalent Interactions

Non-covalent interactions govern phenomena related to condensation,
solvation, adsorption, and crystallization. The role of non-covalent
interactions in biology ranges from controlling protein
folding and the functions of nucleic acids to drug design, molecular
recognition, and enzyme catalysis.
EFP1 and general EFP methods have been used extensively to
investigate non-covalent interactions in clusters and liquids. For
example, the EFP1 water potential was used to characterize structures
and binding energies in water clusters and liquid water [31–33].
The general EFP method was employed in studies of alcohol–water
clusters and mixtures [34–35], solvation of ions [36–37],
benzene and substituted benzene dimers [14, 38], water–benzene
complexes [39], and intermolecular interactions in styrene clusters [40]
and DNA base pairs [5, 41].

5.4.1 Competition between H-Bonding, π–π Bonding, and π–H Bonding
A detailed and balanced description of different parts of intermolecular
interactions is a unique feature of the EFP method, which enables
predictive investigations of heterogeneous systems. One vivid
example of structural heterogeneity is observed in water–benzene
complexes due to the interplay among H-bonding, π–π bonding, and
π–H bonding.
EFP interaction energies in the water dimer, benzene dimers, and
water–benzene dimers are compared in Fig. 5.1 [39]. Interaction
in the water dimer is dominated by the Coulomb term (−8.6 kcal/mol),
whereas the polarization and dispersion components are
almost 10 times weaker. In contrast, dispersion forces (−4.9 kcal/mol)
determine binding in the parallel-displaced benzene dimer.
Interestingly, the two structures of the benzene–water dimer and
the T-shaped benzene dimer exhibit significant contributions from
Energy component      Water    Benzene–water  Benzene–water  Benzene dimer          Benzene dimer
                      dimer    dimer 1        dimer 2        (parallel-displaced)   (T-shaped)
Coulomb               -8.6     -3.8           -3.9           -0.1                   -1.8
Exchange-repulsion     5.3      2.3            3.2            2.9                    2.4
Polarization          -1.0     -0.6           -0.4           -0.3                   -0.2
Dispersion            -0.9     -1.8           -1.4           -4.9                   -3.2
Total interaction     -5.1     -3.9           -2.6           -2.4                   -2.9
Total with ZPE        -2.6     -2.9           -1.6           -2.0                   -2.4

Figure 5.1 EFP total interaction energies and energy components (kcal/mol) in water dimer, two benzene–water dimers, and two benzene dimers. ZPE-corrected interaction energies of the dimers are also shown.

both dispersion and Coulomb forces. As for the total interaction
energies in these dimers, the water dimer is the most strongly
bound, the benzene dimers have the weakest interaction energies,
and the water–benzene dimers are in between. However, including
zero-point vibrational energies (ZPEs) disturbs this order.
ZPEs are the largest in the water dimer and the smallest in the
benzene dimers, with the benzene–water dimers in the middle.
As a result, the ZPE-corrected interaction energies of the dimers are
much less spread energetically. This observation suggests that the
immiscibility of benzene in water is due to an unfavorable entropy,
rather than enthalpy, contribution.
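The numbers in Fig. 5.1 can be checked for internal consistency and used to quantify the narrowing of the energy spread upon ZPE correction. In the Python sketch below, the values are those of Fig. 5.1, while the assignment of the five columns to specific dimers (in particular, which benzene dimer is parallel-displaced and which is T-shaped) is inferred from the figure caption and the surrounding discussion and should be treated as an assumption.

```python
# Column order assumed from the caption and text of Fig. 5.1.
dimers = [
    "water dimer",
    "benzene-water dimer 1",
    "benzene-water dimer 2",
    "benzene dimer (parallel-displaced)",
    "benzene dimer (T-shaped)",
]

# EFP energy components in kcal/mol, as given in Fig. 5.1.
components = {
    "Coulomb": [-8.6, -3.8, -3.9, -0.1, -1.8],
    "exchange-repulsion": [5.3, 2.3, 3.2, 2.9, 2.4],
    "polarization": [-1.0, -0.6, -0.4, -0.3, -0.2],
    "dispersion": [-0.9, -1.8, -1.4, -4.9, -3.2],
}
total = [-5.1, -3.9, -2.6, -2.4, -2.9]
total_zpe = [-2.6, -2.9, -1.6, -2.0, -2.4]

# Sum the four components for each dimer (Eq. 5.7 without charge transfer).
summed = [sum(column) for column in zip(*components.values())]

# Spread of binding energies before and after the ZPE correction.
spread = max(total) - min(total)
spread_zpe = max(total_zpe) - min(total_zpe)

for name, s, t in zip(dimers, summed, total):
    print(f"{name:36s} sum = {s:5.1f}   reported total = {t:5.1f}")
print(f"spread without ZPE: {spread:.1f} kcal/mol")
print(f"spread with ZPE:    {spread_zpe:.1f} kcal/mol")
```

The component sums agree with the reported totals to within the rounding of the individual entries.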

5.4.2 Many-Body Interactions in Mixed Systems


As the polarization term in EFP is non-additive, the EFP method
captures a majority of many-body effects in H-bonded systems. By
definition, the many-body energy is the difference between the total
interaction energy of a system and the sum of all pairwise interaction
energies. In
polar complexes, the many-body interactions are predominantly of
polarization origin. In non-polar systems such as inert gases, dispersion
and exchange-repulsion many-body effects become significant.
However, at present, many-body exchange and dispersion are not
accounted for in the EFP method.
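The definition of the many-body energy given above can be written as a short function. The Python sketch below uses a hypothetical trimer; the fragment labels and all energies are invented for illustration and do not correspond to any of the complexes discussed in this chapter.

```python
from itertools import combinations


def many_body_energy(total_interaction, pair_interactions):
    """Total interaction energy minus the sum of all pairwise energies."""
    return total_interaction - sum(pair_interactions.values())


# Hypothetical trimer (illustrative values in kcal/mol).
fragments = ["W1", "W2", "B"]
pair_energy = {
    ("W1", "W2"): -5.0,
    ("W1", "B"): -3.9,
    ("W2", "B"): -2.6,
}
# Every pair of fragments must be present exactly once.
assert set(pair_energy) == set(combinations(fragments, 2))

total_energy = -12.5  # hypothetical total interaction energy of the trimer
e_many_body = many_body_energy(total_energy, pair_energy)
print(f"many-body energy = {e_many_body:.1f} kcal/mol")
```

A negative value corresponds to a stabilizing (cooperative) many-body contribution, a positive value to a destabilizing one.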
Analysis of many-body interactions in water–benzene complexes
provides an intriguing observation that benzene participates in
the H-bonding network typically associated with water structures [39].
In the lowest-energy water–benzene dimer (see Fig. 5.1), benzene
serves as an acceptor of an H-bond, donating its negatively charged
π-electron density to the positively charged hydrogen of water,
which results in H–π bonding. The geometry of this dimer, with water
sitting on top of the aromatic ring, is optimal for H–π bonding. In
the second water–benzene dimer, the water is effectively an H-bond
acceptor, donating the electron density from a negatively charged
oxygen to positively charged hydrogens of the benzene. To maximize
this interaction, the water is located in plane with the benzene ring.
In larger water–benzene complexes, benzene can also serve as
a donor or acceptor of an H-bond. For example, in the W2B1 trimer, the
benzene serves as an H-bond donor for one water and as an H-bond
acceptor for another (see Fig. 5.2). This topology favors an H-bonding
network, as becomes obvious from the magnitude of the stabilizing
many-body energy of the complex. Similarly, in W1B2a, the lowest-energy
structure of the one-water–two-benzene complex, one benzene
molecule is the H-bond donor and the other benzene is the
Figure 5.2 Structures and many-body interaction energies (kcal/mol) in low-lying water–benzene trimers and tetramers.
H-bond acceptor with respect to the water molecule, while the two
benzenes build a polar H–π bond with each other. As a result, an
H-bonding cycle also exists in this trimer. On the other hand, in the
second one-water–two-benzene complex, W1B2b, the water donates
H-bonds to both benzenes and the many-body polarization becomes
destabilizing.
Analysis of the many-body interactions in tetramers suggests
that benzene is actively involved in a collective H-bonding network
in these clusters as well. For example, as shown in Fig. 5.2,
several energetically low-lying benzene–water tetramers prefer
cyclic structures that are similar to the structure of the water
tetramer, and exhibit significant stabilizing many-body energies.

5.4.3 Role of Polarization Energy Increases from Dimers to Bulk
Another intriguing example of competing non-covalent forces
appears in water–alcohol mixtures, where differences in the strengths
of various H-bonds and hydrophobic contacts give rise to structural
heterogeneity at the microscopic level. The microscopic heterogeneity
may or may not lead to macroscopic phase separation between
different components, depending on system composition and
thermodynamic conditions. A balanced description of interaction
energy components in the EFP method sets the stage for investigating
water–alcohol systems.
Tert-butyl alcohol (TBA) is the largest monohydric alcohol that is fully
miscible with water. The level of micro-heterogeneity and mixing in
water–TBA systems is under debate [42–43]. EFP reliably predicts
the strengths of hydrogen bonding in various water–TBA dimers,
as compared to MP2/6-311++G(d,p) calculations [35] (see
Fig. 5.3). Intermolecular distances obtained by the EFP and MP2
methods differ by less than 0.1 Å. Stabilities of the dimers by EFP
are within 0.8 kcal/mol of the MP2 stabilities, with the largest error
observed for the first (W-TBAa) water–TBA dimer.
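The deviations quoted above can be recomputed from the interaction energies listed in Fig. 5.3. The following Python sketch does so; only the dictionary layout and variable names are introduced here.

```python
# Total interaction energies in kcal/mol, as listed in Fig. 5.3.
efp = {"W-W": -5.88, "W-TBAa": -7.41, "W-TBAb": -5.42, "TBA-TBA": -7.90}
mp2 = {"W-W": -5.35, "W-TBAa": -6.58, "W-TBAb": -5.42, "TBA-TBA": -7.39}

# Signed deviation of EFP from MP2; negative means EFP overbinds.
deviation = {dimer: efp[dimer] - mp2[dimer] for dimer in efp}
worst = max(deviation, key=lambda dimer: abs(deviation[dimer]))

for dimer, value in deviation.items():
    print(f"{dimer:8s} EFP - MP2 = {value:+.2f} kcal/mol")
print("largest deviation:", worst)
```

Note that the MP2 reference values are themselves averages of BSSE-corrected and uncorrected energies, as stated in the caption of Fig. 5.3.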
Table 5.3 shows a decomposition of the EFP interaction energy in
water–TBA dimers. As expected, Coulomb and exchange-repulsion
energies contribute the most to the total hydrogen bonding energy
for all of the dimers. The Coulomb energy fraction decreases in
            W-W      W-TBAa   W-TBAb   TBA-TBA
EFP         -5.88    -7.41    -5.42    -7.90
MP2         -5.35    -6.58    -5.42    -7.39

Figure 5.3 Comparison of the total interaction energies (kcal/mol) by the EFP and MP2 methods for water dimer, TBA–water, and TBA dimers. MP2 interaction energies were averaged between the basis set superposition error (BSSE) corrected and uncorrected values.

Table 5.3 EFP energy components as total values (kcal/mol) and as fractions of the total EFP energy for the water dimer (W–W), TBA dimer (TBA–TBA), and water–TBA dimers (W–TBA a and b)

                           EFP energy components (kcal/mol)        Fraction of total EFP energy
                           W-W     W-TBAa   W-TBAb   TBA-TBA       W-W     W-TBAa   W-TBAb   TBA-TBA
Coulomb                    –8.51   –10.05   –7.17    –8.90         1.45    1.36     1.32     1.13
Exchange-repulsion          5.14     8.05    4.99     8.74         –0.87   –1.09    –0.92    –1.11
CEX (Coulomb + ex-rep)     –3.37    –2.00   –2.18    –0.16         0.58    0.27     0.40     0.02
Polarization               –1.16    –1.94   –1.11    –1.79         0.20    0.26     0.20     0.23
Dispersion                 –1.34    –3.47   –2.13    –5.94         0.23    0.47     0.39     0.75
Total                      –5.88    –7.41   –5.41    –7.89         1.00    1.00     1.00     1.00

Source: Adapted from Ref. [35].
Note: See Fig. 5.3 for dimer structures.


the order of water dimer, water–TBA, and TBA dimer, which is
compensated by an increase of the dispersion interactions in the
TBA-containing dimers. The polarization energy fraction remains fairly
constant among all dimers.
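The fractions in Table 5.3 are obtained by dividing each component by the total EFP energy of the corresponding dimer. The Python sketch below recomputes them from the component values of Table 5.3; small differences in the last digit relative to the tabulated fractions are expected because the published components are rounded.

```python
# EFP energy components in kcal/mol from Table 5.3.
components = {
    "W-W": {"Coulomb": -8.51, "exchange-repulsion": 5.14,
            "polarization": -1.16, "dispersion": -1.34},
    "W-TBAa": {"Coulomb": -10.05, "exchange-repulsion": 8.05,
               "polarization": -1.94, "dispersion": -3.47},
    "W-TBAb": {"Coulomb": -7.17, "exchange-repulsion": 4.99,
               "polarization": -1.11, "dispersion": -2.13},
    "TBA-TBA": {"Coulomb": -8.90, "exchange-repulsion": 8.74,
                "polarization": -1.79, "dispersion": -5.94},
}


def energy_fractions(terms):
    """Each energy term divided by the sum of all terms."""
    total = sum(terms.values())
    return {name: value / total for name, value in terms.items()}


fractions = {dimer: energy_fractions(t) for dimer, t in components.items()}
for dimer, frac in fractions.items():
    row = "  ".join(f"{name}: {value:5.2f}" for name, value in frac.items())
    print(f"{dimer:8s} {row}")
```

Because the total is negative, attractive terms yield positive fractions and the repulsive exchange-repulsion term yields a negative one.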
The low computational cost of EFP computations allows large-scale
modeling of bulk systems. MD simulations of water–TBA mixtures
with different mole fractions of TBA (0.00, corresponding to
pure water, 0.06, 0.11, 0.16, and 0.50) were performed in the NVT
ensemble at 300 K, with periodic boundary conditions [35]. The unit
cell contained from 98 to 150 molecules. For each concentration,
five different 25 ps production runs with a 0.5 fs time step were
Table 5.4 EFP energy components as a fraction of the total EFP energy of the bulk water–TBA solutions at different TBA mole fractions

Energy component                        TBA mole fraction
                                        0.00     0.06     0.11     0.16     0.50
Coulomb                                 1.29     1.28     1.26     1.25     1.03
Exchange-repulsion                      –1.24    –1.24    –1.23    –1.31    –1.20
CEX (Coulomb + exchange-repulsion)      0.05     0.04     0.03     –0.06    –0.17
Polarization                            0.56     0.50     0.49     0.49     0.37
Dispersion                              0.38     0.45     0.49     0.56     0.80
Total                                   1.00     1.00     1.00     1.00     1.00

Source: Adapted from Ref. [35].


performed and the observables were averaged. Analysis of the radial
distribution functions (RDFs) and coordination numbers from these
simulations suggests that EFP predicts a smaller degree of TBA
aggregation at low TBA concentrations compared to classical MD
simulations with the GROMOS96 force field. Additionally, the water
structure at low TBA concentrations is enhanced and strengthened
with respect to pure water. Overall, as predicted
by the EFP method for low TBA concentrations, water–water
H-bonding and TBA–TBA interactions are enhanced, while water–TBA
interactions are less favorable. However, more homogeneous
mixing is observed in the equimolar water–TBA solution, and more
interactions occur between water and TBA molecules.
The energy decomposition of the bulk water–TBA solutions is
shown in Table 5.4. Similar to the energy decomposition in the
dimers, the Coulomb and repulsion energies are the largest energy
terms in the bulk. However, a sum of Coulomb and exchange-
repulsion energies contributes only a small fraction to the total
interaction energy, while the polarization and dispersion energies
dominate interaction patterns in water–TBA solutions. With in-
creasing TBA concentration, the Coulomb and polarization energy
fractions decrease, while the dispersion energy fraction increases.
These results are consistent with intuitive expectations when highly
polar water molecules are replaced with TBA molecules containing
hydrophobic methyl groups. The repulsion energy fraction remains
fairly constant at all TBA concentrations. It is noteworthy that the
fraction of the polarization energy in the bulk is consistently higher
than that in the dimers. For example, at all TBA concentrations,
the polarization term is larger in magnitude than the sum of the
first-order (Coulomb plus exchange) contributions. A similar situation
was previously observed in water clusters [44]. The increase of the
polarization energy in the bulk occurs due to many-body collective
behavior in a polar medium; the relative decrease of polarization at
higher concentrations of TBA is a sign of weakening of the many-body
effects in the presence of the TBA hydrophobic groups. It is
worth mentioning that an accurate description of cooperativity is
non-trivial; it is possible that standard polarizable force fields with
atom-centered scalar polarizabilities underestimate the amplitude
of cooperativity and the magnitude of polarization energy [45].
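Both statements, that polarization exceeds the first-order (CEX) contribution at every concentration and that the bulk polarization fractions exceed those of the dimers, can be verified from Tables 5.3 and 5.4. The Python sketch below performs this check; the list layout is chosen for this illustration.

```python
# Energy fractions of the bulk solutions from Table 5.4.
tba_mole_fraction = [0.00, 0.06, 0.11, 0.16, 0.50]
coulomb = [1.29, 1.28, 1.26, 1.25, 1.03]
exchange_repulsion = [-1.24, -1.24, -1.23, -1.31, -1.20]
polarization = [0.56, 0.50, 0.49, 0.49, 0.37]

# Polarization fractions of the four dimers from Table 5.3.
dimer_polarization = [0.20, 0.26, 0.20, 0.23]

# First-order contribution: Coulomb plus exchange-repulsion.
cex = [round(c + x, 2) for c, x in zip(coulomb, exchange_repulsion)]

# Is polarization larger in magnitude than CEX at each concentration?
pol_exceeds_cex = [abs(p) > abs(f) for p, f in zip(polarization, cex)]

for x, f, p, flag in zip(tba_mole_fraction, cex, polarization, pol_exceeds_cex):
    print(f"x(TBA) = {x:.2f}  CEX = {f:+.2f}  pol = {p:.2f}  |pol| > |CEX|: {flag}")
print("bulk minimum exceeds dimer maximum:",
      min(polarization) > max(dimer_polarization))
```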

5.4.4 Affinity of Ions to Hydrophobic Interfaces


While microscopic interactions in water–alcohol mixtures are
far from being well understood, the behavior of ions in these
heterogeneous systems is even more intriguing. It is commonly
accepted that larger and more polarizable ions tend to favor water
interfaces, such as air–water or oil–water interfaces, while smaller
ions such as F− and Cl− avoid the interface and are repelled into
the bulk solution [46–49]. However, it is not quite clear whether
microscopic hydrophobic interfaces such as the methyl groups in
tert-butyl alcohol will generate a similar response in ionic aqueous
solution. To address this question, MD simulations of TBA in 2.7
M NaF and NaI aqueous solutions were performed [37]. The simulation
box contained 1 TBA molecule, 5 halide and 5 sodium ions,
and 100 water molecules, all of which were represented by
EFP fragments. Similar to the TBA–water simulations, 300 K NVT
simulations with a 0.5 fs time step were performed, and the results
were averaged over five independent 60 ps trajectories.
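The composition of the simulation box can be related to the quoted salt concentration by a back-of-the-envelope estimate. The Python sketch below assumes that the box volume equals the volume of 100 water molecules at ambient density (molar volume of about 18.07 mL/mol); the actual box dimensions are not given in the text, and the volume of the ions and of TBA is neglected, so the estimate is expected to lie slightly above the quoted 2.7 M. The same sketch counts the MD steps implied by the trajectory length and time step.

```python
# Assumed molar volume of liquid water at ambient conditions, in L/mol
# (18.015 g/mol divided by 0.997 g/mL); not taken from the source.
WATER_MOLAR_VOLUME_L = 0.018069


def approximate_molarity(n_solute, n_water):
    """Concentration in mol/L if only the water contributes to the volume.

    Avogadro's number cancels between the solute amount and the volume.
    """
    return n_solute / (n_water * WATER_MOLAR_VOLUME_L)


molarity = approximate_molarity(n_solute=5, n_water=100)

# Trajectory bookkeeping: 60 ps per trajectory, 0.5 fs time step, 5 runs.
trajectory_fs = 60_000
time_step_fs = 0.5
steps_per_trajectory = round(trajectory_fs / time_step_fs)
total_sampling_ps = 5 * 60

print(f"estimated salt concentration: {molarity:.2f} M")
print(f"steps per trajectory: {steps_per_trajectory}")
print(f"total sampling: {total_sampling_ps} ps")
```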
Radial and angular distribution functions from these simulations
are shown in Fig. 5.4. Comparison of RDFs between the central
carbon of TBA ($C_{\mathrm{TBA}}$) and water oxygen, I−, and F− clearly shows
that fluoride ions are repelled from the first hydration shell of
TBA, which is determined by the minimum in the TBA–water RDF
at 6.5 Å (the first minimum at 4.0 Å corresponds to the waters
H-bonded to TBA). On the other hand, the probability of finding iodide
near TBA is just slightly below one, suggesting that the iodide ion
is neither repelled from nor attracted to TBA molecules. More detailed
information on the distribution of the iodide ion in the proximity of
TBA can be obtained by using angular distribution functions (ADFs)
calculated at different maximal separations between the central TBA
carbon and I−, as shown in the bottom part of Fig. 5.4. The ADF based
on configurations with a maximum TBA–I− separation of 6.5 Å shows
four distinct regions of favorable location of the iodide ion: 0–20°,
30–70°, a broad peak centered at 100°, and a smaller peak near 180°.
The last two regions correspond to configurations where ions
are in contact with the hydrophobic part of TBA. The 30–70° peak
corresponds to the iodide being H-bonded to the OH group of TBA.
The peak at 0° disappears when only closer iodide–TBA contacts
are considered (the ADF corresponding to the 5.3 Å cut-off distance),
which suggests that this angular population arises from ions that
are located 5.3 to 6.5 Å from $C_{\mathrm{TBA}}$. A detailed investigation of the
corresponding snapshots shows that the ion is bridged by a water
molecule that is H-bonded to the OH group of TBA. To summarize, I−
ions are most likely to be located either near the OH head group of
TBA or around the periphery of its CH3 groups, rather than up against the
hydrophobic end of TBA.
The distribution of ions around a solute can be deduced from the
frequency shift in characteristic vibrational modes of the solute
upon increasing the salt concentration in solution. Such measurements
with the Raman-MCR (multivariate curve resolution) technique were
performed by Ben-Amotz and co-workers for low-concentration
TBA aqueous solutions at various concentrations of NaI and NaF
[37]. These measurements show no frequency shift in the CH stretch
region of TBA in the presence of NaF, in agreement with EFP
simulations showing no fluoride ions in the proximity of TBA. On the
other hand, there is a small reproducible red shift in the CH stretch
vibration in NaI solutions, with a magnitude of 1 cm−1/M. That is,
a total shift of ∼3 cm−1 in a 3 M NaI solution was observed. The
QM/EFP scheme, in which TBA and iodide ions were described
by MP2/6-311++G** and water molecules were represented by
effective fragments, was used to reproduce this
salt-concentration-dependent frequency shift. For that, 100 snapshots from the EFP
Figure 5.4 (Top) Radial distribution functions between the central carbon
in TBA and F− (blue dotted curve), I− (black solid curve), and water oxygen
(red slashed curve). (Bottom) The angular distribution of I− ions within a
first hydration shell of TBA. The upper (dashed red) curve corresponds to
ions that are within 5.3 Å of the central carbon atom of TBA. The lower (solid
black) curve corresponds to ions that are within 6.5 Å of the central carbon
atom of TBA. Adapted from Ref. [37].

MD TBA–NaI–water trajectories were gathered and QM/EFP partial
Hessian frequency calculations were performed. These frequencies
were compared with analogous QM/EFP averaged frequencies of the
salt-free TBA–water solution. The average shift of Raman-active
modes of TBA in 2.7 M aqueous NaI solution was calculated as ∼4
± 1 cm−1, in very good agreement with experiment.
These results suggest that EFP accurately predicts the affinity of
ions to molecular interfaces, which is very promising for future
applications of EFP to studying interfacial phenomena.

5.5 QM/EFP Schemes

Many chemical and biological processes call for a quantum treatment,
due to a chemical reaction, electronic excitation, or the necessity of
a more accurate description. The hybrid quantum mechanics/molecular
mechanics (QM/MM) approach, pioneered by Warshel and coworkers
[50], provides a means to represent a part of the system with a
rigorous ab initio method while modeling the rest of the system with
a classical force field. The QM/MM Hamiltonian of the combined
system can be represented as:

\[ H_{\mathrm{QM/MM}} = H_{\mathrm{QM}} + H_{\mathrm{MM}} + H_{\mathrm{QM-MM}} \tag{5.8} \]

A separation of the system into quantum and classical parts
keeps the computational cost low. The accuracy of QM/MM
schemes depends on a number of factors, including the accuracy
of the methods used for describing the quantum and classical regions,
the choice of the QM region, the level of coupling between the QM
and MM Hamiltonians, and the description of the boundary between
the QM and MM regions. The many available QM/MM approaches
differ in how they address these challenges. Polarizable QM/MM
schemes for electronic excited states have been developed in several
groups [51–55]. Unique features of the QM/EFP methodology
developed in Refs. [6, 56–60] are discussed below.
As follows from the form of the QM/MM Hamiltonian (Eq. 5.8),
any electronic structure method can be used for describing the
QM and MM parts of the system. Applications discussed in the
following section mainly deal with understanding photochemistry
in the condensed phase, with a common choice of excited-state
methods from the equation-of-motion coupled-cluster (EOM-CC)
family [61–64], time-dependent density functional theory (TD-DFT)
[65–66], or configuration interaction (CI) methods. The MM
Hamiltonian is represented by either the EFP1 or the EFP Hamiltonian
from Eq. 5.1 or Eq. 5.7. The $H_{\mathrm{QM-MM}}$ coupling term in the QM/EFP1
scheme contains three terms, following the energy terms in the EFP1
Hamiltonian:
\[ H_{\mathrm{QM-EFP1}} = \sum_{p,q} d_{pq} \sum_{A} \sum_{k \in A} \left\langle p \left| V_k^{\mathrm{Coul}} + V_k^{\mathrm{pol}} + V_k^{\mathrm{rem}} \right| q \right\rangle, \tag{5.9} \]
where p and q are atomic orbitals in the quantum region, $d_{pq}$ is an
element of the atomic density matrix, and $V_k^{\mathrm{Coul}}$, $V_k^{\mathrm{pol}}$, and $V_k^{\mathrm{rem}}$ are
the Coulomb, polarization, and repulsion potentials at the expansion
point k of fragment A. All EFP1 QM–MM coupling terms appear as
one-electron contributions to the ab initio Hamiltonian.
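The Coulomb term involves products of multipole moment integrals
(up to octopole) with the multipoles on the EFP fragments:
\[
V_k^{\mathrm{Coul}} = \frac{q^k}{R}
 + \sum_{a}^{x,y,z} \frac{\mu_a^k \, a}{R^3}
 + \sum_{a,b}^{x,y,z} \frac{\Theta_{ab}^k \left( 3ab - R^2 \delta_{ab} \right)}{3R^5}
 + \sum_{a,b,c}^{x,y,z} \frac{\Omega_{abc}^k \left( 5abc - R^2 (a\delta_{bc} + b\delta_{ac} + c\delta_{ab}) \right)}{5R^7}, \tag{5.10}
\]
where R and a, b, c are the distance and its Cartesian components
between the electron and EFP expansion point k, and $q^k$, $\mu^k$,
$\Theta^k$, and $\Omega^k$ denote the charge, dipole, quadrupole, and
octopole at that point. Note that the first
term in Eq. 5.10 is an integral used in the electron–nucleus attraction
term; the other terms in Eq. 5.10 may be obtained using recurrence
relations [67].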
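The structure of Eq. 5.10 is perhaps easiest to see when the expression is evaluated classically at a single point in space. The Python sketch below does this for a multipole expansion located at the origin; it is an illustration only, since in an actual QM/EFP calculation these terms enter as one-electron integrals over basis functions rather than as point values. The function name and argument layout are assumptions of this example.

```python
import math


def kronecker(i, j):
    """Kronecker delta."""
    return 1.0 if i == j else 0.0


def multipole_potential(r_vec, charge=0.0, dipole=None,
                        quadrupole=None, octopole=None):
    """Point value of Eq. 5.10 at displacement r_vec from expansion point k.

    dipole is a length-3 sequence, quadrupole a 3x3 nested sequence,
    and octopole a 3x3x3 nested sequence (Cartesian components).
    """
    idx = range(3)
    r2 = sum(x * x for x in r_vec)
    r = math.sqrt(r2)
    v = charge / r
    if dipole is not None:
        v += sum(dipole[a] * r_vec[a] for a in idx) / r**3
    if quadrupole is not None:
        v += sum(
            quadrupole[a][b] * (3.0 * r_vec[a] * r_vec[b] - r2 * kronecker(a, b))
            for a in idx for b in idx
        ) / (3.0 * r**5)
    if octopole is not None:
        v += sum(
            octopole[a][b][c] * (
                5.0 * r_vec[a] * r_vec[b] * r_vec[c]
                - r2 * (r_vec[a] * kronecker(b, c)
                        + r_vec[b] * kronecker(a, c)
                        + r_vec[c] * kronecker(a, b))
            )
            for a in idx for b in idx for c in idx
        ) / (5.0 * r**7)
    return v


# Illustrative values only: unit charge plus a unit dipole along z,
# evaluated at a point on the z axis.
point = (0.0, 0.0, 2.0)
print(multipole_potential(point, charge=1.0, dipole=(0.0, 0.0, 1.0)))
```

Each successive multipole rank decays faster with distance, which is why the higher-order terms matter mainly at short range.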
QM-EFP polarization is obtained self-consistently with the electronic
wave function of the QM part and the induced dipoles of the EFP part.
In practice, this is achieved in two nested self-consistent cycles:
the outer one for the electronic wave function, i.e., a standard
SCF procedure, and the inner one for the EFP induced dipoles
at the instantaneous value of the field due to the electronic wave
function. The $H_{\mathrm{QM-MM}}$ for polarization includes two parts: the
one-electron contribution to the electronic Hamiltonian (Eq. 5.11) and
the polarization energy added to the EFP subsystem (Eq. 5.12):
\[ V_k^{\mathrm{pol}} = -\frac{1}{2} \sum_{a}^{x,y,z} \frac{\left( \mu_a^k + \tilde{\mu}_a^k \right) a}{R^3}, \tag{5.11} \]
\[ E_{\mathrm{pol}} = \frac{1}{2} \sum_{k} \sum_{a}^{x,y,z} \left[ -\mu_a^k \left( F_a^{\mathrm{mult},k} + F_a^{\mathrm{nuc},k} \right) + \tilde{\mu}_a^k F_a^{\mathrm{ai},k} \right], \tag{5.12} \]
where $\mu_a^k$ and $\tilde{\mu}_a^k$ are the Cartesian components of the induced dipole
and the conjugated induced dipole, respectively, at polarizability
point k. $F_a^{\mathrm{mult},k}$, $F_a^{\mathrm{nuc},k}$, and $F_a^{\mathrm{ai},k}$ are the Cartesian components of
electric fields at point k due to EFP multipoles, nuclei of the ab initio
region, and the ab initio wave function, respectively.
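The inner self-consistent cycle for the induced dipoles can be illustrated with a deliberately simplified toy model. The Python sketch below treats two identical polarizable points on a common axis with a scalar polarizability and dipoles directed along that axis; EFP itself uses anisotropic polarizability tensors at LMO centroids, short-range damping, and, in QM/EFP, a field from the wave function that changes at every outer SCF step. All names and numerical values are assumptions of this example (atomic units).

```python
def induced_dipoles(alpha, static_field, separation, tol=1e-12, max_iter=200):
    """Iterate two mutually polarizing point dipoles to self-consistency.

    alpha: scalar polarizability of each point
    static_field: fields at points 0 and 1 from permanent sources
    separation: distance between the two points along the common axis
    Returns the converged dipoles and the number of iterations used.
    """
    # Field on the axis of a point dipole, per unit dipole moment.
    coupling = 2.0 / separation**3
    mu = [0.0, 0.0]
    for iteration in range(1, max_iter + 1):
        new_mu = [
            alpha * (static_field[0] + coupling * mu[1]),
            alpha * (static_field[1] + coupling * mu[0]),
        ]
        change = max(abs(n - o) for n, o in zip(new_mu, mu))
        mu = new_mu
        if change < tol:
            return mu, iteration
    raise RuntimeError("induced dipoles did not converge")


alpha, separation = 1.0, 3.0
field = [0.01, 0.01]
mu, n_iter = induced_dipoles(alpha, field, separation)

# Polarization energy of linearly responding dipoles in the static field.
e_pol = -0.5 * sum(m * f for m, f in zip(mu, field))

# Closed-form solution for two identical points in equal static fields.
mu_exact = alpha * field[0] / (1.0 - alpha * 2.0 / separation**3)
print(mu, n_iter, e_pol, mu_exact)
```

In this model the iteration diverges once the product of the polarizability and the coupling reaches 1, which is a simple picture of the polarization collapse that the short-range damping functions are meant to prevent.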
The repulsive term of the EFP1 $H_{\mathrm{QM-MM}}$ also contributes a
one-electron integral to the electronic Hamiltonian in which each
EFP1 potential is modeled with a set of Gaussian functions with
predefined parameters $\alpha_{kn}$ positioned on atom centers of EFP1
waters:
\[ V_k^{\mathrm{rem}} = \sum_{n=1}^{2} \exp\left( -\alpha_{kn} R^2 \right) \tag{5.13} \]

In QM/EFP schemes for the general EFP method, the electrostatic
and polarization QM–MM couplings are identical to those in the
EFP1 model. However, instead of Eq. 5.13, separate dispersion
and exchange-repulsion terms appear in $H_{\mathrm{QM-MM}}$. The optimal
form of these terms is still under debate, with recent work
suggesting possible routes [4, 68–69]. The main ideas guiding
the development of these terms are rigorous first-principles-based
formulation, computational feasibility, transparent implementation,
and straightforward extension to analytic gradients.
In the applications discussed in Section 5.6, dispersion and
exchange-repulsion terms were excluded from $H_{\mathrm{QM-MM}}$, and the
response of the EFP environment to the QM subsystem was purely
electrostatic, i.e., consisting of the Coulomb and polarization terms.
Equations 5.9, 5.10, and 5.11 define a polarizable embedding
scheme, i.e., the situation in which the QM subsystem is fully
polarized by the MM region and vice versa [70]. For example, a
change in the electronic wave function due to electronic excitation
affects polarization (values of the induced dipoles) in the EFP
subsystem, which in turn provides feedback to the electronic
Hamiltonian. Modification of the QM Hamiltonian by the EFP terms
leads to a change in the electronic wave function, as the HF or
Kohn–Sham (KS) procedure is solved in a different potential than would be
the case for the gas-phase Hamiltonian. The resulting molecular orbitals
in the presence of effective fragments possess somewhat different
shapes and energies. For example, changes in electronic excitation
energies can often be understood in terms of relative stabilization
or destabilization of the involved orbitals.
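
The mutual polarization described above can be illustrated with a toy model of the environment side only: point polarizabilities whose induced dipoles are iterated to self-consistency in a fixed external field. This is a sketch under stated assumptions, not the EFP implementation. The function names are hypothetical, the polarizabilities are taken as isotropic scalars, no damping is applied, and the "static field" stands in for the sum of multipole, nuclear, and wave function fields, which a real QM/EFP calculation would update together with the wave function.

```python
def dipole_field(mu, r_vec):
    """Field at displacement r_vec from a point dipole mu (atomic units)."""
    r2 = sum(c * c for c in r_vec)
    r3 = r2 ** 1.5
    mu_dot_r = sum(m * c for m, c in zip(mu, r_vec))
    return [(3.0 * mu_dot_r * c / r2 - m) / r3 for m, c in zip(mu, r_vec)]


def solve_induced_dipoles(positions, alphas, static_fields,
                          tol=1e-10, max_iter=200):
    """Iterate mu_i = alpha_i * (F_static_i + sum_j T_ij mu_j) to convergence."""
    n = len(positions)
    mus = [[a * f for f in field] for a, field in zip(alphas, static_fields)]
    for _ in range(max_iter):
        new_mus = []
        for i in range(n):
            total = list(static_fields[i])
            for j in range(n):
                if i == j:
                    continue
                r_vec = [pi - pj for pi, pj in zip(positions[i], positions[j])]
                for axis, f in enumerate(dipole_field(mus[j], r_vec)):
                    total[axis] += f
            new_mus.append([alphas[i] * f for f in total])
        change = max(abs(a - b)
                     for new, old in zip(new_mus, mus)
                     for a, b in zip(new, old))
        mus = new_mus
        if change < tol:
            return mus
    raise RuntimeError("induced dipoles did not converge")


# Two identical polarizable points on the z axis in a uniform field along z.
positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 3.0)]
alphas = [1.0, 1.0]
static_fields = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
induced = solve_induced_dipoles(positions, alphas, static_fields)
print(induced[0][2])
```

For this collinear arrangement the converged dipole can be checked analytically: each point satisfies mu = alpha * (F + 2 * mu / r**3), so the induced dipole is larger than the non-interacting value alpha * F.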
When electronic excited states in the QM region are considered,
each electronic state possesses a unique electron density and thus
interacts differently with the polarizable environment. Purely
electrostatic (Coulomb) interactions between the solvent and the
electronically excited solute are automatically taken into account
through the one-electron Coulomb term in the QM Hamiltonian (Eq. 5.10).
Part of the polarization interaction is represented in a similar
way through the explicit inclusion of EFP induction terms in the QM
Hamiltonian, as given by Eq. 5.11.
An additional contribution to the interaction energy arises from the
individual response of the polarizable environment to the electron
density of each electronic state. Effectively, the Hamiltonians of
different electronic states differ by their $V^{\mathrm{pol}}$ terms,
because individual electron densities produce different EFP induced
dipoles that contribute differently to the Hamiltonian in Eq. 5.11.
Describing the EFP polarization self-consistently for each electronic
state results in a set of non-orthogonal electronic states with
different effective Hamiltonians, which is referred to as “method 3”
in Ref. [56]. However, dealing with non-orthogonal electronic states
can be quite cumbersome, as calculations of transition properties and
analytic gradients require orthogonal states. Additionally, simultaneous
calculation of non-orthogonal states in the Davidson diagonalization
procedure is non-trivial, so that diagonalization of one state at a
time might be required instead.
Alternatively, one can search for electronic excited states while
keeping the polarization of the environment frozen at the values
corresponding to the ground state. In this case, the QM/EFP excited
state energy is

\[
E_{\mathrm{ex}}^{\mathrm{QM/EFP}} = \langle \Psi_{\mathrm{ex}} | H_0 + V^{\mathrm{Coul}} + V^{\mathrm{pol}} | \Psi_{\mathrm{ex}} \rangle + E_{\mathrm{coul}} + E_{\mathrm{pol}} + E_{\mathrm{disp}} + E_{\mathrm{exrep}}, \tag{5.14}
\]

where $E_{\mathrm{pol}}$ is the polarization energy corresponding to the
ground state electron density, as defined in Eq. 5.12.
Expression (5.14) captures most of the effect of the environment
on the electronic structure of a solute, and corresponds to
“model 1” from Ref. [56].
An additional improvement to the description of the solvatochromic
shifts can be achieved by employing a perturbative approach that
accounts for a state-specific response of the polarizable
environment [55–58]. In this formalism, after the excited electronic
states are obtained (at a fixed value of the environment polarization),
their one-electron densities are used to repolarize the environment.
The differential polarization energy corresponding to a particular
electronic state is given by the following expression:

\[
\Delta E_{\mathrm{pol}} = \frac{1}{2} \sum_{k} \sum_{a}^{x,y,z} \left[
\begin{aligned}
&-\left(\mu^{k}_{\mathrm{ex},a} - \mu^{k}_{\mathrm{gr},a}\right)\left(F^{\mathrm{mult},k}_{a} + F^{\mathrm{nuc},k}_{a}\right) \\
&+\left(\tilde{\mu}^{k}_{\mathrm{ex},a} F^{\mathrm{ai},k}_{\mathrm{ex},a} - \tilde{\mu}^{k}_{\mathrm{gr},a} F^{\mathrm{ai},k}_{\mathrm{gr},a}\right) \\
&-\left(\mu^{k}_{\mathrm{ex},a} + \tilde{\mu}^{k}_{\mathrm{ex},a} - \mu^{k}_{\mathrm{gr},a} - \tilde{\mu}^{k}_{\mathrm{gr},a}\right) F^{\mathrm{ai},k}_{\mathrm{ex},a}
\end{aligned}
\right], \tag{5.15}
\]

where $F^{\mathrm{ai}}_{\mathrm{ex}}$ is the field due to the excited state
one-electron density, and $\mu^{k}_{\mathrm{ex}}$ and $\tilde{\mu}^{k}_{\mathrm{ex}}$
are the induced dipoles corresponding to this excited state density.
$\mu^{k}_{\mathrm{gr}}$ and $\tilde{\mu}^{k}_{\mathrm{gr}}$ are the induced dipoles
corresponding to the ground state, as calculated from the ground
state self-consistent polarization procedure.
$\Delta E_{\mathrm{pol}}$ is added to the
electronic excitation energy of the considered state (Eq. 5.14).
The first two terms in Eq. 5.15 give the difference between the
polarization energies of the QM/EFP system in the excited and ground
electronic states, while the last term is the leading correction to
the interaction of the ground-state-optimized induced dipoles with
the excited state wave function. The perturbative treatment of the
response of the polarizable environment as in Eqs. 5.14 and 5.15
corresponds to “method 2” from Ref. [56]. This approach has
been used for the calculations discussed in Section 5.6.
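
To make the structure of Eq. 5.15 concrete, the following sketch evaluates the three bracketed terms for lists of per-point Cartesian vectors. The function and argument names are hypothetical, and the multipole and nuclear fields are assumed to be pre-summed into a single static field; the numbers in the example are arbitrary and serve only to exercise the arithmetic.

```python
def differential_polarization_energy(mu_ex, mu_gr, mut_ex, mut_gr,
                                     f_static, f_ai_ex, f_ai_gr):
    """Evaluate the state-specific polarization correction of Eq. 5.15.

    Each argument is a list over polarizable points k of 3-component
    vectors (x, y, z):
      mu_*, mut_* -- induced dipoles and conjugate induced dipoles
      f_static    -- F^mult + F^nuc at each point
      f_ai_*      -- field of the QM density (excited or ground state)
    """
    total = 0.0
    for k in range(len(mu_ex)):
        for a in range(3):
            d_mu = mu_ex[k][a] - mu_gr[k][a]
            d_mut = mut_ex[k][a] - mut_gr[k][a]
            # Term 1: change of induced dipoles in the static field.
            total -= d_mu * f_static[k][a]
            # Term 2: conjugate dipoles in the field of each state's density.
            total += mut_ex[k][a] * f_ai_ex[k][a] - mut_gr[k][a] * f_ai_gr[k][a]
            # Term 3: leading correction for the frozen ground-state dipoles.
            total -= (d_mu + d_mut) * f_ai_ex[k][a]
    return 0.5 * total


# Arbitrary one-point example (not data from the chapter).
delta_e = differential_polarization_energy(
    mu_ex=[[0.2, 0.0, 0.0]], mu_gr=[[0.1, 0.0, 0.0]],
    mut_ex=[[0.3, 0.0, 0.0]], mut_gr=[[0.1, 0.0, 0.0]],
    f_static=[[1.0, 0.0, 0.0]],
    f_ai_ex=[[2.0, 0.0, 0.0]], f_ai_gr=[[1.0, 0.0, 0.0]],
)
print(delta_e)
```

A useful sanity check is that the correction vanishes when the excited state dipoles and fields are identical to the ground state ones, as expected for a state-specific term.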
Table 5.5 shows a decomposition of the total solvatochromic shift in
para-nitroaniline (pNA) solvated by several water molecules into
electrostatic and polarization contributions. In these calculations,
water molecules are represented by EFPs while pNA is contained in the
QM region. The indirect electrostatic and polarization terms arise
from Eq. 5.14, i.e., these are the contributions to the electronic
energies due to the one-electron Coulomb and polarization terms in
the QM Hamiltonian. One can think of the indirect terms as
orbital relaxation of the QM subsystem due to the EFP terms in the
QM–MM Hamiltonian. As follows from Table 5.5, the electrostatic
contributions dominate the indirect term, while polarization is
responsible for about 25% of the solvatochromic shift.

Table 5.5 Solvatochromic red shifts (eV) in the π → π* singlet
transition in pNA–water complexes (a)

Full EOM-CCSD               –0.279   –0.193   –0.263
EOM-CCSD/EFP                –0.270   –0.195   –0.239
  total indirect            –0.262   –0.186   –0.220
    electrostatic           –0.213   –0.145   –0.177
    polarization            –0.049   –0.040   –0.043
  direct polarization (b)   –0.008   –0.009   –0.019

(a) Gas phase excitation energy is 4.654 eV.
(b) Direct polarization contribution (i.e., “polarization correction”)
to the solvatochromic shift calculated by Eq. 5.15.

The state-specific response of the polarizable environment,
calculated by Eq. 5.15, is several times smaller than the indirect
polarization shift and amounts to 0.01–0.02 eV in absolute value. Thus,
the polarization correction provides only a minor contribution to the
solvatochromic shift in pNA–water complexes. However, when the
ground (reference) state and the excited state differ significantly in
character, as is the case in EOM-IP methods, the direct polarization
contribution might become very significant. The overall role of
polarization is expected to increase in larger clusters and bulk
systems, where many-body effects become prominent.
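
The bookkeeping behind Table 5.5 can be checked with a few lines of arithmetic. The sketch below uses only the tabulated values; the variable and function names are hypothetical, and the three complexes are referred to by column index because their labels are not given in the table. Sums formed from the rounded entries may differ from the tabulated totals by 0.001 eV.

```python
# Values from Table 5.5 (eV), one entry per pNA-water complex (column).
full_eom_ccsd = (-0.279, -0.193, -0.263)
electrostatic = (-0.213, -0.145, -0.177)
indirect_polarization = (-0.049, -0.040, -0.043)
direct_polarization = (-0.008, -0.009, -0.019)


def decompose(column):
    """Rebuild the QM/EFP shift of one complex from its components."""
    indirect = electrostatic[column] + indirect_polarization[column]
    total = indirect + direct_polarization[column]
    polarization = indirect_polarization[column] + direct_polarization[column]
    return {
        "total_indirect": indirect,
        "qm_efp_shift": total,
        "polarization_fraction": polarization / total,
        "error_vs_full": total - full_eom_ccsd[column],
    }


results = [decompose(i) for i in range(3)]
for i, r in enumerate(results):
    print(f"complex {i + 1}: shift = {r['qm_efp_shift']:.3f} eV, "
          f"polarization share = {100 * r['polarization_fraction']:.0f}%, "
          f"error vs full EOM-CCSD = {r['error_vs_full']:+.3f} eV")
```

With the direct and indirect polarization terms combined, the polarization share comes out between roughly one fifth and one quarter of the total shift, in line with the estimate quoted in the text.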
The computational cost of QM/EFP calculations is typically
determined by the cost of the QM calculation. Additionally, “method
2” (perturbative account of state-specific polarization) requires
calculation of the one-particle density matrix for each electronic state.
QM/EFP schemes have been implemented for a variety of electronic
structure methods, such as HF, DFT, CIS, CIS(D), TD-DFT, and various
EOM-CC methods, and provide means to analyze electronic structure
in the environment at the desired level of accuracy.
5.6 Excited State Chemistry in the Condensed Phase

The QM/EFP1 scheme has been used for investigating a variety of chemical
processes in aqueous environments, including chemical reactions,
amino acid neutral/zwitterion equilibria, solvent effects on
properties of a solute such as changes in dipole moment and shifts in
the vibrational spectrum, and solvatochromic shifts of electronic levels
[36, 56, 59–60, 71–79]. Applications of the general QM/EFP scheme
have so far been limited to studies of electronic excitations and ionization
energies in various solvents [56–58]. Extensions of QM/EFP to
biological systems have also been developed [80–85].

5.6.1 Solvatochromic Shifts and Photodynamics of Para-Nitroaniline

Para-nitroaniline (pNA) is an organic chromophore that is often
used as a tag in UV/Vis, Raman, and second-harmonic generation
spectroscopies [86–91]. pNA possesses an electron-donating amino group
and an electron-accepting nitro group, which give rise to a bright
charge-transfer (CT) electronic state with an increased dipole moment. This
CT state becomes red-shifted in polar solvents and thus can be used
for characterization of solute–solvent interactions. The photodynamics
of the CT state is solvent-dependent, as follows from ultrafast
transient absorption measurements [90–91]. For example, it was
observed that in non-polar solvents the CT state undergoes fast relaxation
to the triplet manifold via intersystem crossing (ISC) [91]. However,
the ISC channel is inactive in water, and the system undergoes
nonradiative internal conversion (IC).
QM/EFP methods were applied to understand solvent effects
and relaxation dynamics of the CT state of pNA in three different
solvents: water, dioxane, and cyclohexane [57]. Specifically, pNA was
described by the configuration interaction singles with perturbative
doubles [CIS(D)] method [92] in the 6-31+G* basis, while solvent
molecules were represented by EFP fragments. For each system, the
pNA molecule was solvated by 64 solvent molecules. The configurational
space of each system was sampled with EFP MD (in which pNA
was also represented as an EFP fragment) with periodic boundary
conditions, using the NVT ensemble at 300 K. Snapshots from the
MD trajectories were used for QM/EFP excited state calculations.
Further details of these simulations can be found in Ref. [57].

Figure 5.5 Excitation energies of the singlet electronic states of pNA in
cyclohexane, 1,4-dioxane, and water compared to the gas-phase energies.
Gas-phase dipole moments (μ, Debye) are also shown; the ground state
dipole moment is 7.7 D. State assignments are given in the $C_{2v}$
symmetry group. (Vertical axis: excitation energy, eV; horizontal axis:
gas phase, c-hexane, dioxane, water.)

An example of solvatochromic shifts (calculated
at a characteristic snapshot from the MD trajectory for each solvent)
of different electronic excited states is shown in Fig. 5.5. Inspection
of this plot reveals that the electronic states with dipole
moments larger than that of the ground state (shown
as solid red curves in Fig. 5.5) become increasingly stabilized
(red-shifted) in polar solvents. For example, the $1^1A_1$, $1^1B_1$, and
$2^1B_1$ states, whose dipole moments exceed the ground state value
(7.7 D), demonstrate systematic red shifts upon solvation. The red shift
increases in more polar solvents (in the order of c-hexane, dioxane,
and water). The most dramatic red shift is experienced by the
experimentally observed $1^1A_1$ charge-transfer state, with a
(gas-phase) dipole moment of 12.9 D. It is quite intriguing that this state
(the lowest red state in Fig. 5.5) is only the third lowest excited state
in the gas phase but becomes the lowest excited state in water. On the
other hand, the electronic states with smaller dipole moments (blue
dashed curves in Fig. 5.5) become blue-shifted in polar solvents. For
example, clear blue shifts are observed for $1^1A_2$ and $2^1B_2$.
Additional, more complex stabilization/destabilization effects
might arise due to new patterns of state interactions in a solvent.
This is because a solvent lifts symmetry constraints and allows
interactions among states belonging to different symmetries.
Thus, the initial stabilization/destabilization by a solvent might be
augmented by interactions and mixing with nearby states, resulting
in intricate changes in the excitation spectra and dynamics.
The calculated absorption spectra of the $1^1A_1$ CT state of pNA in
water, dioxane, and cyclohexane are shown in Fig. 5.6. Comparison
of the computed and experimental absorption spectra suggests
that while the calculations overestimate the absolute values of
spectral maxima (probably due to deficiencies in the excited state
method CIS(D) and the use of a relatively small basis set), they faithfully
reproduce the direction and size of the solvatochromic shifts. For example,
the calculated and experimentally measured red shifts in water are
both found to be 1.0 eV [90]. The calculated red shifts in 1,4-dioxane
and cyclohexane are smaller, 0.4 eV and 0.2 eV, respectively,
which underestimates the experimentally observed
red shifts in these solvents by ∼0.2 eV [90]. Deviations between the
computed and experimental values of the solvatochromic shifts in dioxane and
cyclohexane could be due to the lack of short-range cavity dispersion
and exchange-repulsion contributions in the QM/MM coupling term
of the Hamiltonian, which becomes more pronounced in non-polar
solvents. Additionally, both the shifts and widths of the spectral lines
might be affected by the rigid geometry of the chromophore, which is
not changed during MD simulations. For example, the narrower spectral
lines in dioxane and cyclohexane can be explained by the omission of
inhomogeneous broadening that results from freezing the vibrational
degrees of freedom of pNA. On the other hand, the simulations still account
for broadening due to different orientations of solvent molecules.
However, as solute–solvent interactions weaken in less polar
solvents (dioxane and cyclohexane), different solvent configurations
have a lesser impact on spectral shifts, and narrower spectral lines
are produced. Thus, the electronic excitation energies in non-polar
solvents are less sensitive to solvent reorganization in the present
polarizable embedding QM/MM scheme.

Figure 5.6 Simulated absorption spectra of the $1^1A_1$ CT state in water
(red), 1,4-dioxane (blue), and cyclohexane (green). The vertical dashed line
corresponds to the energy of $1^1A_1$ in the gas phase.

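
One common way to turn a set of snapshot excitation energies into an absorption band is to place a Gaussian on each snapshot and sum them. The sketch below illustrates that idea only; it is not claimed to be the procedure of Ref. [57]. The function names, the broadening width, and the snapshot energies and oscillator strengths are all hypothetical placeholders, not pNA data.

```python
import math


def gaussian_spectrum(energies, strengths, grid, sigma):
    """Sum one normalized Gaussian per snapshot, weighted by oscillator strength."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [
        sum(f * norm * math.exp(-0.5 * ((e - e0) / sigma) ** 2)
            for e0, f in zip(energies, strengths))
        for e in grid
    ]


def mean_shift(energies, strengths, gas_phase_energy):
    """Strength-weighted mean excitation energy minus the gas-phase value.

    A negative result corresponds to a red shift.
    """
    weighted = sum(f * e for e, f in zip(energies, strengths))
    return weighted / sum(strengths) - gas_phase_energy


# Hypothetical snapshot data (eV) for illustration only.
snapshot_energies = [4.0, 4.2]
snapshot_strengths = [1.0, 1.0]
gas_phase = 4.6

grid = [3.5 + 0.01 * i for i in range(121)]
spectrum = gaussian_spectrum(snapshot_energies, snapshot_strengths, grid, sigma=0.2)
peak_energy = grid[spectrum.index(max(spectrum))]
shift = mean_shift(snapshot_energies, snapshot_strengths, gas_phase)
print(f"band maximum near {peak_energy:.2f} eV, mean shift {shift:+.2f} eV")
```

In such a construction the band width reflects only the spread of snapshot energies plus the chosen Gaussian width, which is consistent with the observation above that weaker solute–solvent coupling yields narrower lines.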
Understanding solvent effects on the electronic spectra allows
qualitative prediction of the relaxation dynamics of pNA in various
solvents. A scheme showing possible mechanisms of relaxation in
water and dioxane is presented in Fig. 5.7. Generally, upon excitation
to the $1^1A_1$ singlet state, the energy may relax via IC and/or ISC
mechanisms.
In water, only the triplet $1^3A_1$ state lies below $1^1A_1$. This state
is the triplet counterpart of $1^1A_1$, i.e., the state with π → π* character.
However, El-Sayed’s rules suggest that ISC to this state will be
very slow due to a vanishing spin-orbit coupling element. The $1^3A_2$
triplet with n → π* character, which would be favorable for ISC, has
(on average) a higher energy than the $1^1A_1$ singlet, which makes the
probability of ISC very low. These arguments suggest that ISC to the
triplet manifold should be very slow in water, which agrees with
experimental findings [91].
Figure 5.7 Scheme of the pNA relaxation dynamics in water and in dioxane.
Red arrows correspond to the intense absorption band due to excitation
to the $1^1A_1$ CT state. This excitation further undergoes either internal
conversion (IC) or intersystem crossing (ISC) to the triplet manifold (shown
with blue arrows). Figure adapted from Ref. [57].

The ordering of the low-lying states in dioxane is very different.
Three triplet states, $1^3B_1$, $1^3A_1$, and the n → π* $1^3A_2$, lie below the
CT $1^1A_1$ singlet, providing multiple pathways for energy relaxation.
Moreover, the $1^1A_1$ state is preceded by the $1^1A_2$ n → π* singlet,
which opens up an additional possible channel for energy relaxation
through nonradiative decay from $1^1A_1$ to $1^1A_2$ and ISC from $1^1A_2$ to
either $1^3A_1$ or $1^3B_1$. Experimental observation of absorption in the
triplet manifold suggests that the $1^3A_1$ triplet becomes populated
upon ISC, since it is the only state that shows significant oscillator
strengths to other triplets within the energy range of ∼3 eV [91]. The
relaxation channel consistent with the experimental observation is
IC from $1^1A_1$ to $1^1A_2$ and ISC from $1^1A_2$ to $1^3A_1$. This relaxation
pathway becomes even more favorable in less polar solvents such
as cyclohexane, where the $1^1A_2$–$1^3A_1$ splitting decreases due to a
smaller red shift of $1^3A_1$.
Thus, different solvation of singlet and triplet states results in a
dense manifold of the low-lying triplet states and in small singlet-
triplet energy gaps that facilitate ISC in dioxane. These findings are
in qualitative agreement with experimental data [91].
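
The qualitative reasoning used above combines two criteria: El-Sayed's rule (ISC is efficient when the orbital character changes, e.g., between π → π* and n → π* states) and the requirement that the accepting triplet lie below the initial singlet. The sketch below encodes just these two criteria. The class and function names are hypothetical, the energetic ordering is entered by hand as read from the text, and only states whose character is stated in the text are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    label: str
    spin: str        # "singlet" or "triplet"
    character: str   # "pi-pi*" or "n-pi*"


def isc_favorable(singlet, triplet, triplet_lies_below):
    """Qualitative test: El-Sayed-allowed and energetically downhill."""
    if singlet.spin != "singlet" or triplet.spin != "triplet":
        raise ValueError("expected a singlet -> triplet pair")
    character_changes = singlet.character != triplet.character
    return triplet_lies_below and character_changes


s_ct = State("1 1A1", "singlet", "pi-pi*")
s_a2 = State("1 1A2", "singlet", "n-pi*")
t_a1 = State("1 3A1", "triplet", "pi-pi*")
t_a2 = State("1 3A2", "triplet", "n-pi*")

# Water: 1 3A1 lies below the CT singlet, 1 3A2 lies above it (on average).
water_to_t_a1 = isc_favorable(s_ct, t_a1, triplet_lies_below=True)
water_to_t_a2 = isc_favorable(s_ct, t_a2, triplet_lies_below=False)

# Dioxane: after IC to 1 1A2, ISC to the lower-lying 1 3A1 is possible.
dioxane_a2_to_t_a1 = isc_favorable(s_a2, t_a1, triplet_lies_below=True)

print(water_to_t_a1, water_to_t_a2, dioxane_a2_to_t_a1)
```

Under these assumptions both ISC channels from the CT state are closed in water, while the IC-then-ISC route through the n → π* singlet is open in dioxane, matching the qualitative picture in Fig. 5.7.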

5.6.2 Thymine in Water


The general interface between the EOM-CC and EFP methods [56]
allows one to exploit the advantages of various EOM techniques,
such as spin-flip (SF) [93–94], ionization potential (IP) [95–96],
or electron affinity (EA) [97] variants. For example, the vertical
ionization energies (VIE) of thymine in an aqueous environment
were investigated using the EOM-IP-CCSD/EFP method [58]. The
ionization of nucleic acid bases is relevant to radiation and
photoinduced damage of DNA.
It is noteworthy that the convergence of the VIE with the
number of hydration shells to the bulk value is slow and
non-monotonic, such that the first solvation shell increases the VIE by
∼0.1 eV, whereas the overall solvent-induced shift is –0.9 eV (see
Fig. 5.8). The unexpected effect of the first hydration shell is due to
specific interactions of thymine with individual water molecules. In
particular, when a water molecule is H-bonded to the NH groups of
thymine, it acts as the electron donor and, therefore, reduces the
VIE. In contrast, water that is H-bonded to the carbonyl groups
acts as the electron acceptor and increases the VIE. These effects of
microhydration on the VIE in thymine are demonstrated in Table 5.6.

Figure 5.8 Convergence of the VIE in thymine with the number of hydration
shells. Adapted from Ref. [58].

Table 5.6 Effect of donor and acceptor H-bonds on the VIE (eV) in thymine

               H-bonding with CO    H-bonding with NH
IP-CCSD         +0.27    +0.30       –0.39    –0.29
IP-CCSD/EFP     +0.27    +0.30       –0.40    –0.30
error           +0.00    +0.00       –0.01    –0.01

As shown in Table 5.6, a single water molecule H-bonded to
an NH group reduces the VIE by ∼0.34 eV, while a water H-bonded to
a carbonyl group increases the VIE by ∼0.29 eV. The data presented
in Table 5.6 provide additional evidence of the excellent performance
of EFP in the treatment of water, as follows from comparison of
EOM-IP-CCSD/EFP and EOM-IP-CCSD results for microhydrates. Additionally,
the opposing effects of donor and acceptor H-bonds can be used to
rationalize the non-monotonic dependence of the VIE on the number
of hydration shells. When only the first hydration shell is considered
(defined based on the first H-bonded minimum in thymine–water
RDFs), an average blue shift of ∼0.1 eV is observed in the VIE
(see Fig. 5.8). This counterintuitive observation can be explained by
noting that, on average, solvating water forms 2.1 H-bonds with
the CO groups and only 1.0 H-bonds with the NH groups of
thymine. As a result, H-bonding with carbonyl groups dominates the
change in the VIE when only the first hydration shell is considered.
Subsequent hydration shells stabilize the ionized state of thymine.
Convergence of the VIE is reached only at four or five hydration
shells, which is not surprising as the changes in the VIE are governed
by long-range Coulomb and polarization forces.
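
The sign of the first-shell shift can be rationalized with a crude additive estimate that combines the single-water shifts of Table 5.6 with the average H-bond counts quoted above. This is a back-of-the-envelope sketch, not the authors' analysis: additivity of individual H-bond contributions is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Single-water VIE shifts (eV) from the IP-CCSD row of Table 5.6.
shifts_co = (0.27, 0.30)     # water H-bonded to a carbonyl group
shifts_nh = (-0.39, -0.29)   # water H-bonded to an NH group


def mean(values):
    return sum(values) / len(values)


per_co_bond = mean(shifts_co)
per_nh_bond = mean(shifts_nh)

# Average numbers of H-bonds in the first hydration shell (from the text).
n_co_bonds = 2.1
n_nh_bonds = 1.0

# Assumption: contributions of individual H-bonds simply add up.
first_shell_estimate = n_co_bonds * per_co_bond + n_nh_bonds * per_nh_bond
print(f"per CO bond: {per_co_bond:+.3f} eV, per NH bond: {per_nh_bond:+.3f} eV")
print(f"additive first-shell estimate: {first_shell_estimate:+.2f} eV")
```

The estimate is positive, i.e., a blue shift, because carbonyl H-bonds outnumber NH H-bonds roughly two to one. Its magnitude is larger than the ∼0.1 eV average reported for the full first shell, which suggests that the individual contributions are not strictly additive in the condensed-phase snapshots.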
Figure 5.9 Effect of the polarization term in the EFP water potential on the
VIE in thymine. Adapted from Ref. [58].

Another important observation is that the polarization terms
become even more important for evaluating ionization energies
than they are in the case of the excited states of neutral species.
Figure 5.9 demonstrates the effect of polarization interactions on
the VIE, plotted as instantaneous values over 25 random geometry
snapshots. Neglecting polarization in the EFP water potential
produces errors as high as –0.8 eV, with an average error of
∼–0.6 eV. Thus, a polarizable potential is crucial for an accurate
account of ionization energies in water.
Thus, the investigation of hydrated thymine provides direct
evidence that an explicit polarizable solvation model must be used for
describing solvation effects on VIEs in nucleic acid bases.

5.7 Technical Details and Implementation

The original implementation of the EFP method is in the GAMESS
electronic structure package [20–21]. It contains codes for both the
EFP1 and general EFP models, connected to multiple electronic
structure methods (HF, DFT, CIS, TDDFT, MCSCF, MRPT, CCSD(T))
and algorithms for local and global minimum search (geometry
optimization, Monte Carlo) and configurational sampling (MD).
Additionally, EFP1 codes are interfaced with the fragment molecular
orbital (FMO) method and polarizable continuum models (PCM)
[98–100]. The general EFP model is also used in a hybrid of FMO and
EFP: the effective fragment molecular orbital (EFMO) method [101–102].
Recently, the EFP method was implemented as an independent
open-source code, libefp [103]. Libefp is an MPI- and OpenMP-parallelized
code written in standard C and maintained at GitHub.
Libefp contains all the machinery for energy and gradient
computations for a system of fragments. Libefp is augmented with
algorithms for geometry minimization and molecular dynamics.
Additionally, libefp can be interfaced with electronic structure
packages, enabling various QM/EFP calculations. Such interfaces
have so far been created for the Q-Chem [104] and PSI4 [105] packages.
Thus, a main goal of libefp development is to extend state-of-the-art
electronic structure models, which are available in various electronic
structure packages and typically limited to gas-phase chemistry, to
condensed-phase chemistry, without neglecting the diverse effects of the
environment on the electronic structure and dynamics.
To ensure a user-friendly interface for EFP and QM/EFP input
files, several steps were undertaken. Libefp uses the native GAMESS
format of EFP potentials. A library of pre-defined EFP potentials
for various fragments (common solvents, ions, DNA bases, etc.) was
created and is stored at GitHub [106]. Additionally, scripts to convert
the PDB format into libefp input were created, including automatic
generation and storage of EFP potentials. Finally, the libefp output data
format is compatible with the VMD visualization and analysis software.
Thus, the interface with the libefp library provides users of electronic
structure packages with tools for preparing, performing, and
analyzing calculations in the condensed phase using the desired
level of theory for the quantum region and an advanced treatment
of the environment effects. Arguably one of the greatest benefits
of libefp-based developments is that code improvements and
optimizations on both the libefp and the parent package sides (e.g.,
RI techniques, GPU parallelization) become automatically available
to all QM/EFP models.

5.8 Future Directions and Outlook

Future developments of EFP and QM/EFP methods will be
undertaken along the following lines. Analytic gradients in QM/EFP
schemes are necessary for efficient investigations of the dynamics of
chemical and photochemical processes. Specifically, gradients for the
dispersion and exchange-repulsion QM/EFP terms are current
bottlenecks and are a topic of ongoing effort by the developers. As the
QM/EFP dispersion term is a perturbative correction to the ground
state QM/EFP energy, a special treatment for extending the QM/EFP
dispersion to excited states is required. A proper description of
dispersion interactions with an electronic excited state is important
for capturing short-range cavity solvent effects on solvatochromic
shifts, which is a largely unstudied and unresolved issue. Extending
QM/EFP schemes to property calculations, such as intensities of
electronic transitions, circular dichroism and NLO spectroscopies,
and non-adiabatic couplings, is another emerging direction in EFP
developments. Another important step is generalizing the EFP
method to flexible molecules, such as organic polymers used in
photovoltaic applications and biological macromolecules. To achieve
that, one can decompose a flexible molecule into several rigid
EFP fragments along bonds with torsional freedom and reconnect
the resulting EFP fragments with covalent links. An alternative
approach, exploited in the EFMO method [101–102], is to allow
each fragment to change its geometry while updating fragment
parameters accordingly.
In terms of applications, there is a vast range of exciting
problems that can be addressed with the EFP methodology. A few
examples include investigation of the affinity and reactivity of chemical
species at interfaces, with application to catalysis, chemical separation,
and atmospheric processes; and photochemistry in solutions, at
interfaces, and in biological and materials systems, with a span
of potential applications from biomarkers for characterization of
protein functions and dynamics to understanding energy and charge
transfer in solar cells. Overall, these ongoing and future studies will
promote our understanding and control of the effects of the
environment on processes in chemistry, biology, and materials science.

References

1. Day, P. N., Jensen, J. H., Gordon, M. S., Webb, S. P., Stevens, W. J.,
Krauss, M., Garmer, D., Basch, H., Cohen, D. (1996). An Effective
Fragment Method for Modeling Solvent Effects in Quantum Mechanical
Calculations, J. Chem. Phys., 105, 1968–1986.
2. Gordon, M. S., Freitag, M. A., Bandyopadhyay, P., Jensen, J. H., Kairys, V.,
Stevens, W. J. (2001). The Effective Fragment Potential Method: A QM-
Based MM Approach to Modeling Environmental Effects in Chemistry,
J. Phys. Chem. A, 105, 293–307.
3. Gordon, M. S., Slipchenko, L. V., Li, H., Jensen, J. H. (2007). The Effective
Fragment Potential: A General Method for Predicting Intermolecular
Forces, Ann. Rep. Comp. Chem., 3, 177–193.
4. Gordon, M. S., Smith, Q. A., Xu, P., Slipchenko, L. V. (2013). Accurate First
Principles Model Potentials for Intermolecular Interactions, Annu. Rev.
Phys. Chem., 64, 553–578.
5. Ghosh, D., Kosenkov, D., Vanovschi, V., Williams, C. F., Herbert, J.
M., Gordon, M. S., Schmidt, M. W., Slipchenko, L. V., Krylov, A. I.
(2010). Noncovalent Interactions in Extended Systems Described by
the Effective Fragment Potential Method: Theory and Application to
Nucleobase Oligomers, J. Phys. Chem. A, 114, 12739–12754.
6. DeFusco, A., Minezawa, N., Slipchenko, L. V., Zahariev, F., Gordon, M. S.
(2011). Modeling Solvent Effects on Electronic Excited States, J. Phys.
Chem. Lett., 2, 2184–2192.
7. Jensen, J. H., Gordon, M. S. (1996). An Approximate Formula for the
Intermolecular Pauli Repulsion between Closed Shell Molecules, Mol.
Phys., 89, 1313–1325.
8. Jensen, J. H., Gordon, M. S. (1998). An Approximate Formula for the
Intermolecular Pauli Repulsion between Closed Shell Molecules. II.
Application to the Effective Fragment Potential Method, J. Chem. Phys.,
108, 4772–4782.
9. Adamovic, I., Gordon, M. S. (2005). Dynamic Polarizability, Dispersion
Coefficient C6 and Dispersion Energy in the Effective Fragment
Potential Method, Mol. Phys., 103, 379–387.
10. Li, H., Gordon, M. S., Jensen, J. H. (2006). Charge Transfer Interaction in
the Effective Fragment Potential Method, J. Chem. Phys., 124, 214108.
11. Jeziorski, B., Moszynski, R., Szalewicz, K. (1994). Perturbation Theory
Approach to Intermolecular Potential Energy Surfaces of Van Der
Waals Complexes, Chem. Rev., 94, 1887–1930.
12. Moszynski, R. (1996). Symmetry-Adapted Perturbation Theory for the
Calculation of Hartree–Fock Interaction Energies, Mol. Phys., 88, 741–
758.
13. Freitag, M. A., Gordon, M. S., Jensen, J. H., Stevens, W. J. (2000).
Evaluation of Charge Penetration between Distributed Multipolar
Expansions, J. Chem. Phys., 112, 7300–7306.
14. Slipchenko, L. V., Gordon, M. S. (2007). Electrostatic Energy in the
Effective Fragment Potential Method: Theory and Application to
Benzene Dimer, J. Comp. Chem., 28, 276–291.
15. Slipchenko, L. V., Gordon, M. S. (2009). Damping Functions in the
Effective Fragment Potential Method, Mol. Phys., 107, 999–1016.
16. Adamovic, I., Freitag, M. A., Gordon, M. S. (2003). Density Functional
Theory Based Effective Fragment Potential Method, J. Chem. Phys., 118,
6725–6732.
17. Tang, K. T., Toennies, J. P. (1984). An Improved Simple-Model for the
Van Der Waals Potential Based on Universal Damping Functions for the
Dispersion Coefficients, J. Chem. Phys., 80, 3726–3741.
18. Jensen, J. H. (1996). Modeling Intermolecular Exchange Integrals
between Nonorthogonal Molecular Orbitals, J. Chem. Phys., 104, 7795–
7796.
19. Jensen, J. H. (2001). Intermolecular Exchange-Induction and Charge
Transfer: Derivation of Approximate Formulas Using Nonorthogonal
Localized Molecular Orbitals, J. Chem. Phys., 114, 8775–8783.
20. Schmidt, M. W., Baldridge, K. K., Boatz, J. A., Elbert, S. T., Gordon, M. S.,
Jensen, J. H., Koseki, S., Matsunaga, N., Nguyen, K. A., Su, S. J., Windus, T.
L., Dupuis, M., Montgomery, J. A. (1993). General Atomic and Molecular
Electronic-Structure System, J. Comp. Chem., 14, 1347–1363.
21. Gordon, M. S., Schmidt, M. W. (2005). Advances in Electronic Structure
Theory: GAMESS a Decade Later. In: Theory and Applications of
Computational Chemistry (ed. Dykstra, C. E., Frenking, G., Kim, K. S.,
Scuseria, G. E.), Chapter 41, Elsevier, pp. 1167–1190.
22. Jurečka, P., Šponer, J., Černý, J., Hobza, P. (2006). Benchmark Database
of Accurate (MP2 and CCSD(T) Complete Basis Set Limit) Interaction
Energies of Small Model Complexes, DNA Base Pairs, and Amino Acid
Pairs, Phys. Chem. Chem. Phys., 8, 1985–1993.
23. Řezáč, J., Riley, K. E., Hobza, P. (2011). S66: A Well-Balanced Database of
Benchmark Interaction Energies Relevant to Biomolecular Structures,
J. Chem. Theory Comp., 7, 2427–2438.
24. Zhao, Y., Truhlar, D. G. (2005). Benchmark Databases for Nonbonded
Interactions and Their Use to Test Density Functional Theory, J. Chem.
Theory Comp., 1, 415–432.
25. Goerigk, L., Grimme, S. (2009). A General Database for Main Group
Thermochemistry, Kinetics, and Noncovalent Interactions: Assessment
of Common and Reparameterized (Meta-)GGA Density Functionals, J.
Chem. Theory Comp., 6, 107–126.
26. Schneebeli, S. T., Bochevarov, A. D., Friesner, R. A. (2011).
Parameterization of a B3LYP Specific Correction for Noncovalent Interactions and
Basis Set Superposition Error on a Gigantic Data Set of CCSD(T) Quality
Noncovalent Interaction Energies, J. Chem. Theory Comp., 7, 658–668.
27. Gráfová, L., Pitoňák, M., Řezáč, J., Hobza, P. (2010). Comparative
Study of Selected Wave Function and Density Functional Methods for
Noncovalent Interaction Energy Calculations Using the Extended S22
Data Set, J. Chem. Theory Comp., 6, 2365–2376.
28. Flick, J. C., Kosenkov, D., Hohenstein, E. G., Sherrill, C. D., Slipchenko,
L. V. (2012). Accurate Prediction of Noncovalent Interaction Energies
with the Effective Fragment Potential Method: Comparison of Energy
Components to Symmetry-Adapted Perturbation Theory for the S22
Test Set, J. Chem. Theory Comp., 8, 2835–2843.
29. Grimme, S., Antony, J., Ehrlich, S., Krieg, H. (2010). A Consistent and
Accurate ab initio Parametrization of Density Functional Dispersion
Correction (DFT-D) for the 94 Elements H-Pu, J. Chem. Phys., 132,
154104.
30. Chai, J.-D., Head-Gordon, M. (2008). Long-Range Corrected Hybrid
Density Functionals with Damped Atom-Atom Dispersion Corrections,
Phys. Chem. Chem. Phys., 10, 6615–6620.
31. Merrill, G. N., Gordon, M. S. (1998). Study of Small Water Clusters Using
the Effective Fragment Potential Model, J. Phys. Chem. A, 102, 2650–
2657.
32. Day, P. N., Pachter, R., Gordon, M. S., Merrill, G. N. (2000). A Study of
Water Clusters Using the Effective Fragment Potential and Monte Carlo
Simulated Annealing, J. Chem. Phys., 112, 2063–2073.
33. Netzloff, H. M., Gordon, M. S. (2004). The Effective Fragment Potential:
Small Clusters and Radial Distribution Functions, J. Chem. Phys., 121,
2711–2714.
34. Adamovic, I., Gordon, M. S. (2006). Methanol−Water Mixtures: A
Microsolvation Study Using the Effective Fragment Potential Method,
J. Phys. Chem. A, 110, 10267–10273.
35. Hands, M. D., Slipchenko, L. V. (2012). Intermolecular Interactions in
Complex Liquids: Effective Fragment Potential Investigation of Water–
Tert-Butanol Mixtures, J. Phys. Chem. B, 116, 2775–2786.
36. Kemp, D. A., Gordon, M. S. (2005). Theoretical Study of the Solvation
of Fluorine and Chlorine Anions by Water, J. Phys. Chem. A, 109, 7688–
7699.
37. Rankin, B. M., Hands, M. D., Wilcox, D. S., Fega, K. R., Slipchenko, L.
V., Ben-Amotz, D. (2013). Interactions between Halide Anions and a
Molecular Hydrophobic Interface, Faraday Discuss., 160, 255–270.
38. Smith, Q. A., Gordon, M. S., Slipchenko, L. V. (2011). Benzene−Pyridine
Interactions Predicted by the Effective Fragment Potential Method, J.
Phys. Chem. A, 115, 4598–4609.
39. Slipchenko, L. V., Gordon, M. S. (2009). Water-Benzene Interactions:
An Effective Fragment Potential and Correlated Quantum Chemistry
Study, J. Phys. Chem. A, 113, 2092–2102.
40. Adamovic, I., Li, H., Lamm, M. H., Gordon, M. S. (2006). Modeling
Styrene−Styrene Ineractions, J. Phys. Chem. A, 110, 519–525.
41. Smith, Q. A., Gordon, M. S., Slipchenko, L. V. (2011). Effective Fragment
Potential Study of the Interaction of DNA Bases, J. Phys. Chem. A, 115,
11269–11276.
42. Wilcox, D. S., Rankin, B. M., Ben-Amotz, D. (2013). Distinguishing Ag-
gregation from Random Mixing in Aqueous T-Butyl Alcohol Solutions,
Faraday Discuss., 167, 177–190.
February 2, 2016 14:21 PSP Book - 9in x 6in 05-Qiang-Cui-c05

182 Effective Fragment Potential Method

43. Gupta, R., Patey, G. N. (2012). Aggregation in Dilute Aqueous Tert-


Butyl Alcohol Solutions: Insights from Large-Scale Simulations, J. Chem.
Phys., 137, 034509.
44. Piquemal, J.-P., Chevreau, H., Gresh, N. (2007). Toward a Separate
Reproduction of the Contributions to the Hartree−Fock and DFT In-
termolecular Interaction Energies by Polarizable Molecular Mechanics
with the Sibfa Potential, J. Chem. Theory Comp., 3, 824–837.
45. Piquemal, J.-P., Chelli, R., Procacci, P., Gresh, N. (2007). Key Role of
the Polarization Anisotropy of Water in Modeling Classical Polarizable
Force Fields, J. Phys. Chem. A, 111, 8170–8176.
46. Pegram, L. M., Record, M. T. (2008). Thermodynamic Origin of
Hofmeister Ion Effects, J. Phys. Chem. B, 112, 9428–9436.
47. Jubb, A. M., Hua, W., Allen, H. C. (2011). Organization of Water and At-
mospherically Relevant Ions and Solutes: Vibrational Sum Frequency
Spectroscopy at the Vapor/Liquid and Liquid/Solid Interfaces, Acc.
Chem. Res., 45, 110–119.
48. Netz, R. R., Horinek, D. (2012). Progress in Modeling of Ion Effects at
the Vapor/Water Interface, Annu. Rev. Phys. Chem., 63, 401–418.
49. Jungwirth, P., Tobias, D. J. (2005). Specific Ion Effects at the Air/Water
Interface, Chem. Rev., 106, 1259–1281.
50. Warshel, A., Levitt, M. (1976). Theoretical Studies of Enzymic
Reactions: Dielectric, Electrostatic and Steric Stabilization of the
Carbonium Ion in the Reaction of Lysozyme, J. Mol. Biol., 103, 227–
249.
51. Gao, J. L., Byun, K. (1997). Solvent Effects on the N->Pi Transition of
Pyrimidine in Aqueous Solution, Theor. Chem. Acc., 96, 151–156.
52. Lin, Y. L., Gao, J. L. (2007). Solvatochromic Shifts of the N ->Pi*
Transition of Acetone from Steam Vapor to Ambient Aqueous Solution:
A Combined Configuration Interaction QM/MM Simulation Study
Incorporating Solvent Polarization, J. Chem. Theory Comp., 3, 1484–
1493.
53. Kongsted, J., Osted, A., Mikkelsen, K. V., Christiansen, O. (2002).
The QM/MM Approach for Wavefunctions, Energies and Response
Functions within Self-Consistent Field and Coupled Cluster Theories,
Mol. Phys., 100, 1813–1828.
54. Kongsted, J., Osted, A., Mikkelsen, K. V., Christiansen, O. (2003).
Linear Response Functions for Coupled Cluster/Molecular Mechanics
Including Polarization Interactions, J. Chem. Phys., 118, 1620–1633.
February 2, 2016 14:21 PSP Book - 9in x 6in 05-Qiang-Cui-c05

References 183

55. Thompson, M. A., Schenter, G. K. (1995). Excited-States of the


Bacteriochlorophyll-B Dimer of Rhodopseudomonas Viridis: A QM/MM
Study of the Photosynthetic Reaction Center That Includes MM
Polarization, J. Phys. Chem., 99, 6374–6386.
56. Arora, P., Slipchenko, L. V., Webb, S. P., Defusco, A., Gordon, M. S. (2010).
Solvent-Induced Frequency Shifts: Configuration Interaction Singles
Combined with the Effective Fragment Potential Method, J. Phys. Chem.
A, 114, 6742–6750.
57. Kosenkov, D., Slipchenko, L. V. (2010). Solvent Effects on the Electronic
Transitions of P-Nitroaniline: A QM/EFP Study, J. Phys. Chem. A, 115,
392–401.
58. Ghosh, D., Isayev, O., Slipchenko, L. V., Krylov, A. I. (2011). The
Effect of Solvation on Vertical Ionization Energy of Thymine: From
Microhydration to Bulk, J. Phys. Chem. A, 115, 6028–6038.
59. Yoo, S., Zahariev, F., Sok, S., Gordon, M. S. (2008). Solvent Effects on
Optical Properties of Molecules: A Combined Time-Dependent Density
Functional Theory/Effective Fragment Potential Approach, J. Chem.
Phys., 129, 144112–8.
60. Sok, S., Willow, S. Y., Zahariev, F., Gordon, M. S. (2011). Solvent-Induced
Shift of the Lowest Singlet → * Charge-Transfer Excited State of
P-Nitroaniline in Water: An Application of the TDDFT/EFP1 Method, J.
Phys. Chem. A, 115, 9801–9809.
61. Stanton, J. F., Bartlett, R. J. (1993). The Equation of Motion Coupled-
Cluster Method. A Systematic Biorthogonal Approach to Molecular
Excitation Energies, Transition Probabilities, and Excited State Prop-
erties, J. Chem. Phys., 98, 7029–7039.
62. Koch, H., Jensen, H. J. A., Jorgensen, P., Helgaker, T. (1990). Excitation-
Energies from the Coupled Cluster Singles and Doubles Linear
Response Function (CCSDLR)—Applications to Be, Ch+ , Co, and H2 o,
J. Chem. Phys., 93, 3345–3350.
63. Sekino, H., Bartlett, R. J. (1984). A Linear Response, Coupled-Cluster
Theory for Excitation-Energy, Int. J. Quantum Chem, 18, 255–265.
64. Krylov, A. I. (2008). Equation-of-Motion Coupled-Cluster Methods for
Open-Shell and Electronically Excited Species: The Hitchhiker’s Guide
to Fock Space, Annu. Rev. Phys. Chem., 59, 433–462.
65. Runge, E., Gross, E. K. U. (1984). Density-Functional Theory for Time-
Dependent Systems, Phys. Rev. Lett., 52, 997–1000.
66. Casida, M. E. (1995) Time-Dependent Density Functional Response
Theory for Molecules (World Scientific, Singapore).
February 2, 2016 14:21 PSP Book - 9in x 6in 05-Qiang-Cui-c05

184 Effective Fragment Potential Method

67. Obara, S., Saika, A. (1986). Efficient Recursive Computation of


Molecular Integrals over Cartesian Gaussian Functions, J. Chem. Phys.,
84, 3963–3974.
68. Smith, Q. A., Ruedenberg, K., Gordon, M. S., Slipchenko, L. V. (2012).
The Dispersion Interaction between Quantum Mechanics and Effective
Fragment Potential Molecules, J. Chem. Phys., 136, 244107.
69. Kemp, D., Rintelman, J., Gordon, M., Jensen, J. (2010). Exchange Repul-
sion between Effective Fragment Potentials and ab initio Molecules,
Theor. Chem. Acc., 125, 481–491.
70. Lin, H., Truhlar, D. G. (2007). QM/MM: What Have We Learned, Where
Are We, and Where Do We Go from Here?, Theor. Chem. Acc., 117, 185–
199.
71. Webb, S. P., Gordon, M. S. (1999). Solvation of the Menshutkin Reaction:
A Rigorous Test of the Effective Fragment Method, J. Phys. Chem. A, 103,
1265–1273.
72. Adamovic, I., Gordon, M. S. (2005). Solvent Effects on the Sn2 Reac-
tion:? Application of the Density Functional Theory-Based Effective
Fragment Potential Method, J. Phys. Chem. A, 109, 1629–1636.
73. DeFusco, A., Ivanic, J., Schmidt, M. W., Gordon, M. S. (2011). Solvent-
Induced Shifts in Electronic Spectra of Uracil, J. Phys. Chem. A, 115,
4574–4582.
74. Minezawa, N., Silva, N. D., Zahariev, F., Gordon, M. S. (2011). Imple-
mentation of the Analytic Energy Gradient for the Combined Time-
Dependent Density Functional Theory/Effective Fragment Potential
Method: Application to Excited-State Molecular Dynamics Simulations,
J. Chem. Phys., 134, 054111.
75. Kemp, D. A., Gordon, M. S. (2008). An Interpretation of the Enhance-
ment of the Water Dipole Moment Due to the Presence of Other Water
Molecules, J. Phys. Chem. A, 112, 4885–4894.
76. Mullin, J. M., Gordon, M. S. (2009). Alanine: Then There Was Water, J.
Phys. Chem. B, 113, 8657–8669.
77. Bandyopadhyay, P., Gordon, M. S. (2000). A Combined
Discrete/Continuum Solvation Model: Application to Glycine, J. Chem.
Phys., 113, 1104–1109.
78. Bandyopadhyay, P., Gordon, M. S., Mennucci, B., Tomasi, J. (2002). An
Integrated Effective Fragment—Polarizable Continuum Approach to
Solvation: Theory and Application to Glycine, J. Chem. Phys., 116, 5023–
5032.
February 2, 2016 14:21 PSP Book - 9in x 6in 05-Qiang-Cui-c05

References 185

79. Chen, W., Gordon, M. S. (1996). The Effective Fragment Model for
Solvation: Internal Rotation in Formamide, J. Chem. Phys., 105, 11081–
11090.
80. Kairys, V., Jensen, J. H. (2000). QM/MM Boundaries across Covalent
Bonds: A Frozen Localized Molecular Orbital-Based Approach for the
Effective Fragment Potential Method, J. Phys. Chem. A, 104, 6656–6665.
81. Li, H., Hains, A. W., Everts, J. E., Robertson, A. D., Jensen, J. H. (2002).
The Prediction of Protein Pka’s Using QM/MM:? The Pka of Lysine
55 in Turkey Ovomucoid Third Domain, J. Phys. Chem. B, 106, 3486–
3494.
82. Grigorenko, B. L., Nemukhin, A. V., Topol, I. A., Burt, S. K. (2002).
Modeling of Biomolecular Systems with the Quantum Mechanical
and Molecular Mechanical Method Based on the Effective Fragment
Potential Technique:? Proposal of Flexible Fragments, J. Phys. Chem. A,
106, 10663–10672.
83. Nemukhin, A. V., Grigorenko, B. L., Topol, I. A., Burt, S. K. (2003).
Flexible Effective Fragment QM/MM Method: Validation through the
Challenging Tests, J. Comput. Chem., 24, 1410–1420.
84. Bravaya, K., Bochenkova, A., Granovsky, A., Nemukhin, A. (2007). An
Opsin Shift in Rhodopsin:? Retinal S0−S1 Excitation in Protein, in
Solution, and in the Gas Phase, JACS, 129, 13035–13042.
85. Grigorenko, B. L., Nemukhin, A. V., Morozov, D. I., Polyakov, I. V., Bravaya,
K. B., Krylov, A. I. (2012). Toward Molecular-Level Characterization
of Photoinduced Decarboxylation of the Green Fluorescent Protein:
Accessibility of the Charge-Transfer States, J. Chem. Theory Comp., 8,
1912–1920.
86. Fujisawa, T., Terazima, M., Kimura, Y. (2008). Solvent Effects on
the Local Structure of P-Nitroaniline in Supercritical Water and
Supercritical Alcohols, J. Phys. Chem. A, 112, 5515–5526.
87. Moran, A. M., Kelley, A. M. (2001). Solvent Effects on Ground and
Excited Electronic State Structures of P-Nitroaniline, J. Chem. Phys.,
115, 912–924.
88. Schuddeboom, W., Warman, J. M., Biemans, H. A. M., Meijer, E. W.
(1996). Dipolar Triplet States of P-Nitroaniline and N-Alkyl Derivatives
with One-, Two-, and Three-Fold Symmetry, J. Phys. Chem., 100, 12369–
12373.
89. Wortmann, R., Krämer, P., Glania, C., Lebus, S., Detzer, N. (1993). De-
viations from Kleinman Symmetry of the Second-Order Polarizability
Tensor in Molecules with Low-Lying Perpendicular Electronic Bands,
Chem. Phys., 173, 99–108.
February 2, 2016 14:21 PSP Book - 9in x 6in 05-Qiang-Cui-c05

186 Effective Fragment Potential Method

90. Kovalenko, S. A., Schanz, R., Farztdinov, V. M., Hennig, H., Ernsting, N.
P. (2000). Femtosecond Relaxation of Photoexcited Para-Nitroaniline:
Solvation, Charge Transfer, Internal Conversion and Cooling, Chem.
Phys. Lett., 323, 312–322.
91. Thomsen, C. L., Thogersen, J., Keiding, S. R. (1998). Ultrafast Charge-
Transfer Dynamics: Studies of P-Nitroaniline in Water and Dioxane, J.
Phys. Chem. A, 102, 1062–1067.
92. Head-Gordon, M., Rico, R. J., Oumi, M., Lee, T. J. (1994). A Doubles
Correction to Electronic Excited States from Configuration Interaction
in the Space of Single Substitutions, Chem. Phys. Lett., 219, 21–
29.
93. Krylov, A. I. (2001). Size-Consistent Wave Functions for Bond-
Breaking: The Equation-of-Motion Spin-Flip Model, Chem. Phys. Lett.,
338, 375–384.
94. Krylov, A. I. (2006). Spin-Flip Equation-of-Motion Coupled-Cluster
Electronic Structure Method for a Description of Excited States, Bond
Breaking, Diradicals, and Triradicals, Acc. Chem. Res., 39, 83–91.
95. Stanton, J. F., Gauss, J. (1994). Analytic Energy Derivatives for Ionized
States Described by the Equation-of-Motion Coupled Cluster Method,
J. Chem. Phys., 101, 8938–8944.
96. Pal, S., Rittby, M., Bartlett, R. J., Sinha, D., Mukherjee, D. (1987).
Multireference Coupled-Cluster Methods Using an Incomplete Model
Space: Application to Ionization Potentials and Excitation Energies of
Formaldehyde, Chem. Phys. Lett., 137, 273–278.
97. Nooijen, M., Bartlett, R. J. (1995). Equation-of-Motion Coupled-Cluster
Method for Electron-Attachment, J. Chem. Phys., 102, 3629–3647.
98. Nagata, T., Fedorov, D. G., Kitaura, K., Gordon, M. S. (2009). A Combined
Effective Fragment Potential-Fragment Molecular Orbital Method. I.
The Energy Expression and Initial Applications, J. Chem. Phys., 131,
024101.
99. Nagata, T., Fedorov, D. G., Sawada, T., Kitaura, K., Gordon, M. S.
(2011). A Combined Effective Fragment Potential-Fragment Molecular
Orbital Method. II. Analytic Gradient and Application to the Geometry
Optimization of Solvated Tetraglycine and Chignolin, J. Chem. Phys.,
134, 034110.
100. Li, H., Gordon, M. S. (2007). Polarization Energy Gradients in Combined
Quantum Mechanics, Effective Fragment Potential, and Polarizable
Continuum Model Calculations, J. Chem. Phys., 126, 124112.
February 2, 2016 14:21 PSP Book - 9in x 6in 05-Qiang-Cui-c05

References 187

101. Steinmann, C., Fedorov, D. G., Jensen, J. H. (2010). Effective Fragment


Molecular Orbital Method: A Merger of the Effective Fragment
Potential and Fragment Molecular Orbital Methods, J. Phys. Chem. A,
114, 8705–8712.
102. Pruitt, S. R., Steinmann, C., Jensen, J. H., Gordon, M. S. (2013). Fully
Integrated Effective Fragment Molecular Orbital Method, J. Chem.
Theory Comp., 9, 2235–2249.
103. Kaliman, I. A., Slipchenko, L. V. (2013). Libefp: A New Parallel
Implementation of the Effective Fragment Potential Method as a
Portable Software Library, J. Comput. Chem., 34, 2284–2292.
104. Shao, Y., Molnar, L. F., Jung, Y., Kussmann, J., Ochsenfeld, C., Brown, S. T.,
Gilbert, A. T. B., Slipchenko, L. V., Levchenko, S. V., O’Neill, D. P., DiStasio,
R. A., Lochan, R. C., Wang, T., Beran, G. J. O., Besley, N. A., Herbert, J. M.,
Lin, C. Y., Van Voorhis, T., Chien, S. H., Sodt, A., Steele, R. P., Rassolov, V. A.,
Maslen, P. E., Korambath, P. P., Adamson, R. D., Austin, B., Baker, J., Byrd,
E. F. C., Dachsel, H., Doerksen, R. J., Dreuw, A., Dunietz, B. D., Dutoi, A. D.,
Furlani, T. R., Gwaltney, S. R., Heyden, A., Hirata, S., Hsu, C. P., Kedziora,
G., Khalliulin, R. Z., Klunzinger, P., Lee, A. M., Lee, M. S., Liang, W., Lotan,
I., Nair, N., Peters, B., Proynov, E. I., Pieniazek, P. A., Rhee, Y. M., Ritchie,
J., Rosta, E., Sherrill, C. D., Simmonett, A. C., Subotnik, J. E., Woodcock,
H. L., Zhang, W., Bell, A. T., Chakraborty, A. K., Chipman, D. M., Keil, F.
J., Warshel, A., Hehre, W. J., Schaefer, H. F., Kong, J., Krylov, A. I., Gill, P.
M. W., Head-Gordon, M. (2006). Advances in Methods and Algorithms
in a Modern Quantum Chemistry Program Package, Phys. Chem. Chem.
Phys., 8, 3172–3191.
105. Turney, J. M., Simmonett, A. C., Parrish, R. M., Hohenstein, E. G.,
Evangelista, F. A., Fermann, J. T., Mintz, B. J., Burns, L. A., Wilke, J. J.,
Abrams, M. L., Russ, N. J., Leininger, M. L., Janssen, C. L., Seidl, E. T., Allen,
W. D., Schaefer, H. F., King, R. A., Valeev, E. F., Sherrill, C. D., Crawford, T. D.
(2012). PSI4: An Open-Source ab initio Electronic Structure Program,
Wiley Interdisciplinary Rev.: Comput. Mol. Sci., 2, 556–565.
106. Slipchenko, L. V. Library of EFP Potentials. https://github.com/
makefp.
This page intentionally left blank
January 29, 2016 11:25 PSP Book - 9in x 6in 06-Qiang-Cui-c06

SECTION II

ATOMISTIC MODELS

Chapter 6

Explicit Inclusion of Induced Polarization in Atomistic Force Fields Based on the Classical Drude Oscillator Model

Alexey Savelyev,a Benoît Roux,b and Alexander D. MacKerell, Jr.a

a Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, MD 21230, USA
b Department of Biochemistry and Molecular Biology, Center for Integrative Science, University of Chicago, Illinois 60637, USA

[email protected]

Many-Body Effects and Electrostatics in Biomolecules
Edited by Qiang Cui, Markus Meuwly, and Pengyu Ren
Copyright © 2016 Pan Stanford Publishing Pte. Ltd.
ISBN 978-981-4613-92-7 (Hardcover), 978-981-4613-93-4 (eBook)
www.panstanford.com

6.1 Introduction

Computational modeling is a powerful and widely used approach for
studying energetics and structural transitions at the atomistic
level in complex biological molecules such as DNA and proteins [1].
Traditional experimental techniques, such as X-ray
crystallography and solution NMR, yield insights into these
phenomena but are limited by crystallization and resolution issues,
as well as by restricted access to short-lived high-energy states
and time-domain information [2]. During
the past two decades, several all-atom empirical force fields (FF) for
biological molecules have been developed, including CHARMM [3, 4],
AMBER [5], GROMOS [6], and OPLS [7], among others. They have proved
to be remarkably useful for a range of systems that contain 10,000
or more atoms, being computationally cost-effective because they
use simplified potential energy functions to determine
the energies and forces acting on such systems. One of the
major limitations of most current force fields associated
with such simplifications is the treatment of electrostatics within
the framework of the fixed-atomic-charge approximation, where
effective charges assigned to particles are independent of a system’s
configuration and are adjusted to account for the influence of
induced polarization in an average way. Such force fields are currently
used for most biomolecular simulations and are commonly termed
“additive,” indicating that they do not account for many-body induced
polarization effects explicitly. However, for many complex biological
systems such as polyanionic DNA or proteins immersed in an aqueous
salt environment, whose conformational behavior is determined to
a significant extent by solvation effects and interactions with the
surrounding ionic atmosphere, the omission of polarization effects
may preclude a physically correct description of the forces driving
their conformational behavior. Even for small molecules the dipole
moment is known to vary significantly when they are transferred
from the gas to liquid phase. For example, an isolated water molecule
has a dipole moment of 1.85 D [8], while the average molecular
dipole is 2.1 D in the water dimer, increasing in larger water clusters
[9]. In the condensed phase it reaches a value between 2.4 and 2.6
D, as suggested from classical molecular dynamics (MD) simulations
of the dielectric properties [10, 11], and 2.95 D, as obtained from
ab initio MD simulations [12–14] and from analysis of experimental
data [15, 16]. Additionally, our recent computational studies [17,
18], as well as studies of others [19], based on polarizable MD
simulations indicate the water dipole moment to be noticeably
perturbed in the vicinity of the charged groups of proteins.
Moreover, dipole moments of various chemical groups in the proteins
themselves, such as the peptide backbone and side chains, were shown to
behave quite differently in polarizable and non-polarizable environments
[18, 20]. In particular, MD simulations of a number of fully
solvated proteins utilizing our recently developed CHARMM Drude-
2013 polarizable force field produce time series for the dipole
moments of side chains of individual residues characterized by
significant variability and systematically higher values compared to
the additive results [18]. A similar trend was observed for the dipole
moments of nucleic acid bases from MD simulations of DNA [20].
These observations indicate that the variations of the electronic
structure do impact the dynamics of the system and the microscopic
forces dictating the structural and dynamical properties.
The inability of the charge distribution to vary and adapt as a
function of the local electric field is considered a major limitation of
additive models, significantly diminishing their ability to accurately
treat intermolecular interactions in a variety of environments.
Therefore, the inclusion of molecular polarizability seems a basic
requirement in order to develop force fields applicable to the
modeling of a wide range of heterogeneous environments. Currently,
there exist several major categories of computational models used in
MD simulations in which polarizability is treated explicitly. Among
them is the classical Drude oscillator model, which is employed in
one of the CHARMM polarizable force fields, whose development has
been ongoing in our laboratories since 2001. In this chapter we focus
mainly on the description of the classical Drude oscillator model,
its implementation in MD simulation codes, and the development of the
Drude polarizable force field for a variety of small molecules and
larger macromolecular systems, such as proteins, DNA, and lipids.
A brief overview of other methodologies commonly
used for polarizable biomolecular simulations is also provided.

6.2 Classification of Polarizable Models

Current polarizable models can be classified into three major
categories: (1) induced dipole, (2) fluctuating charge, and (3)
classical Drude oscillator (or shell) models.

6.2.1 Induced Dipole Models


One of the methods to incorporate polarizability in molecular
mechanical (MM) force fields consists of including both partial
atomic charges and inducible dipoles on the atoms comprising the
molecular system. In this representation, inducible dipoles are
self-consistently adjusted for any given configuration of the atoms in the
system. The most common variation of the model [21–25] traces its
origin to the polarizable model for liquids by Kirkwood and Onsager
[26, 27]. In a closely related methodology proposed by Allinger and
co-workers, bond dipoles rather than atomic dipoles are considered,
in combination with atomic charges [28].
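
As a minimal sketch of the self-consistent adjustment described above, the following Python fragment iterates point induced dipoles until they stop changing. It assumes isotropic atomic polarizabilities, reduced (Gaussian-style) units, an undamped dipole field tensor, and a simple Jacobi update. The function names and the two-atom test geometry are illustrative choices and are not taken from any of the cited force fields.

```python
def dipole_tensor(ri, rj):
    """Undamped dipole field tensor T = (3 n n^T - I) / r^3 between two sites."""
    d = [a - b for a, b in zip(ri, rj)]
    r = sum(c * c for c in d) ** 0.5
    n = [c / r for c in d]
    return [[(3.0 * n[a] * n[b] - (1.0 if a == b else 0.0)) / r ** 3
             for b in range(3)] for a in range(3)]


def solve_induced_dipoles(positions, alphas, fields, tol=1e-12, max_iter=500):
    """Iterate mu_i = alpha_i * (E_i + sum_j T_ij mu_j) to self-consistency."""
    n = len(positions)
    # Start from the non-interacting response mu_i = alpha_i * E_i.
    mu = [[alphas[i] * fields[i][a] for a in range(3)] for i in range(n)]
    for _ in range(max_iter):
        new_mu = []
        for i in range(n):
            total = list(fields[i])  # field at site i from permanent sources
            for j in range(n):
                if j == i:
                    continue
                t = dipole_tensor(positions[i], positions[j])
                for a in range(3):
                    total[a] += sum(t[a][b] * mu[j][b] for b in range(3))
            new_mu.append([alphas[i] * e for e in total])
        change = max(abs(new_mu[i][a] - mu[i][a])
                     for i in range(n) for a in range(3))
        mu = new_mu
        if change < tol:
            return mu
    raise RuntimeError("induced dipoles did not converge")


# Hypothetical test case: two identical atoms on the x axis in a uniform field.
positions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
alphas = [1.0, 1.0]
fields = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mu = solve_induced_dipoles(positions, alphas, fields)
```

For this collinear geometry the converged dipole can be compared with the closed-form result αE/(1 − 2α/r³). The same expression grows without bound as r approaches (2α)^(1/3), which illustrates why undamped point-dipole models need a short-range damping scheme.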
The induced dipole method has been available in AMBER since the
Parm02 version of the force field [29]. An implementation of the model
has also been reported in CHARMM [25], based on the polarizable
intermolecular potential functions (PIPF) model of Gao and
co-workers [23, 24]. The PIPF potential combined with the CHARMM22
force field has been designated PIPF-CHARMM. In this model, infinite
polarization is avoided by using Thole’s electrostatic damping
scheme [30, 31] (see Section 6.2.3). A method to accelerate the
convergence of the induced dipoles for systems employing the PIPF
potential functions has been proposed [32]. An approximation to
the induced dipole model was proposed by Ferenczy and Reynolds
[33–41]. This induced charge method involves only point charges,
which depend on the environment. It is based on the idea of
representing atomic point dipoles by point charges on neighboring
atoms.
Ponder and co-workers [42–49] developed the AMOEBA force
field based on modifications introduced to the original formulations
of Applequist [50] and Thole [30]. In particular, the electrostatic
energy is modeled using permanent and induced contributions,
with the permanent electrostatics originating in atomic multipole–
multipole interactions with moments up to the quadrupole located
on each atom. The induced contribution is modeled by an induced
dipole on each atom that arises from the field of the permanent
multipoles and of the other induced dipoles. Self-consistency is
obtained using an iterative scheme, and the Thole model [30] is used
to dampen electrostatic interactions at short range. The AMOEBA FF was
initially developed for water [45, 46], followed by extensions to
ions [43, 44] and to organic molecules, including alkanes, alcohols,
amines, sulfides, aldehydes, carboxylic acids, amides, aromatics, and
other small organic molecules [51]. Recently, the AMOEBA force field
for peptides and proteins was published [52]. We refer interested
readers to a more comprehensive description and additional details
of the AMOEBA force field [53–55].

6.2.2 Fluctuating Charge Models


Another class of methods introduces polarizability
into standard energy functions by allowing the partial atomic charges
to vary in response to changes in the electric field of their
environment. Such variations are controlled by self-consistent atomic
electronegativity equalization or chemical potential equalization
schemes. Accordingly, these methods for treating polarizability are
known as the “fluctuating charge” method [56, 57], the “chemical
potential (electronegativity) equalization” method [58–72], or the
“charge equilibration” method [73–78] and have been applied to
a variety of systems. Applications are exemplified by the study
of liquid water [56], vapor–liquid equilibrium and interfacial
properties [79–82], studies of ions in aqueous solution [83–87],
studies of peptides [88], aqueous solvation of amides [89] and
water and cation–water clusters [90]. In these methods the values
of discrete charges located on atomic sites within a molecule are
independent variables obtained for a given molecular geometry
by minimization of electrostatic energy subject to the net charge
constraint [73].
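
The constrained minimization described above can be sketched as a small linear problem. The quadratic energy form, the parameter names (`chi` for electronegativity, `eta` for hardness, `coupling` for the pairwise term), and the numerical values below are assumptions made for illustration; published fluctuating charge models differ in how these terms are defined and screened.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def equalize_charges(chi, eta, coupling, total_charge=0.0):
    """Minimize E(q) = sum_i chi_i q_i + 1/2 sum_i eta_i q_i^2
    + sum_{i<j} J_ij q_i q_j subject to sum_i q_i = total_charge."""
    n = len(chi)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for i in range(n):
        for j in range(n):
            a[i][j] = eta[i] if i == j else coupling[i][j]
        a[i][n] = -1.0   # Lagrange multiplier column: dE/dq_i - lambda = 0
        a[n][i] = 1.0    # constraint row: charges sum to total_charge
        b[i] = -chi[i]
    b[n] = total_charge
    solution = solve_linear(a, b)
    # Return the charges and the common (equalized) chemical potential.
    return solution[:n], solution[n]


# Hypothetical two-site molecule; site 0 is the more electronegative one.
charges, potential = equalize_charges(
    chi=[5.0, 3.0], eta=[10.0, 10.0], coupling=[[0.0, 2.0], [2.0, 0.0]])
```

In this toy case the more electronegative site acquires the negative charge, and the Lagrange multiplier plays the role of the equalized chemical potential shared by all sites.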
Typically, the charge of each molecule is conserved and there is
no charge transfer between molecules. At the same time, quantum
mechanical charge transfer is an important part of the interaction
energy, so there are reasons to remove this constraint [91–95].
Unfortunately, this procedure often leads to a large overestimation
of the polarizability that grows with molecular size because,
in traditional fluctuating charge models, charge can flow along
covalent bonds at a small energetic cost, covering large portions of
the molecule. While the method is suitable for small molecules, its
application to macromolecules is problematic.
To overcome the over-polarization problem, a number of approaches
based on the concepts of atom–atom charge transfer
(AACT) or other charge transfer variables were developed. In the
AACT method [96], the energy is Taylor expanded in terms of
charges transferred between atomic pairs within the molecule,
rather than in terms of the atomic charges themselves. Similar
in spirit is the bond-charge increment (BCI) model [88, 97],
which allows charge to flow only between two atoms that are
directly bonded to each other. This method guarantees that the
total charge of each set of bonded atoms is conserved. Another
recently developed approach, charge transfer with polarization
current equalization (QTPIE) [98], operates with charge transfer
variables that describe a polarization current (as opposed to atomic
charge). This method demonstrated an ability to correctly treat
asymptotic behavior near dissociation and also to provide a realistic
description of in-plane polarizabilities. A related approach is the
atom–bond electronegativity equalization method (ABEEM) [99–103],
which has been developed based on ideas from density
functional theory (DFT). In this approach the molecular system is
partitioned into multiple regions including atomic regions, lone pair
regions, and bond regions, each of them having a partial charge.
Ultimately, a molecular system is described by a “field” of partial
charges, {qi}, resembling the continuous charge density q(r) of
DFT [104]. ABEEM has been successfully incorporated into
the intermolecular electrostatic interaction term in MM models
of water [105, 106]. More recently, the method was refined to
distinguish between σ and π bond regions (ABEEM σπ) and tested
by computing structural and energetic properties of some organic
and biochemical systems [107].
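
The charge-conservation property of the bond-charge increment idea mentioned above can be illustrated with a short sketch. The data layout (a list of bonded index pairs, each with one transferable increment) and the numbers are hypothetical; they are only meant to show why restricting charge flow to bonds conserves the total charge by construction.

```python
def apply_bond_increments(base_charges, bonds, increments):
    """Return charges after shifting increments[k] across bond k = (i, j).

    Each increment is added to atom i and removed from atom j, so the
    sum over all atoms cannot change.
    """
    charges = list(base_charges)
    for (i, j), dq in zip(bonds, increments):
        charges[i] += dq
        charges[j] -= dq
    return charges


# Hypothetical three-atom chain 0-1-2 with two bonds.
base = [0.0, 0.0, 0.0]
bonds = [(0, 1), (1, 2)]
charges = apply_bond_increments(base, bonds, [-0.25, 0.10])
```

Because atoms 0 and 2 share no bond, no charge can move directly between them in this representation; any redistribution has to pass through the increments of the intervening bonds.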

6.2.3 Classical Drude Oscillator Model


The Drude oscillator model is a third common approach. It represents
electronic induction in MM by introducing an auxiliary charged
particle attached to a polarizable atom by a harmonic spring. Explicit
polarization in the classical Drude oscillator implementation is also
known as a shell or charge-on-spring model. In MD simulations
using the Drude model, charge redistribution in response to a
change in the local electrostatic field is approximated by
self-consistently updating the positions of these auxiliary particles to their
local energy minima for any given configuration of the atoms in the
system, thereby taking into account the permanent electric field due
to the fixed charges and the contribution of the induced dipoles to
the electric field.
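
A minimal one-dimensional sketch of this self-consistent relaxation is given below. It assumes two neutral Drude atoms with fixed cores on a line, a uniform external field, reduced (Gaussian-style) units, and a simple fixed-point update in which each Drude particle is moved to the position where the spring force balances the electrostatic force. All names and parameter values are illustrative, and production codes use more robust schemes than this plain iteration.

```python
def coulomb_field(x, source_x, source_q):
    """Field along the line at x due to a point charge source_q at source_x."""
    dx = x - source_x
    return source_q * dx / abs(dx) ** 3


def relax_drudes(core_x, q_d, k_d, e_ext, tol=1e-14, max_iter=1000):
    """Find displacements d_i such that k_d * d_i = q_d * E_total at Drude i."""
    n = len(core_x)
    d = [0.0] * n
    for _ in range(max_iter):
        new_d = []
        for i in range(n):
            x_i = core_x[i] + d[i]
            field = e_ext
            for j in range(n):
                if j == i:
                    continue  # an atom's own core acts only through the spring
                field += coulomb_field(x_i, core_x[j], -q_d)        # core of j
                field += coulomb_field(x_i, core_x[j] + d[j], q_d)  # Drude of j
            new_d.append(q_d * field / k_d)
        change = max(abs(a - b) for a, b in zip(new_d, d))
        d = new_d
        if change < tol:
            return d
    raise RuntimeError("Drude positions did not converge")


alpha, k_d, e_ext = 1.0, 1000.0, 0.01
q_d = (alpha * k_d) ** 0.5   # chosen so that q_d**2 / k_d equals alpha
d = relax_drudes([0.0, 3.0], q_d, k_d, e_ext)
dipoles = [q_d * di for di in d]
```

With the small displacements produced by a stiff spring, the relaxed dipoles stay close to what a point-dipole treatment of the same two-atom geometry would give, because each core and Drude pair then looks like an ideal dipole to its neighbor.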
The model was originally proposed by Paul Drude in 1902 as a
simple way to describe dispersive properties of materials [108]. A
quantum version of the model (including the zero-point vibrations
of the oscillator) has been used in early applications to describe
the dipole–dipole dispersion interactions [109–112]. A semiclassical
version of the model was used more recently to describe molecular
interactions [113], and electron binding [114]. The classical version
has been subsequently used for ionic crystals [115–120], simple
liquids [121–127], water [128–135], and ions [136–139], and in
recent decades has seen widespread use in MD and MC simulations.
In recent years, the Drude model was extended to interface with QM
approaches in QM/MM methods [140], facilitated by the simplicity
of the model in that it only includes additional charge centers.
Development of the Drude polarizable force field in CHARMM
in our own laboratories has been ongoing since 2001. Those
efforts have led to the development of water models [133–135]
and parameters for a collection of small molecules representative
of the functional groups in proteins, nucleic acids, lipids, and
carbohydrates [126, 127, 141–148], as well as for atomic ions [136–
139]. In recent years, progress has been made towards extending the
Drude polarizable force field from small molecules to biologically
relevant macromolecular systems, culminating in the release of
parameters for lipids [149], proteins [18], and nucleic acids [20],
with progress also being made on carbohydrates. A more
detailed history of the Drude force field development is presented
in Section 6.4.
In the classical Drude oscillator model, polarization is determined
by a pair of point charges separated by a variable distance
d. For a given atom with charge q assigned to the atomic center, a
mobile Drude particle (or Drude oscillator) carrying a charge qD is
introduced. The charge on the atom is replaced by q − qD in order
to preserve the net charge of the atom-Drude oscillator pair. The
Drude particle is harmonically bound to the atomic core with a force
constant KD which can be a scalar or tensor, as elaborated below. In
the presence of a uniform electric field E, the Drude particle attached
to an atom located at position r assumes a displaced position r + d,
January 29, 2016 11:25 PSP Book - 9in x 6in 06-Qiang-Cui-c06

198 Explicit Inclusion of Induced Polarization in Atomistic Force Fields

such that the separation d is related to KD, qD, and E as follows:

$$ \mathbf{d} = \frac{q_D \mathbf{E}}{K_D}. \tag{6.1} $$
Thus, the expression for the induced atomic dipole, μ, as a function
of d is
$$ \boldsymbol{\mu} = q_D \mathbf{d} = \frac{q_D^2 \mathbf{E}}{K_D} \tag{6.2} $$
with the atomic polarizability, α, being equivalent to the Drude
charge, qD, squared divided by the force constant on the harmonic
spring, KD:
$$ \alpha = \frac{q_D^2}{K_D}. \tag{6.3} $$
In the Drude polarizable model, the only relevant adjustable
parameter is the combination $q_D^2/K_D$ that corresponds to the atomic
polarizability. In the limit of large KD , the treatment of induced
polarization based on Drude oscillators is formally equivalent to a
point-dipole treatment such as used by AMOEBA. In practice, the
magnitude of KD is commonly chosen to achieve small displace-
ments of Drude particles from their corresponding atomic positions,
as required to remain close to the point-dipole approximation for
the induced dipole associated with the atom-Drude pair [150] while
preserving a stable integration of the equation of motion with a
reasonable time step. For a fixed force constant KD the atomic
polarizability is determined by the amount of charge assigned to the
Drude particle. In the current implementation, the classical Drude
model introduces atomic polarizabilities only to non-hydrogen
atoms for practical considerations, as discussed below. However,
this is adequate to accurately reproduce molecular polarizabilities,
as seen in a number of published studies [127, 142, 146].
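The relations among KD, qD, and α can be illustrated with a short numerical sketch. The force constant of 1000 kcal/(mol Å²) and the Coulomb conversion factor used below are assumptions made for illustration in CHARMM-style units (charges in e, lengths in Å, energies in kcal/mol); the function names are hypothetical and do not correspond to any program's API.

```python
import math

# Assumed Coulomb conversion factor, kcal*Angstrom/(mol*e^2).
COULOMB = 332.0716

def drude_charge(alpha, k_d=1000.0):
    """Magnitude of qD (e) for polarizability alpha (Angstrom^3).

    Inverts alpha = qD^2 / KD (Eq. 6.3); the factor COULOMB converts
    between Gaussian-style polarizability volumes and kcal/mol units.
    """
    return math.sqrt(alpha * k_d / COULOMB)

def drude_displacement(q_d, field, k_d=1000.0):
    """Eq. (6.1): d = qD * E / KD, with E in kcal/(mol*Angstrom*e)."""
    return q_d * field / k_d

alpha = 1.0                      # Angstrom^3, illustrative value
q_d = drude_charge(alpha)        # Drude charge magnitude in e
d = drude_displacement(q_d, field=10.0)
mu = q_d * d                     # induced dipole in e*Angstrom (Eq. 6.2)
```

With a fixed KD, a larger polarizability is obtained only by increasing the Drude charge, which is why fitting α reduces to fitting qD.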
The total potential energy of the Drude polarizable model
contains the terms representative of the interaction with the static
electric field, interaction with other dipoles and the self-energy
associated with the Drude oscillators, in addition to the standard
contributions representing bonding terms (bonds, angles, dihedrals,
etc.) and intermolecular interactions represented by Lennard–Jones
(LJ) “6–12” term:
$$ U = U_{\mathrm{bond}}(\mathbf{r}) + U_{\mathrm{LJ}}(\mathbf{r}) + U_{\mathrm{elec}}(\mathbf{r}, \mathbf{d}) + U_{\mathrm{self}}(\mathbf{d}) \tag{6.4} $$
Classification of Polarizable Models 199

In this expression, the dependences on the nuclear positions and
the displacements of the Drude particles are indicated by r and d,
respectively. $U_{\mathrm{bond}}(\mathbf{r})$ and $U_{\mathrm{LJ}}(\mathbf{r})$ are, respectively, the intramolecular
bonding and non-bonding LJ energetic contributions. $U_{\mathrm{elec}}(\mathbf{r}, \mathbf{d})$ is the
sum over all Coulombic interactions between atomic core charges
$q^{(i)}$ located at $\mathbf{r}_N^{(i)}$, and the Drude charges $-q_D^{(i)}$ and $q_D^{(i)}$ located at $\mathbf{r}_N^{(i)}$
and $\mathbf{r}_D^{(i)}$, respectively. The displacement vector for the Drude particle
with respect to the parent nucleus is defined as $\mathbf{d}^{(i)} = \mathbf{r}_D^{(i)} - \mathbf{r}_N^{(i)}$,
with a magnitude of $d_i = |\mathbf{d}^{(i)}|$. The contribution $U_{\mathrm{self}}(\mathbf{d})$ is the
self-energy of the Drude oscillators, which may take the form of isotropic
or anisotropic harmonic restraints. In our earlier Drude models all
atoms were approximated to be isotropically polarizable. In this
case, the Drude oscillators are treated as harmonic springs with the
self-energy

$$ U_{\mathrm{self}}(\mathbf{d}) = \frac{1}{2} \sum_{i=1}^{N} K_D d_i^2, \tag{6.5} $$

where KD is a scalar value of the spring constant of the Drude
oscillator.
Subsequently, the model was extended to account for anisotropic
polarizabilities to improve non-bonded interactions as a function
of orientation involving hydrogen bond acceptors, such as oxygen
or nitrogen [143]. In addition, to mimic higher order electrostatic
effects, such as atomic multipoles on acceptors, and further
improving the treatment of non-bonded interactions as a function
of orientation, virtual particles representative of lone pairs were
included in the model [143]. For such atomic sites polarizability is
a tensor of trace = 3, diagonal in a local reference frame that is
fixed with respect to the parent molecular group. The core charge
is typically restrained to off-atom virtual sites (lone pair), and an
anisotropic Drude oscillator is employed with the self-energy to be
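$$ U_{\mathrm{self}}(\mathbf{d}) = \frac{1}{2}\, \mathbf{d} \cdot \mathbf{K}_D \cdot \mathbf{d} = \frac{1}{2} \left[ K_{11}^{D} d_1^2 + K_{22}^{D} d_2^2 + K_{33}^{D} d_3^2 \right], \tag{6.6} $$
where the quantities d1, d2, and d3 are the projections of the
Drude displacement vector d on the orthogonal axes defined on a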
local intramolecular reference frame. The intramolecular reference
frame may be defined, for instance, by the C=O vector and
the N–C=O plane of an amide bond [142]. The force constant


tensor $\mathbf{K}_D$ is diagonal in a local reference frame of the molecular
group. Accordingly, atomic polarizability is defined as the isotropic
polarizability times a diagonal tensor A. As the trace of A is set to
3, only two of the three components, A11 and A22 need to be set in
the CHARMM residue-topology-file (RTF), such that A33 = 3 – A11 –
A22 . A11 defines the component of the vector between the atom on
which anisotropic polarizability is being assigned and a covalently
bound atom or, more generally, any particle, such as a lone pair. The
third and fourth atoms, or particles, define the A22 vector, with the
A33 vector being orthogonal to A11 and A22 [18].
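The isotropic and anisotropic self-energy expressions can be sketched as follows. This is a minimal illustration, assuming that the local reference frame is supplied as a 3×3 matrix whose rows are orthonormal axes; the function names and the numerical values are hypothetical.

```python
import numpy as np

def self_energy_isotropic(k_d, d):
    """Eq. (6.5): 0.5 * KD * sum_i |d_i|^2 for displacements d of shape (N, 3)."""
    d = np.asarray(d, dtype=float)
    return 0.5 * k_d * np.sum(d * d)

def self_energy_anisotropic(k_diag, d, frame):
    """Eq. (6.6) for one Drude particle.

    k_diag : (K11, K22, K33), diagonal force constants in the local frame
    d      : displacement vector in the laboratory frame
    frame  : 3x3 array whose rows are the orthonormal local axes (assumed input)
    """
    d_local = np.asarray(frame, dtype=float) @ np.asarray(d, dtype=float)
    return 0.5 * np.sum(np.asarray(k_diag, dtype=float) * d_local**2)

d = np.array([0.01, 0.02, -0.03])            # Angstrom, illustrative
identity = np.eye(3)
e_iso = self_energy_isotropic(1000.0, d)
e_aniso = self_energy_anisotropic((1000.0, 1000.0, 1000.0), d, identity)
```

When the three diagonal force constants are equal, the anisotropic expression reduces to the isotropic one, independent of the choice of local frame.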
An additional extension of the Drude model includes terms to im-
prove the treatment of the orientation of molecular polarizabilities.
Traditionally, in non-polarizable models electrostatic interactions,
as well as LJ terms, between atoms bonded to each other or
separated by two covalent bonds, 1,2 and 1,3 pairs, respectively,
are ignored. Similarly, in the Drude model the interactions between
core charges as well as those between the Drude oscillators and
core charges are excluded for 1–2 and 1–3 pairs. However, the
ability to preserve Coulomb interactions between Drude oscillators
(i.e., dipole–dipole interactions) for the 1,2 and 1,3 atom pairs is
important for accurate reproduction of the molecular polarizability
tensor. At the same time, the use of point charges for these
interactions is problematic as their spatial separation is small
enough that the Coulombic approximation fails. To overcome this,
the electrostatic shielding treatment proposed by Thole [30] is
applied, in which the Coulomb interactions between charges i and
j are modulated by a factor, Si j , as follows:
 
$$ S_{ij}(r_{ij}) = 1 - \left[ 1 + \frac{(t_i + t_j)\, r_{ij}}{2(\alpha_i \alpha_j)^{1/6}} \right] e^{-(t_i + t_j)\, r_{ij}/(\alpha_i \alpha_j)^{1/6}}, \tag{6.7} $$
where ri j is the distance between the atoms, αi and α j are the
respective atomic polarizabilities, and ti and t j are the atom-based
Thole parameters that dictate the extent of the scaling between
specific atom types. It is important to note that the screening is
applied to the interaction of the electroneutral pair, including both
the mobile charge qD and its countercharge −qD located on the
atomic core. Notably, the use of atom-based Thole parameters yields
improvements in the treatment of the orientation of molecular
polarizabilities [142].
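A minimal sketch of the Thole screening factor follows, written in the damping form $1 - (1 + u/2)e^{-u}$ with $u = (t_i + t_j) r_{ij}/(\alpha_i \alpha_j)^{1/6}$. The default Thole parameter of 1.3 per atom is taken from the initial guess quoted later in this chapter; the function name and the polarizability values are illustrative assumptions.

```python
import math

def thole_screening(r_ij, alpha_i, alpha_j, t_i=1.3, t_j=1.3):
    """Screening factor S_ij multiplying the Coulomb interaction (Eq. 6.7).

    r_ij in Angstrom, polarizabilities in Angstrom^3, Thole parameters
    dimensionless. Returns 0 at r_ij = 0 and tends to 1 at large r_ij.
    """
    u = (t_i + t_j) * r_ij / (alpha_i * alpha_j) ** (1.0 / 6.0)
    return 1.0 - (1.0 + 0.5 * u) * math.exp(-u)

# Screening is strong for bonded (1,2) separations and negligible far away.
s_bonded = thole_screening(1.5, 1.0, 1.0)
s_distant = thole_screening(10.0, 1.0, 1.0)
```

Because S vanishes as the separation goes to zero, the screened interaction remains finite where a bare point-charge Coulomb term would diverge.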
While initially introduced for 1,2 and 1,3 interactions, the use of
Thole screening has been extended to non-bonded atom pairs [138].
Although this extension was initially motivated by the need to fine-tune
interactions involving divalent ions, the approach is general and may
be applied to any atom pair. In practice, this term is applied only
when the two particles approach each other closer than a specified
threshold, typically 5 Å. For example, Thole screening was recently
used to calibrate interactions between mobile ions and nucleic acid
bases during development of the Drude polarizable force field for
DNA [20, 151].
An issue with the Drude model is the potential for polarization
catastrophe when performing MD simulations. This can occur when
a positively charged ion approaches too close to an atom, leading the
negatively charged Drude particle to become overpolarized, yielding
unphysical infinite energies and instabilities. This is because the
simple sum over Coulomb interactions does not exclude singular
1/r attractive interactions between the Drude particles and other
interaction sites carrying a net charge. While such singularities are
generally not problematic in fixed charge (additive) force fields,
where the charges are buried within $1/r^{12}$ Lennard–Jones (LJ) core
repulsive interactions, in the polarizable Drude model the charge on
the Drude particles is not as effectively shielded from other charges
by such non-electrostatic core repulsive interactions.
A number of empirical approaches to resolve overpolarization in
a Drude model are possible. One of them is to introduce an additional
anharmonic restoring force to prevent excessively large excursions
of the Drude particle away from the atom. This force corresponds
to the following “hyperpolarization” term in the potential energy
function [138],

$$ E_{\mathrm{hyp}} = K_{\mathrm{hyp}} (R - R_0)^n, \tag{6.8} $$
where R is the distance between the nucleus and the Drude particle,
n is the order of the term, typically 4 or 6, Khyp is the force constant,
and R0 defines the distance at which the term starts to impact the
Drude particle, typically 0.2 Å, such that the normal trajectory of the
Drude is not impacted by the higher order term. More recently, a
Drude reflective “hard wall” term has been added to more rigorously
avoid polarization catastrophe [149]. This term involves reversing
the relative velocities along the bond between the Drude particle
and its parent nucleus and scaling them accordingly to the proper
temperature whenever the Drude-nucleus bond length exceeds a
specified distance, again typically 0.2 Å. In addition, the relative
displacement with respect to the specified distance is reversed and
scaled according to the new velocities on the Drude particle to
ensure that the location of the Drude is within the specified distance.
From practical experience, it appears that the reflective hard
wall term represents a more robust method to avoid polarization
catastrophe as compared to the hyperpolarization terms and is now
the recommended approach to be used in all MD simulations using
the Drude polarizable force field. Finally, another possibility is to
assign a small repulsive core (i.e. LJ radius) to the Drude particle.
This can include specific Drude-atom LJ terms implemented with the
NBFIX option in the program CHARMM [3], which alters the default
Lorentz-Berthelot combination rule and introduces pair-specific LJ
parameters (see Section 6.3.2). Alternatively, the through-space
Thole screening function described above can also be used to attenuate
the Coulomb interaction between two arbitrary particles.
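The hyperpolarization restraint of Eq. (6.8) can be sketched as below. The sketch assumes that the term is switched on only when the Drude-nucleus distance exceeds R0, which is one reading of the statement that R0 marks where the term begins to act; the force constant value is hypothetical.

```python
def hyperpolarization_energy(r, k_hyp, r0=0.2, n=4):
    """E_hyp = K_hyp * (R - R0)^n for R > R0, and zero otherwise (assumed)."""
    if r <= r0:
        return 0.0
    return k_hyp * (r - r0) ** n

def hyperpolarization_force(r, k_hyp, r0=0.2, n=4):
    """Radial restoring force -dE_hyp/dR along the Drude-nucleus bond."""
    if r <= r0:
        return 0.0
    return -n * k_hyp * (r - r0) ** (n - 1)

# Normal Drude excursions (well below 0.2 Angstrom) are unaffected.
e_normal = hyperpolarization_energy(0.05, k_hyp=1.0e4)
e_large = hyperpolarization_energy(0.30, k_hyp=1.0e4)
```

The high power n makes the term negligible just beyond R0 but steeply repulsive for large excursions, so it acts as a soft wall rather than a change to the harmonic spring.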

6.2.4 Molecular Dynamics Simulations with the Classical Drude Polarizable Model via an Extended Lagrangian Integrator
An essential feature of MM methods for the treatment of biomole-
cular systems is their computational efficiency. The inclusion of
polarizability into the model increases the computational demand
due to the addition of dipoles or additional charge centers and,
in the context of MD simulations, the requirement for shorter
integration time steps. In addition, for every energy or force
evaluation it is necessary to solve for all the polarizable degrees of
freedom in a self-consistent manner. Traditionally, this is performed
via a self-consistent field (SCF) calculation based on the Born–
Oppenheimer approximation in which the induced polarization
is solved iteratively until a satisfactory level of convergence is
achieved. With the Drude model, this implies that the Drude
particles relax in the electric field for each (fixed) nuclear configu-
ration of the system. The result is an equilibrium between the force
of the Drude spring and the electrostatic force from the total electric
field,
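$$ \frac{\partial U_{\mathrm{self}}}{\partial \mathbf{d}_i} = -\frac{\partial U_{\mathrm{elec}}}{\partial \mathbf{d}_i}, \tag{6.9} $$
from which the following expression follows:
$$ K_D \mathbf{d}_i - q_D \mathbf{E}_i = 0. \tag{6.10} $$
Here Ei is the total electric field at the position of the Drude particle,
r + d, arising from the fixed charges as well as all the induced dipoles
(modeled with Drude oscillators). For atomic positions r, the relaxed
displacements $\mathbf{d}^{\mathrm{SCF}}$ produce the potential
$$ U^{\mathrm{SCF}}(\mathbf{r}) = U(\mathbf{r}, \mathbf{d}^{\mathrm{SCF}}) \tag{6.11} $$
and the atomic motions in the SCF regime are described by
$$ m_i \ddot{\mathbf{r}}_i = -\frac{\partial U(\mathbf{r}, \mathbf{d}^{\mathrm{SCF}})}{\partial \mathbf{r}_i}. \tag{6.12} $$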
This simulation strategy has been widely used in MD simulations
[117, 118, 131, 152, 153] and to a lesser extent in Monte Carlo
simulations [128, 129, 154]. Nonetheless, the SCF procedure is
limited and computationally demanding, because any nonconverged
SCF calculation (i.e., energy minimization in the case of the Drude
model) introduces systematic drag forces on the physical atoms
that considerably affect energy conservation and the stability of the
temperature [118, 132, 155]. Therefore, this approach is not ideal
for MD simulations.
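The self-consistent condition of Eq. (6.10) can be illustrated with a simple fixed-point iteration. The sketch below is deliberately reduced: it uses reduced units with the Coulomb constant set to 1, includes only a uniform external field and the fields of the other atom-Drude pairs, and omits static charges, exclusions, and Thole screening. The function name and all numerical values are assumptions for illustration.

```python
import numpy as np

def relax_drudes(r, q_d, k_d, e_ext, tol=1e-12, max_iter=500):
    """Iterate d_i = qD_i * E_i(d) / KD until the displacements stop changing.

    r     : (N, 3) nuclear positions
    q_d   : (N,) Drude charges; each nucleus carries the countercharge -qD
    e_ext : uniform external field
    Returns the relaxed displacements and the number of iterations used.
    """
    r = np.asarray(r, dtype=float)
    q_d = np.asarray(q_d, dtype=float)
    e_ext = np.asarray(e_ext, dtype=float)
    d = np.zeros_like(r)
    for iteration in range(1, max_iter + 1):
        d_new = np.empty_like(d)
        for i in range(len(r)):
            field = e_ext.copy()
            drude_pos = r[i] + d[i]
            for j in range(len(r)):
                if j == i:
                    continue
                # Field of pair j: countercharge at the nucleus, qD at the Drude.
                for charge, site in ((-q_d[j], r[j]), (q_d[j], r[j] + d[j])):
                    sep = drude_pos - site
                    field += charge * sep / np.linalg.norm(sep) ** 3
            d_new[i] = q_d[i] * field / k_d
        if np.max(np.abs(d_new - d)) < tol:
            return d_new, iteration
        d = d_new
    raise RuntimeError("Drude relaxation did not converge")

# Two polarizable sites aligned with the field polarize each other further.
d_pair, n_iter = relax_drudes([[0, 0, 0], [0, 0, 3.0]], [1.0, 1.0], 100.0, [0, 0, 0.1])
```

Every force evaluation in an SCF simulation requires a loop of this kind to be converged tightly, which is the source of the cost and of the drag forces mentioned above when convergence is incomplete.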
A simple alternative to SCF is to extend the Lagrangian of the
system to treat electronic degrees of freedom as additional classical
dynamic variables with associated masses and momenta in the
MD simulation. This approach, commonly referred to as the extended
Lagrangian method [156–159], originates from the Car–Parrinello
approach for QM simulations [160]. In MD, it was implemented for
induced dipoles [156, 161], Drude oscillators [150], and fluctuating
charge methods [56, 57]. In the Drude model, the additional degrees
of freedom are the positions of the moving Drude particles relative to
their parent nucleus. To propagate classically during the simulation,
all Drude particles are assigned a small mass $m_{D,i}$ taken from the
atomic masses, $m_i$, of their parent atoms, and the motions of atoms
and Drude particles are simulated on an equal dynamical footing, as
follows:
$$ (m_i - m_{D,i})\, \ddot{\mathbf{r}}_i = -\frac{\partial U}{\partial \mathbf{r}_i} \tag{6.13} $$
$$ m_{D,i}\, \ddot{\mathbf{r}}_{D,i} = -\frac{\partial U}{\partial \mathbf{r}_{D,i}} \tag{6.14} $$
The motion of Drude particles is expected to be decoupled from
the atomic motion if mD is sufficiently small. The obvious drawback
is that a small mD requires a small integration time step [119,
130]. For a single Drude oscillator, a significant speedup can be
attained by using a multi-time step integration approach [83, 84],
but this advantage is lost for a dense system of polarizable atoms,
because the long-range $1/r^3$ dipole–dipole interactions include
high-frequency oscillations and have to be integrated using the
shortest time step. However, even if mD is very small, the Drude
particles will eventually reach a thermal equilibrium with the rest
of the system. Therefore, simulation approaches relying solely on
the kinetic decoupling of the Drude oscillators to maintain a Born–
Oppenheimer regime are inappropriate for long simulation runs.
To overcome this, the long thermalization time can be exploited
to remain close to the SCF energy surface by periodically resetting
the positions of the Drude oscillators to their energy minimum
[83], although doing so makes the simulation irreversible. The most
effective solution to this issue is to control the temperature of the
various degrees of freedom with a dual-thermostat (see below).
Prior to describing this methodology, it is of interest to examine
the consequences of full thermalization of the classical Drude
oscillators on the properties of the system. This is particularly
important given the fact that any classical fluctuations of the
Drude oscillators are a priori unphysical according to the Born–
Oppenheimer approximation, upon which electronic induction
models are based. It has been shown [150] that under the influence
of thermalized (“hot”) fluctuating Drude oscillators the effective
energy of the system, truncated to two-body interactions is
$$ U^{\mathrm{eff}}(\mathbf{r}) = U^{\mathrm{SCF}}(\mathbf{r}) - \frac{3}{2} k_B T \sum_{i=1}^{N} \sum_{j \neq i} \frac{\alpha_i \alpha_j}{r_{ij}^6} + \cdots. \tag{6.15} $$
It is seen from this expression that in addition to the static induction
effects included in $U^{\mathrm{SCF}}$, the thermalized Drude oscillators give rise
to a spurious $1/r^6$, temperature-dependent, attractive term. This
$(3/2) k_B T \alpha^2/r^6$ term is the classical thermodynamic equivalent of
the London quantum dispersive attraction, $IE\,\alpha^2/r^6$. It corresponds
to a small perturbation to the London forces, because kB T is at least
two orders of magnitude smaller than the typical ionization energy
IE. The smaller the temperature of the dipole motion is, the closer
the effective potential is to the SCF potential.
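The size of the spurious term relative to London dispersion can be checked with a one-line estimate. The ionization energy of 10 eV and the temperatures used below are illustrative assumptions, not values from the text.

```python
K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def thermal_to_ionization_ratio(temperature, ionization_energy_ev):
    """Ratio kB*T / IE that sets the relative size of the spurious 1/r^6 term."""
    return K_B_EV * temperature / ionization_energy_ev

# Fully thermalized Drude oscillators at room temperature (IE assumed 10 eV).
ratio_hot = thermal_to_ionization_ratio(300.0, 10.0)
# Drude oscillators kept cold by a separate thermostat (1 K assumed).
ratio_cold = thermal_to_ionization_ratio(1.0, 10.0)
```

Because the ratio scales linearly with the temperature of the Drude motion, cooling the oscillators suppresses the spurious attraction in direct proportion.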
To approximately reproduce the dynamics equivalent to the
SCF regime of Eq. (6.10), two Nosé–Hoover thermostats [162] are
employed: one to keep the atoms at room temperature T and
another to reduce the thermal fluctuations of the Drude oscillators
by imposing a temperature $T^* \ll T$. The idea of cooling the
polarization degrees of freedom with a separate thermostat was
carefully studied by Sprik [10], who showed that, for cold dipoles,
both the equilibrium and diffusion properties are independent of
the value of the dipole inertia parameter (the analog of mD ), as
long as it is sufficiently small. For Drude oscillators, the temperature
$T^*$ should be small enough to leave almost no kinetic energy in
the Drude-nucleus vibrations, yet large enough to allow the Drude
particles to readjust to the room-temperature motion of the atoms.
This requirement is achieved with the second thermostat, which is
coupled to the motion of the Drude particles relative to their nuclei,
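ḋ (not to their absolute motion ṙD). Denoting by Ri the center of mass
of each (ri, rD,i) pair, mi the total mass of the pair (as before), and
$m'_i = m_D(1 - m_D/m_i)$ the reduced mass, the equations of motion are
$$ m_i \ddot{\mathbf{R}}_i = \mathbf{F}_{R,i} - m_i \dot{\mathbf{R}}_i \dot{\eta} \tag{6.16} $$
$$ m'_i \ddot{\mathbf{d}}_i = \mathbf{F}_{D,i} - m'_i \dot{\mathbf{d}}_i \dot{\eta}^* \tag{6.17} $$
$$ Q \ddot{\eta} = \sum_j m_j \dot{\mathbf{R}}_j^2 - N_f k_B T \tag{6.18} $$
$$ Q^* \ddot{\eta}^* = \sum_j m'_j \dot{\mathbf{d}}_j^2 - N_f^* k_B T^*. \tag{6.19} $$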
Indices i and j run from 1 to N, the total number of atoms. Because
not all atoms have to be polarizable, the total number of Drude
particles, ND , may be less than N. If a given atom i bears no Drude
oscillator, Ri corresponds to ri, $m'_i$ is zero, and the corresponding
Eq. (6.17) is ignored. Nf is the number of degrees of freedom associated
with the atomic motion, accounting for distance constraints
imposed by SHAKE [163], and $N_f^* \equiv 3N_D$ is the number of degrees
of freedom associated with the motion of the Drude oscillators. Q
and Q∗ are the inertia factors of the Nosé–Hoover thermostats. The
“velocities” η̇ and η̇∗ are acting as friction coefficients, that is, as
scaling exponents on the velocities { Ṙ} and {ḋ}, respectively.
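The separation into center-of-mass and relative motion that underlies the two thermostats can be sketched as follows. The code computes the two kinetic energies that enter Eqs. (6.18) and (6.19); the Drude mass of 0.4 amu and the velocities are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def pair_kinetic_energies(m, m_d, v_atom, v_drude):
    """Kinetic energy of the pair centers of mass and of the relative motion.

    m       : (N,) total mass of each atom-Drude pair
    m_d     : (N,) Drude mass taken from the parent atom
    v_atom  : (N, 3) velocities of the atomic cores (mass m - m_d)
    v_drude : (N, 3) velocities of the Drude particles (mass m_d)
    """
    m = np.asarray(m, dtype=float)
    m_d = np.asarray(m_d, dtype=float)
    v_atom = np.asarray(v_atom, dtype=float)
    v_drude = np.asarray(v_drude, dtype=float)
    v_com = ((m - m_d)[:, None] * v_atom + m_d[:, None] * v_drude) / m[:, None]
    v_rel = v_drude - v_atom                  # d-dot, thermostatted at T*
    m_red = m_d * (1.0 - m_d / m)             # reduced mass m'
    ke_com = 0.5 * np.sum(m * np.sum(v_com**2, axis=1))
    ke_rel = 0.5 * np.sum(m_red * np.sum(v_rel**2, axis=1))
    return ke_com, ke_rel

def kinetic_temperature(kinetic_energy, n_dof, k_b):
    """Instantaneous temperature 2*KE / (n_dof * kB), in units set by k_b."""
    return 2.0 * kinetic_energy / (n_dof * k_b)

rng = np.random.default_rng(0)
masses = np.array([12.011, 15.999])
drude_masses = np.array([0.4, 0.4])           # assumed Drude mass in amu
va = rng.normal(size=(2, 3))
vd = rng.normal(size=(2, 3))
ke_com, ke_rel = pair_kinetic_energies(masses, drude_masses, va, vd)
```

The two contributions add up exactly to the total kinetic energy of cores and Drude particles, so the two thermostats act on independent degrees of freedom.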
Initially, the above described algorithm was implemented in
the program CHARMM allowing for a range of calculations [164]
to be performed as required for the force field optimization [4].
Subsequently, the polarizable model was implemented in NAMD
[165], where an alternative dual-thermostat based on Langevin
dynamics turned out to be preferable to permit highly parallelizable
MD simulations. In addition to CHARMM and NAMD, the Drude
model has been implemented in ChemShell QM/MM [166] and
efforts towards implementation of the model in the Gromacs
package [167] and in the OpenMM suite of GPU utilities [168] are
ongoing.
A particularly attractive aspect of the Drude oscillator model is
that it preserves a simple particle–particle Coulomb electrostatic
interaction already present in MM simulation codes, such that
its implementation in standard biomolecular simulation programs
is performed in a relatively straightforward way. This includes
implementation of the constant-pressure algorithm [169] and the
particle-mesh Ewald (PME) summation [170] used to treat long-
range electrostatic interactions. No new interaction types, such as
the dipole field tensor in the induced dipole model, are required. The
great practical advantage of not having to compute the dipole–dipole
interactions is balanced by the extra charge-charge calculations.
Thus, the computational overhead of the Drude implementation
scales directly with the number of Drude particles.
Assigning a Drude particle to every physical nucleus doubles
the total number of particles in the simulated system. However,
it is possible to increase computational efficiency by assigning
Drude particles only to the non-hydrogen atoms that dominate the
molecular polarizability [165]. As discussed above, this proved to be
adequate for an accurate reproduction of molecular polarizabilities
[127, 142, 146]. Additionally, the high frequency motions of the


Drude particles limit the integration time step of MD simulations
to 1 fs for the majority of systems, though even a smaller time
step appears to be required for highly ionic systems. Given the
possible use of a 2 fs integration time step with the additive (non-
polarizable) force fields, the additional computational cost of the
Drude simulations is approximately a factor of 4. MD polarizable
simulations based on the classical Drude oscillator model are at
least one order of magnitude less computationally expensive than
the traditional SCF procedure.

6.3 Parametrization of the Drude Polarizable Force Field in CHARMM

In this section we briefly outline the parametrization protocol for
determining the partial atomic charges, atomic polarizabilities, and
the atom-based Thole damping factors, as well as the optimization
procedure for the force field parameters that do not depend on the Drude
oscillator positions, namely the bonded and Lennard–Jones terms.
While the overall parameter optimization is described linearly in
the text, it is important to bear in mind that the bonded and
nonbonded parameters are strongly interdependent, such that in
practice, an iterative procedure is adopted, with the electrostatic and
LJ nonbonded and bonded parameters optimized in turn until a self-
consistent solution is reached, offering optimal agreement with all
sets of target data.

6.3.1 Optimization of Electrostatic Parameters


In the Drude oscillator model, the determination of atomic
polarizabilities, $\alpha_i$, can be reduced to the determination of the
partial charges of Drude particles, $q_D^{(i)}$, as follows from Eq. (6.3).
Both the atomic core charges, $q^{(i)}$, and $q_D^{(i)}$ can be determined
simultaneously, in a single fitting step [143, 171]. Partial atomic
charges can be obtained by optimizing the fit of an electrostatic
potential (ESP) $\phi^{MM}$ derived from the molecular mechanics model
to an ESP map $\phi^{QM}$ generated by QM calculations on a set of grid
points located on non-intersecting concentric Connolly surfaces


[172] around the molecule. This is in contrast to the traditional
and computationally more expensive practice of using the cubic
grid with the grid points placed at an equidistant separation from
each other, often covering regions around molecule having minimal
chemical relevance. Adjusting the polarizabilities requires a series
of response ESP maps, $\phi_p^{QM}$, each one representing the altered
charge distribution for the molecule in the presence of a small
perturbing point charge, typically +0.5e at a given chemically
relevant position around the molecule. The lowest energy rotamer
of the molecule, optimized at the MP2(fc)/6-31G(d) level of theory
for neutral species and at the MP2(fc)/6-31+G(d) level for ions, is
used for constructing the ESP maps. Optimal parameters are chosen
to minimize the difference between the unperturbed and perturbed
ESP maps from the QM and MM models according to
$$ \chi^2(\{q, \alpha, a\}) = \sum_{\mathrm{grid}} \left[ \phi_{\mathrm{grid}}^{QM} - \phi_{\mathrm{grid}}^{MM}(\{q, \alpha, a\}) \right]^2 + \chi_r^2, \tag{6.20} $$

where {q, α, a} are the set of core charges, Drude polarizabilities,


and atom-based Thole damping parameters that define the elec-
trostatic potential energy of the model. Because the charge fitting
problem is underdetermined (mainly because charges on buried atoms
contribute little to the overall ESP, being screened by atoms located
on the periphery of the molecule), an additional restraint, $\chi_r^2$, is used to
ensure that the optimized parameters do not deviate appreciably
from chemically relevant values [143, 171]. Such a restrained fitting
scheme is referred to as restrained electrostatic potential (RESP)
fitting, which was originated by Bayly et al. [173]. The reference
values for partial atomic charges and polarizabilities are adopted
from two sources. The additive CHARMM force field [4] is used to
provide a set of reference values for the atomic charges. It should
be noted that other initial guesses were considered, namely the
charges obtained from the Natural Population Analysis by Reed
et al. [174] and Mulliken [175] charges. It was found, however, that
these two approaches possess a number of shortcomings compared
to the choice of the CHARMM charges [171]. The modified atomic
polarizabilities of Miller [176], derived from experimental gas-phase
molecular polarizabilities, were chosen as a reference for the Drude


oscillator parameters. The Miller parameters assign additive atomic
polarizabilities to atom types based on the hybridization state of
the atom including hydrogen atoms. The reference polarizabilities
were constructed by adding the Miller polarizabilities of hydrogen
atoms to their covalently bonded heavy atom [171]. It is worth
reiterating that since the atomic polarizability is directly related to
the partial charge of the Drude particle, $q_D^{(i)}$, the atomic and Drude
charges are determined in one step through charge fitting to the
series of perturbed ESP maps obtained from QM calculations. The
initial guess for the Thole scaling factor is 1.3, a value initially
determined for benzene [127].
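The structure of the restrained fit in Eq. (6.20) can be sketched for the simplest case of fixed point charges. The sketch below assumes a harmonic restraint toward reference charges, which makes the problem linear; the restraint actually used in a given fitting program may take a different form, and polarizabilities, Thole parameters, and the total-charge constraint are omitted. All names and values are hypothetical.

```python
import numpy as np

def fit_charges(grid, sites, phi_target, q_ref, weight=1e-3):
    """Minimize sum_grid (phi_target - phi_MM)^2 + weight * sum_i (q_i - q_ref_i)^2.

    grid       : (G, 3) ESP grid points
    sites      : (N, 3) charge sites
    phi_target : (G,) target (QM) potential, Coulomb constant set to 1
    q_ref      : (N,) reference charges for the restraint term
    """
    grid = np.asarray(grid, dtype=float)
    sites = np.asarray(sites, dtype=float)
    q_ref = np.asarray(q_ref, dtype=float)
    dist = np.linalg.norm(grid[:, None, :] - sites[None, :, :], axis=2)
    a = 1.0 / dist                              # phi_MM = a @ q
    lhs = a.T @ a + weight * np.eye(len(sites))
    rhs = a.T @ np.asarray(phi_target, dtype=float) + weight * q_ref
    return np.linalg.solve(lhs, rhs)

# Synthetic example: recover known charges from their own potential.
rng = np.random.default_rng(0)
directions = rng.normal(size=(60, 3))
grid = 3.0 * directions / np.linalg.norm(directions, axis=1)[:, None]
sites = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [-0.3, 0.9, 0.0]])
q_true = np.array([-0.8, 0.4, 0.4])
dist = np.linalg.norm(grid[:, None, :] - sites[None, :, :], axis=2)
phi = (1.0 / dist) @ q_true
q_fit = fit_charges(grid, sites, phi, q_ref=q_true)
```

Raising the restraint weight pulls poorly determined charges toward their reference values, which is how the fit is kept chemically reasonable for buried atoms.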
The Drude model containing virtual sites (lone pairs, LP) and
anisotropic polarizabilities has additional parameters that must be
fitted to QM data [143]. These are the geometry of the virtual LP
sites and the components of the polarizability force constant tensor.
The LP geometry is determined iteratively using the aforementioned
charge/polarizability fitting procedure. An initial guess is obtained
from an atoms in molecules (AIM) analysis [177] of the electron
density, by which the positions of lone pairs can be mapped to
local maxima in the negative of the Laplacian of the density. The
charges are then fit using the above protocol. The reference charge
values for the lone pair containing atoms are typically shifted to
the virtual sites, with the charge on the corresponding atom site
restrained to zero during the fitting procedure, while the
polarizability and Thole factor are both retained on the atomic
center. The polarizability anisotropies (or the components of the
force constant tensor) are optimized by considering the calculated
ESP around the molecule as a function of orientation, in the presence
and absence of a perturbing charge [143], typically located on a
concentric ring around the acceptor. Also, in some cases the atomic
center as well as the LPs may include a partial atomic charge [144].
Density-functional theory provides an efficient means of eval-
uating the electrostatic potential maps used to fit electrostatic
parameters of the Drude model [143, 171]. The QM electrostatic
potential calculations are evaluated using the B3LYP functional [178,
179] and the aug-cc-pVDZ basis set, a combination that has been
shown to give good agreement with molecular polarizabilities and
gas-phase dipole moments [171]. The electrostatic parameter fitting


of the Drude force field is carried out by minimizing Eq. (6.20) using
the FITCHARGE module in CHARMM [4] or the GAAMP utility [180].
Although gas-phase properties, such as molecular dipole moments,
are easily reproduced with the atomic polarizabilities fitted to QM
perturbed ESPs, they need to be scaled down to account for reduced
polarization expected for the condensed phase [143]. Scaling is
based on the reproduction of experimental data on the dielectric
constant of pure solvents, with scaling factors ranging from 1.0 to 0.6
obtained for the atomic polarizabilities. Scaling factors were initially
developed for the SWM4-DP [133] and SWM4-NDP [134] water
models, yielding values of 0.72 and 0.68, respectively. Scaling factors
for other molecules are 0.7 for primary and secondary alcohols
[126], 0.85 for aromatics [127], N-containing heterocycles [146],
nucleic acid bases [147], and ethers [145], and 1.0 for alkanes [181].
Other scaling factors are 0.7 for thiols, 0.85 for dimethyl disulfide,
and 0.6 for ethylmethyl sulfide [144]. A value of 0.724 was recently
used with the atomic ions [138].
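The scaling step can be written down as a simple lookup. The factors below are those quoted in the text; the dictionary keys, the function name, and the gas-phase polarizability value are hypothetical and serve only as an illustration.

```python
# Condensed-phase scaling factors for gas-phase polarizabilities, as quoted above.
POLARIZABILITY_SCALE = {
    "alcohol": 0.7,
    "aromatic": 0.85,
    "n_heterocycle": 0.85,
    "nucleic_acid_base": 0.85,
    "ether": 0.85,
    "alkane": 1.0,
    "thiol": 0.7,
    "dimethyl_disulfide": 0.85,
    "ethylmethyl_sulfide": 0.6,
    "atomic_ion": 0.724,
}

def scaled_polarizability(alpha_gas, compound_class):
    """Scale a gas-phase atomic polarizability for condensed-phase use."""
    return POLARIZABILITY_SCALE[compound_class] * alpha_gas

alpha_condensed = scaled_polarizability(2.0, "aromatic")  # illustrative input
```

Since the polarizability is carried by the Drude charge at fixed KD, scaling α by a factor s corresponds to scaling qD by the square root of s.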

6.3.2 Optimization of Lennard–Jones and Intramolecular Parameters
The optimization of the LJ parameters represents one of the most
intensive parts of the model parametrization, involving iterative
adjustment of selected LJ parameters to accurately reproduce
experimental target data. Among those data are liquid and crystal
phase thermodynamic properties, such as enthalpy of vaporization
and free energy of solvation, isothermal compressibility, density,
lattice geometry, liquid phase dielectric constant, self-diffusion
coefficient, heat capacity, and osmotic pressure, as available.
Additionally, QM gas-phase interaction data, as well as interaction
energies and distances between the model compound and rare gases
[182, 183], are also incorporated in the optimization
of LJ parameters. Within the CHARMM Drude polarizable force
field, the repulsion and dispersion components of the nonbond
interaction energy, U LJ (r), are calculated using a standard “6–12” LJ
interaction potential defined by two empirical parameters, Rmin and
ε, corresponding to the value of the interatomic separation at which
U LJ (r) is a minimum and to the depth of the energy well, respectively.


The values of Rmin and ε used to calculate the interaction between
two atoms i and j are obtained from individual parameters assigned
to each of the two interacting atoms via the following combining
(Lorentz–Berthelot) rules:
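$$ R_{\mathrm{min},ij} = \frac{R_{\mathrm{min},i}}{2} + \frac{R_{\mathrm{min},j}}{2} \tag{6.21} $$
$$ \varepsilon_{ij} = \sqrt{\varepsilon_i \times \varepsilon_j}. \tag{6.22} $$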
The flexibility of the model may be increased by introducing
pair-specific LJ parameters using the NBFIX option in CHARMM [3],
thereby overriding the standard LJ parameter combining rules. This
approach allows for the inclusion of pair-specific LJ parameters for
any atom pairs of choice, while nonbond interactions involving all
other atom pairs are calculated using Rmin and ε values obtained
via the standard combining rules. As mentioned above, the NBFIX
feature can be used as a remedy for the overpolarization problem,
by assigning a repulsive core to the Drude, atomic or LP particles
of interest. Recently, use of the pair-specific LJ parameters via
NBFIX was shown to work remarkably well to correct calculated
hydration free energies of various classes of the Drude polarizable
model compounds while simultaneously allowing reproduction of
pure solvent heats of vaporization and molecular volumes as
the introduced terms only impact the model compound–water
interactions [184].
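
The combining rules and the pair-specific override described above can be sketched in a few lines of Python. This is only an illustration, not CHARMM's implementation: the atom type names and all numerical values are invented for the example, the override table merely mimics the role of NBFIX, and the functional form U_LJ(r) = ε[(R_min/r)^12 − 2(R_min/r)^6] is assumed because it has its minimum at R_min with depth ε, as stated in the text.

```python
import math

# Hypothetical per-atom-type parameters: (R_min,i in Å, ε_i in kcal/mol).
# Type names and values are made up for illustration only.
ATOM_PARAMS = {
    "OW": (3.54, 0.20),
    "NA": (2.92, 0.03),
    "CT": (4.00, 0.08),
}

# Hypothetical pair-specific overrides playing the role of NBFIX entries:
# (R_min,ij in Å, ε_ij in kcal/mol) for selected pairs of atom types.
PAIR_OVERRIDES = {
    frozenset(("OW", "NA")): (3.30, 0.10),
}


def pair_parameters(type_i, type_j):
    """Return (R_min,ij, ε_ij), preferring a pair-specific override if present."""
    key = frozenset((type_i, type_j))
    if key in PAIR_OVERRIDES:
        return PAIR_OVERRIDES[key]
    rmin_i, eps_i = ATOM_PARAMS[type_i]
    rmin_j, eps_j = ATOM_PARAMS[type_j]
    rmin_ij = rmin_i / 2.0 + rmin_j / 2.0   # Eq. (6.21)
    eps_ij = math.sqrt(eps_i * eps_j)       # Eq. (6.22)
    return rmin_ij, eps_ij


def lj_energy(r, rmin_ij, eps_ij):
    """Assumed 6-12 form with its minimum of -ε_ij located at r = R_min,ij."""
    ratio6 = (rmin_ij / r) ** 6
    return eps_ij * (ratio6 * ratio6 - 2.0 * ratio6)


rmin_oc, eps_oc = pair_parameters("OW", "CT")   # standard combining rules
rmin_on, eps_on = pair_parameters("NA", "OW")   # pair-specific override
```

Only the "OW"/"NA" pair is affected by the override; every other pair, including "OW"/"CT", still follows Eqs. (6.21) and (6.22), which is why such corrections can be targeted at one class of interactions without disturbing the rest of the model.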
The internal bonded terms, including bond, angle, and dihedral
contributions, are optimized as follows. Target data for equilibrium
bond length and angle parameters are typically obtained from
surveys of the Cambridge Structural Database [185] and QM
geometries optimized at the MP2/6-31G(d) model chemistry (or
MP2/6-31+G(d) in the case of anions). The optimization of force
constants was performed to reproduce QM vibrational spectra
obtained at the above model chemistries, including both frequencies
and assignments based on potential energy distribution (PED) analysis
computed by the MOLVIB utility of CHARMM, using the internal
coordinate assignments suggested by Pulay et al. [186]. Force field
parameters were optimized by manually adjusting the individual
parameters until the best possible agreement with the target QM
data was obtained. Because of the interdependence of the bonded
and non-bonded parameters, the procedure for the optimization
of equilibrium bond lengths and angles, as well as force constant
parameter values, was repeated each time a new set of LJ parameters
became available.
Special attention is required for the procedure for optimizing
the dihedral parameters for flexible model compounds, such as
acyclic moieties and small molecules representative of the intrinsic
conformational properties of various biopolymers, such as DNA,
proteins and lipids. For such model compounds, QM potential
energy scans (PES) for dihedral angles of interest, sampled from
0° to 360° in increments of 15°, are calculated by initially
optimizing the structure at the MP2/6-31+G(d) level followed
by single-point energy calculations at the RIMP2/cc-pVTZ level,
though higher levels may be used in some cases, such as alkanes.
In so doing, all other rotatable degrees of freedom are typically
maintained at geometries corresponding to relevant macromole-
cular conformations. For instance, when energetics of the DNA
phosphodiester backbone are studied using a model compound
that contains a phosphodiester linkage capped by furanose rings,
QM one-dimensional PESs for each of the key dihedral angles
encountered along the polymer backbone are computed, with the
rest of the rotatable bonds fixed in the conformations corresponding
to the canonical forms of DNA [20, 187]. When necessary, two-
dimensional energetic profiles are obtained from QM calculations to
address correlations among particular torsions. MM calculations are
repeated with the empirical dihedral force field parameters adjusted
to minimize the difference between QM and MM PESs.
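
The final fitting step can be illustrated with a small least-squares sketch. The chapter does not specify a fitting algorithm, so the linear least-squares solution below is an assumption chosen for simplicity. The scan data are synthetic, the function names are hypothetical, and the dihedral term is taken as a sum of K_n(1 + cos nφ) contributions with the phase fixed at zero. A negative fitted K_n is equivalent to a phase of 180° up to a constant, which the offset absorbs.

```python
import numpy as np


def dihedral_energy(phi_deg, amplitudes, offset=0.0):
    """Sum of K_n * (1 + cos(n*phi)) terms plus a constant offset."""
    phi = np.radians(np.asarray(phi_deg, dtype=float))
    energy = np.full_like(phi, offset)
    for n, k_n in amplitudes.items():
        energy += k_n * (1.0 + np.cos(n * phi))
    return energy


def fit_dihedral(phi_deg, e_qm, e_mm_rest, multiplicities=(1, 2, 3)):
    """Fit dihedral amplitudes so that e_mm_rest + dihedral term matches e_qm.

    e_mm_rest is the MM energy along the scan with the dihedral term of
    interest switched off; the fit targets the difference e_qm - e_mm_rest.
    """
    phi = np.radians(np.asarray(phi_deg, dtype=float))
    target = np.asarray(e_qm, dtype=float) - np.asarray(e_mm_rest, dtype=float)
    # One column per multiplicity, plus a constant column for the energy offset.
    design = np.column_stack(
        [1.0 + np.cos(n * phi) for n in multiplicities] + [np.ones_like(phi)]
    )
    solution, *_ = np.linalg.lstsq(design, target, rcond=None)
    amplitudes = {n: float(k) for n, k in zip(multiplicities, solution[:-1])}
    offset = float(solution[-1])
    rmsd = float(np.sqrt(np.mean((design @ solution - target) ** 2)))
    return amplitudes, offset, rmsd


# Synthetic scan in 15° steps (360° is omitted because it duplicates 0°).
phi_scan = np.arange(0.0, 360.0, 15.0)
true_amplitudes = {1: 0.8, 2: -0.3, 3: 1.5}          # invented values
e_mm_rest = 0.5 * np.cos(np.radians(phi_scan)) ** 2   # invented MM background
e_qm = e_mm_rest + dihedral_energy(phi_scan, true_amplitudes, offset=0.25)

fitted, fitted_offset, rmsd = fit_dihedral(phi_scan, e_qm, e_mm_rest)
```

With real target data the residual will not vanish, and the RMSD returned by the fit gives a quick measure of how well the chosen multiplicities capture the QM profile.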

6.3.3 Optimization at the Macromolecular Level


Optimization of the Drude FF for macromolecular systems is based
on balancing the energetic properties of the underlying model com-
pounds and their overall conformational and dynamical properties
in the condensed phase. It is important to note that this strategy is
more physically sound, though significantly more demanding, than
approaches based on empirically adjusting parameters targeting
only condensed phase experimental data, or approaches aimed at
reproducing solely QM data and/or experimental data on small
model compounds. Indeed, the first approach does not guarantee
sufficient fidelity of the important local degrees of freedom, while
the latter does not capture correlations among various motional
modes and other many-body effects in the macromolecule. For
example, applying parameters developed for the Drude model
compounds representative of DNA backbone and glycosidic linkages,
such as nucleic acid bases, dimethylphosphate anion, tetrahydrofuran
and others, did not result in a model capable of reproducing critical
aspects of DNA structural behavior in solution [20, 171]. Those
aspects include the equilibrium between A and B forms of DNA
and the BI/BII conformational equilibrium within the B form of the
DNA in solution, both driven by correlation effects in DNA duplex
[20, 188]. Faithful capture of these structural aspects by the Drude
model required reoptimization of some of the underlying model
compounds and empirical adjustment of the key dihedral parame-
ters associated with the DNA phosphodiester backbone, glycosidic
linkages and sugar moiety. During development of the Drude FF
for proteins, a similar strategy was adopted [18]. For instance,
optimization of the polypeptide backbone electrostatic parameters
targeted both the QM data (conformational energies, interactions
with water, molecular dipole moments and polarizabilities for
dialanine) and experimental condensed phase data for extended
polypeptides such as (Ala)5. In addition, optimization of the
backbone dihedral angles ϕ, ψ included empirical adjustments of the
CMAP away from the gas phase QM surface obtained for the alanine
dipeptide, to improve agreement with conformational sampling of
the peptide backbone in peptides and proteins. Finally, optimization
of the dihedral parameters χ1 and χ2 for side chains, the sampling of
which is known to impact the conformational distribution of the
polypeptide backbone, was based on manual adjustments guided
by data from condensed phase simulations of the 9-mer peptide
(Ala)4–X–(Ala)4 for each amino acid X, in addition to the gas-phase
QM data for dipeptides of the different amino acids.
The above examples demonstrate that optimization of the
parameters for the highly correlated macromolecules such as
DNA and proteins needs to simultaneously target both QM and
experimental data, allowing for a compromise between the level
of agreement with the two types of target data. While this
approach was previously applied during CHARMM additive force
field development, the increased sensitivity of the polarizable FF
to changes in the environment increases the degree of difficulty in
balancing the reproduction of the gas phase QM and experimental
condensed phase data. However, the inclusion of electronic polarizability
in the Drude model does allow for closer agreement with gas phase
QM data for the model compounds as compared to the additive
model while yielding a level of agreement with experimental data
on the macromolecules similar to that with the additive C36 model.

6.4 Historical Overview of the CHARMM Drude Polarizable Force Field for Small Molecules and Biological Polymers

The initial formulation and development of the Drude polarizable
FF in CHARMM [164] started in 2000 in our laboratories. As
elaborated above, development of the force field first involved
implementation of the appropriate dual-thermostat integrators to
allow computationally efficient extended Lagrangian MD
simulations [150]. This was followed by optimization of the first water
model, in which a positive charge was assigned to the Drude particle
(SWM4-DP) [133]. The SWM4-DP model was then re-optimized with a
negative charge assigned to the Drude particles, consistent with
their representation of the electronic degrees of freedom. The new
model, called SWM4-NDP [134], is the standard polarizable water
model of the Drude polarizable FF. It was calibrated to reproduce
important properties of the neat liquid at room temperature and
pressure, such as the enthalpy of vaporization, density, static dielectric
constant, self-diffusion constant, free energy of hydration, and
shear viscosity. Concurrently with development of the water model,
the methodologies to determine electrostatic parameters for the
Drude FF elaborated above were advanced [143, 171]. More
recently, a six-point polarizable model, SWM6, that includes LPs
and a virtual M site was developed and shown to yield improved
treatment of hydrogen bonding interactions [135].
An early test of the feasibility of MD simulations with the Drude
polarizable FF was a successful simulation of a DNA octamer in a box
of water with sodium counterions [171]. Development of the Drude
polarizable FF continued with parametrization of small molecules
covering the functional groups commonly found in biomolecules. In
2005, the alkane FF was developed, followed by parametrization
of alcohols and aromatic compounds in 2007 [126, 127]. Harder
et al. published the first generation of N-methyl acetamide (NMA)
parameters in 2008 [142]. Noteworthy is the treatment of the
dielectric constant by the polarizable FF in all systems, a property
considered essential for the accurate treatment of, for example,
hydrophobic solvation in biomolecules. The Drude polarizable FF
was extended to nitrogen-containing heteroaromatic compounds in
2009 [146]. FF parameters were refitted for ethers by Baker and
MacKerell [189], with significant improvements in the reproduction
of liquid phase dielectric constants, while maintaining the good
agreement of the previous model with all other experimental and
QM target data [141]. Sulfur containing model compounds were
parametrized in 2010 [144]. Other classes of molecules for which
Drude empirical FF parameters have been developed are nucleic
acid bases [147] and acyclic polyalcohols [148]. Early simulations
of dipalmitoylphosphatidylcholine (DPPC) bilayers and monolayers
have also been reported [190].
Significant progress has been made in extending the Drude
polarizable FF from small compounds representative of the build-
ing blocks encountered in biological polymers to the polymers
themselves. The Drude empirical FF applicable to MD simulation
studies of peptides and proteins, termed Drude-2013, was published
in 2013 [18]. Earlier the same year, a Drude polarizable FF for
phosphatidylcholine-containing lipids was released [149]. In the
area of carbohydrates, the polyalcohols were published in 2013
[148] and parameters for the hexopyranose monosaccharides were
completed at the end of 2013 [191]. Finally, the polarizable Drude
model for DNA has been recently completed (January 2014) [20,
151]. In the latter stages of model optimization, the implementation
of the Drude FF in NAMD [165] played a critical role by
making it possible to efficiently generate MD simulations of large-scale
biomolecular systems.

6.5 Conclusion

In this chapter we focused on presenting details of the classical
Drude oscillator model, in which additional charge sites attached
directly to polarizable atoms are used to model electronic induction
in an MM force field. Emphasis was placed on the mathematical
foundation of the Drude oscillator model, its implementation in MD
simulation codes, and applications to small biological molecules
and biopolymers. A brief overview of other methodologies for
modeling electronic polarization effects in atomistic empirical force
fields was also presented. Those include the induced dipole model, in
which an inducible dipole is introduced as an additional property on
atomic centers or along chemical bonds, and the fluctuating charge model,
which deals with the redistribution of atomic charge within a molecule in
response to changes in the electrostatic environment. While all
these models are undergoing active development, only the polariz-
able Drude model developed primarily in the context of CHARMM,
resulting from the work of MacKerell, Roux and co-workers, has
been developed for a broad class of macromolecules, including lipids
[149], proteins [18], carbohydrates [148] and DNA [20], in addition
to polarizable models for a variety of small biologically relevant ions
[138, 139] and molecules [126, 127, 141–148]. Another polarizable
force field, AMOEBA, based on the induced dipole approach, has
recently achieved the goal of producing a fully functional model for
proteins [52], although no parameters have been reported for nu-
cleic acids and lipids. Comparison between the different polarizable
force fields will shed light on the aspects that are model-specific
versus those that reflect robust physical features of the systems.
The Drude oscillator model has a number of advantages
over other polarizable models facilitating its implementation in
multiple simulation packages including CHARMM [150], NAMD
[165], ChemShell QM/MM [192] and the OpenMM suite of utilities for
GPU [193]. Representing a dipole as two point charges provides an
intuitive physical picture in terms of displacement of the electronic
distribution; the model is able to represent delocalization without
the need for additional non-atomic sites since the dipole is not point-like
as it is, e.g., in the induced dipole model. For example, the use of auxiliary
particles allows for the inclusion of mechanical polarizabilities [194]
by including LJ parameters on the Drude particles. With respect to
practical considerations, the Drude model allows any pre-existing
charge-based methodology to be used, including constant-pressure
simulation [169], particle-mesh Ewald summation [170], and the various
schemes used in QM/MM studies [166, 192]. Finally, the relatively
low computational cost of the Drude model implementation, an
essential feature of MM methods for the treatment of biomolecular
systems, allows stable MD simulations to be conducted on a 100+
ns time scale for fully solvated DNA [20] and proteins [18], as well
as simulations of lipid bilayer membranes [149]. Accordingly,
with the currently available macromolecular parameters [18, 20,
149] and parameters for carbohydrates [126, 127, 141–148], atomic
ions [138, 139] and a new model of water [135], as well as tools
for the optimization of small molecule parameters [180], the
CHARMM Drude polarizable FF has the potential to become a
truly comprehensive and broadly used biomolecular force field,
enabling numerous application studies of heterogeneous biological
systems with a fully polarizable model.
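
The two-point-charge picture of an induced dipole mentioned above can be made concrete with a short numerical sketch. It relies on the standard Drude relations, which are not restated in this section and are therefore assumptions here: a harmonic spring energy of (1/2) k_D d², a displacement d = q_D E / k_D in a uniform field E, and an isotropic polarizability α = q_D² / k_D. The charge, force constant and field values are illustrative only, and 332.0716 kcal·Å/(mol·e²) is used as the Coulomb constant to express α in Å³.

```python
COULOMB = 332.0716  # kcal*Å/(mol*e^2), converts q^2/k into a polarizability in Å^3


def drude_polarizability(q_drude, k_drude):
    """Polarizability in Å^3 for a Drude charge (e) on a spring (kcal/mol/Å^2)."""
    return COULOMB * q_drude ** 2 / k_drude


def induced_dipole(q_drude, k_drude, field):
    """Displacement (Å) and dipole (e*Å) of a Drude pair in a uniform field.

    The field is given in kcal/(mol*Å*e), so that q_drude * field is a force.
    At equilibrium the spring force balances the electrostatic force.
    """
    displacement = q_drude * field / k_drude
    dipole = q_drude * displacement
    return displacement, dipole


# Illustrative values, not taken from any published parameter set.
q_d = -1.0      # Drude charge in units of e
k_d = 1000.0    # spring constant in kcal/mol/Å^2
field = 10.0    # uniform field in kcal/(mol*Å*e)

alpha = drude_polarizability(q_d, k_d)
displacement, dipole = induced_dipole(q_d, k_d, field)
```

The negatively charged Drude particle moves against the field while the resulting dipole points along it, and the dipole obtained from the two charges equals α times the field once the field is converted to units of e/Å² (that is, divided by the Coulomb constant).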
MD simulations with the Drude polarizable model have been
shown to be more sensitive to initial conditions than simulations
carried out with an additive (non-polarizable) model [18]. Accordingly,
it is recommended that systems be carefully equilibrated using
an additive FF, such as CHARMM36, and be subsequently
converted to the Drude polarizable model. To facilitate this procedure, a
new module, the “Drude Prepper,” has been added to the CHARMM-
GUI [195], allowing a previously equilibrated CHARMM36 additive
system to be readily converted into the Drude model along with the
production of standard best-practices input files for MD simulations
using CHARMM or NAMD.

Acknowledgments

The NIH (GM051501, GM072558 and GM070855) is thanked for
financial support. We acknowledge the University of Maryland
Computer-Aided Drug Design Center and the XSEDE resources for
their generous allocations of computer time.

References

1. MacKerell, A. D., Jr., Empirical force fields for biological macromolecules:
Overview and issues. J. Comput. Chem., 2004. 25, 1584–1604.
2. Cheatham, T. E., III, and D. A. Case, Twenty-five years of nucleic acid
simulations. Biopolymers, 2013. 99(12), 969–977.
3. Brooks, B. R., et al., CHARMM: A program for macromolecular energy,
minimization, and dynamics calculations. J. Comput. Chem., 1983. 4,
187–217.
4. MacKerell, A. D., Jr., et al., CHARMM: The energy function and its
parameterization with an overview of the program, in Encyclopedia of
Computational Chemistry (Schleyer, P. V. R., et al., eds.), John Wiley &
Sons: Chichester. 1998, p. 271–277.
5. Cornell, W. D., et al., A second generation force field for the simulation
of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc.,
1995. 117, 5179–5197.
6. van Gunsteren, W. F., GROMOS. Groningen Molecular Simulation
Program Package. 1987: University of Groningen, Groningen.
7. Jorgensen, W. L., and J. Tirado-Rives, The OPLS potential function
for proteins. Energy minimizations for crystals of cyclic peptides and
crambin. J. Am. Chem. Soc., 1988. 110, 1657–1666.
8. Dyke, T. R., and J. S. Muenter, Electric dipole moments of low J states of
H2O and D2O. J. Chem. Phys., 1973. 59(6), 3125–3127.
9. Gregory, J. K., et al., The water dipole moment in water clusters. Science,
1997. 275(5301), 814–817.
10. Sprik, M., Computer-simulation of the dynamics of induced polariza-
tion fluctuations in water. J. Phys. Chem., 1991. 95(6), 2283–2291.
11. Soetens, J. C., M. Costa, and C. Millot, Static Dielectric Constant of the
Polarizable NCC Water Model. Mol. Phys., 1998. 94(3), 577–579.
12. Silvestrelli, P. L., and M. Parrinello, Structural, electronic, and bonding
properties of liquid water from first principles. J. Chem. Phys., 1999.
111(8), 3572–3580.
13. Silvestrelli, P. L., and M. Parrinello, Water molecule dipole in the gas
and in the liquid phase. Phys. Rev. Lett., 1999. 82(26), 5415–5415.
14. Silvestrelli, P. L., and M. Parrinello, Water molecule dipole in the gas
and in the liquid phase. Phys. Rev. Lett., 1999. 82(16), 3308–3311.
15. Badyal, Y. S., et al., Electron distribution in water. J. Chem. Phys., 2000.
112(21), 9206–9208.
16. Gubskaya, A. V., and P. G. Kusalik, The total molecular dipole moment
for liquid water. J. Chem. Phys., 2002. 117(11), 5290–5302.
17. Patel, S., A. D. MacKerell, Jr., and C. L. Brooks, III, CHARMM fluctuating
charge force field for proteins: II Protein/solvent properties from
molecular dynamics simulations using a nonadditive electrostatic
model. J. Comput. Chem., 2004. 25, 1504–1514.
18. Lopes, P. E. M., et al., Polarizable force field for peptides and proteins
based on the classical Drude oscillator. J. Chem. Theory Comput., 2013.
9(12), 5430–5449.
19. Kim, B., et al., Structure and dynamics of the solvation of bovine
pancreatic trypsin inhibitor in explicit water: A comparative study of
the effects of solvent and protein polarizability. J. Phys. Chem. B, 2005.
109, 16529–16538.
20. Savelyev, A., and A. D. MacKerell, All-atom polarizable force field for
DNA based on the classical Drude oscillator model. J. Comput. Chem.,
2014. 35(16), 1219–1239.
21. Maple, J. R., et al., A polarizable force field and continuum solvation
methodology for modeling of protein-ligand interactions. J. Chem.
Theory Comput., 2005. 1(4), 694–715.
22. Swart, M., and P. T. van Duijnen, DRF90: a polarizable force field. Mol.
Simul., 2006. 32(6), 471–484.
23. Gao, J. L., D. Habibollazadeh, and L. Shao, A polarizable intermolecular
potential function for simulation of liquid alcohols. J. Phys. Chem., 1995.
99(44), 16460–16467.
24. Gao, J. L., J. J. Pavelites, and D. Habibollazadeh, Simulation of liquid
amides using a polarizable intermolecular potential function. J. Phys.
Chem., 1996. 100(7), 2689–2697.
25. Xie, W. S., et al., Development of a polarizable intermolecular potential
function (PIPF) for liquid amides and alkanes. J. Chem. Theory Comput.,
2007. 3(6), 1878–1889.
26. Onsager, L., Electric moments of molecules in liquids. J. Am. Chem. Soc.,
1936. 58(8), 1486–1493.
27. Kirkwood, J. G., The dielectric polarization of polar liquids. J. Chem.
Phys., 1939. 7(10), 911–919.
28. Ma, B. Y., J. H. Lii, and N. L. Allinger, Molecular polarizabilities and
induced dipole moments in molecular mechanics. J. Comput. Chem.,
2000. 21(10), 813–825.
29. Cieplak, P., J. Caldwell, and P. Kollman, Molecular mechanical models
for organic and biological systems going beyond the atom centered
two body additive approximation: Aqueous solution free energies
of methanol and N-methyl acetamide, nucleic acid base, and amide
hydrogen bonding and chloroform/water partition coefficients of the
nucleic acid bases. J. Comput. Chem., 2001. 22(10), 1048–1057.
30. Thole, B. T., Molecular polarizabilities calculated with a modified
dipole interaction. Chem. Phys., 1981. 59(3), 341.
31. van Duijnen, P. T., and M. Swart, Molecular and atomic polarizabilities:
Thole’s model revisited. J. Phys. Chem. A, 1998. 102(14), 2399–2407.
32. Xie, W., J. Pu, and J. Gao, A coupled polarization-matrix inversion
and iteration approach for accelerating the dipole convergence in a
polarizable potential function. J. Phys. Chem. A, 2009. 113(10), 2109–
2116.
33. Ferenczy, G. G., and C. A. Reynolds, Modeling polarization through
induced atomic charges. J. Phys. Chem. A, 2001. 105(51), 11470–
11479.
34. Reynolds, C. A., G. G. Ferenczy, and W. G. Richards, Methods For
determining the reliability of semiempirical electrostatic potentials
and potential derived charges. J. Mol. Struct.: THEOCHEM, 1992. 256,
249–269.
35. Winn, P. J., G. G. Ferenczy, and C. A. Reynolds, Toward improved force
fields. 1. Multipole-derived atomic charges. J. Phys. Chem. A, 1997.
101(30), 5437–5445.
36. Ferenczy, G. G., P. J. Winn, and C. A. Reynolds, Toward improved
force fields. 2. Effective distributed multipoles. J. Phys. Chem. A, 1997.
101(30), 5446–5455.
37. Winn, P. J., G. G. Ferenczy, and C. A. Reynolds, Towards improved force
fields: III. Polarization through modified atomic charges. J. Comput.
Chem., 1999. 20(7), 704–712.
38. Ferenczy, G. G., et al., Effective distributed multipoles for the quanti-
tative description of electrostatics and polarisation in intermolecular
interactions. Abstr. Papers Am. Chem. Soc., 1997. 214, 38-COMP.
39. Wu, J. H., et al., Solute polarization and the design of cobalt complexes
as redox-active therapeutic agents. Int. J. Quantum Chem., 1999. 73(2),
229–236.
40. Gooding, S. R., et al., Fully polarizable QM/MM calculations: An
application to the nonbonded iodine-oxygen interaction in dimethyl-
2-iodobenzoylphosphonate. J. Comput. Chem., 2000. 21(6), 478–482.
41. Illingworth, C. J. R., et al., Classical polarization in hybrid QM/MM
methods. J. Phys. Chem. A, 2006. 110(20), 6487–6497.
42. Ren, P., and J. W. Ponder, Consistent treatment of inter- and intramole-
cular polarization in molecular mechanics calculations. J. Comput.
Chem., 2002. 23, 1497–1506.
43. Grossfield, A., P. Y. Ren, and J. W. Ponder, Ion solvation thermodynamics
from simulation with a polarizable force field. J. Am. Chem. Soc., 2003.
125(50), 15671–15682.
44. Grossfield, A., P. Y. Ren, and J. W. Ponder, Single ion solvation
thermodynamics from simulations. Biophys. J., 2003. 84(2), 94A.
45. Ren, P. Y., and J. W. Ponder, Polarizable atomic multipole water model
for molecular mechanics simulation. J. Phys. Chem. B, 2003. 107(24),
5933–5947.
46. Ren, P. Y., and J. W. Ponder, Temperature and pressure dependence of
the AMOEBA water model. J. Phys. Chem. B, 2004. 108(35), 13427–
13437.
47. Grossfield, A., Dependence of ion hydration on the sign of the ion’s
charge. J. Chem. Phys., 2005. 122(2), 024506.
48. Jiao, D., et al., Simulation of Ca2+ and Mg2+ solvation using
polarizable atomic multipole potential. J. Phys. Chem. B, 2006. 110(37),
18553–18559.
49. Rasmussen, T. D., et al., Force field modeling of conformational
energies: Importance of multipole moments and intramolecular
polarization. Int. J. Quantum Chem., 2007. 107(6), 1390–1395.
50. Applequist, J., J. R. Carl, and K.-K. Fung, Atom dipole interaction model
for molecular polarizability. Application to polyatomic molecules and
determination of atom polarizabilities. J. Am. Chem. Soc., 1972. 94(9),
2952–2960.
51. Ren, P., C. Wu, and J. W. Ponder, Polarizable atomic multipole-based
molecular mechanics for organic molecules. J. Chem. Theory Comput.,
2011. 7(10), 3143–3161.
52. Shi, Y., et al., Polarizable atomic multipole-based AMOEBA force field
for proteins. J. Chem. Theory Comput., 2013. 9, 4046–4064.
53. Ponder, J. W., et al., Current status of the AMOEBA polarizable force
field. J. Phys. Chem. B, 2010. 114(8), 2549–2564.
54. Ren, P. Y., and J. W. Ponder, Consistent treatment of inter- and
intramolecular polarization in molecular mechanics calculations. J.
Comput. Chem., 2002. 23(16), 1497–1506.
55. Ponder, J. W., and D. A. Case, Force fields for protein simulations, in
Protein Simulations (Daggett, V., et al., eds.), Elsevier Academic Press.
2003, p. 27–86.
56. Rick, S. W., S. J. Stuart, and B. J. Berne, Dynamical fluctuating charge
force-fields - application to liquid water. J. Chem. Phys., 1994. 101(7),
6141–6156.
57. Patel, S., and C. L. Brooks, CHARMM fluctuating charge force field for
proteins: I parameterization and application to bulk organic liquid
simulations. J. Comput. Chem., 2004. 25(1), 1–15.
58. Nalewajski, R. F., Normal (decoupled) representation of electronegativ-
ity equalization equations in a molecule. Int. J. Quantum Chem., 1991.
40(2), 265–285.
59. Nalewajski, R. F., On the chemical potential/electronegativity equaliza-
tion in density functional theory. Pol. J. Chem., 1998. 72(7), 1763–1778.
60. Nalewajski, R. F., Charge sensitivities of the externally interacting open
reactants. Int. J. Quantum Chem., 2000. 78(3), 168–178.
61. Chelli, R., et al., Calculation of optical spectra in liquid methanol using
molecular dynamics and the chemical potential equalization method.
J. Chem. Phys., 1999. 111(9), 4218–4229.
62. Chelli, R., and P. Procacci, A transferable polarizable electrostatic
force field for molecular mechanics based on the chemical potential
equalization principle. J. Chem. Phys., 2002. 117(20), 9175–9189.
63. Chelli, R., et al., Behavior of polarizable models in presence of strong
electric fields. I. Origin of nonlinear effects in water point-charge
systems. J. Chem. Phys., 2005. 123(19), 194109.
64. Itskowitz, P., and M. L. Berkowitz, Chemical potential equalization
principle: Direct approach from density functional theory. J. Phys.
Chem. A, 1997. 101(31), 5687–5691.
65. Bret, C., M. J. Field, and L. Hemmingsen, A chemical potential
equalization model for treating polarization in molecular mechanical
force fields. Mol. Phys., 2000. 98(11), 751–763.
66. Llanta, E., K. Ando, and R. Rey, Fluctuating charge study of polarization
effects in chlorinated organic liquids. J. Phys. Chem. B, 2001. 105(32),
7783–7791.
67. York, D. M., and W. T. Yang, A chemical potential equalization method
for molecular simulations. J. Chem. Phys., 1996. 104(1), 159–172.
68. Smith, P. E., Local chemical potential equalization model for cosolvent
effects on biomolecular equilibria. J. Phys. Chem. B, 2004. 108(41),
16271–16278.
69. Medeiros, M., Monte Carlo simulation of polarizable systems: Early
rejection scheme for improving the performance of adiabatic nuclear
and electronic sampling Monte Carlo simulations. Theor. Chem. Acc.,
2005. 113(3), 178–182.
70. Piquemal, J. P., et al., Key role of the polarization anisotropy of water
in modeling classical polarizable force fields. J. Phys. Chem. A, 2007.
111(33), 8170–8176.
71. Warren, G. L., J. E. Davis, and S. Patel, Origin and control of superlinear
polarizability scaling in chemical potential equalization methods. J.
Chem. Phys., 2008. 128(14), 144110.
72. Zhang, Y., and H. Lin, Flexible-boundary quantum-mechanical/
molecular-mechanical calculations: Partial charge transfer between
the quantum-mechanical and molecular-mechanical subsystems. J.
Chem. Theory Comput., 2008. 4(3), 414–425.
73. Rappé, A. K., and W. A. Goddard, Charge equilibration for molecular-
dynamics simulations. J. Phys. Chem., 1991. 95(8), 3358–3363.
74. Kitao, O., and T. Ogawa, Consistent charge equilibration (CQEq). Mol.
Phys., 2003. 101(1–2), 3–17.
75. Ogawa, T., et al., Consistent charge equilibration (CQEq) method:
application to amino acids and crambin protein. Chem. Phys. Lett.,
2004. 397(4–6), 382–387.
76. Nistor, R. A., et al., A generalization of the charge equilibration method
for nonmetallic materials. J. Chem. Phys., 2006. 125(9), 094108.
77. Sefcik, J., et al., Dynamic charge equilibration-morse stretch force field:
Application to energetics of pure silica zeolites. J. Comput. Chem., 2002.
23(16), 1507–1514.
78. Tanaka, M., and H. U. Siehl, An application of the consistent charge
equilibration (CQEq) method to guanidinium ionic liquid systems.
Chem. Phys. Lett., 2008. 457(1–3), 263–266.
79. Chen, B., J. H. Xing, and J. I. Siepmann, Development of polarizable
water force fields for phase equilibrium calculations. J. Phys. Chem. B,
2000. 104(10), 2391–2401.
80. Patel, S., and C. L. Brooks, Structure, thermodynamics, and liquid-
vapor equilibrium of ethanol from molecular-dynamics simula-
tions using nonadditive interactions. J. Chem. Phys., 2005. 123(16),
164502.
81. Patel, S., and C. L. Brooks, A nonadditive methanol force field: Bulk
liquid and liquid-vapor interfacial properties via molecular dynamics
simulations using a fluctuating charge model. J. Chem. Phys., 2005.
122(2), 024508.
82. Zhong, Y., G. L. Warren, and S. Patel, Thermodynamic and structural
properties of methanol-water solutions using nonadditive interaction
models. J. Comput. Chem., 2008. 29(7), 1142–1152.
83. Stuart, S. J., and B. J. Berne, Effects of polarizability on the hydration of
the chloride ion. J. Phys. Chem., 1996. 100(29), 11934–11943.
84. Stuart, S. J., and B. J. Berne, Surface curvature effects in the aqueous
ionic solvation of the chloride ion. J. Phys. Chem. A, 1999. 103(49),
10300–10307.
85. Warren, G. L., and S. Patel, Hydration free energies of monovalent
ions in transferable intermolecular potential four point fluctuating
charge water: An assessment of simulation methodology and force
field performance and transferability. J. Chem. Phys., 2007. 127(6),
064509.
86. Warren, G. L., and S. Patel, Comparison of the solvation structure
of polarizable and nonpolarizable ions in bulk water and near the
aqueous liquid-vapor interface. J. Phys. Chem. C, 2008. 112(19), 7455–
7467.
87. Warren, G. L., and S. Patel, Electrostatic properties of aqueous salt
solution interfaces: A comparison of polarizable and nonpolarizable
ion models. J. Phys. Chem. B, 2008. 112(37), 11679–11693.
88. Banks, J. L., et al., Parametrizing a polarizable force field from ab
initio data. I. The fluctuating point charge model. J. Chem. Phys., 1999.
110(2), 741–754.
89. Rick, S. W., and B. J. Berne, Dynamical fluctuating charge force fields:
The aqueous solvation of amides. J. Am. Chem. Soc., 1996. 118(3), 672–
679.
90. Toufar, H., et al., Investigation of supramolecular systems by a combi-
nation of the electronegativity equalization method and a Monte-Carlo
simulation technique. J. Phys. Chem., 1995. 99(38), 13876–13885.
91. Kitaura, K., and K. Morokuma, A new energy decomposition scheme
for molecular interactions within the Hartree-Fock approximation. Int.
J. Quantum Chem., 1976. 10(2), 325–340.
92. Weinhold, F., Nature of H-bonding in clusters, liquids, and enzymes: an
ab initio, natural bond orbital perspective. J. Mol. Struct.: THEOCHEM,
1997. 398–399, 181–197.
93. van der Vaart, A., and K. M. Merz, The role of polarization and charge
transfer in the solvation of biomolecules. J. Am. Chem. Soc., 1999.
121(39), 9182–9190.
February 15, 2016 12:4 PSP Book - 9in x 6in 06-Qiang-Cui-c06

References 225

94. Korchowiec, J., and T. Uchimaru, New energy partitioning scheme


based on the self-consistent charge and configuration method for
subsystems: Application to water dimer system. J. Chem. Phys., 2000.
112(4), 1623–1633.
95. Jeziorski, B., R. Moszynski, and K. Szalewicz, Perturbation-theory
approach to intermolecular potential-energy surfaces of van-der-
Waals complexes. Chem. Rev., 1994. 94(7), 1887–1930.
96. Chelli, R., et al., Electrical response in chemical potential equalization
schemes. J. Chem. Phys., 1999. 111(18), 8569–8575.
97. Stern, H. A., et al., Fluctuating charge, polarizable dipole, and combined
models: Parameterization from ab initio quantum chemistry. J. Phys.
Chem. B, 1999. 103(22), 4730–4737.
98. Chen, J. H., and T. J. Martinez, QTPIE: Charge transfer with polarization
current equalization. A fluctuating charge model with correct asymp-
totics. Chem. Phys. Lett., 2007. 438(4–6), 315–320.
99. Yang, Z. Z., and C. S. Wang, Atom-bond electronegativity equalization
method. 1. Calculation of the charge distribution in large molecules. J.
Phys. Chem. A, 1997. 101(35), 6315–6321.
100. Wang, C. S., S. M. Li, and Z. Z. Yang, Calculation of molecular energies
by atom-bond electronegativity equalization method. J. Mol. Struct.:
THEOCHEM, 1998. 430, 191–199.
101. Wang, C. S., and Z. Z. Yang, Atom-bond electronegativity equalization
method. II. Lone-pair electron model. J. Chem. Phys., 1999. 110(13),
6189–6197.
102. Cong, Y., and Z. Z. Yang, General atom-bond electronegativity equaliza-
tion method and its application in prediction of charge distributions in
polypeptide. Chem. Phys. Lett., 2000. 316(3–4), 324–329.
103. Yang, Z. Z., and C. S. Wang, Atom-bond electronegativity equalization
method and its applications based on density functional theory. J.
Theor. Comput. Chem., 2003. 2(2), 273–299.
104. Yang, Z. Z., and C. S. Wang, Molecular electronegativity in density
functional theory(VIII)) - Charge polarization modes in a closed
system. Sci. China Ser. B Chem., 2000. 43(2), 187–195.
105. Yang, Z. Z., Y. Wu, and D. X. Zhao, Atom-bond electronegativity
equalization method fused into molecular mechanics. I. A seven-site
fluctuating charge and flexible body water potential function for water
clusters. J. Chem. Phys., 2004. 120(6), 2541–2557.
106. Wu, Y., and Z. Z. Yang, Atom-bond electronegativity equalization
method fused into molecular mechanics. II. A seven-site fluctuating
January 29, 2016 11:25 PSP Book - 9in x 6in 06-Qiang-Cui-c06

226 Explicit Inclusion of Induced Polarization in Atomistic Force Fields

charge and flexible body water potential function for liquid water. J.
Phys. Chem. A, 2004. 108(37), 7563–7576.
107. Zhao, D.-X., et al., Development of a Polarizable Force Field Using
Multiple Fluctuating Charges per Atom. J. Chem. Theory Comput., 2010.
6(3), 795–804.
108. Drude, P., The Theory of Optics (1902), (Millikan, R. A. T., and C. T. Riborg
Mann, ed.), 2008: Kessinger Publishing Company.
109. London, F., The general theory of molecular forces. Trans. Faraday Soc.,
1937. 33, 8b-26.
110. Bade, W. L., Drude-model calculation of dispersion forces. I. general
theory. J. Chem. Phys., 1957. 27(6), 1280–1284.
111. Bade, W. L., and J. G. Kirkwood, Drude-model calculation of dispersion
forces. II. The linear lattice. J. Chem. Phys., 1957. 27(6), 1284–1288.
112. Bade, W. L., Drude-model calculation of dispersion forces. III. the
fourth-order contribution. J. Chem. Phys., 1958. 28(2), 282–284.
113. Amos, A. T., Bond properties using a modern version of the Drude
model. Int. J. Quantum Chem., 1996. 60(1), 67–74.
114. Wang, F., and K. D. Jordan, Application of a Drude model to the binding
of excess electrons to water clusters. J. Chem. Phys., 2002. 116(16),
6973–6981.
115. Dick, B. G., and A. W. Overhauser, Theory of the dielectric constants of
alkali halide crystals. Phys. Rev., 1958. 112(1), 90.
116. Hanlon, J. E., and A. W. Lawson, Effective ionic charge in alkali halides.
Phys. Rev., 1959. 113(2), 472.
117. Jacucci, G., I. R. McDonald, and K. Singer, Introduction of the shell model
of ionic polarizability into molecular dynamics calculations. Phys. Lett.
A, 1974. 50(2), 141–143.
118. Lindan, P. J. D., and M. J. Gillan, Shell-model molecular-dynamics
simulation of superionic conduction in CAF2. J. Phys.: Condens. Matter,
1993. 5(8), 1019–1030.
119. Mitchell, P. J., and D. Fincham, Shell-model simulations by adiabatic
dynamics. J. Phys.: Condens. Matter, 1993. 5(8), 1031–1038.
120. Lindan, P. J. D., Dynamics with the shell-model. Mol. Simul., 1995. 14(4–
5), 303–312.
121. Hoye, J. S., and G. Stell, Dielectric theory for polar molecules with
fluctuating polarizability. J. Chem. Phys., 1980. 73(1), 461–468.
122. Pratt, L. R., Effective field of a dipole in non-polar polarizable fluids.
Mol. Phys., 1980. 40(2), 347–360.
January 29, 2016 11:25 PSP Book - 9in x 6in 06-Qiang-Cui-c06

References 227

123. Lado, F., Molecular theory of a charged particle in a polarizable


nonpolar liquid. J. Chem. Phys., 1997. 106(11), 4707–4713.
124. Cao, J., and B. J. Berne, Theory of polarizable liquid crystals: Optical
birefringence. J. Chem. Phys., 1993. 99(3), 2213–2220.
125. Noskov, S. Y., G. Lamoureux, and B. Roux, Molecular dynamics study of
hydration in ethanol-water mixtures using a polarizable force field. J.
Phys. Chem. B, 2005. 109(14), 6705–6713.
126. Anisimov, V. M., et al., Polarizable empirical force field for the primary
and secondary alcohol series based on the classical drude model. J.
Chem. Theor. Comput., 2007. 3(6), 1927–1946.
127. Lopes, P. E. M., et al., Polarizable empirical force field for aromatic
compounds based on the classical drude oscillator. J. Phys. Chem. B,
2007. 111(11), 2873–2885.
128. Saint-Martin, H., C. Medina-Llanos, and I. Ortega-blake, nonadditivity
in an analytical intermolecular potential - the water-water interaction.
J. Chem. Phys., 1990. 93(9), 6448–6452.
129. Saint-Martin, H., et al., A mobile charge densities in harmonic
oscillators (MCDHO) molecular model for numerical simulations: The
water-water interaction. J. Chem. Phys., 2000. 113(24), 10899–10912.
130. de Leeuw, N. H., and S. C. Parker, Molecular-dynamics simulation of
MgO surfaces in liquid water using a shell-model potential for water.
Phys. Rev. B, 1998. 58(20), 13901–13908.
131. van Maaren, P. J., and D. van der Spoel, Molecular dynamics simulations
of water with novel shell-model potentials. J. Phys. Chem. B, 2001.
105(13), 2618–2626.
132. Yu, H. B., T. Hansson, and W. F. van Gunsteren, Development of a simple,
self-consistent polarizable model for liquid water. J. Chem. Phys., 2003.
118(1), 221–234.
133. Lamoureux, G., A. D. MacKerell, and B. Roux, A simple polarizable
model of water based on classical Drude oscillators. J. Chem. Phys.,
2003. 119(10), 5185–5197.
134. Lamoureux, G., et al., A polarizable model of water for molecular
dynamics simulations of biomolecules. Chem. Phys. Lett., 2006. 418(1–
3), 245–249.
135. Yu, W., et al., Six-site polarizable model of water based on the classical
Drude oscillator. J. Chem. Phys., 2013. 138(3), 034508.
136. Lamoureux, G., and B. Roux, Absolute hydration free energy scale for
alkali and halide ions established from simulations with a polarizable
force field. J. Phys. Chem. B, 2006. 110(7), 3308–3322.
January 29, 2016 11:25 PSP Book - 9in x 6in 06-Qiang-Cui-c06

228 Explicit Inclusion of Induced Polarization in Atomistic Force Fields

137. Whitfield, T. W., et al., Theoretical study of aqueous solvation of K+


comparing ab initio, polarizable, and fixed-charge models. J. Chem.
Theor. Comput., 2007. 3(6), 2068–2082.
138. Yu, H., et al., Simulating monovalent and divalent ions in aqueous
solution using a Drude polarizable force field. J. Chem. Theory Comput.,
2010. 6(3), 774–786.
139. Luo, Y., et al., Simulation study of ion pairing in concentrated aqueous
salt solutions with a polarizable force field. Faraday Discuss., 2013.
160, 135–149.
140. Lu, Z. Y., and Y. K. Zhang, Interfacing ab initio quantum mechanical
method with classical Drude osillator polarizable model for molecular
dynamics simulation of chemical reactions. J. Chem. Theor. Comput.,
2008. 4(8), 1237–1248.
141. Vorobyov, I., et al., Additive and classical drude polarizable force fields
for linear and cyclic ethers. J. Chem. Theor. Comput., 2007. 3(3), 1120–
1133.
142. Harder, E., et al., Understanding the dielectric properties of liquid
amides from a polarizable force field. J. Phys. Chem. B, 2008. 112(11),
3509–3521.
143. Harder, E., et al., Atomic level anisotropy in the electrostatic modeling
of lone pairs for a polarizable force field based on the classical Drude
oscillator. J. Chem. Theory Comput., 2006. 2(6), 1587–1597.
144. Zhu, X., and A. D. MacKerell, Jr., Polarizable empirical force field for
sulfur-containing compounds based on the classical Drude oscillator
model. J. Comput. Chem., 2010. 31(12), 2330–2341.
145. Baker, C. M., and A. D. MacKerell, Jr., Polarizability rescaling and atom-
based Thole scaling in the CHARMM Drude polarizable force field for
ethers. J. Mol. Model, 2010. 16(3), 567–576.
146. Lopes, P. E. M., G. Lamoureux, and A. D. MacKerell, Jr., Polarizable em-
pirical force field for nitrogen-containing heteroaromatic compounds
based on the classical Drude oscillator. J. Comput. Chem., 2009. 30,
1821–1838.
147. Baker, C. M., V. M. Anisimov, and A. D. MacKerell, Jr., Development of
CHARMM polarizable force field for nucleic acid bases based on the
classical Drude oscillator model. J. Phys. Chem. B, 2011. 115(3), 580–
596.
148. He, X., P. E. M. Lopes, and A. D. MacKerell, Polarizable empirical force
field for acyclic polyalcohols based on the classical Drude oscillator.
Biopolymers, 2013. 99(10), 724–738.
January 29, 2016 11:25 PSP Book - 9in x 6in 06-Qiang-Cui-c06

References 229

149. Chowdhary, J., et al., A Polarizable force field of dipalmitoylphos-


phatidylcholine based on the classical Drude model for molecular
dynamics simulations of lipids. J. Phys. Chem. B, 2013. 117, 9142–9160.
150. Lamoureux, G., and B. Roux, Modeling induced polarization with
classical Drude oscillators: Theory and molecular dynamics simulation
algorithm. J. Chem. Phys., 2003. 119(6), 3025–3039.
151. Savelyev, A., and A. D. MacKerell, Balancing the interactions of ions,
water, and DNA in the Drude polarizable force field. J. Phys. Chem. B,
2014. Article ASAP, DOI 10.1021/jp503469s.
152. Jacucci, G., I. R. McDonald, and A. Rahman, Effects of polarization on
equilibrium and dynamic properties of ionic systems. Phys. Rev. A,
1976. 13(4), 1581.
153. Sangster, M. J. L., and M. Dixon, eds., Advances in Physical Chemistry
(Prigogine, I., ed.), vol. 25. 1976, Wiley-Interscience.
154. Mahoney, M. W., and W. L. Jorgensen, Rapid estimation of electronic
degrees of freedom in Monte Carlo calculations for polarizable models
of liquid water. J. Chem. Phys., 2001. 114(21), 9337–9349.
155. Sangster, M. J. L., and M. Dixon, Interionic potentials in alkali halides
and their use in simulations of the molten salts. Adv. Phys., 1976. 25(3),
247–342.
156. van Belle, D., and S. J. Wodak, Extended Lagrangian formalism applied
to temperature control and electronic polarization effects in molecular
dynamics simulations. Comput. Phys. Commun., 1995. 91(1–3), 253–
262.
157. Tuckerman, M. E., and G. J. Martyna, Understanding modern molecular
dynamics: Techniques and applications. J. Phys. Chem. B, 2000. 104(2),
159–178.
158. Martyna, G. J., et al., Explicit reversible integrators for extended
systems dynamics. Mol. Phys., 1996. 87(5), 1117–1157.
159. Sprik, M., and M. L. Klein, A polarizable model for water using
distributed charge sites. J. Chem. Phys., 1988. 89(12), 7556–7560.
160. Car, R., and M. Parrinello, Unified approach for molecular dynamics and
density-functional theory. Phys. Rev. Lett., 1985. 55(22), 2471.
161. van Belle, D., et al., Molecular-dynamics simulation of polarizable water
by an extended Lagrangian method. Mol. Phys., 1992. 77(2), 239–255.
162. Hoover, W. G., Canonical dynamics: Equilibrium phase-space distribu-
tions. Phys. Rev. A, 1985. 31(3), 1695.
163. Ryckaert, J. P., G. Ciccotti, and H. J. C. Berendsen, Numerical integration
of Cartesian equations of motion of a system with constraints:
January 29, 2016 11:25 PSP Book - 9in x 6in 06-Qiang-Cui-c06

230 Explicit Inclusion of Induced Polarization in Atomistic Force Fields

Molecular dynamics of n-alkanes. J. Comput. Phys., 1977. 23(3), 327–


341.
164. Brooks, B. R., et al., CHARMM: The biomolecular simulation program. J.
Comput. Chem., 2009. 30, 1545–1614.
165. Jiang, W., et al., High-performance scalable molecular dynamics
simulations of a polarizable force field based on classical Drude
oscillators in NAMD. J. Phys. Chem. Lett., 2011. 2(2), 87–92.
166. Sherwood, P., et al., QUASI: A general purpose implementation of the
QM/MM approach and its application to problems in catalysis. J. Mol.
Struct.: Theochem., 2003. 632, 1–28.
167. Lemkul, J. A., Roux, B., van der Spoel, D and MacKerell, A. D., Jr.,
Implementation of Extended Lagrangian Dynamics in GROMACS for
Polarizable Simulations Using the Classical Drude Oscillator Model, In
Press. J. Comput. Chem., 2015. 36, 1480–1486.
168. Friedrichs, M. S., et al., Accelerating molecular dynamic simulation on
graphics processing units. J. Comput. Chem., 2009. 30(6), 864–872.
169. Martyna, G. J., D. J. Tobias, and M. L. Klein, Constant-pressure
molecular-dynamics algorithms. J. Chem. Phys., 1994. 101(5): 4177–
4189.
170. Darden, T. A., D. York, and L. G. Pedersen, Particle mesh Ewald: An
Nlog(N) method for Ewald sums in large systems. J. Chem. Phys., 1993.
98, 10089–10092.
171. Anisimov, V. M., et al., Determination of electrostatic parameters for a
polarizable force field based on the classical Drude oscillator. J. Chem.
Theory Comput., 2005. 1(1), 153–168.
172. Connolly, M. L., Analytical molecular surface calculation. J. Appl.
Crystallogr., 1983. 16(OCT), 548–558.
173. Bayly, C. I., et al., A well-behaved electrostatic potential based method
using charge restraints for deriving atomic charges - the resp model. J.
Phys. Chem., 1993. 97(40), 10269–10280.
174. Reed, A. E., R. B. Weinstock, and F. Weinhold, Natural population
analysis. J. Chem. Phys., 1985. 83(2), 735–746.
175. Mulliken, R. S., Electronic population analysis on LCAO–MO molecular
wave functions. I. J. Chem. Phys., 1955. 23(10), 1833–1840.
176. Miller, K. J., Additivity methods in molecular polarizability. J. Am. Chem.
Soc., 1990. 112(23), 8533–8542.
177. Bader, R., Atoms in Molecules: A Quantum Theory. 1994, USA: Oxford
University Press.
January 29, 2016 11:25 PSP Book - 9in x 6in 06-Qiang-Cui-c06

References 231

178. Lee, C. T., W. T. Yang, and R. G. Parr, Development of the colle-salvetti


correlation-energy formula into a functional of the electron-density.
Phys. Rev. B, 1988. 37(2), 785–789.
179. Becke, A. D., Density-functional exchange-energy approximation with
correct asymptotic-behavior. Phys. Rev. A, 1988. 38(6), 3098–3100.
180. Huang, L., and B. Roux, Automated force field parameterization for
nonpolarizable and polarizable atomic models based on ab initio
target data. J. Chem. Theory Comput., 2013. 9(8), 3543–3556.
181. Vorobyov, I. V., V. M. Anisimov, and A. D. MacKerell, Polarizable
empirical force field for alkanes based on the classical drude oscillator
model. J. Phys. Chem. B, 2005. 109(40), 18988–18999.
182. Yin, D. X., and A. D. Mackerell, Combined ab initio empirical approach
for optimization of Lennard-Jones parameters. J. Comput. Chem., 1998.
19(3), 334–348.
183. Chen, I. J., D. Yin, and A. D. MacKerell, Combined ab initio/empirical
approach for optimization of Lennard-Jones parameters for polar-
neutral compounds. J. Comput. Chem., 2002. 23(2), 199–213.
184. Baker, C. M., et al., Accurate calculation of hydration free energies
using pair-specific Lennard-Jones parameters in the CHARMM Drude
polarizable force field. J. Chem. Theory Comput., 2010. 6(4), 1181–
1198.
185. Allen, F. H., The Cambridge Structural Database: A quarter of a million
crystal structures and rising. Acta Crystallogr. Sec. B Struct. Sci., 2002.
58, 380–388.
186. Pulay, P., et al., Systematic ab initio gradient calculation of molecular
geometries, force constants, and dipole moment derivatives. J. Am.
Chem. Soc., 1979. 101(10), 2550–2560.
187. MacKerell, A. D., Jr., Contribution of the intrinsic mechanical energy of
the phosphodiester linkage to the relative stability of the A, BI and BII
forms of duplex DNA. J. Phys. Chem. B, 2009. 113, 3235–3244.
188. Hart, K., et al., Optimization of the CHARMM additive force field for
DNA: Improved treatment of the BI/BII conformational equilibrium.
J. Chem. Theory Comput., 2012. 8(1), 348–362.
189. Baker, C. M., and A. D. MacKerell, Polarizability rescaling and atom-
based Thole scaling in the CHARMM Drude polarizable force field for
ethers. J. Mol. Model., 2010. 16(3), 567–576.
190. Harder, E., A. D. MacKerell, and B. Roux, Many-body polarization effects
and the membrane dipole potential. J. Am. Chem. Soc., 2009. 131(8),
2760–2761.
January 29, 2016 11:25 PSP Book - 9in x 6in 06-Qiang-Cui-c06

232 Explicit Inclusion of Induced Polarization in Atomistic Force Fields

191. Patel, D. S., X. He, and A. D. MacKerell, Jr., Polarizable empirical force
field for hexopyranose monosaccharides based on the classical Drude
oscillator. J. Phys. Chem. B, Submitted.
192. Boulanger, E., and W. Thiel, Solvent boundary potentials for hybrid
QM/MM computations using classical Drude oscillators: A fully
polarizable model. J. Chem. Theory Comput., 2012. 8(11), 4527–4538.
193. Eastman, P., et al., OpenMM 4: A reusable, extensible, hardware
independent library for high performance molecular simulation. J.
Chem. Theory Comput., 2013. 9(1), 461–469.
194. Rick, S. W., and S. J. Stuart, Potentials and algorithms for incorporating
polarizability in computer simulations, in Reviews in Computational
Chemistry (Lipkowitz, K. B., and D. B. Boyd, eds.), Wiley-VCH: Hoboken,
NJ. 2002, p. 89–146.
195. Jo, S., et al., CHARMM-GUI: a web-based graphical user interface for
CHARMM. J. Comput. Chem., 2008. 29(11), 1859–1865.
January 29, 2016 11:27 PSP Book - 9in x 6in 07-Qiang-Cui-c07

Chapter 7

Multipolar Force Fields for Atomistic Simulations

Tristan Bereau^a and Markus Meuwly^{b,c}

^a Department of Chemistry, University of Basel, Klingelbergstr. 80, CH 4056, Switzerland
^b Department of Chemistry, University of Basel, Klingelbergstr. 80, CH 4056, Switzerland
^c Department of Chemistry, Brown University, Providence RI, USA

[email protected], [email protected]

Many-Body Effects and Electrostatics in Biomolecules
Edited by Qiang Cui, Markus Meuwly, and Pengyu Ren
Copyright © 2016 Pan Stanford Publishing Pte. Ltd.
ISBN 978-981-4613-92-7 (Hardcover), 978-981-4613-93-4 (eBook)
www.panstanford.com

7.1 Introduction

From pioneering in vacuo, picosecond-timescale investigations of
proteins [1], atomistic simulations have gradually matured into a
scientific workhorse for (bio)molecular systems [2–9]. By averaging
over the electronic degrees of freedom, atomistic models idealize
the system by using a set of empirical interaction potentials [10–13].
Though approximate, atomistic force fields have become increasingly
finely tuned to reproduce ab initio and experimental properties
[14–17]. Recent developments in the field have highlighted their many
successes, e.g., insight and predictions in drug discovery [18],
accurate thermodynamic calculations of organic molecules [19], or
beyond-microsecond-timescale protein simulations [20]. Yet, atomistic
simulations have also revealed their limitations. Recent access to
powerful computers has exposed force-field inaccuracies that have
long-time, spurious repercussions [21]. Beyond the quality of the
parametrization of the potential energy surfaces (PESs), their
functional forms are based on crucial assumptions. Most
current-generation force fields represent intermolecular interactions
via pairwise Lennard–Jones interactions and point-charge (PC)
electrostatics [14–17]. For instance, polarizable force fields, which
reproduce the response of a charge distribution to a local change in
the electric field, have only recently become increasingly popular
for key systems [22], e.g., cation–π interactions [23].
Atomistic force fields traditionally employ PC electrostatics,
which describe the charge distribution of a molecule using atom-
centered partial charges, interacting with one another using
Coulomb’s law. Despite the problematic range of 1/r interactions,
computational methods (e.g., Ewald summation) have been devised
to efficiently compute long-range electrostatics in periodic systems
[24, 25]. The success of atomistic force fields is due in no small part
to the effectiveness of PC electrostatics in approximating the charge
distribution. However, limitations become apparent in specific sys-
tems, e.g., halogens are notoriously challenging for PC force fields, as
they fail to correctly describe the σ hole in front of the atom [26, 27].
In general, the lack of anisotropy limits the ability to model specific
chemical interactions, as illustrated by the need for dummy atoms in
certain water models to better reproduce hydrogen-bond interactions [28,
29]. To this end, multipolar (MTP) electrostatics provide a natural
and systematic extension to Coulomb interactions, where anisotropy
is included as a series expansion with distinct symmetries. This
chapter focuses on their derivation, implementation in simulations,
and applicability to molecular systems.

7.2 Describing Electrostatics in Atomistic Force Fields

The present section introduces MTP electrostatics in the context of
molecular simulations. It first motivates and briefly describes the
derivation of MTPs and their link to symmetries in the system’s
electrostatic potential, then the role of axis systems in describing
non-monopole moments, and finally the possibility of reproducing
the conformational dependence of a molecule’s charge distribution.

7.2.1 Multipolar Interactions


Placing point charges on every atom of a molecule can only do so
much in reproducing its charge distribution ρ. Emulating strongly
anisotropic features (e.g., lone pairs, hydrogen bonding, π-electron
density) may require more elaborate schemes. Going beyond the
simple PC approximation can be approached both naturally and
systematically by considering the integral for the electrostatic
potential (ESP)

$$4\pi\varepsilon_0\,\Phi(\mathbf{r}) = \int d\mathbf{r}'\,\frac{\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}, \tag{7.1}$$

where $\mathbf{r}$ and $\mathbf{r}'$ are spatial variables. For a charge distribution
confined to a sphere of radius $r'$ around an arbitrary origin and
an observation point outside the sphere ($r > r'$), one can expand
$1/|\mathbf{r}-\mathbf{r}'|$ in powers of $r'/r < 1$ [30]. The ESP can thereby be
represented by an expansion in spherical harmonics $Y_{lm}(\theta,\phi)$, a
set of orthonormal functions that depends on the order $l$ and its
projection $m$, and the spherical coordinates $\theta$ and $\phi$, to yield

$$4\pi\varepsilon_0\,\Phi(\mathbf{r}) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\frac{Q_{lm}}{r^{l+1}}\sqrt{\frac{4\pi}{2l+1}}\,Y_{lm}(\theta,\phi), \tag{7.2}$$

while the spherical MTP moments are defined by

$$Q_{lm} = \int d\mathbf{r}'\,\rho(\mathbf{r}')\,(r')^{l}\sqrt{\frac{4\pi}{2l+1}}\,Y^{*}_{lm}(\theta',\phi') \tag{7.3}$$
and can therefore be determined from the density ρ. For $l \le 2$,
the $Q_{lm}$ coefficients reduce to linear combinations of the familiar $q$
(monopole scalar), $\mu$ (dipole vector), and $\Theta_{\alpha\beta}$ (quadrupole second-
rank tensor) expressed in Cartesian coordinates (see field-line
representations in Fig. 7.1). A more convenient linear combination
of Cartesian coordinates expresses the spherical MTP moments in
terms of $\cos m\phi$ and $\sin m\phi$, rather than the original $\exp(\pm i m\phi)$ [31].
The new linear combination, indexed by $\kappa = \{0, 1c, 1s, \ldots, lc, ls\}$
for index $l$ (c and s refer to cos and sin), has the added advantage
of containing only real components. While the spherical harmonics
and MTP moments can be found elsewhere (e.g., [30, 31]), the
coefficients up to quadrupole are summarized in Table 7.1.

Figure 7.1 Representations of the (a) monopole, (b) dipole, and (c)
quadrupole fields. The anisotropy of the higher MTPs provides the means
for an improved description of the ESP.

Table 7.1 List of spherical harmonics and MTP moments expressed in
Cartesian coordinates, up to quadrupole (i.e., l = 2) [31]

l    κ     $r^{l}\sqrt{4\pi/(2l+1)}\,Y_{l\kappa}(\theta,\phi)$    $Q_{l\kappa}$
0    0     $1$                                  $q$
1    0     $z$                                  $\mu_z$
1    1c    $x$                                  $\mu_x$
1    1s    $y$                                  $\mu_y$
2    0     $\tfrac{1}{2}(3z^2 - r^2)$           $\Theta_{zz}$
2    1c    $\sqrt{3}\,xz$                       $\tfrac{2}{\sqrt{3}}\Theta_{xz}$
2    1s    $\sqrt{3}\,yz$                       $\tfrac{2}{\sqrt{3}}\Theta_{yz}$
2    2c    $\tfrac{1}{2}\sqrt{3}\,(x^2 - y^2)$  $\tfrac{1}{\sqrt{3}}(\Theta_{xx} - \Theta_{yy})$
2    2s    $\sqrt{3}\,xy$                       $\tfrac{2}{\sqrt{3}}\Theta_{xy}$
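
As a rough illustration of Eq. 7.3 and Table 7.1, the following Python sketch evaluates the real spherical moments of a set of point charges, for which the integral reduces to a sum over charges. The point-charge geometry, the function names, and the traceless Cartesian quadrupole convention $\Theta_{\alpha\beta} = \sum_i q_i(\tfrac{3}{2} r_\alpha r_\beta - \tfrac{1}{2} r^2 \delta_{\alpha\beta})$ are assumptions made for this example; the convention is chosen because it reproduces the relations listed in Table 7.1.

```python
import math

SQRT3 = math.sqrt(3.0)

# Regular solid harmonics r^l * sqrt(4*pi/(2l+1)) * Y_{l,kappa} in Cartesian
# form, keyed by (l, kappa) as in Table 7.1.
SOLID_HARMONICS = {
    (0, "0"):  lambda x, y, z: 1.0,
    (1, "0"):  lambda x, y, z: z,
    (1, "1c"): lambda x, y, z: x,
    (1, "1s"): lambda x, y, z: y,
    (2, "0"):  lambda x, y, z: 0.5 * (3.0 * z * z - (x * x + y * y + z * z)),
    (2, "1c"): lambda x, y, z: SQRT3 * x * z,
    (2, "1s"): lambda x, y, z: SQRT3 * y * z,
    (2, "2c"): lambda x, y, z: 0.5 * SQRT3 * (x * x - y * y),
    (2, "2s"): lambda x, y, z: SQRT3 * x * y,
}

def spherical_moments(charges):
    """Discrete analogue of Eq. 7.3: Q_{l,kappa} = sum_i q_i R_{l,kappa}(r_i)."""
    return {key: sum(q * f(x, y, z) for q, x, y, z in charges)
            for key, f in SOLID_HARMONICS.items()}

def cartesian_quadrupole(charges):
    """Traceless Cartesian quadrupole (assumed convention, see lead-in)."""
    theta = [[0.0] * 3 for _ in range(3)]
    for q, x, y, z in charges:
        r = (x, y, z)
        r2 = x * x + y * y + z * z
        for a in range(3):
            for b in range(3):
                delta = 1.0 if a == b else 0.0
                theta[a][b] += q * (1.5 * r[a] * r[b] - 0.5 * r2 * delta)
    return theta

# Hypothetical point-charge model: (charge, x, y, z) in arbitrary units.
charges = [(-0.8, 0.0, 0.0, 0.0), (0.4, 0.8, 0.1, 0.6), (0.4, -0.7, 0.2, 0.5)]
Q = spherical_moments(charges)
theta = cartesian_quadrupole(charges)
```

Comparing, for instance, `Q[(2, "1c")]` with `2 / SQRT3 * theta[0][2]` gives a quick consistency check of the last column of Table 7.1.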
An explicit development of Eq. 7.2 in terms of the Cartesian
coordinates from Table 7.1 yields

$$4\pi\varepsilon_0\,\Phi(\mathbf{r}) = \frac{q}{R} + \frac{\mu_\alpha R_\alpha}{R^3} + \frac{1}{3}\Theta_{\alpha\beta}\,\frac{3R_\alpha R_\beta - R^2\delta_{\alpha\beta}}{R^5} + \cdots, \tag{7.4}$$

$$\Phi(\mathbf{r}) = qT - \mu_\alpha T_\alpha + \frac{1}{3}\Theta_{\alpha\beta} T_{\alpha\beta} + \cdots, \tag{7.5}$$

where $1/R \equiv 1/|\mathbf{r}-\mathbf{r}'|$, the Einstein summation convention is
applied, and the Kronecker delta, $\delta_{\alpha\beta}$, is 1 if $\alpha = \beta$ and 0 otherwise.
The total ESP can be partitioned into a sum of multipolar potentials
$\Phi_l$ (e.g., $\Phi_0$ is the monopolar potential), leading to the concept of
a “distributed multipole” expansion. Equation 7.5 provides a more
compact notation in terms of the $T$ tensors describing the geometry
of the multipolar potential. A simple Taylor expansion of the original
formulation of the ESP (i.e., Eq. 7.1) shows that the $T$ tensors
correspond to the various partial derivatives of 1/R.
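
To make Eq. 7.4 concrete, the Python sketch below compares the monopole, dipole, and quadrupole terms with the exact potential of a small set of point charges. It is only an illustration: the water-like geometry and the function names are hypothetical, units are chosen such that $4\pi\varepsilon_0 = 1$, and the traceless quadrupole is assumed to follow $\Theta_{\alpha\beta} = \sum_i q_i(\tfrac{3}{2} r_\alpha r_\beta - \tfrac{1}{2} r^2\delta_{\alpha\beta})$.

```python
import math

def multipole_moments(charges):
    """Total charge, dipole vector and traceless quadrupole about the origin."""
    q = sum(c[0] for c in charges)
    mu = [sum(c[0] * c[1 + a] for c in charges) for a in range(3)]
    theta = [[0.0] * 3 for _ in range(3)]
    for qi, *r in charges:
        r2 = sum(x * x for x in r)
        for a in range(3):
            for b in range(3):
                theta[a][b] += qi * (1.5 * r[a] * r[b] - (0.5 * r2 if a == b else 0.0))
    return q, mu, theta

def esp_terms(q, mu, theta, R):
    """Monopole, dipole and quadrupole terms of Eq. 7.4 at point R (4*pi*eps0 = 1)."""
    R2 = sum(x * x for x in R)
    Rn = math.sqrt(R2)
    phi0 = q / Rn
    phi1 = sum(mu[a] * R[a] for a in range(3)) / Rn**3
    phi2 = sum(theta[a][b] * (3.0 * R[a] * R[b] - (R2 if a == b else 0.0))
               for a in range(3) for b in range(3)) / (3.0 * Rn**5)
    return phi0, phi1, phi2

def esp_exact(charges, R):
    """Exact Coulomb potential of the point charges at R (4*pi*eps0 = 1)."""
    total = 0.0
    for qi, *r in charges:
        total += qi / math.sqrt(sum((R[k] - r[k]) ** 2 for k in range(3)))
    return total

# Hypothetical neutral, water-like arrangement: (charge, x, y, z).
charges = [(-0.8, 0.0, 0.0, 0.0), (0.4, 0.8, 0.0, 0.6), (0.4, -0.8, 0.0, 0.6)]
q, mu, theta = multipole_moments(charges)

along_x = (10.0, 0.0, 0.0)   # here the dipole term vanishes
along_z = (0.0, 0.0, 10.0)   # here the dipole term dominates
approx_x = sum(esp_terms(q, mu, theta, along_x))
approx_z = sum(esp_terms(q, mu, theta, along_z))
```

For this geometry the dipole points along z, so the potential along x is carried almost entirely by the quadrupole term, which is the kind of anisotropy a point charge at the origin cannot represent.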
In terms of running a molecular dynamics (MD) simulation, the
quantity of interest is the interaction potential, $U$. This quantity is
defined by the work done on an MTP $Q_{l\kappa}$ brought from infinity to a
point $\mathbf{r}$ in a region populated by the (multipolar) potential $\Phi$, $U =
Q_{l\kappa}\Phi$ (derived from first-order perturbation theory [30, 31]). Thus,
the interaction energy between sites (e.g., atoms, molecules) a and b
can be written as

$$U^{ab} = \left(q^a T - \mu^a_\alpha T_\alpha + \frac{1}{3}\Theta^a_{\alpha\beta} T_{\alpha\beta} + \cdots\right)\left(q^b + \mu^b_\alpha + \frac{1}{3}\Theta^b_{\alpha\beta} + \cdots\right), \tag{7.6}$$

where the superscripts a and b over the MTP parameters refer to
the interaction site (usually an atom) they belong to. Evidently, a
truncation of the MTP expansions to $l = 0$ reduces to the familiar
Coulomb interaction, $U^{ab} = q^a q^b / 4\pi\varepsilon_0 R$. In general, the interaction
energy can be compactly written as $U^{ab} = (Q^a)^{\mathsf{T}}\, T^{ab}\, Q^b$, where $Q^a$ is
a vector containing all MTP moments of site a and $T^{ab}$ forms a matrix
of $T$ tensors, as elegantly presented in the AMOEBA implementation
[32].
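
The compact form $U^{ab} = (Q^a)^{\mathsf{T}} T^{ab} Q^b$ can be sketched in a few lines of Python when the expansion is truncated at the dipole, so that each site carries the four moments $(q, \mu_x, \mu_y, \mu_z)$. The ordering of the moments, the sign convention (R points from site a to site b), the units ($4\pi\varepsilon_0 = 1$), and the function names are assumptions of this example and are not taken from the AMOEBA implementation.

```python
import math

def interaction_matrix(R):
    """4x4 matrix T^{ab} for moments ordered (q, mu_x, mu_y, mu_z).

    R is the vector from site a to site b; units with 4*pi*eps0 = 1.
    """
    r2 = sum(c * c for c in R)
    r = math.sqrt(r2)
    T = [[0.0] * 4 for _ in range(4)]
    T[0][0] = 1.0 / r                        # charge-charge
    for a in range(3):
        T[a + 1][0] = R[a] / r**3            # dipole on a, charge on b
        T[0][a + 1] = -R[a] / r**3           # charge on a, dipole on b
        for b in range(3):
            delta = 1.0 if a == b else 0.0
            T[a + 1][b + 1] = (delta * r2 - 3.0 * R[a] * R[b]) / r**5  # dipole-dipole
    return T

def interaction_energy(Qa, Qb, R):
    """U^{ab} = (Q^a)^T T^{ab} Q^b, written as an explicit double sum."""
    T = interaction_matrix(R)
    return sum(Qa[t] * T[t][u] * Qb[u] for t in range(4) for u in range(4))

# Familiar limits, for sites separated by 2 length units along z:
R = (0.0, 0.0, 2.0)
u_coulomb = interaction_energy([1.0, 0, 0, 0], [-1.0, 0, 0, 0], R)       # q_a q_b / R
u_dipole_charge = interaction_energy([0, 0, 0, 1.0], [1.0, 0, 0, 0], R)  # mu q / R^2
u_head_to_tail = interaction_energy([0, 0, 0, 1.0], [0, 0, 0, 1.0], R)   # -2 mu mu / R^3
```

Extending the moment vectors to include the five quadrupole components would enlarge the matrix to 9 × 9 without changing the structure of the double sum.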
For a given interaction between two MTP moments $Q^a_t$ and
$Q^b_u$ on sites a and b, respectively, the tensor element describing
the geometry as $T^{ab}_{tu}(\mathbf{q})$ is required, where $\mathbf{q}$ forms a set of basis
coordinates (vide infra). In general, the interaction will give rise to
both forces and torques, where the $K$th component of the force and
torque are expressed in the form

$$F^a_K(\mathbf{q}) = -Q^a_t Q^b_u\,\frac{\partial}{\partial A_K}\,T^{ab}_{tu}(\mathbf{q}), \tag{7.7}$$

$$G^a_K(\mathbf{q}) = -Q^a_t Q^b_u\,\frac{\partial}{\partial \theta^a_K}\,T^{ab}_{tu}(\mathbf{q}), \tag{7.8}$$

and $A_K$ and $\theta^a_K$ correspond to the translational and rotational (i.e.,
Euler angles) coordinates of the rigid body at site a, respectively. A
detailed account of the $T^{ab}_{tu}(\mathbf{q})$ elements, as well as forces and torque
expressions, can be found in previous work [31, 33–37]. Figure 7.2
shows a cartoon representation of a torque acting on an atom and a
possible (i.e., not unique) way to propagate it in terms of forces on
neighboring atoms.

Figure 7.2 Cartoon representation of the torque $G^a$ applied on MTP site a
(colored in orange). The torque can be propagated by applying forces onto
the neighboring atoms $a_1$, $a_2$, and $a_3$, leading to the forces $F^{a_1}$, $F^{a_2}$, and $F^{a_3}$
[38]. The arrows shown on the rightmost hydrogen depict its local reference
axis system, as defined from Ref. [39]. Adapted with permission from T.
Bereau, C. Kramer, M. Meuwly, J. Chem. Theory Comp. 9(12), 5450 (2013).
Copyright (2013) American Chemical Society.

From Table 7.1, one realizes that placing MTPs up to quadrupoles
on a given site will yield nine independent parameters in spherical
coordinates (i.e., one for the monopole, three for the dipole, and
five for the traceless second-rank tensor). However, the main
computational hurdle in MD simulations is the force calculation.
Although Eq. 7.6 refers to the pairwise interaction potential, it shows
that the associated force (and of course energy) will consist of n × n
independent terms, where n is the number of MTP coefficients.
As such, the interaction between two MTP sites, described up to
quadrupole, will involve 9 × 9 = 81 terms—to be put in perspective
with the single term prescribed by the Coulomb interaction in
standard PC force fields. This certainly provides one major reason
why MTP force fields have not become routine in the MD community.
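
The counting argument can be written out in a couple of lines of Python. The function names are hypothetical; the only inputs are the 2l + 1 components per rank l and the n × n pairing of components between two sites.

```python
def n_moments(lmax):
    """Number of independent spherical MTP components up to rank lmax."""
    return sum(2 * l + 1 for l in range(lmax + 1))

def n_pair_terms(lmax):
    """Number of terms in the interaction between two sites truncated at lmax."""
    return n_moments(lmax) ** 2

# Rank-by-rank cost of a single site-site interaction.
cost_table = {lmax: (n_moments(lmax), n_pair_terms(lmax)) for lmax in range(4)}
```

For lmax = 0, 1, and 2 this gives 1, 16, and 81 terms per pair of sites, respectively.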
Most equations so far included an infinite collection of terms: a
distributed MTP expansion without truncation. Formally, the infinite
sum in Eq. 7.4 is capable of reproducing the potential with arbitrary
accuracy, provided that the observation point, r, is located far enough
from the molecule (recall that the above-mentioned expansion requires
r′/r < 1, a direct consequence of the convergence properties
of 1/R). In practice, the expansion is often truncated at low order
(e.g., l ≤ 2) in existing MTP implementations (e.g., [32, 38, 40]). This
is naturally constrained by the computational investment involved.
On the other hand, numerous studies have pointed to the rapidly
diminishing improvement provided by every additional MTP
order [39, 41–44], such that going far beyond l = 2 may, in fact,
be difficult to justify.
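
The role of the ratio r′/r can be illustrated with the simplest possible case, sketched below in Python under stated assumptions: a single charge q displaced by a distance a from the origin along z, observed on the same axis at distance R. For this hypothetical setup the moments are $Q_{l0} = q a^l$ and the on-axis angular factor equals 1, so the truncated expansion is a geometric series whose relative error is $(a/R)^{l_{\max}+1}$ for R > a. Units are chosen such that $4\pi\varepsilon_0 = 1$.

```python
def on_axis_partial_sum(a, R, lmax, q=1.0):
    """Multipole expansion truncated at rank lmax, evaluated on the z axis.

    The charge q sits at z = a, so Q_l0 = q * a**l, and each rank
    contributes Q_l0 / R**(l + 1) at the on-axis observation point z = R.
    """
    return sum(q * a**l / R ** (l + 1) for l in range(lmax + 1))

def on_axis_exact(a, R, q=1.0):
    """Exact Coulomb potential of the displaced charge at z = R."""
    return q / abs(R - a)

def relative_error(a, R, lmax):
    exact = on_axis_exact(a, R)
    return abs(on_axis_partial_sum(a, R, lmax) - exact) / exact

# Truncation at the quadrupole (lmax = 2) for different a/R ratios.
err_close = relative_error(1.0, 2.0, 2)     # a/R = 0.5
err_far = relative_error(1.0, 10.0, 2)      # a/R = 0.1
# Inside the sphere (a/R > 1) adding ranks makes the result worse.
err_inside_l2 = relative_error(1.0, 0.5, 2)
err_inside_l4 = relative_error(1.0, 0.5, 4)
```

In this toy case the quadrupole-level truncation is off by 12.5% at a/R = 0.5 but only 0.1% at a/R = 0.1, while inside the sphere the series diverges.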

7.2.2 Reference Axis Systems and Symmetries


Unlike monopoles, all higher MTPs are intrinsically anisotropic (see
Fig. 7.1). Their orientation must be defined with respect to an axis
system. Evidently, the choice of an axis system has no influence
on the physics of the system and may thus appear to be purely a
matter of convenience. However, not all axis systems are equally
advantageous because, depending on the symmetry of the molecule, the
number of nonzero MTPs differs. Beyond its aesthetic appeal, it
stands to reason that an axis system that minimizes the number of
nonzero MTPs should help reduce the number of interactions between
two MTP sites (see Eq. 7.6). A smaller
number of concurrent MTP interactions may also help stabilize the
torque propagation, though this remains very much unclear at the
moment.
Following the notation of Stone [31], a local axis system $\{\mathbf{w}^a\} =
\{\mathbf{x}^a, \mathbf{y}^a, \mathbf{z}^a\}$ is defined for site a, and likewise $\{\mathbf{w}^b\}$ for site b.
These sets, combined with the intersite distance $R$ and the intersite
unit vector between a and b, $\hat{\mathbf{R}}$, define the so-called
direction cosines $\mathbf{q} = \{R, \mathbf{w}^a \cdot \hat{\mathbf{R}}, \mathbf{w}^b \cdot \hat{\mathbf{R}}, \mathbf{w}^a \cdot \mathbf{w}^b\}$, which are the
basis for the computation of all MTP interactions. It is important
to mention that $\hat{\mathbf{R}}$, $\mathbf{w}^a$, and $\mathbf{w}^b$ can be represented in any of the
three coordinate systems, the global frame and the two local frames,
through suitable (linear) transformations. Computing an interaction
between two sites can be done following two different strategies:
(i) rotating the two MTP coefficients into the global frame, thereby
aligning $\mathbf{w}^a$ and $\mathbf{w}^b$ with $\hat{\mathbf{R}}$, or (ii) expressing the local axis systems
of the two sites in the basis of the global frame.
As an example, the dipole-charge interaction energy between
the α−component of a dipole moment Qa1α on site a and a charge
Qb00 on site b is considered. From Eq. 7.6, one can write the energy
ab
as U 1α00 = Qa1α Qb00 R −2 (waα · R̂). The expressions for the forces
and torques are detailed elsewhere [31, 38, 45]. The advantage
of rotating the MTP coefficients into the global frame is that the
geometric term waα · R̂ becomes trivial, simplifying greatly the
calculation. On the other hand, expressing the two sites’ local axis
systems in the basis of the global frame does not provide such
simplified expressions. This method does shine, however, when
dealing with MTP parameters that are zero: While rotating a zero
MTP coefficient into the global frame will most likely set it to
a nonzero value, computing the interaction with respect to the
local frame will keep its zero value, thereby eliminating the entire
interaction term.
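
The difference between the two strategies can be illustrated with a minimal Python sketch. It assumes atomic units and an intersite unit vector pointing from site a to site b; the function name `dipole_charge_energy` and all numerical values are hypothetical and chosen only for illustration. The loop evaluates the dipole-charge term quoted above component by component in the local frame and skips any coefficient that is exactly zero.

```python
import numpy as np

def dipole_charge_energy(dipole_local, axes_a, charge_b, pos_a, pos_b):
    """Dipole-charge energy with the dipole given in the local frame of site a.

    dipole_local : (3,) dipole components along the local axes x_a, y_a, z_a
    axes_a       : (3, 3) array whose rows are the local unit vectors,
                   expressed in the global frame
    Assumes atomic units and R_hat pointing from a to b (illustrative choices).
    """
    sep = np.asarray(pos_b, float) - np.asarray(pos_a, float)
    dist = np.linalg.norm(sep)
    r_hat = sep / dist
    energy = 0.0
    for alpha in range(3):
        if dipole_local[alpha] == 0.0:
            continue  # a zero local coefficient removes the whole term
        energy += dipole_local[alpha] * charge_b * (axes_a[alpha] @ r_hat) / dist**2
    return energy

# Hypothetical site whose only nonzero dipole component lies along local z.
axes = np.array([[0.0, 1.0, 0.0],   # local x along global y
                 [0.0, 0.0, 1.0],   # local y along global z
                 [1.0, 0.0, 0.0]])  # local z along global x
mu_local = np.array([0.0, 0.0, 1.0])
u_local = dipole_charge_energy(mu_local, axes, 1.0, [0, 0, 0], [2, 0, 0])

# Strategy (i): rotate the dipole into the global frame first.
mu_global = axes.T @ mu_local
u_global = (mu_global @ np.array([1.0, 0.0, 0.0])) * 1.0 / 2.0**2
```

Both routes give the same energy; the local-frame route simply never visits the two components that vanish by symmetry, whereas a rotated dipole generally has three nonzero global components.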
As such, using symmetry arguments to improve computational
efficiency requires the MTPs to be expressed in the local frame
during the computation. This method has proven successful in
dealing with small, rigid molecules, where one aligns the local
axis system with the molecule’s principal axes [33, 40–43, 46–51].
For instance, an MTP representation of a diatomic molecule up to
quadrupole only requires three terms: $q$, $\mu_z$, and $\Theta_{zz}$, where z is
directed along the axis of the molecule. More recent efforts have
focused on the symmetries provided by the immediate environment
of a given atomic site, extending the method to larger and flexible
molecules: first as simple geometric rules from neighboring atoms [32, 52–55].
Later, Kramer et al. introduced a systematic set of reference axis
systems contingent on the atom’s connectivity and chemical nature
[39] (see Fig. 7.2, for example), which was recently applied in MD
simulations [38, 56].

7.2.3 Fluctuating and Conformationally Dependent Multipoles

Up to now it was assumed that MTPs are static coefficients
which recreate a unique charge distribution irrespective of the
geometry of the molecule. This assumption may be challenged
at two distinct levels: (i ) (Thermal) fluctuations will invariably
distort the molecule, possibly altering the true charge distribution
in a significant way; and (ii ) the flexibility of certain groups—
e.g., rotatable bonds—may yield drastic differences in the atoms’
geometry [57]. The description of fluctuating, or conformationally
dependent, MTPs has been presented in detail by Elking et al.
[55]. MTP coefficients are expressed as a function of one or multiple
internal coordinates η (e.g., bond length, angle), $Q^a_{l\kappa}(\eta)$, in the site’s
local axis system. As a corollary, $\partial Q^a_{l\kappa}(\eta)/\partial\eta = 0$ would reduce
to the static MTPs considered so far. Using the chain rule, one can
incorporate fluctuating MTPs into the force and torque expressions
by adding derivatives of $Q^a_{l\kappa}(\eta)$

$$\frac{\partial}{\partial A_K} Q^a_{l\kappa} = \sum_{\eta} \frac{\partial Q^a_{l\kappa}}{\partial \eta} \frac{\partial \eta}{\partial A_K}, \tag{7.9}$$

$$\frac{\partial}{\partial \theta^a_K} Q^a_{l\kappa} = \sum_{\eta} \frac{\partial Q^a_{l\kappa}}{\partial \eta} \frac{\partial \eta}{\partial \theta^a_K}, \tag{7.10}$$

to Eqs. 7.7 and 7.8, respectively. In their development, Elking et al.
expressed the fluctuating MTPs using a truncated Taylor series

$$Q^a_{l\kappa}(\eta) = Q^a_{l\kappa}(\eta_0) + \sum_{\eta} (\eta - \eta_0) \frac{\partial Q^a_{l\kappa}}{\partial \eta_0}, \tag{7.11}$$

where $\eta_0$ represents the internal coordinates of the reference
structure (here, the optimized equilibrium geometry).
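
As a rough illustration of Eqs. 7.9 and 7.11, the following Python sketch evaluates a first-order fluctuating coefficient and its Cartesian gradient for a diatomic with a single internal coordinate, the bond length. The function names and all numerical values (reference coefficient, slope, geometry) are hypothetical and not taken from Elking et al.

```python
import numpy as np

def fluctuating_mtp(q_ref, dq_deta, eta, eta_ref):
    """First-order Taylor model of Eq. 7.11 for one MTP coefficient."""
    delta = np.asarray(eta, float) - np.asarray(eta_ref, float)
    return q_ref + float(np.dot(delta, dq_deta))

def mtp_position_gradient(dq_deta, deta_dpos):
    """Chain rule of Eq. 7.9: dQ/dA_K = sum_eta (dQ/deta) (deta/dA_K).

    deta_dpos : (n_eta, 3) derivatives of each internal coordinate with
                respect to the Cartesian position of atom A.
    """
    return np.asarray(dq_deta, float) @ np.asarray(deta_dpos, float)

# Hypothetical diatomic: one internal coordinate, the bond length d.
pos_a = np.array([0.0, 0.0, 0.0])
pos_b = np.array([0.0, 0.0, 1.3])
bond = pos_b - pos_a
d = np.linalg.norm(bond)
dd_dpos_a = (-bond / d)[None, :]  # d(d)/d(pos_a), shape (1, 3)

q_d = fluctuating_mtp(q_ref=0.10, dq_deta=[0.50], eta=[d], eta_ref=[1.1])
grad_a = mtp_position_gradient([0.50], dd_dpos_a)
```

Setting the slope `dq_deta` to zero recovers the static coefficient and a vanishing gradient, which is the static limit mentioned above.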
Due to the added complexity, fluctuating MTPs have been the
subject of only a few studies. Not only do they require increased
computational investment with respect to static MTPs, but such an
approach entails even more parameters that need to be fitted. The
simplest case where fluctuating MTPs may arise is in a diatomic
molecule: $Q^a_{l\kappa}(\eta)$ will be a simple linear function of $\eta = d$,
the interatomic distance. This was explicitly studied for free
and bound carbon monoxide (CO) in myoglobin; see also Fig. 7.3
[40, 48]. For larger systems, it was demonstrated that fluctuating
MTPs, unlike static MTPs, could accurately reproduce the ab initio
atomic forces of hydrogen-bonded dimers [55, 58]. Popelier and
coworkers have incorporated fluctuating MTPs for the accurate
characterization of small molecules, using a polarizable force field
parametrized from machine-learning techniques [59–61].
While the reproduction of fluctuating MTPs certainly helps
describe a charge distribution with high accuracy, the sheer
number of parameters can seem daunting: up to nine static MTP
coefficients (Table 7.1) as well as the coefficients along all internal
coordinates (Eq. 7.11) for each site. As a compromise, we point
out efforts targeted at fitting static MTPs to several conformations of
a molecule at once [39, 56] (MTP parametrization will be described
in more detail in Section 7.4). As a general rule, the accuracy
provided by MTP electrostatics comes at the cost of increased force-field
specificity, and thus lower transferability. This decrease in
transferability occurs not only between identical chemical moieties
on distinct molecules, but also across conformations of the same
compound.

Figure 7.3 Fluctuating molecular MTP moments for CO: (a) dipole, (b)
quadrupole, (c) octupole, (d) hexadecapole. The symbols represent different
MTP models, while the solid line depicts ab initio CCSD(T) values. See
Ref. [40] for more details. Reprinted from Biophys. J., 94, N. Plattner and
M. Meuwly, The role of higher CO-multipole moments in understanding the
dynamics of photodissociated carbonmonoxide in myoglobin, 2505–2515,
Copyright (2008), with permission from Elsevier.

7.3 Examples of MTP Implementations

In the following, a number of force fields and molecular mechanics
implementations that rely on MTP electrostatics are discussed. Two
main approaches are distinguished: discrete (i.e., point-like objects)
and Gaussian MTPs.

7.3.1 Discrete Multipoles


Most discrete MTP implementations are similar in many respects,
e.g., limited expansion up to order 2–4, spherical harmonic
description, interaction calculation in the atoms’ local frames. Hence,
what distinguishes these force fields and implementations from
each other is primarily how they treat the other interaction
terms. Most importantly, static multipoles only consist of a first-order
perturbation of the electrostatic operator. Describing second-order
effects leads to polarizability (the charge density’s ability
to respond to an external electric field), a critical aspect of
certain systems (e.g., dielectric changes) [62–64]. Here, possible
implementations are listed in order of increasing overall accuracy
(and thus of increasing computational investment and parametrization
effort). Given the heavy requirements of such refined force fields,
it is important to point out that “more is not always better,” and
each system of interest will call for a fine balance of accuracy and
statistical sampling.

Non-polarizable force fields. Static MTP electrostatic descriptions
with standard van der Waals interactions have been used extensively
to (i) probe how MTPs alone can improve force fields and (ii) provide
implementations that are efficient, both in terms of computational
speed and parametrization work. Efforts in this direction started
by incorporating a single MTP site on the molecule (e.g., at the center
of mass). Leslie extended the DL_POLY package to compute MTP
interactions [43], where all bonded interactions were fixed, and
MTPs were aligned along the molecule’s principal axes of inertia
and computed using Particle-Mesh Ewald. The implementation was
applied to study the liquid properties of water and hydrogen fluoride
[41, 42]. Meuwly and coworkers have more recently studied the
impact of molecular MTPs on small molecules, focusing mostly
on spectroscopic properties [40, 47], but also thermodynamic and
dynamical quantities (see Section 7.6) [49–51]. By placing MTPs
exclusively on a single molecule (e.g., ligand, solute), the computational
investment is reduced (see Section 7.5.3), yet the effects on
the environment can be significant. Extending this approach, a more
recent implementation provides an atomic-based MTP description
suited to long and flexible molecules [38]. MTPs improved the
thermodynamic properties of certain halogenated compounds by
better reproducing the σ hole of the molecule. In addition, the free-energy
calculation of a brominated ligand (with MTPs) and a protein
target showed minimal extra computer investment, compared to
standard PC simulations. By merely extending PC descriptions, these
simulations limit the parametrization effort to the MTPs themselves
(see Section 7.4) and the Lennard–Jones coefficients, which need to
be (re)adjusted against the new electrostatics.ᵃ

Polarizable force fields. Including polarizability with MTP
electrostatics provides an additional step forward in accuracy. Polarization
is often modeled by means of point-induced dipoles that are
iterated to self-consistency at each time step, though the Drude
oscillator, which uses two point charges joined by a spring
to model dipole induction, has proven a viable alternative [62–65].
The AMOEBA force field, developed by Ren and Ponder, is
an atomic multipole-based polarizable force field. It is based on
atomic static multipoles (up to quadrupoles) parametrized from a
distributed multipole analysis (DMA; see Section 7.4), an induced-dipole
representation, and pairwise additive van der Waals interactions
through a buffered 14-7 potential, originally developed
by Halgren [66]. A number of systems have been parametrized and
studied, such as water [53], ions [67–71], organic molecules [72–75],
proteins [32, 76], and protein-ligand binding [77–80]. AMOEBA
has also shown promise in refining X-ray
crystallographic data [81, 82].
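
The iterative convergence of point-induced dipoles mentioned above can be sketched in a few lines of Python. This is only a schematic under stated assumptions: isotropic point polarizabilities, atomic units, a fixed external field at each site, and none of the short-range damping that production polarizable force fields typically apply. The function names and the two-site test system are hypothetical.

```python
import numpy as np

def dipole_tensor(r_ij):
    """Field at site i per unit dipole at site j; r_ij = pos_i - pos_j."""
    r = np.linalg.norm(r_ij)
    r_hat = r_ij / r
    return (3.0 * np.outer(r_hat, r_hat) - np.eye(3)) / r**3

def induced_dipoles(positions, alphas, field0, tol=1e-10, max_iter=200):
    """Iterate mu_i = alpha_i (E0_i + sum_j T_ij mu_j) to self-consistency."""
    positions = np.asarray(positions, float)
    field0 = np.asarray(field0, float)
    alphas = np.asarray(alphas, float)
    n = len(positions)
    mu = alphas[:, None] * field0  # zeroth-order guess: no mutual induction
    for _ in range(max_iter):
        field = field0.copy()
        for i in range(n):
            for j in range(n):
                if i != j:
                    field[i] += dipole_tensor(positions[i] - positions[j]) @ mu[j]
        mu_new = alphas[:, None] * field
        if np.max(np.abs(mu_new - mu)) < tol:
            return mu_new
        mu = mu_new
    raise RuntimeError("induced dipoles did not converge")

# Two hypothetical sites on the z axis in a uniform field along z.
mu = induced_dipoles(positions=[[0, 0, 0], [0, 0, 3.0]],
                     alphas=[1.0, 1.0],
                     field0=[[0, 0, 1.0], [0, 0, 1.0]])
```

For this symmetric pair the self-consistent solution can be checked by hand: each dipole satisfies mu_z = alpha (E0 + 2 mu_z / r^3), i.e., mu_z = 1 / (1 - 2/27) = 1.08.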

Accurate energy-decomposition schemes. Other force fields build on
both atomic multipoles and polarizability to provide an accurate
decomposition of intermolecular energies. The sum of interactions
between fragments ab initio computed (SIBFA) [83, 84] decomposes the
interaction energy, $E_{\mathrm{int}}$, into a number of terms

$$E_{\mathrm{int}} = E_{\mathrm{MTP}} + E_{\mathrm{rep}} + E_{\mathrm{pol}} + E_{\mathrm{ct}} + E_{\mathrm{disp}} \; (+ E_{\mathrm{LF}}), \tag{7.12}$$

where $E_{\mathrm{MTP}}$ refers to the short-range, penetration-corrected MTP
energy, $E_{\mathrm{rep}}$ is the short-range repulsion, $E_{\mathrm{pol}}$ refers to polarization,
$E_{\mathrm{ct}}$ is the charge-transfer energy, $E_{\mathrm{disp}}$ the dispersion contribution,
and $E_{\mathrm{LF}}$ describes the ligand-field correction (see Ref. [85] for more
details). SIBFA is parametrized entirely from ab initio data, and uses
a fragment-based approach. The MTPs are parametrized using DMA
and include short-range corrections to account for the exponential
decay of the ab initio integrals at short range. One specificity of the
MTP description of SIBFA is the presence of MTP sites not only on
atoms, but also on bond midpoints, reminiscent of the “bond functions”
used by Tao and Klemperer in electronic structure calculations [86].
SIBFA has been used to compute intermolecular energies
for a variety of systems [87–94]. Currently, no MD implementation
is available and the computational effort in using SIBFA has, so far,
limited applications to small systems.

ᵃ In general, any change in the electrostatics will also affect the dihedrals. This
effect may be important for molecules with rotatable bonds but is, so far, largely
unexplored.

7.3.2 Gaussian Multipoles


One of the limitations of the above-mentioned MTP description is its
inability to describe intermolecular energies in a meaningful fashion
when the charge distributions overlap, the so-called penetration
error [31]. Damping functions can provide a correction by smoothly
tapering the interactions at short distances [95–100]. Alternatively,
modeling the electron density itself has shown promise in better
describing intermolecular energies at short distances (see, e.g., the
pioneering work of Sokalski and Poirier [101]). Instead of (on-site)
discrete PC or MTP coefficients, the charge density can be
decomposed in terms of a linear combination of Gaussian functions
[102, 103]. Representing the charge distribution by a set of Gaussian
functions on each atom both significantly decreases the
penetration error [84] and reduces to point MTPs at large distances
[102]. Later, the Gaussian multipole model (GMM) represented the
charge density from Slater-type contracted Gaussian functions, of
the form exp(−λr), on each atom. The charge density ρ evaluated
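at point r and nuclear center R is given by [58]

$$\rho(\mathbf{r}, \mathbf{R}) = \sum_{l=0}^{l_{\max}} \sum_{\kappa} \frac{Q_{l\kappa}\, R^{*}_{l\kappa}(\mathbf{r} - \mathbf{R})}{(2l - 1)!!}\, \rho_l(|\mathbf{r} - \mathbf{R}|; \alpha_\mu), \tag{7.13}$$

where $R_{l\kappa}(\mathbf{r}) = r^l \sqrt{4\pi/(2l + 1)}\, Y_{l\kappa}(\theta, \phi)$ are so-called solid
harmonics (Table 7.1 provides coefficients up to l = 2), and μ runs over
the degree of contraction, $N_c$ (see below). $\rho_l$ is expressed from a
Gaussian charge density

$$\rho_l(r; \alpha_\mu) = \left(-\frac{1}{r} \frac{d}{dr}\right)^{l} \sum_{\mu=1}^{N_c} c_\mu \left(\frac{\alpha_\mu^2}{\pi}\right)^{3/2} \exp(-\alpha_\mu^2 r^2). \tag{7.14}$$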
In their implementation, Elking et al. modeled the charge density
by means of Gaussian functions and valence nuclear charges [58].
The electrostatics were parametrized by fitting both the Gaussian
MTP moments, Qlκ , and the Slater-type exponent parameter λ of
each atom to the ESP around the molecule. The resulting model
provides accurate dimer energies, intermolecular density overlap
integrals (i.e., quantifying penetration effects), and permanent
molecular MTP moments, when compared with ab initio data.
Overall, Gaussian MTPs provide a compromise in terms of both
accuracy and computational performance between discrete MTP
expansions and ab initio calculations [103]. An inclusion of Gaussian
multipoles into the AMOEBA force field has recently been presented
[104].

7.4 Parametrization of MTPs

This section describes two methods to fit static MTP coefficients:
the distributed multipole analysis (DMA) and ESP-based methods.
Empirical methods, which rely on experimental measurements of
dipole (e.g., Stark effect [105, 106]) or quadrupole (e.g., Buckingham
cylinder [107]) moments, are limited to the molecular MTPs and
difficult to break down in terms of MTP coefficients [31]; they are
not covered here. Also, parametrization issues for fluctuating MTPs
are not discussed. For this, the reader is referred to the relevant
literature (see Section 7.2.3).

7.4.1 Distributed Multipole Analysis


The distributed multipole analysis (DMA) embodies a wavefunction
partitioning method pioneered by Stone and coworkers [31, 108,
109], similar to a number of procedures developed by others [35,
101, 110]. DMA relies on the expansion of the charge density in
terms of products of basis wavefunctions

$$\rho(\mathbf{r}) = \sum_{st} P_{st}\, \phi_s(\mathbf{r} - \mathbf{p}_s)\, \phi_t(\mathbf{r} - \mathbf{p}_t), \tag{7.15}$$

where $P_{st}$ refers to the density matrix. It is customary to write
wavefunctions in terms of Gaussian functions

$$\phi = R_{l\kappa}(\mathbf{r} - \mathbf{a})\, e^{-\zeta (\mathbf{r} - \mathbf{a})^2}, \tag{7.16}$$

where φ is centered around site a (e.g., a nucleus), $R_{l\kappa}$ is the Cartesian
representation of the angular dependence of the wavefunction,
$x^k y^l z^m$ (where k + l + m characterizes the angular momentum
of the particular orbital), and ζ describes the decay coefficient of
the Gaussian. When dealing with Gaussian basis functions, Eq. 7.15
reduces to the sum of single Gaussians (rather than products)
centered around an intermediate point $p_i$ between $p_s$ and $p_t$, its
exact location being determined by the decay coefficients of the
two exponentials [111]. The associated solid harmonics $R_{l\kappa}$ and
$R_{l'\kappa'}$ are both first translated to $p_i$ using an addition theorem [112],
providing a linear combination of solid harmonics of ranks up to $l$
and $l'$, respectively. A given product of two solid harmonics is then
expressed as a linear combination of solid harmonics using Clebsch–Gordan
coefficients [112]. Finally, we can express the charge density
as a linear combination of terms resembling the functional form of
φ (Eq. 7.16) of ranks from 0 to $l + l'$. By orthogonality of the solid
harmonics, each term $R_{kq}(\mathbf{r} - \mathbf{p}) \exp(-\zeta (\mathbf{r} - \mathbf{p})^2)$ will generate an
MTP moment $Q_{kq}$ at site p (Eq. 7.3).
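
The statement that a product of two Gaussians collapses to a single Gaussian at an intermediate point is the Gaussian product theorem. A minimal Python sketch for the s-type (l = 0) case follows; the function name and the numerical exponents and centers are hypothetical, and the angular factors handled by the addition theorem are deliberately left out.

```python
import numpy as np

def gaussian_product(zeta_s, p_s, zeta_t, p_t):
    """Product of exp(-zeta_s |r - p_s|^2) and exp(-zeta_t |r - p_t|^2).

    Returns (prefactor, exponent, center) of the single resulting Gaussian
    prefactor * exp(-exponent |r - center|^2).
    """
    p_s = np.asarray(p_s, float)
    p_t = np.asarray(p_t, float)
    zeta = zeta_s + zeta_t
    center = (zeta_s * p_s + zeta_t * p_t) / zeta  # exponent-weighted mean
    sep2 = np.dot(p_s - p_t, p_s - p_t)
    prefactor = np.exp(-zeta_s * zeta_t / zeta * sep2)
    return prefactor, zeta, center

# Hypothetical primitives on two centers.
zs, ps = 1.2, np.array([0.0, 0.0, 0.0])
zt, pt = 0.5, np.array([0.0, 0.0, 1.4])
k, z, p = gaussian_product(zs, ps, zt, pt)

# Compare both sides at an arbitrary point.
r = np.array([0.3, -0.2, 0.7])
lhs = np.exp(-zs * np.dot(r - ps, r - ps)) * np.exp(-zt * np.dot(r - pt, r - pt))
rhs = k * np.exp(-z * np.dot(r - p, r - p))
```

The center lies closer to the primitive with the larger exponent, which is the sense in which its location is determined by the decay coefficients.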
We point out to the reader that DMA provides an expansion,
rather than a fit, of the charge density. As such, the individual
MTP coefficients derived from DMA will not depend on the order
of the expansion. Despite its overwhelming use in the field, the
method suffers from being tied to a single conformation: Koch et
al. showed that MTPs can be highly conformation dependent [57]
and may thus show a lack of transferability across the distribution
of conformations. This limitation has motivated the development of
ESP-based fitting methods (Section 7.4.2).

7.4.2 ESP-Based Fitting Methods


More recently, efforts have been made to parametrize MTP
coefficients with respect to the ESP itself [32, 39, 58, 113–115].
From Eq. 7.4, one readily observes that the ESP depends linearly
on each MTP coefficient. Optimizing MTP coefficients to best
reproduce the ab initio ESP can thus be achieved with a simple
linear least-squares fit over a number of discrete points $\mathbf{r}^{(p)}$ around
the molecule. We thus express the target function

$$\chi^2 = \min \sum_{p} \left[\Phi_{\mathrm{ai}}(\mathbf{r}^{(p)}) - \Phi_{\mathrm{MTP}}(\mathbf{r}^{(p)})\right]^2, \tag{7.17}$$

where the sum runs over a select list of discrete points, and $\Phi_{\mathrm{ai}}$ and
$\Phi_{\mathrm{MTP}}$ represent the value of the ESP generated by the ab initio
calculation and the MTP coefficients, respectively. The linearity of the
problem allows us to cast the minimization of $\chi^2$ into the form Xb = y,
where the matrix X represents all geometrical terms (i.e., the T tensors
in Eq. 7.5) sampled on every grid point, the vector b contains all MTP
coefficients, and the vector y is the collection of ab initio ESP values
at every grid point. Moreover, a number of penalty functions may be
added to provide additional features to the fit, e.g., enforce the
molecule’s net charge, damp the magnitude of higher MTP coefficients,
or constrain a number of MTP coefficients to specific values. More
details can be found in Ángyán et al. [113] and Kramer et al. [115].
Elking et al. have also used an ESP-based (non-linear) fitting method
to parametrize Gaussian multipoles (see Section 7.3.2) [58].
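
A minimal sketch of such a linear fit is given below, assuming atomic units, Cartesian dipoles expressed directly in the global frame, and an expansion truncated at the dipole level for brevity. The function names, the synthetic "reference" ESP, and the way the net-charge penalty is added as one heavily weighted extra row are illustrative assumptions, not the procedure of Refs. [113] or [115].

```python
import numpy as np

def design_matrix(grid, sites):
    """Geometric matrix X: per site, one monopole and three dipole columns."""
    cols = []
    for s in sites:
        d = grid - s                      # (n_grid, 3) site-to-point vectors
        dist = np.linalg.norm(d, axis=1)
        cols.append(1.0 / dist)           # charge: 1/R
        for alpha in range(3):
            cols.append(d[:, alpha] / dist**3)  # dipole: R_alpha / R^3
    return np.column_stack(cols)

def fit_mtp(grid, sites, esp, net_charge=None, weight=1e3):
    """Least-squares solution of X b = y, with an optional charge penalty."""
    X = design_matrix(grid, sites)
    y = np.asarray(esp, float)
    if net_charge is not None:
        row = np.zeros(X.shape[1])
        row[0::4] = 1.0                   # picks out the monopole columns
        X = np.vstack([X, weight * row])
        y = np.append(y, weight * net_charge)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

# Synthetic test: two hypothetical sites and a shell of grid points.
rng = np.random.default_rng(0)
sites = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.5]])
directions = rng.normal(size=(200, 3))
directions /= np.linalg.norm(directions, axis=1)[:, None]
grid = directions * rng.uniform(3.0, 5.0, size=200)[:, None]

b_true = np.array([0.3, 0.0, 0.0, 0.1, -0.3, 0.0, 0.0, -0.05])
esp_ref = design_matrix(grid, sites) @ b_true   # stands in for ab initio data
b_fit = fit_mtp(grid, sites, esp_ref, net_charge=0.0)
```

Fitting several conformations or molecules at once amounts to stacking their design matrices and ESP vectors before calling the solver.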
Fitting electrostatic parameters, and especially point charges,
to the ESP is a comparatively old idea [116–118]. There are two main
features associated with it:

• The least-squares nature of the method provides flexibility
in terms of fitting different molecules and/or conformations
at once. This addresses the aforementioned issue of high
conformational dependence by averaging over distinct
conformers [39].

• Such ESP-based methods all suffer from undersampling
problems around buried atoms (e.g., carbon of a methyl
group): Since one typically only retains grid points in a
close volume around the molecule, where intermolecular
interactions are the strongest, certain atoms may have
few grid points in their immediate neighborhood. As
a result, large variations in the coefficients of those atoms
may affect the quality of the fit by only a marginal amount.
Methods that alleviate this issue include placing restraints on the
coefficients [118] and grouping atoms into atom types to
both reduce the number of fitting parameters and increase
the sampling for each atom [56, 115].

Together, the combination of atom types, which make use of added
information between different atoms of similar chemical nature,
and fits over distinct conformations and molecules can help
strengthen the MTP coefficients. The benefits of averaging over both
conformations and molecules were demonstrated in References [39]
and [115].
Finally, we note that the above-mentioned penalties that can be
included in the linear least-squares fit can provide the means to
restrain the new MTP coefficients around an existing PC force field
in a controlled way. While the new monopoles will remain close to
the original PC values, the higher MTPs will typically be of limited
intensity, thereby generating a new set of coefficients that is akin to
a perturbation of the original PC force field [56].

7.5 Molecular Simulations with MTPs

Including MTPs in a molecular simulation brings its own set of
features and caveats. In the following, we highlight a number of
points one may wish to keep in mind when working with MTPs.

7.5.1 Energy Conservation


Energy conservation is a basic requirement for any Hamiltonian-based
description of a molecular system without explicit time
dependence. The incorporation of MTP electrostatics is no
exception: Numerical integration of the force and torque equations
7.7 and 7.8 (derivatives of the interaction energy, Eq. 7.6)
leads in principle to strict energy conservation. In practice, the
torques arising from the anisotropy of the interactions may prove
problematic: Torques are commonly converted into pairwise forces
by means of a rigid-body approximation, where a collection of atoms
is both translated and rotated around a given axis. Such a scheme
naturally precludes the use of flexible intramolecular interactions,
such as harmonic bonds and angles, and instead requires strictly
rigid molecules. While the use of bond-constraint algorithms (e.g.,
SHAKE [119]) can effectively alleviate this issue for molecules with
two to three atoms, it becomes less clear how to run MD simulations
of larger compounds with such a requirement. This very issue was
raised in Bereau et al. [38]. The DL_MULTI implementation, which
relied on molecular MTPs and rigid molecules, showed apparent
energy conservation [43].ᵃ Sagui et al.’s implementation showed
a small energy drift when placing atomic MTPs on a rigid water
model [54], though they showed that the treatment of long-range
electrostatics (see Section 7.5.2) was responsible for this effect.

ᵃ The energy range of Fig. 1 in Reference [43] only provides resolution up to
∼1 kcal/mol.

7.5.2 Long-Range Electrostatics

Coulomb interactions show critically poor convergence properties
as a function of distance (i.e., 1/r interactions). Interaction cutoffs
have proven prone to artifacts, which motivated the development
of long-range electrostatic methods, such as Ewald summation
(see, e.g., Reference [25] and references therein). A number of
Ewald summation methods have been extended to MTPs (e.g., [43,
54, 120]), providing a rigorous treatment of electrostatics in MD
simulations.

7.5.3 Performance Issues

The improved electrostatic description MTPs provide comes at a
cost: performance. As mentioned in Section 7.2.1, the number of
interactions scales critically with the order of the MTP expansion.
To this end, we outline a number of techniques to reduce the
computational burden, some of which have been devised in previous
work:

Reduce the number of MTP sites. The expansion of the ESP in MTPs
need not be uniform. Some sites may require higher MTPs (e.g.,
polar groups) whereas for other sites a lower order or even a PC
description (e.g., apolar hydrogens) may suffice. The many successes
of PC force fields demonstrate that they can satisfactorily describe
a range of chemical properties, where higher MTPs may provide
little or no improvement. Beyond the appropriate description of a
single molecule, a given physico-chemical process may not uniformly
depend on all molecules of the system. For instance, the structure of
the solvent around a solute and protein-ligand binding are strongly
localized events that require a large environment only to alleviate
finite-size artifacts and ensure proper thermodynamic conditions.
Setting MTPs on the solute/ligand molecule only can provide
a hybrid resolution, akin to a quantum-mechanics/molecular-mechanics
approach (QM/MM), assuming the two electrostatic
descriptions are compatible.ᵃ The advantage of such an approach is
that the MTP region does not scale with the system’s size, so that
the extra computational investment to compute MTP
interactions becomes negligible for larger simulations that include
more particles and interaction sites. Free-energy calculations of a
protein-ligand binding event (with MTPs on the ligand alone)
showed only a 20% increase in computer time compared to the PC
case [38].

ᵃ The interaction of one MTP molecule and one PC molecule not only involves their
electrostatic coefficients but also the Lennard–Jones parameters. Achieving a
consistent cross-interaction requires care.

Reduce the number of MTP coefficients on each site. In Section
7.2.2, we argued in favor of aligning reference-axis systems along
symmetries of an atom’s immediate environment, which leads to a
number of MTP coefficients being zero. By computing energies
and forces/torques in the MTP sites’ local axis systems, these null
coefficients cancel all interactions in which they participate. Given
the complexity of each interaction term, a clever choice of local
axis systems can have a significant impact on the computational
performance. Beyond the influence of symmetries that offer a
systematic way to nullify MTP coefficients, chemical intuition can
guide the simulator in reaching a favorable compromise between
accuracy and performance.
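
The saving can be made concrete with a small counting sketch in Python. It assumes an expansion up to quadrupole (nine coefficients per site, as quoted earlier), a coefficient ordering of Q00, Q10, Q11c, Q11s, Q20, Q21c, Q21s, Q22c, Q22s, and that every nonzero coefficient on one site interacts with every nonzero coefficient on the other; the function name and masks are hypothetical.

```python
def count_pair_terms(nonzero_a, nonzero_b):
    """Number of site-site interaction terms: every nonzero coefficient on
    site a pairs with every nonzero coefficient on site b."""
    return sum(nonzero_a) * sum(nonzero_b)

# Charge (1) + dipole (3) + quadrupole (5) coefficients, all nonzero.
full_site = [True] * 9

# Diatomic-like site in its local frame: only q, mu_z, and Theta_zz survive
# (Q00, Q10, and Q20 in the ordering assumed above).
symmetric_site = [True, True, False, False, True, False, False, False, False]

n_full = count_pair_terms(full_site, full_site)
n_mixed = count_pair_terms(full_site, symmetric_site)
n_symmetric = count_pair_terms(symmetric_site, symmetric_site)
```

Under these assumptions the number of terms per site pair drops from 81 to 9 when both sites carry only the three symmetry-allowed coefficients.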

Ignore intramolecular MTP interactions. In light of a parametrization
that puts most weight on the PCs [56], one may envision
neglecting MTPs for intramolecular interactions. Very small molecules
(e.g., diatomics, triatomics) will not contain any such interactions
by the commonly used 1-4 exclusion rule. On the other hand,
larger molecules will exert intramolecular electrostatic interactions,
though they may well be negligible for rigid compounds (e.g.,
benzene). However, larger and thus more flexible compounds (e.g.,
proteins) may suffer considerably from such a compromise. Indeed, it
stands to reason that the carboxyl and amino groups, which form
the building blocks of secondary structure, would benefit from an
MTP description [32, 44].

Minimize real-space evaluation in Ewald summation. Sagui et al.
have quantified the impact of the real-space cutoff used in their
MTP Ewald implementation on the energy drift of a constant-energy
simulation [54]. Restricting the real-space evaluation to a minimal
amount and performing the rest in reciprocal space can lead to
significant improvements. The authors showed minimal energy and
force errors down to a direct interaction cutoff of 4.25 Å. They report
an increase in computational time due to MTP interactions by a factor
of only 8.5 with respect to simple PCs.

MTP interaction cutoffs for real-space only evaluations. In specific
cases, long-range MTP electrostatics may not be compulsory. By
combining Ewald summation on the PC-PC interactions and keeping
the strength of higher MTPs low, real-space only MTP interactions
have proven sufficient to provide significant MTP effects and good
energy conservation [38]. Further, the interaction cutoff can be
tailored to the various power laws exerted by MTP interactions (i.e.,
higher-order interactions decay faster).
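
One hypothetical way to tailor cutoffs to these power laws is sketched below in Python. It relies only on the fact that the interaction between multipoles of ranks l_a and l_b decays as R^-(l_a + l_b + 1); the reference distance `r_ref`, the tolerance `tol`, and the function name are assumptions made for illustration and do not reproduce the scheme of Ref. [38].

```python
def rank_dependent_cutoff(rank_a, rank_b, r_ref, tol):
    """Distance beyond which (r_ref / R)^(rank_a + rank_b + 1) drops below tol.

    r_ref is a reference distance at which the interaction is taken to have
    unit relative strength (an illustrative normalization).
    """
    power = rank_a + rank_b + 1
    return r_ref * tol ** (-1.0 / power)

# Cutoffs for a few rank pairs, with r_ref = 1 (arbitrary length unit).
cut_charge_charge = rank_dependent_cutoff(0, 0, 1.0, 0.1)
cut_dipole_dipole = rank_dependent_cutoff(1, 1, 1.0, 0.1)
cut_quad_quad = rank_dependent_cutoff(2, 2, 1.0, 0.1)
```

With the same tolerance, the higher-rank pairs receive progressively shorter cutoffs, which is where the saving in real-space work comes from.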

In general, improving performance will come at the cost of accuracy.
The benefit of using a given “trick” to reduce computational
time thus lies in the eye of the beholder, and depends on the system
of interest.

7.6 Applications

In the following, we outline a number of MTP electrostatic
applications to spectroscopy, free-energy calculations, and dynamical
properties.

7.6.1 Spectroscopy
CO in Myoglobin. The use of MTP electrostatics has been of
particular relevance in spectroscopic applications, specifically when
quantitative comparisons with experiments and their interpretation
were of interest. One notable example is the infrared
spectrum of photodissociated carbon monoxide (CO) in myoglobin
(Mb). The strong (43 MV/cm [121]) inhomogeneous electric field
in the heme pocket (see Fig. 7.5) leads to characteristic shifting and
splitting of the spectral lines due to the Stark effect. Several attempts
were made [122–124] to correctly interpret the experimentally
known infrared spectrum [125] using computational methods.
Although some of them were capable of correctly modeling the
width of the experimentally determined spectrum, they usually
were unable to find the characteristic splitting of the CO spectrum
(i.e., ≈10 cm⁻¹). A first successful attempt used a fluctuating
point-charge model based on an earlier three-point model for CO
[126, 127]. This was later generalized to a rigorous fluctuating MTP
model which reproduced most features of the spectrum known
from experiments [40]. In particular, the splitting, width, and relative
intensities of the computed spectrum agreed favorably with the
experimentally known properties. Based on this agreement it was
then also possible to assign the two spectroscopic signatures to
distinct conformational substates. Those agreed with previous,
more heuristic, attempts based on mutations in the active site and
mixed QM/MM simulations based on a few representative snapshots
from MD simulations [128, 129].

CO in Neuroglobin. As a second example, the structural origins of
infrared absorptions of photodissociated CO in murine neuroglobin
(Ngb) have been determined by combining MD simulations with
Fourier transform infrared (FTIR) spectroscopy [130]. Ngb is a small
heme protein that is predominantly expressed in neuronal cells of
vertebrates [131]. Its physiological role is not yet known [132].
However, it most likely involves the binding of a small ligand such
as dioxygen, nitric oxide, or carbon monoxide at the heme iron.
Greenberg et al. suggested that Ngb plays a role in neuroprotection
[133]. Other suggestions of possible functions include the signaling
of hypoxia [134] and radical scavenging [135, 136].
Due to the considerably larger active site pocket, the assignment
of experimentally determined infrared signatures is more involved.
Again, a quadrupolar MTP model has been used for CO, whereas
the solvated protein is treated with a conventional PC force
field. To capture the influence of the protein environment on the
spectroscopy and dynamics, experiments and simulations were
carried out for the wild-type protein and its F28L and F28W mutants.
It is found that a voluminous side chain at position 28 divides
site B into two subsites, B′ and B″. At low temperatures, CO in
wild-type Ngb only migrates to site B′, from where it can rebind,
while B″ is not populated. The CO spectra in site B′ for wild-type
Ngb from simulations and experiments are very similar in spectral
shift and shape: They both show doublets, red-shifted with respect
to gas-phase CO and split by ≈8 cm⁻¹. The FTIR spectra of the
F28L mutant show additional bands which are also found in the
simulations and can be attributed to CO located in substate B″. The
different bands are mainly related to different orientations of the
His64 side chain with respect to the CO ligand. Large red-shifts
arise from strong interactions between the histidine NH and the
CO oxygen. FTIR photoproduct spectra provide information on the
number of conformational substates and also the number of visited
transient docking sites, but lack direct structural information. Site-specific
spectra can be obtained from MD simulations, which assist
in interpreting the experimental data.

Figure 7.4 (a) Cartoon representation of cyanide in water. (b) Time
evolution of the 2D-infrared tilt angle, α. The red, magenta, and blue curves
correspond to PC, MTP, and experimental results, respectively. See Ref. [51]
for more details.

1D- and 2D-infrared spectroscopy of CN−  The solution-phase
spectroscopy of the cyanide anion (see Fig. 7.4) is another
benchmark system for atomistic simulations. The dynamics of
small solute molecules in solution provides detailed information
on the coupling between intra- and intermolecular degrees of
freedom. 2D IR spectroscopy has been shown to be very sensitive
to the solvent dynamics on short time scales, which provides the
opportunity to validate atomistic computational models against
detailed experimental data [137].
The dynamical behavior of the cyanide ion (CN− ) has been
well studied experimentally [138–140]. Atomistic simulations have
been shown to give energy relaxation times in good agreement with
experiments [49, 141]. It has been found that vibrational energy
relaxation is particularly sensitive to the level at which the
intermolecular interactions are described and that models beyond
traditional point charges are required for realistic computational
work. This provides the basis for more detailed investigations
of the spectroscopy of CN− in D2O, specifically whether a single
parametrization of the intermolecular interactions is capable of
quantitatively describing a number of distinct experimental observ-
ables.
It is found that within a range of justifiable (and commonly used)
force fields, the tilt angle α as a function of the waiting time can be
realistically modeled [51]. Most importantly, the recently developed
multipolar model for water and cyanide combined with anharmonic
stretching and bending potentials [49] and slightly modified van
der Waals ranges for the CN− yields very favorable agreement with
experiments [140], without further adjustment of any parameter.
Hence, such models provide a robust and realistic parametrization
for dynamical problems including vibrational relaxation and 2D IR
spectroscopy.
Finally, it is also worth mentioning that an efficient and spec-
troscopically accurate force field for sampling the conformations
obviates the need for specifically designing frequency maps in the
computation of 2D infrared spectra. Such frequency maps are a
convenient means to determine 2D IR spectra from conventional
MD simulations [142, 143]. However, their transferability from
one system to a chemically related one is not guaranteed, and
they do not permit a consistent analysis of a physico-
chemical process because conformational sampling and analysis
(“scoring”) of the simulations employ different energy functions.
In other words, only the use of a single force field for both
conformational sampling and post-processing allows potential
shortcomings of the energy function to be traced back unambiguously
(e.g., CN− in aqueous solution [49, 51]).

7.6.2 Free-Energy Calculations


Free-energy calculations form one of the hallmarks of computational
chemistry: They crystallize the promise that one can reproduce
and predict thermodynamics in a system’s model representation
[19, 144–146]. There has been much interest in the ability of MTPs
to improve the accuracy of free-energy calculations in at least two
different areas: hydration free energy and protein-ligand binding.

Hydration free energies  Solvated cyanide (Fig. 7.4) is again a
suitable test system for the ability to describe the solvation behavior
of small anions. Simulations with an MTP force field showed
improved accuracy against the experimental hydration free energy,
compared to a PC representation, for both CN− and hydroxide anion
[50]. Specifically, the parametrization used for both spectroscopic
(Section 7.6.1) and dynamical properties (see below) [49, 51]
most accurately reproduced CN−’s thermodynamic properties. This
underlines the utility of a physics-based approach to force field para-
metrization for small systems. More importantly, it demonstrates
the remarkable robustness MTP force fields can provide. Within
this context, the reparametrization of van der Waals interactions
could be optimized and/or validated against a larger number of
independent experimental observables to strengthen the force field.
It would also be tempting to consider computationally more efficient
means to evaluate solvation and binding free energies by scoring
trajectories generated from computationally less expensive models
and evaluating observables with improved models. Under favorable
circumstances such a procedure has indeed shown some merit [51].
However, at the present stage such an approach cannot be broadly
recommended as it sensitively depends on the phase space sampled
and more in-depth studies are required to delineate the essentials
that need to be captured correctly.
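
The rescoring idea discussed above, in which configurations sampled with an inexpensive model are re-evaluated with a more refined one, can be illustrated with the Zwanzig free-energy perturbation formula, ΔF = −kT ln⟨exp(−ΔU/kT)⟩, where the average runs over configurations from the inexpensive model and ΔU is the energy difference between the refined and the inexpensive model. The following Python sketch is only an illustration under these assumptions and is not the procedure of Ref. [51]; the function name and the example energies are hypothetical.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def zwanzig_free_energy(delta_u, temperature=300.0):
    """Free-energy difference (kcal/mol) from energy differences delta_u.

    delta_u: U_refined - U_cheap (kcal/mol) for configurations sampled
    with the cheap model. Hypothetical helper, for illustration only.
    """
    beta = 1.0 / (KB_KCAL * temperature)
    exponents = [-beta * du for du in delta_u]
    # Log-sum-exp trick keeps the exponential average numerically stable.
    shift = max(exponents)
    mean_exp = sum(math.exp(x - shift) for x in exponents) / len(exponents)
    return -(shift + math.log(mean_exp)) / beta


if __name__ == "__main__":
    # Made-up energy differences, not simulation data.
    sample = [0.4, 1.1, 0.7, 2.3, 0.2]
    print(zwanzig_free_energy(sample))
```

The exponential average is dominated by the few configurations with the lowest ΔU, which is one way to see why such estimates depend sensitively on the phase space sampled by the inexpensive model.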
Simulations using AMOEBA provided extremely accurate hydration
free energies of monovalent cations (i.e., K+ and Na+) and of
whole salts in both water and formamide [67], of divalent cations
such as Ca2+ and Mg2+ [68], and of Zn2+ [70]. For the monovalent
ions, the free energies were reproduced within a few percent of the
experimental value, and comparisons to standard non-polarizable
force fields were offered [67]. Finally, we note the study of Marjolin
et al. [71], which estimated the hydration free energy of the actinide
Th(IV) in water at −1,638 kcal/mol, in good agreement with
experiments, a remarkable achievement given the sheer magnitude
of the free energy. The accuracy with which the hydration free energy of
organic molecules can be reproduced in AMOEBA has been investigated
in a number of studies, reaching a root-mean-squared error of
≈0.6 kcal/mol with respect to experiments on a limited set [74, 75].

Protein-ligand binding  The advantage of MTP over PC electrostatics
coupled to a non-polarizable force field becomes evident when
calculating the free energy of binding of a tetrabromobenzotriazole
ligand with the target protein casein kinase 2 [147]: PC-only
electrostatics have been shown to destabilize the complex [148],
while the relative binding free energy between PC and MTP
descriptions yielded a 3.8 kcal/mol increased stability, though no
absolute free energy calculation was reported [38].
Protein-ligand binding studies performed with AMOEBA have
reproduced the absolute and relative binding free energies
of charged benzamidine and diazamidine ligands to trypsin within
0.5 kcal/mol of the experimental measurements [77, 78]. The
authors reported the crucial contribution of electronic polarization,
making it difficult to assess the impact of the MTP electrostatics
alone (see footnote a). Further, the description of charged ligands in the presence of
Zn2+ cations in the protein (e.g., zinc-finger proteins) was strongly
improved by the incorporation of polarizability, though the role of
MTP electrostatics is, here as well, unclear [79].

7.6.3 Dynamical Properties


The exchange of energy between different degrees of freedom in
a chemical system is of fundamental importance. Energy flow is
required for processes ranging from chemical reactivity to signaling
in biological systems. Directly mapping out energy migration
pathways in molecular systems from experiments alone is very
challenging. Hence, atomistic simulations with dedicated force fields
have become a meaningful complement.

a This caveat applies to most studies that involve polarizability.

Vibrational Relaxation of Solvated CN−  Following vibrational excitation,
IR-pump-IR-probe experiments have been used to determine
T1 relaxation times of the v = 1 state of CN− in H2O and D2O
[139, 149]. In contrast to polyatomic molecules such as N3−, energy
relaxation in diatomics is governed by intermolecular interactions
and the coupling between solvent and solute can be investigated
directly. It has been suggested [149] and later confirmed [139,
141] that Coulomb interactions are responsible for the vibrational
relaxation of polar molecules in coordinating solvents, such as water.
Therefore, atomistic simulations with accurate MTP electrostatics
are expected to provide detailed insights into energy migration
pathways. Many previous simulation studies were carried out with
idealized interaction potentials. For example, rigid water models are
unable to reproduce energy flow into the water’s internal degrees of
freedom [141].
Simulations with fully flexible force fields and accurate repre-
sentations of the nonbonded interactions for CN− and H2O provide
quantitative agreement with experimentally determined relaxation
times [49]. Using a rigid water model, energy relaxation from
the vibrationally excited chromophore (CN−) into the surrounding
solvent is slower by more than an order of magnitude. Hence,
under the given circumstances (existence of mechanical resonances
between chromophore vibrations and internal solvent degrees of
freedom) and for this type of study, it is mandatory that atomistic
simulations are carried out with fully flexible monomers. The sim-
ulations also show that the calculated T1 times sensitively depend
on the force field parametrization, in particular the Lennard–Jones
ranges. Increasing the LJ ranges by up to 7.5% in the simulations leads to
relaxation times that are longer by a factor of 4 to 5. This can be qualitatively
understood by noting that for larger LJ ranges the distance between
the solvent water molecules and CN− will be larger on average
which, in turn, leads to reduced electrostatic interactions and hence
less efficient vibrational energy transfer.
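
A T1 time is commonly extracted by fitting a single-exponential decay, E(t) = E0 exp(−t/T1), to the energy of the excited oscillator. The Python sketch below assumes such single-exponential behavior and uses synthetic data; it is not the analysis protocol of Ref. [49], and the function name, units, and numbers are illustrative assumptions.

```python
import math


def fit_t1(times, energies):
    """Estimate (T1, E0) from a log-linear least-squares fit.

    Assumes E(t) = E0 * exp(-t / T1) with strictly positive energies.
    Hypothetical helper, for illustration only.
    """
    logs = [math.log(e) for e in energies]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx                    # equals -1/T1 for an ideal decay
    intercept = mean_y - slope * mean_t  # equals ln(E0)
    return -1.0 / slope, math.exp(intercept)


if __name__ == "__main__":
    # Synthetic decay with a made-up T1 of 30 time units.
    ts = [10.0 * i for i in range(11)]
    es = [5.0 * math.exp(-t / 30.0) for t in ts]
    print(fit_t1(ts, es))
```

Comparing T1 values obtained in this way for different force field parametrizations is one simple route to quantifying the sensitivity described above.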

Ligand Migration in Myoglobin  Coupling between small molecule
diffusion and its environment is also potentially relevant for ligand
migration in proteins. An experimentally well-characterized system
is carbon monoxide (CO) in myoglobin (Mb). However, despite
intense work, important atomistic aspects governing CO rebinding
and migration barriers after photodissociation are still incompletely
understood. While the different pockets accessible to small diatomic
ligands are well characterized by experiments [151–154] and
theory/computer simulations [155–158], the pathways between the
pockets and the energy barriers associated with them are more
debatable.

Figure 7.5 (a) Myoglobin structure (cartoon representation), its heme and
two neighboring side chains (licorice), iron and the CO molecule (bulky). (b)
Free energy profile of CO migration along two reaction coordinates linking
different pockets: DP → Xe4 and Xe4 → Xe2. The two curves represent PC
(red) and MTP (blue) electrostatics. See Ref. [150] for more details.

For a quantitative analysis of CO migration between the most
important pockets, simulations with MTP and PC electrostatics for
the ligand were carried out [150]. The barriers obtained using
a PC model are either equal or higher by up to ≈2 kcal/mol
compared to simulations with a multipolar model, while the barriers
themselves are between 2 and 8 kcal/mol high; see Fig. 7.5. On
the other hand, it was also found that depending on the initial
configuration from which the free energy simulations were started,
barrier heights with the same interaction model can vary by
up to 4 kcal/mol. Hence, a sufficiently exhaustive conformational
sampling is required together with a reliable energy function for
quantitative assessments of ligand migration in proteins. The ligand
migration barriers themselves do not, however, provide sufficient
evidence that multipolar models are indeed mandatory for realistic
simulations.
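
To put barrier differences of a few kcal/mol into perspective, one may assume a simple Arrhenius-like rate expression, k ∝ exp(−ΔG‡/kT), with identical prefactors for the two models. Under that assumption (which is ours and not a result of Ref. [150]), the following Python sketch converts a barrier difference into a ratio of rates; the function name is hypothetical.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def rate_ratio(barrier_difference, temperature=300.0):
    """Ratio k_low / k_high for two barriers differing by barrier_difference.

    barrier_difference is in kcal/mol; equal prefactors are assumed.
    Hypothetical helper, for illustration only.
    """
    return math.exp(barrier_difference / (KB_KCAL * temperature))


if __name__ == "__main__":
    for ddg in (2.0, 4.0):  # differences of the size quoted in the text
        print(ddg, rate_ratio(ddg))
```

At room temperature, a 2 kcal/mol difference already corresponds to a rate ratio of a few tens in this picture, and a 4 kcal/mol spread from different starting configurations to several hundred, which illustrates why sampling uncertainty can mask the difference between electrostatic models.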

7.7 Conclusions and Outlook

MTP electrostatics provides an elegant and systematic way to
incorporate anisotropic features in a molecule’s charge distribution.
Because they originate from an expansion of the ESP, MTPs are associated
with orthogonal spherical harmonics of varying symmetries. The
number of MTP interactions grows steeply with the number
of coefficients on each site, so that most implementations
restrict the expansion order, l, to low values (i.e., l = 2–4).
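
The growth in cost can be made concrete by counting terms. A multipole of rank l has 2l + 1 independent spherical components, so a site expanded up to rank l_max carries (l_max + 1)² coefficients, and a naive site–site interaction couples every component on one site with every component on the other. The Python sketch below simply performs this count; it gives an upper bound, since actual implementations may exploit symmetry or truncate the interaction order, and the function names are hypothetical.

```python
def components_per_site(l_max):
    """Number of spherical multipole components up to rank l_max."""
    return sum(2 * l + 1 for l in range(l_max + 1))  # equals (l_max + 1)**2


def pair_terms(l_max):
    """Upper bound on component-component terms for one pair of sites."""
    n = components_per_site(l_max)
    return n * n


if __name__ == "__main__":
    for l_max in range(5):
        print(l_max, components_per_site(l_max), pair_terms(l_max))
```

Going from point charges (l_max = 0) to quadrupoles (l_max = 2) thus raises the count from 1 to 9 coefficients per site and from 1 to at most 81 terms per site pair.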
Force fields and MD implementations that build on MTP
electrostatics have shown significant improvements over standard
PC force fields for a number of systems—from water dimers, to
organic molecules, to protein-ligand complexes. Still, one should not
jump to the conclusion that MTP electrostatics is essential to any
(bio)molecular study: Its inclusion requires a significant invest-
ment, both in terms of extra simulation time and parametrization
effort. Tackling a parametrization procedure with MTPs will not
only require the electrostatic coefficients—using, say, DMA or an
ESP-based method (see Section 7.4)—it will de facto call for the
optimization of all the other force field parameters (e.g., van der
Waals, dihedrals). In this regard, more refined force fields (e.g.,
AMOEBA, SIBFA) seem to offer better separability between terms, while
the parameters optimized in non-polarizable descriptions typically
depend on each other. As an illustration, we point to a number of
studies that reparametrized the Lennard–Jones coefficients upon
changing the electrostatic model [38, 41, 42, 50].
While the diversity of electrostatic descriptions (e.g., PC vs. MTP,
molecular vs. atomic MTPs) in force fields may seem daunting, it
simply illustrates the variety of physical and chemical resolutions
required to describe molecular phenomena. To this day, no clear
guideline can substitute for the researcher’s intuition in deciding
whether PC or MTP electrostatics ought to be used for a given
system. Our experience points to a reduced impact of MTPs for
larger molecules, where standard PCs perform well enough. Still,
MTP electrostatics does help describe certain anisotropic features
that are out of reach of standard PC force fields. Both are bound
to remain relevant given the finite computational resources one has
access to.

Acknowledgement

We thank Drs. N. Plattner, M. Devereux, C. Kramer, and M. W. Lee for
insightful discussions and preparation material for this chapter. This
work is supported by the Swiss National Science Foundation through
the NCCR-MUST and grant 200021-117810.

References

1. J. McCammon, B. Gelin, M. Karplus, Nature 267, 585 (1977).
2. W. W. Wood, J. J. Erpenbeck, Annu. Rev. Phys. Chem. 27(1), 319 (1976).
3. M. Karplus, G. A. Petsko, Nature 347(6294), 631 (1990).
4. W. F. van Gunsteren, H. J. Berendsen, Angew. Chemie 29(9), 992 (1990).
5. T. P. Lybrand, Rev. Comput. Chem. 1, 295 (1990).
6. W. Gunsteren, F. Luque, D. Timms, A. Torda, Ann. Rev. Biophys. Biomol.
Struct 23(1), 847 (1994).
7. T. E. Cheatham III, P. A. Kollman, Annu. Rev. Phys. Chem. 51(1), 435
(2000).
8. T. Hansson, C. Oostenbrink, W. van Gunsteren, Curr. Opin. Struct. Biol.
12(2), 190 (2002).
9. M. Karplus, J. A. McCammon, Nat. Struct. Biol. 9(9), 646 (2002).
10. O. M. Becker, A. D. MacKerell Jr, B. Roux, M. Watanabe, Computational
Biochemistry and Biophysics (CRC Press, 2001).
11. J. W. Ponder, D. A. Case, Adv. Prot. Chem. 66, 27 (2003).
12. R. W. Hockney, J. W. Eastwood, Computer Simulation Using Particles
(CRC Press, 2010).
13. L. Monticelli, D. P. Tieleman, in Biomolecular Simulations (Springer,
2013), pp. 197–213.
14. W. L. Jorgensen, D. S. Maxwell, J. Tirado-Rives, J. Am. Chem. Soc.
118(45), 11225 (1996).
15. H. Sun, J. Phys. Chem. B. 102(38), 7338 (1998).
16. C. Oostenbrink, A. Villa, A. E. Mark, W. F. Van Gunsteren, J. Comput.
Chem. 25(13), 1656 (2004).
17. B. R. Brooks, C. L. Brooks, A. D. Mackerell, L. Nilsson, R. J. Petrella,
B. Roux, Y. Won, G. Archontis, C. Bartels, S. Boresch, et al., J. Comput.
Chem. 30(10), 1545 (2009).
18. J. Durrant, J. A. McCammon, BMC Biol. 9(1), 71 (2011).
19. P. Kollman, Chem. Rev. 93(7), 2395 (1993).
20. J. L. Klepeis, K. Lindorff-Larsen, R. O. Dror, D. E. Shaw, Curr. Opin. Struct.
Biol. 19(2), 120 (2009).
21. P. L. Freddolino, S. Park, B. Roux, K. Schulten, Biophys. J. 96(9), 3772
(2009).
22. P. E. Lopes, G. Lamoureux, B. Roux, A. D. MacKerell, J. Phys. Chem. B
111(11), 2873 (2007).
23. G. Lamoureux, E. A. Orabi, Mol. Sim. 38(8-9), 704 (2012).
24. M. Deserno, C. Holm, J. Chem. Phys. 109, 7678 (1998).
25. C. Sagui, T. A. Darden, Ann. Rev. Biophys. Biomol. Struct 28(1), 155
(1999).
26. A. C. Legon, Phys. Chem. Chem. Phys. 12(28), 7736 (2010).
27. W. L. Jorgensen, P. Schyman, J. Chem. Theory Comp. 8(10), 3895 (2012).
28. W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, M. L. Klein,
J. Chem. Phys. 79, 926 (1983).
29. M. W. Mahoney, W. L. Jorgensen, J. Chem. Phys. 112, 8910 (2000).
30. J. D. Jackson, Classical Electrodynamics (John Wiley & Sons: New York,
1998).
31. A. J. Stone, The Theory of Intermolecular Forces, vol. 32 (Clarendon
Press Oxford, 1996).
32. J. W. Ponder, C. Wu, P. Ren, V. S. Pande, J. D. Chodera, M. J. Schnieders,
I. Haque, D. L. Mobley, D. S. Lambrecht, R. A. DiStasio Jr, M. Head-
Gordon, G. N. I. Clark, M. E. Johnson, T. Head-Gordon, J. Phys. Chem. B
114(8), 2549 (2010).
33. C. Gray, Can. J. Phys. 54(5), 505 (1976).
34. A. Stone, Mol. Phys. 36(1), 241 (1978).
35. F. Vigné-Maeder, P. Claverie, J. Chem. Phys. 88, 4934 (1988).
36. C. Hättig, Chem. Phys. Lett. 260(3), 341 (1996).
37. C. Hättig, Chem. Phys. Lett. 268(5), 521 (1997).
38. T. Bereau, C. Kramer, M. Meuwly, J. Chem. Theory Comp. 9(12), 5450
(2013).
39. C. Kramer, P. Gedeck, M. Meuwly, J. Comput. Chem. 33, 1673 (2012).
40. N. Plattner, M. Meuwly, Biophys. J. 94, 2505 (2008).
41. S. Liem, P. Popelier, J. Chem. Phys. 119, 4560 (2003).
42. S. Liem, P. Popelier, M. Leslie, Int. J. Quant. Chem. 99(5), 685 (2004).
43. M. Leslie, Mol. Phys. 106(12-13), 1567 (2008).
44. Y. Yuan, M. J. Mills, P. L. Popelier, J. Comput. Chem. (2013).
45. S. Price, A. Stone, M. Alderton, Mol. Phys. 52(4), 987 (1984).
46. A. Buckingham, P. Fowler, J. M. Hutson, Chem. Rev. 88(6), 963 (1988).
47. N. Plattner, M. Meuwly, J. Mol. Model. 15(6), 687 (2009).
48. M. Devereux, M. Meuwly, Biophys. J. 96(11), 4363 (2009).
49. M. W. Lee, M. Meuwly, J. Phys. Chem. A. 115(20), 5053 (2011).
50. M. W. Lee, M. Meuwly, Phys. Chem. Chem. Phys. (2013). DOI 10.1039/
c3cp52713a
51. M. W. Lee, J. K. Carr, M. Göllner, P. Hamm, M. Meuwly, J. Chem. Phys. 139,
054506 (2013).
52. A. Toukmaji, C. Sagui, J. Board, T. Darden, J. Chem. Phys. 113, 10913
(2000).
53. P. Ren, J. W. Ponder, J. Phys. Chem. B. 107(24), 5933 (2003).
54. C. Sagui, L. G. Pedersen, T. A. Darden, J. Chem. Phys. 120, 73 (2004).
55. D. M. Elking, L. Perera, R. Duke, T. Darden, L. G. Pedersen, J. Comput.
Chem. 31(15), 2702 (2010).
56. T. Bereau, C. Kramer, F. W. Monnard, E. S. Nogueira, T. R. Ward,
M. Meuwly, J. Phys. Chem. B. 117(18), 5460 (2013).
57. U. Koch, P. L. A. Popelier, A. J. Stone, Chem. Phys. Lett. 238(4-6), 253
(1995).
58. D. M. Elking, G. A. Cisneros, J. P. Piquemal, T. A. Darden, L. G. Pedersen,
J. Chem. Theory Comp. 6(1), 190 (2010).
59. C. M. Handley, G. I. Hawe, D. B. Kell, P. L. Popelier, Phys. Chem. Chem.
Phys. 11(30), 6365 (2009).
60. M. J. Mills, P. L. Popelier, Comput. Theor. Chem. 975(1), 42 (2011).
61. M. J. Mills, P. L. Popelier, Theo. Chem. Account 131(3), 1 (2012).
62. T. A. Halgren, W. Damm, Curr. Opin. Struct. Biol. 11(2), 236 (2001).
63. A. Warshel, M. Kato, A. V. Pisliakov, J. Chem. Theory Comp. 3(6), 2034
(2007).
64. H. S. Antila, E. Salonen, in Biomolecular Simulations (Springer, 2013),
pp. 215–241.
65. G. A. Cisneros, M. Karttunen, P. Ren, C. Sagui, Chem. Rev. (2013).
66. T. A. Halgren, J. Am. Chem. Soc. 114(20), 7827 (1992).
67. A. Grossfield, P. Ren, J. W. Ponder, J. Am. Chem. Soc. 125(50), 15671
(2003).
68. D. Jiao, C. King, A. Grossfield, T. A. Darden, P. Ren, J. Phys. Chem. B
110(37), 18553 (2006).
69. J. P. Piquemal, L. Perera, G. A. Cisneros, P. Ren, L. G. Pedersen, T. A.
Darden, J. Chem. Phys. 125(5), 054511 (2006).
70. J. C. Wu, J. P. Piquemal, R. Chaudret, P. Reinhardt, P. Ren, J. Chem. Theory
Comp. 6(7), 2059 (2010).
71. A. Marjolin, C. Gourlaouen, C. Clavaguéra, P. Y. Ren, J. C. Wu, N. Gresh, J.
P. Dognon, J. P. Piquemal, Theo. Chem. Account 131(4), 1 (2012).
72. P. Ren, J. W. Ponder, J. Comput. Chem. 23(16), 1497 (2002).
73. T. D. Rasmussen, P. Ren, J. W. Ponder, F. Jensen, Int. J. Quant. Chem.
107(6), 1390 (2007).
74. P. Ren, C. Wu, J. W. Ponder, J. Chem. Theory Comp. 7(10), 3143 (2011).
75. Y. Shi, C. Wu, J. W. Ponder, P. Ren, J. Comput. Chem. 32(5), 967 (2011).
76. Y. Shi, Z. Xia, J. Zhang, R. Best, C. Wu, J. W. Ponder, P. Ren, J. Chem. Theory
Comp. 9(9), 4046 (2013).
77. D. Jiao, P. A. Golubkov, T. A. Darden, P. Ren, Proc. Natl. Acad. Sci. U. S. A.
105(17), 6290 (2008).
78. D. Jiao, J. Zhang, R. E. Duke, G. Li, M. J. Schnieders, P. Ren, J. Comput.
Chem. 30(11), 1701 (2009).
79. J. Zhang, W. Yang, J. P. Piquemal, P. Ren, J. Chem. Theory Comp. 8(4), 1314
(2012).
80. J. Zhang, Y. Shi, P. Ren, in Protein-Ligand Interactions, 1st edn. (Wiley
Online Library, 2012), pp. 99–120.
81. M. J. Schnieders, T. D. Fenn, V. S. Pande, A. T. Brunger, Act. Cryst. D 65(9),
952 (2009).
82. T. D. Fenn, M. J. Schnieders, A. T. Brunger, V. S. Pande, Biophys. J. 98(12),
2984 (2010).
83. N. Gresh, P. Claverie, A. Pullman, Theor. Chim. Acta. 66(1), 1 (1984).
84. N. Gresh, G. A. Cisneros, T. A. Darden, J. P. Piquemal, J. Chem. Theory
Comp. 3(6), 1960 (2007).
85. G. Cisneros, T. Darden, N. Gresh, J. Pilmé, P. Reinhardt, O. Parisel, J.
P. Piquemal, in Multi-scale Quantum Models for Biocatalysis (Springer,
2009), pp. 137–172.
86. F. M. Tao, W. Klemperer, J. Chem. Phys. 101, 1129 (1994).
87. N. Gresh, A. Pullman, P. Claverie, Theor. Chim. Acta. 67(1), 11 (1985).
88. K. X. Chen, N. Gresh, B. Pullman, Nucleic Acids Res. 14(5), 2251 (1986).
89. S. Meddeb, J. Berges, J. Caillet, J. Langlet, Biochim. Biophys. Acta
1112(2), 266 (1992).
90. N. Gresh, D. R. Garmer, J. Comput. Chem. 17(12), 1481 (1996).
91. N. Gresh, G. Tiraboschi, D. R. Salahub, Biopolymers 45(6), 405 (1998).
92. F. Rogalewicz, G. Ohanessian, N. Gresh, J. Comput. Chem. 21(11), 963
(2000).
93. J. P. Piquemal, B. Williams-Hubbard, N. Fey, R. J. Deeth, N. Gresh,
C. Giessner-Prettre, J. Comput. Chem. 24(16), 1963 (2003).
94. K. E. Hage, J. P. Piquemal, Z. Hobaika, R. G. Maroun, N. Gresh, J. Comput.
Chem. (2013).
95. B. T. Thole, Chem. Phys. 59(3), 341 (1981).
96. K. Tang, J. P. Toennies, J. Chem. Phys. 80, 3726 (1984).
97. M. A. Freitag, M. S. Gordon, J. H. Jensen, W. J. Stevens, J. Chem. Phys. 112,
7300 (2000).
98. J. P. Piquemal, N. Gresh, C. Giessner-Prettre, J. Phys. Chem. A. 107(48),
10353 (2003).
99. G. Cisneros, S. Tholander, O. Parisel, T. Darden, D. Elking, L. Perera, J. P.
Piquemal, Int. J. Quant. Chem. 108(11), 1905 (2008).
100. L. V. Slipchenko, M. S. Gordon, Mol. Phys. 107(8-12), 999 (2009).
101. W. A. Sokalski, R. Poirier, Chem. Phys. Lett. 98(1), 86 (1983).
102. R. J. Wheatley, Mol. Phys. 79(3), 597 (1993).
103. R. J. Wheatley, J. B. Mitchell, J. Comput. Chem. 15(11), 1187 (1994).
104. G. A. Cisneros, J. Chem. Theory Comp. 8(12), 5072 (2012).
105. M. W. P. Strandberg, J. Chem. Phys. 17, 901 (1949).
106. S. A. Clough, Y. Beers, G. P. Klein, L. S. Rothman, J. Chem. Phys. 59, 2254
(1973).
107. A. Buckingham, J. Chem. Phys. 30, 1580 (1959).
108. A. Stone, Chem. Phys. Lett. 83(2), 233 (1981).
109. A. Stone, M. Alderton, Mol. Phys. 56(5), 1047 (1985).
110. R. Rein, Adv. Quantum Chem. 7, 335 (1973).
111. S. F. Boys, P. Roy. Soc. Lond. A Mat. 200(1063), 542 (1950).
112. M. Abramowitz, I. Stegun, Handbook of mathematical functions:
with formulas, graphs, and mathematical tables, Courier Corporation
(1964).
113. J. G. Ángyán, C. Chipot, F. Dehez, C. Hättig, G. Jansen, C. Millot, J. Comput.
Chem. 24(8), 997 (2003).
114. J. C. Wu, G. Chattree, P. Ren, Theo. Chem. Account 131(3), 1138 (2012).
115. C. Kramer, T. Bereau, A. Spinn, K. R. Liedl, P. Gedeck, M. Meuwly, J. Chem.
Inf. Modell., 53, 3410 (2013).
116. U. C. Singh, P. A. Kollman, J. Comput. Chem. 5(2), 129 (1984).
117. C. M. Breneman, K. B. Wiberg, J. Comput. Chem. 11(3), 361 (1990).
118. C. I. Bayly, P. Cieplak, W. Cornell, P. A. Kollman, J. Phys. Chem. 97(40),
10269 (1993).
119. J. P. Ryckaert, G. Ciccotti, H. J. Berendsen, J. Comp. Phys. 23(3), 327
(1977).
120. W. Smith, CCP5 Inf. Q. 4, 13 (1982).
121. E. S. Park, S. S. Andrews, R. B. Hu, S. G. Boxer, J. Phys. Chem. B 103, 9813
(1999).
122. J. Ma, S. Huo, J. Straub, J. Am. Chem. Soc. 119, 2541 (1997).
123. J. Meller, R. Elber, Biophys. J. 74, 789 (1998).
124. M. Anselmi, M. Aschi, A. Di Nola, A. Amadei, Biophys. J. 92, 3442 (2007).
125. M. Lim, T. A. Jackson, P. A. Anfinrud, J. Chem. Phys. 102, 4355 (1995).
126. D. R. Nutt, M. Meuwly, Biophys. J. 85, 3612 (2003).
127. J. E. Straub, M. Karplus, Chem. Phys. 158, 221 (1991).
128. K. Nienhaus, J. S. Olson, S. Franzen, G. U. Nienhaus, J. Am. Chem. Soc.
127, 41 (2005).
129. M. Meuwly, Chem. Phys. Chem. 10, 2061 (2006).
130. K. Nienhaus, S. Lutz, M. Meuwly, G. U. Nienhaus, Chem. Phys. Chem.
11(1), 119 (2010).
131. T. Burmester, B. Weich, S. Reinhardt, T. Hankeln, Nature 407(6803),
520 (2000).
132. K. Nienhaus, G. U. Nienhaus, IUBMB Life 59(8-9), 490 (2007).
133. D. A. Greenberg, K. Jin, A. A. Khan, Curr. Opin. Pharmacol. 8(1), 20
(2008).
134. J. V. Esplugues, Brit. J. Pharmacol. 135, 1079 (2002).
135. S. Herold, A. Fago, R. E. Weber, S. Dewilde, L. Moens, J. Biol. Chem.
279(22), 22841 (2004).
136. T. R. Weiland, S. Kundu, J. T. Trent, J. A. Hoy, M. S. Hargrove, J. Am. Chem.
Soc. 126(38), 11930 (2004).
137. P. Hamm, M. Zanni, Concepts and Methods of 2D Infrared Spectroscopy
(Cambridge University Press, Cambridge, 2011).
138. J. Lascombe, M. Perrot, Farad. Discuss. 66, 216 (1978).
139. P. Hamm, M. Lim, R. M. Hochstrasser, J. Chem. Phys. 107, 10523 (1997).
140. M. Koziński, S. Garrett-Roe, P. Hamm, Chem. Phys. 341, 5 (2007).
141. R. Rey, J. T. Hynes, J. Chem. Phys. 108, 142 (1998).
142. T. Hayashi, T. Jansen, W. Zhuang, S. Mukamel, J. Phys. Chem. A. 109(1),
64 (2005).
143. L. Wang, C. T. Middleton, M. T. Zanni, J. L. Skinner, J. Phys. Chem. B 115,
3713 (2011).
144. W. L. Jorgensen, Acc. Chem. Res. 22(5), 184 (1989).
145. T. Simonson, G. Archontis, M. Karplus, Acc. Chem. Res. 35(6), 430
(2002).
146. C. Chipot, A. Pohorille, Free Energy Calculations (Springer, 2007).
147. R. Battistutta, E. De Moliner, S. Sarno, G. Zanotti, L. A. Pinna, Prot. Sci.
10(11), 2200 (2001).
148. M. Kolář, P. Hobza, J. Chem. Theory Comp. 8(4), 1325 (2012).
149. E. J. Heilweil, F. E. Doany, R. Moore, R. M. Hochstrasser, J. Chem. Phys.
76, 5632 (1982).
150. N. Plattner, M. Meuwly, Biophys. J. 102(2), 333 (2012).
151. R. Tilton, I. D. Kuntz, G. A. Petsko, Biochemistry 23, 2849 (1984).
152. J. S. Olson, G. N. Phillips, J. Biol. Chem. 271, 17593 (1996).
153. E. E. Scott, Q. H. Gibson, J. S. Olson, J. Biol. Chem. 276, 5177 (2001).
154. F. Schotte, M. Lim, A. Jackson, V. Smirnov, J. Soman, J. Olson, G. Phillips,
M. Wulff, P. A. Anfinrud, Science 300, 1944 (2003).
155. R. Elber, M. Karplus, J. Am. Chem. Soc. 112, 9161 (1990).
156. J. Z. Ruscio, D. Kumar, M. Shukla, M. G. Prisant, T. M. Murali, A. V.
Onufriev, Proc. Natl. Acad. Sci. U. S. A. 105, 9204 (2008).
157. C. Bossa, M. Anselmi, D. Roccatano, A. Amadei, B. Vallone, M. Brunori,
A. Di Nola, Biophys. J. 86, 3855 (2004).
158. N. Plattner, J. D. Doll, M. Meuwly, J. Chem. Phys. 133, 044506 (2010).
February 2, 2016 12:27 PSP Book - 9in x 6in 08-Qiang-Cui-c08

Chapter 8

Status of the Gaussian Electrostatic Model, a Density-Based Polarizable
Force Field

Jean-Philip Piquemal (a,b) and G. Andrés Cisneros (c)

(a) UPMC University Paris 06, Sorbonne Universités, UMR 7616 Laboratoire de Chimie Théorique case courrier 137, 4 place Jussieu 75005, Paris, France
(b) CNRS, UMR 7616 Laboratoire de Chimie Théorique case courrier 137, 4 place Jussieu 75005, Paris, France
(c) Department of Chemistry, Wayne State University, Detroit, MI 48202, USA

[email protected], [email protected]

8.1 Introduction

The use of classical potentials for simulations of chemical and
biochemical systems with molecular dynamics has been a field of
intense research. Currently, it is possible to simulate systems with
millions of atoms and millisecond time scales (Schulten et al., 2008;
Shaw et al., 2010). With exa-scale computing, i.e., 10^18 floating-point
operations per second (FLOPS), on the horizon, it is necessary to
evaluate the performance of the current potentials. Indeed, long-
time biomolecular simulations have revealed some issues already.

Many-Body Effects and Electrostatics in Biomolecules
Edited by Qiang Cui, Markus Meuwly, and Pengyu Ren
Copyright © 2016 Pan Stanford Publishing Pte. Ltd.
ISBN 978-981-4613-92-7 (Hardcover), 978-981-4613-93-4 (eBook)
www.panstanford.com
For example, Raval et al. carried out a study on 24 proteins (both
homology models and experimental structures) used in recent CASP
competitions involving at least 100 μs MD simulations (Raval et al.,
2012). For most systems, the structures drifted away from the
native state, even when starting from the experimental structure.
Although only two conventional force fields were employed, the
authors concluded that this is most likely a limitation of the available
point-charge force fields. As simulations on these and longer scales
grow more widespread with improvements in computing power,
node inter-connect, and graphical processing unit (GPU) hardware
(Stone et al., 2007), the accuracy of these classical potentials will be
further tested.
In this context, there has been a recent impetus to develop more
accurate force fields. One of the main thrusts has been to improve the
description of the bonded interactions by including anharmonicity
and of the non-bonded interactions by introducing explicit polar-
ization and a better description of the charge anisotropy terms.
Several force fields that employ distributed multipoles and use
explicit polarization (or are QM based using simplified MO schemes)
have been proposed including AMOEBA, SIBFA, EFP, X–Pol, mDC
and NEMO among others (Hermida-Ramón et al., 2003; Gresh et al.,
2007; Ponder et al., 2010; Day et al., 1996; Xie and Gao, 2007; Xie
et al., 2009; Mills and Popelier, 2012; Popelier, 2012; Giese et al.,
2013; Babin et al., 2014; Giese et al., 2014). The use of distributed
multipoles results in an improved description of the charge density
anisotropy and provides more accurate electrostatic interactions
(Stone, 2000; Price, 1999; Popelier, 2000; Kosov and Popelier, 2000;
Popelier et al., 2001a; Popelier and Kosov, 2001; McDaniel and
Schmidt, 2014). However, distributed multipoles suffer from one
drawback: they cannot describe the overlap of charge densities
as two molecules approach each other. This is known as the
penetration effect (Stone, 2000; Freitag et al., 2000). It is possible
to reduce the penetration error by employing empirical damping
functions (Kairys and Jensen, 1999; Freitag et al., 2000; Piquemal
et al., 2003; Cisneros et al., 2008; Wang and Truhlar, 2010; Stone,
2011). It is also possible to include this effect via the use of neural
networks (Handley and Popelier, 2010).
Another possibility to avoid the charge penetration and
anisotropy shortcomings is to use a continuous description of
the molecular charge density. Several methods that describe the
electronic distribution explicitly have been proposed (Wheatley,
2011; Gavezzotti, 2002; Eckhardt and Gavezzotti, 2007; Volkov
and Coppens, 2004; Coppens and Volkov, 2004; Paricaud et al.,
2005). We have introduced the Gaussian Electrostatic Model (GEM)
(Cisneros et al., 2005a; Piquemal et al., 2006a; Cisneros et al.,
2006b). GEM uses density fitting (DF) techniques (Boys and Shavit,
1959; Dunlap et al., 1979; Köster et al., 2002) to reproduce the
molecular electronic density using Hermite Gaussian auxiliary basis
sets (ABSs). These fitted densities are employed to calculate each
intermolecular component as obtained from energy decomposition
analysis (EDA) procedures. EDA methods are used for the
parametrization of GEM because they separate the individual
components of the intermolecular interaction.
There are a variety of EDA approaches that can be employed
including symmetry-adapted perturbation theory (SAPT), Kitaura–
Morokuma (KM), restricted variational space (RVS), constrained
space orbital variations (CSOV), to name a few (Eisenschitz and
London, 1930; Hirschfelder, 1967a,b; Murrell and Shaw, 1967; Kitaura
and Morokuma, 1976; Bagus et al., 1984; Stevens and Fink,
1987; Jeziorski et al., 1994; Glendening and Streitwieser, 1994;
Glendening, 1994; Mo et al., 2000; Heßelmann et al., 2005; Piquemal
et al., 2005; Khaliullin et al., 2006; Wu et al., 2009; Lu et al., 2011).
In this contribution, we present the theory behind the GEM
method and recent advances and results on the application of two
hybrid GEM potentials. In Section 8.2, we provide a brief review of
the analytical and numerical density fitting methods and their
implementation, including the methods employed to control numerical
instabilities. This is followed by a review of the procedure to obtain
distributed site multipoles from the fitted Hermite coefficients
in Section 8.3. Section 8.4 describes the extension of reciprocal
space methods to continuous densities. Section 8.5 describes the
complete form for GEM and a novel hybrid force field, GEM*, which
combines terms from GEM and AMOEBA for MD simulations. Finally,
Section 8.6 describes the implementation and initial applications of
a multi-scale program that combines GEM and SIBFA.

8.2 Density Fitting Methods

The use of ABSs for density fitting is a field of intense study. This
method relies on the use of auxiliary basis functions (ABS), generally
Gaussians, to expand the molecular electron density

$$\tilde{\rho}(\mathbf{r}) = \sum_k c_k\,k(\mathbf{r}). \tag{8.1}$$

For GEM the ABSs consist of Hermite Gaussians, $\Lambda_{tuv}(\mathbf{r})$. The
expansion coefficients $c_k$ for the approximate density $\tilde{\rho}$ may be
obtained by minimizing Eq. 8.2 using some metric Ô (Dunlap et al.,
1979; Eichkorn et al., 1995; Köster, 1996; Köster et al., 2002).

$$E_{\mathrm{self}} = \langle \rho(\mathbf{r}) - \tilde{\rho}(\mathbf{r})\,|\,\hat{O}\,|\,\rho(\mathbf{r}) - \tilde{\rho}(\mathbf{r}) \rangle \tag{8.2}$$

Several operators Ô can be employed including the overlap operator
Ô = 1, the Coulomb operator Ô = 1/r or the damped Coulomb
operator Ô = erfc(βr)/r (Jung et al., 2005). The minimization of Eq.
8.2 with respect to the expansion coefficients $c_k$ leads to a linear
system of equations:

$$\frac{\partial E_{\mathrm{self}}}{\partial c_l} = -\sum_{\mu,\nu} P_{\mu\nu}\,\langle \mu\nu|\hat{O}|l \rangle + \sum_k c_k\,\langle k|\hat{O}|l \rangle \tag{8.3}$$

The solution of Eq. 8.3 requires the inversion of the ABS matrix
$G = \langle k|\hat{O}|l \rangle$. In principle this matrix should be positive definite
and symmetric. In practice, however, this matrix is almost singular
and therefore the diagonalization to obtain its inverse must be done
with care. To this end we have explored analytical and numerical
procedures to obtain $G$ and $G^{-1}$.

8.2.1 Analytical Fitting


The analytical fitting procedure involves the explicit evaluation of
all the matrix elements of G and its subsequent inversion, which is
achieved by diagonalization. We have implemented several methods
for the diagonalization step. Initially we employed singular value
decomposition (SVD) (Press et al., 1992), setting the inverse of
an eigenvalue to zero if the eigenvalue is below a certain cutoff. However, this
method produces undesirable numerical instabilities (noise) when
the number of basis functions starts to grow, as we and others have
discussed previously (Cisneros et al., 2005b; Podeszwa et al., 2006).
In the current implementation we employ the Tikhonov
regularization method (Press et al., 1992). This approach is similar
to the constrained density fitting algorithm of Misquitta and
Stone (Misquitta and Stone, 2006). Here, the redundant basis set
contributions are penalized by minimizing $E_{\mathrm{self}} + \lambda \sum_k c_k^2$, resulting
in a more stable diagonalization procedure (Cisneros et al., 2006a).
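The difference between the two inversion strategies can be illustrated numerically. The following is a minimal sketch, not the GEM implementation: a small, nearly singular symmetric matrix stands in for G, and a truncated eigenvalue inverse is compared with a Tikhonov-regularized solve. The toy matrix, the cutoff, the value of λ and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def truncated_inverse(G, cutoff=1e-8):
    """Inverse via eigendecomposition; 1/eigenvalue is set to zero below the cutoff."""
    w, V = np.linalg.eigh(G)
    keep = w > cutoff
    inv_w = np.zeros_like(w)
    inv_w[keep] = 1.0 / w[keep]
    return (V * inv_w) @ V.T

def tikhonov_solve(G, b, lam=1e-6):
    """Minimize the fitting error plus lam * sum_k c_k**2, i.e. solve (G + lam*I) c = b."""
    return np.linalg.solve(G + lam * np.eye(G.shape[0]), b)

# Toy stand-in for G: overlaps of unnormalized 1D Gaussians on one center,
# <k|l> = sqrt(pi / (a_k + a_l)); three of the exponents are nearly redundant.
exponents = np.array([1.0, 1.0001, 1.0002, 5.0])
G = np.sqrt(np.pi / (exponents[:, None] + exponents[None, :]))
c_ref = np.array([0.5, 0.0, 0.0, 0.5])   # coefficients used to build the right-hand side
b = G @ c_ref

c_trunc = truncated_inverse(G) @ b
c_tik = tikhonov_solve(G, b)
print("condition number of G:", np.linalg.cond(G))
print("truncated-inverse coefficients:", c_trunc)
print("Tikhonov coefficients:         ", c_tik)
```

Both approaches reproduce the right-hand side here; the point of the sketch is only that the penalty term keeps the coefficients bounded without a hard eigenvalue cutoff.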
For problematic systems we also implemented the damped
Coulomb operator Ô = erfc(βr)/r proposed by Jung et al. (2005) to
attenuate the near-singular behavior due to long-range interactions
present in G. In our studies we observed that noise is still a problem
for some systems. The noise in the fit is known to arise due to the
attempt of the ABSs to fit the density at the nuclear cores (Cisneros
et al., 2005b, 2006a).
The use of Hermite Gaussians with angular momentum greater
than 0 requires the rotation of the fitting coefficients. In GEM
this is addressed in a similar manner to multipolar force fields by
defining a global molecular frame and a reference (local) site frame
(Toukmaji et al., 2000; Sagui et al., 2004; Cisneros et al., 2006a).
The use of Hermite Gaussians provides a straightforward solution
to the rotation since they are defined by partial derivatives of a
spherical Gaussian which can be taken either with respect to the
local (reference) frame or with respect to the global coordinates
(Cisneros et al., 2006a). Moreover, the rotation frames are the same
for the distributed multipoles.

8.2.2 Numerical Fitting


As mentioned above, the numerical instabilities in the fit arise from
the attempt to fit the nuclear cores. Thus, if the density at the cores
is discarded, then the fit should become more stable. This can be
achieved by using numerical grids to evaluate a given molecular
property and discarding points at and near the core. The coefficients
are then obtained by minimizing the following fitting function:

$$\chi^2 = \sum_i W(\mathbf{r}_i)\,\bigl(y(\mathbf{r}_i) - \tilde{y}(\mathbf{r}_i, c_k)\bigr)^2, \tag{8.4}$$
where $y(\mathbf{r}_i)$ denotes the ab initio molecular property of interest at
point i and $\tilde{y}(\mathbf{r}_i, c_k)$ is the same property evaluated with the kth ABS
element at the same point on the grid. Finally, $W(\mathbf{r})$ is the weighting
function for the point on the grid, which can be defined in several
ways (Bayly et al., 1993; Hu et al., 2007). Hu et al. have proposed a
weighting function that provides a smooth cutoff near the cores and
at long distance to avoid any discontinuities (Hu et al., 2007):

$$W(\mathbf{r}_i) = \exp\bigl[-\sigma\,(\log \rho_{\mathrm{promol}}(\mathbf{r}_i) - \log \rho_{\mathrm{ref}})^2\bigr], \tag{8.5}$$

where ρpromol is a reference promolecular atomic density, and σ
and ρref are adjustable parameters. It has been shown that the
surface for σ and ρref is relatively flat (Hu et al., 2007; Elking et al.,
2010). We have implemented a modified version of this weighting
function previously (Elking et al., 2010). The main differences
between the original Hu et al. weight and our implementation are
the re-optimization of the promolecular atomic electron densities
at the MP2/aug–cc–pVQZ level and the values for σ and log ρref, which
are 0.42 and −7.0, respectively.
The minimization of Eq. 8.4 leads to a linear system of
equations that can be expressed as $\mathbf{c} - \mathbf{c}^0 = -(\mathbf{H}^0)^{-1}\mathbf{g}^0$. As was the
case for the analytic DF, we employ Tikhonov regularization for
the inversion of the Hessian that arises in the linear least-squares
procedure. In our initial implementation of numerical fitting,
we explored different molecular properties, including electronic
density, molecular electrostatic potential (mESP), and the three
components of the electric field (Cisneros et al., 2007). All the
properties were gridded on rectangular grids. Subsequently we
showed that the use of spherical molecular grids based on the
scheme proposed by Becke (Becke, 1988) significantly reduces the
number of fitting points (Elking et al., 2010; Cisneros, 2012).
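The weighted fit of Eqs. 8.4 and 8.5 can be sketched on a toy problem. The example below is only illustrative and makes several assumptions that are not part of the original work: a one-dimensional radial grid, a target property that is itself a sum of Gaussians, a promolecular density set equal to the target, natural logarithms in the weight, and hypothetical function names. The regularized normal equations are solved directly.

```python
import numpy as np

def hu_weight(rho_promol, sigma, log_rho_ref):
    """Weight of Eq. 8.5: close to 1 where log(rho_promol) is near log_rho_ref."""
    return np.exp(-sigma * (np.log(rho_promol) - log_rho_ref) ** 2)

def weighted_fit(A, y, w, lam=1e-8):
    """Minimize sum_i w_i (y_i - (A c)_i)^2 + lam * |c|^2 via the normal equations."""
    AtW = A.T * w                                   # scales each grid point by its weight
    H = AtW @ A + lam * np.eye(A.shape[1])          # regularized Hessian
    g = AtW @ y
    return np.linalg.solve(H, g)

# Toy 1D problem: the "ab initio" property is a combination of the basis Gaussians.
r = np.linspace(0.05, 6.0, 400)
exponents = np.array([0.3, 1.0, 3.0])
A = np.exp(-np.outer(r**2, exponents))              # basis functions evaluated on the grid
c_true = np.array([0.2, 0.0, 1.5])
y = A @ c_true
rho_promol = y                                      # assumption: promolecular density = target
w = hu_weight(rho_promol, sigma=0.42, log_rho_ref=-7.0)

c_fit = weighted_fit(A, y, w)
chi2 = np.sum(w * (y - A @ c_fit) ** 2)
print("weight at innermost point:", w[0])
print("largest weight:", w.max())
print("weighted chi^2:", chi2)
```

The innermost grid points receive a negligible weight, so the core region hardly influences the fit, which is the behavior the weighting function is designed to produce.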

8.3 Distributed Multipoles

In this section, we present the methodology to obtain Cartesian
point multipoles from the Hermite coefficients obtained in the
fitting procedure. In all our work we have purposefully employed
ABSs with a maximum angular momentum of 2, which results in
distributed multipoles only up to quadrupoles. This ensures that
the distributed multipoles obtained can be directly employed in
the AMOEBA force field. However, higher order multipoles can be
obtained if an ABS with higher angular momentum is used.
Briefly, we have expanded on the work by Challacombe et al.,
who have shown that Hermite Gaussians have a simple relation
to elements of the Cartesian multipole tensor (Challacombe et al.,
1996). Once the Hermite coefficients have been determined, they
may be employed to calculate point multipoles centered at the
expansion sites. Thus, if $hc_{tuv}$ represents the coefficient of a Hermite
Gaussian of order tuv, then if this Hermite is normalized we have

$$hc_{000} \int \Lambda_{0}\,d\mathbf{r} = hc_{000}. \tag{8.6}$$

This guarantees that higher order multipole integrals will
integrate to integer numbers, for example, for the dipole integral in
the z direction, $d_z$:

$$hc_{001} \iiint z\,\Lambda_{001}\,dx\,dy\,dz = hc_{001} \int z\,\frac{\partial \Lambda_{0}}{\partial S_z}\,dz = -hc_{001} \int z\,\frac{\partial \Lambda_{0}}{\partial z}\,dz = hc_{001} \tag{8.7}$$
For quadrupole and higher order integrals the same relationships
hold, although different cases need to be considered (see
Cisneros et al., 2006a). In practice, following Stone’s definition
(Stone, 2000), we have used traceless quadrupoles. Furthermore,
the use of GEM distributed multipoles (GEM-DM) for multipolar
force fields provides a straightforward way to determine the
penetration error in the site–site Coulomb interaction energy due
to the connection with the GEM Hermites. Thus, this connection
provides a natural way to generate damping functions to lessen the
penetration error (Piquemal et al., 2003; Cisneros et al., 2008).
A further advantage of this approach to distributed multipoles is
that, unlike some conventional multipole expansions (Stone, 2005;
Popelier et al., 2001b), the (spherical) multipole expansion obtained
from Hermite Gaussians in this way is intrinsically finite of order t +
u + v (i.e., the highest angular momentum in the ABS) as shown in
(Cisneros et al., 2006a), similar to the multipoles obtained by Volkov
and Coppens (Volkov and Coppens, 2004).
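The relations in Eqs. 8.6 and 8.7 can be checked numerically. The sketch below is an illustration under stated assumptions: it treats only the z factor of the Hermite Gaussians (the x and y factors of a normalized Gaussian integrate to one), uses a simple Riemann sum on a wide grid, and the exponent and coefficient values are hypothetical.

```python
import numpy as np

def hermite_1d(z, alpha, center, order):
    """Normalized 1D Gaussian (order 0) or its derivative with respect to the center (order 1)."""
    g = np.sqrt(alpha / np.pi) * np.exp(-alpha * (z - center) ** 2)
    if order == 0:
        return g
    if order == 1:
        return 2.0 * alpha * (z - center) * g
    raise ValueError("only orders 0 and 1 are implemented in this sketch")

alpha, center = 0.8, 0.0          # hypothetical exponent; site placed at the origin
hc000, hc001 = -0.35, 0.12        # hypothetical fitted Hermite coefficients
z = np.linspace(-15.0, 15.0, 60001)
dz = z[1] - z[0]

density = hc000 * hermite_1d(z, alpha, center, 0) + hc001 * hermite_1d(z, alpha, center, 1)
monopole = np.sum(density) * dz       # integral of the density
dipole_z = np.sum(z * density) * dz   # first moment along z
print("monopole:", monopole, "coefficient hc000:", hc000)
print("dipole  :", dipole_z, "coefficient hc001:", hc001)
```

With the site at the origin, the integrated charge equals the coefficient of the order-0 Hermite and the z dipole equals the coefficient of the order-1 Hermite, as stated in the text.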

8.4 Reciprocal Space Methods for Integral Evaluation

The use of molecular densities results in the need to compute a large
number of two-center integrals for the intermolecular interaction.
A significant computational speedup can be achieved by using
reciprocal space methods based on Ewald sums. In this way, the
integrals are calculated in direct or reciprocal space depending on
the exponent of the Hermite Gaussians.
Here we describe how the Ewald formalism can be extended
to take into account Gaussian charge distributions. Let U denote a
unit cell whose edges are given by the vectors $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$.
An idealized infinite crystal can be generated by all periodic
translations $\mathbf{n} = n_1\mathbf{a}_1 + n_2\mathbf{a}_2 + n_3\mathbf{a}_3$ for all integer triples
$(n_1, n_2, n_3)$, with $n_1, n_2, n_3$ not all zero. Now, consider a collection
of N normalized spherical Gaussian charge densities $\rho_1 \ldots \rho_N$ (e.g.,
GEM–0) centered at $\{\mathbf{R}_1 \ldots \mathbf{R}_N\} \in U$ with exponents $\alpha_i$, i.e. $\rho_i(\mathbf{r}) =
q_i (\alpha_i/\pi)^{3/2} \exp(-\alpha_i (\mathbf{r} - \mathbf{R}_i)^2)$, and let $q_1 + \cdots + q_N = 0$. Note
that N need not be limited only to atomic positions, e.g., GEM–0
includes sites on the oxygen lone pairs and the bisector line between
the two hydrogens (Piquemal et al., 2006a). The Coulomb energy
of the central unit cell within a large spherical crystal, due to the
interactions of the Gaussian charge distributions with each other
and all periodic images within the crystal can be calculated using
Ewald methods.
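For a single pair of such spherical Gaussian charges the Coulomb interaction has a closed form, which also makes the penetration effect discussed earlier explicit. The sketch below is an illustration only: it uses atomic units, treats an isolated (non-periodic) pair, and the charges, exponents and function names are hypothetical.

```python
import math

def gaussian_coulomb(q_i, alpha_i, q_j, alpha_j, R):
    """Coulomb energy (atomic units) of two normalized spherical Gaussian charges."""
    mu = alpha_i * alpha_j / (alpha_i + alpha_j)   # 1/mu = 1/alpha_i + 1/alpha_j
    if R == 0.0:
        return q_i * q_j * 2.0 * math.sqrt(mu / math.pi)   # finite limit for R -> 0
    return q_i * q_j * math.erf(math.sqrt(mu) * R) / R

def point_coulomb(q_i, q_j, R):
    """Coulomb energy of two point charges, for comparison."""
    return q_i * q_j / R

for R in (0.5, 1.0, 2.0, 4.0, 8.0):
    e_gauss = gaussian_coulomb(1.0, 1.5, -1.0, 0.9, R)
    e_point = point_coulomb(1.0, -1.0, R)
    print(f"R = {R:4.1f}  Gaussian = {e_gauss: .6f}  point = {e_point: .6f}  "
          f"difference = {e_gauss - e_point: .2e}")
```

At large separation the two expressions coincide, while at short range the Gaussian result stays finite; the difference is the penetration contribution that point charges or point multipoles cannot describe.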
In particular, to determine the reciprocal part in the Ewald sum
it is necessary to grid the Gaussian densities. However, this can
become intractable for Gaussian functions with large exponents
(compact Gaussians). In the initial implementation the charge
densities were classified into compact or diffuse Hermite Gaussians
based on a given Ewald exponent β, which acts as a cutoff: if the
exponent of a given Hermite was above the cutoff it was considered
compact, and diffuse (αi < β) otherwise. With this, the contributions involving
diffuse Hermites can be calculated in reciprocal space exclusively
(Cisneros et al., 2006b).
This was later improved by the realization that the Ewald
exponent β may be different for each pair ij (Darden, 2007).
Thus, β is chosen to be infinite for ij pairs where at least one of
the Gaussians is diffuse. In this way, all pairs that involve diffuse
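Hermites are evaluated in reciprocal space. That is, given θ > 0, a
Gaussian distribution $q_i\rho_i$ is classified as compact ($i \in c$) if $\alpha_i \geq 2\theta$
and diffuse ($i \in d$) otherwise. Subsequently, for $i, j \in \{c\}$, select β so
that $1/\theta = 1/\alpha_i + 1/\alpha_j + 1/\beta$, otherwise β = ∞. In the case of GEM,
the fitted densities are expanded in a linear combination of Hermite
Gaussians $\Lambda_{tuv}(\mathbf{r}, \alpha, \mathbf{R})$. Thus, the charge distribution is given by
$\rho_i(\mathbf{r}, \alpha, \mathbf{R}) = \sum_{l=1}^{L} \sum_{tuv} c_{i,l,tuv}\,\Lambda_{tuv}(\mathbf{r}, \alpha_l, \mathbf{R}_i)$, where $c_{i,l,tuv}$ are the
Hermite coefficients and L denotes the different ABS exponents on
center i. With this, the Ewald expression becomes

$$
\begin{aligned}
E(\rho^{\{N\}}) ={}& \frac{1}{2} \sum_{\mathbf{n}} \sum_{i=1}^{N} \sum_{l_i \in c} \sum_{t_i u_i v_i} c_{i,l_i,t_i u_i v_i}
\sum_{j=1}^{N} \sum_{l_j \in c} \sum_{t_j u_j v_j} (-1)^{(t_j+u_j+v_j)}\, c_{j,l_j,t_j u_j v_j} \\
&\times \left(\frac{\partial}{\partial R_{ij_x}}\right)^{t_i+t_j}
\left(\frac{\partial}{\partial R_{ij_y}}\right)^{u_i+u_j}
\left(\frac{\partial}{\partial R_{ij_z}}\right)^{v_i+v_j} \\
&\times \left[\frac{\mathrm{erfc}(\theta^{1/2}|\mathbf{R}_{ij} - \mathbf{n}|) - \mathrm{erfc}(\mu_{l_i l_j}^{1/2}|\mathbf{R}_{ij} - \mathbf{n}|)}{|\mathbf{R}_{ij} - \mathbf{n}|}\right] \\
&+ \frac{1}{2\pi V} \sum_{\mathbf{m} \neq 0} \frac{1}{m^2} \sum_{l_1 \in c} \exp(-\pi^2 m^2/2\theta)\, S_{l_1}(\mathbf{m})
\times \sum_{l_2 \in c} \exp(-\pi^2 m^2/2\theta)\, S_{l_2}(-\mathbf{m}) \\
&+ \frac{1}{2\pi V} \sum_{\mathbf{m} \neq 0} \frac{1}{m^2} \sum_{(l_1, l_2) \notin c \times c} \exp(-\pi^2 m^2/\alpha_{l_1})
\exp(-\pi^2 m^2/\alpha_{l_2})\, S_{l_1}(\mathbf{m})\, S_{l_2}(-\mathbf{m}) \\
&- \frac{\pi}{2V} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{l_1 \in c} \sum_{l_2 \in c} c_{i,l_1,000}\, c_{j,l_2,000}
\left(\frac{1}{\theta} - \frac{1}{\alpha_{l_1}} - \frac{1}{\alpha_{l_2}}\right) \\
&- \sum_{i=1}^{N} E_{\mathrm{self}}(\rho_i) + \frac{2\pi D^2}{3V} + \varepsilon(K),
\end{aligned}
\tag{8.8}
$$

where the first term corresponds to the direct part of the Ewald
sum, the second and third terms to the reciprocal part, $\mathbf{R}_{ij} = \mathbf{R}_i - \mathbf{R}_j$,
the structure factors $S_l(\mathbf{m})$ involve derivatives of the Fourier
exponential with respect to the Hermite centers, $E_{\mathrm{self}}(\rho_i)$ is the
correction due to the self energy of each Hermite interacting with
its replicate, the term involving the unit cell dipole $\mathbf{D} = q_1\mathbf{R}_1 + \cdots + q_N\mathbf{R}_N$
is the surface term, $\varepsilon(K)$ denotes a quantity that converges
to 0 as $K \to \infty$, $\mathbf{m}$ denotes the reciprocal lattice vectors, and
$1/\mu_{l_i l_j} = 1/\alpha_{l_i} + 1/\alpha_{l_j}$ (Cisneros et al., 2006b; Darden, 2007).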
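The compact/diffuse classification and the pair-dependent choice of β described above can be written compactly. This is a minimal sketch with hypothetical function names and example exponents, not code from the GEM implementation; it only encodes the two rules stated in the text.

```python
import math

def classify(alphas, theta):
    """Label each Gaussian exponent as compact (alpha >= 2*theta) or diffuse."""
    return ["compact" if a >= 2.0 * theta else "diffuse" for a in alphas]

def pair_beta(alpha_i, alpha_j, theta):
    """Pair Ewald exponent: 1/theta = 1/alpha_i + 1/alpha_j + 1/beta for compact pairs,
    and beta = infinity if either Gaussian is diffuse."""
    if alpha_i < 2.0 * theta or alpha_j < 2.0 * theta:
        return math.inf
    inv_beta = 1.0 / theta - 1.0 / alpha_i - 1.0 / alpha_j
    if inv_beta <= 0.0:          # both exponents equal to 2*theta
        return math.inf
    return 1.0 / inv_beta

theta = 1.0                       # hypothetical value
alphas = [4.0, 1.0, 10.0]         # hypothetical exponents
print(classify(alphas, theta))
print(pair_beta(4.0, 4.0, theta), pair_beta(4.0, 1.0, theta))
```

For two compact Gaussians the condition αi ≥ 2θ guarantees that 1/β is non-negative, so a valid pair exponent always exists.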
Since the ABSs include Hermites with l > 0, the direct space
contributions can be efficiently evaluated by using the McMurchie–
Davidson (MD) recursion (McMurchie and Davidson, 1978). This
recursion has been used to calculate the required erfc and higher
derivatives for multipole interactions (Sagui et al., 2004). This
approach was also employed for the Hermite Gaussians (Cisneros
et al., 2006b), where it was shown that the MD recursion is
applicable to other types of operators besides 1/r. For the reciprocal
sums three methods were implemented: full Ewald (Ewald, 1921),
smooth particle mesh Ewald (sPME) (Essmann et al., 1995) and fast
Fourier Poisson (FFP) (York and Yang, 1994). The
latter two methods rely on the use of fast Fourier transforms to
approximate the structure factors that arise in the reciprocal term,
which results in the efficient evaluation of this term and has been
shown to scale as O(N log N) for sPME (Essmann et al., 1995).

8.5 The GEM and GEM* Force Fields

The initial implementation of the full GEM potential involved the use
of spherical type Hermites only, resulting in what was termed GEM–
0 (Piquemal et al., 2006b). This initial parametrization included the
terms described below.

8.5.1 The GEM Functional Form


The idea for GEM is to employ the fitted Hermite Gaussians to
evaluate each term in

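$$E_{\mathrm{Total}} = E^{\mathrm{GEM}}_{\mathrm{Coulomb}} + E^{\mathrm{GEM}}_{\mathrm{exch\text{-}rep}} + E^{\mathrm{GEM}}_{\mathrm{polarization}} + E^{\mathrm{GEM}}_{\mathrm{charge\text{-}transfer}}, \tag{8.9}$$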
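where the Coulomb term is given by

$$
E^{\mathrm{GEM}}_{\mathrm{Coulomb}} = \sum_{A>B} \frac{Z_A Z_B}{r_{AB}}
+ \sum_{A>B} \int \frac{Z_A\,\tilde{\rho}_B(\mathbf{r}_B)}{r_{AB}}\,d\mathbf{r}
+ \sum_{A>B} \int \frac{Z_B\,\tilde{\rho}_A(\mathbf{r}_A)}{r_{AB}}\,d\mathbf{r}
+ \sum_{A>B} \int \frac{\tilde{\rho}_A(\mathbf{r}_A)\,\tilde{\rho}_B(\mathbf{r}_B)}{r_{AB}}\,d\mathbf{r}. \tag{8.10}
$$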
The exchange-repulsion term is calculated by means of the
charge density overlap following the Wheatley–Price overlap model
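(Wheatley and Price, 1990; Domene et al., 2001):

$$E^{\mathrm{GEM}}_{\mathrm{exch\text{-}rep}} = K \sum_{A>B} \int \tilde{\rho}_A(\mathbf{r}_A)\,\tilde{\rho}_B(\mathbf{r}_B)\,d\mathbf{r} \tag{8.11}$$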
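For spherical Gaussian densities the overlap integral entering the exchange-repulsion model has a closed form. The sketch below illustrates this for a single pair; the exponents, the unit charges and the value of the proportionality constant K are hypothetical, and the function names are not taken from any GEM code.

```python
import math

def gaussian_overlap(q_a, alpha_a, q_b, alpha_b, R):
    """Overlap integral of two normalized spherical Gaussian densities separated by R."""
    mu = alpha_a * alpha_b / (alpha_a + alpha_b)
    return q_a * q_b * (mu / math.pi) ** 1.5 * math.exp(-mu * R * R)

def exchange_repulsion(overlap, K):
    """Overlap-model energy: proportional to the density overlap."""
    return K * overlap

K = 1.0   # hypothetical proportionality constant
for R in (0.0, 1.0, 2.0, 3.0, 5.0):
    s = gaussian_overlap(1.0, 1.0, 1.0, 1.0, R)
    print(f"R = {R:3.1f}  overlap = {s:.3e}  E_exch-rep = {exchange_repulsion(s, K):.3e}")
```

The Gaussian decay of the overlap with distance is what makes a short direct-space cutoff sufficient for this term.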

The polarization term is approximated by the use of dipole
polarizabilities, which yield very good results for the polarization
energies (if the electric fields are not large) (Böttcher, 1993). To this
end, the electric fields are calculated with the fitted densities and
interacted with distributed dipolar polarizabilities following Garmer and
Stevens’ approach (Day et al., 1996):

$$E^{\mathrm{GEM}}_{\mathrm{polarization}} = \frac{1}{2} \sum_{j}^{xyz} \mu(i)\,(\gamma E_0(j)), \tag{8.12}$$

where $\mu(i) = \alpha(i) \sum_{j}^{xyz} E(\mu(i)) + (\gamma E_0(j))$, and γ is a scaling
factor for the permanent electric fields (Piquemal et al., 2006b).
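The self-consistent determination of induced dipoles can be sketched as a simple fixed-point iteration. The example below is a generic point-dipole illustration and not the GEM or SIBFA implementation: it assumes isotropic site polarizabilities, the standard undamped dipole field tensor, atomic units, and a permanent field E0 that is simply supplied as input (in GEM it would come from the fitted densities). All names and numerical values are hypothetical.

```python
import numpy as np

def dipole_field(mu_j, r_ij):
    """Field at site i from a point dipole mu_j at site j, with r_ij = r_i - r_j."""
    d = np.linalg.norm(r_ij)
    rhat = r_ij / d
    return (3.0 * rhat * np.dot(rhat, mu_j) - mu_j) / d**3

def induced_dipoles(positions, alphas, E0, gamma=1.0, tol=1e-6, max_iter=200):
    """Iterate mu_i = alpha_i * (gamma * E0_i + field of the other induced dipoles)."""
    n = len(positions)
    mu = alphas[:, None] * gamma * E0          # start from the permanent field only
    for _ in range(max_iter):
        new = np.empty_like(mu)
        for i in range(n):
            field = gamma * E0[i]
            for j in range(n):
                if j != i:
                    field = field + dipole_field(mu[j], positions[i] - positions[j])
            new[i] = alphas[i] * field
        if np.max(np.abs(new - mu)) < tol:     # dipole convergence tolerance
            return new
        mu = new
    raise RuntimeError("induced dipoles did not converge")

# Two identical sites on the z axis in a uniform permanent field along z.
positions = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 3.0]])
alphas = np.array([1.0, 1.0])
E0 = np.array([[0.0, 0.0, 0.1], [0.0, 0.0, 0.1]])
mu = induced_dipoles(positions, alphas, E0)
print(mu)
```

For this symmetric arrangement the converged dipole can be checked by hand: each site satisfies μ = α(E0 + 2μ/r³), which gives μ = 0.108 for the values used here.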
Finally, the charge transfer term is evaluated using the semiempirical
formalism implemented in the SIBFA force field (Gresh et al.,
1979; Piquemal et al., 2007):

$$E^{\mathrm{GEM}}_{\mathrm{charge\text{-}transfer}} = 2C \sum \frac{(I^{*}_{\alpha\beta})^2}{\Delta E^{*}_{\alpha\beta}}, \tag{8.13}$$

where C = 3.5 is a constant parametrized to reproduce the value of
the charge-transfer energy (obtained with CSOV) for the canonical
water dimer at equilibrium distance. $I^{*}_{\alpha\beta}$ is a function of the overlap
between the localized molecular orbital (LMO) for the donor lone
pair and antibonding LMO of the acceptor, as well as the electrostatic
potential on site A arising from all other sites obtained with the GEM
densities. $\Delta E^{*}_{\alpha\beta}$ is the difference between the ionization potential of A
and the electron affinity of the acceptor site.
In our initial implementations of GEM–0 and GEM we have not
introduced an explicit term for the dispersion interactions. This is
because these force fields have been originally parametrized using
the CSOV method at the DFT level, which, by definition, does not
include a dispersion contribution. However, it is possible to include
this term in a similar way to the SIBFA potential (Gresh et al., 1979;
Piquemal et al., 2007).

8.5.2 GEM*: Molecular Dynamics with Fitted Densities


After our implementation of GEM–0, we extended the Coulomb
and exchange-repulsion terms to enable the use of arbitrary
angular momentum Hermites (Cisneros et al., 2006b). However,
both implementations only enabled energy calculations. In order to
carry out MD simulations it is necessary to evaluate the associated
forces efficiently. Until recently, it was impractical to do this since
the analytical form of the force for the charge-transfer term was
unavailable.
To enable MD simulations, a hybrid force
field called GEM* was developed. GEM* combines the Coulomb and
exchange-repulsion terms from GEM with the polarization, van der
Waals (modified) and bonded terms from AMOEBA. The functional
form for GEM* is thus

$$E_{\mathrm{Total}} = E^{\mathrm{GEM}}_{\mathrm{Coulomb}} + E^{\mathrm{GEM}}_{\mathrm{exch\text{-}rep}} + E^{\mathrm{AMOEBA}}_{\mathrm{polarization}} + E^{\mathrm{AMOEBA}}_{\mathrm{VdW}} + E^{\mathrm{AMOEBA}}_{\mathrm{bonded}}. \tag{8.14}$$

The Coulomb and exchange-repulsion terms for GEM* are
evaluated with the same expressions as for GEM (Eqs. 8.10 and 8.11).
Since GEM* includes an explicit term for exchange, it was necessary
to modify the original van der Waals function implemented in
AMOEBA. In this case, we have modified the buffered Halgren
function (modHalgren) by removing the repulsive term as follows:

$$E_{\mathrm{modHalgren}} = -\epsilon_{ij} \left( \frac{1.07\,R^{*}_{ij}}{R_{ij} + 0.07\,R^{*}_{ij}} \right)^{7}. \tag{8.15}$$
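The effect of dropping the repulsive factor can be seen by evaluating Eq. 8.15 next to the full buffered 14-7 function. The sketch below is illustrative only: the full function is written in its commonly quoted form with buffering constants 0.07 and 0.12, which is an assumption here rather than something stated in this chapter, and the well depth and minimum-energy distance are hypothetical values.

```python
import math

def buffered_14_7(R, R_star, eps):
    """Commonly quoted buffered 14-7 form (assumed here; constants 0.07 and 0.12)."""
    rho = R / R_star
    return eps * (1.07 / (rho + 0.07)) ** 7 * (1.12 / (rho ** 7 + 0.12) - 2.0)

def mod_halgren(R, R_star, eps):
    """Eq. 8.15: purely attractive term that remains after removing the repulsion."""
    return -eps * (1.07 * R_star / (R + 0.07 * R_star)) ** 7

eps, R_star = 0.11, 3.4    # hypothetical well depth (kcal/mol) and minimum distance (Angstrom)
for R in (2.5, 3.0, 3.4, 4.0, 6.0):
    print(f"R = {R:3.1f}  buffered 14-7 = {buffered_14_7(R, R_star, eps): .4f}  "
          f"modHalgren = {mod_halgren(R, R_star, eps): .4f}")
```

The modified function is attractive at all distances and stays finite as R approaches zero, so the short-range repulsion in GEM* has to come entirely from the density-overlap term.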
The polarization and bonded terms are the same as those in
the original AMOEBA implementation (Ren and Ponder, 2003). The
only difference in the polarization is that the permanent electric
fields for the calculation of the induced dipoles are calculated with
the distributed multipoles obtained from the fitted Hermites for
consistency between the Coulomb and polarization terms (Cisneros,
2012).
The initial implementation of GEM* was tested by fitting para-
meters for a water potential (Duke et al., 2014). These parameters
were compared to reference ab initio values for total intermolecular
interactions corrected for basis set superposition error via the
counterpoise correction. The reference data was calculated at
the MP2(full)/aug–cc–pVTZ level to match the original AMOEBA
parametrization (Ren and Ponder, 2003, 2002; Ponder et al., 2010;
Ren et al., 2011). The molecular density used to obtain the fitting
coefficients for GEM* was calculated at the same level of theory as
above for a water molecule at the AMOEBA equilibrium geometry.
Three parametrizations were investigated, termed models 1–3
in the discussion below. The difference among the three models
involves the use of different ABSs, A1 or A2 (Andzelm and Wimmer,
1992; Godbout and Andzelm, 1999), and/or the dataset of water
oligomers used for the parametrization. Model 1 was fitted using
the A2 ABS to reproduce intermolecular interaction energies for the
canonical water dimer (see Fig. 8.1), several random dimers, and
selected water clusters from (Temelso et al., 2011). Models 2 and
3 were parametrized to reproduce intermolecular energies for the
canonical water dimer only using the A2 (model 2) and A1 (model
3) ABSs. All calculations for GEM* were performed with a modified
pmemd version in the AMBER suite of programs (Case et al., 2005).
Comparison with the QM reference of the energies and forces
calculated with GEM* models 2 and 3 showed that both models
reproduce the total intermolecular interactions well. However,
model 3 deviated in the forces because of the limited accuracy caused by
the small number of Gaussians in the ABS employed for model 3. In
addition, both models were observed to produce significant errors in
the intermolecular energies for random dimers and binding energies
for different oligomers. These errors arise because these two models
do not properly describe interactions between
H atoms, since both models were fitted only to the canonical water
dimer.

[Figure 8.1: plot of energy (kcal/mol) versus r(O-H) (Å), with curves for MP2/aug-cc-pVTZ, GEM* model 1, GEM* model 2, GEM* model 3 and AMOEBA.]

Figure 8.1 Total intermolecular interaction energy for the canonical water
dimer calculated with the three GEM* models compared to ab initio and
AMOEBA.

Conversely, all results for model 1 showed good agreement
with the QM references for dimers as well as larger clusters. The
results show that a better parametrization can be obtained once a
slightly larger data set that included different dimer orientations
was considered. Recently Babin et al. have developed a novel water
model parametrized only from QM data (Babin et al., 2012, 2013)
using results from 40,000 dimers calculated at the CCSD(T)/CBS level.
The performance of GEM* was tested by performing 100 MD
steps in the NVE ensemble with a series of water boxes of increasing
size (216, 512, 1024, 2048 and 4096 molecules). All MD calculations
were done on a single Xeon X5550 CPU with 12 GB of memory at
300 K with an 8 Å cutoff for van der Waals interactions, using the
Beeman integrator, a 1 fs time step and a dipole tolerance for the
SCF convergence of $10^{-6}$. The calculation of the polarization with
the induced dipoles was performed using the PME method with a
B-spline order of 5 and a grid size of 24. During the parametrization
and testing, it was realized that the overlap integrals for both sets
of ABSs employed (A1 and A2) tend to 0 at distances greater than
5 Å. Therefore, although it is also possible to evaluate the overlap
integrals in reciprocal space as described in (Cisneros et al., 2006b;
Darden, 2007), the exchange integrals were evaluated only in direct
space with the same cutoff as the van der Waals interactions (8 Å),
or with a reduced cutoff of 6 Å.

Figure 8.2 Timings for different water boxes for GEM* using the model 1
parametrization. All times in seconds.

Timings for all the tested systems are shown in Fig. 8.2.
For comparison we performed the calculations for all cases with
the Coulomb integrals evaluated completely in direct space (all
Gaussians set as compact), or by placing two Gaussians in the
diffuse set and employing sPME or FFP with two different exchange
cutoffs. As discussed previously, the evaluation of the Hermite
Gaussians in reciprocal space requires significantly larger grids
and B-spline orders (Cisneros et al., 2006b; Darden, 2007). For
the three smallest boxes the calculations are faster when all the
integrals are evaluated in direct space. This is due to the large
FFT overhead caused by the fine grids required for accurate
evaluation of the energies and forces in reciprocal space. As the
system grows in size, the calculation becomes faster by using the
compact/diffuse density split method with PME. The smallest water
box comprising 216 molecules takes 100 s for the evaluation of the
100 energy/force calculations including the evaluation of Coulomb
and overlap integrals for 15,120 basis functions. That is, our code
is able to evaluate all Coulomb and overlap integrals for 15,120
basis functions for a single step in 1 s. For the 4096 box, comprising
286,720 basis functions, when the exchange cutoff is reduced to 6 Å
the time is 2363 s. Moreover, after an initial optimization of the code,
this time is reduced to 789 s. This is only four times slower than the
same 4096 system calculated with AMOEBA on the same CPU. We
expect more gains in performance as the code is further optimized.
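The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch only restates numbers given in the text; the per-molecule basis size, the per-step times and the implied AMOEBA time are derived quantities, the last one being a rough estimate based on the "four times slower" statement.

```python
# Numbers quoted in the text (100 MD steps per timing).
n_steps = 100
molecules = {"small box": 216, "large box": 4096}
basis_functions = {"small box": 15_120, "large box": 286_720}

# Basis functions per water molecule implied by the two box sizes.
per_molecule = {k: basis_functions[k] // molecules[k] for k in molecules}
print("basis functions per molecule:", per_molecule)

# Seconds per energy/force evaluation.
t_small = 100.0 / n_steps            # 216-molecule box
t_large = 2363.0 / n_steps           # 4096-molecule box, 6 Angstrom exchange cutoff
t_large_opt = 789.0 / n_steps        # same box after the initial code optimization
print("seconds per step:", t_small, t_large, t_large_opt)

# Gain from the optimization and the AMOEBA time implied by "four times slower".
speedup = 2363.0 / 789.0
amoeba_estimate = 789.0 / 4.0        # rough estimate, not a measured value
print(f"speedup from optimization: {speedup:.2f}x")
print(f"implied AMOEBA time for 100 steps: about {amoeba_estimate:.0f} s")
```

Both box sizes correspond to the same number of basis functions per molecule, which confirms that the two basis-function counts in the text are mutually consistent.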

8.6 Combining SIBFA and GEM: S/G–1

In this initial implementation the direct coupling between GEM
and SIBFA has been performed only at the induction level
(polarization and charge transfer energies) in the spirit of QM/MM
techniques (Chaudret et al., 2014). Indeed, the GEM equations for
exchange-repulsion involve overlap integrals between densities of
both interacting fragments. Thus a mixed S/G–1 scheme is not
possible for this term since the overlap between a GEM density
and SIBFA’s multipoles would be zero. Therefore, in the present
S/G–1 implementation, the electrostatic, exchange-repulsion and
dispersion energies are computed solely at the SIBFA level and include
SIBFA’s short range corrections. In a forthcoming work the full
multiscale implementation including full Gaussian electrostatics
first order energy will be reported.
Therefore in the first implementation, GEM is only used to
compute the second-order polarization and charge-transfer con-
tributions between the cation and its bound ligands. Finally, the
dispersion equations are the same for both methods, as they do
not depend on electric fields or potentials. As a proof of concept,
we limited ourselves to a Hartree–Fock level parametrization (no
dispersion) of the method.
Briefly, within S/G–1, the evaluation of the polarization and
charge-transfer energies uses the same formalism. The differences
between the two levels of computation are linked to the level of
calculation of electric fields and potentials (i.e., using electronic
densities for GEM or using distributed multipoles for SIBFA). Indeed,
differences between GEM and SIBFA electric fields and electrostatic
potentials can arise at short distances since the GEM field is essentially
identical to the ab initio field. Both procedures converge to the same solutions upon
increasing the distances when the multipolar approximation starts
to be valid as GEM densities act as a continuous electrostatic model.
Therefore, in a similar spirit as in QM/MM approaches, specific
fragments can be defined so as to be handled with either the GEM
density or the SIBFA multipoles. For the first implementation, the
use of GEM densities was restricted to the metal cations, whereas
the rest of the molecules were described using SIBFA.
To illustrate how the physical effects discussed above can be
included within an MM scheme, we show here some results
focusing on the polarization contribution in the case of the
Ca(II)-H2O complex.
Figure 8.3 displays four curves, namely the reference ab initio
CSOV polarization contribution, the undamped full GEM polarization
energy, the full GEM + damping approach, and results obtained
upon computing the polarization energy obtained with the exact ab
initio undamped field values extracted from a quantum mechanical
Gaussian 09 computation. The damping procedure is identical to
the one used by SIBFA and is detailed in the technical appendix in
(Chaudret et al., 2014).
It is important to point out that the GEM fields alone, in spite
of their quasi-perfect match with their ab initio counterparts, do
not provide a good reproduction of the ab initio results at short
range: the fields must be damped to gain accuracy at very short
distances. This conclusion is confirmed by the results obtained
from undamped exact ab initio fields (i.e., computed using the
original molecular orbitals), which are essentially identical to the
undamped GEM results (see Fig. 8.3). This clearly shows that the
inclusion of short-range quantum effects, inherently present within
QM and GEM fields, is not sufficient to reproduce the true
polarization energy. This is because the final ab initio CSOV
polarization energy embodies both penetration and
exchange-polarization effects. The first is present in GEM (as in
QM); the exchange-polarization, however, arises from the required
orthogonalization of the molecular orbitals of the Ca(II) and H2O
fragments within the constrained self-consistent field procedure.
Therefore, since GEM does not include this repulsive effect, the
computed polarization energy is overestimated. A straightforward
solution to the problem is to apply the exact same field damping
procedure that is used for the SIBFA polarization contribution. As
can be seen from Fig. 8.3, the GEM+damping approach accurately
reproduces the CSOV reference by selectively including the different
effects.

Figure 8.3 Ab initio polarization energies (kcal/mol) for the
Ca(II)-H2O complex as a function of the Ca(II)-O distance (Å),
computed using the CSOV procedure (blue), and polarization energies
computed using (i) distributed polarizabilities + ab initio fields
(grey), (ii) distributed polarizabilities + GEM fields with damping
(green), and (iii) distributed polarizabilities + GEM fields without
damping (red).
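The effect of field damping discussed above can be illustrated with a minimal sketch. It assumes a single isotropic polarizable site in the field of a point charge and uses a generic exponential switching factor chosen only for illustration; it is not the SIBFA damping function, which is described in Chaudret et al. (2014). The names ALPHA, Q, R0, and damping are hypothetical, and the numerical values are not fitted parameters.

```python
import math

ALPHA = 9.6   # isotropic polarizability in atomic units (illustrative value)
Q = 2.0       # charge of the divalent cation in atomic units
R0 = 3.5      # hypothetical damping length in bohr, not a fitted parameter


def field(r):
    """Magnitude of the undamped point-charge field at distance r (a.u.)."""
    return Q / r**2


def damping(r):
    """Generic switching factor: small at short range, tends to 1 at long range."""
    return 1.0 - math.exp(-(r / R0) ** 3)


def e_pol(r, damped):
    """Polarization energy -1/2 * alpha * F^2, with or without field damping."""
    f = field(r) * (damping(r) if damped else 1.0)
    return -0.5 * ALPHA * f * f


# Compare undamped and damped energies from short to long range.
for r in (3.0, 4.0, 6.0, 10.0):
    print(f"r = {r:5.1f}  undamped = {e_pol(r, False):10.5f}  "
          f"damped = {e_pol(r, True):10.5f}")
```

In this toy model the damped and undamped energies coincide at long range, while at short range the damped energy is smaller in magnitude, which is the qualitative behavior needed to mimic the missing exchange-polarization.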
The initial implementation of the S/G–1 method was developed to
describe metal cations in a ligand environment. To this end, S/G–1
has been parametrized to model the Zn(II) and Hg(II) cations. The
Zn(II) parametrization was performed on a series of representative
mono-ligand complexes and subsequently employed to calculate the
polarization and charge-transfer energies for a series of
polyligated complexes, as shown in Fig. 8.4. S/G–1 reproduces the
anti-cooperative behavior of the polarization and charge-transfer
energies obtained at the RVS level. For example, for
[Zn(H2O)4/2]2+ the S/G–1 charge-transfer energy is −26.8 kcal/mol,
compared to −28.7 kcal/mol for RVS. Correspondingly, the S/G–1
polarization energy is −103.8 kcal/mol, compared to −101.7 kcal/mol
for its RVS counterpart.
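As a worked check of the figures quoted above, the short sketch below computes the deviations between S/G–1 and RVS for [Zn(H2O)4/2]2+. Only the four energies come from the text; the helper name deviation is hypothetical, and forming the induction energy as the plain sum of polarization and charge transfer follows the (pol+c–t) definition in the caption of Fig. 8.4 and is an assumption here, since the RVS values plotted there are BSSE corrected.

```python
# Energies quoted in the text for [Zn(H2O)4/2]2+, in kcal/mol.
sg1 = {"polarization": -103.8, "charge_transfer": -26.8}
rvs = {"polarization": -101.7, "charge_transfer": -28.7}


def deviation(model, reference):
    """Return the signed difference and the unsigned percentage deviation."""
    diff = model - reference
    return diff, 100.0 * abs(diff) / abs(reference)


for term in sg1:
    diff, pct = deviation(sg1[term], rvs[term])
    print(f"{term:16s} diff = {diff:+.1f} kcal/mol ({pct:.1f}%)")

# Second-order induction taken as polarization + charge transfer (assumption).
induction_sg1 = sum(sg1.values())
induction_rvs = sum(rvs.values())
print(f"induction        S/G-1 = {induction_sg1:.1f}, RVS = {induction_rvs:.1f}")
```

The two deviations have opposite signs, so they largely cancel in the summed induction energy.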
As a test of the applicability of S/G–1 to large systems, we applied
this method to a Zn(II)-dependent system recently studied using
the SIBFA procedure, namely the Zn(II)–alcohol dehydrogenase (ADH)
active site (de Courcy et al., 2008). GEM was used to model the
Zn(II) cation, and the remainder of the system was treated with
SIBFA. As can be seen in Fig. 8.4, S/G–1 successfully reproduced
the RVS values for this complicated hetero-polyligated complex
(Chaudret et al., 2014).
To further test the performance of S/G–1 in modeling heavy
metals, for which relativistic and correlation effects are important,
a calibration for Hg(II) was performed. The cation polarization
energy requires two components. The first arises from the cation
dipolar polarizability and depends on the electrostatic field to
which the cation is subjected. The second component arises from the
quadrupolar polarizability and depends upon the field gradient
(Buckingham, 1975). The magnitude of the second component was
found to be important in the case of some metal cations, such as
Cu(I) and Hg(II), and this component had to be explicitly formulated
in SIBFA (Gresh et al., 2002). However, although the reference
quantum dipolar polarizability for Hg(II) can be easily obtained, its
quadrupolar polarizability could not be derived from QM calculations
using a small-core pseudopotential. Therefore, we resorted to the
available Cu(I) value (Buckingham, 1975) as a starting point. The
reference values of the polarization energy for Hg(II) were obtained
from RVS calculations on the [Hg(H2O)2]2+ complex, with Hg(II)
equidistant from the two water molecules. In this complex, the field
experienced by Hg(II) is zero, but the field gradient is non-zero.
The Cu(I) quadrupolar polarizability was then scaled to match the
polarization energy of this complex.
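The symmetry argument above, a vanishing field but a non-vanishing field gradient at a cation placed midway between two identical ligands, can be checked with a minimal sketch. It assumes that each water molecule is replaced by a single negative point charge, which is a crude stand-in and not the GEM or SIBFA description. The function name, the charge of −0.8 e, and the distance of 4.0 bohr are hypothetical.

```python
import math


def field_and_gradient(charges, positions, origin):
    """Field vector and field-gradient tensor at `origin` from point charges (a.u.)."""
    field = [0.0, 0.0, 0.0]
    grad = [[0.0] * 3 for _ in range(3)]
    for q, pos in zip(charges, positions):
        r = [o - p for o, p in zip(origin, pos)]   # vector from source to cation
        d = math.sqrt(sum(c * c for c in r))
        for a in range(3):
            field[a] += q * r[a] / d**3
            for b in range(3):
                delta = 1.0 if a == b else 0.0
                # dE_a/dr_b for a point charge
                grad[a][b] += q * (delta / d**3 - 3.0 * r[a] * r[b] / d**5)
    return field, grad


# Two identical point charges (stand-ins for the ligands) on opposite sides
# of the cation, which sits at the origin.
dist = 4.0
charges = [-0.8, -0.8]
positions = [(0.0, 0.0, dist), (0.0, 0.0, -dist)]

field, grad = field_and_gradient(charges, positions, (0.0, 0.0, 0.0))
print("field:", field)
print("gradient diagonal:", [grad[i][i] for i in range(3)])
```

Because the field vanishes at the cation, the dipolar term, which is quadratic in the field, does not contribute in this arrangement, and only the term that depends on the field gradient remains. This is why such a complex isolates the quadrupolar polarizability.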

Figure 8.4 Polarization (A), charge-transfer (B), and second-order
induction (C) (pol+c–t) energies (kcal/mol) in poly-ligated Zn(II)
complexes, calculated with RVS and S/G–1 and plotted against the
complex index. The RVS charge-transfer and induction energies are
BSSE corrected. The complexes are as follows: 1: [Zn(CH3S)3]−,
2: [Zn(CH3S)4]2−, 3: [Zn(imidazole)3]2+, 4: [Zn(imidazole)4]2+,
5: [Zn(H2O)6]2+, 6: [Zn(H2O)5/1]2+, 7: [Zn(H2O)4/2]2+, 8: cluster
model for the alcohol dehydrogenase active site.

Figure 8.5 Polarization (A), charge-transfer (B), second-order
induction (C), and total intermolecular interaction energy (D), in
kcal/mol, for [Hg(H2O)2]2+ as a function of the Hg–O distance (Å).

Figure 8.5 shows the second-order polarization and charge-transfer
energies and the total intermolecular interaction energies
for a di-aquo mercury complex at different distances. Except at very
short distances, the error of SIBFA/GEM with respect to the
ab initio calculations is very small. The S/G–1 charge-transfer values
are very close to one another. The main differences arise from the
polarization contribution. Moreover, the use of the GEM densities
for the calculation of the second-order components results in better
agreement with RVS at short range than the original
SIBFA method. The close agreement found for the monoligated
Hg(II) complex used for the parametrization is conserved in the
polyligated complexes. This shows that the non-additivity of both
the polarization and charge-transfer components can be reproduced
with S/G–1 for Hg(II), as has been demonstrated for Zn(II).

8.7 Conclusion and Perspective

The present contribution reports the development of the ab initio
GEM force field, which uses Hermite Gaussian electrostatics to include
short-range quantum mechanical effects in molecular mechanics.
It also details the GEM* and S/G-1 approaches, which blend
GEM with the AMOEBA and SIBFA polarizable potentials, both of
which rely on distributed multipoles. The capability to achieve
high accuracy on interaction energies has been demonstrated, first
applications to molecular dynamics have been detailed, and the
potential of the GEM densities to accurately treat difficult
systems such as metalloproteins has been shown. In terms of
perspectives, all models should be directly usable in MD
simulations, as all the required gradients were recently coded. This
should open the possibility of large-scale molecular dynamics using
Hermite Gaussian functions, as the models will benefit from recent
advances in algorithms and in hybrid MPI/OpenMP parallelism
that use new scalable strategies, with gains of 2 to 3 orders
of magnitude in time within the present framework (Lipparini
et al., 2014). Overall, efforts will be devoted to proposing a scalable,
integrated methodology incorporating both distributed multipoles
and Hermite Gaussian densities in popular packages such as Amber
and Tinker.

Acknowledgments

This work was supported by Wayne State University. Computing
time from Wayne State’s C&IT is gratefully acknowledged. This
work was also supported by the French state funds managed
by CALSIMLAB and the ANR within the Investissements d'Avenir
program under reference ANR-11-IDEX-0004-02.

References

Andzelm, J., and Wimmer, E. (1992). Density functional
Gaussian-type-orbital approach to molecular geometries, vibrations
and reaction energies, J. Chem. Phys. 96, pp. 1280–1303.
Babin, V., Leforestier, C., and Paesani, F. (2013). Development of a “first
principles” water potential with flexible monomers: Dimer potential
energy surface, VRT spectrum, and second virial coefficient, J. Chem.
Theo. Comp. 9, 12, pp. 5395–5403, doi:10.1021/ct400863t, URL
http://pubs.acs.org/doi/abs/10.1021/ct400863t.
Babin, V., Medders, G. R., and Paesani, F. (2012). Toward a universal water
model: First principles simulations from the dimer to the liquid phase,
J. Phys. Chem. Lett. 3, 24, pp. 3765–3769, doi:10.1021/jz3017733, URL
http://pubs.acs.org/doi/abs/10.1021/jz3017733.
Babin, V., Medders, G. R., and Paesani, F. (2014). Development of a “first
principles” water potential with flexible monomers. ii: Trimer potential
energy surface, third virial coefficient, and small clusters, J. Chem. Theo.
Comp. 10, 4, pp. 1599–1607.
Bagus, P. S., Hermann, K., and Bauschlicher Jr., C. W. (1984). A new analysis
of charge transfer and polarization for ligand–metal bonding: model
studies for Al4CO and Al4NH3, J. Chem. Phys. 80, pp. 4378–4386.
Bayly, C. I., Cieplak, P., Cornell, W. D., and Kollman, P. A. (1993). A
well-behaved electrostatic potential based method using charge restraints
for deriving atomic charges: the RESP method, J. Phys. Chem. 97, pp.
10269–10280.
Becke, A. D. (1988). A multicenter numerical integration scheme for
polyatomic molecules, J. Chem. Phys. 88, 4, pp. 2547–2553.
Böttcher, C. (1993). Theory of Electric Polarization (Elsevier, Amsterdam).
Boys, S. F., and Shavitt, I. (1959). A Fundamental Calculation of the Energy
Surface for the System of Three Hydrogen Atoms (AD212985, NTIS,
Springfield, VA).
Buckingham, A. (1975). multipolar expansion, Phil. Trans. Roy. Soc. (London)
B 272, p. 5.
Case, D. A., Cheatham III, T. E., Darden, T. A., Gohlke, H., Luo, R., Merz
Jr., K. M., Onufriev, A., Simmerling, C., Wang, B., and Woods, R. J. (2005). The
Amber biomolecular simulation programs, J. Comput. Chem. 26, pp.
1668–1688.
Challacombe, M., Schwegler, E., and Almlöf, J. (1996). Modern developments
in Hartree–Fock theory: Fast methods for computing the Coulomb
matrix, in Computational Chemistry: Review of Current Trends (World
Scientific Inc., Singapore).
Chaudret, R., Nohad Gresh, O. P., Darden, T. A., Cisneros, G. A., and Piquemal,
J.-P. (2014). Towards improved treatment of metal cations in
polarizable molecular mechanics using the hybrid Gaussian electrostatics
/ distributed multipoles GEM/SIBFA approach, J. Chem. Theo. Comp.
Submitted.
Cisneros, G. A. (2012). Application of Gaussian electrostatic model (GEM)
distributed multipoles in the AMOEBA force field, J. Chem. Theo.
Comp. 12, pp. 5072–5080, URL
http://pubs.acs.org/doi/abs/10.1021/ct300630u.
Cisneros, G. A., Elking, D. M., Piquemal, J.-P., and Darden, T. A. (2007).
Numerical fitting of molecular properties to hermite Gaussians, J. Phys.
Chem. A 111, pp. 12049–12056.
Cisneros, G. A., Piquemal, J. P., and Darden, T. A. (2005a). Intermolecular
electrostatic energies using density fitting, J. Chem. Phys. 123, p.
044109.
Cisneros, G. A., Piquemal, J.-P., and Darden, T. A. (2005b). Intermolecular
electrostatic energies using density fitting, J. Chem. Phys. 123, p.
044109.
Cisneros, G. A., Piquemal, J.-P., and Darden, T. A. (2006a). Generalization
of the Gaussian electrostatic model: extension to arbitrary angular
momentum, distributed multipoles and computational speedup with
reciprocal space methods, J. Chem. Phys. 125, p. 184101.
Cisneros, G. A., Piquemal, J. P., and Darden, T. A. (2006b). Generalization
of the Gaussian electrostatic model: Extension to arbitrary angular
momentum, distributed multipoles, and speedup with reciprocal space
methods, J. Chem. Phys. 125, p. 184101.
Cisneros, G. A., Tholander, S. N.-I., Parisel, O., Darden, T. A., Elking, D.,
Perera, L., and Piquemal, J.-P. (2008). Simple formulas for improved
point-charge electrostatics in classical force fields and hybrid quantum
mechanical/molecular mechanical embedding, Int. J. Quantum Chem.
108, pp. 1905–1912.
Coppens, P., and Volkov, A. (2004). The interplay between experiment and
theory in charge-density analysis, Acta Cryst. A 60, 5, pp. 357–364.
Darden, T. A. (2007). Dual bases in crystallographic computing, in
International Tables of Crystallography, Vol. B (Kluwer Academic
Publishers, Dordrecht, The Netherlands).
Day, P. N., Jensen, J. H., Gordon, M. S., Webb, S. P., Stevens, W. J.,
Krauss, M., Garmer, D., Basch, H., and Cohen, D. (1996). An effective
fragment method for modeling solvent effects in quantum mechanical
calculations, J. Chem. Phys. 105, pp. 1968–1986.
de Courcy, B., Piquemal, J.-P., and Gresh, N. (2008). Energy analysis of Zn
polycoordination in a metalloprotein environment and of the role of
a neighboring aromatic residue. What is the impact of polarization? J.
Chem. Theo. Comp. 4, 10, pp. 1659–1668.
Domene, C., Fowler, P. W., Wilson, M., Madden, P., and Wheatley, R. J. (2001).
Overlap-model and ab initio cluster calculations of ion properties in
distorted environments, Chem. Phys. Lett. 333, pp. 403–412.
Duke, R. E., Starovoytov, O. N., Piquemal, J.-P., and Cisneros, G. A. (2014).
GEM*: A molecular electronic density-based force field for classical
molecular dynamics simulations, J. Chem. Theo. Comp. 10, pp. 1361–
1365.
Dunlap, B. I., Connolly, J. W. D., and Sabin, J. R. (1979). On first-row diatomic
molecules and local density models, J. Chem. Phys. 71, pp. 4993–4999.
Eckhardt, C. J., and Gavezzotti, A. (2007). Computer simulations and analysis
of structural and energetic features of some crystalline energetic
materials, J. Phys. Chem. B 111, 13, pp. 3430–3437.
Eichkorn, K., Treutler, O., Öhm, H., Häser, M., and Ahlrichs, R. (1995).
Auxiliary basis sets to approximate coulomb potentials, Chem. Phys.
Lett. 240, pp. 283–290.
Eisenschitz, R., and London, F. (1930). Perturbation theory, Z. Phys. 60, pp.
491–527.
Elking, D. M., Cisneros, G. A., Piquemal, J.-P., Darden, T. A., and Pedersen, L. G.
(2010). Gaussian multipole model (gmm), J. Chem. Theo. Comp. 6, pp.
190–202.
Essmann, U., Perera, L., Berkowitz, M. L., Darden, T., Lee, H., and Pedersen,
L. G. (1995). A smooth particle mesh Ewald method, J. Chem. Phys. 103,
pp. 8577–8593.
Ewald, P. (1921). Die Berechnung optischer und elektrostatischer Gitterpo-
tentiale, Ann. Phys. 64, pp. 253–287.
Freitag, M. A., Gordon, M. S., Jensen, J. H., and Stevens, W. J. (2000). Evaluation
of charge penetration between distributed multipolar expansions, J.
Chem. Phys. 112, pp. 7300–7306.
Gavezzotti, A. (2002). Calculation of intermolecular interaction energies by
direct numerical integration over electron densities i. electrostatic and
polarization energies in molecular crystals, J. Phys. Chem. B 106, pp.
4145–4154.
Giese, T. J., Chen, H., Dissanayake, T., Giambasu, G. M., Heldenbrand, H.,
Huang, M., Kuechler, E. R., Lee, T.-S., Panteva, M. T., Radak, B. K., and York,
D. M. (2013). A variational linear-scaling framework to build practical,
efficient next-generation orbital-based quantum force fields, J. Chem.
Theo. Comp. 9, 3, pp. 1417–1427.
Giese, T. J., Chen, H., Huang, M., and York, D. M. (2014). Parametrization
of an orbital-based linear-scaling quantum force field for noncovalent
interactions, J. Chem. Theo. Comp. 10, 3, pp. 1086–1098.
Glendening, E. D. (1994). Natural energy decomposition analysis: explicit
evaluation of electrostatic and polarization effects with application to
aqueous clusters of alkali metal cations and neutrals, J. Am. Chem. Soc.
118, pp. 2473–2482.
Glendening, E. D., and Streitwieser, A. (1994). Natural energy decomposition
analysis: An energy partitioning procedure for molecular interactions
with application to weak hydrogen bonding strong ionic, and moderate
donor–acceptor interactions, J. Chem. Phys. 100, pp. 2900–2909.
Godbout, N., and Andzelm, J. (1999). DGauss Version 2.0, 2.1, 2.3, 4.0:
the file that contains the A1, A2 and P1 auxiliary basis sets can be
obtained from the CCL WWW site at
http://www.ccl.net/cca/data/basis-sets/DGauss/basis.v3.html
(Computational Chemistry List, Ltd., Ohio).
Gresh, N., Cisneros, G. A., Darden, T. A., and Piquemal, J.-P. (2007).
Anisotropic, polarizable molecular mechanics studies of inter–,
intra-molecular interactions, and ligand–macromolecule complexes. a
bottom-up strategy, J. Chem. Theo. Comp. 3, pp. 1960–1986.
Gresh, N., Claverie, P., and Pullman, A. (1979). SIBFA, Int. J. Quantum Chem.
Symp. 11, p. 253.
Gresh, N., Policar, C., and Giessner-Prettre, C. (2002). Modeling copper(i)
complexes: SIBFA molecular mechanics versus ab initio energetics and
geometrical arrangements, J. Phys. Chem. A 106, 23, pp. 5660–5670.
Handley, C. M., and Popelier, P. L. A. (2010). Potential energy surfaces fitted
by artificial neural networks, J. Phys. Chem. A 114, 10, pp. 3371–3383,
doi:10.1021/jp9105585.
Hermida-Ramón, J. M., Brdarski, S., Karlström, G., and Berg, U. (2003).
Inter- and intramolecular potential for the n-formylglycinamide-water
system. a comparison between theoretical modeling and empirical
force fields, J. Comput. Chem. 24, 2, pp. 161–176.
Heßelmann, A., Jansen, G., and Schütz, M. (2005). DFT–SAPT with density
fitting: a new efficient method to study intermolecular interaction
energies, J. Chem. Phys. 122, pp. 14103–14120.
Hirschfelder, J. O. (1967a). Perturbation theory for exchange forces, I, Chem.
Phys. Lett. 1, pp. 325–329.
Hirschfelder, J. O. (1967b). Perturbation theory for exchange forces, II, Chem.
Phys. Lett. 1, pp. 363–368.
Hu, H., Lu, Z., and Yang, W. (2007). Fitting molecular electrostatic potentials
from quantum mechanical calculations, J. Chem. Theo. Comp. 3, pp.
1004–1013.
Jeziorski, B., Moszynski, R., and Szalewicz, K. (1994). Perturbation theory
approach to intermolecular potential energy surfaces of van der Waals
complexes, Chem. Rev. 94, pp. 1887–1930.
Jung, Y., Sodt, A., Gill, P. M. W., and Head-Gordon, M. (2005). Auxiliary basis
expansions for large scale electronic structure calculations, Proc. Natl.
Acad. Sci. 102, pp. 6692–6697.
Kairys, V., and Jensen, J. H. (1999). Evaluation of the charge penetration
energy between non-orthogonal molecular orbitals using the spherical
Gaussian overlap approximation, Chem. Phys. Lett. 315, 1-2, pp. 140–
144.
Khaliullin, R. Z., Head-Gordon, M., and Bell, A. T. (2006). An efficient
self-consistent field method for large systems of weakly interacting
components, J. Chem. Phys. 124, 20, 204105.
Kitaura, K., and Morokuma, K. (1976). A new energy decomposition scheme
for molecular interactions within the Hartree–Fock approximation, Int.
J. Quantum Chem. 10, pp. 325–340.
Kosov, D. S., and Popelier, P. L. A. (2000). Atomic partitioning of molecular
electrostatic potentials, J. Phys. Chem. A 104, pp. 7339–7345.
Köster, A. M. (1996). Efficient recursive computation of molecular integrals
for density functional methods, J. Chem. Phys. 104, pp. 4114–4124.
Köster, A. M., Calaminici, P., Gómez, Z., and Reveles, U. (2002). Density
functional theory calculation of transition metal clusters, in Reviews of
Modern Quantum Chemistry, A Celebration of the Contribution of Robert
G. Parr (World Scientific, Singapore).
Lipparini, F., Lagardère, L., Stamm, B., Cancès, E., Schnieders, M., Ren, P.,
Maday, Y., and Piquemal, J.-P. (2014). Scalable evaluation of polarization
energy and associated forces in polarizable molecular dynamics: I.
toward massively parallel direct space computations, J. Chem. Theo.
Comp. 10, 4, pp. 1638–1651.
Lu, Z., Zhou, N., Wu, Q., and Zhang, Y. (2011). Directional dependence of
hydrogen bonds: A density-based energy decomposition analysis and
its implications on force field development, J. Chem. Theo. Comp. 7, 12,
pp. 4038–4049.
McDaniel, J. G., and Schmidt, J. R. (2014). First-principles many-body force
fields from the gas phase to liquid: A “universal” approach, J. Phys. Chem.
B, doi:10.1021/jp501128w.
McMurchie, L., and Davidson, E. (1978). One- and two-electron integrals
over Cartesian Gaussian functions, J. Comput. Phys. 26, pp. 218–231.
Mills, M., and Popelier, P. (2012). Polarisable multipolar electrostatics from
the machine learning method Kriging: an application to alanine, Theo.
Chem. Acc. 131, 3, pp. 1–16, doi:10.1007/s00214-012-1137-7.
Misquitta, A. J., and Stone, A. J. (2006). Distributed polarizabilities obtained
using a constrained density-fitting algorithm, J. Chem. Phys. 124, p.
024111.
Mo, Y., Gao, J., and Peyerimhoff, S. D. (2000). Energy decomposition analysis
of intermolecular interactions using a block-localized wave function
approach, J. Chem. Phys. 112, 13, pp. 5530–5538.
Murrell, J. N., and Shaw, G. (1967). Intermolecular forces in the region of
small orbital overlap, J. Chem. Phys. 46, pp. 1768–1772.
Paricaud, P., Predota, M., Chialvo, A. A., and Cummings, P. T. (2005).
From dimer to condensed phases at extreme conditions: Accurate
predictions of the properties of water by a Gaussian charge polarizable
model, J. Chem. Phys. 122, 24, p. 244511.
Piquemal, J.-P., Chevreau, H., and Gresh, N. (2007). Toward a separate
reproduction of the contributions to the Hartree–Fock and DFT
intermolecular energies by polarizable molecular mechanics with the
SIBFA potential, J. Chem. Theo. Comp. 3, pp. 824–837.
Piquemal, J. P., Cisneros, G. A., Reinhardt, P., Gresh, N., and Darden, T. A.
(2006a). Towards a force field based on density fitting, J. Chem. Phys.
124, p. 104101.
Piquemal, J.-P., Cisneros, G. A., Reinhardt, P., Gresh, N., and Darden, T. A.
(2006b). Towards a force field based on density fitting, J. Chem. Phys.
124, p. 104101.
Piquemal, J.-P., Gresh, N., and Giessner-Prettre, C. (2003). Improved
formulas for the calculation of the electrostatic contribution to the
intermolecular interaction energy from multipolar expansion of the
electronic distribution, J. Phys. Chem. A 107, pp. 10353–10359.
Piquemal, J.-P., Marquez, A., Parisel, O., and Giessner-Prettre, C. (2005).
A CSOV study of the difference between HF and DFT intermolecular
interaction energy values: the importance of the charge transfer
contribution. J. Comput. Chem. 26, pp. 1052–1062.
Podeszwa, R., Bukowski, R., and Szalewicz, K. (2006). Density-fitting
method in symmetry-adapted perturbation theory based on Kohn–
Sham description of monomers, J. Chem. Theo. Comp. 2, pp. 400–412.
Ponder, J. W., Wu, C., Ren, P., Pande, V. S., Chodera, J. D., Schnieders, M. J.,
Haque, I., Mobley, D. L., Lambrecht, D. S., DiStasio Jr., R. A.,
Head-Gordon, M., Clark, G. N. I., Johnson, M. E., and Head-Gordon, T. (2010).
Current status of the AMOEBA polarizable force field, J. Phys. Chem. B
114, pp. 2549–2564.
Popelier, P. (2000). Atoms in Molecules: An Introduction (Prentice Hall,
Harlow, England).
Popelier, P. (2012). A generic force field based on quantum chemical
topology, in C. Gatti and P. Macchi (eds.), Modern Charge-Density
Analysis (Springer Netherlands), ISBN 978-90-481-3835-7, pp. 505–
526, doi:10.1007/978-90-481-3836-4_14.
Popelier, P. L. A., Joubert, L., and Kosov, D. S. (2001a). Convergence of the
electrostatic interaction based on topological atoms, J. Phys. Chem. A
105, pp. 8254–8261.
Popelier, P. L. A., Joubert, L., and Kosov, D. S. (2001b). The convergence of
the electrostatic interaction based on topological atoms, J. Phys. Chem.
A 105, pp. 8254–8261.
Popelier, P. L. A., and Kosov, D. S. (2001). Atom–atom partitioning of
intramolecular and intermolecular coulomb energy, J. Chem. Phys. 114,
pp. 6539–6547.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992).
Numerical recipes in fortran77; the art of scientific computing, 2nd Ed.
(Cambridge University Press, New York, NY).
Price, S. (1999). Toward more accurate model intermolecular potentials for
organic molecules, in K. Lipkowitz and D. B. Boyd (eds.), Reviews in
Computational Chemistry, Vol. 14 (VCH Publishers, New York, NY), pp.
225–289.
Raval, A., Piana, S., Eastwood, M. P., Dror, R. O., and Shaw, D. E. (2012).
Refinement of protein structure homology models via long, all-atom
molecular dynamics simulations, Prot. Struct. Func. Bioinf. 80, 8, pp.
2071–2079, doi:10.1002/prot.24098, URL
http://dx.doi.org/10.1002/prot.24098.
Ren, P., and Ponder, J. W. (2002). A consistent treatment of inter– and
intramolecular polarization in molecular mechanics calculations, J.
Comput. Chem. 23, pp. 1497–1506.
Ren, P., and Ponder, J. W. (2003). Polarizable atomic multipole water model
for molecular mechanics simulation, J. Phys. Chem. B 107, pp. 5933–
5947.
Ren, P., Wu, C., and Ponder, J. W. (2011). Polarizable atomic multipole-based
molecular mechanics for organic molecules, J. Chem. Theo. Comp. 7, 10,
pp. 3143–3161, doi:10.1021/ct200304d.
Sagui, C., Pedersen, L. G., and Darden, T. A. (2004). Towards an accurate
representation of electrostatics in classical force fields: Efficient
implementation of multipolar interactions in biomolecular simulations,
J. Chem. Phys. 120, pp. 73–87.
Schulten, K., Phillips, J. C., Kale, L. V., and Bhatele, A. (2008). Biomolecular
modeling in the era of petascale computing, in D. Bader (ed.), Petascale
Computing: Algorithms and Applications (Chapman & Hall / CRC Press),
pp. 165–181.
Shaw, D. E., Maragakis, P., Lindorff-Larsen, K., Piana, S., Dror, R. O., Eastwood,
M. P., Bank, J. A., Jumper, J. M., Salmon, J. K., Shan, Y., and Wriggers, W.
(2010). Atomic-Level Characterization of the Structural Dynamics of
Proteins, Science 330, 6002, pp. 341–346.
Stevens, W. J., and Fink, W. H. (1987). Frozen fragment reduced variational
space analysis of hydrogen bonding interactions. applications to the
water dimer, Chem. Phys. Lett. 139, pp. 15–22.
Stone, A. J. (2000). The theory of intermolecular forces (Oxford University
Press, Oxford, UK).
Stone, A. J. (2005). Distributed multipole analysis: Stability for large basis
sets, J. Chem. Theo. Comp. 1, pp. 1128–1132.
Stone, A. J. (2011). Electrostatic damping functions and the penetration
energy, J. Phys. Chem. A 115, 25, pp. 7017–7027.
Stone, J. E., Phillips, J. C., Freddolino, P. L., Hardy, D. J., Trabuco, L. G., and
Schulten, K. (2007). Accelerating molecular modeling applications with
graphics processors, J. Comput. Chem. 28, 16, pp. 2618–2640,
doi:10.1002/jcc.20829.
Temelso, B., Archer, K. A., and Shields, G. C. (2011). Benchmark structures
and binding energies of small water clusters with anharmonicity
corrections, J. Phys. Chem. A 115, 43, pp. 12034–12046.
Toukmaji, A., Sagui, C., Board, J. A., and Darden, T. (2000). Efficient PME-
based approach to fixed and induced dipolar interactions, J. Chem. Phys.
113, pp. 10913–10927.
Volkov, A., and Coppens, P. (2004). Calculation of electrostatic interaction
energies in molecular dimers from atomic multipole moments obtained
by different methods of electron density partitioning, J. Comput. Chem.
25, pp. 921–934.
Wang, B., and Truhlar, D. G. (2010). Including charge penetration effects in
molecular modeling, J. Chem. Theo. Comp. 6, 11, pp. 3330–3342.
Wheatley, R. (2011). Gaussian multipole functions for describing molecular
charge distributions, Mol. Phys. 7, 3, pp. 761–777, doi:10.1021/
ct100530r.
Wheatley, R. J., and Price, S. L. (1990). An overlap model for estimating the
anisotropy of repulsion, Mol. Phys. 69, pp. 507–533.
Wu, Q., Ayers, P. W., and Zhang, Y. (2009). Density-based energy decom-
position analysis for intermolecular interactions with variationally
determined intermediate state energies, J. Chem. Phys. 131, 16, 164112.
Xie, W., and Gao, J. (2007). The design of a next generation force field: The
x-pol potential, J. Chem. Theo. Comp. 3, 6, pp. 1890–1900.
Xie, W., Orozco, M., Truhlar, D. G., and Gao, J. (2009). X-pol potential:
An electronic structure-based force field for molecular dynamics
simulation of a solvated protein in water, J. Chem. Theo. Comp. 5, 3, pp.
459–467.
York, D., and Yang, W. (1994). The Fast Fourier Poisson (FFP) method for
calculating Ewald sums, J. Chem. Phys. 101, pp. 3298–3300.

Chapter 9

Water Models: Looking Forward by Looking Backward

Toshiko Ichiye
Department of Chemistry, Georgetown University, Washington, DC 20057, USA

9.1 Introduction

Computer simulations of biological molecules have come a long
way since the initial 5 ps molecular dynamics (MD) simulations of
bovine pancreatic trypsin inhibitor in vacuum using an empirical
potential energy function (PEF) [1] almost 40 years ago. Recently,
one of the major stumbling blocks in MD simulations of biological
problems, namely sampling, has been greatly alleviated by
special-purpose computers for MD simulations, which can currently
simulate small proteins in water on a millisecond timescale [2].
However, these simulations bring into focus another problem: the
accuracy of the potential energy functions. For instance, simulations
of multiple folding and unfolding events of the villin headpiece in
aqueous solution using two different parameter sets for the protein
and two different parameter sets for the water give the same
final folded state but different folding pathways [3]. In addition,
simulations of several proteins using different parameter sets
indicate that the radius of gyration of the unfolded state is about
half of the experimental value and close to that of the simulated
folded state [4]. Moreover, these simulations indicate that not only
the accuracy of the biomolecular PEF is important, but also that of
the water PEF. In other words, although the biomolecule may be the
subject of interest, the aqueous environment is necessary for the
structure and function of these molecules, and so proper modeling of
water is essential.

Many-Body Effects and Electrostatics in Biomolecules
Edited by Qiang Cui, Markus Meuwly, and Pengyu Ren
Copyright © 2016 Pan Stanford Publishing Pte. Ltd.
ISBN 978-981-4613-92-7 (Hardcover), 978-981-4613-93-4 (eBook)
www.panstanford.com

The importance of the PEF for water is not surprising, since
water has many unique properties as a liquid and as a solvent,
which make it inextricable from our understanding of life processes
[5]. The unusual and even anomalous properties of liquid water
are generally attributed to the strong, directional hydrogen bonds
(H-bonds) between neighboring water molecules [6, 7]. Each water
molecule generally accepts two H-bonds from and donates two
other H-bonds to neighboring water molecules, which are arranged
tetrahedrally (Fig. 9.1). The tetrahedral arrangement leads to the
ice structure in the solid phase, but persists even in liquid water
as a tetrahedral network, although with greater disorder and with
an occasional fifth interstitial water [8]. This tetrahedral network
has long been regarded as the cause of hydrophobic effects, which
are essential for the structure of biological macromolecules such
as proteins [9] and assemblies such as micelles and membranes
[10]. However, although the properties of water have been studied
for many years, the underlying molecular features that give rise to
the interactions between water molecules are still not completely
clear [6], which complicates the development of empirical PEFs that
adequately describe the unique properties of water.
Alternatively, quantum mechanical (QM) treatments of liquid
water are now possible by ab initio molecular dynamics (AIMD)
simulations [11], which use density functional theory (DFT) [12,
13] to calculate the exact ground-state energy in principle. However,
the correct form of the exchange-correlation functional is not
known and the functionals generally used in AIMD simulations
give liquids that are too structured compared to experiment [14–
18]. Progress has been made by adding dispersion effects into
functionals [19] and making fast methods for hybrid functionals
February 2, 2016 14:22 PSP Book - 9in x 6in 09-Qiang-Cui-c09

Figure 9.1 A water molecule and its four hydrogen-bonded nearest neighbors.

[15]. Also, new MP2 Monte Carlo simulations of the liquid are
important in evaluating these efforts [20]. However, perhaps the
major problem for biological simulations is that AIMD and other QM
simulations are so computationally intensive that current studies of
the pure liquid generally consist of fewer than 100 water molecules
for less than 100 ps, somewhat like the situation 40 years ago for
classical MD simulations. Thus, for computer simulations of most
biological applications, empirical PEFs are still needed.
Even among empirical PEFs, water molecules are represented at
different levels of sophistication, including intramolecular stretch-
ing and bending, charge distribution, and electronic polarization. A
balance between accuracy and computational efficiency is required
in simulations of biological molecules where the main focus of
interest is not the surrounding water, since more degrees of
freedom in a water model lead to slower, more memory intensive
simulations. In simple descriptions of a water molecule, the effects
of the missing degrees of freedom can be accounted for as average
values found in the liquid environment. For instance, intramolecular
vibrations are generally ignored and nuclear charge and electron
density are usually represented as “partial charges.” Also, electronic
polarization of a water molecule by its neighboring water molecules
in the liquid phase is often accounted for by increasing the dipole
moment from the gas phase value to one reflective of the condensed
phase. However, as a model is made more complex, additional
degrees of freedom should be included in order of importance so
that they do not mask underlying deficiencies by simply increasing
the number of parameters.
A myriad of PEFs for water have been proposed since the advent
of computer simulations in the 1960s, which have been reviewed
extensively including assessments of the progress in modeling water
[21, 22]. While models of water that consider only on-nuclear partial
charges are not able to model many properties of liquid water,
surprisingly, more recent nonpolarizable rigid models mimic certain
properties reasonably well with off-nuclear negatively charged sites
at very different locations [23, 24]. However, the sensitivities of
different liquid properties to the charge distribution are not clear
nor are the reasons for failure to reproduce all properties. In
addition, much recent interest has been on the explicit treatment
of polarizability as a possible compromise between purely classical
and quantum mechanical models [25]. However, varying success
in the improvement by polarizable over nonpolarizable models
may indicate that the basic models may not adequately describe
some important molecular features. Furthermore, the strategy for
development of both nonpolarizable and polarizable PEFs for water
models is generally based on the water dimer and pure liquid
properties because it is difficult to determine properties of an
individual molecule in the liquid phase either experimentally or
by quantum mechanical calculations. However, since water models
with very different charge distributions can give rise to similar bulk
properties for the pure liquid, matching bulk properties may not be
a stringent enough criterion. Moreover, this may lead to problems
when these models are used in inhomogeneous environments found
in biological simulations.
Here, we examine different types of rigid, nonpolarizable water
models for use as solvents of biologically relevant molecules, both
because they are the most computationally efficient types of models
and because they can inform us as to what should serve as the
gas phase model for water to add polarization to. Given the large
number of water models today, only a few models will be considered
as representative of the different types of models and are selected
because they have been parameterized for use with Ewald-type
electrostatics [26], which are currently the preferred methods
in simulations of biomacromolecules. The focus here is on how
well each type of model can describe the molecular features that
give rise to the unusual properties of water rather than on how
well optimized the parameters are. In other words, given that a
type of model can adequately describe the molecular features, its
parameters can be optimized. Since good pure water properties are
essential for aqueous solvation, a review of pure water properties is
given first, followed by key issues in solutions and interfaces.

9.2 Potential Energy Functions for Liquid Water

Potential energy functions for water used in computer simulations
can be classified by how a water molecule and its interactions are
represented. For the models here, the basic form of the interaction
potential energy E between two water molecules consists of van
der Waals and electrostatic interactions, denoted by the subscripts “vdW”
and “el,” respectively:
$$E = E_{\mathrm{vdW}} + E_{\mathrm{el}} \tag{9.1}$$
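The van der Waals interactions between two water molecules
are generally described by a Lennard–Jones m–6 potential (m > 6),
representing repulsive and London dispersion interactions:
$$E_{\mathrm{vdW}} = \sum_{\alpha,\beta} \varepsilon_{\alpha\beta}\, \frac{m}{m-6} \left(\frac{m}{6}\right)^{6/(m-6)} \left[ \left(\frac{\sigma_{\alpha\beta}}{r_{\alpha\beta}}\right)^{m} - \left(\frac{\sigma_{\alpha\beta}}{r_{\alpha\beta}}\right)^{6} \right] \tag{9.2}$$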
where r is the interatomic distance, σ is the atomic diameter, and
ε is the well-depth, with the summation over the atoms α of one
molecule and β of the other molecule. For the repulsive term, m = 12
is commonly chosen, although this value probably overestimates the
steepness of the repulsion [21] and m = 9 appears to work equally
well [27, 28] if not better [29]. An alternative for the repulsive
term is the Buckingham potential, which has been largely avoided
in biological simulations up to now because of the computational
slowness of the exponential function. Although lookup tables make
use of this potential feasible, its exponential repulsive term converges to a finite constant as r → 0,
which is unphysical. The hydrogen is generally assumed to have no
van der Waals interactions so that σOH = σHH = 0, which reduces the
total number of interactions but makes most water models “coarse-
grained” as far as their treatment of van der Waals interactions.
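Combining rules are usually used for interactions of water molecules
with other types of molecules; the Lorentz–Berthelot rules
($\sigma_{\alpha\beta} = \tfrac{1}{2}(\sigma_{\alpha\alpha} + \sigma_{\beta\beta})$, $\varepsilon_{\alpha\beta} = \sqrt{\varepsilon_{\alpha\alpha}\varepsilon_{\beta\beta}}$)
are frequently used [26], although the
inadequacy of these rules has often been pointed out [30].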
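As an illustration of Eq. 9.2 and of the Lorentz–Berthelot rules, the following minimal sketch evaluates the Lennard–Jones m–6 energy for a single site pair. The function names and the numerical values of σ and ε are hypothetical placeholders chosen for illustration; they are not the parameters of any model discussed here.

```python
import math

def lj_m6(r, sigma, epsilon, m=12):
    """Lennard-Jones m-6 energy of one site pair, following Eq. 9.2."""
    if m <= 6:
        raise ValueError("the repulsive exponent m must be greater than 6")
    # Prefactor that places the minimum of the potential at -epsilon.
    prefactor = (m / (m - 6)) * (m / 6) ** (6 / (m - 6))
    x = sigma / r
    return epsilon * prefactor * (x ** m - x ** 6)

def r_min(sigma, m=12):
    """Separation at which the m-6 potential is minimal (from dE/dr = 0)."""
    return sigma * (m / 6) ** (1 / (m - 6))

def lorentz_berthelot(sigma_a, eps_a, sigma_b, eps_b):
    """Arithmetic mean for sigma, geometric mean for epsilon."""
    return 0.5 * (sigma_a + sigma_b), math.sqrt(eps_a * eps_b)

if __name__ == "__main__":
    sigma, epsilon = 3.15, 0.15  # hypothetical values (Å, kcal/mol)
    for m in (12, 9):
        print(m, r_min(sigma, m), lj_m6(r_min(sigma, m), sigma, epsilon, m))
```

For any m > 6 the energy vanishes at r = σ and equals −ε at the minimum, so changing m alters only the steepness of the repulsive wall and the position of the minimum, which is the degree of freedom discussed above for m = 12 versus m = 9.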
The major difference between most models is in the electrostatic
interactions, which give rise to the H-bonding critical for the
properties of water. The models examined here assume H-bonding
can be described completely by E el with no covalent character,
although this may not be true at close distances [31]. We examine
two types of models, which differ in how electrostatic interactions
between molecules are approximated, either by “partial charges”
using Coulomb’s law in the multisite models or by multipoles in an
expansion of Coulomb’s law in the multipole models. To simplify the
discussion, a molecular coordinate frame is defined with the center
at the oxygen, the positive z-axis along the dipole vector pointing
towards the hydrogens, and the y-axis parallel to the H–H bond
(Fig. 9.2).

9.2.1 Multisite Models


The most commonly used PEFs for water in computer simulations
today use rigid, nonpolarizable multisite models of a water molecule
(Fig. 9.3). This is in part because they are perhaps the most
chemically intuitive and in part because they are simple to
implement in MD computer simulations, requiring only standard
algorithms for Newtonian dynamics [26] and constraints such as
the SHAKE algorithm [32] to keep the molecule rigid. The models
discussed here use the “gas phase” nuclear geometry with an OH
bond length of bOH = 0.9572 Å and an HOH bond angle of θHOH
= 104.52◦ [33], although neutron diffraction studies indicate slight
increases in both [34, 35] and a recent polarizable model uses
a geometry reflective of this increase [36, 37]. Three sites are

Figure 9.2 The molecular coordinate frame of water.

Figure 9.3 Multisite models of water: (a) three-site, (b) four-site, (c) five-site.

located on the nuclei, with mass and “partial charges” representing
concentrations of partial positive or negative charge due to a sum of
local electron density and nuclear charge. Additionally, off-nuclear,
massless sites with partial charges may be included to represent
electron density not centered on nuclei. However, the idea that
electron density can be represented as point charges, whether on the
nuclei or on dummy sites, is questionable at an atomistic scale. Also,
partial charges are not physical quantities that can be measured or
calculated directly from quantum mechanical calculations so that
recipes are required for partitioning the molecular wave function
into atomic contributions or for matching observables such as
the electrostatic potential that can be calculated from the wave
function against the same observables calculated from the partial
charges [13]. Since the partial charges are highly dependent on the
recipe used, vastly different simulated properties can result. Finally,
additional sites decrease computational efficiency.
The electrostatic interaction energy between two multisite water
molecules is given by Coulomb’s law:
$$E_{\mathrm{el}} = \sum_{\alpha,\beta} \frac{q_\alpha q_\beta}{r_{\alpha\beta}}, \tag{9.3}$$
where r is the inter-site distance and q are the partial charges of
the sites, with the summation over the sites α of one molecule and
β of the other molecule. The n-site models with n > 3 discussed
below have no partial charge on the oxygen and no Lennard–Jones
interaction for the hydrogens, so the number of site–site interactions
per pair of molecules, and hence the computational cost, goes
as $(n-1)^2 + 1$ rather than as $n^2$. Electrostatic interactions with
other types of molecules (i.e., solutes) are represented by Coulombic
interactions with the partial charges of the other molecules.
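The double sum of Eq. 9.3 and the site-pair count can be sketched as follows. The conversion constant (for energies in kcal/mol with charges in units of e and distances in Å) and the function names are assumptions introduced for illustration; Eq. 9.3 itself is written without a unit prefactor.

```python
import math

# Assumed conversion factor: e^2/Å to kcal/mol.
COULOMB_KCAL = 332.0637

def coulomb_energy(sites_a, sites_b):
    """Eq. 9.3 for two molecules given as lists of (charge, (x, y, z))."""
    energy = 0.0
    for q_a, r_a in sites_a:
        if q_a == 0.0:
            continue  # uncharged sites contribute no Coulomb terms
        for q_b, r_b in sites_b:
            if q_b == 0.0:
                continue
            energy += COULOMB_KCAL * q_a * q_b / math.dist(r_a, r_b)
    return energy

def pair_interactions(n_sites):
    """Site-site terms per molecule pair for an n-site model with n > 3:
    (n - 1)^2 Coulomb terms between charged sites plus one O-O
    Lennard-Jones term."""
    charged = n_sites - 1
    return charged ** 2 + 1

if __name__ == "__main__":
    for n in (4, 5):
        print(n, pair_interactions(n), n ** 2)
```

For a four-site model this gives 10 terms instead of 16, and for a five-site model 17 instead of 25.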
Three types of multisite models will be discussed, using a
representative model for each type. Because of the numerous water
models in the literature, the reader is referred to other references
for other examples of each type of model [21–24]. The simple
three-site models (Fig. 9.3a), represented here by TIP3P [38] (the
CHARMM force field uses a modified version of TIP3P [39]), have a
partial negative charge on the single oxygen site and partial positive
charges on the two hydrogen sites. Although long recognized to
be unable to reproduce many properties of water [40, 41], they
are computationally efficient and so are often used in biomolecular
simulations. In addition to the three nuclear sites, the four-site
models (Fig. 9.3b), represented here by TIP4P-Ew [42], have a
massless “M” site with a partial charge qM at distance bOM from
the oxygen along the dipole vector towards the hydrogens, i.e., in
the (0, 0, +z) direction in the molecular frame (Fig. 9.2). M sites
were originally proposed to increase the quadrupole moment to
be more consistent with experimental results [40], and might be
considered to represent a shift in the center of the electron density
in the molecular plane away from the oxygen and towards the
hydrogens. Alternatively, in addition to the three nuclear sites, the
five-site models (Fig. 9.3c), represented by TIP5P-E [43], have two
additional massless “L” sites with partial charge qL at distance
bOL and angle θLOL with respect to the oxygen, symmetric about the
dipole vector but out of the molecular plane, i.e., in the (±x, 0, −z)
directions in the molecular frame (Fig. 9.2). Although the L sites
are often considered to represent sp3-hybridized “lone pairs,” QM
calculations have long indicated that the “lone pair” effects are from
the highest occupied molecular orbital (HOMO), which arises from
a 2p orbital perpendicular to the molecular plane [44]. Models with
more sites have been devised, but are not considered here because
of the computational cost.
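The site layouts described above can be written down directly in the molecular frame of Fig. 9.2. The sketch below uses the gas-phase geometry quoted in the text; the charges and the M- and L-site geometries are the values commonly quoted for the published models and should be read as assumptions, since they are not listed in this chapter. As a consistency check, the resulting dipoles are compared with the μ0 column of Table 9.1.

```python
import math

B_OH = 0.9572                     # Å, OH bond length quoted in the text
THETA_HOH = math.radians(104.52)  # HOH angle quoted in the text
E_ANGSTROM_TO_DEBYE = 4.803204    # 1 e·Å expressed in debye

def hydrogens(q_h):
    """Hydrogen sites at (0, ±y, +z): y parallel to H-H, z along the dipole."""
    half = THETA_HOH / 2.0
    y, z = B_OH * math.sin(half), B_OH * math.cos(half)
    return [(q_h, (0.0, y, z)), (q_h, (0.0, -y, z))]

def three_site(q_h):
    """Negative charge on the oxygen at the origin."""
    return hydrogens(q_h) + [(-2.0 * q_h, (0.0, 0.0, 0.0))]

def four_site(q_h, b_om):
    """Negative charge on an M site at (0, 0, +b_om); the oxygen is uncharged."""
    return hydrogens(q_h) + [(-2.0 * q_h, (0.0, 0.0, b_om))]

def five_site(q_h, b_ol, theta_lol_deg):
    """Negative charges on two L sites at (±x, 0, -z); the oxygen is uncharged."""
    half = math.radians(theta_lol_deg) / 2.0
    x, z = b_ol * math.sin(half), -b_ol * math.cos(half)
    return hydrogens(q_h) + [(-q_h, (x, 0.0, z)), (-q_h, (-x, 0.0, z))]

def dipole_debye(sites):
    """z-component of the dipole about the oxygen, in debye."""
    return E_ANGSTROM_TO_DEBYE * sum(q * r[2] for q, r in sites)

# Assumed parameters (charges in e, lengths in Å, angle in degrees).
MODELS = {
    "TIP3P": three_site(0.417),
    "TIP4P-Ew": four_site(0.52422, 0.125),
    "TIP5P-E": five_site(0.241, 0.70, 109.47),
}

if __name__ == "__main__":
    for name, sites in MODELS.items():
        print(f"{name:9s} mu0 = {dipole_debye(sites):.2f} D")
```

Each list holds only the charged sites, so the four- and five-site models contain n − 1 entries, in line with the site-pair count discussed for Eq. 9.3.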

9.2.2 Molecular Multipole Models


Molecular multipole models represent a water molecule by a single
site with a van der Waals sphere and electric multipole tensors
of the entire molecule [45] (Fig. 9.4). The electrostatic interaction
energy between molecules is obtained by expanding Coulomb’s
law around a single site rather than by multiple sites with partial
charges. A significant advantage of the multipole approach is that

Figure 9.4 Molecular multipoles, from left to right: μ0, a linear dipole; Θ0, a linear quadrupole; Θ2, a square quadrupole; Ω0, a linear octupole; and Ω2, a cubic octupole, in which positive charge is light and negative charge is dark.

multipoles are well-defined quantities for a molecule in the gas
phase that can be calculated from the electron density determined
by the wave function from a quantum mechanical calculation.
In principle, they may also be measured experimentally in the
gas phase, although so far only the dipole and quadrupole have
been measured for water [45–47]. In the condensed phase, the
problem of partitioning electron density to separate molecules
arises, but is more defined than the problem of partitioning electron
density between bonded atoms in calculating partial charges of
site molecules. In fact, while partial charges are often obtained
by fitting electrostatic potentials obtained from QM calculations,
which requires knowledge of nuclear and dummy site positions, a
multipole expansion does not require knowledge of either nuclear
or dummy site positions and the multipoles can be obtained directly
from QM calculations. The only need for nuclear positions arises
from the moments of inertia needed for the dynamics. Moreover,
since the expansion is exact in the limit of infinite distance or infinite
number of terms outside of the charge distribution, the number
of terms determines the accuracy of the electrostatic potential at
contact distances. Thus, these models are not necessarily any more
coarse-grained than multisite models using artificial point charges
in representing the electrostatic potential of a water molecule due
to its nuclei and electron density.
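The dependence of the accuracy on distance and on the number of terms can be checked numerically. The sketch below compares the exact potential of a three-site point-charge water with its expansion about the oxygen, truncated after the dipole, quadrupole, or octupole term. Summing Legendre terms over the sites, as done here, is equivalent to contracting the molecular multipole tensors with the direction of the field point. The TIP3P-like hydrogen charge (0.417 e) is an assumption, and potentials are in units of e/Å.

```python
import math

B_OH = 0.9572                        # Å, gas-phase OH bond length from the text
HALF_HOH = math.radians(104.52) / 2  # half of the HOH angle from the text
Q_H = 0.417                          # e, assumed TIP3P-like hydrogen charge

def sites():
    """Three charged sites in the molecular frame of Fig. 9.2."""
    y, z = B_OH * math.sin(HALF_HOH), B_OH * math.cos(HALF_HOH)
    return [(-2 * Q_H, (0.0, 0.0, 0.0)), (Q_H, (0.0, y, z)), (Q_H, (0.0, -y, z))]

def exact_potential(point):
    """Direct Coulomb sum over the point charges."""
    return sum(q / math.dist(point, r) for q, r in sites())

def expanded_potential(point, max_order):
    """Expansion about the oxygen through order 1 (dipole), 2 (quadrupole),
    or 3 (octupole), using 1/|R - r| = sum_l r^l P_l(cos g) / R^(l+1)."""
    big_r = math.sqrt(sum(c * c for c in point))
    n = [c / big_r for c in point]
    phi = 0.0
    for q, r in sites():
        rn = sum(a * b for a, b in zip(r, n))  # projection of r on the direction
        r2 = sum(a * a for a in r)
        terms = [1.0, rn, 0.5 * (3 * rn ** 2 - r2), 0.5 * (5 * rn ** 3 - 3 * rn * r2)]
        for order in range(max_order + 1):
            phi += q * terms[order] / big_r ** (order + 1)
    return phi

def truncation_error(point, max_order):
    return abs(expanded_potential(point, max_order) - exact_potential(point))

if __name__ == "__main__":
    for label, point in (("x, 5 Å", (5.0, 0.0, 0.0)), ("z, 10 Å", (0.0, 0.0, 10.0))):
        errors = [truncation_error(point, k) for k in (1, 2, 3)]
        print(label, ["%.2e" % e for e in errors])
```

For this charge set the truncation error does not fall monotonically with each added term in every direction; along the dipole axis, for example, the octupole term is needed before the expansion improves on the dipole-only result.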
The multipole expansion of Coulomb’s law up to the octupole,
which can be found in [48], is more complicated than Eq. 9.3.
The advantage is that the complexity in the expression leads
to computational efficiency in computer simulations once it is
programmed, since only one distance between two interacting water
molecules is needed. The downside is that while it becomes more
accurate as higher-order multipoles are added, it also becomes
computationally slower, since the multipole of order l (the $2^l$-pole)
is a rank-l tensor. A soft-sphere model
with a dipole, quadrupole, and octupole (SSDQO), which is exact
up to the 1/r 4 term and in addition approximates the 1/r 5 term,
has been developed for computational efficiency [48]. However, the
recent implementation of a fast multipole method in the molecular
dynamics program CHARMM [80] should make this approximation
unnecessary; specifically, the full multipole expansion up to the
octupole is actually two to three times faster than a three-site
model such as TIP3P. In addition, the electrostatic interactions
with other types of molecules represented by partial charges as in
typical force fields can be treated via the charge-multipole terms
of the expansion. Alternatively, new multipole force fields can be
developed for other molecules, again with the advantage that the
multipoles can be calculated directly from the wave functions from
quantum mechanical calculations. The advantages in modeling
off-nuclear density without additional computational cost also apply
to polar moieties, and may even apply to treating hydrogens bonded
to nonpolar molecules.
So far, the multipole approach has been rather limited for
modeling liquid water. For rigid, nonpolarizable models, the SSDQO1
discussed here uses multipoles up to the octupole that have been
optimized for various properties of liquid water at STP [29]. In
addition, SCME, a polarizable molecular multipole model using
multipoles up to the hexadecapole, has recently been developed
[49].

9.2.3 Summary
Multisite and multipole models can both be used to generate an
electrostatic potential due to a water molecule. Although more
common, multisite models assume that electron density can be
described by point charges located on the nuclei and sometimes
even on off-nuclear sites. This leads to simple expressions for
intermolecular interactions, although computational time increases
according to the number of sites. Multipole models can give an
accurate representation of the electrostatic potential due to the
nuclei and electron density comprising a molecule if a sufficient
number of terms are included, although computational time
increases according to the number of terms. Since models using
multipoles up to the octupole are more flexible in describing a
charge distribution than four- and five-site models for water and
since new fast computational methods make these models faster
than three-site models, molecular multipole models are a viable
alternative to multisite models.

9.3 The Pure Liquid

9.3.1 The Water Molecule in the Liquid Phase


The properties of a water molecule in the gas phase are well studied
experimentally [33, 45–47, 50] and theoretically [44, 51]. However,
determining molecular properties of an individual molecule in the
liquid phase is much more difficult because they will be altered by
the fluctuating, locally inhomogeneous environment created by the
surrounding water molecules. Although neutron diffraction studies
can tell us about the geometry [34, 35], comparisons of empirical
water models with QM calculations of molecular properties of a
water molecule in the liquid phase [52–55] in addition to experimental
bulk properties are perhaps the best means of assessing how well
the model represents a water molecule in the liquid phase.
Recently, the molecular charge distribution of a water molecule
in a liquid-like environment has been examined in quantum
mechanical/molecular mechanical (QM/MM) calculations [55]. The
calculated charge distribution has a large quadrupole consistent
with gas phase experiments [47] and liquid phase QM simulations
[52–54]. The large quadrupole is due to the charges of the hydrogens
and to electron density of the HOMO, a predominately p-type
orbital perpendicular to the molecular plane with no evidence of
sp3 hybridization that gives rise to lone pairs, much like in the gas
phase [44]. This charge distribution can be distinguished from lone
pairs or in-plane charge shifts because the octupole is intermediate
between the two. In addition, recent polarizable models such as
the QDO [56] and TL6P [37] models with polarizable 3D Gaussian
negative charge density also suggest that out-of-plane charge is
important in addition to the large quadrupole. However, while
it is reasonable to model the nuclear charges as point charges,
the electron density is far from localized in the QM calculations
(Fig. 9.5). In particular, the electron density increases in two
maxima ∼0.25 Å from the oxygen, out of the molecular plane, which
compares favorably with the AIMD simulations [52]. In addition, this
HOMO density is present even in the gas phase [44, 55], indicating
that polarizable models should take this out-of-plane charge into
account in the fixed charges, i.e., in the gas phase.

Figure 9.5 Difference in electron density of the molecule from free atoms
with four MM neighbors at the MP2/6-31G** level/basis set from a
geometry optimization at the B3LYP/6-31G** level/basis set (a) in the plane
of the molecule and (b) perpendicular to the molecular plane. Solid contours
are positive differences, dashed contours are negative differences.

Spherical multipoles also provide a means to compare classical
versus QM representations of water. In particular, they describe the
charge distribution of a molecule in progressively higher levels of
detail as the order of the multipole increases (Fig. 9.4). For instance,
the large dipole μ of water is due to the separation of the net
positive charge of the hydrogens in the +z direction from the net
negative charge of the excess electron density on the oxygen in
the −z direction. Next, the large planar quadrupole Θ2 is due to
the separation of the two hydrogens in the ±y directions and of
negatively charged electron density in the ±x directions. Also, the
linear quadrupole Θ0 is small because little positive charge occurs in
the −z direction. In addition, the linear octupole Ω0 is large because
the positive charge of the hydrogens lies in the +z direction but off the
z-axis. Finally, the cubic octupole Ω2 will vary according to how much
positive charge is along the molecular plane in the +z direction
versus how much negative charge is perpendicular to the molecular
plane in the −z direction.
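To make these five components concrete, the sketch below evaluates them for a three-site point-charge model about the oxygen. The component definitions are assumptions: they are one common traceless convention, chosen here because, with the usual TIP3P hydrogen charge of 0.417 e (also an assumption, not stated in the text), they reproduce the TIP3P row of Table 9.1 to the quoted precision.

```python
import math

B_OH = 0.9572                        # Å, OH bond length quoted in the text
HALF_HOH = math.radians(104.52) / 2  # half of the HOH angle quoted in the text
Q_H = 0.417                          # e, assumed TIP3P hydrogen charge
DEBYE = 4.803204                     # 1 e·Å in debye

def tip3p_sites():
    """Charged sites in the molecular frame of Fig. 9.2 (oxygen at origin)."""
    y, z = B_OH * math.sin(HALF_HOH), B_OH * math.cos(HALF_HOH)
    return [(-2 * Q_H, (0.0, 0.0, 0.0)), (Q_H, (0.0, y, z)), (Q_H, (0.0, -y, z))]

def multipoles(sites):
    """Return (mu0, theta0, theta2, omega0, omega2) in D, DÅ, DÅ, DÅ^2, DÅ^2.

    Assumed definitions (sums over point charges q at (x, y, z), r^2 = x^2+y^2+z^2):
      mu0    = sum q z
      theta0 = sum q (3 z^2 - r^2) / 2
      theta2 = (3/4) sum q (y^2 - x^2)
      omega0 = sum q (5 z^3 - 3 z r^2) / 2
      omega2 = (5/4) sum q z (y^2 - x^2)
    """
    mu0 = theta0 = theta2 = omega0 = omega2 = 0.0
    for q, (x, y, z) in sites:
        r2 = x * x + y * y + z * z
        mu0 += q * z
        theta0 += q * (3 * z * z - r2) / 2
        theta2 += 0.75 * q * (y * y - x * x)
        omega0 += q * (5 * z ** 3 - 3 * z * r2) / 2
        omega2 += 1.25 * q * z * (y * y - x * x)
    return tuple(DEBYE * value for value in (mu0, theta0, theta2, omega0, omega2))

if __name__ == "__main__":
    labels = ("mu0", "theta0", "theta2", "omega0", "omega2")
    for label, value in zip(labels, multipoles(tip3p_sites())):
        print(f"{label:7s} {value:6.2f}")
```

Because the hydrogens are the only off-origin charges in a three-site model, every moment is fixed by the hydrogen charge and the nuclear geometry alone, which is why three-site models have so little freedom to adjust Θ2 and Ω2 independently.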
The multipole moments of the classical models and QM results
[55] are compared in Table 9.1. In examining the QM results, the
moments increase by ∼10 to ∼30% from the gas phase to the
liquid phase. The classical models have moments that are generally

Table 9.1 Multipoles for multisite and molecular multipole models, quantum mechanical calculations, and experiment

Source               μ0 (D)   Θ0 (DÅ)   Θ2 (DÅ)   Ω0 (DÅ²)   Ω2 (DÅ²)
TIP3P                2.35     0.23      1.72      −1.21      1.68
TIP4P-Ew             2.32     0.21      2.16      −1.53      2.11
TIP5P-E              2.29     0.13      1.56      −1.01      0.59
SSDQO1               2.12     0.00      2.13      −1.34      1.15
MP2+4MM (cluster)    2.49     0.13      2.93      −1.73      2.09
MP2 (gas phase)      1.86     0.11      2.54      −1.35      1.91
Exp (gas phase)      1.86     0.11      2.57      NA         NA

Note: All moments are with respect to a molecular coordinate system centered on the oxygen.

smaller than QM, presumably because the electron density is more
spread out and thus at atomic separations may be reflected better
by smaller moments. However, a major discrepancy is that Θ2 of all
of the classical models is much smaller than the QM value even in
the gas phase. In addition, although TIP4P-Ew has the largest Θ2 of
the multisite models, its Ω2 is even larger than the QM value in the
cluster. To understand these differences better, the multipoles are
examined as ratios (Table 9.2). To account for differences in classical
versus QM descriptions, the size of the planar quadrupole relative
to the dipole is measured by $m_{\mathrm{quad}} = \frac{2}{\sqrt{3}}\,\Theta_2/(\mu_0\,\text{Å})$. It is equal
to 1 for a three-site model with a perfectly tetrahedral HOH angle
of 109.47° and OH bond length of 1 Å. Comparing the QM results
in the gas and liquid-like phases in Tables 9.1 and 9.2, electronic
polarization affects the dipole more strongly than the quadrupole.
Even so, the classical models still underpredict Θ2, with TIP4P-Ew
and SSDQO1 having the largest mquad. Another feature of the charge
distribution, the out-of-plane character, is measured by
$m_{\mathrm{out}} = 1 - \Omega_2/\left(\frac{1}{\sqrt{3}}\,\Theta_2\,\text{Å} - \frac{1}{2}\,\Omega_0\right)$. It is equal to 0 for a three-site model with a
perfectly tetrahedral HOH angle of 109.47° and OH bond length of
1 Å and to 1 for a five-site model with perfectly tetrahedral HOH
and LOL angles of 109.47° and OH and OL lengths of 1 Å. Comparing
the QM results in the gas and liquid-like phases in Table 9.2, out-of-
plane character is apparently present in both phases and the liquid-
like environment increases the out-of-plane character somewhat.
Only the SSDQO1 and TIP5P-E models show out-of-plane character,

Table 9.2 Ratios of multipoles and first hydration shell order (see text) from QM, multisite, and molecular multipole models

Source                   mquad   mout    S²      ΔS²
TIP3P                    0.85    −0.05   0.597   0.134
TIP4P-Ew                 1.08    −0.05   0.685   0.126
TIP5P-E                  0.79    0.58    0.700   0.031
SSDQO1                   1.16    0.39    0.708   0.022
MP2+4MM (liquid-like)    1.33    0.17    NA      NA
MP2 (gas phase)          1.58    0.11    NA      NA

although overpredicting the amount compared to the QM results,
especially in TIP5P-E.
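The two ratios can be recomputed from the rounded moments of Table 9.1. The sketch below assumes the forms mquad = (2/√3) Θ2/(μ0 Å) and mout = 1 − Ω2/(Θ2 Å/√3 − Ω0/2); these forms satisfy the limiting cases stated in the text (mquad = 1 and mout = 0 for an ideal tetrahedral three-site model). The dictionary and function names are hypothetical.

```python
import math

# (mu0 [D], theta2 [DÅ], omega0 [DÅ^2], omega2 [DÅ^2]) copied from Table 9.1.
TABLE_9_1 = {
    "TIP3P": (2.35, 1.72, -1.21, 1.68),
    "TIP4P-Ew": (2.32, 2.16, -1.53, 2.11),
    "TIP5P-E": (2.29, 1.56, -1.01, 0.59),
    "SSDQO1": (2.12, 2.13, -1.34, 1.15),
    "MP2 (gas phase)": (1.86, 2.54, -1.35, 1.91),
}

def m_quad(mu0, theta2):
    """Planar quadrupole relative to the dipole (length unit of 1 Å implied)."""
    return (2.0 / math.sqrt(3.0)) * theta2 / mu0

def m_out(theta2, omega0, omega2):
    """Out-of-plane character: 0 for in-plane charge only, 1 for ideal 'lone pairs'."""
    return 1.0 - omega2 / (theta2 / math.sqrt(3.0) - 0.5 * omega0)

if __name__ == "__main__":
    for name, (mu0, theta2, omega0, omega2) in TABLE_9_1.items():
        print(f"{name:16s} mquad = {m_quad(mu0, theta2):5.2f}  "
              f"mout = {m_out(theta2, omega0, omega2):5.2f}")
```

With these inputs the classical-model entries of Table 9.2 are recovered to within rounding, for example mquad ≈ 0.85 and mout ≈ −0.05 for TIP3P.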
Overall, the QM results indicate both a large quadrupole and out-
of-plane character in the charge distribution of a water molecule
in the liquid phase. Only SSDQO1 has multipoles consistent with
both features, although somewhat too little for the former and too
much for the latter. Furthermore, it was shown that multisite models
require at least six points to reproduce moments consistent with
the QM results [55], as has also been found for polarizable multisite
models [37, 57]. On the other hand, it was also shown [55] that
multipole models are able to reproduce electrostatic potentials due
to the QM charge distribution with moments up to the octupole.

9.3.2 Liquid Water


A vast literature exists on the experimental properties of bulk
liquid water at different temperatures and pressures and the
AIMD simulations mentioned in the introduction can also provide
qualitative information with the caveats noted in the introduction.
The ability of different rigid multisite models to reproduce various
properties of water is partly due to how well the model represents
the key molecular features but is also partly due to how well the
parameters have been optimized for the properties, with more sites
giving more degrees of freedom to optimize. As mentioned in the
introduction, the focus here is on how the different types of models

Figure 9.6 Three-dimensional distribution of neighboring water (red density) around a central water molecule (red and white stick model), contoured at 3 times bulk. Left to right, TIP3P, TIP4P-Ew (M site in blue), TIP5P-E (L sites in pink), and SSDQO1 (M and L sites in blue and pink, respectively, to indicate that both effects are included).

(Section 9.2) perform rather than how well the parameterization of
a particular model reproduces experimental values.
The hydration shell of a water molecule in the liquid can be
viewed as the link between the molecule and the bulk liquid. The
three-dimensional (3D) distribution of water molecules around a
water molecule gives a more detailed picture of the environment of
the water molecule [18, 58, 59]; however, while readily calculated
from simulations, it is hard to obtain directly from experiment.
3D-distributions of the first shell water oxygens from the classical
models (Fig. 9.6) all predict that the H-neighbors are localized into
peaks along the direction of the OH vectors. On the other hand, the
L-neighbors become increasingly localized in the “lone pair” directions
in the order TIP3P < TIP4P-Ew < TIP5P-E ≈ SSDQO1. (Since the
3D distribution of the hydrogens [18] and the orientation of the
OH vectors [59] are necessary to demonstrate that the neighbors
correspond to actual H-bond acceptors and donors, the terms H-
and L-neighbors are used to denote the neighbors in the hydrogen
or “lone pair” directions, respectively.)
To quantify the hydration shell of a water molecule, a tetrahedral
order parameter S² has been proposed [60] as an average of order
parameters in the H- and L-directions of a central water, given by
$$S_\alpha^2 = \frac{1}{2} \sum_{j=1}^{2} P_2(\mathbf{u}_{\alpha j} \cdot \mathbf{d}_{\alpha j}), \tag{9.4}$$
where α = H, L; P2 is the second-order Legendre polynomial;
d is a unit vector between the central water and an α nearest
neighbor water j; and u are unit vectors in the four α tetrahedral
directions in the molecular frame of the central water. S² = 1 for
perfect tetrahedral order; S² = 0 for a random distribution. Linear
regressions of S² and ΔS² with the multipoles of a series of three-,
four-, and five-site models as well as multipole models for water
show that the hydration shell is a function of the moments up to the
octupole [60]. The overall order of the hydration shell S² increases
quadratically with Θ2 but decreases linearly with Ω2, so that the
four-site models (which have large Θ2), the five-site models (which
have small Ω2), and the multipole models (which have large Θ2 and
small Ω2) all have more ordered hydration shells (Fig. 9.6). Also, the
degree of molecular symmetry between the location of charge in the
H and L directions increases with decreasing Ω2, so that the three-
and four-site models have large ΔS² = SH² − SL² while five-site and
multipole models have small ΔS² (Fig. 9.6).
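A minimal sketch of how Eq. 9.4 could be evaluated for a single configuration is given below. It assumes that the two H-neighbors and two L-neighbors have already been identified and matched to the ideal tetrahedral directions, and that S² is the mean of the H and L order parameters; in a simulation these quantities would additionally be averaged over molecules and configurations. The function names are hypothetical.

```python
import math

TETRAHEDRAL = math.radians(109.47)

def p2(x):
    """Second-order Legendre polynomial."""
    return 0.5 * (3.0 * x * x - 1.0)

def tetrahedral_directions():
    """Ideal unit vectors u in the molecular frame of Fig. 9.2:
    H directions in the (y, z) plane, L directions in the (x, -z) half-plane."""
    s, c = math.sin(TETRAHEDRAL / 2), math.cos(TETRAHEDRAL / 2)
    return {"H": [(0.0, s, c), (0.0, -s, c)],
            "L": [(s, 0.0, -c), (-s, 0.0, -c)]}

def order_parameters(neighbors):
    """neighbors maps "H" and "L" to two unit vectors d each (central water
    to neighbor). Returns (S2, dS2) with S2 the mean of S_H^2 and S_L^2
    (assumed) and dS2 = S_H^2 - S_L^2."""
    ideal = tetrahedral_directions()
    s_alpha = {}
    for alpha in ("H", "L"):
        total = 0.0
        for u, d in zip(ideal[alpha], neighbors[alpha]):
            total += p2(sum(a * b for a, b in zip(u, d)))
        s_alpha[alpha] = 0.5 * total  # Eq. 9.4
    return 0.5 * (s_alpha["H"] + s_alpha["L"]), s_alpha["H"] - s_alpha["L"]

if __name__ == "__main__":
    print(order_parameters(tetrahedral_directions()))  # ideal shell
    skewed = {"H": tetrahedral_directions()["H"],
              "L": [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]}
    print(order_parameters(skewed))
```

The second call places both L-neighbors perpendicular to their ideal directions, which lowers S² and produces a large ΔS², the signature of a shell that is well ordered on the hydrogen side only.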
Rather than an exhaustive comparison of different properties,
the dependence of a few key properties on the tetrahedral order of
the hydration shell has revealed important features of the hydration
[60]. In particular, the temperature of maximum density TMD has
been linked with the ability to show hydrophobic effects, the
diffusion constant D sets the timescale with respect to the forces,
and the dielectric permittivity ε is a measure of the dielectric
properties of the fluid. Comparing the series of water models, a
linear regression of the calculated TMD with S² gives
$$T_{\mathrm{MD}} \approx T_{\mathrm{MD,exp}} \left( \frac{2}{0.7}\, S^2 - 1 \right), \tag{9.5}$$
where TMD,exp = 277 K with a correlation coefficient of R² = 0.998
(Fig. 9.7, bottom) while a linear regression of the calculated D with
S² gives
$$D = D_{\mathrm{exp}} + (20.2 \times 10^{-9}\ \mathrm{m^2/s}) \left( 1 - \frac{1}{0.706}\, S^2 \right), \tag{9.6}$$
where Dexp = 2.31 × 10⁻⁹ m²/s with R² = 0.985 (Fig. 9.7, top).
Thus, both the TMD and the D show universal scaling for all of the
models examined with the average order of the hydration shell and
both converge to the experimental values at S² ≈ 0.7. However, the
Kirkwood gK-factor [61], which is related to ε by
$\mu_0^2 g_K = k_B T (\varepsilon - 1)(2\varepsilon + 1)/(4\pi\rho\varepsilon)$, depends on both S² and ΔS²; thus,
$$g_K \approx 1 + 50 \left( 1 - \frac{1}{0.18}\, \Delta S^2 \right) \left( 1 - \frac{1}{0.745}\, S^2 \right). \tag{9.7}$$

Figure 9.7 Difference from experiment of simulated diffusion constant, D (solid symbols, left axis), and of simulated temperature of maximum density, TMD (open symbols, right axis), as a function of S². The linear regressions of D (dashed line) and TMD (dotted line) given in Eqs. 9.5 and 9.6 are also shown. Symbols indicate three- (triangles), four- (diamonds), five- (squares), and multipole (circles) models, with TIP3P, TIP4P-Ew, TIP5P-E, and SSDQO1 indicated by larger symbols.

Since ΔS² is different for models with and without out-of-plane
charge (Fig. 9.8, open symbols; the results are plotted as μ0² gK since
μ0 in the liquid is not known), three- and four-site models are
only able to match experiment for ε at S² ≈ 0.63, while five-site
and multipole models (in particular, TIP5P-E and SSDQO1) match
experiment at S² ≈ 0.71. Thus, TIP5P-E and SSDQO1 match the
experimental TMD, D, and ε simultaneously, indicating out-of-plane

Figure 9.8 Difference from experiment of simulated Kirkwood g-factor, μ0² gK, as a function of S² (solid symbols). Symbol types are given in Fig. 9.7. The linear regression given in Eq. 9.7 for gK is shown for models with only in-plane charge (dotted line) and with out-of-plane charge (dashed line), using appropriate values of ΔS². Since gK is multiplied by μ0² for the model so that the difference from experiment can be plotted, the linear regression lines are not straight.

charge is necessary to describe all three properties with the same
model.
In modeling water, deviations of ε from experiment are often
considered unimportant as long as ε is large enough to provide
dielectric screening. However, the deviations between the models
apparently have two significant physical sources, since gK = ⟨cos θ⟩
measures the correlation of the dipole vector of a central water with
those of its neighboring waters, where θ is the angle between these
two vectors. First, models with a large quadrupole such as TIP4P-Ew
and SSDQO1 have a first hydration shell with θ that is larger than in
three- and five-site models, since the large quadrupole decreases the
dipole-dipole correlation. Thus, the contributions to gK of the first
hydration shell of a central water molecule in TIP4P-Ew and SSDQO1 are
somewhat lower than in TIP3P and TIP5P-E. Second, models with small S²
such as TIP5P-E and SSDQO1 have a more ordered tetrahedral network of
molecules in the liquid, extending to three shells versus only two for
three- and four-site models. Thus, the contributions of the first two
shells have reached the full values of gK in TIP3P and TIP4P-Ew while
this sum is still below the full value in TIP5P-E and SSDQO1. This
implies that the structure of the liquid predicted by SSDQO1 is
different in that the tetrahedral network of hydrogen-bonded water
extends further, as in TIP5P-E, while the orientation of the first
shell is less dipolar, as in TIP4P-Ew.
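
The shell-by-shell build-up of the dipole correlation described above can be illustrated with a short calculation on a single configuration. This is a sketch under stated assumptions rather than the analysis used for the figures: it ignores periodic boundary conditions and the minimum-image convention, it uses one frame instead of a trajectory average, and the function name `shell_resolved_gk` is hypothetical. It sums cos θ between each central dipole and every dipole within a radius R, including the central molecule itself, and averages over central molecules.

```python
import numpy as np


def shell_resolved_gk(positions, dipoles, cutoffs):
    """Average over central molecules of sum_j cos(theta_ij) for r_ij <= R.

    positions : (N, 3) molecular centers (for example, oxygen sites)
    dipoles   : (N, 3) molecular dipole vectors; only their directions are used
    cutoffs   : iterable of radii R in the same length units as positions
    The j = i self term contributes 1, so the value tends to 1 as R -> 0.
    """
    positions = np.asarray(positions, dtype=float)
    unit = np.asarray(dipoles, dtype=float)
    unit = unit / np.linalg.norm(unit, axis=1, keepdims=True)
    cos_theta = unit @ unit.T  # cos(theta_ij) for every pair (i, j)
    # Plain Euclidean distances: no periodic images (an assumption of this sketch).
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    return np.array([
        np.mean(np.sum(np.where(dist <= r, cos_theta, 0.0), axis=1))
        for r in cutoffs
    ])
```

Evaluating the function at radii that bracket the first, second, and third hydration shells shows at which shell the running sum levels off, which is the comparison drawn in the text between the three- and four-site models and TIP5P-E or SSDQO1.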

9.3.3 Summary
Overall, the charge distribution of a water molecule determines
the nature of its hydration shell, and the nature of the
tetrahedral order of the hydration shell determines its liquid state
properties. Two features of the charge distribution, which require
either six sites or multipoles up to the octupole, appear important
for reproducing liquid water properties: the large quadrupole and
out-of-plane negative charge density, the latter giving rise to an
intermediate-size octupole [62]. The large quadrupole increases the
average order of the hydration shell and also reduces the dipole-
dipole correlation between neighboring water molecules. The out-
of-plane charge also increases the average order of the hydration
shell and, in addition, makes it more symmetric between the H and L
directions, which extends the overall tetrahedral network by about
one hydration shell.

9.4 Aqueous Solutions

The electrostatic potential becomes more important in aqueous
solutions or at interfaces, because the average bulk liquid properties
no longer apply. While a complete review of solvation properties of
the different types of models is not presented here, a few key issues
are raised.

9.4.1 Hydrophobic Solvation


The hydrophobic effect is essential in the structure of
biomacromolecules. Since most hydrophobic molecules have relatively low
solubility in water, experiments on alcohols and other amphiphilic
molecules [63] as well as computational studies and theory [64,
65] have played a large role in our understanding of hydrophobic
solvation. Early experimental studies of the anomalous concentration
dependence of the partial molar volumes of alcohols in
water, which exhibit a minimum at very low alcohol concentrations
[63], indicated an iceberg-like hydration shell around hydrophobic groups.
However, the current view of the hydrophobic effect, which is
largely based on computation and theory, emphasizes a length-scale
crossover at ∼1 nm [64]. Experimentally, the nature of hydrophobic
solvation of alcohols is still unclear, since neutron diffraction studies
of methanol–water mixtures have shown molecular-level segregation
consistent with hydrophobic effects but no enhancement of
water structure around nonpolar groups [66, 67], while new Raman
scattering measurements of n-alcohols have found hydrophobic
hydration shells with greater tetrahedral order than in bulk water
[68].
Recent large-scale simulations of t-butanol-water mixtures show
strong force field dependence as well as finite size effects on
aggregation [69], which lead to questions about smaller-scale
simulation results for hydrophobic effects in water. These simulations
consisted of 32,000 to 64,000 particles using four different force
fields, all using three-site water models. Examining mole fractions
of t-butanol, XtB, from the experimental minimum in the partial
molar volume of t-butanol at XtB,min = 0.03 to XtB = 0.06 [70], only
two of the force fields show aggregation as seen in experiment, and
these two differ in the nature of the aggregation. Perhaps more
disturbing is that all of the force fields demonstrated unphysical
demixing for XtB ≳ 0.1 in the larger simulations that was not found in
smaller systems. This may mean that all four of the force fields
poorly represent the real system, which does not demix, or it may be
an artifact of system size even at 64,000 particles.

Table 9.3 Selected pure water properties at ∼300 K and 1 atm for
different models [60], except for TMD [24], and from experiment [5]

Model      ρ (g/cm³)   TMD (K)   D (10⁻⁵ cm²/s)   ε
TIP3P      0.979       182       5.48             94
TIP4P-Ew   0.991       280       2.33             66
TIP5P-E    1.003       280       2.75             101
SSDQO1     0.993       260       2.39             73
Exp        0.997       277       2.30             78
In addition, recent simulations of ethanol-water mixtures
illustrate dependence on the water force field at very dilute
concentrations [62], which are relevant to the initial stages of
hydrophobic association and were not explored in the simulations
of t-butanol-water mixtures. The minimum in the partial molar
volume of ethanol VE occurs at a mole fraction of ethanol XE that
is below significant aggregation. In these simulations, the VE (Fig.
9.9) of TIP3P and TIP4P-Ew has only a slight minimum while
SSDQO1 has a pronounced minimum as in experiment, although
at a slightly lower XE than XE,min = 0.06 in experimental density
data [70]. More recent simulations (Fig. 9.9) indicate TIP5P-E may
not even have a minimum, although more thorough studies are
necessary. In addition, the partial molar volume of water VW of
SSDQO1 is constant up to XE,min and drops at higher concentrations,
also consistent with experiment, while the VW of TIP3P, SPC/E, and
TIP4P-Ew decreases more continuously. Moreover, the coordination
numbers of water around the terminal carbon, nC1Ow (Fig. 9.10), and
around water, nOwOw, as a function of XE for SSDQO1 are consistent with
neutron diffraction data. In contrast, the values of nC1Ow and nOwOw
for the site models indicate hydration shells that are too few by one
water molecule for the terminal carbon and too many by half a water
molecule for the water, respectively, for concentrations from infinite
dilution (Table 9.4) up to XE ≈ 0.5. While the ethanol parameters
also play a role, this indicates that the hydrophobic hydration
predicted by the site models may be qualitatively incorrect since
the minimum in VE appears to arise from the breakdown of the
hydration shell around the hydrocarbons as they begin to associate.

Figure 9.9 The partial molar volumes of ethanol VE in ethanol–water
mixtures as a function of ethanol mole fraction XE for TIP3P (dotted line),
TIP4P-Ew (dot-dashed line), TIP5P-E (dashed line), SSDQO1 (solid line), and
experiment (gray solid line).
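
Partial molar volumes such as VE and VW are usually obtained from the composition dependence of the mixture molar volume. The sketch below uses the standard tangent-intercept relations, VE = Vm + (1 − XE) dVm/dXE and VW = Vm − XE dVm/dXE, with a finite-difference slope; it is an illustration under assumptions, not the analysis behind Fig. 9.9. The function name `partial_molar_volumes` is hypothetical, the input Vm is assumed to have been computed already from simulated or measured densities, and the numbers in the usage lines are round placeholder values, not data.

```python
import numpy as np


def partial_molar_volumes(x_e, v_m):
    """Tangent-intercept partial molar volumes for a binary mixture.

    x_e : mole fractions of ethanol (increasing, at least three points)
    v_m : molar volume of the mixture at each x_e (for example, in cc/mol)
    Returns (v_e, v_w), the partial molar volumes of ethanol and water.
    """
    x_e = np.asarray(x_e, dtype=float)
    v_m = np.asarray(v_m, dtype=float)
    # Finite-difference slope dVm/dXE; edge_order=2 keeps the end points second order.
    slope = np.gradient(v_m, x_e, edge_order=2)
    v_e = v_m + (1.0 - x_e) * slope
    v_w = v_m - x_e * slope
    return v_e, v_w


# Placeholder ideal-mixing data (cc/mol); illustrative values only.
x = np.linspace(0.0, 0.2, 11)
v_ideal = x * 58.0 + (1.0 - x) * 18.0
v_e, v_w = partial_molar_volumes(x, v_ideal)
depth = v_e.min() - v_e[0]  # analogue of VE(Xmin) - VE(0) in Table 9.4
print(depth)
```

Because the slope is taken numerically, noise in Vm at very dilute concentrations is amplified in VE, so a fine and well-converged composition grid near the minimum matters more than the particular differencing scheme.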
Finally, the liquid–vapor interface probes hydrophobic effects
around solutes of infinite diameter. In particular, the surface
potential φ measures the dipole orientation at the interface and can
be compared to experiment. Although there has been controversy
over the quadrupole contributions, only the dipole contributes.
Interestingly, TIP3P, TIP4P-Ew, and other models with large S²
have large φ while TIP5P-E, SSDQO1, and other models with small
S² have small φ, close to experiment (Table 9.4).
Altogether, these results raise concern not only for simulations
of micelles and membranes, but also for all hydrophobic association
including that involved in protein folding. In particular, the SSDQO1
results support a more extensive tetrahedral network.

Table 9.4 Selected aqueous properties at ∼300 K and 1 atm

Model      VE(Xmin) − VE(0) (cc/mol)   Surface potential (V)
TIP3P      −0.45                       0.28
TIP4P-Ew   −0.33                       0.43
TIP5P-E    – (a)                       0.12
SSDQO1     −1.71                       0.01
Exp        −1.83 (b)                   0.03–0.14 (c)

(a) No apparent minimum.
(b) Reference [70].
(c) Farrell, J. R., and P. McTigue, Precise compensating potential difference
measurements with a voltaic cell: The surface potential of water. J.
Electroanal. Chem., 1982. 139: 37–56; Fawcett, W. R., The ionic work function
and its role in estimating absolute electrode potential. Langmuir, 2008. 24:
9868–9875.

Figure 9.10 The coordination numbers nC2Ow in ethanol–water mixtures
as a function of ethanol mole fraction XE for TIP3P (dotted line), TIP4P-
Ew (dot-dashed line), TIP5P-E (dashed line), SSDQO1 (solid line), and
experiment (crosses).

9.4.2 Polar Solvation


Solvation of polar moieties in water is also important for
understanding the structure of biomacromolecules in solution. Given the
small enthalpic difference between the folded and unfolded states, the
proper balance between the interaction energy of polar groups with
water and that of hydrophobic groups with water is essential.
The aqueous solvation of polar groups will be dependent on the
PEF used for the polar moiety. For instance, interactions between
hydroxyl groups and water, a common form of hydrogen bonding
between biomolecules and water, should use equal treatment of
off-nuclear electron density in both the biomolecule and water.
Although not included in the nonpolarizable all-atom CHARMM
force fields [39], the new polarizable CHARMM
force fields using the Drude model have dummy charges for so-
called lone pairs on sp³-hybridized oxygens and sp²-hybridized
nitrogens [71]. Thus, accounting for off-nuclear electron density
in nonpolarizable biomolecular PEFs may result in substantial
improvement, especially when coupled with water PEFs that also
account for off-nuclear electron density.

9.4.3 Ionic Solvation


The accuracy of solvation of ions by a water model is important in
at least two respects for biological simulations. First, counterions
are usually necessary to mimic experimental conditions for most
in vitro studies, and errors in the affinity of the ions for being
solvated by water versus attaching to proteins, nucleic acids, or
membrane surfaces could affect the structure of the biomolecule
if the surface density of ions is either too high or too low. Second,
charged amino acids, nucleic acid phosphate groups, and ionic lipids
are all solvated by water, so incorrect ionic solvation may even affect
the biomolecule directly, although the spread of charge over more
atoms may make this direct effect less important.
Perhaps the most thorough study of ions in rigid nonpolarizable
water models for simulations of biological molecules was carried out
by Cheatham [72]. The alkali (Li⁺, Na⁺, K⁺, Rb⁺, and Cs⁺) and halide
(F⁻, Cl⁻, Br⁻, and I⁻) ions were treated as simple charged Lennard–
Jones spheres with their net charge at the center, using the Lorentz–
Berthelot rules for combining the Lennard–Jones parameters of
the ion and three different water models. The Lennard–Jones
parameters of the ions were optimized based on lattice energies
and lattice constants of alkali halide salt crystals, free energies of
ion-water hydration, and binding energies. The hydration energy
of simple ions is determined by both the number and orientation
of water molecules that surround the ion; however, while neutron
diffraction experiments [73] and AIMD [74–76] predict ∼5 water
molecules in the first shell around Na⁺, Cheatham’s parameters and
most other parameter sets predict ∼6 [77]. Although more recent
parameters have adjusted the ion radius, no studies of ions as
complete as Cheatham’s have been done to our knowledge. In addition,
although the experimental data are not clear and the limitations
of AIMD have been discussed above, recent experimental studies
indicate that the neighbors range between a dipolar orientation, with
the dipole vector parallel to the internuclear vector rONa, and a
“bent” orientation, with the dipole vector tilted so that one of the
“lone pairs” points along rONa [81], while AIMD [74, 75] predicts
both. Most classical potentials predict very different orientations for
the dipole vectors of the water molecules, with TIP3P predicting
mostly the dipolar orientation, TIP5P-E and SSDQO1 predicting
mostly the bent orientation, and TIP4P-Ew predicting a range,
although slightly overpredicting the dipolar orientation compared
to AIMD [77]. However, the number and orientation of neighbors
in the first shell are dependent on each other, so further studies
are necessary. In addition, the value of the hydration energy of the
proton, which is used as the zero for the hydration energies of all
other ions, has been brought into question [78].
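
The Lorentz–Berthelot combining rules mentioned above take the arithmetic mean of the Lennard–Jones diameters and the geometric mean of the well depths. The following minimal sketch shows the arithmetic only. The water-oxygen values are the ones commonly quoted for TIP3P and are used here as an assumption; the ion values are hypothetical placeholders and are not the parameters of reference [72]. The function names are introduced for illustration.

```python
import math


def lorentz_berthelot(sigma_i, eps_i, sigma_j, eps_j):
    """Return (sigma_ij, eps_ij): arithmetic-mean sigma, geometric-mean epsilon."""
    return 0.5 * (sigma_i + sigma_j), math.sqrt(eps_i * eps_j)


def lennard_jones(r, sigma, eps):
    """12-6 Lennard-Jones energy at separation r (same units as sigma, eps)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


# Water oxygen: values commonly quoted for TIP3P (assumed here), in Angstrom and kcal/mol.
SIGMA_OW, EPS_OW = 3.15061, 0.1521
# Ion: hypothetical placeholder values, NOT taken from any published parameter set.
SIGMA_ION, EPS_ION = 2.4, 0.1

sigma_io, eps_io = lorentz_berthelot(SIGMA_OW, EPS_OW, SIGMA_ION, EPS_ION)
r_min = 2.0 ** (1.0 / 6.0) * sigma_io  # separation at the minimum of the 12-6 potential
print(sigma_io, eps_io, lennard_jones(r_min, sigma_io, eps_io))
```

Because the mixed ion-water parameters follow mechanically from the pure-species values, ion parameters optimized against one water model do not automatically transfer to another, which is consistent with the optimization described above being carried out separately for each water model.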
The extent to which polarization and charge transfer affect the
ion-water potential is also important. For instance, charge transfer
was significant in QM/MM studies of chloride ions and their first
hydration shell waters calculated at the MP2 level with aug-cc-
pVDZ basis sets using configurations generated from classical MD
simulations [78]. Studies of both singly charged anions and
cations in water using an empirical model for charge transfer and
polarizability in both the ions and water indicate that the effects of
charge transfer are larger than those of polarization of the first shell
water and actually in the opposite direction [79]. In addition, while
the polarization of the chloride is significant as in other studies,
the charge transfer reduces the polarization somewhat. Since charge
transfer appears larger than the polarization of the water molecules
and mostly involves the ions and the first and second shell water
(Fig. 9.12), it may be worthwhile considering charge transfer in rigid
models of water.

Figure 9.11 The ethanol hydration shell from simulations with TIP4P-
Ew (left) and SSDQO1 (right). Large spheres are ethanol oxygens (red)
and carbons (aqua); small spheres are water oxygens (red) and hydrogens
(white).

9.4.4 Summary
While further studies are necessary, many results indicate that
developments in the nonpolarizable force fields for simulations of
biomolecules or their prototypes in water may lead to substantial
improvement without the addition of polarization. First, both the
large quadrupole and out-of-plane charge appear necessary for the
water structure in alcohol-water mixtures. Second, polar groups in
proteins most likely should be treated at the same level as the water
molecules in the solvent. Third, the structure of water around ions
is important in matching the experimental ion hydration energies.
Finally, charge transfer between the ion and water may outweigh
polarization effects of water molecules and cations, although the
polarization of anions appears significant.

9.5 Conclusions

The potential energy functions for both water and biomolecules
are important in making atomistic simulations of biomolecules
in solution quantitative. Water, as a liquid with many unique
properties that are important for life on Earth, must be modeled with
sufficient accuracy such that the balance between hydrophilic and
hydrophobic interactions is accurate. In addition, lessons learned
from modeling water can be applied to making better potential
energy functions for biomolecules, and the level of accuracy should
be balanced for the solute and the solvent. While point charges
have been used because they are simple to conceptualize and
program, multipole models can give a better representation of
the electrostatic potential arising from the combination of nuclear
charge and electron density that comprises a molecule. Since fast
computational methods for multipole models make them more
efficient than multisite models, especially for more complicated
charge distributions resulting from off-nuclear electron density,
exploration of these models is warranted. Finally, understanding
the magnitude of effects is important in deciding how to improve
potential energy functions.

Figure 9.12 Direction of charge transfer. (a) The cation receives charge
from a water molecule in the first shell, which receives charge from one in
the second shell. (b) The anion transfers charge to a water molecule in the
first shell, which transfers charge to one in the second shell. Figure courtesy
of Marielle Soniat and Steven W. Rick.

Acknowledgments

We gratefully acknowledge support of the National Science
Foundation (CHE-1158267), the National Institutes of Health (R21-
GM104500), and the McGowan Foundation. This work used the
Extreme Science and Engineering Discovery Environment (XSEDE),
which is supported by National Science Foundation grant number
OCI-1053575; the Matrix and Medusa clusters, maintained by
University Information Services at Georgetown University; and the
Lobos cluster at the Laboratory for Computational Biology, National
Heart, Lung, and Blood Institute, National Institutes of Health
through the generosity of Bernard R. Brooks. We thank Thomas W.
Beck, Steven W. Rick, Andrew C. Simmonett, and Frank C. Pickard for
bringing several interesting papers to our attention. We also thank
Ming-Liang Tan, Shuqiang Niu, Joseph R. Cendagorta, and Kelly N.
Tran for assistance with the figures as well as Marielle Soniat and
Steven W. Rick for providing Figure 9.12.

References

1. McCammon, J. A., B. R. Gelin, and M. Karplus, Dynamics of folded
proteins. Nature, 1977. 267: 585–590.
2. Shaw, D. E., R. O. Dror, J. K. Salmon, J. P. Grossman, K. M. Mackenzie, J.
A. Bank, C. Young, M. M. Deneroff, B. Batson, K. J. Bowers, E. Chow, M. P.
Eastwood, D. J. Ierardi, J. L. Klepeis, J. S. Kuskin, R. H. Larson, K. Lindorff-
Larsen, P. Maragakis, M. A. Moraes, S. Piana, Y. Shan, and B. Towles,
Millisecond-scale molecular dynamics simulations on Anton. Proc. Conf.
High Perf. Comput. Network. Storage Anal., 2009: 1–11.
3. Piana, S., K. Lindorff-Larsen, and D. E. Shaw, How robust are protein
folding simulations with respect to force field parameterization?
Biophys. J., 2011. 100: L47–L49.
4. Piana, S., J. L. Klepeis, and D. E. Shaw, Assessing the accuracy of physical
models used in protein-folding simulations: quantitative assessment
from long molecular dynamics simulations. Curr. Opin. Struct. Biol.,
2014. 24: 98–105. 10.1016/j.sbi.201312.066.
5. Eisenberg, D., and W. Kauzmann, The Structure and Properties of Water.
1st ed. 1969, New York: Oxford University Press.
6. Ball, P., Water: An enduring mystery. Nature, 2008. 452: 291–292.
doi:10.1038/452291a.
7. Poole, P. H., F. Sciortino, T. Grande, H. E. Stanley, and C. A. Angell, Effect
of hydrogen bonds on the thermodynamic behavior of liquid water. Phys.
Rev. Lett., 1994. 73: 1632–1635.
8. Narten, A. H., M. D. Danford, and H. A. Levy, On the interstitial water
structure. Disc. Farad. Soc., 1967. 43: 97.
9. Kauzmann, W., Some factors in the interpretation of protein denatura-
tion. Adv. Protein Chem., 1959. 14: 1–63.
10. Tanford, C., The Hydrophobic Effect: Formation of Micelles and Biological
Membranes. 1973, New York: Wiley. 200.
11. Car, R., and M. Parrinello, Unified approach for molecular dynamics and
density-functional theory. Phys. Rev. Lett., 1985. 55: 2471–2474.
12. Parr, R. G., and W. Yang, Density-functional Theory of Atoms and
Molecules. International Series of Monographs on Chemistry. Vol. 16.
1989, Oxford: Oxford University Press. ix, 333 p.
13. Cramer, C. J., Essentials of Computational Chemistry: Theories and Models.
2nd ed. 2004, West Sussex, UK: Wiley.
14. Kuo, I. F. W., C. J. Mundy, M. J. McGrath, J. I. Siepmann, J. VandeVondele,
M. Sprik, J. Hutter, B. Chen, M. L. Klein, F. Mohamed, M. Krack, and M.
Parrinello, Liquid water from first principles: Investigation of different
sampling approaches. J. Phys. Chem. B, 2004. 108: 12990–12998. doi:
10.1021/jp047788i.
15. VandeVondele, J., F. Mohamed, M. Krack, J. Hutter, M. Sprik, and M.
Parrinello, The influence of temperature and density functional models
in ab initio molecular dynamics simulations of liquid water. J. Chem.
Phys., 2005. 122: 014515–014516.
16. Grossman, J. C., E. Schwegler, E. W. Draeger, F. Gygi, and G. Galli, Towards
an assessment of the accuracy of density functional theory for first
principles simulations of water. J. Chem. Phys., 2004. 120: 300–311.
17. Schwegler, E., J. C. Grossman, F. Gygi, and G. Galli, Towards an assessment
of the accuracy of density functional theory for first principles
simulations of water. II. J. Chem. Phys., 2004. 121: 5400–5409.
18. Mantz, Y. A., B. Chen, and G. J. Martyna, Structural correlations and motifs
in liquid water at selected temperatures: Ab initio and empirical model
predictions. J. Phys. Chem. B, 2006. 110: 3540–3554.
19. Grimme, S., J. Antony, S. Ehrlich, and H. Krieg, A consistent and accurate
ab initio parametrization of density functional dispersion correction
(DFT-D) for the 94 elements H-Pu. J. Chem. Phys., 2010. 132: 154104.
doi: 10.1063/1.3382344.
20. Del Ben, M., M. Schönherr, J. Hutter, and J. VandeVondele, Bulk liquid water
at ambient temperature and pressure from MP2 theory. J. Phys. Chem.
Lett., 2013. 4: 3753–3759. doi: 10.1021/jz401931f.
21. Guillot, B., A reappraisal of what we have learnt during three decades of
computer simulations of water. J. Molec. Liq., 2002. 101: 219–260.
22. Jorgensen, W. L., and J. Tirado-Rives, Potential energy functions
for atomic-level simulations of water and organic and biomolecular
systems. Proc. Natl. Acad. Sci. U. S. A., 2005. 102: 6665–6670. doi:
10.1073/pnas.0408037102.
23. Ichiye, T., Water in the liquid state: A computational viewpoint. Adv.
Chem. Phys., 2014. 155: 161–200. doi: 10.1002/9781118755815.
24. Vega, C., and J. L. F. Abascal, Simulating water with rigid non-polarizable
models: a general perspective. Phys. Chem. Chem. Phys., 2011. 13:
19663–19688.
25. Rick, S. W., and S. J. Stuart, Potentials and algorithms for incorporating
polarizability in computer simulations. Rev. Comput. Chem., 2002. 18:
89–146.
26. Allen, M. P., and D. J. Tildesley, Computer Simulation of Liquids. 1st ed.
1987, Oxford: Clarendon Press.
27. Hagler, A. T., E. Huler, and S. Lifson, Energy functions for peptides and
proteins. 1. Derivation of a consistent force-field including hydrogen-
bonds from amide crystals. J. Am. Chem. Soc., 1974. 96: 5319–5327. doi:
10.1021/ja00824a004.
28. Lifson, S., A. T. Hagler, and P. Dauber, Consistent force-field studies of
inter-molecular forces in hydrogen-bonded crystals. 1. Carboxylic-acids,
amides, and the C=O···H– hydrogen-bonds. J. Am. Chem. Soc., 1979. 101:
5111–5121. doi: 10.1021/ja00512a001.
29. Te, J. A., and T. Ichiye, Temperature and pressure dependence of
the optimized soft sticky dipole-quadrupole-octupole water model. J.
Chem. Phys., 2010. 132: 114511. DOI: 10.1063/1.3359432, PMCID:
PMC2855697.
30. Delhommelle, J., and P. Millie, Inadequacy of the Lorentz–Berthelot
combining rules for accurate predictions of equilibrium proper-
ties by molecular simulation. Mol. Phys., 2001. 99: 619–625. doi:
10.1080/00268970010020041.
31. Gordon, M. S., and J. H. Jensen, Understanding the hydrogen bond using
quantum chemistry. Acc. Chem. Res., 1996. 29: 536–543.
32. Ryckaert, J. P., G. Ciccotti, and H. J. C. Berendsen, Numerical integration of
the cartesian equations of motion of a system with constraints: Molecular
dynamics of n-alkanes. J. Comput. Phys., 1977. 23: 327–341.
33. Benedict, W. S., N. Gailar, and E. K. Plyler, Rotation-vibration spectra of
deuterated water vapor. J. Chem. Phys., 1956. 24: 1139–1165.
34. Ichikawa, K., Y. Kameda, T. Yamaguchi, H. Wakita, and M. Misawa,
Neutron-diffraction investigation of the intramolecular structure of a
water molecule in the liquid-phase at high-temperatures. Mol. Phys.,
1991. 73: 79–86.
35. Soper, A. K., The radial distribution functions of water and ice from 220
to 673 K and at pressures up to 400 MPa. Chem. Phys., 2000. 258: 121–
137.
36. Tröster, P., and P. Tavan, The microscopic physical cause for the density
maximum of liquid water. J. Phys. Chem. Lett., 2014. 5: 138–142. doi:
10.1021/jz4023927.
37. Tröster, P., K. Lorenzen, and P. Tavan, Polarizable six-point water models
from computational and empirical optimization. J. Phys. Chem. B, 2014.
118: 1589–1602. doi: 10.1021/jp4125765.
38. Jorgensen, W. L., J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. L.
Klein, Comparison of simple potential functions for simulating liquid
water. J. Chem. Phys., 1983. 79: 926–935.
39. MacKerell Jr., A. D., D. Bashford, M. Bellott, R. L. Dunbrack Jr., M. J. Field,
S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph, K. Kuchnir, K. Kuczera, F. T. K.
Lau, M. Mattos, S. Michnick, D. T. Nguyen, T. Ngo, B. Prodhom, B. Roux,
M. Schlenkrich, J. Smith, R. Stote, J. Straub, J. Wiorkiewicz-Kuczera, and
M. Karplus, All-atom empirical potential for molecular modeling and
dynamics studies of proteins. J. Phys. Chem. B, 1998. 102: 3586–3616.
40. Bernal, J. D., and R. H. Fowler, A theory of water and ionic solution, with
particular reference to hydrogen and hydroxyl ions. J. Chem. Phys., 1933.
1: 515–548.
41. Rowlinson, J. S., The lattice energy of ice and the second virial coefficient
of water vapour. Trans. Faraday Soc., 1951. 47: 120–129.
42. Horn, H. W., W. C. Swope, J. W. Pitera, J. D. Madura, T. J. Dick, G. L.
Hura, and T. Head-Gordon, Development of an improved four-site water
model for biomolecular simulations: TIP4P-Ew. J. Chem. Phys., 2004.
120: 9665–9678. DOI: 10.1063/1.1683075.
43. Rick, S. W., A reoptimization of the five-site water potential (TIP5P) for
use with Ewald sums. J. Chem. Phys., 2004. 120: 6085–6093.
44. Herzberg, G., Molecular Spectra and Molecular Structure: III. Electronic
Spectra and Electronic Structure of Polyatomic Molecules. 1967, Prince-
ton, NJ: D. Van Nostrand Co., Inc. 145.
45. Glaeser, R. M., and C. A. Coulson, Multipole moments of the water
molecule. Trans. Faraday Soc., 1965. 61: 389–391.
46. Clough, S. A., Y. Beers, G. P. Klein, and L. S. Rothman, Dipole moment of
water from Stark measurements of H2O, HDO, and D2O. J. Chem. Phys.,
1973. 59: 2254–2259.
47. Verhoeven, J., and A. Dymanus, Magnetic properties and molecular
quadrupole tensor of the water molecule by beam-maser Zeeman
spectroscopy. J. Chem. Phys., 1970. 52: 3222–3233.
48. Ichiye, T., and M.-L. Tan, Soft sticky dipole-quadrupole-octupole
potential energy function for liquid water: An approximate moment
expansion. J. Chem. Phys., 2006. 124: 134504. DOI: 10.1063/1.216120,
PMID: 16613458.
49. Wikfeldt, K. T., E. R. Batista, F. D. Vila, and H. Jonsson, A transferable
H2 O interaction potential based on a single center multipole expan-
sion: SCME. Phys. Chem. Chem. Phys., 2013. 15: 16542–16446. doi:
10.1039/c3cp52097h.
50. Cook, R. L., F. C. De Lucia, and P. Helminger, Molecular force field and
structure of water: Recent microwave results. J. Mol. Spectrosc., 1974.
53: 62–76.
51. Xantheas, S. S., and T. H. Dunning Jr., Ab initio studies of cyclic water
clusters (H2 O)n , n = 1–6. I. Optimal structures and vibrational spectra.
J. Chem. Phys., 1993. 99: 8774–8782.
52. Silvestrelli, P. L., and M. Parrinello, Structural, electronic, and bonding
properties of liquid water from first principles. J. Chem. Phys., 1999. 111:
3572–3580.
53. Coutinho, K., R. C. Guedes, B. J. Costa Cabral, and S. Canuto, Electronic po-
larization of liquid water: Converged Monte-Carlo-quantum mechanics
results for the multipole moments. Chem. Phys. Lett., 2003. 369: 345–
353.
54. Osted, A., J. Kongsted, K. V. Mikkelsen, P.-O. Åstrand, and O. Christiansen,
Statistical mechanically averaged molecular properties of liquid water
calculated using the combined coupled cluster/molecular dynamics
method. J. Chem. Phys., 2006. 124: 124503–124516.
55. Niu, S., M.-L. Tan, and T. Ichiye, The large quadrupole of water
molecules. J. Chem. Phys., 2011. 134: 134501. PMID:21476758, PMCID:
PMC3081860.
56. Jones, A., F. Cipcigan, V. P. Sokhan, J. Crain, and G. J. Martyna, Electronically
coarse-grained model for water. Phys. Rev. Lett., 2013. 110: 227801. doi:
10.1103/PhysRevLett.110.227801.
57. Yu, W., P. E. M. Lopes, B. Roux, and A. D. MacKerell Jr., Six-site polarizable
model of water based on the classical Drude oscillator. J. Chem. Phys.,
2013. 138: 034508. doi: 10.1063/1.4774577.
58. Laaksonen, A., P. G. Kusalik, and I. M. Svishchev, Three-dimensional
structure in water-methanol mixtures. J. Phys. Chem. A, 1997. 101:
5910–5918.
59. Mason, P. E., and J. W. Brady, “Tetrahedrality” and the relationship
between collective structure and radial distribution functions in liquid
water. J. Phys. Chem. B, 2007. 111: 5669–5679.
60. Tan, M.-L., J. R. Cendagorta, and T. Ichiye, The molecular charge
distribution, the hydration shell, and the unique properties of liquid
water. J. Chem. Phys., 2014. 141: 244504.
61. Kirkwood, J. G., The theory of dielectric polarization. J. Chem. Phys., 1936.
4: 592–601.
62. Tan, M.-L., J. R. Cendagorta, and T. Ichiye, Effects of microcomplexity
on hydrophobic hydration in amphiphiles. J. Am. Chem. Soc., 2013. 135:
4918–4921. doi: 10.1021/ja312504q.
63. Frank, H. S., and M. W. Evans, Free volume and entropy in condensed
systems. III. Entropy in binary liquid mixtures; partial molal entropy in
dilute solutions; structure and thermodynamics in aqueous electrolytes.
J. Chem. Phys., 1945. 13: 507–532.
64. Chandler, D., Interfaces and the driving force of hydrophobic assembly.
Nature, 2005. 437: 640–647.
65. Ashbaugh, H. S., and L. R. Pratt, Colloquium: Scaled particle theory and
the length scales of hydrophobicity. Rev. Mod. Phys., 2006. 78: 159–178.
66. Dixit, S., J. Crain, W. C. K. Poon, J. L. Finney, and A. K. Soper, Molecular
segregation observed in a concentrated alcohol-water solution. Nature,
2002. 416: 829–832.
67. Dixit, S., A. K. Soper, J. L. Finney, and J. Crain, Water structure and solute
association in dilute aqueous methanol. Europhys. Lett., 2002. 59: 377–
383.
68. Davis, J. G., K. P. Gierszal, P. Wang, and D. Ben-Amotz, Water structural
transformation at molecular hydrophobic interfaces. Nature, 2012. 491:
582–585.
69. Gupta, R., and G. N. Patey, Aggregation in dilute aqueous tert-butyl
alcohol solutions: Insights from large-scale simulations. J. Chem. Phys.,
2012. 137: 034509(12).
70. Nakanishi, K., Partial molal volumes of butyl alcohols and of related
compounds in aqueous solution. Bull. Chem. Soc. Jpn., 1960. 33: 793–
797.
71. Anisimov, V. M., G. Lamoureux, I. V. Vorobyov, N. Huang, B. Roux, and
A. D. MacKerell Jr., Determination of electrostatic parameters for a
polarizable force field based on the classical Drude oscillator. J. Chem.
Theory Comput., 2005. 1: 153–168. doi: 10.1021/ct049930p.
72. Joung, I. S., and T. E. Cheatham III, Determination of alkali and halide
monovalent ion parameters for use in explicitly solvated biomolecular
simulations. J. Phys. Chem. B, 2008. 112: 9020–9041.
73. Mason, P. E., S. Ansell, and G. W. Neilson, Neutron diffraction studies of
electrolytes in null water: a direct determination of the first hydration
zone of ions J. Phys. Condens. Matter, 2006. 18: 8437–8447.
74. Ikeda, T., M. Boero, and K. Terakura, Hydration of alkali ions from
first principles molecular dynamics revisited. J. Chem. Phys., 2007. 126:
034501(9).
75. Krekeler, C., and L. D. Site, Solvation of positive ions in water: the
dominant role of water–water interaction. J. Phys. Condens. Matter, 2007.
19: 192101(7).
76. Varma, S., and S. B. Rempe, Coordination numbers of alkali metal
ions in aqueous solutions. Biophys. Chem., 2006. 124: 192–199. doi:
10.1016/j.bpc.2006.07.002.
77. Tan, M.-L., L. Lucan, and T. Ichiye, Study of multipole contributions to the
structure of water around ions in solution using the soft sticky dipole-
quadrupole-octupole (SSDQO) model of water. J. Chem. Phys., 2006. 124:
174505. DOI: 10.1063/1.2177240, PMID: 16689581.
78. Zhao, Z., D. M. Rogers, and T. L. Beck, Polarization and charge transfer
in the hydration of chloride ions. J. Chem. Phys., 2010. 132: 014502. doi:
10.1063/1.3283900.
79. Soniat, M., and S. W. Rick, The effects of charge transfer on the
aqueous solvation of ions. J. Chem. Phys., 2012. 137: 044511. doi:
10.1063/1.4736851.
80. A. C. Simmonett, F. C. Pickard IV, H. F. Schaefer III, B. R. Brooks, An
efficient algorithm for multipole energies and derivatives based on
spherical harmonics and extensions to particle mesh Ewald. J. Chem.
Phys., 2014. 140: 184101.
81. R. Mancinelli, A. Botti, F. Bruni, M. A. Ricci, and A. K. Soper, Hydration
of sodium, potassium, and chloride ions in solution and the concept of
structure maker/breaker. J. Phys. Chem. B, 2007. 111, 13570–13577.
January 27, 2016 15:33 PSP Book - 9in x 6in 10-Qiang-Cui-c10

Chapter 10

Quantum Mechanics–Based Polarizable Force Field for Proteins

Changge Ji (a,b), Ye Mei (a,b), and John Z. H. Zhang (a,b,c)

(a) State Key Laboratory of Precision Spectroscopy, Institute of
    Theoretical and Computational Science, East China Normal University,
    Shanghai 200062, China
(b) NYU-ECNU Center for Computational Chemistry at NYU Shanghai,
    Shanghai 200062, China
(c) Department of Chemistry, New York University, New York, NY 10003

[email protected]

Many-Body Effects and Electrostatics in Biomolecules
Edited by Qiang Cui, Markus Meuwly, and Pengyu Ren
Copyright © 2016 Pan Stanford Publishing Pte. Ltd.
ISBN 978-981-4613-92-7 (Hardcover), 978-981-4613-93-4 (eBook)
www.panstanford.com

10.1 Fragment Quantum Chemistry Calculation of Proteins

Molecular modeling and computer simulation with empirical potential
energy functions (force fields) are now routinely carried out to help
understand and predict the structures and dynamics of proteins and
other macromolecules of biological relevance in water and membrane
environments. After over 40 years of development, popular force fields
such as AMBER, CHARMM, OPLS, and GROMOS have been widely employed in
biomolecular simulations. These force fields are used predominantly in
highly optimized molecular dynamics (MD) simulation packages, including
AMBER, CHARMM, GROMACS, and NAMD, which have greatly facilitated MD
simulation of biological molecules. Advances in computer technology,
especially the recent effort to migrate MD codes to graphics processing
units (GPUs) and the emergence of special-purpose computer systems such
as Anton, have extended MD simulation times far beyond what could be
reached just a few years ago.
The predictive power of computer simulation relies heavily on the
accuracy of the force field and the efficiency of phase space sampling.
The broader and more comprehensive application of biomolecular
simulation has highlighted the limitations of the existing force
fields, and there is an urgent need to develop next-generation force
fields that include electrostatic polarization for biomolecules. The
desired polarizable or polarized force fields should ideally be based
on quantum mechanical calculations of biomolecules, which is a
challenging task for computational chemists.

Over the past decade, quantum chemists have made significant progress
in developing highly efficient quantum mechanical methods for ab initio
calculations of macromolecules. In particular, fragment-based quantum
chemistry methods for macromolecules have received considerable
interest and have been applied to realistic biomolecules such as
proteins. The fragment approach is based on the "chemical locality" of
the molecular system; it is naturally linear scaling and therefore well
suited to parallel computing (Gordon et al., 2011).
Among the various fragmentation methods, the molecular fractionation
with conjugate caps (MFCC) approach was proposed in 2003 for
calculating biomolecular interactions (Zhang et al., 2003; Zhang and
Zhang, 2003; Mei et al., 2012a). In this approach, a peptide chain is
treated by a cut-and-seal process, as shown in Fig. 10.1. The
unsaturated chemical bonds are sealed with a capping group on each
side. The coordinates of the atoms in the capping groups are taken from
their corresponding atoms in the original peptide. Therefore, each pair
of capping groups can be put together to generate a cap molecule. A
disulfide bond is processed in the same way, by cutting the bond and
sealing the dangling atoms with another pair of conjugate caps.

[Figure 10.1: structural diagrams, panels (a) to (c), with the pieces
labeled "fragment A", "conjugate caps", and "fragment B".]

Figure 10.1 (a) A peptide chain is decomposed into residues by cutting
the amide bond. (b) A pair of conjugate caps is then added to saturate
the dangling chemical bonds and to mimic the immediate chemical
environment. (c) The electron density of the peptide can be represented
as the sum of the electron densities of all the capped fragments (A and
B), but the double-counted atoms in the conjugate caps should be
removed from the total density.

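In the MFCC approach, the total electron density $\rho$ of a peptide
with $N$ residues and $N_{ss}$ disulfide bonds in a single chain can be
approximately written as

\[
\rho = \sum_{k=1}^{N} \rho_k - \sum_{k=1}^{N-1} \rho_k^{cc}
     - \sum_{k=1}^{N_{ss}} \rho_k^{dc}, \tag{10.1}
\]

in which $\rho_k$ is the electron density of the $k$th capped fragment,
$\rho_k^{cc}$ is the electron density of the $k$th pair of conjugate
caps, and $\rho_k^{dc}$ is the electron density of the $k$th pair of
disulfide bond caps. It can be shown by integration that the total
number of electrons is conserved in Eq. 10.1:

\[
\begin{aligned}
\int \rho \, d\mathbf{r}
 &= \sum_{k=1}^{N} \int \rho_k \, d\mathbf{r}
  - \sum_{k=1}^{N-1} \int \rho_k^{cc} \, d\mathbf{r}
  - \sum_{k=1}^{N_{ss}} \int \rho_k^{dc} \, d\mathbf{r} \\
 &= \sum_{k=1}^{N} N_k - \sum_{k=1}^{N-1} N_k^{cc}
  - \sum_{k=1}^{N_{ss}} N_k^{dc} \\
 &= N_{\mathrm{total}}.
\end{aligned} \tag{10.2}
\]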

Other additive quantities, such as the electrostatic potential at grid
points around the molecule, can be calculated in a similar way as

\[
\phi(\mathbf{r}) = \sum_{k=1}^{N} \phi_k(\mathbf{r})
  - \sum_{k=1}^{N-1} \phi_k^{cc}(\mathbf{r})
  - \sum_{k=1}^{N_{ss}} \phi_k^{dc}(\mathbf{r}), \tag{10.3}
\]

and for the dipole moment

\[
\vec{\mu} = \sum_{k=1}^{N} \vec{\mu}_k
  - \sum_{k=1}^{N-1} \vec{\mu}_k^{cc}
  - \sum_{k=1}^{N_{ss}} \vec{\mu}_k^{dc}, \tag{10.4}
\]

where the superscripts and subscripts have the same meanings as in
Eq. 10.1.
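
The bookkeeping in Eqs. 10.1 to 10.4 is the same for any additive
property: sum over the capped fragments, then subtract the conjugate
caps and the disulfide caps. A minimal Python sketch follows, assuming
the per-piece values have already been produced by some quantum
chemistry code. The function name and the toy electron counts are
illustrative only and are not taken from any MFCC implementation.

```python
def mfcc_assemble(fragments, conjugate_caps, disulfide_caps=()):
    """Combine an additive property as in Eqs. 10.1-10.4.

    Each argument is a sequence of per-piece values (electron counts,
    the potential at one grid point, one dipole component, ...).
    """
    return sum(fragments) - sum(conjugate_caps) - sum(disulfide_caps)


# Hypothetical electron counts: three capped fragments and the two
# conjugate cap molecules between them, no disulfide bond.
n_fragments = [62, 70, 58]
n_caps = [32, 32]
n_total = mfcc_assemble(n_fragments, n_caps)
```

The same call can be applied point by point to a potential grid, or
component by component to a dipole moment.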
The electronic structure calculation of each fragment is performed in
the presence of an external electric field generated by the atoms not
included in the fragment, a method called electrostatic embedding.
These atoms are usually, though not necessarily, represented as
monopoles located on the nuclei. Charges from pairwise force fields
such as AMBER03 are a good mean-field approximation. Better choices of
atomic charges are also available and are discussed in Section 10.3.
Because a hydrogen bond has both covalent and Coulombic character,
hydrogen-bonded partners can either be handled in the same way as a
disulfide bond or simply be treated through the electrostatic
interaction. The latter treatment is computationally more efficient
because it does not add atoms to the fragment.
The accuracy of the total electron density can be improved by borrowing
the idea of the many-body expansion. For a tripeptide (sequence: ABC)
without a disulfide bond, the total electron density can be computed as
(Mei et al., 2004)

\[
\rho(ABC) = \rho(AB) + \rho(BC) - \rho(B), \tag{10.5}
\]

in which AB, BC, and B are all capped fragments. Here, calculations of
the cap molecules are no longer required. However, the two-body terms
(AB and BC) are more expensive to calculate than the single-body terms
of the original MFCC scheme, because the computational cost rises
steeply with molecular size. For a peptide with $N$ residues, Eq. 10.5
can be generalized to

\[
\rho = \sum_{k=1}^{N-1} \rho_{k,k+1} - \sum_{k=2}^{N-1} \rho_k. \tag{10.6}
\]
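
To make the index ranges of Eq. 10.6 concrete, the sketch below lists
the signed terms for a chain of a given length and evaluates the sum
from precomputed values. It is an illustration only: the function names
and the dictionary layout are assumptions of this sketch, and the
property values would in practice come from calculations on capped
dimers and capped monomers.

```python
def two_body_terms(n_residues):
    """Signed terms of Eq. 10.6 for a chain of n_residues (1-based)."""
    if n_residues < 2:
        raise ValueError("need at least two residues")
    plus = [(+1, (k, k + 1)) for k in range(1, n_residues)]
    minus = [(-1, (k,)) for k in range(2, n_residues)]
    return plus + minus


def assemble_two_body(pair_values, single_values):
    """Evaluate Eq. 10.6 from precomputed values.

    pair_values[(k, k + 1)] holds the property of the capped dimer and
    single_values[k] that of the capped monomer, with 1-based indices.
    """
    n = len(pair_values) + 1
    total = 0.0
    for sign, idx in two_body_terms(n):
        value = pair_values[idx] if len(idx) == 2 else single_values[idx[0]]
        total += sign * value
    return total
```

For three residues, `two_body_terms(3)` gives the dimers (1, 2) and
(2, 3) with a plus sign and the monomer 2 with a minus sign, which is
Eq. 10.5.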

10.2 Protein Solvation

Proteins mainly reside in condensed phases such as aqueous solution and
membranes. The solvent plays an important role in modulating the
structure and function of a protein. Polar solvent molecules around the
protein generate an electric field that polarizes the protein
nonuniformly. A rigorous treatment of the solvent effect should include
explicit solvent molecules. However, converging the interaction between
the solvent molecules and the solute requires large-scale sampling of
the solvent degrees of freedom, which is prohibitively expensive, if
not impossible, at the quantum mechanical level. A more practical
approach to including the solvent effect is to treat the solvent as a
continuous dielectric medium in which the solute molecule (protein) is
embedded (Tomasi and Persico, 1994). In this solvation model, the
solvent is structureless and its only parameter is the dielectric
constant.
For the solvation of small molecules, the polarizable continuum model
(PCM) and its variants have been widely used to calculate solvation
energies. The conductor-like PCM (CPCM) gives a concise formulation of
the solvent effect, in which the solvent's response to the solute
polarization is represented by induced surface charges distributed on
the solute–solvent interface. In this formulation, no volume
polarization (extension of the solute's electron distribution into the
solvent region) is allowed. The induced surface charge counterbalances
the electrostatic potential generated on the interface by the solute
molecule.

By discretizing the induced charges on tesserae, the basic equation of
CPCM is given by

\[
\mathbf{B} + \mathbf{A}\mathbf{q} = 0, \tag{10.7}
\]

where $\mathbf{q}$ is the vector of induced charges and

\[
A_{uv} = \frac{1}{|\mathbf{r}_u - \mathbf{r}_v|}\,(1 - \delta_{uv})
       + 1.07 \sqrt{\frac{4\pi}{S_u}}\, \delta_{uv}, \tag{10.8}
\]

\[
B_u = \sum_{\alpha} \frac{Z_\alpha}{|\mathbf{r}_u - \mathbf{R}_\alpha|}
    + \phi(\mathbf{r}_u). \tag{10.9}
\]

Here $\mathbf{r}_u$ and $\mathbf{r}_v$ are the coordinates of tesserae
$u$ and $v$, $S_u$ is the area of tessera $u$, and
$\phi(\mathbf{r}_u)$ is the electrostatic potential generated by the
solute electrons, which is calculated by the quantum approach described
below. $Z_\alpha$ and $\mathbf{R}_\alpha$ are the charge and the
coordinates of nucleus $\alpha$, respectively.
When the solute is relatively small, the number of surface tesserae is
also small, which makes direct matrix inversion in Eq. 10.7 feasible.
The computational time for the direct inversion of matrix $\mathbf{A}$
is on the order of $N^3$ and the storage on the order of $N^2$, where
$N$ is the number of tesserae. For large solutes such as proteins, the
number of surface tesserae can be very large, and even allocating
matrix $\mathbf{A}$ may exhaust the available memory. Iterative methods
are therefore required, with the columns or rows of matrix $\mathbf{A}$
calculated on the fly whenever necessary (Barrett et al., 1994).

The induced charge $\mathbf{q}$ calculated from Eq. 10.7 is scaled to
correct for the finite dielectric constant $\varepsilon$ of the solvent
by

\[
\mathbf{q} \Rightarrow \frac{\varepsilon - 1}{\varepsilon}\, \mathbf{q}. \tag{10.10}
\]
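
The matrix-free strategy can be sketched in a few lines of Python. The
example below solves Eq. 10.7 with a conjugate gradient iteration that
builds each element of A from Eq. 10.8 only when it is needed, and then
applies the scaling of Eq. 10.10. Conjugate gradients is just one
possible choice of iterative solver and is an assumption of this
sketch, as are the function names; the text does not say which solver
is used in practice.

```python
import math


def a_element(u, v, centers, areas):
    """Matrix element A_uv of Eq. 10.8, built on demand."""
    if u == v:
        return 1.07 * math.sqrt(4.0 * math.pi / areas[u])
    return 1.0 / math.dist(centers[u], centers[v])


def apply_a(x, centers, areas):
    """Return the product A x without storing the full matrix A."""
    n = len(x)
    return [sum(a_element(u, v, centers, areas) * x[v] for v in range(n))
            for u in range(n)]


def solve_cpcm(b, centers, areas, eps, tol=1e-10, max_iter=200):
    """Solve A q = -B (Eq. 10.7) by conjugate gradients, then apply
    the (eps - 1) / eps scaling of Eq. 10.10."""
    q = [0.0] * len(b)
    r = [-bi for bi in b]          # residual of A q = -B at q = 0
    p = r[:]
    rr = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if math.sqrt(rr) < tol:
            break
        ap = apply_a(p, centers, areas)
        alpha = rr / sum(pi * api for pi, api in zip(p, ap))
        q = [qi + alpha * pi for qi, pi in zip(q, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = sum(ri * ri for ri in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    scale = (eps - 1.0) / eps
    return [scale * qi for qi in q]
```

Only vectors of length N are kept in memory, at the price of
recomputing matrix elements in every iteration. The sketch assumes that
A is symmetric and positive definite, which conjugate gradients
requires.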

The Hamiltonian equation for the solute can be written as

\[
H \Psi = (H^0 + H') \Psi, \tag{10.11}
\]

in which $H^0$ is the Hamiltonian of the solute in the gas phase and
$H'$ is the perturbation from the apparent surface charge,

\[
H' = \sum_{u,\alpha} \frac{q_u Z_\alpha}{|\mathbf{r}_u - \mathbf{R}_\alpha|}
   - \sum_{u} \frac{q_u}{|\mathbf{r}_u - \mathbf{r}|}, \tag{10.12}
\]

where $q_u$ and $\mathbf{r}_u$ are, respectively, the induced surface
charge on tessera $u$ and its location. Since the induced charges on
the interface and the electronic structure of the solute depend on each
other, Eqs. 10.7 and 10.11 must be solved iteratively. This is the main
idea of the self-consistent reaction field (SCRF) method.
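
The structure of such a self-consistent loop can be illustrated with a
deliberately simple stand-in. In the toy model below, which is not the
quantum mechanical procedure itself, the "solute" is a polarizable
point dipole and the "solvent" returns a reaction field proportional to
that dipole; `alpha` and `g` are hypothetical response parameters. The
two steps play the roles of Eq. 10.7 (solvent response) and Eq. 10.11
(solute response).

```python
def scrf_toy(mu_gas, alpha, g, tol=1e-12, max_cycles=100):
    """Iterate solute response and reaction field to self-consistency.

    Toy stand-in for the SCRF loop: mu = mu_gas + alpha * field for
    the solute, and field = g * mu for the solvent.
    """
    mu = mu_gas
    for cycle in range(1, max_cycles + 1):
        field = g * mu                     # solvent step
        mu_new = mu_gas + alpha * field    # solute step
        if abs(mu_new - mu) < tol:
            return mu_new, cycle
        mu = mu_new
    raise RuntimeError("SCRF toy did not converge")
```

For this linear model the fixed point is known in closed form,
mu_gas / (1 - alpha * g), which makes it easy to check that the
iteration converges to the right answer when |alpha * g| < 1.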
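The free energy of the solute is given by

\[
G = \langle \Psi | H^0 | \Psi \rangle
  + \frac{1}{2} \langle \Psi | H' | \Psi \rangle, \tag{10.13}
\]

in which $\Psi$ is the eigenfunction of $H^0 + H'$. Subtracting the
energy of the solute in the gas phase, with gas-phase wavefunction
$\Psi^0$, the solvation free energy can be written as the sum of the
wavefunction distortion energy and the reaction field energy,

\[
\begin{aligned}
G(\mathrm{ele}) &= \left[ \langle \Psi | H^0 | \Psi \rangle
  - \langle \Psi^0 | H^0 | \Psi^0 \rangle \right]
  + \frac{1}{2} \langle \Psi | H' | \Psi \rangle \\
 &= G(\mathrm{wfd}) + G(\mathrm{es}).
\end{aligned} \tag{10.14}
\]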

The CPCM method works well for small solute molecules but is not
directly applicable to proteins because of their large size. However,
combined with a fragmentation method such as MFCC, the CPCM approach
can be applied to the solvation of proteins. For example, B in Eq. 10.7
is computed following Eq. 10.3, by counting the contributions from all
the capped residues and removing the contributions from the
double-counted atoms. The total energy in the gas phase and in solution
can be calculated using the methods described in Section 10.1, and the
calculation of the wavefunction distortion energy is then
straightforward (Mei et al., 2006).
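After convergence, the reaction field energy is calculated as

\[
G(\mathrm{es}) = \sum_{k=1}^{N} G_k(\mathrm{es})
  - \sum_{k=1}^{N_c} G_k^{cc}(\mathrm{es}), \tag{10.15}
\]

where

\[
\begin{aligned}
G_k(\mathrm{es}) &= \frac{1}{2} \left[
  \sum_{u,\,\alpha \in k} \frac{q_u Z_\alpha}{|\mathbf{r}_u - \mathbf{R}_\alpha|}
  - \sum_{u} \int \frac{q_u \rho_k(\mathbf{r})}{|\mathbf{r}_u - \mathbf{r}|}\, d\mathbf{r}
  \right] \\
 &= \frac{1}{2} \left[
  \sum_{u,\,\alpha \in k} \frac{q_u Z_\alpha}{|\mathbf{r}_u - \mathbf{R}_\alpha|}
  + \sum_{u} q_u \phi_k(\mathbf{r}_u) \right]
\end{aligned} \tag{10.16}
\]

for the $k$th fragment and

\[
G_k^{cc}(\mathrm{es}) = \frac{1}{2} \left[
  \sum_{u,\,\alpha \in k} \frac{q_u Z_\alpha^{cc}}{|\mathbf{r}_u - \mathbf{R}_\alpha^{cc}|}
  + \sum_{u} q_u \phi_k^{cc}(\mathbf{r}_u) \right] \tag{10.17}
\]

for the $k$th pair of conjugate caps, with $N_c$ the number of cap
pairs. The total solvation free energy is given by

\[
\begin{aligned}
G(\mathrm{sol}) &= G(\mathrm{ele}) + G(\mathrm{ne}) \\
 &= G(\mathrm{wfd}) + G(\mathrm{es}) + G(\mathrm{ne}),
\end{aligned} \tag{10.18}
\]

where $G(\mathrm{ne})$ is the non-electrostatic contribution to the
solvation energy (Tomasi and Persico, 1994).

It is worth noting that the protein–solvent interface is uniquely
defined by the whole protein and is used in all the electronic
structure calculations of the capped fragments and conjugate caps.
Thus, in each cycle of the MFCC calculation, all the fragments and caps
interact with a common external electrostatic potential (ESP) created
by the same set of induced charges on the cavity surface.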

Another approach to including the solvation effect is to couple the
MFCC calculation with the Poisson–Boltzmann (PB) equation

\[
\nabla \cdot \left[ \varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \right]
  - \kappa^2(\mathbf{r})\, \phi(\mathbf{r}) = \rho(\mathbf{r}), \tag{10.19}
\]

in which $\varepsilon(\mathbf{r})$, $\phi(\mathbf{r})$, and
$\kappa(\mathbf{r})$ are the dielectric constant, the electrostatic
potential, and the Debye–Hückel parameter at $\mathbf{r}$,
respectively. Equation 10.19 can be solved either by using the electron
density of the protein from the MFCC calculation directly or by using
discrete charges fitted to the electron density. The effect of solvent
polarization can be treated by two methods. In the first approach,
Eq. 10.19 is solved twice for $\phi(\mathbf{r})$, once in the gas phase
and once in the dielectric medium. The reaction field potential is
defined as

\[
\phi_{\mathrm{RF}}(\mathbf{r}) = \phi_{\mathrm{sol}}(\mathbf{r})
  - \phi_{\mathrm{gas}}(\mathbf{r}). \tag{10.20}
\]

This reaction field is then incorporated into the MFCC Hamiltonians to
generate the polarized electron density of the protein.
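
As a quick check of what Eq. 10.20 produces, consider the textbook Born
model, which is not part of the chapter's own treatment: a point charge
$q$ at the center of a spherical cavity of radius $a$ with interior
dielectric constant 1, surrounded by a solvent of dielectric constant
$\varepsilon$ with $\kappa = 0$, written in Gaussian units with the
standard form of the Poisson equation. Inside the cavity, the two
solutions and their difference are

```latex
\begin{align}
\phi_{\mathrm{gas}}(r) &= \frac{q}{r}, \\
\phi_{\mathrm{sol}}(r) &= \frac{q}{r}
  - \left(1 - \frac{1}{\varepsilon}\right)\frac{q}{a}, \qquad r < a, \\
\phi_{\mathrm{RF}}(r) &= \phi_{\mathrm{sol}}(r) - \phi_{\mathrm{gas}}(r)
  = -\left(1 - \frac{1}{\varepsilon}\right)\frac{q}{a}.
\end{align}
```

The singular Coulomb term cancels in the difference, leaving a
reaction field potential that is constant inside the cavity, and
$\tfrac{1}{2} q\, \phi_{\mathrm{RF}}$ recovers the Born solvation
energy $-(1 - 1/\varepsilon)\, q^2 / (2a)$.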
The other approach is to map the polarization effect onto the
protein–solvent interface in the form of induced charges and then to
include the induced charges as a one-electron operator in the MFCC
Hamiltonian. This approach was employed by Ji et al. (2008), working
with the discrete representation of the electron density. As in the
MFCC-CPCM approach, the procedure must be iterated until convergence is
reached.

10.3 Polarized Protein-Specific Charge

In the widely used contemporary force fields for biological molecules,
such as AMBER, CHARMM, and OPLS, the atomic charges are amino acid
specific, i.e., they depend only on the type of the amino acid.
However, it is well known that a protein is an electrostatically
heterogeneous entity, and the electron density distribution in a
protein is a function of the coordinates and the electrostatic
environment. Residues of the same type at different locations in a
protein have different conformations and are embedded in different
chemical environments. Atomic charges from pairwise force fields are a
mean-field approximation to the charge distribution of residues of the
same type. The advantage of this charge scheme is its high portability,
which unfortunately comes at the cost of limited accuracy. Correct
prediction of the subtleties of electrostatic interactions in proteins
calls for a more accurate description of the interaction potential.
Since the electronic structure of the protein is available through the
MFCC calculation, atomic charges can be obtained with a suitable charge
fitting scheme. This set of atomic charges is no longer amino acid
specific but protein specific. In addition, the intra-protein
polarization effect and the perturbation from the solvent are
incorporated in the quantum mechanical calculation of the electronic
structure. This charge model is therefore termed the polarized
protein-specific charge, or PPC for short.
In order to be consistent with the AMBER force field, the restrained
electrostatic potential (RESP) fitting method is employed. The theory
of RESP can be found elsewhere (Bayly et al., 1993; Cornell et al.,
1993; Cieplak et al., 1995) and is not covered here. RESP charges are
now widely used in MM and QM/MM modeling. However, this charge fitting
method always suffers from a numerical difficulty, known mathematically
as rank deficiency of the least-squares matrix. It arises not only in
the RESP fitting of PPC but in all ESP-based charge fitting methods
(Stouch and Williams, 1992, 1993). The idea of ESP-based charge fitting
is to reproduce, in a least-squares sense, the electrostatic potential
at grid points around the molecule. Because the ESP grid points are
scattered on scaled van der Waals surfaces and are closer to the
surface atoms than to the buried atoms, the figure-of-merit function
measuring the quality of the fit is less sensitive to the buried atoms.
Therefore, although the number of grid points is much larger than the
number of atoms, the atomic charges remain poorly determined for
molecules with buried atoms.
Some remedies have been proposed, such as fitting to several
conformations simultaneously or using the electron density to reweight
the grid points (Reynolds et al., 1992; Hu et al., 2007; Berente et
al., 2007; Mei and Zhang, 2009). Recently, a new charge scheme termed
delta restrained electrostatic potential (dRESP) has been proposed to
mitigate the impact of this numerical difficulty on the fitted atomic
charges (Zeng et al., 2013a). In this method, the atomic charge of each
atom is divided into two parts as

\[
q_j = q_j^0 + \delta q_j, \tag{10.21}
\]

where $q_j^0$ is the base charge and $\delta q_j$ is a perturbation to
the base charge. The base charge should be a good mean-field
approximation to the charge distribution and can be taken from the
AMBER, CHARMM, or OPLS force fields, among others. The perturbation is
usually very small and captures the system dependence. In addition, it
must satisfy the constraint

\[
\sum_j \delta q_j = 0. \tag{10.22}
\]

The contribution of the base charges is removed from the reference ESP,
which is usually obtained from quantum mechanical calculations. Instead
of fitting the total atomic charges directly, only the perturbations
are fitted to the residual ESP. The figure-of-merit function is now
written as

\[
\chi^2_{\mathrm{dresp}} = \chi^2_{\mathrm{esp}} + \chi^2_{\mathrm{rstr}}
 = \sum_i \left( V_i - V_i^0 - \sum_j \frac{\delta q_j}{r_{ij}} \right)^2
 + \chi^2_{\mathrm{rstr}}, \tag{10.23}
\]

where $V_i$ is the reference ESP at grid point $i$, $r_{ij}$ is the
distance between grid point $i$ and atom $j$, and $V_i^0$ is the
electrostatic potential at grid point $i$ generated by the base
charges, usually located on the nuclei, which is calculated as

\[
V_i^0 = \sum_j \frac{q_j^0}{r_{ij}}. \tag{10.24}
\]
The fit then proceeds in the same way as the original RESP fit, except
that the weighting factor for each atom is assigned differently. In the
RESP fit, restraints that keep the atomic charges near zero are applied
to each atom with a uniform weighting factor. In dRESP, different
weighting factors are assigned to the atoms, inversely proportional to
the square of the base charges, i.e.,

\[
W_j \propto 1 / (q_j^0)^2. \tag{10.25}
\]

Therefore, nonpolar atoms have large weighting factors that keep them
more inert, while polar atoms have more freedom to vary with the
chemical environment. For a hexapeptide in eight conformations, the
dRESP charges of the polar atoms are very close to the RESP charges,
while the dRESP charges of the nonpolar atoms are nearly invariant
under conformational change. Therefore, dRESP can be as effective as
RESP in describing the strong Coulomb interactions among polar atoms.
The difference is that it avoids the large fluctuations in potential
energy caused by the artificial charge separation produced by other
ESP-based fitting methods.
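
The pieces of the dRESP figure-of-merit function can be sketched as
follows. The code evaluates the base-charge potential of Eq. 10.24, the
weights of Eq. 10.25, and the objective of Eq. 10.23 for a trial set of
perturbations. Two things are assumptions of this sketch rather than
statements from the text: the restraint term is written as a simple
harmonic penalty on the perturbations (RESP itself uses a hyperbolic
restraint), and `strength` and `floor` are hypothetical parameters. The
constrained minimization under Eq. 10.22 is not shown.

```python
import math


def base_potential(grid, atoms, base_charges):
    """V_i^0 of Eq. 10.24: ESP of the base charges at each grid point."""
    return [sum(q0 / math.dist(g, a) for a, q0 in zip(atoms, base_charges))
            for g in grid]


def dresp_weights(base_charges, floor=1e-3):
    """Weights proportional to 1 / (q_j^0)^2, as in Eq. 10.25.

    `floor` guards against division by zero for a vanishing base
    charge; it is an assumption of this sketch.
    """
    return [1.0 / max(abs(q0), floor) ** 2 for q0 in base_charges]


def dresp_objective(dq, grid, atoms, base_charges, v_ref, strength=5e-4):
    """chi^2 of Eq. 10.23 with an assumed harmonic restraint term."""
    v0 = base_potential(grid, atoms, base_charges)
    chi_esp = 0.0
    for i, g in enumerate(grid):
        v_dq = sum(d / math.dist(g, a) for a, d in zip(atoms, dq))
        chi_esp += (v_ref[i] - v0[i] - v_dq) ** 2
    weights = dresp_weights(base_charges)
    chi_rstr = strength * sum(w * d * d for w, d in zip(weights, dq))
    return chi_esp + chi_rstr
```

With these weights, an atom whose base charge is 0.1 is restrained 25
times more strongly than one whose base charge is 0.5, which is how the
nonpolar atoms are kept close to their base values.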

10.4 Dynamically Adapted Hydrogen Bond Charge

A protein is not a static entity. Instead, it adopts an ensemble of
structures (Henzler-Wildman et al., 2007). Large-scale conformational
change, which may be induced by ligand binding, titration, and so on,
is correlated with protein function. Protein folding is another typical
example, in which a protein travels from a very extended structure to
an ordered and well-defined structure. Large-scale conformational
change is accompanied by charge redistribution. Therefore, to study a
dynamic process involving diverse conformations, a single set of
polarized protein-specific charges, which may be fitted to the starting
structure of the simulation, is not suitable for the whole trajectory.
Ideally, the atomic charges should be updated at each step of the
molecular dynamics propagation.
However, charge fitting based on quantum mechanical calculations at
every time step is still impractically expensive. It is reasonable to
assume that the charge distribution is mainly perturbed by the
formation and breaking of hydrogen bonds, which are also good
indicators of changes in secondary structure. During the molecular
dynamics simulation, main chain hydrogen bonds are checked
periodically. If any main chain hydrogen bond is formed or broken, the
residues involved in this hydrogen bond have their atomic charges
refitted (Duan et al., 2010). The time interval between two successive
checks should be short enough to guarantee that the current charges
remain acceptable for the structures in that interval. Some weak
hydrogen bonds may switch rapidly between breaking and formation.

To avoid repeated charge fitting for the residues involved in these
hydrogen bonds, the determination of bond formation and breaking can be
based on the occupancy over a short time period, say 10 snapshots in
the last 1 ps. If the occupancy is below 0.3, the hydrogen bond is
deemed not to have formed and its status is set to nonexistent. When
the occupancy reaches 0.7 or more, the hydrogen bond is stable and its
status is set to existent. In between, the status of the hydrogen bond
is unaltered. For example, suppose the occupancy of a hydrogen bond is
0.2 at time t0, so its status is nonexistent. At time t1 the occupancy
increases to 0.5 and the status is unchanged; the atomic charges of the
residues involved in this hydrogen bond are not refitted at this
moment. At time t2, the occupancy becomes 0.7 and the status is
switched to existent. At this moment, the atomic charges of the two
residues involved should be updated.
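
The occupancy rule is a two-threshold (hysteresis) switch, which can be
sketched as follows. The function names are illustrative, and the upper
threshold is treated as inclusive so that the worked example in the
text, where an occupancy of 0.7 switches the status, is reproduced;
that boundary convention is an assumption of this sketch.

```python
def occupancy(flags):
    """Fraction of recent snapshots in which the bond is present."""
    return sum(flags) / len(flags)


def update_hbond_status(status, occ, low=0.3, high=0.7):
    """Hysteresis rule for one hydrogen bond.

    Returns (new_status, refit); refit is True only when the status
    flips, i.e., when the two residues need their charges refitted.
    """
    if occ < low:
        new_status = False
    elif occ >= high:
        new_status = True
    else:
        new_status = status        # dead band: keep the previous status
    return new_status, new_status != status


# The example from the text: occupancies 0.2, 0.5, 0.7 at t0, t1, t2.
status = False
history = []
for occ in (0.2, 0.5, 0.7):
    status, refit = update_hbond_status(status, occ)
    history.append((status, refit))
```

Because of the dead band between the two thresholds, a weak hydrogen
bond whose occupancy hovers around 0.5 never triggers a refit.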
Although the computational expense is significantly reduced by
refitting charges periodically only for selected residues, based on the
occupancy in a collection of snapshots, this is still too demanding for
protein folding simulations, in which large-scale conformational change
takes place over a long time scale. A much cheaper way to implement the
polarization effect along main chain hydrogen bonds is to employ an
empirical relation between the charge response and the hydrogen bond
strength (Gao et al., 2011). Figure 10.2 shows the model system used
for the parameterization, which consists of a pair of alanine
dipeptides connected through a main chain hydrogen bond. By
systematically varying the length of the hydrogen bond, the atomic
charges for each conformation can be fitted through quantum mechanical
calculations. It is a good approximation to assume that only the atomic
charges of the hydrogen-bonded NH and CO groups are adjusted, while the
other charges are fixed. It is further assumed that charge flow between
the two residues is not allowed, so charge flows only within the NH
group and within the CO group.

The amount of charge transferred from H to N and from C to O can be
well fitted by single-exponential functions of the distance
$d_{\mathrm{ON}}$ between the two heavy atoms as

\[
\Delta q_{\mathrm{N}} = -0.493 \times \exp(-0.455 \times d_{\mathrm{ON}}) \tag{10.26}
\]

and

\[
\Delta q_{\mathrm{O}} = -0.334 \times \exp(-0.466 \times d_{\mathrm{ON}}). \tag{10.27}
\]

The influence of the hydrogen bond angle on the charge variation is
found to be negligible. With these relationships built into molecular
dynamics simulations, the polarization effect can be incorporated by
updating the atomic charges of the hydrogen-bonded main chain polar
groups.
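
A charge update based on Eqs. 10.26 and 10.27 might look like the
sketch below. Several points are assumptions of the sketch, not
statements from the text: the distance is taken to be in angstroms, the
shifts are applied relative to reference charges of the
non-hydrogen-bonded groups, the sign convention makes N and O more
negative and H and C correspondingly more positive, and the reference
charges are illustrative backbone values.

```python
import math


def hbond_charge_shifts(d_on):
    """Charge shifts of Eqs. 10.26 and 10.27 for an O...N distance."""
    dq_n = -0.493 * math.exp(-0.455 * d_on)
    dq_o = -0.334 * math.exp(-0.466 * d_on)
    return dq_n, dq_o


def polarize_hbond(charges, d_on):
    """Return updated charges for the N-H and C=O groups.

    `charges` maps the labels 'N', 'H', 'C', 'O' to reference charges.
    Each shift stays within its own group, so the net charges of the
    NH group and of the CO group are unchanged.
    """
    dq_n, dq_o = hbond_charge_shifts(d_on)
    new = dict(charges)
    new["N"] += dq_n
    new["H"] -= dq_n
    new["O"] += dq_o
    new["C"] -= dq_o
    return new


# Illustrative reference charges (assumed values for this example).
reference = {"N": -0.4157, "H": 0.2719, "C": 0.5973, "O": -0.5679}
polarized = polarize_hbond(reference, d_on=3.0)
```

In a simulation, such an update would be applied to every main chain
hydrogen bond currently flagged as existent, each time the charges are
refreshed.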

Figure 10.2 (top) The model system used for the parameterization, i.e.,
a pair of alanine dipeptides connected through a main chain hydrogen
bond. (bottom) The calculated charge flows can be well fitted by
single-exponential functions of the bond distance.

10.5 Effective Polarizable Bond Method

The effective polarizable bond (EPB) method is another efficient means
of investigating the microenvironment-dependent character of
electrostatic interactions in proteins (Xiao et al., 2013; Ji et al.,
2012). In EPB, charges are allowed to migrate through polarizable
bonds, and all the parameters are predetermined by extensive quantum
mechanical calculations of the polar group in electric fields of
various strengths. The EPB model keeps the "effective charge" character
of the classical force field and provides a good correction to the
traditional force field for MD simulation by introducing "fluctuating"
charges for the atoms of polar groups.
When a molecule is placed in an electric field, its electrons
redistribute to accommodate the new environment. This redistribution of
electron density is the quantum mechanical source of electronic
polarization (Yu and van Gunsteren, 2005; Cieplak et al., 2009). Two
opposing energetic effects occur during the electron redistribution. On
the one hand, electron redistribution enhances the interaction between
the molecule and the environment, which lowers the electrostatic energy
of the system. On the other hand, the internal energy of the molecule
rises as a result of the distortion of its electron distribution by
polarization. This energy is usually known as the distortion energy in
quantum mechanical calculations.

When a molecule is moved from the gas phase to a condensed phase, these
two opposing energetic effects counterbalance each other and establish
a new equilibrium when the molecule reaches its final polarized state
in the electrostatic environment generated by the surrounding
molecules. Following this rationale, the relationship between the
distortion energy and the polarization state under an external electric
field can be determined from a large set of quantum electronic
structure calculations of model systems in the gas phase and in
solution.
Take acetone as an example. To model different external electric field
environments, an acetone molecule is embedded in an octahedron-like
TIP3P water box. A long simulation is performed to generate an ensemble
of configurations of the water molecules around the acetone while the
acetone is kept fixed. A total of 15,000 configurations are extracted
from the trajectory. For each configuration, the electronic structure
of the solute is calculated without and with the influence of the
background charges to mimic the gas-phase and liquid-phase
environments, respectively. The quantitative relationship between the
polarization cost and the polarization states of certain chemical
groups can be determined through the following procedure. The
Schrödinger equations that describe the electronic structures in the
gas phase and in the solvent are

\[
H^0 \Psi^0 = E^0 \Psi^0 \tag{10.28}
\]

and

\[
(H^0 + H') \Psi = E \Psi, \tag{10.29}
\]

where $H^0$ is the solute Hamiltonian in the gas phase and $H'$ is the
interaction between the solute and the surrounding charges,

\[
H' = \sum_{s,\alpha} \frac{q_s Z_\alpha}{|\mathbf{r}_s - \mathbf{R}_\alpha|}
   - \sum_{s,i} \frac{q_s}{|\mathbf{r}_s - \mathbf{r}_i|}, \tag{10.30}
\]

where $q_s$ and $\mathbf{r}_s$ are, respectively, the background
charges and their positions, $Z_\alpha$ and $\mathbf{R}_\alpha$ are the
nuclear charges and their coordinates, and $\mathbf{r}_i$ are the
electron coordinates.

The polarization cost (or distortion) energy of the polarization
process is given by

\[
E_{\mathrm{p\text{-}cost}} = \langle \Psi | H^0 | \Psi \rangle
  - \langle \Psi^0 | H^0 | \Psi^0 \rangle. \tag{10.31}
\]
In order to derive the polarization parameter for the CO group, the
internal energy contribution from the two methane molecules was
subtracted. Figure 10.3 shows the relationship between the polarization
cost energy and the change of the dipole moment of the CO group in
CH3COCH3. The data can be fitted to a quadratic relation,

\[
E_{\mathrm{p\text{-}cost}} = k (\mu_{\mathrm{liquid}} - \mu_{\mathrm{gas}})^2, \tag{10.32}
\]

where $1/k$ represents the polarizability of the CO group. Using this
simple relationship, a new set of polarizable charges can be obtained
for molecular dynamics simulation, which mimics the electron
redistribution during the polarization process. Consider transferring
the polar group CO from the gas phase into the liquid phase. The energy
of the system can be written as

\[
E = E_{\mathrm{self}} + E_{\mathrm{ele}}
  = \left[ k (\mu_{\mathrm{liquid}} - \mu_{\mathrm{gas}})^2 \right]
  + \left[ q_{\mathrm{C}} \Phi_{\mathrm{C}} + q_{\mathrm{O}} \Phi_{\mathrm{O}} \right], \tag{10.33}
\]

where $q_{\mathrm{C}}$ and $q_{\mathrm{O}}$ are, respectively, the ESP
charges of the C and O atoms of the CO group, and $\Phi_{\mathrm{C}}$
and $\Phi_{\mathrm{O}}$ are the electrostatic potentials at the C and O
atoms, respectively.

Figure 10.3 Polarization cost energy of the CO group as a function of
the dipole moment change. Reprinted with permission from Journal of
Chemical Theory, 82 (6), 2157–2164. Copyright 2012 American Chemical
Society.
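
The quadratic fit of Eq. 10.32 has a single parameter and no intercept,
so the least-squares value of k has a closed form: minimizing the sum
of squared residuals gives k equal to the sum of E times the squared
dipole change, divided by the sum of the fourth powers of the dipole
change. A minimal sketch, with a hypothetical function name and
synthetic data in place of the quantum mechanical results:

```python
def fit_quadratic_coefficient(delta_mu, e_cost):
    """Least-squares k for E = k * delta_mu**2 (no intercept)."""
    num = sum(e * d * d for d, e in zip(delta_mu, e_cost))
    den = sum(d ** 4 for d in delta_mu)
    if den == 0.0:
        raise ValueError("all dipole changes are zero")
    return num / den


# Synthetic check: data generated with k = 2.5 are recovered.
dmu = [0.1, 0.2, 0.4]
energies = [2.5 * d * d for d in dmu]
k_fit = fit_quadratic_coefficient(dmu, energies)
```

In practice the inputs would be the dipole moment changes and
polarization cost energies computed for the sampled solvent
configurations.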
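The polarization process can be treated as charge transfer between the
atoms of a polar group. If the charge transferred from atom O to atom C
is $\Delta q$, then the final partial charges are

\[
q_{\mathrm{C}} = q_{\mathrm{C}}^{\mathrm{gas}} + \Delta q, \tag{10.34}
\]

\[
q_{\mathrm{O}} = q_{\mathrm{O}}^{\mathrm{gas}} - \Delta q, \tag{10.35}
\]

where $q_{\mathrm{C}}^{\mathrm{gas}}$ and
$q_{\mathrm{O}}^{\mathrm{gas}}$ are the atomic partial charges in the
gas phase. Thus, the dipole moment change along the CO group in the
polarization process is given by

\[
\Delta\mu = \mu_{\mathrm{liquid}} - \mu_{\mathrm{gas}}
          = \Delta q \, d_{\mathrm{CO}}, \tag{10.36}
\]

where $d_{\mathrm{CO}}$ is the length of the CO bond. Thus Eq. 10.33
can be rewritten as

\[
E = k (\Delta q \, d_{\mathrm{CO}})^2
  + (q_{\mathrm{C}}^{\mathrm{gas}} + \Delta q)\, \Phi_{\mathrm{C}}
  + (q_{\mathrm{O}}^{\mathrm{gas}} - \Delta q)\, \Phi_{\mathrm{O}}. \tag{10.37}
\]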
The equilibrium with “self energy going up and interaction
energy going down” can be reached by minimizing the total energy
with respect to variation of $\Delta q$,
$$\frac{\partial E}{\partial \Delta q} = 0 \tag{10.38}$$
from which we get the amount of charge transfer along the CO bond
under a given background electrostatic potential,
$$\Delta q = \frac{\Phi_O - \Phi_C}{2 d_{CO}^2 k} \tag{10.39}$$
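Just as in the traditional force field, the effective charge concept can
also be introduced in the fluctuating charge model.
It is convenient to express the polarization cost energy term in
Eq. 10.33 in the form of an electrostatic interaction. For the CO group,
Eq. 10.33 can be rewritten as
$$\begin{aligned}
E &= E_{\text{self}} + E_{\text{ele}} \\
  &= k(\Delta q\, d_{CO})^2 + [(q_C^{\text{gas}} + \Delta q)\Phi_C + (q_O^{\text{gas}} - \Delta q)\Phi_O] \\
  &= \tilde{q}_C \Phi_C + \tilde{q}_O \Phi_O
\end{aligned} \tag{10.40}$$
where $\tilde{q}_C$ and $\tilde{q}_O$ are, respectively, the effective
charges of the C and O atoms. Combining Eq. 10.39 and Eq. 10.40 leads to
$$\begin{aligned}
E &= k(\Delta q\, d_{CO})^2 + [(q_C^{\text{gas}} + \Delta q)\Phi_C + (q_O^{\text{gas}} - \Delta q)\Phi_O] \\
  &= \Delta q \cdot (\Delta q\, k\, d_{CO}^2) + [(q_C^{\text{gas}} + \Delta q)\Phi_C + (q_O^{\text{gas}} - \Delta q)\Phi_O] \\
  &= \Delta q \cdot \left(\frac{\Phi_O - \Phi_C}{2 d_{CO}^2 k}\, k\, d_{CO}^2\right) + [(q_C^{\text{gas}} + \Delta q)\Phi_C + (q_O^{\text{gas}} - \Delta q)\Phi_O] \\
  &= \tfrac{1}{2}\,\Delta q\,(\Phi_O - \Phi_C) + [(q_C^{\text{gas}} + \Delta q)\Phi_C + (q_O^{\text{gas}} - \Delta q)\Phi_O] \\
  &= (q_C^{\text{gas}} + \tfrac{1}{2}\Delta q)\Phi_C + (q_O^{\text{gas}} - \tfrac{1}{2}\Delta q)\Phi_O \\
  &= \tilde{q}_C \Phi_C + \tilde{q}_O \Phi_O
\end{aligned} \tag{10.41}$$
The effective fluctuating charges (EFQ) can be defined as
$$\tilde{q}_C = q_C^{\text{gas}} + \tfrac{1}{2}\Delta q \tag{10.42}$$
$$\tilde{q}_O = q_O^{\text{gas}} - \tfrac{1}{2}\Delta q \tag{10.43}$$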
The polarization cost energy works against the polarization
process. The net effect of EFQ is that the amount of charge
transferred is reduced by half when the polarization cost energy is
merged into the electrostatic interaction using point charges.
This new charge model inherits the effective character of the
classic force field and the fluctuating feature of previous polarizable
models. Unlike other polarizable models, it includes the polarization
cost energy implicitly. Since the polarization cost energy is
treated properly, this model avoids the problem of overpolarization
and is numerically stable and efficient for MD simulation.
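
The algebra of Eqs. (10.37) to (10.43) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical and the numbers are arbitrary illustrative values in consistent units, not fitted parameters. It evaluates the charge transfer of Eq. (10.39), then compares the full energy of Eq. (10.37) with the energy obtained from the effective charges alone.

```python
def charge_transfer(phi_c, phi_o, k, d_co):
    """Eq. (10.39): charge moved from O to C that minimizes Eq. (10.37)."""
    return (phi_o - phi_c) / (2.0 * d_co**2 * k)


def total_energy(dq, q_c_gas, q_o_gas, phi_c, phi_o, k, d_co):
    """Eq. (10.37): polarization cost plus electrostatic interaction."""
    cost = k * (dq * d_co) ** 2
    interaction = (q_c_gas + dq) * phi_c + (q_o_gas - dq) * phi_o
    return cost + interaction


def effective_charges(dq, q_c_gas, q_o_gas):
    """Eqs. (10.42)-(10.43): only half of dq enters the effective charges."""
    return q_c_gas + 0.5 * dq, q_o_gas - 0.5 * dq


# Arbitrary illustrative values (assumptions, not parameters from the chapter).
q_c_gas, q_o_gas = 0.5, -0.5
phi_c, phi_o = -0.02, 0.03
k, d_co = 0.8, 1.2

dq = charge_transfer(phi_c, phi_o, k, d_co)
e_min = total_energy(dq, q_c_gas, q_o_gas, phi_c, phi_o, k, d_co)

# Energy from effective charges only, with no explicit cost term (Eq. 10.41).
qc_eff, qo_eff = effective_charges(dq, q_c_gas, q_o_gas)
e_eff = qc_eff * phi_c + qo_eff * phi_o
```

With these inputs `e_min` and `e_eff` agree to rounding error, which is the content of Eq. (10.41): the cost term is absorbed by halving the transferred charge.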

10.6 Applications

10.6.1 Thermodynamics of Proton Binding in Protein


Proton binding and unbinding play an important role in modulating
protein function. In enzyme reactions, identifying the protonation
states of the core residues involved in the reaction is essential
for uncovering the catalytic mechanism (Ji and Zhang, 2011).
Post-translational modification by protons, like phosphorylation,
acetylation, and methylation, can modulate protein binding to small
molecules, or it can drive large-scale conformational changes and
dynamics that then trigger a specific function (Schönichen et al.,
2013).
Proton binding to a protein is a pH-dependent process and is
strongly influenced by electrostatic interactions with the protein
residues and surrounding water. In the past several decades, many
computational methods have been developed for predicting pKa
values, including Poisson–Boltzmann equation based continuum
electrostatics (Warshel, 1981), constant pH molecular dynamics
(Donnini et al., 2011), and molecular dynamics free energy (MDFE)
simulations (Simonson et al., 2004). For the continuum electrostatics
methods, the accuracy of the predicted pKa depends strongly on
the parameters used. Different dielectric constants should be used
for residues located on the surface of the protein and for those
buried in its interior, and choosing the dielectric constant
appropriately is difficult.
MDFE simulation in explicit water is the most rigorous method
for calculating the thermodynamics of the proton-protein binding
process, since it captures the conformational relaxation of protein
residues and water molecules around the titratable site. MDFE
calculations using polarized protein-specific charges (PPC) accurately
reproduced the experimental pKa shift for the ionizable
residue Asp26 buried inside thioredoxin (Ji et al., 2008), whereas
previous calculations using the classic AMBER94 and CHARMM22 force
fields all overestimated the pKa shift by about a factor of two
(Simonson et al., 2004). The free energy profile of the proton binding
process was constructed from distributions of conformational ensembles
sampled in MD simulation. Simulations employing PPC can correctly
describe the conformational ensemble and electrostatic interactions of
proteins in water. Without polarization, conformations sampled
from MD simulation may deviate substantially from the real state,
which would introduce errors in predicting thermodynamic properties of
biochemical processes.
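
To connect a computed free energy difference to a pKa shift, one can use the standard thermodynamic-cycle relation, in which the shift equals the difference in deprotonation free energy between the protein and a model compound divided by RT ln 10. The sketch below illustrates that conversion only; the function name, the default temperature, and the sample inputs are assumptions for illustration and are not taken from the studies cited above.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def pka_shift(ddg_kcal, temperature=298.15):
    """pKa shift implied by a relative deprotonation free energy.

    ddg_kcal is dG_deprot(protein) - dG_deprot(model compound) in kcal/mol.
    A positive value means deprotonation is harder in the protein, so the
    pKa moves up.
    """
    return ddg_kcal / (R_KCAL * temperature * math.log(10.0))


def protein_pka(model_pka, ddg_kcal, temperature=298.15):
    """Absolute pKa in the protein from a reference model-compound pKa."""
    return model_pka + pka_shift(ddg_kcal, temperature)


# Illustrative inputs only: a hypothetical model pKa and free energy difference.
example = protein_pka(model_pka=4.0, ddg_kcal=4.0)
```

At room temperature one pKa unit corresponds to roughly 1.36 kcal/mol, so an error of a few kcal/mol in the simulated free energy translates directly into an error of several pKa units.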

10.6.2 Protein Ligand Binding


Understanding the mechanism of protein-ligand binding is vital
to understanding various life processes such as cell signaling and
modulation of gene expression. Accurate calculation of protein-ligand
binding affinity is also important for rational drug design.
A battery of methods has been developed for the calculation
of binding affinity, from empirical scoring methods to more
rigorous methods such as free energy perturbation and thermodynamic
integration (Kollman, 1993). Different simulations and post-simulation
processing may give contradictory results, partly due
to the use of different force fields. For example, Kollman et al.
concluded from a molecular dynamics simulation employing the
unpolarized AMBER force field that the dominant driving force for
avidin-biotin binding was the van der Waals interaction. In their
study, the electrostatic component of the avidin-biotin interaction
was nearly canceled out by the desolvation penalty. Avidin-biotin is
the strongest binding pair occurring in nature, with a binding free
energy over 20 kcal/mol (Green, 1966). Biotin is firmly anchored in
the pocket of avidin through 8 hydrogen bonds.
The conclusion from Kollman’s study runs somewhat against
chemical intuition. Explicit polarization along a hydrogen bond will
strongly distort the electron density to facilitate the hydrogen bond.
Hydrogen bond cooperativity is significant in this system, which has
been confirmed by quantum mechanical calculations (DeChancie
and Houk, 2007). These effects are neglected in simulations
employing pairwise force fields. A series of studies of (strept)avidin-
biotin systems with various free energy estimation methods showed
that the electrostatic polarization effect is critical in accounting
for the strong binding affinity in the avidin-biotin complex (Tong et
al., 2010; Mei et al., 2012b; Zeng et al., 2013b).
For instance, a recent mutagenesis study showed that a distal
mutation, F130L, caused no significant perturbation to either the local
structure of the binding pocket in streptavidin or the binding pattern
of biotin in the pocket, but it reduced the binding affinity of biotin
to streptavidin by 1000-fold, which corresponds to a 4.2 kcal/mol loss
in binding free energy (Baugh et al., 2010). The experimentalists
speculated that the electrostatic polarization effect was
responsible for the loss. Postprocessing of the MD simulations of
the wild type and the mutant using the unpolarized AMBER force field
was not able to provide qualitatively correct binding affinities via
the end-point free energy method. The fitted dRESP PPC for both
the wild type and the mutant show remarkable deviations from the
AMBER charges. Furthermore, both the absolute and relative binding
affinities calculated with these charges, especially the binding
enthalpy, were very close to the experimental measurements (Zeng et
al., 2013b).
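
The link between a fold change in affinity and a free energy change follows from the standard relation ΔΔG = RT ln(fold). The short sketch below evaluates it; the function name and the default temperature are assumptions made for illustration, and the exact value depends on the temperature used.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def ddg_from_fold_change(fold, temperature=298.15):
    """Binding free energy change (kcal/mol) for a `fold`-fold change
    in the dissociation constant."""
    return R_KCAL * temperature * math.log(fold)


# A 1000-fold loss in affinity at 298.15 K (about 4.1 kcal/mol).
loss = ddg_from_fold_change(1000.0)
```

The result is close to the 4.2 kcal/mol quoted above; the small difference is within what a slightly higher temperature would produce.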

10.6.3 Protein Folding


Protein folding is currently understood as a journey on the
protein’s free energy landscape toward the native structure
(the global free energy minimum). During folding, hydrophobic side
chains pack together, and polar groups move outward toward the
solvent or form strong intra-protein Coulomb interactions such
as hydrogen bonds and salt bridges. Main-chain hydrogen bonds
are a typical characteristic of secondary structures. Formation of
secondary structure is a competition between residue–residue
hydrogen bond interaction and the desolvation penalty. Force fields
like AMBER and CHARMM have no explicit hydrogen bond term;
hydrogen bonding is described mainly by the electrostatic interaction.
However, it is well known that hydrogen-bonded residues strongly
perturb each other’s electronic structure, which facilitates
the bonding and lowers the potential energy further. This is a typical
polarization effect, which cannot be described by a pairwise force
field.
With the on-the-fly charge fitting method, atomic charges of the
protein can be obtained at any time. The method can therefore give a
real-time and more realistic description of the interaction between
hydrogen-bonded pairs. Folding simulations of a short helical peptide
(Protein Data Bank entry 2I9M) have demonstrated the essential role
played by the electrostatic polarization effect and the effectiveness
of the dynamically adapted polarized hydrogen bond charge in protein
folding simulations (Duan et al., 2010, 2012). In these simulations,
main-chain hydrogen bonds are checked periodically. If any main-chain
hydrogen bond is formed or broken, the atomic charges of
the residues involved in this hydrogen bond are refitted and
kept constant until the next hydrogen bond check. A nascent
hydrogen bond is more stable under this charge model, with less
chance of being cleaved by solvent molecules, so folding can
move forward along the pathway. Some snapshots from these folding
simulations are shown in Fig. 10.4.
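
The control flow of this protocol (propagate, check hydrogen bonds at intervals, refit only when the set of bonds has changed) can be sketched as follows. This is an illustrative skeleton, not the published implementation: every function name is hypothetical, and the MD step, hydrogen-bond detection, and charge refit are stand-in stubs so that the sketch runs on its own.

```python
def run_adaptive_charge_md(n_steps, check_interval, md_step, detect_hbonds,
                           refit_charges, charges):
    """Toy loop: refit charges only when main-chain H-bonds form or break."""
    known_hbonds = frozenset()
    n_refits = 0
    for step in range(1, n_steps + 1):
        md_step(charges)                      # charges stay fixed between checks
        if step % check_interval != 0:
            continue
        current = frozenset(detect_hbonds(step))
        changed = current ^ known_hbonds      # bonds formed or broken since last check
        if changed:
            residues = sorted({r for pair in changed for r in pair})
            charges = refit_charges(charges, residues)
            n_refits += 1
        known_hbonds = current
    return charges, n_refits


# Stand-ins so the sketch is runnable; a real workflow would call an MD
# engine and a quantum-chemistry charge fit at these points.
def fake_md_step(charges):
    pass


def fake_detect_hbonds(step):
    # Pretend a backbone hydrogen bond between residues 2 and 6 exists
    # from step 200 onward.
    return [(2, 6)] if step >= 200 else []


def fake_refit(charges, residues):
    updated = dict(charges)
    for r in residues:
        updated[r] = updated.get(r, 0) + 1    # count refits per residue
    return updated


final_charges, n_refits = run_adaptive_charge_md(
    n_steps=500, check_interval=100, md_step=fake_md_step,
    detect_hbonds=fake_detect_hbonds, refit_charges=fake_refit, charges={})
```

In this toy run the only refit happens at the check where the bond first appears; later checks see an unchanged bond set and leave the charges alone.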

Figure 10.4 Snapshots of intermediate structures of peptide 2I9M in
simulations using AMBER (upper) and dynamically adapted HBC (lower).
α-helix: purple; coil: white; turn: cyan. Reprinted with permission from
Journal of the American Chemical Society, 132 (32), 11159–11164. Copyright
2010 American Chemical Society.
Pairwise force fields assume that the charge distribution in a
protein is not altered by solvent molecules, no matter how polar the
solvent is. The response of the protein to a change of solvent is
carried by the nuclear degrees of freedom but not the electronic ones.
This is a crude assumption, which is not strictly valid. A polar
solvent pulls and pushes strongly on the electrons in a protein, while
a nonpolar solvent interacts only weakly. Therefore, a protein has a
larger dipole in water than in a nonpolar solvent. Atomic charges in
pairwise force fields are usually made more suitable for proteins in
polar solvent, either by calculating the electron density in water or
by ad hoc scaling up of the charges. In a less polar or nonpolar
solvent such as trifluoroethanol, these specifically tuned atomic
charges are no longer suitable. The polarized protein-specific charge
takes the chemical environment of each residue into consideration,
including the solvent effect. Therefore, it is capable of giving a
more realistic description of the charge distribution in the protein.
Simulations of E6-associated protein with on-the-fly charge fitting
indicate diverse folding pathways of this protein in trifluoroethanol
and water (Xu et al., 2012).

References

Barrett, R., Berry, M., Chan, T. F., Demmel, J., Donato, J., Dongarra, J., Eijkhout,
V., Pozo, R., Romine, C., and Van der Vorst, H. (1994). Templates for the
Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd
Edition (SIAM, Philadelphia, PA).
Baugh, L., Le Trong, I., Cerutti, D. S., Gulich, S., Stayton, P. S., Stenkamp,
R. E., and Lybrand, T. P. (2010). A distal point mutation in the
streptavidin/biotin complex preserves structure but diminishes bind-
ing affinity: Experimental evidence of electronic polarization effects?
Biochemistry 49, pp. 4568–4570.
Bayly, C. I., Cieplak, P., Cornell, W., and Kollman, P. A. (1993). A well-
behaved electrostatic potential based method using charge restraints
for deriving atomic charges: The RESP model, Journal of Physical
Chemistry 97, pp. 10269–10280.
Berente, I., Czinki, E., and Náray-Szabó, G. (2007). A combined
electronegativity equalization and electrostatic potential fit method for
the determination of atomic point charges, Journal of Computational
Chemistry 28, 12, pp. 1936–1942.
Cieplak, P., Cornell, W. D., Bayly, C., and Kollman, P. A. (1995). Application
of the multimolecule and multiconformational RESP methodology to
biopolymers: Charge derivation for DNA, RNA, and proteins, Journal of
Computational Chemistry 16, pp. 1357–1377.
Cieplak, P., Dupradeau, F.-Y., Duan, Y., and Wang, J. (2009). Polarization effects
in molecular mechanical force fields, Journal of Physics: Condensed
Matter 21, 33, p. 333102.
Cornell, W. D., Cieplak, P., Bayly, C. I., and Kollman, P. A. (1993). Application
of RESP charges to calculate conformational energies, hydrogen bond
energies, and free energies of solvation, Journal of the American
Chemical Society 115, pp. 9620–9631.
DeChancie, J., and Houk, K. N. (2007). The origins of femtomolar protein-
ligand binding: Hydrogen-bond cooperativity and desolvation energet-
ics in the biotin-(strept)avidin binding site, Journal of the American
Chemical Society 129, pp. 5419–5429.
Donnini, S., Tegeler, F., Groenhof, G., and Grubmüller, H. (2011). Constant
pH molecular dynamics in explicit solvent with λ-dynamics, Journal of
Chemical Theory and Computation 7, 6, pp. 1962–1978.
Duan, L. L., Gao, Y., Mei, Y., Zhang, Q. G., Tang, B., and Zhang, J. Z. H.
(2012). Folding of a helix is critically stabilized by polarization of
backbone hydrogen bonds: Study in explicit water, The Journal of
Physical Chemistry B 116, 10, pp. 3430–3435.
Duan, L. L., Mei, Y., Zhang, D., Zhang, Q. G., and Zhang, J. Z. H. (2010).
Folding of a helix at room temperature is critically aided by electrostatic
polarization of intraprotein hydrogen bonds, Journal of the American
Chemical Society 132, 32, pp. 11159–11164.
Gao, Y., Lu, X., Duan, L. L., Zhang, J. Z. H., and Mei, Y. (2011). Polarization of
intraprotein hydrogen bond is critical to thermal stability of short helix,
The Journal of Physical Chemistry B 116, 1, pp. 549–554.
Gordon, M. S., Fedorov, D. G., Pruitt, S. R., and Slipchenko, L. V. (2011).
Fragmentation methods: A route to accurate calculations on large
systems, Chemical Reviews 112, 1, pp. 632–672.
Green, N. M. (1966). Thermodynamics of the binding of biotin and some
analogues by avidin, Biochemical Journal 101, pp. 774–780.
Henzler-Wildman, K. A., Lei, M., Thai, V., Kerns, S. J., Karplus, M., and Kern,
D. (2007). A hierarchy of timescales in protein dynamics is linked to
enzyme catalysis, Nature 450, pp. 913–916.
Hu, H., Lu, Z., and Yang, W. (2007). Fitting molecular electrostatic potentials
from quantum mechanical calculations, Journal of Chemical Theory and
Computation 3, 3, pp. 1004–1013.
Ji, C., Mei, Y., and Zhang, J. Z. H. (2008). Developing polarized protein-specific
charges for protein dynamics: MD free energy calculation of pKa shifts
for Asp26/Asp20 in thioredoxin, Biophysical Journal 95, 3, pp. 1080–1088.
Ji, C. G., Xiao, X., and Zhang, J. Z. H. (2012). Studying the effect of site-
specific hydrophobicity and polarization on hydrogen bond energy of
protein using a polarizable method, Journal of Chemical Theory and
Computation 8, 6, pp. 2157–2164.
Ji, C. G., and Zhang, J. Z. H. (2011). Understanding the molecular
mechanism of enzyme dynamics of ribonuclease A through
protonation/deprotonation of His48, Journal of the American Chemical
Society 133, 44, pp. 17727–17737.
Kollman, P. (1993). Free energy calculations: Applications to chemical and
biochemical phenomena, Chemical Reviews 93, 7, pp. 2395–2417.
Mei, Y., He, X., Ji, C., Zhang, D., and Zhang, J. Z. H. (2012a). A fragmentation
approach to quantum calculation of large molecular systems, Progress
in Chemistry 24, 6, pp. 1058–1064.
Mei, Y., Ji, C., and Zhang, J. Z. H. (2006). A new quantum method for
electrostatic solvation energy of protein, The Journal of Chemical
Physics 125, 9, p. 094906.
Mei, Y., Li, Y. L., Zeng, J., and Zhang, J. Z. H. (2012b). Electrostatic polarization
is critical for the strong binding in streptavidin-biotin system, Journal
of Computational Chemistry 33, 15, pp. 1374–1382.
Mei, Y., Zhang, D. W., and Zhang, J. Z. H. (2004). New method for direct
linear-scaling calculation of electron density of proteins, The Journal of
Physical Chemistry A 109, 1, pp. 2–5.
Mei, Y., and Zhang, J. Z. H. (2009). Numerical stabilities in fitting
atomic charges to electric field and electrostatic potential, Journal of
Theoretical and Computational Chemistry 08, pp. 925–942.
Reynolds, C. A., Essex, J. W., and Richards, W. G. (1992). Atomic charges
for variable molecular conformations, Journal of the American Chemical
Society 114, 23, pp. 9075–9079.
Schönichen, A., Webb, B. A., Jacobson, M. P., and Barber, D. L. (2013).
Considering protonation as a posttranslational modification regulating
protein structure and function, Annual Review of Biophysics 42, 1, pp.
289–314.
Simonson, T., Carlsson, J., and Case, D. A. (2004). Proton binding to proteins:
pKa calculations with explicit and implicit solvent models, Journal of the
American Chemical Society 126, 13, pp. 4167–4180.
Stouch, T. R., and Williams, D. E. (1992). Conformational dependence
of electrostatic potential derived charges of a lipid headgroup:
Glycerylphosphorylcholine, Journal of Computational Chemistry 13, 5,
pp. 622–632.
Stouch, T. R., and Williams, D. E. (1993). Conformational dependence of
electrostatic potential-derived charges: Studies of the fitting procedure,
Journal of Computational Chemistry 14, 7, pp. 858–866.
Tomasi, J., and Persico, M. (1994). Molecular interactions in solution: An
overview of methods based on continuous distributions of the solvent,
Chemical Reviews 94, 7, pp. 2027–2094.
Tong, Y., Mei, Y., Li, Y. L., Ji, C. G., and Zhang, J. Z. H. (2010). Electrostatic
polarization makes a substantial contribution to the free energy of
avidin-biotin binding, Journal of the American Chemical Society 132, 14,
pp. 5137–5142.
Warshel, A. (1981). Calculations of enzymic reactions: Calculations of
pKa, proton transfer reactions, and general acid catalysis reactions in
enzymes, Biochemistry 20, 11, pp. 3167–3177.
Xiao, X., Zhu, T., Ji, C. G., and Zhang, J. Z. H. (2013). Development of an effective
polarizable bond method for biomolecular simulation, The Journal of
Physical Chemistry B 117, 48, pp. 14885–14893.
Xu, Z., Lazim, R., Sun, T., Mei, Y., and Zhang, D. (2012). Solvent effect on the
folding dynamics and structure of E6-associated protein characterized
from ab initio protein folding simulations, The Journal of Chemical
Physics 136, 13, p. 135102.
Yu, H., and van Gunsteren, W. F. (2005). Accounting for polarization in mole-
cular simulation, Computer Physics Communications 172, 2, pp. 69–85.
Zeng, J., Duan, L., Zhang, J. Z. H., and Mei, Y. (2013a). A numerically stable
restrained electrostatic potential charge fitting method, Journal of
Computational Chemistry 34, 10, pp. 847–853.
Zeng, J., Jia, X., Zhang, J. Z. H., and Mei, Y. (2013b). The F130L mutation in
streptavidin reduces its binding affinity to biotin through electronic
polarization effect, Journal of Computational Chemistry 34, 31, pp.
2677–2686.
Zhang, D. W., Chen, X. H., and Zhang, J. Z. H. (2003). Molecular caps for full
quantum mechanical computation of peptide-water interaction energy,
Journal of Computational Chemistry 24, 15, pp. 1846–1852.
Zhang, D. W., and Zhang, J. Z. H. (2003). Molecular fractionation with
conjugate caps for full quantum mechanical calculation of protein-
molecule interaction energy, The Journal of Chemical Physics 119, 7, pp.
3599–3605.
January 29, 2016 11:32 PSP Book - 9in x 6in 11-Qiang-Cui-c11

Chapter 11

Polarizable Continuum Models for (Bio)Molecular Electrostatics: Basic
Theory and Recent Developments for Macromolecules and Simulations

John M. Herbert^a and Adrian W. Lange^{a,b}

^a Department of Chemistry and Biochemistry,
The Ohio State University, Columbus, OH, USA
^b Present address: Apple, Inc., Cupertino, CA, USA

[email protected]

11.1 Overview

The topic of this chapter is the solution of a simple and well-defined
model problem, namely, the molecular electrostatics problem for
one or more molecules immersed in a homogeneous dielectric
medium characterized by a dielectric constant, ε. The interface
between the atomistic region (the solute) and the continuum
solvent is defined by a molecule-shaped cavity such as the ones
depicted in Figs. 11.1(a) and 11.1(b). In practice, this cavity is
often constructed from atom-centered spheres, although more
complicated constructions have been considered [21]. Atomistic
electrostatics is used for the solute, often with ε = 1 inside
of the cavity, although this choice is not required by the theory
and other values have been employed, e.g., in an attempt to
incorporate a protein dielectric “constant”. In any case, there is
a sharp discontinuity in ε(r) at the cavity surface. The atomistic
region can be described at various levels of complexity:
quantum-mechanically, in terms of an electron density, or classically
in terms of a set of point charges and/or higher-order multipoles, be
they static or polarizable.

Figure 11.1 (a) Pictorial depiction of a cavity, constructed from
atom-centered spheres, that defines the interface between the atomistic
region and the continuum. (b) Triangular tessellation of the
atom-centered spheres that define the surface of the protein 3U7T
(crambin). (c) Cavity surface for a segment of double-stranded DNA,
discretized with atom-centered Lebedev grids. Panel (b) is reprinted
from Ref. [25]; copyright 2002 John Wiley and Sons.

Many-Body Effects and Electrostatics in Biomolecules
Edited by Qiang Cui, Markus Meuwly, and Pengyu Ren
Copyright © 2016 Pan Stanford Publishing Pte. Ltd.
ISBN 978-981-4613-92-7 (Hardcover), 978-981-4613-93-4 (eBook)
www.panstanford.com

Given a solute charge distribution ρ(r) whose corresponding
electrostatic potential we denote by $\phi^\rho(\mathbf{r})$, the
solution to the aforementioned model problem consists in determining
the total (solute + continuum) electrostatic potential
$$\phi(\mathbf{r}) = \phi^\rho(\mathbf{r}) + \phi_{\text{rxn}}(\mathbf{r}), \tag{11.1}$$
which includes a reaction-field potential, $\phi_{\text{rxn}}(\mathbf{r})$,
that arises from polarization of the medium. The total potential φ(r)
is obtained by solution of Poisson’s equation [7, 85]. For a sharp
dielectric boundary, this equation reads
$$\hat{\nabla}^2 \phi(\mathbf{r}) = -4\pi\rho(\mathbf{r}) \times
\begin{cases}
1/\varepsilon_{\text{inside}} & \text{inside the cavity} \\
1/\varepsilon_{\text{outside}} & \text{outside the cavity}
\end{cases}, \tag{11.2}$$
expressed here in unrationalized CGS units [85]. [For a solute
described by classical multipoles, the definition of ρ(r) in Eq. (11.2)
might be considered problematic, but the methods discussed below
actually require only the electrostatic potential
$\phi^\rho(\mathbf{r})$ generated by these multipoles.] Having
determined φ(r), the total electrostatic (or polarization) free energy
is
$$G_{\text{pol}} = \frac{1}{2} \int_{\mathbb{R}^3} d\mathbf{r}\, \rho(\mathbf{r})\, \phi(\mathbf{r}), \tag{11.3}$$
where the factor of 1/2 accounts for the reversible work done in
polarizing the medium (which is why $G_{\text{pol}}$ is a free energy) [7].
Equation (11.2) is a partial differential equation in three
dimensions, subject to boundary conditions such that φ(r) is
continuous across the cavity surface but must decay faster than
$r^{-2}$ as r → ∞ [77]. This equation can be solved using grid-based
finite-difference techniques [5, 35, 54], though this requires
discretizing the whole of three-dimensional space, including the
infinite continuum region. For a macromolecular solute described
using a classical force field, such methods form the basis of much of
modern biomolecular electrostatics calculations [4, 54]. (In practice,
the equation that is usually solved in biomolecular applications
is the Poisson–Boltzmann equation [4, 33], which includes the
effects of a thermal distribution of dissolved ions; this will be
considered in Section 11.3.1.) Such methods are useful for producing
an electrostatic map of the surface of the macromolecular solute,
but their finite-difference nature means that forces obtained from
such algorithms are inherently discontinuous. Although progress is
being made to reduce this problem [80–83], the discontinuities pose
a fundamental problem for the use of finite-difference solvers in
molecular dynamics (MD) simulations. Moreover, the requirement
to discretize all of three-dimensional space (or at least sufficiently
far into the continuum so that φ(r) has decayed to zero) means that
the size of the discretized linear systems becomes extremely large
for macromolecules. The matrices involved are sparse; nevertheless,
only highly parallelized approaches are tractable.
This chapter explores an alternative category of methods aimed
at solving the same continuum electrostatics problem using an
apparent surface charge (ASC), σ(s), induced at the cavity surface
by polarization of the medium. Here, we use $\mathbf{s} \in \Gamma$ to
denote a point on the cavity surface, $\Gamma$, whereas
$\mathbf{r} \in \mathbb{R}^3$. The quantity σ(s) is determined from
ρ(r) as described in Section 11.2 but exists only on $\Gamma$. Thus
$$G_{\text{pol}} = \frac{1}{2} \int_{\Gamma} d\mathbf{s}\, \phi^\rho(\mathbf{s})\, \sigma(\mathbf{s}). \tag{11.4}$$
Relative to finite-difference Poisson–Boltzmann approaches, such
methods have the advantage that only the two-dimensional cavity
surface must be discretized.
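
As a minimal illustration of a surface-only energy evaluation, the sketch below discretizes a spherical cavity and sums the solute's electrostatic potential at each surface point against the apparent surface charge carried by that point. The test case is a point charge at the center of the sphere (the Born ion), for which the surface charge density is uniform and known in closed form; that closed form is assumed here rather than derived. The Fibonacci-lattice discretization, the helper name, and the numerical values are choices made for this example only and are not the grids discussed in the chapter.

```python
import numpy as np


def fibonacci_sphere(n, radius):
    """Roughly uniform points on a sphere, each assigned equal area."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    azimuth = np.pi * (1.0 + 5.0**0.5) * i
    rho = np.sqrt(1.0 - z**2)
    points = radius * np.column_stack(
        (rho * np.cos(azimuth), rho * np.sin(azimuth), z))
    areas = np.full(n, 4.0 * np.pi * radius**2 / n)
    return points, areas


q, radius, eps = 1.0, 2.0, 78.4          # illustrative values, Gaussian units
points, areas = fibonacci_sphere(500, radius)

# Uniform surface charge density of the Born ion (assumed textbook result).
sigma = -(1.0 - 1.0 / eps) * q / (4.0 * np.pi * radius**2)

# Solute potential at each surface point: a point charge at the origin.
phi_rho = q / np.linalg.norm(points, axis=1)

# Discretized surface integral: one half of the sum of potential x charge.
g_pol = 0.5 * np.sum(phi_rho * sigma * areas)

# Closed-form Born energy for comparison.
g_born = -0.5 * (1.0 - 1.0 / eps) * q**2 / radius
```

For this symmetric case the surface sum reproduces the Born energy to rounding error with any number of points; for a general solute the accuracy would depend on the surface grid.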
Methods based on an ASC have a long history in quantum-
mechanical (QM) calculations with continuum solvent [60, 61,
77], where they are generally known as polarizable continuum
models (PCMs). However, PCMs have seen little use in the area of
biomolecular electrostatics, for reasons that are unclear to us. In
the QM context, such methods are inherently approximate, even
with respect to the model problem defined by Poisson’s equation,
owing to the volume polarization that results from the tail of the
QM electron density that penetrates beyond the cavity and into the
continuum [13, 14, 89]. The effects of volume polarization can be
treated only approximately within the ASC formalism [14, 15, 89].
For a classical solute, however, there is no such tail and certain
methods in the PCM family do afford a numerically exact solution
of Poisson’s equation, up to discretization errors that are systemat-
ically eliminable. Moreover, ASC methods have been generalized to
solve the linearized Poisson–Boltzmann equation [17, 22, 43, 58],
and to inhomogeneous dielectrics where the scalar ε is replaced
by a dielectric tensor [8, 57, 77]. Long-standing problems with
discontinuities engendered by discretization have been overcome
[41, 42, 70, 76, 87], so that intrinsically smooth forces are available
for MD simulations. Finally, linear-scaling implementations of the
PCM algorithm render such methods amenable to macromolecular
solutes [25, 69]. Such developments are potentially useful not only
for traditional biomolecular electrostatics calculations, but also for
QM/MM/PCM calculations, in which the PCM serves as a boundary
condition for a QM/MM calculation (replacing periodic boundary
conditions), but where the size of the large MM region dictates the
dimensionality of the linear equations that must be solved to obtain
the ASC. For many QM/MM/PCM calculations, the cost of solving the
PCM equations would exceed the QM cost, were it not for linear-
scaling implementations of the PCM algorithm.
The goal of this chapter is to draw attention to some of
these developments, with the aim of popularizing PCMs beyond
small-molecule QM applications. We do not have the space here
for a comprehensive review (and several recent ones can be
found [56, 77]) but will focus mainly on our own work [41–46].
Some knowledge of basic continuum electrostatics is assumed; see
Ref. [7] for an excellent pedagogical introduction. This chapter will
focus mostly on the advantages of the PCM formulation of the
electrostatics problem, with an emphasis on methods that might
ultimately replace finite-difference Poisson–Boltzmann solvers. In
addition, details of our linear-scaling implementation of the PCM
algorithm are presented here for the first time, although this
algorithm has been available for some time as part of the Q-CHEM
software [39].

11.2 Theoretical Background

11.2.1 Continuum Electrostatics


The basic setup of the continuum electrostatics problem has been
outlined above. The ASC formalism is based on an ansatz in which
the exact reaction-field potential, which includes the effects of
volume polarization and is defined throughout three-dimensional
space, is nevertheless generated by a charge distribution σ that
exists only on the cavity surface:
$$\phi_{\text{rxn}}(\mathbf{r}) = \int_{\Gamma} d\mathbf{s}\, \frac{\sigma(\mathbf{s})}{|\mathbf{s} - \mathbf{r}|}. \tag{11.5}$$
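
Once the surface is discretized into point charges, the integral for the reaction-field potential becomes a sum of Coulomb terms. The sketch below evaluates that sum for the Born ion, where the total induced charge is known in closed form (assumed here, not derived) and the potential of a uniformly charged shell is constant inside the sphere. The Fibonacci-lattice surface grid, the function names, and the numerical values are assumptions made for illustration.

```python
import numpy as np


def fibonacci_sphere(n, radius):
    """Roughly uniform points on a sphere of the given radius."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    azimuth = np.pi * (1.0 + 5.0**0.5) * i
    rho = np.sqrt(1.0 - z**2)
    return radius * np.column_stack(
        (rho * np.cos(azimuth), rho * np.sin(azimuth), z))


def reaction_field_potential(r, surface_points, surface_charges):
    """Discretized surface integral: sum of q_i / |s_i - r| (Gaussian units)."""
    distances = np.linalg.norm(surface_points - np.asarray(r), axis=1)
    return np.sum(surface_charges / distances)


q, radius, eps, n = 1.0, 2.0, 78.4, 2000     # illustrative values
points = fibonacci_sphere(n, radius)

total_induced = -(1.0 - 1.0 / eps) * q        # Born-ion induced charge (assumed)
charges = np.full(n, total_induced / n)       # uniform ASC split over n points

phi_center = reaction_field_potential([0.0, 0.0, 0.0], points, charges)
phi_offset = reaction_field_potential([0.4, 0.2, -0.3], points, charges)
expected = total_induced / radius             # constant inside a charged shell
```

At the center every surface point is equidistant, so the sum is exact; at the off-center point the agreement with `expected` is limited only by the surface quadrature.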
The apparent surface charge, σ, should be distinguished from the
actual surface charge that forms at any dielectric boundary [85]. The
latter is given by
$$\sigma(\mathbf{s}) = \frac{\varepsilon - 1}{4\pi} \Big[ (\mathbf{n}_s \cdot \hat{\nabla}) \phi(\mathbf{s}) \Big]_{\mathbf{s} = \mathbf{s}^+}. \tag{11.6}$$
Here, $\mathbf{n}_s$ represents the outward-pointing surface normal
vector located at the point s, so that the derivative in Eq. (11.6)
represents the outward-pointing normal component of the electric field.
(The notation $\mathbf{s} = \mathbf{s}^+$ indicates that this
derivative should be evaluated infinitesimally outside of the cavity.)
The normal electric field is discontinuous at a dielectric boundary,
and satisfies a “jump” boundary condition [7, 85],
$$\varepsilon_{\text{outside}} \Big[ (\mathbf{n}_s \cdot \hat{\nabla}) \phi(\mathbf{s}) \Big]_{\mathbf{s} = \mathbf{s}^+} = \varepsilon_{\text{inside}} \Big[ (\mathbf{n}_s \cdot \hat{\nabla}) \phi(\mathbf{s}) \Big]_{\mathbf{s} = \mathbf{s}^-}. \tag{11.7}$$
This comes from the fact that the normal component of the electric
displacement (= ε × electric field) is continuous across the
dielectric boundary.
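
The surface-charge expression and the jump condition can be checked on the one case with a simple closed-form solution: a point charge at the center of a spherical cavity. The sketch below assumes the textbook potentials for that case (proportional to 1/r on either side of the boundary, scaled by the local dielectric constant) and evaluates their outward normal derivatives analytically. Function names and numerical values are illustrative assumptions.

```python
import math


def born_normal_derivatives(q, radius, eps_out, eps_in=1.0):
    """Outward normal derivative of the total potential just inside and just
    outside a spherical cavity with a point charge q at its center."""
    d_inside = -q / (eps_in * radius**2)
    d_outside = -q / (eps_out * radius**2)
    return d_inside, d_outside


def surface_charge_density(d_outside, eps_out):
    """Surface charge from the exterior normal derivative, with eps_in = 1."""
    return (eps_out - 1.0) / (4.0 * math.pi) * d_outside


q, radius, eps = 1.0, 2.0, 78.4               # illustrative values
d_in, d_out = born_normal_derivatives(q, radius, eps)

# Jump condition: eps_outside * (exterior derivative) equals
# eps_inside * (interior derivative), with eps_inside = 1 here.
jump_lhs = eps * d_out
jump_rhs = 1.0 * d_in

# Total induced charge on the sphere, to compare with -(1 - 1/eps) q.
sigma = surface_charge_density(d_out, eps)
total_induced = 4.0 * math.pi * radius**2 * sigma
```

The induced charge has the opposite sign to the solute charge and approaches `-q` as the dielectric constant grows, which is the expected screening behavior.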
Equation (11.7) can be used to eliminate the exterior derivative
of φ from Eq. (11.6). Then, given some initial approximation for φ
(perhaps just $\phi^\rho$, which is known once the solute’s wave function
has been computed), one could compute the surface charge, and
thus the reaction-field potential, without the need to perform any
calculations outside of the solute cavity. For a QM solute, this
procedure must then be iterated to self-consistency. The original
PCM of Miertuš, Scrocco, and Tomasi [60, 61] used precisely this
approach; this model is now known as D-PCM. It is less desirable
than more modern PCMs, owing to the need to compute the
normal electric field, which may be subject to increased numerical
noise relative to later formulations that involve only electrostatic
potentials [77]. Perhaps more significantly, the formulation of
this model has conflated the apparent and actual surface charge
distributions, and corresponds to a neglect of volume polarization
[13].
January 29, 2016 11:32 PSP Book - 9in x 6in 11-Qiang-Cui-c11

Theoretical Background 369

A key point in the elementary theory of dielectric materials is that


the polarization vector can be replaced by an appropriate charge
distribution, which consists of both a surface charge distribution
at the dielectric boundaries [Eq. (11.6)] and a volume charge
distribution in the dielectric material itself [7, 85]. The latter
was ignored in the early development of PCMs [13, 59], but was
finally treated carefully in the late 1990s by Chipman [13–15, 89].
Generalizing Chipman’s treatment to an arbitrary value of εinside , we
note that in the absence of the medium, the solute’s electrostatic
potential would satisfy the Poisson equation ∇ˆ 2 φ ρ = −4πρ/εinside
throughout all space. On the other hand, the total potential φ =
φ ρ + φrxn must satisfy Eq. (11.2); hence, the reaction-field potential
must satisfy the equation

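$$\hat{\nabla}^2 \phi_{\text{rxn}}(\mathbf{r}) = \begin{cases} 0 & \text{for } \mathbf{r} \text{ inside the cavity} \\ 4\pi\left(\varepsilon_{\text{inside}}^{-1} - \varepsilon_{\text{outside}}^{-1}\right)\rho(\mathbf{r}) & \text{for } \mathbf{r} \text{ outside the cavity} \end{cases}. \tag{11.8}$$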

This can be accomplished by invoking an apparent volume charge



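$$\beta(\mathbf{r}) = \begin{cases} 0 & \text{for } \mathbf{r} \text{ inside the cavity} \\ \left(\varepsilon_{\text{outside}}^{-1} - \varepsilon_{\text{inside}}^{-1}\right)\rho(\mathbf{r}) & \text{for } \mathbf{r} \text{ outside the cavity} \end{cases} \tag{11.9}$$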

that satisfies a Poisson equation

$$\hat{\nabla}^2 \phi^\beta = -4\pi\beta. \tag{11.10}$$

As such, an exact treatment of volume polarization [13, 18, 89],


which is not considered here, requires discretization of three-
dimensional space in order to solve Eq. (11.10).
If φ β is known, then according to Eq. (11.6) the proper surface
charge should be [14]
 
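$$\sigma(\mathbf{s}) = \frac{\varepsilon - 1}{4\pi}\left[\hat{\partial}_s \phi^\rho(\mathbf{s}) + \hat{\partial}_s \phi^\beta(\mathbf{s}) + \hat{\partial}_s \phi^\sigma(\mathbf{s}^+)\right], \tag{11.11}$$

where the notation $\hat{\partial}_s \phi = (\mathbf{n}_s \cdot \hat{\nabla})\phi$ has been introduced, and $\hat{\partial}_s \phi^\sigma(\mathbf{s}^+)$ is the contribution arising from the dielectric boundary.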
Infinitesimal displacements in φ ρ and φ β are not necessary, as these
potentials are continuous across the cavity surface [14].
An approximate treatment is obtained by noting that the (actual)
surface charge is obtainable directly from the discontinuity in the

normal electric field [13],


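$$\sigma(\mathbf{s}) = \frac{1}{4\pi}\left[\hat{\partial}_s \phi\big|_{\mathbf{s}=\mathbf{s}^-} - \hat{\partial}_s \phi\big|_{\mathbf{s}=\mathbf{s}^+}\right]. \tag{11.12}$$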

The potential φ includes the reaction-field part, which depends upon
both the surface and volume charge distributions; hence, σ implicitly
appears on both sides of Eq. (11.12). Combining this result with
Eq. (11.7), and setting εinside = 1 and εoutside = ε for the remainder
of this chapter, one obtains
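$$\sigma(\mathbf{s}) = \frac{f_\varepsilon}{2\pi}\,\hat{\partial}_s\left[\phi^\rho(\mathbf{s}) + \phi^\sigma(\mathbf{s}) + \phi^\beta(\mathbf{s})\right], \tag{11.13}$$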

where fε = (ε − 1)/(ε + 1) and the normal derivative ∂ˆ s φ σ (s)
is now evaluated at the cavity surface, rather than an infinitesimal
displacement away. For pedagogical reasons we rewrite Eq. (11.13)
in the form
 
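$$\left(\hat{1} - \frac{f_\varepsilon}{2\pi}\hat{\partial}_s\right)\sigma(\mathbf{s}) = \frac{f_\varepsilon}{2\pi}\,\hat{\partial}_s\left[\phi^\rho(\mathbf{s}) + \phi^\beta(\mathbf{s})\right]. \tag{11.14}$$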
This is as far as one can go with an exact formulation, unless
one is willing to solve Eq. (11.10) in three dimensions. However, the
effect of volume polarization can be approximated by introducing
an additional surface charge, α(s), that is defined such that its
electrostatic potential at the cavity surface is identical to that
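generated by $\phi^\beta$. Let us define an operator $\hat{S}$ that acts on functions
$f(\mathbf{s})$ defined on the cavity surface, generating the corresponding electrostatic
potential:

$$\hat{S} f(\mathbf{s}) = \int_{\text{surface}} d\mathbf{s}'\, \frac{f(\mathbf{s}')}{|\mathbf{s} - \mathbf{s}'|}. \tag{11.15}$$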
We therefore insist that [15]
Ŝα(s) = φ β (s), (11.16)
and set σ = σ + α. This approximation allows for the elimination
of φ β in Eq. (11.14), affording an equation that requires only surface
integration [15]:
   
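$$\underbrace{\hat{S}\left(\hat{1} - \frac{f_\varepsilon}{2\pi}\hat{D}^\dagger\right)}_{\hat{K}} \sigma(\mathbf{s}) = \underbrace{f_\varepsilon\left(\frac{1}{2\pi}\hat{D} - \hat{1}\right)}_{\hat{R}} \phi^\rho(\mathbf{s}). \tag{11.17}$$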

Here, the operator D̂† generates the negative of the outward-


pointing normal electric field [13, 15],


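$$\hat{D}^\dagger f(\mathbf{s}) = \int_{\text{surface}} d\mathbf{s}'\, f(\mathbf{s}')\, \frac{\partial}{\partial n_{\mathbf{s}}} |\mathbf{s} - \mathbf{s}'|^{-1}, \tag{11.18}$$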

and its adjoint is defined such that




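$$\hat{D} f(\mathbf{s}) = \int_{\text{surface}} d\mathbf{s}'\, f(\mathbf{s}')\, \frac{\partial}{\partial n_{\mathbf{s}'}} |\mathbf{s} - \mathbf{s}'|^{-1}. \tag{11.19}$$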
The origin of Eq. (11.17), or at least its left side, is evident from
Eq. (11.14). Equation (11.17) also indicates the notation that we will
henceforth use for this equation:

K̂ σ (s) = R̂ φ ρ (s). (11.20)

Equation (11.20) is the primary PCM equation. It must be


discretized for actual computation (see Section 11.2.2), but then
given the solute’s electrostatic potential evaluated at the surface
discretization points, this equation can be solved for the induced
surface charge at those points (i.e., the discretized σ ). In an MM/PCM
calculation, the electrostatic solvation energy is then immediately
available via a discretized version of Eq. (11.3), although in QM
applications the surface charge must be included in the next self-
consistent field (SCF) iteration, and the SCF procedure is iterated
until both the electron density and the surface charge have reached
mutual self-consistency.
For QM solutes, volume polarization is treated approximately
(but accurately [89]) by Eq. (11.17), and Chipman has called
this approach surface and simulation of volume polarization for
electrostatics [SS(V)PE] [15]. An equivalent form of Eq. (11.17) was
actually derived prior to Chipman’s work, where it was called the
integral equation formalism (IEF) [10, 58]. The equivalence is not
obvious, as the original IEF requires the solute’s electric field as an
input in addition to its electrostatic potential, but it was later shown
that the former could be eliminated in order to obtain Eq. (11.17)
[9]. The operator K̂ can similarly be manipulated into different
forms, by means of the identity [15]

D̂ Ŝ = Ŝ D̂† . (11.21)

However, this identity is not preserved upon discretization, and


different implementations of SS(V)PE/IEF-PCM are therefore pos-
sible, not all of which perform equally well in practice [44]. This is
discussed below.
Finally, it is worth emphasizing that for classical solutes φ β ≡
0 and Eq. (11.17) represents an exact solution to the continuum
electrostatics problem. To emphasize this point, we have performed
numerical comparisons of MM/PCM calculations versus results
obtained from the “adaptive Poisson–Boltzmann solver” (APBS) [5],
which represents a recent implementation of the three-dimensional
finite-difference approach. (The solvent’s ionic strength was set to
zero in the APBS calculations.) Results for amino acids, plotted
in Fig. 11.2, show sub-kcal/mol differences in most cases, and
differences of < 0.1 kcal/mol for the “X = DAS” version of SS(V)PE
that is our preferred implementation of this model, for reasons
discussed below.

[Figure 11.2: plot of (E_IEF-PCM − E_APBS) / kcal mol⁻¹ on a logarithmic scale (10⁻² to 10¹) for the amino acids A, C, F, G, I, L, M, N, P, Q, S, T, V, W, Y, H, K, R, E, D, with one series for X = DAS and one for X = SAD†.]

Figure 11.2 Comparison of total energies (on a logarithmic scale) for
aqueous amino acids, where the solute is described using the AMBER99 force
field and the solvent is a dielectric continuum. The continuum electrostatics
problem is solved either by finite-difference solution of Poisson’s equation
using the APBS software [5], or else using two different forms of IEF-PCM
(X = DAS or X = SAD†, as described in Section 11.2.2.1). What is plotted is
the difference E_IEF-PCM − E_APBS between these two solutions. The APBS and
IEF-PCM solute cavities are identical. APBS calculations used a 193 × 193 ×
193 grid with a grid resolution of 0.1 Å, whereas IEF-PCM calculations used
590 Lebedev points per atomic sphere with Gaussian blurring.

11.2.2 Practical Considerations


11.2.2.1 Matrix equations
Specific choices for how to construct and discretize the solute cavity
are discussed below, but for now let us assume that this has been
done, so that the cavity surface has been turned into a discrete set of points $\mathbf{s}_i$, each
with a well-defined surface area, $a_i$. The continuous surface charge
$\sigma(\mathbf{s})$ is thus replaced with a set of point charges $q_i$ and Eq. (11.20) is
converted into a set of linear equations
Kq = Rv (11.22)
for the vector q of surface charges, with vi = φ ρ (si ). The matrices
K and R depend upon the matrix representations of the operators Ŝ,
D̂, and D̂† .
Since $\hat{S}$ generates the electrostatic potential [Eq. (11.15)], it is
clear that $S_{ij} = |\mathbf{s}_i - \mathbf{s}_j|^{-1}$ (in atomic units) for $i \neq j$, because then
the quantity $S_{ij} q_j$ is the electrostatic potential due to $q_j$, evaluated at
the point $\mathbf{s}_i$. The diagonal elements $S_{ii}$ could in principle be obtained
by evaluating the surface Coulomb integral in Eq. (11.15) over the
surface element of area $a_i$. For efficiency, however, the expression

$$S_{ii} = C f_i^{\text{shape}} \left(\frac{4\pi}{a_i}\right)^{1/2} \tag{11.23}$$

is widely used, where $C \approx 1.06$ and $f_i^{\text{shape}}$ is an (often omitted)
shape factor [19]. This choice is based on the exact result $S_{ii} = (4\pi/a_i)^{1/2}$
for a uniform spherical surface grid.
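
The construction of the S matrix can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the quasi-uniform Fibonacci grid stands in for a Lebedev grid, all surface elements are given equal area, the shape factor is set to 1, and the function names `fibonacci_sphere` and `build_S` are hypothetical. As a sanity check, a uniform surface charge of total charge Q on a sphere of radius R should produce a surface potential close to Q/R.

```python
import numpy as np

def fibonacci_sphere(n_points, radius):
    """Quasi-uniform points on a sphere (assumed stand-in for a Lebedev grid)."""
    k = np.arange(n_points)
    z = 1.0 - (2.0 * k + 1.0) / n_points        # evenly spaced heights
    phi = k * np.pi * (3.0 - np.sqrt(5.0))      # golden-angle increments
    rho = np.sqrt(1.0 - z**2)
    unit = np.column_stack((rho * np.cos(phi), rho * np.sin(phi), z))
    return radius * unit

def build_S(points, areas, C=1.06):
    """S_ij = 1/|s_i - s_j| for i != j; S_ii from Eq. (11.23) with f_shape = 1."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)                 # placeholder to avoid 1/0
    S = 1.0 / dist
    np.fill_diagonal(S, C * np.sqrt(4.0 * np.pi / areas))
    return S

radius, n_points, total_charge = 2.0, 400, -1.0   # atomic units
points = fibonacci_sphere(n_points, radius)
areas = np.full(n_points, 4.0 * np.pi * radius**2 / n_points)
S = build_S(points, areas)

# Potential at the grid points due to a uniform surface charge.
q = np.full(n_points, total_charge / n_points)
mean_potential = (S @ q).mean()
print(mean_potential, total_charge / radius)
```

The diagonal term is what keeps the discrete sum close to the continuous surface integral; omitting it would underestimate the potential by an amount that shrinks only as the inverse square root of the number of grid points.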
The integral operator D̂ is replaced by a matrix product DA,
where A is a diagonal matrix containing the areas ai . The matrix
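elements of D are typically defined as [19]

$$D_{ij} = \begin{cases} -\left(2\pi + \sum_{k \neq i} D_{ik} a_k\right)/a_i & \text{for } i = j \\ -\mathbf{n}_j \cdot (\mathbf{s}_j - \mathbf{s}_i)\,|\mathbf{s}_j - \mathbf{s}_i|^{-3} & \text{for } i \neq j \end{cases}. \tag{11.24}$$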
The off-diagonal matrix element is recognizable from the integrand
in Eq. (11.19), whereas the diagonal elements are based upon a sum
rule derived in Ref. [68]. (This sum rule proves to be problematic in

modern, smooth discretization schemes, and the definitions of Dii


and Sii will be modified below.)
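
A minimal NumPy sketch of Eq. (11.24) follows. It assumes a spherical cavity discretized on a quasi-uniform Fibonacci grid with equal-area surface elements and outward unit normals; the helper names are hypothetical. By construction, the sum rule forces every row of DA to sum to −2π, and on a sphere the off-diagonal elements reduce to −S_ij/(2R).

```python
import numpy as np

def fibonacci_sphere(n_points, radius):
    """Quasi-uniform points on a sphere (assumed stand-in for a Lebedev grid)."""
    k = np.arange(n_points)
    z = 1.0 - (2.0 * k + 1.0) / n_points
    phi = k * np.pi * (3.0 - np.sqrt(5.0))
    rho = np.sqrt(1.0 - z**2)
    return radius * np.column_stack((rho * np.cos(phi), rho * np.sin(phi), z))

def build_D(points, normals, areas):
    """D matrix of Eq. (11.24): explicit off-diagonals, sum-rule diagonal."""
    diff = points[None, :, :] - points[:, None, :]      # diff[i, j] = s_j - s_i
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, 1.0)                         # placeholder to avoid 0/0
    # Off-diagonal: -n_j . (s_j - s_i) / |s_j - s_i|^3
    D = -np.einsum("jk,ijk->ij", normals, diff) / dist**3
    np.fill_diagonal(D, 0.0)
    # Diagonal from the sum rule: D_ii = -(2 pi + sum_{k != i} D_ik a_k) / a_i
    np.fill_diagonal(D, -(2.0 * np.pi + D @ areas) / areas)
    return D

radius, n_points = 2.0, 300
points = fibonacci_sphere(n_points, radius)
normals = points / radius                               # outward unit normals
areas = np.full(n_points, 4.0 * np.pi * radius**2 / n_points)
D = build_D(points, normals, areas)

row_sums = D @ areas                                    # each entry should be -2 pi
d01 = np.linalg.norm(points[0] - points[1])
print(row_sums[:3], D[0, 1], -1.0 / (2.0 * radius * d01))
```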
One complication with the discretized PCM equation is that the
discretized analogue of Eq. (11.21), which would read DAS = SAD† ,
is in general not satisfied, except in the special case of a spherical
cavity. The discretized form of Eq. (11.17) is therefore ambiguous,
because the operator Ŝ D̂† appearing in that equation could be re-
placed by any linear combination X̂ = a Ŝ D̂† +b D̂ Ŝ so long as a+b =
1, but the corresponding matrix X = aSAD† +bDAS leads to different
matrix equations for each choice of coefficients. In Chipman’s origi-
nal work on SS(V)PE [15], the choice a = b = 1/2 was suggested, as
this leads to a symmetric matrix K and thus more efficient solution
of Eq. (11.22). However, IEF-PCM calculations using the other two
“obvious” choices (a = 0 and b = 1, or vice versa) have also
been reported [24, 26, 51]. Only recently have the consequences
of these choices been recognized [44]. In particular, for realistic
molecular cavities, only the choice X = DAS achieves the correct
conductor limit (ε → ∞), whereas X = SAD† does not, nor does the
symmetrized version X = (DAS + SAD† )/2. A particular example is
shown in Fig. 11.3, and an analytic proof is provided in Ref. [44].
[Figure 11.3: plot of G_pol / kcal mol⁻¹ (−55 to −25) versus dielectric constant ε (1 to 1000, logarithmic axis) for IEF-PCM with X = DAS, IEF-PCM with X = SAD†, and C-PCM.]

Figure 11.3 Electrostatic solvation energy for classical histidine as a
function of dielectric constant. The C-PCM approach is free of the matrix
D and achieves the correct conductor limit as ε → ∞. Reprinted from
Ref. [44]; copyright 2011 Elsevier.

Table 11.1 Matrices used in the equation Kq = Rv for several different PCMs (a)

Method              Matrix K              Matrix R
SS(V)PE/IEF-PCM     S − (fε/2π) DAS       −fε [1 − (1/2π) DA]
C-PCM/GCOSMO        S                     −[(ε − 1)/ε] 1
DESMO               S                     −1 + (1/ε) M

(a) The factor fε = (ε − 1)/(ε + 1), and the matrix M has elements $M_{ij} = \delta_{ij}\,\phi_\kappa^\rho(\mathbf{s}_i)/\phi_0^\rho(\mathbf{s}_i)$.

As a result, we choose X = DAS to define the K matrix. For
definiteness, the forms of K and R for this version of SS(V)PE/IEF-PCM
are explicated in Table 11.1. Also listed in this table are the
forms of K and R for the so-called conductor-like model, C-PCM [25].
This model is considerably simpler in that the matrix D is absent.
C-PCM is identical to the generalized conductor-like screening model
(GCOSMO) [78], and almost identical to the original COSMO [37].
(G)COSMO was introduced prior to SS(V)PE/IEF-PCM, based on ad
hoc arguments and designed to achieve the correct ε → ∞ limit. We
will show below that this model differs from SS(V)PE/IEF-PCM only
by terms of order ε−1 . Due to its simplicity, C-PCM is therefore our
preferred model for high-dielectric solvents such as water.
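
As an illustration of how the matrices in Table 11.1 are used, the sketch below solves Kq = Rv for a single point charge at the center of a spherical cavity, for which both C-PCM and SS(V)PE/IEF-PCM (X = DAS) should reproduce the Born energy, −(1/2)(1 − 1/ε)q²/R. Several things here are assumptions rather than part of the text: the Fibonacci surface grid with equal areas, the omitted shape factor, the helper names, and the energy expression (1/2)q·v, which is the usual discretized form of the reaction-field energy.

```python
import numpy as np

def fibonacci_sphere(n_points, radius):
    """Quasi-uniform points on a sphere (assumed stand-in for a Lebedev grid)."""
    k = np.arange(n_points)
    z = 1.0 - (2.0 * k + 1.0) / n_points
    phi = k * np.pi * (3.0 - np.sqrt(5.0))
    rho = np.sqrt(1.0 - z**2)
    return radius * np.column_stack((rho * np.cos(phi), rho * np.sin(phi), z))

def build_S(points, areas, C=1.06):
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)
    S = 1.0 / dist
    np.fill_diagonal(S, C * np.sqrt(4.0 * np.pi / areas))   # Eq. (11.23), f_shape = 1
    return S

def build_D(points, normals, areas):
    diff = points[None, :, :] - points[:, None, :]          # s_j - s_i
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, 1.0)
    D = -np.einsum("jk,ijk->ij", normals, diff) / dist**3
    np.fill_diagonal(D, 0.0)
    np.fill_diagonal(D, -(2.0 * np.pi + D @ areas) / areas)  # sum rule, Eq. (11.24)
    return D

def solve_pcm(S, D, areas, v, eps, model):
    """Build K and R as in Table 11.1 and solve K q = R v."""
    identity = np.eye(len(v))
    if model == "cpcm":
        K, R = S, -(eps - 1.0) / eps * identity
    elif model == "iefpcm":                                  # X = DAS form
        f_eps = (eps - 1.0) / (eps + 1.0)
        DA = D * areas                                       # same as D @ diag(areas)
        K = S - f_eps / (2.0 * np.pi) * DA @ S
        R = -f_eps * (identity - DA / (2.0 * np.pi))
    else:
        raise ValueError(f"unknown model: {model}")
    return np.linalg.solve(K, R @ v)

radius, n_points, charge, eps = 2.0, 400, 1.0, 78.39         # atomic units
points = fibonacci_sphere(n_points, radius)
areas = np.full(n_points, 4.0 * np.pi * radius**2 / n_points)
S = build_S(points, areas)
D = build_D(points, points / radius, areas)
v = charge / np.linalg.norm(points, axis=1)                  # solute potential on the surface

energy_cpcm = 0.5 * solve_pcm(S, D, areas, v, eps, "cpcm") @ v
energy_ief = 0.5 * solve_pcm(S, D, areas, v, eps, "iefpcm") @ v
born = -0.5 * (1.0 - 1.0 / eps) * charge**2 / radius
print(energy_cpcm, energy_ief, born)
```

A dense solve is used only for clarity; for macromolecular surface grids an iterative solver would be the more natural choice.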

11.2.2.2 Cavity construction and discretization


In order to obtain the matrix equations above, one must decide how
to construct, and subsequently discretize, the cavity surface. The
most widely used methods take the cavity to be a union of atom-
centered spheres [77], as suggested in Fig. 11.1(a). The electrostatic
solvation energy is quite sensitive to the radii of these spheres (it
varies as ∼ R −1 in the Born ion model), and highly parameterized
constructions that exploit information about the bonding topology
[6] or the charge states of the atoms [31] are sometimes employed.
The details of these parameterizations are beyond the scope of
the present work, especially given that careful reconsideration of
these parameters is probably necessary for classical biomolecular
electrostatics calculations.
Having selected a set of atomic radii, these must next be turned
into a discrete set of surface grid points. In QM/PCM calculations, the

most popular approach has been the generating polyhedra (GEPOL)


algorithm [1], which tessellates the surface of each sphere into a
collection of small triangles or tesserae, using a 60-sided
polyhedron. [An example is depicted in Fig. 11.1(b).] A discretization
charge qi is placed in the center of each tessera. One difficulty
with this procedure is the complicated geometry of how these
triangles should change as a function of the atomic coordinates,
which significantly complicates the formulation of analytic energy
derivatives [23]. Furthermore, the GEPOL discretization has only a
limited degree of systematic improvability [1].
A more appealing procedure is to use atom-centered Lebedev
angular quadrature grids [42, 70, 87], which are designed as exact
quadratures through a given order in spherical harmonic functions
and are therefore systematically improvable [42]. Figure 11.1(c)
shows an example of double-stranded DNA, discretized using 50
Lebedev points per atomic sphere.
For QM/PCM applications, an appealing alternative to carefully
parametrized atomic radii is a one-parameter cavity construction
in which the cavity is defined as an isocontour of the QM electron
density [12, 19, 30]. Unfortunately, analytic energy gradients have
never been reported for such a construction (they are complicated
by the fact that ns becomes density-dependent [19, 30]), and in any
case such an approach is not possible in MM/PCM or QM/MM/PCM
calculations.
In the context of generalized Born models, Friesner and co-
workers [88] have experimented with a cavity defined as an
isosurface of a pseudo-density, d(r), that is expressed as a sum of
atom-centered Gaussians:
$$d(\mathbf{r}) = \sum_{K}^{N_{\text{atoms}}} \exp\left(-B\,|\mathbf{r} - \mathbf{r}_K|^2 / R_K^2\right). \tag{11.25}$$

The parameter B controls what we term the “blobbiness” of the


surface, and R K is a Gaussian width parameter for the atom centered
at r K . An isosurface contour value d = e−B ensures that the
isosurface coincides with the radius R K for a single, isolated atom
(Born ion model). A discretization grid for the isosurface can be
obtained using the marching cubes algorithm [52], arriving at a
tessellated surface grid made up of triangles. This construction will

be used in macromolecular PCM calculations presented later in this


chapter.
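
The pseudo-density of Eq. (11.25) is simple to evaluate. The sketch below (with a hypothetical function name and an arbitrary, illustrative value of B) checks that the contour value e^(−B) places the isosurface exactly at the radius R_K for an isolated atom, and that a neighboring atom raises the pseudo-density at that same point, so that the merged isosurface bulges outward where atoms overlap.

```python
import numpy as np

def pseudo_density(r, centers, radii, B):
    """Sum of atom-centered Gaussians, Eq. (11.25)."""
    r = np.asarray(r, dtype=float)
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)
    dist_sq = np.sum((centers - r) ** 2, axis=1)
    return float(np.sum(np.exp(-B * dist_sq / radii**2)))

B = 2.0                                  # illustrative value only
iso_value = np.exp(-B)                   # contour value d = exp(-B)

# Isolated atom of radius 1.5: a point at distance 1.5 lies on the isosurface.
single = pseudo_density([1.5, 0.0, 0.0], [[0.0, 0.0, 0.0]], [1.5], B)

# Adding a second atom nearby increases d at the same point.
pair = pseudo_density([1.5, 0.0, 0.0],
                      [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], [1.5, 1.5], B)
print(single, pair, iso_value)
```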
An issue with all of these discretization schemes—except
possibly the genuine isodensity surface that is not considered in
this work—is the fact that the solvation energy is a discontinuous
function of the atomic coordinates, because discretization points
appear and disappear as the overlap between atomic spheres
changes. (In principle, the energy also loses rotational invariance
upon discretization, but we find that this problem is not serious
[42].) The discontinuity problem, which is shared by
finite-difference Poisson–Boltzmann solvers, has recently been resolved in
the context of PCMs, with the development of intrinsically smooth
discretization algorithms [42, 70, 76, 87]. These are discussed in
Section 11.4.1.

11.2.2.3 Beyond electrostatics


This chapter is devoted strictly to a discussion of electrostatic in-
teractions between solute and continuum solvent; non-electrostatic
interactions are not discussed beyond a brief mention here.
Such interactions include the cavitation energy (a destabilizing
interaction representing the energy required to carve a molecule-
shaped void out of the continuum); dispersion (the stabilizing
van der Waals interaction); specific interactions such as hydrogen
bonding; and changes to the solvent structure upon insertion of
the solute. To some extent, these effects can be captured (especially
in QM/PCM calculations) by including one or more explicit solvent
molecules in the atomistic region, albeit at increased cost.
Simple corrections for non-electrostatic interactions have been
suggested, wherein atomic-specific parameters are used to describe
cavitation, Pauli repulsion, and dispersion [2, 3, 20, 84]. These non-
electrostatic interactions are then added to Gpol to obtain the total
solvation energy. The most successful examples of this approach
are the so-called SMx models of Cramer and Truhlar [27], most
of which are not actually PCMs per se but rather generalized
Born models. However, one such model (“SMD”) has recently been
parameterized for use with IEF-PCM electrostatics [55] and exhibits
mean errors of  1 kcal/mol as compared to experimental solvation

energies for small neutral molecules, although the mean error for
ions is 4 kcal/mol. More recently, Pomogaeva and Chipman [67]
suggested “more ab initio” forms for the various non-electrostatic
interactions, and demonstrated performance equal to or exceeding
that of the best SMx models, for aqueous solvation, with fewer
empirical parameters.
All of the aforementioned examples were developed in the
context of QM/PCM calculations and would undoubtedly need to be
reconsidered, or at least re-parameterized, for classical solutes.

11.3 New Models and Insights

In this section we discuss new theory, as opposed to the new


algorithms that are discussed in Section 11.4. Recent theoretical
developments include new methods for incorporating salt effects
into PCMs (Section 11.3.1) and new connections between PCM
and generalized Born models (Section 11.3.2), which may help to
improve the latter.

11.3.1 Generalized Debye–Hückel Theory


The discussion of continuum electrostatics in Section 11.2.1 was
limited to solution of Poisson’s equation, which can be achieved
exactly (for classical solutes) or to a good approximation (for QM
solutes) using PCMs. In biomolecular applications, however, the
objective is usually solution of the Poisson–Boltzmann equation
[4, 33]. For low concentrations of dissolved ions, the latter is often
replaced by the linearized Poisson–Boltzmann equation (LPBE),

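$$\begin{aligned} \hat{\nabla}^2 \phi(\mathbf{r}) &= -4\pi\rho(\mathbf{r}) && \text{for } \mathbf{r} \text{ inside the cavity} \\ (\hat{\nabla}^2 - \kappa^2)\,\phi(\mathbf{r}) &= -4\pi\rho(\mathbf{r})/\varepsilon && \text{for } \mathbf{r} \text{ outside the cavity.} \end{aligned} \tag{11.26}$$

Here, $\kappa = (8\pi e^2 I/\varepsilon k_B T)^{1/2}$ is the inverse Debye length, for a solution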


whose ionic strength is I. The LPBE was derived by Debye and
Hückel [28], and its analytic solution for a spherical cavity forms the
basis of the eponymous theory. In this section, we discuss how PCMs
can be modified to solve the LPBE, but first we present an alternative
derivation of GCOSMO that will be useful in this respect.
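
For orientation, the Debye length κ⁻¹ can be evaluated numerically. The sketch below is an assumption-laden illustration: it converts the Gaussian-units expression for κ to SI units (e² → e²/4πε₀, with the ionic strength converted from mol/L to a number density), and the function name and the example conditions (0.1 M, ε = 78.4, T = 298.15 K) are illustrative choices, not values taken from the text.

```python
import math

# CODATA constants in SI units
E_CHARGE = 1.602176634e-19     # C
K_B = 1.380649e-23             # J/K
EPS0 = 8.8541878128e-12        # F/m
N_A = 6.02214076e23            # 1/mol

def debye_length_angstrom(ionic_strength_molar, eps_r, temperature):
    """Debye length 1/kappa in Angstrom; SI form of kappa^2 = 8 pi e^2 I / (eps k_B T)."""
    number_density = ionic_strength_molar * 1000.0 * N_A          # ions per m^3
    kappa_sq = (2.0 * E_CHARGE**2 * number_density
                / (EPS0 * eps_r * K_B * temperature))             # 1/m^2
    return 1.0e10 / math.sqrt(kappa_sq)

length_01 = debye_length_angstrom(0.1, 78.4, 298.15)
length_04 = debye_length_angstrom(0.4, 78.4, 298.15)
print(length_01, length_04)
```

Because κ scales as the square root of the ionic strength, quadrupling I halves the Debye length.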

11.3.1.1 Alternative derivation of C-PCM/GCOSMO


The original derivation of COSMO was based on taking ε → ∞,
in which case Sq = −v is the exact solution to the molecular
electrostatics problem, then rescaling the solution for finite ε [37,
78]. Recently, we presented a much more satisfying derivation [43].
Our approach starts from an ansatz

$$\phi(\mathbf{r}) = \begin{cases} \phi_0^\rho(\mathbf{r}) + \phi_0^\sigma(\mathbf{r}) & \text{for } \mathbf{r} \text{ inside the cavity} \\ \phi_0^\rho(\mathbf{r})/\varepsilon & \text{for } \mathbf{r} \text{ outside the cavity} \end{cases} \tag{11.27}$$

for the electrostatic potential, consisting of a solute contribution
$\phi_0^\rho$ and a second contribution $\phi_0^\sigma$ arising from the induced surface
charge. The subscripts on these quantities indicate that κ = 0,
which will later be replaced by nonzero κ in the event of dissolved
ions. Enforcing the condition that φ(r) must be continuous across
the cavity surface, Eq. (11.27) immediately affords the C-PCM/GCOSMO equation in
Table 11.1 [43]. On the other hand, this ansatz cannot be made to
satisfy the jump boundary condition in Eq. (11.7).
Noting that $\hat{\partial}_s \phi_0^\rho$ is continuous across the cavity surface, the reaction-field
potential must be solely responsible for the jump in the electric field
[16]. This condition can be expressed as [43]

$$\hat{\partial}_s \phi_{\text{rxn}}(\mathbf{s}^+) = -\left(\frac{\varepsilon - 1}{\varepsilon}\right)\hat{\partial}_s \phi_0^\rho(\mathbf{s}) + \frac{1}{\varepsilon}\,\hat{\partial}_s \phi_{\text{rxn}}(\mathbf{s}^-). \tag{11.28}$$
The normal derivative of the ansatz in Eq. (11.27) lacks the
second term in Eq. (11.28); hence, C-PCM/GCOSMO engenders
errors of order ε−1 , as compared to an exact treatment of classical
electrostatics. Such errors are negligible in water [44], as seen in
Fig. 11.3.

11.3.1.2 DESMO and ion exclusion


The above derivation of GCOSMO immediately suggests how this
model can be generalized to solvents with non-zero ionic strength,
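using a modified ansatz of the form

$$\phi(\mathbf{r}) = \begin{cases} \phi_0^\rho(\mathbf{r}) + \phi_0^\sigma(\mathbf{r}) & \text{for } \mathbf{r} \text{ inside the cavity} \\ \phi_\kappa^\rho(\mathbf{r})/\varepsilon & \text{for } \mathbf{r} \text{ outside the cavity} \end{cases} \tag{11.29}$$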
and enforcing continuity of φ at the cavity surface but neglecting
the jump boundary condition for the electric field [43]. In homage

to (G)COSMO, we have called the resulting PCM the Debye–Hückel-


like screening model (DESMO), and its basic working equation is
given in Table 11.1. The only change, relative to GCOSMO, is the
need to compute the screened electrostatic potential $\phi_\kappa^\rho$ at the
cavity surface, in addition to the unscreened potential, $\phi_0^\rho$. (The
screened form simply uses the Yukawa potential $e^{-\kappa r}/\varepsilon r$ in place of
the Coulomb potential $1/\varepsilon r$ that is used in the unscreened form.)
DESMO represents the leading-order (in 1/ε) approximation to the
“screened” SS(V)PE and IEF-PCM models that have been developed
to solve the LPBE [17, 22, 58]. Working equations for the latter
models are more complicated and can be found in Ref. [43]. In high-
dielectric solvents, however, DESMO incurs negligible error with
respect to those models but retains the simplicity of (G)COSMO.
On the other hand, the screened SS(V)PE [17] and IEF-PCM
[22, 58] treatments of the LPBE lack one important feature of the
original Debye–Hückel theory, namely, a correction for the finite size
of the dissolved ions. To understand this, let us recall the model
problem considered by Debye and Hückel [28], which consists of a
point charge q centered in a spherical cavity of radius R cav , outside
of which is the dielectric medium. The dissolved ions are assumed to
have a finite radius Rion , and their centers therefore cannot approach
the charge q closer than a distance R cav + Rion . This manifests as an
ion exclusion layer (Stern layer) for Rcav ≤ r ≤ R cav + R ion , and a
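long-range electrostatic potential (for $r > R_{\text{cav}} + R_{\text{ion}}$) of the form

$$\phi^{\text{DH}}_{\text{long-range}}(r) = q \left(\frac{e^{-\kappa r}}{\varepsilon r}\right) \underbrace{\left[\frac{e^{\kappa(R_{\text{cav}} + R_{\text{ion}})}}{1 + \kappa(R_{\text{cav}} + R_{\text{ion}})}\right]}_{\gamma}. \tag{11.30}$$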
This potential has the form of the charge q times a screened
Coulomb potential (Yukawa potential, e−κr /εr) multiplied by what
we have termed an ion exclusion factor, γ [43]. This suggests that ion
exclusion might be incorporated into DESMO using an ansatz of the
form [43]

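$$\phi_\kappa^\rho(\mathbf{s}_i) = \underbrace{\left[\frac{e^{\kappa(R_{\text{ion}} + R_I)}}{1 + \kappa(R_{\text{ion}} + R_I)}\right]}_{\gamma_I} \int_{\mathbb{R}^3} d\mathbf{r}\, \rho(\mathbf{r})\, \frac{e^{-\kappa|\mathbf{s}_i - \mathbf{r}|}}{|\mathbf{s}_i - \mathbf{r}|} \tag{11.31}$$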

for discretization points si located on the I th atomic sphere.
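
The ion exclusion factor in Eqs. (11.30) and (11.31) is a one-line function of κ and the exclusion radius. The sketch below uses hypothetical function names and illustrative numbers; it checks that γ reduces to 1 when κ = 0 (no dissolved ions), in which case Eq. (11.30) reduces to the ordinary dielectric-screened Coulomb potential q/εr.

```python
import math

def ion_exclusion_factor(kappa, r_cav, r_ion):
    """gamma = exp(kappa * R) / (1 + kappa * R), with R = R_cav + R_ion."""
    x = kappa * (r_cav + r_ion)
    return math.exp(x) / (1.0 + x)

def dh_long_range_potential(q, r, kappa, eps, r_cav, r_ion):
    """Debye-Hueckel potential of Eq. (11.30), valid for r > R_cav + R_ion."""
    if r <= r_cav + r_ion:
        raise ValueError("Eq. (11.30) applies only outside the ion exclusion layer")
    yukawa = math.exp(-kappa * r) / (eps * r)
    return q * yukawa * ion_exclusion_factor(kappa, r_cav, r_ion)

gamma_no_salt = ion_exclusion_factor(0.0, 3.0, 2.0)
gamma_salty = ion_exclusion_factor(0.2, 3.0, 2.0)          # kappa * R = 1
phi_no_salt = dh_long_range_potential(1.0, 10.0, 0.0, 80.0, 3.0, 2.0)
print(gamma_no_salt, gamma_salty, phi_no_salt)
```

Since e^x ≥ 1 + x for all x ≥ 0, the factor never falls below 1: excluding ions from the layer around the cavity always weakens the screening relative to the bare Yukawa form.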



Table 11.2 Solvation energies (a) for a model consisting of 25 disjoint spheres with a point charge in each (b)

                                        Error / kcal mol⁻¹
                Gpol /          DESMO        DESMO         SS(V)PE/
 ε    κ⁻¹ / Å   kcal mol⁻¹      with γI      without γI    IEF-PCM
 4     ∞        −2899.49        −0.01         −0.01         0.98
20     ∞        −3672.69         0.00          0.00         0.27
80     ∞        −3817.67         0.00          0.00         0.07

 4    25        −2914.09        −0.26         −1.39         0.70
20    25        −3675.61        −0.05         −0.28         0.21
80    25        −3818.40        −0.01         −0.07         0.06

 4     5        −3035.50        −0.41        −20.92         0.37
20     5        −3699.90        −0.08         −4.18         0.13
80     5        −3824.47        −0.02         −1.04         0.04

 4     3        −3122.65        −0.22        −44.47         0.30
20     3        −3717.32        −0.05         −8.90         0.09
80     3        −3828.83        −0.01         −2.22         0.03

(a) Computed from an exact analytic solution of the LPBE [53], for various values of ε and κ with Rion = 0.
(b) Reprinted from Ref. [43]; copyright 2011 American Institute of Physics.

Table 11.2 presents some results for a simple model consisting


of 25 disjoint spheres immersed in a salty dielectric, with a point
charge centered in each sphere but with mobile ions of zero
size (Rion = 0). The LPBE can be solved analytically for this
toy problem [53], which is intended to explore how continuum
methods might perform for modeling protein–protein interactions
in solution. Solvation energies obtained from the LPBE are on the
order of −3000 kcal/mol or more and DESMO, with ion exclusion
factors γ I as suggested in Eq. (11.31), reproduces these energies to
within 0.4 kcal/mol in each case. Without the ion exclusion factors,
however, very large errors can result. Interestingly, DESMO with
ion exclusion is actually slightly more accurate than the versions
of SS(V)PE and IEF-PCM that have been suggested for use with
the LPBE (and which are equivalent for a classical solute). Errors
in SS(V)PE/IEF-PCM are a reflection of the fact that this method
is fundamentally approximate in the presence of outlying charge,

which arises here not from tails of a wave function but rather from
the presence of disjoint solute cavities [43].
In the future, DESMO should be tested with finite ion size and
compared to numerical solution of the LPBE using a cavity surface
(defined by the van der Waals radii R I ) that does not coincide with
the ion exclusion surface (defined by R I + Rion ). Finite ion size has been
incorporated into Generalized Born models, however, via the ion
exclusion factors in Eq. (11.31) [46]. These models are discussed in
the next section.

11.3.2 Connections to Generalized Born Models


The most widely used implicit solvation models in biomolecular
simulations are probably the Generalized Born (GB) models [63,
79], because they are computationally inexpensive and amenable
to analytic forces. GB models posit that the electrostatic solvation
energy can be expressed in the form
 
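$$G^{\text{GB}}_{\text{pol}} = -\frac{1}{2}\left(\frac{\varepsilon - 1}{\varepsilon}\right) \sum_{i,j} \frac{q_i q_j}{f_{ij}}, \tag{11.32}$$

where $q_i$ and $q_j$ are the MM point charges on atoms i and j, and
the quantity $f_{ij}^{-1}$ is an effective Coulomb potential. In the case of a
spherical cavity, $f_{ij}$ has two known limits [34, 45]:

$$\begin{aligned} f_{ij}^{\text{sphere}} &\to r_{ij} && \text{as } r_{ij} \to \infty \\ f_{ij}^{\text{sphere}} &\to \left(r_{ij}^2 + R_i^{\text{perf}} R_j^{\text{perf}}\right)^{1/2} && \text{as } r_{ij} \to 0. \end{aligned} \tag{11.33}$$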

(We use atomic units in this discussion, so $r_{ij}^{-1}$ is the Coulomb
potential between charges $q_i$ and $q_j$.) The limit $r_{ij} \to \infty$
corresponds to the solvable model problem of two Born ions in non-
overlapping spherical cavities, while the limit ri j → 0 becomes valid
when qi and q j occupy the same spherical cavity [34].
The quantity $R_i^{\text{perf}}$ in Eq. (11.33) denotes the “perfect” effective
Born radius for $q_i$ [64], the efficient and accurate computation of
which is a major part of the development of GB models. To define
$R_i^{\text{perf}}$, let $G^{\text{PE}}_{\text{pol},ii}$ denote the exact polarization energy (obtained by
solving Poisson’s equation) for the atomic charge $q_i$ in a cavity
representative of the entire molecule. (That is, we turn off all charges
$q_{j \neq i}$ but leave the cavity unchanged.) Then the definition

$$R_i^{\text{perf}} = -\frac{1}{2}\left(\frac{\varepsilon - 1}{\varepsilon}\right) \frac{q_i^2}{G^{\text{PE}}_{\text{pol},ii}} \tag{11.34}$$

assures the correct Born ion limit in Eq. (11.32) [45].


Equation (11.34) is not a practical construction of the perfect
radii, because it requires solving Poisson’s equation for the entire
molecule, once per atom. Computationally tractable approximations
have been proposed and tested [48, 62], but will not be discussed
here. Instead, we discuss a formal connection that was discovered
recently between PCMs and the GB ansatz [45], and propose PCMs
as a means to generate benchmark data for testing the various
approximations that go into GB models.
The key breakthrough is to recognize that when the solute is
composed of point charges (or higher-order classical multipoles
[45]), each solute charge’s contribution to the ASC can be treated
individually within the PCM formalism. As a result, the total
electrostatic solvation energy assumes a pairwise-additive form.
Equating this energy with Eq. (11.32) affords a formal expression
for the exact effective Coulomb operator for GB theory: [45]

$$\frac{1}{f_{ij}} = \frac{1}{q_i q_j} \int_{\text{surface}} d\mathbf{s}\; \phi_i^\rho(\mathbf{s})\, \hat{C}_\varepsilon^{-1}\, \phi_j^\rho(\mathbf{s}). \tag{11.35}$$

Exact perfect radii are given by $R_i^{\text{perf}} = f_{ii}$. The quantity $\phi_i^\rho(\mathbf{s})$
in Eq. (11.35) denotes the electrostatic potential at the point s
that is generated by the solute charge $q_i$, and the operator $\hat{C}_\varepsilon$ can
be expressed in terms of the PCM operators $\hat{S}$ and $\hat{D}$ introduced
above [45]. The subscript in $\hat{C}_\varepsilon$ is intended as a reminder that this
operator depends explicitly on the dielectric constant, so that $f_{ij}^{-1}$
cannot be independent of ε, as is assumed in most (though not all
[72]) GB models. We have argued [45] that the only reasonable,
ε-independent choice is the ε → ∞ limit (especially given the
importance of aqueous solvation), which has the added benefit of
simplifying the operator Ĉ ε , since IEF-PCM reduces to C-PCM in that
limit.
Thus, we have demonstrated a formal equivalence between PCM
and GB calculations, wherein perfect radii and exact values of fi j can
be computed from PCM calculations. Exact fi j values are defined

only in a pairwise way, for each pair of atoms in a macromolecule,


and the key to an accurate GB model is to pick an analytic functional
form that can interpolate between the two limits in Eq. (11.33) while
fitting the pairwise fi j data. A commonly used form for the analytic
interpolating function is [65]

$$f_{ij} = \left[r_{ij}^2 + R_i R_j\, \Gamma_{ij}\right]^{1/2}, \tag{11.36}$$

where $\Gamma_{ij}$ is some function of $r_{ij}$ and the atomic radii $R_i$ and $R_j$.
(The latter are generally some approximations to the perfect radii.)
The form

$$\Gamma_{ij}^{\text{Still}} = \exp\left(-r_{ij}^2 / c R_i R_j\right) \tag{11.37}$$

is often used, with c = 4 in the original GB model of Still et al. [74]
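
A minimal sketch of the GB energy expression, Eq. (11.32), using the interpolating form of Still et al. with c = 4, is given below, together with the perfect-radius definition of Eq. (11.34). Function names and the numerical inputs are illustrative assumptions. For a single charge the GB energy must reduce to the Born energy, and for two widely separated charges f_ij approaches r_ij, so the pair term becomes a dielectric-screened Coulomb interaction.

```python
import numpy as np

def gb_energy(coords, charges, radii, eps, c=4.0):
    """Eq. (11.32) with f_ij = sqrt(r^2 + R_i R_j exp(-r^2 / (c R_i R_j)))."""
    coords = np.asarray(coords, dtype=float)
    charges = np.asarray(charges, dtype=float)
    radii = np.asarray(radii, dtype=float)
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    RiRj = np.outer(radii, radii)
    f = np.sqrt(r**2 + RiRj * np.exp(-r**2 / (c * RiRj)))
    # The double sum runs over all i and j, including the self terms i = j.
    return -0.5 * (eps - 1.0) / eps * np.sum(np.outer(charges, charges) / f)

def perfect_radius(charge, g_self, eps):
    """Eq. (11.34): effective Born radius from an exact self-energy."""
    return -0.5 * (eps - 1.0) / eps * charge**2 / g_self

eps = 80.0
# Single ion: GB reduces to the Born energy, and Eq. (11.34) recovers the radius.
g_single = gb_energy([[0.0, 0.0, 0.0]], [1.0], [2.0], eps)
born = -0.5 * (1.0 - 1.0 / eps) * 1.0**2 / 2.0
radius_back = perfect_radius(1.0, g_single, eps)

# Two distant ions: self terms plus a screened Coulomb pair term.
sep = 1.0e4
g_pair = gb_energy([[0.0, 0.0, 0.0], [sep, 0.0, 0.0]], [1.0, -1.0], [2.0, 3.0], eps)
expected_pair = -0.5 * (1.0 - 1.0 / eps) * (1.0 / 2.0 + 1.0 / 3.0 - 2.0 / sep)
print(g_single, born, radius_back, g_pair, expected_pair)
```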


We have used C-PCM calculations to obtain a data set of fi j values
and perfect radii for a small collection of proteins, using Eq. (11.35)
[45]. Figure 11.4 plots the pairwise $\Gamma_{ij}$ data obtained for one of
these proteins, which consists of 515 atoms for a total of 132,355
values of $\Gamma_{ij}^{\text{C-PCM}}$ with $i \neq j$. Although the functional form originally
proposed by Still et al. [74] grossly conforms to the contours of the
data, there appears to be room for improvement.

Figure 11.4 Exact values of $\Gamma_{ij}$ (black dots) for all pairs of atoms in the
protein 1AJJ, obtained from C-PCM calculations with perfect radii $R_i$ and $R_j$.
The colored curves depict various analytic interpolating functions.

Setting

$$\Gamma_{ij} = \frac{2\Lambda_{ij}\, r_{ij}}{(R_i R_j)^{1/2}} + \Lambda_{ij}^2 \tag{11.38}$$

in Eq. (11.36) affords $f_{ij} = r_{ij} + \Lambda_{ij} (R_i R_j)^{1/2}$, where $\Lambda_{ij}$ is a new
pairwise interpolating function having limits

$$\begin{aligned} \Lambda_{ij} &\to 1 && \text{as } r_{ij}/(R_i R_j)^{1/2} \to 0 \\ \Lambda_{ij} &\to 0 && \text{as } r_{ij}/(R_i R_j)^{1/2} \to \infty. \end{aligned} \tag{11.39}$$

The quantities $(R_i R_j)^{1/2}$ that are needed to obtain $\Lambda_{ij}$ and $f_{ij}$ can be
computed outside of the pairwise GB loop, and one may seek a form
for $\Lambda_{ij}$ that does not require calls to the exponential or square root
functions, both of which are required when the “canonical” form of
$\Gamma_{ij}$ [Eq. (11.37)] is used. An example is the function

$$\Lambda_{ij}^{\text{p16}} = \left[1 + \frac{\xi\, r_{ij}}{16\,(R_i R_j)^{1/2}}\right]^{-16}, \tag{11.40}$$
which is a 16th-order approximation to the function in Eq. (11.37),
but which can be evaluated using only a small number of floating
point operations [45]. In Ref. [45], we fit the parameter ξ (along
with the truncation order, p = 16) to a protein training set. Visual
inspection of the various interpolating functions that are plotted in
p16
Fig. 11.4 suggests that the function i j obtained from Eq. (11.40)
does indeed fit the data better than the function suggested by Still et
al. [74], although the enormous number of data points in the figure
somewhat obscures the true spread of the data. In any case, an exact
data set of i j values has never been available before, so the utility
of PCMs in re-parameterizing GB models seems clear.
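The two effective Coulomb distances discussed above can be compared with a minimal Python sketch. It assumes the reconstructed forms of Eqs. (11.36)-(11.38) and (11.40); the function names and the value `XI_DEMO` are hypothetical placeholders (the fitted value of ξ from Ref. [45] is not quoted in this section). The power of 16 is evaluated by four successive squarings, so the pairwise loop needs no call to the exponential or square root function once the per-atom square roots of the radii are available.

```python
import math

XI_DEMO = 1.0  # hypothetical placeholder, NOT the fitted value from Ref. [45]

def gamma_still(r, Ri, Rj, c=4.0):
    """Still-type interpolating function, Eq. (11.37)."""
    return math.exp(-r * r / (c * Ri * Rj))

def f_still(r, Ri, Rj, c=4.0):
    """Effective distance of Eq. (11.36) with the Still form (needs exp and sqrt)."""
    return math.sqrt(r * r + Ri * Rj * gamma_still(r, Ri, Rj, c))

def upsilon_p16(r, sqrt_RiRj, xi):
    """Eq. (11.40): (1 + xi*r / (16*sqrt(Ri*Rj)))**(-16) via four squarings."""
    x = 1.0 + xi * r / (16.0 * sqrt_RiRj)
    x2 = x * x
    x4 = x2 * x2
    x8 = x4 * x4
    return 1.0 / (x8 * x8)

def f_p16(r, sqrt_RiRj, xi):
    """Effective distance f = r + Upsilon * sqrt(Ri*Rj)."""
    return r + upsilon_p16(r, sqrt_RiRj, xi) * sqrt_RiRj

def gamma_from_upsilon(ups, r, sqrt_RiRj):
    """Eq. (11.38): the Gamma that reproduces f = r + Upsilon*sqrt(Ri*Rj)."""
    return 2.0 * ups * r / sqrt_RiRj + ups * ups

# Per-atom square roots would be computed once, outside the pairwise loop.
sqrt_R = [math.sqrt(1.5), math.sqrt(2.0)]
s01 = sqrt_R[0] * sqrt_R[1]
print(f_still(2.7, 1.5, 2.0), f_p16(2.7, s01, XI_DEMO))
```

Inserting the output of `gamma_from_upsilon` into Eq. (11.36) gives the same effective distance as `f_p16`, which is a convenient consistency check on Eq. (11.38).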
Figure 11.5(a) shows contour plots of two-dimensional histograms
that count the number of $\Gamma_{ij}^{\text{C-PCM}}$ data points, as a function of
the value of $\Gamma_{ij}^{\text{C-PCM}}$ and the dimensionless distance $r_{ij}/(R_i R_j)^{1/2}$.
Various analytic interpolating functions $\Gamma_{ij}$ are superimposed on
top of these contours. Much more so than the function $\Gamma_{ij}^{\text{Still}}$, the
interpolating functions $\Gamma_{ij}^{\text{exp}}$ and $\Gamma_{ij}^{p16}$ that we suggested in Ref. [45]
cut a path through where the number of data points is peaked. On the
other hand, Fig. 11.5(b) superimposes these functions on top of
two-dimensional histograms of the pairwise GB energy contributions,
$|G^{\text{GB}}_{\text{pol},ij}| = |q_i q_j|/2 f_{ij}$ [see Eq. (11.32)]. This figure seems to
suggest that the new interpolating functions push $\Gamma_{ij}$ away from the
energetically most important data points, so further improvement in
the effective Coulomb operator may be possible.

Figure 11.5 Analytic interpolating functions $\Gamma_{ij}$ superimposed on top of
contours that represent (a) the total number of exact $\Gamma_{ij}^{\text{C-PCM}}$ data points
for protein 1AJJ in the scatter plot of Fig. 11.4, and (b) the pairwise energies
$|G^{\text{GB}}_{\text{pol},ij}| = |q_i q_j/2 f_{ij}|$ associated with each data point. In (a), the contours
are shown in black with the outermost contour representing 100 data points
per bin and subsequent contours increasing in increments of 100 data
points per bin. In (b), the contours are shown in alternating black and gray,
with the outermost (rightmost) contour representing $G_{\text{pol},ij} = 1$ kcal/mol
and subsequent contours increasing in increments of 1 kcal/mol. Bin widths
are 0.5 and 0.026, respectively, in the dimensionless quantities $r_{ij}/(R_i R_j)^{1/2}$
and $\Gamma_{ij}$.

However, error statistics confirm that the interpolating functions
suggested in Ref. [45] do fit the $|G^{\text{GB}}_{\text{pol},ij}|$ data better than $\Gamma_{ij}^{\text{Still}}$.
In fact, the function $\Upsilon_{ij}^{p16}$ [Eq. (11.40)] actually reduces the errors
in GB solvation energies while simultaneously accelerating the
calculations [45]. To wit, when quasi-perfect “R6∗” radii [62], which
can be computed cheaply, are used in place of the perfect radii that
are only available in benchmark calculations, the mean absolute
error in $G^{\text{GB}}_{\text{pol}}$ as compared to C-PCM benchmarks is reduced from
8.7 kcal/mol for the canonical GB operator $\Gamma_{ij}^{\text{Still}}$ to 5.1 kcal/mol
for the interpolating function $\Upsilon_{ij}^{p16}$. At the same time, use of $\Upsilon_{ij}^{p16}$
reduces the cost by a factor of three relative to the canonical
GB model based on $\Gamma_{ij}^{\text{Still}}$ [45]. The new interpolating function
can be “dropped in” to existing MD codes with GB capability, and
given the sizable speed-ups that we have observed, we advocate
extensive further testing of the GB kernel in Eq. (11.40) and related
functions.
Finally, let us briefly mention the incorporation of salt effects
into GB models. It is recognized that standard GB models tend to
exaggerate the importance of the salt, likely due to neglect of the ion
exclusion layer [73]. Empirical scaling of κ has been suggested as
a remedy [71, 73]. Alternatively, however, the DESMO ion exclusion
factors [$\gamma_I$ in Eq. (11.31)], in conjunction with the formal connection
between PCMs and the GB ansatz, can be used to suggest “first
principles” corrections to GB models that incorporate salt effects
[46]. Several new GB models that incorporate salt effects were
suggested in Ref. [46], based on formal connections to DESMO, and
shown to be only slightly less accurate than methods that use an
empirical scaling factor for κ. As such, these new models may serve
as starting points for future development of salty GB models.

11.4 Advances in Algorithms

In this section we focus on technical rather than theoretical
developments, but ones that are absolutely essential if PCMs are
going to be brought to bear on macromolecules.

11.4.1 Intrinsically Smooth Discretization


An issue with the PCM formalism introduced in Section 11.2.2.1 is
that the electrostatic energy is in general a discontinuous function
as the solute atoms are displaced, because the number and size of
the surface tesserae may change as a function of solute geometry. A
similar problem is suffered by finite-difference Poisson–Boltzmann
solvers, and the “solution” in those cases (in order to achieve stable
forces for MD simulations, for example) is tight thresholding and/or
some kind of interpolation between grid points [80–83].
The situation is simpler in the case of PCMs, where only the cavity
surface (and not the whole of three-dimensional space) needs to be
discretized. A switching function of the form

$$F_i = \prod_{K,\, i \notin K}^{\text{atoms}} f(\mathbf{s}_i, \mathbf{r}_K) \qquad (11.41)$$

can be used to attenuate the contribution to the PCM equations
from the $i$th surface grid point $\mathbf{s}_i$, as that point passes through
a narrow buffer region around the solute cavity surface, which is
defined in terms of spheres centered at the atoms $\mathbf{r}_K$. The quantity
$f$ in Eq. (11.41) is some function that changes smoothly from 0 to 1
across this buffer region [41, 42, 87].
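The switching factor of Eq. (11.41) can be illustrated with the following Python sketch. The text only requires that $f$ rise smoothly from 0 to 1 across the buffer region, so the quintic polynomial, the buffer width of 0.3, and the choice to center the buffer on the sphere surface are all assumptions made here for illustration; they are not the specific function used in Refs. [41, 42, 87].

```python
import math

def smoothstep(x):
    """Quintic polynomial rising smoothly from 0 (x <= 0) to 1 (x >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x**3 * (10.0 - 15.0 * x + 6.0 * x * x)

def switch(point, center, radius, width):
    """Elementary factor f(s_i, r_K): 0 inside sphere K, 1 outside, smooth across
    a buffer of the given width centered on the sphere surface (assumed layout)."""
    d = math.dist(point, center)
    return smoothstep((d - radius) / width + 0.5)

def switching_factor(point, owner, atoms, width=0.3):
    """F_i of Eq. (11.41): product over all atoms K except the one owning the point."""
    F = 1.0
    for k, (center, radius) in enumerate(atoms):
        if k != owner:
            F *= switch(point, center, radius, width)
    return F

# Two overlapping unit spheres; the points below all lie on the first sphere.
atoms = [((0.0, 0.0, 0.0), 1.0), ((1.5, 0.0, 0.0), 1.0)]
for p in [(1.0, 0.0, 0.0), (0.75, math.sqrt(0.4375), 0.0), (-1.0, 0.0, 0.0)]:
    print(p, switching_factor(p, 0, atoms))
```

A point buried inside the neighboring sphere gets $F_i = 0$, a point far from it gets $F_i = 1$, and a point on the seam between the spheres is attenuated to an intermediate value.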
This simple procedure, however, leads to problems for certain
cavity surface definitions [41, 44]. In particular, while the switching
function can provide a potential energy surface that is rigorously
smooth in the mathematical sense of having continuous derivatives,
those derivatives may fluctuate wildly as a function of the solute
coordinates [41]. These oscillations are actually exacerbated by the
switching function, which allows the surface charges to approach
one another more closely than would be the case if they were simply
turned on or off discontinuously at the cavity surface boundary. The
result can be sharp singularities in the energy along a geometry
optimization [41] or MD trajectory [42]. For example, Fig. 11.6
shows harmonic vibrational spectra for a relatively large system
(adenine with 52 explicit water molecules, all embedded within
a polarizable continuum) computed using the fixed points with
variable areas (FIXPVA) discretization algorithm of Ref. [75]. The
FIXPVA approach achieves a smooth potential surface by applying
a switching function to the surface areas $a_i$, scaling them smoothly
to zero as the point $\mathbf{s}_i$ passes through the buffer region and
into the cavity. However, sharp fluctuations in the FIXPVA energy
gradient (which, we emphasize, is a continuous function) manifest
as anomalously large vibrational frequencies of up to 16,000 cm$^{-1}$!

Figure 11.6 Harmonic vibrational spectra for (adenine)(H$_2$O)$_{52}$ computed
at the MM/PCM level, using two different smooth implementations of
the C-PCM solvation model. Harmonic frequencies were computed via
finite difference of analytic energy gradients and convolved with 10 cm$^{-1}$
Gaussians. Arrows indicate FIXPVA peaks with no obvious SWIG analogues,
and the inset blows up the region of the spectrum < 4000 cm$^{-1}$. Reprinted
from Ref. [41]; copyright 2010 American Chemical Society.
Axes: intensity / km mol$^{-1}$ versus vibrational wavenumber / cm$^{-1}$; curves: FIXPVA and SWIG.

A solution to this problem is to use Gaussian blurring of the
surface charges [41, 87], in which each discretization charge $q_i$ is
replaced by a Gaussian function

$$g_i(\mathbf{r}) = q_i \left(\frac{\zeta_i^2}{\pi}\right)^{3/2} \exp\left(-\zeta_i^2 |\mathbf{r} - \mathbf{s}_i|^2\right). \qquad (11.42)$$

The width parameters $\zeta_i$ are chosen so as to approximate a uniform
surface charge in the case of a single point charge centered in a
spherical cavity [87], and are fixed parameters once the number
of Lebedev discretization points per sphere is specified. The matrix
elements of $\mathbf{S}$ are then

$$S_{ij} = \begin{cases} \zeta_i (2/\pi)^{1/2} F_i^{-1} & i = j \\ \text{erf}(\zeta_{ij} s_{ij})/s_{ij} & i \neq j \end{cases} \qquad (11.43)$$

where $s_{ij} = |\mathbf{s}_i - \mathbf{s}_j|$ and $\zeta_{ij} = \zeta_i \zeta_j / (\zeta_i^2 + \zeta_j^2)^{1/2}$. The off-diagonal
element is simply the Coulomb interaction between two Gaussians,
while the diagonal element $S_{ii}$ consists of the $\mathbf{s}_j \to \mathbf{s}_i$ limit of that
Coulomb interaction, multiplied by $F_i^{-1}$. The latter factor guarantees
a smooth potential surface by ensuring that Eq. (11.22) has a null
space corresponding to grid points for which $F_i = 0$ [42]. As such,
it is safe to discard points for which $F_i$ falls below a given threshold,
thus reducing the dimension of the linear system in Eq. (11.22).
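As a minimal sketch of Eq. (11.43), the following Python code assembles the blurred Coulomb matrix for a handful of grid points and drops the points whose switching factor falls below a threshold. The function names and the dense list-of-lists storage are illustrative assumptions; the default threshold of $10^{-6}$ is the value quoted later in this chapter.

```python
import math

def zeta_pair(zi, zj):
    """Combined Gaussian exponent zeta_ij = zi*zj / sqrt(zi^2 + zj^2)."""
    return zi * zj / math.sqrt(zi * zi + zj * zj)

def s_element(i, j, points, zeta, F):
    """Matrix element S_ij of Eq. (11.43)."""
    if i == j:
        # s_j -> s_i limit of the Gaussian Coulomb interaction, divided by F_i
        return zeta[i] * math.sqrt(2.0 / math.pi) / F[i]
    s_ij = math.dist(points[i], points[j])
    return math.erf(zeta_pair(zeta[i], zeta[j]) * s_ij) / s_ij

def build_S(points, zeta, F, f_min=1e-6):
    """Return the indices of retained points and the dense S matrix over them."""
    keep = [i for i in range(len(points)) if F[i] > f_min]
    S = [[s_element(i, j, points, zeta, F) for j in keep] for i in keep]
    return keep, S

points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
keep, S = build_S(points, zeta=[2.0, 2.0, 2.0], F=[1.0, 0.8, 1e-9])
print(keep, S)
```

For two unattenuated points with equal widths, the off-diagonal element approaches the diagonal value $\zeta_i (2/\pi)^{1/2}$ as the points coalesce, which is the limit described in the text.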
The matrix elements of $\mathbf{D}$ require some care. Off-diagonal
elements can be computed from $S_{ij}$ according to [42]

$$D_{ij} = \mathbf{n}_j \cdot \frac{\partial S_{ij}}{\partial \mathbf{s}_i}. \qquad (11.44)$$

Diagonal elements are often computed by means of a sum rule [see
Eq. (11.24)] [68], but this relationship is no longer rigorously valid
in the presence of attenuated grid points that may actually reside
within the cavity. This can lead to serious numerical problems in the
context of smooth PCMs [44]. Instead, we take $D_{ii} = S_{ii}/2R_I$, where
$R_I$ is the radius of the atomic sphere on which the point $\mathbf{s}_i$ resides
[42]. This formula is correct for a spherical surface of radius $R_I$ [58].
The combination of switching function, Gaussian blurring, and
Lebedev discretization, with these choices for $\mathbf{S}$ and $\mathbf{D}$, constitutes
what we have called the switching/Gaussian (SWIG) discretization
approach [42]. All of the required matrix elements are analytically
differentiable functions of the atomic coordinates, and the derivatives
are rigorously continuous and free of unphysical oscillations.
As compared to the FIXPVA approach, spurious vibrational frequencies
are absent (see Fig. 11.6). In QM/PCM calculations, SWIG
discretization preserves the variational property of the original
operator formalism, namely, that the solute/continuum electrostatic
interaction necessarily lowers the SCF energy [42]. When the
$\mathbf{X} = \mathbf{DAS}$ version of the $\mathbf{K}$ matrix is employed in IEF-PCM (or
when C-PCM is used instead), SWIG discretization yields the same
solvation energies, to very high accuracy, as compared to traditional
discretizations [44].

Figure 11.7 Non-electrostatic solvation energy for a QM/PCM calculation
of aqueous NaCl, as a function of the distance between the two atoms.
The model assumes that the non-electrostatic energy is proportional to the
exposed cavity surface area, which jumps in discontinuous steps for VTN
discretization. The SWIG discretization smoothly interpolates through these
steps, whereas the FIXPVA discretization achieves smoothness by scaling
the tesserae surface areas, leading it to underestimate the total surface area.
Adapted from Ref. [41]; copyright 2010 American Chemical Society.
Axes: non-electrostatic energy / kcal mol$^{-1}$ versus Na–Cl distance / Å; curves: VTN, SWIG, and FIXPVA.

Within the SWIG-PCM approach, the exposed cavity surface area
is also a rigorously smooth function of the atomic coordinates, even
though the “seams” between atomic spheres are no longer sharp
cusps, and discretization points within the buffer zone do contribute
to the total surface area, albeit with some attenuation. The fact that
surface areas are smooth is important because the non-electrostatic
energy is often parameterized in terms of the cavity surface area
[2, 3, 20, 47, 84]. In biomolecular applications, so-called MM/PBSA
methods [36, 38] also use the cavity surface area to obtain the
non-electrostatic part of the solvation energy. Figure 11.7 plots the
total surface area, obtained using various discretization methods, as
two atomic spheres are pulled apart. The variable tesserae number
(VTN) scheme [49] amounts to a discontinuous throwing away of
grid points as they enter the solute cavity, and serves as a control
experiment. As one would expect for such an approach, the VTN
surface area consists of a sequence of discrete steps corresponding
to addition or loss of grid points. Due to the simplicity of the
model, these steps should represent an accurate (if discontinuous)
approximation to the cavity surface area, and the SWIG surface area
smoothly interpolates through these steps. FIXPVA discretization,
while it does afford a rigorously smooth surface area, tends to
underestimate the VTN surface area. We have shown that the lack
of Coulomb regularization in FIXPVA necessitates a more aggressive
switching function in order to avoid singularities [41, 44], with the
result that many grid points are attenuated completely away, leading
to “holes” in the cavity surface [41].
A major advantage of the inherently smooth SWIG-PCM approach,
as compared to a grid-based finite-difference solution of
Poisson’s equation, is that stable forces for MD simulations are
obtained by straightforward differentiation of Eq. (11.22) [42],
even for fairly coarse discretization grids. In the finite-difference
approach, one must resort to very fine grids, or else interpolation
or other tricks, in order to render discontinuities small enough so
that energy-conserving MD can be achieved [80–83]. Figure 11.8
plots the electrostatic solvation energy, $G_{\text{pol}}$, from an ab initio
MD trajectory of glycine in implicit water [42]. In this simulation,
the solute is initialized in its carboxylic acid tautomer, whereas
the zwitterionic tautomer is more stable in aqueous solution. As
such, the molecule spontaneously undergoes intramolecular proton
transfer, evident in Fig. 11.8 by the dramatic change in $G_{\text{pol}}$.
Close examination, however, reveals that $G_{\text{pol}}$ is a perfectly smooth
function of time, even during the course of this radical change in
cavity geometry.
Stable forces are also obtained in MM/PCM simulations, as
shown in Fig. 11.9 for a classical MD simulation of a segment of DNA
bound to a histone protein. Here, the energy fluctuations amount to
an acceptable ∼ 0.0001% of the total energy.

11.4.2 Linear Scaling and Parallelization


The MM/PCM simulation in Fig. 11.9 consists of ∼124,000 surface
discretization points (those for which $F_i > 10^{-6}$). As such, solution
of Eq. (11.22) by matrix inversion or other $O(N^3)$ methods is clearly
infeasible, and a linear-scaling approach (in both memory and CPU
time) is required. Such algorithms have been reported [25, 69],
and parallelization has been discussed as well [29]. Our approach
is described here for the first time, although versions of it have
actually been available in the Q-CHEM software [39] since v. 3.2. Our
strategy involves (bi)conjugate gradient solvers for linear equations,
which do not require explicit formation of matrices such as $\mathbf{S}$ and
$\mathbf{D}$; a treecode version [50] of the fast multipole method [32]; and
parallelization using both OpenMP and MPI. The discussion below
pays particular attention to scalability and to the parallelization
strategy, focusing on practical considerations as the system size is
scaled up, and on how appropriate choices can be made for optimal
efficiency at different points along the way.

Figure 11.8 Ab initio (PBE0/6-31+G*) QM/PCM MD simulation of intramolecular
proton transfer in aqueous glycine. The inset shows the
total electrostatic solvation energy ($G_{\text{pol}}$), which is much larger for the
zwitterionic tautomer than for the carboxylic acid tautomer. The shaded
region has been enlarged in the main part of the figure, in order to
demonstrate that the solvation energy is a smooth function of time despite
the radical change in cavity shape upon proton transfer. The time step is
≈1 fs. Adapted from Ref. [42]; copyright 2010 American Institute of Physics.

Figure 11.9 Energy fluctuations in an MM/PCM MD simulation of a
segment of DNA bound to a histone protein. The solute (DNA + protein)
consists of 21,734 AMBER99 atoms and the cavity surface is discretized using
∼124,000 surface charges. After the initial equilibration period, energy
fluctuations amount to ∼0.0001% of the total energy.

11.4.2.1 Conjugate gradient solvers


The straightforward way to solve Eq. (11.22) is by constructing the
matrix $\mathbf{K}^{-1}\mathbf{R}$, or more realistically by some equivalent factorization
procedure such as LU decomposition. Even in an iterative SCF
procedure, this needs to be done just once per molecular geometry.
Nevertheless, this operation scales as $O(N_{\text{grid}}^3)$ and becomes the
bottleneck surprisingly quickly in QM/PCM calculations, especially
for dense discretization grids. The SWIG discretization procedure
exacerbates the cost of QM/PCM calculations, both by increasing the
number of grid points (we retain all $\mathbf{s}_i$ for which $F_i > 10^{-6}$) and
also because it requires evaluation of three-index Gaussian integrals,
$(g_i|\mu\nu)$. For example, in Hartree–Fock/6-31G* calculations on linear
alkanes, using 302 Lebedev points per atom (which is sufficient to
converge the electrostatics to within ∼0.3 kcal/mol [42]), the PCM
cost exceeds the QM cost starting around octane. For QM/MM/PCM
calculations, the $O(N_{\text{grid}}^2)$ cost in memory can also become a serious
limitation.
A solution is to use Krylov subspace methods, such as the
conjugate gradient (CG) method, the biconjugate gradient (BiCG)
method, or other variants. The CG algorithm is appropriate for C-PCM and
DESMO, where the matrix $\mathbf{K}$ is symmetric, and the BiCG algorithm
can be used for the non-symmetric SS(V)PE/IEF-PCM case. The cost
of these algorithms is dominated by matrix-vector products that
scale as $O(N_{\text{grid}}^2)$, although matrix-matrix multiplication is required
for SS(V)PE/IEF-PCM, which brings that method’s scaling back up to
$O(N_{\text{grid}}^3)$ if matrices are constructed explicitly.
These matrix multiplications can be avoided using a combination
of the CG and BiCG algorithms to bypass construction of $\mathbf{K}$ [19]. In
the first stage, the BiCG algorithm is used to solve the equation

$$\left(\mathbf{I} - \frac{f_\varepsilon}{2\pi}\mathbf{DA}\right)\mathbf{w} = -f_\varepsilon\left(\mathbf{I} - \frac{1}{2\pi}\mathbf{DA}\right)\mathbf{v} \qquad (11.45)$$

for $\mathbf{w}$. Following that, the equation $\mathbf{Sq} = \mathbf{w}$ is solved for $\mathbf{q}$
using the CG algorithm. The cost of this two-stage approach scales
as $O(N_{\text{grid}}^2)$.
The CG and BiCG algorithms are complicated by the presence of
an inverse switching function in the definition of $S_{ii}$ [Eq. (11.43)],
which causes $S_{ii} \to \infty$ as $F_i \to 0$. Although in practice these
matrix elements are discarded when $F_i$ is smaller than some pre-determined
threshold, values of $F_i$ just above threshold tend to
inflate the condition number of $\mathbf{S}$, which can cause numerical
instabilities or slow convergence in CG/BiCG algorithms. Large
condition numbers can be avoided by appropriate factorizations, for
example

$$\mathbf{S} = \mathbf{S}_{\text{diag}}^{1/2}\left(\mathbf{1} + \mathbf{S}_{\text{diag}}^{-1/2}\mathbf{S}_{\text{off}}\mathbf{S}_{\text{diag}}^{-1/2}\right)\mathbf{S}_{\text{diag}}^{1/2}, \qquad (11.46)$$

where $\mathbf{S}_{\text{diag}}$ and $\mathbf{S}_{\text{off}}$ represent the diagonal and off-diagonal parts
of $\mathbf{S}$, respectively. The factor in parentheses is symmetric and thus
amenable to a CG approach, and ought to have a significantly
smaller condition number than $\mathbf{S}$ because small $F_i$ appear in the
numerator of $\mathbf{S}_{\text{diag}}^{-1/2}$. For C-PCM, Eq. (11.46) can be used to obtain an
intermediate equation

$$\left(\mathbf{1} + \mathbf{S}_{\text{diag}}^{-1/2}\mathbf{S}_{\text{off}}\mathbf{S}_{\text{diag}}^{-1/2}\right)\tilde{\mathbf{q}} = -f_\varepsilon \mathbf{S}_{\text{diag}}^{-1/2}\mathbf{v}, \qquad (11.47)$$

that is solvable by CG techniques. Having solved this equation for $\tilde{\mathbf{q}}$,
the final C-PCM charges are $\mathbf{q} = \mathbf{S}_{\text{diag}}^{-1/2}\tilde{\mathbf{q}}$. This strategy can also be
used in the two-stage CG/BiCG calculation for SS(V)PE/IEF-PCM, as
outlined above. The cost remains $O(N_{\text{grid}}^2)$ in CPU time.
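The factorized C-PCM solve of Eqs. (11.46) and (11.47) can be sketched in a few lines of Python. This is an illustrative dense, pure-Python version under stated assumptions: the function names are hypothetical, the matrix is stored explicitly for brevity (a production code would supply only a matrix-vector product), and the textbook unpreconditioned CG recursion is used.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product (stand-in for an on-the-fly or FMM product)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=200):
    """Textbook CG for a symmetric positive-definite operator."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if math.sqrt(rs) < tol:
            break
        Ap = apply_A(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def solve_cpcm_scaled(S, v, f_eps):
    """Solve S q = -f_eps v through the symmetrically scaled system of Eq. (11.47)."""
    n = len(S)
    d = [1.0 / math.sqrt(S[i][i]) for i in range(n)]  # diagonal of S_diag^(-1/2)

    def apply_scaled(x):
        # S_diag^(-1/2) S S_diag^(-1/2) x  =  (1 + S_diag^(-1/2) S_off S_diag^(-1/2)) x
        Sy = matvec(S, [d[i] * x[i] for i in range(n)])
        return [d[i] * Sy[i] for i in range(n)]

    rhs = [-f_eps * d[i] * v[i] for i in range(n)]
    q_tilde = conjugate_gradient(apply_scaled, rhs)
    return [d[i] * q_tilde[i] for i in range(n)]  # q = S_diag^(-1/2) q_tilde

# One very large diagonal element mimics a grid point with small F_i.
S = [[4.0, 1.0, 0.5], [1.0, 1000.0, 0.2], [0.5, 0.2, 2.0]]
print(solve_cpcm_scaled(S, v=[1.0, -2.0, 0.5], f_eps=0.9))
```

The scaled operator has a unit diagonal by construction, so a large $S_{ii}$ arising from a small $F_i$ no longer dominates the spectrum seen by the CG iterations.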

Pre-conditioning improves both the rate of convergence and the
stability of CG algorithms, but its effectiveness depends sensitively
on the nature of the pre-conditioner matrix, $\mathbf{M}$. For solving linear
equations $\mathbf{Ax} = \mathbf{b}$, the pre-conditioner should be selected such
that $\mathbf{M}^{-1}\mathbf{A}$ has a smaller condition number than $\mathbf{A}$, which usually
implies that $\mathbf{M}^{-1}$ is some approximation to $\mathbf{A}^{-1}$. A common choice
is to set $\mathbf{M}$ equal to the diagonal of $\mathbf{A}$, but if the C-PCM equations
are factored according to Eq. (11.47), this would make $\mathbf{M}$ a
unit matrix and therefore pointless. We find that factorization
according to Eq. (11.47), without pre-conditioning, exhibits superior
convergence properties as compared to pre-conditioning using the
diagonal of $\mathbf{A}$.
Block-diagonal pre-conditioners, which can be easily diagonalized
and stored in core memory, are also popular, and this is the
approach that we take. The fast multipole method [32, 50] (FMM)
that is described below is used to partition the surface discretization
charges, and this partition suggests a natural block structure for $\mathbf{M}$.
We define the blocks of $\mathbf{M}$ to be equal to sub-blocks of $\mathbf{S}$ consisting of
the “leaf boxes,” which are the smallest partitions in the “octree” data
structure of the FMM (see below). The maximum number of grid
points in one of these leaf boxes is a controllable threshold ($N_{\text{thresh}}$)
in the FMM procedure, and keeping $N_{\text{thresh}} \lesssim 200$ ensures that $\mathbf{M}$
can be rapidly diagonalized and efficiently stored. We use this
pre-conditioner without the factorization in Eq. (11.47), and find that
convergence is accelerated by about 20% for large systems, relative
to other methods discussed here (Fig. 11.10).

11.4.2.2 Fast multipole method


Given that the matrix elements of both $\mathbf{S}$ and $\mathbf{D}$ are essentially just
particle–particle interactions, the FMM algorithm [32] can be used
to improve the scaling to either $O(N)$ or $O(N \log N)$, depending on
the precise details. Our implementation (in Q-CHEM [39]) is based
on the octree Cartesian FMM developed by Krasny and co-workers
[50], which recursively sub-divides space into eight cubes of equal
size and computes a multipole expansion of the charges contained in
each box. In our implementation, these sub-divisions cease when the
number of particles in a box falls below a given threshold ($N_{\text{thresh}}$),
or when the distance between the center and the vertices of the cube
falls below another threshold ($R_{\text{thresh}}$).

Figure 11.10 Convergence of AMBER99/C-PCM calculations for (alanine)$_{1000}$
using SWIG discretization with 110 Lebedev points per atom. Method (a)
uses the factorization in Eq. (11.47) with no pre-conditioning; method (b)
uses a diagonal pre-conditioner; and method (c) uses a block-diagonal
pre-conditioner. The convergence threshold was set to a maximum residual of
$10^{-4}$ and electrostatics were computed exactly (no FMM).
Axes: maximum residual ($10^{-4}$ to $10^{-1}$, logarithmic scale) versus CG iteration (0 to 30).

Following construction of the octree data structure, the Coulomb
interaction between a given particle (surface grid point) and the
$N-1$ other particles is computed by traversing the octree, starting at
the root box and then traveling downward into each box containing
the given particle’s coordinates, until the particle reaches a terminal
(“leaf”) box of the tree. At each level in this traversal, we sum the
multipole interactions between the given particle and each “child”
box that is within a certain multipole acceptance criterion (MAC),
$\theta_{\text{MAC}}$ [50]. The criterion for using multipoles rather than explicit
particle–particle interactions is

$$R_{c,\text{box}}/r_{ic} \leq \theta_{\text{MAC}}, \qquad (11.48)$$

where $R_{c,\text{box}}$ is the radius of the $c$th box (or “cluster” in the language
of Ref. [50]), and $r_{ic}$ is the distance between the $i$th tree-traversing
particle and the center of the $c$th box. If the inequality in Eq. (11.48)
is met, then the multipole expansion for the $c$th box is used for its
interaction with tree-traversing particle $i$, otherwise the pairwise
particle–particle interactions are computed explicitly. (The limit
$\theta_{\text{MAC}} = 0$ is equivalent to never accepting multipoles, and the limit
$\theta_{\text{MAC}} = 1$ accepts all multipoles except for those inside of a box, so
that particle–particle interactions are computed explicitly in the leaf
boxes.) Tree traversal terminates when the particle reaches a leaf
box, and in the leaf box, the particle computes its explicit pairwise
interactions within the leaf box as well as any remaining boxes on
the same level that are not within the MAC. At the end, one will have
summed the pairwise interactions between particle $i$ and all other
particles, in $O(\log N)$ effort. Repeating this for all $N$ values of $i$ leads
to overall $O(N \log N)$ scaling. No matrices are stored, so memory
usage is $O(N)$.
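The traversal logic can be illustrated with a deliberately simplified Python treecode. It is a sketch under several assumptions, not the Q-CHEM implementation: each box carries only its total charge (a zeroth-order "multipole" placed at the box center) rather than a truncated Taylor expansion, the recursion visits every box instead of following the particle's own branch, and the MAC is applied with a strict inequality so that a particle never accepts the box that contains it. All class and function names are hypothetical.

```python
import math

class Box:
    """Cubic octree box; subdivision stops at n_thresh particles or radius r_thresh."""
    def __init__(self, idx, center, half, pts, q, n_thresh, r_thresh):
        self.idx, self.center = idx, center
        self.radius = half * math.sqrt(3.0)      # center-to-vertex distance
        self.charge = sum(q[i] for i in idx)     # zeroth-order moment only
        self.children = []
        if len(idx) > n_thresh and self.radius > r_thresh:
            octants = {}
            for i in idx:
                key = tuple(1 if pts[i][k] >= center[k] else -1 for k in range(3))
                octants.setdefault(key, []).append(i)
            for key, sub in octants.items():
                c = tuple(center[k] + key[k] * half / 2 for k in range(3))
                self.children.append(Box(sub, c, half / 2, pts, q, n_thresh, r_thresh))

def build_tree(pts, q, n_thresh=4, r_thresh=1e-3):
    lo = [min(p[k] for p in pts) for k in range(3)]
    hi = [max(p[k] for p in pts) for k in range(3)]
    center = tuple((a + b) / 2 for a, b in zip(lo, hi))
    half = max(b - a for a, b in zip(lo, hi)) / 2 or 1.0
    return Box(list(range(len(pts))), center, half, pts, q, n_thresh, r_thresh)

def potential(i, box, pts, q, theta):
    """Potential at particle i due to the charges in box, using the MAC of Eq. (11.48)."""
    r = math.dist(pts[i], box.center)
    if box.radius < theta * r:                   # MAC met: use the cluster expansion
        return box.charge / r
    if not box.children:                         # leaf box: explicit pairwise sum
        return sum(q[j] / math.dist(pts[i], pts[j]) for j in box.idx if j != i)
    return sum(potential(i, child, pts, q, theta) for child in box.children)

pts = [((i * 0.37) % 1.0, (i * 0.61) % 1.0, (i * 0.83) % 1.0) for i in range(1, 61)]
q = [1.0 + 0.01 * i for i in range(60)]
tree = build_tree(pts, q)
print(potential(0, tree, pts, q, 0.0), potential(0, tree, pts, q, 0.3))
```

With `theta = 0` no cluster is ever accepted and the result coincides with the direct sum; larger values trade accuracy for fewer explicit interactions, and a real implementation would recover the accuracy by raising the multipole order.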
This FMM procedure readily interfaces with the CG solver,
replacing all matrix-vector products involving $\mathbf{S}$ with the FMM
using the Coulomb kernel $r_{ij}^{-1}$ for point charges [50]. The matrix
$\mathbf{D}$ can similarly be replaced by computing the electric field via
FMM and appropriately multiplying the normal vectors. However,
the Gaussian blurring used in SWIG discretization complicates
the situation, because it involves a modified pairwise kernel,
$\text{erf}(\zeta_{ij} r_{ij})/r_{ij}$. Although variants of FMM that are appropriate for
Gaussian basis sets have been developed [86], we take a simpler
approach, allowing Gaussian charges to interact as if they were
point charges outside of a certain distance, $R_{\text{error}}$. Note that the
pairwise Coulomb kernel $S_{ij}$ reduces to the point-charge kernel
in the limit $\zeta_{ij} \to \infty$. For a given PCM grid, we therefore select
a minimum Gaussian width, $\zeta_i^{\min}$, whence the minimum possible
value of $\zeta_{ij}$ is $\zeta_i^{\min}/\sqrt{2}$. The maximum error in the pairwise Gaussian
charge interactions is then equal to $1 - \text{erf}(\zeta_i^{\min} R_{\text{error}}/2^{1/2})$, so
$R_{\text{error}}$ can be tuned to achieve a desired accuracy. The minimum box
“radius”, $R_{\text{thresh}}$, serves as a stopping criterion during the subdivision
recursion, such that the minimum distance for meeting the MAC is
$R_{\text{thresh}}/\theta_{\text{MAC}}$. Then, in order to ensure that we only accept multipoles
beyond $R_{\text{error}}$, one must simply ensure that
$$R_{\text{thresh}}/\theta_{\text{MAC}} \geq R_{\text{error}}. \qquad (11.49)$$
This applies to explicit charge–charge interactions, while the
error in multipoles remains controlled by the order at which
the multipole expansions are truncated. The pairwise
interactions between Gaussian charges in the leaf boxes are still
computed explicitly using the $\text{erf}(\zeta_{ij} r_{ij})/r_{ij}$ kernel.
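The tuning of $R_{\text{error}}$ described above amounts to inverting the error estimate $1 - \text{erf}(\zeta_i^{\min} R_{\text{error}}/2^{1/2})$. A minimal Python sketch, with hypothetical function names and a simple bisection chosen purely for illustration, is:

```python
import math

def max_point_charge_error(zeta_min, r_error):
    """Largest relative deviation of erf(zeta_ij r)/r from 1/r beyond r_error.
    Uses erfc(x) = 1 - erf(x) to avoid cancellation for small errors."""
    return math.erfc(zeta_min * r_error / math.sqrt(2.0))

def r_error_for_tolerance(zeta_min, tol):
    """Smallest R_error (found by bisection) whose error estimate is at most tol."""
    hi = 1.0
    while max_point_charge_error(zeta_min, hi) > tol:  # bracket the solution
        hi *= 2.0
    lo = 0.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if max_point_charge_error(zeta_min, mid) > tol:
            lo = mid
        else:
            hi = mid
    return hi

# Example: the narrowest-exponent Gaussian on the grid has zeta_min = 2.0
# (inverse length units of the grid; the value is arbitrary here).
print(r_error_for_tolerance(2.0, 1e-6))
```

The returned distance can then be compared against $R_{\text{thresh}}/\theta_{\text{MAC}}$ when choosing the tree parameters.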

11.4.2.3 Parallelization strategies


The methods described above can be applied to any PCM discussed
herein, including those based on SWIG discretization, and scale as
O(N log N) in CPU time and O(N) in memory. Here, we discuss a hybrid
OpenMP/MPI paradigm that we have used to parallelize these
calculations. Our focus is on C-PCM and its DESMO extension for
salt effects, as these models are simpler and cheaper than SS(V)PE/
IEF-PCM, and provide nearly identical results in water. Although
parallel implementations of the FMM algorithm for MD simulations
have been reported before [11, 40], achieving good scalability for
the two-dimensional PCM electrostatics problem (with a cavity
surface that is changing dynamically as the simulation evolves) may
pose different challenges as compared to three-dimensional MD
simulations. Indeed, our preliminary implementation suffers from
some load-balancing issues, as discussed below, that have not yet
been resolved.
Most of the work in an FMM implementation of C-PCM goes into
computing the matrix-vector products in each CG iteration; these
operations are a good target for multithreading with OpenMP. We
store the entire FMM octree data structure in shared memory so
that each thread can access the octree in a parallel fork. We do not
store the Taylor series multipole expansion coefficients for the FMM
but instead compute them on-the-fly as needed, via bootstrapping
through recursion, whenever the MAC is met. This maintains a low
memory footprint and also seems to benefit the performance with
more cache hits within the function stack memory than by otherwise
fetching the coefficients more slowly from heap memory. When the
FMM is called, the multipole moments for each box in the octree are
updated with the provided charges. We multithread this loop over
boxes, providing it with the “guided” OpenMP threading schedule to
account somewhat for load imbalance in the boxes. The FMM can
then proceed to compute the electrostatic potential for each particle.
We multithread the loop over the particle tree traversals, which
must be done once for each discretization point, and each particle
accumulates its potential into the shared memory vector that stores
the result of the matrix-vector product. For t OpenMP threads, the
ideal scaling for each CG iteration is O[(N log N)/t].

We reserve distributed MPI parallelism for a different purpose.
The Cartesian space of the surface grid is partitioned into separate
regions and each region is owned by a single MPI rank that is
responsible for storing the surface grid data (Cartesian coordinates,
normal vectors, charges, etc.) for that region. In so doing, the
memory storage for the surface grid is distributed, scaling roughly as
$O(N/p)$ for $p$ MPI ranks, assuming an even load balance of the grid
points. The regions can be determined in a number of ways, either
automatically or fixed ahead of the calculation. The boundaries of
the regions should not overlap, as this may degrade the accuracy of
the FMM. The reason for this restriction is that we let each MPI rank
build a distinct FMM tree for its region. This provides parallelization
over the number of regions, which is in addition to the tree traversal
parallelization provided by the OpenMP multithreading.
Because the surface grid is distributed, a given grid point is
available to only one MPI rank at first. In order to compute electrostatic
interactions between grid points belonging to different MPI
ranks, it is therefore necessary to communicate grid information
between MPI ranks. To do so, we establish a communication ring
for all MPI ranks, wherein each rank has a neighbor rank to the
“left” and also one to the “right,” forming a closed circle. To carry out
the distributed FMM, we first let each MPI rank compute in parallel
its local electrostatic interactions (i.e., interactions with the grid
points that comprise the given MPI rank’s FMM tree). Next, each MPI
rank sends its list of grid-point information and the corresponding
(incomplete) electrostatic potential vector to its neighboring MPI
rank to the right, while simultaneously each MPI rank receives
incoming grid and potential information from its neighbor to the
left. The incoming grid points, which are only stored temporarily,
are then allowed to traverse the MPI rank’s FMM tree, accumulating
the interactions in the incoming electrostatic potential vector. After
tree traversal has been performed for all temporary grid points,
the grid information and potential are again passed along to the
neighboring MPI rank. This compute-and-pass procedure continues
until the grid information makes a complete cycle around the ring,
which takes p steps of communication. Upon completion of the cycle,
the electrostatic potential that has been passed around will have
traversed the FMM tree of each MPI rank, and it will have returned
to the MPI rank to which it belongs. This communication pattern can
benefit from using non-blocking MPI sends and receives, allowing
computation and communication to overlap to some extent.
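The compute-and-pass cycle described above can be illustrated with a serial toy model. The following Python sketch is not the implementation discussed in this chapter: it simulates the p ranks inside one process, replaces each rank's FMM tree traversal with a direct Coulomb sum, and uses hypothetical names (`local_contribution`, `ring_potential`). It is meant only to show that after p compute-and-pass steps each potential vector has visited every rank and is back with its owner.

```python
import math

def local_contribution(targets, sources):
    """Potential at each target due to one rank's local sources.

    A direct Coulomb sum stands in for traversal of that rank's FMM tree.
    targets: list of (x, y, z); sources: list of (x, y, z, q).
    """
    result = []
    for t in targets:
        v = 0.0
        for (sx, sy, sz, q) in sources:
            r = math.dist(t, (sx, sy, sz))
            if r > 0.0:          # skip the self-interaction
                v += q / r
        result.append(v)
    return result

def ring_potential(blocks):
    """Simulate the ring; blocks[k] holds the grid points owned by rank k."""
    p = len(blocks)
    # Each rank starts out holding the packet built from its own grid points.
    packets = [
        {"owner": k,
         "points": [pt[:3] for pt in blocks[k]],
         "v": [0.0] * len(blocks[k])}
        for k in range(p)
    ]
    for _ in range(p):
        # Compute: rank k adds its local sources to whatever packet it holds.
        for k in range(p):
            pkt = packets[k]
            extra = local_contribution(pkt["points"], blocks[k])
            pkt["v"] = [a + b for a, b in zip(pkt["v"], extra)]
        # Pass: every rank sends to the right and receives from the left.
        packets = [packets[(k - 1) % p] for k in range(p)]
    return packets
```

In a real MPI code the "pass" step would be a pair of send and receive calls per rank, and only the packet contents (grid information plus the partial potential) would travel around the ring.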
A further optimization that is possible within the distributed
FMM scheme is to impose a certain cutoff distance for interactions
between grid points belonging to different MPI ranks. When a grid
point from rank $p_i$ interacts with those from rank $p_j$, the MAC
criteria are tested for $p_i$'s grid point against the level-zero box from
$p_j$'s FMM tree. So long as $p_i$ and $p_j$ are sufficiently distant, the MAC
will always be met for points in $p_i$, and there is no need for these
points to traverse $p_j$'s octree. The criterion to let all of $p_i$'s grid
points interact with the level-zero multipole expansion of $p_j$'s grid
is
$$ r_{ij} - R_{i,\mathrm{box}} - R_{j,\mathrm{box}} \geq R_{\text{cut-box}} , \tag{11.50} $$
where $r_{ij}$ is the distance between the centroids of trees $p_i$ and
$p_j$, $R_{i,\mathrm{box}}$ is the radius of $p_i$'s level-zero box (and likewise
$R_{j,\mathrm{box}}$ for $p_j$), and $R_{\text{cut-box}}$ is a pre-determined
cutoff. If the inequality in Eq. (11.50) is satisfied, then
we only compute level-zero multipoles for $p_j$'s grid points, which
affords some savings.
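The separation test of Eq. (11.50) amounts to a single comparison per pair of ranks. A minimal Python sketch follows; the function name and argument layout are illustrative assumptions, not part of the implementation described here.

```python
import math

def use_level_zero_only(centroid_i, radius_i, centroid_j, radius_j, r_cut_box):
    """Return True when Eq. (11.50) holds for the level-zero boxes of two ranks.

    centroid_*: (x, y, z) centroid of each rank's tree
    radius_*:   radius of each rank's level-zero box
    r_cut_box:  pre-determined cutoff (same length unit as the coordinates)
    """
    r_ij = math.dist(centroid_i, centroid_j)
    return r_ij - radius_i - radius_j >= r_cut_box

# Two boxes of radius 5 whose centroids are 30 apart, tested with a cutoff of 10.
print(use_level_zero_only((0, 0, 0), 5, (30, 0, 0), 5, 10))
```

When the function returns True, the points of rank i would interact with the single level-zero expansion of rank j instead of descending into its octree.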
In the distributed FMM scheme, it is also beneficial to use the
same FMM trees to compute the electrostatic potential on the
surface grid. This involves a procedure similar to that described
above, in which all solute point charges traverse the distributed
surface grid. In our implementation, we let each MPI rank store all
atomic coordinates and related information, so that each MPI rank
can independently compute its portion of v in Eq. (11.22), without
communication. Allowing all MPI ranks to store the global atomic
coordinates and charges is usually feasible because there are far
fewer atoms than surface grid points.
What we have described above applies only to solute point
charges, whereas an electron density must be treated differently.
A simple procedure for the latter is to let the QM charge density
interact explicitly (rather than through the distributed FMM trees)
with all surface grid points within a certain pre-determined cutoff
distance, which maintains a nearly constant amount of CPU time
for a fixed QM region. The electrostatic potential thus computed
can then be communicated as needed to the appropriate MPI ranks.
For QM/MM/PCM jobs, this means that the FMM trees are used
to compute interactions between the surface charges themselves,
and between the surface charges and the MM charges, but not for
interactions that involve the QM region. While this approach clearly
could be improved, we often find that the purely classical steps are
the bottlenecks when the MM region is large.

11.4.2.4 Surface construction strategies


The distributed FMM scheme can also be used to accelerate
construction of the PCM surface grid. Note that SWIG discretization
requires the evaluation of $O(N_{\mathrm{atoms}}^2 N_{\mathrm{Lebedev}})$ switching functions,
although this number could be reduced somewhat by atom–atom
pairwise distance cutoffs. The FMM octree spatial partitioning
provides further acceleration, by constructing an octree for all the
atoms of a given solute, similar to what was described above for the
surface grid electrostatic interactions. Then, each atom traverses the
octree using a switching function acceptance criterion (instead of a
MAC) to determine if the tree-traversing atom needs to compute its
switching function with the atoms of a neighboring octree box or not.
The switching function acceptance criterion that we use is

$$ r_{ic} - R_{c,\mathrm{box}} \leq R_{\text{cut-switch}} , \tag{11.51} $$

where $R_{\text{cut-switch}}$ is a pre-determined cutoff distance, selected so
that atom–atom distances larger than this cutoff will not alter the
switching function for the $i$th atom's grid points. The quantity $r_{ic}$
is the distance from the $i$th tree-traversing atom to the center of
the $c$th octree box, whose radius is $R_{c,\mathrm{box}}$. Only if the inequality
in Eq. (11.51) is satisfied do we compute the explicit atom–atom
pairwise switching functions. This procedure scales roughly as
$O(N_{\mathrm{Lebedev}} N_{\mathrm{atoms}} \log N_{\mathrm{atoms}})$ and can afford significant savings for
macromolecular solutes.
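The traversal governed by Eq. (11.51) can be sketched with a small recursive octree. The Python code below is a simplified illustration under stated assumptions: the class and function names are hypothetical, the box "radius" is taken to be that of the sphere enclosing the cube, and the leaf size is arbitrary. It returns the candidate partner atoms for which pairwise switching functions would still need to be evaluated.

```python
import math

class Box:
    """Cubic octree box; `radius` is the radius of the sphere enclosing the cube."""
    def __init__(self, center, half_width, atoms, leaf_size=4):
        self.center = center
        self.radius = half_width * math.sqrt(3.0)
        self.atoms = atoms            # list of (index, (x, y, z))
        self.children = []
        if len(atoms) > leaf_size and half_width > 1e-6:
            h = half_width / 2.0
            octants = {}
            for idx, pos in atoms:
                key = tuple(pos[d] >= center[d] for d in range(3))
                octants.setdefault(key, []).append((idx, pos))
            for key, members in octants.items():
                child_center = tuple(
                    center[d] + (h if key[d] else -h) for d in range(3))
                self.children.append(Box(child_center, h, members, leaf_size))

def build_octree(coords, leaf_size=4):
    """Build the level-zero box around all atoms and subdivide recursively."""
    lo = [min(c[d] for c in coords) for d in range(3)]
    hi = [max(c[d] for c in coords) for d in range(3)]
    center = tuple(0.5 * (lo[d] + hi[d]) for d in range(3))
    half_width = 0.5 * max(hi[d] - lo[d] for d in range(3)) + 1e-9
    return Box(center, half_width, list(enumerate(coords)), leaf_size)

def candidate_partners(box, pos, r_cut_switch, found):
    """Collect atoms in boxes that satisfy r_ic - R_c,box <= R_cut-switch."""
    r_ic = math.dist(pos, box.center)
    if r_ic - box.radius > r_cut_switch:
        return found                  # Eq. (11.51) violated: skip the whole box
    if box.children:
        for child in box.children:
            candidate_partners(child, pos, r_cut_switch, found)
    else:
        found.extend(idx for idx, _ in box.atoms)
    return found
```

Because whole boxes are rejected with one distance evaluation, distant atoms are never visited individually, which is the origin of the logarithmic factor quoted above.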
Furthermore, this octree switching function procedure can
be parallelized within the distributed surface scheme. In our
implementation, each MPI rank is assigned a set of atoms whose
switching functions it will compute. Each MPI rank independently
constructs the switching function octree from the entire global set of
solute atoms (stored in each MPI rank) and then the chosen subset
of atoms for each MPI rank traverses the switching-function octree.
In the end, each MPI rank will have evaluated and constructed the
PCM surface grid for its set of atoms. It is this surface, then, from
which the surface octree is constructed and used in the distributed
FMM scheme. This is a preliminary implementation and there are
undoubtedly load-balance issues associated with this approach. In
a globular solute, for example, one MPI rank may be assigned
only interior atoms and end up with no surface grid because the
switching functions all evaluate to zero. A load-balancing scheme
that can dynamically respond to protein conformational changes
would be preferable, but remains to be developed.
Alternatively, one could abandon the switching functions of
SWIG discretization and construct a smooth isodensity surface,
either using the actual electron density [12, 19, 30] (in QM/
PCM calculations) or else some pseudo-density, as discussed in
Section 11.2.2.2. A pseudo-density isosurface can be constructed
using the marching cubes algorithm [52], which is trivially paral-
lelizable by multithreading the “marching” loop over all cubes and
partitioning the array of cubes across MPI ranks. We perform this
partition before the calculation begins, by assigning a number of
MPI ranks to each of the x, y, and z Cartesian dimensions. As with
the SWIG octree approach, this procedure is vulnerable to poor load
balance if some MPI process receives a set of solvent-inaccessible
atoms. In practice, we are often able to achieve reasonably good load
balance by examining the geometry of the solute and assigning a
greater number of MPI processes to the larger Cartesian dimensions,
but for MD applications a dynamical load-balancing scheme is
probably required. As with the switching function octree, the
resulting distributed surface grid is reused in the FMM scheme for
solving the PCM equations.
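The static partition of the marching-cubes array can be illustrated with a simple heuristic. The text above describes choosing the number of MPI processes per Cartesian dimension by inspecting the solute geometry; the Python sketch below is one possible automated stand-in (an assumption, not the procedure used here) that enumerates all factorizations of the rank count and keeps the one whose sub-blocks are closest to cubic. It does not address the load imbalance caused by solvent-inaccessible regions.

```python
def choose_rank_grid(extents, n_ranks):
    """Pick (px, py, pz) with px*py*pz == n_ranks giving the most cube-like sub-blocks.

    extents: (lx, ly, lz) edge lengths of the box enclosing the solute (all > 0).
    """
    lx, ly, lz = extents
    best, best_score = None, float("inf")
    for px in range(1, n_ranks + 1):
        if n_ranks % px:
            continue
        rest = n_ranks // px
        for py in range(1, rest + 1):
            if rest % py:
                continue
            pz = rest // py
            edges = (lx / px, ly / py, lz / pz)
            score = max(edges) / min(edges)   # 1.0 means perfectly cubic sub-blocks
            if score < best_score:
                best, best_score = (px, py, pz), score
    return best

# An elongated 80 x 40 x 20 box split over 8 ranks gets more ranks along x.
print(choose_rank_grid((80.0, 40.0, 20.0), 8))
```

Larger Cartesian dimensions receive more ranks, in the spirit of the manual assignment described above.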

11.4.2.5 Scalability tests


We next consider some examples to demonstrate the scalability of
the algorithms described above, focusing on AMBER99/C-PCM jobs.
We set $\varepsilon = 78.4$ and (for DESMO calculations) $\kappa^{-1} = 3.0$ Å,
which equates to a fairly large ionic strength of about 1 mol/L for
water at 25 °C. The solute cavity is constructed as a pseudo-density
isosurface (see Section 11.2.2), with B = 2.5 Å as in Ref. [88]. Our
implementation of this cavity construction uses Gaussian blurring
(Section 11.4.1) to avoid numerical issues related to Coulomb
singularities. As such, a set of Gaussian widths is required. We
determine these by minimizing the error in the Born ion solvation
energy for a spherical cavity of radius 2 Å, at various grid resolutions.
The width parameter $\zeta$ obtained for each grid resolution is listed in
Table 11.3, and the Gaussian width parameters $\zeta_i$ in Eq. (11.42) are
taken to be $\zeta_i = \zeta a_i^{-1/2}$. A marching cubes grid resolution of 0.4 Å
was employed in all calculations.

Table 11.3 Dimensionless width parameters for pseudo-density isosurfaces.

Grid resolution/Å$^a$    $\zeta$$^b$
≥ 0.4                    5.9
0.3                      5.5
0.2                      5.2
0.1                      4.9

$^a$ Marching cubes grid resolution.
$^b$ Gaussian widths $\zeta_i$ in Eq. (11.42) are given by $\zeta_i = \zeta a_i^{-1/2}$.
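The correspondence between the Debye length and the ionic strength quoted above can be checked with the standard Debye–Hückel expression, $\kappa^2 = 2 N_A e^2 I / (\varepsilon_0 \varepsilon k_B T)$. The short Python sketch below is a worked check, not code from the implementation; the function name is hypothetical and the constants are CODATA values in SI units.

```python
import math

def debye_length_angstrom(ionic_strength_molar, eps_r=78.4, temperature=298.15):
    """Debye length (in Angstrom) for a given ionic strength in mol/L."""
    eps0 = 8.8541878128e-12   # vacuum permittivity, F/m
    k_b = 1.380649e-23        # Boltzmann constant, J/K
    n_a = 6.02214076e23       # Avogadro constant, 1/mol
    e = 1.602176634e-19       # elementary charge, C
    ionic_strength_si = ionic_strength_molar * 1000.0   # mol/L -> mol/m^3
    kappa_sq = (2.0 * n_a * e**2 * ionic_strength_si
                / (eps0 * eps_r * k_b * temperature))
    return 1.0e10 / math.sqrt(kappa_sq)                 # m -> Angstrom

print(f"{debye_length_angstrom(1.0):.2f} Angstrom")
```

For an ionic strength of 1 mol/L in water at 25 °C this evaluates to roughly 3.04 Å, consistent with the value of 3.0 Å used in the tests.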
Parameters for the CG-FMM algorithm were selected based on
test calculations for (alanine)$_{20}$, in order to obtain a solvation
energy that is within 1 kcal/mol of that obtained by explicit matrix
inversion. Multipoles up to $\ell = 5$ were included for computing
interactions between the surface charges, using $R_{\mathrm{thresh}} = 2.0$ Å,
$N_{\mathrm{thresh}} = 200$, and $\theta_{\mathrm{MAC}} = 0.7$. For interactions between the solute
charges and the surface charges, multipoles up to $\ell = 4$ were
included, with $R_{\mathrm{thresh}} = 2.0$ Å, $N_{\mathrm{thresh}} = 50$, and $\theta_{\mathrm{MAC}} = 0.5$. The CG
algorithm was considered converged when the maximum residual
fell below a threshold of $10^{-3}$. All calculations were performed on
a cluster of 12-core HP Intel Xeon X5650 processors with 48 GB of
RAM per node, using a locally modified version of Q-CHEM [39].
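The convergence test on the maximum residual can be made concrete with a matrix-free conjugate gradient loop. The Python sketch below is a generic textbook CG under stated assumptions: the function names are hypothetical, the linear system is assumed symmetric positive definite, and the `matvec` callable (here a small dense product) stands in for the FMM-accelerated matrix-vector product used in the actual solver.

```python
def conjugate_gradient(matvec, b, tol=1e-3, max_iter=200):
    """Solve A x = b given only a function that applies A to a vector.

    Iteration stops when the largest absolute residual component is below tol.
    Returns the solution estimate and the number of iterations performed.
    """
    x = [0.0] * len(b)
    r = list(b)                       # residual for the initial guess x = 0
    d = list(r)                       # first search direction
    rr = sum(ri * ri for ri in r)
    for iteration in range(max_iter):
        if max(abs(ri) for ri in r) < tol:
            return x, iteration
        a_d = matvec(d)
        alpha = rr / sum(di * adi for di, adi in zip(d, a_d))
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, a_d)]
        rr_new = sum(ri * ri for ri in r)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return x, max_iter

# Small symmetric positive-definite example; the dense product is a placeholder.
A = [[4.0, 1.0], [1.0, 3.0]]
dense_matvec = lambda v: [sum(a * vi for a, vi in zip(row, v)) for row in A]
solution, n_iter = conjugate_gradient(dense_matvec, [1.0, 2.0])
```

Because the solver only needs the action of the matrix on a vector, the matrix itself never has to be stored, which is what allows the memory footprint to stay linear in the number of grid points.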
A quasi-linear solute is a best-case scenario for scalability, so
we first examine unfolded alanine polypeptides, (Ala)$_n$. Table 11.4
is a strong-scaling analysis for a fixed solute size, (Ala)$_{250}$, with
a surface grid consisting of ≈350,000 points, well beyond the
feasible memory limits for matrix inversion. The multithreaded
CG-FMM approach scales quite well across all 12 cores of one
node, with a parallel efficiency of 73% that greatly exceeds that of
recent multi-threaded FMM algorithms [11, 66]. However, the use
of additional nodes at 12 cores/node scales only moderately well
for a few additional nodes, and leads to diminishing returns as the
amount of work/node becomes small and communication becomes
a significant fraction of cost. Nevertheless, this fairly significant
single-point calculation can be performed in just 10 seconds on
2 × 12 cores, with 70% parallel efficiency.

Table 11.4 Strong-scaling data for CG-FMM applied to (Ala)$_{250}$.$^a$

Nodes  Threads  Cores  Wall time/sec  Parallel efficiency
1      1        1      171.0          1.00
1      2        2      88.0           0.97
1      4        4      46.0           0.93
1      8        8      26.4           0.81
1      12       12     19.5           0.73
2      12       24     10.2           0.70
4      12       48     6.9            0.52
8      12       96     4.1            0.43
16     12       192    2.9            0.31

$^a$ Surface grid consists of 349,797 points.
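The parallel efficiencies reported for the strong-scaling test follow from the wall times as $T_1 / (p\,T_p)$, where $T_1$ is the single-core time and $T_p$ the time on $p$ cores. The short Python sketch below reproduces them from the Table 11.4 timings; the function name is illustrative.

```python
def strong_scaling_efficiency(t_serial, t_parallel, n_cores):
    """Parallel efficiency for a fixed problem size: T_1 / (p * T_p)."""
    return t_serial / (n_cores * t_parallel)

# (cores, wall time in seconds) taken from Table 11.4
table_11_4 = [(1, 171.0), (2, 88.0), (4, 46.0), (8, 26.4), (12, 19.5),
              (24, 10.2), (48, 6.9), (96, 4.1), (192, 2.9)]

t1 = table_11_4[0][1]
for cores, wall in table_11_4:
    eff = strong_scaling_efficiency(t1, wall, cores)
    print(f"{cores:4d} cores  efficiency = {eff:.2f}")
```

For the weak-scaling data, where the problem size grows with the number of nodes, the analogous measure is simply the ratio of the single-node time to the multi-node time.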
Next we investigate weak-scaling parallelism with (Ala)$_n$ polymers,
increasing $n$ in proportion to the number of MPI ranks (Table 11.5).
Although the parallel efficiency is not great, calculations
with several million grid points can be performed in less than
a minute for systems as large as (Ala)$_{4000}$. The calculations in
Table 11.5 represent the largest PCM calculations of which we are
aware.

Table 11.5 Weak-scaling data for CG-FMM applied to (Ala)$_n$.$^a$

Nodes  Cores  Wall time/sec  Parallel efficiency  n     No. grid points
1      12     19.5           1.00                 250   349,797
2      24     23.8           0.82                 500   698,589
4      48     32.4           0.60                 1000  1,397,704
8      96     38.7           0.50                 2000  2,793,018
16     192    46.2           0.42                 4000  5,583,607

$^a$ All calculations use 12 cores/node, and the parallel efficiency is defined relative to the
single-node performance.

Table 11.6 Strong-scaling data for the protein 1LXL.$^a$

                       COSMO                                DESMO
Nodes  Threads  Cores  Wall time/sec  Parallel efficiency  Wall time/sec  Parallel efficiency
1      1        1      344.1          1.00                 351.7          1.00
1      2        2      175.6          0.98                 180.0          0.98
1      4        4      90.6           0.95                 92.9           0.95
1      8        8      50.9           0.85                 52.1           0.84
1      12       12     37.4           0.77                 37.9           0.77
2      12       24     25.9           0.55                 25.0           0.59
4      12       48     15.4           0.47                 15.7           0.47
8      12       96     11.2           0.32                 12.1           0.30
16     12       192    11.0           0.17                 11.2           0.16

$^a$ Surface grid consists of 285,446 points.
Finally, we present strong-scaling tests for an irregularly shaped
protein (PDB code 1LXL) in Table 11.6, using both C-PCM and
DESMO. As with the quasi-linear alanine chains, scaling remains
good across one node but drops noticeably across multiple nodes.
Note, however, that the DESMO method, which incorporates salt
effects, incurs negligible overhead as compared to C-PCM. The extra
overhead in a DESMO MM/PCM calculation is simply the need to
compute the screened electrostatic potential on the surface grid one
time, and this can be accomplished using an adaptation of the FMM
algorithm of Krasny and co-workers [50].
Clearly, it is desirable to improve the MPI aspect of the
parallelization strategy, which is presently bottlenecked by com-
munication of grid information around the ring of MPI ranks. One
possible way to accomplish this would be to define neighboring MPI
ranks based on their FMM boxes. Non-neighbors could either be
ignored beyond some cutoff, or (preferably) they could broadcast
only their FMM level-zero multipoles, rather than the larger quantity
of grid information that is passed in our present implementation. In
this modified algorithm, grid information needs only to be passed
between neighboring MPI ranks, rather than the full ring, which may
lead to a quantity of communication that is nearly constant with
respect to system size.
Alternatively, one could introduce another layer of parallelism
between the distributed-grid MPI ranks and the tree-traversing
OpenMP threads. In this layer, one would allow a certain number
of MPI ranks to perform tree-traversal, with each of them possessing a
copy of the grid information that it needs. In this master/slave setup,
each MPI master rank builds the appropriate portion of the grid (as
every single MPI rank does in our present implementation), then
passes the grid information to its set of MPI slaves. In addition, each
of the MPI slaves would also exploit OpenMP multi-threading (as
in our current implementation) to assist with tree traversal. Such
an algorithm does not directly reduce the communication problem,
but by sub-dividing the work of tree-traversal this approach allows
the use of larger boxes at the level of the “master” MPI ranks. This
should reduce the communication in an indirect way, since fewer
master ranks will be required, and these are the only ones that must
communicate grid information around the ring.

11.5 Summary and Future Directions

The prospects for the use of PCMs in macromolecular electrostatics
calculations seem bright. The accuracy is (and theoretically
speaking, should be) comparable to that achievable using finite-
difference solution of Poisson’s equation [44], but the computational
cost is greatly reduced since only the molecular cavity surface,
and not the whole of three-dimensional space, need be discretized.
Problems with discontinuous forces are entirely eliminated by
recently developed smooth discretization schemes [41, 42, 70, 76,
87]. Reported here for the first time is our multithreaded OpenMP
implementation of a conjugate gradient/fast multipole PCM solver,
whose cost is O(N) in memory and O(N log N) in CPU time. This
approach shows good scalability across all 12 cores of one node,
with a parallel efficiency exceeding that of other multi-threaded
FMM algorithms, although the present implementation does not
scale well beyond one or two nodes.
To improve the accuracy of implicit-solvent potential energy
surfaces, non-electrostatic interactions must be included, although
such interactions have received only a brief mention here. The
smooth, linear-scaling PCM technology that is discussed here is
immediately ready for use in MM/PBSA applications [36, 38], as
a replacement for finite-difference electrostatics. Other formulas
for the non-electrostatic interactions [47] can also be used in PCM
calculations, possibly after some re-parameterization. In general
these non-electrostatic interaction formulas depend in some way
on the cavity surface area, which is smooth and easily calculable by
means of the PCM algorithms discussed herein.
Particular attention should be paid to the DESMO method
[43], as this model appears to be suitable for use with solvents
containing dissolved ions that are described by the linearized
Poisson–Boltzmann equation. DESMO shows promising accuracy
with respect to benchmark LPBE calculations. This includes an
analytically solvable model problem consisting of multiple solute
cavities, as would be encountered in a study of protein–protein
interactions in implicit solvent.
Finally, PCMs are useful for creating a data set of perfect radii
and effective pairwise Coulomb interactions that can be used to
parameterize novel generalized Born models. Several improved GB
models, having slightly better accuracy and significantly lower cost,
have been suggested based on comparisons to PCM benchmarks
[45]. These new GB models are ready to be “dropped in” to existing
MD codes. Comparison to DESMO suggests new ways to incorporate
salt effects into GB models [46], which warrant further exploration
as well.

Acknowledgments

The authors’ work on PCMs has been supported by the National
Science Foundation (grant nos. CHE-0748448 and CHE-1300603, to
J.M.H.), through an Ohio State Presidential Fellowship (to A.W.L.),
and through generous allocations of computing time from the Ohio
Supercomputer Center (project nos. PAS-0291 and PAA-0003).
References

1. Ahuir, J. L. P., and Silla, E. (1990). GEPOL: An improved description of
molecular surfaces. I. Building the spherical surface set, J. Comput. Chem.
11, pp. 1047–1060.
2. Amovilli, C., and Mennucci, B. (1997). Self-consistent-field calculation
of Pauli repulsion and dispersion contributions to the solvation free
energy in the polarizable continuum model, J. Phys. Chem. B 101, pp.
1051–1057.
3. Bachs, M., Luque, F. J., and Orozco, M. (1994). Optimization of
solute cavities and van der Waals parameters in ab initio MST-SCRF
calculations of neutral molecules, J. Comput. Chem. 15, pp. 446–454.
4. Baker, N. A. (2004). Poisson–Boltzmann methods for biomolecular
electrostatics, Method. Enzymol. 383, pp. 94–118.
5. Baker, N. A., Sept, D., Joseph, S., Holst, M. J., and McCammon, J. A.
(2001). Electrostatics of nanosystems: Application to microtubules and
the ribosome, Proc. Natl. Acad. Sci. USA 98, pp. 10037–10041.
6. Barone, V., Cossi, M., and Tomasi, J. (1997). A new definition of cavities
for the computation of solvation free energies by the polarizable
continuum model, J. Chem. Phys. 107, pp. 3210–3221.
7. Bottcher, C. J. (1976). Theory of Electric Polarization, 2nd ed. (Elsevier).
8. Cancès, E., and Mennucci, B. (1998). New applications of integral
equations methods for solvation continuum models: Ionic solutions and
liquid crystals, J. Math. Chem. 23, pp. 309–326.
9. Cancès, E., and Mennucci, B. (2001). Comment on “Reaction field
treatment of charge penetration” [J. Chem. Phys. 112, 5558 (2000)],
J. Chem. Phys. 114, pp. 4744–4745.
10. Cancès, E., Mennucci, B., and Tomasi, J. (1997). A new integral
equation formalism for the polarizable continuum model: Theoretical
background and applications to isotropic and anisotropic dielectrics,
J. Chem. Phys. 107, pp. 3032–3041.
11. Chau, N. H. (2013). Parallelization of the fast multipole method for
molecular dynamics simulations on multicore computers, in Advanced
Computational Methods for Knowledge Engineering, Studies in Computa-
tional Intelligence, vol. 479 (Springer), pp. 209–224.
12. Chen, F., and Chipman, D. M. (2003). Boundary element methods for
dielectric cavity construction and integration, J. Chem. Phys. 119, pp.
10289–10297.
13. Chipman, D. M. (1997). Charge penetration in dielectric models of
solvation, J. Chem. Phys. 106, pp. 10194–10206.
14. Chipman, D. M. (1999). Simulation of volume polarization in reaction
field theory, J. Chem. Phys. 110, pp. 8012–8018.
15. Chipman, D. M. (2000). Reaction field treatment of charge penetration,
J. Chem. Phys. 112, pp. 5558–5565.
16. Chipman, D. M. (2002). Comparison of solvent reaction field represen-
tations, Theor. Chem. Acc. 107, pp. 80–89.
17. Chipman, D. M. (2004). Solution of the linearized Poisson–Boltzmann
equation, J. Chem. Phys. 120, pp. 5566–5575.
18. Chipman, D. M. (2006). New formulation and implementation for
volume polarization in dielectric continuum theory, J. Chem. Phys. 124,
pp. 224111:1–10.
19. Chipman, D. M., and Dupuis, M. (2002). Implementation of solvent
reaction fields for electronic structure, Theor. Chem. Acc. 107, pp. 90–
102.
20. Colominas, C., Luque, F. J., Teixidó, J., and Orozco, M. (1999). Cavitation
contribution to the free energy of solvation. Comparison of different
formalisms in the context of MST calculations, Chem. Phys. 240, pp. 253–
264.
21. Connolly, M. L. (1983). Solvent-accessible surfaces of proteins and
nucleic acids, Science 221, pp. 709–713.
22. Cossi, M., Barone, V., Mennucci, B., and Tomasi, J. (1998). Ab initio study
of ionic solutions by a polarizable continuum dielectric model, Chem.
Phys. Lett. 286, pp. 253–260.
23. Cossi, M., Mennucci, B., and Cammi, R. (1996). Analytical first derivatives
of molecular surfaces with respect to nuclear coordinates, J. Comput.
Chem. 17, pp. 57–73.
24. Cossi, M., Rega, N., Scalmani, G., and Barone, V. (2001). Polarizable
dielectric continuum model of solvation with inclusion of charge
penetration effects, J. Chem. Phys. 114, pp. 5691–5701.
25. Cossi, M., Rega, N., Scalmani, G., and Barone, V. (2003). Energies,
structures, and electronic properties of molecules in solution with the
C-PCM solvation model, J. Comput. Chem. 24, pp. 669–681.
26. Cossi, M., Scalmani, G., Rega, N., and Barone, V. (2002). New develop-
ments in the polarizable continuum model for quantum mechanical and
classical calculations on molecules in solution, J. Chem. Phys. 117, pp.
43–54.
27. Cramer, C. J., and Truhlar, D. G. (2008). A universal approach to solvation
modeling, Acc. Chem. Res. 41, pp. 760–768.
28. Debye, P., and Hückel, E. (1954). On the theory of electrolytes. I. Freezing
point depression and related phenomena, in Collected Papers of Peter J.
W. Debye (Interscience Publishers, Inc.), pp. 217–263.
29. Ferrighi, L., Frediani, L., Fossgaard, E., and Ruud, K. (2006). Paralleliza-
tion of the integral equation formulation of the polarizable continuum
model for higher-order response functions, J. Chem. Phys. 125, p.
154112.
30. Foresman, J. B., Keith, T. A., Wiberg, K. B., Snoonian, J., and Frisch,
M. J. (1996). Solvent effects 5. Influence of cavity shape, truncation
of electrostatics, and electron correlation on ab initio reaction field
calculations, J. Phys. Chem. 100, pp. 16098–16104.
31. Ginovska, B., Camaioni, D. M., Dupuis, M., Schwerdtfeger, C. A., and
Gil, Q. (2008). Charge-dependent cavity radii for an accurate dielectric
continuum model of solvation with emphasis on ions: Aqueous
solutes with oxo, hydroxo, amino, methyl, chloro, bromo, and fluoro
functionalities, J. Phys. Chem. A 112, pp. 10604–10613.
32. Greengard, L., and Rokhlin, V. (1987). A fast algorithm for particle
simulations, J. Comput. Phys. 73, pp. 325–348.
33. Grochowski, P., and Trylska, J. (2008). Continuum molecular electro-
statics, salt effects, and counterion binding—A review of the Poisson–
Boltzmann theory and its modifications, Biopolymers 89, pp. 93–113.
34. Grycuk, T. (2003). Deficiency of the Coulomb-field approximation in the
generalized Born model: An improved formula for Born radii evaluation,
J. Chem. Phys. 119, pp. 4817–4826.
35. Holst, M. J., and Saied, F. (1995). Numerical solution of the nonlinear
Poisson–Boltzmann equation: Developing more robust and efficient
methods, J. Comput. Chem. 16, pp. 337–364.
36. Homeyer, N., and Gohlke, H. (2012). Free energy calculations by the
molecular mechanics Poisson–Boltzmann surface area method, Mol. Inf.
31, pp. 114–122.
37. Klamt, A., and Schüürmann, G. (1993). COSMO: A new approach
to dielectric screening in solvents with explicit expressions for the
screening energy and its gradient, J. Chem. Soc. Perkin Trans. 2, pp. 799–
805.
38. Kollman, P. A., Massova, I., Reyes, C., Kuhn, B., Huo, S., Chong, L., Lee, M.,
Lee, T., Duan, Y., Wang, W., Donini, O., Cieplak, P., Srinivasan, J., Case, D. A.,
and Cheatham III, T. E. (2000). Calculating structures and free energies
of complex molecules: Combining molecular mechanics and continuum
models, Acc. Chem. Res. 33, pp. 889–897.
39. Krylov, A. I., and Gill, P. M. W. (2013). Q-Chem: An engine for innovation,
WIREs Comput. Mol. Sci. 3, pp. 317–326.
40. Kurzak, J., and Pettitt, B. M. (2006). Fast multipole methods for particle
dynamics, Mol. Simul. 32, pp. 775–790.
41. Lange, A. W., and Herbert, J. M. (2010). Polarizable continuum reaction-
field solvation models affording smooth potential energy surfaces,
J. Phys. Chem. Lett. 1, pp. 556–561.
42. Lange, A. W., and Herbert, J. M. (2010). A smooth, non-singular, and
faithful discretization scheme for polarizable continuum models: The
switching/Gaussian approach, J. Chem. Phys. 133, pp. 244111:1–18.
43. Lange, A. W., and Herbert, J. M. (2011). A simple polarizable continuum
solvation model for electrolyte solutions, J. Chem. Phys. 134, pp.
204110:1–15.
44. Lange, A. W., and Herbert, J. M. (2011). Symmetric versus asymmetric
discretization of the integral equations in polarizable continuum
solvation models, Chem. Phys. Lett. 509, pp. 77–87.
45. Lange, A. W., and Herbert, J. M. (2012). Improving generalized Born
models by exploiting connections to polarizable continuum models. I.
An improved effective Coulomb operator, J. Chem. Theory Comput. 8, pp.
1999–2011.
46. Lange, A. W., and Herbert, J. M. (2012). Improving generalized Born
models by exploiting connections to polarizable continuum models. II.
Corrections for salt effects, J. Chem. Theory Comput. 8, pp. 4381–4392.
47. Lee, M. S., and Olson, M. A. (2013). Comparison of volume and
surface area nonpolar solvation free energy terms for implicit solvent
simulations, J. Chem. Phys. 139, pp. 044119:1–6.
48. Lee, M. S., Salsbury, Jr., F. R., and Brooks III, C. L. (2002). Novel
generalized Born methods, J. Chem. Phys. 116, pp. 10606–10614.
49. Li, H., and Jensen, J. H. (2004). Improving the efficiency and convergence
of geometry optimization with the polarizable continuum model: New
energy gradients and molecular surface tessellation, J. Comput. Chem.
25, pp. 1449–1462.
50. Li, P., Johnston, H., and Krasny, R. (2009). A Cartesian treecode for
screened Coulomb interactions, J. Comput. Phys. 228, pp. 3858–3868.
51. Lipparini, F., Scalmani, G., Mennucci, B., Cancès, E., Caricato, M., and
Frisch, M. J. (2010). A variational formulation of the polarizable
continuum model, J. Chem. Phys. 133, pp. 014106:1–11.
52. Lorensen, W. E., and Cline, H. E. (1987). Marching cubes: A high
resolution 3D surface construction algorithm, Comp. Graph. 21, pp. 163–
169.
53. Lotan, I., and Head-Gordon, T. (2006). An analytical electrostatic model
for salt screened interactions between multiple proteins, J. Chem. Theory
Comput. 2, pp. 541–555.
54. Lu, B. Z., Zhou, Y. C., Holst, M. J., and McCammon, J. A. (2008). Recent
progress in numerical methods for the Poisson–Boltzmann equation in
biophysical applications, Commun. Comput. Phys. 3, pp. 973–1009.
55. Marenich, A. V., Cramer, C. J., and Truhlar, D. G. (2009). Universal
solvation model based on solute electron density and on a continuum
model of the solvent defined by the bulk dielectric constant and atomic
surface tensions, J. Phys. Chem. B 113, pp. 6378–6396.
56. Mennucci, B. (2012). Polarizable continuum model, WIREs Comput. Mol.
Sci. 2, pp. 386–404.
57. Mennucci, B., and Cammi, R. (2003). Ab initio model to predict NMR
shielding tensors for solutes in liquid crystals, Int. J. Quantum Chem. 93,
pp. 121–130.
58. Mennucci, B., Cancès, E., and Tomasi, J. (1997). Evaluation of solvent
effects in isotropic and anisotropic dielectrics and in ionic solutions with
a unified integral equation method: Theoretical bases, computational
implementation, and numerical applications, J. Phys. Chem. B 101, pp.
10506–10517.
59. Mennucci, B., and Tomasi, J. (1997). Continuum solvation models: A
new approach to the problem of solute’s charge distribution and cavity
boundaries, J. Chem. Phys. 106, pp. 5151–5158.
60. Miertuš, S., Scrocco, E., and Tomasi, J. (1981). Electrostatic interaction
of a solute with a continuum. A direct utilization of ab initio molecular
potentials for the prevision of solvent effects, Chem. Phys. 55, pp. 117–
129.
61. Miertuš, S., and Tomasi, J. (1982). Approximate evaluations of the
electrostatic free energy and internal energy changes in solution
processes, Chem. Phys. 65, pp. 239–245.
62. Mongan, J., Svrcek-Seiler, W. A., and Onufriev, A. (2007). Analysis of
integral expressions for effective Born radii, J. Chem. Phys. 127, pp.
185101:1–10.
63. Onufriev, A. (2010). Continuum electrostatics solvent modeling with the
generalized Born model, in M. Feig (ed.), Modeling Solvent Environments:
Applications to Simulations of Biomolecules, Chapter 6 (Wiley-VCH,
Hoboken, NJ), pp. 127–165.
64. Onufriev, A., Case, D. A., and Bashford, D. (2002). Effective Born radii
in the generalized Born model approximation: The importance of being
perfect, J. Comput. Chem. 23, pp. 1297–1304.
65. Onufriev, A. V., and Sigalov, G. (2011). A strategy for reducing gross
errors in the generalized Born models of implicit solvation, J. Chem. Phys.
134, pp. 164104:1–15.
66. Pan, X. M., Pi, W. C., and Sheng, X. Q. (2011). An OpenMP parallelization
of the multilevel fast multipole algorithm, Prog. Electromag. Res. 112,
pp. 199–213.
67. Pomogaeva, A., and Chipman, D. M. (2014). Hydration energy from a
composite method for implicit representation of the solvent, J. Chem.
Theory Comput. 10, pp. 211–219.
68. Purisima, E. O., and Nilar, S. H. (1995). A simple yet accurate boundary
element method for continuum dielectric calculations, J. Comput. Chem.
16, pp. 681–689.
69. Scalmani, G., Barone, V., Kudin, K. N., Pomelli, C. S., Scuseria, G. E., and
Frisch, M. J. (2004). Achieving linear-scaling computational cost for the
polarizable continuum model of solvation, Theor. Chem. Acc. 111, pp.
90–100.
70. Scalmani, G., and Frisch, M. J. (2010). Continuous surface charge
polarizable continuum models of solvation. I. General formalism,
J. Chem. Phys. 132, pp. 114110:1–15.
71. Sigalov, G., Fenley, A., and Onufriev, A. (2006). Analytical electrostatics
for biomolecules: Beyond the generalized Born approximation, J. Chem.
Phys. 124, pp. 124902:1–14.
72. Sigalov, G., Scheffel, P., and Onufriev, A. (2005). Incorporating variable
dielectric environments into the generalized Born model, J. Chem. Phys.
122, pp. 094511:1–15.
73. Srinivasan, J., Trevathan, M. W., Beroza, P., and Case, D. A. (1999).
Application of a pairwise generalized Born model to proteins and
nucleic acids: Inclusion of salt effects, Theor. Chem. Acc. 101, pp. 426–
434.
74. Still, W. C., Tempczyk, A., Hawley, R. C., and Hendrickson, T. (1990).
Semianalytical treatment of solvation for molecular mechanics and
dynamics, J. Am. Chem. Soc. 112, pp. 6127–6129.
75. Su, P., and Li, H. (2009). Continuous and smooth potential energy surface
for conductor-like screening solvation model using fixed points with
variable areas, J. Chem. Phys. 130, pp. 074109:1–13.
January 29, 2016 11:32 PSP Book - 9in x 6in 11-Qiang-Cui-c11

76. Thellamurege, N. M., and Li, H. (2012). Note: FixSol solvation model and
FIXPVA2 tessellation scheme, J. Chem. Phys. 137, pp. 246101:1–2.
77. Tomasi, J., Mennucci, B., and Cammi, R. (2005). Quantum mechanical
continuum solvation models, Chem. Rev. 105, pp. 2999–3093.
78. Truong, T. N., Nguyen, U. N., and Stefanovich, E. V. (1996). Generalized
conductor-like screening model (GCOSMO) for solvation: An assessment
of its accuracy and applicability, Int. J. Quantum Chem. Symp. 60, pp.
1615–1622.
79. Tsui, V., and Case, D. A. (2001). Theory and applications of the
generalized Born solvation model in macromolecular simulations,
Biopolymers (Nucl. Acid Sci.) 56, pp. 275–291.
80. Wang, J., Cai, Q., Li, Z.-L., Zhao, H.-K., and Luo, R. (2009). Achieving energy
conservation in Poisson–Boltzmann molecular dynamics: Accuracy and
precision with finite-difference algorithms, Chem. Phys. Lett. 468, pp.
112–118.
81. Wang, J., Cai, Q., Xiang, Y., and Luo, R. (2012). Reducing grid dependence
in finite-difference Poisson–Boltzmann calculations, J. Chem. Theory
Comput. 8, pp. 2741–2751.
82. Wang, J., Tan, C., Chanco, E., and Luo, R. (2010). Quantitative analysis of
Poisson–Boltzmann implicit solvent in molecular dynamics, Phys. Chem.
Chem. Phys. 12, pp. 1194–1202.
83. Wang, J., Tan, C., Tan, Y.-H., Lu, Q., and Luo, R. (2008). Poisson–Boltzmann
solvents in molecular dynamics simulations, Commun. Comput. Phys. 3,
pp. 1010–1031.
84. Wang, Y., and Li, H. (2009). Smooth potential energy surface for
cavitation, dispersion and repulsion energies in continuum solvation
model, J. Chem. Phys. 131, pp. 206101:1–2.
85. Wangsness, R. K. (1986). Electromagnetic Fields, 2nd ed. (Wiley).
86. White, C. A., Johnson, B. G., Gill, P. M. W., and Head-Gordon, M. (1994).
The continuous fast multipole method, Chem. Phys. Lett. 230, pp. 8–16.
87. York, D. M., and Karplus, M. (1999). Smooth solvation potential based
on the conductor-like screening model, J. Phys. Chem. A 103, pp. 11060–
11079.
88. Yu, Z., Jacobson, M. P., and Friesner, R. (2005). What role do surfaces play
in GB models? A new-generation of surface-generalized Born model
based on a novel Gaussian surface for biomolecules, J. Comput. Chem.
27, pp. 72–89.
89. Zhan, C.-G., Bentley, J., and Chipman, D. M. (1998). Volume polarization
in reaction field theory, J. Chem. Phys. 108, pp. 177–192.
January 27, 2016 15:45 PSP Book - 9in x 6in 12-Qiang-Cui-c12

Chapter 12

Differential Geometry-Based Solvation and Electrolyte Transport Models for
Biomolecular Modeling: A Review

Guo Wei Wei^a and Nathan A. Baker^b

^a Department of Mathematics, Department of Biochemistry and Molecular Biology,
Michigan State University, MI 48824, USA

^b Computational and Statistical Analytics Division,
Pacific Northwest National Laboratory, Richland, WA 99352, USA


[email protected], [email protected]

12.1 Background

Solvation is an elementary process in nature and is particularly
essential to biology. Physically, the solvation process can be
described by a variety of interactions, such as electrostatic, dipolar,
induced dipolar, and van der Waals, between the solvent and solute.
Due to the ubiquitous nature of electrostatics and the aqueous
environment common to most biomolecular systems, molecular
solvation and electrostatics analyses are important to
research in chemistry, biophysics, and medicine. Such analyses can
be classified into two general types: (1) quantitative analysis for
thermodynamic or kinetic observables and (2) qualitative analysis
for general characteristics of biomolecular solvation.

Many-Body Effects and Electrostatics in Biomolecules
Edited by Qiang Cui, Markus Meuwly, and Pengyu Ren
Copyright © 2016 Pan Stanford Publishing Pte. Ltd.
ISBN 978-981-4613-92-7 (Hardcover), 978-981-4613-93-4 (eBook)
www.panstanford.com

In general, implicit solvent models describe the solvent as a
dielectric continuum, while the solute molecule is modeled with
an atomistic description [2–7]. There are many two-scale implicit
solvent models available for electrostatic analysis of solvation,
including generalized Born (GB) [8–18], polarizable continuum [19–
25] and Poisson–Boltzmann (PB) models [3, 4, 26–29]. GB methods
are fast heuristic models for approximating polar solvation energies.
PB methods can be formally derived from basic statistical mechanics
theories for electrolyte solutions [30–32] and therefore offer the
promise of robust models for computing the polar solvation energy
[9, 33, 34]. In many solvation analyses, the total solvation energy is
decomposed into polar and nonpolar contributions. Although there
are many ways to perform this decomposition, many approaches
model the nonpolar energy contributions in two stages: the work
of displacing solvent when adding a hard-sphere solute to solution
and the dispersive nonpolar interactions between the solute atoms
and surrounding solvent.
One of the primary quantitative applications of implicit solvent
methods in computational biology and chemistry research has
involved the calculation of thermodynamic properties. Implicit
solvent methods offer the advantage of “pre-equilibrating” the
solvent and mobile ions, thus effectively pre-computing the solvent
contribution to the configuration integral or partition function
for a system [6]. Such pre-equilibration is particularly evident in
molecular mechanics/Poisson–Boltzmann surface area (MM/PBSA)
models [35–39] that combine implicit solvent approaches with
molecular mechanics models to evaluate biomolecule–ligand bind-
ing free energies from an ensemble of biomolecular structures.
The calculation and assignment of protein titration states is
another important application of implicit solvent methodology
[40–51]. Such methods have been used to interpret
experimental titration curves, decompose residue contributions
to protein-protein and protein-ligand binding energetics, examine
structural/functional consequences of RNA nucleotide protonation,
as well as several other applications. Another application area
for implicit solvent methods is in the evaluation of biomolecular
dynamics, where implicit solvent models generally are used to
provide solvation forces for molecular Langevin dynamics [52–
57], Brownian dynamics [58–61], or continuum diffusion [62–66]
simulations. A major qualitative use of implicit solvent methods in
experimental work is the visualization and qualitative analysis of
electrostatic potentials on and around biomolecular surfaces [67–
70]. Visualization of electrostatic potentials was popularized by the
availability of software, such as Grasp [68], and is now a standard
procedure for analyzing biomolecular structures with thousands
of examples available in the literature, including ligand-receptor
binding and drug design, protein-nucleic acid complexes, protein-
protein interactions, macromolecular assembly, and enzymatic
mechanism analysis, among others. More complete descriptions of
the solvation process, solvation models, and various applications
of solvation methods also can be found in the literature [71–
73]. Typically, solvation models are tested against experimental
data for solvation free energies, titration and redox behaviors, or
spectroscopic measures of local electric fields. However, solvation
models can also provide insight into molecular properties which
cannot be directly measured experimentally, including solute
surface area and enclosed volume, electrostatic potential, and
nonpolar solvation behavior. The properties derived from solvation
models are used in a variety of applications, including pH and
pKa estimation, titration analysis, stability analysis, visualization,
docking, and drug and protein design. In addition, sophisticated
models for non-equilibrium processes, such as Brownian dynamics,
molecular dynamics, kinetic models, and multiscale models, may
have a solvation model as a basic component [74–76].

12.2 Differential Geometry-Based Solvation Models

Most implicit solvent models require a definition of the solvent
density and/or dielectric coefficient profile around the solute
molecule. Often, these definitions take the form of analytic functions
[18, 77, 78] or discrete boundary surfaces dividing the solute-
solvent regions of the problem domain. The van der Waals surface,
solvent accessible surface [79], and molecular surface (MS) [80]
are typically used for this purpose and have found many successful
applications in biomolecular modeling [81–88]. Physical properties
calculated from implicit solvent models are very sensitive to the
definition of the dielectric profile [89–92]; however, many of these
popular profile definitions are ad hoc divisions of the solute and
solvent regions of the problem domain based on assumptions about
molecular geometry rather than minimization of solute-solvent
energetic interactions.
Geometric analysis, which combines differential geometry (DG)
and differential equations, has had a tremendous impact in signal
and image processing, data analysis, surface construction [93–
100], and surface smoothing [101]. Geometric partial differential
equations (PDEs) [102], particularly mean curvature flows, are
popular tools in applied mathematics. Computational techniques
using the level set theory were devised by Osher and Sethian [99,
103, 104] and have been further developed and applied by many
others [105–107]. An alternative approach is to minimize the mean
curvature or energy functional of the hypersurface function in the
framework of the Mumford-Shah variational functional [108], and
the Euler–Lagrange formulation of surface variation developed by
Chan and co-workers, and others [104, 109–113]. Wei introduced
some of the first high-order geometric PDEs for image analysis
[114] and, with co-worker Jia, also presented the first geometric
PDE-based high-pass filters by coupling two nonlinear PDEs [115].
Recently, this approach has been extended to a more general
formalism, the PDE transform, for image and surface analysis [116–
118], including biomolecular surface generation [119].
Geometric PDEs and DG theories of surfaces provide a natural
and simple description for a solvent–solute interface. In 2005, Wei
and his collaborators, including Michael Feig, pioneered the use of
curvature-controlled PDEs for molecular surface construction and
solvation analysis [120]. In 2006, based on DG, Wei and co-workers
introduced the first variational solvent–solute interface: the minimal
molecular surface (MMS), for molecular surface representation
[121–123]. With a constant surface tension, the minimization of
surface free energy is equivalent to the minimization of surface
area, which can be implemented via the mean curvature flow,
or the Laplace–Beltrami flow, and gives rise to the MMS. The
MMS approach has been used to calculate both solvation energies
and electrostatics [1, 123]. Potential-driven geometric flows, which
admit non-curvature-driven terms, have also been proposed for
biomolecular surface construction [124]. While our approaches
were employed by many others [125–128] for molecular surface
analysis, our curvature-controlled PDEs and the geometric flow-
based MMS model proposed in 2005 [120, 121, 123, 124] are, to
our knowledge, the first of their kind for biomolecular surface and
electrostatics/solvation modeling.
Our DG theory of the solvent–solute interface can be extended
into a full solvation model by incorporating a variational formulation
of the PB theory [129, 130] as well as a model of nonpolar solute-
solvent interactions [1] following a similar approach by Dzubiella,
Swanson, and McCammon [131]. We have implemented our DG-
based solvation models in the Eulerian formulation, where the solute
boundary is embedded in the three-dimensional (3D) Euclidean
space so evaluation of the electrostatic potential can be carried out
directly [71]. We have also implemented our DG-based solvation
models in the Lagrangian formulation [72] (see Fig. 12.1) wherein
the solvent–solute interface is extracted as a sharp surface and
subsequently used in solving the PB equation for the electrostatic
potential. To account for solute response to solvent polarization, we
recently introduced a quantum mechanical (QM) treatment of solute
charges to our DG-based solvation models using density functional
theory (DFT) [132]. Most recently, Wei and co-workers have adopted
a different treatment of non-electrostatic interactions between the
solvent and solute in the DG-based solvation models so that the
resulting total energy functional and PB equations are consistent
with more detailed descriptions of solvent densities at equilibrium
[75, 76]. This multiscale approach self-consistently computes the
solute charge density distribution which simultaneously minimizes
both the DFT energy as well as the solvation energy contributions.
The resulting model significantly extends the applicability of our
solvation model to a broad class of molecules without the need
for force-field parametrized charge terms. The resulting differential
geometry implicit solvent model has been tested extensively and
shows excellent performance when compared with experimental
and explicit solvent reference datasets [1, 71, 72, 75, 132–136].
Figure 12.1 An illustration of differential geometry-based solvation models. The minimum curvature is mapped on the Laplace–Beltrami surface of protein penicillopepsin (PDB ID 2web).

As mentioned above, a parallel line of research has been carried
out by Dzubiella, Hansen, McCammon, and Li. Early work by
Dzubiella and Hansen demonstrated the importance of the self-
consistent treatment of polar and nonpolar interactions in solvation
models [137, 138]. These observations were then incorporated
into a self-consistent variational framework for polar and nonpolar
solvation behavior by Dzubiella, Swanson, and McCammon [131,
139] which shared many common elements with our earlier
geometric flow approach but included an additional term to
represent nonpolar energetic contributions from surface curvature.
Li and co-workers then developed several mathematical methods for
this variational framework based on level-set methods and related
approaches [140–142] which they demonstrated and tested on a
variety of systems [143–145]. Unlike our Eulerian representation
[71], level-set methods typically give rise to models with sharp
solvent–solute interfaces.

Figure 12.2 An illustration of the one-dimensional projection of the profiles of the S and 1 − S functions along the x-axis.

An immediate consequence of our models is that the surfaces
generated are free of troublesome geometric singularities that
commonly occur in conventional solvent-accessible and solvent-
excluded surfaces [146, 147] and impact computational stability
of methods (see Fig. 12.2 for a smooth surface profile). In addition,
without using ad hoc molecular surfaces, both our solvation
models and the models of Dzubiella et al. significantly reduce
the number of free parameters that users must “fit” or adjust in
applications to real-world systems [136]. Our recent work shows
that physical parameters, i.e., pressure and surface tension, obtained
from experimental data can be directly employed in our DG-based
solvation models to achieve an accurate prediction of solvation
energy [135].
In this chapter, we review a number of DG-based models.
Initially, we discuss solvation models, i.e., nonpolar and polar
solvation models at equilibrium. To improve the accuracy and make
our models robust, quantum mechanics is applied to the solute’s
electronic structure. As an important extension, we also consider
DG-based models for the dynamical processes at non-equilibrium
settings, including applied external electrical field gradients and
inhomogeneous solvent concentration across membrane proteins.

12.2.1 Nonpolar Solvation Model


As discussed above, solvation free energy is typically divided into
two contributions: polar and nonpolar components. In one popular
description, the polar portion refers to electrostatic contributions while
the nonpolar component includes all other effects. Scaled particle
theory (SPT) is often used to describe the hard-sphere interactions
between the solute and the solvent by including the surface free
energy and mechanical work of creating a cavity of the solute size
in the solvent [148, 149].
The SPT model can be used in combination with other solute-
solvent nonpolar interactions; e.g. [71, 74, 131, 150],

$$
G_{\mathrm{NP}} = \gamma A + pV + \int_{\Omega_s} U \, d\mathbf{r}, \quad \mathbf{r} \in \mathbb{R}^3, \tag{12.1}
$$

where the first two terms are from SPT and the last term is the free
energy due to solvent–solute interactions. Here, A and V are the
surface area and volume of the solute, respectively; γ is the surface
tension; p is the hydrodynamic pressure; U denotes the solvent–
solute non-electrostatic interactions; and $\Omega_s$ is the solvent domain.
In our earlier work, we have shown that the surface area in
Eq. 12.1 can be evaluated via a two-dimensional (2D) integral for
arbitrarily shaped molecules [123, 124]. For variation purposes, the
total free energy functional must be set up as a 3D integral in $\mathbb{R}^3$. To this end,
we take advantage of geometric measure theory by considering the
mean surface area [74] and the coarea formula [151]:

$$
A = \int_0^1 \int_{\Omega \cap S^{-1}(c)} d\sigma \, dc = \int_{\Omega} |\nabla S(\mathbf{r})| \, d\mathbf{r}, \quad \mathbf{r} \in \mathbb{R}^3, \tag{12.2}
$$

where $\Omega$ denotes the whole computational domain and 0 ≤ S ≤ 1
is a hypersurface or simple surface function that characterizes the
solute domain and embeds the 2D surface in $\mathbb{R}^3$; 1 − S characterizes
the solvent domain [71]. Using the function S, the volume in Eq. 12.1
can be defined as

$$
V = \int_{\Omega_m} d\mathbf{r} = \int_{\Omega} S(\mathbf{r}) \, d\mathbf{r}, \tag{12.3}
$$

where $\Omega_m$ is the solute domain. Note that $\Omega_s \cap \Omega_m$ is not empty
because the surface function S is a smooth function, which leads to
overlap between the $\Omega_s$ and $\Omega_m$ domains. The last term in Eq. 12.1 can
be written in terms of S as:

$$
\int_{\Omega_s} U \, d\mathbf{r} = \int_{\Omega} (1 - S(\mathbf{r})) U \, d\mathbf{r}. \tag{12.4}
$$

Therefore, we have the following nonpolar solvation free energy
functional [1, 71, 74]:

$$
G_{\mathrm{NP}}[S] = \int_{\Omega} \left\{ \gamma |\nabla S| + pS + (1 - S) U \right\} d\mathbf{r}, \tag{12.5}
$$

which is in an appropriate form for variational analysis.
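
As a rough numerical illustration of Eqs. 12.2 and 12.3, the sketch below approximates A and V on a uniform grid for a sphere whose surface function S has a smooth tanh profile. The profile shape, the interface width, and the grid are assumptions made for this example only; they are not the discretization used in the cited work.

```python
import numpy as np

def area_and_volume(S, h):
    """Approximate A = int |grad S| dr (Eq. 12.2) and V = int S dr (Eq. 12.3)
    on a uniform 3D grid with spacing h."""
    gx, gy, gz = np.gradient(S, h)               # finite-difference gradient
    grad_norm = np.sqrt(gx**2 + gy**2 + gz**2)
    area = grad_norm.sum() * h**3
    volume = S.sum() * h**3
    return area, volume

# Hypothetical test case: sphere of radius R with a smooth interface.
R, width = 1.0, 0.1                              # width is an assumed thickness
x = np.linspace(-1.6, 1.6, 81)
h = x[1] - x[0]
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
r = np.sqrt(X**2 + Y**2 + Z**2)
S = 0.5 * (1.0 - np.tanh((r - R) / width))       # S ~ 1 in the solute, ~ 0 in the solvent

area, volume = area_and_volume(S, h)
print("area  :", area, "sharp sphere:", 4 * np.pi * R**2)
print("volume:", volume, "sharp sphere:", 4 * np.pi * R**3 / 3)
```

Because S is smooth rather than a sharp characteristic function, both values differ slightly from the sharp-sphere results, and the difference shrinks as the assumed interface width is reduced.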


It is important to understand the nature of the solvent–solute
non-electrostatic interaction, U . Assume that the aqueous environ-
ment has multiple species labeled by α, and their interactions with
each solute atom near the interface can be given by

$$
\begin{aligned}
U &= \sum_{\alpha} \rho_{\alpha} U_{\alpha} && (12.6) \\
  &= \sum_{\alpha} \rho_{\alpha}(\mathbf{r}) \sum_{j} U_{\alpha j}(\mathbf{r}), && (12.7)
\end{aligned}
$$

where $\rho_{\alpha}(\mathbf{r})$ is the density of the αth solution component, which may be
charged or uncharged, and $U_{\alpha j}$ is an interaction potential between
the jth atom of the solute and the αth component of the solvent.
For water that is free of other species, $\rho_{\alpha}(\mathbf{r})$ is the water molecule
density. In our earlier work [71, 72], we represented solvent–solute
interactions using the Lennard–Jones potential. The full Lennard–
Jones potential is singular and can cause computational difficulties
[71]. Zhao has proposed a way to improve the integration
stability in a realistic setting for proteins [127], but further
mathematical algorithms are needed for this class of problems. The
Weeks–Chandler–Andersen (WCA) decomposition of the potential,
which separates the attractive and repulsive components [152], was
also found to provide a good account of the attractive dispersion
interaction in our earlier work [71, 72].
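
The WCA split mentioned above can be sketched for the standard 12-6 Lennard–Jones form. The function names and parameter values below are illustrative assumptions, not the parametrization used in the cited work.

```python
import numpy as np

def lennard_jones(r, epsilon, sigma):
    """Standard 12-6 Lennard-Jones potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6**2 - sr6)

def wca_split(r, epsilon, sigma):
    """Return the (repulsive, attractive) WCA components of the LJ potential."""
    r = np.asarray(r, dtype=float)
    r_min = 2.0 ** (1.0 / 6.0) * sigma           # position of the LJ minimum
    u = lennard_jones(r, epsilon, sigma)
    inside = r < r_min
    repulsive = np.where(inside, u + epsilon, 0.0)   # shifted up, zero beyond r_min
    attractive = np.where(inside, -epsilon, u)       # flat well, LJ tail beyond r_min
    return repulsive, attractive

# Assumed, illustrative parameters (energy and length units are arbitrary).
epsilon, sigma = 0.15, 3.4
r = np.linspace(3.0, 10.0, 8)
rep, att = wca_split(r, epsilon, sigma)
print(np.allclose(rep + att, lennard_jones(r, epsilon, sigma)))
```

The two pieces sum back to the full potential, while the attractive part stays bounded by −ε, which is why it is convenient for modeling the dispersion interaction separately.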
The interaction potential U can be further modified to consider
additional interactions, such as steric effects [153] and alternate
descriptions of van der Waals interactions.
Figure 12.3 The final isosurfaces of a nonpolar compound projected with the corresponding van der Waals (vdW) potential for glycerol triacetate [1].

The Euler–Lagrange equation is used in our variational approach.
By variation of the energy functional with respect to S, we arrive at
an elliptic equation

$$
\nabla \cdot \left( \gamma \frac{\nabla S}{|\nabla S|} \right) - p + U = 0, \tag{12.8}
$$

where $\nabla \cdot \left( \gamma \frac{\nabla S}{|\nabla S|} \right)$ is a mean curvature term as the surface tension
γ is treated as a constant. A standard computational procedure used
in our earlier work [121, 123, 124] involves converting Eq. 12.8 into
a parabolic equation by introducing an artificial time variable:

$$
\frac{\partial S}{\partial t} = |\nabla S| \left[ \nabla \cdot \left( \gamma \frac{\nabla S}{|\nabla S|} \right) + V_{\mathrm{NP}} \right], \tag{12.9}
$$

where $V_{\mathrm{NP}} = -p + U$ is a potential-driving term for the time-
dependent problem. Equation 12.9 is a generalized Laplace–
Beltrami equation whose solution leads to the minimization of the
nonpolar solvation free energy with respect to the surface function
S.
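
The time-stepping idea behind Eq. 12.9 can be sketched in two dimensions with an explicit Euler update. This is a minimal illustration under stated assumptions (finite differences from NumPy, a small regularization eps in |∇S|, constant γ, and a constant potential term V); it is not the numerical scheme of the cited work. With V = 0 the update reduces to mean curvature flow, so a disk-shaped S is expected to shrink.

```python
import numpy as np

def flow_step(S, h, dt, gamma=1.0, V=0.0, eps=1e-8):
    """One explicit Euler step of dS/dt = |grad S| [div(gamma grad S/|grad S|) + V]."""
    Sx, Sy = np.gradient(S, h)
    norm = np.sqrt(Sx**2 + Sy**2 + eps)          # eps avoids division by zero
    div = (np.gradient(gamma * Sx / norm, h, axis=0)
           + np.gradient(gamma * Sy / norm, h, axis=1))
    return S + dt * norm * (div + V)

# Assumed setup: smooth disk of radius 1 on a square grid.
x = np.linspace(-2.0, 2.0, 81)
h = x[1] - x[0]
X, Y = np.meshgrid(x, x, indexing="ij")
S = 0.5 * (1.0 - np.tanh((np.sqrt(X**2 + Y**2) - 1.0) / 0.15))

gamma = 1.0
dt, n_steps = 0.1 * h**2, 400                    # small step for explicit stability
t_final = dt * n_steps
area0 = S.sum() * h**2
for _ in range(n_steps):
    S = flow_step(S, h, dt, gamma=gamma)
area1 = S.sum() * h**2
print("area decrease:", area0 - area1, "estimate 2*pi*gamma*t:", 2 * np.pi * gamma * t_final)
```

For a circle of radius R, mean curvature flow gives dR/dt = −γ/R, so the enclosed area decreases at the rate 2πγ; the printed decrease can be compared against that estimate.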
The accuracy of the nonpolar solvation model
is crucial to the success of other expanded versions of the
differential geometry formalism. In particular, as the electrostatic
effect and its associated approximation error are excluded, the
major factor impacting the nonpolar solvation model is the solvent–
solute boundary, which is governed by the DG-based formalism.
Therefore, the nonpolar model provides the most direct and
essential validation of the DG-based models. In our recent work [1],
the DG-based nonpolar solvation (DG-NP) model was tested using a
large number of nonpolar compounds. Table 12.1 presents a small
portion of our results [1] compared with an explicit nonpolar model
[154] and experimental data [155]. The solvation free energy is
decomposed into repulsive and attractive parts, showing dramatic
cancellations. The predicted total nonpolar solvation energies are in
good agreement with experimental measurements. More extensive
validation of our DG-NP model can be found in an earlier paper [1].

Table 12.1 Solvation energies calculated with the differential geometry
nonpolar solvation (DG-NP) model for a set of 11 alkanes in comparison with an
explicit solvent model [154]. All values are in kcal/mol.

                     Rep. part         Att. part           Total             Error
Compound            DG-NP Explicit    DG-NP Explicit    DG-NP Explicit    DG-NP Explicit
methane              4.71     5.72    −2.73    −3.31     1.98     2.41    −0.02     0.41
ethane               6.65     8.07    −4.75    −5.44     1.90     2.63     0.07     0.80
butane              10.30    10.10    −8.18    −7.21     2.12     2.89     0.04     0.81
propane              8.50    12.19    −6.45    −8.98     2.04     3.21     0.08     1.25
pentane             12.19    14.22    −9.82   −10.77     2.37     3.45     0.04     1.12
hexane              14.03    16.17   −11.54   −12.38     2.50     3.78     0.01     1.30
isobutane           10.14    11.91    −7.97    −8.88     2.16     3.03    −0.36     0.51
2-methylbutane      11.73    13.64    −9.35   −10.13     2.38     3.51     0.00     1.13
neopentane          11.81    13.62    −9.20   −10.39     2.61     3.23     0.11     0.73
cyclopentane        10.60    12.79    −9.43    −9.99     1.17     2.80    −0.03     1.60
cyclohexane         12.05    14.00   −10.78   −11.66     1.27     2.34     0.04     1.11

Note: Errors are computed with respect to experimental data [155].
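
The cancellation between the repulsive and attractive parts can be checked directly from a few DG-NP entries of Table 12.1. The short script below is only an arithmetic illustration; the helper name `cancellation` is introduced here and is not part of the original work.

```python
# Selected DG-NP rows of Table 12.1, in kcal/mol:
# compound: (repulsive part, attractive part, total)
dg_np = {
    "methane": (4.71, -2.73, 1.98),
    "ethane": (6.65, -4.75, 1.90),
    "hexane": (14.03, -11.54, 2.50),
}

def cancellation(rep, att):
    """Fraction of the repulsive part that is cancelled by the attractive part."""
    return -att / rep

for name, (rep, att, total) in dg_np.items():
    print(f"{name:8s} rep+att = {rep + att:5.2f}  total = {total:5.2f}  "
          f"cancelled = {cancellation(rep, att):.0%}")
```

The sums agree with the tabulated totals to within rounding, and the cancelled fraction grows with chain length for these three compounds.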

12.2.2 Incorporating Polar Solvation with a Poisson–Boltzmann Model

Most biomolecules are either charged or highly polarized; therefore,
electrostatic interactions are indispensable in their theoretical
description. The energy of electrostatic interactions can be mod-
eled by a number of theoretical approaches, including Poisson–
Boltzmann (PB) theory [3, 4, 26, 27], polarizable continuum theory
[20, 156], and the generalized Born approximation [8, 9]. In our
work, we incorporate PB theory for the polar solvation free energy
and optimize the electrostatic solvation energy in our variational
procedure.
Using the surface function S and electrostatic potential $\Phi$, a PB
model for the polar solvation free energy can be expressed by [71,
74]:

$$
G_{\mathrm{polar}} = \int \left\{ S \left[ -\frac{\epsilon_m}{2} |\nabla \Phi|^2 + \Phi \varrho \right]
+ (1 - S) \left[ -\frac{\epsilon_s}{2} |\nabla \Phi|^2 - k_B T \sum_{\alpha} \rho_{\alpha 0} \left( e^{-\frac{q_{\alpha} \Phi + U_{\alpha} - \mu_{\alpha 0}}{k_B T}} - 1 \right) \right] \right\} d\mathbf{r},
\tag{12.10}
$$

where $\epsilon_s$ and $\epsilon_m$ are the dielectric constants of the solvent and
solute, respectively, and $\varrho$ represents the fixed charge density of
the solute. The charge density is often modeled by a point charge
approximation $\varrho = \sum_j Q_j \delta(\mathbf{r} - \mathbf{r}_j)$, with $Q_j$ denoting the partial
charge of the jth atom in the solute. $k_B$ is the Boltzmann constant; T
is the temperature; $\rho_{\alpha 0}$ denotes the reference bulk concentration of
the αth solvent species; and $q_{\alpha}$ denotes the charge valence of the αth
solvent species, which is zero for an uncharged solvent component.
In Eq. 12.10, the form of the Boltzmann distribution [75] is different
from that featured in our earlier work [71, 74]:

$$
\rho_{\alpha} = \rho_{\alpha 0} \, e^{-\frac{q_{\alpha} \Phi + U_{\alpha} - \mu_{\alpha 0}}{k_B T}}, \tag{12.11}
$$

with $\mu_{\alpha 0}$ being a relative reference chemical potential that reflects
differences in the equilibrium activities of the different chemical
species, and thus their concentrations. The extra term $e^{-U_{\alpha}/(k_B T)}$
in Eq. 12.11 describes the solvent–solute interactions near the
interface beyond those implicitly represented by S. Therefore, $e^{-U_{\alpha}/(k_B T)}$
provides a non-electrostatic correction to the charge density near
the interface.
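
To make the role of each term in Eq. 12.11 concrete, the following sketch evaluates the modified Boltzmann density for a single species. The function name, the unit choice (energies in kcal/mol, so that q·Φ, U, and μ0 share one unit), and the numerical values are assumptions for illustration.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def species_density(rho0, q, phi, U=0.0, mu0=0.0, T=298.15):
    """rho = rho0 * exp(-(q*phi + U - mu0) / (kB*T)), following Eq. 12.11.
    q*phi, U and mu0 are assumed to be expressed in kcal/mol."""
    return rho0 * np.exp(-(q * phi + U - mu0) / (K_B * T))

rho0 = 0.1                                   # assumed bulk concentration
phi = np.array([-1.0, 0.0, 1.0])             # assumed potential values
cations = species_density(rho0, +1, phi)
anions = species_density(rho0, -1, phi)
near_interface = species_density(rho0, +1, phi, U=0.5)   # repulsive U > 0
print(cations, anions, near_interface)
```

At zero potential and zero U the bulk concentration is recovered; a positive potential depletes cations and enriches anions, and a positive non-electrostatic term U lowers the density of either species near the interface.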
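The resulting total free energy functional for the full solvation
system was first proposed in 2012 [75]:

$$
G^{\mathrm{PB}}_{\mathrm{total}}[S, \Phi] = \int \left\{ \gamma |\nabla S| + pS + S \left[ -\frac{\epsilon_m}{2} |\nabla \Phi|^2 + \Phi \varrho \right]
+ (1 - S) \left[ -\frac{\epsilon_s}{2} |\nabla \Phi|^2 - k_B T \sum_{\alpha} \rho_{\alpha 0} \left( e^{-\frac{q_{\alpha} \Phi + U_{\alpha} - \mu_{\alpha 0}}{k_B T}} - 1 \right) \right] \right\} d\mathbf{r}.
\tag{12.12}
$$
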
Note that the energy functional (Eq. 12.12) differs from that in our
earlier work [71, 74] and that of Dzubiella et al. [131, 139] not only
in terms of the Boltzmann distribution, but also in the solvent–solute
interaction term (1 − S)U, which is omitted in the present form. As shown
in Section 12.3, the present form is consistent with the DG-based
Poisson–Nernst–Planck (PNP) theory at equilibrium. The DG-based
PNP model offers a more detailed description of solvent densities
based on fundamental laws of physics. As a result, the formalism of
the DG-based full solvation model should agree with that of the DG-
based PNP model at equilibrium.
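The total solvation free energy in Eq. 12.12 is expressed as
a functional of the surface function S and electrostatic potential
$\Phi$. Therefore, the total solvation free energy functional can be
minimized with respect to S and $\Phi$ via the variational principle.
Variation with respect to S leads to

$$
-\nabla \cdot \left( \gamma \frac{\nabla S}{|\nabla S|} \right) + p - \frac{\epsilon_m}{2} |\nabla \Phi|^2 + \Phi \varrho + \frac{\epsilon_s}{2} |\nabla \Phi|^2
+ k_B T \sum_{\alpha} \rho_{\alpha 0} \left( e^{-\frac{q_{\alpha} \Phi + U_{\alpha} - \mu_{\alpha 0}}{k_B T}} - 1 \right) = 0. \tag{12.13}
$$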

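Using the same procedure discussed earlier, we construct the
following generalized Laplace–Beltrami equation:

$$
\frac{\partial S}{\partial t} = |\nabla S| \left[ \nabla \cdot \left( \gamma \frac{\nabla S}{|\nabla S|} \right) + V_{\mathrm{PB}} \right], \tag{12.14}
$$

where the potential-driven term is given by

$$
V_{\mathrm{PB}} = -p + \frac{\epsilon_m}{2} |\nabla \Phi|^2 - \Phi \varrho - \frac{\epsilon_s}{2} |\nabla \Phi|^2
- k_B T \sum_{\alpha} \rho_{\alpha 0} \left( e^{-\frac{q_{\alpha} \Phi + U_{\alpha} - \mu_{\alpha 0}}{k_B T}} - 1 \right). \tag{12.15}
$$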

As in the nonpolar case, solving the generalized Laplace–Beltrami
equation (12.14) generates the solvent–solute interface through the
function S. Variation with respect to $\Phi$ gives the generalized PB
(GPB) equation:

$$
-\nabla \cdot \left( \epsilon(S) \nabla \Phi \right) = S \varrho + (1 - S) \sum_{\alpha} q_{\alpha} \rho_{\alpha 0} \, e^{-\frac{q_{\alpha} \Phi + U_{\alpha} - \mu_{\alpha 0}}{k_B T}}, \tag{12.16}
$$

where $\epsilon(S) = (1 - S)\epsilon_s + S\epsilon_m$ is the generalized permittivity
function. As shown in our earlier work [71, 74], $\epsilon(S)$ is a smooth
dielectric function gradually varying from $\epsilon_m$ to $\epsilon_s$. Thus, the solution
procedure of the GPB equation avoids many numerical difficulties of
solving elliptic equations with discontinuous coefficients [157–161]
in the standard PB equation.
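
The effect of a smooth permittivity can be illustrated with a one-dimensional, ion-free version of Eq. 12.16, that is, −d/dx(ε(S) dΦ/dx) = Sϱ. The sketch below is a simplified stand-in under assumed parameters (ε_m = 1, ε_s = 80, a tanh profile for S, a Gaussian fixed charge, and zero potential at the domain ends); it is not the solver used in the cited work.

```python
import numpy as np

def solve_poisson_1d(eps_nodes, rho, h):
    """Solve -d/dx(eps dPhi/dx) = rho on a uniform grid with Phi = 0 at both ends.
    eps_nodes and rho are given at the grid nodes, boundaries included."""
    n = len(rho)
    eps_half = 0.5 * (eps_nodes[:-1] + eps_nodes[1:])   # eps at cell interfaces
    main = (eps_half[:-1] + eps_half[1:]) / h**2        # diagonal, interior nodes
    off = -eps_half[1:-1] / h**2                        # coupling of neighbors
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    phi = np.zeros(n)
    phi[1:-1] = np.linalg.solve(A, rho[1:-1])
    return phi

# Assumed model problem: solute occupies |x| < 1 with a smooth boundary.
x = np.linspace(-3.0, 3.0, 301)
h = x[1] - x[0]
S = 0.5 * (1.0 - np.tanh((np.abs(x) - 1.0) / 0.1))
eps_m, eps_s = 1.0, 80.0
eps = (1.0 - S) * eps_s + S * eps_m                     # generalized permittivity
rho = S * np.exp(-x**2 / 0.1)                           # fixed charge inside the solute
phi = solve_poisson_1d(eps, rho, h)
print(phi.max())
```

Because ε(S) varies smoothly, averaging it to the cell interfaces is sufficient here; with a sharp jump in ε, a scheme of this kind would typically need special interface treatment to retain accuracy.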
Equations 12.14 and 12.16 are solved for the surface function S
and electrostatic potential $\Phi$, respectively. These coupled “Laplace–
Beltrami and Poisson–Boltzmann” equations are the governing
equations for the DG-based solvation model in the Eulerian
representation. The Lagrangian representation of the DG-based solvation
model has also been derived [72]. Both the Eulerian and Lagrangian
solvation models have been shown [71, 72] to be essentially
equivalent and provide very good predictions of solvation energies
for a diverse range of compounds.

12.2.3 Improving Poisson–Boltzmann Model Charge Distributions with Quantum Mechanics

While our earlier DG-based solvation models resolved the problem
of ad hoc solute-solvent boundaries, they depended on existing force
field parameters for atomic partial charge and radius assignments.
Most force field models are parametrized for a certain class of
molecules or materials which often limits their transferability and
applicability. In particular, fixed partial charges do not account
for charge rearrangement during the solvation process [162–164].
Therefore, a quantum solvation model that can self-consistently
update the charge density of the solute molecule during solvation
offers the promise of improving the accuracy and transferability of
our DG-based solvation model.
A quantum mechanical formulation of solute charge density can
be pursued in a number of ways. The most accurate treatment is
the one that uses quantum mechanical first-principles or ab initio
approaches. However, the ab initio calculation of the electronic
structure of a macromolecule is currently prohibitively expensive
due to the large number of degrees of freedom. A variety of elegant
theories and algorithms have been developed in the literature to
reduce the dimensionality of this many-body problem [165–172]. In
earlier work from the Wei group, a density functional theory (DFT)
treatment of solute electron distributions was incorporated into our
DG-based solvation model [132]. In this work, we review the basic
formulation and present an improved DG-DFT model for solvation
analysis. Our goal is to construct a DG-DFT based solvation model
that will significantly improve the accuracy of existing solvation
models and still be orders of magnitude faster than explicit solvent
models.
DFT uses functionals of single-electron distributions to repre-
sent multi-electron properties so that the total dimensionality is
dramatically reduced. To combine DFT with our DG-based solvation
formulation, we define the kinetic energy functional as

$$
G_{\mathrm{kin}}[n] = \sum_{j} \int S(\mathbf{r}) \frac{\hbar^2}{2m} |\nabla \psi_j(\mathbf{r})|^2 \, d\mathbf{r}, \tag{12.17}
$$

where n is the total electron density, m(r) is the position-dependent
electron mass, $\hbar = h / 2\pi$ with h being the Planck constant, and $\psi_j(\mathbf{r})$
are the Kohn–Sham orbitals. The total electron density n is obtained
by

$$
n(\mathbf{r}) = \sum_{i} |\psi_i(\mathbf{r})|^2, \tag{12.18}
$$

where the summation is over all of the Kohn–Sham orbitals.
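
Equation 12.18 can be illustrated with a toy set of orbitals. In the sketch below, particle-in-a-box eigenfunctions stand in for Kohn–Sham orbitals; this substitution, the grid, and the number of occupied orbitals are assumptions made only to show how the density is assembled.

```python
import numpy as np

def electron_density(orbitals):
    """n(r) = sum_i |psi_i(r)|^2 over the occupied orbitals (Eq. 12.18)."""
    return np.sum(np.abs(orbitals) ** 2, axis=0)

# Stand-in orbitals: normalized particle-in-a-box states on [0, L].
L, n_occ = 1.0, 3
x = np.linspace(0.0, L, 401)
h = x[1] - x[0]
orbitals = np.array([np.sqrt(2.0 / L) * np.sin(k * np.pi * x / L)
                     for k in range(1, n_occ + 1)])

n = electron_density(orbitals)
total = n.sum() * h            # simple quadrature; n vanishes at both ends
print(total)                   # close to the number of occupied orbitals
```

Each normalized orbital contributes one unit to the integral of n, so the integrated density recovers the number of occupied orbitals.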
In the absence of external potentials, the electrostatic potential
energy of nuclei and electrons can be represented by the Coulombic
interactions among the electrons and nuclei. There are three
groups of electrostatic interactions: interactions between nuclei,
interactions between electrons and nuclei, and interactions between
electrons. Following the Born–Oppenheimer approximation, we
neglect nuclei interactions in our DG-based model. Using Coulomb’s
law, the repulsive interaction between electrons can be expressed as
the Hartree term:

$$
U_{ee}[n] = \frac{1}{2} \int \frac{e_C^2 \, n(\mathbf{r}) \, n(\mathbf{r}')}{\epsilon(\mathbf{r}) |\mathbf{r} - \mathbf{r}'|} \, d\mathbf{r}', \tag{12.19}
$$

where $e_C$ is the unit charge of an electron; $\epsilon(\mathbf{r})$ is the position-
dependent electric permittivity; and $\mathbf{r}$ and $\mathbf{r}'$ are positions of two
interacting electrons. The term $U_{ee}[n]$ in Eq. 12.19 involves nonlinear
functions of the electron density n, which implies the need for
iterative numerical variational methods, even in the absence of
solvent density. The attractive interactions between electrons and
nuclei are given by

$$
U_{en}[n] = -\sum_{I} \frac{e_C^2 \, n(\mathbf{r}) \, Z_I}{\epsilon(\mathbf{r}) |\mathbf{r} - \mathbf{R}_I|}, \tag{12.20}
$$
where Z I is the charge of nucleus I, located at R I . The total potential energy
functional is then given by
$$G_{\mathrm{potential}} = \int S(\mathbf{r}) \left( U_{ee}[n] + U_{en}[n] + E_{XC}[n] \right) d\mathbf{r}, \tag{12.21}$$

where the last term, E XC , is the exchange-correlation potential,
which approximates the many-particle interactions in the solute
molecule.
Intuitively, it appears that the total free energy functional for the
DG-based model is the simple summation of the polar, nonpolar,
kinetic, and potential energy. However, such a summation will lead
to double counting because of the coupling among different energy
terms. For example, the electrostatic energy depends on the charge
density, which, in turn, depends on the kinetic and potential energies
of electrons. Additionally, the electrostatic potential serves as a
variable in the polar energy functional and also serves as a known
input in the potential energy of electrons through solution of the
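Poisson equation in vacuum (ε = 1)
$$-\nabla^2 \phi_v(\mathbf{r}) = \rho^v_{\mathrm{total}}(\mathbf{r}), \tag{12.22}$$
where φv is the electrostatic potential in vacuum and $\rho^v_{\mathrm{total}} = n_v + n_n$,
with nv (r) being the electron density in vacuum and nn the density
of nuclei. The solution of the Poisson equation in vacuum is
$$\phi_v(\mathbf{r}) = \int \frac{e_C\, n_v(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, d\mathbf{r}' - \sum_I \frac{e_C Z_I}{|\mathbf{r} - \mathbf{R}_I|}. \tag{12.23}$$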
Of note, Eq. 12.23 gives the exact total Coulombic potential
of the electron–electron and electron–nucleus interactions.
Therefore, we do not need to include U ee [n] and U en [n] terms in the
total free energy functional.
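Based on the preceding discussions, we propose a total free
energy functional for solutes at equilibrium:
$$\begin{aligned}
G^{\mathrm{DFT\text{-}PB}}_{\mathrm{total}}[S, \phi, n] = \int \Bigg\{ & \gamma |\nabla S(\mathbf{r})| + p S(\mathbf{r}) + S(\mathbf{r}) \left[ \rho_{\mathrm{total}}\, \phi - \frac{\epsilon_m}{2} |\nabla \phi|^2 \right] \\
& + (1 - S(\mathbf{r})) \left[ -\frac{\epsilon_s}{2} |\nabla \phi|^2 - k_B T \sum_\alpha \rho_{\alpha 0} \left( e^{-\frac{q_\alpha \phi + U_\alpha - \mu_{\alpha 0}}{k_B T}} - 1 \right) \right] \\
& + S(\mathbf{r}) \left[ \sum_j \frac{\hbar^2}{2m} |\nabla \psi_j|^2 + E_{XC}[n] \right] \Bigg\} \, d\mathbf{r},
\end{aligned} \tag{12.24}$$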
where the first two terms are the nonpolar energy functional; the
third and fourth terms are the electrostatic energy functional; and
the last row is the electronic energy functional, which is confined
to the solute region by S(r). As already discussed, the term ρtotal =
nv + nn also contributes to the Coulombic potentials of the electron–
electron and electron–nucleus interactions. This total free energy
functional provides a starting point for the derivation of governing
equations for the DG-based solvation models, as well as the basis for
evaluation of solvation free energies.
The governing equations for the DG-based solvation model with
quantum mechanical charge distributions are determined by the
calculus of variations. As before, variation of Eq. 12.24 with respect
to the electrostatic potential φ gives the generalized Poisson–
Boltzmann (GPB) equation [71, 74]:
$$-\nabla \cdot \left( \epsilon(S) \nabla \phi \right) = S \rho_{\mathrm{total}} + (1 - S) \sum_{\alpha=1}^{N_c} \rho_{\alpha 0}\, q_\alpha\, e^{-\frac{q_\alpha \phi + U_\alpha - \mu_{\alpha 0}}{k_B T}}, \tag{12.25}$$
where the dielectric function is defined as before: ε(S) = (1 − S)εs +
Sεm . In a solvent without salt, the GPB equation simplifies to the
Poisson equation:
$$-\nabla \cdot \left( \epsilon(S) \nabla \phi \right) = S \rho_{\mathrm{total}}. \tag{12.26}$$
This equation and Eq. 12.25 are similar to the model described in the
previous section (Section 12.2.2). However, in the present multiscale
model, the charge source ρtotal is determined by solving the Kohn–Sham
equations rather than by the fixed charges
$\rho_m = \sum_j Q_j\, \delta(\mathbf{r} - \mathbf{r}_j)$.
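To make the structure of Eq. 12.26 concrete, the following sketch solves its one-dimensional analogue, −(ε(S)φ′)′ = Sρ, with a standard finite-volume discretization. It assumes a uniform grid, φ = 0 at both ends, and arithmetic averaging of ε at cell faces; the function name `solve_poisson_1d` and the default permittivities (1 and 80) are illustrative choices, not values taken from the chapter.

```python
import numpy as np

def solve_poisson_1d(S, rho, h, eps_m=1.0, eps_s=80.0):
    """Solve -(eps(S) phi')' = S * rho on a uniform grid with phi = 0 at both ends."""
    eps = (1.0 - S) * eps_s + S * eps_m           # eps(S) = (1 - S) eps_s + S eps_m
    eps_face = 0.5 * (eps[:-1] + eps[1:])         # permittivity at the faces between nodes
    n = len(S) - 2                                # number of interior unknowns
    A = np.zeros((n, n))
    for k in range(n):
        i = k + 1                                 # grid index of the k-th unknown
        A[k, k] = (eps_face[i - 1] + eps_face[i]) / h ** 2
        if k > 0:
            A[k, k - 1] = -eps_face[i - 1] / h ** 2
        if k < n - 1:
            A[k, k + 1] = -eps_face[i] / h ** 2
    phi = np.zeros(len(S))
    phi[1:-1] = np.linalg.solve(A, (S * rho)[1:-1])   # source is confined to the solute by S
    return phi

x = np.linspace(0.0, 1.0, 101)
h = x[1] - x[0]
S_smooth = 0.5 * (1.0 - np.tanh((np.abs(x - 0.5) - 0.2) / 0.03))  # smooth solute indicator
phi = solve_poisson_1d(S_smooth, np.ones_like(x), h)
```

Because S varies smoothly between 0 and 1, the permittivity changes continuously across the solute boundary, which is the feature that distinguishes the DG-based formulation from a sharp-interface dielectric model.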
Variation of Eq. 12.24 with respect to the surface function S gives
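a Laplace–Beltrami equation [71, 74, 123, 124]:
$$\frac{\partial S}{\partial t} = |\nabla S| \left[ \nabla \cdot \left( \gamma \frac{\nabla S}{|\nabla S|} \right) + V_{\mathrm{DFT\text{-}PB}} \right], \tag{12.27}$$
where
$$\begin{aligned}
V_{\mathrm{DFT\text{-}PB}} = & -p + \frac{\epsilon_m}{2} |\nabla \phi|^2 - \frac{\epsilon_s}{2} |\nabla \phi|^2 \\
& - k_B T \sum_{\alpha=1}^{N_c} \rho_{\alpha 0} \left( e^{-\frac{q_\alpha \phi + U_\alpha - \mu_{\alpha 0}}{k_B T}} - 1 \right) \\
& - \rho_{\mathrm{total}}\, \phi - \sum_j \frac{\hbar^2}{2m} |\nabla \psi_j|^2 - E_{XC}[n].
\end{aligned} \tag{12.28}$$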
The electronic potentials in the last row of this equation have
relatively small contributions to VDFT-PB at equilibrium due to the
fact that they essentially are confined inside the solute molecular
domain. Note that Eq. 12.27 has the same structure as the potential-driven
geometric flow equation defined in the models presented
earlier in this chapter. As t → ∞, the initial profile of S evolves into
a steady-state solution, which offers an optimal surface function S.
Finally, to derive the equation for the electronic wavefunction,
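we minimize the energy functional with respect to the wavefunction
ψ*j (r), subject to the Lagrange multiplier term
$\sum_i E_i \left( \delta_{ij} - \int S\, \psi_i(\mathbf{r})\, \psi_j^*(\mathbf{r})\, d\mathbf{r} \right)$,
which enforces the orthonormality of the wavefunctions, to arrive at the
Kohn–Sham equation:
$$\left( -\frac{\hbar^2}{2m} \nabla^2 + U_{\mathrm{eff}} \right) \psi_j = E_j \psi_j, \quad \text{with } U_{\mathrm{eff}}(\mathbf{r}) = q\phi + V_{XC}[n], \tag{12.29}$$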
where the Lagrange multiplier constants E i can be interpreted as
energy expectation values, VXC [n] = dE XC [n]/dn, and qφ is the potential
contribution from Coulombic interactions. These electrostatic
interactions can be calculated by the GPB equation (12.25) with a
given total charge density. Eq. 12.29 does not directly depend on the
solvent characteristic function S, so existing DFT packages can be
used in our computations with minor modifications.
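The eigenvalue structure of Eq. 12.29 can be illustrated with a small finite-difference solver. This is only a one-dimensional sketch in units with ħ = m = 1, using a fixed model potential in place of a self-consistent qφ + VXC [n]; the function name `solve_kohn_sham_1d` is hypothetical and does not correspond to the interface of any existing DFT package.

```python
import numpy as np

def solve_kohn_sham_1d(x, u_eff, n_states=2, hbar=1.0, m=1.0):
    """Diagonalize (-hbar^2/2m d^2/dx^2 + U_eff) on a uniform grid with psi = 0 outside it."""
    h = x[1] - x[0]
    n = len(x)
    kin = hbar ** 2 / (2.0 * m * h ** 2)
    # Three-point stencil for the second derivative plus the diagonal effective potential.
    H = (np.diag(2.0 * kin + u_eff)
         + np.diag(-kin * np.ones(n - 1), 1)
         + np.diag(-kin * np.ones(n - 1), -1))
    energies, vectors = np.linalg.eigh(H)
    orbitals = vectors[:, :n_states].T / np.sqrt(h)   # rescale so that sum |psi|^2 h = 1
    return energies[:n_states], orbitals

x = np.linspace(-8.0, 8.0, 801)
energies, orbitals = solve_kohn_sham_1d(x, 0.5 * x ** 2)   # harmonic test potential (assumption)
```

In a solvated calculation, the array passed as `u_eff` would be updated at each iteration with the electrostatic potential from the continuum solver, while the eigenvalue solver itself remains unchanged.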
To integrate our continuum model with standard DFT algorithms,
Wei and co-workers introduce the reaction field potential
φRF = φ − φ0 , with φ0 being the solution of the Poisson
equation in homogeneous media [132]. The reaction field potential
is the electric potential induced by the polarized solvent, and its
incorporation leads to the following effective energy function:
$$U_{\mathrm{eff}}(\mathbf{r}) = q\phi + V_{XC}[n] = q\phi_{RF} + U^0_{\mathrm{eff}}(\mathbf{r}), \tag{12.30}$$
where $U^0_{\mathrm{eff}}(\mathbf{r}) = q\phi_0 + V_{XC}[n]$ is the traditional Kohn–Sham potential
available in most DFT algorithms. The reaction field potential also
appears in the Hamiltonian of the solute in the quantum calculation
[173–175] and can be obtained from the electrostatic computation
in the framework of the continuum models developed above. In
summary, the inclusion of quantum mechanical charge distributions
in the DG-based continuum model involves two components: (1) the
classical electrostatic problem of determining the solvent reaction
field potential with the quantum mechanically calculated charge
density and (2) the quantum mechanical problem of calculating the
electron charge density with fixed nucleus charges in the presence
of the reaction field potential. To carry out these computations, an
intuitive, self-consistent, iterative procedure can be constructed to
solve the quantum equations for the electron distribution and the
continuum electrostatic equations for the reaction field potential
[20, 173–176].
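The self-consistent procedure described above can be sketched as a fixed-point iteration with linear mixing. The two solver arguments are placeholders for the quantum step (density from a given reaction field) and the continuum step (reaction field from a given density); their names, the mixing parameter, and the toy linear response used to exercise the loop are assumptions made for illustration only.

```python
import numpy as np

def self_consistent_loop(solve_density, solve_reaction_field, phi_rf0,
                         mix=0.5, tol=1e-8, max_iter=200):
    """Alternate the quantum and continuum steps until the reaction field stops changing."""
    phi_rf = np.asarray(phi_rf0, dtype=float)
    for iteration in range(1, max_iter + 1):
        n = solve_density(phi_rf)                   # quantum step in the current field
        phi_new = solve_reaction_field(n)           # continuum step for the new density
        change = np.max(np.abs(phi_new - phi_rf))
        phi_rf = (1.0 - mix) * phi_rf + mix * phi_new   # linear mixing for stability
        if change < tol:
            return n, phi_rf, iteration
    raise RuntimeError("self-consistent iteration did not converge")

# Toy stand-ins (hypothetical): a density that responds linearly to the field,
# and a reaction field proportional to the density.
toy_density = lambda phi: 1.0 + 0.3 * phi
toy_field = lambda n: -0.5 * n
n_sc, phi_sc, n_iter = self_consistent_loop(toy_density, toy_field, np.zeros(1))
```

Linear mixing is only one of several common stabilization strategies; the chapter does not specify which one is used in practice.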
After solving the Kohn–Sham equation, the QM-based charge
density can be incorporated into the solvation model in two different
ways. Our preferred approach is to apply the continuous QM charge
density directly to the PB equation as a source term. However,
it is also possible to fit the QM charge density into atomic point
charges or multipoles for use as the source term [177–179]. This
second approach is most useful when the DG-DFT scheme is used
in conjunction with other molecular simulation approaches, such as
MM-PBSA or docking.
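The second approach, fitting point charges to a QM-derived quantity, can be sketched as a linear least-squares problem. The example below assumes that the fit target is the electrostatic potential sampled at points outside the molecule and that the potential is expressed in Gaussian-like units (kernel 1/r); the function names `coulomb_potential` and `fit_point_charges` are hypothetical, and no total-charge constraint or restraint is applied.

```python
import numpy as np

def coulomb_potential(sites, charges, points):
    """Potential of point charges at `sites`, evaluated at `points` (kernel 1/r)."""
    dist = np.linalg.norm(points[:, None, :] - sites[None, :, :], axis=2)
    return (1.0 / dist) @ charges

def fit_point_charges(sites, points, potential):
    """Least-squares charges at `sites` that reproduce `potential` at `points`."""
    dist = np.linalg.norm(points[:, None, :] - sites[None, :, :], axis=2)
    design = 1.0 / dist                              # one column per atomic site
    charges, *_ = np.linalg.lstsq(design, potential, rcond=None)
    return charges

# Synthetic reference data (assumption): a two-site dipole sampled on a sphere.
sites = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
reference = np.array([0.4, -0.4])
rng = np.random.default_rng(0)
directions = rng.normal(size=(50, 3))
points = 3.0 * directions / np.linalg.norm(directions, axis=1)[:, None] + np.array([0.5, 0.0, 0.0])
fitted = fit_point_charges(sites, points, coulomb_potential(sites, reference, points))
```

With a real QM density, the reference potential would come from the Kohn–Sham calculation rather than from known charges, and the fit would generally reproduce it only approximately.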

12.3 Differential Geometry-Based Electrolyte Transport Models

It is well-known that implicit solvent models use both discrete
and continuum representations of molecular systems to reduce the
number of degrees of freedom; this philosophy and methodology of
implicit solvent models can be extended to more general multiscale
formulations. A variety of DG-based multiscale models have been
introduced in an earlier paper of Wei [74]. Theory for the differential
geometry of surfaces provides a natural means to separate the
microscopic solute domain from the macroscopic solvent domain so
that appropriate physical laws are applied to applicable domains.
This portion of the chapter focuses specifically on the extension
of the equilibrium electrostatics models described above to non-
equilibrium transport problems that are relevant to a variety of
chemical and biological systems, such as molecular motors, ion
channels, fuel cells, and nanofluidics, with chemically or biologically
relevant behavior that occurs far from equilibrium [74–76].
Another class of DG-based multiscale models involves the dynamics
and transport of ion channels, transmembrane transporters
and nanofluidics. In new multiscale models developed by the Wei
group, the total energy functionals are modified with additional
chemical energies to account for spatially inhomogeneous ion
density distribution and charge fluxes due to applied external
field gradients and inhomogeneous solvent concentrations across
membranes. The Nernst–Planck equation is constructed using
Fick’s law via a generalized chemical potential governed by the
variational principle. Together with the Laplace–Beltrami equation
for the surface function and Poisson equation for electrostatic
potential, the resulting DG-based PNP theory reduces to our PB
theory at equilibrium [75]. The PNP equation has been thoroughly
studied in the biophysical literature [180–187]; however, a DG-
based formulation of the PNP offers many of the advantages that
DG-based solvation models described above provide: elimination
of several ad hoc parameters from the model and a framework in
which to incorporate more complicated solution phenomena such
as strong correlations between ions and confinement-induced ion
steric effects. Additionally, compared with conventional PNP models
[180–187], the DG-based PNP models include nonpolar solvation
free energy and thus can be used to predict the full solvation energy
against experimental data, in addition to the usual current–voltage
curves [75].

12.3.1 A Differential Geometry-Based Poisson–Nernst–Planck Model
The GPB and Laplace–Beltrami models discussed in the previous
section were obtained from a variational principle applied to
equilibrium systems. For chemical and biological systems far from
equilibrium, it is necessary to incorporate additional equations (e.g.,
the Nernst–Planck equation) to describe the dynamics of charged
particles. Various DG-based Nernst–Planck equations have been derived
from mass conservation laws in earlier work by Wei and co-workers
[74, 75]. We outline the basic derivation here. For simplicity in
derivation, we assume that the flow stream velocity vanishes (|v| =
0) and we omit the chemical reactions in our present discussion.
The chemical potential contribution to the free energy consists of a
homogeneous reference term and the entropy of mixing [188]:
$$G_{\mathrm{chem}} = \int \sum_\alpha \left\{ \left( \mu^0_\alpha - \mu_{\alpha 0} \right) \rho_\alpha + k_B T \rho_\alpha \ln \frac{\rho_\alpha}{\rho_{\alpha 0}} - k_B T \left( \rho_\alpha - \rho_{\alpha 0} \right) \right\} d\mathbf{r}, \tag{12.31}$$
where $\mu^0_\alpha$ is the reference chemical potential of the αth species at
which the associated ion concentration is ρα0 in a homogeneous
system (e.g., Φ = U α = μα0 = 0). Here, $k_B T \rho_\alpha \ln(\rho_\alpha / \rho_{\alpha 0})$ is the entropy
of mixing, and −kB T (ρα − ρα0 ) is a relative osmotic term [189]. The
chemical potential of species α can be obtained by variation with
respect to ρα :
$$\frac{\delta G_{\mathrm{chem}}}{\delta \rho_\alpha} \Rightarrow \mu^{\mathrm{chem}}_\alpha = \mu^0_\alpha - \mu_{\alpha 0} + k_B T \ln \frac{\rho_\alpha}{\rho_{\alpha 0}}. \tag{12.32}$$
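As a check on Eq. 12.32, the variation can be carried out term by term. Because the integrand of Eq. 12.31 contains no gradients of ρα, the functional derivative reduces to an ordinary partial derivative of the integrand, written here as f (a label introduced only for this sketch):

```latex
\begin{aligned}
f(\rho_\alpha) &= \left( \mu^0_\alpha - \mu_{\alpha 0} \right) \rho_\alpha
  + k_B T \rho_\alpha \ln \frac{\rho_\alpha}{\rho_{\alpha 0}}
  - k_B T \left( \rho_\alpha - \rho_{\alpha 0} \right), \\
\frac{\partial f}{\partial \rho_\alpha} &= \left( \mu^0_\alpha - \mu_{\alpha 0} \right)
  + k_B T \left( \ln \frac{\rho_\alpha}{\rho_{\alpha 0}} + 1 \right) - k_B T \\
 &= \mu^0_\alpha - \mu_{\alpha 0} + k_B T \ln \frac{\rho_\alpha}{\rho_{\alpha 0}}
  = \mu^{\mathrm{chem}}_\alpha .
\end{aligned}
```

The constant k_B T produced by differentiating the entropy of mixing is cancelled exactly by the relative osmotic term, so only the logarithm survives.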
Note that at equilibrium, in general, $\mu^{\mathrm{chem}}_\alpha \neq 0$ and $\rho_\alpha \neq \rho_{\alpha 0}$ because
of possible external electrical potentials, charged solutes, solvent–
solute interactions, and charged species interactions. This chemical
potential energy term can be combined with the polar and nonpolar
contributions discussed in the previous sections to give a total
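system free energy of
$$\begin{aligned}
G^{\mathrm{PNP}}_{\mathrm{total}}[S, \Phi, \{\rho_\alpha\}] = \int \Bigg\{ & \gamma |\nabla S| + pS + (1 - S)U \\
& + S \left[ -\frac{\epsilon_m}{2} |\nabla \Phi|^2 + \Phi \rho_m \right] + (1 - S) \left[ -\frac{\epsilon_s}{2} |\nabla \Phi|^2 + \Phi \sum_\alpha \rho_\alpha q_\alpha \right] \\
& + (1 - S) \sum_\alpha \left[ \left( \mu^0_\alpha - \mu_{\alpha 0} \right) \rho_\alpha + k_B T \rho_\alpha \ln \frac{\rho_\alpha}{\rho_{\alpha 0}} - k_B T \left( \rho_\alpha - \rho_{\alpha 0} \right) + \lambda_\alpha \rho_\alpha \right] \Bigg\} \, d\mathbf{r},
\end{aligned} \tag{12.33}$$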

where λα is a Lagrange multiplier, which is required to ensure
appropriate physical properties at equilibrium [188]. In this functional,
the first row is the nonpolar solvation free energy contribution, the
second row is the polar solvation free energy contribution, and the
third row is the chemical potential energy contribution. A unique aspect
of this PNP formulation is the inclusion of the nonpolar solvation free
energy contribution in the functional (see Eq. 12.1).
While electrostatic interactions provide a strong driving force
for many biomolecular phenomena, they are not the only source
of ion-ion and ion-solute interactions. In the heterogeneous envi-
ronment where biomolecules interact with a range of aqueous ions,
counterions, and other solvent molecules, electrostatic interactions
often manifest themselves in a variety of different forms related to
polarization, hyperpolarization, vibrational and rotational averages,
screening effects, etc. For example, size effects have been shown to
play an important role in macromolecular interactions [134, 190–
194]. Another important effect is the change of ion–water interac-
tions due to geometric confinement, which is commonly believed to
result in channel selectivity for sodium and/or potassium ions [134].
In past papers by Wei and co-workers, these types of interactions are
called “non-electrostatic interactions” or “generalized correlations”
[75, 134] and are incorporated into the DG-based models by
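modifying Eqs. 12.6 and 12.7:
$$\begin{aligned}
U &= \sum_\alpha \rho_\alpha U_\alpha, \\
U_\alpha &= \sum_j U_{\alpha j}(\mathbf{r}) + \sum_\beta U_{\alpha \beta}(\mathbf{r}),
\end{aligned} \tag{12.34}$$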
where the subscript β runs over all solvent components, including
ions and water. In general, we denote U α as any possible non-
electrostatic interactions in the system. The inclusion of these non-
electrostatic interactions does not change the derivation or the
form of other expressions presented in the preceding section. The
total free energy functional (Eq. 12.33) is a functional of the surface
function S, electrostatic potential Φ, and the ion concentration
ρα . The governing equations for the system are derived using the
variational principle.
We first derive the generalized Poisson equation by the variation
of the total free energy functional with respect to the electrostatic
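potential Φ. The resulting generalized Poisson equation is
$$-\nabla \cdot \left( \epsilon(S) \nabla \Phi \right) = S \rho_m + (1 - S) \sum_\alpha \rho_\alpha q_\alpha, \tag{12.35}$$
where ρm is the fixed charge density of the solute and
ε(S) = (1 − S)εs + Sεm is an interface-dependent dielectric
profile. The generalized Poisson equation (Eq. 12.35) involves the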
surface function S and the densities of ions ρα , which are to be
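determined. Variation with respect to the ion density ρα leads to the
relative generalized potential $\mu^{\mathrm{gen}}_\alpha$:
$$\begin{aligned}
\frac{\delta G^{\mathrm{PNP}}_{\mathrm{total}}}{\delta \rho_\alpha} \Rightarrow \mu^{\mathrm{gen}}_\alpha &= \mu^0_\alpha - \mu_{\alpha 0} + k_B T \ln \frac{\rho_\alpha}{\rho_{\alpha 0}} + q_\alpha \Phi + U_\alpha + \lambda_\alpha \\
&= \mu^{\mathrm{chem}}_\alpha + q_\alpha \Phi + U_\alpha + \lambda_\alpha.
\end{aligned} \tag{12.36}$$
We require $\mu^{\mathrm{gen}}_\alpha$, rather than $\mu^{\mathrm{chem}}_\alpha$, to vanish at equilibrium.
Therefore, we require
$$\lambda_\alpha = -\mu^0_\alpha, \qquad \rho_\alpha = \rho_{\alpha 0}\, e^{-\frac{q_\alpha \Phi + U_\alpha - \mu_{\alpha 0}}{k_B T}}. \tag{12.37}$$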
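Using these relations, the relative generalized chemical potential
$\mu^{\mathrm{gen}}_\alpha$ can be rewritten as:
$$\mu^{\mathrm{gen}}_\alpha = k_B T \ln \frac{\rho_\alpha}{\rho_{\alpha 0}} + q_\alpha \Phi + U_\alpha - \mu_{\alpha 0}. \tag{12.38}$$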
Wei and co-workers derived a similar quantity from a slightly
different perspective in an earlier paper [195]. Note that this
chemical potential consists of contributions from the entropy
of mixing, electrostatic potential, solvent–solute interaction, and
the position-independent reference chemical potential. For many
biomolecular transport problems, diffusion is the major mechanism
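for transport and relaxation to equilibrium. By Fick's first law,
the diffusive ion flux is $\mathbf{J}_\alpha = -D_\alpha \rho_\alpha \nabla \left( \mu^{\mathrm{gen}}_\alpha / k_B T \right)$, with Dα being the
diffusion coefficient of species α. The diffusion equation for the mass
conservation of species α in the absence of stream velocity is
$\partial \rho_\alpha / \partial t = -\nabla \cdot \mathbf{J}_\alpha$, which results in the generalized Nernst–Planck equation:
$$\frac{\partial \rho_\alpha}{\partial t} = \nabla \cdot \left[ D_\alpha \left( \nabla \rho_\alpha + \frac{\rho_\alpha}{k_B T} \nabla \left( q_\alpha \Phi + U_\alpha \right) \right) \right], \tag{12.39}$$
where qα Φ + U α is a form of the mean field potential. In the absence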
of solvent–solute interactions, Eq. 12.39 reduces to the standard
Nernst–Planck equation.
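The behavior of Eq. 12.39 can be illustrated with a small explicit finite-volume scheme in one dimension. The sketch below assumes a fixed mean field potential W = qα Φ + U α (so the Poisson equation is not solved), zero-flux walls, and a time step chosen below the explicit diffusion limit h²/(2D); the function names `nernst_planck_step` and `relax_to_equilibrium` are hypothetical.

```python
import numpy as np

def nernst_planck_step(rho, W, D, h, dt, kT=1.0):
    """One explicit step of d(rho)/dt = d/dx [ D (rho' + rho W' / kT) ] with zero-flux walls."""
    rho_face = 0.5 * (rho[1:] + rho[:-1])                      # density at interior faces
    flux = -D * ((rho[1:] - rho[:-1]) / h
                 + rho_face * (W[1:] - W[:-1]) / (kT * h))     # diffusion plus drift
    flux = np.concatenate(([0.0], flux, [0.0]))                # no flux through the walls
    return rho - dt * (flux[1:] - flux[:-1]) / h               # conservative update

def relax_to_equilibrium(rho, W, D, h, dt, n_steps, kT=1.0):
    """Repeat the explicit step; the caller is responsible for choosing a stable dt."""
    for _ in range(n_steps):
        rho = nernst_planck_step(rho, W, D, h, dt, kT)
    return rho

n_cells = 50
h = 1.0 / n_cells
x = (np.arange(n_cells) + 0.5) * h          # cell centers
W = 1.0 * x                                 # fixed linear mean field potential (assumption)
rho_init = np.ones(n_cells)
rho_final = relax_to_equilibrium(rho_init, W, D=1.0, h=h, dt=0.4 * h * h, n_steps=12500)
```

With zero-flux boundaries the scheme conserves the total number of ions, and the long-time density approaches a Boltzmann profile proportional to exp(−W/kB T), which is the discrete counterpart of the equilibrium constraint in Eq. 12.37.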
Using the Euler–Lagrange equation, one can derive an elliptic
equation for the surface function S and, introducing an artificial time
as discussed earlier in this chapter, this can be transformed into a
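parabolic equation:
$$\frac{\partial S}{\partial t} = |\nabla S| \left[ \nabla \cdot \left( \gamma \frac{\nabla S}{|\nabla S|} \right) + V_{\mathrm{PNP}} \right], \tag{12.40}$$
where the driving term is
$$\begin{aligned}
V_{\mathrm{PNP}} = & -p + U + \frac{\epsilon_m}{2} |\nabla \Phi|^2 - \Phi \rho_m - \frac{\epsilon_s}{2} |\nabla \Phi|^2 + \Phi \sum_\alpha \rho_\alpha q_\alpha \\
& + \sum_\alpha \left[ k_B T \left( \rho_\alpha \ln \frac{\rho_\alpha}{\rho_{\alpha 0}} - \rho_\alpha + \rho_{\alpha 0} \right) - \mu_{\alpha 0} \rho_\alpha \right].
\end{aligned} \tag{12.41}$$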
Equations 12.39, 12.35, and 12.40 form a coupled system of
equations describing the surface function S, charge concentrations
ρα , and electrostatic potential Φ. This coupled system differs from
the original PNP equations through the coupling of the surface
definition to charge concentrations and electrostatics. We
call this DG-based system the “Laplace–Beltrami Poisson–Nernst–
Planck” (LB-PNP) model.
In general, the total free energy functional of the DG-based PNP
model in Eq. 12.33 differs from that of the DG-based PB model
in Eq. 12.12. The difference also exists between the surface-driven
term VPNP in the charge transport model and VBP in the solvation
model. Moreover, ρα in the charge transport model is determined
by the Nernst–Planck equation (12.39) rather than the Boltzmann
factor. However, if the charge flux is zero for the electrodiffusion
system, the PNP model is known to be equivalent to the PB model
[196]. Note that at equilibrium, the relative generalized potential
vanishes everywhere, and the result is the equilibrium constraint
given in Eq. 12.37. Therefore, by using the equilibrium constraint,
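the total free energy functional in Eq. 12.33 becomes [75]
$$G^{\mathrm{PNP}}_{\mathrm{total}} \longrightarrow G^{\mathrm{PB}}_{\mathrm{total}}, \quad \text{as } \rho_\alpha \longrightarrow \rho_{\alpha 0}\, e^{-\frac{q_\alpha \Phi + U_\alpha - \mu_{\alpha 0}}{k_B T}}. \tag{12.42}$$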
This relationship shows that under the equilibrium assumption, the
total free energy functional for the charge transport model reduces
to the equilibrium solvation model presented earlier (Eq. 12.12).
Furthermore, for the surface-driven functions of the generalized
LB equation, it is easy to show [75] that under the equilibrium
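constraint, one has:
$$V_{\mathrm{PNP}} \longrightarrow V_{\mathrm{BP}}, \quad \text{as } \rho_\alpha \longrightarrow \rho_{\alpha 0}\, e^{-\frac{q_\alpha \Phi + U_\alpha - \mu_{\alpha 0}}{k_B T}}. \tag{12.43}$$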
This consistency between the DG-based PNP and PB models
is a crucial aspect of this non-equilibrium theory of charge
transport. Numerical simulations in Wei’s group have confirmed this
consistency [75].
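The reduction in Eq. 12.42 can be verified directly for the ion-dependent part of the integrand. The sketch below assumes U = Σα ρα U α (Eq. 12.34) and λα = −μ⁰α (Eq. 12.37); g is a label introduced here for the ion-dependent part of the (1 − S) terms in Eq. 12.33.

```latex
\begin{aligned}
g &= \sum_\alpha \left[ \rho_\alpha q_\alpha \Phi + \rho_\alpha U_\alpha - \mu_{\alpha 0} \rho_\alpha
   + k_B T \rho_\alpha \ln \frac{\rho_\alpha}{\rho_{\alpha 0}}
   - k_B T \left( \rho_\alpha - \rho_{\alpha 0} \right) \right], \\
&\text{and, using } k_B T \ln \frac{\rho_\alpha}{\rho_{\alpha 0}}
   = -\left( q_\alpha \Phi + U_\alpha - \mu_{\alpha 0} \right) \text{ from Eq. 12.37,} \\
g &= -k_B T \sum_\alpha \left( \rho_\alpha - \rho_{\alpha 0} \right)
   = -k_B T \sum_\alpha \rho_{\alpha 0}
     \left( e^{-\frac{q_\alpha \Phi + U_\alpha - \mu_{\alpha 0}}{k_B T}} - 1 \right).
\end{aligned}
```

The result is the Boltzmann term that appears in the solvent region of the equilibrium functional (compare the (1 − S) bracket of Eq. 12.24), so the electrostatic, interaction, and reference terms cancel against the entropy of mixing once the equilibrium density is inserted.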

12.3.2 Quantum Mechanical Charge Distributions in the Poisson–Nernst–Planck Model
As with the equilibrium solvation models introduced earlier, it is
also possible to incorporate quantum mechanical effects into the
non-equilibrium transport model. Our motivation is to account for
non-equilibrium ion fluxes and induced response in the electronic
structure of the solute or membrane protein. To this end, we
combine our DG-based DFT model with our DG-based PNP model as
illustrated in Fig. 12.4 to develop a free energy functional and derive
the associated governing equations.
Figure 12.4 An illustration of the differential geometry-based DFT-PNP
model for ion channels.

The free energy functional is a combination of four models
(nonpolar, PB, PNP, and DFT) in a manner which avoids energetic
double-counting. The total energy is minimized with respect to four
variables (S, Φ, {ρα }, and n). The resulting free energy functional has
the following form:
 
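$$\begin{aligned}
G^{\mathrm{DFT\text{-}PNP}}_{\mathrm{total}}[S, \Phi, \{\rho_\alpha\}, n] = \int \Bigg\{ & \gamma |\nabla S| + pS + (1 - S)U \\
& + S \left[ -\frac{\epsilon_m}{2} |\nabla \Phi|^2 + \rho_{\mathrm{total}} \Phi \right] + (1 - S) \left[ -\frac{\epsilon_s}{2} |\nabla \Phi|^2 + \Phi \sum_\alpha \rho_\alpha q_\alpha \right] \\
& + (1 - S) \sum_\alpha \left[ \left( \mu^0_\alpha - \mu_{\alpha 0} \right) \rho_\alpha + k_B T \rho_\alpha \ln \frac{\rho_\alpha}{\rho_{\alpha 0}} - k_B T \left( \rho_\alpha - \rho_{\alpha 0} \right) + \lambda_\alpha \rho_\alpha \right] \\
& + S \left[ \sum_j \frac{\hbar^2}{2m} |\nabla \psi_j|^2 + E_{XC}[n] \right] \Bigg\} \, d\mathbf{r},
\end{aligned} \tag{12.44}$$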
where the first row is the nonpolar solvation energy functional, the
second row is electrostatic energy density of solvation, the third row
is the chemical energy functional of solvent ions, and the last row
is the energy density of solute electrons in the DFT representation,
as explained in earlier sections. Note that this coupled form places
some restrictions on the potential U : in particular, care must be
taken to avoid double-counting dispersive and repulsive interactions
that are already accounted for in the quantum mechanical treatment.
Using this functional, the derivation of governing equations is
straightforward. For the sake of completeness, we discuss all of the
governing equations of this new model below.
As before, variation of the total free energy functional with
respect to the electrostatic potential Φ gives rise to the generalized
Poisson equation:
$$-\nabla \cdot \left( \epsilon(S) \nabla \Phi \right) = S \rho_{\mathrm{total}} + (1 - S) \sum_\alpha \rho_\alpha q_\alpha, \tag{12.45}$$
where ε(S) = (1 − S)εs + Sεm is an interface-dependent dielectric
profile. The charge sources in Eq. 12.45 are the total charge density
ρtotal of the solute molecule and the ionic density $\sum_\alpha \rho_\alpha q_\alpha$ of
aqueous species. The former is determined by DFT, while the latter is
estimated by the Nernst–Planck theory. At equilibrium (12.37), the
generalized Poisson equation (12.45) reduces to the GPB equation
given in Eq. 12.25.
The procedure for deriving the Nernst–Planck equation is the
same as discussed in the previous section. We first carry out the
variation with respect to ρα to obtain the relative generalized
potential. Next, Fick’s laws of diffusion are employed to construct
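the generalized Nernst–Planck equation:
$$\frac{\partial \rho_\alpha}{\partial t} = \nabla \cdot \left[ D_\alpha \left( \nabla \rho_\alpha + \frac{\rho_\alpha}{k_B T} \nabla \left( q_\alpha \Phi + U_\alpha \right) \right) \right]. \tag{12.46}$$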
Formally, this equation has the same form as the generalized Nernst–
Planck equation in the last section. However, to evaluate U α , possible
effects stemming from the quantum mechanical representation of
the electronic structure must be considered.
As discussed previously, variation with respect to the surface
function S leads to a generalized Laplace–Beltrami equation after
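the introduction of an artificial time:
$$\frac{\partial S}{\partial t} = |\nabla S| \left[ \nabla \cdot \left( \gamma \frac{\nabla S}{|\nabla S|} \right) + V_{\mathrm{DFT\text{-}PNP}} \right], \tag{12.47}$$
where the potential driving term is given by
$$\begin{aligned}
V_{\mathrm{DFT\text{-}PNP}} = & -p + U + \frac{\epsilon_m}{2} |\nabla \Phi|^2 - \rho_{\mathrm{total}} \Phi - \frac{\epsilon_s}{2} |\nabla \Phi|^2 + \Phi \sum_\alpha \rho_\alpha q_\alpha \\
& + \sum_\alpha \left[ k_B T \left( \rho_\alpha \ln \frac{\rho_\alpha}{\rho_{\alpha 0}} - \rho_\alpha + \rho_{\alpha 0} \right) - \mu_{\alpha 0} \rho_\alpha \right] \\
& - \sum_j \frac{\hbar^2}{2m} |\nabla \psi_j|^2 - E_{XC}[n].
\end{aligned}$$
At equilibrium (Eq. 12.37), VDFT-PNP becomes VDFT-PB . Equation 12.47
is coupled to all other quantities: Φ, ρα , and n. Fast solution of this
type of equation remains an active research issue [71, 124, 197].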
In the present multiscale DFT formalism, the governing Kohn–Sham
equation is obtained via the minimization of the energy
functional with respect to ψ*j (r), subject to the Lagrange multiplier term
$\sum_i E_i \left( \delta_{ij} - \int S\, \psi_i(\mathbf{r})\, \psi_j^*(\mathbf{r})\, d\mathbf{r} \right)$:
$$\left( -\frac{\hbar^2}{2m} \nabla^2 + q\Phi + V_{XC}[n] \right) \psi_j = E_j \psi_j. \tag{12.48}$$
Although the Kohn–Sham equation does not explicitly involve the
surface function and ion densities, the electrostatic potential energy
qΦ is calculated by the generalized Poisson equation (12.45), which is coupled
with solvent charge density and surface function. As such, electronic
response to ion fluxes in the ion channel is included in the present
model.
Equations 12.45, 12.46, 12.47, and 12.48 form a complete set
of governing equations which are strongly coupled to each other.
Therefore, these equations can be solved by nonlinear iterative
procedures [133, 134, 198] and efficient second-order algorithms
[1, 71, 72, 132].

12.4 Concluding Remarks

Geometric analysis, which combines differential geometry (DG)
with partial differential equations (PDEs), has generated great
successes in the physical sciences and engineering. In the past
decade, DG-based solvation models have been introduced for
biomolecular modeling. This new methodology has been tested
over hundreds of molecular test cases, ranging from nonpolar
molecules to large proteins. Our DG-based solvation models use
the differential geometry of surfaces theory as a natural means to
separate microscopic domains for biomolecules from macroscopic
domains for solvents and to couple continuum descriptions with
discrete atomistic or quantum representations. The goal of our DG-
based formalism is to achieve an accurate prediction of essential
physical observables while efficiently reducing the dimensionality
of complex biomolecular systems. An important technique used in
our approach is the construction of total free energy functionals
for various biomolecular systems, which enables us to put various
scales on an equal footing. Variational principles are applied to
the total energy functional to derive coupled governing PDEs for
biomolecular systems.
This chapter has focused on equilibrium and non-equilibrium
models of electrolyte solutions around biomolecules. However, the
Wei group has also extended this formalism to the multiscale
modeling of other systems and biological processes. One class
of multiscale models developed in the Wei group is a DG-based
quantum treatment of proton transport [133, 134]. Proton transport
underpins the molecular mechanisms in a variety of systems,
including transmembrane ATPases as well as other proton pumps
and translocators [199]. The significant quantum effects in proton
permeation require quantum mechanical models, while the large
number of degrees of freedom demands a multiscale treatment
[200, 201]. In the multiscale approach developed by the Wei group,
a new DFT is formulated based on Boltzmann statistics, rather
than Fermi–Dirac statistics, for protons in the solvent while treating
water molecules as a dielectric continuum. The membrane protein
is described in atomistic detail and densities of other ions in the
solvent are approximated via Boltzmann distributions, following
an approach introduced in our earlier Poisson–Boltzmann–Nernst–
Planck theory [195]. The resulting multiscale proton model provides
excellent predictions of experimental current–voltage relationships
[133, 134]. Another class of DG-based multiscale models has been
proposed by Wei et al. for alternative MM and/or continuum
elasticity (CE) description of solute molecules, as well as continuum
fluid mechanics formulation of the solvent [74–76, 202]. The
idea is to endow the DG-based multiscale paradigm with the
ability to handle excessively large macromolecules by elasticity
description, manage conformational changes with MM, and deal
with macromolecular-flow interaction via fluid mechanics. The
theory of continuum elasticity with atomic rigidity (CEWAR) also
has been introduced [202] and treats the molecular shear modulus
as a continuous function of atomic rigidity. Thus, the dynamic
complexity of integrating time-dependent governing equations for
a macromolecular system is separated from the static complexity
of determining the flexibility at a given time step. In CEWAR, the
more time-consuming dynamics is approximated using continuum
elasticity theory while the less-time-consuming static analysis is
pursued with atomic description. A recent multidomain formulation
by Wei and co-workers allows each different part of a macromole-
cule to have a different physical description [76]. Efficient geometric
modeling strategies associated with DG-based multiscale models
have been developed in both Lagrangian [203, 204] and Eulerian
representations [205]. Algorithms for curvature evaluation and
volumetric and surface meshing have been developed for organelles,
subcellular structures, and multiprotein complexes [203] and have
been combined with electrostatic analysis for the prediction of
protein-ligand binding sites [205].

Acknowledgments

This work was supported in part by National Science Foundation
grants IIS-1302285 and DMS-1160352, as well as National Institutes
of Health Grant R01GM-090208. The authors are indebted to their
collaborators who have contributed to the DG-based biomolecular
modeling.

References

1. Z. Chen, S. Zhao, J. Chun, D. G. Thomas, N. A. Baker, P. B. Bates, and
G. W. Wei. Variational approach for nonpolar solvation analysis. J. Chem.
Phys., 137(084101), 2012.
2. N. A. Baker. Biomolecular applications of Poisson-Boltzmann methods.
In K. B. Lipkowitz, R. Larter, and T. R. Cundari, editors, Reviews in
Computational Chemistry, volume 21. John Wiley and Sons, Hoboken,
NJ, 2005.
3. M. E. Davis and J. A. McCammon. Electrostatics in biomolecular
structure and dynamics. Chem. Rev., 90:509–521, 1990.
4. K. A. Sharp and B. Honig. Electrostatic interactions in macromolecules:
Theory and applications. Ann. Rev. Biophys. Biophys. Chem., 19:301–
332, 1990.
5. B. Honig and A. Nicholls. Classical electrostatics in biology and
chemistry. Science, 268(5214):1144–1149, 1995.
6. B. Roux and T. Simonson. Implicit solvent models. Biophys. Chem., 78(1-
2):1–20, 1999.
7. R. Jinnouchi and A. B. Anderson. Electronic structure calculations of
liquid-solid interfaces: Combination of density functional theory and
modified Poisson-Boltzmann theory. Phys. Rev. B, 77(245417), 2008.
8. B. N. Dominy and C. L. Brooks, III. Development of a generalized Born
model parameterization for proteins and nucleic acids. J. Phys. Chem. B,
103(18):3765–3773, 1999.
9. D. Bashford and D. A. Case. Generalized Born models of macromolecu-
lar solvation effects. Ann. Rev. Phys. Chem., 51:129–152, 2000.
10. V. Tsui and D. A. Case. Molecular dynamics simulations of nucleic
acids with a generalized Born solvation model. J. Am. Chem. Soc.,
122(11):2489–2498, 2000.
11. A. Onufriev, D. A. Case, and D. Bashford. Effective Born radii in the
generalized Born approximation: the importance of being perfect. J.
Comput. Chem., 23(14):1297–1304, 2002.
12. E. Gallicchio, L. Y. Zhang, and R. M. Levy. The SGB/NP hydration free
energy model based on the surface generalized Born solvent reaction
field and novel nonpolar hydration free energy estimators. J. Comput.
Chem., 23(5):517–529, 2002.
13. J. Zhu, E. Alexov, and B. Honig. Comparative study of generalized Born
models: Born radii and peptide folding. J. Phys. Chem. B, 109(7):3008–
3022, 2005.
14. P. Koehl. Electrostatics calculations: latest methodological advances.
Curr. Opin. Struct. Biol., 16(2):142–151, 2006.
15. H. Tjong and H. X. Zhou. GBr6NL: A generalized Born method for
accurately reproducing solvation energy of the nonlinear Poisson-
Boltzmann equation. J. Chem. Phys., 126:195102, 2007.
16. J. Mongan, C. Simmerling, J. A. McCammon, D. A. Case, and A. Onufriev.
Generalized Born model with a simple, robust molecular volume
correction. J. Chem. Theory Comput., 3(1):159–169, 2007.
17. D. Chen, G. W. Wei, X. Cong, and G. Wang. Computational methods
for optical molecular imaging. Commun. Numerical Methods Eng.,
25:1137–1161, 2009.
18. J. A. Grant, B. T. Pickup, M. T. Sykes, C. A. Kitchen, and A. Nicholls.
The Gaussian Generalized Born model: application to small molecules.
Phys. Chem. Chem. Phys., 9:4913–4922, 2007.
19. M. Chiba, D. G. Fedorov, and K. Kitaura. Polarizable continuum model
with the fragment molecular orbital-based time-dependent density
functional theory. J. Comput. Chem., 29:2667–2676, 2008.
20. J. Tomasi, B. Mennucci, and R. Cammi. Quantum mechanical continuum
solvation models. Chem. Rev., 105:2999–3093, 2005.
21. R. Improta, V. Barone, G. Scalmani, and M. J. Frisch. A state-specific
polarizable continuum model time dependent density functional
theory method for excited state calculations in solution. J. Chem. Phys.,
125:054103, 2006.
22. Y. Takano and K. N. Houk. Benchmarking the conductor-like polarizable
continuum model (CPCM) for aqueous solvation free energies of
neutral and ionic organic molecules. J. Chem. Theory Comput., 1(1):70–
77, 2005.
23. E. Cances, B. Mennucci, and J. Tomasi. A new integral equation formalism
for the polarizable continuum model: Theoretical background
and applications to isotropic and anisotropic dielectrics. J. Chem. Phys.,
107:3032–3041, 1997.
24. V. Barone, M. Cossi, and J. Tomasi. A new definition of cavities for the
computation of solvation free energies by the polarizable continuum
model. J. Chem. Phys., 107:3210–3221, 1997.
25. M. Cossi, V. Barone, R. Cammi, and J. Tomasi. Ab initio study of solvated
molecules: A new implementation of the polarizable continuum model.
Chem. Phys. Lett., 255:327–335, 1996.
26. G. Lamm. The Poisson-Boltzmann equation. In K. B. Lipkowitz,
R. Larter, and T. R. Cundari, editors, Reviews in Computational
Chemistry, pages 147–366. John Wiley and Sons, Inc., Hoboken, N.J.,
2003.
27. F. Fogolari, A. Brigo, and H. Molinari. The Poisson-Boltzmann equation
for biomolecular electrostatics: a tool for structural biology. J. Mol.
Recognit., 15(6):377–392, 2002.
28. Y. C. Zhou, M. Feig, and G. W. Wei. Highly accurate biomolecular
electrostatics in continuum dielectric environments. J. Comput. Chem.,
29:87–97, 2008.
29. N. A. Baker. Improving implicit solvent simulations: a Poisson-centric
view. Curr. Opin. Struct. Biol., 15(2):137–143, 2005.
30. D. Beglov and B. Roux. Solvation of complex molecules in a polar liquid:
an integral equation theory. J. Chem. Phys., 104(21):8678–8689, 1996.
31. R. R. Netz and H. Orland. Beyond Poisson-Boltzmann: Fluctuation
effects and correlation functions. Eur. Phys. J. E, 1(2-3):203–214, 2000.
32. C. Holm, P. Kekicheff, and R. Podgornik. Electrostatic Effects in Soft Matter
and Biophysics. NATO Science Series. Kluwer Academic Publishers, Boston, 2001.
33. L. David, R. Luo, and M. K. Gilson. Comparison of generalized Born and
Poisson models: Energetics and dynamics of HIV protease. J. Comput.
Chem., 21(4):295–309, 2000.
34. A. Onufriev, D. Bashford, and D. A. Case. Modification of the gen-
eralized Born model suitable for macromolecules. J. Phys. Chem. B,
104(15):3712–3720, 2000.
35. P. Weinzinger, S. Hannongbua, and P. Wolschann. Molecular mechanics
PBSA ligand binding energy and interaction of efavirenz derivatives
with HIV-1 reverse transcriptase. J. Enzyme Inhib. Med. Chem.,
20(2):129–134, 2005.
36. J. M. J. Swanson, R. H. Henchman, and J. A. McCammon. Revisiting
free energy calculations: A theoretical connection to MM/PBSA and
direct calculation of the association free energy. Biophys. J., 86(1):67–
74, 2004.
37. C. S. Page and P. A. Bates. Can MM-PBSA calculations predict the speci-
ficities of protein kinase inhibitors? J. Comput. Chem., 27(16):1990–
2007, 2006.
38. J. J. Tan, W. Z. Chen, and C. X. Wang. Investigating interactions
between HIV-1 gp41 and inhibitors by molecular dynamics simulation
and MM-PBSA/GBSA calculations. J. Mol. Struct.: Theochem., 766(2-
3):77–82, 2006.
39. I. Massova and P. A. Kollman. Computational Alanine Scanning To Probe
Protein-Protein Interactions: A Novel Approach To Evaluate Binding
Free Energies. J. Am. Chem. Soc., 121(36):8133–8143, 1999.
40. D. Bashford and M. Karplus. pKa’s of ionizable groups in proteins:
atomic detail from a continuum electrostatic model. Biochemistry,
29(44):10219–10225, 1990.
41. J. Antosiewicz, J. A. McCammon, and M. K. Gilson. The determinants of
pKas in proteins. Biochemistry, 35(24):7819–7833, 1996.
42. J. Li, C. L. Fisher, J. L. Chen, D. Bashford, and L. Noodleman. Calculation
of redox potentials and pKa values of hydrated transition metal cations
by a combined density functional and continuum dielectric theory.
Inorg. Chem., 35(16):4694–4702, 1996.
43. J. E. Nielsen and G. Vriend. Optimizing the hydrogen-bond network
in Poisson-Boltzmann equation-based pK(a) calculations. Proteins,
43(4):403–412, 2001.
44. C. M. MacDermaid and G. A. Kaminski. Electrostatic polarization is
crucial for reproducing pKa shifts of carboxylic residues in turkey
ovomucoid third domain. J. Phys. Chem. B, 111(30):9036–9044, 2007.
45. C. L. Tang, E. Alexov, A. M. Pyle, and B. Honig. Calculation of pKas
in RNA: On the structural origins and functional roles of protonated
nucleotides. J. Mol. Biol., 366(5):1475–1496, 2007.
46. J. E. Nielsen, K. V. Andersen, B. Honig, R. W. W. Hooft, G. Klebe, G. Vriend,
and R. C. Wade. Improving macromolecular electrostatics calculations.
Protein Eng., 12(8):657–662, 1999.
47. A. S. Yang, M. R. Gunner, R. Sampogna, K. Sharp, and B. Honig. On
the calculation of pK(a)s in proteins. Proteins-Struct. Funct. Genet.,
15(3):252–265, 1993.
48. R. E. Georgescu, E. G. Alexov, and M. R. Gunner. Combining conformational
flexibility and continuum electrostatics for calculating pKas in
proteins. Biophys. J., 83(4):1731–1748, 2002.
49. W. M. Matousek, B. Ciani, C. A. Fitch, B. E. Garcia-Moreno, R. A.
Kammerer, and A. T. Alexandrescu. Electrostatic contributions to the
stability of the GCN4 leucine zipper structure. J. Mol. Biol., 374(1):206–
219, 2007.
50. H. Li, A. D. Robertson, and J. H. Jensen. Very fast empirical prediction
and rationalization of protein pKa values. Proteins, 61(4):704–721,
2005.
51. H. Li, A. D. Robertson, and J. H. Jensen. The determinants of carboxyl
pKa values in turkey ovomucoid third domain. Proteins, 55(3):689–
704, 2004.
52. C. Tan, L. Yang, and R. Luo. How well does Poisson-Boltzmann implicit
solvent agree with explicit solvent? A quantitative analysis. J. Phys.
Chem. B, 110(37):18680–18687, 2006.
53. N. V. Prabhu, M. Panda, Q. Y. Yang, and K. A. Sharp. Explicit ion, implicit
water solvation for molecular dynamics of nucleic acids and highly
charged molecules. J. Comput. Chem., 29:1113–1130, 2008.
54. N. V. Prabhu, P. Zhu, and K. A. Sharp. Implementation and testing of
stable, fast implicit solvation in molecular dynamics using the smooth-
permittivity finite difference Poisson-Boltzmann method. J. Comput.
Chem., 25(16):2049–2064, 2004.
55. R. Luo, L. David, and M. K. Gilson. Accelerated Poisson-Boltzmann
calculations for static and dynamic systems. J. Comput. Chem.,
23(13):1244–1253, 2002.
56. Q. Lu and R. Luo. A Poisson-Boltzmann dynamics method with
nonperiodic boundary condition. J. Chem. Phys., 119(21):11035–
11047, 2003.
57. W. Geng and G. W. Wei. Multiscale molecular dynamics using the
matched interface and boundary method. J. Comput. Phys., 230(2):435–
457, 2011.
58. J. D. Madura, J. M. Briggs, R. C. Wade, M. E. Davis, B. A. Luty, A. Ilin,
J. Antosiewicz, M. K. Gilson, B. Bagheri, L. R. Scott, and J. A. McCammon.
Electrostatics and diffusion of molecules in solution: Simulations with
the University of Houston Brownian Dynamics program. Comput. Phys.
Commun., 91(1-3):57–95, 1995.
59. R. R. Gabdoulline and R. C. Wade. Brownian dynamics simulation of
protein-protein diffusional encounter. Methods: A Companion to Methods
in Enzymology, 14(3):329–341, 1998.
60. A. H. Elcock, R. R. Gabdoulline, R. C. Wade, and J. A. McCammon.
Computer simulation of protein-protein association kinetics:
Acetylcholinesterase-fasciculin. J. Mol. Biol., 291(1):149–162, 1999.
61. D. Sept, A. H. Elcock, and J. A. McCammon. Computer simulations of
actin polymerization can explain the barbed-pointed end asymmetry.
J. Mol. Biol., 294(5):1181–1189, 1999.
62. Y. Cheng, J. K. Suen, Z. Radic, S. D. Bond, M. J. Holst, and J. A.
McCammon. Continuum simulations of acetylcholine diffusion with
reaction-determined boundaries in neuromuscular junction models.
Biophys. Chem., 127(3):129–139, 2007.
63. Y. Cheng, J. K. Suen, D. Zhang, S. D. Bond, Y. Zhang, Y. Song, N. A. Baker,
C. L. Bajaj, M. J. Holst, and J. A. McCammon. Finite element analysis of
the time-dependent Smoluchowski equation for acetylcholinesterase
reaction rate calculations. Biophys. J., 92(10):3397–3406, 2007.
64. D. Zhang, J. Suen, Y. Zhang, Z. Radic, P. Taylor, M. Holst, C. Bajaj, N. A.
Baker, and J. A. McCammon. Tetrameric mouse acetylcholinesterase:
Continuum diffusion rate calculations by solving the steady-state
Smoluchowski equation using finite element methods. Biophys. J.,
88(3):1659–1665, 2005.
65. Y. Song, Y. Zhang, C. L. Bajaj, and N. A. Baker. Continuum dif-
fusion reaction rate calculations of wild-type and mutant mouse
acetylcholinesterase: Adaptive finite element analysis. Biophys. J.,
87(3):1558–1566, 2004.
66. Y. Song, Y. Zhang, T. Shen, C. L. Bajaj, J. A. McCammon, and N. A. Baker.
Finite element solution of the steady-state Smoluchowski equation for
rate constant calculations. Biophys. J., 86(4):2017–2029, 2004.
67. J. Warwicker and H. C. Watson. Calculation of the electric potential in
the active site cleft due to alpha-helix dipoles. J. Mol. Biol., 157(4):671–
679, 1982.
68. D. Petrey and B. Honig. GRASP2: Visualization, surface properties, and
electrostatics of macromolecular structures and sequences. Methods
Enzymol., 374:492–509, 2003.
69. N. A. Baker and J. A. McCammon. Electrostatic interactions. In P. Bourne
and H. Weissig, editors, Structural Bioinformatics, pp. 427–440. John
Wiley & Sons, Inc., New York, 2003.
70. N. A. Baker. Poisson-Boltzmann methods for biomolecular electrostat-
ics. Methods Enzymol., 383:94–118, 2004.
71. Z. Chen, N. A. Baker, and G. W. Wei. Differential geometry based
solvation models I: Eulerian formulation. J. Comput. Phys., 229:8231–
8258, 2010.
72. Z. Chen, N. A. Baker, and G. W. Wei. Differential geometry based
solvation models II: Lagrangian formulation. J. Math. Biol., 63:1139–
1200, 2011.
73. P. Ren, J. Chun, D. G. Thomas, M. J. Schnieders, M. Marucho, J. Zhang, and
N. A. Baker. Biomolecular electrostatics and solvation: a computational
perspective. Quart. Rev. Biophys., 2013.
74. G. W. Wei. Differential geometry based multiscale models. Bull. Math.
Biol., 72:1562–1622, 2010.
75. G.-W. Wei, Q. Zheng, Z. Chen, and K. Xia. Variational multiscale models
for charge transport. SIAM Rev., 54(4):699–754, 2012.
76. G.-W. Wei. Multiscale, multiphysics and multidomain models I: Basic
theory. J. Theor. Comput. Chem., 12(8):1341006, 2013.
77. J. A. Grant, B. T. Pickup, and A. Nicholls. A smooth permittivity
function for Poisson-Boltzmann solvation methods. J. Comput. Chem.,
22(6):608–640, 2001.
78. J. Grant and B. Pickup. A Gaussian description of molecular shape. J.
Phys. Chem., 99:3503–3510, 1995.
79. B. Lee and F. M. Richards. The interpretation of protein structures:
estimation of static accessibility. J. Mol. Biol., 55(3):379–400, 1971.
80. F. M. Richards. Areas, volumes, packing, and protein structure. Ann. Rev.
Biophys. Bioeng., 6(1):151–176, 1977.
81. R. S. Spolar, J. H. Ha, and M. T. Record Jr. Hydrophobic effect in protein
folding and other noncovalent processes involving proteins. Proc. Natl.
Acad. Sci. U. S. A., 86(21):8382–8385, 1989.
82. J. R. Livingstone, R. S. Spolar, and M. T. Record Jr. Contribution to
the thermodynamics of protein folding from the reduction in water-
accessible nonpolar surface area. Biochemistry, 30(17):4237–4244,
1991.
83. P. B. Crowley and A. Golovin. Cation-pi interactions in protein-
protein interfaces. Proteins: Struct. Funct. Bioinformatics, 59(2):231–
239, 2005.
84. L. A. Kuhn, M. A. Siani, M. E. Pique, C. L. Fisher, E. D. Getzoff, and J. A.
Tainer. The interdependence of protein surface topography and bound
water molecules revealed by surface accessibility and fractal density
measures. J. Mol. Biol., 228(1):13–22, 1992.
85. C. A. S. Bergstrom, M. Strafford, L. Lazorova, A. Avdeef, K. Luthman,
and P. Artursson. Absorption classification of oral drugs based on
molecular surface properties. J. Med. Chem., 46(4):558–570, 2003.
86. A. I. Dragan, C. M. Read, E. N. Makeyeva, E. I. Milgotina, M. E. Churchill,
C. Crane-Robinson, and P. L. Privalov. DNA binding and bending by HMG
boxes: Energetic determinants of specificity. J. Mol. Biol., 343(2):371–
393, 2004.
87. R. M. Jackson and M. J. Sternberg. A continuum model for protein-
protein interactions: Application to the docking problem. J. Mol. Biol.,
250(2):258–275, 1995.
88. V. J. Licata and N. M. Allewell. Functionally linked hydration changes in
Escherichia coli aspartate transcarbamylase and its catalytic subunit.
Biochemistry, 36(33):10161–10167, 1997.
89. F. Dong, M. Vijaykumar, and H. X. Zhou. Comparison of calculation
and experiment implicates significant electrostatic contributions to
the binding stability of barnase and barstar. Biophys. J., 85(1):49–60,
2003.
90. F. Dong and H. X. Zhou. Electrostatic contribution to the binding
stability of protein-protein complexes. Proteins, 65(1):87–102, 2006.
91. M. Nina, W. Im, and B. Roux. Optimized atomic radii for protein
continuum electrostatics solvation forces. Biophys. Chem., 78(1-2):89–
96, 1999.
92. J. M. J. Swanson, J. Mongan, and J. A. McCammon. Limitations of atom-
centered dielectric functions in implicit solvent models. J. Phys. Chem.
B, 109(31):14769–14772, 2005.
93. X. Feng and A. Prohl. Analysis of a fully discrete finite element method
for the phase field model and approximation of its sharp interface
limits. Math. Comput., 73:541–567, 2004.
94. J. Gomes and O. D. Faugeras. Using the vector distance functions to
evolve manifolds of arbitrary codimension. Lect. Notes Comput. Sci.,
2106:1–13, 2001.
95. K. Mikula and D. Sevcovic. A direct method for solving an anisotropic
mean curvature flow of plane curves with an external force. Math.
Methods Appl. Sci., 27(13):1545–1565, 2004.
96. S. Osher and R. P. Fedkiw. Level set methods: An overview and some
recent results. J. Comput. Phys., 169(2):463–502, 2001.
97. A. Sarti, R. Malladi, and J. A. Sethian. Subjective surfaces: A geometric
model for boundary completion. Int. J. Comput. Vis., 46(3):201–221,
2002.
98. C. Sbert and A. F. Solé. 3D curves reconstruction based on deformable
models. J. Math. Imaging Vis., 18(3):211–223, 2003.
99. J. A. Sethian. Evolution, implementation, and application of level set
and fast marching methods for advancing fronts. J. Comput. Phys.,
169(2):503–555, 2001.
100. N. Sochen, R. Kimmel, and R. Malladi. A general framework for low level
vision. IEEE Trans. Image Process., 7(3):310–318, 1998.
101. Y. Zhang, G. Xu, and C. Bajaj. Quality meshing of implicit solvation
models of biomolecular structures. Comput. Aided Geometric Des.,
23(6):510–530, 2006.
102. T. J. Willmore. Riemannian Geometry. Oxford University Press, USA,
1997.
103. S. Osher and J. A. Sethian. Fronts propagating with curvature-dependent
speed: algorithms based on Hamilton-Jacobi formulations.
J. Comput. Phys., 79(1):12–49, 1988.
104. L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based
noise removal algorithms. In Proceedings of the eleventh annual
international conference of the Center for Nonlinear Studies on
Experimental mathematics: computational issues in nonlinear science,
pages 259–268, Amsterdam, The Netherlands, 1992.
Elsevier North-Holland, Inc.
105. T. Cecil. A numerical method for computing minimal surfaces in
arbitrary dimension. J. Comput. Phys., 206(2):650–660, 2005.
106. D. L. Chopp. Computing minimal surfaces via level set curvature flow. J.
Comput. Phys., 106(1):77–91, 1993.
107. P. Smereka. Semi-implicit level set methods for curvature and surface
diffusion motion. J. Sci. Comput., 19(1):439–456, 2003.
108. D. Mumford and J. Shah. Optimal approximations by piecewise smooth
functions and associated variational problems. Commun. Pure Appl.
Math., 42(5):577–685, 1989.
109. P. Blomgren and T. F. Chan. Color TV: total variation methods for
restoration of vector-valued images. IEEE Trans. Image Process.,
7(3):304–309, 1998.
110. V. Caselles, R. Kimmel, and G. Sapiro. Geodesic active contours. Int.
J. Comput. Vis., 22:61–79, 1997.
111. Y. Li and F. Santosa. A computational algorithm for minimizing total
variation in image restoration. IEEE Trans. Image Process., 5(6):987–
995, 1996.
112. S. Osher and L. I. Rudin. Feature-oriented image enhancement using
shock filters. SIAM J. Numerical Anal., 27(4):919–940, 1990.
113. G. Sapiro and D. L. Ringach. Anisotropic diffusion of multivalued
images with applications to color filtering. IEEE Trans. Image Process.,
5(11):1582–1586, 1996.
114. G. W. Wei. Generalized Perona-Malik equation for image restoration.
IEEE Signal Process. Lett., 6:165–167, 1999.
115. G. W. Wei and Y. Q. Jia. Synchronization-based image edge detection.
Europhys. Lett., 59(6):814–819, 2002.
116. Y. Wang, G. W. Wei, and S.-Y. Yang. Mode decomposition evolution
equations. J. Sci. Comput., 50:495–518, 2012.
117. Y. Wang, G. W. Wei, and S.-Y. Yang. Partial differential equation
transform–Variational formulation and Fourier analysis. Int. J. Numer-
ical Methods Biomed. Eng., 27:1996–2020, 2011.
118. Y. Wang, G. W. Wei, and S.-Y. Yang. Selective extraction of entangled tex-
tures via adaptive PDE transform. Int. J. Biomed. Imaging, 2012:Article
ID 958142, 2012.
119. Q. Zheng, S. Y. Yang, and G. W. Wei. Molecular surface generation using
PDE transform. Int. J. Numerical Methods Biomed. Eng., 28:291–316,
2012.
120. G. W. Wei, Y. H. Sun, Y. C. Zhou, and M. Feig. Molecular multiresolution
surfaces. arXiv:math-ph/0511001v1, pp. 1–11, 2005.
121. P. W. Bates, G. W. Wei, and S. Zhao. The minimal molecular surface.
arXiv:q-bio/0610038v1, [q-bio.BM], 2006.
122. P. W. Bates, G. W. Wei, and S. Zhao. The minimal molecular
surface. Midwest Quantitative Biology Conference, Mission Point Resort,
Mackinac Island, MI:September 29–October 1, 2006.
123. P. W. Bates, G. W. Wei, and S. Zhao. Minimal molecular surfaces and
their applications. J. Comput. Chem., 29(3):380–391, 2008.
124. P. W. Bates, Z. Chen, Y. H. Sun, G. W. Wei, and S. Zhao. Geometric and
potential driving formation and evolution of biomolecular surfaces. J.
Math. Biol., 59:193–231, 2009.
125. L. T. Cheng, J. Dzubiella, J. A. McCammon, and B. Li. Application of
the level-set method to the implicit solvation of nonpolar molecules.
J. Chem. Phys., 127(8), 2007.
126. Z. Y. Yu and C. Bajaj. Computational approaches for automatic
structural analysis of large biomolecular complexes. IEEE/ACM Trans.
Comput. Biol. Bioinform., 5:568–582, 2008.
127. S. Zhao. Pseudo-time-coupled nonlinear models for biomolecular
surface representation and solvation analysis. Int. J. Numerical Methods
Biomed. Eng., 27:1964–1981, 2011.
128. S. Zhao. Operator splitting ADI schemes for pseudo-time coupled
nonlinear solvation simulations. J. Comput. Phys., 257:1000–1021,
2014.
129. K. A. Sharp and B. Honig. Calculating total electrostatic energies with
the nonlinear Poisson-Boltzmann equation. J. Phys. Chem., 94:7684–
7692, 1990.
130. M. K. Gilson, M. E. Davis, B. A. Luty, and J. A. McCammon. Computation
of electrostatic forces on solvated molecules using the Poisson-
Boltzmann equation. J. Phys. Chem., 97(14):3591–3600, 1993.
131. J. Dzubiella, J. M. J. Swanson, and J. A. McCammon. Coupling
hydrophobicity, dispersion, and electrostatics in continuum solvent
models. Phys. Rev. Lett., 96:087802, 2006.
132. Z. Chen and G. W. Wei. Differential geometry based solvation models
III: Quantum formulation. J. Chem. Phys., 135:194108, 2011.
133. D. Chen, Z. Chen, and G. W. Wei. Quantum dynamics in continuum
for proton transport II: Variational solvent-solute interface. Int. J.
Numerical Methods Biomed. Eng., 28:25–51, 2012.
134. D. Chen and G. W. Wei. Quantum dynamics in continuum for proton
transport—Generalized correlation. J. Chem. Phys., 136:134109, 2012.
135. M. Daily, J. Chun, A. Heredia-Langner, G. W. Wei, and N. A. Baker.
Origin of parameter degeneracy and molecular shape relationships in
geometric-flow calculations of solvation free energies. J. Chem. Phys.,
139:204108, 2013.
136. D. G. Thomas, J. Chun, Z. Chen, G. W. Wei, and N. A. Baker. Parameterization
of a geometric flow implicit solvation model. J. Comput. Chem.,
34:687–695, 2013.
137. J. Dzubiella and J. P. Hansen. Reduction of the hydrophobic attraction
between charged solutes in water. J. Chem. Phys., 119(23):12049–
12052, 2003.
138. J. Dzubiella and J.-P. Hansen. Competition of hydrophobic and
coulombic interactions between nanosized solutes. J. Chem. Phys.,
121(11):5514–5530, September 2004.
139. J. Dzubiella, J. M. J. Swanson, and J. A. McCammon. Coupling nonpolar
and polar solvation free energies in implicit solvent models. J. Chem.
Phys., 124:084905, 2006.
140. J. Che, J. Dzubiella, B. Li, and J. A. McCammon. Electrostatic free
energy and its variations in implicit solvent models. J. Phys. Chem. B,
112(10):3058–3069, 2008.
141. L.-T. Cheng, Y. Xie, J. Dzubiella, J. A. McCammon, J. Che, and
B. Li. Coupling the level-set method with molecular mechanics for
variational implicit solvation of nonpolar molecules. J. Chem. Theory
Comput., 5:257–266, 2009.
142. B. Li and Y. Zhao. Variational implicit solvation with solute molecular
mechanics: From Diffuse-Interface to Sharp-Interface models. SIAM J.
Appl. Math., 73(1):1–23, 2013.
143. P. Setny, Z. Wang, L. T. Cheng, B. Li, J. A. McCammon, and J. Dzubiella.
Dewetting-Controlled binding of ligands to hydrophobic pockets. Phys.
Rev. Lett., 103(18):187801, October 2009.
144. L.-T. Cheng, Z. Wang, P. Setny, J. Dzubiella, B. Li, and J. A. McCammon.
Interfaces and hydrophobic interactions in receptor-ligand systems:
A level-set variational implicit solvent approach. J. Chem. Phys.,
131(14):144102, October 2009.
145. S. Zhou, K. E. Rogers, C. A. de Oliveira, R. Baron, L.-T. Cheng, J. Dzubiella,
B. Li, and J. A. McCammon. Variational Implicit-Solvent modeling of
Host-Guest binding: A case study on cucurbit[7]uril. J. Chem. Theory
Comput., 9(9):4195–4204, August 2013.
146. M. L. Connolly. Depth buffer algorithms for molecular modeling. J. Mol.
Graph., 3:19–24, 1985.
147. M. F. Sanner, A. J. Olson, and J. C. Spehner. Reduced surface: An efficient
way to compute molecular surfaces. Biopolymers, 38:305–320, 1996.
148. F. H. Stillinger. Structure in aqueous solutions of nonpolar solutes from
the standpoint of scaled-particle theory. J. Solut. Chem., 2:141–158,
1973.
149. R. A. Pierotti. A scaled particle theory of aqueous and nonaqueous
solutions. Chem. Rev., 76(6):717–726, 1976.
150. J. A. Wagoner and N. A. Baker. Assessing implicit models for nonpolar
mean solvation forces: the importance of dispersion and volume terms.
Proc. Natl. Acad. Sci. U. S. A., 103(22):8331–8336, 2006.
151. H. Federer. Curvature Measures. Trans. Am. Math. Soc., 93:418–491,
1959.
152. J. D. Weeks, D. Chandler, and H. C. Andersen. Role of repulsive forces in
determining the equilibrium structure of simple liquids. J. Chem. Phys.,
54(12):5237–5247, 1971.
153. I. Borukhov and D. Andelman. Steric effects in electrolytes: A modified
Poisson-Boltzmann equation. Phys. Rev. Lett., 79(3):435–438, 1997.
154. E. Gallicchio, M. M. Kubo, and R. M. Levy. Enthalpy-entropy and cavity
decomposition of alkane hydration free energies: Numerical results
and implications for theories of hydrophobic solvation. J. Phys. Chem.
B, 104(26):6271–6285, 2000.
155. S. Cabani, P. Gianni, V. Mollica, and L. Lepori. Group Contributions to
the Thermodynamic Properties of Non-Ionic Organic Solutes in Dilute
Aqueous Solution. J. Solut. Chem., 10(8):563–595, 1981.
156. Y. Mei, C. G. Ji, and J. Z. H. Zhang. A new quantum method for
electrostatic solvation energy of protein. J. Chem. Phys., 125:094906,
2006.
157. S. Zhao and G. W. Wei. High-order FDTD methods via derivative
matching for Maxwell’s equations with material interfaces. J. Comput.
Phys., 200(1):60–103, 2004.
158. Y. C. Zhou, S. Zhao, M. Feig, and G. W. Wei. High order matched interface
and boundary method for elliptic equations with discontinuous
coefficients and singular sources. J. Comput. Phys., 213(1):1–30, 2006.
159. Y. C. Zhou and G. W. Wei. On the fictitious-domain and interpolation
formulations of the matched interface and boundary (MIB) method. J.
Comput. Phys., 219(1):228–246, 2006.
160. S. N. Yu, Y. C. Zhou, and G. W. Wei. Matched interface and boundary
(MIB) method for elliptic problems with sharp-edged interfaces. J.
Comput. Phys., 224(2):729–756, 2007.
161. S. N. Yu and G. W. Wei. Three-dimensional matched interface and
boundary (MIB) method for treating geometric singularities. J. Comput.
Phys., 227:602–632, 2007.
162. J. W. Ponder, C. J. Wu, P. Y. Ren, V. S. Pande, J. D. Chodera, M. J. Schnieders,
I. Haque, D. L. Mobley, D. S. Lambrecht, R. A. DiStasio, M. Head-Gordon,
G. N. I. Clark, M. E. Johnson, and T. Head-Gordon. Current status of the
AMOEBA polarizable force field. J. Phys. Chem. B, 114:2549–2564, 2010.
163. A. Grossfield, P. Y. Ren, and J. W. Ponder. Ion solvation thermodynamics
from simulation with a polarizable force field. J. Am. Chem. Soc.,
125:15671–15682, 2003.
164. M. J. Schnieders, N. A. Baker, P. Ren, and J. W. Ponder. Polarizable atomic
multipole solutes in a Poisson-Boltzmann continuum. J. Chem. Phys.,
126:124114, 2007.
165. T. S. Lee, D. M. York, and W. Yang. Linear-scaling semiempirical quan-
tum calculations for macromolecules. J. Chem. Phys., 105(7):2744–
2750, 1996.
166. W. T. Yang. Gradient correction in Thomas-Fermi theory. Phys. Rev. A,
34(6):4575–4585, Dec 1986.
167. W. T. Yang. Direct calculation of electron density in density-functional
theory. Phys. Rev. Lett., 66:1438–1441, 1991.
168. W. T. Yang. A local projection method for the linear combination of
atomic orbital implementation of density-functional theory. J. Chem.
Phys., 94(2):1208–1214, Jan 1991.
169. W. T. Yang. Direct calculation of electron-density in density-functional
theory. Phys. Rev. Lett., 66(11):1438–1441, Mar 1991.
170. W. T. Yang. Ab initio approach for many-electron systems without
invoking orbitals: An integral formulation of density-functional theory.
Phys. Rev. Lett., 59(14):1569–1572, Oct 1987.
171. S. Goedecker. Linear scaling electronic structure methods. Rev. Mod.
Phys., 71:1085–1123, 1999.
172. W. T. Yang. Ab initio approach for many-electron systems without
invoking orbitals: An integral formulation of density-functional theory.
Phys. Rev. A, 38(11):5494–5503, Dec 1988.
173. D. J. Tannor, B. Marten, R. Murphy, R. A. Friesner, D. Sitkoff, A. Nicholls,
M. Ringnalda, W. A. Goddard, and B. Honig. Accurate first principles
calculation of molecular charge distribution and solvation energies
from ab initio quantum mechanics and continuum dielectric theory. J.
Am. Chem. Soc., 116:11875–11882, 1994.
174. M. L. Wang and C. F. Wong. Calculation of solvation free energy from
quantum mechanical charge density and continuum dielectric theory.
J. Phys. Chem. A, 110:4873–4879, 2006.
175. J. L. Chen, L. Noodleman, D. A. Case, and D. Bashford. Incorporating solvation
effects into density functional electronic structure calculations.
J. Phys. Chem., 98:11059–11068, 1994.
176. V. Gogonea and K. M. Merz. Fully quantum mechanical description of
proteins in solution. Combining linear scaling quantum mechanical
methodologies with the Poisson-Boltzmann equation. J. Phys. Chem. A,
103:5171–5188, 1999.
177. W. Geng, S. Yu, and G. W. Wei. Treatment of charge singularities in
implicit solvent models. J. Chem. Phys., 127:114106, 2007.
178. E. Sigfridsson and U. Ryde. Comparison of methods for deriving atomic
charges from the electrostatic potential and moments. J. Comput.
Chem., 19(4):377–395, 1998.
179. H. Hu, Z. Y. Lu, and W. T. Yang. Fitting molecular electrostatic potentials
from quantum mechanical calculations. J. Chem. Theory Comput.,
3:1004–1013, 2007.
180. W. Im and B. Roux. Ion permeation and selectivity of OmpF porin: A
theoretical study based on molecular dynamics, Brownian dynamics,
and continuum electrodiffusion theory. J. Mol. Biol., 322:851–869,
2002.
181. D. Gillespie, W. Nonner, and R. S. Eisenberg. Density functional theory
of charged, hard-sphere fluids. Phys. Rev. E, 68:031503, 2003.
182. B. S. Eisenberg and D. Chen. Poisson-Nernst-Planck (PNP) theory of an
open ionic channel. Biophys. J., 64:A22, 1993.
183. M. G. Kurnikova, R. D. Coalson, P. Graf, and A. Nitzan. A lattice relaxation
algorithm for Three-Dimensional Poisson-Nernst-Planck theory with
application to ion transport through the Gramicidin A channel. Biophys.
J., 76:642–656, 1999.
184. H. Daiguji, P. Yang, and A. Majumdar. Ion transport in nanofluidic
channels. Nano Lett., 4(1):137–142, 2004.
185. J. Cervera, B. Schiedt, and P. Ramirez. A Poisson/Nernst-Planck model for
ionic transport through synthetic conical nanopores. EPL (Europhys.
Lett.), 71(1):35, 2005.
186. U. Hollerbach, D. P. Chen, and R. S. Eisenberg. Two- and three-
dimensional Poisson–Nernst–Planck simulations of current flow
through gramicidin A. J. Sci. Comput., 16(4):373–409, 2001.
187. R. D. Coalson and M. G. Kurnikova. Poisson-Nernst-Planck theory
approach to the calculation of current through biological ion channels.
IEEE Trans. NanoBiosci., 4(1):81–93, 2005.
188. F. Fogolari and J. M. Briggs. On the variational approach to Poisson-
Boltzmann free energies. Chem. Phys. Lett., 281:135–139, 1997.
189. M. Manciu and E. Ruckenstein. On the chemical free energy of the
electrical double layer. Langmuir, 19(4):1114–1120, 2003.
190. Y. Hyon, B. S. Eisenberg, and C. Liu. A mathematical model for the
hard sphere repulsion in ionic solution. Commun. Math. Sci., 9:459–
475, 2011.
191. M. Z. Bazant, M. S. Kilic, B. D. Storey, and A. Ajdari. Towards an
understanding of induced-charge electrokinetics at large applied
voltages in concentrated solutions. Adv. Colloid Interface Sci., 152:48–
88, 2009.
192. Y. Levin. Electrostatic correlations: from plasma to biology. Rep. Prog.
Phys., 65:1577–1632, 2002.
193. P. Grochowski and J. Trylska. Continuum molecular electrostatics, salt
effects, and counterion binding: A review of the Poisson–Boltzmann
theory and its modifications. Biopolymers, 89(2):93–113, 2008.
194. V. Vlachy. Ionic effects beyond Poisson-Boltzmann theory. Annu. Rev.
Phys. Chem., 50:145–165, 1999.
195. Q. Zheng and G. W. Wei. Poisson-Boltzmann-Nernst-Planck model. J.
Chem. Phys., 134:194101, 2011.
196. B. Roux, T. Allen, S. Berneche, and W. Im. Theoretical and computational
models of biological ion channels. Q. Rev. Biophys., 7(1):1–103,
2004.
197. W. F. Tian and S. Zhao. A fast ADI algorithm for geometric flow
equations in biomolecular surface generations. Int. J. Numerical
Methods Biomed. Eng., 2014.
198. D. Chen and G. W. Wei. Quantum dynamics in continuum for proton
transport I: Basic formulation. Commun. Comput. Phys., 13:285–324,
2013.
199. H. N. Chen, Y. J. Wu, and G. A. Voth. Proton transport behavior through
the influenza A M2 channel: Insights from molecular simulation.
Biophys. J., 93:3470–3479, 2007.
200. J. F. Nagle and H. J. Morowitz. Molecular mechanisms for proton
transport in membranes. Proc. Natl. Acad. Sci. U.S.A., 1458(72):298–
302, 1978.
201. R. Pomes and B. Roux. Structure and dynamics of a proton wire: A
theoretical study of H+ translocation along the single-file water chain
in the gramicidin A channel. Biophys. J., 71:19–39, 2002.
202. K. L. Xia, K. Opron, and G. W. Wei. Multiscale multiphysics and mul-
tidomain models— Flexibility and rigidity. J. Chem. Phys., 139:194109,
2013.
203. X. Feng, K. Xia, Y. Tong, and G.-W. Wei. Geometric modeling of
subcellular structures, organelles and large multiprotein complexes.
Int. J. Numerical Methods Biomed. Engineering, 28:1198–1223, 2012.
204. X. Feng, K. L. Xia, Y. Y. Tong, and G. W. Wei. Multiscale geometric
modeling of macromolecules II: lagrangian representation. J. Comput.
Chem., 34:2100–2120, 2013.
205. K. L. Xia, X. Feng, Y. Y. Tong, and G. W. Wei. Multiscale geometric
modeling of macromolecules. J. Comput. Phys., 275:912–936, 2014.
January 27, 2016 15:46 PSP Book - 9in x 6in 13-Qiang-Cui-c13

SECTION III

COARSE-GRAINED MODELS

Chapter 13

A Physics-Based Coarse-Grained Model with Electric Multipoles

Guohui Li and Hujun Shen


Laboratory of Molecular Modeling and Design,
State Key Lab of Molecular Reaction Dynamics,
Dalian Institute of Chemical Physics, Chinese Academy of Sciences,
457 Zhongshan Rd., Dalian, Liaoning 116023, P. R. China
[email protected]

13.1 Introduction

Molecular dynamics (MD) simulation has enjoyed considerable success
in modeling proteins because MD methodology based on molecular
mechanics (MM) enables us to probe the motions of proteins at the
atomistic level [1–4]. With improved force fields and increased
computational power, MD simulations can describe protein motions
accurately and efficiently. MD simulation has therefore been widely
accepted as a key complement to experimental techniques such as
nuclear magnetic resonance (NMR) and X-ray crystallography, which
provide only limited dynamical information about proteins.

Many-Body Effects and Electrostatics in Biomolecules


Edited by Qiang Cui, Markus Meuwly, and Pengyu Ren
Copyright © 2016 Pan Stanford Publishing Pte. Ltd.
ISBN 978-981-4613-92-7 (Hardcover), 978-981-4613-93-4 (eBook)
www.panstanford.com
Despite this success, atomistic MD simulations cannot reach many
biological processes, which occur on timescales ranging from
milliseconds to hours and beyond: an atomistic MD simulation of a
typical protein can barely reach the microsecond timescale, and only
at the cost of large computational resources [5–7]. Two approaches
can extend MM-based MD simulation to phenomena on much longer
timescales: developing powerful sampling techniques, and developing
efficient and accurate coarse-grained (CG) models [8–10]. In this
chapter, we focus on coarse-grained models, in which a few atoms in a
protein are grouped into a “super-atom” that serves as an interaction
center. The resulting reduction in the number of degrees of freedom,
together with the longer integration time step made possible by
neglecting fast local fluctuations, speeds up coarse-grained MD
simulations and makes coarse-grained models suitable for probing
large conformational changes of complex biomolecules. Many protein
functions are associated with such large conformational changes,
which are beyond the reach of atomistic models, as noted above.
The performance of a coarse-grained force field depends heavily
on how accurately its CG potentials describe the interactions
between coarse-grained particles. CG potentials can be derived and
parameterized with either knowledge-based or physics-based
approaches. In a knowledge-based approach, the interactions between
CG particles are usually assumed to be pair-wise, so that the
potentials can be derived through a Boltzmann-based treatment of
probability distributions constructed from experimental protein
structures deposited in a database such as the Protein Data Bank
(PDB). Following the pioneering work of Scheraga and coworkers [11],
a variety of knowledge-based potentials have been proposed and
improved for predicting protein structures. To take solvent effects
[12] into account, Miyazawa and
Jernigan [13] applied the quasi-chemical approximation to derive
effective inter-residue contact energies from the contact numbers
observed in crystal structures of globular proteins. Wang et al. [14]
reduced the 20×20 Miyazawa–Jernigan (MJ) matrix to a 5×5 matrix
after determining the minimal number of residue types required to
form a well-structured protein. In the protein model proposed by
Wilson and Doniach [15], the backbone is regarded as a freely
rotating rigid chain; the side-chain atoms of each amino acid residue
are treated as a super-atom attached to the backbone chain; and the
attractive and repulsive interactions between CG particles are
described by knowledge-based potentials derived from experimental
structures in the Protein Data Bank. Godzik and Skolnick [16]
introduced a knowledge-based model that considers the pattern of
non-bonded interactions between side chains as well as the
contributions from one-, two-, and three-body interactions. Bryant
and Lawrence [17] presented a knowledge-based potential that measures
hydrophobic and pair-wise interaction energies for proteins in an
average way; they suggested that “threading” different sequences
through backbone folding motifs might be an effective way to derive
accurate knowledge-based potentials for structure prediction.
Over the past decade, these knowledge-based models have enjoyed
considerable success in modeling proteins, particularly in predicting
protein structures. However, they have been criticized for their
debatable theoretical basis and questionable physical meaning
[18–20]. First, the simple treatment of many-body correlations in
knowledge-based approaches might be inaccurate or even incorrect
[18], since the contributions from packing effects and chain
connectivity [19] are nontrivial. Second, the Boltzmann-based
treatment of statistical distributions from experimental protein
structures might give an inaccurate dynamical description of a
single protein fluctuating around its equilibrium structure [19, 21].
The strategy for developing a physics-based CG model for proteins,
which was originally suggested by Arieh Warshel and Michael
Levitt [22], is similar to that for developing an atomistic model.
Specifically, physics-based CG potentials are parameterized against
experimental data, quantum mechanics (QM) calculations, and/or
atomistic simulations, using either the Boltzmann inversion method
[23] or force matching [24]. Physics-based potentials thus describe
the bonded and non-bonded interactions between CG particles
according to physical principles, and in this respect the
physics-based approach should have an advantage over the
knowledge-based approach. Over the past decade, major advances have
been made in developing and improving physics-based CG models, and
many successful applications to biological systems have been
reported. Given the limited space, we present only a few typical
physics-based CG models here. A very simple one-bead CG model, also
known as a minimalist model, was developed by Tozzini and McCammon
[25]. In this model, each amino acid is represented by a bead
centered at the alpha carbon (Cα), and the empirical energy function
is a sum of terms describing bond stretching, angle bending,
dihedral torsion, van der Waals interactions, and electrostatic
interactions. The model has been used to simulate the flap opening
of HIV-1 protease [26] as well as substrate binding to HIV-1
protease and the product release pathways [27]. In the off-lattice
minimalist model developed by Head-Gordon and coworkers [28], each
CG bead belongs to one of three types: hydrophobic, hydrophilic, or
neutral. Experimental protein structures containing α-helical,
β-sheet, and mixed α/β topologies were used to parameterize the
angle and dihedral energy terms, and the model has proved suitable
for modeling protein folding.
However, local and non-local interactions between side chains
are essential to protein folding and protein assembly. To improve
the performance of a coarse-grained model, it is necessary to
describe the interactions involving side chains explicitly. One
example of a more sophisticated physics-based CG model is the
united residue (UNRES) coarse-grained model developed by Scheraga
and coworkers [29, 30]; in this model,
side chains of amino acids are treated as ellipsoids while peptide
groups are treated as spherical beads, and the ability of the UNRES
model to describe protein folding dynamics relies heavily on its
treatment of side chain-backbone and side chain-side chain
interactions. Another example is the MARTINI coarse-grained model
proposed by Marrink and coworkers, who employ a four-to-one mapping
scheme for biomolecules (on average, four non-hydrogen atoms are
clustered into one coarse-grained bead) [31]. For amino acids with
an aromatic ring, such as phenylalanine (Phe), tyrosine (Tyr), and
tryptophan (Trp), the ring-like fragments are mapped at a slightly
higher resolution (a three-to-one or two-to-one scheme). These
features make the MARTINI CG model effective in modeling
protein-protein and protein-lipid interactions [32]. In addition,
coupling MARTINI to an elastic network model [33, 34] or to
atomistic models [35] has recently been suggested as a way to
improve its accuracy in modeling proteins. Feig and coworkers [36]
have proposed PRIMO (protein intermediate model), a CG model for
proteins with higher resolution than the UNRES and MARTINI models.
Its empirical energy function includes a hydrogen-bonding term in
addition to the typical bonded and non-bonded terms mentioned for
the minimalist model, and PRIMO performs well in modeling protein
folding and protein dynamics. Because much of the computational
cost of modeling proteins comes from the interactions involving
water molecules, Wu and coworkers [37] have introduced a hybrid CG
model for proteins, called PACE, in which a united-atom protein
model is coupled with the MARTINI CG water model (four real water
molecules are clustered into one bead). To improve the
transferability of CG models, Voth and coworkers [38] have
suggested that the interactions between CG particles should
consist of two components, an analytic one and a systematic one:
the analytic component is determined by evaluating the anisotropic
interactions between Gay–Berne ellipsoids, while the systematic
component is obtained through force matching, which is known as
the MS-CG method [24, 39].
Unfortunately, in many physics-based CG models for proteins,
the electrostatic interactions between CG particles are treated
implicitly, modeled simply by following the principles of atomistic
point-charge models, or ignored altogether. An inaccurate
treatment of electrostatics greatly impairs the transferability
of CG models, because electrostatic interactions play a critical role
in biomolecular behavior. One example is the MARTINI CG water
model, in which a CG particle representing four real water
molecules carries no point charges. Electrostatic interactions
involving water molecules are therefore ignored, which results in an
incorrect description of the non-bonded interactions between water
and the polar head groups of phospholipids, as well as between
water and the hydrophilic side chains of proteins [40, 41]. To
account for the nontrivial electrostatics of a cluster of water
molecules, the MARTINI polarizable water model [40] and the BMW water
model [41] have been proposed; both significantly improve the
performance of the MARTINI CG model, which demonstrates the
importance of electrostatic interactions. Another example is the
“sticky dipole” potential water model, known as the BBL model,
which was originally proposed by Bratko et al. [42]. In this model,
a single site located at the molecular center of mass carries a
short-range tetrahedral “sticky” potential and a long-range point
dipole potential. The BBL model was later improved by Ichiye et al.
[43], who replaced the original hard-sphere model with a
Lennard–Jones soft-sphere potential, and was further refined by
Ichiye et al. [44], who introduced a higher-order multipole moment
expansion so that the modified model accurately mimics the potential
energy function of liquid water. Similar to Ichiye’s work, a
GBEMP water model [45] has been proposed based on Gay–Berne and
point electric multipole (EMP) potentials. The extension of the
GBEMP CG model to organic liquids [45–48] and proteins [49, 50] has
shown that its encouraging performance derives from the inclusion
of point multipoles and the anisotropic description provided by
Gay–Berne particles. In what follows, we present the development
of the physics-based GBEMP CG model and its applications in
modeling solvent liquids and proteins.

13.2 Model

13.2.1 GBEMP Energy Function


The potential energy function of the GBEMP model is a sum of
energy terms:
$$U_{\mathrm{GBEMP}} = U_{\mathrm{bond}} + U_{\mathrm{angle}} + U_{\mathrm{torsion}} + U_{\mathrm{GB}} + U_{\mathrm{EMP}}, \tag{13.1}$$
where $U_{\mathrm{bond}}$, $U_{\mathrm{angle}}$, $U_{\mathrm{torsion}}$, $U_{\mathrm{GB}}$, and $U_{\mathrm{EMP}}$ represent the bond
stretching, angle bending, torsional, van der Waals, and electrostatic
potentials, respectively. The valence potentials (bond stretching,
angle bending, and torsion) adopt functional forms similar to those
of the MM3 force field [51]: the bond stretching term uses the
fourth-order Taylor expansion of the Morse potential, the angle
bending term is a sixth-order potential, and the torsional term is a
three-term Fourier series expansion:
$$U_{\mathrm{bond}} = K_b (b - b_0)^2 \left[ 1 - 2.55\,(b - b_0) + \frac{7}{12}\,(2.55)^2 (b - b_0)^2 \right] \tag{13.2}$$
$$U_{\mathrm{angle}} = K_\theta (\theta - \theta_0)^2 \left[ 1 - 0.014\,(\theta - \theta_0) + 5.6 \times 10^{-5} (\theta - \theta_0)^2 - 7.0 \times 10^{-7} (\theta - \theta_0)^3 + 2.2 \times 10^{-8} (\theta - \theta_0)^4 \right] \tag{13.3}$$
$$U_{\mathrm{torsion}} = \sum_{n} K_{n\phi} \left[ 1 + \cos(n\phi \pm \delta) \right] \tag{13.4}$$
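As a minimal sketch, the three valence terms of Eqs. 13.2–13.4 could be evaluated as below. The function names, the use of degrees in the angle term, the signed phase in the torsion term, and the parameter values in the usage lines are illustrative assumptions and are not GBEMP parameters. The quartic bond coefficient is written as (7/12)(2.55)², which is the value obtained by expanding a Morse potential to fourth order.

```python
import math

def bond_energy(b, b0, kb):
    """Eq. 13.2: quartic bond stretch (lengths assumed to be in angstrom)."""
    x = b - b0
    return kb * x**2 * (1.0 - 2.55 * x + (7.0 / 12.0) * 2.55**2 * x**2)

def angle_energy(theta, theta0, ktheta):
    """Eq. 13.3: sextic angle bend (angles assumed to be in degrees)."""
    x = theta - theta0
    return ktheta * x**2 * (1.0 - 0.014 * x + 5.6e-5 * x**2
                            - 7.0e-7 * x**3 + 2.2e-8 * x**4)

def torsion_energy(phi, terms):
    """Eq. 13.4: Fourier series; phi and phases in radians.

    terms is a list of (n, k_n, delta_n) tuples; the sign of the phase
    is carried by delta_n itself.
    """
    return sum(k * (1.0 + math.cos(n * phi + delta)) for n, k, delta in terms)

# Hypothetical parameters, chosen only to exercise the functions.
print(bond_energy(1.60, 1.50, 100.0))
print(angle_energy(112.0, 110.0, 0.02))
print(torsion_energy(math.pi / 3, [(1, 0.5, 0.0), (2, 0.2, math.pi), (3, 0.1, 0.0)]))
```

Each term vanishes at its reference geometry, so the anharmonic corrections only matter for displacements away from $b_0$ and $\theta_0$.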
The bond stretching, angle bending, and torsional potentials can be
parameterized with a Boltzmann-based approach: (1) carry out
atomistic MD simulations of the system of interest using an
atomistic model such as AMOEBA [52], AMBER [53], or CHARMM [54];
(2) construct the potential of mean force (PMF) profile from the
configurations sampled along the atomistic MD trajectories; and
(3) obtain the parameters by fitting to the atomistic PMF profile.
Because the 1–4 non-bonded interactions are coupled to the torsional
interactions, the torsional parameters must be re-optimized in
subsequent CG MD simulations by iteratively matching experimental
results, if available, or atomistic results. The last two energy
terms in Eq. 13.1, $U_{\mathrm{GB}}$ and $U_{\mathrm{EMP}}$, are the non-bonded (vdW and
electrostatic) interaction energies and are described in what
follows.
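Steps (2) and (3) of the Boltzmann-based approach can be sketched as follows. This is a simplified illustration, not the authors' fitting code: the function names are hypothetical, the coordinate is treated as one-dimensional without a Jacobian correction, and only the leading harmonic term $K(x - x_0)^2$ is estimated, from the relation $\langle (x - x_0)^2 \rangle = k_B T / 2K$.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def pmf_from_samples(samples, lo, hi, nbins, temperature=298.0):
    """Histogram a sampled coordinate and return (bin_centers, pmf).

    pmf = -kT ln P, shifted so that its minimum is zero.
    Bins that were never visited are left out.
    """
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for x in samples:
        k = math.floor((x - lo) / width)
        if 0 <= k < nbins:
            counts[k] += 1
    total = sum(counts)
    kT = KB * temperature
    visited = [(lo + (k + 0.5) * width, -kT * math.log(c / total))
               for k, c in enumerate(counts) if c > 0]
    w_min = min(w for _, w in visited)
    return [x for x, _ in visited], [w - w_min for _, w in visited]

def harmonic_from_samples(samples, temperature=298.0):
    """Estimate x0 and K of W(x) = K (x - x0)^2 from the mean and variance."""
    n = len(samples)
    x0 = sum(samples) / n
    var = sum((x - x0) ** 2 for x in samples) / n
    return x0, KB * temperature / (2.0 * var)

# Hypothetical CG bond lengths (angstrom) extracted from an atomistic trajectory.
bond_lengths = [3.78, 3.80, 3.81, 3.79, 3.82, 3.80, 3.77, 3.83, 3.80, 3.81]
centers, pmf = pmf_from_samples(bond_lengths, 3.7, 3.9, 10)
b0, kb = harmonic_from_samples(bond_lengths)
print(centers, pmf)
print(b0, kb)
```

In practice the histogram would be built from many thousands of frames, and the resulting PMF would be fitted with the full functional forms of Eqs. 13.2–13.4 rather than the harmonic estimate used here.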

13.2.2 Gay–Berne Potential


Treating a CG particle as an ellipsoid is attractive because it
gives a reasonable approximation to the anisotropic shape of a
cluster of atoms. The Gay–Berne anisotropic potential [55, 56] is
based on a Gaussian-overlap potential [57]; in this work, the
Gay–Berne potential energy function $U_{\mathrm{GB}}$ takes the form
$$U_{\mathrm{GB}}(\hat{u}_i, \hat{u}_j, \mathbf{r}_{ij}) = 4\varepsilon(\hat{u}_i, \hat{u}_j, \hat{r}_{ij}) \left[ \left( \frac{d_w \sigma_0}{r_{ij} - \sigma(\hat{u}_i, \hat{u}_j, \hat{r}_{ij}) + d_w \sigma_0} \right)^{12} - \left( \frac{d_w \sigma_0}{r_{ij} - \sigma(\hat{u}_i, \hat{u}_j, \hat{r}_{ij}) + d_w \sigma_0} \right)^{6} \right] \tag{13.5}$$
Here $\hat{u}_i$ and $\hat{u}_j$ are unit vectors along the principal axes of
particles i and j, $\hat{r}_{ij}$ is the unit vector along the interparticle
vector, and $r_{ij}$ is the interparticle distance. The range parameter σ
and the strength parameter ε are pair-wise functions of the relative
orientation of the two Gay–Berne particles. The Gay–Berne potential is
associated with a set of parameters describing the shape of each
Gay–Berne particle and the orientation of its principal axis, which is
defined from the corresponding all-atom model in the inertial frame.
The term $d_w$ controls the “softness” of the Gay–Berne potential.
The range parameter $\sigma(\hat{u}_i, \hat{u}_j, \hat{r}_{ij})$ is given by
$$\sigma(\hat{u}_i, \hat{u}_j, \hat{r}_{ij}) = \sigma_0 \left[ 1 - \left\{ \frac{\chi\alpha^{2} (\hat{u}_i \cdot \hat{r}_{ij})^2 + \chi\alpha^{-2} (\hat{u}_j \cdot \hat{r}_{ij})^2 - 2\chi^{2} (\hat{u}_i \cdot \hat{r}_{ij})(\hat{u}_j \cdot \hat{r}_{ij})(\hat{u}_i \cdot \hat{u}_j)}{1 - \chi^{2} (\hat{u}_i \cdot \hat{u}_j)^2} \right\} \right]^{-1/2}, \tag{13.6}$$
where
$$\sigma_0 = \sqrt{d_i^2 + d_j^2} \tag{13.7}$$
$$\chi = \left[ \frac{(l_i^2 - d_i^2)(l_j^2 - d_j^2)}{(l_j^2 + d_i^2)(l_i^2 + d_j^2)} \right]^{1/2} \tag{13.8}$$
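$$\alpha^{2} = \left[ \frac{(l_i^2 - d_i^2)(l_j^2 + d_i^2)}{(l_j^2 - d_j^2)(l_i^2 + d_j^2)} \right]^{1/2} \tag{13.9}$$
The notations l and d represent the length and breadth of the Gay–
Berne particles. The terms $\chi\alpha^{2}$, $\chi\alpha^{-2}$, and $\chi^{2}$ in Eq. 13.6 can be
computed in the following manner:
$$\chi\alpha^{2} = \frac{l_i^2 - d_i^2}{l_i^2 + d_j^2} \tag{13.10}$$
$$\chi\alpha^{-2} = \frac{l_j^2 - d_j^2}{l_j^2 + d_i^2} \tag{13.11}$$
$$\chi^{2} = \frac{(l_i^2 - d_i^2)(l_j^2 - d_j^2)}{(l_j^2 + d_i^2)(l_i^2 + d_j^2)} \tag{13.12}$$
The total well-depth parameter $\varepsilon(\hat{u}_i, \hat{u}_j, \hat{r}_{ij})$ is calculated as
the product of the well depth of the cross configuration, $\varepsilon_0$, and the
orientation-dependent strength terms $\varepsilon_1$ and $\varepsilon_2$:
$$\varepsilon(\hat{u}_i, \hat{u}_j, \hat{r}_{ij}) = \varepsilon_0 \, \varepsilon_1^{\nu}(\hat{u}_i, \hat{u}_j) \, \varepsilon_2^{\mu}(\hat{u}_i, \hat{u}_j, \hat{r}_{ij}), \tag{13.13}$$
in which the orientation-dependent strength terms $\varepsilon_1$ and $\varepsilon_2$ are
given by
$$\varepsilon_1(\hat{u}_i, \hat{u}_j) = \left[ 1 - \chi^{2} (\hat{u}_i \cdot \hat{u}_j)^2 \right]^{-1/2} \tag{13.14}$$
$$\varepsilon_2(\hat{u}_i, \hat{u}_j, \hat{r}_{ij}) = 1 - \left\{ \frac{\chi'\alpha'^{2} (\hat{u}_i \cdot \hat{r}_{ij})^2 + \chi'\alpha'^{-2} (\hat{u}_j \cdot \hat{r}_{ij})^2 - 2\chi'^{2} (\hat{u}_i \cdot \hat{r}_{ij})(\hat{u}_j \cdot \hat{r}_{ij})(\hat{u}_i \cdot \hat{u}_j)}{1 - \chi'^{2} (\hat{u}_i \cdot \hat{u}_j)^2} \right\}, \tag{13.15}$$
where
$$\chi' = \left[ \frac{\left( \varepsilon_{S_i}^{1/\mu} - \varepsilon_{E_i}^{1/\mu} \right) \left( \varepsilon_{S_j}^{1/\mu} - \varepsilon_{E_j}^{1/\mu} \right)}{\left( \varepsilon_{S_j}^{1/\mu} + \varepsilon_{E_i}^{1/\mu} \right) \left( \varepsilon_{S_i}^{1/\mu} + \varepsilon_{E_j}^{1/\mu} \right)} \right]^{1/2} \tag{13.16}$$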
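$$\alpha'^{2} = \left[ \frac{\left( \varepsilon_{S_i}^{1/\mu} - \varepsilon_{E_i}^{1/\mu} \right) \left( \varepsilon_{S_j}^{1/\mu} + \varepsilon_{E_i}^{1/\mu} \right)}{\left( \varepsilon_{S_j}^{1/\mu} - \varepsilon_{E_j}^{1/\mu} \right) \left( \varepsilon_{S_i}^{1/\mu} + \varepsilon_{E_j}^{1/\mu} \right)} \right]^{1/2}, \tag{13.17}$$
where $\varepsilon_E$ is the well depth of the end-to-end/face-to-face
configuration and $\varepsilon_S$ is the well depth of the side-by-side
configuration. As for the interaction between unlike pairs,
their $\varepsilon_S$ and $\varepsilon_E$ are specified explicitly and all values are computed
using a combining rule employed in the AMOEBA polarizable force field
[58]. The parameters μ and ν are set to their canonical values of 2.0
and 1.0, respectively. The terms $\chi'^{2}$, $\chi'\alpha'^{2}$, and $\chi'\alpha'^{-2}$ can be
treated as inseparable quantities and computed directly from the
following equations:
$$\chi'^{2} = \frac{\left( \varepsilon_{S_i}^{1/\mu} - \varepsilon_{E_i}^{1/\mu} \right) \left( \varepsilon_{S_j}^{1/\mu} - \varepsilon_{E_j}^{1/\mu} \right)}{\left( \varepsilon_{S_j}^{1/\mu} + \varepsilon_{E_i}^{1/\mu} \right) \left( \varepsilon_{S_i}^{1/\mu} + \varepsilon_{E_j}^{1/\mu} \right)} \tag{13.18}$$
$$\chi'\alpha'^{2} = \frac{\varepsilon_{S_i}^{1/\mu} - \varepsilon_{E_i}^{1/\mu}}{\varepsilon_{S_i}^{1/\mu} + \varepsilon_{E_j}^{1/\mu}} \tag{13.19}$$
$$\chi'\alpha'^{-2} = \frac{\varepsilon_{S_j}^{1/\mu} - \varepsilon_{E_j}^{1/\mu}}{\varepsilon_{S_j}^{1/\mu} + \varepsilon_{E_i}^{1/\mu}} \tag{13.20}$$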
To determine the Gay–Berne parameters of a CG particle, one
constructs atomistic energy profiles (using the AMOEBA force field
in our work) of the van der Waals (vdW) interactions within the
homodimer of the corresponding all-atom model. Homodimer
configurations with different orientations, such as side-by-side,
end-to-end/face-to-face, and T-shape, have to be generated at
separations ranging from short to long distances. The vdW
interaction energy of the homodimer at each separation is calculated
as a Boltzmann average over the conformations generated by rotating
one molecule around its primary axis while keeping the other
molecule fixed. Gay–Berne particles can be treated as spheres,
ellipsoids, or disks. The initial Gay–Berne parameters can
be obtained by fitting to the atomistic gas-phase energy profiles
with an optimization algorithm and can then be re-optimized in
subsequent CG MD simulations if necessary.
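The Gay–Berne energy of Eq. 13.5 can be evaluated directly from the inseparable products of Eqs. 13.10–13.12 and 13.18–13.20, which avoids forming χ and α² separately. The sketch below is one possible reading of those equations; the function name, argument order, and the rod-like parameters in the usage lines are assumptions for illustration, and no guard is included for separations where the shifted distance in Eq. 13.5 becomes non-positive.

```python
import math

def dot(a, b):
    """Dot product of two 3-vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))

def gay_berne(ui, uj, rij, li, di, lj, dj, eps0, eSi, eEi, eSj, eEj,
              dw=1.0, mu=2.0, nu=1.0):
    """Gay-Berne energy (Eq. 13.5) for two uniaxial particles.

    ui, uj : unit vectors along the principal axes
    rij    : vector from particle i to particle j
    l*, d* : length and breadth of each particle
    eS*, eE*: side-by-side and end-to-end well depths
    """
    r = math.sqrt(dot(rij, rij))
    rhat = [x / r for x in rij]
    a, b, c = dot(ui, rhat), dot(uj, rhat), dot(ui, uj)

    # Shape anisotropy from the products of Eqs. 13.10-13.12
    sigma0 = math.sqrt(di**2 + dj**2)                  # Eq. 13.7
    chi_a2 = (li**2 - di**2) / (li**2 + dj**2)         # chi * alpha^2
    chi_am2 = (lj**2 - dj**2) / (lj**2 + di**2)        # chi * alpha^-2
    chi2 = chi_a2 * chi_am2                            # chi^2
    sigma = sigma0 / math.sqrt(
        1.0 - (chi_a2 * a * a + chi_am2 * b * b - 2.0 * chi2 * a * b * c)
        / (1.0 - chi2 * c * c))                        # Eq. 13.6

    # Energy anisotropy from the products of Eqs. 13.18-13.20
    p = 1.0 / mu
    chip_a2 = (eSi**p - eEi**p) / (eSi**p + eEj**p)
    chip_am2 = (eSj**p - eEj**p) / (eSj**p + eEi**p)
    chip2 = chip_a2 * chip_am2
    eps1 = 1.0 / math.sqrt(1.0 - chi2 * c * c)         # Eq. 13.14
    eps2 = 1.0 - (chip_a2 * a * a + chip_am2 * b * b
                  - 2.0 * chip2 * a * b * c) / (1.0 - chip2 * c * c)  # Eq. 13.15
    eps = eps0 * eps1**nu * eps2**mu                   # Eq. 13.13

    x = dw * sigma0 / (r - sigma + dw * sigma0)
    return 4.0 * eps * (x**12 - x**6)

# Hypothetical rod-like homodimer, scanned along two orientations.
z = (0.0, 0.0, 1.0)
rod = dict(li=2.0, di=1.0, lj=2.0, dj=1.0, eps0=1.0,
           eSi=1.0, eEi=0.25, eSj=1.0, eEj=0.25)
for label, direction in (("side-by-side", (1.0, 0.0, 0.0)), ("end-to-end", z)):
    scan = [gay_berne(z, z, [s * k for k in direction], **rod)
            for s in (3.0, 3.5, 4.5)]
    print(label, scan)
```

Energy profiles of this kind, computed for several orientations, are what would be compared with the atomistic vdW profiles during fitting. For two identical parallel rods, the ratio of the end-to-end to the side-by-side well depth in this form equals $\varepsilon_E/\varepsilon_S$.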

13.2.3 Electric Multipole Potential


In the GBEMP model, point multipoles are included inside Gay–Berne
particles to approximate the charge density of the corresponding
all-atom model. They are usually placed at the centers of mass of
the Gay–Berne particles and/or at specific locations inside them.
In some cases, non-interacting EMP sites, which take no part in
non-bonded interactions, are used to connect two different
Gay–Berne particles. The interaction energy between two electric
multipole sites (i and j) is computed as
$$U_{\mathrm{EMP}} = M_i^{t} \, T_{ij} \, M_j, \tag{13.21}$$
where M is the point multipole, given by
$$M = \left[\, q,\ d_x,\ d_y,\ d_z,\ Q_{xx},\ \ldots \,\right]^{t}. \tag{13.22}$$
In Eq. 13.21, $T_{ij}$ is the interaction matrix [58]. In Eq. 13.22,
q, d, and Q denote the charge, dipole, and quadrupole moments,
respectively. The number of EMP sites in a Gay–Berne particle
determines both how accurately the electrostatics is described and
how fast the model runs. It is therefore critical to choose the
number of EMP sites for each Gay–Berne particle so as to balance
accuracy and efficiency.
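To make Eq. 13.21 concrete, the sketch below builds the charge and dipole block of the interaction matrix from derivatives of $1/r$ and contracts it with two truncated multipole vectors $[q, d_x, d_y, d_z]$. Truncating at the dipole, the function names, and the omission of the electrostatic unit-conversion constant are simplifying assumptions; the GBEMP model itself also carries quadrupoles.

```python
import math

def interaction_matrix(rvec):
    """4x4 charge/dipole block of T_ij, with rvec = r_j - r_i.

    Rows index the components [q, dx, dy, dz] of site i,
    columns index those of site j.
    """
    r2 = sum(x * x for x in rvec)
    r = math.sqrt(r2)
    r3, r5 = r * r2, r * r2 * r2
    T = [[0.0] * 4 for _ in range(4)]
    T[0][0] = 1.0 / r                       # charge-charge
    for a in range(3):
        T[0][a + 1] = -rvec[a] / r3         # d(1/r)/d(r_j): charge i, dipole j
        T[a + 1][0] = rvec[a] / r3          # d(1/r)/d(r_i): dipole i, charge j
        for b in range(3):
            delta = 1.0 if a == b else 0.0  # dipole-dipole tensor
            T[a + 1][b + 1] = delta / r3 - 3.0 * rvec[a] * rvec[b] / r5
    return T

def emp_energy(Mi, Mj, rvec):
    """U = Mi^t T Mj (Eq. 13.21), in units of charge^2 / length."""
    T = interaction_matrix(rvec)
    return sum(Mi[a] * T[a][b] * Mj[b] for a in range(4) for b in range(4))

# Two hypothetical sites, 3 length units apart along x, each with a dipole along x.
site_i = [0.0, 0.5, 0.0, 0.0]
site_j = [0.0, 0.5, 0.0, 0.0]
print(emp_energy(site_i, site_j, (3.0, 0.0, 0.0)))
```

Adding the quadrupole rows and columns enlarges the matrix but leaves the contraction unchanged, which is why the cost per site pair grows with the order of the expansion as well as with the number of sites.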
Point multipoles describe the electrostatic interactions between
Gay–Berne particles accurately beyond a certain separation
(independent of the particle sizes). When the two particles come too
close to each other, the point multipoles cannot accurately describe
the overlap of their charge densities, which causes the so-called
penetration error [59]. An effective way to avoid the penetration
error is to apply a suitable damping function [60]. In the current
GBEMP model, we employ the damping function
$$\lambda = 1 - e^{-a u^{3}}, \tag{13.23}$$
where u is the effective distance, defined by
$$u = \frac{r_{ij}}{\alpha_i \alpha_j}, \tag{13.24}$$
where $r_{ij}$ is the actual distance between the ith and jth CG
particles, and $\alpha_i$ and $\alpha_j$ represent the effective sizes of particles i
and j. The dimensionless parameter a (tentatively set to 0.49)
controls the damping strength. The damping factor λ is applied to
the multipole interactions and forces, and it approaches unity as
the distance $r_{ij}$ increases. This approach has proved effective in
treating the polarization catastrophe of point polarizable models
[58]. When the damping function is applied, smeared charge
distributions replace the point multipoles, so that the overlap of
the charge densities of CG particles is properly described and the
penetration problem at short range is avoided.
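The behavior of the damping factor in Eqs. 13.23 and 13.24 can be inspected with a few lines of code. The function name and the effective sizes used below are hypothetical, and the effective distance is taken exactly as written in Eq. 13.24; how λ multiplies each multipole term is not specified here, so the sketch only tabulates λ itself.

```python
import math

def damping_factor(rij, alpha_i, alpha_j, a=0.49):
    """Damping factor of Eq. 13.23 with the effective distance of Eq. 13.24."""
    u = rij / (alpha_i * alpha_j)
    return 1.0 - math.exp(-a * u**3)

# Hypothetical effective sizes; lambda rises from 0 at contact toward 1 at long range.
for r in (0.5, 1.0, 2.0, 3.0, 5.0):
    print(r, damping_factor(r, 1.2, 1.2))
```

The cubic dependence on u makes the switch-on sharp: the factor is close to zero at short range and becomes indistinguishable from unity once u exceeds a few units.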
The atomic multipole moments of atoms in the AMOEBA model can be
calculated with quantum mechanics methods and Stone’s distributed
multipole analysis [61]. Once the EMP sites of the Gay–Berne
particles have been chosen, it is therefore straightforward to
obtain the parameters of the electric multipole potentials, either
from the distributed multipole analysis or directly from the AMOEBA
force field. However, the EMP parameters of Gay–Berne particles need
to be optimized because they are derived from gas-phase ab initio
quantum mechanics. One possible solution is to match GBEMP and
AMOEBA results for the electrostatic energies between CG particles
and water molecules, or between CG particle dimers, at various
separations and/or in different orientations.

13.3 GBEMP Model for Molecular Solvents

Solvents (especially water) play a critical role in many biological
processes and chemical reactions, so developing and improving
coarse-grained force fields for solvents is essential to the CG
modeling of biomolecules in solution. In addition, these small
molecules can serve as building blocks for complex biomolecules,
so improving the accuracy of CG models for solvents increases the
transferability of CG models for biomolecules. In what follows, we
present the derivation of GBEMP potentials for a few example
solvents (water, methanol, and benzene) and at the same time
demonstrate the features of our GBEMP model.

Figure 13.1 GBEMP mappings for (A) water, (B) methanol, and (C) benzene.
The rigid bodies, enclosed by dashed lines, consist of CG particles
represented by a sphere, an ellipsoid, and a disk, respectively. Each CG
particle carries a point multipole, located at the center of mass for water
and benzene and at the oxygen atom for methanol. For water and benzene,
the EMP and Gay–Berne interaction sites share the same position, indicated
by a black filled circle enclosed by a red open circle. For methanol, the EMP
and Gay–Berne interaction sites are located at different positions,
illustrated by red and black filled circles, respectively.

As in the BBL water model mentioned above, one water molecule is
represented by a single CG particle, and the water molecule is
taken to be spherical in the current GBEMP water model (see
Fig. 13.1A). The Gay–Berne potential describing the vdW
interactions between two spherical CG particles then reduces to
the well-known Lennard–Jones potential. One may question the
computational efficiency of the GBEMP water model, but the main
purpose of coarse-graining a water molecule into a single bead here
is to demonstrate the advantage of point multipoles in the
electrostatic description of hydrogen-bonding molecules. To
increase the computational efficiency of the GBEMP water model, one
may cluster a few water molecules into one CG particle, as is done
in the MARTINI and BMW water models. To obtain the Gay–Berne
parameters of the water model, we need to
construct the atomistic potential of mean force (PMF) profile of the
vdW interactions within the corresponding water homodimer, using
the AMOEBA force field in our work. Fitting to the atomistic PMF
profile yields a set of Gay–Berne parameters, which should then be
refined in subsequent CG MD simulations of liquid water to
reproduce the atomistic results as closely as possible. The average
molecular multipole moments, which include the contribution from
induced dipoles, can be calculated from atomistic AMOEBA MD
simulations of liquid water. As mentioned above, it is
straightforward to obtain the point multipoles at any position
inside the water molecule (the center of mass in the case of the
water model) from the average molecular multipole moments. The
GBEMP results for MD simulations of liquid water show excellent
agreement with the all-atom TIP3P water model. In particular, the
O-H RDF obtained from the GBEMP water model reproduces the major
feature of hydrogen bonding observed in the experimental results or
in atomistic results using the TIP3P model, implying that point
multipoles must be included in a CG water model in order to
reconstruct the electron density of the atomistic water model.
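The reduction of the Gay–Berne potential to the Lennard–Jones form for spherical particles, mentioned above, can be checked directly from Eqs. 13.5–13.20. The short derivation below assumes two identical spheres and, for the last step only, the illustrative choice $d_w = 1$; for other values of $d_w$ the result is a shifted 12-6 form. Note that α² and α′² are individually undefined (0/0) in this limit, which is one reason the products in Eqs. 13.10–13.12 and 13.18–13.20 are computed as inseparable quantities.

```latex
% Identical spheres: l_i = d_i = l_j = d_j = d and \varepsilon_S = \varepsilon_E
\chi\alpha^{2} = \chi\alpha^{-2} = \chi^{2} = 0, \qquad
\chi'\alpha'^{2} = \chi'\alpha'^{-2} = \chi'^{2} = 0

% so that Eqs. 13.6 and 13.13--13.15 give
\sigma = \sigma_0 = \sqrt{2}\,d, \qquad
\varepsilon_1 = \varepsilon_2 = 1, \qquad
\varepsilon = \varepsilon_0

% and Eq. 13.5 becomes
U_{\mathrm{GB}}(r_{ij}) = 4\varepsilon_0\left[
  \left(\frac{d_w\sigma_0}{r_{ij}-\sigma_0+d_w\sigma_0}\right)^{12}
 -\left(\frac{d_w\sigma_0}{r_{ij}-\sigma_0+d_w\sigma_0}\right)^{6}\right]
\;\xrightarrow{\;d_w = 1\;}\;
4\varepsilon_0\left[
  \left(\frac{\sigma_0}{r_{ij}}\right)^{12}
 -\left(\frac{\sigma_0}{r_{ij}}\right)^{6}\right]
```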
In the GBEMP methanol model (see Fig. 13.1B), a methanol molecule
is regarded as an ellipsoidal Gay–Berne particle. To derive the
Gay–Berne parameters, methanol homodimer configurations have to be
generated in different orientations (for instance, end-to-end,
side-by-side, cross, and T-shape), and the pair-wise vdW
interactions have to be calculated at various separations using an
atomistic force field. As in the parameterization of the GBEMP water
model, an optimization algorithm, such as the genetic algorithm used
in our current work, can be applied to fit the GBEMP energy profiles
to the atomistic energy profiles. By monitoring the correlation
between the CG and atomistic models, a set of Gay–Berne parameters
can be obtained for the GBEMP methanol model. In the original
version of the GBEMP methanol model [45], an electric multipole was
placed at the center of mass of the methanol molecule. However, this
on-center treatment of the EMP site has been shown to be inadequate
because it cannot correctly capture the hydrogen-bonding feature
of liquid methanol. Although doubling the dipole magnitude of the
EMP site improves the GBEMP result for the O-H RDF, it results in
unrealistic electrostatic interaction energies. The on-center model
works for the GBEMP water model because the on-center EMP site is
very close to the oxygen atom, where the electron density is high.
An off-center model, in which the EMP site is placed at a position
of high electron density (the oxygen atom in the case of methanol)
instead of at the default center of mass, has been shown to
capture the predominant hydrogen-bonding orientation correctly
[46]. This implies that it is important to find the optimal site
for the electric multipole expansion, so that the GBEMP model
reconstructs the charge density distribution of the molecule as
closely as possible.
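One way to see why the expansion site matters is to compute the multipole moments of the same set of charges about two different sites. In the sketch below, the three point charges are a hypothetical toy arrangement and not AMOEBA or GBEMP methanol parameters, the function name is illustrative, and the quadrupole uses the traceless Cartesian convention $Q_{ab} = \sum q\,(\tfrac{3}{2} r_a r_b - \tfrac{1}{2} r^2 \delta_{ab})$. For a neutral molecule the dipole is the same about any site, but the quadrupole is not, so an expansion truncated at low order represents the charge density differently depending on where it is centered.

```python
def multipoles_about(site, charges):
    """Total charge, dipole, and traceless quadrupole of point charges about `site`.

    charges is a list of (q, (x, y, z)) tuples.
    """
    q_tot = 0.0
    dip = [0.0] * 3
    quad = [[0.0] * 3 for _ in range(3)]
    for q, pos in charges:
        r = [pos[k] - site[k] for k in range(3)]
        r2 = sum(x * x for x in r)
        q_tot += q
        for a in range(3):
            dip[a] += q * r[a]
            for b in range(3):
                quad[a][b] += q * (1.5 * r[a] * r[b] - (0.5 * r2 if a == b else 0.0))
    return q_tot, dip, quad

# Hypothetical neutral three-site molecule: O, hydroxyl H, and a united methyl group.
toy = [(-0.6, (0.0, 0.0, 0.0)),
       (0.4, (0.96, 0.0, 0.0)),
       (0.2, (-0.4, 1.3, 0.0))]
on_oxygen = multipoles_about((0.0, 0.0, 0.0), toy)
shifted = multipoles_about((0.1, 0.3, 0.0), toy)   # stand-in for a center-of-mass site
print(on_oxygen[1], on_oxygen[2][0][0])
print(shifted[1], shifted[2][0][0])
```

In this toy case the two sites give identical dipoles but different quadrupole components, which illustrates how an on-center and an off-center EMP site can lead to different short-range electrostatics.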
The benzene molecule (see Fig. 13.1C) can be regarded as
disk-like. As in the
parameterization of the Gay–Berne potential for methanol,
various orientations of the benzene homodimer (such as stacking,
side-by-side and t-shape) must be generated to calculate
the vdW interaction energies at different separation
distances. In contrast to the GBEMP methanol model, which uses an
off-center EMP site, the on-center point multipole model
is the best option for benzene because of the planar
and symmetric nature of the molecule. In particular, the
on-center model agrees satisfactorily with the
distributed atomic multipole model when the benzene–benzene pair
adopts the stacking orientation.
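The symmetry argument for an on-center site can be checked with a small calculation. The Python sketch below places illustrative point charges on two concentric hexagons (carbon and hydrogen rings) and evaluates the monopole, dipole and traceless quadrupole about the ring center. The geometry and the charges are assumed values chosen for illustration only, not GBEMP or AMOEBA parameters, and the function names are hypothetical.

```python
import math

def ring(radius, charge, n=6):
    """n equal point charges evenly spaced on a circle in the xy-plane."""
    return [(charge, (radius * math.cos(2 * math.pi * k / n),
                      radius * math.sin(2 * math.pi * k / n), 0.0))
            for k in range(n)]

def multipoles(sites, origin=(0.0, 0.0, 0.0)):
    """Monopole, dipole and traceless quadrupole about `origin`."""
    q_tot = 0.0
    mu = [0.0, 0.0, 0.0]
    theta = [[0.0] * 3 for _ in range(3)]
    for q, pos in sites:
        d = [p - o for p, o in zip(pos, origin)]
        r2 = sum(c * c for c in d)
        q_tot += q
        for i in range(3):
            mu[i] += q * d[i]
            for j in range(3):
                delta = r2 if i == j else 0.0
                theta[i][j] += 0.5 * q * (3.0 * d[i] * d[j] - delta)
    return q_tot, mu, theta

# Illustrative radii (angstrom) and charges (e); assumed values, not fitted ones.
benzene = ring(1.40, -0.115) + ring(2.48, +0.115)
q_tot, mu, theta = multipoles(benzene)
print("charge:", q_tot)
print("dipole:", mu)
print("quadrupole zz:", theta[2][2])
```

For this symmetric arrangement the net charge and dipole vanish at the center, so the leading term of the on-center expansion is the quadrupole, whose component along the ring normal is negative.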
In summary, we have described the parameterization of the
Gay–Berne and EMP potentials for three typical solvents (water,
methanol, and benzene), which represent spherical, elliptical and
disk-like CG particles, respectively. The Gay–Berne
and EMP potentials for any small molecule can be parameterized with the
same strategy. For instance, GBEMP models for
a few important organic solvents, such as tetrahydrofuran (THF),
chloroform (CHCl3), acetaldehyde (CH3CHO) and methanethiol
(CH3SH), have been built and improved [47, 48], demonstrating
that the inclusion of electric multipoles is necessary to
significantly improve the accuracy of CG models applied to hydrogen
bonding systems. In what follows, we present the extension
of our GBEMP model with electric multipoles to proteins.

Figure 13.2 GBEMP mappings for amino acid dipeptides. Each rigid body,
enclosed by a dashed line, consists of a Gay–Berne particle (represented
by a shaded ellipsoid, sphere, or disk) with or without electric
multipoles. The indices of the rigid bodies, the Gay–Berne sites, the
interacting EMP sites and the non-interacting EMP sites (which only serve
to connect different rigid bodies) are indicated by Roman numerals and by
Arabic numerals in black, red and blue, respectively.

13.4 GBEMP Model for Proteins

The GBEMP mapping for amino acid dipeptide models (see Fig. 13.2) has
been described in our previous papers [49, 50]; here we summarize
the GBEMP mapping of phenylalanine dipeptide as an example.
The GBEMP model for phenylalanine dipeptide contains six Gay–Berne
particles enclosed in the corresponding rigid bodies (denoted
by Roman numerals I through VI), with neighboring rigid bodies connected
by a virtual bond. Each rigid body consists of at least one Gay–Berne
particle (ellipsoid, sphere, or disk) with or without EMP
sites. Each Gay–Berne site acts as the vdW interaction center and is
usually placed at the mass center of the CG particle. For instance, the
Gay–Berne sites of the spherical rigid bodies (I, III and VI) have the
indices 111, 482 and 871, respectively, and are located at the mass
centers of the corresponding all-atom methyl
groups (–CH3); the indices 122 and 862 denote
the Gay–Berne sites placed at the centers of the corresponding
all-atom peptide groups (–CONH–) for the elliptical rigid bodies (II
and V), respectively; and the index 492 denotes
the Gay–Berne site positioned at the mass center of the corresponding
all-atom phenyl group (–C6H5) for the disk-like rigid body (IV).
In our current GBEMP model for phenylalanine dipeptide, electric
multipoles are included in the Gay–Berne particles
enclosed by the elliptical rigid bodies (II and V) and the disk-like
rigid body (IV). For the elliptical rigid bodies (II and V), the oxygen
atoms of the peptide groups are the locations of the
EMP sites with indices 121 and 861, and the nitrogen
atoms of the peptide groups are the positions of the
EMP sites with indices 123 and 863. Note that
in the disk-like rigid body (IV) the all-atom counterpart of
the Gay–Berne particle is the phenyl group (–C6H5),
so the Gay–Berne and EMP potentials of the benzene molecule
(C6H6) can be applied to this Gay–Berne particle, with the EMP
site 493 sharing its position with the Gay–Berne
site 492. In our GBEMP model, a few
non-interacting EMP sites are introduced to connect different
rigid bodies; they do not take part in electrostatic interactions. For
instance, the α and γ carbon atoms of phenylalanine dipeptide
are the locations of the non-interacting EMP sites
481 and 491. Bonds are thus defined
between adjacent rigid bodies via Gay–Berne and/or EMP
sites; for instance, bonds exist between the sites (111, 122), (123,
481), (482, 491), (481, 862) and (862, 871). An example of an
angle-bending term involves the sites (111, 122, 123), and an example of a
torsion term involves the sites (111, 122, 123, 481).
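The mapping just described can be written down as a small data structure. The Python sketch below records the six rigid bodies, their site indices and the bonded terms quoted in the text. The class and field names are hypothetical, and the assignment of the connector sites 481 and 491 to rigid bodies III and IV is an assumption inferred from the index numbering, not something the text states explicitly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RigidBody:
    label: str              # Roman numeral used in Fig. 13.2
    shape: str              # "sphere", "ellipsoid" or "disk"
    gb_site: int            # Gay-Berne (vdW) site index
    emp_sites: tuple = ()   # interacting EMP site indices
    link_sites: tuple = ()  # non-interacting EMP sites (connectors only)

    def sites(self):
        return {self.gb_site, *self.emp_sites, *self.link_sites}

# Phenylalanine dipeptide, following the indices quoted in the text.
BODIES = [
    RigidBody("I", "sphere", 111),
    RigidBody("II", "ellipsoid", 122, emp_sites=(121, 123)),
    RigidBody("III", "sphere", 482, link_sites=(481,)),   # 481 placement assumed
    RigidBody("IV", "disk", 492, emp_sites=(493,), link_sites=(491,)),  # 491 assumed
    RigidBody("V", "ellipsoid", 862, emp_sites=(861, 863)),
    RigidBody("VI", "sphere", 871),
]
BONDS = [(111, 122), (123, 481), (482, 491), (481, 862), (862, 871)]
ANGLES = [(111, 122, 123)]          # the one example given in the text
TORSIONS = [(111, 122, 123, 481)]   # the one example given in the text

def owner(site):
    """Return the label of the rigid body that contains a site index."""
    for body in BODIES:
        if site in body.sites():
            return body.label
    raise KeyError(site)

for a, b in BONDS:
    print(f"bond {a} ({owner(a)}) - {b} ({owner(b)})")
```

With this assignment every listed bond joins two different rigid bodies, which is consistent with the statement that bonds connect adjacent bodies rather than sites within one body.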
Polar molecules possess a permanent dipole moment, and the
quality of the GBEMP model can be evaluated by comparing the GBEMP
and atomistic dipole moments of
polar dipeptide models. For these calculations, various
conformations of each dipeptide model are sampled
from atomistic MD simulations (AMOEBA force field). Our previous
study showed an excellent correlation between the GBEMP
and AMOEBA models [50], which supports
the quality of the fitted EMP parameters
and suggests that our treatment of EMP sites (in particular
the number of EMP sites and their positions)
is acceptable. Furthermore, the electrostatic interactions between
dipeptide–water pairs and between dipeptide–dipeptide pairs were
investigated in our recent study [50], which showed that the
GBEMP model reproduces the AMOEBA results.
In particular, the GBEMP model correctly captures both the
repulsive and attractive features of the electrostatic interactions
between like or unlike pairs of dipeptides. These encouraging
results should
be attributed to the inclusion of point multipoles, since removing
them greatly impairs the performance of the
GBEMP model.
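The dipole comparison described above rests on a simple sum: each site contributes its charge times its position plus any local point dipole. The Python sketch below shows this for a toy two-charge group and for a single coarse-grained site carrying the equivalent point dipole. All numerical values and function names are hypothetical, and the result is independent of the origin only because the toy group is neutral.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*angstrom expressed in debye

def total_dipole(sites):
    """Sum q_i * r_i + mu_i over sites given as (charge, position, dipole)."""
    total = [0.0, 0.0, 0.0]
    for q, pos, mu in sites:
        for k in range(3):
            total[k] += q * pos[k] + mu[k]
    return total

def norm_debye(vec):
    """Magnitude of a dipole given in e*angstrom, converted to debye."""
    return math.sqrt(sum(c * c for c in vec)) * E_ANGSTROM_TO_DEBYE

# Toy "atomistic" group: two opposite charges 1.2 angstrom apart (assumed values).
atomistic = [(-0.5, (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
             (+0.5, (1.2, 0.0, 0.0), (0.0, 0.0, 0.0))]
# Single CG site at the midpoint carrying the equivalent point dipole.
coarse = [(0.0, (0.6, 0.0, 0.0), (0.6, 0.0, 0.0))]

print(norm_debye(total_dipole(atomistic)), "D (point charges)")
print(norm_debye(total_dipole(coarse)), "D (single multipole site)")
```

In practice the same sum would be evaluated for many sampled conformations with both models, and the two sets of dipole magnitudes would then be correlated.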
It is widely accepted that both protein structure
and protein dynamics are essential to protein function. First,
a folded protein is formed through the packing of its secondary
structure elements (such as alpha-helix, beta-sheet and random
coil), which contribute significantly to protein stability and in
turn affect protein function. Second,
local conformational changes play a critical role in the packing of the
secondary structure elements, and the dynamics of these
elements are governed by the two backbone torsion
angles φ (C–N–Cα–C) and ψ (N–Cα–C–N). Thus, the distribution
of the backbone torsion angles (φ, ψ), known
as the Ramachandran plot [62], has been extensively employed to
evaluate the secondary structure propensity of amino acids. For
instance, the alpha basin located in the neighborhood of αR (φ ≈
–60°, ψ ≈ –60°) is relevant to the formation of alpha-helical
secondary structures; the beta basin consisting of the regions
PPII (φ ≈ –60°, ψ ≈ 150°) and C5 (φ ≈ –150°, ψ ≈ 170°)
is associated with the formation of beta-sheet
secondary structures; the regions αL (φ ≈ 60°, ψ ≈ 60°) and C7ax
(φ ≈ –70°, ψ ≈ 70°) are less populated and are believed to be
related to the formation of turns or loops. In our recent study, we
showed that the GBEMP model not only captures the major
features of experimental Ramachandran plots (Dunbrack library
[63]) but also achieves excellent agreement with the experimental
relative populations of regions such as the αR basin, the
beta basin and αL. First, both the experimental and GBEMP results
show that the most stable conformations lie in the αR
basin; second, according to the experimental Ramachandran plots, the
beta basin is the second most populated region for most amino
acids, and this feature is well reproduced by the GBEMP model;
third, the αL region is less populated in both the
experimental and GBEMP results.
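Relative populations of this kind can be estimated by assigning each sampled (φ, ψ) pair to the nearest region center. The Python sketch below uses the centers quoted in the text for αR, PPII, C5 and αL, and merges PPII and C5 into the beta basin. The 45° cutoff, the nearest-center rule and the function names are assumptions made for the example, the C7ax region is left out for brevity, and the four sample points are invented purely to exercise the code.

```python
import math
from collections import Counter

# Region centres (phi, psi) in degrees, as quoted in the text.
CENTERS = {
    "alphaR": (-60.0, -60.0),
    "PPII": (-60.0, 150.0),
    "C5": (-150.0, 170.0),
    "alphaL": (60.0, 60.0),
}
# PPII and C5 together form the beta basin.
BASIN_OF = {"alphaR": "alphaR", "PPII": "beta", "C5": "beta", "alphaL": "alphaL"}

def angle_diff(a, b):
    """Signed difference a - b wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def classify(phi, psi, cutoff=45.0):
    """Assign (phi, psi) to the nearest centre, or 'other' beyond the cutoff."""
    name, dist = min(
        ((n, math.hypot(angle_diff(phi, c[0]), angle_diff(psi, c[1])))
         for n, c in CENTERS.items()),
        key=lambda item: item[1])
    return BASIN_OF[name] if dist <= cutoff else "other"

def populations(samples):
    """Fraction of samples falling in each basin."""
    counts = Counter(classify(phi, psi) for phi, psi in samples)
    return {k: v / len(samples) for k, v in counts.items()}

# Made-up sample points, only to exercise the functions.
samples = [(-62.0, -45.0), (-70.0, 140.0), (-155.0, -175.0), (58.0, 50.0)]
print(populations(samples))
```

The wrapped angle difference matters for points such as (–155°, –175°), which lie close to the C5 center at ψ ≈ 170° once the periodicity of the torsion angles is taken into account.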
The orientation of amino acid side chains is another important
factor in the packing of secondary structure elements, because the
relative orientation between pairs of side chains, or between side
chains and the backbone, contributes greatly to the strength
of their interactions. Put simply, the interactions between
side chains or between side chain and backbone affect the
distribution of the backbone torsion angles (φ, ψ) and in turn
influence the formation of secondary structure elements. In general,
the dominant conformations in terms of the side-chain torsion χ1
are located in three narrow regions, namely g– (≈60°), t (≈180°),
and g+ (≈300°), and the populations of these regions follow the
order g+ > t > g–. It is encouraging that the GBEMP model
reproduces this important feature: our recent work
showed an excellent correlation between the experimental and
GBEMP results for the relative populations of the three regions g+,
t and g–. Moreover, in contrast to amino acids with nonpolar
and aromatic side chains, such as isoleucine (Ile), leucine (Leu)
and phenylalanine (Phe), amino acids with hydrophilic
side chains, such as serine (Ser) and threonine (Thr), exhibit a different
order of population preference for the three regions, namely
g– > g+ > t, which is also captured by the GBEMP model. The good
performance of the GBEMP model in these studies should be credited
to the point multipoles included in the backbone Gay–Berne
particles representing peptide groups.
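A torsion angle such as χ1 is computed from four consecutive site positions and then binned into the three rotamer regions. The Python sketch below computes the dihedral with the usual atan2 construction and labels it using the 0° to 360° convention of the text (g– near 60°, t near 180°, g+ near 300°). The bin edges at 120° and 240°, the helper points_with_torsion and the other function names are assumptions made for this example.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) defined by four points."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    n = math.sqrt(dot(b1, b1))
    b1 = tuple(c / n for c in b1)                    # unit vector along the axis
    v = sub(b0, tuple(dot(b0, b1) * c for c in b1))  # b0 projected off the axis
    w = sub(b2, tuple(dot(b2, b1) * c for c in b1))  # b2 projected off the axis
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

def chi1_rotamer(angle_deg):
    """Label a chi1 angle with the text's convention; bin edges are assumed."""
    a = angle_deg % 360.0
    if a < 120.0:
        return "g-"
    if a < 240.0:
        return "t"
    return "g+"

def points_with_torsion(theta_deg):
    """Hypothetical helper: four points whose torsion equals theta_deg."""
    t = math.radians(theta_deg)
    return ((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
            (0.0, 0.0, 1.0), (math.cos(t), math.sin(t), 1.0))

angle = dihedral(*points_with_torsion(-60.0))
print(angle, chi1_rotamer(angle))
```

Counting the labels over all frames of a trajectory gives the relative populations of g+, t and g– that are compared with the experimental values.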
The conformational properties of polyalanine of different
lengths have been investigated experimentally and theoretically
[66–71]. The alpha-helix populations sampled from MD simulations
of 5-mer polyalanine using our GBEMP model are comparable
to those from atomistic simulations using various force fields
(AMBER, CHARMM and OPLS-AA/L) [70, 71]. Although the computational
approaches overestimate the alpha-helix population
relative to experimental studies of 5-mer polyalanine [71],
such as circular dichroism (CD) spectroscopy and Fourier-transform
infrared (FTIR) spectroscopy, the distributions sampled from GBEMP
simulations are in qualitative agreement with those
obtained from atomistic models [49]. This
reasonable agreement between the GBEMP and
atomistic models is understandable because both follow the same
development philosophy. Similarly, qualitative agreement between the GBEMP
and atomistic models was obtained in simulations of
12-mer polyalanine [49].
In our recent research [50], we extended the GBEMP
model to proteins. From the Protein Data Bank (PDB), we
randomly selected two small proteins with different topologies
to test the quality of the GBEMP model. One is the actinobacterial
transcription factor RbpA (ATF-RbpA) (PDB ID: 2M6O) [72],
a small protein containing 48 amino acid residues, and the other
is 2-mercaptophenol-alpha3C (2M-alpha3C) (PDB ID: 2LXY) [73],
a small protein containing 67 amino acid residues. ATF-RbpA
consists of two anti-parallel beta sheets, and 2M-alpha3C is
composed of three alpha helices forming a bundle, as shown in Fig.
13.3. In the GBEMP simulations of the two proteins, we observed
that the native structures were well maintained
and that the stability of the two proteins arose mainly from
electrostatic interactions. Moreover, the overall pattern of the backbone
Cα root-mean-square fluctuation (RMSF) profile observed in the
atomistic simulations was reproduced by the GBEMP model,
demonstrating its good quality in modeling proteins.
Specifically, in the case of ATF-RbpA, the highly flexible regions
associated with the N-terminal and C-terminal residues could be
identified by our GBEMP model; the two flexible regions of 2M-alpha3C
(residues 22–26 and residues 44–48),
which have been identified by atomistic simulations, can be
qualitatively detected by our GBEMP simulations.

Figure 13.3 Cartoon representations of (A) actinobacterial transcription
factor RbpA (ATF-RbpA) (PDB ID: 2M6O) and (B) 2-mercaptophenol-alpha3C
(2M-alpha3C) (PDB ID: 2LXY).
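For reference, the per-residue RMSF used in this comparison is the root-mean-square deviation of each Cα position from its time average. The Python sketch below computes it for a list of frames. It assumes that the frames have already been superimposed on a common reference structure, which is not done here, and the function name and the two-frame toy trajectory are hypothetical.

```python
import math

def rmsf(frames):
    """Per-particle RMSF from pre-aligned frames, frames[t][i] = (x, y, z)."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of particle i.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared deviation from that average.
        msd = sum(sum((f[i][k] - mean[k]) ** 2 for k in range(3))
                  for f in frames) / n_frames
        result.append(math.sqrt(msd))
    return result

# Toy trajectory: particle 0 is fixed, particle 1 hops between two positions.
frames = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]
print(rmsf(frames))  # [0.0, 1.0]
```

Plotting this quantity against residue number for the CG and atomistic trajectories gives the kind of fluctuation profile compared in the text, with flexible termini and loops appearing as peaks.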
To assess the computational efficiency of GBEMP MD
simulations, we compared the speed of the GBEMP model with
that of the AMOEBA and CHARMM models in simulating 20 different
dipeptides in implicit solvent (the GK implicit solvent model
[74] was used in the GBEMP and AMOEBA simulations, while the GB/SA
implicit solvent model [75] was applied in the CHARMM simulations).
The GBEMP model achieves a 50- to 200-fold speedup over
the AMOEBA model and a 10- to 50-fold speedup over the CHARMM
model, depending on the type of amino acid. For instance, for the
glycine (Gly) dipeptide, the GBEMP simulation achieves
about a 50-fold speedup over the AMOEBA simulation and
about a 10-fold speedup over the CHARMM simulation; for
the tryptophan (Trp) dipeptide, the GBEMP model
achieves about a 200-fold speedup over the AMOEBA model
and a 50-fold speedup over the CHARMM model. Furthermore,
we compared the efficiency of the GBEMP and AMBER
models in MD simulations of the two test proteins (actinobacterial
transcription factor RbpA and 2-mercaptophenol-alpha3C) and
found that the GBEMP simulations (in the GK implicit solvent
model) achieve a speedup factor of about 50 compared to
AMBER simulations (in the TIP3P explicit solvent model [76]). In
terms of speed, the GBEMP model is slower than the
medium-resolution MARTINI model by a factor of about 2 [77], but
it is faster than the high-resolution PRIMO CG model by a factor of
about 2 [36].

13.5 Summary

Our GBEMP model, based on the anisotropic Gay–Berne and point
electric multipole potentials, takes into account the anisotropic
nature of a CG particle and provides a reasonable approximation
to the charge distribution of the corresponding all-atom model.
The anisotropic Gay–Berne and explicit EMP potentials should
provide a more accurate description of the non-bonded interactions
between CG particles than some physics-based potentials that rely on a
spherical representation of CG particles and point charge models.
In this chapter, we have shown the encouraging performance of
the GBEMP model in calculating the dipole moments of different
dipeptides, in evaluating the electrostatic interactions
between dipeptides and water molecules in different orientations, in
estimating the non-bonded interactions between dipeptide dimers
(either homodimers or heterodimers) and in investigating the
stability and dynamics of small proteins. Moreover, the introduction
of the anisotropic Gay–Berne and EMP potentials does not add
much computational burden to coarse-grained simulations:
we have shown that the GBEMP model achieves a speedup
factor of 10 to 200, depending on the specific case and the atomistic model
(AMOEBA, AMBER or CHARMM). The successful application of the
GBEMP model to proteins encourages us to extend it
to other biomolecules, such as lipids and nucleic
acids, since lipids are major components of cell membranes
and nucleic acids carry genetic information. Some
preliminary results on parameterizing GBEMP potentials for lipids
and nucleic acids are promising.

Acknowledgments

Guohui Li acknowledges grants from the Ministry of Science
and Technology of China (863 project No. 2012AA01A305; 973
project No. 2012CB721002), the National Science Foundation of
China (Contract Nos. 31370714 and 91227126) and the “Hundreds Talent
Program” of the Chinese Academy of Sciences.

References

1. McCammon, J. A., Gelin, B. R., and Karplus, M. (1977). Dynamics of folded
proteins, Nature, 267, pp. 585–590.
2. Karplus, M., and McCammon, J. A. (2002). Molecular dynamics simula-
tions of biomolecules, Nat. Struct. Biol., 9, pp. 646–652.
3. Karplus, M., and Kuriyan, J. (2005). Molecular dynamics and protein
function, Proc. Natl. Acad. Sci. U. S. A., 102, pp. 6679–6685.
4. Adcock, S. A., and McCammon, J. A. (2006). Molecular dynamics: A
survey of methods for simulating the activity of proteins, Chem. Rev.,
106, pp. 1589–1615.
5. Sherwood, P., Brooks, B. R., and Sansom, M. S. P. (2008). Multiscale methods
for macromolecular simulations, Curr. Opin. Struct. Biol., 18, pp. 630–
640.
6. Ayton, G. S., and Voth, G. A. (2009). Systematic multiscale simulation of
membrane protein systems, Curr. Opin. Struct. Biol., 19, pp. 138–144.
7. Tozzini, V. (2010). Multiscale modeling of proteins, Acc. Chem. Res., 43,
pp. 220–230.
8. Tozzini, V. (2005). Coarse-grained models for proteins, Curr. Opin. Struct.
Biol., 15, pp. 144–150.
9. Merchant, B. A., and Madura, J. D. (2011). A review of coarse-grained
molecular dynamics techniques to access extended spatial and temporal
scales in biomolecular simulations, Annu. Rep. Comput. Chem., 7, pp. 67–
85.
10. Shen, H., Xia, Z., Li, G., and Ren, P. (2012). A review of physics-based
coarse-grained potentials for the simulations of protein structure and
dynamics, Annu. Rep. Comput. Chem., 8, pp. 129–148.
11. Tanaka, S., and Scheraga, H. A. (1976). Medium- and long-range
interaction parameters between amino acids for predicting three-
dimensional structures of proteins, Macromolecules, 9, pp. 945–950.
12. Eisenberg, D., and McLachlan, A. D. (1986). Solvation energy in protein
folding and binding, Nature, 319, pp. 199–203.
13. Miyazawa, S., and Jernigan, R. L. (1985). Estimation of effective
interresidue contact energies from protein crystal structures: Quasi-
chemical approximation, Macromolecules, 18, pp. 534–552.
14. Wang, J., and Wang, W. (1999). A computational approach to simplifying
the protein folding alphabet, Nat. Struct. Biol., 6, pp. 1033–1038.
15. Wilson, C., and Doniach, S. (1989). A computer model to dynamically
simulate protein folding: Studies with crambin, Proteins, 6, pp. 193–209.
16. Godzik, A., and Skolnick, J. (1992). Sequence-structure matching in
globular proteins: Application to supersecondary and tertiary structure
determination, Proc. Natl. Acad. Sci. U. S. A., 89, pp. 12098–12102.
17. Bryant, S. H., and Lawrence, C. E. (1993). An empirical energy function
for threading protein sequence through the folding motif, Proteins:
Struct. Funct. Genet., 16, pp. 92–112.
18. Skolnick, J., Jaroszewski, L., Kolinski, A., and Godzik, A. (1997).
Derivation and testing of pair potentials for protein folding. When is the
quasichemical approximation correct? Protein Sci., 6, pp. 676–688.
19. Thomas, P. D., and Dill, K. A. (1996). Statistical potentials extracted from
protein structures: How accurate are they? J. Mol. Biol., 257, pp. 457–
469.
20. Mullinax, J. W., and Noid, W. G. (2010). Recovering physical potentials
from a model protein databank, Proc. Natl. Acad. Sci. U. S. A., 107, pp.
19867–19872.
21. Ben-Naim, A. (1997). Statistical potentials extracted from protein
structures: Are these meaningful potentials? J. Chem. Phys., 107, pp.
3698–3706.
22. Levitt, M., and Warshel, A. (1975). Computer simulation of protein
folding, Nature, 253, pp. 694–698.
23. Reith, D. M., Pütz, M., and Müller-Plathe, F. (2003). Deriving effective
mesoscale potentials from atomistic simulations, J. Comput. Chem., 24,
pp. 1624–1636.
24. Izvekov, S., and Voth, G. A. (2005). A multiscale coarse-graining method
for biomolecular systems, J. Phys. Chem. B, 109, pp. 2469–2473.
25. Tozzini, V., Rocchia, W., and McCammon, J. A. (2006). Mapping all-atom
models onto one-bead coarse-grained models: General properties and
applications to a minimal polypeptide model, J. Chem. Theory Comput., 2,
pp. 667–673.
26. Tozzini, V., Trylska, J., Chang, C., and McCammon, J. A. (2007). Flap
opening dynamics in HIV-1 protease explored with a coarse-grained
model, J. Struct. Biol., 157, pp. 606–615.
27. Trylska, J., Tozzini, V., Chang, C., and McCammon, J. A. (2007). HIV-1
protease substrate binding and product release pathways explored
with coarse-grained molecular dynamics, Biophys. J., 92, pp. 4179–
4187.
28. Sorenson, J. M., and Head-Gordon, T. (2002). Toward minimalist models
of larger proteins: An ubiquitin-like protein, Proteins: Struct. Funct. Genet.,
46, pp. 368–379.
29. Liwo, A., Khalili, M., and Scheraga, H. A. (2005). Ab initio simulations
of protein-folding pathways by molecular dynamics with the united-
residue model of polypeptide chains, Proc. Natl. Acad. Sci. U. S. A., 102,
pp. 2362–2367.
30. Shen, H., Liwo, A., and Scheraga, H. A. (2009). An improved functional form for
the temperature scaling factors of the components of the mesoscopic
UNRES force field for simulations of protein structure and dynamics, J.
Phys. Chem. B, 113, pp. 8738–8744.
31. Marrink, S. J., Risselada, H. J., Yefimov, S., Tieleman, D. P., and de
Vries, A. H. (2007). The MARTINI forcefield: Coarse grained model for
biomolecular simulations, J. Phys. Chem. B, 111, pp. 7812–7824.
32. Monticelli, L., Kandasamy, S. K., Periole, X., Larson, R. G., Tieleman, D.
P., and Marrink, S. J. (2008). The MARTINI coarse-grained force field:
Extension to proteins, J. Chem. Theory Comput., 4, pp. 819–834.
33. Periole, X., Cavalli, M., Marrink, S. J., and Ceruso, M. A. (2009).
Combining an elastic network with a coarse-grained molecular force
field: Structure, dynamics and intermolecular recognition, J. Chem.
Theory Comput., 5, pp. 2531–2543.
34. Shen, H., Moustafa, I. M., Cameron, C. E., and Colina, C. M. (2012).
Exploring the dynamics of four RNA-dependent RNA polymerases by a
coarse-grained model, J. Phys. Chem. B, 116, pp. 14515–14524.
35. Wassenaar, T. A., Ingolfsson, H. I., Prieß, M., Marrink, S. J., and Schäfer, L.
V. (2013). Mixing MARTINI: Electrostatic coupling in hybrid atomistic–
coarse-grained biomolecular simulations, J. Phys. Chem. B, 117, pp.
3516–3530.
36. Kar, P., Gopal, S. M., Cheng, Y., Predeus, A., and Feig, M. (2013). PRIMO:
A transferable coarse-grained force field for proteins, J. Chem. Theory
Comput., 9, pp. 3769–3788.
37. Han, W., Wan, C.-K., Jiang, F., and Wu, Y. (2010). PACE force field
for protein simulations. 1. Full parameterization of version 1 and
verification, J. Chem. Theory Comput., 6, pp. 3373–3389.
38. Ayton, G. S., and Voth, G. A. (2009). A hybrid coarse-graining approach
for lipid bilayers at large length and time scales, J. Phys. Chem. B, 113,
pp. 4413–4424.
39. Izvekov, S., and Voth, G. A. (2005). Multiscale coarse graining of liquid-
state systems, J. Chem. Phys., 123, pp. 134105.
40. Yesylevskyy, S. O., Schäfer, L. V., Sengupta, D., and Marrink, S. J. (2010).
Polarizable water model for the coarse-grained MARTINI force field,
PLoS Comput. Biol., 6, pp. e1000810.
41. Wu, Z., Cui, Q., and Yethiraj, A. (2010). A new coarse-grained model for
water: The importance of electrostatic interactions, J. Phys. Chem. B,
114, pp. 10524–10529.
42. Bratko, D., Blum, L., and Luzar, A. (1985). A simple-model for the
intermolecular potential of water, J. Chem. Phys., 83, pp. 6367–6370.
43. Liu, Y., and Ichiye, T. (1996). Soft sticky dipole potential for liquid water:
A new model, J. Phys. Chem., 100, pp. 2723–2730.
44. Ichiye, T., and Tan, M. L. (2006). Soft sticky dipole-quadrupole-octupole
potential energy function for liquid water: An approximate moment
expansion, J. Chem. Phys., 124, pp. 134504.
45. Golubkov, P. A., and Ren, P. (2006). Generalized coarse-grained model
based on point multipole and Gay–Berne potentials, J. Chem. Phys., 125,
pp. 064103.
46. Golubkov, P. A., Wu, J. C., and Ren, P. (2008). A transferable coarse-
grained model for hydrogen bonding liquids, Phys. Chem. Chem. Phys.,
10, pp. 2050–2057.
47. Xu, P., Shen, H., Yang, L., Ding, Y., Li, B., Shao, Y., Mao, Y., and Li, G.
(2013). Coarse-grained simulations for organic molecular liquids based
on Gay–Berne and electric multipole potentials, J. Mol. Model., 19, pp. 551–
558.
48. Xu, P., Tang, Y., Zhang, J., Zhang, Z., Wang, K., Shao, Y., Shen, H., and
Mao, Y. (2011). Molecular dynamics simulation of organic solvents
based on the coarse-grained model, Acta Phys. Sin., 27, pp. 1839–
1846.
49. Wu, J., Xia, Z., Shen, H., Li, G., and Ren, P. (2011). Gay–Berne and
electrostatic multipole based coarse-grain potential in implicit solvent,
J. Chem. Phys., 135, pp. 155104.
50. Shen, H., Li, Y., Ren, P., Zhang, D., and Li, G. (2014). An anisotropic coarse-
grained model for proteins based on Gay–Berne and electric multipole
potentials, J. Chem. Theory Comput. (in press).
51. Allinger, N. L., Yuh, Y. H., and Lii, J. H. (1989). Molecular mechanics. The
MM3 force field for hydrocarbons. 1, J. Am. Chem. Soc., 111, pp. 8551–
8566.
52. Shi, Y., Xia, Z., Zhang, J., Best, R., Wu, C., Ponder, J. W., and Ren, P. (2013).
Polarizable atomic multipole-based AMOEBA force field for proteins, J.
Chem. Theory Comput., 9, pp. 4046–4063.
53. Duan, Y., Wu, C., Chowdhury, S., Lee, M. C., Xiong, G., Zhang, W., Yang, R.,
Cieplak, P., Luo, R., Lee, T., Caldwell, J., Wang, J., and Kollman, P. (2003). A
point-charge force field for molecular mechanics simulations of proteins
based on condensed-phase quantum mechanical calculations, J. Comput.
Chem., 24, pp. 1999–2012.
54. MacKerell, A. D., Feig, M., and Brooks, C. L. (2004). Improved treatment
of the protein backbone in empirical force fields, J. Am. Chem. Soc., 126,
pp. 698–699.
55. Gay, J. G., and Berne, B. J. (1981). Modification of the overlap potential to
mimic a linear site-site potential, J. Chem. Phys., 74, pp. 3316–3319.
56. Cleaver, D. J., Care, C. M., Allen, M. P., and Neal, M. P. (1996). Extension and
generalization of the Gay–Berne potential, Phys. Rev. E, 54, pp. 559–567.
57. Berne, B. J., and Pechukas, P. (1972). Gaussian model potential for
molecular interactions, J. Chem. Phys., 56, pp. 4213–4216.
58. Ren, P., and Ponder, J. W. (2003). Polarizable atomic multipole water
model for molecular mechanics simulation, J. Phys. Chem. B, 107, pp.
5933–5947.
59. Wheatley, R. J., and Mitchell, J. B. O. (1994). Gaussian multipoles in
practice: Electrostatic energies for intermolecular potentials, J. Comput.
Chem., 15, pp. 1187–1198.
60. Stone, A. J. (2011). Electrostatic damping functions and the penetration
energy, J. Phys. Chem. A, 115, pp. 7017–7027.
61. Stone, A. J., and Alderton, M. (1985). Distributed multipole analysis:
Methods and applications, Mol. Phys., 56, pp. 1047–1064.
62. Ramachandran, G. N., Ramakrishnan, C., and Sasisekharan, V. (1963).
Stereochemistry of polypeptide chain configurations, J. Mol. Biol., 7, pp.
95–99.
63. Ting, D., Wang, G., Shapovalov, M., Mitra, R., Jordan, M. I., and
Dunbrack, R. L. (2010). Neighbor-dependent Ramachandran probability
distributions of amino acids developed from a hierarchical Dirichlet
process model, PLoS Comput. Biol., 6, pp. e1000763.
64. Buchete, N. V., Straub, J. E., and Thirumalai, D. (2003). Anisotropic
coarse-grained statistical potentials improve the ability to identify
native-like protein structures, J. Chem. Phys., 118, pp. 7658–7671.
65. Buchete, N. V., Straub, J. E., and Thirumalai, D. (2004). Orientational
potentials extracted from protein structures improve native fold
recognition, Protein Sci., 13, pp. 862–874.
66. Chou, P. Y., and Fasman, G. D. (1974). Conformational parameters for
amino acids in helical, β-sheet, and random coil regions calculated from
proteins, Biochemistry, 13, pp. 211–222.
67. Richardson, J. S., and Richardson, D. C. (1988). Amino acid preferences
for specific locations at the ends of alpha helices, Science, 240, pp. 1648–
1652.
68. Hudgins, R. R., Ratner, M. A., and Jarrold, M. F. (1998). Design of helices
that are stable in vacuo, J. Am. Chem. Soc., 120, pp. 12974–12975.
69. Levy, Y., Jortner, J., and Becker, O. M. (2001). Solvent effects on the energy
landscapes and folding kinetics of polyalanine, Proc. Natl. Acad. Sci. U. S. A.,
98, pp. 2188–2193.
70. Best, R. B., Buchete, N. V., and Hummer, G. (2008). Are current molecular
dynamics force fields too helical? Biophys. J., 95, pp. L07–L09.
71. Hegefeld, W. A., Chen, S. E., DeLeon, K. Y., Kuczera, K., and Jas, G. S.
(2010). Helix formation in a pentapeptide: Experiment and force-field
dependent dynamics, J. Phys. Chem. A, 114, pp. 12391–12402.
72. Tabib-Salazar, A., Liu, B., Doughty, P., Lewis, R. A., Ghosh, S., Parsy, M. L.,
Simpson, P. J., O’Dwyer, K., Matthews, S. J., and Paget, M. S. (2013). The
actinobacterial transcription factor RbpA binds to the principal sigma
subunit of RNA polymerase, Nucleic Acids Res., 41, pp. 5679–5691.
73. Tommos, C., Valentine, K. G., Martinez-Rivera, M. C., Liang, L., and
Moorman, V. R. (2013). Reversible phenol oxidation and reduction in the
structurally well-defined 2-mercaptophenol-α3C protein, Biochemistry,
52, pp. 1409–1418.
74. Schnieders, M., and Ponder, J. W. (2007). Polarizable atomic multipole
solutes in a generalized Kirkwood continuum, J. Chem. Theory Comput.,
3, pp. 2083–2097.
75. Qiu, D., Shenkin, P., Hollinger, F., and Still, W. (1997). The GB/SA
continuum model for solvation. A fast analytical method for the
calculation of approximate Born radii, J. Phys. Chem. A, 101, pp. 3005–
3014.
76. Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W., and Klein,
M. L. (1983). Comparison of simple potential functions for simulating
liquid water, J. Chem. Phys., 79, pp. 926–935.
77. Gu, J., Bai, F., Li, H., and Wang, X. (2012). A generic force field for protein
coarse-grained molecular dynamics simulation, Int. J. Mol. Sci., 13, pp.
14451–14469.

Chapter 14

Coarse-Grained Membrane Force Field
Based on Gay–Berne Potential and
Electric Multipoles

Dejun Lin and Alan Grossfield


Department of Biochemistry and Biophysics,
University of Rochester Medical Center,
601 Elmwood Ave, Box 712, Rochester, New York 14620, USA
alan grossfi[email protected]

14.1 Introduction

Membranes play a vital role in many biological processes. They
separate the cytoplasm from the extracellular environment and provide
basic compartmentalization of intracellular processes. They also
regulate the exchange of material and information between the
enclosed cell and its environment. The dynamic and structural
properties of biomembranes, as well as their interactions with other
cellular components such as membrane proteins, have long been an active
research field. The motions of biomembranes span a wide range
of temporal and spatial scales (Jacobson et al., 2007; Phillips

Many-Body Effects and Electrostatics in Biomolecules


Edited by Qiang Cui, Markus Meuwly, and Pengyu Ren
Copyright c 2016 Pan Stanford Publishing Pte. Ltd.
ISBN 978-981-4613-92-7 (Hardcover), 978-981-4613-93-4 (eBook)
www.panstanford.com
et al., 2009; Vereb et al., 2003), from angstroms to microns and from
picoseconds to microseconds. Various computational models have been
developed to gain insight into the multiscale motions of biomembranes.
Atomistic force fields, such as the united-atom lipid parameters commonly
used with GROMACS (Berger et al., 1997; Chiu et al., 2009) and the
all-atom CHARMM force field (Feller and MacKerell, 2000; Klauda et al.,
2010, 2012; Lim et al., 2012), give details of atomic interactions between
membrane lipids and proteins. At the other extreme are finite-
element models that describe large-scale mechanical properties of
membranes (Chen et al., 2008; Ma et al., 2009; Tang et al., 2006,
2008). The all-atom and continuum models represent two ends of
a spectrum of multiscale modeling, where one can trade spatial
resolution and fidelity for computational performance depending on
the problem at hand. Between these two ends are the coarse-grained
(CG) models, where neighboring atoms are grouped and treated as
an individual interaction site or superatom. The CG models preserve
a reasonable level of detail about molecular interactions while
dramatically improving computational performance in two crucial
ways: CG models have far fewer interactions (thus reducing the
cost per time step), and they involve moving heavier particles on
smoother potential energy surfaces (allowing a larger time step).
The computational efficiency of these CG models and their ability
to capture large-scale properties make simulations of membrane
assembly and vesicle fusion possible (Marrink et al., 2007; Orsi and
Essex, 2011; Risselada et al., 2008; Wu et al., 2011b).
An important aspect of modeling biomembranes is to reproduce
their electrostatic properties, which are well known to play
important roles in many biological processes (Cladera et al., 2003;
McLaughlin, 1989; O’Shea, 2003; Rokitskaya et al., 2002; Starke-
Peterkovic et al., 2005). However, for the sake of computational
efficiency, many CG force fields oversimplify or abstract out
completely the treatment of electrostatics (Elezgaray and Laguerre,
2006; Izvekov and Voth, 2005a; Kranenburg et al., 2004; Lyubartsev,
2005; Marrink et al., 2004, 2007; Shelley et al., 2001). A good
example is the non-polarizable water model in the MARTINI force field,
where four water molecules are grouped into one van der Waals
particle with no electrostatic interactions. For this reason, MARTINI
has serious flaws when representing electrostatic interactions and
solvation of charged particles. For example, it underestimates the
free energy barrier of transferring arginine side chains into the
membrane by 10–15 kcal/mol (Vorobyov et al., 2008). Another
example is that the electrostatic interactions between the 1-
Palmitoyl-2-Oleoyl-sn-glycero-3-PhosphoGlycerol (POPG) lipid head
groups (negatively charged) and the lysine residues (positively
charged) in an antimicrobial lipopeptide micelle (C16-KGGK) are
exaggerated so that they co-crystallize into a two-dimensional
lattice (Horn et al., 2012).
Attempts have been made to improve the MARTINI force field
by introducing water models with explicit electrostatics, i.e., the
polarizable MARTINI water (PMW) (Yesylevskyy et al., 2010) and
the big multipole water (BMW) model (Wu et al., 2011b). Both
of them qualitatively improve the description of the electrostatic
properties of membrane bilayers, particularly those at the water–
membrane interface. However, further improvements of these
models are still required. For example, the potentials of mean force
(PMFs) of inserting charged amino acid side chains into membrane
calculated from the BMW-MARTINI models do not agree with those
calculated from atomic simulations (Wu et al., 2011b). Also, the
general lack of electrostatics in other uncharged but polar CG
particles, e.g., the glycerol ester, might cause issues in a complex
system where electrostatic interactions are important.
Another CG lipid force field worth mentioning is the ELBA force
field, which explicitly models electrostatics by including dipoles
in the glycerol-ester region of lipids and uses the soft sticky
dipole (SSD) water model (Liu and Ichiye, 1996). The ELBA
model impressively predicted some electrostatic and mechanical
properties of membranes (Orsi and Essex, 2011), but the orientation
of the lipid head groups does not agree with the all-atom model.
Also, the contribution from water to the electrostatic potential as
a function of position along the membrane normal does not agree with
the all-atom model. This is likely due to the lack of a hydrogen-bonding
network between water and the lipid head groups; the SSD water
model represents inter-water hydrogen bonds with an effective energy
term, but the ELBA force field does not have an equivalent term between
water and lipids.
The CG lipid model of Izvekov and Voth (2005a,b)
took a totally different approach. Where the other potentials were
parametrized in a top-down fashion to match specific experimental
quantities, Izvekov and co-workers instead worked bottom-up,
requiring the model to reproduce the mean forces from all-
atom simulations. In fact, it represents a general class of CG
methods termed the inversion methods where the CG models
are parametrized to reproduce the underlying PMFs from all-
atom simulations (Lyubartsev and Laaksonen, 1995; Murtola et al.,
2009; Reith et al., 2001). In those methods, the electrostatics
are implicitly modeled in the CG effective potentials and are not
easily deconvolved from other terms. We refer the reader to
a recent review for a more detailed discussion (Cisneros et al.,
2014). We do want to point out here, however, that inversion
methods, while arguably more rigorously based, suffer from limited
transferability, because the PMFs used to derive them are highly
dependent on the conditions under which they are obtained.
Moreover, these calculations require an enormous amount of data
to converge for even moderately complex systems. Also, the lack
of independent terms limits the ability of subsequent simulations
to identify the underlying cause of a behavior, e.g., attributing
phenomena to electrostatics vs. van der Waals interactions.

14.2 GBEMP: A Coarse-Grained Model Based on the Gay–Berne Potential and Electric Multipoles

Recently, a CG force field, termed GBEMP, was proposed by
Ren’s group to model small molecules (Golubkov and Ren, 2006;
Golubkov et al., 2008), peptides, and proteins (Shen et al., 2014;
Wu et al., 2011a). This force field has two main innovations over
previous approaches. First, the standard spherical van der Waals
potentials are replaced with an ellipsoidal potential, represented using
the Gay–Berne function (Cleaver et al., 1996). As a result, each CG particle
is associated with three rotational degrees of freedom (DOFs) in
addition to the three translational DOFs. The extra DOFs help
compensate for the configurational entropy lost due to coarse-graining,
and give the particle a “shape” that can more effectively represent
the distribution of the underlying atoms. Second, where standard CG
models often have electrostatics only when the atoms have a formal
charge, this model uses a multipole expansion, giving each particle
a charge, dipole, and quadrupole moment. As a result, the model
can capture electrostatic interactions between polar but uncharged
moieties, for example the peptide backbone or the glycerol region
of a lipid. Finally, the GBEMP model can be rigorously reverse-coarse-grained:
one can rebuild an all-atom representation from the coarse-grained
one without the degeneracy that is often a
problem when combining all-atom and CG approaches. GBEMP
can do so because the initial parameterization involves explicit
mapping of the local and global reference frames of the CG particle
and the underlying all-atom representation (Golubkov and Ren,
2006; Golubkov et al., 2008). This offers a route to a multiscale
modeling scheme where the CG part is used to predict
large-scale motion while the atomic part is used to refine
structural information. The force field is parametrized to reproduce
the fundamental intermolecular forces between CG particles and
then calibrated using experimental liquid-phase thermodynamic
data of the model compounds much as all-atom force fields are
parametrized by combining quantum calculations and experiment.
This strategy allows the parameters to be transferable to different
systems and environments. The force field is able to reproduce the
macroscopic properties of the model compounds as compared to
experimental data (Golubkov and Ren, 2006; Golubkov et al., 2008;
Wu et al., 2011a). It also gives reasonable results in representing
intermolecular interactions, such as PMFs along the torsional angles
as well as the dipole moments of dipeptides, as compared to atomic
simulations (Shen et al., 2014). Although GBEMP is significantly
more computationally costly than the simplest CG models, e.g.,
MARTINI, it is expected to be significantly faster than the equivalent
all-atom model (Wu et al., 2011a).

14.3 Application of the GBEMP Model to Lipid Membranes

As discussed in Section 14.1, the oversimplified representation
of electrostatics is a major source of error in CG modeling of
membranes. Accordingly, we have extended the GBEMP force field to
model membrane lipids. The formalism of the GB potential and
EMP expansion has been described extensively in previous work by
Ren’s group (Wu et al., 2011a) and in the chapter in the present volume
by Li and Shen (Chapter 13, “A Physics-Based Coarse-Grained Model
with Electric Multipoles”). Here, we follow a similar parametrization
procedure, with a few slight modifications. The bonded terms such
as bond stretching, angle bending and torsion are parametrized
by fitting the potential energy function (quadratic functions for
bond and angle and three-term Fourier series for torsion) to
the distribution of the respective terms generated from all-atom
simulations (Wu et al., 2011a). The procedure for parameterizing the
non-bonded terms is described in detail below.
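
As an illustration of fitting a quadratic bonded term to a sampled distribution, the sketch below uses a simple Boltzmann inversion under a Gaussian assumption. This is one common approach and not necessarily the exact fitting procedure used for GBEMP; the function names, the temperature, the energy convention U(x) = 0.5·k·(x − x0)², and the sample bond lengths are all hypothetical, and Jacobian corrections are ignored.

```python
import statistics

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def fit_harmonic(samples, temperature=300.0):
    """Fit U(x) = 0.5 * k * (x - x0)**2 to sampled values of a bond or angle.

    Assumes the samples are Boltzmann distributed in a harmonic well, so the
    distribution is Gaussian with mean x0 and variance kB*T / k.
    """
    x0 = statistics.fmean(samples)
    variance = statistics.pvariance(samples, mu=x0)
    k = KB_KCAL * temperature / variance
    return x0, k

def harmonic_energy(x, x0, k):
    """Harmonic energy in kcal/mol for the fitted parameters."""
    return 0.5 * k * (x - x0) ** 2

# Hypothetical CG bond lengths (in angstroms) mapped from all-atom frames.
bond_samples = [3.9, 4.0, 4.1, 4.0, 3.95, 4.05]
b0, kb = fit_harmonic(bond_samples)
print(f"b0 = {b0:.3f} A, k = {kb:.1f} kcal/(mol*A^2)")
```

A torsional term could be treated in the same spirit by fitting the Fourier coefficients to −kB·T·ln P(φ), with P(φ) the sampled torsion histogram.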

14.3.1 Group Neighboring Heavy Atoms into CG Particles


To maximize the efficiency of the CG model while preserving
physical fidelity, we define each CG particle in our model to consist
of the maximal number of heavy atoms that preserve the shape of
the underlying atom group. The shape of an atom group is defined as
the probability distribution of the first and the average of the second
and third principal moments of inertia (MOIs). This probability
distribution is generated from a 700 ns all-atom simulation of 150
POPE and 50 POPG lipids using the CHARMM27 force field (Feller
and MacKerell, 2000). Figure 14.1 shows this distribution of CG
groups representing 3, 4 or 5 carbons as in the hydrocarbon tails
of 1-Palmitoyl-2-Oleoyl-sn-glycero-3-PhosphoEthanolamine (POPE)
and POPG lipids. While the 3-carbon group results in a single peak,
the 4- and 5-carbon groups give distributions with much larger
variance, which means they do not have a single well-defined shape
during the simulations; this makes sense, since the presence of
rotatable carbon–carbon bonds allows the moieties to significantly
elongate and contract over the course of a simulation. Similar
analysis on CG groups other than carbons shows that three to four
heavy atoms per CG particle is the optimal choice of coarse-graining.
As a result, we have chosen to segment lipids into CG particles in the
manner described in Fig. 14.2; the figure shows a POPC molecule,
but other lipids are segmented analogously.
[Figure 14.1: three two-dimensional probability maps, panels A–C. Horizontal axis: 1st principal component (Dalton·Å²); vertical axis: average of the 2nd and 3rd principal components (Dalton·Å²); color scale: probability.]

Figure 14.1 Distribution of the principal moments of inertia of a 3- (A), 4-
(B), or 5- (C) carbon coarse-grained group.

Figure 14.2 Segmentation of an all-atom (ball-and-stick) POPC molecule
into CG particles (ellipsoids) with hydrocarbon CG groups in grey, ester in
red, glycerol backbone in white, phosphate in orange, and choline in purple.
[Figure 14.3: potential energy (kcal/mol) versus center-of-mass distance (Å) for all-atom and coarse-grained dimers in the cross, T-shape, side-by-side, and end-to-end configurations.]

Figure 14.3 The potential energy of a dimer of propyl groups from all-atom
and coarse-grained models as a function of the center-of-mass distances
between the two monomers of different configurations. Refer to Cleaver
et al. (1996) for the definitions of the configurations. Note that the T-shape,
side-by-side and end-to-end curves are shifted up by 1, 2, and 3 kcal/mol,
respectively.

14.3.2 Derive Initial Parameters from Gas-Phase Calculations

To parametrize the Gay–Berne potential, a homo-dimer of a coarse-grained
group is generated in each of a set of canonical configurations defined
previously (Cleaver et al., 1996). Then, the corresponding
all-atom VDW potential energy of each configuration is calculated
using the AMOEBA force field (Ren et al., 2011). A minimization
algorithm seeded by an initial guess of the Gay–Berne parameters
is used to optimize the Gay–Berne parameters to reproduce the
energies for all configurations. Figure 14.3 shows an example fit,
comparing the Gay–Berne and all-atom AMOEBA potentials for a
dimer of propyl groups in gas phase. As expected, the Gay–Berne
potential does a good job representing the differences between the
various dimer orientations, something that would be impossible for
a standard spherical CG particle.
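
To make the orientation dependence concrete, the following sketch evaluates the original uniaxial Gay–Berne potential for two identical particles, together with a sum-of-squares objective that a minimizer could drive toward zero. This is a simplification: GBEMP uses the generalized form of Cleaver et al. (1996), and the function names, the default exponents (mu = 2, nu = 1), and all numerical values here are illustrative assumptions rather than GBEMP parameters.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gay_berne(r_vec, u1, u2, sigma0, eps0, kappa, kappa_p, mu=2.0, nu=1.0):
    """Uniaxial Gay-Berne energy for identical particles.

    r_vec: center-to-center vector; u1, u2: unit vectors along the long axes.
    kappa: length-to-breadth ratio; kappa_p: side-by-side to end-to-end
    well-depth ratio.
    """
    r = math.sqrt(_dot(r_vec, r_vec))
    r_hat = [x / r for x in r_vec]
    a, b, c = _dot(r_hat, u1), _dot(r_hat, u2), _dot(u1, u2)

    chi = (kappa ** 2 - 1.0) / (kappa ** 2 + 1.0)
    kp = kappa_p ** (1.0 / mu)
    chi_p = (kp - 1.0) / (kp + 1.0)

    def g(x):
        # Shared angular factor used for both the range and the strength.
        return 1.0 - 0.5 * x * ((a + b) ** 2 / (1.0 + x * c)
                                + (a - b) ** 2 / (1.0 - x * c))

    sigma = sigma0 / math.sqrt(g(chi))                     # contact distance
    eps = eps0 * (1.0 - chi ** 2 * c ** 2) ** (-0.5 * nu) * g(chi_p) ** mu
    rho = sigma0 / (r - sigma + sigma0)                    # shifted LJ form
    return 4.0 * eps * (rho ** 12 - rho ** 6)

def fit_error(params, reference):
    """Sum of squared deviations from reference (e.g., all-atom) dimer energies.

    reference: iterable of (r_vec, u1, u2, energy) tuples.
    """
    sigma0, eps0, kappa, kappa_p = params
    return sum((gay_berne(rv, u1, u2, sigma0, eps0, kappa, kappa_p) - e) ** 2
               for rv, u1, u2, e in reference)

# Illustrative values only.
x_axis, z_axis = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
shift = (2.0 ** (1.0 / 6.0) - 1.0) * 4.0
side_by_side = gay_berne((4.0 + shift, 0.0, 0.0), z_axis, z_axis, 4.0, 0.5, 3.0, 5.0)
end_to_end = gay_berne((0.0, 0.0, 12.0 + shift), z_axis, z_axis, 4.0, 0.5, 3.0, 5.0)
```

In this simplified form, the two energies at the respective minima differ by the factor kappa_p, which is the kind of orientation dependence a single spherical site cannot express.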
Figure 14.4 Electrostatic potential difference between the coarse-grained
and all-atom models of (A) a phosphate group and (B) a water molecule,
contoured at 0.5 kcal/(mol·electron). Crosses are points on the x- or y-
axis, and the distance between two neighboring crosses is 1 Å. Molecules
are rendered as VDW spheres with oxygen in red, phosphorus in orange, and
hydrogen in white.

The electric multipole moments are derived from ab initio quantum
mechanics calculations. First, a model chemical compound is
chosen to represent each type of CG particle. The structure of
this compound is taken from the Cambridge Structural Database
(CSD) (Allen, 2002). From there, the derivation of EMP moments
follows the procedure used to parameterize the AMOEBA force
field (Ren et al., 2011).
Figure 14.4 shows the difference of electrostatic potential (ESP)
between the coarse-grained models and the corresponding all-atom
models of a phosphate group (A) and a water molecule (B) in gas
phase. The coarse-grained ESP is calculated from a point multipole
expansion at the phosphorus and the center of mass, respectively,
while the all-atom ESP is calculated from multipole expansions at
each atom. In principle, the closer to the atoms, the poorer the
approximations of the coarse-grained models become. Thus, in the
example shown in Fig. 14.4, at about 3 Å away from the VDW surface
of the molecules, the error in the coarse-grained ESP is roughly 0.5
kcal/(mol·electron) and is much smaller at longer distance.
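
The distance dependence of this error can be illustrated with a small sketch that collapses a set of atomic point charges into a single-site charge, dipole, and traceless quadrupole, then compares the resulting potential with the exact point-charge potential. The water-like geometry, charges, and function names are hypothetical, and the atomistic side here uses point charges only, not the per-atom AMOEBA multipoles described in the text.

```python
import math

COULOMB = 332.0637  # kcal*angstrom/(mol*e^2)

def multipoles(charges, positions, center):
    """Total charge, dipole, and traceless quadrupole about `center`."""
    q_tot, mu = 0.0, [0.0] * 3
    quad = [[0.0] * 3 for _ in range(3)]
    for q, pos in zip(charges, positions):
        d = [pos[i] - center[i] for i in range(3)]
        d2 = sum(x * x for x in d)
        q_tot += q
        for i in range(3):
            mu[i] += q * d[i]
            for j in range(3):
                quad[i][j] += q * (3.0 * d[i] * d[j] - (d2 if i == j else 0.0))
    return q_tot, mu, quad

def esp_multipole(point, center, q_tot, mu, quad):
    """Potential of the single-site expansion, truncated at the quadrupole."""
    r = [point[i] - center[i] for i in range(3)]
    rn = math.sqrt(sum(x * x for x in r))
    phi = q_tot / rn
    phi += sum(mu[i] * r[i] for i in range(3)) / rn ** 3
    phi += 0.5 * sum(quad[i][j] * r[i] * r[j]
                     for i in range(3) for j in range(3)) / rn ** 5
    return COULOMB * phi

def esp_exact(point, charges, positions):
    """Potential of the underlying point charges."""
    return COULOMB * sum(q / math.dist(point, pos)
                         for q, pos in zip(charges, positions))

# Hypothetical water-like model: O, H, H (charges in e, coordinates in angstroms).
charges = [-0.8, 0.4, 0.4]
positions = [(0.0, 0.0, 0.0), (0.76, 0.59, 0.0), (-0.76, 0.59, 0.0)]
masses = [15.999, 1.008, 1.008]
center = tuple(sum(m * p[i] for m, p in zip(masses, positions)) / sum(masses)
               for i in range(3))
q_tot, mu, quad = multipoles(charges, positions, center)

def esp_error(distance):
    """Absolute ESP error, kcal/(mol*e), at `distance` from the center along y."""
    point = (center[0], center[1] + distance, center[2])
    return abs(esp_multipole(point, center, q_tot, mu, quad)
               - esp_exact(point, charges, positions))
```

Because the first neglected term is the octupole, the error in this toy model falls off roughly as the inverse fourth power of the distance from the expansion center.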
14.3.3 Validate and Adjust Parameters by Liquid-Phase Simulations

The above procedure used to derive the parameters is focused on
gas-phase interactions. As such, the resulting parameters must be
adjusted in order to be applicable to condensed-phase simulations.
In our case, the condensed phase is typically a liquid crystalline
bilayer solvated by water. For this reason, we refined our parameters
using structures from a 1-μs all-atom simulation of a POPC
membrane bilayer; for each frame, we calculated the induced dipole
moments on each atom using the permanent EMP moments derived
in Section 14.3.2; the calculation was performed using the AMOEBA
force field (Ren et al., 2011) as implemented in Tinker (Ponder,
2010). We then averaged the induced dipole moments, combining results
for all equivalent atoms in all frames. Following the procedure used
in the parameterization of the GBEMP model of small molecules and
proteins (Golubkov and Ren, 2006; Golubkov et al., 2008; Shen
et al., 2014; Wu et al., 2011a), we added half of the resulting dipole
moments to each corresponding CG permanent EMP, so that the
resulting ESP is equal to that from the permanent and induced
EMP plus polarization energy. Next, we performed condensed-
phase simulations with the adjusted EMP parameters and gas-
phase GB parameters and compared the results to experimental
thermodynamic measurements, including the density and heat
of vaporization; we adjusted the GB parameters to improve the
match. The experimental values are taken from Lide (2007).
Figure 14.5 shows the resulting densities and enthalpies of
vaporization for different CG hydrocarbon polymers as percentage
deviation from the corresponding experimental values.
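
The dipole adjustment described above amounts to a short piece of arithmetic, sketched here under simplifying assumptions: the induced dipoles are already expressed in the local frame of the site, and the numbers are made up for illustration. The function names and the `fraction` argument are hypothetical; only the factor of one half comes from the text.

```python
def mean_vector(vectors):
    """Component-wise average of a list of 3-vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(3))

def condensed_phase_dipole(permanent, induced_frames, fraction=0.5):
    """Add a fraction of the mean induced dipole to a permanent dipole.

    permanent: gas-phase permanent dipole of the site (local frame).
    induced_frames: induced dipoles of equivalent atoms over all frames,
    expressed in the same local frame.
    """
    mean_induced = mean_vector(induced_frames)
    return tuple(p + fraction * m for p, m in zip(permanent, mean_induced))

# Hypothetical values in arbitrary dipole units.
permanent_dipole = (1.0, 0.0, 0.0)
induced_samples = [(0.2, 0.0, 0.1), (0.4, 0.0, -0.1)]
adjusted_dipole = condensed_phase_dipole(permanent_dipole, induced_samples)
```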
Note that the parameter space of a CG particle is limited, as in any
CG force field, and might not be able to describe all the properties of
a molecule. Also, the aforementioned methods are applicable to any
CG particle in general, but the resulting parameters might need to be
adjusted to describe the important properties of lipid membranes,
such as the area per lipid of a bilayer and the lipid–water interfacial
electrostatic potential.
[Figure 14.5: percentage deviation from experimental values versus number of carbons (6–15) for density and enthalpy of vaporization.]

Figure 14.5 Calculated density (solid line) and enthalpy of vaporization
(dashed line) of hexane, nonane, dodecane, and pentadecane, shown
as percentage deviation from the experimental values.

14.4 Implement the GBEMP Force Field in LAMMPS

The original implementation of the GBEMP force field was based on
a modified version of Tinker (Wu et al., 2011a); while this code base is
relatively clean and easy to work with, Tinker is not parallelized,
making it unsuitable for simulating large (on the order of 10^4 to
10^5 particles) complex systems in the condensed phase. Moreover, the
Tinker implementation is only maintained as in-house code in the
Ren lab and is not publicly distributed. To take advantage of
parallel computing and make the model available to more users,
we decided to re-implement it in an open-source, parallelized
software package.
The GBEMP model requires some additional infrastructure
beyond that found in standard molecular dynamics codes. For
example, because the CG particles are ellipsoidal, they have a
net orientation that must be tracked and integrated during the
simulation; this is most easily done using a rigid-body integrator.
For this reason, we chose LAMMPS (Large-scale Atomic/Molecular
Massively Parallel Simulator) (Plimpton, 1995) as our platform;
it is highly modular, implements a diverse set of force fields and
potential models, and already had support for both the Gay–
Berne potential and rigid-body integration. Moreover, LAMMPS is
open-source, facilitating its distribution to the greater simulation
community. We implemented most of the remaining features of
GBEMP, including the multiple off-center bonded (bond, angle and
torsion) and non-bonded (GB and EMP) potentials on each CG
site. The current implementation benefits from the parallelized
computing in LAMMPS and as such offers a significant performance
boost compared to the serial implementation in Tinker. Currently,
only the NVE ensemble is supported, but support of the NVT and
NPT ensembles should be straightforward.
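
To show what tracking an orientation involves, the sketch below advances a unit quaternion under a constant space-frame angular velocity and uses it to rotate a particle's long axis. It is a generic illustration and not the LAMMPS integrator; all function names are hypothetical, and in a real rigid-body scheme the angular velocity would itself be updated from the torque at every step.

```python
import math

def quat_mul(a, b):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2)

def rotate(q, v):
    """Rotate vector v by unit quaternion q (computes q v q*)."""
    w, x, y, z = q
    p = quat_mul(quat_mul(q, (0.0, v[0], v[1], v[2])), (w, -x, -y, -z))
    return p[1:]

def step_orientation(q, omega, dt):
    """Advance q by dt under constant space-frame angular velocity omega."""
    w = math.sqrt(sum(x * x for x in omega))
    theta = w * dt
    if theta == 0.0:
        return q
    s = math.sin(0.5 * theta) / w
    dq = (math.cos(0.5 * theta), omega[0] * s, omega[1] * s, omega[2] * s)
    q_new = quat_mul(dq, q)
    norm = math.sqrt(sum(x * x for x in q_new))
    return tuple(x / norm for x in q_new)  # renormalize against drift

# Quarter turn about z in 100 small steps; the long axis starts along x.
q = (1.0, 0.0, 0.0, 0.0)
for _ in range(100):
    q = step_orientation(q, (0.0, 0.0, math.pi / 2.0), 0.01)
long_axis = rotate(q, (1.0, 0.0, 0.0))
```

Storing a quaternion per particle, rather than a bare axis vector, keeps the full orientation available for the off-center interaction sites mentioned above.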

14.5 Discussion

Accurately reproducing electrostatic interactions in a coarse-grained
force field has always been a challenge (Cisneros et al., 2014),
due mainly to the reduced number of parameters. Electrostatics
are often oversimplified such that only moieties with a formal
charge interact electrostatically. As discussed in Section 14.1, this can
degrade the performance of such models, particularly in simulations of lipid membranes,
which contain a broad range of dielectric environments. Other CG
models are derived from structural (Lyubartsev and Laaksonen,
1995; Murtola et al., 2009; Reith et al., 2001) data such as radial
distribution functions (RDFs) or analogously, PMFs (Izvekov and
Voth, 2005a,b) obtained from higher-resolution all-atom simula-
tions. In such cases, the electrostatics are captured implicitly, making
it difficult to extract the electrostatic contribution from either the
effective potentials or the resulting simulation.
For this reason, we propose the GBEMP lipid model, which
retains much of the rigor of the structure-based models while having
interpretable and transferable terms. GBEMP offers significant
advantages when applied to lipids, because there are many moieties
in lipids that are polar but have no formal charge (e.g., the glycerol
backbone); electrostatic interactions by these groups are neglected
in most coarse-grained models, but are captured by the multipole
expansion used to represent electrostatics in GBEMP (Golubkov and
Ren, 2006; Golubkov et al., 2008; Shen et al., 2014; Wu et al., 2011a).
These advantages are crucial in the simulations of lipid membrane
systems where the water-lipid interfacial electrostatic properties
are important in many biological processes (Cladera et al., 2003;
McLaughlin, 1989; O’Shea, 2003; Rokitskaya et al., 2002; Starke-
Peterkovic et al., 2005). Here we provide a robust parametrization
procedure for the GBEMP lipid model based on the one proposed
before (Wu et al., 2011a), and discuss its implementation in
LAMMPS (Plimpton, 1995).
However, some limitations remain in the present implementa-
tion, particularly with regard to condensed phase simulations. First
of all, at present only the NVE ensemble is supported, although the
extension to other ensembles should be straightforward. Second,
only short-range electrostatics is included; although the formalism
for implementing long-range interaction methods such as Particle-
Mesh Ewald is well known (Sagui et al., 2004), and working
implementations exist (e.g., in Tinker), the process is numerically
challenging, particularly with regard to efficient parallelization.
That said, it is worth mentioning that in some cases a cut-off scheme
is adequate and some of the long-range methods also have their
own issues (Cisneros et al., 2014). Third, the GBEMP protein model
developed by Ren’s group is intended for use with the Generalized-
Kirkwood implicit solvent (Shen et al., 2014; Wu et al., 2011a); the
latter would be challenging to combine with an explicit membrane
model of the type developed here, while the former might require
some parameter adjustment to optimize the interaction potential
although the majority of the work should involve only the addition
of induced dipole moments in condensed phase. Finally, only a
POPC lipid model has been developed and we have not tested it in
condensed phase simulations at the time of this writing.
In the future, we will parameterize more lipid species (e.g.,
POPE, POPG, cholesterol, etc.) and perform rigorous validation and
optimization of the parameters. We will also include more features
in our implementation of the model to make it more amenable to
general application.
Acknowledgment

We thank the Center for Integrated Research Computing at the
University of Rochester for providing computational resources for our
research.

References

Allen, F. H. (2002). The Cambridge Structural Database: A quarter of a million
crystal structures and rising. Acta Crystallogr B 58, Pt 3 Pt 1, pp. 380–388.
Berger, O., Edholm, O., and Jähnig, F. (1997). Molecular dynamics simulations
of a fluid bilayer of dipalmitoylphosphatidylcholine at full hydration,
constant pressure, and constant temperature. Biophys J 72, 5, pp. 2002–2013,
doi:10.1016/S0006-3495(97)78845-3, URL http://dx.doi.org/10.1016/S0006-3495(97)78845-3.
Chen, X., Cui, Q., Tang, Y., Yoo, J., and Yethiraj, A. (2008). Gating
mechanisms of mechanosensitive channels of large conductance, I:
A continuum mechanics-based hierarchical framework. Biophys J 95,
2, pp. 563–580, doi:10.1529/biophysj.107.128488, URL http://dx.doi.org/10.1529/biophysj.107.128488.
Chiu, S.-W., Pandit, S. A., Scott, H. L., and Jakobsson, E. (2009). An improved
united atom force field for simulation of mixed lipid bilayers. J Phys
Chem B 113, 9, pp. 2748–2763, doi:10.1021/jp807056c, URL http://dx.doi.org/10.1021/jp807056c.
Cisneros, G. A., Karttunen, M., Ren, P., and Sagui, C. (2014). Classical
electrostatics for biomolecular simulations. Chem Rev 114, 1, pp.
779–814, doi:10.1021/cr300461d, URL http://dx.doi.org/10.1021/cr300461d.
Cladera, J., O’Shea, P., Hadgraft, J., and Valenta, C. (2003). Influence of
molecular dipoles on human skin permeability: Use of 6-ketocholestanol to
enhance the transdermal delivery of bacitracin. J Pharm Sci 92, 5, pp.
1018–1027, doi:10.1002/jps.10344, URL http://dx.doi.org/10.1002/jps.10344.
Cleaver, D. J., Care, C. M., Allen, M. P., and Neal, M. P. (1996). Extension
and generalization of the Gay-Berne potential. Phys Rev E Stat Phys
Plasmas Fluids Relat Interdiscip Topics 54, 1, pp. 559–567,
doi:10.1103/PhysRevE.54.559, URL http://link.aps.org/doi/10.1103/PhysRevE.54.559.
Elezgaray, J., and Laguerre, M. (2006). A systematic method to derive
force fields for coarse-grained simulations of phospholipids, Computer
Physics Communications 175, 4, pp. 264–268, doi:10.1016/j.cpc.2006.01.009,
URL http://www.sciencedirect.com/science/article/pii/S0010465506001585.
Feller, S. E., and MacKerell, A. D. (2000). An improved empirical potential
energy function for molecular simulations of phospholipids, The
Journal of Physical Chemistry B 104, 31, pp. 7510–7515,
doi:10.1021/jp0007843, URL http://pubs.acs.org/doi/abs/10.1021/jp0007843.
Golubkov, P. A., and Ren, P. (2006). Generalized coarse-grained model based
on point multipole and Gay-Berne potentials. J Chem Phys 125, 6,
p. 64103, doi:10.1063/1.2244553, URL http://dx.doi.org/10.1063/1.2244553.
Golubkov, P. A., Wu, J. C., and Ren, P. (2008). A transferable coarse-grained
model for hydrogen-bonding liquids. Phys Chem Chem Phys 10, 15, pp.
2050–2057, doi:10.1039/b715841f, URL http://dx.doi.org/10.1039/b715841f.
Horn, J. N., Sengillo, J. D., Lin, D., Romo, T. D., and Grossfield, A. (2012).
Characterization of a potent antimicrobial lipopeptide via coarse-grained
molecular dynamics. Biochim Biophys Acta 1818, 2, pp. 212–218,
doi:10.1016/j.bbamem.2011.07.025, URL http://dx.doi.org/10.1016/j.bbamem.2011.07.025.
Izvekov, S., and Voth, G. A. (2005a). A multiscale coarse-graining method for
biomolecular systems. J Phys Chem B 109, 7, pp. 2469–2473,
doi:10.1021/jp044629q, URL http://dx.doi.org/10.1021/jp044629q.
Izvekov, S., and Voth, G. A. (2005b). Multiscale coarse graining of liquid-state
systems. J Chem Phys 123, 13, p. 134105, doi:10.1063/1.2038787, URL
http://dx.doi.org/10.1063/1.2038787.
Jacobson, K., Mouritsen, O. G., and Anderson, R. G. W. (2007). Lipid rafts: At a
crossroad between cell biology and physics, Nat Cell Biol 9, 1, pp. 7–14,
URL http://dx.doi.org/10.1038/ncb0107-7.
Klauda, J. B., Monje, V., Kim, T., and Im, W. (2012). Improving the CHARMM
force field for polyunsaturated fatty acid chains. J Phys Chem B 116,
31, pp. 9424–9431, doi:10.1021/jp304056p, URL http://dx.doi.org/10.1021/jp304056p.
Klauda, J. B., Venable, R. M., Freites, J. A., O’Connor, J. W., Tobias, D. J.,
Mondragon-Ramirez, C., Vorobyov, I., MacKerell, A. D., Jr, and Pastor,
R. W. (2010). Update of the CHARMM all-atom additive force field
for lipids: validation on six lipid types. J Phys Chem B 114, 23, pp.
7830–7843, doi:10.1021/jp101759q, URL http://dx.doi.org/10.1021/jp101759q.
Kranenburg, M., Nicolas, J.-P., and Smit, B. (2004). Comparison of mesoscopic
phospholipid-water models, Phys. Chem. Chem. Phys. 6, pp. 4142–4151,
doi:10.1039/B406433J, URL http://dx.doi.org/10.1039/B406433J.
Lide, D. R. (2007). CRC Handbook of Chemistry and Physics, Internet Version
2007, (87th Edition) (Taylor and Francis, Boca Raton, FL), URL
http://www.hbcpnetbase.com.
Lim, J. B., Rogaski, B., and Klauda, J. B. (2012). Update of the cholesterol force
field parameters in CHARMM. J Phys Chem B 116, 1, pp. 203–210,
doi:10.1021/jp207925m, URL http://dx.doi.org/10.1021/jp207925m.
Liu, Y., and Ichiye, T. (1996). Soft sticky dipole potential for liquid water:
A new model, J Phys Chem 100, 7, pp. 2723–2730,
doi:10.1021/jp952324t, URL http://pubs.acs.org/doi/abs/10.1021/jp952324t.
Lyubartsev, A. P., and Laaksonen, A. (1995). Calculation of effective interaction
potentials from radial distribution functions: A reverse Monte Carlo
approach. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics
52, 4, pp. 3730–3737.
Lyubartsev, A. P. (2005). Multiscale modeling of lipids and lipid bilayers. Eur
Biophys J 35, 1, pp. 53–61, doi:10.1007/s00249-005-0005-y, URL
http://dx.doi.org/10.1007/s00249-005-0005-y.
Ma, L., Yethiraj, A., Chen, X., and Cui, Q. (2009). A computational framework
for mechanical response of macromolecules: Application to the salt
concentration dependence of DNA bendability. Biophys J 96, 9, pp. 3543–3554,
doi:10.1016/j.bpj.2009.01.047, URL http://dx.doi.org/10.1016/j.bpj.2009.01.047.
Marrink, S. J., de Vries, A. H., and Mark, A. E. (2004). Coarse grained model
for semiquantitative lipid simulations, The Journal of Physical Chemistry
B 108, 2, pp. 750–760, doi:10.1021/jp036508g, URL
http://pubs.acs.org/doi/abs/10.1021/jp036508g.
Marrink, S. J., Risselada, H. J., Yefimov, S., Tieleman, D. P., and de Vries,
A. H. (2007). The MARTINI force field: coarse grained model for
biomolecular simulations. J Phys Chem B 111, 27, pp. 7812–7824,
doi:10.1021/jp071097f, URL http://dx.doi.org/10.1021/jp071097f.
McLaughlin, S. (1989). The electrostatic properties of membranes. Annu
Rev Biophys Biophys Chem 18, pp. 113–136,
doi:10.1146/annurev.bb.18.060189.000553, URL
http://dx.doi.org/10.1146/annurev.bb.18.060189.000553.
January 27, 2016 15:47 PSP Book - 9in x 6in 14-Qiang-Cui-c14

References 511

Murtola, T., Bunker, A., Vattulainen, I., Deserno, M., and Karttunen, M. (2009).
Multiscale modeling of emergent materials: biological and soft matter.
Phys Chem Chem Phys 11, 12, pp. 1869–1892, doi:10.1039/b818051b,
URL http://dx.doi.org/10.1039/b818051b.
Orsi, M., and Essex, J. W. (2011). The ELBA force field for coarse-grain
modeling of lipid membranes. PLoS One 6, 12, p. e28637, doi:10.
1371/journal.pone.0028637, URL http://dx.doi.org/10.1371/journal.
pone.0028637.
O’Shea, P. (2003). Intermolecular interactions with/within cell membranes
and the trinity of membrane potentials: kinetics and imaging. Biochem
Soc Trans 31, Pt 5, pp. 990–996, doi:10.1042/, URL http://dx.doi.org/
10.1042/.
Phillips, R., Ursell, T., Wiggins, P., and Sens, P. (2009). Emerging roles for
lipids in shaping membrane-protein function. Nature 459, 7245, pp.
379–385, doi:10.1038/nature08147, URL http://dx.doi.org/10.1038/
nature08147.
Plimpton, S. (1995). Fast parallel algorithms for short-range molecular
dynamics, Journal of Computational Physics 117, 1, pp. 1–19, doi:http://
dx.doi.org/10.1006/jcph.1995.1039, URL http://www.sciencedirect.
com/science/article/pii/S002199918571039X.
Ponder, J. W. (2010). Tinker molecular modeling package, v5.1; washington
university medical school: St. louis, mo, URL http://dasher.wustl.edu/
ffe/.
Reith, D., Meyer, H., and Müller-Plathe, F. (2001). Mapping atomistic to
coarse-grained polymer models using automatic simplex optimization
to fit structural properties, Macromolecules 34, 7, pp. 2335–2345,
doi:10.1021/ma001499k, URL http://pubs.acs.org/doi/abs/10.1021/
ma001499k.
Ren, P., Wu, C., and Ponder, J. W. (2011). Polarizable atomic multipole-based
molecular mechanics for organic molecules. J Chem Theory Comput 7,
10, pp. 3143–3161, doi:10.1021/ct200304d, URL http://dx.doi.org/10.
1021/ct200304d.
Risselada, H. J., Mark, A. E., and Marrink, S. J. (2008). Application of
mean field boundary potentials in simulations of lipid vesicles,
The Journal of Physical Chemistry B 112, 25, pp. 7438–7447,
doi:10.1021/jp0758519, URL http://pubs.acs.org/doi/abs/10.1021/
jp0758519, pMID: 18512884.
Rokitskaya, T. I., Kotova, E. A., and Antonenko, Y. N. (2002). Membrane dipole
potential modulates proton conductance through gramicidin channel:
January 27, 2016 15:47 PSP Book - 9in x 6in 14-Qiang-Cui-c14

512 Coarsed-Grained Membrane Force Field Based on Gay–Berne Potential

movement of negative ionic defects inside the channel. Biophys J 82,


2, pp. 865–873, doi:10.1016/S0006-3495(02)75448-9, URL http://dx.
doi.org/10.1016/S0006-3495(02)75448-9.
Sagui, C., Pedersen, L. G., and Darden, T. A. (2004). Towards an accurate
representation of electrostatics in classical force fields: efficient
implementation of multipolar interactions in biomolecular simulations.
J Chem Phys 120, 1, pp. 73–87, doi:10.1063/1.1630791, URL http://dx.
doi.org/10.1063/1.1630791.
Shelley, J. C., Shelley, M. Y., Reeder, R. C., Bandyopadhyay, S., and Klein,
M. L. (2001). A coarse grain model for phospholipid simulations, The
Journal of Physical Chemistry B 105, 19, pp. 4464–4470, doi:10.1021/
jp010238p, URL http://pubs.acs.org/doi/abs/10.1021/jp010238p.
Shen, H., Li, Y., Ren, P., Zhang, D. and Li, G. (2014). An anisotropic
coarse-grained model for proteins based on gay-berne and electric
multipole potentials, J Chem Theory Comput 10, 2, pp. 731–750,
doi:10.1021/ct400974z, URL http://dx.doi.org/10.1021/ ct400974z.
Starke-Peterkovic, T., Turner, N., Else, P. L., and Clarke, R. J. (2005). Electric
field strength of membrane lipids from vertebrate species: membrane
lipid composition and Na+-K+-ATPase molecular activity. Am J Physiol
Regul Integr Comp Physiol 288, 3, pp. R663–R670, doi:10.1152/
ajpregu.00434.2004, URL http://dx.doi.org/10.1152/ajpregu.00434.
2004.
Tang, Y., Cao, G., Chen, X., Yoo, J., Yethiraj, A., and Cui, Q. (2006). A
finite element framework for studying the mechanical response of
macromolecules: Application to the gating of the mechanosensitive
channel mscl. Biophys J 91, 4, pp. 1248–1263, doi:10.1529/biophysj.
106.085985, URL http://dx.doi.org/10.1529/biophysj.106.085985.
Tang, Y., Yoo, J., Yethiraj, A., Cui, Q., and Chen, X. (2008). Gating mechanisms
of mechanosensitive channels of large conductance, ii: systematic study
of conformational transitions. Biophys J 95, 2, pp. 581–596, doi:10.
1529/biophysj.107.128496, URL http://dx.doi.org/10.1529/biophysj.
107.128496.
Vereb, G., Szöllősi, J., Matkó, J., Nagy, P., Farkas, T., Vigh, L., Mátyus, L.,
Waldmann, T. A., and Damjanovich, S. (2003). Dynamic, yet structured:
The cell membrane three decades after the Singer-Nicolson model.
Proc Natl Acad Sci U S A 100, 14, pp. 8053–8058, doi:10.1073/pnas.
1332550100, URL http://dx.doi.org/10.1073/pnas.1332550100.
Vorobyov, I., Li, L., and Allen, T. W. (2008). Assessing atomistic and coarse-
grained force fields for protein-lipid interactions: the formidable
January 27, 2016 15:47 PSP Book - 9in x 6in 14-Qiang-Cui-c14

References 513

challenge of an ionizable side chain in a membrane. J Phys Chem B


112, 32, pp. 9588–9602, doi:10.1021/jp711492h, URL http://dx.doi.
org/10.1021/jp711492h.
Wu, J., Zhen, X., Shen, H., Li, G., and Ren, P. (2011a). Gay-berne and
electrostatic multipole based coarse-grain potential in implicit solvent.
J Chem Phys 135, 15, p. 155104, doi:10.1063/1.3651626, URL http:
//dx.doi.org/10.1063/1.3651626.
Wu, Z., Cui, Q., and Yethiraj, A. (2011b). A new coarse-grained force field for
membranepeptide simulations, J Chem Theory Comput 7, 11, pp. 3793–
3802, doi:10.1021/ct200593t, URL http://pubs.acs.org/doi/abs/10.
1021/ct200593t.
Yesylevskyy, S. O., Schäfer, L. V., Sengupta, D., and Marrink, S. J. (2010). Po-
larizable water model for the coarse-grained MARTINI force field. PLoS
Comput Biol 6, 6, p. e1000810, doi:10.1371/journal.pcbi.1000810, URL
http://dx.doi.org/10.1371/journal.pcbi.1000810.
This page intentionally left blank
January 27, 2016 15:53 PSP Book - 9in x 6in 15-Qiang-Cui-c15

Chapter 15

RNA Coarse-Grained Model Theory

David Bell and Pengyu Ren


Department of Biomedical Engineering,
Section of Integrative Biology and Center for Computational Biology
and Bioinformatics, Institute for Cellular and Molecular Biology,
The University of Texas at Austin, 107 W. Dean Keeton St. Stop C0800,
Austin, Texas 78712, USA
[email protected]

The prediction of RNA 3D structure is an important yet challenging
task. Techniques used to predict RNA structure must balance
computational expense with desired accuracy. One technique that has
arisen to meet this balance is that of coarse-grained models. Here,
we discuss existing frameworks involving coarse-grained modeling
of RNA 3D structure. Basic RNA structure is first introduced. Then,
the coarse-grained models are divided into two groups: fragment
based and physics based. The models are presented in detail
to understand the advances and limitations of the field. Finally,
conclusions, challenges, and avenues for improvement are tendered,
understanding that coarse-grained modeling techniques fill a salient
niche in the field of RNA structure prediction.

Many-Body Effects and Electrostatics in Biomolecules
Edited by Qiang Cui, Markus Meuwly, and Pengyu Ren
Copyright © 2016 Pan Stanford Publishing Pte. Ltd.
ISBN 978-981-4613-92-7 (Hardcover), 978-981-4613-93-4 (eBook)
www.panstanford.com

15.1 Introduction

There has been long-standing interest in coarse-grained models for
biomolecules such as proteins and lipid membranes [1–4]. Due to
the polar nature of lipids and the folding nature of proteins, the
hydrophobic/hydrophilic properties of these molecules become the
focus when modeling interactions such as aggregation and assembly
[5, 6]. Nucleic acids are highly charged molecules that similarly
consist of a charged, hydrophilic backbone and hydrophobic bases.
The pairing and stacking of the bases are a driving force for helix
formation, while the charge-charge interactions, tuned by the
counterion concentration in the surrounding medium, also determine
the compactness of the structures.
RNA structure prediction aims to take molecular sequence
information as input and return an accurate three-dimensional RNA
structure. The need for RNA structure arises because RNA function
depends directly on molecular structure. For instance, riboswitches
are short mRNA sequences that bind small metabolites. Once bound,
a riboswitch changes its structure in order to regulate the
transcription or translation of the mRNA [7]. Often, the associated
mRNA or protein acts on the metabolite bound to the riboswitch. The
riboswitch hence acts as an environmental sensor for the cell. The
binding domain of a riboswitch is often a loop or pseudoknot region
[7], which can be identified only from an accurate secondary
structure.
Higher-order structures are necessary to understand the function
of ribozymes (ribonucleic acid enzymes). Ribozymes function in
diverse ways, from hydrolytic reactions in cleaving nucleic acids to
peptidyl-transferase activity in building proteins. How the ribozyme
folds into its 3-D structure determines which catalytic sites, such
as metal ions and hydroxyl residues, are available [8]. A wide array
of RNAs are undergoing clinical trials for therapeutic applications
(for a review, see [9]), and one clinical study showed favorable
results for a therapeutic ribozyme [10].
In this chapter, we first discuss the primary and secondary
structure of RNA in Section 15.2. Next, we present RNA tertiary
structure as well as some discussion of RNA folding in Section 15.3.
We divide coarse-grained RNA structure prediction models into two
categories: fragment-based models examined in Section 15.4 and
physics-based models examined in Section 15.5 (force fields) and
Section 15.6 (model comparison). We then conclude our review and
present future goals and challenges for the field in Section 15.7.

15.2 Primary and Secondary Structure

The field of RNA structure prediction uses the terms primary
and secondary structure to refer to molecular sequence and base-
pairing information, respectively. The sequence of an RNA molecule
is composed of the standard A, U, G, and C nucleotides.
The nucleotide adenosine monophosphate is shown in Fig. 15.1.
Adenosine, along with the nucleotide guanosine, is classified as a
purine because its base contains a fused bicyclic ring system (a
six-membered ring fused to a five-membered imidazole ring). Uridine
and cytidine are known as pyrimidines because their bases contain a
single six-membered heterocyclic ring. The phosphate and sugar
groups comprise the RNA backbone and are present in each
nucleotide. The directionality of RNA is defined by the numbering
of the carbon atoms in the sugar ring. RNA sequences are hence
presented 5′ to 3′, corresponding to the direction along the RNA
given by the sugar carbons to which the phosphates are attached.
The nucleobase is attached to the 1′ carbon of the sugar ring. The
assorted nucleobases determine the nucleotide “type” and the
corresponding sequence of a molecule.
The phosphate group of each nucleotide results in a highly charged
(negative) backbone. This limits the bending flexibility under low
salt conditions. In vitro, magnesium ions are often introduced to
facilitate RNA folding. Because a magnesium ion carries a larger
charge (+2) than a sodium ion (+1), magnesium works particularly
well to screen the backbone phosphates and allow the RNA molecule
to fold.
The ribose group of the backbone lends its name to RNA:
ribonucleic acid. The ribose ring adopts a “sugar pucker,” which
describes the endo/exo configuration of the ring members.
Lastly, the nucleobases are able to base-pair due to the
electronegative atoms on their edges, which lead to hydrogen bonding
between the nucleotides. Another noteworthy feature of nucleobases
is that their aromatic groups lend themselves to pi–pi stacking
interactions, termed base stacking. A recent experiment showed that
base-stacking interactions contribute slightly more to stability
than base pairing [11].

Figure 15.1 Structure of adenosine nucleotide. The ribose group is labeled
to show the 5′ to 3′ directionality.

15.3 Three-Dimensional Structure

Following the hierarchical nomenclature of protein structure, RNA
tertiary structure represents a more global layout than secondary
structure alone. The “kissing loops” motif is a prime example of
an RNA tertiary structure contact not regularly seen in secondary
structure [12]. In sequence and in secondary structure, the two
loops of the motif may be distant. In the tertiary structure, the
loops are close in space and form hydrogen bonds. Hence, the
tertiary structure elucidates more global interactions than sequence
or secondary structure alone. It is important to note that tertiary
structure contacts do not amount to a full three-dimensional
structure. In some cases, the tertiary structure contacts could be
drawn onto a given RNA secondary structure. Because of this
distinction, RNA 3D structure is taken to mean the global, molecular
view and is sometimes the preferred target of structure prediction
models.
Computationally, predicting RNA three-dimensional structure is
far more expensive than predicting secondary structure. Dynamic
programming, the general engine of many secondary structure models
such as Mfold [13], was created for efficiency [14]. Expanding the
structure into three dimensions with multiple interaction potentials
requires a large increase in computation. For larger RNA molecules,
the conformational space of the three-dimensional structure is too
large to be sampled effectively by conventional molecular dynamics.
Several simulation protocols have been devised to circumvent these
barriers. For instance, increasing the salt concentration
(specifically of magnesium ions) screens the highly charged
phosphate groups of the backbone and thereby facilitates RNA
folding. Another approach is to raise the temperature of the system,
through either a simulated annealing protocol or a replica exchange
molecular dynamics protocol. Simulated annealing involves raising
the temperature to a very high value and then slowly allowing the
system to “cool” into its lowest energy state. The drawback of this
routine is that the system can become trapped in a metastable state
and be unable to reach the lowest energy state. Replica exchange
molecular dynamics (REMD) runs several copies (replicas) of the
system at different temperatures and periodically attempts to
exchange them, so that each copy effectively moves between high and
low temperatures. REMD has found use in several of the models
presented here. Ultimately, these routines and protocols help to
increase the efficiency of molecular dynamics, but they are not
sufficient to sample physiologically relevant regimes.
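The exchange step of REMD can be illustrated with a short sketch. It assumes the standard Metropolis exchange criterion between two replicas, with energies in kcal/mol and temperatures in kelvin; the function names and the numerical values are illustrative and are not taken from any of the models reviewed here.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis probability of exchanging the configurations of two replicas.

    P = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]), with beta = 1 / (k_B T).
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the exchange between the two replicas is accepted."""
    return rng.random() < swap_probability(energy_i, temp_i, energy_j, temp_j)


# The cold replica holds the higher-energy configuration: always exchanged.
p_downhill = swap_probability(-90.0, 300.0, -100.0, 400.0)
# The cold replica already holds the lower-energy configuration: rarely exchanged.
p_uphill = swap_probability(-100.0, 300.0, -90.0, 400.0)
```

Low-energy configurations found at high temperature therefore migrate toward the low-temperature replicas, which is what lets the cold replica escape metastable states.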
Coarse-grained (CG) models are poised to mediate this conflict
of computational expense by discretizing the RNA molecule. In a CG
model, the nucleotides are not represented at atomic resolution,
but are instead represented by a parameterized set of “beads” or
pseudoatoms. The number of pseudoatoms is smaller than the full set
of nucleotide atoms. This requires fewer interactions and
calculations to be performed, decreasing the computational expense
and the computation time. Notably, the pseudoatom approximation
cannot capture all of the atomic interactions. Hence, ensuring that
the CG models are accurate and capture the interactions of interest
to biologists is key to developing a successful model.
It is straightforward to convert an all-atom nucleotide to a CG
representation: most groups place a pseudoatom on the site of a
nucleotide atom [15–19] or at a center-of-mass location [20–23].
The more challenging task is converting a CG representation back to
an all-atom nucleotide. This back-conversion is necessary (1) so
that developers can compare the predicted all-atom structure to the
experimental structure (if it exists) and (2) to provide a physical
representation with practical meaning to structural biologists.
Because of the ribose rings and the aromatic nucleobases, much of a
nucleotide is not linear but cyclic, and the nucleobases are planar.
Defining the planarity of the rings becomes challenging when working
with a linear representation of pseudoatoms. Some models have
overcome this limitation by modeling ring groups as a triangle,
capturing the plane of the ring [17, 18]. Other models gradually add
atoms back into the structure, checking for clashes and overlaps of
atoms. Despite the various methods used to convert CG nucleotides to
all-atom representations, most models employ some scheme of
molecular dynamics–based minimization. This minimization allows the
atoms to relax slightly and assume a more physical structure.
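The forward (all-atom to CG) mapping described above can be sketched in a few lines. The two placement rules, a bead on an atom site and a bead at a group's center of mass, follow the text; the atom names, masses, and coordinates below are hypothetical and serve only to make the example runnable.

```python
def bead_at_atom(atoms, name):
    """Place the pseudoatom on the site of one chosen atom."""
    return atoms[name][1]


def bead_at_center_of_mass(atoms, names):
    """Place the pseudoatom at the center of mass of a group of atoms."""
    total_mass = sum(atoms[n][0] for n in names)
    return tuple(
        sum(atoms[n][0] * atoms[n][1][k] for n in names) / total_mass
        for k in range(3)
    )


# Hypothetical fragment of a nucleotide: name -> (mass in amu, (x, y, z) in angstroms).
atoms = {
    "P": (30.97, (0.0, 0.0, 0.0)),
    "OP1": (16.00, (1.5, 0.0, 0.0)),
    "OP2": (16.00, (-1.5, 0.0, 0.0)),
    "C4'": (12.01, (3.0, 1.0, 0.0)),
}

phosphate_bead = bead_at_center_of_mass(atoms, ["P", "OP1", "OP2"])
sugar_bead = bead_at_atom(atoms, "C4'")
```

The reverse mapping has no such closed form, because many all-atom arrangements share the same bead positions; this is why the models discussed here rebuild atoms stepwise and then minimize.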
There are two aspects to consider in the prediction of nucleic acid
3D structure. One is the model used to represent the chemistry and
physics of nucleic acids; the other is the sampling algorithm that
provides the optimal solution, i.e., the “best” structure, given the
model. In this chapter, we divide coarse-grained models into two
categories: fragment library based and physics based. At the time of
writing, and focusing on CG protocols, we evaluated four fragment
library–based models and eight physics-based models. If the CG
criterion were dropped, fragment library–based models would
outnumber physics-based models. We will present some typical
examples of the models in each category while also discussing the
sampling approach used with these models.

Figure 15.2 Generalized flow diagram of fragment-based models. (a)
Sequence information for PDB ID: 4JRC. The sequence is divided into (b)
coarse-grained sequence fragments, which are then (c) aligned to a pre-
assembled structure library. (d) The best-fit structures are chosen and
then (e) combined and minimized to form (f) the final three-dimensional
structure. Disclaimer: this diagram is generalized and not comprehensive to
all fragment-based models.

15.4 Fragment Library–Based Models

RNA structure prediction models that utilize a database of solved
3-D structures and structure motifs to piece together new structures
are considered fragment library–based (FB) models in this work.
FB models stem from the field of protein structure prediction; we
refer the reader to [24] for a review of these protein structure
models. In the RNA structure review by Laing and Schlick [25], the
FB model MC-Sym was favored over all other models and techniques,
indicating that FB models remain relevant to RNA structure
prediction. The typical simulation scheme for FB models is presented
in Fig. 15.2, illustrated for the RNA molecule PDB ID: 4JRC [26].
Central to FB models is the collection or library of
pre-solved nucleotide structures and nucleotide motifs. In coarse-
grained models, the representation of the nucleotides is reduced
so that only a few positions per nucleotide are of interest. In the
FB model BARNACLE [27], the seven backbone torsion angles are
specified explicitly in each fragment. When searching preexisting
RNA structures, backbone torsion information can give great detail
about the all-atom, global structure. BARNACLE also employs a base
type term (A, U, G, C) to capture sequence information. The majority
of the model still depends on the backbone conformation.
The library of MC-SYM [28] is built from so-called nucleotide
cyclic motifs (NCMs). Each NCM is up to 8 nt long and captures
structures such as bulge loops and hairpins. Once assembled onto
a molecule, the NCMs overlap each other by one base pair, allowing
each NCM to “fit” to neighboring NCMs. The CG nature of the
model appears in the construction of the NCMs. The backbone atoms
of the phosphate and ribose groups are compiled independently of
the nucleobase atomic structures. Both structures contain the atoms
necessary to capture the glycosidic bond between the ribose group
and the nucleobase. The nucleobases are then fit to the backbone
assemblies via alignment of the overlapping glycosidic bonds.
The model RSIM [29] maintains a library of 3 nt fragments.
When assembled onto a molecule, the fragments overlap each other
by 1 nt. Unlike in other models, the fragments are saved at three
resolutions: 1 backbone bead per nt (located at C1′), 1 backbone
bead plus nucleobase atoms, and all-atom structures. Once chosen
for the given RNA molecule, the fragments are first represented at
the lowest resolution of 1 bead per nt. In succeeding steps, the
nucleobase atoms are assembled for the fragments, followed by all
other backbone atoms. In this way, the model is able to avoid the
steric clashes and steep energy gradients of a non-minimized
structure.
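The tiling of a sequence into overlapping fragments can be sketched as follows. The fragment size of 3 nt and the overlap of 1 nt follow the description above; the function name, the example sequences, and the handling of a leftover 3′ end are assumptions made for illustration and are not taken from RSIM itself.

```python
def tile_fragments(sequence, size=3, overlap=1):
    """Split a sequence into (start, fragment) windows that overlap their neighbors.

    If the regular stride leaves nucleotides uncovered at the 3' end, one extra
    window is anchored at the end of the sequence (an assumption of this sketch).
    """
    step = size - overlap
    if step <= 0 or size > len(sequence):
        raise ValueError("need 0 <= overlap < size <= len(sequence)")
    starts = list(range(0, len(sequence) - size + 1, step))
    if starts[-1] + size < len(sequence):
        starts.append(len(sequence) - size)
    return [(start, sequence[start:start + size]) for start in starts]


windows = tile_fragments("GGCAUCC")
```

Each window would then be matched against the fragment library, and the shared nucleotide between neighboring windows provides the frame in which consecutive fragments are superposed.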
FARNA/FARFAR is an RNA structure prediction model that
employs a library of 3 nt fragments assembled from the crystal
structure of the large ribosomal subunit rRNA from PDB ID:
1FFK [30]. The fragments resolve the seven backbone torsion
angles as well as the sugar pucker and whether the base is a
purine or a pyrimidine. The nucleobases are added to the backbone
fragments and aligned (planarity, twist, etc.) according to the
model’s energy function. Nucleobase parameters such as ideal bond
lengths, used when assembling the all-atom structure, were taken
from the Nucleic Acid Database (http://ndbserver.rutgers.edu). In
the FARFAR extension, an all-atom force field is employed after
initial fragment construction in order to minimize the structure.
Further comparison of the four FB models discussed here is shown
in Table 15.1.
Table 15.1 Comparison of fragment-based models

Model          Nucleotide fragment    Parameterization method       Simulation method     Reference
MC-SYM         Nucleotide cyclic      Correct secondary structure   Las Vegas             [28]
               motif (≤8 nt)
FARNA/FARFAR   3 nt; 7 dihedral       Large ribosomal subunit;      Monte Carlo           [21, 22]
               angles; sugar pucker   Rosetta CG energy function
RSIM           3 nt                   Doublet library from [31]     Closed move Monte     [29]
                                                                    Carlo; graph theory
BARNACLE       7 dihedral angles      Dynamic Bayesian network;     Monte Carlo           [27]
                                      maximum likelihood
                                      estimation

Once FB models have collected the RNA fragments to use in the RNA
structure, the fragments are combined and assembled following the
RNA sequence. Following this, FB models employ a Monte Carlo–type
stochastic simulation to further sample novel conformations
and minimize the structure. Two of the models presented here,
RSIM and BARNACLE, employ a graph theory framework in addition
to a Monte Carlo sampling scheme. RSIM utilizes graph theory to
find the “optimal” structure out of several constructed molecules.
BARNACLE utilizes graph theory in order to determine and account
for hidden dependencies between variables. Both of these models
represent statistical frameworks that may prove useful in
RNA structure prediction.
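A Monte Carlo–type fragment search can be sketched generically. The example below uses the Metropolis acceptance rule with a fragment-replacement move; it is not the algorithm of any specific model above (MC-SYM, for instance, uses a Las Vegas algorithm), and the toy energy function, library size, and temperature are hypothetical.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random):
    """Accept downhill moves always, uphill moves with probability exp(-dE / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def fragment_monte_carlo(n_sites, library_size, energy, kT=0.5, steps=5000, seed=0):
    """Sample fragment assignments; a state stores one library index per site."""
    rng = random.Random(seed)
    state = [rng.randrange(library_size) for _ in range(n_sites)]
    current = energy(state)
    best_state, best_energy = list(state), current
    for _ in range(steps):
        site = rng.randrange(n_sites)
        old = state[site]
        state[site] = rng.randrange(library_size)  # swap in another fragment
        trial = energy(state)
        if metropolis_accept(trial - current, kT, rng):
            current = trial
            if current < best_energy:
                best_state, best_energy = list(state), current
        else:
            state[site] = old  # reject: restore the previous fragment
    return best_state, best_energy


def toy_energy(state):
    """Hypothetical score: fragment index 2 is the best fit at every site."""
    return float(sum((index - 2) ** 2 for index in state))


best_state, best_energy = fragment_monte_carlo(4, 5, toy_energy)
```

In a real FB model the energy would be a knowledge-based or physical scoring function evaluated on the assembled coordinates rather than on library indices.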

15.5 Coarse-Grained Force field

RNA structure prediction models that predominantly rely upon
molecular mechanics techniques such as molecular dynamics (MD)
are referred to here as physics-based (PB) models. These models
are based on the classical all-atom MD simulation technique, which
employs a parameterized force field, a set of energy rules, to
emulate atomic interactions. In order to decrease computational
expense relative to the all-atom representation, CG models group
multiple atoms into beads or pseudoatoms. Correspondingly, the
all-atom MD force field must be converted to describe feasible
interactions between these pseudoatoms.
A typical CG force field utilizes an energy scheme such as
that shown in Equations 15.1–15.4. It is convenient to divide the
energy into bonded and non-bonded terms, where the bonded terms
include harmonic potentials for distance (b − b0) and bending angle
(θ − θ0), and a sinusoidal dependence on torsion (φ). The various
force constants Kdist, Kangle, and Kn are parameterized specifically
for each pseudoatom type.
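\[ E_{\text{total}} = E_{\text{distance}} + E_{\text{angle}} + E_{\text{torsion}} + E_{\text{non-bonded}} \tag{15.1} \]
\[ E_{\text{distance}} = K_{\text{dist}} (b - b_0)^2 \tag{15.2} \]
\[ E_{\text{angle}} = K_{\text{angle}} (\theta - \theta_0)^2 \tag{15.3} \]
\[ E_{\text{torsion}} = \sum_{n=1}^{3} K_n \left[ 1 + \cos(n\phi - \delta_n) \right] \tag{15.4} \]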
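The bonded terms of Equations 15.2–15.4 translate directly into code. The sketch below is a plain transcription of those three expressions; the function names and the sample parameter values are illustrative assumptions, and angles are taken to be in radians.

```python
import math


def distance_energy(b, b0, k_dist):
    """Harmonic bond term, Equation 15.2."""
    return k_dist * (b - b0) ** 2


def angle_energy(theta, theta0, k_angle):
    """Harmonic bending term, Equation 15.3."""
    return k_angle * (theta - theta0) ** 2


def torsion_energy(phi, k, delta):
    """Cosine series over n = 1..3, Equation 15.4.

    k and delta hold the force constants K_n and phase shifts delta_n.
    """
    return sum(
        k_n * (1.0 + math.cos(n * phi - delta_n))
        for n, (k_n, delta_n) in enumerate(zip(k, delta), start=1)
    )


# Hypothetical parameters for one bond, one angle, and one torsion.
e_bonded = (
    distance_energy(1.5, 1.0, 2.0)
    + angle_energy(2.0, 2.0, 10.0)
    + torsion_energy(0.0, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
)
```

Note that Equations 15.2 and 15.3 carry no factor of 1/2, so the force constants here are half the corresponding spring constants in conventions that write K/2.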
The non-bonded potential energy term describes interactions such as
those between charged groups and van der Waals (vdW) interactions,
among others. Due to RNA’s highly charged phosphate backbone,
electrostatic potentials are commonly described through a
Debye–Hückel relation, shown in Equation 15.5, where D is the
dielectric constant (usually that of water), ξ is the Debye length,
and the sum runs over all pairs of charges q_i and q_j separated by
the distance r_ij.
\[ E_{\text{ele}} = \sum_{i>j} \frac{q_i q_j}{4\pi D} \, r_{ij}^{-1} \, e^{-r_{ij}/\xi} \tag{15.5} \]
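As an illustration of Equation 15.5, the screened electrostatic energy can be evaluated with a double loop over charge pairs. The sketch assumes reduced units in which the prefactor is simply 1/(4πD); the function name, charges, and coordinates are hypothetical.

```python
import math


def debye_huckel_energy(charges, coords, dielectric, debye_length):
    """Sum of screened Coulomb interactions over all pairs i > j (Equation 15.5)."""
    total = 0.0
    for i in range(len(charges)):
        for j in range(i):
            r_ij = math.dist(coords[i], coords[j])
            prefactor = charges[i] * charges[j] / (4.0 * math.pi * dielectric)
            total += prefactor * math.exp(-r_ij / debye_length) / r_ij
    return total


# Two phosphate-like unit negative charges at unit separation.
# D is chosen so that 4 * pi * D = 1, which isolates the screening factor.
charges = [-1.0, -1.0]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
e_screened = debye_huckel_energy(charges, coords, 1.0 / (4.0 * math.pi), 1.0)
e_weakly_screened = debye_huckel_energy(charges, coords, 1.0 / (4.0 * math.pi), 100.0)
```

A shorter Debye length, which corresponds to a higher salt concentration, reduces the repulsion between the backbone charges; this is the screening effect invoked earlier for magnesium ions.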
The Lennard-Jones (LJ) potential of Equation 15.6 is commonly
employed to capture a span of interactions including vdW and some
electrostatics. The LJ potential shown here is denoted a 6–12
potential due to the exponents. This potential depends on the
distance between two atoms, r_ij, with a maximal attractive energy
of ε and zero energy at a “radius” of r_ij = σ.
\[ E_{\text{LJ}} = 4\varepsilon \left[ -\left( \frac{\sigma}{r_{ij}} \right)^{6} + \left( \frac{\sigma}{r_{ij}} \right)^{12} \right] \tag{15.6} \]
The Weeks–Chandler–Andersen potential, given in Equation 15.7,
divides the LJ potential into attractive and repulsive parts
[32]. Only the repulsive potential is shown below.
\[ E_{\text{WCA}} = \begin{cases} E_{\text{LJ}} + \varepsilon, & r_{ij} < 2^{1/6}\sigma \\ 0, & r_{ij} \ge 2^{1/6}\sigma \end{cases} \tag{15.7} \]
Lastly, the Buckingham potential of Equation 15.8 uses an
exponential decay term in place of the r^−12 term of the LJ
potential. In Equation 15.8, A, ρ, and C are parameterized constants.
\[ E_{\text{Buckingham}} = A e^{-r_{ij}/\rho} - \frac{C}{r_{ij}^{6}} \tag{15.8} \]
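The three short-range potentials can be compared side by side in code. The sketch assumes the standard 6–12 form of the LJ potential, with a repulsive r^−12 term and an attractive r^−6 term, so that the minimum of depth ε lies at r = 2^(1/6) σ; the function names and test values are illustrative.

```python
import math


def lennard_jones(r, epsilon, sigma):
    """6-12 potential: zero at r = sigma, minimum of -epsilon at 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def wca(r, epsilon, sigma):
    """Purely repulsive part: LJ shifted up by epsilon and cut at its minimum."""
    cutoff = 2.0 ** (1.0 / 6.0) * sigma
    if r < cutoff:
        return lennard_jones(r, epsilon, sigma) + epsilon
    return 0.0


def buckingham(r, a, rho, c):
    """Exponential repulsion with an r**-6 attraction."""
    return a * math.exp(-r / rho) - c / r ** 6


r_min = 2.0 ** (1.0 / 6.0)
e_at_sigma = lennard_jones(1.0, 1.0, 1.0)
e_at_minimum = lennard_jones(r_min, 1.0, 1.0)
```

Because the WCA form vanishes smoothly at the cutoff, it is a convenient excluded-volume term when attraction is handled by separate terms such as base pairing or stacking.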

Figure 15.3 Generalized flow diagram of physics-based models. (a)
Sequence information for PDB ID: 4JRC is built into a (b) linear, coarse-
grained molecular structure. (c) The coarse-grained structure undergoes
molecular dynamics simulations with a parameterized force field. (d) The
optimal structure is chosen and then (e) converted to an all-atom structure,
which is minimized to form (f) the final three-dimensional structure.
Disclaimer: this diagram is generalized and not comprehensive to all
physics-based models.

15.6 Physics-Based Models

PB models retain an advantage over FB models in that much of the
prediction process is physically realizable, if only to a coarse
degree. The cost of this advantage is that PB models require the
parameterization of a complete force field. Further, PB models must
follow simulation protocols that are more computationally intensive
than those of FB models. Despite these disadvantages, there is a
great deal of interest in PB techniques, and several CG models are
presented here. As an introduction, Fig. 15.3 presents a flow
diagram of a typical PB protocol.
A distinguishing characteristic among PB models is the
representation of nucleotides; the number of pseudoatoms as well
as the geometry often differs between models. The coarsest
representation that will be discussed here is that of the PB model
YUP [33]/YAMMP [19] from the Harvey group. In this model, the
highest resolution is one pseudoatom per nucleotide, centered
on the phosphorus atom of the backbone. Harvey and coworkers
[19] emphasize that YAMMP is a refinement program based on
experimental constraints such as base-pairing contacts. Hence, their
model does not need the level of detail that an ab initio RNA folding
model would require. Another property of the YUP/YAMMP model
is that, as a refinement model, it is mainly concerned with energy
minimization of the structure. This allows for the use of a Monte
Carlo sampling procedure with simple harmonic restraints rather
than the heft of a parameterized force field.
The NAST [16, 34] model represents each nucleotide by one
pseudoatom at the C3′ atom of the ribose group. NAST utilizes
MD simulations and a force field parameterized from solved
rRNA structures. NAST relies upon information from an accurate
secondary structure and can also include experimental constraints.
These constraints are modeled by a harmonic energy term. The
bonded energy terms of distance, angle, and dihedral are likewise
modeled by harmonic potentials, parameterized according to a
Boltzmann inversion. Non-bonded interactions are modeled by a
Lennard-Jones potential with a hard sphere radius of 5 Å. Due to
the low-resolution representation of one pseudoatom per nt, the
conversion from the CG model to the all-atom model is complex and
may produce steric overlaps. In order to overcome this difficulty,
Jonikas et al. developed a program, C2A [35], which is able to insert
and minimize the all-atom structure.
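Boltzmann inversion, mentioned above for the bonded terms, can be sketched briefly. The sketch assumes that an observed distribution P(b) of a CG coordinate is converted to an effective energy through E = −k_B T ln P, and that a Gaussian distribution of variance s² therefore corresponds to the harmonic form of Equation 15.2 with K = k_B T / (2 s²). The temperature, sample values, and function names are hypothetical and are not NAST's actual parameters.

```python
import math
import statistics

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def boltzmann_invert(probabilities, temperature):
    """Convert histogram probabilities into energies, E = -k_B T ln P."""
    return [
        -K_B * temperature * math.log(p) if p > 0.0 else math.inf
        for p in probabilities
    ]


def harmonic_from_samples(samples, temperature):
    """Fit E = K (b - b0)**2 to samples assumed to be Gaussian distributed."""
    b0 = statistics.fmean(samples)
    variance = statistics.pvariance(samples, mu=b0)
    return b0, K_B * temperature / (2.0 * variance)


# Hypothetical bead-bead distances (in angstroms) collected from solved structures.
b0, k_dist = harmonic_from_samples([5.0, 6.0, 7.0], 300.0)
energies = boltzmann_invert([1.0, 0.0], 300.0)
```

Narrow observed distributions thus yield stiff force constants, and bins that are never observed are assigned an infinite energy.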
The model iFoldRNA [23, 36], implemented on an automated
webserver found at http://iFoldRNA.dokhlab.org, employs a so-
called discrete molecular dynamics (DMD) protocol [37]. A DMD
simulation implements square well potential energy functions,
rather than the smoother harmonic energy functions. Hence, the
energy function is discretized so that only the desired states are
sampled. iFoldRNA represents each nt as three beads: one at the
center of mass of the phosphate group, one at the center of mass
of the ribose group, and one at the center of the six-membered ring
of the nucleobase. Besides the standard array of force-field terms
(distance, angle, dihedral, electrostatics, non-bonded), iFoldRNA
employs base stacking, base pairing, and hydrophobic energy terms.
Some nucleic acid interactions such as base pairing are often
not captured correctly by conventional force fields. One option to
include these terms then is to specifically account for them in the
energy function. iFoldRNA is the first program discussed to utilize
Replica Exchange Molecular Dynamics (REMD). For the theory of
REMD, the authors point to [38]. Briefly, REMD is an enhanced
sampling method in which several copies (replicas) of the system are
simulated at different temperatures and periodically attempt to
exchange, so that the temperature of each replica varies stochastically
through the simulation. This allows the RNA molecule to sample
conformations faster and to reach a more global energy-minimum structure.
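The exchange step at the heart of REMD can be sketched with the standard Metropolis criterion for swapping two replicas, p = min(1, exp[(β_i − β_j)(E_i − E_j)]). This is a generic illustration under that textbook criterion, not the iFoldRNA implementation; the function names and the energies used in the example are hypothetical.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for swapping replicas i and j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the swap is accepted."""
    p = exchange_probability(energy_i, temp_i, energy_j, temp_j)
    return rng.random() < p


# Hypothetical potential energies in kcal/mol.
# The cold replica holds the higher energy, so the swap is always accepted.
p_downhill = exchange_probability(-90.0, 300.0, -100.0, 350.0)
# The cold replica already holds the lower energy, so the swap is accepted
# only occasionally.
p_uphill = exchange_probability(-100.0, 300.0, -90.0, 350.0)
print(p_downhill, p_uphill)
```

Swaps that move a low-energy configuration to the colder replica are always accepted, while the reverse happens with a Boltzmann-weighted probability; this is what lets configurations trapped at low temperature escape through the hotter replicas.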
Vfold [15, 39–41] is a hybrid of FB and PB models. Similar to MC-
Fold/MC-SYM, Vfold first creates a secondary structure for a given
sequence based on free energy minimization. Then, from a fragment
library of solved crystal structures, Vfold superposes 3-D A-form
helices and loop motifs onto the secondary structure. In its fragment
library, Vfold has a reduced representation of three pseudoatoms
per nt: one on the phosphate atom, one on the C4′ ribose atom, and
one on the N1 or N9 nucleobase atom. Once the CG 3-D structure
is created, the rest of the nucleotide atoms are inserted into the
structure. Following this, the all-atom structure is energy minimized
by MD.
The five-bead model by Xia and coworkers [17, 42] utilizes sim-
ulated annealing MD in order to produce a final 3-D structure from
sequence information alone. Simulated annealing is a temperature
protocol for MD that quickly raises the temperature of the system to
a high temperature (∼1000 K) and then slowly lowers the system
temperature back to a physiological temperature (∼298 K). This
protocol is used in order to increase the sampling of the system and
then allow for the system to cool toward a global minimum state. The
nt is represented with five beads: one on the phosphate, one on the
C4′ ribose atom, and three on various atoms within the nucleobase.
The triangular base representation allows for the capture of base
stacking interactions and reduces the cost of converting from the
CG model to the all-atom model. The force field is derived from
statistical potentials and employs a Buckingham potential function
(Equation 15.8) for non-bonded interactions as well as a hydrogen
bonding term.
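The temperature protocol described above can be sketched as a simple schedule. The text specifies only the two temperatures (about 1000 K and about 298 K) and that heating is fast and cooling slow; the linear heating ramp, the geometric cooling law, and the step counts below are assumptions made for illustration, not the protocol of Xia and coworkers.

```python
def annealing_schedule(t_start=298.0, t_high=1000.0, t_final=298.0,
                       heat_steps=10, cool_steps=100):
    """Return a list of target temperatures (K), one per MD stage.

    Heating is a short linear ramp; cooling is a longer geometric decay.
    Both functional forms are illustrative assumptions.
    """
    heating = [t_start + (t_high - t_start) * (i + 1) / heat_steps
               for i in range(heat_steps)]
    ratio = (t_final / t_high) ** (1.0 / cool_steps)
    cooling = [t_high * ratio ** (i + 1) for i in range(cool_steps)]
    return heating + cooling


schedule = annealing_schedule()
print(len(schedule), max(schedule), round(schedule[-1], 3))
```

Each entry would be passed to the thermostat for one block of MD steps; spending ten times as many stages on cooling as on heating reflects the "quickly raise, slowly lower" character of the protocol.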
HiRe-RNA [43, 44] is a PB model that utilizes 6–7 pseudoatoms
per nt, depending on whether the nt is a pyrimidine or a purine, respectively.
As Cragnolini et al. [43] note, the nucleobase remains relatively rigid.
Hence, HiRe-RNA represents each purine as two beads found at
the center of mass of the two aromatic groups. The pyrimidines
are likewise represented by one bead at the center of the aromatic
group. The backbone is then represented by five pseudoatoms: one on
the phosphate, and one each on the O5′, C5′, C4′, and C1′ atoms of the ribose
group. HiRe-RNA employs several higher-order hydrogen bonding
terms as well as a modified Lennard-Jones nonbonded potential
in their force field. No secondary structure information is required
and REMD simulations are employed to efficiently reach the final
conformation.
The model recently posited by Denesyuk and Thirumalai [20]
models each nt by three pseudoatoms located at the center of mass
of each group: phosphate, ribose, and nucleobase. The force field
employed by this model has a few terms novel to our discussion,
which are base stacking and excluded volume functions. The base-
stacking term in particular was heavily optimized according to
their previous paper. The excluded volume potential is the Weeks–
Chandler–Andersen potential of Equation 15.7, where the potential
is purely repulsive. To capture the attractive energies of
hydrogen bonding and base stacking, two energy terms specifically
targeted toward these interactions are used. Instead of MD simulations,
Denesyuk and Thirumalai utilize Langevin dynamics simulations
with a viscosity of ∼1% that of water to enhance sampling.
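A purely repulsive excluded-volume term of the Weeks–Chandler–Andersen type can be sketched as below. This is the standard textbook form (a Lennard-Jones potential truncated at its minimum and shifted up by ε); the exact parameterization used in the chapter's Equation 15.7 or by Denesyuk and Thirumalai may differ, and the reduced units are an assumption.

```python
def wca_potential(r, epsilon=1.0, sigma=1.0):
    """Weeks-Chandler-Andersen potential in the standard form.

    U(r) = 4*eps*[(sigma/r)**12 - (sigma/r)**6] + eps   for r < 2**(1/6)*sigma
    U(r) = 0                                            otherwise
    """
    r_cut = 2.0 ** (1.0 / 6.0) * sigma
    if r >= r_cut:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6) + epsilon


for r in (0.9, 1.0, 1.1, 1.2):
    print(r, wca_potential(r))
```

Because the potential vanishes beyond the cutoff and is never negative, all attraction in the model has to come from the dedicated stacking and hydrogen-bonding terms.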
Bernauer et al. [18] introduced a knowledge-based model that
does not employ a conventional force field. Rather, interatomic
distances are constrained to stay within prescribed bounds and
follow a certain position distribution. Using these distance-based
potentials, there is no need to explicitly account for electrostatics
or other terms, as they are already captured. Bernauer et al.
provide two model resolutions, a CG model and an all-atom model. The
CG model consists of five pseudoatoms per nt: one at the phosphate, one at the
C4′ atom of the ribose, and one each on the C2, C4, and C6 atoms of the
nucleobase. The knowledge-based potentials required the use of
a set of presolved structures; the authors chose these structures
based on stringent accuracy requirements. The authors score their
potentials by implementing an REMD protocol.
Table 15.2 summarizes some of the salient distinctions between
the presented PB models. Further, for ease in comparing nt
representations, Fig. 15.4 is given below showing each model’s CG
structure.

Table 15.2 Comparison of physics-based models

Model                     Beads per nucleotide  Base representation                 Simulation method                 Reference
YUP/YAMMP                 1                     None                                Monte Carlo                       [19, 33]
NAST                      1                     None                                MD                                [16]
iFoldRNA                  3                     1 bead                              DMD; replica exchange MD (REMD)   [23, 36]
Vfold                     3                     1 bead                              Free energy minimization; MD      [15, 39–41]
Xia et al.                5                     3 beads; triangle-shaped            Simulated annealing               [17, 42]
HiRe-RNA                  6–7                   1 bead pyrimidine; 2 beads purine   REMD                              [43, 44]
Denesyuk and Thirumalai   3                     1 bead                              Langevin dynamics                 [20]
Bernauer et al.           5                     3 beads; triangle-shaped            REMD                              [18]

Figure 15.4 Nucleotide representation of the physics-based CG models
discussed. The HiRe-RNA illustration is taken from [44] and the Xia et al.
illustration from [42].

15.7 Conclusion

The rapid rise of interest in RNA biology has driven the need to
obtain novel 3-D structures from sequence information alone. CG
RNA structure prediction models represent a tractable method to
realize this sequence-to-structure goal by balancing accuracy with
computational efficiency. Many of the discussed models and metrics,
such as the radius of gyration [45], are derived from successful protein
structure prediction models. As RNA biology is further quantified
and the structure prediction field progresses, new techniques to
better predict structure may arise. From the models presented,
several areas for improvement are evident:

• Support for a variety of ion concentrations
• Accuracy for longer RNA molecules (>100 nt)
• Efficient algorithms for high-throughput applications

As discussed, RNA folding is heavily dependent on the concentration
of ions, which act to shield the charged phosphate groups.
In vitro experiments utilize a range of ion concentrations; hence,
structure prediction models should ideally take into account the ion
concentration of these experiments.
The fast growing field of long non-coding RNAs (lncRNAs; RNAs
>200 nt) necessitates the development of models to predict these
long RNA molecules. Current CG models are feasible only for RNA
molecules shorter than roughly 100 nt. Conformational sampling of
longer RNAs will continue to require enhanced sampling
techniques such as REMD. Most CG models have been trained or
parameterized on RNA molecules of fewer than 50 nt. It has not
yet been validated whether these models remain useful for larger RNAs
of several hundred nt. A promising solution to the large-RNA
problem is to combine secondary structure prediction with
3-D modeling, using the former as an efficient sampling tool while
the latter provides a more sophisticated “scoring” of the secondary and
tertiary structures.
The increased need for RNA 3-D structures makes this an exciting
time for model developers. The effective balance of accuracy and
computational expense achieved by CG models means that they
are poised to fill this niche of RNA structure prediction.

Acknowledgments

The authors gratefully acknowledge support from the Robert A.
Welch Foundation (F-1691) and the National Institutes of Health
(GM106137).

References

1. Saunders, M. G., and G. A. Voth, Coarse-graining methods for computational
biology, in Annual Review of Biophysics, vol. 42, K. A. Dill, ed. 2013,
Annual Reviews: Palo Alto. pp. 73–93.
2. Saunders, M. G., and G. A. Voth, Coarse-graining of multiprotein
assemblies. Current Opinion in Structural Biology, 2012. 22(2): pp. 144–
150.
3. Takada, S., Coarse-grained molecular simulations of large biomolecules.
Current Opinion in Structural Biology, 2012. 22(2): pp. 130–137.
4. Nielsen, S. O., et al., Coarse grain models and the computer simulation
of soft materials. Journal of Physics-Condensed Matter, 2004. 16(15): pp.
R481-R512.
5. Shinoda, W., et al., Probing the structure of PEGylated-lipid assemblies
by coarse-grained molecular dynamics. Soft Matter, 2013. 9(48): pp.
11549–11556.
6. Klein, M. L., and W. Shinoda, Large-scale molecular dynamics simula-
tions of self-assembling systems. Science, 2008. 321(5890): pp. 798–
800.
7. Serganov, A., and E. Nudler, A decade of riboswitches. Cell, 2013. 152(1–
2): pp. 17–24.
8. Lilley, D. M. J., and F. Eckstein, Ribozymes and RNA catalysis: Introduc-
tion and primer, in Ribozymes and RNA Catalysis. 2007, The Royal Society
of Chemistry. Chapter 1, pp. 1–10.
9. Burnett, J. C., and J. J. Rossi, RNA-based therapeutics: Current progress
and future prospects. Chemistry & Biology, 2012. 19(1): pp. 60–71.
10. Weng, D. E., et al., A phase I clinical trial of a ribozyme-based
angiogenesis inhibitor targeting vascular endothelial growth factor
receptor-1 for patients with refractory solid tumors. Molecular Cancer
Therapeutics, 2005. 4(6): pp. 948–955.
11. Yakovchuk, P., E. Protozanova, and M. D. Frank-Kamenetskii, Base-stacking
and base-pairing contributions into thermal stability of the
DNA double helix. Nucleic Acids Research, 2006. 34(2): pp. 564–574.
12. Butcher, S. E., and A. M. Pyle, The molecular interactions that stabilize
RNA tertiary structure: RNA motifs, patterns, and networks. Accounts of
Chemical Research, 2011. 44(12): pp. 1302–1311.
13. Zuker, M., Mfold web server for nucleic acid folding and hybridization
prediction. Nucleic Acids Research, 2003. 31(13): pp. 3406–3415.
14. Zuker, M., and P. Stiegler, Optimal computer folding of large RNA
sequences using thermodynamics and auxiliary information. Nucleic
Acids Research, 1981. 9(1): pp. 133–148.
15. Cao, S., and S. J. Chen, Physics-based de novo prediction of RNA 3D
structures. Journal of Physical Chemistry B, 2011. 115(14): pp. 4216–
4226.
16. Jonikas, M. A., et al., Coarse-grained modeling of large RNA molecules
with knowledge-based potentials and structural filters. RNA, 2009.
15(2): pp. 189–199.
17. Xia, Z., et al., RNA 3D structure prediction by using a coarse-grained
model and experimental data. The Journal of Physical Chemistry B, 2013.
117(11): pp. 3135–3144.
18. Bernauer, J., et al., Fully differentiable coarse-grained and all-atom
knowledge-based potentials for RNA structure evaluation. RNA, 2011.
17(6): pp. 1066–1075.
19. Malhotra, A., R. K. Z. Tan, and S. C. Harvey, Modeling large RNAs
and ribonucleoprotein-particles using molecular mechanics techniques.
Biophysical Journal, 1994. 66(6): pp. 1777–1795.
20. Denesyuk, N. A., and D. Thirumalai, Coarse-grained model for predicting
RNA folding thermodynamics. Journal of Physical Chemistry B, 2013.
117(17): pp. 4901–4911.
21. Das, R., and D. Baker, Automated de novo prediction of native-like RNA
tertiary structures. Proceedings of the National Academy of Sciences,
2007. 104(37): pp. 14664–14669.
22. Das, R., J. Karanicolas, and D. Baker, Atomic accuracy in predicting and
designing noncanonical RNA structure. Nature Methods, 2010. 7(4): pp.
291–294.
23. Ding, F., et al., Ab initio RNA folding by discrete molecular dynamics:
From structure prediction to folding mechanisms. RNA, 2008. 14(6): pp.
1164–1173.
24. Bujnicki, J. M., Protein-structure prediction by recombination of
fragments. Chembiochem, 2006. 7(1): pp. 19–27.
25. Laing, C., and T. Schlick, Computational approaches to 3D modeling of
RNA. Journal of Physics-Condensed Matter, 2010. 22(28): p. 18.
26. Grigg, J. C., et al., T box RNA decodes both the information content and
geometry of tRNA to affect gene expression. Proceedings of the National
Academy of Sciences of the United States of America, 2013. 110(18): pp.
7240–7245.
27. Frellsen, J., et al., A probabilistic model of RNA conformational space.
Plos Computational Biology, 2009. 5(6): p. 11.
28. Parisien, M., and F. Major, The MC-Fold and MC-Sym pipeline infers RNA
structure from sequence data. Nature, 2008. 452(7183): pp. 51–55.
29. Bida, J. P., and L. J. Maher, Improved prediction of RNA tertiary structure
with insights into native state dynamics. RNA, 2012. 18(3): pp. 385–393.
30. Ban, N., et al., The complete atomic structure of the large ribosomal
subunit at 2.4 angstrom resolution. Science, 2000. 289(5481): pp. 905–
920.
31. Sykes, M. T., and M. Levitt, Describing RNA structure by libraries
of clustered nucleotide doublets. Journal of Molecular Biology, 2005.
351(1): pp. 26–38.
32. Weeks, J. D., D. Chandler, and H. C. Andersen, Role of repulsive forces in
determining equilibrium structure of simple liquids. Journal of Chemical
Physics, 1971. 54(12): p. 5237.
33. Tan, R. K. Z., A. S. Petrov, and S. C. Harvey, YUP: A molecular simulation
program for coarse-grained and multiscaled models. Journal of Chemical
Theory and Computation, 2006. 2(3): pp. 529–540.
34. Chen, C. X., et al., Understanding the role of three-dimensional topology
in determining the folding intermediates of group I introns. Biophysical
Journal, 2013. 104(6): pp. 1326–1337.
35. Jonikas, M. A., R. J. Radmer, and R. B. Altman, Knowledge-based
instantiation of full atomic detail into coarse-grain RNA 3D structural
models. Bioinformatics, 2009. 25(24): pp. 3259–3266.
36. Sharma, S., F. Ding, and N. V. Dokholyan, iFoldRNA: Three-dimensional
RNA structure prediction and folding. Bioinformatics, 2008. 24(17): pp.
1951–1952.
37. Dokholyan, N. V., et al., Discrete molecular dynamics studies of the
folding of a protein-like model. Folding & Design, 1998. 3(6): pp. 577–
587.
38. Sugita, Y., and Y. Okamoto, Replica-exchange molecular dynamics
method for protein folding. Chemical Physics Letters, 1999. 314(1–2):
pp. 141–151.
39. Cao, S., and S.-J. Chen, Predicting RNA folding thermodynamics with
a reduced chain representation model. RNA, 2005. 11(12): pp. 1884–
1897.
40. Cao, S., and S. J. Chen, Predicting structures and stabilities for H-type
pseudoknots with interhelix loops. RNA, 2009. 15(4): pp. 696–706.
41. Cao, S., D. P. Giedroc, and S.-J. Chen, Predicting loop–helix tertiary
structural contacts in RNA pseudoknots. RNA, 2010. 16(3): pp. 538–
552.
42. Xia, Z., et al., Coarse-grained model for simulation of RNA three-
dimensional structures. Journal of Physical Chemistry B, 2010. 114(42):
pp. 13497–13506.
43. Cragnolini, T., P. Derreumaux, and S. Pasquali, Coarse-grained simula-
tions of RNA and DNA duplexes. Journal of Physical Chemistry B, 2013.
117(27): pp. 8047–8060.
44. Pasquali, S., and P. Derreumaux, HiRE-RNA: A high resolution coarse-
grained energy model for RNA. The Journal of Physical Chemistry B,
2010. 114(37): pp. 11957–11966.
45. Parisien, M., et al., New metrics for comparing and assessing discrep-
ancies between RNA 3D structures and models. RNA, 2009. 15(10): pp.
1875–1885.
January 29, 2016 11:33 PSP Book - 9in x 6in 16-Qiang-Cui-c16

Chapter 16

Perspectives on the Coarse-Grained Models of DNA

Ignacia Echeverria (a, b) and Garegin A. Papoian (a, b)

(a) Department of Chemistry and Biochemistry,
University of Maryland, College Park, Maryland 20742, USA
(b) Institute for Physical Science and Technology,
University of Maryland, College Park, Maryland 20742, USA



16.1 Introduction

DNA is a highly charged, semi-flexible polyelectrolyte. In eukaryotic
cells, DNA chains are packed inside the microscopic volume of
the nucleus in an ordered, yet dynamic way. To overcome the
electrostatic repulsion that hinders compaction, DNA molecules
associate with a variety of counterions and proteins to condense
into a hierarchical and tunable architecture named chromatin.
Besides DNA condensation, chromatin also plays a key role in
gene regulation by making DNA accessible for transcription in
a dynamical and specific way. Consequently, understanding the
physicochemical properties of DNA chains at the molecular and
mesoscopic scales is a required step to elucidate how cells regulate

Many-Body Effects and Electrostatics in Biomolecules
Edited by Qiang Cui, Markus Meuwly, and Pengyu Ren
Copyright © 2016 Pan Stanford Publishing Pte. Ltd.
ISBN 978-981-4613-92-7 (Hardcover), 978-981-4613-93-4 (eBook)
www.panstanford.com
critical events such as DNA transcription, replication, repair, and
recombination in vivo. In this chapter, we first briefly describe some
of the most salient challenges of modeling the biophysical properties
of DNA molecules and their solution environment. We then
review a selection of recently developed coarse-grained models and
computational approaches that may be used to investigate the DNA
structure, dynamics, and association with proteins.
Under physiological conditions, mobile counterions provide
electrostatic screening and a stabilizing medium for DNA molecules,
thus playing an important role in determining the mechanical
properties of the highly rigid DNA chains. In particular, the
condensed ions substantially reduce the intra-chain electrostatic
repulsion, favoring, in turn, bending deformations and, more gen-
erally, influencing DNA’s conformational preferences (Ponomarev
et al., 2004; Savelyev et al., 2011a). The distribution of counterions
near DNA’s surface is strongly influenced by its helical geometry.
Different counterions condense preferentially into a variety of local
environments, such as DNA grooves and strands (Cesare Marincola
et al., 2004; Denisov and Halle, 2000; Savelyev and Papoian, 2006).
For example, the minor groove’s enhanced electrostatic potential
provides favorable binding sites for positively charged ligands
(Echeverria and Papoian, 2015; Rohs et al., 2010). Additional
regulation is provided by the variability of the width of the
minor groove, modulated by sequence, representing one important
avenue for an indirect shape readout of the sequence (Bishop et al.,
2011; Echeverria and Papoian, 2015; Rohs et al., 2009, 2010).
Furthermore, the distinct organizational patterns and subsequent
dynamics of the condensed ions can give rise to complex interactions
among DNA molecules, resulting, for example, in various packing
and aggregation processes (Bai et al., 2005; Guldbrand et al., 1986;
Kornyshev and Leikin, 2013; MacKerell Jr. and Nilsson, 2008).
Given the crucial importance of the DNA structure in genetic
regulation, there is considerable interest in developing molecular
models capable of identifying the mechanisms that hinder DNA
organization and dynamics. These processes occur at multiple
length scales, ranging from hundreds up to tens of thousands of
nucleotides. Fully atomistic simulations of DNA chains and protein–DNA
complexes have been very fruitful; however, they are still
limited to relatively small length scales (on the order of a hundred
nucleotides) and time scales (on the order of microseconds) (Biswas
et al., 2013; Potoyan and Papoian, 2012; Yoo and Aksimentiev,
2013). In contrast, many biological processes of intense current
interest, such as chromatin self-assembly, take place at significantly
larger scales. Consequently, to investigate the mesoscopic properties
of DNA complexes, novel computational techniques need to be
developed, which are capable of covering the larger length and
timescales, while preserving the salient features of the underlying
microscopic physics. Towards achieving these goals, significant
efforts have been recently devoted to developing accurate coarse-
grained (CG) models of DNA molecules solvated in various ionic
solutions.
A variety of recent DNA CG models are available; however, they
differ from each other in both philosophy and the desired degree
of coarse-graining (for reviews, see de Pablo, 2011; Hyeon
and Thirumalai, 2011; Potoyan et al., 2012). In all cases, the general
approach is to renormalize the dynamics of various groups of
atoms into collective degrees of freedom, sacrificing some of the
atomic detail, but where these coarse-grained variables can still be
used to accurately explore the physicochemical principles governing
the structure and dynamics of the biomolecules at larger scales.
Thus, a successful CG model would accurately characterize the main
features of the topography of the free-energy landscape of DNA
chain, which, in turn, would lead to successful description of DNA
molecule’s conformational preferences and thermodynamics. Every
coarse-graining approach consists of two parts, usually guided by
chemical and physical intuition: (1) identifying the “important”
CG degrees of freedom and (2) determining and parametrizing
the effective interactions that govern the equations of motion
describing the temporal evolution of the selected degrees of
freedom. By reducing the number of degrees of
freedom, CG models attain increased computing efficiency in
two ways. First, by mapping the atoms into CG
sites, the number of pairwise interactions and forces that need to
be calculated at every time-step is greatly reduced. Second,
integrating out the fastest degrees of freedom effectively makes the
energy landscape smoother, thus allowing the use of significantly
longer time-steps during time integration of the equations of motion
describing the system’s dynamics.
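The first of these efficiency gains can be made concrete with a rough count of pairwise interactions. The sketch below is only a back-of-the-envelope estimate: it counts all unique site pairs without any distance cutoff or solvent, and the figures of about 30 atoms and 3 beads per nucleotide are hypothetical round numbers chosen for illustration.

```python
def pair_count(n_sites):
    """Number of unique site pairs, n*(n-1)/2, ignoring cutoffs."""
    return n_sites * (n_sites - 1) // 2


n_nucleotides = 100
atoms_per_nt = 30  # hypothetical rough all-atom size of a nucleotide
beads_per_nt = 3   # hypothetical CG resolution

aa_pairs = pair_count(n_nucleotides * atoms_per_nt)
cg_pairs = pair_count(n_nucleotides * beads_per_nt)
print(aa_pairs, cg_pairs, round(aa_pairs / cg_pairs, 1))
```

Reducing the number of sites tenfold cuts the number of pairs roughly a hundredfold, before any additional gain from the longer time-step.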
Coarse-graining strategies may be broadly divided into top-
down and bottom-up approaches. Within the former approach,
models are built with the purpose of reproducing emergent physical
phenomena, without directly relating to more detailed structural
models of the same system. Here, the parametrization of the CG
interactions is done mainly by relying on physicochemical intuition
and available experimental data. The predictive capability of the
resulting CG model strongly depends on the nature of the physical
principles that were explicitly included in the model, and the
quality of the underlying experimental data. When successful, these
models produce non-trivial predictions that can be further tested
experimentally. In contrast, bottom-up methodologies start from
detailed microscopic structural models, for example, from the all-
atom (AA) representation of DNA, from which the CG Hamiltonian is
systematically derived and parametrized to reproduce the structural
and dynamical features of the system. Hence, this approach allows
the development of models that are chemically accurate. To achieve
this goal, rigorous methods have been developed, based on statis-
tical mechanics, to solve the “inverse problem” of reconstructing
the Hamiltonian from the canonical structural averages obtained
from microscopic atomistic simulations. Some salient examples of
this approach include the Boltzmann inversion, inverse Monte-Carlo,
molecular renormalization and multi-scale coarse-graining by force
matching, among others (for a recent review see Noid (2012)).
The success of the bottom-up strategies depends on the quality
of the functional form of the CG Hamiltonian, which needs to be
rich enough to capture important physics, and also on the quality
of the underlying fully atomistic models used in parameterizing
the Hamiltonian. The former limitation also affects the top-down
approaches, which, for example, may lack the necessary detail
for investigating chemically specific phenomena, such as needed
for describing how the DNA sequence modulates its mechanical
properties.
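As a worked illustration of the simplest of these inverse methods, Boltzmann inversion for a single CG coordinate q can be written down directly. The sketch assumes that q is statistically independent of the other CG coordinates and that P(q) is its normalized distribution sampled in the reference atomistic simulation; Jacobian factors are omitted, and β = 1/(k_B T):

```latex
P(q) \propto e^{-\beta U(q)}
\quad\Longrightarrow\quad
U(q) = -k_B T \,\ln P(q) + \mathrm{const}
```

When the coordinates are correlated, this one-step inversion no longer reproduces the target distributions, which is one reason the iterative and many-body schemes listed above were developed.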
When deriving CG potentials, physically distinct contributions
are usually treated as independent and additive; hence, it is important
to avoid double counting of these interactions. In this context,
one significant and outstanding challenge is to use consistent
coarse-graining strategies that ensure the transferability of the CG
model. For example, considerable effort has been devoted to deter-
mining the temperature dependence of CG models, and to probe
whether the potentials parametrized at one temperature can still
give reliable structural information at other temperatures (Wang
et al., 2009a,b). Additionally, the electrostatic forces, which are
rather complex near the highly charged surface of a DNA molecule,
arising from the many-body effects of the ionic environment around
DNA, need to be properly represented. In favor of computational
efficiency, many CG models do not include explicit solvent or
mobile ions, but formally integrate out the solvent and ion degrees
of freedom, instead incorporating their effects implicitly into the
effective CG interaction potentials. However, the complex dynamics
of hydrogen bonding networks in water and discrete nature of
mobile ions can generate spatial correlations that significantly affect
the structure, dynamics, and electrostatic atmosphere around the
DNA molecules. Thus, the specific treatments of solvent and ions will
largely determine how transferable the resulting CG models are, for
example, when applied at different ionic strengths.
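To see why ionic strength matters for transferability, consider one widely used implicit-ion treatment, Debye–Hückel screening. It is shown here only as a generic example and is not necessarily the treatment adopted by the models reviewed in this chapter. In it, the ionic strength enters through a single screening length; the function name and the default relative permittivity of water are assumptions.

```python
import math

EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m
K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C
N_A = 6.02214076e23         # Avogadro constant, 1/mol


def debye_length(ionic_strength_molar, temperature=298.15, eps_r=78.4):
    """Debye screening length in meters for a given ionic strength (mol/L)."""
    ionic_strength = ionic_strength_molar * 1000.0  # convert to mol/m^3
    numerator = EPS0 * eps_r * K_B * temperature
    denominator = 2.0 * N_A * E_CHARGE ** 2 * ionic_strength
    return math.sqrt(numerator / denominator)


for conc in (0.01, 0.1, 1.0):
    print(conc, "M ->", round(debye_length(conc) * 1e9, 2), "nm")
```

A model parametrized at one salt concentration with such a mean-field term can be shifted to another by changing the screening length, but it cannot capture the ion-ion correlations and site-specific binding discussed above, which is where explicit-ion treatments become necessary.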
In the present chapter, we discuss some of the state-of-the-art
strategies for coarse-graining DNA molecules, as well as the major
and most immediate challenges facing the field. The remainder of
this chapter is organized as follows: First, we discuss a number of
the recently developed DNA CG models, with special emphasis on
the particular coarse-graining and parameterization strategies; we
also outline their strengths and limitations. Afterwards, we present
examples from the literature where these CG models have been
applied to describe various emergent mesoscopic properties of DNA
molecules. Finally, we discuss how these models could be extended
to study complex DNA architectures, such as chromatin fibers.

16.2 Methods

Although all coarse-graining efforts aim to create an accurate
description of DNA molecules, while still greatly increasing the
computational efficiency, the available models specialize in creating
CG representations that reproduce only distinct subsets of DNA
properties. Furthermore, most models are developed and targeted
for different applications. In general, it is desirable for a CG model
to provide reliable structural and thermodynamic information, in
particular, when using the model to obtain the time evolution,
under given conditions, of DNA chains at longer timescales. From
the structural viewpoint, the CG model should reproduce the
appropriate tertiary structure of the DNA chain. At the local level,
this includes hydrogen bonding, stacking interactions, the helical
structure of DNA, and the corresponding geometries of the minor
and major grooves. Additionally, a successful CG model should
reproduce the correct DNA persistence length at different ionic
solution strengths, as well as the mechanical response to externally
applied forces. From the thermodynamic viewpoint, a CG model
should reproduce the effects of temperature on the stability of
the nucleic acids and predict, for example, the corresponding
temperature denaturation profiles.
The first step in coarse-graining is to define the collective
degrees of freedom of the CG model and, in bottom-up approaches,
systematically map a reference higher-resolution model to a lower-
resolution representation. Consequently, the selection of degrees of
freedom, determined in the case of DNA molecules by the number
of CG sites per nucleotide, ultimately determines the resolution of
the model. The selection of CG sites is usually done by relying on
intuitive or heuristic arguments. To the best of our knowledge, the
question of how various specific coarse-grained mappings change
the quality and the predictive capacity of various DNA CG models
has not been systematically addressed.
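The mapping step can be sketched as follows. This is a minimal illustration in which each CG site is placed at the center of mass of a user-defined group of atoms; the grouping, the site names, and the toy coordinates are hypothetical, and other choices (for example, geometric centers or a single representative atom) are equally common.

```python
def center_of_mass(atoms):
    """atoms: list of (mass, (x, y, z)) tuples."""
    total_mass = sum(mass for mass, _ in atoms)
    return tuple(
        sum(mass * pos[axis] for mass, pos in atoms) / total_mass
        for axis in range(3)
    )


def map_to_cg(atoms, groups):
    """groups: dict mapping a CG site name to a list of atom indices."""
    return {
        name: center_of_mass([atoms[i] for i in indices])
        for name, indices in groups.items()
    }


# Toy all-atom fragment (masses in amu, coordinates in angstroms).
atoms = [
    (12.0, (0.0, 0.0, 0.0)),
    (12.0, (2.0, 0.0, 0.0)),
    (16.0, (0.0, 0.0, 4.0)),
    (16.0, (0.0, 2.0, 4.0)),
]
groups = {"site_A": [0, 1], "site_B": [2, 3]}  # hypothetical mapping
cg_sites = map_to_cg(atoms, groups)
print(cg_sites)
```

Changing the `groups` dictionary is all it takes to change the resolution of the model, which makes it straightforward to compare alternative mappings of the same reference trajectory.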
In what follows, we discuss three CG models that have different
numbers of interaction sites per nucleotide, starting from the
coarsest representation. Each model was developed by a different
research group with the aim to employ the simplest model that
can describe the structural and thermodynamic properties of DNA
with a high accuracy for different specific applications. In each
case, we will point out the properties against which the model was
parametrized, its domain of applicability and its limitations.
Throughout this chapter, unless otherwise noted, the following
notation is used: rij corresponds to the distance between CG
sites i and j, and rij,0 corresponds to the respective equilibrium
distance. Analogously, θ and θ0 represent a bond angle and its
respective equilibrium value, and φ and φ0 represent a dihedral angle
and its respective equilibrium value. qi represents the charge of
a CG bead at site i. β is the inverse of the thermal energy of the
system, 1/(kB T), with kB being the Boltzmann constant and T the
temperature. ε0 is the permittivity of vacuum and ε is the dielectric
constant of water.

16.2.1 Model 1: One-Bead Double-Stranded DNA Model by
Molecular Renormalization Group Coarse-Graining
The molecular renormalization group coarse-graining method
(MRG-CG) was developed by Savelyev and Papoian (Savelyev and
Papoian, 2009a,b, 2010a), generalizing the Monte Carlo renormalization
group (MCRG) approaches pioneered by Swendsen (Swendsen,
1979) and further developed by Laaksonen and Lyubartsev
(Lyubartsev and Laaksonen, 1995). The MRG-CG method was used
to systematically derive a two-bead-per-base-pair model of double-stranded
DNA (dsDNA), with both explicit and implicit treatments
of mobile ions. This coarse-graining strategy is a salient example
of the bottom-up approach, addressing the fundamental “inverse
problem” in statistical physics, namely, how to reconstruct the
molecular Hamiltonian from the canonical structural averages either
measured experimentally or obtained from more detailed atomistic
simulations. The MRG-CG approach solves this problem by matching
the AA and CG partition functions, resulting in a CG model that
reproduces the complex local motions of the DNA chain, as observed
in fully atomistic MD simulations.
The MRG-CG approach relies on expanding the molecular
Hamiltonian (H) as a linear combination of N relevant physical
observables (Sα ) that comprise the structural foundation of the CG
model:

\[
H = \sum_{\alpha=1}^{N} K_\alpha S_\alpha . \tag{16.1}
\]

Figure 16.1 The MRG-CG model showing the DNA chain in the NaCl salt
buffer. Each DNA base pair is represented by two beads, each placed in
the geometric center of the corresponding atomistic nucleotide. Dashed
lines indicate the fan interactions, which represent a superposition of
stacking and base pairing interactions among nucleotides. There are 11 fan
interactions imposed on each CG bead. Figure reproduced with permission
from Savelyev and Papoian (2010b).

Broadly speaking, the term observable here refers to quantities,
such as bond lengths, angles or pairwise distances, that can be
measured in both AA and CG simulations. Here, the $K_\alpha$ play the
role of Hamiltonian force constants, whose numerical values need to
be optimized to obtain the best possible CG model. For efficiency
reasons, it is desirable for a Hamiltonian, Eq. 16.1, to comprise
a relatively small number of structural elements (which, in turn,
arise as the most important collective order parameters of the
fully atomistic model). On the other hand, the Hamiltonian basis
set, {Sα }, needs to be complete enough to assure the convergence
of the parameter optimization procedure, as outlined below. Also,
sufficient completeness is needed to accurately characterize the
biophysical phenomena that will be studied with this model. In
practice, a judicious choice of the molecular basis set is as much
art as science, and usually requires significant trial and error before
defining a successful model.
The aim of the MRG-CG optimization process is to find the set of
$K_\alpha$ that maximizes the agreement between the AA and CG partition
functions. It turns out that the closest match is achieved when
the averages $\langle S_\alpha \rangle$ computed from the CG simulations
are as similar as possible to the same averages obtained from the detailed,
fully atomistic simulations against which the model is being parametrized.
One can define
\[
\Delta\langle S_\alpha \rangle \equiv \langle S_\alpha \rangle_{CG} - \langle S_\alpha \rangle_{AA}, \tag{16.2}
\]
as the difference between the expectation values of the observables,
Sα , averaged over the CG and AA simulations. Additionally, given that
the average values of the Sα s are some function of the parameters Kα ,
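one can make the following expansion:
\[
\Delta\langle S_\alpha \rangle = \sum_{\gamma} \frac{\partial \langle S_\alpha \rangle_{CG}}{\partial K_\gamma}\, \Delta K_\gamma + O(\Delta K_\gamma^2). \tag{16.3}
\]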

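The derivative in Eq. 16.3 can be expressed using exact
relations from statistical mechanics:
\begin{align}
\frac{\partial \langle S_\alpha \rangle}{\partial K_\gamma}
&= \frac{\partial}{\partial K_\gamma}\,
\frac{\int dq\, S_\alpha(q) \exp\left(-\beta \sum_\lambda K_\lambda S_\lambda(q)\right)}
     {\int dq\, \exp\left(-\beta \sum_\lambda K_\lambda S_\lambda(q)\right)} \tag{16.4} \\
&= -\frac{\langle S_\alpha S_\gamma \rangle - \langle S_\alpha \rangle \langle S_\gamma \rangle}{k_B T}. \tag{16.5}
\end{align}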
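From Eq. 16.5, Eq. 16.3 can be rewritten as
\[
\Delta\langle S_\alpha \rangle = -\frac{1}{k_B T} \sum_{\gamma} \left[ \langle S_\alpha S_\gamma \rangle - \langle S_\alpha \rangle \langle S_\gamma \rangle \right] \Delta K_\gamma , \tag{16.6}
\]
where $\partial \langle S_\alpha \rangle / \partial K_\gamma$
can be interpreted as the susceptibility of the average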
value of a specific physical observable to variation of a Hamiltonian
parameter. In this light, the parameters, Kγ , may be formally viewed
as conjugate fields to the structural observables, Sα , analogous to the
relationship between the external magnetic field and the magnetic
moment in Ising-like models.
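
As a quick sanity check of Eq. 16.5, consider a deliberately simple one-observable case that is not part of the MRG-CG basis set and is introduced here only for illustration: a single coordinate $x$ with $S = x^2$ and $H = K x^2$. The Boltzmann weight is then a Gaussian with variance $k_B T / 2K$, and both sides of Eq. 16.5 can be evaluated in closed form:

```latex
\begin{align*}
\langle S \rangle &= \langle x^2 \rangle = \frac{k_B T}{2K},
\qquad
\frac{\partial \langle S \rangle}{\partial K} = -\frac{k_B T}{2K^2}, \\
\langle S^2 \rangle - \langle S \rangle^2
&= \langle x^4 \rangle - \langle x^2 \rangle^2
 = 3\left(\frac{k_B T}{2K}\right)^2 - \left(\frac{k_B T}{2K}\right)^2
 = \frac{(k_B T)^2}{2K^2}, \\
-\frac{\langle S^2 \rangle - \langle S \rangle^2}{k_B T}
&= -\frac{k_B T}{2K^2}
 = \frac{\partial \langle S \rangle}{\partial K}.
\end{align*}
```

The susceptibility is negative, as expected: stiffening the conjugate parameter $K$ reduces the average of the observable it couples to.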
By combining Eqs. 16.2 and 16.6, the optimization procedure
corresponds to iteratively solving a set of linear equations for
$\Delta K_\gamma$ until numerical convergence is obtained (i.e.,
$\Delta\langle S_\alpha \rangle \approx 0$). At each step, the CG
parameters are corrected by $\Delta K_\gamma$.
This optimization scheme explicitly accounts for cross-correlations
among the different CG degrees of freedom, thus including the
effects of the many-body interactions via a compact molecular basis
set of structural observables, {Sα }.
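
The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by Savelyev and Papoian: the function names, the toy one-observable system ($S = x^2$, sampled exactly from a Gaussian) and the sign convention of the update ($K \leftarrow K - \Delta K$, with $\Delta K$ solved from Eq. 16.6 and $\Delta\langle S_\alpha\rangle$ defined as in Eq. 16.2) are all our own choices.

```python
import math
import random


def covariance_matrix(samples):
    """samples[t][a] is observable a in frame t.

    Returns the averages <S_a> and the matrix <S_a S_g> - <S_a><S_g>.
    """
    n_frames = len(samples)
    n_obs = len(samples[0])
    avg = [sum(frame[a] for frame in samples) / n_frames for a in range(n_obs)]
    cov = [[0.0] * n_obs for _ in range(n_obs)]
    for frame in samples:
        for a in range(n_obs):
            da = frame[a] - avg[a]
            for g in range(n_obs):
                cov[a][g] += da * (frame[g] - avg[g])
    cov = [[c / n_frames for c in row] for row in cov]
    return avg, cov


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def mrg_cg_step(k, cg_samples, aa_averages, kT):
    """One parameter update built from Eqs. 16.2 and 16.6."""
    avg, cov = covariance_matrix(cg_samples)
    delta_s = [avg[a] - aa_averages[a] for a in range(len(k))]  # Eq. 16.2
    chi = [[-c / kT for c in row] for row in cov]               # Eq. 16.5
    delta_k = solve_linear(chi, delta_s)                        # Eq. 16.6
    # Assumed sign convention: remove the displacement that explains the mismatch.
    return [k[a] - delta_k[a] for a in range(len(k))]


def sample_toy(k, kT, n_frames, rng):
    """Hypothetical CG 'simulation': exact sampling of exp(-K x^2 / kT)."""
    sigma = math.sqrt(kT / (2.0 * k[0]))
    return [[rng.gauss(0.0, sigma) ** 2] for _ in range(n_frames)]


def run_toy(n_iter=6):
    rng = random.Random(0)
    kT = 1.0
    k = [1.5]                      # initial guess
    aa = [kT / (2.0 * 2.0)]        # reference <x^2> generated with K = 2
    for _ in range(n_iter):
        k = mrg_cg_step(k, sample_toy(k, kT, 20000, rng), aa, kT)
    return k
```

Calling `run_toy()` drives the single force constant from 1.5 toward the reference value of 2, up to sampling noise. In a real application the toy sampler would be replaced by a CG simulation, and the covariance matrix would couple many observables at once.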
The CG model of the B-DNA conformation of dsDNA,
developed by Savelyev and Papoian, relied on a rather coarse
mapping of one bead per nucleotide. To reproduce the fine features
of the DNA molecule’s atomistic conformational preferences, the
Hamiltonian introduced bonded and non-bonded interactions within and
between the two DNA strands. The effective Hamiltonian is:
\[
V_{\mathrm{total}} =
\overbrace{\underbrace{V_{\mathrm{bond}} + V_{\mathrm{angle}}}_{\text{intra-strand}}
 + \underbrace{V_{\mathrm{fan}}}_{\text{inter-strand}}}^{\text{bonded}}
 + \overbrace{V_{\mathrm{ion\text{-}DNA}} + V_{\mathrm{DNA\text{-}DNA}}}^{\text{non-bonded}}, \tag{16.7}
\]
where $V_{\mathrm{bond}}$ and $V_{\mathrm{angle}}$ describe the bond and angle
potentials, corresponding to the intra-strand interactions. Their functional
forms are given by
\[
V_{\mathrm{bond}} = \sum_{i,j}^{\mathrm{bonds}} \sum_{\alpha=2}^{4} K_{r,\alpha}\, (r_{ij} - r_{ij,0})^{\alpha} \tag{16.8}
\]
\[
V_{\mathrm{angle}} = \sum_{l}^{\mathrm{angles}} \sum_{\alpha=2}^{4} K_{\theta,\alpha}\, (\theta_l - \theta_{l,0})^{\alpha}, \tag{16.9}
\]
where $K_{r,\alpha}$ and $K_{\theta,\alpha}$ are the linear combination coefficients.
Here, quartic polynomials were used to account
for the asymmetric shape of the DNA molecules and also the
anharmonicities of their local deformations. The $V_{\mathrm{fan}}$ term (whose
name originates from the topological diagram, see Figure 16.1)
was introduced to describe the inter-strand interactions by explicitly
connecting each bead i to eleven beads on the opposite strand:
[(N ± 0 . . . 5) − i ]. The functional form of this potential is analogous
to $V_{\mathrm{bond}}$, i.e., a combination of harmonic, cubic, and quartic terms:
\[
V_{\mathrm{fan}} = \sum_{i,j}^{\mathrm{fan}} \sum_{\alpha=2}^{4} \sum_{-5 \le m \le 5} K_{f,\alpha}\, (r_{i:j+m} - r_{i:j+m,0})^{\alpha}, \tag{16.10}
\]
where the $K_{f,\alpha}$ are the linear combination coefficients. This potential
effectively represents a superposition of the base pairing and
stacking interactions among the nucleotides. In the model, the intra-
strand and inter-strand equilibrium distances and angles were not
part of the adjustable parameters and were extracted from direct
fitting of the above polynomials to the fully atomistic MD simulations
of a 16 base-pair DNA molecule.
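
The polynomial form shared by Eqs. 16.8 to 16.10 is easy to evaluate directly. The sketch below is illustrative only: the coefficient values used in any call are placeholders rather than published parameters, and the way `fan_partners` clips the fan at the chain ends is our assumption about boundary handling, which the text does not specify.

```python
def polynomial_potential(r, r0, coeffs):
    """Sum of K_alpha * (r - r0)**alpha for alpha = 2, 3, 4.

    coeffs is a tuple (K2, K3, K4); the same form serves bonds, angles
    and fan distances in Eqs. 16.8-16.10.
    """
    d = r - r0
    return sum(k * d ** alpha for alpha, k in zip((2, 3, 4), coeffs))


def fan_partners(j, n_sites, max_offset=5):
    """Indices j + m on the opposite strand for -max_offset <= m <= max_offset.

    j is the index of the base-pairing partner of bead i on the opposite
    strand; indices falling outside the strand are dropped (an assumption).
    """
    return [j + m for m in range(-max_offset, max_offset + 1)
            if 0 <= j + m < n_sites]
```

A nonzero cubic coefficient makes the potential asymmetric about the equilibrium value, which is the anharmonicity the quartic expansion is meant to capture; in the interior of a 16 base-pair duplex, `fan_partners(8, 16)` returns the eleven partners mentioned in the Figure 16.1 caption.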
One of the DNA CG models developed by Savelyev and Papoian
explicitly accounts for the mobile ions by including non-bonded
interaction potentials composed of: (1) a repulsive short-range
potential characterizing excluded volume interactions, (2) five (or
three) Gaussian functions to reproduce the short-range hydration
effects and the atomistic behaviors of ions, and (3) long-range
electrostatics described by the Coulomb potential, dampened by the
dielectric constant of water, and with no adjustable parameters. The
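functional forms of these potentials are
\[
V_{\mathrm{ion\text{-}ion}} = \sum_{i>j} \left[ \frac{A}{r_{ij}^{12}}
 + \sum_{k=1}^{5} B^{(k)} \exp\left(-C^{(k)} \left[r_{ij} - R^{(k)}\right]^2\right)
 + \frac{q_i q_j}{4\pi \epsilon_0 \epsilon\, r_{ij}} \right] \tag{16.11}
\]
and
\[
V_{\mathrm{DNA\text{-}ion}} = \sum_{i>j} \left[ \frac{A^{*}}{r_{ij}^{6}}
 + \sum_{k=1}^{3} B^{*(k)} \exp\left(-C^{*(k)} \left[r_{ij} - R^{*(k)}\right]^2\right)
 + \frac{q_i q_j}{4\pi \epsilon_0 \epsilon\, r_{ij}} \right], \tag{16.12}
\]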
where the sum is over all non-bonded CG sites and $A$, $B^{(k)}$, $C^{(k)}$, $R^{(k)}$
and $A^{*}$, $B^{*(k)}$, $C^{*(k)}$, $R^{*(k)}$ are adjustable parameters. It is important to
note that in Eqs. 16.11 and 16.12, the non-bonded excluded volume
interactions between the DNA beads and the mobile ions (the first
term on the r.h.s.) are softer than those of the ion–ion interactions, and
fewer Gaussians (the second term on the r.h.s.) are necessary
to describe the corresponding short-range hydration structure.
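
To make the structure of Eqs. 16.11 and 16.12 concrete, the following sketch evaluates a single pair term. It is an illustration, not the published parametrization: all numerical arguments are placeholders, and the default Coulomb prefactor of about 332.06 kcal Å mol⁻¹ e⁻² together with the dielectric constant of 78 are our unit assumptions.

```python
import math


def nonbonded_pair_energy(r, q_i, q_j, a, gaussians, power=12,
                          dielectric=78.0, coulomb_prefactor=332.06):
    """One pair term of Eq. 16.11 (power=12) or Eq. 16.12 (power=6).

    gaussians is a list of (B, C, R) tuples: five for ion-ion pairs and
    three for DNA-ion pairs in the model described above.
    Assumed units: r in angstroms, charges in units of e, energy in kcal/mol.
    """
    excluded_volume = a / r ** power
    hydration = sum(b * math.exp(-c * (r - r_k) ** 2) for b, c, r_k in gaussians)
    coulomb = coulomb_prefactor * q_i * q_j / (dielectric * r)
    return excluded_volume + hydration + coulomb
```

Only the first two contributions carry adjustable parameters; the Coulomb term is fixed once the charges and the dielectric constant are chosen, in line with the description above.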
In summary, the MRG-CG procedure is a systematic and reliable
general approach to optimizing the interaction potentials for DNA
and ions, reproducing important physical observables that characterize
the Hamiltonian itself. This, in turn, leads to the similarity of
the structural fluctuations of the macromolecule obtained from the
CG and fully atomistic simulations. Application of this technique to
coarse-graining DNA molecules resulted in a model that can be used to
reliably describe the DNA’s structural dynamics, including complex
anharmonic local deformations of the DNA chains. Likewise, this
model also accurately describes the distribution of mobile ions
around the DNA molecules and reproduces the experimentally
measured dependence of DNA chain’s persistence length on the
solution ionic strength.
One main limitation of the MRG-CG approach is its reliance on
the accuracy of the fully atomistic force fields that serve as the reference in
the bottom-up coarse-graining. Currently used atomistic force fields
have been developed and tested for short DNA segments, spanning,
at most, hundreds of base pairs in simulations at the microsecond
timescale or less. Whether or not the current atomistic force fields
will turn out to be robust at longer time and length scales is still
being investigated. The CG model of DNA developed by Savelyev and
Papoian was sequence-averaged, and, therefore, cannot be used to
study, for example, the sequence induced deformation of dsDNA.
However, development of a sequence specific CG DNA model using
the same MRG-CG technique is straightforward, albeit potentially
laborious. Finally, the MRG-CG model was not parametrized to
reproduce the structural behavior of single stranded DNA (ssDNA),
and, hence, cannot be used to study the thermodynamics of dsDNA
melting or hybridization.

16.2.2 Model 2: Three-Collinear Bead DNA Model for Applications in Nanotechnology
The oxDNA model, developed by Ouldridge and co-workers
(Ouldridge et al., 2011) was designed to study the self-assembly
processes occurring in various DNA nanotechnology applications.
This is an example of a top-down approach, where the molecular
interactions, in particular, their functional forms, are motivated by
physical intuition, and are parametrized to reproduce a specific
set of experimentally measured geometrical, mechanical and ther-
modynamical properties. This includes the polymer’s persistence
length and the melting and hybridization temperatures. Among
the significant features of this model are its ability to capture
the mechanical properties of ssDNA and dsDNA, reproducing the
flexibility of the ssDNA and the relative rigidity of the dsDNA.
In the oxDNA model, each nucleotide is represented by a rigid set
of three collinear interaction sites and a vector that is perpendicular
to the plane of the base, thus capturing the planarity of the bases (see
Figure 16.2). The first bead mimics the position of the backbone,
while the other two beads are responsible for stacking, hydrogen
bonding and excluded volume interactions. Since the distances
between the beads within a single nucleotide are fixed, this model
can be considered to have two degrees of freedom per bead.

Figure 16.2 The oxDNA model. (A) Coarse-grained sites of the oxDNA
model. Each nucleotide is represented by a set of three collinear beads
representing the backbone, stacking and hydrogen-bonding sites. (B)
Topology of the interactions. (B.1) Distances considered in the excluded
volume interactions. (B.2) Angles that modulate the hydrogen-bonding
interactions. (B.3) Angles that modulate the base stacking interactions. (B.4)
Angles that modulate the base cross-stacking interactions. (C) A 12-bp
duplex as represented by the model. Figure adapted with permission from
Ouldridge et al., 2011.

The effective Hamiltonian for the oxDNA model is:
\[
V_{\mathrm{total}} = \overbrace{V_{bb}}^{\text{bonded}}
 + \overbrace{V_{bstk} + V_{cstk} + V_{HB} + V_{excl}}^{\text{non-bonded}} . \tag{16.13}
\]
In this decomposition, $V_{bb}$ and $V_{bstk}$ act within a strand, $V_{cstk}$
and $V_{HB}$ act between strands, and $V_{excl}$ acts both within and between
strands.

Here, $V_{bb}$ represents the bonds between consecutive backbone sites,
which are connected by finitely extensible nonlinear elastic (FENE)
springs. The functional form of the potential is given by
\[
V_{bb} = V_{\mathrm{FENE}} = -\sum_{i,j}^{\mathrm{bonds}} \frac{\xi}{2}
 \ln\left[ 1 - \frac{(r_{ij} - r_{ij,0})^2}{\Delta^2} \right], \tag{16.14}
\]
where $\xi$ is the energy coefficient and $\Delta$ defines the range of
acceptable deviations from equilibrium, determining the steepness
of the potential.
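
A single-bond version of Eq. 16.14 can be written as follows. This is a minimal sketch with hypothetical argument names; returning infinity once the extension reaches $\Delta$ is our choice for representing the divergence of the logarithm.

```python
import math


def fene_energy(r, r0, xi, delta):
    """Single-bond FENE energy of Eq. 16.14.

    The energy diverges as |r - r0| approaches delta, so extensions at or
    beyond that limit are reported as infinite.
    """
    x = (r - r0) ** 2 / delta ** 2
    if x >= 1.0:
        return math.inf
    return -0.5 * xi * math.log(1.0 - x)
```

For small extensions the logarithm can be expanded to give $(\xi/2)(r - r_0)^2/\Delta^2$, so the spring is harmonic near equilibrium and stiffens sharply as the extension approaches $\Delta$.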
The excluded volume interactions (Vexcl ) are responsible for
preventing crossings of the chains, also giving rise to the stiffness
of the unstacked single strands. These interactions are defined
between the base repulsive site and backbone site and are described
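by a combination of Lennard–Jones and smoothing potentials:
\[
f_{ex}(r) =
\begin{cases}
V_{LJ}(r, \xi, \sigma) = 4\xi \left[ \left(\dfrac{\sigma}{r}\right)^{12} - \left(\dfrac{\sigma}{r}\right)^{6} \right] & \text{if } r < r^{*} \\[2mm]
V_{\mathrm{smooth}}(r, b, r_c) = b\,\xi\, (r_c - r)^2 & \text{if } r^{*} < r < r_c \\[1mm]
0 & \text{otherwise}
\end{cases} \tag{16.15}
\]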
Here, $\xi$ is the energy coefficient, $b$ is the force constant, $\sigma$ is the finite
distance at which the Lennard–Jones potential is zero, and the cutoff
distances $r^{*}$ and $r_c$ are free parameters. $V_{excl}$ is built from
terms of the form of Eq. 16.15 acting between the backbone
and base sites, both within a DNA strand and on opposite strands,
except for the nearest neighbors (see Figure 16.2 (B.1)). The lack of
an angular component in this potential allows ssDNA to be highly
flexible.
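
The piecewise form of Eq. 16.15 can be sketched as below. The function `match_smoothing`, which picks $b$ and $r_c$ so that the value and slope are continuous at $r^{*}$, is our own illustrative assumption about how such parameters might be chosen; the text above only states that the cutoffs are free parameters.

```python
def lennard_jones(r, xi, sigma):
    """Lennard-Jones branch of Eq. 16.15."""
    s6 = (sigma / r) ** 6
    return 4.0 * xi * (s6 * s6 - s6)


def lennard_jones_slope(r, xi, sigma):
    """Derivative dV_LJ/dr, used only to match the smoothing branch."""
    s6 = (sigma / r) ** 6
    return 4.0 * xi * (-12.0 * s6 * s6 + 6.0 * s6) / r


def match_smoothing(xi, sigma, r_star):
    """Choose b and r_c so value and slope are continuous at r_star.

    Assumes r_star < sigma, where V_LJ > 0 and its slope is negative.
    """
    v = lennard_jones(r_star, xi, sigma)
    dv = lennard_jones_slope(r_star, xi, sigma)
    r_c = r_star - 2.0 * v / dv
    b = dv * dv / (4.0 * xi * v)
    return b, r_c


def excluded_volume(r, xi, sigma, r_star, b, r_c):
    """Radial excluded-volume function f_ex(r) of Eq. 16.15."""
    if r < r_star:
        return lennard_jones(r, xi, sigma)
    if r < r_c:
        return b * xi * (r_c - r) ** 2
    return 0.0
```

With this choice the repulsion decays quadratically to exactly zero at $r_c$, so both the energy and the force are continuous, which is convenient for molecular dynamics integrators.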
The stacking term ($V_{bstk}$) between intra-strand base sites $i$ and
$j$ is represented by a Morse potential with the following functional
form:
\[
V_{\mathrm{Morse}} = \xi_{ij} \left( 1 - e^{-\alpha_{ij} (r_{ij} - r_{ij,0})} \right)^2 , \tag{16.16}
\]
where $\xi_{ij}$ is the depth of the attraction well and $\alpha_{ij}$ is an adjustable
parameter that controls the range of attraction. In the functional form
of $V_{bstk}$, this term is multiplied by numerous orientation terms that
depend on the mutual arrangement of the bases (see Figure 16.2 (B.3)).
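
The radial Morse term of Eq. 16.16 is short enough to write out in full. The sketch below uses hypothetical argument names and leaves out the orientation-dependent factors mentioned above.

```python
import math


def morse(r, xi, alpha, r0):
    """Morse term of Eq. 16.16: zero at r0, approaching xi at large r."""
    return xi * (1.0 - math.exp(-alpha * (r - r0))) ** 2


def morse_force_constant(xi, alpha):
    """Curvature at the minimum, d^2V/dr^2 = 2 * xi * alpha**2."""
    return 2.0 * xi * alpha ** 2
```

Near the minimum the potential behaves like a harmonic spring with force constant $2\xi\alpha^2$, so larger $\alpha$ gives a narrower, shorter-ranged well of the same depth.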
The $V_{HB}$ potentials account for the hydrogen bonding interactions
within the WC base pairs. These potentials are built using a
combination of Morse potentials (Eq. 16.16), smoothing potentials
and orientation terms between the hydrogen bonding sites (see
Fig. 16.2 (B.2)). The radial portion of this potential is given by
\[
f_{HB}(r) =
\begin{cases}
V_{\mathrm{Morse}}(r, \xi, r_0, a) - V_{\mathrm{Morse}}(r_c, \xi, r_0, a) & \text{if } r^{\mathrm{low}} < r < r^{\mathrm{high}} \\[1mm]
\xi\, V_{\mathrm{smooth}}(r, b^{\mathrm{low}}, r_c^{\mathrm{low}}) & \text{if } r_c^{\mathrm{low}} < r < r^{\mathrm{low}} \\[1mm]
\xi\, V_{\mathrm{smooth}}(r, b^{\mathrm{high}}, r_c^{\mathrm{high}}) & \text{if } r^{\mathrm{high}} < r < r_c^{\mathrm{high}} \\[1mm]
0 & \text{otherwise}
\end{cases} \tag{16.17}
\]
where $r^{\mathrm{low}}$, $r_c^{\mathrm{low}}$, $r^{\mathrm{high}}$ and $r_c^{\mathrm{high}}$ are free cutoff parameters. Finally, the
$V_{cstk}$ term represents the cross-stacking interactions between a base
site in a base pair and the nearest-neighbor bases on the opposite
strand (i.e., i : j +1 and i : j −1 interactions). This potential provides
additional stabilization to the DNA duplex and is implemented by a
harmonic term multiplied by additional smoothing and orientation
terms to modulate the alignment of the bases and the separation of
the strands. In the overall potential, the combination of $V_{bstk}$ and
$V_{HB}$ causes the formation of antiparallel, right-handed dsDNA from
two complementary strands. Among the described interactions,
only the excluded volume and backbone interactions are isotropic,
whereas all the other interactions explicitly depend on the relative
orientation of the nucleotides.
The oxDNA model was parametrized by, first, setting the
equilibrium bond lengths from the structural data on the B-DNA
conformations. Afterwards, the stacking interactions were manually
adjusted to reproduce the experimental data by Holbrook et al.,
followed by adjusting the hydrogen-bonding and cross-stacking
parameters to give duplex and hairpin formation thermodynamics
consistent with the SantaLucia parametrization of the nearest
neighbor model (SantaLucia and Hicks, 2004). The only parameters
that depend on temperature are those of the stacking interactions. The
process of adjusting the model parameters was iterated many times
until a consistent set of parameters was derived.
The oxDNA model provides a good representation of DNA
thermodynamics, while also providing an accurate description of
the structural and mechanical properties of B-form dsDNA and
ssDNA. Moreover, the model correctly describes the formation
of hairpins and other ssDNA secondary structures through the
stacking interactions of the intra-strand sites. This model accurately
reproduces the persistence lengths of ssDNA and dsDNA and
provides a quantitative description of the thermodynamics of
single-strand stacking, duplex hybridization, and hairpin formation, thus
providing a good description of the interplay between ssDNA and
dsDNA.
One of the main limitations of the oxDNA model is that the
sequence dependence is only included at the level of WC base
pairing, thus, not capturing phenomena, where, for example, the
sequence dictates the local flexibility and shape deformations
of the DNA chain. The latter sometimes can have significant
functional consequences (Rohs et al., 2010). However, to capture
such dependencies would require the stacking interactions to
be contingent on the sequence. Additionally, electrostatics is not
treated explicitly but is modeled rather implicitly through the
excluded volume term. In a similar way, the salt concentrations are
modeled through implicit screening effects, without treating mobile
ions explicitly. Consequently, the interactions of strands in close
proximity that are not forming a dsDNA cannot be described by the
oxDNA model. Similarly, the role of specific monovalent and divalent
counterions modulating DNA packing or aggregation cannot be
studied.

16.2.3 Model 3: Three-Bead DNA Model to Reproduce Melting Temperatures
The 3-Site-Per-Nucleotide (3SPN.2) DNA CG model was developed
by de Pablo and co-workers (Hinckley et al., 2013; Knotts et al.,
2007) to provide a comprehensive representation of melting,
hybridization and the mechanical properties of ssDNA and dsDNA.
This is another example of a top-down approach, where parametrization
was based on matching experimental measurements of
hybridization and base-stacking free energies and on reproducing
the equilibrium values of bond lengths, bond angles and dihedral
angles from the structural data.
The 3SPN.2 model represents each nucleotide by three CG
interacting sites: the phosphate, the deoxyribose and the base.
The coordinates of each site are positioned at the centers of mass
for these moieties (see Figure 16.3). The size of each site was
set such that there are no excluded volume interactions in the
CG representation of the B-DNA conformation. In this model, the
resolution of three beads per nucleotide gives the appropriate
resolution of the major groove without requiring the use of
anisotropic potentials.

Figure 16.3 The 3SPN.2 model. (A) Coarse-grained sites of the 3SPN.2
model. Each nucleotide is represented by a phosphate site, a sugar site,
and a base site. (B) Schematic representation of the distances and angles
considered in the non-bonded interactions. Base–base interactions are
represented by anisotropic potentials: the inner cone represents the angles
where the potential is applied. Between the inner and outer cones, the
potential is modulated to zero. (B.1) Angular dependence of the intra-
strand base-stacking interactions. (B.2) Angular dependence of the base-
pairing interactions. (B.3) Angular dependence of the inter-strand cross-
stacking interactions. (B.4) Definition of angles used to modulate the base–
base interactions. $\theta_{BS}$ describes the base-stacking interactions, $\theta_1$, $\theta_2$, and $\phi_1$
describe the base-pairing interaction, and $\theta_3$ and $\theta_{CS}$ modulate the cross-
stacking interactions. (C) A 12-bp duplex as represented by the model. The
radius of each bead is proportional to its van der Waals radius. Figure
adapted with permission from Hinckley et al., 2013.

The effective Hamiltonian for the 3SPN.2 CG DNA model is the following:
\[
V_{\mathrm{total}} = \overbrace{V_{\mathrm{bond}} + V_{\mathrm{bend}} + V_{\mathrm{tors}}}^{\text{bonded}}
 + \overbrace{V_{cstk} + V_{bstk} + V_{bp} + V_{excl} + V_{elec}}^{\text{non-bonded}} . \tag{16.18}
\]
Of these terms, the bonded terms and $V_{bstk}$ act within a strand, $V_{cstk}$
and $V_{bp}$ act between strands, and $V_{excl}$ and $V_{elec}$ act both within and
between strands.
Here, bonded interactions correspond to the bonds, angles and
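dihedral contributions, for which the functional forms are given by
\[
V_{\mathrm{bond}} = \sum_{i,j}^{\mathrm{bonds}} \left[ k_b (r_{ij} - r_{ij,0})^2 + 100\, k_b (r_{ij} - r_{ij,0})^4 \right], \tag{16.19}
\]
\[
V_{\mathrm{bend}} = \sum_{l}^{\mathrm{bends}} k_\theta (\theta_l - \theta_{l,0})^2 , \tag{16.20}
\]
\[
V_{\mathrm{tors}} = \sum_{l}^{\mathrm{dihedrals}} -k_\phi \exp\left( \frac{-(\phi_l - \phi_0)^2}{2\sigma_{\phi,l}^2} \right), \tag{16.21}
\]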
where $k_b$, $k_\theta$ and $k_\phi$ are the force constants for bonds ($V_{\mathrm{bond}}$), angles
($V_{\mathrm{bend}}$) and dihedrals ($V_{\mathrm{tors}}$), respectively, and $\sigma_{\phi,l}$ is the Gaussian
well width for dihedral $l$. Excluded volume interactions ($V_{excl}$) were
set as a purely repulsive potential between sites $i$ and $j$ that do
not participate in bonded interactions, with the following functional
form:
\[
V_{excl} = \sum_{i<j}
\begin{cases}
\xi_r \left[ \left(\dfrac{\sigma_{ij}}{r_{ij}}\right)^{12} - 2 \left(\dfrac{\sigma_{ij}}{r_{ij}}\right)^{6} \right] + \xi_r & \text{if } r_{ij} < r_C \\[2mm]
0 & \text{if } r_{ij} \ge r_C ,
\end{cases} \tag{16.22}
\]
where $\xi_r$ is the energy parameter and $\sigma_{ij} = (\sigma_i + \sigma_j)/2$ is the average
site diameter. The cutoff distance $r_C$ was set to $\sigma_{ij}$. Bases that form WC
base pairs do not interact through this potential.
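
The single-term versions of Eqs. 16.19 to 16.22 can be sketched as follows. The function and argument names are ours, any numerical values passed in are placeholders rather than 3SPN.2 parameters, and wrapping the dihedral difference into the interval $(-\pi, \pi]$ is an assumption made so that the Gaussian well is periodic.

```python
import math


def bond_energy(r, r0, k_b):
    """Eq. 16.19 for one bond: harmonic plus a stiff quartic correction."""
    d = r - r0
    return k_b * d ** 2 + 100.0 * k_b * d ** 4


def bend_energy(theta, theta0, k_theta):
    """Eq. 16.20 for one bending angle (radians)."""
    return k_theta * (theta - theta0) ** 2


def torsion_energy(phi, phi0, k_phi, sigma_phi):
    """Eq. 16.21 for one dihedral: a Gaussian well of depth k_phi."""
    d = (phi - phi0 + math.pi) % (2.0 * math.pi) - math.pi  # assumed wrapping
    return -k_phi * math.exp(-d ** 2 / (2.0 * sigma_phi ** 2))


def excluded_volume_energy(r, sigma_i, sigma_j, xi_r):
    """Eq. 16.22 for one pair, with the cutoff r_C equal to sigma_ij."""
    sigma_ij = 0.5 * (sigma_i + sigma_j)
    if r >= sigma_ij:
        return 0.0
    x = (sigma_ij / r) ** 6
    return xi_r * (x * x - 2.0 * x) + xi_r
```

Because the Lennard-Jones-like branch of Eq. 16.22 has its minimum of $-\xi_r$ exactly at $r = \sigma_{ij}$, the added constant $\xi_r$ makes both the energy and the force vanish at the cutoff, leaving a purely repulsive interaction.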
Base stacking (Vbstk ), base cross-stacking (Vcstk ) and base-pairing
(Vbp ) interactions are represented by angle-dependent potentials
that create a cone of strong attraction described by a Morse
potential, surrounded by a larger cone within which the Morse
potential is modulated to vanish at the edges (see Figure 16.3.B). The
functional form of the Morse potential part is given by
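\[
V_{\mathrm{Morse}} = \xi_{ij} \left( 1 - e^{-\alpha_{ij} (r_{ij} - r_{ij,0})} \right)^2 - \xi_{ij} , \tag{16.23}
\]
where $\xi_{ij}$ is the depth of the attraction well and $\alpha_{ij}$ is an adjustable
parameter that controls the range of attraction between CG sites $i$
and $j$. The angle-dependent modulation function takes the following
form:
\[
f(K, \Delta\theta) =
\begin{cases}
1 & \text{if } -\dfrac{\pi}{2K} < \Delta\theta < \dfrac{\pi}{2K} \\[2mm]
1 - \cos^2(K \Delta\theta) & \text{if } -\dfrac{\pi}{K} < \Delta\theta < -\dfrac{\pi}{2K} \ \text{or} \ \dfrac{\pi}{2K} < \Delta\theta < \dfrac{\pi}{K} \\[2mm]
0 & \text{if } \Delta\theta < -\dfrac{\pi}{K} \ \text{or} \ \Delta\theta > \dfrac{\pi}{K}
\end{cases} \tag{16.24}
\]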

where $K$ controls the width of the cone of attraction and
$\Delta\theta = \theta - \theta_0$ is the difference between the current angle
and the angle measured in the crystal structure of B-DNA. Thus, the
functional forms of $V_{bp}$, $V_{bstk}$ and $V_{cstk}$ use a combination of
Eqs. 16.23 and 16.24, while also
taking into account the distances and angles between bases.
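
The two building blocks of Eqs. 16.23 and 16.24 can be sketched as below. Combining them as a simple product in `angular_morse` is our simplifying assumption, made only to show how the cone shapes the attraction; the text states that the actual $V_{bp}$, $V_{bstk}$ and $V_{cstk}$ terms combine these equations with additional distance and angle dependences.

```python
import math


def modulation(delta_theta, k):
    """Angular modulation f(K, delta_theta) of Eq. 16.24 (angles in radians)."""
    a = abs(delta_theta)
    if a < math.pi / (2.0 * k):
        return 1.0                                    # inner cone: full strength
    if a < math.pi / k:
        return 1.0 - math.cos(k * delta_theta) ** 2   # outer cone: tapered
    return 0.0                                        # outside both cones


def shifted_morse(r, xi, alpha, r0):
    """Eq. 16.23: minimum of -xi at r0, decaying to zero at large r."""
    return xi * (1.0 - math.exp(-alpha * (r - r0))) ** 2 - xi


def angular_morse(r, delta_theta, xi, alpha, r0, k):
    """Illustrative product of the two terms (an assumed combination)."""
    return modulation(delta_theta, k) * shifted_morse(r, xi, alpha, r0)
```

The modulation function is continuous at both cone boundaries: it equals 1 at $|\Delta\theta| = \pi/2K$ and 0 at $|\Delta\theta| = \pi/K$, so the attraction switches off smoothly as the bases rotate out of alignment.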
Charged phosphate sites interact via a screened electrostatic
potential among the non-neighboring intra-strand nucleotides and
among all inter-strand nucleotides. The interaction is given by
\[
V_{elec} = \sum_{i<j}^{n_{elec}} \frac{q_i q_j\, e^{-r_{ij}/\lambda_D}}{4\pi \epsilon_0\, \epsilon(T, C)\, r_{ij}} , \tag{16.25}
\]
where $\lambda_D$ is the Debye screening length, $\epsilon(T, C)$ is the dielectric
permittivity of the solution, and all other variables are as previously
defined. The Debye screening length is given by
\[
\lambda_D = \sqrt{ \frac{\epsilon_0\, \epsilon(T, C)}{2 \beta N_A e_c^2 I} } , \tag{16.26}
\]

where $I$ is the ionic strength, $N_A$ is the Avogadro constant and $e_c$ is the
elementary charge. Additionally, this model assumes that the dependences of
the solution dielectric permittivity $\epsilon(T, C)$ on molarity ($C$) and
temperature ($T$) are independent:
\[
\epsilon(T, C) = \epsilon(T)\, \epsilon(C). \tag{16.27}
\]

The effects of ions are modeled by adjusting the effective charge of
the phosphate sites (Eq. 16.25) and the ionic strength (Eq. 16.26).
This model, while not explicitly including mobile ions, provides a
reasonable scaling of the persistence length with ionic strength and
reproduces the experimental denaturation and renaturation curves
as a function of temperature and ionic strength.
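
A short numerical sketch of Eqs. 16.25 and 16.26 in SI units follows. The dielectric constant is passed in as a plain number, so the factorization of Eq. 16.27 is not modeled here; the function names and the example conditions (300 K, 0.1 M, dielectric constant 78) are our illustrative choices.

```python
import math

EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
K_B = 1.380649e-23             # Boltzmann constant, J/K
N_A = 6.02214076e23            # Avogadro constant, 1/mol
E_CHARGE = 1.602176634e-19     # elementary charge, C


def debye_length(temperature, ionic_strength_molar, dielectric):
    """Debye screening length of Eq. 16.26, returned in meters."""
    beta = 1.0 / (K_B * temperature)
    ionic_si = ionic_strength_molar * 1000.0   # mol/L -> mol/m^3
    return math.sqrt(EPSILON_0 * dielectric /
                     (2.0 * beta * N_A * E_CHARGE ** 2 * ionic_si))


def screened_coulomb(r, q_i, q_j, lambda_d, dielectric):
    """One pair term of Eq. 16.25.

    r and lambda_d in meters, charges in units of e, result in joules.
    """
    prefactor = q_i * q_j * E_CHARGE ** 2 / (4.0 * math.pi * EPSILON_0 * dielectric)
    return prefactor * math.exp(-r / lambda_d) / r
```

Under these example conditions the screening length comes out just under 1 nm, and it scales as $I^{-1/2}$, so a fourfold increase in ionic strength halves the range of the phosphate–phosphate repulsion.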
Non-bonded interactions were parametrized by matching the
free-energies of hybridization and base-stacking to a number of
experimental measurements. The hybridization free-energies were
obtained by matching the melting temperature, Tm , which is directly
related to the melting free-energy experimentally measured by
Owczarzy et al. (Owczarzy et al., 2004). In the 3SPN.2 CG model,
the melting temperatures (Tm ) of the dsDNA and hairpins were
calculated using metadynamics (Laio and Gervasio, 2008), by
defining the temperature at which the probabilities of hybridization
and de-hybridization are equal. Using a similar approach, stacking
interactions were determined by calculating the stacking and de-
stacking free energies in nicked DNA (Protozanova and Yakovchuk,
2004). Here, metadynamics simulations were used to evaluate the
relative probabilities of the stacked and de-stacked states. The free-
energy difference between the stacked and unstacked states was
obtained by integrating over the relative populations of these states.
Using the Boltzmann inversion method (Reith et al., 2003), the
stacking strength parameter (Eq. 16.23) was adjusted according
to
\[
\xi_{n+1} = \xi_n + k_B T \ln \frac{F_n}{F_{exp}} , \tag{16.28}
\]
where $\xi$ is the stacking interaction strength between two bases, $n$ is
the iteration number, $F_n$ is the free energy obtained when using the
parameter $\xi_n$, and $F_{exp}$ is the experimental free energy. The
Boltzmann inversion method was applied iteratively until satisfactory
agreement between the experimental and CG free-energies was
achieved. The same approach was used to determine the free-
energies of base pairing, for which the experimental nearest-
neighbor enthalpies were used for parametrization (SantaLucia and
Hicks, 2004). Structural quantities were parametrized by adjusting
the bonded interaction parameters to reproduce the persistence
length and other structural features such as the minor and major
groove widths. Several rounds of optimization for all the parameters
were performed to achieve consistency between structural and
thermodynamic quantities.
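
The update rule of Eq. 16.28 can be sketched as a simple fixed-point loop. In the actual parametrization $F_n$ would come from a metadynamics simulation at each iteration; here it is replaced by `surrogate_free_energy`, a made-up smooth response function chosen purely so that the loop has something to converge on. All names and numbers are hypothetical.

```python
import math


def update_stacking_strength(xi_n, f_n, f_exp, kT):
    """One application of Eq. 16.28; f_n / f_exp must be positive."""
    return xi_n + kT * math.log(f_n / f_exp)


def surrogate_free_energy(xi, kT, f_ref=2.0, xi_ref=1.0):
    """Hypothetical stand-in for the simulated free energy F(xi)."""
    return f_ref * math.exp(-(xi - xi_ref) / (2.0 * kT))


def fit_stacking_strength(xi0, f_exp, kT, tol=1e-8, max_iter=100):
    """Iterate Eq. 16.28 until the surrogate matches f_exp within tol."""
    xi = xi0
    for _ in range(max_iter):
        f_n = surrogate_free_energy(xi, kT)
        if abs(f_n - f_exp) < tol:
            break
        xi = update_stacking_strength(xi, f_n, f_exp, kT)
    return xi
```

The update leaves $\xi$ unchanged exactly when $F_n = F_{exp}$, which is the fixed point the iteration is looking for. Whether and how fast a real simulation-driven loop converges depends on how the computed free energy responds to $\xi$, which this toy surrogate does not attempt to represent.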
Among the strengths of the 3SPN.2 model is the simultaneous
incorporation of the effects of sequence, ionic strength, and temperature,
even though mobile ions were not explicitly considered.
This model is capable of reproducing structural features such as the
dsDNA duplex width, base rise, and major and minor groove widths.
Additionally, the model captures the flexibility and the persistence
lengths of both ssDNA and dsDNA and predicts melting temperatures
that are consistent with experiments. Among the model's
limitations is that electrostatic interactions are treated only at
the level of the Debye–Hückel approximation. Additionally, excluded
volume interactions are represented by isotropic potentials, not
taking into account the effects of the planarity of the bases. DNA
bonded interactions were parametrized to reproduce the B-DNA
conformation, and this parametrization might not be suitable for
describing deformed or alternative dsDNA conformations or
ssDNA conformations.
An extension of this approach, explicitly including solvation and
ions, was developed by DeMille et al. (DeMille et al., 2011), where
the monatomic water model mW and its extension to ionic solutions
were incorporated (DeMille and Molinero, 2009). In the mW model,
the structure of liquid water is reproduced via the interplay between
the two-body attraction terms, which favor high coordination,
and the three-body repulsion terms, which promote tetrahedral
configurations. However, the overall treatment of electrostatics in
this combined mW/3SPN.2 DNA model seems rather ad hoc and
further theoretical justification and exploration are needed.

16.2.4 Other Models


We have described here, in detail, three different DNA CG models.
Other examples, found in the literature, vary both in their ap-
proaches to coarse-graining and in the structural detail included
in the corresponding models. For example, Hsu et al. developed a
CG DNA model using a bottom-up approach from ab initio total-
energy calculations based on density functional theory (DFT) (Hsu
et al., 2012). This model reduces each nucleotide into two CG sites,
where the Hamiltonian terms in the CG model include hydrogen
bonding, stacking interactions, and backbone-backbone and backbone-base
interactions. The interaction energies of each contribution are
calculated from DFT and fitted by simple analytical expressions
for use in the CG model. The model reproduces the stable B-DNA
conformation and the experimentally measured persistence lengths
at varying salt concentrations. Electrostatic interactions are treated
via a Coulomb potential and the effects of ionic screening are taken
into account through microscopically derived potentials.
In another approach, He et al. (He et al., 2013) proposed a 2-site
per nucleotide CG model (NARES-2P, nucleic acid united residue 2-point
model) in which chain connectivity, excluded volume and
base dipole interactions are sufficient to form helical DNA and
RNA structures. This model was parametrized using a bottom-up
strategy by employing a set of statistical potentials, derived from
DNA and RNA structures from the Protein Data Bank (see footnote a), and the
Boltzmann inversion method to reproduce the structural features.
The base–base interactions were parametrized by fitting the
potential of mean force to detailed all-atom MD simulations, also using
the Boltzmann inversion approach. The respective potentials
do not explicitly define the nucleic-acid structure, dynamics and
thermodynamics, but are derived as potentials of mean force. By
detailed analysis of the different contributions to the Hamiltonian,
the authors determined that the multipole–multipole interactions
are the principal factor responsible for the formation of regular
structures, such as the double helical structures.

Footnote a: http://www.rcsb.org.

The HiRe-RNA model, recently proposed by Cragnolini et al.
(Cragnolini et al., 2013), is a high-resolution coarse-grained RNA
model, with six or seven CG sites per nucleotide, that has been
extended to DNA molecules using a top-down parametrization
approach. This model captures many geometric details such as base
pairing and stacking, allowing DNA molecules to fold to near-native
structures within a short computational time. The Hamiltonian
includes local, non-bonded and hydrogen bond interactions, where
the latter are treated in great detail. In particular, the hydrogen bond
potential consists of three terms: (1) a two-body interaction based
on distances and angles between the interacting beads, (2) a three-
body term with a repulsion barrier to avoid multiple hydrogen
bonds of just one base, and (3) a four-body cooperative term
involving neighboring base pairs. However, the HiRe-RNA model
does not include explicit treatment of electrostatics or mobile ions.
Another high-resolution coarse-grained DNA model, with up to
eight coarse-grained sites forming a rigid nucleotide, was developed
by Edens et al. Their model was parametrized using a top-down
approach to fit the energetic contributions of different interactions
to the corresponding free energies measured experimentally, or in
more detailed simulations. The functional form of the Hamiltonian
includes the usual bonded, base stacking and hydrogen bonding
terms. Additionally, to describe the role of solvent, the model assigns
energies (and forces) depending on the degree to which a bead is
exposed to solvent.
Linak et al. developed a 3-site per nucleotide CG model that
includes non-WC bonds, relying on a top-down approach, to study
the hybridization of ssDNA and dsDNA hairpins (Linak et al.,
2011). Their model explicitly takes into account the formation
of Hoogsteen bonds, which are known to stabilize multi-body
secondary structures in DNA. The model reproduces many of the
microscopic features of dsDNA and captures the experimental
melting curves for a number of short DNA hairpins.
In an innovative approach, Morriss-Andrews et al. (Morriss-
Andrews et al., 2010) developed a DNA CG model that uses non-
isotropic potentials to describe the geometry of the nucleotide
bases. In their model, each nucleotide is represented by three CG
sites: two spherical sites representing the phosphate and sugar
and one rigid ellipsoid representing the base. The anisotropic
potentials have the advantage of directly capturing the base-
stacking interactions and providing structural information about the
geometry and dynamics of the DNA molecules at the local level,
in processes such as base tilting and twisting. Notably, the model
reproduces the B-DNA conformation even though the latter was not
explicitly used when designing the potentials. The parametrization
of the model relied on a bottom-up approach, mainly based on
all-atom (AA) MD simulations, to determine the one-dimensional
potentials of mean force associated with each of the different
physicochemical interactions considered in the Hamiltonian.
January 29, 2016 11:33 PSP Book - 9in x 6in 16-Qiang-Cui-c16

558 Perspectives on the Coarse-Grained Models of DNA

16.3 Results

16.3.1 Reproducing DNA’s Structural Properties from CG Models

Most of the described CG DNA models were designed to reproduce
the canonical B-DNA conformation and other DNA structural
properties, such as the number of base pairs per turn, the width
of the dsDNA and the widths of the minor and major grooves. Because
all of these properties were included as targets of the parametrization,
it is not surprising that most models reproduce these geometrical
features. However, for the models to be predictive and transferable,
they should also reproduce structural and physical
properties of DNA molecules that were not explicitly included in
the parametrization. Among such properties are
the large stiffness of the dsDNA chains (i.e., long persistence length)
and the high flexibility of the ssDNA chains (i.e., short persistence
length).
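
To make the persistence-length criterion concrete, the following minimal sketch estimates a persistence length from the decay of tangent-tangent correlations along a chain, assuming the exponential form <t_i . t_(i+s)> = exp(-s b / l_p), where b is the mean bond length. The function name, the argument names and the fitting choices are illustrative assumptions and are not taken from any of the CG models discussed in this chapter.

```python
import numpy as np

def persistence_length(coords, max_sep=20):
    """Estimate l_p from tangent-tangent correlations (hypothetical helper).

    coords : array of shape (n_sites, dim) or (n_frames, n_sites, dim)
             holding consecutive CG site positions along one chain.
    l_p is defined operationally through <t_i . t_(i+s)> = exp(-s*b/l_p).
    """
    coords = np.asarray(coords, dtype=float)
    if coords.ndim == 2:                      # single conformation
        coords = coords[None, ...]
    bonds = coords[:, 1:, :] - coords[:, :-1, :]
    bond_len = np.linalg.norm(bonds, axis=-1)
    tangents = bonds / bond_len[..., None]    # unit tangent vectors
    b = bond_len.mean()                       # mean contour step
    seps = np.arange(1, max_sep + 1)
    corr = np.array([
        np.mean(np.sum(tangents[:, :-s, :] * tangents[:, s:, :], axis=-1))
        for s in seps
    ])
    mask = corr > 0                           # log is defined only here
    slope = np.polyfit(seps[mask] * b, np.log(corr[mask]), 1)[0]
    return -1.0 / slope
```

In practice the correlations would be averaged over many simulation frames, and the fit would be restricted to separations where the correlation is well above the statistical noise.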
The physical basis of the dsDNA stiffness, and the ways
that it can be regulated by ions and other small molecules, is
not fully understood. From a physicochemical standpoint, it is
natural to suggest that the electrostatic self-repulsion of the DNA
backbone and the base-stacking interactions are the dominant
players. The relative contributions of these two forces have been
a matter of much debate from the experimental, theoretical and
computational viewpoints. Provided that an appropriate description
of the electrostatic interactions is incorporated into the CG
model, the model can be used to systematically investigate the role
of different physical forces in determining the overall dsDNA
stiffness. For example, Savelyev and Papoian, by simulating a set
of hypothetical DNA chains whose overall electric charges were
continuously neutralized, demonstrated that the electrostatic and
non-electrostatic effects play comparable roles in maintaining
DNA’s stiffness (Savelyev et al., 2011b). Table 16.1 summarizes
how the predictions of the different CG DNA models compare with
experimental measurements of the persistence lengths of dsDNA
and ssDNA.
Table 16.1 Comparison of different DNA coarse-grained models

                                              MRG-CG     oxDNA     3SPN.2    Hsu et al.  NARES-2P   HiRe-RNA  Edens et al.  Linak et al.  Plotkin et al.
APPROACH                                      bottom-up  top-down  top-down  bottom-up   bottom-up  top-down  top-down      top-down      bottom-up
SITES-PER-NUCLEOTIDE                          1          3(2)      3         2           2          6-7       8             3             3
MODEL DEVELOPMENT
EXPLICIT ELECTROSTATICS                       ✓          —         ✓         ✓           ✓          —         ✓             —             ✓
EXPLICIT IONS                                 ✓          —         —         —           —          —         —             —             —
SIMULATIONS AT DIFFERENT SALT CONCENTRATIONS  ✓          ✓         ✓         ✓           —          —         —             —             ✓
STRUCTURAL FEATURES
SSDNA DESCRIPTION                             —          ✓         ✓         —           —          —         ✓             ✓             ✓
DSDNA PERSISTENCE LENGTH                      ✓          ✓         ✓         ✓           —          —         ✓             ✓             ✓
SSDNA PERSISTENCE LENGTH                      —          ✓         ✓         —           —          —         ✓             ✓             ✓
WC BASE-PAIRING                               —          ✓         ✓         ✓           ✓          ✓         ✓             ✓             ✓
THERMODYNAMICS FEATURES
TEMPERATURE DENATURATION                      —          ✓         ✓         —           ✓          ✓         —             ✓             —

16.3.2 Reproducing DNA’s Thermodynamic Properties from CG Models

Most of the CG models that use a top-down parametrization
approach have been designed to reproduce the thermodynamics
of DNA melting, base-stacking and hybridization. However, it is
also important that the models capture the
thermodynamics of processes that were not explicitly included
in the parametrization by, for example, correctly predicting the
formation of bubbles in AT-rich regions or the stabilities of
various DNA hairpins. In contrast, most of the CG models that have
been parametrized using a bottom-up approach did not explicitly
take into account the experimentally obtained thermodynamic
data, and hence, are not expected to necessarily reproduce the
thermodynamics of the above-mentioned DNA processes.
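
As a point of reference for what reproducing a melting curve involves, the sketch below evaluates the textbook two-state (all-or-none) model for a unimolecular hairpin, in which the folded fraction is K/(1 + K) with K = exp[-(dH - T dS)/RT] and the melting temperature is T_m = dH/dS. This is a generic illustration under the two-state assumption, not the procedure used by any model in Table 16.1; the function names are hypothetical, and any dH and dS values passed in are placeholders to be supplied by the user.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def folded_fraction(temp_k, dh, ds):
    """Two-state folded fraction of a hairpin at temperature temp_k (K).

    dh : folding enthalpy in kcal/mol (negative for a stable hairpin)
    ds : folding entropy in kcal/(mol K) (negative for folding)
    """
    dg = dh - temp_k * ds                      # folding free energy
    # K/(1+K) with K = exp(-dg/RT), written to avoid forming K explicitly
    return 1.0 / (1.0 + math.exp(dg / (R_KCAL * temp_k)))

def melting_temperature(dh, ds):
    """Temperature (K) at which half of the hairpins are folded."""
    return dh / ds
```

A CG model's simulated melting curve can be compared with such a two-state curve to check whether the transition is as cooperative as the experimental one.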

16.3.3 Example 1: Salt-Dependent Buckling of Circular DNA Molecules

Bending, torsional, and buckling deformations of circular DNA
molecules provide a fertile framework for studying the balance
between elastic and electrostatic forces in determining the DNA
chains’ structural behaviors. For example, Savelyev and Papoian
(Savelyev et al., 2011b), using the MRG-CG DNA model, studied the
effects of salt concentration on the structural response of a 90-base-
pair circular dsDNA to stress. The authors observed that at low
salt concentrations, the strong electrostatic repulsion favors planar
circular conformations, while at high salt concentrations, elastic
forces induce buckling. As shown in Figure 16.4, a sharp phase
transition between these two states is found at salt concentrations
that are comparable with the physiological conditions (Savelyev
et al., 2011b). This example highlights how CG models can be used to
study complex phenomena where local structural interactions give
rise to global structural transitions. Furthermore, this study could be
extended, for example, to examine the role of mobile ions in balancing
the electrostatic and elastic forces.

Figure 16.4 Salt-dependent buckling of circular DNA molecules. (A)
Schematic representation of the preparation of the overtwisted 90-base-
pair circular DNA (panel labels: overtwisting, bending, circular formation).
(B) A pronounced phase transition, from circular to
buckled DNA, is observed at physiological conditions (concentrations of
50–200 mM). Q is an order parameter that quantifies the DNA circle’s
supercoiling. Figure reproduced with permission from Savelyev et al.,
2011b.

16.3.4 Example 2: Obtaining the Hybridization Rate Constants

Based on experimental observations, DNA hybridization has been
proposed to occur via nucleation, involving a few consecutive,
in-register WC base pairs, followed by rapid cooperative zippering.
Using the 3SPN.2 model and the Forward Flux Sampling method
(Allen et al., 2009), Hinckley et al. (Hinckley et al., 2013) examined
the DNA hybridization mechanism and determined the kinetic
rates at base-level resolution. Their calculated hybridization rates
correlated well with the trends seen in the corresponding
experimentally measured rates but were one to two orders of
magnitude larger. This result highlights how coarse-graining
effectively smooths biomolecules’ energy landscapes, thus lowering
the kinetic barriers to structural transitions. Hence, future studies
should explore in greater depth the validity of using CG models to
calculate kinetic rates. From a mechanistic viewpoint, the study by
Hinckley et al. determined that zippering of complementary strands is the
dominant mechanism of hybridization (Hinckley et al., 2013).
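
In Forward Flux Sampling, the rate constant is assembled as the flux of trajectories through the first interface multiplied by the conditional probabilities of reaching each successive interface before returning to the initial state (Allen et al., 2009). The sketch below shows only this final bookkeeping step; the function name, the argument names and the idea of passing raw success and trial counts are illustrative assumptions, not the interface of any particular simulation package.

```python
import math

def ffs_rate(flux_0, successes, trials):
    """Combine FFS ingredients into a rate constant (hypothetical helper).

    flux_0    : flux of trajectories from the initial basin through the
                first interface (units of 1/time)
    successes : number of trial runs that reached interface i+1
    trials    : number of trial runs fired from interface i
    """
    if len(successes) != len(trials):
        raise ValueError("successes and trials must have equal length")
    log_k = math.log(flux_0)
    for n_success, n_trial in zip(successes, trials):
        if n_success <= 0 or n_trial <= 0:
            raise ValueError("each interface needs at least one success")
        # accumulate in log space because the product can be very small
        log_k += math.log(n_success / n_trial)
    return math.exp(log_k)
```

For instance, a flux of 2.0 per unit time with crossing probabilities of 0.5 and 0.1 at two interfaces gives a rate of 2.0 x 0.5 x 0.1 = 0.1 per unit time.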

16.3.5 Example 3: Toehold-Mediated DNA Strand Displacement

The oxDNA model was developed primarily to analyze the
transitions and dynamic behaviors of the processes occurring in
DNA nanotechnology. A ubiquitous example, central to many DNA
devices, is the use of toehold-mediated strand displacement for
controlling reaction kinetics. Such systems involve a two-stranded
complex, formed by an “incumbent” strand bound to a substrate
strand (S), which has a single-stranded overhang called a toehold, and
an invader strand (I) (see Figure 16.5A). The invader is fully WC
complementary to the substrate and may bind reversibly to it, using
the toehold domain. The rate constants for strand displacement
depend strongly on the toehold length, varying over several
orders of magnitude. Using the oxDNA model, Srinivas et al. (Srinivas
et al., 2013) studied the strand displacement process at different
toehold lengths, ranging from 0 to 7 bases, to investigate the
biophysical basis of this kinetic phenomenon and to determine
the corresponding kinetic rates. The oxDNA model predicts an
exponential dependence on the toehold length for short toeholds
followed by a plateau for longer ones, with the displacement rate
accelerating by roughly six orders of magnitude as the toehold
length increases from 0 to 7 bases (see Figure 16.5). This result agrees remarkably well with
the experimental data (Srinivas et al., 2013). Furthermore, the
authors identified that toehold-mediated strand displacement
involves four distinct time scales: those of branch migration initiation,
branch migration, hybridization and fraying. For example,
initiating the branch migration process is slower than the average
branch migration step because it incurs a thermodynamic penalty
arising from the steric interference of the additional overhang
at the junction. The branch migration steps, which pass through
thermodynamically unfavorable intermediates owing to structural rearrangement
and disruption of favorable stacking interactions, are slower than
fraying of the toehold. Overall, strand displacement is
driven forward thermodynamically by the net gain in base pairs
due to the toehold; however, the process is far from isoenergetic and
involves multiple free energy penalty terms for the intermediate
states. This example illustrates how the oxDNA model can shed light
on the structural and kinetic basis of the diverse phenomena that
take place in DNA nanotechnology.

Figure 16.5 Toehold-mediated DNA strand displacement. (A) Strand
migration scheme. In the initial state the substrate (S) has a single-strand
overhang or toehold (in grey). The toehold mediates the displacement
of the incumbent by the invader (I). (B) Rate of strand displacement
as a function of toehold length (0 to 7 bases) from simulations (relative
displacement rate, solid line, left axis) and from experiments (displacement
rate constant in /M/s, dashed line, right axis). The logarithmic scales of
both axes are identical up to a normalization constant. (C) Snapshot of the
simulation showing the process of strand migration. Figure reproduced with
permission from Srinivas et al., 2013.

16.3.6 Modeling of Chromatin


One major challenge in the DNA biophysics field is to properly
represent the packing of DNA into chromatin inside the eukaryotic
cell nucleus. This packing requires that the DNA chains be ordered in a
dynamic and retrievable manner. Under physiological conditions
this is achieved by the hierarchical assembly of DNA, proteins,
ions and water molecules. To quantitatively understand these
compaction processes it is necessary to build models that take
into account the known physicochemical properties of DNA and
associated proteins. However, chromatin modeling efforts are
still in their infancy. On one hand, modeling chromatin at the
all-atom level is impractical given the size of the system and
the timescales at which the relevant phenomena are observed.
At the other extreme, highly coarse-grained models have been
developed that represent each nucleosome by a single bead with
non-uniform point charges, distributed on the surface of the bead
to reproduce the Poisson–Boltzmann electric field, and with the
linker DNA represented as a wormlike chain (Korolev et al., 2012).
These models can capture interesting aspects of the chromatin’s
electrostatics, mechanics and conformational flexibility and can
be further refined to include, for example, flexible histone tails
and monovalent ions. However, it is desirable to have models that
can describe, at the microscopic level, the physical mechanisms of
phenomena such as the nucleosome-nucleosome interactions that
mediate chromatin compaction. Some of the recently developed DNA
CG models, including several described above, may be appropriate
for future studies of chromatin condensation. For this purpose, these
models should be extended to include explicit ions and proteins.
A variety of protein CG models, with consistent resolutions, are
available in the literature and could be considered for studies of
protein–DNA interactions and complex formation (Davtyan et al.,
2012; Voth, 2008).

16.4 Conclusions and Outlook

In this chapter we have outlined some of the current state-of-the-art
coarse-graining strategies and models available to simulate
DNA molecules at the mesoscale. The models described here use
a combination of knowledge-based and physics-based approaches
to define the Hamiltonian of DNA and ions. We described how
the different Hamiltonians have been parametrized and optimized
to reproduce the canonical averages of physical observables, such
as the structural and thermodynamic properties in terms of their
distribution functions. As discussed throughout this chapter, none of the
stages of the coarse-graining process is trivial.
First, the selection of the CG sites and the functional form of the
Hamiltonian are usually defined based on intuitive or heuristic
arguments, where physically distinct contributions are treated as
independent and additive. Consequently, choosing the appropriate
Hamiltonian ultimately determines the success of the CG model.
Second, it is key that the optimized set of parameters, which are
associated with diverse physical phenomena, is consistent. We have
described some of the special techniques used to solve this “inverse
problem.” These range from systematically matching the partition
functions to ad hoc fitting of the free parameters to reproduce the
related experimental data. In all cases, it is fundamental to further
develop strategies for systematically deriving and validating CG
models.
We have highlighted some of the needs and challenges in the
development of accurate mesoscale descriptions of DNA. Among
the most immediate challenges is the development of methods that move
beyond the Debye–Hückel approximation and that take into account
the complex many-body dynamics of mobile ions near the highly
charged DNA surface. It is well known that the Debye–Hückel
approximation overestimates the electrostatic interactions and that
its validity is limited to systems with low salt concentrations and
low macromolecular surface charge densities. Serious theoretical
inconsistencies might be expected when the Debye–Hückel
approximation is applied to describe DNA electrostatics, and these should be
addressed in future studies.
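
For reference, the screened Coulomb form that underlies Debye–Hückel treatments can be written in a few lines. The sketch below computes the Debye length for a 1:1 salt and the pair energy, in units of kT, between two point charges in a uniform dielectric. The default relative permittivity (78.5, roughly that of water near 298 K), the function names and the use of integer charges are assumptions made for illustration; individual CG models use their own parametrizations.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
EPS_0 = 8.8541878128e-12     # vacuum permittivity, F/m
N_A = 6.02214076e23          # Avogadro constant, 1/mol

def debye_length_nm(salt_molar, temp_k=298.15, eps_r=78.5):
    """Debye screening length (nm) for a 1:1 electrolyte."""
    ionic_strength = salt_molar * 1000.0          # mol/L -> mol/m^3
    lam_sq = (EPS_0 * eps_r * K_B * temp_k) / (
        2.0 * N_A * E_CHARGE**2 * ionic_strength)
    return math.sqrt(lam_sq) * 1e9

def debye_huckel_energy_kt(q1, q2, r_nm, salt_molar,
                           temp_k=298.15, eps_r=78.5):
    """Screened Coulomb energy (units of kT) between charges q1 and q2,
    given in units of the elementary charge, separated by r_nm."""
    bjerrum_nm = E_CHARGE**2 / (
        4.0 * math.pi * EPS_0 * eps_r * K_B * temp_k) * 1e9
    lam = debye_length_nm(salt_molar, temp_k, eps_r)
    return q1 * q2 * bjerrum_nm / r_nm * math.exp(-r_nm / lam)
```

With these defaults the screening length at 0.1 M monovalent salt is a little under 1 nm, which is comparable to the spacing between neighboring phosphate charges and illustrates why the linearized treatment is stretched for DNA.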
In most cases, the DNA CG models presented here are used
in Langevin-based molecular dynamics (MD) simulations that
propagate the positions and velocities of the CG sites in time
and space. As described in the examples, when the appropriate
sampling strategies are used, these simulations provide detailed
information about the conformation space, in particular, appropri-
ately describing the thermodynamics associated with the different
conformations. Moving beyond equilibrium structural distributions,
much less attention has been paid to the dynamical behavior of
molecular systems in coarse-grained simulations, which should be
addressed in future studies.
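
To illustrate what a Langevin propagation step looks like, the sketch below integrates a single degree of freedom with the BAOAB splitting (half kick, half drift, friction plus noise, half drift, half kick). It is a minimal one-particle example under stated assumptions, not the integrator of any specific package or of the CG models described here; the function and argument names are hypothetical.

```python
import numpy as np

def langevin_baoab(force, x0, v0, mass, gamma, kT, dt, n_steps, rng):
    """Propagate one coordinate with BAOAB Langevin dynamics.

    force : callable returning the force at position x
    gamma : friction coefficient (1/time)
    kT    : thermal energy, in the same units as the potential
    rng   : numpy random Generator supplying the thermal noise
    Returns the positions recorded after each step.
    """
    x, v = float(x0), float(v0)
    c1 = np.exp(-gamma * dt)                     # velocity damping per step
    c2 = np.sqrt(kT / mass * (1.0 - c1 * c1))    # matching noise amplitude
    positions = np.empty(n_steps)
    f = force(x)
    for i in range(n_steps):
        v += 0.5 * dt * f / mass                 # B: half kick
        x += 0.5 * dt * v                        # A: half drift
        v = c1 * v + c2 * rng.standard_normal()  # O: friction and noise
        x += 0.5 * dt * v                        # A: half drift
        f = force(x)
        v += 0.5 * dt * f / mass                 # B: half kick
        positions[i] = x
    return positions
```

For a harmonic well with unit force constant, the sampled mean-square displacement should approach kT, which provides a quick check that the thermostat samples the intended equilibrium distribution. Whether the resulting dynamics are physically meaningful for a CG model is a separate question, as noted above.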
As DNA CG models come of age, they will be used to
address many important questions, such as the functional role of
DNA topology and compaction in gene regulation, and to design
DNA-based nanodevices. This will require advances in parallel
algorithms and the systematic integration of current DNA CG models
with various CG models of proteins to describe important
nucleic acid-protein complexes and assemblies found in cells and
beyond. Furthermore, to the best of our knowledge, there has been
no systematic assessment of the scalability of the currently available
DNA CG models to systems larger than a couple of hundred base
pairs.

Acknowledgments

The authors thank Thomas Ouldridge and Daniel Hinckley for their
help and comments on creating figures of the models. This work was
in part supported by the National Science Foundation (NSF) CAREER
Award CHE-0846701 and CHE-5242810 and by the University of
Maryland.

References

Allen, R. J., Valeriani, C., and ten Wolde, P. R. (2009). Forward flux sampling
for rare event simulations, Journal of Physics: Condensed Matter 21, 46,
p. 463102.
Bai, Y., Das, R., Millett, I. S., Herschlag, D., and Doniach, S. (2005). Probing
counterion modulated repulsion and attraction between nucleic acid
duplexes in solution, Proceedings of the National Academy of Sciences of
the United States of America 102, 4, pp. 1035–1040.
Bishop, E. P., Rohs, R., Parker, S. C. J., West, S. M., Liu, P., Mann, R. S., Honig, B.,
and Tullius, T. D. (2011). A map of minor groove shape and electrostatic
potential from hydroxyl radical cleavage patterns of DNA, ACS Chemical
Biology 6, 12, pp. 1314–1320.
Biswas, M., Langowski, J., and Bishop, T. C. (2013). Atomistic simulations of
nucleosomes, Wiley Interdisciplinary Reviews: Computational Molecular
Science 3, 4, pp. 378–392.
Cesare Marincola, F., Denisov, V. P., and Halle, B. (2004). Competitive Na+
and Rb+ binding in the minor groove of DNA, Journal of the American
Chemical Society 126, 21, pp. 6739–6750.
Cragnolini, T., Derreumaux, P., and Pasquali, S. (2013). Coarse-grained
simulations of RNA and DNA duplexes, The Journal of Physical Chemistry
B 117, 27, pp. 8047–8060.
Davtyan, A., Schafer, N. P., Zheng, W., Clementi, C., Wolynes, P. G., and Papoian,
G. A. (2012). AWSEM-MD: Protein structure prediction using coarse-
grained physical potentials and bioinformatically based local structure
biasing, The Journal of Physical Chemistry B 116, 29, pp. 8494–8503.
de Pablo, J. J. (2011). Coarse-grained simulations of macromolecules: From
DNA to nanocomposites, Annual Review of Physical Chemistry 62, 1, pp.
555–574.
DeMille, R. C., Cheatham, T. E., and Molinero, V. (2011). A coarse-grained
model of DNA with explicit solvation by water and ions, The Journal of
Physical Chemistry B 115, 1, pp. 132–142.
DeMille, R. C., and Molinero, V. (2009). Coarse-grained ions without charges:
reproducing the solvation structure of NaCl in water using short-
ranged potentials, The Journal of Chemical Physics 131, p. 034107.
Denisov, V. P., and Halle, B. (2000). Sequence-specific binding of counterions
to B-DNA, Proceedings of the National Academy of Sciences of the United
States of America 97, 2, pp. 629–633.
Echeverria, I., and Papoian, G. A. (2015). DNA exit ramps are revealed in the
binding landscapes obtained from simulations in helical coordinates,
PLoS Computational Biology 11, 2, p. e1003980. doi:10.1371/journal.pcbi.1003980.
Guldbrand, L., Nilsson, L. G., and Nordenskiöld, L. (1986). A Monte Carlo
simulation study of electrostatic forces between hexagonally packed
DNA double helices, The Journal of Chemical Physics 85, 11, p. 6686.
He, Y., Maciejczyk, M., Ołdziej, S., Scheraga, H., and Liwo, A. (2013). Mean-
field interactions between nucleic-acid-base dipoles can drive the
formation of a double helix, Physical Review Letters 110, 9, p. 098101.
Hinckley, D. M., Freeman, G. S., Whitmer, J. K., and de Pablo, J. J. (2013). An
experimentally-informed coarse-grained 3-Site-Per-Nucleotide model
of DNA: structure, thermodynamics, and dynamics of hybridization, The
Journal of Chemical Physics 139, 14, p. 144903.
Hsu, C. W., Fyta, M., Lakatos, G., Melchionna, S., and Kaxiras, E. (2012). Ab
initio determination of coarse-grained interactions in double-stranded
DNA, The Journal of Chemical Physics 137, 10, p. 105102.
Hyeon, C., and Thirumalai, D. (2011). Capturing the essence of folding
and functions of biomolecules using coarse-grained models, Nature
Communications 2, p. 487.
Knotts, T. A., Rathore, N., Schwartz, D. C., and de Pablo, J. J. (2007). A coarse
grain model for DNA, The Journal of Chemical Physics 126, 8, p. 084901.
Kornyshev, A. A., and Leikin, S. (2013). Helical structure determines different
susceptibilities of dsDNA, dsRNA, and tsDNA to counterion-induced
condensation, Biophysical Journal 104, 9, pp. 2031–2041.
Korolev, N., Fan, Y., Lyubartsev, A. P., and Nordenskiöld, L. (2012). Modelling
chromatin structure and dynamics: status and prospects, Current
Opinion in Structural Biology 22, 2, pp. 151–159.
Laio, A., and Gervasio, F. L. (2008). Metadynamics: A method to simulate
rare events and reconstruct the free energy in biophysics, chemistry
and material science, Reports on Progress in Physics 71, 12, p. 126601.
Linak, M. C., Tourdot, R., and Dorfman, K. D. (2011). Moving beyond
Watson–Crick models of coarse grained DNA dynamics, The Journal of
Chemical Physics 135, 20, p. 205102.
Lyubartsev, A., and Laaksonen, A. (1995). Calculation of effective interaction
potentials from radial distribution functions: A reverse Monte Carlo
approach, Physical Review E 52, 4, pp. 3730–3737.
MacKerell Jr., A. D., and Nilsson, L. (2008). Molecular dynamics simulations
of nucleic acid–protein complexes, Current opinion in structural biology
18, 2, pp. 194–199.
Morriss-Andrews, A., Rottler, J., and Plotkin, S. S. (2010). A systematically
coarse-grained model for DNA and its predictions for persistence
length, stacking, twist, and chirality, The Journal of Chemical Physics
132, 3, p. 035105.
Noid, W. G. (2012). Systematic methods for structurally consistent coarse-
grained models, in Biomolecular Simulations (Humana Press, Totowa,
NJ), pp. 487–531.
Ouldridge, T. E., Louis, A. A., and Doye, J. P. K. (2011). Structural, mechanical,
and thermodynamic properties of a coarse-grained DNA model, The
Journal of Chemical Physics 134, 8, p. 085101.
Owczarzy, R., You, Y., Moreira, B. G., and Manthey, J. A. (2004). Effects of
sodium ions on DNA duplex oligomers: improved predictions of melting
temperatures, Biochemistry 43, 12, pp. 3537–3554.
Ponomarev, S. Y., Thayer, K. M., and Beveridge, D. L. (2004). Ion motions in
molecular dynamics simulations on DNA, Proceedings of the National
Academy of Sciences of the United States of America 101, 41, pp. 14771–
14775.
Potoyan, D. A., and Papoian, G. A. (2012). Regulation of the H4 tail binding
and folding landscapes via Lys-16 acetylation, Proceedings of the
National Academy of Sciences 109, 44, pp. 17857–17862.
Potoyan, D. A., Savelyev, A., and Papoian, G. A. (2012). Recent successes
in coarse-grained modeling of DNA, Wiley Interdisciplinary Reviews:
Computational Molecular Science 3, 1, pp. 69–83.
Protozanova, E., and Yakovchuk, P. (2004). Stacked–unstacked equilibrium
at the nick site of DNA, Journal of Molecular Biology 342, pp. 775–
785.
Reith, D., Pütz, M., and Müller Plathe, F. (2003). Deriving effective mesoscale
potentials from atomistic simulations, Journal of Computational Chem-
istry 24, 13, pp. 1624–1636.
Rohs, R., Jin, X., West, S. M., Joshi, R., Honig, B., and Mann, R. S. (2010).
Origins of specificity in protein-DNA recognition, Annual Review of
Biochemistry 79, 1, pp. 233–269.
Rohs, R., West, S. M., Sosinsky, A., Liu, P., Mann, R. S., and Honig, B. (2009).
The role of DNA shape in protein-DNA recognition, Nature 461, 7268,
pp. 1248–1253.
SantaLucia, J., and Hicks, D. (2004). The thermodynamics of DNA structural
motifs, Annual Review of Biophysics and Biomolecular Structure 33, 1, pp. 415–440.
Savelyev, A., Materese, C. K., and Papoian, G. A. (2011a). Is DNA’s rigidity
dominated by electrostatic or nonelectrostatic interactions? Journal of
the American Chemical Society 133, 48, pp. 19290–19293.
Savelyev, A., Materese, C. K., and Papoian, G. A. (2011b). Is DNA’s Rigidity
Dominated by Electrostatic or Nonelectrostatic Interactions? Journal of
the American Chemical Society 133, 48, pp. 19290–19293.
Savelyev, A., and Papoian, G. A. (2006). Electrostatic, steric, and hydration
interactions favor Na+ condensation around DNA compared with
K+, Journal of the American Chemical Society 128, 45, pp. 14506–
14518.
Savelyev, A., and Papoian, G. A. (2009a). Molecular renormalization group
coarse-graining of electrolyte solutions: Application to aqueous NaCl
and KCl, The Journal of Physical Chemistry B 113, 22, pp. 7785–7793.
Savelyev, A., and Papoian, G. A. (2009b). Molecular renormalization group
coarse-graining of polymer chains: Application to double-stranded
DNA, Biophysical Journal 96, 10, pp. 4044–4052.
Savelyev, A., and Papoian, G. A. (2010a). Chemically accurate coarse graining
of double-stranded DNA, Proceedings of the National Academy of
Sciences 107, 47, pp. 20340–20345.
Savelyev, A., and Papoian, G. A. (2010b). Chemically accurate coarse graining
of double-stranded DNA, Proceedings of the National Academy of
Sciences of the United States of America 107, 47, pp. 20340–20345.
Srinivas, N., Ouldridge, T. E., Šulc, P., Schaeffer, J. M., Yurke, B., Louis, A. A.,
Doye, J. P. K., and Winfree, E. (2013). On the biophysics and kinetics of
toehold-mediated DNA strand displacement, Nucleic Acids Research 41,
22, pp. 10641–10658.
Swendsen, R. H. (1979). Monte Carlo Renormalization Group, Physical
Review Letters 42, 14, pp. 859–861.
Voth, G. A. (ed.) (2008). Coarse-Graining of Condensed Phase and Biomolecu-
lar Systems (CRC Press).
Wang, Y., Feng, S., and Voth, G. A. (2009a). Transferable coarse-grained
models for ionic liquids, Journal of Chemical Theory and Computation
5, pp. 1091–1098.
Wang, Y., Noid, W. G., and Liu, P. a. (2009b). Effective force coarse-graining,
Physical Chemistry Chemical Physics 11, 12, pp. 2002–2015.
Yoo, J., and Aksimentiev, A. (2013). In situ structure and dynamics of
DNA origami determined through molecular dynamics simulations,
Proceedings of the National Academy of Sciences 110, 50, pp. 20099–
20104.
