0002unit 2 Notes

This document discusses computational chemistry and molecular modeling. It covers several topics, including the scope and applications of computational modeling in research, the availability of literature at different technical levels, and factors that influence the efficiency and resource requirements of computational calculations like molecular interactions, trade-offs between calculation methods, and examples of how theory can be combined with or guide experimentation.

Uploaded by

kishan kumar

COMPUTATIONAL CHEMISTRY – UNIT II

Scope of Computational Modeling


Advances in technology mean that research is now routinely supported by a range of software
tools. In many cases computational work is done at the initial stage and is followed by
experimental work. Interdisciplinary research requires expertise from several fields, and a
background in computing combined with domain knowledge is always beneficial. The challenge
is how to build on existing knowledge and skills, and develop new ones, in order to contribute
effectively.

At one time, computational chemistry techniques were used only by experts extremely
experienced in using tools that were for the most part difficult to understand and apply. Today,
advances in software have produced programs that are easily used by any chemist. Along with
new software comes new literature on the subject. There are now books that describe the
fundamental principles of computational chemistry at almost any level of detail. A number of
books also exist that explain how to apply computational chemistry techniques to simple
calculations appropriate for student assignments. There are, in addition, many detailed research
papers on advanced topics that are intended to be read only by professional theorists.

The group that has the most difficulty finding appropriate literature is working chemists, not
theorists. These are experienced researchers who know chemistry and now have computational
tools available. They want to use computational chemistry to address real-world research
problems and are bound to run into significant difficulties. This unit is chosen
to cover a large number of topics, with an emphasis on when and how to apply computational
techniques rather than focusing on theory. It gives a clear description with just the amount of
technical depth typically necessary to be able to apply the techniques to computational
problems. There are many good books describing the fundamental theory on which
computational chemistry is built. The description of that theory as given here is very minimal.
We have chosen to include just enough theory to explain the terminology used in computing.

Many computational chemistry techniques are extremely computer-intensive. Depending on
the type of calculation desired, it could take anywhere from seconds to weeks to do a single
calculation. There are many calculations, such as ab initio analysis of biomolecules, that cannot
be done on the largest computers in existence. Likewise, calculations can take very large
amounts of computer memory and hard disk space. In order to complete work in a reasonable
amount of time, it is necessary to understand what factors contribute to the computer resource
requirements. Ideally, the user should be able to predict in advance how much computing
power will be needed.

There are often trade-offs between equivalent ways of doing the same calculation. For example,
many ab initio programs use hard disk space to store numbers that are computed once and used
several times during the course of the calculation. These are the integrals that describe the
overlap between various basis functions. Instead of the above method, called conventional
integral evaluation, it is possible to use direct integral evaluation in which the numbers are
recomputed as needed. Direct integral evaluation algorithms use less disk space at the expense
of requiring more CPU time to do the calculation. An in-core algorithm is one that stores all
the integrals in RAM, thus saving on disk space at the expense of requiring a computer
with a very large amount of memory. Many programs use a semidirect algorithm, which uses
some disk space and a bit more CPU time to obtain the optimal balance of both.
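
To get a feel for why this trade-off matters, the sketch below estimates the storage needed by the conventional approach. It is only a rough illustration: it assumes the stored quantities are the two-electron integrals over N real basis functions, that only the symmetry-unique values (roughly N^4/8 of them) are kept, and that each value takes 8 bytes with no screening or index overhead. The function names are hypothetical and do not belong to any particular program.

```python
def unique_two_electron_integrals(n_basis: int) -> int:
    """Count symmetry-unique integrals (ij|kl) for n_basis real basis functions."""
    n_pairs = n_basis * (n_basis + 1) // 2   # unique index pairs (ij)
    return n_pairs * (n_pairs + 1) // 2       # unique pairs of pairs (ij|kl)


def storage_gib(n_basis: int, bytes_per_value: int = 8) -> float:
    """Storage in GiB if every unique integral is kept as one floating-point value."""
    return unique_two_electron_integrals(n_basis) * bytes_per_value / 2**30


# Assumed basis-set sizes, chosen only to show how quickly the requirement grows.
for n in (50, 200, 1000):
    print(f"{n:5d} basis functions: {storage_gib(n):10.2f} GiB")
```

Because the count grows with the fourth power of the basis-set size, doubling the number of basis functions multiplies the storage by about sixteen, which is why direct and semidirect schemes become attractive for larger molecules.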
Cost and Efficiency
Chemistry’s impact on modern society is most readily perceived in the creation of materials,
be they foods, textiles, circuit boards, fuels, drugs, packaging, etc. Thus, even the most ardent
theoretician would be unlikely to suggest that theory could ever supplant experiment. Rather,
most would opine that opportunities exist for combining theory with experiment so as to take
advantage of synergies between them.

With that in mind, one can categorize efficient combinations of theory and experiment into
three classes. In the first category, theory is applied post facto to a situation where some
ambiguity exists in the interpretation of existing experimental results. For example, photolysis
of a compound in an inert matrix may lead to a single product species analyzed by
spectroscopy. However, the identity of this unique product may not be obvious given a number
of plausible alternatives. A calculation of the energies and spectra for all of the postulated
products provides an opportunity for comparison and may prove to be definitive. In the second
category, theory may be employed in a simultaneous fashion to optimize the design and
progress of an experimental program. Continuing the above analogy, a priori calculation of
spectra for plausible products may assist in choosing experimental parameters to permit the
observation of minor components which might otherwise be missed in a complicated mixture
(e.g., theory may allow the experimental instrument to be tuned properly to observe a signal
whose location would not otherwise be predictable).

Finally, theory may be used to predict properties which might be especially difficult or
dangerous (i.e., costly) to measure experimentally. In the difficult category are such data as rate
constants for the reactions of trace, upper-atmospheric constituents that might play an
important role in the ozone cycle. For sufficiently small systems, levels of quantum mechanical
theory can now be brought to bear that have accuracies comparable to the best modern
experimental techniques, and computationally derived rate constants may find use in complex
kinetic models until such time as experimental data are available. As for dangerous
experiments, theoretical pre-screening of a series of toxic or explosive compounds for desirable
(or undesirable) properties may assist in prioritizing the order in which they are prepared,
thereby increasing the probability that an acceptable product will be arrived at in a maximally
efficient manner.

Molecular Interactions:
Molecular interactions are attractive or repulsive forces between molecules and between
non-bonded atoms. Molecular interactions are important in all aspects of chemistry, biochemistry
and biophysics, including protein folding, drug design, pathogen detection, material science,
sensors, gecko feet, nanotechnology, separations, and origins of life. Molecular interactions are
also known as noncovalent interactions, intermolecular interactions, non-bonding interactions,
noncovalent forces and intermolecular forces. All five of these phrases mean the same thing.

Non-Bonding Interactions. Molecular interactions are between molecules, or between atoms
that are not linked by bonds. Molecular interactions include cohesive (attraction between like),
adhesive (attraction between unlike) and repulsive forces between molecules. Molecular
interactions change (and bonds remain intact) when (a) ice melts, (b) water boils, (c) carbon
dioxide sublimes, (d) proteins unfold, (e) RNA unfolds, (f) DNA strands separate and (g)
membranes disassemble. The enthalpy of a given molecular interaction, between two
non-bonded atoms, is 1 - 10 kcal/mole (4 - 42 kjoule/mole), which in the lower limit is on the order
of RT and in the upper limit is significantly less than a covalent bond.

Bonding Interactions. Bonds hold atoms together within molecules. A molecule is a group of
atoms that associates strongly enough that it does not dissociate or lose structure when it
interacts with its environment. At room temperature two nitrogen atoms can be bonded (N2).
Bonds break and form during chemical reactions. In the chemical reaction called fire, bonds of
cellulose break while bonds of carbon dioxide and water form. Bond enthalpies are on the order
of 100 kcal/mole (400 kjoule/mole), which is much greater than RT at room temperature; bonds
do not break at room temperature.
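
The comparison with RT can be made concrete with a few lines of arithmetic. The sketch below is a rough gauge, not a rigorous treatment: it assumes a temperature of 298.15 K, uses the gas constant in kcal/(mol K), and uses the Boltzmann factor exp(-E/RT) as a crude measure of how easily thermal motion can supply an energy E. The function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def thermal_energy(temp_k: float = 298.15) -> float:
    """Return RT in kcal/mole at the given temperature."""
    return R_KCAL * temp_k


def boltzmann_factor(energy_kcal: float, temp_k: float = 298.15) -> float:
    """Return exp(-E/RT), a rough gauge of how accessible an energy E is thermally."""
    return math.exp(-energy_kcal / thermal_energy(temp_k))


print(f"RT at 298 K = {thermal_energy():.2f} kcal/mole")
for label, energy in [("weak interaction", 1.0),
                      ("strong interaction", 10.0),
                      ("covalent bond", 100.0)]:
    print(f"{label:20s} {energy:6.1f} kcal/mole   exp(-E/RT) = {boltzmann_factor(energy):.1e}")
```

A 1 kcal/mole interaction is less than twice RT, so it is disrupted constantly at room temperature, while the factor for a 100 kcal/mole bond is vanishingly small.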

Boiling Points. When a molecule transitions from the liquid to the gas phase (as during boiling),
ideally all molecular interactions are disrupted. Ideal gases are the ONLY systems where there
are no molecular interactions. Differences in boiling temperatures give good qualitative
indications of strengths of molecular interactions in the liquid phase. High boiling liquids have
strong molecular interactions. The boiling point of H2O is hundreds of degrees greater than the
boiling point of N2 because of stronger molecular interactions in H2O (liq) than in N2(liq). The
forces between molecules in H2O (liq) are greater than those in N2(liq).

Short range repulsion


Atoms take space. Force two atoms together and they will push back. When two atoms are
close together, the occupied orbitals on the atom surfaces overlap, causing electrostatic
repulsion between surface electrons. This repulsive force between atoms acts over a very short
range, but is very large when distances are short.

The repulsive energy goes up as $(d_i / R)^{12}$, where $R$ is the distance between the atoms and $d_i$ is
the distance threshold below which the energy becomes repulsive. $d_i$ depends on the types of
atoms. The large exponent means that when $R < d_i$, small decreases in $R$ cause large
increases in repulsion. Short range repulsion only matters when atoms are in very close
proximity ($R < d_i$), but at close range it dominates other interactions. Because this repulsion
rises so sharply as distance decreases it is often useful to pretend that atoms are hard spheres,
like very small pool balls, with hard surfaces (called van der Waals surfaces) and well-defined
radii (called van der Waals radii).
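
The steepness of a twelfth-power law is easy to underestimate, so a short numerical sketch may help. It assumes the energy is simply proportional to (d/R)^12 with an arbitrary prefactor of 1, and the threshold of 3.5 Å is an illustrative value only. The function name is hypothetical.

```python
def repulsive_energy(r: float, d: float, scale: float = 1.0) -> float:
    """Relative repulsive energy, proportional to (d / r) ** 12."""
    return scale * (d / r) ** 12


d_i = 3.5  # assumed threshold distance in angstroms, for illustration only
for r in (7.0, 4.0, 3.5, 3.15, 2.8):
    print(f"R = {r:4.2f} A   relative repulsion = {repulsive_energy(r, d_i):10.4f}")
```

Shortening the distance by 10% below the threshold multiplies the repulsion by about 3.5, while at twice the threshold the term has almost vanished. This is the behaviour that justifies the hard-sphere picture.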

As two atoms approach each other their van der Waals surfaces make contact when the distance
between them equals the sum of their van der Waals radii. At this distance the repulsive energy
skyrockets. The smallest distance between two non-bonded atoms is the sum of the van der
Waals radii of the two atoms. A sulfur atom and a carbon atom can come no closer together
than:

rS + rC = 1.8 + 1.7 = 3.5 Å.

Of course, we are assuming here that bonds do not form. When two atoms form a bond, they
come very close together and their van der Waals radii and surfaces are violated.

Figure 1 shows how short range repulsion sets the distance of 3.4 Å between sheets in graphite.
If two non-bonded atoms are separated by a distance of less than the sum of their VDW radii,
short range repulsion forces them apart.

Short range repulsion is important to you. It prevents your hands from passing through each
other when you clap, and prevents atoms from collapsing into tightly packed states with an enormous
density of $10^{14}$ g/ml, which is the density of condensed atomic nuclei.

Here on Earth, with our modest gravity, the van der Waals radius of carbon (rC) is evident from
the spacing between the layers in graphite. The distance between atoms in different layers of
graphite is never less than twice the van der Waals radius of carbon (2 x rC = 2 x 1.7 = 3.4 Å).
The atoms within a graphite layer are covalently linked (bonded), which causes interpenetration
of van der Waals surfaces. Carbon atoms within a layer are separated by 1.42 Å, which is much
less than twice the van der Waals radius of carbon. As explained in other sections of this
document, van der Waals surfaces are also violated when molecules form hydrogen bonds.

Electrostatic interactions
Electrostatic interactions are between and among cations and anions, species with charges of
... -2, -1, +1, +2 ... Electrostatic interactions can be either attractive or repulsive, depending on the
signs of the charges. Like charges repel. Unlike charges attract. Favorable electrostatic
interactions cause the vapor pressure of sodium chloride and other salts to be very low. If you
leave crystals of table salt (NaCl; Na+=cation, Cl-=anion) on a hot pan, how long does it take
before they vaporize and sublime away? A very very long time; electrostatic interactions are
very very strong. The electrostatic interactions within a sodium chloride crystal are called ionic
bonds. But when a single cation and a single anion are close together, within a protein, or within
a folded RNA, those interactions are considered to be non-covalent electrostatic interactions.
Non-covalent electrostatic interactions can be strong, and act at long range. Electrostatic forces
fall off gradually with distance (as $1/r^2$, where $r$ is the distance between the ions).
Figure 2 shows electrostatic interactions in a cross section of a NaCl crystal. Each sodium
cation experiences strong electrostatic interactions with adjacent chloride anions.

Figure 3 shows electrostatic interactions. In RNA (for example in the ribosome), anionic
phosphate oxygens (charge = -1) engage in attractive electrostatic interactions with a
magnesium cation (charge = +2). Two phosphate groups can 'clamp' onto the Mg2+ ion. The O
to Mg2+ distance is 2.1 Å. The dashed lines represent favorable electrostatic interactions.

Electrostatic interactions are the primary stabilizing interaction between phosphate oxygens of
RNA (charge = -1) and magnesium ions (charge = +2), as shown in Figure 3. There are
many magnesium ions associated with RNA and DNA in vivo. Electrostatic interactions are
highly attenuated (dampened) by water. In protein folding, RNA folding and DNA annealing,
electrostatic interactions are dependent on salt concentration and pH.

Ion pairs in proteins


Favorable electrostatic interactions between paired anionic and cationic amino acid sidechains
are reasonably frequent in proteins. Ion Pairs, sometimes called Salt Bridges, are formed when
the charged group of a cationic amino acid (like lysine or arginine) is around 3.0 to 5.0 Å from
the charged group of an anionic amino acid (like aspartate or glutamate). The charged groups
in an ion pair are generally linked by hydrogen bonds, in addition to electrostatic interactions.
The electrostatic force between two point charges is given by:

$$F = \frac{k\, q_1 q_2}{\varepsilon\, r^2}$$

where $k = 9.0 \times 10^{9}$ N·m²/C²,
$q = -1.6 \times 10^{-19}$ coulombs for an electron,
$r$ = distance between the point charges (meters),
$\varepsilon$ = the dielectric constant of the medium (unitless).

The dielectric constant $\varepsilon$ reflects the tendency of the medium to shield charged species
from each other. ε is 1 in a vacuum, around 4 in the interior of a protein and 80 in water. Water
is very efficient at shielding charges, reducing electrostatic forces between ions. The problem
of calculating electrostatic effects in biological systems is complex in part because of non-
uniformity of the dielectric environment.

The dielectric micro-environments are complex and variable, with less shielding of
charges in regions of hydrocarbon sidechains and greater shielding in regions of polar
sidechains. The electrostatic energy is given by:

$$\Delta E = \frac{k\, a\, q_1 q_2}{\varepsilon\, r}$$

where $a$ = Avogadro's number.

One can crudely estimate the energetics of a charge-charge interaction in a protein. The energy
of an amine (charge +1) and a carboxylic acid (charge -1) separated by 4 Å in the interior of
protein is given by:

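$$\Delta E = -\frac{(9.0 \times 10^{9}\ \mathrm{N\,m^2/C^2})\,(6.02 \times 10^{23}\ \mathrm{mol^{-1}})\,(1.6 \times 10^{-19}\ \mathrm{C})^2}{4\,(4 \times 10^{-10}\ \mathrm{m})}$$

$$\Delta E \approx -87\ \text{kjoules/mole} \approx -21\ \text{kcal/mole}$$

The negative sign indicates that the interaction is attractive.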

This rough approximation is around 10-fold greater in magnitude than the values determined
experimentally. An ion pair contributes a favourable ΔG of 1 to 4 kcal/mole (4.1 to 16.4 kjoule/mole)
to the stability of a native protein.
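
The estimate above can be reproduced, and the effect of the dielectric constant explored, with a minimal sketch. It uses the same rounded constants as the notes, treats the medium as a uniform dielectric (which the text itself warns is a crude assumption), and the function name is hypothetical.

```python
K_COULOMB = 9.0e9    # Coulomb constant, N*m^2/C^2 (rounded as in the notes)
AVOGADRO = 6.02e23   # Avogadro's number, 1/mol
E_CHARGE = 1.6e-19   # magnitude of the elementary charge, C


def coulomb_energy_kj_per_mol(z1: int, z2: int, r_angstrom: float, dielectric: float) -> float:
    """Molar electrostatic energy of two point charges z1*e and z2*e in a uniform dielectric."""
    r_m = r_angstrom * 1e-10  # convert angstroms to meters
    joules_per_mol = (K_COULOMB * AVOGADRO * (z1 * E_CHARGE) * (z2 * E_CHARGE)
                      / (dielectric * r_m))
    return joules_per_mol / 1000.0


# A +1/-1 pair separated by 4 angstroms in vacuum, protein interior and water.
for eps in (1, 4, 80):
    e_kj = coulomb_energy_kj_per_mol(+1, -1, 4.0, eps)
    print(f"dielectric {eps:2d}: {e_kj:8.1f} kJ/mole = {e_kj / 4.184:7.1f} kcal/mole")
```

With a dielectric constant of 4 the result is about -87 kJ/mole, matching the hand calculation. With the value for water (80) it drops to a few kJ/mole, which illustrates how strongly water attenuates electrostatic interactions.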

Figure 4 shows an ion pair within a folded protein. An anionic aspartic acid (charge = -1)
engages in attractive electrostatic interactions with cationic arginine (charge = +1). The dashed
lines represent hydrogen bonds.
Hydrogen Bonding
The idea that a single hydrogen atom could interact simultaneously with two other atoms was
proposed in 1920 by Latimer and Rodebush and their advisor, G. N. Lewis. Maurice Huggins,
who was also a student in Lewis' lab, described the hydrogen bond in his 1919 dissertation.

A hydrogen bond is a favourable interaction between an atom with a basic lone pair of electrons
(a Lewis Base) and a hydrogen atom that has been partially stripped of its electrons because it
is covalently bound to an electronegative atom (N, O, or S). In a hydrogen bond, the Lewis
Base is the hydrogen bond acceptor (A) and the partially exposed proton is bound to the
hydrogen bond donor (H-D).

Why hydrogen? Hydrogen is special because it is the only atom that (i) forms covalent sigma
bonds with electronegative atoms like N, O and S, and (ii) uses the inner shell (1s) electron(s)
in that covalent bond. When its electronegative bonding partner pulls the bonding electrons
away from hydrogen, the hydrogen nucleus (a proton) is exposed on the back side (distal from
the bonding partner). The unshielded face of the proton is exposed, attracting the partial
negative charge of an electron lone pair. Hydrogen is the only atom that exposes its nucleus
this way. Other atoms have inner shell non-bonding electrons that shield the nucleus.

Figure 5 illustrates the elements of a hydrogen bond, including the HB acceptor and HB donor,
the lone pair and the exposed proton. N, O, S are the predominant hydrogen bonding atoms (A
& D) in biological systems.

A hydrogen bond is not an acid-base reaction, where the proton (H+) is fully transferred from
H-D to A to form D- and HA+. However, the strength of a hydrogen bond correlates well with
the acidity of donor H-D and the basicity of acceptor A. In a hydrogen bond, the H+ is partially
transferred from H-D to A, but H+ remains covalently attached to D. The H-D bond remains
intact.
Figure 6 illustrates three different styles for representing a hydrogen bond. Atom A is the Lewis
base (for example the N in NH3 or the O in H2O) and the atom D is electronegative (for example
O, N or S). The conventional nomenclature is confusing: a hydrogen bond is not a covalent
bond.

Figure 7 shows the most common hydrogen bond acceptors and donors in biological
macromolecules.

The most common hydrogen bonds in biological systems involve oxygen and nitrogen atoms as
A and D. Keto groups (=O), amines (R3N), imines (R=N-R) and hydroxyl groups (-OH) are
the most common hydrogen bond acceptors in DNA, RNA, proteins and complex
carbohydrates. Hydroxyl groups and amines/imines are the most common hydrogen bond
donors. Hydroxyls and amines/imines can both donate and accept hydrogen bonds.
In traversing the Periodic Table, increasing the electronegativity of atom D strips electron density
from the proton (in H-D), increasing its partial positive charge, and increasing the strength of
any hydrogen bond. Thiols (-SH) can both donate and accept hydrogen bonds but these are
generally weak, because sulfur is not sufficiently electronegative. Hydrogen bonds involving
carbon, where H-D equals H-C, are observed, although these are weak and infrequent. C is
insufficiently electronegative to form good hydrogen bonds. Hydrogen bonds are essentially
electrostatic in nature, although the energy can be decomposed into additional contributions
from polarization, exchange repulsion, charge transfer, and mixing.

Hydrogen bond strengths form a continuum. Strong hydrogen bonds of 20-40 kcal/mole (82 to
164 kjoule/mole), generally formed between charged donors and acceptors, are nearly as strong
as covalent bonds. Weak hydrogen bonds of 1-5 kcal/mole (4 - 21 kjoule/mole), sometimes
formed with carbon as the proton donor, are no stronger than conventional dipole-dipole
interactions. Moderate hydrogen bonds, which are the most common, are formed between
neutral donors and acceptors and range from 3 - 12 kcal/mole (12 - 50 kjoule/mole).

A hydrogen bond is not a bond. It is a molecular interaction (a non-bonding interaction).

Cooperativity of hydrogen bonds


In biological systems, hydrogen bonds are frequently cooperative and are stabilized by
resonance involving multiple hydrogen bonds. In systems with multiple hydrogen bonds, the
strength of one hydrogen bond is increased by an adjacent hydrogen bond. For example, in the
acetic acid dimer (Figure 8), the top hydrogen bond increases both
the acidity of the hydrogen and the basicity of the oxygen in the bottom hydrogen bond. Each
hydrogen bond makes the other stronger than it would be in isolation. Cooperativity of
hydrogen bonding is observed in base pairing and in folded proteins.

Figure 8 shows cooperativity of the hydrogen bonds of an acetic acid dimer (top) and of a G-
C base pair (bottom). One hydrogen bond increases the stability of the adjacent hydrogen bond
(and vice versa).
Figure 9 shows cooperativity via resonance of the hydrogen bonds of an anti-parallel β-sheet.

Molecular Topology:
One property of molecules appears to be very close to a binary relation: that is two atoms in a
given molecule are either bonded or not bonded. Therefore, molecules can be represented by
graphs when the only property considered is the existence or not of a chemical bond. This
property is called molecular topology. We define molecular topology as the totality of
information contained in the molecular graph. In chemistry graphs can represent different
chemical objects: molecules, reactions, crystals, polymers, clusters, etc. The common feature
of chemical systems is the presence of sites and connections between them. Sites may be atoms,
electrons, molecules, molecular fragments, groups of atoms, intermediates, orbitals, etc. The
connections between sites may represent bonds of any kind, bonded and nonbonded
interactions, elementary reaction steps, rearrangements, van der Waals forces, etc. Chemical
systems may be depicted by chemical graphs using a simple conversion rule:

Site ↔ vertex
Connection ↔ edge

A special class of chemical graphs are molecular graphs. Molecular graphs are chemical
graphs which represent the constitution of molecules. They are also called constitutional
graphs. In these graphs vertices correspond to individual atoms and edges to chemical bonds
between them. Molecular graphs are necessarily connected graphs. As examples the molecular
graphs corresponding to propane and cyclopropane are shown in Figure 10.
Figure 10 The molecular graphs corresponding to propane and cyclopropane

In order to simplify the handling of molecular graphs, hydrogen-suppressed graphs, i.e., graphs
depicting only molecular skeletons without hydrogen atoms and their bonds, are often used.
They are also called skeleton graphs. The hydrogen-suppressed graphs are almost universally
used in chemical graph theory, because the neglect of the hydrogen atoms and their bonds in
most cases cannot be the cause of any ambiguity. The hydrogen-suppressed graphs
corresponding to butane and cyclobutane are given in Figure 11.

Figure 11 The hydrogen-suppressed molecular graphs depicting butane and cyclobutane
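
One simple way to hold such a graph in a program is as a list of edges between numbered carbon atoms. The sketch below builds the hydrogen-suppressed graphs of butane and cyclobutane this way and checks that each is connected, as a molecular graph must be. The atom numbering and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def adjacency_list(edges):
    """Map each vertex (atom) to the set of its bonded neighbours."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return dict(neighbours)


def is_connected(edges):
    """Return True if every vertex can be reached from an arbitrary starting vertex."""
    graph = adjacency_list(edges)
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(graph)


# Carbon skeletons only; hydrogens and their bonds are suppressed.
butane = [(1, 2), (2, 3), (3, 4)]
cyclobutane = butane + [(4, 1)]  # the extra edge closes the ring

for name, edges in [("butane", butane), ("cyclobutane", cyclobutane)]:
    print(name, adjacency_list(edges), "connected:", is_connected(edges))
```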

The molecular graph grossly simplifies the complex picture of a molecule by depicting only
its constitution (i.e., the chemical bonds between the various pairs of atoms in the molecule)
and neglecting other structural features (e.g., geometry, stereochemistry, chirality). Even so,
a simple picture of a molecule as the molecular graph can enable one to make useful
predictions about physical and chemical properties of molecules. Since the predictions of
properties and reactivities of molecules are of prime interest to chemists, the development of
chemical graph theory is, thus, justified.
Molecular graphs depicting constitutional formulae of molecules represent their topology. This
is a chemist’s view of molecular topology. However, a more precise definition of molecular
topology may also be given using the concept of the molecular graph. A topological space is
formed by a set and the topological structure defined upon the set. A simple connected
(molecular) graph can be associated with a topological space if it can be shown that a
topological structure is defined upon its vertex-set.

Graph theoretical matrices


Graphs, adequately labeled, may be associated with several matrices. A graph G is labeled if a
certain numbering of vertices of G is introduced. Here two graph-theoretical matrices, i.e., the
adjacency matrix and the distance matrix will be discussed. They are also sometimes referred
to as topological matrices. These matrices may be used for identifying certain properties of
graphs, which would not otherwise easily emerge.

The Adjacency Matrix

The most important matrix representation of a graph G is the vertex-adjacency matrix A =
A(G). This matrix is also of importance in chemistry and physics.

The vertex-adjacency matrix A(G) of a labeled connected graph G with N vertices is the
square N x N symmetric matrix which contains information about the internal connectivity of
vertices in G. It is defined as

$$A_{ij} = \begin{cases} 1 & \text{if and only if } (i, j) \in E(G) \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

$$A_{ii} = 0 \qquad (2)$$

where E(G) is the set of edges of G.

Therefore, a nonzero entry appears in A(G) only if an edge connects vertices i and j.
For example, the following vertex-adjacency matrix can be constructed for a labeled
graph G (Figure 12).

Figure 12 A vertex and edge labelled graph G


The adjacency matrix is symmetrical about the principal diagonal. Therefore, the
transpose of the adjacency matrix A leaves the adjacency matrix unchanged,

$$A^{T}(G) = A(G) \qquad (3)$$

This transpose $A^{T}$ is formed by interchanging rows and columns of the matrix A.
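
Definitions (1) to (3) translate directly into code. The sketch below builds the vertex-adjacency matrix from an edge list and checks the symmetry property. Since the labeled graph of Figure 12 is not reproduced in these notes, it uses the hydrogen-suppressed graphs of propane and cyclopropane instead, with an assumed vertex numbering; the function names are hypothetical.

```python
def adjacency_matrix(n_vertices, edges):
    """Build the N x N vertex-adjacency matrix from edges given as 1-based vertex pairs."""
    a = [[0] * n_vertices for _ in range(n_vertices)]
    for i, j in edges:
        a[i - 1][j - 1] = 1   # entry for edge (i, j)
        a[j - 1][i - 1] = 1   # mirrored entry keeps the matrix symmetric
    return a


def transpose(matrix):
    """Interchange rows and columns."""
    return [list(row) for row in zip(*matrix)]


propane = adjacency_matrix(3, [(1, 2), (2, 3)])
cyclopropane = adjacency_matrix(3, [(1, 2), (2, 3), (3, 1)])

for name, a in [("propane", propane), ("cyclopropane", cyclopropane)]:
    print(name)
    for row in a:
        print("  ", row)
    print("   symmetric:", transpose(a) == a)
```

The diagonal stays zero because no edge connects a vertex to itself, and the single extra edge that closes the ring in cyclopropane appears as the pair of entries in the corners of the matrix.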

The edge-adjacency matrix of a graph G, EA = EA(G), is determined by the
adjacencies of edges in G. It is very rarely used. The edge-adjacency matrix is defined
as

$$(EA)_{ij} = \begin{cases} 1 & \text{if edges } e_i \text{ and } e_j \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

$$(EA)_{ii} = 0 \qquad (5)$$

For example, the following edge-adjacency matrix can be constructed for a labeled graph G
in Figure 12:

Although both the vertex-adjacency matrix and the edge-adjacency matrix reflect the topology
of a molecule, they differ in their structure. It should be noted that while the
vertex-adjacency matrix uniquely determines the graph, the edge-adjacency matrix does not. In other
words, there are known non-isomorphic graphs with identical edge-adjacency matrices. A
pair of such non-isomorphic graphs is shown in Figure 14. The corresponding edge-adjacency
matrix is given by

Figure 14 A pair of nonisomorphic graphs (G1 and G2) which possess the identical
edge-adjacency matrix.

Graphs G1 and G2 obviously have different vertex-adjacency matrices.
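
This behaviour can be checked with a small sketch. The graphs of the figure are not reproduced in these notes, so the example uses a well-known pair as an assumed stand-in: a triangle and a three-edge star. Two edges are treated as adjacent when they share a vertex; the function names and vertex labels are hypothetical.

```python
def edge_adjacency_matrix(edges):
    """Build the edge-adjacency matrix: entry is 1 when two distinct edges share a vertex."""
    n = len(edges)
    ea = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and set(edges[i]) & set(edges[j]):
                ea[i][j] = 1
    return ea


def vertex_count(edges):
    """Number of distinct vertices appearing in the edge list."""
    return len({v for edge in edges for v in edge})


triangle = [(1, 2), (2, 3), (3, 1)]   # three vertices in a ring
star = [(1, 2), (1, 3), (1, 4)]       # one central vertex with three neighbours

print("triangle:", edge_adjacency_matrix(triangle))
print("star:    ", edge_adjacency_matrix(star))
print("same edge-adjacency matrix:", edge_adjacency_matrix(triangle) == edge_adjacency_matrix(star))
print("vertex counts:", vertex_count(triangle), vertex_count(star))
```

In both graphs every edge touches the other two, so the edge-adjacency matrices coincide even though the graphs have different numbers of vertices and therefore cannot be isomorphic.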

The Distance Matrix


The distance matrix (which is also sometimes called the metrics matrix) is, in a sense, a more
complicated and also a richer structure than the adjacency matrix. It is a graph-theoretical
(topological) matrix less common than the adjacency matrix, but it has been increasingly used
in the last two decades in many different areas of chemistry and physics. Interestingly, the
distance matrix has also found considerable use in areas of research that are relatively remote
from chemistry and physics and to a great extent non-mathematical, such as anthropology,
geography, geology, ornithology, philology, and psychology.

The distance matrix D = D(G) of a labeled connected graph G is a real symmetric N x N
matrix whose elements $(D)_{ij}$ are defined as follows:

$$(D)_{ij} = \begin{cases} e_{ij} & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases} \qquad (6)$$

where $e_{ij}$ is the length of the shortest path (i.e., the minimum number of edges) between
the vertices $v_i$ and $v_j$. This length is also called the distance between the vertices $v_i$ and $v_j$,
hence the term distance matrix. For example, the following distance matrix can be
constructed for a labeled graph G (Figure 15):

Figure 15 A labeled graph G

The distance matrix has found widespread application in chemistry in both explicit and
implicit forms. The first explicit use of the distance matrix was for studying the
permutational isomers of stereochemically nonrigid molecules. The distance matrix in
explicit form is also used to generate the distance polynomial and the distance spectrum.
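
Definition (6) can be turned into a short procedure: starting from each vertex in turn, a breadth-first search over the adjacency matrix reaches every other vertex by a shortest path. The sketch below is one possible way to do this, not the method used in any particular program. Because the labeled graph of Figure 15 is not reproduced in these notes, it uses the hydrogen-suppressed graphs of butane and cyclobutane with an assumed numbering.

```python
from collections import deque


def distance_matrix(adjacency):
    """Compute shortest-path lengths (in edges) between all vertex pairs by breadth-first search."""
    n = len(adjacency)
    dist = [[0] * n for _ in range(n)]
    for source in range(n):
        seen = {source}
        queue = deque([(source, 0)])
        while queue:
            v, d = queue.popleft()
            dist[source][v] = d
            for w in range(n):
                if adjacency[v][w] and w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist


# Vertex-adjacency matrices of the carbon skeletons (assumed numbering 1-2-3-4).
butane = [[0, 1, 0, 0],
          [1, 0, 1, 0],
          [0, 1, 0, 1],
          [0, 0, 1, 0]]
cyclobutane = [[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]]

for name, a in [("butane", butane), ("cyclobutane", cyclobutane)]:
    print(name)
    for row in distance_matrix(a):
        print("  ", row)
```

In butane the two terminal carbons are three edges apart, whereas closing the ring in cyclobutane brings them to a distance of one. The sketch assumes a connected graph, which holds for any molecular graph.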

Topological indices
A single number that can be used to characterize the graph of a molecule is called a
topological index. (The term graph-theoretical index would be more accurate than topological
index, but the latter is more common in the chemical literature.) A topological index, thus,
appears to be a convenient device for converting chemical constitution into a number.
Evidently, this number must have the same value for a given molecule regardless of ways
in which the corresponding graph is drawn or labeled. Such a number is referred to by graph
theorists as a graph invariant. For example, one of the simplest graph invariants
(topological indices) is the number of vertices in the graph (the number of atoms in the
molecule). Hence, it could be simply said that topological indices are graph invariants. It
should also be pointed out that topological indices do not generally allow the reconstruction
of the molecular graph, implying that a certain loss of information has occurred during their
creation.
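
Both points, invariance under relabeling and loss of information, can be illustrated with the simplest invariants mentioned above. The sketch uses the hydrogen-suppressed graphs of butane and isobutane (2-methylpropane) as assumed examples; the vertex numbering, the permutation and the function names are chosen only for illustration.

```python
def vertex_count(edges):
    """Number of vertices (atoms) that appear in the edge list."""
    return len({v for edge in edges for v in edge})


def edge_count(edges):
    """Number of edges (bonds)."""
    return len(edges)


def relabel(edges, mapping):
    """Apply a renumbering of the vertices to every edge."""
    return [(mapping[a], mapping[b]) for a, b in edges]


butane = [(1, 2), (2, 3), (3, 4)]      # unbranched chain
isobutane = [(1, 2), (2, 3), (2, 4)]   # central carbon with three neighbours
relabeled_butane = relabel(butane, {1: 3, 2: 1, 3: 4, 4: 2})

for name, g in [("butane", butane),
                ("butane, relabeled", relabeled_butane),
                ("isobutane", isobutane)]:
    print(f"{name:20s} vertices = {vertex_count(g)}   edges = {edge_count(g)}")
```

Renumbering the atoms of butane changes the edge list but not the two counts, which is what makes them graph invariants. Isobutane gives the same two numbers as butane, so these indices alone cannot tell the isomers apart, let alone reconstruct either graph.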

The interest in topological indices is in the main related to their use in nonempirical quantitative
structure-property relationships (QSPR) and quantitative structure-activity relationships
(QSAR). The latter use in such areas as pharmacology, toxicology, environmental chemistry,
and drug design is intensively studied by many researchers.

Definition of Topological indices


Topological indices were introduced (albeit unknowingly) 150 years ago, and the very fact that
they are still in use today is a demonstration of their durability and versatility. There are more
than 120 topological indices (including information-theoretic indices) available to date in the
literature, with no sign that their proliferation will stop in the near future. This large (and
ever-increasing) number of topological indices indicates that perhaps a clear and
unambiguous criterion for their selection and verification is still missing, although some
attempts along these lines have been reported. Moreover, the large number of topological indices
raises the question of the extent to which they are orthogonal. In other words, is it possible that
some topological indices express predominantly the same type of constitutional information,
with the difference residing only in the scaling factor? Several analyses of alkane
trees with up to 12 vertices indicate that a number of topological indices are strongly
intercorrelated, i.e., that many of them contain to a great extent the same type of structural
information.

Most of the proposed topological indices are related to either a vertex adjacency relationship
(connectivity) in the molecular graph G or to graph-theoretical (topological) distances in G.
Therefore, the origin of topological indices can be traced either to the adjacency matrix of
a molecular graph or to the distance matrix of a molecular graph. Furthermore, since the
distance matrix can be generated from the adjacency matrix, most of the topological indices
are really related to the latter matrix.
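
The statement that the distance matrix can be generated from the adjacency matrix can be illustrated with a short sketch. The function name distance_matrix and the choice of a breadth-first search are assumptions made for this illustration; the notes do not prescribe a particular procedure.

```python
from collections import deque


def distance_matrix(adjacency):
    """Build the topological distance matrix from a 0/1 adjacency matrix.

    A breadth-first search is started from every vertex; the number of
    bonds on the shortest path to each other vertex is recorded.
    Assumes the graph is connected (a single molecule).
    """
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adjacency[u][v] and dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


# Carbon skeleton of pentane, atoms numbered 0..4 along the chain.
pentane = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

for row in distance_matrix(pentane):
    print(row)
```

Only the adjacency matrix is supplied as input, which is why indices based on distances can ultimately be traced back to it.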

One of the ultimate targets of theoretical chemists is to build schemes that would allow accurate
predictions of the bulk properties of matter from the knowledge of molecular structure. We are
still far away from this ideal, but one way of trying to achieve this goal is by means of
topological indices, since they serve as convenient descriptors of molecular structure.

Examples of topological indices


Zagreb Indices
The first and second Zagreb indices (M1 and M2) are another set of classic vertex-based
descriptors developed in 1972 and 1975, respectively. They were called the Zagreb group
indices as their authors were members of the “Rudjer Bošković” Institute in Zagreb, Croatia.

In these indices one counts the connections from each vertex (node, carbon). The first Zagreb
index M1(G) is equal to the sum of squares of the degrees of the vertices, and the second
Zagreb index M2(G) is equal to the sum of the products of the degrees of pairs of adjacent
vertices of the underlying molecular graph G.
For pentane, each would be calculated as:

Figure 16 A labeled graph of pentane

M1 = 1² + 2² + 2² + 2² + 1² = 1 + 4 + 4 + 4 + 1 = 14

M2 = 1×2 + 2×2 + 2×2 + 2×1 = 2 + 4 + 4 + 2 = 12

For 2-methylpentane, each would be calculated as:

Figure 17 A labeled graph of 2-methylpentane

M1 = 1² + 1² + 3² + 2² + 2² + 1² = 1 + 1 + 9 + 4 + 4 + 1 = 20

M2 = 1×3 + 1×3 + 3×2 + 2×2 + 2×1 = 3 + 3 + 6 + 4 + 2 = 18
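
The two hand calculations above can be checked with a small sketch. The function name zagreb_indices and the atom numbering in the edge lists are assumptions made for this illustration; each molecule is given as a list of C-C bonds of its hydrogen-suppressed graph.

```python
def zagreb_indices(edges):
    """Return (M1, M2) for a graph given as a list of (u, v) bonds."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # M1: sum of squared vertex degrees.
    m1 = sum(d * d for d in degree.values())
    # M2: sum over bonds of the product of the two end-vertex degrees.
    m2 = sum(degree[u] * degree[v] for u, v in edges)
    return m1, m2


# Carbon atoms numbered along the main chain; atom 6 is the methyl branch.
pentane = [(1, 2), (2, 3), (3, 4), (4, 5)]
methylpentane = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6)]

print(zagreb_indices(pentane))        # (14, 12)
print(zagreb_indices(methylpentane))  # (20, 18)
```

Adding one methyl branch raises both indices, which is consistent with their use as measures of branching.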

There are thousands of 2D descriptors that are frequently applied in modeling or predicting
properties or biological functions. What is interesting is that each descriptor reduces a
molecular graph to a single value that can be used to make sense of the physical world.
The Zagreb group indices were introduced to characterize branching.

Wiener Index
One of the first mathematical representations of chemical structure used for prediction of
properties was developed in 1947 by Harry Wiener. It is defined as the sum of distances
between all pairs of carbon atoms (pairs of nodes) in the molecule. Mathematically it is
represented as:

W(G) = (1/2) ∑_{u∈V(G)} ∑_{v∈V(G)} d(u,v)

where V(G) is the set of all atoms (vertices) of the molecular graph G, u and v are individual
carbon atoms, and d(u,v) is the number of bonds on the shortest path between u and v. The factor
1/2 corrects for counting each pair twice. Using this index, Wiener showed that the index value
is closely correlated with the boiling point of a series of alkanes. Further work also showed
that it correlated with other physical properties such as density, surface tension and viscosity.

To calculate the Wiener index for a molecule, write down the distance between every pair of
atoms in the structure as a full distance matrix. Take the sum of all distances and divide by two.
For example, in the case of ethane, which has only two nodes:

Figure 18 A labeled graph of ethane

     u  v
u    0  1
v    1  0

W = (1 + 1)/2 = 1

Pentane has 5 nodes (labeled A to E along the chain); the distances between each pair of nodes
are calculated and summed.

     A  B  C  D  E  total
A    0  1  2  3  4   10
B    1  0  1  2  3    7
C    2  1  0  1  2    6
D    3  2  1  0  1    7
E    4  3  2  1  0   10

W = (10 + 7 + 6 + 7 + 10)/2 = 40/2 = 20
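
The procedure described above (sum the whole distance matrix, then halve it) can be sketched as follows. The function name wiener_index and the use of a breadth-first search to obtain the distances are assumptions made for this illustration.

```python
from collections import deque


def wiener_index(edges):
    """Wiener index of a connected graph given as a list of (u, v) bonds."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)

    total = 0
    for start in neighbours:
        # Breadth-first search gives one row of the distance matrix.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    # Every pair was counted twice (u to v and v to u).
    return total // 2


ethane = [("u", "v")]
pentane = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

print(wiener_index(ethane))   # 1
print(wiener_index(pentane))  # 20
```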

The Platt Number


Platt was also interested in devising a scheme for predicting physical parameters (molar
volumes, boiling points, heats of formation, heats of vaporization) of alkanes. He introduced
an index F = F(G), which is equal to the total sum of edge-degrees in a graph G. The edge-
degree of an edge e, D(e), is the number of its adjacent edges. This index was named the Platt
number. The Platt number of a graph G is defined by

F(G) = ∑_{i=1}^{M} D(e_i),

where M is the number of edges in G. The Platt number thus represents the sum over first neighbors.

Figure 19 A labeled graph of 2,4-dimethylpentane. Each edge is labeled with its edge-degree
(2, 2, 3, 3, 2, 2), so F = 14.
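
As a sketch of the definition above, the edge-degree of a bond can be obtained from the degrees of its two end atoms: every other bond at either end is an adjacent edge. The function name platt_number and the atom numbering are assumptions made for this illustration.

```python
def platt_number(edges):
    """Platt number F(G): the sum of edge-degrees over all bonds."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # An edge (u, v) is adjacent to the other edges at u and at v.
    return sum((degree[u] - 1) + (degree[v] - 1) for u, v in edges)


# 2,4-dimethylpentane: main chain 1-2-3-4-5, methyl groups 6 (on C2) and 7 (on C4).
dimethylpentane = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (4, 7)]

print(platt_number(dimethylpentane))  # 14
```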


The Largest Eigenvalue
The characteristic (spectral) polynomial P(G;x) of a graph G is the characteristic polynomial
of its adjacency matrix,

P(G;x) = det |xI - A|

where A and I are, respectively, the adjacency matrix of a graph G with N vertices and the N x
N unit matrix. A graph eigenvalue xi is a zero of the characteristic polynomial:

P(G;xi) = 0
for i = 1, 2, ..., N. The complete set of graph eigenvalues {x1, x2, ..., xN} forms the spectrum
of the graph. The eigenvalues are all real and the interval in which they lie is bounded.
According to the Frobenius theorem, the limits of the graph spectrum are determined by the
maximum valency of a vertex Dmax in a graph: -Dmax ≤ xi ≤ Dmax
The largest eigenvalue, x1, in the graph spectrum may be used as a topological index. For
example, it has been found that x1 can be employed as a measure of branching and that (alkane)
trees can be well ordered according to x1. In Figure 20, as an example, the ordering of alkane
trees with seven vertices is shown. The smallest value of x1 belongs to the C7 chain and the
largest value of x1 to the most branched C7 alkane tree. The largest eigenvalue is not a very
discriminative index, because in many cases the same x1 value belongs to two (or more)
nonisomorphic molecular graphs. One such degenerate pair also appears among the alkane trees
shown in Figure 20.

Figure 20 The ordering of alkane trees with seven vertices according to the increasing value of
x1. This order follows the intuitive notion of branching
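
A rough sketch of how the largest eigenvalue might be estimated without a linear-algebra library is given below. It uses power iteration on the shifted matrix A + I, which is a choice made for this illustration (the notes do not specify a numerical method); the function name largest_eigenvalue and the atom numbering are likewise assumptions.

```python
import math


def largest_eigenvalue(edges, iterations=2000):
    """Estimate the largest adjacency-matrix eigenvalue of a connected graph."""
    vertices = sorted({v for edge in edges for v in edge})
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    neighbours = [[] for _ in range(n)]
    for u, v in edges:
        neighbours[index[u]].append(index[v])
        neighbours[index[v]].append(index[u])

    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # Multiply by (A + I). The shift is needed because the spectrum of a
        # tree is symmetric about zero, so plain power iteration on A oscillates.
        y = [x[i] + sum(x[j] for j in neighbours[i]) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in y))
        x = [c / norm for c in y]

    # Rayleigh quotient x.A.x for the normalised vector x.
    ax = [sum(x[j] for j in neighbours[i]) for i in range(n)]
    return sum(a * b for a, b in zip(ax, x))


# Carbon skeletons with seven vertices.
heptane = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
trimethylbutane = [(1, 2), (2, 3), (3, 4), (2, 5), (2, 6), (3, 7)]  # 2,2,3-trimethylbutane

print(largest_eigenvalue(heptane))
print(largest_eigenvalue(trimethylbutane))
```

For the unbranched chain the result can be compared with the known closed form for a path of N vertices, 2cos(π/(N+1)); the branched isomer gives a larger value that still respects the bound x1 ≤ Dmax = 4.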
QSAR/QSPR concept for in silico prediction of properties
Quantitative structure–property relationships (QSPR) and, when applied to biological activity,
quantitative structure–activity relationships (QSAR) are methods for determining properties that
arise from very sophisticated mechanisms purely by a curve fit of that property to aspects of the
molecular structure. This allows a property to be predicted without complete knowledge of its
origin. For example, drug activity can be predicted without knowing the nature of the binding
site for that drug. Structure–property relationships are qualitative or quantitative empirically
defined relationships between molecular structure and observed properties. In some cases, this may
seem to duplicate statistical mechanical or quantum mechanical results. However, structure–property
relationships need not be based on any rigorous theoretical principles. The simplest
structure–property relationships are qualitative rules of thumb. For example, the statement that
branched polymers are generally more biodegradable than straight-chain polymers is a
qualitative structure–property relationship. When structure–property relationships are
mentioned in the current literature, a quantitative mathematical relationship is usually implied.
Such relationships are most often derived by using curve-fitting software to find the linear
combination of molecular properties that best predicts the property for a set of known
compounds. This prediction equation can be used for either the interpolation or extrapolation
of test set results. Interpolation is usually more accurate than extrapolation. When the property
being described is a physical property, such as the boiling point, this is referred to as a
quantitative structure–property relationship (QSPR). When the property being described is a
type of biological activity, such as drug activity, this is referred to as a quantitative
structure–activity relationship (QSAR). Our discussion will first address QSPR. All the points
covered in the QSPR section are also applicable to QSAR, which is discussed next.

QSPR
The first step in developing a QSPR equation is to compile a list of compounds for which the
experimentally determined property is known. Ideally, this list should be very large. Often,
thousands of compounds are used in a QSPR study. If there are fewer compounds on the list
than parameters to be fitted in the equation, then the curve fit will fail. If the same number
exists for both, then an exact fit will be obtained. This exact fit is misleading because it fits the
equation to all the anomalies in the data; it does not necessarily reflect the correct trends
necessary for a predictive method. In order to ensure that the method will be predictive, there
should ideally be 10 times as many test compounds as fitted parameters. The choice of
compounds is also important. For example, if the equation is only fitted with hydrocarbon data,
it will only be reliable for predicting hydrocarbon properties.

The next step is to obtain geometries for the molecules. Crystal structure geometries
can be used; however, it is better to use theoretically optimized geometries. By using the
theoretical geometries, any systematic errors in the computation will cancel out. Furthermore,
the method will predict as yet unsynthesized compounds using theoretical geometries. Some
of the simpler methods require connectivity only.

Molecular descriptors must then be computed. Any numerical value that describes the molecule
could be used. Many descriptors are obtained from molecular mechanics or semiempirical
calculations. Energies, population analysis, and vibrational frequency analysis with its
associated thermodynamic quantities are often obtained this way. Ab initio results can be used
reliably, but are often avoided due to the large amount of computation necessary. The largest
percentage of descriptors are easily determined values, such as molecular weights, topological
indices, moments of inertia, and so on. Table 30.1 lists some of the descriptors that have been
found to be useful in previous studies. These are discussed in more detail in the review articles
listed in the bibliography.

Once the descriptors have been computed, it is necessary to decide which ones will be used. This
is usually done by computing correlation coefficients. Correlation coefficients are a measure
of how closely two values (descriptor and property) are related to one another by a linear
relationship. If a descriptor has a correlation coefficient of 1, it describes the property exactly.
A correlation coefficient of zero means the descriptor has no relevance. The descriptors with
the largest correlation coefficients are used in the curve fit to create a property prediction
equation. There is no rigorous way to determine how large a correlation coefficient is
acceptable.
Intercorrelation coefficients are then computed. These tell when one descriptor is redundant
with another. Using redundant descriptors increases the amount of fitting work to be done, does
not improve the results, and results in unstable fitting calculations that can fail completely (due
to dividing by zero or some other mathematical error). Usually, the descriptor with the lowest
correlation coefficient is discarded from a pair of redundant descriptors.
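
The descriptor screening described above can be sketched with the Pearson correlation coefficient. The function name pearson_r, the descriptor names, and all numbers below are toy values invented so that the three limiting cases (exact correlation, no correlation, redundant descriptors) are easy to see; they are not measured data.

```python
import math


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy values only (hypothetical property and descriptors for five compounds).
prop = [1.0, 2.0, 3.0, 4.0, 5.0]
descriptor_a = [2.0, 4.0, 6.0, 8.0, 10.0]    # tracks the property exactly
descriptor_b = [4.0, 8.0, 12.0, 16.0, 20.0]  # a rescaled copy of descriptor_a
descriptor_c = [1.0, -1.0, 0.0, -1.0, 1.0]   # unrelated to the property

print(pearson_r(descriptor_a, prop))          # correlation with the property
print(pearson_r(descriptor_c, prop))          # correlation with the property
print(pearson_r(descriptor_a, descriptor_b))  # intercorrelation of two descriptors
```

In this toy set descriptor_c would be dropped for having no linear relation to the property, and only one of descriptor_a and descriptor_b would be kept because they carry the same information.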
A curve fit is then done to create a linear equation, such as

Property = c0 + c1d1 + c2d2 + ···

where ci are the fitted parameters and di the descriptors. Most often, the equation being fitted
is a linear equation like the one above. This is because the use of correlation coefficients and
linear equations together is an easily automated process. Introductory descriptions cite linear
regression as the algorithm for determining coefficients of best fit, but the mathematically
equivalent matrix least-squares method is actually more efficient and easier to implement.
Occasionally, a nonlinear parameter, such as the square root or log of a quantity, is used. This
is done when a researcher is aware of such nonlinear relationships in advance.
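
A minimal sketch of a matrix least-squares fit of Property = c0 + c1d1 + c2d2 is shown below. It forms the normal equations and solves them by Gaussian elimination; the function names solve and fit_linear are assumptions for this illustration, and the data are toy numbers constructed from known coefficients (1, 2, -3) so that the fit can be checked, not experimental values.

```python
def solve(a, b):
    """Solve the linear system a.x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_linear(descriptors, prop):
    """Least-squares coefficients [c0, c1, c2, ...] via the normal equations."""
    rows = [[1.0] + list(d) for d in descriptors]  # leading 1 gives the constant c0
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, prop)) for i in range(k)]
    return solve(xtx, xty)


# Toy data generated from Property = 1 + 2*d1 - 3*d2 (hypothetical values).
descriptors = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
prop = [1.0, 3.0, -2.0, 0.0, 2.0, -3.0]

print(fit_linear(descriptors, prop))
```

If two descriptors were exact multiples of one another, the matrix of the normal equations would be singular and the division by the pivot would fail or become unstable, which is one way the failure caused by redundant descriptors can show up.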

QSAR


QSAR is also called traditional QSAR or Hansch QSAR to distinguish it from the 3D QSAR
method. This is the application of the technique described above to biological activities, such
as environmental toxicology or drug activity. The discussion above is applicable, but a number
of other caveats apply, which are addressed in this section. The following discussion is oriented
toward drug design, although the same points may be applicable to other areas of research as
well.
In order to parameterize a QSAR equation, a quantified activity for a set of compounds must
be known. These are called lead compounds, at least in the pharmaceutical industry. Typically,
test results are available for only a small number of compounds. Because of this, it can be
difficult to choose a number of descriptors that will give useful results without fitting to
anomalies in the test set. Three to five lead compounds per descriptor in the QSAR equation
are normally considered an adequate number. If two descriptors are nearly collinear with one
another, then one should be omitted even though it may have a large correlation coefficient.
In the case of drug design, it may be desirable to use parabolic functions in place of linear
functions. The descriptor for an ideal drug candidate often has an optimum value. Drug activity
will decrease when the value is either larger or smaller than optimum. This functional form is
described by a parabola, not a linear relationship.
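
The parabolic form can be sketched as Activity = c0 + c1d + c2d², which is still linear in the fitted parameters; when c2 is negative the activity peaks at d = -c1/(2c2). The function names det3, fit_parabola, and optimum_descriptor are assumptions for this illustration, and the data are toy values generated from Activity = 1 + 4d - d² rather than real assay results.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(ds, activities):
    """Least-squares [c0, c1, c2] for activity = c0 + c1*d + c2*d**2."""
    rows = [[1.0, d, d * d] for d in ds]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * y for r, y in zip(rows, activities)) for i in range(3)]
    # Solve the 3 x 3 normal equations with Cramer's rule.
    d0 = det3(a)
    coeffs = []
    for k in range(3):
        ak = [row[:] for row in a]
        for i in range(3):
            ak[i][k] = b[i]
        coeffs.append(det3(ak) / d0)
    return coeffs


def optimum_descriptor(c1, c2):
    """Descriptor value at the vertex of the parabola (a maximum when c2 < 0)."""
    return -c1 / (2.0 * c2)


# Toy data (hypothetical): activity = 1 + 4*d - d**2, which peaks at d = 2.
ds = [0.0, 1.0, 2.0, 3.0, 4.0]
activities = [1.0, 4.0, 5.0, 4.0, 1.0]

c0, c1, c2 = fit_parabola(ds, activities)
print(c0, c1, c2)
print(optimum_descriptor(c1, c2))
```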
The advantage of using QSAR over other modeling techniques is that it takes into account the
full complexity of the biological system without requiring any information about the binding
site. The disadvantage is that the method will not distinguish between the contribution of
binding and transport properties in determining drug activity. QSAR is very useful for
determining general criteria for activity, but it does not readily yield detailed structural predictions.

Predicting Molecular Geometry


Computing the geometry of a molecule is one of the most basic functions of a computational
chemistry program. However, it is not a trivial process. Users of the program will be able to
get their work done more quickly if they have some understanding of the various algorithms
within the software. The user must first describe the geometry of the molecule. Then the
program computes the energies and gradients of the energy to find the molecular geometry
corresponding to the lowest energy.

Specifying Molecular Geometry


One way of defining the geometry of a molecule is by using a list of bond distances, angles,
and conformational angles, called a Z-matrix. A Z-matrix is a convenient way to specify the
geometry of a molecule by hand. This is because it corresponds to the way that most chemists
think about molecular structure: in terms of bonds, angles, and so on (we are not discussing
details of constructing a Z-matrix).
Another way to define the geometry of a molecule is as a set of Cartesian coordinates for each
atom. Graphic interface programs often generate Cartesian coordinates since this is the most
convenient way to write those programs. It is becoming more common to use programs that
have a graphical builder in which the user can essentially draw the molecule. There are several
ways in which such programs work. Some programs allow the molecule to be built as a
two-dimensional stick structure and then convert it into a three-dimensional structure. Some
programs have the user draw the three-dimensional backbone and then automatically add the
hydrogens. This works well for organic molecules. Some programs build up the molecule in
three dimensions starting from a list of elements and hybridizations, which can be most
convenient for inorganic molecules. Many programs include a library of commonly used
functional groups, which is convenient if it has the functional groups needed for a particular
project. A number of programs have specialized building modes for certain classes of
molecules, such as proteins, nucleotides, or carbohydrates.

Coordinate space for optimization


Many computational chemistry programs will do the geometry optimization in Cartesian
coordinates. This is often the only way to optimize geometry in molecular mechanics programs
and an optional method in orbital-based programs. A Cartesian coordinate optimization may
be more efficient than a poorly constructed Z-matrix. Cartesian coordinates are often preferable
when simulating more than one molecule since they allow complete freedom of motion
between separate molecules. Geometry optimizations that run poorly either take a large number
of iterations or fail to find an optimized geometry.

Optimization algorithms


There are many different algorithms for finding the set of coordinates corresponding to the
minimum energy. These are called optimization algorithms because they can be used equally
well for finding the minimum or maximum of a function.
If only the energy is known, then the simplest algorithm is one called the simplex algorithm.
This is just a systematic way of trying larger and smaller values for the coordinates and
keeping the changes that result in a lower energy. Simplex optimizations are used very rarely
because they require the most CPU time of any of the algorithms discussed here. A much better
algorithm to be used when only energy is known is the Fletcher–Powell (FP) algorithm. This
algorithm builds up an internal list of gradients by keeping track of the energy changes from
one step to the next. The Fletcher–Powell algorithm is usually the method of choice when
energy gradients cannot be computed.
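
The idea of an energy-only search (try larger and smaller coordinate values, keep whatever lowers the energy) can be sketched as below. This is only a simple trial-step search written for illustration; it is not an implementation of the full simplex method or of the Fletcher–Powell algorithm. The function names and the two-coordinate model energy are hypothetical.

```python
def trial_step_minimize(energy, coords, step=0.1, tol=1e-6):
    """Energy-only minimisation by trial steps along each coordinate."""
    coords = list(coords)
    best = energy(coords)
    evaluations = 1
    while step > tol:
        improved = False
        for i in range(len(coords)):
            for delta in (step, -step):
                trial = coords[:]
                trial[i] += delta
                e = energy(trial)
                evaluations += 1
                if e < best:
                    # Keep the change because it lowered the energy.
                    coords, best, improved = trial, e, True
                    break
        if not improved:
            step /= 2.0  # no trial helped, so search on a finer scale
    return coords, best, evaluations


def model_energy(c):
    """Hypothetical model surface with its minimum at (1.0, -0.5)."""
    x, y = c
    return (x - 1.0) ** 2 + 2.0 * (y + 0.5) ** 2


coords, e_min, n_eval = trial_step_minimize(model_energy, [0.0, 0.0])
print(coords, e_min, n_eval)
```

The count of energy evaluations printed at the end gives a feel for why energy-only searches are considered expensive even on a two-coordinate surface.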
If the energy and the gradients of energy can be computed, there are a number of different
algorithms available. Some of the most efficient algorithms are the quasi-Newton algorithms,
which assume a quadratic potential surface. One of the most efficient quasi-Newton algorithms
is the Berny algorithm, which internally builds up a second derivative Hessian matrix. Steepest
descent and scaled steepest descent algorithms can be used if a quadratic surface is not a
reasonable assumption.
Another good algorithm is the geometric direct inversion of the iterative subspace (GDIIS)
algorithm. Molecular mechanics programs often use the conjugate gradient method, which
finds the minimum by following each coordinate in turn, rather than taking small steps in each
direction. The Polak–Ribiere algorithm is a specific adaptation of the conjugate gradient for
molecular mechanics problems.
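
As a minimal sketch of a gradient-based optimization in Cartesian coordinates, the example below applies steepest descent to a diatomic molecule lying on one axis with a harmonic bond-stretch energy. The force constant k, the reference length r0, the step size, and the function names are all hypothetical values chosen for illustration, in arbitrary units.

```python
def bond_energy_and_gradient(x1, x2, k=1.0, r0=1.0):
    """Harmonic stretch energy E = 0.5*k*(r - r0)**2 for atoms at x1 < x2."""
    r = x2 - x1
    energy = 0.5 * k * (r - r0) ** 2
    de_dr = k * (r - r0)
    # Chain rule: dr/dx1 = -1 and dr/dx2 = +1.
    return energy, (-de_dr, de_dr)


def steepest_descent(x1, x2, step=0.1, gtol=1e-8, max_iter=10000):
    """Move both atoms down the energy gradient until it nearly vanishes."""
    iteration = 0
    for iteration in range(max_iter):
        energy, (g1, g2) = bond_energy_and_gradient(x1, x2)
        if max(abs(g1), abs(g2)) < gtol:
            break
        x1 -= step * g1
        x2 -= step * g2
    return x1, x2, energy, iteration


# Start from a stretched bond of length 1.5 (arbitrary units).
x1, x2, energy, n_iter = steepest_descent(0.0, 1.5)
print(x2 - x1, energy, n_iter)
```

Each atom is moved individually, yet the bond length relaxes to r0 while the midpoint of the molecule stays where it started, because the two gradient components are equal and opposite.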
Algorithms using both the gradients and second derivatives (Hessian matrix) often require
fewer optimization steps but more CPU time due to the time necessary to compute the Hessian
matrix. In some cases, the Hessian is computed numerically from differences of gradients.
These methods are sometimes used when the other algorithms fail to optimize the geometry.
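
Computing a Hessian numerically from differences of gradients can be sketched with central differences, as below. The function names, the displacement h, and the two-coordinate model gradient are assumptions made for this illustration.

```python
def numerical_hessian(gradient, coords, h=1e-4):
    """Approximate the Hessian by central differences of the gradient."""
    n = len(coords)
    hessian = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus = list(coords)
        minus = list(coords)
        plus[j] += h
        minus[j] -= h
        g_plus = gradient(plus)
        g_minus = gradient(minus)
        for i in range(n):
            # Column j holds the change of every gradient component with coordinate j.
            hessian[i][j] = (g_plus[i] - g_minus[i]) / (2.0 * h)
    return hessian


def model_gradient(c):
    """Gradient of the hypothetical surface E = (x-1)^2 + 2(y+0.5)^2 + x*y."""
    x, y = c
    return [2.0 * (x - 1.0) + y, 4.0 * (y + 0.5) + x]


for row in numerical_hessian(model_gradient, [0.3, -0.2]):
    print(row)
```

Each coordinate needs two extra gradient evaluations, so the cost grows with the number of coordinates; this is the extra CPU time referred to above.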

The materials in Unit-II are compiled from different reference sources.


1. Computational Chemistry: A Practical Guide for Applying Techniques to
Real-World Problems, David C. Young
2. Essentials of Computational Chemistry, Christopher J. Cramer
3. Molecular Interactions and the Behaviors of Biological Macromolecules, Loren
Dean Williams
https://williams.chemistry.gatech.edu/structure/molecular_interactions/mol_int.html
4. Chemical Graph Theory, Nenad Trinajstić, Ph.D., Professor of Chemistry, The Rudjer
Bošković Institute, Zagreb, The Republic of Croatia
5. Topological Index calculator:
https://www.cs.gordon.edu/courses/organic/topo/manual-v3/manual.html

Syllabus
Computational chemistry: Scope, cost and efficiency of computational modeling. Stabilizing
interactions: Bonded and non-bonded interactions. Molecular topology, topological matrix
representation, topological indices, QSAR/QSPR concept for in silico prediction of properties.
3D coordinate generation for small molecules, geometry optimization.
