Unit 2 Notes
At one time, computational chemistry techniques were used only by experts extremely
experienced in using tools that were for the most part difficult to understand and apply. Today,
advances in software have produced programs that are easily used by any chemist. Along with
new software comes new literature on the subject. There are now books that describe the
fundamental principles of computational chemistry at almost any level of detail. A number of
books also exist that explain how to apply computational chemistry techniques to simple
calculations appropriate for student assignments. There are, in addition, many detailed research
papers on advanced topics that are intended to be read only by professional theorists.
The group that has the most difficulty finding appropriate literature is working chemists, not
theorists. These are experienced researchers who know chemistry and now have computational
tools available. They want to use computational chemistry to address real-world research
problems, and they are bound to run into significant difficulties. This unit covers a large
number of topics, with an emphasis on when and how to apply computational
techniques rather than on theory. It gives a clear description with just the amount of
technical depth typically necessary to be able to apply the techniques to computational
problems. There are many good books describing the fundamental theory on which
computational chemistry is built. The description of that theory as given here is very minimal.
We have chosen to include just enough theory to explain the terminology used in computing.
With that in mind, one can categorize efficient combinations of theory and experiment into
three classes. In the first category, theory is applied post facto to a situation where some
ambiguity exists in the interpretation of existing experimental results. For example, photolysis
of a compound in an inert matrix may lead to a single product species analyzed by
spectroscopy. However, the identity of this unique product may not be obvious given a number
of plausible alternatives. A calculation of the energies and spectra for all of the postulated
products provides an opportunity for comparison and may prove to be definitive. In the second
category, theory may be employed in a simultaneous fashion to optimize the design and
progress of an experimental program. Continuing the above analogy, a priori calculation of
spectra for plausible products may assist in choosing experimental parameters to permit the
observation of minor components which might otherwise be missed in a complicated mixture
(e.g., theory may allow the experimental instrument to be tuned properly to observe a signal
whose location would not otherwise be predictable).
Finally, theory may be used to predict properties which might be especially difficult or
dangerous (i.e., costly) to measure experimentally. In the difficult category are such data as rate
constants for the reactions of trace, upper-atmospheric constituents that might play an
important role in the ozone cycle. For sufficiently small systems, levels of quantum mechanical
theory can now be brought to bear that have accuracies comparable to the best modern
experimental techniques, and computationally derived rate constants may find use in complex
kinetic models until such time as experimental data are available. As for dangerous
experiments, theoretical pre-screening of a series of toxic or explosive compounds for desirable
(or undesirable) properties may assist in prioritizing the order in which they are prepared,
thereby increasing the probability that an acceptable product will be arrived at in a maximally
efficient manner.
Molecular Interactions:
Molecular interactions are attractive or repulsive forces between molecules and between
non-bonded atoms. Molecular interactions are important in all aspects of chemistry, biochemistry
and biophysics, including protein folding, drug design, pathogen detection, material science,
sensors, gecko feet, nanotechnology, separations, and origins of life. Molecular interactions are
also known as noncovalent interactions, intermolecular interactions, non-bonding interactions,
noncovalent forces and intermolecular forces. All five of these phrases mean the same thing.
Boiling Points. When a molecule transitions from the liquid to the gas phase (as during boiling),
ideally all molecular interactions are disrupted. Ideal gases are the ONLY systems where there
are no molecular interactions. Differences in boiling temperatures give good qualitative
indications of strengths of molecular interactions in the liquid phase. High boiling liquids have
strong molecular interactions. The boiling point of H2O is nearly 300 degrees greater than the
boiling point of N2 because of stronger molecular interactions in H2O (liq) than in N2(liq). The
forces between molecules in H2O (liq) are greater than those in N2(liq).
The repulsive energy goes up as $(d_i/R)^{12}$, where R is the distance between the atoms and $d_i$ is
the distance threshold below which the energy becomes repulsive. $d_i$ depends on the types of
atoms. The large exponent means that when $R < d_i$, small decreases in R cause large
increases in repulsion. Short range repulsion only matters when atoms are in very close
proximity ($R < d_i$), but at close range it dominates other interactions. Because this repulsion
rises so sharply as distance decreases, it is often useful to pretend that atoms are hard spheres,
like very small pool balls, with hard surfaces (called van der Waals surfaces) and well-defined
radii (called van der Waals radii).
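The steepness of the 12th-power term can be seen with a few numbers. The sketch below is only an illustration: the function name, the unit prefactor `scale` and the choice of d_i = 3.4 Å (twice the carbon radius quoted later in these notes) are assumptions, since the notes give the functional form but not its magnitude.

```python
def repulsive_energy(r, d_i, scale=1.0):
    """Relative short-range repulsion, proportional to (d_i / r)**12.

    `scale` is a hypothetical prefactor in arbitrary energy units.
    """
    return scale * (d_i / r) ** 12


d_i = 3.4  # assumed threshold in angstroms (2 x 1.7 for two carbon atoms)
for r in (4.0, 3.4, 3.0, 2.6):
    print(f"R = {r:.1f} A -> relative repulsion = {repulsive_energy(r, d_i):.2f}")
```

Shrinking R from 3.4 Å to 2.6 Å, a change of less than 1 Å, multiplies the repulsion by a factor of about 25, which is why the hard-sphere picture works so well.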
As two atoms approach each other their van der Waals surfaces make contact when the distance
between them equals the sum of their van der Waals radii. At this distance the repulsive energy
skyrockets. The smallest distance between two non-bonded atoms is the sum of the van der
Waals radii of the two atoms. A sulfur atom and a carbon atom can come no closer together
than the sum of their van der Waals radii, approximately 1.8 + 1.7 = 3.5 Å.
Of course, we are assuming here that bonds do not form. When two atoms form a bond, they
come very close together and their van der Waals radii and surfaces are violated.
Figure 1 shows how short range repulsion sets the distance of 3.4 Å between sheets in graphite.
If two non-bonded atoms are separated by a distance of less than the sum of their van der Waals
radii, short range repulsion forces them apart.
Short range repulsion is important to you. It prevents your hands from passing through each
other when you clap, and prevents atoms from collapsing into tightly packed states of enormous
density of $10^{14}$ g/ml, which is the density of condensed atomic nuclei.
Here on Earth, with our modest gravity, the van der Waals radius of carbon ($r_C$) is evident from
the spacing between the layers in graphite. The distance between atoms in different layers of
graphite is never less than twice the van der Waals radius of carbon ($2 \times r_C = 2 \times 1.7 = 3.4$ Å).
The atoms within a graphite layer are covalently linked (bonded), which causes interpenetration
of van der Waals surfaces. Carbon atoms within a layer are separated by 1.42 Å, which is much
less than twice the van der Waals radius of carbon. As explained in other sections of this
document vdw surfaces are also violated when molecules form hydrogen bonds.
Electrostatic interactions
Electrostatic interactions are between and among cations and anions, species with charge of ...
-2, -1, +1, +2, ... Electrostatic interactions can be either attractive or repulsive, depending on the
signs of the charges. Like charges repel. Unlike charges attract. Favorable electrostatic
interactions cause the vapor pressure of sodium chloride and other salts to be very low. If you
leave crystals of table salt (NaCl; Na+=cation, Cl-=anion) on a hot pan, how long does it take
before they vaporize and sublime away? A very very long time; electrostatic interactions are
very very strong. The electrostatic interactions within a sodium chloride crystal are called ionic
bonds. But when a single cation and a single anion are close together, within a protein, or within
a folded RNA, those interactions are considered to be non-covalent electrostatic interactions.
Non-covalent electrostatic interactions can be strong, and act at long range. Electrostatic forces
fall off gradually with distance ($1/r^2$, where r is the distance between the ions).
Figure 2 shows electrostatic interactions in a cross section of a NaCl crystal. Each sodium
cation experiences strong electrostatic interactions with adjacent chloride anions.
Figure 3 shows electrostatic interactions. In RNA (for example in the ribosome), anionic
phosphate oxygens (charge = -1) engage in attractive electrostatic interactions with a
magnesium cation (charge = +2). Two phosphate groups can 'clamp' onto the Mg2+ ion. The O
to Mg2+ distance is 2.1 Å. The dashed lines represent favorable electrostatic interactions.
Electrostatic interactions are the primary stabilizing interaction between phosphate oxygens of
RNA (charge = -1) and magnesium ions (charge = +2), as shown in the figure below. There are
many magnesium ions associated with RNA and DNA in vivo. Electrostatic interactions are
highly attenuated (dampened) by water. In protein folding, RNA folding and DNA annealing,
electrostatic interactions are dependent on salt concentration and pH.
The dielectric micro-environments are complex and variable, with less shielding of
charges in regions of hydrocarbon sidechains and greater shielding in regions of polar
sidechains. The electrostatic energy is given by:
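$$\Delta E = \frac{k \, a \, q_1 q_2}{\varepsilon \, r}$$
where k is the Coulomb constant, a is Avogadro's number, $q_1$ and $q_2$ are the two charges,
$\varepsilon$ is the dielectric constant of the medium and r is the distance between the charges.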
One can crudely estimate the energetics of a charge-charge interaction in a protein. The energy
of an amine (charge +1) and a carboxylic acid (charge -1) separated by 4 Å in the interior of
protein is given by:
This rough approximation is around 10-fold greater than the values determined experimentally.
An ion pair contributes a favourable ΔG of 1 to 4 kcal/mole (4.2 to 16.7 kjoule/mole) to the
stability of a native protein.
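The crude estimate described above can be sketched numerically. The dielectric constant of 4 used here for a protein interior is an assumption (the notes do not state the value they used), the physical constants are rounded, and the function name is hypothetical.

```python
COULOMB_K = 8.9875e9   # Coulomb constant k, in J*m/C^2
AVOGADRO = 6.022e23    # a, in 1/mol
E_CHARGE = 1.602e-19   # elementary charge, in C
J_PER_KCAL = 4184.0


def coulomb_energy_kcal_per_mol(z1, z2, r_angstrom, epsilon):
    """Delta E = k a q1 q2 / (epsilon r), converted to kcal/mol."""
    q1 = z1 * E_CHARGE
    q2 = z2 * E_CHARGE
    r_metres = r_angstrom * 1e-10
    joules_per_mol = COULOMB_K * AVOGADRO * q1 * q2 / (epsilon * r_metres)
    return joules_per_mol / J_PER_KCAL


# Amine (+1) and carboxylate (-1) separated by 4 angstroms.
# epsilon = 4.0 is an assumed value for a protein interior.
energy = coulomb_energy_kcal_per_mol(+1, -1, 4.0, epsilon=4.0)
print(f"{energy:.1f} kcal/mol")
```

With these assumptions the result is roughly -21 kcal/mole, several times larger in magnitude than the 1 to 4 kcal/mole measured for ion pairs, which illustrates how strongly the answer depends on the dielectric constant chosen.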
Figure 4 shows an ion pair within a folded protein. An anionic aspartic acid (charge = -1)
engages in attractive electrostatic interactions with cationic arginine (charge = +1). The dashed
lines represent hydrogen bonds.
Hydrogen Bonding
The idea that a single hydrogen atom could interact simultaneously with two other atoms was
proposed in 1920 by Latimer and Rodebush and their advisor, G. N. Lewis. Maurice Huggins,
who was also a student in Lewis' lab, described the hydrogen bond in his 1919 dissertation.
A hydrogen bond is a favourable interaction between an atom with a basic lone pair of electrons
(a Lewis Base) and a hydrogen atom that has been partially stripped of its electrons because it
is covalently bound to an electronegative atom (N, O, or S). In a hydrogen bond, the Lewis
Base is the hydrogen bond acceptor (A) and the partially exposed proton is bound to the
hydrogen bond donor (H-D).
Why hydrogen? Hydrogen is special because it is the only atom that (i) forms covalent sigma
bonds with electronegative atoms like N, O and S, and (ii) uses the inner shell (1s) electron(s)
in that covalent bond. When its electronegative bonding partner pulls the bonding electrons
away from hydrogen, the hydrogen nucleus (a proton) is exposed on the back side (distal from
the bonding partner). The unshielded face of the proton is exposed, attracting the partial
negative charge of an electron lone pair. Hydrogen is the only atom that exposes its nucleus
this way. Other atoms have inner shell non-bonding electrons that shield the nucleus.
Figure 5 illustrates the elements of a hydrogen bond, including the HB acceptor and HB donor,
the lone pair and the exposed proton. N, O, S are the predominant hydrogen bonding atoms (A
& D) in biological systems.
A hydrogen bond is not an acid-base reaction, where the proton (H+) is fully transferred from
H-D to A to form D- and HA+. However, the strength of a hydrogen bond correlates well with
the acidity of donor H-D and the basicity of acceptor A. In a hydrogen bond, the H+ is partially
transferred from H-D to A, but H+ remains covalently attached to D. The H-D bond remains
intact.
Figure 6 illustrates three different styles for representing a hydrogen bond. Atom A is the Lewis
base (for example the N in NH3 or the O in H2O) and the atom D is electronegative (for example
O, N or S). The conventional nomenclature is confusing: a hydrogen bond is not a covalent
bond.
Figure 7 shows the most common hydrogen bond acceptors and donors in biological
macromolecules.
The most common hydrogen bonds in biological systems involve oxygen and nitrogen atoms as
A and D. Keto groups (=O), amines (R3N), imines (R=N-R) and hydroxyl groups (-OH) are
the most common hydrogen bond acceptors in DNA, RNA, proteins and complex
carbohydrates. Hydroxyl groups and amines/imines are the most common hydrogen bond
donors. Hydroxyls and amines/imines can both donate and accept hydrogen bonds.
In traversing the Periodic Table, increasing the electronegativity of atom D strips electron density
from the proton (in H-D), increasing its partial positive charge, and increasing the strength of
any hydrogen bond. Thiols (-SH) can both donate and accept hydrogen bonds but these are
generally weak, because sulfur is not sufficiently electronegative. Hydrogen bonds involving
carbon, where H-D equals H-C, are observed, although these are weak and infrequent. C is
insufficiently electronegative to form good hydrogen bonds. Hydrogen bonds are essentially
electrostatic in nature, although the energy can be decomposed into additional contributions
from polarization, exchange repulsion, charge transfer, and mixing.
Hydrogen bond strengths form a continuum. Strong hydrogen bonds of 20-40 kcal/mole (84 to
167 kjoule/mole), generally formed between charged donors and acceptors, are nearly as strong
as covalent bonds. Weak hydrogen bonds of 1-5 kcal/mole (4 - 21 kjoule/mole), sometimes
formed with carbon as the proton donor, are no stronger than conventional dipole-dipole
interactions. Moderate hydrogen bonds, which are the most common, are formed between
neutral donors and acceptors and range from 3 - 12 kcal/mole (13 - 50 kjoule/mole).
Figure 8 shows cooperativity of the hydrogen bonds of an acetic acid dimer (top) and of a G-
C base pair (bottom). One hydrogen bond increases the stability of the adjacent hydrogen bond
(and vice versa).
Figure 9 shows cooperativity via resonance of the hydrogen bonds of an anti-parallel β-sheet.
Molecular Topology:
One property of molecules appears to be very close to a binary relation: that is two atoms in a
given molecule are either bonded or not bonded. Therefore, molecules can be represented by
graphs when the only property considered is the existence or not of a chemical bond. This
property is called molecular topology. We define molecular topology as the totality of
information contained in the molecular graph. In chemistry graphs can represent different
chemical objects: molecules, reactions, crystals, polymers, clusters, etc. The common feature
of chemical systems is the presence of sites and connections between them. Sites may be atoms,
electrons, molecules, molecular fragments, groups of atoms, intermediates, orbitals, etc. The
connections between sites may represent bonds of any kind, bonded and nonbonded
interactions, elementary reaction steps, rearrangements, van der Waals forces, etc. Chemical
systems may be depicted by chemical graphs using a simple conversion rule:
Site ↔ vertex
Connection ↔ edge
A special class of chemical graphs are molecular graphs. Molecular graphs are chemical
graphs which represent the constitution of molecules. They are also called constitutional
graphs. In these graphs vertices correspond to individual atoms and edges to chemical bonds
between them. Molecular graphs are necessarily connected graphs. As examples the molecular
graphs corresponding to propane and cyclopropane are shown in Figure 10.
Figure 10 The molecular graphs corresponding to propane and cyclopropane
In order to simplify the handling of molecular graphs, hydrogen-suppressed graphs, i.e., graphs
depicting only molecular skeletons without hydrogen atoms and their bonds, are often used.
They are also called skeleton graphs. The hydrogen-suppressed graphs are almost universally
used in chemical graph theory, because the neglect of the hydrogen atoms and their bonds in
most cases cannot be the cause of any ambiguity. The hydrogen-suppressed graphs
corresponding to butane and cyclobutane are given in Figure 11.
Figure 11 The hydrogen-suppressed molecular graphs depicting butane and cyclobutane
The molecular graph grossly simplifies the complex picture of a molecule by depicting only
its constitution (i.e., the chemical bonds between the various pairs of atoms in the molecule)
and neglecting other structural features (e.g., geometry, stereochemistry, chirality). Even so,
a simple picture of a molecule as the molecular graph can enable one to make useful
predictions about physical and chemical properties of molecules. Since the predictions of
properties and reactivities of molecules are of prime interest to chemists, the development of
chemical graph theory is, thus, justified.
Molecular graphs depicting constitutional formulae of molecules represent their topology. This
is a chemist’s view of molecular topology. However, a more precise definition of molecular
topology may also be given using the concept of the molecular graph. A topological space is
formed by a set and the topological structure defined upon the set. A simple connected
(molecular) graph can be associated with a topological space if it can be shown that a
topological structure is defined upon its vertex-set.
The most important matrix representation of a graph G is the vertex-adjacency matrix A =
A(G). This matrix is also of importance in chemistry and physics.
The vertex-adjacency matrix A(G) of a labeled connected graph G with N vertices is the
square N x N symmetric matrix which contains information about the internal connectivity of
vertices in G. It is defined as,
$$A_{ij} = \begin{cases} 1, & \text{if and only if } (i, j) \in E(G) \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$
$$A_{ii} = 0 \qquad (2)$$
Therefore, a nonzero entry appears in A(G) only if an edge connects vertices i and j.
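Definitions (1) and (2) translate directly into a few lines of code. This is a minimal sketch for the hydrogen-suppressed graphs of propane and cyclopropane; the function name and the 0-based vertex labelling are illustrative choices, not part of the notes.

```python
def adjacency_matrix(n_vertices, edges):
    """Build the N x N vertex-adjacency matrix from a list of (i, j) edges."""
    A = [[0] * n_vertices for _ in range(n_vertices)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # undirected graph, so the matrix is symmetric
    return A


# Hydrogen-suppressed graphs; carbons are labelled 0, 1, 2 (arbitrary labelling)
propane = adjacency_matrix(3, [(0, 1), (1, 2)])
cyclopropane = adjacency_matrix(3, [(0, 1), (1, 2), (2, 0)])

for name, matrix in (("propane", propane), ("cyclopropane", cyclopropane)):
    print(name)
    for row in matrix:
        print("  ", row)
```

The diagonal stays zero because no edge joins a vertex to itself, and the single extra edge that closes the ring is the only difference between the two matrices.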
For example, the following vertex-adjacency matrix can be constructed for a labeled
graph G (Figure 12).
For example, the following edge-adjacency matrix can be constructed for a labeled graph G
in Figure 12:
Although both the vertex-adjacency matrix and the edge-adjacency matrix reflect the topology
of a molecule, they differ in their structure. It should be noted that while the vertex-adjacency
matrix uniquely determines the graph, the edge-adjacency matrix does not. In other
words, there are known non-isomorphic graphs with identical edge-adjacency matrices. A
pair of such non-isomorphic graphs is shown in Figure 14. The corresponding edge-adjacency
matrix is given by
Figure 14 A pair of nonisomorphic graphs (G1 and G2) which possess the identical
edge-adjacency matrix.
The distance matrix D = D(G) of a labeled connected graph G is defined as
$$(\mathbf{D})_{ij} = \begin{cases} e_{ij}, & \text{if } i \neq j \\ 0, & \text{if } i = j \end{cases} \qquad (6)$$
where $e_{ij}$ is the length of the shortest path (i.e., the minimum number of edges) between
the vertices $v_i$ and $v_j$. The length is also called the distance between the vertices $v_i$ and $v_j$,
hence the term distance matrix. For example, the following distance matrix can be
constructed for a labeled graph G (Figure 15):
The distance matrix has found widespread application in chemistry in both explicit and
implicit forms. The first explicit use of the distance matrix was for studying the
permutational isomers of stereochemically nonrigid molecules. The distance matrix in
explicit form is also used to generate the distance polynomial and the distance spectrum.
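Because every edge has length one, the distance matrix can be generated from the adjacency matrix with a breadth-first search started at each vertex. The sketch below assumes a connected graph, as in the definition; the function name and the butane example are illustrative.

```python
from collections import deque


def distance_matrix(A):
    """Shortest-path (edge-count) distances computed from an adjacency matrix."""
    n = len(A)
    D = [[0] * n for _ in range(n)]
    for source in range(n):
        seen = {source}
        queue = deque([(source, 0)])
        while queue:
            v, dist = queue.popleft()
            D[source][v] = dist
            for w in range(n):
                if A[v][w] and w not in seen:
                    seen.add(w)
                    queue.append((w, dist + 1))
    return D


# Hydrogen-suppressed butane: a chain of four carbons labelled 0..3
butane = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
for row in distance_matrix(butane):
    print(row)
```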
Topological indices
A single number that can be used to characterize the graph of a molecule is called a
topological index. (The term graph-theoretical index would be more accurate than topological
index, but the latter is more common in the chemical literature.) A topological index, thus,
appears to be a convenient device for converting chemical constitution into a number.
Evidently, this number must have the same value for a given molecule regardless of ways
in which the corresponding graph is drawn or labeled. Such a number is referred to by graph
theorists as a graph invariant. For example, one of the simplest graph invariants
(topological indices) is the number of vertices in the graph (the number of atoms in the
molecule). Hence, it could be simply said that topological indices are graph invariants. It
should also be pointed out that topological indices do not generally allow the reconstruction
of the molecular graph, implying that a certain loss of information has occurred during their
creation.
The interest in topological indices is in the main related to their use in nonempirical quantitative
structure-property relationships (QSPR) and quantitative structure-activity relationships
(QSAR). The latter use in such areas as pharmacology, toxicology, environmental chemistry,
and drug design is intensively studied by many researchers.
Most of the proposed topological indices are related to either a vertex adjacency relationship
(connectivity) in the molecular graph G or to graph-theoretical (topological) distances in G.
Therefore, the origin of topological indices can be traced either to the adjacency matrix of
a molecular graph or to the distance matrix of a molecular graph. Furthermore, since the
distance matrix can be generated from the adjacency matrix, most of the topological indices
are really related to the latter matrix.
One of the ultimate targets of theoretical chemists is to build schemes that would allow accurate
predictions of the bulk properties of matter from the knowledge of molecular structure. We are
still far away from this ideal, but one way of trying to achieve this goal is by means of
topological indices, since they serve as convenient descriptors of molecular structure.
In the Zagreb indices one counts the connections from each vertex (node, carbon). The first Zagreb
index $M_1(G)$ is equal to the sum of squares of the degrees of the vertices, and the second
Zagreb index $M_2(G)$ is equal to the sum of the products of the degrees of pairs of adjacent
vertices of the underlying molecular graph G.
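For pentane and for a branched six-carbon alkane with vertex degrees 1, 1, 3, 2, 2, 1 (for
example 2-methylpentane), $M_1$ would be calculated as:
$M_1 = 1^2 + 2^2 + 2^2 + 2^2 + 1^2 = 1 + 4 + 4 + 4 + 1 = 14$
$M_1 = 1^2 + 1^2 + 3^2 + 2^2 + 2^2 + 1^2 = 1 + 1 + 9 + 4 + 4 + 1 = 20$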
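Both Zagreb indices follow from the vertex degrees alone, so they are easy to compute from an edge list. In this sketch the function name and the vertex numbering are illustrative, and 2-methylpentane is used as one molecule that has the degree sequence of the second calculation.

```python
def zagreb_indices(n_vertices, edges):
    """Return (M1, M2) for a hydrogen-suppressed molecular graph."""
    degree = [0] * n_vertices
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    m1 = sum(d * d for d in degree)                    # sum of squared degrees
    m2 = sum(degree[i] * degree[j] for i, j in edges)  # sum over adjacent pairs
    return m1, m2


pentane = [(0, 1), (1, 2), (2, 3), (3, 4)]
# 2-methylpentane: chain 0-1-2-3-4 with a methyl group (vertex 5) on carbon 1
methylpentane = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5)]

print("pentane:", zagreb_indices(5, pentane))
print("2-methylpentane:", zagreb_indices(6, methylpentane))
```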
There are thousands of 2D descriptors that are frequently applied in modeling or predicting
properties or biological functions. What is interesting is that a molecular graph can often be
reduced to a single-valued descriptor that still says something meaningful about the physical world.
The Zagreb group indices were introduced to characterize branching.
Wiener Index
One of the first mathematical representations of chemical structure used for prediction of
properties was developed in 1947 by Harry Wiener. It is defined as the sum of distances
between all pairs of carbon atoms (pairs of nodes) in the molecule. Mathematically it is
represented as:
$$W(G) = \frac{1}{2} \sum_{u \in G} \sum_{v \in G} d(u, v)$$
where the sums run over all carbon atoms u and v of the molecular graph G, and d(u,v) is the
number of bonds on the shortest path between u and v. Using this index, Wiener showed that
the index value is closely correlated with
the boiling point of a series of alkanes. Further work also showed that it correlated with other
physical properties such as density, surface tension and viscosity.
To calculate the Wiener index for a molecule, for each pair of atoms in the structure, count the
distance between atoms. Take the sum of all distances and divide by two. For example in the
case of ethane, which only has two nodes:

        u   v
    u   0   1
    v   1   0

The entries sum to 2, so W = 2/2 = 1.
Pentane has 5 nodes, and distances between each node are calculated and summed.
(Figure: pentane skeleton with the carbons labelled A to E along the chain.)

        A   B   C   D   E   total
    A   0   1   2   3   4    10
    B   1   0   1   2   3     7
    C   2   1   0   1   2     6
    D   3   2   1   0   1     7
    E   4   3   2   1   0    10

The row totals sum to 40, so W = 40/2 = 20.
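The same bookkeeping can be automated. This sketch computes all shortest-path distances by breadth-first search, sums them over ordered pairs and halves the total, as in the tables above. The function name and the extra 2-methylbutane example are illustrative additions.

```python
from collections import deque


def wiener_index(n_vertices, edges):
    """Half the sum of all shortest-path distances in a connected graph."""
    neighbours = [[] for _ in range(n_vertices)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    total = 0
    for source in range(n_vertices):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in neighbours[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())  # one row total of the distance table
    return total // 2  # every pair was counted twice


w_ethane = wiener_index(2, [(0, 1)])
w_pentane = wiener_index(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
w_methylbutane = wiener_index(5, [(0, 1), (1, 2), (2, 3), (1, 4)])
print(w_ethane, w_pentane, w_methylbutane)
```

The branched isomer 2-methylbutane gives a smaller value than pentane, because branching shortens the average distance between carbons.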
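The Platt number F(G) is the sum of the degrees of all edges in the graph:
$$F(G) = \sum_{i=1}^{M} D(e_i)$$
where M is the number of edges and $D(e_i)$ is the degree of edge $e_i$, i.e., the number of
edges adjacent to it. The Platt number thus represents the first neighbors sum.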
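As a small worked sketch of the Platt number, the code below takes the degree of an edge to be the number of other edges sharing a vertex with it, which for an edge (u, v) equals deg(u) + deg(v) - 2. That is the usual definition and is assumed here; the function name and example molecules are illustrative.

```python
def platt_number(n_vertices, edges):
    """Sum of edge degrees, with D(e) = deg(u) + deg(v) - 2 for e = (u, v)."""
    degree = [0] * n_vertices
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return sum(degree[i] + degree[j] - 2 for i, j in edges)


pentane = [(0, 1), (1, 2), (2, 3), (3, 4)]
methylbutane = [(0, 1), (1, 2), (2, 3), (1, 4)]  # 2-methylbutane
print(platt_number(5, pentane), platt_number(5, methylbutane))
```

For pentane the edge degrees along the chain are 1, 2, 2, 1, giving F = 6, while the branched isomer gives F = 8.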
The characteristic polynomial of a graph G is defined as
$$P(G; x) = \det(x\mathbf{I} - \mathbf{A})$$
where A and I are, respectively, the adjacency matrix of a graph G with N vertices and the N x
N unit matrix. A graph eigenvalue $x_i$ is a zero of the characteristic polynomial,
$$P(G; x_i) = 0$$
for i = 1, 2, ..., N. The complete set of graph eigenvalues $\{x_1, x_2, \ldots, x_N\}$ forms the spectrum
of the graph. The eigenvalues are all real and the interval in which they lie is bounded.
According to the Frobenius theorem, the limits of the graph spectrum are determined by the
maximum valency of a vertex $D_{max}$ in a graph: $-D_{max} \le x_i \le D_{max}$
The largest eigenvalue, $x_1$, in the graph spectrum may be used as a topological index. For
example, it has been found that $x_1$ can be employed as a measure of branching and that (alkane)
trees can be well ordered according to $x_1$. In Figure 20, as an example, the ordering of alkane
trees with seven vertices is shown. The smallest value of $x_1$ belongs to the C7 chain and the largest
value of $x_1$ to the most branched C7 alkane tree. The largest eigenvalue is not a very
discriminative index, because in many cases the same $x_1$ value belongs to two (or more)
nonisomorphic molecular graphs. One such degenerate pair also appears among the alkane trees
shown in Figure 20.
Figure 20 The ordering of alkane trees with seven vertices according to the increasing value of
$x_1$. This order follows the intuitive notion of branching.
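The largest eigenvalue can be estimated without a linear algebra library by power iteration. The sketch below is one possible approach, not the method used to produce Figure 20: it iterates with A shifted by D_max times the unit matrix (so that, by the bound above, no eigenvalue is negative) and then evaluates the Rayleigh quotient for A itself. Function names, the iteration count and the two C7 examples are assumptions made for illustration.

```python
import math


def largest_eigenvalue(A, iterations=2000):
    """Estimate the largest adjacency eigenvalue by shifted power iteration."""
    n = len(A)
    shift = max(sum(row) for row in A)  # D_max, the maximum vertex degree
    x = [1.0] * n
    for _ in range(iterations):
        # Multiply by (A + shift * I), then renormalise
        y = [sum(A[i][j] * x[j] for j in range(n)) + shift * x[i] for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * Ax[i] for i in range(n))  # Rayleigh quotient (x has unit length)


def adjacency(n, edges):
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    return A


# Heptane chain and 2,2,3-trimethylbutane, both hydrogen-suppressed C7 trees
heptane = adjacency(7, [(i, i + 1) for i in range(6)])
trimethylbutane = adjacency(7, [(0, 1), (1, 2), (2, 3), (1, 4), (1, 5), (2, 6)])

chain = largest_eigenvalue(heptane)
branched = largest_eigenvalue(trimethylbutane)
print(f"chain: {chain:.4f}   branched: {branched:.4f}")
```

The chain gives the smaller value and the highly branched tree the larger one, in line with the ordering described above, and both stay below the respective D_max.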
QSAR/QSPR concept for in silico prediction of properties
Explain the difference between QSAR and QSPR, which are used to predict the properties of molecules.
Quantitative structure-property relationships (QSPR) and, when applied to biological activity,
quantitative structure-activity relationships (QSAR) are methods for determining properties due
to very sophisticated mechanisms purely by a curve fit of that property to aspects of the molecular
structure. This allows a property to be predicted independent of having a complete knowledge of its
origin. For example, drug activity can be predicted without knowing the nature of the binding
site for that drug. Structure–property relationships are qualitative or quantitative empirically
defined relationships between molecular structure and observed properties. In some cases, this may
seem to duplicate statistical mechanical or quantum mechanical results. However, structure-property
relationships need not be based on any rigorous theoretical principles. The simplest case of
a structure-property relationship is a qualitative rule of thumb. For example, the statement that
branched polymers are generally more biodegradable than straight-chain polymers is a
qualitative structure–property relationship. When structure-property relationships are
mentioned in the current literature, it usually implies a quantitative mathematical relationship.
Such relationships are most often derived by using curve-fitting software to find the linear
combination of molecular properties that best predicts the property for a set of known
compounds. This prediction equation can be used for either the interpolation or extrapolation
of test set results. Interpolation is usually more accurate than extrapolation. When the property
being described is a physical property, such as the boiling point, this is referred to as a
quantitative structure–property relationship (QSPR). When the property being described is a
type of biological activity, such as drug activity, this is referred to as a quantitative structure–
activity relationship (QSAR). Our discussion will first address QSPR. All the points covered
in the QSPR section are also applicable to QSAR, which is discussed next.
QSPR
Step 1. The first step in developing a QSPR equation is to compile a list of compounds for which the
experimentally determined property is known. Ideally, this list should be very large. Often,
thousands of compounds are used in a QSPR study. If there are fewer compounds on the list
than parameters to be fitted in the equation, then the curve fit will fail. If the same number
exists for both, then an exact fit will be obtained. This exact fit is misleading because it fits the
equation to all the anomalies in the data; it does not necessarily reflect all the correct trends
necessary for a predictive method. In order to ensure that the method will be predictive, there
should ideally be 10 times as many test compounds as fitted parameters. The choice of
compounds is also important. For example, if the equation is only fitted with hydrocarbon data,
it will only be reliable for predicting hydrocarbon properties.
Step 2. The next step is to obtain geometries for the molecules. Crystal structure geometries
can be used; however, it is better to use theoretically optimized geometries. By using the
theoretical geometries, any systematic errors in the computation will cancel out. Furthermore,
the method will predict as yet unsynthesized compounds using theoretical geometries. Some
of the simpler methods require connectivity only.
Step 3. Molecular descriptors must then be computed. Any numerical value that describes the molecule
could be used. Many descriptors are obtained from molecular mechanics or semiempirical
calculations. Energies, population analysis, and vibrational frequency analysis with its
associated thermodynamic quantities are often obtained this way. Ab initio results can be used
reliably, but are often avoided due to the large amount of computation necessary. The largest
percentage of descriptors are easily determined values, such as molecular weights, topological
indexes, moments of inertia, and so on. Table 30.1 lists some of the descriptors that have been
found to be useful in previous studies. These are discussed in more detail in the review articles
listed in the bibliography.
Step 4. Once the descriptors have been computed, it is necessary to decide which ones will be used. This
is usually done by computing correlation coefficients. Correlation coefficients are a measure
of how closely two values (descriptor and property) are related to one another by a linear
relationship. If a descriptor has a correlation coefficient of 1, it describes the property exactly.
A correlation coefficient of zero means the descriptor has no relevance. The descriptors with
the largest correlation coefficients are used in the curve fit to create a property prediction
equation. There is no rigorous way to determine how large a correlation coefficient is
acceptable.
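A correlation coefficient of the kind used for descriptor selection can be sketched as follows. The example is an illustration rather than a real QSPR study: the descriptors are the carbon count and the Wiener index of the straight-chain alkanes butane to octane, and the boiling points are approximate literature values in °C. With only five compounds this is far too small a data set for a predictive model.

```python
import math


def pearson_r(xs, ys):
    """Linear (Pearson) correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# n-butane ... n-octane (approximate boiling points, deg C)
carbons = [4, 5, 6, 7, 8]
wiener = [10, 20, 35, 56, 84]
boiling_point = [-0.5, 36.1, 68.7, 98.4, 125.7]

r_property = pearson_r(carbons, boiling_point)  # descriptor versus property
r_descriptors = pearson_r(carbons, wiener)      # descriptor versus descriptor
print(f"carbons vs boiling point: {r_property:.3f}")
print(f"carbons vs Wiener index:  {r_descriptors:.3f}")
```

For this series the two descriptors are themselves strongly correlated with each other, which is the kind of redundancy the intercorrelation check is meant to detect.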
Intercorrelation coefficients are then computed. These tell when one descriptor is redundant
with another. Using redundant descriptors increases the amount of fitting work to be done, does
not improve the results, and results in unstable fitting calculations that can fail completely (due
to dividing by zero or some other mathematical error). Usually, the descriptor with the lowest
correlation coefficient is discarded from a pair of redundant descriptors.
A curve fit is then done to create a linear equation, such as
$$\text{Property} = c_0 + c_1 d_1 + c_2 d_2 + \cdots$$
where $c_i$ are the fitted parameters and $d_i$ the descriptors. Most often, the equation being fitted
is a linear equation like the one above. This is because the use of correlation coefficients and
linear equations together is an easily automated process. Introductory descriptions cite linear
regression as the algorithm for determining coefficients of best fit, but the mathematically
equivalent matrix least-squares method is actually more efficient and easier to implement.
Occasionally, a nonlinear parameter, such as the square root or log of a quantity, is used. This
is done when a researcher is aware of such nonlinear relationships in advance.
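The matrix least-squares fit can be sketched in plain Python by forming the normal equations and solving them by Gaussian elimination. This is a minimal illustration under stated assumptions: the function name is hypothetical, the data are approximate boiling points (°C) of butane to octane with the carbon count as the only descriptor, and a real study would use many more compounds and a numerically more robust solver.

```python
def fit_linear(descriptor_rows, values):
    """Least-squares fit of value = c0 + c1*d1 + c2*d2 + ... (normal equations)."""
    X = [[1.0] + [float(d) for d in row] for row in descriptor_rows]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    M = [[sum(r[a] * r[b] for r in X) for b in range(p)]
         + [sum(r[a] * y for r, y in zip(X, values))] for a in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, p):
            factor = M[row][col] / M[col][col]
            for k in range(col, p + 1):
                M[row][k] -= factor * M[col][k]
    # Back substitution
    coeffs = [0.0] * p
    for row in range(p - 1, -1, -1):
        tail = sum(M[row][k] * coeffs[k] for k in range(row + 1, p))
        coeffs[row] = (M[row][p] - tail) / M[row][row]
    return coeffs


carbons = [(4,), (5,), (6,), (7,), (8,)]          # one descriptor per compound
boiling_point = [-0.5, 36.1, 68.7, 98.4, 125.7]   # approximate values, deg C
c0, c1 = fit_linear(carbons, boiling_point)
print(f"bp = {c0:.2f} + {c1:.2f} * n_carbons")
print(f"extrapolated to nonane: {c0 + c1 * 9:.1f} deg C")
```

The extrapolated value for nonane comes out near 160 °C, noticeably above the experimental boiling point of about 151 °C, which illustrates why interpolation is usually more accurate than extrapolation.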
Syllabus
Computational chemistry: Scope, cost and efficiency of computational modeling. Stabilizing
interactions: Bonded and non-bonded interactions. Molecular topology, topological matrix
representation, topological indices, QSAR/QSPR concept for in silico prediction of properties.
3D co-ordinate generation for small molecules, geometry optimization.