Conformer Generation, QSAR & 3D-QSAR
Conformer Generation:
• Systematic search (or grid search): It generates all probable conformations by systematically
varying each of the torsion angles of a molecule by some increment, keeping the bond lengths
and bond angles fixed.
• Monte Carlo: It samples the conformational behavior of a compound by making random changes in
its structure, calculating the energy of each new conformation and comparing it with that of the
previous conformation, and accepting the new conformation if it is unique.
• Random search: It generates a set of conformations by repetitively and arbitrarily changing
either the Cartesian (x, y, z) or the internal (bond lengths, bond angles, and torsion/dihedral
angles) coordinates of a starting geometry of the molecule under consideration.
• Molecular dynamics: It employs Newton’s second law of motion (force = mass × acceleration) to
simulate the time-dependent movements and conformational changes in a molecular system, and
results in a so-called trajectory showing how the positions and velocities of atoms in the
molecular system vary with time.
• Simulated annealing: It computationally heats the molecular system under consideration to a high
temperature so that it can cross large energy barriers, equilibrates it there for some time using
molecular dynamics, and then cools the system slowly and gradually to obtain low-energy
conformations according to the Boltzmann distribution.
• Distance geometry algorithm: It selects a random distance for each pair of atoms from within
that pair's lower and upper bounds, assembles these distances into a distance matrix, and converts
the matrix into a set of coordinates to create energetically feasible conformations of the molecules.
• Genetic and evolutionary algorithms: These are based on the concept of biological evolution and
initially create a population of promising solutions to the problem. The solutions with the best
fitness scores undergo crossovers and mutations over time and pass their favorable
traits down the generations, resulting in better solutions in the form of new conformers.
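The first two search strategies in this list can be sketched on a toy problem. The example below is only an illustration under stated assumptions: the energy function is an invented butane-like torsional potential in arbitrary units, each "conformation" is just a tuple of torsion angles, and the Monte Carlo part uses the common Metropolis acceptance rule, which the slides do not specify. All function names and parameter values are hypothetical.

```python
import itertools
import math
import random


def torsion_energy(phi_deg):
    """Toy butane-like torsional energy (arbitrary units) for one dihedral angle."""
    phi = math.radians(phi_deg)
    return (0.7 * (1 + math.cos(phi))
            + 0.1 * (1 - math.cos(2 * phi))
            + 1.6 * (1 + math.cos(3 * phi)))


def total_energy(angles):
    """Sum of independent torsional terms (an assumption made for simplicity)."""
    return sum(torsion_energy(a) for a in angles)


def systematic_search(n_torsions, step_deg=30):
    """Grid search: enumerate every combination of torsion angles at a fixed increment."""
    grid = range(0, 360, step_deg)
    scored = [(total_energy(c), c) for c in itertools.product(grid, repeat=n_torsions)]
    return sorted(scored)


def monte_carlo_search(start, n_steps=2000, max_move=60.0, kT=0.6, seed=0):
    """Random perturbation of one torsion per step with Metropolis acceptance."""
    rng = random.Random(seed)
    current = list(start)
    e_cur = total_energy(current)
    best, e_best = list(current), e_cur
    for _ in range(n_steps):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] = (trial[i] + rng.uniform(-max_move, max_move)) % 360.0
        e_trial = total_energy(trial)
        delta = e_trial - e_cur
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return e_best, best


grid_results = systematic_search(n_torsions=2, step_deg=30)
lowest_energy, lowest_angles = grid_results[0]
mc_energy, mc_angles = monte_carlo_search(start=(0.0, 0.0))
print("grid search minimum:", lowest_angles, round(lowest_energy, 3))
print("Monte Carlo minimum:", [round(a, 1) for a in mc_angles], round(mc_energy, 3))
```

The grid search grows as (360/step)^n with the number of torsions n, which is why stochastic methods such as Monte Carlo are preferred for flexible molecules.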
Determination of bioactive conformations:
The bioactive conformation is the particular conformation that a molecule adopts when it is bound
to the receptor.
The intrinsic forces between the atoms in the molecule, as well as extrinsic forces between the
molecule and its surrounding environment, considerably influence the bioactive conformation of
the molecule.
Bioactive conformations of compounds can be determined by both experimental and theoretical
techniques.
Experimental methods for determining bioactive conformations include:
• X-ray crystallography
• NMR spectroscopy
X-ray crystallography:
• The precise 3D structure of the macromolecules can be obtained by this method.
• Drug-receptor complexes solved by X-ray crystallography offer precise structural
information, but this method has several disadvantages:
• The protein needs to be crystallized, and the crystallization medium typically differs from
physiological conditions.
• There is a chance of structural distortion due to crystal packing.
• Due to crystal instability and active-site occlusion, it is often not feasible to diffuse (soak)
substrates or other biologically relevant molecules into existing crystals.
• The positions of hydrogen atoms are difficult to determine.
• There is a possibility of errors in determining the structure of the ligand.
NMR spectroscopy:
• The 3D structural data are obtained in solution, and NMR is the method of choice when the
molecule cannot be crystallized, as in the case of membrane-bound receptors or receptors
that have not yet been isolated due to stability, resolution, or
other issues. The important features of this method are:
• As no protein crystallization is required, the conformation of the protein is not influenced by
packing forces of the crystal environment.
• The solution conditions (pH, ionic strength, substrate, temperature, etc.) can be adjusted to
match the physiological conditions.
• Significant information regarding dynamic aspects of molecular motion can be obtained.
• It requires much less time but is applicable only to small molecules.
• The positions of hydrogen atoms can be resolved.
• Apolar solvents may lead to an overprediction of hydrogen-bonding phenomena.
• Structures generated from NMR may not be comparable to those obtained from other
experiments, and frequently they do not represent the receptor-bound conformation.
Chemo-informatics tools for drug discovery:
• Marvin Sketch
• ChemSketch
QSAR:
• The quantitative structure-activity relationship (QSAR) approach relies on the basic principle
that the biological activity of any ligand or compound is determined by
the arrangement of atoms forming its molecular structure.
• In other words, structurally related molecules tend to possess similar biological activities.
• This structural information can be defined in terms of a series of parameters called molecular
descriptors.
• In QSAR, the biological activity is represented as a function of these molecular descriptors, as
depicted in the equation below:
Biological response or activity = f (molecular descriptors)
• The model thus developed based on the biological activities of known ligands is used to predict
the response of new compounds.
• QSAR finds applicability in a wide range of fields including toxicology, ecotoxicology, drug
design and discovery, chemical data mining, combinatorial library design and so on.
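The relationship "activity = f(molecular descriptors)" can be made concrete with the simplest possible choice of f, a straight line in one descriptor. The sketch below is a minimal illustration only: the descriptor (logP), the activity scale (pIC50), and every data value are invented for the example, and real QSAR models use many descriptors and proper validation.

```python
def fit_linear_qsar(descriptor, activity):
    """Ordinary least squares fit of activity = slope * descriptor + intercept."""
    n = len(descriptor)
    mean_x = sum(descriptor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(descriptor, activity, slope, intercept):
    """Fraction of the activity variance explained by the fitted line."""
    mean_y = sum(activity) / len(activity)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(descriptor, activity))
    ss_tot = sum((y - mean_y) ** 2 for y in activity)
    return 1.0 - ss_res / ss_tot


# Hypothetical training set: one descriptor value and one measured activity per ligand.
logp = [1.0, 1.5, 2.0, 2.5, 3.0]
pic50 = [4.1, 4.6, 5.0, 5.6, 6.0]

slope, intercept = fit_linear_qsar(logp, pic50)
fit_quality = r_squared(logp, pic50, slope, intercept)

# Use the model built on known ligands to predict a new, untested compound.
predicted_new = slope * 2.2 + intercept
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("R^2:", round(fit_quality, 4), "prediction at logP 2.2:", round(predicted_new, 3))
```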
• QSAR studies therefore involve selecting active and inactive compounds with measured
biological activity, calculating molecular descriptors, and selecting
appropriate features, followed by constructing the mathematical model and evaluating it.
• QSAR prediction depends on the structure of the molecules and the atoms present in the
compound. Biological activity is expressed either as
numerical values (for example, bioavailability or inhibitory concentration) or as the presence or
absence of a condition (for example, infected/not infected or mutagenic/non-mutagenic).
• Various QSAR studies have been carried out to understand biological properties such as
pharmacokinetics, blood-brain barrier (BBB) penetration, carcinogenicity, drug metabolism,
bioconcentration, permeability, drug clearance, mutagenicity, etc.
• Another term associated with this approach is Quantitative structure-property relationship
(QSPR).
• In QSPR, physicochemical properties of chemical compounds are predicted from
molecular structure information. Physicochemical properties such as melting point, boiling
point, solubility, stability, dielectric constant, reactivity, diffusion coefficient, thermodynamic
properties, and hydrophobicity have been modeled with quantitative structure-property
relationships.
• In the classical QSAR studies, biological responses have been correlated with atomic, group, or
molecular properties such as lipophilicity, polarizability, electronic, and steric properties
(Hansch analysis) or with certain structural features (Free-Wilson analysis).
• However, these techniques have limited utility for designing functionally diverse
new molecules because they do not consider the three-dimensional (3D)
structures of the molecules.
• As a consequence, 3D-QSAR has emerged as a natural extension of the classical Hansch and
Free-Wilson approaches. It exploits the 3D properties of the ligands to predict their biological
response by employing robust chemometric tools.
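For reference, the two classical approaches named above are usually written in forms like the following. These are commonly quoted textbook forms rather than equations taken from these slides, and the coefficients k_1 to k_5, μ, and a_i are generic fitted parameters.

$$\log\frac{1}{C} = -k_1(\log P)^2 + k_2\log P + k_3\,\sigma + k_4\,E_s + k_5 \qquad \text{(Hansch analysis)}$$

$$\mathrm{BA} = \mu + \sum_i a_i x_i \qquad \text{(Free-Wilson analysis)}$$

Here C is the molar concentration producing a defined biological effect, log P is the octanol-water partition coefficient (lipophilicity), σ is the Hammett electronic constant, and E_s is the Taft steric constant. In the Free-Wilson form, BA is the biological activity, x_i is 1 if substituent i is present and 0 otherwise, a_i is the contribution of that substituent, and μ is the activity of the reference compound. Neither expression contains any term that depends on 3D geometry.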
3D-QSAR:
• The 3D-QSAR is a broad term encompassing all those QSAR methods that correlate macroscopic
target properties with computed atom-based descriptors derived from the spatial
representation of the molecular structures.
• These approaches have served as a valuable predictive tool in the design of pharmaceuticals
and agrochemicals.
• The prime goal of any 3D-QSAR method is to establish the relationship between biological
activity and spatial properties of chemicals, such as steric, electrostatic, and lipophilic properties.
• The 3D-QSAR methodology is computationally more demanding and complex than 2D-QSAR
approaches.
• Normally, it consists of several steps to acquire numerical descriptors from the compound
structures:
1. The optimum (near bioactive) conformation of the compound has to be determined, either
from experimental data (X-ray crystal structure or NMR) or with a theoretical tool like molecular
mechanics, followed by energy minimization.
2. An alignment of the conformers in the data set has to be generated in 3D-space.
3. The space with an immersed conformer is probed computationally for generating various
descriptors.
4. Finally, the computed descriptors should be correlated with the experimental biological
response of the studied compounds.
• One has to understand that the QSAR model is not a substitute for the experimental assays,
although experimental techniques are also not free of inaccuracies. However, QSAR researchers
are trying to develop a model that is as close as possible to the real one, and for this purpose,
the 3D-QSAR techniques have to rely on some basic assumptions, which are listed here:
• Binding of a drug molecule or ligand with the receptor is considered directly related to the
biological response. Effects on second messengers or other signaling effects between receptor
binding and experimentally observed response are not normally considered.
• Molecular properties (physical, chemical, and biological) are encoded with a set of numbers or
descriptors.
• It is believed in general that compounds with common structures have comparable properties,
and thus they have similar binding modes and accordingly equivalent biological activities, and
vice versa.
• Structural properties leading to a biological response are usually determined by nonbonding
forces, mainly steric and electrostatic ones.
• Another important assumption is that the biological response is shown by the ligand itself, not
by its metabolite product.
• The lowest-energy conformation of the ligand is assumed to be its bioactive conformation, which
exerts binding effects.
• The geometry of the receptor binding site is considered rigid, though there are a few
exceptions.
• The loss of translational and rotational degrees of freedom (entropy) upon binding is believed
to follow a similar pattern for all the compounds studied.
• The protein binding site is assumed to be the same for all of the studied ligands.
• The major factors that contribute to the overall free energy of binding, like desolvation energy,
temperature, diffusion, transport, pH, salt concentration, and plasma protein binding, are
difficult to quantify and thus are generally ignored.
CoMFA: (concept)
• Comparative molecular field analysis (CoMFA) is a molecular-field-based, alignment-dependent,
ligand-based method developed by Cramer et al., which helps in building the quantitative
relationship between molecular structures and their response property.
• The method mostly focuses on ligand properties like steric and electrostatic ones, and the
resulting favorable and unfavorable receptor-ligand interactions.
• As CoMFA is an alignment-dependent, descriptor-based method, all aligned ligands are placed
in an energy grid, and the interaction energy is calculated by placing an appropriate probe at
each lattice point.
• The resultant energies calculated at each lattice point correspond to electrostatic (Coulombic)
and steric (van der Waals) properties.
• These computed values serve as descriptors for model development. These descriptor values
are then correlated with biological responses employing a robust linear regression method like
partial least squares (PLS).
• The PLS results help identify the regions of favorable and unfavorable
electrostatic and steric potential and correlate them with biological responses.
CoMFA: (methodology)
• The formalism of the CoMFA methodology is described next:
a. Structures of all molecules are drawn using any structure-drawing software.
b. The bioactive conformation of each molecule is generated and energy minimization is carried
out.
c. All the molecules are superimposed or aligned using either manual or automated methods
employed in the working software, in a manner defined by the supposed mode of interaction
with the receptor.
d. Thereafter, the overlaid compounds are positioned in the center of a lattice grid with a spacing
of 2 Å.
e. In the 3D space, the steric and electrostatic fields are calculated around the molecules with
different probe groups positioned at all intersections of the lattice.
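Computation of the steric field uses the Lennard-Jones equation as follows:

$$V_{LJ}(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right] = \varepsilon\left[\left(\frac{r_m}{r}\right)^{12} - 2\left(\frac{r_m}{r}\right)^{6}\right]$$

where ε is the depth of the potential well, σ is the finite distance at which the interparticle potential is
zero, r is the distance between the particles, and r_m is the distance at which the potential reaches
its minimum. At r_m, the potential function has the value −ε. The distances are related as r_m = 2^{1/6}σ.
Again, computation of the electrostatic field follows the Coulombic interaction equation as follows:

$$E_C = \frac{q_1 q_2}{4\pi\varepsilon_0\,\varepsilon\,r}$$

where q1 and q2 denote point charges, r is the distance between the charges, ε is the dielectric
constant of the medium (not the Lennard-Jones well depth), and ε0 is the permittivity of free space.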
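The grid calculation in steps d and e can be sketched in a few lines. This is a simplified illustration under several assumptions, not the procedure of any particular CoMFA package: the two-atom "molecule", its charges, and its Lennard-Jones parameters are invented; the probe carries a charge of +1; the Coulomb prefactor 332.06 kcal·Å/(mol·e²) replaces 1/(4πε0) so that energies come out in kcal/mol; and the ±30 kcal/mol truncation is an assumed cutoff value. All function names are hypothetical.

```python
import math

COULOMB_K = 332.06  # kcal*Angstrom/(mol*e^2), unit conversion for charges in e and r in Angstrom


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy for well depth epsilon and zero-crossing distance sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, dielectric=1.0):
    """Coulomb energy between two point charges in a medium of given dielectric constant."""
    return COULOMB_K * q1 * q2 / (dielectric * r)


def lattice(atoms, spacing=2.0, margin=4.0):
    """Regular grid of probe positions enclosing the molecule plus a margin."""
    axes = []
    for i in range(3):
        lo = min(a[i] for a in atoms) - margin
        hi = max(a[i] for a in atoms) + margin
        n = int(math.floor((hi - lo) / spacing)) + 1
        axes.append([lo + k * spacing for k in range(n)])
    return [(x, y, z) for x in axes[0] for y in axes[1] for z in axes[2]]


def comfa_fields(atoms, points, probe_charge=1.0, cutoff=30.0, min_dist=1e-3):
    """Steric and electrostatic probe energies at every lattice point, truncated at the cutoff."""
    steric, electrostatic = [], []
    for point in points:
        e_st = e_el = 0.0
        for x, y, z, charge, eps, sigma in atoms:
            # Avoid division by zero when a lattice point coincides with an atom.
            r = max(math.dist(point, (x, y, z)), min_dist)
            e_st += lennard_jones(r, eps, sigma)
            e_el += coulomb(r, probe_charge, charge)
        steric.append(max(-cutoff, min(cutoff, e_st)))
        electrostatic.append(max(-cutoff, min(cutoff, e_el)))
    # One row of the descriptor matrix: all steric values followed by all electrostatic values.
    return steric + electrostatic


# Hypothetical molecule: (x, y, z, partial charge, LJ epsilon, LJ sigma) per atom.
toy_atoms = [(0.0, 0.0, 0.0, -0.4, 0.1, 3.4), (1.2, 0.0, 0.0, 0.4, 0.1, 3.4)]
points = lattice(toy_atoms)
descriptors = comfa_fields(toy_atoms, points)
print(len(points), "lattice points,", len(descriptors), "field descriptors")
```

Even this tiny example yields two descriptors per lattice point, which is why the number of field variables in a real study far exceeds the number of compounds and why a latent-variable method such as PLS is used for the correlation step.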
f. The interaction energy or field values forming a pool of the descriptor/variable matrix are
correlated with the biological response data employing the PLS technique, which identifies and
extracts the quantitative influence of specific features of molecules on their activity.
g. The results may be expressed as correlation equations with the number of latent variable
terms, each of which is a linear combination of original independent lattice descriptors.
h. For visual interpretation, the PLS output is illustrated in the form of interactive graphics
consisting of colored contour plots of the coefficients of the corresponding field variables at each
lattice intersection, showing the important favorable and unfavorable regions in 3D
space that are closely associated with the biological activity.
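The PLS step in f and g can be illustrated with a compact single-response PLS (PLS1) written with the NIPALS algorithm. This is a teaching sketch under stated assumptions, not the implementation used by any CoMFA software: the 5 × 3 "field matrix" is invented, the activities are generated from a known linear rule so that the fit can be checked, and the function names are hypothetical. Each latent variable is a linear combination of the original lattice descriptors, as described in step g.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS; returns means and per-component (w, p, q)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered descriptors
    f = [v - y_mean for v in y]                                # centered response
    components = []
    for _ in range(n_components):
        # Weight vector: direction in descriptor space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # latent variable scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        # Deflate: remove what this component explains before extracting the next one.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one descriptor row by repeating the deflation steps."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


def residual_sum_of_squares(model, X, y):
    return sum((pls1_predict(model, row) - target) ** 2 for row, target in zip(X, y))


# Synthetic data: 5 molecules x 3 field descriptors, activity from a known linear rule.
field_matrix = [[1.0, 0.2, 3.0], [2.0, 0.1, 1.0], [3.0, 0.7, 2.0],
                [4.0, 0.3, 5.0], [5.0, 0.9, 4.0]]
activity = [1.0 + 2.0 * a - 1.0 * b + 0.5 * c for a, b, c in field_matrix]

model_one = pls1_fit(field_matrix, activity, n_components=1)
model_full = pls1_fit(field_matrix, activity, n_components=3)
rss_one = residual_sum_of_squares(model_one, field_matrix, activity)
rss_full = residual_sum_of_squares(model_full, field_matrix, activity)
print("RSS with 1 latent variable:", rss_one)
print("RSS with 3 latent variables:", rss_full)
```

In practice the number of latent variables is chosen by cross-validation rather than by training-set fit, since adding components always reduces the training residual.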
Advantages of CoMFA
The CoMFA technique has been very successful in medicinal chemistry and allied fields due to the
high interpretability of the models and ability to design new ligands in the structure-activity
correlation problems. The major advantages of CoMFA are illustrated as follows:
• CoMFA considers important physicochemical features like steric and electrostatic forces
involved in ligand-receptor interactions.
• The technique appears extremely general, being directly applicable to any series of molecules
for which alignable models can be constructed and whose desired property is believed to result
from alignment-dependent, noncovalent molecular interactions.
• Each CoMFA parameter represents the interaction energy of an entire ligand, not just the
interaction of a more or less randomly selected substructure of the ligand.
• The only inputs needed are models of all the molecules, their lattice description, and usually, an
explicit alignment rule. The most important outputs are the coefficient contour map displays and
model predictions.
Limitations and drawbacks of CoMFA:
Although CoMFA offers many advantages over classical QSAR, it also has several limitations and
defects:
• Too many adjustable settings, like overall orientation, lattice placement, step size, and probe
atom type, have to be chosen.
• It is appropriate only with in vitro data.
• There is a low signal-to-noise ratio due to many ineffectual field variables.
• There is uncertainty in the choice of molecules and variables.
• Variable selection procedures can produce fragmented contour maps.
• Some potential energy functions are flawed.
• Hydrophobicity is not well quantified.
• Cutoff limits have to be applied to the computed energies.
In general, CoMFA results are highly dependent on the accuracy of the conformational analysis,
determination of bioactive conformation, and method of alignment.
CoMSIA: (concept)
• Comparative molecular similarity indices analysis (CoMSIA) is a ligand-based, alignment-dependent,
and linear 3D-QSAR method that is a modified version of CoMFA.
• The approaches of CoMFA and CoMSIA are very similar, except that molecular similarity indices
are also computed in the case of CoMSIA.
• CoMFA is highly sensitive to the alignment of the molecules, which may lead to errors in the
interpretation of the electrostatic and steric potentials.
• To address this, Gaussian potentials, which are much softer than the CoMFA functions, are
employed in the CoMSIA fields.
• The usual energy grid box is created, and similar probes are positioned throughout the grid lattice.
• In addition, a solvent-dependent molecular entropic (hydrophobic) term is also included in
CoMSIA. To analyze a property of a data set molecule, a common probe is placed at each grid point
and the similarity is calculated there.
• The computation is mostly done on steric, electrostatic, hydrophobic, and hydrogen-bonding
properties. Each property is computed at regularly spaced grid points and serves as a descriptor
that is correlated with the biological response.
CoMSIA: (methodology)
• In CoMSIA, five different similarity fields are calculated at regularly spaced grid points for the
aligned molecules: namely, steric, electrostatic, hydrophobic, hydrogen bond donor, and hydrogen
bond acceptor.
• The interactions of the molecules with the probe atom in the different similarity
fields are correlated with the biological responses of the molecules using an appropriate chemometric
tool.
• The general formalism of the CoMSIA technique is illustrated as follows:
a. Initially, conformer generation is performed for the studied molecules employing an
approach such as Monte Carlo simulation.
b. Energy minimization of the molecules is performed (the choice of technique depends on the
employed software, as well as the researchers’ requirements), and then partial atomic charges of
the molecules are calculated (using methods like the Gasteiger-Hückel method, Mulliken analysis,
Coulson’s charges, dipole charges, Voronoi deformation density, and density-derived electrostatic
and chemical methods).
c. The training set molecules are aligned based on the points of alignment of the most active
compound, which is used as the template molecule.
d. Thereafter, the molecular interactions based on the five physicochemical properties are
calculated using a common probe atom with a radius of 1 Å, a charge of +1, a hydrophobicity of +1, and
hydrogen bond donor and acceptor properties of +1. The grid can be extended beyond the molecular
dimensions by 2.0 Å in all directions.
e. Subsequently, the PLS approach is employed to derive the 3D-QSAR models using the similarity
(CoMSIA) factors as the independent variables and the biological response as the dependent variable.
f. The results are represented in the form of contour maps that characterize the favorable and
unfavorable regions for the five different interaction fields. Based on the favorable interaction
regions obtained from the contour maps, the molecular fragments essential for the respective activity
are identified.
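The similarity calculation in step d can be sketched as follows. The Gaussian form used here, A = −Σ w_probe · w_atom · exp(−α r²), is the commonly cited CoMSIA similarity index; the attenuation factor α = 0.3 and the atom property values are assumptions for illustration, and the function name is hypothetical. The property value w_atom would be a partial charge for the electrostatic field, a hydrophobicity parameter for the hydrophobic field, and so on.

```python
import math


def similarity_index(point, atoms, probe_value=1.0, alpha=0.3):
    """Gaussian similarity index of one property at one grid point.

    atoms is a list of (position, property_value) pairs; alpha controls how
    quickly an atom's contribution decays with squared distance.
    """
    total = 0.0
    for position, atom_value in atoms:
        r2 = sum((a - b) ** 2 for a, b in zip(point, position))
        total += probe_value * atom_value * math.exp(-alpha * r2)
    return -total


# Hypothetical molecule: two atoms with partial charges as the property.
toy_atoms = [((0.0, 0.0, 0.0), 0.5), ((1.5, 0.0, 0.0), -0.3)]

on_atom = similarity_index((0.0, 0.0, 0.0), [toy_atoms[0]])
near = similarity_index((1.0, 0.0, 0.0), [toy_atoms[0]])
far = similarity_index((2.0, 0.0, 0.0), [toy_atoms[0]])
both = similarity_index((0.0, 0.0, 0.0), toy_atoms)
print("single atom, r = 0, 1, 2 Angstrom:", on_atom, near, far)
print("two atoms, at the first atom:", both)
```

Because the Gaussian stays finite even when the grid point sits exactly on an atom, no energy cutoff is needed, and indices can be evaluated inside the molecular volume as well as outside it.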
Advantages of CoMSIA:
• The CoMSIA technique shares a few drawbacks of CoMFA, but it also offers several distinguishing
advantages:
• The use of a “Gaussian distribution of similarity indices” avoids abrupt changes in
grid-based probe-atom interactions.
• The choice of similarity probe is not limited to steric or electrostatic potential fields but
also includes hydrogen bonding (hydrogen bond acceptors and donors) and hydrophobic fields.
• The effect of solvent entropic contributions can also be incorporated by employing a hydrophobic
probe.
• With CoMFA, a contour map highlights those regions in space where the aligned molecules would
favorably or unfavorably interact with a probable receptor environment.
• On the other hand, the CoMSIA contours indicate those areas within the region occupied by the
ligands that “favor” or “dislike” the occurrence of a group with a particular physicochemical property.
• This relationship between the requisite properties and a possible ligand shape is a more direct
guide for checking whether all features crucial for the response are present in the structures being
considered.