grade2
Release 1.6.0
1 Introduction
3 Usage
4 Examples
5 Grade2 outputs
6 Charging
9 CSD-Core compatibility
13 Comparing Grade2 and EH99 Restraints for amino acid side chains
14 Grade2 Changelog
grade2 Documentation, Release 1.6.0
Grade2 is a ligand restraint dictionary generation tool for refinement and fitting.
CHAPTER
ONE
INTRODUCTION
Grade2 is a tool for generating restraint dictionaries for ligands in macromolecular structure determination. Such a
dictionary provides stereochemical restraints that are essential to produce chemically reasonable structures in X-ray
refinement of macromolecules in normal resolution ranges (Steiner and Tucker, 2017). A restraint dictionary describes
the conformational flexibility of the ligand molecule, and so is essential for ligand fitting into electron density (either
manually using Coot (Emsley and Cowtan, 2004) or with automated ligand fitting tools such as Rhofit).
Grade2 is distributed as part of the BUSTER package https://www.globalphasing.com/buster/ but its restraint
dictionaries can be used by a wide range of building, fitting and refinement tools. Grade2 is a reimplementation of the
original Grade restraint generation tool that was developed at Global Phasing from 2010. Grade2 (like Grade) aims to
produce restraints using information from the Cambridge Structural Database (CSD) of small-molecule organic crystal
structures whenever possible. Both Grade and Grade2 use the Mogul tool (Bruno et al., 2004) to search the CSD.
However, whereas Grade runs the Mogul program in batch mode via a Mogul Instruction File, Grade2 performs Mogul
searches with the CSD Python API, allowing additional custom analysis. Where CSD information is not available, Grade and
Grade2 base restraints on computational chemistry methods. Currently, Grade2 uses the MMFF94s (Halgren, 1996)
or UFF (Rappe et al., 1992) force field as implemented (Tosco et al., 2014) in the RDKit package.
CHAPTER
TWO
2.1 Installation
Grade2 is installed as part of the BUSTER distribution. For a full description of how to install BUSTER please see the
installation documentation available at https://www.globalphasing.com/buster/manual/installation/index.html
Grade2 makes extensive use of the CSD Python API from the CCDC. Grade2 therefore requires an installation
of the CSD-Core package, which provides the API and the CSD and Mogul databases. For details on how to
obtain CSD-Core please see https://www.ccdc.cam.ac.uk/solutions/csd-core/
Note: If no local installation of CSD-Core is available, we recommend using the Grade Web Server for
non-confidential ligands.
Grade2 needs to be able to locate the CSD Python API and the directory containing the databases (only the CSD
and Mogul databases are required for Grade2). These locations are specified by setting environment variable(s).
If you have followed the BUSTER Snapshot Installation Guide available at https://www.globalphasing.com/buster/
manual/installation/index.html it is likely that Grade2 will already work.
Note: The configuration procedure was changed in Grade2 release 1.6.0 (July 2024). If you are still using an
older version, please refer to the instructions in $BDG_home/docs/grade2/html/installation.html or, better still,
update BUSTER to the latest release.
To check whether Grade2 can find the CSD installation use the grade2 command-line option -checkdeps:
grade2 -checkdeps
If this results in a final line starting with SUCCESS then Grade2 has been successfully set up, for example:
$ grade2 -checkdeps
INFO: BDG_CSD_TOP_DIRECTORY set to /home/software/CCDC_2024.1
INFO: Running /home/software/CCDC_2024.1/ccdc-utilities/csd-location/c_linux-64/bin/csd_location.x \
/home/software/CCDC_2024.1/ccdc-data
INFO: ---- this writes to file ~/.config/CCDC/CSD.ini setting:
INFO: CSD data root folder = /home/software/CCDC_2024.1/ccdc-data
If, instead, the output of grade2 -checkdeps has lines starting with ERROR then you should set the environment variable
BDG_CSD_TOP_DIRECTORY to the location of the top-level directory of the CSD installation on your system (see below).
Please also make sure that the withdrawn environment variable BDG_TOOL_MOGUL has not been set. If you are a sh,
bash or dash user this can be achieved by commands like:
unset BDG_TOOL_MOGUL
export BDG_CSD_TOP_DIRECTORY=/home/software/CCDC_2024.1/
whereas if you are a tcsh or csh user you should use commands like:
unsetenv BDG_TOOL_MOGUL
setenv BDG_CSD_TOP_DIRECTORY /home/software/CCDC_2024.1/
You will need to change the path in these commands to the location of the CSD top directory on your system.
Once you have found the environment variable settings needed for grade2 -checkdeps to report SUCCESS, it is best
to add them to the BUSTER setup_local.sh and/or setup_local.csh files, as explained in the BUSTER
Snapshot Installation Guide BUSTER configure section. This means that the Grade2 configuration will be done together
with BUSTER by the setup script setup.sh or setup.csh.
Note: The BUSTER CSD configuration procedure for Grade2 is shared with the buster-report and the (old, deprecated)
Grade programs.
To test whether Grade2 has been configured correctly, use the grade2 command-line option -checkdeps:
grade2 -checkdeps
If this does not result in a final line that starts with SUCCESS, please follow the instructions in the Configuration
section above.
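The SUCCESS check described above can be scripted, for instance in a site installation script. The following is a minimal sketch, not part of Grade2: the function name last_line_is_success is hypothetical, and it assumes only what is stated above, namely that a working setup ends with a line starting with SUCCESS.

```shell
# Hypothetical helper (not part of Grade2 or BUSTER).
# Reads text on standard input and succeeds only if the last
# non-blank line starts with "SUCCESS".
last_line_is_success() {
    _last=""
    while IFS= read -r _line || [ -n "$_line" ]; do
        case $_line in
            # Remember the most recent line that is not just whitespace.
            *[![:space:]]*) _last=$_line ;;
        esac
    done
    case $_last in
        SUCCESS*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage (assumes grade2 is on the PATH):
#   grade2 -checkdeps 2>&1 | last_line_is_success && echo "Grade2 is configured"
```

Only the final line is inspected, so INFO or WARNING lines earlier in the output do not affect the result.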
To test that all the components used by Grade2 work as expected on your system, run the command grade2_tests
-n auto. For example:
$ grade2_tests -n auto
INFO: BDG_CSD_TOP_DIRECTORY set to /home/software/CCDC_2024.1
INFO: Running /home/software/CCDC_2024.1/ccdc-utilities/csd-location/c_linux-64/bin/csd_location.x \
/home/software/CCDC_2024.1/ccdc-data
INFO: ---- this writes to file ~/.config/CCDC/CSD.ini setting:
INFO: CSD data root folder = /home/software/CCDC_2024.1/ccdc-data
==================================================================================== test session starts ====================================================================================
configfile: pytest.ini
plugins: mock-3.11.1, cov-4.1.0, xdist-3.3.1
16 workers [583 items]
..................................................................................................................................................................................... [ 31%]
...........................................................................................ss......................................................................................... [ 62%]
.................................................................................................................................................................s................... [ 93%]
........s.........s................s.... [100%]
============================================================================== 577 passed, 8 skipped in 36.07s ==============================================================================
csd_directory: /home/software/CCDC_2024.1/ccdc-data/csd
PDB components InChiKey store last modified date: 2024-04-05
grade2_tests will run over 500 unit, functional and integration tests written as part of the test-driven development
used when developing Grade2. Any failure is serious, so please report it to us.
Please also see the Examples section for how to run/test the grade2 command-line tool.
Note: The -n auto argument of grade2_tests specifies that the tests will use as many processes as your computer
has physical CPU cores. To run the tests in a single process, do not specify the option.
It is possible to configure site-specific features for Grade2 by using the environment variables set out below. Do not
worry about these if you are a new user of Grade2.
In general, environment variables are used to configure an installation of Grade2 by specifying site-specific choices
that can be set once and will not vary from run to run. In contrast, command-line arguments are used to specify
things that will vary for individual grade2 runs.
It is best if Grade2 environment variables are added to the BUSTER setup_local.sh or setup_local.csh file, as
explained in the BUSTER Snapshot Installation Guide BUSTER configure section, so that they are set up together with
BUSTER. Users can of course set or alter any of the environment variables if they wish by using export or setenv
(depending on the shell they use).
BDG_CSD_TOP_DIRECTORY
BDG_CSD_TOP_DIRECTORY is the environment variable that gives the location of the top-level directory for the CSD
installation (that is required to run Grade2). For Grade2 to work BDG_CSD_TOP_DIRECTORY must be set. The CSD
installation top-level directory must contain the subdirectory ccdc-software and the subdirectory ccdc-utilities.
It will also normally contain the subdirectory ccdc-data that will be used by Grade2 as the default location of the
Mogul and CSD databases (unless this is overridden by setting BDG_CCDC_DATA).
Please see the Initial Configuration section above for more detail on setting BDG_CSD_TOP_DIRECTORY.
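The directory layout described above can be checked before setting the variable. This is a minimal sketch under the assumption that the only requirements are the ones stated in this section; check_csd_top_dir is a hypothetical helper name, not a tool shipped with Grade2 or CSD-Core.

```shell
# Hypothetical helper (not part of Grade2): check that a directory looks
# like a CSD top-level directory as described in this section.
# Argument: the directory to check (defaults to $BDG_CSD_TOP_DIRECTORY).
check_csd_top_dir() {
    _top=${1:-${BDG_CSD_TOP_DIRECTORY:-}}
    if [ -z "$_top" ]; then
        echo "BDG_CSD_TOP_DIRECTORY is not set" >&2
        return 1
    fi
    # These two subdirectories are stated to be required.
    for _sub in ccdc-software ccdc-utilities; do
        if [ ! -d "$_top/$_sub" ]; then
            echo "missing required subdirectory: $_top/$_sub" >&2
            return 1
        fi
    done
    # ccdc-data is normally present but can be replaced via BDG_CCDC_DATA.
    if [ ! -d "$_top/ccdc-data" ]; then
        echo "note: no ccdc-data in $_top, so BDG_CCDC_DATA must be set" >&2
    fi
    return 0
}

# Usage:
#   check_csd_top_dir /home/software/CCDC_2024.1 && \
#       export BDG_CSD_TOP_DIRECTORY=/home/software/CCDC_2024.1
```

A passing check does not replace grade2 -checkdeps, which remains the authoritative test.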
BDG_CCDC_DATA
The environment variable BDG_CCDC_DATA can be used to specify the location of the ccdc-data directory that will
be used as the location of the Mogul and CSD databases. The BDG_CCDC_DATA directory must contain the subdirectories csd
and mogul for Grade2 to work (it will also normally contain other subdirectories such as isostar).
If BDG_CCDC_DATA is not set, then Grade2 will by default use the subdirectory ccdc-data of
BDG_CSD_TOP_DIRECTORY.
BDG_CCDC_DATA can be used if you are seeing long Grade2 runs because CSD is installed on a slow networked disk,
as explained in the Making a local copy of ccdc-data section below.
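The precedence between the two variables can be illustrated with a short shell function. This is only a sketch of the documented default rule, not Grade2's actual implementation, and resolve_ccdc_data is a hypothetical name.

```shell
# Hypothetical helper (not part of Grade2): print the ccdc-data directory
# that the documented rule would select, and check that it contains the
# required csd and mogul subdirectories.
resolve_ccdc_data() {
    if [ -n "${BDG_CCDC_DATA:-}" ]; then
        # An explicit BDG_CCDC_DATA takes precedence.
        _data=${BDG_CCDC_DATA%/}
    elif [ -n "${BDG_CSD_TOP_DIRECTORY:-}" ]; then
        # Default: the ccdc-data subdirectory of the CSD top directory.
        _data=${BDG_CSD_TOP_DIRECTORY%/}/ccdc-data
    else
        echo "neither BDG_CCDC_DATA nor BDG_CSD_TOP_DIRECTORY is set" >&2
        return 1
    fi
    for _sub in csd mogul; do
        if [ ! -d "$_data/$_sub" ]; then
            echo "missing required subdirectory: $_data/$_sub" >&2
            return 1
        fi
    done
    printf '%s\n' "$_data"
}

# Usage:
#   resolve_ccdc_data    # prints the directory, or fails with a message
```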
BDG_GRADE2_PYTHON_VERSION
Grade2 is distributed with BUSTER within a miniconda environment. The CSD Python API is loaded at run time from the
copy distributed with the CSD, so the Python version used by Grade2 must be compatible with it.
To use Grade2 release 1.4.1 or above with a CSD installation that predates 2023.2.0 (released in July 2023),
set the environment variable BDG_GRADE2_PYTHON_VERSION to 3.7. If not set, BDG_GRADE2_PYTHON_VERSION
currently defaults to 3.9.
BDG_GRADE2_PUBCHEM_NAMES_ON_ACCEPT_SMILES_TO_WEB
The grade2 command-line option --pubchem_names does an online lookup of the systematic (IUPAC) name for ligands
that occur in PubChem. This online search involves uploading the SMILES string of the molecule to the PubChem
server. For this reason, the --pubchem_names option should not be used for confidential ligands. To be extra
careful, by default the --pubchem_names option is deactivated until the environment variable is set. To enable the
--pubchem_names option set BDG_GRADE2_PUBCHEM_NAMES_ON_ACCEPT_SMILES_TO_WEB to "yes".
Note: Apologies for the long name of the environment variable BDG_GRADE2_PUBCHEM_NAMES_ON_ACCEPT_SMILES_TO_WEB
but it is chosen to explicitly show you have accepted that the ligand's SMILES string is uploaded to a public web
server.
BDG_GRADE2_LIGAND_LOOKUP
The --lookup option provides a mechanism whereby an external script is invoked to look up details of a ligand from a
database. To use your own script, set environment variable BDG_GRADE2_LIGAND_LOOKUP to the location of the script.
Please see the --lookup option for more details.
BDG_GRADE2_MOGUL_IN_HOUSE_DATABASE
Grade2 can use an additional in-house Mogul database, as described in the in-house Mogul databases chapter. Once
you have prepared the database please set BDG_GRADE2_MOGUL_IN_HOUSE_DATABASE to the full path of the directory
containing the in-house Mogul database.
BDG_GRADE2_SSL_DISABLE_VERIFICATION
When BUSTER is installed on some Ubuntu Linux systems, --lookup ID using the default
pubchem_g2_lookup_script.py script distributed with BUSTER can fail, terminating with an error message
that includes SSL: CERTIFICATE_VERIFY_FAILED. A similar error can occur when the --pubchem_names option is used.
If the problem occurs, set the environment variable BDG_GRADE2_SSL_DISABLE_VERIFICATION to "yes". This will
disable SSL verification for PubChem lookups and should mean the options work. As you should not be using these options
for any confidential ligands, turning off verification should be fine.
BDG_GRADE2_CIF_LOOP_ALL
Set BDG_GRADE2_CIF_LOOP_ALL to "yes" to write all CIF categories as loops, even if they only contain a single item.
Currently this only affects category gphl_chem_comp_info which by default is written using key-value pairs as this
makes inspection easier. All other CIF categories are written as loops anyway.
BDG_GRADE2_TEST_WEB
By default, the grade2_tests tool does not run tests that use external online services, as these can be unavailable
because of maintenance or network issues. Set BDG_GRADE2_TEST_WEB to "yes" to turn on the tests involving
external online services. This will enable testing of the --pubchem_names and the --lookup ID options.
Prior to release 1.6.0 in July 2024, the environment variable BDG_TOOL_MOGUL was used to configure Grade2,
buster-report and Grade. Grade2 release 1.6.0 will continue to work using BDG_TOOL_MOGUL but will produce an initial
warning message, for example:
$ grade2 -checkdeps
WARNING: The old configuration environment variable BDG_TOOL_MOGUL is set, instead
WARNING: please set the new configuration environment variable BDG_CSD_TOP_DIRECTORY
WARNING: (to the top directory of your CSD installation). For now, I am doing
WARNING: this for you by:
WARNING:
WARNING: unset BDG_TOOL_MOGUL
WARNING: export BDG_CSD_TOP_DIRECTORY=/home/software/xtal/CCDC/CSDS/2023.3/
WARNING:
WARNING: For more information please see:
WARNING:
WARNING: https://gphl.gitlab.io/grade2_docs/installation.html#withdrawn
WARNING:
INFO: BDG_CSD_TOP_DIRECTORY set to /home/software/xtal/CCDC/CSDS/2023.3/
INFO: Running /home/software/xtal/CCDC/CSDS/2023.3//ccdc-utilities/csd-location/c_linux-64/bin/csd_location.x \
/home/software/xtal/CCDC/CSDS/2023.3//ccdc-data
INFO: ---- this writes to file ~/.config/CCDC/CSD.ini setting:
INFO: CSD data root folder = /home/software/xtal/CCDC/CSDS/2023.3//ccdc-data
############################################################################
## [grade2] ligand restraint dictionary generation
############################################################################
It is best if you alter the BUSTER setup_local.sh or setup_local.csh file, as explained in the BUSTER Snapshot
Installation Guide BUSTER configure section, setting the environment variable BDG_CSD_TOP_DIRECTORY rather than
BDG_TOOL_MOGUL. The environment variables CSD_HOME and BDG_TOOL_CSD_PYTHON_API have also been supported
in the past and should no longer be set.
From the next release of Grade2 setting any of the obsolete environment variables BDG_TOOL_MOGUL, CSD_HOME or
BDG_TOOL_CSD_PYTHON_API will result in Grade2 terminating with an error message.
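To prepare for that change, the obsolete variables can be detected and removed in a shell startup or setup file. The sketch below is for sh, bash or dash users; clear_obsolete_csd_vars is a hypothetical function name and is not provided by BUSTER.

```shell
# Hypothetical helper (not part of Grade2): unset the obsolete CSD
# configuration variables named in this section, reporting each one found.
# It must be run in the current shell (for example from setup_local.sh),
# not in a subshell, for the unset to take effect.
clear_obsolete_csd_vars() {
    for _name in BDG_TOOL_MOGUL CSD_HOME BDG_TOOL_CSD_PYTHON_API; do
        # eval is safe here: the names come from the fixed list above.
        eval "_isset=\${$_name+set}"
        if [ -n "$_isset" ]; then
            echo "unsetting obsolete variable $_name" >&2
            unset "$_name"
        fi
    done
}

# Usage:
#   clear_obsolete_csd_vars
#   export BDG_CSD_TOP_DIRECTORY=/home/software/CCDC_2024.1/
```

tcsh and csh users would need the equivalent unsetenv commands instead.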
Please note that the next release will not work with CSD releases that pre-date CSD 2023.2, and we advise updating to
CSD 2024.1 or later.
A major revision and improvement of the directory structure and installation procedures was made in CSD-Core release
2023.1, which introduced a configuration file CSD.ini.
• The directory structure was improved to separate out the databases from the software. Prior to this release there
was a tight coupling between the software and the databases in the same CSD installation. Following the revision,
CSD software can now be used with the databases from different releases and in different locations. This is a
great improvement.
• The location of databases used is specified in CSD.ini. Normally this file is individual to each user, and for
Linux and macOS is located in the user's home directory with the path:
~/.config/CCDC/CSD.ini
• CSD-Core includes a program csd_location, that, given a directory path containing the databases, updates the
user's CSD.ini file.
• To make sure that the latest databases are used following a CSD-Core update, Grade2 and buster-report have been
altered from release 1.6.0 to invoke the csd_location tool on every run. By default, the ccdc-data location is
set from the environment variable BDG_CSD_TOP_DIRECTORY, ensuring that the database and software are
kept in sync when different versions of CSD-Core are used.
• The BDG_CCDC_DATA environment variable can be used to set the database location to other locations: for
instance, in making a local copy of ccdc-data.
• When Grade2 and buster-report update the CSD.ini file, this is noted in the terminal log on lines starting INFO:,
for example:
$ grade2 -P ATP
INFO: BDG_CSD_TOP_DIRECTORY set to /home/software/CCDC_2024.1
INFO: Running /home/software/CCDC_2024.1/ccdc-utilities/csd-location/c_linux-64/bin/csd_location.x \
/home/software/CCDC_2024.1/ccdc-data
INFO: ---- this writes to file ~/.config/CCDC/CSD.ini setting:
INFO: CSD data root folder = /home/software/CCDC_2024.1/ccdc-data
############################################################################
## [grade2] ligand restraint dictionary generation
...
Note: On every run, Grade2 and buster-report will overwrite the user's ~/.config/CCDC/CSD.ini configuration
file.
If you are seeing long Grade2 runs because CSD is installed on a slow networked disk then it is possible for a user
to speed things up by using copies of the ccdc-data subdirectories csd and mogul on a fast local filesystem.
Note: Although individual users can perform this procedure without root privilege and separately from the main CSD
installation, you are advised to talk to your system manager and/or software installer before doing so. Filling up the
/tmp disk could make you unpopular!
The procedure is to first check that there is sufficient free space on the disk that you want to use (it is best if this is an
SSD). For example, if you want to use /tmp then use the command df -h /tmp, like this:
$ df -h /tmp
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg0-tmp 49G 48M 47G 1% /tmp
In this case the disk has 47GB free. This can be compared to the size of the database directories that need to be copied.
Here things are fine, as around 19GB are required compared to 47GB of free space.
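The comparison of required and free space can also be scripted. The following is a rough sketch using the standard df and du tools with sizes in kilobytes; the function names free_kb, needed_kb and has_room are hypothetical, and the sketch assumes the filesystem name reported by df contains no spaces.

```shell
# Hypothetical helpers (not part of Grade2). All sizes are in kilobytes.

# Print the free space on the filesystem holding directory $1.
free_kb() {
    # With -P, df prints one header line and one data line per filesystem;
    # the fourth column of the data line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Print the total size of the directories given as arguments.
needed_kb() {
    du -sk "$@" | awk '{ total += $1 } END { print total + 0 }'
}

# has_room DEST SRC...  succeeds if the SRC directories fit on DEST's disk.
has_room() {
    _dest=$1
    shift
    _free=$(free_kb "$_dest")
    _need=$(needed_kb "$@")
    [ "$_need" -lt "$_free" ]
}

# Usage:
#   has_room /tmp "$BDG_CSD_TOP_DIRECTORY/ccdc-data/csd" \
#                 "$BDG_CSD_TOP_DIRECTORY/ccdc-data/mogul" \
#       && echo "enough space for a local copy"
```

Leaving a comfortable margin is wise, since other users may also write to the same disk.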
Make a directory for the local copy, for example /tmp/ccdc-data_2024.1_local:
$ mkdir /tmp/ccdc-data_2024.1_local
Then copy over the complete contents of the directories ccdc-data/csd and ccdc-data/mogul into your local
copy. For example:
$ cp -a $BDG_CSD_TOP_DIRECTORY/ccdc-data/csd /tmp/ccdc-data_2024.1_local/
$ cp -a $BDG_CSD_TOP_DIRECTORY/ccdc-data/mogul /tmp/ccdc-data_2024.1_local/
This may take some time as this involves copying 19GB of data from a slow network disk. The local directory now has
copies of the two required databases:
$ du -hsx /tmp/ccdc-data_2024.1_local/*
12G /tmp/ccdc-data_2024.1_local/csd
6.9G /tmp/ccdc-data_2024.1_local/mogul
To use the local directory, set the environment variable BDG_CCDC_DATA to its location. This is normally done
by:
$ export BDG_CCDC_DATA=/tmp/ccdc-data_2024.1_local/
Once you have checked that this works, it is best that BDG_CCDC_DATA is added to the BUSTER setup_local.sh or
setup_local.csh file, as explained in the Initial Configuration section, above.
Making a local copy of ccdc-data should result in a speedup of Grade2 and buster-report (please see example timings).
Note: It is only necessary to copy over the csd and mogul sub-directories to run Grade2, buster-report and Grade.
CHAPTER
THREE
USAGE
Before you run Grade2, make sure that you have followed the Initial Configuration instructions and tested that
Grade2 works properly.
To run Grade2 you need to specify the molecule that you want to create a restraint dictionary for. There are currently
4 alternative input options:
SMILES (Simplified Molecular-Input Line-Entry System) provides a way to describe a molecular structure as an ASCII
string https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system. To generate a restraint dictionary
for a given SMILES string simply run grade2 on the command-line followed by the SMILES surrounded by single
quotes: grade2 'SMILES', for example:
$ grade2 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C'
Please note that the dollar symbol $ above represents the command prompt. This will run grade2 producing output like
the following:
$ grade2 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C'
set CSDHOME=/home/software/xtal/CCDC/CSDS/2021.3/CSD_2022 from $BDG_TOOL_MOGUL=/home/software/xtal/CCDC/CSDS/2021.3/CSD_2022/bin/mogul
############################################################################
## [grade2] ligand restraint dictionary generation
############################################################################
-----------------------------------------------------------------------------
• As you can see, before the restraint dictionary is produced a CHECK is made to see whether the ligand has already
been defined in the wwPDB Chemical Component Dictionary https://www.wwpdb.org/data/ccd that describes
residues and small molecules in PDB entries. In this case the SMILES string is for caffeine and
it would be sensible to use a restraint dictionary for CFF so that the atom names agree with the existing definition
https://www.rcsb.org/ligand/CFF. See the next subsection.
• Note that the CIF-format restraint dictionary is written to file LIG.restraints.cif and has the default PDB
chemical component id (aka residue name or 3-letter code) of LIG. To set the 3-letter code use the command-line
option --resname.
• As well as the CIF-format restraint dictionary grade2 will write "ideal" coordinates based on the restraints to
PDB, SDF and MOL2 formats. For more details see the coordinates files section.
• Molecular diagrams are also produced, for more details see the schematic 2D molecular diagrams section.
• Finally, suggestions are given on how to view the coordinates and restraints produced using Coot or EditREFMAC
(supplied with BUSTER).
• Note that if you do not want the coordinate or molecular diagram output files then the --just_cif option can be
used.
When the chirality of one or more chiral centers is not specified in the SMILES string, the output molecule will have
an arbitrary stereochemistry assigned. In these cases (from release 1.3.1), the chiral restraint for each ambiguous center
is set to both, which allows the chiral center to flip during fitting and refinement. A warning message is
written when there are any ambiguous chiral centers.
For example, the SMILES string C([C@@H]1C(C(C(C(O1)O)O)O)O)OC defines a 6-O-methyl ether of a D-pyranose
sugar but lacks chiral specification for 4 of the chiral carbons (so it could be a derivative of glucose, galactose or another
stereoisomer). It is sensible to run Grade2 using the --antecedent option to base the atom naming on PDB component
GLC to get standard sugar atom IDs. The --diagram_stereo_label option is used so that labels showing the stereo
configuration are added to the molecular diagrams.
$ grade2 -P GLC
This will produce schematic 2D SVG molecular diagrams showing that atoms C1, C2, C3 and C4 have ambiguous
chirality, labeled as (?) and marked by wavy bonds. In contrast, as the chirality of atom C5 is defined as (R) (it is a
D-pyranose) that is also shown with a wedged bond:
The restraint file will include a restraint specifying the chirality of atom C5 whereas the other chiral restraints will have
both restraints allowing flipping.
Chemaxon Extended SMILES (CXSMILES) enhanced stereochemistry "and" and "or" group definitions provide a way of
describing detailed information about mixtures of stereoisomers or about absolute but unknown stereochemistry.
Greg Landrum's RDKit Blog article Intro to Stereo Groups and Enhanced Stereochemistry provides an excellent
description.
From release 1.6, Grade2 can process CXSMILES with enhanced stereochemistry "and" and "or" group definitions. The
following example is a sugar with one known stereo center and a mixture of stereoisomers at the other two
centers (expressed as a CXSMILES "and" group):
The --diagram_stereo_label option is used so that labels showing the stereo config are added to the molecular diagrams.
This will produce schematic 2D SVG molecular diagrams showing that the atom C3 has a defined chirality (R) but the
other two atoms are part of an "and" group and so there is a mixture of chiralities. The chiral restraints for atoms C7 and
C9 will be set to type both to allow flipping, and wavy bonds are used in the 2D diagrams:
Please note that currently neither chiral groups in SDF files nor wiggle bonds are supported in Grade2. If you would
like support for either, please let us know.
To generate a restraint dictionary for an existing PDB ligand it is best to use the --PDB_ligand option. For instance, to
generate a restraint dictionary for caffeine CFF run:
$ grade2 -P CFF
This will produce output using the wwPDB Chemical Component Dictionary (CCD) compound record for caffeine
CFF (see https://www.rcsb.org/ligand/CFF for an overview). Grade2 will download the wwPDB CCD CIF
file for the compound from either PDBeChem: https://www.ebi.ac.uk/pdbe-srv/pdbechem/ or from Ligand Expo:
http://ligand-expo.rcsb.org/. The output restraint dictionary will be called CFF.restraints.cif and other files will
be named CFF.*, see the Grade2 outputs chapter.
If the --PDB_ligand option is used then the atom names will agree with the wwPDB CCD definition for the compound.
This has the advantage that if you deposit the final structure to the PDB the compound's atoms will not be renamed.
The third input option is to use a file to specify the input molecule. The command-line option --in should be used to
specify the input filename. For instance, to generate a restraint dictionary for the SDF file ligand_35.sdf with the
3-letter code L35 run:
The output restraint dictionary will be L35.restraints.cif and other files will be named L35.*, see the Grade2
outputs chapter.
Normally, the format of the input file is detected from the filename extension (for example .sdf). If necessary the
command-line option --itype can be used to specify the input format.
Currently, Grade2 supports the following input formats:
The --lookup option provides a mechanism whereby an external script is invoked to look up details of a ligand from
a database. To use your own script, set environment variable BDG_GRADE2_LIGAND_LOOKUP to the location of the
script. Please see https://gitlab.com/gphl/grade2_lookup_scripts for example scripts written in different languages and
description of what your script needs to do.
By default, if BDG_GRADE2_LIGAND_LOOKUP is not set, grade2 --lookup CID uses a script that downloads ligand
details from PubChem https://pubchem.ncbi.nlm.nih.gov/ using CID, the PubChem compound identifier. For example,
running grade2 --lookup with CID 123 will download details of the drug Triforin from PubChem (see
https://pubchem.ncbi.nlm.nih.gov/compound/123 for the Triforin PubChem entry). This will run grade2 producing
output like the following:
Note that the SMILES string C(CC(=O)N)CN=C(N)N downloaded from PubChem is used as a starting point
for the molecule. A CIF-format restraint dictionary is output to the file CID_123.restraints.cif, and this will
include information about the molecule's name, its systematic (IUPAC) name and the PubChem information page.
Please note that most grade2 command-line arguments have a long version, for instance --just_cif, and a short
version, -j (see --just_cif). The long version can be abbreviated when this creates no ambiguity.
-h, --help
The --help option will write out a help message listing all the command-line arguments. Please note that help on each
option is deliberately brief and more detail can be found in this chapter.
-checkdeps, --checkdeps
-checkdeps is a special option that checks that the external tool (CSD) that grade2 needs is accessible and works
properly. Useful for setting up grade2 and for a quick test that the program works on a particular host. Please see the
Installation section of this document for more details.
-V, --versions
--versions writes out version numbers of the program and Python/Data libraries used. Please use this option when
reporting bugs.
You must specify exactly one molecular input argument, so if you provide a SMILES string you cannot also provide an
input CIF file.
'SMILES'
SMILES string input. The SMILES string should be given in single quotes to avoid mangling by the shell, for instance:
grade2 'C(=O)OH'
-P PDB_ID, --PDB_ligand PDB_ID
Downloads information for the given PDB chemical component id (also known as the residue name or 3-letter code)
from PDBe or RCSB PDB. Please see the section above for more details.
Use the filename IN_FILE for the input molecule. Please see the section above for more details, including supported
file formats.
-L ID, --lookup ID
Use an external script to lookup the molecule with ID in an external database. Please see the section above and
https://gitlab.com/gphl/grade2_lookup_scripts for more details.
The --resname option sets the output PDB chemical component id (aka residue name or 3-letter code) to the string
specified by PDB_ID. Note that using --resname will normally alter the output filenames. The default PDB_ID code
is LIG unless the code is available from the input (for instance, if the -P PDB_ID, --PDB_ligand PDB_ID option has
been used).
Please see the FAQ What are the Grade2/BUSTER restrictions on residue name? for more information.
-o OUT_ROOT, --out OUT_ROOT
Output files produced will have filenames starting with this string. The actual filenames will be formed of the specified
OUT_ROOT with an appropriate extension (see the Grade2 outputs chapter for more details), for instance the restraint
dictionary CIF file will be called OUT_ROOT.restraints.cif.
If --out is not specified, by default output filenames will start with LIG., where LIG is the PDB_ID that can be set by
the --resname or --PDB_ligand options.
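The naming rule for the restraint dictionary can be summarised in a few lines of shell, which may help when scripting around grade2 runs. This is an illustrative sketch only: restraints_cif_name is a hypothetical function, it models just the rule described above, and it ignores the --ocif option and any code taken from the input.

```shell
# Hypothetical helper (not part of Grade2): predict the restraint
# dictionary filename from the documented rule.
#   $1  the OUT_ROOT given to --out (empty if --out was not used)
#   $2  the PDB_ID set by --resname or --PDB_ligand (empty if not set)
restraints_cif_name() {
    # OUT_ROOT wins if given; otherwise the PDB_ID; otherwise the default LIG.
    _root=${1:-${2:-LIG}}
    printf '%s.restraints.cif\n' "$_root"
}

# Usage:
#   restraints_cif_name "" CFF       # CFF.restraints.cif
#   restraints_cif_name run1 CFF     # run1.restraints.cif
```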
The --ocif OUT_CIF option sets the full filename for the CIF restraint dictionary to the user-specified string OUT_CIF.
This option can be used to exactly control the filename for the restraint dictionary including its file type. For instance,
using --ocif ../ligand_ABC.dic will result in the restraint dictionary being written to a file ligand_ABC.dic in
the directory above the current working directory.
Please note that the --ocif option overrides the -o/--out option. Furthermore, the --ocif option has no effect on
the filename for other output files (if any). Consequently, it is recommended that it is used with the --just_cif option.
-f, --force_overwrite
By default grade2 will not overwrite existing files, instead exiting with an error message. Use the
--force_overwrite option (or the -f short option) to force overwriting existing files.
-j, --just_cif
By default grade2 writes a number of output files (see the Grade2 outputs chapter). The --just_cif option will
cause grade2 to write only the CIF-format restraint dictionary. It turns off the production of all other (PDB, SDF,
MOL2 & SVG) files.
-s, --shelx
Produce SHELX restraint .dfix format output files. If --shelx is specified, two additional output files will be created
with the extensions .dfix and .with_hydrogen.dfix. The former file excludes restraints involving hydrogen
atoms.
-N, --no_charging
Use the --no_charging option to turn off the standard charging scheme that modifies groups likely to be charged at
pH 7. For instance, the standard charging scheme alters a neutral carboxylic acid to a carboxylate ion and a neutral
phosphoric acid to a phosphate ion. For more detail see the Charging chapter.
Note that the --no_charging option leaves the input molecule unchanged. So if the input molecule
has a charged group then this will NOT be altered by the --no_charging option. If you want to model a ligand
with a protonation state that differs from the standard charging scheme then use manual editing with Mercury as
demonstrated by the FAQ How can I produce restraints for a ligand with a different protonation state or tautomer?.
-e, --ecloud
The --ecloud option now specifies that the ideal xyz coordinates will use the electron-cloud distances for bonds to
hydrogen atoms rather than nuclear distances.
Note that in the first public release 1.0.0 of Grade2 the --ecloud option specified that bond restraints
to hydrogen atoms were set to electron-cloud distances that are adequate for X-ray refinement. From release 1.1.0,
Grade2 produces CIF restraint dictionaries containing both electron-cloud and nucleus X-H bond restraints, avoiding
the requirement of separate restraint dictionaries for the two use cases. The --ecloud option is retained with the
narrower effect on just the ideal xyz coordinates.
-c, --chirality_both
Use the --chirality_both option if you are not certain of the chiral configuration of the input molecule. The
--chirality_both option sets the volume of all chiral restraints identified to "both" to allow for cases of ambiguous
stereochemistry.
Note that the --chirality_both flag is not needed if starting from a non-stereo SMILES as restraints will then
automatically be set to "both".
--chiral_non_carbon
Grade2 by default only places chiral configuration restraints on chiral tetrahedral carbon atoms. Use the
--chiral_non_carbon option to also place chiral restraints on nitrogen, phosphorus and sulfur atoms.
It is noteworthy that chiral centres at nitrogen atoms can often rapidly interconvert, for example ammonium ions are
not regarded as chiral unless they are quaternary (see Athabasca University Chemistry 350 Organic Chemistry I: 5.10:
Chirality at Nitrogen, Phosphorus, and Sulfur). Using --chiral_non_carbon can introduce undesirable chiral re-
straints for ammonium ions and phosphates (such as ATP). We advise you to use this option cautiously for cases where
you are sure of the chiral configuration.
-b, --big_planes
-4, --4_atom_planes
Instead of creating a single plane restraint for each flat 5/6-atom ring, produce 5 or 6 separate four-atom planes around
that ring. In practice, using this option has little effect on refinement results. The --4_atom_planes option is included
for testing, as separate four-atom plane restraints were used by both Grade and the first Grade2 release 1.0.0.
--eh99_sigma_correction
Scales up all non-hydrogen bond and angle sigmas to match the mean sigma values of the EH99 amino acid restraints used by
BUSTER. Please see the Comparing Grade2 and EH99 Restraints for amino acid side chains chapter for background
and details of the option.
The full name of the ligand can be set using the --name option. Ideally, the full name should be human-readable, for
example, "retinoic acid". The name will be shown in buster-report output. You should use quotation marks if the full
name contains a space, for example:
By default, the full name will be set to the InChIKey for the molecule, unless a name is already known, for instance for
PDB ligands.
The --systematic option allows the systematic (IUPAC) name of the molecule to be specified. The systematic name
provided will be included in the output CIF restraint dictionary using the _pdbx_chem_comp_identifier data category.
It is optional to specify the name and version of the program used to find the systematic name.
For example, --systematic "2-acetyloxy-4-iodobenzoic acid" specifies just the systematic name,
without recording the program details. Note the use of the double quotation marks as the systematic name contains a
space. To record the program used and its version, simply add them after the systematic name. For example, --systematic
"2-acetyloxy-4-iodobenzoic acid" ACD/Name v2021 will result in the following CIF records in the output
restraint dictionary:
_pdbx_chem_comp_identifier.comp_id LIG
_pdbx_chem_comp_identifier.type "SYSTEMATIC NAME"
_pdbx_chem_comp_identifier.program ACD/Name
_pdbx_chem_comp_identifier.program_version v2021
_pdbx_chem_comp_identifier.identifier "2-acetyloxy-4-iodobenzoic acid"
--pubchem_names
The --pubchem_names option performs an online search for the ligand in the PubChem database
https://pubchem.ncbi.nlm.nih.gov/. If the option is activated and the molecule is found then the PubChem title is used
for the full name of the ligand and the systematic name is set to the PubChem IUPAC name. The PubChemPy package is used
to make most of the lookups.
Warning: The --pubchem_names option involves uploading the SMILES string of the molecule to the PubChem
server and so should not be used for confidential ligands.
The online search involves uploading the SMILES string of the molecule to PubChem. For this
reason, the --pubchem_names option should not be used for confidential ligands. To be extra
careful, the --pubchem_names option is deactivated by default until the environment variable
BDG_GRADE2_PUBCHEM_NAMES_ON_ACCEPT_SMILES_TO_WEB is set. If the option is specified without
activation then Grade2 will terminate with an error message.
To activate the --pubchem_names option, if you use a bash, ksh or dash shell:
$ export BDG_GRADE2_PUBCHEM_NAMES_ON_ACCEPT_SMILES_TO_WEB="yes"
If you are happy for --pubchem_names to be permanently enabled for all users of grade2 at your site then please see
the Advanced Configuration section.
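If grade2 is launched from a Python script rather than an interactive shell, the activation variable can be limited to a single child process. The following is a minimal sketch under that assumption; the helper name env_with_pubchem_names is hypothetical and only the environment variable name and the value "yes" come from the text above.

```python
import os

# Name of the activation variable, as given in the documentation above.
PUBCHEM_ENV_VAR = "BDG_GRADE2_PUBCHEM_NAMES_ON_ACCEPT_SMILES_TO_WEB"


def env_with_pubchem_names(base_env=None):
    """Return a copy of the environment with the activation variable set."""
    env = dict(os.environ if base_env is None else base_env)
    env[PUBCHEM_ENV_VAR] = "yes"
    return env


# The calling process's own environment is left untouched.
child_env = env_with_pubchem_names()
print(child_env[PUBCHEM_ENV_VAR])
```

The returned dictionary can be passed as the env argument of subprocess.run when starting grade2 with --pubchem_names, so that the SMILES upload is enabled for that run only.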
--group GROUP
Set the CCP4-extension CIF item _chem_comp.group to GROUP. This item is used by CCP4 programs, like Coot,
when producing restraints to link monomers together. Grade2 automatically sets the _chem_comp.group to peptide
for amino acids both for PDB chemical components and while Setting atom IDs for amino acids. The item is also
automatically set for PDB chemical components that are saccharides (to pyranose or furanose).
The --group option can be used to manually set _chem_comp.group to any value. If the option is used it overrides
any automatically set value. Please note that for monomers to be connected properly it will also be necessary to set
appropriate atom IDs.
The --database_id option sets a corporate or database ID for the molecule and, optionally, other details for the
molecule. The ID should be the database identifier for the molecule, for example: 2083 (for PubChem) or DB01001 (for
DrugBank).
One or more additional optional arguments DB_NAME, URL and DETAILS can also be given (separated by spaces).
DB_NAME should be the name of the database (for example, PubChem or DrugBank). The URL should be a URL of a page
giving details of the ligand in the database (for instance, https://pubchem.ncbi.nlm.nih.gov/compound/2083).
DETAILS can be used for any other information (for example, "Corporate Compound Database - internal
access only").
The ID will be shown in buster-report output. Future reporting tools will display all the information.
As an example, when producing a restraint dictionary for the PDB component VIA information about the DrugBank
entry for Sildenafil from https://go.drugbank.com/drugs/DB00203 can be added:
Note how grade2 options can be abbreviated when there is no ambiguity with other options. The information provided
will be included in the output restraint CIF dictionary in the gphl_chem_comp_database CIF data category:
loop_
_gphl_chem_comp_database.comp_id
_gphl_chem_comp_database.id
_gphl_chem_comp_database.database
_gphl_chem_comp_database.url
_gphl_chem_comp_database.details
VIA VIA PDB https://www.rcsb.org/ligand/VIA "RCSB PDB"
For more information please see the section Database Information in output CIF Restraint Dictionary.
Please note that if you want to add information about more database entries then further --database_id options can
be specified. For instance to add information about the Wikipedia page:
-X, --no_extra
By default the output restraint dictionary CIF file will have many extra Grade2-specific items, for instance giving the
source of restraint values. Use the --no_extra option to turn off the extra Grade2-specific items.
--itype {cif,sdf,mol,mol2,smi}
Format for the --in input file, selected from allowed list. By default, the format is detected from the filename extension
and file contents (please see the section above for more details).
--rcsb
When the --PDB_ligand option is used, download first from the RCSB site https://files.rcsb.org/ligands/ rather than
from PDBeChem.
--diagram_stereo_label
This option will add a label indicating the stereo configuration for each stereo centre in the schematic 2D molecular
diagram SVG files. The small labels will be positioned adjacent to each stereo centre. The label will be (R) or (S)
if the stereo configuration is known. For stereo centres with unknown configuration a small label (?) will normally be
added. For CXSMILES strings with enhanced stereochemistry groups the label will be based on this information. For
an example, please see the schematic 2D molecular diagrams section.
--debug
The --debug option turns on debug-level terminal output. The STDOUT output written by Grade2 will then include
a large number of lines starting DEBUG:. These are not intended to be intelligible by end users but instead are useful to
the program developers. You should only use the --debug option if reporting problems with Grade2.
3.2.4 Optional arguments for setting atom IDs (aka atom names)
The --antecedent option is used to base the atom IDs and 2D coordinates on those from a related molecule. A filename
RELATED_RESTRAINTS_CIF for a CIF restraint dictionary of a related molecule must be provided. It is best if the
restraint dictionary is produced by Grade2 itself.
The option is demonstrated in the Atom Naming chapter.
This option is similar to --antecedent except that atoms are not required to have the same element. Where possible
atom IDs are altered so that the non-element part of matching atoms is maintained. So for example if atom CL24 is
matched to a fluorine atom it will be given the atom ID F24 (provided there is not another atom with that label).
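The renaming rule in the CL24 to F24 example can be illustrated with a short Python sketch. The function flexible_atom_id and its arguments are hypothetical names chosen for this illustration; the sketch follows only the rule stated above and is not Grade2's matching code.

```python
def flexible_atom_id(antecedent_id, old_element, new_element, used_ids):
    """Swap the element prefix of an atom ID while keeping the non-element part."""
    prefix = old_element.upper()
    if not antecedent_id.upper().startswith(prefix):
        # The ID does not begin with the element symbol, so there is no
        # non-element part to preserve.
        return None
    candidate = new_element.upper() + antecedent_id[len(prefix):]
    # The new ID is only usable if no other atom already carries that label.
    return None if candidate in used_ids else candidate


print(flexible_atom_id("CL24", "Cl", "F", {"C1", "N2"}))  # F24
```

Returning None when the label is taken leaves the choice of a fallback atom ID to the caller, since the text does not say what ID is assigned in that case.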
-R, --rdkit_canonical_atom_ids
Sets the atom IDs from the RDKit canonical SMILES order. This means you will get the same atom IDs regardless of the
input atom order.
This option is explained and demonstrated in the Atom Naming chapter.
--inchi_canonical_atom_ids
Set InChI-canonical atom IDs. These are "universal" but rather ugly.
This option is explained and demonstrated in the Atom Naming chapter.
--no_aa_labels
This option turns off recognizing amino acids and setting atom IDs to N CA C O OXT CB. Please see Setting atom IDs
for amino acids for more details.
--aa_loose
Extends setting atom IDs to "exotic" amino acids, such as N-modified and beta amino acids. Please see Setting atom
IDs for "exotic" amino acids for more details.
CHAPTER
FOUR
EXAMPLES
To produce restraints for a given SMILES string, run grade2 on the command line followed by the SMILES string
surrounded by single quotes: grade2 'SMILES', for example:
$ grade2 'Oc1ccccc1'
Please note that the dollar symbol $ above represents the command prompt. This will produce a CIF-format restraint
dictionary LIG.restraints.cif with the default PDB chemical component id (aka residue name or 3-letter code)
of LIG. To set the 3-letter code use the command-line option --resname, for instance if you wanted the 3-letter residue
name DRG:
To produce a restraint dictionary for a compound that already exists in the PDB it is best to use the --PDB_ligand
option. For example, to produce a restraint dictionary for Sildenafil that has the PDB chemical component ID VIA run:
This will produce output using the wwPDB Chemical Component Dictionary (CCD) compound record for VIA (see
https://www.rcsb.org/ligand/VIA for an overview). Grade2 will download the wwPDB CCD CIF file for the compound
from either PDBeChem: https://www.ebi.ac.uk/pdbe-srv/pdbechem/ or from Ligand Expo:
http://ligand-expo.rcsb.org/. The output restraint dictionary will be called VIA.restraints.cif and other files will be named VIA.*, see
the Grade2 outputs chapter. In practice, it is important to look at the Terminal Output produced as you will be warned
that there is a nitrogen atom that has been charged by adding a proton to it. If instead you wanted results for the original
uncharged molecule then use the --no_charging option:
The --force_overwrite option allows Grade2 to overwrite any existing VIA.* grade2 files. If you prefer less typing,
each of the options used has a short single-character alternative, so the above command is equivalent to:
$ grade2 -P VIA -f -N
In all cases, the output restraint dictionary will be called VIA.restraints.cif and other files will be named VIA.*,
see the Grade2 outputs chapter for more details.
Suppose you want to produce restraints for a compound from a chemical database (either public or corporate). Let us
take Sildenafil as an example case. The PubChem entry for Sildenafil is
https://pubchem.ncbi.nlm.nih.gov/compound/135398744. Given that PubChem provides reliable model 3D coordinates for its
compounds, it is most sensible to use the 3D SDF coordinates download option to download
Conformer3D_CID_135398744.sdf. grade2 can be run on the downloaded SDF file, using the --in option:
This command will result in outputs with the default PDB chemical component id (aka residue name or 3-letter code)
of LIG. It is normally best to use the command-line option --resname to set the 3-letter code, for instance here to S_L.
This will result in output files whose names start with S_L. followed by an extension. Suppose you wanted the output
files to be named 135398744.* instead; this can be achieved using the --out option:
Furthermore, as in this case we know both an appropriate name for the compound and a database ID, it is a good idea
to set these, using the --resname and --database_id options:
If you prefer less typing, each of the options used has a short single-character alternative, so the above command is
equivalent to:
In practice, it is important to look at the terminal output produced as you will be warned that there is already a wwPDB
compound definition for Sildenafil in output lines:
So it is best to use a restraint dictionary for the PDB chemical component VIA to ensure compatibility. Please see the
previous section that explains how to generate a restraint dictionary for VIA.
CHAPTER
FIVE
GRADE2 OUTPUTS
Running the grade2 command, as described in the Usage and Examples chapters, will result in outputs both to file(s)
and to the terminal. This chapter gives a guide as to what to expect.
grade2 writes out information about the restraint generation process to the terminal as it runs. This output is intended
to be intelligible and to give an indication that the restraint generation process is proceeding normally for the ligand
in question.
For example, generating a restraint dictionary for the PDB chemical component ID VIA (Sildenafil) running
produces an initial output giving copyright, authors and program version information, following the normal BUSTER
package convention:
############################################################################
## [grade2] ligand restraint dictionary generation
############################################################################
-----------------------------------------------------------------------------
This is followed by output lines saying where the PDB chemical component definition for VIA is collected from
and giving web URLs to get further information about VIA (https://www.rcsb.org/ligand/VIA and
https://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/VIA ).
This is followed by information that the VIA molecule has a nitrogen atom that will normally be charged at neutral pH
and a proton has been added to the molecule:
If this charging is not wanted then use the --no_charging command-line option. For more details on the charging
process see the Charging chapter.
A check is then made that the RDKit molecule generated for restraint production has an InChI that matches that from
the input file (if this is available). InChI is short for International Chemical Identifier and provides a way to quickly
check that the stereochemistry of molecules matches. In this case, the output indicates that there is a match other than
for the protonation layer, as would be expected given the charging:
RDKit molecule generated has the same InChIKey as the other than the last protonation character.
---- This indicates that the stereochemistry matches other than the change caused by charging.
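The kind of comparison reported in these output lines can be sketched in Python as below. This is only an illustration of comparing two InChIKeys while ignoring the final protonation character; the function name is hypothetical, the two keys are made-up placeholders rather than real identifiers, and Grade2's actual check may differ.

```python
def same_apart_from_protonation(key_a, key_b):
    """True if two InChIKeys agree in everything except the final character."""
    return key_a[:-1] == key_b[:-1]


# Placeholder keys (not real molecules) that differ only in the last character.
input_key = "AAAAAAAAAAAAAA-BBBBBBBBSA-N"
charged_key = "AAAAAAAAAAAAAA-BBBBBBBBSA-O"

print(input_key == charged_key)                             # False
print(same_apart_from_protonation(input_key, charged_key))  # True
```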
For all input sources, a check is made comparing the InChIKey of the RDKit molecule generated for restraint production
with those for known PDB components (from the wwPDB Chemical Component Dictionary
https://www.wwpdb.org/data/ccd). In this case, the CHECK produces the expected result: the molecule matches component
VIA (apart from the charging):
The check is most important when using a molecule from a SMILES string or file input that matches an
existing PDB component (see example of this). If there is a match then it normally makes sense to use the restraint
dictionary for the matching PDB component (see FAQ on matching components).
The checks are followed by information about the progress of restraint-generation including the force field used, the
Mogul version and the final geometry optimization:
The final part of the terminal output gives information about the output files produced and suggestions as to commands
to view the results:
grade2 follows standard Unix (and BUSTER) practice with normal output being written to STDOUT and errors to
STDERR. This means that redirection or pipe/tee can be used to capture the output to a file (see How do I save terminal
output to a file? for a guide to the many ways to do this).
The CIF-format restraint dictionary file is the principal output of Grade2. The file lists the restraints generated as well
as important run-related information. The CIF-format restraint dictionary produced by Grade2 can be used with
BUSTER refine, Rhofit, buster-report and the EditREFMAC restraint editor. In addition, it can be used with Coot
and should work with other third-party refinement programs. Please let us know of any compatibility issues you find.
The CIF-format restraint dictionary standard used by Grade2 is currently rather loosely set by what is understood by
REFMAC and Coot, and has many items not set in the official PDBx/mmCIF Dictionary. Grade2-specific extensions
are stored as data categories with name starting _gphl_ for instance _gphl_chem_comp_info. The command-line
option --no_extra can be used to turn off Grade2-specific CIF categories and items.
A Grade2 CIF-format restraint dictionary will always contain a chem_comp_atom category that defines the atoms of the
ligand. Take, for example, two atoms extracted from the restraint dictionary for the charged version of PDB component
VIA (sildenafil):
loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
_chem_comp_atom.type_symbol
_chem_comp_atom.type_energy
_chem_comp_atom.partial_charge
_chem_comp_atom.charge
_chem_comp_atom.x
_chem_comp_atom.y
_chem_comp_atom.z
VIA C18 C CH2 0.091 0 1.888 2.510 -4.235
VIA N17 N NT1 -0.335 1 0.943 3.489 -3.615
Note that some items in the category do not follow the official PDBx/mmCIF Dictionary chem_comp_atom definitions.
Each atom in the molecule is identified by an atom ID (aka atom name) assigned in the chem_comp_atom.atom_id
item. Atom IDs must be unique within a particular ligand and are used to define the atoms in each of the restraints.
Grade2 has a number of options to set atom IDs, as described in the Atom Naming chapter.
The chem_comp_atom.type_symbol item provides an upper case version of the atom's element.
The chem_comp_atom.type_energy item is a widely-used extension to the official PDBx/mmCIF Dictionary giving
an atom type as defined in the CCP4 suite file $CCP4/lib/data/monomers/ener_lib.cif. The type_energy is
used by BUSTER to set up non-bonded contacts, allowing atoms that can form hydrogen bonds to get closer than normal
non-bonded contacts.
Note that formal atomic charges are given as item _chem_comp_atom.charge as these are important in unambiguously
defining the chemistry of a ligand. In the example above, nitrogen atom N17 of VIA is assigned a formal charge of
+1 after the piperazine is protonated (see the VIA example in the Charging chapter). The now-obsolete program Grade
fails to provide formal atomic charge information, making it difficult to use Grade restraint dictionaries as an input to
Grade2, see FAQ: on using Grade input.
Partial atomic charges are given in addition to the formal charges. The partial charges are calculated by the Gasteiger
and Marsili (1980) method as implemented in the RDKit ComputeGasteigerCharges function. Please note that there are many
ways of calculating partial charges and so care needs to be taken that they are suitable before using them for any given
application.
Cartesian coordinates for each atom are given in the CIF items chem_comp_atom.x, chem_comp_atom.y, and
chem_comp_atom.z. These CIF items do not comply with the PDBx/mmCIF Dictionary chem_comp_atom standard
but are widely used. The Cartesian coordinates are "ideal", as described below. In the Coot program, the conformation
described by the coordinates can be retrieved by first importing the CIF dictionary, then by using either the File ...
Get Monomer option or the Calculate ... Modelling >>> Monomer from Dictionary option. The ideal coordinates are
also used by the Rhofit ligand fitting program.
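As an illustration of how the chem_comp_atom items fit together, the following Python sketch reads the two-atom excerpt shown above and sums the formal charges. It is a deliberately minimal reader for one simple loop_ block, written for this example only; the function name parse_loop is hypothetical, and a real restraint dictionary should be read with a proper CIF library.

```python
import shlex

# The two-atom excerpt from the VIA restraint dictionary shown above.
ATOM_LOOP = """\
loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
_chem_comp_atom.type_symbol
_chem_comp_atom.type_energy
_chem_comp_atom.partial_charge
_chem_comp_atom.charge
_chem_comp_atom.x
_chem_comp_atom.y
_chem_comp_atom.z
VIA C18 C CH2 0.091 0 1.888 2.510 -4.235
VIA N17 N NT1 -0.335 1 0.943 3.489 -3.615
"""


def parse_loop(text):
    """Parse one simple CIF loop_ block into a list of row dictionaries."""
    tags, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_":
            continue
        if line.startswith("_"):
            tags.append(line)  # item name
        else:
            # shlex keeps quoted values containing spaces together.
            rows.append(dict(zip(tags, shlex.split(line))))
    return rows


atoms = parse_loop(ATOM_LOOP)
net_charge = sum(int(atom["_chem_comp_atom.charge"]) for atom in atoms)
print(net_charge)  # formal charge summed over the two atoms in the excerpt
```

The sum here covers only the two atoms of the excerpt; applied to a complete atom table, the same sum would give the net formal charge of the ligand.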
A Grade2 CIF-format restraint dictionary will contain a chem_comp_bond category giving information about each of
the bonds that join the atoms of the ligand (except for ligands that are monoatomic). For example the following defines
the first two bonds extracted from the restraint dictionary for the charged-version of PDB component VIA (sildenafil):
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.type
_chem_comp_bond.aromatic
_chem_comp_bond.value_dist_nucleus
_chem_comp_bond.value_dist_nucleus_esd
_chem_comp_bond.value_dist
_chem_comp_bond.value_dist_esd
_chem_comp_bond.source_value_dist_nucleus
_chem_comp_bond.source_value_dist_nucleus_esd
_chem_comp_bond.source_value_dist
_chem_comp_bond.source_value_dist_esd
VIA C34 C33 single n 1.513 0.033 1.513 0.033 Mogul_mean_1207_hits Mogul_sd Mogul_mean_1207_hits Mogul_sd
VIA C34 H341 single n 1.093 0.020 0.979 0.015 MMFF94s_equilibrium default_to_H ecloud ecloud
• chem_comp_bond.comp_id lists the chemical component ID (aka residue name) of the ligand, in this
case VIA.
• chem_comp_bond.atom_id_1, & .atom_id_2: each bond joins two atoms identified by their atom IDs in items
chem_comp_bond.atom_id_1 and chem_comp_bond.atom_id_2. The atom IDs must appear in the preceding
chem_comp_atom table.
• chem_comp_bond.type gives the order of the bond and is one of single, double or triple. Note that the bonds
in aromatic groups are assigned alternating single and double types by the RDKit Kekulize method. The bond
type is also used for 2D schematic pictures and will be displayed in Coot.
• chem_comp_bond.aromatic: the aromatic item is set to y or n depending on whether the bond is assigned to
be aromatic by RDKit. The RDKit book section on aromaticity provides a description of the approach taken and
starts with an instructive paragraph:
"Aromaticity is one of those unpleasant topics that is simultaneously simple and impossibly complicated.
Since neither experimental nor theoretical chemists can agree with each other about a definition,
it’s necessary to pick something arbitrary and stick to it. This is the approach taken in the RDKit."
For this reason it is important for downstream procedures not to rely on the aromatic item. It is likely to vary
between different programs and there is no "correct" definition. The aromatic item is provided for consistency
with older programs and it would have been better if it had not been adopted in the past.
• chem_comp_bond.value_dist_nucleus, & .value_dist: the items value_dist_nucleus and value_dist
both give an ideal length for the bond in Å. For bonds that do not involve a hydrogen atom they will have
an identical value. For bonds to hydrogen atoms, the value_dist_nucleus gives the normal bond distance
between the nuclei of the two atoms, whereas the value_dist gives the shorter bond length that is suitable for
X-ray refinement (Stewart et al., 1965).
The value_dist_nucleus is used to define a harmonic restraint for the bonds, as $b_{ideal}$ in the formula below.
\[ V_{bond} = W_{bond} \sum_{bonds} \left( \frac{b - b_{ideal}}{\sigma} \right)^{2} \]
where $b$ is the actual bond length and $\sigma$ the estimated standard deviation (the next item).
• chem_comp_bond.value_dist_nucleus_esd, & .value_dist_esd: the items value_dist_nucleus_esd
and value_dist_esd provide the "estimated standard deviation" of value_dist_nucleus and value_dist
in Å. This parameter is also known as the "standard uncertainty" or the "sigma" (𝜎 in the equation above). Values
are taken from the standard deviation of the bond length distribution found from small molecule crystal structures,
when these are available.
• chem_comp_bond.source_value_dist_nucleus, .source_value_dist_nucleus_esd,
.source_value_dist, & .source_value_dist_esd: these source items provide information about the
source of each of the parameters defining the restraint. These items have been introduced by the Grade2 program as
data provenance is important in any scientific study, and we think it is important to record where restraints come
from. If the extra items are a problem, then use the command-line option --no_extra to turn off Grade2-specific
CIF categories and items.
When available Grade2 will base ideal bond lengths on values from the Mogul tool analysis of CSD
small molecule X-ray crystal structures. In these cases the source item will start Mogul. For example
Mogul_mean_1207_hits shows that a parameter is taken from the mean of a distribution of 1207 values from
relevant CSD structures.
If Mogul cannot be used to obtain a value, for instance for all parameters involving hydrogen atoms, then a value
will be obtained from a force field. For example, a source MMFF94s_equilibrium shows that the value is an
equilibrium length obtained from the RDKit implementation of the MMFF force field (Tosco et al., 2014).
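To make the harmonic bond restraint formula given above concrete, here is a minimal Python sketch of the penalty term. The function name bond_penalty, the default weight of 1.0 and the observed bond length of 1.546 Å are assumptions made for illustration; only the ideal length and sigma for the C34-C33 bond are taken from the example rows.

```python
def bond_penalty(bonds, weight=1.0):
    """Weighted sum over bonds of ((b - b_ideal) / sigma) ** 2.

    Each entry of bonds is a tuple (b, b_ideal, sigma) in Å.
    """
    return weight * sum(((b - ideal) / sigma) ** 2 for b, ideal, sigma in bonds)


# C34-C33: ideal 1.513 Å with sigma 0.033 Å (from the example above).
# The observed length of 1.546 Å is hypothetical: exactly one sigma too long.
penalty = bond_penalty([(1.546, 1.513, 0.033)])
print(round(penalty, 6))  # 1.0
```

Because the deviation is divided by sigma before squaring, a bond with a small sigma is penalised much more strongly for the same absolute deviation than one with a large sigma.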
The chem_comp_angle CIF category defines restraints on bond angles. For example the following defines the two
bond angles extracted from the restraint dictionary for the charged-version of PDB component VIA (sildenafil):
_chem_comp_angle.comp_id
_chem_comp_angle.atom_id_1
_chem_comp_angle.atom_id_2
_chem_comp_angle.atom_id_3
_chem_comp_angle.value_angle
_chem_comp_angle.value_angle_esd
_chem_comp_angle.source_value_angle
_chem_comp_angle.source_value_angle_esd
VIA H342 C34 H343 108.6 3.0 MMFF94s_optimised_coords default
VIA C34 C33 C32 112.6 2.8 Mogul_mean_528_hits Mogul_sd
• chem_comp_angle.comp_id lists the chemical component ID (aka residue name) of the ligand, in this
case VIA.
• chem_comp_angle.atom_id_1, .atom_id_2, & .atom_id_3 give the atom IDs for the 3 atoms that form the
bond angle.
• chem_comp_angle.value_angle is the ideal or target angle of the restraint in degrees.
• chem_comp_angle.value_angle_esd is the estimated standard deviation, also known as standard uncertainty
or sigma for the restraint in degrees.
• chem_comp_angle.source_value_angle, & .source_value_angle_esd: these source items provide information about the
source of each of the parameters defining the restraint. Please see the chem_comp_bond.source* documentation above for
details.
When Mogul information is unavailable for a particular bond angle, then ideal angles are based on a force field.
For instance, above, the bond angle involving hydrogen atoms has a source MMFF94s_optimised_coords. This
means the "ideal" angle is based on the angle found after the ligand has been energy minimised with the RDKit
implementation of the MMFF force field (Tosco et al., 2014).
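The source items are plain strings, so simple provenance statistics can be gathered with a few lines of Python. The sketch below extracts the number of Mogul hits from tags of the form seen in the examples; the function name mogul_hits is hypothetical and the pattern is inferred from those examples only, so other source tags may need different handling.

```python
import re


def mogul_hits(source):
    """Return the hit count from a tag like 'Mogul_mean_528_hits', else None."""
    match = re.fullmatch(r"Mogul_mean_(\d+)_hits", source)
    return int(match.group(1)) if match else None


for tag in ("Mogul_mean_528_hits", "Mogul_mean_1207_hits", "MMFF94s_optimised_coords"):
    print(tag, mogul_hits(tag))
```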
Planar restraints are specified in the chem_comp_plane_atom category. A separate line is used for each atom involved
in the plane (which can involve many atoms). For example, here are 2 of the 18 planes that Grade2 produces for the
charged version of PDB component VIA (sildenafil):
_chem_comp_plane_atom.comp_id
_chem_comp_plane_atom.plane_id
_chem_comp_plane_atom.atom_id
_chem_comp_plane_atom.dist_esd
_chem_comp_plane_atom.source
VIA atom-C4 C9 0.02 Mogul_sum_angles_362
VIA atom-C4 C4 0.02 Mogul_sum_angles_362
VIA atom-C4 C5 0.02 Mogul_sum_angles_362
VIA atom-C4 O3 0.02 Mogul_sum_angles_362
VIA ring5A C30 0.005 Mogul+_ring_tors_rmsd_0.6_56_hits
VIA ring5A N29 0.005 Mogul+_ring_tors_rmsd_0.6_56_hits
VIA ring5A N28 0.005 Mogul+_ring_tors_rmsd_0.6_56_hits
VIA ring5A C24 0.005 Mogul+_ring_tors_rmsd_0.6_56_hits
VIA ring5A C25 0.005 Mogul+_ring_tors_rmsd_0.6_56_hits
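Because each row names a single atom, the rows have to be grouped by plane ID to recover the atoms of each plane. A minimal Python sketch of that grouping, using the rows above, follows; the function name planes_from_rows is hypothetical and the code assumes the five whitespace-separated columns shown in this example.

```python
from collections import defaultdict

# Data rows of the chem_comp_plane_atom example above.
PLANE_ROWS = """\
VIA atom-C4 C9 0.02 Mogul_sum_angles_362
VIA atom-C4 C4 0.02 Mogul_sum_angles_362
VIA atom-C4 C5 0.02 Mogul_sum_angles_362
VIA atom-C4 O3 0.02 Mogul_sum_angles_362
VIA ring5A C30 0.005 Mogul+_ring_tors_rmsd_0.6_56_hits
VIA ring5A N29 0.005 Mogul+_ring_tors_rmsd_0.6_56_hits
VIA ring5A N28 0.005 Mogul+_ring_tors_rmsd_0.6_56_hits
VIA ring5A C24 0.005 Mogul+_ring_tors_rmsd_0.6_56_hits
VIA ring5A C25 0.005 Mogul+_ring_tors_rmsd_0.6_56_hits
"""


def planes_from_rows(text):
    """Group atom IDs by plane ID, preserving the order of the rows."""
    planes = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        comp_id, plane_id, atom_id, dist_esd, source = line.split()
        planes[plane_id].append(atom_id)
    return dict(planes)


planes = planes_from_rows(PLANE_ROWS)
for plane_id, atom_ids in planes.items():
    print(plane_id, atom_ids)
```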
• chem_comp_plane_atom.comp_id lists the chemical component ID (aka residue name) of the ligand,
in this case VIA.
• chem_comp_plane_atom.plane_id provides an ID for the plane in question. For the two planes in the example
the plane IDs are atom-C4 and ring5A. Grade2 uses descriptive plane IDs where possible. In the example,
atom-C4 is a plane that holds atom C4 flat together with the three atoms to which it is bonded (C9, C5 and O3). Plane
ring5A is a plane that holds a five-membered ring in the ligand flat (in this case the pyrazole ring in VIA). If
there is a second five-membered ring in the ligand that is flat then that ring would be assigned the ID ring5B.
A plane ID that starts 2fold- is used for torsion angles that Grade2 assigns to be flat but where there is no
preference as to whether the torsion is predominantly 0º or 180º. Note that such a group will be held planar by a
plane restraint rather than a 2-fold torsion angle as some programs, such as Coot, do not activate torsion angle
restraints by default (and there can be differences in handling non-bonded contacts between the atoms involved).
A plane ID that starts trans- or cis- is used for a plane restraint on a torsion angle that Mogul indicates has a
strong preference to be around 180º or 0º respectively. The plane restraint itself imposes no preference for either
conformation, but a corresponding 1-fold torsion is defined that is normally inactive.
If you ever edit a restraint dictionary to introduce your own plane definitions, it should be noted that some
programs have an 8-character limit for the plane_id.
• chem_comp_plane_atom.atom_id lists the atom ID for an atom within the plane (that is defined on multiple
lines).
• chem_comp_plane_atom.dist_esd is the estimated standard deviation, also known as standard uncertainty or
sigma, for the restraint in Ångstroms. The plane restraint provides a harmonic penalty forcing atoms towards the
mean plane formed by the atoms. The dist_esd determines the stiffness of the restraint. The previous Grade
program and many other restraint generation tools use a sigma of 0.02Å for all planes. Grade2 goes beyond this,
assigning values of sigma depending on the tightness of distributions from Mogul + custom ring analysis. Please
see the Treatment of Planar Groups chapter for more information.
In the example above, the plane ID atom-C4 that holds atom C4 is assigned the default sigma of 0.02Å. In
contrast, the plane ring5A that holds the pyrazole ring flat is assigned a tighter sigma of 0.005Å.
• chem_comp_plane_atom.source is a Grade2 extra item that provides some source information as to
why the plane was assigned. If the extra items are a problem, then use the command-line option --no_extra to
turn off Grade2-specific CIF categories and items.
In the example above, the plane ID atom-C4 has a source Mogul_sum_angles_362. Atom C4 is assigned to be
planar because it is bonded to 3 other atoms and the sum of ideal angles for the 3 bond angle restraints with C4
as a central atom is 362º. All three bond angle restraints are from Mogul distributions. Currently, if the sum of
angles from Mogul is above 356º then a plane restraint is added, with the default sigma of 0.02Å.
For the pyrazole ring plane ID ring5A the source is Mogul+_ring_tors_rmsd_0.6_56_hits. The plane is
assigned from Mogul + custom ring analysis from 56 Mogul hits. The ring torsions of the hits have a root mean
squared deviation from zero of 0.6º. This means that the rings are very flat. The sigma for the plane restraint is
set at the limit of 0.005Å.
Planes holding atoms bonded to one or more hydrogen atoms are normally set from the RDKit
implementation of the MMFF force field (Tosco et al., 2014). Such a plane will have a source item like
MMFF_out_of_plane_koop_0.015, meaning that the atom is held planar on the basis of the MMFF
out-of-plane term.
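The sum-of-angles rule described for the atom-C4 plane can be illustrated with a minimal Python sketch. The function name and the example angle values are hypothetical; only the 356º threshold and the restriction to atoms with three bonded neighbours come from the text above, and the sketch is not Grade2's actual code.

```python
def is_planar_by_angle_sum(ideal_angles_deg, threshold_deg=356.0):
    """Return True if the three ideal bond angles centred on an atom
    sum to more than the threshold (356 degrees in the text above)."""
    if len(ideal_angles_deg) != 3:
        raise ValueError("expected the three bond angles centred on the atom")
    return sum(ideal_angles_deg) > threshold_deg


# Hypothetical ideal angles that sum to 362 degrees, like atom C4 in VIA.
print(is_planar_by_angle_sum([120.5, 121.0, 120.5]))  # True
# A tetrahedral-like centre falls well below the threshold.
print(is_planar_by_angle_sum([109.5, 109.5, 109.5]))  # False
```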
Here are some of the 33 torsion restraints that Grade2 produces for the charged version of PDB component VIA
(sildenafil):
_chem_comp_tor.comp_id
_chem_comp_tor.id
_chem_comp_tor.atom_id_1
_chem_comp_tor.atom_id_2
_chem_comp_tor.atom_id_3
_chem_comp_tor.atom_id_4
_chem_comp_tor.value_angle
_chem_comp_tor.value_angle_esd
_chem_comp_tor.period
_chem_comp_tor.source
VIA CONST_ring6B-6 C9 C8 C7 C6 0.0 1000000.0 0 planar_ring
VIA puck_ring6C-1 C19 C18 N17 C16 -60.0 12.3 3 Mogul+_pucker_tors_rmsd_57.1_53_hits
• chem_comp_tor.comp_id lists the chemical component ID (aka residue name) of the ligand, in this
case VIA.
• chem_comp_tor.id provides an ID for the torsion angle.
A torsion ID that starts CONST_ is used for torsions within planar rings. No active restraint is placed on such
torsions.
For six-membered saturated rings, such as the piperazine group in VIA, the chem_comp_tor.id of the six
ring torsion angles will start with puck_ring6. These are 3-fold torsion restraints with minima at +60º and -60º
(the 180º minimum is irrelevant because of the ring closure).
If Grade2 judges that a torsion angle should have an active 3-fold torsion restraint the chem_comp_tor.id will
start 3fold (and chem_comp_tor.period will be set to 3).
Grade2 does not impose active 2-fold or 1-fold torsion restraints, instead using a plane restraint to hold the atoms
planar. In such a case the chem_comp_plane_atom.plane_id and chem_comp_tor.id will be consistent.
2-fold torsions will have an ID starting 2fold-. 1-fold torsions will have an ID starting trans- or cis-. In all
these cases, the torsion restraint is inactivated by setting chem_comp_tor.value_angle_esd to a very large
value (1,000,000º).
• chem_comp_tor.atom_id_1, .atom_id_2, .atom_id_3, & .atom_id_4 give the atom IDs for the 4 atoms
that form the torsion angle.
• chem_comp_tor.value_angle is the ideal or target angle of the restraint in degrees. For 3-fold torsion angles
there will be two additional minima at ± 120º from this value.
• chem_comp_tor.value_angle_esd is the estimated standard deviation, also known as standard uncertainty
or sigma for the restraint in degrees. Inactive restraints are produced by setting the sigma to a very large value
(1,000,000º).
• chem_comp_tor.period is the periodicity of the restraint, that is the number of minima in the 360º range of the
angle.
• chem_comp_tor.source provides information about the source of the restraint. For example
Mogul+_pucker_tors_rmsd_57.1_53_hits says that a six-membered ring is set to have a restraint to
maintain its pucker at ± 60º, as the 53 CSD hits from Mogul+ analysis have a root mean square deviation from 0º
of 57.1º.
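To make the relationship between value_angle, period and the restraint minima concrete, here is a small Python sketch. The function names are hypothetical and this is not Grade2 or BUSTER code. It assumes minima are evenly spaced by 360º/period starting at value_angle, and that a restraint counts as inactive when its sigma equals the 1,000,000º value quoted above.

```python
INACTIVE_ESD = 1_000_000.0  # sigma used in the text to switch a torsion off


def torsion_minima(value_angle, period):
    """List the minima in degrees, wrapped into the range (-180, 180]."""
    if period < 1:
        # period 0 (as in the CONST_ example row) is treated as a single value
        return [float(value_angle)]
    step = 360.0 / period
    minima = []
    for k in range(period):
        angle = value_angle + k * step
        minima.append(180.0 - ((180.0 - angle) % 360.0))
    return sorted(minima)


def is_inactive(value_angle_esd):
    """True if the sigma is large enough to make the restraint inactive."""
    return value_angle_esd >= INACTIVE_ESD


# The puck_ring6C-1 row above: target -60 degrees, period 3.
print(torsion_minima(-60.0, 3))
# A trans- style 1-fold torsion has a single minimum.
print(torsion_minima(180.0, 1))
```

For the puck_ring6C-1 example this lists minima at -60º, +60º and 180º, matching the description of 3-fold restraints given for chem_comp_tor.id.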
Restraints controlling the configuration of chiral centres within a ligand are specified by the chem_comp_chir category.
It should be noted that this category differs markedly from the official chem_comp_chir and reflects the de facto standard
used by Libcheck and succeeding programs.
BUSTER reads chem_comp_chir but then controls chirality using a restraint on the improper torsion angle rather than
a chiral volume as explained in the GELLY documentation Appendix E: CHIRAL Restraints in gelly.
Here are the 4 chem_comp_chir records that Grade2 produces for the PDB component RIB:
_chem_comp_chir.comp_id
_chem_comp_chir.id
_chem_comp_chir.atom_id_centre
_chem_comp_chir.atom_id_1
_chem_comp_chir.atom_id_2
_chem_comp_chir.atom_id_3
_chem_comp_chir.volume_sign
_chem_comp_chir.source
RIB chir_01 C4 C5 O4 C3 negativ rdkit
RIB chir_02 C3 C4 O3 C2 negativ rdkit
RIB chir_03 C2 C3 O2 C1 negativ rdkit
RIB chir_04 C1 O4 C2 O1 negativ rdkit
• chem_comp_chir.comp_id lists the chemical component ID (aka residue name) of the ligand, in this
case RIB.
• chem_comp_chir.id provides an ID for the chiral centre. Grade2 uses IDs starting from chir_01.
• chem_comp_chir.atom_id_centre provides the atom ID for the chiral atom.
• chem_comp_chir.atom_id_1, .atom_id_2, & .atom_id_3 provide the atom IDs of three atoms that are
bonded to the chiral atom.
• chem_comp_chir.volume_sign specifies the chiral configuration of the centre. Possible values are positiv,
negativ and both. When there is a chiral centre whose configuration has not been set in the input, for example
a SMILES string that lacks stereo specification, the volume_sign is set to both.
• chem_comp_chir.source provides information about the source of the assignment.
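For readers who want to check a volume_sign against a set of coordinates, the sketch below computes the sign of the chiral volume. It assumes the commonly used definition in which the volume is the scalar triple product of the vectors from the central atom to atoms 1, 2 and 3, taken in the order listed. The function name and the coordinates are hypothetical, and this is not how BUSTER applies the restraint (it uses an improper torsion, as noted above).

```python
def chiral_volume_sign(centre, atom_1, atom_2, atom_3):
    """Return 'positiv' or 'negativ' from the sign of v1 . (v2 x v3)."""
    v1 = [a - c for a, c in zip(atom_1, centre)]
    v2 = [a - c for a, c in zip(atom_2, centre)]
    v3 = [a - c for a, c in zip(atom_3, centre)]
    cross = (
        v2[1] * v3[2] - v2[2] * v3[1],
        v2[2] * v3[0] - v2[0] * v3[2],
        v2[0] * v3[1] - v2[1] * v3[0],
    )
    volume = sum(a * b for a, b in zip(v1, cross))
    if volume == 0.0:
        raise ValueError("the four atoms are coplanar; the sign is undefined")
    return "positiv" if volume > 0.0 else "negativ"


# Hypothetical coordinates: neighbours placed along the x, y and z axes.
centre = (0.0, 0.0, 0.0)
print(chiral_volume_sign(centre, (1, 0, 0), (0, 1, 0), (0, 0, 1)))  # positiv
# Swapping two neighbours inverts the sign.
print(chiral_volume_sign(centre, (1, 0, 0), (0, 0, 1), (0, 1, 0)))  # negativ
```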
If available, the output CIF-format restraint dictionary will contain information as to the systematic name of the ligand.
The pdbx_chem_comp_identifier data category will be used. Systematic names for PDB ligands are automatically
obtained from the input PDB chemical component definition (if the ligand is charged by Grade2 then " (CHARGED)"
will be added). The --pubchem_names option can be used to do an online lookup of the systematic name for ligands that
occur in PubChem. The --systematic option allows the systematic name to be manually set. Currently, we are not aware
of any open source systematic chemical name programs but commercial programs to produce systematic names are
available from ACD/Labs, OpenEye and Chemaxon.
From Grade2 version 1.3.0, information about entries for the ligand in chemical databases is included in the output CIF
restraint dictionary. This information is held in the CIF data category gphl_chem_comp_database. For example, for
PDB chemical component VIA the output restraint dictionary will automatically have the information:
#
loop_
_gphl_chem_comp_database.comp_id
_gphl_chem_comp_database.id
_gphl_chem_comp_database.database
_gphl_chem_comp_database.url
_gphl_chem_comp_database.details
VIA VIA PDB https://www.rcsb.org/ligand/VIA "RCSB PDB"
VIA VIA PDB https://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/VIA PDBe
#
For PDB chemical components Grade2 automatically provides details to access RCSB PDB and PDBe pages. The
--pubchem_names option also automatically sets appropriate gphl_chem_comp_database records if a match is found.
Users can provide information about in-house databases using the --database_id option.
At the end of the restraint generation process, a geometry optimization of the coordinates of the molecule is made
with the gelly geometry-only minimizer. This produces a set of coordinates where the bond lengths, bond angles and
other terms are adjusted to be as close as possible to the "ideal" values. These coordinates are then used to output files
in a variety of formats. Please note that the "ideal" coordinates can be trapped at a local minimum.
5.3.1 PDB-format
PDB-format is a widely used exchange file format for proteins. Grade2 PDB-format ideal coordinates are
written using RDKit routines, and so have CONECT records giving the bond order that are recognized by some molecular
graphics programs (such as Jmol).
5.3.2 SDF-format
Please note that the SDF-format file will use Kekulé bonding (where aromatic bonds are marked with alternating single
and double bonds) whereas the MOL2-format uses CSD conventions for aromatic bonds.
5.3.3 MOL2-format
The MOL2-format file uses aromatic bonding following the CSD convention. This makes it suitable for running
additional Mogul geometry analysis.
As well as writing a CIF restraint dictionary and "ideal" coordinates files Grade2 will produce schematic 2D molecular
diagrams that can be useful. For instance, for the PDB component CFF caffeine, running grade2 --PDB_ligand CFF
will write two SVG files that can be visualized using a web-browser (such as Chrome):
1. CFF.diagram.svg
If the command line option --diagram_stereo_label is used, then small labels showing the stereo configuration of each
atom will be added to both diagrams. For example, for the PDB component SRT meso-tartaric acid, running grade2
--PDB_ligand SRT --diagram_stereo_label produces diagrams with (S) and (R) next to the chiral atoms.
SIX
CHARGING
Molecular Databases like the wwPDB Chemical Component Dictionary normally have neutral forms of compounds
that can be expected to be charged at neutral pH. For instance, carboxylic acids are protonated rather than forming a
carboxylate ion.
By default, Grade2 examines the input molecule and charges a number of common groups by adding or removing a
proton. If a group is charged then a WARNING message will be written to the terminal output (please see WARNING
about charging example output). If the charging is not wanted then use the --no_charging command-line option and
no alteration will be made to the input molecule.
Currently, Grade2 will charge the groups listed in the sections below.
If you want to model a ligand with a protonation state that is distinct from the standard charging scheme then use
manual editing with Mercury as demonstrated by the FAQ How can I produce restraints for a ligand with a different
protonation state or tautomer?.
A SMARTS pattern [$([OX2H1][CX3]=O)] is used to detect a neutral carboxylic acid and the proton is removed to
leave a carboxylate ion.
For example, running grade2 for the PDB component PPI https://www.rcsb.org/ligand/PPI propanoic acid:
A SMARTS pattern [$([OX2H1][PX4]=O)] is used to detect phosphoric acids with a hydrogen atom and the proton
is removed to leave a phosphate ion. If a phosphoric acid has multiple hydrogen atoms attached all are removed.
For example, running grade2 for the PDB component R1P https://www.rcsb.org/ligand/R1P ribose-1-phosphate:
The final molecule will be the ribose-1-phosphate dianion with both the protons removed:
A SMARTS pattern [$([NX3;H2][CX4])] is used to detect nitrogen atoms with 2 hydrogen atoms attached that are
also bonded to an SP3 carbon atom. A proton hydrogen atom is added in these cases.
For example, running grade2 for the PDB component 01R https://www.rcsb.org/ligand/01R:
Notice that the name of the proton hydrogen added is given. The result is a cation:
Notice that only the terminal -CH2-NH2 nitrogen atom is protonated, with the other nitrogen atoms left alone.
A SMARTS pattern [$([NX3;H1]([CX4])[CX4])] is used to detect nitrogen atoms with a hydrogen atom attached
that are also bonded to two SP3 carbon atoms. A proton hydrogen atom is added in these cases.
For example, running grade2 for the PDB component PIP https://www.rcsb.org/ligand/PIP piperidine:
Notice that the atom name of the proton hydrogen added is based on the atom name of the existing hydrogen. The
resulting molecule is a cation:
In more complex molecules all piperidine rings will be charged like this.
A SMARTS pattern [$([NX3]([CX4])([CX4])[CX4])] is used to detect nitrogen atoms with no hydrogen atom
attached that are also bonded to three SP3 carbon atoms. A proton hydrogen atom is added in these cases.
For example, running grade2 for the PDB component VIA https://www.rcsb.org/ligand/VIA Sildenafil:
Notice that the atom name of the proton hydrogen added is based on the atom name of the nitrogen. The resulting
molecule is a cation:
Notice that the other piperazine nitrogen atom is not protonated as it is attached to a sulfur atom. The small molecule
structure for Sildenafil citrate monohydrate (Yathirajan et al., 2005) CSD entry: FEDTEO shows that Grade2 protonates
the correct nitrogen atom.
If the input molecule matches more than one of the charging patterns above then each will be applied. For instance,
phosphotyrosine https://www.rcsb.org/ligand/PTR will be output with the amino acid in a zwitterionic form and with
the phosphate having two negative charges:
The warning messages show that in this case 3 charging patterns are applied:
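Separately from Grade2's own terminal output, the bookkeeping behind the charging rules in this chapter can be sketched in Python. The SMARTS strings are copied from the sections above. The group labels, the dictionary layout and the function name are hypothetical, and the match counts must be supplied by the user (for example from a cheminformatics toolkit), since this sketch does no substructure matching itself. It also assumes the input molecule is neutral.

```python
# label -> (SMARTS from this chapter, charge change per matched atom)
CHARGING_RULES = {
    "carboxylic acid": ("[$([OX2H1][CX3]=O)]", -1),
    "phosphoric acid OH": ("[$([OX2H1][PX4]=O)]", -1),
    "primary amine": ("[$([NX3;H2][CX4])]", +1),
    "secondary amine": ("[$([NX3;H1]([CX4])[CX4])]", +1),
    "tertiary amine": ("[$([NX3]([CX4])([CX4])[CX4])]", +1),
}


def net_charge_after_charging(match_counts):
    """Sum the charge changes for the given number of matches per rule."""
    total = 0
    for label, count in match_counts.items():
        _smarts, delta = CHARGING_RULES[label]
        total += delta * count
    return total


# Phosphotyrosine (PTR): one acid, two phosphate OH groups, one amine.
ptr = {"carboxylic acid": 1, "phosphoric acid OH": 2, "primary amine": 1}
print(net_charge_after_charging(ptr))  # -2
```

The result of -2 for PTR is consistent with a zwitterionic amino acid (net zero) plus a doubly charged phosphate.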
CHAPTER
SEVEN
In terms of obtaining good results when fitting/refining ligands in moderate-to-low resolution structures, probably the
most critical restraint term is that for planes. Imposing plane restraints when there should be none will often prevent
realistic fitting. Conversely, correctly identifying missing planes can reveal misfit ligands; for a good example of this
see Smart, O. S. and G. Bricogne (2015) and PDB entry 1PMQ/4Z9L.
For each ring, Grade2 analyses the CSD hits from Mogul to assess whether the ring is flat or puckered. This custom
ring analysis is an advance over the original Grade, where ring restraints were based on quantum chemical results and
heuristics. An advantage of the custom analysis is that for flat rings it allows 𝜎 (_chem_comp_plane_atom.dist_esd)
to be set based on the flatness distribution of the CSD hits.
For example, for PDB component DZ3 https://www.rcsb.org/ligand/DZ3 Grade2 produces planes with 𝜎 's obtained
from Mogul + custom CSD analysis:
Note that the two phenyl rings have plane 𝜎 set to 0.007 Angstroms. This is tighter than the 0.020 Angstroms used
for all planes in Grade.
Also notice the weak planes across the bonds (with a 𝜎 of 0.085 Angstrom) joining the phenyl rings to the amide
(marked in green). These are set because the CSD distributions for the torsion angles show a preference for planarity
but are broad. The restraints act to weakly encourage planarity but can easily be overcome if the electron
density fit warrants it.
The Grade2 --big_planes option produces large fused planes that overemphasize ring planarity. Using
--big_planes is generally a poor idea as it overemphasizes planarity, but the option is provided as some users
like planes to be kept very flat.
Historically protein crystallographers have tended to favour large single planar groups in refinement. Indeed, for
example, BUSTER currently uses a single plane restraint for the indole ring in tryptophan TRP.
Although aromatic rings are normally planar, under certain conditions they can be induced to adopt bent structures (see
for example [2.2]Paracyclophane in CSD structure DXYLEN13, and the FMN isoalloxazine ring in the high-resolution
PDB entry 2wqf). The default plane restraints produced by Grade2 are designed to allow rings to bend in refinement.
If the option --big_planes is specified, Grade2 will merge together all planes with three atoms in common if they are
"strong" planes. "Strong" planes have a sigma (_chem_comp_plane_atom.dist_esd) of 0.02 Angstroms or less. The 𝜎
of each fused plane is set to the lowest 𝜎 of any contributing plane. "Weak" planes (such as those in amide bonds) are
not merged when --big_planes is used.
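The merging rule just described can be illustrated with a short Python sketch. It is a guess at the logic from the description only, not Grade2's implementation. It assumes "three atoms in common" means at least three shared atoms, that merging is repeated until no further pair qualifies, and that weak planes are passed through unchanged. The function name, plane IDs and atom IDs are hypothetical.

```python
def merge_big_planes(planes, strong_sigma=0.02, min_shared=3):
    """planes maps plane_id -> (set of atom IDs, sigma).

    Returns (merged_strong, weak). Each merged entry is a tuple of
    (atoms, sigma, contributing plane IDs).
    """
    strong = [(set(atoms), sigma, [pid])
              for pid, (atoms, sigma) in planes.items() if sigma <= strong_sigma]
    weak = {pid: v for pid, v in planes.items() if v[1] > strong_sigma}
    merged_something = True
    while merged_something:
        merged_something = False
        for i in range(len(strong)):
            for j in range(i + 1, len(strong)):
                if len(strong[i][0] & strong[j][0]) >= min_shared:
                    # fuse plane j into plane i, keeping the lowest sigma
                    strong[i] = (strong[i][0] | strong[j][0],
                                 min(strong[i][1], strong[j][1]),
                                 strong[i][2] + strong[j][2])
                    del strong[j]
                    merged_something = True
                    break
            if merged_something:
                break
    return strong, weak


example = {
    "ring6A": ({"C1", "C2", "C3", "C4", "C5", "C6"}, 0.007),
    "atom-C1": ({"C1", "C2", "C6", "O1"}, 0.02),
    "2fold-01": ({"C4", "C7", "N1", "O2"}, 0.085),  # weak plane, left alone
}
big, weak = merge_big_planes(example)
print(len(big), sorted(big[0][0]), big[0][1], sorted(weak))
```

In this made-up example the ring plane and the atom-C1 plane share three atoms, so they fuse into one seven-atom plane with the tighter 𝜎 of 0.007, while the weak plane is kept separate.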
For example, taking the PDB component DZ3 https://www.rcsb.org/ligand/DZ3, by default Grade2 produces planes:
But if the option --big_planes is specified, the phenyl ring and atom planes are merged:
Notice that merged big planes are created for each of the phenyl rings and the atoms around the rings. Each big plane
also includes the hydroxyl hydrogen atom that should be coplanar. Each big plane 𝜎 is set to 0.007 Angstrom, resulting
in a tightening in the overall planarity compared to default Grade2 restraints.
It is also noteworthy that the weak planes that more loosely encourage planarity for the amide bond and its neighbours
are now not incorporated when --big_planes is specified. In the original Grade2 release 1.0.0, the --big_planes
option had a bug where weak planes were incorporated in the plane merging process, resulting in unrealistic
conformational restriction. The bug has been fixed from Grade2 release 1.1.5 (bug fix #342).
For fused aromatic rings the effect of the --big_planes option is normally to create a single large stiff plane. For
example, for Flavin mononucleotide (FMN https://www.rcsb.org/ligand/FMN), by default Grade2 will produce separate
plane restraints for each ring and planar atom in the isoalloxazine:
Using the --big_planes option for FMN results in the isoalloxazine ring being held planar with a single plane
involving 22 atoms:
Although isoalloxazine rings are normally flat, some enzymes bend the cofactor, as clearly demonstrated in the 1.35
Angstrom resolution nitroreductase structure PDB entry 2wqf. BUSTER re-refinement of 2wqf with a default Grade2
dictionary allows the isoalloxazine ring to bend:
Fig. 1: 2wqf re-refinement with BUSTER using FMN Grade2 default dictionary shows the restraints allow the
isoalloxazine ring to bend and fit the electron density. The 2Fo-Fc map contoured at 1.5 rmsd is shown in a light blue mesh,
whereas the Fo-Fc difference map contoured at 3.0 rms is shown in red/green.
In contrast, re-refinement with a Grade2 FMN dictionary produced using the --big_planes option forces
the isoalloxazine ring to be flat. The flat ring is clearly incompatible with the electron density.
Fig. 2: 2wqf re-refinement with BUSTER using FMN Grade2 --big_planes dictionary shows the single plane
restraint forces the isoalloxazine ring to be flat. The 2Fo-Fc map contoured at 1.5 rmsd is shown in a light blue mesh,
whereas the Fo-Fc difference map contoured at 3.0 rms is shown in red/green.
BUSTER re-refinement of a low resolution homologue of 2wqf shows that the default Grade2 FMN dictionary allows
a similar bend (to be published).
In conclusion, the --big_planes option can be used if you want ligand planar systems to be held strictly planar in
refinement, even if the electron density indicates otherwise.
Grade2 does not produce specific, distinct restraints for cis and trans stereoisomers of a molecule. Plane restraints are
used to keep cis/trans stereoisomers flat. A plane restraint on a torsion angle has a minimum at a torsion angle of 0º and
another at 180º, thus allowing either a cis or a trans conformation. Grade2 does produce an "ideal" 3D conformation
that will be one of the two stereoisomers. However, it is easy to flip the conformation to the other stereoisomer in Coot.
The PDB component 14Y is a fatty acid methyl ester that provides a valuable example of how Grade2 handles
stereoisomers.
Running Grade2 on 14Y:
Produces several files including a restraint dictionary 14Y.restraints.cif and an SVG diagram 14Y.diagram.
atom_labels.svg that shows the PDB component's atom IDs:
Fig. 3: Grade2 diagram 14Y.diagram.atom_labels.svg showing the atom IDs for PDB component 14Y.
Examining the restraint dictionary 14Y.restraints.cif, using a text editor or the interactive restraint editor
EditREFMAC, shows that Grade2 imposes a plane restraint on each of the isomeric bonds:
For both of these planes, there are corresponding inactive torsion angle restraints:
As each torsion angle restraint has an enormous sigma value, it will only contribute a minuscule amount to the overall
restraint function and is, therefore, inactive. The torsion angle restraints allow the alteration of the ligand's conformation
using Coot's "Edit Chi Angles" tool (as demonstrated below). They also facilitate altering the restraint dictionary if
the user wants to ensure that only one stereoisomer is allowed. The torsion-IDs are chosen to describe the torsion angle
distribution found by the Mogul search.
Although the plane for the ester bond is given an ID of "trans-01," plane restraints do not have directionality. So, the
active plane restraint can be equally satisfied by a cis or trans conformation of the ester. The source information for
the ester bond shows that it arises from a Mogul distribution of 1143 hits; these have a root mean squared
out-of-plane torsion angle of 3.6º. The distribution is heavily biased towards the trans conformation with 99% of the
torsion population within 10º of 180º. The torsion distribution can be visualised by running the interactive Mogul program with
the Grade2 .mol2 coordinate file, as shown in the figure below. For the ester, the inactivated torsion angle restraint is
given an ideal angle of 180º and a periodicity of 1, meaning there is only a single minimum at a torsion angle 180º.
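The two statistics quoted for the ester torsion (the root mean squared out-of-plane torsion angle and the fraction of hits near 180º) can be reproduced for any list of torsion angles with a sketch like the following. It assumes that "out-of-plane" means the deviation from the nearest of 0º and 180º and that input angles lie between -180º and 180º. The function names and the sample angles are hypothetical; real values would come from a Mogul search.

```python
import math


def out_of_plane_deviation(torsion_deg):
    """Absolute deviation in degrees from the nearest planar value (0 or 180)."""
    folded = abs(torsion_deg) % 180.0
    return min(folded, 180.0 - folded)


def rms_out_of_plane(torsions_deg):
    """Root mean squared out-of-plane deviation over all hits."""
    squares = [out_of_plane_deviation(t) ** 2 for t in torsions_deg]
    return math.sqrt(sum(squares) / len(squares))


def fraction_near_trans(torsions_deg, window_deg=10.0):
    """Fraction of hits within window_deg of 180 degrees."""
    near = [t for t in torsions_deg if 180.0 - abs(t) <= window_deg]
    return len(near) / len(torsions_deg)


# Made-up torsion angles standing in for Mogul hits.
hits = [178.0, -179.0, 2.0, 175.0]
print(round(rms_out_of_plane(hits), 2), fraction_near_trans(hits))
```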
You might ask, "Why doesn't Grade2 set an active restraint forcing the ester bond to be trans?" Early versions of Grade
(the predecessor to Grade2) did this using an active 1-fold torsion restraint in such a case, but this led to two problems.
A minor issue is that Coot does not use torsion angle restraints unless the user turns them on. The major problem is
that rare stereoisomers sometimes occur, for instance, when stabilized by intermolecular contacts, contacts with a metal
ion, or ring closure of a macrocycle. Indeed, the methyl-ester bond we are looking at here can occur as a cis
stereoisomer, as demonstrated by CSD entry KATVEJ. Later versions of Grade and then subsequently Grade2 use a plane
for all cis and trans stereoisomers, thus avoiding users needing to edit restraint dictionaries if their ligand adopts a rare
stereoisomer.
Fig. 4: Mogul torsion angle distribution for methyl ester torsion in PDB component 14Y.
The double bond in the fatty acid tail is ascribed an ID 2fold-11 (see the tables above). The Mogul torsion
distribution, shown below, does not show a strong preference for either the cis or trans form, hence the ascription of
2-fold. The source information for the restraint Mogul_plane_rms_out_of_plane_torsion6.4_degs_249_hits
indicates that the rms out-of-plane torsion angle for the distribution is 6.4º. This can be compared to the smaller
out-of-plane torsion for the ester bond of 3.6º. The looser distribution leads to the fatty acid double bond plane being
given a plane sigma of 0.034Å compared to 0.020Å for the tighter ester.
Fig. 5: Mogul torsion angle distribution for fatty acid double bond torsion in PDB component 14Y.
Although Grade2 outputs files with "ideal" 3D coordinates for the trans (aka E) isomer, the restraint dictionary
14Y.restraints.cif also allows the cis form. The following section shows how to use Coot to flip 14Y into the cis
stereoisomer.
Suppose you are working on a structure that binds methyl oleate but in a cis (Z) stereoisomer. In Coot, it is easy to
produce "ideal" coordinates for the cis stereoisomer using the restraint dictionary 14Y.restraints.cif following
these steps:
1. Load the restraint dictionary.
2. Load the "ideal" coordinates from the restraint dictionary using the "File" - "Get Monomer" menu item.
3. Select the "Edit chi angles" icon from the right-hand menu and double-click on the molecule.
4. Select the restricted planar bond from the list of torsion angles.
5. Roughly adjust the angle to the other stereoisomer and click "Accept".
6. From the right-hand menu, select the "Regularize zone" item, double-click on the molecule and hit "Apply".
7. Once you are happy that the stereoisomer has been flipped, save the coordinates.
The following video demonstrates the complete process:
Although it is possible to alter Grade2 restraints to allow only one stereoisomer, this is not generally necessary. However,
suppose you want to prevent fitting into the wrong stereoisomer when you have a low-resolution map. In that case, you
can alter the Grade2 restraints to allow only one of the stereoisomers by activating the torsion restraint for the bond and
making it one-fold.
For instance, to produce a restraint dictionary that only allows the cis stereoisomer of 19Y a one-fold torsion restraint
with a minimum at 0º is appropriate:
The corresponding plane restraint should be deactivated but retained (as some programs use plane restraints when
making decisions about non-bonded contacts):
Fig. 6: Grade2 diagram showing the 2D coordinates and atom IDs for the cis stereoisomer of 19Y.
4. Alter the SMILES and InChI records for the cis form.
The resulting restraint dictionary can be downloaded here: 14Y.restraints.modified_to_cis.cif
When using a restraint dictionary with one-fold torsions it is important to ensure that the torsion restraint term is used.
BUSTER will do this by default but Coot by default does not turn on torsion restraints. Instead the user must turn them
on:
The CSD-Core program Conquest allows custom data mining of CSD, including applying conformational restraints to
a search. This enables finding separate bond angle distributions for different stereoisomers. Engh and Huber (2006)
did such a search to produce distinct angle restraints for the trans and cis stereoisomers of proline in the widely used
EH99 restraint set.
Table 5: Conquest searches for the angle CH2-CH=CH using the fragment -CH2-CH2-CH=CH-CH2-CH2- compared to the
equivalent angle in 14Y Grade2 restraints

CH2-CH=CH-CH2 torsion angle set    angle CH2-CH=CH mean    standard deviation
trans                              125.3º                  2.3º
cis                                126.7º                  2.4º
both (Grade2 14Y C8-C9=C10)        126.1º                  2.0º
The table shows the results of running Conquest searches for the angle CH2-CH=CH compared to the equivalent angle
in 14Y Grade2 restraints. The mean angle differs by less than 2º between the trans and cis stereoisomers, with the
Grade2 value lying in between. This difference is much smaller than that found by Engh and Huber (2006) for the
proline main-chain bond angle C-C-CA, which has a difference in means of over 7º (119.3±1.5º for the trans form
compared to 127.0±2.4º for the cis stereoisomer). For 14Y, the difference between the two stereoisomers is less than
the standard deviation, which shows that it is unnecessary to introduce conformation-dependent restraints in this case.
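The comparison made in this paragraph can be written out as a small worked check in Python. The criterion used here (flagging a case when the difference in means exceeds the larger of the two standard deviations) is a simplified reading of the argument above rather than a rule Grade2 applies, and the function name is hypothetical. The numbers are those quoted in Table 5 and in the text.

```python
def means_differ_beyond_spread(mean_a, sd_a, mean_b, sd_b):
    """True if the two means differ by more than the larger standard deviation."""
    return abs(mean_a - mean_b) > max(sd_a, sd_b)


# 14Y CH2-CH=CH angle: trans 125.3 +/- 2.3, cis 126.7 +/- 2.4 (Table 5).
print(means_differ_beyond_spread(125.3, 2.3, 126.7, 2.4))  # False

# Proline angle quoted from Engh and Huber: trans 119.3 +/- 1.5, cis 127.0 +/- 2.4.
print(means_differ_beyond_spread(119.3, 1.5, 127.0, 2.4))  # True
```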
EIGHT
This chapter describes how Grade2 produces the atom IDs (also known as atom names) of individual atoms in a ligand
molecule.
Please note that, because of limitations in the legacy Protein Data Bank (PDB) format, Grade2 sets all atom IDs to be
uppercase and attempts wherever possible to keep them to 4 or fewer characters in length. This is because the PDB
format is currently used by BUSTER and other crystallographic tools.
Where possible Grade2, by default, will reuse atom names from the input file. For instance, all PDB chemical
components have specified atom IDs and it is important to use these to ensure consistency and compatibility with existing
PDB data.
Atom IDs are also set in:
• All CIF restraint dictionaries.
• Most (but not all) MOL2 files. MOL2 files offer a flexible method for manipulating atom IDs within a molecule.
The CSD-core program Mercury, provides a user-friendly interface for editing MOL2 files and adjusting atom
IDs as demonstrated in the FAQ on editing a molecule.
• Some SDF files.
If atom IDs are set in the input but you want to use different atom names then Grade2 has a number of options to set
atom IDs, that will override the input IDs.
Please note that all lower case letters in atom IDs are altered to uppercase by Grade2 as programs such as BUSTER
require that atom IDs are all uppercase.
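A check along the following lines can be used to preview how atom IDs will look after this normalisation. It is only an illustrative sketch: the function name is hypothetical, and the 4-character limit is the one described at the start of this chapter, which Grade2 tries to respect "wherever possible" rather than enforcing.

```python
def normalise_atom_id(atom_id, max_len=4):
    """Return the uppercased atom ID and whether it fits within max_len characters."""
    upper = atom_id.strip().upper()
    return upper, len(upper) <= max_len


print(normalise_atom_id("c1a"))    # ('C1A', True)
print(normalise_atom_id("Cl10a"))  # ('CL10A', False)
```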
If atom IDs are not set in the input then Grade2 by default will base the atom IDs on the order of the atoms, unless
the molecule is a typical amino acid. The first non-hydrogen atom will be assigned an atom ID composed of its
element abbreviation (made upper case), followed by 1. Subsequent non-hydrogen atoms will be assigned IDs made
up of their element followed by their input list order.
Using a SMILES string N(C)[C@@H](C)[C@H](O)c1ccccc1 for ephedrine as an example, Grade2 will set atom IDs:
The first atom in the SMILES string is a nitrogen so it is assigned atom ID N1. The second atom is a carbon and so it
gets ID C2. The oxygen atom is the sixth non-hydrogen atom and so it is assigned O6.
Hydrogen atom IDs all start with H followed by the list number taken from the atom to which they are attached and
then A, B or C if there is more than one hydrogen atom attached. So in the ephedrine example above the hydrogen atom
attached to nitrogen N1 is given the ID H1. As there are three hydrogen atoms attached to C2 they are assigned IDs
H2A, H2B and H2C.
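The default naming scheme described above can be sketched in Python as follows. This is an illustration of the rules as stated in the text, not Grade2's code. The function name and the input representation (a list of element symbols with attached hydrogen counts, in input order) are assumptions, and the hydrogen counts for ephedrine were worked out by hand from the SMILES string.

```python
from string import ascii_uppercase


def assign_atom_ids(heavy_atoms):
    """heavy_atoms: list of (element, number of attached hydrogens) in input order.

    Returns a list of (heavy atom ID, [hydrogen atom IDs]).
    """
    ids = []
    for number, (element, n_hydrogens) in enumerate(heavy_atoms, start=1):
        heavy_id = f"{element.upper()}{number}"
        if n_hydrogens == 1:
            hydrogen_ids = [f"H{number}"]
        else:
            # suffix A, B, C ... only when more than one hydrogen is attached
            hydrogen_ids = [f"H{number}{ascii_uppercase[i]}" for i in range(n_hydrogens)]
        ids.append((heavy_id, hydrogen_ids))
    return ids


# Ephedrine, N(C)[C@@H](C)[C@H](O)c1ccccc1, heavy atoms in SMILES order.
ephedrine = [("N", 1), ("C", 3), ("C", 1), ("C", 3), ("C", 1), ("O", 1),
             ("c", 0), ("c", 1), ("c", 1), ("c", 1), ("c", 1), ("c", 1)]
for heavy_id, hydrogen_ids in assign_atom_ids(ephedrine):
    print(heavy_id, hydrogen_ids)
```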
It should be noted that as SMILES strings are not unique then different atom IDs can be assigned for the same molecule.
If this is a problem then the Grade2 option --rdkit_canonical_atom_ids discussed below sets the IDs from a canonical
atom order that is independent of the input order.
Typical alpha amino acids with an amino group and a single beta carbon atom
By default, Grade2 will now recognize typical amino acids when supplied with an input that lacks atom IDs (aka atom
names), for instance a SMILES string. The exact requirement used is that the molecule matches the SMARTS pattern:
[$([NX3H2,NX4H3+])][CX4H]([#6])[CX3](=[OX1])[OX2H,OX1-]
The pattern specifies that the molecule must have either a neutral NH2 or an NH3+ amino group followed by a
4-valent carbon atom with one hydrogen atom and one carbon atom attached and then a neutral or charged carboxylic
acid. A wider range of amino acids is recognized when the --aa_loose option is used (see next section).
If a typical amino acid is recognized then the PDB-standard atom IDs (N CA C O OXT CB) will be set for the main
chain and beta carbon atoms and for the hydrogen atoms that they are bonded to. In addition, the ligand's atoms will
be reordered so that the main chain atoms are first in the list. Currently, side chain atoms are assigned atom IDs using
their numerical order (rather than PDB-style Greek letter remoteness codes CG CD CE etc). So using 4-fluoroglutamate
from SMILES C(C(F)C(=O)O)[C@@H](C(=O)O)N as an example, Grade2 will assign atom IDs:
It should be noted that the --antecedent option can normally be used to assign more atom IDs from the parent amino
acid, as shown below for 4-fluoroglutamate.
If you prefer for the renaming not to happen, then the Grade2 command-line --no_aa_labels option turns it off, leaving
standard numerical order based atom IDs.
Note that, currently, no alterations are made if the input file specifies atom IDs (for example CIF restraint dictionaries
and most MOL2 files).
In addition to setting main chain atom IDs, the output restraint dictionary will have the CCP4-extension CIF item
_chem_comp.group set to peptide. This enables Grade2 CIF restraint dictionaries to be used in Coot to replace
protein residues with modified amino acids.
Setting atom IDs for "exotic" amino acids with the --aa_loose option
Following a user request, the atom naming feature has been extended to a wide range of "exotic" amino acids when
the command line option --aa_loose is used. If the option is not used but atom names could be set then a warning
message is produced in the terminal output, for instance:
WARNING: The molecule is an "GLY-like alpha amino acid with an amino group", so ....
WARNING: ---- could set conventional amino acid atom IDs. If you want ....
WARNING: ---- this done, then please rerun with the option: --aa_loose
WARNING:
If a molecule is recognized as an amino acid by the --aa_loose option, the output restraint dictionary will have the
CCP4-extension CIF item _chem_comp.group set to peptide. Please note that the setup of restraints between an
"exotic" amino acid and adjacent monomers depends on the program using the restraint dictionary, and that setting
atom IDs is not likely to be sufficient to ensure that correct restraints are used.
The amino acid classes that are currently recognized by --aa_loose are detailed below. If there is any need for
recognition of any other class of amino acid then please let us know.
alpha amino acid with CB and N-modification
This pattern allows modification of the nitrogen atom by a single carbon atom. The SMARTS used is:
[$([NX3])]([#6])[CX4H]([#6])[CX3](=[OX1])[OX2H,OX1-]
Atom IDs N CN CA C O OXT CB will be set. Please note that for PDB chemical components there is no
standard atom name for the carbon atom attached to the nitrogen, but CN is used in N-methyl-L-serine
https://www.rcsb.org/ligand/5JP and seems sensible.
For an example, given the SMILES input C[C@@H](C(=O)O)NCC the following atom IDs will be set:
[$([NX3H2,NX4H3+])][CX4]([#6])([#6])[CX3](=[OX1])[OX2H,OX1-]
Atom IDs N CA CB1 CB2 C O OXT will be set. For an example, given the SMILES input
NC(C)(CO)C(O)=O the following atom IDs will be set:
[$([NX3])]([#6])[CX4]([#6])([#6])[CX3](=[OX1])[OX2H,OX1-]
Atom IDs N CN CA CB1 CB2 C O OXT will be set. For an example, given the SMILES input
CNC(C)(CO)C(O)=O the following atom IDs will be set:
[$([NX3H2,NX4H3+])][CX4][CX3](=[OX1])[OX2H,OX1-]
Atom IDs N CA C O OXT will be set. For an example, given the SMILES input F[C@@H](C(=O)O)N the
following atom IDs will be set:
[$([NX3])]([#6])[CX4][CX3](=[OX1])[OX2H,OX1-]
Atom IDs N CN CA C O OXT will be set. For an example, given the SMILES input F[C@@H](C(=O)O)NC
the following atom IDs will be set:
[$([NX3])][#6][#6][CX3](=[OX1])[OX2H,OX1-]
Atom IDs N CB CA C O OXT will be set. Please note that for PDB chemical components there is no
standard atom name for the extra main chain carbon atom, but CB is used in both beta-alanine
https://www.rcsb.org/ligand/BAL and 62H https://www.rcsb.org/ligand/62H . For an example, given the SMILES input
FCC(CN)C(=O)O the following atom IDs will be set:
8.3.1 Basing atom IDs on those from a related molecule with --antecedent
When dealing with a molecule that is a derivative of another it is often helpful for the atom IDs of the two molecules to
be consistent. The Grade2 option --antecedent RELATED_RESTRAINTS_CIF allows this. The short version of the
option is -a RELATED_RESTRAINTS_CIF. A filename RELATED_RESTRAINTS_CIF for a CIF restraint dictionary of a
related molecule must be provided. It is best if the restraint dictionary is produced by Grade2 itself.
--antecedent uses the RDKit maximum common substructure (MCS) routines to compare RELATED_RESTRAINTS_CIF with
the input molecule. Bond orders are not required to match, but rings only match other complete rings.
Please note that if the input already has atom IDs these will be wiped and disregarded if either the --antecedent or
the --antecedent_disregard_element option is used.
Atoms that are matched in the MCS are assigned the same atom ID. For atoms that are not matched, IDs are assigned by
first finding the largest number within any atom IDs of non-hydrogen atoms in the antecedent molecule. For instance,
in the N-acetyldopamine example below the largest number within the atom IDs from the antecedent dopamine, PDB
component LDP, is 8 (from C8). Non-hydrogen atoms that do not match are assigned atom IDs that follow on from
this, so in the example below the acetyl group atoms are given atom IDs C9, C10 and O11. Hydrogen atoms that are
not matched are assigned atom IDs based on the atom to which they are attached. So in the example below the
hydrogen atoms in the methyl group are assigned to be H9A, H9B, and H9C.
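The numbering rule for unmatched non-hydrogen atoms can be sketched in a few lines of Python. This is an illustration only and not the Grade2 implementation: the function names are hypothetical, and the antecedent atom list is a made-up stand-in in which only C8 being the largest non-hydrogen number is taken from the text.

```python
import re


def largest_id_number(antecedent_atoms):
    """Largest number found in the IDs of non-hydrogen antecedent atoms."""
    numbers = []
    for atom_id, element in antecedent_atoms:
        if element == "H":
            continue  # hydrogen atom IDs are ignored
        numbers.extend(int(n) for n in re.findall(r"\d+", atom_id))
    return max(numbers, default=0)


def ids_for_unmatched_atoms(antecedent_atoms, unmatched_elements):
    """Give unmatched non-hydrogen atoms IDs that follow on from the antecedent."""
    next_number = largest_id_number(antecedent_atoms) + 1
    new_ids = []
    for element in unmatched_elements:
        new_ids.append(f"{element.upper()}{next_number}")
        next_number += 1
    return new_ids


# Hypothetical subset of antecedent atoms as (atom ID, element) pairs.
antecedent = [("C7", "C"), ("C8", "C"), ("N1", "N"), ("O1", "O"), ("H81", "H")]

# Unmatched acetyl group atoms: two carbon atoms and one oxygen atom.
print(ids_for_unmatched_atoms(antecedent, ["C", "C", "O"]))  # ['C9', 'C10', 'O11']
```

Hydrogen atoms are skipped when looking for the largest number, so the hypothetical H81 does not affect the result.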
As well as matching the atom IDs for the two molecules the 2D coordinates and diagram will also be aligned as shown
here:
Unfortunately, there is already a PDB component definition for N-acetyldopamine 7DP that uses inconsistent atom
labels but in future this option could be used to avoid similar incompatibilities.
The --antecedent option can also be used for modified amino acids. Taking for example 4-fluoroglutamate from
SMILES: C(C(F)C(=O)O)[C@@H](C(=O)O)N this involves first producing a Grade2 restraint dictionary for glutamate
GLU and then using it with the --antecedent option.
This results in the atom IDs being taken from GLU except for the extra fluorine atom that is labelled F3:
Once again the 2D coordinates are carried over so that the SVG diagrams are aligned.
The --antecedent_disregard_element option (that can be shortened to -ad) is similar to --antecedent except that
atoms are not required to have the same element to match. Where possible, atom IDs are altered so that the non-element
part of matching atoms is maintained. So for example, if atom CL24 is matched to a fluorine atom it will be given the
atom ID F24 (provided there is not another atom with that label).
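The relabelling described above can be sketched as follows. The helper name rename_for_element is hypothetical, and returning None when the label is already taken is an assumption made for illustration; the text does not say what Grade2 does in that case.

```python
def rename_for_element(antecedent_id, antecedent_element, new_element, used_ids):
    """Swap the element part of an atom ID while keeping the rest of the ID.

    Hypothetical helper. Returns None if the ID does not start with the
    antecedent element symbol or if the new label is already in use
    (the fallback behaviour is an assumption, not documented here).
    """
    prefix = antecedent_element.upper()
    if not antecedent_id.upper().startswith(prefix):
        return None
    candidate = new_element.upper() + antecedent_id[len(prefix):]
    if candidate in used_ids:
        return None
    return candidate


print(rename_for_element("CL24", "Cl", "F", used_ids=set()))    # F24
print(rename_for_element("CL24", "Cl", "F", used_ids={"F24"}))  # None
```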
Taking for example the cyclin-dependent kinase inhibitors SC8 and SC9, running grade2 for each in turn:
As can be seen below, the PDB component definitions of the two inhibitors SC8 and SC9 have consistent atom
numbers for the central pyrazolopyrimidine ring, but the halogenophenyl and pyridine rings have distinct numbering
and atom IDs.
overrides the input atom IDs and instead sets atom IDs by matching atoms from SC8:
It can be seen that all atoms are matched to their equivalents in SC8, including both the halogenophenyl and pyridine rings.
The --antecedent_disregard_element option is useful to set consistent IDs and produce aligned 2D diagrams for
series of related inhibitors.
8.3.3 Basing atom IDs on the RDKit canonical SMILES string with
--rdkit_canonical_atom_ids
The default procedure used by Grade2 for setting atom IDs, described above, uses the atom order of the input
molecule. This means that it is common for two restraint dictionaries for a single compound to have completely
different atom naming because the atom orders of the input descriptions are different. To avoid this problem the
--rdkit_canonical_atom_ids option (short option -R) can be used. This uses the atom order in the RDKit
canonical SMILES string as a basis for the atom IDs. As the RDKit canonical SMILES is independent of the input atom
order this will produce the same atom IDs for a single compound whatever the source.
For example, using three different SMILES strings describing ephedrine grade2 -R will produce the same atom IDs:
Hydrogen atom IDs are based on the list number of the non-hydrogen atom to which they are attached, as described
above.
Please note that --rdkit_canonical_atom_ids wipes any existing atom IDs and that atoms are reordered by the
option.
8.3.4 Basing atom IDs on the InChI canonical atom order with
--inchi_canonical_atom_ids
Using the RDKit canonical SMILES atom order to produce consistent atom IDs for a single molecule, with
the --rdkit_canonical_atom_ids option, works well. But one problem is that canonical SMILES strings produced
by different programs are not consistent and so the atom IDs are not universal. Dashti et al. (2017) introduced the idea
of using the canonical atom order found as part of calculating the International Chemical Identifier (InChI) of a
molecule to produce ALATIS unique identifiers. The --inchi_canonical_atom_ids option uses this idea and produces
atom IDs from the InChI canonical atom order. For non-hydrogen atoms, the numerical part of the
--inchi_canonical_atom_ids atom ID is the same as the ALATIS ID.
Once again using as an example three different SMILES strings describing ephedrine grade2
--inchi_canonical_atom_ids produces:
As expected, consistent atom IDs are produced by --inchi_canonical_atom_ids regardless of the atom order
in the input SMILES string. But atoms with consecutive IDs can be far apart in the molecule, for instance atom C1 is
bonded to atom C8 and is not adjacent to atom C2. This makes the IDs less "user-friendly" but more universal than
those from --rdkit_canonical_atom_ids (which, for me, are more intuitive).
NINE
CSD-CORE COMPATIBILITY
Please check the online version of this page: https://gphl.gitlab.io/grade2_docs/csd_compatibility.html before
updating your CSD installation. The online page is updated when new CSD releases are made.
Grade2 obtains data from the Cambridge Structural Database (CSD) using the CSD Python API and you must have a
CSD-Core installation to use Grade2. Because the CSD-Core and BUSTER installations are separate, incompatibilities
can arise. This page shows which recent releases of the two packages work together. Releases made in the past year
are listed here.
Note: The CSD update procedure is, to our knowledge, irreversible and you can only download the latest version. So
if you update the code to an incompatible version and there is a problem you may well be stuck. If you have sufficient
disk space, then it is a good idea to install new releases of CSD-Core separately rather than using the update procedure.
Please note that the configuration procedure has been altered in Grade2 release 1.6.0 in July 2024. Please
see the Configuration Instructions for details of the new method that involves setting environment variable
BDG_CSD_TOP_DIRECTORY.
Note: To simplify support, from the next Grade2 release (scheduled late-2024) Grade2 will not work with CSD-Core
releases older than 2024.1.
Note: If you are using an old version of CSD-Core you are strongly recommended to update to CSD-Core 2024.2
9.2 Linux
9.3 MacOS
9.4 Using Grade2 release 1.4.1 (and following) with old CSD releases
CSD release 2023.2 (July 2023) contained an update to the miniconda environment included with CSD from Python
3.7 to Python 3.9. Grade2 is distributed with BUSTER within a separate miniconda environment. In order to work
together, the Python versions of these two miniconda environments must be the same. Grade2 in all versions up to
1.4.0 used a Python 3.7 miniconda environment and so will not work with CSD release 2023.2 (July 2023).
Grade2 from release 1.4.1 includes both a Python 3.9 miniconda environment to work with CSD release 2023.2
(July 2023) and a Python 3.7 miniconda environment to work with older CSD releases. We will continue to include
a Python 3.7 miniconda environment until July 2024 to support older CSD releases.
The easiest way to ensure compatibility is to update both BUSTER (and hence Grade2) and CSD.
If you wish to use an older installation of CSD with Grade2 release 1.4.1 or following, set the environment
variable BDG_GRADE2_PYTHON_VERSION to 3.7. Users of bash/ksh/zsh shells can do this by:
export BDG_GRADE2_PYTHON_VERSION="3.7"
It is best practice that Grade2 environment variables are added to the BUSTER setup_local.sh or setup_local.csh
file as explained in the Grade2 environment variables section.
Once you update CSD you will need to make sure that BDG_GRADE2_PYTHON_VERSION is not set for Grade2 to work.
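The rule described in this section can be summarised in a short Python sketch, for instance for a wrapper script that launches Grade2 with a suitable environment. The function name grade2_python_env is hypothetical and the sketch assumes CSD release labels of the form year.update (such as 2023.2); it covers only the Python 3.7/3.9 choice discussed here.

```python
def grade2_python_env(csd_release: str, base_env: dict) -> dict:
    """Return a copy of base_env with BDG_GRADE2_PYTHON_VERSION set or removed.

    Hypothetical helper: assumes release labels such as "2023.2".
    CSD releases before 2023.2 use Python 3.7, later ones Python 3.9.
    """
    env = dict(base_env)
    year, update = (int(part) for part in csd_release.split(".")[:2])
    if (year, update) < (2023, 2):
        # Older CSD installation: ask Grade2 for its Python 3.7 environment.
        env["BDG_GRADE2_PYTHON_VERSION"] = "3.7"
    else:
        # Updated CSD installation: the variable must not be set.
        env.pop("BDG_GRADE2_PYTHON_VERSION", None)
    return env


print(grade2_python_env("2022.3", {}))
print(grade2_python_env("2023.2", {"BDG_GRADE2_PYTHON_VERSION": "3.7"}))
```

The returned dictionary could be passed as the env argument of subprocess.run when calling grade2.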
(Page updated 02 September 2024)
TEN
The Grade Web Server https://grade.globalphasing.org/ provides a way to use Grade2 (or Grade) without having to
licence and install both BUSTER and the CSD-Core packages. To provide easy access to Grade2 the CCDC has kindly
agreed that we can provide a free-to-use Grade Web Server that includes the use of CSD software and data.
Please note that use of the Grade Web Server is subject to agreeing to the conditions of use:
http://grade.globalphasing.org/grade_server/conditions.html
The Grade Web Server should only be used for non-confidential ligands.
The Grade Web Server was publicly announced in March 2012 and in the first ten years of its operation it has been used
to produce over 20,000 restraint dictionaries. On 14 December 2022, an improved version of the Grade Web Server
that runs Grade2 was made available. As part of this work the Grade Web Server user interface has been improved
by simplifying the entry forms and providing help links. A wider range of input file types is now supported including
MOL/SDF and CIF restraint dictionaries. It is also possible to run the original Grade although this is deprecated and
the service will eventually be withdrawn.
Video: Using the Grade Web Server: 3 Input the SMILES string
Video: Using the Grade Web Server: 5 Unpacking results tarball & using restraints
(Page updated 29 December 2022)
ELEVEN
11.1 Introduction
If you work for an organisation that has a large set of in-house small molecule structures it is possible to use them to
build a Mogul database that can be searched in addition to the CSD structures. Please note that to do so you will need
to be an established CCDC Research Partner; please contact the CCDC for more details.
Note: Please note that it is only worthwhile to put in the effort of preparing an in-house Mogul database if your set of
in-house structures provides coverage of chemical groups that are not already present in the CSD, but are represented
in the ligands you are working on.
The process of preparing an in-house Mogul database has two major stages:
1. Create an sqlite database of your structures from your set of small molecule CIF coordinate files. The
CSD-Editor software should be used for this stage. Please note that the CSD-Editor program is available for Linux
(and Windows) but is not supplied for MacOS. The procedures for this stage are described in the
csd-editor-industrial documentation that can be found in the CSD distribution:
~/CCDC/ccdc-software/csd-editor/docs/csd_editor_industrial/csd-editor-industrial.html
2. Once you have prepared the CSD-format sqlite database with your structures and checked that this works with
Mercury, this can be used to build a corresponding Mogul data library using the mogulbuilder.py script. The
mogulbuilder.py script is distributed to established CCDC Research Partners separately from the main CSD
installation, please contact the CCDC for more details.
Please note that the Python libraries necessary to run the script are now all included in the CSD Python API (there
used to be a separate conda package). The mogulbuilder.py script can be run using the run_csd_python_api
command, for instance to get the script's help message:
~/CCDC/ccdc-software/csd-python-api/run_csd_python_api mogulbuilder.py -h
The Mogul database is output to the output_directory and will have the name name_db. Depending on the
number of structures in the input database the script will take a number of hours to run.
Once the script has run, use an editor to check the file mogul.path in the output_directory. This should list the
full path of the input structure sqlite database on the line that starts CSD, but older versions of the script fail to
record the path. To use the Mogul database with Grade2 it is essential for the full path to be listed on the CSD
line.
Now that you have prepared your in-house Mogul database, please check that it works with the standalone Mogul
program as explained in the section "Configuring multiple data libraries in the Mogul GUI" in the documentation
supplied with the script.
For further details on the mogulbuilder.py script please refer to the documentation supplied with it.
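The check of the mogul.path file described above can also be scripted. The sketch below is only an illustration: the function name is hypothetical, and it assumes that the path follows the CSD keyword on the same line separated by whitespace, which should be confirmed against a real mogul.path file.

```python
import os
import tempfile
from pathlib import Path


def csd_line_has_full_path(mogul_path_file) -> bool:
    """Check that the line starting with CSD lists an absolute path.

    Assumption: the line has the form "CSD <path>"; the exact file
    layout is not given in this documentation.
    """
    for line in Path(mogul_path_file).read_text().splitlines():
        if line.startswith("CSD"):
            parts = line.split(maxsplit=1)
            return len(parts) == 2 and Path(parts[1].strip()).is_absolute()
    return False  # no CSD line found


if __name__ == "__main__":
    # Demonstration with a temporary file and a made-up database name.
    with tempfile.TemporaryDirectory() as tmp:
        example = os.path.join(tmp, "mogul.path")
        database = os.path.join(tmp, "inhouse.sqlite")
        Path(example).write_text(f"CSD {database}\n")
        print(csd_line_has_full_path(example))
```

A False result indicates that the CSD line needs to be edited by hand before the database is used with Grade2.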
Note: Please note that Grade2 support for in-house Mogul databases requires recent versions of Grade2 (1.6.0 and
following) and CSD-Core (2024.1 and following).
Before using an in-house Mogul database please ensure that the mogul.path file in the database correctly provides the
full path of the input structure sqlite database on the line that starts CSD, as explained above. You should also ensure
that the database works correctly with the standalone Mogul program.
To use the database with Grade2, set the environment variable BDG_GRADE2_MOGUL_IN_HOUSE_DATABASE to the
full path of the directory containing the database. For further details see the Grade2 environment variables section.
Once you have done this then you can check that Grade2 recognizes the in-house Mogul database by running the Grade2
-checkdeps option:
$ grade2 -checkdeps
Look through the terminal output for a line listing the Mogul data libraries, this should list the name for your in-house
library that you set in the mogulbuilder.py run. For example:
TWELVE
This chapter describes how Grade2 tackles chemical groups that present problems in restraint generation.
If you come across a chemical group that Grade2 does not handle properly, then please let us know. Please note that
it is unnecessary to disclose the whole of your ligand but instead just the problematic chemical group, for instance, by
finding a compound in PubChem containing the group.
The pentafluorosulfanyl (-SF5) moiety is a functional group with useful properties, such as high electronegativity and
chemical and thermal stability, that is becoming increasingly popular (Chan, 2019).
For restraint generation the pentafluorosulfanyl group presents a challenge as it adopts an octahedral molecular geometry.
Fig. 1: CSD entry GISVOU with labels showing Grade2 classification of the atoms in the CSF5 group.
The octahedral geometry means that the fluorine atoms become non-equivalent and that the bond angles at the sulfur
atom are close to either 90° or 180°. As shown in the figure above, Grade2 assigns the fluorine atoms so that Faway
is opposite the carbon atom and Fnext1 is opposite Fnext3.
Restraints for the group were found by using the CSD Conquest program to search for CSF5 in CSD entries with R
factors better than 0.05, excluding organometallics and entries with disorder or errors. This identified 68 CSD entries.
For each of these Grade2 was used to measure the bond angles at the sulfur atom of the CSF5 group (excluding the angles
close to 180°).
Fig. 2: Distribution found in the CSD database for bond angles at the sulfur atom of the CSF5 group.
The distributions show that the geometry is not exactly octahedral. From release 1.4.0, Grade2 imposes angle restraints
using the mean and standard deviation found in the CSD distributions, as shown in the figure and table below.
Note that restraints for all angles involving atoms on opposite sides of the octahedral coordination are inactivated, as
the geometry is already heavily restrained by the other bond angle restraints.
It should be noted that RDKit does not handle the CSF5 group well, as it is not supported by the UFF force field as
implemented in RDKit. This means that the RDKit SMILES to 3D routine used by Grade2 fails for the CSF5 group. As a
workaround, Open Babel can generate 3D coordinates for molecules with an SF5 group:
Fig. 3: Grade2 bond angle restraints for the CSF5 group. Values of the ideal angle and estimated standard deviation
(esd or sigma) are shown.
THIRTEEN
13.1 Introduction
Grade2 produces restraints for ligands that are based on information from the Cambridge Structural Database (CSD) of
small molecule structures where possible. This chapter examines the compatibility of Grade2 restraints with the EH99
(Engh and Huber, 2006) restraints for amino acids that are used in BUSTER refinements for proteins. EH99 restraints
were obtained by an analysis of CSD structures using different methods when the CSD contained just over 200 thousand
structures compared to today when it has over 1.2 million structures (Statistics on the Cambridge Structural Database).
It is shown how Grade2 bond and angle restraint ideal values are consistent with EH99 values. Furthermore, the
agreement between the sigma values is examined with the conclusion that it is necessary to scale up Grade2 bond and
angle restraint sigmas for complete consistency with EH99.
We thank a Grade Web Server user for raising this matter.
13.2 Method
Grade2 restraint dictionaries were produced for 17 common proteinogenic amino acids (excluding GLY, ALA and PRO).
The bond and bond angle restraints were compared to the EH99 (Engh and Huber, 2006) restraints for each of the side
chains. Data were analyzed using Jupyter Notebook with matplotlib and scipy.stats.
13.3 Results
As shown in Fig. 1, the ideal bond lengths are directly related (Pearson's r (78) = 1.00, p < .001). The root
mean squared difference (rmsD) between the ideal bond lengths for the two sets of restraints is 0.006 Å. This can be
compared to a mean bond sigma value of 0.022 Å for EH99 and 0.013 Å for Grade2. The rmsD between the two sets is
therefore approximately half the lower of the mean sigma values. It can be concluded that ideal bond lengths of Grade2
reproduce EH99 values consistently, as would be expected given they are both based on CSD structural information.
As shown in Fig. 2, the ideal bond angles in EH99 and Grade2 are also directly related (Pearson's r (104) =
.99, p < .001). The rmsD between the ideal bond angles for the two sets of restraints is 0.7°. This can be compared
to a mean bond angle sigma value of 1.7° for EH99 and 1.4° for Grade2. Once again, the rmsD between the two sets is
therefore approximately half the lower of the mean sigma values. It can be concluded that ideal bond angles also agree
well.
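For readers who wish to repeat this kind of comparison, the two statistics quoted above can be computed as in the following sketch. It uses only the Python standard library rather than the scipy.stats routines used for the analysis, the function names are hypothetical, and the three pairs of bond lengths are invented placeholder values, not data from this study.

```python
import math


def rms_difference(values_a, values_b):
    """Root mean squared difference (rmsD) between paired values."""
    if len(values_a) != len(values_b) or not values_a:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(a - b) ** 2 for a, b in zip(values_a, values_b)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder ideal bond lengths in Angstrom (hypothetical values).
eh99_ideal = [1.530, 1.520, 1.410]
grade2_ideal = [1.526, 1.524, 1.416]

print(f"rmsD = {rms_difference(eh99_ideal, grade2_ideal):.3f}")
print(f"r    = {pearson_r(eh99_ideal, grade2_ideal):.2f}")
```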
Fig. 1: Comparison between the ideal bond lengths of EH99 and Grade2 restraints for amino acid side chains. Blue
circles are used to mark each ideal bond length. The green dotted line marks equality.
Fig. 2: Comparison between the ideal angles in EH99 and Grade2 restraints for amino acid side chains.
Fig. 3: Comparison between the bond sigma of EH99 and Grade2 restraints for amino acid side chains.
Comparing the sigma values for bond lengths shows a moderate positive correlation (Pearson's r (78) = .58, p < .001).
The Grade2 sigma value is smaller than the corresponding EH99 value for 95% of bonds. The mean sigma value for
this set of bonds is 0.013 Å for Grade2 compared to 0.022 Å for EH99. The ratio of the two means is 1.68.
In a similar fashion the sigma values for bond angles (below) show a moderate positive correlation (Pearson's r (105)
= .53, p < .001). The mean Grade2 angle sigma of 1.4° is smaller than the mean EH99 angle sigma of 1.7° by a factor of 1.25.
A further investigation was conducted to find out why a tighter distribution for both bond lengths and bond angles
is produced by Grade2 compared to EH99 values. Grade2 uses a Mogul option to only analyze CSD structures with
an Rfactor <=5%. When this filter was disabled, the mean amino acid bond sigma increased by a factor of 1.63 and
the mean angle sigma by 1.32. As these values are comparable to the difference between Grade2 and EH99, the major
difference between Grade2 and EH99 is likely to be because Grade2 data is based on a selection of the highest resolution
CSD structures rather than a wider set.
Grade2 from release 1.5.0 includes a command line option --eh99_sigma_correction that scales up sigma values
for bonds and angles to match the mean sigma values of the EH99 amino acid restraints:
• The sigma values for bonds, that do not involve hydrogen atoms, are increased by a factor 1.68.
• The sigma values for bond angles, that do not involve hydrogen atoms, are increased by a factor 1.25.
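The effect of the two scale factors listed above can be sketched in Python. This is an illustration of the stated rule, not Grade2's code: the function name scale_sigma and the representation of a restraint as a sigma plus a list of element symbols are assumptions, and any rounding that Grade2 applies is not covered.

```python
BOND_SIGMA_FACTOR = 1.68   # factor for bond sigmas, from the text above
ANGLE_SIGMA_FACTOR = 1.25  # factor for bond angle sigmas, from the text above


def scale_sigma(sigma, atom_elements, factor):
    """Scale a restraint sigma unless the restraint involves a hydrogen atom.

    Hypothetical helper: atom_elements lists the element symbols of the
    atoms in the restraint (two for a bond, three for an angle).
    """
    if any(element.upper() == "H" for element in atom_elements):
        return sigma  # restraints involving hydrogen atoms are left unchanged
    return sigma * factor


# A C-C bond with the mean Grade2 bond sigma of 0.013 Angstrom.
print(round(scale_sigma(0.013, ["C", "C"], BOND_SIGMA_FACTOR), 3))
# A C-H bond is not rescaled.
print(scale_sigma(0.020, ["C", "H"], BOND_SIGMA_FACTOR))
# A C-C-N angle with a sigma of 1.4 degrees.
print(round(scale_sigma(1.4, ["C", "C", "N"], ANGLE_SIGMA_FACTOR), 2))
```

Scaling the mean Grade2 bond sigma of 0.013 Å by 1.68 gives approximately 0.022 Å, the mean EH99 bond sigma quoted earlier.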
The --eh99_sigma_correction option can be used for refinements where EH99 restraints are used for protein
residues, for instance in BUSTER. If it is not used then ligand bonds and angles will be restrained to ideal values
more strongly than those of protein residues (for most users this will not be a problem). The option should not be
used if Grade2 restraints are to be used with CCP4 restraints for amino acids. Comparing Grade2 and CCP4 restraints
for amino acid bonds shows they have comparable mean sigmas for both bond lengths and bond angles (unpublished
data).
Fig. 4: Comparison between the bond angle sigma of EH99 and Grade2 restraints for amino acid side chains.
CHAPTER
FOURTEEN
GRADE2 CHANGELOG
Please check the online version of this page: https://gphl.gitlab.io/grade2_docs/changelog.html for updates.
14.1 v 1.6.0
• The BUSTER CSD configuration method, used by Grade2, has been revised following recent large changes to
the CSD directory structure and configuration processes. Please see the Configuration Instructions for details
of the new method that involves setting environment variable BDG_CSD_TOP_DIRECTORY. Using a configuration
from a previous BUSTER installation (where BDG_TOOL_MOGUL is set) will continue to work for now but will
trigger WARNING messages.
The new configuration procedure facilitates making a local copy of ccdc-data to speed up Grade2 if CSD is
installed on a slow networked disk. For more details, please see the FAQ on Grade2 running slowly.
If you are using an old version of CSD-Core you are strongly recommended to update to CSD-Core 2024.1.
• Added support for using an in-house Mogul database. Please see the documentation chapter on in-house Mogul
databases.
Thanks to Bob Nolte for requesting this improvement. (&24)
• If using the latest release of CSD-Core (2024.1 and later), Grade2 will now only use the main Mogul database
based on the annually released CSD database, ignoring the update releases that are made 3 times a year. The
sampling in the update releases is not consistent with that of the main database. This can lead to problems where
strained structures are oversampled leading to bias. It is better to ignore the update releases. (#267).
• The 2D molecular diagrams for ligands having a chiral carbon atom with undefined chirality have been improved
by using wavy lines to clearly mark the ambiguity, as described in the SMILES string with ambiguous chirality
section of the Usage Chapter. (#670)
• Added support for CXSMILES with enhanced stereochemistry "and" and "or" group definitions, as described in the
CXSMILES section of the Usage Chapter.
Thanks to Prakash Rucktooa for requesting this improvement. (#670)
• A new command line option --diagram_stereo_label will add a label indicating the stereo configuration for
each stereo centre in the schematic 2D molecular diagrams SVG files, as shown in the schematic 2D molecular
diagrams section. (#674)
14.1.2 Fixes
• Removed out-of-date URLs for collecting PDB chemical component definitions. Grade2 now only uses EBI URLs
like https://www.ebi.ac.uk/pdbe/static/files/pdbechem_v2/GOL.cif and RCSB URLs like
https://files.rcsb.org/ligands/download/A1LU6.cif . (#663 & #666)
• Improved error handling for CIF restraint dictionaries that have been manually edited and have restraints involving
an atom_id not in the _chem_comp_atom.atom_id table and hence are corrupt. (#665)
• Fixed bug for PDB component I4O where the extra hydrogen atom added when charging the main chain amide
nitrogen was assigned the ID I rather than H3.
Thanks to Robbie Joosten for reporting this bug. (#683)
• Improved error handling if there is a CSD licence problem so that Grade2 now terminates cleanly with a clear error
message. (#689)
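The kind of corruption mentioned in the fix for manually edited CIF restraint dictionaries in the list above can be checked for with a simple set comparison. The sketch below is an assumption-laden illustration: it does not parse CIF files, the function name is hypothetical, and the atom IDs are expected to have been extracted already from the _chem_comp_atom.atom_id table and from the restraint records.

```python
def unknown_restraint_atom_ids(atom_ids, restraints):
    """Return atom IDs used in restraints but missing from the atom table.

    Hypothetical helper: atom_ids is the list of IDs from the atom table
    and restraints is a list of tuples of atom IDs, one tuple per restraint.
    """
    known = set(atom_ids)
    missing = set()
    for restraint in restraints:
        missing.update(atom_id for atom_id in restraint if atom_id not in known)
    return sorted(missing)


# Made-up example: the second bond refers to O2, which is not in the atom table.
atom_table = ["C1", "C2", "O1"]
bond_restraints = [("C1", "C2"), ("C2", "O2")]
print(unknown_restraint_atom_ids(atom_table, bond_restraints))  # ['O2']
```

An empty list means every restraint refers only to atoms listed in the atom table.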
14.2 v 1.5.0
• Dative bonds in 2D schematic diagrams are now marked with dotted lines rather than arrows. For instance for
heme HEM:
Previously arrows were used (see below) but dotted lines are more standard and clearer. (#633)
• Performed an analysis comparing Grade2 and EH99 Restraints for amino acid side chains and introduced a new
option --eh99_sigma_correction that scales up all non-hydrogen bond and angle sigmas to match the mean sigma
values of the EH99 amino restraints used by BUSTER. (#635)
14.2.2 Fixes
• Fixed a major bug where a plane was defined for torsions that the MMFF94s force field held flat, when there was
an insufficient number of Mogul hits to assess planarity. This leads to incorrect plane assignment, particularly
for bonds between two aromatic rings. Now the MMFF94s 2-fold torsion term is only used to assign planarity
for hydrogen atoms. For an example of the problem, see the Known Issues page.
Thanks to Markus Rudolph for reporting this bug. (#636)
• The --big_planes option now orders atoms more sensibly using numbers in the atom ID rather than alphabet-
ically. This is intended to make any editing of the big plane easier. Please note that the --big_planes option
overemphasizes ring planarity and should be used with caution.
Thanks to Markus Rudolph for requesting this improvement. (#637)
• Fixed bug where using the --antecedent option with a restraint dictionary for a related molecule that matches
only part of a large complex ligand can result in Grade2 terminating with error message containing atom index
out of range.
The problem is caused by difficulties in creating 2D coordinates with the matching atoms being aligned. If the
problem occurs then Grade2 will now include the following warning message in its output:
Grade2 will then proceed and produce 2D coordinates and 2D schematic diagrams without alignment. (#632)
• Checked that Grade2 can handle five-character PDB chemical component IDs. The PDB will soon be issuing
five-character PDB chemical component IDs as three-character IDs are exhausted. Grade2 has no problem in
handling five-character chemical component IDs. But programs that use PDB-format coordinate files (such as
BUSTER) will have problems with chemical component IDs longer than three characters. One possible solution is to
use the --resname option to set a working ID such as 01 for the chemical component, in place of the five-character
PDB chemical component ID. (#640)
• Fixed bug where Grade2 erroneously imposed a planar restraint on 3-atom rings. This makes no sense as any 3 points
define a plane, and so BUSTER terminates with an error if supplied with a 3-atom plane.
Thanks to Clemens Vonrhein for raising this issue. (#641)
• Fixed bug where Grade2 did not properly handle PDB ligands with commas in atom IDs (for example T46).
Thanks to Clemens Vonrhein for raising this issue. (#643)
• Fixed bug where Grade2 crashed for malformed PDB chemical components: the original releases of VLW and
VMI that both had a nitrogen atom wrongly assigned a charge +4. Grade2 can now handle the problematic
definitions and we will supply the PDB with information to correct both the ligands.
Thanks to Clemens Vonrhein for raising this issue. (#646)
• The --PDB_ligand --rcsb option will now first download information from a URL like
https://files.rcsb.org/ligands/download/ABC.cif, as this is now the approved URL for obtaining this information.
Thanks to the RCSB for the update. (#657)
14.3 v 1.4.1
14.3.1 Fixes
• The CSD update 2023.2 (July 2023) introduces a change that means that old Grade2 versions prior to this release
1.4.1 will not work with it. This release will work with both the CSD update 2023.2 and with older versions
of the CSD. Please see the CSD compatibility chapter https://gphl.gitlab.io/grade2_docs/csd_compatibility.html
for further details.
• Fixed bug where Grade2 produced restraints with a bond angle sigma of zero, which causes BUSTER to
terminate. A minimum value of 0.3° is now used for bond angle sigmas (CIF item
_chem_comp_angle.value_angle_esd). A similar protection is now also applied to bond restraints, with a minimum
value of 0.005 Å for the bond sigma (CIF item _chem_comp_bond.value_dist_esd). The minimum values were set by
surveying the range of sigmas in the set of restraints for common compounds distributed with BUSTER.
Thanks to Christian Schleberger for reporting this bug. (#622)
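The sigma floors described in fix #622 amount to a simple clamp. The following is a minimal sketch of that idea in Python, not Grade2's own code: the helper floor_sigma and the sample sigma lists are hypothetical, and only the two minimum values are taken from the entry above.

```python
# Minimum sigmas quoted in the changelog entry for fix #622.
MIN_ANGLE_SIGMA_DEG = 0.3        # floor for _chem_comp_angle.value_angle_esd
MIN_BOND_SIGMA_ANGSTROM = 0.005  # floor for _chem_comp_bond.value_dist_esd


def floor_sigma(sigma, minimum):
    """Return sigma, raised to the minimum if it is smaller (hypothetical helper)."""
    return max(sigma, minimum)


# Made-up sigmas, including the zero values that made BUSTER terminate.
angle_sigmas = [0.0, 0.25, 1.7]
bond_sigmas = [0.0, 0.004, 0.02]

safe_angle_sigmas = [floor_sigma(s, MIN_ANGLE_SIGMA_DEG) for s in angle_sigmas]
safe_bond_sigmas = [floor_sigma(s, MIN_BOND_SIGMA_ANGSTROM) for s in bond_sigmas]

print(safe_angle_sigmas)  # [0.3, 0.3, 1.7]
print(safe_bond_sigmas)   # [0.005, 0.005, 0.02]
```

Sigmas already above the floor pass through unchanged, so only degenerate restraints are affected.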
14.4 v 1.4.0
• The 2D coordinates used for schematic 2D molecular diagrams are now also stored in the output CIF-format
restraint dictionary. The CIF category pdbe_chem_comp_atom_depiction is used for the records (as this is
used in PDBe CCD files). Storing the 2D coordinates will facilitate the production of consistent 2D diagrams in
a future validation tool. (#549)
• The schematic 2D molecular diagrams SVG output files are now scaled to be clear for both small and large
ligands. Previously the same sized diagram was used regardless of ligand size. (#569)
• Molecules where RDKit needs to add a hydrogen atom to show the chirality in the schematic 2D molecular
diagrams (such as MOI and TXL) are now shown without any chiral wedges as this produces clearer pictures.
(#572)
• The font size for atom labels used for 2D diagrams has been slightly reduced. Chiral wedges are no longer shown
in diagrams with atom labels. This makes the diagrams clearer. (#573)
• For PDB chemical components the 2D coordinates used in schematic 2D molecular diagrams are now taken from
PDBe Data-enriched chemical component definitions (PDBe CCD), where possible. Please note that if the command
line option --rcsb is used then the default wwPDB CCD will be used and, as this lacks the data-enrichment records,
RDKit-generated 2D coordinates will be produced. (#142)
The PDBe CCD 2D depiction coordinates are produced by the ccdutils tool https://github.com/PDBeurope/ccdutils.
In many cases they are superior to the default RDKit depictions. For instance, for compounds containing
porphyrin rings, such as HEM, Grade2 now creates a clear schematic 2D molecular diagram:
Here the arrows are used to show dative bonds to the iron atom.
To see how the use of PDBe CCD coordinates combined with other improvements to the 2D coordinates produce
better results, contrast the 2D diagram for the PDB component MOI morphine with atom labels, produced by this
release with the cluttered diagram produced by the previous releases:
disregarded and new atom IDs are assigned. For more information see the Atom Naming chapter.
Thanks to Steven Sheriff for suggesting this extension. (#548 #550 #577)
• A new option -R, --rdkit_canonical_atom_ids sets new atom IDs based on the atom order in the RDKit canonical
SMILES string for the molecule. As the canonical SMILES string is independent of order of the atoms in
the input molecule, the same atom IDs will be produced regardless of the source. The atom IDs produced by
--rdkit_canonical_atom_ids are straightforward, with adjacent atoms generally having adjacent IDs where
possible. For more information, please see the Atom Naming chapter.
Please note, that using --rdkit_canonical_atom_ids wipes out any existing atom IDs and reorders the atom
list. If you would like Grade2 to write out an alias table showing the mapping between the original and new atom
IDs, please let us know. (#592)
• A new option --inchi_canonical_atom_ids sets new atom IDs based on the canonical atom order created when
calculating the InChI of the molecule. The procedure is based on the ALATIS Unique Identifiers proposed by
Dashti et al. (2017) and non-hydrogen atom IDs should be identical to ALATIS. It should be noted that with
--inchi_canonical_atom_ids atoms with adjacent IDs are normally far apart in the molecule, making the IDs less
"user-friendly" but more universal than those from --rdkit_canonical_atom_ids. For more information, please see
the Atom Naming chapter.
Please note that using --inchi_canonical_atom_ids wipes out any existing atom IDs and reorders the atom
list. (#132)
• CSD-based restraints for pentafluorosulfanyl SF5, that keep the group in an octahedral geometry, have been
introduced. Please see the Difficult Chemical Groups chapter for a full description.
Thanks to Michael Blaesse for suggesting this improvement. (#518)
• A new option --chiral_non_carbon will add chiral restraints specifying the configuration for all chiral atoms, not
just tetrahedral carbon atoms; so nitrogen, sulfur, and phosphorus atoms will be treated as defined chiral centers.
• The command line arguments used in a Grade2 run are now recorded in the restraints CIF as item
_gphl_chem_comp_info.arguments. Similarly item _gphl_chem_comp_info.run_date records the date
of the run. (#552)
14.4.2 Fixes
• Fixed Grade2 to work following the large alterations to the CSD directory structure made in the April 2023 CSD
release. The fix allows Grade2 to work both with the new CSD release and with previous CSD
releases, provided the environment variable BDG_TOOL_MOGUL is set to the location of the Mogul executable on
your system as explained in the Installation Chapter. (#580)
• The default --PDB_ligand option will now first download information from a URL like
https://www.ebi.ac.uk/pdbe/static/files/pdbechem_v2/ATP.cif using the https protocol before falling back to ftp.
The https protocol is unlikely to cause firewall connection issues. (#587)
• Grade2 no longer gives a WARNING message when RDKit cannot calculate atomic partial charges. Partial charges
are not commonly used by crystallographers and so the message is unnecessary. Any user of the restraint dictionary
who wants partial charges can easily notice they are all zero.
Thanks to Clemens Vonrhein for suggesting this improvement. (#589)
• When adding a proton to groups likely to be charged at neutral pH during the Charging process, five-character
atom IDs are now avoided to allow a PDB file to be written.
Thanks to Clemens Vonrhein for suggesting this improvement. (#593)
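The "https first, then ftp" behaviour described for --PDB_ligand in fix #587 is an ordered-fallback pattern. The sketch below illustrates that pattern in Python under stated assumptions; it is not Grade2's implementation. The function names, the stand-in offline_fetch and the ftp address are hypothetical, and only the https URL is taken from the entry above.

```python
import urllib.request


def fetch_url(url, timeout=30):
    """Download a URL and return its content as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_first_available(urls, fetch=fetch_url):
    """Try each URL in order; return (url, content) for the first that works."""
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as error:  # urllib.error.URLError is a subclass of OSError
            errors.append(f"{url}: {error}")
    raise RuntimeError("all downloads failed:\n" + "\n".join(errors))


def offline_fetch(url):
    """Stand-in fetcher for the demonstration: pretends https is blocked."""
    if url.startswith("https://"):
        raise OSError("connection refused")
    return b"data_ATP\n"


urls = [
    "https://www.ebi.ac.uk/pdbe/static/files/pdbechem_v2/ATP.cif",
    "ftp://ftp.example.org/ATP.cif",  # hypothetical fallback location
]
used_url, content = download_first_available(urls, fetch=offline_fetch)
print(used_url)  # the ftp address, because the stand-in refuses https
```

Passing the fetch function as a parameter keeps the ordering logic testable without network access; with the default fetch_url the same call would attempt real downloads.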
14.5 v 1.3.2
14.5.1 Fixes
• Fixed bug where Grade2 wrongly warns and assigns ambiguous chiral restraints for SDF input files produced by
some programs.
The problem is that SD files from different programs vary in how they mark up the chirality of the molecule.
SD files have multiple places where the chiral configuration can be specified:
– The 3D coordinates (if present)
– The COUNTS line chiral flag
– On individual atoms in the ATOMS block (often written but seldom read?).
– On individual bonds in the BONDS block
SD files produced by different programs will often not set some or all of these. Please see
https://depth-first.com/articles/2021/12/29/stereochemistry-and-the-v2000-molfile-format/ for a discussion of the
problem. This bug is an unintended consequence of bug fix #559 in Grade2 release 1.3.1, which meant that instead
of simply basing chirality on the 3D coordinates supplied within the SD file, Grade2 also required that appropriate
chiral flags be set within the BONDS block.
The bug fix alters behaviour back to that of previous Grade2 releases where the chirality of the molecule is taken
from the 3D coordinates (just like for MOL2 files). To make the source of chiral restraints clear the terminal
output now includes a line:
14.6 v 1.3.1
14.6.1 Fixes
• Grade2 will now write all CIF categories as loops, even if they only contain a single item. An exception is made
for category gphl_chem_comp_info which by default is written using key-value pairs as this makes inspection
easier. If you want all CIF categories including gphl_chem_comp_info to be output as loops then set the
environment variable BDG_GRADE2_CIF_LOOP_ALL to "yes".
The change will help other programs that are restricted to reading restraints in CIF loops (rather than fully
supporting the CIF standard). For example, Coot 0.9.6 does not read a single bond restraint or a single bond angle
restraint written as a key-value CIF category. Grade2 restraint dictionaries with a single bond or bond angle
restraint can now be read in.
Thanks to Steven Sheriff for suggesting this alteration. (#555)
• Fixed bug found for Grade2 with an input SMILES string having some chiral center configurations specified but
some left ambiguous. Grade2 now produces an output restraint dictionary where the chiral restraint volume is
set to both for the ambiguous centers rather than being arbitrarily assigned. A warning message is now written
when there are any ambiguous chiral centers.
Thanks to Meigang Gu for reporting this bug. (#559)
• Fixed bug where Grade2 terminates with IndexError: string index out of range for large ligands that
have more than 26 five-membered or six-membered rings.
Thanks to Deepak Deepak for reporting this bug. (#532)
• Fixed bug where Grade2 produced SHELXL restraint files with REM comment lines longer than 80 characters.
Thanks to Tim Gruene for reporting this bug, using the Grade Web Server. (#544)
• Fixed bug where a mangled MOL2 file with valency problems causes a Grade2 crash on checking amino acid
labelling.
Thanks to a user of the Grade Web Server from China for raising this problem. (#545)
• Fixed bug where Grade2 crashed when using --pubchem_names with a ligand that PubChem cannot standardize,
like a peroxide ion [O-][O-]. (#538)
• Fixed bug where Grade2 crashed on some Ubuntu Linux systems with a message containing
SSL: CERTIFICATE_VERIFY_FAILED when using either the default --lookup ID or the --pubchem_names option. If
the problem occurs then the environment variable BDG_GRADE2_SSL_DISABLE_VERIFICATION should be
set. (#540)
• Fixed bug where the default grade2 --lookup crashes if the PubChem entry lacks a systematic name (#558).
• When a CIF restraint dictionary with ambiguous chiral center(s) is used as an input, Grade2 now produces an
output restraint dictionary where the chiral restraint volume is set to both for the ambiguous centers rather than
being arbitrarily assigned. (#560)
• Fixed a problem where DEBUG logging output was wrongly produced when CIF restraint dictionaries were used
as an input. (#561)
• Extended bug fix #438 where long records cannot be read by Coot to SMILES strings. Now SMILES strings
over 500 characters are not written to the CIF restraint dictionary. (#533)
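The first fix in this list (#555) distinguishes CIF categories written as loops from those written as key-value pairs. The following Python sketch shows the two layouts and the documented role of BDG_GRADE2_CIF_LOOP_ALL. It is an illustration only, not Grade2's writer: the functions cif_value, format_category and use_loop are hypothetical, the quoting is deliberately simplified, and the bond restraint values are made up.

```python
import os


def cif_value(value):
    """Quote a value that contains whitespace (simplified CIF quoting)."""
    text = str(value)
    return f"'{text}'" if any(ch.isspace() for ch in text) else text


def format_category(category, rows, as_loop=True):
    """Format one CIF category; rows is a list of dicts sharing the same keys."""
    keys = list(rows[0])
    if as_loop:
        lines = ["loop_"] + [f"_{category}.{key}" for key in keys]
        lines += [" ".join(cif_value(row[key]) for key in keys) for row in rows]
    else:
        if len(rows) != 1:
            raise ValueError("key-value form holds exactly one row")
        lines = [f"_{category}.{key} {cif_value(rows[0][key])}" for key in keys]
    return "\n".join(lines)


def use_loop(category, environ=os.environ):
    """Loops everywhere, except gphl_chem_comp_info unless the override is 'yes'."""
    if category != "gphl_chem_comp_info":
        return True
    return environ.get("BDG_GRADE2_CIF_LOOP_ALL") == "yes"


# Illustrative single bond restraint (values are made up for the example).
bond = [{"comp_id": "LIG", "atom_id_1": "C1", "atom_id_2": "O1",
         "value_dist": "1.210", "value_dist_esd": "0.020"}]
loop_text = format_category("chem_comp_bond", bond, use_loop("chem_comp_bond"))
print(loop_text)
```

Even with a single row the bond category is emitted with a loop_ header, which is the form that loop-only readers expect.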
14.7 v 1.3.0
• A new option --lookup ID allows an external script to be invoked to look up details of a molecule from a
corporate (or public) database and then run Grade2 to produce restraints for it. The environment variable
BDG_GRADE2_LIGAND_LOOKUP is used to set the location of the script. Please see
https://gitlab.com/gphl/grade2_lookup_scripts for example scripts written in different languages and a description
of how to write your own lookup script.
By default, if BDG_GRADE2_LIGAND_LOOKUP is not set, grade2 --lookup CID uses a script that downloads
ligand details from PubChem https://pubchem.ncbi.nlm.nih.gov/ using CID, the PubChem compound identifier.
Thanks to Christian Schleberger for suggesting this extension. (#519)
• Grade2 will now write the systematic name of the ligand, if it is available, to the output CIF restraint dictio-
nary. Systematic names for PDB ligands are automatically obtained from the input PDB chemical component
definition. The --systematic option allows the systematic name to be manually set. For further details see the
Systematic names section.
Thanks to Gilbert Bey for suggesting this extension. (#495 & #516)
• The --pubchem_names option can be used to search online for the systematic name of a molecule by looking for
it in the PubChem database https://pubchem.ncbi.nlm.nih.gov/.
As the process involves uploading the SMILES string of the molecule to PubChem it should not be used for
confidential ligands. To be extra careful, the option is deactivated by default; please see the --pubchem_names
documentation for details of the activation process. (#529)
• Added a new FAQ Security: does Grade2 upload any ligand information to public servers?
(#520).
• The Grade Web Server has been updated and improved to run Grade2.
Please see the Grade Web Server chapter for more information. (&1)
• The CCP4-extension CIF item _chem_comp.group is now set to peptide for PDB chemical components that
have _chem_comp.type set to either L-peptide linking or D-peptide linking. In addition, for other inputs
(such as SMILES, SDF or MOL2 file), if an alpha amino acid is recognized and atom IDs (N CA C O OXT CB)
are set then _chem_comp.group will also be set to peptide. This enables Grade2 CIF restraint dictionaries to
be used in Coot to replace protein residues with modified amino acids.
Thanks to Chip Lesburg for suggesting this extension. (#471)
• For PDB chemical components that have _chem_comp.type containing saccharide, DNA LINKING or RNA
LINKING the CCP4-extension CIF item _chem_comp.group is now set to either pyranose, furanose, DNA
or RNA. For saccharides, the identification of either pyranose or furanose is made using the full name for the
ligand from _chem_comp.name. The improvement allows Grade2 CIF restraint dictionaries to be used for glycan
and nucleic acid chains in Coot.
Currently, no check for the chemistry of saccharides or nucleic acids is made for other inputs (such as SMILES).
Please let us know if you would like this to be added. (#477 & #478)
• A new option --group allows the CCP4-extension CIF item _chem_comp.group to be manually set.
Please see --group usage documentation for full details. (#479)
• The new option --aa_loose extends setting atom IDs to "exotic" amino acids. By default, only alpha amino
acids with an unmodified amino group are recognized. --aa_loose extends recognition to N-modified amino
acids, Aib-like amino acids with two beta carbon atoms, Gly-like amino acids, and beta amino acids. Please note,
the option only works for input molecules that lack atom IDs (aka atom names) for instance a SMILES string or
an SD file. For further details, please see the Atom Naming chapter.
Thanks to Markus Rudolph, for suggesting this enhancement (&7).
• The --PDB_ligand --rcsb option will now download information from https://files.rcsb.org/ligands/ in preference
to Ligand Expo. This has the advantage that the https protocol is used and consequently is unlikely to cause
firewall connection issues.
Thanks to Clemens Vonrhein for suggesting this improvement (#509).
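The rules given earlier in this list for setting _chem_comp.group from _chem_comp.type (#471, #477 and #478) can be summarised as a small lookup. The Python sketch below is one hedged reading of those rules, not Grade2's code: the function chem_comp_group is hypothetical, the case-insensitive matching is an assumption, and so is the use of the substrings "pyranos" and "furanos" to recognise the ring form from the component name.

```python
def chem_comp_group(comp_type, comp_name=""):
    """Return a _chem_comp.group value for a PDB _chem_comp.type, or None."""
    type_upper = comp_type.upper()
    if type_upper in ("L-PEPTIDE LINKING", "D-PEPTIDE LINKING"):
        return "peptide"
    if "SACCHARIDE" in type_upper:
        name_lower = comp_name.lower()
        # Assumption: the ring form is recognised from a substring of the name.
        if "pyranos" in name_lower:
            return "pyranose"
        if "furanos" in name_lower:
            return "furanose"
        return None
    if "DNA LINKING" in type_upper:
        return "DNA"
    if "RNA LINKING" in type_upper:
        return "RNA"
    return None  # no group assigned; --group can set one manually


print(chem_comp_group("L-peptide linking"))
print(chem_comp_group("D-saccharide, beta linking", "beta-D-glucopyranose"))
print(chem_comp_group("RNA LINKING"))
```

Returning None for unrecognised types mirrors the entry's statement that other inputs are not checked for saccharide or nucleic acid chemistry.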
14.7.2 Fixes
• Fixed problem where, when Grade2 is supplied with a SMILES input that is then charged, atoms are often reordered
during the charging process. This reordering can cause chiral inversions compared to the original input. The fix
involves producing an initial restraint dictionary from the original SMILES string and then applying the charging
routine to the initial restraint dictionary. This avoids reordering atoms and the chiral inversion problems.
Thanks to Andrew Sharff and Matthias Zebisch for reporting the bug. (#470)
• Remove misleading wedge indications of chirality from non-carbon atoms in SVG schematic 2D molecular di-
agrams. Now only carbon atoms will be marked as chiral in 2D schematics. For example, the PDB component
VIA, once charged, previously had a schematic 2D diagram with wedges indicating that both a piperazine nitro-
gen atom and the sulfonyl sulfur atom are chiral:
14.8 v 1.2.0
• Grade2 will now, by default, recognize a typical alpha amino acid with an amino group when supplied with an
input that lacks atom IDs (aka atom names), for instance a SMILES string. If an alpha amino acid is recognized
then the PDB-standard atom IDs (N CA C O OXT CB) will be set for the main chain and beta carbon atoms and
for the hydrogen atoms that they are bonded to. For further details, please see the Atom Naming chapter.
If you prefer for the renaming not to happen, then the new Grade2 command-line --no_aa_labels option turns it
off, leaving standard numerical order based atom IDs.
Note that, currently, no alterations are made if the input file specifies atom IDs (for example CIF restraint dictio-
naries and most MOL2 files).
Please let us know if you would like this feature extended, for instance to set PDB-style Greek letter remoteness
IDs for side chain atoms beyond CB.
Thanks to Thierry Fischmann and Chip Lesburg for suggesting this extension. (#234)
• A new option --ocif is introduced to set the full filename for the CIF restraint dictionary. This allows the specifi-
cation of the exact filename to be used for output. It is most useful when used with the --just_cif option. Thanks
to Steven Sheriff for suggesting this option. (#447)
14.8.2 Fixes
• Grade2 should now deal with MOL2 files of charged molecules that have partial charges for atoms. To correctly
identify the chemistry of a molecule the formal charge of each atom is required. This information is not stored
in MOL2-format if partial charges are defined (the CSD-convention for MOL2 files is to use the partial charge
field to store the formal charge). Grade2 now uses valency considerations to reconstruct the atomic formal
charges if necessary. The fix has been tested with OpenBabel MOL2 files and copes with carboxylic acids,
amines, imidazoles, nitro groups, azido groups, tetrazolates, isocyano groups, sulfanium groups, phosphonium
and borates. Please let us know if you find a chemical group that causes problems. Thanks to Steven Sheriff for
bringing the problem to our attention. (#444, #446 & #448)
• Grade2 should now correctly handle MOL2 files that use bond type ar for carboxylate groups. The CSD nor-
malisation method can make a mistake when standardising the bonding of the group. Grade2 will now correct
which oxygen atom carries the formal negative charge. Thanks to Dirk Reinert for reporting this bug. (#462)
• Fixed problem whereby Grade2 restraint dictionaries could not be read by Coot because of long InChI records.
The problem also occurs with CCP4-distributed restraint dictionaries. Currently, the CCP4 MMDB library
(against which Coot is linked) places a line length limit of 500 characters, despite the IUCr CIF specification
allowing lines of up to 2048 characters. We have let CCP4 know and the limit will be raised in a future
CCP4/Coot release (by mmdb2 revision 56). From this release, Grade2 will no longer output long InChI records
so there should be no problem in using Grade2 restraint dictionaries with older versions of Coot. Thanks to
Steven Sheriff for bringing the problem to our attention. (#438)
• If there is a problem with the RDKit chemical setup of a molecule read from MOL2-format Grade2 should now
continue and produce a rudimentary fallback restraint dictionary rather than terminating with an error message.
(#450)
• If presented with an input molecule that has atom names (aka atom IDs) longer than 4 characters, Grade2 now
gives a WARNING and does not output a PDB file. When producing custom atom names for molecules from SMILES,
5-character atom names are avoided where possible. (#454)
• Fixed a problem where DEBUG logging output was wrongly produced when certain CIF restraint dictionaries were
used as an input. (#453)
• Fixed a problem where the message WARNING: Proton(s) added/removed was written to STDERR when a
ligand with charged atoms was processed. The message comes from the InChI generation routine and is nothing
to be worried about. Now InChI generation warning messages are captured and available in the --debug output
if they are of interest. Thanks to Dirk Reinert for reporting this bug. (#461)
• Fixed a problem where the grade2_utils --pdb_to_mol2 script used by buster-report failed when supplied
with old CCP4 restraint dictionaries that contained chiral restraints with volumes such as cross2. Now the script
logs a WARNING about invalid chiral volumes and continues. Thanks to Andrew Sharff for reporting this bug.
(#463)
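The first fix in this list (#444, #446 and #448) says that formal charges are reconstructed from valency considerations when a MOL2 file does not carry them. The Python sketch below shows the basic idea for nitrogen and oxygen only. It is a simplified illustration and not Grade2's algorithm: the function formal_charge and the NEUTRAL_VALENCE table are hypothetical, and the sketch assumes explicit hydrogen atoms, integer bond orders, complete octets and no radicals.

```python
# Number of bonds a neutral atom forms under the octet rule.
NEUTRAL_VALENCE = {"N": 3, "O": 2}


def formal_charge(element, bond_orders):
    """Infer a formal charge from the summed bond orders around one atom.

    Elements outside NEUTRAL_VALENCE are returned as neutral, because this
    sketch does not attempt to handle variable-valence elements such as S or P.
    """
    if element not in NEUTRAL_VALENCE:
        return 0
    return sum(bond_orders) - NEUTRAL_VALENCE[element]


# Protonated amine nitrogen: four single bonds.
print(formal_charge("N", [1, 1, 1, 1]))  # 1
# Carboxylate oxygen carrying the negative charge: one single bond.
print(formal_charge("O", [1]))           # -1
# Nitro group nitrogen: one double bond and two single bonds.
print(formal_charge("N", [2, 1, 1]))     # 1
# Carbonyl oxygen: one double bond.
print(formal_charge("O", [2]))           # 0
```

A real implementation also has to cope with aromatic (ar) bond types and with elements that have several stable valences, which this sketch leaves out.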
14.9 v 1.1.0
• LIG is now used for the default residue name (aka PDB chemical component id or 3-letter code). Please see the
FAQ on residue names for more information.
• Grade2 can now read a Grade CIF restraint dictionary as an --in input file. As Grade CIF restraint dictionaries
lack atom formal charge (_chem_comp_atom.charge) records these are set to zero when the restraint dictionary is
read and care must be taken as this may cause the output molecule to be incorrect. The InChIKey is read from
the Grade CIF restraint dictionary to enable a check that the stereochemistry matches. Please note that the bond
orders from Grade restraint dictionaries can be incorrect. For further information, please see the FAQ: How can
I use Grade2 to generate a restraint dictionary with atom names consistent with an existing Grade dictionary?.
(#354 & #358)
• Grade2 can now read an eLBOW CIF restraint dictionary as an --in input file (as well as those from AceDRG,
Grade and Grade2 itself). (#350 & #353)
• Known Issues and FAQs chapters added to this documentation (#313). The FAQs include "How can I run Grade2
if I only have a PDB file for the ligand?" and "How can I produce restraints for a ligand with a different protonation
state or tautomer?" with a video demonstration. It is best to check the online versions of the chapters as these are
frequently updated as new issues and questions come in:
– Known Issues: https://gphl.gitlab.io/grade2_docs/issues.html
– FAQs: https://gphl.gitlab.io/grade2_docs/faqs.html
• Information about Mogul data libraries used is now included in the terminal output and the output CIF restraint
dictionary in item _gphl_chem_comp_info.mogul_data_libraries. The CCDC release periodic updates to
the CSD through each year and these will be recorded. In addition, the use of Mogul information from in-house
databases should be logged. (#368)
• A tool to produce MOL2 files for buster-report Mogul analysis using Grade2 code has been produced. This
is to avoid problems in chemical markup from coordinate files. This tool enables the chemistry of the ligand in
Mogul analysis to be based on the CIF restraint dictionary used for refinement (after CSD standardization). The
grade2_utils script option --pdb_to_mol2 is used for the conversion. (#380 & #433)
• Grade2 now produces CIF restraint dictionaries with both electron-cloud and nucleus X-H bond restraints, avoid-
ing requiring separate restraint dictionaries for the two use cases. The --ecloud option is retained to specify that
the ideal coordinates for the ligand should use the electron cloud rather than nuclear distances. The BUSTER
refine option -M Ecloud can be used to select the e-cloud model or -M HydrogenHybridModel the hybrid
model. (#431)
• Grade2 has been altered to produce a single plane restraint for each separate ring that is judged to be flat. Previ-
ously, planar rings were held flat by a number of four-atom planes. The --4_atom_planes option can be used to
restore the previous behaviour. In practice, the change simplifies the restraint list but there is little difference in
results. Please see the Treatment of Planar Groups chapter for more information. (#342)
• The _chem_comp_atom.type_energy for hydrogen atoms is now set to proper context dependent values rather
than being left as H for all atoms. The information is required for BUSTER to set up non-bonded contacts properly,
distinguishing between polar, aromatic and other hydrogen atoms. (#406)
• The version information from the -V, --versions option has been extended to include information as to the location
from which the CSD Python API is loaded. (#368)
• The testing script grade2_tests has been altered to output Grade2 version information. Thanks to Andrew
Sharff for this suggestion. (#430)
• Added FAQ Grade2 says that the ligand matches an existing PDB chemical component. What should I do?. The
FAQ is given in help messages by both the command-line and Grade Web Server interfaces. (#507)
14.9.2 Fixes
WARNING: input has atom names with lower case letters: Br1 Cl1
WARNING: converting lower case atom names to upper case
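The two warning lines above suggest that atom names containing lower case letters are reported and then converted to upper case. A minimal Python sketch of such a step follows. The surrounding description is not available here, so this is an inference from the messages alone: the function uppercase_atom_names is hypothetical and is not taken from Grade2.

```python
def uppercase_atom_names(atom_names):
    """Return upper-cased atom names plus the warning lines to report."""
    mixed = [name for name in atom_names if name != name.upper()]
    warnings = []
    if mixed:
        warnings.append(
            "WARNING: input has atom names with lower case letters: "
            + " ".join(mixed)
        )
        warnings.append("WARNING: converting lower case atom names to upper case")
    return [name.upper() for name in atom_names], warnings


names, messages = uppercase_atom_names(["C1", "Br1", "Cl1"])
for line in messages:
    print(line)
print(names)  # ['C1', 'BR1', 'CL1']
```

A fuller version would also need to check that upper-casing does not make two different input names identical.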
14.10 v 1.0.0
14.11.1 v 0.1.15
New Features
• Terminate with a clear error message if an attempt is made to run grade2 on a CentOS 6 system. (#301)
• First draft of the Documentation "Charging" chapter. (#274)
Fixes
• Workaround to give exit status 0 if the Grade2 run is successful but where there is a Segmentation fault or
std::bad_alloc on shutdown. This means that the exit status of grade2 should reliably indicate success or
failure. (#300)
• Improvements in the Documentation "Outputs" chapter. (#261)
• Fixed minor bug where the first information line about $CSDHOME produced by the grade2 script was not indented
by a space. (#293)
• grade2_tests now skips the test for EL9 restraint dictionary generation with Mogul as it takes 55 seconds. (#294)
• Do not output final suggestion if --just_cif option is used (because the suggestions given require the PDB
file). (#297)
• Fix many typos in documentation. (#303 & #305)
14.11.2 v 0.1.14
New Features
• The default output PDB chemical component id (aka residue name or 3-letter code) is now L_1 rather than XXX.
The use of an underscore ensures that there is no conflict with the id's of existing PDB components. (#273)
• The names of the files output by grade2 have been altered to make their contents clearer. In particular, the
principal output restraint dictionary is now named L_1.restraints.cif (as CIF-format is used for many types
of data). The molecular diagram filenames start with L_1.diagram., whereas 3D coordinate filenames begin
L_1.xyz.. (#266)
• The grade2 -h help message has been improved. The Help & setup arguments are now listed as a separate
group. All argument descriptions have been shortened with detail now given in the Documentation "Usage"
chapter. (#256)
• All grade2 terminal output messages now start with a space, following a request from Clemens. This is for
consistency with other BUSTER package programs and allows the distinction between program-produced and
system messages. (#282).
• At the end of a grade2 run a suggestion for running Coot or EditREFMAC to view/edit the restraints is now
made. (#268)
• Add Normal termination (N sec) to the end of terminal output, giving elapsed seconds, following
a request from Clemens. Also include the elapsed time information in restraint dictionary as CIF item
_gphl_chem_comp_info.elapsed_seconds. (#281)
• If an input SDF file has 2D coordinates, a WARNING: message is now given that XYZ coordinates are generated. (#288)
• Improved error handling for PDB chemical components that lack complete ideal or model coordinates (a current
example is T0D). These cases will now terminate with the line:
ERROR: the PDB CCD lacks complete ideal or model coordinates: cannot proceed.
Note that having incomplete coordinates often indicates that there is a problem with the chemical markup of the
PDB component. (#61)
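Two of the conventions introduced in this list, the leading space on every terminal message (#282) and the closing Normal termination (N sec) line (#281), can be illustrated together. The Python sketch below is only an illustration: the function termination_message is hypothetical, and rounding the elapsed time to whole seconds is an assumption.

```python
import time


def termination_message(start, end):
    """Build the closing line; the leading space follows the output convention."""
    elapsed = int(round(end - start))  # assumption: whole seconds are reported
    return f" Normal termination ({elapsed} sec)"


start = time.monotonic()
# ... restraint generation would run here ...
end = time.monotonic()
print(termination_message(start, end))
```

time.monotonic is used rather than wall-clock time so that the elapsed value cannot go backwards if the system clock is adjusted during a run.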
Fixes
• Fixed bug in the gelly geometry optimization chiral restraints setup that caused serious distortions for some
chiral centers. (#265)
• The -P PDB_ID, --PDB_ligand PDB_ID input option will now convert a lower case pdb_id to uppercase,
following Claus' suggestion. (#264)
• Fix bug where occasional atoms on phenyl rings next to bulky groups were not set planar (for example PDB
ligand GVV https://www.rcsb.org/ligand/GVV atom C9). (#270)
• Fixed bug where flat PDB ligands (such as QBK https://www.rcsb.org/ligand/QBK) failed with "cannot load
RDKit coordinate as not 3D" exception. (#287)
• Fixed bug where a Kekulization problem caused failure to produce a rudimentary fallback restraint dictionary
for some PDB ligands containing metal atoms. The fix allows production of fallback restraint dictionaries for
ligands such as X8P https://www.rcsb.org/ligand/X8P. Note that the rudimentary fallback restraint dictionary
is based on the input coordinates from the PDB CCD where Mogul information is not available. (#286)
• Fixed bug where the output log became scrambled when dealing with PDB CCD cif file input that had model
rather than ideal coordinates (such as TPP). (#290)
14.11.3 v 0.1.13
New Features
14.11.4 v 0.1.12
New Features
• Once the restraints have been finalized, a final 'ideal' set of coordinates is produced by geometry optimizing the
current coordinates with the restraints. The stand-alone gelly executable is used for the optimization to ensure
compatibility with BUSTER refine. The optimized conformation is written to the restraint dictionary CIF file,
PDB, SDF and MOL2 files. (#38 & #251)
• The documentation section "Installation and Testing" now explains how to configure and test Grade2. (#250)
• The configuration of Grade2 has been streamlined with clearer advice in error messages. A new optional envi-
ronment variable BDG_TOOL_CSD_PYTHON_API has been introduced for cases where there has been creative use
of symbolic links in the location of CSDHOME. (#250)
Fixes
• Fixed bug that caused the grade2 0.1.12 rc1 Linux version to crash out with FileNotFoundError: ...
screen_final.txt' message. (#253)
14.11.5 v 0.1.11
New Features
• Plane restraints are now used for torsions that are detected to have a strong trans or cis preference in Mogul
analysis. Previously, 1-fold torsion angle restraints were used, like in old versions of Grade. 1-fold torsion
restraints preclude flipping to the other rare conformer as well as not working well in recent versions of Coot.
The plane restraints have an .id starting trans- or cis- so that the conformational preference information could be
utilized downstream. (#240)
• Restraints are now defined for torsions where there is a preference for a planar conformation but where steric
interactions interfere. An example of this is provided by folic where a carbonyl group is attached to a phenyl ring
and is most commonly found pushed out of the plane. (#241)
• For plane restraints the standard deviation for the out-of-plane distance (also known as the sigma) is now based
on analysis of Mogul+ data for both ring and non-ring restraints. In practice, this means that ring planes will be
held tighter than non-ring planes. (#241)
Fixes
• The chemistry of nitro groups is now altered to the CSD convention before Mogul geometry analysis is performed.
The MOL2 coordinate file output uses the CSD standardised bonding, including for nitro groups. (#229)
• Fixed a sporadic bug in the cython version that caused Mogul torsion results to be ignored. The bug was traced to
a NamedTuple alteration and is hopefully fixed. (#243)
14.11.6 v 0.1.10
New Features
• grade2 now implements the command-line option -e, --ecloud to use electron-cloud distances for bonds to hydrogen
atoms that are adequate for X-ray refinement. This option is based on grade -ecloud and the same ideal bond
distances and sigmas are used. (#37 & #220)
Fixes
• Fix bug that chiral restraints involving a hydrogen atom could be produced (planning#5).
14.11.7 v 0.1.9
New Features
• The Grade2 command-line option --no_mogul has been removed to be consistent with Grade. This
means that CSD must be installed to use Grade2. (#204 and #212)
• Grade2 command-line option -checkdeps added to be consistent with other BUSTER tools. The -checkdeps
option checks that CSD Mogul is accessible through the CSD Python API and works properly. Like grade
-checkdeps, as part of the check the ideal bond angle for carbon dioxide is obtained from a CSD Mogul search.
• Started writing user documentation for Grade2. The documentation, in HTML and PDF formats, is included
with BUSTER and can be found in the directory $BDG_home/docs/grade2. The documentation includes this
changelog. (#203, #216)
• The update process for the store of PDB chemical components InChiKeys has been automated to run every
Wednesday after the weekly wwPDB release. This means that store should be up-to-date whenever Grade2 is
released. (#210)
Fixes
• Fix bug where Grade2 run through the distributed shell wrapper gave an exit status of 0 (success) when an error
occurred. (#193)
14.11.8 v 0.1.8
New Features
• The miniconda environment that will be used to distribute Grade2 to users now has Grade2 installed in a binary
form produced by cython. This means the Grade2 Python source code will not be distributed and so is protected
from "prying eyes" and tinkering. (#169, #174 and #178)
The use of cython can be confirmed by using the grade2 option -V, --versions:
$ grade2 --versions
using CSD from $CSDHOME=/Applications/CCDC/CSD_2021/
grade2 0.1.8 (2021-02-02), RDKit 2020.09.1, Mogul 2020.3.0, CSD 542, csd_python_api 3.0.4
If a binary cython version is used then loaded from will end in .so and contain cpython
• An automated procedure to produce the conda_pack tarballs that will be used to distribute Grade2 with the
BUSTER installation has been developed. The procedure uses a GitLab CI/CD pipeline. The delivery process
is run whenever a git tag is created, for instance, by making a GitLab release of the grade2 project. Separate
installation tarballs are created for both Linux and macOS. (#177)
• Once Grade2 is installed it can now be tested using the command grade2_tests. This will run over 300 unit
and functional tests using the pytest testing framework. grade2_tests provides a quick way to ensure that a
Grade2 installation works as it should, including that the CSD Python API loads and behaves as expected. (#183)
Fixes
• Fix the WARNING message given if molecule is charged to give the correct -N, --no_charging option and
to be more readable. (#163)
• If a rudimentary fallback restraint dictionary is produced from MOL2 coordinates the bond and angle restraints
that are set from the input coordinates will have source input_mol2_coords. (#149)
• Fix PDB output for cases where residue name (aka chemical component id) is not 3-letters so it is correctly col-
umn formatted. In addition restraint CIF dictionary item _chem_comp.three_letter_code is now truncated
to the first 3 letters of _chem_comp.id. (#147)
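The truncation rule in the last fix above can be illustrated with a minimal Python sketch. It is not Grade2's own code, and the function name three_letter_code is hypothetical.

```python
def three_letter_code(chem_comp_id: str) -> str:
    """Return a value for _chem_comp.three_letter_code.

    Hypothetical helper: keeps at most the first 3 characters of
    _chem_comp.id, as described in the changelog entry.
    """
    return chem_comp_id[:3]


# A 3-letter id is unchanged; a longer id is cut to its first 3 characters.
print(three_letter_code("LIG"))
print(three_letter_code("A1ABC"))
```

Ids shorter than three characters, such as NA, pass through unchanged.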
14.11.9 v 0.1.7
New Features
• A procedure to distribute Grade2 with the BUSTER installation has been developed. Currently, this involves unpacking
a tarball containing a miniconda environment produced by conda_pack and some helper scripts. For
details contact Oliver. (#160 and #164).
Fixes
• Fixed bug where the CSD_PYTHON_API location setting failed if CSD occurred more than once in the CSDHOME
path. (#167)
14.11.10 v 0.1.6
New Features
• Grade2 can now work using a run time import of the CSD Python API from the CCDC miniconda Python
environment that is distributed with the CSD. Please note that this is likely to be the way that grade2 is included
in the BUSTER distribution as it avoids update problems and redistribution of CCDC software. (#155)
• Grade2 now checks the InChIKey of the input molecule against a store of those from the wwPDB chemical
components definitions (wwPDB CCDs) https://www.wwpdb.org/data/ccd . This provides similar information
to the recognise-compound feature of Grade, with improvements such as detection of tautomers. For example,
the CHECK user output when generating restraints for PDB ligand 2D3:
(((output omitted)))
shows that the 2D3 ligand has a tautomer XQK in the wwPDB CCD. The output includes the RCSB URLs for each
ligand as this is useful to help the user examine the hit(s). The check is made before any time-consuming step as
this makes it more likely for the output to be read. The CHECK output is included in the output restraint CIF in
the items _gphl_check_inchikey_pdb_ccd.text. The wwPDB CCDs store used is provided in a separate
repo that will be updated on a weekly basis. (#104)
• Improvement in the logging information that Grade2 provides about a PDB component to include the RCSB and
PDBeChem URLs for the molecule. In addition, the all upper case molecule names used for old components are
now reformatted for readability. Using component 468 as an example:
Fixes
• The csd_python_api version number is included alongside the Mogul and CSD version numbers both in user output
and in the output restraint dictionary CIF file as item _gphl_chem_comp_info.csd_python_api. (#154)
• Switch to using the latest PDBeCIF parser directly available from pip. This simplifies the installation process
removing the need to separately install PDBeCIF. (#157)
• The installation section of README.md has been updated (#158).
14.11.11 v 0.1.5
New Features
• Grade2 has a new option -s, --shelx to produce SHELX restraint .dfix files. If specified two additional files
will be created with the suffixes .dfix and .with_hydrogen.dfix. The former file has restraints excluding
those to hydrogen atoms. The actual filenames will depend on the OUT_ROOT that can be set with the -o
OUT_ROOT, --out OUT_ROOT option if the default is not suitable. (#26)
• Grade2 will now try to produce a rudimentary fallback restraint dictionary for PDB ligands where there is an
RDKit sanitization problem. This can occur in cases where the PDB Chemical Components Definition has
valency problems or for problematic groups such as carborane. The rudimentary fallback restraint dictionary
will be based on input coordinates where Mogul information is not available. (#134)
• Improve treatment of metal-containing PDB ligands to recognize dative bonds and run Mogul against CSD
organometallics. Restraint dictionaries for compounds such as heme (HEM) are improved. (#135)
Note that there is still a limitation that the UFF force field setup does not work for transition metals because of
an RDKit limitation. Much future work is required to treat metals properly.
Fixes
• The ligand name is now included in the user log output for PDB ligands. (#137)
• Grade2 is now hard coded to only create chiral restraints with a central carbon atom, so nitrogen atoms that
RDKit recognises as chiral will no longer be affected. (#120)
14.11.12 v 0.1.4
New Features
• Grade2 now provides improved logging of the InChI comparison. When an InChI is available from the input (for
instance for PDB ligands using -P PDB_ID) this is compared to the InChI for the RDKit molecule generated. If
there is a match then this is noted in the output log as this is a good indication that the stereochemistry of the molecule
has been correctly set up. If there is a mismatch then a WARNING message is produced. Information about the
InChI comparison is also provided in the output CIF restraint file in items _gphl_chem_comp_info.input_inchi*
to allow machine reading. (#124)
• A script pdb_ideal_mol2_generator has been added that produces a MOL2 file for a given PDB ligand from
the PDB Chemical Component Definition using Grade2 input parsing to RDKit and the CSD Python API. For
help on using the script use the -h option. Please note this is only likely to be useful to developers for testing and
may be removed before release to users. The grade2 option -P PDB_ID, --PDB_ligand PDB_ID should be
used to generate restraint dictionaries for PDB ligands. (#126)
14.11.13 v 0.1.3
New Features
• Grade2 now supports molecule input from a CIF-format restraint dictionary produced by Acedrg or Grade2
itself. Unfortunately because of incomplete information it would be difficult to support reading of Grade restraint
dictionaries. (#105)
• Grade2 now outputs SDF and MOL2 format files for the molecule in addition to the PDB format file. The SDF
and MOL2 files have the advantage of explicitly including bonding and atom formal charge information. The
MOL2 file is written by CCDC routines and represents the chemistry supplied for Mogul analysis. (#116)
• The output CIF-format restraints dictionary produced by Grade2 has been extended to include information about
bond aromaticity. The CIF item _chem_comp_bond.aromatic is used following the practice of Acedrg. The
information presented is from the RDKit_aromaticity_model. It should be noted that there are a number of different
models of aromaticity that can lead to different results for fused and multi-ring systems, as demonstrated in
the OpenEye OEChem Toolkit page on aromaticity_perception. For this reason, procedures based on aromaticity
perception should be undertaken with caution, and Grade2 does not use aromatic information
internally. (#122)
Fixes
• SMILES and InChI descriptors are now reported as CIF item _pdbx_chem_comp_descriptor in the output CIF-
format restraint dictionary to conform to the PDB Exchange Data Dictionary. (#115)
• Bug where SMILES files containing just the SMILES string and no names caused a crash has been fixed. (#118)
• Fix bug where charging that adds a hydrogen atom, starting from MOL2 input, caused the crash "ZeroDivisionError: float
division by zero". (#119)
• Fix bug reading an Acedrg CIF restraint dictionary, produced from a MOL2 start, that lacks _pdbx_chem_comp_descriptor
information for SMILES and InChIKey. (#121)
14.11.14 v 0.1.2
New Features
• Grade2 can now handle file input from smi (SMILES) file type. (#25)
• Grade2 can now handle file input from MOL2 (SYBYL) file type. (#24) Routines from the CSD Python API are
used to input MOL2 files as the RDKit MOL2 parser has limitations. (#113)
Fixes
• Grade2 can now handle monoatomic PDB ligands, like NA sodium ion. (#65)
• Improve handling of problematic SMILES. If there is a problem in the initial coordinate generation Grade2 will now
retry using random coordinates. (#86)
• Where the PDB Chemical Component Definition has _chem_comp_atom.charge as '?' grade2 will now set the charge
to 0 and issue a WARNING message (problem arose for PDB ligand QQ7). (#67)
• Alter grade2_utils command-line options to be consistent with grade2. The option --compare IN_FILE2
now checks that the file IN_FILE2 exists before opening. (#107 and #52)
• Error messages about CSD Python API and Mogul problems have been cleaned up and include suggestion of
rerunning with -n, --no_mogul. (#60).
• Grade2 now produces a sensible error message if supplied with a file that cannot be processed (#108).
• Bug where a non-zero _chem_comp_atom.charge was not set when starting from SMILES input has been fixed.
(#109)
• Charging carboxylic acid to carboxylate no longer assumes hydrogen atom specified second in bond. Fixes bug
for CSD MOL2 INDPRA01 (#112).
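The handling of a _chem_comp_atom.charge value of '?' described in the fixes above can be sketched as follows. This is an illustration under stated assumptions rather than Grade2's implementation: the function name, the logger name and the wording of the warning are all hypothetical.

```python
import logging

logger = logging.getLogger("grade2_sketch")  # hypothetical logger name


def formal_charge_from_cif(value: str, atom_id: str) -> int:
    """Convert a _chem_comp_atom.charge value to an integer formal charge.

    CIF uses '?' for an unknown value. Following the changelog entry,
    such a value is treated as 0 and a WARNING is logged.
    """
    if value.strip() == "?":
        logger.warning("atom %s has charge '?', setting charge to 0", atom_id)
        return 0
    return int(value)


print(formal_charge_from_cif("?", "N1"))
print(formal_charge_from_cif("-1", "O2"))
```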
14.11.15 v 0.1.1
New Features
• Grade2 can now handle file input from mol and sdf file types. (#23)
• Command line option --itype implemented to allow user setting of the file type. By default, this is detected
from the filename extension and file contents. (#23)
• Command line option --name implemented to set the compound name (_chem_comp.name). This will be displayed
in buster-report. (#28)
• Command line option --database_id implemented to set a database_id. buster-report will provide a hyperlink
for known PDB ligands. (#28)
• Command line option -b, --big_planes implemented to produce fused planes rather than lots of 4-atom
planes. (#27)
Fixes
• Grade2 will now produce logging output to STDOUT rather than STDERR. This is similar to original Grade and
makes redirection of output much easier (#103).
14.11.16 v 0.1.0
New Features
• The project is now called "Grade2" rather than "Gorr". Command-line scripts for users are now called grade2
and grade2_utils (#94). Command line options for grade2 have been revised in line with the "Grade2 Release
Candidate Proposal Document". (#93 and #96)
• Charging common neutral groups such as carboxylic acids, phosphates and alkyl amines. By default if you supply
Grade2 with a molecule that has a neutral carboxylic acid and/or phosphate group this will be deprotonated to form
a charged carboxylate or phosphate ion. Conversely, if the molecule has an 'alkyl amine' (that is a neutral
nitrogen atom bound to hydrogen atoms and/or carbon atoms that are connected to 4 other atoms) a proton will
be added to it. This charges primary amino, piperidine, and piperazine groups. To turn off the feature use
the command line option --no_charging or -N. Please let Oliver know if you would like the list of groups to
be charged to be extended. (#53)
• If Grade2 is supplied with a SMILES string that has ambiguous stereochemistry then the user will be warned
and the resulting restraints will have the chiral restraint volume set to both. In other cases of ambiguous stereo-
chemistry command-line option --chirality_both or -c can be used to set chiral restraint volumes to both.
(#71)
• grade2_utils can read restraint CIF files from CCP4 ACEDRG to facilitate comparison of restraints between
Grade2 and ACEDRG. (#81)
Fixes
• Planar atoms without full Mogul information now set from MMFF94s out-of-plane restraint rather than sum of
bond angles (#84)
• Chiral restraints no longer placed on phosphorus atoms. These restraints can cause distorted phosphate groups
if the oxygen atoms' atom_ids are not standard. Grade2 is now hard coded to only create chiral restraints with a
central carbon atom, so phosphate groups will no longer be affected. (#75 and #120)
• Fix bug where piperidine and piperazine ring nitrogen atoms wrongly set planar from Mogul results (PDB ligands
9JY and VIA) (#83)
• Ideal bond angles not available from Mogul now taken from force field optimized values rather than the force
field equilibrium value. This is to get around cases where MMFF94 has bond angle restraints inconsistent with
planar restraints, like atom N6 of ATP. (#91)
Other Changes
14.11.17 v 0.0.3
New Features
14.11.18 v 0.0.2
Fixes
• Restraint CIF produced for PDB ligand CLF (FE8-S7 cluster) (#62).
• gorr -PDB_ligand now retries the download 3 times, after waits of 0, 10 and 40 seconds (#58).
• Deal with PDB ligands lacking or incomplete ideal coordinates, for instance TDP (#55). Model coordinates will
be used.
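The download retry behaviour listed in the fixes above can be sketched in Python as below. This is a minimal illustration, not the code used by gorr or Grade2. It assumes that each wait is applied before the corresponding attempt, and the function and parameter names are hypothetical.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T],
    waits: Sequence[float] = (0, 10, 40),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() once per entry in waits, sleeping that many seconds first.

    Returns the first successful result. Raises RuntimeError if every
    attempt fails with an OSError (urllib's URLError derives from OSError).
    """
    last_error = None
    for wait in waits:
        sleep(wait)
        try:
            return fetch()
        except OSError as error:
            last_error = error
    raise RuntimeError(f"download failed after {len(waits)} attempts") from last_error
```

Passing the sleep function as a parameter keeps the sketch testable without real delays. In practice fetch would wrap the actual download call.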
14.11.19 v 0.0.1
New Features
• First release for GPhL testing. Limited to SMILES and PDB ligands
FIFTEEN
KNOWN ISSUES
Please check the online version of this issues page: https://gphl.gitlab.io/grade2_docs/issues.html as this is updated
frequently as new issues arise.
15.1.1 When used with CSD Core 2024.2 Grade2 outputs "Detected locale..."
When Grade2 is used with the latest version of CSD Core (2024.2) a multi-line message like the following
is written to the terminal output:
Detected locale "C" with character encoding "ANSI_X3.4-1968", which is not UTF-8.
Please ignore the message. Grade2 does not use Qt and it will work perfectly well.
We are trying to work out how we can stop the message appearing.
.............................
A Consortium user reports that grade2 terminates immediately after outputting the initial INFO lines
(without producing any additional output). In contrast, grade2_tests works normally. The problem
does not occur for most users.
The problem can be fixed by the software installer setting up BUSTER and then editing the file
$BDG_home/scripts/grade2 to remove the second line:
set -e
Grade2 should then work normally. Please let us know if you continue to have problems.
• For the changes made in the release see Changelog release 1.4.1
• See also the BUSTER issues page https://www.globalphasing.com/buster/wiki/index.cgi?IssuesPage202307
Grade2 uses Mogul/CSD information for planes where possible. But if there are not enough Mogul hits,
Grade2 creates a plane when the MMFF94s force field (as implemented in RDKit) imposes a 2-fold torsion.
This can lead to the wrong assignment of planarity. For example, in the PDB component PWR a plane
was assigned between the amide group and the isoquinoline ring:
After fix #636 in Grade2 release 1.5.0 the MMFF94s 2-fold torsion term is only used to assign planarity for
hydrogen atoms. As can be seen in the diagram, the bond in question is now allowed free rotation. The
PWR ligand pose in PDB structure 7GH0 is supported by clear electron density and has an out-of-plane
torsion of around 30°.
Problem fixed #636 in Grade2 release 1.5.0.
15.2.2 Antecedent option terminates with an "atom index out of range" error
Using the --antecedent RELATED_RESTRAINTS_CIF option with a restraint dictionary for a related
molecule that matches only part of a large complex ligand can result in Grade2 terminating with error
messages, like the following:
...
ANTECEDENT: 16 unmatched atoms from input molecule so set new atom IDs: C10 C11 C12 C13 C14 N15 N16 O17 O18 H10A H10B H12 H14A H14B H15 H16
The problem is caused by difficulties in creating 2D coordinates with the matching atoms being aligned.
The bug will be fixed in the next release of Grade2 (fix #632).
In the meantime, it is possible to work around the problem by editing the restraint dictionary used for the
--antecedent option and removing the CIF loop pdbe_chem_comp_atom_depiction that contains the
2D coordinates.
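The edit described above can also be scripted. The following Python sketch removes one loop from a CIF file held as text. It is not part of Grade2 and rests on assumptions: the loop items are taken to be named _pdbe_chem_comp_atom_depiction.*, the loop is taken to end at the next line starting with #, loop_, data_ or an item name, and no multi-line text values are handled. The function name is hypothetical.

```python
from typing import List


def remove_cif_loop(cif_text: str, category: str) -> str:
    """Return cif_text without the loop_ whose items belong to category."""
    prefix = f"_{category}."
    lines = cif_text.splitlines()
    kept: List[str] = []
    i = 0
    while i < len(lines):
        is_target = (
            lines[i].strip() == "loop_"
            and i + 1 < len(lines)
            and lines[i + 1].strip().startswith(prefix)
        )
        if not is_target:
            kept.append(lines[i])
            i += 1
            continue
        i += 1  # skip the "loop_" keyword itself
        # Skip the item names that form the loop header.
        while i < len(lines) and lines[i].strip().startswith("_"):
            i += 1
        # Skip data rows until the next CIF construct or comment line.
        while i < len(lines) and not lines[i].strip().startswith(
            ("loop_", "_", "data_", "#")
        ):
            i += 1
    return "\n".join(kept) + "\n"
```

A typical use would read the restraint dictionary, call remove_cif_loop(text, "pdbe_chem_comp_atom_depiction") and write the result to a new file that is then passed to --antecedent.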
15.3.1 Grade2 will not work with the latest CSD update 2023.2 until you update to
Grade2 1.4.1
The CSD update 2023.2 (July 2023) introduces a change that means that old Grade2 versions prior to
release 1.4.1 will not work with it.
The easiest way to ensure compatibility is to update both BUSTER (and hence Grade2) and CSD. Please
see the CSD compatibility chapter https://gphl.gitlab.io/grade2_docs/csd_compatibility.html for further
details.
Grade2 release 1.4.1 works with both the CSD update 2023.2 and with older versions of the CSD.
15.3.2 Bug: bond angle sigma set to zero in Grade2 restraints causes BUSTER to
terminate
Occasionally, Mogul returns a bond angle sigma of 0.0°. When this occurs the TNT program used by Grade2
via gelly reports an error message that is included in the gelly output:
Grade2 then proceeds and writes a .restraints.cif file including the bond angle with the problematic
sigma of 0.0°. Using this restraint file in BUSTER will cause BUSTER to fail because TNT
GEOMETRY protects against such values with "Sigma cannot be zero or negative" errors.
This issue affects all versions of Grade2 prior to release 1.4.1.
As a temporary workaround, the .restraints.cif can be edited to alter the CIF item
_chem_comp_angle.value_angle_esd for the problem bond angle from 0.0 to 0.3. The editing can
be done either with a text editor or with a restraint editor like EditREFMAC or within Coot. Once this is
done then BUSTER should be able to use the .restraints.cif without any problems.
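For users who prefer to script the workaround, the edit can be sketched in Python as below. This is an illustration only, not a Grade2 or BUSTER tool. It assumes that the bond angles are stored in a loop_ with one whitespace-separated row per angle and no quoted values containing spaces. The function name is hypothetical and changed rows are rewritten with single spaces between values.

```python
from typing import List


def fix_zero_angle_sigmas(cif_text: str, new_sigma: str = "0.3") -> str:
    """Replace zero _chem_comp_angle.value_angle_esd values by new_sigma."""
    esd_item = "_chem_comp_angle.value_angle_esd"
    out: List[str] = []
    columns: List[str] = []  # item names of the loop currently being read
    in_header = False
    for line in cif_text.splitlines():
        stripped = line.strip()
        if stripped == "loop_":
            columns, in_header = [], True
        elif in_header and stripped.startswith("_"):
            columns.append(stripped.split()[0])
        elif stripped.startswith(("_", "data_", "#")) or not stripped:
            columns, in_header = [], False  # the current loop has ended
        else:
            in_header = False
            tokens = stripped.split()
            if esd_item in columns and len(tokens) == len(columns):
                index = columns.index(esd_item)
                try:
                    is_zero = float(tokens[index]) == 0.0
                except ValueError:
                    is_zero = False
                if is_zero:
                    tokens[index] = new_sigma
                    line = " ".join(tokens)
        out.append(line)
    return "\n".join(out) + "\n"
```

The column position of value_angle_esd is read from the loop header, so additional columns in the loop do not affect the result.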
Thanks to Christian Schleberger for reporting this bug.
15.4.1 Getting Grade2 to work with CSD release 2023.1 released April 2023
The CSD release 2023.1 made late April 2023 involves a large alteration in its directory structure. The
next Grade2 release (due in early June 2023) will work immediately with CSD 2023.1. Until then there is
a workaround to get Grade2 to work with the new CSD release.
Problem fixed #580 in Grade2 release 1.4.0.
After update to CSD release 2023.1, it is possible for Grade2 to terminate with an error message ending
with a line that contains CSDNotFoundException. If this occurs please see: FAQ Grade2 terminates with
a "CSDNotFoundException", what should I do?.
15.4.3 Grade2 -P, --PDB_ligand failed for some PDB components (now fixed)
Between 31 March 2023 and 13 April 2023 grade2 --PDB_ligand or grade2 -P failed for many
PDB components with an error that ended pdbecif.mmcif_tools.MMCIFWrapperSyntaxError. The
problem was caused by a change to the ccdutils tool used at PDBe to process the chemical component
definitions (CCDs). This has now been fixed https://github.com/PDBeurope/ccdutils/issues/12 and grade2
--PDB_ligand will now work fine.
The problem affected all versions of Grade2 but has been fixed by PDBe. Big thanks to the PDBe team,
particularly Ibrahim Roshan for the fix.
15.5.1 Grade2 wrongly assigns ambiguous chiral restraints for some SDF input files
15.6.1 Grade2 does not properly handle SMILES with some ambiguous chirals
When Grade2 is given an input SMILES string having some chiral center configurations specified but some
left ambiguous it arbitrarily assigns a configuration to the ambiguous centers.
The bug affects all Grade2 releases prior to 1.3.1.
15.6.2 Grade2 termination IndexError: string index out of range for a very
large ligand
When supplied with a ligand that contains more than 26 six-membered rings Grade2 crashes with an error
message that ends:
The error occurs because rings are given identifiers in the sequence ring6A, ring6B, ring6C ... ring6Z.
This broke down when there were more than 26 rings.
The bug has been fixed (#532) allowing any number of rings by continuing the sequence of identifiers ...
ring6Z, ring6AA, ring6AB ... The bug affects all Grade2 releases prior to 1.3.1.
Thanks to Deepak Deepak for reporting this bug.
Bug fixed #532 in Grade2 release 1.3.1.
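The extended identifier sequence can be generated with a spreadsheet-column style numbering. The sketch below is an illustration and not the code from fix #532. The function names are hypothetical, and it is assumed that the digit in ring6 is the ring size.

```python
from string import ascii_uppercase


def ring_label(index: int) -> str:
    """Map 0, 1, ... to A, B, ..., Z, AA, AB, ... (bijective base 26)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = ascii_uppercase[remainder] + letters
    return letters


def ring_identifier(ring_size: int, index: int) -> str:
    """Build an identifier such as ring6A for the index-th ring of a size."""
    return f"ring{ring_size}{ring_label(index)}"


# Identifiers either side of the old 26-ring limit.
print([ring_identifier(6, i) for i in (0, 1, 25, 26, 27)])
```

Unlike indexing into a fixed 26-letter string, this never runs out of labels.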
When BUSTER is installed on some Ubuntu Linux OS the --lookup ID using the default
pubchem_g2_lookup_script.py script distributed with BUSTER can fail terminating with an error
message that includes SSL: CERTIFICATE_VERIFY_FAILED. For example:
15.7.1 Charging from SMILES reorders atoms and can cause chiral inversion problems
If Grade2 is supplied with a SMILES input that is then charged then atoms are often reordered during the
charging process. This reordering can cause chiral inversions compared to the original input. The bug was
first observed when testing the new amino acid atom labelling feature using modified amino acids from
SMILES strings and can cause an erroneous chiral restraint forcing a D-amino acid conformation.
The problem can be avoided by using the --no_charging command-line option that will mean the original
SMILES string will be used. If charging is wanted then the resulting Grade2 restraint dictionary can be
used for a follow-on Grade2 run. For instance:
the second Grade2 run will apply charging so a zwitterionic amino acid restraint dictionary is produced.
Please note that the bug is limited to SMILES inputs and does not occur with other kinds of input like SD
files. This bug affects previous Grade2 releases.
Thanks to Andrew Sharff and Matthias Zebisch for reporting this bug.
Bug fixed #470 in Grade2 release 1.3.0.
15.8.1 Grade2 cannot read a MOL2 file of a charged molecule when it has atomic
partial charges
Depending on their source, MOL2 files can contain a variety of atomic charge records. Grade2 uses
CSD routines to read MOL2 files and can correctly process MOL2 files for neutral uncharged molecules.
Currently, grade2 has a problem reading in molecules with formal charges when the MOL2 has partial
charge records (that will result in unusual valence and RDKit sanitization errors). There is no
problem reading MOL2 files that use the CSD convention where the atomic charges are used for formal
charges, for instance those written by Conquest or Grade2 itself.
We are currently working on improving Grade2 so that it will better handle MOL2 files with partial charges.
This will be included in the next release. In the meantime, it is possible to manually edit in the correct formal
charges; please see the FAQ Editing MOL2 file of a charged molecule with atomic partial charges. Thanks
to Steven Sheriff for reporting the problem. (#444)
Bug fixed #444 in Grade2 release 1.2.0.
15.8.2 Grade2 restraint dictionaries cannot be read by Coot because of long InChI
records
For large molecules, attempting to read a Grade2 restraint dictionary results in an error message that begins
Dirty mmCIF file?, for instance:
The problem occurs because of long InChI records in the restraint dictionary and also occurs with CCP4-
distributed restraint dictionaries. Currently the CCP4 MMDB library (against which Coot is linked) places
a line length limit of 500 characters, despite the IUCR CIF specification allowing lines of up to 2048
characters. We have let CCP4 know and the limit will be raised in a future CCP4/Coot release. As an
additional measure to avoid the problem, the next Grade2 release will not output long InChI records.
In the meantime, if the problem occurs then please adapt the following command-line fix:
The resulting restraint dictionary will be stripped of the problematic lines and should work with Coot.
Thanks to Steven Sheriff for reporting this problem. (#438)
Bug fixed #438 in Grade2 release 1.2.0.
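As an illustration of stripping the problematic lines, here is a minimal Python sketch. It is an assumption-laden example rather than the fix distributed with the documentation: it simply drops every line longer than the 500-character limit mentioned above, and it is not known from the text whether a line of exactly 500 characters is accepted. The function name is hypothetical.

```python
def strip_long_lines(cif_text: str, max_length: int = 500) -> str:
    """Return cif_text without any line longer than max_length characters."""
    kept = [line for line in cif_text.splitlines() if len(line) <= max_length]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python strip_long_lines.py in.cif > out.cif
    with open(sys.argv[1]) as handle:
        sys.stdout.write(strip_long_lines(handle.read()))
```

Because whole lines are removed, the descriptor records they carried are lost from the edited copy, so the original dictionary should be kept.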
15.8.3 On MacOS 10.15 using NFS - there may be problems running Grade2
We have found a problem in running Grade2 on a MacOS 10.15 workstation where the software has been
installed on an NFS-mounted filesystem. If you come across such issues please let us know.
15.9.1 Grade2 cannot read a MOL2 file of a charged molecule when it has atomic
partial charges
Please see section above - Bug fixed #444 in Grade2 release 1.2.0.
15.9.2 Grade2 restraint dictionaries cannot be read by Coot because of long InChI
records
Please see section above - Bug fixed #438 in Grade2 release 1.2.0.
15.9.3 On MacOS, Grade2 does not work with latest CSD Release 2021.2 (September
2021)
On MacOS there is a problem using Grade2 with the latest CSD update due to library duplication.
So please do not update CSD to release 2021.2.
Bug fixed #390 in Grade2 release 1.1.0.
15.9.4 On MacOS installation of (or update to) 2021.1 CSD Release (July 2021)
causes Grade2 to crash
The --big_planes option merges neighbouring individual four-atom planes into as large a single plane as
possible. --big_planes was included in the Grade2 options as Grade had this feature. Unfortunately, the
current Grade2 --big_planes option ignores the 𝜎's (standard deviation of the out-of-plane distance)
for individual 4-atom plane restraints when merging, instead setting the 𝜎 of each big plane to 0.020
Angstroms.
For example, for PDB component DZ3 https://www.rcsb.org/ligand/DZ3 Grade2 produces four-atom
planes with 𝜎 's obtained from Mogul + custom CSD analysis:
In particular, notice the weak planes across the bonds joining the phenyl rings to the amide (marked in
green). These weakly encourage planarity but can easily be overcome if the electron density fit warrants it.
Running grade2 -P DZ3 --big_planes produces a single big plane:
The subtleties of the individual 4-atom plane 𝜎 's are ignored and the single plane is erroneous, imposing
unrealistic conformational restriction.
We will be looking at whether it is sensible to merge plane definitions given that plane 𝜎's are now derived
from CSD analysis and how this can best be done. This issue will be fixed in the next release. For now, it is
strongly recommended that the Grade2 --big_planes option should not be used.
The BUSTER tool aB_fuseplanes has a number of different modes for combining planes. Unfortunately,
aB_fuseplanes from the Jul 16 2021 BUSTER release does not work with Grade2 CIF restraint
dictionaries. The issue has been fixed in the BUSTER release Oct 20 2021 and aB_fuseplanes now
works with Grade2 CIF restraint dictionaries. Initial tests show that aB_fuseplanes -checkTOR produces
a reasonable set of fused planes.
Bug fixed #342 in Grade2 release 1.1.0.
15.9.6 Bug: input from MOL2 file that has atom names containing lower case letters
If supplied with a MOL2 file that has atom names with lower case letters such as Cl2 Grade2 will output
a restraint dictionary CIF using these identifiers unchanged. This will cause downstream problems as
BUSTER cannot cope with lower case letters in atom names. Thanks to Dirk Reinert for reporting this.
Grade2 should emulate Grade by giving a WARNING message and then making the atom name
upper case. The bug will be fixed (#324) in the next release. In the meantime, sed (or awk) could be
used as a workaround, for instance:
15.9.7 Bug: grade2 will fail if the environment variable PYTHONPATH is set
The environment variable PYTHONPATH can be set to add additional directories where Python will look for
modules and packages (see tutorialspoint PYTHONPATH for more detail). In general, it is best practice
to avoid setting PYTHONPATH in your default run time environment in order to get a particular program to
work as this can interfere with other programs.
Currently, Grade2 is vulnerable to PYTHONPATH being set and will fail with a message that includes ERROR:
ImportError when grade2 is run. Thanks to Yong Wang for reporting the problem.
The easiest way to avoid the problem is to unset PYTHONPATH before Grade2 is run by:
unset PYTHONPATH
One way to do this is to use an alias for grade2, for bash/dash shell this can be done by:
Alternatively the wrapper script $BDG_home/scripts/grade2 could be edited adding a line unset
PYTHONPATH after the first line.
The bug will be fixed in the next release (#349).
Bug fixed #349 in Grade2 release 1.1.0.
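When grade2 is launched from a Python script, the same precaution can be taken by passing a copy of the environment without PYTHONPATH. This is a minimal sketch assuming that grade2 is on the PATH. The helper names are hypothetical and are not part of Grade2.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def env_without_pythonpath(
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with PYTHONPATH removed."""
    source = os.environ if environ is None else environ
    return {key: value for key, value in source.items() if key != "PYTHONPATH"}


def run_grade2(arguments: List[str]) -> int:
    """Run grade2 with PYTHONPATH unset and return its exit status."""
    completed = subprocess.run(
        ["grade2", *arguments], env=env_without_pythonpath()
    )
    return completed.returncode
```

Only the child process is affected, so PYTHONPATH stays set for other programs started from the same script.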
15.9.8 Bug: "Suggestion: to view/edit the restraints, use one of the commands"
gives wrong filenames when --out is used
Grade2 tries to be helpful by suggesting commands that can be used to view or edit the restraints. For
instance, if restraints for the PDB chemical component ID VIA are produced by:
The EditREFMAC and coot commands will work to be able to view/edit the restraints produced.
However, if the --out option is used then currently incorrect suggested commands are given. For example,
running:
both commands will fail as the restraint dictionary and PDB filenames are incorrect. The suggested com-
mands will be corrected in the next release of Grade2, in this case to:
SIXTEEN
Please check the online version of this FAQs page: https://gphl.gitlab.io/grade2_docs/faqs.html as this is updated as
new questions come in.
Please also see:
• the BUSTER frequently-asked questions, and
• the Grade2 Known issues page.
16.1.1 How can I check that Grade2 is correctly installed and works properly?
To check that Grade2 is correctly installed and configured, run the command:
grade2 -checkdeps
If this results in a final line starting with SUCCESS then Grade2 has been successfully configured with access
to a working CSD software installation. If there is a problem please see the Configuration Instructions.
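For scripted installations the same check can be automated. The following is a minimal sketch, with a hypothetical helper name, that applies the rule stated above (the final line starts with SUCCESS) to captured output. It assumes that the SUCCESS line is written to standard output; the sample text in the demonstration is invented purely for illustration.

```python
def checkdeps_succeeded(output):
    """Return True if the last non-blank line of the output starts with SUCCESS."""
    lines = [line for line in output.splitlines() if line.strip()]
    return bool(lines) and lines[-1].startswith("SUCCESS")


# Hypothetical usage on a machine where grade2 is installed:
#   import subprocess
#   result = subprocess.run(["grade2", "-checkdeps"],
#                           capture_output=True, text=True)
#   ok = checkdeps_succeeded(result.stdout)

# Invented sample text, used only to exercise the helper.
sample_output = "checking dependencies\nSUCCESS: all checks passed\n"
print(checkdeps_succeeded(sample_output))
```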
To test that all the components used by Grade2 work as expected on your system, run the command:
grade2_tests
grade2_tests will run over 300 unit, functional and integration tests written as part of the test-driven
development used for coding Grade2. Please see the Testing section for more details.
Grade2 uses the Mogul and the CSD Python API tools from the CCDC. Because of this Grade2 requires
an installation of the CSD-Core package to work. For details on how to obtain CSD-Core please see:
https://www.ccdc.cam.ac.uk/solutions/csd-core/
If you cannot get access to CSD-Core then you can run Grade2 on your non-confidential ligand using the
Grade Web Server:
http://grade.globalphasing.org/
16.1.3 Why must I enter my name and email address to use the Grade Web Server?
We ask you to enter your name and email address so that we can contact you if an issue arises when
Grade2 is run with your molecule. We do not routinely contact people and do not store email addresses in
the long term. For further details please see the Grade Web Server Conditions of Use and Privacy Policy:
http://grade.globalphasing.org/grade_server/conditions.html .
Because the CSD and BUSTER installations are separate, incompatibilities can arise. Before you update
your CSD installation, please check https://gphl.gitlab.io/grade2_docs/csd_compatibility.html to make sure
there is not a problem.
16.1.5 Why does Grade2 give a WARNING about using an old version of CSD Python
API?
As you are using an old version of CSD-Core you are recommended to update CSD-Core, having first
checked the CSD compatibility page: https://gphl.gitlab.io/grade2_docs/csd_compatibility.html
If using the latest release of CSD-Core (2024.1 and later), Grade2 will now only use the main Mogul
database based on the annually released CSD database, ignoring the update releases that are made 3 times
a year. The sampling in the update releases is not consistent with that of the main database. This can lead
to problems where strained structures are oversampled leading to bias. It is better to ignore the update
releases.
Grade2 will work, for now, using your old version of CSD-Core but the Mogul update databases will
not be ignored.
Note: To simplify support, from the next Grade2 release (scheduled autumn 2024) Grade2 will not work
with CSD-Core releases older than 2024.1.
Grade2 uses the CSD Python API to access CSD data and analysis routines. Grade2 loads the
CSD Python API at runtime from a separately installed CSD. This procedure requires that the Python
versions used by Grade2 and the CSD Python API are compatible. If incompatible versions are detected
Grade2 will terminate with a message like the following:
ERROR:
ERROR: Grade2 is running using Python version "3.9"
ERROR: but the CSD Python API miniconda has Python version is "3.7" from:
ERROR:
ERROR: /Volumes/SmartDisk/CSD_2021/Python_API_2021/miniconda/lib/python3.7/
ERROR:
ERROR: These are incompatible and cannot work together.
This indicates that a recent release of Grade2 (version 1.4.1 or above) has been run with an older version
of the CSD (earlier than version 2023.2.0, that is, released prior to July 2023). As described in the message,
there are two ways to get around the problem, either:
1. Update CSD to the latest version, or:
2. Set the environment variable BDG_GRADE2_PYTHON_VERSION to 3.7 as described in the Using
Grade2 release 1.4.1 (and following) with old CSD releases section.
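As an illustration of the second option, the sketch below reads the Python version out of a CSD Python API library path, such as the one shown in the error message, and builds an environment in which BDG_GRADE2_PYTHON_VERSION is set to 3.7. The helper names are hypothetical and this is not how Grade2 itself detects versions; the section referred to above remains the authoritative description of the variable.

```python
import os
import re


def python_version_from_path(path):
    """Return the 'major.minor' Python version embedded in a path, or None."""
    match = re.search(r"python(\d+\.\d+)", path)
    return match.group(1) if match else None


def env_for_old_csd(env=None, version="3.7"):
    """Return a copy of the environment with BDG_GRADE2_PYTHON_VERSION set."""
    new_env = dict(os.environ if env is None else env)
    new_env["BDG_GRADE2_PYTHON_VERSION"] = version
    return new_env


# Path taken from the example error message above.
api_lib = "/Volumes/SmartDisk/CSD_2021/Python_API_2021/miniconda/lib/python3.7/"
print(python_version_from_path(api_lib))

# Hypothetical usage: pass the environment to a grade2 subprocess, e.g.
#   subprocess.run(["grade2", "--PDB_ligand", "VIA"], env=env_for_old_csd())
```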
We really like suggestions for improvements to Grade2. Please send an E-mail to buster-
[email protected] saying what you would like.
In order to help us in the book-keeping of user support requests, of the issues they raise and of the responses
we supply, it would be really helpful if you could follow the guidelines below.
If you have a problem with Grade2 please:
1. First check the online Known issues page:
https://gphl.gitlab.io/grade2_docs/issues.html
as it may already be a known problem with a solution or workaround.
2. Then check the online version of this FAQs page:
https://gphl.gitlab.io/grade2_docs/faqs.html
as this is updated frequently as new questions come in.
3. Please send an e-mail to [email protected] describing the problem with as much
detail as possible.
Please make sure you include the following information in the e-mail:
• A clear description of the issue, what kind of input was used and from where it originated (for
instance, if a MOL2 file was used what program wrote it).
• A descriptive subject for the e-mail. For instance, "Grade2 crashes for ligands containing
boron" is much better than "Restraints Problem".
• The terminal output of Grade2 where the problem is encountered.
grade2 -checkdeps
grade2_tests
16.1.10 Security: does Grade2 upload any ligand information to public servers?
Grade2 does not upload any information about a ligand to public servers, unless you activate and then
choose to use the --pubchem_names option.
The --PDB_ligand ID option retrieves information from wwPDB sites about the existing PDB
chemical component specified by ID, its three-letter code. Similarly, the --lookup ID option uses a script to
retrieve information for a molecule with ID from a public or internal chemical database, depending on the
script used. Both of these options retrieve information about pre-existing molecules.
During a Grade2 run a check for related PDB components is made. This check is entirely local with no use
of any external services. The procedure finds the input molecule's InChIKey using RDKit routines. This
InChIKey is compared to a precalculated list of InChIKeys for all PDB components that is distributed as
part of Grade2. The list of InChIKeys will be up-to-date at the time of the Grade2 release and can be found
in one of the following files (depending on the operating system):
$BDG_home/.mc/linux64/lib/python3.7/site-packages/pdbccdinchikeys/data/PDBCCD_id_status_date_inchikey_name.csv
$BDG_home/.mc/darwin/lib/python3.7/site-packages/pdbccdinchikeys/data/PDBCCD_id_status_date_inchikey_name.csv
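To make the local nature of this check concrete, here is a minimal sketch of an InChIKey lookup in such a CSV file. It is not Grade2's own code. The column order (ID, status, date, InChIKey, name) is an assumption inferred from the file name only, and the status and date values in the sample rows are placeholders.

```python
import csv
import io


def find_components_by_inchikey(csv_file, inchikey, key_column=3):
    """Return the IDs (first column) of rows whose InChIKey column matches.

    key_column=3 assumes the column order id, status, date, inchikey, name.
    """
    matches = []
    for row in csv.reader(csv_file):
        if len(row) > key_column and row[key_column].strip() == inchikey:
            matches.append(row[0].strip())
    return matches


# Illustrative rows: the caffeine InChIKey is real, everything else is a placeholder.
sample = io.StringIO(
    "CFF,REL,2000-01-01,RYYVLZVUVIJVGH-UHFFFAOYSA-N,CAFFEINE\n"
    "XXX,REL,2000-01-01,AAAAAAAAAAAAAA-AAAAAAAAAA-N,PLACEHOLDER\n"
)
print(find_components_by_inchikey(sample, "RYYVLZVUVIJVGH-UHFFFAOYSA-N"))
```

To search the distributed file, open it with open(path, newline="") and pass the file object in place of the in-memory sample. No network access is involved at any point.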
The --pubchem_names option involves uploading the SMILES string of the molecule to PubChem and so it
should not be used for confidential ligands. To be extra careful, by default this option is deactivated and
will not work until it is activated. Please see --pubchem_names documentation for details of the activation
process.
The Grade Web Server http://grade.globalphasing.org/ provides a way to run Grade2 online. This
necessarily involves transmission of the molecule of interest to a public web server, so the Grade Web
Server should not be used for confidential ligands.
16.2.1 How can I use Grade2 to generate a restraint dictionary with atom names
consistent with an existing Grade dictionary?
Suppose you are working on a project and have used Grade to generate a restraint dictionary for the ligand,
used this for model building & refinement and now want to continue using a Grade2 restraint dictionary. To
make the switch painless it is important that the atom naming for the ligand should not be altered. Thanks
to Wei-Chun Kao for raising this question.
Grade2 can reliably use CIF restraint dictionaries from AceDRG, eLBOW and Grade2 as an input. But
Grade's CIF restraint dictionaries lack explicit atom charge records (_chem_comp_atom.charge). The
first release of Grade2 (1.0.0) would terminate with an error message in such a case. However, in most
cases it is OK to assign all atoms a charge of 0. Hence, from release 1.1.0, when Grade2 reads
such a file it assigns a charge of 0 to each atom and writes a WARNING message to the terminal output:
WARNING:
WARNING: Input restraint file CIF lacks explicit atom charge records _chem_comp_atom.charge
WARNING: ---- so will set all the atom charges to 0 and continue.
WARNING: ---- This should be fine for most neutral molecules.
WARNING: ---- But for charged atoms/molecules it will fail!
WARNING: ---- See FAQs https://gphl.gitlab.io/grade2_docs/faqs.html for more information
WARNING: ---- and instructions about a manual workaround to set atom charges.
WARNING: ----
WARNING: ---- Check InChi match messages below for problems!
WARNING:
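Before running Grade2 it can be useful to know whether a dictionary will trigger this warning. The following is a minimal sketch, with a hypothetical function name, that scans CIF text for the _chem_comp_atom.charge tag. It does not parse the CIF properly and is not the test Grade2 applies; the sample loop headers are cut down for illustration.

```python
def cif_has_atom_charges(cif_text):
    """Return True if a line of the CIF text starts with _chem_comp_atom.charge."""
    for line in cif_text.splitlines():
        tokens = line.split()
        # CIF data names are case-insensitive, so compare in lower case.
        if tokens and tokens[0].lower() == "_chem_comp_atom.charge":
            return True
    return False


# Cut-down loop headers, for illustration only.
with_charges = """loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
_chem_comp_atom.type_symbol
_chem_comp_atom.charge
"""
without_charges = """loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
_chem_comp_atom.type_symbol
"""
print(cif_has_atom_charges(with_charges), cif_has_atom_charges(without_charges))
```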
For most neutral molecules assigning a charge of zero will be fine. However, for charged molecules or
those containing groups like nitro the approach will produce incorrect chemistry. It is important to check
the subsequent terminal output for messages about the checks made on the InChI read from the input file
and that for the RDKit molecule used by Grade2. There should be a message:
RDKit molecule generated has the same InChI as that from the input.
---- This indicates that the stereochemistry matches so setup is successful.
If instead there is a message:
WARNING: RDKit molecule created has an InChI that does not match that read from the input.
then it will be necessary to manually edit the molecule's bonding and charge state. Please see the next
FAQ for a guide on how to do so.
16.2.2 How can I run Grade2 if I only have a PDB file for the ligand?
Grade2 does not allow PDB-format files to be directly used as an input. This is because the PDB-format
does not normally carry information as to the order of the molecule's bonds. Assigning the bond order
is necessary before restraints can be generated. If the PDB file for the ligand includes hydrogen atoms,
then Open Babel can be used to assign bond orders and produce a MOL2-format file. For example for
grade-INH.pdb:
Carefully examine the results, checking that the chemistry of the resulting molecule matches that of the original
Grade. If there is any difference then Mercury should be used for the conversion to MOL2, as Mercury
allows manual editing of the chemical markup as described next.
In the case of an old macromolecular refinement result, ligands routinely lack explicit hydrogen atoms and
this makes chemical markup particularly challenging and prone to error.
The CSD-Core program Mercury (that you will have access to as it is distributed alongside Mogul) can be
used to read in a PDB-format file of a ligand, assign bond orders and add hydrogen atoms if necessary.
There is an "Auto Edit Structure..." option but the results of this should always be carefully checked. If
there is a problem then Mercury has comprehensive manual editing options that can be used to alter bond
orders, add/delete hydrogen atoms and set atom charges. Once you are happy that the chemistry of the ligand
molecule is correct, save it from Mercury in MOL2 format. The MOL2 file can then be used as an
input to Grade2, using the --in option.
As an example to show this process, let's use the PQA ligand from PDB structure 2bal. Suppose that the
only details of the ligand available were the conformation from the PDB file that lacks hydrogen atoms (in
reality the grade2 option --PDB_ligand PQA should be used).
Extracting the coordinates of the ligand from the PDB file:
Then run Mercury and load the file 2bal_pqa.pdb. Once loaded, select the Mercury option "Edit" ->
"Auto Edit Structure..."
If your molecule lacks explicit hydrogen atoms tick the option "Add missing H atoms". Then click the
"Apply" button and Mercury will then analyze the structure, assign bond orders and add hydrogen atoms:
You should check the results carefully as ascribing chemistry to a molecular structure in the absence
of bond orders and without hydrogen atoms is a difficult task. If the bond orders and/or hydrogen atoms
added are wrong then Mercury has comprehensive manual editing options that can be used to alter bond
orders, add/delete hydrogen atoms and set atom charges. In the PQA example test case, Mercury correctly
assigns the bond orders and adds hydrogen atoms to the PQA molecule and no editing is required, despite
the piperidine ring being in a 'mangled' conformation.
Once you are happy that the edited chemical markup of the molecule is correct, select the Mercury menu
item "File" -> "Save As" and the option "Mol2 files". This will save the molecule as a MOL2 file that
will preserve the atom names from the original PDB file as well as your edited chemical markup.
Then use the resulting MOL2 file as the input to grade2 using the --in option. Note that you will also need
to specify the correct residue name (3-letter code) for the molecule as this is not preserved by Mercury.
Doing this for the example test case:
results in producing a restraint dictionary for the molecule where the piperidine is charged. The non-
hydrogen atom names are consistent with the input PDB file. The restraint dictionary can then be used for
16.2.3 How can I produce restraints for a ligand with a different protonation state
or tautomer?
The CSD-Core program Mercury (that you will have access to as it is distributed alongside Mogul) can
be used to manually edit the bonding, hydrogen atom positions and atomic charges of a ligand. Mercury
can be used to edit a ligand's tautomeric or protonation state, while preserving its atom names. For a
demonstration of how to do this in practice please see:
Video: Using Mercury to edit charge/tautomeric state for Grade2 restraint dictionary generation
16.2.4 How can I produce restraints for the trans (or cis) stereoisomer of my ligand?
As explained in the documentation section "How Grade2 handles cis-trans stereoisomers", it is usually
unnecessary to produce specific Grade2 restraints for the trans or cis stereoisomer of a ligand. However, it
may be necessary to flip the stereoisomer in Coot. Similarly, if you actually need to alter Grade2 restraints
to allow only one stereoisomer, this is also described.
16.2.5 Editing MOL2 file of a charged molecule with atomic partial charges
There is currently an issue where Grade2 cannot read a MOL2 file of a charged molecule when it has atomic
partial charges. When this happens, it is possible to manually set the correct formal charges using Mercury.
For a demonstration of how to do this in practice please see:
Some molecular inputs to Grade2, such as SMILES strings or 2D SDF files, do not have 3D coordinates
for the atoms. When given such an input, Grade2 will use RDKit routines to produce an initial 3D confor-
mation.
Generating 3D conformations for a knotted molecule with many intersecting rings can be difficult. Indeed,
it is possible to construct SMILES strings for molecules that cannot be constructed in 3D, for instance
c1c2ccc3cc2ccc13 is a naphthalene with an additional bond between carbon atoms on opposite sides of
the double ring:
If Grade2 produces a message ERROR: Cannot generate a 3D conformation for the input
molecule. then we would suggest:
• Check the input SMILES string. Has it been corrupted? How reliable is its source?
• Use an online 2D image generator for instance http://hulab.rxnfinder.org/smi2img/ or https://cactus.
nci.nih.gov/gifcreator/ . Does the image make sense?
• If the molecule is already available in another format (for instance SDF) then use this.
• If the SMILES string contains stereo atom specifiers @ try removing or altering these.
• Try other 3D conformation generators. These could be restraint generation programs (for instance
Grade or AceDRG) or an online tool such as https://cactus.nci.nih.gov/translate/ . Check any re-
sults carefully using molecular graphics (for instance Mercury). If a reasonable 3D conformation is
produced then use this as an input to Grade2.
If you are still stuck, please contact us at [email protected] and we will try to help.
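For the suggestion above about removing stereo atom specifiers, the sketch below shows one naive way to strip them from a SMILES string with a regular expression. It is a text-level edit only, with hypothetical function names. It leaves bracket atoms such as [CH] in place (still valid SMILES), does not touch the / and \ double-bond markers, and does not validate the chemistry, so the result should be checked with a 2D image generator before use.

```python
import re

# Matches the rare long forms (for example @TH1 or @OH26) first, then plain @ or @@.
_STEREO_SPECIFIER = re.compile(r"@(?:TH|AL|SP|TB|OH)[0-9]+|@+")


def has_atom_stereo(smiles):
    """Return True if the SMILES string contains a stereo atom specifier."""
    return "@" in smiles


def strip_atom_stereo(smiles):
    """Return the SMILES string with all stereo atom specifiers removed."""
    return _STEREO_SPECIFIER.sub("", smiles)


alanine = "N[C@@H](C)C(=O)O"
print(has_atom_stereo(alanine), strip_atom_stereo(alanine))
```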
16.2.7 Grade2 runs slowly for a ligand. What can I do about this?
The time consuming part of a Grade2 run is the use of Mogul procedures (Bruno et al., 2004) to search
the CSD through the CSD Python API. Mogul speeds up retrieval of geometric information by storing
tables of data for the most common chemical features found in the CSD. Access to information from the
precalculated tables is fast. However, if the molecule in question has unusual chemistry (in particular for
rings) Mogul will perform searches on structures from the CSD. Such searches involve accessing many
gigabytes of data. Consequently, the speed of the searches is dependent on the speed of programmatic
access to the CSD and Mogul databases.
From Grade2 release 1.6.0, it is easy to specify the location of CSD and Mogul databases separately from
the CSD software, as described in the making a local copy of ccdc-data section.
To demonstrate that installation of the databases on low-performance disks can result in slow runs, a test
job (using a cut-down SMILES string from cephalosporin C 0MU):
was run on an old Linux workstation (Intel i7-3770S @ 3.10GHz, 8 GB memory) using Grade2 release
1.6.0 and CSD 2024.1.
It can be seen that the elapsed run times are heavily dependent on the disk type for the ccdc-data directory
(that contains the CSD and Mogul databases with 19 GB of data). Using a slow hard disk drive results in run
times that are around 4 times longer than using a solid state drive. Using a very slow network drive results
in a run time over 150 times greater. In contrast, the disk type of the CSD top directory, containing the
CSD Python API, has little effect on run times.
Consequently, if Grade2 performance is an issue it is advisable to use a local disk (preferably a SSD) to
store the CSD and Mogul databases, as described in the making a local copy of ccdc-data section.
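A quick way to judge whether a disk is likely to be a bottleneck is to time a sequential read of a large file stored on it. The following is a rough sketch, not a Grade2 tool, and the function name is hypothetical. It measures sequential throughput only, whereas Mogul searches may access data differently, and a second read of the same file may be served from the operating system cache and so appear unrealistically fast.

```python
import os
import tempfile
import time


def read_throughput_mb_per_s(path, chunk_size=1024 * 1024):
    """Read a file sequentially and return the throughput in MB per second."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return total_bytes / (1024 * 1024) / elapsed


# Demonstration on a temporary 4 MB file. In practice, point the function at a
# large database file inside your own ccdc-data directory.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (4 * 1024 * 1024))
try:
    rate = read_throughput_mb_per_s(tmp.name)
    print(f"{rate:.0f} MB/s")
finally:
    os.remove(tmp.name)
```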
Note: Grade2 works well when the CSD is installed on a high-performance
NFS. On a modern workstation (Intel Core i7-6700K @ 4.20GHz and 32 GB RAM) with CSD and ccdc-data
installed on a high-performance NFS (Synology Disk Station NAS SA3400), the test job runs in 4
minutes 36 seconds elapsed. Using a copy of ccdc-data on a local SSD gives only a small
decrease in run time, of 7 seconds. It therefore only makes sense to make a local copy of ccdc-data for
low-performance network drives.
Note: In the past, particular issues have been found using NFS version 3 with Grade
(the predecessor of Grade2), see https://www.globalphasing.com/buster/wiki/index.cgi?
SoftwareMogulRelease2014NFSissues for details. NFS version 3 has been obsolete for many years
and should not now be in use.
16.3.1 Grade2 prints "Segmentation fault" after a successful run, does this matter?
If Grade2 prints out a line containing Segmentation fault after a successful run, there is no need to
worry about this; please ignore the message.
This FAQ provides details about the problem.
At the end of a successful run (that ends with a line to the STDOUT terminal output that starts with
Normal termination), Grade2 occasionally writes out a message containing Segmentation fault to
the STDERR terminal output, for example:
The problem appears to only occur at the end of a successful run. The problem occurs sporadically and
normally if the run is repeated no Segmentation fault is then reported. In our tests the problem occurs
in around 1 in 200 runs on some servers, very occasionally on some computers but never occurs on other
servers.
The problem was observed from the start of the development of Grade2 in 2020. A workaround was
introduced before the first public release so that the grade2 command gives an exit status 0 (meaning
success) at the end of a successful run (with Normal termination), ignoring any Segmentation fault
on the final Python process tear down.
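The idea behind the workaround, judging success by the Normal termination line rather than by how the process ended, can be sketched as follows. This is an illustration only and not the code of the grade2 wrapper; the function name is hypothetical, and a short Python one-liner stands in for a run that prints Normal termination and then exits abnormally.

```python
import subprocess
import sys


def finished_normally(command):
    """Run a command and return (normal, exit_status).

    normal is True when the last non-blank STDOUT line starts with
    'Normal termination', whatever the exit status of the process.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    normal = bool(lines) and lines[-1].startswith("Normal termination")
    return normal, result.returncode


# Stand-in for a run that succeeds but ends with a non-zero exit status.
fake_run = [sys.executable, "-c",
            "import sys; print('Normal termination'); sys.exit(139)"]
print(finished_normally(fake_run))
```

A pipeline that calls other tools affected by the same teardown problem could use a check like this to avoid discarding good results.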
Note: Such segmentation faults generally arise when a C library tries to write/read outside the memory
allocated for it or when writing to memory which can only be read. The sporadic nature of the problem is
likely explained by the fact a failure will only occur if the memory accessed is locked to another process.
It would be a good idea to fix the problem. We have recently (May 2024) looked into the problem again
and found that similar segmentation fault terminations also sporadically occur using a minimal test job
using the Python interpreter distributed with CSD Python API. We have let the CCDC know about this.
16.3.2 Grade2 says that the ligand matches an existing PDB chemical component.
What should I do?
As part of the Grade2 run a check is made whether the molecule matches any existing PDB chemical
component (from the wwPDB Chemical Component Dictionary https://www.wwpdb.org/data/ccd
). The check uses the InChIKey of the ligand. The InChIKey is a shortened form of the International
Chemical Identifier (InChI) that facilitates the comparison of molecules.
For example, if Grade2 is supplied the SMILES string Cn1cnc2c1C(=O)N(C(=O)N2C)C then the follow-
ing terminal output will result:
So the SMILES string is for caffeine that is an existing PDB component https://www.rcsb.org/ligand/CFF.
It is likely to be sensible to use the Grade2 dictionary for CFF for fitting and refinement.
Please note that tautomers normally have the same InChIKey (for an example see PDB components 2D3
and XQK).
If Grade2 reports an unexpected match to an existing PDB chemical component, then it may be a good
idea to switch to using a Grade2 restraint dictionary for the matching chemical component. If you deposit
the structure to the PDB with a ligand that matches an existing PDB chemical component then it will be
renamed (including all the atom IDs). If a match is reported please check that the tautomeric/charge state
is what you want before switching.
The Grade2 option --antecedent allows the atom IDs to be taken from a related ligand. If you are working
on a tautomer of an existing PDB ligand component then it can be used to produce consistent atom IDs.
There is no problem in Grade2 using any string as a residue name for a novel ligand (using the --resname
option). It is common practice for companies to use LIG, INH or DRG. Although these three codes were
issued in the PDB chemical component library, they have now been withdrawn:
Dear all,
The wwPDB OneDep team would like to inform you that we have reserved a set of
ligand identifier codes that will never be used by the PDB. This is to allow
depositors to use such codes for their new ligands during structure
determination processes.
These reserved ligand codes are LIG, INH, DRG, and 01-99 (two digits). The
OneDep deposition system will be ready for this change in December 2021.
Regards,
Jasmine
Note that some groups use the working code UNL, but this has a specific meaning in the wwPDB
database: "unknown ligand" https://www.rcsb.org/ligand/UNL. This normally indicates that an
unexpected ligand has been identified from a blob of electron density. So it is best to avoid UNL as a
working residue name.
By default, Grade2 now uses residue name LIG.
Grade2 can produce CIF restraint dictionaries for residue names longer than three characters, but currently
there are often compatibility problems with downstream programs, such as BUSTER. BUSTER uses PDB
format for molecular input and currently can only handle residue names of three characters or fewer.
This will need to be dealt with soon (PDB news: once all three-character alphanumeric codes are exhausted
four-character codes will be issued).
Please note that it is also necessary to avoid residue names for the common compounds (such as SO4) that
can be found in $BDG_home/tnt/data/common-compounds.
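The advice in this FAQ can be collected into a simple pre-flight check on a proposed residue name. The following is a minimal sketch with hypothetical function names; it is not something Grade2 provides. The format of the common-compounds data is not described here, so the caller supplies those names explicitly.

```python
import re

# Codes that the wwPDB announcement quoted above reserves for depositors.
WWPDB_RESERVED = {"LIG", "INH", "DRG"}


def is_reserved_for_depositors(name):
    """True for LIG, INH, DRG and the two-digit codes 01 to 99."""
    if name in WWPDB_RESERVED:
        return True
    return re.fullmatch(r"[0-9]{2}", name) is not None and name != "00"


def resname_problems(name, common_compounds=()):
    """Return a list of reasons why a residue name may cause trouble."""
    problems = []
    if len(name) > 3:
        problems.append("longer than three characters")
    if name == "UNL":
        problems.append("UNL means 'unknown ligand' in the wwPDB")
    if name in common_compounds:
        problems.append("clashes with a common compound")
    return problems


print(resname_problems("LIG"), is_reserved_for_depositors("LIG"))
print(resname_problems("SO4", common_compounds={"SO4"}))
```

An empty list means that none of the points raised in this FAQ apply to the name.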