Graph-based Molecular Representation Learning

Zhichun Guo¹, Bozhao Nan¹, Yijun Tian¹, Olaf Wiest¹, Chuxu Zhang², Nitesh V. Chawla¹
¹University of Notre Dame, ²Brandeis University
{zguo5,bnan,yijun.tian,Olaf.G.Wiest.1}@nd.edu, [email protected], [email protected]

arXiv:2207.04869v1 [q-bio.QM] 8 Jul 2022

Abstract

Molecular representation learning (MRL) is a key step in building the connection between machine learning and chemical science. In particular, it encodes molecules as numerical vectors that preserve the molecular structures and features, on top of which downstream tasks (e.g., property prediction) can be performed. Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning. In this survey, we systematically review these graph-based molecular representation techniques. Specifically, we first introduce the data and features of 2D and 3D molecular graph datasets. Then we summarize the methods specially designed for MRL and categorize them into four strategies. Furthermore, we discuss some typical chemical applications supported by MRL. To facilitate studies in this fast-developing area, we also list the benchmarks and commonly used datasets in the paper. Finally, we share our thoughts on future research directions.

1 Introduction

The interaction between machine learning and chemical science has received great attention from researchers in both areas. It has made remarkable progress in various chemical applications including molecular property prediction [Guo et al., 2020; Sun et al., 2021; Yang et al., 2021b; Liu et al., 2022b], reaction prediction [Jin et al., 2017; Do et al., 2019], molecular graph generation [Jin et al., 2018a; Jin et al., 2020b] and also drug-drug interaction prediction [Lin et al., 2020]. Molecular representation learning (MRL) is an important step in bridging the gap between these two fields. MRL aims to utilize deep learning models to encode the input molecules as numerical vectors, which preserve useful information about the molecules and serve as feature vectors for downstream (machine learning) applications. Earlier molecular representation learning methods use general representation learning models to represent molecules without explicit involvement of domain knowledge. Recently, many algorithms have been specifically designed for MRL, which can better incorporate chemical domain knowledge. In this paper, we provide a systematic review of the progress in this rapidly-developing topic, charting the path from representation learning methods that incorporate molecular structures to methods that also incorporate domain knowledge.

Motivation 1: why does molecular representation learning matter?
Molecular representation learning has a broad spectrum of applications closely related to people's lives. For example, drug discovery via wet-lab experimentation is extremely time-consuming and expensive. With the advancement of deep learning, a great number of experiments can be simulated by machine learning models. Property prediction can help identify molecules with target properties, and reaction prediction can predict the major products. These significantly reduce the number of failed experiments. For all these chemical applications, MRL is the key determinant of the success of deep learning models.

Motivation 2: why deep graph learning for molecular representation learning?
Molecular graphs naturally describe molecules with rich structural and spatial information. Molecules are essentially atoms and bonds interconnecting atoms, which naturally lend themselves to graph representations. Compared with SMILES, a line-based representation (i.e., a string) of molecules, molecular graphs provide richer information for MRL models to learn from. As a result, graph-based MRL models evolve much faster than sequence-based MRL models. Additionally, more and more general graph learning papers [Gilmer et al., 2017; Hu* et al., 2020; You et al., 2020] employ molecular graph datasets to examine the performance of their algorithms as well.

Contributions. The main contributions of this work are summarized as follows:
• We present a systematic review of the recent progress in graph-based MRL models based on various kinds of molecular inputs and summarize the strategies specifically designed for MRL.
• To encourage reproducible research on this topic, we summarize the representative benchmarks and commonly used datasets in various downstream applications.
• We discuss the limitations of 2D and 3D molecular graphs as input and share our thoughts on future research directions of MRL as a reference for the community.
Figure 1: Overview of graph-based molecular representation learning: (a) two molecular graphs; (b) the general learning process of graph neural networks; (c) four methods proposed for graph-based molecular representation learning; (d) the process of aggregating atoms' representations to obtain the molecular representation.
2 Data Representations

Traditionally, researchers use fixed fingerprint feature extraction rules to identify important information about each molecule and feed this hand-crafted information to a linear classification/regression head for downstream tasks. This requires significant time to determine and calculate the most relevant features, and the designed features still cannot support all tasks. To avoid these efforts, most deep learning models are developed to learn the molecular features automatically. Two kinds of molecular representations are used as inputs: molecular graphs and sequences. Accordingly, graph-based and sequence-based models are developed to learn from the different input molecular representations. The sequence representations, such as the simplified molecular input line-entry system (SMILES) [Weininger et al., 1989] and SELF-referencing Embedded Strings (SELFIES) [Krenn et al., 2020], can be converted into molecular graphs, but this conversion involves a significant amount of domain knowledge. When sequence representations are taken as input, this knowledge is not easily captured by sequence-based learning models. In contrast, graph representations can naturally incorporate additional information in nodes and edges, which is easily leveraged by the rich suite of graph-based models (e.g., graph neural networks). Therefore, we focus on the graph representation in this survey, as it is more commonly used nowadays. In this section, we describe the molecular graph (without spatial information) and 3D molecular graph representations, as shown in Figure 1(a). For each representation, we analyze its characteristics and discuss its usages and limitations when utilized in deep learning models.

Table 1: Details of node and edge features in the molecular graph.

  Attribute        Details
  Node
  Atom type        118
  Chirality tag    unspecified, tetrahedral cw, tetrahedral ccw, other
  Hybridization    sp, sp2, sp3, sp3d, or sp3d2
  Aromaticity      0 or 1 (aromatic atom or not)
  Edge
  Bond type        single, double, triple, aromatic
  Ring             0 or 1 (bond is in a ring or not)
  Bond direction   -, endupright, enddownright
  Stereochemistry  -, any, Z, E, cis, trans

2.1 Molecular Graph

A graph consists of nodes and edges interconnecting the nodes. Analogously, in a molecule, we may consider atoms as nodes and bonds as edges between the atoms. Thus, a molecule has a natural graph structure. This renders the molecular graph the most feasible input for deep learning models and leads to its extensive use. The most common form of molecular graph is described by three matrices: the node feature matrix, the edge feature matrix, and the adjacency matrix. Molecules are usually saved as SMILES for convenience and converted to molecular graphs for computation using specific tools. For example, RDKit [Landrum, 2020] can convert a SMILES string into a molecular graph with the feature and adjacency matrices. The commonly used features of nodes and edges are listed in Table 1. In this table, atom and bond types are mandatory features, while the other features are optional and can be included on demand for different tasks [Tang et al., 2020]. Among these features, the atom's chirality tag cannot be learned from the common 2D molecular graph representation without 3D geometric information; the other features are all learnable from both 2D and 3D structures. For the connectivity, we consider each bond as a bidirectional edge, which means that a bond between atoms A and B results in two edges in the adjacency list: one from A to B, another from B to A. Given the above data, researchers can leverage homogeneous [Gilmer et al., 2017; Guo et al., 2021; Coley et al., 2019] or heterogeneous networks [Shui and Karypis, 2020] to learn molecular representations.

The advantage of using molecular graphs as input is obvious: graph neural networks can be applied directly to learn molecular representations using their topological structures. However, the bonds in this kind of graph are determined by the distance between atoms, which neglects the spatial direction and torsion between atoms. This limits the knowledge that can be derived from the general molecular graph.
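As a minimal sketch of the bidirectional-edge convention described above (pure Python; the hand-coded atom/bond lists stand in for what a toolkit such as RDKit would parse from a SMILES string):

```python
def to_adjacency_list(num_atoms, bonds):
    """Build a directed adjacency list from undirected bonds.

    Each chemical bond (a, b) contributes two directed edges,
    a -> b and b -> a, matching the convention in Section 2.1.
    """
    adj = {i: [] for i in range(num_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj

# Ethanol (SMILES "CCO") as a toy example: atoms C-C-O, two single bonds.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
adj = to_adjacency_list(len(atoms), bonds)
```

Each of the two bonds appears twice in the result, so `adj` is `{0: [1], 1: [0, 2], 2: [1]}` — the central carbon sees both neighbors.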
2.2 3D Molecular Graph

The 3D molecular graph provides the missing geometric information by explicitly encoding the spatial structure. It describes the atomic structure as a set of atoms together with their 3D coordinates, which carries more atomic information. As a result, this representation format has received increasing attention in MRL [Liu et al., 2022b]. Different from taking 2D molecular graphs as input, the techniques based on 3D graphs take atoms as nodes but learn the atomic interactions as edges using graph neural networks. Under this setting, the atom features are spatially invariant (e.g., atom types) while the coordinates provide the relative positions between atoms. For example, the bonds can be determined by the distance between two atoms using their coordinates. To incorporate more complicated spatial relationships, spherical graph neural networks [Liu et al., 2022b] are designed to learn molecular structure from 3D graphs.

3 Methodology

In this section, we start with the general graph neural networks for MRL. Then, we discuss methods designed specifically for this task and categorize them into four strategies. These specific methods incorporate chemistry-related information to strengthen molecular representations in different ways, which leads to better performance. The representative methods are listed in Table 2.

Formally, each molecule is generally considered as an undirected graph G = (V, E, X) with node features x_v ∈ X for v ∈ V and edge features e_uv ∈ E for (u, v) ∈ E [Brockschmidt, 2020]. Here, nodes represent atoms and edges represent bonds. Generally, graph-based learning methods fit into the Message Passing Neural Network (MPNN) [Gilmer et al., 2017] scheme. Therefore, we take MPNN as an example to illustrate the learning process, as shown in Figure 1(b). The forward pass consists of three operations: message passing, node update, and readout. During the message passing phase, node features are updated iteratively according to their neighbors in the graph for T times. We initialize the embedding of node v as h_v^0 = x_v. Formally, the node hidden states at step t+1 are obtained based on messages m_v^{t+1}, which are computed as:

    m_v^{t+1} = Σ_{u ∈ N(v)} M_t(h_v^t, h_u^t, e_uv),    (1)

    h_v^{t+1} = U_t(h_v^t, m_v^{t+1}),    (2)

where M_t is the message function, U_t is the node update function, and N(v) is the set of node v's neighbors in the graph. After updating the node features T times, the readout function R computes the whole-graph embedding vector as follows:

    ŷ = R({h_v^T | v ∈ V}).    (3)

Note that R is invariant to the order of the nodes so that the framework is invariant to graph isomorphism. ŷ is the representation of the molecule and is passed to a fully connected layer for downstream tasks. All functions M_t, U_t, and R are neural networks whose learned weights are updated during the training process.

Besides MPNN, different variants of graph neural networks such as GCN [Kipf and Welling, 2017], GIN [Xu et al., 2019], GAT [Veličković et al., 2018], GGNN [Li et al., 2016] and GraphSage [Hamilton et al., 2017] can also be used directly to learn molecular representations. These methods are widely utilized as base encoders for molecular representation learning in various downstream tasks, such as reaction prediction [Coley et al., 2019], property prediction [Brockschmidt, 2020] and drug discovery [Jin et al., 2020c]. Hu et al. [Hu* et al., 2020] conduct a comparative study of graph neural networks on property prediction and find that GIN usually achieves the best results. While these models are powerful in learning graph structures, chemical traits and knowledge, the essence of molecules, are largely neglected. Recently, various deep learning methods have been designed specifically for molecules as well. These methods are categorized into four parts in Figure 1(c), which are elaborated as follows.

3.1 Molecular Structure-based Method

Graph-based MRL generally treats molecular graphs the same as other plain graphs: it focuses on the topological structure but cares less about the special substructures or properties contained in molecular graphs. Recent research has seen a foray into self-supervised learning strategies [Jin et al., 2020a] that push the model to pay more attention to the graph structures. PreGNN [Hu* et al., 2020] utilizes two self-supervised strategies, context prediction and node/edge attribute masking, to pre-train GNNs. Different from this general unsupervised design, GROVER [Rong et al., 2020] proposes molecule-specific self-supervised pre-training methods: contextual property prediction and graph-level motif¹ prediction. MGSSL [Zhang et al., 2021] also designs a motif-based graph self-supervised strategy, which predicts the motif's topology and label during the motif tree generation process. INFOGRAPH [Sun et al., 2020] trains the model by maximizing the mutual information between the representations of the entire graph and substructures of different granularity.

Contrastive learning is a common self-supervised learning strategy that utilizes data augmentation to make models produce graph representations with better generalizability, transferability, and robustness. Three general graph augmentation methods are proposed by GraphCL [You et al., 2020], which can also be applied to molecule datasets. MoCL [Sun et al., 2021] proposes two molecular graph augmentation methods: one replaces a valid substructure with a substructure with similar physical or chemical properties; the other changes a few general carbon atoms. Molecular 2D and 3D graph representations are naturally two augmented views of a molecule. Using this characteristic, GeomGCL [Li et al., 2022] and GRAPHMVP [Liu et al., 2022a] train the model with contrastive learning.

¹ Motifs are recurrent sub-graphs among the input graph data, which encode rich domain knowledge of molecules and can be easily detected by professional software.
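The MPNN forward pass of Eqs. (1)–(3) can be sketched in a few lines of NumPy. The concrete choices below (linear message/update maps with a ReLU, sum readout) are illustrative stand-ins for the learned functions M_t, U_t, and R, not the exact parameterization of any cited model:

```python
import numpy as np

def message_passing_step(h, edges, edge_feats, W_msg, W_upd):
    """One MPNN iteration: Eq. (1) message sum, then Eq. (2) node update.

    Toy stand-ins for the learned functions: M_t is a linear map on the
    concatenation [h_u ; e_uv] (ignoring h_v for brevity), and U_t is a
    linear map on [h_v ; m_v] followed by a ReLU.
    """
    n, d = h.shape
    m = np.zeros((n, d))
    for (u, v), e in zip(edges, edge_feats):
        # Each bond is bidirectional, so messages flow both ways.
        m[v] += W_msg @ np.concatenate([h[u], e])
        m[u] += W_msg @ np.concatenate([h[v], e])
    return np.maximum(0.0, np.concatenate([h, m], axis=1) @ W_upd.T)

def readout(h):
    """Eq. (3): sum pooling over nodes, invariant to node order."""
    return h.sum(axis=0)

# Toy molecule: 3 atoms with 2-dim features, 2 bonds with 1-dim features.
h = np.ones((3, 2))
edges = [(0, 1), (1, 2)]
edge_feats = np.ones((2, 1))
W_msg = np.full((2, 3), 0.1)  # maps [h_u ; e_uv] (3-dim) to a 2-dim message
W_upd = np.full((2, 4), 0.1)  # maps [h_v ; m_v] (4-dim) to a new 2-dim state
h1 = message_passing_step(h, edges, edge_feats, W_msg, W_upd)
y_hat = readout(h1)           # whole-graph embedding
```

Because the readout is a sum, permuting the node order leaves ŷ unchanged, which is exactly the isomorphism-invariance property noted above.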
Table 2: A list of representative graph-based molecular representation learning algorithms. The four methods (MS, DK, SS, KG) correspond to the four parts presented in Section 3; here, MS specifically denotes molecular-substructure-related methods. Four training methods are included in this table: self-supervised learning (SSL), supervised learning (SL), pre-training (PT), and contrastive learning (CL).

  Algorithm         Task                     Encoder  Method  Train Method  Venue       Code Link
  MPNN[1]           Property prediction      MPNN     /       SL            ICML'17     /
  DimeNet[2]        Property prediction      MPNN     SS      SL            ICLR'20     https://github.com/klicperajo/dimenet
  GNN-FiLM[3]       Property prediction      MPNN     /       SL            ICML'20     https://github.com/microsoft/tf-gnn-samples
  GROVER[4]         Property prediction      GAT      MS      SSL+PT        NeurIPS'20  https://github.com/tencent-ailab/grover
  Pre-GNN[5]        Property prediction      GIN      MS      SSL+PT        ICLR'20     https://github.com/snap-stanford/pretrain-gnns/
  INFOGRAPH[6]      Property prediction      GNN      MS      SSL           ICLR'20     https://github.com/fanyun-sun/InfoGraph
  GraphCL[7]        Property prediction      GNN      MS      SSL+CL+PT     NeurIPS'20  https://github.com/Shen-Lab/GraphCL
  MoCL[8]           Property prediction      GIN      MS      SSL+CL+PT     KDD'21      https://github.com/illidanlab/MoCL-DK
  MGSSL[9]          Property prediction      GIN      MS      SSL+PT        NeurIPS'21  https://github.com/zaixizhang/MGSSL
  PhysChem[10]      Property prediction      MPNN     DK      SL            NeurIPS'21  /
  KCL[11]           Property prediction      MPNN     KG      SSL+CL        AAAI'22     https://github.com/ZJU-Fangyin/KCL
  GeomGCL[12]       Property prediction      MPNN     SS      SSL           AAAI'22     /
  GRAPHMVP[13]      Property prediction      GNN      SS      SSL+PT        ICLR'22     https://github.com/chao1224/GraphMVP
  SphereNet[14]     Dynamics simulation      MPN      SS      SL            ICLR'22     https://github.com/Open-Catalyst-Project/ocp
  WLDN[15]          Reaction prediction      WLN      /       SL            NeurIPS'17  https://github.com/wengong-jin/nips17-rexgen
  MolR[16]          Reaction prediction      GNN      DK      SL            ICLR'22     https://github.com/hwwang55/MolR
  JT-VAE[17]        Molecule generation      MPNN     MS      SL            ICML'18     https://github.com/wengong-jin/icml18-jtnn
  MoleculeChef[18]  Molecule generation      GGNN     /       SL            NeurIPS'19  https://github.com/john-bradshaw/molecule-chef
  GraphAF[19]       Molecule generation      R-GCN    /       SL            ICLR'20     https://github.com/DeepGraphLearning/GraphAF
  HierVAE[20]       Molecule generation      MPN      MS      SL            ICML'20     https://github.com/wengong-jin/hgraph2graph
  MoLeR[21]         Molecule generation      GNN      MS      SL            ICLR'22     /
  VJTNN[22]         Molecule optimization    MPNN     MS      SL            ICLR'19     https://github.com/wengong-jin/iclr19-graph2graph
  AttSemiGAE[23]    Drug-drug interaction    GAE      /       SL            IJCAI'18    https://github.com/matenure/mvGAE
  KGNN[24]          Drug-drug interaction    GNN      KG      SL            IJCAI'20    https://github.com/xzenglab/KGNN
  ConfVAE[25]       Conformation generation  GNN      SS      SL            ICML'21     https://github.com/MinkaiXu/ConfVAE-ICML21
  RationaleRL[26]   Drug discovery           MPNN     MS      SL            ICML'20     https://github.com/wengong-jin/multiobj-rationale

[1] [Gilmer et al., 2017]; [2] [Klicpera et al., 2019]; [3] [Brockschmidt, 2020]; [4] [Rong et al., 2020]; [5] [Hu* et al., 2020]; [6] [Sun et al., 2020]; [7] [You et al., 2020]; [8] [Sun et al., 2021]; [9] [Zhang et al., 2021]; [10] [Yang et al., 2021b]; [11] [Fang et al., 2022]; [12] [Li et al., 2022]; [13] [Liu et al., 2022a]; [14] [Liu et al., 2022b]; [15] [Jin et al., 2017]; [16] [Wang et al., 2022]; [17] [Jin et al., 2018a]; [18] [Bradshaw et al., 2019]; [19] [Shi et al., 2020]; [20] [Jin et al., 2020b]; [21] [Maziarz et al., 2022]; [22] [Jin et al., 2018b]; [23] [Ma et al., 2018]; [24] [Lin et al., 2020]; [25] [Xu et al., 2021]; [26] [Jin et al., 2020c]

Molecular structure knowledge is not only utilized in self-supervised learning. Motif-, substructure-, and scaffold-based molecular representation learning applied in molecular generation [Jin et al., 2020c; Maziarz et al., 2022; Wu et al., 2022] also achieves competitive performance.

3.2 Domain Knowledge-based Method

Combining deep learning and molecular science is vital for molecular representation learning, and involving chemical domain knowledge in the model design is an effective way to improve performance. Yang et al. [Yang et al., 2021b] propose a novel model, PhysChem, which is composed of a physicist network (PhysNet) and a chemist network (ChemNet). PhysNet learns molecular conformations and ChemNet learns chemical properties using neural networks. By fusing physical and chemical information, PhysChem obtains the desired performance on property prediction tasks. PAR [Wang et al., 2021] involves task information and proposes a property-aware embedding method. Wang et al. [Wang et al., 2022] are inspired by the equivalence relation between reactants and products in a chemical reaction. They propose MolR to preserve this equivalence relation in the embedding space, which means forcing the sum of reactant embeddings and the sum of product embeddings to be equal. MolR achieves SOTA performance in a variety of downstream tasks.

3.3 Spatial Learning-based Method

Molecular spatial information, especially geometric information, attracts wide attention and is increasingly involved in the molecular representation learning process, especially when the model needs to learn forces or energies on atoms. DimeNet [Klicpera et al., 2019], GemNet [Klicpera et al., 2021a] and Directional MPNN [Klicpera et al., 2021b] propose directional message embeddings. Although they still take 2D molecular graphs as input, they consider not only the distances between atoms but also the spatial directions, which are calculated from the atoms' 2D coordinates. They use directional information by transforming messages based on the angle between atoms. Using spherical Bessel functions and spherical harmonics, distance and angle can be jointly represented effectively. In general, 2D graphs emphasize topological information, while 3D geometric graphs focus more on energy. GeomGCL [Li et al., 2022] calculates definite geometric factors (angle and distance) and utilizes radial basis functions to obtain geometric embeddings. GRAPHMVP [Liu et al., 2022a] adopts 3D conformers and learns molecular representations via a 3D GNN model. To complete the identification of 3D graph structures, SphereNet [Liu et al., 2022b] designs spherical message passing as a powerful scheme for 3D molecular learning.
3.4 Knowledge Graph-based Method

The knowledge graph is an effective strategy to involve molecular-structure-invariant but rich external knowledge in the model. Different from previous methods, KGNN [Lin et al., 2020] and MDNN [Lyu et al., 2021] explore a knowledge graph consisting of molecules as nodes and the connection relationships between molecules as edges. In this way, molecular representations are learned from the knowledge graph structure instead of the molecular structure. Fang et al. [Fang et al., 2022] construct a chemical element knowledge graph, which is formed by triples of the form (chemical element, relation, attribute), such as (Gas, isStateOf, Cl). They propose to use this KG to produce augmented nodes and edges in molecules and utilize contrastive learning to maximize agreement between the two views of molecular graphs.

4 Applications

Here, we present several representative applications and algorithms to explain how models are designed to deal with specific applications based on MRL.

4.1 Property Prediction

Molecular property prediction plays a fundamental role in drug discovery to identify potential drug candidates with target properties. Generally, this task consists of two phases: a molecular encoder that generates a fixed-length molecular representation, and a predictor. The predictor is utilized to predict whether the molecule has the target property, or to predict the molecule's response with respect to the target property, based on the learned molecular representation. Property prediction results directly reflect the quality of the learned molecular representation. As a result, property prediction tasks receive great attention from researchers. More and more general graph learning papers [Hu* et al., 2020; Gilmer et al., 2017; Brockschmidt, 2020; You et al., 2020] employ molecular graph datasets and property prediction tasks to examine the performance of their algorithms. Moreover, deep learning methods specifically designed for MRL are usually proposed and applied to this task first. MolR [Wang et al., 2022] proposes a novel way to learn molecular representations by keeping the equivalence relation of the molecular reaction in the embedding space, and is also applied to the property prediction task first. Besides, insufficient available molecular data is a common problem in the chemistry field. Guo et al. [Guo et al., 2021] and Wang et al. [Wang et al., 2021] propose meta-learning methods to deal with this problem in property prediction.

4.2 Molecular Generation

The key challenge of drug discovery is to find target molecules with the target properties, which heavily relies on domain experts. Molecular generation aims to automate this process. Two steps are necessary to complete this task: one is designing an encoder to represent molecules in a continuous manner, which is beneficial for property optimization and prediction; the other is proposing a decoder to map the optimized space to a molecular graph with the optimized property. Because SMILES is not designed to capture molecular similarity, molecular generation models operate on molecular graphs directly most of the time. To avoid invalid states [Jin et al., 2018a], most works generate graphs substructure by substructure instead of node by node. JT-VAE [Jin et al., 2018a] and VJTNN [Jin et al., 2018b] first decompose the molecular graph into a junction tree based on substructures in the graph. Then they encode the tree using a neural network. Next, they reconstruct the junction tree and assemble the nodes in the tree back into the original molecular graph. HierVAE [Jin et al., 2020b] generates molecular graphs hierarchically based on motifs. MoLeR [Maziarz et al., 2022] keeps the scaffold structure during the generative procedure and generates molecules relying on motifs. GraphAF [Shi et al., 2020] utilizes a flow model to generate the molecular graph. MoleculeChef [Bradshaw et al., 2019] is designed to generate synthesizable molecules: it generates reactant molecules first and then utilizes the molecular transformer [Schwaller et al., 2019] model to generate the target molecule.

4.3 Reaction Prediction

Reaction prediction and retrosynthesis prediction are fundamental problems in organic chemistry. Reaction prediction uses reactants to predict reaction products; retrosynthesis prediction is the opposite process. When taking SMILES as input, reaction prediction is treated as a translation task. When taking molecular graphs as input, both reaction prediction and retrosynthesis prediction proceed in two steps. Like WLDN [Jin et al., 2017] and WLDN++ [Coley et al., 2019], the model first predicts the reaction center and then predicts which of the potential products is the major product. Different from previous work, MolR [Wang et al., 2022] formulates reaction prediction as a ranking problem: all the products in the test set are put into a candidate pool, and MolR ranks these candidates based on the embedding learned from the given reactant set.

4.4 Drug-drug Interactions

Detecting drug-drug interactions (DDI) is an important task that can help clinicians make effective decisions and schedule appropriate therapy programs. Accurate DDI prediction can not only support medication recommendation but also effectively identify potential adverse effects, which is critical for patients and society. AttSemiGAE [Ma et al., 2018] performs DDI prediction by measuring drug similarity with multiple types of drug features. SafeDrug [Yang et al., 2021a] designs global and local modules to fully encode the connectivity and functionality of drug molecules for DDI prediction. Both KGNN [Lin et al., 2020] and MDNN [Lyu et al., 2021] build drug knowledge graphs to improve the accuracy of DDI prediction.

5 Datasets and Benchmarks

We summarize representative molecular representation learning algorithms in Table 2. To conveniently access the empirical results, each paper is attached with a code link if available. The corresponding tasks, encoding algorithms, methods, and training methods are also listed. Here, methods specify the four methods we discussed in Section 3. For training methods, we include self-supervised learning, supervised learning,
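MolR's ranking formulation in Section 4.3 can be illustrated with a toy sketch: under its training objective, the summed reactant embeddings should land near the true product embedding, so candidates are ranked by distance to that sum. The fixed vectors below are made up for illustration; real embeddings would come from a trained GNN encoder:

```python
import numpy as np

def rank_products(reactant_embs, candidate_embs):
    """Rank candidate products for a reactant set, MolR-style.

    The query is the sum of the reactant embeddings; candidates are
    ordered by Euclidean distance to it, closest (most likely) first.
    """
    query = np.sum(reactant_embs, axis=0)
    dists = np.linalg.norm(np.asarray(candidate_embs) - query, axis=1)
    return np.argsort(dists)

# Two toy reactants whose embeddings sum to [1, 1]; candidate 1 matches it.
reactants = np.array([[1.0, 0.0], [0.0, 1.0]])
candidates = np.array([[5.0, 5.0], [1.0, 1.0], [0.0, 0.0]])
ranking = rank_products(reactants, candidates)  # best candidate first
```

Because the scoring is a simple nearest-neighbor search in embedding space, the candidate pool can be precomputed once and queried cheaply for every reactant set.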
Table 3: Datasets for molecular representation learning research.

Dataset Category #Train #Dev #Test Reference Data Link


ZINC15 Structure Pretraining / / / [Sterling and Irwin, 2015] https://zinc15.docking.org
PubChem Structure Pretraining / / / [Kim et al., 2019] https://pubchem.ncbi.nlm.nih.gov
Table 3: Commonly used datasets for different chemical tasks.

Dataset | Task | #Train | #Valid | #Test | Reference | URL
ChEMBL | Structure pretraining | / | / | / | [Gaulton et al., 2017] | https://www.ebi.ac.uk/chembl/
QM9 | Property prediction | 107,108 | 13,388 | 13,388 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
ESOL | Property prediction | 902 | 112 | 112 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
FreeSolv | Property prediction | 513 | 64 | 64 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
Lipophilicity | Property prediction | 3,360 | 420 | 420 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
MUV | Property prediction | 74,470 | 9,308 | 9,308 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
HIV | Property prediction | 32,901 | 4,112 | 4,112 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
PDBbind | Property prediction | 9,526 | 1,190 | 1,190 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
BACE | Property prediction | 1,210 | 151 | 151 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
BBBP | Property prediction | 1,631 | 203 | 203 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
Tox21 | Property prediction | 6,264 | 783 | 783 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
ToxCast | Property prediction | 6,860 | 857 | 857 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
SIDER | Property prediction | 1,141 | 142 | 142 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
ClinTox | Property prediction | 1,182 | 147 | 147 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
USPTO-MIT | Reaction prediction | 400,000 | 40,000 | 40,000 | [Jin et al., 2017] | https://github.com/wengong-jin/nips17-rexgen
USPTO-15K | Reaction prediction | 10,500 | 1,500 | 3,000 | [Coley et al., 2017] | https://github.com/connorcoley/ochem_predict_nn
USPTO-full | Reaction prediction | 760,000 | 95,000 | 95,000 | [Lowe, 2012] | https://github.com/dan2097/patent-reaction-extraction
ZINC-250k | Molecular generation | 200,000 | 25,000 | 25,000 | [Kusner et al., 2017] | https://github.com/mkusner/grammarVAE
DrugBank | Drug-drug interaction | 489,910 | 61,238 | 61,238 | [Lin et al., 2020] | https://github.com/xzenglab/KGNN
KEGG-drug | Drug-drug interaction | 45,586 | 5,698 | 5,698 | [Lin et al., 2020] | https://github.com/xzenglab/KGNN/tree/master
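Most property-prediction rows above follow the common 80/10/10 train/valid/test convention (e.g., QM9: 107,108 / 13,388 / 13,388). The sketch below is our own minimal illustration of producing such a split in plain Python; the function name and fixed seed are hypothetical, not a procedure taken from any of the cited benchmarks, and note that MoleculeNet tasks are often evaluated with scaffold splits rather than purely random ones:

```python
import random

def split_80_10_10(items, seed=0):
    """Shuffle deterministically, then partition into 80% train, 10% valid, 10% test."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    items = list(items)
    rng.shuffle(items)
    n = len(items)
    n_valid = n // 10
    n_test = n // 10
    n_train = n - n_valid - n_test  # remainder goes to train
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]
    return train, valid, test

# Toy example with 1,000 "molecules"
train, valid, test = split_80_10_10(range(1000))
print(len(train), len(valid), len(test))  # 800 100 100
```

Because the three slices partition the shuffled list, every item lands in exactly one subset, which is the property a leakage-free evaluation split needs.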

pre-training, and contrastive learning. Besides the algorithms, we also summarize commonly used datasets for different chemical tasks in Table 3.

6 Future Directions

Graph-based methods for MRL are developing rapidly. Although MRL has achieved satisfactory results in various applications, several challenges remain to be solved. We list a few promising future directions for reference.

6.1 Graph-based MRL with Spatial Learning

3D geometric information has recently attracted great attention in graph-based MRL. There are several ways to encode 3D information. One is equivariant graph neural networks, such as SE(3)-Transformers [Fuchs et al., 2020]. Another category of methods takes relative 3D information as input, like the directional message passing methods [Klicpera et al., 2019; Klicpera et al., 2021a] introduced in Section 3, which use interatomic distances and the angles between bonds as features to learn geometric information. In addition, SphereNet [Liu et al., 2022b] proposes spherical message passing to learn 3D molecular representations. However, how different geometries contribute to molecular representation learning still lacks rigorous justification, and no standard method for learning spatial information has been established yet. This remains a promising research direction for MRL.
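Concretely, the distance and angle features that directional message-passing methods consume can be computed directly from 3D atomic coordinates. The following is our own minimal stdlib sketch (the function names are hypothetical, not code from any cited method):

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points (e.g., atom positions)."""
    return math.dist(a, b)

def bond_angle(a, b, c):
    """Angle at atom b (in radians) formed by the bonds b->a and b->c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, cos)))

# Toy example: three atoms with a right angle at the middle atom
a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
print(round(distance(a, c), 3), round(math.degrees(bond_angle(a, b, c)), 1))  # 1.414 90.0
```

Real models compute these quantities for all bonded pairs and triples and feed them to the message function; because distances and angles are invariant to rotation and translation, the learned representation inherits that invariance for free.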
6.2 Graph-based MRL with Explainability

Model explainability is a persistent challenge and a vital task, especially for MRL. To bridge the gap between machine learning and chemical science, a well-designed MRL model that produces competitive prediction or generation results on chemical tasks is important, but it is not the end of MRL research. Which molecular features play the most important part in MRL? Why do these machine learning models work for chemical tasks? How do these MRL models learn? Answering these questions would do even more to break down the boundaries between the two fields. AttSemiGAE [Ma et al., 2018], E2E [Gao et al., 2018], and GCNN [Henderson et al., 2021] all propose strategies to improve their models' explainability and thereby provide more guidance for researchers. To this end, explainable MRL is also a potential future direction.

6.3 Graph-based MRL with Insufficient Data

A shortage of training molecules is a common problem in the chemistry field. 3D molecular structures can provide additional geometric information that contributes substantially to MRL, but determining 3D structures experimentally is challenging and costly, so the available 3D molecular graphs are insufficient for model training. Molecular conformation generation [Xu et al., 2021] is one solution to this problem, although it has not yet been widely researched. Besides, Guo et al. [Guo et al., 2021] and Wang et al. [Wang et al., 2021] propose meta-learning algorithms to handle few-shot molecule problems, which have inspired follow-up work. Algorithms that cope with insufficient data should be another important research direction.

7 Conclusion

Molecular representation learning builds a strong and vital connection between machine learning and chemical science. In this work, we introduce the problem of graph-based molecular representation learning and provide a comprehensive overview of recent progress on this research topic. To facilitate reproducible research, we take the first step of releasing representative molecular representation learning benchmarks and commonly used datasets for the research community. Finally, we share our thoughts on future directions for this topic.

References
[Bradshaw et al., 2019] John Bradshaw, Brooks Paige, Matt J Kusner, Marwin Segler, and José Miguel Hernández-Lobato. A model to search for synthesizable molecules. NeurIPS, 2019.
[Brockschmidt, 2020] Marc Brockschmidt. GNN-FiLM: Graph neural networks with feature-wise linear modulation. In ICML, 2020.
[Coley et al., 2017] Connor W Coley, Regina Barzilay, Tommi S Jaakkola, William H Green, and Klavs F Jensen. Prediction of organic reaction outcomes using machine learning. ACS Central Science, 2017.
[Coley et al., 2019] Connor W Coley, Wengong Jin, Luke Rogers, Timothy F Jamison, Tommi S Jaakkola, William H Green, Regina Barzilay, and Klavs F Jensen. A graph-convolutional neural network model for the prediction of chemical reactivity. Chemical Science, 2019.
[Do et al., 2019] Kien Do, Truyen Tran, and Svetha Venkatesh. Graph transformation policy network for chemical reaction prediction. In KDD, 2019.
[Fang et al., 2022] Yin Fang, Qiang Zhang, Haihong Yang, Xiang Zhuang, Shumin Deng, Wen Zhang, Ming Qin, Zhuo Chen, Xiaohui Fan, and Huajun Chen. Molecular contrastive learning with chemical element knowledge graph. In AAAI, 2022.
[Fuchs et al., 2020] Fabian Fuchs, Daniel Worrall, Volker Fischer, and Max Welling. SE(3)-Transformers: 3D roto-translation equivariant attention networks. NeurIPS, 2020.
[Gao et al., 2018] Kyle Yingkai Gao, Achille Fokoue, Heng Luo, Arun Iyengar, Sanjoy Dey, and Ping Zhang. Interpretable drug target prediction using deep neural representation. In IJCAI, 2018.
[Gaulton et al., 2017] Anna Gaulton, Anne Hersey, Michał Nowotka, A Patricia Bento, Jon Chambers, David Mendez, Prudence Mutowo, Francis Atkinson, Louisa J Bellis, Elena Cibrián-Uhalte, et al. The ChEMBL database in 2017. Nucleic Acids Research, 2017.
[Gilmer et al., 2017] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In ICML, 2017.
[Guo et al., 2020] Zhichun Guo, Wenhao Yu, Chuxu Zhang, Meng Jiang, and Nitesh V Chawla. GraSeq: graph and sequence fusion learning for molecular property prediction. In CIKM, 2020.
[Guo et al., 2021] Zhichun Guo, Chuxu Zhang, Wenhao Yu, John Herr, Olaf Wiest, Meng Jiang, and Nitesh V Chawla. Few-shot graph learning for molecular property prediction. In The Web Conference (WWW), 2021.
[Hamilton et al., 2017] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. NeurIPS, 2017.
[Henderson et al., 2021] Ryan Henderson, Djork-Arné Clevert, and Floriane Montanari. Improving molecular graph neural network explainability with orthonormalization and induced sparsity. In ICML, 2021.
[Hu* et al., 2020] Weihua Hu*, Bowen Liu*, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, and Jure Leskovec. Strategies for pre-training graph neural networks. In ICLR, 2020.
[Jin et al., 2017] Wengong Jin, Connor Coley, Regina Barzilay, and Tommi Jaakkola. Predicting organic reaction outcomes with Weisfeiler-Lehman network. NeurIPS, 2017.
[Jin et al., 2018a] Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Junction tree variational autoencoder for molecular graph generation. In ICML, 2018.
[Jin et al., 2018b] Wengong Jin, Kevin Yang, Regina Barzilay, and Tommi Jaakkola. Learning multimodal graph-to-graph translation for molecule optimization. In ICLR, 2018.
[Jin et al., 2020a] Wei Jin, Tyler Derr, Haochen Liu, Yiqi Wang, Suhang Wang, Zitao Liu, and Jiliang Tang. Self-supervised learning on graphs: Deep insights and new direction. arXiv preprint arXiv:2006.10141, 2020.
[Jin et al., 2020b] Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Hierarchical generation of molecular graphs using structural motifs. In ICML, 2020.
[Jin et al., 2020c] Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Multi-objective molecule generation using interpretable substructures. In ICML, 2020.
[Kim et al., 2019] Sunghwan Kim, Jie Chen, Tiejun Cheng, Asta Gindulyte, Jia He, Siqian He, Qingliang Li, Benjamin A Shoemaker, Paul A Thiessen, Bo Yu, et al. PubChem 2019 update: improved access to chemical data. Nucleic Acids Research, 2019.
[Kipf and Welling, 2017] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
[Klicpera et al., 2019] Johannes Klicpera, Janek Groß, and Stephan Günnemann. Directional message passing for molecular graphs. In ICLR, 2019.
[Klicpera et al., 2021a] Johannes Klicpera, Florian Becker, and Stephan Günnemann. GemNet: Universal directional graph neural networks for molecules. NeurIPS, 2021.
[Klicpera et al., 2021b] Johannes Klicpera, Chandan Yeshwanth, and Stephan Günnemann. Directional message passing on molecular graphs via synthetic coordinates. NeurIPS, 2021.
[Krenn et al., 2020] Mario Krenn, Florian Häse, AkshatKumar Nigam, Pascal Friederich, and Alan Aspuru-Guzik. Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation. Machine Learning: Science and Technology, 2020.
[Kusner et al., 2017] Matt J Kusner, Brooks Paige, and José Miguel Hernández-Lobato. Grammar variational autoencoder. In ICML, 2017.
[Landrum, 2020] G. A. Landrum. RDKit: Open-source cheminformatics software. http://www.rdkit.org, 2020.
[Li et al., 2016] Yujia Li, Richard Zemel, Marc Brockschmidt, and Daniel Tarlow. Gated graph sequence neural networks. In ICLR, 2016.
[Li et al., 2022] Shuangli Li, Jingbo Zhou, Tong Xu, Dejing Dou, and Hui Xiong. GeomGCL: Geometric graph contrastive learning for molecular property prediction. In AAAI, 2022.
[Lin et al., 2020] Xuan Lin, Zhe Quan, Zhi-Jie Wang, Tengfei Ma, and Xiangxiang Zeng. KGNN: Knowledge graph neural network for drug-drug interaction prediction. In IJCAI, 2020.
[Liu et al., 2022a] Shengchao Liu, Hanchen Wang, Weiyang Liu, Joan Lasenby, Hongyu Guo, and Jian Tang. Pre-training molecular graph representation with 3D geometry. In ICLR, 2022.
[Liu et al., 2022b] Yi Liu, Limei Wang, Meng Liu, Yuchao Lin, Xuan Zhang, Bora Oztekin, and Shuiwang Ji. Spherical message passing for 3D molecular graphs. In ICLR, 2022.
[Lowe, 2012] Daniel Mark Lowe. Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge, 2012.
[Lyu et al., 2021] Tengfei Lyu, Jianliang Gao, Ling Tian, Zhao Li, Peng Zhang, and Ji Zhang. MDNN: A multimodal deep neural network for predicting drug-drug interaction events. In IJCAI, 2021.
[Ma et al., 2018] Tengfei Ma, Cao Xiao, Jiayu Zhou, and Fei Wang. Drug similarity integration through attentive multi-view graph auto-encoders. In IJCAI, 2018.
[Maziarz et al., 2022] Krzysztof Maziarz, Henry Richard Jackson-Flux, Pashmina Cameron, Finton Sirockin, Nadine Schneider, Nikolaus Stiefl, Marwin Segler, and Marc Brockschmidt. Learning to extend molecular scaffolds with structural motifs. In ICLR, 2022.
[Rong et al., 2020] Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, and Junzhou Huang. Self-supervised graph transformer on large-scale molecular data. NeurIPS, 2020.
[Schwaller et al., 2019] Philippe Schwaller, Teodoro Laino, Théophile Gaudin, Peter Bolgar, Christopher A Hunter, Costas Bekas, and Alpha A Lee. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Central Science, 2019.
[Shi et al., 2020] Chence Shi, Minkai Xu, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, and Jian Tang. GraphAF: a flow-based autoregressive model for molecular graph generation. In ICLR, 2020.
[Shui and Karypis, 2020] Zeren Shui and George Karypis. Heterogeneous molecular graph neural networks for predicting molecule properties. In ICDM, 2020.
[Sterling and Irwin, 2015] Teague Sterling and John J Irwin. ZINC 15 – ligand discovery for everyone. Journal of Chemical Information and Modeling, 2015.
[Sun et al., 2020] Fan-Yun Sun, Jordan Hoffman, Vikas Verma, and Jian Tang. InfoGraph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. In ICLR, 2020.
[Sun et al., 2021] Mengying Sun, Jing Xing, Huijun Wang, Bin Chen, and Jiayu Zhou. MoCL: Contrastive learning on molecular graphs with multi-level domain knowledge. arXiv preprint arXiv:2106.04509, 2021.
[Tang et al., 2020] Bowen Tang, Skyler T Kramer, Meijuan Fang, Yingkun Qiu, Zhen Wu, and Dong Xu. A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility. Journal of Cheminformatics, 2020.
[Veličković et al., 2018] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. In ICLR, 2018.
[Wang et al., 2021] Yaqing Wang, Abulikemu Abuduweili, Quanming Yao, and Dejing Dou. Property-aware relation networks for few-shot molecular property prediction. NeurIPS, 2021.
[Wang et al., 2022] Hongwei Wang, Weijiang Li, Xiaomeng Jin, Kyunghyun Cho, Heng Ji, Jiawei Han, and Martin Burke. Chemical-reaction-aware molecule representation learning. In ICLR, 2022.
[Weininger et al., 1989] David Weininger, Arthur Weininger, and Joseph L Weininger. SMILES. 2. Algorithm for generation of unique SMILES notation. 1989.
[Wu et al., 2018] Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 2018.
[Wu et al., 2022] Yulun Wu, Nicholas Choma, Andrew Deru Chen, Mikaela Cashman, Erica Teixeira Prates, Veronica G Melesse Vergara, Manesh B Shah, Austin Clyde, Thomas Brettin, Wibe Albert de Jong, Neeraj Kumar, Martha S Head, Rick L. Stevens, Peter Nugent, Daniel A Jacobson, and James B Brown. Spatial graph attention and curiosity-driven policy for antiviral drug discovery. In ICLR, 2022.
[Xu et al., 2019] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In ICLR, 2019.
[Xu et al., 2021] Minkai Xu, Wujie Wang, Shitong Luo, Chence Shi, Yoshua Bengio, Rafael Gomez-Bombarelli, and Jian Tang. An end-to-end framework for molecular conformation generation via bilevel programming. In ICML, 2021.
[Yang et al., 2021a] Chaoqi Yang, Cao Xiao, Fenglong Ma, Lucas Glass, and Jimeng Sun. SafeDrug: Dual molecular graph encoders for recommending effective and safe drug combinations. In IJCAI, 2021.
[Yang et al., 2021b] Shuwen Yang, Ziyao Li, Guojie Song, and Lingsheng Cai. Deep molecular representation learning via fusing physical and chemical information. NeurIPS, 2021.
[You et al., 2020] Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen. Graph contrastive learning with augmentations. NeurIPS, 2020.
[Zhang et al., 2021] Zaixi Zhang, Qi Liu, Hao Wang, Chengqiang Lu, and Chee-Kong Lee. Motif-based graph self-supervised learning for molecular property prediction. NeurIPS, 2021.
