Graph-based Molecular Representation Learning

Zhichun Guo¹, Bozhao Nan¹, Yijun Tian¹, Olaf Wiest¹, Chuxu Zhang², Nitesh V. Chawla¹
¹University of Notre Dame, ²Brandeis University
{zguo5,bnan,yijun.tian,Olaf.G.Wiest.1}@nd.edu, [email protected], [email protected]

arXiv:2207.04869v1 [q-bio.QM] 8 Jul 2022

Abstract

Molecular representation learning (MRL) is a key step in building the connection between machine learning and chemical science. In particular, it encodes molecules as numerical vectors that preserve the molecular structures and features, on top of which downstream tasks (e.g., property prediction) can be performed. Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning. In this survey, we systematically review these graph-based molecular representation techniques. Specifically, we first introduce the data and features of 2D and 3D molecular graph datasets. Then we summarize the methods specially designed for MRL and categorize them into four strategies. Furthermore, we discuss some typical chemical applications supported by MRL. To facilitate studies in this fast-developing area, we also list the benchmarks and commonly used datasets in the paper. Finally, we share our thoughts on future research directions.

1 Introduction

The interaction between machine learning and chemical science has received great attention from researchers in both areas. It has made remarkable progress in various chemical applications including molecular property prediction [Guo et al., 2020; Sun et al., 2021; Yang et al., 2021b; Liu et al., 2022b], reaction prediction [Jin et al., 2017; Do et al., 2019], molecular graph generation [Jin et al., 2018a; Jin et al., 2020b] and also drug-drug interaction prediction [Lin et al., 2020]. Molecular representation learning (MRL) is an important step in bridging the gap between these two fields. MRL aims to utilize deep learning models to encode the input molecules as numerical vectors, which preserve useful information about the molecules and serve as feature vectors for downstream (machine learning) applications. Earlier molecular representation learning methods use general representation learning models to represent molecules without explicit involvement of domain knowledge. Recently, many algorithms have been specifically designed for MRL, which can better incorporate chemical domain knowledge. In this paper, we provide a systematic review of the progress in this rapidly-developing topic, charting the path from representation learning methods that incorporate molecular structures to methods that also incorporate domain knowledge.

Motivation 1: why does molecular representation learning matter?
Molecular representation learning has a broad spectrum of applications closely related to people's lives. For example, drug discovery via wet-lab experimentation is extremely time-consuming and expensive. With the advancement of deep learning, a great number of experiments can be simulated by machine learning models. Property prediction can help identify molecules with target properties, and reaction prediction can predict the major products. These significantly reduce the number of failed experiments. For all these chemical applications, MRL is the key determinant of the success of deep learning models.

Motivation 2: why deep graph learning for molecular representation learning?
Molecular graphs naturally describe molecules with rich structural and spatial information. Molecules are essentially atoms and bonds interconnecting atoms, which naturally lend themselves to graph representations. Compared with SMILES, a line-based representation (i.e., a string) of molecules, molecular graphs provide richer information for MRL models to learn from. As a result, graph-based MRL models evolve much faster than sequence-based MRL models. Additionally, more and more general graph learning papers [Gilmer et al., 2017; Hu* et al., 2020; You et al., 2020] employ molecular graph datasets to examine the performance of their algorithms as well.

Contributions. The main contributions of this work are summarized as follows:
• We present a systematic review of the recent progress in graph-based MRL models based on various kinds of molecular inputs and summarize the strategies specifically designed for MRL.
• To encourage reproducible research on this topic, we summarize the representative benchmarks and commonly used datasets in various downstream applications.
• We discuss the limitations of 2D and 3D molecular graphs as input and share our thoughts on future research directions of MRL as a reference for the community.
Figure 1: Overview of graph-based molecular representation learning: (a) two molecular graphs; (b) the general learning process of graph neural networks; (c) four methods proposed for graph-based molecular representation learning; (d) the process of aggregating atoms' representations to obtain the molecular representation.
2 Data Representations

Traditionally, researchers use fixed fingerprint feature extraction rules to identify important information about each molecule and feed this hand-crafted information to a linear classification/regression head for downstream tasks. This requires significant time to determine and calculate the most relevant features, and the designed features still cannot support all tasks. To avoid these efforts, most deep learning models are developed to learn the molecular features automatically. Two kinds of molecular representations are used as inputs: molecular graphs and sequences. Accordingly, graph-based and sequence-based models are developed to learn from the different input molecular representations. The sequence representations, such as the simplified molecular input line-entry system (SMILES) [Weininger et al., 1989] and SELF-referencing Embedded Strings (SELFIES) [Krenn et al., 2020], can be converted into molecular graphs, but this conversion involves a significant amount of domain knowledge. When sequence representations are taken as input, this knowledge is not easily captured by sequence-based learning models. In contrast, graph representations can naturally incorporate additional information in nodes and edges, which is easily leveraged by the rich suite of graph-based models (e.g., graph neural networks). Therefore, we focus on the graph representation in this survey, as it is more commonly used nowadays. In this section, we describe the molecular graph (without spatial information) and 3D molecular graph representations, as shown in Figure 1(a). For each representation, we analyze its characteristics and discuss its usages and limitations when utilized in deep learning models.

Table 1: Details of node and edge features in the molecular graph.

  Attribute        Details
  Node
  Atom type        118
  Chirality tag    unspecified, tetrahedral cw, tetrahedral ccw, other
  Hybridization    sp, sp2, sp3, sp3d, or sp3d2
  Aromaticity      0 or 1 (aromatic atom or not)
  Edge
  Bond type        single, double, triple, aromatic
  Ring             0 or 1 (bond is in a ring or not)
  Bond direction   -, endupright, enddownright
  Stereochemistry  -, any, Z, E, cis, trans

2.1 Molecular Graph

A graph consists of nodes and edges interconnecting the nodes. Analogously, in a molecule, we may consider atoms as nodes and bonds as edges between the atoms. Thus, a molecule has a natural graph structure. This renders the molecular graph the most feasible input for deep learning models and leads to its extensive use. The most common form of molecular graph is described by three matrices: the node feature matrix, the edge feature matrix, and the adjacency matrix. Molecules are usually saved as SMILES for convenience and converted to molecular graphs for computation using specific tools. For example, RDKit [Landrum, 2020] can convert a SMILES string into a molecular graph with the feature and adjacency matrices. The commonly used features of nodes and edges are listed in Table 1. In this table, atom and bond types are mandatory features, while the other features are optional and can be included on demand for different tasks [Tang et al., 2020]. Among these features, the atom's chirality tag cannot be learned from the common 2D molecular graph representation without 3D geometric information; the other features are all learnable from both 2D and 3D structures. For the connectivity, we consider each bond as a bidirectional edge, which means that a bond between atoms A and B results in two edges in the adjacency list: one from A to B, another from B to A. Given the above data, researchers can leverage homogeneous [Gilmer et al., 2017; Guo et al., 2021; Coley et al., 2019] or heterogeneous networks [Shui and Karypis, 2020] to learn molecular representations.

The advantage of using molecular graphs as input is obvious: graph neural networks can be applied directly to learn molecular representations using their topological structures. However, the bonds in this kind of graph are determined by the distance between atoms, which neglects the spatial direction and torsion between atoms. This limits the knowledge that can be derived from the general molecular graph.
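As a minimal sketch of the bidirectional-edge convention described above (pure Python; the hand-coded atom/bond lists stand in for what a toolkit such as RDKit would parse from a SMILES string):

```python
def to_adjacency_list(num_atoms, bonds):
    """Build a directed adjacency list from undirected bonds.

    Each chemical bond (a, b) contributes two directed edges,
    a -> b and b -> a, matching the convention in Section 2.1.
    """
    adj = {i: [] for i in range(num_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj

# Ethanol (SMILES "CCO") as a toy example: atoms C-C-O, two single bonds.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
adj = to_adjacency_list(len(atoms), bonds)
```

Each of the two bonds appears twice in the result, so `adj` is `{0: [1], 1: [0, 2], 2: [1]}` — the central carbon sees both neighbors.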
2.2 3D Molecular Graph

The 3D molecular graph provides the missing geometric information by explicitly encoding the spatial structure. It describes the atomic structure as a set of atoms together with their 3D coordinates, which carries more atomic information. As a result, this representation format has received increasing attention in MRL [Liu et al., 2022b]. Different from taking 2D molecular graphs as input, the techniques based on 3D graphs take atoms as nodes but learn the atomic interactions as edges using graph neural networks. Under this setting, the atom features are spatially invariant (e.g., atom types) while the coordinates provide the relative positions between atoms. For example, the bonds can be determined by the distance between two atoms using their coordinates. To incorporate more complicated spatial relationships, spherical graph neural networks [Liu et al., 2022b] are designed to learn molecular structure from 3D graphs.

3 Methodology

In this section, we start with the general graph neural networks for MRL. Then, we discuss methods designed specifically for this task and categorize them into four strategies. These specific methods incorporate chemistry-related information to strengthen molecular representations in different ways, which leads to better performance. The representative methods are listed in Table 2.

Formally, each molecule is generally considered as an undirected graph G = (V, E, X) with node features x_v ∈ X for v ∈ V and edge features e_uv ∈ E for (u, v) ∈ E [Brockschmidt, 2020]. Here, nodes represent atoms and edges represent bonds. Generally, graph-based learning methods fit into the Message Passing Neural Network (MPNN) [Gilmer et al., 2017] scheme. Therefore, we take MPNN as an example to illustrate the learning process, as shown in Figure 1(b). The forward pass consists of three operations: message passing, node update, and readout. During the message passing phase, node features are updated iteratively according to their neighbors in the graph for T times. We initialize the embedding of node v as h_v^0 = x_v. Formally, the node hidden states at step t+1 are obtained based on messages m_v^{t+1}, which are computed as:

    m_v^{t+1} = Σ_{u ∈ N(v)} M_t(h_v^t, h_u^t, e_uv),    (1)

    h_v^{t+1} = U_t(h_v^t, m_v^{t+1}),    (2)

where M_t is the message function, U_t is the node update function, and N(v) is the set of node v's neighbors in the graph. After updating the node features T times, the readout function R computes the whole-graph embedding vector as follows:

    ŷ = R({h_v^T | v ∈ V}).    (3)

Note that R is invariant to the order of the nodes so that the framework is invariant to graph isomorphism. ŷ is the representation of the molecule and is passed to a fully connected layer for downstream tasks. All functions M_t, U_t, and R are neural networks whose learned weights are updated during the training process.

Besides MPNN, different variants of graph neural networks such as GCN [Kipf and Welling, 2017], GIN [Xu et al., 2019], GAT [Veličković et al., 2018], GGNN [Li et al., 2016] and GraphSage [Hamilton et al., 2017] can also be used directly to learn molecular representations. These methods are widely utilized as base encoders for molecular representation learning in various downstream tasks, such as reaction prediction [Coley et al., 2019], property prediction [Brockschmidt, 2020] and drug discovery [Jin et al., 2020c]. Hu et al. [Hu* et al., 2020] conduct a comparative study of graph neural networks on property prediction and find that GIN usually achieves the best results. While these models are powerful in learning graph structures, chemical traits and knowledge, the essence of molecules, are largely neglected. Recently, various deep learning methods have been designed specifically for molecules as well. These methods are categorized into four parts in Figure 1(c), which are elaborated as follows.

3.1 Molecular Structure-based Method

Graph-based MRL generally treats molecular graphs the same as other plain graphs: it focuses on the topological structure but cares less about the special substructures or properties contained in molecular graphs. Recent research has seen a foray into self-supervised learning strategies [Jin et al., 2020a] that push the model to pay more attention to the graph structures. PreGNN [Hu* et al., 2020] utilizes two self-supervised strategies, context prediction and node/edge attribute masking, to pre-train GNNs. Different from this general unsupervised design, GROVER [Rong et al., 2020] proposes molecule-specific self-supervised pre-training methods: contextual property prediction and graph-level motif¹ prediction. MGSSL [Zhang et al., 2021] also designs a motif-based graph self-supervised strategy, which predicts the motif's topology and label during the motif tree generation process. INFOGRAPH [Sun et al., 2020] trains the model by maximizing the mutual information between the representations of the entire graph and substructures of different granularity.

Contrastive learning is a common self-supervised learning strategy that utilizes data augmentation to make models produce graph representations with better generalizability, transferability, and robustness. Three general graph augmentation methods are proposed by GraphCL [You et al., 2020], which can also be applied to molecule datasets. MoCL [Sun et al., 2021] proposes two molecular graph augmentation methods: one replaces a valid substructure with a substructure with similar physical or chemical properties; the other changes a few general carbon atoms. Molecular 2D and 3D graph representations are naturally two augmented views of a molecule. Using this characteristic, GeomGCL [Li et al., 2022] and GRAPHMVP [Liu et al., 2022a] train the model with contrastive learning.

¹ Motifs are recurrent sub-graphs among the input graph data, which encode rich domain knowledge of molecules and can be easily detected by professional software.
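The MPNN forward pass of Eqs. (1)–(3) can be sketched in a few lines of NumPy. The concrete choices below (linear message/update maps with a ReLU, sum readout) are illustrative stand-ins for the learned functions M_t, U_t, and R, not the exact parameterization of any cited model:

```python
import numpy as np

def message_passing_step(h, edges, edge_feats, W_msg, W_upd):
    """One MPNN iteration: Eq. (1) message sum, then Eq. (2) node update.

    Toy stand-ins for the learned functions: M_t is a linear map on the
    concatenation [h_u ; e_uv] (ignoring h_v for brevity), and U_t is a
    linear map on [h_v ; m_v] followed by a ReLU.
    """
    n, d = h.shape
    m = np.zeros((n, d))
    for (u, v), e in zip(edges, edge_feats):
        # Each bond is bidirectional, so messages flow both ways.
        m[v] += W_msg @ np.concatenate([h[u], e])
        m[u] += W_msg @ np.concatenate([h[v], e])
    return np.maximum(0.0, np.concatenate([h, m], axis=1) @ W_upd.T)

def readout(h):
    """Eq. (3): sum pooling over nodes, invariant to node order."""
    return h.sum(axis=0)

# Toy molecule: 3 atoms with 2-dim features, 2 bonds with 1-dim features.
h = np.ones((3, 2))
edges = [(0, 1), (1, 2)]
edge_feats = np.ones((2, 1))
W_msg = np.full((2, 3), 0.1)  # maps [h_u ; e_uv] (3-dim) to a 2-dim message
W_upd = np.full((2, 4), 0.1)  # maps [h_v ; m_v] (4-dim) to a new 2-dim state
h1 = message_passing_step(h, edges, edge_feats, W_msg, W_upd)
y_hat = readout(h1)           # whole-graph embedding
```

Because the readout is a sum, permuting the node order leaves ŷ unchanged, which is exactly the isomorphism-invariance property noted above.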
Table 2: A list of representative graph-based molecular representation learning algorithms. The four methods (MS, DK, SS, KG) correspond to the four parts presented in Section 3; here, MS specifically denotes molecular-substructure-related methods. Four training methods are included in this table: self-supervised learning (SSL), supervised learning (SL), pre-training (PT), and contrastive learning (CL).

  Algorithm         Task                     Encoder  Method  Train Method  Venue       Code Link
  MPNN[1]           Property prediction      MPNN     /       SL            ICML'17     /
  DimeNet[2]        Property prediction      MPNN     SS      SL            ICLR'20     https://github.com/klicperajo/dimenet
  GNN-FiLM[3]       Property prediction      MPNN     /       SL            ICML'20     https://github.com/microsoft/tf-gnn-samples
  GROVER[4]         Property prediction      GAT      MS      SSL+PT        NeurIPS'20  https://github.com/tencent-ailab/grover
  Pre-GNN[5]        Property prediction      GIN      MS      SSL+PT        ICLR'20     https://github.com/snap-stanford/pretrain-gnns/
  INFOGRAPH[6]      Property prediction      GNN      MS      SSL           ICLR'20     https://github.com/fanyun-sun/InfoGraph
  GraphCL[7]        Property prediction      GNN      MS      SSL+CL+PT     NeurIPS'20  https://github.com/Shen-Lab/GraphCL
  MoCL[8]           Property prediction      GIN      MS      SSL+CL+PT     KDD'21      https://github.com/illidanlab/MoCL-DK
  MGSSL[9]          Property prediction      GIN      MS      SSL+PT        NeurIPS'21  https://github.com/zaixizhang/MGSSL
  PhysChem[10]      Property prediction      MPNN     DK      SL            NeurIPS'21  /
  KCL[11]           Property prediction      MPNN     KG      SSL+CL        AAAI'22     https://github.com/ZJU-Fangyin/KCL
  GeomGCL[12]       Property prediction      MPNN     SS      SSL           AAAI'22     /
  GRAPHMVP[13]      Property prediction      GNN      SS      SSL+PT        ICLR'22     https://github.com/chao1224/GraphMVP
  SphereNet[14]     Dynamics simulation      MPN      SS      SL            ICLR'22     https://github.com/Open-Catalyst-Project/ocp
  WLDN[15]          Reaction prediction      WLN      /       SL            NeurIPS'17  https://github.com/wengong-jin/nips17-rexgen
  MolR[16]          Reaction prediction      GNN      DK      SL            ICLR'22     https://github.com/hwwang55/MolR
  JT-VAE[17]        Molecule generation      MPNN     MS      SL            ICML'18     https://github.com/wengong-jin/icml18-jtnn
  MoleculeChef[18]  Molecule generation      GGNN     /       SL            NeurIPS'19  https://github.com/john-bradshaw/molecule-chef
  GraphAF[19]       Molecule generation      R-GCN    /       SL            ICLR'20     https://github.com/DeepGraphLearning/GraphAF
  HierVAE[20]       Molecule generation      MPN      MS      SL            ICML'20     https://github.com/wengong-jin/hgraph2graph
  MoLeR[21]         Molecule generation      GNN      MS      SL            ICLR'22     /
  VJTNN[22]         Molecule optimization    MPNN     MS      SL            ICLR'19     https://github.com/wengong-jin/iclr19-graph2graph
  AttSemiGAE[23]    Drug-drug interaction    GAE      /       SL            IJCAI'18    https://github.com/matenure/mvGAE
  KGNN[24]          Drug-drug interaction    GNN      KG      SL            IJCAI'20    https://github.com/xzenglab/KGNN
  ConfVAE[25]       Conformation generation  GNN      SS      SL            ICML'21     https://github.com/MinkaiXu/ConfVAE-ICML21
  RationaleRL[26]   Drug discovery           MPNN     MS      SL            ICML'20     https://github.com/wengong-jin/multiobj-rationale

[1] [Gilmer et al., 2017]; [2] [Klicpera et al., 2019]; [3] [Brockschmidt, 2020]; [4] [Rong et al., 2020]; [5] [Hu* et al., 2020]; [6] [Sun et al., 2020]; [7] [You et al., 2020]; [8] [Sun et al., 2021]; [9] [Zhang et al., 2021]; [10] [Yang et al., 2021b]; [11] [Fang et al., 2022]; [12] [Li et al., 2022]; [13] [Liu et al., 2022a]; [14] [Liu et al., 2022b]; [15] [Jin et al., 2017]; [16] [Wang et al., 2022]; [17] [Jin et al., 2018a]; [18] [Bradshaw et al., 2019]; [19] [Shi et al., 2020]; [20] [Jin et al., 2020b]; [21] [Maziarz et al., 2022]; [22] [Jin et al., 2018b]; [23] [Ma et al., 2018]; [24] [Lin et al., 2020]; [25] [Xu et al., 2021]; [26] [Jin et al., 2020c]

Molecular structure knowledge is not only utilized in self-supervised learning. Motif-, substructure-, and scaffold-based molecular representation learning applied in molecular generation [Jin et al., 2020c; Maziarz et al., 2022; Wu et al., 2022] also achieves competitive performance.

3.2 Domain Knowledge-based Method

Combining deep learning and molecular science is vital for molecular representation learning, and involving chemical domain knowledge in the model design is an effective way to improve performance. Yang et al. [Yang et al., 2021b] propose a novel model, PhysChem, which is composed of a physicist network (PhysNet) and a chemist network (ChemNet). PhysNet learns molecular conformations and ChemNet learns chemical properties using neural networks. By fusing physical and chemical information, PhysChem obtains the desired performance on property prediction tasks. PAR [Wang et al., 2021] involves task information and proposes a property-aware embedding method. Wang et al. [Wang et al., 2022] are inspired by the equivalence relation between reactants and products in a chemical reaction. They propose MolR to preserve this equivalence relation in the embedding space, which means forcing the sum of reactant embeddings and the sum of product embeddings to be equal. MolR achieves SOTA performance in a variety of downstream tasks.

3.3 Spatial Learning-based Method

Molecular spatial information, especially geometric information, attracts wide attention and is increasingly involved in the molecular representation learning process, especially when the model needs to learn forces or energies on atoms. DimeNet [Klicpera et al., 2019], GemNet [Klicpera et al., 2021a] and Directional MPNN [Klicpera et al., 2021b] propose directional message embeddings. Although they still take 2D molecular graphs as input, they consider not only the distances between atoms but also the spatial directions, which are calculated from the atoms' 2D coordinates. They use directional information by transforming messages based on the angle between atoms. Using spherical Bessel functions and spherical harmonics, distance and angle can be jointly represented effectively. In general, 2D graphs emphasize topological information, while 3D geometric graphs focus more on energy. GeomGCL [Li et al., 2022] calculates definite geometric factors (angle and distance) and utilizes radial basis functions to obtain geometric embeddings. GRAPHMVP [Liu et al., 2022a] adopts 3D conformers and learns molecular representations via a 3D GNN model. To complete the identification of 3D graph structures, SphereNet [Liu et al., 2022b] designs spherical message passing as a powerful scheme for 3D molecular learning.
3.4 Knowledge Graph-based Method

The knowledge graph is an effective strategy to involve molecular-structure-invariant but rich external knowledge in the model. Different from previous methods, KGNN [Lin et al., 2020] and MDNN [Lyu et al., 2021] explore a knowledge graph consisting of molecules as nodes and the connection relationships between molecules as edges. In this way, molecular representations are learned from the knowledge graph structure instead of the molecular structure. Fang et al. [Fang et al., 2022] construct a chemical element knowledge graph, which is formed by triples of the form (chemical element, relation, attribute), such as (Gas, isStateOf, Cl). They propose to use this KG to produce augmented nodes and edges in molecules and utilize contrastive learning to maximize agreement between the two views of molecular graphs.

4 Applications

Here, we present several representative applications and algorithms to explain how models are designed to deal with specific applications based on MRL.

4.1 Property Prediction

Molecular property prediction plays a fundamental role in drug discovery to identify potential drug candidates with target properties. Generally, this task consists of two phases: a molecular encoder that generates a fixed-length molecular representation, and a predictor. The predictor is utilized to predict whether the molecule has the target property, or to predict the molecule's response with respect to the target property, based on the learned molecular representation. Property prediction results directly reflect the quality of the learned molecular representation. As a result, property prediction tasks receive great attention from researchers. More and more general graph learning papers [Hu* et al., 2020; Gilmer et al., 2017; Brockschmidt, 2020; You et al., 2020] employ molecular graph datasets and property prediction tasks to examine the performance of their algorithms. Moreover, deep learning methods specifically designed for MRL are usually proposed and applied to this task first. MolR [Wang et al., 2022] proposes a novel way to learn molecular representations by keeping the equivalence relation of the molecular reaction in the embedding space, and is also applied to the property prediction task first. Besides, insufficient available molecular data is a common problem in the chemistry field. Guo et al. [Guo et al., 2021] and Wang et al. [Wang et al., 2021] propose meta-learning methods to deal with this problem in property prediction.

4.2 Molecular Generation

The key challenge of drug discovery is to find target molecules with the target properties, which heavily relies on domain experts. Molecular generation aims to automate this process. Two steps are necessary to complete this task: one is designing an encoder to represent molecules in a continuous manner, which is beneficial for property optimization and prediction; the other is proposing a decoder to map the optimized space to a molecular graph with the optimized property. Because SMILES is not designed to capture molecular similarity, molecular generation models operate on molecular graphs directly most of the time. To avoid invalid states [Jin et al., 2018a], most works generate graphs substructure by substructure instead of node by node. JT-VAE [Jin et al., 2018a] and VJTNN [Jin et al., 2018b] first decompose the molecular graph into a junction tree based on substructures in the graph. Then they encode the tree using a neural network. Next, they reconstruct the junction tree and assemble the nodes in the tree back into the original molecular graph. HierVAE [Jin et al., 2020b] generates molecular graphs hierarchically based on motifs. MoLeR [Maziarz et al., 2022] keeps the scaffold structure during the generative procedure and generates molecules relying on motifs. GraphAF [Shi et al., 2020] utilizes a flow model to generate the molecular graph. MoleculeChef [Bradshaw et al., 2019] is designed to generate synthesizable molecules: it generates reactant molecules first and then utilizes the molecular transformer [Schwaller et al., 2019] model to generate the target molecule.

4.3 Reaction Prediction

Reaction prediction and retrosynthesis prediction are fundamental problems in organic chemistry. Reaction prediction uses reactants to predict reaction products; retrosynthesis prediction is the opposite process. When taking SMILES as input, reaction prediction is treated as a translation task. When taking molecular graphs as input, both reaction prediction and retrosynthesis prediction proceed in two steps. Like WLDN [Jin et al., 2017] and WLDN++ [Coley et al., 2019], the model first predicts the reaction center and then predicts which of the potential products is the major product. Different from previous work, MolR [Wang et al., 2022] formulates reaction prediction as a ranking problem: all the products in the test set are put into a candidate pool, and MolR ranks these candidates based on the embedding learned from the given reactant set.

4.4 Drug-drug Interactions

Detecting drug-drug interactions (DDI) is an important task that can help clinicians make effective decisions and schedule appropriate therapy programs. Accurate DDI prediction can not only support medication recommendation but also effectively identify potential adverse effects, which is critical for patients and society. AttSemiGAE [Ma et al., 2018] performs DDI prediction by measuring drug similarity with multiple types of drug features. SafeDrug [Yang et al., 2021a] designs global and local modules to fully encode the connectivity and functionality of drug molecules for DDI prediction. Both KGNN [Lin et al., 2020] and MDNN [Lyu et al., 2021] build drug knowledge graphs to improve the accuracy of DDI prediction.

5 Datasets and Benchmarks

We summarize representative molecular representation learning algorithms in Table 2. To conveniently access the empirical results, each paper is attached with a code link if available. The corresponding tasks, encoding algorithms, methods, and training methods are also listed. Here, methods specify the four methods we discussed in Section 3. For training methods, we include self-supervised learning, supervised learning,
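MolR's ranking formulation in Section 4.3 can be illustrated with a toy sketch: under its training objective, the summed reactant embeddings should land near the true product embedding, so candidates are ranked by distance to that sum. The fixed vectors below are made up for illustration; real embeddings would come from a trained GNN encoder:

```python
import numpy as np

def rank_products(reactant_embs, candidate_embs):
    """Rank candidate products for a reactant set, MolR-style.

    The query is the sum of the reactant embeddings; candidates are
    ordered by Euclidean distance to it, closest (most likely) first.
    """
    query = np.sum(reactant_embs, axis=0)
    dists = np.linalg.norm(np.asarray(candidate_embs) - query, axis=1)
    return np.argsort(dists)

# Two toy reactants whose embeddings sum to [1, 1]; candidate 1 matches it.
reactants = np.array([[1.0, 0.0], [0.0, 1.0]])
candidates = np.array([[5.0, 5.0], [1.0, 1.0], [0.0, 0.0]])
ranking = rank_products(reactants, candidates)  # best candidate first
```

Because the scoring is a simple nearest-neighbor search in embedding space, the candidate pool can be precomputed once and queried cheaply for every reactant set.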
Table 3: Datasets for molecular representation learning research.

Dataset Category #Train #Dev #Test Reference Data Link


ZINC15 Structure Pretraining / / / [Sterling and Irwin, 2015] https://zinc15.docking.org
PubChem Structure Pretraining / / / [Kim et al., 2019] https://pubchem.ncbi.nlm.nih.gov
Table 3: Commonly used datasets for different chemical tasks.

Dataset | Task | #Train | #Valid | #Test | Reference | URL
ChEMBL | Structure pretraining | / | / | / | [Gaulton et al., 2017] | https://www.ebi.ac.uk/chembl/
QM9 | Property prediction | 107,108 | 13,388 | 13,388 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
ESOL | Property prediction | 902 | 112 | 112 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
FreeSolv | Property prediction | 513 | 64 | 64 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
Lipophilicity | Property prediction | 3,360 | 420 | 420 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
MUV | Property prediction | 74,470 | 9,308 | 9,308 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
HIV | Property prediction | 32,901 | 4,112 | 4,112 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
PDBbind | Property prediction | 9,526 | 1,190 | 1,190 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
BACE | Property prediction | 1,210 | 151 | 151 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
BBBP | Property prediction | 1,631 | 203 | 203 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
Tox21 | Property prediction | 6,264 | 783 | 783 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
ToxCast | Property prediction | 6,860 | 857 | 857 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
SIDER | Property prediction | 1,141 | 142 | 142 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
ClinTox | Property prediction | 1,182 | 147 | 147 | [Wu et al., 2018] | https://moleculenet.org/datasets-1
USPTO-MIT | Reaction prediction | 400,000 | 40,000 | 40,000 | [Jin et al., 2017] | https://github.com/wengong-jin/nips17-rexgen
USPTO-15K | Reaction prediction | 10,500 | 1,500 | 3,000 | [Coley et al., 2017] | https://github.com/connorcoley/ochem_predict_nn
USPTO-full | Reaction prediction | 760,000 | 95,000 | 95,000 | [Lowe, 2012] | https://github.com/dan2097/patent-reaction-extraction
ZINC-250k | Molecular generation | 200,000 | 25,000 | 25,000 | [Kusner et al., 2017] | https://github.com/mkusner/grammarVAE
DrugBank | Drug-drug interaction | 489,910 | 61,238 | 61,238 | [Lin et al., 2020] | https://github.com/xzenglab/KGNN
KEGG-drug | Drug-drug interaction | 45,586 | 5,698 | 5,698 | [Lin et al., 2020] | https://github.com/xzenglab/KGNN/tree/master
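Most property-prediction rows above follow the common 80/10/10 train/valid/test convention (e.g., QM9: 107,108 / 13,388 / 13,388). The sketch below is our own minimal illustration of producing such a split in plain Python; the function name and fixed seed are hypothetical, not a procedure taken from any of the cited benchmarks, and note that MoleculeNet tasks are often evaluated with scaffold splits rather than purely random ones:

```python
import random

def split_80_10_10(items, seed=0):
    """Shuffle deterministically, then partition into 80% train, 10% valid, 10% test."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    items = list(items)
    rng.shuffle(items)
    n = len(items)
    n_valid = n // 10
    n_test = n // 10
    n_train = n - n_valid - n_test  # remainder goes to train
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]
    return train, valid, test

# Toy example with 1,000 "molecules"
train, valid, test = split_80_10_10(range(1000))
print(len(train), len(valid), len(test))  # 800 100 100
```

Because the three slices partition the shuffled list, every item lands in exactly one subset, which is the property a leakage-free evaluation split needs.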

pre-training, and contrastive learning. Besides the algorithms, we also summarize commonly used datasets for different chemical tasks in Table 3.

6 Future Directions

Graph-based methods for MRL are developing rapidly. Although MRL has achieved satisfactory results in various applications, several challenges remain to be solved. We list a few promising future directions for reference.

6.1 Graph-based MRL with Spatial Learning

3D geometric information has recently attracted great attention in graph-based MRL. There are several ways to encode 3D information. One is equivariant graph neural networks, such as SE(3)-Transformers [Fuchs et al., 2020]. Another category of methods takes relative 3D information as input, like the directional message passing methods [Klicpera et al., 2019; Klicpera et al., 2021a] introduced in Section 3, which use interatomic distances and the angles between bonds as features to learn geometric information. In addition, SphereNet [Liu et al., 2022b] proposes spherical message passing to learn 3D molecular representations. However, how different geometries contribute to molecular representation learning still lacks rigorous justification, and no standard method for learning spatial information has been established yet. This remains a promising research direction for MRL.
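Concretely, the distance and angle features that directional message-passing methods consume can be computed directly from 3D atomic coordinates. The following is our own minimal stdlib sketch (the function names are hypothetical, not code from any cited method):

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points (e.g., atom positions)."""
    return math.dist(a, b)

def bond_angle(a, b, c):
    """Angle at atom b (in radians) formed by the bonds b->a and b->c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, cos)))

# Toy example: three atoms with a right angle at the middle atom
a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
print(round(distance(a, c), 3), round(math.degrees(bond_angle(a, b, c)), 1))  # 1.414 90.0
```

Real models compute these quantities for all bonded pairs and triples and feed them to the message function; because distances and angles are invariant to rotation and translation, the learned representation inherits that invariance for free.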
6.2 Graph-based MRL with Explainability

Model explainability is a persistent challenge and a vital task, especially for MRL. To bridge the gap between machine learning and chemical science, a well-designed MRL model that produces competitive prediction or generation results on chemical tasks is important, but it is not the end of MRL research. Which molecular features play the most important part in MRL? Why do these machine learning models work for chemical tasks? How do these MRL models learn? Answering these questions would do even more to break down the boundaries between the two fields. AttSemiGAE [Ma et al., 2018], E2E [Gao et al., 2018], and GCNN [Henderson et al., 2021] all propose strategies to improve their models' explainability and thereby provide more guidance for researchers. To this end, explainable MRL is also a potential future direction.

6.3 Graph-based MRL with Insufficient Data

A shortage of training molecules is a common problem in the chemistry field. 3D molecular structures can provide additional geometric information that contributes substantially to MRL, but determining 3D structures experimentally is challenging and costly, so the available 3D molecular graphs are insufficient for model training. Molecular conformation generation [Xu et al., 2021] is one solution to this problem, although it has not yet been widely researched. Besides, Guo et al. [Guo et al., 2021] and Wang et al. [Wang et al., 2021] propose meta-learning algorithms to handle few-shot molecule problems, which have inspired follow-up work. Algorithms that cope with insufficient data should be another important research direction.

7 Conclusion

Molecular representation learning builds a strong and vital connection between machine learning and chemical science. In this work, we introduce the problem of graph-based molecular representation learning and provide a comprehensive overview of recent progress on this research topic. To facilitate reproducible research, we take the first step of releasing representative molecular representation learning benchmarks and commonly used datasets for the research community. Finally, we share our thoughts on future directions for this topic.

References
[Bradshaw et al., 2019] John Bradshaw, Brooks Paige, Matt J Kusner, Marwin Segler, and José Miguel Hernández-Lobato. A model to search for synthesizable molecules. NeurIPS, 2019.
[Brockschmidt, 2020] Marc Brockschmidt. GNN-FiLM: Graph neural networks with feature-wise linear modulation. In ICML, 2020.
[Coley et al., 2017] Connor W Coley, Regina Barzilay, Tommi S Jaakkola, William H Green, and Klavs F Jensen. Prediction of organic reaction outcomes using machine learning. ACS Central Science, 2017.
[Coley et al., 2019] Connor W Coley, Wengong Jin, Luke Rogers, Timothy F Jamison, Tommi S Jaakkola, William H Green, Regina Barzilay, and Klavs F Jensen. A graph-convolutional neural network model for the prediction of chemical reactivity. Chemical Science, 2019.
[Do et al., 2019] Kien Do, Truyen Tran, and Svetha Venkatesh. Graph transformation policy network for chemical reaction prediction. In KDD, 2019.
[Fang et al., 2022] Yin Fang, Qiang Zhang, Haihong Yang, Xiang Zhuang, Shumin Deng, Wen Zhang, Ming Qin, Zhuo Chen, Xiaohui Fan, and Huajun Chen. Molecular contrastive learning with chemical element knowledge graph. In AAAI, 2022.
[Fuchs et al., 2020] Fabian Fuchs, Daniel Worrall, Volker Fischer, and Max Welling. SE(3)-Transformers: 3D roto-translation equivariant attention networks. NeurIPS, 2020.
[Gao et al., 2018] Kyle Yingkai Gao, Achille Fokoue, Heng Luo, Arun Iyengar, Sanjoy Dey, and Ping Zhang. Interpretable drug target prediction using deep neural representation. In IJCAI, 2018.
[Gaulton et al., 2017] Anna Gaulton, Anne Hersey, Michał Nowotka, A Patricia Bento, Jon Chambers, David Mendez, Prudence Mutowo, Francis Atkinson, Louisa J Bellis, Elena Cibrián-Uhalte, et al. The ChEMBL database in 2017. Nucleic Acids Research, 2017.
[Gilmer et al., 2017] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In ICML, 2017.
[Guo et al., 2020] Zhichun Guo, Wenhao Yu, Chuxu Zhang, Meng Jiang, and Nitesh V Chawla. GraSeq: graph and sequence fusion learning for molecular property prediction. In CIKM, 2020.
[Guo et al., 2021] Zhichun Guo, Chuxu Zhang, Wenhao Yu, John Herr, Olaf Wiest, Meng Jiang, and Nitesh V Chawla. Few-shot graph learning for molecular property prediction. In The Web Conference (WWW), 2021.
[Hamilton et al., 2017] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. NeurIPS, 2017.
[Henderson et al., 2021] Ryan Henderson, Djork-Arné Clevert, and Floriane Montanari. Improving molecular graph neural network explainability with orthonormalization and induced sparsity. In ICML, 2021.
[Hu* et al., 2020] Weihua Hu*, Bowen Liu*, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, and Jure Leskovec. Strategies for pre-training graph neural networks. In ICLR, 2020.
[Jin et al., 2017] Wengong Jin, Connor Coley, Regina Barzilay, and Tommi Jaakkola. Predicting organic reaction outcomes with Weisfeiler-Lehman network. NeurIPS, 2017.
[Jin et al., 2018a] Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Junction tree variational autoencoder for molecular graph generation. In ICML, 2018.
[Jin et al., 2018b] Wengong Jin, Kevin Yang, Regina Barzilay, and Tommi Jaakkola. Learning multimodal graph-to-graph translation for molecule optimization. In ICLR, 2018.
[Jin et al., 2020a] Wei Jin, Tyler Derr, Haochen Liu, Yiqi Wang, Suhang Wang, Zitao Liu, and Jiliang Tang. Self-supervised learning on graphs: Deep insights and new direction. arXiv preprint arXiv:2006.10141, 2020.
[Jin et al., 2020b] Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Hierarchical generation of molecular graphs using structural motifs. In ICML, 2020.
[Jin et al., 2020c] Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Multi-objective molecule generation using interpretable substructures. In ICML, 2020.
[Kim et al., 2019] Sunghwan Kim, Jie Chen, Tiejun Cheng, Asta Gindulyte, Jia He, Siqian He, Qingliang Li, Benjamin A Shoemaker, Paul A Thiessen, Bo Yu, et al. PubChem 2019 update: improved access to chemical data. Nucleic Acids Research, 2019.
[Kipf and Welling, 2017] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
[Klicpera et al., 2019] Johannes Klicpera, Janek Groß, and Stephan Günnemann. Directional message passing for molecular graphs. In ICLR, 2019.
[Klicpera et al., 2021a] Johannes Klicpera, Florian Becker, and Stephan Günnemann. GemNet: Universal directional graph neural networks for molecules. NeurIPS, 2021.
[Klicpera et al., 2021b] Johannes Klicpera, Chandan Yeshwanth, and Stephan Günnemann. Directional message passing on molecular graphs via synthetic coordinates. NeurIPS, 2021.
[Krenn et al., 2020] Mario Krenn, Florian Häse, AkshatKumar Nigam, Pascal Friederich, and Alan Aspuru-Guzik. Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation. Machine Learning: Science and Technology, 2020.
[Kusner et al., 2017] Matt J Kusner, Brooks Paige, and José Miguel Hernández-Lobato. Grammar variational autoencoder. In ICML, 2017.
[Landrum, 2020] G. A. Landrum. RDKit: Open-source cheminformatics software. http://www.rdkit.org, 2020.
[Li et al., 2016] Yujia Li, Richard Zemel, Marc Brockschmidt, and Daniel Tarlow. Gated graph sequence neural networks. In ICLR, 2016.
[Li et al., 2022] Shuangli Li, Jingbo Zhou, Tong Xu, Dejing Dou, and Hui Xiong. GeomGCL: Geometric graph contrastive learning for molecular property prediction. In AAAI, 2022.
[Lin et al., 2020] Xuan Lin, Zhe Quan, Zhi-Jie Wang, Tengfei Ma, and Xiangxiang Zeng. KGNN: Knowledge graph neural network for drug-drug interaction prediction. In IJCAI, 2020.
[Liu et al., 2022a] Shengchao Liu, Hanchen Wang, Weiyang Liu, Joan Lasenby, Hongyu Guo, and Jian Tang. Pre-training molecular graph representation with 3D geometry. In ICLR, 2022.
[Liu et al., 2022b] Yi Liu, Limei Wang, Meng Liu, Yuchao Lin, Xuan Zhang, Bora Oztekin, and Shuiwang Ji. Spherical message passing for 3D molecular graphs. In ICLR, 2022.
[Lowe, 2012] Daniel Mark Lowe. Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge, 2012.
[Lyu et al., 2021] Tengfei Lyu, Jianliang Gao, Ling Tian, Zhao Li, Peng Zhang, and Ji Zhang. MDNN: A multimodal deep neural network for predicting drug-drug interaction events. In IJCAI, 2021.
[Ma et al., 2018] Tengfei Ma, Cao Xiao, Jiayu Zhou, and Fei Wang. Drug similarity integration through attentive multi-view graph auto-encoders. In IJCAI, 2018.
[Maziarz et al., 2022] Krzysztof Maziarz, Henry Richard Jackson-Flux, Pashmina Cameron, Finton Sirockin, Nadine Schneider, Nikolaus Stiefl, Marwin Segler, and Marc Brockschmidt. Learning to extend molecular scaffolds with structural motifs. In ICLR, 2022.
[Rong et al., 2020] Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, and Junzhou Huang. Self-supervised graph transformer on large-scale molecular data. NeurIPS, 2020.
[Schwaller et al., 2019] Philippe Schwaller, Teodoro Laino, Théophile Gaudin, Peter Bolgar, Christopher A Hunter, Costas Bekas, and Alpha A Lee. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Central Science, 2019.
[Shi et al., 2020] Chence Shi, Minkai Xu, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, and Jian Tang. GraphAF: a flow-based autoregressive model for molecular graph generation. In ICLR, 2020.
[Shui and Karypis, 2020] Zeren Shui and George Karypis. Heterogeneous molecular graph neural networks for predicting molecule properties. In ICDM, 2020.
[Sterling and Irwin, 2015] Teague Sterling and John J Irwin. ZINC 15 – ligand discovery for everyone. Journal of Chemical Information and Modeling, 2015.
[Sun et al., 2020] Fan-Yun Sun, Jordan Hoffman, Vikas Verma, and Jian Tang. InfoGraph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. In ICLR, 2020.
[Sun et al., 2021] Mengying Sun, Jing Xing, Huijun Wang, Bin Chen, and Jiayu Zhou. MoCL: Contrastive learning on molecular graphs with multi-level domain knowledge. arXiv preprint arXiv:2106.04509, 2021.
[Tang et al., 2020] Bowen Tang, Skyler T Kramer, Meijuan Fang, Yingkun Qiu, Zhen Wu, and Dong Xu. A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility. Journal of Cheminformatics, 2020.
[Veličković et al., 2018] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. In ICLR, 2018.
[Wang et al., 2021] Yaqing Wang, Abulikemu Abuduweili, Quanming Yao, and Dejing Dou. Property-aware relation networks for few-shot molecular property prediction. NeurIPS, 2021.
[Wang et al., 2022] Hongwei Wang, Weijiang Li, Xiaomeng Jin, Kyunghyun Cho, Heng Ji, Jiawei Han, and Martin Burke. Chemical-reaction-aware molecule representation learning. In ICLR, 2022.
[Weininger et al., 1989] David Weininger, Arthur Weininger, and Joseph L Weininger. SMILES. 2. Algorithm for generation of unique SMILES notation. 1989.
[Wu et al., 2018] Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 2018.
[Wu et al., 2022] Yulun Wu, Nicholas Choma, Andrew Deru Chen, Mikaela Cashman, Erica Teixeira Prates, Veronica G Melesse Vergara, Manesh B Shah, Austin Clyde, Thomas Brettin, Wibe Albert de Jong, Neeraj Kumar, Martha S Head, Rick L. Stevens, Peter Nugent, Daniel A Jacobson, and James B Brown. Spatial graph attention and curiosity-driven policy for antiviral drug discovery. In ICLR, 2022.
[Xu et al., 2019] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In ICLR, 2019.
[Xu et al., 2021] Minkai Xu, Wujie Wang, Shitong Luo, Chence Shi, Yoshua Bengio, Rafael Gomez-Bombarelli, and Jian Tang. An end-to-end framework for molecular conformation generation via bilevel programming. In ICML, 2021.
[Yang et al., 2021a] Chaoqi Yang, Cao Xiao, Fenglong Ma, Lucas Glass, and Jimeng Sun. SafeDrug: Dual molecular graph encoders for recommending effective and safe drug combinations. In IJCAI, 2021.
[Yang et al., 2021b] Shuwen Yang, Ziyao Li, Guojie Song, and Lingsheng Cai. Deep molecular representation learning via fusing physical and chemical information. NeurIPS, 2021.
[You et al., 2020] Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen. Graph contrastive learning with augmentations. NeurIPS, 2020.
[Zhang et al., 2021] Zaixi Zhang, Qi Liu, Hao Wang, Chengqiang Lu, and Chee-Kong Lee. Motif-based graph self-supervised learning for molecular property prediction. NeurIPS, 2021.
