Local Graph-motif Features Improve Gene Interaction Network Prediction
{victor.leon, jordan.matelsky}@jhuapl.edu
ABSTRACT
Gene interaction networks specify how genes interact to produce an organism’s phenotype. These networks
are often incomplete due to absent or unobserved information. Predicting these missing links is critical for many
applications, including genome-wide association studies and phenotype prediction. Previous efforts have applied
graph neural networks (GNNs) to this missing-link prediction problem, but these techniques, too, have limitations
when network sparsity is very high. Here, we apply a novel feature engineering technique that uses
local graph motif incidence to enhance the feature set for variational graph autoencoders (VGAE). We compare
the performance of our technique against state-of-the-art approaches, and then progressively hide more and more
of the original graph edges. Our results show that VGAEs with our local-area motif prevalence (LAMP) features
outperform state-of-the-art node embeddings for a wide range of missing edges on both a benchmark and a
biological dataset. We also observe that this combined VGAE and LAMP technique has the potential to facilitate
the search for novel genetic interactions in an experimental adaptive sampling context with far fewer samples than competing techniques.
Improvements to gene interaction imputation can lower the barrier to new pharmaceutical and epidemiological
discoveries by revealing hidden gene interactions that steer the development of potential drug targets.
Introduction

Gene interaction networks are graphical representations of how an organism's genome interacts to produce its traits, or phenotypes1. The networks are generally represented as a graph G = ⟨V, E, A⟩ where V is a set of vertices each representing a gene, (u, v) ∈ E is a set of edges where u, v ∈ V represent an interaction between u and v, and A is an optional set of attributes (such as gene names or molecular metadata) that can be assigned to either vertices or edges. In rare cases, the gene interaction network of a species is well-studied and well-known. More commonly, however, gene interactions are sparsely characterized, and thus the network suffers from many unknown edges2.

Genome-wide association studies (GWAS) and phenotype prediction are two of the many applications in the biology domain which require complete gene interaction networks. These techniques are now commonly applied to crop evaluation, epidemiology, and pharmacology3, 4. Many hopeful studies, however, are stymied by datasets where many or most of the edges denoting gene interactions are unknown. Thus, edge prediction in gene interaction networks is a timely and important capability.

Recent work has demonstrated the promise of machine learning methods in addressing the challenge of edge prediction in gene interaction networks. The most common methods use a variety of neural network architectures, such as deep variational autoencoders5, 6. Despite ongoing progress in gene interaction prediction, many techniques still fail to consider the rich, underlying graph structure of the data7. More recently, methods based on graph neural network (GNN) architectures have been proposed for gene expression imputation. GNNs perform feature extraction by exploring the local neighborhood of a node in the gene interaction network in order to produce features for each vertex7, 8. Specifically, variational graph autoencoders (VGAE) have been shown to help address this imputation task, likely due to the improved graph context of these models2, 8, 9. Using higher-order graph features and vertex or edge attributes is a promising approach to improve the performance of these models.

In this work we propose a motif-based feature extraction technique that uses the set of subgraph instances to which a vertex belongs to produce a unique embedding.
Figure 1. A visual overview of the LAMP algorithm. A. An example of three motifs used for LAMP feature
generation. These motifs were hand-selected to have minimal automorphisms. B. An example host network. This
network is undirected, though LAMP works on directed graphs as well as multigraphs. C. For a given vertex in the
host graph (black, left), local motif participation is enumerated. Because this vertex participates as the “antenna” of
the motif M2 in eight different ways, the first element of the feature vector (cyan square) is 8. These per-motif
feature vectors are concatenated and fed into the downstream graph neural network. D. A schematic of the VGAE
architecture, illustrating a graph enriched with LAMP features X prior to encoding in the encoder q.
In our technique, we enumerate instances of selected subgraph motifs within the interaction graph and use these explicit subgraph counts to enhance the gene expression network node feature set (Fig. 1). Using the DotMotif network-motif search algorithm, we are able to efficiently and precisely count the number of times a given motif appears in the local neighborhood of each vertex in the graph, in a highly parallelizable manner10. We evaluate our technique's performance on a VGAE11 and find that it performs at or above the state of the art in a link prediction task. We progressively remove edges from the partial starting graph and observe that our technique outperforms the state of the art for edge imputation across a wide range of missing edges on both a benchmark (FB15k-237)12, 13 and a biological dataset of a eukaryotic cell1. This scalable approach is applicable to graphs of various sizes, as the motif-finding process is highly parallelizable and can be sparsely subsampled to improve performance, at the cost of feature accuracy. This work has the potential to facilitate the search for novel genetic interactions in an experimental adaptive sampling context with far fewer samples than competing techniques. Furthermore, our approach is applicable to a wide range of graph-based machine learning problems outside the field of gene interaction, including social network analysis and chemical compound prediction.

Background

Our proposed approach combines knowledge of gene interaction networks, variational graph autoencoders, and graph motif search. In this section we introduce these topics and comment on strategies for marrying these technologies to improve gene interaction imputation and edge imputation more broadly.

Gene interaction networks

Gene interaction networks are formalized as a graph G = ⟨V, E, A⟩, where nodes / vertices V represent genes, and edges (u ∈ V, v ∈ V) ∈ E represent (undirected) interactions between genes1. These edges can represent interactions of various types, including physical interactions, genetic interactions, and regulatory interactions.
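As a minimal sketch, such a graph can be constructed directly in NetworkX30; the gene names and interaction types below are illustrative placeholders, not entries from any real dataset.

```python
import networkx as nx

G = nx.Graph()  # undirected graph G = <V, E, A>
G.add_nodes_from(["geneA", "geneB", "geneC"])          # V: genes
G.add_edge("geneA", "geneB", interaction="physical")   # E, with attributes A
G.add_edge("geneB", "geneC", interaction="regulatory")

print(G.number_of_nodes(), G.number_of_edges())  # 3 2
```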
Different types of interactions can be represented by adding edge or vertex attributes A to the graph and the library of motifs, though we do not address these in this work.

Gene interaction graphs are generated through a variety of experimental techniques, including yeast two-hybrid assays, synthetic genetic array analysis, and other analyses14, 15. Due to the cost and complexity of these experiments, gene interaction networks are often sparsely characterized, and thus contain many unknown-unknown edges. Because these graphs are binary and undirected, it is difficult to draw a clear line between the case of an unexplored or unobserved edge and the case of a non-interacting pair of genes. For this reason, gene interaction graph edge imputation is a challenging problem in lesser-studied organisms and in the case of large-scale experiments.

Variational Graph Autoencoders

Variational graph autoencoders (VGAEs) are a class of graph neural network (GNN) that use a variational autoencoder (VAE) architecture to learn a low-dimensional representation of a graph11, and they have been proposed for a variety of graph-based machine learning tasks, including link prediction, node classification, and graph classification7. The basic architecture consists of an encoder that maps the graph into a low-dimensional latent space, and a decoder that reconstructs the graph from the latent space. The intention is that this low-dimensional representation will capture the essential structure of the graph, and that the decoder will be able to reconstruct the graph from this low-dimensional representation even when some information has been lost.
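To make this concrete, the following minimal sketch wires a two-layer GCN encoder into a VGAE for link prediction using PyTorch Geometric29. The layer widths and latent dimensionality are illustrative assumptions, not the exact hyperparameters used in our experiments.

```python
import torch
from torch_geometric.nn import GCNConv, VGAE

class GCNEncoder(torch.nn.Module):
    """Two-layer GCN encoder producing the mean and log-std of q(Z | M, X)
    (cf. Fig. 1D). Layer sizes here are illustrative choices."""
    def __init__(self, in_channels: int, latent_dim: int = 16):
        super().__init__()
        self.conv1 = GCNConv(in_channels, 2 * latent_dim)
        self.conv_mu = GCNConv(2 * latent_dim, latent_dim)
        self.conv_logstd = GCNConv(2 * latent_dim, latent_dim)

    def forward(self, x, edge_index):
        h = self.conv1(x, edge_index).relu()
        return self.conv_mu(h, edge_index), self.conv_logstd(h, edge_index)

model = VGAE(GCNEncoder(in_channels=13))  # e.g., one channel per LAMP feature
# One training step (sketch):
#   z = model.encode(x, train_edge_index)
#   loss = model.recon_loss(z, train_edge_index) + model.kl_loss() / num_nodes
```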
Motifs and subgraph search

Like many other networks, such as social graphs or knowledge graphs, gene interaction networks are known to contain recurring, self-similar subgraphs10, 16. The subgraphs that occur with statistically significant frequency, known as motifs, are thought to be responsible for many of the network's emergent or repeating properties16, and it is thus of interest to identify those motifs in gene interaction networks, where they may correspond to biologically interpretable or therapeutically useful mechanisms. These graph motifs are not to be confused with genetic motifs, which are short repeating DNA or RNA sequences that are known to have a biological function17.

Motif enumeration can also be used to reduce the dimensionality of a graph, by counting the number of times a given motif appears in the local neighborhood of each vertex in the graph. In GNNs, such an embedding can be performed as a preprocessing transformation18–20, or as a derived, learnable property itself21.

DotMotif is a graph motif search algorithm that uses an induced monomorphism search algorithm to find motifs in a larger graph10. The algorithm is highly parallelizable and can be used to find motifs in large graphs, such as gene interaction networks, in a reasonable amount of time, especially if constraints are placed on the size or attributes of the motifs to be found.

Using the DotMotif algorithm, we can efficiently and precisely count the number of times a given motif appears in the local neighborhood of each vertex in the graph. We will use this information to enhance the feature set for our VGAE and compare the performance of our technique against node2vec and the Jaccard index, progressively hiding more and more of the original graph edges.
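For example, a motif is declared in DotMotif's small domain-specific language (one edge per line) and searched against a host graph. The triangle motif and random host below are illustrative placeholders; Motif, GrandIsoExecutor, and find follow the library's documented interface.

```python
import networkx as nx
from dotmotif import Motif, GrandIsoExecutor

host_graph = nx.fast_gnp_random_graph(100, 0.05, directed=True)  # placeholder

# Each line of the DSL declares one edge of the motif:
triangle = Motif("""
A -> B
B -> C
C -> A
""")

executor = GrandIsoExecutor(graph=host_graph)
results = executor.find(triangle)  # list of {motif vertex: host vertex} maps
print(len(results))
```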
Methods

Our technique comprises a novel feature engineering approach, and we show a practical use of this approach for missing-link prediction in a gene interaction network. We first describe the underlying algorithm for producing local-area motif participation features, and then describe the training and validation steps we used to compare this approach against current industry best practices.
Local-area Motif Participation

We want to generate a set of features that describe the local graph neighborhood of a vertex in a concise but maximally unique vector "fingerprint." It is well known that subgraph monomorphisms well describe the local properties of a graph10, and so we propose that a valuable feature vector to uniquely identify a vertex can be described by how that vertex participates in a small library of subgraphs. We call this feature set the local-area motif participation (LAMP) of a vertex (Fig. 1), which can be used to uniquely embed vertices of a target graph.

The first step of LAMP feature generation is the selection of a motif library M, an unordered set of subgraphs (Fig. 1A) that will be counted in the host network. In general, this set should be curated to be a small set of graphs for which no motif is a subgraph of any other motif (or at least such cases are minimized, as they result in a higher runtime cost without adding new information), and for which automorphism symmetries are as few as possible (as automorphisms will result in axes of the LAMP feature vector that are perfectly linearly correlated and thus yield no additional information). In general, the selected motifs should be simple (i.e., fewer edges per motif will result in faster runtime): the runtime of subgraph monomorphism search scales exponentially with the number of edges in a motif. For this work, we arbitrarily selected a cycle, a cycle with a connected edge to reduce automorphic symmetry, and a combination of adjoining cycles (M1, M2, and M3 respectively in Fig. 1A), although similar performance characteristics were found anecdotally using other combinations.
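A sketch of one such motif library in NetworkX follows; since only the general shapes of M1, M2, and M3 are described here, these particular graphs are illustrative stand-ins rather than the exact motifs of Fig. 1A.

```python
import networkx as nx

# Illustrative stand-ins for the hand-selected motifs of Fig. 1A.
M1 = nx.cycle_graph(4)                    # a cycle
M2 = nx.cycle_graph(4)
M2.add_edge(0, "antenna")                 # cycle with a pendant "antenna" edge
M3 = nx.Graph([(0, 1), (1, 2), (2, 0),    # two adjoining cycles
               (2, 3), (3, 4), (4, 2)])   # sharing vertex 2
motif_library = [M1, M2, M3]
```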
Algorithm 2: Pseudocode for the LAMP feature extraction algorithm. The for-loops can be parallelized across vertices.

Require: Graph G; a motif library M, a small set of subgraphs m ∈ M
1: Initialize empty feature matrix X
2: for each vertex v in G do
3:   Initialize empty motif count vector C
4:   for each motif m in M do
5:     for each vertex u in m do
6:       Count occurrences of motif m in which u ≅ v
7:       Append the count to C
8:     end for
9:   end for
10:  Append C to X
11: end for

Our proposed algorithm is described in Algorithm 2. The algorithm starts with a host graph G and a motif library M. For each vertex v ∈ V_G and each vertex u ∈ V_{m_i} in a motif m_i ∈ M, we count the occurrences of the motif m_i in which vertex v is mapped by the monomorphism function to vertex u. The process continues for each vertex in the graph. Note that this node-level motif search can be performed in parallel for any size graph, meaning that while the subgraph monomorphism algorithm is NP-hard, it can be run on a sufficiently small subset of the host graph that captures only the local graph area surrounding a vertex, which can in general reduce performance bottlenecks.
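The following compact, unparallelized sketch implements this procedure with GrandIso's find_motifs entry point, via the global search-and-postprocess variant discussed below, and reuses the illustrative motif_library above.

```python
from grandiso import find_motifs

def lamp_features(host, motif_library):
    """Sketch of Algorithm 2 via a global motif search: one feature axis
    per (motif, motif-vertex) role, counting how often each host vertex
    fills that role across all monomorphisms."""
    features = {v: [] for v in host.nodes}
    for motif in motif_library:
        # Each mapping is a dict {motif vertex: host vertex}.
        mappings = find_motifs(motif, host)
        for u in motif.nodes:
            role_counts = {v: 0 for v in host.nodes}
            for mapping in mappings:
                role_counts[mapping[u]] += 1
            for v in host.nodes:
                features[v].append(role_counts[v])
    return features  # vertex -> LAMP feature vector (list of role counts)
```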
In this work, LAMP features were computed by searching for subgraph monomorphism instances of the library M in the host (dataset) graphs using the GrandIso/DotMotif software suite10. This monomorphism search can be carried out at the vertex level (i.e., searching only in the local neighborhood of each vertex, as described above), and was thus parallelized across vertices. Alternatively, an exhaustive motif search can be run globally on the entire host network and then post-processed; the latter technique is slightly faster if hardware can support storing the complete search in memory at once. The size- and time-complexity characteristics of the GrandIso algorithm are described in Matelsky et al., 202110.

Datasets

LAMP, node2vec, and the Jaccard index are compared on the benchmark FB15k-237 dataset, which has 14,541 nodes and 272,115 edges12, 13. Then, we measure the models' performance on a biologically relevant, comprehensive genetic interaction network of a eukaryotic cell with 5,362 genes (nodes) and 25,405 interactions (edges)1. Herein, we study undirected graphs. We follow the data cleaning procedure described in the Supplementary Materials of Costanzo et al.1, with a Pearson correlation coefficient >0.2 indicating a genetic interaction.

Training and evaluation

In this work, we compare the performance of a VGAE trained with LAMP features (LAMP) against a VGAE trained with node2vec features (node2vec)22 and the Jaccard index22, 23. VGAE and node2vec are used since they perform well compared to other state-of-the-art models2, 24–26. The Jaccard index is included as it is a common heuristic used in link prediction tasks2, 23. The number of edges hidden from the feature-generating algorithms is varied by randomly removing various amounts of the networks' edges, with the goal of studying the performance of the various models over a wide range of graph densities (Fig. 2). After generating features, the set of edges not used during feature generation is split into an 80% train and 20% test split. Performance is evaluated with 3 random train-test splits.

Performance is measured using area under the receiver operating characteristic curve (AUC) and average precision (AP). Both datasets are quite sparse, with few edges between nodes. In the gene interaction network, this can represent either the absence of a measurement or the confirmed absence of an interaction between genes. Since we are interested in predicting genetic interactions (positive samples), AP is the most relevant metric in this data-imbalanced case2, 24, 25. AUC is also included for completeness and ease of comparison with previous literature on link prediction. We evaluate model AP and AUC on balanced datasets, which has been noted in recent literature to over-estimate model performance27. In a balanced dataset, the train and test sets have one non-edge for each edge.
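One way to implement this edge-hiding protocol is sketched below; the helper name and the uniform non-edge sampler are illustrative scaffolding rather than our exact pipeline. (For the Jaccard baseline, NetworkX's jaccard_coefficient can score candidate pairs directly on the feature-generation graph.)

```python
import random
import networkx as nx

def hide_edges(G, frac_missing, seed=0):
    """Hide a fraction of edges: the visible remainder is used for feature
    generation; hidden edges become positive train/test examples."""
    rng = random.Random(seed)
    edges = list(G.edges())
    rng.shuffle(edges)
    n_hidden = int(frac_missing * len(edges))
    hidden, visible = edges[:n_hidden], edges[n_hidden:]
    G_feat = nx.Graph()
    G_feat.add_nodes_from(G.nodes())   # keep isolated vertices
    G_feat.add_edges_from(visible)
    # Balanced evaluation: sample one non-edge per hidden edge.
    nodes = list(G.nodes())
    non_edges = set()
    while len(non_edges) < len(hidden):
        u, v = rng.sample(nodes, 2)
        if not G.has_edge(u, v):
            non_edges.add((u, v))
    return G_feat, hidden, list(non_edges)
```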
More concretely, 10% missing edges during feature generation means 90% of edges are used to generate features (i.e., Jaccard indices, node2vec features, and LAMP features) and the remaining 10% of edges are used to train and evaluate the models (e.g., the VGAE). This means that, for the FB15k-237 dataset with approximately 272k total edges, the 50% missing edges condition during feature generation has 272k samples available for train and test of the VGAE (136k edges and 136k non-edges), while the 99% missing edges condition has 538k samples available (269k edges and 269k non-edges). The number of edges available for training the VGAE model sets the lower bound on % missing edges for each dataset. For example, we do not evaluate the 10% missing edges condition for the genetic dataset because 1k edges is not enough to consistently train and evaluate a VGAE. We also study the effect of train and test set size on VGAE model performance by performing a study with a constant 90% edges missing for feature generation while varying the train-test set size.
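This bookkeeping can be checked directly:

```python
total_edges = 272_115  # FB15k-237
for frac_missing in (0.50, 0.99):
    hidden = round(frac_missing * total_edges)  # positive examples
    # One sampled non-edge per hidden edge gives the balanced sample count.
    print(f"{frac_missing:.0%} missing: {hidden:,} edges, "
          f"{2 * hidden:,} balanced train+test samples")
# 50% missing: 136,058 edges, 272,116 balanced samples (~272k)
# 99% missing: 269,394 edges, 538,788 balanced samples (~538k)
```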
All VGAE models are trained with the ADAM optimizer28. For the FB15k-237 dataset with node2vec, we observe the best performance with 200 epochs and a learning rate of η = 0.1 for all missing edge percentages.

Results

Model evaluation

The VGAE with LAMP features (LAMP) outperforms both the Jaccard index and the VGAE with node2vec features (node2vec) for a wide range of missing-edge fractions for feature generation and of training set sizes, on both the benchmark and biological datasets (Tables 1 and 2). The Jaccard index performs much worse than the other two methods across all proportions of edges missing and both datasets.

Counterintuitively, LAMP and node2vec performance is observed to increase with the % of edges missing for feature generation (e.g., from 10 to 99% missing edges for FB15k-237 and from 50 to 90% edges missing for the biological dataset). This is an artifact of the interaction between the amount of data available for feature generation and the training set size. To isolate the effect of training set size on model performance, we vary train set size in the biological dataset for a constant 90% edges missing in Figure 3. As train set size increases, LAMP model performance significantly increases, while node2vec performance stays relatively constant. LAMP features perform better than node2vec at larger train set sizes, implying that LAMP features are more informative than node2vec features. We hypothesize that LAMP's worse performance at smaller train set sizes is due to having relatively few and sparser features vs. node2vec (13 for LAMP vs. 128 for node2vec), making the LAMP features we generated using 3 motifs more difficult to learn. Further studies on the effect of motif choice during LAMP feature generation could help improve LAMP feature performance at smaller train set sizes.
Table 1. Performance of the three link prediction models on the benchmark FB15k-237 dataset. For a wide range of % edges missing for feature generation and training set sizes, LAMP has the highest AP and AUC. As training set size increases, the gain in LAMP performance outweighs the negative effect of a higher % of missing edges. At 10% missing edges, node2vec outperforms LAMP. We hypothesize that this is due to the smaller training set size and the relatively small motifs chosen (3 to 7 edges), which may be too small to capture informative larger motifs in the graph when fewer edges are missing. Each % edges missing is a unique subsample of edges from the overall graph. VGAE performance metrics are measured for 3 different train-test splits of graph edges, with results presented as mean ± standard deviation. The Jaccard index is deterministic for a given graph.
Table 2. Performance of the three link prediction models on the comprehensive genetic interaction network of a eukaryotic cell. We observe that LAMP outperforms the other two models for a wide range of % missing edges for feature generation and training set sizes (i.e., 50 and 90 percent missing). This demonstrates that choosing motifs can be more informative than relying on node2vec's random-walk-based featurization on a biologically relevant graph. At the higher end (99% edges missing), node2vec features outperform LAMP features. The LAMP method is likely sensitive to the motifs chosen (Figure 1). At 99% edges missing, only around 250 edges are available for LAMP feature generation; in this case, there may not be enough edges to form the motifs selected for LAMP. node2vec is set to its default 128-step random walk, which can still take advantage of the small number of remaining edges. Each % edges missing is a unique subsample of edges from the overall graph. VGAE performance metrics are measured for 3 different train-test splits of graph edges, with results presented as mean ± standard deviation. The Jaccard index is deterministic for a given graph.
LAMP performance also drops below node2vec in the 99% missing case on the biological graph. Here, both models show a reduction in performance relative to conditions with fewer missing edges, but LAMP has a larger performance drop than node2vec. We hypothesize that this bigger drop in performance for LAMP is due to the motifs selected having 3 to 7 edges (Figure 1): at 99% missing edges in the biological graph, we only expect around 250 edges to be available for LAMP feature generation.
Case study

The AUCs and APs reported in Table 2 and discussed thus far are useful for comparing predictive models, since they are average model performance statistics. Here, we demonstrate how LAMP can significantly increase interaction sampling efficiency in practice, using the 90% missing case of the biological genetic interaction network.

The activations output by LAMP for each edge in the balanced test set of 50% positive (4k samples, edge present) and 50% negative (4k samples, edge not present) are shown in Figure 5. In practice, a threshold activation is set by the user of the model to determine what the model predicts to be an edge or non-edge. For example, setting an activation threshold of 0.8 means that any edge with an activation above 0.8 is predicted to be present.

Figure 4. Precision-recall curve for LAMP with 90% missing edges in the biological dataset during feature generation.
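Mechanically, applying such a threshold and reading off the resulting precision and recall looks like the sketch below; the activations and labels here are synthetic placeholders rather than our model's outputs.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
activations = rng.uniform(size=8_000)         # placeholder scores
labels = np.repeat([1, 0], 4_000)             # 4k edges, 4k non-edges

predicted = (activations >= 0.8).astype(int)  # user-chosen threshold
print(precision_score(labels, predicted),
      recall_score(labels, predicted))
```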
References

4. Fang, G. et al. Discovering genetic interactions bridging pathways in genome-wide association studies. Nature Communications 10, 4274 (2019).
5. Chen, J. & Shi, X. Sparse convolutional denoising autoencoders for genotype imputation. Genes 10, 652 (2019).
6. An, U. et al. Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries. bioRxiv 2022–08 (2022).
7. Zhou, J. et al. Graph neural networks: A review of methods and applications. AI Open 1, 57–81, DOI: 10.1016/j.aiopen.2021.01.001 (2020).
8. Lazaros, K., Koumadorakis, D. E., Vlamos, P. & Vrahatis, A. G. Graph neural network approaches for single-cell data: a recent overview. Neural Computing and Applications 36, 9963–9987, DOI: 10.1007/s00521-024-09662-6 (2024).
9. Wang, J. et al. scGNN is a novel graph neural network framework for single-cell RNA-seq analyses. Nature Communications 12, 1882 (2021).
10. Matelsky, J. K. et al. DotMotif: an open-source tool for connectome subgraph isomorphism search and graph queries. Scientific Reports 11, DOI: 10.1038/s41598-021-91025-5 (2021).
11. Kipf, T. N. & Welling, M. Variational graph auto-encoders (2016). arXiv:1611.07308.
12. Toutanova, K. & Chen, D. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and Their Compositionality, 57–66, DOI: 10.18653/v1/W15-4007 (Association for Computational Linguistics, Beijing, China, 2015).
13. Schlichtkrull, M. et al. Modeling relational data with graph convolutional networks (2017). arXiv:1703.06103.
14. Brückner, A., Polge, C., Lentze, N., Auerbach, D. & Schlattner, U. Yeast two-hybrid, a powerful tool for systems biology. International Journal of Molecular Sciences 10, 2763–2788 (2009).
15. Bebek, G. Identifying gene interaction networks. Statistical Human Genetics: Methods and Protocols 483–494 (2012).
16. Milo, R. et al. Network motifs: Simple building blocks of complex networks. Science 298, 824–827, DOI: 10.1126/science.298.5594.824 (2002).
17. Hertz, G. Z. & Stormo, G. D. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics (Oxford, England) 15, 563–577 (1999).
18. Wang, B., Cheng, L., Sheng, J., Hou, Z. & Chang, Y. Graph convolutional networks fusing motif-structure information. Scientific Reports 12, 10735 (2022).
19. Chen, X. et al. Motif graph neural network. arXiv preprint arXiv:2112.14900 (2021).
20. Wang, G. et al. Motif-based graph attentional neural network for web service recommendation. Knowledge-Based Systems 269, 110512 (2023).
21. Besta, M. et al. Motif prediction with graph neural networks. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 35–45 (2022).
22. Grover, A. & Leskovec, J. node2vec: Scalable feature learning for networks. CoRR abs/1607.00653 (2016). arXiv:1607.00653.
23. Liben-Nowell, D. & Kleinberg, J. The link prediction problem for social networks. In Proceedings of the Twelfth International Conference on Information and Knowledge Management, CIKM '03, 556–559, DOI: 10.1145/956863.956972 (Association for Computing Machinery, New York, NY, USA, 2003).
24. Zhu, Z., Zhang, Z., Xhonneux, L.-P. & Tang, J. Neural Bellman-Ford networks: A general graph neural network framework for link prediction (2022). arXiv:2106.06935.
25. Delarue, S., Bonald, T. & Viard, T. Link prediction without learning. In Proceedings of the European Conference on Artificial Intelligence (Santiago de Compostela, Galicia, Spain, 2024).
26. Hu, W. et al. Open graph benchmark: Datasets for machine learning on graphs (2021). arXiv:2005.00687.
27. Mattos, J., Huang, Z., Kosan, M., Singh, A. & Silva, A. Attribute-enhanced similarity ranking for sparse link prediction (2024). arXiv:2412.00261.
28. Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization (2017). arXiv:1412.6980.
29. Fey, M. & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds (2019).
30. Hagberg, A. A., Schult, D. A. & Swart, P. J. Exploring network structure, dynamics, and function using NetworkX. In Varoquaux, G., Vaught, T. & Millman, J. (eds.) Proceedings of the 7th Python in Science Conference, 11–15 (Pasadena, CA, USA, 2008).

Acknowledgements

We gratefully acknowledge the support of Johns Hopkins University Applied Physics Laboratory internal research and development funding, which enabled this work.

Code and data availability

All code produced for this study will be made available upon publication. DotMotif is available at https://github.com/aplbrain/dotmotif. A reference implementation of GrandIso is available at https://github.com/aplbrain/grandiso-networkx.